DocumentCode :
2815034
Title :
Detecting anomalies in high-performance parallel programs
Author :
Florez, German ; Liu, Zhen ; Bridges, Susan ; Vaughn, Rayford ; Skjellum, Anthony
Author_Institution :
Center for Computer Security Research, Mississippi State University, MS, USA
Volume :
2
fYear :
2004
fDate :
5-7 April 2004
Firstpage :
30
Abstract :
Message passing interface (MPI) is an effective programming technique for implementing parallel programs for distributed computation. As these applications run, a number of different types of irregularities can occur, including those that result from intrusions, user misbehavior, corrupted data, deadlocks or failure of cluster components. We perform a comparison of different artificial intelligence (AI) techniques that can be used to implement a lightweight monitoring and detection system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms when we observe the flow of function library and OS system calls of parallel programs written with MPI. We demonstrate that monitoring of MPI programs can be achieved in real time with high accuracy and in some cases with a 0% false-positive rate, and we show that the added computational load on each node is small. Finally, we demonstrate that simple deterministic methods perform poorly when the program flow grows in size and variety, and that more complex methods are required.
Keywords :
Unix; application program interfaces; artificial intelligence; deterministic algorithms; hidden Markov models; message passing; neural nets; parallel programming; system monitoring; workstation clusters; Linux workstation clusters; MPI program monitoring; OS system calls; anomaly detection; artificial intelligence techniques; deterministic algorithms; distributed computation; function library; high-performance parallel programs; lightweight monitoring detection system; message passing interface; parallel applications; stochastic algorithms; Artificial intelligence; Computer interfaces; Concurrent computing; Condition monitoring; Distributed computing; Linux; Message passing; Parallel programming; System recovery; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
Print_ISBN :
0-7695-2108-8
Type :
conf
DOI :
10.1109/ITCC.2004.1286585
Filename :
1286585