Title :
A database-centric approach to system management in the Blue Gene/L supercomputer
Author :
Bellofatto, Ralph ; Crumley, Paul G. ; Darrington, David ; Knudson, Brant ; Megerian, Mark ; Moreira, Jose E. ; Ohmacht, A.S. ; Orbeck, John ; Reed, Don ; Stewart, Greg
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY
Abstract :
In designing the management system for Blue Gene/L, we adopted a database-centric approach. All configuration and operational data for a particular Blue Gene/L system are stored in a relational database that is kept in the system's service node. The database also serves as the communication bus for the various processes implementing the management system. This design offers many advantages, including the ability to use SQL commands to retrieve reliability, availability, and serviceability (RAS) information about the system. Information about machine partitioning and user jobs can be obtained the same way. Leveraging the database, we have developed a Web interface for system management. This management system has been successfully implemented and deployed in all 19 Blue Gene/L installations at the time of this writing.
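As a rough illustration of the idea in the abstract, the sketch below uses Python's built-in sqlite3 module with an in-memory database. It is not the Blue Gene/L schema: the table names (ras_event, job), the column names, the sample rows, and the helper functions are all hypothetical and chosen only for this example. It shows two things the abstract mentions: plain SQL queries for RAS and job information, and the database acting as a communication bus, where one process records a request as a row and another process picks it up by querying and updating that row.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ras_event (
            id INTEGER PRIMARY KEY,
            location TEXT,
            severity TEXT,
            message TEXT
        );
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            partition_name TEXT,
            username TEXT,
            status TEXT
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO ras_event (location, severity, message) VALUES (?, ?, ?)",
        [
            ("rack0-node0", "INFO", "node booted"),
            ("rack0-node1", "ERROR", "memory error"),
            ("rack1-node0", "INFO", "link trained"),
        ],
    )
    conn.executemany(
        "INSERT INTO job (partition_name, username, status) VALUES (?, ?, ?)",
        [
            ("part_a", "alice", "queued"),
            ("part_a", "bob", "running"),
            ("part_b", "carol", "running"),
        ],
    )
    conn.commit()
    return conn


def count_ras_by_severity(conn):
    """RAS query: number of recorded events per severity level."""
    rows = conn.execute(
        "SELECT severity, COUNT(*) FROM ras_event GROUP BY severity"
    ).fetchall()
    return dict(rows)


def running_users(conn, partition):
    """Job query: users with a running job on the given partition."""
    rows = conn.execute(
        "SELECT username FROM job "
        "WHERE partition_name = ? AND status = 'running' "
        "ORDER BY username",
        (partition,),
    ).fetchall()
    return [r[0] for r in rows]


def claim_next_queued_job(conn):
    """Bus-style step: a launcher process picks up the oldest queued job.

    The submitting process only inserted a row; the launcher finds it by
    querying, then marks it as running. Returns the job id, or None if
    nothing is queued.
    """
    row = conn.execute(
        "SELECT id FROM job WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE job SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row[0]
```

Calling build_demo_db() and then count_ras_by_severity(conn) or running_users(conn, "part_a") returns the query results as ordinary Python values. A real deployment would presumably need transactional safeguards so that two processes cannot claim the same row; this sketch leaves that out.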
Keywords :
SQL; configuration management; parallel machines; relational databases; Blue Gene/L supercomputer; SQL commands; communication bus; machine partitioning; relational database; system availability; system management; system reliability; system serviceability; Application software; Computer networks; Control systems; Ethernet networks; File servers; Hardware; High performance computing; Relational databases; Supercomputers; Technology management;
Conference_Title :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
DOI :
10.1109/IPDPS.2006.1639697