DocumentCode
85808
Title
Systematic Debugging Methods for Large-Scale HPC Computational Frameworks
Author
Humphrey, Alan ; Meng, Qingyu ; Berzins, Martin ; Caminha B. de Oliveira, Diego ; Rakamaric, Zvonimir ; Gopalakrishnan, Ganesh
Author_Institution
Univ. of Utah, Salt Lake City, UT, USA
Volume
16
Issue
3
fYear
2014
fDate
May-June 2014
Firstpage
48
Lastpage
56
Abstract
Parallel computational frameworks for high-performance computing are central to the advancement of simulation-based studies in science and engineering. Unfortunately, finding and fixing bugs in these frameworks can be extremely time-consuming. Left unchecked, these bugs can drastically diminish the amount of new science that can be performed. This article presents a systematic study of the Uintah Computational Framework and approaches to debugging it more incisively. A key insight is to leverage the modular structure of Uintah, which lends itself to systematic debugging. In particular, the authors have developed a new approach based on coalesced stack trace graphs (CSTGs) that summarize the system behavior in terms of key control flows manifested through function invocation chains. They illustrate several scenarios for how CSTGs could help efficiently localize bugs, and present a case study of how they found and fixed a real Uintah bug using CSTGs.
Keywords
graph theory; parallel programming; program debugging; CSTG; Uintah bug; Uintah computational framework; bugs fixing; bugs localization; coalesced stack trace graphs; engineering; function invocation chains; high-performance computing; large-scale HPC computational frameworks; modular structure; parallel computational frameworks; science; simulation-based studies; system behavior; systematic debugging methods; Computational modeling; Computer bugs; Debugging; Runtime; Scientific computing; Software development; Systematics; computational modeling and frameworks; debugging aids; parallel programming; reliability; scientific computing
fLanguage
English
Journal_Title
Computing in Science & Engineering
Publisher
IEEE
ISSN
1521-9615
Type
jour
DOI
10.1109/MCSE.2014.11
Filename
6729885