DocumentCode :
1926005
Title :
Numerically stable, single-pass, parallel statistics algorithms
Author :
Bennett, Janine ; Grout, Ray ; Pébay, Philippe ; Roe, Diana ; Thompson, David
Author_Institution :
Sandia Nat. Labs., Livermore, CA, USA
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
1
Lastpage :
8
Abstract :
Statistical analysis is widely used for countless scientific applications in order to analyze and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. The results of a scalability study demonstrate numerically stable, near-optimal scalability on up to 128 processes and results are presented in which the statistical framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
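As an illustration of the kind of single-pass, pairwise update the abstract describes, the following is a minimal sketch of the well-known second-order case (mean and sum of squared deviations) only. It is not the arbitrary-order moment and co-moment formulas derived in the paper, and it is not the code of the authors' framework. The names `Moments`, `combine`, `from_values`, and `variance` are hypothetical, chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Running summary of one data partition (hypothetical helper type)."""
    n: int = 0         # number of observations
    mean: float = 0.0  # arithmetic mean
    m2: float = 0.0    # sum of squared deviations from the mean


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two partition summaries without revisiting the raw data."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    # Update the mean by a weighted correction instead of re-summing values.
    mean = a.mean + delta * b.n / n
    # Centered second moment: both parts plus a between-partition term.
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def from_values(values) -> Moments:
    """Single pass over the data: each value is merged as a partition of size 1."""
    acc = Moments()
    for x in values:
        acc = combine(acc, Moments(1, float(x), 0.0))
    return acc


def variance(m: Moments) -> float:
    """Unbiased sample variance; undefined (NaN) for fewer than two observations."""
    return m.m2 / (m.n - 1) if m.n > 1 else float("nan")


# Two "processes" each summarize their own chunk, then the summaries are merged.
left = from_values([1e9 + 4, 1e9 + 7])
right = from_values([1e9 + 13, 1e9 + 16])
merged = combine(left, right)
print(merged.n, merged.mean, variance(merged))
```

Because the update works with differences of means rather than raw sums of squares, the large common offset of 1e9 in this example does not swamp the result: the merged variance is 30.0, the same value obtained by a single pass over all four numbers.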
Keywords :
data handling; parallel algorithms; principal component analysis; numerical stability; numerically stable parallel statistics algorithms; open source parallel statistics; single-pass parallel statistics algorithms; statistical analysis package; turbulent combustion simulation data; Concurrent computing; Large-scale systems; Packaging; Robustness; Scalability; Statistical analysis; Statistical distributions; Statistics
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1552-5244
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2009.5289161
Filename :
5289161