Title :
Numerically stable, single-pass, parallel statistics algorithms
Author :
Bennett, Janine ; Grout, Ray ; Pébay, Philippe ; Roe, Diana ; Thompson, David
Author_Institution :
Sandia Nat. Labs., Livermore, CA, USA
Date :
Aug. 31 2009-Sept. 4 2009
Abstract :
Statistical analysis is widely used in scientific applications to analyze and infer meaning from data. A key challenge for any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. A scalability study demonstrates numerically stable, near-optimal scalability on up to 128 processes, and we present results in which the framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
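To illustrate the kind of update the abstract refers to, here is a minimal sketch of the well-known second-order case only (count, mean, and sum of squared deviations). It is not the paper's arbitrary-order derivation or the framework's code; the names `Moments`, `update`, `merge`, and `summarize` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Partial summary: count, mean, and M2 (sum of squared deviations from the mean)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Incremental update with a single new observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old and the new mean, which avoids subtracting large sums.
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        """Pairwise update: combine two partial summaries without revisiting the data."""
        n = self.n + other.n
        if n == 0:
            return Moments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        """Unbiased sample variance; needs at least two observations."""
        if self.n < 2:
            raise ValueError("variance needs at least two observations")
        return self.m2 / (self.n - 1)


def summarize(values) -> Moments:
    """Single pass over one chunk of data, as one process might do (assumed setup)."""
    acc = Moments()
    for v in values:
        acc.update(v)
    return acc
```

In this sketch each process would call `summarize` on its own chunk, and the partial results would be combined with `merge` in any pairing order, so the data is read only once.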
Keywords :
data handling; parallel algorithms; principal component analysis; numerical stability; numerically stable parallel statistics algorithms; open source parallel statistics; single-pass parallel statistics algorithms; statistical analysis package; turbulent combustion simulation data; Concurrent computing; Large-scale systems; Numerical stability; Packaging; Principal component analysis; Robustness; Scalability; Statistical analysis; Statistical distributions; Statistics
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2009.5289161