Title :
Fast and robust bootstrap in analysing large multivariate datasets
Author :
Basiri, Shahab ; Ollila, Esa ; Koivunen, Visa
Author_Institution :
Dept. of Signal Process. & Acoust., Aalto Univ., Aalto, Finland
Abstract :
In this paper, we address the problem of performing statistical inference for large-scale data sets. The volume and dimensionality of the data may be so high that the data cannot be processed or stored on a single node. We propose a scalable, statistically robust and computationally efficient bootstrap method that is compatible with distributed processing and storage systems. Bootstrapping is performed on multiple smaller distinct subsets of the data, similarly to the bag of little bootstraps (BLB) method [1]. For each bootstrap replica drawn from the distinct data subsets, a computationally efficient fixed-point estimation equation is solved. The proposed bootstrap method facilitates the use of highly robust statistical methods in analyzing large-scale data sets. Significant computational savings are achieved because the method does not recompute the estimator for each bootstrap sample; instead, the bootstrap value is obtained analytically using an approximation. Simulation examples demonstrate the usefulness and validity of the method for bootstrap analysis of large data sets.
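The subset-based resampling that the abstract refers to can be illustrated with a minimal sketch of the plain bag of little bootstraps, here estimating the standard error of a sample mean. This is not the authors' fast and robust method: the fixed-point equation and the robust estimator are not given in the abstract, so the sketch substitutes the ordinary mean. The function name `blb_standard_error`, its parameters, and the subset-size rule b = n^0.6 are assumptions chosen for illustration.

```python
import numpy as np


def blb_standard_error(data, n_subsets=10, subset_exponent=0.6,
                       n_resamples=100, seed=0):
    """Plain BLB sketch: standard error of the column means of `data`.

    Hypothetical helper for illustration only; it uses the sample mean,
    not the robust fixed-point estimator described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    b = int(n ** subset_exponent)  # size of each small subset (assumed rule)
    if n_subsets * b > n:
        raise ValueError("not enough observations for disjoint subsets")

    perm = rng.permutation(n)  # random order, so the subsets are disjoint
    per_subset_errors = []
    for s in range(n_subsets):
        subset = data[perm[s * b:(s + 1) * b]]
        # Each resample has nominal size n but touches only b distinct
        # points, so it is stored as multinomial counts over the subset.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)
        estimates = counts @ subset / n  # weighted means, one row per resample
        per_subset_errors.append(estimates.std(axis=0, ddof=1))

    # Average the per-subset standard errors to get the final estimate.
    return np.mean(per_subset_errors, axis=0)
```

Each subset can be processed on a separate node because only the b points of that subset and a vector of counts are needed, which is what makes this style of bootstrap compatible with distributed storage.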
Keywords :
distributed processing; estimation theory; statistical analysis; BLB method; bag of little bootstrap method; bootstrap method; bootstrap replica; distributed processing; fixed-point estimation equation; multivariate dataset; statistical inference; statistical method; storage system; Big data; Complexity theory; Distributed databases; Mathematical model; Robustness; Statistical analysis; Uncertainty; bag of little bootstraps; big data; bootstrap; distributed computation; fast and robust bootstrap; robust estimation;
Conference_Title :
Signals, Systems and Computers, 2014 48th Asilomar Conference on
Print_ISBN :
978-1-4799-8295-0
DOI :
10.1109/ACSSC.2014.7094385