Title : 
Scalable bootstrap clustering for massive data
         
        
            Author : 
Haocheng Wang ; Fuzhen Zhuang ; Xiang Ao ; Qing He ; Zhongzhi Shi
         
        
            Author_Institution : 
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
         
        
        
            fDate : 
June 30, 2014 - July 2, 2014
         
        
        
        
            Abstract : 
The bootstrap provides a simple and powerful means of improving the accuracy of clustering. However, for today's increasingly large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. In this paper we introduce Bag of Little Bootstraps Clustering (BLBC), a new procedure that uses the Bag of Little Bootstraps technique to obtain a robust, computationally efficient means of clustering massive data. Moreover, BLBC is well suited to implementation on the modern parallel and distributed computing architectures that are often used to process large datasets. We empirically investigate the performance characteristics of BLBC and compare it with existing methods via experiments on simulated and real data. The results show that BLBC has a significantly more favorable computational profile than bootstrap-based clustering while maintaining good statistical correctness.
         
        
            Keywords : 
parallel processing; pattern clustering; BLBC; bag of little bootstraps clustering technique; distributed computing architectures; massive data; parallel computing architectures; scalable bootstrap clustering; Accuracy; Clustering algorithms; Computer architecture; Distributed computing; Partitioning algorithms; Program processors; Vectors; bag of little bootstraps; clustering; data mining; machine learning; parallel and distributed computing
         
        
        
        
            Conference_Titel : 
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014 15th IEEE/ACIS International Conference on
         
        
            Conference_Location : 
Las Vegas, NV
         
        
        
            DOI : 
10.1109/SNPD.2014.6888693