Title : 
Multi-objective clustering ensemble for high-dimensional data based on Strength Pareto Evolutionary Algorithm (SPEA-II)
         
        
            Author : 
Abdul Wahid;Xiaoying Gao;Peter Andreae
         
        
            Author_Institution : 
Victoria University of Wellington, New Zealand
         
        
        
        
        
            Abstract : 
Clustering is a fundamental data analysis technique that aims to find distinct groups of similar objects and to discover hidden structure in data. A recent approach, the clustering ensemble, tries to derive an improved clustering solution from a set of previously generated candidate clustering solutions. Clustering ensembles have two steps: generating multiple candidate clustering solutions from the data, and forming a final clustering solution from those candidates. The problem with the first step is the text representation: a common clustering ensemble approach uses only word frequencies as features to represent text data (documents). However, documents usually contain semantically rich information such as words, hypertext, titles, and topics. Because this semantic information is ignored, the approach is prone to producing futile groupings of the documents. The problem with the second step is that the currently popular median partition approach selects a single clustering solution from the previously generated candidates. In this work, we present a new multi-objective clustering ensemble method based on the Strength Pareto Evolutionary Algorithm (SPEA-II). Our method uses the semantic information (rich features) to address the first problem. In its second step, a cluster-oriented evolutionary approach derives the final clustering solution by selecting better-quality clusters, which addresses the second problem. The results show that the new method outperforms other clustering ensemble methods.
         
        
            Keywords : 
"Evolutionary computation","Clustering methods","Linear programming","Optimization","Semantics","Clustering algorithms","Sociology"
         
        
        
            Conference_Title : 
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
         
        
            Print_ISBN : 
978-1-4673-8272-4
         
        
        
            DOI : 
10.1109/DSAA.2015.7344795