Title : 
The influence of word normalization in English document clustering
         
        
            Author : 
Han, Pu ; Shen, Si ; Wang, Dongbo ; Liu, Yanyun
         
        
            Author_Institution : 
School of Information Management, Nanjing University, Nanjing, China
         
        
        
        
        
        
        
            Abstract : 
Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, this paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. The experimental results show that the overall effect on performance is not remarkable; compared with the Snowball stemmer and Stanford lemmatization, the Porter stemmer achieves better entropy and purity.
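
The abstract names entropy and purity as the two evaluation functions but does not define them. The following is a minimal sketch using the definitions commonly found in the clustering literature; the paper's exact formulas may differ. The function names (`purity`, `entropy`, `_group`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def _group(clusters, labels):
    """Count the gold class labels inside each cluster (hypothetical helper)."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster.

    Higher is better; 1.0 means every cluster holds a single class.
    """
    by_cluster = _group(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) of each cluster.

    Lower is better; 0.0 means every cluster holds a single class.
    """
    by_cluster = _group(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Shannon entropy of the class distribution inside this cluster.
        h = sum((k / size) * log2(size / k) for k in counts.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    # Toy data (assumed for illustration): six documents, two clusters.
    cluster_ids = [0, 0, 0, 1, 1, 1]
    gold_labels = ["a", "a", "b", "b", "b", "b"]
    print("purity:", purity(cluster_ids, gold_labels))
    print("entropy:", entropy(cluster_ids, gold_labels))
```

Under these definitions, a normalization method that yields higher purity and lower entropy on the same clustering algorithm would be judged the better one, which is the sense in which the abstract compares the Porter stemmer with the other two methods.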
         
        
            Keywords : 
document clustering; lemmatization; stemming
         
        
        
        
            Conference_Title : 
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
         
        
            Conference_Location : 
Zhangjiajie, China
         
        
            Print_ISBN : 
978-1-4673-0088-9
         
        
        
            DOI : 
10.1109/CSAE.2012.6272740