Title : 
New Features Acquisition of Text with Cloud-LDA Model
         
        
            Author : 
Maoyuan Zhang ; Fanli He ; Shuiyin Chen
         
        
            Author_Institution : 
Dept. of Comput. Sci. & Technol., Central China Normal Univ., Wuhan, China
         
        
        
        
        
        
            Abstract : 
This paper investigates how to improve information retrieval (IR) by changing the feature distribution of text. It introduces Cloud Model theory into the Latent Dirichlet Allocation (LDA) model to build a new feature selection system. The LDA model is used to mine the underlying topical structure of a text: each topic is associated with a multinomial distribution over semantically related words. However, it is doubtful whether the topics themselves are semantically related to one another. Based on the probability distribution over the vocabulary of a text given by the LDA model, the new system uses Cloud Model theory to automatically simulate a feature set of words with a high contribution degree in the text. Results show that this feature set contains fewer features yet yields higher classification accuracy than currently popular feature selection methods. When a query matches words with a high contribution degree, the more such words an article contains, the more relevant the retrieved article is to the query. Single Language IR (SLIR) experiments on the NTCIR-5 (the 5th NII Test Collection for IR Systems) collections show that this method achieves a clear improvement over some other IR methods.
         
        
            Keywords : 
cloud computing; feature extraction; information retrieval; statistical distributions; text analysis; LDA model; NTCIR-5; SLIR; cloud model theory; cloud-LDA model; feature selection system; latent Dirichlet allocation; probability distribution; single language IR; text features acquisition; Computational modeling; Data models; Indexes; Semantics; Uncertainty; Cloud Model; feature
         
        
        
        
            Conference_Title : 
Information Science and Cloud Computing Companion (ISCC-C), 2013 International Conference on
         
        
            Conference_Location : 
Guangzhou
         
        
        
            DOI : 
10.1109/ISCC-C.2013.94