Title : 
New Word Extraction Utilizing Google News Corpuses for Supporting Lexicon-based Chinese Word Segmentation Systems

Author :
Hong, Chin-Ming; Chen, Chih-Ming; Chiu, Chao-Yang

Author_Institution :
Nat. Taiwan Normal Univ., Taipei

Abstract :
This study proposes a novel statistics-based scheme for extracting new words from Google News in order to improve the word identification ability of lexicon-based Chinese word segmentation systems. Extracting new words from news corpora and incrementally adding them to the lexicon of such a system both automatically builds a specialized news lexicon and enhances word identification. Compared with another proposed method, the experimental results indicate that the proposed scheme not only retrieves new words from the categorized Google News corpora more accurately but also extracts a larger number of them.

Keywords :
natural language processing; search engines; statistics; text analysis; Google news corpuses; lexicon-based Chinese word segmentation systems; word extraction; word identification ability; Crawlers; Data mining; Degradation; Educational technology; Information retrieval; Internet; Natural languages; Text mining; Web sites; Chinese word segmentation; New word extraction

Conference_Title :
Neural Networks, 2006. IJCNN '06. International Joint Conference on

Conference_Location :
Vancouver, BC

Print_ISBN :
0-7803-9490-9

DOI :
10.1109/IJCNN.2006.247263