DocumentCode :
2773708
Title :
New Word Extraction Utilizing Google News Corpuses for Supporting Lexicon-based Chinese Word Segmentation Systems
Author :
Hong, Chin-Ming ; Chen, Chih-Ming ; Chiu, Chao-Yang
Author_Institution :
Nat. Taiwan Normal Univ., Taipei
fYear :
2006
fDate :
0-0 0
Firstpage :
3040
Lastpage :
3046
Abstract :
This study proposes a novel statistics-based scheme for extracting new words from Google News in order to improve the word identification ability of lexicon-based Chinese word segmentation systems. Extracting new words from news corpora and incrementally adding them to the lexicon of a lexicon-based Chinese word segmentation system makes it possible to construct a specialized news lexicon automatically and enhances word identification ability. Compared with another proposed method, the experimental results indicated that the proposed new word extraction scheme not only retrieves new words from the categorized Google News corpora more accurately but also obtains a larger number of new words.
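The abstract does not say which statistics the scheme uses. As a rough illustration only, the Python sketch below swaps in plain character n-gram frequency with substring reduction (not the authors' method) to pull a candidate new word out of a toy corpus, adds it to the lexicon incrementally, and shows how forward maximum matching, a common lexicon-based segmentation strategy, then recognizes it. The function names, the thresholds `min_freq`, `n_range` and `max_len`, and the sample corpus are all hypothetical.

```python
from collections import Counter


def extract_new_words(corpus, lexicon, n_range=(2, 4), min_freq=3):
    """Hypothetical stand-in for a statistics-based extractor: keep character
    n-grams that occur at least min_freq times and are not in the lexicon."""
    counts = Counter()
    for sentence in corpus:
        for n in range(n_range[0], n_range[1] + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    candidates = {g: c for g, c in counts.items()
                  if c >= min_freq and g not in lexicon}
    # Substring reduction: drop a candidate if a longer candidate contains it
    # with the same frequency (e.g. "部落" only ever occurs inside "部落格").
    return {g for g, c in candidates.items()
            if not any(g != longer and g in longer and c == lc
                       for longer, lc in candidates.items())}


def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching; unknown characters become one-char words."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


# Hypothetical toy data standing in for crawled Google News text.
base_lexicon = {"我", "在", "寫", "文章", "他", "看", "很", "熱門"}
news_corpus = ["我在部落格寫文章", "他看部落格", "部落格很熱門"]

new_words = extract_new_words(news_corpus, base_lexicon)
updated_lexicon = base_lexicon | new_words  # incremental lexicon update

print(new_words)                                               # {'部落格'}
print(forward_max_match("我在部落格寫文章", base_lexicon))     # ['我', '在', '部', '落', '格', '寫', '文章']
print(forward_max_match("我在部落格寫文章", updated_lexicon))  # ['我', '在', '部落格', '寫', '文章']
```

Raw frequency is a crude criterion that also admits frequent non-words; statistics-based extractors typically add association or boundary measures such as mutual information or branching entropy, and the paper's actual criteria should be taken from its full text.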
Keywords :
natural language processing; search engines; statistics; text analysis; Google news corpuses; lexicon-based Chinese word segmentation systems; word extraction; word identification ability; Crawlers; Data mining; Degradation; Educational technology; Information retrieval; Internet; Natural languages; Text mining; Web sites; Chinese word segmentation; New word extraction
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
Type :
conf
DOI :
10.1109/IJCNN.2006.247263
Filename :
1716512