DocumentCode :
479777
Title :
Automatic Segmentation of Hierarchy Feature without Lexicon for Chinese Text Based on Iterative Learning
Author :
Jiang, Shaohua ; Dang, Yanzhong
Author_Institution :
Sch. of Civil & Hydraulic Eng., Dalian Univ. of Technol., Dalian
Volume :
1
fYear :
2008
fDate :
12-14 Dec. 2008
Firstpage :
657
Lastpage :
661
Abstract :
Chinese feature extraction is indispensable in Chinese natural language processing because it benefits knowledge discovery and information retrieval over Chinese text. Chinese segmentation is the precondition of feature extraction. To overcome the disadvantages of current Chinese segmentation methods, such as lexicon-based schemes, syntax- and rule-based schemes, statistics-based schemes, and methods that integrate these schemes, a maximum matching and frequency statistics (MMFS) segmentation method based on descending length and string frequency statistics is put forward. To extract the shorter words and phrases contained in longer ones, a novel Chinese hierarchy feature extraction method based on MMFS and iterative learning is proposed. This method obtains hierarchy features according to morphology without a lexicon, without acquiring the probabilities between words in advance, and without a Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy features.
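The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it names: scanning candidate strings from longest to shortest and keeping those that recur often enough, with no lexicon. It is not the authors' MMFS implementation. The function names, the `max_len`, `min_len`, and `min_freq` parameters, the sample text, and the containment-based nesting step are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_by_length_descending(text, max_len=4, min_len=2, min_freq=2):
    """Hypothetical sketch: keep strings that recur, longest candidates first.

    Assumption: a string counts as a feature when it occurs at least
    `min_freq` times in the text that is still uncovered.
    """
    segments = [text]  # stretches of text not yet covered by a feature
    found = {}
    for n in range(max_len, min_len - 1, -1):
        # Count every length-n substring of the uncovered segments.
        counts = Counter(
            seg[i:i + n] for seg in segments for i in range(len(seg) - n + 1)
        )
        frequent = [s for s, c in counts.items() if c >= min_freq]
        for s in frequent:
            found[s] = counts[s]
        if frequent:
            # Cut the accepted strings out so shorter passes see only the rest.
            pattern = "|".join(re.escape(s) for s in frequent)
            segments = [
                part for seg in segments for part in re.split(pattern, seg) if part
            ]
    return found


def nest_features(features):
    """Hypothetical nesting step: list the shorter features inside each one."""
    return {
        longer: [s for s in features if s != longer and s in longer]
        for longer in features
    }


text = "信息检索系统和信息检索方法以及信息抽取和信息过滤"
features = extract_by_length_descending(text)
hierarchy = nest_features(features)
```

On this sample, the four-character pass keeps 信息检索 and the two-character pass keeps 信息 from the remaining text. The nesting helper then places 信息 under 信息检索. The paper's iterative learning step may work quite differently; simple substring containment is used here only to show what a hierarchy of features could look like.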
Keywords :
feature extraction; information retrieval; natural language processing; text analysis; Chinese character index; Chinese hierarchy feature extraction; Chinese natural language; Chinese segmentation method; Chinese text knowledge discovery; automatic segmentation; frequency statistics segmentation; information retrieval; iterative learning; lexicon-based scheme; maximum matching; rules-based scheme; statistics-based scheme; string frequency statistics; syntax; Computer science; Feature extraction; Frequency; Information retrieval; Natural languages; Probability; Software engineering; Statistics; Testing; Text mining; Chinese text; automatic segmentation; hierarchy feature; iterative learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
Type :
conf
DOI :
10.1109/CSSE.2008.1434
Filename :
4721835