Title :
Automatic Segmentation of Hierarchy Feature without Lexicon for Chinese Text Based on Iterative Learning
Author :
Jiang, Shaohua ; Dang, Yanzhong
Author_Institution :
Sch. of Civil & Hydraulic Eng., Dalian Univ. of Technol., Dalian
Abstract :
Feature extraction is indispensable in Chinese natural language processing because it supports knowledge discovery and information retrieval over Chinese text. Chinese word segmentation is a precondition for feature extraction. To overcome the disadvantages of current Chinese segmentation methods, such as lexicon-based schemes, syntax- and rule-based schemes, statistics-based schemes, and methods that integrate them, the maximum matching and frequency statistics (MMFS) segmentation method, based on descending string length and string frequency statistics, was put forward. To extract the shorter words and phrases contained in longer ones, a novel Chinese hierarchy feature extraction method based on MMFS and iterative learning was proposed. This method obtains hierarchy features according to morphology without a lexicon, without acquiring the probabilities between words in advance, and without a Chinese character index. Experimental results confirmed the efficiency of this statistical method in extracting Chinese hierarchy features.
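The abstract gives only an outline of MMFS, so the following is a minimal Python sketch of one possible reading of it, not the authors' implementation. It assumes that "length descending" means scanning candidate strings from longest to shortest, that a string is accepted once its frequency reaches a threshold, and that a shorter string counts as a sub-feature of a longer one when it occurs more often in the text than its parent. The function names, the `max_len` and `min_freq` parameters, and the sub-feature rule are all hypothetical.

```python
import re
from collections import Counter


def split_clauses(text):
    # Punctuation and whitespace act as natural boundaries; no lexicon is used.
    return [c for c in re.split(r"[^\w]+", text) if c]


def mmfs_features(clauses, max_len=6, min_freq=2):
    """Assumed reading of MMFS: scan string lengths in descending order and
    accept every string whose frequency reaches min_freq."""
    features = {}
    remaining = list(clauses)
    for n in range(max_len, 1, -1):
        counts = Counter(
            c[i:i + n] for c in remaining for i in range(len(c) - n + 1)
        )
        accepted = [s for s, k in counts.items() if k >= min_freq]
        for s in accepted:
            features[s] = counts[s]
        # Cut accepted strings out so shorter passes only see the leftovers.
        leftovers = []
        for c in remaining:
            for s in accepted:
                c = c.replace(s, " ")
            leftovers.extend(c.split())
        remaining = leftovers
    return features


def sub_features(parent, parent_freq, clauses, min_len=2):
    """Hypothetical hierarchy rule: a proper substring is a sub-feature when it
    occurs more often in the text than its parent does. Applied recursively."""
    children = {}
    for n in range(len(parent) - 1, min_len - 1, -1):
        for i in range(len(parent) - n + 1):
            s = parent[i:i + n]
            k = sum(c.count(s) for c in clauses)
            if k > parent_freq:
                children[s] = {
                    "freq": k,
                    "children": sub_features(s, k, clauses, min_len),
                }
    return children


def build_hierarchy(text, max_len=6, min_freq=2):
    # Top-level features from the length-descending pass, then nested ones.
    clauses = split_clauses(text)
    top = mmfs_features(clauses, max_len, min_freq)
    return {
        s: {"freq": k, "children": sub_features(s, k, clauses)}
        for s, k in top.items()
    }
```

For the toy input `"信息检索系统。信息检索方法。信息处理。"` with `max_len=4` and `min_freq=2`, `build_hierarchy` returns the single top-level feature 信息检索 (frequency 2) with the nested sub-feature 信息 (frequency 3). The paper's actual thresholds, stopping criteria and iterative learning procedure are not stated in the abstract and may differ.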
Keywords :
feature extraction; information retrieval; natural language processing; text analysis; Chinese character index; Chinese hierarchy feature extraction; Chinese natural language; Chinese segmentation method; Chinese text knowledge discovery; automatic segmentation; frequency statistics segmentation; iterative learning; lexicon-based scheme; maximum matching; rules-based scheme; statistics-based scheme; string frequency statistics; syntax; Computer science; Frequency; Natural languages; Probability; Software engineering; Statistics; Testing; Text mining; Chinese text; hierarchy feature
Conference_Title :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
DOI :
10.1109/CSSE.2008.1434