Title :
Chinese Keyword Extraction Based on N-Gram and Word Co-occurrence
Author :
Jiao, Hui ; Liu, Qian ; Jia, Hui-bo
Author_Institution :
Tsinghua Univ., Beijing
Abstract :
This paper presents a new Chinese text encoding method based on Chinese words and establishes a new Chinese document format that addresses the automatic segmentation problem. The method makes the word the smallest unit of information, so Chinese text analysis no longer depends on a separate segmentation step. On this word-based platform, N-gram and word co-occurrence statistical analysis are combined in a Chinese keyword extraction experiment. First, candidate keywords are extracted with a bi-gram model. Then, a set of co-occurrences between every word in the bi-grams and the frequent words is generated. The co-occurrence distribution indicates the importance of each word. Based on this analysis, keywords are chosen from the bi-grams. The algorithm applies to a single document without using a corpus, and the experimental results are satisfactory.
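The pipeline outlined in the abstract (bi-gram candidates, co-occurrence with frequent words, selection by co-occurrence distribution) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the scoring statistic, so a chi-square-style deviation from the expected co-occurrence is assumed here, co-occurrence is assumed to be counted per sentence, and the function name extract_keywords and its parameters min_bigram_count, num_frequent and top_k are hypothetical. The input is assumed to be text already stored as word units, as the proposed document format would provide.

```python
from collections import Counter


def extract_keywords(sentences, min_bigram_count=2, num_frequent=3, top_k=3):
    """Rank bi-gram keyword candidates for a single document.

    sentences: list of sentences, each a list of words (word units are
    assumed to be given already, so no segmentation is performed here).
    """
    word_counts = Counter(w for s in sentences for w in s)

    # Step 1: candidate keywords are bi-grams of adjacent words that repeat.
    bigram_counts = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    candidates = [bg for bg, c in bigram_counts.items() if c >= min_bigram_count]

    # Step 2: the most frequent words of the document itself (no corpus).
    frequent = [w for w, _ in word_counts.most_common(num_frequent)]

    # Step 3: count sentences in which a word and a frequent word co-occur.
    cooc = Counter()
    for s in sentences:
        present = set(s)
        for w in present:
            for g in frequent:
                if g in present and g != w:
                    cooc[(w, g)] += 1

    total_frequent = sum(word_counts[g] for g in frequent)

    def word_score(w):
        # Assumed statistic: chi-square-style deviation of the observed
        # co-occurrence counts from counts proportional to each frequent
        # word's share of occurrences. A skewed distribution scores higher.
        n_w = sum(cooc[(w, g)] for g in frequent)
        if n_w == 0:
            return 0.0
        score = 0.0
        for g in frequent:
            expected = n_w * word_counts[g] / total_frequent
            score += (cooc[(w, g)] - expected) ** 2 / expected
        return score

    # Step 4: a bi-gram is scored by the sum of its two word scores
    # (an assumption), with ties broken by the bi-gram itself.
    scored = {bg: word_score(bg[0]) + word_score(bg[1]) for bg in candidates}
    ranked = sorted(scored, key=lambda bg: (-scored[bg], bg))
    return ranked[:top_k]
```

Calling extract_keywords on a list of word lists returns up to top_k bi-grams as tuples of two words. The thresholds are arbitrary defaults chosen for small examples and would need tuning on real documents.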
Keywords :
indexing; statistical analysis; text analysis; Chinese document format; Chinese keyword extraction; Chinese text encoding; N-gram; word cooccurrence statistical analysis; Computational intelligence; Data mining; Encoding; Indexing; Instruments; Laboratories; Security; Statistical analysis; Text analysis; Writing
Conference_Title :
Computational Intelligence and Security Workshops, 2007. CISW 2007. International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-0-7695-3073-4
DOI :
10.1109/CISW.2007.4425468