DocumentCode :
2666001
Title :
A novel text subject extraction method
Author :
Yinghua, Ma ; Guiyang, Su ; Jianhua, Li ; Shenghong, Li
Author_Institution :
Shanghai Jiao Tong Univ., China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
698
Lastpage :
703
Abstract :
Word segmentation or word extraction is conventionally the first step of subject extraction. Because Chinese text has no delimiters between words, segmenting it is rather complicated. This paper proposes a novel text subject extraction method based on contextual cooccurrence and describes an approach for extracting the subject sentence from Chinese text using character-level contextual cooccurrence data. The approach is fast, skips segmentation entirely, and can be applied to text in multiple styles. Three experiments show that it achieves high accuracy on multistyle text, reaching 77.19% on news text. A comparative experiment shows no loss in accuracy.
Keywords :
natural languages; text analysis; word processing; Chinese text processing; contextual cooccurrence; subject extraction; word extraction; word segmentation; Data mining; Entropy; Frequency; Humans; Information filtering; Information filters; Information processing; Information retrieval; Natural languages; Statistical analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275995
Filename :
1275995