DocumentCode :
387590
Title :
Subtopic segmentation of Chinese document: an adapted dotplot approach
Author :
Chen, Qing-cai ; Wang, Xiao-long ; Liu, Bing-quan ; Wang, Ying-Yu
Author_Institution :
Sch. of Comput. Sci. & Techniques, Harbin Inst. of Technol., China
Volume :
3
fYear :
2002
fDate :
2002
Firstpage :
1571
Abstract :
An adapted dotplot model based on Chinese word sense quantization is presented to find the boundaries of subtopics in a document. The data reduction techniques of rough sets are introduced to select the axis words of the word space. To discretize and filter the data in the information table, the mutual information between axis words and feature words is calculated. The adapted model is then constructed by replacing the counting of identical words with the calculation of similarity between feature words. As a submodule of our InsunAbs Chinese auto-summarization system, its performance is evaluated indirectly through a quantitative evaluation. In the test experiments, the adapted model outperforms both the baseline and the original dotplot model.
Keywords :
document handling; natural languages; rough set theory; text analysis; Chinese document handling; Chinese summarization system; Chinese word sense quantization; InsunAbs; attribute reduction; dotplot model; mutual information; rough set theory; subtopic segmentation; Computer science; Data mining; Information filtering; Information filters; Information retrieval; Mutual information; Quantization; TV; Testing; Text recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1167475
Filename :
1167475