DocumentCode :
476248
Title :
Multiple features fusion method for identifying text topic boundaries
Author :
Xu, Yong-Dong ; Quan, Guang-Ri ; Wang, Ya-dong ; Xu, Zhi-Ming
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol. (Wei Hai), Harbin
Volume :
5
fYear :
2008
fDate :
12-15 July 2008
Firstpage :
2950
Lastpage :
2956
Abstract :
In general, a document can be regarded as composed of coherent units called discourse segments. Discovering the segment boundaries is an important task for many natural language processing applications. In this paper, we propose a new method for identifying topic boundaries in Chinese text, based on multiple features fusion. Our approach first extracts multiple features of topic shift from the text. For each feature, we adopt a corresponding F-dotplotting model to calculate the boundary values of neighboring sentences. Subsequently, the useful features among these cues are automatically selected and combined by a statistical method based on logistic regression analysis to determine topic boundaries. The experimental results show that the F-dotplotting method is more effective than the common dotplotting method, and that the multiple features fusion method based on the logistic regression model can effectively improve Chinese text topic segmentation performance.
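The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes each gap between neighboring sentences already has one boundary score per feature; the two feature columns, the toy data, and the function names `fit_logistic` and `predict_boundaries` are hypothetical and not taken from the paper. The sketch does not reproduce the F-dotplotting model or the automatic feature selection; it only shows how a logistic regression could combine several per-feature scores into a boundary decision.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sentence gap
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_boundaries(X, w, b, threshold=0.5):
    """Mark a gap as a topic boundary when its fused probability reaches the threshold."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold
            for xi in X]

# Hypothetical toy data: one row per sentence gap, one column per feature score
# (for example, a lexical-cohesion score and a cue-phrase score).
X = [[0.9, 0.8], [0.8, 0.1], [0.2, 0.1], [0.1, 0.3], [0.7, 0.9], [0.3, 0.2]]
y = [1, 1, 0, 0, 1, 0]  # 1 = topic boundary at this gap (illustrative labels)

w, b = fit_logistic(X, y)
predicted = predict_boundaries(X, w, b)
```

In this sketch the learned weights indicate how much each feature contributes to the fused decision, which is the role the abstract assigns to the regression model.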
Keywords :
feature extraction; natural language processing; regression analysis; sensor fusion; text analysis; Chinese text topic boundary identification; Chinese text topic segmentation performance; F-dotplotting model; discourse segment boundary discovery; document segments; logistic regression analysis; multiple feature fusion; natural language processing; neighboring sentences; statistical method; topic shift feature extraction; Application software; Computer science; Cybernetics; Feature extraction; Information retrieval; Logistics; Machine learning; Natural language processing; Regression analysis; Statistical analysis; F-dotplotting method; Topic boundaries identification; logistic regression model; multiple features fusion;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
Type :
conf
DOI :
10.1109/ICMLC.2008.4620913
Filename :
4620913