DocumentCode :
2514663
Title :
An automatic normalized cut topic segmentation approach
Author :
Jin, YuanYuan ; Gao, Baojian ; Zhang, ZiRan
Author_Institution :
Northwest Univ., Xian, China
fYear :
2010
fDate :
28-30 Nov. 2010
Firstpage :
403
Lastpage :
406
Abstract :
This paper presents an automatic topic segmentation approach for Chinese broadcast news based on subword-level normalized cut (Ncut), motivated by a limitation of classical Ncut: the number of segments must be set in advance. We represent a text as a weighted undirected graph in which nodes correspond to sentences and edge weights describe inter-sentence lexical similarity at the Chinese subword level, so that the segmentation task is formalized as a graph-partitioning problem under the Ncut criterion. To overcome this limitation, we propose a method inspired by text dotplotting that evaluates segmentation results and selects the optimal number of segments automatically. Finally, we place the whole approach in a machine learning framework, learning the best parameters on the training set. Our method achieves a relative improvement of 3% over non-automatic subword Ncut, the previous best method.
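Illustrative sketch (not taken from the paper): the graph formulation in the abstract can be made concrete with a few lines of Python. Several choices here are assumptions, since the abstract does not specify them: character bigrams stand in for Chinese subwords, cosine similarity is used as the edge weight, and the best two-way split is found by brute force. The automatic selection of the number of segments is not reproduced.

```python
from collections import Counter
from math import sqrt


def char_bigrams(sentence):
    """Bag of overlapping character bigrams (assumed stand-in for subwords)."""
    return Counter(sentence[i:i + 2] for i in range(len(sentence) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Edge weights of the sentence graph: w[i][j] = similarity of sentences i and j."""
    vecs = [char_bigrams(s) for s in sentences]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def ncut(w, boundaries):
    """Normalized-cut value of a contiguous segmentation.

    boundaries: sorted sentence indices at which a new segment starts
    (index 0 is implicit). Each segment A contributes cut(A, V - A) / assoc(A, V).
    """
    n = len(w)
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n]
    total = 0.0
    for s, e in zip(starts, ends):
        inside = range(s, e)
        assoc = sum(w[i][j] for i in inside for j in range(n))
        within = sum(w[i][j] for i in inside for j in inside)
        total += (assoc - within) / assoc if assoc else 0.0
    return total


def best_two_way_split(w):
    """Brute-force search for the single boundary with the lowest Ncut value."""
    return min(range(1, len(w)), key=lambda b: ncut(w, [b]))


sentences = ["aaaa bbbb", "aaaa bbbb cc", "xxxx yyyy", "xxxx yyyy zz"]
w = similarity_matrix(sentences)
print(best_two_way_split(w))
```

In this toy input the first two sentences share no bigrams with the last two, so the lowest-cost boundary falls between them. A lower Ncut value means segments that are internally similar and weakly connected to the rest of the text.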
Keywords :
graph theory; learning (artificial intelligence); natural language processing; text analysis; Chinese broadcast news; Chinese subwords level; Ncut criterion; automatic normalized cut topic segmentation; graph-partitioning problem; intersentence lexical similarity; machine learning; segmentation task; subwords normalized cut; text abstract; text dotplotting inspired method; weighted undirected graph; Complexity theory; Computational linguistics; Dynamic programming; Machine learning; Speech recognition; Vocabulary; Weight measurement; Intelligent Information Processing; Machine Learning; Natural Language Processing; Normalized Cut; Topic Segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8883-4
Type :
conf
DOI :
10.1109/YCICT.2010.5713130
Filename :
5713130