Title :
Subsequence-Based Text Segmentation and Labeling
Author :
Chen, Xi ; Chen, Shihong
Author_Institution :
Comput. Sch., Wuhan Univ., Wuhan
Abstract :
Text segmentation is important for many natural language processing tasks, such as passage retrieval and summarization. This paper uses a suffix tree model for text representation and introduces a new measure, subsequence-based coherence, which represents the coherence between sentences and makes use of word order information. It also introduces a text segmentation algorithm, subsequence-based maximum cut, and a passage labeling approach based on subsequences. Segmentation results on educational text show that our method outperforms some existing methods, and the passage labeling results are encouraging.
Keywords :
natural language processing; text analysis; trees (mathematics); natural language processing task; passage labeling approach; subsequence-based coherence; subsequence-based maximum cut; subsequence-based text segmentation; suffix tree model; text labeling; text representation; word order information; Books; Coherence; Computer science; Computer science education; Educational technology; Information retrieval; Intelligent systems; Labeling; Natural language processing; Supervised learning; maximum cut; passage labeling; sentence coherence; subsequence; text segmentation;
Conference_Title :
Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-1-4244-3581-4
DOI :
10.1109/ETCS.2009.138