Title :
Using hidden Markov models for topic segmentation of meeting transcripts
Author :
Sherman, Melissa ; Liu, Yang
Author_Institution :
Behavioral & Brain Sci., Univ. of Texas at Dallas, Dallas, TX
Abstract :
In this paper, we present a hidden Markov model (HMM) approach to segmenting meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate on the ICSI meeting corpus that an HMM outperforms LCSeg, a state-of-the-art lexical-chain-based method for topic segmentation. We evaluate the effect of language model (LM) order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, that using too many hidden states degrades topic segmentation performance, and that removing stop words from the transcripts does not improve segmentation performance.
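For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal Python sketch of the standard Pk and WindowDiff definitions. It is not the modified variant used in the paper, which the abstract does not specify. The boundary-list encoding, the function names `pk` and `window_diff`, and the default window size (half the average reference segment length) are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def _default_window(reference: Sequence[int]) -> int:
    """Assumed default: half the average reference segment length (at least 1)."""
    # A boundary flag on the final unit splits nothing, so it is ignored here.
    num_segments = sum(reference[:-1]) + 1
    return max(1, round(len(reference) / (2 * num_segments)))


def _check(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> None:
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    if not 0 < k < len(reference):
        raise ValueError("window size k must satisfy 0 < k < number of units")


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Standard Pk: fraction of windows where the two segmentations disagree
    on whether units i and i+k fall in the same segment.

    Each input is a list of 0/1 flags; flag i == 1 means a topic boundary
    follows unit i (hypothetical encoding chosen for this sketch).
    """
    if k is None:
        k = _default_window(reference)
    _check(reference, hypothesis, k)
    n = len(reference)
    errors = 0
    for i in range(n - k):
        ref_split = any(reference[i:i + k])
        hyp_split = any(hypothesis[i:i + k])
        errors += ref_split != hyp_split
    return errors / (n - k)


def window_diff(reference: Sequence[int], hypothesis: Sequence[int],
                k: Optional[int] = None) -> float:
    """Standard WindowDiff: fraction of windows where the two segmentations
    contain a different number of boundaries."""
    if k is None:
        k = _default_window(reference)
    _check(reference, hypothesis, k)
    n = len(reference)
    errors = 0
    for i in range(n - k):
        errors += sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
    return errors / (n - k)


# Toy example: 10 units, one true boundary after unit 4.
ref = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
near_miss = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # boundary placed one unit late
extra = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]       # correct boundary plus a spurious one
print(pk(ref, near_miss, k=2), window_diff(ref, near_miss, k=2))
print(pk(ref, extra, k=2), window_diff(ref, extra, k=2))
```

Both metrics are error rates, so lower is better. In the toy example the spurious extra boundary is penalized more heavily by WindowDiff than by Pk, because WindowDiff compares boundary counts per window rather than only their presence.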
Keywords :
hidden Markov models; information analysis; unsupervised learning; Pk metrics; language model order; lexical chain; stop words; text segment clustering; topic boundary information; topic segmentation performance; Broadcasting; Coherence; Computer science; Decision trees; Degradation; Feature extraction; Machine learning algorithms; Speech analysis; LCSeg; Meeting Transcript; Topic Segmentation
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
DOI :
10.1109/SLT.2008.4777871