Title :
Event tracking and text segmentation via hidden Markov models
Author :
Yamron, J.P. ; Carp, I. ; Gillick, L. ; Lowe, S. ; van Mulbregt, P.
Author_Institution :
Dragon Syst. Inc., Newton, MA, USA
Abstract :
We present an approach to the problems of text segmentation and event tracking that makes use of hidden Markov modeling and clustering techniques. In essence, we regard a stream of unsegmented text (as might be generated from automatic transcription of broadcast news, for example) as being composed of a series of “topics” in roughly the same way that a stream of speech consists of a series of phonemes. A story on a particular topic can then be viewed as analogous to an utterance of a particular phoneme, and a stream of text can be decoded into a series of topics in the same way that a speech recognizer decodes a stream of speech into a series of phonemes. We identify the boundaries of these topics with story boundaries. Because a segmenter operating in this way assigns a topic label to each story, it is possible (with some modifications) to use the same engine to “track” or find successive stories on an event of special interest. We have applied these ideas in some recent experiments on the Topic Detection and Tracking Pilot Study Corpus. Our preliminary results are promising and suggest that this general methodology is likely to be quite successful both at recovering story boundaries and at identifying instances of stories about the same event over time.
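The decoding idea in the abstract (topics as hidden states, story boundaries wherever the decoded topic changes) can be illustrated with a toy Viterbi decoder. This is a minimal sketch under stated assumptions, not the authors' system: it assumes one add-alpha smoothed unigram language model per topic, decodes word by word rather than by sentence, and uses a single fixed topic-switch probability. The helper names (`train_unigram`, `viterbi_topics`, `segments`, `track`) and the `alpha` and `switch_prob` parameters are hypothetical. The abstract says tracking needs "some modifications" that it does not spell out, so `track` simply keeps the segments decoded as a target topic, which is an illustrative simplification.

```python
import math
from collections import Counter

def train_unigram(docs, vocab, alpha=0.1):
    """Hypothetical helper: add-alpha smoothed unigram log-probabilities
    over a fixed vocabulary, estimated from a list of token lists."""
    counts = Counter(w for doc in docs for w in doc)
    denom = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / denom) for w in vocab}

def viterbi_topics(tokens, models, switch_prob=0.01):
    """Most likely topic per token. `models` maps topic -> {word: log P(word|topic)}.
    Assumed transitions: stay with 1 - switch_prob, otherwise move uniformly
    to one of the other topics (needs at least two topics)."""
    topics = list(models)
    k = len(topics)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1))
    # Uniform prior over the starting topic.
    score = {t: -math.log(k) + models[t][tokens[0]] for t in topics}
    back = []  # back[i][t] = best predecessor of topic t at token i + 1
    for w in tokens[1:]:
        new_score, ptr = {}, {}
        for t in topics:
            prev = max(topics, key=lambda s: score[s] + (stay if s == t else move))
            new_score[t] = score[prev] + (stay if prev == t else move) + models[t][w]
            ptr[t] = prev
        score = new_score
        back.append(ptr)
    # Backtrace from the best final topic.
    path = [max(topics, key=lambda t: score[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def segments(path):
    """Collapse a per-token topic path into (start, end, topic) runs; end is exclusive.
    Each run boundary is a hypothesized story boundary."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs

def track(path, target):
    """Simplified tracking: spans whose decoded topic is the target topic."""
    return [(s, e) for s, e, t in segments(path) if t == target]

if __name__ == "__main__":
    vocab = {"stocks", "market", "shares", "goal", "match", "team"}
    models = {
        "finance": train_unigram([["stocks", "market", "shares", "market"]], vocab),
        "sports": train_unigram([["goal", "match", "team", "goal"]], vocab),
    }
    stream = ["stocks", "shares", "market", "stocks", "goal", "team", "match", "goal"]
    path = viterbi_topics(stream, models)
    print(segments(path))         # [(0, 4, 'finance'), (4, 8, 'sports')]
    print(track(path, "sports"))  # [(4, 8)]
```

The switch probability plays the role of a boundary penalty: lowering it makes the decoder demand more lexical evidence before it hypothesizes a new story, trading missed boundaries against spurious ones.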
Keywords :
hidden Markov models; string matching; word processing; Topic Detection and Tracking Pilot Study Corpus; automatic transcription; broadcast news; clustering techniques; event tracking; hidden Markov modeling; phoneme; special interest; speech recognizer; story boundaries; text segmentation; topic label; unsegmented text; Automatic speech recognition; Broadcasting; Decoding; Engines; Event detection; Hidden Markov models; Speech recognition; Streaming media; Text recognition
Conference_Title :
Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding
Conference_Location :
Santa Barbara, CA
Print_ISBN :
0-7803-3698-4
DOI :
10.1109/ASRU.1997.659131