Title :
Kernel topic segmentation for informal multi-party meetings and performance degradation caused by insufficient lexicon
Author_Institution :
Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
Abstract :
We propose a domain-independent topic segmentation algorithm for free-form multi-party meeting recordings. Its advantage is that it requires neither topical nor lexical knowledge, both of which are difficult to adapt to the target meeting before speech recognition and topic segmentation. Given an errorful phoneme sequence produced by a continuous phoneme recognizer, the algorithm exhaustively analyzes the occurrence patterns of phoneme subsequences and partitions the sequence into segments with coherent patterns. An empirical study on the ICSI Meeting Corpus indicates that the algorithm performs comparably to lexical-cohesion-based text segmenters applied to human transcripts. Furthermore, the performance of text segmenters applied to large-vocabulary continuous speech recognition (LVCSR) output decreases significantly when keywords are missing from the lexicon. This suggests that, for obtaining topical structure, the phoneme sequence segmenter could be more robust than text segmenters combined with LVCSR.
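The idea of comparing phoneme subsequence patterns across a recording can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a bounded-length n-gram (spectrum) kernel and a simple sliding-window comparison as stand-ins, and all function names and parameters (`ngram_counts`, `window`, `n_max`, and so on) are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import sqrt


def ngram_counts(phonemes, n_max=3):
    """Count every contiguous phoneme n-gram of length 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return counts


def spectrum_kernel(a, b):
    """Inner product of two n-gram count vectors (assumed kernel, not the paper's)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(count * b[gram] for gram, count in a.items())


def cosine(a, b):
    """Kernel value normalized to [0, 1]; 0.0 if either vector is empty."""
    denom = sqrt(spectrum_kernel(a, a) * spectrum_kernel(b, b))
    return spectrum_kernel(a, b) / denom if denom else 0.0


def boundary_scores(phonemes, window=20, n_max=3):
    """Similarity of the windows to the left and right of each candidate position."""
    scores = {}
    for t in range(window, len(phonemes) - window + 1):
        left = ngram_counts(phonemes[t - window:t], n_max)
        right = ngram_counts(phonemes[t:t + window], n_max)
        scores[t] = cosine(left, right)
    return scores


def best_boundary(phonemes, window=20, n_max=3):
    """Position where adjacent windows are least similar, or None if too short."""
    scores = boundary_scores(phonemes, window, n_max)
    return min(scores, key=scores.get) if scores else None


# Toy input: two artificial "topics" with disjoint phoneme patterns.
toy_sequence = ["k", "a", "t"] * 10 + ["d", "o", "g"] * 10
toy_boundary = best_boundary(toy_sequence, window=15)
```

On the toy sequence, the least similar pair of windows meets at the point where the repeated pattern changes. Real phoneme recognizer output is noisy, so a practical segmenter would need smoothing and a principled way to choose the number of segments, neither of which this sketch attempts.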
Keywords :
speech recognition; kernel topic segmentation; informal multiparty meetings; insufficient lexicon; lexical knowledge; multiparty meeting recordings; performance degradation; text segmenters; topical knowledge; topic segmentation; kernel method; meeting summarization; string kernel; sub-word recognition
Conference_Title :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
DOI :
10.1109/SLT.2010.5700891