Title :
Automatic Parliamentary Meeting Minute Generation Using Rhetorical Structure Modeling
Author :
Zhang, Justin Jian ; Fung, Pascale
Author_Institution :
Eng. Technol. Inst., Dongguan Univ. of Technol., Dongguan, China
Abstract :
In this paper, we propose a one-step rhetorical structure parsing, chunking, and extractive summarization approach that automatically generates meeting minutes from parliamentary speech using acoustic and lexical features. We investigate how to use lexical features extracted from imperfect ASR transcriptions, together with acoustic features extracted from the speech itself, to form extractive summaries with the structure of meeting minutes. Each business item in the minutes is modeled as a rhetorical chunk that consists of smaller rhetorical units. Principal Component Analysis (PCA) graphs of both acoustic and lexical features in meeting speech show clear self-clustering of speech utterances according to the underlying rhetorical state: for example, acoustic and lexical feature vectors from the question and answer or the motion of a parliamentary speech are grouped together. We then propose a Conditional Random Fields (CRF)-based approach that performs both rhetorical structure modeling and extractive summarization in one step, by chunking, parsing, and extracting salient utterances. Extracted salient utterances are grouped under the label of each rhetorical state, emulating meeting minutes, to yield summaries that are more easily understood by humans. We compare this approach with different machine learning methods. We show that our proposed CRF-based one-step minute generation system obtains the best summarization performance both in terms of ROUGE-L F-measure, at 74.5%, and by human evaluation, at 77.5% on average.
Keywords :
feature extraction; graph theory; learning (artificial intelligence); pattern clustering; principal component analysis; speech recognition; CRF-based approach; CRF-based one step minute generation system; PCA graphs; ROUGE-L F-measure; acoustic feature extraction; automatic parliamentary meeting minute generation; automatic speech recognition system; conditional random field based approach; extractive summarization approach; imperfect ASR transcriptions; lexical feature extraction; machine learning methods; one step rhetorical structure parsing approach; parliamentary speech; principal component analysis; rhetorical chunking approach; rhetorical structure modeling; salient utterance extraction; speech utterance self-clustering; Acoustics; Business; Feature extraction; Hidden Markov models; Minutes; Speech; Syntactics; Extractive speech summarization; meeting minutes generation; rhetorical structure modeling;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2215592