DocumentCode
2659953
Title
Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion
Author
Liu, Fei ; Liu, Feifan ; Liu, Yang
Author_Institution
Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
181
Lastpage
184
Abstract
In this paper, we tackle the problem of automatic keyword extraction in the meeting domain, a genre significantly different from written text. For the supervised framework, we propose a rich set of features beyond the typical TFIDF measures, such as sentence salience weight, lexical features, summary sentences, and speaker information. We also evaluate different candidate sampling approaches for better model training and testing. In addition, we introduce a bigram expansion module that aims to extract “entity bigrams” using Web resources. Using the ICSI meeting corpus, we demonstrate the effectiveness of the features and show that the supervised method and the bigram expansion module outperform the unsupervised TFIDF selection with POS (part-of-speech) filtering. Finally, we show that the approaches introduced in this paper perform well on the speech recognition output.
Keywords
natural language processing; speech recognition; ICSI meeting corpus; TFIDF measures; automatic keyword extraction; bigram expansion; lexical features; sentence salience weight; speech recognition output; summary sentences; supervised approach; written text; Computer science; Data mining; Decision making; Filtering; Frequency; Mutual information; Sampling methods; Speech recognition; Supervised learning; Testing; TFIDF; feature selection; keyword extraction; meeting transcripts
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location
Goa
Print_ISBN
978-1-4244-3471-8
Electronic_ISBN
978-1-4244-3472-5
Type
conf
DOI
10.1109/SLT.2008.4777870
Filename
4777870