Title :
Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion
Author :
Liu, Fei ; Liu, Feifan ; Liu, Yang
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX
Abstract :
In this paper, we tackle the problem of automatic keyword extraction in the meeting domain, a genre significantly different from written text. For the supervised framework, we propose a rich set of features beyond the typical TFIDF measures, such as sentence salience weight, lexical features, summary sentences, and speaker information. We also evaluate different candidate sampling approaches for better model training and testing. In addition, we introduce a bigram expansion module that aims to extract "entity bigrams" using Web resources. Using the ICSI meeting corpus, we demonstrate the effectiveness of the features and show that the supervised method and the bigram expansion module outperform the unsupervised TFIDF selection with POS (part-of-speech) filtering. Finally, we show that the approaches introduced in this paper perform well on speech recognition output.
Keywords :
natural language processing; speech recognition; ICSI meeting corpus; TFIDF measures; automatic keyword extraction; bigram expansion; lexical features; sentence salience weight; speech recognition output; summary sentences; supervised approach; written text; Computer science; Data mining; Decision making; Filtering; Frequency; Mutual information; Sampling methods; Speech recognition; Supervised learning; Testing; TFIDF; feature selection; keyword extraction; meeting transcripts
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
DOI :
10.1109/SLT.2008.4777870