• DocumentCode
    2659953
  • Title

    Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion

  • Author

    Liu, Fei ; Liu, Feifan ; Liu, Yang

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    181
  • Lastpage
    184
  • Abstract
    In this paper, we tackle the problem of automatic keyword extraction in the meeting domain, a genre significantly different from written text. For the supervised framework, we proposed a rich set of features beyond the typical TFIDF measures, such as sentence salience weight, lexical features, summary sentences, and speaker information. We also evaluate different candidate sampling approaches for better model training and testing. In addition, we introduced a bigram expansion module which aims at extracting "entity bigrams" using Web resources. Using the ICSI meeting corpus, we demonstrate the effectiveness of the features and show that the supervised method and the bigram expansion module outperform the unsupervised TFIDF selection with POS (part-of-speech) filtering. Finally, we show the approaches introduced in this paper perform well on the speech recognition output.
  • Keywords
    natural language processing; speech recognition; ICSI meeting corpus; TFIDF measures; automatic keyword extraction; bigram expansion; lexical features; sentence salience weight; speech recognition output; summary sentences; supervised approach; written text; Computer science; Data mining; Decision making; Filtering; Frequency; Mutual information; Sampling methods; Speech recognition; Supervised learning; Testing; TFIDF; feature selection; keyword extraction; meeting transcripts
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
  • Conference_Location
    Goa
  • Print_ISBN
    978-1-4244-3471-8
  • Electronic_ISBN
    978-1-4244-3472-5
  • Type

    conf

  • DOI
    10.1109/SLT.2008.4777870
  • Filename
    4777870