• DocumentCode
    2330260
  • Title

    Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features

  • Author

    Chen, Yun-nung ; Huang, Yu ; Kong, Sheng-Yi ; Lee, Lin-shan

  • Author_Institution
    Grad. Inst. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2010
  • fDate
    12-15 Dec. 2010
  • Firstpage
    265
  • Lastpage
    270
  • Abstract
    This paper proposes a set of approaches for automatically extracting key terms from spoken course lectures, drawing on audio signals, ASR transcriptions, and slides. We divide key terms into two types, key phrases and keywords, and develop a different approach for each, applied in that order. Key phrases are extracted using right/left branching entropy; keywords are extracted by learning from three sets of features: prosodic features, lexical features, and semantic features derived from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include one unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and a neural network). Encouraging preliminary results were obtained on a corpus of course lectures, and all of the proposed approaches and feature sets were found to be useful.
  • Keywords
    audio signal processing; entropy; learning (artificial intelligence); neural nets; probability; speech recognition; ASR transcriptions; AdaBoost; audio signals; automatic key term extraction; branching entropy; key phrase extraction; keyword extraction; lexical features; neural network; probabilistic latent semantic analysis; prosodic features; semantic features; spoken course lectures; unsupervised method; K-means; PAT tree; Probabilistic Latent Semantic Analysis (PLSA); course lectures; machine learning; prosody
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2010 IEEE
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-7904-7
  • Electronic_ISBN
    978-1-4244-7902-3
  • Type

    conf

  • DOI
    10.1109/SLT.2010.5700862
  • Filename
    5700862
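
The abstract names right/left branching entropy as the basis for key phrase extraction but does not spell out the computation. As a rough illustration only, the sketch below uses one common textbook definition: the entropy of the distribution of tokens that immediately follow (right) or precede (left) a candidate token sequence. The function name `branching_entropy`, the toy corpus, and the choice of log base 2 are all assumptions made for this example; the paper's exact formulation, thresholds, and data structures (the keywords mention a PAT tree) may differ.

```python
import math
from collections import Counter


def branching_entropy(corpus, pattern, direction="right"):
    """Entropy (in bits) of the tokens adjacent to occurrences of `pattern`.

    corpus    -- list of tokenised sentences (lists of strings); hypothetical input
    pattern   -- sequence of tokens forming the candidate phrase
    direction -- "right" looks at the following token, "left" at the preceding one
    """
    pattern = tuple(pattern)
    n = len(pattern)
    neighbours = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            if tuple(sentence[i:i + n]) != pattern:
                continue
            # Index of the neighbouring token on the requested side.
            j = i + n if direction == "right" else i - 1
            if 0 <= j < len(sentence):
                neighbours[sentence[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    # Shannon entropy of the empirical neighbour distribution.
    return sum(-(c / total) * math.log2(c / total) for c in neighbours.values())


# Toy corpus, invented for illustration (not data from the paper).
corpus = [
    ["hidden", "markov", "model", "is"],
    ["hidden", "markov", "model", "can"],
    ["a", "hidden", "markov", "model", "has"],
    ["the", "hidden", "markov", "model", "works"],
]

inside = branching_entropy(corpus, ["hidden"], "right")
boundary = branching_entropy(corpus, ["hidden", "markov", "model"], "right")
left_edge = branching_entropy(corpus, ["hidden"], "left")
```

In this toy corpus the right branching entropy stays low inside the phrase ("hidden" is always followed by "markov") and rises after "hidden markov model", where many different tokens follow. A rise of this kind on both sides is the usual signal that a candidate sequence forms a complete phrase.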