• DocumentCode
    417189
  • Title
    Automatic indexing of key sentences for lecture archives using statistics of presumed discourse markers
  • Author
    Nanjo, Hiroaki; Kitade, Tasuku; Kawahara, Tatsuya
  • Author_Institution
    Sch. of Informatics, Kyoto Univ., Japan
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    This paper addresses automatic extraction of key sentences from lecture audio archives. The method exploits the characteristic expressions used in the initial utterances of sections, which are defined as discourse markers and derived in a totally unsupervised manner from word statistics. The statistics of these presumed discourse markers are then used to define the importance of sentences, and this measure is combined with the conventional tf-idf measure of content words. Experimental results on a large corpus of lectures confirm the effectiveness of the discourse-marker-based method and of its combination with the keyword-based method. The method is also shown to be robust against automatic speech recognition (ASR) errors, whereas sentence segmentation accuracy is more vital. We therefore also enhance segmentation by incorporating prosodic information.
  • Keywords
    audio signal processing; indexing; natural languages; speech recognition; statistical analysis; text analysis; ASR errors; automatic indexing; content words; discourse markers; key sentence extraction; lecture archives; lecture audio archives; prosodic information; sentence segmentation accuracy; word statistics; Acoustic testing; Automatic speech recognition; Data mining; Informatics; Machine assisted indexing; Natural languages; Robustness; Speech recognition; Statistics; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326019
  • Filename
    1326019