• DocumentCode
    3785991
  • Title
    Automatic recognition of spontaneous speech for access to multilingual oral history archives
  • Author
    W. Byrne;D. Doermann;M. Franz;S. Gustman;J. Hajic;D. Oard;M. Picheny;J. Psutka;B. Ramabhadran;D. Soergel;T. Ward;Wei-Jing Zhu
  • Author_Institution
    Speech Process. & the Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
  • Volume
    12
  • Issue
    4
  • fYear
    2004
  • Firstpage
    420
  • Lastpage
    435
  • Abstract
    Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
  • Keywords
    "Automatic speech recognition","History","Broadcasting","Speech recognition","Manuals","Speech analysis","Humans","Error analysis","Senior citizens","Vocabulary"
  • Journal_Title
    IEEE Transactions on Speech and Audio Processing
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2004.828702
  • Filename
    1306515