Automatic recognition of spontaneous speech for access to multilingual oral history archives

Author

W. Byrne; D. Doermann; M. Franz; S. Gustman; J. Hajic; D. Oard; M. Picheny; J. Psutka; B. Ramabhadran; D. Soergel; T. Ward; Wei-Jing Zhu

Author_Institution

Speech Process. & the Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA

Volume

12

Issue

4

fYear

2004

Firstpage

420

Lastpage

435

Abstract

Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
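For readers unfamiliar with the word error rate (WER) figure quoted above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or text normalization actually used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is a plain whitespace split, which is
    an assumption and not the normalization used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a five-word reference.
    print(word_error_rate("a b c d e", "a x c e"))
```

Under this definition, a WER near 40% means roughly two word errors for every five reference words. Because insertions are counted, WER can exceed 100% for very poor hypotheses.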

Keywords

"Automatic speech recognition","History","Broadcasting","Speech recognition","Manuals","Speech analysis","Humans","Error analysis","Senior citizens","Vocabulary"

Journal_Title

IEEE Transactions on Speech and Audio Processing

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/TSA.2004.828702

Filename

1306515