DocumentCode
3785991
Title
Automatic recognition of spontaneous speech for access to multilingual oral history archives
Author
W. Byrne;D. Doermann;M. Franz;S. Gustman;J. Hajic;D. Oard;M. Picheny;J. Psutka;B. Ramabhadran;D. Soergel;T. Ward;Wei-Jing Zhu
Author_Institution
Speech Process. & the Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
Volume
12
Issue
4
fYear
2004
Firstpage
420
Lastpage
435
Abstract
Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
Keywords
"Automatic speech recognition","History","Broadcasting","Speech recognition","Manuals","Speech analysis","Humans","Error analysis","Senior citizens","Vocabulary"
Journal_Title
IEEE Transactions on Speech and Audio Processing
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/TSA.2004.828702
Filename
1306515