Improving ASR by integrating lecture audio and slides

Author

Miranda, Joao ; Neto, Joao Paulo ; Black, Alan W.

Author_Institution

INESC-ID / Inst. Super. Tecnico, Lisbon, Portugal

fYear

2013

Firstpage

8131

Lastpage

8135

Abstract

We propose a method that combines the audio of a lecture with its supporting slides to improve automatic speech recognition (ASR) performance. We view the lecture speech and the slides as parallel streams that contain redundant information. We first align the speech with the slide words and then use that alignment to bias the recognizer's language model towards the words in the slides, thus correcting errors in the ASR transcripts. We obtain a 5.9% relative word error rate (WER) improvement on a lecture test set compared to a speech-recognition-only system.
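For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a word error rate and a relative WER improvement are commonly computed. It is not taken from the paper: the function names and the toy transcripts are hypothetical, and the paper's own scoring setup may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Toy transcripts (hypothetical, for illustration only).
reference = "the hidden markov model"
baseline = word_error_rate(reference, "the hidden mark of model")
improved = word_error_rate(reference, "the hidden markov models")
print(baseline, improved, relative_improvement(baseline, improved))
```

A relative improvement is a fraction of the baseline error, so a 5.9% relative gain means the combined system removes 5.9% of the baseline system's word errors, not 5.9 absolute WER points.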

Keywords

speech recognition; ASR transcript; automatic speech recognition; integrating lecture audio; language model; lecture speech; lecture test set; parallel stream slide; relative WER improvement; Computational modeling; Computer science; Data models; Lattices; Multimedia communication; Speech; Speech recognition; Lecture; Slides; Speech Recognition; System Combination;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639249

Filename

6639249

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1694472