Data selection for speech recognition

Author

Wu, Yi ; Zhang, Rong ; Rudnicky, Alexander

Author_Institution

Carnegie Mellon Univ., Pittsburgh

fYear

2007

fDate

9-13 Dec. 2007

Firstpage

562

Lastpage

565

Abstract

This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, in contrast to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
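
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way "choosing data uniformly according to the distribution of a target speech unit" might be approximated. It assumes a greedy rule that repeatedly adds the utterance which maximizes the entropy of the pooled unit counts (a flatter distribution has higher entropy). The function names, the greedy rule, and the toy corpus are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in nats) of the distribution given by raw counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def select_utterances(utterances, budget):
    """Greedily pick up to `budget` utterances whose pooled unit counts have high entropy.

    `utterances` maps a hypothetical utterance id to its list of units
    (for example phonemes). Returns the selected ids in pick order.
    This greedy rule is an assumption made for illustration only.
    """
    pooled = Counter()
    remaining = dict(utterances)
    selected = []
    while remaining and len(selected) < budget:
        # Score each candidate by the entropy the pool would have after adding it.
        best_id = max(
            sorted(remaining),
            key=lambda uid: entropy(pooled + Counter(remaining[uid])),
        )
        pooled.update(remaining.pop(best_id))
        selected.append(best_id)
    return selected


# Toy corpus: unit labels are made up and carry no linguistic meaning.
corpus = {
    "utt1": ["a", "a", "a", "a"],
    "utt2": ["a", "a", "b", "a"],
    "utt3": ["b", "c", "c", "b"],
    "utt4": ["d", "d", "d", "d"],
}
chosen = select_utterances(corpus, budget=2)
print(chosen)
```

On this toy corpus the sketch picks utt3 first (its two units are evenly balanced) and then utt2 (it adds a new unit while keeping the pooled counts closest to uniform). A real system would rescore many thousands of utterances per step, so an incremental entropy update would likely be needed to keep selection fast.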

Keywords

maximum entropy methods; speech recognition; data selection; transcribed speech; Automatic speech recognition; Broadcasting; Decoding; Entropy; Impedance; Linear discriminant analysis; Management training; Natural languages; Training data; acoustic modeling; maximum entropy

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on

Conference_Location

Kyoto

Print_ISBN

978-1-4244-1746-9

Electronic_ISBN

978-1-4244-1746-9

Type

conf

DOI

10.1109/ASRU.2007.4430173

Filename

4430173

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2770150