DocumentCode
2770150
Title
Data selection for speech recognition
Author
Wu, Yi ; Zhang, Rong ; Rudnicky, Alexander
Author_Institution
Carnegie Mellon Univ., Pittsburgh
fYear
2007
fDate
9-13 Dec. 2007
Firstpage
562
Lastpage
565
Abstract
This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, in contrast to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
Keywords
maximum entropy methods; speech recognition; data selection; transcribed speech; Automatic speech recognition; Broadcasting; Decoding; Entropy; Impedance; Linear discriminant analysis; Management training; Natural languages; Training data; acoustic modeling; maximum entropy
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location
Kyoto
Print_ISBN
978-1-4244-1746-9
Electronic_ISBN
978-1-4244-1746-9
Type
conf
DOI
10.1109/ASRU.2007.4430173
Filename
4430173