DocumentCode :
417263
Title :
Studies in massively speaker-specific speech recognition
Author :
Shi, Yu ; Chang, Eric
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
Over the past several years, the primary focus of the speech-recognition research community has been speaker-independent speech recognition, with an emphasis on working with databases containing larger and larger numbers of speakers. For example, the recent EARS program, sponsored by DARPA, calls for recordings of thousands of speakers. We are interested instead in making a speech interface work well for one particular individual, and we propose using massive amounts of speaker-specific training data recorded in daily life. We call this massively speaker-specific recognition (MSSR). As a preliminary study, we leverage the large corpus available from our speech-synthesis work to examine the benefit of MSSR from the acoustic-modeling perspective alone. Initial results show that, by shifting the focus to MSSR, word error rates drop significantly. MSSR also outperforms speaker-adaptive speech recognition systems, since its model parameters can be tuned to suit one particular individual.
Keywords :
error statistics; learning (artificial intelligence); natural language interfaces; speech recognition; speech-based user interfaces; massively speaker-specific recognition; massively speaker-specific speech recognition; speaker-adaptive speech recognition; speaker-independent speech recognition; speech interface; speech synthesis; training data; word error rates; Asia; Databases; Ear; Error analysis; Maximum likelihood linear regression; Mobile handsets; Software systems; Speech recognition; Training data
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326113
Filename :
1326113