DocumentCode :
1691437
Title :
Using multiple versions of speech input in phone recognition
Author :
Liberman, Mark ; Yuan, Jiahong ; Stolcke, Andreas ; Wang, Wen ; Mitra, Ved
Author_Institution :
Univ. of Pennsylvania, Philadelphia, PA, USA
fYear :
2013
Firstpage :
7591
Lastpage :
7595
Abstract :
This study investigates the use of multiple versions of the same speech unit in automatic phone recognition. Two methods were applied to combine multiple utterance versions in decoding: cross forced-alignment and n-best ROVER. The phone error rate was reduced from 15% to 2% on isolated words and from 33% to 19% on TIMIT sentences. The error rate was reduced the most when the second version was added, and less so as each additional version was added. Depending on the language model weight, it might be better to use the language model only in n-best generation, but omit it in scoring the hypotheses applied to the combination methods. N-best ROVER effectiveness may be enhanced by lowering the language model weight.
Keywords :
acoustic signal processing; decoding; error statistics; speech coding; speech recognition; telephone sets; TIMIT sentences; automatic phone recognition; cross forced-alignment; decoding; isolated words; language model weight; n-best ROVER; n-best generation; phone error rate; speech input; speech unit; utterance versions; Acoustics; Computational modeling; Decoding; Error analysis; Hidden Markov models; Speech; Speech recognition; Forced alignment; N-best ROVER; multiple utterance versions; phone recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639139
Filename :
6639139