Regional Information Center for Science and Technology - Using multiple versions of speech input in phone recognition

DocumentCode :

1691437

Title :

Using multiple versions of speech input in phone recognition

Author :

Liberman, Mark ; Yuan, Jiahong ; Stolcke, Andreas ; Wang, Wen ; Mitra, Ved

Author_Institution :

Univ. of Pennsylvania, Philadelphia, PA, USA

fYear :

2013

Firstpage :

7591

Lastpage :

7595

Abstract :

This study investigates the use of multiple versions of the same speech unit in automatic phone recognition. Two methods were applied to combine multiple utterance versions in decoding: cross forced-alignment and n-best ROVER. The phone error rate was reduced from 15% to 2% on isolated words and from 33% to 19% on TIMIT sentences. The largest reduction came from adding the second version, with smaller gains from each additional version. Depending on the language model weight, it might be better to use the language model only in n-best generation and omit it when scoring the hypotheses passed to the combination methods. The effectiveness of n-best ROVER may be enhanced by lowering the language model weight.

Keywords :

acoustic signal processing; decoding; error statistics; speech coding; speech recognition; telephone sets; TIMIT sentences; automatic phone recognition; cross forced-alignment; decoding; isolated words; language model weight; n-best ROVER; n-best generation; phone error rate; speech input; speech unit; utterance versions; Acoustics; Computational modeling; Decoding; Error analysis; Hidden Markov models; Speech; Speech recognition; Forced alignment; N-best ROVER; multiple utterance versions; phone recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639139

Filename :

6639139

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1691437