Evidence for the strength of the relationship between Automatic Speech Recognition and Phoneme Alignment performance

Author

Baghai-Ravary, Ladan

Author_Institution

Phonetics Lab., Univ. of Oxford, Wellington, UK

fYear

2010

fDate

14-19 March 2010

Firstpage

5262

Lastpage

5265

Abstract

It might be naïvely assumed that the performance of an Automatic Speech Recognition (ASR) system, and that of an Automatic Speech-to-Phoneme Alignment (ASPA) system using the same acoustic-phonetic models, would be closely related. However, many researchers believe this relationship to be weak at best, but this belief has not previously been tested in an objective and quantitative manner. This paper quantifies the strength of the relationship using analysis of data without reference to manually defined alignment labels. By avoiding comparison with a set of reference labels, both the ASR and the ASPA systems can be considered equivalent, removing any bias due to any difference of “opinion” between the human labeller and the automatic system.

Keywords

data analysis; hidden Markov models; speech processing; speech recognition; acoustic-phonetic models; automatic speech recognition; phoneme alignment performance; Acoustic testing; Humans; Laboratories; Speech synthesis; System performance; System testing; HMMs; optimal performance; phoneme alignment

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5494977

Filename

5494977