Title :
Framework for cross-language automatic phonetic segmentation
Author :
Ogbureke, Kalu U. ; Carson-Berndsen, Julie
Author_Institution :
Sch. of Comput. Sci. & Inf., Univ. Coll. Dublin, Dublin, Ireland
Abstract :
Annotation of large multilingual corpora remains a challenge for the data-driven approach to speech research, especially for under-resourced languages. This paper presents cross-language automatic phonetic segmentation using Hidden Markov Models (HMMs). The underlying idea is to base segmentation on articulation (manner and place) so that the resulting models are broad enough to apply across languages. A test on the Appen Spanish speech corpus gives a phone recognition accuracy of 61.15% when bootstrapped with acoustic models trained on the TIMIT corpus, compared with a baseline of 54.63% for flat-start initialization of the monophone models.
Keywords :
hidden Markov models; natural language processing; speech processing; Appen Spanish speech corpus; annotation; cross-language automatic phonetic segmentation; multilingual corpora; phone recognition; speech research; Acoustic testing; Automatic speech recognition; Computer science; Loudspeakers; Maximum likelihood estimation; Natural languages; Noise robustness; Speech recognition; Speech synthesis; Automatic phonetic segmentation; articulatory features
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5494978