Title :
A factored conditional random field model for articulatory feature forced transcription
Author :
Prabhavalkar, Rohit ; Fosler-Lussier, Eric ; Livescu, Karen
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
We investigate joint models of articulatory features and apply these models to the problem of automatically generating articulatory transcriptions of spoken utterances given their word transcriptions. The task is motivated by the need for larger amounts of labeled articulatory data for both speech recognition and linguistics research; such data are costly and difficult to obtain through manual transcription or physical measurement. Unlike in phonetic transcription, in our task it is important to account for the fact that the articulatory features can desynchronize. We consider factored models of the articulatory state space with an explicit model of articulator asynchrony. We compare two types of graphical models: a dynamic Bayesian network (DBN), based on previously proposed models, and a conditional random field (CRF), which we develop here. We demonstrate how task-specific constraints can be leveraged to allow for efficient exact inference in the CRF. On the transcription task, the CRF outperforms the DBN, with relative improvements of 2.2% to 10.0%.
Keywords :
belief networks; random processes; speech recognition; articulator asynchrony explicit model; articulatory feature forced transcription; articulatory feature joint models; dynamic Bayesian network; factored conditional random field model; linguistics research; phonetic transcription; spoken utterances; word transcriptions; Acoustics; Adaptation models; Equations; Hidden Markov models; Mathematical model; Tongue; Training
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
DOI :
10.1109/ASRU.2011.6163909