Author_Institution :
Dept. of Electr. & Comput. Eng., Ohio State Univ., Columbus, OH
Abstract :
Articulatory feature modeling in automatic speech recognition (ASR), while not (yet) mainstream, has received significant attention in recent research ((S. Chang, et al., 2001), (K. Kirchhoff, May 2000), (S. Stuker, et al., 2003), (F. Metze and A. Waibel, 2002), inter alia). One study in particular (S. Chang, et al., 2001) provided evidence that hierarchical articulatory feature models can substantially outperform their non-hierarchical counterparts. In such a system, the probability of one articulatory feature is conditioned on another; for example, the classifier for place of articulation may depend on the manner of articulation. In this work, we extend the studies in (S. Chang, et al., 2001) by relaxing that study's assumption of perfect recognition of the conditioning class. The gains over non-hierarchical classification are greatly reduced; our analysis shows that this is in part because the errors in different acoustic feature streams are in fact correlated. We conclude by observing that joint acoustic feature modeling, rather than conditional modeling, may provide better gains.
Keywords :
feature extraction; speech processing; speech recognition; acoustic feature streams; automatic speech recognition; hierarchical articulatory feature detectors; nonhierarchical classification; Acoustic noise; Automatic speech recognition; Computer science; Computer vision; Detectors; Humans; Noise robustness; Speech enhancement; Speech recognition; Vocabulary;
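The hierarchical conditioning described in the abstract can be sketched with the chain rule, P(place, manner) = P(place | manner) P(manner). The sketch below is illustrative only: the probability tables and class inventories are made up, not taken from the paper, and a real system would obtain these distributions from trained acoustic classifiers.

```python
# Toy sketch (not from the paper): hierarchical articulatory feature
# modeling, where the place-of-articulation classifier is conditioned
# on the manner of articulation. All probabilities are invented.

# P(manner): marginal distribution over manner-of-articulation classes.
p_manner = {"stop": 0.5, "fricative": 0.3, "nasal": 0.2}

# P(place | manner): conditional place-of-articulation distributions.
p_place_given_manner = {
    "stop":      {"labial": 0.4, "alveolar": 0.4, "velar": 0.2},
    "fricative": {"labial": 0.3, "alveolar": 0.6, "velar": 0.1},
    "nasal":     {"labial": 0.5, "alveolar": 0.4, "velar": 0.1},
}

def joint(place, manner):
    """P(place, manner) via the hierarchical (conditional) factorization."""
    return p_place_given_manner[manner][place] * p_manner[manner]

def marginal_place(place):
    """P(place), summing the joint over all manner classes."""
    return sum(joint(place, m) for m in p_manner)

# If the conditioning manner class is misrecognized (e.g. a stop decoded
# as a fricative), the wrong conditional row is consulted, so errors in
# the two feature streams become correlated -- the effect the abstract
# analyzes once the perfect-conditioning assumption is dropped.
print(joint("labial", "stop"))            # 0.4 * 0.5 = 0.2
print(round(marginal_place("labial"), 2))  # 0.2 + 0.09 + 0.1 = 0.39
```

A joint model, by contrast, would score P(place, manner) with a single classifier over the product space of classes, avoiding the dependence on a separately decoded conditioning stream.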