Title :
Voiced/unvoiced pattern-based duration modeling for language identification
Author :
Yin, Bo ; Ambikairajah, Eliathamby ; Chen, Fang
Abstract :
Most existing duration modeling approaches rely on a phone recognizer and require a manually annotated corpus to train the segmentation models, which is usually costly and time-consuming. In this paper, a novel duration modeling approach is proposed that requires neither a phone recognizer nor annotated training data and enables fast computation for language identification. In this approach, segmentation is performed using articulatory features such as voicing status. A pair of connected unvoiced and voiced segments is taken as the unit; the duration of each segment is normalized within each utterance and then quantized into 20 discrete ranges. The quantized durations of the units are treated as symbol sequences and modeled with n-gram models to capture the temporal pattern, which is hypothesized to differ across languages. Experiments on the NIST LRE 2005 tasks show a relative 19.7% EER improvement when the proposed duration modeling-based system is added to a fusion system containing two GMM-UBM acoustic systems using MFCC and pitch+intensity features.
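A minimal sketch of the pipeline described in the abstract (not the authors' implementation): it turns a frame-level voicing decision into quantized unvoiced/voiced duration symbols suitable for n-gram modeling. The normalization by the per-utterance maximum, the function name, and the variable names are assumptions for illustration only.

    # Illustrative sketch (assumptions noted above): convert a frame-level
    # voicing decision into unvoiced+voiced duration symbols.
    from itertools import groupby

    NUM_BINS = 20  # the paper quantizes normalized durations into 20 ranges

    def uv_duration_symbols(voiced_flags):
        """voiced_flags: iterable of booleans, one per frame (True = voiced)."""
        # Run-length encode the voicing decision into (is_voiced, length) segments.
        segments = [(v, sum(1 for _ in run)) for v, run in groupby(voiced_flags)]

        # Pair each unvoiced segment with the voiced segment that follows it.
        pairs = [(segments[i][1], segments[i + 1][1])
                 for i in range(len(segments) - 1)
                 if not segments[i][0] and segments[i + 1][0]]
        if not pairs:
            return []

        # Per-utterance normalization (here: by the maximum duration, an
        # assumption), then quantization into NUM_BINS discrete ranges.
        max_u = max(u for u, _ in pairs)
        max_v = max(v for _, v in pairs)
        symbols = []
        for u, v in pairs:
            qu = min(int(NUM_BINS * u / max_u), NUM_BINS - 1)
            qv = min(int(NUM_BINS * v / max_v), NUM_BINS - 1)
            symbols.append((qu, qv))  # one symbol per unvoiced+voiced unit
        return symbols  # symbol sequence to be scored by per-language n-gram models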
Keywords :
natural language processing; speech recognition; GMM-UBM; MFCC; acoustic systems; duration modeling approaches; duration modeling-based system; fusion system; language identification; phone recognizer; pitch+intensity features; segmentation models; unvoiced pattern; voicing status; Australia; Data mining; Loudspeakers; Mel frequency cepstral coefficient; NIST; Natural languages; Pattern recognition; Speech recognition; Target recognition; Training data; articulatory features; duration modeling; language identification; quantization;
Conference_Title :
2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960590