DocumentCode :
16425
Title :
Rapid Language Identification
Author :
Van Segbroeck, Maarten ; Travadi, Ruchir ; Narayanan, Shrikanth S.
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Volume :
23
Issue :
7
Year :
2015
Date :
July 2015
Firstpage :
1118
Lastpage :
1129
Abstract :
A critical challenge in automatic language identification (LID) is achieving accurate performance rapidly and with the shortest possible speech segment. The accuracy with which the spoken language is identified is highly sensitive to the duration of speech and is bounded by the amount of information available. The proposed approach for rapid language identification transforms the utterances into a low-dimensional i-vector representation, upon which language classification methods are applied. To meet the challenges involved in rapidly making reliable decisions about the spoken language, a highly accurate and computationally efficient framework for i-vector extraction is proposed. The LID framework integrates universal background model (UBM)-fused total variability modeling. UBM-fused modeling yields the estimation of a more discriminant, single i-vector space, which also makes it a computationally more efficient alternative to system-level fusion. A further reduction in equal error rate is achieved by training the i-vector model on long-duration speech utterances and by deploying a robust feature extraction scheme that aims to capture the relevant language cues under various acoustic conditions. Evaluation results on the DARPA RATS data corpus suggest the potential of performing successful automated language identification at the level of one second of speech or even shorter durations.
Keywords :
acoustic signal processing; natural language processing; signal representation; speech processing; DARPA RATS data corpus; LID framework; UBM-fused total variability modeling; acoustic conditions; automatic rapid language identification; decision making; equal error rate reduction; i-vector extraction; i-vector model training; language cues; long-duration speech utterances; low-dimensional i-vector representation; robust feature extraction; speech segment; spoken language identification; universal background model; Acoustics; Computational modeling; Feature extraction; Robustness; Speech; Speech processing; Training; I-vector; noise robustness; rapid language identification; short-duration speech; total variability modeling; universal background model (UBM) fusion;
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2419978
Filename :
7080944