DocumentCode
16425
Title
Rapid Language Identification
Author
Van Segbroeck, Maarten ; Travadi, Ruchir ; Narayanan, Shrikanth S.
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Volume
23
Issue
7
fYear
2015
fDate
July 2015
Firstpage
1118
Lastpage
1129
Abstract
A critical challenge in automatic language identification (LID) is achieving accurate performance rapidly, using the shortest possible speech segment. The accuracy of spoken language identification is highly sensitive to the duration of speech and is bounded by the amount of information available. The proposed approach for rapid language identification transforms the utterances to a low-dimensional i-vector representation upon which language classification methods are applied. In order to meet the challenges involved in rapidly making reliable decisions about the spoken language, a highly accurate and computationally efficient framework of i-vector extraction is proposed. The LID framework integrates the approach of universal background model (UBM)-fused total variability modeling. UBM-fused modeling yields the estimation of a more discriminant, single i-vector space. It is therefore also a computationally more efficient alternative to system-level fusion. A further reduction in equal error rate is achieved by training the i-vector model on long-duration speech utterances and by deploying a robust feature extraction scheme that aims to capture the relevant language cues under various acoustic conditions. Evaluation results on the DARPA RATS data corpus suggest the potential of performing successful automated language identification on speech segments of one second or even shorter duration.
Keywords
acoustic signal processing; natural language processing; signal representation; speech processing; DARPA RATS data corpus; LID framework; UBM-fused total variability modeling; acoustic conditions; automatic rapid language identification; decision making; equal error rate reduction; i-vector extraction; i-vector model training; language cues; long-duration speech utterances; low-dimensional i-vector representation; robust feature extraction; speech segment; spoken language identification; universal background model; Acoustics; Computational modeling; Feature extraction; Robustness; Speech; Speech processing; Training; I-vector; noise robustness; rapid language identification; short-duration speech; total variability modeling; universal background model (UBM) fusion;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2419978
Filename
7080944