Rapid Language Identification

Author

Van Segbroeck, Maarten ; Travadi, Ruchir ; Narayanan, Shrikanth S.

Author_Institution

Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA

Volume

23

Issue

7

fYear

2015

fDate

Jul-15

Firstpage

1118

Lastpage

1129

Abstract

A critical challenge in automatic language identification (LID) is to achieve accurate performance rapidly from the shortest possible speech segment. The accuracy with which the spoken language is identified is highly sensitive to the duration of speech and is bounded by the amount of information available. The proposed approach for rapid language identification transforms utterances into a low-dimensional i-vector representation, to which language classification methods are applied. To meet the challenges of making rapid, reliable decisions about the spoken language, a highly accurate and computationally efficient framework for i-vector extraction is proposed. The LID framework integrates universal background model (UBM)-fused total variability modeling. UBM-fused modeling yields the estimation of a more discriminant, single i-vector space, which also makes it a computationally more efficient alternative to system-level fusion. A further reduction in equal error rate is achieved by training the i-vector model on long-duration speech utterances and by deploying a robust feature extraction scheme that aims to capture the relevant language cues under various acoustic conditions. Evaluation results on the DARPA RATS data corpus suggest the potential of performing successful automated language identification on speech segments of one second or shorter.
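
The abstract reports its gains as reductions in equal error rate (EER), the operating point at which the miss rate equals the false-alarm rate. As a minimal sketch of how that metric can be computed from detection scores (not the authors' evaluation code), the following sweeps a threshold over the observed scores. The function name `equal_error_rate` and the example scores are hypothetical and chosen only for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Illustrative sketch only: the function name and the threshold sweep are
    assumptions, not taken from the paper.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a target-language trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (miss + fa) / 2
    return eer


# Made-up scores: higher means "more likely the target language".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With few trials the two error rates may never cross exactly, so the sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.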

Keywords

acoustic signal processing; natural language processing; signal representation; speech processing; DARPA RATS data corpus; LID framework; UBM-fused total variability modeling; acoustic conditions; automatic rapid language identification; decision making; equal error rate reduction; i-vector extraction; i-vector model training; language cues; long-duration speech utterances; low-dimensional i-vector representation; robust feature extraction; speech segment; spoken language identification; universal background model; Acoustics; Computational modeling; Feature extraction; Robustness; Speech; Speech processing; Training; I-vector; noise robustness; rapid language identification; short-duration speech; total variability modeling; universal background model (UBM) fusion

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2015.2419978

Filename

7080944