Regional Information Center for Science and Technology - Improved Modeling of Cross-Decoder Phone Co-Occurrences in SVM-Based Phonotactic Language Recognition

DocumentCode :

1484003

Title :

Improved Modeling of Cross-Decoder Phone Co-Occurrences in SVM-Based Phonotactic Language Recognition

Author :

Penagarikano, Mikel ; Varona, Amparo ; Rodriguez-Fuentes, Luis Javier ; Bordel, German

Author_Institution :

Dept. of Electr. & Electron., Univ. of the Basque Country, Leioa, Spain

Volume :

Issue :

fYear :

2011

Firstpage :

2348

Lastpage :

2363

Abstract :

Most common approaches to phonotactic language recognition deal with several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, so their time alignment (and the information that may be extracted from it) is completely lost. Recently, we presented two new approaches to phonotactic language recognition that take time alignment information into account by considering time-synchronous cross-decoder phone co-occurrences. Experiments on the 2007 NIST LRE database demonstrated that phone co-occurrence statistics could improve the performance of baseline phonotactic recognizers. In this paper, approaches based on time-synchronous cross-decoder phone co-occurrences are further developed and evaluated against a baseline SVM-based phonotactic system, using: 1) counts of n-grams (up to 4-grams) of phone co-occurrences; and 2) the degree of co-occurrence of phone n-grams (up to 4-grams). To evaluate these approaches, a selection of open software (Brno University of Technology phone decoders, LIBLINEAR and FoCal) was used, and experiments were carried out on the 2007 NIST LRE database. The two approaches presented in this paper outperformed the baseline phonotactic system, yielding around 7% relative improvement in terms of C_LLR. The fusion of the baseline system with the two proposed approaches yielded 1.83% EER and C_LLR = 0.270 (an 18% relative improvement), the same performance (on the same task) as state-of-the-art phonotactic systems that apply more complex models and techniques, thus supporting the use of cross-decoder dependencies for language recognition.
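The idea of time-synchronous cross-decoder phone co-occurrences can be illustrated with a minimal sketch. It assumes a hypothetical input format (each decoding is a list of contiguous `(phone, start, end)` segments covering the same time span) and one plausible reading of the abstract: the phone boundaries of two decodings are merged, each resulting time segment yields one `(phone_a, phone_b)` co-occurrence unit, and n-grams of these units (up to 4-grams) are counted. The function names are illustrative, and this is not necessarily the authors' exact definition. In particular, the "degree of co-occurrence" approach is not shown.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical format: (phone label, start time, end time), contiguous segments.
Segment = Tuple[str, float, float]


def cooccurrence_units(dec_a: List[Segment], dec_b: List[Segment]) -> List[Tuple[str, str]]:
    """Merge the phone boundaries of two time-aligned decodings and return one
    (phone_a, phone_b) pair for every time segment where both phones overlap."""
    units = []
    i = j = 0
    while i < len(dec_a) and j < len(dec_b):
        pa, sa, ea = dec_a[i]
        pb, sb, eb = dec_b[j]
        # Both phones are active between max(start) and min(end).
        if min(ea, eb) > max(sa, sb):
            units.append((pa, pb))
        # Advance whichever segment ends first (both if they end together).
        if ea <= eb:
            i += 1
        if eb <= ea:
            j += 1
    return units


def cooccurrence_ngram_counts(units: List[Tuple[str, str]], max_n: int = 4) -> Counter:
    """Count n-grams (n = 1..max_n) over the sequence of co-occurrence units."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for k in range(len(units) - n + 1):
            counts[tuple(units[k:k + n])] += 1
    return counts


if __name__ == "__main__":
    # Toy decodings from two hypothetical phone decoders over 0.30 s of speech.
    dec_a = [("a", 0.00, 0.12), ("b", 0.12, 0.30)]
    dec_b = [("x", 0.00, 0.20), ("y", 0.20, 0.30)]
    units = cooccurrence_units(dec_a, dec_b)
    print(units)
    print(cooccurrence_ngram_counts(units))
```

With the toy input, the merged boundaries (0.12 s and 0.20 s) produce the units `("a", "x")`, `("b", "x")` and `("b", "y")`. In an SVM-based system such as the one described, counts like these would typically be normalized into a sparse feature vector per utterance. That step, and any weighting, is left out here because the abstract does not specify it.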

Keywords :

natural language processing; public domain software; speaker recognition; statistics; support vector machines; LIBLINEAR; SVM-based phonotactic language recognition; time alignment information; baseline phonotactic recognizers; phone cooccurrence statistics; time-synchronous cross-decoder phone cooccurrence modelling; Acoustics; Databases; Decoding; Feature extraction; NIST; Speech; Support vector machines; Phonotactic language recognition; support vector machines (SVMs); time-synchronous cross-decoder phone co-occurrences;

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2134088

Filename :

5740582

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1484003