A Target-Oriented Phonotactic Front-End for Spoken Language Recognition

Author

Tong, Rong ; Ma, Bin ; Li, Haizhou ; Chng, Eng Siong

Author_Institution

Inst. for Infocomm Res., Singapore, Singapore

Volume

17

Issue

7

fYear

2009

Firstpage

1335

Lastpage

1347

Abstract

This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset is then used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to constructing such phone tokenizers for the front-end of a system based on parallel phone recognizers followed by vector space modeling (PPR-VSM). We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, the language recognition system performance is significantly improved without the need for additional training samples. We achieve equal error rates (EERs) of 1.27%, 1.42%, and 2.73% on the NIST 1996, 2003, and 2007 LRE databases, respectively, for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to the NIST 2007 LRE.

Keywords

speech recognition; vectors; language-specific phone recognizer; parallel phone recognizer; phone inventory; phone recognizer; spoken language recognition; target-oriented phone tokenizer; target-oriented phonotactic front-end; vector space modeling; Cepstral analysis; Error analysis; Humans; Mel frequency cepstral coefficient; NIST; Natural languages; Speech processing; System performance; Target recognition; Feature selection; parallel phone recognizer (PPR); phonotactic feature; target-oriented phone tokenizer (TOPT); universal phone recognizer

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2009.2016731

Filename

5165117