• DocumentCode
    1135700
  • Title
    A Target-Oriented Phonotactic Front-End for Spoken Language Recognition
  • Author
    Tong, Rong; Ma, Bin; Li, Haizhou; Chng, Eng Siong
  • Author_Institution
    Inst. for Infocomm Res., Singapore, Singapore
  • Volume
    17
  • Issue
    7
  • fYear
    2009
  • Firstpage
    1335
  • Lastpage
    1347
  • Abstract
    This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset will be used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to construct such phone tokenizers for the front-end of a parallel phone recognizers followed by vector space modeling (PPR-VSM) system. We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, the language recognition system performance is significantly improved without the need for additional training samples. We achieve an equal error rate (EER) of 1.27%, 1.42% and 2.73% on the NIST 1996, 2003 and 2007 LRE databases respectively for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to NIST 2007 LRE.
  • Keywords
    speech recognition; vectors; language-specific phone recognizer; parallel phone recognizer; phone inventory; phone recognizer; spoken language recognition; target-oriented phone tokenizer; target-oriented phonotactic front-end; vector space modeling; Cepstral analysis; Error analysis; Humans; Mel frequency cepstral coefficient; NIST; Natural languages; Speech processing; Speech recognition; System performance; Target recognition; Feature selection; parallel phone recognizer (PPR); phonotactic feature; spoken language recognition; target-oriented phone tokenizer (TOPT); universal phone recognizer;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2016731
  • Filename
    5165117