• DocumentCode
    2696795
  • Title

    Discriminatively trained Language Models using Support Vector Machines for Language Identification

  • Author

    Zhai, Lu-Feng ; Siu, Man-Hung ; Yang, Xi ; Gish, Herbert

  • Author_Institution
    Hong Kong Univ. of Sci. & Technol.
  • fYear
    2006
  • fDate
    28-30 June 2006
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we explore the use of support vector machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that SVM-trained n-grams are effective classifiers but are sensitive to changes in prior class distributions. With balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed the traditional n-gram in parallel phoneme recognition with language model and GMM-UBM-based language identification systems, with a relative error reduction of more than 30% on the OGI-TS corpus.
  • Keywords
    Gaussian distribution; natural languages; speech recognition; support vector machines; training; GMM-UBM; Gaussian mixture model; OGI-TS corpus; SVM; automatic language identification; balanced prior distribution; discriminatively trained language model; parallel phoneme recognition; score normalization; support vector machine; Acoustics; Engines; Maximum likelihood estimation; Natural languages; Pattern recognition; Power system modeling; Support vector machine classification; Support vector machines; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IEEE Odyssey 2006: The Speaker and Language Recognition Workshop
  • Conference_Location
    San Juan
  • Print_ISBN
    1-4244-0471-1
  • Electronic_ISBN
    1-4244-0472-X
  • Type

    conf

  • DOI
    10.1109/ODYSSEY.2006.248098
  • Filename
    4013515