• DocumentCode
    454592
  • Title
    Robust Large Vocabulary Continuous Speech Recognition using Polynomial Segment Model with Unsupervised Adaptation
  • Author
    Siu, Man-Hung; Yeung, Siu-Kei Au
  • Author_Institution
    Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    Robustness has been an important issue for applying speech technologies to real applications. While polynomial segment models (PSMs) have been shown to outperform HMMs in clean environments, the segmental likelihood evaluation may make the PSM distributions sharper and may adversely affect their performance in mismatched conditions. In this paper, we explore the robustness properties of the PSM under noisy and channel-mismatch conditions. In addition, unsupervised adaptation techniques have been shown to work well for environmental adaptation even with a small amount of adaptation data. Thus, it is interesting to compare the PSMs' and the HMMs' performance after applying two types of unsupervised adaptation: maximum likelihood linear regression (MLLR) and reference speaker weighting (RSW). Experiments were performed on the Aurora 4 corpus under both clean and multi-conditional training. Our results show that even under noisy and mismatched conditions, the PSMs performed well compared to the HMMs both before and after environmental adaptation. Using the best lattice, the RSW-adapted PSM gave word error rates of 26.5% and 21.3% for clean and multi-conditional training, respectively, which were approximately 24% better than the unadapted HMM.
  • Keywords
    error statistics; hidden Markov models; maximum likelihood estimation; regression analysis; speech recognition; Aurora 4 corpus; HMM; channel mismatch condition; maximum likelihood linear regression; multi-conditional training; noisy mismatch condition; polynomial segment model; reference speaker weighting; robust large vocabulary continuous speech recognition; robustness properties; segmental likelihood evaluation; speech technologies; unsupervised adaptation techniques; word error rates; Adaptation model; Error analysis; Hidden Markov models; Lattices; Maximum likelihood linear regression; Polynomials; Robustness; Speech recognition; Vocabulary; Working environment noise
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2006.1660054
  • Filename
    1660054