• DocumentCode
    746464
  • Title
    Robust endpoint detection and energy normalization for real-time speech and speaker recognition
  • Author
    Li, Qi ; Zheng, Jinsong ; Tsai, Augustine ; Zhou, Qiru
  • Author_Institution
    Multimedia Commun. Res. Lab, Lucent Technol. Bell Labs., Murray Hill, NJ, USA
  • Volume
    10
  • Issue
    3
  • fYear
    2002
  • fDate
    3/1/2002
  • Firstpage
    146
  • Lastpage
    157
  • Abstract
    When automatic speech recognition (ASR) and speaker verification (SV) are applied in adverse acoustic environments, endpoint detection and energy normalization can be crucial to the functioning of both systems. In low signal-to-noise ratio (SNR) and nonstationary environments, conventional approaches to endpoint detection and energy normalization often fail, and ASR performance usually degrades dramatically. The purpose of this paper is to address the endpoint problem. For ASR, we propose a real-time approach that uses an optimal filter plus a three-state transition diagram for endpoint detection. The filter is designed using several criteria to ensure accuracy and robustness, and its response is almost invariant across background noise levels. The detected endpoints are then applied sequentially to energy normalization. Evaluation results show that the proposed algorithm significantly reduces string error rates in low-SNR situations; the reduction exceeds 50% in several of the evaluated databases. For SV, we propose a batch-mode approach that uses the optimal filter plus a two-mixture energy model for endpoint detection. Experiments show that the batch-mode algorithm detects endpoints as accurately as HMM forced alignment, with much lower computational complexity.
  • Keywords
    filtering theory; hidden Markov models; noise; real-time systems; signal detection; speaker recognition; speech recognition; HMM forced alignment; adverse acoustic environments; automatic speech recognition; background noise levels; batch-mode algorithm; batch-mode approach; computational complexity; databases; energy normalization; error rate reduction; low SNR; nonstationary environments; optimal filter; real-time approach; real-time speaker recognition; real-time speech recognition; robust endpoint detection; signal-to-noise ratio; speaker verification; three-state transition diagram; two-mixture energy model; Acoustic signal detection; Automatic speech recognition; Background noise; Degradation; Error analysis; Filters; Hidden Markov models; Loudspeakers; Noise robustness; Signal to noise ratio
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2002.1001979
  • Filename
    1001979