Title :
Power function-based power distribution normalization algorithm for robust speech recognition
Author :
Kim, Chanwoo ; Stern, Richard M.
Author_Institution :
Dept. of Electr. & Comput. Eng. & Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
fDate :
Dec. 13 2009-Dec. 17 2009
Abstract :
A novel algorithm that normalizes the distribution of spectral power coefficients is described in this paper. The algorithm, called power-function-based power distribution normalization (PPDN), is based on the observation that the ratio of arithmetic mean to geometric mean changes as speech is corrupted by noise, and a parametric power function is used to equalize this ratio. We also observe that a longer "medium-duration" observation window (of approximately 100 ms) is better suited for parameter estimation for noise compensation than the briefer window typically used for automatic speech recognition. We also describe the implementation of an online version of PPDN based on exponentially weighted temporal averaging. Experimental results show that PPDN provides comparable or slightly better results than state-of-the-art algorithms such as vector Taylor series for speech recognition while requiring much less computation. Hence, the algorithm is suitable both for real-time speech communication and as a real-time preprocessing stage for speech recognition systems.
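The quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function and class names, the forgetting factor, the bisection search for the exponent, and the toy power values are all assumptions made here for illustration. The sketch shows only that adding noise power lowers the arithmetic-to-geometric mean ratio, that a power function can raise it back to a target value, and how the ratio could be tracked online with exponentially weighted averages.

```python
import math


def log_am_gm_ratio(powers):
    """Return log(arithmetic mean / geometric mean) of positive power values."""
    n = len(powers)
    log_am = math.log(sum(powers) / n)
    log_gm = sum(math.log(p) for p in powers) / n
    return log_am - log_gm


class OnlineAmGm:
    """Exponentially weighted running estimate of the log AM-to-GM ratio.

    Hypothetical helper: the forgetting factor and the initialization from
    the first sample are illustrative choices, not taken from the paper.
    """

    def __init__(self, forgetting=0.99):
        self.forgetting = forgetting
        self.mean_power = None
        self.mean_log_power = None

    def update(self, power):
        if self.mean_power is None:
            # Start both running averages from the first observation.
            self.mean_power = power
            self.mean_log_power = math.log(power)
        else:
            lam = self.forgetting
            self.mean_power = lam * self.mean_power + (1.0 - lam) * power
            self.mean_log_power = (lam * self.mean_log_power
                                   + (1.0 - lam) * math.log(power))
        return math.log(self.mean_power) - self.mean_log_power


def fit_exponent(powers, target_log_ratio, lo=0.01, hi=20.0, iters=60):
    """Bisection search for an exponent a such that the log AM-to-GM ratio
    of powers**a matches target_log_ratio (assumed to be bracketed by lo, hi).

    The ratio grows monotonically with a for a > 0, so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_am_gm_ratio([p ** mid for p in powers]) < target_log_ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy example: made-up "clean" powers and the same values with added noise power.
clean = [0.1, 4.0, 0.5, 9.0, 0.2, 2.0]
noisy = [p + 1.0 for p in clean]

target = log_am_gm_ratio(clean)
exponent = fit_exponent(noisy, target)
equalized = [p ** exponent for p in noisy]

tracker = OnlineAmGm(forgetting=0.9)
running = [tracker.update(p) for p in noisy]
```

In this toy setting the fitted exponent is greater than 1, because the additive noise compresses the spread of the power values and the power function has to expand it again.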
Keywords :
algorithm theory; real-time systems; speech recognition; Taylor series speech recognition; automatic speech recognition; medium duration observation window; normalization algorithm; parameter estimation noise compensation; parametric power function; power function based power distribution; ratio arithmetic mean; real-time preprocessing stage; robust speech recognition; spectral power coefficients; weighted temporal averaging; Arithmetic; Automatic speech recognition; Noise robustness; Parameter estimation; Power distribution; Real time systems; Signal to noise ratio; Speech enhancement; Speech recognition; Taylor series; Power distribution; equalization; medium-duration window; ratio of arithmetic mean to geometric mean;
Conference_Titel :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373233