Title :
On the generalization of Shannon entropy for speech recognition
Author :
Obin, Nicolas ; Liuni, M.
Author_Institution :
IRCAM, STMS, Paris, France
Abstract :
This paper introduces an entropy-based spectral representation as a measure of the degree of noisiness in audio signals, complementary to the standard MFCCs for audio and speech recognition. The proposed representation is based on the Rényi entropy, a generalization of the Shannon entropy. For audio signal representation, the Rényi entropy has the advantage that it can focus either on the harmonic content (prominent amplitudes within a distribution) or on the noise content (an equal distribution of amplitudes). The proposed representation outperforms all other noisiness measures, including the Shannon and Wiener entropies, in a large-scale classification of vocal effort (whispered-soft/normal/loud-shouted) in the real scenario of multi-language massive role-playing video games. The relative error reduction is around 10%, and the improvement is particularly significant for the recognition of noisy speech, i.e., whispery/breathy speech. This result confirms the role of noisiness in speech recognition, and the approach will be extended to the classification of voice quality for the design of an automatic voice casting system in video games.
Keywords :
audio signal processing; computer games; information theory; signal classification; signal representation; spectral analysis; speech recognition; Rényi entropy-based spectral representation; Shannon entropy; Wiener entropies; audio recognition; audio signal; audio signal representation; automatic voice casting system; degree of noisiness; harmonic content; large-scale classification; multilanguage massive role-playing video games; noise content; noisiness measures; noisy speech recognition; real scenario; relative error reduction; standard MFCC; vocal effort; voice quality classification; whispery-breathy speech; Entropy; Games; Harmonic analysis; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; expressive speech; spectral entropy; video games; voice quality
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424204
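
Illustrative note :
The abstract describes the Rényi entropy as a generalization of the Shannon entropy that can emphasize either peaked (harmonic) or flat (noise-like) amplitude distributions. The sketch below is not the authors' implementation; it only illustrates the textbook definition H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), which tends to the Shannon entropy as alpha approaches 1. The function names, the natural-log base, the toy 8-bin spectra, and the normalization of magnitudes to sum to 1 are all assumptions made for illustration; the record does not state which order alpha or which spectral normalization the paper uses.

```python
import math


def normalize(magnitudes):
    """Scale non-negative spectral magnitudes so they sum to 1 (assumed normalization)."""
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must contain some positive energy")
    return [m / total for m in magnitudes]


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha >= 0, in nats.

    At alpha == 1 the formula is undefined (0/0), so the Shannon entropy,
    which is its limit, is returned instead.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if math.isclose(alpha, 1.0):
        return shannon_entropy(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


# Hypothetical toy spectra: one flat (noise-like), one with a single dominant peak.
flat = normalize([1.0] * 8)
peaked = normalize([100.0] + [1.0] * 7)

for alpha in (0.5, 1.0, 3.0):
    print(alpha, renyi_entropy(flat, alpha), renyi_entropy(peaked, alpha))
```

For the flat spectrum the value is log(8) at every order, while for the peaked spectrum it falls as alpha grows: larger orders weight the dominant bin more heavily, and smaller orders weight all occupied bins more evenly. This is one way to read the abstract's remark about focusing on harmonic versus noise content.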