DocumentCode :
900341
Title :
Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
Author :
Mesgarani, Nima ; Slaney, Malcolm ; Shamma, Shihab A.
Author_Institution :
Electr. & Comput. Eng. Dept., Univ. of Maryland, College Park, MD, USA
Volume :
14
Issue :
3
fYear :
2006
fDate :
5/1/2006
Firstpage :
920
Lastpage :
930
Abstract :
We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). The system's generalization to signals with high levels of additive noise and reverberation is evaluated and compared with two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
Keywords :
audio signal processing; modulation; speech processing; support vector machines; SVM; auditory cortical processing; content-based audio classification; multidimensional spectro-temporal representation; multilinear dimensionality reduction technique; multiscale spectro-temporal modulations; nonspeech; speech discrimination; support vector machine; Acoustic noise; Animals; Classification algorithms; Humans; Music; Reverberation; Speech; Support vector machine classification; Working environment noise; Audio classification and segmentation; auditory model
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TSA.2005.858055
Filename :
1621204