DocumentCode :
3349465
Title :
Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation
Author :
Kim, Hyoung-Gook ; Sikora, Thomas
Author_Institution :
Commun. Syst. Group, Technische Univ. Berlin, Germany
Volume :
5
fYear :
2004
fDate :
17-21 May 2004
Abstract :
We evaluate the MPEG-7 audio spectrum projection (ASP) features for general sound recognition performance against the well-established mel-frequency cepstral coefficients (MFCCs). The recognition tasks of interest are speaker recognition, sound classification, and segmentation of audio using sound/speaker identification. For sound classification we use three approaches: a direct approach, a hierarchical approach without hints, and a hierarchical approach with hints. For audio segmentation, the MPEG-7 ASP features and MFCCs are used to train hidden Markov models (HMMs) for individual speakers and sounds. The trained sound/speaker models are then used to segment conversational speech involving a given subset of people in panel discussion television programs. Results show that the MFCC approach yields a higher sound/speaker recognition rate than the MPEG-7 implementations.
Keywords :
audio signal processing; hidden Markov models; learning (artificial intelligence); signal classification; speaker recognition; HMM; MFCC; MPEG-7 audio spectrum projection; audio segmentation; conversational speech segmentation; hidden Markov models; panel discussion; sound classification; sound identification; speaker recognition; Application specific processors; Data mining; Feature extraction; Hidden Markov models; Indexing; Loudspeakers; MPEG 7 Standard; Mel frequency cepstral coefficient; Principal component analysis; Speaker recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1327263
Filename :
1327263