Title :
Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation
Author :
Kim, Hyoung-Gook ; Sikora, Thomas
Author_Institution :
Commun. Syst. Group, Technische Univ. Berlin, Germany
Abstract :
We evaluate the MPEG-7 audio spectrum projection (ASP) features for general sound recognition performance against the well-established mel-frequency cepstral coefficients (MFCC). The recognition tasks of interest are speaker recognition, sound classification, and segmentation of audio using sound/speaker identification. For sound classification we use three approaches: a direct approach, a hierarchical approach without hints, and a hierarchical approach with hints. For audio segmentation, the MPEG-7 ASP features and MFCCs are used to train hidden Markov models (HMMs) for individual speakers and sounds. The trained sound/speaker models are then used to segment conversational speech involving a given subset of people in panel discussion television programs. Results show that the MFCC approach yields a higher sound/speaker recognition rate than the MPEG-7 implementations.
Keywords :
audio signal processing; hidden Markov models; learning (artificial intelligence); signal classification; speaker recognition; HMM; MFCC; MPEG-7 audio spectrum projection; audio segmentation; conversational speech segmentation; panel discussion; sound classification; sound identification; Application specific processors; Data mining; Feature extraction; Indexing; Loudspeakers; MPEG 7 Standard; Mel frequency cepstral coefficient; Principal component analysis
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1327263