Regional Information Center for Science and Technology - Audio classification based on MPEG-7 spectral basis representations

DocumentCode :

975526

Title :

Audio classification based on MPEG-7 spectral basis representations

Author :

Kim, Hyoung-Gook ; Moreau, Nicolas ; Sikora, Thomas

Author_Institution :

Commun. Syst. Group, Tech. Univ. of Berlin, Germany

Volume :

Issue :

fYear :

2004

fDate :

5/1/2004

Firstpage :

716

Lastpage :

725

Abstract :

In this paper, we present an MPEG-7-based audio classification and retrieval technique targeted at the analysis of film material. The technique consists of low-level descriptors and high-level description schemes. For the low-level descriptors, low-dimensional features such as the audio spectrum projection, based on audio spectrum basis descriptors, are produced in order to find a balanced tradeoff between reducing dimensionality and retaining maximum information content. High-level description schemes are used to describe the modeling of reduced-dimension features and the procedures of audio classification and retrieval. A classifier based on continuous hidden Markov models is applied. The sound model state path, which is selected according to the maximum-likelihood model, is stored in an MPEG-7 sound database and used as an index for query applications. Various experiments are presented in which the speaker- and sound-recognition rates are compared for different feature extraction methods. Using independent component analysis, we achieved better results in a speaker recognition system than with the normalized audio spectrum envelope and principal component analysis. In audio classification experiments, audio sounds are classified into selected sound classes in real time with an accuracy of 96%.
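
The dimensionality reduction mentioned in the abstract can be illustrated with a rough sketch. It is not the authors' implementation or the MPEG-7 reference extraction: the function names (`normalize_envelope`, `svd_basis`, `project`), the random input data, the band count, and the choice of 8 basis vectors are all assumptions made for illustration. It assumes a frames-by-bands power spectrum, converts it to decibels, normalizes each frame to unit length, derives a basis from a singular value decomposition (a PCA-like step without mean centering), and projects the frames onto that basis. The independent component analysis and hidden Markov model stages named in the abstract are left out.

```python
import numpy as np

def normalize_envelope(power_spec, eps=1e-12):
    """Convert a (frames x bands) power spectrum to dB and scale each frame to unit L2 norm."""
    db = 10.0 * np.log10(power_spec + eps)
    gain = np.linalg.norm(db, axis=1, keepdims=True)
    return db / np.maximum(gain, eps), gain.ravel()

def svd_basis(x, k):
    """Return the k leading right singular vectors of x as a (bands x k) basis."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T

def project(x, basis):
    """Project each normalized frame onto the reduced basis."""
    return x @ basis

# Hypothetical input: 200 frames, 32 frequency bands of strictly positive power values.
rng = np.random.default_rng(0)
spec = rng.random((200, 32)) + 0.1

x, gain = normalize_envelope(spec)
basis = svd_basis(x, k=8)
features = project(x, basis)
print(features.shape)
```

In a pipeline of the kind the abstract describes, the per-frame gain would typically be kept alongside the projected features, and the resulting low-dimensional vectors would be passed to the classifier.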

Keywords :

audio databases; audio signal processing; feature extraction; hidden Markov models; independent component analysis; information retrieval; maximum likelihood detection; pattern classification; principal component analysis; spectral analysis; MPEG-7-based audio classification; audio retrieval technique; audio spectrum basis descriptors; audio spectrum envelope; audio spectrum projection; double-data features; feature extraction methods; film material analysis; hidden Markov models; high-level description schemes; independent component analysis; low-level descriptors; maximum-likelihood model; principal component analysis; recognition errors; sound class hypothesis; sound database; sound recognition; speaker recognition system; Brightness; Decorrelation; Discrete cosine transforms; Feature extraction; Filter bank; Hidden Markov models; Independent component analysis; MPEG 7 Standard; Mel frequency cepstral coefficient; Speech;

fLanguage :

English

Journal_Title :

Circuits and Systems for Video Technology, IEEE Transactions on

Publisher :

IEEE

ISSN :

1051-8215

Type :

jour

DOI :

10.1109/TCSVT.2004.826766

Filename :

1294962

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=975526