Regional Information Center for Science and Technology - Audio-based classification of speaker characteristics

DocumentCode :

2930435

Title :

Audio-based classification of speaker characteristics

Author :

Dutta, Promiti ; Haubold, Alexander

Author_Institution :

Columbia Univ., New York, NY, USA

fYear :

2009

fDate :

June 28 - July 3, 2009

Firstpage :

422

Lastpage :

425

Abstract :

The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nativity, ethnicity. Such characteristics are helpful cues for audio/video search and retrieval. In this paper, we evaluate the effects of various low-, mid-, and high-level features for effective classification of speaker characteristics. Low-level signal-based features include MFCCs, LPCs, and six spectral features; mid-level statistical features model low-level features; and high-level semantic features are based on selected phonemes in addition to mid-level features. Our data set consists of approximately 76.4 hours of annotated audio with 2786 unique speaker segments used for classification. Quantitative evaluation of our method results in accuracy rates up to 98.6% on our test data for male/female classification using mid-level features and a linear kernel support vector machine. We determine that mid- and high-level features are optimal for identification of speaker characteristics.
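The mid-level step mentioned in the abstract (statistical features that model low-level features) can be illustrated with a minimal sketch. The abstract does not say which statistics the authors use, so the choice of per-dimension mean and standard deviation here is an assumption, and the function name mid_level_features and the toy frame values are hypothetical. The sketch only shows how a variable number of per-frame vectors (for example MFCCs) might be collapsed into one fixed-length vector per speaker segment, which is the kind of input a linear-kernel support vector machine expects.

```python
from statistics import fmean, pstdev


def mid_level_features(frames):
    """Collapse per-frame low-level vectors into one segment-level vector.

    frames: list of equal-length lists, one per audio frame (e.g. MFCCs).
    Returns the per-dimension means followed by the per-dimension
    population standard deviations. The choice of statistics is an
    assumption, not the method reported in the paper.
    """
    if not frames:
        raise ValueError("segment has no frames")
    columns = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return means + stdevs


# Toy segment: three frames, two hypothetical low-level features per frame.
segment = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
features = mid_level_features(segment)
```

Whatever the number of frames in a segment, the output length depends only on the number of low-level dimensions (here 2 means plus 2 standard deviations), so segments of different durations become directly comparable.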

Keywords :

audio signal processing; feature extraction; signal classification; speaker recognition; spectral analysis; statistical analysis; support vector machines; LPC; MFCC; audio annotation; audio-based speaker characteristic classification; audio/video search cue; audio/video search retrieval; high-level semantic feature extraction; linear kernel support vector machine; low-level signal-based feature extraction; male/female classification; mid-level statistical feature extraction; nonlinguistic feature extraction; phoneme selection; speaker characteristic identification; speaker demographics; speaker ethnicity; speaker gender; speaker nativity; speaker segmentation; spectral feature extraction; Aggregates; Automatic speech recognition; Covariance matrix; Feature extraction; Frequency estimation; Indexing; Linear predictive coding; Mel frequency cepstral coefficient; Speech analysis; Testing; classification; ethnicity; gender

fLanguage :

English

Publisher :

IEEE

Conference_Title :

IEEE International Conference on Multimedia and Expo, 2009 (ICME 2009)

Conference_Location :

New York, NY

ISSN :

1945-7871

Print_ISBN :

978-1-4244-4290-4

Electronic_ISBN :

1945-7871

Type :

conf

DOI :

10.1109/ICME.2009.5202524

Filename :

5202524

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2930435