Automatic gender classification using the mel frequency cepstrum of neutral and whispered speech: A comparative study

Author

Nisha Meenakshi, G. ; Ghosh, Prasanta Kumar

Author_Institution

Electr. Eng., Indian Inst. of Sci. (IISc), Bangalore, India

fYear

2015

fDate

Feb. 27 2015-March 1 2015

Firstpage

1

Lastpage

6

Abstract

Whispered speech resembles unvoiced speech because, unlike neutral speech, it lacks vocal fold vibration. Since information about a speaker's gender typically lies in the pitch that results from vocal fold vibration (the source signal), identifying gender from whispered speech is more challenging than identifying it from neutral speech. In the absence of pitch, we study the use of the vocal tract filter, captured through the spectral envelope, for automatic gender classification (AGC) from whispered speech. The spectral envelope is represented by Mel frequency cepstral coefficients (MFCCs). We also compare the AGC performance on neutral speech using only MFCCs with that on whispered speech. An AGC experiment using a set of 33 sentences spoken in neutral and whispered modes by 16 female and 20 male speakers reveals that the AGC accuracy on neutral speech is, on average, 4.5% (absolute) higher than that on whispered speech when only the spectral shape information is used. This holds even when we use a subset of MFCCs obtained by a forward cepstral coefficient selection algorithm. However, the AGC accuracy obtained using the MFCCs of neutral speech is 2.83% (absolute) lower than that obtained using pitch. These findings suggest not only that the spectral shape carries gender-specific information but also that it carries less gender-specific information when a speaker whispers than when the speaker talks normally.
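The abstract mentions a forward cepstral coefficient selection algorithm but does not spell it out. The following is a minimal sketch of greedy forward selection in general, not the authors' implementation. Everything in it is an assumption made for illustration: the function names are hypothetical, the data is synthetic rather than real MFCCs, and a nearest-centroid classifier is swapped in for whatever classifier the paper uses (the keywords suggest support vector machines) so that the example needs only the Python standard library.

```python
import random


def make_synthetic(n_per_class, n_coeffs=13, informative=(2, 5), seed=0):
    """Build toy 'cepstral' vectors (NOT real MFCCs): Gaussian noise, with a
    class-dependent shift on the coefficients listed in `informative`."""
    rng = random.Random(seed)
    data = []
    for label, shift in (("female", 1.5), ("male", -1.5)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_coeffs)]
            for d in informative:
                x[d] += shift
            data.append((x, label))
    return data


def nearest_centroid_accuracy(train, val, dims):
    """Accuracy on `val` of a nearest-centroid classifier that sees only the
    coefficient indices in `dims` (a stand-in for the paper's classifier)."""
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in dims]
    correct = 0
    for x, y in val:
        pred = min(
            centroids,
            key=lambda c: sum((x[d] - m) ** 2 for d, m in zip(dims, centroids[c])),
        )
        correct += pred == y
    return correct / len(val)


def forward_select(train, val, n_coeffs, max_keep):
    """Greedy forward selection: start from an empty set and repeatedly add
    the single coefficient that gives the best validation accuracy."""
    selected, history = [], []
    remaining = list(range(n_coeffs))
    while remaining and len(selected) < max_keep:
        scored = [
            (nearest_centroid_accuracy(train, val, selected + [d]), d)
            for d in remaining
        ]
        # Ties are broken in favour of the lower coefficient index.
        best_acc, best_d = max(scored, key=lambda t: (t[0], -t[1]))
        selected.append(best_d)
        remaining.remove(best_d)
        history.append(best_acc)
    return selected, history


if __name__ == "__main__":
    train = make_synthetic(200, seed=1)
    val = make_synthetic(200, seed=2)
    selected, history = forward_select(train, val, n_coeffs=13, max_keep=3)
    print("selected coefficient indices:", selected)
    print("validation accuracy after each addition:", history)
```

On the toy data, the two shifted coefficients should be picked before any noise coefficient, which mirrors the idea of keeping only the cepstral coefficients that carry gender-specific information. A real experiment would compute MFCCs from audio and evaluate on held-out speakers.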

Keywords

speech processing; vibrations; AGC; MFCC; Mel frequency cepstral coefficient; automatic gender classification; forward cepstral coefficient selection algorithm; neutral speech; source signal; spectral envelope; spectral shape information; unvoiced speech; vocal fold vibration; vocal tract filter; whispered speech; Accuracy; Spectral shape; Speech; Support vector machines; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Communications (NCC), 2015 Twenty First National Conference on

Conference_Location

Mumbai

Type

conf

DOI

10.1109/NCC.2015.7084886

Filename

7084886