• DocumentCode
    2427258
  • Title

    Automatic gender classification using the mel frequency cepstrum of neutral and whispered speech: A comparative study

  • Author

    Nisha Meenakshi, G. ; Ghosh, Prasanta Kumar

  • Author_Institution
    Electr. Eng., Indian Inst. of Sci. (IISc), Bangalore, India
  • fYear
    2015
  • fDate
    Feb. 27 2015-March 1 2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Whispered speech resembles unvoiced speech because, unlike neutral speech, it lacks vocal fold vibration. Since information about a speaker's gender typically lies in the pitch that results from vocal fold vibration (the source signal), identifying gender from whispered speech is more challenging than from neutral speech. In the absence of pitch, we study the use of the vocal tract filter, captured through the spectral envelope, for automatic gender classification (AGC) from whispered speech. The spectral envelope is represented by Mel frequency cepstral coefficients (MFCCs). We also compare the AGC performance on neutral speech using only MFCCs with that on whispered speech. An AGC experiment using a set of 33 sentences spoken in neutral and whispered modes by 16 female and 20 male speakers reveals that AGC accuracy using neutral speech is, on average, 4.5% (absolute) higher than that using whispered speech when only spectral shape information is used. This holds even when we use a subset of MFCCs obtained by a forward cepstral coefficient selection algorithm. However, the AGC accuracy obtained using the MFCCs of neutral speech is 2.83% (absolute) lower than that obtained using pitch. These findings not only suggest that there is gender-specific information in the spectral shape but also indicate that the spectral shape carries less gender-specific information when a speaker whispers than when speaking normally.
  • Keywords
    speech processing; vibrations; AGC; MFCC; Mel frequency cepstral coefficient; automatic gender classification; forward cepstral coefficient selection algorithm; neutral speech; source signal; spectral envelope; spectral shape information; unvoiced speech; vocal fold vibration; vocal tract filter; whispered speech; Accuracy; Spectral shape; Speech; Support vector machines; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications (NCC), 2015 Twenty First National Conference on
  • Conference_Location
    Mumbai
  • Type

    conf

  • DOI
    10.1109/NCC.2015.7084886
  • Filename
    7084886