• DocumentCode
    417116
  • Title
    Using Haar transformed vocal source information for automatic speaker recognition
  • Author
    Zheng, Nengheng ; Ching, P.C.

  • Author_Institution
    Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    This paper investigates the effectiveness of incorporating vocal source information to enhance automatic speaker recognition accuracy. We propose a new method to extract discriminative features from the linear prediction (LP) residual signal, which are closely related to the glottal excitation of the individual speaker. A parameter set complementary to the commonly used linear predictive cepstral coefficients (LPCC), called Haar octave coefficients of residue (HOCOR), is obtained by applying a Haar transform to the LP residue. This additional feature vector retains the spectro-temporal characteristics of the source excitation sequences that are related to the fundamental frequency and its harmonics, as well as their phases. Experimental evaluation over the YOHO corpus demonstrates the high speaker discriminative power and high inter-speaker variability of HOCOR. Speaker recognition tests using both the vocal tract feature (LPCC) and the vocal source information (HOCOR) outperform the conventional method of using LPCC only.
  • Keywords
    Haar transforms; cepstral analysis; feature extraction; speaker recognition; time-frequency analysis; HOCOR; Haar octave coefficients of residue; Haar transformed vocal source information; LPCC; automatic speaker recognition; discriminative feature extraction; individual speaker glottal excitation; inter-speaker variability; linear prediction residual signal; linear predictive cepstral coefficients; residue time-frequency analysis; source excitation sequence spectro-temporal characteristics; speaker discriminative power; vocal tract features; Automatic speech recognition; Cepstral analysis; Data mining; Feature extraction; Fourier transforms; Mel frequency cepstral coefficient; Partial response channels; Speaker recognition; Testing; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1325926
  • Filename
    1325926
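
  • Illustrative sketch (not part of the original record)
    The abstract describes applying a Haar transform to the LP residue and keeping per-octave information. The Python sketch below shows one way that pipeline could look, using only the standard library. It is not the authors' HOCOR definition: the autocorrelation/Levinson-Durbin LP analysis, the power-of-two frame length, the use of one Euclidean norm per Haar octave as the feature, and all function names (`lp_residual`, `haar_octaves`, `octave_norms`) are assumptions made for illustration.

```python
import math


def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter a[0..order] (a[0] == 1) from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # frame is (numerically) perfectly predictable; stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(x, order):
    """Inverse-filter the frame: e[n] = sum_j a[j] * x[n - j], zero history assumed."""
    a = levinson_durbin(autocorr(x, order), order)
    return [
        sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
        for n in range(len(x))
    ]


def haar_octaves(x):
    """Orthonormal Haar decomposition.

    Returns (details, approx): detail coefficient lists ordered from the
    finest octave to the coarsest, plus the final one-element approximation.
    Assumption: len(x) is a power of two.
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    s = math.sqrt(2.0)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(p - q) / s for p, q in pairs])
        approx = [(p + q) / s for p, q in pairs]
    return details, approx


def octave_norms(x, order=12):
    """Hypothetical per-octave feature: norm of each Haar detail band of the LP residual."""
    details, _ = haar_octaves(lp_residual(x, order))
    return [math.sqrt(sum(c * c for c in band)) for band in details]


if __name__ == "__main__":
    # Synthetic voiced-like frame: impulse train through a stable two-pole resonator.
    frame = []
    for n in range(256):
        e = 1.0 if n % 64 == 0 else 0.0
        x1 = frame[n - 1] if n >= 1 else 0.0
        x2 = frame[n - 2] if n >= 2 else 0.0
        frame.append(e + 1.3 * x1 - 0.8 * x2)
    print(octave_norms(frame, order=2))
```

    Because the Haar basis used here is orthonormal, the residual energy is split exactly across the octave bands plus the final approximation term, so the band norms can be read as a coarse time-frequency energy profile of the excitation. A real system would apply this per analysis frame and would follow the paper's own windowing and normalization, which the abstract does not specify.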