• DocumentCode
    589849
  • Title
    Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model
  • Author
    Ma, Yanru ; Paulraj, M.P. ; Yaacob, Sazali ; Shahriman, A.B. ; Nataraj, Sathees Kumar
  • Author_Institution
    Fac. of Electr. Eng., Univ. Teknol. MARA, Nibong Tebal, Malaysia
  • fYear
    2012
  • fDate
    6-9 Oct. 2012
  • Firstpage
    262
  • Lastpage
    267
  • Abstract
    In recent years, accent recognition has become one of the most important topics in automatic speaker recognition and speaker-independent automatic speech recognition (SI-ASR) systems. Voice-controlled technologies have become part of our daily life; nevertheless, variability in speech makes these spoken language technologies relatively difficult to build. One profound source of variability is accent. By classifying accent types, different models could be developed to handle SI-ASR. In this paper, we classified three accents of the English language, recorded from the three main ethnicities in Malaysia, namely Malay, Chinese and Indian, using an artificial neural network model. All experiments were performed in modes that are speaker-independent and independent of the three most accent-sensitive words. Mel-bands spectral energy was extracted from eighteen bands, and statistical values of each speech sample (mean, standard deviation, kurtosis, and the ratio of standard deviation to kurtosis) were taken to characterize the spectral energy distribution. The system was evaluated using an independent test dataset, a partial-independent test dataset, and the training dataset. The best three-class accuracy rate obtained with the independent test dataset was 99.01%. Averaged over several trials, the overall accuracy rate was 96.79%, with an average learning time of 49 epochs.
  • Keywords
    natural language processing; neural nets; signal classification; speaker recognition; spectral analysis; statistical analysis; Chinese ethnicity; English language; Indian ethnicity; Malay ethnicity; Malaysia; Mel-band spectral energy distribution; SI-ASR; accent classification; accent-sensitive word-independent mode; artificial neural network model; automatic speaker recognition systems; kurtosis value; learning time; mean value; partial-independent test dataset; speaker accent recognition; speaker-independent speech recognition systems; speech variability; spoken language technology; standard deviation value; statistical descriptors; training dataset; voice-controlled technology; Artificial neural networks; Feature extraction; Hidden Markov models; Neurons; Speech; Training; Accent recognition; Mel-bands; Neural network; Spectral energy; Statistical analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2012 IEEE Conference on
  • Conference_Location
    Kuala Lumpur
  • ISSN
    1985-5753
  • Print_ISBN
    978-1-4673-1649-1
  • Electronic_ISBN
    1985-5753
  • Type
    conf
  • DOI
    10.1109/STUDENT.2012.6408416
  • Filename
    6408416
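
The four statistical descriptors named in the abstract (mean, standard deviation, kurtosis, and the ratio of standard deviation to kurtosis) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the Mel-band energies have already been computed as one 18-value row per frame, that the statistics are taken per band across frames (giving 18 × 4 = 72 features), and that kurtosis is the non-excess (Pearson) form with population moments; the record does not confirm any of these choices. The function names `band_descriptors` and `accent_features` are hypothetical.

```python
import math


def band_descriptors(energies):
    """Return (mean, std, kurtosis, std/kurtosis) for one band's energies.

    Assumption: population moments and non-excess (Pearson) kurtosis.
    """
    n = len(energies)
    mean = sum(energies) / n
    var = sum((e - mean) ** 2 for e in energies) / n
    if var == 0:
        # Kurtosis is undefined for a constant sequence.
        raise ValueError("band energy is constant; kurtosis is undefined")
    std = math.sqrt(var)
    fourth_moment = sum((e - mean) ** 4 for e in energies) / n
    kurtosis = fourth_moment / var ** 2
    return (mean, std, kurtosis, std / kurtosis)


def accent_features(band_energy_frames):
    """Flatten per-band descriptors into one feature vector.

    band_energy_frames: list of frames, each a list of per-band energies
    (assumed layout; 18 bands per frame in the setting described above).
    """
    features = []
    # zip(*frames) transposes frames-by-bands into bands-by-frames.
    for band in zip(*band_energy_frames):
        features.extend(band_descriptors(band))
    return features


if __name__ == "__main__":
    # Synthetic energies for illustration only: 5 frames, 18 bands.
    frames = [[float(f + b) for b in range(18)] for f in range(1, 6)]
    print(len(accent_features(frames)))
```

A vector built this way would then be passed to a classifier such as the neural network the abstract mentions; the network architecture is not described in this record, so it is left out of the sketch.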