Regional Information Center for Science and Technology - ASR system based on pitch, energy contours and unvoiced regions

DocumentCode :

672887

Title :

ASR system based on pitch, energy contours and unvoiced regions

Author :

Gupta, V.K. ; Das, Pradip K.

Author_Institution :

Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Guwahati, Guwahati, India

fYear :

2013

fDate :

25-27 Nov. 2013

Firstpage :

Lastpage :

Abstract :

Most leading speech recognition technologies are based on frequency domain analysis, and most of them give good accuracy. Frequency domain analysis generally assumes that speech is a periodic signal, so that the Fourier transform can be used for analysis. In reality, speech is not periodic; to be exact, it is a quasi-periodic signal. Using the Fourier transform for analysis therefore introduces approximation errors, which result in some inherent noise in the system. We think that this is the reason why existing recognition techniques are saturating and accuracy is not improving. In spite of these limitations, we cannot deny the importance of frequency domain analysis in speech recognition, as it provides a very good mechanism for extracting relevant features from speech. In this work we have therefore tried to combine the benefits of frequency domain analysis, without using the Fourier transform, with time domain analysis. To represent the frequency domain we have selected pitch, and to incorporate the time domain we have retained its temporal information. In our previous work we suggested a method for speech tokenization using pitch (fundamental frequency), energy and detected unvoiced regions. In this work we show that this tokenization scheme can be used to build a speech recognition system. Results are encouraging, and we hope that this framework, in conjunction with existing techniques, will result in better recognition systems.
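
The abstract names three time-domain cues (pitch, energy and unvoiced regions) but this record does not say how the authors compute them. As an illustration only, the sketch below derives all three from a single frame without a Fourier transform. The autocorrelation pitch estimator, the zero-crossing test for unvoiced frames, the function names and every threshold are assumptions made for this example, not the method of the paper.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz from the lag with the largest autocorrelation.

    The search range f_min..f_max is an assumed range for speech.
    Returns 0.0 if no positive autocorrelation peak is found.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def describe_frame(frame, sample_rate, energy_floor=1e-4, zcr_ceiling=0.3):
    """Label a frame as silence, unvoiced or voiced (thresholds are assumed)."""
    energy = short_time_energy(frame)
    if energy < energy_floor:
        return {"label": "silence", "energy": energy, "f0": 0.0}
    if zero_crossing_rate(frame) > zcr_ceiling:
        return {"label": "unvoiced", "energy": energy, "f0": 0.0}
    f0 = autocorr_pitch(frame, sample_rate)
    return {"label": "voiced", "energy": energy, "f0": f0}


if __name__ == "__main__":
    rate = 8000
    # Synthetic 100 Hz tone standing in for a voiced frame (40 ms).
    tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(320)]
    print(describe_frame(tone, rate))
```

Applying `describe_frame` to consecutive frames would give pitch and energy contours plus a voiced/unvoiced labelling, which is the kind of input a tokenization scheme like the one described could consume. Real speech would need tuned thresholds and a more robust pitch tracker.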

Keywords :

frequency-domain analysis; speech recognition; ASR system; Fourier transform; approximation errors; automatic speech recognition technology; energy contours; frequency domain analysis; fundamental frequency; pitch analysis; quasiperiodic signal; speech tokenization; time domain analysis; unvoiced region detection; Accuracy; Databases; Feature extraction; Frequency-domain analysis; Hidden Markov models; Speech; Speech recognition; Frequency Domain Analysis (Spectral Analysis); Fundamental Frequency (F0); Speech Tokenization; Time Domain analysis (Temporal Analysis);

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference

Conference_Location :

Gurgaon

Type :

conf

DOI :

10.1109/ICSDA.2013.6709912

Filename :

6709912

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672887