Regional Information Center for Science and Technology - Recognizing voice over IP: a robust front-end for speech recognition on the world wide web

DocumentCode :

1491368

Title :

Recognizing voice over IP: a robust front-end for speech recognition on the world wide web

Author :

Peláez-Moreno, Carmen ; Gallardo-Antolín, Ascensión ; Díaz-de-María, Fernando

Author_Institution :

Dept. de Tecnologias de las Comunicaciones, Univ. Carlos III de Madrid, Spain

Volume :

Issue :

fYear :

2001

fDate :

6/1/2001

Firstpage :

209

Lastpage :

218

Abstract :

The Internet Protocol (IP) environment introduces two relevant sources of distortion for speech recognition: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and then extracting the feature vectors from the decoded signal. This approach offers two significant benefits. First, the recognition system is affected only by the quantization distortion of the spectral envelope, so we avoid the other sources of distortion introduced by the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective because it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, one of the most widely used coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks: speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure across a variety of simulated packet loss rates. Furthermore, the improvement grows as network conditions worsen.

Keywords :

Internet; Internet telephony; decoding; encoding; error handling; information resources; protocols; speech coding; speech recognition; ITU G.723.1 standard codec; Internet protocol environment; automatic speech recognition; encoded speech; encoding-decoding process; lossy speech coding; packet loss; quantization distortion; recognition feature vectors; robust front-end; simulated packet loss rates; speaker-independent continuous speech recognition; speaker-independent isolated digit recognition; world wide web; Codecs; Feature extraction; IP networks; Robustness

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/6046.923820

Filename :

923820

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1491368