Regional Information Center for Science and Technology (RICeST) - Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

DocumentCode :

1253782

Title :

Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Author :

Yamada, Takeshi ; Nakamura, Satoshi ; Shikano, Kiyohiro

Author_Institution :

Inst. of Inf. Sci. & Electron., Tsukuba Univ., Ibaraki, Japan

Volume :

Issue :

fYear :

2002

fDate :

2/1/2002

Firstpage :

Lastpage :

Abstract :

This paper focuses on microphone arrays to realize distant-talking speech recognition in real environments. In distant-talking situations, users can speak at arbitrary positions while moving. Therefore, it is very important for high-quality speech acquisition using microphone arrays to localize a talker accurately. However, it is very difficult to localize a moving talker in noisy and reverberant environments. Talker localization errors result in performance degradation of speech recognition. One way to solve this problem is to integrate the speech recognition process and the talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam to every direction in every frame, then finds the most likely path in a 3-D trellis space composed of talker directions, input frames, and HMM states. This means that speech recognition and talker localization are performed simultaneously within a statistical framework. To evaluate the performance of the 3-D Viterbi method, recognition experiments on real-environment data were carried out. The results confirmed that the 3-D Viterbi method drastically improves the recognition performance for the moving-talker case as well as for the fixed-position talker case.
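The search over the 3-D trellis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the beamforming and feature extraction steps have already produced a table `loglik[t][d][s]`, the log-likelihood of HMM state `s` for the parameter vector obtained by steering the beam to direction `d` in frame `t`. The function name `viterbi_3d`, the helper `safe_log`, the direction-transition table `log_c`, and the assumption that direction and state transitions are scored independently are all hypothetical choices made for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_3d(loglik, log_a, log_c, log_pi_state, log_pi_dir):
    """Find the most likely (direction, state) path through a 3-D trellis.

    loglik[t][d][s] : log-likelihood of state s for the features taken from
                      the beam steered to direction d in frame t (assumed given)
    log_a[sp][s]    : log HMM state-transition probability sp -> s
    log_c[dp][d]    : log direction-transition probability dp -> d (assumed)
    log_pi_state[s] : log initial state probability
    log_pi_dir[d]   : log initial direction probability
    Returns (best_log_score, path), path being one (d, s) pair per frame.
    """
    T, D, S = len(loglik), len(loglik[0]), len(loglik[0][0])

    # Initialisation: frame 0, every (direction, state) cell.
    delta = [[log_pi_dir[d] + log_pi_state[s] + loglik[0][d][s]
              for s in range(S)] for d in range(D)]
    backpointers = []

    # Recursion: each cell takes the best predecessor over all (dp, sp).
    for t in range(1, T):
        new_delta = [[NEG_INF] * S for _ in range(D)]
        ptr = [[None] * S for _ in range(D)]
        for d in range(D):
            for s in range(S):
                best, arg = NEG_INF, None
                for dp in range(D):
                    for sp in range(S):
                        score = delta[dp][sp] + log_c[dp][d] + log_a[sp][s]
                        if score > best:
                            best, arg = score, (dp, sp)
                new_delta[d][s] = best + loglik[t][d][s]
                ptr[d][s] = arg
        delta = new_delta
        backpointers.append(ptr)

    # Termination: best cell in the last frame.
    best, end = NEG_INF, None
    for d in range(D):
        for s in range(S):
            if delta[d][s] > best:
                best, end = delta[d][s], (d, s)
    if end is None:
        raise ValueError("no path with non-zero probability")

    # Backtracking: recover the direction sequence and state sequence jointly.
    path = [end]
    for ptr in reversed(backpointers):
        d, s = path[-1]
        path.append(ptr[d][s])
    path.reverse()
    return best, path
```

The returned path gives a talker direction and an HMM state for every frame, which is the sense in which localization and recognition are solved in one pass. This naive version costs on the order of T·D²·S² operations for T frames, D directions and S states; a practical system would likely prune the search, but no pruning strategy is assumed here.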

Keywords :

acoustic transducer arrays; direction-of-arrival estimation; hidden Markov models; maximum likelihood estimation; microphones; search problems; speech recognition; 3-D Viterbi search; 3-D trellis space; HMM states; direction-time sequence; distant-talking speech recognition; fixed-position talker; high quality speech acquisition; input frames; microphone array; moving talker; speech recognition algorithm; talker directions; talker localization errors; Additive noise; Degradation; Hidden Markov models; Microphone arrays; Neural networks; Speech enhancement; Speech recognition; Viterbi algorithm; Wiener filter; Working environment noise;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.985542

Filename :

985542

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1253782