An introduction to voice search

Author

Wang, Ye-Yi ; Yu, Dong ; Ju, Yun-Cheng ; Acero, Alex

Author_Institution

Shanghai Jiao Tong Univ., Shanghai

Volume

25

Issue

3

fYear

2008

fDate

5/1/2008 12:00:00 AM

Firstpage

28

Lastpage

38

Abstract

Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request through a spoken query. The information normally resides in a large database, and the query has to be compared with a field in the database to obtain the relevant information. The contents of the field, such as business or product names, are often unstructured text. This article categorizes spoken dialog technology into form filling, call routing, and voice search, and reviews voice search technology. The categorization is made from a technological perspective; a single SDS may apply technology from multiple categories. Robustness is the central issue in voice search. Acoustic modeling aims at improved robustness to environment noise, different channel conditions, and speaker variance; pronunciation research addresses unseen word pronunciations and pronunciation variance; language model research focuses on linguistic variance; studies in search improve robustness to linguistic variance and ASR errors; dialog management research enables graceful recovery from confusions and understanding errors; and learning in the feedback loop speeds up system tuning for more robust performance. While much has been achieved in voice search over the past decade, large challenges remain. Many voice search dialog systems have automation rates around or below 50% in field trials.

Keywords

interactive systems; query processing; speaker recognition; acoustic modeling; automatic speech recognition error; channel condition; dialog management; environment noise; large database; linguistic variance; pronunciation research; speaker variance; spoken dialog system categorization; voice search; Acoustic noise; Automatic speech recognition; Databases; Environmental management; Filling; Loudspeakers; Natural languages; Noise robustness; Routing; Working environment noise

fLanguage

English

Journal_Title

Signal Processing Magazine, IEEE

Publisher

IEEE

ISSN

1053-5888

Type

jour

DOI

10.1109/MSP.2008.918411

Filename

4490199