  • DocumentCode
    1133436
  • Title
    An introduction to voice search
  • Author
    Wang, Ye-Yi; Yu, Dong; Ju, Yun-Cheng; Acero, Alex
  • Author_Institution
    Shanghai Jiao Tong Univ., Shanghai
  • Volume
    25
  • Issue
    3
  • fYear
    2008
  • fDate
    5/1/2008
  • Firstpage
    28
  • Lastpage
    38
  • Abstract
    Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request through a spoken query. The information normally resides in a large database, and the query must be compared with a field in that database to retrieve the relevant information. The contents of the field, such as business or product names, are often unstructured text. This article categorizes spoken dialog technology into form filling, call routing, and voice search, and reviews voice search technology. The categorization is made from a technological perspective; a single SDS may apply technology from multiple categories. Robustness is the central issue in voice search. Acoustic modeling research aims to improve robustness to environmental noise, different channel conditions, and speaker variance; pronunciation research addresses unseen word pronunciations and pronunciation variance; language model research focuses on linguistic variance; search research improves robustness to linguistic variance and ASR errors; dialog management research enables graceful recovery from confusions and understanding errors; and learning in the feedback loop speeds up system tuning for more robust performance. Although voice search has advanced tremendously over the past decade, significant challenges remain: many voice search dialog systems have automation rates around or below 50% in field trials.
  • Keywords
    interactive systems; query processing; speaker recognition; acoustic modeling; automatic speech recognition error; channel condition; dialog management; environment noise; large database; linguistic variance; pronunciation research; speaker variance; spoken dialog system categorization; voice search; Acoustic noise; Automatic speech recognition; Databases; Environmental management; Filling; Loudspeakers; Natural languages; Noise robustness; Routing; Working environment noise
  • fLanguage
    English
  • Journal_Title
    IEEE Signal Processing Magazine
  • Publisher
    IEEE
  • ISSN
    1053-5888
  • Type
    jour
  • DOI
    10.1109/MSP.2008.918411
  • Filename
    4490199