• DocumentCode
    542686
  • Title

    Visual speech feature extraction for improved speech recognition

  • Author

    Zhang, X. ; Mersereau, R.M. ; Clements, M. ; Broun, C.C.

  • Author_Institution
    Center for Signal & Image Processing, Georgia Institute of Technology, Atlanta, 30332-0250, USA
  • Volume
    2
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans´ ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved accuracy over totally acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on the recognition accuracy. We conclude that the inner lip contour features together with the information about the visibility of the tongue and teeth significantly improve the performance over using outer contour only features in both speaker dependent and speaker independent recognition tasks.
  • Keywords
    Feature extraction; Hidden Markov models;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.2002.5745022
  • Filename
    5745022