  • DocumentCode
    2406643
  • Title
    Development of Recognition System Using Fusion of Natural Gesture/Speech
  • Author
    Jung, Young-Giu ; Han, Mun-sung ; Park, Jun Seok ; Lee, Sang Jo
  • fYear
    2008
  • fDate
    9-13 Jan. 2008
  • Firstpage
    1
  • Lastpage
    2
  • Abstract
    A multimodal interface can achieve more natural and effective human-computer interaction. In this paper, we present an isolated-word recognizer using a fusion of speech and natural visual gestures. The fusion of audio and visual signals can be carried out either at the class level or the feature level. Our system incorporates a fusion system at the feature level which supports 10 natural gestures. One of the most difficult problems in feature-level fusion is synchronization between audio and visual features. To solve this problem, we propose a modified time delay neural network (TDNN) architecture with a dedicated fusion layer and optimize the parameters of this recognition model. Experimental results show that this system yields a performance improvement over automatic speech recognition (ASR) alone under various signal-to-noise ratio (SNR) conditions.
  • Keywords
    human computer interaction; image recognition; neural nets; speech recognition; user interfaces; audio signals; automatic speech recognition; human-computer interaction; multimodal interface; signal-to-noise rate conditions; time delay neural network; visual signals; Auditory system; Automatic speech recognition; Delay effects; Face detection; Feature extraction; Humans; Lips; Neural networks; Speech recognition; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics, 2008. ICCE 2008. Digest of Technical Papers. International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4244-1458-1
  • Electronic_ISBN
    978-1-4244-1459-8
  • Type
    conf
  • DOI
    10.1109/ICCE.2008.4588016
  • Filename
    4588016