• DocumentCode
    64745
  • Title
    Art Critic: Multisignal Vision and Speech Interaction System in a Gaming Context
  • Author
    Reale, Michael J. ; Peng Liu ; Lijun Yin ; Canavan, Shaun
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
  • Volume
    43
  • Issue
    6
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    1546
  • Lastpage
    1559
  • Abstract
    True immersion of a player within a game can only occur when the world simulated looks and behaves as close to reality as possible. This implies that the game must correctly read and understand, among other things, the player's focus, attitude toward the objects/persons in focus, gestures, and speech. In this paper, we proposed a novel system that integrates eye gaze estimation, head pose estimation, facial expression recognition, speech recognition, and text-to-speech components for use in real-time games. Both the eye gaze and head pose components utilize underlying 3-D models, and our novel head pose estimation algorithm uniquely combines scene flow with a generic head model. The facial expression recognition module uses the local binary patterns with three orthogonal planes approach on the 2-D shape index domain rather than the pixel domain, resulting in improved classification. Our system has also been extended to use a pan-tilt-zoom camera driven by the Kinect, allowing us to track a moving player. A test game, Art Critic, is also presented, which not only demonstrates the utility of our system but also provides a template for player/non-player character (NPC) interaction in a gaming context. The player alters his/her view of the 3-D world using head pose, looks at paintings/NPCs using eye gaze, and makes an evaluation based on the player's expression and speech. The NPC artist will respond with facial expression and synthetic speech based on its personality. Both qualitative and quantitative evaluations of the system are performed to illustrate the system's effectiveness.
  • Keywords
    cameras; computer games; face recognition; feature extraction; pose estimation; solid modelling; speech recognition; 2D shape index domain; 3D models; 3D world; Art Critic game; Kinect; eye gaze estimation; facial expression recognition; gaming context; generic head model; head pose estimation; local binary patterns; multisignal vision interaction system; orthogonal planes approach; pan-tilt-zoom camera; pixel domain; player attitude; player expression; player focus; player immersion; player speech; player-nonplayer character interaction; qualitative evaluation; quantitative evaluation; scene flow; speech interaction system; speech recognition; text-to-speech components; Cameras; Estimation; Face; Games; Solid modeling; Speech; Expression recognition; gaming interaction; gaze tracking; head pose estimation; speech recognition; text-to-speech;
  • fLanguage
    English
  • Journal_Title
    Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2267
  • Type
    jour
  • DOI
    10.1109/TCYB.2013.2271606
  • Filename
    6572826