• DocumentCode
    62163
  • Title
    Exploiting Psychological Factors for Interaction Style Recognition in Spoken Conversation
  • Author
    Wen-Li Wei ; Chung-Hsien Wu ; Jen-Chun Lin ; Han Li
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    22
  • Issue
    3
  • fYear
    2014
  • fDate
    Mar-14
  • Firstpage
    659
  • Lastpage
    671
  • Abstract
    Determining how a speaker is engaged in a conversation is crucial for achieving harmonious interaction between computers and humans. In this study, a fusion approach was developed based on psychological factors to recognize Interaction Style (IS) in spoken conversation, which plays a key role in creating natural dialogue agents. The proposed Fused Cross-Correlation Model (FCCM) provides a unified probabilistic framework to model the relationships among the psychological factors of emotion, personality trait (PT), transient IS, and IS history, for recognizing IS. An emotional arousal-dependent speech recognizer was used to obtain the recognized spoken text for extracting linguistic features to estimate transient IS likelihood and recognize PT. A temporal course modeling approach and an emotional sub-state language model, based on the temporal phases of an emotional expression, were employed to obtain a better emotion recognition result. The experimental results indicate that the proposed FCCM yields satisfactory results in IS recognition and also demonstrate that combining psychological factors effectively improves IS recognition accuracy.
  • Keywords
    correlation methods; emotion recognition; feature extraction; probability; speaker recognition; FCCM; emotion recognition; emotional arousal-dependent speech recognizer; emotional sub-state language model; fused cross-correlation model; fusion approach; harmonious interaction; interaction style recognition; linguistic feature extraction; natural dialogue agents; psychological factors; spoken conversation; temporal course modeling approach; temporal phases; transient IS likelihood; unified probabilistic framework; Accuracy; Emotion recognition; Feature extraction; Psychology; Speech; Speech recognition; Text recognition; Emotion; interaction style; language model; personality trait; temporal phase;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2300339
  • Filename
    6714384