• DocumentCode
    1059868
  • Title

    Automatic Detection of Disfluency Boundaries in Spontaneous Speech of Children Using Audio–Visual Information

  • Author

    Yildirim, Serdar ; Narayanan, Shrikanth

  • Author_Institution
    Dept. of Electr. Eng. & IMSC, Univ. of Southern California, Los Angeles, CA
  • Volume
    17
  • Issue
    1
  • fYear
    2009
  • Firstpage
    2
  • Lastpage
    12
  • Abstract
    The presence of disfluencies in spontaneous speech, while posing a challenge for robust automatic recognition, also offers a means of gaining additional insight into a speaker's communicative and cognitive state. This paper analyzes disfluencies in children's spontaneous speech, in the context of spoken-dialog-based computer game play, and addresses the automatic detection of disfluency boundaries. Although several approaches have been proposed to detect disfluencies in speech, relatively little work has been done to utilize visual information to improve the performance and robustness of the disfluency detection system. This paper describes the use of visual information along with prosodic and language information to detect the presence of disfluencies in a child's computer-directed speech and shows how these information sources can be integrated to increase the overall information available for disfluency detection. The experimental results on our children's multimodal dialog corpus indicate that disfluency detection accuracy of over 80% can be obtained by utilizing audio-visual information. Specifically, results showed that the addition of visual information to prosody and language features yields relative improvements in disfluency detection error rates of 3.6% and 6.3%, respectively, for information fusion at the feature level and decision level.
  • Keywords
    computer games; sensor fusion; speech recognition; audio-visual information; children spontaneous speech recognition; disfluency boundary automatic detection; information fusion; multimodal dialog corpus; prosodic information; spoken dialog based computer game play; Automatic speech recognition; Computer vision; Context; Engineering profession; Error analysis; Feature extraction; Natural languages; Robustness; Speech analysis; Speech processing; Disfluency detection; feature selection; information fusion; spoken language processing; spontaneous children speech;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2008.2006728
  • Filename
    4740159