DocumentCode
730369
Title
Multimodal addressee detection in multiparty dialogue systems
Author
Tsai, T.J.; Stolcke, Andreas; Slaney, Malcolm
Author_Institution
Univ. of California, Berkeley, Berkeley, CA, USA
fYear
2015
fDate
19-24 April 2015
Firstpage
2314
Lastpage
2318
Abstract
Addressee detection answers the question, “Are you talking to me?” When multiple users interact with a dialogue system, it is important to know when a user is speaking to the computer and when he or she is speaking to another person. We approach this problem from a multimodal perspective, using lexical, acoustic, visual, dialog-state, and beamforming information. Using data from a multiparty dialogue system, we demonstrate the benefit of using multiple modalities over using a single modality. We also assess the relative importance of the various modalities in predicting the addressee. In our experiments, we find that acoustic features are by far the most important, that ASR and system-state information are useful, and that visual and beamforming features provide little additional benefit. Our study suggests that acoustic, lexical, and system-state information are an effective, economical combination of modalities to use in addressee detection.
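To make the abstract's fusion setup concrete, below is a minimal sketch of one plausible reading of it: per-modality feature vectors for each utterance are concatenated (early fusion) and fed to a single binary classifier that labels speech as computer-directed or human-directed. The paper does not specify this exact pipeline; the feature dimensions, feature names, and data here are synthetic placeholders, not the authors' actual features or dataset.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_utts = 500  # synthetic utterance count, placeholder only

# Hypothetical per-utterance feature blocks, one per modality.
acoustic = rng.normal(size=(n_utts, 20))             # e.g., energy/pitch statistics
lexical = rng.normal(size=(n_utts, 10))              # e.g., ASR-derived n-gram scores
system_state = rng.integers(0, 2, size=(n_utts, 5))  # e.g., dialog-state flags

# Early fusion: concatenate modality blocks into one feature vector per utterance.
X = np.hstack([acoustic, lexical, system_state.astype(float)])
y = rng.integers(0, 2, size=n_utts)  # 1 = computer-directed, 0 = human-directed (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

Dropping or adding modality blocks in the np.hstack call is one simple way to reproduce the kind of ablation the abstract describes, i.e., measuring how much each modality contributes to addressee prediction.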
Keywords
interactive systems; speech processing; acoustic information; beam-forming information; dialog state information; lexical information; multimodal addressee detection; multimodal perspective; multiparty dialogue system; single modality; visual information; Acoustics; Computational modeling; Computers; Face; Feature extraction; Speech; Visualization; addressee detection; dialog system; human-human-computer; multimodality; multiparty
fLanguage
English
Publisher
ieee
Conference_Title
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location
South Brisbane, QLD, Australia
Type
conf
DOI
10.1109/ICASSP.2015.7178384
Filename
7178384