Title :
Audiovisual classification of vocal outbursts in human conversation using Long-Short-Term Memory networks
Author :
Eyben, Florian ; Petridis, Stavros ; Schuller, Björn ; Tzimiropoulos, George ; Zafeiriou, Stefanos ; Pantic, Maja
Author_Institution :
Inst. for Human-Machine Commun., Tech. Univ. Munchen, Munich, Germany
Abstract :
We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) recurrent neural networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, as used in this year's Paralinguistic Challenge. For video-based analysis, we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
Keywords :
audio signal processing; audio-visual systems; recurrent neural nets; support vector machines; video signal processing; LSTM networks; audiovisual approach; audiovisual classification; human conversation; human-to-human natural conversation; long short-term memory; long-short-term memory networks; nonlinguistic vocalisations; recurrent speech processing neural networks; video-based analysis; Acoustics; Face; Feature extraction; Recurrent neural networks; Shape; Speech recognition; Visualization; Audio-visual Processing; Laughter; Long Short-Term Memory; Non-linguistic Vocalisations
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947690