Title :
Classifying laughter and speech using audio-visual feature prediction
Author :
Petridis, Stavros ; Asghar, Ali ; Pantic, Maja
Author_Institution :
Dept. of Comput., Imperial Coll. London, London, UK
Abstract :
In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship differs between speech and laughter. Neural networks are trained to learn the audio-to-visual and visual-to-audio feature mappings for both classes. Classification of a new frame is performed via prediction: every network produces a prediction of the expected audio/visual features, and the network with the best prediction, i.e., the model which best describes the audiovisual feature relationship, provides its label for the input frame. When trained on a simple dataset and tested on a hard dataset, the proposed approach outperforms audiovisual feature-level fusion, yielding an absolute increase of 10.9% in the F1 measure for laughter and 6.4% in the classification rate. This indicates that prediction-based classification can produce a good model even when the available training dataset is not particularly challenging.
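For illustration, the decision rule described in the abstract (label a frame with the class whose audio-to-visual and visual-to-audio networks best reconstruct the observed features) could be sketched as follows. This is a minimal sketch on synthetic data using scikit-learn's MLPRegressor; the feature dimensions, network sizes, and the summed-MSE combination rule are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of prediction-based classification (not the authors' code):
# feature dimensions, network sizes, and the decision rule are illustrative
# assumptions, and the data here is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
classes = ["laughter", "speech"]
shifts = {"laughter": 0.0, "speech": 1.0}
# Fixed per-class audio-to-visual coupling used only to generate toy data.
coupling = {c: rng.normal(size=(6, 4)) for c in classes}

def make_frames(c, n):
    """Hypothetical synthetic 6-D audio and 4-D visual frame features."""
    audio = rng.normal(shifts[c], 1.0, size=(n, 6))
    visual = np.tanh(audio @ coupling[c] + shifts[c])
    return audio, visual

# Train one audio-to-visual and one visual-to-audio regressor per class.
models = {}
for c in classes:
    a, v = make_frames(c, 500)
    a2v = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0).fit(a, v)
    v2a = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0).fit(v, a)
    models[c] = (a2v, v2a)

def classify(audio_frame, visual_frame):
    """Label the frame with the class whose models best predict its features."""
    errors = {}
    for c, (a2v, v2a) in models.items():
        err_v = np.mean((a2v.predict(audio_frame[None]) - visual_frame) ** 2)
        err_a = np.mean((v2a.predict(visual_frame[None]) - audio_frame) ** 2)
        errors[c] = err_v + err_a  # assumed combination rule: sum of both MSEs
    return min(errors, key=errors.get)

test_audio, test_visual = make_frames("laughter", 1)
print(classify(test_audio[0], test_visual[0]))  # expected to print "laughter"
```

In this toy setting the prediction error itself acts as the classifier score, mirroring the paper's idea that the class-specific model which best explains the audiovisual feature relationship supplies the label.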
Keywords :
audio signal processing; audio-visual systems; neural nets; pattern classification; speech processing; audio-to-visual feature mapping; audiovisual feature-level fusion; laughter; neural networks; prediction-based classification; speech feature; visual-to-audio feature mapping; Computer networks; Concatenated codes; Educational institutions; Neural networks; Performance analysis; Predictive models; Speech; Testing; audiovisual speech/laughter feature relationship; laughter-vs-speech discrimination; prediction-based classification
Conference_Title :
2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5494992