Title :
Visual Speech Detection Using an Unsupervised Learning Framework
Author :
Ahmad, Rabiah ; Raza, Syed Paymaan ; Malik, Haroon
Author_Institution :
ECE Dept., Univ. of Michigan - Dearborn, Dearborn, MI, USA
Abstract :
This paper presents an unsupervised learning framework for visual speech detection. A bimodal Gaussian mixture model (GMM) is used to model visual features, i.e., mouth region intensity, which varies during speech. Variation in the mouth region intensity is used to classify frames as visual speech or non-speech. The GMM parameters are estimated using the expectation-maximization (EM) algorithm. Performance of the proposed algorithm is evaluated on a dataset of 14 video clips containing almost 20,000 frames, and is also compared with existing state-of-the-art methods. Experimental results show that the proposed method achieves a high detection rate and a low false alarm rate.
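The idea in the abstract, fitting a two-component GMM with EM and using it to separate speech from non-speech frames, can be illustrated with a minimal sketch. This is not the authors' implementation. The feature (a per-frame scalar measuring mouth region intensity variation), the synthetic data, the function names `gaussian_pdf`, `fit_bimodal_gmm`, and `classify`, and the rule that the component with the larger mean corresponds to speech are all assumptions made for illustration only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_bimodal_gmm(values, iterations=100):
    """Fit a two-component 1-D GMM with EM (illustrative sketch only)."""
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, 1e-6)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability of each component for every sample.
        resp = []
        for v in values:
            dens = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)
    return weights, means, variances


def classify(values, weights, means, variances):
    """Label a frame 1 (speech) if the larger-mean component is more likely.

    ASSUMPTION: speech frames show larger intensity variation than non-speech.
    """
    speech = 0 if means[0] > means[1] else 1
    labels = []
    for v in values:
        dens = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
        labels.append(1 if dens[speech] >= dens[1 - speech] else 0)
    return labels


# Synthetic stand-in for per-frame mouth region intensity variation
# (hypothetical values, not data from the paper).
random.seed(0)
truth = [0] * 300 + [1] * 300
features = [random.gauss(1.0, 0.3) for _ in range(300)] + \
           [random.gauss(5.0, 1.0) for _ in range(300)]

weights, means, variances = fit_bimodal_gmm(features)
labels = classify(features, weights, means, variances)
accuracy = sum(int(a == b) for a, b in zip(labels, truth)) / len(truth)
print("weights:", weights, "means:", means, "accuracy:", accuracy)
```

No labels are used during fitting, which is what makes the approach unsupervised; the ground-truth list in the sketch serves only to check the result on synthetic data.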
Keywords :
Gaussian processes; expectation-maximisation algorithm; feature extraction; image classification; learning (artificial intelligence); mixture models; video signal processing; EM algorithm; Gaussian mixture model; bimodal GMM parameter estimation; detection rates; expectation maximization algorithm; false alarm rates; mouth region intensity; performance evaluation; unsupervised learning framework; video clips; visual feature modelling; visual nonspeech classification; visual speech classification; visual speech detection; Cavity resonators; Signal processing algorithms; Speech; Teeth; Video sequences; Visualization; Expectation Maximization (EM); Gaussian Mixture Model (GMM); Unsupervised Learning; Voice Activity Detection (VAD);
Conference_Title :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICMLA.2013.171