Title :
Combining acoustic and visual modalities in vowel recognition system for laryngectomees
Author :
Pietruch, Rafal ; Grzanka, Antoni
Author_Institution :
Ind. Res. Inst. for Autom. & Meas., Warsaw, Poland
Abstract :
This paper addresses the problem of vowel recognition in patients after total laryngectomy using combined visual and acoustic features. Linear prediction coefficients were estimated from the speech signal using a weighted recursive least squares algorithm, and ten cross-sectional areas of a vocal tract model were calculated from them. Facial expression parameters related to the spoken vowel were extracted from video recordings: lip width, lip height and jaw opening were measured from captured video frames. Principal component analysis was applied to show correlations between auditory and visual features. The vowel recognition procedures were based on single hidden layer neural networks. The recognition performance of the visual, acoustic and fused modalities was compared. It was shown that the recognition performance for sustained vowels using estimates of 10 cross-sectional areas is very low. Facial expression analysis is needed when standard acoustic parameters of pathological speech are difficult to estimate.
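As an illustration of the acoustic front end mentioned in the abstract, the following is a minimal sketch of linear prediction coefficient estimation with a recursive least squares update. It assumes that "weighted" means exponential forgetting; the abstract does not specify the weighting scheme, and the function name `rls_lpc` and the parameters `lam` (forgetting factor) and `delta` (initial covariance scale) are hypothetical, not taken from the paper.

```python
import numpy as np

def rls_lpc(signal, order=10, lam=0.99, delta=100.0):
    """Estimate linear prediction coefficients with exponentially
    weighted RLS (an assumed weighting, not confirmed by the paper).

    Returns a such that signal[n] is approximated by
    sum(a[k] * signal[n - 1 - k] for k in range(order)).
    """
    signal = np.asarray(signal, dtype=float)
    a = np.zeros(order)            # current coefficient estimate
    P = np.eye(order) * delta      # inverse correlation matrix estimate
    for n in range(order, len(signal)):
        x = signal[n - order:n][::-1]       # past samples, most recent first
        err = signal[n] - a @ x             # a priori prediction error
        k = P @ x / (lam + x @ P @ x)       # gain vector
        a = a + k * err                     # coefficient update
        P = (P - np.outer(k, x @ P)) / lam  # covariance update with forgetting
    return a

# Usage on a synthetic second-order autoregressive signal.
rng = np.random.default_rng(0)
s = np.zeros(2000)
for n in range(2, len(s)):
    s[n] = 1.5 * s[n - 1] - 0.7 * s[n - 2] + rng.standard_normal()
coeffs = rls_lpc(s, order=2, lam=1.0)
```

With `lam=1.0` the update reduces to ordinary recursive least squares; values slightly below 1 let the estimate track slowly varying speech at the cost of higher variance.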
Keywords :
face recognition; feature extraction; least squares approximations; neural nets; speech recognition; video signal processing; acoustic modality; face expression extraction; facial expression analysis; laryngectomy patients; linear prediction coefficients; pathological speech; principal component analysis; single hidden layer neural networks; speech signal estimation; video recording extraction; visual modality; vowel recognition system; weighted recursive least squares algorithm; Acoustics; Artificial neural networks; Face; Feature extraction; Speech; Speech recognition; Visualization;
Conference_Title :
Neural Network Applications in Electrical Engineering (NEUREL), 2010 10th Symposium on
Conference_Location :
Belgrade
Print_ISBN :
978-1-4244-8821-6
DOI :
10.1109/NEUREL.2010.5644075