Title :
Enhancement of VSR using low dimension visual feature
Author :
Upadhyaya, Parag ; Farooq, Omar ; Varshney, Praveen ; Upadhyaya, Ajay
Author_Institution :
Dept. of Electron. & Commun. Eng., SRI-Datia, Datia, India
Abstract :
This paper studies low-dimension visual (LDV) space features and investigates the improvement in audio-visual automatic speech recognition obtained with different sets of visual features. The experiment is divided into three phases: in the first, recognition is performed on 12 static DCT features; in the second, on a combination of 6 static and 6 dynamic features; and in the third, on 12 low-dimension DCT features. For this work, the Hindi AMUAV (Aligarh Muslim University Audio-Visual) database was developed, with audio sampled at 44.1 kHz and video recorded at 25 frames per second. The Hidden Markov Model (HMM) toolkit with left-right HMMs was used for recognition, and an overall improvement of 26.04% in word recognition is achieved with LDV space features.
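The first two feature sets named in the abstract (12 static DCT features, and 6 static plus 6 dynamic features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 16x16 mouth region of interest, the selection of coefficients by lowest anti-diagonal frequency, the use of a first-order frame difference as the dynamic feature, and all function names. The abstract does not describe how the low-dimension DCT features of the third phase are built, so they are not sketched here.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
    return m


def dct2(block):
    """Separable 2-D DCT-II of a 2-D array."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def low_freq_order(rows, cols):
    """Index pairs sorted by anti-diagonal (u + v), lowest frequencies first.

    Assumption: the paper's actual coefficient selection is not stated.
    """
    pairs = [(u, v) for u in range(rows) for v in range(cols)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))


def static_features(roi, count):
    """First `count` low-frequency DCT coefficients of one ROI frame."""
    coeffs = dct2(roi.astype(float))
    order = low_freq_order(*roi.shape)[:count]
    return np.array([coeffs[u, v] for u, v in order])


def static_plus_dynamic(frames, count):
    """Per frame: `count` static coefficients followed by `count` deltas.

    Assumption: the dynamic feature is a simple difference between
    consecutive frames; the first frame gets a zero delta.
    """
    static = np.stack([static_features(f, count) for f in frames])
    delta = np.diff(static, axis=0, prepend=static[:1])
    return np.hstack([static, delta])


# Hypothetical input: one second of 16x16 grayscale mouth ROIs at 25 fps.
rng = np.random.default_rng(0)
frames = rng.random((25, 16, 16))

phase1 = np.stack([static_features(f, 12) for f in frames])  # 12 static
phase2 = static_plus_dynamic(frames, 6)                      # 6 static + 6 dynamic
```

Both feature matrices have one 12-dimensional row per video frame, so they could be passed to the same HMM-based recognizer for comparison at equal dimensionality.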
Keywords :
audio-visual systems; discrete cosine transforms; feature extraction; hidden Markov models; speech recognition; Aligarh Muslim University; HMM; Hindi AMUAV database; LDV space features; VSR enhancement; audio sample; audio visual automatic speech recognition; audio-visual database; frequency 44.1 kHz; hidden Markov model; low dimension DCT feature; low dimension visual space features; static DCT features; visual speech recognition; word recognition; Databases; Discrete cosine transforms; Feature extraction; Hidden Markov models; Speech; Speech recognition; Visualization;
Conference_Title :
Multimedia, Signal Processing and Communication Technologies (IMPACT), 2013 International Conference on
Conference_Location :
Aligarh
Print_ISBN :
978-1-4799-1202-5
DOI :
10.1109/MSPCT.2013.6782090