DocumentCode
3735347
Title
Lip-based visual speech recognition system
Author
Aufaclav Zatu Kusuma Frisky;Chien-Yao Wang;Andri Santoso;Jia-Ching Wang
Author_Institution
Department of Computer Science and Information Engineering, National Central University, Taiwan, R.O.C.
fYear
2015
Firstpage
315
Lastpage
319
Abstract
This paper proposes a system for visual speech recognition based on recognizing visual lip movements through video content analysis. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step removes noise and enhances the contrast of every video frame. In the classification step, the extracted features are used to build a dictionary for a kernel sparse representation classifier (K-SRC). We adopt the non-negative matrix factorization (NMF) method to reduce the dimensionality of the extracted features. We evaluated the performance of our system on the AVLetters and AVLetters2 datasets, using the same configuration as previous works. On the AVLetters dataset, promising accuracies of 67.13%, 45.37%, and 63.12% are achieved in the semi-speaker-dependent, speaker-independent, and speaker-dependent cases, respectively. On the AVLetters2 dataset, our method achieves an accuracy of 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case. These results show that our proposed method outperforms other methods under the same configuration.
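The dimensionality-reduction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix here is random placeholder data standing in for the spatiotemporal lip descriptors, and the shapes (60 clips, 500-dimensional features, 32 components) are assumptions for demonstration only.

```python
# Hypothetical sketch of NMF-based dimensionality reduction on a matrix of
# non-negative spatiotemporal features, as in the abstract's pipeline.
# The data and dimensions are illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
features = rng.random((60, 500))  # 60 clips x 500-dim non-negative features

# Factorize features ~= W @ H; W gives low-dimensional codes per clip.
nmf = NMF(n_components=32, init="nndsvda", max_iter=500, random_state=0)
reduced = nmf.fit_transform(features)  # shape: (60, 32)

print(reduced.shape)
```

The reduced codes (rows of `reduced`) would then serve as the inputs from which the K-SRC dictionary is built in the classification step.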
Keywords
"Feature extraction","Kernel","Visualization","Speech recognition","Dictionaries","Testing","Mouth"
Publisher
ieee
Conference_Titel
2015 International Carnahan Conference on Security Technology (ICCST)
Print_ISBN
978-1-4799-8690-3
Electronic_ISBN
2153-0742
Type
conf
DOI
10.1109/CCST.2015.7389703
Filename
7389703
Link To Document