DocumentCode
3735347
Title
Lip-based visual speech recognition system
Author
Aufaclav Zatu Kusuma Frisky;Chien-Yao Wang;Andri Santoso;Jia-Ching Wang
Author_Institution
Department of Computer Science and Information Engineering, National Central University, Taiwan, R.O.C.
fYear
2015
Firstpage
315
Lastpage
319
Abstract
This paper proposes a system for visual speech recognition based on recognizing visual lip movements through video content analysis. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step removes noise and enhances the contrast of every video frame. In the classification step, the extracted features are used to build a dictionary for a kernel sparse representation classifier (K-SRC). We adopt the non-negative matrix factorization (NMF) method to reduce the dimensionality of the extracted features. We evaluated the performance of our system on the AVLetters and AVLetters2 datasets, using the same configuration as previous works. On the AVLetters dataset, promising accuracies of 67.13%, 45.37%, and 63.12% are achieved in the semi-speaker-dependent, speaker-independent, and speaker-dependent cases, respectively. On the AVLetters2 dataset, our method achieves an accuracy of 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case. These results show that our proposed method outperforms other methods under the same configuration.
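The dimensionality-reduction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix here is random placeholder data standing in for the spatiotemporal lip descriptors, and the shapes (60 clips, 500-dimensional features, 32 components) are assumptions for demonstration only.

```python
# Hypothetical sketch of NMF-based dimensionality reduction on a matrix of
# non-negative spatiotemporal features, as in the abstract's pipeline.
# The data and dimensions are illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
features = rng.random((60, 500))  # 60 clips x 500-dim non-negative features

# Factorize features ~= W @ H; W gives low-dimensional codes per clip.
nmf = NMF(n_components=32, init="nndsvda", max_iter=500, random_state=0)
reduced = nmf.fit_transform(features)  # shape: (60, 32)

print(reduced.shape)
```

The reduced codes (rows of `reduced`) would then serve as the inputs from which the K-SRC dictionary is built in the classification step.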
Keywords
"Feature extraction","Kernel","Visualization","Speech recognition","Dictionaries","Testing","Mouth"
Publisher
ieee
Conference_Titel
2015 International Carnahan Conference on Security Technology (ICCST)
Print_ISBN
978-1-4799-8690-3
Electronic_ISBN
2153-0742
Type
conf
DOI
10.1109/CCST.2015.7389703
Filename
7389703
Link To Document