DocumentCode :
3752087
Title :
Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
Author :
Satoshi Tamura;Hiroshi Ninomiya;Norihide Kitaoka;Shin Osuga;Yurie Iribe;Kazuya Takeda;Satoru Hayamizu
Author_Institution :
Gifu University, Japan
fYear :
2015
Firstpage :
575
Lastpage :
582
Abstract :
This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features using deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition, and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. We found that VAD is useful in both the audio and visual modalities for better lipreading and AVSR.
Keywords :
"Visualization","Feature extraction","Speech recognition","Hidden Markov models","Discrete cosine transforms","Principal component analysis","Mouth"
Publisher :
IEEE
Conference_Titel :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type :
conf
DOI :
10.1109/APSIPA.2015.7415335
Filename :
7415335