Title :
Insights into machine lip reading
Author :
Lan, Yuxuan ; Harvey, Richard ; Theobald, Barry-John
Author_Institution :
Sch. of Comput. Sci., Univ. of East Anglia, Norwich, UK
Abstract :
Computer lip-reading is one of the great signal processing challenges. Not only is the signal noisy, it is also variable. However, the performance of automatic systems is almost never compared with that of human lip-readers. Partly this is because of the paucity of human lip-readers and partly because most automatic systems only handle data that are trivial and therefore not representative of human speech. Here we generate a multiview dataset using connected words that can be analysed both by an automatic system, based on linear predictive trackers and active appearance models, and by human lip-readers. The automatic system we devise has a viseme accuracy of ≈ 46%, which is comparable to poor professional human lip-readers. However, unlike human lip-readers, our system is good at guessing its fallibility.
Keywords :
signal processing; speech processing; speech recognition; active appearance model; automatic system; computer lip reading; human speech; linear predictive trackers; machine lip reading; multiview dataset; professional human lip readers; signal noisy; viseme accuracy; Accuracy; Hidden Markov models; Humans; Speech; Visualization; automated lip-reading; visual speech
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288999