DocumentCode
395510
Title
Online speech-reading system for Japanese language
Author
Tobely, T.El. ; Tsuruta, Naoyuki ; Amamiya, Makoto
Author_Institution
Dept. of Intelligent Syst., Kyushu Univ., Fukuoka, Japan
Volume
3
fYear
2002
fDate
18-22 Nov. 2002
Firstpage
1188
Abstract
In this paper, a speech-reading system dedicated to the Japanese language is introduced. The Japanese language has the unique feature that every letter ends with an English vowel character (a, i, u, e, or o). Using this feature, it is possible to quantize the continuous lip motion into a discrete sequence of vowels, which can then be used to classify the spoken sentences. This technique can achieve high accuracy, especially when the sentences to be recognized are defined in advance. In this paper, the quantization process is performed using the Hypercolumn neural network model (HCM), which consists of hierarchical layers of Hierarchical Self-Organizing Map (HSOM) neural networks arranged like the cell planes of the Neocognitron (NC) neural network. HCM can recognize images in which objects vary in size, position, and spatial resolution. However, because of its hierarchical structure, the HCM takes a long time to perform recognition. To achieve online recognition, a new competition algorithm for the recognition phase of the HCM is proposed. This algorithm selects a subset of the network codebook, which reduces the network recognition time to within the range of real-time rates. Results show that the system performs well in the online recognition of eight Japanese sentences.
Keywords
natural languages; neural nets; quantisation (signal); real-time systems; speech recognition; English vowel character; Japanese language; hierarchical structure; hypercolumn neural network model; neocognitron neural network; online speech recognition; quantization; speech-reading system; Active shape model; Cities and towns; Image recognition; Intelligent systems; Natural languages; Neural networks; Quantization; Self organizing feature maps; Spatial resolution; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
Print_ISBN
981-04-7524-1
Type
conf
DOI
10.1109/ICONIP.2002.1202809
Filename
1202809