Title :
Stabilize Sequence Learning with Recurrent Neural Networks by Forced Alignment
Author :
Schambach, Marc-Peter ; Rashid, Sheikh Faisal
Author_Institution :
Siemens AG, Konstanz, Germany
Abstract :
Cursive handwriting recognition remains an active topic of research, especially for non-Latin scripts. One of the best-performing techniques is based on recurrent neural networks: neurons are modeled by long short-term memory (LSTM) cells, and the alignment of the label sequence to the output sequence is performed by a connectionist temporal classification (CTC) layer. However, network training is time consuming, unstable, and prone to over-adaptation. One reason is the bootstrap process, which aligns the label data more or less randomly in early training iterations. As a consequence, the emission peak positions within a character end up in unpredictable locations, whereas positions near the center of a character are more desirable: in theory, they better model the properties of a character. The solution presented here is to guide the back-propagation training in early iterations: character alignment is enforced by replacing the forward-backward alignment with fixed character positions, either pre-segmented or equally distributed. After a number of guided iterations, training may be continued with standard dynamic alignment. A series of experiments is performed to answer the following questions: Can peak positions be controlled in the long run? Can the number of training iterations be reduced, yielding results faster? Is training more stable? And finally: Do defined character positions lead to better recognition performance?
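The "equally distributed" variant of the forced alignment described in the abstract can be sketched as follows: each character of the label sequence is pinned to the center of an equal-width segment of the network's output sequence, and all remaining frames are assigned the CTC blank. This is an illustrative sketch of the idea only, not the authors' implementation; the function name and the blank index of 0 are assumptions.

```python
import numpy as np

def forced_alignment_targets(labels, num_frames, blank=0):
    """Build framewise targets that pin each character's emission peak
    to the center of an equally sized segment of the output sequence.
    Such fixed targets replace the CTC forward-backward alignment during
    the early ("guided") iterations; afterwards, training can switch
    back to standard dynamic alignment.  (Hypothetical sketch.)
    """
    targets = np.full(num_frames, blank, dtype=int)
    seg = num_frames / len(labels)       # width of one character segment
    for i, lab in enumerate(labels):
        peak = int((i + 0.5) * seg)      # segment center = forced peak position
        targets[peak] = lab
    return targets

# Example: 3 character labels spread over 9 output frames
print(forced_alignment_targets([3, 5, 7], 9))
```

With pre-segmented data, the segment centers would instead come from the known character boundaries; the rest of the construction is identical.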
Keywords :
backpropagation; handwriting recognition; image classification; neural nets; CTC layer; LSTM cells; back-propagation training; bootstrap process; character alignment; connectionist temporal classification layer; cursive handwriting recognition; emission peak positions; forced alignment; forward-backward alignment; label sequence alignment; long short-term memory cell; non-Latin scripts; output sequence; recurrent neural networks; sequence learning stabilization; standard dynamic alignment; Convergence; Hidden Markov models; Recurrent neural networks; Standards; Training; Training data
Conference_Titel :
2013 12th International Conference on Document Analysis and Recognition (ICDAR)
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.257