Author_Institution :
Dept. of Comput. Sci. & Technol., Univ. of Sci. & Technol. Beijing, Beijing, China
Abstract :
After being kick-started by the major breakthroughs of Hinton, LeCun, and Bengio in 2006, deep learning has become the mainstream approach for challenging classification systems, which in the past always relied on "shallow" discriminative classifiers. In this paper, we argue that in common classification settings, where training examples are plentiful but still insufficient and of mixed quality across dozens of categories, deep learning and shallow classification may have complementary performance. We then design a hybrid recognition strategy that uses classification switching to adaptively fuse deep learning and shallow classification technologies. Finally, we present a variety of experiments on visual recognition tasks, namely USPS character recognition, Caltech101 visual object classification, and ICDAR scene text recognition. Specifically, we perform word recognition by dynamically combining a conventional open-source OCR engine with the currently popular convolutional neural networks, and construct an effective open-vocabulary, end-to-end scene text recognition system. This end-to-end system is evaluated on the ICDAR 2011 Robust Reading Competition (Challenge 2) dataset, where it achieves an F-measure of 54.5%, much better than the 45.2% of the latest state-of-the-art result.
Keywords :
image classification; learning (artificial intelligence); neural nets; optical character recognition; Caltech101 visual object classification; ICDAR scene text recognition; USPS character recognition; conventional open source OCR engine; convolutional neural networks; deep learning technology; discriminative classifiers; effective end-to-end scene text recognition system; hybrid recognition strategy; open-vocabulary; shallow classification technology; visual recognition tasks; Character recognition; Neural networks; Support vector machines; Switches; Text recognition; Training; Visualization;