Title :
Kanji Character Detection from Complex Real Scene Images based on Character Properties
Author :
Xu, Lianli ; Nagayoshi, Hiroto ; Sako, Hiroshi
Author_Institution :
Economic Dept. of the French Embassy in China, High Technol. Sect., Beijing
Abstract :
Character recognition in complex real scene images is a very challenging undertaking. The most popular approach is to segment the text area using some extra prior knowledge, such as "characters are in a signboard". This approach makes it possible to construct a very time-consuming method, but generality is still a problem. In this paper, we propose a more general method that utilizes only character features. Our algorithm consists of five steps: pre-processing to extract connected components, initial classification using primitive rules, strong classification using AdaBoost, Markov random field (MRF) clustering to combine connected components with similar properties, and post-processing using optical character recognition (OCR) results. Experiments on 11 images containing 1691 characters (including characters in poor condition) demonstrated the effectiveness of the proposed system: 52.9% of the characters were extracted correctly, while 625 noise components were falsely extracted as characters.
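The first two steps of the five-step pipeline (connected-component extraction and initial classification with primitive rules) can be illustrated with a minimal sketch. It assumes a binarized image given as a 2-D list of 0/1 values and uses 8-connectivity. The function names (connected_components, passes_primitive_rules) and the size, aspect-ratio and density thresholds are hypothetical illustrations, not the authors' actual rules. The AdaBoost, MRF clustering and OCR stages are not shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    """Bounding box (inclusive) and foreground pixel count of one region."""
    top: int
    left: int
    bottom: int
    right: int
    area: int

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1

    @property
    def width(self) -> int:
        return self.right - self.left + 1

    @property
    def density(self) -> float:
        # Fraction of the bounding box covered by foreground pixels.
        return self.area / (self.height * self.width)


def connected_components(binary):
    """Step 1 (sketch): find 8-connected foreground regions by BFS."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours that are foreground and unvisited.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(Component(top, left, bottom, right, area))
    return comps


def passes_primitive_rules(comp, min_size=3, max_aspect=4.0, min_density=0.1):
    """Step 2 (sketch): cheap rules that discard obvious non-characters.

    The thresholds are hypothetical placeholders, not the paper's values.
    """
    long_side = max(comp.height, comp.width)
    short_side = min(comp.height, comp.width)
    if long_side < min_size:                 # too small to be a character
        return False
    if long_side / short_side > max_aspect:  # too elongated (e.g. a line)
        return False
    return comp.density >= min_density       # too sparse to hold strokes


# Usage: a ring-shaped blob, an isolated dot and a long horizontal line.
image = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
comps = connected_components(image)
candidates = [c for c in comps if passes_primitive_rules(c)]
```

In this toy input only the ring survives: the dot is rejected as too small and the line as too elongated. 8-connectivity keeps diagonal stroke segments in one component; a real system would more likely call a library routine such as OpenCV's connectedComponentsWithStats and pass the surviving candidates on to a trained classifier.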
Keywords :
Markov processes; image segmentation; optical character recognition; text analysis; AdaBoost; Kanji character detection; Markov random field clustering; character properties; complex real scene images; text area segmentation; time-consuming method; Background noise; Character recognition; Clustering algorithms; Image analysis; Layout; Markov random fields; Optical character recognition software; Optical noise; Poles and towers; Character Detection; MRF; OCR;
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
DOI :
10.1109/DAS.2008.34