DocumentCode
2142492
Title
A Handwritten Character Extraction Algorithm for Multi-language Document Image
Author
Song, Yonghong ; Xiao, Guilin ; Zhang, Yuanlin ; Yang, Lei ; Zhao, Liuliu
Author_Institution
Inst. of Artificial Intell. & Robot., Xi'an Jiaotong Univ., Xi'an, China
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
93
Lastpage
98
Abstract
In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g. Chinese, English, Japanese, or a mixture of them. First, text patches in the document image are segmented by connected component analysis, with the rules for merging connected components chosen according to the results of language identification. Features are then extracted for each basic analysis unit, the text patch. A genetic algorithm is applied for feature fusion and patch type classification. Finally, a Markov Random Field model is used as a post-processing step to correct misclassified text patch types by considering the document context. Experimental results show that the proposed algorithm clearly improves the performance of handwritten character extraction.
Keywords
Markov processes; document image processing; genetic algorithms; handwritten character recognition; image classification; Markov random field model; connected component analysis; feature fusion; genetic algorithm; handwritten character extraction; language identification; multilanguage document image; patch type classification; unit-text patch; Feature extraction; Genetic algorithms; Image segmentation; Markov random fields; Merging; Text analysis; Vectors; Markov random field; document segmentation; feature fusion; handwritten character extraction; multi-language
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.28
Filename
6065283