• DocumentCode
    2142492
  • Title
    A Handwritten Character Extraction Algorithm for Multi-language Document Image
  • Author
    Song, Yonghong ; Xiao, Guilin ; Zhang, Yuanlin ; Yang, Lei ; Zhao, Liuliu
  • Author_Institution
    Inst. of Artificial Intell. & Robot., Xi'an Jiaotong Univ., Xi'an, China
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    93
  • Lastpage
    98
  • Abstract
    In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g., Chinese, English, Japanese, or a mixture of them. First, text patches in the document image are segmented by connected component analysis, and the rules for merging connected components are chosen according to the result of language identification. Then features are extracted for each text patch, the basic unit of analysis. A genetic algorithm is applied for feature fusion and patch type classification. Finally, a Markov Random Field model is used as a post-processing step to correct misclassified text patch types by taking the document context into account. Experimental results show that the proposed algorithm clearly improves the performance of handwritten character extraction.
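The abstract names a Markov Random Field as a post-processing step that corrects patch labels using document context, but this record does not give the paper's energy terms. The following is therefore only a minimal sketch under stated assumptions, not the authors' model: each text patch has a per-label cost from some classifier (`unary`, hypothetical), adjacent patches are linked in a hypothetical `neighbors` map, and a Potts penalty `beta` (hypothetical parameter) discourages neighboring patches from disagreeing. The energy is minimized approximately with iterated conditional modes (ICM), a standard MRF solver chosen here for simplicity.

```python
from typing import Dict, List, Sequence


def icm_smooth(
    unary: Sequence[Sequence[float]],
    neighbors: Dict[int, List[int]],
    beta: float = 1.0,
    max_iters: int = 10,
) -> List[int]:
    """Relabel text patches by ICM on a Potts MRF (illustrative sketch).

    unary[i][k]: assumed cost of giving patch i label k (e.g. -log p from a classifier).
    neighbors[i]: assumed list of patches adjacent to patch i in the document.
    """
    n = len(unary)
    num_labels = len(unary[0])
    # Start from independent per-patch decisions (lowest unary cost).
    labels = [min(range(num_labels), key=lambda k: unary[i][k]) for i in range(n)]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            def local_energy(k: int) -> float:
                # Potts term: add beta for every neighbor whose label differs from k.
                disagree = sum(1 for j in neighbors.get(i, []) if labels[j] != k)
                return unary[i][k] + beta * disagree

            best = min(range(num_labels), key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels


# Hypothetical example: five patches in a chain; label 0 = printed, 1 = handwritten.
unary = [[0.1, 2.0], [0.2, 1.5], [0.9, 0.7], [0.1, 2.0], [0.3, 1.2]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
smoothed = icm_smooth(unary, neighbors, beta=1.0)
```

In this toy input, patch 2 is weakly classified as handwritten on its own, but its printed neighbors pull it back to label 0, so `smoothed` is `[0, 0, 0, 0, 0]`; with `beta=0.0` the context term vanishes and the independent decisions `[0, 0, 1, 0, 0]` are kept.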
  • Keywords
    Markov processes; document image processing; genetic algorithms; handwritten character recognition; image classification; Markov random field model; connected component analysis; feature fusion; genetic algorithm; handwritten character extraction; language identification; multilanguage document image; patch type classification; unit-text patch; Feature extraction; Genetic algorithms; Image segmentation; Markov random fields; Merging; Text analysis; Vectors; Markov random field; document segmentation; feature fusion; handwritten character extraction; multi-language;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.28
  • Filename
    6065283