• DocumentCode
    1779030
  • Title

    Research on Optimization Segmentation Algorithm for Chinese/English Mixed Character Image in OCR

  • Author

    Liu Mingzhu ; Suo Yuxiu ; Ding Yinan

  • Author_Institution
    Higher Educ. Key Lab. for Meas. & Control Technol. & Instrumentations of Heilongjiang Province, Harbin Univ. of Sci. & Technol., Harbin, China
  • fYear
    2014
  • fDate
    18-20 Sept. 2014
  • Firstpage
    764
  • Lastpage
    769
  • Abstract
    To address the low recognition accuracy for mixed Chinese/English text, this paper studies an optimized segmentation algorithm for Chinese/English mixed characters in an OCR system. Text images are first roughly segmented with the vertical projection method; then, following character segmentation theory, connected regions are extracted for Chinese characters, Chinese character components, and English letters and numbers. In the connected regions of Chinese character components, traditional component merging algorithms leave some Chinese characters incompletely merged, so a unit merging algorithm based on feedback recognition is presented to merge the components. In the connected regions of Chinese characters, English letters, and numbers, touching (adhesion) characters can cause segmentation errors, so they are detected and re-segmented using the geometric features of the characters. Tests of mixed character segmentation show that, in recognizing Chinese/English mixed characters, the optimized segmentation algorithm has a clear advantage in recognition accuracy over traditional algorithms, especially for Chinese characters composed of left and right components.
  • Keywords
    feedback; image segmentation; natural language processing; optical character recognition; text analysis; Chinese character component connectivity regional component merging algorithms; Chinese-English mixed character image; English number connectivity regional; OCR system; adhesion character; character segmentation theory; feedback recognition; optimization segmentation algorithm; rough segment; text image segmentation; traditional Chinese character component merging algorithms; unit merging algorithm; vertical projection method; Accuracy; Adhesives; Character recognition; Image segmentation; Merging; Optical character recognition software; Optimization; Chinese/English mixed character; OCR; feedback recognition; unit merging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Instrumentation and Measurement, Computer, Communication and Control (IMCCC), 2014 Fourth International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4799-6574-8
  • Type

    conf

  • DOI
    10.1109/IMCCC.2014.162
  • Filename
    6995132
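
The rough segmentation step named in the abstract (vertical projection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy image, and the `max_ratio` threshold are assumptions made for illustration, and the width check is only a stand-in for the geometric features the paper uses to detect touching (adhesion) characters.

```python
def vertical_projection(image):
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def rough_segments(image):
    """Return (start, end) column ranges, end exclusive, split at blank columns."""
    segments = []
    start = None
    for x, count in enumerate(vertical_projection(image)):
        if count > 0 and start is None:
            start = x                      # a run of ink begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # a blank column ends the run
            start = None
    if start is not None:                  # run reaches the right edge
        segments.append((start, len(image[0])))
    return segments


def looks_touching(segment, line_height, max_ratio=1.2):
    """Hypothetical check: flag a region much wider than the text line is tall."""
    start, end = segment
    return (end - start) / line_height > max_ratio


# Toy 3-row binary text line (1 = ink); values are made up for illustration.
image = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
]
segments = rough_segments(image)
candidates = [s for s in segments if looks_touching(s, line_height=len(image))]
```

Splitting only at blank columns is why this stage is "rough": left and right components of one Chinese character come out as separate regions, and touching characters come out as one region, which is what the merging and re-segmentation stages described in the abstract are meant to correct.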