• DocumentCode
    3259166
  • Title

    Underline removal method by utilizing characteristics of Japanese business documents

  • Author

    Oba, Mitsuharu ; Nozaki, Yasuyuki ; Matsumoto, Toshiko ; Onoyama, Takashi

  • Author_Institution
    R&D Dept., Hitachi Software Eng. Co., Ltd., Tokyo, Japan
  • fYear
    2009
  • fDate
    23-26 Nov. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper we propose an underline removal method specific to Japanese business documents. Automated removal of underlines is important because underlines are the main cause of OCR misrecognition. The main feature of our method is that it removes various types of underlines, such as lines that touch characters, inclined lines, and blurred lines, by line template matching. Moreover, our method makes it possible to remove all possible underlines while excluding from removal the table ruled lines, which are necessary for document structure analysis. The experimental results demonstrate that the proposed method improves OCR accuracy.
  • Keywords
    document image processing; image matching; optical character recognition; Japanese business documents; OCR misrecognition; document structure analysis; line template matching; optical character recognition software; table ruled lines; underline removal method; Character recognition; Companies; Content management; Data mining; Electrochemical machining; Optical character recognition software; Production; Software engineering; Technology management; Text analysis; Line Template Matching; OCR; business documents; underline removal
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2009 - 2009 IEEE Region 10 Conference
  • Conference_Location
    Singapore
  • Print_ISBN
    978-1-4244-4546-2
  • Electronic_ISBN
    978-1-4244-4547-9
  • Type

    conf

  • DOI
    10.1109/TENCON.2009.5396199
  • Filename
    5396199