Title :
Underline removal method by utilizing characteristics of Japanese business documents
Author :
Oba, Mitsuharu ; Nozaki, Yasuyuki ; Matsumoto, Toshiko ; Onoyama, Takashi
Author_Institution :
R&D Dept., Hitachi Software Eng. Co., Ltd., Tokyo, Japan
Abstract :
In this paper we propose an underline removal method specific to Japanese business documents. Automated removal of underlines is important because underlines are the main cause of OCR misrecognition. The main feature of our method is that it uses line template matching to remove various types of underlines, including underlines that touch characters, inclined underlines, and blurred underlines. Moreover, our method makes it possible to remove all possible underlines while excluding table ruled lines, which are necessary for document structure analysis. The experimental results demonstrate that the proposed method improves OCR recognition accuracy.
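The abstract names line template matching but does not give the algorithm. The following is a minimal, generic sketch of horizontal line template matching on a binary image, not the authors' method. The function names `find_line_pixels` and `remove_underlines` and the parameters `template_len` and `max_gaps` are hypothetical. A one-pixel-high template of `template_len` pixels slides along each row, and a window matches when it has at most `max_gaps` background pixels, which tolerates blur. Matched ink pixels are erased unless a character stroke touches them from above. Inclined underlines and the exclusion of table ruled lines, both of which the paper addresses, are not handled here.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = paper


def find_line_pixels(img: Image, template_len: int = 20, max_gaps: int = 2) -> List[List[bool]]:
    """Mark ink pixels covered by a matching horizontal line template.

    Hypothetical helper: a window of `template_len` pixels along a row
    matches when it contains at most `max_gaps` paper pixels, so small
    breaks caused by blurring do not prevent detection.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    marked = [[False] * w for _ in range(h)]
    for y in range(h):
        row = img[y]
        for x0 in range(w - template_len + 1):
            window = row[x0:x0 + template_len]
            if template_len - sum(window) <= max_gaps:
                # Mark only the ink pixels; gaps stay as paper.
                for x in range(x0, x0 + template_len):
                    if row[x]:
                        marked[y][x] = True
    return marked


def remove_underlines(img: Image, template_len: int = 20, max_gaps: int = 2) -> Image:
    """Return a copy of `img` with detected horizontal line pixels erased.

    Crude heuristic for underlines that touch characters: a line pixel is
    kept when the pixel directly above it is unmarked ink, i.e. part of a
    character stroke that reaches the line.
    """
    marked = find_line_pixels(img, template_len, max_gaps)
    out = [row[:] for row in img]  # do not mutate the input image
    for y, row in enumerate(img):
        for x in range(len(row)):
            if not marked[y][x]:
                continue
            stroke_above = y > 0 and img[y - 1][x] == 1 and not marked[y - 1][x]
            if not stroke_above:
                out[y][x] = 0
    return out


if __name__ == "__main__":
    # 4 x 24 toy image: a vertical stroke at x=5 touching a blurred
    # underline in row 2 (gap at x=10), plus a short dash in row 3.
    img = [[0] * 24 for _ in range(4)]
    img[0][5] = img[1][5] = 1
    img[2] = [1] * 24
    img[2][10] = 0
    img[3][0:5] = [1] * 5
    for row in remove_underlines(img, template_len=10, max_gaps=1):
        print("".join("#" if p else "." for p in row))
```

In this toy run the underline is erased except where the stroke at x=5 touches it, and the five-pixel dash survives because it is shorter than the template. A real system would also need inclined templates and a table-structure check so that ruled lines are kept.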
Keywords :
document image processing; image matching; optical character recognition; Japanese business documents; OCR misrecognition; document structure analysis; line template matching; optical character recognition software; table ruled lines; underline removal method; Character recognition; Companies; Content management; Data mining; Electrochemical machining; Production; Software engineering; Technology management; Text analysis; OCR; business documents; underline removal
Conference_Title :
TENCON 2009 - 2009 IEEE Region 10 Conference
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-4546-2
Electronic_ISBN :
978-1-4244-4547-9
DOI :
10.1109/TENCON.2009.5396199