DocumentCode
3259166
Title
Underline removal method by utilizing characteristics of Japanese business documents
Author
Oba, Mitsuharu ; Nozaki, Yasuyuki ; Matsumoto, Toshiko ; Onoyama, Takashi
Author_Institution
R&D Dept., Hitachi Software Eng. Co., Ltd., Tokyo, Japan
fYear
2009
fDate
23-26 Jan. 2009
Firstpage
1
Lastpage
6
Abstract
In this paper, we propose an underline removal method specific to Japanese business documents. Automated removal of underlines is important because underlines are the main cause of OCR misrecognition. The main feature of our method is that it removes various types of underlines, such as touching, inclined, and blurred lines, by line template matching. Moreover, our method removes as many underlines as possible while excluding table ruled lines, which are necessary for document structure analysis. The experimental results demonstrate that the proposed method improves OCR recognition accuracy.
Keywords
document image processing; image matching; optical character recognition; Japanese business documents; OCR misrecognition; document structure analysis; line template matching; optical character recognition software; table ruled lines; underline removal method; Character recognition; Companies; Content management; Data mining; Electrochemical machining; Optical character recognition software; Production; Software engineering; Technology management; Text analysis; Line Template Matching; OCR; business documents; underline removal;
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON 2009 - 2009 IEEE Region 10 Conference
Conference_Location
Singapore
Print_ISBN
978-1-4244-4546-2
Electronic_ISBN
978-1-4244-4547-9
Type
conf
DOI
10.1109/TENCON.2009.5396199
Filename
5396199