DocumentCode :
3259166
Title :
Underline removal method by utilizing characteristics of Japanese business documents
Author :
Oba, Mitsuharu ; Nozaki, Yasuyuki ; Matsumoto, Toshiko ; Onoyama, Takashi
Author_Institution :
R&D Dept., Hitachi Software Eng. Co., Ltd., Tokyo, Japan
fYear :
2009
fDate :
23-26 Jan. 2009
Firstpage :
1
Lastpage :
6
Abstract :
In this paper we propose an underline removal method specific to Japanese business documents. Automated removal of underlines is important because underlines are the main cause of OCR misrecognition. The main feature of our method is that it removes various types of underlines, such as touched, inclined, and blurred lines, by line template matching. Moreover, our method makes it possible to remove all possible underlines while excluding table ruled lines, which are necessary for document structure analysis. The experimental results demonstrate that the proposed method improves OCR recognition accuracy.
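The abstract says underlines (touched, inclined, or blurred) are removed by line template matching, but this record does not give the paper's templates, scoring, or thresholds. The following is therefore only a minimal sketch of the general idea under assumed parameters, not the authors' method. It assumes a binary image stored as a list of rows (1 = ink), a straight-line template of hypothetical length `length` tried at a few hypothetical small `slopes` (to tolerate inclination), and a hypothetical match `threshold` below 1.0 so that lines with small gaps (blur) still match. It does not implement the paper's exclusion of table ruled lines.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def line_template(length: int, slope: float) -> List[Tuple[int, int]]:
    """Pixel offsets (dy, dx) of a straight line of the given length and slope."""
    return [(round(slope * dx), dx) for dx in range(length)]


def match_score(img: Image, y: int, x: int, template: List[Tuple[int, int]]) -> float:
    """Fraction of template pixels that fall on ink when anchored at (y, x)."""
    h, w = len(img), len(img[0])
    hits = 0
    for dy, dx in template:
        yy, xx = y + dy, x + dx
        if 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
            hits += 1
    return hits / len(template)


def remove_underlines(
    img: Image,
    length: int = 20,                                  # hypothetical template length
    slopes: Sequence[float] = (-0.05, 0.0, 0.05),      # hypothetical tolerated inclinations
    threshold: float = 0.8,                            # hypothetical: < 1.0 tolerates gaps from blur
) -> Image:
    """Erase every pixel covered by a line template that matches the input image.

    Scores are always computed on the unmodified input, so the result does not
    depend on the order in which slopes are tried. Table ruled lines are NOT
    excluded in this sketch.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # work on a copy; the input is left untouched
    for slope in slopes:
        template = line_template(length, slope)
        for y in range(h):
            for x in range(w - length + 1):
                if match_score(img, y, x, template) >= threshold:
                    for dy, dx in template:
                        yy = y + dy
                        if 0 <= yy < h:
                            out[yy][x + dx] = 0
    return out
```

Note that erasing every matched pixel also cuts character strokes exactly where they touch the underline. A fuller implementation would need some rule for restoring those pixels, as well as for telling table ruled lines apart from underlines.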
Keywords :
document image processing; image matching; optical character recognition; Japanese business documents; OCR misrecognition; document structure analysis; line template matching; optical character recognition software; table ruled lines; underline removal method; Character recognition; Companies; Content management; Data mining; Electrochemical machining; Optical character recognition software; Production; Software engineering; Technology management; Text analysis; Line Template Matching; OCR; business documents; underline removal;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
TENCON 2009 - 2009 IEEE Region 10 Conference
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-4546-2
Electronic_ISBN :
978-1-4244-4547-9
Type :
conf
DOI :
10.1109/TENCON.2009.5396199
Filename :
5396199