DocumentCode
2421915
Title
Approximate matching for OCR-processed bibliographic data
Author
Takasu, Atsuhiro ; Katayama, Norio ; Yamaoka, Masaki ; Iwaki, Osamu ; Oyama, Keizo ; Adachi, Jun
Author_Institution
Res. & Dev. Dept., Nat. Center for Sci. Inf. Syst., Tokyo, Japan
Volume
3
fYear
1996
fDate
25-29 Aug 1996
Firstpage
175
Abstract
This paper presents a method for matching bibliographic entries in the reference lists of academic papers, obtained as document images, against records in bibliographic databases. The main subject is the handling of erroneous bibliographic data produced by a document understanding methodology. Despite string errors, the presented method can find a candidate record set from the referral databases by means of approximate matching, which is performed as exact matching of k substrings of length m chosen from the bibliographic strings in the references and in the databases. For OCR accuracy α, a theoretical analysis shows that the accuracy of the presented method is 1-(1-α^m)^k, under the assumption that OCR errors occur randomly and independently in the string. The method is applied to the references of 187 Japanese articles and achieves an accuracy of 94.05%.
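The accuracy expression in the abstract can be read as follows: a single substring of length m is error-free with probability α^m, so all k substrings are corrupted with probability (1-α^m)^k, and at least one survives with probability 1-(1-α^m)^k. The sketch below evaluates that expression and illustrates the idea of candidate retrieval by exact substring lookup. It is a minimal illustration only: the function names, the evenly spaced choice of substrings, and the plain list scan are assumptions made for this example, since the abstract does not say how the substrings are chosen or how the database is indexed.

```python
def match_probability(alpha: float, m: int, k: int) -> float:
    """Evaluate 1 - (1 - alpha**m)**k, the accuracy stated in the abstract.

    alpha: per-character OCR accuracy; m: substring length; k: substring count.
    Assumes OCR errors are random and independent, as the abstract does.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive integers")
    return 1.0 - (1.0 - alpha ** m) ** k


def sample_substrings(text: str, m: int, k: int) -> set[str]:
    """Pick up to k evenly spaced substrings of length m.

    The even spacing is a hypothetical selection rule for this sketch,
    not the rule used in the paper.
    """
    n_starts = len(text) - m + 1
    if n_starts <= 0:
        return {text} if text else set()
    if k == 1:
        starts = [0]
    else:
        starts = [round(i * (n_starts - 1) / (k - 1)) for i in range(k)]
    return {text[s:s + m] for s in starts}


def find_candidates(ocr_text: str, records: list[str], m: int, k: int) -> list[str]:
    """Return records containing at least one sampled substring exactly."""
    keys = sample_substrings(ocr_text, m, k)
    return [rec for rec in records if any(key in rec for key in keys)]


if __name__ == "__main__":
    # Illustrative values only; these are not parameters from the paper.
    print(match_probability(alpha=0.95, m=5, k=4))
    db = ["Approximate matching for OCR data", "Unrelated title"]
    print(find_candidates("Approximale matching for OCR data", db, m=6, k=4))
```

Under these assumptions, a longer m makes each lookup more selective but less likely to survive OCR errors, while a larger k raises the chance that at least one substring is error-free at the cost of more lookups.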
Keywords
bibliographic systems; optical character recognition; visual databases; Japanese articles; OCR-processed bibliographic data; academic papers; approximate matching; bibliographies; document images; referral databases; Character recognition; Data communication; Data mining; Image analysis; Image databases; Information systems; Information technology; Laboratories; Optical character recognition software; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 1996., Proceedings of the 13th International Conference on
Conference_Location
Vienna
ISSN
1051-4651
Print_ISBN
0-8186-7282-X
Type
conf
DOI
10.1109/ICPR.1996.546933
Filename
546933