• DocumentCode
    2421915
  • Title

    Approximate matching for OCR-processed bibliographic data

  • Author

    Takasu, Atsuhiro ; Katayama, Norio ; Yamaoka, Masaki ; Iwaki, Osamu ; Oyama, Keizo ; Adachi, Jun

  • Author_Institution
    Res. & Dev. Dept., Nat. Center for Sci. Inf. Syst., Tokyo, Japan
  • Volume
    3
  • fYear
    1996
  • fDate
    25-29 Aug 1996
  • Firstpage
    175
  • Abstract
    This paper presents a method for matching bibliographic entries in the reference sections of academic papers, obtained as document images, against records in bibliographic databases. Its main concern is handling the erroneous bibliographic data produced by a document understanding methodology. The presented method can find a candidate record set in the referral databases despite string errors. It does so by approximate matching, performed as an exact matching of k substrings of length m chosen from the bibliographic strings in the references and in the databases. For an OCR accuracy of $\alpha$, theoretical analysis shows that the accuracy of the presented method is $1 - (1 - \alpha^m)^k$, under the assumption that OCR errors occur randomly and independently in the string. The method is applied to the references of 187 Japanese articles and achieves an accuracy of 94.05%.
  • Keywords
    bibliographic systems; optical character recognition; visual databases; Japanese articles; OCR-processed bibliographic data; academic papers; approximate matching; bibliographies; document images; referral databases; Character recognition; Data communication; Data mining; Image analysis; Image databases; Information systems; Information technology; Laboratories; Optical character recognition software; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 1996., Proceedings of the 13th International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1051-4651
  • Print_ISBN
    0-8186-7282-X
  • Type

    conf

  • DOI
    10.1109/ICPR.1996.546933
  • Filename
    546933
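
The accuracy formula quoted in the abstract, and the idea of approximate matching through exact substring matches, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the evenly spaced rule for choosing substrings, and the linear scan over records are all assumptions made for illustration, since the abstract does not say how the k substrings are selected or how the database is indexed.

```python
def match_probability(alpha: float, m: int, k: int) -> float:
    """Chance that at least one of k length-m substrings is error-free.

    Assumes each character is recognized correctly with probability alpha,
    independently of the others, as stated in the abstract.
    """
    # One substring survives intact with probability alpha**m,
    # so all k substrings are corrupted with probability (1 - alpha**m)**k.
    return 1.0 - (1.0 - alpha ** m) ** k


def substring_keys(text: str, m: int, k: int) -> list[str]:
    """Pick k substrings of length m at evenly spaced offsets.

    Hypothetical selection rule: the abstract does not describe one.
    """
    if len(text) < m or k < 1:
        return []
    last = len(text) - m  # largest valid start offset
    if k == 1:
        offsets = [0]
    else:
        offsets = [i * last // (k - 1) for i in range(k)]
    return [text[o:o + m] for o in offsets]


def candidates(query: str, records: list[str], m: int, k: int) -> list[str]:
    """Return the records that contain at least one key of the query exactly."""
    keys = substring_keys(query, m, k)
    return [r for r in records if any(key in r for key in keys)]


if __name__ == "__main__":
    print(match_probability(0.9, 4, 3))
    records = [
        "Approximate matching for OCR-processed bibliographic data",
        "Image databases",
    ]
    # "matchlng" simulates an OCR error; another substring still matches.
    print(candidates("Approximate matchlng for OCR", records, m=5, k=3))
```

Raising k increases the chance that some substring escapes OCR errors, while raising m makes each key more selective but more likely to contain an error. The formula makes that trade-off explicit.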