• DocumentCode
    344189
  • Title
    A document image retrieval method tolerating recognition and segmentation errors of OCR using shape-feature and multiple candidates
  • Author
    Kameshiro, Taizo; Hirano, Takashi; Okada, Yasuhiro; Yoda, Fumio
  • Author_Institution
    Inf. Technol. R&D Center, Mitsubishi Electr. Corp., Kanagawa, Japan
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    681
  • Lastpage
    684
  • Abstract
    Some document image retrieval methods are robust to character recognition errors. Some of them tolerate recognition errors by keeping multiple candidates for each character image, but they are intolerant of character segmentation errors. In addition, these methods cannot retrieve documents whose candidates do not contain the correct character code. We propose a method that overcomes these problems. This method uses multiple candidates and, for uncertain characters, a “shape-feature” that describes the outline of the character shape. Documents are retrieved using both the “shape-feature” and multiple-candidate techniques. Our experimental results reveal that the method has a high recall rate compared with that of conventional methods.
  • Keywords
    document image processing; image retrieval; image segmentation; optical character recognition; visual databases; OCR; character recognition errors; document image retrieval; experimental results; image recognition; image segmentation; multiple candidates; shape-feature; uncertain characters; Character recognition; Computer errors; Image converters; Image databases; Image recognition; Image retrieval; Image segmentation; Image storage; Optical character recognition software; Spatial databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), 1999
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791879
  • Filename
    791879