• DocumentCode
    2029553
  • Title

    Document retrieval system tolerant of segmentation errors of document images

  • Author

    Nagasaki, Takeshi ; Takahashi, Toshikazu ; Marukawa, Katsumi

  • Author_Institution
    Central Res. Laboratory, Hitachi, Ltd., Tokyo, Japan
  • fYear
    2004
  • fDate
    26-29 Oct. 2004
  • Firstpage
    280
  • Lastpage
    285
  • Abstract
    This paper describes a new document retrieval method that is tolerant of OCR segmentation errors in document images. To overcome the segmentation and recognition errors that most OCR-based retrieval systems suffer from, the proposed method consists of two processing phases. First, the OCR engine generates multiple character-segmentation and recognition hypotheses. Then, the retrieval engine extracts keywords from the recognition hypotheses by using lexicon-driven dynamic programming (DP) matching. We have applied this method to both handwritten and printed document images and have demonstrated its effectiveness in reducing false drops and false alarms.
  • Keywords
    document image processing; dynamic programming; feature extraction; image segmentation; information retrieval; optical character recognition; OCR engine; document images segmentation; document retrieval system; lexicon-driven dynamic programming matching; multiple character-segmentation; recognition hypotheses; retrieval engine extraction; Character generation; Character recognition; Dictionaries; Dynamic programming; Engines; Image retrieval; Image segmentation; Image sequence analysis; Laboratories; Optical character recognition software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on
  • ISSN
    1550-5235
  • Print_ISBN
    0-7695-2187-8
  • Type

    conf

  • DOI
    10.1109/IWFHR.2004.36
  • Filename
    1363924