• DocumentCode
    3050117
  • Title
    Fast historic document retrieval by extracting document image summary
  • Author
    Shiah, Chwan-Yi ; Yen, Yun-Sheng
  • Author_Institution
    Dept. of Appl. Inf., Fo Guang Univ., YiLan, Taiwan
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    3062
  • Lastpage
    3065
  • Abstract
    Historic documents such as Chinese calligraphy and old newspapers were usually handwritten or printed in poor quality, so automatic optical character recognition is difficult to apply to their scanned images. Efficient pattern matching techniques are therefore required for content-based information retrieval driven by users' queries. In this paper, a fast pattern clustering and image matching procedure is proposed for image/pattern search in a historic document image based on the user's query images. An image summary is extracted from the document image so that a set of distinct image clusters is formed. A couple of distance measures between image patterns are also proposed to evaluate cluster similarity. Using precise pattern matching and hierarchical image clustering, our experimental results show that an online query image produces accurate results, and does so faster than traditional approaches, for a broad range of historic document images.
  • Keywords
    content-based retrieval; feature extraction; image matching; optical character recognition; pattern clustering; query processing; Chinese calligraphy; automatic optical character recognition procedure; content-based information retrieval; distinct image clusters; document image summary extraction; fast historic document retrieval; hierarchical image clustering; image matching procedure; image-pattern search; old newspapers; pattern clustering; pattern matching techniques; scanned document images; user query images; Clustering algorithms; Complexity theory; Histograms; Image segmentation; Information retrieval; Pattern matching; Shape; historic document image; image clustering; information retrieval; pattern matching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Technology (ICMT), 2011 International Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-61284-771-9
  • Type
    conf
  • DOI
    10.1109/ICMT.2011.6003077
  • Filename
    6003077