Title :
Fast historic document retrieval by extracting document image summary
Author :
Shiah, Chwan-Yi ; Yen, Yun-Sheng
Author_Institution :
Dept. of Appl. Inf., Fo Guang Univ., YiLan, Taiwan
Abstract :
Historic documents such as Chinese calligraphy and old newspapers were usually handwritten or printed in poor quality, which makes automatic optical character recognition difficult to apply to their scanned images. Efficient pattern matching techniques are therefore required for content-based information retrieval driven by user queries. In this paper, a fast pattern clustering and image matching procedure is proposed for image/pattern search in a historic document image based on the user's query images. An image summary is extracted from the document image so that a set of distinct image clusters is formed. A couple of distance measures between image patterns are also proposed to evaluate cluster similarity. Using precise pattern matching and hierarchical image clustering, our experimental results show that an online query image yields accurate results faster than traditional approaches for a broad range of historic document images.
Keywords :
content-based retrieval; feature extraction; image matching; optical character recognition; pattern clustering; query processing; Chinese calligraphy; automatic optical character recognition procedure; content-based information retrieval; distinct image clusters; document image summary extraction; fast historic document retrieval; hierarchical image clustering; image matching procedure; image-pattern search; old newspapers; pattern matching techniques; scanned document images; user query images; Clustering algorithms; Complexity theory; Histograms; Image segmentation; Information retrieval; Pattern matching; Shape; historic document image; image clustering
Conference_Title :
2011 International Conference on Multimedia Technology (ICMT)
Conference_Location :
Hangzhou
Print_ISBN :
978-1-61284-771-9
DOI :
10.1109/ICMT.2011.6003077