Title :
Similarity measure for CCITT Group 4 compressed document images
Author :
Lu, Yue ; Tan, Chew Lim ; Fan, Liying ; Huang, Weihua
Author_Institution :
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore
fDate :
2001
Abstract :
The similarity measure of document images plays a crucial role in document image retrieval. A method for measuring the similarity of CCITT Group 4 compressed document images is proposed. Features are extracted directly from the changing elements of the compressed images. An unsupervised classifier based on the weighted Hausdorff distance assigns all word objects from two document images to corresponding classes, while possible stop words are excluded. Document vectors are built from the occurrence frequencies of the word object classes, and the pairwise similarity of two document images is represented by the scalar product of their document vectors. Five groups of articles from different domains are used to test the validity of the proposed approach.
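The pipeline summarized in the abstract (a distance between word objects, unsupervised grouping of the word objects of both documents into shared classes, class-frequency document vectors, and a scalar-product similarity) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: representing a word object as a list of 2-D feature points, the uniform default weights in the Hausdorff distance, the leader-style clustering with a fixed threshold, the width-based stop-word filter, and the unit-length normalization are all hypothetical choices, and names such as `document_similarity`, `threshold`, and `min_width` are illustrative only. Feature extraction from the Group 4 changing elements is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical representation: a word object as a list of 2-D feature points
# (for example, positions of changing elements inside the word's bounding box).
WordObject = List[Point]


def directed_weighted_hausdorff(a: Sequence[Point], b: Sequence[Point],
                                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean, over the points of a, of the distance to the nearest point of b.

    The weighting is an assumption; with uniform weights this reduces to the
    directed modified Hausdorff distance.
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = sum(weights)
    acc = sum(w * min(math.dist(p, q) for q in b) for p, w in zip(a, weights))
    return acc / total


def weighted_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Symmetric distance: the larger of the two directed distances.
    return max(directed_weighted_hausdorff(a, b), directed_weighted_hausdorff(b, a))


def classify(words: List[WordObject], threshold: float) -> List[int]:
    """Leader-style unsupervised clustering (an assumed stand-in for the paper's
    classifier): join the closest existing class if its representative lies
    within `threshold`, otherwise open a new class led by this word."""
    reps: List[WordObject] = []
    labels: List[int] = []
    for w in words:
        dists = [weighted_hausdorff(w, r) for r in reps]
        if dists and min(dists) <= threshold:
            labels.append(dists.index(min(dists)))
        else:
            reps.append(w)
            labels.append(len(reps) - 1)
    return labels


def width(w: WordObject) -> float:
    # Horizontal extent of a (non-empty) word object.
    xs = [x for x, _ in w]
    return max(xs) - min(xs)


def document_similarity(doc_a: List[WordObject], doc_b: List[WordObject],
                        threshold: float = 0.5, min_width: float = 2.0) -> float:
    # Hypothetical stop-word filter: narrow word objects are treated as stop words.
    a = [w for w in doc_a if width(w) >= min_width]
    b = [w for w in doc_b if width(w) >= min_width]
    if not a or not b:
        return 0.0
    # Cluster the word objects of BOTH documents into one shared set of classes.
    labels = classify(a + b, threshold)
    n_classes = max(labels) + 1
    va = [0] * n_classes
    vb = [0] * n_classes
    for i, label in enumerate(labels):
        if i < len(a):
            va[label] += 1
        else:
            vb[label] += 1
    # Scalar product of the frequency vectors, normalized to unit length (an assumption).
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / math.sqrt(sum(x * x for x in va) * sum(y * y for y in vb))
```

Because the word objects of both documents are clustered together, the two frequency vectors share one class index space, which is what makes their scalar product meaningful; a document compared with itself scores (up to rounding) 1.0 under this normalization.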
Keywords :
content-based retrieval; data compression; digital libraries; document image processing; feature extraction; image classification; image coding; image retrieval; object recognition; CCITT Group 4 images; changing elements; compressed document images; document retrieval; similarity measure; stop words; unsupervised classifier; weighted Hausdorff distance; word objects; Content based retrieval; Frequency; Image coding; Image retrieval; Image storage; Information retrieval; Internet; Optical character recognition software; Software libraries; Testing
Conference_Title :
Image Processing, 2001. Proceedings. 2001 International Conference on
Conference_Location :
Thessaloniki
Print_ISBN :
0-7803-6725-1
DOI :
10.1109/ICIP.2001.959247