Title :
Vertical bar detection for gauging text similarity of document images
Author :
Huang, Weihua ; Tan, Chew Lim ; Sung, Sam Yuan ; Xu, Yi
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fDate :
2001
Abstract :
This paper proposes a new method for gauging the text similarity of image-based documents using word shape recognition. Image features are extracted directly from the images instead of using OCR (optical character recognition). The document images are segmented into word units, and the local extrema points detected in each word unit form so-called vertical bar patterns. These vertical bar patterns make up the feature vector of a document, and the pair-wise similarity of two document images is measured as the scalar product of their feature vectors. The proposed method is robust to changes in font and style and is less affected by degradation of document quality. To test the validity of the method, four corpora of document images were used, and the method's ability to retrieve relevant documents is reported.
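The abstract does not define the exact encoding of a vertical bar pattern, so the Python sketch below only illustrates the pipeline it describes (word units, then local extrema, then bar patterns, then document vectors, then scalar product) under stated assumptions. It is not the authors' method. Assumptions: each word unit is already segmented into a binary image (list of rows, 1 = ink); only the word's upper contour is used; every strict local extremum becomes one "bar" labelled with its type and a height quantized into three levels relative to the word's tallest column. The names `upper_profile`, `local_extrema`, `bar_pattern`, `document_vector`, `similarity`, the `levels` parameter, and the length normalization of the scalar product are all hypothetical.

```python
import math
from collections import Counter


def upper_profile(word):
    """Height of the top-most ink pixel in each column (0 for empty columns).

    `word` is a binary image given as a list of rows (1 = ink), row 0 at the top.
    """
    rows = len(word)
    cols = len(word[0]) if rows else 0
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if word[r][c]:
                h = rows - r  # distance from the bottom edge of the word box
                break
        heights.append(h)
    return heights


def local_extrema(profile):
    """(index, kind) for strict local maxima ('M') and minima ('m'); plateaus are skipped."""
    out = []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if mid > left and mid > right:
            out.append((i, "M"))
        elif mid < left and mid < right:
            out.append((i, "m"))
    return out


def bar_pattern(word, levels=3):
    """Hypothetical vertical-bar code: one token per extremum, kind plus quantized height."""
    profile = upper_profile(word)
    top = max(profile, default=0) or 1
    tokens = []
    for i, kind in local_extrema(profile):
        # Quantize relative to the tallest column so the code does not depend on font size.
        level = profile[i] * levels // (top + 1)
        tokens.append(f"{kind}{level}")
    return " ".join(tokens)


def document_vector(words):
    """Bag of vertical-bar patterns over all word units of one document."""
    return Counter(bar_pattern(w) for w in words)


def similarity(doc_a, doc_b):
    """Scalar product of the two pattern-count vectors, normalized to unit length (assumption)."""
    va, vb = document_vector(doc_a), document_vector(doc_b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


word1 = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
]
print(bar_pattern(word1))  # prints "M2 m0 M1"
```

Normalizing by the vector lengths keeps long and short documents comparable; the raw scalar product named in the abstract would instead grow with document length. Quantizing heights relative to each word's own tallest column is one simple way to reduce sensitivity to font size, in the spirit of the robustness claim above.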
Keywords :
document image processing; feature extraction; image matching; image segmentation; information retrieval; document degradation; document feature vectors; document image segmentation; document image similarity; document retrieval; fonts; image feature extraction; local extrema points; text similarity; word shape recognition; Character recognition; Degradation; Feature extraction; Image recognition; Image segmentation; Optical character recognition software; Robustness; Shape; Testing; Text recognition;
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953868