Title :
Document image dataset indexing and compression using connected components clustering
Author :
Chatbri, Houssem ; Kameyama, Keisuke
Author_Institution :
Dept. of Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan
Abstract :
We present a method for indexing and compressing document image datasets by clustering connected components. The method extracts connected components from each image in the dataset and clusters them to build a hash table that serves as a compressed index of the dataset. Clustering is based on component similarity, which is estimated by comparing shape features extracted from the components. The hash table is then saved in a text file, which is further compressed using any available compression method. Component encoding in the hash table is storage-efficient: each component is stored using its contour points and a reduced number of interior points that are sufficient for reconstruction. We evaluate the method's indexing and compression performance on four document image datasets. Experimental results show that indexing significantly improves efficiency in document image retrieval. In addition, a comparative evaluation against two compression standards, the ZIP and XZ formats, shows competitive performance. Our compression rates are below 20%, and the compression errors are very low, on the order of 10^-6% per image.
Keywords :
data compression; document image processing; feature extraction; image coding; image reconstruction; XZ formats; ZIP formats; component contour points; component encoding; component reconstruction; component similarity; compressed indexing; compression errors; compression methodology; connected components clustering; document image dataset compression; document image dataset indexing; document image retrieval; feature extraction; hash table; text file; Clustering algorithms; Encoding; Feature extraction; Image coding; Image reconstruction; Indexing; Redundancy;
Conference_Title :
Machine Vision Applications (MVA), 2015 14th IAPR International Conference on
Conference_Location :
Tokyo
DOI :
10.1109/MVA.2015.7153182