DocumentCode
3082745
Title
Document image dataset indexing and compression using connected components clustering
Author
Chatbri, Houssem ; Kameyama, Keisuke
Author_Institution
Dept. of Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan
fYear
2015
fDate
18-22 May 2015
Firstpage
267
Lastpage
270
Abstract
We present a method for document image dataset indexing and compression by clustering of connected components. Our method extracts connected components from each dataset image and clusters them to build a hash table that serves as a compressed index of the dataset. Clustering is based on component similarity, which is estimated by comparing shape features extracted from the components. The hash table is then saved to a text file, which can be further compressed with any available compression method. Components are encoded in the hash table in a storage-efficient way, using their contour points and a reduced number of interior points that are sufficient for component reconstruction. We evaluate our method's indexing and compression performance on four document image datasets. Experimental results show that indexing significantly improves efficiency in document image retrieval. In addition, a comparative evaluation against two compression standards, the ZIP and XZ formats, shows competitive performance. Our compression rates are below 20%, and compression errors are very low, on the order of 10⁻⁶% per image.
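The pipeline in the abstract can be illustrated with a minimal Python sketch. It extracts 8-connected components from binary images, greedily clusters them by comparing shape features, stores each cluster in a table as one representative plus its occurrences, writes the table as text, and compresses that text to the XZ format with the standard `lzma` module. The shape features (bounding-box height, width, fill ratio), the similarity tolerance, the text layout, and all function names are hypothetical choices for illustration and do not come from the paper. The sketch also stores each representative's full pixel list rather than the contour points plus reduced interior points the authors describe.

```python
import lzma
from collections import deque

def connected_components(img):
    """Return 8-connected foreground components as sets of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), set()
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.add((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps

def normalize(pixels):
    """Shift a component to its bounding-box origin: (top, left, offsets)."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    return top, left, frozenset((y - top, x - left) for y, x in pixels)

def features(shape):
    """Hypothetical shape features: bounding-box height, width, fill ratio."""
    h = max(y for y, _ in shape) + 1
    w = max(x for _, x in shape) + 1
    return (h, w, len(shape) / (h * w))

def similar(f1, f2, tol=0.05):
    """Assumed similarity test: same bounding box, fill ratios within tol."""
    return f1[0] == f2[0] and f1[1] == f2[1] and abs(f1[2] - f2[2]) <= tol

def build_index(images):
    """Greedy clustering; the list position of each entry is its cluster key."""
    table = []
    for img_id, img in enumerate(images):
        for pixels in connected_components(img):
            top, left, shape = normalize(pixels)
            f = features(shape)
            for entry in table:
                if similar(entry["features"], f):
                    entry["where"].append((img_id, top, left))
                    break
            else:  # no similar cluster found: this component becomes a new representative
                table.append({"features": f, "shape": shape, "where": [(img_id, top, left)]})
    return table

def to_text(table):
    """One line per cluster: key | representative pixels | occurrences (image, top, left)."""
    lines = []
    for key, e in enumerate(table):
        pts = " ".join(f"{y},{x}" for y, x in sorted(e["shape"]))
        where = " ".join(f"{i},{t},{l}" for i, t, l in e["where"])
        lines.append(f"{key}|{pts}|{where}")
    return "\n".join(lines)

def compress(text):
    """Compress the text table; lzma.compress writes the XZ container by default."""
    return lzma.compress(text.encode("utf-8"))

def reconstruct(table, img_id, h, w):
    """Redraw one image by pasting each cluster's representative at its occurrences."""
    img = [[0] * w for _ in range(h)]
    for e in table:
        for i, top, left in e["where"]:
            if i == img_id:
                for y, x in e["shape"]:
                    img[top + y][left + x] = 1
    return img

# Usage: the same small glyph appears twice in img0 and once in img1.
img0 = [[1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0]]
img1 = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 0]]
table = build_index([img0, img1])
text = to_text(table)
packed = compress(text)
```

Here the three occurrences collapse into a single cluster, so the glyph's pixels are stored once. Because `similar` tolerates small fill-ratio differences, merged components are redrawn from their representative, which makes reconstruction lossy in general. This mirrors, in spirit only, the small nonzero compression errors reported in the abstract.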
Keywords
data compression; document image processing; feature extraction; image coding; image reconstruction; XZ formats; ZIP formats; component contour points; component encoding; component reconstruction; component similarity; compressed indexing; compression errors; compression methodology; connected components clustering; document image dataset compression; document image dataset indexing; document image retrieval; feature extraction; hash table; text file; Clustering algorithms; Encoding; Feature extraction; Image coding; Image reconstruction; Indexing; Redundancy;
fLanguage
English
Publisher
IEEE
Conference_Title
Machine Vision Applications (MVA), 2015 14th IAPR International Conference on
Conference_Location
Tokyo
Type
conf
DOI
10.1109/MVA.2015.7153182
Filename
7153182