DocumentCode :
3498155
Title :
Tree clustering for layout-based document image retrieval
Author :
Marinai, Simone ; Marino, Emanuele ; Soda, Giovanni
Author_Institution :
Dipt. di Sistemi e Informatica, Univ. di Firenze
fYear :
2006
fDate :
27-28 April 2006
Lastpage :
253
Abstract :
We describe a system for retrieving document images stored in digital library collections on the basis of layout similarity. Layout regions are extracted and represented with the XY tree. The proposed indexing method combines a new tree clustering algorithm, based on self-organizing maps, with principal component analysis. Together, these techniques allow us to retrieve the most similar pages from large collections without directly comparing the query page with each indexed document.
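As an illustration of the XY tree representation mentioned in the abstract, here is a minimal sketch of a plain recursive XY-cut. It assumes the page is already binarized into a 0/1 grid and treats any fully blank row or column as a valid cut. The names `XYNode`, `xy_tree`, and `describe` are hypothetical, and this is not the authors' segmentation code. The self-organizing-map tree clustering and the PCA indexing steps are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Grid = List[List[int]]          # assumption: 1 = ink (foreground), 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


@dataclass
class XYNode:
    box: Box
    cut: Optional[str] = None  # "H" = split at blank rows, "V" = split at blank columns, None = leaf
    children: List["XYNode"] = field(default_factory=list)


def _bands(profile: List[int]) -> List[Tuple[int, int]]:
    """Maximal runs [start, end) of non-zero values in a projection profile."""
    bands, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def xy_tree(grid: Grid, box: Optional[Box] = None) -> Optional[XYNode]:
    """Recursively split a region at blank rows, otherwise at blank columns."""
    if box is None:
        box = (0, 0, len(grid), len(grid[0]))
    top, left, bottom, right = box
    row_bands = _bands([sum(grid[r][left:right]) for r in range(top, bottom)])
    if not row_bands:
        return None  # the region contains no ink
    col_bands = _bands([sum(grid[r][c] for r in range(top, bottom)) for c in range(left, right)])
    # Tight bounding box of the ink inside this region.
    tight = (top + row_bands[0][0], left + col_bands[0][0],
             top + row_bands[-1][1], left + col_bands[-1][1])
    node = XYNode(tight)
    if len(row_bands) > 1:
        node.cut = "H"
        boxes = [(top + s, tight[1], top + e, tight[3]) for s, e in row_bands]
    elif len(col_bands) > 1:
        node.cut = "V"
        boxes = [(tight[0], left + s, tight[2], left + e) for s, e in col_bands]
    else:
        return node  # leaf: no blank row or column separates the ink
    node.children = [child for b in boxes if (child := xy_tree(grid, b)) is not None]
    return node


def describe(node: XYNode) -> str:
    """Compact textual form of the tree, e.g. 'H(V(leaf, leaf), leaf)'."""
    if not node.children:
        return "leaf"
    return node.cut + "(" + ", ".join(describe(c) for c in node.children) + ")"


# Toy page: two blocks side by side above a full-width block.
page = [
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(describe(xy_tree(page)))  # H(V(leaf, leaf), leaf)
```

Cutting at rows first and then at columns lets the tree alternate between horizontal and vertical splits naturally. A real system would usually require a minimum gap width and would work on connected components rather than raw pixels.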
Keywords :
document image processing; image retrieval; indexing; principal component analysis; trees (mathematics); digital libraries; indexing method; layout similarity; layout-based document image retrieval; self organizing maps; tree clustering; Clustering algorithms; Computational efficiency; Encoding; Image retrieval; Indexing; Neurons; Principal component analysis; Self organizing feature maps; Software libraries; Tree data structures
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
Type :
conf
DOI :
10.1109/DIAL.2006.44
Filename :
1612966