DocumentCode :
1584142
Title :
Text extraction from color documents - clustering approaches in three and four dimensions
Author :
Bunke, Horst
fYear :
2001
fDate :
2001
Firstpage :
937
Lastpage :
941
Abstract :
Colored paper documents often contain important text information. To automate its retrieval, the text elements must first be identified. To reduce the number of colors in a scanned document, color clustering is usually performed first. This article investigates two histogram-based color clustering algorithms. The first operates exclusively in the RGB color space, while the second also takes spatial information into account. Experimental results show that using spatial information in the clustering algorithm has a positive impact, so the automatic retrieval of text information can be improved. The proposed clustering methods are not restricted to document images; they can also be used for processing Web or video images, for example.
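To make the idea of histogram-based color clustering in RGB space concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a simple scheme: build a coarse 3-D RGB histogram, take the most populated cells as cluster prototypes, and assign every pixel to the nearest prototype. The function name `histogram_color_clusters` and the parameters `bins_per_channel` and `num_clusters` are hypothetical, and the spatial variant mentioned in the abstract is not shown.

```python
from collections import Counter


def histogram_color_clusters(pixels, bins_per_channel=4, num_clusters=2):
    """Illustrative color clustering via a coarse 3-D RGB histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns (centers, labels). This is an assumed, simplified scheme,
    not the method evaluated in the paper.
    """
    pixels = list(pixels)
    step = 256 // bins_per_channel  # width of one histogram cell per channel

    # Build the histogram: count pixels per quantized (r, g, b) cell.
    histogram = Counter((r // step, g // step, b // step) for r, g, b in pixels)

    # Use the centers of the most populated cells as cluster prototypes.
    centers = [
        tuple(c * step + step // 2 for c in cell)
        for cell, _ in histogram.most_common(num_clusters)
    ]

    # Assign each pixel to the nearest prototype (squared Euclidean distance).
    def nearest(pixel):
        return min(
            range(len(centers)),
            key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, centers[i])),
        )

    labels = [nearest(p) for p in pixels]
    return centers, labels
```

For a scanned page, `pixels` would be the flattened image; pixels sharing the label of a text-colored prototype would then be candidates for text extraction.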
Keywords :
document image processing; image colour analysis; information retrieval; optical character recognition; OCR; RGB color space; Web images; color documents; document scanning; experimental results; histogram-based color clustering; spatial information; text extraction; text retrieval; video images; Books; Clustering algorithms; Color; Computer science; Data mining; Histograms; Machine assisted indexing; Marine vehicles; Mathematics
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
Type :
conf
DOI :
10.1109/ICDAR.2001.953923
Filename :
953923