Title :
A Novel Image Text Extraction Method Based on K-Means Clustering
Author :
Song, Yan ; Liu, Anan ; Pang, Lin ; Lin, Shouxun ; Zhang, Yongdong ; Tang, Sheng
Author_Institution :
Key Lab. of Intell. Inf. Process., Chinese Acad. of Sci., Beijing
Abstract :
Text in web pages, images and videos contains important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. This paper proposes a novel, universal method for image text extraction. Text is located with a coarse-to-fine method: first, a multi-scale approach locates text of different font sizes; second, projection profiles are used in a location refinement step. Text segmentation uses color-based k-means clustering. Compared with the grayscale images used in most existing methods, color images are more suitable for clustering-based segmentation. The clustering treats corner points, edge points and other points equally, which solves the problem of handling multilingual text. Experimental results show that the best performance is obtained when k is 3. Comparative experiments on a large number of images show that the method is accurate and robust under various conditions.
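The color-based clustering step described in the abstract could be sketched as follows. This is a minimal illustration using plain Lloyd's k-means in NumPy, not the authors' implementation. The function names `kmeans_colors` and `segment_colors`, the use of raw RGB values, the initialization from distinct colors and the iteration count are all assumptions made for the example; only the choice k = 3 comes from the abstract.

```python
import numpy as np


def kmeans_colors(pixels, k=3, iters=20, seed=0):
    """Cluster an (N, 3) array of colors with plain Lloyd's k-means.

    Hypothetical helper for illustration; returns (labels, centers).
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    # Assumption: initialize centers from k distinct colors so that
    # no two starting centers coincide.
    unique = np.unique(pixels, axis=0)
    if len(unique) < k:
        raise ValueError("fewer distinct colors than clusters")
    rng = np.random.default_rng(seed)
    centers = unique[rng.choice(len(unique), size=k, replace=False)]

    for _ in range(iters):
        # Distance from every pixel to every center, shape (N, k).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean color of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

    # Final assignment against the updated centers.
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


def segment_colors(image, k=3):
    """Label every pixel of an (H, W, 3) color image with a cluster index."""
    h, w, _ = image.shape
    labels, centers = kmeans_colors(image.reshape(-1, 3), k=k)
    return labels.reshape(h, w), centers
```

Every pixel enters the clustering with the same weight, regardless of whether it lies on a corner, an edge or a flat region, which mirrors the equal treatment of points mentioned in the abstract. Deciding which of the k clusters corresponds to text is left open here, since the abstract does not describe that step.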
Keywords :
image colour analysis; image segmentation; information retrieval; pattern clustering; text analysis; coarse-to-fine text location method; color-based k-means clustering; grayscale image; information indexing; information retrieval; location refinement step; multilingual text; novel image text extraction method; text segmentation; Color; Data mining; Gray-scale; Image retrieval; Image segmentation; Indexing; Information retrieval; Robustness; Videos; Web pages;
Conference_Title :
Computer and Information Science, 2008. ICIS 08. Seventh IEEE/ACIS International Conference on
Conference_Location :
Portland, OR
Print_ISBN :
978-0-7695-3131-1
DOI :
10.1109/ICIS.2008.31