• DocumentCode
    3281328
  • Title
    Automatic extraction of text regions from document images by multilevel thresholding and k-means clustering
  • Author
    Hoai Nam Vu ; Tuan Anh Tran ; In Seop Na ; Soo Hyung Kim
  • Author_Institution
    Dept. of Comput. Sci., Chonnam Nat. Univ., Gwangju, South Korea
  • fYear
    2015
  • fDate
    June 28 2015-July 1 2015
  • Firstpage
    329
  • Lastpage
    334
  • Abstract
    Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to focus the user's attention. In this paper we propose a method that consists of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all the text regions; therefore, we emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with various illuminations, sizes, and fonts against various types of background.
  • Keywords
    database indexing; document image processing; image segmentation; optical character recognition; pattern clustering; text analysis; visual databases; Web searching; automatic extraction; complex background; document images; document understanding; graphics; grayscale image; heuristic filter; image database indexing; k-means clustering; multilevel thresholding; nontextual object; optical character recognition application; pictures; recursive filter; text regions; user focusing; Clustering algorithms; Data mining; Feature extraction; Gray-scale; Image color analysis; Image segmentation; Lighting; Connected Component; K-mean; Multilevel;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Type
    conf
  • DOI
    10.1109/ICIS.2015.7166615
  • Filename
    7166615
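The abstract names multilevel thresholding and k-means clustering but gives no algorithmic detail, so the following is only a minimal sketch of one common way to combine the two: cluster the grayscale intensities with one-dimensional k-means and place a threshold midway between each pair of neighbouring centroids. The function names (`kmeans_1d`, `multilevel_thresholds`, `quantize`), the evenly spaced initialisation, and the toy image are all assumptions made for illustration; they are not taken from the paper and do not reproduce its preprocessing or filtering stages.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar gray levels into k groups and return sorted centroids.

    Hypothetical helper: plain Lloyd iterations on one-dimensional data.
    """
    lo, hi = min(values), max(values)
    # Assumption: spread the initial centroids evenly over the intensity range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for v in values:
            # Assign each gray level to its nearest centroid.
            j = min(range(k), key=lambda c: abs(v - centroids[c]))
            sums[j] += v
            counts[j] += 1
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = [sums[j] / counts[j] if counts[j] else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


def multilevel_thresholds(values, k):
    """Return k - 1 thresholds, each midway between adjacent centroids."""
    c = kmeans_1d(values, k)
    return [(a + b) / 2 for a, b in zip(c, c[1:])]


def quantize(image, thresholds):
    """Map each pixel to the index of the intensity band it falls in."""
    return [[sum(p > t for t in thresholds) for p in row] for row in image]


# Toy 3x4 grayscale image with dark, mid-gray, and bright pixels.
image = [
    [10, 12, 11, 200],
    [130, 128, 205, 198],
    [9, 131, 129, 202],
]
pixels = [p for row in image for p in row]
thresholds = multilevel_thresholds(pixels, 3)
labels = quantize(image, thresholds)
```

In a pipeline like the one the abstract outlines, a band map such as `labels` would be the input to later connected-component or filtering steps; which band holds the text depends on the image and is not decided by this sketch.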