• DocumentCode
    305708
  • Title
    An edge-based block segmentation and classification for document analysis with automatic character string extraction
  • Author
    Park, Chang-Joon ; Jeon, Joon-Hyung ; Koo, Tak-Mo ; Choi, Heung-Moon
  • Author_Institution
    Sch. of Electron. & Electr. Eng., Kyungpook Nat. Univ., Taegu, South Korea
  • Volume
    1
  • fYear
    1996
  • fDate
    14-17 Oct 1996
  • Firstpage
    707
  • Abstract
    This paper presents an edge-based block segmentation and classification method for document analysis with automatic character string extraction. By using only four edge features derived from the gradient and the orientation of the edge pixels, block segmentation, classification, and character string extraction are all made insensitive to background noise and brightness variation in the image. A document image is classified into seven categories: small-sized letters, large-sized letters, tables, equations, flow charts, graphs, and photographs. The first five are character blocks containing text, and the last two are non-character blocks. Introducing the column and text line intervals of the document into the CRLA (constrained run length algorithm) yields an efficient block segmentation with reduced memory size. The simulation results show that document image segmentation, block classification, and character string extraction can be done concurrently.
  • Keywords
    document image processing; edge detection; feature extraction; image classification; image segmentation; automatic character string extraction; constrained run length algorithm; document analysis; edge-based block segmentation; edge-based classification; equations; flow charts; graphs; large-sized letters; photographs; small-sized letters; tables; Background noise; Brightness; Computer science; Data mining; Equations; Feature extraction; Flowcharts; Image segmentation; Pixel; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics, 1996., IEEE International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-3280-6
  • Type
    conf
  • DOI
    10.1109/ICSMC.1996.569881
  • Filename
    569881