DocumentCode
305708
Title
An edge-based block segmentation and classification for document analysis with automatic character string extraction
Author
Park, Chang-Joon ; Jeon, Joon-Hyung ; Koo, Tak-Mo ; Choi, Heung-Moon
Author_Institution
Sch. of Electron. & Electr. Eng., Kyungpook Nat. Univ., Taegu, South Korea
Volume
1
fYear
1996
fDate
14-17 Oct 1996
Firstpage
707
Abstract
Presents an edge-based method for block segmentation and classification with automatic character string extraction for document analysis. By exploiting only four edge features derived from the gradient and the orientation of the edge pixels, we make block segmentation, classification, and character string extraction all insensitive to background noise and brightness variation in the image. We can efficiently classify the blocks of a document image into seven categories: small-sized letters, large-sized letters, tables, equations, flow charts, graphs, and photographs. The first five are character blocks containing text, and the last two are non-character blocks. We obtain an efficient block segmentation with reduced memory size by introducing the column and text line intervals of the document into the CRLA (constrained run length algorithm). The simulation results show that document image segmentation, block classification, and character string extraction can be done efficiently and concurrently.
Keywords
document image processing; edge detection; feature extraction; image classification; image segmentation; automatic character string extraction; constrained run length algorithm; document analysis; edge-based block segmentation; edge-based classification; equations; flow charts; graphs; large-sized letters; photographs; small-sized letters; tables; Background noise; Brightness; Computer science; Data mining; Equations; Feature extraction; Flowcharts; Image segmentation; Pixel; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics, 1996., IEEE International Conference on
Conference_Location
Beijing
ISSN
1062-922X
Print_ISBN
0-7803-3280-6
Type
conf
DOI
10.1109/ICSMC.1996.569881
Filename
569881