DocumentCode :
1583667
Title :
Structure in on-line documents
Author :
Jain, Anil K. ; Namboodiri, Anoop M. ; Subrahmonia, Jayashree
Author_Institution :
Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
fYear :
2001
Firstpage :
844
Lastpage :
848
Abstract :
We present a hierarchical approach for extracting homogeneous regions in on-line documents. The problem of identifying and processing ruled and unruled tables, text and drawings is addressed. The on-line document is first segmented into regions with only text strokes and regions with both text and non-text strokes. The text region is further classified as unruled table or plain text. Stroke clustering is used to segment the non-text regions. Each non-text segment is then classified as drawing, ruled table or underlined keyword using stroke properties. The individual regions are processed and the results are assembled to identify the structure of the on-line document.
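The non-text branch of the pipeline described in the abstract (separate text from non-text strokes, cluster the non-text strokes into segments, then label each segment from stroke properties) can be illustrated with a minimal Python sketch. It is not the authors' method. The per-stroke size test stands in for the paper's region-level segmentation. The features (bounding-box size, straightness, orientation), the thresholds, the single-linkage clustering and every function name are hypothetical choices made for illustration. The unruled-table versus plain-text decision for text regions is not shown.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # an on-line stroke as an ordered list of pen positions

# Hypothetical thresholds, chosen only for illustration.
TEXT_SIZE_MAX = 40.0     # larger bounding-box diagonal => treated as non-text
CLUSTER_GAP = 10.0       # max gap between bounding boxes to join one segment
STRAIGHTNESS_MIN = 0.95  # endpoint distance / path length for a line-like stroke


def bbox(s: Stroke) -> Tuple[float, float, float, float]:
    xs, ys = [p[0] for p in s], [p[1] for p in s]
    return min(xs), min(ys), max(xs), max(ys)


def diagonal(s: Stroke) -> float:
    x0, y0, x1, y1 = bbox(s)
    return math.hypot(x1 - x0, y1 - y0)


def straightness(s: Stroke) -> float:
    # 1.0 for a perfectly straight stroke, near 0 for a closed curve.
    path = sum(math.dist(a, b) for a, b in zip(s, s[1:]))
    return math.dist(s[0], s[-1]) / path if path > 0 else 0.0


def orientation(s: Stroke) -> str:
    (xa, ya), (xb, yb) = s[0], s[-1]
    return "h" if abs(xb - xa) >= abs(yb - ya) else "v"


def box_gap(a: Stroke, b: Stroke) -> float:
    # Euclidean gap between two bounding boxes (0 if they touch or overlap).
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    dx = max(0.0, max(ax0, bx0) - min(ax1, bx1))
    dy = max(0.0, max(ay0, by0) - min(ay1, by1))
    return math.hypot(dx, dy)


def cluster(strokes: List[Stroke]) -> List[List[Stroke]]:
    """Single-linkage clustering with union-find over all stroke pairs."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if box_gap(strokes[i], strokes[j]) <= CLUSTER_GAP:
                parent[find(i)] = find(j)
    groups: Dict[int, List[Stroke]] = {}
    for i, s in enumerate(strokes):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def classify_segment(segment: List[Stroke]) -> str:
    """Label a non-text segment with hypothetical stroke-property rules."""
    line_like = all(straightness(s) >= STRAIGHTNESS_MIN for s in segment)
    orients = {orientation(s) for s in segment}
    if line_like and orients == {"h", "v"}:
        return "ruled table"
    if line_like and len(segment) == 1 and orients == {"h"}:
        return "underlined keyword"
    return "drawing"


def analyze(strokes: List[Stroke]) -> Tuple[List[Stroke], List[str]]:
    text = [s for s in strokes if diagonal(s) <= TEXT_SIZE_MAX]
    non_text = [s for s in strokes if diagonal(s) > TEXT_SIZE_MAX]
    return text, [classify_segment(seg) for seg in cluster(non_text)]


if __name__ == "__main__":
    page = [
        [(0, 0), (100, 0)], [(0, 50), (100, 50)],   # table rules
        [(0, 0), (0, 50)], [(100, 0), (100, 50)],
        [(300, 200), (380, 200)],                   # underline
        [(500, 500), (530, 470), (560, 500), (530, 530), (505, 505)],  # sketch
        [(10, 100), (15, 110), (20, 100)],          # small text stroke
    ]
    text, labels = analyze(page)
    print(len(text), sorted(labels))
```

Single-linkage clustering is used here because table rules that never touch each other directly (the two horizontal lines) are still grouped through the vertical rules that connect them. A real system would tune the thresholds on data and would use richer features than these.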
Keywords :
document image processing; feature extraction; image classification; image segmentation; document analysis; document understanding; drawings; extracting homogeneous regions; on-line documents; segmented; segmenting document pages; table identification; tables; text; text recognition; text strokes; Assembly; Color; Data preprocessing; Focusing; Graphics; Handwriting recognition; Image segmentation; Text analysis; Text recognition; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
Type :
conf
DOI :
10.1109/ICDAR.2001.953906
Filename :
953906