Title :
Syntactic and Semantic Labeling of Hierarchically Organized Document Image Components of Indian Scripts
Author :
Harit, Gaurav ; Garg, Ritu ; Chaudhury, Santanu
Author_Institution :
IIT Kharagpur, Kharagpur
Abstract :
In this paper we describe our document image analysis system, which performs segmentation, content characterization, and semantic labeling of components. Segmentation is performed using white spaces and yields segmented components arranged in a hierarchy. Semantic labeling uses domain knowledge, specified where possible in the form of a document model applicable to a class of documents. The novelty of the system lies in the suite of methods it employs, which are capable of handling documents in Indian scripts. We have obtained promising results for semantic segmentation of over 30 categories of documents in Indian scripts.
Keywords :
document image processing; image segmentation; Indian script; content characterization; document image analysis system; gray-scale image processing; hierarchically organized document image component; semantic labeling; syntactic labeling; white space; Computer science; Graphics; Image segmentation; Iterative algorithms; Labeling; Particle separators; Pattern recognition; Skeleton; Text analysis; White spaces; Page Segmentation; Semantic labeling;
Conference_Title :
Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-3335-3
DOI :
10.1109/ICAPR.2009.88