DocumentCode :
2454647
Title :
Trainable table location in document images
Author :
Cesarini, F. ; Marinai, S. ; Sarti, L. ; Soda, G.
Author_Institution :
DSI, Universita di Firenze, Italy
Volume :
3
fYear :
2002
fDate :
2002
Firstpage :
236
Abstract :
We describe an approach for table location in document images. The documents are described by means of a hierarchical representation based on the MXY tree. The presence of a table is hypothesized by searching for parallel lines in the MXY tree of the page. This hypothesis is then verified by locating perpendicular lines or white spaces in the region between the parallel lines. Lastly, located tables can be merged on the basis of proximity and similarity criteria. An optimization method, which relies on the definition of an appropriate table location index, allows us to identify the optimal values of the thresholds involved in the algorithm. In this way, the algorithm can be adapted to recognize tables with different features by maximizing its performance on an appropriate training set. The algorithm has been evaluated on two data sets containing more than 1500 pages, and its results have been compared with the tables identified by two commercial OCRs.
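The hypothesize, verify and merge steps summarized in the abstract, together with threshold training, can be illustrated with a small Python sketch. It is not the authors' implementation and makes several loudly stated assumptions. Ruling lines are taken as already extracted flat lists of horizontal and vertical segments (the hypothetical `HLine`, `VLine` and `Box` types), so no MXY tree is built. Candidates are verified only with perpendicular lines, and the white-space test is omitted. The 0.8 height fraction, the merge rule and the `score` function are all illustrative; `score` is a hypothetical stand-in for the paper's table location index. The exhaustive grid search in `tune` is a swapped-in optimizer, since the abstract does not say which optimization method is used.

```python
# Illustrative sketch only: types, thresholds and scoring are hypothetical,
# and ruling lines are assumed to be pre-extracted (no MXY tree is built).
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HLine:  # horizontal ruling line at height y spanning [x0, x1]
    y: float
    x0: float
    x1: float


@dataclass(frozen=True)
class VLine:  # vertical ruling line at abscissa x spanning [y0, y1]
    x: float
    y0: float
    y1: float


@dataclass(frozen=True)
class Box:  # axis-aligned region, y grows downwards
    x0: float
    y0: float
    x1: float
    y1: float


def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def hypothesize(hlines, min_overlap):
    """Candidate regions between consecutive, horizontally overlapping lines."""
    ordered = sorted(hlines, key=lambda h: h.y)
    boxes = []
    for top, bot in zip(ordered, ordered[1:]):
        shorter = min(top.x1 - top.x0, bot.x1 - bot.x0)
        shared = overlap(top.x0, top.x1, bot.x0, bot.x1)
        if shorter > 0 and shared / shorter >= min_overlap:
            boxes.append(Box(max(top.x0, bot.x0), top.y, min(top.x1, bot.x1), bot.y))
    return boxes


def verify(box, vlines, min_vlines):
    """Accept a candidate if enough vertical lines cross most of its height."""
    height = box.y1 - box.y0
    crossing = [v for v in vlines
                if box.x0 <= v.x <= box.x1
                and overlap(v.y0, v.y1, box.y0, box.y1) >= 0.8 * height]
    return len(crossing) >= min_vlines


def merge(boxes, max_gap):
    """Merge vertically close (proximity) boxes whose x-extents overlap (similarity)."""
    merged = []
    for b in sorted(boxes, key=lambda b: b.y0):
        last = merged[-1] if merged else None
        if (last is not None and b.y0 - last.y1 <= max_gap
                and overlap(last.x0, last.x1, b.x0, b.x1) > 0):
            merged[-1] = Box(min(last.x0, b.x0), last.y0, max(last.x1, b.x1), b.y1)
        else:
            merged.append(b)
    return merged


def locate_tables(hlines, vlines, min_overlap=0.9, min_vlines=2, max_gap=5.0):
    candidates = hypothesize(hlines, min_overlap)
    return merge([b for b in candidates if verify(b, vlines, min_vlines)], max_gap)


def iou(a, b):
    inter = overlap(a.x0, a.x1, b.x0, b.x1) * overlap(a.y0, a.y1, b.y0, b.y1)
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


def score(found, truth, min_iou=0.5):
    """Hypothetical stand-in for the table location index: hits minus false alarms."""
    hits = sum(any(iou(f, t) >= min_iou for f in found) for t in truth)
    false_alarms = sum(not any(iou(f, t) >= min_iou for t in truth) for f in found)
    return hits - false_alarms


def tune(training_set, grid):
    """Grid search for the thresholds with the best total score on training pages."""
    best_total, best_params = None, None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        total = sum(score(locate_tables(h, v, **params), truth)
                    for h, v, truth in training_set)
        if best_total is None or total > best_total:
            best_total, best_params = total, params
    return best_params
```

For example, four evenly spaced horizontal lines crossed by three vertical lines are merged into a single table box. An isolated horizontal line further down the page still produces a candidate region, but that region fails verification. Calling `tune` on a few annotated pages returns the threshold combination with the highest total score, which mirrors, in simplified form, the idea of adapting the thresholds to a training set.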
Keywords :
data structures; document image processing; iterative methods; optical character recognition; optimisation; search problems; MXY tree; OCRs; document images; hierarchical representation; optimization method; parallel lines; perpendicular lines; proximity criteria; similarity criteria; trainable table location; white spaces; Data mining; Data structures; Design methodology; Image analysis; Optical character recognition software; Optimization methods; Particle separators; Text analysis; Tree data structures; White spaces;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2002. Proceedings. 16th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-1695-X
Type :
conf
DOI :
10.1109/ICPR.2002.1047838
Filename :
1047838