Title :
Automatic table ground truth generation and a background-analysis-based table structure extraction method
Author :
Wang, Yalin ; Phillips, Ihsin T. ; Haralick, Robert
Author_Institution :
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA
fDate :
2001
Abstract :
We first describe an automatic table ground truth generation system that can efficiently generate a large amount of accurate table ground truth suitable for developing table detection algorithms. We then describe a novel background-analysis-based, coarse-to-fine table identification algorithm and an X-Y cut table decomposition algorithm. We discuss an experimental protocol for evaluating table detection algorithms. On a total of 1,125 document pages containing 518 table entities and 10,941 cell entities, our table detection algorithm takes line and word segmentation results as input and achieves a correct cell detection rate of around 90%.
Keywords :
document image processing; image segmentation; X-Y cut table decomposition algorithm; background analysis-based identification; document layout analysis; experimental results; line segmentation; table detection algorithms; table ground truth generation system; table structure extraction method; word segmentation; Clustering algorithms; Computer science; Data mining; Detection algorithms; Educational institutions; Image analysis; Image segmentation; Partitioning algorithms; Protocols; Text analysis;
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953845