Title :
Layout Analysis for Historical Manuscripts Using SIFT Features
Author :
Garz, Angelika ; Sablatnig, Robert ; Diem, Markus
Author_Institution :
Comput. Vision Lab., Vienna Univ. of Technol., Vienna, Austria
Abstract :
We propose a layout analysis method for historical manuscripts that relies on the part-based identification of layout entities. A layout entity, such as the letters of the text, an initial or a heading, is composed of a set of characteristic segments or structures, and these sets differ between the classes of entities in the manuscripts under consideration. We exploit this fact to segment a manuscript page into homogeneous regions. Historical documents typically pose challenges such as uneven writing support, varying character shapes, fluctuating text lines, changing scripts and writing styles, and variance in the layout itself. Hence, we propose a part-based detection of layout entities that uses a multi-stage algorithm, based on interest points, to localize the entities. Results show that the proposed method locates initials, headings and text areas sufficiently well in ancient manuscripts containing stains, tears and partially faded-out ink.
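To illustrate the localization idea described in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes each interest point has already been assigned a class label (for example by a classifier applied to its SIFT descriptor) and lets the points vote within a coarse grid. The function name `localize_entities`, the grid cell size, the vote threshold and the sample coordinates are all hypothetical.

```python
from collections import Counter, defaultdict


def localize_entities(points, cell_size=50, min_votes=2):
    """Aggregate per-point class labels into coarse grid regions.

    points: iterable of (x, y, label) tuples, one per interest point,
            where label is an already-predicted class (assumption).
    Returns a dict mapping (column, row) grid cells to the majority label.
    Cells whose winning label has fewer than min_votes votes are dropped.
    """
    votes = defaultdict(Counter)
    for x, y, label in points:
        cell = (int(x // cell_size), int(y // cell_size))
        votes[cell][label] += 1

    regions = {}
    for cell, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count >= min_votes:  # suppress isolated, likely spurious points
            regions[cell] = label
    return regions


# Hypothetical labelled interest points: (x, y, predicted class)
sample_points = [
    (10, 12, "initial"), (20, 30, "initial"), (40, 45, "text"),
    (120, 15, "text"), (130, 40, "text"),
    (260, 10, "heading"),  # a single vote, so this cell is discarded
]
regions = localize_entities(sample_points)
```

With the sample points above, the cell containing two "initial" votes and one "text" vote is labelled "initial", and the lone "heading" point is discarded because it falls below the vote threshold. A real system would need a more careful localization stage than fixed grid voting.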
Keywords :
character recognition; document image processing; text analysis; SIFT features; historical manuscript; layout analysis; layout entity; multistage algorithm; part-based identification; scale invariant feature transform; writing styles; Clustering algorithms; Layout; Noise; Robustness; Shape; Support vector machines; Writing; SIFT; document layout; handwritten; historical manuscripts; part-based
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.108