DocumentCode
2483833
Title
Ancient document analysis based on text line extraction
Author
Kleber, Florian ; Sablatnig, Robert ; Gau, Melanie ; Miklas, Heinz
Author_Institution
Inst. of Comput. Aided Autom., Vienna Univ. of Technol., Vienna
fYear
2008
fDate
8-11 Dec. 2008
Firstpage
1
Lastpage
4
Abstract
To preserve our cultural heritage and to enable automated document processing, libraries and national archives have started digitizing historical documents. In manuscripts degraded by mold, humidity, or poor storage conditions, the text, or parts of it, can disappear. The remaining parts of the text can be segmented, and the ruling can be extrapolated using a priori knowledge. Since the ruling defines the position of the text on a page, it can be used for layout analysis and as a basis for enhancing readability. Furthermore, analyzing the ruling can yield information about the scribe (hand) of the manuscript and its spatiotemporal origin. This paper presents an algorithm for ruling estimation of Glagolitic texts based on text line extraction. It is suitable for degraded manuscripts because it extrapolates the baselines using a priori knowledge of the ruling. The algorithm was tested on 30 pages of the Missale Sinaiticum, and the evaluation was based on visual criteria.
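The record does not describe how the baselines are extrapolated. The following is a minimal sketch of one way to extrapolate a ruling from a partial set of detected baselines, and it is not the authors' method. It assumes the ruling is equidistant and that the a priori knowledge amounts to an approximate line spacing (`spacing_prior`) and the number of ruled lines per page (`n_lines`). The helper `extrapolate_ruling` and its `first_index` parameter, which states which ruled line the topmost detected baseline belongs to, are hypothetical. Each detected baseline is assigned to a ruling index by rounding against the prior spacing, and a least-squares line y = a + b·k is then fitted.

```python
def extrapolate_ruling(detected, n_lines, spacing_prior, first_index=0):
    """Hypothetical helper: estimate the y positions of all n_lines ruled
    baselines from a subset of detected baselines, assuming the ruling is
    equidistant (a priori knowledge: approximate spacing and line count)."""
    ys = sorted(detected)
    if not ys:
        raise ValueError("need at least one detected baseline")
    # Assign each detected baseline to a ruling index using the prior spacing;
    # this assumes detection jitter is below half the line spacing.
    ks = [first_index + round((y - ys[0]) / spacing_prior) for y in ys]
    if len(set(ks)) < 2:
        # A single distinct index cannot fix the slope: use the prior spacing.
        b = float(spacing_prior)
        a = ys[0] - b * ks[0]
    else:
        # Ordinary least-squares fit of y = a + b * k over the detected lines.
        n = len(ks)
        mean_k = sum(ks) / n
        mean_y = sum(ys) / n
        sxx = sum((k - mean_k) ** 2 for k in ks)
        sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
        b = sxy / sxx
        a = mean_y - b * mean_k
    # Extrapolate to every ruled line, including those whose text is lost.
    return [a + b * k for k in range(n_lines)]


# Lines 2 and 5 (indices 2, 5) are missing, e.g. because the ink has faded.
print(extrapolate_ruling([100, 130, 190, 220], n_lines=6, spacing_prior=30))
# → [100.0, 130.0, 160.0, 190.0, 220.0, 250.0]
```

A least-squares fit is used here so that small localization errors in individual detected baselines are averaged out rather than propagated to the extrapolated lines.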
Keywords
document handling; Glagolitic texts; Missale Sinaiticum; ancient document analysis; automated document processing libraries; degraded manuscripts; historical documents; national archives; text line extraction; Algorithm design and analysis; Clustering algorithms; Data mining; Degradation; Image analysis; Image segmentation; Information analysis; Software libraries; Spatiotemporal phenomena; Text analysis;
fLanguage
English
Publisher
ieee
Conference_Titel
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location
Tampa, FL
ISSN
1051-4651
Print_ISBN
978-1-4244-2174-9
Electronic_ISBN
1051-4651
Type
conf
DOI
10.1109/ICPR.2008.4761530
Filename
4761530