  • DocumentCode
    2483833
  • Title
    Ancient document analysis based on text line extraction
  • Author
    Kleber, Florian; Sablatnig, Robert; Gau, Melanie; Miklas, Heinz
  • Author_Institution
    Inst. of Comput. Aided Autom., Vienna Univ. of Technol., Vienna
  • fYear
    2008
  • fDate
    8-11 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    To preserve our cultural heritage and to enable automated document processing, libraries and national archives have started digitizing historical documents. In degraded manuscripts (e.g. damaged by mold, humidity, or bad storage conditions), the text or parts of it can disappear. The remaining parts of the text can be segmented, and the ruling can be extrapolated using a priori knowledge. Since the ruling defines the position of the text within a page, it can be used for layout analysis and as a basis for enhancing readability. Furthermore, information about the scribe (hand) of the manuscript and its spatiotemporal origin can be gained by analyzing the ruling. This paper presents an algorithm for ruling estimation of Glagolitic texts that is based on text line extraction and is suitable for degraded manuscripts, since it extrapolates the baselines using a priori knowledge of the ruling. The algorithm was tested on 30 pages of the Missale Sinaiticum, and the evaluation was based on visual criteria.
  • Keywords
    document handling; Glagolitic texts; Missale Sinaiticum; ancient document analysis; automated document processing libraries; degraded manuscripts; historical documents; national archives; text line extraction; Algorithm design and analysis; Clustering algorithms; Data mining; Degradation; Image analysis; Image segmentation; Information analysis; Software libraries; Spatiotemporal phenomena; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
  • Conference_Location
    Tampa, FL
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-2174-9
  • Electronic_ISBN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2008.4761530
  • Filename
    4761530