Title :
Machine learning methods for automatically processing historical documents: from paper acquisition to XML transformation
Author :
Esposito, F. ; Malerba, D. ; Semeraro, G. ; Ferilli, S. ; Altamura, O. ; Basile, T. M. A. ; Berardi, M. ; Ceci, M. ; Di Mauro, N.
Author_Institution :
Dipt. di Informatica, Bari Univ., Italy
Abstract :
One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Because the originals of such material are often unique and scattered across various archives, making them widely accessible is a serious problem. One solution is to develop intelligent document processing tools that automatically transform printed documents into a Web-accessible form such as XML. Here, we propose using WISDOM++, a document processing system that makes heavy use of machine learning techniques to perform this task, and we report promising results obtained in preliminary experiments.
Keywords :
XML; digital libraries; document handling; history; learning (artificial intelligence); records management; COLLATE EU project; WISDOM++ document processing system; Web-based collaboratory; XML transformation; automatic historical document processing tools; digitized cultural material; machine learning; paper acquisition; Collaborative work; Cultural differences; Image sequence analysis; Layout; Learning systems; Optical character recognition software; Scattering; Software libraries; Text analysis;
Conference_Title :
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN :
0-7695-2088-X
DOI :
10.1109/DIAL.2004.1263262