DocumentCode :
3498030
Title :
What can we learn from the processing of 165,000 forms from the 19th century?
Author :
Coüasnon, Bertrand
Author_Institution :
IRISA/INSA, Campus Univ. de Beaulieu, Rennes
fYear :
2006
fDate :
27-28 April 2006
Lastpage :
179
Abstract :
This paper presents an assessment of the structure recognition of 165,000 pages of military forms from the 19th century. This recognition was performed with the DMOS method, a generic structure recognition method already applied to various kinds of documents: musical scores, mathematical formulae, recursive table structures and archival documents. With such a volume of documents, we were confronted with the real difficulties found in ancient documents. We present in this paper what we learned from this processing at a very large scale: in archival documents it is almost impossible to foresee the difficulties that will have to be dealt with. Even with a large sample considered representative by archivists, the documents we had to deal with were much more damaged than anticipated. Even with strong and precise specifications on the way documents should be digitized, these specifications were not followed at all, introducing new difficulties for the recognition phase. To overcome these unexpected difficulties, the genericity of the DMOS method was particularly important.
Keywords :
document image processing; history; military computing; DMOS method; ancient documents; archival documents; form processing; generic structure recognition; military forms; page structure recognition; Image analysis; Image recognition; Image segmentation; Large-scale systems; Libraries; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
Type :
conf
DOI :
10.1109/DIAL.2006.47
Filename :
1612960