DocumentCode :
3020689
Title :
Universal data capture technology from semi-structured forms
Author :
Tuganbaev, Diar ; Pakhchanian, Aram ; Deryagin, Dmitry
Author_Institution :
ABBYY Software House, Moscow, Russia
fYear :
2005
fDate :
29 Aug.-1 Sept. 2005
Firstpage :
458
Abstract :
This paper describes a universal technology for automated data capture from documents with similar data but different layouts, such as invoices, claim forms, resumes, contracts, and loan documents. Prior to data capture, the relevant data are detected on the document image. A formalization of top-down document analysis is suggested and a language for describing document structures is presented. Formalized descriptions in this language can be compiled into executable code. The process of matching such formalized descriptions with actual semi-structured documents in order to find the relevant data is described.
Keywords :
document image processing; string matching; formalized descriptions; semistructured documents; top-down document analysis; universal automated data capture technology; Biomedical imaging; Boosting; Contracts; Data mining; Filling; Image databases; Medical diagnostic imaging; Printing; System testing; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005)
ISSN :
1520-5263
Print_ISBN :
0-7695-2420-6
Type :
conf
DOI :
10.1109/ICDAR.2005.247
Filename :
1575588