DocumentCode
2145845
Title
A Scriptable, Statistical Oracle for a Metadata Extraction System
Author
Maly, Kurt J. ; Zeil, Steven J. ; Zubair, Mohammad ; Amrou, Ashraf ; Aazhar, Ali ; Ratkal, Naveen
Author_Institution
Old Dominion Univ., Norfolk
fYear
2007
fDate
11-12 Oct. 2007
Firstpage
396
Lastpage
403
Abstract
An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable provided that instances of failures during operation can be identified. The oracle combines a variety of deterministic tests and statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternate outputs for the same document would be preferred.
Keywords
authoring languages; meta data; document collection; metadata extraction system; moderate failure rate; scripting language; statistical oracle; Application software; Computer errors; Computer science; Data mining; Engines; Error correction; Humans; Optical character recognition software; System testing; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Quality Software, 2007. QSIC '07. Seventh International Conference on
Conference_Location
Portland, OR
ISSN
1550-6002
Print_ISBN
978-0-7695-3035-2
Type
conf
DOI
10.1109/QSIC.2007.4385526
Filename
4385526