• DocumentCode
    2145845
  • Title
    A Scriptable, Statistical Oracle for a Metadata Extraction System
  • Author
    Maly, Kurt J. ; Zeil, Steven J. ; Zubair, Mohammad ; Amrou, Ashraf ; Aazhar, Ali ; Ratkal, Naveen
  • Author_Institution
    Old Dominion Univ., Norfolk
  • fYear
    2007
  • fDate
    11-12 Oct. 2007
  • Firstpage
    396
  • Lastpage
    403
  • Abstract
    An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable, provided that instances of failure during operation can be identified. The oracle combines a variety of deterministic tests and statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternative outputs for the same document would be preferred.
  • Keywords
    authoring languages; meta data; document collection; metadata extraction system; moderate failure rate; scripting language; statistical oracle; Application software; Computer errors; Computer science; Data mining; Engines; Error correction; Humans; Optical character recognition software; System testing; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Quality Software, 2007. QSIC '07. Seventh International Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    1550-6002
  • Print_ISBN
    978-0-7695-3035-2
  • Type
    conf
  • DOI
    10.1109/QSIC.2007.4385526
  • Filename
    4385526