Abstract:
To process high-volume input data, such as the scanned images of publishers' book and journal collections, content understanding systems should run automatically, continuously, and without human attendance. Ensuring the output quality of such systems is challenging, however, so automated quality assurance (QA) techniques are essential to their success. The author discusses three automated QA techniques that were developed for Hewlett-Packard's Digital Content ReMastering system.
Keywords:
document image processing; quality control; text analysis; Digital Content ReMastering system; Hewlett-Packard; automated quality assurance; book collections; content understanding systems; document understanding systems; journal collections; Computer architecture; Hardware; Humans; Material storage; Network servers; Optical character recognition software; Quality assurance; Switches; Workstations; XML