Title of article :
Selecting a restoration technique to minimize OCR error
Author/Authors :
M. Cannon, M. Fugate, D.R. Hush, C. Scovel
Issue Information :
Journal with serial issue number, year 2003
Abstract :
This paper introduces a learning problem related to the task of converting printed documents to ASCII text files. The goal of the learning procedure is to produce a function that maps documents to restoration techniques in such a way that on average the restored documents have minimum optical character recognition error. We derive a general form for the optimal function and use it to motivate the development of a nonparametric method based on nearest neighbors. We also develop a direct method of solution based on empirical error minimization for which we prove a finite sample bound on estimation error that is independent of distribution. We show that this empirical error minimization problem is an extension of the empirical optimization problem for traditional M-class classification with general loss function and prove computational hardness for this problem. We then derive a simple iterative algorithm called generalized multiclass ratchet (GMR) and prove that it produces an optimal function asymptotically (with probability 1). To obtain the GMR algorithm, we introduce a new data map that extends Kesler's construction for the multiclass problem and then apply an algorithm called Ratchet to this mapped data, where Ratchet is a modification of the Pocket algorithm. Finally, we apply these methods to a collection of documents and report on the experimental results.
Keywords :
Reflectance measurements, corn, Nitrogen deficiency, Crop N monitoring
Journal title :
IEEE TRANSACTIONS ON NEURAL NETWORKS