Author :
Bhattacharya, Ujjwal ; Shaw, Bikash ; Parui, Swapan K.
Abstract :
Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers´ idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.