DocumentCode :
3030957
Title :
Correcting English text using PPM models
Author :
Teahan, W.J. ; Inglis, S. ; Cleary, J.G. ; Holmes, G.
Author_Institution :
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
fYear :
1998
fDate :
30 Mar-1 Apr 1998
Firstpage :
289
Lastpage :
298
Abstract :
An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. For optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be misrecognized, while for spelling correction, two characters may be transposed, or a character may be inadvertently inserted or omitted. This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and is shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 96.3% to 96.9%, a reduction of about 14 errors per page.
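The abstract describes ranking candidate corrections (transpositions, insertions, omissions, substitutions) with a character-level language model. The following is a minimal sketch of that idea under stated assumptions, not the authors' method. PPM blends several context orders through escape probabilities, whereas this sketch substitutes a single fixed-order character model with add-one smoothing. It also only considers candidates one edit away. The names `CharModel`, `single_edits`, `correct` and the per-edit probability `edit_prob=0.01` are hypothetical and chosen purely for illustration.

```python
import math
import string
from collections import Counter, defaultdict

ALPHABET = string.ascii_lowercase
END = "$"  # end-of-word symbol (predicted like a letter)
PAD = "^"  # start-of-word padding (never predicted)


class CharModel:
    """Hypothetical fixed-order character model with add-one smoothing.

    A simplified stand-in for PPM: it uses one context order only,
    with no escape mechanism or blending of lower orders.
    """

    def __init__(self, text: str, order: int = 2):
        self.order = order
        self.counts: dict[str, Counter] = defaultdict(Counter)
        for word in text.lower().split():
            padded = PAD * order + word + END
            for i in range(order, len(padded)):
                self.counts[padded[i - order:i]][padded[i]] += 1
        self.vocab_size = len(ALPHABET) + 1  # 26 letters plus END

    def log_prob(self, word: str) -> float:
        """Log probability of a word, including its end marker."""
        padded = PAD * self.order + word + END
        total = 0.0
        for i in range(self.order, len(padded)):
            # .get avoids inserting unseen contexts into the defaultdict
            ctx = self.counts.get(padded[i - self.order:i], Counter())
            total += math.log(
                (ctx[padded[i]] + 1) / (sum(ctx.values()) + self.vocab_size)
            )
        return total


def single_edits(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | substitutes | inserts


def correct(word: str, model: CharModel, edit_prob: float = 0.01) -> str:
    """Pick the candidate maximizing model score plus a hypothetical edit cost."""
    def score(candidate: str) -> float:
        penalty = 0.0 if candidate == word else math.log(edit_prob)
        return model.log_prob(candidate) + penalty

    return max(single_edits(word) | {word}, key=score)
```

For example, a model trained on `"the cat sat on the mat " * 5` corrects `"teh"` to `"the"` and leaves `"cat"` unchanged. The edit penalty matters: without it, `"cat"`, `"sat"` and `"mat"` would score identically and the choice between them would be arbitrary.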
Keywords :
data compression; image recognition; natural languages; optical character recognition; word processing; English text correction; PPM models; accuracy; commercial OCR system; language modeler; natural language processing; optical character recognition; pixels; post-processing stage; scanning quality; spelling correction; text compression; Application software; Character recognition; Communication channels; Computer errors; Computer science; Error correction; Natural language processing; Natural languages; Optical character recognition software; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 1998. DCC '98. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-8406-2
Type :
conf
DOI :
10.1109/DCC.1998.672157
Filename :
672157