Title :
Supervised and unsupervised automatic spelling correction algorithms
Author :
Van Delden, Sebastian ; Bracewell, David ; Gomez, Fernando
Author_Institution :
Dept. of Math. & Comput. Sci., South Carolina Univ., Spartanburg, SC, USA
Abstract :
We present two algorithms for automatically improving the quality of texts that contain a large number of spelling errors. The first is a supervised algorithm that automatically corrects unknown words generated primarily by typing errors. The second is an unsupervised approach that automatically corrects typing errors, individual words that have been split, multiple words that have been concatenated, and combinations of these errors. The algorithms have been developed and tested on a large source of real-world, human- and machine-generated spelling errors.
Keywords :
natural languages; text analysis; word processing; automatic spelling correction algorithms; spelling errors; supervised algorithm; unsupervised approach; Computer errors; Computer science; Concatenated codes; Databases; Error correction; Filtering algorithms; Humans; Information retrieval; NASA; Natural languages
Conference_Title :
Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
Print_ISBN :
0-7803-8819-4
DOI :
10.1109/IRI.2004.1431515