Title :
Thai OCR error correction using genetic algorithm
Author :
Kruatrachue, Boontee ; Somguntar, Krich ; Siriboon, Kritawan
Author_Institution :
King Mongkut's Inst. of Technol., Bangkok, Thailand
Abstract :
This paper presents an efficient method for Thai OCR error correction based on a genetic algorithm (GA). The correction process starts by constructing a word graph from dictionary-based spell checking; the graph is then searched for the corrected sentence with the highest perplexity (using bi-gram and tri-gram language models) and word probability from OCR. For a long sentence, the search space is huge, and GA is used to make the search tractable. A list of nodes, rather than a standard binary string, is used for chromosome encoding to represent all possible paths in the graph. The performance of the suggested technique is evaluated and compared with a full search on test sentences of different sizes, constructed from word graphs of 10 to 200 nodes.
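The path-based chromosome described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the bigram table, the floor value, the operators (splice crossover at a shared node, tail-regrowth mutation, truncation selection) and all function names are assumptions made for illustration. The fitness here is simply the summed log bigram and log OCR probabilities along a path, used as a stand-in for the paper's language-model score.

```python
import math
import random

# Hypothetical toy word graph: node -> (word, OCR probability, successor nodes).
# Node 0 is the start marker and node 5 the end marker.
GRAPH = {
    0: ("<s>", 1.0, [1, 2]),
    1: ("the", 0.6, [3, 4]),
    2: ("then", 0.4, [3, 4]),
    3: ("cat", 0.7, [5]),
    4: ("cut", 0.3, [5]),
    5: ("</s>", 1.0, []),
}
START, END = 0, 5

# Hypothetical bigram probabilities; unseen pairs get a small floor value.
BIGRAM = {("<s>", "the"): 0.5, ("the", "cat"): 0.4, ("cat", "</s>"): 0.5}
FLOOR = 0.01


def random_path(rng, start=START):
    """Random walk from `start` to END; a chromosome is the list of visited nodes."""
    path = [start]
    while path[-1] != END:
        path.append(rng.choice(GRAPH[path[-1]][2]))
    return path


def fitness(path):
    """Sum of log bigram and log OCR probabilities along the path (higher is better)."""
    score = 0.0
    for a, b in zip(path, path[1:]):
        pair = (GRAPH[a][0], GRAPH[b][0])
        score += math.log(BIGRAM.get(pair, FLOOR)) + math.log(GRAPH[b][1])
    return score


def crossover(rng, p1, p2):
    """Splice two parents at a shared interior node so the child is still a valid path."""
    common = [n for n in p1[1:-1] if n in p2]
    if not common:
        return p1[:]
    node = rng.choice(common)
    return p1[:p1.index(node)] + p2[p2.index(node):]


def mutate(rng, path, rate=0.2):
    """With probability `rate`, regrow the tail of the path from a random cut point."""
    if rng.random() >= rate:
        return path
    cut = rng.randrange(len(path) - 1)  # never cuts at the END node
    return path[:cut] + random_path(rng, path[cut])


def search(pop_size=20, generations=30, seed=0):
    """Evolve a population of paths and return the fittest one found."""
    rng = random.Random(seed)
    pop = [random_path(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best paths survive
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)
```

Because every chromosome is itself a start-to-end path, crossover and mutation in this sketch never produce an invalid sentence, which is the practical advantage of a node list over a binary string. Calling `search()` returns the list of node ids of the best path found; mapping each id through `GRAPH` recovers the words.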
Keywords :
error correction; genetic algorithms; optical character recognition; protocols; Thai OCR error correction; chromosome encoding; dictionary; genetic algorithm; spell checking; word graph construction; word probability; Character generation; Character recognition; Dictionaries; Error correction; Genetic algorithms; Natural languages; Optical character recognition software; Read only memory; Testing
Conference_Title :
Cyber Worlds, 2002. Proceedings. First International Symposium on
Print_ISBN :
0-7695-1862-1
DOI :
10.1109/CW.2002.1180870