Title :
Error detection in character recognition using pseudosyllable analysis
Author :
García, R. García ; Dimitriadis, Yannis A. ; Merino Pastor, F. ; Coronado, J. López
Author_Institution :
Dept. of Syst. Eng. & Control, Valladolid Univ., Spain
Abstract :
In modern document management systems it is difficult to include large vocabularies (more than 150,000 words) for on-line error detection. The main drawback lies in the manipulation of large amounts of data. This difficulty becomes critical if the system incorporates character recognition modules. In this paper we propose a new technique based on segmenting written text according to phonetic and etymological criteria. The procedure integrates dictionary and n-gram techniques: it checks whether the given character sequence matches a sequence of pseudosyllables (using a dictionary) and simultaneously checks whether each pair of pseudosyllables is admissible (through n-gram techniques). The results obtained with the proposed method, using word lists of various sizes as well as a Spanish corpus, are better than those of the n-gram methods typically used in error detection. Furthermore, it requires less memory and processing time than dictionary look-up methods.
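The combined check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tables `PSEUDOSYLLABLES` and `ADMISSIBLE_PAIRS`, the function `is_plausible`, and the `max_len` limit are hypothetical names and toy values chosen for illustration, and the search strategy (memoized recursion over split points) is an assumption. It also assumes that the pairs being checked are adjacent pseudosyllables.

```python
from functools import lru_cache

# Hypothetical toy tables; a real system would derive both from a word list or corpus.
PSEUDOSYLLABLES = {"ca", "sa", "me", "co", "mi", "da"}
ADMISSIBLE_PAIRS = {("ca", "sa"), ("me", "sa"), ("co", "mi"), ("mi", "da")}


def is_plausible(word, units=PSEUDOSYLLABLES, pairs=ADMISSIBLE_PAIRS, max_len=4):
    """Return True if `word` splits into known units whose adjacent pairs are all admissible."""

    @lru_cache(maxsize=None)
    def search(start, prev):
        # Reaching the end of the word means every unit and every pair was accepted.
        if start == len(word):
            return True
        for end in range(start + 1, min(len(word), start + max_len) + 1):
            unit = word[start:end]
            if unit not in units:
                continue  # dictionary check on the unit failed
            if prev is not None and (prev, unit) not in pairs:
                continue  # pair (bigram) check failed
            if search(end, unit):
                return True
        return False

    # An empty string is treated as implausible in this sketch.
    return bool(word) and search(0, None)
```

With these toy tables, `is_plausible("casa")` and `is_plausible("comida")` are accepted, while `is_plausible("saca")` is rejected because the pair ("sa", "ca") is not listed, even though both units are in the dictionary. Storing short units and their pairs rather than whole words is what would keep the tables small compared with a full-word dictionary.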
Keywords :
character recognition; computational linguistics; document handling; Document Management Systems; Spanish; character sequence; dictionary n-gram techniques; large vocabularies; pseudosyllable analysis; text segmentation; Communication system control; Control systems; Dictionaries; Error correction; Industrial engineering; Systems engineering and theory; Telecommunication control; Telematics; Vocabulary
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.599032