Title :
Vietnamese spelling detection and correction using Bi-gram, Minimum Edit Distance, SoundEx algorithms with some additional heuristics
Author :
Nguyen, Phuong H. ; Ngo, Thuan D. ; Phan, Dung A. ; Dinh, Thu P T ; Huynh, Thang Q.
Author_Institution :
Dept. of Software Eng., Hanoi Univ. of Technol., Hanoi
Abstract :
Spell checking is commonly divided into two phases: detection and correction. In this paper, we present a new approach to Vietnamese spell checking that exploits characteristics of the Vietnamese language in each phase. In the detection phase, we use a syllable bi-gram model in combination with part-of-speech (POS) information to identify suspect syllables. In the correction phase, we use minimum edit distance, the SoundEx algorithm, and some additional heuristics to build a weight function for assessing suggestion candidates. The training corpus and the test set were collected from e-newspapers.
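To make the two phases described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a plain bi-gram count table for detection and uses only unit-cost minimum edit distance for ranking; the POS information, SoundEx component, heuristics, and weight function from the paper are omitted because the abstract does not specify them. The function names, the `threshold` parameter, and the toy data are hypothetical.

```python
from collections import Counter


def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suspect_syllables(syllables, bigram_counts, threshold=1):
    """Return positions whose bi-gram with the previous syllable
    occurs fewer than `threshold` times (hypothetical cutoff)."""
    return [i for i in range(1, len(syllables))
            if bigram_counts[(syllables[i - 1], syllables[i])] < threshold]


def rank_candidates(wrong, lexicon):
    """Order lexicon entries by edit distance to the suspect syllable,
    breaking ties alphabetically."""
    return sorted(lexicon, key=lambda c: (min_edit_distance(wrong, c), c))


# Toy usage with unaccented placeholder syllables (assumed data).
counts = Counter({("xin", "chao"): 5})
flagged = suspect_syllables(["xin", "chao", "ban"], counts)
ranked = rank_candidates("chao", ["chao", "cho", "bao"])
```

A `Counter` is used for the bi-gram table so that unseen pairs count as zero instead of raising a `KeyError`. A fuller system along the lines of the abstract would combine the edit-distance score with a phonetic score and other heuristics into a single weight.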
Keywords :
natural language processing; speech recognition; SoundEx algorithms; Vietnamese spelling correction; Vietnamese spelling detection; bi-gram; correcting phase; minimum edit distance; phase detection; spelling checking problem; Dictionaries; Error correction; Heuristic algorithms; Natural languages; Optical character recognition software; Phase detection; Software algorithms; Software engineering; Speech; Testing; Minimum Edit Distance; N-gram; SoundEx; Vietnamese spelling checking; spelling correction; spelling detection;
Conference_Title :
Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4244-2379-8
Electronic_ISBN :
978-1-4244-2380-4
DOI :
10.1109/RIVF.2008.4586339