Title :
Data-driven spell checking: The synergy of two algorithms for spelling error detection and correction
Author :
Jayalatharachchi, E. ; Wasala, A. ; Weerasinghe, Ruvan
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
Abstract :
Sinhala, the majority language of Sri Lanka, is still in its infancy with respect to natural language processing research and applications. Spell checking is an important application that has received inadequate attention. One of the major obstacles to implementing a Sinhala spell checker is the lack of resources such as morphological analyzers, tagged corpora and comprehensive lexica. Because Sinhala morphology is rich, an entirely rule-based approach is inadequate. An interesting alternative is to use data-driven approaches. This research attempts to improve the quality of Subasa, an existing n-gram-based, data-driven spell checker, by using minimum edit distance techniques, and to make the system freely available online. Our empirical results show that the proposed design changes improved spell-checking coverage. In addition, we compare the performance of this system with others in the literature.
Keywords :
data handling; natural language processing; Sinhala spell checker; Sri Lanka; Subasa; comprehensive lexica; corpora lexica; data driven spell checking; edit distance techniques; morphological analyzers; natural language processing applications; natural language processing research; spelling error correction; spelling error detection; two algorithm synergy; Dictionaries; Data-driven; Sinhala spell-checking; edit-distance
Conference_Title :
Advances in ICT for Emerging Regions (ICTer), 2012 International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4673-5529-2
DOI :
10.1109/ICTer.2012.6422063