DocumentCode :
3334515
Title :
A non word error spell checker for Indonesian using morphologically analyzer and HMM
Author :
Soleh, M.Y. ; Purwarianti, Ayu
Author_Institution :
Dept. of Inf., Bandung Inst. of Technol., Bandung, Indonesia
fYear :
2011
fDate :
17-19 July 2011
Firstpage :
1
Lastpage :
6
Abstract :
A spell checker consists of two main stages: error detection and error correction. In this study, the spell checker uses a morphological analyzer and dictionary lookup for error detection, with two alternative lookup optimizations: binary search and hashing. For error correction, two alternative methods are used: forward reversed dictionary and probability of similarity. The forward reversed dictionary method corrects a misspelled word by considering the edit distance between the misspelled word and its candidates. Probability of similarity, the main proposed method for error correction, corrects a misspelled word by calculating its similarity to a candidate word, based on the value of the optimum subsequence between them. Candidates are sorted using an HMM (Hidden Markov Model), in which the word is treated as the observed state and the candidates as hidden states. With the HMM, the system considers not only the similarity of a candidate to the misspelled word but also the sequence of words in the sentence where the word occurs. The experimental results show that sorting candidates with the HMM increases precision. For the correction method, the results show that probability of similarity achieves higher correction accuracy than the forward reversed dictionary.
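The detection and similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "optimum subsequence" means the longest common subsequence (LCS), and the normalization in `similarity` is a hypothetical choice, since the abstract gives no formula. The function names and the tiny lexicon are illustrative only. The morphological analyzer and the HMM-based candidate sorting are omitted.

```python
from bisect import bisect_left


def is_known_hash(word, lexicon_set):
    """Dictionary lookup using a hash-based set."""
    return word in lexicon_set


def is_known_binary(word, sorted_lexicon):
    """Dictionary lookup using binary search over a sorted list."""
    i = bisect_left(sorted_lexicon, word)
    return i < len(sorted_lexicon) and sorted_lexicon[i] == word


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(word, candidate):
    """Assumed score in [0, 1]: 2 * LCS / (len(word) + len(candidate))."""
    total = len(word) + len(candidate)
    return 2 * lcs_length(word, candidate) / total if total else 1.0


def rank_candidates(word, lexicon):
    """Sort lexicon entries from most to least similar to word."""
    return sorted(lexicon, key=lambda c: similarity(word, c), reverse=True)


# Illustrative lexicon of Indonesian words (hypothetical test data).
lexicon = sorted(["makan", "makin", "minum", "main"])
misspelled = "makann"
if not is_known_binary(misspelled, lexicon):
    best = rank_candidates(misspelled, lexicon)[0]
```

In this sketch the similarity score alone orders the candidates; the paper additionally re-ranks them with an HMM that takes the neighbouring words in the sentence into account.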
Keywords :
hidden Markov models; natural language processing; probability; HMM; Indonesian nonword error spell checker; binary search optimization; candidate sorting; dictionary lookup; edit distance; error correction; error detection method; forward reversed dictionary method; hash optimization; hidden Markov model; hidden state; morphologically analyzer; observed state; similarity probability method; Accuracy; Dictionaries; Equations; Forward error correction; Hidden Markov models; Mathematical model; Probability; HMM; Indonesian spell checker; forward reverse dictionary; morphologically analyzer; non-word error; probability of similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location :
Bandung
ISSN :
2155-6822
Print_ISBN :
978-1-4577-0753-7
Type :
conf
DOI :
10.1109/ICEEI.2011.6021514
Filename :
6021514