Title :
Word level correction in Gujarati document using probabilistic approach
Author :
Patel, Dhruv B. ; Goswami, Mukesh M.
Author_Institution :
Dept. of Inf. Technol., Dharmsinh Desai Univ., Nadiad, India
Abstract :
Post-processing is an important part of any document processing system. It can be performed at two levels: word-level correction and sentence-level correction. Word-level correction can itself be performed in two ways. The first detects an erroneous word and replaces it with the most similar word in a dictionary; this is called the dictionary-based approach. The second finds the most probable word and is known as the probabilistic approach. To build the probabilistic model, which includes unigram, bigram, and trigram statistics, online resources from various Gujarati newspaper websites are used. The proposed system will use models such as Naïve Bayes and the Hidden Markov Model to correct word-level errors. The system will be tested on a synthetic dataset generated by adding random word-level errors to actual documents.
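The probabilistic approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple bigram model with add-one smoothing in place of the Naïve Bayes and Hidden Markov models named above, uses edit distance only to shortlist candidates, and runs on a toy English corpus rather than crawled Gujarati text. All function names (`train_ngrams`, `edit_distance`, `bigram_logprob`, `correct_word`) and the `max_dist` parameter are hypothetical.

```python
from collections import Counter
import math


def train_ngrams(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens  # "<s>" marks the sentence start
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def bigram_logprob(prev_word, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev_word)."""
    vocab_size = len(unigrams)
    count = bigrams[(prev_word, word)] + 1
    return math.log(count / (unigrams[prev_word] + vocab_size))


def correct_word(prev_word, observed, unigrams, bigrams, max_dist=2):
    """Pick the most probable in-vocabulary word given the previous word.

    Assumption: a word already in the vocabulary is treated as correct,
    and candidates are limited to words within max_dist edits.
    """
    if observed in unigrams:
        return observed
    candidates = [w for w in unigrams
                  if w != "<s>" and edit_distance(observed, w) <= max_dist]
    if not candidates:
        return observed  # nothing close enough; leave the word unchanged
    # Rank by context probability, breaking ties by smaller edit distance.
    return max(candidates,
               key=lambda w: (bigram_logprob(prev_word, w, unigrams, bigrams),
                              -edit_distance(observed, w)))


# Toy corpus standing in for text crawled from newspaper websites.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "cart", "ran"]]
unigrams, bigrams = train_ngrams(corpus)
print(correct_word("the", "cxt", unigrams, bigrams))
print(correct_word("a", "cxt", unigrams, bigrams))
```

In this sketch the same erroneous token can be corrected differently depending on the preceding word, which is the main difference from a purely dictionary-based lookup that would always return the closest dictionary entry.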
Keywords :
Bayes methods; Web sites; hidden Markov models; word processing; Gujarati document; Gujarati newspaper Web sites; bigram; dictionary based approach; document processing system; first word level correction; hidden Markov model; naive Bayes model; post processing; probabilistic approach; random word level error; second sentence level correction; synthetic dataset; trigram; unigram; Context; Crawlers; Dictionaries; Error correction; Hidden Markov models; Optical character recognition software; Probabilistic logic; Hidden Markov Model; Naïve Bayes; Probabilistic graphical model;
Conference_Title :
Green Computing Communication and Electrical Engineering (ICGCCEE), 2014 International Conference on
Conference_Location :
Coimbatore
DOI :
10.1109/ICGCCEE.2014.6921395