Title :
Character Code Conversion and Misspelled Word Processing in Uyghur, Kazak, Kyrgyz Multilingual Information Retrieval System
Author :
Tohti, Turdi ; Musajan, Winira ; Hamdulla, Askar
Author_Institution :
Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi
Abstract :
Spelling errors in web pages and in user query phrases, together with the non-Unicode character coding schemes used by some Uyghur, Kazak, and Kyrgyz language websites, seriously degrade the recall and accuracy of the Uyghur, Kazak, and Kyrgyz information retrieval system (UKKIRS). This paper studies these problems and proposes effective solutions: for the variety of character coding schemes, a method for converting non-Unicode character codes to Unicode; for spelling errors, a reconstruction method and a root-expansion method based on user query phrases. The experimental results indicate that the proposed algorithms solve these problems well and are well suited to the UKKIRS.
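The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the three ideas it names, under stated assumptions. The byte-to-character table `LEGACY_TO_UNICODE`, the functions `to_unicode`, `suggest`, and `expand_root`, and the Latin-transliterated sample lexicon are all hypothetical. The similarity match from Python's `difflib` and the naive prefix match are stand-ins chosen for illustration, not the paper's methods.

```python
from difflib import get_close_matches

# Hypothetical legacy table: byte value -> Unicode character.
# Each real non-Unicode font or encoding would need its own table.
LEGACY_TO_UNICODE = {
    0x80: "\u06D5",  # ARABIC LETTER AE
    0x81: "\u06C7",  # ARABIC LETTER U
    0x82: "\u06AD",  # ARABIC LETTER NG
}


def to_unicode(data: bytes) -> str:
    """Convert legacy bytes to Unicode through the lookup table.

    Bytes missing from the table fall back to the character with the
    same code point, so plain ASCII passes through unchanged.
    """
    return "".join(LEGACY_TO_UNICODE.get(b, chr(b)) for b in data)


def suggest(query_word: str, lexicon: list[str], n: int = 3) -> list[str]:
    """Return up to n lexicon words that are similar to a query word.

    Assumption: difflib's similarity ratio stands in for whatever
    candidate-suggestion measure the paper actually uses.
    """
    return get_close_matches(query_word, lexicon, n=n, cutoff=0.6)


def expand_root(root: str, lexicon: list[str]) -> list[str]:
    """Expand a root to every lexicon word that starts with it.

    Assumption: a naive prefix match; real root expansion for these
    agglutinative languages would need morphological analysis.
    """
    return [word for word in lexicon if word.startswith(root)]


if __name__ == "__main__":
    lexicon = ["kitab", "kitablar", "qelem"]  # hypothetical sample words
    print(to_unicode(b"a\x80"))
    print(suggest("kitap", ["kitab", "qelem"]))
    print(expand_root("kitab", lexicon))
```

In this sketch a misspelled query word would first be replaced by a suggested candidate, and the corrected root would then be expanded to its indexed forms before retrieval.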
Keywords :
Web sites; information retrieval systems; natural language processing; word processing; UKKIRS; Uyghur, Kazak, and Kyrgyz information retrieval system; Uyghur, Kazak, and Kyrgyz language based websites; character code conversion; misspelled word processing; multilingual information retrieval system; non unicode character coding scheme; user query phrases; web pages; Code standards; Information analysis; Information retrieval; Information science; Information technology; Natural languages; Query processing; Text processing; Web pages; Writing; Candidate Suggestion; Character coding; Code conversion; Root expansion;
Conference_Title :
Advanced Language Processing and Web Information Technology, 2008. ALPIT '08. International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3273-8
DOI :
10.1109/ALPIT.2008.95