DocumentCode :
675534
Title :
Towards improving Khoja rule-based Arabic stemmer
Author :
Al-Kabi, Mohammed N.
Author_Institution :
Fac. of Sci., IT Zarqa Univ., Zarqa, Jordan
fYear :
2013
fDate :
3-5 Dec. 2013
Firstpage :
1
Lastpage :
6
Abstract :
Stemming algorithms remove irrelevant morphological variations from words and extract the stem or root from which an input word is derived. Stemming thus helps to standardize terms that refer to the same concept. These algorithms are widely used in information retrieval systems and Web search engines, as well as in other applications such as machine translation, text clustering, text summarization, question answering, indexing, text mining, and text classification. The Khoja stemmer is a standard Arabic stemmer, but it has a number of flaws. Previous studies and this one show that the Khoja stemmer outperforms the two competing stemmers evaluated in this study. The Khoja stemmer and the two other evaluated Arabic stemmers rely mainly on patterns (forms). Identifying the stemmer's flaws therefore leads to identifying the patterns that the Khoja stemmer does not use. The enhancement to the Khoja stemmer is accordingly restricted to adding the missing patterns, which improves its accuracy by around 5%.
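The pattern-based idea described in the abstract can be illustrated with a minimal sketch. It is not the Khoja stemmer's actual code or pattern list: the function names, the two patterns, and the example words are assumptions chosen for illustration, and real stemmers also strip prefixes and suffixes and validate roots against a dictionary. The sketch assumes that the letters ف, ع and ل in a pattern mark root positions and that every other pattern letter must match the word literally.

```python
# Hypothetical illustration of pattern-based Arabic root extraction.
# Assumption: in a pattern, the letters ف / ع / ل mark root slots and
# every other letter is a literal that must match the word exactly.
ROOT_SLOTS = "فعل"


def match_pattern(word, pattern):
    """Return the extracted root if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for word_char, pattern_char in zip(word, pattern):
        if pattern_char in ROOT_SLOTS:
            root.append(word_char)      # slot: collect a root letter
        elif word_char != pattern_char:
            return None                 # literal letter does not match
    return "".join(root)


def extract_root(word, patterns):
    """Try each pattern in order; return the first root found, else None."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return None


# A deliberately tiny pattern list, then the same list with one
# "missing" pattern added (both lists are illustrative assumptions).
base_patterns = ["فاعل"]
extended_patterns = base_patterns + ["مفعول"]

print(extract_root("كاتب", base_patterns))       # matches فاعل
print(extract_root("مكتوب", base_patterns))      # no pattern of this length
print(extract_root("مكتوب", extended_patterns))  # matches the added pattern
```

In this sketch a word that fits no listed pattern yields no root, so adding a missing pattern directly increases the number of words the stemmer can resolve, which is the kind of enhancement the abstract describes.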
Keywords :
information retrieval; natural language processing; Khoja rule-based Arabic stemmer; Web search engines; flaws identification; information retrieval systems; missing pattern identification; stemming algorithms; Accuracy; Algorithm design and analysis; Computers; Conferences; Electrical engineering; Fault diagnosis; Standards; Arabic; Information Retrieval; Root-Based Stemming; Stemming; Tokenization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applied Electrical Engineering and Computing Technologies (AEECT), 2013 IEEE Jordan Conference on
Conference_Location :
Amman
Print_ISBN :
978-1-4799-2305-2
Type :
conf
DOI :
10.1109/AEECT.2013.6716437
Filename :
6716437