DocumentCode :
2000117
Title :
Information retrieval: A new multilingual stemmer based on a statistical approach
Author :
Gadri, Said ; Moussaoui, Abdelouahab
Author_Institution :
Dept. of ICST, Univ. of M'sila, M'sila, Algeria
fYear :
2015
fDate :
25-27 May 2015
Firstpage :
1
Lastpage :
6
Abstract :
Stemming reduces inflected and derived words to a base form (stem or root). It is an important pre-processing step in text mining and is widely used in areas such as natural language processing (NLP), text categorization (TC), text summarization (TS), and information retrieval (IR). In text categorization, stemming reduces the size of the term vocabulary; in information retrieval, it improves search effectiveness and so returns more relevant results. In this paper, we propose a new multilingual stemmer that extracts word roots using n-grams. We validated the stemmer on three languages: Arabic, French, and English.
Keywords :
data mining; information retrieval; natural language processing; statistical analysis; text analysis; vocabulary; Arabic language; English language; French language; NLP; multilingual stemmer; statistical approach; terms vocabulary; text categorization; text mining; text summarizing; word root extraction; Error analysis; Integrated circuits; Bigrams technique; Machine learning; Root extraction; Stemming
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control, Engineering & Information Technology (CEIT), 2015 3rd International Conference on
Conference_Location :
Tlemcen
Type :
conf
DOI :
10.1109/CEIT.2015.7233113
Filename :
7233113