Title of article

A novel approach to the extraction of roots from Arabic words using bigrams

Author/Authors

Ismail I. Hmeidi, Riyad F. Al-Shalabi, Ahmad T. Al-Taani, Hassan Najadat, Shaker A. Al-Hazaimeh

Issue Information

Monthly, with serial issue numbering, 2010

Pages

9

From page

583

To page

591

Abstract

Root extraction is an important task in information retrieval (IR), natural language processing (NLP), text summarization, and many related fields. Over the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms handle triliteral roots only, and some handle only words of fixed length. This study proposes a novel approach to the extraction of roots from Arabic words using bigrams. Two measures are used: the Manhattan distance, which is a dissimilarity measure, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files cover a wide range of data: the Holy Qur'an contains most of the ancient Arabic words, while the other file contains original Arabic words together with some modern Arabic words and some words borrowed from foreign languages. The results show that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
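
The abstract names the two measures but does not give the algorithm itself, so the following is only a minimal sketch of how bigram-based root matching could look, not the authors' method. It assumes contiguous character bigrams, a Dice coefficient over sets of unique bigrams, a Manhattan distance over bigram counts, and a small candidate root list; the function names and the sample roots are hypothetical.

```python
from collections import Counter


def bigrams(word):
    """Return the contiguous character bigrams of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_similarity(a, b):
    """Dice coefficient on unique bigrams: 2 * |A & B| / (|A| + |B|)."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 0.0  # neither string is long enough to form a bigram
    return 2 * len(x & y) / (len(x) + len(y))


def manhattan_distance(a, b):
    """Sum of absolute differences between the two bigram count vectors."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    return sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())


def best_root_dice(word, roots):
    """Pick the candidate root with the highest Dice similarity."""
    return max(roots, key=lambda root: dice_similarity(word, root))


def best_root_manhattan(word, roots):
    """Pick the candidate root with the lowest Manhattan distance."""
    return min(roots, key=lambda root: manhattan_distance(word, root))


if __name__ == "__main__":
    # Hypothetical candidate roots; a real system would use a full root list.
    roots = ["كتب", "درس", "علم"]
    word = "كاتب"
    print(best_root_dice(word, roots))
    print(best_root_manhattan(word, roots))
```

In this toy setup both measures select كتب for كاتب, because the word and that root share the bigram تب while the other roots share none. Contiguous bigrams miss root letters separated by an infix, so a practical system would likely need a different bigram definition or affix handling, which the abstract does not specify.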

Journal title

Journal of the American Society for Information Science and Technology

Serial Year

2010

Record number

994184