Title : 
On the use of fuzzy information retrieval for gauging similarity of Arabic documents
         
        
            Author : 
Alzahrani, Salha Mohammed ; Salim, Naomie
         
        
            Author_Institution : 
Fac. of CS & Info. Sys, Taif Univ., Hawiah, Saudi Arabia
         
        
        
        
        
        
            Abstract : 
Arabic is one of the richest human languages in terms of word construction and diversity of meaning, which makes judging similarity among statements in Arabic documents complex. In this paper, we present a mechanism for gauging the similarity of Arabic documents using a fuzzy IR model. The similarity degree of two documents is the averaged similarity among their statements, where statements are treated as equal even if they have been restructured or reworded. We introduce fuzzy similarity sets such as near duplicate, very similar, similar, slightly similar, dissimilar, and very dissimilar. These similarity sets can be implemented as a spectrum of values ranging from 1 (duplicate) to 0 (different). We built a corpus in which all stop words were removed and non-stop words were stemmed using typical Arabic IR techniques. The corpus has 100 documents with 4477 statements and 54346 non-stop-word, stemmed words in total. Another 15 query documents with 303 statements and 1620 words were constructed specifically for our test. Experimental results show that fuzzy IR can be used to determine the extent to which documents are similar or dissimilar, where similarity can be mapped to one of the proposed fuzzy sets. The performance of our fuzzy IR system, measured in fuzzy precision and fuzzy recall, shows that it outperforms Boolean IR by retrieving more documents that have similar content expressed with different synonyms.
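
The idea of averaging statement-level similarity and mapping the result onto named fuzzy sets can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the membership functions or the set boundaries, so the cut-off values in `FUZZY_SETS`, the word-matching rule in `statement_similarity`, and the optional `related` table of partial word matches are all assumptions made for illustration. Statements are assumed to be lists of tokens that have already had stop words removed and been stemmed.

```python
from statistics import mean

# Hypothetical lower bounds: the abstract names these sets but not their ranges.
FUZZY_SETS = [
    (0.9, "near duplicate"),
    (0.7, "very similar"),
    (0.5, "similar"),
    (0.3, "slightly similar"),
    (0.1, "dissimilar"),
    (0.0, "very dissimilar"),
]


def statement_similarity(a, b, related=None):
    """Average degree to which each word of statement a is matched in b.

    An exact match scores 1.0. Otherwise the score is the best value found
    in the (assumed) `related` table, keyed by (word_in_a, word_in_b),
    which stands in for a synonym or word-correlation resource.
    """
    related = related or {}
    if not a:
        return 0.0
    degrees = []
    for w in a:
        if w in b:
            degrees.append(1.0)
        else:
            degrees.append(max((related.get((w, v), 0.0) for v in b), default=0.0))
    return mean(degrees)


def document_similarity(query, doc, related=None):
    """Average, over query statements, of the best-matching document statement."""
    if not query:
        return 0.0
    best = [
        max((statement_similarity(s, t, related) for t in doc), default=0.0)
        for s in query
    ]
    return mean(best)


def fuzzy_label(score):
    """Map a similarity degree in [0, 1] to the first set whose bound it meets."""
    for lower, name in FUZZY_SETS:
        if score >= lower:
            return name
    return FUZZY_SETS[-1][1]


# Placeholder tokens stand in for stemmed Arabic words.
query = [["a", "b"], ["c", "d"]]
doc = [["a", "b"], ["c", "x"]]
score = document_similarity(query, doc)
print(score, fuzzy_label(score))
```

Supplying a `related` table such as `{("d", "x"): 0.5}` raises the score for reworded statements, which is the kind of partial match a Boolean model would miss.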
         
        
            Keywords : 
document handling; fuzzy set theory; information retrieval; natural languages; Arabic document similarity gauging; fuzzy information retrieval; fuzzy similarity sets; human language; nonstop-word stemmed words; statements; Aggregates; Content based retrieval; Fuzzy logic; Fuzzy sets; Fuzzy systems; HTML; Humans; Information retrieval; Testing; Uncertainty; Arabic; Fuzzy IR; Information Retrieval; Similarity;
         
        
        
        
            Conference_Title : 
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
         
        
            Conference_Location : 
London
         
        
            Print_ISBN : 
978-1-4244-4456-4
         
        
            Electronic_ISBN : 
978-1-4244-4457-1
         
        
        
            DOI : 
10.1109/ICADIWT.2009.5273835