DocumentCode :
468215
Title :
Using Fuzzy-Word Correlation Factors to Compute Document Similarity Based on Phrase Matching
Author :
Won Lee, Jun ; Ng, Yiu-Kai
Author_Institution :
Brigham Young Univ., Provo
Volume :
2
fYear :
2007
fDate :
24-27 Aug. 2007
Firstpage :
186
Lastpage :
192
Abstract :
One of the current problems in Web information retrieval (IR) is identifying redundant information that exists in (replicated) Web documents. Such documents can easily be found in several forms, such as different versions of the same document, or small documents combined with others to form a larger one. As the Web becomes more and more popular, the number of documents on it increases daily, and filtering redundant ones out of this huge collection becomes a more difficult and urgent task. As one solution to this problem, we present a new method that identifies similar documents based on phrase matching, using the fuzzy-word correlation factors among words in phrases. Since phrases can be treated as sequences of words in a sentence in any document, we consider the correlation factors of different words in any two phrases of two different documents to determine the degree of similarity of the phrases, which in turn determines the similarity of the documents based on the number of matched phrases/sentences in the documents. Experimental results show that our phrase-matching approach is accurate and outperforms the word-based similarity matching approach.
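The abstract does not give the formulas, so the following is only a minimal sketch of the general idea (word correlations, then phrase similarity, then a document score from matched phrases). Everything specific in it is an assumption for illustration: the `CORRELATION` table and its values, the fuzzy-OR aggregation in `word_to_phrase`, the use of the smaller directional average in `phrase_similarity`, and the `threshold` of 0.7. None of these are taken from the paper.

```python
# Hypothetical word-to-word correlation factors in [0, 1]. A real system would
# derive these from a corpus; the values below are made up for illustration.
CORRELATION = {
    ("car", "automobile"): 0.9,
    ("fast", "quick"): 0.8,
    ("car", "quick"): 0.1,
    ("fast", "automobile"): 0.1,
}


def word_correlation(a, b):
    """Symmetric lookup: identical words score 1.0, unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return CORRELATION.get((a, b), CORRELATION.get((b, a), 0.0))


def word_to_phrase(word, phrase):
    """Fuzzy-OR of a word's correlation with each word of a phrase (assumed aggregation)."""
    remaining = 1.0
    for other in phrase:
        remaining *= 1.0 - word_correlation(word, other)
    return 1.0 - remaining


def phrase_similarity(p, q):
    """Smaller of the two directional averages of word-to-phrase scores (assumed)."""
    if not p or not q:
        return 0.0
    forward = sum(word_to_phrase(w, q) for w in p) / len(p)
    backward = sum(word_to_phrase(w, p) for w in q) / len(q)
    return min(forward, backward)


def document_similarity(doc_a, doc_b, threshold=0.7):
    """Fraction of phrases in doc_a with a match in doc_b at or above the threshold."""
    if not doc_a:
        return 0.0
    matched = sum(
        1 for p in doc_a
        if any(phrase_similarity(p, q) >= threshold for q in doc_b)
    )
    return matched / len(doc_a)


if __name__ == "__main__":
    doc_a = [["fast", "car"], ["red", "house"]]
    doc_b = [["quick", "automobile"]]
    print(document_similarity(doc_a, doc_b))
```

In this toy run only the first phrase of `doc_a` finds a counterpart in `doc_b`, so half of its phrases count as matched. A word-based approach would score the two documents as unrelated, since they share no literal words.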
Keywords :
document handling; fuzzy set theory; information filtering; pattern matching; Web document; Web information retrieval; document filtering; document similarity; fuzzy-word correlation factors; phrase matching; word sequence; Computer science; Content based retrieval; Degradation; Floods; Fuzzy sets; Information filtering; Information filters; Information retrieval; Infrared detectors; Optical computing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
Type :
conf
DOI :
10.1109/FSKD.2007.602
Filename :
4406070