DocumentCode
430495
Title
Similarity score for information filtering thresholds
Author
Lai, Jun ; Soh, Ben
Author_Institution
Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, Vic., Australia
Volume
1
fYear
2004
fDate
26-29 Oct. 2004
Firstpage
216
Abstract
The rapid growth of on-line information has led to the development of many techniques for information filtering. The tremendous growth in the amount of information available and in the number of visitors to Web sites in recent years poses key challenges for information filtering and retrieval. Web visitors not only expect relevant, high-quality information, but also want it presented as efficiently as possible. Traditional filtering methods, however, consider only the relevance values of documents and fail to consider the efficiency of document retrieval. In this paper, we propose a new algorithm to calculate an index, called the document similarity score, based on elements of the document. Using this index, a document profile is derived. Documents with a similarity score above a given threshold are clustered. Using these pre-clustered documents, information filtering and retrieval can be made more efficient. Experimental results show that the proposed method tremendously improves the efficiency of information filtering and retrieval.
Keywords
Web sites; information filtering; search engines; document retrieval efficiency; document similarity score; information filtering thresholds; information retrieval; on-line information; pre-clustered documents; Books; Clustering algorithms; Conference proceedings; Crawlers; Electronic mail; Information filters
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications and Information Technology, 2004. ISCIT 2004. IEEE International Symposium on
Print_ISBN
0-7803-8593-4
Type
conf
DOI
10.1109/ISCIT.2004.1412482
Filename
1412482