DocumentCode :
3262695
Title :
Web clustering based on the information of sibling pages
Author :
Lu, Caimei ; Zhang, Xiaodan ; Park, Jung-ran ; Hu, Xiaohua ; He, Tingting
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA
fYear :
2008
fDate :
26-28 Aug. 2008
Firstpage :
480
Lastpage :
485
Abstract :
This paper is dedicated to investigating the value of information from sibling pages for Web page clustering. We use a link-based clustering algorithm to examine the usefulness of sibling links for improving clustering quality. The algorithm is extended by two types of edge weighting techniques. The results of the experiments conducted on the WebKB4 dataset prove that: (1) using information from sibling pages can significantly improve clustering quality; (2) sibling pages are more useful than parent and child pages in enhancing clustering performance; (3) weighting and pruning sibling links cannot improve the clustering quality. We also conducted an experiment on the citation dataset Cora7. The results indicate that sibling links are not more useful than the direct citation links when used to cluster collections of research papers.
Keywords :
Internet; citation analysis; pattern clustering; text analysis; Web page clustering; edge weighting technique; link-based clustering algorithm; sibling page; Bridges; Clustering algorithms; Computer science; Educational institutions; Feature extraction; HTML; Helium; Information science; Iterative algorithms; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Granular Computing, 2008. GrC 2008. IEEE International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-2512-9
Electronic_ISBN :
978-1-4244-2513-6
Type :
conf
DOI :
10.1109/GRC.2008.4664743
Filename :
4664743