DocumentCode
2678331
Title
K-Means for Search Results Clustering Using URL and Tag Contents
Author
Poomagal, S. ; Hamsapriya, T.
Author_Institution
Dept. of Comput. & Inf. Sci., PSG Coll. of Technol., Coimbatore, India
fYear
2011
fDate
20-22 July 2011
Firstpage
1
Lastpage
7
Abstract
The growing volume of the web floods search results with large collections of web documents, making it difficult for users to find the document they need. Clustering, the process of combining similar web documents into groups, organizes search results so that they are easier to browse. For web page clustering, terms (features) can be extracted from different parts of a web page. Giansalvatore, Salvatore and Alessandro extracted terms from the entire web page for clustering, whereas Stanislaw Osinski et al. considered terms only from snippets. This paper introduces a new method that extracts terms from the URL, Title tag and Meta tag to produce clusters of web documents. These parts of a web page are selected because they contain keywords that occur in the web page. The clustering algorithm used in this paper is K-means. The proposed method is compared with snippet-based clustering in terms of intra-cluster distance and inter-cluster distance.
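The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the raw term-frequency weighting, the deterministic first-k centroid initialization, the two distance definitions, and the sample pages are all assumptions chosen only to keep the example small and reproducible.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleMetaParser(HTMLParser):
    """Collects text from <title> and <meta name="keywords|description">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.chunks.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)


# Assumed stop-word list for URL noise; the paper's preprocessing is not given.
STOP = {"www", "com", "org", "net", "html", "htm", "http", "https", "index"}


def extract_terms(url, html):
    """Return lowercase terms taken from the URL, Title tag and Meta tag."""
    parser = TitleMetaParser()
    parser.feed(html)
    parsed = urlparse(url)
    text = " ".join([parsed.netloc, parsed.path] + parser.chunks)
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP and len(t) > 2]


def to_vectors(docs):
    """Raw term-frequency vectors over the shared vocabulary (assumed weighting)."""
    vocab = sorted({t for d in docs for t in d})
    return [[d.count(t) for t in vocab] for d in docs]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(vectors, k, iters=20):
    """Plain K-means; centroids start at the first k vectors for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def intra_cluster_distance(vectors, labels, centroids):
    """Mean distance from each document to its own centroid (assumed definition)."""
    return sum(dist(v, centroids[l]) for v, l in zip(vectors, labels)) / len(vectors)


def inter_cluster_distance(centroids):
    """Mean pairwise distance between centroids (assumed definition)."""
    pairs = [(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical search results: (URL, HTML) pairs.
pages = [
    ("http://example.com/python/tutorial",
     "<title>Python tutorial</title><meta name='keywords' content='python, programming'>"),
    ("http://example.org/recipes/pasta",
     "<title>Pasta recipes</title><meta name='keywords' content='cooking, pasta'>"),
    ("http://example.net/python/guide",
     "<title>Python programming guide</title><meta name='description' content='Learn python programming'>"),
    ("http://example.com/cooking/pasta-sauce",
     "<title>Cooking pasta sauce</title><meta name='keywords' content='cooking, recipes'>"),
]

vectors = to_vectors([extract_terms(url, html) for url, html in pages])
labels, centroids = kmeans(vectors, k=2)
intra = intra_cluster_distance(vectors, labels, centroids)
inter = inter_cluster_distance(centroids)
print(labels, round(intra, 3), round(inter, 3))
```

A lower intra-cluster distance together with a higher inter-cluster distance indicates tighter, better separated clusters, which is the kind of comparison the abstract describes between this method and snippet-based clustering.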
Keywords
Web sites; document handling; feature extraction; information retrieval; pattern clustering; search problems; URL; Web documents; Web page; feature extraction; k-means clustering; meta tag; search result clustering; snippet based clustering; tag content; title tag; Clustering algorithms; Ear; Feature extraction; Frequency measurement; Partitioning algorithms; Search engines; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Process Automation, Control and Computing (PACC), 2011 International Conference on
Conference_Location
Coimbatore
Print_ISBN
978-1-61284-765-8
Type
conf
DOI
10.1109/PACC.2011.5978906
Filename
5978906