K-Means for Search Results Clustering Using URL and Tag Contents

Author

Poomagal, S. ; Hamsapriya, T.

Author_Institution

Dept. of Comput. & Inf. Sci., PSG Coll. of Technol., Coimbatore, India

fYear

2011

fDate

20-22 July 2011

Firstpage

1

Lastpage

7

Abstract

The growing volume of the web floods search results with large collections of web documents, making it difficult for users to find the documents they need. Clustering, the process of grouping similar web documents, organizes search results so that they are easier to browse. For web page clustering, terms (features) can be extracted from different parts of a web page. Giansalvatore, Salvatore and Alessandro extracted terms from the entire web page for clustering, whereas Stanislaw Osinski et al. considered terms only from snippets. This paper introduces a new method that extracts terms from the URL, Title tag and Meta tag to produce clusters of web documents. These parts of a web page are selected because they contain the keywords of the page. The clustering algorithm used is K-means. The proposed method is compared with snippet-based clustering in terms of intra-cluster distance and inter-cluster distance.
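
As an illustration of the pipeline the abstract describes, the following Python sketch extracts terms from a result's URL, Title tag and Meta tags, builds term-frequency vectors, clusters them with a basic K-means, and reports an intra-cluster and an inter-cluster distance. It is not the authors' implementation: the stop-word list, the raw term-frequency weighting, the farthest-point seeding, the Euclidean distance, the two distance definitions, all function names and the sample data are assumptions made for the example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed stop words; the paper's preprocessing is not given in the abstract.
STOP = {"www", "com", "org", "net", "html", "http", "https", "example",
        "the", "and", "of", "for"}

class TitleMetaParser(HTMLParser):
    """Collects text from <title> and <meta name="keywords|description">."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.chunks.append(attrs.get("content") or "")
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)

def extract_terms(url, html):
    """Return lowercase terms found in the URL, Title tag and Meta tags."""
    parsed = urlparse(url)
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    text = " ".join([parsed.netloc, parsed.path] + parser.chunks).lower()
    return [t for t in re.findall(r"[a-z]+", text) if len(t) > 1 and t not in STOP]

def vectorize(docs_terms):
    """Build raw term-frequency vectors over the shared vocabulary."""
    vocab = sorted({t for terms in docs_terms for t in terms})
    return [[float(Counter(terms)[t]) for t in vocab] for terms in docs_terms]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20):
    # Assumption: deterministic farthest-point seeding, for reproducibility.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(distance(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def intra_cluster_distance(vectors, labels, centroids):
    """Assumed definition: mean distance of each document to its centroid."""
    return sum(distance(v, centroids[l]) for v, l in zip(vectors, labels)) / len(vectors)

def inter_cluster_distance(centroids):
    """Assumed definition: mean pairwise distance between centroids."""
    pairs = [(i, j) for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    return sum(distance(centroids[i], centroids[j]) for i, j in pairs) / len(pairs)

def cluster_results(results, k):
    vectors = vectorize([extract_terms(url, html) for url, html in results])
    labels, centroids = kmeans(vectors, k)
    return labels, centroids, vectors

# Made-up search results (URL, HTML) used only to exercise the sketch.
SAMPLE_RESULTS = [
    ("http://example.com/python/tutorial",
     "<title>Python tutorial</title><meta name='keywords' content='python, programming, tutorial'>"),
    ("http://example.org/python/programming-guide",
     "<title>Python programming guide</title><meta name='description' content='Learn python programming'>"),
    ("http://example.net/jaguar/cars",
     "<title>Jaguar cars</title><meta name='keywords' content='jaguar, cars, dealer'>"),
    ("http://example.com/cars/jaguar-dealer",
     "<title>Jaguar dealer</title><meta name='description' content='Jaguar cars for sale'>"),
]

labels, centroids, vectors = cluster_results(SAMPLE_RESULTS, k=2)
print("labels:", labels)
print("intra-cluster distance:", intra_cluster_distance(vectors, labels, centroids))
print("inter-cluster distance:", inter_cluster_distance(centroids))
```

Under these assumptions, a clustering is better when the intra-cluster distance is lower and the inter-cluster distance is higher. To compare against snippet-based clustering in the same way, one would swap extract_terms for a function that tokenizes the result snippet and compute the same two measures.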

Keywords

Web sites; document handling; feature extraction; information retrieval; pattern clustering; search problems; URL; Web documents; Web page; k-means clustering; meta tag; search result clustering; snippet based clustering; tag content; title tag; Clustering algorithms; Ear; Frequency measurement; Partitioning algorithms; Search engines; Web pages

fLanguage

English

Publisher

IEEE

Conference_Titel

Process Automation, Control and Computing (PACC), 2011 International Conference on

Conference_Location

Coimbatore

Print_ISBN

978-1-61284-765-8

Type

conf

DOI

10.1109/PACC.2011.5978906

Filename

5978906