Clustering of web search results using Suffix tree algorithm and avoidance of repetition of same images in search results using L-Point Comparison algorithm

Author

Suneetha, Manne ; Fatima, S. Sameen ; Pervez, Shaik Mohd Zaheer

Author_Institution

Dept. of Inf. Technol., Velagapudi Ramakrishna Siddhartha Eng. Coll., Vijayawada, India

fYear

2011

fDate

23-24 March 2011

Firstpage

1041

Lastpage

1046

Abstract

Users of existing search engines such as Google, Yahoo, MSN, and Ask commonly find that a query returns a long ranked list of results (snippets). It is cumbersome for the user to go through each title, snippet, and sometimes even the link of each result until results relevant to the query are found. Clustering of search results is a data mining technique by which the retrieved results are organized into meaningful groups, lightening the user's work. This paper deals with the generalized suffix tree based clustering approach. The most repeated phrase in the document tags is taken as the cluster name. In short, web search results fetched from existing web search engines are grouped under phrases that contain one or more search keywords. The paper aims to organize web search results into clusters, giving the user quick browsing options and an interface that presents the results precisely. Suffix tree clustering produces comparatively more accurate and informative grouped results. A basic problem in image searching in any search engine is image repetition. This can be avoided by using the L-Point Comparison algorithm, a technique specially worked out in the field of information retrieval systems, which is also discussed with a practical example.
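The grouping idea in the abstract (results collected under shared phrases that contain a search keyword, with the most shared phrase naming the cluster) can be illustrated with a rough sketch. This is not the paper's implementation: it substitutes plain word n-gram enumeration for a generalized suffix tree, and the function names `word_ngrams` and `cluster_by_shared_phrase`, along with the parameters `max_len` and `min_docs`, are hypothetical choices made only for this illustration.

```python
from collections import defaultdict


def word_ngrams(text, max_len=3):
    """Return the set of word n-grams (1..max_len words) found in a snippet."""
    words = text.lower().split()
    grams = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def cluster_by_shared_phrase(snippets, query, max_len=3, min_docs=2):
    """Group snippet indices under phrases shared by several snippets.

    Assumption for this sketch: a phrase qualifies as a cluster label only
    if it contains at least one query keyword and appears in at least
    `min_docs` snippets. N-gram enumeration stands in for the suffix tree.
    """
    keywords = set(query.lower().split())
    phrase_docs = defaultdict(set)
    for idx, snippet in enumerate(snippets):
        for phrase in word_ngrams(snippet, max_len):
            if keywords & set(phrase.split()):
                phrase_docs[phrase].add(idx)

    clusters = {p: sorted(d) for p, d in phrase_docs.items() if len(d) >= min_docs}
    # Most widely shared phrases first; longer phrases break ties.
    return sorted(
        clusters.items(),
        key=lambda kv: (-len(kv[1]), -len(kv[0].split()), kv[0]),
    )


if __name__ == "__main__":
    snippets = [
        "jaguar car dealer",
        "jaguar car review",
        "jaguar cat habitat",
    ]
    for label, members in cluster_by_shared_phrase(snippets, "jaguar"):
        print(label, members)
```

A real suffix tree would find shared phrases of any length in time linear in the total snippet length; the n-gram version trades that efficiency for brevity and caps phrase length at `max_len` words.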

Keywords

Internet; content-based retrieval; data mining; image retrieval; pattern clustering; search engines; tree data structures; trees (mathematics); Ask; Google; L-point comparison algorithm; MSN; Web search result clustering; Yahoo; cluster name; data mining; document tags; generalized suffix tree based clustering approach; image repetition avoidance; image searching; information retrieval system; query return; quick browsing option; search engines; suffix tree algorithm; Clustering algorithms; Data mining; Engines; Pixel; Search engines; Shape; Web search; Cleaning of Document; Coherent clustering; L-point image Comparison (LPC); Shared phrase; Suffix Tree Based Clustering (STBC);

fLanguage

English

Publisher

IEEE

Conference_Titel

Emerging Trends in Electrical and Computer Technology (ICETECT), 2011 International Conference on

Conference_Location

Tamil Nadu

Print_ISBN

978-1-4244-7923-8

Type

conf

DOI

10.1109/ICETECT.2011.5760272

Filename

5760272