Title :
Fuzzy clustering and relevance ranking of web search results with differentiating cluster label generation
Author :
Matsumoto, Takazumi ; Hung, Edward
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China
Abstract :
This paper introduces a prototype web search results clustering engine that enhances search results by performing fuzzy clustering on web documents returned by conventional search engines, as well as ranking the results and labeling the resulting clusters. This is done using a fuzzy transduction-based clustering algorithm (FTCA), which employs a transduction-based relevance model (TRM) to generate document relevance values. These relevance values are used to cluster similar documents, rank them, and drive a term-frequency-based label generator. The membership degrees of documents in fuzzy clusters also facilitate effective detection and removal of overly similar clusters. FTCA is compared against two established web document clustering algorithms, Suffix Tree Clustering (STC) and Lingo, both provided by the free, open-source Carrot2 Document Clustering Workbench. To measure cluster quality, an extended version of the classic precision measure is used to take relevance and fuzzy clustering into account, along with recall and F1 score. Results from testing on five different datasets show a considerable advantage in clustering quality and performance over STC and Lingo in most cases.
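For reference, the classic set-based precision, recall, and F1 score mentioned above can be sketched as follows. This is only the standard textbook form, not the extended relevance-aware and fuzzy-aware precision used in the paper, which the abstract does not specify. The function name `precision_recall_f1` and the document identifiers are hypothetical, chosen for illustration.

```python
def precision_recall_f1(cluster, relevant):
    """Classic precision, recall, and F1 for a single (crisp) cluster.

    cluster  -- documents the algorithm placed in the cluster
    relevant -- documents that truly belong to the cluster's topic
    Assumption: plain set membership; no relevance weights or fuzzy degrees.
    """
    cluster, relevant = set(cluster), set(relevant)
    hits = len(cluster & relevant)  # correctly clustered documents
    precision = hits / len(cluster) if cluster else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 documents in the cluster, 3 truly relevant, 2 in common.
p, r, f1 = precision_recall_f1({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d5"})
```

Here precision is 2/4 and recall is 2/3, so F1 works out to 4/7.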
Keywords :
Internet; fuzzy set theory; pattern clustering; public domain software; search engines; trees (mathematics); Lingo; Web document clustering algorithms; Web search results clustering engine; differentiating cluster label generation; fuzzy transduction-based clustering algorithm; open source Carrot2 document clustering workbench; precision measurement; relevance ranking; search engines; suffix tree clustering; term frequency; transduction-based relevance model; Clustering algorithms; Labeling; Prototypes; Search engines; Testing; Transmission line measurements; Web search;
Conference_Title :
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6919-2
DOI :
10.1109/FUZZY.2010.5584771