Author_Institution :
Comput. Sci. Dept., Brigham Young Univ., Provo, UT, USA
Abstract :
Current web search engines, such as Google, Bing, and Yahoo!, rank the set of documents S retrieved in response to a user query and display the URL of each document D in S with a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as intended. They are meant to help users quickly identify results of interest, if any exist, yet they fail to (i) provide distinct information and (ii) capture the main contents of the corresponding documents. Moreover, when the information need expressed in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request without requiring additional input. Furthermore, a document title is not always a good indicator of the content of the corresponding document. All of these design problems can be solved by our proposed query-based cluster and labeler, called QCL. Given the documents retrieved in response to a user query, QCL generates concise clusters covering their various subject areas. This saves users time and effort in finding specific information of interest, since they no longer have to browse through the documents one by one. Experimental results show that QCL is effective and efficient in generating high-quality clusters of documents on specific topics with informative labels.