DocumentCode :
3228767
Title :
WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques
Author :
Campos, Ricardo ; Dias, Gael ; Nunes, Celia
Author_Institution :
Centre of Human Language Technol. & Bioinformatics, Univ. of Beira Interior
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
301
Lastpage :
304
Abstract :
Typically, search engines have low precision in response to a query: they retrieve many useless Web pages and miss other important ones. In this paper, we study the problem of hierarchically clustering Web page search results. In particular, we propose an architecture called WISE, a meta-search engine that automatically builds clusters of related Web pages, each embodying one meaning of the query. These clusters are then hierarchically organized and labeled with a phrase representing the key concept of the cluster and of its corresponding Web documents. The system, which is a Web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the preselection of the retrieved Web pages, the capacity to statistically detect phrases within documents, and the representation of documents by their most relevant key concepts using Web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm, which groups the selected documents into a hierarchy of clusters.
Keywords :
Internet; data mining; document handling; graph theory; information retrieval; pattern clustering; search engines; WISE architecture; Web content mining; Web document; Web page retrieval; Web page search result hierarchical soft clustering; Web-based interface; document clustering; document phrase statistical detection; document representation; graph-based overlapping clustering algorithm; meta-search engine; Bioinformatics; Clustering algorithms; Content based retrieval; Humans; Information analysis; Information retrieval; Metasearch; Search engines; Service oriented architecture; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.201
Filename :
4061381