Title :
Web Document Clustering using Semantic Link Analysis
Author :
Arch-int, Somjit
Author_Institution :
Dept. of Comput. Sci., Khon Kaen Univ.
Abstract :
Searching for and discovering relevant information on the Web have always been challenging research areas. Web document clustering is a promising technique for preparing a huge collection of Web documents for use by Web search engines. This paper proposes a semantic document clustering approach that categorizes Web documents in a semantic manner. First, formal methods and algorithms are introduced as techniques for document extraction and clustering. The approach incorporates WordNet and ontology knowledge as assisting mechanisms, so that the resulting set of concepts is used as a formal representation of the extracted documents. As a consequence, cluster scores are determined for the semantic-based clusters. Next, a semantic-based link analysis method is proposed for clustering Web documents into semantic clusters that are scored based on the notion of semantic-based concepts and documents. Finally, the document scores are used to evaluate semantic document similarity and document quality. The precision criterion is employed to evaluate the approach by comparison with a keyword-based search method. The experimental results show that the proposed method outperforms the TF/IDF method by up to 9% on average.
Keywords :
document handling; information retrieval; ontologies (artificial intelligence); semantic Web; Web document clustering; WordNet; document extraction; ontology; semantic link analysis; Clustering algorithms; Computer science; Information analysis; Information technology; Internet; Ontologies; Search engines; Semantic Web; Technological innovation; Web pages
Conference_Title :
Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, International Conference on
Conference_Location :
Vienna
Print_ISBN :
0-7695-2504-0
DOI :
10.1109/CIMCA.2005.1631438