Regional Information Center for Science and Technology

DocumentCode :

389281

Title :

Web documents mining

Author :

Song, Qin-Bao ; Li, Nai-Qian ; Shen, Jun-Yi ; Chen, Li-Ming

Volume :

fYear :

2002

fDate :

2002

Firstpage :

791

Abstract :

By grouping similar Web documents into clusters, the search space can be reduced, searching accelerated, and search precision improved. This paper introduces a clustering algorithm. In the proposed method, topics are represented using a vector space model, documents are represented in terms of those topics, and the relation between documents and topics is viewed as a set of transactions: each document corresponds to a transaction and each topic to an item. An association rule mining algorithm discovers the frequent itemsets, and the documents corresponding to each frequent itemset are taken as the initial clusters. Clusters are merged if the distance between them is small enough, and a cluster is divided if the connection strength between its documents is below a given threshold. Experiments on real Web documents show that the algorithm is effective and well suited to handling the overlapping clusters inherent in documents.
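The abstract does not specify the paper's similarity measures, thresholds, or exact merge and split rules, so the Python sketch below is only a hypothetical illustration of the pipeline it describes, not the authors' implementation. It assumes: documents and topics are sparse term-weight vectors; a document "contains" a topic item when their cosine similarity is at least `tau`; every distinct document set covering a frequent itemset (found by an Apriori-style search) becomes an initial, possibly overlapping, cluster; a cluster is divided by dropping links whose topic-set Jaccard similarity is below `min_strength` and keeping the connected components; and clusters whose Jaccard distance is at most `max_dist` are merged. All function names, parameter values, the split-then-merge order, and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def to_transactions(docs, topics, tau):
    """One transaction per document; its items are the topics whose vector has
    cosine similarity >= tau with the document vector (assumed rule)."""
    return {d: frozenset(t for t, tv in topics.items() if cosine(dv, tv) >= tau)
            for d, dv in docs.items()}


def frequent_itemsets(trans, min_support):
    """Apriori-style level-wise search. Returns {itemset: covering documents}.
    Subset pruning is omitted for brevity; support is counted directly."""
    level = [frozenset([i]) for i in sorted(set().union(*trans.values()))]
    found = {}
    while level:
        current = {}
        for cand in level:
            cover = frozenset(d for d, items in trans.items() if cand <= items)
            if len(cover) >= min_support:
                current[cand] = cover
        found.update(current)
        # Join step: combine frequent k-itemsets that differ in exactly one item.
        level = list({a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1})
    return found


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def split(cluster, trans, min_strength):
    """Hypothetical divide step: keep only links whose topic-set similarity is
    >= min_strength and return the connected components of the cluster."""
    remaining, parts = set(cluster), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            d = stack.pop()
            linked = {o for o in remaining
                      if jaccard(trans[d], trans[o]) >= min_strength}
            remaining -= linked
            comp |= linked
            stack.extend(linked)
        parts.append(frozenset(comp))
    return parts


def merge(clusters, max_dist):
    """Repeatedly merge any two clusters whose Jaccard distance is <= max_dist."""
    clusters = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(clusters)), 2):
            if 1.0 - jaccard(clusters[i], clusters[j]) <= max_dist:
                other = clusters.pop(j)  # j > i, so index i stays valid
                clusters[i] |= other
                changed = True
                break
    return sorted({frozenset(c) for c in clusters}, key=sorted)


def cluster_documents(docs, topics, tau=0.3, min_support=2,
                      min_strength=0.5, max_dist=0.4):
    trans = to_transactions(docs, topics, tau)
    initial = set(frequent_itemsets(trans, min_support).values())
    pieces = {p for c in initial for p in split(c, trans, min_strength)}
    return merge(pieces, max_dist)


# Toy data (invented for illustration only).
topics = {"search": {"search": 1, "engine": 1, "query": 1},
          "mining": {"mining": 1, "association": 1, "rule": 1},
          "cluster": {"cluster": 1, "document": 1}}
docs = {"d1": {"search": 2, "engine": 1, "cluster": 1, "document": 1},
        "d2": {"search": 1, "query": 1, "cluster": 1, "document": 2},
        "d3": {"mining": 1, "association": 2, "rule": 1},
        "d4": {"mining": 2, "rule": 1, "cluster": 1, "document": 1},
        "d5": {"association": 1, "rule": 1, "mining": 1, "cluster": 1}}

print([sorted(c) for c in cluster_documents(docs, topics)])
# prints [['d1', 'd2'], ['d3', 'd4', 'd5']]
```

Because the initial clusters come from itemset covers, one document can start in several clusters (here `d4` and `d5` appear under both the "mining" and "cluster" topics), which is how this style of method accommodates overlapping clusters; the later split and merge thresholds decide how much of that overlap survives.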

Keywords :

Web sites; data mining; information retrieval; Web documents mining; association rule; clustering algorithm; overlapping clusters; search space; topics; vector space model; Acceleration; Association rules; Clustering algorithms; Clustering methods; Computer science; Data mining; Electronic mail; Mathematics; Search engines; Space technology;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on

Print_ISBN :

0-7803-7508-4

Type :

conf

DOI :

10.1109/ICMLC.2002.1174490

Filename :

1174490

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=389281