Regional Information Center for Science and Technology (RICEST) - Web Document Clustering by Using Automatic Keyphrase Extraction

DocumentCode :

2731407

Title :

Web Document Clustering by Using Automatic Keyphrase Extraction

Author :

Han, Juhyun ; Kim, Taehwan ; Choi, Joongmin

Author_Institution :

Hanyang Univ., Ansan

fYear :

2007

fDate :

5-12 Nov. 2007

Firstpage :

Lastpage :

Abstract :

In most traditional techniques of document clustering, the total number of clusters is not known in advance, and the cluster that contains the target information cannot be determined because no semantic description is associated with each cluster. The well-known K-means clustering algorithm partially solves these problems by allowing users to specify the number of clusters. However, if the pre-specified number of clusters is modified, the precision of each result also changes. To solve this problem, this paper proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm, which returns several keyphrases from the source documents by using machine learning techniques. In this paper, documents are grouped into several clusters as in K-means, but the number of clusters is determined automatically by the algorithm, with some heuristics that use the extracted keyphrases. Our Kea-means clustering algorithm provides easy and efficient ways to extract test documents from massive quantities of resources.
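
The abstract does not state the heuristics that Kea-means uses, so the following is only a minimal sketch of the general idea of deriving the number of clusters from extracted keyphrases instead of asking the user for it. Everything in it is an assumption made for illustration: the keyphrases are taken as already extracted (standing in for Kea output), the function names `seed_phrases` and `cluster_by_keyphrases` and the `min_docs` threshold are hypothetical, and the rule "a keyphrase shared by at least `min_docs` documents labels one cluster" is not taken from the paper.

```python
from collections import Counter


def seed_phrases(doc_phrases, min_docs=2):
    """Return keyphrases found in at least `min_docs` documents.

    Assumption: a shared keyphrase is a candidate cluster label.
    Results are ordered by document frequency, then alphabetically.
    """
    counts = Counter(p for phrases in doc_phrases for p in set(phrases))
    shared = [p for p, n in counts.items() if n >= min_docs]
    return sorted(shared, key=lambda p: (-counts[p], p))


def cluster_by_keyphrases(doc_phrases, min_docs=2):
    """Group document indices under the first seed phrase they contain.

    The number of clusters is not an input: it is the number of seed
    phrases that end up with at least one document.
    """
    seeds = seed_phrases(doc_phrases, min_docs)
    clusters = {}
    unassigned = []
    for idx, phrases in enumerate(doc_phrases):
        present = set(phrases)
        label = next((s for s in seeds if s in present), None)
        if label is None:
            unassigned.append(idx)  # no shared keyphrase matched
        else:
            clusters.setdefault(label, []).append(idx)
    return clusters, unassigned


# Hypothetical keyphrase lists, one per document.
docs = [
    ["k-means", "clustering"],
    ["clustering", "keyphrase extraction"],
    ["keyphrase extraction", "kea"],
    ["intelligent agent"],
]
clusters, unassigned = cluster_by_keyphrases(docs)
print(len(clusters), clusters, unassigned)
```

In this toy input, two keyphrases are shared by at least two documents, so two clusters are formed without a user-supplied count, and each cluster carries a keyphrase as a readable label. A fuller version could pass the resulting count and groups to a standard K-means step as its initial configuration.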

Keywords :

Internet; document handling; feature extraction; learning (artificial intelligence); pattern clustering; K-means clustering algorithm; Kea-means clustering algorithm; Web document clustering; automatic keyphrase extraction; machine learning techniques; Clustering algorithms; Computer science; Conferences; Data mining; Heuristic algorithms; Intelligent agent; Machine learning; Machine learning algorithms; Partitioning algorithms; Testing; Kea-means Clustering; Keyphrases; K-means; Clustering

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Web Intelligence and Intelligent Agent Technology Workshops, 2007 IEEE/WIC/ACM International Conferences on

Conference_Location :

Silicon Valley, CA

Print_ISBN :

0-7695-3028-1

Type :

conf

DOI :

10.1109/WI-IATW.2007.46

Filename :

4427539

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2731407