Title :
Hybrid Clustering of Large Text Data
Author_Institution :
Dept. of Math. & Stat., UMBC, Baltimore, MD
Abstract :
Clustering algorithms often require that the entire dataset be kept in computer memory. When the dataset is large and does not fit into available memory, one has to compress the dataset to make the application of clustering algorithms possible. The Balanced Iterative Reducing and Clustering algorithm (BIRCH) is a clustering algorithm designed to operate under the assumption that "the amount of memory available is limited, whereas the dataset can be arbitrarily large". The algorithm generates "a compact dataset summary" while minimizing the I/O cost involved. The "summaries" contain enough information to apply the well-known k-means clustering algorithm to the set of summaries and to generate partitions of the original dataset. An application of k-means requires an initial partition to be supplied as input. To generate a "good" initial partition of the "summaries", this paper suggests a clustering algorithm, PDsDP, motivated by PDDP (Principal Direction Divisive Partitioning). We report preliminary numerical experiments involving sequential applications of BIRCH, PDsDP, and k-means/Deterministic Annealing to the Enron email dataset.
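The following sketch (not the authors' code) illustrates the hybrid pipeline the abstract describes, under simplifying assumptions: the BIRCH summarization step is reduced to chunk means, the PDDP-style step splits the summaries by the sign of their projection onto the top principal direction, and plain k-means refines that initial partition. All names and parameters (summarize, chunk_size, n_iters) are illustrative.

# Minimal sketch of the summarize -> principal-direction split -> k-means pipeline.
import numpy as np

def summarize(X, chunk_size=50):
    # Crude stand-in for BIRCH summaries: mean of each fixed-size chunk of rows.
    return np.array([X[i:i + chunk_size].mean(axis=0)
                     for i in range(0, len(X), chunk_size)])

def pddp_split(S):
    # PDDP-style 2-way split: sign of projection onto the top principal direction.
    centered = S - S.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0] >= 0).astype(int)

def kmeans_refine(S, labels, k=2, n_iters=20):
    # Plain k-means started from the principal-direction partition.
    centers = np.array([S[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(n_iters):
        d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 10)), rng.normal(3, 1, (500, 10))])
    S = summarize(X)                      # compressed "summaries" of the large dataset
    labels, centers = kmeans_refine(S, pddp_split(S))
    print("summary points per cluster:", np.bincount(labels))

In the paper the summaries come from BIRCH clustering features rather than chunk means, and the final refinement may use Deterministic Annealing instead of plain k-means; the sketch only shows how the three stages feed into one another.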
Keywords :
data compression; data mining; iterative methods; pattern clustering; text analysis; balanced iterative reducing-clustering algorithm; compact dataset summary; computer memory; dataset compression; hybrid large text data clustering; k-means clustering algorithm; principal directions divisive partitioning; Algorithm design and analysis; Annealing; Application software; Clustering algorithms; Costs; Iterative algorithms; Jacobian matrices; Mathematics; Partitioning algorithms; Statistics;
Conference_Titel :
21st International Conference on Advanced Information Networking and Applications Workshops (AINAW '07), 2007
Conference_Location :
Niagara Falls, Ont.
Print_ISBN :
978-0-7695-2847-2
DOI :
10.1109/AINAW.2007.202