Regional Information Center for Science and Technology (RICEST) - Clustering very large databases using EM mixture models

DocumentCode :

1742918

Title :

Clustering very large databases using EM mixture models

Author :

Bradley, P.S. ; Fayyad, U.M. ; Reina, C.A.

Author_Institution :

Microsoft Res., USA

Volume :

fYear :

2000

fDate :

2000

Firstpage :

Abstract :

Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g., the expectation-maximization (EM) algorithm for fitting mixture models, because of high memory and iteration requirements. Over large databases, the cost of the numerous scans required to converge and the large memory requirement of the algorithm become prohibitive. We present a decomposition of the EM algorithm requiring a small amount of memory by limiting iterations to small data subsets. The scalable EM approach requires at most one database scan and is based on identifying regions of the data that are discardable, regions that are compressible, and regions that must be maintained in memory. Data resolution is preserved to the extent possible based upon the size of the memory buffer and the fit of the current model to the data. Computational tests demonstrate that the scalable scheme outperforms similarly constrained EM approaches.
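
The single-scan idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a one-dimensional Gaussian mixture, folds every point whose top responsibility reaches a hypothetical `confidence` threshold into per-cluster sufficient statistics (count, sum, sum of squares), and keeps at most `buffer_size` uncertain points in memory. The names `scalable_em`, `buffer_size`, `confidence`, and `iters` are illustrative assumptions, and the paper's separate handling of compressible sub-clusters is omitted.

```python
import math

def responsibilities(x, means, variances, weights):
    """Posterior probability of each mixture component for one value x."""
    logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for m, v, w in zip(means, variances, weights)]
    top = max(logs)                      # subtract the max to avoid underflow
    probs = [math.exp(l - top) for l in logs]
    total = sum(probs)
    return [p / total for p in probs]

def m_step(points, resps, stats):
    """Re-estimate parameters from retained points plus compressed statistics."""
    means, variances, counts = [], [], []
    for j in range(len(stats)):
        n, s1, s2 = stats[j]             # hard counts from already-compressed data
        for x, r in zip(points, resps):  # soft counts from points still in memory
            n += r[j]
            s1 += r[j] * x
            s2 += r[j] * x * x
        n = max(n, 1e-9)                 # guard against an empty component
        mean = s1 / n
        means.append(mean)
        variances.append(max(s2 / n - mean * mean, 1e-6))
        counts.append(n)
    total = sum(counts)
    return means, variances, [c / total for c in counts]

def scalable_em(chunks, init_means, buffer_size=200, confidence=0.99, iters=10):
    """Fit a 1-D Gaussian mixture while reading each chunk exactly once."""
    means = [float(m) for m in init_means]
    k = len(means)
    variances = [1.0] * k                # assumed starting variance
    weights = [1.0 / k] * k
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    retained = []
    for chunk in chunks:
        retained.extend(float(x) for x in chunk)
        for _ in range(iters):           # EM iterations touch only the buffer
            resps = [responsibilities(x, means, variances, weights) for x in retained]
            means, variances, weights = m_step(retained, resps, stats)
        resps = [responsibilities(x, means, variances, weights) for x in retained]
        # Rank least-confident first; keep only uncertain points that fit the buffer.
        ranked = sorted(zip(retained, resps), key=lambda pair: max(pair[1]))
        keep = [x for x, r in ranked[:buffer_size] if max(r) < confidence]
        for x, r in ranked[len(keep):]:  # compress everything else
            j = r.index(max(r))
            stats[j][0] += 1.0
            stats[j][1] += x
            stats[j][2] += x * x
        retained = keep
    return means, variances, weights
```

Calling `scalable_em(chunks, init_means=[-1.0, 1.0])` with `chunks` as any iterable of numeric lists returns the estimated means, variances, and mixing weights. Because compressed points contribute only through their summary statistics, memory use is bounded by the chunk size plus `buffer_size`, at the price of freezing the cluster assignment of every compressed point.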

Keywords :

data mining; maximum likelihood estimation; pattern clustering; probability; very large databases; data resolution; data summarisation; expectation-maximization mixture models; model estimation; Clustering algorithms; Costs; Data mining; Distributed databases; Machine learning algorithms; Maximum likelihood estimation; Pattern recognition; Probability density function; Read-write memory; Visual databases

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition, 2000. Proceedings. 15th International Conference on

Conference_Location :

Barcelona

ISSN :

1051-4651

Print_ISBN :

0-7695-0750-6

Type :

conf

DOI :

10.1109/ICPR.2000.906021

Filename :

906021

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1742918