Title :
Global and Componentwise Extrapolation for Accelerating Data Mining from Large Incomplete Data Sets with the EM Algorithm
Author :
Hsu, Chun-Nan ; Huang, Han-Shen ; Yang, Bo-Hou
Author_Institution :
Institute of Information Science, Academia Sinica, Nankang, Taipei
Abstract :
The expectation-maximization (EM) algorithm is one of the most popular algorithms for data mining from incomplete data. However, when applied to large data sets with a large proportion of missing data, the EM algorithm may converge slowly. The triple jump extrapolation method can effectively accelerate the EM algorithm by substantially reducing the number of iterations required for EM to converge. There are two variants of the triple jump method: global extrapolation (TJEM) and componentwise extrapolation (CTJEM). We tried both methods on a variety of probabilistic models and found that, in general, global extrapolation yields better performance, but there are cases where componentwise extrapolation yields a very high speedup. In this paper, we investigate when componentwise extrapolation should be preferred. We conclude that CTJEM should be preferred when the Jacobian of the EM mapping is diagonal or block diagonal. We show how to determine whether a Jacobian is diagonal or block diagonal and experimentally confirm our claim. In particular, we show that CTJEM is especially effective for the semi-supervised Bayesian classifier model given a highly sparse data set.
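The following is a minimal sketch of why a diagonal Jacobian can favor componentwise extrapolation. It is not the paper's TJEM or CTJEM procedure: it uses a generic Aitken-type extrapolation on a toy linear fixed-point mapping that stands in for an EM update, and every name in it (`toy_em_map`, `global_extrapolate`, `componentwise_extrapolate`, the fixed point and the rates) is a hypothetical choice made for illustration.

```python
import math


def toy_em_map(theta, fixed_point, rates):
    """Toy linear mapping with Jacobian diag(rates); an assumed stand-in for an EM update."""
    return [f + r * (t - f) for t, f, r in zip(theta, fixed_point, rates)]


def global_extrapolate(t0, t1, t2):
    """Aitken-type extrapolation using one convergence rate shared by all components."""
    d1 = [b - a for a, b in zip(t0, t1)]
    d2 = [c - b for b, c in zip(t1, t2)]
    # Single rate estimate: ratio of the norms of two successive steps.
    gamma = math.sqrt(sum(x * x for x in d2)) / math.sqrt(sum(x * x for x in d1))
    return [b + d / (1.0 - gamma) for b, d in zip(t1, d2)]


def componentwise_extrapolate(t0, t1, t2):
    """Aitken-type extrapolation with a separate rate estimate per component.

    Assumes each component actually moved (b != a) and that its rate is not 1.
    """
    result = []
    for a, b, c in zip(t0, t1, t2):
        rate = (c - b) / (b - a)
        result.append(b + (c - b) / (1.0 - rate))
    return result


def max_abs_error(theta, target):
    return max(abs(t - f) for t, f in zip(theta, target))


# Assumed toy problem: two parameters converging at very different rates.
fixed_point = [1.0, -2.0]
rates = [0.9, 0.5]

theta0 = [0.0, 0.0]
theta1 = toy_em_map(theta0, fixed_point, rates)
theta2 = toy_em_map(theta1, fixed_point, rates)

global_error = max_abs_error(global_extrapolate(theta0, theta1, theta2), fixed_point)
componentwise_error = max_abs_error(
    componentwise_extrapolate(theta0, theta1, theta2), fixed_point
)
print("global error:", global_error)
print("componentwise error:", componentwise_error)
```

In this toy setting each component evolves independently, so a per-component rate recovers the fixed point up to rounding, whereas the single shared rate leaves the slowly converging component far from its limit. When the Jacobian has off-diagonal coupling, per-component ratios no longer estimate anything meaningful on their own, which is consistent with the abstract's observation that global extrapolation performs better in general.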
Keywords :
Bayes methods; data analysis; data mining; expectation-maximisation algorithm; extrapolation; pattern classification; probability; EM algorithm; EM mapping; componentwise extrapolation; data mining; expectation-maximization algorithm; global extrapolation; highly sparse data set; large incomplete data sets; probabilistic model; semisupervised Bayesian classifier model; triple jump extrapolation; Acceleration; Bayesian methods; Convergence; Data mining; Extrapolation; Information science; Iterative algorithms; Jacobian matrices; Packaging; Parameter estimation;
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2701-7
DOI :
10.1109/ICDM.2006.77