DocumentCode :
2369750
Title :
Learning Bayesian networks from incomplete data based on EMI method
Author :
Tian, Fengzhan ; Zhang, Hongwei ; Lu, Yuchang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
323
Lastpage :
330
Abstract :
Currently, there are few efficient methods in practice for learning Bayesian networks from incomplete data, which limits their use in real-world data mining applications. We present EMI, a general-purpose method that estimates (conditional) mutual information directly from incomplete datasets. EMI first computes interval estimates of the joint probability of a variable set; these intervals are obtained from the possible completions of the incomplete dataset. It then computes a point estimate via a convex combination of the extreme points, with weights that depend on the assumed pattern of missing data. Finally, EMI derives the estimated (conditional) mutual information from these point estimates. We also apply EMI to the dependency-analysis-based learning algorithm of J. Cheng so as to learn BNs efficiently from incomplete data. Experimental results on the Asia and Alarm networks show that the EMI-based algorithm is much more efficient than two search-and-scoring-based algorithms, SEM and EM-EA. In terms of accuracy, the EMI-based algorithm is more accurate than SEM and comparable to EM-EA.
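The estimation steps summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' EMI implementation: the function names (`interval_and_point_estimates`, `mutual_information`, `conditional_mutual_information`), the use of `None` as the missing-value marker, and the weighting scheme (complete-case frequencies, a missing-at-random style assumption, with a uniform fallback) are all hypothetical choices made for illustration. Each incomplete record is compatible with several joint states; assigning none or all such records to a state gives the lower and upper bounds, and spreading each record over its compatible states by the weights gives one convex combination of complete-data completions as the point estimate.

```python
import itertools
import math
from collections import Counter

MISSING = None  # hypothetical marker for a missing value


def compatible_states(record, domains):
    """All joint states consistent with the observed values of a record."""
    choices = [domains[i] if v is MISSING else [v] for i, v in enumerate(record)]
    return list(itertools.product(*choices))


def interval_and_point_estimates(data, domains):
    """Lower/upper bounds and a point estimate of P(state) for every joint state."""
    n = len(data)
    complete = Counter(tuple(r) for r in data if MISSING not in r)
    incomplete = [compatible_states(r, domains) for r in data if MISSING in r]
    states = list(itertools.product(*domains))
    # Extreme completions: no incomplete record vs. every compatible record goes to s.
    lower = {s: complete[s] / n for s in states}
    upper = {s: (complete[s] + sum(s in c for c in incomplete)) / n for s in states}
    # Assumed weights: complete-case frequencies (uniform if no complete cases).
    n_complete = sum(complete.values())
    weight = {s: (complete[s] / n_complete if n_complete else 1.0) for s in states}
    point = dict(lower)
    for compat in incomplete:
        total = sum(weight[s] for s in compat)
        for s in compat:
            # Share of this record given to s; uniform fallback if all weights are 0.
            share = weight[s] / total if total > 0 else 1.0 / len(compat)
            point[s] += share / n
    return lower, upper, point


def mutual_information(joint):
    """I(X;Y) in nats from a joint distribution keyed by (x, y)."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def conditional_mutual_information(joint):
    """I(X;Y|Z) in nats from a joint distribution keyed by (x, y, z)."""
    pz, pxz, pyz = Counter(), Counter(), Counter()
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    return sum(p * math.log(pz[z] * p / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)


if __name__ == "__main__":
    # Toy data over two binary variables; None marks a missing entry.
    data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, None), (None, 1)]
    lower, upper, point = interval_and_point_estimates(data, [[0, 1], [0, 1]])
    print(point)
    print(mutual_information(point))
```

With these weights, each incomplete record in the toy data is assigned entirely to the only compatible state seen among complete cases, so the point estimate puts mass 0.5 on (0, 0) and (1, 1) and the estimated mutual information is ln 2. A different assumed missing-data pattern would change only the `weight` table.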
Keywords :
belief networks; data mining; directed graphs; estimation theory; learning (artificial intelligence); probability; Alarm network; Bayesian network learning; EM-EA algorithm; EMI method; SEM algorithm; conditional mutual information estimation; dependency analysis based learning algorithm; general-duty method; incomplete data; joint probability; point estimates; real world data mining application; search & scoring based algorithm; variable set; Algorithm design and analysis; Application software; Bayesian methods; Computer science; Convergence; Data mining; Electromagnetic interference; Mutual information; Probability distribution; Sampling methods;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250936
Filename :
1250936