  • DocumentCode
    245137
  • Title
    Learning Sparse Gaussian Bayesian Network Structure by Variable Grouping
  • Author
    Yang, Jie; Leung, Henry C. M.; Yiu, S. M.; Cai, Yunpeng; Chin, Francis Y. L.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Hong Kong, Hong Kong, China
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    1073
  • Lastpage
    1078
  • Abstract
    Bayesian networks (BNs) are popular for modeling conditional distributions of variables and causal relationships, especially in biological settings such as protein interactions, gene regulatory networks and microbial interactions. Previous BN structure learning algorithms treat variables with similar tendencies separately. In this paper, we propose a grouped sparse Gaussian BN (GSGBN) structure learning algorithm that constructs a BN under three assumptions: (i) the variables follow a multivariate Gaussian distribution, (ii) the network contains only a few edges (sparsity), and (iii) similar variables have less divergent sets of parents, while less similar variables have more divergent sets of parents (variable grouping). We use L1 regularization to make the learned network sparse, and a second term to incorporate information shared among variables. GSGBN penalizes differences between the parent sets of similar variables more heavily than differences between the parent sets of less similar variables. The similarity of variables is learned from the data by alternating optimization, without prior domain knowledge. Based on this new definition of the optimal BN, a coordinate descent algorithm and a projected gradient descent algorithm are developed to obtain the edges of the network and the similarity of variables. Experimental results on both simulated and real datasets show that GSGBN has substantially better prediction performance for structure learning than several existing algorithms.
  • Keywords
    Gaussian distribution; belief networks; biology computing; gradient methods; learning (artificial intelligence); optimisation; GSGBN structure learning algorithm; L1 regularization; alternating optimization; biological settings; causal relationships; coordinate descent algorithm; grouped sparse Gaussian BN; multivariate Gaussian distribution; not-so-similar variable parent sets; projected gradient descent algorithm; similar variable parent sets; sparse Gaussian Bayesian network structure learning; variable conditional distributions; variable grouping; Benchmark testing; Bismuth; Gaussian distribution; Linear regression; Optimization; Probability distribution; Sensitivity; Bayesian network; microbial interactions; sparsity; variable grouping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.126
  • Filename
    7023449
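
The abstract names L1 regularization and coordinate descent as the ingredients that make the learned network sparse. The following is a minimal sketch of that ingredient alone, not the GSGBN algorithm from the paper: it omits the grouping penalty, the learned variable similarity, and the projected gradient step. It also assumes that a topological ordering of the variables is already known, which the abstract does not state. The function names, the toy data, and the value of lam are all hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, lam, n_sweeps=100):
    """Minimize 0.5 * ||target - sum_j b_j * columns[j]||^2 + lam * ||b||_1."""
    n = len(target)
    coefs = [0.0] * len(columns)
    residual = list(target)  # residual = target - current fitted values
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(x * x for x in col)
            if norm_sq == 0.0:
                continue
            # Inner product of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(col[i] * residual[i] for i in range(n)) + coefs[j] * norm_sq
            new_coef = soft_threshold(rho, lam) / norm_sq
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                coefs[j] = new_coef
    return coefs


def learn_sparse_gaussian_bn(data, order, lam):
    """Return {child: {parent: weight}} by regressing each variable on its predecessors.

    Assumption (not from the paper): `order` is a known topological ordering,
    so only earlier variables are candidate parents.
    """
    parents = {}
    for pos, child in enumerate(order):
        candidates = order[:pos]
        coefs = lasso_coordinate_descent([data[p] for p in candidates], data[child], lam)
        parents[child] = {p: w for p, w in zip(candidates, coefs) if w != 0.0}
    return parents


# Hypothetical toy data from a linear Gaussian model: a -> b, with c independent.
rng = random.Random(0)
n = 500
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [2.0 * x + rng.gauss(0.0, 0.1) for x in a]
c = [rng.gauss(0.0, 1.0) for _ in range(n)]
network = learn_sparse_gaussian_bn({"a": a, "b": b, "c": c}, ["a", "b", "c"], lam=200.0)
print(network)
```

In this sketch a nonzero regression weight is read as an edge from parent to child, and the L1 penalty drives weak weights to exactly zero. The recovered weight on the a -> b edge is shrunk below its true value of 2.0, which is the usual bias of an L1 penalty.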