DocumentCode :
679524
Title :
The Pairwise Gaussian Random Field for High-Dimensional Data Imputation
Author :
Zhuhua Cai ; Jermaine, Christopher ; Vagena, Z. ; Logothetis, Dionysios ; Perez, Luis L.
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
61
Lastpage :
70
Abstract :
In this paper, we consider the problem of imputation (recovering missing values) in very high-dimensional data with an arbitrary covariance structure. The modern solution to this problem is the Gaussian Markov random field (GMRF). The problem with applying a GMRF to very high-dimensional data imputation is that while the GMRF model itself can be useful even for data having tens of thousands of dimensions, utilizing a GMRF requires access to a sparsified, inverse covariance matrix for the data. Computing this matrix using even state-of-the-art methods is very costly, as it typically requires first estimating the covariance matrix from the data (at a O(nm2) cost for m dimensions and n data points) and then performing a regularized inversion of the estimated covariance matrix, which is also very expensive. This is impractical for even moderately-sized, high-dimensional data sets. In this paper, we propose a very simple alternative to the GMRF called the pair wise Gaussian random field or PGRF for short. The PGRF is a graphical, factor-based model. Unlike traditional Gaussian or GMRF models, a PGRF does not require a covariance or correlation matrix as input. Instead, a PGRF takes as input a set of p (dimension, dimension) pairs for which the user suspects there might be a strong correlation or anti-correlation. This set of pairs defines the graphical structure of the model, with a simple Gaussian factor associated with each of the p (dimension, dimension) pairs. Using this structure, it is easy to perform simultaneous inference and imputation of the model. The key benefit of the approach is that the time required for the PGRF to perform inference is approximately linear with respect to p, where p will typically be much smaller than the number of entries in a m×m covariance or precision matrix.
Keywords :
Gaussian processes; computational complexity; random processes; Gaussian factor; PGRF; arbitrary covariance structure; graphical factor-based model; graphical structure; high-dimensional data imputation; moderately-sized high-dimensional data sets; pairwise Gaussian random field; Computational modeling; Correlation; Covariance matrices; Data models; Gaussian distribution; Markov processes; Monte Carlo methods; classification; imputation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.149
Filename :
6729490
Link To Document :
بازگشت