DocumentCode :
1399921
Title :
Maximum-Parsimony Haplotype Inference Based on Sparse Representations of Genotypes
Author :
Jajamovich, Guido H. ; Wang, Xiaodong
Author_Institution :
Electr. Eng. Dept., Columbia Univ., New York, NY, USA
Volume :
60
Issue :
4
fYear :
2012
fDate :
4/1/2012 12:00:00 AM
Firstpage :
2013
Lastpage :
2023
Abstract :
The haplotypes of an individual can be used to predict diseases and help designing drugs. However, experimentally determining haplotypes is expensive and time-consuming, so genotypes are usually measured instead. Given the set of genotypes for a group of unrelated individuals, it is possible to infer the haplotype pair for each subject based on the maximum parsimony principle. Finding the exact solution to this problem is NP-hard. We propose two related formulations of the haplotype inference problem that translate the maximum parsimony principle into the sparse representation of genotypes. In the first formulation we look for the set of haplotypes that explain the genotypes such that the resulting frequency vector of haplotypes is as sparse as possible. The sparseness condition is achieved by minimizing the Tsallis entropy of the frequency vector, which is still an NP-hard problem. We propose a method that enumerates all local minima with high probability by solving a set of integer linear programs of low dimensionality. The minimizer is then found by identifying the local minimum point that achieves the lowest Tsallis entropy. In the second formulation, we state the haplotypes inference as a sparse dictionary selection problem. Each genotype is reconstructed by a haplotype pair selected from a set of available haplotypes that needs to be sparse. This leads to an approximately submodular maximization problem and therefore, can be solved with a fast greedy method. We test the proposed solutions with different data sets and compare the performance with the state-of-the-art methods, achieving similar or better results.
Keywords :
biology computing; computational complexity; diseases; genetic engineering; integer programming; linear programming; NP-hard problem; Tsallis entropy; disease prediction; drug design; genotypes; integer linear programs; maximum-parsimony haplotype inference; sparse dictionary selection problem; sparse representations; Biological cells; Clustering algorithms; Dictionaries; Entropy; Inference algorithms; Statistical analysis; Vectors; Haplotype inference; maximum parsimony principle; sparse dictionary; sparse representations;
fLanguage :
English
Journal_Title :
Signal Processing, IEEE Transactions on
Publisher :
ieee
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2011.2179542
Filename :
6104417
Link To Document :
بازگشت