DocumentCode :
952158
Title :
Highly Scalable Genotype Phasing by Entropy Minimization
Author :
Gusev, Alexander ; Mandoiu, Ion I. ; Pasaniuc, B.
Author_Institution :
Dept. of Comput. Sci., Columbia Univ., New York, NY
Volume :
5
Issue :
2
Year :
2008
Firstpage :
252
Lastpage :
261
Abstract :
A single nucleotide polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two nonidentical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is relatively easy to obtain the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, that is, inferring haplotypes from genotype data, have received much attention in recent years because haplotype information increases the statistical power of disease association tests. However, many of the existing algorithms have impractical runtimes for phasing large genotype data sets such as those generated by the International HapMap Project. In this paper, we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated data sets show that our algorithm achieves a phasing accuracy lower than, but close to, that of the best existing methods while being several orders of magnitude faster. The open-source implementation of the algorithm and a Web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
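To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a genotype is encoded per SNP as the count of one allele (0, 1, or 2, so 1 means heterozygous), enumerates the haplotype pairs that conflate to each genotype, and picks by brute force the phasing whose pooled haplotypes have the lowest empirical entropy. The function names and the encoding are hypothetical, and the exhaustive search is only feasible for tiny inputs; the scalable method described in the paper is not reproduced here.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible_pairs(genotype):
    """List the unordered haplotype pairs (h1, h2) with h1[i] + h2[i] == genotype[i].

    Assumed encoding: 0 and 2 are homozygous sites, 1 is a heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    pairs = []
    # Pin the first heterozygous site so mirrored pairs are not listed twice.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        assignment = (0,) + bits if het else ()
        h1, h2 = base[:], base[:]
        for site, b in zip(het, assignment):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def entropy(haplotypes):
    """Empirical Shannon entropy (in bits) of a list of haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Exhaustively choose one compatible pair per genotype to minimize entropy."""
    options = [compatible_pairs(g) for g in genotypes]
    best = min(
        product(*options),
        key=lambda phasing: entropy([h for pair in phasing for h in pair]),
    )
    return list(best)
```

For example, calling `min_entropy_phasing([(0, 0), (2, 2), (1, 1)])` resolves the doubly heterozygous genotype using the haplotypes already implied by the two homozygous individuals, since reusing existing haplotypes keeps the entropy low.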
Keywords :
biochemistry; genetics; medical computing; minimisation; minimum entropy methods; public domain software; Human Genome Project; Web interface; autosomal chromosome; computational methods; diploid organisms; disease association tests; entropy minimization; genetic variability; haplotype information; highly scalable genotype phasing mechanism; human population; international HapMap Project; open source code implementation; phasing accuracy; single nucleotide polymorphism; statistical power; Single Nucleotide Polymorphism; algorithm; genotype phasing; haplotype; Algorithms; Alleles; Computational Biology; Databases, Nucleic Acid; Female; Genetic Techniques; Genotype; Humans; Male; Models, Genetic; Models, Statistical; Pedigree; Polymorphism, Single Nucleotide
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70223
Filename :
4359881