DocumentCode
472206
Title
Multiple Linear Regression for Index SNP Selection on Unphased Genotypes
Author
He, Jingwu ; Zelikovsky, Alex
Author_Institution
Fac. Comput. Sci., Georgia State Univ., Atlanta, GA
fYear
2006
fDate
Aug. 30 2006-Sept. 3 2006
Firstpage
5759
Lastpage
5762
Abstract
The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences. This raises the importance of using a small subset of informative SNPs, called index SNPs, that accurately represents the rest of the SNPs (i.e., the remaining SNPs can be predicted with high accuracy from the index SNPs). Index SNP selection compacts huge unphased genotype data (obtained, e.g., from Affymetrix Map Array) to make fine genotype analysis feasible. In this paper we propose a novel index SNP selection method for unphased genotypes based on multiple linear regression (MLR) SNP prediction. We measure the quality of our index SNP selection algorithm by comparing actual SNPs with the SNPs computationally predicted from the chosen index SNPs. We obtain extremely good prediction rates and compression. For example, for region ENm010 (123 SNPs), we can use 2% of SNPs to represent all SNPs with 93.5% accuracy. An experimental study on 4 ENCODE regions from HapMap shows that our method uses significantly fewer index SNPs (e.g., up to two times fewer index SNPs to reach 90% prediction accuracy) than the state-of-the-art method of Halperin et al. for genotypes.
Keywords
biochemistry; biology computing; diseases; molecular biophysics; polymorphism; regression analysis; ENCODE regions; HapMap; diseases; haplotypes; index SNP selection algorithm; multiple linear regression; single nucleotide polymorphism; unphased genotypes; Accuracy; Cities and towns; Compaction; Computer science; Diseases; Helium; Linear regression; Prediction methods; Throughput; USA Councils; Index SNPs; Multiple linear regression; Single nucleotide polymorphism;
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE
Conference_Location
New York, NY
ISSN
1557-170X
Print_ISBN
1-4244-0032-5
Electronic_ISBN
1557-170X
Type
conf
DOI
10.1109/IEMBS.2006.259408
Filename
4463115