Title :
Informative SNP Selection Methods Based on SNP Prediction
Author :
He, Jingwu ; Zelikovsky, Alexander
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
Date :
3/1/2007
Abstract :
The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs, i.e., tag SNPs, that accurately represents the rest of the SNPs. Tag SNP selection can achieve: 1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or 2) necessary reduction of the huge SNP sets (obtained, e.g., from Affymetrix) for further fine haplotype analysis. In this paper, we show that tag SNP selection strongly depends on how the chosen tags will be used: the advantage of one tag set over another can only be considered with respect to a certain prediction method. We show how to separate tag selection from SNP prediction and propose greedy and local-minimization algorithms for tag SNP selection. We give two novel approaches to SNP prediction based on multiple linear regression (MLR) and support vector machines (SVMs). An extensive experimental study on various datasets, including ten regions from the HapMap project, shows that MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. The MLR-based method also uses on average 30% fewer tags than IdSelect for statistically covering all SNPs. Tag selection based on SVM SNP prediction uses fewer tags to achieve the same prediction accuracy as the methods of Halldorsson et al.
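The idea of separating tag selection from SNP prediction can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes genotypes coded as 0, 1, 2 in a NumPy matrix (rows are individuals, columns are SNPs), scores tag sets by in-sample accuracy, and uses hypothetical function names (`mlr_predict`, `accuracy`, `greedy_tags`). The predictor is an ordinary least-squares fit rounded to the nearest genotype code, and the selector is a plain greedy loop that could wrap any other predictor.

```python
import numpy as np

def mlr_predict(train, tags, target, query):
    """Predict SNP `target` for the rows of `query` from the tag columns.

    Fits ordinary least squares on `train` (tag columns plus an intercept),
    then rounds the real-valued fit to the nearest genotype code.
    Assumption: genotypes are coded 0, 1, 2.
    """
    A = np.column_stack([train[:, tags], np.ones(len(train))])
    coef, *_ = np.linalg.lstsq(A, train[:, target], rcond=None)
    B = np.column_stack([query[:, tags], np.ones(len(query))])
    return np.clip(np.rint(B @ coef), 0, 2).astype(int)

def accuracy(data, tags):
    """Fraction of non-tag genotype entries predicted correctly (in-sample)."""
    others = [s for s in range(data.shape[1]) if s not in tags]
    if not others:
        return 1.0
    hits = sum(
        int((mlr_predict(data, tags, s, data) == data[:, s]).sum())
        for s in others
    )
    return hits / (len(others) * len(data))

def greedy_tags(data, k):
    """Greedily add the SNP that most improves prediction accuracy, k times.

    The selector only calls `accuracy`, so the prediction method can be
    swapped without changing the selection loop.
    """
    tags = []
    for _ in range(k):
        candidates = [s for s in range(data.shape[1]) if s not in tags]
        best = max(candidates, key=lambda s: accuracy(data, tags + [s]))
        tags.append(best)
    return tags
```

Calling `greedy_tags(data, k)` on a genotype matrix returns `k` column indices. A realistic evaluation would score on held-out individuals (e.g., leave-one-out) rather than in-sample.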
Keywords :
DNA; diseases; greedy algorithms; medical computing; minimisation; molecular biophysics; molecular configurations; polymorphism; prediction theory; regression analysis; support vector machines; Affymetrix; SNP prediction; SVM; complex diseases; genotyping; greedy algorithm; hapMap project; haplotypes; informative SNP selection methods; local-minimization algorithm; multiple linear regression; single nucleotide polymorphisms; stepwise tag selection; support vector machines; tag selection separation; Accuracy; Bioinformatics; Biological cells; Computer science; Diseases; Genomics; Helium; Linear regression; Prediction methods; Support vector machines; Genotypes; haplotypes; informative single nucleotide polymorphism (SNP); single nucleotide polymorphism (SNP); tag selection; Algorithms; Artificial Intelligence; Base Sequence; Computer Simulation; DNA Mutational Analysis; Expressed Sequence Tags; Genotype; Haplotypes; Linkage Disequilibrium; Models, Genetic; Models, Statistical; Molecular Sequence Data; Pattern Recognition, Automated; Polymorphism, Single Nucleotide; Sequence Alignment;
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2007.891901