• DocumentCode
    472206
  • Title
    Multiple Linear Regression for Index SNP Selection on Unphased Genotypes
  • Author
    He, Jingwu; Zelikovsky, Alex
  • Author_Institution
    Fac. Comput. Sci., Georgia State Univ., Atlanta, GA
  • fYear
    2006
  • fDate
    Aug. 30, 2006 - Sept. 3, 2006
  • Firstpage
    5759
  • Lastpage
    5762
  • Abstract
    The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences. This raises the importance of using a small subset of informative SNPs, called index SNPs, that accurately represent the rest of the SNPs (i.e., the remaining SNPs can be predicted with high accuracy from the index SNPs). Index SNP selection compacts huge unphased genotype data (obtained, e.g., from the Affymetrix Map Array) in order to make fine genotype analysis feasible. In this paper we propose a novel index SNP selection method for unphased genotypes based on multiple linear regression (MLR) SNP prediction. We measure the quality of our index SNP selection algorithm by comparing actual SNPs with the SNPs computationally predicted from the chosen index SNPs. We obtain very good prediction rates and compression. For example, for region ENm010 (123 SNPs), we can use 2% of the SNPs to represent all SNPs with 93.5% accuracy. An experimental study on 4 ENCODE regions from HapMap shows that our method uses significantly fewer index SNPs (e.g., up to two times fewer index SNPs to reach 90% prediction accuracy) than the state-of-the-art method of Halperin et al. for genotypes.
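    The abstract says non-index SNPs are predicted from the chosen index SNPs by multiple linear regression, and that quality is measured by comparing predicted SNPs with actual ones. The Python sketch below illustrates only that prediction-and-scoring step, under stated assumptions: genotypes are coded 0/1/2 (count of one allele), the regression is ordinary least squares with an intercept (via NumPy), and continuous predictions are rounded and clipped to the nearest valid genotype. The function names (design_matrix, fit_mlr, predict_mlr, prediction_accuracy) and the toy data are hypothetical, and the paper's index SNP selection procedure itself is not reproduced here.

```python
import numpy as np


def design_matrix(index_geno):
    # Prepend a column of ones so the regression includes an intercept term.
    index_geno = np.asarray(index_geno, dtype=float)
    return np.column_stack([np.ones(index_geno.shape[0]), index_geno])


def fit_mlr(index_geno, target_geno):
    """Ordinary least-squares fit of one target SNP on the index SNPs.

    index_geno:  (n_samples, k) index SNP genotypes, coded 0/1/2 (assumed coding).
    target_geno: (n_samples,) genotypes of the SNP to be predicted.
    Returns the coefficient vector, intercept first.
    """
    X = design_matrix(index_geno)
    y = np.asarray(target_geno, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_mlr(coef, index_geno):
    # Linear prediction, then round and clip to the valid genotype values {0, 1, 2}.
    raw = design_matrix(index_geno) @ coef
    return np.clip(np.rint(raw), 0, 2).astype(int)


def prediction_accuracy(predicted, actual):
    # Fraction of genotypes that are predicted exactly.
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))


if __name__ == "__main__":
    # Hypothetical toy data: 6 individuals, 2 index SNPs; the target SNP
    # happens to equal the first index SNP (perfect correlation).
    index = np.array([[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]])
    target = index[:, 0]
    coef = fit_mlr(index, target)
    pred = predict_mlr(coef, index)
    print(prediction_accuracy(pred, target))
```

    In a realistic setting one would fit the coefficients on samples where both index and target SNPs are typed, then predict the target SNP in samples where only the index SNPs are available; rounding the continuous regression output is just one simple way to map back to discrete genotypes.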
  • Keywords
    biochemistry; biology computing; diseases; molecular biophysics; polymorphism; regression analysis; ENCODE regions; HapMap; diseases; haplotypes; index SNP selection algorithm; multiple linear regression; single nucleotide polymorphism; unphased genotypes; Accuracy; Compaction; Computer science; Diseases; Linear regression; Prediction methods; Throughput; Index SNPs; Multiple linear regression; Single nucleotide polymorphism
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE
  • Conference_Location
    New York, NY
  • ISSN
    1557-170X
  • Print_ISBN
    1-4244-0032-5
  • Electronic_ISBN
    1557-170X
  • Type
    conf
  • DOI
    10.1109/IEMBS.2006.259408
  • Filename
    4463115