• DocumentCode
    66002
  • Title

    Informative SNPs Selection Based on Two-Locus and Multilocus Linkage Disequilibrium: Criteria of Max-Correlation and Min-Redundancy

  • Author

    Xiong Li ; Bo Liao ; Lijun Cai ; Zhi Cao ; Wen Zhu

  • Author_Institution
    Coll. of Inf. Sci. & Eng., Hunan Univ., Changsha, China
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    May-June 2013
  • Firstpage
    688
  • Lastpage
    695
  • Abstract
    Many methods currently exist for selecting informative SNPs for haplotype reconstruction. However, several challenges still render them ineffective for large data sets. First, some traditional methods are wrappers, which have high computational complexity. Second, some methods ignore linkage disequilibrium (LD), which makes the selection results hard to interpret. In this study, we derive optimization criteria by combining two-locus and multilocus LD measures, yielding the criteria of Max-Correlation and Min-Redundancy (MCMR). We then use a greedy algorithm to select a candidate set of informative SNPs constrained by these criteria. Finally, we use a backward scheme to refine the candidate subset. We use small and medium-sized (>1,000 SNPs) data sets separately to evaluate MCMR in terms of reconstruction accuracy, time complexity, and compactness. Additionally, to demonstrate that MCMR is practical for large data sets, we design a parameter w to adapt to various platforms and introduce a replacement scheme for larger data sets, which sharply reduces the computational complexity of evaluating the reconstruction ratio. We then apply our haplotype-reconstruction-based method, for the first time, to large (>5,000 SNPs) data sets. The results confirm that MCMR leads to promising improvements in informative SNP selection and prediction accuracy.
  • Keywords
    biological techniques; computational complexity; greedy algorithms; optimisation; polymorphism; haplotype reconstruction; high computational complexity; informative SNP selection; large size data sets; linkage disequilibrium; max-correlation; min-redundancy; multilocus LD measurement; multilocus linkage disequilibrium; optimization criteria; single nucleotide polymorphism; two-locus LD measurement; two-locus linkage disequilibrium; wrappers; Accuracy; Bioinformatics; Couplings; Prediction algorithms; Predictive models; Time complexity; Haplotypes; SVM; informative SNPs;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.61
  • Filename
    6517182
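
The greedy step described in the abstract can be illustrated with a minimal sketch. This is not the authors' MCMR criterion: it assumes a generic max-correlation/min-redundancy score built only from pairwise (two-locus) r² on a 0/1 haplotype matrix, and it omits the multilocus LD measure, the backward refinement, and the parameter w. The names `pairwise_r2` and `greedy_select` are hypothetical.

```python
import numpy as np

def pairwise_r2(haps):
    """Squared Pearson correlation (two-locus r^2) between SNP columns.

    haps: 2-D array, rows = haplotypes, columns = SNPs coded 0/1.
    Assumes at least two SNP columns.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.corrcoef(haps, rowvar=False)
    # Monomorphic SNPs give NaN correlations; treat them as uncorrelated.
    return np.nan_to_num(r) ** 2

def greedy_select(haps, k):
    """Greedily pick up to k SNPs (assumed scoring, not the paper's).

    score = mean r^2 with all other SNPs (correlation term)
          - mean r^2 with already selected SNPs (redundancy term)
    """
    r2 = pairwise_r2(haps)
    np.fill_diagonal(r2, 0.0)  # ignore self-correlation
    n = r2.shape[0]
    relevance = r2.sum(axis=1) / (n - 1)
    selected, remaining = [], list(range(n))
    while remaining and len(selected) < k:
        if selected:
            redundancy = r2[np.ix_(remaining, selected)].mean(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = relevance[remaining] - redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: SNPs 0-2 form one perfectly linked block, SNPs 3-4 another.
a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
toy_haps = np.column_stack([a, a, a, b, b])
picked = greedy_select(toy_haps, 2)
```

On this toy input the redundancy term steers the second pick away from the block already represented, so `picked` should contain one SNP from each block. A fuller version would add a backward pass that drops any selected SNP whose removal does not lower reconstruction accuracy.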