Author :
de Araujo, F.R.B. ; Gusmao, Eduardo G. ; Guimaraes, Katia S.
Author_Institution :
Center of Inf., Fed. Univ. of Pernambuco, Recife, Brazil
Abstract :
The massive volume of SNP data available requires adequate computational strategies to handle it properly. The goal is to identify the SNP-SNP and SNP-environment combinations that best explain the propensity for a certain disease. We introduce a website (https://jaqueira.cin.ufpe.br/pit/faces/index.jsp) where three previously reported methods and one new method for SNP-SNP interaction detection are implemented, and can be executed individually or all together on the same dataset. We also present the results of a case-control study of those methods, based on 70 epistatic models with varying rates of heritability and minor allele frequency. The experiments also consider different numbers of SNPs and sizes of case-control sets. We observe that for a small number of SNPs the four methods are statistically equal, but as the number of SNPs grows they behave differently, except for ESNP2 and our method. Although all the methods are exhaustive, in our analysis ESNP2 generally runs much faster and achieves better accuracy. Nonetheless, the performance of ESNP2 can be disturbed in a scenario where a single gene can explain most of the epistatic effects. In those cases, considering the interaction effects of all SNPs, instead of only the most significant ones, can deliver more accurate results. Our proposed method, called Multi-Approach SNP-SNP Interaction Analysis (MASS), although statistically equal to ESNP2, achieves better results than ESNP2 in that situation. Our experiments show that specific epistatic models lead to particularly better or worse performance. While a small value for minor allele frequency can negatively impact the accuracy, a small heritability rate is the single variation studied that has the strongest negative impact on the accuracy.
Keywords :
DNA; bioinformatics; data analysis; diseases; medical computing; statistical analysis; DNA data availability; ESNP2; MASS; SNP-SNP interaction detection; SNP-environment combinations; case-control study; disease; epistatic effects; epistatic models; heritability; minor allele frequency; multiapproach SNP-SNP interaction analysis; nonparametric approaches; single nucleotide polymorphisms; Accuracy; Computational modeling; Diseases; Genetics; Measurement; Sociology; Statistics; Computational Tools; Non-parametric methods; Polymorphism;