DocumentCode :
2387241
Title :
Comparison of Machine Learning and Pattern Discovery Algorithms for the Prediction of Human Single Nucleotide Polymorphisms
Author :
Yan, Rui ; Boutros, Paul C. ; Jurisica, Igor ; Penn, Linda Z.
Author_Institution :
Univ. of Toronto, Toronto
fYear :
2007
fDate :
2-4 Nov. 2007
Firstpage :
452
Lastpage :
452
Abstract :
This paper compares machine learning techniques and pattern discovery algorithms for the prediction of human single nucleotide polymorphisms (SNPs). We selected six pattern discovery algorithms (YMF, Projection, Weeder, MotifSampler, AlignACE and ANN-Spec) and two machine learning techniques (random forests and K-nearest neighbours) and applied them to the DNA sequences flanking non-coding SNPs on human chromosome 21. We compared the pattern similarity amongst the methods and validated the predictions using known SNPs on chromosome 22. Parameterization of both machine learning and pattern discovery algorithms was critical to their performance. Memory usage was broadly constant amongst the pattern discovery algorithms, but the CPU running time varied significantly between deterministic and probabilistic pattern discovery methods, i.e., on average, probabilistic methods run 19 times slower than deterministic methods. This is the first demonstration of SNP prediction, as well as the first comparison of machine learning and pattern discovery algorithms in SNP prediction studies.
Keywords :
biology computing; data mining; learning (artificial intelligence); DNA sequences; K-nearest neighbours; human single nucleotide polymorphisms; machine learning; pattern discovery; random forests; Biological cells; Biophysics; Computer science; DNA; Humans; Machine learning; Machine learning algorithms; Prediction algorithms; Sampling methods; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Granular Computing, 2007. GRC 2007. IEEE International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3032-1
Type :
conf
DOI :
10.1109/GrC.2007.72
Filename :
4403141