DocumentCode :
1797944
Title :
A hierarchical learning approach to calibrate allele frequencies for SNP based genotyping of DNA pools
Author :
Hellicar, Andrew D. ; Smith, D. ; Rahman, Aminur ; Engelke, Ulrich ; Henshall, John
Author_Institution :
Comput. Inf., CSIRO, Hobart, TAS, Australia
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
183
Lastpage :
189
Abstract :
The combination of low-density SNP arrays and DNA pooling is a fast and cost-effective approach to genotyping that opens up basic genomics to a range of new applications and studies. However, we have identified significant limitations in the existing approach to calculating allele frequencies with DNA pooling. These limitations include a reduced ability to deal with SNP-to-SNP variation via the standard interpolation method. Our contribution is a new hierarchical learning framework which resolves these drawbacks. The framework involves a hierarchy of two greedily trained layers of learners. The first layer learns the bias of each SNP, then applies a calibration to reduce SNP bias by mapping into a common coordinate system across all SNPs. The second layer learns an allele frequency function exploiting the global SNP data. A range of algorithms has been applied, including linear regression, neural networks and support vector regression. The framework has been tested on pooled samples of Black Tiger prawns that have been genotyped with low-density Sequenom iPLEX panels. Analysis of pooled samples and the corresponding individually genotyped SNP samples indicates that the pooling approach introduces an allele frequency RMS error of 0.12. The existing calibration approach corrects ~14% of the error. Our hierarchical approach is 4.5 times as effective, correcting ~64% of the introduced error. This is a significant reduction and has the potential to enable genetic studies previously not possible due to allele frequency error. Although testing so far is limited to low-density SNP arrays, the approach was developed to generalize to other SNP genotyping technologies.
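The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the per-SNP distortion model (gain, offset, mild nonlinearity) is an assumption, and a per-SNP straight-line fit and a pooled cubic polynomial stand in for the learners named in the abstract. All function names are hypothetical.

```python
import numpy as np

def fit_layer1(raw, truth):
    """Layer 1: one least-squares line per SNP (truth ~ a * raw + b).
    raw and truth have shape (n_pools, n_snps)."""
    n_snps = raw.shape[1]
    coefs = np.empty((n_snps, 2))
    for j in range(n_snps):
        coefs[j] = np.polyfit(raw[:, j], truth[:, j], deg=1)
    return coefs

def apply_layer1(raw, coefs):
    # Map every SNP's raw signal into a shared coordinate system.
    return raw * coefs[:, 0] + coefs[:, 1]

def fit_layer2(calibrated, truth, deg=3):
    """Layer 2: a single polynomial fitted on data pooled across all SNPs."""
    return np.polyfit(calibrated.ravel(), truth.ravel(), deg=deg)

def apply_layer2(calibrated, poly):
    # Allele frequencies are proportions, so clip to [0, 1].
    return np.clip(np.polyval(poly, calibrated), 0.0, 1.0)

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_pools, n_snps = 40, 30
    truth = rng.uniform(0.0, 1.0, size=(n_pools, n_snps))
    # Assumed distortion: SNP-specific gain and offset plus a shared nonlinearity.
    gain = rng.uniform(0.6, 1.4, size=n_snps)
    offset = rng.uniform(-0.1, 0.1, size=n_snps)
    raw = gain * truth ** 1.3 + offset + rng.normal(0.0, 0.02, size=truth.shape)

    train, test = slice(0, 30), slice(30, None)
    # Greedy training: layer 1 is fitted first, then layer 2 on its outputs.
    coefs = fit_layer1(raw[train], truth[train])
    poly = fit_layer2(apply_layer1(raw[train], coefs), truth[train])
    pred = apply_layer2(apply_layer1(raw[test], coefs), poly)
    return rms(raw[test], truth[test]), rms(pred, truth[test])

if __name__ == "__main__":
    before, after = demo()
    print(f"RMS error before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the first layer removes the SNP-specific gain and offset, so the residual nonlinearity is approximately the same for every SNP and the second layer can learn it from all SNPs at once. The numbers it prints describe the synthetic data only and are not comparable to the results reported in the abstract.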
Keywords :
DNA; biology computing; calibration; genetics; genomics; interpolation; learning (artificial intelligence); least mean squares methods; molecular biophysics; neural nets; polymorphism; regression analysis; support vector machines; DNA pool; SNP array; allele frequency RMS error; allele frequency calibration; allele frequency function; common coordinate system; genotyping; hierarchical learning approach; linear regression; neural network; Sequenom iPLEX panel; single nucleotide polymorphism; standard interpolation method; support vector regression; Accuracy; Bioinformatics; Frequency estimation; Machine learning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889697
Filename :
6889697