• DocumentCode
    3350350
  • Title
    Automatic genotype calling of single nucleotide polymorphisms using a linear grouping algorithm
  • Author
    Yan, Guohua ; Welch, W.J. ; Zamar, R.H. ; Akhabir, L. ; McDonald, Tony
  • Author_Institution
    Dept. of Math. & Stat., Univ. of New Brunswick, Fredericton, NB, Canada
  • Volume
    4
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    2391
  • Lastpage
    2395
  • Abstract
    The use of single nucleotide polymorphisms (SNPs) has become increasingly important for a wide range of genetic studies. High-throughput genotyping technologies usually involve a statistical algorithm for automatic (non-manual) genotype calling. Most calling algorithms in the literature, using methods such as k-means and mixture models, rely on elliptical structures in the genotyping intensity data, and they may fail when the intensity data have linear patterns. We propose an automatic genotype calling algorithm by further developing a linear grouping algorithm (LGA). The proposed method clusters data points around lines rather than around centroids; the clusters lie along lines because the intensities are not normalized. In addition, we associate a quality value, the silhouette width, with each DNA sample and with each whole plate. For a data set of 101 SNPs from the TaqMan platform (Applied Biosystems), LGA achieves 100% automatic calling, and 93% of samples pass a quality criterion and are assigned a genotype. For a subset of 30 SNPs for which validated samples are available, the accuracy of the called genotypes is over 98%. Thus, a key feature of applying LGA to unnormalized TaqMan SNP assay fluorescent signals is that it automatically and reliably calls a substantial proportion of samples, reducing the need for manual intervention. It could potentially be adapted to other fluorescence-based SNP genotyping technologies such as the Invader Assay.
  • Keywords
    DNA; biology computing; genetics; genomics; pattern clustering; statistical analysis; DNA sample; LGA algorithm; automatic genotype calling algorithm; data point clustering; elliptical structures; genotyping intensity data; high-throughput genotyping technology; linear grouping algorithm; linear patterns; quality criterion; silhouette width; single nucleotide polymorphism; statistical algorithm; unnormalized TaqMan SNP assay fluorescent signals; Accuracy; Approximation algorithms; Clustering algorithms; DNA; Educational institutions; Manuals; Software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2011 Seventh International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    2157-9555
  • Print_ISBN
    978-1-4244-9950-2
  • Type
    conf
  • DOI
    10.1109/ICNC.2011.6022592
  • Filename
    6022592
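The abstract describes clustering unnormalized two-channel intensities around lines instead of centroids and scoring each sample with a silhouette width. Below is a minimal Python sketch of that idea under stated assumptions. It is not the authors' implementation and not the API of any package. The assumptions are: 2-D (allele X, allele Y) intensities; k = 3 lines; lines fitted by orthogonal (total least squares) regression; random two-point starts, keeping the start with the smallest total squared orthogonal residual; and a silhouette width computed from squared orthogonal distances to the sample's own line versus the nearest other line. The function names (fit_line, line_distance, lga, silhouette), the synthetic data and the MIN_WIDTH threshold are all hypothetical.

```python
import math
import random


def fit_line(points):
    """Orthogonal (total least squares) fit of a line to 2-D points.

    Returns (mx, my, theta): the line passes through the centroid (mx, my)
    with unit direction (cos theta, sin theta).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return mx, my, theta


def line_distance(point, line):
    """Orthogonal distance from a point to a line (mx, my, theta)."""
    mx, my, theta = line
    # Project (point - centroid) onto the unit normal (-sin theta, cos theta).
    return abs(-(point[0] - mx) * math.sin(theta) + (point[1] - my) * math.cos(theta))


def lga(points, k=3, n_starts=20, max_iter=50, seed=0):
    """Hypothetical sketch: group points around k lines and keep the random
    start with the smallest total squared orthogonal residual."""
    rng = random.Random(seed)
    best_lines, best_labels, best_rss = None, None, math.inf
    for _ in range(n_starts):
        # Each initial line passes through a random pair of points.
        lines = [fit_line(rng.sample(points, 2)) for _ in range(k)]
        labels = None
        for _ in range(max_iter):
            # Assign every point to its nearest line (ties go to the lower index).
            new_labels = [min(range(k), key=lambda j: line_distance(p, lines[j]))
                          for p in points]
            if new_labels == labels:
                break  # assignments are stable: converged
            labels = new_labels
            # Refit each line to its members (keep the old line if < 2 members).
            for j in range(k):
                members = [p for p, g in zip(points, labels) if g == j]
                if len(members) >= 2:
                    lines[j] = fit_line(members)
        rss = sum(line_distance(p, lines[g]) ** 2 for p, g in zip(points, labels))
        if rss < best_rss:
            best_lines, best_labels, best_rss = lines, labels, rss
    return best_lines, best_labels


def silhouette(points, lines, labels):
    """Per-sample silhouette width (requires k >= 2) from squared orthogonal
    distances: a = to own line, b = to the nearest other line."""
    widths = []
    for p, g in zip(points, labels):
        a = line_distance(p, lines[g]) ** 2
        b = min(line_distance(p, lines[j]) ** 2 for j in range(len(lines)) if j != g)
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


if __name__ == "__main__":
    # Synthetic intensities: three hypothetical genotype directions with varying
    # overall signal (no normalization), so each cluster lies along a line.
    rng = random.Random(1)
    data = []
    for angle in (0.1, math.pi / 4, math.pi / 2 - 0.1):
        for _ in range(30):
            r = rng.uniform(1.0, 10.0)
            data.append((r * math.cos(angle) + rng.gauss(0, 0.05),
                         r * math.sin(angle) + rng.gauss(0, 0.05)))
    lines, labels = lga(data, k=3)
    widths = silhouette(data, lines, labels)
    MIN_WIDTH = 0.5  # hypothetical quality threshold, not taken from the paper
    calls = [g if w >= MIN_WIDTH else None for g, w in zip(labels, widths)]
    print("plate mean silhouette width:", sum(widths) / len(widths))
    print("samples called:", sum(c is not None for c in calls), "of", len(calls))
```

This sketch reports the mean silhouette width over all samples as a plate-level summary. That choice is an assumption made for illustration. Samples below the hypothetical threshold are left uncalled (None). Mapping each fitted line to a genotype label (for example by its slope) is left out, because the record does not say how the paper does it.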