• DocumentCode
    3714371
  • Title

    Does encoding matter? A novel view on the quantitative genetic trait prediction problem

  • Author

    Dan He;Laxmi Parida

  • Author_Institution
    IBM T.J. Watson Research, Yorktown Heights, NY, United States of America
  • fYear
    2015
  • Firstpage
    123
  • Lastpage
    126
  • Abstract
    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model, which requires quantitative encodings of the genotypes. There is a large body of work on prediction algorithms, but none of it has investigated the effect of the encoding on the genetic trait prediction problem. In this work, we view the genetic trait prediction problem from a novel angle: as a multiple regression problem on categorical data, which requires encoding the categorical data as numerical data. We evaluate various encoding mechanisms and investigate theoretically how different encodings affect the performance of genetic trait prediction algorithms. To our knowledge, this is the first analysis of different encoding mechanisms for the genetic trait prediction problem. We further propose two novel encoding methods and show that they generate numerical features with higher predictive power. Our experiments show that our methods are superior to the other encoding methods for both the single-marker model and the epistasis model.
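To make the encoding question in the abstract concrete, the sketch below encodes the same categorical genotypes in two standard ways (an additive 0/1/2 allele count and a one-hot indicator expansion) and fits a ridge regression to each. It is only an illustration under stated assumptions: the genotype labels `AA`/`Aa`/`aa`, the helper names, the toy trait values, and the choice of ridge regression are all hypothetical, and neither encoding is one of the two novel methods the paper proposes, which the abstract does not describe.

```python
import numpy as np

# Hypothetical genotype labels for a biallelic marker (not taken from the paper).
GENOTYPES = ("AA", "Aa", "aa")


def encode_additive(genos):
    """Map each genotype to its count of the 'a' allele: AA->0, Aa->1, aa->2."""
    table = {"AA": 0.0, "Aa": 1.0, "aa": 2.0}
    return np.array([[table[g] for g in row] for row in genos])


def encode_one_hot(genos):
    """Expand each marker into three indicator columns, one per genotype."""
    rows = []
    for row in genos:
        feats = []
        for g in row:
            feats.extend(1.0 if g == c else 0.0 for c in GENOTYPES)
        rows.append(feats)
    return np.array(rows)


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Toy data: 4 samples x 2 markers, with made-up trait values.
genos = [["AA", "Aa"], ["Aa", "aa"], ["aa", "AA"], ["AA", "AA"]]
y = np.array([1.0, 2.5, 2.0, 0.5])

X_add = encode_additive(genos)   # shape (4, 2): one column per marker
X_hot = encode_one_hot(genos)    # shape (4, 6): three columns per marker
beta_add, b0_add = ridge_fit(X_add, y)
beta_hot, b0_hot = ridge_fit(X_hot, y)
```

The additive encoding forces the heterozygote effect to lie exactly halfway between the two homozygotes, while the one-hot encoding lets each genotype have its own effect at the cost of three times as many features. That trade-off is the kind of encoding dependence the abstract refers to.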
  • Keywords
    "Encoding","Yttrium","Agriculture"
  • Publisher
    ieee
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/BIBM.2015.7359667
  • Filename
    7359667