Title of article :
Imputation of Ungenotyped Individuals Based on Genotyped Relatives Using Machine Learning Methodology
Author/Authors :
Rastin Bojnord, Naeem (Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran); Aminafshar, Mehdi (Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran); Honarvar, Mahmood (Department of Animal Science, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran); Emam Jomeh Kashan, Nasser (Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran)
Abstract :
1) Background: Machine learning methods have been used in genetic studies to build models capable of predicting missing genotypes in both human and animal populations. Genotype imputation, the prediction of unknown genotypes, is an important process. The present study investigated machine learning as an imputation approach, compared family-based imputation scenarios, and aimed to improve imputation performance across them. It also compared the accuracies of Support Vector Machine (SVM) and Random Forest (RF).
2) Methods: The final population was simulated as 100 families, each consisting of one sire with a varying number of genotyped progeny (2, 3, 4, 5 or 7). The whole genome was represented by 5000 markers. In addition to the sire families, other scenarios (both parents; one parent, sire or dam, and one progeny; sire and maternal grandsire) were defined to investigate the ability of machine learning algorithms to impute genotypes.
3) Results: Imputation accuracy ranged from 0.78 to 0.99 across scenarios, and the lowest accuracy was obtained in the sire and maternal grandsire scenario with both methods. Increasing the number of progeny from 2 to 3 considerably increased imputation accuracy for both SVM and RF. Ungenotyped individuals can be imputed from parent-offspring trios and from pairs of close relatives. However, when only one parent and a child, both parents, or the sire and maternal grandsire are genotyped, average imputation accuracy does not exceed 85%. Progeny genotypes are the best source of information for predicting the genotypes of ungenotyped individuals: with more than 4 progeny, imputation accuracy exceeds 95%. These results indicate that machine learning methods applied to family trios achieve good accuracy and computational speed and can be used in estimating breeding values.
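The abstract does not include the simulation or imputation code. The stdlib-only Python sketch below is a toy analogue of the sire-with-progeny scenario, written under assumptions that are not taken from the study: markers are independent biallelic SNPs in Hardy-Weinberg equilibrium, dams are unrelated and ungenotyped, and the sire is imputed with a simple Bayesian (maximum a posteriori) rule standing in for the SVM and RF models. Accuracy is measured as genotype concordance, which is also an assumption, since the abstract does not name its accuracy measure. All function names are hypothetical.

```python
import random

# Toy model (assumed, not from the study): independent biallelic SNPs,
# Hardy-Weinberg equilibrium, and unrelated, ungenotyped dams.
# Genotypes are coded 0, 1, 2 = copies of the alternative allele.


def draw_genotype(p, rng):
    """Draw a genotype under Hardy-Weinberg equilibrium (allele frequency p)."""
    return int(rng.random() < p) + int(rng.random() < p)


def transmit(genotype, rng):
    """Return the allele (0 or 1) a parent passes on by Mendelian segregation."""
    if genotype == 1:
        return int(rng.random() < 0.5)
    return genotype // 2


def child_likelihood(child, sire, p):
    """P(child genotype | sire genotype); the dam allele is 1 with probability p."""
    total = 0.0
    for sire_allele, prob in ((0, 1 - sire / 2), (1, sire / 2)):
        dam_allele = child - sire_allele
        if dam_allele == 0:
            total += prob * (1 - p)
        elif dam_allele == 1:
            total += prob * p
    return total


def impute_sire(progeny, p):
    """Most probable (MAP) sire genotype given its progeny genotypes.

    A simple Bayesian stand-in, NOT the SVM or RF models of the study.
    """
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    best_g, best_score = 0, -1.0
    for g in (0, 1, 2):
        score = prior[g]
        for child in progeny:
            score *= child_likelihood(child, g, p)
        if score > best_score:
            best_g, best_score = g, score
    return best_g


def imputation_accuracy(n_progeny, n_families=100, n_markers=1000, seed=1):
    """Share of sire genotypes imputed correctly (assumed concordance metric)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]
    correct = 0
    for _ in range(n_families):
        for p in freqs:
            sire = draw_genotype(p, rng)
            # Each progeny gets one allele from the sire, one from a random dam.
            progeny = [transmit(sire, rng) + int(rng.random() < p)
                       for _ in range(n_progeny)]
            correct += impute_sire(progeny, p) == sire
    return correct / (n_families * n_markers)
```

For example, comparing `imputation_accuracy(2)` with `imputation_accuracy(7)` shows accuracy rising as more progeny are genotyped, which mirrors the trend reported above; the numbers come from this toy model only, not from the study. The default of 1000 markers is scaled down from the 5000 used in the abstract to keep the run short, and because markers are independent here, the marker count affects only the precision of the estimate.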
Keywords :
Ungenotyped , Support Vector Machine , Random Forest , Machine Learning , Imputation Accuracy , Genomic
Journal title :
Journal of Epigenetics