Title of article :
Imputation of Ungenotyped Individuals Based on Genotyped Relatives Using Machine Learning Methodology
Author/Authors :
Rastin Bojnord, Naeem Department of Animal Science - Science and Research Branch Islamic Azad University, Tehran, Iran , Aminafshar, Mehdi Department of Animal Science - Science and Research Branch Islamic Azad University, Tehran, Iran , Honarvar, Mahmood Department of Animal Science - Shahr-e-Qods Branch Islamic Azad University, Tehran, Iran , Emam Jomeh Kashan, Nasser Department of Animal Science - Science and Research Branch Islamic Azad University, Tehran, Iran
Pages :
10
From page :
13
To page :
22
Abstract :
1) Background: Machine learning methods have been used in genetic studies to build models capable of predicting missing genotypes in both human and animal genetic variation data. Genotype imputation, the prediction of unknown genotypes, is an important process in such studies. The present study investigated machine learning as an imputation method, in order to compare family-based methods and improve imputation performance in different scenarios. It also compared the accuracies of Support Vector Machine (SVM) and Random Forest (RF). 2) Methods: The final population was simulated as 100 families, each including one sire with a different number of genotyped progeny (2, 3, 4, 5 or 7). The number of markers was set to 5000 for the whole genome. The sires in these families, along with other scenarios such as both parents, sire/dam and one progeny, and sire and maternal grandsire, were defined to investigate the ability of the machine learning algorithms to impute genotypes. 3) Results: The imputation accuracy ranged from 0.78 to 0.99 across scenarios. The lowest imputation accuracy was obtained for the sire and maternal grandsire scenario with both methods. Increasing the number of progeny from 2 to 3 considerably increased imputation accuracy for both SVM and RF. Imputation of non-genotyped individuals is possible based on parent-offspring trios and paired close relatives. However, when only a child and one parent, both parents, or the sire and maternal grandsire are genotyped, the average imputation accuracy does not exceed 85%. Progeny genotypes are the best source for predicting the genotypes of ungenotyped individuals: when the number of progeny is more than 4, the imputation accuracy rises above 95%. According to the results, machine learning methods applied to family trios offer good accuracy and computational speed, and can be used in the estimation of breeding values.
Keywords :
Ungenotype , Support Vector Machine , Random Forest , Machine Learning , Imputation Accuracy , Genomic
Journal title :
Journal of Epigenetics
Serial Year :
2021
Record number :
2703936