Conference record number:
5513
Article title:
ResidualConv1D: A Deep Learning Approach for Enhancing Splice Site Prediction across Genomic Contexts
Authors:
Rezvan Mohammad Reza (reza.rzvn1@gmail.com), Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
Ghanbari Sorkhi Ali (ali.ghanbari@mazust.ac.ir), Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
Pirgazi Jamshid (j.pirgazi@mazust.ac.ir), Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
Pourhashem Kallehbasti Mohammad Mehdi (pourhashem@mazust.ac.ir), Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
Keywords:
splice site prediction, two-Gram features, ResidualConv1D, genomic contexts, accuracy
Conference title:
First National Conference on Artificial Intelligence and Future-Oriented Technologies (نخستين همايش ملي هوش مصنوعي و فناوري هاي آينده نگر)
Abstract:
This study addresses the challenge of accurately predicting splice sites, which is crucial to understanding gene expression and protein synthesis. We posit that conventional prediction methods may lack the specificity and adaptability required for diverse genomic contexts. To address this, we present a method that integrates two-Gram features and one-hot encoding with a deep convolutional neural network model, ResidualConv1D. The approach first applies the two-Gram technique to capture nucleotide dependencies at splice sites; the sequences are then enriched with these two-Gram features using one-hot encoding. The core of the methodology is the ResidualConv1D model, which employs convolutional blocks with residual connections to detect complex sequence patterns. Our results indicate a significant improvement in splice site prediction accuracy. The model performs particularly well on the HS3D acceptor and Arabidopsis thaliana donor datasets, where it outperforms the established Ensemble Splice algorithm. On the HS3D acceptor dataset, it achieved an accuracy of 94.18% and an F1-score of 94.24%. It is also competitive across a range of metrics on the other datasets, indicating robustness across genomic contexts. In conclusion, combining two-Gram features, one-hot encoding, and the ResidualConv1D model improves the accuracy of splice site prediction across diverse species, which could help advance the understanding of gene splicing mechanisms.
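The encoding and the residual connection described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`two_gram_one_hot`, `conv1d_same`, `residual_block`), the use of overlapping two-Grams, the all-zero row for ambiguous bases, and the block form y = ReLU(conv(x)) + x are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 two-Grams (dinucleotides) in a fixed order: AA, AC, AG, AT, CA, ...
TWO_GRAMS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
TWO_GRAM_INDEX = {gram: i for i, gram in enumerate(TWO_GRAMS)}


def two_gram_one_hot(seq):
    """Encode a DNA sequence as overlapping two-Grams, one-hot over 16 classes.

    Returns len(seq) - 1 rows of 16 ints. A two-Gram containing a symbol
    outside ACGT (for example N) becomes an all-zero row (an assumption).
    """
    seq = seq.upper()
    rows = []
    for i in range(len(seq) - 1):
        row = [0] * len(TWO_GRAMS)
        idx = TWO_GRAM_INDEX.get(seq[i:i + 2])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def conv1d_same(x, kernels, bias):
    """1D convolution with zero padding so the output length equals the input.

    x: [length][in_channels]
    kernels: [out_channels][width][in_channels], width assumed odd
    bias: [out_channels]
    """
    width = len(kernels[0])
    pad = width // 2
    length = len(x)
    in_channels = len(x[0])
    out = []
    for t in range(length):
        out_row = []
        for kernel, b in zip(kernels, bias):
            acc = b
            for j in range(width):
                s = t + j - pad
                if 0 <= s < length:  # positions outside the sequence act as zeros
                    for c in range(in_channels):
                        acc += kernel[j][c] * x[s][c]
            out_row.append(acc)
        out.append(out_row)
    return out


def residual_block(x, kernels, bias):
    """Hypothetical residual block: y = ReLU(conv(x)) + x.

    Requires out_channels == in_channels so the skip connection can be added.
    """
    h = conv1d_same(x, kernels, bias)
    return [
        [max(0.0, hv) + xv for hv, xv in zip(h_row, x_row)]
        for h_row, x_row in zip(h, x)
    ]
```

For example, `two_gram_one_hot("AGGT")` yields three rows, one each for AG, GG, and GT. Passing those rows through `residual_block` with all-zero kernels returns the input unchanged, which shows how the skip connection preserves the signal when the convolution contributes nothing.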