Title :
The RNA String Kernel for siRNA Efficacy Prediction
Author :
Qiu, Shibin ; Lane, Terran
Author_Institution :
Pathwork Diagnostics, Inc., Sunnyvale
Abstract :
String kernels directly model sequence similarities without the need to extract numerical features in a vector space. Because they capture complex traits in the sequences better, string kernels often achieve better prediction performance. RNA interference is a cell defense mechanism with many biological and therapeutic applications, in which strings can represent the target messenger RNAs and the initiating short RNAs, and string kernels can be applied for training and prediction. While most existing string kernels were developed for general-purpose sequences and have been applied to text and protein classification, the RNA string kernel is specifically designed to model the mismatches, GU wobbles, and bulges of RNA biology and has been applied to RNAi off-target evaluation. We adapt the RNA string kernel to compute the similarity of siRNA sequences and use it in support vector regression to predict siRNA silencing efficacy. We evaluate the performance of the RNA kernel against the spectrum kernel, the string subsequence kernel of arbitrary mismatch, the randomized string kernel, and numerical kernels computed from numerical features extracted according to siRNA design rules. We also give insights into computational performance and into the common properties and differences of the RNA kernel as compared to other kernels. Empirical results on biological data sets demonstrate that the RNA string kernel performed favorably compared with most existing string kernels and achieved significant improvements over kernels computed from numerical descriptors extracted according to structural and thermodynamic rules. The string kernels also achieved favorable results relative to other methods in related work. Furthermore, the RNA string kernel is simple to implement and fast to compute.
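As a rough illustration of the string-kernel idea summarized above, the following Python sketch computes the standard k-spectrum kernel, one of the baselines named in the abstract, as the inner product of k-mer count vectors. It is not the RNA string kernel itself: it does not model mismatches, GU wobbles, or bulges. The function names, the choice k=3, and the example sequences are assumptions made up for illustration and do not come from the paper.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers present in both strings contribute a non-zero term.
    return sum(n * ct[kmer] for kmer, n in cs.items())


def gram_matrix(seqs, k=3):
    """Symmetric kernel (Gram) matrix over a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


# Hypothetical 19-nt sequences, invented for illustration only.
sirnas = [
    "GUCAUCACACUGAAUACCA",
    "GUCAUCACACUGAAUACCU",
    "AAGGCCUUAAGGCCUUAAG",
]
K = gram_matrix(sirnas, k=3)
```

A Gram matrix of this kind could then be handed to any support vector regression implementation that accepts a precomputed kernel (for example, scikit-learn's `SVR(kernel="precomputed")`), with measured silencing efficacies as the regression targets.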
Keywords :
biology computing; cellular biophysics; genetics; molecular biophysics; molecular configurations; regression analysis; support vector machines; RNA interference; RNA string kernel; cell defense mechanism; randomized string kernel; sequence similarities; siRNA efficacy prediction; siRNA sequences; siRNA silencing efficacy; spectrum kernel; string subsequence kernel; support vector regression; Biological system modeling; Biology computing; Cells (biology); Data mining; Feature extraction; Interference; Kernel; Protein engineering; RNA; Sequences;
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375581