DocumentCode :
3290097
Title :
Splice Site Prediction Based on Characteristic of Sequential Motifs and C4.5 Algorithm
Author :
Sun, Hequan ; Peng, Qinke ; Zhang, Quanwei ; Mou, Dan
Author_Institution :
Sch. of Electron. & Inf. Eng., Xian Jiaotong Univ., Xian
Volume :
4
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
417
Lastpage :
422
Abstract :
Through statistical analysis of the donor site sequences in the HS3D dataset, the patterns with which bases occur at positions adjacent to the splice sites are used to construct motifs, which then serve as the attributes of the DNA sequences. By assigning a value to each attribute, the literal sequences are transformed into quasi-numeric vectors, on which a decision tree model (C4.5 algorithm) is built to predict splice sites. The experimental results indicate that, compared with Maisheng Yin's improved motif-scoring model, the proposed method effectively reduces the influence of abnormal data on the prediction, showing that the new motif-based encoding method is practical and effective.
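The pipeline described in the abstract (motifs as attributes, sequences encoded as vectors, a C4.5-style tree on top) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the motif list, the 0/1 presence encoding, the toy 10-base sequences, and the helper names `encode`, `gain_ratio`, `build_tree` and `predict` are all hypothetical. The motifs are loosely modeled on the textbook donor consensus MAG|GTRAGT rather than on the HS3D statistics from the paper, and the tree uses a simplified gain-ratio criterion (no pruning, no average-gain filter, discrete attributes only).

```python
import math
from collections import Counter

# HYPOTHETICAL motifs: (offset from the first intron base, base string).
# Loosely based on the donor consensus MAG|GTRAGT; NOT the paper's motifs.
MOTIFS = [(-2, "AG"), (2, "A"), (3, "AGT")]


def encode(seq, boundary, motifs=MOTIFS):
    """Turn a DNA string into a 0/1 vector: one attribute per motif,
    set to 1 when the motif starts at boundary + offset (assumed encoding)."""
    vec = []
    for offset, motif in motifs:
        start = boundary + offset
        hit = start >= 0 and seq[start:start + len(motif)] == motif
        vec.append(1 if hit else 0)
    return vec


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(column, labels):
    """C4.5-style split criterion: information gain / split information."""
    n = len(labels)
    groups = {}
    for value, y in zip(column, labels):
        groups.setdefault(value, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attrs):
    """Recursively split on the attribute with the highest gain ratio.
    A leaf is a class label; an inner node is (attr, branches, default)."""
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    best = max(attrs, key=lambda a: gain_ratio([r[a] for r in rows], labels))
    if gain_ratio([r[best] for r in rows], labels) == 0:
        return majority(labels)
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches, majority(labels))


def predict(tree, row):
    """Walk the tree; unseen attribute values fall back to the node majority."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree


# Toy data (HYPOTHETICAL): the GT dinucleotide starts at index 4 in each window.
SAMPLES = [("CAAGGTAAGT", 1), ("TCAGGTGAGT", 1),
           ("GTACGTCCTA", 0), ("ACTTGTATGC", 0)]

if __name__ == "__main__":
    X = [encode(s, 4) for s, _ in SAMPLES]
    y = [label for _, label in SAMPLES]
    tree = build_tree(X, y, list(range(len(MOTIFS))))
    print(X, predict(tree, encode("GGAGGTAAGA", 4)))
```

Encoding each motif as a discrete attribute is what lets a standard decision-tree learner consume the sequences directly; in a real experiment the motif list would come from the positional base statistics of the training set, and a full C4.5 implementation would add pruning.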
Keywords :
DNA; decision trees; encoding; prediction theory; sequences; statistical analysis; vectors; C4.5 algorithm; DNA sequences; HS3D dataset; decision tree model; donor site sequences; encoding method; motif-scoring model; quasi numeric vectors; sequential motifs characteristic; splice site prediction; statistic analysis; Biological system modeling; DNA; Data engineering; Data mining; Decision trees; Fuzzy systems; Predictive models; Proteins; Sequences; Sun; Bioinformation processing; Decision tree; Motif; Splice sites prediction;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Jinan, Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.331
Filename :
4666421