DocumentCode :
3699947
Title :
The power study about three statistics of alignment-free comparison based on AT-RICH model
Author :
Xue-mei Liu; Rui-bin He; Bo-dong Liu; Xiang Zang; Yu-xia Zhang; Wen-yao Liang
Author_Institution :
Dept. of Phys., South China Univ. of Technol., Guangzhou, China
Volume :
2
fYear :
2015
fDate :
7/1/2015
Firstpage :
550
Lastpage :
553
Abstract :
Comparing the similarity of two biological sequences is one of the main problems in computational biology. D2, a powerful statistic based on the joint k-tuple content of two sequences, has been applied to alignment-free sequence comparison. We generate two mutually independent random sequences under a null model with an AT-rich distribution (PA=PT=0.33, PC=PG=0.17) and, starting from this null model, obtain two foreground sequences with Bernoulli variables through a pattern transfer model. For the foreground sequences, we compare local sequence pairs of a given length and sum over all such pairs, testing the local alignment-free comparison of the two sequences with the statistics D2, D2star, and D2shepp; the power of the three statistics is then used to find the optimal parameters. The simulation results show that D2star is better than D2shepp, and that D2 is relatively weak. We also analyze how the power is distributed under different parameters, including the Bernoulli variable g, the tuple size k, and the type I error. Finally, a comparison of the proposed local alignment-free approach with the global alignment-free approach for D2star and D2shepp under the same parameters shows that the power of the local approach based on D2star tends to 1 quickly as the sequence length increases, and that it is faster and more accurate than the global approach.
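As a rough illustration of the quantities named in the abstract, the following Python sketch draws i.i.d. sequences from the AT-rich null model and computes D2 as the sum over all k-tuples of the product of their counts in the two sequences. The formulas used for D2star and D2shepp are the centered and normalized variants as commonly defined in the alignment-free literature; they are an assumption here, since the abstract does not state them. All function names are hypothetical, and the pattern transfer model, the local (windowed) summation, and the power estimation are not covered.

```python
import itertools
import math
import random
from collections import Counter

ALPHABET = "ACGT"
# AT-rich null model from the abstract: PA = PT = 0.33, PC = PG = 0.17
AT_RICH = {"A": 0.33, "C": 0.17, "G": 0.17, "T": 0.33}


def random_sequence(length, probs=AT_RICH, rng=random):
    """Draw an i.i.d. sequence of the given length from the letter distribution."""
    letters = list(probs)
    weights = [probs[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


def kmer_counts(seq, k):
    """Count overlapping k-tuples in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """D2: sum over k-tuples w of (count of w in x) * (count of w in y)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx)  # a Counter returns 0 for absent words


def d2_variants(x, y, k, probs=AT_RICH):
    """Return (D2star, D2shepp) using centered counts.

    Assumed definitions (not given in the abstract):
      Xc_w = X_w - n * p_w,  Yc_w = Y_w - m * p_w
      D2star  = sum_w Xc_w * Yc_w / (sqrt(n * m) * p_w)
      D2shepp = sum_w Xc_w * Yc_w / sqrt(Xc_w**2 + Yc_w**2)
    where n and m are the numbers of k-tuple positions in x and y, and
    p_w is the probability of word w under the i.i.d. null model.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    n, m = len(x) - k + 1, len(y) - k + 1
    d2star = 0.0
    d2shepp = 0.0
    for word in map("".join, itertools.product(ALPHABET, repeat=k)):
        p_w = math.prod(probs[c] for c in word)
        xc = cx[word] - n * p_w
        yc = cy[word] - m * p_w
        d2star += xc * yc / (math.sqrt(n * m) * p_w)
        denom = math.sqrt(xc * xc + yc * yc)
        if denom > 0:  # skip words whose centered counts are both zero
            d2shepp += xc * yc / denom
    return d2star, d2shepp
```

For example, `d2(random_sequence(500), random_sequence(500), 3)` gives one draw of D2 under the null model. Repeating such draws, and comparing them with draws from related sequence pairs, is one way the power at a chosen type I error could be estimated by simulation.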
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2015 International Conference on
Type :
conf
DOI :
10.1109/ICMLC.2015.7340613
Filename :
7340613