DocumentCode :
3230289
Title :
Identification of coding and non-coding sequences in a complete genome using local Hölder exponent formalism and Multi-affinity analysis
Author :
Li, Xiaochun ; Zhou, Tiejun ; Tang, Xiaoyong ; Cai, Xiangwen
Author_Institution :
Orient Sci. & Technol. Coll., Hunan Agric. Univ., Changsha, China
fYear :
2010
fDate :
23-26 Sept. 2010
Firstpage :
775
Lastpage :
782
Abstract :
Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of relations between coding and non-coding sequences has led to new perspectives in the understanding of DNA sequences, which has motivated us to find new methods to distinguish the two. We first introduce a number sequence representation of DNA sequences. Multi-affinity analysis and the local Hölder exponent formalism are then applied to the resulting number sequence. Three suitable exponents are selected to form a parameter space: two exponents, γ(-2) and γ(6), come from multi-affinity analysis, and the third, h, comes from the local Hölder exponent formalism. Thus, each coding or non-coding sequence may be represented by a point in a three-dimensional parameter space. In the complete genomes of many prokaryotes, the points corresponding to coding and non-coding sequences fall roughly into different regions. If the point (γ(-2), γ(6), h) for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as coding; otherwise, it is classified as non-coding. These exponents can therefore be used to distinguish coding and non-coding sequences. Fisher's discriminant algorithm is used to compute the discriminant accuracies. The average discriminant accuracies pc, pnc, qc and qnc over all 51 prokaryotes obtained by the present method reach 69.08%, 83.34%, 72.08% and 83.54%, respectively.
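The classification step described in the abstract can be illustrated with a minimal sketch of Fisher's linear discriminant on points of the form (γ(-2), γ(6), h). This is not the authors' implementation: the function names, the equal-prior midpoint threshold, and all data values are assumptions made up for illustration, and the computation of the exponents themselves is not shown because the abstract does not specify it.

```python
# Sketch of Fisher's linear discriminant for 3-D points (gamma(-2), gamma(6), h).
# All names and data values are hypothetical; none are taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]

def scatter(points, m):
    # Within-class scatter: sum of outer products of the centered points.
    s = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - m[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                s[i][j] += d[i] * d[j]
    return s

def solve3(a, b):
    # Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for i in reversed(range(3)):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def fit_fisher(coding, noncoding):
    mc, mn = mean(coding), mean(noncoding)
    sc, sn = scatter(coding, mc), scatter(noncoding, mn)
    sw = [[sc[i][j] + sn[i][j] for j in range(3)] for i in range(3)]
    # Fisher direction: w = Sw^{-1} (mean_coding - mean_noncoding).
    w = solve3(sw, [mc[i] - mn[i] for i in range(3)])
    # Assumption: threshold at the midpoint of the projected class means.
    threshold = 0.5 * (dot(w, mc) + dot(w, mn))
    return w, threshold

def classify(point, w, threshold):
    return "coding" if dot(w, point) > threshold else "non-coding"

# Made-up training points, chosen only so the two classes are separable.
coding_pts = [(1.1, 1.0, 0.0), (0.9, 1.0, 0.0), (1.0, 1.1, 0.0),
              (1.0, 0.9, 0.0), (1.0, 1.0, 0.1), (1.0, 1.0, -0.1)]
noncoding_pts = [(0.1, 0.0, 1.0), (-0.1, 0.0, 1.0), (0.0, 0.1, 1.0),
                 (0.0, -0.1, 1.0), (0.0, 0.0, 1.1), (0.0, 0.0, 0.9)]

w, threshold = fit_fisher(coding_pts, noncoding_pts)
print(classify((0.9, 0.8, 0.1), w, threshold))
print(classify((0.1, 0.2, 0.9), w, threshold))
```

Projecting each point onto the single direction w reduces the three-dimensional decision to a comparison against one threshold, which is what allows a region of the parameter space to be labelled as coding or non-coding.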
Keywords :
bioinformatics; genomics; DNA sequences; Fisher discriminant algorithm; complete genome; genes prediction; local Hölder exponent formalism; multiaffinity analysis; noncoding sequences; number sequence representation; three-dimensional parameter space; DNA; Manganese; Microorganisms; Hölder exponent; coding/noncoding sequences
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on
Conference_Location :
Changsha
Print_ISBN :
978-1-4244-6437-1
Type :
conf
DOI :
10.1109/BICTA.2010.5645223
Filename :
5645223