DocumentCode
3409506
Title
On complexity measures for biological sequences
Author
Nan, Fei ; Adjeroh, Donald
Author_Institution
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear
2004
fDate
16-19 Aug. 2004
Firstpage
522
Lastpage
526
Abstract
In this work, we perform an empirical study of different published measures of complexity for general sequences, to determine their effectiveness in dealing with biological sequences. By effectiveness, we refer to how closely the given complexity measure is able to identify known biologically relevant relationships, such as closeness on a phylogenic tree. In particular, we study three complexity measures, namely, the traditional Shannon's entropy, linguistic complexity, and T-complexity. For each complexity measure, we construct the complexity profile for each sequence in our test set, and based on the profiles we compare the sequences using different performance measures based on: (i) the information theoretic divergence measure of relative entropy; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. The preliminary results show that T-complexity was the least effective in identifying previously established associations between the sequences in our test set. Shannon's entropy and linguistic complexity provided better results, with Shannon's entropy having the upper hand.
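As a rough illustration of the profile-and-compare idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a sliding-window Shannon entropy profile and compares two equal-length profiles by relative entropy after normalizing each to sum to one. The function names, the window and step sizes, and the epsilon smoothing are all assumptions made for this example; the record does not specify them.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the symbol frequencies in one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(seq, window=4, step=1):
    """Entropy of each sliding window over seq (window/step are assumed values)."""
    return [
        shannon_entropy(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def relative_entropy(p, q, eps=1e-9):
    """Relative entropy D(p || q) in bits between two equal-length profiles.

    Each profile is smoothed by eps and normalized to sum to one; this
    normalization is an assumption of the sketch, not taken from the paper.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log2((a / sp) / (b / sq)) for a, b in zip(ps, qs))


if __name__ == "__main__":
    a = entropy_profile("AAAACGTACGT")
    b = entropy_profile("ACGTACGTAAA")
    print(a)
    print(relative_entropy(a, b))
```

Relative entropy is not symmetric, so a pairwise comparison built on it would need to fix a direction or symmetrize the measure; which choice the paper makes is not stated in this record.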
Keywords
biology computing; computational complexity; computational linguistics; entropy; Shannon entropy; T-complexity; biological sequences; complexity measures; information theoretic divergence measure; linguistic complexity; phylogenic tree; relative entropy; Bioinformatics; Biological information theory; Computer science; DNA; Entropy; Genomics; Organisms; Particle measurements; Sequences; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN
0-7695-2194-0
Type
conf
DOI
10.1109/CSB.2004.1332483
Filename
1332483