• DocumentCode
    3409506
  • Title
    On complexity measures for biological sequences
  • Author
    Nan, Fei ; Adjeroh, Donald
  • Author_Institution
    Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
  • fYear
    2004
  • fDate
    16-19 Aug. 2004
  • Firstpage
    522
  • Lastpage
    526
  • Abstract
    In this work, we perform an empirical study of several published complexity measures for general sequences to determine how effective they are on biological sequences. By effectiveness, we mean how closely a given complexity measure identifies known biologically relevant relationships, such as closeness on a phylogenetic tree. In particular, we study three complexity measures: the traditional Shannon entropy, linguistic complexity, and T-complexity. For each measure, we construct the complexity profile of each sequence in our test set and, based on these profiles, compare the sequences using performance measures based on: (i) the information-theoretic divergence measure of relative entropy; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. The preliminary results show that T-complexity was the least effective at identifying previously established associations between the sequences in our test set. Shannon's entropy and linguistic complexity provided better results, with Shannon's entropy performing best.
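    As an illustration only (not the authors' implementation), the sketch below shows how a sliding-window complexity profile can be built for two of the three measures and how two profiles can be compared by relative entropy. Several choices are assumptions not stated in the abstract: a DNA alphabet of size 4, the window length and step, the particular linguistic-complexity variant (distinct substrings of all lengths over the maximum possible), and turning each profile into a distribution by normalizing it to sum to 1. T-complexity is omitted.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences over a 4-letter alphabet


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol frequencies in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def linguistic_complexity(window: str, alphabet_size: int = len(ALPHABET)) -> float:
    """Distinct substrings observed divided by the maximum possible, over all lengths.

    Assumed variant: for each k = 1..n, the maximum is min(alphabet_size**k, n - k + 1).
    """
    n = len(window)
    observed = 0
    possible = 0
    for k in range(1, n + 1):
        observed += len({window[i:i + k] for i in range(n - k + 1)})
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible


def complexity_profile(seq: str, measure, window: int = 20, step: int = 1) -> list:
    """Apply `measure` to every sliding window of length `window` (hypothetical defaults)."""
    return [measure(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def relative_entropy(p_profile, q_profile, eps: float = 1e-12) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits between two profiles.

    Assumption: profiles are truncated to a common length, shifted by a small
    eps to avoid log(0), and normalized to sum to 1 so they act as distributions.
    """
    m = min(len(p_profile), len(q_profile))
    p = [x + eps for x in p_profile[:m]]
    q = [x + eps for x in q_profile[:m]]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log2((a / sp) / (b / sq)) for a, b in zip(p, q))


if __name__ == "__main__":
    s1 = "ACGTACGTAAAAAAAACGTTGCA"
    s2 = "ACGTTCGTAAATAAAACGTAGCA"
    h1 = complexity_profile(s1, shannon_entropy, window=8)
    h2 = complexity_profile(s2, shannon_entropy, window=8)
    print(relative_entropy(h1, h2))
```

    A smaller divergence between two profiles would be read as the two sequences being more similar under that measure; swapping `shannon_entropy` for `linguistic_complexity` in `complexity_profile` gives the corresponding linguistic-complexity comparison.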
  • Keywords
    biology computing; computational complexity; computational linguistics; entropy; Shannon entropy; T-complexity; biological sequences; complexity measures; information theoretic divergence measure; linguistic complexity; phylogenic tree; relative entropy; Bioinformatics; Biological information theory; Computer science; DNA; Entropy; Genomics; Organisms; Particle measurements; Sequences; Testing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
  • Print_ISBN
    0-7695-2194-0
  • Type
    conf
  • DOI
    10.1109/CSB.2004.1332483
  • Filename
    1332483