  • DocumentCode
    2299771
  • Title
    Sequence distances based on exhaustive substring composition
  • Author
    Apostolico, Alberto ; Denas, Olgert
  • Author_Institution
    Accademia Naz. dei Lincei & DEI, Univ. of Padova, Padova
  • fYear
    2008
  • fDate
    5-9 May 2008
  • Firstpage
    95
  • Lastpage
    98
  • Abstract
    The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
  • Keywords
    biology computing; evolution (biological); genetics; sequences; string matching; tree data structures; exhaustive substring composition; gene rearrangement; phylogenetic tree reconstruction; sequence distances; string edit; subword composition; suffix tree; Bioinformatics; Biology computing; Computational efficiency; Drives; Educational institutions; Frequency; Genomics; Organisms; Phylogeny; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Theory Workshop, 2008. ITW '08. IEEE
  • Conference_Location
    Porto
  • Print_ISBN
    978-1-4244-2269-2
  • Electronic_ISBN
    978-1-4244-2271-5
  • Type
    conf
  • DOI
    10.1109/ITW.2008.4578629
  • Filename
    4578629
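
The abstract describes comparing the frequencies of all subwords of two sequences after adjusting them for the statistical background. The record gives no formulas, so the following is only a minimal sketch of that general idea, not the authors' method. Everything specific in it is an assumption made for illustration: the cap `max_len` on subword length, the background model built from independent single-symbol frequencies, the relative-deviation score, the cosine-based distance, and the function names. It also enumerates subwords naively, whereas the keywords suggest the paper relies on a suffix tree for efficiency.

```python
from collections import Counter
from math import sqrt


def subword_counts(seq, max_len):
    """Count every subword of length 1..max_len (naive enumeration)."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def adjusted_profile(seq, counts, words):
    """Score each word by (observed - expected) / expected.

    Assumption: the expected frequency is the product of the word's
    single-symbol frequencies in seq (an i.i.d. background model).
    """
    n = len(seq)
    symbol_freq = {c: counts[c] / n for c in set(seq)}
    profile = {}
    for w in words:
        positions = n - len(w) + 1
        observed = counts[w] / positions if positions > 0 else 0.0
        expected = 1.0
        for ch in w:
            expected *= symbol_freq.get(ch, 0.0)
        # A word containing a symbol absent from seq has no defined
        # expectation under this model; score it as 0.
        profile[w] = (observed - expected) / expected if expected > 0 else 0.0
    return profile


def composition_distance(a, b, max_len=3):
    """Distance in [0, 1] from the cosine of the two adjusted profiles."""
    ca, cb = subword_counts(a, max_len), subword_counts(b, max_len)
    # Compare over the union of subwords of length >= 2 seen in either input.
    words = sorted(w for w in set(ca) | set(cb) if len(w) >= 2)
    pa = adjusted_profile(a, ca, words)
    pb = adjusted_profile(b, cb, words)
    dot = sum(pa[w] * pb[w] for w in words)
    na = sqrt(sum(v * v for v in pa.values()))
    nb = sqrt(sum(v * v for v in pb.values()))
    if na == 0.0 or nb == 0.0:
        # Cosine is undefined for a zero profile; this fallback is arbitrary.
        return 0.0 if na == nb else 0.5
    cos = max(-1.0, min(1.0, dot / (na * nb)))
    return (1.0 - cos) / 2.0
```

Calling `composition_distance("ACGTACGTTGCA", "TTTTGGGGCCCC")` returns a value between 0 and 1, and comparing a sequence with itself returns 0. A matrix of such pairwise distances is the kind of input a distance-based phylogenetic tree reconstruction would take.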