• DocumentCode
    945945
  • Title

    Discovering Frequent Agreement Subtrees from Phylogenetic Data

  • Author

    Zhang, Sen ; Wang, Jason T L

  • Author_Institution
    State Univ. of New York, Oneonta
  • Volume
    20
  • Issue
    1
  • fYear
    2008
  • Firstpage
    68
  • Lastpage
    82
  • Abstract
    We study a new data mining problem concerning the discovery of frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree in which the order among siblings is unimportant. Furthermore, each leaf in the tree has a label representing a taxon (species or organism) name, whereas internal nodes are unlabeled. The tree may have a root, representing the common ancestor of all species in the tree, or may be unrooted. An unrooted phylogeny arises due to the lack of sufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the maximum agreement subtree (MAST) problem widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. Then, we present an efficient algorithm to find all FASTs in a given set of rooted trees, through an Apriori-like approach. We show the correctness and completeness of the proposed method. Finally, we discuss the extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods work well, and are capable of finding interesting patterns in both synthetic data and real phylogenetic trees.
  • Keywords
    biology computing; data mining; data mining; frequent agreement subtrees discovery; maximum agreement subtree problem; phylogenetic tree; phylogeny; tree expansion; unordered tree; Bioinformatics (genome or protein) databases; Data mining;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.190676
  • Filename
    4358962