• DocumentCode
    983738
  • Title

    A probabilistic model for mining labeled ordered trees: capturing patterns in carbohydrate sugar chains

  • Author

    Ueda, Nobuhisa ; Aoki-Kinoshita, Kiyoko F. ; Yamaguchi, Atsuko ; Akutsu, Tatsuya ; Mamitsuka, Hiroshi

  • Author_Institution
    Bioinformatics Center, Kyoto Univ., Goknsho Uji, Japan
  • Volume
    17
  • Issue
    8
  • fYear
    2005
  • Firstpage
    1051
  • Lastpage
    1064
  • Abstract
    Glycans, or carbohydrate sugar chains, which play a number of important roles in the development and functioning of multicellular organisms, can be regarded as labeled ordered trees. A recent increase in the documentation of glycan structures, especially in the form of database curation, has made mining glycans important for the understanding of living cells. We propose a probabilistic model for mining labeled ordered trees, and we further present an efficient learning algorithm for this model, based on an EM algorithm. The time and space complexities of this algorithm are rather favorable, falling within the practical limits set by a variety of existing probabilistic models, including stochastic context-free grammars. Experimental results have shown that, in a supervised problem setting, the proposed method outperformed five other competing methods by a statistically significant factor in all cases. We further applied the proposed method to aligning multiple glycan trees, and we detected biologically significant common subtrees in these alignments where the trees are automatically classified into subtypes already known in glycobiology.
  • Keywords
    biology computing; computational complexity; context-free grammars; data mining; genetics; learning (artificial intelligence); microorganisms; molecular biophysics; pattern recognition; tree data structures; biologically significant common subtrees; carbohydrate sugar chains; genetics; glycans; glycobiology; labeled ordered tree mining; machine learning; multicellular organisms; probabilistic model; space complexity; stochastic context-free grammars; time complexity; Bioinformatics; Context modeling; Data mining; Documentation; Hidden Markov models; Machine learning; Organisms; Sequences; Stochastic processes; XML; Index Terms- Biology and genetics; data mining; machine learning; mining methods and algorithms.;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2005.117
  • Filename
    1458699