• DocumentCode
    1576426
  • Title
    A Structural Data Mining Approach for the Classification of Secondary RNA Structure
  • Author
    Lam, Winnie W M; Chan, Keith C C
  • Author_Institution
    Hong Kong Polytech. Univ.
  • fYear
    2006
  • Firstpage
    4759
  • Lastpage
    4762
  • Abstract
    There exist many methods for classifying genomic data by aligning, comparing, and analyzing primary nucleotide sequences, using algorithms such as dynamic programming and kinetic folding. These methods are, however, not always effective, because motifs are more conserved in structure than in sequence. Instead of performing classification based on primary sequences, we therefore propose to perform the task based on structure, exploiting the way such molecules form: a primary nucleotide sequence folds back onto itself to form a secondary structure and then a tertiary structure. The algorithm we propose, called the random multi-level attributed (RMLA) graph algorithm, performs data mining on structural data and is used to mine and represent the secondary structure of biomolecules such as tRNA. Structural similarity is identified using an information measure that characterizes the resulting classes. Experiments are based on known tRNA structural data from a database compiling tRNA genes. The results show that our approach is able to effectively classify different classes of tRNA secondary structure. We also compare our results with those of other classification algorithms to demonstrate its effectiveness. Our approach offers a better way to classify structural data. Moreover, the RMLA graph algorithm is not limited to classifying genomic data: wherever graphs are used to model data, it can be used to discover patterns in databases.
  • Keywords
    biology computing; data mining; genetics; graphs; macromolecules; molecular biophysics; molecular configurations; biomolecule; dynamic programming; genomic data; kinetic folding; primary nucleotide sequences; random multilevel attributed graph algorithm; secondary RNA structure classification; structural data mining approach; structural similarity; Algorithm design and analysis; Bioinformatics; Data mining; Databases; Dynamic programming; Genomics; Heuristic algorithms; Kinetic theory; Molecular biophysics; RNA
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005)
  • Conference_Location
    Shanghai
  • Print_ISBN
    0-7803-8741-4
  • Type
    conf
  • DOI
    10.1109/IEMBS.2005.1615535
  • Filename
    1615535