DocumentCode :
1518262
Title :
Structural mining of molecular biology data
Author :
Cook, Diane J. ; Holder, Lawrence B. ; Su, Shaobing ; Maglothin, Ron ; Jonyer, Istvan
Author_Institution :
Dept. of Comput. Sci. Eng., Texas Univ., Arlington, TX, USA
Volume :
20
Issue :
4
fYear :
2001
Firstpage :
67
Lastpage :
74
Abstract :
One method for discovering knowledge in structural data is the identification of common substructures, or subgraphs, within the data. Once identified, these substructures can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. In this article, we describe the SUBDUE system, which discovers interesting substructures in structural data. SUBDUE discovers substructures that compress the original database and represent interesting structural concepts in the data. By compressing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. The capabilities of SUBDUE are used to discover patterns in protein and DNA databases.
Keywords :
DNA; biology computing; data compression; data mining; deductive databases; graphical user interfaces; learning (artificial intelligence); molecular biophysics; pattern matching; proteins; software tools; DNA databases; SUBDUE system; abstraction over detailed structure; biological patterns; common substructures identification; data compression; graph-based tool; hierarchical description; inexact graph match; knowledge discovery; molecular biology data; multiple passes; protein databases; relevant attributes; structural mining; structural regularities; supervised SUBDUE; DNA; Data mining; Databases; Humans; Information analysis; Pattern analysis; Pattern matching; Polynomials; Proteins; Sequences; Algorithms; Animals; Databases, Nucleic Acid; Databases, Protein; Humans; Information Storage and Retrieval; Molecular Biology; Saccharomyces cerevisiae;
fLanguage :
English
Journal_Title :
Engineering in Medicine and Biology Magazine, IEEE
Publisher :
IEEE
ISSN :
0739-5175
Type :
jour
DOI :
10.1109/51.940050
Filename :
940050