Title :
Structural mining of molecular biology data
Author :
Cook, Diane J. ; Holder, Lawrence B. ; Su, Shaobing ; Maglothin, Ron ; Jonyer, Istvan
Author_Institution :
Dept. of Comput. Sci. Eng., Texas Univ., Arlington, TX, USA
Abstract :
One method for discovering knowledge in structural data is the identification of common substructures, or subgraphs, within the data. Once identified, these substructures can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. In this article, we describe the SUBDUE system, which discovers interesting substructures in structural data. SUBDUE discovers substructures that compress the original database and represent interesting structural concepts in the data. By compressing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. The capabilities of SUBDUE are used to discover patterns in protein and DNA databases.
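The compression idea in the abstract (replace each instance of a substructure with a pointer, then judge the substructure by how much smaller the data becomes) can be illustrated with a small sketch. This is a hypothetical illustration, not SUBDUE's implementation: SUBDUE's own evaluation is framed in terms of description length and it uses inexact graph matching to find instances, whereas this sketch assumes the instances are already known and substitutes a simpler size measure, size = number of vertices + number of edges. The function names `graph_size`, `compress`, and `compression_value`, the `SUB_i` vertex labels, and the toy graph are all invented for illustration.

```python
# Hypothetical sketch, not SUBDUE's implementation: collapse each instance of a
# substructure into a single vertex and score the result with a size-based ratio.
# Instance discovery (including inexact matching) is assumed to be done already.


def graph_size(vertices, edges):
    """Size of a graph measured as vertex count plus edge count (a simplification)."""
    return len(vertices) + len(edges)


def compress(vertices, edges, instances, label="SUB"):
    """Replace each instance (a set of vertex ids) with one new vertex.

    vertices:  dict mapping vertex id -> vertex label
    edges:     list of (u, v, edge_label) tuples
    instances: list of disjoint sets of vertex ids
    """
    owner = {}  # original vertex id -> id of the vertex that replaces it
    new_vertices = {}
    for i, inst in enumerate(instances):
        node = f"{label}_{i}"  # hypothetical naming for the "pointer" vertex
        new_vertices[node] = label
        for v in inst:
            owner[v] = node
    for v, lab in vertices.items():
        if v not in owner:
            new_vertices[v] = lab
    new_edges = []
    for u, v, lab in edges:
        if u in owner and v in owner and owner[u] == owner[v]:
            continue  # edge inside one instance is absorbed by the new vertex
        new_edges.append((owner.get(u, u), owner.get(v, v), lab))
    return new_vertices, new_edges


def compression_value(vertices, edges, sub_vertices, sub_edges, instances):
    """size(G) / (size(S) + size(G|S)); values above 1 mean the data shrank."""
    cv, ce = compress(vertices, edges, instances)
    return graph_size(vertices, edges) / (
        graph_size(sub_vertices, sub_edges) + graph_size(cv, ce))


# Toy graph: two copies of an A-x-B pattern, both linked to a C vertex.
G_v = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
G_e = [(1, 2, "x"), (3, 4, "x"), (2, 5, "y"), (4, 5, "y")]
S_v = {"a": "A", "b": "B"}
S_e = [("a", "b", "x")]
inst = [{1, 2}, {3, 4}]

compressed_v, compressed_e = compress(G_v, G_e, inst)
value = compression_value(G_v, G_e, S_v, S_e, inst)
print(compressed_e)  # [('SUB_0', 5, 'y'), ('SUB_1', 5, 'y')]
print(value)         # 1.125
```

Here the original graph has size 9, the substructure has size 3, and the compressed graph has size 5, so the ratio is 9 / (3 + 5) = 1.125. Running the same procedure again on the compressed graph mirrors, in spirit, the multiple passes that the abstract says yield a hierarchical description.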
Keywords :
DNA; biology computing; data compression; data mining; deductive databases; graphical user interfaces; learning (artificial intelligence); molecular biophysics; pattern matching; proteins; software tools; DNA databases; SUBDUE system; abstraction over detailed structure; biological patterns; common substructures identification; data compression; graph-based tool; hierarchical description; inexact graph match; knowledge discovery; molecular biology data; multiple passes; protein databases; relevant attributes; structural mining; structural regularities; supervised SUBDUE; DNA; Data mining; Databases; Humans; Information analysis; Pattern analysis; Pattern matching; Polynomials; Proteins; Sequences; Algorithms; Animals; Databases, Nucleic Acid; Databases, Protein; Humans; Information Storage and Retrieval; Molecular Biology; Saccharomyces cerevisiae;
Journal_Title :
Engineering in Medicine and Biology Magazine, IEEE