• DocumentCode
    1689658
  • Title

    Database support for 3D-protein data set analysis

  • Author

    Hinneburg, Alexander ; Lehner, Wolfgang

  • Author_Institution
    Inst. of Comput. Sci., Halle Univ., Germany
  • fYear
    2003
  • Firstpage
    161
  • Lastpage
    170
  • Abstract
    Progress in genome research demands an adequate infrastructure for analyzing the resulting data sets. Database systems are a key technology for organizing data and speeding up the analysis process. This paper discusses the role of a relational database system, using the problem of finding frequent substructures in multi-dimensional protein databases as an example. The specific problem consists of producing a set of association rules regarding frequent substructures with different lengths and gaps between the amino acid residues of a protein. From a database point of view, the process of finding association rules, which form the basis for a more in-depth analysis of the data, is split into two parts. The first part discretizes the conformational angle space of a single amino acid residue by computing the nearest neighbor among a given set of representatives. The second part consists of adapting a well-known association rule algorithm to determine the frequent substructures. Both steps within this comprehensive analysis task require substantial support from the underlying database in order to reduce the programming overhead at the application level.
  • Keywords
    data analysis; data models; file organisation; proteins; relational databases; solid modelling; 3D-protein data set; amino acid; association rule; conformational angle space; data discretization; data organization; database support; database system; genetics; multidimensional database; protein database; relational database; Amino acids; Association rules; Bioinformatics; Data analysis; Database systems; Genomics; Proteins; Relational databases; Spatial databases; Yarn
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scientific and Statistical Database Management, 2003. 15th International Conference on
  • ISSN
    1099-3371
  • Print_ISBN
    0-7695-1964-4
  • Type

    conf

  • DOI
    10.1109/SSDM.2003.1214977
  • Filename
    1214977