• DocumentCode
    3230769
  • Title
    A fast protein structure retrieval system using image-based distance matrices and multidimensional index
  • Author
    Chi, Pin-Hao ; Scott, Grant ; Shyu, Chi-Ren
  • Author_Institution
    Dept. of Comput. Sci., Missouri Univ., Columbia, MO, USA
  • fYear
    2004
  • fDate
    19-21 May 2004
  • Firstpage
    522
  • Lastpage
    529
  • Abstract
    Indexing protein structures has been shown to provide a scalable solution for structure-to-structure comparisons in large protein structure retrieval systems. To conduct similarity searches against 46,075 polypeptide chains in a database with real-time responses, two critical issues must be addressed: information extraction and suitable indexing. In this paper, we apply computer vision techniques to extract the predominant information encoded in each 2D distance matrix, generated from the 3D coordinates of protein chains. Distance matrices are capable of representing specific protein structural topologies, and similar proteins will generate similar matrices. Once meaningful features are extracted from distance images, an advanced indexing structure, the entropy balanced statistical (EBS) k-d tree, can be utilized to index the multidimensional data. With a limited amount of training data from domain experts, namely structural classification of a subset of available protein chains, we apply various techniques in the pattern recognition field to determine clusters of proteins in the multidimensional feature space. Our system is able to recall search results in a ranked order from the protein database in seconds, exhibiting a reasonably high degree of precision.
  • Keywords
    biology computing; computer vision; database indexing; feature extraction; information retrieval; pattern classification; pattern clustering; proteins; statistical analysis; tree searching; computer vision techniques; entropy balanced statistical k-d tree; image-based distance matrices; information extraction; multidimensional index; pattern recognition; protein chains; protein structure retrieval system; similarity searches; Computer vision; Data mining; Entropy; Feature extraction; Image databases; Image retrieval; Indexing; Multidimensional systems; Proteins; Topology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
  • Print_ISBN
    0-7695-2173-8
  • Type
    conf
  • DOI
    10.1109/BIBE.2004.1317387
  • Filename
    1317387
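
The abstract states that each 2D distance matrix is generated from the 3D coordinates of a protein chain. A minimal sketch of that one step follows, assuming one representative point per residue (for example the C-alpha atom, which the record does not specify) and plain Euclidean distance. The function name `distance_matrix` and the sample coordinates are hypothetical, and the sketch is not the authors' implementation; feature extraction and the EBS k-d tree are not shown.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # Distance is symmetric, so fill both halves at once.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical per-residue coordinates (angstroms) for a four-residue chain.
chain = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 12.0),
    (0.0, 0.0, 12.0),
]
dm = distance_matrix(chain)
```

Treating `dm` as a grayscale image, with each entry as a pixel intensity, would give the kind of "distance image" the abstract refers to. The matrix is symmetric with a zero diagonal, so only the upper triangle carries information.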