DocumentCode
3230769
Title
A fast protein structure retrieval system using image-based distance matrices and multidimensional index
Author
Chi, Pin-Hao ; Scott, Grant ; Shyu, Chi-Ren
Author_Institution
Dept. of Comput. Sci., Missouri Univ., Columbia, MO, USA
fYear
2004
fDate
19-21 May 2004
Firstpage
522
Lastpage
529
Abstract
Indexing protein structures has been shown to provide a scalable solution for structure-to-structure comparisons in large protein structure retrieval systems. To conduct similarity searches against 46,075 polypeptide chains in a database with real-time responses, two critical issues must be addressed: information extraction and suitable indexing. In this paper, we apply computer vision techniques to extract the predominant information encoded in each 2D distance matrix, generated from the 3D coordinates of protein chains. Distance matrices are capable of representing specific protein structural topologies, and similar proteins will generate similar matrices. Once meaningful features are extracted from distance images, an advanced indexing structure, the entropy balanced statistical (EBS) k-d tree, can be utilized to index the multidimensional data. With a limited amount of training data from domain experts, namely structural classification of a subset of available protein chains, we apply various techniques in the pattern recognition field to determine clusters of proteins in the multidimensional feature space. Our system is able to recall search results in ranked order from the protein database in seconds, exhibiting a reasonably high degree of precision.
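As an illustration of the distance-matrix step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each chain is represented by one 3D point per residue (for example, C-alpha coordinates) and uses plain Euclidean distance; the function name `distance_matrix` and the input layout are hypothetical, and the abstract does not specify which atoms or distance measure the paper uses.

```python
import math


def distance_matrix(coords):
    """Return the pairwise Euclidean distance matrix for a list of 3D points.

    Assumption: `coords` holds one (x, y, z) point per residue of a chain.
    Entry [i][j] of the result is the distance between residues i and j,
    so the matrix is symmetric with zeros on the diagonal and can be
    viewed as a grayscale image of the chain's fold.
    """
    points = [tuple(float(v) for v in p) for p in coords]
    for p in points:
        if len(p) != 3:
            raise ValueError("each point must have exactly 3 coordinates")
    return [[math.dist(p, q) for q in points] for p in points]


if __name__ == "__main__":
    # Three made-up points, chosen only so the distances are easy to check.
    toy_chain = [(0, 0, 0), (3, 4, 0), (0, 0, 12)]
    for row in distance_matrix(toy_chain):
        print(row)
```

Because the matrix depends only on inter-residue distances, it is unchanged by rotating or translating the chain, which is one reason such a representation suits structure comparison.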
Keywords
biology computing; computer vision; database indexing; feature extraction; information retrieval; pattern classification; pattern clustering; proteins; statistical analysis; tree searching; computer vision techniques; entropy balanced statistical k-d tree; image-based distance matrices; information extraction; multidimensional index; pattern recognition; protein chains; protein structure retrieval system; similarity searches; Computer vision; Data mining; Entropy; Feature extraction; Image databases; Image retrieval; Indexing; Multidimensional systems; Proteins; Topology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
Print_ISBN
0-7695-2173-8
Type
conf
DOI
10.1109/BIBE.2004.1317387
Filename
1317387