DocumentCode :
3230769
Title :
A fast protein structure retrieval system using image-based distance matrices and multidimensional index
Author :
Chi, Pin-Hao ; Scott, Grant ; Shyu, Chi-Ren
Author_Institution :
Dept. of Comput. Sci., Missouri Univ., Columbia, MO, USA
fYear :
2004
fDate :
19-21 May 2004
Firstpage :
522
Lastpage :
529
Abstract :
Indexing protein structures has been shown to provide a scalable solution for structure-to-structure comparisons in large protein structure retrieval systems. To conduct similarity searches against 46,075 polypeptide chains in a database with real-time responses, two critical issues must be addressed: information extraction and suitable indexing. In this paper, we apply computer vision techniques to extract the predominant information encoded in each 2D distance matrix, which is generated from the 3D coordinates of a protein chain. Distance matrices are capable of representing specific protein structural topologies, and similar proteins generate similar matrices. Once meaningful features are extracted from the distance images, an advanced indexing structure, the entropy balanced statistical (EBS) k-d tree, can be used to index the multidimensional data. With a limited amount of training data from domain experts, namely the structural classification of a subset of the available protein chains, we apply various pattern recognition techniques to determine clusters of proteins in the multidimensional feature space. Our system is able to return search results from the protein database in ranked order within seconds, exhibiting a reasonably high degree of precision.
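The distance matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes one representative 3D point per residue (for example the C-alpha atom, which the abstract does not specify) and plain Euclidean distance; the function name `distance_matrix` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one per residue (assumed layout).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # Distance is symmetric, so fill both halves at once.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical coordinates in angstroms, chosen only for illustration.
chain = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = distance_matrix(chain)
```

Treating the resulting matrix as a grayscale image is what would allow image feature extraction to be applied to it; the feature extraction and EBS k-d tree indexing steps are not sketched here because the abstract gives no detail on them.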
Keywords :
biology computing; computer vision; database indexing; feature extraction; information retrieval; pattern classification; pattern clustering; proteins; statistical analysis; tree searching; computer vision techniques; entropy balanced statistical k-d tree; image-based distance matrices; information extraction; multidimensional index; pattern recognition; protein chains; protein structure retrieval system; similarity searches; Computer vision; Data mining; Entropy; Feature extraction; Image databases; Image retrieval; Indexing; Multidimensional systems; Proteins; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
Print_ISBN :
0-7695-2173-8
Type :
conf
DOI :
10.1109/BIBE.2004.1317387
Filename :
1317387