DocumentCode :
3322137
Title :
An assessment of a metric space database index to support sequence homology
Author :
Mao, Rui ; Xu, Weijia ; Singh, Neha ; Miranker, Daniel P.
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
fYear :
2003
fDate :
10-12 March 2003
Firstpage :
375
Lastpage :
382
Abstract :
Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. M-tree is one such data structure specialized for the management of large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers (1994), we organize the database as records of fixed-length substrings. Empirical results are promising. However, metric-space indexes are subject to "the curse of dimensionality" and the ultimate performance of an index is sensitive to the quality of the initial construction of the index. We introduce a new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. Using the Yeast Proteomes, the bi-directional bulk load produces a more effective index than the existing M-tree initialization algorithms.
Keywords :
biology computing; database management systems; microorganisms; proteins; trees (mathematics); bi-directional bulk load; bottom-up clustering; existing M-tree initialization algorithms; fixed length substrings; hierarchical bulk-load algorithm; metric space database index; more effective index; proteomes organization; sequence homology support; taxonomies; top-down clustering; Clustering algorithms; Clustering methods; Data structures; Databases; Extraterrestrial measurements; Indexes; Information retrieval; Peptides; Sequences; Taxonomy;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on
Print_ISBN :
0-7695-1907-5
Type :
conf
DOI :
10.1109/BIBE.2003.1188976
Filename :
1188976