Title :
A point of contact between computer science and molecular biology
Author :
Miller, Webb ; Schwartz, Scott ; Hardison, Ross C.
Author_Institution :
Dept. of Comput. Sci., Pennsylvania State Univ., University Park, PA, USA
Abstract :
Molecular biology is rapidly becoming a data-rich science with extensive computational needs. The sheer volume of data, which is growing exponentially, poses a serious challenge for storing and retrieving biological information. Linking the heterogeneous data libraries of molecular biology, organizing its diverse and interrelated data sets, and developing effective query options for its databases are all areas for cross-fertilization between molecular biology and computer science. However, even the apparently simple task of analyzing a single sequence of DNA requires complex collaboration. For several years, we have been developing a computer toolkit for analyzing DNA sequences. The biology of gene regulation in mammals has driven the design of the sequence comparison toolkit to emphasize space-efficient algorithms with a high degree of sensitivity, and has profoundly affected the choice of tools and the development of algorithms. We sketch the biology of this class of problems and show how it specifically drives the software development. The main components of this toolkit are outlined.
Keywords :
DNA; biology computing; data analysis; factographic databases; molecular biophysics; DNA sequences; biological information; computational needs; computer science; computer toolkit; data-rich science; gene regulation; heterogeneous data libraries; interrelated data sets; mammals; molecular biology; query options; sequence comparison toolkit; software development; space-efficient algorithms; Computational biology; Databases; Information retrieval; Joining processes; Libraries; Organizing; Sequences
Journal_Title :
Computational Science & Engineering, IEEE