DocumentCode
2341530
Title
An index structure for pattern similarity searching in DNA microarray data
Author
Wang, Haixun ; Perng, Chang-Shing ; Fan, Wei ; Yu, Philip S.
Author_Institution
IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA
fYear
2002
fDate
2002
Firstpage
256
Lastpage
267
Abstract
DNA microarray technology is about to bring an explosion of gene expression data that may dwarf even the human sequencing projects. Researchers are motivated to identify genes whose expression levels rise and fall coherently under a set of experimental perturbations, that is, they exhibit fluctuation of a similar shape when conditions change. In this paper, we show that queries based on pattern correlations against large-scale microarray databases can be supported by the weighted-sequence model, an index structure designed for sequence matching. A weighted-sequence is a two-dimensional structure where each element in the sequence is associated with a weight. We transform the DNA microarray data, as well as pattern-based queries, into weighted-sequences, and use subsequence matching algorithms to retrieve from the database all genes that match the query pattern. We demonstrate, using both synthetic and real-world data sets, that our method is effective and efficient.
Keywords
DNA; arrays; biology computing; database indexing; genetics; molecular biophysics; pattern matching; query processing; scientific information systems; sequences; very large databases; 2D structure; DNA microarray data; fluctuation; gene expression data; index structure; large-scale microarray databases; matching algorithms; pattern correlations; pattern similarity searching; pattern-based queries; queries; sequence matching; weighted-sequence model; DNA; Databases; Explosions; Fluctuations; Gene expression; Humans; Large-scale systems; Pattern matching; Sequences; Shape;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN
0-7695-1653-X
Type
conf
DOI
10.1109/CSB.2002.1039348
Filename
1039348