Title :
cWINNOWER algorithm for finding fuzzy DNA motifs
Author_Institution :
NASA Adv. Supercomput. Div., NASA Ames Res. Center, USA
Abstract :
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern that differs from a motif of length l by at most d mutations. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs, qc, as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, making the algorithm correspondingly more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l, d) = (15, 4).
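The signal definition in the abstract can be illustrated with a minimal Python sketch. It is not the cWINNOWER algorithm itself: it only checks whether a window of a sequence lies within d substitutions of a motif that is already known. The function names (`hamming`, `is_signal`, `find_signals`) are hypothetical, and counting mutations as substitutions only (Hamming distance) is an assumption made for this illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_signal(candidate: str, motif: str, d: int) -> bool:
    """True if candidate differs from motif (length l) in at most d positions.

    Assumption: a 'mutation' is a single-nucleotide substitution.
    """
    return len(candidate) == len(motif) and hamming(candidate, motif) <= d


def find_signals(sequence: str, motif: str, d: int) -> list[int]:
    """Start offsets of every length-l window of sequence that is a signal."""
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if is_signal(sequence[i:i + l], motif, d)
    ]


if __name__ == "__main__":
    # Toy (l, d) = (4, 1) example; the abstract's hardest case is (15, 4).
    print(find_signals("TTACGATT", "ACGT", 1))
```

In practice the motif is unknown, which is why the method works on relations between candidate patterns instead. Under the substitution-only assumption, any two signals derived from the same motif differ from each other in at most 2d positions, by the triangle inequality.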
Keywords :
DNA; biology computing; genetic algorithms; genetics; graph theory; molecular biophysics; pattern recognition; proteins; DNA sequences; Pevzner winnower method; Sze winnower method; cWINNOWER algorithm; d mutations; detectable motifs qc; four-member sub-cliques; fuzzy DNA motifs; motif mutated copies; protein-binding signals; short nucleotide pattern; three-member sub-cliques; weaker signals detection; Bioinformatics; DNA computing; Fuzzy systems;
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
DOI :
10.1109/CSB.2003.1227326