DocumentCode :
2281532
Title :
A visualization approach to motif discovery in DNA sequences
Author :
Rambally, Gerard
Author_Institution :
Dept. of Comput. Sci., Prairie View A & M Univ., TX
fYear :
2007
fDate :
22-25 March 2007
Firstpage :
348
Lastpage :
353
Abstract :
Given a set of DNA sequences, the motif finding problem involves finding short DNA sequences or consensus patterns that occur surprisingly often, without any prior knowledge of what these patterns look like. This paper proposes a visualization technique and algorithm for finding motifs in DNA sequences. The authors demonstrate that the performance of the motif finding algorithm is significantly improved with the proposed visualization technique. In the proposed method, each nucleotide base {A, T, C, G} in a DNA sequence is assigned a unique integer as a function of its immediately subsequent base, allowing the DNA sequence to be mapped to a corresponding numeric sequence. This numeric sequence is then plotted in 3-D space. After multiple DNA sequences are plotted in the same 3-D space, approximately identical regions of the plots are aligned by translational and rotational transformations. The images of the approximately identical regions appear as clusters in the combined sequence image and are then used to generate the l-mer alignment matrix, from which the nucleotide profile matrix is computed. Finally, the profile matrix is used to generate the consensus string pattern, or motif.
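The abstract's first step (numeric encoding) and last two steps (profile matrix, then consensus string) can be illustrated with a minimal Python sketch. The abstract does not give the actual integer mapping, so the scheme below (4 x index of the base + index of the next base) is an assumption made only for illustration, and the function names are hypothetical. The 3-D plotting and the translational/rotational alignment are not attempted here; the aligned l-mers are simply taken as input.

```python
from collections import Counter

BASES = "ATCG"  # base order as listed in the abstract: {A, T, C, G}


def encode_by_successor(seq):
    """Map each base to an integer that depends on the base and its successor.

    Assumed scheme (not from the paper): 4 * index(base) + index(next base),
    which yields 16 distinct codes. The last base has no successor and is
    therefore not encoded.
    """
    idx = {b: i for i, b in enumerate(BASES)}
    return [4 * idx[a] + idx[b] for a, b in zip(seq, seq[1:])]


def profile_matrix(lmers):
    """Count how often each base occurs in each column of aligned l-mers."""
    length = len(lmers[0])
    if any(len(m) != length for m in lmers):
        raise ValueError("all l-mers must have the same length")
    columns = [Counter(col) for col in zip(*lmers)]
    return {b: [c[b] for c in columns] for b in BASES}


def consensus(profile):
    """Pick the most frequent base per column (ties go to the earlier base in BASES)."""
    length = len(profile[BASES[0]])
    return "".join(max(BASES, key=lambda b: profile[b][j]) for j in range(length))


if __name__ == "__main__":
    aligned = ["ATCG", "ATCC", "TTCG"]  # made-up l-mers, for illustration only
    print(encode_by_successor("ATCG"))
    print(consensus(profile_matrix(aligned)))
```

Taking the column-wise majority base is the usual way to read a consensus string off a count profile; whether the paper breaks ties or weights counts differently is not stated in the abstract.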
Keywords :
DNA; biology computing; data visualisation; molecular biophysics; 3D space; l-mers alignment matrix; motif discovery; nucleotide base; nucleotide profile matrix; numeric sequence; short DNA sequences; visualization approach; Algorithm design and analysis; Bioinformatics; Genomics; Pattern analysis; Pattern matching; Proteins; RNA; Sequences; Visualization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
SoutheastCon, 2007. Proceedings. IEEE
Conference_Location :
Richmond, VA
Print_ISBN :
1-4244-1028-2
Electronic_ISBN :
1-4244-1029-0
Type :
conf
DOI :
10.1109/SECON.2007.342923
Filename :
4147453