Title :
A compressing method for genome sequence cluster using sequence alignment
Author :
Jung, Kwang Su; Yu, Nam Hee; Shin, Seung Jung; Ryu, Keun Ho
Author_Institution :
Database/Bioinf. Lab., Chungbuk Nat. Univ., Cheongju
Abstract :
After identifying the function of a protein, biologists produce new, useful proteins by substituting some residues of the identified protein. These new proteins have high sequence homology (similarity). We define a sequence cluster as a cluster of similar sequences. As another example of a sequence cluster, we consider a SNP (single nucleotide polymorphism) cluster. A SNP is a DNA sequence variation that occurs when a single nucleotide in the genome (or another shared sequence) differs between members of a species (or between paired chromosomes in an individual). We propose a new compression technique for such sequence clusters based on sequence alignment. By scanning the distances between all sequences in a cluster, we select as its representative the sequence with the minimum sequence distance to the other members. The distances are obtained by calculating sequence alignment scores. The result of each alignment is used to generate conversion information between the two sequences, called an edit-script. Only the representative sequence and the edit-scripts of each cluster are stored in the database. Each member sequence can then be easily reconstructed from the representative sequence and its edit-script.
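The scheme summarized in the abstract (pairwise alignment distances, a representative chosen from them, edit-scripts against the representative, and reconstruction) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses unit-cost global alignment (Levenshtein distance) as the alignment score, interprets "minimum sequence distance" as the smallest sum of distances to the other members (a medoid), and uses a hypothetical edit-script format of `("=", n)` copy, `("S", c)` substitute, `("I", c)` insert and `("D", None)` delete operations. The function names `align`, `apply_script`, `compress_cluster` and `decompress_cluster` are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical edit-script format (an assumption, not the paper's encoding):
#   ("=", n)   copy n residues from the representative
#   ("S", c)   substitute the next representative residue with c
#   ("I", c)   insert residue c
#   ("D", None) skip (delete) the next representative residue
Op = Tuple[str, Optional[object]]


def align(a: str, b: str) -> Tuple[int, List[Op]]:
    """Unit-cost global alignment; returns (distance, edit-script turning a into b)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back from the bottom-right cell, preferring match/substitution.
    raw: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            raw.append(("M", None) if a[i - 1] == b[j - 1] else ("S", b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            raw.append(("D", None))
            i -= 1
        else:
            raw.append(("I", b[j - 1]))
            j -= 1
    raw.reverse()
    # Collapse runs of matches into one copy operation to keep the script short.
    script: List[Op] = []
    for op, arg in raw:
        if op == "M":
            if script and script[-1][0] == "=":
                script[-1] = ("=", script[-1][1] + 1)
            else:
                script.append(("=", 1))
        else:
            script.append((op, arg))
    return dp[n][m], script


def apply_script(ref: str, script: List[Op]) -> str:
    """Rebuild a member sequence from the representative and its edit-script."""
    out: List[str] = []
    i = 0  # current position in the representative
    for op, arg in script:
        if op == "=":
            out.append(ref[i:i + arg])
            i += arg
        elif op == "S":
            out.append(arg)
            i += 1
        elif op == "D":
            i += 1
        elif op == "I":
            out.append(arg)
    return "".join(out)


def compress_cluster(seqs: List[str]) -> Tuple[str, List[List[Op]]]:
    """Pick the medoid as representative; store it plus one edit-script per member."""
    k = len(seqs)
    dist = [[0] * k for _ in range(k)]
    for x in range(k):
        for y in range(x + 1, k):
            dist[x][y] = dist[y][x] = align(seqs[x], seqs[y])[0]
    rep = min(range(k), key=lambda x: sum(dist[x]))
    return seqs[rep], [align(seqs[rep], s)[1] for s in seqs]


def decompress_cluster(rep: str, scripts: List[List[Op]]) -> List[str]:
    return [apply_script(rep, sc) for sc in scripts]


cluster = ["ACGTACGT", "ACGTTCGT", "ACGAACGT", "ACGTACG"]
rep, scripts = compress_cluster(cluster)
print(rep)  # → ACGTACGT
assert decompress_cluster(rep, scripts) == cluster
```

Runs of matching residues collapse into a single copy operation, so for highly homologous members each script is much shorter than the sequence it encodes. Selecting the representative this way costs O(k²) pairwise alignments for a cluster of k sequences.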
Keywords :
DNA; genetics; image coding; image representation; image sequences; medical image processing; compressing method; edit-script conversion information; genome sequence cluster; genomes; high sequence homology; sequence alignment; single nucleotide polymorphism; Bioinformatics; Biological cells; Clustering algorithms; DNA; Databases; Genomics; Laboratories; Matrices; Proteins; Sequences;
Conference_Title :
Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-2357-6
Electronic_ISBN :
978-1-4244-2358-3
DOI :
10.1109/CIT.2008.4594729