Title :
Removing contamination from genomic sequences based on vector reference libraries
Author :
Bagci, Caner ; Allmer, Jens
Author_Institution :
Mol. Biol. & Genetics, Izmir Inst. of Technol., Izmir, Turkey
Abstract :
DNA is often cloned into a vector before sequencing, since this allows the use of standard primers and removes the need to design custom ones. As a consequence, some vector sequence is read along with the sequence of interest. These contaminating vector sequences occasionally find their way into public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular structures. A workaround has been presented previously; we simplify that process and additionally provide an implementation. We further applied our method to a test set of EST sequences and analyzed the amount of contamination found in the EST sequences available at NCBI.
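The circularity issue mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not SeqClean's algorithm. It assumes exact k-mer matching in place of real alignment, and the names `extend_circular`, `vector_spans`, and the `window` parameter are hypothetical. The idea shown is one common way to treat a circular vector stored as a linear record: append the first `window - 1` bases to the end, so that contamination spanning the record's start/end junction still appears as a contiguous match.

```python
def extend_circular(vector: str, window: int) -> str:
    """Append the first window-1 bases so that every window spanning the
    origin of the circular vector appears contiguously (hypothetical helper)."""
    if not vector:
        raise ValueError("vector must not be empty")
    if window < 1:
        raise ValueError("window must be >= 1")
    # Index modulo the length so vectors shorter than the window also wrap.
    wrap = "".join(vector[i % len(vector)] for i in range(window - 1))
    return vector + wrap


def vector_spans(read: str, vector: str, window: int = 12, circular: bool = True):
    """Return merged (start, end) spans of the read whose window-mers occur
    exactly in the vector. Exact matching is an assumption made for brevity;
    real cleaning tools rely on alignment and tolerate mismatches."""
    reference = vector.upper()
    if circular:
        reference = extend_circular(reference, window)
    kmers = {reference[i:i + window] for i in range(len(reference) - window + 1)}

    read = read.upper()
    spans = []
    for i in range(len(read) - window + 1):
        if read[i:i + window] in kmers:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous hit: extend that span.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


# Toy data (invented for illustration, not taken from the paper).
vector = "AAACCCGGGTTTACGTACGTTGCA"
# Contamination that crosses the junction: last 6 bases + first 6 bases.
read = vector[-6:] + vector[:6] + "TCTCTCTCTCTCTCTC"

print(vector_spans(read, vector, window=8, circular=True))   # [(0, 12)]
print(vector_spans(read, vector, window=8, circular=False))  # []
```

With the linear record alone the junction-spanning contamination goes undetected, while the extended reference recovers the full 12-base span, which could then be trimmed from the read.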
Keywords :
DNA; bioinformatics; genomics; EST sequences; NCBI; SeqClean; circular structure; genomic sequences; public database; standard primer; vector contamination; vector reference libraries; vector sequences; Bioinformatics; Cleaning; Contamination; Databases; Libraries; Software; Vectors;
Conference_Title :
Health Informatics and Bioinformatics (HIBIT), 2012 7th International Symposium on
Conference_Location :
Nevsehir
Print_ISBN :
978-1-4673-0879-3
DOI :
10.1109/HIBIT.2012.6209053