Title :
Large Scale Analysis of Small Repeats via Mining of the Human Genome
Author :
Inge van den Berg;Dragan Bosnacki;Peter A.J. Hilbers
Author_Institution :
Dept. of Biomed. Eng., Eindhoven Univ. of Technol., Eindhoven, Netherlands
Abstract :
Small repetitive sequences, called tandem repeats, are abundant throughout the human genome, both in coding and in non-coding regions. Their role is still mostly unknown, but at least 20 of those repetitive sequences have been related to neurodegenerative disorders. The mutational process that is the basis of these disorders is not yet fully understood. Comprehending the origin, function and possible usefulness of the tandem repeats will require analysis of huge amounts of data from various sources. In this paper we attempt such a large scale analysis of short repeats. We describe and discuss the steps needed to perform large scale genomic analysis. We define tandem repeats and compare the results of repeat localization with genome annotations. We show that the degree of repetitiveness differs among the human chromosomes. Chromosomes 19 and 17 have more repeats per megabase pair than any of the other chromosomes; the Y chromosome has the fewest. We also demonstrate that some repeat motifs are much more common than others. Mono- and dinucleotide repeats are the most abundant, with A and AAC the most common motifs, while CG is hardly present within the genome. Repeats with unit length three are underrepresented in the genome and repeats with unit length 9 are extremely rare.
Keywords :
"Large-scale systems","Humans","Genomics","Bioinformatics","Sequences","Diseases","Biological cells","Satellites","Data analysis","Genetic mutations"
Conference_Titel :
Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
Print_ISBN :
978-0-7695-3763-4
DOI :
10.1109/DEXA.2009.78