Title :
Grouping and visualizing human endogenous retroviruses by bootstrapping median self-organizing maps
Author :
Oja, Merja ; Sperber, Goran ; Blomberg, Jonas ; Kaski, Samuel
Author_Institution :
Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Abstract :
About eight percent of the human genome consists of human endogenous retrovirus sequences. Human endogenous retroviruses (HERVs) are remains of ancient infections by retroviruses. The HERVs are mutated and deficient, but they may still give rise to transcripts or affect the expression of human genes. The HERVs stem from several kinds of retroviruses. The possible current functioning of the HERV sequences may reflect the origin of the HERVs. Hence, the classification of the diverse HERV sequences is a natural starting point when investigating the effect of HERVs in humans. The current HERV taxonomy is incomplete: some sequences cannot be assigned to any class, and the classification is ambiguous for others. A median self-organizing map (SOM), a SOM for data about pairwise distances between samples, can be used to group all the HERVs found in the human genome. It visualizes the collection of 3661 HERV sequences found by the RetroTector system on a two-dimensional display that represents similarity relationships between individual sequences, as well as cluster structures and similarities of clusters. The SOM, like any dimensionality reduction method, necessarily has to make compromises when representing the data. In this work we extend the visualizations with bootstrap-based estimates of which parts of the visualization are reliable and which are not, and use the SOM to find potentially new HERV groups.
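The idea of a median SOM trained only on pairwise distances, combined with bootstrap resampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nearest_unit`, `median_som`, `bootstrap_coassignment`), the one-dimensional map, the Gaussian neighbourhood, the linear radius schedule, and the co-assignment frequency used as a reliability estimate are all assumptions made for illustration, and the toy distances below stand in for real sequence distances.

```python
import math
import random


def nearest_unit(row, protos):
    """Index of the map unit whose prototype item is closest, given one row of distances."""
    return min(range(len(protos)), key=lambda u: row[protos[u]])


def median_som(dist, n_units, n_iter=10, radius0=2.0, seed=0):
    """Batch median SOM on a 1-D map (assumed layout), from an n x n distance matrix.

    Each prototype is the index of a data item, so only distances are needed.
    """
    rng = random.Random(seed)
    n = len(dist)
    protos = rng.sample(range(n), n_units)
    for it in range(n_iter):
        # Assumed schedule: neighbourhood radius shrinks linearly from radius0 to 0.5.
        radius = radius0 + (0.5 - radius0) * it / max(1, n_iter - 1)
        bmu = [nearest_unit(dist[i], protos) for i in range(n)]
        new_protos = []
        for u in range(n_units):
            # Gaussian neighbourhood weight of every sample with respect to unit u.
            w = [math.exp(-((u - bmu[i]) ** 2) / (2 * radius ** 2)) for i in range(n)]
            # Generalized median: the item minimizing the weighted sum of distances.
            best = min(range(n), key=lambda m: sum(w[i] * dist[i][m] for i in range(n)))
            new_protos.append(best)
        protos = new_protos
    return protos, [nearest_unit(dist[i], protos) for i in range(n)]


def bootstrap_coassignment(dist, n_units, n_boot=20, seed=0):
    """Fraction of bootstrap maps in which two samples share a map unit."""
    rng = random.Random(seed)
    n = len(dist)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]          # resample with replacement
        sub = [[dist[i][j] for j in idx] for i in idx]
        protos, _ = median_som(sub, n_units, seed=rng.randrange(10 ** 6))
        items = [idx[p] for p in protos]                    # prototypes as original items
        bmu = [nearest_unit(dist[i], items) for i in range(n)]
        for i in range(n):
            for j in range(n):
                if bmu[i] == bmu[j]:
                    counts[i][j] += 1
    return [[c / n_boot for c in row] for row in counts]


# Toy data: two well-separated groups on a line (hypothetical stand-in for sequences).
points = [0, 1, 2, 100, 101, 102]
dist = [[abs(a - b) for b in points] for a in points]
protos, bmu = median_som(dist, n_units=2)
reliability = bootstrap_coassignment(dist, n_units=2)
```

In this sketch, pairs of samples with a co-assignment fraction near 1 land on the same unit in almost every resampled map, which is one simple way to flag the parts of a map that are stable under resampling.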
Keywords :
genetics; medical computing; microorganisms; pattern clustering; self-organising feature maps; RetroTector system; bootstrapping; cluster structure; human endogenous retroviruses sequence; human endogenous retroviruses visualization; human genome; median self-organizing maps; taxonomy; Bioinformatics; DNA; Data visualization; Genomics; Humans; Phylogeny; Self organizing feature maps; Sequences; Taxonomy; Two dimensional displays;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2004. CIBCB '04. Proceedings of the 2004 IEEE Symposium on
Print_ISBN :
0-7803-8728-7
DOI :
10.1109/CIBCB.2004.1393939