Title :
A software tool to explore the structure of high dimensional biomolecular data
Author :
Pavesi, Giulio ; Zambelli, Federico ; Rè, Matteo ; Valentini, G.
Author_Institution :
Dept. of Biomol. Sci. & Biotechnol., Univ. of Milan, Milan, Italy
Abstract :
In gene expression data analysis, several stability-based methods have been proposed to estimate both the reliability of each individual gene expression cluster and the “optimal” number of clusters. In this framework, a clustering ensemble is obtained through bootstrapping, noise injection into the data, or random projections into lower-dimensional subspaces. The reliability of a given clustering is then measured by stability scores based on the similarity of the clusterings composing the ensemble. In this paper we present a software tool for detecting reliable and possibly multiple structures (e.g. hierarchical structures) simultaneously present in the data. Statistical approaches based on the chi-square distribution and on the Bernstein inequality show that stability-based methods can be successfully applied to assess the reliability of clusters and to discover multiple structures underlying complex biomolecular data.
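As a rough illustration of the stability idea summarized in the abstract, the sketch below builds a clustering ensemble by noise injection and scores it by the mean pairwise similarity of its clusterings. It is not the tool presented in the paper: the choice of k-means as the base clusterer, the Rand index as the similarity measure, and the names kmeans, pair_agreement and stability are all assumptions made for this example, and the chi-square and Bernstein-inequality tests are not reproduced.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of point pairs that both clusterings treat the same way."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def stability(points, k, n_runs=10, noise=0.1, seed=0):
    """Mean pairwise Rand index over clusterings of noise-perturbed copies of the data."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_runs):
        # Noise injection: perturb every coordinate with Gaussian noise.
        noisy = [tuple(x + rng.gauss(0, noise) for x in p) for p in points]
        ensemble.append(kmeans(noisy, k, rng))
    scores = [pair_agreement(a, b) for a, b in combinations(ensemble, 2)]
    return sum(scores) / len(scores)
```

Under these assumptions, calling stability(points, k) for several values of k and comparing the scores gives a simple way to see which numbers of clusters are reproduced across perturbations. Bootstrapping or random projections could replace the noise-injection step without changing the rest of the sketch.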
Keywords :
Bioinformatics; Clustering algorithms; Data analysis; Gene expression; Maintenance; Packaging; Reproducibility of results; Software tools; Stability; Testing;
Conference_Title :
Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on
Print_ISBN :
978-1-4244-5606-2
Electronic_ISBN :
978-1-4244-5607-9