DocumentCode :
38775
Title :
Unsupervised Structure Detection in Biomedical Data
Author :
Vogt, Julia E.
Author_Institution :
Comput. Biol. Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Volume :
12
Issue :
4
fYear :
2015
fDate :
July-Aug. 1 2015
Firstpage :
753
Lastpage :
760
Abstract :
A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal their underlying structure. In this work, we present an intuitive and easy-to-implement method, based on ranked neighborhood comparisons, that detects structure in unlabeled data. The method is based on ordering objects by similarity and on the mutual overlap of their nearest neighbors. This basic framework was originally introduced in social network analysis to detect communities of actors. We demonstrate that the same ideas can be applied successfully to biomedical data sets to reveal complex underlying structure. The algorithm is efficient and operates directly on distance data, without requiring a vectorial embedding of the data. Comprehensive experiments demonstrate the validity of this approach. In comparisons with state-of-the-art clustering methods, the presented method outperforms hierarchical, density-based, and model-based clustering methods. A further advantage is that the method simultaneously provides a visualization of the data. In biomedical applications especially, this visualization can serve as a first pre-processing step when analyzing real-world data sets, giving an intuition of the underlying data structure. We apply the method to synthetic data as well as to various biomedical data sets, and the results demonstrate the high quality and usefulness of the inferred structure.
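The abstract describes the approach only at a high level: rank objects by distance, then compare nearest-neighbor sets by their mutual overlap. The sketch below illustrates that general idea using a substituted technique, the Jarvis-Patrick shared-nearest-neighbor scheme. It is not the paper's algorithm, and the paper's ranking and overlap criteria may differ. The function names `knn_sets` and `snn_clusters` and the parameters `k` (neighborhood size) and `min_shared` (minimum overlap) are hypothetical and chosen only for illustration. Like the described method, the sketch works on a distance matrix directly and needs no vector embedding.

```python
from typing import Dict, List, Set


def knn_sets(dist: List[List[float]], k: int) -> List[Set[int]]:
    """For each object, return the set of its k nearest neighbors (self excluded).

    Ties are broken by index so that the ranking is deterministic.
    """
    n = len(dist)
    neighbors = []
    for i in range(n):
        # Rank all other objects by increasing distance to object i.
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        neighbors.append(set(ranked[:k]))
    return neighbors


def snn_clusters(dist: List[List[float]], k: int, min_shared: int) -> List[int]:
    """Hypothetical shared-nearest-neighbor clustering (Jarvis-Patrick style).

    Objects i and j are linked when each lies in the other's k-neighborhood
    and their neighborhoods overlap in at least `min_shared` objects. Clusters
    are the connected components of the resulting graph.
    """
    n = len(dist)
    nbrs = knn_sets(dist, k)
    parent = list(range(n))  # union-find forest over objects

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in nbrs[i]:
            # Visit each mutual-neighbor pair once (j > i) and test the overlap.
            if j > i and i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # Toy 1-D data: two tight groups plus one distant outlier.
    xs = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 20.0]
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(snn_clusters(dist, k=3, min_shared=1))  # [0, 0, 0, 0, 1, 1, 1, 1, 2]
```

The outlier at 20.0 ends up in its own cluster. Its three nearest neighbors are in the second group, but it is not among their nearest neighbors, so the mutual-neighbor test never links it. Requiring mutual membership and a minimum overlap is what makes this kind of rank-based criterion robust to differences in local density.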
Keywords :
data analysis; data structures; medical computing; pattern clustering; unsupervised learning; biomedical data sets; complex underlying structure; computational biology; data structure; density based clustering methods; detects structure; distance data; easy-to-implement method; high-dimensional data; ranked neighborhood comparisons; social network analysis; state-of-the-art clustering methods; unsupervised structure detection; Bioinformatics; Clustering methods; Data visualization; Indexes; Proteins; Runtime; Sparse matrices; Clustering; Data Mining; Knowledge Discovery; Network Analysis; Structure Detection;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
ieee
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2015.2394408
Filename :
7024124