• DocumentCode
    38775
  • Title
    Unsupervised Structure Detection in Biomedical Data
  • Author
    Vogt, Julia E.
  • Author_Institution
    Comput. Biol. Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
  • Volume
    12
  • Issue
    4
  • fYear
    2015
  • fDate
    July-Aug. 1 2015
  • Firstpage
    753
  • Lastpage
    760
  • Abstract
    A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of the data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density-based and model-based clustering methods. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, this visualization can be used as a first pre-processing step when analyzing real-world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets, which demonstrates the high quality and usefulness of the inferred structure.
  • Keywords
    data analysis; data structures; medical computing; pattern clustering; unsupervised learning; biomedical data sets; complex underlying structure; computational biology; data structure; density based clustering methods; detects structure; distance data; easy-to-implement method; high-dimensional data; ranked neighborhood comparisons; social network analysis; state-of-the-art clustering methods; unsupervised structure detection; Bioinformatics; Clustering methods; Data visualization; Indexes; Proteins; Runtime; Sparse matrices; Clustering; Data Mining; Knowledge Discovery; Network Analysis; Structure Detection
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2015.2394408
  • Filename
    7024124