• DocumentCode
    109360
  • Title

    CAMS-RS: Clustering Algorithm for Large-Scale Mass Spectrometry Data Using Restricted Search Space and Intelligent Random Sampling

  • Author

    Saeed, Fahad ; Hoffert, Jason D. ; Knepper, Mark A.

  • Author_Institution
    Epithelial Syst. Biol. Lab., Nat. Inst. of Health (NIH), Bethesda, MD, USA
  • Volume
    11
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan.-Feb. 2014
  • Firstpage
    128
  • Lastpage
    141
  • Abstract
    High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate, and many of the resulting spectra have a poor signal-to-noise (S/N) ratio. These low-S/N spectra may not be interpreted by conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling), for clustering raw mass spectrometry data. CAMS-RS utilizes a novel metric (called F-set) that exploits temporal and spatial patterns to accurately assess the similarity between two given spectra. The F-set similarity metric is independent of retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the number of spectra that must be compared. An intelligent sampling method is executed on individual bins, which allows the results to be merged into the final clusters. Our experiments, using experimentally generated data sets, show that the proposed algorithm clusters spectra with high accuracy and is helpful in interpreting low-S/N spectra. The CAMS-RS algorithm scales well with an increasing number of spectra, and our implementation allows clustering of up to a million spectra within minutes.
  • Keywords
    biochemistry; mass spectroscopy; proteomics; CAMS-RS algorithm; F-set; S-N ratio spectra; cluster spectra; clustering algorithm-for-mass spectra-using-restricted space-and-sampling; conventional spectra-database matching techniques; intelligent random sampling; large-scale mass spectrometry data; raw mass spectrometry; restricted search space strategy; signal-noise ratio; spatial patterns; temporal patterns; Accuracy; Algorithm design and analysis; Clustering algorithms; Mass spectroscopy; Peptides; Proteins; Mass spectrometry; clustering; proteomics; search space
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.152
  • Filename
    6674297