DocumentCode :
109360
Title :
CAMS-RS: Clustering Algorithm for Large-Scale Mass Spectrometry Data Using Restricted Search Space and Intelligent Random Sampling
Author :
Saeed, Fahad ; Hoffert, Jason D. ; Knepper, Mark A.
Author_Institution :
Epithelial Syst. Biol. Lab., Nat. Inst. of Health (NIH), Bethesda, MD, USA
Volume :
11
Issue :
1
fYear :
2014
fDate :
Jan.-Feb. 2014
Firstpage :
128
Lastpage :
141
Abstract :
High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate, and many of the resulting spectra have a poor signal-to-noise (S/N) ratio. These low-S/N spectra may not be interpreted by conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling), for clustering raw mass spectrometry data. CAMS-RS uses a novel metric (called F-set) that exploits temporal and spatial patterns to accurately assess the similarity between two given spectra. The F-set similarity metric is independent of retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the number of spectrum comparisons. An intelligent sampling method is executed on individual bins, which allows the results to be merged into the final clusters. Our experiments, using experimentally generated data sets, show that the proposed algorithm is able to cluster spectra with high accuracy and is helpful in interpreting low-S/N spectra. The CAMS-RS algorithm is highly scalable with an increasing number of spectra, and our implementation allows clustering of up to a million spectra within minutes.
Keywords :
biochemistry; mass spectroscopy; proteomics; CAMS-RS algorithm; F-set; S-N ratio spectra; cluster spectra; clustering algorithm-for-mass spectra-using-restricted space-and-sampling; conventional spectra-database matching techniques; intelligent random sampling; large-scale mass spectrometry data; raw mass spectrometry; restricted search space strategy; signal-noise ratio; spatial patterns; temporal patterns; Accuracy; Algorithm design and analysis; Clustering algorithms; Mass spectroscopy; Peptides; Proteins; Mass spectrometry; clustering; proteomics; search space;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.152
Filename :
6674297