Title :
Compression of LC/MS proteomic data
Author :
Miguel, Agnieszka C. ; Keane, John F. ; Whiteaker, Jeffrey ; Zhang, Heidi ; Paulovich, Amanda
Author_Institution :
Electr. & Comput. Eng., Seattle Univ., WA
Abstract :
Summary form only given. The unrelenting growth of liquid chromatography-mass spectrometry (LC-MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. Compression is needed to facilitate storage, searching, archiving, retrieval, and communication of proteomic MS data. We demonstrate compression techniques that reduce the average file size by a factor of 25 without any loss of accuracy. We have designed two main methods to code the MS data. The first method predicts the mass-to-charge ratio based on the intensity values and encodes the residual with bzip2. The second algorithm maps the original intensity values onto a universal grid and either directly encodes them with bzip2 or applies an arithmetic coder to the results of run-length coding. The latter method achieves the highest compression ratios.
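The grid-based idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the uniform grid (`mz_start`, `mz_step`, `n_bins`), and the 32-bit integer intensity format are all assumptions made for illustration. It shows the direct bzip2 variant and, separately, the run-length step; the arithmetic-coding stage mentioned in the abstract is omitted because the Python standard library has no arithmetic coder.

```python
import bz2
import struct


def to_grid(mz, intensity, mz_start, mz_step, n_bins):
    """Place each intensity in the bin of a uniform m/z grid (assumed layout).

    Assumes every m/z value lies on the grid, so the mapping loses nothing.
    """
    grid = [0] * n_bins
    for m, value in zip(mz, intensity):
        k = round((m - mz_start) / mz_step)  # index of the nearest grid bin
        grid[k] = value
    return grid


def compress_grid_direct(grid):
    """Direct variant: pack the gridded intensities and bzip2 them."""
    payload = struct.pack(f"<{len(grid)}I", *grid)
    return bz2.compress(payload)


def decompress_grid_direct(blob):
    """Invert compress_grid_direct and return the gridded intensities."""
    payload = bz2.decompress(blob)
    return list(struct.unpack(f"<{len(payload) // 4}I", payload))


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Hypothetical scan: three peaks on a grid starting at m/z 400 with step 0.5.
grid = to_grid([400.0, 400.5, 402.0], [10, 20, 5], 400.0, 0.5, 8)
blob = compress_grid_direct(grid)
runs = run_length_encode(grid)
```

Gridded spectra are mostly zeros between peaks, which is why run-length coding is a natural step before an entropy coder. In the second variant an arithmetic coder would be applied to the output of `run_length_encode`.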
Keywords :
arithmetic codes; data compression; medical signal processing; arithmetic coder; average file size reduction; bzip2 encoding; mass-to-charge ratio; proteomic data compression; run-length coding; universal grid; Biological information theory; Biology computing; Cancer; Data engineering; Mass spectroscopy; Peptides; Protein engineering; Proteomics; Sampling methods; XML
Conference_Title :
Data Compression Conference, 2006. DCC 2006. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-2545-8
DOI :
10.1109/DCC.2006.14