DocumentCode :
3421783
Title :
Compression of LC/MS proteomic data
Author :
Miguel, Agnieszka C. ; Keane, John F. ; Whiteaker, Jeffrey ; Zhang, Heidi ; Paulovich, Amanda
Author_Institution :
Electr. & Comput. Eng., Seattle Univ., WA
fYear :
2006
fDate :
28-30 March 2006
Lastpage :
460
Abstract :
Summary form only given. The unrelenting growth of liquid chromatography-mass spectrometry (LC-MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. Compression is needed to facilitate storage, searching, archiving, retrieval, and communication of proteomic MS data. We demonstrate compression techniques that reduce the average file size by a factor of 25 without any loss of accuracy. We have designed two main methods to code the MS data. The first method predicts the mass-to-charge ratio based on the intensity values and encodes the residual with bzip2. The second method maps the original intensity values onto a universal grid and either encodes them directly with bzip2 or applies an arithmetic coder to the results of run-length coding. The latter method achieves the highest compression ratios.
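The second method described in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid definition, the run-length format, or the arithmetic coder, so everything below is an assumption made for illustration: a uniform m/z grid with hypothetical parameters `mz_min`, `step`, and `n_bins`, non-negative integer intensities, (value, count) run pairs, and Python's standard `bz2` module standing in for the arithmetic coder. It is not the authors' implementation.

```python
import bz2
import struct


def to_grid(peaks, mz_min, step, n_bins):
    """Place (m/z, intensity) pairs on a uniform grid; empty bins stay 0.

    Assumption: each m/z value lies on (or very near) a grid point, so
    rounding to the nearest bin does not merge distinct peaks.
    """
    grid = [0] * n_bins
    for mz, intensity in peaks:
        idx = round((mz - mz_min) / step)
        if 0 <= idx < n_bins:
            grid[idx] = intensity
    return grid


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def compress_runs(runs):
    """Pack runs as little-endian uint32 pairs, then compress with bzip2."""
    raw = b"".join(struct.pack("<II", v, n) for v, n in runs)
    return bz2.compress(raw)


def decompress_runs(blob):
    """Invert compress_runs: bzip2-decompress, then unpack uint32 pairs."""
    raw = bz2.decompress(blob)
    return [struct.unpack_from("<II", raw, i) for i in range(0, len(raw), 8)]


if __name__ == "__main__":
    peaks = [(400.0, 120), (400.5, 7), (402.0, 33)]  # made-up example values
    grid = to_grid(peaks, mz_min=400.0, step=0.5, n_bins=8)
    blob = compress_runs(run_length_encode(grid))
    restored = run_length_decode(decompress_runs(blob))
    print(restored == grid)
```

Because most grid bins in a sparse spectrum are empty, the run-length step turns long stretches of zeros into single pairs before the entropy coder sees them. The round trip is exact only for the gridded intensities; whether mapping onto the grid itself preserves the original m/z values depends on how the grid is chosen, which the abstract does not specify.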
Keywords :
arithmetic codes; data compression; medical signal processing; arithmetic coder; average file size reduction; bzip2 encoding; mass-to-charge ratio; proteomic data compression; run-length coding; universal grid; Biological information theory; Biology computing; Cancer; Data engineering; Mass spectroscopy; Peptides; Protein engineering; Proteomics; Sampling methods; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2006. DCC 2006. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-2545-8
Type :
conf
DOI :
10.1109/DCC.2006.14
Filename :
1607303