• DocumentCode
    2052289
  • Title
    Lossless compression for large scale cluster logs
  • Author
    Balakrishnan, Raju ; Sahoo, Ramendra K.
  • Author_Institution
    India Software Lab., IBM, Bangalore
  • fYear
    2006
  • fDate
    25-29 April 2006
  • Abstract
    The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's Blue Gene/L, which can accommodate as many as 128K processors. One of the biggest challenges these systems face is managing the system logs generated when they are deployed in production environments. A large amount of log data is created over an extended period of time, across thousands of processors. These logs can be voluminous because of their large temporal and spatial dimensions, and they contain records that are repeatedly entered into the log archive. Storing and transferring such a large amount of log data is a challenging problem. Commonly used generic compression utilities are not optimal for such large amounts of data, considering a number of performance requirements. In this paper we propose a compression algorithm which preprocesses these logs before applying any standard compression utility. The combination shows a 28.3% improvement in compression ratio and a 43.4% improvement in compression time on average over different generic compression utilities. The test data used is log data produced by the 64-rack, 65536-processor Blue Gene/L installation at Lawrence Livermore National Laboratory.
  • Keywords
    data compression; parallel machines; workstation clusters; IBM Blue Gene/L; compression utilities; extreme-scale parallel machine; large scale cluster logs; log archive; lossless compression; production environment; scientific application; Bandwidth; Concurrent computing; Data compression; Data handling; Information analysis; Information filtering; Information filters; Laboratories; Large-scale systems; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
  • Conference_Location
    Rhodes Island
  • Print_ISBN
    1-4244-0054-6
  • Type
    conf
  • DOI
    10.1109/IPDPS.2006.1639692
  • Filename
    1639692