• DocumentCode
    704188
  • Title

    Memory-Optimised Parallel Processing of Hi-C Data

  • Author

    Drocco, Maurizio ; Misale, Claudia ; Pezzi, Guilherme Peretti ; Tordini, Fabio ; Aldinucci, Marco

  • Author_Institution
    Comput. Sci. Dept., Univ. of Turin, Turin, Italy
  • fYear
    2015
  • fDate
    4-6 March 2015
  • Firstpage
    741
  • Lastpage
    746
  • Abstract
    This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures, since scalability is rather limited by the memory-bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as a running example NuChart-II, a tool for annotation and statistical analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory-bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, with which it was not possible to complete the graph generation because of strong memory-saturation problems.
  • Keywords
    cache storage; graph theory; multiprocessing systems; optimisation; parallel processing; statistical analysis; C++ implementation; Hi-C data; cache-oblivious algorithm; data structures; gene-centric neighborhood graph; graph construction algorithm; graph-based mapping representation; memory bandwidth utilisation; memory optimisations; memory-optimised parallel processing; multicore architectures; next generation sequencing data; Arrays; Bioinformatics; Biological cells; Genomics; Optimization; Resource management; Skeleton;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2015 23rd Euromicro International Conference on
  • Conference_Location
    Turku
  • ISSN
    1066-6192
  • Type

    conf

  • DOI
    10.1109/PDP.2015.63
  • Filename
    7092802