• DocumentCode
    154093
  • Title
    A Hybrid CPU-GPU System for Stitching Large Scale Optical Microscopy Images

  • Author
    Blattner, Timothy ; Keyrouz, Walid ; Chalfoun, Joe ; Stivalet, Bertrand ; Brady, Mary ; Zhou, Shujia

  • Author_Institution
    ITL, Nat. Inst. of Stand. & Technol., Gaithersburg, MD, USA
  • fYear
    2014
  • fDate
    9-12 Sept. 2014
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Researchers in various fields are using optical microscopy to acquire very large images, 10,000 to 200,000 pixels per side. Optical microscopes acquire these images as grids of overlapping partial images (thousands of pixels per side) that are then stitched together via software. Composing such large images is a compute- and data-intensive task even for modern machines. Researchers compound this difficulty further by obtaining time-series, volumetric, or multiple-channel images, with the resulting data sets now having or approaching terabyte sizes. We present a scalable hybrid CPU-GPU implementation of image stitching that processes large image sets at near-interactive rates. Our implementation scales well with both image sizes and the number of CPU cores and GPU cards in a machine. It processes a grid of 42 × 59 tiles into a 17 k × 22 k pixel image in 43 s (end-to-end execution time) when using one NVIDIA Tesla C2070 card and two Intel Xeon E-5620 quad-core CPUs, and in 29 s when using two Tesla C2070 cards and the same two CPUs. It also composes and renders the composite image without saving it in 15 s. In comparison, ImageJ/Fiji, which is widely used by biologists, has an image stitching plugin that takes > 3.6 h for the same workload despite being multithreaded and executing the same mathematical operators; it composes and saves the large image in an additional 1.5 h. Our implementation takes advantage of coarse-grain parallelism: it organizes the computation into a pipeline architecture that spans CPU and GPU resources and overlaps computation with data motion. The implementation achieves a nearly 10× performance improvement over our optimized non-pipelined GPU implementation and demonstrates near-linear speedup as the CPU thread count and the number of GPUs increase.
  • Keywords
    graphics processing units; image processing; optical images; optical microscopy; pipeline processing; Intel Xeon E-5620 quad-core CPU; NVIDIA Tesla C2070 card; coarse-grain parallelism; hybrid CPU-GPU implementation; image stitching plugin; large scale optical microscopy image; pipeline architecture; Correlation; Graphics processing units; Kernel; Microscopy; Optical microscopy; Random access memory; Transforms; Heterogeneous (hybrid) systems; Hybrid systems; Parallel Architectures; Scheduling and task partitioning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2014 43rd International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2014.9
  • Filename
    6957209