• DocumentCode
    734217
  • Title

    An Optimized Generic Client Service API for Managing Large Datasets within a Data Repository

  • Author

    Prabhune, Ajinkya ; Stotzka, Rainer ; Jejkal, Thomas ; Hartmann, Volker ; Bach, Margund ; Schmitt, Eberhard ; Hausmann, Michael ; Hesser, Juergen

  • Author_Institution
    Karlsruhe Inst. of Technol., Karlsruhe, Germany
  • fYear
    2015
  • fDate
    March 30 2015-April 2 2015
  • Firstpage
    44
  • Lastpage
    51
  • Abstract
    Exponential growth in scientific research data demands novel measures for managing extremely large datasets. In particular, due to advancements in high-resolution microscopy, the nanoscopy scientific research community is producing datasets in the range of multiple terabytes (TB). Systematically acquired datasets of biological specimens are composed of multiple high-resolution images, in the range of 150-200 TB. The management of these extremely large datasets requires an optimized Generic Client Service (GCS) API that is integrated into a data repository system. The novel API proposed in this paper provides an abstract interface that connects various disparate systems. The API is optimized to provide efficient, automated ingest and download of the data and management of its metadata. The ingest and download processes are based on well-defined workflows described in this paper. The base metadata model for comprehensive description of the datasets is also presented in the paper. The API is seamlessly integrated with a digital data repository system, namely KIT Data Manager, to make it adaptable for a wide range of communities. Finally, a simple, easy-to-use command-line tool based on the GCS API is realized to manage the large datasets of the nanoscopy research community.
  • Keywords
    application program interfaces; biomedical optical imaging; client-server systems; data warehouses; medical computing; meta data; optical microscopy; scientific information systems; GCS API; KIT data manager; abstract interface; biological specimen datasets; digital data repository system; download process; high-resolution images; high-resolution microscopy; ingest process; large dataset management; metadata management; nanoscopy scientific research community; optimized generic client service API; scientific research data; Cache storage; Communities; Computer architecture; Data transfer; Metadata; Microscopy; Command Line Tool; Generic Client Service (GCS) API; KIT Data Manager; Large Datasets; Large Scale Data Repository; Localization Microscopy (LM); Metadata; Nanoscopy; Workflow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
  • Conference_Location
    Redwood City, CA
  • Type

    conf

  • DOI
    10.1109/BigDataService.2015.25
  • Filename
    7184863