  • DocumentCode
    3043362
  • Title
    Scaling and parallelizing a scientific feature mining application using a cluster middleware
  • Author
    Glimcher, Leonid ; Zhang, Xuan ; Agrawal, Gagan
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
  • fYear
    2004
  • fDate
    26-30 April 2004
  • Firstpage
    87
  • Abstract
    Summary form only given. As scientific simulations generate large amounts of data, analyzing this data to gain insight into scientific phenomena is becoming increasingly challenging. We present a case study on the use of a cluster middleware for rapidly creating a scalable and parallel implementation of a scientific data analysis application. Using FREERIDE (framework for rapid implementation of data mining engines), we parallelize a feature extraction algorithm and scale it to disk-resident datasets. We have developed a parallel algorithm for this problem that matches the communication and computation structure supported by the FREERIDE system. The main observations from our experimental results are as follows: 1) the overhead of using the middleware is quite small in most cases; 2) there is an overhead associated with breaking the datasets into more partitions or chunks; and 3) when the dataset is partitioned into the same number of chunks, the execution time stays proportional to the size of the dataset and inversely proportional to the number of nodes, i.e., the overhead of communication and of reading disk-resident datasets is very small.
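    The abstract says the parallel algorithm matches the computation and communication structure FREERIDE supports, without spelling that structure out. FREERIDE is generally described as a generalized-reduction framework: each node folds its local data, processed chunk by chunk, into a reduction object, and the per-node objects are then merged in a global combination step. The Python sketch below illustrates that pattern under stated assumptions: a simple threshold test stands in for the actual feature extraction algorithm, the cluster is simulated sequentially in one process, and every name (`ReductionObject`, `local_reduce`, `global_combine`, `partition`, `run`) is hypothetical rather than FREERIDE's real API.

```python
"""Hypothetical sketch of a FREERIDE-style generalized reduction.

Assumptions (not from the paper): the "feature" is any value above a
threshold, nodes are simulated sequentially, and all names are illustrative.
"""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReductionObject:
    # Per-node accumulator: number of points passing the (hypothetical)
    # feature test and the largest such value seen so far.
    count: int = 0
    max_value: float = float("-inf")


def local_reduce(chunk: Sequence[float], robj: ReductionObject, threshold: float) -> None:
    """Fold one chunk into this node's reduction object."""
    for v in chunk:
        if v > threshold:
            robj.count += 1
            robj.max_value = max(robj.max_value, v)


def global_combine(robjs: List[ReductionObject]) -> ReductionObject:
    """Merge per-node reduction objects into the final result."""
    result = ReductionObject()
    for r in robjs:
        result.count += r.count
        result.max_value = max(result.max_value, r.max_value)
    return result


def partition(data: Sequence[float], num_chunks: int) -> List[Sequence[float]]:
    """Split data into at most num_chunks contiguous chunks."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run(data: Sequence[float], num_nodes: int, num_chunks: int, threshold: float) -> ReductionObject:
    chunks = partition(data, num_chunks)
    # Round-robin assignment of chunks to simulated nodes.
    per_node = [chunks[n::num_nodes] for n in range(num_nodes)]
    robjs = []
    for node_chunks in per_node:
        robj = ReductionObject()  # each node starts from an empty object
        for chunk in node_chunks:
            local_reduce(chunk, robj, threshold)
        robjs.append(robj)
    return global_combine(robjs)
```

    For example, `run([0.1, 0.9, 0.5, 1.2, 0.3, 0.8, 2.0, 0.0], 4, 8, 0.6)` yields `count=4` and `max_value=2.0`, the same result as a single node with a single chunk, because the combination step is associative and commutative. This sketch does not handle features that span chunk boundaries, which a real feature extraction algorithm would likely need to address.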
  • Keywords
    data analysis; data mining; feature extraction; middleware; parallel algorithms; workstation clusters; FREERIDE system; cluster middleware; disk-resident datasets; feature extraction algorithm; parallel algorithm; scientific data analysis; scientific simulation; Analytical models; Application software; Clustering algorithms; Computational modeling; Computer simulation; Data analysis; Engines; Feature extraction; Middleware; Parallel algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International
  • Print_ISBN
    0-7695-2132-0
  • Type
    conf
  • DOI
    10.1109/IPDPS.2004.1303029
  • Filename
    1303029