• DocumentCode
    3016236
  • Title
    Parallelizing a Defect Detection and Categorization Application
  • Author
    Glimcher, Leonid; Agrawal, Gagan; Mehta, Sameep; Jin, Ruoming; Machiraju, Raghu
  • Author_Institution
    Dept. of Comput. & Sci., Ohio State Univ., Columbus, OH, USA
  • fYear
    2005
  • fDate
    04-08 April 2005
  • Abstract
    This paper presents a case study in creating a parallel and scalable implementation of a scientific data analysis application. We focus on a defect detection and categorization application that analyzes datasets produced by Molecular Dynamics (MD) simulations. In parallelizing this application, we had three goals. First, we wanted to achieve high parallel efficiency. Second, we wanted an implementation that scales to disk-resident datasets. Third, we wanted an implementation that is easy to maintain and modify, which is possible only by using high-level interfaces. We used a number of techniques for organizing the input data, achieving load balance, and efficiently parallelizing the step that updates and matches against the defect catalog. To meet our third goal, we used a system called FREERIDE (FRamework for Rapid Implementation of Datamining Engines), which was originally developed for parallelizing data mining algorithms. We have carried out a detailed evaluation of our implementation. The main observations from our experiments are as follows: 1) our implementation achieves high parallel efficiency, 2) the execution time remains proportional to the amount of computation even as the dataset becomes disk-resident, and 3) our schemes for load balancing and for parallelizing the updating and matching of the defect catalog are crucial for the parallel efficiency of the defect categorization phase.
  • Keywords
    data analysis; data mining; molecular dynamics method; physics computing; resource allocation; categorization application; defect detection; disk-resident dataset; high-level interfaces; load balance; molecular dynamics simulation; parallelizing data mining algorithm; distributed processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
  • Print_ISBN
    0-7695-2312-9
  • Type
    conf
  • DOI
    10.1109/IPDPS.2005.332
  • Filename
    1419853
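The abstract describes parallelizing defect categorization by having workers match defects against a catalog and then combining the results. The sketch below illustrates that idea as a generalized reduction: each worker builds a partial catalog (a local reduction object), and the partial catalogs are then merged into a global one. This is a minimal illustration under stated assumptions, not the authors' implementation or the FREERIDE API. The defect representation, the `canonical` matching rule, the round-robin `partition`, and all function names are hypothetical, and threads stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical defect representation: a tuple of per-atom features.
# Assumption for illustration only: two defects belong to the same class
# when their sorted features are identical.
def canonical(defect):
    return tuple(sorted(defect))

def local_reduce(chunk):
    """Local phase: one worker builds a partial catalog (class -> count)."""
    partial = Counter()
    for defect in chunk:
        # Matching against the local catalog; unseen classes are added implicitly.
        partial[canonical(defect)] += 1
    return partial

def global_reduce(partials):
    """Global phase: merge partial catalogs; counts for shared classes are summed."""
    catalog = Counter()
    for partial in partials:
        catalog.update(partial)  # Counter.update adds counts rather than replacing them
    return catalog

def partition(items, n_workers):
    """Hypothetical round-robin split that gives each worker a similar number of defects."""
    return [items[i::n_workers] for i in range(n_workers)]

def categorize(defects, n_workers=4):
    chunks = partition(defects, n_workers)
    # Threads stand in for cluster nodes; each one runs the local reduction independently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_reduce, chunks))
    return global_reduce(partials)

if __name__ == "__main__":
    defects = [(1, 2, 3), (3, 2, 1), (4, 4), (2, 1, 3), (4, 4), (5,)]
    print(categorize(defects, n_workers=3))
```

With the sample input, the merged catalog should contain three classes with counts 3, 2, and 1. Equal-count splitting balances load only when every defect costs the same to match. When matching costs differ from one defect to another, a cost-aware assignment would be needed. The abstract names load balancing as crucial but does not specify the scheme.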