• DocumentCode
    167656
  • Title
    Optimizing the Join Operation on Hive to Accelerate Cross-Matching in Astronomy
  • Author
    Liang Li ; Dixin Tang ; Taoying Liu ; Hong Liu ; Wei Li ; Chenzhou Cui
  • Author_Institution
    Inst. of Comput. Technol., Beijing, China
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1735
  • Lastpage
    1745
  • Abstract
    Cross-matching in astronomy is a basic procedure for comprehensively analyzing the relations among different celestial objects. The aim is to search for celestial objects in different catalogs and to determine whether they are the same object. Basically, cross-matching can be expressed as a join query statement. Since celestial catalogs usually contain billions of stars, the join operator must be carefully designed and optimized for efficiency. In this paper, we focus on performing cross-matching with MapReduce-based join operators. The challenge is how to optimize the join operators to satisfy the specific requirements of cross-matching. We therefore propose an optimized method and investigate its efficiency through theoretical analysis and experiment. Our study shows that the method improves remarkably on previous work, especially when the data is very large.
  • Keywords
    astronomy computing; optimisation; query processing; string matching; MapReduce; astronomy cross-matching; celestial object relations; join operation optimization; join query statement; Conferences; Distributed processing; Astronomy; Cross-Matching; Join
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.193
  • Filename
    6969584