• DocumentCode
    3506793
  • Title
    A MapReduceMerge-based Data Cube Construction Method
  • Author
    Wang, Yuxiang ; Song, Aibo ; Luo, Junzhou
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
  • fYear
    2010
  • fDate
    1-5 Nov. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems. However, as the size of the data grows, the time it takes to construct data cubes becomes a significant performance bottleneck, so a parallel pre-computation approach is needed to further improve OLAP performance. Current parallel approaches fall into two categories: work partitioning and data partitioning. The first cannot guarantee load balance among processors, and the second produces massive data movement between processors. This paper proposes a MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy that is better suited to OLAP. Our method ensures good load balancing and reduces the amount of data movement compared with traditional approaches. MapReduceMerge is an extension of MapReduce, a programming model that enables easy development of parallel applications that process massive data on large clusters. MapReduce is a key element of Hadoop (a cloud computing framework), which is used to support Facebook's business in a cloud environment. We modify the original MapReduceMerge framework to meet the needs of cuboid construction and show the implementation in detail through an example of 2-dimension cuboid construction. We also discuss optimizations for the construction of multi-dimension cuboids.
  • Keywords
    cloud computing; data mining; data warehouses; parallel processing; resource allocation; social networking (online); 2-dimension cuboids construction; Facebook; Hadoop; MapReduceMerge framework; OLAP; cloud computing framework; data partitioning; data warehouse; load balancing; multidimension cuboid construction; on-line analytical processing system; parallel application; parallel data cube construction method; parallel pre-computation approach; performance bottleneck; processor; programming model; read-optimized data storage strategy; work partitioning; MapReduceMerge; OLAP; data cube;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid and Cooperative Computing (GCC), 2010 9th International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4244-9334-0
  • Electronic_ISBN
    978-0-7695-4313-0
  • Type
    conf
  • DOI
    10.1109/GCC.2010.14
  • Filename
    5662731