• DocumentCode
    3525851
  • Title
    An Evolutionary Algorithm for Column Family Schema Optimization in HBase
  • Author
    Fangzhou Yang; Jian Cao; Dragan Milosevic
  • Author_Institution
    Dept. of CSE, Shanghai Jiaotong Univ., Shanghai, China
  • fYear
    2015
  • fDate
    March 30 2015-April 2 2015
  • Firstpage
    439
  • Lastpage
    445
  • Abstract
    Apache HBase is a column-oriented NoSQL key-value store built on top of the Hadoop Distributed File System (HDFS). Logically, columns in HBase are grouped into column families. Physically, all columns in one column family are stored in the same set of files. Therefore, the division of columns into column families is closely related to the response time of a specific row query. In this paper, a new evolutionary algorithm is designed and applied to find the optimal column family schema for a given set of user queries. The read performance of the optimized column family schema is evaluated on a real dataset provided by ZANOX AG, which contains 2.6 million rows of aggregated tracking data and 1.3 million user queries. The optimized column family schema improves the read performance of HBase with statistical significance. For user queries from a testing set, the average response time is reduced by up to 72% compared to unoptimized column family schemas.
  • Keywords
    SQL; data handling; evolutionary computation; parallel processing; query processing; statistical analysis; Apache HBase; Hadoop distributed file-system; ZANOX AG; column family schema optimization; column-oriented NoSQL key-value store; evolutionary algorithm; statistical significance; user queries; Algorithm design and analysis; Big data; Conferences; Evolutionary computation; Genetic algorithms; Layout; Optimization; Column Family; Column Layout; Evolutionary Algorithm; HBase; NoSQL; Schema Optimization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
  • Conference_Location
    Redwood City, CA
  • Type
    conf
  • DOI
    10.1109/BigDataService.2015.20
  • Filename
    7184913