• DocumentCode
    710106
  • Title
    Progressive diversification for column-based data exploration platforms
  • Author
    Khan, Hina A.; Sharaf, Mohamed A.
  • Author_Institution
    Univ. of Queensland, Brisbane, QLD, Australia
  • fYear
    2015
  • fDate
    13-17 April 2015
  • Firstpage
    327
  • Lastpage
    338
  • Abstract
    In data exploration platforms, diversification has become an essential method for extracting representative data that provide users with a concise and meaningful view of their query results. However, the benefits of diversification come at the expense of an additional cost for post-processing query results. For large, high-dimensional result sets, this cost is further escalated by the massive number of distance computations required to evaluate the similarity between results. To address this challenge, in this paper we propose the Progressive Data Diversification (pDiverse) scheme. The main idea underlying pDiverse is to utilize partial distance computation to reduce the amount of processed data. Our extensive experimental results on both synthetic and real data sets show that our proposed scheme outperforms existing diversification methods in terms of both I/O and CPU costs.
  • Keywords
    data handling; query processing; column-based data exploration platforms; pDiverse scheme; progressive data diversification scheme; progressive diversification; Data mining; Euclidean distance; Handheld computers; Heuristic algorithms; Memory; Query processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2015 IEEE 31st International Conference on
  • Conference_Location
    Seoul
  • Type
    conf
  • DOI
    10.1109/ICDE.2015.7113295
  • Filename
    7113295
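The abstract states only that pDiverse relies on partial distance computation to reduce the amount of processed data; it does not give the algorithm. As a generic illustration of that idea, and not the paper's method, the sketch below assumes a greedy MaxMin diversification objective over squared Euclidean distance on column-stored data, and uses early abandoning: a distance is accumulated one column at a time and abandoned as soon as it can no longer affect the result. The column layout, the MaxMin objective, and the function names `partial_sq_dist` and `greedy_maxmin` are hypothetical choices made for this example.

```python
from typing import List, Sequence

# Assumed column-oriented layout (hypothetical): columns[j][i] holds dimension j
# of row i, so a distance is accumulated by scanning one column after another.
Columns = Sequence[Sequence[float]]


def partial_sq_dist(columns: Columns, a: int, b: int, cutoff: float) -> float:
    """Squared Euclidean distance between rows a and b, accumulated column by column.

    Each column adds a non-negative term, so once the running sum exceeds
    `cutoff` the remaining columns are skipped (early abandoning). In that case
    the returned partial sum is a lower bound that is already > cutoff.
    """
    total = 0.0
    for col in columns:
        diff = col[a] - col[b]
        total += diff * diff
        if total > cutoff:
            break
    return total


def greedy_maxmin(columns: Columns, k: int, first: int = 0) -> List[int]:
    """Greedy MaxMin diversification (assumed objective): repeatedly add the row
    whose minimum squared distance to the already-selected rows is largest."""
    n = len(columns[0])
    if k <= 0 or n == 0:
        return []
    selected = [first]
    chosen = {first}
    while len(selected) < min(k, n):
        best_row, best_score = -1, -1.0
        for c in range(n):
            if c in chosen:
                continue
            # Running minimum of squared distances from c to the selected rows.
            min_d = float("inf")
            for s in selected:
                # Distances above min_d cannot lower the minimum, so abandon there.
                d = partial_sq_dist(columns, c, s, min_d)
                if d < min_d:
                    min_d = d
                # c's true score is at most min_d; prune c if it cannot win.
                if min_d <= best_score:
                    break
            if min_d > best_score:
                best_row, best_score = c, min_d
        selected.append(best_row)
        chosen.add(best_row)
    return selected
```

For example, with rows (0, 0), (1, 0), (10, 0), and (10.5, 0) stored as `[[0.0, 1.0, 10.0, 10.5], [0.0, 0.0, 0.0, 0.0]]`, `greedy_maxmin(cols, 3)` returns `[0, 3, 1]`. Because partial sums only grow as columns are added, the pruning never changes which row the exact greedy algorithm would pick (ties go to the lower row index); it only skips column reads. Working with squared distances preserves the ordering of Euclidean distances, so no square roots are needed.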