• DocumentCode
    1122585
  • Title

    Index selection for databases: a hardness study and a principled heuristic solution

  • Author

    Chaudhuri, Surajit ; Datar, Mayur ; Narasayya, Vivek

  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • Volume
    16
  • Issue
    11
  • fYear
    2004
  • Firstpage
    1313
  • Lastpage
    1323
  • Abstract
    We study the index selection problem: Given a workload consisting of SQL statements on a database, and a user-specified storage constraint, recommend a set of indexes that have the maximum benefit for the given workload. We present a formal statement for this problem and show that it is computationally "hard" to solve or even approximate it. We develop a new algorithm for the problem which is based on treating the problem as a knapsack problem. The novelty of our approach lies in an LP (linear programming) based method that assigns benefits to individual indexes. For a slightly modified algorithm, that does more work, we prove that we can give instance specific guarantees about the quality of our solution. We conduct an extensive experimental evaluation of this new heuristic and compare it with previous solutions. Our results demonstrate that our solution is more scalable while achieving comparable quality.
  • Keywords
    SQL; approximation theory; computational complexity; database indexing; heuristic programming; knapsack problems; linear programming; SQL statement; index selection; knapsack problem; linear programming; user-specified storage constraint; Computer Society; Constraint optimization; Costs; Database systems; Indexes; Linear programming; Paper technology; Scalability; 65; Index Terms- Index selection; NP-hardness; approximation; hardness result; knapsack; linear programming; scalability.;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2004.75
  • Filename
    1339260