• DocumentCode
    1203370
  • Title

    Sort vs. hash revisited

  • Author

    Graefe, Goetz ; Linville, Ann ; Shapiro, Leonard D.

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • Volume
    6
  • Issue
    6
  • fYear
    1994
  • fDate
    12/1/1994 12:00:00 AM
  • Firstpage
    934
  • Lastpage
    944
  • Abstract
    Efficient algorithms for processing large volumes of data are very important both for relational and new object-oriented database systems. Many query-processing operations can be implemented using sort- or hash-based algorithms, e.g. intersections, joins, and duplicate elimination. In the early relational database systems, only sort-based algorithms were employed. In the last decade, hash-based algorithms have gained acceptance and popularity, and are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine
  • Keywords
    database theory; file organisation; object-oriented databases; query processing; relational databases; sorting; Volcano query execution engine; costs; dualities; duplicate elimination; hash-based algorithms; intersections; joins; merge-join algorithm; object-oriented database systems; performance; query-processing operations; relational database systems; sort-based algorithms; value matching; Computer science; Costs; Database languages; Database systems; Engines; Partitioning algorithms; Query processing; Relational databases; Sorting; Volcanoes;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.334883
  • Filename
    334883