  • DocumentCode
    3122860
  • Title
    Similarity Group-By
  • Author
    Silva, Yasin N.; Aref, Walid G.; Ali, Mohamed H.
  • Author_Institution
    Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN
  • fYear
    2009
  • fDate
    March 29 - April 2, 2009
  • Firstpage
    904
  • Lastpage
    915
  • Abstract
    Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. Many application scenarios require grouping values that are similar but not necessarily equal. In this paper, we propose a new SQL construct that supports similarity-based group-by (SGB). SGB is not a new clustering algorithm; rather, it is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. It also discusses how optimization techniques for the regular group-by can be extended to SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators scale well, with execution times at most 25% higher than those of the regular group-by.
  • Keywords
    SQL; optimisation; query processing; SQL operators; core database operation; decision support systems; similarity group-by; Biological system modeling; Clustering algorithms; Computer science; Data engineering; Data mining; Database systems; Engines; Scalability; USA Councils; Clustering; OLAP; Similarity Query Processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2009 IEEE 25th International Conference on Data Engineering (ICDE '09)
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.113
  • Filename
    4812464