Title :
Finding a set of high-frequency queries for high-frequency-query-based filter for similarity join
Author :
Kunanusont, Kamolwan ; Chongstitvatana, Jaruloj
Author_Institution :
Dept. of Math. & Comput. Sci., Chulalongkorn Univ., Bangkok, Thailand
Abstract :
Similarity search and similarity join are two important operations in text databases. The filter-and-verify framework aims to reduce comparison time by filtering out some pairs of texts before actually comparing the remaining pairs. Many filter methods do not take into account the repetition of query words over time. A query that is frequently repeated over a time period is called a high-frequency query. The high-frequency-queries-based filter is a filter method that deals with this type of query. The performance of this method depends on the choice of high-frequency queries. This paper proposes methods to find the set of high-frequency queries from a given query set. One method uses DBSCAN and the other uses DBSCAN with a merging strategy, called DBSM. The experimental results show that both DBSCAN and DBSM can find high-frequency queries, but the set of high-frequency queries obtained from DBSM gives higher pruning power for the high-frequency-queries-based filter.
Keywords :
information filtering; query processing; text analysis; DBSCAN; DBSM merging strategy; filter-and-verify framework; high-frequency query; high-frequency-query-based filter method; query word repetition; similarity join; similarity search; text databases; Clustering algorithms; Filtering; Filtering algorithms; Force; Indexes; Merging; Cluster analysis; High-frequency queries
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2015 12th International Conference on
Conference_Location :
Hua Hin
DOI :
10.1109/ECTICon.2015.7206993