Title :
TOP-MATA: A Max-First traversal method for top-K cosine similarity search
Author :
Zhu, Shiwei ; Wu, Junjie ; Xia, Guoping ; Li, Limin
Author_Institution :
Sch. of Econ. & Manage., Beihang Univ., Beijing, China
Abstract :
Recent years have witnessed an increased interest in computing cosine similarities between documents (or commodities). Most previous studies require the specification of a minimum similarity threshold to perform cosine similarity search. However, it is usually difficult for users to provide an appropriate threshold in practice. Instead, in this paper, we propose to search for the top-K most strongly related pairs of objects as measured by cosine similarity. Specifically, we first define the cosine similarity measure from the association analysis point of view and identify the monotone property of an upper bound of the cosine measure. We then exploit a Max-First traversal strategy to develop the TOP-MATA algorithm. Compared with the previous TOP-DATA method, TOP-MATA has the advantage of saving the computations for false-positive item pairs. Finally, experimental results demonstrate the computational efficiency of the algorithm.
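The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' TOP-MATA implementation, whose details the abstract does not give. It assumes the association-analysis form cos(A, B) = supp(AB) / sqrt(supp(A) supp(B)) and the bound sqrt(supp(B) / supp(A)) for supp(A) >= supp(B), which follows from supp(AB) <= min(supp(A), supp(B)). The function name `top_k_cosine_pairs`, the heap-based frontier, and the stopping rule are all hypothetical choices made for illustration.

```python
import heapq
from math import sqrt


def top_k_cosine_pairs(transactions, k):
    """Best-first search for the k item pairs with the highest cosine.

    Illustrative sketch only: the bound and traversal order are assumptions,
    not the published TOP-MATA procedure.
    """
    if k <= 0:
        return []

    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Sort items by descending support (ties broken by item name).
    items = sorted(tidsets, key=lambda a: (-len(tidsets[a]), a))
    supp = [len(tidsets[a]) for a in items]
    n = len(items)

    def upper(i, j):
        # For i < j we have supp[i] >= supp[j], so this bounds cos(i, j).
        # In row i the bound never increases as j grows (monotone property).
        return sqrt(supp[j] / supp[i])

    # Frontier holds one candidate per row: the pair with the largest bound.
    frontier = [(-upper(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(frontier)

    best = []  # min-heap of (cosine, item_a, item_b), at most k entries
    while frontier:
        neg_bound, i, j = heapq.heappop(frontier)
        # No remaining pair can beat the current k-th best: stop early.
        if len(best) == k and -neg_bound <= best[0][0]:
            break
        joint = len(tidsets[items[i]] & tidsets[items[j]])
        entry = (joint / sqrt(supp[i] * supp[j]), items[i], items[j])
        if len(best) < k:
            heapq.heappush(best, entry)
        elif entry > best[0]:
            heapq.heapreplace(best, entry)
        # Advance along row i to the pair with the next-largest bound.
        if j + 1 < n:
            heapq.heappush(frontier, (-upper(i, j + 1), i, j + 1))

    return sorted(best, reverse=True)


transactions = [
    ["a", "b", "c"], ["a", "b"], ["a", "c"],
    ["b", "c"], ["a", "b", "d"], ["d"],
]
top1 = top_k_cosine_pairs(transactions, 1)
top3 = top_k_cosine_pairs(transactions, 3)
```

Because candidates are visited in decreasing order of their upper bound, the search can stop as soon as the largest remaining bound no longer exceeds the k-th best exact cosine, so no user-supplied similarity threshold is needed. Pairs that tie with the k-th best value may be broken arbitrarily in this sketch.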
Keywords :
data mining; document handling; search problems; TOP-MATA; documents; max-first traversal method; top-K cosine similarity search; Aircraft; Algorithm design and analysis; Association rules; Bioinformatics; Computational efficiency; Data mining; Databases; Pattern analysis; Sampling methods; Upper bound; Anti-Monotone Property; Association Analysis; Cosine Similarity; Interestingness Measure
Conference_Title :
Service Systems and Service Management (ICSSSM), 2010 7th International Conference on
Conference_Location :
Tokyo
Print_ISBN :
978-1-4244-6485-2
DOI :
10.1109/ICSSSM.2010.5530100