Title :
TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases
Author :
Xiong, Hui ; Brodie, Mark ; Ma, Sheng
Author_Institution :
MSIS Dept., Rutgers Univ., New Brunswick, NJ
Abstract :
Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end, we propose an alternative task: mining the top-k strongly correlated pairs. In this paper, we identify a 2-D monotone property of an upper bound of Pearson's correlation coefficient and develop an efficient algorithm, called TOP-COP, that exploits this property to prune many pairs without computing their correlation coefficients. Our experimental results show that the TOP-COP algorithm can be orders of magnitude faster than brute-force alternatives for mining the top-k strongly correlated pairs.
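For orientation, the brute-force alternative mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the TOP-COP algorithm: it computes Pearson's correlation coefficient for every pair and keeps the k largest, with no upper-bound pruning. The function names `pearson` and `brute_force_top_k`, the dictionary-of-columns input layout, and the choice to score constant columns as 0 are all assumptions made for this sketch; the abstract does not specify them.

```python
import heapq
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption for this sketch: a constant column correlates with nothing.
        return 0.0
    return cov / sqrt(var_x * var_y)


def brute_force_top_k(columns, k):
    """Score every pair of columns and return the k highest-scoring pairs.

    `columns` maps a (hypothetical) item name to its list of values.
    The result is a list of (coefficient, name_a, name_b), best first.
    """
    scored = (
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    )
    return heapq.nlargest(k, scored)
```

With n items this evaluates all n(n-1)/2 pairs, which is the cost that pruning with an upper bound is meant to avoid.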
Keywords :
data mining; database management systems; 2D monotone property; Pearson correlation coefficient; TOP-COP; TOP-K strongly correlated pair mining; large databases; minimum correlation threshold; Algorithm design and analysis; Bioinformatics; Books; Computational efficiency; Data mining; Marketing and sales; Promotion - marketing; Public healthcare; Transaction databases; Upper bound;
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2701-7
DOI :
10.1109/ICDM.2006.161