Regional Information Center for Science and Technology - Efficient Discovery of the Top-K Optimal Dependency Rules with Fisher's Exact Test of Significance

DocumentCode :

2207663

Title :

Efficient Discovery of the Top-K Optimal Dependency Rules with Fisher's Exact Test of Significance

Author :

Hämäläinen, Wilhelmiina

Author_Institution :

Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland

fYear :

2010

fDate :

13-17 Dec. 2010

Firstpage :

196

Lastpage :

205

Abstract :

Statistical dependency analysis is the basis of all empirical science. A commonly occurring problem is to find the most significant dependency rules, which describe either positive or negative dependencies between categorical attributes. For example, in medical science one is interested in genetic factors that can either predispose to or prevent diseases. The requirement of statistical significance is essential, because the discoveries should also hold in future data. Typically, significance is estimated either by Fisher's exact test or the χ²-measure. The problem is computationally very difficult, because the number of all possible dependency rules increases exponentially with the number of attributes. As a solution, different kinds of restrictions and heuristics have been applied, but a general, scalable search method has been missing. In this paper, we introduce an efficient algorithm for searching for the top-K globally optimal dependency rules using Fisher's exact test as a measure function. The rules can express either positive or negative dependencies between a set of positive attributes and a single consequent attribute. The algorithm is based on an application of the branch-and-bound search strategy, supplemented by several pruning properties. In particular, we prove a new lower bound for Fisher's p and introduce a new, effective pruning principle. The general search algorithm is applicable to other goodness measures, such as the χ²-measure, as well. According to our experiments on classical benchmark data, the algorithm scales well and can efficiently handle even dense and high-dimensional data sets. In addition, the quality of rules is significantly better than with the χ²-measure using the same search algorithm.
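
As an illustration of the measure function named in the abstract, the following is a minimal sketch of the one-sided Fisher's exact test p-value for a positive dependency rule X -> A, computed as a hypergeometric upper tail. It is not the paper's search algorithm and does not include the branch-and-bound pruning or the lower bound mentioned above. The function name `fisher_p` and its parameters (`n`, `fr_x`, `fr_a`, `fr_xa`) are assumptions chosen for this example.

```python
from math import comb


def fisher_p(n: int, fr_x: int, fr_a: int, fr_xa: int) -> float:
    """One-sided Fisher's exact p-value for a positive dependency X -> A.

    n     : total number of rows (hypothetical parameter name)
    fr_x  : rows where the antecedent X holds
    fr_a  : rows where the consequent A holds
    fr_xa : rows where both X and A hold

    Returns the probability of seeing at least fr_xa co-occurrences
    if X and A were independent, with the margins fr_x and fr_a fixed.
    """
    max_xa = min(fr_x, fr_a)          # largest possible co-occurrence count
    total = comb(n, fr_a)             # ways to place A among the n rows
    # Sum the hypergeometric probabilities of all tables at least as extreme.
    tail = sum(
        comb(fr_x, k) * comb(n - fr_x, fr_a - k)
        for k in range(fr_xa, max_xa + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy data: 10 rows, X holds in 5, A holds in 5, both hold in 4.
    print(fisher_p(10, 5, 5, 4))
```

A smaller p indicates a more significant positive dependency, so a top-K search of the kind the abstract describes would rank candidate rules by ascending p. For a negative dependency, the same function can be applied to X and the complement of A, under the same assumptions.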

Keywords :

data analysis; data mining; set theory; statistical analysis; tree searching; Fisher exact test; branch-and-bound search; consequent attribute; optimal dependency rules; positive attributes; pruning principle; search algorithm; statistical dependency analysis; Fisher's exact test; dependency rule; negative rule; rule discovery; statistical significance

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining (ICDM), 2010 IEEE 10th International Conference on

Conference_Location :

Sydney, NSW

ISSN :

1550-4786

Print_ISBN :

978-1-4244-9131-5

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2010.143

Filename :

5693973

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2207663