DocumentCode :
2131349
Title :
If Constraint-Based Mining is the Answer: What is the Constraint? (Invited Talk)
Author :
Boulicaut, Jean-François
Author_Institution :
CNRS INSA-Lyon, Univ. of Lyon, Villeurbanne
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
730
Lastpage :
730
Abstract :
Constraint-based mining has proven to be extremely useful. It has been applied not only to many pattern discovery settings (e.g., sequential pattern mining) but also, more recently, to classification and clustering tasks (see, e.g., ). It appears as a key technology for an inductive database perspective on knowledge discovery in databases (KDD), and constraint-based mining is indeed an answer to important data mining issues (e.g., supporting a priori relevancy and subjective interestingness, but also achieving computational feasibility). However, few authors study the nature of constraints and their semantics. Considering several examples of nontrivial KDD processes, we discuss the Hows, Whys, and Whens of constraints in a broader context than. Our thesis is that most typical data mining methods are constraint-based techniques and that it is worth studying and designing them as such. In many cases, we exploit constraints that are not really explicit (e.g., the objective function optimization of a clustering for a given similarity measure) and/or constraints whose operational semantics are relaxed w.r.t. their declarative counterparts (e.g., the optimization constraint is not enforced because of some local optimization heuristics). We think that it is important to make explicit every primitive constraint and the operators that combine them, because this constitutes the declarative semantics of the constraints and thus of the mining queries. Then, a well-studied challenge is to design operational semantics, such as correct and complete solvers and/or relaxation schemes, for more or less complex constraints. Designing complete solvers has been extensively studied in useful yet limited settings (see, e.g., the algorithms for exploiting combinations of monotonic and anti-monotonic primitives). It is, however, clear that many relevant constraints lack such nice properties. On the other hand, understanding constraint relaxation strategies remains fairly open, certainly because of its intrinsically heuristic nature. Interestingly, recent approaches that suggest global pattern or model construction based on local patterns make it possible to revisit the relaxation issue thanks to constraint back propagation possibilities. This can be discussed within a case study on constrained co-clustering.
Keywords :
data integrity; data mining; deductive databases; pattern classification; pattern clustering; query processing; classification task; clustering task; constraint back propagation; constraint relaxation strategy; constraint-based data mining query; declarative semantics; inductive database; knowledge discovery; operational semantics; pattern discovery; sequential pattern mining; Algorithm design and analysis; Clustering algorithms; Conferences; Constraint optimization; Constraint theory; Cyclic redundancy check; Data mining; Databases; FETs; Itemsets; constraints; data mining; inductive databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.96
Filename :
4733999