Toward multidatabase mining: identifying relevant databases

Author

Liu, Huan; Lu, Hongjun; Yao, Jun

Author_Institution

Dept. of Comput. Sci., Arizona State Univ., Tempe, AZ, USA

Volume

13

Issue

4

fYear

2001

Firstpage

541

Lastpage

553

Abstract

Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when there are many databases, an immediate question is where one should start mining. It is not true that data mining is better the more databases there are. It is only true when the databases involved are relevant to the task at hand. By breaking away from the conventional data mining assumption that many databases should be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks with the objective of finding patterns or regularities of certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.

Keywords

data mining; distributed databases; knowledge discovery; multidatabase mining; pattern finding; regularity finding; relevant database identification; Data mining; Database systems; Pressing; Statistics; Surges

fLanguage

English

Journal_Title

Knowledge and Data Engineering, IEEE Transactions on

Publisher

ieee

ISSN

1041-4347

Type

jour

DOI

10.1109/69.940731

Filename

940731