• DocumentCode
    1519333
  • Title

    Toward multidatabase mining: identifying relevant databases

  • Author

    Liu, Huan ; Lu, Hongjun ; Yao, Jun

  • Author_Institution
    Dept. of Comput. Sci., Arizona State Univ., Tempe, AZ, USA
  • Volume
    13
  • Issue
    4
  • fYear
    2001
  • Firstpage
    541
  • Lastpage
    553
  • Abstract
    Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when there are many databases, an immediate question is where one should start mining. It is not true that data mining is better the more databases there are. It is only true when the databases involved are relevant to the task at hand. By breaking away from the conventional data mining assumption that many databases should be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks with the objective of finding patterns or regularities of certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.
  • Keywords
    data mining; distributed databases; knowledge discovery; multidatabase mining; pattern finding; regularity finding; relevant database identification; Database systems; Pressing; Statistics; Surges
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.940731
  • Filename
    940731