Author_Institution :
Dipt. di Ing. Inf., Gestionale e dell'Autom., Univ. Politec. delle Marche, Ancona, Italy
Abstract :
The rapid growth of databases in recent years requires organizations to manage large amounts of data, which represent a valuable resource for decision-making processes. Although technologies for data management and storage are widely available, much effort is still needed to provide users with systems for effectively analyzing and understanding data. We use the term Knowledge Discovery in Databases (KDD) to refer to the non-trivial process of extracting interesting, valid and useful patterns from data. As a process, it often involves several steps, which may include: selection of a subset of data from the whole dataset; data cleaning and transformation; feature extraction; choice of the appropriate Data Mining technique for extracting patterns, along with its configuration and execution; evaluation and interpretation of results; and deployment of new knowledge to users. Especially for non-experts, defining and managing a KDD process are themselves demanding activities, because they require the user to know how to choose the proper tools among the plethora of available ones, how to set them up, and how to interpret their output. Managing a KDD process usually requires a team of different experts, each of whom is able to configure only a part of the whole process. Such a team can either belong to the same organization or be a geographically distributed virtual team of experts. Hence, the integration of distributed users and tools, along with the heterogeneity of the latter, are issues to take into account in order to define effective solutions for the problem at hand. Real scenarios for KDD may include not only network organizations, for which distributed cooperative KDD projects may represent a significant added value, but also support for e-Science processes (e.g. particle physics, earth sciences, and bioinformatics).
In such a highly distributed environment, in fact, scientists need technologies for the collaborative analysis of data produced by scientific experiments.
Keywords :
data mining; database management systems; virtual storage; KDD processes management; data cleaning; data management; data mining; e-science processes; geographically distributed virtual team; knowledge discovery in databases; organizations; semantic driven design; Cleaning; Data analysis; Data mining; Decision making; Feature extraction; Geoscience; Resource management; Spatial databases; Technology management; Virtual groups; Collaborative Knowledge Discovery in Databases; Knowledge Discovery Support System; Semantic Technologies for KDD; e-Science;