Author_Institution :
Dept. of Comput. Sci., Southern Illinois Univ., Carbondale, IL, USA
Abstract :
Presents a framework for statistical data mining using summary tables. A set of operators is proposed for common data mining tasks, such as summarization, association, classification and clustering, and for basic statistical analysis, such as hypothesis testing, estimation and regression. The operators let users explore a variety of knowledge effectively while requiring little statistical knowledge of them. Summary tables, which store basic information about groups of tuples of the underlying relations, are constructed to speed up the data mining process. The summary tables are incrementally updatable and support a variety of data mining and statistical analysis tasks. Together, the operators and the summary tables can make interactive data mining flexible, effective, and perhaps instantaneous.
Keywords :
data mining; statistical databases; association; classification; clustering; data mining operators; estimation; hypothesis testing; incrementally updatable tables; interactive data mining; knowledge exploration; regression; statistical analysis; statistical data mining; summarization; summary tables; tuple groups; Data analysis; Data mining; Information theory; Machine learning; Measurement uncertainty; Psychology; Relational databases; Solids; Statistical analysis; Testing
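
Illustrative sketch :
The abstract does not give the layout of the summary tables or the signatures of the operators, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes each group of tuples is summarized by a count, a sum and a sum of squares of one numeric attribute, which is enough to update the table incrementally and to answer simple estimation and hypothesis-testing queries without rescanning the underlying relation. The names `SummaryTable` and `welch_t` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


class SummaryTable:
    """Hypothetical summary table: per-group count, sum and sum of squares."""

    def __init__(self):
        # group key -> [n, sum of values, sum of squared values]
        self._cells = defaultdict(lambda: [0, 0.0, 0.0])

    def insert(self, group, value):
        # Incremental update when a tuple is added to the relation.
        cell = self._cells[group]
        cell[0] += 1
        cell[1] += value
        cell[2] += value * value

    def delete(self, group, value):
        # Incremental update when a tuple is removed.
        # Assumes the tuple was previously inserted into this group.
        cell = self._cells[group]
        cell[0] -= 1
        cell[1] -= value
        cell[2] -= value * value

    def count(self, group):
        return self._cells[group][0]

    def mean(self, group):
        # Point estimate of the group mean (requires at least one tuple).
        n, s, _ = self._cells[group]
        return s / n

    def variance(self, group):
        # Unbiased sample variance (requires at least two tuples).
        n, s, ss = self._cells[group]
        return (ss - s * s / n) / (n - 1)


def welch_t(table, a, b):
    """Welch's t statistic for the difference of two group means,
    computed from the summaries alone."""
    se = sqrt(table.variance(a) / table.count(a)
              + table.variance(b) / table.count(b))
    return (table.mean(a) - table.mean(b)) / se


# Usage: build the table once, then answer queries from the summaries.
table = SummaryTable()
rows = [("a", 1.0), ("a", 2.0), ("a", 3.0),
        ("b", 2.0), ("b", 4.0), ("b", 6.0)]
for group, value in rows:
    table.insert(group, value)
t_stat = welch_t(table, "a", "b")
```

Because every query reads only three numbers per group, its cost depends on the number of groups rather than the number of tuples, which is one plausible reading of how such tables could make interactive analysis fast. The sum-of-squares form of the variance is chosen here for simplicity and can lose precision on large or poorly scaled data.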