DocumentCode :
2985233
Title :
Assessing the Significance of Data Mining Results on Graphs with Feature Vectors
Author :
Gunnemann, Stephan ; Dao, Phuong ; Jamali, Mohsin ; Ester, Martin
Author_Institution :
RWTH Aachen Univ., Aachen, Germany
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
270
Lastpage :
279
Abstract :
Assessing the significance of data mining results is an important step in the knowledge discovery process. While results might appear interesting at first glance, they can often be explained by already known characteristics of the data. Randomization is an established technique for significance testing, and methods to assess data mining results on vector data or network data have been proposed. In many applications, however, both sources are given simultaneously. Since these sources are rarely independent of each other but highly correlated, naively applying existing randomization methods to each source separately is questionable. In this work, we present a method to assess the significance of mining results on graphs with binary feature vectors. We propose a novel null model that preserves correlation information between both sources. Our randomization exploits adaptive Metropolis sampling and interweaves attribute randomization and graph randomization steps. In thorough experiments, we demonstrate the application of our technique. Our results indicate that while simultaneously using both sources is beneficial, often one source of information is dominant for determining the mining results.
Keywords :
data mining; graph theory; sampling methods; vectors; adaptive Metropolis sampling; attribute randomization; binary feature vector; correlation information; data mining result; graph randomization; information source; knowledge discovery process; network data; randomization technique; significance testing; vector data; Clustering algorithms; Correlation; Data mining; Data models; Markov processes; Testing; Vectors; data mining; graph; network; randomization; significance testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.70
Filename :
6413896