DocumentCode :
659454
Title :
An NML-based model selection criterion for general relational data modeling
Author :
Sakai, Yoshiki ; Yamanishi, Kenji
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
421
Lastpage :
429
Abstract :
Whereas most existing data mining approaches have focused on sequence data about a single type of object, namely attribute data, real-world databases store information about multiple relationships among various classes of objects. Modeling these general relational data (GRD) plays an important role in eliciting knowledge across multiple relations. Existing modeling methods cannot reasonably be applied to GRD directly, because GRD have statistical properties that distinguish them from attribute data. In this paper, we address the issue of statistical model selection in GRD modeling. From the viewpoint of the minimum description length principle, we propose a new model selection criterion that takes the statistical properties of GRD into account. We employ the normalized maximum likelihood code-length as a model selection criterion, and provide an asymptotic expansion theorem for its application to GRD modeling. To demonstrate its use in a critical application, we apply the proposed criterion to the issue of model selection in relational data clustering. An experiment using artificial datasets demonstrates the effectiveness of our technique compared to other criteria, and we also present a brand analysis using real beer-purchase data.
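The abstract relies on the normalized maximum likelihood (NML) code-length and an asymptotic expansion of it. The sketch below is only a textbook illustration of those two ideas for the simplest case, a Bernoulli model, and is not the authors' GRD criterion. It computes the exact parametric complexity log C_n by summing over all counts, compares it with Rissanen's asymptotic expansion (d/2) log(n/2π) + log ∫ √I(θ) dθ (with d = 1 and the integral equal to π for the Bernoulli model), and then applies it to a binary relation matrix split into blocks. The helper `block_nml_code_length` is hypothetical: it assumes an independent Bernoulli parameter per block, as in a simple stochastic block model, and deliberately omits the cost of encoding the partition itself, so it should not be read as the paper's clustering criterion.

```python
import math


def log_binom(n, k):
    # log of the binomial coefficient C(n, k), computed with lgamma for stability
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def bernoulli_log_ml(n, k):
    # maximized log-likelihood (nats) of a binary sample with k ones out of n
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def bernoulli_log_complexity(n):
    # exact log C_n = log sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k), via log-sum-exp
    terms = [log_binom(n, k) + bernoulli_log_ml(n, k) for k in range(n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def bernoulli_log_complexity_asymptotic(n):
    # Rissanen's expansion with d = 1 and integral of sqrt(Fisher information) = pi
    return 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)


def nml_code_length(n, k):
    # NML code-length (nats) = -max log-likelihood + log parametric complexity
    return -bernoulli_log_ml(n, k) + bernoulli_log_complexity(n)


def block_nml_code_length(matrix, row_labels, col_labels):
    # hypothetical helper: one Bernoulli parameter per (row-cluster, col-cluster)
    # block; the cost of describing the partition is NOT included
    counts = {}
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            cell = counts.setdefault((row_labels[i], col_labels[j]), [0, 0])
            cell[0] += 1  # number of cells in the block
            cell[1] += x  # number of ones in the block
    return sum(nml_code_length(n, k) for n, k in counts.values())


# Toy relation with an obvious 2 x 2 block structure
M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
one = block_nml_code_length(M, [0, 0, 0, 0], [0, 0, 0, 0])
two = block_nml_code_length(M, [0, 0, 1, 1], [0, 0, 1, 1])
print(one > two)  # True: the two-block partition yields a shorter code
```

For n = 1000 the exact and asymptotic complexities differ only in the second decimal place, which is the kind of approximation an asymptotic expansion theorem makes practical when exact summation is infeasible. Because the partition cost is omitted here, this toy score would tend to favour ever finer partitions; a complete MDL criterion must also charge for the model structure.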
Keywords :
maximum likelihood estimation; pattern clustering; relational databases; GRD statistical properties; NML-based model selection criterion; asymptotic expansion theorem; attribute data; data mining approach; general relational data modeling; knowledge elicitation; minimum description length principle; normalized maximum likelihood code-length; real-world databases; relational data clustering; statistical model selection; Approximation methods; Bayes methods; Computational modeling; Data mining; Data models; Probabilistic logic; Stochastic processes; model selection; normalized maximum likelihood code-length; relational data; stochastic block model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691603
Filename :
6691603