Title :
Metric Factorization for Exploratory Analysis of Complex Data
Author_Institution :
Helmholtz Zentrum Munchen, Tech. Univ. Munchen, Munich, Germany
Abstract :
How to explore complex data? Often, several representations for each data object are available, the data are described by attributes of heterogeneous data type and/or each data object is characterized by many features. It is difficult to choose a suitable similarity measure and an appropriate data mining technique to get an unbiased overview on the information contained in complex data. In this paper, we introduce Metric Factorization as a novel data mining task. The goal of Metric Factorization is to discover the major alternative views of complex data. Our novel algorithm MF extends matrix factorization techniques to support metric data. We do not need to choose a single similarity measure but can just input any available metric. Metric Factorization builds automatically interesting basis spaces from a large variety of input metrics. Due to metric properties, the basis spaces can be further explored with standard techniques like Multidimensional Scaling. We relate the Metric Factorization task to data compression and demonstrate how ideas from information theory (Minimum Description Length principle) make the parametrization of MF optional. We further introduce the idea of landmark points to effectively compress and thus support large data sets. Extensive experiments demonstrate the benefits of our approach.
Keywords :
data analysis; data compression; data mining; information theory; matrix decomposition; MF optional parametrization; data compression; data mining technique; heterogeneous data type; information theory; metric factorization; minimum description length principle; multidimensional scaling; similarity measure; Data mining; Encoding; Feature extraction; Image color analysis; Measurement; Optimization; Standards; MDL; Matrix factorization; metric data;
Conference_Titel :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.57