DocumentCode
244971
Title
Metric Factorization for Exploratory Analysis of Complex Data
Author
Plant, Claudia
Author_Institution
Helmholtz Zentrum München, Tech. Univ. München, Munich, Germany
fYear
2014
fDate
14-17 Dec. 2014
Firstpage
510
Lastpage
519
Abstract
How can complex data be explored? Often, several representations of each data object are available, the data are described by attributes of heterogeneous data types, and/or each data object is characterized by many features. It is difficult to choose a suitable similarity measure and an appropriate data mining technique to obtain an unbiased overview of the information contained in complex data. In this paper, we introduce Metric Factorization as a novel data mining task. The goal of Metric Factorization is to discover the major alternative views of complex data. Our novel algorithm MF extends matrix factorization techniques to support metric data. We do not need to choose a single similarity measure but can simply input any available metric. Metric Factorization automatically builds interesting basis spaces from a wide variety of input metrics. Owing to their metric properties, the basis spaces can be further explored with standard techniques such as Multidimensional Scaling. We relate the Metric Factorization task to data compression and demonstrate how ideas from information theory (the Minimum Description Length principle) make the parametrization of MF optional. We further introduce landmark points to compress the data effectively and thus support large data sets. Extensive experiments demonstrate the benefits of our approach.
Keywords
data analysis; data compression; data mining; information theory; matrix decomposition; MF optional parametrization; data mining technique; heterogeneous data type; metric factorization; minimum description length principle; multidimensional scaling; similarity measure; Encoding; Feature extraction; Image color analysis; Measurement; Optimization; Standards; MDL; Matrix factorization; metric data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location
Shenzhen
ISSN
1550-4786
Print_ISBN
978-1-4799-4303-6
Type
conf
DOI
10.1109/ICDM.2014.57
Filename
7023368