• DocumentCode
    244971
  • Title
    Metric Factorization for Exploratory Analysis of Complex Data
  • Author
    Plant, Claudia
  • Author_Institution
    Helmholtz Zentrum München, Tech. Univ. München, Munich, Germany
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    510
  • Lastpage
    519
  • Abstract
    How to explore complex data? Often, several representations for each data object are available, the data are described by attributes of heterogeneous data type and/or each data object is characterized by many features. It is difficult to choose a suitable similarity measure and an appropriate data mining technique to get an unbiased overview of the information contained in complex data. In this paper, we introduce Metric Factorization as a novel data mining task. The goal of Metric Factorization is to discover the major alternative views of complex data. Our novel algorithm MF extends matrix factorization techniques to support metric data. We do not need to choose a single similarity measure but can just input any available metric. Metric Factorization automatically builds interesting basis spaces from a large variety of input metrics. Due to metric properties, the basis spaces can be further explored with standard techniques like Multidimensional Scaling. We relate the Metric Factorization task to data compression and demonstrate how ideas from information theory (Minimum Description Length principle) make the parametrization of MF optional. We further introduce the idea of landmark points to effectively compress and thus support large data sets. Extensive experiments demonstrate the benefits of our approach.
  • Keywords
    data analysis; data compression; data mining; information theory; matrix decomposition; MF optional parametrization; data mining technique; heterogeneous data type; metric factorization; minimum description length principle; multidimensional scaling; similarity measure; Encoding; Feature extraction; Image color analysis; Measurement; Optimization; Standards; MDL; Matrix factorization; metric data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.57
  • Filename
    7023368