  • DocumentCode
    3226424
  • Title
    A Model Conditioned Data Compression Based Similarity Measure
  • Author
    Cerra, D.; Datcu, M.
  • Author_Institution
    German Aerospace Center, Wessling
  • fYear
    2008
  • fDate
    25-27 March 2008
  • Firstpage
    509
  • Lastpage
    509
  • Abstract
    Many methodologies and similarity measures based on data compression have recently been introduced to compute similarities between general kinds of data. Two important similarity indices are the normalized information distance (NID), with its approximation, the normalized compression distance (NCD), and pattern recognition based on data compression (PRDC). At first sight NCD and PRDC are quite different: the former is a direct metric, while the latter is a methodology that computes a compression distance through an intermediate step of encoding files into texts. Despite this, it can be shown that both are based on estimates of Kolmogorov complexity (this was already known for the former but not for the latter). This result leads to the definition of a new measure, the model conditioned data compression based similarity measure (McDCSM), a modified version of PRDC that is the topic of this paper.
  • Keywords
    data compression; encoding; Kolmogorov complexities; model conditioned data compression based similarity measure; normalized compression distance; normalized information distance; pattern recognition based on data compression; Data compression; Dictionaries; Encoding; Equations; Length measurement; Machine intelligence; Pattern analysis; Pattern recognition; Remote sensing; Satellites; Kolmogorov Complexity; Normalized Compression Distance; Similarity Measure
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Compression Conference, 2008. DCC 2008
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-0-7695-3121-2
  • Type
    conf
  • DOI
    10.1109/DCC.2008.46
  • Filename
    4483336
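The abstract names the normalized compression distance (NCD) as the computable approximation of NID. A minimal sketch of NCD as it is commonly defined, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), is shown below. It assumes zlib as the compressor C, which is an illustrative choice not taken from this record, and the helper names `compressed_size` and `ncd` are hypothetical. It does not implement PRDC or McDCSM, whose details are not given here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Compressed lengths stand in for the (uncomputable) Kolmogorov
    complexities used by the normalized information distance.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4

print(ncd(a, b))  # similar inputs: expected to be relatively small
print(ncd(a, c))  # unrelated inputs: expected to be close to 1
```

Lower values indicate that knowing one input makes the other cheaper to compress. With a real compressor the values are only approximate and can fall slightly outside [0, 1].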