• DocumentCode
    2188968
  • Title

    A Similarity Measure Using Smallest Context-Free Grammars

  • Author

    Cerra, Daniele ; Datcu, Mihai

  • Author_Institution
    Remote Sensing Technol. Inst., German Aerosp. Center (DLR), Wessling, Germany
  • fYear
    2010
  • fDate
    24-26 March 2010
  • Firstpage
    346
  • Lastpage
    355
  • Abstract
    This work presents a new approximation for the Kolmogorov complexity of strings based on compression with smallest Context Free Grammars (CFG). If, for a given string, a dictionary containing its relevant patterns may be regarded as a model, a Context-Free Grammar may represent a generative model, with all of its rules (and as a consequence its own size) being meaningful. Thus, we define a new complexity approximation which takes into account the size of the string model, in a representation similar to the Minimum Description Length. These considerations result in the definition of a new compression-based similarity measure: its novelty lies in the fact that the impact of complexity overestimations, due to the limits that a real compressor has, can be accounted for and decreased.
  • Keywords
    computational complexity; context-free grammars; string matching; CFG; Kolmogorov complexity; complexity approximation; complexity overestimations; compression-based similarity measure; context-free grammars; dictionary; generative model; minimum description length; string model; Clustering algorithms; Context modeling; Data compression; Data mining; Data structures; Dictionaries; Entropy; Image coding; Information theory; Remote sensing; Compression-based Similarity Measure; Context Free Grammars; Kolmogorov Complexity;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference (DCC), 2010
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-4244-6425-8
  • Electronic_ISBN
    1068-0314
  • Type

    conf

  • DOI
    10.1109/DCC.2010.37
  • Filename
    5453478