• DocumentCode
    59928
  • Title

    Normalized Compression Distance of Multisets with Applications

  • Author

    Cohen, Andrew R. ; Vitanyi, Paul M. B.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Drexel Univ., Philadelphia, PA, USA
  • Volume
    37
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 1 2015
  • Firstpage
    1602
  • Lastpage
    1614
  • Abstract
    Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity metric based on compression. We propose an NCD of multisets that is also metric. Previously, attempts to obtain such an NCD failed. For classification purposes it is superior to the pairwise NCD in accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. It is applied to biological (stem cell, organelle transport) and OCR classification questions that were earlier treated with the pairwise NCD. With the new method we achieved significantly better results. The theoretic foundation is Kolmogorov complexity.
  • Keywords
    bioinformatics; computational complexity; image classification; optical character recognition; Kolmogorov complexity; NCD; OCR classification; biological task; multisets; normalized compression distance; organelle transport; pairwise NCD; pairwise normalized compression distance; parameter-free-feature-free-alignment-free similarity metric; stem cell; Accuracy; Additives; Complexity theory; Educational institutions; Measurement; Pattern recognition; Retina; Kolmogorov complexity; Normalized compression distance; classification; data mining; handwritten character recognition; multisets or multiples; organelle transport; pattern recognition; retinal progenitor cells; similarity; synthetic data;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2014.2375175
  • Filename
    6967789