DocumentCode :
1472393
Title :
Information Distance in Multiples
Author :
Vitányi, Paul M B
Author_Institution :
Nat. Res. Center for Math. & Comput. Sci. in the Netherlands, Netherlands
Volume :
57
Issue :
4
fYear :
2011
fDate :
4/1/2011
Firstpage :
2451
Lastpage :
2456
Abstract :
Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.
Keywords :
communication complexity; data mining; information theory; pattern classification; pattern clustering; Kolmogorov complexity; information distance; parameter-free similarity measure; pattern recognition; phylogeny; Additives; Color; Complexity theory; Measurement; Proposals; Turing machines; multiples; similarity
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2011.2110130
Filename :
5730590