DocumentCode :
3105742
Title :
What is the Dimension of Your Binary Data?
Author :
Tatti, Nikolaj ; Mielikäinen, Taneli ; Gionis, Aristides ; Mannila, Heikki
Author_Institution :
Dept. of Comput. Sci., Univ. of Helsinki, Helsinki
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
603
Lastpage :
612
Abstract :
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, as such the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension counts the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
Keywords :
data handling; data mining; principal component analysis; binary data; datasets; dependency structure; fractal dimension; Computer science; Data analysis; Data mining; Fractals; Linear discriminant analysis; Matrix decomposition; Principal component analysis; Random variables; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
ISSN :
1550-4786
Print_ISBN :
0-7695-2701-7
Type :
conf
DOI :
10.1109/ICDM.2006.167
Filename :
4053086