DocumentCode :
3756622
Title :
Bloom Features
Author :
Ashok Cutkosky;Kwabena Boahen
Author_Institution :
Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA
fYear :
2015
Firstpage :
547
Lastpage :
552
Abstract :
We introduce a method for function-fitting that achieves high accuracy with a low memory footprint. For d-dimensional data and any user-specified m, we define a feature map from d- to m-dimensional Euclidean space with memory footprint O(m) that scales as follows: as m increases, the space of linear functions on our m-dimensional features approximates any MAX (or boolean OR) function on the d-dimensional inputs with expected error inversely proportional to m. Ours is the only method with this scaling that simultaneously runs in O(m) time, processes real-valued inputs, and approximates non-linear functions; these three properties are respectively not achieved by random Fourier features, b-bit Minwise Hashing, and Vowpal Wabbit, three competing methods. We achieve all three by using hashing (O(m) space) to implement a sparse-matrix multiply (O(m) time) with addition replaced by MAX (non-linear approximation). As these techniques are inspired by the Bloom filter, we call the vectors produced by our mapping Bloom features. We demonstrate that the scaling prefactors are reasonable by testing our method on simulated (Dirichlet distributions) and real (MNIST and webspam) datasets.
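A minimal sketch of the mechanism the abstract describes, in Python: each input coordinate is hashed on the fly to k of m output slots, and colliding values are combined with MAX instead of addition. The function name bloom_features, the choice k=2, and the integer-tuple hash are illustrative assumptions, not the paper's exact construction.

import numpy as np

def bloom_features(x, m, k=2, seed=0):
    """Bloom-filter-style feature map: hash each input coordinate to k
    of m output slots, combining collisions with MAX rather than +."""
    phi = np.zeros(m)
    for j, xj in enumerate(x):
        for l in range(k):
            # On-the-fly hash of (seed, coordinate, probe) keeps memory O(m):
            # no d-by-m matrix is ever stored.
            i = hash((seed, j, l)) % m
            phi[i] = max(phi[i], xj)
    return phi

# Usage: a linear function on phi(x) can then approximate MAX/OR
# functions of the original d-dimensional input.
x = np.random.rand(1000)          # d = 1000 real-valued inputs
phi = bloom_features(x, m=64)     # m = 64 Bloom features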
Keywords :
"Sparse matrices","Computer science","Testing","Memory management","Loss measurement","Linear regression","Bars"
Publisher :
ieee
Conference_Title :
2015 International Conference on Computational Science and Computational Intelligence (CSCI)
Type :
conf
DOI :
10.1109/CSCI.2015.144
Filename :
7424153