DocumentCode :
3756622
Title :
Bloom Features
Author :
Ashok Cutkosky;Kwabena Boahen
Author_Institution :
Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA
fYear :
2015
Firstpage :
547
Lastpage :
552
Abstract :
We introduce a method for function-fitting that achieves high accuracy with a low memory footprint. For d-dimensional data and any user-specified m, we define a feature map from d- to m-dimensional Euclidean space with memory footprint O(m) that scales as follows: as m increases, the space of linear functions on our m-dimensional features approximates any MAX (or boolean OR) function on the d-dimensional inputs with expected error inversely proportional to m. Ours is the only method with this scaling that simultaneously runs in O(m) time, processes real-valued inputs, and approximates non-linear functions; these three properties are respectively not achieved by random Fourier features, b-bit Minwise Hashing, and Vowpal Wabbit, three competing methods. We achieve all three by using hashing (O(m) space) to implement a sparse-matrix multiply (O(m) time) with addition replaced by MAX (non-linear approximation). As these techniques are inspired by the Bloom filter, we call the vectors produced by our mapping Bloom features. We demonstrate that the scaling prefactors are reasonable by testing our method on simulated (Dirichlet distributions) and real (MNIST and webspam) datasets.
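A minimal sketch of the mechanism the abstract describes, in Python: each input coordinate is hashed on the fly to k of m output slots, and colliding values are combined with MAX instead of addition. The function name bloom_features, the choice k=2, and the integer-tuple hash are illustrative assumptions, not the paper's exact construction.

import numpy as np

def bloom_features(x, m, k=2, seed=0):
    """Bloom-filter-style feature map: hash each input coordinate to k
    of m output slots, combining collisions with MAX rather than +."""
    phi = np.zeros(m)
    for j, xj in enumerate(x):
        for l in range(k):
            # On-the-fly hash of (seed, coordinate, probe) keeps memory O(m):
            # no d-by-m matrix is ever stored.
            i = hash((seed, j, l)) % m
            phi[i] = max(phi[i], xj)
    return phi

# Usage: a linear function on phi(x) can then approximate MAX/OR
# functions of the original d-dimensional input.
x = np.random.rand(1000)          # d = 1000 real-valued inputs
phi = bloom_features(x, m=64)     # m = 64 Bloom features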
Keywords :
"Sparse matrices","Computer science","Testing","Memory management","Loss measurement","Linear regression","Bars"
Publisher :
ieee
Conference_Title :
2015 International Conference on Computational Science and Computational Intelligence (CSCI)
Type :
conf
DOI :
10.1109/CSCI.2015.144
Filename :
7424153