Title :
Exploring sketches for probability estimation with sublinear memory
Author :
Kleerekoper, Anthony ; Lujan, Mikel ; Brown, G.
Author_Institution :
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
Abstract :
As data sets become ever larger it becomes increasingly complex to apply traditional machine learning techniques to them. Feature selection can greatly reduce the computational requirements of machine learning but it too can be memory intensive. In this paper we explore the use of succinct data structures called sketches for probability estimation as a component of information theoretic feature selection. These data structures are sublinear in the number of items but were designed only for estimating the frequency of the most frequent items. To the best of our knowledge this is the first time they have been examined for estimating the frequency of all items and we find that often some information theoretic measures can be estimated to within a few percent of the correct values.
Keywords :
data handling; data structures; information theory; learning (artificial intelligence); probability; computational requirements; data sets; frequent items; information theoretic feature selection; information theoretic measures; machine learning techniques; probability estimation; sublinear memory; succinct data structures; Entropy; Equations; Estimation; Frequency estimation; Mathematical model; Radiation detectors; Standards; big data; information theoretic feature selection; machine learning; memory efficiency; sketch data structures;
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691737