DocumentCode :
659588
Title :
Exploring sketches for probability estimation with sublinear memory
Author :
Kleerekoper, Anthony ; Lujan, Mikel ; Brown, G.
Author_Institution :
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
79
Lastpage :
86
Abstract :
As data sets become ever larger it becomes increasingly complex to apply traditional machine learning techniques to them. Feature selection can greatly reduce the computational requirements of machine learning but it too can be memory intensive. In this paper we explore the use of succinct data structures called sketches for probability estimation as a component of information theoretic feature selection. These data structures are sublinear in the number of items but were designed only for estimating the frequency of the most frequent items. To the best of our knowledge this is the first time they have been examined for estimating the frequency of all items and we find that often some information theoretic measures can be estimated to within a few percent of the correct values.
Keywords :
data handling; data structures; information theory; learning (artificial intelligence); probability; computational requirements; data sets; frequent items; information theoretic feature selection; information theoretic measures; machine learning techniques; probability estimation; sublinear memory; succinct data structures; Entropy; Equations; Estimation; Frequency estimation; Mathematical model; Radiation detectors; Standards; big data; information theoretic feature selection; machine learning; memory efficiency; sketch data structures;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691737
Filename :
6691737
Link To Document :
بازگشت