DocumentCode
659588
Title
Exploring sketches for probability estimation with sublinear memory
Author
Kleerekoper, Anthony ; Lujan, Mikel ; Brown, G.
Author_Institution
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
79
Lastpage
86
Abstract
As data sets become ever larger it becomes increasingly complex to apply traditional machine learning techniques to them. Feature selection can greatly reduce the computational requirements of machine learning but it too can be memory intensive. In this paper we explore the use of succinct data structures called sketches for probability estimation as a component of information theoretic feature selection. These data structures are sublinear in the number of items but were designed only for estimating the frequency of the most frequent items. To the best of our knowledge this is the first time they have been examined for estimating the frequency of all items and we find that often some information theoretic measures can be estimated to within a few percent of the correct values.
Keywords
data handling; data structures; information theory; learning (artificial intelligence); probability; computational requirements; data sets; frequent items; information theoretic feature selection; information theoretic measures; machine learning techniques; probability estimation; sublinear memory; succinct data structures; Entropy; Equations; Estimation; Frequency estimation; Mathematical model; Radiation detectors; Standards; big data; information theoretic feature selection; machine learning; memory efficiency; sketch data structures;
fLanguage
English
Publisher
ieee
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691737
Filename
6691737
Link To Document