Learning Sums of Independent Integer Random Variables

Author

Daskalakis, Constantinos ; Diakonikolas, Ilias ; ODonnell, Ryan ; Servedio, Rocco A. ; Li-Yang Tan

fYear

2013

fDate

26-29 Oct. 2013

Firstpage

217

Lastpage

226

Abstract

Let S = X₁+···+X_n be a sum of n independent integer random variables X_i, where each X_i is supported on {0, 1, ..., k - 1} but otherwise may have an arbitrary distribution (in particular the Xi´s need not be identically distributed). How many samples are required to learn the distribution S to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such S to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of [DDS12b] which gave a similar learning result for the special case k = 2 (when the distribution S is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k = 3. A key difficulty is that, in contrast to the case of k = 2, sums of independent {0, 1, 2}-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated - they are not log-concave, they can be Θ(n)-modal, there is no relationship between Kolmogorov distance and total variation distance for the class, etc. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent {0, 1, ... , k-1}-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the “shift invariance” of the random variables Xi in order to force a discretized normal limit. We believe that our new limit theorem, as the first result for truly arbitrary sums of independent {0, 1, ... - k-1}-valued random variables, is of independent interest.

Keywords

binomial distribution; computational complexity; stochastic processes; Kolmogorov distance; Poisson binomial distribution; computational complexity; independent integer random variables; normal distributions; sums of discrete distribution learning; Accuracy; Approximation methods; Complexity theory; Digital TV; Gaussian distribution; Random variables; Standards; discrete distribution learning; limit theorem; sums of independent integer random variables;

fLanguage

English

Publisher

ieee

Conference_Titel

Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on

Conference_Location

Berkeley, CA

ISSN

0272-5428

Type

conf

DOI

10.1109/FOCS.2013.31

Filename

6686157

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=655188