At each instant of time we are required to sample a fixed number $m$ out of $N$ i.i.d. processes whose distributions belong to a family suitably parameterized by a real number $\theta$. The objective is to maximize the long-run total expected value of the samples. Following Lai and Robbins, the learning loss of a sampling scheme corresponding to a configuration of parameters $C = (\theta_1, \ldots, \theta_N)$ is quantified by the regret $R_n(C)$. This is the difference between the maximum expected reward at time $n$ that could be achieved if $C$ were known and the expected reward actually obtained by the sampling scheme. We provide a lower bound for the regret associated with any uniformly good scheme, and construct a scheme which attains the lower bound for every configuration $C$. The lower bound is given explicitly in terms of the Kullback-Leibler number between pairs of distributions. Part II of this paper considers the same problem when the reward processes are Markovian.
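For concreteness, one way to write the regret described above is the following sketch. The notation here is illustrative rather than taken from the abstract: $\theta_{\sigma(1)} \ge \cdots \ge \theta_{\sigma(N)}$ denote the parameters arranged in decreasing order, $T_n(j)$ is the number of times process $j$ is sampled up to time $n$, and a process with parameter $\theta_j$ is assumed to have mean reward $\theta_j$. Then
\[
  R_n(C) \;=\; n \sum_{j=1}^{m} \theta_{\sigma(j)} \;-\; \sum_{j=1}^{N} \theta_j \, \mathbb{E}\bigl[T_n(j)\bigr],
\]
since sampling $m$ processes at each of $n$ instants with full knowledge of $C$ would yield expected reward $n \sum_{j=1}^{m} \theta_{\sigma(j)}$.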
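Under the same illustrative notation, writing $I(\theta, \lambda)$ for the Kullback-Leibler number between the distributions with parameters $\theta$ and $\lambda$, a lower bound of the Lai-Robbins type for uniformly good schemes takes the form (a sketch only; the precise assumptions and statement are those given in the paper)
\[
  \liminf_{n \to \infty} \frac{R_n(C)}{\log n} \;\ge\; \sum_{j : \theta_j < \theta_{\sigma(m)}} \frac{\theta_{\sigma(m)} - \theta_j}{I\bigl(\theta_j, \theta_{\sigma(m)}\bigr)}.
\]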