This paper solves the classical two-armed-bandit problem under the finite-memory constraint described below. Given are probability densities $p_0$ and $p_1$, and two experiments $A$ and $B$. It is not known which density is associated with which experiment. Thus the experimental outcome $Y$ of experiment $A$ is as likely to be distributed according to $p_0$ as it is to be distributed according to $p_1$. It is desired to sequentially choose an experiment to be performed on the basis of past observations according to the algorithm $T_n = T(T_{n-1}, e_n, Y_n)$, $e_n = e(T_{n-1})$, where $T_n \in \{1, 2, \ldots, m\}$ is the state of memory at time $n$, $e_n \in \{A, B\}$ is the choice of experiment, and $Y_n$ is the random variable observation. The goal is to maximize the asymptotic proportion $r$ of uses of the experiment associated with density $p_0$
. Let $l(y) = p_0(y)/p_1(y)$, and let $\underline{l}$ and $\bar{l}$ denote the almost everywhere greatest lower bound and least upper bound on $l(y)$. Let $\gamma = \max\{\bar{l}, 1/\underline{l}\}$. Then the optimal value of $r$, over all $m$-state algorithms $(T, e)$, will be shown to be $\gamma^{m-1}/(\gamma^{m-1} + 1)$. An $\epsilon$-optimal family of $m$-state algorithms will be demonstrated. In general, optimal algorithms do not exist, and $\epsilon$-optimal algorithms require artificial randomization.
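
For concreteness, the sketch below instantiates the setup with Bernoulli densities and checks the claimed optimum numerically. Everything here is an illustrative assumption rather than the paper's construction: the success probabilities `q0` and `q1`, the switch probability `delta`, and the two-state switch-on-failure rule are all hypothetical, chosen only to show how a finite-memory update $T_n = T(T_{n-1}, e_n, Y_n)$ can be simulated and compared against $\gamma^{m-1}/(\gamma^{m-1} + 1)$ for $m = 2$.

```python
import random

def gamma(q0, q1):
    """Discrimination parameter for Bernoulli densities with success
    probabilities q0 and q1: the likelihood ratio l(y) = p0(y)/p1(y)
    takes the values q0/q1 (at y = 1) and (1-q0)/(1-q1) (at y = 0),
    so gamma = max{sup l, 1/inf l}."""
    lo = min(q0 / q1, (1 - q0) / (1 - q1))
    hi = max(q0 / q1, (1 - q0) / (1 - q1))
    return max(hi, 1 / lo)

def optimal_r(q0, q1, m):
    """Claimed optimal asymptotic proportion r over m-state algorithms."""
    g = gamma(q0, q1) ** (m - 1)
    return g / (g + 1)

def simulate_two_state(q_good, q_bad, delta=0.1, steps=1_000_000, seed=0):
    """Hypothetical m = 2 rule: state 0 plays one arm, state 1 the other;
    on a failure, switch states with probability delta (a randomized
    transition). Returns the empirical proportion of better-arm pulls."""
    rng = random.Random(seed)
    arms = (q_good, q_bad)          # which arm is better is unknown to the rule
    state, good_pulls = 0, 0
    for _ in range(steps):
        good_pulls += (state == 0)
        success = rng.random() < arms[state]
        if not success and rng.random() < delta:
            state = 1 - state       # randomized switch on failure
    return good_pulls / steps

if __name__ == "__main__":
    q0, q1 = 0.9, 0.5               # illustrative Bernoulli parameters
    print("gamma          :", gamma(q0, q1))          # 5.0 for these values
    print("optimal r, m=2 :", optimal_r(q0, q1, 2))   # 5/6 ~ 0.8333
    print("simulated r    :", simulate_two_state(q0, q1))
```

For these particular parameters the failure-likelihood ratio $(1-q_1)/(1-q_0)$ is the binding term in $\gamma$, so this simple rule happens to approach the $m = 2$ bound; in general the $\epsilon$-optimal algorithms of the abstract are more delicate, and, as noted above, require artificial randomization.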