Department of Computer Science | Institute of Theoretical Computer Science

We consider the problem of continuum-armed bandits, where the arms are
indexed by a compact subset of \R^d. For large d, it is well known that
mere smoothness assumptions on the reward functions lead to regret
bounds that suffer from the curse of dimensionality. A typical way to
tackle this in the literature has been to make further assumptions on
the structure of reward functions. In this work we assume the reward
functions to be intrinsically of low dimension k \ll d and consider two
models: (i) the reward functions depend only on an unknown subset of k
coordinate variables, and (ii) a generalization of (i) in which the
reward functions depend on an unknown k-dimensional subspace of \R^d. By
placing suitable assumptions on the smoothness of the rewards we
derive randomized algorithms for both problems that achieve nearly
optimal regret bounds in terms of the number of rounds n.
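
One way to write the two models down, as an illustrative sketch only: the
symbols r, g, i_1, ..., i_k and A are notation assumed here and do not
appear in the text above. A reward function r on the arm set could take
the form

\[
\text{(i)}\quad r(x_1,\dots,x_d) = g(x_{i_1},\dots,x_{i_k}),
\qquad
\text{(ii)}\quad r(x) = g(Ax), \quad A \in \R^{k \times d},
\]

where g is a function of k variables, the index set \{i_1,\dots,i_k\} is
unknown in (i), and the rows of the unknown matrix A span the
k-dimensional subspace in (ii). Under this reading, (i) is the special
case of (ii) in which the rows of A are distinct standard basis vectors.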