Yuan Zhou and Xi Chen and Jian Li
We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $N$-armed bandit game. Instead of using existing evaluation metrics (e.g., misidentification probability or the metric in EXPLORE-K), we propose to use the aggregate regret, defined as the gap between the average reward of the optimal solution and that of our solution. Besides being a natural metric in itself, we argue that in many applications, such as our motivating example from crowdsourcing, the aggregate regret bound is more suitable. We propose a new PAC algorithm, which, with probability at least $1-\delta$, identifies a set of $K$ arms with regret at most $\epsilon$. We provide a sample complexity bound for our algorithm. To complement this, we establish a lower bound and show that the sample complexity of our algorithm matches it. Finally, we report experimental results on both synthetic and real data sets, which demonstrate the superior performance of the proposed algorithm.
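In symbols (notation ours, not necessarily the paper's): if $\mu_{(1)} \ge \cdots \ge \mu_{(N)}$ denote the expected rewards in sorted order and $T$ is the selected set of $K$ arms, the aggregate regret described above is
\[
\frac{1}{K}\left( \sum_{i=1}^{K} \mu_{(i)} \;-\; \sum_{i \in T} \mu_i \right),
\]
i.e., the gap between the average reward of the best $K$ arms and that of the chosen $K$ arms.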
Yevgeny Seldin and Peter Bartlett and Koby Crammer and Yasin Abbasi-Yadkori
We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice in which, on each round, only the advice of a limited subset of the experts can be queried.
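As an illustration of the limited-advice setting only (not the paper's algorithm), the following sketch runs exponential weights while querying the losses of just $M$ of the $N$ experts each round, using importance-weighted estimates; all function names and parameters here are ours.

import numpy as np

def limited_advice_ew(losses, M, eta, seed=0):
    # Exponential weights under limited advice: each round, follow one
    # expert drawn from the current weights, then query the losses of
    # only M uniformly chosen experts and form an importance-weighted
    # (unbiased) estimate of the full loss vector.
    # Illustrative sketch of the setting, not the paper's algorithm.
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    w = np.ones(N)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()
        total_loss += losses[t, rng.choice(N, p=p)]
        queried = rng.choice(N, size=M, replace=False)
        est = np.zeros(N)
        est[queried] = losses[t, queried] * (N / M)  # E[est] = losses[t]
        w *= np.exp(-eta * est)
    return total_loss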
Cun Mu and Bo Huang and John Wright and Donald Goldfarb
Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms (SNN) of the unfolding matrices of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way $n \times n \times \cdots \times n$ tensor of Tucker rank $(r, r, \ldots, r)$ from Gaussian measurements requires $\Omega(r n^{K-1})$ measurements.
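Written out (notation ours): with $\mathcal{X}_{(i)}$ the mode-$i$ unfolding of the tensor $\mathcal{X}$ and $\mathcal{G}$ the Gaussian measurement operator, the SNN relaxation solves
\[
\min_{\mathcal{X}} \; \sum_{i=1}^{K} \|\mathcal{X}_{(i)}\|_* \quad \text{subject to} \quad \mathcal{G}(\mathcal{X}) = \mathcal{G}(\mathcal{X}_0),
\]
possibly with weights on the summands.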
Haipeng Luo and Robert Schapire
We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case and then moving on to two unknown-horizon settings: one that assumes the horizon is chosen randomly according to some distribution, and one that allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm. For the adversarial horizon setting, we prove a nontrivial lower bound showing that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random horizon setting, we then propose a new adaptive algorithm that ``pretends'' the horizon is drawn from a distribution in a special family, yet achieves worst-case regret of the optimal rate no matter how the actual horizon is chosen. Furthermore, our algorithm can be combined with and applied to many techniques, for instance online convex optimization, follow-the-perturbed-leader, the exponential weights algorithm, and first-order bounds. Experiments show that our algorithm outperforms many existing algorithms in an online linear optimization setting.
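For contrast with the adaptive algorithm described above, here is the classic doubling trick, the standard horizon-unaware baseline (our own sketch; not the paper's method): run exponential weights with a learning rate tuned to the guess $2^k$ and restart with the next guess once it is exceeded.

import numpy as np

def doubling_trick_ew(losses):
    # Classic doubling trick for an unknown horizon: tune the learning
    # rate to the current guess 2^k and restart exponential weights
    # whenever the guess is exceeded. A standard baseline, not the
    # paper's adaptive algorithm.
    T, N = losses.shape
    total_loss, t, k = 0.0, 0, 0
    while t < T:
        guess = 2 ** k
        eta = np.sqrt(8.0 * np.log(N) / guess)
        w = np.ones(N)
        for _ in range(min(guess, T - t)):
            p = w / w.sum()
            total_loss += p @ losses[t]      # expected loss this round
            w *= np.exp(-eta * losses[t])
            t += 1
        k += 1
    return total_loss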
Christopher Tosh and Sanjoy Dasgupta
The mixing time of a Markov chain is the minimum time $t$ necessary for the total variation distance between the distribution of the Markov chain's current state $X_t$ and its stationary distribution to fall below some $\epsilon > 0$. In this paper, we present lower bounds for the mixing time of the Gibbs sampler over Gaussian mixture models with Dirichlet priors.
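In symbols (notation ours): $t_{\mathrm{mix}}(\epsilon) = \min\{\, t \ge 0 : \|\Pr(X_t \in \cdot\,) - \pi\|_{\mathrm{TV}} \le \epsilon \,\}$, where $\pi$ is the stationary distribution.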
Zongzhang Zhang and David Hsu and Wee Sun Lee
The difficulty of POMDP planning depends on the size of the search space involved. Heuristics are often used to reduce the search space size and improve computational efficiency; however, there are few theoretical bounds on their effectiveness. In this paper, we use the covering number to characterize the size of the search space reachable under heuristics and connect the complexity of POMDP planning to the effectiveness of heuristics. With insights from the theoretical analysis, we have developed a practical POMDP algorithm, Packing-Guided Value Iteration (PGVI). Empirically, PGVI is competitive with the state-of-the-art point-based POMDP algorithms on 65 small benchmark problems and outperforms them on 4 larger problems.
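To illustrate the packing idea behind such covering-number arguments (our own sketch; not the PGVI algorithm), the following greedily selects a $\delta$-packing of sampled belief points:

import numpy as np

def greedy_packing(beliefs, delta):
    # Greedily build a delta-packing of belief points (each a probability
    # vector over states): keep a point only if its L1 distance to every
    # previously kept point is at least delta. Covering-number bounds
    # control the size of such packings; this illustrates the idea only.
    kept = []
    for b in beliefs:
        if all(np.abs(b - q).sum() >= delta for q in kept):
            kept.append(b)
    return np.array(kept)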