

Session

Other Applications 1


Wed 11 July 4:30 - 4:50 PDT

Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design

Ahmed M. Alaa · Mihaela van der Schaar

Estimating heterogeneous treatment effects from observational data is a central problem in many domains. Because counterfactual data is inaccessible, the problem differs fundamentally from supervised learning and entails a more complex set of modeling choices. Despite a variety of recently proposed algorithmic solutions, a principled guideline for building estimators of treatment effects using machine learning algorithms is still lacking. In this paper, we provide such a guideline by characterizing the fundamental limits of estimating heterogeneous treatment effects, and establishing conditions under which these limits can be achieved. Our analysis reveals that the relative importance of the different aspects of observational data varies with the sample size. For instance, we show that selection bias matters only in small-sample regimes, whereas with a large sample size, the way an algorithm models the control and treated outcomes is what bottlenecks its performance. Guided by our analysis, we build a practical algorithm for estimating treatment effects using a non-stationary Gaussian process with doubly-robust hyperparameters. Using a standard semi-synthetic simulation setup, we show that our algorithm outperforms the state-of-the-art, and that the behavior of existing algorithms conforms with our analysis.
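
To make the basic estimation setup concrete, the sketch below fits separate Gaussian-process outcome models for treated and control units on synthetic data and takes their difference as the treatment-effect estimate. This is only a generic two-model baseline, not the authors' non-stationary Gaussian process with doubly-robust hyperparameters; the data-generating process and variable names are illustrative assumptions.

```python
# Minimal two-model (T-learner) sketch of heterogeneous treatment effect estimation
# with Gaussian processes. Illustrative only; NOT the non-stationary GP with
# doubly-robust hyperparameters proposed in the paper. Data below is synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))                      # covariates
T = rng.binomial(1, 0.5, size=n)                 # treatment assignment (no selection bias here)
tau = X[:, 0]                                    # true heterogeneous effect (illustrative)
Y = X.sum(axis=1) + T * tau + rng.normal(scale=0.5, size=n)  # observed factual outcome

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.25)
gp_treated = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[T == 1], Y[T == 1])
gp_control = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[T == 0], Y[T == 0])

# Estimated conditional average treatment effect: difference of the two outcome models.
cate_hat = gp_treated.predict(X) - gp_control.predict(X)
print("RMSE of CATE estimate:", np.sqrt(np.mean((cate_hat - tau) ** 2)))
```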

Wed 11 July 4:50 - 5:10 PDT

Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization

Hang Wu · May Wang

Off-policy learning, the task of evaluating and improving policies using historic data collected from a logging policy, is important because on-policy evaluation is usually expensive and has adverse impacts. One of the major challenges of off-policy learning is to derive counterfactual estimators that also have low variance and thus low generalization error. In this work, inspired by learning bounds for importance sampling problems, we present a new counterfactual learning principle for off-policy learning with bandit feedback. Our method regularizes the generalization error by minimizing the distribution divergence between the logging policy and the new policy, and removes the need to iterate through all training samples to compute the sample-variance regularization used in prior work. With neural network policies, our end-to-end training algorithm using variational divergence minimization shows significant improvement over conventional baseline algorithms and is also consistent with our theoretical results.
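
For context, here is a minimal sketch of off-policy learning from logged bandit feedback using an inverse-propensity-scored (IPS) objective, with a simple variance penalty standing in for the divergence regularizer. It is not the paper's variational divergence minimization procedure; the policy architecture and synthetic logged data are illustrative assumptions.

```python
# Minimal sketch of counterfactual (off-policy) learning from logged bandit feedback
# via an inverse-propensity-scored (IPS) risk plus a simple variance penalty. This is
# a generic baseline, not the variational divergence minimization method of the paper.
import torch
import torch.nn as nn

def ips_objective(policy, contexts, actions, rewards, logging_probs, lam=0.1):
    """IPS estimate of expected reward under `policy`, penalized by the empirical
    variance of the importance weights (a stand-in for the divergence regularizer)."""
    logits = policy(contexts)                                       # (batch, n_actions)
    log_pi = torch.log_softmax(logits, dim=-1)
    pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1).exp()  # pi(a | x)
    weights = pi_a / logging_probs                                  # importance weights pi / mu
    ips_reward = (weights * rewards).mean()
    variance_penalty = weights.var()
    return -(ips_reward - lam * variance_penalty)                   # minimize negative objective

# Illustrative usage with a small neural policy and synthetic logged data.
n, d, n_actions = 1024, 10, 4
policy = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_actions))
contexts = torch.randn(n, d)
actions = torch.randint(0, n_actions, (n,))
rewards = torch.rand(n)
logging_probs = torch.full((n,), 1.0 / n_actions)                   # uniform logging policy

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ips_objective(policy, contexts, actions, rewards, logging_probs)
    loss.backward()
    opt.step()
```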

Wed 11 July 5:10 - 5:20 PDT

An Estimation and Analysis Framework for the Rasch Model

Andrew Lan · Mung Chiang · Christoph Studer

The Rasch model is widely used for item response analysis in applications ranging from recommender systems to psychology, education, and finance. While a number of estimators have been proposed for the Rasch model over the last decades, the associated analytical performance guarantees are mostly asymptotic. This paper provides a framework that relies on a novel linear minimum mean-squared error (L-MMSE) estimator which enables an exact, nonasymptotic, and closed-form analysis of the parameter estimation error under the Rasch model. The proposed framework provides guidelines on the number of items and responses required to attain low estimation errors in tests or surveys. We furthermore demonstrate its efficacy on a number of real-world collaborative filtering datasets, which reveals that the proposed L-MMSE estimator performs on par with state-of-the-art nonlinear estimators in terms of predictive performance.
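
For readers unfamiliar with the model, the sketch below simulates Rasch responses, P(correct) = sigmoid(ability − difficulty), and recovers the parameters by plain joint maximum likelihood via logistic regression over one-hot user/item indicators. This is a generic baseline, not the L-MMSE estimator analyzed in the paper; the data sizes are illustrative.

```python
# Minimal sketch of the Rasch model, P(correct) = sigmoid(ability - difficulty),
# fit by joint maximum likelihood with a plain logistic regression. Not the
# L-MMSE estimator from the paper; synthetic data is for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_users, n_items = 200, 50
ability = rng.normal(size=n_users)
difficulty = rng.normal(size=n_items)

# Generate a full response matrix and flatten it into (user, item, response) triples.
users, items = np.meshgrid(np.arange(n_users), np.arange(n_items), indexing="ij")
users, items = users.ravel(), items.ravel()
p = 1.0 / (1.0 + np.exp(-(ability[users] - difficulty[items])))
responses = rng.binomial(1, p)

# Design matrix: +1 indicator for the responding user, -1 indicator for the item.
X = np.zeros((len(users), n_users + n_items))
X[np.arange(len(users)), users] = 1.0
X[np.arange(len(items)), n_users + items] = -1.0

# Large C makes the L2 penalty negligible, approximating the joint MLE.
model = LogisticRegression(C=1e6, fit_intercept=False, max_iter=1000).fit(X, responses)
ability_hat = model.coef_[0][:n_users]
difficulty_hat = model.coef_[0][n_users:]
print("ability correlation:", np.corrcoef(ability, ability_hat)[0, 1])
```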

Wed 11 July 5:20 - 5:30 PDT

End-to-end Active Object Tracking via Reinforcement Learning

Wenhan Luo · Peng Sun · Fangwei Zhong · Wei Liu · Tong Zhang · Yizhou Wang

We study active object tracking, where a tracker takes the visual observation (i.e., a frame sequence) as input and produces the camera control signal (e.g., move forward, turn left, etc.). Conventional methods tackle tracking and camera control separately, which is challenging to tune jointly and incurs substantial human effort for labeling as well as expensive trial-and-error in the real world. To address these issues, we propose an end-to-end solution via deep reinforcement learning, in which a ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, both of which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) generalizes well to unseen object trajectories, unseen object appearances, unseen backgrounds, and distracting objects, and can recover tracking when it occasionally loses the target. In experiments on the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
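
As a rough illustration of the architecture described above (not the authors' exact network or training setup), the sketch below defines a ConvNet-LSTM function approximator that maps a short frame sequence to discrete camera-action logits and a value estimate. Layer sizes, input resolution, and the action set are assumptions; the RL training loop, environment augmentation, and reward function are omitted.

```python
# Minimal sketch of a ConvNet-LSTM policy that maps a frame sequence directly to
# discrete camera-control actions (e.g. move-forward / turn-left / turn-right / no-op).
# Layer sizes and the action set are illustrative; RL training, environment
# augmentation, and reward shaping described in the paper are omitted.
import torch
import torch.nn as nn

class ConvLSTMTracker(nn.Module):
    def __init__(self, n_actions=4, hidden_size=256):
        super().__init__()
        self.encoder = nn.Sequential(                 # per-frame visual encoder
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 20x20 -> 9x9 feature maps, so the flattened size is 32 * 9 * 9.
        self.lstm = nn.LSTM(input_size=32 * 9 * 9, hidden_size=hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, n_actions)   # action logits
        self.value_head = nn.Linear(hidden_size, 1)             # state value (for actor-critic)

    def forward(self, frames):
        # frames: (batch, time, 3, 84, 84)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.policy_head(out), self.value_head(out)

logits, values = ConvLSTMTracker()(torch.randn(2, 8, 3, 84, 84))
action = logits[:, -1].argmax(dim=-1)                 # greedy action for the latest frame
```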