Spotlight Poster
Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
Armin Behnamnia · Gholamali Aminian · Alireza Aghaei · Chengchun Shi · Vincent Tan · Hamid R Rabiee
West Exhibition Hall B2-B3 #W-705
In many real-world applications, we want to learn or evaluate decision-making systems (such as recommending products or showing ads) using data that was collected in the past, rather than running new experiments. This setup is called off-policy learning and evaluation. The data usually includes the situation (context), the action taken, how likely that action was to be taken (the propensity score), and the result (feedback or reward).

However, this approach can run into problems, especially when the recorded action probabilities are inaccurate or when the feedback is noisy and heavy-tailed. These issues can make learning unstable and unreliable.

In this work, we propose a new method built on a mathematical tool called the log-sum-exponential (LSE) operator. Compared to standard techniques, our method is more stable and less sensitive to noisy or extreme feedback. We provide mathematical guarantees on how close our method's results are to the best possible outcome, and we characterize how quickly this gap shrinks as more data becomes available.

We also test our method on a variety of tasks. The results show that it performs well in practice, especially in difficult situations where existing methods struggle.
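As a rough illustration (not the paper's exact formulation), the LSE operator can be thought of as replacing the plain average of importance-weighted rewards used by the standard inverse-propensity-score (IPS) estimator with a soft log-sum-exponential average; with a negative temperature parameter, extremely large weighted rewards are damped rather than dominating the estimate. A minimal NumPy sketch under that assumption, where the function names and the parameter lam are illustrative:

    import numpy as np

    def ips_estimate(rewards, target_probs, propensities):
        # Standard IPS estimate of the target policy's value:
        # the plain average of importance-weighted rewards.
        weights = target_probs / propensities
        return np.mean(weights * rewards)

    def lse_estimate(rewards, target_probs, propensities, lam=-1.0):
        # LSE-style estimate: (1/lam) * log(mean(exp(lam * w * r))).
        # With lam < 0, very large weighted rewards are down-weighted,
        # which is the intuition behind the robustness to noisy rewards
        # and inaccurate propensity scores.
        weights = target_probs / propensities
        z = lam * weights * rewards
        # log-sum-exp trick for numerical stability:
        # log(mean(exp(z))) = max(z) + log(mean(exp(z - max(z))))
        m = np.max(z)
        log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
        return log_mean_exp / lam

As lam approaches 0 the LSE average recovers the plain mean, so the standard IPS estimator can be viewed as a limiting case of this sketch.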