Off-policy evaluation (OPE) evaluates a target policy using data generated by other policies. Most previous OPE methods focus on precisely estimating the true performance of a policy. We observe that in many applications, (1) the end goal of OPE is to compare two or more candidate policies and choose a good one, which is a much simpler task than precisely evaluating their true performance; and (2) multiple policies have usually already been deployed to serve users in real-world systems, so the true performance of these policies can be known. Inspired by these two observations, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies through supervised learning by leveraging off-policy data and policies with known performance. We propose a method to solve SOPR that learns a policy scoring model by minimizing a ranking loss over the training policies rather than estimating the precise policy performance. The scoring model in our method, a hierarchical Transformer-based model, maps a set of state-action pairs to a score, where the state of each pair comes from the off-policy data and the action is the one the target policy takes on that state in an offline manner. Extensive experiments on public datasets show that our method outperforms baseline methods in terms of rank correlation, regret value, and stability. Our code is publicly available on GitHub.
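To make the ranking objective and the evaluation metrics named in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the pairwise logistic loss is one common choice of ranking loss and may differ from the loss used in the paper, the unnormalized regret@k definition is assumed, and the function names and toy numbers are hypothetical. The hierarchical Transformer scoring model is not reproduced; the scores are simply given as lists.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores, true_returns):
    """Mean logistic loss over all policy pairs with different true returns.

    Assumption: a pairwise logistic loss stands in for the paper's ranking loss.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if true_returns[i] == true_returns[j]:
            continue  # ties carry no ordering information
        # Orient the pair so that `hi` is the truly better policy.
        hi, lo = (i, j) if true_returns[i] > true_returns[j] else (j, i)
        margin = scores[hi] - scores[lo]
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses) if losses else 0.0


def ranks(values):
    """Rank positions (0 = smallest); assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(scores, true_returns):
    """Spearman rank correlation between predicted scores and true returns."""
    n = len(scores)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(scores), ranks(true_returns)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def regret_at_k(scores, true_returns, k=1):
    """Best true return overall minus the best true return among the top-k scored."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return max(true_returns) - max(true_returns[i] for i in top_k)


# Toy data (hypothetical): four policies with known true performance.
true_returns = [10.0, 30.0, 20.0, 40.0]
good_scores = [0.1, 0.7, 0.4, 0.9]  # same ordering as the true returns
bad_scores = [0.9, 0.4, 0.7, 0.1]   # exactly reversed ordering

print(spearman(good_scores, true_returns))     # 1.0
print(spearman(bad_scores, true_returns))      # -1.0
print(regret_at_k(good_scores, true_returns))  # 0.0
print(regret_at_k(bad_scores, true_returns))   # 30.0
print(pairwise_ranking_loss(good_scores, true_returns)
      < pairwise_ranking_loss(bad_scores, true_returns))  # True
```

The loss depends only on the relative order of the scores, not on how close they are to the true returns, which is the sense in which ranking is an easier target than precise performance estimation.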
Author Information
Yue Jin (Tsinghua University)
Yue Zhang (University of Science and Technology of China)
Tao Qin (Microsoft Research Asia)
Xudong Zhang (Tsinghua University)
Jian Yuan (Tsinghua University)
Houqiang Li (University of Science and Technology of China)
Tie-Yan Liu (Microsoft Research Asia)
Tie-Yan Liu is a principal researcher at Microsoft Research Asia, leading the research on artificial intelligence and machine learning. He is well known for his pioneering work on learning to rank and computational advertising, and his recent research interests include deep learning, reinforcement learning, and distributed machine learning. Many of his technologies have been transferred to Microsoft’s products and online services (such as Bing, Microsoft Advertising, and Azure), and open-sourced through Microsoft Cognitive Toolkit (CNTK), Microsoft Distributed Machine Learning Toolkit (DMTK), and Microsoft Graph Engine. He has also actively contributed to academic communities. He is an adjunct/honorary professor at Carnegie Mellon University (CMU), University of Nottingham, and several other universities in China. His papers have been cited tens of thousands of times in refereed conferences and journals. He has won a number of awards, including the best student paper award at SIGIR (2008), the most cited paper award at Journal of Visual Communications and Image Representation (2004-2006), the research break-through award (2012) and research-team-of-the-year award (2017) at Microsoft Research, Top-10 Springer Computer Science books by Chinese authors (2015), and the most cited Chinese researcher by Elsevier (2017). He has been invited to serve as general chair, program committee chair, local chair, or area chair for a dozen top conferences including SIGIR, WWW, KDD, ICML, NIPS, IJCAI, AAAI, ACL, and ICTIR, as well as associate editor of ACM Transactions on Information Systems, ACM Transactions on the Web, and Neurocomputing. Tie-Yan Liu is a fellow of the IEEE, a distinguished member of the ACM, and a vice chair of the CIPS information retrieval technical committee.
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Supervised Off-Policy Ranking
  Tue. Jul 19th 06:45 -- 06:50 PM, Room 309
More from the Same Authors
- 2023 Poster: Retrosynthetic Planning with Dual Value Networks
  Guoqing Liu · Di Xue · Shufang Xie · Yingce Xia · Austin Tripp · Krzysztof Maziarz · Marwin Segler · Tao Qin · Zongzhang Zhang · Tie-Yan Liu
- 2022 Poster: SE(3) Equivariant Graph Neural Networks with Complete Local Frames
  Weitao Du · He Zhang · Yuanqi Du · Qi Meng · Wei Chen · Nanning Zheng · Bin Shao · Tie-Yan Liu
- 2022 Spotlight: SE(3) Equivariant Graph Neural Networks with Complete Local Frames
  Weitao Du · He Zhang · Yuanqi Du · Qi Meng · Wei Chen · Nanning Zheng · Bin Shao · Tie-Yan Liu
- 2022 Poster: Analyzing and Mitigating Interference in Neural Architecture Search
  Jin Xu · Xu Tan · Kaitao Song · Renqian Luo · Yichong Leng · Tao Qin · Tie-Yan Liu · Jian Li
- 2022 Poster: Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent
  Weiming Liu · Huacong Jiang · Bin Li · Houqiang Li
- 2022 Spotlight: Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent
  Weiming Liu · Huacong Jiang · Bin Li · Houqiang Li
- 2022 Spotlight: Analyzing and Mitigating Interference in Neural Architecture Search
  Jin Xu · Xu Tan · Kaitao Song · Renqian Luo · Yichong Leng · Tao Qin · Tie-Yan Liu · Jian Li
- 2021 Poster: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Poster: Large Scale Private Learning via Low-rank Reparametrization
  Da Yu · Huishuai Zhang · Wei Chen · Jian Yin · Tie-Yan Liu
- 2021 Spotlight: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Spotlight: Large Scale Private Learning via Low-rank Reparametrization
  Da Yu · Huishuai Zhang · Wei Chen · Jian Yin · Tie-Yan Liu
- 2021 Poster: The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
  Bohan Wang · Qi Meng · Wei Chen · Tie-Yan Liu
- 2021 Oral: The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
  Bohan Wang · Qi Meng · Wei Chen · Tie-Yan Liu
- 2021 Poster: Temporally Correlated Task Scheduling for Sequence Learning
  Xueqing Wu · Lewen Wang · Yingce Xia · Weiqing Liu · Lijun Wu · Shufang Xie · Tao Qin · Tie-Yan Liu
- 2021 Poster: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
  Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
- 2021 Spotlight: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
  Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
- 2021 Spotlight: Temporally Correlated Task Scheduling for Sequence Learning
  Xueqing Wu · Lewen Wang · Yingce Xia · Weiqing Liu · Lijun Wu · Shufang Xie · Tao Qin · Tie-Yan Liu
- 2020 Poster: On Layer Normalization in the Transformer Architecture
  Ruibin Xiong · Yunchang Yang · Di He · Kai Zheng · Shuxin Zheng · Chen Xing · Huishuai Zhang · Yanyan Lan · Liwei Wang · Tie-Yan Liu
- 2020 Poster: Sequence Generation with Mixed Representations
  Lijun Wu · Shufang Xie · Yingce Xia · Yang Fan · Jian-Huang Lai · Tao Qin · Tie-Yan Liu
- 2019 Poster: MASS: Masked Sequence to Sequence Pre-training for Language Generation
  Kaitao Song · Xu Tan · Tao Qin · Jianfeng Lu · Tie-Yan Liu
- 2019 Poster: Efficient Training of BERT by Progressively Stacking
  Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2019 Poster: Almost Unsupervised Text to Speech and Automatic Speech Recognition
  Yi Ren · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu
- 2019 Oral: Efficient Training of BERT by Progressively Stacking
  Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2019 Oral: MASS: Masked Sequence to Sequence Pre-training for Language Generation
  Kaitao Song · Xu Tan · Tao Qin · Jianfeng Lu · Tie-Yan Liu
- 2019 Oral: Almost Unsupervised Text to Speech and Automatic Speech Recognition
  Yi Ren · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu
- 2018 Poster: Towards Binary-Valued Gates for Robust LSTM Training
  Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2018 Oral: Towards Binary-Valued Gates for Robust LSTM Training
  Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2018 Poster: Model-Level Dual Learning
  Yingce Xia · Xu Tan · Fei Tian · Tao Qin · Nenghai Yu · Tie-Yan Liu
- 2018 Oral: Model-Level Dual Learning
  Yingce Xia · Xu Tan · Fei Tian · Tao Qin · Nenghai Yu · Tie-Yan Liu
- 2017 Poster: Dual Supervised Learning
  Yingce Xia · Tao Qin · Wei Chen · Jiang Bian · Nenghai Yu · Tie-Yan Liu
- 2017 Talk: Dual Supervised Learning
  Yingce Xia · Tao Qin · Wei Chen · Jiang Bian · Nenghai Yu · Tie-Yan Liu