Many reinforcement learning (RL) applications have combinatorial action spaces, where each action is a composition of sub-actions. A standard RL approach ignores this inherent factorization structure, resulting in a potential failure to make meaningful inferences about rarely observed sub-action combinations; this is particularly problematic for offline settings, where data may be limited. In this work, we propose a form of linear Q-function decomposition induced by factored action spaces. We study the theoretical properties of our approach, identifying scenarios where it is guaranteed to lead to zero bias when used to approximate the Q-function. Outside the regimes with theoretical guarantees, we show that our approach can still be useful because it leads to better sample efficiency without necessarily sacrificing policy optimality, allowing us to achieve a better bias-variance trade-off. Across several offline RL problems using simulators and real-world datasets motivated by healthcare problems, we demonstrate that incorporating factored action spaces into value-based RL can result in better-performing policies. Our approach can help an agent make more accurate inferences within under-explored regions of the state-action space when applying RL to observational datasets.
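To make the idea concrete, here is a minimal sketch (not the paper's implementation; all names such as `n_states`, `q1`, `q2` are illustrative) of a linear Q-function decomposition over a two-factor action space, Q(s, (a1, a2)) ≈ q1[s, a1] + q2[s, a2]. It fits the per-sub-action terms by least squares on the observed action combinations only, then extrapolates to a combination never seen in the data; because the ground-truth Q in this toy example is itself additive, the factored model recovers the unseen combination with zero bias, mirroring the scenario where the decomposition is guaranteed to be unbiased.

```python
import itertools
import numpy as np

n_states, n_a1, n_a2 = 3, 2, 2

rng = np.random.default_rng(0)
# A ground-truth Q that happens to be additive in the sub-actions.
true_q1 = rng.normal(size=(n_states, n_a1))
true_q2 = rng.normal(size=(n_states, n_a2))

def true_q(s, a1, a2):
    return true_q1[s, a1] + true_q2[s, a2]

# Suppose the offline data never contains the combination (a1=1, a2=1).
observed = [(a1, a2) for a1, a2 in itertools.product(range(n_a1), range(n_a2))
            if not (a1 == 1 and a2 == 1)]

# Fit the factored model by least squares on the observed combinations only.
q1 = np.zeros((n_states, n_a1))
q2 = np.zeros((n_states, n_a2))
for s in range(n_states):
    # Design matrix: one-hot(a1) concatenated with one-hot(a2).
    X = np.zeros((len(observed), n_a1 + n_a2))
    y = np.zeros(len(observed))
    for i, (a1, a2) in enumerate(observed):
        X[i, a1] = 1.0
        X[i, n_a1 + a2] = 1.0
        y[i] = true_q(s, a1, a2)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    q1[s], q2[s] = w[:n_a1], w[n_a1:]

# The factored model extrapolates to the never-observed combination (1, 1)
# with zero error: for any additive Q, Q(1,1) = Q(1,0) + Q(0,1) - Q(0,0),
# and the fitted sums match Q exactly on all observed combinations.
s = 0
pred = q1[s, 1] + q2[s, 1]
print(abs(pred - true_q(s, 1, 1)) < 1e-8)
```

When the true Q-function is not additive, the same estimator trades a small bias for lower variance, which is the bias-variance trade-off discussed in the abstract.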
Author Information
Shengpu Tang (University of Michigan)

Shengpu Tang is a PhD candidate in the computer science department at the University of Michigan. He is a member of the Machine Learning for Data-Driven Decisions (MLD3) research group led by Jenna Wiens. His current research focuses on developing computational methods that help solve important problems in healthcare, such as risk stratification and dynamic treatment recommendations. More generally, he is interested in broader applications of AI/ML, reinforcement learning and graph mining, computer game design, self-driving cars, security and hacking, as well as teaching. For more details, see his website at https://shengpu-tang.me/
Maggie Makar (University of Michigan)
Michael Sjoding (University of Michigan)
Finale Doshi-Velez (Harvard University)

Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability. Selected Additional Shinies: BECA recipient, AFOSR YIP and NSF CAREER recipient; Sloan Fellow; IEEE AI Top 10 to Watch
Jenna Wiens (University of Michigan)
More from the Same Authors
2021 : Promises and Pitfalls of Black-Box Concept Learning Models »
Anita Mahinpei · Justin Clark · Isaac Lage · Finale Doshi-Velez · Weiwei Pan -
2021 : Prediction-focused Mixture Models »
Abhishek Sharma · Sanjana Narayanan · Catherine Zeng · Finale Doshi-Velez -
2021 : Online structural kernel selection for mobile health »
Eura Shin · Predrag Klasnja · Susan Murphy · Finale Doshi-Velez -
2021 : Interpretable learning-to-defer for sequential decision-making »
Shalmali Joshi · Sonali Parbhoo · Finale Doshi-Velez -
2021 : Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings »
Shengpu Tang · Jenna Wiens -
2021 : On formalizing causal off-policy sequential decision-making »
Sonali Parbhoo · Shalmali Joshi · Finale Doshi-Velez -
2022 : Causally motivated multi-shortcut identification and removal »
Jiayun Zheng · Maggie Makar -
2022 : Fairness and robustness in anti-causal prediction »
Maggie Makar · Alexander D'Amour -
2022 : From Soft Trees to Hard Trees: Gains and Losses »
Xin Zeng · Jiayu Yao · Finale Doshi-Velez · Weiwei Pan -
2022 : Success of Uncertainty-Aware Deep Models Depends on Data Manifold Geometry »
Mark Penrod · Harrison Termotto · Varshini Reddy · Jiayu Yao · Finale Doshi-Velez · Weiwei Pan -
2023 : Towards Modular Machine Learning Pipelines »
Aditya Modi · JIVAT NEET KAUR · Maggie Makar · Pavan Mallapragada · Amit Sharma · Emre Kiciman · Adith Swaminathan -
2023 : Why do universal adversarial attacks work on large language models?: Geometry might be the answer »
Varshini Subhash · Anna Bialas · Siddharth Swaroop · Weiwei Pan · Finale Doshi-Velez -
2023 : Implications of Gaussian process kernel mismatch for out-of-distribution data »
Beau Coker · Finale Doshi-Velez -
2023 : Leveraging Factored Action Spaces for Off-Policy Evaluation »
Aaman Rebello · Shengpu Tang · Jenna Wiens · Sonali Parbhoo -
2023 : Inverse Transition Learning for Characterizing Near-Optimal Dynamics in Offline Reinforcement Learning »
Leo Benac · Sonali Parbhoo · Finale Doshi-Velez -
2023 : Discovering User Types: Characterization of User Traits by Task-Specific Behaviors in Reinforcement Learning »
Lars L. Ankile · Brian Ham · Kevin Mao · Eura Shin · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan -
2023 : Adaptive interventions for both accuracy and time in AI-assisted human decision making »
Siddharth Swaroop · Zana Buçinca · Krzysztof Gajos · Finale Doshi-Velez -
2023 : SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text »
Charumathi Badrinath · Weiwei Pan · Finale Doshi-Velez -
2023 : Signature Activation: A Sparse Signal View for Holistic Saliency »
Jose Tello Ayala · Akl Fahed · Weiwei Pan · Eugene Pomerantsev · Patrick Ellinor · Anthony Philippakis · Finale Doshi-Velez -
2023 : Implications of kernel mismatch for OOD data »
Beau Coker · Finale Doshi-Velez -
2023 : Soft prompting might be a bug, not a feature »
Luke Bailey · Gustaf Ahdritz · Anat Kleiman · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan -
2023 : Bayesian Inverse Transition Learning for Offline Settings »
Leo Benac · Sonali Parbhoo · Finale Doshi-Velez -
2023 : SCIS 2023 Panel, The Future of Generalization: Scale, Safety and Beyond »
Maggie Makar · Samuel Bowman · Zachary Lipton · Adam Gleave -
2023 Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability »
Yoav Wald · Claudia Shi · Aahlad Puli · Amir Feder · Limor Gultchin · Mark Goldstein · Maggie Makar · Victor Veitch · Uri Shalit -
2023 Poster: The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning »
Sarah Rathnam · Sonali Parbhoo · Weiwei Pan · Susan Murphy · Finale Doshi-Velez -
2023 Poster: Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables »
Yaniv Yacoby · Weiwei Pan · Finale Doshi-Velez -
2022 : Responsible Decision-Making in Batch RL Settings »
Finale Doshi-Velez -
2022 Workshop: Spurious correlations, Invariance, and Stability (SCIS) »
Aahlad Puli · Maggie Makar · Victor Veitch · Yoav Wald · Mark Goldstein · Limor Gultchin · Angela Zhou · Uri Shalit · Suchi Saria -
2021 : RL Explainability & Interpretability Panel »
Ofra Amir · Finale Doshi-Velez · Alan Fern · Zachary Lipton · Omer Gottesman · Niranjani Prasad -
2021 : [01:50 - 02:35 PM UTC] Invited Talk 3: Interpretability in High Dimensions: Concept Bottlenecks and Beyond »
Finale Doshi-Velez -
2021 Poster: Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement »
Andrew Ross · Finale Doshi-Velez -
2021 Oral: Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement »
Andrew Ross · Finale Doshi-Velez -
2021 Poster: State Relevance for Off-Policy Evaluation »
Simon Shen · Yecheng Jason Ma · Omer Gottesman · Finale Doshi-Velez -
2021 Spotlight: State Relevance for Off-Policy Evaluation »
Simon Shen · Yecheng Jason Ma · Omer Gottesman · Finale Doshi-Velez -
2020 : Keynote #2 Finale Doshi-Velez »
Finale Doshi-Velez -
2020 Poster: Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies »
Shengpu Tang · Aditya Modi · Michael Sjoding · Jenna Wiens -
2020 Poster: Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions »
Omer Gottesman · Joseph Futoma · Yao Liu · Sonali Parbhoo · Leo Celi · Emma Brunskill · Finale Doshi-Velez -
2019 Poster: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez -
2019 Oral: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez -
2018 Poster: Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning »
Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft -
2018 Poster: Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors »
Soumya Ghosh · Jiayu Yao · Finale Doshi-Velez -
2018 Oral: Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors »
Soumya Ghosh · Jiayu Yao · Finale Doshi-Velez -
2018 Oral: Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning »
Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft -
2017 Tutorial: Interpretable Machine Learning »
Been Kim · Finale Doshi-Velez