Successful negotiators must learn to balance optimizing for self-interest with cooperation. Yet current artificial negotiation agents often depend heavily on the quality of the static datasets they were trained on, limiting their capacity to adaptively balance self-interest and cooperation. As a result, we find that these agents can achieve either high utility or cooperation, but not both. To address this, we introduce a targeted data acquisition framework in which we guide the exploration of a reinforcement learning agent using annotations from an expert oracle. The guided exploration incentivizes the learning agent to go beyond its static dataset and develop new negotiation strategies. We show that this enables our agents to obtain higher rewards and more nearly Pareto-optimal solutions when negotiating with both simulated and human partners, compared to standard supervised learning and reinforcement learning methods. This trend also holds when comparing our agents to variants trained with a mix of supervised learning and reinforcement learning, or to agents with tailored reward functions that explicitly optimize for utility and Pareto-optimality.
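To make the abstract's framework concrete, below is a minimal sketch of a targeted data acquisition loop: the agent's explored states are checked against its static dataset, and an expert oracle is queried only on states the dataset does not cover. All names here (StaticDataset, expert_oracle, targeted_acquisition) and the membership-based novelty test are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of targeted data acquisition: query an expert oracle
# only on negotiation states that the static training data does not cover.
# All class/function names are illustrative, not the paper's code.

class StaticDataset:
    """Stands in for the fixed corpus the agent was originally trained on."""

    def __init__(self, examples):
        self.examples = set(examples)

    def covers(self, state):
        # Crude novelty test: exact membership in the original corpus.
        return state in self.examples

    def add(self, state):
        self.examples.add(state)


def expert_oracle(state):
    """Hypothetical expert that annotates a novel negotiation state."""
    return f"annotation:{state}"


def targeted_acquisition(update_policy, dataset, explored_states, budget=10):
    """Acquire oracle annotations only for novel states, up to a budget."""
    acquired = []
    for state in explored_states:
        if len(acquired) >= budget:
            break
        if not dataset.covers(state):     # target exploration beyond the data
            label = expert_oracle(state)  # expert guidance for the RL agent
            dataset.add(state)
            update_policy(state, label)   # fine-tune on the new annotation
            acquired.append((state, label))
    return acquired


if __name__ == "__main__":
    corpus = StaticDataset({"offer: 2 books", "accept"})
    updates = []
    new_data = targeted_acquisition(
        lambda s, a: updates.append((s, a)),
        corpus,
        ["offer: 2 books", "counteroffer: 1 ball", "walk away"],
    )
    print(new_data)  # only the two novel states trigger oracle queries
```

A real system would replace the exact-membership test with a learned novelty or uncertainty measure and the stub oracle with actual expert annotations; the sketch only shows the shape of the loop: annotate where the static data runs out, then update the policy on the new labels.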
Author Information
Minae Kwon (Stanford University)
Siddharth Karamcheti (Stanford University)
Mariano-Florentino Cuéllar (Stanford University)
Dorsa Sadigh (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Targeted Data Acquisition for Evolving Negotiation Agents
  Tue. Jul 20th, 04:00 -- 06:00 PM (Room: Virtual)
More from the Same Authors
- 2023 Poster: Generating Language Corrections for Teaching Physical Control Tasks
  Megha Srivastava · Noah Goodman · Dorsa Sadigh
- 2023 Poster: Language Instructed Reinforcement Learning for Human-AI Coordination
  Hengyuan Hu · Dorsa Sadigh
- 2023 Poster: Distance Weighted Supervised Learning for Offline Interaction Data
  Joey Hejna · Jensen Gao · Dorsa Sadigh
- 2023 Poster: Long Horizon Temperature Scaling
  Andy Shih · Dorsa Sadigh · Stefano Ermon
- 2022 Poster: Imitation Learning by Estimating Expertise of Demonstrators
  Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani
- 2022 Spotlight: Imitation Learning by Estimating Expertise of Demonstrators
  Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani
- 2022: Learning to interact: LET’S LEARN IT ALL Implicit coordination through learned representations
  Dorsa Sadigh
- 2022: Learning to interact: GAME! Coordinating actions with humans via game theory
  Dorsa Sadigh
- 2022: Q&A
  Dorsa Sadigh · Anca Dragan
- 2022: Learning objectives and preferences: HOW? Actively
  Dorsa Sadigh
- 2022 Tutorial: Learning for Interactive Agents
  Dorsa Sadigh · Anca Dragan
- 2021: The Role of Conventions in Adaptive Human-AI Collaboration
  Dorsa Sadigh
- 2020: "Active Learning of Robot Reward Functions"
  Dorsa Sadigh
- 2019: Contributed Talk: Continual Adaptation for Efficient Machine Communication
  Minae Kwon
- 2019: Dorsa Sadigh: "Influencing Interactive Mixed-Autonomy Systems"
  Dorsa Sadigh
- 2019: Poster Session
  Ivana Balazevic · Minae Kwon · Benjamin Lengerich · Amir Asiaee · Alex Lambert · Wenyu Chen · Yiming Ding · Carlos Florensa · Joseph E Gaudio · Yesmina Jaafra · Boli Fang · Ruoxi Wang · Tian Li · SWAMINATHAN GURUMURTHY · Andy Yan · Kubra Cilingir · Vithursan (Vithu) Thangarasa · Alexander Li · Ryan Lowe