377 Results
Workshop
|
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge? Zhaorun Chen · Yichao Du · Zichen Wen · Yiyang Zhou · Chenhang Cui · Zhenzhen Weng · Haoqin Tu · Chaoqi Wang · Zhengwei Tong · Leria HUANG · Canyu Chen · Qinghao Ye · Zhihong Zhu · Yuqing Zhang · Jiawei Zhou · Zhuokai Zhao · Rafael Rafailov · Chelsea Finn · Huaxiu Yao |
||
Poster
|
Tue 2:30 |
Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem Maciej Wołczyk · Bartłomiej Cupiał · Mateusz Ostaszewski · Michał Bortkiewicz · Michał Zając · Razvan Pascanu · Lukasz Kucinski · Piotr Milos |
|
Workshop
|
Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets Ulrich Armel Mbou Sob · Qiulin Li · Miguel Arbesú · Oliver Bent · Andries Smit · Arnu Pretorius |
||
Poster
|
Thu 4:30 |
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback Gaurav Pandey · Yatin Nandwani · Tahira Naseem · Mayank Mishra · Guangxuan Xu · Dinesh Raghu · Sachindra Joshi · Asim Munawar · Ramón Astudillo |
|
Poster
|
Wed 4:30 |
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning Michal Nauman · Michał Bortkiewicz · Piotr Milos · Tomasz Trzcinski · Mateusz Ostaszewski · Marek Cygan |
|
Poster
|
Wed 4:30 |
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning Michael Matthews · Michael Beukman · Benjamin Ellis · Mikayel Samvelyan · Matthew T Jackson · Samuel Coward · Jakob Foerster |
|
Workshop
|
Generative Design of Decision Tree Policies for Reinforcement Learning Jacob Pettit · Chak Shing Lee · Jiachen Yang · Alex Ho · Daniel Faissol · Brenden Petersen · Mikel Landajuela |
||
Workshop
|
Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games Arjun V SS · Prasanna Parthasarathi · Janarthanan Rajendran · Sarath Chandar |
||
Poster
|
Thu 2:30 |
Configurable Mirror Descent: Towards a Unification of Decision Making Pengdeng Li · Shuxin Li · Chang Yang · Xinrun Wang · Shuyue Hu · Xiao Huang · Hau Chan · Bo An |
|
Poster
|
Wed 2:30 |
No-Regret Reinforcement Learning in Smooth MDPs Davide Maran · Alberto Maria Metelli · Matteo Papini · Marcello Restelli |
|
Poster
|
Tue 4:30 |
Learning Optimal Deterministic Policies with Stochastic Policy Gradients Alessandro Montenegro · Marco Mussi · Alberto Maria Metelli · Matteo Papini |
|
Poster
|
Tue 4:30 |
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning Yen-Ju Chen · Nai-Chieh Huang · Ching-pei Lee · Ping-Chun Hsieh |