377 Results
Workshop
Scaling Automated Quantum Error Correction Discovery with Reinforcement Learning
Jan Olle · Remmy Zen · Matteo Puviani · Florian Marquardt

Poster | Thu 2:30
Do Transformer World Models Give Better Policy Gradients?
Michel Ma · Tianwei Ni · Clement Gehring · Pierluca D'Oro · Pierre-Luc Bacon

Poster | Tue 2:30
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika · Debmalya Mandal · Parameswaran Kamalaruban · Georgios Tzannetos · Goran Radanovic · Adish Singla

Poster | Wed 4:30
PcLast: Discovering Plannable Continuous Latent States
Anurag Koul · Shivakanth Sujit · Shaoru Chen · Benjamin Evans · Lili Wu · Byron Xu · Rajan Chari · Riashat Islam · Raihan Seraj · Yonathan Efroni · Lekan Molu · Miroslav Dudik · John Langford · Alex Lamb

Poster | Tue 4:30
Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
Ruijie Zheng · Yongyuan Liang · Xiyao Wang · Shuang Ma · Hal Daumé · Huazhe Xu · John Langford · Praveen Palanisamy · Kalyan Basu · Furong Huang