[7:00]
Instabilities of Offline RL with Pre-Trained Neural Representation
[7:05]
Path Planning using Neural A* Search
[7:10]
Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings
[7:15]
Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
[7:20]
Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning
[7:25]
Continuous-time Model-based Reinforcement Learning
[7:30]
Bayesian Optimistic Optimisation with Exponentially Decaying Regret
[7:35]
Best Model Identification: A Rested Bandit Formulation
[7:40]
Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time