1. 9.8 Kernelized Reinforcement Learning with Order Optimal Regret Bounds
2. 9.8 Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
3. 9.6 A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
4. 9.4 Provably Learning Nash Policies in Constrained Markov Potential Games
5. 9.4 A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning
6. 9.2 Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes
7. 9.2 Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
8. 9.0 Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
9. 8.9 Robust Reinforcement Learning through Efficient Adversarial Herding
10. 8.4 Composing Efficient, Robust Tests for Policy Selection