1. 9.5 Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
2. 9.5 Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
3. 9.3 Learning Optimal Advantage from Preferences and Mistaking it for Reward
4. 9.2 Searching for High-Value Molecules Using Reinforcement Learning and Transformers
5. 9.0 Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming
6. 8.9 Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers
7. 8.8 Learning to Reach Goals via Diffusion
8. 8.5 Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
9. 8.5 Expected flow networks in stochastic environments and two-player zero-sum games
10. 8.1 Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits