  1. 9.3 ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery
  2. 9.2 Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
  3. 9.1 Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
  4. 9.1 Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes
  5. 9.0 A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning
  6. 8.9 DTC: Deep Tracking Control – A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion
  7. 8.7 MORPH: Design Co-optimization with Reinforcement Learning via a Differentiable Hardware Model Proxy
  8. 8.5 Estimation and Inference in Distributional Reinforcement Learning
  9. 8.3 Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
  10. 7.9 Directly Fine-Tuning Diffusion Models on Differentiable Rewards