  1. 9.6 All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization
  2. 9.5 Motif: Intrinsic Motivation from Artificial Intelligence Feedback
  3. 9.5 Towards Causal Foundation Model: on Duality between Causal Inference and Attention
  4. 9.3 Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
  5. 9.3 Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents
  6. 9.2 Reinforcement Learning for Node Selection in Branch-and-Bound
  7. 9.1 Sparse Backpropagation for MoE Training
  8. 9.0 Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
  9. 9.0 GenSim: Generating Robotic Simulation Tasks via Large Language Models
  10. 8.9 Bayesian Design Principles for Frequentist Sequential Learning
  11. 8.7 Combining Spatial and Temporal Abstraction in Planning for Better Generalization
  12. 8.7 Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
  13. 8.5 From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information
  14. 8.5 H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
  15. 8.3 Pre-training with Synthetic Data Helps Offline Reinforcement Learning