  1. 9.4 Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
  2. 9.2 A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference
  3. 9.0 Entropy-Regularized Token-Level Policy Optimization for Large Language Models
  4. 9.0 Informativeness of Reward Functions in Reinforcement Learning
  5. 9.0 ODIN: Disentangled Reward Mitigates Hacking in RLHF
  6. 8.9 Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
  7. 8.8 Dynamic Graph Information Bottleneck
  8. 8.8 Potential-Based Reward Shaping For Intrinsic Motivation
  9. 8.7 Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
  10. 8.6 Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable Environments
  11. 8.6 Auxiliary Reward Generation with Transition Distance Representation Learning
  12. 8.6 Mixed Q-Functionals: Advancing Value-Based Methods in Cooperative MARL with Continuous Action Domains
  13. 8.5 Corruption Robust Offline Reinforcement Learning with Human Feedback
  14. 8.5 Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning
  15. 8.3 Scaling Laws for Fine-Grained Mixture of Experts
  16. 8.2 Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
  17. 8.0 Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
  18. 7.9 Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer
  19. 7.9 Policy Improvement using Language Feedback Models
  20. 7.8 Online Sequential Decision-Making with Unknown Delays