1. 9.5 Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
2. 9.2 HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments
3. 9.0 Robust agents learn causal world models
4. 8.8 Provably Sample Efficient RLHF via Active Preference Optimization
5. 8.7 Policy Learning for Off-Dynamics RL with Deficient Support