1. 9.8 DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
2. 9.6 QGFN: Controllable Greediness with Action Values
3. 9.4 FlowPG: Action-constrained Policy Gradient with Normalizing Flows
4. 9.1 Meta-learning the mirror map in policy mirror descent
5. 9.1 Offline Actor-Critic Reinforcement Learning Scales to Large Models
6. 9.0 Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes
7. 8.9 Improving Token-Based World Models with Parallel Observation Prediction
8. 8.7 Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
9. 8.5 Differentially Private Model-Based Offline Reinforcement Learning
10. 8.3 Reinforcement Learning as a Catalyst for Robust and Fair Federated Learning: Deciphering the Dynamics of Client Contributions