  1. 9.3 Dataset Clustering for Improved Offline Policy Learning
  2. 9.2 Revisiting Recurrent Reinforcement Learning with Memory Monoids
  3. 9.0 Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
  4. 9.0 Symmetry-Breaking Augmentations for Ad Hoc Teamwork
  5. 8.8 Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts
  6. 8.7 Reward Poisoning Attack Against Offline Reinforcement Learning
  7. 8.6 Diffusion Models Meet Contextual Bandits with Large Action Spaces
  8. 8.5 PMGDA: A Preference-based Multiple Gradient Descent Algorithm
  9. 8.5 Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective
  10. 8.1 Persuading a Learning Agent