  1. 9.5 Aligning Agent Policy with Externalities: Reward Design via Bilevel RL
  2. 9.5 Provably Efficient Learning in Partially Observable Contextual Bandit
  3. 9.2 SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents
  4. 9.2 Generalized Early Stopping in Evolutionary Direct Policy Search
  5. 9.0 Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces
  6. 8.9 qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation
  7. 8.9 Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience
  8. 8.7 Vehicles Control: Collision Avoidance using Federated Deep Reinforcement Learning
  9. 8.7 When Federated Learning meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection
  10. 8.3 Reinforcement Learning for Financial Index Tracking