1. 9.4 Adaptive Proximal Policy Optimization with Upper Confidence Bound
2. 9.3 Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
3. 9.1 I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives
4. 9.0 GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time
5. 8.9 A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
6. 8.8 Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach
7. 8.6 Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
8. 8.4 Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
9. 8.3 Combinatorial Stochastic-Greedy Bandit
10. 7.9 An Invitation to Deep Reinforcement Learning