  1. 9.5 On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $ε$-Greedy Exploration
  2. 9.1 QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  3. 9.0 Contextual Bandits for Evaluating and Improving Inventory Control Policies
  4. 9.0 Model predictive control-based value estimation for efficient reinforcement learning
  5. 8.9 Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies
  6. 8.8 Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits
  7. 8.8 Towards Control-Centric Representations in Reinforcement Learning from Images
  8. 8.7 Symphony of experts: orchestration with adversarial insights in reinforcement learning
  9. 8.4 Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks
  10. 8.2 Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms