  1. 9.2 Reinforcement Learning for Solving Stochastic Vehicle Routing Problem
  2. 8.9 Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
  3. 8.9 Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
  4. 8.8 Communication-Constrained Bayesian Active Knowledge Distillation
  5. 8.7 On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
  6. 8.6 Out-of-Distribution Knowledge Distillation via Confidence Amendment
  7. 8.5 Adversarial Preference Optimization
  8. 8.5 Ensemble sampling for linear bandits: small ensembles suffice
  9. 8.3 Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
  10. 7.9 MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation