  1. 9.5 Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
  2. 9.2 Learning Reusable Manipulation Strategies
  3. 8.9 LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion
  4. 8.7 Using General Value Functions to Learn Domain-Backed Inventory Management Policies
  5. 8.7 MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters
  6. 8.7 TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models
  7. 8.6 QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
  8. 8.5 Hierarchical Reinforcement Learning for Power Network Topology Control
  9. 8.3 Combining Deep Learning on Order Books with Reinforcement Learning for Profitable Trading
  10. 8.3 Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees
  11. 8.3 Neural Structure Learning with Stochastic Differential Equations
  12. 8.1 Steady-State Analysis of Queues with Hawkes Arrival and Its Application to Online Learning for Hawkes Queues
  13. 7.9 AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
  14. 7.9 RELand: Risk Estimation of Landmines via Interpretable Invariant Risk Minimization
  15. 7.6 Imitation Bootstrapped Reinforcement Learning