1. 9.3 Efficiently Escaping Saddle Points for Non-Convex Policy Optimization
2. 9.2 Supported Trust Region Optimization for Offline Reinforcement Learning
3. 9.0 On the Foundation of Distributionally Robust Reinforcement Learning
4. 8.8 Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?
5. 8.6 Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale