1. 9.1 DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
2. 8.7 Learning to Model the World with Language
3. 8.5 MARLIM: Multi-Agent Reinforcement Learning for Inventory Management
4. 8.3 End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear MPC
5. 8.1 Bag of Policies for Distributional Deep Exploration