- 9.1 DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
- Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
- Reason: This work democratizes RLHF training, providing a scalable and efficient system for training ChatGPT-like models. Such models underpin a wide range of AI applications, and making RLHF affordable at all scales paves the way for further innovation in the field; a toy sketch of the reward-shaping step in PPO-based RLHF follows below.
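RLHF pipelines of this kind follow the familiar three-stage recipe (supervised fine-tuning, reward-model training, then PPO-based RLHF). The snippet below is a minimal, self-contained sketch of only the per-token reward shaping commonly used in the PPO stage (a reward-model score on the final token plus a KL penalty toward the frozen reference policy); it is illustrative, not the DeepSpeed-Chat API, and all shapes and the `beta` value are assumptions.

```python
# Minimal sketch of per-token reward shaping in PPO-based RLHF:
# the reward model scores the finished response, and a KL penalty keeps
# the policy close to the frozen SFT/reference policy.
# Illustrative only -- not the DeepSpeed-Chat API.
import numpy as np

def rlhf_token_rewards(logp_actor, logp_ref, rm_score, beta=0.1):
    """Combine a sequence-level reward-model score with a per-token KL penalty.

    logp_actor, logp_ref: log-probs of the generated tokens under the current
    policy and the reference policy, shape (seq_len,).
    rm_score: scalar reward-model score for the full response.
    """
    kl_penalty = -beta * (logp_actor - logp_ref)  # discourage drifting from the reference policy
    rewards = kl_penalty.copy()
    rewards[-1] += rm_score                       # reward model scores only the finished response
    return rewards

# Toy usage with made-up numbers
logp_actor = np.array([-1.2, -0.8, -2.0, -0.5])
logp_ref   = np.array([-1.0, -0.9, -1.5, -0.7])
print(rlhf_token_rewards(logp_actor, logp_ref, rm_score=0.9))
```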
- 8.7 Learning to Model the World with Language
- Authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
- Reason: This paper introduces an agent that uses language prediction as a self-supervised learning objective, tying language understanding to future prediction in a learned world model. This makes it a valuable bridge between language processing and reinforcement learning; a toy sketch of such a joint prediction loss follows below.
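As a rough illustration of treating language as a self-supervised prediction target alongside future observations, the sketch below defines a toy world-model loss that jointly predicts the next observation embedding and the next language token. Module names, sizes, and the equal loss weighting are assumptions for illustration, not the paper's architecture.

```python
# Toy world model that predicts both the next observation embedding and the
# next language token from a fused sequence of the two modalities.
# Sizes and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLanguageWorldModel(nn.Module):
    def __init__(self, obs_dim=32, vocab_size=100, hidden=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, obs_dim)
        self.rnn = nn.GRU(obs_dim * 2, hidden, batch_first=True)  # fuse observation + language streams
        self.next_obs_head = nn.Linear(hidden, obs_dim)           # predict next observation embedding
        self.next_tok_head = nn.Linear(hidden, vocab_size)        # predict next language token

    def forward(self, obs_seq, tok_seq):
        x = torch.cat([obs_seq, self.token_emb(tok_seq)], dim=-1)
        h, _ = self.rnn(x)
        return self.next_obs_head(h), self.next_tok_head(h)

def world_model_loss(model, obs_seq, tok_seq):
    # Predict step t+1 from steps up to t, for both modalities.
    pred_obs, pred_tok = model(obs_seq[:, :-1], tok_seq[:, :-1])
    obs_loss = F.mse_loss(pred_obs, obs_seq[:, 1:])
    tok_loss = F.cross_entropy(pred_tok.reshape(-1, pred_tok.size(-1)),
                               tok_seq[:, 1:].reshape(-1))
    return obs_loss + tok_loss  # equal weighting is an arbitrary choice here

# Toy usage on random data
model = ToyLanguageWorldModel()
obs = torch.randn(2, 10, 32)            # batch of observation-embedding sequences
toks = torch.randint(0, 100, (2, 10))   # aligned language tokens
print(world_model_loss(model, obs, toks).item())
```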
- 8.5 MARLIM: Multi-Agent Reinforcement Learning for Inventory Management
- Authors: Rémi Leluc, Elie Kadoche, Antoine Bertoncello, Sébastien Gourvénec
- Reason: This paper presents a multi-agent RL framework for balancing supply and demand in inventory management, a critical real-world problem in the supply chain industry. Reported results show the RL methods outperforming traditional baselines, which is promising for practical deployment; a toy inventory environment illustrating the underlying trade-off follows below.
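To make the inventory-management trade-off concrete, here is a toy single-echelon environment in which an agent pays a holding cost for excess stock and a penalty for unmet demand, which is the kind of reward an RL policy would optimize. The costs, capacity, demand model, and baseline policy are made-up assumptions and are not the MARLIM formulation.

```python
# Toy single-echelon inventory environment: holding cost vs. stockout penalty.
# All numbers are illustrative assumptions, not the MARLIM setup.
import random

class ToyInventoryEnv:
    def __init__(self, capacity=50, holding_cost=0.1, stockout_cost=1.0, mean_demand=8):
        self.capacity = capacity
        self.holding_cost = holding_cost
        self.stockout_cost = stockout_cost
        self.mean_demand = mean_demand
        self.stock = capacity // 2

    def step(self, order_qty):
        """Apply a replenishment order, sample demand, return (stock, reward)."""
        self.stock = min(self.capacity, self.stock + order_qty)
        demand = random.randint(0, 2 * self.mean_demand)
        sold = min(self.stock, demand)
        unmet = demand - sold
        self.stock -= sold
        # Reward: revenue proxy minus holding and stockout penalties.
        reward = sold - self.holding_cost * self.stock - self.stockout_cost * unmet
        return self.stock, reward

# A trivial base-stock policy as the kind of baseline an RL agent would try to beat.
env = ToyInventoryEnv()
total = 0.0
for _ in range(100):
    order = max(0, 25 - env.stock)  # order up to a base-stock level of 25
    _, r = env.step(order)
    total += r
print(f"average reward per step: {total / 100:.2f}")
```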
- 8.3 End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear MPC
- Authors: Daniel Mayfrank, Alexander Mitsos, Manuel Dahmen
- Reason: This paper presents a method for end-to-end reinforcement learning of Koopman surrogate models of dynamic systems, tuned for control performance in (economic) nonlinear model predictive control ((e)NMPC) applications. It offers a way to balance control performance against computational demand and shows promise for improving dynamic control systems; a minimal sketch of a Koopman-style surrogate follows below.
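The core idea behind Koopman surrogates is to lift the system state into a latent space where the dynamics are linear in the latent state and the control input (roughly z_{t+1} = A z_t + B u_t), which keeps the downstream (e)NMPC problem cheap to solve. The sketch below shows such a surrogate with a placeholder one-step prediction loss; the paper's contribution is instead to train the surrogate end-to-end for control performance via RL, and the dimensions and objective here are illustrative assumptions.

```python
# Minimal Koopman-style surrogate: an encoder lifts the state into a latent
# space with linear dynamics in the latent state and control input.
# The one-step prediction loss is a placeholder, not the paper's RL objective.
import torch
import torch.nn as nn

class KoopmanSurrogate(nn.Module):
    def __init__(self, state_dim=4, ctrl_dim=1, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(),
                                     nn.Linear(32, latent_dim))
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)  # latent dynamics
        self.B = nn.Linear(ctrl_dim, latent_dim, bias=False)    # control influence
        self.decoder = nn.Linear(latent_dim, state_dim)

    def step(self, x, u):
        """One-step prediction of the next state from (state, control)."""
        z_next = self.A(self.encoder(x)) + self.B(u)
        return self.decoder(z_next)

# One-step prediction loss on a random transition batch (placeholder data).
model = KoopmanSurrogate()
x, u, x_next = torch.randn(16, 4), torch.randn(16, 1), torch.randn(16, 4)
loss = nn.functional.mse_loss(model.step(x, u), x_next)
loss.backward()
print(loss.item())
```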
- 8.1 Bag of Policies for Distributional Deep Exploration
- Authors: Asen Nachkov, Luchen Li, Giulia Luise, Filippo Valdetaro, Aldo Faisal
- Reason: The paper develops a novel approach that extends distributional RL with an ensemble ("bag") of policies for deep exploration. The method shows greater robustness and faster learning, demonstrating potential for improving the efficiency of future RL methods; a toy sketch of ensemble-driven exploration follows below.
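A common recipe for ensemble-driven deep exploration is to keep a bag of independent heads over a shared trunk and act for a whole episode with one randomly drawn head, so different members explore differently. The sketch below illustrates that general recipe only; it is not the paper's distributional algorithm, and the network sizes and head count are assumptions.

```python
# Toy "bag of heads" for ensemble-driven exploration: one head is drawn at
# random per episode and used to act greedily. Sizes are illustrative.
import random
import torch
import torch.nn as nn

class BagOfHeads(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2, n_heads=5, hidden=32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(n_heads)])

    def q_values(self, obs, head_idx):
        return self.heads[head_idx](self.trunk(obs))

def act_episode(model, obs_seq):
    """Pick one head for the whole episode and act greedily with it."""
    head_idx = random.randrange(len(model.heads))
    actions = [int(model.q_values(obs, head_idx).argmax()) for obs in obs_seq]
    return head_idx, actions

# Toy usage on a short fake episode of observations
model = BagOfHeads()
fake_episode = [torch.randn(4) for _ in range(3)]
print(act_episode(model, fake_episode))
```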