9.5 Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
- Authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
- Reason: This paper introduces a framework for learning non-stationary policies in multi-timescale MARL that exploits available information about agent timescales, a novel idea in the field. It was also accepted at a prestigious conference, IEEE CDC’23.
9.5 Scaling Laws for Imitation Learning in NetHack
- Authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
- Reason: This paper explores the untapped potential of scaling up model and data size in imitation learning, using the game of NetHack as an example. The authors’ findings demonstrate the benefits of this approach and suggest it is a viable path toward increasingly competent agents.
9.3 REX: Rapid Exploration and eXploitation for AI Agents
- Authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
- Reason: The authors of this paper propose an improved approach to exploration and exploitation in artificial intelligence, addressing inherent limitations in existing AutoGPT-style techniques and offering enhanced AI agent performance.
9.2 Towards Accelerating Benders Decomposition via Reinforcement Learning Surrogate Models
- Authors: Stephen Mak, Kyle Mana, Parisa Zehtabi, Michael Cashmore, Daniele Magazzeni, Manuela Veloso
- Reason: This paper proposes a method for accelerating Benders decomposition on complex stochastic optimization problems using reinforcement learning surrogate models. Its authors include esteemed AI and machine learning researchers such as Manuela Veloso.
9.2 Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning
- Authors: Fan Feng, Sara Magliacane
- Reason: The paper pursues better factorization in reinforcement learning tasks, introducing DAFT-RL, a novel framework aimed at improving sample efficiency and generalizability in complex learning environments.
9.1 Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
- Authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
- Reason: This paper offers a novel method for robust reinforcement learning that uses function approximation and is therefore suitable for large-scale RL. The authors provide finite-time convergence guarantees, which have high potential to push reinforcement learning forward.
8.9 An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient
- Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
- Reason: The paper challenges existing risk-averse Reinforcement Learning methods based on variance, proposing Gini deviation as a more effective alternative. The innovative perspective on risk measures gives it the potential to become highly influential in the field.
8.9 IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness
- Authors: Pedro Sequeira, Melinda Gervasio
- Reason: This article proposes a new framework for explainable Deep RL, offering various measures to assess RL agent competency and providing insightful views of the agent’s capabilities and limitations.
8.8 Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees
- Authors: Brent A. Wallace, Jennie Si
- Reason: This comprehensive paper presents new algorithms and design approaches for continuous-time reinforcement learning, along with theoretical insights, performance guarantees, and practical use cases.
8.7 Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology
- Authors: Mehrad Jaloli, Marzia Cescon
- Reason: This paper applies reinforcement learning to personalized glucose control for Type 1 diabetes patients. That makes it potentially influential as evidence that RL can transfer to real-world health applications.