- 9.2 Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning
- Authors: Benjamin Patrick Evans, Sumitra Ganesh
- Reason: This paper addresses the critical problem of specifying behavioral rules in agent-based models (ABMs) by learning them with MARL; it grounds itself in the utility-maximizing agents of economic and financial models while integrating the bounded rationality and agent heterogeneity fundamental to ABMs. Its acceptance at AAMAS 2024, a leading conference, suggests strong peer recognition and applicability across diverse real-world settings, indicating potential for broad influence.
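  A minimal sketch of the bounded-rationality ingredient such models build on: a logit quantal-response rule, where a rationality parameter interpolates between uniform random play and fully rational utility maximization. All names and values below are illustrative, not the paper's implementation.

  ```python
  import numpy as np

  def quantal_response(utilities: np.ndarray, rationality: float) -> np.ndarray:
      """Softmax (logit quantal response) over action utilities.

      rationality -> 0 gives uniform random play; rationality -> inf
      recovers the fully rational argmax agent.
      """
      z = rationality * utilities
      z -= z.max()  # subtract max for numerical stability
      probs = np.exp(z)
      return probs / probs.sum()

  # Two heterogeneous agents facing the same payoffs but with
  # different rationality levels (illustrative values only).
  payoffs = np.array([1.0, 0.8, 0.1])
  print(quantal_response(payoffs, rationality=0.5))   # near-uniform
  print(quantal_response(payoffs, rationality=20.0))  # near-greedy
  ```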
- 9.0 Dense Reward for Free in Reinforcement Learning from Human Feedback
- Authors: Alex J. Chan, Hao Sun, Samuel Holt, Mihaela van der Schaar
- Reason: The paper proposes an innovative method to redistribute the sparse, sequence-level reward in RLHF over individual tokens, using attention maps from transformer architectures. The approach is theoretically grounded in potential-based reward shaping and empirically shown to improve learning, with potential influence on a wide range of applications involving LLMs.
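  As a rough illustration of the idea (not the authors' exact procedure), a sequence-level reward can be spread over tokens in proportion to attention weights while preserving the total return, which is what makes a potential-based-shaping argument possible; the attention vector below is invented.

  ```python
  import numpy as np

  def redistribute_reward(final_reward: float, attention: np.ndarray) -> np.ndarray:
      """Spread one sequence-level reward over tokens in proportion to
      attention weights, keeping the total return unchanged."""
      weights = attention / attention.sum()
      return final_reward * weights

  # A 5-token completion that earned a sparse reward of 2.0; the
  # attention weights here are made up for illustration.
  attn = np.array([0.05, 0.10, 0.40, 0.30, 0.15])
  dense = redistribute_reward(2.0, attn)
  print(dense, dense.sum())  # per-token rewards that sum back to 2.0
  ```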
- 8.9 Introducing PetriRL: An Innovative Framework for JSSP Resolution Integrating Petri nets and Event-based Reinforcement Learning
- Authors: Sofiene Lassoued, Andreas Schwung
- Reason: Introduces a novel framework that combines Petri nets with event-based RL for improved explainability and performance on the job-shop scheduling problem (JSSP), a significant advance in the application of RL to industrial scheduling.
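  A toy sketch of why Petri nets pair naturally with event-based RL: the net's marking tells you exactly which transitions are enabled, which doubles as an action mask for the agent. This minimal place/transition net is illustrative only, not the paper's JSSP model.

  ```python
  import numpy as np

  class PetriNet:
      """Minimal place/transition net.

      pre[p, t]  = tokens place p must hold for transition t to fire
      post[p, t] = tokens transition t deposits into place p
      """
      def __init__(self, pre, post, marking):
          self.pre, self.post, self.marking = pre, post, marking

      def enabled(self):
          # Boolean mask over transitions -- usable as an RL action mask.
          return (self.marking[:, None] >= self.pre).all(axis=0)

      def fire(self, t):
          assert self.enabled()[t], "transition not enabled"
          self.marking = self.marking - self.pre[:, t] + self.post[:, t]

  # Two places, two transitions: a token shuttles back and forth.
  net = PetriNet(pre=np.array([[1, 0], [0, 1]]),
                 post=np.array([[0, 1], [1, 0]]),
                 marking=np.array([1, 0]))
  print(net.enabled())  # [ True False]
  net.fire(0)
  print(net.marking)    # [0 1]
  ```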
- 8.9 Neural Style Transfer with Twin-Delayed DDPG for Shared Control of Robotic Manipulators
- Authors: Raul Fernandez-Fernandez, Marco Aggravi, Paolo Robuffo Giordano, Juan G. Victores, Claudio Pacchierotti
- Reason: This paper uniquely applies the concept of neural style transfer (NST) to robotic motion, using Twin-Delayed DDPG for style transfer in robot control. Given its extensive evaluation with human subjects and its potential applications in human-robot interaction, it could significantly impact the field of robotic manipulation.
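  For context, the TD3 backbone the method relies on computes its critic targets with target-policy smoothing and a clipped double-Q minimum; a minimal sketch with stand-in networks (all values illustrative):

  ```python
  import numpy as np

  def td3_target(r, done, next_state, target_pi, target_q1, target_q2,
                 gamma=0.99, noise_std=0.2, noise_clip=0.5,
                 a_low=-1.0, a_high=1.0):
      """Clipped double-Q target from TD3: smooth the target action
      with clipped noise, then take the minimum of the two target
      critics to curb value overestimation."""
      noise = np.clip(np.random.normal(0.0, noise_std), -noise_clip, noise_clip)
      a = np.clip(target_pi(next_state) + noise, a_low, a_high)
      q = min(target_q1(next_state, a), target_q2(next_state, a))
      return r + gamma * (1.0 - done) * q

  # Stand-in target networks for illustration only.
  pi = lambda s: 0.3 * s
  q1 = lambda s, a: -(s - a) ** 2
  q2 = lambda s, a: -(s - a) ** 2 + 0.1
  print(td3_target(r=1.0, done=0.0, next_state=0.5,
                   target_pi=pi, target_q1=q1, target_q2=q2))
  ```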
- 8.7 Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning
- Authors: Xuecheng Niu, Akinori Ito, Takashi Nose
- Reason: Proposes a scheduled, curiosity-driven learning framework built on Deep Dyna-Q that improves exploration efficiency for dialog policy learning, a critical area for conversational AI development.
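  The curiosity signal in such frameworks is typically the prediction error of a learned forward model, added to the environment reward and scheduled (e.g., annealed) over training; a minimal sketch with a toy model, not the paper's architecture:

  ```python
  import numpy as np

  def curiosity_bonus(forward_model, state, action, next_state, scale=1.0):
      """Intrinsic reward = prediction error of a learned forward model;
      poorly modelled transitions look 'surprising' and attract
      exploration. `scale` can be annealed to schedule curiosity."""
      predicted = forward_model(state, action)
      return scale * float(np.mean((predicted - next_state) ** 2))

  # Toy forward model (illustrative only); in training the bonus is
  # added to the extrinsic reward: r_total = r_ext + r_int.
  model = lambda s, a: s + 0.1 * a
  s, a, s_next = np.array([0.0]), np.array([1.0]), np.array([0.4])
  print(curiosity_bonus(model, s, a, s_next))  # big error -> big bonus
  ```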
- 8.7 Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching
- Authors: Shangzhe Li, Xinhua Zhang
- Reason: By proposing a novel knowledge-distillation method that tackles the computational cost of offline RL with deep generative models, this paper contributes to more efficient RL applications. The trajectory-stitching technique and competitive benchmark results demonstrate practical value, with potential influence in the RL community.
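  The basic idea behind trajectory stitching, in a deliberately simplified form (how the paper stitches and distills differs): join the prefix of one logged trajectory to the suffix of another at a pair of nearby states, creating a composite trajectory neither behavior policy produced.

  ```python
  import numpy as np

  def stitch(traj_a, traj_b, eps=0.1):
      """Join the prefix of traj_a to the suffix of traj_b at the first
      pair of states closer than eps; None if no match exists.
      Each trajectory is a list of (state, action, reward) tuples."""
      for i, (s_a, _, _) in enumerate(traj_a):
          for j, (s_b, _, _) in enumerate(traj_b):
              if np.linalg.norm(np.asarray(s_a) - np.asarray(s_b)) < eps:
                  return traj_a[:i] + traj_b[j:]
      return None

  # Two toy 1-D trajectories that pass near state 1.0.
  t1 = [(0.0, +1, 0.0), (1.0, +1, 0.0), (2.0, +1, 0.0)]
  t2 = [(1.05, -1, 0.5), (0.5, -1, 1.0)]
  print(stitch(t1, t2))
  ```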
- 8.6 Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
- Authors: Alexander W. Goodall, Francesco Belardinelli
- Reason: Extends the approximate model-based shielding (AMBS) framework to continuous environments with probabilistic safety guarantees, which is highly relevant for deploying safe RL systems. Given its acceptance at AAMAS 2024 and its potential impact on complex real-world applications, this work marks an important advance in safe RL.
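  The shielding pattern, in a minimal Monte-Carlo form (the paper's AMBS machinery is more sophisticated; every component below is a stand-in): estimate the probability that the policy's proposed action leads to an unsafe state under the learned model, and fall back to a safe policy when that estimate exceeds a tolerance.

  ```python
  import numpy as np

  def shielded_action(state, policy, safe_policy, model, is_unsafe,
                      horizon=5, n_rollouts=100, delta=0.05):
      """Sample model rollouts from the proposed action; if the
      estimated probability of reaching an unsafe state within
      `horizon` steps exceeds `delta`, defer to the safe policy."""
      proposed = policy(state)
      violations = 0
      for _ in range(n_rollouts):
          s, a = state, proposed
          for _ in range(horizon):
              s = model(s, a)  # one step of the (stochastic) learned model
              if is_unsafe(s):
                  violations += 1
                  break
              a = policy(s)
      return proposed if violations / n_rollouts <= delta else safe_policy(state)

  # Toy 1-D system: unsafe beyond |s| > 2 (all components illustrative).
  rng = np.random.default_rng(0)
  model = lambda s, a: s + a + rng.normal(0.0, 0.1)
  print(shielded_action(0.0, policy=lambda s: 0.5,
                        safe_policy=lambda s: -0.1,
                        model=model, is_unsafe=lambda s: abs(s) > 2.0))
  ```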
- 8.5 Behind the Myth of Exploration in Policy Gradients
- Authors: Adrien Bolland, Gaspard Lambrechts, Damien Ernst
- Reason: Offers a new analysis of exploration in policy-gradient algorithms, which could impact future RL methods in environments with continuous state and action spaces.
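  For readers who want the object of study in concrete form: the most common exploration term in policy gradients is an entropy bonus added to the Monte-Carlo objective, as in this minimal sketch (illustrative numbers only).

  ```python
  import numpy as np

  def pg_loss_with_entropy(log_probs, returns, entropies, beta=0.01):
      """REINFORCE-style loss with an entropy bonus: minimising this
      maximises E[R] + beta * H(pi), keeping the policy stochastic
      early in training."""
      log_probs, returns, entropies = map(np.asarray,
                                          (log_probs, returns, entropies))
      return -(log_probs * returns).mean() - beta * entropies.mean()

  # One toy batch of three transitions (numbers are illustrative).
  print(pg_loss_with_entropy(log_probs=[-0.2, -1.5, -0.7],
                             returns=[1.0, 0.0, 0.5],
                             entropies=[0.9, 1.1, 1.0]))
  ```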
- 8.3 Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach
- Authors: Zhiyuan Yao, Ionut Florescu, Chihoon Lee
- Reason: Addresses the challenging issue of delayed feedback in control problems with an innovative model-based RL method built on stochastic planning, advancing real-time decision making in unpredictable environments.
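  A standard device in this setting, shown here as a sketch rather than the paper's method: for a constant action delay d, augment the observation with the d pending actions so the process is Markov again; a model-based planner can then roll its model through the pending actions before choosing the next one. A gym-style 4-tuple step API is assumed, and the toy environment is hypothetical.

  ```python
  from collections import deque

  class DelayedEnvWrapper:
      """Augment observations with the queue of not-yet-executed
      actions under a constant action delay."""
      def __init__(self, env, delay, noop_action):
          self.env = env
          self.pending = deque([noop_action] * delay)

      def reset(self):
          obs = self.env.reset()
          return (obs, tuple(self.pending))

      def step(self, action):
          self.pending.append(action)        # newest action joins the queue
          executed = self.pending.popleft()  # action chosen `delay` steps ago
          obs, reward, done, info = self.env.step(executed)
          return (obs, tuple(self.pending)), reward, done, info

  class _ToyEnv:  # hypothetical 1-D integrator for demonstration
      def reset(self): self.s = 0.0; return self.s
      def step(self, a): self.s += a; return self.s, -abs(self.s), False, {}

  env = DelayedEnvWrapper(_ToyEnv(), delay=2, noop_action=0.0)
  print(env.reset())       # observation plus two pending no-ops
  print(env.step(1.0)[0])  # a queued no-op executes before the new action
  ```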
- 8.1 Adaptive Primal-Dual Method for Safe Reinforcement Learning
- Authors: Weiqin Chen, James Onyejizu, Long Vu, Lan Hoang, Dharmashankar Subramanian, Koushik Kar, Sandipan Mishra, Santiago Paternain
- Reason: Enhances primal-dual methods for safe RL (SRL) with adaptive learning rates for the dual update, contributing to safer RL algorithms in practice and demonstrating both theoretical and empirical progress.
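  The primal-dual skeleton being adapted, as a sketch: projected dual ascent on the Lagrange multiplier of a constrained MDP, lambda <- max(0, lambda + eta * (J_c - d)). The paper's contribution is a principled way to adapt eta; the simple decay below merely stands in for it.

  ```python
  def dual_update(lmbda, constraint_cost, budget, step,
                  step_decay=0.999, step_min=1e-3):
      """One projected dual-ascent step on the multiplier, plus a
      placeholder step-size adaptation (illustrative only)."""
      lmbda = max(0.0, lmbda + step * (constraint_cost - budget))
      step = max(step_min, step * step_decay)
      return lmbda, step

  # Constraint cost J_c above the budget d drives lambda up, penalising
  # the Lagrangian objective J_r - lambda * J_c more strongly.
  lmbda, eta = 0.0, 0.1
  for J_c in [1.4, 1.2, 1.0, 0.9]:  # made-up constraint evaluations
      lmbda, eta = dual_update(lmbda, J_c, budget=1.0, step=eta)
      print(round(lmbda, 4), round(eta, 4))
  ```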