- 9.3 Dataset Clustering for Improved Offline Policy Learning
- Authors: Qiang Wang, Yixin Deng, Francisco Roldan Sanchez, Keru Wang, Kevin McGuinness, Noel O’Connor, Stephen J. Redmond
- Reason: Addresses a crucial aspect of data quality in offline RL, a topic of current interest; the proposed methodology is novel and has shown empirical success.
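
To make the general idea concrete, here is a minimal sketch of clustering an offline dataset and filtering it by cluster quality. The use of k-means on state features and mean return-to-go as the quality score are assumptions made for illustration, not the authors' method.

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_offline_dataset(states, returns, n_clusters=8, keep_fraction=0.5):
    """Cluster transitions by state features and keep those falling into the
    clusters with the highest mean return.

    Hypothetical setup: `states` is an (N, d) array of state features and
    `returns` is an (N,) array of return-to-go estimates.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(states)
    # Mean return per cluster serves as a crude data-quality score.
    cluster_scores = np.array([returns[labels == c].mean() for c in range(n_clusters)])
    # Keep the top fraction of clusters by score.
    n_keep = max(1, int(keep_fraction * n_clusters))
    kept_clusters = np.argsort(cluster_scores)[-n_keep:]
    mask = np.isin(labels, kept_clusters)
    return states[mask], returns[mask], mask
```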
- 9.2 Revisiting Recurrent Reinforcement Learning with Memory Monoids
- Authors: Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok
- Reason: The paper introduces a novel memory monoid framework that may significantly improve sample efficiency and simplify the implementation of recurrent loss functions in RL; co-authors Jakob Foerster and Amanda Prorok are well-known, authoritative figures in AI and machine learning.
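
The core idea is that a recurrent memory update can be written as an associative binary operator, so whole trajectories can be processed with a single (potentially parallel) scan instead of segment-based truncation. Below is a minimal sketch for a simple linear-recurrence memory; the particular (decay, value) monoid is an illustrative assumption, and the paper's framework is considerably more general.

```python
import numpy as np

def combine(m1, m2):
    """Associative combine for a linear-recurrence memory.

    Each element (d, v) represents the affine map h -> d * h + v; composing
    two such maps is associative, which is what makes the memory a monoid.
    """
    d1, v1 = m1
    d2, v2 = m2
    return (d1 * d2, d2 * v1 + v2)

def scan_memory(decays, inputs, h0=0.0):
    """Sequential fold of the monoid over a trajectory.

    Because `combine` is associative, this fold could equally be computed
    with a parallel prefix scan over the whole trajectory.
    """
    state = (1.0, h0)
    hidden = []
    for step in zip(decays, inputs):
        state = combine(state, step)
        hidden.append(state[1])
    return np.array(hidden)
```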
- 9.0 Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
- Authors: Yingru Li, Liangqi Liu, Wenqiang Pi, Hao Liang, Zhi-Quan Luo
- Reason: Thompson sampling is a central concept in RL; the paper tackles the significant problem of learning in unknown multi-agent games, and the authors have high credibility in the field.
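
For reference, here is a minimal sketch of Thompson sampling on a Bernoulli bandit, with a crude optimistic twist that takes the best of several posterior draws per arm. This illustrates the underlying concepts only, not the paper's no-regret algorithm for unknown games.

```python
import numpy as np

def optimistic_thompson_step(successes, failures, n_draws=4, rng=None):
    """One decision step of (optimistically tweaked) Thompson sampling.

    `successes` and `failures` are per-arm Beta posterior counts. Standard
    Thompson sampling draws one posterior sample per arm; taking the maximum
    of several draws per arm is used here as a crude form of optimism.
    """
    rng = rng or np.random.default_rng()
    samples = rng.beta(successes + 1, failures + 1, size=(n_draws, len(successes)))
    return int(np.argmax(samples.max(axis=0)))
```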
- 9.0 Symmetry-Breaking Augmentations for Ad Hoc Teamwork
- Authors: Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid
- Reason: The paper presents a novel approach to improving ad hoc teamwork by using symmetry-breaking augmentations and is co-authored by Jakob Foerster, a prominent figure in multi-agent systems, indicating high potential influence.
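
The mechanical step behind such augmentations can be sketched simply: apply one consistent random relabelling of interchangeable symbols to an (observation, action) pair, exposing the learner to equivalent but differently labelled conventions. The integer-encoded setup below is a hypothetical simplification, not the paper's actual environment or full method.

```python
import numpy as np

def symmetry_augment(observation, action, n_symbols=3, rng=None):
    """Relabel interchangeable symbols consistently in observation and action.

    Hypothetical setup: both arrays contain integers indexing `n_symbols`
    interchangeable symbols (e.g. colours). Relabelling them with the same
    random permutation yields an equivalent training example under a
    different symbol convention.
    """
    rng = rng or np.random.default_rng()
    permutation = rng.permutation(n_symbols)
    return permutation[observation], permutation[action], permutation
```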
- 8.8 Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts
- Authors: Tobias Enders, James Harrison, Maximilian Schiffer
- Reason: Addresses the important topic of robustness in deep reinforcement learning under distribution shifts; the authors propose a risk-sensitive algorithm that represents a significant advance for the field.
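
One standard way to make an actor-critic risk-sensitive is to replace the expected return with a distorted statistic such as CVaR over return samples (e.g., from a distributional or ensemble critic). The sketch below shows that statistic in isolation; whether and how the paper applies this particular distortion inside soft actor-critic is not implied here.

```python
import numpy as np

def cvar(return_samples, alpha=0.25):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns.

    Using this in place of the plain mean when backing up the critic or when
    scoring actions in the actor update makes the agent risk-averse.
    """
    sorted_returns = np.sort(return_samples)
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()
```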
- 8.7 Reward Poisoning Attack Against Offline Reinforcement Learning
- Authors: Yinglun Xu, Rohan Gumaste, Gagandeep Singh
- Reason: Security in RL is gaining importance; this paper explores a new attack vector that could shape future defense strategies.
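
The threat model can be illustrated with a toy poisoning function that edits a bounded fraction of logged rewards to favour a chosen target action. The budget-and-boost scheme here is a deliberately naive assumption, far simpler than the attack constructed in the paper.

```python
import numpy as np

def poison_rewards(rewards, actions, target_action, budget=0.05, boost=1.0):
    """Toy reward-poisoning attack on an offline RL dataset.

    Increases the logged reward of transitions that used the target action,
    corrupting at most a `budget` fraction of the dataset.
    """
    rewards = rewards.copy()
    candidates = np.flatnonzero(actions == target_action)
    n_corrupt = min(len(candidates), int(budget * len(rewards)))
    if n_corrupt == 0:
        return rewards
    chosen = np.random.default_rng(0).choice(candidates, size=n_corrupt, replace=False)
    rewards[chosen] += boost
    return rewards
```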
- 8.6 Diffusion Models Meet Contextual Bandits with Large Action Spaces
- Authors: Imad Aouali
- Reason: The paper tackles the key challenge of efficient exploration in contextual bandits with large action spaces by leveraging diffusion models, a creative merger of techniques that could have a noteworthy impact on bandit and exploration research.
- 8.5 PMGDA: A Preference-based Multiple Gradient Descent Algorithm
- Authors: Xiaoyuan Zhang, Xi Lin, Qingfu Zhang
- Reason: Tackles multi-objective optimization in RL, a challenging and less explored area, with potential influence on complex decision-making systems.
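
For background, the classic two-objective MGDA step computes a common descent direction as the minimum-norm convex combination of the objective gradients; PMGDA extends this family of methods to satisfy a user preference. In the sketch below, pre-scaling the gradients by a preference vector is only a crude stand-in for that constraint, not how PMGDA actually incorporates preferences.

```python
import numpy as np

def two_objective_descent_direction(g1, g2, preference=(0.5, 0.5)):
    """Common descent direction for two objectives (classic MGDA closed form).

    Gradients are pre-scaled by a preference vector as a rough way to bias
    the trade-off; PMGDA enforces the preference as a constraint instead.
    """
    g1 = np.asarray(g1, dtype=float) * preference[0]
    g2 = np.asarray(g2, dtype=float) * preference[1]
    diff = g1 - g2
    denom = diff @ diff
    # Optimal convex-combination weight on g1 (closed form for two objectives).
    w = 0.5 if denom == 0 else float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return w * g1 + (1 - w) * g2
```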
- 8.5 Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective
- Authors: Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang
- Reason: The paper provides a theoretical framework for reexamining reinforcement learning from human feedback (RLHF) and introduces a graph-theoretic approach to improving its information structures, which could contribute significantly to resolving a central trilemma in RLHF.
- 8.1 Persuading a Learning Agent
- Authors: Tao Lin, Yiling Chen
- Reason: Connects learning in RL with concepts from behavioral economics, which may influence a range of applications from recommendation systems to autonomous agents.