- 9.0 Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
- Authors: Emile Anand, Guannan Qu
- Reason: Addresses a critical scalability challenge in RL with practical applications, presents a novel algorithm with promising results, and the paper's length suggests depth of research.
- 8.9 EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
- Authors: Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao
- Reason: The authors extend EfficientZero to diverse domains with significant improvements over the state of the art, indicating a groundbreaking approach to sample efficiency, which is a key challenge in RL applications.
- 8.7 Causal Bandits with General Causal Models and Interventions
- Authors: Zirui Yan, Dennis Wei, Dmitriy Katz-Rogozhnikov, Prasanna Sattigeri, Ali Tajer
- Reason: Expands the theoretical understanding of causal bandits with applications to RL, includes generalizations not previously covered, and provides both upper and lower bounds on regret.
- 8.7 Robust Deep Reinforcement Learning Through Adversarial Attacks and Training: A Survey
- Authors: Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier
- Reason: Provides an in-depth exploration and categorization of adversarial attacks for improving DRL robustness, which is critical for real-world applications; the comprehensive analysis could shape future research on RL agent resilience.
- 8.5 Robust Policy Learning via Offline Skill Diffusion
- Authors: Woo Kyung Kim, Minjong Yoo, Honguk Woo
- Reason: Proposes an innovative offline skill learning framework and demonstrates its robustness in policy learning across different domains, which is key for transferring skills in RL.
- 8.5 Safe Hybrid-Action Reinforcement Learning-Based Decision and Control for Discretionary Lane Change
- Authors: Ruichen Xu, Xiao Liu, Jinming Xu, Yuan Lin
- Reason: The paper introduces a novel algorithm promoting safety in autonomous driving, a high-impact application area, with demonstrated superiority over previous methods in safety-critical simulations.
- 8.3 Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behavior and Adversarial Style Sampling for Assistive Tasks
- Authors: Takayuki Osa, Tatsuya Harada
- Reason: Addresses the real-world applicability of multi-agent RL to assistive healthcare tasks; the proposed approach to policy robustification could significantly influence the development of RL in cooperative, real-life settings.
- 8.2 Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control
- Authors: Zhiyu An, Xianzhong Ding, Wan Du
- Reason: Focuses on improving the interpretability and reliability of RL for HVAC control, indicating a step towards practical deployment, backed by significant energy-efficiency gains demonstrated in experiments.
- 8.1 Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency
- Authors: Yanxiao Zhao, Yangge Qian, Tianyi Wang, Jingyang Shan, Xiaolin Qin
- Reason: Introduces a framework for improving sample efficiency without modifying existing DRL algorithms, with strong potential for practical adoption due to its flexibility and demonstrated improvement over standard baselines in sample efficiency and average return.
- 7.9 Influencing Bandits: Arm Selection for Preference Shaping
- Authors: Viraj Nadkarni, D. Manjunath, Sharayu Moharir
- Reason: Addresses a novel aspect of non-stationary bandits with real-world implications for preference shaping, though it is more niche than the broadly applicable papers listed above.