- 8.9 ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling
- Authors: Chi-Hui Lin, Joewie J. Koh, Alessandro Roncone, Lijun Chen
- Reason: The paper addresses multi-agent collaboration, a critical aspect of reinforcement learning. It proposes a novel approach that surpasses existing strategies, indicating a high potential for influencing future research in decentralized multi-agent systems.
- 8.6 An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems
- Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren
- Reason: The paper's practical application of reinforcement learning to real-world, large-scale recommender systems, together with its deployment in Tencent News, suggests it could have a significant impact on reinforcement learning applied to industry-scale problems.
- 8.4 Using Deep Q-Learning to Dynamically Toggle between Push/Pull Actions in Computational Trust Mechanisms
- Authors: Zoi Lygizou, Dimitris Kalles
- Reason: Presents an innovative application of Deep Q-Learning in dynamic environments and addresses a critical aspect of trust and reputation models. The authors' prior comparison of the CA and FIRE models provides a strong foundation for this work.
- 8.3 Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review
- Authors: Sergio A. Serrano, Jose Martinez-Carranza, L. Enrique Sucar
- Reason: This systematic review may greatly influence future cross-domain reinforcement learning research by categorizing methods and identifying challenges, guiding researchers in an emerging and technically challenging subset of reinforcement learning.
- 8.1 Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies
- Authors: Seyed Soroush Karimi Madahi, Gargya Gokhale, Marie-Sophie Verwee, Bert Claessens, Chris Develder
- Reason: Introduces a relevant RL-based framework for energy arbitrage with a safety correction mechanism, which has high applicability in real-world settings, as demonstrated by deployment on a real battery.
- 8.0 Center-Based Relaxed Learning Against Membership Inference Attacks
- Authors: Xingli Fang, Jung-Eun Kim
- Reason: The work addresses the important topic of privacy preservation in machine learning, though it is not solely focused on reinforcement learning. Nevertheless, advances in privacy-preserving techniques are relevant to the broader machine learning community, including reinforcement learning practitioners.
- 7.9 Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
- Authors: Laixi Shi, Eric Mazumdar, Yuejie Chi, Adam Wierman
- Reason: Addresses the challenge of environmental uncertainty in multi-agent settings and offers sample-efficient solutions with finite-sample complexity guarantees, filling a gap in the robust RL literature.
- 7.8 Learning Manipulation Tasks in Dynamic and Shared 3D Spaces
- Authors: Hariharan Arunachalam, Marc Hanheide, Sariah Mghames
- Reason: The paper proposes a novel reinforcement learning strategy for robotic manipulation, a valuable application area for RL. However, it scores slightly lower in potential influence due to its narrower task focus and earlier stage of research compared to the other papers listed.
- 7.7 SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
- Authors: Amir Samadi, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati
- Reason: Focuses on enhancing explainability in DRL, a critical factor for the adoption of RL in safety-critical applications, offering a novel approach to generating counterfactual explanations.
- 7.5 DPO Meets PPO: Reinforced Token Optimization for RLHF
- Authors: Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang
- Reason: Proposes an advanced method for optimizing token-wise rewards in RLHF; the integration of DPO with PPO is an innovative approach that could provide a structured methodology for open-source LLM alignment (a hedged sketch of the token-wise reward idea follows this list).
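
To make the token-wise reward idea concrete, here is a minimal, hypothetical Python sketch of one common way such rewards can be realized: scoring each generated token with a DPO-style implicit reward, i.e. the scaled log-ratio between the current policy and a frozen reference model. The function name, tensor shapes, and the `beta` default are illustrative assumptions, not the paper's actual algorithm.

```python
import torch


def token_wise_rewards(policy_logprobs: torch.Tensor,
                       ref_logprobs: torch.Tensor,
                       beta: float = 0.1) -> torch.Tensor:
    """Per-token rewards as a DPO-style implicit reward.

    Each reward is the scaled log-ratio between the current policy and a
    frozen reference model for the token actually generated.

    Args:
        policy_logprobs: log pi_theta(a_t | s_t) for each generated token, shape (T,).
        ref_logprobs:    log pi_ref(a_t | s_t) for the same tokens, shape (T,).
        beta:            scaling coefficient (hypothetical default).

    Returns:
        Dense per-token rewards, shape (T,), that a PPO-style update could
        consume instead of a single sequence-level reward.
    """
    return beta * (policy_logprobs - ref_logprobs)


if __name__ == "__main__":
    # Toy example: five generated tokens with made-up log-probabilities.
    pi = torch.tensor([-1.2, -0.8, -2.1, -0.5, -1.0])
    ref = torch.tensor([-1.5, -1.0, -1.9, -0.9, -1.1])
    print(token_wise_rewards(pi, ref))
```

In a PPO-style RLHF loop, dense per-token values like these would replace or supplement the single sequence-level score from a reward model, which is the general direction the paper's token-wise formulation points toward.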