- 9.4 Scale-free Adversarial Reinforcement Learning
- Authors: Mingyu Chen, Xuezhou Zhang
- Reason: A groundbreaking approach to scale-free learning in MDPs that resolves an open problem; it is highly relevant to adversarial RL settings and has potential for wide influence due to the fundamental nature of the contribution.
- 9.2 Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning
- Authors: Hyungho Na, Yunkyeong Seo, Il-chul Moon
- Reason: Introduces important advancements in MARL that can reduce learning time, and is likely to be impactful in high-complexity settings such as real-time strategy games, driving strong future interest and citations.
- 9.0 On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games
- Authors: Awni Altabaa, Zhuoran Yang
- Reason: The paper’s thorough analysis and novel RL models could significantly affect strategic decision-making under uncertainty, and the work is promising for reinforcement learning applications in complex real-world systems.
- 9.0 Iterated $Q$-Network: Beyond the One-Step Bellman Operator
- Authors: Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D’Eramo
- Reason: Introduces a new approach that improves on the traditional one-step application of the Bellman operator in reinforcement learning (a minimal tabular sketch follows below). The paper has the potential to significantly impact RL methodology and is authored by researchers with a strong track record in the field, including Jan Peters, a well-known authority.
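For context, the classical one-step Bellman optimality operator that the paper builds on is (TQ)(s, a) = r(s, a) + γ · E_{s'}[max_{a'} Q(s', a')]. Below is a minimal tabular sketch of applying this operator repeatedly on a toy MDP; the MDP, variable names, and loop are illustrative assumptions, and the sketch does not reproduce the paper's iterated, neural-network-based scheme.

```python
import numpy as np

# Toy MDP (illustrative only): 3 states, 2 actions.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # r(s, a)

def bellman_optimality_operator(Q):
    """One-step operator: (TQ)(s, a) = r(s, a) + gamma * E_{s'}[max_a' Q(s', a')]."""
    return R + gamma * P @ Q.max(axis=1)

# Repeated application of T is plain value iteration and converges to Q*;
# the paper's idea of targeting several consecutive Bellman iterates jointly
# with learned approximators is not reproduced by this tabular loop.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = bellman_optimality_operator(Q)
print("Greedy policy:", Q.argmax(axis=1))
```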
- 8.9 Mixed-Strategy Nash Equilibrium for Crowd Navigation
- Authors: Muchen Sun, Francesca Baldini, Peter Trautman, Todd Murphey
- Reason: Presents a model for anticipating cooperative human behavior in crowds, integrating game theory into reinforcement learning, with practical implications for robotics. High potential for influencing research on human-robot interaction and crowd navigation algorithms.
- 8.7 Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning
- Authors: Enrico Russo, Francesco Giulio Blanco, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Vincenzo Catania
- Reason: Introduces a novel deep reinforcement learning approach for real-time multi-tenant QoS management in cloud environments, a critical and highly relevant area of research with substantial implications across cloud services.
- 8.7 A Case for Validation Buffer in Pessimistic Actor-Critic
- Authors: Michal Nauman, Mateusz Ostaszewski, Marek Cygan
- Reason: Addresses critical issues of error accumulation in critic networks; the proposed Validation Pessimism Learning (VPL) algorithm could substantially influence actor-critic methods.
- 8.7 Quantized Hierarchical Federated Learning: A Robust Approach to Statistical Heterogeneity
- Authors: Seyed Mohammad Azimi-Abarghouyi, Viktoria Fodor
- Reason: Proposes a novel framework for federated learning that is robust to statistical heterogeneity, a key challenge in distributed learning, suggesting advancements in efficient and resilient reinforcement learning systems.
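The two ingredients in the title, quantized model updates and hierarchical (client, edge, cloud) aggregation, can be illustrated generically. The sketch below uses a QSGD-style unbiased stochastic quantizer and two levels of averaging; all names and dimensions are assumptions for illustration, not the paper's specific scheme or its heterogeneity analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(v, levels=16):
    """QSGD-style unbiased quantization of a vector onto `levels` uniform bins."""
    scale = np.max(np.abs(v)) + 1e-12
    normalized = np.abs(v) / scale * (levels - 1)
    lower = np.floor(normalized)
    q = lower + (rng.random(v.shape) < (normalized - lower))  # randomized rounding
    return np.sign(v) * q / (levels - 1) * scale

# Hierarchical aggregation: clients -> edge servers -> cloud (toy setup).
clients_per_edge, n_edges, dim = 4, 3, 8
updates = rng.normal(size=(n_edges, clients_per_edge, dim))   # local model updates

edge_avgs = np.array([
    np.mean([stochastic_quantize(u) for u in edge_clients], axis=0)
    for edge_clients in updates
])
global_avg = np.mean([stochastic_quantize(e) for e in edge_avgs], axis=0)
print("Global aggregated update:", np.round(global_avg, 3))
```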
- 8.7 Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy
- Authors: Chenyang Cao, Zichen Yan, Renhao Lu, Junbo Tan, Xueqian Wang
- Reason: Tackles the important issue of safety in reinforcement learning, which is critical for real-world applications. The paper could be influential given the increasing importance of safety and robustness in RL and the authors’ contribution of a new benchmark.
- 8.5 Offline Fictitious Self-Play for Competitive Games
- Authors: Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong Yu, Ying Wen
- Reason: Addresses the challenge of learning in offline multi-agent competitive settings, presenting the first model-free offline RL algorithm for competitive games, which is significant for the advancement of offline RL in multi-agent systems.
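As background on the fictitious-play idea, the textbook (online, not offline) version has each player best-respond to the opponent's empirical action frequencies; in zero-sum games those frequencies converge to a mixed-strategy Nash equilibrium. The rock-paper-scissors sketch below illustrates this concept only and is not the paper's offline, model-free algorithm.

```python
import numpy as np

# Payoff matrix for the row player in rock-paper-scissors (zero-sum game).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

row_counts = np.ones(3)  # empirical action counts (start uniform)
col_counts = np.ones(3)

for _ in range(20000):
    # Each player best-responds to the opponent's empirical mixed strategy.
    col_strategy = col_counts / col_counts.sum()
    row_strategy = row_counts / row_counts.sum()
    row_counts[np.argmax(A @ col_strategy)] += 1   # row player maximizes payoff
    col_counts[np.argmin(row_strategy @ A)] += 1   # column player minimizes it

print("Row empirical strategy:", np.round(row_counts / row_counts.sum(), 3))
print("Col empirical strategy:", np.round(col_counts / col_counts.sum(), 3))
# Both converge toward the uniform (1/3, 1/3, 1/3) mixed-strategy equilibrium.
```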
- 8.5 PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning
- Authors: Tian Gao, Soroush Nasiriany, Huihan Liu, Quantao Yang, Yuke Zhu
- Reason: Presents a behavior primitive-based framework for imitation learning that demonstrates substantial performance improvements — likely to spur further research in robotics and manipulation tasks.
- 8.5 Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
- Authors: Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
- Reason: Presents a thorough comparative analysis of the two main paradigms for reinforcement learning from human feedback, a hot topic in the RL community; the detailed theoretical analysis underscores its potential influence on future research in learning from human preferences. A toy comparison of the two losses follows below.
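The two paradigms can be summarized by their training losses: reward model learning fits a Bradley-Terry model so the preferred response scores higher, while direct policy optimization (e.g. DPO) applies a similar logistic loss directly to policy log-probability ratios against a reference policy. The numpy snippet below evaluates both losses on made-up toy numbers purely to show their shared shape; it is not the paper's analysis.

```python
import numpy as np

def logsigmoid(x):
    return -np.log1p(np.exp(-x))

# --- Reward model learning (Bradley-Terry): fit r so preferred > rejected ---
r_chosen, r_rejected = 1.3, 0.4          # toy scalar rewards
rm_loss = -logsigmoid(r_chosen - r_rejected)

# --- Direct policy optimization (DPO-style): use policy log-prob ratios ---
beta = 0.1
logp_chosen, logp_rejected = -12.0, -15.0          # log pi_theta(y|x), toy values
ref_logp_chosen, ref_logp_rejected = -13.0, -14.0  # log pi_ref(y|x), toy values
dpo_loss = -logsigmoid(
    beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
)
print(f"reward-model loss: {rm_loss:.4f}, DPO loss: {dpo_loss:.4f}")
```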
- 8.3 Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
- Authors: Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You
- Reason: Proposes a novel optimizer for CTR prediction that enhances model performance, with implications for online advertising and recommendation scenarios where CTR prediction is fundamental.
- 8.3 SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction
- Authors: Jiahuan Yan, Jintai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu
- Reason: Introduces a new model for unsupervised development of domain-specific capabilities in LLMs, using medical prediction as an example, indicating potential impact on the use of reinforcement learning in healthcare and other domain-specific applications.
- 8.2 Tsallis Entropy Regularization for Linearly Solvable MDP and Linear Quadratic Regulator
- Authors: Yota Hashizume, Koshi Oishi, Kenji Kashima
- Reason: Applies Tsallis entropy to reinforcement learning, offering a novel perspective on the exploration-exploitation balance (see the sketch below). This could inspire new research directions in RL on entropy regularization and its influence on policy learning.
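For reference, the Tsallis entropy of a discrete policy is S_q(π) = (1 − Σ_a π(a)^q) / (q − 1), which recovers the Shannon entropy as q → 1. The sketch below evaluates this regularizer on a toy action distribution and forms a generic entropy-regularized objective; it is an illustration under assumed values, not the paper's linearly solvable MDP or LQR derivations.

```python
import numpy as np

def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum_i p_i^q) / (q - 1); tends to the Shannon entropy as q -> 1."""
    p = np.asarray(p, dtype=float)
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p + 1e-12))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

pi = np.array([0.7, 0.2, 0.1])  # toy action distribution
for q in (0.5, 0.999, 1.0, 2.0):
    print(f"q={q}: S_q = {tsallis_entropy(pi, q):.4f}")

# A Tsallis-regularized objective trades expected reward against S_q:
#   J(pi) = E[reward under pi] + alpha * S_q(pi)
alpha, rewards = 0.1, np.array([1.0, 0.5, 0.2])
print("Regularized objective:", pi @ rewards + alpha * tsallis_entropy(pi, 2.0))
```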
- 8.0 LiMAML: Personalization of Deep Recommender Models via Meta Learning
- Authors: Ruofan Wang, Prakruthi Prabhakar, Gaurav Srivastava, Tianqi Wang, Zeinab S. Jalali, Varun Bharill, Yunbo Ouyang, Aastha Nigam, Divya Venugopalan, Aman Gupta, Fedor Borisyuk, Sathiya Keerthi, Ajith Muralidharan
- Reason: Focuses on personalizing deep recommender models via a meta-learning approach; given how widely recommender systems are used across online platforms, the impact of this work on personalization is potentially substantial.
- 7.9 Enhancing Long-Term Recommendation with Bi-level Learnable Large Language Model Planning
- Authors: Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng
- Reason: Uses LLMs to address long-term recommendation, a crucial problem in the field. The novelty of blending planning with language models could influence how recommender systems are designed.
- 7.9 In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
- Authors: Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
- Reason: Examines the issue of hallucination in LLMs from an inner-representation perspective, offering a practical solution for reliability, which is critical for deploying safe reinforcement learning systems, especially those interacting with the real world.
- 7.9 Koopman-Assisted Reinforcement Learning
- Authors: Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton
- Reason: Integrates Koopman operator theory with reinforcement learning, presenting a novel methodology that applies to both deterministic and stochastic systems (a generic Koopman-fitting sketch follows below). Given the authors’ focus on addressing high dimensionality and nonlinearity, this paper could become influential in the development of data-driven hybrid physics-ML models.
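The Koopman operator advances observables of a dynamical system linearly, ψ(x_{t+1}) ≈ K ψ(x_t), and a standard data-driven estimate of a finite-dimensional K is a least-squares fit on lifted state snapshots (extended dynamic mode decomposition, EDMD). The sketch below does this for a toy nonlinear map with a hand-picked dictionary; it is a generic EDMD illustration under assumed names, not the paper's Koopman-assisted RL algorithms.

```python
import numpy as np

def dynamics(x):
    """Toy nonlinear discrete-time system."""
    return np.array([0.9 * x[0], 0.8 * x[1] + 0.2 * x[0] ** 2])

def lift(x):
    """Dictionary of observables psi(x) used to linearize the dynamics."""
    return np.array([x[0], x[1], x[0] ** 2])

# Collect snapshot pairs (x_t, x_{t+1}) from random initial states.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Y = np.array([dynamics(x) for x in X])

Psi_X = np.array([lift(x) for x in X])   # shape (N, 3)
Psi_Y = np.array([lift(y) for y in Y])   # shape (N, 3)

# EDMD: least-squares Koopman matrix K with psi(x_{t+1}) ~= K psi(x_t).
K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
K = K.T
print("Estimated Koopman matrix:\n", np.round(K, 3))
# For this dictionary the lifted dynamics are exactly linear, so K recovers
# [[0.9, 0, 0], [0, 0.8, 0.2], [0, 0, 0.81]].
```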
- 7.4 Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
- Authors: Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, Ambuj Tewari
- Reason: Provides theoretical insights into the exploration benefits of multitask reinforcement learning, with implications for practical algorithm development and the understanding of efficient exploration strategies in RL.