- 9.4 Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
- Authors: Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
- Reason: The paper proposes a new algorithm for model-based distributional reinforcement learning that is claimed to be near minimax-optimal with a generative model, advancing theoretical understanding in the field. The involvement of Rémi Munos, a well-known authority in RL, further supports the paper's potential influence.
- 9.2 A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference
- Authors: Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
- Reason: The paper provides theoretical insights into Nash learning from human feedback under a general preference model, connecting it with traditional reinforcement learning theory. Nan Jiang and Tong Zhang are recognized researchers, which strengthens the paper's likely influence.
- 9.0 Entropy-Regularized Token-Level Policy Optimization for Large Language Models
- Authors: Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
- Reason: Introduces a novel entropy-augmented RL method for optimizing LLMs at the token level, which appears to significantly enhance the interactive decision-making capabilities of LLMs. The claims are backed by theoretical analysis and by better performance than prompt-level PPO baselines (as used in RLHF) in a simulated environment for the complex task of code generation; a generic sketch of an entropy-regularized token-level objective is given in a note after this list.
- 9.0 Informativeness of Reward Functions in Reinforcement Learning
- Authors: Rati Devidze, Parameswaran Kamalaruban, Adish Singla
- Reason: Offers a novel reward-informativeness criterion, extensive theoretical insights, and empirical validation. As an extended version of an AAMAS '24 paper, it benefits from the authors' standing and the venue's recognition, both of which add to its potential influence.
- 9.0 ODIN: Disentangled Reward Mitigates Hacking in RLHF
- Authors: Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro
- Reason: This work addresses reward hacking in RLHF, a critical problem for model reliability. The involvement of Tom Goldstein and Heng Huang, respected authors in the domain, together with the practical improvements reported in the study, points to strong potential impact.
- 8.9 Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
- Authors: Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu
- Reason: Introduces an innovative use of LLMs in RL, reports theoretical guarantees and significant performance improvements, and comes from authors affiliated with reputable institutions.
- 8.8 Dynamic Graph Information Bottleneck
- Authors: Haonan Yuan, Qingyun Sun, Xingcheng Fu, Cheng Ji, Jianxin Li
- Reason: Proposes a novel framework with the potential to robustify representation learning in dynamic graphs against adversarial attacks, which could significantly improve the reliability of many applications relying on dynamic graph data analysis.
- 8.8 Potential-Based Reward Shaping For Intrinsic Motivation
- Authors: Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts
- Reason: The paper presents an extension to potential-based reward shaping that preserves the set of optimal policies under more general conditions, which is significant for RL in complex, sparse-reward environments. The concrete technical contribution and the involvement of reputable authors in AI and games research (e.g., David L. Roberts) suggest potential influence; the classical shaping result is recalled in a note after this list.
- 8.7 Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
- Authors: Yan Dai, Qiwen Cui, Simon S. Du
- Reason: Presents substantial improvements in sample complexity for Markov games and tackles the curse of multi-agents; the authors have strong reputations in the field.
- 8.6 Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable Environments
- Authors: Jeongyeol Kwon, Liu Yang, Robert Nowak, Josiah Hanna
- Reason: Addresses a significant challenge in RL under partial observability and reports promising empirical results; the authors have prior contributions to the field.
- 8.6 Auxiliary Reward Generation with Transition Distance Representation Learning
- Authors: Siyuan Li, Shijie Han, Yingnan Zhao, By Liang, Peng Liu
- Reason: Addressing the challenge of designing rewards in RL, this paper proposes a novel approach for automatic auxiliary reward generation. Its practical implications for improving learning efficiency and stability in manipulation tasks indicate a notable potential impact.
- 8.6 Mixed Q-Functionals: Advancing Value-Based Methods in Cooperative MARL with Continuous Action Domains
- Authors: Yasin Findik, S. Reza Ahmadzadeh
- Reason: Introduces a novel value-based algorithm for cooperative multi-agent reinforcement learning (MARL) with continuous action domains, a challenging setting in which advances often have significant impact, and backs it with promising empirical results.
- 8.5 Corruption Robust Offline Reinforcement Learning with Human Feedback
- Authors: Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
- Reason: Addresses the novel and relevant problem of robustness to data corruption in offline RL with human feedback, a setting highly applicable in practice where corrupted data can occur, and offers provable performance guarantees.
- 8.5 Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning
- Authors: Alex Christopher Stutts, Danilo Erricolo, Theja Tulabandhula, Amit Ranjan Trivedi
- Reason: Proposes an interesting statistical approach to uncertainty in deep Q-networks, with solid experimental validation, adding value to current methodologies for uncertainty estimation in RL.
- 8.3 Scaling Laws for Fine-Grained Mixture of Experts
- Authors: Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur
- Reason: The authors provide a deep analysis of fine-grained Mixture of Experts (MoE) models, an architecture-design topic relevant to reinforcement learning practice, and establish scaling laws that can guide future research.
- 8.2 Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
- Authors: Han Shen, Zhuoran Yang, Tianyi Chen
- Reason: Proposes a principled penalty-based approach to bilevel RL problems, extending potential solutions to complex tasks including RLHF and incentive design, with implications for developing more robust and efficient RL algorithms for dynamic optimization problems.
- 8.0 Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
- Authors: Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen
- Reason: Offers new theoretical insights into the implicit bias of policy gradient methods in LQR, a fundamental problem in reinforcement learning with potential wide impact on understanding generalization in policy learning.
- 7.9 Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer
- Authors: Feiyang Wu, Zhaoyuan Gu, Ye Zhao, Anqi Wu
- Reason: Proposes a sample-efficient learning framework that leverages the teacher-student paradigm, with the potential to significantly improve sim-to-real transfer, and thus to impact both reinforcement learning and robotics.
- 7.9 Policy Improvement using Language Feedback Models
- Authors: Victor Zhong, Dipendra Misra, Xingdi Yuan, Marc-Alexandre Côté
- Reason: Presents an innovative approach using language models to improve policy learning, showing strong empirical results and the potential to influence imitation learning in instruction-following tasks.
- 7.8 Online Sequential Decision-Making with Unknown Delays
- Authors: Ping Wu, Heyan Huang, Zhengyang Liu
- Reason: Addresses the problem of decision-making with unknown delays, with broad applications in reinforcement learning, and provides new algorithms and theoretical guarantees which may be influential for future research in online learning settings.
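For the entropy-regularized token-level policy optimization entry above, the following is a generic sketch of a standard entropy-regularized RL objective written at the token level; it is general background rather than the paper's exact formulation, and the temperature $\alpha$ and the token-level state/action notation are assumptions made here for illustration:

$$
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\Big(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) \;=\; -\sum_{a}\pi(a \mid s_t)\log\pi(a \mid s_t),
$$

where each action $a_t$ is a single generated token and the state $s_t$ is the prompt together with the tokens generated so far; the entropy bonus discourages premature collapse onto a few high-probability tokens.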
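For the potential-based reward shaping entry above, recall the classical result of Ng, Harada, and Russell (1999): adding a shaping term of the form

$$
F(s, a, s') \;=\; \gamma\,\Phi(s') - \Phi(s)
$$

to the environment reward, for any potential function $\Phi : \mathcal{S} \to \mathbb{R}$ and discount factor $\gamma$, leaves the set of optimal policies of the original MDP unchanged. This is background only; the paper's extension of this guarantee to more general settings, such as intrinsic-motivation rewards, is not reproduced here.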