- 9.5 Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
- Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
- Reason: The proposed methodology demonstrates a significant improvement in test-generation quality. It not only has applications in software development but could also affect the quality and performance of machine learning models (a small illustrative sketch of an automatic-feedback reward follows).
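A minimal sketch of the kind of automatic-feedback reward one might compute for a generated unit test, under the assumption that feedback comes from a syntax check, an execution check, and a coverage measure. The helpers `runs_without_error` and `branch_coverage` are hypothetical stand-ins for real analysis tooling; this is not the authors' reward design.

```python
# Illustrative only: a simple reward built from automatic feedback signals
# (syntax check, execution, coverage). The callables `runs_without_error`
# and `branch_coverage` are hypothetical stand-ins for real tooling.
import ast

def test_reward(test_source: str, runs_without_error, branch_coverage) -> float:
    """Score a generated unit test with progressively stronger checks."""
    try:
        ast.parse(test_source)                     # 1) syntactically valid Python?
    except SyntaxError:
        return 0.0
    if not runs_without_error(test_source):        # 2) executes and asserts cleanly?
        return 0.3
    # 3) reward higher coverage of the code under test, capped at 1.0
    return min(1.0, 0.3 + 0.7 * branch_coverage(test_source))
```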
- 9.5 Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
- Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
- Reason: This paper stands out for its novel approach to meta-learning for reinforcement learning and was accepted at a reputable conference (NeurIPS 2023). The authors' focus on environment design and regret-based algorithm evaluation is a significant contribution (a rough sketch of regret-based level selection follows).
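As a rough illustration of regret-based environment selection, loosely in the spirit of unsupervised environment design rather than the paper's exact algorithm, one can prioritise training levels by an estimated regret score:

```python
# Illustrative sketch of regret-based level prioritisation; not the paper's
# method, just the general idea of replaying high-regret environments.
import random

def select_level(level_scores: dict, explore_prob: float = 0.1):
    """Pick a training level, favouring those with high estimated regret."""
    if not level_scores or random.random() < explore_prob:
        return None  # signal the caller to sample a brand-new level instead
    # Greedily replay the level where the agent is currently most suboptimal.
    return max(level_scores, key=level_scores.get)

def update_regret(level_scores: dict, level_id, optimal_return_est: float, agent_return: float):
    """Estimated regret = best achievable return minus the agent's return."""
    level_scores[level_id] = optimal_return_est - agent_return
```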
- 9.3 Learning Optimal Advantage from Preferences and Mistaking it for Reward
- Authors: W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
- Reason: The paper investigates human preference models in reinforcement learning, which is crucial for building RL systems that interact with or learn from human input (a small sketch of the two preference models at stake follows).
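To make the distinction concrete, here is a hedged sketch of two Bradley-Terry-style preference models over trajectory segments: one scoring segments by summed reward (partial return) and one by summed advantage. The paper's argument concerns which model better describes human preferences; the code below is only an illustrative comparison, not the authors' implementation.

```python
# Illustrative Bradley-Terry preference probabilities under two segment
# scores: partial return (sum of rewards) vs. summed advantage.
import math

def preference_prob(score_a: float, score_b: float) -> float:
    """P(segment A preferred over segment B) under a Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def partial_return(rewards):
    return sum(rewards)

def summed_advantage(advantages):
    # Advantages would come from an (approximately) optimal value function.
    return sum(advantages)

# Example: the two models can rank the same pair of segments differently.
p_by_return = preference_prob(partial_return([1, 0, 1]), partial_return([0, 1, 1]))
p_by_advantage = preference_prob(summed_advantage([0.5, -0.1, 0.2]),
                                 summed_advantage([0.1, 0.4, 0.3]))
```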
- 9.2 Searching for High-Value Molecules Using Reinforcement Learning and Transformers
- Authors: Raj Ghugare, Santiago Miret, Adriana Hugessen, Mariano Phielipp, Glen Berseth
- Reason: The application of RL and transformers to high-value molecule search is novel. The authors provide a comprehensive survey and results across multiple problem domains, which strengthens the paper's potential influence.
- 9.0 Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming
- Authors: Alaa Eddine Chriat, Chuangchuang Sun
- Reason: The approach is the first of its kind and tackles the critical issue of safety in RL, which matters most in safety-critical environments.
- 8.9 Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers
- Authors: Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low
- Reason: The paper introduces a new method for tuning the instructions given to large language models, which could significantly improve the performance of many applications built on these models (a simplified bandit-style sketch follows).
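A hedged sketch of bandit-style instruction selection, using a simple UCB rule over a fixed pool of candidate instructions. The actual INSTINCT method couples a neural bandit with transformer features over soft prompts; this simplified discrete version only illustrates the explore-exploit loop.

```python
# Sketch of UCB-style instruction selection for a black-box LLM task.
# Illustration only; not the paper's neural-bandit algorithm.
import math

def pick_instruction(stats: dict, total_pulls: int, c: float = 1.0) -> str:
    """stats maps instruction -> (num_trials, mean_task_score)."""
    best, best_ucb = None, float("-inf")
    for instruction, (n, mean_score) in stats.items():
        if n == 0:
            return instruction                     # try untested instructions first
        ucb = mean_score + c * math.sqrt(math.log(total_pulls + 1) / n)
        if ucb > best_ucb:
            best, best_ucb = instruction, ucb
    return best

def update_stats(stats: dict, instruction: str, score: float):
    """Incrementally update the running mean score for an instruction."""
    n, mean = stats[instruction]
    stats[instruction] = (n + 1, mean + (score - mean) / (n + 1))
```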
- 8.8 Learning to Reach Goals via Diffusion
- Authors: Vineet Jain, Siamak Ravanbakhsh
- Reason: The paper presents a novel, diffusion-based approach to goal-conditioned reinforcement learning that could improve optimization strategies in RL.
- 8.5 Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
- Authors: Weidong Liu, Jiyuan Tu, Yichen Zhang, Xi Chen
- Reason: The paper introduces a novel procedure for policy evaluation in reinforcement learning, drawing on robust statistics that can handle outliers and heavy-tailed distributions. It addresses a major issue in policy evaluation for RL and therefore has substantial potential impact on the community (a background sketch of one robust estimator follows).
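For intuition, here is a hedged sketch of one standard robust-statistics tool, a median-of-means estimator, applied to per-episode returns. The paper's online estimation-and-inference procedure is considerably more involved, so treat this purely as background illustration.

```python
# Median-of-means: a classic robust mean estimator that tolerates outliers
# and heavy-tailed return distributions. Background illustration only;
# not the paper's online policy-evaluation procedure.
import statistics

def median_of_means(returns, num_blocks: int = 5) -> float:
    """Split returns into blocks, average each block, take the median."""
    if len(returns) < num_blocks:
        return statistics.median(returns)
    block_size = len(returns) // num_blocks
    block_means = [
        statistics.fmean(returns[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)

# Example: one extreme outlier barely moves the robust estimate.
print(median_of_means([1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1e6, 1.0]))
```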
- 8.5 Expected flow networks in stochastic environments and two-player zero-sum games
- Authors: Marco Jiralerspong, Bilun Sun, Danilo Vucetic, Tianyu Zhang, Yoshua Bengio, Gauthier Gidel, Nikolay Malkin
- Reason: The author list includes some of the leading figures in machine learning, and specifically in reinforcement learning (such as Yoshua Bengio), which is precisely the area of this paper.
- 8.1 Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits
- Authors: Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli
- Reason: Although niche, the paper's exploration of adaptive heavy-tailed bandit problems and the proposed solutions are interesting and could influence research on heavy-tailed bandits and regret minimization (a background sketch of a robust bandit index follows).
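As background for the heavy-tailed setting, here is a hedged sketch of a trimmed-mean arm estimate plugged into a UCB-style rule, a common device for heavy-tailed rewards. The paper's contribution is adapting to unknown tail parameters, which this sketch does not attempt.

```python
# Sketch of a robust UCB-style index using a trimmed mean per arm.
# Illustration only; not the paper's fully adaptive algorithm.
import math

def trimmed_mean(samples, trim_frac: float = 0.1) -> float:
    """Drop the most extreme samples on each side before averaging."""
    s = sorted(samples)
    k = int(len(s) * trim_frac)
    kept = s[k:len(s) - k] or s
    return sum(kept) / len(kept)

def choose_arm(arm_rewards: dict, t: int, c: float = 1.0):
    """arm_rewards maps arm -> list of observed rewards; t is the round index."""
    best, best_index = None, float("-inf")
    for arm, rewards in arm_rewards.items():
        if not rewards:
            return arm                             # pull each arm at least once
        bonus = c * math.sqrt(math.log(t + 1) / len(rewards))
        index = trimmed_mean(rewards) + bonus
        if index > best_index:
            best, best_index = arm, index
    return best
```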