- 9.8 Policy-Based Self-Competition for Planning Problems
- Authors: Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm.
- Reason: It adapts AlphaZero-type algorithms for single-player tasks, proposing a new technique called “self-competition.” By directly incorporating a historical policy into the planning process, the authors achieved promising results in combinatorial optimization problems.
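
A minimal sketch of the self-competition idea described above, assuming the single-player objective is turned into a binary win/loss signal against the return achieved by a frozen historical copy of the policy (the environment interface and policies here are illustrative, not the authors' implementation):

```python
def episode_return(env, policy):
    """Roll out one episode and return the total objective value."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


def self_competition_signal(env, current_policy, historical_policy):
    """Binary reward: did the current policy beat its own historical self?"""
    baseline = episode_return(env, historical_policy)  # target set by the past policy
    return 1.0 if episode_return(env, current_policy) > baseline else 0.0
```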
- 9.6 End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
- Authors: Yves Rychener, Daniel Kuhn, Tobias Sutter
- Reason: While this paper is not entirely focused on reinforcement learning, it offers a novel perspective on end-to-end learning in stochastic optimization. The authors show that the standard end-to-end learning algorithm can be interpreted in a Bayesian framework and propose intriguing new end-to-end learning algorithms for empirical risk minimization and distributionally robust optimization problems. These insights could be highly influential for RL researchers; a toy sketch of the end-to-end setting follows below.
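
To make the end-to-end setting concrete, here is a toy decision-focused training loop in the newsvendor spirit: a network maps side information directly to a decision and is trained on the downstream operational cost rather than on a prediction loss. The data, costs, and architecture are illustrative assumptions, not the paper's Bayesian algorithms:

```python
import torch

torch.manual_seed(0)
x = torch.randn(256, 5)                         # contextual features
demand = (x.sum(dim=1) + 5.0).clamp(min=0.0)    # synthetic demand depending on x

decision_net = torch.nn.Sequential(
    torch.nn.Linear(5, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(decision_net.parameters(), lr=1e-2)

holding_cost, backorder_cost = 1.0, 4.0
for _ in range(200):
    order = decision_net(x).squeeze(-1)
    # Downstream cost: pay for leftover stock and for unmet demand.
    cost = (holding_cost * torch.relu(order - demand)
            + backorder_cost * torch.relu(demand - order))
    loss = cost.mean()                          # empirical risk over the training set
    opt.zero_grad(); loss.backward(); opt.step()
```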
- 9.5 Balancing of competitive two-player Game Levels with Reinforcement Learning
- Authors: Florian Rupp, Manuel Eberhardinger, Kai Eckert.
- Reason: It presents an innovative architecture for automated balancing of tile-based levels in games, exploiting a novel family of swap-based representations to increase robustness. This potentially influential paper streamlines the process of game level design in the competitive gaming sector.
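
A tiny illustration of what a swap-based representation means in practice: the agent's action names two tile positions to exchange, so edits never change the overall tile distribution of the level. This is a generic sketch, not the paper's environment or reward design:

```python
def apply_swap(level, action):
    """Swap the tiles at two positions of a tile-based level (list of lists).

    Because the action only exchanges existing tiles, the tile counts of the
    level are preserved, which is the point of a swap-based representation.
    """
    (r1, c1), (r2, c2) = action
    new_level = [row[:] for row in level]
    new_level[r1][c1], new_level[r2][c2] = new_level[r2][c2], new_level[r1][c1]
    return new_level
```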
- 9.3 Rethinking Weak Supervision in Helping Contrastive Learning
- Authors: Jingyi Cui, Weiran Huang, Yifei Wang, Yisen Wang
- Reason: This paper explores how weak supervision can aid contrastive learning, a representation-learning technique increasingly used in reinforcement learning. With new theoretical findings and practical strategies, it can offer fresh insights for those working with weak supervision and reinforcement learning.
- 9.3 Timing Process Interventions with Causal Inference and Reinforcement Learning
- Authors: Hans Weytjens, Wouter Verbeke, Jochen De Weerdt.
- Reason: This paper provides valuable insights into the effectiveness of RL relative to causal inference (CI) for timing process interventions. The authors demonstrate RL’s superior performance and wider applicability to various prescriptive process monitoring (PresPM) problems through experiments with synthetic data.
- 9.2 Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor
- Authors: Arshad Javeed, Valentín López Jiménez, Johan Grönqvist
- Reason: This paper explores the synergy between classical control algorithms and modern reinforcement learning, a hot topic in RL research. The practical application of controlling quadrotors, together with the well-established authors, gives the paper potentially high influence.
- 9.1 Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
- Authors: Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang
- Reason: The paper provides new insights into offline reinforcement learning. The authors propose a new method (TSRL) that exploits the symmetry of system dynamics and can significantly enhance the performance of offline RL on small datasets. This enables a much-needed improvement in data efficiency for RL (a rough sketch of symmetry-based augmentation follows below).
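
As a rough illustration of symmetry-based data augmentation in offline RL, assuming a problem-specific time-reversal map on states (e.g. negating velocities); this is a generic sketch of the idea, not the TSRL algorithm itself:

```python
import numpy as np

def time_reversal_augment(transitions, reverse_state):
    """Augment an offline dataset of (s, a, r, s_next) tuples with their
    time-reversed counterparts.  `reverse_state` is a problem-specific map to
    the reversed state; actions and rewards are left untouched here, which is
    a simplifying assumption, as the proper transformation is system-dependent."""
    augmented = list(transitions)
    for s, a, r, s_next in transitions:
        augmented.append((reverse_state(s_next), a, r, reverse_state(s)))
    return augmented

# Example: point-mass state (position, velocity); time reversal flips velocity.
reverse_point_mass = lambda s: np.array([s[0], -s[1]])
```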
- 9.1 Dual policy as self-model for planning
- Authors: Jaesung Yoo, Fernanda de la Torre, Guangyu Robert Yang.
- Reason: By utilizing a distilled policy network as the self-model, the authors manage to secure more stable training, faster inference, and better exploration for RL agents. Despite the additional cost of distilling a new network, this paper’s method indicates potential in improving reinforcement learning paradigms.
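
A minimal sketch of the distillation step such a self-model could rely on: a small student network is fit to the agent's full policy by minimizing the KL divergence between their action distributions. The networks, batching, and loss here are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def distill_step(states, teacher_policy, student_policy, optimizer):
    """Fit the compact self-model (student) to the agent's full policy (teacher)."""
    with torch.no_grad():
        teacher_logits = teacher_policy(states)
    student_log_probs = F.log_softmax(student_policy(states), dim=-1)
    loss = F.kl_div(student_log_probs, F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```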
- 9.0 Agent Performing Autonomous Stock Trading under Good and Bad Situations
- Authors: Yunfei Luo, Zhangqi Duan
- Reason: ML for stock trading is widely investigated, and a successful approach could have profound impact. The paper introduces a robust model that improves on previous methods and is evaluated on well-known stocks under various market conditions.
- 9.0 Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models
- Authors: Rui Li, ST John, Arno Solin
- Reason: This work, while more focused on Gaussian Process models, can still be influential as it tackles the intricate problem of learning model hyperparameters under approximate inference. The proposed hybrid training procedure could be beneficial for RL algorithms that perform hyperparameter optimization.
- 9.0 Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
- Authors: Alexandre Rame, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord.
- Reason: A novel approach to account for the diversity of objectives and human opinions in RL training. This paper’s approach to generalize across a space of preferences by fine-tuning multiple networks independently and then interpolating their weights could provide a promising avenue for future alignment of deep models.
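
The core weight-interpolation step can be sketched in a few lines: networks with identical architecture, each fine-tuned on a different reward, have their parameters linearly combined with user-chosen preference weights. A hedged sketch of that idea, not the authors' released code:

```python
import copy
import torch

def reward_soup(models, weights):
    """Interpolate parameters of same-architecture models fine-tuned on
    different rewards: theta = sum_i lambda_i * theta_i, with weights summing to 1."""
    soup = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in soup.named_parameters():
            param.copy_(sum(w * dict(m.named_parameters())[name]
                            for w, m in zip(weights, models)))
    return soup
```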
- 8.8 Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory
- Authors: Daniel C.H. Tan, Fernando Acero, Robert McCarthy, Dimitrios Kanoulas, Zhibin Li
- Reason: This paper deals with safety issues in RL, a critical topic particularly in autonomous systems. The risk-reduction approach and the theoretical links established between value functions and control barrier functions could influence safety-focused RL research; a toy illustration of the connection follows below.
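
One way to picture the connection: treat a superlevel set of the learned value function as a candidate safe set and filter out actions predicted to leave it. The one-step model and threshold below are assumptions for illustration, not the paper's verification procedure:

```python
def safe_action_filter(state, candidate_actions, value_fn, dynamics_fn, threshold):
    """Keep only actions whose predicted next state stays in {s : V(s) >= threshold}."""
    safe = [a for a in candidate_actions
            if value_fn(dynamics_fn(state, a)) >= threshold]
    return safe or list(candidate_actions)  # fall back if no action passes the check
```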
- 8.8 Causally Learning an Optimal Rework Policy
- Authors: Oliver Schacht, Sven Klaassen, Philipp Schwarz, Martin Spindler, Daniel Grünbaum, Sebastian Imhof
- Reason: Even though this paper specializes in manufacturing processes, its approach of using double/debiased machine learning to estimate the conditional treatment effect can be applied in the context of RL (the underlying idea is sketched below). This could help in developing more effective rework policies with reinforcement learning.
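
For readers unfamiliar with double/debiased ML, a textbook-style cross-fitted estimator of a treatment effect in a partially linear model looks as follows; the paper itself targets conditional effects for rework decisions, so this shows only the underlying idea, not its estimator:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_effect(X, treatment, outcome, n_splits=2):
    """Cross-fitted double/debiased ML estimate of the effect theta in the
    partially linear model  outcome = theta * treatment + g(X) + noise."""
    num, den = 0.0, 0.0
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        m_hat = RandomForestRegressor().fit(X[train], outcome[train]).predict(X[test])
        e_hat = RandomForestRegressor().fit(X[train], treatment[train]).predict(X[test])
        y_res, d_res = outcome[test] - m_hat, treatment[test] - e_hat
        num += np.sum(d_res * y_res)
        den += np.sum(d_res ** 2)
    return num / den
```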
- 8.7 Finding Counterfactually Optimal Action Sequences in Continuous State Spaces
- Authors: Stratis Tsirtsis, Manuel Gomez-Rodriguez
- Reason: The paper tackles an important aspect of RL: finding counterfactually optimal action sequences in continuous state spaces. Although it presents a significant mathematical contribution, the claimed non-polynomial complexity could limit its immediate practical applicability.
- 8.6 Generalization Across Observation Shifts in Reinforcement Learning
- Authors: Anuj Mahajan, Amy Zhang
- Reason: This paper focuses on achieving good generalization across observation shifts in reinforcement learning, a critical necessity for real-world applications. The authors provide novel theoretical bounds and perform empirical analysis to verify their approach.
- 8.5 NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage
- Authors: Ziting Wen, Oscar Pizarro, Stefan Williams
- Reason: Although not strictly on RL, the paper makes an important contribution to active learning methodology combined with self-supervised learning, a technique increasingly adopted in RL. This paper could influence active learning research, including its application in RL.
- 8.3 Divide and Repair: Using Options to Improve Performance of Imitation Learning Against Adversarial Demonstrations
- Authors: Prithviraj Dasgupta
- Reason: Considering the problem of learning to perform a task from potentially adversarial demonstrations, this paper introduces a novel technique for identifying and utilizing temporally extended policies, or options. The author aims to enhance the learner’s performance by combining theory and experiments.
- 8.0 Convergence of SARSA with linear function approximation: The random horizon case
- Authors: Lina Palmborg
- Reason: This paper investigates the convergence of the reinforcement learning algorithm SARSA for random-horizon Markov decision problems. The author studies this previously unexplored setting to understand its impact on algorithmic convergence (the standard update under study is sketched below).
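
For context, the object of study is the standard SARSA update with linear function approximation, Q(s, a) = w · φ(s, a). A generic sketch of one update follows; how the random horizon affects convergence is exactly what the paper analyzes:

```python
import numpy as np

def linear_sarsa_update(w, phi_sa, reward, phi_next_sa, alpha, gamma, done):
    """One SARSA update for Q(s, a) = w . phi(s, a) with step size alpha."""
    target = reward + (0.0 if done else gamma * np.dot(w, phi_next_sa))
    td_error = target - np.dot(w, phi_sa)
    return w + alpha * td_error * phi_sa
```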
- 7.9 Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
- Authors: Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei
- Reason: While this paper provides a comprehensive statistical theory for transformers to perform in-context learning, it also delves into how these transformers can implement a broad range of standard machine learning algorithms, showcasing the versatility of transformers in both theory and practice.
- 7.6 Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design
- Authors: Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio
- Reason: Contributing to the less explored domain of in-silico molecular design in the machine learning community, this paper presents a novel goal-conditioned molecular generation technique that enables a more controllable and balanced exploration of the entire Pareto front.