- 9.2 Reinforcement Learning for Solving Stochastic Vehicle Routing Problem
- Authors: Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
- Reason: The paper presents a novel RL framework for a complex combinatorial optimization problem, reports significant reductions in travel cost, and has been accepted at a reputable conference (ACML 2024), which suggests high potential for influence.
- 8.9 Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
- Authors: Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
- Reason: This study tackles a challenging RL setting with adversarially changing losses and unknown transition dynamics, providing a novel algorithm with sublinear regret guarantees, which indicates strong potential impact on the field; the standard low-rank MDP assumption underlying this line of work is stated after this list.
- 8.9 Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
- Reason: This paper is influential as it presents a novel hybrid RL algorithm that integrates off-policy training on offline data with on-policy policy gradient methods, providing a robust theoretical framework alongside practical benefits; a generic sketch of the hybrid actor-critic recipe appears after this list. The authors’ affiliations and the paper’s comprehensive treatment of the subject matter increase its potential impact.
- 8.8 Communication-Constrained Bayesian Active Knowledge Distillation
- Authors: Victor Croisfelt, Shashi Raj Pandey, Osvaldo Simeone, Petar Popovski
- Reason: The unique combination of Bayesian active learning with communication constraints could significantly improve the efficiency of knowledge transfer in RL settings, especially where communication bandwidth is limited; the standard BALD acquisition score that such methods typically build on is sketched after this list.
- 8.7 On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
- Authors: Nicholas E. Corrado, Josiah P. Hanna
- Reason: The Proximal Robust On-Policy Sampling (PROPS) method, developed to improve data efficiency for on-policy policy gradient algorithms, is a significant contribution to the reinforcement learning field. The authors’ empirical results sharpen our understanding of the on-policy versus off-policy dichotomy.
- 8.6 Out-of-Distribution Knowledge Distillation via Confidence Amendment
- Authors: Zhilin Zhao, Longbing Cao, Yixuan Zhang
- Reason: Since OOD detection is crucial for safe and robust AI, including RL applications, the paper’s innovative confidence-amendment approach could have strong implications for the field; a classic OOD-scoring baseline is sketched after this list for context.
- 8.5 Adversarial Preference Optimization
- Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du
- Reason: The proposed framework addresses the important issue of aligning models with human preferences in RL, which can significantly enhance interaction quality with LLMs, indicating potential for high impact in practical applications; the standard reward-model preference loss that such frameworks build on is sketched after this list.
- 8.5 Ensemble sampling for linear bandits: small ensembles suffice
- Authors: David Janz, Alexander E. Litvak, Csaba Szepesvári
- Reason: Ensemble sampling is a widely used approximation to Thompson sampling in stochastic linear bandits, and this paper provides the first rigorous analysis showing that small ensembles can yield near-optimal regret bounds, challenging the prior belief that large ensembles are required; a simplified implementation appears after this list.
- 8.3 Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
- Authors: Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Sepp Hochreiter
- Reason: Although not directly about reinforcement learning, this paper could influence the field through its contribution to predictive uncertainty, a critical aspect of decision-making in reinforcement learning; the standard entropy decomposition this line of work starts from is given after this list. The authoritative status of co-author Sepp Hochreiter also adds to its potential impact.
- 7.9 MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation
- Authors: Ehsan Asali, Prashant Doshi, Jin Sun
- Reason: This paper addresses a pragmatic challenge in the learn-from-observation paradigm, which is closely related to reinforcement learning applications in robotics. It is influential for its practical contribution toward making robotic learning more robust to occlusion through multi-view analysis.
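
For context on the adversarial low-rank MDP paper above, this line of work rests on the standard low-rank factorization of the transition kernel, stated below in generic notation (not necessarily the paper's own):

```latex
% Low-rank MDP: at each step h, the transition kernel factorizes through
% d-dimensional feature maps, both of which are unknown to the learner.
P_h(s' \mid s, a) \;=\; \big\langle \phi_h(s, a),\, \mu_h(s') \big\rangle,
\qquad \phi_h : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d .
```

Full-information feedback means the entire loss function is revealed after each episode, rather than only the losses along the visited trajectory.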
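For the offline-data-enhanced policy gradient paper, the sketch below illustrates the generic hybrid actor-critic recipe: fit a critic on a mix of offline and freshly collected on-policy transitions, then take a standard on-policy policy-gradient step using that critic. This is a minimal illustration under a discrete-action assumption, not the paper's algorithm; `actor` and `q_net` are assumed user-supplied torch modules mapping states to per-action logits and values.

```python
import torch
import torch.nn.functional as F

def critic_loss(q_net, batch, gamma=0.99):
    # One-step TD regression; batch = (states, actions, rewards, next_states, dones).
    s, a, r, s2, done = batch
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s2).max(dim=-1).values
    q = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    return F.mse_loss(q, target)

def hybrid_update(actor, q_net, actor_opt, critic_opt, offline_batch, online_batch):
    # Critic: regress on the union of offline and on-policy transitions.
    critic_opt.zero_grad()
    (critic_loss(q_net, offline_batch) + critic_loss(q_net, online_batch)).backward()
    critic_opt.step()

    # Actor: on-policy policy gradient, using the critic as the value estimate
    # and its mean over actions as a simple baseline.
    s, a, *_ = online_batch
    logp = torch.log_softmax(actor(s), dim=-1).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        q = q_net(s)
        adv = q.gather(-1, a.unsqueeze(-1)).squeeze(-1) - q.mean(dim=-1)
    actor_opt.zero_grad()
    (-(logp * adv).mean()).backward()
    actor_opt.step()
```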
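For the communication-constrained Bayesian active distillation paper, Bayesian active learning typically ranks candidate inputs by the BALD mutual-information score below; selecting only the top-scoring inputs under a transmission budget is one simple, illustrative way to respect a communication constraint, not necessarily the paper's criterion.

```python
import numpy as np

def bald_scores(member_probs):
    """BALD acquisition scores: mutual information between label and parameters.

    member_probs: (M, N, C) class probabilities from M posterior samples
    for N candidate inputs over C classes.
    """
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)                                    # (N, C)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)                 # entropy of mean
    expected = -(member_probs * np.log(member_probs + eps)).sum(axis=-1).mean(axis=0)
    return total - expected  # high score = posterior samples disagree = informative
```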
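For the OOD knowledge distillation paper, a classic point of comparison is the maximum softmax probability (MSP) score of Hendrycks and Gimpel: inputs whose top class probability is low are flagged as possibly out-of-distribution. The sketch below is this standard baseline only, not the paper's confidence-amendment method.

```python
import numpy as np

def msp_scores(logits):
    # Maximum softmax probability: lower scores suggest an input may be
    # out-of-distribution; pick a threshold on a held-out validation set.
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)
```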
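For the adversarial preference optimization paper, most preference-alignment pipelines start from the Bradley-Terry reward-model objective below, which pushes the reward of the human-preferred response above that of the rejected one. The paper's adversarial formulation goes beyond this; the sketch shows only the standard building block, with `reward_model` assumed to map encoded responses to scalar rewards.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    # Bradley-Terry pairwise objective: maximize the probability that the
    # preferred ("chosen") response outscores the rejected one.
    r_chosen = reward_model(chosen)      # (batch,) scalar rewards
    r_rejected = reward_model(rejected)  # (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```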
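For the ensemble sampling paper, the sketch below is a simplified implementation for a stochastic linear bandit: m regularized least-squares models, each fed independently perturbed rewards, with one model drawn uniformly each round and played greedily. The perturbation scheme and hyperparameters are illustrative simplifications of the algorithm the paper analyzes.

```python
import numpy as np

def ensemble_sampling(actions, pull, T, m=5, lam=1.0, noise_sd=0.5, seed=0):
    """Simplified ensemble sampling for a stochastic linear bandit.

    actions: (K, d) array of action feature vectors.
    pull:    callable taking an action index and returning a noisy reward.
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    V = lam * np.eye(d)                       # shared Gram matrix
    b = noise_sd * rng.normal(size=(m, d))    # perturbed prior term per model
    history = []
    for _ in range(T):
        j = rng.integers(m)                   # draw one ensemble member
        theta = np.linalg.solve(V, b[j])      # its regularized LS estimate
        k = int(np.argmax(actions @ theta))   # act greedily for that member
        r = pull(k)
        x = actions[k]
        V += np.outer(x, x)
        # Every model sees the reward plus its own fresh Gaussian perturbation.
        b += (r + noise_sd * rng.normal(size=m))[:, None] * x
        history.append((k, r))
    return history
```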
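For the predictive uncertainty paper, the commonly used information-theoretic decomposition that work in this area starts from splits total predictive entropy into an aleatoric part (expected entropy) and an epistemic part (mutual information); the paper argues for an improved measure, so this shows only the standard baseline. Here q denotes an (approximate) posterior over model parameters, and the epistemic term is the same mutual information as the BALD score sketched earlier.

```latex
% Total predictive uncertainty = aleatoric + epistemic:
\underbrace{\mathbb{H}\!\left[\mathbb{E}_{\theta \sim q}\, p(y \mid x, \theta)\right]}_{\text{total}}
\;=\;
\underbrace{\mathbb{E}_{\theta \sim q}\, \mathbb{H}\!\left[p(y \mid x, \theta)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{\mathbb{I}(y;\, \theta \mid x)}_{\text{epistemic}}
```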