- 9.2 Reinforcement Learning for Solving Stochastic Vehicle Routing Problem
- Authors: Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
- Reason: The paper presents a novel RL framework for a complex combinatorial optimization problem, reports significant reductions in travel cost, and has been accepted at a reputable conference (ACML 2024), which suggests high potential for influence.
- 8.9 Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
- Authors: Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
- Reason: This study tackles a challenging RL setting with adversarially changing losses and unknown transition dynamics, providing a novel algorithm with sublinear regret guarantees, which indicates strong potential impact on the field; the standard low-rank MDP assumption underlying this line of work is stated after this list.
- 8.9 Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
- Reason: This paper is influential as it presents a novel hybrid RL algorithm that integrates off-policy training on offline data with on-policy policy gradient methods, providing a robust theoretical framework alongside practical benefits; a generic sketch of the hybrid actor-critic recipe appears after this list. The authors’ affiliations and the paper’s comprehensive treatment of the subject matter increase its potential impact.
- 8.8 Communication-Constrained Bayesian Active Knowledge Distillation
- Authors: Victor Croisfelt, Shashi Raj Pandey, Osvaldo Simeone, Petar Popovski
- Reason: The unique combination of Bayesian active learning with communication constraints could significantly improve the efficiency of knowledge transfer in RL settings, especially where communication bandwidth is limited; the standard BALD acquisition score that such methods typically build on is sketched after this list.
- 8.7 On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
- Authors: Nicholas E. Corrado, Josiah P. Hanna
- Reason: The Proximal Robust On-Policy Sampling (PROPS) method, developed to improve data efficiency for on-policy policy gradient algorithms, is a significant contribution to the reinforcement learning field. The authors’ empirical results sharpen our understanding of the on-policy versus off-policy dichotomy.
- 8.6 Out-of-Distribution Knowledge Distillation via Confidence Amendment
- Authors: Zhilin Zhao, Longbing Cao, Yixuan Zhang
- Reason: Since OOD detection is crucial for safe and robust AI, including RL applications, the paper’s innovative confidence-amendment approach could have strong implications for the field; a classic OOD-scoring baseline is sketched after this list for context.
- 8.5 Adversarial Preference Optimization
- Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du
- Reason: The proposed framework addresses the important issue of aligning models with human preferences in RL, which can significantly enhance interaction quality with LLMs, indicating potential for high impact in practical applications; the standard reward-model preference loss that such frameworks build on is sketched after this list.
- 8.5 Ensemble sampling for linear bandits: small ensembles suffice
- Authors: David Janz, Alexander E. Litvak, Csaba Szepesvári
- Reason: Ensemble sampling is a widely used approximation to Thompson sampling in stochastic linear bandits, and this paper provides the first rigorous analysis showing that small ensembles can yield near-optimal regret bounds, challenging the prior belief that large ensembles are required; a simplified implementation appears after this list.
- 8.3 Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
- Authors: Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Sepp Hochreiter
- Reason: Although not directly about reinforcement learning, this paper could influence the field through its contribution to predictive uncertainty, a critical aspect of decision-making in reinforcement learning; the standard entropy decomposition this line of work starts from is given after this list. The authoritative status of co-author Sepp Hochreiter also adds to its potential impact.
- 7.9 MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation
- Authors: Ehsan Asali, Prashant Doshi, Jin Sun
- Reason: This paper addresses a pragmatic challenge in the learn-from-observation paradigm, which is closely related to reinforcement learning applications in robotics. It is influential for its practical contribution toward making robotic learning more robust to occlusion through multi-view analysis.
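
For context on the adversarial low-rank MDP paper above, this line of work rests on the standard low-rank factorization of the transition kernel, stated below in generic notation (not necessarily the paper's own):

```latex
% Low-rank MDP: at each step h, the transition kernel factorizes through
% d-dimensional feature maps, both of which are unknown to the learner.
P_h(s' \mid s, a) \;=\; \big\langle \phi_h(s, a),\, \mu_h(s') \big\rangle,
\qquad \phi_h : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d .
```

Full-information feedback means the entire loss function is revealed after each episode, rather than only the losses along the visited trajectory.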
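For the offline-data-enhanced policy gradient paper, the sketch below illustrates the generic hybrid actor-critic recipe: fit a critic on a mix of offline and freshly collected on-policy transitions, then take a standard on-policy policy-gradient step using that critic. This is a minimal illustration under a discrete-action assumption, not the paper's algorithm; `actor` and `q_net` are assumed user-supplied torch modules mapping states to per-action logits and values.

```python
import torch
import torch.nn.functional as F

def critic_loss(q_net, batch, gamma=0.99):
    # One-step TD regression; batch = (states, actions, rewards, next_states, dones).
    s, a, r, s2, done = batch
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s2).max(dim=-1).values
    q = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    return F.mse_loss(q, target)

def hybrid_update(actor, q_net, actor_opt, critic_opt, offline_batch, online_batch):
    # Critic: regress on the union of offline and on-policy transitions.
    critic_opt.zero_grad()
    (critic_loss(q_net, offline_batch) + critic_loss(q_net, online_batch)).backward()
    critic_opt.step()

    # Actor: on-policy policy gradient, using the critic as the value estimate
    # and its mean over actions as a simple baseline.
    s, a, *_ = online_batch
    logp = torch.log_softmax(actor(s), dim=-1).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        q = q_net(s)
        adv = q.gather(-1, a.unsqueeze(-1)).squeeze(-1) - q.mean(dim=-1)
    actor_opt.zero_grad()
    (-(logp * adv).mean()).backward()
    actor_opt.step()
```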
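For the communication-constrained Bayesian active distillation paper, Bayesian active learning typically ranks candidate inputs by the BALD mutual-information score below; selecting only the top-scoring inputs under a transmission budget is one simple, illustrative way to respect a communication constraint, not necessarily the paper's criterion.

```python
import numpy as np

def bald_scores(member_probs):
    """BALD acquisition scores: mutual information between label and parameters.

    member_probs: (M, N, C) class probabilities from M posterior samples
    for N candidate inputs over C classes.
    """
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)                                    # (N, C)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)                 # entropy of mean
    expected = -(member_probs * np.log(member_probs + eps)).sum(axis=-1).mean(axis=0)
    return total - expected  # high score = posterior samples disagree = informative
```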
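For the OOD knowledge distillation paper, a classic point of comparison is the maximum softmax probability (MSP) score of Hendrycks and Gimpel: inputs whose top class probability is low are flagged as possibly out-of-distribution. The sketch below is this standard baseline only, not the paper's confidence-amendment method.

```python
import numpy as np

def msp_scores(logits):
    # Maximum softmax probability: lower scores suggest an input may be
    # out-of-distribution; pick a threshold on a held-out validation set.
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)
```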
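For the adversarial preference optimization paper, most preference-alignment pipelines start from the Bradley-Terry reward-model objective below, which pushes the reward of the human-preferred response above that of the rejected one. The paper's adversarial formulation goes beyond this; the sketch shows only the standard building block, with `reward_model` assumed to map encoded responses to scalar rewards.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    # Bradley-Terry pairwise objective: maximize the probability that the
    # preferred ("chosen") response outscores the rejected one.
    r_chosen = reward_model(chosen)      # (batch,) scalar rewards
    r_rejected = reward_model(rejected)  # (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```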
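For the ensemble sampling paper, the sketch below is a simplified implementation for a stochastic linear bandit: m regularized least-squares models, each fed independently perturbed rewards, with one model drawn uniformly each round and played greedily. The perturbation scheme and hyperparameters are illustrative simplifications of the algorithm the paper analyzes.

```python
import numpy as np

def ensemble_sampling(actions, pull, T, m=5, lam=1.0, noise_sd=0.5, seed=0):
    """Simplified ensemble sampling for a stochastic linear bandit.

    actions: (K, d) array of action feature vectors.
    pull:    callable taking an action index and returning a noisy reward.
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    V = lam * np.eye(d)                       # shared Gram matrix
    b = noise_sd * rng.normal(size=(m, d))    # perturbed prior term per model
    history = []
    for _ in range(T):
        j = rng.integers(m)                   # draw one ensemble member
        theta = np.linalg.solve(V, b[j])      # its regularized LS estimate
        k = int(np.argmax(actions @ theta))   # act greedily for that member
        r = pull(k)
        x = actions[k]
        V += np.outer(x, x)
        # Every model sees the reward plus its own fresh Gaussian perturbation.
        b += (r + noise_sd * rng.normal(size=m))[:, None] * x
        history.append((k, r))
    return history
```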
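For the predictive uncertainty paper, the commonly used information-theoretic decomposition that work in this area starts from splits total predictive entropy into an aleatoric part (expected entropy) and an epistemic part (mutual information); the paper argues for an improved measure, so this shows only the standard baseline. Here q denotes an (approximate) posterior over model parameters, and the epistemic term is the same mutual information as the BALD score sketched earlier.

```latex
% Total predictive uncertainty = aleatoric + epistemic:
\underbrace{\mathbb{H}\!\left[\mathbb{E}_{\theta \sim q}\, p(y \mid x, \theta)\right]}_{\text{total}}
\;=\;
\underbrace{\mathbb{E}_{\theta \sim q}\, \mathbb{H}\!\left[p(y \mid x, \theta)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{\mathbb{I}(y;\, \theta \mid x)}_{\text{epistemic}}
```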