- 9.5 Unexpected Improvements to Expected Improvement for Bayesian Optimization
- Authors: Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy
- Reason: This paper addresses a critical issue in Bayesian optimization by proposing LogEI, a new family of acquisition functions. By resolving the numerical pathologies that hamper the optimization of classic EI and EHVI, the authors have genuinely advanced the field, particularly given the strong empirical evidence provided (a minimal illustration of the underflow issue follows below).
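To make the numerical issue concrete, here is a minimal sketch (assuming the standard analytic EI formula for a Gaussian posterior; this is illustrative background, not the paper's LogEI implementation) showing how classic EI underflows far from the incumbent, which is exactly the regime LogEI is designed to handle:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Classic analytic EI for a Gaussian posterior N(mu, sigma^2) at a candidate point."""
    z = (mu - best_f) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

# Far from the incumbent, EI underflows to exactly 0.0 in double precision, so its
# gradient is zero almost everywhere and gradient-based acquisition optimization
# stalls; LogEI instead optimizes a numerically stable expression for log EI.
for best_f in (1.0, 10.0, 40.0):
    print(best_f, expected_improvement(mu=0.0, sigma=1.0, best_f=best_f))
# best_f = 1.0  -> ~0.08
# best_f = 10.0 -> ~7e-25 (tiny, and already affected by cancellation)
# best_f = 40.0 -> 0.0 (an exact numerical zero: the acquisition landscape is flat)
```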
- 9.3 A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
- Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
- Reason: This paper provides valuable insights into the use of behavioural metrics in reinforcement learning. The authors redefine the recent MICo distance using kernel methods and contribute new theoretical results, including bounds on value function differences (the underlying MICo operator is recalled below). The contributions are complemented by strong empirical results demonstrating the efficacy of the proposed methodology.
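For background (recalled from the original MICo work, which this paper revisits through kernels), the MICo distance is the fixed point of the update operator below, and it upper-bounds value function differences, the kind of guarantee the kernel perspective sharpens:

```latex
% MICo update operator on state pairs (x, y), from prior work (Castro et al., 2021);
% its fixed point U^pi is the MICo distance and gamma is the discount factor.
(T^{\pi}_{M} U)(x, y) \;=\; \bigl| r^{\pi}_{x} - r^{\pi}_{y} \bigr|
  \;+\; \gamma\, \mathbb{E}_{x' \sim P^{\pi}_{x},\; y' \sim P^{\pi}_{y}} \bigl[ U(x', y') \bigr],
\qquad
\bigl| V^{\pi}(x) - V^{\pi}(y) \bigr| \;\le\; U^{\pi}(x, y).
```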
- 9.3 Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
- Authors: Joey Hong, Anca Dragan, Sergey Levine
- Reason: The authors make a valuable contribution to offline RL. They carry out a substantial theoretical analysis and propose a practical solution that may improve worst-case sample complexity.
- 9.0 SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
- Authors: Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, Jinxin Liu, Donglin Wang
- Reason: This paper introduces the Sample Efficient Reward Augmentation (SERA) framework, which strengthens exploration during online fine-tuning and thereby improves offline-to-online performance. The framework’s versatility is demonstrated by its integration into various RL algorithms and the asymptotic performance improvements it yields; a generic sketch of reward augmentation is given below.
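As a generic illustration of what "reward augmentation" means mechanically (a sketch assuming a particle-based state-novelty bonus; SERA's actual augmentation is defined differently in the paper), an intrinsic bonus is added to the environment reward before the usual off-policy update:

```python
import numpy as np

def knn_novelty_bonus(state, replay_states, k=5):
    """Illustrative intrinsic bonus: log-distance to the k-th nearest neighbour
    among previously visited states (a common particle-based novelty estimate).
    Assumes the buffer holds more than k states; a stand-in, not SERA's bonus."""
    dists = np.linalg.norm(replay_states - state, axis=1)
    return np.log(np.partition(dists, k)[k] + 1e-8)

def augmented_reward(extrinsic_r, state, replay_states, beta=0.1):
    """Reward augmentation: the fine-tuning agent maximizes the environment reward
    plus a scaled exploration bonus, encouraging broader state coverage online."""
    return extrinsic_r + beta * knn_novelty_bonus(state, replay_states)
```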
- 8.9 Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback
- Authors: Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta
- Reason: This research describes an original system for real-world reinforcement learning. It is important not only because it allows robots to train autonomously without interruption, but also because it makes effective use of asynchronous human feedback to improve the learning process.
- 8.7 Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates
- Authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal
- Reason: The authors address a critical bottleneck in federated reinforcement learning (FedRL): high communication overhead. By leveraging the alternating direction method of multipliers (ADMM), they reduce the communication complexity of each iteration (the standard consensus-ADMM updates are recalled below). The practical relevance of the approach is shown through evaluations in MuJoCo environments.
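For context, generic consensus ADMM (in the style of Boyd et al.) splits a global objective across N agents so that only local parameters and a consensus variable need to be exchanged; the paper adapts this kind of update to natural policy gradients, and its exact federated subproblems differ from this textbook sketch:

```latex
% Textbook consensus ADMM for min_theta sum_i f_i(theta): each agent i keeps a local
% copy theta_i and a dual variable u_i; rho > 0 is the penalty parameter.
\theta_i^{k+1} = \arg\min_{\theta_i} \; f_i(\theta_i)
    + \tfrac{\rho}{2} \bigl\lVert \theta_i - z^{k} + u_i^{k} \bigr\rVert_2^2, \\
z^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \bigl( \theta_i^{k+1} + u_i^{k} \bigr), \qquad
u_i^{k+1} = u_i^{k} + \theta_i^{k+1} - z^{k+1}.
```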
- 8.7 Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
- Authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung
- Reason: This paper tackles primacy bias in deep RL. The proposed reset-based method, which leverages deep ensemble learning, is shown across a range of experiments to improve sample efficiency while ensuring safety; a rough sketch of the reset idea follows below.
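As a rough sketch of the reset mechanism (an illustration under stated assumptions, not the authors' exact schedule or the way their ensemble members are combined for safe action selection), one ensemble member is periodically re-initialized while the shared replay buffer is kept, so the reset agent can quickly relearn from existing data:

```python
def maybe_reset_member(ensemble, init_agent_fn, step, reset_period=200_000):
    """Staggered resets over an ensemble of agents to counter primacy bias:
    every `reset_period` environment steps, replace one member (round-robin)
    with freshly initialized parameters while retaining the shared replay buffer.
    Sketch only; `init_agent_fn` is a hypothetical constructor for a new agent."""
    if step > 0 and step % reset_period == 0:
        idx = (step // reset_period - 1) % len(ensemble)
        ensemble[idx] = init_agent_fn()  # new random weights; replay data is untouched
    return ensemble
```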
- 8.5 Handover Protocol Learning for LEO Satellite Networks: Access Delay and Collision Minimization
- Authors: Ju-Hyung Lee, Chanyoung Park, Soohyun Park, Andreas F. Molisch
- Reason: The study presents a novel deep-reinforcement-learning-based handover protocol, DHO, which could be useful in satellite networks. Empirical evaluations showing effective reductions in access delay and collisions make the contribution substantial.
- 8.3 Efficient Exploration in Continuous-time Model-based Reinforcement Learning
- Authors: Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause
- Reason: This paper introduces a novel model-based reinforcement learning algorithm for continuous-time systems. The authors showcase the benefits of continuous-time modeling over discrete-time counterparts and the potential for sublinear regret with significantly fewer samples (a toy continuous-time rollout is sketched below).
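To ground what "continuous-time modeling" means in practice (a toy sketch assuming a learned vector field `f_theta` and a simple Euler integrator; the paper's algorithm additionally tracks model uncertainty and decides when to take measurements), the dynamics are a differential equation rather than a one-step transition model:

```python
import numpy as np

def rollout_continuous(f_theta, policy, x0, horizon=5.0, dt=0.01):
    """Euler rollout of a learned continuous-time model dx/dt = f_theta(x, u).
    `f_theta` and `policy` are hypothetical callables; real use would integrate
    with an adaptive ODE solver and account for epistemic uncertainty."""
    x = np.asarray(x0, dtype=float)
    xs, t = [x], 0.0
    while t < horizon:
        u = policy(x, t)
        x = x + dt * f_theta(x, u)  # one Euler step of the continuous dynamics
        xs.append(x)
        t += dt
    return np.stack(xs)
```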
- 8.1 Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
- Authors: Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang
- Reason: The study advances the understanding of competitive reinforcement learning in zero-sum Markov games under both self-play and adversarial learning settings. The proposed posterior sampling methods achieve low regret bounds and can inform the development of learning algorithms for competitive RL.