- 9.8 Kernelized Reinforcement Learning with Order Optimal Regret Bounds
- Authors: Sattar Vakili, Julia Olkhovskaya
- Reason: Expands on optimization techniques for kernelized RL, establishes order-optimal regret bounds in a general setting, and improves upon prior results for non-smooth kernels (typical regret shapes are sketched below).
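For context on what "order optimal" means here: regret guarantees in kernel-based bandits and RL are conventionally stated via the maximum information gain $\gamma_T$, and an order-optimal algorithm matches the known lower bound up to logarithmic factors. The shapes below are the standard framing in this literature, not the paper's exact theorem:

```latex
% Standard framing in kernelized bandits/RL (not the paper's exact
% statement): with maximum information gain \gamma_T after T steps,
% classical optimistic methods such as GP-UCB guarantee
    R_T = \tilde{O}\!\left(\gamma_T \sqrt{T}\right),
% whereas an order-optimal algorithm matches the lower bound
    R_T = \tilde{O}\!\left(\sqrt{T \, \gamma_T}\right).
```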
- 9.8 Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
- Authors: Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander
- Reason: From authors with significant contributions to the field; the paper introduces a uniquely fast end-to-end RL framework to advance robotics.
- 9.6 A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
- Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari
- Reason: A novel algorithm for offline constrained RL that improves over existing methods under weaker assumptions; a generic primal-dual sketch follows below.
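The paper's actual algorithm is a primal-dual method with a critic, trained from a fixed offline dataset; the sketch below only illustrates the generic primal-dual machinery such methods build on, using a toy constrained bandit. The toy problem, variable names, and hyperparameters are all my own illustrative assumptions:

```python
import numpy as np

# Generic primal-dual sketch for a toy constrained bandit:
# maximize E[reward] subject to E[cost] <= budget.
# (Illustrative only; not the paper's primal-dual-critic algorithm.)
n_actions = 3
reward = np.array([1.0, 0.6, 0.2])    # per-action expected reward
cost = np.array([0.9, 0.4, 0.1])      # per-action expected constraint cost
budget = 0.5                          # constraint threshold

logits = np.zeros(n_actions)          # primal variable: softmax policy params
lam = 0.0                             # dual variable pricing the constraint
eta_primal, eta_dual = 0.5, 0.1

for _ in range(500):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    penalized = reward - lam * cost   # Lagrangian per-action payoff
    adv = penalized - pi @ penalized  # softmax policy-gradient advantage
    logits += eta_primal * pi * adv   # primal ascent on the Lagrangian
    # Dual ascent: raise lam while the constraint is violated.
    lam = max(0.0, lam + eta_dual * (pi @ cost - budget))

print("policy:", pi.round(3),
      "E[cost]:", round(float(pi @ cost), 3), "lambda:", round(lam, 3))
```

The dual variable prices the constraint cost into the policy update; in the offline setting, the exact expectations above would be replaced by critic estimates learned from logged data.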
- 9.4 Provably Learning Nash Policies in Constrained Markov Potential Games
- Authors: Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause
- Reason: Addresses the complexity of multi-agent reinforcement learning and provides a principled, practical approach to safe MARL problems.
- 9.4 A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning
- Authors: Siyuan Guo, Yanchao Sun, Jifeng Hu, Sili Huang, Hechang Chen, Haiyin Piao, Lichao Sun, Yi Chang
- Reason: Introduces a unified framework that addresses critical challenges in offline-to-online RL, demonstrating strong performance across various environments; an uncertainty-guided sketch follows below.
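As a rough illustration of uncertainty guidance (not the paper's framework), the sketch below uses disagreement in an ensemble of Q-estimates as the uncertainty signal and selects actions by a lower confidence bound; the ensemble setup, `beta`, and the annealing note are my assumptions:

```python
import numpy as np

# Uncertainty-guided action selection via ensemble disagreement
# (illustrative sketch, not the paper's framework).
rng = np.random.default_rng(1)

n_actions, n_members = 4, 5
# Pretend each ensemble member was fit on a different slice of offline data.
q_ensemble = rng.normal(loc=[1.0, 0.8, 0.5, 0.2], scale=0.3,
                        size=(n_members, n_actions))

q_mean = q_ensemble.mean(axis=0)
q_std = q_ensemble.std(axis=0)   # disagreement = epistemic uncertainty proxy
beta = 1.0                       # pessimism weight (hyperparameter)

# Lower-confidence-bound choice: avoid actions the offline data
# supports poorly. During online fine-tuning, beta can be annealed
# toward 0 as fresh experience reduces uncertainty.
lcb = q_mean - beta * q_std
print("chosen action:", int(np.argmax(lcb)), "LCB:", lcb.round(3))
```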
- 9.2 Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes
- Authors: Luca Sabbioni, Francesco Corda, Marcello Restelli
- Reason: Tackles known issues with currently used policy-based algorithms and applies meta-reinforcement learning to improve the adaptability of the learning rate across environments; see the sketch below.
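The simplest mechanism in this family is adapting the stepsize from experience. The sketch below uses plain hypergradient adaptation on a quadratic objective, which is a stand-in for, not a reproduction of, the paper's meta-RL approach over contextual MDPs:

```python
# Hypergradient-style stepsize adaptation on f(w) = w^2
# (a stand-in illustration; the paper meta-learns stepsizes with RL).
w, alpha = 5.0, 0.01      # parameter and learnable stepsize
meta_lr = 0.002
grad_prev = 0.0

for _ in range(100):
    grad = 2.0 * w                                 # gradient of w^2
    # Grow alpha when consecutive gradients align, shrink otherwise.
    alpha = max(1e-4, alpha + meta_lr * grad * grad_prev)
    w -= alpha * grad                              # plain gradient step
    grad_prev = grad

print("w:", round(w, 4), "learned alpha:", round(alpha, 4))
```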
- 9.2 Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
- Authors: Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
- Reason: Proposes a new application of offline RL techniques to learning-to-rank problems, showing consistent improvements across various datasets; the underlying MDP view of ranking is sketched below.
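The core reframing is that a ranking can be built position by position as an MDP: the state is the partial ranking, an action places the next document, and the per-step reward is that position's DCG gain. The toy below illustrates this formulation with a greedy policy and made-up relevance labels; the paper's off-policy estimators are the actual contribution:

```python
import numpy as np

# Ranking as a sequential decision process (illustrative formulation).
relevance = np.array([3, 0, 1, 2])    # hypothetical judged relevances

def dcg_gain(rel, position):
    # Standard DCG gain for placing an item at 0-indexed `position`.
    return (2 ** rel - 1) / np.log2(position + 2)

remaining = list(range(len(relevance)))
ranking, episode_return = [], 0.0
while remaining:                      # greedy policy, for illustration only
    action = max(remaining, key=lambda d: relevance[d])
    episode_return += dcg_gain(relevance[action], len(ranking))
    ranking.append(action)
    remaining.remove(action)

print("ranking:", ranking, "return (= DCG):", round(float(episode_return), 3))
```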
- 9.0 Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
- Authors: Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins
- Reason: A new development in pessimistic policy optimization, broadly applicable because it reduces the problem to supervised-learning oracles; results show superior performance to unregularized policy optimization. A toy version of pessimistic selection follows below.
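To make "pessimism" concrete, here is a toy context-free version: estimate each candidate policy's value by inverse propensity scoring and penalize it by an empirical width before selecting. The logging setup, penalty form, and constants are my assumptions; the paper's oracle-efficient algorithm is substantially more refined:

```python
import numpy as np

# Toy pessimistic offline policy selection (context-free bandit).
# Illustrative only; not the paper's oracle-efficient algorithm.
rng = np.random.default_rng(2)

n, n_actions = 5000, 3
true_means = np.array([0.2, 0.5, 0.8])            # unknown to the learner
logged_a = rng.integers(n_actions, size=n)        # uniform logging policy
prop = 1.0 / n_actions                            # logging propensity
logged_r = rng.binomial(1, true_means[logged_a]).astype(float)

for a in range(n_actions):                        # candidates: "always play a"
    w = (logged_a == a) / prop                    # importance weights
    ips = np.mean(w * logged_r)                   # unbiased value estimate
    # Pessimism: subtract an empirical width so that poorly covered
    # policies are not selected on noise.
    width = np.sqrt(np.var(w * logged_r) / n)
    print(f"policy {a}: IPS={ips:.3f}  pessimistic={ips - 2 * width:.3f}")
```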
- 8.9 Robust Reinforcement Learning through Efficient Adversarial Herding
- Authors: Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic
- Reason: Proposes an innovative method, adversarial herding, to enhance the robustness of RL agents under various scenarios; a toy two-player sketch of the herding idea follows below.
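The "herding" idea, as I read it, is to train the protagonist against a maintained set of adversaries rather than a single one. The toy two-player game below illustrates that loop on a convex-concave objective; the objective, herd size, and learning rates are my assumptions, not the paper's setup:

```python
import numpy as np

# Toy "herd of adversaries" loop on a convex-concave game
# (illustrative only; not the paper's RL training procedure).
rng = np.random.default_rng(3)

theta = 2.0                # protagonist parameter
herd = rng.normal(size=4)  # parameters of four adversaries
lr = 0.05

def loss(theta, delta):
    # Convex in theta, concave in delta: the protagonist minimizes,
    # each adversary maximizes.
    return theta ** 2 + theta * delta - delta ** 2

for _ in range(300):
    worst = int(np.argmax([loss(theta, d) for d in herd]))
    theta -= lr * (2 * theta + herd[worst])   # descend vs. the worst attack
    herd += lr * (theta - 2 * herd)           # every adversary ascends its own loss

print("theta:", round(theta, 3), "herd:", herd.round(3))
```

Keeping several adversaries alive discourages the protagonist from overfitting to a single attack pattern, which is the intuition behind herding.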
- 8.4 Composing Efficient, Robust Tests for Policy Selection
- Authors: Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
- Reason: The research presents a new algorithm for composing policy-selection test cases under a robustness criterion, showing its effectiveness across different tasks.