- 9.8 Kernelized Reinforcement Learning with Order Optimal Regret Bounds
- Authors: Sattar Vakili, Julia Olkhovskaya
- Reason: Expands on optimization techniques for kernelized RL, establishes order-optimal regret bounds in a general setting, and improves upon prior results for non-smooth kernels (typical regret shapes are sketched below).
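For context on what "order optimal" means here: regret guarantees in kernel-based bandits and RL are conventionally stated via the maximum information gain $\gamma_T$, and an order-optimal algorithm matches the known lower bound up to logarithmic factors. The shapes below are the standard framing in this literature, not the paper's exact theorem:

```latex
% Standard framing in kernelized bandits/RL (not the paper's exact
% statement): with maximum information gain \gamma_T after T steps,
% classical optimistic methods such as GP-UCB guarantee
    R_T = \tilde{O}\!\left(\gamma_T \sqrt{T}\right),
% whereas an order-optimal algorithm matches the lower bound
    R_T = \tilde{O}\!\left(\sqrt{T \, \gamma_T}\right).
```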
- 9.8 Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
- Authors: Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander
- Reason: From authors with significant contributions to the field; the paper introduces a uniquely fast end-to-end RL framework to advance robotics.
- 9.6 A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
- Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari
- Reason: A novel algorithm for offline constrained RL that improves over existing methods under weaker assumptions; a generic primal-dual sketch follows below.
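The paper's actual algorithm is a primal-dual method with a critic, trained from a fixed offline dataset; the sketch below only illustrates the generic primal-dual machinery such methods build on, using a toy constrained bandit. The toy problem, variable names, and hyperparameters are all my own illustrative assumptions:

```python
import numpy as np

# Generic primal-dual sketch for a toy constrained bandit:
# maximize E[reward] subject to E[cost] <= budget.
# (Illustrative only; not the paper's primal-dual-critic algorithm.)
n_actions = 3
reward = np.array([1.0, 0.6, 0.2])    # per-action expected reward
cost = np.array([0.9, 0.4, 0.1])      # per-action expected constraint cost
budget = 0.5                          # constraint threshold

logits = np.zeros(n_actions)          # primal variable: softmax policy params
lam = 0.0                             # dual variable pricing the constraint
eta_primal, eta_dual = 0.5, 0.1

for _ in range(500):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    penalized = reward - lam * cost   # Lagrangian per-action payoff
    adv = penalized - pi @ penalized  # softmax policy-gradient advantage
    logits += eta_primal * pi * adv   # primal ascent on the Lagrangian
    # Dual ascent: raise lam while the constraint is violated.
    lam = max(0.0, lam + eta_dual * (pi @ cost - budget))

print("policy:", pi.round(3),
      "E[cost]:", round(float(pi @ cost), 3), "lambda:", round(lam, 3))
```

The dual variable prices the constraint cost into the policy update; in the offline setting, the exact expectations above would be replaced by critic estimates learned from logged data.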
- 9.4 Provably Learning Nash Policies in Constrained Markov Potential Games
- Authors: Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause
- Reason: Addresses the complexity of multi-agent reinforcement learning and provides a principled, practical approach to safe MARL problems.
- 9.4 A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning
- Authors: Siyuan Guo, Yanchao Sun, Jifeng Hu, Sili Huang, Hechang Chen, Haiyin Piao, Lichao Sun, Yi Chang
- Reason: Introduces a unified framework that addresses critical challenges in offline-to-online RL, demonstrating strong performance across various environments; an uncertainty-guided sketch follows below.
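As a rough illustration of uncertainty guidance (not the paper's framework), the sketch below uses disagreement in an ensemble of Q-estimates as the uncertainty signal and selects actions by a lower confidence bound; the ensemble setup, `beta`, and the annealing note are my assumptions:

```python
import numpy as np

# Uncertainty-guided action selection via ensemble disagreement
# (illustrative sketch, not the paper's framework).
rng = np.random.default_rng(1)

n_actions, n_members = 4, 5
# Pretend each ensemble member was fit on a different slice of offline data.
q_ensemble = rng.normal(loc=[1.0, 0.8, 0.5, 0.2], scale=0.3,
                        size=(n_members, n_actions))

q_mean = q_ensemble.mean(axis=0)
q_std = q_ensemble.std(axis=0)   # disagreement = epistemic uncertainty proxy
beta = 1.0                       # pessimism weight (hyperparameter)

# Lower-confidence-bound choice: avoid actions the offline data
# supports poorly. During online fine-tuning, beta can be annealed
# toward 0 as fresh experience reduces uncertainty.
lcb = q_mean - beta * q_std
print("chosen action:", int(np.argmax(lcb)), "LCB:", lcb.round(3))
```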
- 9.2 Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes
- Authors: Luca Sabbioni, Francesco Corda, Marcello Restelli
- Reason: Tackles known issues with currently used policy-based algorithms and applies meta-reinforcement learning to improve the adaptability of the learning rate across environments; see the sketch below.
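The simplest mechanism in this family is adapting the stepsize from experience. The sketch below uses plain hypergradient adaptation on a quadratic objective, which is a stand-in for, not a reproduction of, the paper's meta-RL approach over contextual MDPs:

```python
# Hypergradient-style stepsize adaptation on f(w) = w^2
# (a stand-in illustration; the paper meta-learns stepsizes with RL).
w, alpha = 5.0, 0.01      # parameter and learnable stepsize
meta_lr = 0.002
grad_prev = 0.0

for _ in range(100):
    grad = 2.0 * w                                 # gradient of w^2
    # Grow alpha when consecutive gradients align, shrink otherwise.
    alpha = max(1e-4, alpha + meta_lr * grad * grad_prev)
    w -= alpha * grad                              # plain gradient step
    grad_prev = grad

print("w:", round(w, 4), "learned alpha:", round(alpha, 4))
```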
- 9.2 Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
- Authors: Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
- Reason: Proposes a new application of offline RL techniques to learning-to-rank problems, showing consistent improvements across various datasets; the underlying MDP view of ranking is sketched below.
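The core reframing is that a ranking can be built position by position as an MDP: the state is the partial ranking, an action places the next document, and the per-step reward is that position's DCG gain. The toy below illustrates this formulation with a greedy policy and made-up relevance labels; the paper's off-policy estimators are the actual contribution:

```python
import numpy as np

# Ranking as a sequential decision process (illustrative formulation).
relevance = np.array([3, 0, 1, 2])    # hypothetical judged relevances

def dcg_gain(rel, position):
    # Standard DCG gain for placing an item at 0-indexed `position`.
    return (2 ** rel - 1) / np.log2(position + 2)

remaining = list(range(len(relevance)))
ranking, episode_return = [], 0.0
while remaining:                      # greedy policy, for illustration only
    action = max(remaining, key=lambda d: relevance[d])
    episode_return += dcg_gain(relevance[action], len(ranking))
    ranking.append(action)
    remaining.remove(action)

print("ranking:", ranking, "return (= DCG):", round(float(episode_return), 3))
```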
- 9.0 Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
- Authors: Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins
- Reason: A new development in pessimistic policy optimization, broadly applicable because it reduces the problem to supervised-learning oracles; results show superior performance to unregularized policy optimization. A toy version of pessimistic selection follows below.
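To make "pessimism" concrete, here is a toy context-free version: estimate each candidate policy's value by inverse propensity scoring and penalize it by an empirical width before selecting. The logging setup, penalty form, and constants are my assumptions; the paper's oracle-efficient algorithm is substantially more refined:

```python
import numpy as np

# Toy pessimistic offline policy selection (context-free bandit).
# Illustrative only; not the paper's oracle-efficient algorithm.
rng = np.random.default_rng(2)

n, n_actions = 5000, 3
true_means = np.array([0.2, 0.5, 0.8])            # unknown to the learner
logged_a = rng.integers(n_actions, size=n)        # uniform logging policy
prop = 1.0 / n_actions                            # logging propensity
logged_r = rng.binomial(1, true_means[logged_a]).astype(float)

for a in range(n_actions):                        # candidates: "always play a"
    w = (logged_a == a) / prop                    # importance weights
    ips = np.mean(w * logged_r)                   # unbiased value estimate
    # Pessimism: subtract an empirical width so that poorly covered
    # policies are not selected on noise.
    width = np.sqrt(np.var(w * logged_r) / n)
    print(f"policy {a}: IPS={ips:.3f}  pessimistic={ips - 2 * width:.3f}")
```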
- 8.9 Robust Reinforcement Learning through Efficient Adversarial Herding
- Authors: Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic
- Reason: Proposes an innovative method, adversarial herding, to enhance the robustness of RL agents under various scenarios; a toy two-player sketch of the herding idea follows below.
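The "herding" idea, as I read it, is to train the protagonist against a maintained set of adversaries rather than a single one. The toy two-player game below illustrates that loop on a convex-concave objective; the objective, herd size, and learning rates are my assumptions, not the paper's setup:

```python
import numpy as np

# Toy "herd of adversaries" loop on a convex-concave game
# (illustrative only; not the paper's RL training procedure).
rng = np.random.default_rng(3)

theta = 2.0                # protagonist parameter
herd = rng.normal(size=4)  # parameters of four adversaries
lr = 0.05

def loss(theta, delta):
    # Convex in theta, concave in delta: the protagonist minimizes,
    # each adversary maximizes.
    return theta ** 2 + theta * delta - delta ** 2

for _ in range(300):
    worst = int(np.argmax([loss(theta, d) for d in herd]))
    theta -= lr * (2 * theta + herd[worst])   # descend vs. the worst attack
    herd += lr * (theta - 2 * herd)           # every adversary ascends its own loss

print("theta:", round(theta, 3), "herd:", herd.round(3))
```

Keeping several adversaries alive discourages the protagonist from overfitting to a single attack pattern, which is the intuition behind herding.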
- 8.4 Composing Efficient, Robust Tests for Policy Selection
- Authors: Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
- Reason: The research presents a new algorithm for composing policy-selection test cases under a robustness criterion, showing its effectiveness across different tasks.