- 9.6 Deep Backtracking Counterfactuals for Causally Compliant Explanations
- Authors: Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
- Reason: The paper provides a novel technique for computing backtracking counterfactuals in structural causal models. The method is versatile and modular and yields causally compliant alternative explanations, which are relevant to explaining decisions in reinforcement learning (a toy sketch of the backtracking idea follows below).
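  A minimal toy sketch of the backtracking idea, assuming a hand-written linear SCM and a generic optimizer; the paper's method uses deep generative components, so the SCM, penalty weight, and target here are illustrative only.

  ```python
  # Toy backtracking counterfactual in a linear SCM (illustrative only,
  # not the authors' deep implementation).
  # SCM: X = U_x,  Y = 2*X + U_y, with mechanisms kept fixed.
  import numpy as np
  from scipy.optimize import minimize

  def forward(u):
      """Propagate exogenous noise u = (u_x, u_y) through the fixed mechanisms."""
      x = u[0]
      y = 2.0 * x + u[1]
      return x, y

  # Factual world: observe (x*, y*) and recover the factual noise by abduction.
  x_star, y_star = 1.0, 2.5
  u_star = np.array([x_star, y_star - 2.0 * x_star])

  # Backtracking: rather than intervening on a mechanism, find the *closest*
  # noise u' such that the antecedent "Y would have been 4" holds, then
  # propagate it forward through the unchanged mechanisms.
  y_target = 4.0

  def objective(u):
      _, y = forward(u)
      return np.sum((u - u_star) ** 2) + 1e3 * (y - y_target) ** 2

  u_cf = minimize(objective, u_star).x
  print("counterfactual world:", forward(u_cf))
  ```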
- 9.3 MatFormer: Nested Transformer for Elastic Inference
- Authors: Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
- Reason: The paper introduces MatFormer, a nested Transformer architecture that adapts flexibly to diverse deployment constraints. A single trained model yields hundreds of accurate smaller submodels, which is valuable wherever inference budgets vary, including reinforcement-learning deployments (see the weight-sharing sketch below).
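  A minimal PyTorch sketch of the nested weight-sharing idea: submodels of different widths reuse prefixes of one FFN's hidden units. The class, fractions, and dimensions are illustrative assumptions, not the official MatFormer code.

  ```python
  # Sketch of a MatFormer-style *nested* feed-forward block: a submodel of
  # width fraction f reuses the first f-fraction of the full FFN's hidden
  # units, so one set of weights serves many model sizes.
  import torch
  import torch.nn as nn

  class NestedFFN(nn.Module):
      def __init__(self, d_model=512, d_hidden=2048):
          super().__init__()
          self.w_in = nn.Linear(d_model, d_hidden)
          self.w_out = nn.Linear(d_hidden, d_model)

      def forward(self, x, frac=1.0):
          k = int(self.w_in.out_features * frac)  # active slice of hidden units
          h = torch.relu(x @ self.w_in.weight[:k].T + self.w_in.bias[:k])
          return h @ self.w_out.weight[:, :k].T + self.w_out.bias

  ffn = NestedFFN()
  x = torch.randn(4, 512)
  full = ffn(x, frac=1.0)    # full-width model
  small = ffn(x, frac=0.25)  # elastic submodel sharing the same weights
  ```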
- 9.1 Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
- Authors: Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao
- Reason: This paper provides an extensive study of plasticity in visual reinforcement learning across data, modules, and training stages. It also introduces a method for addressing the high replay-ratio dilemma, an important problem for sample-efficient reinforcement learning (a schematic of an adaptive training loop follows below).
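  A schematic of an adaptive replay-ratio training loop; `plasticity_signal` and the threshold are placeholders standing in for the paper's actual criterion, so only the control flow is the point here.

  ```python
  # Schematic adaptive replay-ratio schedule in an off-policy loop.
  # The plasticity_signal() criterion is a placeholder; the paper's exact
  # measure and thresholds differ.
  def train(agent, buffer, env, total_steps, low_rr=1, high_rr=8):
      obs = env.reset()
      replay_ratio = low_rr  # start conservatively while plasticity is fragile
      for step in range(total_steps):
          action = agent.act(obs)
          next_obs, reward, done, _ = env.step(action)
          buffer.add(obs, action, reward, next_obs, done)
          obs = env.reset() if done else next_obs

          # Raise the replay ratio only once the learner has recovered enough
          # plasticity to exploit more gradient updates per environment step.
          if agent.plasticity_signal() > 0.5:  # placeholder threshold
              replay_ratio = high_rr
          for _ in range(replay_ratio):
              agent.update(buffer.sample())
  ```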
- 8.9 Imitation Learning from Purified Demonstrations
- Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu
- Reason: Introduces an approach that directly tackles the challenge of imperfect demonstrations in imitation learning by purifying them to recover near-optimal demonstrations. The use of diffusion models for this purification is an innovative approach (a toy sketch follows below).
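  A toy sketch of the purification idea: partially noise an imperfect demonstration with a forward diffusion, then run the learned reverse process to pull it back toward the expert manifold. `denoiser` stands in for a trained diffusion model, and the schedule is arbitrary.

  ```python
  # Toy diffusion-based "purification" of an imperfect demonstration.
  # `denoiser` is a placeholder for a trained score/denoising network.
  import torch

  def purify(demo_states, denoiser, t_steps=10, beta=0.02):
      x = demo_states.clone()
      # forward diffusion: partially noise the imperfect demonstration
      for _ in range(t_steps):
          x = (1 - beta) ** 0.5 * x + beta ** 0.5 * torch.randn_like(x)
      # reverse process: the denoiser projects back toward clean expert data
      for t in reversed(range(t_steps)):
          x = denoiser(x, t)
      return x
  ```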
- 8.9 Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning
- Authors: Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi
- Reason: The paper proposes a novel posterior sampling method that exploits prior causal knowledge. It is particularly useful in reinforcement learning because its analysis explicitly connects the regret rate to the degree of prior knowledge encoded in the causal graph (a minimal posterior-sampling sketch follows below).
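  A minimal Thompson-sampling sketch in which the prior factorizes along a given causal graph; the toy bandit, the independent Beta posteriors, and the factorized reward are hypothetical stand-ins for the paper's setting, meant only to show how a graph prior shrinks the hypothesis space.

  ```python
  # Minimal posterior sampling with a factored (graph-structured) prior:
  # one Beta posterior per (action, causal factor), factors independent.
  import numpy as np

  rng = np.random.default_rng(0)
  n_actions, n_factors = 4, 2
  post = np.ones((n_actions, n_factors, 2))  # Beta(a, b) per factor

  def true_reward(a):  # unknown to the agent; toy Bernoulli reward
      return float(rng.random() < 0.2 + 0.15 * a)

  for step in range(1000):
      # posterior sampling: draw one model from the factored posterior
      theta = rng.beta(post[:, :, 0], post[:, :, 1])  # (actions, factors)
      a = int(np.argmax(theta.prod(axis=1)))          # act greedily in the sample
      r = true_reward(a)
      post[a, :, 0] += r        # conjugate update of each factor's posterior
      post[a, :, 1] += 1 - r
  ```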
- 8.7 Robust Safe Reinforcement Learning under Adversarial Disturbances
- Authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
- Reason: Addresses a critical concern in real-world control by proposing a framework that handles worst-case disturbances, ensuring persistent safety even in their presence. The work stands out for its focus on practical applicability and robustness in reinforcement learning (the min-max structure is sketched below).
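  A schematic of the min-max training structure, with all components as placeholders (including the two-argument `env.step`): a protagonist policy trades reward off against a safety cost while an adversary supplies worst-case disturbances. The paper's concrete algorithm and guarantees differ in detail.

  ```python
  # Schematic min-max loop for robust safe RL: protagonist vs. an adversary
  # that injects worst-case disturbances. Structure only; networks, updates,
  # and the env API are placeholders.
  def robust_safe_training(env, policy, adversary, safety_critic, steps=100_000):
      obs = env.reset()
      for _ in range(steps):
          action = policy.act(obs)
          disturbance = adversary.act(obs)          # worst-case perturbation
          next_obs, reward, cost, done, _ = env.step(action, disturbance)

          # protagonist: maximize reward subject to the safety constraint
          policy.update(obs, action, reward, cost, next_obs, safety_critic)
          # adversary: minimize the protagonist's constrained objective
          adversary.update(obs, disturbance, -reward, cost, next_obs)
          obs = env.reset() if done else next_obs
  ```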
- 8.7 Score Regularized Policy Optimization through Diffusion Behavior
- Authors: Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
- Reason: This paper proposes a method that regularizes the policy gradient with the behavior distribution's score function, significantly boosting action-sampling speed while maintaining strong performance, an advance for reinforcement learning with diffusion behavior models (one reading of the loss is sketched below).
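  A sketch of one way to read "regularize the policy with the behavior score": a deterministic policy maximizes Q while a surrogate term pushes its actions along the pretrained diffusion model's score, keeping them in-distribution. `behavior_score`, the weighting, and the surrogate form are assumptions, not the paper's exact loss.

  ```python
  # Score-regularized policy update sketch. behavior_score(s, a) is a
  # placeholder for grad_a log p_behavior(a | s) from a pretrained
  # diffusion behavior model.
  import torch

  def score_regularized_loss(policy, q_net, behavior_score, states, alpha=0.1):
      actions = policy(states)                      # deterministic actions
      q_term = q_net(states, actions).mean()        # exploit the critic
      # surrogate: gradient of (score . actions) w.r.t. policy parameters
      # moves actions uphill on the behavior log-density
      score = behavior_score(states, actions).detach()
      reg_term = (score * actions).sum(dim=-1).mean()
      return -(q_term + alpha * reg_term)
  ```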
- 8.4 Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
- Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li
- Reason: Presents novel insights into regularized policy iteration by proving its strict equivalence to the standard Newton-Raphson method under particular conditions. This work bridges an essential theoretical gap, enhancing our understanding of the convergence properties of regularized policy iteration algorithms (the correspondence is pictured below).
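  In generic entropy-regularized notation (not the paper's exact statement or conditions), the correspondence can be pictured as Newton's method applied to the smoothed Bellman residual:

  ```latex
  % Schematic only: generic soft Bellman operator and its Newton iteration.
  \[
    F(V) \;=\; T_\tau V - V,
    \qquad
    (T_\tau V)(s) \;=\; \tau \log \sum_a \exp\!\Big(
        \tfrac{1}{\tau}\big( r(s,a) + \gamma\,\mathbb{E}_{s'}[V(s')] \big)\Big),
  \]
  \[
    V_{k+1} \;=\; V_k - \big(\nabla F(V_k)\big)^{-1} F(V_k),
  \]
  % Since the derivative of the log-sum-exp operator is governed by the
  % soft-greedy (Boltzmann) policy at V_k, this Newton step amounts to
  % evaluating that policy, i.e. one step of regularized policy iteration.
  ```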
- 8.2 Off-Policy Evaluation for Human Feedback
- Authors: Qitong Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
- Reason: Introduces a framework for off-policy evaluation of human feedback signals in reinforcement learning, addressing a largely unsolved challenge with the potential to improve efficiency and safety in healthcare and other applications (the underlying OPE setting is sketched below).
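  For context, a sketch of the classic per-trajectory importance-sampling OPE estimator that the human-feedback setting builds on; the paper's own estimator differs, so this is background rather than their method.

  ```python
  # Per-trajectory importance-sampling off-policy evaluation: reweight
  # returns logged under a behavior policy by the eval/behavior ratio.
  import numpy as np

  def is_ope(trajectories, pi_eval, pi_behavior, gamma=0.99):
      """Each trajectory is a list of (state, action, reward) tuples."""
      estimates = []
      for traj in trajectories:
          weight, ret = 1.0, 0.0
          for t, (s, a, r) in enumerate(traj):
              weight *= pi_eval(a, s) / pi_behavior(a, s)  # likelihood ratio
              ret += (gamma ** t) * r
          estimates.append(weight * ret)
      return float(np.mean(estimates))
  ```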
- 8.0 COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
- Authors: Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
- Reason: Provides a novel planning-driven framework that mitigates errors from inaccurately learned dynamics models. This plug-and-play framework significantly improves the sample efficiency and asymptotic performance of strong model-based methods (the two uses of model uncertainty are sketched below).
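  A sketch of the two roles model uncertainty plays, using ensemble disagreement as a stand-in measure: it is penalized when generating rollouts for policy updates (conservative) and rewarded when choosing real-environment actions (optimistic exploration). The paper's full planning procedure is omitted, and `predict`, `lam`, and the disagreement measure are assumptions.

  ```python
  # Two uses of ensemble disagreement as an uncertainty proxy in
  # model-based RL: conservative rollouts, optimistic exploration.
  import numpy as np

  def disagreement(ensemble, s, a):
      preds = np.stack([m.predict(s, a) for m in ensemble])
      return preds.std(axis=0).mean()  # ensemble spread ~ model uncertainty

  def rollout_reward(ensemble, s, a, r, lam=1.0):
      return r - lam * disagreement(ensemble, s, a)   # conservative rollouts

  def exploration_value(ensemble, s, a, q, lam=1.0):
      return q + lam * disagreement(ensemble, s, a)   # optimistic action choice
  ```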