- 9.6 Deep Backtracking Counterfactuals for Causally Compliant Explanations
- Authors: Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
- Reason: The paper provides a novel technique for computing backtracking counterfactuals in structural causal models. The method is versatile and modular and yields causally compliant alternative explanations, which are relevant to explaining decisions in reinforcement learning (a toy sketch of the backtracking idea follows below).
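  A minimal toy sketch of the backtracking idea, assuming a hand-written linear SCM and a generic optimizer; the paper's method uses deep generative components, so the SCM, penalty weight, and target here are illustrative only.

  ```python
  # Toy backtracking counterfactual in a linear SCM (illustrative only,
  # not the authors' deep implementation).
  # SCM: X = U_x,  Y = 2*X + U_y, with mechanisms kept fixed.
  import numpy as np
  from scipy.optimize import minimize

  def forward(u):
      """Propagate exogenous noise u = (u_x, u_y) through the fixed mechanisms."""
      x = u[0]
      y = 2.0 * x + u[1]
      return x, y

  # Factual world: observe (x*, y*) and recover the factual noise by abduction.
  x_star, y_star = 1.0, 2.5
  u_star = np.array([x_star, y_star - 2.0 * x_star])

  # Backtracking: rather than intervening on a mechanism, find the *closest*
  # noise u' such that the antecedent "Y would have been 4" holds, then
  # propagate it forward through the unchanged mechanisms.
  y_target = 4.0

  def objective(u):
      _, y = forward(u)
      return np.sum((u - u_star) ** 2) + 1e3 * (y - y_target) ** 2

  u_cf = minimize(objective, u_star).x
  print("counterfactual world:", forward(u_cf))
  ```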
- 9.3 MatFormer: Nested Transformer for Elastic Inference
- Authors: Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
- Reason: The paper introduces MatFormer, a nested Transformer architecture that adapts flexibly to diverse deployment constraints. A single trained model yields hundreds of accurate smaller submodels, which is valuable wherever inference budgets vary, including reinforcement-learning deployments (see the weight-sharing sketch below).
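  A minimal PyTorch sketch of the nested weight-sharing idea: submodels of different widths reuse prefixes of one FFN's hidden units. The class, fractions, and dimensions are illustrative assumptions, not the official MatFormer code.

  ```python
  # Sketch of a MatFormer-style *nested* feed-forward block: a submodel of
  # width fraction f reuses the first f-fraction of the full FFN's hidden
  # units, so one set of weights serves many model sizes.
  import torch
  import torch.nn as nn

  class NestedFFN(nn.Module):
      def __init__(self, d_model=512, d_hidden=2048):
          super().__init__()
          self.w_in = nn.Linear(d_model, d_hidden)
          self.w_out = nn.Linear(d_hidden, d_model)

      def forward(self, x, frac=1.0):
          k = int(self.w_in.out_features * frac)  # active slice of hidden units
          h = torch.relu(x @ self.w_in.weight[:k].T + self.w_in.bias[:k])
          return h @ self.w_out.weight[:, :k].T + self.w_out.bias

  ffn = NestedFFN()
  x = torch.randn(4, 512)
  full = ffn(x, frac=1.0)    # full-width model
  small = ffn(x, frac=0.25)  # elastic submodel sharing the same weights
  ```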
- 9.1 Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
- Authors: Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao
- Reason: This paper provides an extensive study of plasticity in visual reinforcement learning across data, modules, and training stages. It also introduces a method for addressing the high replay-ratio dilemma, an important problem for sample-efficient reinforcement learning (a schematic of an adaptive training loop follows below).
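  A schematic of an adaptive replay-ratio training loop; `plasticity_signal` and the threshold are placeholders standing in for the paper's actual criterion, so only the control flow is the point here.

  ```python
  # Schematic adaptive replay-ratio schedule in an off-policy loop.
  # The plasticity_signal() criterion is a placeholder; the paper's exact
  # measure and thresholds differ.
  def train(agent, buffer, env, total_steps, low_rr=1, high_rr=8):
      obs = env.reset()
      replay_ratio = low_rr  # start conservatively while plasticity is fragile
      for step in range(total_steps):
          action = agent.act(obs)
          next_obs, reward, done, _ = env.step(action)
          buffer.add(obs, action, reward, next_obs, done)
          obs = env.reset() if done else next_obs

          # Raise the replay ratio only once the learner has recovered enough
          # plasticity to exploit more gradient updates per environment step.
          if agent.plasticity_signal() > 0.5:  # placeholder threshold
              replay_ratio = high_rr
          for _ in range(replay_ratio):
              agent.update(buffer.sample())
  ```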
- 8.9 Imitation Learning from Purified Demonstrations
- Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu
- Reason: Introduces an approach that directly tackles the challenge of imperfect demonstrations in imitation learning by purifying them to recover near-optimal demonstrations. The use of diffusion models for this purification is an innovative approach (a toy sketch follows below).
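  A toy sketch of the purification idea: partially noise an imperfect demonstration with a forward diffusion, then run the learned reverse process to pull it back toward the expert manifold. `denoiser` stands in for a trained diffusion model, and the schedule is arbitrary.

  ```python
  # Toy diffusion-based "purification" of an imperfect demonstration.
  # `denoiser` is a placeholder for a trained score/denoising network.
  import torch

  def purify(demo_states, denoiser, t_steps=10, beta=0.02):
      x = demo_states.clone()
      # forward diffusion: partially noise the imperfect demonstration
      for _ in range(t_steps):
          x = (1 - beta) ** 0.5 * x + beta ** 0.5 * torch.randn_like(x)
      # reverse process: the denoiser projects back toward clean expert data
      for t in reversed(range(t_steps)):
          x = denoiser(x, t)
      return x
  ```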
- 8.9 Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning
- Authors: Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi
- Reason: The paper proposes a novel posterior sampling method that exploits prior causal knowledge. It is particularly useful in reinforcement learning because its analysis explicitly connects the regret rate to the degree of prior knowledge encoded in the causal graph (a minimal posterior-sampling sketch follows below).
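  A minimal Thompson-sampling sketch in which the prior factorizes along a given causal graph; the toy bandit, the independent Beta posteriors, and the factorized reward are hypothetical stand-ins for the paper's setting, meant only to show how a graph prior shrinks the hypothesis space.

  ```python
  # Minimal posterior sampling with a factored (graph-structured) prior:
  # one Beta posterior per (action, causal factor), factors independent.
  import numpy as np

  rng = np.random.default_rng(0)
  n_actions, n_factors = 4, 2
  post = np.ones((n_actions, n_factors, 2))  # Beta(a, b) per factor

  def true_reward(a):  # unknown to the agent; toy Bernoulli reward
      return float(rng.random() < 0.2 + 0.15 * a)

  for step in range(1000):
      # posterior sampling: draw one model from the factored posterior
      theta = rng.beta(post[:, :, 0], post[:, :, 1])  # (actions, factors)
      a = int(np.argmax(theta.prod(axis=1)))          # act greedily in the sample
      r = true_reward(a)
      post[a, :, 0] += r        # conjugate update of each factor's posterior
      post[a, :, 1] += 1 - r
  ```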
- 8.7 Robust Safe Reinforcement Learning under Adversarial Disturbances
- Authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
- Reason: Addresses a critical concern in real-world control by proposing a framework that handles worst-case disturbances, ensuring persistent safety even in their presence. The work stands out for its focus on practical applicability and robustness in reinforcement learning (the min-max structure is sketched below).
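  A schematic of the min-max training structure, with all components as placeholders (including the two-argument `env.step`): a protagonist policy trades reward off against a safety cost while an adversary supplies worst-case disturbances. The paper's concrete algorithm and guarantees differ in detail.

  ```python
  # Schematic min-max loop for robust safe RL: protagonist vs. an adversary
  # that injects worst-case disturbances. Structure only; networks, updates,
  # and the env API are placeholders.
  def robust_safe_training(env, policy, adversary, safety_critic, steps=100_000):
      obs = env.reset()
      for _ in range(steps):
          action = policy.act(obs)
          disturbance = adversary.act(obs)          # worst-case perturbation
          next_obs, reward, cost, done, _ = env.step(action, disturbance)

          # protagonist: maximize reward subject to the safety constraint
          policy.update(obs, action, reward, cost, next_obs, safety_critic)
          # adversary: minimize the protagonist's constrained objective
          adversary.update(obs, disturbance, -reward, cost, next_obs)
          obs = env.reset() if done else next_obs
  ```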
- 8.7 Score Regularized Policy Optimization through Diffusion Behavior
- Authors: Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
- Reason: This paper proposes a method that regularizes the policy gradient with the behavior distribution's score function, significantly boosting action-sampling speed while maintaining strong performance, an advance for reinforcement learning with diffusion behavior models (one reading of the loss is sketched below).
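  A sketch of one way to read "regularize the policy with the behavior score": a deterministic policy maximizes Q while a surrogate term pushes its actions along the pretrained diffusion model's score, keeping them in-distribution. `behavior_score`, the weighting, and the surrogate form are assumptions, not the paper's exact loss.

  ```python
  # Score-regularized policy update sketch. behavior_score(s, a) is a
  # placeholder for grad_a log p_behavior(a | s) from a pretrained
  # diffusion behavior model.
  import torch

  def score_regularized_loss(policy, q_net, behavior_score, states, alpha=0.1):
      actions = policy(states)                      # deterministic actions
      q_term = q_net(states, actions).mean()        # exploit the critic
      # surrogate: gradient of (score . actions) w.r.t. policy parameters
      # moves actions uphill on the behavior log-density
      score = behavior_score(states, actions).detach()
      reg_term = (score * actions).sum(dim=-1).mean()
      return -(q_term + alpha * reg_term)
  ```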
- 8.4 Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
- Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li
- Reason: Presents novel insights into regularized policy iteration by proving its strict equivalence to the standard Newton-Raphson method under particular conditions. This work bridges an essential theoretical gap, enhancing our understanding of the convergence properties of regularized policy iteration algorithms (the correspondence is pictured below).
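  In generic entropy-regularized notation (not the paper's exact statement or conditions), the correspondence can be pictured as Newton's method applied to the smoothed Bellman residual:

  ```latex
  % Schematic only: generic soft Bellman operator and its Newton iteration.
  \[
    F(V) \;=\; T_\tau V - V,
    \qquad
    (T_\tau V)(s) \;=\; \tau \log \sum_a \exp\!\Big(
        \tfrac{1}{\tau}\big( r(s,a) + \gamma\,\mathbb{E}_{s'}[V(s')] \big)\Big),
  \]
  \[
    V_{k+1} \;=\; V_k - \big(\nabla F(V_k)\big)^{-1} F(V_k),
  \]
  % Since the derivative of the log-sum-exp operator is governed by the
  % soft-greedy (Boltzmann) policy at V_k, this Newton step amounts to
  % evaluating that policy, i.e. one step of regularized policy iteration.
  ```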
- 8.2 Off-Policy Evaluation for Human Feedback
- Authors: Qitong Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
- Reason: Introduces a framework for off-policy evaluation of human feedback signals in reinforcement learning, addressing a largely unsolved challenge with the potential to improve efficiency and safety in healthcare and other applications (the underlying OPE setting is sketched below).
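  For context, a sketch of the classic per-trajectory importance-sampling OPE estimator that the human-feedback setting builds on; the paper's own estimator differs, so this is background rather than their method.

  ```python
  # Per-trajectory importance-sampling off-policy evaluation: reweight
  # returns logged under a behavior policy by the eval/behavior ratio.
  import numpy as np

  def is_ope(trajectories, pi_eval, pi_behavior, gamma=0.99):
      """Each trajectory is a list of (state, action, reward) tuples."""
      estimates = []
      for traj in trajectories:
          weight, ret = 1.0, 0.0
          for t, (s, a, r) in enumerate(traj):
              weight *= pi_eval(a, s) / pi_behavior(a, s)  # likelihood ratio
              ret += (gamma ** t) * r
          estimates.append(weight * ret)
      return float(np.mean(estimates))
  ```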
- 8.0 COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
- Authors: Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
- Reason: Provides a novel planning-driven framework that mitigates errors from inaccurately learned dynamics models. This plug-and-play framework significantly improves the sample efficiency and asymptotic performance of strong model-based methods (the two uses of model uncertainty are sketched below).
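  A sketch of the two roles model uncertainty plays, using ensemble disagreement as a stand-in measure: it is penalized when generating rollouts for policy updates (conservative) and rewarded when choosing real-environment actions (optimistic exploration). The paper's full planning procedure is omitted, and `predict`, `lam`, and the disagreement measure are assumptions.

  ```python
  # Two uses of ensemble disagreement as an uncertainty proxy in
  # model-based RL: conservative rollouts, optimistic exploration.
  import numpy as np

  def disagreement(ensemble, s, a):
      preds = np.stack([m.predict(s, a) for m in ensemble])
      return preds.std(axis=0).mean()  # ensemble spread ~ model uncertainty

  def rollout_reward(ensemble, s, a, r, lam=1.0):
      return r - lam * disagreement(ensemble, s, a)   # conservative rollouts

  def exploration_value(ensemble, s, a, q, lam=1.0):
      return q + lam * disagreement(ensemble, s, a)   # optimistic action choice
  ```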