- 9.6 SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
- Authors: Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu
- The paper presents a novel abstraction of RL training dataflows, a scalable and efficient distributed system called “Really Scalable RL” (SRL), and an evaluation of SRL against existing libraries. It also benchmarks SRL against OpenAI’s Rapid in a challenging hide-and-seek environment and conducts large-scale RL experiments.
- 9.4 Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
- Authors: Qiang He, Tianyi Zhou, Meng Fang, Setareh Maghsudi
- The paper introduces the “Eigensubspace Regularized Critic” (ERC), a value-approximation method for deep RL. ERC uses a regulariser, motivated by the eigensubspace of temporal-difference dynamics, to guide the approximation error, and its convergence is proven theoretically. It also demonstrates superior performance on the DMControl benchmark.
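For context, a minimal sketch of the semi-gradient TD(0) update whose error dynamics methods like ERC analyse; the eigensubspace regulariser itself is specific to the paper and not reproduced here, and all function and variable names below are illustrative.

```python
import numpy as np

def td_update(w, phi_s, phi_next, r, gamma=0.99, lr=0.1):
    """One semi-gradient TD(0) step for a linear value function V(s) = w . phi(s)."""
    v_s = w @ phi_s
    v_next = w @ phi_next
    td_error = r + gamma * v_next - v_s   # delta = r + gamma*V(s') - V(s)
    return w + lr * td_error * phi_s      # move w along the feature direction of s
```

Repeating this update drives the TD error toward zero; ERC’s contribution concerns the subspace along which that error evolves.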
- 9.3 Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
- Authors: Alexander Meulemans, Simon Schug, Seijin Kobayashi, Nathaniel Daw, Gregory Wayne
- This paper introduces “Counterfactual Contribution Analysis” (COCOA), a new method for model-based credit assignment. It measures how much each action contributes to rewarding outcomes and achieves higher performance than Hindsight Credit Assignment (HCA) and common baselines on a suite of problems.
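The counterfactual intuition can be illustrated with a toy tabular sketch: credit an action by how much it changes the probability of later collecting a reward, compared with following the policy. This is only the underlying idea, hedged; COCOA’s actual estimators differ, and the names below are made up for the example.

```python
import numpy as np

def contribution(P_reach, policy, s, a):
    """Counterfactual contribution of action a in state s.

    P_reach[s, a]: probability of eventually reaching the rewarding state
                   after taking action a in state s (assumed given by a model).
    policy[s, a]:  pi(a | s).
    """
    on_policy = policy[s] @ P_reach[s]   # P(reach reward | s) under pi
    return P_reach[s, a] - on_policy     # how much a moves that probability
```

A positive value means the action made the rewarding outcome more likely than the policy’s average behaviour would have.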
- 9.2 SARC: Soft Actor Retrospective Critic
- Authors: Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy
- This paper presents Soft Actor Retrospective Critic (SARC), a method that speeds up critic convergence and yields better policy-gradient estimates. The paper shows consistent improvement over Soft Actor-Critic (SAC) on benchmark environments.
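As background, a sketch of the standard soft (entropy-regularised) critic target from SAC, which SARC modifies; the retrospective term itself is the paper’s contribution and is not reproduced here. The function name and argument names are illustrative.

```python
def soft_td_target(r, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2, done=False):
    """SAC critic target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    Uses the clipped-double-Q minimum and subtracts the entropy term,
    zeroing the bootstrap at terminal states.
    """
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return r + (0.0 if done else gamma * soft_value)
```

Both critics in SAC regress toward this shared target; SARC’s retrospection changes how quickly the critics track it.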
- 9.0 Policy Space Diversity for Non-Transitive Games
- Authors: Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang
- The paper proposes a new diversity metric for non-transitive games, uses it to extend the Policy-Space Response Oracles (PSRO) framework into PSD-PSRO, establishes its convergence property, and shows that it outperforms state-of-the-art PSRO variants in various games.
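A hedged sketch of the basic PSRO loop that PSD-PSRO extends: keep a population of strategies, form the restricted meta-game, and add a best response to the population mixture. To stay self-contained this toy version uses pure strategies of a known payoff matrix, a uniform meta-strategy instead of a Nash solver, and an exact best response; the diversity term from the paper is not reproduced, and all names are illustrative.

```python
import numpy as np

def psro_uniform(payoff, iters=5):
    """payoff[i, j]: row player's payoff for pure strategy i vs j (zero-sum)."""
    n = payoff.shape[0]
    population = [0]                      # start from one arbitrary pure strategy
    for _ in range(iters):
        meta = np.full(len(population), 1.0 / len(population))  # uniform meta-strategy
        # expected payoff of every pure strategy vs the population mixture
        values = payoff[:, population] @ meta
        br = int(np.argmax(values))       # exact best response in this matrix game
        if br in population:
            break                         # no new strategy to add: stop
        population.append(br)
    return population
```

In rock–paper–scissors this loop grows the population until the best response is already present; PSD-PSRO’s diversity objective is designed to keep such populations from collapsing onto a few strategies in non-transitive games.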