- 9.1 A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
- Authors: Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
- Reason: This paper introduces a policy gradient primal-dual algorithm with uniform probably approximately correct (Uniform-PAC) guarantees for constrained Markov decision processes (CMDPs), improving both theoretically and empirically on previous algorithms that lack such guarantees.
- 9.0 Step-size Optimization for Continual Learning
- Authors: Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard Sutton
- Reason: Co-authored by Richard Sutton, a prominent authority in the field of reinforcement learning, this paper presents methods for optimizing the step-size vectors of learning algorithms, with potential influence on the optimization procedures used in RL methods.
- 8.9 Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion
- Authors: Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, Guanya Shi
- Reason: Integrates advanced learning-based control with robotics, a critical RL application domain, and proposes a promising framework for achieving both safety and agility in legged robot locomotion, validated through real-world experiments.
- 8.6 Game-Theoretic Unlearnable Example Generator
- Authors: Shuang Liu, Yihan Wang, Xiao-Shan Gao
- Reason: This paper proposes a novel attack method for generating unlearnable examples from a game-theoretic perspective, intersecting RL with adversarial machine learning, a burgeoning area of research.
- 8.5 Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication
- Authors: Zikai Feng, Di Wu, Mengxing Huang, Chau Yuen
- Reason: Tackles the dynamic and complex problem of trajectory design and resource management in UAV-assisted communication networks using a graph attention model, integrating ideas from multi-agent systems into the RL framework.