- 9.2 Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play
- Authors: Daniel Bairamian, Philippe Marcotte, Joshua Romoff, Gabriel Robert, Derek Nowrouzezahrai
- Reason: The paper presents a novel, game-theoretic approach to exploiting learning agents in competitive self-play, an area of high interest and impact in reinforcement learning. Coming from reputable authors and validated in a variety of settings, including modern video games, it is likely to influence multi-agent reinforcement learning (MARL) strategies and has immediate real-world applications; a toy sketch of the exploiter pattern follows.
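  The core pattern behind such methods, training an exploiter against a frozen snapshot of the main agent, fits in a few lines. The sketch below is a generic toy (matching pennies, a one-parameter policy, a crude score-function update), not the paper's algorithm; the paper's contribution is making this loop far more data efficient via a game-theoretic reward signal.

  ```python
  import random

  def payoff(a_main: int, a_exp: int) -> float:
      """Exploiter's payoff in matching pennies: +1 if the actions differ."""
      return 1.0 if a_main != a_exp else -1.0

  def act(p_heads: float) -> int:
      return 0 if random.random() < p_heads else 1  # 0 = heads, 1 = tails

  def train_exploiter(main_p: float, lr: float = 0.02, episodes: int = 3000) -> float:
      """Fit an exploiter against a FROZEN main policy -- the core loop of
      exploiter-based self-play: the main agent stands still while the
      exploiter searches for its weaknesses."""
      exp_p = 0.5  # exploiter's probability of playing heads
      for _ in range(episodes):
          a_main, a_exp = act(main_p), act(exp_p)
          r = payoff(a_main, a_exp)
          # Crude score-function update (sign of the log-prob gradient),
          # kept deliberately simple for the sketch.
          exp_p += lr * r * (1.0 if a_exp == 0 else -1.0)
          exp_p = min(1.0, max(0.0, exp_p))
      return exp_p

  # A biased main policy (heads 70% of the time) is exploitable: the trained
  # exploiter ends up playing tails almost always (exp_p near 0).
  print(train_exploiter(main_p=0.7))
  ```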
- 8.8 Bias Resilient Multi-Step Off-Policy Goal-Conditioned Reinforcement Learning
- Authors: Lisheng Wu, Ke Chen
- Reason: Addresses a significant challenge in goal-conditioned reinforcement learning (GCRL): the bias that arises when multi-step returns are combined with off-policy learning. Its practical solutions could improve learning efficiency and performance, and they are likely to shape the development of more robust GCRL methods; the toy example below shows where this bias comes from.
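  To see the problem the paper targets, consider the toy calculation below: a naive n-step return built from behavior-policy data is biased whenever the target policy would have acted differently mid-trajectory. This is a generic illustration of off-policy multi-step bias with made-up numbers, not the paper's correction method.

  ```python
  gamma = 0.9

  # Deterministic chain s0 -> s1 -> s2 (terminal). The behavior policy takes
  # a poor action at s1 (reward 0); the target (greedy) policy would take a
  # good one there, so Q(s1, greedy) = 1.0.
  rewards_behavior = [0.0, 0.0]  # rewards observed along the behavior rollout
  q_s1_greedy = 1.0              # target policy's value at s1

  # The 1-step target bootstraps from the target policy at s1: unbiased.
  one_step = rewards_behavior[0] + gamma * q_s1_greedy

  # A naive 2-step target keeps the behavior policy's reward at s1: biased.
  two_step_naive = rewards_behavior[0] + gamma * rewards_behavior[1] + gamma**2 * 0.0

  print(one_step, two_step_naive)  # 0.9 vs. 0.0 -- the uncorrected
                                   # multi-step target underestimates V(s0)
  ```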
- 8.4 Maximum Entropy Model Correction in Reinforcement Learning
- Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand
- Reason: Proposes a novel MaxEnt Model Correction procedure to alleviate model error and speed up convergence to the true value function. With strong theoretical backing and practical promise for planning with approximate models, it is likely to influence future research directions in model-based RL; a minimal sketch of the maximum-entropy idea appears below.
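  A minimal sketch of the maximum-entropy idea, under the simplifying assumption that "correction" means exponentially tilting the approximate model's next-state distribution until a feature expectation matches one estimated from real transitions (the paper's actual procedure and guarantees are more involved):

  ```python
  import numpy as np

  states = np.arange(5)
  p_model = np.array([0.40, 0.30, 0.15, 0.10, 0.05])  # approximate model
  phi = states.astype(float)                          # feature of next state
  target_mean = 2.0  # E[phi] estimated from real transitions (here: made up)

  def tilted(lmbda: float) -> np.ndarray:
      """MaxEnt/I-projection form: p(s') proportional to
      p_model(s') * exp(lambda * phi(s'))."""
      w = p_model * np.exp(lmbda * phi)
      return w / w.sum()

  # Solve the moment condition E_tilted[phi] = target_mean by bisection
  # (the tilted mean is monotone increasing in lambda).
  lo, hi = -10.0, 10.0
  for _ in range(100):
      mid = 0.5 * (lo + hi)
      if tilted(mid) @ phi < target_mean:
          lo = mid
      else:
          hi = mid

  p_corrected = tilted(0.5 * (lo + hi))
  print(p_corrected, p_corrected @ phi)  # corrected mean of phi is ~2.0
  ```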
- 7.9 Federated Online and Bandit Convex Optimization
- Authors: Kumar Kshitij Patel, Lingxiao Wang, Aadirupa Saha, Nati Srebro
- Reason: Offers new insights into when collaboration helps in distributed online optimization, especially under limited-feedback (bandit) settings. By bridging the stochastic and adaptive adversarial regimes, it lays the groundwork for advances in federated learning and optimization; the sketch below illustrates the intermittent-communication setup.
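  The collaborative setting can be sketched as local online gradient descent with intermittent communication: each client updates on its own loss stream and the clients periodically average their iterates. This is a generic skeleton under assumed quadratic losses with full gradient feedback, not the paper's algorithms or its bandit estimators.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  M, K, T, d, lr = 4, 5, 200, 3, 0.1  # clients, comm. interval, rounds, dim, step
  x = np.zeros((M, d))                # one iterate per client
  optimum = np.ones(d)                # shared minimizer of the loss stream

  total_loss = 0.0
  for t in range(T):
      for m in range(M):
          # Each client sees a noisy quadratic loss f_t(x) = 0.5 * ||x - b_t||^2.
          b_t = optimum + 0.1 * rng.standard_normal(d)
          total_loss += 0.5 * np.sum((x[m] - b_t) ** 2)
          x[m] -= lr * (x[m] - b_t)   # local online gradient step
      if (t + 1) % K == 0:
          x[:] = x.mean(axis=0)       # intermittent communication: averaging

  print(total_loss / (M * T))         # average per-round loss
  ```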
- 7.5 LanGWM: Language Grounded World Model
- Authors: Rudra P.K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla
- Reason: Explores the intersection of language models and reinforcement learning, using language grounding to improve state abstraction and, in turn, action selection. Its potential to improve human-robot interaction and its state-of-the-art performance on visual navigation tasks make it a compelling contribution to the field; a schematic of language-conditioned state encoding follows.
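  As a schematic of what language-grounded state abstraction can look like (purely illustrative: `LanguageGroundedEncoder` is a hypothetical module, and the paper's actual architecture differs), the sketch fuses image-derived features with a language embedding to produce a language-conditioned latent state for a world model.

  ```python
  import torch
  import torch.nn as nn

  class LanguageGroundedEncoder(nn.Module):
      """Fuse visual features and a text embedding into one latent state."""
      def __init__(self, img_dim: int = 64, txt_dim: int = 32, state_dim: int = 48):
          super().__init__()
          self.img_proj = nn.Linear(img_dim, state_dim)
          self.txt_proj = nn.Linear(txt_dim, state_dim)
          self.fuse = nn.Sequential(
              nn.Linear(2 * state_dim, state_dim), nn.ReLU(),
              nn.Linear(state_dim, state_dim),
          )

      def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
          z = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=-1)
          return self.fuse(z)  # language-conditioned latent state

  enc = LanguageGroundedEncoder()
  img = torch.randn(1, 64)    # stand-in for CNN features of an observation
  txt = torch.randn(1, 32)    # stand-in for a language-model prompt embedding
  print(enc(img, txt).shape)  # torch.Size([1, 48])
  ```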