- 9.8 DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
- Authors: Weikang Wan, Yufei Wang, Zackory Erickson, David Held
- Reason: This paper introduces a novel approach that integrates trajectory optimization into deep reinforcement and imitation learning, potentially addressing the objective mismatch issue in model-based RL algorithms and outperforming state-of-the-art methods on high-dimensional tasks. The authors are affiliated with high-profile institutions, and the method shows empirical success across a diverse range of tasks.
- 9.6 QGFN: Controllable Greediness with Action Values
- Authors: Elaine Lau, Stephen Zhewen Lu, Ling Pan, Doina Precup, Emmanuel Bengio
- Reason: This paper presents a novel method for biasing Generative Flow Networks (GFNs) towards producing higher-utility samples without sacrificing diversity, leveraging connections between GFNs and RL. The involvement of prominent authors such as Doina Precup, together with the method’s practical implications, suggests significant influence.
- 9.4 FlowPG: Action-constrained Policy Gradient with Normalizing Flows
- Authors: Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar
- Reason: The paper addresses a key challenge in action-constrained reinforcement learning with a potentially faster and less restrictive method, offering empirical advantages over existing methods in training speed and in reducing constraint violations.
- 9.1 Meta-learning the mirror map in policy mirror descent
- Authors: Carlo Alfano, Sebastian Towers, Silvia Sapora, Chris Lu, Patrick Rebeschini
- Reason: The paper provides an empirical investigation into how the choice of mirror map affects the efficacy of Policy Mirror Descent in reinforcement learning, proposing a meta-learning approach evaluated across various benchmark environments and suggesting potential for broader applications; a generic sketch of the standard policy mirror descent update is given below for context.
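For context, a minimal sketch of the standard policy mirror descent update with a generic mirror map $h$, as it is commonly written in the literature; this is background only, not the paper’s meta-learning procedure:

$$
\pi_{k+1}(\cdot \mid s) \;=\; \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta \, \big\langle Q^{\pi_k}(s, \cdot),\, p \big\rangle - D_h\big(p,\ \pi_k(\cdot \mid s)\big) \Big\},
\qquad
D_h(p, q) \;=\; h(p) - h(q) - \big\langle \nabla h(q),\, p - q \big\rangle,
$$

where $\eta$ is a step size and $D_h$ is the Bregman divergence induced by the mirror map $h$. Choosing $h$ to be the negative entropy recovers the familiar multiplicative-weights (natural policy gradient) update, which is why the choice of mirror map can materially affect performance.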
- 9.1 Offline Actor-Critic Reinforcement Learning Scales to Large Models
- Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller
- Reason: The paper discusses the scalability of offline actor-critic RL to large models, an important topic for the advancement of RL research. The involvement of Nicolas Heess and Martin Riedmiller, known for their contributions to DeepMind’s groundbreaking work, adds to the paper’s authority.
- 9.0 Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes
- Authors: Isaac Grosof, Siva Theja Maguluri, R. Srikant
- Reason: This paper represents a substantial theoretical contribution, providing the first convergence-rate bound for Natural Policy Gradient (NPG) in infinite-state average-reward MDPs and demonstrating how to leverage the MaxWeight policy to achieve convergence, paving the way for further research on policy-gradient algorithms for complex systems; the standard NPG update is sketched below for reference.
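For reference, the standard natural policy gradient update as usually stated in the discounted setting; this is textbook background, not the paper’s average-reward analysis:

$$
\theta_{k+1} \;=\; \theta_k + \eta\, F(\theta_k)^{-1} \nabla_\theta J(\theta_k),
\qquad
F(\theta) \;=\; \mathbb{E}_{s, a \sim \pi_\theta}\!\Big[ \nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \Big],
$$

where $F(\theta)$ is the Fisher information matrix of the policy. The contribution highlighted above is a convergence-rate bound for this style of update when the state space is infinite and the objective is the long-run average reward.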
- 8.9 Improving Token-Based World Models with Parallel Observation Prediction
- Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
- Reason: The proposed method addresses the training bottleneck in token-based world models (TBWMs) and showcases significant speed improvements with superhuman performance on a benchmark. Shie Mannor is a prominent figure in RL, which adds credibility to the paper’s potential influence.
- 8.7 Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
- Authors: Talha Bozkus, Urbashi Mitra
- Reason: This paper introduces a novel ensemble RL algorithm and provides both theoretical justification and empirical results showing performance improvements over state-of-the-art Q-learning. Urbashi Mitra’s expertise in electrical engineering suggests the potential for impactful interdisciplinary advances; a toy sketch of a generic ensemble Q-learner follows below.
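As loose background only, below is a toy sketch of tabular Q-learning with an ensemble of estimators, each updated at its own learning rate (a crude stand-in for "multiple timescales") and combined by averaging for action selection. The class name, parameter values, and combination rule are illustrative assumptions, not the algorithm from the paper.

```python
import numpy as np

class EnsembleQLearner:
    """Toy ensemble of tabular Q-learning estimators; illustrative only."""

    def __init__(self, n_states, n_actions, learning_rates=(0.5, 0.1, 0.02),
                 gamma=0.99, epsilon=0.1, seed=0):
        # One Q-table per ensemble member, each with its own step size.
        self.qs = [np.zeros((n_states, n_actions)) for _ in learning_rates]
        self.lrs = learning_rates
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate
        self.n_actions = n_actions
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        # Epsilon-greedy on the average of all ensemble members.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        q_mean = np.mean([q[state] for q in self.qs], axis=0)
        return int(np.argmax(q_mean))

    def update(self, state, action, reward, next_state, done):
        # Each member performs an ordinary Q-learning update at its own rate.
        for q, lr in zip(self.qs, self.lrs):
            target = reward if done else reward + self.gamma * q[next_state].max()
            q[state, action] += lr * (target - q[state, action])
```

In use, one would instantiate the learner with the environment’s state and action counts and call `act` and `update` inside an ordinary environment loop.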
- 8.5 Differentially Private Model-Based Offline Reinforcement Learning
- Authors: Alexandre Rio, Merwan Barlier, Igor Colin, Albert Thomas
- Reason: The paper addresses the important issue of privacy in offline RL, a topic that is becoming increasingly relevant as RL is applied to more sensitive domains. The empirical results indicate practical value, and differential privacy is a hot area of research.
- 8.3 Reinforcement Learning as a Catalyst for Robust and Fair Federated Learning: Deciphering the Dynamics of Client Contributions
- Authors: Jialuo He, Wei Chen, Xiaojin Zhang
- Reason: This research applies RL to enhance federated learning, an area of growing importance given the expanding need for privacy-preserving machine learning. The results indicate better robustness and fairness, which are critical for future adoption in real-world applications.