- 9.3 ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery
- Authors: Xin Liu, Yaran Chen, Dongbin Zhao
- Reason: The authors propose a novel approach to balancing the quality and diversity of behaviors learned through unsupervised skill discovery in reinforcement learning. Their combination of contrastive learning with a novel dynamic weighting mechanism that trades off exploration against exploitation suggests this paper might have a significant impact on the field; a toy sketch of such a weighted objective follows below.
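A minimal sketch of the general idea of dynamically weighting a diversity signal against a quality signal inside an intrinsic reward. This is an illustration only, not ComSD's actual objective; the function names and the exploration-bonus term are assumptions.

```python
# Illustration only (not ComSD's objective): blend a contrastive skill-diversity
# signal with an exploration/quality signal using a dynamic weight.
import numpy as np

def intrinsic_reward(contrastive_score, exploration_bonus, weight):
    """weight in [0, 1]: higher favors skill diversity, lower favors behavioral quality."""
    return weight * contrastive_score + (1.0 - weight) * exploration_bonus

# Sweeping the weight on fixed example values shows the diversity/quality trade-off.
for w in np.linspace(0.0, 1.0, 5):
    print(f"weight={w:.2f} -> reward={intrinsic_reward(0.7, 1.2, w):.3f}")
```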
- 9.2 Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Authors: Zihan Ding, Chi Jin
- Reason: The authors propose consistency models as a new policy representation for reinforcement learning that is both efficient and expressive. The performance improvements demonstrated in offline, offline-to-online, and online RL settings suggest this work may be influential; a minimal sketch of single-step action sampling with such a policy appears below.
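A minimal sketch of why a consistency-model policy is efficient at inference time: a single forward pass maps a noise sample to an action, rather than running an iterative denoising chain. The module and its interface are assumptions for illustration, not the authors' implementation.

```python
# Assumed interface, for illustration only: a consistency-style policy that maps a
# noisy action sample directly to an action in one forward pass, conditioned on state.
import torch
import torch.nn as nn

class ConsistencyPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action):
        # Single-step mapping from a noisy action to a denoised action.
        return self.net(torch.cat([state, noisy_action], dim=-1))

policy = ConsistencyPolicy(state_dim=17, action_dim=6)
state = torch.randn(1, 17)
action = policy(state, torch.randn(1, 6))  # one network evaluation, no iterative sampling
```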
- 9.1 Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
- Authors: Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai, Zhen Wang
- Reason: The authors address the significant challenge of adapting policies trained with offline reinforcement learning to subsequent online fine-tuning. Their approach, built around uncertainty and smoothness, could have a substantial impact on the efficiency and optimality of reinforcement learning; a generic illustration of these two ingredients follows below.
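A generic illustration of how "uncertainty" and "smoothness" are commonly instantiated in this line of work: ensemble disagreement as an uncertainty penalty, and a regularizer that keeps value estimates stable under small input perturbations. The paper's specific formulation may differ; all names, dimensions, and coefficients here are assumptions.

```python
# Generic illustration only (not the paper's exact method).
import torch
import torch.nn as nn

def make_q():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

q_ensemble = nn.ModuleList([make_q() for _ in range(5)])

sa = torch.randn(32, 10)                            # concatenated (state, action) batch
qs = torch.stack([q(sa) for q in q_ensemble], 0)    # [ensemble, batch, 1]

uncertainty = qs.std(dim=0)                         # disagreement across the ensemble
pessimistic_target = qs.mean(dim=0) - 1.0 * uncertainty   # penalty coefficient assumed

perturbed = sa + 0.01 * torch.randn_like(sa)        # small perturbation of the input
smoothness_loss = ((q_ensemble[0](perturbed) - q_ensemble[0](sa)) ** 2).mean()
```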
- 9.1 Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes
- Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
- Abstract highlights: This paper introduces a benchmark designed to test Deep Reinforcement Learning agents in partially observable, endless-episode settings, comparing Gated Recurrent Unit (GRU) and Transformer-XL (TrXL) agents on their ability to memorize long sequences, withstand noise, and generalize.
- 9.0 A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning
- Authors: Hongyin Zhang, Shuyu Yang, Donglin Wang
- Reason: This paper presents a new benchmark for assessing offline reinforcement learning algorithms on real-world legged locomotion tasks. The use of model predictive control and the potential of offline RL (ORL) in this area may have a significant impact on future research.
- 8.9 DTC: Deep Tracking Control – A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion
- Authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter
- Reason: The paper proposes a hybrid control architecture that combines the benefits of model-based planning and reinforcement learning. This could transform locomotion pipelines in terms of robustness and foot-placement accuracy.
- 8.7 MORPH: Design Co-optimization with Reinforcement Learning via a Differentiable Hardware Model Proxy
- Authors: Zhanpeng He, Matei Ciocarlie
- Abstract highlights: This paper introduces a method for co-optimizing hardware design parameters and control policies in simulation using reinforcement learning and a differentiable proxy of the hardware model, which could make joint hardware-and-controller design more efficient; a conceptual sketch follows below.
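A conceptual sketch of design co-optimization through a differentiable proxy: the design parameters are learnable tensors, a proxy network predicts performance from (state, action, design), and gradients flow into both the policy and the design. Everything here (dimensions, the proxy's interface, the learning rate) is assumed for illustration and is not the MORPH codebase.

```python
# Conceptual sketch under assumed names, not the MORPH implementation.
import torch
import torch.nn as nn

design = nn.Parameter(torch.tensor([0.10, 0.25]))   # e.g., two design parameters (assumed)
policy = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 2))
proxy  = nn.Sequential(nn.Linear(8 + 2 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

# Optimize the policy weights and the design parameters; the proxy is held fixed here.
opt = torch.optim.Adam(list(policy.parameters()) + [design], lr=1e-3)

state = torch.randn(32, 8)
d = design.expand(32, -1)
action = policy(torch.cat([state, d], dim=-1))
predicted_return = proxy(torch.cat([state, action, d], dim=-1))

loss = -predicted_return.mean()   # maximize the proxy-predicted return
opt.zero_grad()
loss.backward()                   # gradients reach both the policy weights and `design`
opt.step()
```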
- 8.5 Estimation and Inference in Distributional Reinforcement Learning
- Authors: Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
- Abstract highlights: This paper studies distributional reinforcement learning from the perspective of statistical efficiency, providing a unified approach to statistical inference for a wide class of statistical functionals of the return distribution; a small plug-in-estimator example follows below.
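To make "statistical functionals of the return distribution" concrete, here is a small plug-in example on synthetic returns (mean, variance, and conditional value-at-risk). This is a generic illustration, not the estimators or inference procedures developed in the paper.

```python
# Plug-in estimates of a few functionals of an empirical return distribution.
import numpy as np

returns = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=10_000)

mean_hat = returns.mean()                              # expected return
var_hat = returns.var()                                # variance of return
alpha = 0.1
var_threshold = np.quantile(returns, alpha)            # value-at-risk at level alpha
cvar_hat = returns[returns <= var_threshold].mean()    # conditional value-at-risk

print(mean_hat, var_hat, cvar_hat)
```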
- 8.3 Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
- Authors: Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
- Abstract highlights: This paper proposes a principled framework for autonomous large language model (LLM) agents with provable sample efficiency. The framework combines long-term reasoning with short-term acting to achieve regret that scales as √T in the number of interaction rounds T; the generic definition of this regret notion is recalled below.
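For context, this is the standard notion of cumulative regret behind a "√T regret" statement; the paper's precise setting, assumptions, and constants are its own.

```latex
% Cumulative regret over T rounds, with V^* the optimal value and \pi_t the policy
% deployed at round t. "\sqrt{T} regret" means this sum grows proportionally to
% \sqrt{T}, typically up to logarithmic factors.
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \bigl( V^{*} - V^{\pi_t} \bigr) \;=\; \widetilde{O}\bigl(\sqrt{T}\bigr)
```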
- 7.9 Directly Fine-Tuning Diffusion Models on Differentiable Rewards
- Authors: Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet
- Abstract highlights: This paper presents a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, which could have practical implications for reinforcement learning applications; a rough conceptual sketch of backpropagating a reward through sampling follows below.
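A rough conceptual sketch of the general recipe: run a short, differentiable sampling chain, score the result with a differentiable reward, and backpropagate that reward into the generative model's parameters. The stand-in denoiser, reward model, step size, and number of steps are assumptions, not the paper's exact procedure.

```python
# Conceptual sketch only (assumed components, not the paper's method).
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))      # stand-in model
reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))   # differentiable reward
for p in reward_model.parameters():
    p.requires_grad_(False)            # only the denoiser is updated

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

x = torch.randn(8, 16)                 # start from noise
for _ in range(4):                     # a few differentiable "denoising" steps
    x = x - 0.1 * denoiser(x)

loss = -reward_model(x).mean()         # maximize the differentiable reward
opt.zero_grad()
loss.backward()                        # gradients flow back through the sampling chain
opt.step()
```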