- 9.3 ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery
- Authors: Xin Liu, Yaran Chen, Dongbin Zhao
- Reason: The authors propose a novel approach to balancing the quality and diversity of behaviors learned through unsupervised skill discovery in reinforcement learning. Their combination of contrastive learning with a novel dynamic weighting mechanism that trades off exploration against exploitation suggests this paper might have a significant impact on the field; a toy sketch of such a weighted objective follows below.
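A minimal sketch of the general idea of dynamically weighting a diversity signal against a quality signal inside an intrinsic reward. This is an illustration only, not ComSD's actual objective; the function names and the exploration-bonus term are assumptions.

```python
# Illustration only (not ComSD's objective): blend a contrastive skill-diversity
# signal with an exploration/quality signal using a dynamic weight.
import numpy as np

def intrinsic_reward(contrastive_score, exploration_bonus, weight):
    """weight in [0, 1]: higher favors skill diversity, lower favors behavioral quality."""
    return weight * contrastive_score + (1.0 - weight) * exploration_bonus

# Sweeping the weight on fixed example values shows the diversity/quality trade-off.
for w in np.linspace(0.0, 1.0, 5):
    print(f"weight={w:.2f} -> reward={intrinsic_reward(0.7, 1.2, w):.3f}")
```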
- 9.2 Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Authors: Zihan Ding, Chi Jin
- Reason: The authors propose consistency models as a new policy representation for reinforcement learning that is both efficient and expressive. The performance improvements demonstrated in offline, offline-to-online, and online RL settings suggest this work may be influential; a minimal sketch of single-step action sampling with such a policy appears below.
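A minimal sketch of why a consistency-model policy is efficient at inference time: a single forward pass maps a noise sample to an action, rather than running an iterative denoising chain. The module and its interface are assumptions for illustration, not the authors' implementation.

```python
# Assumed interface, for illustration only: a consistency-style policy that maps a
# noisy action sample directly to an action in one forward pass, conditioned on state.
import torch
import torch.nn as nn

class ConsistencyPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action):
        # Single-step mapping from a noisy action to a denoised action.
        return self.net(torch.cat([state, noisy_action], dim=-1))

policy = ConsistencyPolicy(state_dim=17, action_dim=6)
state = torch.randn(1, 17)
action = policy(state, torch.randn(1, 6))  # one network evaluation, no iterative sampling
```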
- 9.1 Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
- Authors: Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai, Zhen Wang
- Reason: The authors address the significant challenge of adapting policies trained with offline reinforcement learning to subsequent online fine-tuning. Their approach, built around uncertainty and smoothness, could have a substantial impact on the efficiency and optimality of reinforcement learning; a generic illustration of these two ingredients follows below.
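A generic illustration of how "uncertainty" and "smoothness" are commonly instantiated in this line of work: ensemble disagreement as an uncertainty penalty, and a regularizer that keeps value estimates stable under small input perturbations. The paper's specific formulation may differ; all names, dimensions, and coefficients here are assumptions.

```python
# Generic illustration only (not the paper's exact method).
import torch
import torch.nn as nn

def make_q():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

q_ensemble = nn.ModuleList([make_q() for _ in range(5)])

sa = torch.randn(32, 10)                            # concatenated (state, action) batch
qs = torch.stack([q(sa) for q in q_ensemble], 0)    # [ensemble, batch, 1]

uncertainty = qs.std(dim=0)                         # disagreement across the ensemble
pessimistic_target = qs.mean(dim=0) - 1.0 * uncertainty   # penalty coefficient assumed

perturbed = sa + 0.01 * torch.randn_like(sa)        # small perturbation of the input
smoothness_loss = ((q_ensemble[0](perturbed) - q_ensemble[0](sa)) ** 2).mean()
```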
- 9.1 Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes
- Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
- Abstract highlights: This paper introduces a benchmark designed to test Deep Reinforcement Learning agents in partially observable, endless-episode settings, comparing Gated Recurrent Unit (GRU) and Transformer-XL (TrXL) agents on their ability to memorize long sequences, withstand noise, and generalize.
- 9.0 A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning
- Authors: Hongyin Zhang, Shuyu Yang, Donglin Wang
- Reason: This paper presents a new benchmark for assessing offline reinforcement learning algorithms on real-world legged locomotion tasks. The use of model predictive control and the potential of offline RL (ORL) in this area may have a significant impact on future research.
- 8.9 DTC: Deep Tracking Control – A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion
- Authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter
- Reason: The paper proposes a hybrid control architecture that combines the benefits of model-based planning and reinforcement learning. This could transform locomotion pipelines in terms of robustness and foot-placement accuracy.
- 8.7 MORPH: Design Co-optimization with Reinforcement Learning via a Differentiable Hardware Model Proxy
- Authors: Zhanpeng He, Matei Ciocarlie
- Abstract highlights: This paper introduces a method for co-optimizing hardware design parameters and control policies in simulation using reinforcement learning and a differentiable proxy of the hardware model, which could make joint hardware-and-controller design more efficient; a conceptual sketch follows below.
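A conceptual sketch of design co-optimization through a differentiable proxy: the design parameters are learnable tensors, a proxy network predicts performance from (state, action, design), and gradients flow into both the policy and the design. Everything here (dimensions, the proxy's interface, the learning rate) is assumed for illustration and is not the MORPH codebase.

```python
# Conceptual sketch under assumed names, not the MORPH implementation.
import torch
import torch.nn as nn

design = nn.Parameter(torch.tensor([0.10, 0.25]))   # e.g., two design parameters (assumed)
policy = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 2))
proxy  = nn.Sequential(nn.Linear(8 + 2 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

# Optimize the policy weights and the design parameters; the proxy is held fixed here.
opt = torch.optim.Adam(list(policy.parameters()) + [design], lr=1e-3)

state = torch.randn(32, 8)
d = design.expand(32, -1)
action = policy(torch.cat([state, d], dim=-1))
predicted_return = proxy(torch.cat([state, action, d], dim=-1))

loss = -predicted_return.mean()   # maximize the proxy-predicted return
opt.zero_grad()
loss.backward()                   # gradients reach both the policy weights and `design`
opt.step()
```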
- 8.5 Estimation and Inference in Distributional Reinforcement Learning
- Authors: Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
- Abstract highlights: This paper studies distributional reinforcement learning from the perspective of statistical efficiency, providing a unified approach to statistical inference for a wide class of statistical functionals of the return distribution; a small plug-in-estimator example follows below.
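To make "statistical functionals of the return distribution" concrete, here is a small plug-in example on synthetic returns (mean, variance, and conditional value-at-risk). This is a generic illustration, not the estimators or inference procedures developed in the paper.

```python
# Plug-in estimates of a few functionals of an empirical return distribution.
import numpy as np

returns = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=10_000)

mean_hat = returns.mean()                              # expected return
var_hat = returns.var()                                # variance of return
alpha = 0.1
var_threshold = np.quantile(returns, alpha)            # value-at-risk at level alpha
cvar_hat = returns[returns <= var_threshold].mean()    # conditional value-at-risk

print(mean_hat, var_hat, cvar_hat)
```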
- 8.3 Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
- Authors: Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
- Abstract highlights: This paper proposes a principled framework for autonomous large language model (LLM) agents with provable sample efficiency. The framework combines long-term reasoning with short-term acting to achieve regret that scales as √T in the number of interaction rounds T; the generic definition of this regret notion is recalled below.
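For context, this is the standard notion of cumulative regret behind a "√T regret" statement; the paper's precise setting, assumptions, and constants are its own.

```latex
% Cumulative regret over T rounds, with V^* the optimal value and \pi_t the policy
% deployed at round t. "\sqrt{T} regret" means this sum grows proportionally to
% \sqrt{T}, typically up to logarithmic factors.
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \bigl( V^{*} - V^{\pi_t} \bigr) \;=\; \widetilde{O}\bigl(\sqrt{T}\bigr)
```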
- 7.9 Directly Fine-Tuning Diffusion Models on Differentiable Rewards
- Authors: Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet
- Abstract highlights: This paper presents a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, which could have practical implications for reinforcement learning applications; a rough conceptual sketch of backpropagating a reward through sampling follows below.
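A rough conceptual sketch of the general recipe: run a short, differentiable sampling chain, score the result with a differentiable reward, and backpropagate that reward into the generative model's parameters. The stand-in denoiser, reward model, step size, and number of steps are assumptions, not the paper's exact procedure.

```python
# Conceptual sketch only (assumed components, not the paper's method).
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))      # stand-in model
reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))   # differentiable reward
for p in reward_model.parameters():
    p.requires_grad_(False)            # only the denoiser is updated

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

x = torch.randn(8, 16)                 # start from noise
for _ in range(4):                     # a few differentiable "denoising" steps
    x = x - 0.1 * denoiser(x)

loss = -reward_model(x).mean()         # maximize the differentiable reward
opt.zero_grad()
loss.backward()                        # gradients flow back through the sampling chain
opt.step()
```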