- 9.5 Safe RLHF: Safe Reinforcement Learning from Human Feedback
- Authors: Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
- Reason: The authors propose a novel algorithm for aligning AI with human values that balances helpfulness and harmlessness, two central concerns in reinforcement learning. Evaluation on a large language model and consistent gains over multiple rounds of fine-tuning suggest rigorous empirical support (a hedged sketch of the constrained objective appears after this list).
- 9.3 Eureka: Human-Level Reward Design via Coding Large Language Models
- Authors: Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar
- Reason: The authors use large language models (LLMs) to write reward-function code that outperforms rewards engineered by human experts. Human-level reward design and successful application across a diverse set of RL environments underscore the method's potential impact (a sketch of the generate-and-evaluate loop appears after this list).
- 9.2 MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits
- Authors: Yuhang Zhang, Marcos Quinones-Grueiro, Zhiyao Zhang, Yanbing Wang, William Barbour, Gautam Biswas, Daniel Work
- Reason: The paper introduces MARVEL, a novel multi-agent reinforcement learning (MARL) framework for controlling variable speed limits to manage traffic flow, improving both safety and mobility. The authors conduct extensive experiments and offer insights into the learned agent behavior.
- 9.1 Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
- Authors: Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner
- Reason: The authors use pretrained vision-language models as zero-shot reward models, providing a sample-efficient alternative to hand-engineered rewards in traditional RL. Given the demonstrated ability of these models to specify rewards for complex tasks, this could become a notably efficient approach in reinforcement learning (a CLIP-style reward sketch appears after this list).
- 8.8 How a student becomes a teacher: learning and forgetting through Spectral methods
- Authors: Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli
- Reason: The paper offers a unique, spectral-methods perspective on how machine learning models learn and forget, yielding new optimization techniques.
- 8.8 Hybrid Search for Efficient Planning with Completeness Guarantees
- Authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin
- Reason: The authors provide a way to solve planning problems with completeness guarantees, a critical requirement in many fields. The approach is theoretically grounded yet offers a practical way to achieve both efficiency and completeness.
- 8.5 CAT: Closed-loop Adversarial Training for Safe End-to-End Driving
- Authors: Linrui Zhang, Zhenghao Peng, Quanyi Li, Bolei Zhou
- Reason: This paper proposes a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving. It presents an innovative approach that continuously improves driving safety by training on dynamically generated safety-critical scenarios.
- 8.5 Generative Flow Networks as Entropy-Regularized RL
- Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov
- Reason: The authors further connect generative flow networks and reinforcement learning by recasting GFlowNet training as an entropy-regularized RL problem. They demonstrate competitive results when applying established soft RL algorithms, motivating further study of the connection (the generic soft-RL objective is written out after this list).
- 8.4 Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
- Authors: Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang
- Reason: The authors propose a unified suite of environments and an accompanying library for testing and validating reinforcement learning algorithms in safety-critical scenarios (a minimal usage sketch appears after this list).
- 8.1 PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
- Authors: Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang
- Reason: PGA is a significant step toward personalizing a robot's abilities to better interact with humans, specifically through language-conditioned robotic grasping. Experimental results demonstrate viability in real-world scenarios.
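The sketches referenced in the entries above follow; each is an illustrative reading of the cited paper, not its actual implementation.

The Safe RLHF entry (9.5) centers on optimizing helpfulness subject to a constraint on harmlessness, which naturally leads to a Lagrangian-style min-max problem. Below is a minimal sketch of the dual-variable update such an objective implies; the names (`reward`, `cost`), the threshold `d`, and the PPO-style usage are illustrative assumptions, not the paper's code.

```python
import torch

# Hedged sketch of a Lagrangian update for a constrained RLHF-style objective:
#   maximize  E[reward(x, y)]   subject to   E[cost(x, y)] <= d
# Variable names and the threshold d are illustrative assumptions.

d = 0.0                                              # assumed harmlessness threshold
log_lambda = torch.zeros(1, requires_grad=True)      # dual variable, kept positive via exp()
lambda_opt = torch.optim.Adam([log_lambda], lr=1e-3)

def lagrangian_policy_loss(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Scalarized policy objective: reward minus lambda-weighted cost (negated for minimization)."""
    lam = log_lambda.exp().detach()
    return -(reward - lam * cost).mean()

def update_lambda(cost: torch.Tensor) -> None:
    """Gradient ascent on the dual: lambda grows while expected cost exceeds d, shrinks otherwise."""
    lambda_opt.zero_grad()
    dual_loss = -(log_lambda.exp() * (cost.mean().detach() - d))
    dual_loss.backward()
    lambda_opt.step()

# Stand-ins for reward-model and cost-model scores of sampled responses:
rewards, costs = torch.randn(8), torch.randn(8)
policy_loss = lagrangian_policy_loss(rewards, costs)   # fed to the policy optimizer in practice
update_lambda(costs)
```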
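The Eureka entry (9.3) describes an LLM writing reward-function code and refining it from environment feedback. The loop below is a hedged illustration of that generate-evaluate-refine pattern; `query_llm_for_reward_code` and `evaluate_reward_fn` are hypothetical placeholders, not Eureka's API.

```python
# Hedged sketch of an Eureka-style loop: an LLM proposes reward code, the candidate is
# evaluated in the environment, and the measured fitness is fed back into the next prompt.

def query_llm_for_reward_code(task: str, feedback: str) -> str:
    """Placeholder for an LLM call that returns Python source for a reward function."""
    return "def reward_fn(state, action):\n    return -abs(state)  # toy candidate"

def evaluate_reward_fn(reward_src: str) -> float:
    """Placeholder: compile the candidate and score it with a toy rollout."""
    namespace = {}
    exec(reward_src, namespace)            # candidate code is trusted here; sandbox in practice
    reward_fn = namespace["reward_fn"]
    return sum(reward_fn(s, None) for s in (-2.0, -1.0, 0.0, 1.0, 2.0))

task = "make the pendulum stay upright"
feedback = "no prior candidates"
best_score, best_src = float("-inf"), None

for iteration in range(3):                 # a few refinement rounds
    candidate_src = query_llm_for_reward_code(task, feedback)
    score = evaluate_reward_fn(candidate_src)
    if score > best_score:
        best_score, best_src = score, candidate_src
    feedback = f"iteration {iteration}: fitness {score:.2f}"   # reflect results back to the LLM

print(best_score)
```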
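The vision-language reward entry (9.1) rewards an agent by how well its rendered observation matches a natural-language task description. The sketch below shows one plausible realization with the Hugging Face CLIP model; the checkpoint choice and cosine-similarity scoring are assumptions about the general recipe, not the paper's exact setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hedged sketch: score an observation image against a task description with CLIP and
# use the similarity as the reward signal. Checkpoint and scoring are assumptions.

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def vlm_reward(frame: Image.Image, task_description: str) -> float:
    """Cosine similarity between the observation image and the task text."""
    inputs = processor(text=[task_description], images=frame, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb * text_emb).sum())

# Usage: inside an RL loop, replace (or shape) the environment reward with this score.
frame = Image.new("RGB", (224, 224))       # stand-in for a rendered observation
reward = vlm_reward(frame, "a humanoid robot standing upright")
```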
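The GFlowNet entry (8.5) rests on viewing GFlowNet training as entropy-regularized (soft) RL. For orientation, the generic soft-RL objective that such a reduction targets is shown below in standard notation; it is not copied from the paper.

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t} r(s_t, a_t) + \lambda\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s).
```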
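The Safety-Gymnasium entry (8.4) is a benchmark library whose key addition over plain Gymnasium is a per-step cost signal alongside the reward. The snippet below reflects my understanding of its Gymnasium-style interface; the environment id and the exact `step` return signature should be checked against the library's documentation.

```python
import safety_gymnasium

# Hedged usage sketch: Safety-Gymnasium follows the Gymnasium API but, as I understand it,
# step() additionally returns a cost term. Verify the environment id and signature in the docs.

env = safety_gymnasium.make("SafetyPointGoal1-v0")
obs, info = env.reset(seed=0)

episode_reward, episode_cost = 0.0, 0.0
for _ in range(100):
    action = env.action_space.sample()                       # random policy as a placeholder
    obs, reward, cost, terminated, truncated, info = env.step(action)
    episode_reward += reward
    episode_cost += cost                                      # safety violations accumulate here
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print(episode_reward, episode_cost)
```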