9.2 A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation
- Authors: Valentina Scarponi, Michel Duprez, Florent Nageotte, Stéphane Cotin
- Reason: Offers a practical zero-shot RL approach for medical robotics, with strong generalization to unseen scenarios and a high reported success rate; the authors are recognized in the field.
9.0 Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization
- Authors: Yuan Lin, Antai Xie, Xiao Liu
- Reason: Addresses a critical application of RL in autonomous vehicles, introduces traffic flow randomization as a novel element, and is relevant to the real-world transferability of RL policies; the authors have a track record in this research area.
8.9 Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
- Authors: Yifan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin
- Reason: Explores a novel trajectory optimization approach that substantially improves LLM agents’ performance, with practical implications for reinforcement learning in settings where exploration fails.
8.9 Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination
- Authors: Liangzhou Wang, Kaiwen Zhu, Fengming Zhu, Xinghu Yao, Shujie Zhang, Deheng Ye, Haobo Fu, Qiang Fu, Wei Yang
- Reason: Tackles a fundamental challenge in multi-agent systems, introduces a goal-imagination method for coordination, and demonstrates strong results in complex environments.
8.7 Wukong: Towards a Scaling Law for Large-Scale Recommendation
- Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen
- Reason: Proposes a network architecture and scaling strategy that could significantly advance recommendation models and their scalability, potentially impacting various AI domains including reinforcement learning.
8.7 Behavior Generation with Latent Actions
- Authors: Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
- Reason: Makes an important contribution to behavior modeling and generation for decision making using a hierarchical approach; the team has a strong background in RL.
8.6 Preventing Reward Hacking with Occupancy Measure Regularization
- Authors: Cassidy Laidlaw, Shivam Singhal, Anca Dragan
- Reason: Presents a novel perspective on preventing reward hacking, a significant problem in RL, with theoretical and empirical backing; the authors are from a reputable institution in the field.
8.5 Learning-augmented Online Minimization of Age of Information and Transmission Costs
- Authors: Zhongdong Liu, Keyuan Zhang, Bin Li, Yin Sun, Y. Thomas Hou, Bo Ji
- Reason: Introduces a learning-augmented online algorithm that balances transmission costs against staleness costs, which could translate into improved learning and decision-making strategies in RL scenarios.
8.3 A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation
- Authors: Aritra Mitra
- Reason: Provides a novel analysis of TD learning that could simplify the understanding and optimization of reinforcement learning algorithms with function approximation.
8.1 Geometric Dynamics of Signal Propagation Predict Trainability of Transformers
- Authors: Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli
- Reason: Investigates the trainability of deep transformers with implications for reinforcement learning, particularly in understanding system dynamics and improving model initialization for training stability.