- 9.7 SmartPathfinder: Pushing the Limits of Heuristic Solutions for Vehicle Routing Problem with Drones Using Reinforcement Learning
- Authors: Navid Mohammad Imran, Myounggyu Won
- Reason: Innovative integration of RL into heuristic VRPD solutions, demonstrating significant improvements in solution quality and speed and showcasing real-world applicability.
- 9.4 Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning
- Authors: Daniel May, Matthew Taylor, Petr Musilek
- Reason: Addresses a current and significant challenge in energy systems through RL, with promising results approaching a near-optimal benchmark.
- 9.2 Reducing Redundant Computation in Multi-Agent Coordination through Locally Centralized Execution
- Authors: Yidong Bai, Toshiharu Sugawara
- Reason: Proposes an innovative method to reduce redundancy in computation for multi-agent systems, applicable in various domains requiring efficient coordination.
- 9.2 Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
- Authors: Mhairi Dunion, Stefano V. Albrecht
- Reason: Presents an innovative approach to overcoming hardware constraints in RL and demonstrates significant practical improvements, with high potential to influence real-world applications and the development of RL systems.
- 9.0 PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
- Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi
- Reason: Introduces a novel approach to hierarchical RL with applications in robotic tasks, significantly improving performance in sparse-reward environments where baseline methods struggle.
- 9.0 Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs
- Authors: David R. Nickel, Anindya Bijoy Das, David J. Love, Christopher G. Brinton
- Reason: Tackles a complex and relevant real-world problem in cognitive radio networks using multi-agent reinforcement learning, with experimental evidence of outperforming the state of the art.
- 8.8 Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
- Authors: Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
- Reason: Develops a new algorithm for solving imperfect-information games, with theoretical and empirical validation, that could influence strategic decision-making applications.
- 8.8 Optimal Design for Human Feedback
- Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
- Reason: Addresses the important issue of efficient data collection for preference models in RL, with the potential to reduce the cost and enhance the quality of human annotations.
- 8.5 Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning
- Authors: Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang
- Reason: Tackles a critical modeling issue in RL with a novel causal approach, potentially leading to more robust and generalizable trajectory models.
- 8.3 Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
- Authors: Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
- Reason: Explores various preference-based fine-tuning methods for LLMs, offering practical insights that could improve preference fine-tuning and the refinement of RL policies (a minimal sketch of one such objective follows this list).
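
For readers unfamiliar with the preference fine-tuning methods the last entry refers to, here is a minimal sketch of one widely used objective in this line of work, the DPO (Direct Preference Optimization) loss. This is a generic illustration under assumed inputs (per-sequence log-probabilities precomputed for each preference pair), not code from the paper above:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor holding the summed per-token
    log-probability of a full response under either the trainable
    policy or the frozen reference model.
    """
    # Implicit rewards are the policy/reference log-ratios per response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and dispreferred responses.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch))
```

The `beta` parameter controls how strongly the policy is penalized for drifting from the reference model; the on-policy versus off-policy character of the preference data, which the paper studies, is determined by how the response pairs are collected rather than by the loss itself.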