- 9.7 SmartPathfinder: Pushing the Limits of Heuristic Solutions for Vehicle Routing Problem with Drones Using Reinforcement Learning
- Authors: Navid Mohammad Imran, Myounggyu Won
- Reason: Innovative integration of RL into heuristic VRPD solutions, demonstrating significant improvements in solution quality and speed and showcasing real-world applicability.
- 9.4 Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning
- Authors: Daniel May, Matthew Taylor, Petr Musilek
- Reason: Addresses a current and significant challenge in energy systems through RL, with promising results approaching a near-optimal benchmark.
- 9.2 Reducing Redundant Computation in Multi-Agent Coordination through Locally Centralized Execution
- Authors: Yidong Bai, Toshiharu Sugawara
- Reason: Proposes an innovative method to reduce redundancy in computation for multi-agent systems, applicable in various domains requiring efficient coordination.
- 9.2 Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
- Authors: Mhairi Dunion, Stefano V. Albrecht
- Reason: Presents an innovative approach to overcoming hardware constraints in RL and demonstrates significant practical improvements, with high potential to influence real-world applications and the development of RL systems.
- 9.0 PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
- Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi
- Reason: Introduces a novel approach to hierarchical RL with applications in robotic tasks, significantly improving performance in sparse-reward environments where baseline methods struggle.
- 9.0 Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs
- Authors: David R. Nickel, Anindya Bijoy Das, David J. Love, Christopher G. Brinton
- Reason: Tackles a complex and relevant real-world problem in cognitive radio networks using multi-agent reinforcement learning, with experimental evidence of outperforming the state of the art.
- 8.8 Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
- Authors: Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
- Reason: Develops a new algorithm for solving imperfect-information games, with theoretical and empirical validation, that could influence strategic decision-making applications.
- 8.8 Optimal Design for Human Feedback
- Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
- Reason: Addresses the important issue of efficient data collection for preference models in RL, with the potential to reduce the cost and enhance the quality of human annotations.
- 8.5 Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning
- Authors: Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang
- Reason: Tackles a critical modeling issue in RL with a novel causal approach, potentially leading to more robust and generalizable trajectory models.
- 8.3 Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
- Authors: Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
- Reason: Explores various preference-based fine-tuning methods for LLMs, offering practical insights that could improve preference fine-tuning and the refinement of RL policies (a minimal sketch of one such objective follows this list).
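
For readers unfamiliar with the preference fine-tuning methods the last entry refers to, here is a minimal sketch of one widely used objective in this line of work, the DPO (Direct Preference Optimization) loss. This is a generic illustration under assumed inputs (per-sequence log-probabilities precomputed for each preference pair), not code from the paper above:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor holding the summed per-token
    log-probability of a full response under either the trainable
    policy or the frozen reference model.
    """
    # Implicit rewards are the policy/reference log-ratios per response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and dispreferred responses.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch))
```

The `beta` parameter controls how strongly the policy is penalized for drifting from the reference model; the on-policy versus off-policy character of the preference data, which the paper studies, is determined by how the response pairs are collected rather than by the loss itself.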