- 9.5 Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions
- Authors: Pedram Agand, Alexey Iskrov, Mo Chen
- Reason: This paper stands out for its potential to address significant global issues like climate change, air pollution, and traffic congestion. The proposed method, EcoLight, shapes the reward scheme of reinforcement learning algorithms to reduce CO2 emissions, a fresh take on intersection traffic signal control; a toy sketch of this style of reward shaping follows below. Given the global reach of these issues, the paper's approach could have high impact across the broader research community.
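To make the idea concrete, here is a minimal sketch of CO2-aware reward shaping for a signal-control agent. The emission model, weights, and function name are illustrative assumptions, not EcoLight's actual formulation.

```python
import numpy as np

def shaped_reward(queue_lengths, co2_grams, w_congestion=1.0, w_co2=0.5):
    """Illustrative CO2-aware reward for a traffic-signal agent.

    Combines a standard congestion penalty (total queue length) with a
    penalty on estimated CO2 emitted since the last decision step. The
    weights and emission inputs are assumptions, not EcoLight's scheme.
    """
    congestion_penalty = np.sum(queue_lengths)  # vehicles waiting per approach
    emission_penalty = np.sum(co2_grams)        # grams of CO2 per approach
    return -(w_congestion * congestion_penalty + w_co2 * emission_penalty)

# Four approaches with queue lengths and per-approach CO2 estimates
print(shaped_reward(queue_lengths=[3, 0, 5, 2], co2_grams=[40.0, 5.0, 60.0, 25.0]))
```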
- 9.2 RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup
- Authors: Nico Bohlinger, Klaus Dorer
- Reason: This is the only article that exclusively focuses on reinforcement learning. The authors introduce RL-X, a new deep reinforcement learning library aimed at RoboCup soccer simulations and classic DRL benchmarks. It offers significant speed improvements and a more flexible tool for reinforcement learning applications. The authors' contributions and the library itself have the potential to influence future reinforcement learning research.
- 9.1 Contrastive Preference Learning: Learning from Human Feedback without RL
- Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
- Reason: The authors take a different approach to learning from human feedback, building on the maximum-entropy framework with an algorithm called Contrastive Preference Learning (CPL); a sketch of the contrastive objective follows below. CPL can be applied to complex RLHF problems, offering a simpler alternative to previous methods, and the methodology has important implications for how reinforcement learning handles human preferences.
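The core of such an approach is a contrastive, Bradley-Terry-style loss that scores each behavior segment by a scaled sum of policy log-probabilities and pushes the preferred segment's score above the rejected one's. This is a minimal sketch of that idea under the maximum-entropy view the summary mentions; the temperature `alpha` and tensor shapes are assumptions, not the paper's exact formulation.

```python
import torch

def cpl_loss(logp_preferred, logp_rejected, alpha=0.1):
    """Contrastive preference loss sketch in the Bradley-Terry style.

    logp_* hold per-timestep log pi(a_t | s_t) values with shape
    (batch, horizon) for the preferred and rejected segments. Each
    segment is scored by the scaled sum of its log-probabilities, and
    the loss rewards ranking the preferred segment higher.
    """
    score_pos = alpha * logp_preferred.sum(dim=-1)
    score_neg = alpha * logp_rejected.sum(dim=-1)
    # -log sigmoid(diff) is the standard contrastive / Bradley-Terry loss
    return -torch.nn.functional.logsigmoid(score_pos - score_neg).mean()

# Random log-probabilities for a batch of 4 segment pairs, horizon 10
lp_pos = torch.rand(4, 10).clamp(min=1e-6).log()
lp_neg = torch.rand(4, 10).clamp(min=1e-6).log()
print(cpl_loss(lp_pos, lp_neg))
```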
- 9.0 Progressively Efficient Learning
- Authors: Ruijie Zheng, Khanh Nguyen, Hal Daumé III, Furong Huang, Karthik Narasimhan
- Reason: The authors tackle critical issues in the development of AI agents, including the need for rapid skill acquisition and adaptability to user preferences. The proposed framework, Communication-Efficient Interactive Learning (CEIL), demonstrates strong results in both performance and communication efficiency.
- 8.9 Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes
- Authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
- Reason: The authors investigate the potential benefits of multi-task reinforcement learning in more complex sequential decision-making problems, such as Partially Observable Markov Decision Processes (POMDPs). The paper makes both theoretical and practical contributions to the current understanding of multi-task reinforcement learning, highlighting its potential benefits and applications.
- 8.8 Optimal Best Arm Identification with Fixed Confidence in Restless Bandits
- Authors: P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer
- Reason: The paper studies best arm identification in a restless multi-armed bandit setting with finitely many arms. The authors give a theoretical treatment of the problem and propose an effective policy for best arm identification; a sketch of the classic fixed-confidence baseline appears below. Even though it is not entirely focused on reinforcement learning, it could be beneficial to reinforcement learning researchers.
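For readers new to the fixed-confidence setting, here is a minimal successive-elimination sketch on i.i.d. arms. The paper's restless setting, where arm states evolve even when not pulled, is substantially harder; this baseline only illustrates the stopping-with-confidence idea. The confidence radius and `max_rounds` cap are assumptions for illustration.

```python
import math, random

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10_000):
    """Fixed-confidence best-arm identification on i.i.d. arms.

    Pull every surviving arm once per round; eliminate any arm whose
    upper confidence bound falls below the empirical best arm's lower
    confidence bound. Stops when one arm survives (confidence 1 - delta).
    """
    alive = set(range(n_arms))
    sums, counts = [0.0] * n_arms, [0] * n_arms
    best = 0
    for t in range(1, max_rounds + 1):
        for a in alive:
            sums[a] += pull(a)
            counts[a] += 1
        # Hoeffding-style radius with a union bound over arms and rounds
        rad = lambda a: math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * counts[a]))
        means = {a: sums[a] / counts[a] for a in alive}
        best = max(alive, key=means.get)
        alive = {a for a in alive if means[a] + rad(a) >= means[best] - rad(best)}
        if len(alive) == 1:
            return alive.pop()
    return best

# Bernoulli arms with means 0.3, 0.5, 0.7; arm 2 should be identified
probs = [0.3, 0.5, 0.7]
print(successive_elimination(lambda a: float(random.random() < probs[a]), n_arms=3))
```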
- 8.7 A Deep Learning Analysis of Climate Change, Innovation, and Uncertainty
- Authors: Michael Barnett, William Brock, Lars Peter Hansen, Ruimeng Hu, Joseph Huang
- Reason: This paper makes a significant contribution to the climate-economics field. By implementing a global solution method using neural networks, the authors capture the impacts of model uncertainty on social valuations and optimal decisions; a toy version of such a solution method is sketched below. Their findings could better inform future climate policies, making this essential reading at the intersection of climate science, economics, and AI.
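A "global solution method with neural networks" typically means training a network to drive a model's equation residuals to zero over the entire state space, in the spirit of deep Galerkin methods. The toy problem below (solving u' = -u with u(0) = 1) is an assumption chosen for brevity; the paper's climate-economics model is far richer.

```python
import torch

# Train a network u(x) so that the residual of u'(x) = -u(x) vanishes
# over the domain [0, 2], with the boundary condition u(0) = 1.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = 2.0 * torch.rand(128, 1)  # sample collocation points in [0, 2]
    x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    residual = (du + u).pow(2).mean()                        # enforce u' = -u
    boundary = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # enforce u(0) = 1
    loss = residual + boundary
    opt.zero_grad()
    loss.backward()
    opt.step()

print(net(torch.tensor([[1.0]])))  # should approach exp(-1) ≈ 0.368
```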
- 8.7 Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery
- Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi
- Reason: This paper introduces a model-based reinforcement learning method for causal discovery, using a tree search that builds directed acyclic graphs (DAGs) incrementally; a sketch of the incremental-construction idea appears below. The authors' approach to the challenge of exploring DAG space could influence research in both reinforcement learning and causal discovery.
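The incremental construction amounts to a search tree whose states are partial graphs and whose moves add a single edge, rejecting any edge that would create a cycle. The sketch below shows only this move-enumeration step; the helper names are hypothetical, and the actual method couples the search with a learned scoring model.

```python
def creates_cycle(edges, u, v):
    """Return True if adding directed edge u -> v would create a cycle,
    i.e. if v can already reach u through the existing edges."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(w for (x, w) in edges if x == node)
    return False

def expand(edges, n_nodes):
    """Children of a search state: every DAG reachable by adding one edge.
    A tree search would rank these with a scoring model; here we only
    enumerate the legal moves."""
    for u in range(n_nodes):
        for v in range(n_nodes):
            if u != v and (u, v) not in edges and not creates_cycle(edges, u, v):
                yield edges | {(u, v)}

# Starting from the empty graph on 3 nodes: 6 possible first edges
print(len(list(expand(frozenset(), 3))))
```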
- 8.5 Absolute Policy Optimization
- Authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu
- Reason: The authors present a new objective function for reinforcement learning that guarantees monotonic improvement of a lower probability bound on performance, rather than only the mean, bringing a fresh perspective to policy optimization; a toy illustration of the lower-bound idea follows below.
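One common way to express such a performance lower bound is the mean return minus a multiple of its standard deviation, so improving the bound also controls dispersion, not just the average. The sketch below illustrates that flavor of objective; the mean-minus-k-std form and the coefficient `k` are assumptions, not APO's actual surrogate.

```python
import torch

def lower_bound_objective(returns, k=1.0):
    """Illustrative performance lower bound: mean return minus k standard
    deviations. Maximizing this prefers policies whose near-worst-case
    performance improves, not just their average."""
    return returns.mean() - k * returns.std()

# Two batches of episode returns with equal means but different spread
a = torch.tensor([9.0, 10.0, 11.0])   # mean 10, std 1  -> objective 9
b = torch.tensor([0.0, 10.0, 20.0])   # mean 10, std 10 -> objective 0
print(lower_bound_objective(a), lower_bound_objective(b))
```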
- 8.2 ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting
- Authors: Kushal Kedia, Prithwish Dan, Atiksh Bhardwaj, Sanjiban Choudhury
- Reason: This paper introduces a novel framework for human-robot collaboration that folds forecasted human motion into the cost of the robot's plan; a toy version of such a cost is sketched below. Its practical implications in fields such as healthcare and manufacturing give it high potential influence.
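Concretely, cost-aware forecasting means a plan is scored not only on task progress but also on how close it comes to the predicted human trajectory. This sketch of such a plan cost is an illustration under assumed names, shapes, and weights, not ManiCast's actual formulation.

```python
import numpy as np

def plan_cost(robot_traj, human_forecast, goal, w_goal=1.0, w_safe=5.0, margin=0.5):
    """Illustrative cost for a robot plan that accounts for forecasted
    human motion. Both trajectories are (T, 2) waypoint arrays over the
    same horizon; weights and the safety margin are assumptions."""
    goal_cost = np.linalg.norm(robot_traj[-1] - goal)
    dists = np.linalg.norm(robot_traj - human_forecast, axis=1)
    # Penalize timesteps where the plan comes within `margin` of the
    # predicted human position
    safety_cost = np.maximum(margin - dists, 0.0).sum()
    return w_goal * goal_cost + w_safe * safety_cost

T = 5
robot = np.linspace([0.0, 0.0], [1.0, 1.0], T)
human = np.linspace([1.0, 0.0], [0.0, 1.0], T)  # forecast crosses the plan
print(plan_cost(robot, human, goal=np.array([1.0, 1.0])))
```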