- 9.6 Goodhart’s Law in Reinforcement Learning
- Authors: Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Skalse
- Reason: The paper provides a substantial theoretical contribution to reinforcement learning, especially regarding reward misspecification. It proposes methods for optimal early stopping to prevent degradation of the true objective when optimizing an imperfect proxy reward (the stopping pattern is sketched below).
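A minimal sketch of the early-stopping pattern the paper studies, on a toy bandit. The true reward is fabricated here purely to visualize Goodharting, and the stopping rule (a KL budget from the initial policy) is an illustrative stand-in, not the criterion derived in the paper:

```python
import numpy as np

# Toy softmax-policy bandit: optimize a misspecified proxy reward and stop
# early once the policy has drifted a fixed KL distance from its start.
rng = np.random.default_rng(0)
n_arms = 10
true_reward = rng.normal(size=n_arms)
proxy_reward = true_reward + 0.8 * rng.normal(size=n_arms)  # imperfect proxy

logits = np.zeros(n_arms)
init_policy = np.exp(logits) / np.exp(logits).sum()
kl_budget = 0.5  # assumed cap on optimization pressure, not the paper's value

for step in range(200):
    policy = np.exp(logits) / np.exp(logits).sum()
    kl = np.sum(policy * np.log(policy / init_policy))
    if kl > kl_budget:  # stop before proxy over-optimization dominates
        print(f"early stop at step {step}, KL={kl:.3f}")
        break
    # One step of gradient ascent on the expected *proxy* reward.
    grad = policy * (proxy_reward - policy @ proxy_reward)
    logits += 0.5 * grad

print("true return:", policy @ true_reward)
print("proxy return:", policy @ proxy_reward)
```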
- 9.5 Safe Deep Policy Adaptation
- Authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi
- Reason: The paper introduces SafeDPA, a novel RL and control framework that addresses policy adaptation and safety in autonomous robots. It shows impressive results in various complex environments, provides theoretical safety guarantees, and overcomes limitations of previous methods (the safety-filter pattern is sketched below).
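A minimal sketch of the generic safety-filter pattern that frameworks like SafeDPA build on: a task policy proposes an action, and a safety layer overrides it when a one-step dynamics model predicts a constraint violation. The dynamics, constraint, and backup action below are hand-coded placeholders, not the paper's learned components:

```python
import numpy as np

def dynamics(state, action):
    return state + 0.1 * action          # toy single-integrator model

def constraint_ok(state):
    return abs(state[0]) < 1.0           # stay inside a position box

def task_policy(state):
    return np.array([2.0])               # aggressive nominal action

def safe_policy(state):
    action = task_policy(state)
    if not constraint_ok(dynamics(state, action)):
        action = np.array([0.0])         # fall back to a safe backup action
    return action

state = np.array([0.95])
print(safe_policy(state))  # backup fires: the nominal action would exit the box
```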
- 9.4 Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation
- Authors: Badr Ben Elallid, Hamza El Alaoui, Nabil Benamar
- Reason: The paper tackles the challenge of navigating complex intersections efficiently and safely using reinforcement learning. Its TD3-based approach shows promising results in reducing latency, minimizing collisions, and improving overall safety (a TD3 setup is sketched below).
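A minimal sketch of training a TD3 agent with Stable-Baselines3, using Pendulum-v1 as a stand-in continuous-control task. The paper's intersection environment and its state/reward design are not reproduced here; this only shows the TD3 setup:

```python
import gymnasium as gym
from stable_baselines3 import TD3

env = gym.make("Pendulum-v1")
model = TD3("MlpPolicy", env, learning_rate=1e-3, verbose=1)
model.learn(total_timesteps=10_000)      # short run for illustration

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```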
- 9.3 METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
- Authors: Seohong Park, Oleh Rybkin, Sergey Levine
- Reason: The paper proposes a novel method for unsupervised reinforcement learning that scales to complex, high-dimensional environments. Experiments in several demanding environments demonstrate the substantial capability of the method (its core objective is sketched below).
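A minimal sketch of METRA's representation objective as I read it: maximize the alignment of latent transitions (phi(s') - phi(s)) with a skill vector z, under a constraint on the latent step size. A fixed penalty weight stands in for the paper's dual variable, and all networks and data are placeholders:

```python
import torch

phi = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 2))   # state -> latent space
opt = torch.optim.Adam(phi.parameters(), lr=3e-4)

s = torch.randn(128, 8)                   # states (placeholder batch)
s_next = s + 0.1 * torch.randn(128, 8)    # successor states
z = torch.nn.functional.normalize(torch.randn(128, 2), dim=-1)  # skills

delta = phi(s_next) - phi(s)
reward_term = (delta * z).sum(-1)                       # alignment with skill
constraint = (delta.norm(dim=-1) - 1.0).clamp(min=0.0)  # ||delta|| <= 1 penalty
loss = (-reward_term + 10.0 * constraint).mean()

opt.zero_grad(); loss.backward(); opt.step()
```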
- 9.2 ELDEN: Exploration via Local Dependencies
- Authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martin-Martin
- Reason: The paper makes an important contribution to tackling complex environments with ELDEN, a novel method that encourages the discovery of new interactions between entities. ELDEN significantly outperforms previous exploration methods across different domains (the dependency signal is sketched below).
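A minimal sketch of the kind of signal ELDEN exploits: local dependencies between entities read off the Jacobian of a learned dynamics model. The dependency-novelty bonus here (distance to previously seen dependency patterns) is a simplified stand-in for the paper's uncertainty-based formulation:

```python
import torch

dyn = torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 6))  # 3 entities x 2 dims each

def dependency_pattern(state):
    jac = torch.autograd.functional.jacobian(dyn, state)   # (6, 6)
    # Aggregate each entity-pair block into a 3x3 dependency matrix.
    return jac.abs().reshape(3, 2, 3, 2).sum(dim=(1, 3))

seen = []
def intrinsic_reward(state):
    pat = dependency_pattern(state)
    if not seen:
        seen.append(pat)
        return 1.0
    novelty = min(torch.dist(pat, p).item() for p in seen)
    seen.append(pat)
    return novelty  # reward visiting states with unfamiliar interactions

print(intrinsic_reward(torch.randn(6)))
print(intrinsic_reward(torch.randn(6)))
```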
- 8.9 A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning
- Authors: Yash Shukla, Bharat Kesari, Shivam Goel, Robert Wright, Jivko Sinapov
- Reason: The paper makes strides in transfer learning, proposing a framework for successful policy transfer with minimal target-task interactions. The method is applicable even when source and target tasks are semantically dissimilar (the two-stage recipe is sketched below).
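A minimal sketch of the two-stage recipe the title describes: learn a mapping from target-task observations into the source observation space, then behavior-clone the source policy through that mapping. Shapes, networks, and data are placeholders, not the authors' architecture:

```python
import torch

obs_map = torch.nn.Linear(10, 4)   # target obs (10-d) -> source obs (4-d)
student = torch.nn.Linear(4, 2)    # policy head in the source obs space

def source_policy(obs):            # frozen expert from the source task
    return torch.tanh(obs[..., :2])

opt = torch.optim.Adam(list(obs_map.parameters()) + list(student.parameters()))
for _ in range(100):               # few-shot: a small batch of target obs
    target_obs = torch.randn(32, 10)
    mapped = obs_map(target_obs)
    # Behavior cloning: match the expert's actions on mapped observations.
    bc_loss = ((student(mapped) - source_policy(mapped).detach()) ** 2).mean()
    opt.zero_grad(); bc_loss.backward(); opt.step()
```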
- 8.9 Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning
- Authors: Andrea Bernini, Fabrizio Silvestri, Gabriele Tolomei
- Reason: This paper introduces a novel method applying deep reinforcement learning to a new research area, community membership hiding. Rigorous experiments show that it outperforms existing baselines (the problem framing is sketched below).
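A minimal sketch of the problem framing only: hiding a node's community membership as a sequential decision problem whose actions rewire the target node's edges, with a reward that checks whether a community detector still groups it with its original community. A random edit stands in for the paper's learned DRL policy:

```python
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
target = 0
original = next(c for c in greedy_modularity_communities(G) if target in c)

def reward(graph):
    now = next(c for c in greedy_modularity_communities(graph) if target in c)
    # Success if the target shares at most half of its original community.
    return 1.0 if len(now & original) <= len(original) // 2 else 0.0

for step in range(5):                      # tiny edit budget
    neighbor = random.choice(list(G.neighbors(target)))
    G.remove_edge(target, neighbor)        # action: drop one of the node's edges
    if reward(G) > 0:
        print(f"hidden after {step + 1} edits")
        break
```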
- 8.7 Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach
- Authors: Heasung Kim, Sravan Ankireddy
- Reason: The paper addresses the complexity of network parameter optimization using deep reinforcement learning. It proposes an offline, model-based approach that matches DQN's performance without any exploration and with only a fraction of the data, minimizing risk and maximizing sample efficiency (batch-constrained action selection is sketched below).
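A minimal sketch of the batch-constrained idea the paper builds on, in the spirit of discrete BCQ: when choosing greedy actions from offline data, mask out actions the behavior-cloning model finds unlikely, so the Q-function is never queried far outside the dataset's support. The networks and the 0.3 threshold are illustrative placeholders:

```python
import torch

n_actions = 8
q_net = torch.nn.Linear(16, n_actions)   # Q(s, .) over beam configurations
bc_net = torch.nn.Linear(16, n_actions)  # behavior model of the logged policy

def batch_constrained_action(state, tau=0.3):
    q = q_net(state)
    probs = torch.softmax(bc_net(state), dim=-1)
    # Keep only actions with at least tau * max probability under the
    # behavior model; everything else is masked out of the argmax.
    allowed = probs / probs.max(dim=-1, keepdim=True).values >= tau
    q = q.masked_fill(~allowed, float("-inf"))
    return q.argmax(dim=-1)

print(batch_constrained_action(torch.randn(4, 16)))  # one action per state
```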
- 8.6 Automatic Music Playlist Generation via Simulation-based Reinforcement Learning
- Authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai
- Reason: The paper proposes an innovative reinforcement learning method for music playlist personalization. Offline and online evaluations show high potential for direct business application (the simulation-based setup is sketched below).
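A minimal sketch of the simulation-based setup the title describes: a recommendation policy is optimized against a user-behavior simulator rather than by exploring on live traffic. The simulator, features, and crude bandit-style update below are illustrative placeholders, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tracks = 50
track_feats = rng.normal(size=(n_tracks, 8))
user_pref = rng.normal(size=8)            # hidden parameter of the simulator

def simulated_user(track):                # P(listen | track) under the model
    return 1 / (1 + np.exp(-track_feats[track] @ user_pref))

weights = np.zeros(8)                     # linear track-scoring policy
for _ in range(2000):
    scores = track_feats @ weights
    track = rng.choice(n_tracks, p=np.exp(scores) / np.exp(scores).sum())
    listened = rng.random() < simulated_user(track)   # simulated feedback
    # Crude bandit-style update toward tracks the simulated user completes.
    weights += 0.05 * (listened - 0.5) * track_feats[track]

top10 = np.argsort(track_feats @ weights)[-10:]
print("avg sim listen prob:", np.mean([simulated_user(t) for t in top10]))
```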
- 8.1 Optimal Scheduling of Electric Vehicle Charging with Deep Reinforcement Learning considering End Users Flexibility
- Authors: Christoforos Menos-Aikateriniadis, Stavros Sykiotis, Pavlos S. Georgilakis
- Reason: The research provides a novel Deep Reinforcement Learning method for efficiently scheduling EV charging. Given the momentum of the EV industry, the work shows high potential for practical applications (an MDP framing is sketched below).
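A minimal sketch of how the scheduling problem can be framed as an MDP: each step picks a charging rate, the user's departure time bounds the episode (the flexibility window), and the reward trades electricity cost against missing the requested state of charge. Prices, limits, and penalties are illustrative placeholders, not the paper's model:

```python
import numpy as np
import gymnasium as gym

class EVChargingEnv(gym.Env):
    def __init__(self, horizon=24, departure=18, target_soc=0.9):
        self.horizon, self.departure, self.target = horizon, departure, target_soc
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(1,))  # charge rate
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(3,))
        self.prices = 0.2 + 0.1 * np.sin(np.arange(horizon) / 24 * 2 * np.pi)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.3
        return self._obs(), {}

    def _obs(self):
        return np.array([self.soc, self.t / self.horizon,
                         self.prices[min(self.t, self.horizon - 1)]],
                        dtype=np.float32)

    def step(self, action):
        rate = float(action[0])
        self.soc = min(1.0, self.soc + 0.05 * rate)
        reward = -self.prices[self.t] * rate          # pay for energy drawn
        self.t += 1
        done = self.t >= self.departure               # user departs: episode ends
        if done and self.soc < self.target:           # penalty for undercharging
            reward -= 10.0 * (self.target - self.soc)
        return self._obs(), reward, done, False, {}

env = EVChargingEnv()
obs, _ = env.reset()
```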