- 9.4 Adaptive Proximal Policy Optimization with Upper Confidence Bound
- Authors: Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang
- Reason: Introduces an adaptive method that could significantly improve the performance of PPO, a popular RL algorithm, giving it high potential for influence in the RL community. A brief background sketch of the standard PPO objective follows below.
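  As background, here is a minimal sketch of the standard PPO clipped surrogate that adaptive variants build on, plus a generic UCB-style exploration bonus. The function names (`ppo_clip_loss`, `ucb_bonus`) and the coefficient `c` are illustrative assumptions, not the authors' formulation.

  ```python
  import numpy as np

  def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
      """Standard PPO clipped surrogate loss (to be minimized)."""
      ratio = np.exp(logp_new - logp_old)             # importance ratio pi_new / pi_old
      clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)  # clip ratio to [1 - eps, 1 + eps]
      return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

  def ucb_bonus(counts, t, c=1.0):
      """Generic UCB exploration bonus c * sqrt(log t / N(a)); illustrative
      only, not the paper's adaptive mechanism."""
      return c * np.sqrt(np.log(t) / np.maximum(counts, 1))
  ```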
- 9.3 Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
- Authors: Reda Ouhamma, Maryam Kamgarpour
- Reason: Addresses an open problem with a single time-scale algorithm that converges to a Nash equilibrium under weaker, more practical reachability conditions, which could be highly influential for both theoretical and applied researchers.
- 9.1 I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives
- Authors: R. Teal Witter, Lucas Rosenblatt
- Reason: The paper is camera-ready for AAAI 2024 and applies reinforcement learning to urban planning and public policy, indicating cross-disciplinary impact.
- 9.0 GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time
- Authors: Haoran Ye, Jiarui Wang, Helan Liang, Zhiguang Cao, Yong Li, Fanzhang Li
- Reason: The paper presents a novel and scalable framework for routing problems, which is highly relevant to real-world applications. Additionally, it is accepted at a top-tier conference (AAAI 2024), indicating peer recognition and potential influence in the field.
- 8.9 A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- Authors: Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang
- Reason: Presents a novel perspective on the challenges of offline-to-online (O2O) RL, with significant performance improvements in benchmark environments that could influence future O2O RL research.
- 8.8 Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach
- Authors: Xingshuai Huang, Di Wu, Benoit Boulet
- Reason: Proposes a novel Decision Transformer-based method with online adaptation capabilities, offering a practical solution to a real-world problem; it is likely to be influential in RL applications for traffic systems.
- 8.6 Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
- Authors: Xin Hao, Phee Lep Yeoh, Changyang She, Branka Vucetic, Yonghui Li
- Reason: This paper tackles security in MEC networks by combining two cutting-edge technologies, DRL and blockchain. The author list includes notable experts, which adds to its credibility and potential impact on the field of secure resource allocation.
- 8.4 Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
- Authors: Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
- Reason: It presents a foundational analysis of preference learning in the presence of hidden context, addressing a significant problem in RLHF. This could have far-reaching implications for training more robust and reliable RL systems.
- 8.3 Combinatorial Stochastic-Greedy Bandit
- Authors: Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, Vaneet Aggarwal
- Reason: The paper proposes a novel algorithm with a performance bound for combinatorial bandit problems, which could be influential in the optimization and decision-making domains within machine learning; a background sketch of the classic stochastic-greedy subroutine follows below.
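  For context, here is the classic stochastic-greedy subroutine from submodular maximization (Mirzasoleiman et al., 2015), which the title appears to build on. This is background only, not the paper's bandit algorithm; `marginal_gain` is a hypothetical callback.

  ```python
  import numpy as np

  def stochastic_greedy(n, k, marginal_gain, eps=0.1, rng=None):
      """Classic stochastic-greedy for monotone submodular maximization:
      at each of k steps, score only a random subsample of the remaining
      ground set {0, ..., n-1} and greedily add the best element."""
      rng = rng or np.random.default_rng()
      sample_size = int(np.ceil((n / k) * np.log(1.0 / eps)))
      selected, remaining = [], list(range(n))
      for _ in range(k):
          m = min(sample_size, len(remaining))
          candidates = rng.choice(remaining, size=m, replace=False)
          # marginal_gain(e, S) is a hypothetical callback: gain of adding e to S
          best = max(candidates, key=lambda e: marginal_gain(e, selected))
          selected.append(int(best))
          remaining.remove(best)
      return selected
  ```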
- 7.9 An Invitation to Deep Reinforcement Learning
- Authors: Bernhard Jaeger, Andreas Geiger
- Reason: This tutorial paper could have high educational impact by making deep RL more accessible, but its influence is more on pedagogy than on advancing the state of the art in RL research.