- 9.5 On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\varepsilon$-Greedy Exploration
- Authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
- Reason: This paper offers a significant theoretical understanding of DQN’s convergence under the $\varepsilon$-greedy policy, a question that has rarely been analyzed rigorously (an illustrative sketch of $\varepsilon$-greedy action selection appears after this list). The paper also carries strong authority, with multiple authors from well-established backgrounds in machine learning research.
- 9.1 QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
- Authors: Elias Frantar, Dan Alistarh
- Reason: The paper presents a highly scalable and practical solution to the memory problem in large ML models, including trillion-parameter MoE models. Their approach is significant because it enables these large models to run on affordable commodity hardware.
- 9.0 Contextual Bandits for Evaluating and Improving Inventory Control Policies
- Authors: Dean Foster, Randy Jia, Dhruv Madeka
- Reason: This paper presents a novel method for evaluating and improving inventory control policies, a nontrivial problem in reinforcement learning. The authors are also highly influential in the field.
- 9.0 Model predictive control-based value estimation for efficient reinforcement learning
- Authors: Qizhen Wu, Kexin Liu, Lei Chen
- Reason: The paper introduces an improved reinforcement learning method based on model predictive control, which yields higher learning efficiency and faster convergence. Its effectiveness is demonstrated both on a classic benchmark and in a dynamic obstacle avoidance scenario for unmanned aerial vehicles.
- 8.9 Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies
- Authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman
- Reason: The authors propose an inventive architectural model to address an important problem in reinforcement learning: generalization to new transition dynamics. The theoretical advances, originality, and experimental validation give the work high potential influence in the community.
- 8.8 Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits
- Authors: Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff
- Reason: This paper provides a near-optimal algorithm for identifying the Nash equilibrium in noisy Matrix Games, a problem that generalizes stochastic Multi-Armed Bandits and Dueling Bandits. The authors are all well-known figures in reinforcement learning.
- 8.8 Towards Control-Centric Representations in Reinforcement Learning from Images
- Authors: Chen Liu, Hongyu Zang, Xin Li, Yong Heng, Yifei Wang, Zhen Fang, Yisen Wang, Mingzhong Wang
- Reason: The authors propose a novel reinforcement learning method for extracting control-centric representations in image-based reinforcement learning, showing superior performance on large-scale benchmarks.
- 8.7 Symphony of experts: orchestration with adversarial insights in reinforcement learning
- Authors: Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz
- Reason: The paper explores the concept of orchestration in reinforcement learning, providing insightful methods and simulation results, showing promise for practical applications, and potentially influencing future research in the field.
- 8.4 Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks
- Authors: Xiucheng Wang, Nan Cheng, Longfei Ma, Zhisheng Yin, Tom H. Luan, Ning Lu
- Reason: This paper proposes a practical approach to efficient training of multi-UAV networks using an imperfect digital twin (DT) model, addressing the high cost of training in distributed reinforcement learning environments.
- 8.2 Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms
- Authors: Prashansa Panda, Shalabh Bhatnagar
- Reason: This paper conducts a non-asymptotic (finite-time) analysis of Constrained Actor Critic and Constrained Natural Actor Critic algorithms for constrained MDPs with inequality constraints, and includes extensive experiments across different grid-world settings.
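As background for the top-ranked paper's topic, and not as the authors' method, below is a minimal sketch of $\varepsilon$-greedy action selection over a set of Q-values; the 4-action example and its Q-values are illustrative assumptions.

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon, explore by sampling a random action;
    otherwise exploit by taking the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Illustrative usage: made-up Q-values for a 4-action problem.
q_values = [0.1, 0.5, -0.2, 0.3]
action = epsilon_greedy_action(q_values, epsilon=0.1)
print(action)
```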