- 9.5 Quantum Acceleration of Infinite Horizon Average-Reward Reinforcement Learning
- Authors: Bhargav Ganguly, Vaneet Aggarwal
- This paper investigates quantum acceleration for infinite horizon average-reward Markov Decision Processes (MDPs) in reinforcement learning. The authors propose a novel quantum-enhanced learning framework whose regret guarantees are exponentially better than those of classical counterparts.
- 9.5 Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning
- Authors: Yen-Ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh
- The authors explore the role of Nesterov’s accelerated gradient (NAG) in reinforcement learning and demonstrate that the resulting Accelerated Policy Gradient (APG) method significantly improves convergence behavior over standard policy gradient.
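A minimal sketch of the underlying idea, a Nesterov lookahead step applied to policy parameters, is shown below; the step sizes, momentum schedule, and gradient oracle here are illustrative assumptions rather than the paper's exact APG recipe.

```python
import numpy as np

def apg_step(theta, momentum, grad_fn, lr=0.01, beta=0.9):
    """One Nesterov-style accelerated policy-gradient (ascent) step (illustrative).

    theta:    policy parameters
    momentum: momentum buffer from the previous step
    grad_fn:  returns an estimate of the policy gradient at the given parameters
    """
    # Evaluate the gradient at a lookahead point along the momentum direction.
    lookahead = theta + beta * momentum
    grad = grad_fn(lookahead)
    # Ascent on the expected return, carried by the momentum buffer.
    momentum = beta * momentum + lr * grad
    return theta + momentum, momentum

# Toy usage with a concave surrogate objective standing in for the expected return.
theta, momentum = np.zeros(4), np.zeros(4)
toy_grad = lambda th: 1.0 - 2.0 * th  # gradient of sum(th - th**2), maximized at 0.5
for _ in range(200):
    theta, momentum = apg_step(theta, momentum, toy_grad)
print(theta)  # close to [0.5, 0.5, 0.5, 0.5]
```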
- 9.2 Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach
- Authors: Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen
- This paper presents a method for efficient online reinforcement learning in the infinite horizon setting when an offline dataset is available to start from. The proposed algorithms significantly improve on the state of the art in cumulative regret.
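One way to picture the offline-to-online handoff is a posterior-sampling loop whose prior is seeded by the offline dataset; the sketch below is a simplified tabular illustration under an assumed Dirichlet transition prior, not the paper's exact algorithm.

```python
import numpy as np

def seed_prior_from_offline(n_states, n_actions, offline_transitions, pseudo_count=1.0):
    """Turn an offline dataset into Dirichlet pseudo-counts over transitions.

    offline_transitions: iterable of (state, action, next_state) tuples.
    """
    counts = np.full((n_states, n_actions, n_states), pseudo_count)
    for s, a, s_next in offline_transitions:
        counts[s, a, s_next] += 1.0
    return counts

def sample_transition_model(counts, rng):
    """Draw one transition model from the Dirichlet posterior, per (state, action)."""
    model = np.empty_like(counts)
    for s in range(counts.shape[0]):
        for a in range(counts.shape[1]):
            model[s, a] = rng.dirichlet(counts[s, a])
    return model

# Schematic online phase: sample a model, plan against it, act, update the counts.
rng = np.random.default_rng(0)
counts = seed_prior_from_offline(n_states=5, n_actions=2,
                                 offline_transitions=[(0, 1, 2), (2, 0, 3), (3, 1, 4)])
sampled_P = sample_transition_model(counts, rng)  # handed to a planner each episode
```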
- 9.0 Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
- Authors: Washim Uddin Mondal, Vaneet Aggarwal
- The authors propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which utilizes an accelerated stochastic gradient descent procedure to obtain the natural policy gradient. The method improves the state-of-the-art sample complexity by a log(1/ε) factor, providing a significant contribution to the field.
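The accelerated-SGD inner loop can be read as solving the least-squares problem that defines the natural-gradient direction under compatible function approximation; the sketch below is illustrative, with the sampling scheme and step sizes assumed rather than taken from the paper.

```python
import numpy as np

def npg_direction_asgd(score_fn, advantage_fn, samples, dim,
                       n_iters=2000, lr=0.05, beta=0.9, seed=0):
    """Estimate the natural policy gradient direction w by minimizing
    E[(score(s, a) @ w - A(s, a))**2] with Nesterov-accelerated SGD (illustrative).

    score_fn:     (s, a) -> grad_theta log pi(a | s), a length-`dim` vector
    advantage_fn: (s, a) -> advantage estimate A(s, a)
    samples:      list of (s, a) pairs drawn from the current policy
    """
    rng = np.random.default_rng(seed)
    w, velocity = np.zeros(dim), np.zeros(dim)
    for _ in range(n_iters):
        s, a = samples[rng.integers(len(samples))]
        phi = score_fn(s, a)
        # Gradient of the squared regression error, evaluated at the lookahead point.
        lookahead = w + beta * velocity
        grad = 2.0 * (phi @ lookahead - advantage_fn(s, a)) * phi
        velocity = beta * velocity - lr * grad
        w = w + velocity
    return w  # the outer loop would then update theta <- theta + eta * w

# Toy check: with orthonormal score vectors the true direction is (approximately) recovered.
true_w = np.arange(8, dtype=float)
basis = np.eye(8)
w_hat = npg_direction_asgd(score_fn=lambda s, a: basis[s],
                           advantage_fn=lambda s, a: basis[s] @ true_w,
                           samples=[(i, 0) for i in range(8)], dim=8)
```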
- 9.0 Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
- Authors: Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang
- The team demonstrates a novel approach that consistently optimizes policy performance across data groups, significantly improving both model performance and training stability.
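As one concrete reading of "consistent optimization across data groups", the sketch below adaptively upweights the worst-performing groups (a group-DRO-style reweighting); this is an illustrative stand-in, not necessarily the paper's mechanism.

```python
import numpy as np

def update_group_weights(weights, group_losses, step_size=0.1):
    """Exponentiated-gradient update that shifts weight toward worse-off groups
    (group-DRO style, illustrative)."""
    weights = weights * np.exp(step_size * group_losses)
    return weights / weights.sum()

# Schematic loop: evaluate the policy objective per data group, rebalance the
# weights, and minimize the weighted loss so no group is left behind.
rng = np.random.default_rng(0)
n_groups = 3
weights = np.full(n_groups, 1.0 / n_groups)
for _ in range(100):
    group_losses = rng.uniform(0.0, 1.0, size=n_groups)  # placeholder losses
    weights = update_group_weights(weights, group_losses)
    weighted_loss = float(weights @ group_losses)  # quantity a trainer would minimize
```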
- 8.8 Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
- Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert
- This paper studies online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback. The authors introduce two algorithms that improve the best known regret bounds for this setting, an important step toward practically usable guarantees.
- 8.5 Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability
- Authors: Revan MacQueen, James R. Wright
- This paper studies multi-agent self-play, where agents learn by playing against copies of themselves. The authors show that when a multiplayer game decomposes into two-player constant-sum games, self-play produces agents with bounded vulnerability, which could translate into better performance in multiplayer games.
- 8.1 A General Theoretical Paradigm to Understand Learning from Human Preferences
- Authors: Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
- The paper introduces a new theoretical framework for learning from human preferences and derives important implications about the limitations of current algorithms in the field.
- 7.5 On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning
- Authors: Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Skalse
- The study comprehensively compares the expressivity of 17 objective-specification formalisms in reinforcement learning. This comparison could inform both policy optimization and reward learning.
- 7.2 Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning
- Authors: Yufei Kuang, Xijun Li, Jie Wang, Fangzhou Zhu, Meng Lu, Zhihai Wang, Jia Zeng, Houqiang Li, Yongdong Zhang, Feng Wu
- The authors apply reinforcement learning to accelerate presolve in large-scale linear programming. They propose an innovative reinforcement learning framework that substantially improves solving efficiency.