- 9.2 Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
- Authors: Tianchen Zhou, FNU Hairi, Haibo Yang, Jia Liu, Tian Tong, Fan Yang, Michinari Momma, Yan Gao
- Reason: Accepted at a top-tier conference (ICML 2024), the paper tackles the under-explored area of multi-objective reinforcement learning (MORL) and provides theoretical contributions, including a finite-time convergence analysis and sample complexity bounds, which can significantly influence the understanding and application of MORL across domains.
- 9.0 Federated Reinforcement Learning with Constraint Heterogeneity
- Authors: Hao Jin, Liangyu Zhang, Zhihua Zhang
- Reason: Within the trending field of federated learning, this paper adds the important dimension of reinforcement learning under heterogeneous constraints, which suits practical applications such as healthcare and can drive future research in federated RL.
- 8.9 Deep Reinforcement Learning for Modelling Protein Complexes
- Authors: Tao Feng, Ziqi Gao, Jiaxuan You, Chenyi Zi, Yan Zhou, Chen Zhang, Jia Li
- Reason: Addresses a challenging bioinformatics problem with potentially high impact, is presented at a reputable conference (ICLR 2024), demonstrates significant accuracy and efficiency improvements, and connects to high-interest AI topics such as AlphaFold.
- 8.7 Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
- Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg
- Reason: Addresses constrained multi-task reinforcement learning, an area of growing interest, with solutions for both centralized and decentralized settings, and presents both theoretical convergence results and practical algorithmic contributions.
- 8.7 Policy Learning for Balancing Short-Term and Long-Term Rewards
- Authors: Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng
- Reason: The paper presents a new policy learning framework that balances short-term and long-term rewards, with sound theoretical foundations including identifiability results, efficiency bounds, and algorithms with proven regret convergence rates. This addresses a core challenge in reinforcement learning and could be influential for both theory and practice.
- 8.5 Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
- Authors: Georgios Tzannetos, Parameswaran Kamalaruban, Adish Singla
- Reason: Introduces a novel curriculum strategy that could potentially speed up the training of deep RL agents, backed by theoretical and empirical results, contributing to the advancement of learning curricula in RL.
- 8.5 Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
- Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su
- Reason: Accepted at ICLR 2024, this paper proposes an innovative curriculum learning approach for reinforcement learning, demonstrating significant improvements in sample and demonstration efficiency. It offers a solution to one of RL’s biggest hurdles: data efficiency.
- 8.4 Enhancing Q-Learning with Large Language Model Heuristics
- Authors: Xiefeng Wu
- Reason: The paper proposes an interesting blend of reinforcement learning and natural language processing by integrating large language model heuristics into Q-learning (a hedged sketch of one such integration follows this entry). Despite a lower expected impact due to potential text overlap with prior work, the approach could be influential for future work at the intersection of RL and NLP.
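A minimal, hypothetical sketch of one way such an integration could look: tabular Q-learning whose greedy action selection adds a decaying bonus from an LLM-derived scoring function. The environment interface (`reset`, `step`, `available_actions`) and `llm_heuristic` are placeholders for illustration, not the paper's actual API.

```python
# Hypothetical sketch -- not the paper's implementation. Tabular Q-learning where
# greedy action selection is biased by an LLM-derived heuristic score whose
# influence is annealed as the Q-estimates improve.
import random
from collections import defaultdict

def llm_heuristic(state, action) -> float:
    """Placeholder for an LLM scoring call; should return a prior score in [0, 1]."""
    return 0.0

def q_learning_with_heuristic(env, episodes=500, alpha=0.1, gamma=0.99,
                              epsilon=0.1, beta=1.0, beta_decay=0.99):
    Q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.available_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                # Greedy w.r.t. Q plus the (decaying) heuristic bonus.
                action = max(actions,
                             key=lambda a: Q[(state, a)] + beta * llm_heuristic(state, a))
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.available_actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
        beta *= beta_decay  # rely less on the heuristic as Q improves
    return Q
```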
- 8.3 Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
- Authors: Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
- Reason: Proposes a method to reduce the high variance of off-policy policy gradient estimators via an optimal action-dependent baseline, which could make these methods more practical and effective; a hedged sketch of the baseline-subtraction idea follows this entry.
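To illustrate only the variance-reduction mechanism: an importance-weighted policy-gradient estimate with a generic action-dependent baseline subtracted. The paper derives an *optimal* baseline; `baseline`, `grad_log_pi`, and the trajectory format below are illustrative assumptions, and the extra correction term required to keep an action-dependent baseline unbiased is omitted for brevity.

```python
# Illustrative sketch of baseline subtraction in an off-policy policy gradient.
# NOTE: an action-dependent baseline needs an additional correction term to keep
# the estimator unbiased; it is omitted here to keep the example short.
import numpy as np

def off_policy_pg_estimate(trajectories, grad_log_pi, pi_target, pi_behavior,
                           baseline, gamma=0.99):
    """trajectories: list of [(state, action, reward), ...] from the behavior policy."""
    grads = []
    for traj in trajectories:
        # Discounted returns-to-go for each step of the trajectory.
        G, returns = 0.0, []
        for _, _, r in reversed(traj):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()

        grad = 0.0
        for (s, a, _), G_t in zip(traj, returns):
            rho = pi_target(s, a) / pi_behavior(s, a)   # importance weight
            advantage = G_t - baseline(s, a)            # action-dependent baseline
            grad = grad + rho * advantage * grad_log_pi(s, a)
        grads.append(grad)
    return np.mean(grads, axis=0)
```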
- 8.1 CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
- Authors: David Valencia, Henry Williams, Trevor Gee, Bruce A. MacDonald, Minas Liarokapis
- Reason: Offers a new distributional RL algorithm tailored to continuous action spaces that addresses practical challenges of categorical distributional RL (CDRL) and can be more sample-efficient, which could influence continuous control in RL; a hedged sketch of Kalman-style critic fusion follows this entry.
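A minimal sketch of the general idea of Kalman-style (inverse-variance) fusion, assuming each critic reports a mean and a variance for its return estimate; the paper's exact fusion rule and distributional parameterization may differ.

```python
# Hypothetical sketch: fuse several critics' return estimates N(mean_i, var_i)
# with inverse-variance (Kalman-style) weighting; more confident critics dominate.
import numpy as np

def kalman_fuse(means, variances, eps=1e-8):
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / (np.asarray(variances, dtype=float) + eps)
    fused_var = 1.0 / precisions.sum()
    fused_mean = fused_var * (precisions * means).sum()
    return fused_mean, fused_var

# Example: three critics disagree; the lowest-variance estimate gets the most weight.
mu, var = kalman_fuse(means=[10.2, 9.5, 11.0], variances=[0.5, 2.0, 1.0])
print(f"fused return estimate: {mu:.2f} +/- {var ** 0.5:.2f}")
```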