- 9.9 2310.17330-CQM: Curriculum Reinforcement Learning with a Quantized World Model
- Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim
- Reason: The paper was accepted to NeurIPS 2023, which indicates that it has undergone rigorous review by leading experts. Moreover, the paper proposes a novel curriculum method for reinforcement learning in high-dimensional spaces, addressing a significant challenge in the field. The authors also claim that their method improves data efficiency and performance, outperforming state-of-the-art methods.
- 9.7 2310.17303-Demonstration-Regularized RL
- Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard
- Reason: The authors propose a demonstration-regularized reinforcement learning method that improves RL's sample complexity, and they provide strong theoretical evidence and guarantees. However, it ranks below the first paper because it does not demonstrate practical performance improvements.
- 9.5 2310.17458-Coalitional Bargaining via Reinforcement Learning: An Application to Collaborative Vehicle Routing
- Authors: Stephen Mak, Liming Xu, Tim Pearce, Michael Ostroumov, Alexandra Brintrup
- Reason: The paper introduces a practical application of RL to coalitional bargaining in vehicle routing. It was also accepted to the NeurIPS 2021 Workshop on Cooperative AI. However, the description of the method's practical performance is somewhat limited.
- 9.3 Privately Aligning Language Models with Reinforcement Learning
- Authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim
- Reason: The authors are recognized for their work in reinforcement learning. The paper introduces a new differentially private (DP) framework designed to align LLMs with RL, a significant contribution to privacy preservation in RL-trained language models. Experimental results validate the approach while maintaining strong privacy guarantees.
- 9.3 2310.17634-Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion
- Authors: Laura Smith, Yunhao Cao, Sergey Levine
- Reason: The paper provides a novel approach to continuously improving robotic locomotion performance using RL. Although the paper appears to be of practical value, it is not stated to have been accepted at any leading conference.
- 9.2 Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
- Authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche
- Reason: The paper provides a comprehensive analysis of why bisimulation-based methods falter in offline tasks and proposes solutions that help prevent overfitting and feature collapse. Its influence potential comes from its practical implementation, which offers possible performance gains on two benchmark suites.
- 9.1 2310.17596-MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
- Authors: Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, Dieter Fox
- Reason: The paper presents a system for automatic, large-scale data generation from human demonstrations to train robots on long-horizon, high-precision tasks. While the approach is interesting, concrete performance improvements and conference acceptance remain unstated.
- 9.0 Causal Q-Aggregation for CATE Model Selection
- Authors: Hui Lan, Vasilis Syrgkanis
- Reason: This paper offers a novel approach to CATE model selection and ensembling by proposing a Q-aggregation method using a doubly robust loss. The method could influence the wider machine learning community interested in CATE estimation and personalized decision making.
- 8.7 Controlled Decoding from Language Models
- Authors: Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami
- Reason: The authors present an innovative reinforcement learning method to control autoregressive generation from language models, which could have a significant influence on conversational AI and language model development.
- 8.6 Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
- Authors: Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
- Reason: The paper tackles a key problem in nonconvex optimization and develops a global convergence theory under large learning rates. This could have a significant influence on further studies in optimization, particularly those dealing with large learning rates.