- 9.5 DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang
- Reason: Introduces an innovative data-augmentation technique that boosts offline RL performance by addressing the scarcity of optimal trajectories in offline datasets, a significant concern in RL. The authors' affiliations and empirical results suggest high potential influence.
- 9.3 Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem
- Authors: Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś
- Reason: Reframes fine-tuning of RL models as a forgetting-mitigation problem and provides empirical analysis in challenging environments, improving state-of-the-art results on NetHack.
- 9.2 MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters
- Authors: Arsalan Sharifnassab, Saber Salehkaleybar, Richard Sutton
- Reason: Co-authored by Richard Sutton, a pioneering and highly regarded authority in the field of reinforcement learning. The paper presents a significant improvement in the optimization of meta-parameters, which is a fundamental aspect of RL algorithms.
- 9.2 Fast Peer Adaptation with Context-aware Exploration
- Authors: Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang
- Reason: Tackles a fundamental challenge in multi-agent RL with applications in cooperative and competitive scenarios, from a team that includes authors with established expertise in the area.
- 9.1 Language-Guided World Models: A Model-Based Approach to AI Control
- Authors: Alex Zhang, Khanh Nguyen, Jens Tuyls, Albert Lin, Karthik Narasimhan
- Reason: Tackles the significant challenge of enabling artificial agents to interpret and act on human language instructions. Language-Guided World Models are an innovative approach to AI control, and given the current focus on language models and their integration into AI applications, the paper appears poised to have a substantial impact on the field.
- 9.1 Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task
- Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang
- Reason: The paper presents a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that can be pivotal in advancing delayed reinforcement learning, a setting common in real-world applications. The potential reduction in sample complexity is a major contribution that, together with outperforming state-of-the-art baselines, indicates high influence.
- 9.0 ARGS: Alignment as Reward-Guided Search
- Authors: Maxim Khanov, Jirayu Burapacheep, Yixuan Li
- Reason: The paper proposes a novel framework that could streamline the alignment of large language models with human objectives. This is an urgent issue in AI, and the paper's promise of reducing the instability and resource intensity of current alignment methods could be influential, especially if the claims hold up under GPT-4-based evaluation (a rough decoding sketch follows this entry).
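To make the reward-guided-search idea concrete, here is a minimal decoding sketch, assuming a language model that exposes top-k next-token candidates and a separate reward model that scores partial sequences; `lm_top_k`, `reward`, and the greedy selection rule are illustrative assumptions, not the paper's exact algorithm.

```python
from typing import Callable, List, Tuple

def reward_guided_decode(
    lm_top_k: Callable[[List[str]], List[Tuple[str, float]]],  # tokens -> [(token, log_prob), ...]
    reward: Callable[[List[str]], float],                      # partial sequence -> scalar reward
    prompt: List[str],
    max_new_tokens: int = 32,
    reward_weight: float = 1.0,
) -> List[str]:
    """Greedy reward-guided decoding sketch: each candidate token is scored
    by its LM log-probability plus a weighted reward-model score of the
    resulting partial sequence, steering generation toward preferred outputs
    without further fine-tuning."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        best_token, best_score = None, float("-inf")
        for token, log_prob in lm_top_k(tokens):
            score = log_prob + reward_weight * reward(tokens + [token])
            if score > best_score:
                best_token, best_score = token, score
        if best_token is None:  # no candidates returned
            break
        tokens.append(best_token)
    return tokens
```

Sampling from the re-scored candidates instead of taking the argmax would be an equally natural variant of the same idea.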
- 9.0 PoCo: Policy Composition from and for Heterogeneous Robot Learning
- Authors: Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
- Reason: Presents a method for combining heterogeneous datasets in robotics, which is essential for multi-task learning, implying wide applicability and potential influence. The authorship includes highly respected figures in robotics and machine learning.
- 9.0 Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning
- Authors: Yixiang Shan, Zhengbang Zhu, Ting Long, Qifan Liang, Yi Chang, Weinan Zhang, Liang Yin
- Reason: Proposes a novel diffusion-based planning method for RL with experimental evidence of its effectiveness; diffusion models are a trending area of RL research.
- 8.9 The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
- Authors: Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti, Mirco Milletari, Sayli Bapat, Kebei Jiang
- Reason: This paper provides a comprehensive review and introduces a novel taxonomy, which could influence further studies at the intersection of RL and LLMs. The extensive collaboration from various researchers and the multidisciplinary nature of the work contribute to its potential influence.
- 8.9 EuLagNet: Eulerian Fluid Prediction with Lagrangian Dynamics
- Authors: Qilong Ma, Haixu Wu, Lanxiang Xing, Jianmin Wang, Mingsheng Long
- Reason: Introduces a novel approach to fluid-dynamics prediction using a hybrid Eulerian-Lagrangian methodology, which benefits numerous applications and could inform reinforcement learning for control in continuous, dynamic environments.
- 8.9 The Virtues of Pessimism in Inverse Reinforcement Learning
- Authors: David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury
- Reason: Offers a novel connection between offline RL and IRL, which could significantly speed up learning in IRL by using offline RL algorithms, a frontier area of research with impactful contributions from the authors.
- 8.8 Prerequisite Structure Discovery in Intelligent Tutoring Systems
- Authors: Louis Annabi, Sao Mai Nguyen
- Reason: Addresses knowledge structure discovery and knowledge tracing, both critical in education technology. The impact of this work could extend to improving intelligent tutoring systems, which are increasingly relevant in personalized education.
- 8.8 Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence
- Authors: Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu
- Reason: Provides a theoretical understanding of generalization gaps in visual RL, a key factor for RL systems, and offers empirical support for the proposed theories.
- 8.8 A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning
- Authors: Abdelhakim Benechehab, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Balázs Kégl
- Reason: Tackling the critical challenge of compounding one-step prediction errors, this paper proposes a multi-step loss that significantly improves on traditional model-based reinforcement learning. Its empirical validation across various tasks signals strong potential influence (a generic multi-step loss sketch follows this entry).
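The underlying issue, errors that compound when a one-step dynamics model is rolled out on its own predictions, is easy to illustrate. Below is a minimal NumPy sketch of a generic multi-step dynamics loss that rolls the model forward several steps and penalizes the error at every horizon; the fixed horizon and the 1/(h+1) weighting are illustrative assumptions, not the weighting proposed in the paper.

```python
import numpy as np

def multi_step_loss(model, states, actions, horizon=4, weights=None):
    """Generic multi-step dynamics loss sketch.

    model:   callable (state, action) -> predicted next state
    states:  array of shape (T + 1, state_dim), observed states
    actions: array of shape (T, action_dim), observed actions

    The one-step model is rolled forward `horizon` steps from each start
    index, feeding its own predictions back in, so the training signal
    directly penalizes the compounding error that a plain one-step loss
    never sees."""
    T = len(actions)
    if weights is None:
        weights = [1.0 / (h + 1) for h in range(horizon)]  # illustrative decay
    total, count = 0.0, 0
    for t in range(T - horizon + 1):
        pred = states[t]
        for h in range(horizon):
            pred = model(pred, actions[t + h])  # roll forward on own predictions
            total += weights[h] * np.mean((pred - states[t + h + 1]) ** 2)
            count += 1
    return total / max(count, 1)
```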
- 8.7 Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
- Authors: Mark Beliaev, Ramtin Pedarsani
- Reason: This paper presents an innovative framework, IRLEED, to address challenges in Inverse Reinforcement Learning, which is a cornerstone of reliable RL algorithms. The method’s adaptability and effectiveness, demonstrated across diverse settings, indicate strong potential influence.
- 8.7 Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
- Authors: Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng
- Reason: Tackles the significant challenge of integrating diverse human feedback into RL with a comprehensive platform and benchmark suite, and could have a considerable impact on how reinforcement learning systems are developed and evaluated.
- 8.7 Vision-Language Models Provide Promptable Representations for Reinforcement Learning
- Authors: William Chen, Oier Mees, Aviral Kumar, Sergey Levine
- Reason: Leverages vision-language models for embedding world knowledge into RL agents, allowing for prompt-driven improvements in policy training. This innovative approach explores the intersection of vision-language models and RL, authored by researchers including a leader in the field of deep learning and robotics.
- 8.6 Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner Choice
- Authors: Benjamin Clément, Hélène Sauzéon, Didier Roy, Pierre-Yves Oudeyer
- Reason: This study makes significant strides in educational technology by enhancing learning performance and motivation in ITS. As personalized education is a fast-growing domain, this work could influence a wide array of future tutoring systems.
- 8.5 Distributional Off-policy Evaluation with Bellman Residual Minimization
- Authors: Sungee Hong, Zhengling Qi, Raymond K. W. Wong
- Reason: By introducing the EBRM method and providing finite-sample error bounds, this paper contributes significantly to both the theoretical and practical aspects of distributional RL. Its success in simulation studies could influence future research and applications in distributional off-policy evaluation.
- 8.5 Transolver: A Fast Transformer Solver for PDEs on General Geometries
- Authors: Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, Mingsheng Long
- Reason: Authors offer an innovative application of transformers to efficiently solve complex PDEs, and this methodology can potentially be adapted to the decision-making processes in RL for systems governed by PDEs.
- 8.5 Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty
- Authors: Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir
- Reason: Introduces a new algorithm that balances the exploration-exploitation trade-off through critic uncertainty estimation, a crucial element in RL (an illustrative uncertainty-bonus sketch follows this entry).
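The paper derives its uncertainty from PAC-Bayes bounds; as a loose stand-in that only conveys the general idea of uncertainty-driven exploration, the sketch below uses disagreement across an ensemble of critics as an optimism bonus. The ensemble-standard-deviation bonus and `bonus_coef` are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def optimistic_q_value(critics, state, action, bonus_coef=0.5):
    """Uncertainty-aware action value sketch: the actor sees the mean critic
    estimate plus a bonus proportional to critic disagreement, so it is drawn
    toward state-action pairs the critics are unsure about; the incentive
    shrinks as the critics come to agree."""
    q_values = np.array([q(state, action) for q in critics])
    return q_values.mean() + bonus_coef * q_values.std()
```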
- 8.5 Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems
- Authors: Tianzhang Cai, Qichen Wang, Shuai Zhang, Özlem Tuğfe Demir, Cicek Cavdar
- Reason: This paper has potential for substantial influence by addressing current energy-efficiency challenges in massive MIMO systems using MARL. The proposed MAPPO-neighbor policy's edge over baseline policies during both low- and high-traffic hours could make it influential in the telecommunications industry.
- 8.4 3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
- Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, Meng Cao, Xiangen Hu
- Reason: The novelty of combining tensor factorization with Generative Adversarial Networks (GAN) and Generative Pre-trained Transformers (GPT) for dealing with sparse data in intelligent tutoring systems presents a unique approach which could be groundbreaking for educational data analysis.
- 8.3 Hybrid-Prediction Integrated Planning for Autonomous Driving
- Authors: Haochen Liu, Zhiyu Huang, Wenhui Huang, Haohan Yang, Xiaoyu Mo, Chen Lv
- Reason: This paper integrates prediction and planning modules for autonomous systems, a central theme in RL, especially for real-world applications such as autonomous driving.
- 8.2 Preference Poisoning Attacks on Reward Model Learning
- Authors: Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
- Reason: The exploration of vulnerability in preference learning via reward models introduces important considerations for the security and robustness of RL systems. Given the prevalence of such models in high-impact systems, this work could significantly influence future designs and defenses in RL.
- 8.2 Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
- Authors: Abdelhakim Benechehab, Albert Thomas, Balázs Kégl
- Reason: Challenges the common approach in offline RL and demonstrates better performance with a single well-calibrated model, which could influence model-based algorithm development.
- 8.2 A Framework for Partially Observed Reward-States in RLHF
- Authors: Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari
- Reason: This paper could play an important role in the evolving RLHF domain for its unique approach to modeling RL with partially observed reward-states (PORRL), which could lead to improvements in sample complexity and model alignment. The novel models and reductions it introduces for handling human feedback make it notably influential.
- 8.0 Reducing Optimism Bias in Incomplete Cooperative Games
- Authors: Filip Úradník, David Sychrovský, Jakub Černý, Martin Černý
- Reason: This framework targets the challenge of incomplete information in cooperative game theory, which has broad applications in AI. Its approach to optimizing revealing sequences in cooperative games underscores its potential to shape future research in distributed decision-making and machine learning.
- 7.9 Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS
- Authors: Matthew DeLorenzo, Animesh Basak Chowdhury, Vasudev Gohil, Shailja Thakur, Ramesh Karri, Siddharth Garg, Jeyavijayan Rajendran
- Reason: By integrating Monte Carlo tree search with transformer decoding for code generation, this paper could significantly influence RTL code-generation methodologies built on LLMs. The empirical improvements in compilation and PPA efficiency suggest the method could be widely adopted for high-quality code generation (a simplified search sketch follows this entry).
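As a rough illustration of coupling tree search with LM decoding (not the paper's exact algorithm), the sketch below treats partial token sequences as nodes, expands them with top-k LM candidates, lets the LM finish each partial program, scores the result with a black-box evaluator (e.g., compile success plus PPA metrics), and backs the score up the tree; `lm_top_k`, `complete`, and `evaluate` are assumed interfaces.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

def mcts_decode(
    lm_top_k: Callable[[List[str]], List[Tuple[str, float]]],  # tokens -> [(token, prior), ...]
    complete: Callable[[List[str]], List[str]],                 # finish a partial program with the LM
    evaluate: Callable[[List[str]], float],                     # e.g. compiles + PPA score in [0, 1]
    prompt: List[str],
    iterations: int = 200,
    c_uct: float = 1.0,
) -> List[str]:
    """Toy MCTS over token sequences: select with a UCT rule, expand with
    top-k LM candidates, roll out by letting the LM finish the program,
    evaluate the result, and back the value up the visited path."""
    children: Dict[Tuple[str, ...], List[Tuple[str, ...]]] = {}
    visits: Dict[Tuple[str, ...], int] = {}
    value: Dict[Tuple[str, ...], float] = {}
    root = tuple(prompt)

    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the current node is already expanded.
        while node in children and children[node]:
            node = max(
                children[node],
                key=lambda ch: value.get(ch, 0.0) / max(visits.get(ch, 0), 1)
                + c_uct * math.sqrt(math.log(visits.get(node, 1) + 1) / (visits.get(ch, 0) + 1)),
            )
            path.append(node)
        # Expansion: add top-k LM continuations as children of the leaf.
        children[node] = [node + (tok,) for tok, _ in lm_top_k(list(node))]
        if children[node]:
            node = random.choice(children[node])
            path.append(node)
        # Rollout: complete the partial program and evaluate it.
        score = evaluate(complete(list(node)))
        # Backpropagation along the visited path.
        for n in path:
            visits[n] = visits.get(n, 0) + 1
            value[n] = value.get(n, 0.0) + score

    best = max(children.get(root, [root]), key=lambda ch: visits.get(ch, 0))
    return complete(list(best))
```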