- 9.6 All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization
- Authors: Pablo Barros, Alessandra Sciutti
- Reason: The paper proposes a model that learns to map and counteract specific opponent strategies in competitive game scenarios, and shows strong performance when evaluated on two different games. This approach could significantly advance reinforcement learning in competitive environments.
- 9.5 Motif: Intrinsic Motivation from Artificial Intelligence Feedback
- Authors: Martin Klissarov, Pierluca D’Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
- Reason: This paper introduces a novel method for interfacing prior knowledge from a Large Language Model (LLM) with an agent. Notably, the agent achieves a higher game score by learning to maximize only its intrinsic reward, significantly outperforming existing approaches; a rough sketch of the preference-to-reward step follows below.
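As we understand the approach, an LLM annotates preferences between pairs of observation captions, a reward model is fit to those preferences, and its output serves as the agent's intrinsic reward. The sketch below covers only the preference-fitting step, assuming a Bradley-Terry style loss, placeholder features, and hypothetical names (`RewardModel`, `preference_loss`); the paper's actual architecture and pipeline may differ.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    # Hypothetical reward model over fixed-size observation features.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def preference_loss(rm, feat_a, feat_b, llm_prefers_a):
    # Bradley-Terry / logistic loss: push r(a) above r(b) when the LLM prefers a.
    logits = rm(feat_a) - rm(feat_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, llm_prefers_a.float())

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
feat_a, feat_b = torch.randn(32, 128), torch.randn(32, 128)  # placeholder observation features
llm_prefers_a = torch.randint(0, 2, (32,))                   # placeholder LLM preference labels
loss = preference_loss(rm, feat_a, feat_b, llm_prefers_a)
opt.zero_grad(); loss.backward(); opt.step()
# The trained model's scalar output then serves as the intrinsic reward during RL.
```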
- 9.5 Towards Causal Foundation Model: on Duality between Causal Inference and Attention
- Authors: Jiaqi Zhang, Joel Jennings, Cheng Zhang, Chao Ma
- Reason: Proposed a novel, theoretically sound method for building causally-aware foundation models from unlabeled datasets, enabling zero-shot causal inference on unseen tasks.
- 9.3 Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
- Authors: Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
- Reason: The work introduces a novel trajectory-wise policy gradient algorithm that operates directly on comparative rewards. This has major implications for better aligning LLMs with human preferences through relative feedback; a generic sketch of such a pairwise gradient is given below.
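A generic way to see how a policy gradient can operate on relative rather than absolute feedback (a REINFORCE-style sketch, not necessarily the paper's exact estimator, which additionally uses proximal/clipped updates): sample two responses $y_1, y_2$ to the same prompt $x$ from the current policy $\pi_\theta$ and weight the score-function terms by their reward difference,

$$\nabla_\theta J(\theta) \;\approx\; \tfrac{1}{2}\,\big(r(x, y_1) - r(x, y_2)\big)\,\big(\nabla_\theta \log \pi_\theta(y_1 \mid x) - \nabla_\theta \log \pi_\theta(y_2 \mid x)\big).$$

Only the relative ordering and margin of the two samples matters, so the update is invariant to constant shifts of the reward.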
- 9.3 Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents
- Authors: Bart J. Verhoef, Xixi Lu
- Reason: The paper applies the reinforcement learning algorithms Q-learning and SARSA (contrasted in the sketch below) in a practical ‘care process’ context. The study shows that the policies derived from these algorithms resemble the actions currently taken in practice, suggesting their potential application in healthcare and social care.
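For readers less familiar with the two algorithms, the key difference is the bootstrapping target: Q-learning (off-policy) uses the greedy next action, while SARSA (on-policy) uses the next action the behaviour policy actually takes. A minimal tabular sketch with a placeholder environment (not the paper's care-process setup):

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def eps_greedy(s):
    # Behaviour policy shared by both algorithms.
    return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

def q_learning_update(s, a, r, s_next):
    # Off-policy target: bootstrap from the greedy (max) action in the next state.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy target: bootstrap from the action the behaviour policy actually takes.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Toy interaction loop with placeholder random dynamics and reward.
s = 0
for _ in range(1000):
    a = eps_greedy(s)
    s_next = int(rng.integers(n_states))
    r = float(s_next == n_states - 1)
    q_learning_update(s, a, r, s_next)   # or: sarsa_update(s, a, r, s_next, eps_greedy(s_next))
    s = s_next
```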
- 9.2 Reinforcement Learning for Node Selection in Branch-and-Bound
- Authors: Alexander Mattick, Christopher Mutschler
- Reason: This work introduces a novel method that uses reinforcement learning for the sophisticated task of node selection in branch-and-bound methods. This approach could significantly improve the performance and efficiency of these algorithms in various applications.
- 9.1 Sparse Backpropagation for MoE Training
- Authors: Liyuan Liu, Jianfeng Gao, Weizhu Chen
- Reason: Introduced SparseMixer, a scalable gradient estimator for training Mixture-of-Experts (MoE) models, capable of accelerating training convergence by up to 2x; the routing illustration below shows the gradient gap such an estimator targets.
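For context, the difficulty here is that sparse expert routing makes a discrete choice: common practice scales the chosen expert's output by its gate probability so some gradient reaches the router, but the gradient of the routing decision itself is dropped, which is the gap a principled estimator like SparseMixer is meant to close. Below is a minimal illustration of that standard top-1 routing, not of SparseMixer's estimator:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    # Minimal top-1 mixture-of-experts layer, for illustration only.
    def __init__(self, dim=16, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):
        probs = torch.softmax(self.gate(x), dim=-1)   # router distribution over experts
        idx = probs.argmax(dim=-1)                     # discrete expert choice (non-differentiable)
        out = torch.stack([self.experts[i](x[b]) for b, i in enumerate(idx.tolist())])
        # Scaling by the selected gate probability lets some gradient flow to the router,
        # but the gradient through the argmax itself is silently discarded.
        return probs.gather(-1, idx.unsqueeze(-1)) * out

layer = Top1MoE()
y = layer(torch.randn(8, 16))  # (batch, dim) output
```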
- 9.0 Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
- Authors: Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontañón
- Reason: This paper presents a new and efficient platform for distributed reinforcement learning that ensures high reproducibility.
- 9.0 GenSim: Generating Robotic Simulation Tasks via Large Language Models
- Authors: Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang
- Reason: The paper proposes GenSim, an innovative approach to generating robotic simulation environments and expert demonstrations using large language models. The performance improvements demonstrated during multitask policy training, in both virtual and real-world environments, show the method's potential for future applications.
- 8.9 Bayesian Design Principles for Frequentist Sequential Learning
- Authors: Yunbei Xu, Assaf Zeevi
- Reason: The paper presents novel design principles for achieving optimal frequentist regret in sequential learning problems, with significance for areas such as multi-armed bandits and reinforcement learning.
- 8.7 Combining Spatial and Temporal Abstraction in Planning for Better Generalization
- Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
- Reason: The work introduces a model-based reinforcement learning agent that uses spatial and temporal abstractions to generalize learned skills to novel situations. Although the work is more theoretical, the results show its potential to influence the development of reinforcement learning agents.
- 8.7 Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
- Authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
- Reason: The paper presents Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), a new oracle-efficient algorithm for offline RL with nonlinear function approximation. The provided regret bound shows promise for applying this algorithm to offline RL; the pessimism principle it instantiates is sketched below.
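For background, the 'pessimism' in this family of methods (stated generically here, not necessarily in the paper's exact form) subtracts an uncertainty bonus from the fitted value so that state-action pairs poorly covered by the offline dataset are not over-valued:

$$\hat{Q}(s, a) \;=\; \hat{f}(s, a) \;-\; \beta\, b(s, a),$$

where $\hat{f}$ is the (nonlinear) least-squares fit of the Bellman target, $b(s,a)$ measures the fit's uncertainty at $(s,a)$, and $\beta$ controls the degree of pessimism.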
- 8.5 From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information
- Authors: Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu
- Reason: Proposes two methods to overcome shortcomings of reinforcement learning with contextual information, which is particularly valuable in areas such as financial trading markets.
- 8.5 H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
- Authors: Yanjie Ze, Yuyao Liu, Ruizhe Shi, Jiaxin Qin, Zhecheng Yuan, Jiashun Wang, Huazhe Xu
- Reason: The study introduces H-InDex, a new visual representation learning framework for dexterous manipulation tasks in reinforcement learning. The method uses human hand-informed visual representations to improve robot performance on complex tasks, and the results show substantial improvements over baseline methods.
- 8.3 Pre-training with Synthetic Data Helps Offline Reinforcement Learning
- Authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith Ross
- Reason: Showed that simple synthetic data can aid in the pre-training of Conservative Q-Learning, a popular offline DRL algorithm, leading to consistent improvements.