- 9.5 Semi-Offline Reinforcement Learning for Optimized Text Generation
- Authors: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
- Reason: The paper presents a novel reinforcement learning paradigm and its implementation, which provides a foundation for comparing different RL settings and demonstrates strong performance. The authors are widely recognized experts in the field, and the paper was published in the proceedings of a top-tier conference.
- 9.3 Simplified Temporal Consistency Reinforcement Learning
- Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
- Reason: This paper presents a simple and efficient approach to reinforcement learning that relies only on a latent dynamics model trained with a temporal-consistency objective, tackling the issue of sample efficiency in current RL methods. It exhibits strong performance on a range of high-dimensional locomotion tasks (a generic sketch of such a latent temporal-consistency objective appears after this list).
- 9.3 Automatic Trade-off Adaptation in Offline RL
- Authors: Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Reason: The paper proposes a new method for improving offline RL, currently a trending area in machine learning. Notably, the work was featured as an oral presentation at a reputable conference.
- 9.2 Attention-based Open RAN Slice Management using Deep Reinforcement Learning
- Authors: Fatemeh Lotfi, Fatemeh Afghah, Jonathan Ashdown
- Reason: This research uses deep reinforcement learning to improve the management of network slices in emerging architectures such as O-RAN and 5G. Despite the complexity of the task, the proposed attention-based method shows promising results compared with other DRL baselines.
- 9.2 Temporal Difference Learning with Experience Replay
- Authors: Han-Dong Lim, Donghwan Lee
- Reason: The paper investigates the theoretical effects of experience replay on temporal difference learning, providing new insight into a core mechanism of modern RL (a minimal illustration of TD learning with a replay buffer appears after this list).
- 9.1 Residual Q-Learning: Offline and Online Policy Customization without Value
- Authors: Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
- Reason: The paper tackles the challenge of customizing imitative policies to meet diverse downstream task requirements. It introduces Residual Q-learning, a novel approach that does not require knowledge of the inherent reward of the prior policy and proves effective on policy customization tasks across various environments.
- 9.1 Creating Multi-Level Skill Hierarchies in Reinforcement Learning
- Authors: Joshua B. Evans, Özgür Şimşek
- Reason: The paper investigates the creation of skill hierarchies in reinforcement learning, an important aspect of building more sophisticated RL systems.
- 9.0 Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
- Authors: Andrei Kucharavy, Rachid Guerraoui, Ljiljana Dolamic
- Reason: The paper provides a thorough investigation of the relationship between evolutionary algorithms and SGD, making an important contribution to understanding the relative strengths and weaknesses of these algorithms.
- 8.9 QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control
- Authors: Zhehui Huang, Sumeet Batra, Tao Chen, Rahul Krupani, Tushar Kumar, Artem Molchanov, Aleksei Petrenko, James A. Preiss, Zhaojing Yang, Gaurav S. Sukhatme
- Reason: The authors present QuadSwarm, an efficient and reliable simulator for single- and multi-robot RL research with quadrotors. It achieves high simulation throughput and models the physical interactions with the environment that are essential for transferring policies learned in simulation to reality.
- 8.8 Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
- Authors: Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang
- Reason: This paper presents a sample-efficient policy optimization algorithm that works with general non-linear function approximation and reports strong experimental results. The method improves on prevailing policy optimization algorithms and shows promise for practical applications.
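For the Simplified Temporal Consistency Reinforcement Learning entry above, the following is a minimal, generic sketch of a latent temporal-consistency objective: an encoder and a latent dynamics model are trained so that the predicted next latent matches the target-encoded next observation. The network sizes, the cosine loss, and the EMA target update are illustrative assumptions, not the authors' exact architecture.

```python
# Generic latent temporal-consistency objective (illustrative, not the paper's exact model).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamics(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ELU(),
                                     nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 128), nn.ELU(),
                                      nn.Linear(128, latent_dim))
        # Slow-moving target encoder (EMA copy) provides stable prediction targets.
        self.target_encoder = copy.deepcopy(self.encoder)
        for p in self.target_encoder.parameters():
            p.requires_grad_(False)

    def consistency_loss(self, obs, act, next_obs):
        z = self.encoder(obs)
        z_pred = self.dynamics(torch.cat([z, act], dim=-1))
        with torch.no_grad():
            z_target = self.target_encoder(next_obs)
        # Cosine-style consistency loss between predicted and target next latents.
        return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()

    @torch.no_grad()
    def update_target(self, tau=0.01):
        # Exponential moving average update of the target encoder.
        for p, tp in zip(self.encoder.parameters(), self.target_encoder.parameters()):
            tp.mul_(1 - tau).add_(tau * p)
```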
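For the Temporal Difference Learning with Experience Replay entry, here is a minimal illustration of the mechanism the paper analyzes: tabular TD(0) policy evaluation where updates are drawn from a replay buffer rather than only the latest transition. The Gymnasium-style environment API, uniform sampling, and hyperparameters are assumptions made for the sake of a runnable example, not the paper's theoretical setup.

```python
# Tabular TD(0) policy evaluation with a uniform experience replay buffer
# (illustrative sketch; assumes a Gymnasium-style discrete environment and a
# fixed behavior policy `policy(state) -> action`).
import random
from collections import deque

import numpy as np

def td0_with_replay(env, policy, num_steps=10_000, buffer_size=5_000,
                    batch_size=32, alpha=0.05, gamma=0.99, seed=0):
    rng = random.Random(seed)
    V = np.zeros(env.observation_space.n)      # tabular value estimates
    buffer = deque(maxlen=buffer_size)         # (s, r, s', done) transitions
    state, _ = env.reset(seed=seed)
    for _ in range(num_steps):
        action = policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        buffer.append((state, reward, next_state, terminated))
        # Replayed TD(0) update: sample past transitions uniformly at random
        # instead of updating only on the most recent transition.
        for s, r, s_next, done in rng.sample(list(buffer), min(batch_size, len(buffer))):
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
        state = env.reset()[0] if (terminated or truncated) else next_state
    return V
```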