- 9.7 Making RL with Preference-based Feedback Efficient via Randomization
- Authors: Runzhe Wu, Wen Sun
- The paper introduces RL algorithms that use randomization to achieve statistical and computational efficiency while keeping the amount of human feedback low. The algorithms exhibit a notable tradeoff between regret and query complexity.
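The specific algorithms aren’t detailed here; as a rough, hypothetical illustration of how randomization can cut query complexity, the sketch below uses an ensemble of reward models as a crude posterior, acts under one sampled model, and asks for a human preference only when the sampled models disagree. All names and sizes are assumptions, not the paper’s method.

```python
import numpy as np

# Hypothetical sketch in the spirit of Thompson sampling for
# preference-based RL: an ensemble of reward models stands in for a
# posterior; queries are issued only when the ensemble disagrees.

rng = np.random.default_rng(0)
n_models, n_actions = 5, 4
reward_models = rng.normal(size=(n_models, n_actions))  # stand-in "posterior"

def act():
    """Sample one reward model and act greedily under it."""
    sampled = reward_models[rng.integers(n_models)]
    return int(np.argmax(sampled))

def should_query(traj_a, traj_b):
    """Query the human only when the models disagree on the ranking,
    which is what keeps query complexity low."""
    returns_a = reward_models[:, traj_a].sum(axis=1)
    returns_b = reward_models[:, traj_b].sum(axis=1)
    prefer_a = returns_a > returns_b
    return 0 < prefer_a.sum() < n_models  # disagreement => informative query
```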
- 9.5 A Better Match for Drivers and Riders: Reinforcement Learning at Lyft
- Authors: Xabi Azagirre, Akshay Balwally, Guillaume Candeli, Nicholas Chamandy, Benjamin Han, Alona King, Hyungjun Lee, Martin Loncaric, Sébastien Martin, Vijay Narasiman, Zhiwei, Baptiste Richard, Sara Smoot, Sean Taylor, Garrett van Ryzin, Di Wu, Fei Yu, Alex Zamoshchin
- This paper’s influence is evidenced by its practical impact: the deployed system enabled Lyft’s drivers to serve millions of additional riders each year, generating more than $30 million per year in additional revenue. As a real-world application of reinforcement learning at scale, it is highly relevant to the machine learning research community.
- 9.5 Policy Gradient with Kernel Quadrature
- Authors: Satoshi Hayakawa, Tetsuro Morimura
- The method mitigates the reward-evaluation bottleneck by selecting a small, representative subset from a large number of episodes. The paper offers both theoretical and numerical evidence, indicating its significant relevance to the field of RL.
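The paper’s kernel quadrature construction is beyond this summary; the sketch below shows the related, generic kernel-herding idea for picking a representative episode subset. The feature embeddings and the RBF kernel are placeholder assumptions, not the paper’s choices.

```python
import numpy as np

# Hypothetical sketch of subset selection by kernel herding: greedily pick
# episodes whose embeddings best match the mean embedding of the full
# batch, so rewards need only be evaluated on the chosen subset.

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def herd_episodes(features, m):
    """features: (n_episodes, d) episode embeddings; returns m indices."""
    K = rbf_kernel(features, features)
    mean_embedding = K.mean(axis=1)          # similarity to the batch mean
    selected = []
    for _ in range(m):
        if selected:
            penalty = K[:, selected].sum(axis=1) / (len(selected) + 1)
        else:
            penalty = 0.0
        scores = mean_embedding - penalty
        scores[selected] = -np.inf           # don't reselect
        selected.append(int(np.argmax(scores)))
    return selected
```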
- 9.4 Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning
- Authors: Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn
- The paper proposes a novel framework for reinforcement learning that leverages pre-trained models to learn new tasks with minimal human intervention. The work demonstrates superior performance in both real robot manipulation tasks and various simulations, making it potentially influential in the field of robotic reinforcement learning.
- 9.3 Orthogonal Subspace Learning for Language Model Continual Learning
- Authors: Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, Xuanjing Huang
- Reason: Introduces a novel approach for continual learning in language models that effectively mitigates catastrophic forgetting.
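The approach constrains each new task’s adaptation to a subspace orthogonal to those of earlier tasks. Assuming a LoRA-style low-rank parameterization (an assumption here, not a detail taken from the summary), a minimal orthogonality penalty could look like the sketch below; all tensor names are hypothetical.

```python
import torch

# Hypothetical sketch of orthogonal-subspace continual learning: keep the
# low-rank adapters of past tasks frozen and penalize overlap between the
# new task's adapter subspace and each old one.

def orthogonality_penalty(A_new, A_olds):
    """A_new: (r, d) low-rank matrix being trained for the current task.
    A_olds: list of frozen (r, d) matrices from earlier tasks.
    Penalizes non-zero inner products between the row spaces."""
    loss = torch.zeros(())
    for A_old in A_olds:
        loss = loss + (A_new @ A_old.T).pow(2).sum()
    return loss

# During training: total = task_loss + lam * orthogonality_penalty(A, past_As)
```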
- 9.3 Diverse Priors for Deep Reinforcement Learning
- Authors: Chenfan Weng, Zhongguo Li
- The authors offer an innovative ensemble-based approach to the exploration-exploitation dilemma in RL, built on carefully designed prior networks. The method shows high efficacy and improved sample efficiency.
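The summary doesn’t pin down the prior design; the sketch below shows the well-known randomized-prior-functions pattern that such ensemble methods typically build on. Sizes and the scale factor are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch in the spirit of randomized prior functions: each
# ensemble member is a trainable network plus a frozen, randomly
# initialized prior; disagreement across members drives exploration.

class PriorQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, beta=3.0):
        super().__init__()
        self.trainable = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.prior = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        for p in self.prior.parameters():
            p.requires_grad_(False)          # the prior stays fixed
        self.beta = beta

    def forward(self, obs):
        return self.trainable(obs) + self.beta * self.prior(obs)

ensemble = [PriorQNet(8, 4) for _ in range(10)]
```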
- 9.2 Learning to (Learn at Test Time)
- Authors: Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
- The paper reshapes supervised learning into two nested learning problems and demonstrates practical results when applied to transformers on ImageNet. Its importance lies in this fresh take on traditional supervised learning.
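Concretely, the nested setup adapts the model to each test input with an inner self-supervised loop before predicting. A generic sketch follows; the self-supervised loss is left abstract and its signature is an assumption.

```python
import copy
import torch

# Hypothetical sketch of the nested setup: an outer model is trained as
# usual, while at test time an inner loop adapts a copy of it to each
# input via a self-supervised loss before making the prediction.

def predict_with_inner_loop(model, x, self_sup_loss, steps=1, lr=1e-3):
    adapted = copy.deepcopy(model)           # don't disturb the outer model
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        self_sup_loss(adapted, x).backward() # e.g. reconstruction of x
        opt.step()
    with torch.no_grad():
        return adapted(x)
```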
- 9.2 The primacy bias in Model-based RL
- Authors: Zhongjian Qiao, Jiafei Lyu, Xiu Li
- The researchers delve into the challenge of primacy bias in deep reinforcement learning (DRL) with a focus on model-based reinforcement learning (MBRL). They propose a method that significantly counteracts this bias and improves the performance of DRL algorithms, making it an impactful read in the reinforcement learning domain.
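How the paper’s MBRL-specific fix works is not reproduced here; the standard remedy for primacy bias, shown below as a hypothetical sketch, is to periodically re-initialize parts of the network while keeping the replay buffer, so early overfitting doesn’t dominate later learning.

```python
import torch.nn as nn

# Hypothetical sketch of the standard anti-primacy-bias trick: periodically
# re-initialize the agent's (or world model's) final layers while keeping
# the replay buffer intact.

def maybe_reset(agent_head: nn.Module, step: int, every: int = 100_000):
    if step > 0 and step % every == 0:
        for m in agent_head.modules():
            if isinstance(m, nn.Linear):
                m.reset_parameters()
```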
- 9.1 Interpretable Deep Reinforcement Learning for Optimizing Heterogeneous Energy Storage Systems
- Authors: Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, Feng Qian
- The paper presents a novel interpretable RL system for optimizing energy storage. It reports performance competitive with black-box models, demonstrating the practical applicability and efficiency of interpretable RL models.
- 9.0 One is More: Diverse Perspectives within a Single Network for Efficient DRL
- Authors: Yiqin Tan, Ling Pan, Longbo Huang
- The authors introduce a new network paradigm for improving the performance of deep reinforcement learning algorithms, providing a balance between computational efficiency and effectiveness. The concept of having diverse subnetworks within a single network is an innovative approach.
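The paper’s exact construction isn’t given here; one plausible reading of “diverse subnetworks within a single network” is fixed random masks over shared weights, as in the hypothetical sketch below. All sizes and the masking scheme are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: several fixed random binary masks carve diverse
# subnetworks out of a single shared layer, giving ensemble-like value
# estimates at roughly the cost of one network.

class MaskedHeads(nn.Module):
    def __init__(self, in_dim, out_dim, n_heads=4, keep=0.8):
        super().__init__()
        self.layer = nn.Linear(in_dim, out_dim)
        masks = (torch.rand(n_heads, out_dim, in_dim) < keep).float()
        self.register_buffer("masks", masks)

    def forward(self, x, head):
        w = self.layer.weight * self.masks[head]  # head-specific subnetwork
        return nn.functional.linear(x, w, self.layer.bias)
```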
- 9.0 Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation
- Authors: Nathan Phelps, Stephanie Marrocco, Stephanie Cornell, Dalton L. Wolfe, Daniel J. Lizotte
- The paper discusses the application of reinforcement learning to medical rehabilitation, an underrepresented and potentially groundbreaking use of RL. The study’s findings provide substantial evidence of the effectiveness of reinforcement learning in the treatment of spinal cord injuries.
- 8.9 Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
- Authors: Haoyang Liu, Yufei Kuang, Jie Wang, Xijun Li, Yongdong Zhang, Feng Wu
- Reason: Proposes a new approach to promote data diversity for learning-based branch-and-bound (B&B) solvers, leading to efficiency improvements.
- 8.9 Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces
- Authors: Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher
- This paper proposes a deep residual-policy safe RL controller tailored to complex, continuous state-action spaces. The system’s real-world validation underscores its potential applicability and performance.
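Residual policy methods generally add a bounded learned correction to a conventional base controller; whether the paper’s specialized variant matches the hypothetical sketch below is an assumption on my part.

```python
import numpy as np

# Hypothetical sketch of a residual policy: a safe, conventional base
# controller provides the nominal action, and the RL policy only learns a
# bounded correction on top of it.

def residual_action(base_controller, policy, state, max_residual=0.1):
    u_base = base_controller(state)
    delta = np.clip(policy(state), -max_residual, max_residual)
    return u_base + delta
```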
- 8.9 Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
- Authors: Maximilian Krahn, Michelle Sasdelli, Fengyi Yang, Vladislav Golyanik, Juho Kannala, Tat-Jun Chin, Tolga Birdal
- The authors introduce a novel stochastic optimiser for training binary neural networks and show how it can leverage quantum annealing. The optimiser’s reported advantages could have a significant impact on how such networks are trained in the future.
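The classical recipe for training binary networks keeps latent real-valued weights and projects them onto {-1, +1}; the sketch below shows only that baseline pattern, with the paper’s quantum-annealed binary-gradient step abstracted away.

```python
import numpy as np

# Hypothetical sketch of projected SGD for binary weights: latent real
# weights absorb the gradient step, and the usable weights are their
# projection onto {-1, +1}. The quantum-annealing component of the paper
# is not modeled here.

def projected_sgd_step(latent_w, grad, lr=0.01):
    latent_w = np.clip(latent_w - lr * grad, -1.0, 1.0)
    binary_w = np.sign(latent_w)
    binary_w[binary_w == 0] = 1.0            # break ties deterministically
    return latent_w, binary_w
```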
- 8.8 Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models
- Authors: Iman Nematollahi, Kirill Yankov, Wolfram Burgard, Tim Welschehold
- The study proposes a hybrid model that integrates imitation and reinforcement learning for robotic skill generalization. The results show that the approach significantly improves a robot’s ability to adapt to new environments and could therefore heavily influence the way robots are programmed to learn skills.
- 8.7 Continual Invariant Risk Minimization
- Authors: Francesco Alesiani, Shujian Yu, Mathias Niepert
- This paper tackles a significant issue in supervised learning, learning invariant feature representations, and proposes a continual learning approach called Continual Invariant Risk Minimization (CIRM). The approach has shown promising results across several environments and datasets.
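For context, the standard IRMv1 surrogate penalty (Arjovsky et al.) that this line of work builds on is sketched below; how CIRM adapts it to the continual setting is not reproduced here.

```python
import torch

# Sketch of the standard IRMv1 penalty: the gradient of each environment's
# risk w.r.t. a dummy scalar classifier should vanish if the learned
# representation is invariant across environments.

def irm_penalty(logits, labels):
    scale = torch.tensor(1.0, requires_grad=True)
    loss = torch.nn.functional.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()
```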
- 8.5 Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior
- Authors: Nathan P. Lawrence, Philip D. Loewen, Shuyuan Wang, Michael G. Forbes, R. Bhushan Gopaluni
- The authors propose a novel framework that combines deep reinforcement learning with the stability guarantees of the Youla-Kucera parameterization. This is a worthwhile contribution to the reinforcement learning field, since stability is a critical requirement in many applications, and the paper makes a strong theoretical contribution in this area.
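One way to see why the parameterization helps: if the Youla block Q is itself stable, the closed loop remains stable for any choice of Q, so RL can tune Q freely. The sketch below uses an FIR filter for Q, which is stable for any coefficients; pairing this with the paper’s full construction is an assumption, not a detail taken from it.

```python
import numpy as np

# Hypothetical sketch: parameterize the Youla block Q as an FIR filter,
# which is stable for any coefficient values, so an RL policy can tune the
# coefficients without risking closed-loop instability (for a stable plant
# under the standard Youla construction; details omitted).

class FIRYoulaBlock:
    def __init__(self, coeffs):
        self.coeffs = np.asarray(coeffs, dtype=float)  # RL-tunable
        self.buf = np.zeros_like(self.coeffs)

    def step(self, e):
        self.buf = np.roll(self.buf, 1)
        self.buf[0] = e
        return float(self.coeffs @ self.buf)
```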
- 8.2 Iteratively Learn Diverse Strategies with State Distance Information
- Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
- Reason: Combines iterative learning with state-distance-based diversity measures to discover diverse, human-interpretable strategies in complex reinforcement learning problems.
- 7.3 α-Fair Contextual Bandits
- Authors: Siddhant Chaudhary, Abhishek Sinha
- Reason: Discusses the important problem of maximizing the global α-fair utility function in contextual bandit algorithms, ensuring fairness and compliance with regulatory requirements.
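The α-fair utility family referenced in the title is standard: α = 0 recovers plain reward maximization, α = 1 gives proportional fairness (log utility), and large α approaches max-min fairness. A small sketch of the utility computation follows; the bandit algorithm itself is not sketched.

```python
import numpy as np

# The standard alpha-fair utility over per-agent allocations x > 0.

def alpha_fair_utility(x, alpha):
    x = np.asarray(x, dtype=float)
    if alpha == 1.0:
        return np.log(x).sum()
    return ((x ** (1.0 - alpha)) / (1.0 - alpha)).sum()
```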
- 7.1 Towards Zero Shot Learning in Restless Multi-armed Bandits
- Authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
- Reason: Develops a neural network-based model that improves efficiency and adaptability in restless multi-armed bandit problems while achieving near zero-shot generalization to a wide range of previously unseen problems.