9.8 Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
- Authors: Licong Lin, Yu Bai, Song Mei
- Reason: The paper introduces a theoretical framework for understanding the capabilities of transformers in in-context reinforcement learning (ICRL). The work provides insight into the application of transformers to RL algorithms and discusses how the distribution mismatch in offline training data affects the learned algorithms.
9.7 Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey
- Authors: Gregory Palmer, Chris Parry, Daniel J.B. Harrold, Chris Willis
- Reason: This comprehensive survey of deep reinforcement learning applied to autonomous cyber operations provides a novel perspective and identifies key challenges and potential solutions. The authors’ expertise in machine learning and cyber security enhances its potential influence on future research directions.
9.7 Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
- Authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
- Reason: The paper introduces a novel framework that decouples the policies used for data collection and for evaluation in RL. The proposed approach could have a significant impact given its potential for improving the efficiency of online RL and offline RL methods.
9.6 Discerning Temporal Difference Learning
- Authors: Jianfei Ma
- Reason: The author is a well-known authority in RL. The novel TD algorithm, which can adapt to different emphasis functions, offers a significant advancement in TD learning and its implementation in deep RL contexts.
9.6 Cross-Episodic Curriculum for Transformer Agents
- Authors: Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi “Jim” Fan, Yuke Zhu
- Reason: This paper presents a novel algorithm, CEC, which specifically targets learning efficiency and generalization in Transformer agents. The work showcases successful applications in multi-task RL with discrete control and mixed-quality data for continuous control.
9.5 Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
- Authors: Hao Sun, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
- Reason: The authors address the crucial issue of decision accountability in offline reinforcement learning, proposing a novel method named Accountable Offline Controller. Its applicability in healthcare and other critical settings positions this work for potentially significant influence.
9.5 MeanAP-Guided Reinforced Active Learning for Object Detection
- Authors: Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo
- Reason: The paper presents a novel approach to active learning with reinforcement learning-based sampling for object detection. The approach, named MAGRAL, has shown success across popular benchmarks, signifying potential impact in the field of object detection.
9.4 DistillSpec: Improving Speculative Decoding via Knowledge Distillation
- Authors: Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
- Reason: The authors present a new method that uses knowledge distillation to better align the draft model with the target model in speculative decoding, enabling faster large language model inference and yielding substantial improvements across various benchmarks.
9.3 Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
- Authors: Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
- Reason: This paper offers a novel approach to non-stationary contextual bandit learning, integrating scalable deep learning methods with a strategic exploration mechanism. Applied to real-world recommendation datasets, the method promises significant improvements over existing techniques.
9.1 Online RL in Linearly $q^π$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore
- Authors: Gellért Weisz, András György, Csaba Szepesvári
- Reason: This paper presents a new online reinforcement learning algorithm that deals with the challenges of learning in high-dimensional state-action spaces, making it potentially influential for deploying reinforcement learning in complex environments.
8.9 DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks
- Authors: Nawras Alkassab, Chin-Tser Huang, Tania Lorido Botran
- Reason: The authors introduce DeePref, a deep reinforcement learning agent for video prefetching that shows promising performance results, making it potentially influential for optimizing content delivery networks (CDNs).
8.7 Impact of multi-armed bandit strategies on deep recurrent reinforcement learning
- Authors: Valentina Zangirolami, Matteo Borrotti
- Reason: The paper discusses important RL issues and introduces an innovative method to improve the learning phase of Convolutional Recurrent Neural Networks.
8.1 QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
- Authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
- Reason: The work addresses the need for practical methods for deploying LLMs. It showcases a promising strategy for low-bitwidth quantization that compensates for the performance loss due to quantization.
7.8 Generative Intrinsic Optimization: Intrinsic Control with Model Learning
- Authors: Jianfei Ma
- Reason: The author presents a novel method that integrates intrinsic motivation with model learning to enhance the efficiency of RL models.
7.3 SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection
- Authors: Xiaoyang Song, Wenbo Sun, Maher Nouiehed, Raed Al Kontar, Judy Jin
- Reason: The paper presents a unique approach to OoD detection by employing a Wasserstein-score-based generative adversarial training scheme, which may greatly enhance the capabilities of ML systems in handling unexpected inputs.