- 9.7 Unsupervised Behavior Extraction via Random Intent Priors
- Authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
- Reason: This paper proposes an unsupervised approach to extract useful behaviors from offline reward-free datasets. The authors show both empirically and theoretically that rewards generated from random neural networks can extract diverse and useful behaviors. Their experiments also show promising performance improvements from their method, which greatly broadens the applicability of RL to real-world scenarios.
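
  A minimal sketch of the core idea as described above: a randomly initialized, frozen network serves as a reward function, and several such networks define several "intents" with which a reward-free dataset can be relabeled. The architecture, the state dimension, and the use of PyTorch are illustrative assumptions, not the authors' implementation.

  ```python
  import torch
  import torch.nn as nn

  class RandomRewardPrior(nn.Module):
      """A frozen, randomly initialized network mapping states to pseudo-rewards."""
      def __init__(self, state_dim: int, hidden: int = 64):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(state_dim, hidden), nn.Tanh(),
              nn.Linear(hidden, 1),
          )
          for p in self.parameters():  # never trained
              p.requires_grad_(False)

      def forward(self, state: torch.Tensor) -> torch.Tensor:
          return self.net(state).squeeze(-1)

  # Relabel a reward-free offline batch with several random priors, yielding
  # one pseudo-reward signal ("intent") per prior to train diverse behaviors on.
  state_dim = 8
  priors = [RandomRewardPrior(state_dim) for _ in range(4)]
  states = torch.randn(256, state_dim)                  # stand-in for dataset states
  pseudo_rewards = [prior(states) for prior in priors]  # one (256,) reward tensor per intent
  ```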
- 9.7 Behavior Alignment via Reward Function Optimization
- Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva
- Reason: The paper was presented as a Spotlight at the influential NeurIPS conference, indicating its potential significance. The authors introduce a novel framework for reward function optimization in reinforcement learning, which could have a crucial impact on the way RL agents are trained.
- 9.7 Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
- Authors: Seongun Kim, Kyowoon Lee, Jaesik Choi
- Reason: The paper presents an innovative approach to reinforcement learning, advancing unsupervised skill discovery within a principled theoretical framework. The authors' results on complex navigation and manipulation tasks, validated empirically, could have a major impact on the field.
- 9.5 Weakly Coupled Deep Q-Networks
- Authors: Ibrahim El Shar, Daniel R. Jiang
- Reason: This paper introduces a novel deep reinforcement learning algorithm that enhances performance on a class of structured problems. The method decomposes the problem across multiple DQN subagents and then combines their solutions to obtain an upper bound on the optimal action value. The authors also prove theoretical convergence, which contributes to the algorithm’s reliability.
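
  A toy sketch of the combination step described above, with tabular value arrays standing in for the trained DQN subagents. The decomposition, the shapes, and the way the bound would be consumed are assumptions for illustration, not the paper's exact construction.

  ```python
  import numpy as np

  # Stand-ins for K DQN subagents: one learned Q-table per subproblem, Q_k(s_k, a).
  K, n_states, n_actions = 3, 5, 4
  rng = np.random.default_rng(0)
  sub_q = [rng.random((n_states, n_actions)) for _ in range(K)]

  def combined_upper_bound(sub_states, action):
      # Summing the subproblems' values ignores the constraint that couples them,
      # so the sum is an optimistic estimate (an upper bound) of the true action
      # value; such a bound can be used to guide or clip the main agent's estimate.
      return sum(sub_q[k][sub_states[k], action] for k in range(K))

  print(combined_upper_bound(sub_states=[0, 2, 1], action=3))
  ```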
- 9.5 Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans
- Authors: Kyowoon Lee, Seongun Kim, Jaesik Choi
- Reason: The paper offers a novel approach for detecting and refining infeasible plans produced by diffusion-based planners, and it could be influential given that the problem it addresses is critical in safety-related applications.
- 9.4 Contextual Stochastic Bilevel Optimization
- Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn
- Reason: The authors introduce a contextual stochastic bilevel optimization framework that extends classical stochastic bilevel optimization to settings where the lower-level decision maker responds optimally to the upper-level decision and to contextual information. It can be applied to many practical applications such as meta-learning and personalized federated learning, and the authors establish computational and sample complexities for their method.
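
  A schematic formulation of the problem class, written from the description above; the symbols and notation are ours for illustration, not necessarily the paper's:

  ```latex
  \min_{x}\; \mathbb{E}_{\xi}\!\left[ f\bigl(x,\, y^{*}(x,\xi),\, \xi\bigr) \right]
  \quad \text{s.t.} \quad
  y^{*}(x,\xi) \in \arg\min_{y}\; \mathbb{E}_{\eta \mid \xi}\!\left[ g\bigl(x, y, \eta\bigr) \right],
  ```

  where the lower-level solution y*(x, ξ) responds optimally to the upper-level decision x and the context ξ, and both expectations are taken over random data.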
- 9.3 A general learning scheme for classical and quantum Ising machines
- Authors: Ludwig Schmid, Enrico Zardini, Davide Pastorello
- Reason: The paper presents a novel machine learning model based on the Ising structure that can be efficiently trained using gradient descent. The research offers new possibilities in quantum machine learning and provides experimental results that demonstrate the effectiveness of their proposed learning model.
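
  A bare-bones classical sketch of training an Ising model's parameters by gradient descent: the energy is linear in the couplings and fields, so the gradient of a simple margin-style loss is available in closed form. The loss, the spin configurations, and the use of NumPy are illustrative assumptions, not the paper's learning scheme (which also covers quantum Ising machines).

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  n = 6
  J = rng.normal(scale=0.1, size=(n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)
  h = rng.normal(scale=0.1, size=n)

  def energy(s, J, h):
      # Classical Ising energy E(s) = -1/2 s^T J s - h^T s for spins s in {-1, +1}^n.
      return -0.5 * s @ J @ s - h @ s

  # Toy objective: make a desired configuration lower in energy than a competing
  # one by plain gradient descent on the margin E(target) - E(other) w.r.t. (J, h).
  target = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
  other = np.array([1.0, 1.0, -1.0, 1.0, 1.0, -1.0])
  lr = 0.05
  for _ in range(100):
      grad_J = -0.5 * (np.outer(target, target) - np.outer(other, other))
      grad_h = -(target - other)
      J -= lr * grad_J; np.fill_diagonal(J, 0.0)
      h -= lr * grad_h

  print(energy(target, J, h), "<", energy(other, J, h))
  ```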
- 9.3 Decoupled Actor-Critic
- Authors: Michal Nauman, Marek Cygan
- Reason: The paper presents an off-policy algorithm that learns two distinct actors. With state-of-the-art performance on locomotion tasks at minimal computational overhead, this paper could have a significant influence on the field.
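
  A rough sketch of what learning "two distinct actors" can look like in an off-policy setup: both actors share a critic ensemble, one trained against an optimistic value estimate (for collecting data) and one against a conservative estimate (for evaluation). How the paper actually defines the two objectives, and all networks and shapes below, are assumptions for illustration only.

  ```python
  import torch
  import torch.nn as nn

  state_dim, action_dim = 8, 2

  def mlp(n_in, n_out):
      return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

  explore_actor, eval_actor = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
  critics = nn.ModuleList([mlp(state_dim + action_dim, 1) for _ in range(2)])

  def critic_values(s, a):
      x = torch.cat([s, a], dim=-1)
      return torch.stack([c(x) for c in critics], dim=0)  # (n_critics, batch, 1)

  s = torch.randn(32, state_dim)
  q_explore = critic_values(s, torch.tanh(explore_actor(s)))
  q_eval = critic_values(s, torch.tanh(eval_actor(s)))

  explore_loss = -(q_explore.mean(0) + q_explore.std(0)).mean()  # optimistic objective
  eval_loss = -q_eval.min(0).values.mean()                       # conservative objective
  (explore_loss + eval_loss).backward()                          # both actors get gradients
  ```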
- 9.2 Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild
- Authors: Jiatai Wang, Zhiwei Xu, Xuewen Yang, Xin Wang
- Reason: This paper presents a new hierarchical approach to multi-view clustering, which deals with issues of missing and unaligned data in real-world applications. The authors claim this is the first successful attempt to handle these problems separately with different learning paradigms.
- 9.2 MAG-GNN: Reinforcement Learning Boosted Graph Neural Network
- Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Dacheng Tao, Yixin Chen, Muhan Zhang
- Reason: The paper has been accepted to NeurIPS 2023 and introduces a novel approach to optimizing subgraph search in GNNs using reinforcement learning. The new RL-boosted GNN model shows competitive performance and improved running time.
- 9.1 Learning to design protein-protein interactions with enhanced generalization
- Authors: Anton Bushuiev, Roman Bushuiev, Anatolii Filkin, Petr Kouba, Marketa Gabrielova, Michal Gabriel, Jiri Sedlar, Tomas Pluskal, Jiri Damborsky, Stanislav Mazurenko, Josef Sivic
- Reason: The authors introduce a new method to discover mutations that enhance protein-protein interactions, a critical area for biomedical research and therapeutic development. Moreover, their model demonstrates enhanced generalization, outperforming other state-of-the-art methods.
- 9.1 Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
- Authors: Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma
- Reason: This novel reinforcement learning method addresses the challenge of delayed feedback in RL with linear function approximation. Notably, the authors provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show that their algorithm achieves significantly reduced worst-case regret in the presence of unknown stochastic delays.
- 9.1 DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
- Authors: Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daumé III, Furong Huang, Huazhe Xu
- Reason: The paper addresses a significant shortcoming of existing visual reinforcement learning methods and provides a novel method, based on minimizing the network’s dormant ratio, to guide agents’ exploration-exploitation trade-off.
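
  A small sketch of the dormant-ratio statistic the title refers to: the fraction of units whose average activation is negligible relative to the rest of their layer. The threshold, the per-layer normalization, and restricting attention to ReLU outputs follow the common "dormant neuron" definition and are assumptions here, not necessarily the paper's exact recipe.

  ```python
  import torch
  import torch.nn as nn

  def dormant_ratio(model: nn.Sequential, x: torch.Tensor, tau: float = 0.025) -> float:
      dormant, total = 0, 0
      h = x
      for layer in model:
          h = layer(h)
          if isinstance(layer, nn.ReLU):
              score = h.abs().mean(dim=0)            # per-neuron mean activation
              score = score / (score.mean() + 1e-8)  # normalize within the layer
              dormant += int((score <= tau).sum())
              total += score.numel()
      return dormant / max(total, 1)

  net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
  print(dormant_ratio(net, torch.randn(128, 16)))
  ```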
- 9.0 State-Action Similarity-Based Representations for Off-Policy Evaluation
- Authors: Brahma S. Pavse, Josiah P. Hanna
- Reason: The paper introduces a new approach to improve the data-efficiency of fitted Q-evaluation (FQE), a widely used off-policy evaluation algorithm in reinforcement learning. The authors propose a state-action similarity metric for learning an encoder, resulting in enhanced performance on challenging tasks.
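
  A minimal fitted Q-evaluation (FQE) regression step on top of an encoder, to show where a learned representation plugs in. In the paper the encoder would first be pretrained with the proposed state-action similarity metric, which is omitted here; the shapes, the stand-in for the evaluation policy's action, and the hyperparameters are assumptions.

  ```python
  import torch
  import torch.nn as nn

  state_dim, action_dim, gamma = 8, 2, 0.99
  encoder = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU())
  q_head = nn.Linear(64, 1)
  opt = torch.optim.Adam(list(encoder.parameters()) + list(q_head.parameters()), lr=3e-4)

  def q(s, a):
      return q_head(encoder(torch.cat([s, a], dim=-1)))

  # One FQE regression step on an offline batch, bootstrapping with the
  # evaluation policy's action at the next state.
  s, a = torch.randn(64, state_dim), torch.randn(64, action_dim)
  r, s_next = torch.randn(64, 1), torch.randn(64, state_dim)
  a_next = torch.zeros(64, action_dim)  # stand-in for pi_e(s_next)
  with torch.no_grad():
      target = r + gamma * q(s_next, a_next)
  loss = nn.functional.mse_loss(q(s, a), target)
  opt.zero_grad(); loss.backward(); opt.step()
  ```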
- 9.0 Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning
- Authors: Suraj Singireddy, Andre Beckus, George Atia, Sumit Jha, Alvaro Velasquez
- Reason: The authors propose a form of neuro-symbolic transfer learning in which Q-value estimates are distilled into a low-dimensional representation in the form of an automaton. This could enhance the understanding and implementation of RL in decision processes.
- 9.0 SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization
- Authors: Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink
- Reason: The paper offers a novel framework for domain generalization in multi-modal scenarios. The authors’ experimentation on several datasets and the introduction of their unique approach of cross-modal translation can potentially make a considerable impact in the field.
- 8.9 Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
- Authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
- Reason: This paper offers algorithmic frameworks for robust off-policy evaluation and offline policy optimization with heavy-tailed rewards. This would be an important work for practical applications of offline reinforcement learning (RL) in real-world scenarios laden with heavy-tailed rewards.
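
  A generic illustration of why heavy tails matter for evaluation: a median-of-means estimate of the mean return is far less sensitive to extreme samples than the plain sample mean. This is a standard robust-mean device shown for intuition only, not necessarily the estimator developed in the paper; the simulated returns are an assumption.

  ```python
  import numpy as np

  def median_of_means(x: np.ndarray, n_blocks: int = 10, seed: int = 0) -> float:
      rng = np.random.default_rng(seed)
      blocks = np.array_split(rng.permutation(x), n_blocks)  # random, roughly equal blocks
      return float(np.median([b.mean() for b in blocks]))    # median of the block means

  rng = np.random.default_rng(1)
  returns = rng.standard_t(df=1.5, size=10_000)  # heavy-tailed per-trajectory returns
  print("sample mean:", returns.mean(), "median of means:", median_of_means(returns))
  ```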
- 8.9 Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement
- Authors: Daesol Cho, Seungjae Lee, H. Jin Kim
- Reason: This paper presents an innovative approach to curriculum RL, called “Diversify for Disagreement & Conquer”. The methodology shows promising results, outperforming prior curriculum RL methods in both quantitative and qualitative aspects.
- 8.7 Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
- Authors: Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
- Reason: The authors underscore the limitations of off-policy dynamic programming techniques and propose a new approach that bypasses the challenges of Bellman completeness. The approach converges under significantly more relaxed assumptions inherited from supervised learning.
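
  A sketch of the return-conditioned supervised learning (RCSL) component: a policy regressed to predict the dataset action given the state and a target return-to-go. In the paper, the dataset would additionally be augmented with rollouts from a learned dynamics model to enable trajectory stitching; that part is omitted, and the shapes and stand-in tensors below are assumptions.

  ```python
  import torch
  import torch.nn as nn

  state_dim, action_dim = 8, 2
  policy = nn.Sequential(nn.Linear(state_dim + 1, 64), nn.ReLU(), nn.Linear(64, action_dim))
  opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

  # Offline batch (stand-ins): states, actions, and scalar returns-to-go.
  s, a, rtg = torch.randn(64, state_dim), torch.randn(64, action_dim), torch.randn(64, 1)
  pred = policy(torch.cat([s, rtg], dim=-1))
  loss = nn.functional.mse_loss(pred, a)  # plain supervised regression, no Bellman backup
  opt.zero_grad(); loss.backward(); opt.step()
  ```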
- 8.6 World Model Based Sim2Real Transfer for Visual Navigation
- Authors: Chen Liu, Kiran Lekkala, Laurent Itti
- Reason: This paper deals with an important issue in reinforcement learning: sim2real transfer. The authors propose a system that is trained entirely within a simulator and can then be transferred to the real world, making this research highly influential for real-world applications of reinforcement learning.