- 9.8 Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
- Authors: Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
- Reason: The paper provides a theoretical understanding of how the top-K sparse softmax gating function affects both density and parameter estimation in the context of Gaussian mixtures, which is a key area of focus in reinforcement learning research. It could broaden the range of applications of reinforcement learning approaches.
- 9.6 Diagnosing and exploiting the computational demands of videos games for deep reinforcement learning
- Authors: Lakshmi Narasimhan Govindarajan, Rex G Liu, Drew Linsley, Alekh Karkada Ashok, Max Reuter, Michael J Frank, Thomas Serre
- Reason: The authors introduce a new tool, Learning Challenge Diagnosticator (LCD), which measures the perceptual and reinforcement learning demands of a task separately. This approach could lead to better optimization of deep reinforcement learning algorithms, making it potentially highly influential in the field.
- 9.6 Can LLM-Generated Misinformation Be Detected?
- Authors: Canyu Chen, Kai Shu
- Reason: This paper addresses the societally important problem of detecting misinformation generated by Large Language Models (LLMs), a topic of wide concern given the growing influence of AI-generated content. The research could influence how reinforcement learning models handle misinformation detection and correction.
- 9.6 Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
- Authors: Zeyuan Allen-Zhu, Yuanzhi Li
- Reason: The paper studies the ability of large language models to store extensive world knowledge. Using a controlled set of semi-synthetic biography data, the authors conduct an in-depth study and uncover a relationship between a model's knowledge extraction ability and different diversity measures of the training data.
- 9.5 Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
- Authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
- Reason: The paper offers an explanation, currently lacking in the literature, for the efficacy of deep policy-based algorithms. This is a critical advance: the authors show that neural policy mirror descent (NPMD) can leverage the low-dimensional structure of the state space to escape the curse of dimensionality.
- 9.4 Weakly Supervised Reasoning by Neuro-Symbolic Approaches
- Authors: Xianggen Liu, Zhengdong Lu, Lili Mou
- Reason: This paper introduces a new approach to weakly supervised reasoning in NLP tasks which combines symbolic and connectionist schools of AI. Given the importance and perennial challenge of reasoning tasks in NLP, this hybrid approach could have a significant impact.
- 9.4 ORLA*: Mobile Manipulator-Based Object Rearrangement with Lazy A*
- Authors: Kai Gao, Yan Ding, Shiqi Zhang, Jingjin Yu
- Reason: This paper tackles the object rearrangement problem, which is a significant task in the robotics and automation field. The paper’s algorithm could possibly influence reinforcement learning practices in the field of autonomous robotic systems.
- 9.3 ODE-based Recurrent Model-free Reinforcement Learning for POMDPs
- Authors: Xuanle Zhao, Duzhen Zhang, Liyuan Han, Tielin Zhang, Bo Xu
- Reason: The paper presents a novel ODE-based recurrent model, combined with model-free RL, for solving partially observable Markov decision processes (POMDPs). The experiments show that the method is robust to irregular observations, making it a valuable contribution to the field.
- 9.2 Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
- Authors: Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu
- Reason: The authors target a key challenge in reinforcement learning – off-policy evaluation in infinite-horizon Markov decision processes – by presenting a novel framework that accounts for distributional shifts. Given the ubiquity of Markov decision processes in many areas of machine learning, their work is poised to be highly influential.
- 9.2 Interpretable and Flexible Target-Conditioned Neural Planners For Autonomous Vehicles
- Authors: Haolan Liu, Jishen Zhao, Liangjun Zhang
- Reason: The proposed neural planner regresses a heatmap, which makes the planned movements of autonomous vehicles easier to interpret, an advanced application of reinforcement learning. It could therefore significantly influence reinforcement learning approaches for autonomous vehicles.
- 9.2 Hierarchical Imitation Learning for Stochastic Environments
- Authors: Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson
- Reason: The paper presents Robust Type Conditioning (RTC) with adversarial training for hierarchical imitation learning in stochastic environments. The results show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.
- 9.1 Deep Reinforcement Learning for the Heat Transfer Control of Pulsating Impinging Jets
- Authors: Sajad Salavatidezfouli, Giovanni Stabile, Gianluigi Rozza
- Reason: The study explores the applicability of Deep Reinforcement Learning (DRL) to thermal control based on Computational Fluid Dynamics. The findings demonstrate the promise of DRL for thermal control problems.
- 9.0 C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior
- Authors: Zhangkai Wu, Longbing Cao
- Reason: The authors introduce a novel self-supervised variational autoencoder that learns both disentangled and dependent hidden factors, potentially enhancing the current understanding of representation learning in autoencoders. The novel combination of techniques and the clear grounding in theoretical principles suggest potential for high impact.
- 9.0 Real-time Bandwidth Estimation from Offline Expert Demonstrations
- Authors: Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler
- Reason: The paper's offline approach to bandwidth estimation could influence the field of reinforcement learning by improving the efficiency and accuracy of data-driven network control. A successful implementation could significantly shape future reinforcement learning approaches to network control tasks.
- 8.8 Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
- Authors: Hector Kohler, Riad Akrour, Philippe Preux
- Reason: This paper contributes to the call for greater interpretability in AI by examining the performance of actor-critic algorithms in learning decision tree policies. Their findings underline the limitations of deep reinforcement learning models in this regard, offering valuable insights for future research.