- 9.5 Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
- Authors: Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu
- Reason: The paper presents a novel approach to combining offline and online reinforcement learning seamlessly, which is essential for efficient and adaptable RL applications. Its use of diverse ensemble policies and a simple offline policy evaluation (OPE) method for multi-step policy improvement is potentially influential for both research and practical implementations; a schematic of the OPE-gated idea follows.
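A schematic of the general idea, offline policy evaluation used to gate successive improvement steps, can be written compactly. This is a hypothetical illustration under assumed names (`ope_evaluate`, `improve`), not the paper's actual procedure:

```python
# Hypothetical sketch: OPE-gated multi-step policy improvement in the
# spirit of Uni-O4's offline phase. `ope_evaluate` and `improve` are
# assumed callables, not the paper's API.
def multi_step_improvement(policy, dataset, n_steps, ope_evaluate, improve):
    best_value = ope_evaluate(policy, dataset)   # estimated return from data
    for _ in range(n_steps):
        candidate = improve(policy, dataset)     # one policy-improvement step
        value = ope_evaluate(candidate, dataset)
        if value < best_value:                   # OPE says it got worse: stop
            break
        policy, best_value = candidate, value
    return policy
```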
- 9.2 Learning Reusable Manipulation Strategies
- Authors: Jiayuan Mao, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
- Reason: This work introduces a framework for learning and generalizing manipulation skills from a single demonstration, with potential broad impact on how robots acquire and apply ‘tricks’ across tasks, mirroring the human ability to pick up complex skills from very little experience.
- 8.9 LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion
- Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
- Reason: This paper proposes a comprehensive benchmark for Imitation Learning (IL) in complex locomotion tasks, an area of high interest in both academia and industry. The authors, who include Jan Peters, an authority in robotics and machine learning, combine a rich set of environments and datasets that can drive future research and applications. The benchmark's wide range of embodiments and metrics makes it particularly influential for developing new IL algorithms.
- 8.7 Using General Value Functions to Learn Domain-Backed Inventory Management Policies
- Authors: Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
- Reason: The paper proposes a novel reinforcement learning (RL) approach built on General Value Functions (GVFs), applied to inventory management, a practical and challenging domain. It also promises faster adaptation to different business environments, which greatly increases its potential influence; a minimal GVF sketch follows.
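For readers unfamiliar with GVFs: a GVF predicts the discounted sum of an arbitrary "cumulant" signal rather than the task reward, learned with the usual TD machinery. A minimal linear TD(0) sketch, with illustrative names and an inventory-flavored cumulant in mind (not the paper's implementation):

```python
import numpy as np

# Minimal GVF with linear function approximation and TD(0). The cumulant
# could be, e.g., per-step holding cost or stock-outs in an inventory
# setting; names here are illustrative assumptions.
class GVF:
    def __init__(self, n_features, alpha=0.1):
        self.w = np.zeros(n_features)   # prediction weights
        self.alpha = alpha              # step size

    def predict(self, phi):
        return self.w @ phi

    def update(self, phi, cumulant, gamma, phi_next):
        # TD(0) error: c + gamma * v(s') - v(s)
        delta = cumulant + gamma * self.predict(phi_next) - self.predict(phi)
        self.w += self.alpha * delta * phi
```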
- 8.7 MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters
- Authors: Mohamed Younes, Ewa Kijak, Richard Kulpa, Simon Malinowski, Franck Multon
- Reason: The paper presents a novel approach to the challenging task of multi-agent interaction for physics-based characters, with a focus on fighting styles, which matters both to the entertainment industry and to complex interactive simulations. Training from unstructured datasets provides a realistic framework for developing and evaluating multi-agent systems; the adversarial imitation objective it builds on is recalled below.
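The "adversarial interaction priors" build on adversarial imitation; as a hedged reference point, the generic GAIL/AMP-style saddle objective (not MAAIP's exact multi-agent loss) is:

```latex
% Generic adversarial imitation objective: a discriminator D separates
% demonstration state transitions from policy rollouts, and its score
% serves as a style reward for the policy (illustrative form only).
\min_{\pi} \max_{D} \;
\mathbb{E}_{(s, s') \sim \text{demo}}\!\left[\log D(s, s')\right]
+ \mathbb{E}_{(s, s') \sim \pi}\!\left[\log\bigl(1 - D(s, s')\bigr)\right]
```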
- 8.7 TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models
- Authors: Yangming Li
- Reason: The paper introduces a robust generative model for complex time series that handles irregular sampling, missingness, and high dimensionality, which is critical for industry applications such as finance and healthcare, marking it as potentially influential; the denoising objective it builds on is sketched below.
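As context, the standard denoising-diffusion training objective the model builds on can be sketched in a few lines; the network `eps_model` and the batch layout are assumptions, and the paper's handling of irregularity and missingness is omitted:

```python
import torch

# Generic DDPM-style training step on a time-series batch of shape
# [batch, length, channels]. `eps_model` (noise predictor) and
# `alphas_cumprod` (noise schedule) are assumed inputs.
def diffusion_loss(eps_model, x0, alphas_cumprod):
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))        # random timesteps
    a_bar = alphas_cumprod[t].view(b, 1, 1)                # broadcast schedule
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward noising
    return torch.nn.functional.mse_loss(eps_model(x_t, t), noise)
```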
- 8.6 QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
- Authors: Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar
- Reason: The study’s focus on Quality of Experience (QoE) in mobile edge computing and its novel DRL-based computation offloading strategy could have a significant impact on user experience in mobile networks. It addresses a vital aspect of contemporary mobile applications, and the results show a substantial improvement over existing algorithms; a toy version of the underlying offloading loop follows.
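A toy version of the generic DRL offloading loop, tabular Q-learning over a "run locally vs. offload to edge server k" action set with an assumed latency/energy QoE reward (QOCO's actual state, reward, and network design differ):

```python
import random
import numpy as np

# Toy offloading agent: action 0 = execute locally, actions 1..K = offload
# to edge server k. The QoE reward and `env_step` simulator are assumptions.
def qoe_reward(latency, energy, w=0.5):
    return -(w * latency + (1.0 - w) * energy)   # higher QoE = lower cost

def offload_step(Q, state, env_step, epsilon=0.1, alpha=0.1, gamma=0.95):
    n_actions = Q.shape[1]
    a = random.randrange(n_actions) if random.random() < epsilon \
        else int(np.argmax(Q[state]))
    next_state, latency, energy = env_step(state, a)
    r = qoe_reward(latency, energy)
    Q[state, a] += alpha * (r + gamma * Q[next_state].max() - Q[state, a])
    return next_state
```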
- 8.5 Hierarchical Reinforcement Learning for Power Network Topology Control
- Authors: Blazej Manczak, Jan Viebahn, Herke van Hoof
- Reason: Presents a new hierarchical reinforcement learning framework for controlling power networks that addresses the high-dimensional action space. Its application to critical power-grid infrastructure and its reported outperformance make the paper potentially influential; the hierarchical decomposition is sketched below.
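The hierarchical decomposition can be pictured as a two-level agent, a high-level policy choosing a substation and a low-level policy choosing a bus configuration within it; the class below is an illustrative schematic, not the paper's exact architecture:

```python
# Schematic two-level agent for grid topology control. `high_policy` and
# `low_policies` (one per substation) are assumed components.
class HierarchicalAgent:
    def __init__(self, high_policy, low_policies):
        self.high = high_policy   # grid state -> substation id
        self.low = low_policies   # substation id -> configuration policy

    def act(self, obs):
        substation = self.high.select(obs)          # where to act
        config = self.low[substation].select(obs)   # which topology there
        return substation, config
```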
- 8.3 Combining Deep Learning on Order Books with Reinforcement Learning for Profitable Trading
- Authors: Koti S. Jaddu, Paul A. Bilokon
- Reason: Addresses high-frequency trading with a combination of deep learning and reinforcement learning, presenting a reproducible and potentially profitable approach. The application to financial instruments and the success in backtesting make it notable.
- 8.3 Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees
- Authors: Nianli Peng, Brandon Fain
- Reason: The paper introduces an algorithm for solving multi-objective Markov Decision Processes with nonlinear scalarization functions, extending the widely applicable E3 algorithm. The approach to fairness-aware and risk-aware reinforcement learning is innovative, and these areas are of increasing importance given their societal and economic implications; the general objective is sketched below.
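In hedged, generic form, the setting replaces the scalar RL objective with a nonlinear function of the vector of expected returns, e.g. a concave welfare function for fairness or a risk measure (this is the general shape of the problem, not the paper's exact formulation):

```latex
% Nonlinear multi-objective RL: maximize a nonlinear scalarization f of
% the vector of expected discounted returns over d objectives. For
% fairness, f might be a social welfare function such as
% f(v) = \sum_i \log v_i (illustrative choice).
\max_{\pi} \; f\!\left( \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty}
  \gamma^{t}\, \mathbf{r}_t \right] \right),
\qquad \mathbf{r}_t \in \mathbb{R}^{d}
```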
- 8.3 Neural Structure Learning with Stochastic Differential Equations
- Authors: Benjie Wang, Joel Jennings, Wenbo Gong
- Reason: Learning the underlying structure among variables in a continuous-time setting can have significant implications across multiple disciplines. The paper’s theoretical and empirical validation underscores its potential to improve structure learning from temporal observations; the core gating idea is sketched below.
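The core mechanism can be sketched as a drift network whose inputs are gated by a learnable adjacency matrix, so that sparsifying the matrix recovers a graph; the module below is a generic illustration under assumed names, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Drift network with a learnable soft adjacency: variable i's drift sees
# variable j only through A[j, i], so sparsity on A yields a graph.
class MaskedDrift(nn.Module):
    def __init__(self, d, hidden=64):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(d, d))  # edge beliefs
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.Tanh(), nn.Linear(hidden, 1))
            for _ in range(d)
        )

    def forward(self, x):                 # x: [batch, d]
        A = torch.sigmoid(self.logits)    # soft adjacency in (0, 1)
        return torch.cat(
            [net(x * A[:, i]) for i, net in enumerate(self.nets)], dim=-1
        )
```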
- 8.1 Steady-State Analysis of Queues with Hawkes Arrival and Its Application to Online Learning for Hawkes Queues
- Authors: Xinyun Chen, Guiyu Hong
- Reason: While the topic might seem niche, the application to online learning for queues and an efficient algorithm for the optimal staffing problem provide valuable insights into improving system performance, with potential influence in operations research and in machine-learning applications across healthcare, telecommunications, and other service systems; the Hawkes arrival model is recalled below.
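For reference, a Hawkes arrival process is one whose rate is excited by its own past arrivals; with the common exponential kernel (an illustrative choice) the conditional intensity is:

```latex
% Hawkes conditional intensity: baseline rate \mu plus self-excitation
% from past arrival times t_i; \alpha < \beta keeps the process stable.
\lambda(t) = \mu + \sum_{t_i < t} \alpha \, e^{-\beta (t - t_i)},
\qquad \mu, \alpha > 0, \quad \alpha < \beta
```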
- 7.9 AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- Authors: Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel, Kee-Eung Kim
- Reason: The paper’s contribution to multi-agent RL is significant: a novel algorithm that addresses the curse of dimensionality in joint action spaces and shows experimental performance improvements. Pieter Abbeel, a notable authority in the field, is among the authors; the stationary-distribution correction at the heart of the DICE family is recalled below.
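As background, methods in the DICE family estimate stationary distribution correction ratios between the target policy's occupancy and the dataset's; the generic quantity (stated here for the single-agent case) is:

```latex
% Stationary distribution correction ratio w and the reweighting identity
% it enables: expectations under the dataset distribution d^{D}, weighted
% by w, recover expectations under the target policy's occupancy d^{\pi}.
w(s, a) = \frac{d^{\pi}(s, a)}{d^{D}(s, a)},
\qquad
\mathbb{E}_{(s,a) \sim d^{D}}\!\left[ w(s, a)\, r(s, a) \right]
  = \mathbb{E}_{(s,a) \sim d^{\pi}}\!\left[ r(s, a) \right]
```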
- 7.9 RELand: Risk Estimation of Landmines via Interpretable Invariant Risk Minimization
- Authors: Mateo Dulce Rubio, Siqi Zeng, Qi Wang, Didier Alvarado, Francisco Moreno, Hoda Heidari, Fei Fang
- Reason: The paper applies interpretable invariant risk minimization to the critical and impactful field of humanitarian demining, promising to support demining efforts with tangible real-world benefits. The potential for this technology to save lives contributes to its importance and potential influence; the IRM objective the title refers to is recalled below.
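For context, the IRMv1 objective the title refers to penalizes feature representations whose optimal classifier varies across environments (generic form, not RELand's exact loss):

```latex
% IRMv1: minimize per-environment risks R^{e} of a representation \Phi,
% plus a penalty that is zero when the dummy classifier w = 1.0 is
% simultaneously optimal in every environment e.
\min_{\Phi} \sum_{e \in \mathcal{E}}
  R^{e}(\Phi)
  + \lambda \left\| \nabla_{w \mid w = 1.0}\, R^{e}(w \cdot \Phi) \right\|^{2}
```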
- 7.6 Imitation Bootstrapped Reinforcement Learning
- Authors: Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh
- Reason: Combines imitation learning and reinforcement learning for robotics, demonstrating improved performance and sample efficiency. The applications are directly relevant to robotics, and the proposed hybrid approach could influence future research directions; the core action-selection idea is sketched below.
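The core action-selection idea can be sketched in a few lines: a frozen imitation policy proposes actions alongside the RL actor, and the critic arbitrates. Names are assumptions; see the paper for the exact actor-critic variant:

```python
import torch

# Critic-arbitrated action selection between an imitation proposal and an
# RL proposal. All callables are assumed; critic returns a scalar Q-value.
@torch.no_grad()
def select_action(obs, il_policy, rl_policy, critic):
    a_il = il_policy(obs)   # proposal from the frozen imitation policy
    a_rl = rl_policy(obs)   # proposal from the RL actor
    if critic(obs, a_il).item() >= critic(obs, a_rl).item():
        return a_il
    return a_rl
```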