- 9.4 Reparameterized Policy Learning for Multimodal Trajectory Optimization
- Authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
- Reason: This paper tackles the challenge of parameterizing policies for reinforcement learning in high-dimensional continuous action spaces. The authors propose a novel model-based RL method, Reparameterized Policy Gradient (RPG), and demonstrate its effectiveness through improved exploration and data efficiency; a minimal sketch of the idea follows.
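To make the idea concrete, here is a hedged sketch of a multimodal, latent-variable policy trained with the reparameterization trick. This is not the paper's implementation: the layer shapes, latent size, and the stand-in return estimate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    """Multimodal policy p(a|s) = E_z[p(a|s,z)]: a state-conditioned latent
    variable z lets the policy place probability mass on several distinct
    action modes instead of a single Gaussian blob."""

    def __init__(self, state_dim, action_dim, latent_dim=8):
        super().__init__()
        self.prior = nn.Linear(state_dim, 2 * latent_dim)              # mean, log-std of z|s
        self.decoder = nn.Linear(state_dim + latent_dim, 2 * action_dim)

    def forward(self, state):
        mu_z, log_std_z = self.prior(state).chunk(2, dim=-1)
        z = mu_z + log_std_z.exp() * torch.randn_like(mu_z)            # reparameterized latent
        mu_a, log_std_a = self.decoder(torch.cat([state, z], dim=-1)).chunk(2, dim=-1)
        return mu_a + log_std_a.exp() * torch.randn_like(mu_a)         # reparameterized action

policy = LatentPolicy(state_dim=4, action_dim=2)
action = policy(torch.randn(16, 4))
loss = (action ** 2).mean()   # stand-in for a differentiable (model-based) return estimate
loss.backward()               # gradients flow through both z and the action samples
```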
- 9.1 Privacy Amplification via Importance Sampling
- Authors: Dominik Fay, Sebastian Mair, Jens Sjölund
- Reason: This paper explores a novel form of privacy amplification: importance sampling as the subsampling step in differentially private mechanisms, generalizing the well-studied amplification by uniform subsampling. The results could inform privacy-preserving reinforcement learning; the general pattern is sketched below.
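A toy sketch of that pattern: subsample records with non-uniform probabilities, reweight for unbiasedness, then apply a Gaussian mechanism. The weighting scheme, sensitivity bound, and noise calibration here are illustrative placeholders, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_weighted_mean(x, q, sigma):
    """Release the mean of x under a toy Gaussian mechanism after
    importance (Poisson) subsampling: keep record i with probability q[i],
    reweight by 1/q[i] so the estimate stays unbiased."""
    keep = rng.random(len(x)) < q
    estimate = np.sum(x[keep] / q[keep]) / len(x)
    # Crude worst-case influence of one record on the estimate; the paper's
    # amplification analysis replaces this hand-wavy calibration.
    sensitivity = np.max(np.abs(x) / q) / len(x)
    return estimate + rng.normal(0.0, sigma * sensitivity)

x = rng.uniform(0.0, 1.0, size=1000)
q = np.clip(x, 0.05, 1.0)            # illustrative: sample large values more often
print(private_weighted_mean(x, q, sigma=2.0))
```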
- 9.1 On Combining Expert Demonstrations in Imitation Learning via Optimal Transport
- Authors: Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth
- Reason: This paper tackles the issue of how to optimally combine multiple expert demonstrations in imitation learning. It proposes an optimal-transport-based method that yields a geometrically sensible average of demonstrations, and shows its efficiency on OpenAI Gym control environments (see the sketch below).
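As a toy illustration of the underlying idea (a simplification, not the paper's method): average two equal-length demonstrations through an optimal matching of their states rather than naive per-time-step averaging. For uniform weights and equal sizes, the optimal transport plan is a permutation, recoverable with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_average(demo_a, demo_b):
    """Average two demonstrations (state arrays of shape [T, d]) by first
    solving an optimal matching between their states, then interpolating
    matched pairs, instead of averaging index by index."""
    cost = cdist(demo_a, demo_b, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)        # optimal permutation
    return 0.5 * (demo_a[rows] + demo_b[cols])      # barycentric interpolation

rng = np.random.default_rng(0)
demo_a = np.cumsum(rng.normal(size=(50, 2)), axis=0)   # two toy trajectories
demo_b = np.cumsum(rng.normal(size=(50, 2)), axis=0)
combined = ot_average(demo_a, demo_b)
```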
- 8.9 Self-paced Weight Consolidation for Continual Learning
- Authors: Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong
- Reason: This paper tackles the problem of preventing catastrophic forgetting in continual learning. It proposes a self-paced framework that measures the difficulty of past tasks and selectively preserves knowledge from the more difficult ones, as sketched below.
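The general pattern can be sketched as an EWC-style quadratic penalty whose per-task scale is a difficulty weight. The difficulty values and importance estimates below are placeholders, not the paper's formulation.

```python
import torch
import torch.nn as nn

def consolidation_loss(model, anchors, importances, difficulties):
    """EWC-style penalty: for each past task t, add
    difficulty[t] * sum_i importance_t[i] * (theta_i - theta*_t[i])^2,
    so harder past tasks are anchored more strongly."""
    loss = torch.zeros(())
    for anchor, imp, w in zip(anchors, importances, difficulties):
        for name, param in model.named_parameters():
            loss = loss + w * (imp[name] * (param - anchor[name]) ** 2).sum()
    return loss

# Toy usage: one past task with uniform parameter importance, difficulty 2.0.
model = nn.Linear(3, 2)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
imp = {n: torch.ones_like(p) for n, p in model.named_parameters()}
task_loss = model(torch.randn(5, 3)).pow(2).mean()
total = task_loss + 0.1 * consolidation_loss(model, [anchor], [imp], [2.0])
total.backward()
```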
- 8.7 Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings
- Authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh
- Reason: This paper, accepted at IJCAI 2023, addresses a key issue for applied AI: disentangling genuine societal inequality from the biases of the models used to measure it. The extensive dataset of court proceedings gives the paper solid empirical grounding.
- 8.7 A Definition of Continual Reinforcement Learning
- Authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
- Reason: This is a foundational paper that aims to formally define continual reinforcement learning.
- 8.5 Adversarial Training Over Long-Tailed Distribution
- Authors: Guanlin Li, Guowen Xu, Tianwei Zhang
- Reason: This paper investigates an essential but under-explored setting: adversarial training on long-tailed datasets. The authors propose a new adversarial training framework that could have a substantial impact on the field; the generic ingredients are sketched below.
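A compact sketch of the two generic ingredients this line of work combines, as a baseline illustration rather than the authors' framework: an inner-loop PGD attack and a class-frequency-reweighted loss so tail classes are not drowned out by head classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD: ascend the loss inside an eps-ball around x."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0)  # project
    return x_adv.detach()

def rebalanced_adv_step(model, optimizer, x, y, class_counts):
    """One adversarial training step with inverse-frequency class weights."""
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    loss = F.cross_entropy(model(pgd_attack(model, x, y)), y, weight=weights)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in long-tailed class counts.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
counts = torch.tensor([500, 200, 100, 50, 40, 30, 30, 20, 20, 10])
rebalanced_adv_step(model, opt, x, y, counts)
```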
- 8.5 PASTA: Pretrained Action-State Transformer Agents
- Authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
- Reason: This paper investigates Pretrained Action-State Transformer Agents (PASTA) across an extensive set of downstream tasks, including behavioral cloning, offline RL, robustness to sensor failure, and adaptation to dynamics change. It offers practitioners insights for building robust pretrained agents; a stripped-down sketch of the pretraining recipe follows.
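In stripped-down form, the pretraining recipe amounts to masked reconstruction over interleaved state/action tokens. The tokenization, masking scheme, and architecture below are illustrative placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class StateActionEncoder(nn.Module):
    """Toy masked-reconstruction pretraining over interleaved state/action
    tokens, in the spirit of pretrained action-state transformer agents."""

    def __init__(self, token_dim, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, token_dim)

    def forward(self, tokens, mask_ratio=0.3):
        # tokens: [batch, seq, token_dim], alternating state and action tokens
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
        corrupted = tokens.clone()
        corrupted[mask] = 0.0                               # blank out masked slots
        recon = self.head(self.encoder(self.embed(corrupted)))
        return ((recon - tokens) ** 2)[mask].mean()         # loss on masked slots only

model = StateActionEncoder(token_dim=6)
loss = model(torch.randn(8, 32, 6))                         # toy trajectory batch
loss.backward()
```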
- 8.3 On the Sensitivity of Deep Load Disaggregation to Adversarial Attacks
- Authors: Hafsa Bousbiat, Yassine Himeur, Abbes Amira, Wathiq Mansoor
- Reason: This paper explores the sensitivity of non-intrusive load monitoring (NILM) algorithms to adversarial attacks, a growing concern in adversarial machine learning. The findings could have significant implications for energy management systems.
- 8.0 Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
- Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco
- Reason: The authors propose a novel tool for analyzing cyber threat intelligence, with a clear focus on automating the extraction of relevant information from unstructured text sources. This could be instrumental in strengthening automated cybersecurity measures.