9.7 Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen
- Reason: This paper is highly influential because it addresses open problems and fundamental limitations of reinforcement learning from human feedback (RLHF). The authors also propose auditing and disclosure standards to improve accountability in RLHF systems.
9.3 Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
- Authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius
- Reason: The paper proposes a novel benchmark for monitoring and evaluating the performance of offline reinforcement learning algorithms, particularly in real-world robotics tasks. It has considerable influence because it provides a valuable toolkit for offline reinforcement learning on real systems.
8.9 Robust Visual Sim-to-Real Transfer for Robotic Manipulation
- Authors: Ricardo Garcia, Robin Strudel, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid
- Reason: This paper stands out for its innovative approach to improving visual domain transfer from simulation to real-world robot manipulation tasks.
8.5 Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search
- Authors: Alexander Chebykin, Arkadiy Dushatskiy, Tanja Alderliesten, Peter A. N. Bosman
- Reason: Despite its specific focus on neural architecture search, this paper is influential for presenting an adapted version of the Population Based Training algorithm that proves effective on challenging tasks.
8.1 A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
- Authors: Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson
- Reason: This paper presents an innovative algorithm, P1-RAGE, that addresses the challenge of best-arm identification in a non-stationary setting, making it a potentially influential work in its area.