  1. 9.1 Supercompiler Code Optimization with Zero-Shot Reinforcement Learning
  2. 8.7 Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective
  3. 8.5 AFU: Actor-Free critic Updates in off-policy RL for continuous control
  4. 8.2 Reinforcement Learning with Generative Models for Compact Support Sets
  5. 7.8 A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints