  1. 9.2 A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation
  2. 9.0 Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization
  3. 8.9 Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
  4. 8.9 Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination
  5. 8.7 Wukong: Towards a Scaling Law for Large-Scale Recommendation
  6. 8.7 Behavior Generation with Latent Actions
  7. 8.6 Preventing Reward Hacking with Occupancy Measure Regularization
  8. 8.5 Learning-augmented Online Minimization of Age of Information and Transmission Costs
  9. 8.3 A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation
  10. 8.1 Geometric Dynamics of Signal Propagation Predict Trainability of Transformers