  1. 9.8 Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
  2. 9.7 Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey
  3. 9.7 Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
  4. 9.6 Discerning Temporal Difference Learning
  5. 9.6 Cross-Episodic Curriculum for Transformer Agents
  6. 9.5 Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
  7. 9.5 MeanAP-Guided Reinforced Active Learning for Object Detection
  8. 9.4 DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  9. 9.3 Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
  10. 9.1 Online RL in Linearly $q^π$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore
  11. 8.9 DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks
  12. 8.7 Impact of multi-armed bandit strategies on deep recurrent reinforcement learning
  13. 8.1 QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
  14. 7.8 Generative Intrinsic Optimization: Intrinsic Control with Model Learning
  15. 7.3 SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection