  1. 9.2 Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
  2. 9.0 Federated Reinforcement Learning with Constraint Heterogeneity
  3. 8.9 Deep Reinforcement Learning for Modelling Protein Complexes
  4. 8.7 Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
  5. 8.7 Policy Learning for Balancing Short-Term and Long-Term Rewards
  6. 8.5 Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
  7. 8.5 Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
  8. 8.4 Enhancing Q-Learning with Large Language Model Heuristics
  9. 8.3 Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
  10. 8.1 CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics