- 9.6 ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning
- Authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Jie Luo, Wenjun Wu
- Reason: Presents a framework that exploits symmetry priors through data augmentation, improving data efficiency during model training and thereby addressing a major challenge in existing reinforcement learning methods (a minimal augmentation sketch follows below).
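  To make this concrete, here is a minimal sketch of symmetry-based data augmentation, assuming a hypothetical 90-degree rotational symmetry of a 2D environment; the transition layout and `rotate_90` helper are illustrative assumptions, not ESP's actual API.

  ```python
  import numpy as np

  def rotate_90(vec_2d):
      """Rotate a 2-D vector by 90 degrees counter-clockwise.

      Illustrative symmetry operation; the paper's symmetry group
      and state representation may differ.
      """
      x, y = vec_2d
      return np.array([-y, x])

  def augment_transition(state, action, reward, next_state):
      """Generate symmetric copies of one transition.

      Assumes states and actions are 2-D vectors so the same rotation
      applies to both, and that rewards are symmetry-invariant.
      """
      transitions = [(state, action, reward, next_state)]
      s, a, ns = state, action, next_state
      for _ in range(3):  # the remaining rotations of the C4 group
          s, a, ns = rotate_90(s), rotate_90(a), rotate_90(ns)
          transitions.append((s, a, reward, ns))
      return transitions

  # One real environment step yields four training samples.
  samples = augment_transition(
      state=np.array([1.0, 0.0]),
      action=np.array([0.0, 1.0]),
      reward=0.5,
      next_state=np.array([1.0, 1.0]),
  )
  print(len(samples))  # 4
  ```

  Each environment step here produces four replay-buffer entries, which is the data-efficiency gain the authors pursue; in their setting the symmetry group and representation are defined by the task.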
- 9.5 Curiosity-Driven Reinforcement Learning based Low-Level Flight Control
- Authors: Amir Ramezani Dooraki, Alexandros Iosifidis
- This paper advances curiosity-driven reinforcement learning for low-level flight control: a quadcopter learns to command its motor speeds directly from odometry data. With the proposed algorithm, the quadcopter can pass through obstacles, and the method outperforms existing approaches at learning an optimal policy and maximizing reward. A sketch of the generic curiosity mechanism follows below.
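  The generic curiosity mechanism behind such work rewards the agent for transitions its learned dynamics model predicts poorly. A minimal sketch under that assumption (all names are illustrative; the paper's exact formulation may differ):

  ```python
  import numpy as np

  class ForwardModel:
      """Tiny linear forward model s' ~ W @ [s; a], trained by SGD.

      Stands in for whatever dynamics model the agent learns;
      purely illustrative.
      """
      def __init__(self, state_dim, action_dim, lr=1e-2):
          self.W = np.zeros((state_dim, state_dim + action_dim))
          self.lr = lr

      def predict(self, state, action):
          return self.W @ np.concatenate([state, action])

      def update(self, state, action, next_state):
          x = np.concatenate([state, action])
          err = self.predict(state, action) - next_state
          self.W -= self.lr * np.outer(err, x)  # one SGD step
          return err

  def curiosity_reward(model, state, action, next_state, scale=1.0):
      """Intrinsic reward = scaled squared prediction error."""
      err = model.update(state, action, next_state)
      return scale * float(err @ err)

  # Example: high prediction error -> high intrinsic reward.
  model = ForwardModel(state_dim=3, action_dim=2)
  r_int = curiosity_reward(model, np.zeros(3), np.ones(2), np.ones(3))
  ```

  Adding this intrinsic term to the environment reward drives the agent toward poorly-understood states, which is the general principle the paper builds on.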
- 9.3 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Authors: Anthony Brohan, Noah Brown, and many others
- This work is notable for studying how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control. It reports impressive results, including improved generalization to novel objects, the ability to interpret unseen commands, and rudimentary reasoning skills. This could have a major impact on fields tackling complex robotic tasks with minimal human input; a sketch of the underlying action-tokenization idea follows below.
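  A key mechanism behind vision-language-action models like RT-2 is emitting robot actions as discrete tokens from the language model's vocabulary. A hedged sketch of that discretization step, where the bin count and action ranges are illustrative assumptions:

  ```python
  import numpy as np

  def action_to_tokens(action, low, high, n_bins=256):
      """Discretize a continuous robot action into integer bins so a
      vision-language model can emit it as text tokens.

      Illustrative: real systems fix the bin count and per-dimension
      ranges as part of the model's action encoding.
      """
      action = np.clip(action, low, high)
      scaled = (action - low) / (high - low)          # map to [0, 1]
      return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

  def tokens_to_action(tokens, low, high, n_bins=256):
      """Invert the discretization, decoding to bin centers."""
      return low + (tokens + 0.5) / n_bins * (high - low)

  # Example round trip for a 2-DoF action.
  low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
  tok = action_to_tokens(np.array([0.0, 0.5]), low, high)
  approx = tokens_to_action(tok, low, high)
  ```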
- 9.3 Robust Multi-Agent Reinforcement Learning with State Uncertainty
- Authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao
- Reason: Proposes a robust multi-agent Q-learning (RMAQ) algorithm that addresses the common problem of state uncertainty in multi-agent reinforcement learning, with both theoretical and empirical analysis (a generic worst-case update sketch follows below).
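  A generic way to build robustness to state uncertainty is to bootstrap from the least favorable state consistent with the observation. The tabular sketch below illustrates that worst-case flavor; it is not the paper's exact RMAQ update.

  ```python
  import numpy as np

  def robust_q_target(q_table, next_state, perturbations, reward, gamma=0.99):
      """Worst-case Bellman target over a set of plausible true states.

      `perturbations` enumerates the states an uncertain observation
      could correspond to; the agent bootstraps from the least
      favorable one. A generic robustness illustration only.
      """
      candidate_states = [next_state] + list(perturbations)
      worst_value = min(np.max(q_table[s]) for s in candidate_states)
      return reward + gamma * worst_value

  # Example with a 3-state, 2-action tabular Q-function.
  q = np.array([[1.0, 0.5], [0.2, 0.8], [0.0, 0.3]])
  target = robust_q_target(q, next_state=0, perturbations=[1, 2], reward=1.0)
  print(target)  # bootstraps from state 2, the worst case
  ```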
- 9.1 Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning
- Authors: Vasiliy Usatyuk, Sergey Egorov, Denis Sapozhnikov
- These authors apply information geometry to describe the ground states of Ising models, establishing a connection between machine learning and error-correcting coding. The approach is novel and has implications for new embedding methods based on trapping sets, as well as for advances in multiple fields; a brief Ising-energy refresher follows below.
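  For background, the ground states referenced here minimize the standard Ising MRF energy $E(s) = -\tfrac{1}{2} s^\top J s - h^\top s$ over spins $s_i \in \{-1, +1\}$. The brute-force sketch below shows that energy and a tiny ground-state search; it is textbook material, not the authors' topology-based method.

  ```python
  import itertools
  import numpy as np

  def ising_energy(spins, J, h):
      """Energy of an Ising configuration: E = -s^T J s / 2 - h^T s.

      J is a symmetric coupling matrix with zero diagonal, h an
      external field; standard Ising MRF convention.
      """
      s = np.asarray(spins, dtype=float)
      return -0.5 * s @ J @ s - h @ s

  def ground_state(n, J, h):
      """Brute-force ground-state search; tractable only for tiny n."""
      return min(itertools.product([-1, 1], repeat=n),
                 key=lambda s: ising_energy(s, J, h))

  # Ferromagnetic couplings on a 3-spin chain -> aligned spins.
  J = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
  h = np.zeros(3)
  print(ground_state(3, J, h))
  ```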
- 9.0 Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach
- Authors: Sihong He, Shuo Han, Fei Miao
- Reason: Develops a multi-agent reinforcement learning-based framework for balancing fleets of electric autonomous vehicles, addressing both supply and demand uncertainties.
- 8.9 DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
- Authors: Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin Zheng, Haoyuan Hu, Youfang Lin
- Reason: Introduces reinforcement learning (RL) to route prediction tasks, addressing the mismatch between training and test criteria commonly encountered with supervised deep neural networks (a REINFORCE-style sketch of this idea follows below).
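  The mismatch arises because supervised training optimizes likelihood while evaluation uses route-level metrics; RL sidesteps it by treating the non-differentiable metric as the reward. A hedged REINFORCE-style sketch with a toy metric (all names are illustrative, not DRL4Route's API):

  ```python
  import numpy as np

  def pairwise_order_accuracy(pred_order, true_order):
      """Toy route-level metric: fraction of stop pairs predicted in
      the correct relative order. Stands in for the non-differentiable
      evaluation criteria used in route prediction.
      """
      n = len(true_order)
      rank = {stop: i for i, stop in enumerate(true_order)}
      concordant = sum(
          rank[pred_order[i]] < rank[pred_order[j]]
          for i in range(n) for j in range(i + 1, n)
      )
      return concordant / (n * (n - 1) / 2)

  def reinforce_loss(log_probs, pred_order, true_order, baseline=0.0):
      """REINFORCE surrogate: -(reward - baseline) * sum(log pi).

      Minimizing this pushes the policy toward routes that score well
      on the evaluation metric itself.
      """
      reward = pairwise_order_accuracy(pred_order, true_order)
      return -(reward - baseline) * np.sum(log_probs)
  ```

  In practice `log_probs` would come from an autograd framework so the surrogate's gradient flows back into the route-prediction model.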
- 8.8 A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$\lambda$ Smoothness
- Authors: Hengshuai Yao
- This paper presents a novel Gradient Temporal Difference (GTD) algorithm, called Impression GTD, for minimizing the Norm of the Expected TD Update (NEU) objective. Its single step-size parameter and faster convergence rate make it simpler and more effective than earlier two-step-size GTD methods, a notable refinement in gradient-based reinforcement learning. A batch-gradient sketch of the NEU objective follows below.
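  For linear value functions $v(s) = \theta^\top \phi(s)$, the NEU objective is $\mathrm{NEU}(\theta) = \lVert \mathbb{E}[\delta \phi] \rVert^2$ with TD error $\delta = r + \gamma\, \theta^\top \phi' - \theta^\top \phi$. The batch sketch below descends this objective with a single step size; it is a simplified illustration, not the paper's Impression GTD estimator.

  ```python
  import numpy as np

  def neu_gradient_step(theta, batch, alpha=0.1, gamma=0.99):
      """One descent step on NEU(theta) = || E[delta * phi] ||^2.

      batch: list of (phi, reward, phi_next) transitions. Reusing one
      batch for both expectations is a biased shortcut; the paper
      handles estimation more carefully. Note the single step size
      `alpha`, mirroring the one-parameter design.
      """
      phis = np.array([t[0] for t in batch])
      rewards = np.array([t[1] for t in batch])
      phis_next = np.array([t[2] for t in batch])

      deltas = rewards + gamma * phis_next @ theta - phis @ theta
      g = (deltas[:, None] * phis).mean(axis=0)       # E[delta * phi]
      A = ((gamma * phis_next - phis)[:, :, None]
           * phis[:, None, :]).mean(axis=0)           # E[(g*phi' - phi) phi^T]
      return theta - alpha * 2.0 * A @ g              # grad NEU = 2 A g
  ```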
- 8.7 Learning Generalizable Tool Use with Non-rigid Grasp-pose Registration
- Authors: Malte Mosbach, Sven Behnke
- Reason: Proposes a novel method that enables reinforcement learning of tool-use behaviors, overcoming one of the key challenges: generalizing grasping configurations.
- 8.6 Dynamic deep-reinforcement-learning algorithm in Partially Observed Markov Decision Processes
- Authors: Saki Omi, Hyo-Sang Shin, Namhoon Cho, Antonios Tsourdos
- This work addresses the challenges that non-static disturbances pose for reinforcement learning, using LSTM networks to manage sequential information along the trajectory. It provides valuable insight into how such information can be handled, potentially aiding similar research. A minimal recurrent-policy sketch follows below.
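  A common architecture for such partially observed settings is a recurrent policy whose LSTM hidden state summarizes the observation history in place of the unobserved true state. An illustrative PyTorch sketch (sizes and structure are assumptions, not the paper's exact network):

  ```python
  import torch
  import torch.nn as nn

  class RecurrentPolicy(nn.Module):
      """LSTM policy for a POMDP: the hidden state accumulates the
      observation history, so disturbances seen earlier in the
      trajectory can inform later actions. Illustrative only.
      """
      def __init__(self, obs_dim, action_dim, hidden_dim=64):
          super().__init__()
          self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
          self.head = nn.Linear(hidden_dim, action_dim)

      def forward(self, obs_seq, hidden=None):
          # obs_seq: (batch, time, obs_dim); `hidden` carries memory
          # across calls when acting one step at a time.
          out, hidden = self.lstm(obs_seq, hidden)
          return self.head(out), hidden

  # Example: act step by step, threading the hidden state through time.
  policy = RecurrentPolicy(obs_dim=8, action_dim=2)
  hidden = None
  for _ in range(5):
      obs = torch.randn(1, 1, 8)  # one observation at a time
      logits, hidden = policy(obs, hidden)
  ```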