Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO
¹Zhejiang University, ²Differential Robotics
*Equal contribution; †Corresponding author.
Abstract
Vision-Language-Action (VLA) models offer a promising end-to-end paradigm for unmanned aerial vehicles (UAVs) to accomplish complex tasks specified by fine-grained instructions. However, standard supervised fine-tuning (SFT) suffers from data scarcity, limited generalization, and weak supervision for nuanced and complicated human intents. Reinforcement fine-tuning offers a natural way to mitigate these issues and align policy behaviors with human intents through designable feedback, but applying it to aerial navigation remains difficult due to inefficient exploration in expansive continuous spaces. To address this, we introduce an efficient reinforcement learning (RL) framework for VLA-based aerial navigation. At its core, we propose EG-GRPO (Expert-Guided Group Relative Policy Optimization), which augments online rollouts with few-shot expert data. Additionally, we design a heterogeneous pipeline enabling parallel simulation and inference, which reduces rollout time by 43.5%. Across multiple tasks specified by complex human intents, EG-GRPO improves the success rate to 2.13× that of the SFT baseline, while improving intent alignment performance by 60.9%. These results demonstrate that our framework can move aerial navigation toward precise intent-aligned flight. Our videos are available on Intent-Aligned_AerialNav; code will be released soon.
Simulation Demos
Real-World Demos
More demos coming soon...
Method Highlights
EG-GRPO Training Pipeline
Initializes the UAV navigation policy from SFT, then injects few-shot expert trajectories into GRPO groups to provide stable trajectory-level rewards for intent-aligned policy updates.
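One way to picture the expert injection is through the group-relative advantage that GRPO computes from per-trajectory rewards. The sketch below is a minimal illustration under stated assumptions, not the released implementation: the function names, the scalar rewards, and the choice to simply append expert rewards to the rollout group are all hypothetical. It shows why a group in which every online rollout fails carries no learning signal on its own, and how a single expert trajectory restores a non-zero advantage.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: normalize each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def expert_augmented_advantages(rollout_rewards, expert_rewards, eps=1e-6):
    """Hypothetical expert-guided variant: expert trajectories join the group.

    Assumption: expert rewards are appended to the online group before
    normalization. The page does not specify the exact mixing rule.
    """
    group = list(rollout_rewards) + list(expert_rewards)
    advantages = group_relative_advantages(group, eps)
    n = len(rollout_rewards)
    return advantages[:n], advantages[n:]


# A group where every online rollout fails (all rewards equal).
online = [0.0, 0.0, 0.0, 0.0]
expert = [1.0]  # one successful expert trajectory (illustrative value)

plain = group_relative_advantages(online)
rollout_adv, expert_adv = expert_augmented_advantages(online, expert)
```

Without the expert sample, every entry of `plain` is zero, so the policy gradient vanishes for that group. With it, the failed rollouts receive negative advantages and the expert trajectory a positive one.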
Heterogeneous Parallelization of Inference and Simulation
Decouples Isaac Lab rollout from VLA inference using double-buffered task groups across NVIDIA L20 and A100 GPUs, reducing rollout idle time and accelerating online RL.
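The double-buffering idea can be sketched with two workers and two queues: while one task group is being stepped in simulation, the other is going through inference, so neither stage sits idle waiting for the other. The code below is a toy sketch under assumptions, not the actual pipeline: `simulate` and `infer` are hypothetical stand-ins for an Isaac Lab step and a VLA forward pass, threads stand in for separate devices, and the mapping of stages to specific GPUs is left out because the page does not state it.

```python
import queue
import threading


def simulate(actions):
    """Hypothetical stand-in for one simulator step: actions -> observations."""
    return [a + 1 for a in actions]


def infer(observations):
    """Hypothetical stand-in for one policy forward pass: observations -> actions."""
    return [o * 2 for o in observations]


def run_double_buffered(initial_obs, num_steps):
    """Ping-pong task groups between an inference worker and a simulation worker."""
    to_infer = queue.Queue()  # groups waiting for the policy
    to_sim = queue.Queue()    # groups waiting for the simulator
    total = len(initial_obs) * num_steps
    final_obs = {}

    def infer_worker():
        for _ in range(total):
            gid, obs = to_infer.get()
            to_sim.put((gid, infer(obs)))

    def sim_worker():
        steps_done = {gid: 0 for gid in initial_obs}
        for _ in range(total):
            gid, actions = to_sim.get()
            obs = simulate(actions)
            steps_done[gid] += 1
            if steps_done[gid] < num_steps:
                to_infer.put((gid, obs))  # hand the group back for the next action
            else:
                final_obs[gid] = obs      # rollout for this group is complete

    # Seed both groups so one can be simulated while the other is inferred.
    for gid, obs in initial_obs.items():
        to_infer.put((gid, obs))

    workers = [threading.Thread(target=infer_worker),
               threading.Thread(target=sim_worker)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return final_obs


results = run_double_buffered({"A": [0, 1], "B": [2]}, num_steps=3)
```

Each group alternates strictly between the two stages, so per-group results are deterministic even though the interleaving of groups across workers is not.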