Neuro-Symbolic Safety Guidance for Vision-Language-Action Models via Constrained Flow Matching
Abstract
Vision-Language-Action (VLA) models have demonstrated promising generalization across robotic manipulation tasks, yet their real-world deployment remains limited by the lack of effective safety measures. Specifically, existing safety measures only prevent collisions caused by the robot’s next action. In this paper, we propose a neuro-symbolic safety guidance mechanism for flow-matching-based VLAs that enables predictive collision avoidance. Flow-matching-based VLAs determine the next actions by predicting a trajectory (a sequence of actions) through an iterative neural flow matching process. Our method formulates safety enforcement as a minimum-norm constrained optimization problem that corrects safety violations in the noisy intermediate trajectory predictions produced during denoising. By analyzing predicted trajectories and applying corrections during iterative denoising, our approach anticipates collisions before they become unavoidable. This interleaving of symbolic constraint satisfaction with neural trajectory generation enables predictive collision avoidance rather than reactive intervention. On the SafeLIBERO benchmark, our method achieves 82.8% collision avoidance and 81.6% task success, improvements of 6.3% and 19.8%, respectively, over single-step methods, with the largest gains on long-horizon tasks where compounding distribution shift is most pronounced. Our results suggest that predictive, trajectory-aware safety guidance during generation offers a promising alternative to reactive single-action methods for safe VLA deployment.
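To make the idea of a minimum-norm correction inside the denoising loop concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single spherical obstacle, trajectories stored as an array of 3D waypoints, and Euler integration of a velocity field. The names `min_norm_correction`, `guided_denoise`, `velocity`, and the default guidance window are all hypothetical and chosen for illustration only. For a constraint of the form "distance to the obstacle center is at least the radius", the smallest shift that restores feasibility moves each violating waypoint along the outward normal to the sphere surface and leaves safe waypoints untouched.

```python
import numpy as np

def min_norm_correction(traj, center, radius):
    """Smallest per-waypoint shift that places every waypoint outside a sphere.

    traj:   (H, 3) array of waypoints (an intermediate, possibly noisy trajectory)
    center: (3,) obstacle center (assumed known from perception)
    radius: obstacle radius including any safety margin
    """
    offset = traj - center
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    # Violation depth is positive only for waypoints inside the obstacle.
    violation = np.maximum(0.0, radius - dist)
    # Outward unit normal; a waypoint exactly at the center has no defined
    # normal and is left unchanged in this simplified sketch.
    normal = offset / np.maximum(dist, 1e-9)
    return traj + violation * normal

def guided_denoise(velocity, x0, center, radius, steps=10, window=(0.3, 0.6)):
    """Euler integration of a flow with corrections applied mid-denoising.

    velocity: callable (x, tau) -> array like x; stands in for the learned
              flow matching network (hypothetical interface).
    window:   range of flow time tau in which corrections are applied
              (illustrative default).
    """
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        tau = k * dt
        x = x + dt * velocity(x, tau)      # one denoising step
        if window[0] <= tau <= window[1]:  # correct only inside the window
            x = min_norm_correction(x, center, radius)
    return x

# Usage: one waypoint inside a unit sphere at the origin, one outside.
traj = np.array([[0.1, 0.0, 0.0], [2.0, 0.0, 0.0]])
corrected = min_norm_correction(traj, center=np.zeros(3), radius=1.0)
# The first waypoint is pushed to the sphere surface near [1, 0, 0];
# the second is already safe and is returned unchanged.
```

Because the correction is zero wherever the constraint already holds, it perturbs the generated trajectory only where needed. Later denoising steps in a real model would then continue from the corrected trajectory; handling multiple obstacles or coupled constraints would generally require solving a small quadratic program rather than this closed-form projection.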
Safety Guidance Visualization
Visualization of guided denoising trajectories. Mid-denoising guidance (τ ≈ 0.3–0.6) produces smooth, collision-free paths while preserving the learned action distribution.
SafeLIBERO Evaluation Rollouts
Side-by-side comparison of the same episodes with and without safety guidance.