Neuro-Symbolic Safety Guidance for Vision-Language-Action Models via Constrained Flow Matching

William English¹, Hao Zheng², Rickard Ewetz¹

¹University of Florida ²University of Central Florida

Paper, code, and arXiv preprint: coming soon.

Overview of neuro-symbolic safety guidance integrated into the flow matching denoising process of a VLA model

Our neuro-symbolic safety guidance mechanism steers VLA model action generation away from obstacles during the flow matching denoising process, enabling predictive collision avoidance without retraining.

Abstract

Vision-Language-Action (VLA) models have demonstrated promising generalization across robotic manipulation tasks, yet their real-world deployment remains limited by the lack of effective safety measures. Specifically, existing safety measures are reactive: they only prevent collisions caused by the robot’s next action. In this paper, we propose a neuro-symbolic safety guidance mechanism for flow-matching-based VLAs that enables predictive collision avoidance. Flow-matching-based VLAs determine the next actions by predicting a trajectory (a sequence of actions) through an iterative neural flow matching process. Our method formulates safety enforcement as a minimum-norm constrained optimization problem that corrects safety violations in the noisy intermediate trajectory predictions produced during denoising. By analyzing predicted trajectories and applying corrections during iterative denoising, our approach anticipates collisions before they become unavoidable. This interleaving of symbolic constraint satisfaction with neural trajectory generation enables predictive collision avoidance rather than reactive intervention. On the SafeLIBERO benchmark, our method achieves 82.8% collision avoidance and 81.6% task success, improvements of 6.3% and 19.8%, respectively, over single-step methods, with the largest gains on long-horizon tasks, where compounding distribution shift is most pronounced. Our results suggest that predictive, trajectory-aware safety guidance during generation offers a promising alternative to reactive single-action methods for safe VLA deployment.
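To make the minimum-norm idea concrete, the sketch below solves the smallest correction that satisfies a single linearized safety constraint. It is an illustration only, not the paper's implementation: the abstract does not specify the constraint form, so the spherical obstacle, the clearance function, and the names `min_norm_correction` and `sphere_clearance` are all assumptions. For one constraint h(x) + ∇h(x)·δ ≥ 0, the minimizer of ‖δ‖² has the closed form δ = max(0, −h) ∇h / ‖∇h‖².

```python
import numpy as np

def min_norm_correction(h_value, h_grad):
    """Closed-form solution of  min ||delta||^2  s.t.  h_value + h_grad . delta >= 0.

    Hypothetical helper: a single linearized constraint with a nonzero gradient.
    """
    h_grad = np.asarray(h_grad, dtype=float)
    if h_value >= 0.0:
        # Constraint already satisfied, so the smallest correction is zero.
        return np.zeros_like(h_grad)
    # Move along the gradient just far enough to reach the constraint boundary.
    return (-h_value / np.dot(h_grad, h_grad)) * h_grad

def sphere_clearance(point, center, radius):
    """Signed clearance to an assumed spherical obstacle, plus its gradient.

    Assumes the point is not exactly at the sphere center.
    """
    offset = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    dist = np.linalg.norm(offset)
    return dist - radius, offset / dist

# One waypoint that lies inside a unit sphere centered at the origin.
waypoint = np.array([0.5, 0.0, 0.0])
h, grad = sphere_clearance(waypoint, center=[0.0, 0.0, 0.0], radius=1.0)
delta = min_norm_correction(h, grad)
corrected = waypoint + delta
```

With several obstacles or coupled waypoints, the same problem becomes a small quadratic program rather than a closed-form step; the single-constraint case is shown here only because it can be verified by hand.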

Safety Guidance Visualization

Visualization of safety-guided flow matching trajectories showing collision avoidance

Visualization of guided denoising trajectories. Mid-denoising guidance (τ ≈ 0.3–0.6) produces smooth, collision-free paths while preserving the learned action distribution.
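The caption's mid-denoising window can be sketched as an Euler integration loop that applies a correction only while τ lies in a chosen interval. Everything here is a toy under stated assumptions: the constant velocity field stands in for the VLA's learned flow, the obstacle is an assumed sphere, and `guided_denoise` and `push_outside_sphere` are hypothetical names. The toy's final trajectory says nothing about the results reported for the actual method.

```python
import numpy as np

def push_outside_sphere(traj, center, radius):
    """Move every waypoint that lies inside the sphere radially onto its surface."""
    offset = traj - center
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    fallback = np.zeros(traj.shape[1])
    fallback[0] = 1.0  # arbitrary direction for a waypoint exactly at the center
    direction = np.where(dist > 1e-12, offset / np.maximum(dist, 1e-12), fallback)
    return np.where(dist < radius, center + radius * direction, traj)

def guided_denoise(x0, velocity, correct, n_steps=10, window=(0.3, 0.6)):
    """Euler-integrate dx/dtau = velocity(x, tau) from tau = 0 to tau = 1.

    `correct` is applied after a step only when the new tau is inside `window`.
    Returns the final sample and the number of corrections applied.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    n_corrections = 0
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
        tau = (k + 1) / n_steps
        if window[0] <= tau <= window[1]:
            x = correct(x)
            n_corrections += 1
    return x, n_corrections

# Toy setup (all values are assumptions): four waypoints drift in a straight
# line from `start` to `target`, passing through a spherical obstacle.
center, radius = np.array([0.5, 0.0, 0.0]), 0.2
start = np.zeros((4, 3))
start[:, 1] = 0.1
target = start + np.array([1.0, 0.0, 0.0])
drift = target - start

def velocity(x, tau):
    return drift  # constant stand-in for a learned velocity field

def correct(x):
    return push_outside_sphere(x, center, radius)

guided, n_guided = guided_denoise(start, velocity, correct)
unguided, n_unguided = guided_denoise(start, velocity, correct, window=(2.0, 2.0))
```

Restricting the correction to a window is a design choice in this sketch: early samples are mostly noise, and late corrections leave the flow few steps to absorb them. Whether that matches the authors' reasoning is not stated in the source.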

SafeLIBERO Evaluation Rollouts

Side-by-side comparison of the same episodes with and without safety guidance.

Related Works

VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer

SafeVLA: Towards Safety Alignment of VLA Models via Constrained Learning

π0: A Vision-Language-Action Flow Model for General Robot Control
