Constrained Decoding for Robot Foundation Models

How lessons from language models inspired a new way to make robot foundation models provably safe.


Recent Robot Foundation Models (RFMs) such as SPOC, FLaRe, PoliFormer, and OpenVLA map multimodal inputs (RGB, language, proprioception) directly to action sequences and generalize well across navigation and manipulation tasks. Trained on large language-conditioned trajectory datasets, they achieve strong zero-shot transfer on diverse goals such as object-centric tasks ("find a mug"), spatial tasks ("visit all rooms"), and attribute-conditioned variants ("locate the chair closest to the refrigerator"). While these models demonstrate robust real-world performance, they remain purely data-driven and have no explicit notion of safety. To deploy these policies in the wild, we need methods that go beyond relying on the implicit biases of the pretraining data.

Find and Hold an Apple (Video source: FLaRe)
Grasp a Mug (Video source: FLaRe)

One way to enforce such rules would be to fine-tune a pretrained model on safe demonstrations. However, retraining is expensive and cannot guarantee provable safety due to model stochasticity. To overcome this challenge, we need an inference-time way to enforce safety requirements.

Background on structured outputs for LLMs

For large language models (LLMs), constrained decoding enforces syntactic or structural rules, such as conformance to a JSON schema, at inference time without any retraining. Early beam-search approaches pruned tokens that violate constraints [1]; recent frameworks enable grammar- and program-aligned outputs via lightweight runtime control [2], [3], and follow-on work extends this to regex schemas and context-free grammars [4], [5].

In all these methods, the core principle is the same: mask invalid next tokens before they are produced, pruning the probability mass of illegal continuations so the model outputs well-formed text (e.g., JSON) [2], [4].
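As a minimal sketch of this principle (not the API of any particular framework), assume a hypothetical `is_valid_continuation` checker, e.g., backed by a regex automaton or a grammar parser:

```python
import torch

def constrained_step(logits, prefix_ids, is_valid_continuation):
    # `is_valid_continuation(prefix_ids, token_id)` is a hypothetical checker
    # (e.g., a compiled regex automaton or grammar parser).
    mask = torch.full_like(logits, float("-inf"))
    for tok in range(logits.shape[-1]):
        if is_valid_continuation(prefix_ids, tok):
            mask[tok] = 0.0
    # Illegal tokens keep -inf logits, i.e., zero probability after softmax.
    probs = torch.softmax(logits + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```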

Left: Masking invalid tokens during constrained decoding (source: LMSYS blog). Right: JSON schema-aligned decoding using Outlines.

Key insight: If we can prune invalid completions in LLMs, could we prune unsafe action sequences for RFMs? Since these RFMs are largely autoregressive transformers, can we apply similar ideas from constrained decoding?

The catch: A large class of robotics constraints are temporal and defined over state-based trajectories (e.g., never enter an unsafe region). Additionally, most current RFMs output action tokens. Unlike in LLMs, constraint checking cannot happen purely in token space: it requires forward simulation (a dynamics stepping function) to evaluate specification satisfaction as actions unfold [6].

Large Language Models (LLMs)
  • Constraints defined on tokens (regex/grammar/JSON) that the model outputs [4], [5]
  • Validity checkable on tokens during generation
  • Some constraints can be captured using Finite State Automata
Robot Foundation Models (RFMs)
  • Constraints defined on state space while model outputs actions [6]
  • Requires forward dynamics to evaluate outcomes [6]
  • Constraints can be captured using Temporal Logics

Constrained decoding for RFMs

We take the constrained-decoding principle and apply it to enforce safety specifications over state trajectories. Our safety rules are captured by Signal Temporal Logic (STL), a logic defined over continuous signals from dynamical systems. We propose safety-specification-aligned decoding (SafeDec), which simulates candidate actions with an approximate dynamics model and evaluates STL satisfaction in real time, directly inside the decoding loop [6]. Given an STL rule \( \varphi \) and a lightweight dynamics model \( f(x_t, a_t) \), each candidate next action is forecasted and either masked (HCD) if it leads to a violation, or reweighted (RCD) by its satisfaction score to steer toward safer behavior [7].
In LLMs, constraints are syntactic and local; in RFMs, constraints are temporal and depend on forward-simulated dynamics. SafeDec bridges the gap with STL-guided decoding [6].
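To make the satisfaction score concrete, here is a toy robustness computation for an invariant avoidance spec; a real pipeline would use an STL library such as STLCG++ (mentioned below), and the circular-region geometry is purely an illustrative assumption:

```python
import numpy as np

def robustness_always_avoid(traj_xy, center, radius):
    # Quantitative semantics of G[0,T] ( ||x - center|| > radius ):
    # "always" maps to a min over time; the sign indicates satisfaction
    # and the magnitude measures the safety margin.
    dists = np.linalg.norm(np.asarray(traj_xy) - np.asarray(center), axis=-1) - radius
    return float(dists.min())

# A trajectory skirting a forbidden disk of radius 0.5 at the origin:
traj = [(1.0, 1.0), (0.8, 0.4), (0.9, -0.3)]
print(robustness_always_avoid(traj, center=(0.0, 0.0), radius=0.5))  # > 0: satisfied
```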

SafeDec sits between the policy's final logits and action selection. Given an STL spec φ and a lightweight dynamics model, it either masks or reweights action logits at each step so that the sampled action respects safety both now and in the near future.

Hard Constrained Decoding (HCD)

If a candidate action's predicted next state would violate φ, set its logit to −∞ (so it receives zero probability after softmax). This yields provable compliance under the assumed dynamics.
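A minimal sketch of this masking rule, where `step_dynamics` and `stl_robustness` are placeholders for the approximate dynamics model and the STL evaluator:

```python
import torch

def hcd_mask(action_logits, state, step_dynamics, stl_robustness):
    # For each discrete candidate action, forward-simulate one step with the
    # approximate dynamics and mask the action if the predicted successor
    # state violates the spec (robustness < 0).
    masked = action_logits.clone()
    for a in range(action_logits.shape[-1]):
        next_state = step_dynamics(state, a)
        if stl_robustness(next_state) < 0.0:
            masked[a] = float("-inf")  # zero probability after softmax
    return masked
```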

Robustness Constrained Decoding (RCD)

Compute the STL satisfaction score (robustness) ρ for each candidate's predicted successor state and convert it to a weight that shifts the logits, boosting safer actions and suppressing risky ones with tunable strength β. This preserves task performance while greatly reducing violations.
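One natural instantiation of this reweighting (the paper's exact scheme may differ) is an additive logit shift by β·ρ before renormalizing:

```python
import torch

def rcd_probs(action_logits, state, step_dynamics, stl_robustness, beta=1.0):
    # Shift each action's logit by beta * robustness of its predicted
    # successor state; larger beta prioritizes spec satisfaction over
    # task performance.
    rho = torch.tensor([stl_robustness(step_dynamics(state, a))
                        for a in range(action_logits.shape[-1])])
    return torch.softmax(action_logits + beta * rho, dim=-1)
```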

SafeDec is model-agnostic: it only needs (1) access to decoder logits and (2) an approximate dynamics function. STL evaluation is done efficiently via STLCG++ for real-time inference.

Sample Visualisations

Each plot shows a bird's eye view of trajectories starting from the white dot under the instruction “find a sofa”. Left: The unconstrained model passes through two forbidden regions (red squares) on the way to the yellow sofa. Right: SafeDec modifies the trajectories to respect STL safety specifications while still reaching the goal.
Left: FPV view of the base model violating the requirement. Right: FPV view of the model with SafeDec.

Results at a Glance

Evaluated on hundreds of procedurally generated AI2-THOR scenes with three SOTA policies (SPOC, FLaRe, PoliFormer), SafeDec enforced two invariant specs: \( \varphi_{\text{avoid}} \) (never enter forbidden zones) and \( \varphi_{\text{geofence}} \) (stay within allowed regions).

  • Unconstrained: Only ~68–78% geofence and ~72–77% avoid satisfaction.
  • HCD: ~100% spec satisfaction across models/specs, with a modest 5–10% success drop vs. baseline.
  • RCD: ~80–95% spec satisfaction with success rates close to unconstrained (often within 1–3%); better safety–performance trade-off than HCD.

Ablations

Since we assume a simple dynamics model (a unicycle, sketched below) for generating states from proposed actions, we evaluate the impact of noisy dynamics on final satisfaction. We also sweep over the β parameter for RCD, which controls how much specification satisfaction is prioritized over task performance.
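For reference, a minimal discrete-time unicycle step; the mapping from a policy's discrete action tokens to (v, ω) inputs is an illustrative assumption:

```python
import numpy as np

def unicycle_step(state, v, omega, dt=0.1):
    # state = (x, y, theta); v is forward speed, omega is turn rate.
    x, y, theta = state
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])

# e.g., mapping a discrete "MoveAhead" token to (v, omega) = (0.25, 0.0)
# would be one such illustrative action interface.
```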

Left: STL satisfaction (%) for HCD and RCD under baseline vs noisy dynamics across base models. Right: Effect of β on success rate and safety satisfaction.

SafeDec remains effective under dynamics noise; both HCD and RCD degrade gracefully. For our β ablation, we observe that as β increases for PoliFormer, both STL satisfaction and success rate improve in tandem until β = 10, suggesting that moderate regularization can actually aid policy execution. Beyond this, STL satisfaction continues to improve but at the cost of lower success rates. For FLaRe, larger β values improve STL satisfaction but reduce success rates. These results show that the influence of β is model-dependent, but in general they demonstrate that SafeDec provides a tunable mechanism to balance safety and performance objectives.

References

  1. Hokamp, C., & Liu, Q. (2017). Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. ACL.
  2. Willard, B. T., & Louf, R. (2023). Efficient Guided Generation for Large Language Models. arXiv: 2307.09702.
  3. Beurer-Kellner, L., et al. (2023). Prompting is Programming: A Query Language for LLMs. PLDI. DOI: 10.1145/3591300.
  4. Welleck, S., et al. (2024). From Decoding to Meta-Generation: Inference-Time Algorithms for LLMs. arXiv: 2406.16838.
  5. Park, K., et al. (2024). Grammar-Aligned Decoding. NeurIPS.
  6. SafeDec paper (this work): LLM vs RFM constraint contrast; need for dynamics stepping for spec checking (see §2.2 and §3.1).
  7. SafeDec paper (this work): HCD masking by setting logits to −∞ for spec-violating actions; RCD reweighting by STL robustness (see §3.2–§3.3).
  8. SafeDec paper (this work): Model-agnostic, inference-time STL enforcement without retraining; overall vision (Intro & §3).