Recent works have shown the promise of inference-time search over action samples for improving generative robot policies. In particular, optimizing cross-chunk coherence via bidirectional decoding has proven effective in boosting the consistency and reactivity of diffusion policies. However, this approach becomes computationally expensive as the diversity of sampled actions grows. In this paper, we introduce self-guided action diffusion, a more efficient variant of bidirectional decoding tailored to diffusion-based policies. The core of our method is to guide the proposal distribution at each diffusion step using the policy's prior decision. Experiments on simulation tasks show that the proposed self-guidance enables near-optimal performance at negligible inference cost. Notably, under a tight sampling budget, our method achieves up to 70% higher success rates than existing counterparts on challenging dynamic tasks.
Self-Guided Action Diffusion (Self-GAD) enhances diffusion-based robot policies by injecting a guidance signal during inference. At each denoising step, we apply a lightweight update that encourages actions to stay close to prior predictions while allowing adaptation.
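A plausible form of this update, written in our notation as a sketch rather than the exact formulation: at denoising step $k$, the intermediate action chunk $a^{k}$ takes a gradient step on a prior-consistency loss,

$$
a^{k} \;\leftarrow\; a^{k} - \eta \, \nabla_{a^{k}} \, \mathcal{L}\!\left(a^{k}, \hat{a}^{\mathrm{prior}}\right),
$$

where $\hat{a}^{\mathrm{prior}}$ is the action chunk predicted at the previous control step, shifted to align in absolute time, and $\eta$ is a guidance step size.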
The loss penalizes deviations from the prior trajectory with an exponentially decaying weight.
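As a sketch in the same notation (the horizon indexing and exact weighting are our assumptions), let $H$ be the action horizon and $s$ the number of actions executed since the prior chunk was predicted; an exponentially decayed loss then reads

$$
\mathcal{L}\!\left(a, \hat{a}^{\mathrm{prior}}\right) \;=\; \sum_{t=0}^{H-s-1} \beta^{\,t} \left\lVert a_{t} - \hat{a}^{\mathrm{prior}}_{t+s} \right\rVert_{2}^{2},
$$

where $\beta \in (0, 1)$ controls how quickly the prior's influence decays: early actions, where the previous plan is still most relevant, are constrained strongly, while later actions remain free to adapt. This $\beta$ is the prior-relevance weight revisited in the dynamic-environment experiments below.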
Further, we integrate Self-GAD into GR00T-N1, a robotic foundation model built on a flow-matching Diffusion Transformer. This plug-in method improves closed-loop performance, particularly under persistent noise and shifting targets that challenge standard open-loop diffusion policies.
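To make the mechanism concrete, the sketch below shows how such a guidance hook could sit inside a generic denoising loop. This is a minimal sketch under assumed interfaces: `policy.denoise_step`, the chunk layout, and the hyperparameters `eta` and `beta` are hypothetical placeholders, not the GR00T-N1 API.

```python
import torch

def self_guided_sample(policy, obs, prior_actions, shift,
                       num_steps=10, eta=0.1, beta=0.5):
    """Sample an action chunk, guiding each denoising step toward the
    previously predicted chunk (hypothetical interface, not GR00T-N1 API)."""
    a = torch.randn_like(prior_actions)        # start from pure noise
    overlap = prior_actions.shape[0] - shift   # timesteps shared with the prior
    # Exponentially decaying weights over the overlapping horizon.
    w = beta ** torch.arange(overlap, dtype=a.dtype)

    for k in range(num_steps):
        # One ordinary denoising / flow-integration step of the base policy.
        a = policy.denoise_step(a, obs, step=k)

        # Self-guidance: one gradient step on the prior-consistency loss.
        a = a.detach().requires_grad_(True)
        diff = a[:overlap] - prior_actions[shift:]
        loss = (w[:, None] * diff.pow(2)).sum()
        (grad,) = torch.autograd.grad(loss, a)
        a = (a - eta * grad).detach()
    return a
```

On the very first control step there is no prior chunk, so the guidance term would simply be skipped and the base sampler used unmodified.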
We evaluate Self-GAD against baseline diffusion policies under a single-sample closed-loop control strategy. Self-GAD outperforms Random sampling, achieving an average success rate 71.4% higher across all RoboMimic benchmarks.
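For clarity, the single-sample closed-loop protocol can be sketched as follows, reusing the hypothetical `self_guided_sample` above; `env` is assumed to follow a gym-style interface, and `exec_h` (actions executed per replanning step) is likewise our placeholder.

```python
def closed_loop_rollout(env, policy, exec_h=1, max_steps=400):
    """Single-sample closed-loop control: replan every `exec_h` steps,
    passing the previously predicted chunk as the guidance prior."""
    obs = env.reset()
    prior = None
    for _ in range(max_steps // exec_h):
        if prior is None:
            actions = policy.sample(obs)   # unguided first chunk (hypothetical API)
        else:
            actions = self_guided_sample(policy, obs, prior, shift=exec_h)
        for action in actions[:exec_h]:    # execute only a short prefix
            obs, reward, done, info = env.step(action)
            if done:
                return reward
        prior = actions                    # current chunk guides the next one
    return 0.0
```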
We integrate Self-GAD into GR00T-N1-2B, a large-scale robotic foundation model using flow matching and multimodal embeddings. We fine-tune GR00T-N1-2B on 100 demonstrations per task in single-action-horizon settings (PnP Counter to Cab, Turn Stove On, Turn Sink Faucet On, Turn Microwave Off, Coffee, and Transport). Integrated with the foundation model, Self-GAD boosts success rates in both RoboCasa and DexMG, by 28.4% and 12%, respectively.
Compared to coherence sampling, Self-GAD achieves high performance with fewer samples. On PushT, Self-GAD reaches near-optimal performance with a single sample, matching what coherence sampling attains with 16 samples.
We evaluate Self-GAD in noisy environments and across varied training datasets. The method consistently generalizes under distribution shift and environmental perturbations. On the RoboMimic Square task, guidance improves consistency in single-sample rollouts, with benefits increasing in high-variance settings as action diversity grows. In dynamic PushT environments, Self-GAD significantly boosts closed-loop performance, particularly under high variability.
Self-GAD reduces the performance gap between static and dynamic environments. Here, we confirm that a fine-tuned β weight governing prior relevance can be reused across dynamic settings.