AdDiffusion reformulates diffusion model inference as a Markov Decision Process (MDP). A lightweight PPO-trained policy network observes the evolving latent state during denoising and selects from three actions at each step: continue, stop (early termination), or refine (selective region re-denoising). This enables prompt-dependent adaptive computation: simple prompts terminate early, while complex prompts allocate additional effort to under-generated regions.
Standard diffusion inference uses a fixed number of steps regardless of prompt complexity. AdDiffusion wraps any pretrained diffusion model, without modifying its weights, and learns when to stop, when to continue, and when to selectively refine, all conditioned on the prompt and the current generation quality.
Noise ──► [Agent decides at each step] ──► Final Image
              │
              ├── continue: standard denoising step
              ├── stop: early termination (save compute)
              └── refine: re-denoise under-generated regions
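The loop below is a minimal sketch of this wrapper, assuming a diffusers-style Stable Diffusion pipeline. `extract_state`, `refine_regions`, and `embed_prompt` are illustrative placeholders for the repository's state-feature, refinement, and text-encoding code, and classifier-free guidance is omitted for brevity:

```python
import torch

ACTIONS = ("continue", "stop", "refine")

@torch.no_grad()
def adaptive_denoise(pipe, policy, prompt, max_steps=50, k_refine=2):
    """Denoise with a frozen pretrained pipeline, letting the policy
    pick continue/stop/refine at every step."""
    pipe.scheduler.set_timesteps(max_steps)
    latents = torch.randn(1, 4, 64, 64, device=pipe.device)
    latents = latents * pipe.scheduler.init_noise_sigma
    text_emb = embed_prompt(pipe, prompt)                  # placeholder: CLIP text encoding
    for t in pipe.scheduler.timesteps:
        state = extract_state(pipe, latents, text_emb, t)  # ~1672-dim feature vector
        action = ACTIONS[policy(state).sample().item()]    # sample in training, argmax at eval
        if action == "stop":                 # early termination: save the remaining NFE
            break
        if action == "refine":               # re-denoise only under-generated regions
            latents = refine_regions(pipe, latents, text_emb, t, k=k_refine)
            continue
        model_in = pipe.scheduler.scale_model_input(latents, t)
        noise_pred = pipe.unet(model_in, t, encoder_hidden_states=text_emb).sample
        latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
    return pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
```

Because the backbone's weights are never touched, the same loop can in principle wrap SD 1.5, SDXL, or Flux.1-schnell.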
- H1 (Efficiency): Equivalent quality to 50-step inference at ≤30 average NFE (number of function evaluations)
- H2 (Quality): Higher quality than fixed-step methods at equal compute budget
- H3 (Adaptive): Prompt-dependent behavior, with simple prompts stopping early and complex prompts using more steps
- H4 (Refinement): Selective region refinement outperforms additional full-image steps
| Component | Description |
|---|---|
| State | CLIP image + text features, timestep embedding, quality metrics (~1672-dim) |
| Policy | 2-layer MLP → 3-action softmax (continue/stop/refine) |
| Value | 2-layer MLP → scalar value estimate |
| Reward | Quality deltas (CLIP, ImageReward) + DINO stability - NFE penalty + terminal bonus |
| Refinement | Cross-attention masks with Gaussian blur, k=2 denoising iterations |
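A sketch of the policy and value heads plus the reward mix, matching the shapes in the table; the hidden width and reward weights are illustrative assumptions, not the repository's tuned values:

```python
import torch
import torch.nn as nn

STATE_DIM = 1672  # CLIP image + text features, timestep embedding, quality metrics

class PolicyNet(nn.Module):
    """2-layer MLP -> categorical distribution over {continue, stop, refine}."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

class ValueNet(nn.Module):
    """2-layer MLP -> scalar state-value estimate for PPO's critic."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

def step_reward(d_clip, d_imreward, dino_stability, nfe_cost,
                terminal_bonus=0.0, w=(1.0, 1.0, 0.5, 0.1)):
    """Composite reward: quality deltas + stability - NFE penalty (+ terminal bonus).

    The weights `w` are placeholders; each term is normalized before mixing.
    """
    return (w[0] * d_clip + w[1] * d_imreward
            + w[2] * dino_stability - w[3] * nfe_cost + terminal_bonus)
```

Returning a `Categorical` lets PPO sample actions and compute log-probabilities during training, while inference can simply take the argmax of the logits.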
addiffusion/
├── src/
│   ├── diffusion/    # Pipeline wrapper, attention extraction, region refinement
│   ├── agent/        # Policy/value networks, state features, PPO, episode loop
│   ├── rewards/      # Composite reward with normalization
│   ├── evaluation/   # Metrics, VLM-as-Judge, significance testing
│   └── baselines/    # Fixed-step, SAG, Attend-and-Excite, oracle
├── tests/            # Unit + integration tests
├── configs/          # Hydra YAML configs
├── scripts/          # SLURM templates
├── discovery.md      # 38 analytical findings with fixes
├── experiment.md     # Step-by-step experiment execution guide
├── plan.md           # 14-week execution plan with decision gates
└── phase1.md         # Phase 1 completion report
# Install dependencies (requires Python 3.10, CUDA 12.4)
uv venv --python 3.10 && source .venv/bin/activate
uv sync
# Run CPU-only tests (no GPU needed)
uv run python tests/run_all.py --cpu-only
# Run full test suite (requires GPU + model weights)
uv run python tests/run_all.py
# Train the agent
uv run python src/train.py --config configs/default.yaml
# Or submit to SLURM
sbatch scripts/train.slurm

Benchmarks: COCO-30K, DrawBench, PartiPrompts, GenEval, T2I-CompBench
Metrics: FID, CLIP Score, ImageReward, HPS v2, Aesthetic Score, VLM-as-Judge (GPT-4o + Gemini)
Baselines: DDIM, DPM-Solver, LCM, SDXL-Turbo, SAG, Attend-and-Excite, Oracle-Stop
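As one concrete metric example, CLIP Score can be computed with torchmetrics; this is an assumption about tooling, and `src/evaluation/` may use a different implementation:

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Model choice is illustrative; torchmetrics expects uint8 images in (N, C, H, W).
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
images = torch.randint(0, 255, (2, 3, 224, 224), dtype=torch.uint8)
prompts = ["a photo of an astronaut riding a horse", "a red cube on a blue sphere"]
print(f"CLIP Score: {metric(images, prompts).item():.2f}")
```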
| Model | Role |
|---|---|
| SD 1.5 | Primary training backbone |
| SDXL | Scale generalization |
| Flux.1-schnell | Architecture generalization (DiT) |
- Python 3.10, PyTorch 2.6.0, CUDA 12.4
- A100-40GB (SD 1.5) or A100-80GB (SDXL)
- ~800 GPU hours for full pipeline (training + baselines + evaluation)