AI Toolkit Inference

Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.

Wan 2.1 I2V 14B (480P) LoRA Inference with Diffusers (AI Toolkit-trained)

This page documents Wan 2.1 image-to-video (I2V) LoRA inference for the 480P checkpoint variant, using the ai-toolkit-inference reference pipeline.

If you trained a LoRA with ostris/ai-toolkit and your inference doesn’t match training samples, the biggest levers for Wan I2V are usually: using the correct base checkpoint (480P vs 720P), matching the requested resolution, and matching control-image preprocessing.

Run in the cloud (optional): to reproduce the examples on this page in a pinned runtime without local CUDA/driver setup (and to reduce preview-vs-inference drift), open RunComfy’s Cloud AI Toolkit (Train + Inference).

Reference implementation (source of truth)

Related model page:

Defaults used by the server

| Setting | Value | Where it comes from |
| --- | --- | --- |
| Base checkpoint | Wan-AI/Wan2.1-I2V-14B-480P-Diffusers | Wan21I2V14B480PPipeline.CONFIG in wan21.py |
| Resolution divisor (snapping) | 16 | PipelineConfig.resolution_divisor default in base.py |
| Default steps | 25 | PipelineConfig.default_steps default in base.py |
| Default guidance scale | 4.0 | PipelineConfig.default_guidance_scale default in base.py |
| Default frames / fps | 41 frames, 16 fps | PipelineConfig.default_num_frames / default_fps defaults in base.py |
| Control image required | Yes (ctrl_img) | Wan21I2V14BPipeline.CONFIG.requires_control_image=True in wan21.py |
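To illustrate the resolution snapping, here is a minimal sketch. The helper name and the rounding direction (nearest multiple of the divisor) are assumptions, not the repo's actual implementation; only the divisor value 16 comes from PipelineConfig.resolution_divisor:

```python
def snap_to_divisor(value: int, divisor: int = 16) -> int:
    """Snap a requested dimension to a multiple of the divisor.

    NOTE: hypothetical helper. Rounding to the *nearest* multiple is an
    assumption; the reference pipeline may round down instead. The
    divisor 16 matches PipelineConfig.resolution_divisor.
    """
    return max(divisor, round(value / divisor) * divisor)

# 832x480 is already aligned, so it passes through unchanged.
print(snap_to_divisor(832), snap_to_divisor(480))  # 832 480
# An unaligned request like 850x475 would be adjusted.
print(snap_to_divisor(850), snap_to_divisor(475))  # 848 480
```

If the snapped size differs from what you requested, the generated video will not match a training preview rendered at the original size.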

What makes this pipeline “480P”

The only difference between wan21_i2v_14b and wan21_i2v_14b480p in this repo is the base model checkpoint:

- wan21_i2v_14b → Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
- wan21_i2v_14b480p → Wan-AI/Wan2.1-I2V-14B-480P-Diffusers

The inference logic is otherwise the same (same scheduler configuration, same control-image resize behavior, same LoRA application path).

Practical implication: match the checkpoint variant to the one your LoRA was trained against. A LoRA trained on the 480P base should be served with wan21_i2v_14b480p; pairing it with the other variant is a common source of preview-vs-inference drift.

Control image behavior (I2V)

This model requires a control image:

- Every prompt entry must include ctrl_img (requires_control_image=True); for I2V it conditions the clip as its starting frame.
- The control image is resized to the requested width × height, with the same resize behavior as wan21_i2v_14b.

Why this matters for preview matching: AI Toolkit’s training samples use a specific control image at a specific size. If your inference control image is cropped, letterboxed, or resized differently (or is a different frame entirely), outputs can diverge noticeably even with identical seed, steps, and guidance.

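A minimal sketch of control-image preprocessing that matches the resize-to-requested-size behavior described above. The helper is hypothetical, and the LANCZOS resampling filter is an assumption; the reference pipeline may use a different filter, which is exactly the kind of detail that causes drift if it differs from training previews:

```python
from PIL import Image

def prepare_control_image(path_or_image, width: int, height: int) -> Image.Image:
    """Resize the control (first-frame) image to the requested resolution.

    Hypothetical helper: the reference pipeline resizes the control image
    to the requested width/height; the resampling filter used here
    (LANCZOS) is an assumption.
    """
    image = path_or_image if isinstance(path_or_image, Image.Image) else Image.open(path_or_image)
    return image.convert("RGB").resize((width, height), Image.LANCZOS)

# Example: a dummy 1024x576 first frame resized to the 480P request size.
frame = Image.new("RGB", (1024, 576))
control = prepare_control_image(frame, 832, 480)
print(control.size)  # (832, 480)
```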
LoRA loading and scale

This pipeline uses the default Diffusers LoRA path: adapters are loaded with load_lora_weights and kept attached to the base model rather than merged into its weights.

So LoRA scale is dynamically adjustable per request (no permanent merge like fuse_lora).

Minimal HTTP request example

```json
{
  "model": "wan21_i2v_14b480p",
  "trigger_word": "sks",
  "prompts": [
    {
      "prompt": "[trigger] a cinematic shot of a runner on a beach, handheld camera",
      "neg": "",
      "seed": 42,
      "width": 832,
      "height": 480,
      "num_frames": 41,
      "fps": 16,
      "sample_steps": 25,
      "guidance_scale": 4.0,
      "ctrl_img": "https://example.com/first_frame.png"
    }
  ],
  "loras": [
    {
      "path": "my_wan_job/my_wan_job.safetensors",
      "network_multiplier": 1.0
    }
  ]
}
```
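To send the request above, here is a minimal client sketch using only the standard library. The endpoint URL and route (/generate on localhost:8000) are placeholders, not the server's real address; check the HTTP API page for the actual route:

```python
import json
import urllib.request

# Same payload as the JSON example above.
payload = {
    "model": "wan21_i2v_14b480p",
    "trigger_word": "sks",
    "prompts": [{
        "prompt": "[trigger] a cinematic shot of a runner on a beach, handheld camera",
        "neg": "",
        "seed": 42,
        "width": 832, "height": 480,
        "num_frames": 41, "fps": 16,
        "sample_steps": 25, "guidance_scale": 4.0,
        "ctrl_img": "https://example.com/first_frame.png",
    }],
    "loras": [{"path": "my_wan_job/my_wan_job.safetensors", "network_multiplier": 1.0}],
}

# NOTE: hypothetical endpoint; see the HTTP API docs for the real route.
req = urllib.request.Request(
    "http://localhost:8000/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```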

Notes:

- trigger_word replaces the [trigger] placeholder in each prompt.
- width/height are snapped to multiples of 16; 832×480 is already aligned.
- Fields you omit fall back to the server defaults above (25 steps, guidance 4.0, 41 frames at 16 fps).
- network_multiplier is the LoRA scale; because the adapter is not fused, it can vary per request.

Preview vs inference mismatch (Wan I2V 80/20)

The most common reasons outputs drift from AI Toolkit samples:

1) Wrong checkpoint variant (480P vs 720P): a LoRA trained against the 480P base will drift when served from the 720P checkpoint (and vice versa). Verify the model id in your request matches the variant you trained on.

2) Different resolution after snapping: requested width/height are snapped to multiples of 16, so a request that snaps to a different resolution than your training samples changes composition and detail even with the same seed.

3) Control image preprocessing mismatch: a control image that is cropped, padded, or resized differently from the one used for training samples shifts the starting frame, and therefore the whole clip.
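The three checks above can be sketched as a quick pre-flight helper. The training dict and its keys are hypothetical (read them from your AI Toolkit job config), and the nearest-multiple snapping is an assumption:

```python
def preflight(request: dict, training: dict, divisor: int = 16) -> list[str]:
    """Flag the three most common Wan I2V preview-vs-inference mismatches.

    `training` is a hypothetical dict of the settings your AI Toolkit job
    sampled with (model id, sample width/height).
    """
    issues = []
    # 1) Checkpoint variant: a 480P-trained LoRA should run on the 480P model.
    if request["model"] != training["model"]:
        issues.append(f"checkpoint variant mismatch: {request['model']} vs {training['model']}")
    # 2) Resolution after snapping (nearest-multiple rounding is an assumption).
    for dim in ("width", "height"):
        snapped = round(request[dim] / divisor) * divisor
        if snapped != training[dim]:
            issues.append(f"{dim} {request[dim]} snaps to {snapped}, trained at {training[dim]}")
    # 3) Control image: presence is checkable; preprocessing must be compared by eye.
    if not request.get("ctrl_img"):
        issues.append("missing ctrl_img (required for I2V)")
    return issues

print(preflight(
    {"model": "wan21_i2v_14b480p", "width": 832, "height": 480, "ctrl_img": "first_frame.png"},
    {"model": "wan21_i2v_14b480p", "width": 832, "height": 480},
))  # []
```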