Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.
This page documents Wan 2.1 image-to-video (I2V) LoRA inference for the 480P checkpoint variant, using the ai-toolkit-inference reference pipeline.
If you trained a LoRA with ostris/ai-toolkit and your inference doesn’t match training samples, the biggest lever for Wan I2V is usually: using the correct base checkpoint (480P vs 720P), matching the requested resolution, and matching control-image preprocessing.
Run in the cloud (optional): If you want to reproduce the examples on this page in a pinned runtime without local CUDA/driver setup (and reduce preview-vs-inference drift), you can run them via RunComfy's Cloud AI Toolkit (Train + Inference).
- Model key: `wan21_i2v_14b480p`
- Source files: `src/pipelines/wan21.py`, `src/pipelines/base.py`, `src/schemas/request.py`
- Related model page:
| Setting | Value | Where it comes from |
|---|---|---|
| Base checkpoint | `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` | `Wan21I2V14B480PPipeline.CONFIG` in `wan21.py` |
| Resolution divisor (snapping) | 16 | `PipelineConfig.resolution_divisor` default in `base.py` |
| Default steps | 25 | `PipelineConfig.default_steps` default in `base.py` |
| Default guidance scale | 4.0 | `PipelineConfig.default_guidance_scale` default in `base.py` |
| Default frames / fps | 41 frames, 16 fps | `PipelineConfig.default_num_frames` / `default_fps` defaults in `base.py` |
| Control image required | Yes (`ctrl_img`) | `Wan21I2V14BPipeline.CONFIG.requires_control_image=True` in `wan21.py` |
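As a quick reference, the defaults in the table above can be sketched as a dataclass. This is an illustrative mock, not the repo's actual class: the field names are our assumptions, and only the values come from the table (the real definition lives in `src/pipelines/base.py` and `src/pipelines/wan21.py`).

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    # Hypothetical field names; the values mirror the table above.
    base_checkpoint: str = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
    resolution_divisor: int = 16
    default_steps: int = 25
    default_guidance_scale: float = 4.0
    default_num_frames: int = 41
    default_fps: int = 16
    requires_control_image: bool = True

cfg = PipelineConfig()
```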
The only difference between wan21_i2v_14b and wan21_i2v_14b480p in this repo is the base model checkpoint:
- `wan21_i2v_14b` → `...-720P-Diffusers`
- `wan21_i2v_14b480p` → `...-480P-Diffusers`

The inference logic is otherwise the same (same scheduler configuration, same control-image resize behavior, same LoRA application path).
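An illustrative key-to-checkpoint mapping for the two variants could look like this. The dict and its name are ours, not the repo's actual lookup, and the full 720P repo ID is an assumption based on the 480P ID following the same naming pattern:

```python
# Hypothetical mapping; only the 480P checkpoint ID appears verbatim in this
# page's table. The 720P ID is assumed to follow the same naming pattern.
CHECKPOINTS = {
    "wan21_i2v_14b": "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers",
    "wan21_i2v_14b480p": "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
}

base_model = CHECKPOINTS["wan21_i2v_14b480p"]
```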
Practical implication: `width`/`height` are snapped down to multiples of 16, so request dimensions that are already on the 16 grid (e.g. 832×480, 848×480, 864×480).

This model requires a control image:
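The snapping rule is simple integer arithmetic; a minimal sketch (the helper name is ours):

```python
def snap_down(value: int, divisor: int = 16) -> int:
    """Snap a requested dimension down to the nearest multiple of `divisor`,
    mirroring the documented resolution-snapping behavior."""
    return (value // divisor) * divisor

# Dimensions already on the 16 grid pass through unchanged; others shrink.
print(snap_down(832), snap_down(840), snap_down(480))  # → 832 832 480
```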
- Supply it via `ctrl_img` (base64 or URL, see the API docs).
- The pipeline resizes it to exactly `width`×`height` before passing it into Diffusers.

Why this matters for preview matching: matching this exact-resize step keeps inference aligned with your training-time samples; a stack that crops or letterboxes the control image instead will drift.
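A minimal Pillow sketch of the documented preprocessing, assuming a plain stretch-to-fit resize (the function name is ours; the repo's actual resampling filter is not documented here, so `LANCZOS` is an assumption):

```python
from PIL import Image

def preprocess_ctrl_img(img: Image.Image, width: int, height: int) -> Image.Image:
    # Resize to exactly width×height — no cropping, no letterboxing.
    return img.convert("RGB").resize((width, height), Image.LANCZOS)

frame = Image.new("RGB", (1920, 1080), "gray")  # stand-in for a real first frame
out = preprocess_ctrl_img(frame, 832, 480)
print(out.size)  # → (832, 480)
```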
This pipeline uses the default Diffusers LoRA path:
- `pipe.load_lora_weights(...)`
- `pipe.set_adapters(..., adapter_weights=[network_multiplier])`

So the LoRA scale is dynamically adjustable per request (no permanent merge like `fuse_lora`).
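A toy numeric sketch of why runtime adapter scaling is reversible while a `fuse_lora`-style merge is not. This is an analogy in plain NumPy, not the actual Diffusers internals:

```python
import numpy as np

base = np.eye(2)                                  # frozen base weight
lora_delta = np.array([[0.1, 0.2], [0.3, 0.4]])   # LoRA update (B @ A)

def effective(scale: float) -> np.ndarray:
    # set_adapters-style scaling: the base stays untouched, so each
    # request can apply a different multiplier to the same loaded weights.
    return base + scale * lora_delta

w_half = effective(0.5)
w_full = effective(1.0)

# A fuse_lora-style merge writes the delta into the base permanently;
# changing the scale afterwards would require reloading the base weights.
fused = base + lora_delta
```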
```json
{
  "model": "wan21_i2v_14b480p",
  "trigger_word": "sks",
  "prompts": [
    {
      "prompt": "[trigger] a cinematic shot of a runner on a beach, handheld camera",
      "neg": "",
      "seed": 42,
      "width": 832,
      "height": 480,
      "num_frames": 41,
      "fps": 16,
      "sample_steps": 25,
      "guidance_scale": 4.0,
      "ctrl_img": "https://example.com/first_frame.png"
    }
  ],
  "loras": [
    {
      "path": "my_wan_job/my_wan_job.safetensors",
      "network_multiplier": 1.0
    }
  ]
}
```
Notes:

- The server snaps `width`/`height` down to a multiple of 16.
- If you omit `seed`, the server uses the pipeline default seed (usually 42) plus the prompt index (0-based).
- To get a random seed, pass `seed: -1` (the server generates a random seed and returns it in the outputs).

The most common reasons outputs drift from AI Toolkit samples:
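The seed rules above can be sketched as a small helper. This is a hypothetical function mirroring the documented behavior; the server's actual implementation may differ:

```python
import random
from typing import Optional

DEFAULT_SEED = 42  # pipeline default noted above ("usually 42")

def resolve_seed(requested: Optional[int], prompt_index: int) -> int:
    if requested is None:
        # seed omitted → default seed plus the 0-based prompt index
        return DEFAULT_SEED + prompt_index
    if requested == -1:
        # seed: -1 → server-generated random seed (returned in the outputs)
        return random.randint(0, 2**31 - 1)
    return requested

print(resolve_seed(None, 0), resolve_seed(None, 2), resolve_seed(7, 5))  # → 42 44 7
```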
1) Wrong checkpoint variant (480P vs 720P)
2) Different resolution after snapping
3) Control image preprocessing mismatch
This pipeline resizes `ctrl_img` to exactly `width`×`height`. If another stack crops or letterboxes instead, you'll see drift.
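To make the mismatch concrete, here is a small Pillow sketch contrasting this repo's exact resize with an aspect-preserving letterbox (a common alternative in other stacks; both helper names are ours). The two outputs have the same dimensions but different content:

```python
from PIL import Image

def exact_resize(img: Image.Image, w: int, h: int) -> Image.Image:
    # This repo's behavior: stretch to exactly w×h.
    return img.resize((w, h), Image.NEAREST)

def letterbox(img: Image.Image, w: int, h: int) -> Image.Image:
    # What some other stacks do: preserve aspect ratio, pad the remainder.
    scale = min(w / img.width, h / img.height)
    inner = img.resize((int(img.width * scale), int(img.height * scale)), Image.NEAREST)
    canvas = Image.new("RGB", (w, h), (0, 0, 0))
    canvas.paste(inner, ((w - inner.width) // 2, (h - inner.height) // 2))
    return canvas

src = Image.new("RGB", (100, 50), (255, 0, 0))  # wide red test frame
a = exact_resize(src, 64, 64)
b = letterbox(src, 64, 64)
# Same size, different pixels: the letterboxed version has black bars on top.
print(a.size == b.size, a.getpixel((0, 0)), b.getpixel((0, 0)))
```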