AI Toolkit Inference

Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.

Troubleshooting AI Toolkit LoRA Inference

This page is a practical checklist for the most common “my AI Toolkit samples look great, but inference looks different / broken” problems.

It’s written specifically for LoRA adapters trained with ostris/ai-toolkit and run through this repo’s Diffusers pipelines.

If you want a known-good baseline, start by running the corresponding by-model pipeline from this repo, then change one variable at a time.

One-minute checklist (80/20)

When outputs drift from training samples, it’s almost always one of these:

1) Wrong model id / wrong base checkpoint

2) Resolution rules don’t match

3) Scheduler / steps / guidance differ

4) Control image wiring differs

5) LoRA isn’t being applied the way you think


Symptom → cause → fix (quick table)

| Symptom | Most likely cause | What to do in this repo |
| --- | --- | --- |
| AI Toolkit samples look good, but Diffusers/ComfyUI inference looks different | Different base checkpoint / scheduler / resolution snapping | Start from the model page and copy its defaults; also check snapped dimensions and LoRA load mode. |
| LoRA “does nothing” (looks like the base model) | Wrong trigger word / scale 0 / incompatible LoRA keys / wrong pipeline family | Verify `trigger_word`, set `loras[].network_multiplier=1.0`, and use the exact model pipeline. For some families the LoRA is fused, so the scale is not dynamically adjustable. |
| API returns `CONTROL_IMAGE_REQUIRED` | Edit/I2V model needs `ctrl_img` | Provide `ctrl_img` (URL or base64) in the prompt item. |
| API returns `MOE_FORMAT_REQUIRED`, `SINGLE_MOE_CONFIG_ONLY`, or `SINGLE_LORA_ONLY` | Wan 2.2 MoE format required (or multiple configs sent), or multiple LoRAs sent | Use `loras` with `transformer: "low"` / `"high"` for Wan 2.2 14B (single config); otherwise send exactly one LoRA item. |
| Wan 2.2 motion is “too fast / weird” vs samples | frames/fps mismatch; I2V conditioning mismatch | Match the `num_frames` + `fps` defaults from the model page; keep resolution within the checkpoint regime. |
| Download stalls at ~99% | HF transfer/Xet edge cases | See the Hugging Face download section below; this repo already applies mitigations. |
| OOM / CUDA out of memory | Too-large resolution/frames, heavy pipeline, CPU offload disabled for that model | Reduce `width`/`height`/`num_frames`, try smaller model ids, enable CPU offload where supported. |

1) “Training samples are better than my inference”

This is a very common report across model families (FLUX/HiDream/Qwen/Wan): the training-time sample images look noticeably better, or simply different, than outputs from a separate inference stack.

Examples (for context):

What usually fixes it is matching the base checkpoint, resolution snapping, scheduler/steps/guidance, and the LoRA load mode (see the one-minute checklist above).

If you’re not sure which model you are actually using, call GET /v1/models (see HTTP API) and compare defaults.
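A quick way to act on that comparison is to diff the server-reported defaults against the parameters your client actually sends. The helper below is a minimal sketch; the parameter names shown are hypothetical examples of what `GET /v1/models` might return, not the authoritative schema (see the HTTP API page for that).

```python
def diff_defaults(model_defaults: dict, request_params: dict) -> dict:
    """Return {param: (server_default, requested)} for every overlapping value that differs."""
    return {
        k: (model_defaults[k], request_params[k])
        for k in model_defaults
        if k in request_params and request_params[k] != model_defaults[k]
    }

# Hypothetical defaults, as GET /v1/models might report them for one model id.
defaults = {"num_inference_steps": 25, "guidance_scale": 4.0, "width": 1024, "height": 1024}
request = {"num_inference_steps": 25, "guidance_scale": 7.5, "width": 1024, "height": 1024}

print(diff_defaults(defaults, request))  # → {'guidance_scale': (4.0, 7.5)}
```

Anything this prints is a variable you changed away from the known-good baseline, which is exactly where to start debugging.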


2) “My LoRA isn’t applying / has no effect” (Diffusers / ComfyUI)

This shows up as “the output looks exactly like the base model.” Common causes:

A) Missing trigger word

AI Toolkit commonly uses a trigger token that you must include in the inference prompt.

This server supports a convenience placeholder:

The server replaces [trigger] in every prompt item (see executor code in src/tasks/executor.py).
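The substitution itself is simple string replacement; this sketch mirrors the behavior described above (the actual logic lives in src/tasks/executor.py, and the trigger word here is a made-up example):

```python
def apply_trigger(prompt: str, trigger_word: str) -> str:
    """Replace the [trigger] placeholder with the adapter's trigger word."""
    return prompt.replace("[trigger]", trigger_word)

# "sks_person" is a hypothetical trigger token for illustration.
print(apply_trigger("[trigger] riding a bike", "sks_person"))
# → sks_person riding a bike
```

If your prompt contains neither the placeholder nor the literal trigger word, the LoRA's learned concept may never be activated.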

B) LoRA scale is too low / not actually being updated

The API uses a single LoRA scale per request (per transformer for MoE). If you need multiple scales, send separate requests.
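A minimal request fragment that pins the scale explicitly (field names follow this page's error table and the Wan 2.2 example below; verify the exact schema against the HTTP API docs):

```json
"loras": [
  {"path": "my_lora.safetensors", "network_multiplier": 1.0}
]
```

A useful sanity check: if the output is pixel-identical with `network_multiplier` at 0.0 and at 1.0, the LoRA is not being loaded at all, rather than merely being too weak.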

C) Incompatible LoRA key layout (especially Qwen Image)

Some users report Qwen-Image LoRAs not working in other stacks due to weight key naming expectations.

Example report: “Qwen-Image LoRAs not working in ComfyUI” (ai-toolkit #372).

If your LoRA works in AI Toolkit samples but not elsewhere:
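One quick diagnostic is to list the tensor key names in the .safetensors file (e.g. with `safetensors.safe_open`) and compare their leading prefixes between a working and a non-working adapter. The key names below are hypothetical illustrations of two common export layouts:

```python
def key_prefixes(keys):
    """Collapse tensor key names to their leading component for a quick audit."""
    return sorted({k.split(".", 1)[0] for k in keys})

# Hypothetical key names as they might appear in two differently-exported LoRAs.
ai_toolkit_keys = ["transformer.blocks.0.attn.to_q.lora_A.weight",
                   "transformer.blocks.0.attn.to_q.lora_B.weight"]
other_keys = ["diffusion_model.blocks.0.attn.to_q.lora_A.weight"]

print(key_prefixes(ai_toolkit_keys))  # → ['transformer']
print(key_prefixes(other_keys))       # → ['diffusion_model']
```

A prefix mismatch like this is a common reason a LoRA loads without error in one stack but silently does nothing in another.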

D) Diffusers version / pipeline compatibility

Some model families require newer or customized Diffusers behavior for LoRA injection.

Example: LoRA not affecting FLUX inference due to compatibility issues was discussed in Diffusers (diffusers #9361).

This repo pins Diffusers to a specific revision in requirements-inference.txt to reduce “works in one environment but not another” failures.


3) “I’m getting a 400 error from the API”

Most 400s are intentional guardrails.

A) CONTROL_IMAGE_REQUIRED

Models that require a control image are the edit and I2V families; they cannot generate without a conditioning input.

Fix: provide ctrl_img (URL or base64) in the prompt item.

B) MOE_FORMAT_REQUIRED (Wan 2.2 14B)

Wan 2.2 14B uses a dual-transformer MoE setup. This server expects:

```json
"loras": [
  {"path": "low_noise.safetensors", "transformer": "low", "network_multiplier": 1.0},
  {"path": "high_noise.safetensors", "transformer": "high", "network_multiplier": 1.0}
]
```

Context: users frequently ask about the “low/high noise” split and when each matters (e.g. ai-toolkit #349).

Practical tip:


4) Wan video looks wrong (frames/fps/resolution)

Wan 2.1 (T2V/I2V)

Wan 2.2 I2V “motion too fast / weird”

Users have reported Wan 2.2 I2V training/inference oddities, including motion artifacts that can correlate with frame/time settings (see ai-toolkit #421).

What to try: match the num_frames and fps defaults from the model page, and keep resolution within the checkpoint's training regime.

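One concrete sanity check: confirm that your requested frames/fps imply the same clip duration as the training samples. The numbers below are hypothetical, chosen only to show how a fps mismatch squeezes the same motion into less time, which reads as "too fast":

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration implied by a frames/fps pair."""
    return num_frames / fps

# Hypothetical: samples rendered at 81 frames / 16 fps,
# inference mistakenly requested at 81 frames / 24 fps.
train = clip_seconds(81, 16)  # 5.0625 s
infer = clip_seconds(81, 24)  # 3.375 s -- same motion, ~33% less time on screen
print(train, infer)
```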

5) Flex.2 / custom pipelines don’t run in vanilla Diffusers

Some preview models ship with custom or rapidly changing pipeline code.

Example: Flex.2 users discuss Diffusers incompatibilities and the need for custom pipelines (Flex.2 discussion).

Fix: use the corresponding by-model pipeline from this repo rather than constructing a vanilla Diffusers pipeline by hand.
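If you do need to load such a model directly in Diffusers, `custom_pipeline` and `trust_remote_code` are the real `DiffusionPipeline.from_pretrained()` parameters that allow repo-hosted pipeline code to be used instead of a built-in class. A minimal sketch, with the repo id treated as a placeholder and the download deliberately commented out:

```python
# Both keys below are genuine DiffusionPipeline.from_pretrained() parameters;
# the repo id is illustrative -- substitute the actual model repository.
load_kwargs = {
    "custom_pipeline": "ostris/Flex.2-preview",  # use pipeline code shipped in the repo
    "trust_remote_code": True,                   # required to execute repo-hosted code
}
# from diffusers import DiffusionPipeline
# pipe = DiffusionPipeline.from_pretrained("ostris/Flex.2-preview", **load_kwargs)
print(sorted(load_kwargs))
```

Only enable `trust_remote_code` for repositories you trust, since it runs arbitrary Python from the model repo.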


6) Hugging Face model download stalls at ~99%

This is a real-world issue when downloading large model repos.

Example reports:

This repo includes mitigations in src/services/pipeline_manager.py (environment vars set before snapshot_download). If you still hit stalls:

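If you are reproducing the problem outside this repo, these real huggingface_hub environment variables are the usual knobs to try; whether each one helps depends on which transfer backend is misbehaving in your environment, and they must be set before the download starts:

```python
import os

# Set these BEFORE importing huggingface_hub / calling snapshot_download.
os.environ["HF_HUB_DISABLE_XET"] = "1"         # fall back from the Xet storage backend
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"  # disable the hf_transfer fast path
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"   # tolerate slow chunks (seconds)

# from huggingface_hub import snapshot_download
# snapshot_download("some-org/some-model")  # placeholder repo id
print(os.environ["HF_HUB_DISABLE_XET"])
```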

7) “AI Toolkit expects train_config during inference”

Some users assume the training config must be present to run inference (see ai-toolkit #416).

This server does not read the AI Toolkit training config to infer parameters automatically. Instead, pass generation parameters explicitly in the request, and pull per-model defaults from GET /v1/models.


Still stuck?

If you can share your exact model id, the full request payload, and the AI Toolkit sample settings you are comparing against, you can usually pinpoint the mismatch quickly.

Note: if your main blocker is environment drift (CUDA/PyTorch/Diffusers versions, or missing model-specific pipeline deps), it can help to run training + inference in a fixed runtime/container. RunComfy provides a managed runtime for AI Toolkit, but any reproducible GPU environment works — the reference behavior is still defined by the code in this repo.