AI Toolkit Inference

Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.

Model Catalog · GitHub Repo

Docs Home · HTTP API · Troubleshooting

Model Catalog

This is the index of supported API model ids for POST /v1/inference.

Each model page is meant to be a small engineering note: defaults that affect outputs, required inputs (e.g. ctrl_img), and the common causes of AI Toolkit preview vs inference mismatch.

If you don’t know which model ids your server build supports, call GET /v1/models (see HTTP API).

Run in the cloud (optional): If you want to reproduce the examples on this page in a pinned runtime without local CUDA/driver setup (and reduce preview‑vs‑inference drift), run it via RunComfy’s Cloud AI Toolkit (Train + Inference). 👉 You can open it here: Cloud AI Toolkit (Train + Inference)

Naming notes

Preview-matching knobs (what usually changes outputs)

If you’re trying to reproduce AI Toolkit training sample previews, start from the relevant model page and align:

Image generation

Model page API model id Base checkpoint Notes
FLUX (FLUX.1-dev) flux black-forest-labs/FLUX.1-dev Pipeline file is named flux_dev.py, but the API model id is flux.
FLUX.2 (FLUX.2-dev) flux2 black-forest-labs/FLUX.2-dev Uses AI Toolkit components and merges LoRA into the transformer at load time (scale is not dynamically adjustable).
FLUX.2-klein 4B flux2_klein_4b black-forest-labs/FLUX.2-klein-base-4B Uses AI Toolkit components + Qwen3 text encoder; CFG/negative prompt are active.
FLUX.2-klein 9B flux2_klein_9b black-forest-labs/FLUX.2-klein-base-9B Uses AI Toolkit components + Qwen3 text encoder; CFG/negative prompt are active.
Flex.1 (alpha) flex1 ostris/Flex.1-alpha Implemented on top of diffusers.FluxPipeline (same family as FLUX.1-dev).
Flex.2 flex2 ostris/Flex.2-preview Uses fuse_lora (weights are merged), so LoRA scale is fixed after load; changing loras[].network_multiplier requires reload.
SDXL 1.0 sdxl stabilityai/stable-diffusion-xl-base-1.0 Uses a DDPMScheduler config aligned to AI Toolkit defaults.
SD 1.5 sd15 stable-diffusion-v1-5/stable-diffusion-v1-5 Uses a DDPMScheduler config aligned to AI Toolkit defaults.
Qwen Image qwen_image Qwen/Qwen-Image Guidance is passed as true_cfg_scale in Diffusers.
Qwen Image (2512) qwen_image_2512 Qwen/Qwen-Image-2512 Uses fuse_lora (weights are merged), so LoRA scale is not dynamically adjustable.
Z-Image zimage Tongyi-MAI/Z-Image Base Z-Image model. Defaults (30 steps / CFG 4.0) are tuned for preview matching.
Z-Image Turbo zimage_turbo Tongyi-MAI/Z-Image-Turbo Few-step model: the defaults (8 steps / CFG 1.0) are part of the model’s intended regime.
Z-Image De-Turbo zimage_deturbo Tongyi-MAI/Z-Image-Turbo Assembles the pipeline from separate components (transformer from ostris/Z-Image-De-Turbo).
Chroma chroma lodestones/Chroma1-Base Requires AI Toolkit for the custom pipeline and quantization path. LoRA is merged before quantization.
HiDream I1 hidream HiDream-ai/HiDream-I1-Full Heavy pipeline: loads Llama-3.1-8B-Instruct as an extra text encoder and fuses LoRA into the transformer.
Lumina Image 2.0 lumina2 Alpha-VLLM/Lumina-Image-2.0 Uses a FlowMatch scheduler config aligned to AI Toolkit’s Lumina2 sampler defaults.
OmniGen2 omnigen2 OmniGen2/OmniGen2 Optional reference images (up to 3 via ctrl_img(_1..3)). Applies chat template for preview matching.

Editing / control-image

Model page API model id Base checkpoint Notes
FLUX Kontext (FLUX.1-Kontext-dev) flux_kontext black-forest-labs/FLUX.1-Kontext-dev Requires a control image (ctrl_img). The server resizes it to match the requested output size.
Qwen Image Edit qwen_image_edit Qwen/Qwen-Image-Edit Requires a control image (ctrl_img). Prompt encoding depends on the control image.
Qwen Image Edit Plus (2509) qwen_image_edit_plus Qwen/Qwen-Image-Edit-2509 Multi-image edit. Supports up to 3 control images (ctrl_img_1..3). Prompt encoding uses those images.
Qwen Image Edit Plus (2511) qwen_image_edit_plus_2511 Qwen/Qwen-Image-Edit-2511 Multi-image edit with fuse_lora (weights are merged). Supports up to 3 control images.
HiDream E1 (HiDream-E1-Full) hidream_e1 HiDream-ai/HiDream-E1-Full Image editing. Requires ctrl_img. Uses fused LoRA (reload required for different scales).

Video generation

Model page API model id Base checkpoint Notes
LTX-2 ltx2 Lightricks/LTX-2 Unified T2V/I2V: if you provide ctrl_img it runs I2V; otherwise T2V. Outputs frames + audio (MP4). LoRA is converted and fused.
LTX-2.3 ltx2.3 dg845/LTX-2.3-Diffusers Same as LTX-2 but with LTX-2.3 base model. Uses VocoderWithBWE for improved audio.
Wan 2.1 T2V (14B) wan21_14b Wan-AI/Wan2.1-T2V-14B-Diffusers Text-to-video. Uses diffusers.WanPipeline.
Wan 2.1 T2V (1.3B) wan21_1b Wan-AI/Wan2.1-T2V-1.3B-Diffusers Text-to-video. Smaller checkpoint (1.3B) with the same API surface as 14B.
Wan 2.1 I2V (14B) wan21_i2v_14b Wan-AI/Wan2.1-I2V-14B-720P-Diffusers Image-to-video. Requires ctrl_img. The 480p variant wan21_i2v_14b480p uses the same logic but a different base checkpoint. Also supports wan21_i2v_14b480p.
Wan 2.2 T2V (A14B) wan22_14b_t2v Wan-AI/Wan2.2-T2V-A14B-Diffusers MoE LoRA format: loras with transformer: "low" / "high" (either side optional).
Wan 2.2 I2V (A14B) wan22_14b_i2v ai-toolkit/Wan2.2-I2V-A14B-Diffusers-bf16 Requires AI Toolkit for the custom Wan22Pipeline + first-frame conditioning. Uses MoE LoRA format and requires ctrl_img.
Wan 2.2 TI2V (5B) wan22_5b Wan-AI/Wan2.2-TI2V-5B-Diffusers Tier-2 model: requires AI Toolkit. Supports both T2V and I2V (provide ctrl_img for I2V).