AI Toolkit Inference

Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.


ai-toolkit-inference is a set of reference Diffusers pipelines plus an async FastAPI server for running LoRAs trained with ostris/ai-toolkit.

The core goal is training preview ↔ inference parity. AI Toolkit samples are produced by a specific inference graph (scheduler wiring, resolution rules, LoRA injection, seeding). Small “default” differences between stacks can show up as visibly different outputs.
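To make the parity-sensitive knobs concrete, here is a minimal Diffusers sketch that keeps scheduler, resolution, LoRA injection, and seeding explicit. `snap_resolution`, `build_pipeline`, and `run` are illustrative helpers, not this repo's API; the exact resolution rule and scheduler class depend on the model (see the model pages).

```python
# Parity-minded Diffusers setup (sketch). MODEL_ID / LORA_PATH are
# placeholders you would fill in yourself.

def snap_resolution(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Round dimensions down to the nearest multiple.

    Stable-Diffusion-style VAEs need width/height divisible by 8; the
    exact snapping rule varies by model.
    """
    return width - width % multiple, height - height % multiple

def build_pipeline(model_id: str, lora_path: str):
    # Heavy imports kept local so the sketch is cheap to import.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.load_lora_weights(lora_path)  # LoRA injection
    return pipe

def run(pipe, prompt: str, seed: int, width: int, height: int):
    import torch

    w, h = snap_resolution(width, height)
    # Explicit seeding: a fixed Generator makes runs reproducible.
    gen = torch.Generator(device="cpu").manual_seed(seed)
    return pipe(prompt, width=w, height=h, generator=gen).images[0]
```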

Start here

What you’ll find in these docs

If you’re calling the server and you don’t know which model ids are supported, start with GET /v1/models (documented under HTTP API).
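A small client sketch for that discovery step. The response shape assumed here (an object with a `"models"` list of `{"id": ...}` entries) is a guess for illustration; check the HTTP API docs for the actual schema.

```python
# List supported model ids from a running server (sketch).
import json
from urllib.request import urlopen

def model_ids(payload: dict) -> list[str]:
    # Assumed response shape: {"models": [{"id": "..."}, ...]}
    return [m["id"] for m in payload.get("models", [])]

def list_models(base_url: str) -> list[str]:
    with urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))
```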

What’s in this repo

Quick API shape

  1. POST /v1/inference → returns request_id, status_url, result_url
  2. Poll GET /v1/requests/{request_id}/status
  3. Fetch GET /v1/requests/{request_id}/result

The request schema is defined in src/schemas/request.py.
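The submit → poll → fetch flow above can be sketched as a small client loop. The `request_id`, `status_url`, and `result_url` fields come from the API shape; the terminal status values (`"succeeded"`, `"failed"`) are assumptions — the real schema lives in src/schemas/request.py.

```python
# Poll an async inference request until it finishes (sketch).
import time

def wait_for_result(get_json, submit_response: dict,
                    poll_interval: float = 1.0, timeout: float = 300.0) -> dict:
    """Poll status_url until the request reaches a terminal state,
    then fetch result_url.

    `get_json` is any callable mapping a URL to a parsed JSON dict
    (e.g. a thin wrapper around requests.get), injected so the flow
    is easy to test without a server.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_json(submit_response["status_url"])
        if status.get("status") in ("succeeded", "failed"):  # assumed terminal states
            break
        time.sleep(poll_interval)
    return get_json(submit_response["result_url"])
```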

Preview vs inference mismatch (the 80/20)

If AI Toolkit training samples look good but your inference looks different, the most common causes are small mismatches in the inference graph: scheduler wiring, resolution rules, LoRA injection, and seeding.

Each model page calls out the model-specific version of these pitfalls.
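When outputs diverge, it helps to diff the settings that define the inference graph before staring at images. A tiny sketch; the key list below is illustrative, not the repo's canonical parity checklist.

```python
# Diff the parity-sensitive generation settings between two configs (sketch).
PARITY_KEYS = ("scheduler", "steps", "guidance_scale", "width", "height",
               "seed", "lora_scale")

def parity_diff(train_cfg: dict, infer_cfg: dict) -> dict:
    """Return {key: (training_value, inference_value)} for every mismatch."""
    return {k: (train_cfg.get(k), infer_cfg.get(k))
            for k in PARITY_KEYS
            if train_cfg.get(k) != infer_cfg.get(k)}
```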

Glossary (short)

Note: if your main blocker is environment drift (CUDA/PyTorch/Diffusers versions, large model downloads, model-specific pipeline deps), it helps to run training + inference in a fixed runtime/container. RunComfy provides a managed runtime for AI Toolkit, but any reproducible GPU environment works — the reference behavior is still defined by this repo.
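One low-tech way to guard against that drift is to record the training environment and replay it at inference time. The commands below are an illustrative environment-pinning fragment, not a required setup.

```shell
# Record the versions that produced the training previews.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import diffusers; print(diffusers.__version__)"
pip freeze > requirements.lock   # on the training machine

# Reproduce them where you run inference.
pip install -r requirements.lock
```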