Reproducible Diffusers LoRA inference pipelines for adapters trained with ostris/ai-toolkit.
The ai-toolkit-inference server exposes a small HTTP/REST API for running LoRAs trained with ostris/ai-toolkit on top of curated Diffusers pipelines.
For per-model behavior and defaults, cross-check the Model Catalog.
This API is intentionally simple.
Note (optional): if your main blocker is environment drift (CUDA/PyTorch/Diffusers versions, large model downloads, custom pipeline deps), it helps to run the server in a fixed runtime/container. RunComfy provides a managed runtime for AI Toolkit training + inference, but any reproducible GPU environment works — the reference behavior is still defined by this repo.
The server uses BASE_URL (see src/config.py) to construct status_url and result_url. The default is usually http://localhost:8000.

Endpoints:

- GET /health → { "status": "healthy" }
- POST /v1/inference — creates an async task and returns URLs you can poll.
- GET /v1/requests/{request_id}/status
- GET /v1/requests/{request_id}/result
- GET /v1/models

OpenAPI / interactive docs (served by FastAPI): GET /docs and GET /openapi.json. This is the canonical way to discover model IDs.

Typical flow:

1) POST /v1/inference → returns request_id, status_url, result_url
2) Poll GET /v1/requests/{request_id}/status
3) Fetch GET /v1/requests/{request_id}/result
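The three-step flow above can be wrapped in a small polling helper. This is an illustrative sketch, not part of the server or its client libraries: the helper name, timeout, and interval are assumptions; it only relies on the terminal statuses documented for GET /v1/requests/{request_id}/status.

```python
import time
from typing import Callable

# Terminal statuses documented for GET /v1/requests/{request_id}/status.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_terminal(get_status: Callable[[], str],
                      poll_interval: float = 1.0,
                      timeout: float = 600.0) -> str:
    """Call get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request still {status!r} after {timeout}s")
        time.sleep(poll_interval)
```

In practice `get_status` would issue the HTTP GET (as in the full Python example below) and return the `status` field from the JSON body.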
The request schema's source of truth is src/schemas/request.py. Example request body:
{
"model": "flux",
"hf_token": "hf_*** (optional)",
"trigger_word": "sks (optional)",
"prompts": [
{
"prompt": "[trigger] a cinematic photo",
"neg": "",
"width": 1024,
"height": 1024,
"sample_steps": 25,
"guidance_scale": 4.0,
"seed": 42
}
],
"loras": [
{
"path": "my_job/my_job.safetensors",
"network_multiplier": 1.0
}
]
}
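When trigger_word is set, the server substitutes it for the [trigger] placeholder in each prompt. To preview locally what a prompt will look like after substitution, you can mirror that behavior client-side; apply_trigger below is a hypothetical helper, not part of this repo.

```python
from typing import Optional

def apply_trigger(prompt: str, trigger_word: Optional[str]) -> str:
    """Mirror the server's substitution: replace [trigger] when a trigger word is set.

    Hypothetical client-side helper; the authoritative behavior is in the server.
    """
    if trigger_word:
        return prompt.replace("[trigger]", trigger_word)
    return prompt
```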
Field details:

- model (required): one of the supported API model ids.
- loras (required): list of LoRA items.
  - path (required): local file path (relative to WORKFLOWS_BASE_PATH) or a URL.
  - transformer (optional): low or high (MoE models only).
  - network_multiplier (optional): LoRA scale (defaults to model config).
- hf_token (optional): Hugging Face token for gated models.
- trigger_word (optional): if provided, the server replaces [trigger] in every prompt.
- prompts (required): list of prompt items.

The server resolves each LoRA path as:
{WORKFLOWS_BASE_PATH}/{loras[].path} (unless the path is a URL or absolute)
Implementation: src/api/v1/inference.py.
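The resolution rule above can be mirrored client-side to predict which file the server will load. This is a sketch of the documented rule only; the real logic lives in src/api/v1/inference.py, and the WORKFLOWS_BASE_PATH value below is a placeholder (the actual default comes from the server's configuration).

```python
from pathlib import PurePosixPath

# Placeholder value for illustration; the real default comes from src/config.py.
WORKFLOWS_BASE_PATH = "/workspace/workflows"

def resolve_lora_path(path: str, base: str = WORKFLOWS_BASE_PATH) -> str:
    """Mirror the documented rule: URLs and absolute paths pass through,
    anything else is joined under WORKFLOWS_BASE_PATH."""
    if path.startswith(("http://", "https://")):
        return path
    p = PurePosixPath(path)
    if p.is_absolute():
        return path
    return str(PurePosixPath(base) / p)
```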
Common layouts:
- workflows/my_job/my_job.safetensors → loras: [{"path": "my_job/my_job.safetensors"}]
- workflows/my_lora.safetensors → loras: [{"path": "my_lora.safetensors"}]

For:

- model="wan22_14b_t2v"
- model="wan22_14b_i2v"

loras must include transformer-tagged items for low/high:
"loras": [
{"path": "low_noise.safetensors", "transformer": "low", "network_multiplier": 1.0},
{"path": "high_noise.safetensors", "transformer": "high", "network_multiplier": 1.0}
]
You may provide only one side if you trained only one LoRA.
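A client can sanity-check MoE LoRA items before sending the request. The sketch below is hypothetical (the server performs its own validation); it assumes each transformer side appears at most once, in line with the SINGLE_MOE_CONFIG_ONLY error code documented later.

```python
def check_moe_loras(loras: list) -> None:
    """Client-side sanity check for wan22_14b_* requests: every LoRA item
    needs a transformer tag, and each side may appear at most once.

    Illustrative only; the server's own validation is authoritative.
    """
    seen = set()
    for item in loras:
        side = item.get("transformer")
        if side not in ("low", "high"):
            raise ValueError(f"item needs transformer='low' or 'high': {item!r}")
        if side in seen:
            raise ValueError(f"transformer side given twice: {side}")
        seen.add(side)
```

Providing only one side passes the check, matching the note above.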
Prompt items are objects inside prompts: [...].
Common fields:

- prompt (required): the positive prompt.
- trigger_word (optional): per-prompt override for [trigger] replacement.
- neg (optional): negative prompt.
- width, height (optional): requested output size (may be adjusted to the model's resolution_divisor).
- sample_steps (optional): steps.
- guidance_scale (optional): CFG scale.
- sampler (optional): sampler name (pipeline-specific; most pipelines ignore this for now).
- seed (optional): integer seed. With multiple prompts, each prompt's effective seed is the base seed (e.g. 42) plus the prompt index (0-based). Use seed: -1 to have the server generate a random seed and return it in outputs.

Video-only fields (only respected by video pipelines):

- num_frames
- fps

Control image fields:

- ctrl_img: a single control image
- ctrl_img_1, ctrl_img_2, ctrl_img_3: additional control images (used by multi-image edit models)

Source of truth for loading: src/libs/image_utils.py
Each control image field accepts a string. If it starts with http, it is treated as a URL; otherwise it is treated as base64.
1) URL
"ctrl_img": "https://example.com/image.png"
2) Base64 (raw or data URL)
"ctrl_img": "iVBORw0KGgoAAAANSUhEUgAA..."
"ctrl_img": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
Source of truth: src/schemas/response.py.
POST /v1/inference response:
{
"request_id": "b3e6e2e0-...",
"status": "queued",
"status_url": "http://localhost:8000/v1/requests/b3e6e2e0-.../status",
"result_url": "http://localhost:8000/v1/requests/b3e6e2e0-.../result",
"created_at": "2026-01-17T00:00:00Z"
}
Status response (GET .../status):
{
"request_id": "b3e6e2e0-...",
"status": "in_queue|in_progress|succeeded|failed|canceled",
"created_at": "...",
"started_at": "...",
"finished_at": "...",
"error": null
}
Result response (GET .../result):
{
"request_id": "b3e6e2e0-...",
"status": "succeeded",
"outputs": {
"images": [
{"format": "jpeg", "width": 1024, "height": 1024, "file_path": "/tmp/inference_output/b3e6e2e0-..._output_0.jpg", "seed": 42}
],
"videos": [
{"format": "mp4", "width": 832, "height": 480, "num_frames": 41, "fps": 16, "file_path": "/tmp/inference_output/b3e6e2e0-..._output_0.mp4", "seed": 42}
]
},
"error": null,
"created_at": "...",
"finished_at": "...",
"metadata": {"timings": {"load_pipeline": 12.3, "prompts": [...]}}
}
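A client typically just needs the produced file paths and the seeds that generated them. The helper below is a sketch over the result shape shown above (the function name is an assumption); it returns an empty list when outputs is missing or null, e.g. for an unfinished task.

```python
def collect_output_files(result: dict) -> list:
    """Gather (file_path, seed) pairs from a result payload.

    Returns [] when outputs is missing/null (e.g. the task has not finished).
    """
    outputs = result.get("outputs") or {}
    files = []
    for kind in ("images", "videos"):
        for item in outputs.get(kind) or []:
            files.append((item["file_path"], item["seed"]))
    return files
```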
Notes:
- If you fetch .../result before the task finishes, outputs may be empty/null and status will indicate the current state.
- file_path is a local path on the server.

Errors are returned as structured JSON. Common code values include:

- MODEL_NOT_FOUND
- LORA_FILE_REQUIRED
- CONTROL_IMAGE_REQUIRED
- MOE_FORMAT_REQUIRED
- SINGLE_LORA_ONLY
- SINGLE_MOE_CONFIG_ONLY
- LORA_NOT_FOUND
- REQUEST_NOT_FOUND
- MOE_FORMAT_NOT_SUPPORTED

Implementation: src/api/v1/inference.py.
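Clients often want to know whether a failed request is worth retrying. The grouping below is an illustrative assumption (all of the listed codes indicate a problem with the request payload itself, so resending the same payload will not help); it also assumes the error body exposes the code value at the top level — check src/api/v1/inference.py for the exact shape.

```python
# Assumption: these codes mean the request itself is invalid, so a retry of
# the same payload cannot succeed. This grouping is illustrative, not defined
# by the server.
REQUEST_ERRORS = {
    "MODEL_NOT_FOUND", "LORA_FILE_REQUIRED", "CONTROL_IMAGE_REQUIRED",
    "MOE_FORMAT_REQUIRED", "SINGLE_LORA_ONLY", "SINGLE_MOE_CONFIG_ONLY",
    "LORA_NOT_FOUND", "REQUEST_NOT_FOUND", "MOE_FORMAT_NOT_SUPPORTED",
}

def explain_error(body: dict) -> str:
    """Turn a structured error body into a short actionable message."""
    code = body.get("code", "UNKNOWN")
    hint = "fix the request payload" if code in REQUEST_ERRORS else "see server logs"
    return f"{code}: {hint}"
```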
Flux text-to-image example:

curl -X POST http://localhost:8000/v1/inference \
-H 'content-type: application/json' \
-d '{
"model": "flux",
"trigger_word": "sks",
"loras": [{"path": "my_flux_job/my_flux_job.safetensors", "network_multiplier": 1.0}],
"prompts": [{
"prompt": "[trigger] a cinematic portrait photo",
"width": 1024,
"height": 1024,
"sample_steps": 25,
"guidance_scale": 4.0,
"seed": 42
}]
}'
Flux Kontext example (image edit with a control image):

curl -X POST http://localhost:8000/v1/inference \
-H 'content-type: application/json' \
-d '{
"model": "flux_kontext",
"loras": [{"path": "my_kontext_job/my_kontext_job.safetensors", "network_multiplier": 1.0}],
"prompts": [{
"prompt": "make it look like a watercolor",
"width": 1024,
"height": 1024,
"ctrl_img": "https://example.com/input.png",
"seed": 42
}]
}'
Wan 2.2 14B T2V example (MoE low/high LoRAs):

curl -X POST http://localhost:8000/v1/inference \
-H 'content-type: application/json' \
-d '{
"model": "wan22_14b_t2v",
"loras": [
{"path": "my_wan_job/low_noise.safetensors", "transformer": "low", "network_multiplier": 1.0},
{"path": "my_wan_job/high_noise.safetensors", "transformer": "high", "network_multiplier": 1.0}
],
"prompts": [{
"prompt": "a cinematic shot of a city at night",
"width": 1280,
"height": 720,
"num_frames": 41,
"fps": 16,
"sample_steps": 25,
"guidance_scale": 4.0,
"seed": 42
}]
}'
Python example (create → poll → fetch):

import time

import requests

BASE_URL = "http://localhost:8000"

payload = {
    "model": "zimage_turbo",
    "trigger_word": "sks",
    "loras": [{"path": "my_lora_job/my_lora_job.safetensors", "network_multiplier": 1.0}],
    "prompts": [
        {
            "prompt": "[trigger] a photo of a person",
            "width": 1024,
            "height": 1024,
            "seed": 42,
            "sample_steps": 8,
            "guidance_scale": 1.0,
            "neg": "",
        }
    ],
}

# 1) Start request
r = requests.post(f"{BASE_URL}/v1/inference", json=payload, timeout=30)
r.raise_for_status()
job = r.json()
request_id = job["request_id"]

# 2) Poll status
while True:
    s = requests.get(f"{BASE_URL}/v1/requests/{request_id}/status", timeout=30)
    s.raise_for_status()
    status = s.json()["status"]
    if status in ("succeeded", "failed", "canceled"):
        break
    time.sleep(1)

# 3) Fetch result
result = requests.get(f"{BASE_URL}/v1/requests/{request_id}/result", timeout=30)
result.raise_for_status()
print(result.json())
Reminder: file_path values in results are local paths on the server (this repo does not ship object storage/static hosting by default).
Once you can hit the API successfully, the fastest way to get "preview matching" is to follow the per-model page in the Model Catalog.