Frequently Asked Questions

Which `--mask_type` should I pick?

The default is mematte (Memory Efficient Matting, ViT-B Composition-1k). It produces the cleanest hair and silhouette edges of the three model-based options and is the right starting point.

mematte — default. Good edges, moderate VRAM.
vit — HuggingFace ViTMatte, tiled. Higher VRAM cost, useful if MEMatte underperforms on a specific frame.
sam3 — raw SAM3 mask, no alpha refinement. Fastest; use only when you want SAM3's coarse output and will refine downstream.
user — read pre-computed mattes from --mattes_folder. Use when you have alpha mattes from a different tool and only want the spline JSON from Rotobot Next.
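
If you launch the binary from a script, the choice above can be validated before anything runs. The sketch below is a hypothetical helper, not part of Rotobot Next: the function name is invented for illustration, and it assumes the four values listed above are the only accepted ones and that user always needs --mattes_folder.

```python
# Mask types listed in this FAQ; assumed to be the complete set.
MASK_TYPES = ("mematte", "vit", "sam3", "user")

def mask_args(mask_type="mematte", mattes_folder=None):
    """Build the --mask_type portion of an argv list (hypothetical helper)."""
    if mask_type not in MASK_TYPES:
        raise ValueError(f"unknown mask type: {mask_type!r}")
    args = ["--mask_type", mask_type]
    if mask_type == "user":
        # 'user' reads pre-computed mattes, so a folder must be supplied.
        if mattes_folder is None:
            raise ValueError("mask type 'user' needs a mattes folder")
        args += ["--mattes_folder", str(mattes_folder)]
    return args
```

The returned list can be appended to the rest of a command line; nothing is executed here.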

When do I need depth-aware gating, and what `--z_threshold` should I use?

Turn on depth-aware occlusion gating whenever one body part can cross another — typically multi-person scenes where one person stands in front of another, or arms/legs that overlap the torso. There are two ways to feed depth into rotobot_next:

--auto_depth — runs the bundled Depth-Anything-3 (DA3METRIC-LARGE) prepass over --image_folder automatically. No separate depth pass needed; the EXRs land in <output_folder>/auto_depth/ and are then consumed by the matting pass. Best for one-off runs and when you don't already have depth cached.
--depth_folder ./frames_depth — point at a pre-computed folder of metric-depth EXRs (one per input frame, sorted 1:1). Best when you're iterating on matting parameters and don't want to re-run the depth pass each time, or when you have a higher-quality depth source than DA3.
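
Since --depth_folder expects one EXR per input frame, matched 1:1 in sorted order, a quick pre-flight check can catch a mismatched folder before a long run. This is a minimal sketch, not something the binary ships: the function name and the list of frame extensions are assumptions.

```python
from pathlib import Path

# Assumed frame extensions; adjust to match your sequence.
FRAME_EXTS = {".jpg", ".jpeg", ".png", ".exr"}

def pair_frames_with_depth(image_folder, depth_folder):
    """Pair sorted frames with sorted depth EXRs; fail on a count mismatch."""
    frames = sorted(p for p in Path(image_folder).iterdir()
                    if p.suffix.lower() in FRAME_EXTS)
    depths = sorted(Path(depth_folder).glob("*.exr"))
    if len(frames) != len(depths):
        raise ValueError(f"{len(frames)} frames but {len(depths)} depth EXRs")
    # Pairing is purely positional, mirroring the sorted 1:1 rule above.
    return list(zip(frames, depths))
```

The check only compares counts and sort order; it does not open the EXRs or verify that they hold metric depth.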

The default --z_threshold 0.025 (metres at the limb midline, scaled per-segment) is the result of a five-value wedge run on 4K clips at [0.2, 0.1, 0.075, 0.05, 0.025]. Findings:

0.025 and 0.05: no artefacts versus the no-depth baseline; the recommended range.
0.2 (the historical default): leaves visible artefacts; do not use.
Tighter thresholds also run roughly 20% faster at 4K because rays terminate earlier.

--z_threshold applies the same way regardless of how depth arrives: the --auto_depth prepass writes the same metric-depth EXR format that the --depth_folder path expects.
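
To repeat a wedge like this on your own footage, the command lines can be generated rather than typed by hand. The sketch below only builds argv lists and runs nothing; the function name and the per-threshold output folder layout (z_<value>) are assumptions made for illustration.

```python
# Threshold values from the five-value wedge described above.
WEDGE = [0.2, 0.1, 0.075, 0.05, 0.025]

def wedge_commands(image_folder, depth_folder, output_root, thresholds=WEDGE):
    """Build one rotobot_next argv list per --z_threshold value."""
    commands = []
    for z in thresholds:
        commands.append([
            "rotobot_next",
            "--image_folder", str(image_folder),
            "--depth_folder", str(depth_folder),  # reuse one cached depth pass
            "--z_threshold", str(z),
            "--output_folder", f"{output_root}/z_{z}",  # assumed layout
        ])
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True). Pointing every run at the same --depth_folder keeps depth constant, so only the threshold varies between results.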

What exactly does `--auto_depth` do?

--auto_depth runs Depth-Anything-3 (DA3METRIC-LARGE) as a disk-backed prepass before matting starts:

Load DA3METRIC-LARGE onto the GPU.
Iterate every frame in --image_folder with a temporal window (default 3 frames; tunable via --auto_depth_window) and write the centre-frame depth to <output_folder>/auto_depth/<stem>_depth.exr — 16-bit float, PIZ compressed, with depthMin / depthMax / camera matrices in the header.
Delete the model and empty the CUDA allocator so the matting models (SAM3, SAM3D-Body, MEMatte/ViTMatte, MoGE-2) get the full VRAM budget.
Continue into the normal matting pipeline, treating the auto_depth/ folder as if it were a manually supplied --depth_folder.
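
The naming rule in the second step makes it possible to check from a script whether a prepass finished. A minimal sketch, assuming <stem> is the frame filename without its extension; both function names are hypothetical.

```python
from pathlib import Path

def expected_depth_path(frame_path, output_folder):
    """Map a frame to <output_folder>/auto_depth/<stem>_depth.exr."""
    stem = Path(frame_path).stem  # assumed: filename minus its extension
    return Path(output_folder) / "auto_depth" / f"{stem}_depth.exr"

def missing_depth_frames(frame_paths, output_folder):
    """Return the frames whose depth EXR has not been written yet."""
    return [f for f in frame_paths
            if not expected_depth_path(f, output_folder).is_file()]
```

An empty list from missing_depth_frames suggests the auto_depth/ folder is complete enough to reuse; it does not validate the EXR contents.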

Why a disk-backed prepass instead of in-process: DA3METRIC-LARGE and the matting models cannot be co-resident on a 24 GB GPU, and swapping tensors via the CPU is too slow at 4K. Writing depth to disk and tearing the DA3 process state down completely is the cleanest separation.

Tradeoffs vs. supplying your own --depth_folder:

Pro: single binary, single invocation; no dependency on a separate depth tool.
Pro: header metadata round-trips with the same depthMin/depthMax attributes the legacy depth pipeline expects, so it's bit-compatible with everything downstream.
Con: depth runs every time. If you batch the same clip with several matting variations, run --auto_depth once, then feed the resulting auto_depth/ folder back in via --depth_folder on subsequent runs.
Con: DA3 infers each frame's depth independently from its short --auto_depth_window, with no temporal smoothing across the clip. A separately produced, temporally regularised depth pass (or one with bilateral filtering applied) can be cleaner on certain clips.

--auto_depth and --depth_folder are not meant to be combined. If both are passed, the explicit folder wins and the prepass is skipped (with a warning to stderr).
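
The run-once-then-reuse advice can be wrapped in a small launcher helper. This sketch is an assumption-laden illustration, not behaviour of the binary: the function name is hypothetical, and it treats the prepass as cached as soon as at least one *_depth.exr exists, without checking that every frame is covered.

```python
from pathlib import Path

def depth_args(output_folder, depth_folder=None):
    """Pick depth flags: explicit folder, then a cached prepass, else --auto_depth."""
    if depth_folder is not None:
        # An explicit folder takes priority, mirroring the rule above.
        return ["--depth_folder", str(depth_folder)]
    cached = Path(output_folder) / "auto_depth"
    if cached.is_dir() and any(cached.glob("*_depth.exr")):
        # A previous --auto_depth run left EXRs behind; reuse them.
        return ["--depth_folder", str(cached)]
    return ["--auto_depth"]
```

Passing exactly one of the two flags also avoids the stderr warning.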

How are licenses counted?

Per process. One rotobot_next worker = one seat. Running two workers in parallel on the same machine consumes two seats. See the Relay Server page for details.

Where does output land if I omit `--output_folder`?

In ./output/<image_folder_name>/ relative to the current working directory. The JSON file plus any requested sidecar folders (mattes/, debug/, trimaps/, sam3_masks/, filled_shapes/) are written under that root.
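
A wrapper script that needs to find the results afterwards can compute the same location. A minimal sketch, assuming <image_folder_name> is simply the last component of the --image_folder path; the function name is hypothetical.

```python
from pathlib import Path

def default_output_root(image_folder, cwd=None):
    """Return ./output/<image_folder_name>/ relative to the working directory."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    # Assumption: the folder name is the final path component.
    return base / "output" / Path(image_folder).name
```

For example, default_output_root("/data/shots/sh010", cwd="/work") resolves to /work/output/sh010.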

Can I run multiple workers on one GPU?

Yes, depending on input resolution and available VRAM:

HD (1080p) / 1440p: each worker takes around 14–15 GB, so two workers need roughly 28–30 GB in total. That does not fit on a 24 GB card; run two workers only on a GPU with more VRAM than that.
4K / UHD: single worker. Two workers will OOM the ViTMatte / MEMatte tile pass on a 24 GB card and run unreliably on 48 GB.

Most of the batch scripts in visualisation/ use a flock-based claim pattern to share a queue between worker processes safely.
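
For readers unfamiliar with the idea, a flock-based claim works roughly as follows: each worker takes an exclusive lock on a shared queue file, removes one entry, and releases the lock. The sketch below is a generic illustration of that technique on Unix (fcntl is not available on Windows). It is not the code in visualisation/, which may be implemented differently, and the one-job-per-line queue format is an assumption.

```python
import fcntl
import os

def claim_next(queue_file):
    """Atomically pop the first line of queue_file; return None when empty."""
    fd = os.open(queue_file, os.O_RDWR | os.O_CREAT)
    with os.fdopen(fd, "r+") as fh:
        # Blocks until no other worker holds the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            lines = [ln for ln in fh.read().splitlines() if ln.strip()]
            if not lines:
                return None
            # Rewrite the file without the claimed entry.
            fh.seek(0)
            fh.truncate()
            fh.write("".join(ln + "\n" for ln in lines[1:]))
            fh.flush()
            return lines[0]
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Each worker loops on claim_next until it returns None. Because the read and the rewrite both happen under the lock, two workers cannot claim the same entry.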

What input formats does Rotobot Next accept?

JPG, PNG, and EXR sequences. EXR input goes through OpenColorIO; the bundle ships the ACES 1.0.3 config so ACEScg-linear EXRs are colour-managed correctly by default. Set OCIO=/path/to/config.ocio to point at a different config.

Why was my first run slow on a cold machine?

The bundled binary materialises the SAM3, SAM3D-Body, ViTMatte / MEMatte, and MoGE-2 weights on first call (about 30–60 seconds on a typical NVMe). Subsequent runs in the same process tree reuse cached weights and start instantly.

--help itself returns in well under a second — it skips model loading entirely.

What environment variables does the binary read?

TOKGAN_RELAY_URL: License relay URL. Default http://0.0.0.0:6349. See the Relay Server page.
OCIO: Path to an OpenColorIO config file (required for non-default colour management; the bundled ACES 1.0.3 config covers most VFX workflows).
SAM3D_DETECTOR_PATH: Override the bundled vitdet detector folder.
SAM3D_SEGMENTOR_PATH: Override the bundled SAM3 segmentor folder.
SAM3D_FOV_PATH: Override the bundled MoGE-2 FOV estimator folder.
SAM3D_MHR_PATH: Override the bundled Momentum Human Rig assets folder.
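
When launching workers from a script, these variables can be set per process instead of being exported globally. A minimal sketch with a hypothetical helper name; it rejects any variable not listed above so that typos fail early.

```python
import os

# Variable names taken from the list above.
KNOWN_VARS = (
    "TOKGAN_RELAY_URL", "OCIO", "SAM3D_DETECTOR_PATH",
    "SAM3D_SEGMENTOR_PATH", "SAM3D_FOV_PATH", "SAM3D_MHR_PATH",
)

def worker_env(overrides, base=None):
    """Copy an environment and apply overrides, rejecting unlisted names."""
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        if name not in KNOWN_VARS:
            raise KeyError(f"unknown variable: {name}")
        env[name] = str(value)
    return env
```

The result can be passed as the env argument of subprocess.run, for example worker_env({"OCIO": "/path/to/config.ocio"}).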