Rotobot Next CLI Reference
Rotobot Next is a command-line tool that produces per-person spline JSON and (optionally) raster sidecars from a folder of frames. A network license relay must be reachable while the binary runs — see the Relay Server page for setup, and the FAQ for common questions.
CLI Arguments
usage: rotobot_next [-h] --image_folder IMAGE_FOLDER
                    [--output_folder OUTPUT_FOLDER]
                    [--depth_folder DEPTH_FOLDER] [--auto_depth]
                    [--auto_depth_window AUTO_DEPTH_WINDOW]
                    [--z_threshold Z_THRESHOLD]
                    [--mask_type {sam3,vit,mematte,user}]
                    [--mattes_folder MATTES_FOLDER] [--mattes_output]
                    [--debug_output] [--filled_shapes_per_person_output]
                    [--output_trimap] [--output_sam3_mask]
                    [--sam3_mask_combine {best_iou,union_top2,union}]
                    [--vitmatte_tile_size VITMATTE_TILE_SIZE]
                    [--vitmatte_tile_overlap VITMATTE_TILE_OVERLAP]
                    [--mematte_max_tokens MEMATTE_MAX_TOKENS]
                    [--bbox_thresh BBOX_THRESH] [--use_mask]
                    [--checkpoint_path CHECKPOINT_PATH]
                    [--detector_name DETECTOR_NAME]
                    [--segmentor_name SEGMENTOR_NAME] [--fov_name FOV_NAME]
                    [--detector_path DETECTOR_PATH]
                    [--segmentor_path SEGMENTOR_PATH] [--fov_path FOV_PATH]
                    [--mhr_path MHR_PATH]
                    [--mematte_config_path MEMATTE_CONFIG_PATH]
                    [--mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH]
rotobot_next — produce per-person spline JSON and optional raster sidecars
(mattes, trimaps, debug overlays) from a folder of frames, with optional
depth-aware occlusion gating.
options:
-h, --help show this help message and exit
input / output:
--image_folder IMAGE_FOLDER
Folder of input frames (JPG/PNG/EXR; sorted
alphabetically).
--output_folder OUTPUT_FOLDER
Output folder for JSON + sidecars (default:
./output/<image_folder_name>).
depth-aware occlusion gating:
--depth_folder DEPTH_FOLDER
Folder of metric-depth EXR files (one per input frame,
sorted 1:1). When supplied, enables depth-aware
occlusion gating; when omitted, the pipeline runs in
pure-RGB mode.
--auto_depth Run a Depth-Anything-3 (DA3METRIC-LARGE) prepass over
--image_folder before matting. Writes one EXR per
frame to <output_folder>/auto_depth/ and then enables
depth-aware occlusion gating against that folder.
Do not combine with --depth_folder; the explicit
folder always wins.
--auto_depth_window AUTO_DEPTH_WINDOW
Temporal window (frames) for the --auto_depth prepass.
Default 3 matches the standalone DA3 CLI; the centre
frame of each window is written to disk.
--z_threshold Z_THRESHOLD
Metric depth tolerance (m) at a limb's midline, scaled
per-segment by SEGMENT_DEPTH_MULT. Only used when depth
gating is enabled (--depth_folder or --auto_depth).
Default 0.025. Both 0.025 and 0.05 produce no artefacts
vs. the no-depth baseline; the previous 0.17 default
left visible artefacts. Tighter z also runs ~20% faster
at 4K (rays terminate earlier).
matting:
--mask_type {sam3,vit,mematte,user}
Matte source: mematte (default — Memory Efficient
Matting ViT-B Composition-1k), vit (HF ViTMatte
tiled), sam3 (raw SAM3 mask), or user (read pre-
computed mattes from --mattes_folder).
--mattes_folder MATTES_FOLDER
Folder of pre-computed matte images. Only used with
--mask_type user.
output sidecars (off by default):
--mattes_output Write final alpha mattes as JPEGs under
<output_folder>/mattes/.
--debug_output Write pixel-based debug visualisations under
<output_folder>/debug/.
--filled_shapes_per_person_output
Write per-person filled-shape rasters under
<output_folder>/filled_shapes/.
--output_trimap Save the per-person trimap fed to ViTMatte as JPEG
under <output_folder>/trimaps/ (0=bg, 128=unknown band
between SAM3 erode/dilate, 255=fg). One JPG per person
per frame. Pair with --output_sam3_mask to diagnose
whether bad alpha edges come from the trimap or from
the matting model itself.
--output_sam3_mask Save the greyscale SAM3 mask (max-of-BGR over the
cleaned SAM3 matte, before thresholding) as JPEG under
<output_folder>/sam3_masks/. This is the source the
trimap is built from.
tuning (defaults are good):
--sam3_mask_combine {best_iou,union_top2,union}
How to combine SAM3's three candidate masks (subpart /
part / whole). union_top2 (default): OR top two by IoU
— best coverage/strays trade. best_iou: single
highest-IoU candidate — fewer hallucinations, may drop
head/hair pixels. union: OR all three (legacy —
broader coverage, more strays). Connected-component
gating runs after the combine.
--vitmatte_tile_size VITMATTE_TILE_SIZE
Edge length (px) of each square ViTMatte tile. Larger
tiles see more global context per inference but use
more VRAM. Default 1024.
--vitmatte_tile_overlap VITMATTE_TILE_OVERLAP
Pixel overlap between adjacent ViTMatte tiles. Default
128.
--mematte_max_tokens MEMATTE_MAX_TOKENS
MEMatte backbone token budget. At 4K with topk=0.25
the router picks ~8000 tokens so this cap rarely
fires. Default 24000.
--bbox_thresh BBOX_THRESH
Bounding box detection threshold. Default 0.8.
--use_mask Use mask-conditioned prediction (segmentation mask
auto-generated from bbox).
model paths (advanced — defaults baked into the docker image):
--checkpoint_path CHECKPOINT_PATH
Path to SAM 3D Body model checkpoint.
--detector_name DETECTOR_NAME
Human detection model name (default: vitdet).
--segmentor_name SEGMENTOR_NAME
Human segmentation model name (default: sam3).
--fov_name FOV_NAME FOV estimation model name (default: moge2).
--detector_path DETECTOR_PATH
Human detection model folder (or set
SAM3D_DETECTOR_PATH).
--segmentor_path SEGMENTOR_PATH
Human segmentation model folder (or set
SAM3D_SEGMENTOR_PATH).
--fov_path FOV_PATH FOV estimation model folder (or set SAM3D_FOV_PATH).
--mhr_path MHR_PATH MoHR/assets folder (or set SAM3D_MHR_PATH).
--mematte_config_path MEMATTE_CONFIG_PATH
Path to MEMatte LazyConfig file. Default: ViT-B
Composition-1k.
--mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH
Path to MEMatte .pth checkpoint. Default: ViT-B
Composition-1k weights.
Examples:
# Pure-RGB pass, defaults (mematte alpha, union_top2 SAM3 combine).
rotobot_next --image_folder ./frames --output_folder ./out
# Depth-aware pass with built-in DA3 prepass — no separate
# depth_anything binary needed. Writes <out>/auto_depth/ first,
# then mats against it.
rotobot_next \
--image_folder ./frames \
--output_folder ./out \
--auto_depth --z_threshold 0.05 \
--debug_output
# Depth-aware pass against a pre-computed depth folder (e.g. a
# DA3 run cached from a previous batch) with all sidecars on.
rotobot_next \
--image_folder ./frames \
--depth_folder ./frames_depth \
--output_folder ./out \
--mattes_output --debug_output \
--output_trimap --output_sam3_mask \
--filled_shapes_per_person_output
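The --depth_folder help text asks for one depth EXR per input frame, matched 1:1 in sorted order. A small pre-flight check can catch a mismatched folder before a long run. The Python sketch below is illustrative only: pair_frames_with_depth is a hypothetical helper, not part of rotobot_next, and it assumes that "sorted alphabetically" means a plain lexicographic sort of file names and that .jpeg counts as a JPG extension.

```python
from pathlib import Path

# Extensions taken from the --image_folder help text (JPG/PNG/EXR).
# Accepting ".jpeg" as well is an assumption.
FRAME_EXTS = {".jpg", ".jpeg", ".png", ".exr"}


def pair_frames_with_depth(image_folder, depth_folder):
    """Pair the i-th sorted frame with the i-th sorted depth EXR.

    Hypothetical helper, not part of rotobot_next. Raises ValueError
    when the two folders hold a different number of files.
    """
    frames = sorted(
        (p for p in Path(image_folder).iterdir()
         if p.is_file() and p.suffix.lower() in FRAME_EXTS),
        key=lambda p: p.name,
    )
    depths = sorted(
        (p for p in Path(depth_folder).iterdir()
         if p.is_file() and p.suffix.lower() == ".exr"),
        key=lambda p: p.name,
    )
    if len(frames) != len(depths):
        raise ValueError(
            f"{len(frames)} frames but {len(depths)} depth EXRs; "
            "expected one depth file per frame"
        )
    return list(zip(frames, depths))
```

Calling pair_frames_with_depth("./frames", "./frames_depth") before the depth-aware example returns the (frame, depth) pairs the sorted order implies, so a dropped or extra depth file shows up as an error or as an obviously wrong pairing.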
Environment variables (used when the matching --*_path flag is empty):
SAM3D_DETECTOR_PATH human detection model folder
SAM3D_SEGMENTOR_PATH human segmentation model folder
SAM3D_FOV_PATH FOV estimation model folder
SAM3D_MHR_PATH MoHR / assets folder
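The fallback rule above (an environment variable is consulted only when the matching --*_path flag is empty) can be sketched in a few lines of Python. This is an illustration of the documented precedence, not the tool's actual code: resolve_model_path and ENV_FALLBACKS are hypothetical names, and returning an empty string when neither source is set is an assumption, since the help text does not say what happens in that case.

```python
import os

# Flag name -> environment variable, as listed in the help text.
ENV_FALLBACKS = {
    "detector_path": "SAM3D_DETECTOR_PATH",
    "segmentor_path": "SAM3D_SEGMENTOR_PATH",
    "fov_path": "SAM3D_FOV_PATH",
    "mhr_path": "SAM3D_MHR_PATH",
}


def resolve_model_path(flag_name, flag_value, environ=None):
    """Return the flag value if non-empty, else the matching env variable.

    Hypothetical helper. Returns "" when neither is set (an assumption;
    the real behaviour in that case is not documented).
    """
    environ = os.environ if environ is None else environ
    if flag_value:
        return flag_value
    return environ.get(ENV_FALLBACKS[flag_name], "")
```

For example, with SAM3D_FOV_PATH set and --fov_path omitted, resolve_model_path("fov_path", "") yields the environment value, while passing --fov_path explicitly overrides it.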
Dependencies & Open Source
The following libraries are packaged with and used by the Rotobot Next engine. Each entry links to the library's license:
CPython https://github.com/python/cpython/blob/1857a40807daeae3a1bf5efb682de9c9ae6df845/LICENSE
CUDA https://docs.nvidia.com/cuda/archive/12.8.1/eula/index.html
cuDNN https://docs.nvidia.com/deeplearning/cudnn/backend/latest/reference/eula.html
Depth Anything 3 https://raw.githubusercontent.com/ByteDance-Seed/Depth-Anything-3/refs/heads/main/LICENSE
Detectron2 https://github.com/facebookresearch/detectron2/blob/main/LICENSE
DINOv3 https://github.com/facebookresearch/dinov3/blob/main/LICENSE.md
ftfy (fixes text for you) https://github.com/rspeer/python-ftfy/blob/74dd0452b48286a3770013b3a02755313bd5575e/LICENSE.txt
HuggingFace Transformers https://github.com/huggingface/transformers/blob/main/LICENSE
iopath https://raw.githubusercontent.com/facebookresearch/iopath/refs/heads/main/LICENSE
MEMatte https://github.com/HainingTHU/MEMatte/blob/main/LICENSE
Momentum Human Rig https://github.com/facebookresearch/MHR/blob/main/LICENSE
MoGe https://raw.githubusercontent.com/microsoft/MoGe/refs/heads/main/LICENSE
NumPy https://github.com/numpy/numpy/blob/a90ef57574c501a780fe834123b20fcea1329f90/LICENSE.txt
OpenColorIO https://github.com/AcademySoftwareFoundation/OpenColorIO/blob/main/LICENSE
OpenImageIO https://github.com/AcademySoftwareFoundation/OpenImageIO/blob/main/LICENSE.md
OpenCV https://github.com/opencv/opencv/blob/105a7747207e678aca53aee17b0f77f1bbd6bdef/LICENSE
PyInstaller https://github.com/pyinstaller/pyinstaller/blob/9efb6f823ef872f9ff6cd365396df7a63459582b/COPYING.txt
PyTorch https://github.com/pytorch/pytorch/blob/main/LICENSE
regex (Python) https://github.com/micbou/regex/blob/d2ef9c49e34bb6dbbe5fa503755dd07c62365aae/PyPI/PKG-INFO
SAM3 (Meta) https://github.com/facebookresearch/sam3/blob/11dec2936de97f2857c1f76b66d982d5a001155d/LICENSE
SAM 3D Body https://github.com/facebookresearch/sam-3d-body/blob/c259bfc38500b1ce9e7048e1a4592e13b84108fe/LICENSE
SciPy https://raw.githubusercontent.com/scipy/scipy/refs/heads/main/LICENSE.txt
timm https://github.com/huggingface/pytorch-image-models/blob/main/LICENSE
tqdm https://github.com/tqdm/tqdm/blob/0ed5d7f18fa3153834cbac0aa57e8092b217cc16/LICENCE
ViTMatte https://github.com/hustvl/ViTMatte/blob/main/LICENSE