Rotobot Next CLI Reference

Rotobot Next is a command-line tool that produces per-person spline JSON and, optionally, raster sidecars from a folder of frames. A network license relay must be reachable while the binary runs; see the Relay Server page for setup and the FAQ for common questions.

CLI Arguments

usage: rotobot_next [-h] --image_folder IMAGE_FOLDER
                    [--output_folder OUTPUT_FOLDER]
                    [--depth_folder DEPTH_FOLDER] [--auto_depth]
                    [--auto_depth_window AUTO_DEPTH_WINDOW]
                    [--z_threshold Z_THRESHOLD]
                    [--mask_type {sam3,vit,mematte,user}]
                    [--mattes_folder MATTES_FOLDER] [--mattes_output]
                    [--debug_output] [--filled_shapes_per_person_output]
                    [--output_trimap] [--output_sam3_mask]
                    [--sam3_mask_combine {best_iou,union_top2,union}]
                    [--vitmatte_tile_size VITMATTE_TILE_SIZE]
                    [--vitmatte_tile_overlap VITMATTE_TILE_OVERLAP]
                    [--mematte_max_tokens MEMATTE_MAX_TOKENS]
                    [--bbox_thresh BBOX_THRESH] [--use_mask]
                    [--checkpoint_path CHECKPOINT_PATH]
                    [--detector_name DETECTOR_NAME]
                    [--segmentor_name SEGMENTOR_NAME] [--fov_name FOV_NAME]
                    [--detector_path DETECTOR_PATH]
                    [--segmentor_path SEGMENTOR_PATH] [--fov_path FOV_PATH]
                    [--mhr_path MHR_PATH]
                    [--mematte_config_path MEMATTE_CONFIG_PATH]
                    [--mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH]

rotobot_next — produce per-person spline JSON and optional raster sidecars
(mattes, trimaps, debug overlays) from a folder of frames, with optional
depth-aware occlusion gating.

options:
  -h, --help            show this help message and exit

input / output:
  --image_folder IMAGE_FOLDER
                        Folder of input frames (JPG/PNG/EXR; sorted
                        alphabetically).
  --output_folder OUTPUT_FOLDER
                        Output folder for JSON + sidecars (default:
                        ./output/<image_folder_name>).

depth-aware occlusion gating:
  --depth_folder DEPTH_FOLDER
                        Folder of metric-depth EXR files (one per input frame,
                        sorted 1:1). When supplied, enables depth-aware
                        occlusion gating; when omitted, the pipeline runs in
                        pure-RGB mode.
  --auto_depth          Run a Depth-Anything-3 (DA3METRIC-LARGE) prepass over
                        --image_folder before matting. Writes one EXR per
                        frame to <output_folder>/auto_depth/ and then enables
                        depth-aware occlusion gating against that folder.
                        Do not combine with --depth_folder (if both are given,
                        the explicit folder always wins).
  --auto_depth_window AUTO_DEPTH_WINDOW
                        Temporal window (frames) for the --auto_depth prepass.
                        Default 3 matches the standalone DA3 CLI; the centre
                        frame of each window is written to disk.
  --z_threshold Z_THRESHOLD
                        Metric depth tolerance (m) at a limb's midline, scaled
                        per-segment by SEGMENT_DEPTH_MULT. Only used when
                        depth gating is on (--depth_folder or --auto_depth).
                        Default 0.025. Both 0.025 and 0.05 produce no
                        artefacts vs. the no-depth baseline; the previous 0.17
                        default left visible artefacts. A tighter threshold
                        also runs ~20% faster at 4K (rays terminate earlier).

matting:
  --mask_type {sam3,vit,mematte,user}
                        Matte source: mematte (default — Memory Efficient
                        Matting ViT-B Composition-1k), vit (HF ViTMatte
                        tiled), sam3 (raw SAM3 mask), or user (read pre-
                        computed mattes from --mattes_folder).
  --mattes_folder MATTES_FOLDER
                        Folder of pre-computed matte images. Only used with
                        --mask_type user.

output sidecars (off by default):
  --mattes_output       Write final alpha mattes as JPEGs under
                        <output_folder>/mattes/.
  --debug_output        Write pixel-based debug visualisations under
                        <output_folder>/debug/.
  --filled_shapes_per_person_output
                        Write per-person filled-shape rasters under
                        <output_folder>/filled_shapes/.
  --output_trimap       Save the per-person trimap fed to ViTMatte as JPEG
                        under <output_folder>/trimaps/ (0=bg, 128=unknown band
                        between SAM3 erode/dilate, 255=fg). One JPG per person
                        per frame. Pair with --output_sam3_mask to diagnose
                        whether bad alpha edges come from the trimap or from
                        the matting model itself.
  --output_sam3_mask    Save the greyscale SAM3 mask (max-of-BGR over the
                        cleaned SAM3 matte, before thresholding) as JPEG under
                        <output_folder>/sam3_masks/. This is the source the
                        trimap is built from.

tuning (defaults are good):
  --sam3_mask_combine {best_iou,union_top2,union}
                        How to combine SAM3's three candidate masks (subpart /
                        part / whole). union_top2 (default): OR the top two by
                        IoU; best trade-off between coverage and strays.
                        best_iou: single highest-IoU candidate; fewer
                        hallucinations, may drop head/hair pixels. union: OR
                        all three (legacy; broader coverage, more strays).
                        Connected-component gating runs after the combine.
  --vitmatte_tile_size VITMATTE_TILE_SIZE
                        Edge length (px) of each square ViTMatte tile. Larger
                        tiles see more global context per inference but use
                        more VRAM. Default 1024.
  --vitmatte_tile_overlap VITMATTE_TILE_OVERLAP
                        Pixel overlap between adjacent ViTMatte tiles. Default
                        128.
  --mematte_max_tokens MEMATTE_MAX_TOKENS
                        MEMatte backbone token budget. At 4K with topk=0.25
                        the router picks ~8000 tokens so this cap rarely
                        fires. Default 24000.
  --bbox_thresh BBOX_THRESH
                        Bounding box detection threshold. Default 0.8.
  --use_mask            Use mask-conditioned prediction (segmentation mask
                        auto-generated from bbox).

model paths (advanced — defaults baked into the docker image):
  --checkpoint_path CHECKPOINT_PATH
                        Path to SAM 3D Body model checkpoint.
  --detector_name DETECTOR_NAME
                        Human detection model name (default: vitdet).
  --segmentor_name SEGMENTOR_NAME
                        Human segmentation model name (default: sam3).
  --fov_name FOV_NAME   FOV estimation model name (default: moge2).
  --detector_path DETECTOR_PATH
                        Human detection model folder (or set
                        SAM3D_DETECTOR_PATH).
  --segmentor_path SEGMENTOR_PATH
                        Human segmentation model folder (or set
                        SAM3D_SEGMENTOR_PATH).
  --fov_path FOV_PATH   FOV estimation model folder (or set SAM3D_FOV_PATH).
  --mhr_path MHR_PATH   MoHR/assets folder (or set SAM3D_MHR_PATH).
  --mematte_config_path MEMATTE_CONFIG_PATH
                        Path to MEMatte LazyConfig file. Default: ViT-B
                        Composition-1k.
  --mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH
                        Path to MEMatte .pth checkpoint. Default: ViT-B
                        Composition-1k weights.

Examples:
  # Pure-RGB pass, defaults (mematte alpha, union_top2 SAM3 combine).
  rotobot_next --image_folder ./frames --output_folder ./out

  # Depth-aware pass with the built-in DA3 prepass; no separate
  # depth_anything binary needed. Writes <out>/auto_depth/ first,
  # then mattes against it.
  rotobot_next \
      --image_folder ./frames \
      --output_folder ./out \
      --auto_depth --z_threshold 0.05 \
      --debug_output

  # Depth-aware pass against a pre-computed depth folder (e.g. a
  # DA3 run cached from a previous batch) with all sidecars on.
  rotobot_next \
      --image_folder ./frames \
      --depth_folder  ./frames_depth \
      --output_folder ./out \
      --mattes_output --debug_output \
      --output_trimap --output_sam3_mask \
      --filled_shapes_per_person_output
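
Because --depth_folder expects one EXR per input frame, paired 1:1 in sorted order, a count mismatch is worth catching before launching a long run. The following is a minimal pre-flight sketch, not part of rotobot_next. The helper names (list_sorted, check_depth_pairing) and the extension sets are assumptions for illustration; the tool's own file filter and sort order may differ.

```python
from pathlib import Path

# Assumption: these extensions mirror the formats named in the help text
# (JPG/PNG/EXR for frames, EXR for depth); the tool's own filter may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".exr"}
DEPTH_EXTS = {".exr"}


def list_sorted(folder, exts):
    """Return file names in `folder` with a matching extension, sorted by name."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )


def check_depth_pairing(image_folder, depth_folder):
    """Pair frames with depth maps by sorted position; raise if counts differ."""
    frames = list_sorted(image_folder, IMAGE_EXTS)
    depths = list_sorted(depth_folder, DEPTH_EXTS)
    if len(frames) != len(depths):
        raise ValueError(
            f"{len(frames)} frames in {image_folder} but "
            f"{len(depths)} depth files in {depth_folder}"
        )
    return list(zip(frames, depths))
```

Calling check_depth_pairing("./frames", "./frames_depth") returns the (frame, depth) name pairs on success, which can be skimmed to confirm that the sorted order lines up as expected.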

Environment variables (used when the matching --*_path flag is empty):
  SAM3D_DETECTOR_PATH    human detection model folder
  SAM3D_SEGMENTOR_PATH   human segmentation model folder
  SAM3D_FOV_PATH         FOV estimation model folder
  SAM3D_MHR_PATH         MoHR / assets folder
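
The precedence described above (an explicit --*_path flag first, then the matching environment variable) can be sketched as follows. This illustrates the documented behaviour only and is not the tool's actual code; resolve_model_path and ENV_FALLBACKS are hypothetical names, and returning an empty string when neither source is set is an assumption.

```python
import os

# Flag name -> environment variable, as listed in the help text.
ENV_FALLBACKS = {
    "detector_path": "SAM3D_DETECTOR_PATH",
    "segmentor_path": "SAM3D_SEGMENTOR_PATH",
    "fov_path": "SAM3D_FOV_PATH",
    "mhr_path": "SAM3D_MHR_PATH",
}


def resolve_model_path(flag_name, flag_value, environ=None):
    """Return the flag value if non-empty, else the matching env var, else ''."""
    environ = os.environ if environ is None else environ
    if flag_value:
        # An explicit flag takes precedence over the environment.
        return flag_value
    # Assumption: an unset variable resolves to an empty string here.
    return environ.get(ENV_FALLBACKS[flag_name], "")
```

For example, with SAM3D_FOV_PATH exported and --fov_path left empty, resolve_model_path("fov_path", "") yields the exported folder; passing a non-empty flag value overrides it.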

Dependencies & Open Source

The following libraries are packaged with and used by the Rotobot Next engine:

CPython https://github.com/python/cpython/blob/1857a40807daeae3a1bf5efb682de9c9ae6df845/LICENSE