Getting started

This page is the orientation an external agent needs before calling any mutating tool. Read it once; the conventions here apply to every tool.

Always describe before mutating

The first call of any session is `describe_video`. It returns the full project JSON — background, video layers, image layers, text layers, shapes, groups, animations, styles, layer order, embed origins. It is free and never mutates anything.

You need `describe_video` because you must not invent element ids or filenames; read them from the response. The single most common way an agent gets lost is skipping this step and constructing an id like `image.title` from intuition. The real id is whatever the project JSON says it is.
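One way to enforce the rule is to check every id against the `describe_video` response before using it. The sketch below assumes a hypothetical response shape in which each layer kind is a top-level list of entries carrying an `id` field; the key names (`video`, `image`, `text`, `shapes`, `group`) and the sample ids are illustrative, not confirmed by this page.

```python
from __future__ import annotations

# Hypothetical top-level keys of the describe_video response; real names may differ.
PREFIXES = ("video", "image", "text", "shapes", "group")


def known_element_ids(project: dict) -> set[str]:
    """Collect every prefixed element id present in the project JSON."""
    ids = {"background.canvas"}  # sentinel backdrop, exactly one per project
    for prefix in PREFIXES:
        for entry in project.get(prefix, []):
            ids.add(f"{prefix}.{entry['id']}")
    return ids


def require_known(element_id: str, project: dict) -> str:
    """Return element_id unchanged, or raise if the project does not contain it."""
    if element_id not in known_element_ids(project):
        raise KeyError(f"unknown element id: {element_id!r}")
    return element_id


# Illustrative stand-in for a describe_video response.
sample_project = {
    "image": [{"id": "hero_7f3a"}],
    "text": [{"id": "title"}],
}
```

With this sample, `require_known("text.title", sample_project)` passes, while the guessed id `image.title` raises `KeyError` because no image layer has that id.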

A typical session is small:

describe_video — see what's there.
Plan the change.
Call one or more mutating tools.
save_version once at the end.

The save-version bracket

Versions are user-visible: the editor's Versions panel lists every saved version, and the user flicks between them to compare states or roll back. An edit you make without saving a version is one the user can't easily revisit.

Wrap a session like this:

save_version(name="baseline before <task>")   ← rollback point
… your mutations …
save_version(name="<short description>")        ← end-of-task marker

Call save_version once per logical change-set, not once per tool call. A version named "add fade-in for stars" is useful; thirty versions each named "change" are noise. Use a short imperative-mood label — the user sees it verbatim.
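The bracket can be wrapped in a small helper so the baseline and the end-of-task marker are never forgotten. This is a sketch, not a client library: `call_tool` is a hypothetical callable standing in for whatever your MCP client exposes, and the `add_keyframe` argument names and the `shapes.star` id are placeholders. In a real session the id comes from `describe_video`.

```python
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result.
ToolCall = Callable[[str, dict], Any]


def run_bracketed(call_tool: ToolCall, task: str, label: str, mutations: list) -> None:
    """Save a baseline, apply every mutation, then save one end-of-task version."""
    call_tool("save_version", {"name": f"baseline before {task}"})  # rollback point
    for tool_name, args in mutations:
        call_tool(tool_name, args)
    call_tool("save_version", {"name": label})  # one marker for the whole change-set


# Stand-in client that only records the calls it receives.
calls = []


def recorder(name: str, args: dict) -> None:
    calls.append((name, args))


run_bracketed(
    recorder,
    task="star fade-in",
    label="add fade-in for stars",
    mutations=[
        # Argument names below are assumptions for illustration.
        ("add_keyframe", {"elementId": "shapes.star", "property": "opacity", "frame": 0, "value": 0}),
        ("add_keyframe", {"elementId": "shapes.star", "property": "opacity", "frame": 30, "value": 1}),
    ],
)
```

Two mutations produce exactly two saved versions, which keeps the Versions panel readable.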

Element ids

Every layer is addressed by a prefixed id. Six id forms appear across the catalog:

Prefix	Element
`video.<id>`	A video layer — carries a `clip` filename, renders its source mp4 into the layer box.
`image.<id>`	An image layer — renders an uploaded bitmap.
`text.<id>`	A text layer — renders live typeset text (multi-line, auto-fit).
`shapes.<id>`	A shape layer — `rect`, `ellipse`, `triangle`, or `star`.
`group.<id>`	A layer group — holds an ordered `children[]` and composes a transform onto every descendant.
`background.canvas`	The sentinel canvas backdrop. Exactly one per project; always painted at the back.

<id> is the id field of the matching entry in describe_video's output — never guessed.

Groups take a bare id in a few places. ungroup_layers, rename_group, and set_group_parent's parentGroupId argument take the bare id (e.g. "header"). Everywhere else — add_keyframe, move_layer, set_layer_fill, set_group_parent's elementId — use the full group.<id> form.
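Because the two forms are easy to mix up, it can help to normalise group references in one place. The helpers below are a minimal sketch; the function names are made up for illustration, and only the `elementId` and `parentGroupId` argument names come from this page.

```python
GROUP_PREFIX = "group."


def bare_group_id(group_ref: str) -> str:
    """'group.header' -> 'header'; an already-bare id is returned unchanged."""
    if group_ref.startswith(GROUP_PREFIX):
        return group_ref[len(GROUP_PREFIX):]
    return group_ref


def full_group_id(group_ref: str) -> str:
    """'header' -> 'group.header'; an already-prefixed id is returned unchanged."""
    if group_ref.startswith(GROUP_PREFIX):
        return group_ref
    return GROUP_PREFIX + group_ref


# set_group_parent mixes both forms: full id for the element, bare id for the parent.
set_parent_args = {
    "elementId": full_group_id("badge"),
    "parentGroupId": bare_group_id("group.header"),
}
```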

The layer tree

project.layer_order is the root-level z-ordering only. A group's children live under that group's children[], not in layer_order. The composition is a tree, not a flat list. describe_video surfaces groups[] and a top-first ordering so you don't have to re-derive the structure.

The canvas backdrop (background.canvas) is not in layer_order — it is always painted first, behind everything.
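To make the tree structure concrete, here is a sketch that walks the root-level order and descends into each group's children. The JSON shape is an assumption: `layer_order` is taken to hold full prefixed ids, and each `groups[]` entry is taken to have a bare `id` and a `children` list of full ids. Check the actual `describe_video` output before relying on it.

```python
def walk_layers(project: dict):
    """Yield (depth, element_id) for every layer, descending into groups."""
    # Map each group's full id to its ordered children (assumed shape).
    children_of = {
        f"group.{g['id']}": g.get("children", [])
        for g in project.get("groups", [])
    }

    def visit(element_id: str, depth: int):
        yield depth, element_id
        for child in children_of.get(element_id, []):
            yield from visit(child, depth + 1)

    # layer_order lists root-level entries only; background.canvas is never in it.
    for root_id in project.get("layer_order", []):
        yield from visit(root_id, 0)


# Illustrative project with one nested group.
sample_project = {
    "layer_order": ["text.title", "group.header"],
    "groups": [
        {"id": "header", "children": ["image.logo", "group.badge"]},
        {"id": "badge", "children": ["shapes.star"]},
    ],
}
tree = list(walk_layers(sample_project))
```

Here `shapes.star` appears at depth 2 even though it is absent from `layer_order`, which is why a flat scan of `layer_order` misses nested layers.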

The coordinate system

The canvas defaults to 1080 × 1920 (9:16) and can be resized per project — 1080×1080 square, 1920×1080 landscape, 1080×1350 (4:5), or a custom size.

A layer's (x, y) is the CENTRE of its bounding box in canvas pixels — not the top-left. This matches Premiere / Final Cut / Motion. Consequences:

To centre a layer on a 1080×1920 canvas, (x, y) = (540, 960).
To place a 200×80 label flush against the top-left corner, its centre is (100, 40) — half its width and half its height in.
(x, y) = (0, 0) puts the layer centre on the canvas's top-left corner, so three-quarters of the layer ends up off-canvas.
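If you think in top-left coordinates, convert before calling a tool. The helpers below are an illustrative sketch (the function names are not part of the tool catalog) and ignore rotation; they reproduce the three cases above.

```python
def centre_from_top_left(left: float, top: float, width: float, height: float):
    """Convert a top-left corner position to the centre-based (x, y) the tools expect."""
    return (left + width / 2, top + height / 2)


def top_left_from_centre(x: float, y: float, width: float, height: float):
    """Inverse conversion: centre-based (x, y) back to the top-left corner."""
    return (x - width / 2, y - height / 2)


def visible_fraction(x, y, width, height, canvas_w=1080, canvas_h=1920):
    """Fraction of an unrotated layer's bounding box that lies inside the canvas."""
    left, top = top_left_from_centre(x, y, width, height)
    overlap_w = max(0, min(left + width, canvas_w) - max(left, 0))
    overlap_h = max(0, min(top + height, canvas_h) - max(top, 0))
    return (overlap_w * overlap_h) / (width * height)
```

For a 200×80 label, `centre_from_top_left(0, 0, 200, 80)` gives `(100.0, 40.0)`, and `visible_fraction(0, 0, 200, 80)` gives `0.25`, confirming that three-quarters of the layer is off-canvas.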

width and height are pixel dimensions of the bounding box. rotation is in degrees, clockwise.

Frames vs seconds

The timeline runs at 30 fps. Every `frame` argument is a 0-indexed integer frame number. Convert with frames = round(seconds × 30):

A 1-second fade is frames 0..30.
A 2.5-second hold is frames 0..75.
"Two seconds in" is frame 60.

set_duration is the exception — it takes a duration in seconds directly (it sets the composition length).
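The conversion is small enough to keep as a helper. This sketch rounds exact halves up with `floor(x + 0.5)`, because Python's built-in `round()` rounds halves to the nearest even integer; which rule the editor itself applies to exact halves is not stated here, so treat that choice as an assumption.

```python
import math

FPS = 30  # timeline frame rate stated on this page


def seconds_to_frame(seconds: float) -> int:
    """Convert seconds to a 0-indexed frame number, rounding halves up."""
    return math.floor(seconds * FPS + 0.5)


def frame_to_seconds(frame: int) -> float:
    """Convert a frame number back to seconds, e.g. for set_duration."""
    return frame / FPS
```

`seconds_to_frame(2.5)` returns `75` and `seconds_to_frame(2)` returns `60`, matching the examples above.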

Animation tracks

Each video / image / text / shapes / group layer can carry animation tracks under project.animations[elementId], keyed by property: x, y, width, height, scale, rotation, opacity. Each property's track is a sorted array of { frame, value, easing? } keyframes.

The key rule, After Effects / Premiere / Final Cut style: when a track exists on a property, it overrides the layer's static value at every frame. So move_layer sets the un-animated default; if the layer has an x track, the track wins. To animate, use add_keyframe. To set a value that isn't animated, use move_layer.
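The override rule can be sketched as a lookup: if a track exists for the property, sample it; otherwise fall back to the static value. The sketch assumes linear interpolation between keyframes and ignores the optional `easing` field, neither of which this page specifies. The `resolve` function and the sample id are illustrative.

```python
from bisect import bisect_right


def sample_track(track: list, frame: int) -> float:
    """Linearly interpolate a sorted keyframe track; hold the boundary value outside it."""
    frames = [k["frame"] for k in track]
    if frame <= frames[0]:
        return track[0]["value"]
    if frame >= frames[-1]:
        return track[-1]["value"]
    i = bisect_right(frames, frame)  # index of the first keyframe after `frame`
    a, b = track[i - 1], track[i]
    return a["value"] + (b["value"] - a["value"]) * (frame - a["frame"]) / (b["frame"] - a["frame"])


def resolve(project: dict, element_id: str, prop: str, frame: int, static_value: float) -> float:
    """Return the track value if a track exists for prop, else the static value."""
    track = project.get("animations", {}).get(element_id, {}).get(prop)
    if track:
        return sample_track(track, frame)
    return static_value


# Illustrative project: x is animated, y is not.
project = {
    "animations": {
        "image.logo": {"x": [{"frame": 0, "value": 100}, {"frame": 30, "value": 400}]}
    }
}
```

At frame 15, `resolve(project, "image.logo", "x", 15, static_value=540)` returns `250.0` and the static 540 is ignored, while `y` has no track and so returns its static value.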

Track values are absolute for leaf layers: x/y are the layer centre's canvas-space pixel position, width/height are pixel dimensions, rotation is in degrees, scale is applied about the layer centre (1 = no change), and opacity ranges from 0 to 1.

Groups have no static body — their x/y track values are translation offsets applied around the group's frozen pivot, and a group transform composes onto every descendant. A group rotating 30° rotates everything inside it 30° on top of each child's own rotation.

Extrapolation past the ends

A separate per-property setting controls what happens before the first keyframe and after the last. Set it with set_track_loop:

hold (default) — keep the boundary keyframe's value forever.
loop — wrap past the last keyframe back to the first; the animation restarts.
ping-pong — alternate direction each cycle, bouncing back and forth.
cycle — wrap like loop, but each cycle adds the boundary delta (endless rotation or scrolling).

Tracks with fewer than two keyframes ignore the loop mode.
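The four modes can be read as different ways of mapping an out-of-range frame back into the keyframed span. The sketch below is one plausible interpretation, not the editor's implementation: it assumes linear interpolation, a non-empty track with distinct keyframe frames, and that frames before the first keyframe wrap the same way as frames after the last.

```python
def _lerp(track: list, frame: float) -> float:
    """Linear interpolation for a frame inside the keyframed range."""
    for a, b in zip(track, track[1:]):
        if a["frame"] <= frame <= b["frame"]:
            return a["value"] + (b["value"] - a["value"]) * (frame - a["frame"]) / (b["frame"] - a["frame"])
    raise ValueError("frame outside keyframed range")


def sample_with_mode(track: list, frame: int, mode: str = "hold") -> float:
    first, last = track[0], track[-1]
    if len(track) < 2:
        return first["value"]  # fewer than two keyframes: loop mode is ignored
    span = last["frame"] - first["frame"]
    offset = frame - first["frame"]
    if mode == "hold" or 0 <= offset <= span:
        # Inside the range, or holding the boundary value outside it.
        return _lerp(track, first["frame"] + min(max(offset, 0), span))
    cycles, phase = divmod(offset, span)  # floor division, so negative offsets wrap too
    if mode == "loop":
        return _lerp(track, first["frame"] + phase)
    if mode == "ping-pong":
        if cycles % 2:  # odd cycles play backwards
            phase = span - phase
        return _lerp(track, first["frame"] + phase)
    if mode == "cycle":
        # Each completed cycle adds the first-to-last delta.
        return _lerp(track, first["frame"] + phase) + cycles * (last["value"] - first["value"])
    raise ValueError(f"unknown mode: {mode!r}")


# Rotation from 0 to 90 degrees over one second.
track = [{"frame": 0, "value": 0.0}, {"frame": 30, "value": 90.0}]
```

Sampling frame 40 (ten frames past the last keyframe) under this interpretation gives 90.0 for hold, 30.0 for loop, 60.0 for ping-pong, and 120.0 for cycle.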

Assets must exist first

add_image_layer, set_image_filename, add_video_layer, set_video_clip, and add_audio_overlay all reference a filename that must already be uploaded. There is no MCP tool for uploading a file. Assets are uploaded by dragging them into the editor, or via the /api/upload-asset and /api/upload-clip HTTP routes. If you reference a filename that isn't uploaded, the tool fails. describe_video lists only layers, not the asset bucket, so confirm the upload with the user before referencing a new filename.