maple.backend.policy.gr00tn15.GR00TN15Policy.act

GR00TN15Policy.act(handle: PolicyHandle, payload: Any, instruction: str, model_kwargs: Dict[str, Any] | None = None) → List[float]

Get action prediction for a single observation.

Sends visual observations, proprioceptive state, and a language instruction to the GR00T model and receives a predicted action. GR00T uses flow matching to iteratively denoise actions from Gaussian noise.
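The idea of denoising an action from Gaussian noise can be illustrated with a minimal sketch of Euler integration along a velocity field. This is not GR00T's implementation: `sample_action`, `toy_velocity`, `TARGET`, and the step count are hypothetical, and the hand-written velocity field stands in for the learned network.

```python
import random
from typing import Callable, List

Vector = List[float]


def sample_action(
    velocity: Callable[[Vector, float], Vector],
    dim: int,
    num_steps: int = 4,  # illustrative value, not taken from GR00T
    seed: int = 0,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from a sample of standard Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical target action; a real model predicts the velocity from
# the observation and instruction instead of knowing the target.
TARGET: Vector = [0.1, -0.2, 0.3]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Velocity of the straight-line path from x toward TARGET, reached at t=1."""
    return [(a - xi) / (1.0 - t) for a, xi in zip(TARGET, x)]


action = sample_action(toy_velocity, dim=len(TARGET))
```

With this toy field the final Euler step lands on `TARGET`, which makes the sketch easy to check. A learned velocity field would only approximate the path.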

The server expects observations in the format:

- observation.images.*: Base64 encoded camera images
- observation.state: Robot proprioceptive state as list
- prompt: Natural language instruction
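The request format described above can be sketched as follows. The helper `build_request` is hypothetical and not part of the library. It assumes images arrive as already-encoded bytes (for example PNG or JPEG) and that every non-state key in the payload is a camera image. The real conversion inside `act` may differ.

```python
import base64
from typing import Any, Dict


def build_request(payload: Dict[str, Any], instruction: str) -> Dict[str, Any]:
    """Map a payload to the documented server keys (illustrative only)."""
    request: Dict[str, Any] = {"prompt": instruction}
    for key, value in payload.items():
        if key in ("state", "observation.state"):
            # Proprioceptive state is sent as a plain list of floats.
            request["observation.state"] = [float(x) for x in value]
        elif isinstance(value, (bytes, bytearray)):
            # Assumption: image values are encoded image bytes.
            encoded = base64.b64encode(bytes(value)).decode("ascii")
            request[f"observation.images.{key}"] = encoded
    return request


request = build_request(
    {"image": b"\x89PNG", "wrist_image": b"\x89PNG", "state": [0, 1.5]},
    "pick up the cup",
)
```

Here `request` contains the keys `prompt`, `observation.state`, `observation.images.image`, and `observation.images.wrist_image`.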

Parameters:

handle – Policy handle for the running container.
payload – Observation payload containing:
- Image keys (e.g., 'image', 'wrist_image'): camera observations
- 'state' or 'observation.state': robot proprioceptive state
instruction – Natural language instruction for the task.
model_kwargs – Optional runtime parameters. Not used for GR00T; configuration is done at load time via model_load_kwargs.

Returns:

Predicted action as a list of floats.
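A call might look like the sketch below. Because a running container is needed for the real policy, the example uses a hypothetical stand-in class, `StubGR00TPolicy`, that only mirrors the documented signature. The placeholder handle, the image bytes, and the 7-dimensional action are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


class StubGR00TPolicy:
    """Stand-in with the documented act() signature; returns a fixed action."""

    def act(
        self,
        handle: Any,
        payload: Any,
        instruction: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> List[float]:
        # A real policy would forward the observation to the model server.
        return [0.0] * 7  # assumed action dimension


policy = StubGR00TPolicy()
handle = object()  # placeholder for a real PolicyHandle

payload = {
    "image": b"<encoded frame>",        # main camera observation
    "wrist_image": b"<encoded frame>",  # wrist camera observation
    "state": [0.0] * 7,                 # robot proprioceptive state
}

# model_kwargs is omitted because it is not used for GR00T.
action = policy.act(handle, payload, "pick up the red block")
```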