maple.utils.eval.BatchEvaluator.run_single

BatchEvaluator.run_single(policy_id: str, env_id: str, task: str, instruction: str | None = None, seed: int = 0, max_steps: int = 300, timeout: int = 200, env_kwargs: Dict[str, Any] | None = {}, model_kwargs: Dict[str, Any] | None = {}, save_video: bool = False, video_path: str | None = None) → EvalResult

Run a single evaluation episode.

Executes one episode via the daemon /run endpoint and packages the results into an EvalResult. Handles errors gracefully by storing error information in the result rather than raising.

Also persists the result to the database for tracking.

Parameters:

policy_id – Policy container ID to use.
env_id – Environment container ID to use.
task – Task specification to execute.
instruction – Optional instruction override.
seed – Random seed for reproducibility.
max_steps – Maximum steps before truncation.
timeout – Timeout multiplier for HTTP request.
env_kwargs – Env-specific parameters.
model_kwargs – Model-specific parameters.
save_video – Whether to record video.
video_path – Optional path for video output.

Returns:

EvalResult with episode outcomes and metrics.