maple.utils.eval

Batch evaluation system for MAPLE.

This module provides tools for running large-scale evaluations of policies across multiple tasks and random seeds. It handles parallel execution, result aggregation, statistics computation, and output formatting.

Key features:

- Single and batch episode execution
- Parallel evaluation with configurable workers
- Comprehensive result tracking and statistics
- Per-task performance breakdown
- Multiple output formats (JSON, Markdown, CSV)
- Progress tracking with callbacks
- Database integration for result persistence

The evaluation system communicates with the MAPLE daemon to execute episodes and aggregates results into structured formats for analysis and reporting.
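
As a rough illustration of the workflow described above (episodes run across tasks and seeds, executed in parallel, then aggregated per task), the following is a minimal standalone sketch. It does not use the MAPLE API: `Episode`, `run_episode`, `run_batch`, and `success_rate_by_task` are hypothetical names, and `run_episode` fakes an outcome instead of contacting the daemon.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product
from typing import Callable, Dict, List, NamedTuple, Optional


class Episode(NamedTuple):
    """Hypothetical record of one episode (not the real EvalResult)."""
    task: str
    seed: int
    success: bool


def run_episode(task: str, seed: int) -> Episode:
    """Stand-in for a single episode; a real run would call the MAPLE daemon."""
    # Deterministic fake outcome so the sketch runs without a daemon.
    return Episode(task=task, seed=seed, success=(len(task) + seed) % 2 == 0)


def run_batch(
    tasks: List[str],
    seeds: List[int],
    max_workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[Episode]:
    """Run every (task, seed) pair in a thread pool and collect the results."""
    jobs = list(product(tasks, seeds))
    results: List[Episode] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_episode, task, seed) for task, seed in jobs]
        for done, future in enumerate(as_completed(futures), start=1):
            results.append(future.result())
            if on_progress is not None:
                on_progress(done, len(jobs))  # (completed, total)
    # Completion order is nondeterministic, so sort for stable output.
    results.sort(key=lambda ep: (ep.task, ep.seed))
    return results


def success_rate_by_task(results: List[Episode]) -> Dict[str, float]:
    """Aggregate episodes into a per-task success rate."""
    totals: Dict[str, int] = {}
    wins: Dict[str, int] = {}
    for ep in results:
        totals[ep.task] = totals.get(ep.task, 0) + 1
        wins[ep.task] = wins.get(ep.task, 0) + int(ep.success)
    return {task: wins[task] / totals[task] for task in totals}


if __name__ == "__main__":
    episodes = run_batch(["reach", "push"], seeds=[0, 1, 2, 3], max_workers=2)
    print(success_rate_by_task(episodes))
```

A thread pool is used here on the assumption that each episode mostly waits on a remote service; CPU-bound episodes would more likely call for a process pool.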

Classes

BatchEvaluator([daemon_url])

Orchestrator for batch evaluations.

BatchResults(batch_id, policy_id, env_id, ...)

Aggregated results from a batch evaluation.

EvalResult(run_id, policy_id, env_id, task, ...)

Result container for a single evaluation episode.
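
To make the relationship between a per-episode result and an aggregated batch more concrete, here is a small sketch using standard-library dataclasses. `EpisodeRecord` and `BatchSummary` are hypothetical stand-ins, not the real `EvalResult` and `BatchResults`. They reuse only the leading field names shown in the signatures above; the `success` and `reward` fields and all methods are assumptions made for illustration.

```python
import json
import statistics
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class EpisodeRecord:
    """Hypothetical single-episode container."""
    run_id: str
    policy_id: str
    env_id: str
    task: str
    success: bool  # assumed field, not confirmed by the source
    reward: float  # assumed field, not confirmed by the source


@dataclass
class BatchSummary:
    """Hypothetical aggregate over many episodes."""
    batch_id: str
    policy_id: str
    env_id: str
    episodes: List[EpisodeRecord] = field(default_factory=list)

    def success_rate(self) -> float:
        """Fraction of successful episodes (0.0 for an empty batch)."""
        if not self.episodes:
            return 0.0
        return sum(e.success for e in self.episodes) / len(self.episodes)

    def reward_stats(self) -> Dict[str, float]:
        """Mean and sample standard deviation of episode rewards."""
        rewards = [e.reward for e in self.episodes]
        if not rewards:
            return {"mean": 0.0, "stdev": 0.0}
        # statistics.stdev needs at least two data points.
        stdev = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
        return {"mean": statistics.fmean(rewards), "stdev": stdev}

    def to_json(self) -> str:
        """Serialize the batch plus derived statistics as JSON."""
        payload = asdict(self)
        payload["success_rate"] = self.success_rate()
        payload["reward_stats"] = self.reward_stats()
        return json.dumps(payload, indent=2)


if __name__ == "__main__":
    batch = BatchSummary("b1", "policy-a", "env-x")
    batch.episodes.append(EpisodeRecord("r1", "policy-a", "env-x", "reach", True, 1.0))
    batch.episodes.append(EpisodeRecord("r2", "policy-a", "env-x", "reach", False, 3.0))
    print(batch.to_json())
```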

Functions

format_results_csv(batch)

Format batch results as CSV data.

format_results_markdown(batch)

Format batch results as a Markdown document.
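
The page does not show the output layout of these formatters, so the sketch below only illustrates the general idea of rendering the same per-task rows as CSV and as a Markdown table using the standard library. `rows_to_csv`, `rows_to_markdown`, and the `FIELDS` columns are hypothetical and are not the signatures or output of `format_results_csv` or `format_results_markdown`.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set; the real formatters may emit different fields.
FIELDS = ["task", "episodes", "success_rate"]


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Render rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_markdown(rows: List[Dict[str, object]], title: str) -> str:
    """Render rows as a Markdown document containing a single table."""
    lines = [
        f"# {title}",
        "",
        "| " + " | ".join(FIELDS) + " |",
        "| " + " | ".join("---" for _ in FIELDS) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[name]) for name in FIELDS) + " |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [
        {"task": "reach", "episodes": 4, "success_rate": 0.5},
        {"task": "push", "episodes": 4, "success_rate": 0.75},
    ]
    print(rows_to_csv(rows))
    print(rows_to_markdown(rows, "Evaluation summary"))
```

Using `csv.DictWriter` rather than manual string joins keeps quoting correct if a task name ever contains a comma.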