maple.utils.eval
Batch evaluation system for MAPLE.
This module provides tools for running large-scale evaluations of policies across multiple tasks and random seeds. It handles parallel execution, result aggregation, statistics computation, and output formatting.
Key features:
- Single and batch episode execution
- Parallel evaluation with configurable workers
- Comprehensive result tracking and statistics
- Per-task performance breakdown
- Multiple output formats (JSON, Markdown, CSV)
- Progress tracking with callbacks
- Database integration for result persistence
The evaluation system communicates with the MAPLE daemon to execute episodes and aggregates results into structured formats for analysis and reporting.
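As a rough illustration of the workflow described above, the following sketch runs every (task, seed) pair on a worker pool and reports progress through a callback. It is not MAPLE's API: `run_episode`, `run_batch`, and `on_progress` are hypothetical names, and the daemon call is replaced by a deterministic stub so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional


def run_episode(task: str, seed: int) -> dict:
    """Hypothetical stand-in for a daemon call; returns a deterministic fake outcome."""
    success = (len(task) + seed) % 2 == 0
    return {"task": task, "seed": seed, "success": success}


def run_batch(
    tasks: list[str],
    seeds: list[int],
    max_workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[dict]:
    """Run every (task, seed) pair on a worker pool and collect the results."""
    jobs = [(task, seed) for task in tasks for seed in seeds]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_episode, task, seed) for task, seed in jobs]
        # as_completed yields futures as they finish, so progress is reported live.
        for done, future in enumerate(as_completed(futures), start=1):
            results.append(future.result())
            if on_progress is not None:
                on_progress(done, len(jobs))
    # Completion order is nondeterministic, so sort for stable output.
    results.sort(key=lambda r: (r["task"], r["seed"]))
    return results
```

A thread pool is used here on the assumption that each episode mostly waits on an external process; CPU-bound episodes would likely be better served by a process pool.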
Classes
- Orchestrator for batch evaluations.
- Aggregated results from a batch evaluation.
- Result container for a single evaluation episode.
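To make the relationship between a per-episode result and an aggregated batch result concrete, here is a minimal sketch using dataclasses. The class names `EpisodeResult` and `BatchSummary`, and the fields `task`, `seed`, `success`, and `reward`, are assumptions chosen for illustration and may not match the actual MAPLE classes.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EpisodeResult:
    """Hypothetical container for one evaluation episode."""
    task: str
    seed: int
    success: bool
    reward: float


@dataclass
class BatchSummary:
    """Hypothetical aggregate over many episodes."""
    episodes: list[EpisodeResult] = field(default_factory=list)

    def success_rate(self) -> float:
        """Fraction of successful episodes; 0.0 for an empty batch."""
        if not self.episodes:
            return 0.0
        return sum(e.success for e in self.episodes) / len(self.episodes)

    def per_task(self) -> dict[str, dict[str, float]]:
        """Group episodes by task and compute simple per-task statistics."""
        grouped: dict[str, list[EpisodeResult]] = {}
        for episode in self.episodes:
            grouped.setdefault(episode.task, []).append(episode)
        return {
            task: {
                "episodes": len(eps),
                "success_rate": sum(e.success for e in eps) / len(eps),
                "mean_reward": mean(e.reward for e in eps),
            }
            for task, eps in grouped.items()
        }
```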
Functions
- Format batch results as CSV data.
- Format batch results as a Markdown document.
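The two formatters can be pictured roughly as follows. This is a sketch built only on the standard library, not the module's real functions: the names `to_csv` and `to_markdown`, the row layout, and the column set (`task`, `episodes`, `success_rate`) are all assumptions.

```python
import csv
import io

# Hypothetical column set for a per-task summary row.
FIELDS = ["task", "episodes", "success_rate"]


def to_csv(rows: list[dict]) -> str:
    """Render per-task rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown(rows: list[dict], title: str = "Evaluation Results") -> str:
    """Render per-task rows as a Markdown document with one table."""
    lines = [
        f"# {title}",
        "",
        "| Task | Episodes | Success rate |",
        "| --- | ---: | ---: |",
    ]
    for row in rows:
        lines.append(
            f"| {row['task']} | {row['episodes']} | {row['success_rate']:.1%} |"
        )
    return "\n".join(lines) + "\n"
```

Writing to an in-memory buffer keeps both helpers free of file handling, so the caller decides whether the text goes to disk, a database, or standard output.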