eval
Run batch evaluation across multiple tasks and seeds.
Synopsis
maple eval POLICY_ID ENV_ID BACKEND [OPTIONS]
Description
The eval command runs comprehensive batch evaluations, automatically iterating
over tasks and seeds. Results are saved as JSON, with optional Markdown and CSV exports.
Arguments
POLICY_IDID of a running policy (e.g.,
openvla-7b-a1b2c3d4)ENV_IDID of a running environment (e.g.,
libero-x1y2z3w4)BACKENDEnvironment backend to load (e.g.,
libero)
Options
--tasks, -t TEXT(required)Tasks to evaluate. Can be:
Suite name:
libero_10(fetches all tasks in suite)Explicit list:
libero_10/0,libero_10/1,libero_10/2Single task:
libero_10/0
--seeds, -s TEXTRandom seeds (comma-separated). Default:
0--max-steps, -m INTEGERMaximum steps per episode. Default: from config (300)
--unnorm-key, -u TEXTDataset key for action unnormalization (policy-specific)
--save-video, -vSave rollout videos
--video-dir TEXTDirectory for videos. Default:
~/.maple/videos--model-kwargs, -u STRModel-specific parameters
--env-kwargs, -e STREnv-specific parameters
--output, -o PATHOutput directory for results. Default:
~/.maple/results--format, -f TEXTOutput format:
json,markdown,csv,all. Default:json--parallel, -p INTEGERNumber of parallel evaluations (experimental). Default: 1
--port INTEGERDaemon port to connect to (default: from config, typically 8000)
Examples
Basic Evaluation
# Evaluate on full suite
maple eval openvla-7b-abc libero-xyz libero --tasks libero_10 --seeds 0,1,2
# Evaluate specific tasks
maple eval openvla-7b-abc libero-xyz libero \
--tasks libero_10/0,libero_10/1 \
--seeds 0,1,2,3,4
With Video Recording
maple eval openvla-7b-abc libero-xyz libero \
--tasks libero_10 \
--seeds 0 \
--save-video \
--video-dir ./videos
Custom Output
maple eval openvla-7b-abc libero-xyz libero \
--tasks libero_10 \
--seeds 0,1,2 \
--output ./results \
--format all # JSON + Markdown + CSV
Extended Evaluation
maple eval openvla-7b-abc libero-xyz libero \
--tasks libero_10 \
--seeds 0,1,2,3,4,5,6,7,8,9 \
--max-steps 500
Output
Console Output
Batch Evaluation
Policy: openvla-7b-abc123
Environment: libero-xyz789
Tasks: 10
Seeds: [0, 1, 2]
Total episodes: 30
Max steps: 300
[1/30] ✓ libero_10/0 seed=0 reward=1.000
[2/30] ✓ libero_10/0 seed=1 reward=1.000
[3/30] ✗ libero_10/0 seed=2 reward=0.234
...
Batch Evaluation Results: batch-20240131-123456
==================================================
Policy: openvla-7b-abc123
Environment: libero-xyz789
Tasks: 10 | Seeds: 3
Overall Results:
Episodes: 30
Success Rate: 72.0%
Avg Reward: 0.847
Avg Steps: 156.3
Avg Duration: 12.34s
Per-Task Results:
libero_10/0: 100.0% (3/3) reward=1.000
libero_10/1: 66.7% (2/3) reward=0.756
libero_10/2: 100.0% (3/3) reward=1.000
...
✓ Results saved: results/batch-20240131-123456.json
JSON Output Structure
{
"batch_id": "batch-20240131-123456",
"policy_id": "openvla-7b-abc123",
"env_id": "libero-xyz789",
"tasks": ["libero_10/0", "libero_10/1", ...],
"seeds": [0, 1, 2],
"max_steps": 300,
"started_at": 1706745600.0,
"finished_at": 1706746200.0,
"total_episodes": 30,
"successful_episodes": 22,
"success_rate": 0.733,
"avg_reward": 0.847,
"avg_steps": 156.3,
"task_stats": {
"libero_10/0": {
"total": 3,
"successful": 3,
"success_rate": 1.0,
"avg_reward": 1.0
},
...
},
"results": [
{
"run_id": "eval-abc123",
"task": "libero_10/0",
"seed": 0,
"success": true,
"steps": 156,
"total_reward": 1.0,
"duration_seconds": 12.34
},
...
]
}
See Also
Quickstart — Basic evaluation walkthrough
maple report— Generate reports from saved results