==== eval ==== Run batch evaluation across multiple tasks and seeds. Synopsis ======== .. code-block:: bash maple eval POLICY_ID ENV_ID BACKEND [OPTIONS] Description =========== The ``eval`` command runs comprehensive batch evaluations, automatically iterating over tasks and seeds. Results are saved as JSON, with optional Markdown and CSV exports. Arguments ========= ``POLICY_ID`` ID of a running policy (e.g., ``openvla-7b-a1b2c3d4``) ``ENV_ID`` ID of a running environment (e.g., ``libero-x1y2z3w4``) ``BACKEND`` Environment backend to load (e.g., ``libero``) Options ======= ``--tasks, -t TEXT`` (required) Tasks to evaluate. Can be: - Suite name: ``libero_10`` (fetches all tasks in suite) - Explicit list: ``libero_10/0,libero_10/1,libero_10/2`` - Single task: ``libero_10/0`` ``--seeds, -s TEXT`` Random seeds (comma-separated). Default: ``0`` ``--max-steps, -m INTEGER`` Maximum steps per episode. Default: from config (300) ``--unnorm-key, -u TEXT`` Dataset key for action unnormalization (policy-specific) ``--save-video, -v`` Save rollout videos ``--video-dir TEXT`` Directory for videos. Default: ``~/.maple/videos`` ``--model-kwargs, -u STR`` Model-specific parameters ``--env-kwargs, -e STR`` Env-specific parameters ``--output, -o PATH`` Output directory for results. Default: ``~/.maple/results`` ``--format, -f TEXT`` Output format: ``json``, ``markdown``, ``csv``, ``all``. Default: ``json`` ``--parallel, -p INTEGER`` Number of parallel evaluations (experimental). Default: 1 ``--port INTEGER`` Daemon port to connect to (default: from config, typically 8000) Examples ======== Basic Evaluation ---------------- .. code-block:: bash # Evaluate on full suite maple eval openvla-7b-abc libero-xyz libero --tasks libero_10 --seeds 0,1,2 # Evaluate specific tasks maple eval openvla-7b-abc libero-xyz libero \ --tasks libero_10/0,libero_10/1 \ --seeds 0,1,2,3,4 With Video Recording -------------------- .. code-block:: bash maple eval openvla-7b-abc libero-xyz libero \ --tasks libero_10 \ --seeds 0 \ --save-video \ --video-dir ./videos Custom Output ------------- .. code-block:: bash maple eval openvla-7b-abc libero-xyz libero \ --tasks libero_10 \ --seeds 0,1,2 \ --output ./results \ --format all # JSON + Markdown + CSV Extended Evaluation ------------------- .. code-block:: bash maple eval openvla-7b-abc libero-xyz libero \ --tasks libero_10 \ --seeds 0,1,2,3,4,5,6,7,8,9 \ --max-steps 500 Output ====== Console Output -------------- .. code-block:: text Batch Evaluation Policy: openvla-7b-abc123 Environment: libero-xyz789 Tasks: 10 Seeds: [0, 1, 2] Total episodes: 30 Max steps: 300 [1/30] ✓ libero_10/0 seed=0 reward=1.000 [2/30] ✓ libero_10/0 seed=1 reward=1.000 [3/30] ✗ libero_10/0 seed=2 reward=0.234 ... Batch Evaluation Results: batch-20240131-123456 ================================================== Policy: openvla-7b-abc123 Environment: libero-xyz789 Tasks: 10 | Seeds: 3 Overall Results: Episodes: 30 Success Rate: 72.0% Avg Reward: 0.847 Avg Steps: 156.3 Avg Duration: 12.34s Per-Task Results: libero_10/0: 100.0% (3/3) reward=1.000 libero_10/1: 66.7% (2/3) reward=0.756 libero_10/2: 100.0% (3/3) reward=1.000 ... ✓ Results saved: results/batch-20240131-123456.json JSON Output Structure --------------------- .. code-block:: json { "batch_id": "batch-20240131-123456", "policy_id": "openvla-7b-abc123", "env_id": "libero-xyz789", "tasks": ["libero_10/0", "libero_10/1", ...], "seeds": [0, 1, 2], "max_steps": 300, "started_at": 1706745600.0, "finished_at": 1706746200.0, "total_episodes": 30, "successful_episodes": 22, "success_rate": 0.733, "avg_reward": 0.847, "avg_steps": 156.3, "task_stats": { "libero_10/0": { "total": 3, "successful": 3, "success_rate": 1.0, "avg_reward": 1.0 }, ... }, "results": [ { "run_id": "eval-abc123", "task": "libero_10/0", "seed": 0, "success": true, "steps": 156, "total_reward": 1.0, "duration_seconds": 12.34 }, ... ] } See Also ======== - :doc:`../guides/quickstart` — Basic evaluation walkthrough - ``maple report`` — Generate reports from saved results