maple.backend.policy.smolvla.SmolVLAPolicy

class maple.backend.policy.smolvla.SmolVLAPolicy

Backend for SmolVLA vision-language-action models.

SmolVLA is a compact generalist robot policy that conditions on visual observations, proprioceptive state, and natural language instructions to predict robot actions. The model handles multi-modal observations including multiple camera views and robot state information, making it suitable for complex manipulation tasks.

The backend manages Docker containers that run the SmolVLA inference server. The server loads the model from HuggingFace and serves predictions over an HTTP API.
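Because predictions are served over HTTP, a multi-camera observation has to be serialized before it reaches the server. The sketch below shows one way this might be done, using only the Python standard library. The function name `build_request_body` and the field names `images`, `state`, and `instruction` are assumptions made for illustration; the schema the SmolVLA server actually expects is not documented here and may differ.

```python
import base64
import json


def build_request_body(images, state, instruction):
    """Pack a multi-modal observation into a JSON string.

    NOTE: the field names used here are hypothetical; they are not
    taken from the real SmolVLA inference server.
    """
    encoded = {
        # Raw image bytes are not JSON-safe, so base64-encode each camera view.
        name: base64.b64encode(raw).decode("ascii")
        for name, raw in images.items()
    }
    return json.dumps(
        {"images": encoded, "state": list(state), "instruction": instruction}
    )


# Two camera views (placeholder bytes), a short proprioceptive state vector,
# and a natural language instruction.
body = build_request_body(
    images={"front": b"\x00\x01\x02", "wrist": b"\x03\x04\x05"},
    state=[0.0, 0.1, -0.2],
    instruction="pick up the red block",
)
decoded = json.loads(body)
```

Keying the images by camera name keeps the views distinguishable after the round trip, which matters when a model conditions on several cameras at once.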

Methods

act(handle, payload, instruction[, model_kwargs])

Get action prediction for a single observation.

info()

Get policy backend information and capabilities.
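The call pattern implied by the two method signatures above can be illustrated with a stand-in class. `StubPolicy` is hypothetical and is not the real `SmolVLAPolicy`: it only mirrors the documented method names and the optional `model_kwargs` argument. The return shapes, the handle value, and the 7-element action are assumptions, since this page does not specify them.

```python
from typing import Any, Optional


class StubPolicy:
    """Stand-in with the documented method names; NOT the real backend."""

    name = "stub"

    def act(
        self,
        handle: Any,
        payload: dict,
        instruction: str,
        model_kwargs: Optional[dict] = None,
    ) -> dict:
        # A real backend would forward the observation to the inference
        # server identified by `handle`. Here we return a placeholder.
        return {
            "handle": handle,
            "instruction": instruction,
            "model_kwargs": model_kwargs or {},  # optional, defaults to empty
            "action": [0.0] * 7,  # assumed action dimension, illustration only
        }

    def info(self) -> dict:
        # Assumed shape: a plain dict describing the backend.
        return {"name": self.name}


policy = StubPolicy()
# model_kwargs is omitted here because the signature marks it as optional.
result = policy.act("container-0", {"state": [0.0]}, "open the drawer")
```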

Attributes

name