maple.backend.policy.smolvla.SmolVLAPolicy

class maple.backend.policy.smolvla.SmolVLAPolicy

Backend for SmolVLA vision-language-action models.

SmolVLA is a compact generalist robot policy that conditions on visual observations, proprioceptive state, and natural-language instructions to predict robot actions. It accepts multi-modal observations, including multiple camera views and robot state, which makes it suitable for complex manipulation tasks.

The backend manages Docker containers running the SmolVLA inference server, which loads the model from Hugging Face and serves predictions over an HTTP API.

Methods

`act`(handle, payload, instruction[, model_kwargs])	Get action prediction for a single observation.
`info`()	Get policy backend information and capabilities.
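
The call shape of the two methods can be sketched with a stand-in class. This is only an illustration under stated assumptions: `StubSmolVLAPolicy`, the payload keys (`images`, `state`), the 7-dimensional action, the `name` value, and both return shapes are hypothetical and are not taken from the MAPLE source. Only the parameter names of `act` and `info` come from the signatures listed above.

```python
from typing import Any, Dict, List, Optional


class StubSmolVLAPolicy:
    """Hypothetical stand-in mirroring only the documented method signatures."""

    name = "smolvla"  # assumed value; the real attribute may differ

    def info(self) -> Dict[str, Any]:
        # Assumed return shape: a plain dict describing the backend.
        return {"name": self.name, "inputs": ["images", "state", "instruction"]}

    def act(
        self,
        handle: Any,
        payload: Dict[str, Any],
        instruction: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> List[float]:
        # The real backend presumably forwards the observation to the
        # containerized inference server; this stub returns a zero action.
        model_kwargs = model_kwargs or {}
        action_dim = len(payload["state"])  # assumption: action matches state size
        return [0.0] * action_dim


policy = StubSmolVLAPolicy()

# Assumed payload layout: one entry per camera view plus the robot state.
payload = {
    "images": {"front": None, "wrist": None},  # placeholders for image data
    "state": [0.0] * 7,
}

action = policy.act(
    handle=None,  # placeholder; the real handle would identify a running container
    payload=payload,
    instruction="pick up the red block",
)
backend_info = policy.info()
```

The stub keeps the optional `model_kwargs` argument last, matching the bracketed position in the documented signature, so callers can omit it for a default prediction.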

Attributes

name