Skip to main content

Sandbox

The sandbox is a uniform Linux compute backend (real bash + python3 + node on PATH). It backs the Claude-Code-style tool set (bash, read, write, edit, glob, grep). Every tool compiles down to a single run against the selected provider — there is no per-runtime routing anymore.

Config

A sandbox is a standalone, account-scoped record referenced from agent config by id (see Workspace & Sandbox).

// POST /accounts/me/sandboxes
{
"name": "default",
"config": {
"provider": "lambda", // lambda | e2b | daytona | kubernetes
"internet": true, // selects the internet-on/off lambda function
"permissionMode": "ask", // edit | ask | bypass
"runtimes": ["bash", "python", "node"], // advisory allow-list (best-effort)
"timeout": 120, // per-call seconds (max: lambda 300, others 600)
"memoryLimit": 512, // MB; bounded for lambda only, operator-sized otherwise
"outputLimitBytes": 65536,
"envVars": { "FOO": "bar" } // injected into every run (encrypted at rest)
}
}

Per-call limits are provider-aware. lambda runs are hard-bounded by the deployed function (timeout ≤ 300 s, memory ≤ 1024 MB). Persistent providers (e2b/daytona/ kubernetes) are long-lived and operator-sized, so a single blocking call is only capped at the harness request budget (600 s) and memory is left to the operator. Output is always truncated harness-side regardless of provider.

ProviderDocumentation
lambdaLambda Details
e2bE2B Details
daytonaDaytona Details
kubernetesKubernetes Details

Reserved (persistent) sandboxes

By default every provider is ephemeral per call (create → run → destroy); only workspace files persist (via the S3 mount). Set persistent: true to instead reserve a long-lived sandbox per workspace — a cloud dev box where pip/npm/uv installs, code, and running jobs survive across calls, scaling down on idle like Fargate. Not valid for lambda.

{
"config": {
"provider": "kubernetes", // kubernetes | daytona | e2b
"persistent": true,
"permissionMode": "bypass",
"lifecycle": {
"idleTimeoutSeconds": 1800, // scale down after 30 min idle (default 900)
"maxLifetimeSeconds": 86400 // hard expiry backstop (optional)
},
"options": { // kubernetes PVC for the coding env (optional)
"mountAwsS3Buckets": true, // S3 = shared workspace files
"persistentDiskGb": 20, // home PVC (packages/venvs/caches)
"persistentHome": "/home/node"
}
}
}

How idle scale-down happens differs per provider: kubernetes uses an infra reaper CronJob (scales replicas 0↔1; home PVC + S3 persist); daytona uses native autoStopInterval (filesystem persists); e2b uses native lifecycle.onTimeout: "pause" (filesystem + memory snapshot persist). A reserved sandbox is reconnected by id on the next call (kubernetes derives a deterministic Sandbox name from the workspace namespace; daytona/ e2b store the id in a persistentSandboxInstance table).

Clean delete (no leaks). Deleting a workspace or account releases its reserved sandboxes: daytona/e2b are torn down explicitly (and their instance rows dropped); kubernetes is reclaimed cluster-side — every reserved Sandbox carries a shutdownTime (shutdownPolicy: Delete) the harness refreshes on each use, so an abandoned Sandbox self-deletes, and the reaper sweeps any orphaned home PVC. There is also a hard-lifetime backstop (lifecycle.maxLifetimeSeconds, default 7 days) so nothing lingers indefinitely.

Background jobs + async_status

Reserved sandboxes can run detached background jobs that outlive the request. bash gains a background: true flag; it starts the work as a detached session in the sandbox and returns a resultId immediately:

bash { command: "uv run train.py", background: true } → resultId
async_status { resultId } → running | completed (with logs) | failed
async_status { resultId, action: "logs" } → tail the job output
async_status { resultId, action: "stop" } → terminate the job

Auto-delivery. When the job finishes it POSTs its result to the harness /sandbox-jobs/<resultId>/complete endpoint, authenticated by a per-job token (not the account key — no account secret ever enters the sandbox). The harness settles the row and resumes the conversation with the result injected, so the model does not have to poll. The follow-up is then delivered back to wherever the turn came from:

OriginDelivery
Chat channel (Telegram/Slack/Discord/Zalo/Pancake/GitHub)pushed into the chat via the channel's sendText (rebuilt from the stored routing)
WebSocketrepublished to the durable conversation stream (replays on reconnect)
Direct/async APIsettled for /status polling; config.hooks.webhook also fires agent.finished

Polling with async_status is still available to check progress or fetch a result sooner. Discord delivers a delayed reply with the bot token (its interaction token expires ~15 min); the bot must have Send Messages permission in the channel.

async_status is auto-registered whenever the agent has a persistent sandbox or any config.tools entry marked async: true, and only resolves a resultId for its own conversation. Jobs are tracked in the AsyncToolResult table.

Ownership & limits. Each sandbox caps concurrent background jobs (10), and a job that is killed when the sandbox is recreated/scaled-to-0 reports as failed (it stamps the launching boot id, so a stale .running marker is never read as "running forever"). The idle reaper never pauses a sandbox while a job is still running.

Network note (kubernetes): auto-delivery requires the sandbox pod to reach the harness Function URL. Daytona/E2B sandboxes have outbound internet by default; for the kubernetes provider the cluster must allow pods egress to the Function URL. Without egress the job still runs and async_status polling still works — only the automatic push-back is skipped.

WebSocket delivery additionally requires the cluster's NATS to expose a WebSocket listener/gateway (infra repo, applied via CI/CD); the durable stream persists regardless, so a client replays on reconnect. See Architecture → WebSocket Gateway.

Model-facing workspace contract

All workspace-backed sandbox providers should feel like a normal Linux project checkout:

pwd # current workspace directory
ls # files in this workspace
python3 script.py # run files directly
node app.js

The model should not need provider-specific paths. For bash, the harness starts each command in the selected workspace directory, so examples should use relative paths (analysis.json, src/index.ts). The dedicated file tools also take workspace-relative paths.

Provider implementation paths are still useful for debugging:

ProviderWorkspace-backed bash cwdUnderlying mount path
lambda/mnt/workspaces/<namespace>AWS S3 Files at /mnt/workspaces/<namespace>
daytona/mnt/workspaces/<namespace> by defaultmount-s3 at options.workspaceRoot/<namespace>
kubernetes/mnt/workspaces/<namespace> by defaultmount-s3 at options.workspaceRoot/<namespace>
e2b/mnt/workspaces/<namespace> when persistentnative sandbox FS (persists via pause); workspace tools require persistent: true

Keep prompt text small: tell the model "use relative paths." Put provider-specific mount paths in docs and logs, not ordinary task prompts.

Lambda: 4-function topology

The lambda provider deploys the same image as four functions across two axes, and the harness auto-selects one per run. The mount axis comes from whether the run has a workspace namespace; the internet axis comes from sandbox.internet.

internet oninternet off
workspace mountedVPC + NAT + S3 mountVPC, no NAT, S3 mount
no workspaceplain Lambda (fastest)VPC, no NAT, no mount

Function names are wired by SST into four env vars (SANDBOX_FN_{MOUNT,NOMOUNT}_{NET,NONET}). Cost note: the topology uses fck-nat on non-prod (≈10× cheaper than a NAT Gateway) and runs the no-mount + internet-on function with no VPC for free managed egress.

How agents use it

With a workspace attached, the file tools operate on the mount:

write notes/a.txt # base64-piped, creates parent dirs
read notes/a.txt # numbered lines
edit notes/a.txt # exact unique string replacement
glob **/*.py # mtime-sorted matches
grep TODO # ripgrep
bash python3 notes/run.py # run programs directly

With no workspace, only bash is available and each call is a fresh container, so write-and-run in one command:

cat <<'EOF' > /tmp/run.py
print("ok")
EOF
python3 /tmp/run.py

Result shape

bash returns combined stdout+stderr as text. The lambda response carries { ok, runtime, exit_code, timed_out, duration_ms, stdout, stderr }; stdout/stderr are truncated at 256 KB by the image and again at outputLimitBytes harness-side.

Security boundaries

  • child processes run with env_clear() first — no AWS credentials leak into runs
  • workspace and skills buckets block public access
  • the workspace mount is rooted at the sandbox/ access-point prefix (load-bearing; keep in sync with WORKSPACE_MOUNT_PREFIX)
  • file tools normalize paths to the workspace and reject directory traversal
  • workspace-backed bash rejects obvious attempts to use absolute paths, parent traversal, or whole-filesystem scans before the command reaches a provider
  • runtimes is a best-effort allow-list on a general VM: the bash tool rejects obvious disallowed runtime invocations and surfaces the allowed list in its description
  • approvals are governed by the sandbox permissionMode (see Workspace & Sandbox)

Skill files

Skills load from the skills S3 bucket. With a workspace attached, load_skill stages the bundle into the workspace namespace at /.claude/skills/<name> so the agent can read and run it with bash. See Skills.

ConcernCode
Tool registration + permissionModefunctions/harness-processing/tools/index.ts
Tool setfunctions/harness-processing/tools/{bash,read,write,edit,glob,grep}.tool.ts
Provider selectionfunctions/harness-processing/sandbox/index.ts
Run contractfunctions/harness-processing/sandbox/types.ts