Architecture and Workflows
The deployed system is a multi-account serverless agent harness. Accounts are managed by account-manage; runtime traffic is handled by harness-processing.
Runtime Layer
Both Lambdas use the Bun custom runtime and startStreamingRuntime() from functions/_shared/runtime.ts.
Runtime boundary:
- SST points the Lambda handler to bootstrap.
- The runtime passes the full Function URL event envelope into each handler.
- afterResponse lets channel webhooks acknowledge quickly, then continue work after the HTTP response.
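A minimal sketch of the afterResponse pattern. It assumes a handler signature of (event, ctx) where ctx.afterResponse(task) registers deferred work. The real startStreamingRuntime() contract may differ, and runInvocation, HandlerContext, and FunctionUrlEvent are hypothetical names. The runtime hands the HTTP response back first and only then drains the queued tasks.

```typescript
// Hypothetical shapes; the real runtime types live in functions/_shared/runtime.ts.
interface FunctionUrlEvent {
  rawPath: string;
  body?: string;
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

interface HandlerContext {
  // Register work that must run only after the HTTP response has been sent.
  afterResponse(task: () => Promise<void>): void;
}

type Handler = (event: FunctionUrlEvent, ctx: HandlerContext) => Promise<HttpResponse>;

// One invocation: run the handler, deliver its response, then run deferred tasks in order.
async function runInvocation(
  handler: Handler,
  event: FunctionUrlEvent,
  sendResponse: (res: HttpResponse) => void,
): Promise<void> {
  const deferred: Array<() => Promise<void>> = [];
  const ctx: HandlerContext = {
    afterResponse: (task) => {
      deferred.push(task);
    },
  };

  const res = await handler(event, ctx);
  sendResponse(res); // the provider receives its acknowledgement here

  for (const task of deferred) {
    await task(); // slow work continues after the response
  }
}

// Example: a channel webhook that acknowledges immediately and runs the agent turn later.
const log: string[] = [];
const webhook: Handler = async (_event, ctx) => {
  ctx.afterResponse(async () => {
    log.push("agent turn");
  });
  return { statusCode: 200, body: "ok" };
};
```

The deferred work still runs inside the same invocation, so it remains bound by the function timeout.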
High-Level Architecture
Account Routing
Every runtime request resolves an account and an account-owned agent before agent work begins.
The diagrams show the logical ownership of runtime config. In code, integrations.ts resolves the account once, loads the selected agent, then passes the runtime config into handler.ts and session.ts to avoid extra lookups during the turn. The runtime projection keeps model, tool, workspace, and skills config, but strips channel credentials before the agent loop.
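A sketch of the runtime projection step. It assumes the agent config keeps channel credentials under a channels map next to model, tools, workspaces, and skills. These field names are illustrative, not the actual AgentConfig schema. Rest destructuring drops the credentials, so the object handed to the agent loop never contains them.

```typescript
// Illustrative shapes only; see the API Reference for the real AgentConfig schema.
interface ChannelCredentials {
  botToken?: string;
  signingSecret?: string;
}

interface AgentConfig {
  model: string;
  tools?: Record<string, unknown>;
  workspaces?: Array<{ id: string }>;
  skills?: { enabled: boolean; allowed: string[] };
  channels?: Record<string, ChannelCredentials>;
}

// What the agent loop receives: everything except channel credentials.
type RuntimeConfig = Omit<AgentConfig, "channels">;

function toRuntimeConfig(config: AgentConfig): RuntimeConfig {
  // Copies every field except `channels`.
  const { channels: _omitted, ...runtime } = config;
  return runtime;
}

const agent: AgentConfig = {
  model: "example-model",
  skills: { enabled: true, allowed: ["skills/report"] },
  channels: { slack: { botToken: "xoxb-secret" } },
};
const runtime = toRuntimeConfig(agent);
```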
Root provider webhooks are not accepted. Provider webhook URLs must include the accountId, agentId, and channel name.
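A sketch of rejecting root webhooks. It assumes a hypothetical route template /webhooks/{accountId}/{agentId}/{channel}, and the real path layout may differ. Any URL that does not carry all three identifiers parses to null and is refused before any account lookup.

```typescript
interface WebhookTarget {
  accountId: string;
  agentId: string;
  channel: string;
}

// Hypothetical route template: /webhooks/{accountId}/{agentId}/{channel}.
// Anything shorter, such as a root /webhooks/slack URL, yields null and is rejected.
function parseWebhookPath(rawPath: string): WebhookTarget | null {
  const parts = rawPath.split("/").filter((part) => part.length > 0);
  if (parts.length !== 4 || parts[0] !== "webhooks") return null;
  const [, accountId, agentId, channel] = parts;
  return { accountId, agentId, channel };
}
```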
Account Management
Provider secrets are not returned in normal account responses. Secret-like fields are redacted as ********; sending that value back in a patch preserves the existing stored secret.
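A minimal sketch of the redact-and-preserve rule on a flat record. It assumes secret-like fields are recognised by a hypothetical name pattern (SECRET_KEY_PATTERN), because the real list of secret fields is not specified here. Responses mask stored values, and a patch that echoes the mask back keeps the stored secret instead of overwriting it.

```typescript
const REDACTED = "********";
// Hypothetical heuristic for "secret-like" field names.
const SECRET_KEY_PATTERN = /(secret|token|password|apikey)/i;

type Flat = Record<string, string>;

// Response path: mask every non-empty secret-like field.
function redact(stored: Flat): Flat {
  const out: Flat = {};
  for (const [key, value] of Object.entries(stored)) {
    out[key] = SECRET_KEY_PATTERN.test(key) && value !== "" ? REDACTED : value;
  }
  return out;
}

// Patch path: a value equal to the mask means "unchanged", so the stored value is kept.
function applyPatch(stored: Flat, patch: Flat): Flat {
  const next: Flat = { ...stored };
  for (const [key, value] of Object.entries(patch)) {
    if (value === REDACTED && key in stored) continue;
    next[key] = value;
  }
  return next;
}
```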
Deleting an account runs account-scoped cleanup before removing the account record. The cleanup deletes runtime rows whose keys are prefixed with acct:{accountId}: and removes the current account filesystem namespaces from S3.
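A sketch of only the key-selection step of account cleanup, assuming row keys are plain strings. How the real cleanup enumerates rows (query or scan) is not specified here. The trailing colon in acct:{accountId}: matters: without it, deleting account 12 would also match the rows of account 123.

```typescript
// Rows owned by an account share this prefix.
function accountKeyPrefix(accountId: string): string {
  return `acct:${accountId}:`;
}

// Select the keys an account deletion would remove.
// The trailing ":" keeps acct:12 from matching acct:123's rows.
function keysToDelete(allKeys: string[], accountId: string): string[] {
  const prefix = accountKeyPrefix(accountId);
  return allKeys.filter((key) => key.startsWith(prefix));
}
```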
Direct and Async API
The async path stays inside harness-processing: POST /async creates AsyncAgentResult, returns a status URL, and starts an internal Lambda Event invocation. Subagents and same-invocation async tools run inside that Lambda. external-dispatch async tools store delivery metadata and continue later through /async-tools/{resultId}/complete.
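A sketch of the POST /async bookkeeping. It assumes a hypothetical worker payload ({ kind: "async-agent", eventId }); the /status/{eventId} route matches the polling path listed under Storage Boundaries. Writing the AsyncAgentResult row is omitted. The function only builds a Lambda Invoke input, and passing it to InvokeCommand from @aws-sdk/client-lambda with InvocationType "Event" queues the run without waiting for it.

```typescript
// Mirrors the Lambda Invoke API fields used here.
interface InvokeInput {
  FunctionName: string;
  InvocationType: "Event" | "RequestResponse";
  Payload: Uint8Array;
}

interface AsyncAccepted {
  eventId: string;
  statusUrl: string;
  invoke: InvokeInput;
}

function acceptAsyncRequest(baseUrl: string, functionName: string, eventId: string): AsyncAccepted {
  const payload = { kind: "async-agent", eventId }; // hypothetical internal event shape
  return {
    eventId,
    statusUrl: `${baseUrl.replace(/\/$/, "")}/status/${encodeURIComponent(eventId)}`,
    invoke: {
      FunctionName: functionName,
      InvocationType: "Event", // asynchronous: Lambda queues the event and returns 202
      Payload: new TextEncoder().encode(JSON.stringify(payload)),
    },
  };
}
```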
Direct sync and async POST access is controlled by ENABLE_DIRECT_API, which defaults to true. When disabled, POST / and POST /async are closed while channel webhooks and internal worker invocations remain available.
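A sketch of the gate. It assumes the flag is read as a string and that only an explicit "false" disables it, though the codebase's exact parsing may differ. isDirectApiRoute and allowRequest are hypothetical helpers. Only the two direct POST routes are gated, so webhook traffic passes either way.

```typescript
type Env = Record<string, string | undefined>;

// Defaults to true: only an explicit "false" (any case) turns the direct API off.
function directApiEnabled(env: Env): boolean {
  return (env.ENABLE_DIRECT_API ?? "true").trim().toLowerCase() !== "false";
}

// Direct API = POST / and POST /async. Channel webhooks and worker events are not gated.
function isDirectApiRoute(method: string, path: string): boolean {
  return method === "POST" && (path === "/" || path === "/async");
}

function allowRequest(env: Env, method: string, path: string): boolean {
  return !isDirectApiRoute(method, path) || directApiEnabled(env);
}
```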
Cron Jobs
Cron jobs are included in the default stack as a small scheduled-agent add-on, not a workflow DSL. account-manage owns cron job create, update, delete, and list operations: it stores the account-scoped cron job in DynamoDB and creates, updates, or deletes the matching EventBridge Scheduler schedule. EventBridge Scheduler wakes harness-processing with { kind: "cron-job", accountId, cronJobId }, and the harness starts the configured agent asynchronously.
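A sketch of both halves of the cron flow. buildScheduleInput produces an object shaped like the EventBridge Scheduler CreateSchedule input (Name, ScheduleExpression, FlexibleTimeWindow, Target). The schedule naming scheme is hypothetical, real names must satisfy Scheduler's length and character limits, and the ARNs are placeholders. isCronJobEvent is the guard the harness side could use to recognise the wake-up payload.

```typescript
interface CronJobEvent {
  kind: "cron-job";
  accountId: string;
  cronJobId: string;
}

// Shaped like the EventBridge Scheduler CreateSchedule input.
interface ScheduleInput {
  Name: string;
  ScheduleExpression: string;
  FlexibleTimeWindow: { Mode: "OFF" | "FLEXIBLE" };
  Target: { Arn: string; RoleArn: string; Input: string };
}

function buildScheduleInput(
  accountId: string,
  cronJobId: string,
  cron: string, // six-field EventBridge cron, e.g. "0 9 * * ? *"
  targetArn: string,
  roleArn: string,
): ScheduleInput {
  const event: CronJobEvent = { kind: "cron-job", accountId, cronJobId };
  return {
    Name: `cron-${accountId}-${cronJobId}`, // hypothetical naming scheme
    ScheduleExpression: `cron(${cron})`,
    FlexibleTimeWindow: { Mode: "OFF" },
    Target: { Arn: targetArn, RoleArn: roleArn, Input: JSON.stringify(event) },
  };
}

// Harness side: recognise the scheduler wake-up before starting the agent asynchronously.
function isCronJobEvent(value: unknown): value is CronJobEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.kind === "cron-job" && typeof v.accountId === "string" && typeof v.cronJobId === "string";
}
```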
Developers who need custom chaining, cleanup, polling, or external workflow behavior can deploy their own scheduled worker and call the existing direct or async API.
Channel Webhooks
Customers talk to the provider bot/app owned by the account. They never receive an account secret.
WebSocket Gateway (durable NATS JetStream)
Streaming responses are published to a durable, conversation-scoped JetStream stream. The platform owns the durable stream and a documented replay contract; the gateway that relays to a browser is the caller's application (this is a PaaS — we provide the connection, not the client). Because the stream is keyed by conversation (not connection), a client that drops can reconnect with a fresh socket and replay events it missed — including a background job's result that landed after the original connection closed.
The gateway (the caller's service) owns client auth, the subscription, and
the nats-worker Lambda invocation. Lambda does one core publish per chunk;
the bound WS_RESPONSES stream captures that same message for replay.
NATS subject patterns:
| Subject | Direction | Purpose |
|---|---|---|
| v1.{accountId}.{agentId}.ws.response.{convToken} | Lambda → Gateway | Vercel AI SDK stream events (step-start, text, tool-call, finish, error, …) |
convToken = base64url(publicConversationKey) — a single NATS-safe token.
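A sketch of building the response subject. It assumes accountId and agentId are already NATS-safe tokens (no ".", "*", ">", or whitespace) and uses Node/Bun Buffer's "base64url" encoding, which emits only [A-Za-z0-9_-] with no padding. An arbitrary publicConversationKey, even one containing dots, therefore collapses into a single subject token.

```typescript
import { Buffer } from "node:buffer";

// base64url output is limited to [A-Za-z0-9_-], so it can never split a NATS subject.
function convToken(publicConversationKey: string): string {
  return Buffer.from(publicConversationKey, "utf8").toString("base64url");
}

function responseSubject(accountId: string, agentId: string, publicConversationKey: string): string {
  return `v1.${accountId}.${agentId}.ws.response.${convToken(publicConversationKey)}`;
}
```

For example, responseSubject("acct1", "agent1", "abc") returns "v1.acct1.agent1.ws.response.YWJj".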
One publish, two read paths, one stored copy. A core publish is fanned out live to any core subscriber on the subject and captured once by the stream. Core publish stores nothing itself, so the stream holds the only stored copy. The platform exposes both read paths and lets the application choose how to switch between them:
- Connected → subscribeConversationLive (core subscribe): lowest latency.
- Dropped mid-stream → reconnect and resume: readConversationStream (JetStream consumer) from startSequence (lastJsMsg.seq) or startTime catches up the missed events, then continues live. This is JetStream's only job: resuming a turn that is still streaming.
- Reconnect after the turn finished → there is nothing to resume. The buffer was purged at persist time, so read the completed turn from the conversation DB.
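A sketch of choosing a read path on reconnect. It assumes the client remembers whether the turn finished and the last stream sequence or drop time it saw. The consumer fields use JetStream API names (filter_subject, deliver_policy "by_start_sequence" with opt_start_seq, or "by_start_time" with opt_start_time). Starting at lastSeq + 1 is a choice made here to avoid redelivering the last rendered event. ReadPlan and planReconnect are hypothetical names.

```typescript
interface ResumeConsumerConfig {
  filter_subject: string;
  deliver_policy: "by_start_sequence" | "by_start_time";
  opt_start_seq?: number;
  opt_start_time?: string; // RFC 3339 timestamp
}

type ReadPlan =
  | { kind: "live"; subject: string }
  | { kind: "resume"; consumer: ResumeConsumerConfig }
  | { kind: "database" };

interface ReconnectState {
  subject: string;
  turnFinished: boolean; // persisted, so the stream buffer was purged
  lastSeq?: number;      // JsMsg.seq of the last event rendered from the stream
  droppedAt?: Date;      // fallback when no stream sequence is known
}

function planReconnect(state: ReconnectState): ReadPlan {
  if (state.turnFinished) return { kind: "database" };
  if (state.lastSeq !== undefined) {
    return {
      kind: "resume",
      consumer: { filter_subject: state.subject, deliver_policy: "by_start_sequence", opt_start_seq: state.lastSeq + 1 },
    };
  }
  if (state.droppedAt !== undefined) {
    return {
      kind: "resume",
      consumer: { filter_subject: state.subject, deliver_policy: "by_start_time", opt_start_time: state.droppedAt.toISOString() },
    };
  }
  return { kind: "live", subject: state.subject }; // nothing missed: subscribe live
}
```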
Switching policy is the consuming app's call — the platform only guarantees a
monotonic cursor (JsMsg.seq for stream readers; the envelope sequence/eventId
for core subscribers) so a core→stream switch dedupes with a trivial seq check.
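A sketch of that seq check during a core→stream switch. It assumes each event envelope carries eventId (the turn) and a per-turn sequence that grows with publish order; Envelope and TurnCursor are illustrative names. Cursors are kept per turn because overlapping turns share one conversation stream.

```typescript
interface Envelope {
  eventId: string;  // turn identifier
  sequence: number; // monotonic within the turn
  type: string;     // e.g. "text", "finish"
}

// Remembers the highest sequence rendered per turn and drops anything at or below it.
class TurnCursor {
  private readonly last = new Map<string, number>();

  accept(env: Envelope): boolean {
    const seen = this.last.get(env.eventId);
    if (seen !== undefined && env.sequence <= seen) return false; // already shown via the other path
    this.last.set(env.eventId, env.sequence);
    return true;
  }
}
```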
Notes:
- Speed: core publish is fire-and-forget (no per-token PubAck round-trip), with a shared TextEncoder and the subject precomputed once per publisher, so token publishing stays on the fast path.
- Transport (by URL scheme): connectNats in nats.ts selects the client from NATS_URL. wss:// or ws:// → WebSocket (nats.ws) for out-of-cluster callers like Lambda (the cluster exposes only a wss:// ingress externally); nats:// or tls:// → core TCP (nats) for in-cluster callers on the internal network (lower latency; core port 4222 is not exposed externally). Moving a service in-cluster is then a NATS_URL change, not a code change. NATS_TOKEN carries the token-auth credential (omit it for an unauthenticated server).
- No duplicates: a single read path never sees a message twice, and each publish also carries a Nats-Msg-Id (eventId:sequence) so the stream's duplicate_window (~2 min) collapses any publish retry.
- Storage (kept minimal): the stream is an in-flight resume buffer, not the source of truth; the conversation history DB is. So it holds as little as possible:
  - Purge on persist: when a turn finishes and is saved to the DB, the server (LiveNatsPublisher.purge, right after the terminal done) deletes that conversation from the stream. A later reconnect reads the saved turn from the DB, so keeping the buffer would be pointless. The external-async continuation re-enters the same path, so it also purges when it finishes.
  - Short backstop max_age (~3 min): only for turns that never persist cleanly (e.g. an error or crash before the purge); they expire instead of piling up.
  - Other knobs in nats.ts: RESPONSE_STREAM_STORAGE (File by default; Memory is faster and cheaper but lost on restart) and max_msgs_per_subject. The retention knobs are mutable, so ensureResponseStream syncs them onto the existing stream on update. HA replicas: 3 multiplies storage by 3.
- connectionId is now only a routing/label field on event headers. It no longer scopes the subject, so overlapping turns on one conversation share a stream (group events per turn with headers.eventId).
- Background jobs launched over a WebSocket turn publish their result to the same conversation stream, so they survive the socket and replay on reconnect.
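A sketch of two small rules from the notes: picking a transport from the NATS_URL scheme and forming the Nats-Msg-Id header value. The returned values are labels only (the real connectNats wires them to the nats.ws and nats clients), and pickTransport and natsMsgId are hypothetical names.

```typescript
type Transport = "websocket" | "tcp";

// Scheme rule: ws:// or wss:// → nats.ws; nats:// or tls:// → core TCP client.
function pickTransport(natsUrl: string): Transport {
  const scheme = new URL(natsUrl).protocol; // includes the trailing ":"
  switch (scheme) {
    case "wss:":
    case "ws:":
      return "websocket";
    case "nats:":
    case "tls:":
      return "tcp";
    default:
      throw new Error(`unsupported NATS_URL scheme: ${scheme}`);
  }
}

// A retried publish reuses the same id, so the stream's duplicate_window drops the retry.
function natsMsgId(eventId: string, sequence: number): string {
  return `${eventId}:${sequence}`;
}
```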
ENABLE_WEBSOCKET=true and NATS_URL are required for nats-worker invocations (plus NATS_TOKEN for a token-auth server). When WebSocket is disabled, the direct API stays SSE-only and NATS config is ignored.
Infra (lives in the infra repo, applied via CI/CD): the cluster NATS runs JetStream with a WebSocket listener and Traefik ingress at wss://nats.beeblast.co (token auth via the nats-auth secret) and a file-backed JetStream PVC, so the Lambda connects over wss:// today. For production durability, enable JetStream clustering (replicas: 3, which multiplies storage by 3). Core port 4222 stays cluster-internal for future in-cluster callers (see the Transport note above).
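A sketch of the response-stream settings, written with JetStream StreamConfig field names; max_age and duplicate_window are nanoseconds on the wire. The ~3 min and ~2 min values come from the notes above. The subject filter, the max_msgs_per_subject value, and the function name are assumptions, and the real ensureResponseStream in nats.ts may differ.

```typescript
const NS_PER_SECOND = 1_000_000_000;

interface ResponseStreamConfig {
  name: string;
  subjects: string[];
  storage: "file" | "memory";
  max_age: number;          // nanoseconds
  duplicate_window: number; // nanoseconds
  max_msgs_per_subject: number;
  num_replicas: number;
}

function responseStreamConfig(opts: { memory?: boolean; replicas?: 1 | 3 } = {}): ResponseStreamConfig {
  return {
    name: "WS_RESPONSES",
    subjects: ["v1.*.*.ws.response.*"],       // assumed filter matching the subject pattern
    storage: opts.memory ? "memory" : "file", // File default; Memory is lost on restart
    max_age: 3 * 60 * NS_PER_SECOND,          // backstop for turns that never purge
    duplicate_window: 2 * 60 * NS_PER_SECOND, // collapses retried Nats-Msg-Id publishes
    max_msgs_per_subject: 10_000,             // placeholder cap
    num_replicas: opts.replicas ?? 1,         // 3 for HA, tripling storage
  };
}
```

JetStream rejects a duplicate_window longer than max_age, which these values respect.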
Deferred delivery & resume (background jobs)
A detached sandbox job outlives the Lambda that launched it, so its result has to be delivered in a later invocation and routed back to wherever the turn came from. The mechanism is a small delivery descriptor carried on the Session and persisted with the job, so no live connection state needs to survive — only an identifier the next invocation can rebuild from.
- What's saved, and why it's safe. Session.delivery (an AsyncToolDelivery) describes the origin: a chat channel ({channelName, source}, the routing payload only, never credentials), a WebSocket conversation, or plain async. A bash call with background: true copies it onto the AsyncToolResult row in DynamoDB alongside the per-job completionToken. No account secret is stored or enters the sandbox; channel credentials are re-fetched (decrypted) from the agent config at delivery time.
- Reinvoke & continue. When the job POSTs its completion (authenticated by the per-job token), the harness settles the row and reuses the existing async-tool continuation path: it rebuilds the turn from parentEventId/conversationKey, injects the job result, and runs the agent loop so it continues where it left off. This is the same settle→continue pipeline used by external-dispatch async tools; background jobs add only the delivery routing on top.
- Deliver to origin. After the loop, the follow-up is pushed to the recorded origin: a channel sendText, a durable JetStream publish (replayed on reconnect), or a status-row settle. See bash.tool.ts, handler.ts (continueAfterAsyncToolSettlement, pushReplyToChannel), and integrations.ts (sendChannelReply).
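A sketch of the delivery descriptor and the final routing step, modelling AsyncToolDelivery as a discriminated union. The variant names and every field except {channelName, source} are illustrative. The stored descriptor holds only routing data, so routeFollowUp takes a credential lookup as a parameter, mirroring the re-fetch at delivery time.

```typescript
// Illustrative union; the real AsyncToolDelivery type lives in the codebase.
type AsyncToolDelivery =
  | { kind: "channel"; channelName: string; source: Record<string, string> }
  | { kind: "websocket"; conversationKey: string }
  | { kind: "async" };

type DeliveryAction =
  | { kind: "sendText"; channelName: string; source: Record<string, string>; credential: string; text: string }
  | { kind: "publish"; conversationKey: string; text: string }
  | { kind: "settleStatusRow"; text: string };

// Credentials are loaded at delivery time, never read from the stored row.
function routeFollowUp(
  delivery: AsyncToolDelivery,
  text: string,
  loadChannelCredential: (channelName: string) => string,
): DeliveryAction {
  switch (delivery.kind) {
    case "channel":
      return {
        kind: "sendText",
        channelName: delivery.channelName,
        source: delivery.source,
        credential: loadChannelCredential(delivery.channelName),
        text,
      };
    case "websocket":
      return { kind: "publish", conversationKey: delivery.conversationKey, text };
    case "async":
      return { kind: "settleStatusRow", text };
  }
}
```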
Sandbox & Workspace Boundaries
Sandbox (compute) and workspace (persistent S3 files) are independent,
account-scoped records, referenced from agent config by id (sandbox, workspaces). The
handler resolves those references (resolveAgentRuntime) before the agent loop. A
sandbox can be attached agent-wide (config.sandbox) or per workspace
(workspaces[].sandbox, overriding the agent-level one). Each workspace's effective
sandbox decides its tools: read/write/edit/glob/grep/bash when present, or
read-only read/glob when absent (via a read-only mount by default, or direct S3 with the
sandbox: null opt-out); bash is also exposed stateless when there is no workspace. Each tool's permissionMode (edit/ask/bypass)
is resolved per call from the selected workspace.
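A sketch of the per-workspace tool rule under simplified shapes. A workspace-level sandbox overrides the agent-level one, and sandbox: null is treated as an explicit opt-out. It models only which tool names are exposed; the real resolveAgentRuntime also handles mounts, direct S3 access, and permissionMode. The assumption that stateless bash needs an agent-level sandbox is this sketch's reading, not confirmed by the text.

```typescript
interface WorkspaceRef {
  id: string;
  sandbox?: string | null; // overrides the agent-level sandbox; null opts out
}

interface AgentRefs {
  sandbox?: string;
}

const FULL_TOOLS: readonly string[] = ["read", "write", "edit", "glob", "grep", "bash"];
const READ_ONLY_TOOLS: readonly string[] = ["read", "glob"];

// Effective sandbox: the workspace's own setting when present, else the agent-level one.
function effectiveSandbox(agent: AgentRefs, ws: WorkspaceRef): string | null {
  if (ws.sandbox !== undefined) return ws.sandbox;
  return agent.sandbox ?? null;
}

function toolsForWorkspace(agent: AgentRefs, ws: WorkspaceRef): readonly string[] {
  return effectiveSandbox(agent, ws) !== null ? FULL_TOOLS : READ_ONLY_TOOLS;
}

// No workspace at all: stateless bash, assumed to need an agent-level sandbox.
function toolsWithoutWorkspace(agent: AgentRefs): readonly string[] {
  return agent.sandbox !== undefined ? ["bash"] : [];
}
```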
Every sandbox-backed tool compiles to a single run against the provider (lambda/e2b/
daytona/kubernetes). The lambda provider deploys the same image as four functions
(workspace mount × internet) and auto-selects one per run. A workspace's namespace is
derived from accountId:workspaceId, so agents that reference the same workspaceId share
files — including across the sandbox-backed and read-only S3 paths. A workspace with no
sandbox still serves MEMORY.md via the S3 API. workspace.harness.enabled=false
suppresses only the MEMORY/TASKS guidance.
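A sketch of the lambda provider's per-run choice among its four deployments (workspace mount × internet) and of the shared namespace key. The function-name scheme is hypothetical; only the 2×2 matrix and the accountId:workspaceId derivation come from the text.

```typescript
interface RunNeeds {
  workspaceId?: string; // set when the run needs the workspace mounted
  internet: boolean;
}

// Hypothetical names for the four deployments of the same image.
function lambdaVariant(needs: RunNeeds): string {
  const mount = needs.workspaceId !== undefined ? "mount" : "nomount";
  const net = needs.internet ? "net" : "nonet";
  return `sandbox-${mount}-${net}`;
}

// Agents that reference the same workspaceId resolve to the same namespace.
function workspaceNamespace(accountId: string, workspaceId: string): string {
  return `${accountId}:${workspaceId}`;
}
```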
See Workspace & Sandbox for the full model.
Model and Tool Configuration
Agents control model selection, channel credentials, optional skills, subagents, and tool access through encrypted agent config. harness.ts resolves config.model; tools/index.ts exposes the sandbox tools from a referenced sandbox (+ workspaces), subagent dispatch from config.subagent, search/research tools from config.tools, and load_skill when config.skills.enabled is true and config.skills.allowed has paths. See the API Reference for the complete AgentConfig schema.
Storage Boundaries
- AccountConfig: account metadata and account secret hash.
- AgentConfig: account-owned encrypted runtime config payloads.
- Conversations: normalized model messages by account-scoped conversationKey.
- ProcessedEvents: dedup markers and short-lived conversation lease records.
- AsyncAgentResult: async direct API and subagent state for /status/{eventId} polling.
- AsyncToolResult: async external tool call state, same-table dispatch-group rows for external fan-in, delivery metadata for non-SSE continuations, and structured outputs for parent result injection.
- S3 workspace bucket: account/agent-scoped workspace files and staged skill bundles.
- S3 skills bucket: account-scoped skill bundles under <accountId>/<skill-name>.
Tool execution is inline unless an agent-configured local execute tool sets async: true; that tool's execution setting then chooses same-invocation or external-dispatch. SSE supports only same-invocation, while /async, channels, and NATS support both. Subagents are in-process child agent loops; they do not require child Lambda workers.
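A sketch of the compatibility rule above, with hypothetical type names. Defaulting execution to same-invocation when async is set but execution is omitted is an assumption. The function rejects the one combination the text rules out: an external-dispatch tool on an SSE request.

```typescript
type Execution = "same-invocation" | "external-dispatch";
type Surface = "sse" | "async" | "channel" | "nats";

interface LocalToolConfig {
  async?: boolean;
  execution?: Execution;
}

// How a tool call runs on a given surface, or an error for an unsupported combination.
function executionPlan(tool: LocalToolConfig, surface: Surface): "inline" | Execution | { error: string } {
  if (!tool.async) return "inline";
  const execution = tool.execution ?? "same-invocation"; // assumed default
  if (surface === "sse" && execution === "external-dispatch") {
    return { error: "SSE supports only same-invocation async tools" };
  }
  return execution;
}
```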