Dynamic Workers and the New Runtime Contract for AI Agents
Cloudflare’s Dynamic Workers announcement reframes a question most teams have been postponing: what runtime contract should agent-generated code run under?
Reference: https://blog.cloudflare.com/dynamic-workers/
For the last two years, many “agent platforms” have quietly used container queues as a universal execution model. It worked, but only up to a point. Cold starts stay expensive, sandbox density stays low, and policy controls are often bolted on after incidents. Dynamic Workers, with isolate-first execution and very fast startup, push the industry toward a stricter model: agent code should be treated as short-lived, policy-bounded computation with explicit cost and trust boundaries.
Why isolate-first changes product architecture
The improvement is not just speed. It changes where you place control points:
- policy checks can happen per invocation instead of per image build
- execution TTL can be enforced as a first-class reliability guardrail
- tenant isolation becomes measurable via runtime metadata, not only via namespace conventions
This matters because agent workflows are bursty. A customer might generate one tool call, then fifty in one minute. If your runtime unit is “spin a container per burst,” finance sees the tail cost before product sees the usage growth.
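The execution-TTL control point can be illustrated with a minimal sketch. It assumes an in-process asyncio runner; the names `run_with_ttl`, `fast_tool`, and `slow_tool` are hypothetical and are not part of any Cloudflare API. An isolate runtime would enforce the limit at the platform level, so this only shows the shape of the guardrail: every invocation gets a deadline and a deterministic result when it misses it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_with_ttl(
    tool: Callable[[], Awaitable[Any]], ttl_seconds: float
) -> Dict[str, Any]:
    """Run one tool invocation and cancel it once the TTL expires."""
    try:
        result = await asyncio.wait_for(tool(), timeout=ttl_seconds)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # Deterministic failure instead of a hung invocation.
        return {"status": "ttl_exceeded", "result": None}


async def fast_tool() -> str:
    # Hypothetical tool that finishes immediately.
    return "done"


async def slow_tool() -> str:
    # Hypothetical tool that overruns any short TTL.
    await asyncio.sleep(1.0)
    return "never returned in time"
```

Calling `asyncio.run(run_with_ttl(slow_tool, 0.01))` yields a `ttl_exceeded` status rather than blocking the caller, which is the property a per-invocation guardrail needs.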
The runtime contract teams should adopt now
Treat every agent tool execution as a contract with five fields:
- purpose: what business task is allowed
- scope: what data and APIs can be reached
- budget: max CPU time, memory, and egress
- evidence: logs and artifacts required for audit
- fallback: deterministic failure behavior
Most organizations document only purpose and scope. Budget and evidence are where production programs succeed or fail.
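One way to make the five fields concrete is a small typed record plus a check that reports which fields are effectively undocumented. This is a sketch under assumptions: the class names, the budget units, and the `contract_gaps` helper are illustrative choices, not an established schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Budget:
    max_cpu_ms: int
    max_memory_mb: int
    max_egress_bytes: int  # 0 is valid and means "no egress allowed"


@dataclass(frozen=True)
class RuntimeContract:
    purpose: str                # what business task is allowed
    scope: Tuple[str, ...]      # data and APIs that can be reached
    budget: Budget              # CPU time, memory, and egress limits
    evidence: Tuple[str, ...]   # log and artifact fields required for audit
    fallback: str               # deterministic failure behavior


def contract_gaps(contract: RuntimeContract) -> List[str]:
    """Return the names of contract fields that are missing or unusable."""
    gaps: List[str] = []
    if not contract.purpose.strip():
        gaps.append("purpose")
    if not contract.scope:
        gaps.append("scope")
    b = contract.budget
    if b.max_cpu_ms <= 0 or b.max_memory_mb <= 0 or b.max_egress_bytes < 0:
        gaps.append("budget")
    if not contract.evidence:
        gaps.append("evidence")
    if not contract.fallback.strip():
        gaps.append("fallback")
    return gaps


# A contract that documents only purpose, scope, and fallback.
typical = RuntimeContract(
    purpose="summarize support ticket",
    scope=("tickets:read",),
    budget=Budget(max_cpu_ms=0, max_memory_mb=0, max_egress_bytes=0),
    evidence=(),
    fallback="return_error",
)
```

Here `contract_gaps(typical)` reports `budget` and `evidence`, the two fields the paragraph above singles out.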
Designing policy as data, not code branches
When teams migrate to isolate runtimes, they often preserve old branching logic in application code. That becomes a maintenance trap. Instead, declare policy in structured data reviewed like code.
A practical schema:
- tool_id
- tenant_tier
- max_duration_ms
- network_allowlist
- secret_profile
- required_observability_fields
Then enforce this in a shared runtime gateway before execution. This keeps product teams from implementing divergent policy logic per feature.
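A minimal sketch of such a gateway check follows, assuming policies are loaded into an in-memory table keyed by tool and tenant tier. The `authorize` function, the sample `crm_lookup` policy, and the hostnames are hypothetical. The point is that the decision logic reads the policy record and contains no per-feature branches.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class ToolPolicy:
    tool_id: str
    tenant_tier: str
    max_duration_ms: int
    network_allowlist: FrozenSet[str]
    secret_profile: str
    required_observability_fields: FrozenSet[str]


# Hypothetical policy table; in practice this would be loaded from reviewed data files.
POLICIES: Dict[Tuple[str, str], ToolPolicy] = {
    ("crm_lookup", "enterprise"): ToolPolicy(
        tool_id="crm_lookup",
        tenant_tier="enterprise",
        max_duration_ms=500,
        network_allowlist=frozenset({"api.crm.example"}),
        secret_profile="crm-readonly",
        required_observability_fields=frozenset({"trace_id", "tenant_id"}),
    ),
}


def authorize(
    tool_id: str,
    tenant_tier: str,
    requested_duration_ms: int,
    hosts: Iterable[str],
    observability: Mapping[str, str],
) -> Tuple[bool, str]:
    """Decide before execution whether a request fits its declared policy."""
    policy = POLICIES.get((tool_id, tenant_tier))
    if policy is None:
        return False, "no_policy"  # fail closed on unknown tools
    if requested_duration_ms > policy.max_duration_ms:
        return False, "duration_over_budget"
    blocked = sorted(set(hosts) - policy.network_allowlist)
    if blocked:
        return False, "host_not_allowed:" + blocked[0]
    missing = sorted(policy.required_observability_fields - set(observability))
    if missing:
        return False, "missing_observability:" + missing[0]
    return True, "allowed"
```

Because the rules live in `POLICIES`, tightening an allowlist or a duration limit is a data change that can be reviewed without touching the gateway code.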
Security posture: assume prompt-to-code abuse exists
Hacker News discussions this week about compromised packages and trust boundaries were a reminder that modern incidents often start in “normal” automation paths.
Reference: https://news.ycombinator.com/
In agent systems, assume one of these will happen:
- generated code attempts forbidden network access
- dependency installation pulls an unexpected transitive artifact
- high-volume retries turn a minor bug into abuse traffic
Mitigations that work in practice:
- no dynamic dependency installs inside the production execution path
- per-tool egress policy with explicit DNS allowlists
- retry budgets tied to tenant and operation type
- signed execution manifests attached to every run
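Two of the mitigations above, signed execution manifests and retry budgets, can be sketched with the Python standard library. The assumptions are significant: the manifest is a plain JSON-serializable dict, signing uses a shared-secret HMAC (a production system might prefer asymmetric signatures and a managed key), and the retry budget never resets, whereas a real one would be windowed. All function and class names are hypothetical.

```python
import hashlib
import hmac
import json
from collections import defaultdict
from typing import Dict, Tuple


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest with HMAC-SHA256."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


class RetryBudget:
    """Counts retries per (tenant, operation type) and refuses once the cap is hit."""

    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self._used: Dict[Tuple[str, str], int] = defaultdict(int)

    def try_consume(self, tenant_id: str, operation: str) -> bool:
        key = (tenant_id, operation)
        if self._used[key] >= self.max_retries:
            return False  # budget exhausted: the caller should stop retrying
        self._used[key] += 1
        return True
```

Sorting keys before signing keeps the signature stable across dict orderings, so a manifest that verifies at dispatch time still verifies when an auditor re-serializes it later.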
Reliability: latency goals without integrity loss
Faster runtime startup invites a dangerous anti-pattern: “just execute more.” Resist it. Instead define SLOs that include integrity metrics.
Recommended SLO bundle:
- P95 agent tool latency
- policy rejection rate (expected and healthy)
- evidence completeness ratio
- rollback activation time
If latency is green but evidence completeness drops, you are accumulating unpayable audit debt.
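To show how the first three metrics in the bundle could be derived from the same run records, here is a small sketch. The `Run` record shape is an assumption, P95 uses the nearest-rank method, and the input is assumed to contain at least one executed run. Rollback activation time comes from deployment tooling rather than run records, so it is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Run:
    latency_ms: float
    rejected_by_policy: bool
    evidence_complete: bool


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile; expects a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def slo_snapshot(runs: List[Run]) -> Dict[str, float]:
    """Compute latency and integrity metrics from one batch of runs."""
    executed = [r for r in runs if not r.rejected_by_policy]
    return {
        "p95_latency_ms": p95([r.latency_ms for r in executed]),
        "policy_rejection_rate": (len(runs) - len(executed)) / len(runs),
        "evidence_completeness_ratio": (
            sum(r.evidence_complete for r in executed) / len(executed)
        ),
    }
```

Reporting the three numbers from one function makes it harder for a dashboard to show green latency while the evidence ratio quietly degrades.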
Cost control for CFO-friendly scaling
A common board question in 2026: “Why did agent usage triple but gross margin not improve?”
Isolate-first platforms help only if you expose business-aware chargeback:
- bill by successful business task, not raw invocation count
- separate exploratory runs from customer-visible runs
- cap low-confidence loops automatically
This is where product analytics and infrastructure finance must share one dashboard. Otherwise usage grows while profitability remains opaque.
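The difference between raw invocation counts and business-aware chargeback is easy to see in a short sketch. The record fields and the use of integer cents are assumptions made for illustration; the calculation separates exploratory spend from customer-visible spend and divides the latter by distinct successful tasks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Invocation:
    task_id: str
    customer_visible: bool   # False for exploratory or internal runs
    cost_cents: int
    task_succeeded: bool     # True if this invocation completed the business task


def chargeback_summary(invocations: List[Invocation]) -> Dict[str, Optional[float]]:
    """Summarize spend by run type and by successful business task."""
    visible = [i for i in invocations if i.customer_visible]
    visible_cost = sum(i.cost_cents for i in visible)
    exploratory_cost = sum(i.cost_cents for i in invocations if not i.customer_visible)
    successful_tasks = {i.task_id for i in visible if i.task_succeeded}
    per_task = visible_cost / len(successful_tasks) if successful_tasks else None
    return {
        "customer_visible_invocations": len(visible),
        "customer_visible_cost_cents": visible_cost,
        "exploratory_cost_cents": exploratory_cost,
        "successful_tasks": len(successful_tasks),
        "cost_per_successful_task_cents": per_task,
    }
```

With four customer-visible invocations that together cost 10 cents but complete only one task, the per-invocation view shows 2.5 cents each while the per-task view shows 10 cents, which is the number that maps to margin.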
Seven-week migration plan from container-centric execution
Weeks 1-2: inventory all agent execution entry points and classify risk.
Weeks 3-4: move low-risk read-only tools to isolate runtime with strict budgets.
Weeks 5-6: implement evidence contracts (trace id, policy id, artifact hash) and fail closed on missing fields.
Week 7: shift high-value write operations after rehearsal with chaos injections.
Define a clear stop condition: if the policy mismatch rate exceeds an agreed threshold, freeze the migration and remediate schema drift first.
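The evidence contract and the stop condition from the plan can both be expressed as small checks. This is a sketch under assumptions: the field names mirror the three listed in the plan, while the exception type, the function names, and the 1% default threshold are hypothetical and should be replaced by whatever the team agrees on.

```python
from typing import List, Mapping, Optional

# The three evidence fields named in the migration plan.
REQUIRED_EVIDENCE = ("trace_id", "policy_id", "artifact_hash")


class MissingEvidenceError(Exception):
    """Raised when a run record lacks a required evidence field."""


def require_evidence(record: Mapping[str, Optional[str]]) -> None:
    """Fail closed: raise unless every required field is present and non-empty."""
    missing: List[str] = [f for f in REQUIRED_EVIDENCE if not record.get(f)]
    if missing:
        raise MissingEvidenceError("missing evidence fields: " + ", ".join(missing))


def should_freeze_migration(
    mismatches: int, total_runs: int, threshold: float = 0.01
) -> bool:
    """Return True when the policy mismatch rate exceeds the threshold."""
    if total_runs == 0:
        return False  # nothing observed yet, nothing to freeze on
    return mismatches / total_runs > threshold
```

Raising on missing fields, instead of logging and continuing, is what makes the contract fail closed: a run without evidence never reaches the write path.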
Closing
Dynamic Workers is less about one vendor and more about a convergence trend: AI agents need runtimes that are fast, ephemeral, and policy-native. Teams that redesign around runtime contracts, auditable evidence, and budgeted execution will out-ship teams still treating agent code as generic background jobs.