June 26, 2026 · 16 nodes · #tech #ai
The Assembly Problem in AI
A map of how building agentic systems out of many separate prompt modules creates a new class of reliability failure, and how that pressure is reshaping both silicon and the engineering profession.
The brief, in full
Agentic systems are no longer one prompt. They are assembled from many modules — tool descriptions, role instructions, retrieved context, sub-agent briefs — and the whole behaves differently from any part. This map traces the failure that emerges from assembly, why composition is its root cause, the discipline forming to contain it, and the hardware shift that the same pressure is driving.
Instruction Bleed
directives leak across modules
When several prompt modules share one context window, an instruction meant for one module silently conditions the others. A formatting rule from a tool wrapper steers the reasoning step; a sub-agent's persona colors an unrelated answer. The model has no boundary between modules, so authorship blurs and behavior drifts from what any single module specified.
Silent Drift
no error, just wrong
Instruction bleed rarely throws an exception. The system keeps answering, but tone, format, or priorities have shifted from the spec. Because nothing crashes, the failure hides in outputs that still look acceptable and only surfaces as a slow erosion of trust, which makes it far harder to catch than a hard fault.
Emergent at Assembly
parts pass, the whole fails
Each module can be tested in isolation and look correct, yet the assembled system misbehaves. The defect lives in the interaction, not the components, so unit-level confidence gives false assurance. This is the signature of an emergent failure and the reason composition deserves its own testing layer.
Composition Causes It
the context window has no walls
The interference is not a bug in any one module — it is a property of how they are joined. A single flat context concatenates everything, so attention can reach across notional boundaries that exist only in the author's head. The more modules you assemble, the more cross-talk paths you create, and reliability degrades faster than parts are added.
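A minimal sketch of what "no walls" means in practice, assuming modules are plain text blocks joined into one prompt. The `Module` type, the module names, and `assemble_flat` are hypothetical and exist only for illustration; real agent frameworks assemble prompts in their own ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str   # who wrote it (hypothetical label, illustration only)
    text: str   # the instruction the author intended for their own module


def assemble_flat(modules):
    """Join every module's text into one prompt. The names are dropped."""
    return "\n\n".join(m.text for m in modules)


modules = [
    Module("tool_wrapper", "Always answer in JSON."),
    Module("reasoning_step", "Think through the problem in plain prose."),
]

prompt = assemble_flat(modules)

# The formatting rule written for the tool wrapper is now part of the same
# text that the reasoning step is conditioned on.
print("Always answer in JSON." in prompt)

# Nothing in the assembled prompt records which module a line came from.
print("tool_wrapper" in prompt)
```

After assembly the boundary between the two modules exists only in the `modules` list, not in the text the model actually receives.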
Shared Context Coupling
one window, many authors
Modules written by different people, teams, or even other agents end up in the same prompt with no namespace between them. They become tightly coupled by accident, so a change in one module can break another that never referenced it. Coupling that nobody designed is the hardest kind to reason about.
Scaling Penalty
more modules, more cross-talk
Interference paths grow roughly with the number of module pairs, not the number of modules: n modules form n(n-1)/2 possible pairs. Keeping the system correct therefore gets superlinearly harder as it grows. The very modularity that makes agents easy to extend is what makes them progressively harder to keep correct.
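The pair count can be made concrete with a short sketch. It counts only unordered pairs, which is a simplifying assumption: directed influence (A on B versus B on A) or three-way interactions would grow faster still. The module names are hypothetical.

```python
from itertools import combinations


def interference_paths(modules):
    """Return every unordered pair of modules that could, in principle, interfere."""
    return list(combinations(modules, 2))


def path_count(n):
    """Number of unordered pairs among n modules: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical module names, used only for illustration.
modules = ["tool_wrapper", "role_instructions", "retrieved_context", "sub_agent_brief"]
pairs = interference_paths(modules)
print(len(modules), "modules ->", len(pairs), "pairs")

# Doubling the module count roughly quadruples the number of pairs.
for n in (2, 4, 8, 16):
    print(n, "modules ->", path_count(n), "pairs")
```

Four modules give 6 pairs and sixteen give 120, so quadrupling the parts multiplies the seams to check by twenty.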
Agent Reliability Engineering
isolation as a job, not an afterthought
Containing assembly failures is becoming a named specialization. It treats module isolation, instruction scoping, and interference evaluation as first-class engineering work rather than prompt tinkering. The role mirrors how site-reliability engineering split off from general operations when systems grew too complex to keep reliable by hand.
Module Isolation
fence each instruction's scope
The first mitigation is to give modules real boundaries — scoped instructions, separated contexts, or sub-agents that cannot see each other's directives. Isolation trades a little raw context-sharing for predictability, much as process isolation once tamed shared-memory bugs in operating systems.
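One way scoped instructions might look, as a sketch under stated assumptions: each module declares which steps may see it, and the context for a step is built only from modules in scope. The `scope` field, the step names, and `context_for` are hypothetical, not an existing framework's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    text: str
    scope: frozenset  # names of the steps allowed to see this module (assumed design)


def context_for(step, modules):
    """Build the prompt for one step from only the modules scoped to it."""
    return "\n\n".join(m.text for m in modules if step in m.scope)


modules = [
    Module("tool_wrapper", "Always answer in JSON.", frozenset({"tool_call"})),
    Module("reasoning", "Think through the problem in plain prose.", frozenset({"reasoning"})),
    Module("policy", "Never reveal credentials.", frozenset({"tool_call", "reasoning"})),
]

reasoning_context = context_for("reasoning", modules)
tool_context = context_for("tool_call", modules)

# The JSON rule no longer reaches the reasoning step.
print("JSON" in reasoning_context)

# Deliberately shared instructions still reach both steps.
print("credentials" in reasoning_context and "credentials" in tool_context)
```

The trade described above shows up directly: the reasoning step loses sight of the tool wrapper's text, and in exchange sharing becomes an explicit declaration rather than an accident of concatenation.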
Interference Evals
test the seams, not the parts
Beyond isolating modules, the discipline builds evaluations aimed at the joints: does adding module B change module A's behavior, and if so, by how much? Measuring interference directly turns a vague reliability worry into a regression you can track and gate on.
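A minimal sketch of such an evaluation, assuming a `run(modules, probe)` callable that returns the system's output. Here `toy_run` is a deterministic stand-in, not a real model API, and exact string comparison is a simplification: real outputs vary between runs, so a practical version would likely need a similarity measure or a grader. The threshold value is also an assumption.

```python
def interference_rate(run, module_a, module_b, probes):
    """Fraction of probes whose output changes when module_b is added next to module_a."""
    if not probes:
        raise ValueError("need at least one probe")
    changed = 0
    for probe in probes:
        baseline = run([module_a], probe)            # module A alone
        combined = run([module_a, module_b], probe)  # module A with B present
        if baseline != combined:
            changed += 1
    return changed / len(probes)


def toy_run(modules, probe):
    """Stand-in for a model call (an assumption): questions get upper-cased
    whenever any module in the context contains the word SHOUT."""
    answer = f"answer to {probe}"
    if any("SHOUT" in m for m in modules) and "?" in probe:
        return answer.upper()
    return answer


probes = ["what is 2+2?", "summarize this", "is it safe?", "list the steps"]
rate = interference_rate(toy_run, "Be concise.", "SHOUT every reply.", probes)

MAX_INTERFERENCE = 0.25  # hypothetical gate threshold
gate_passed = rate <= MAX_INTERFERENCE
print(f"interference rate: {rate:.2f}, gate passed: {gate_passed}")
```

Run per module pair and recorded over time, a number like this is what lets interference be treated as a tracked regression rather than an impression.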
Silicon Response
push inference onto the device
More modules mean more tokens, more passes, and more latency for every assembled call. One structural answer is to move inference off shared servers and onto dedicated on-device silicon, so a composed agent can run its many steps locally, cheaply, and predictably. The reliability pressure of assembly is now visible in chip roadmaps.
On-Device Inference Silicon
chips designed around running models
A new wave of consumer silicon is being shaped first around AI inference rather than general compute, even at the cost of skipping conventional high-end parts. Putting a capable model close to the user lets a multi-step agent execute its modules without round-tripping to a datacenter for every call.
startupxo.com/ko/news/2026/06/apple-m7-on-device-ai-silicon
Predictable Local Cost
latency and price stop drifting
When a composed agent runs locally, each added module costs known device cycles instead of metered, variable cloud calls. Predictable cost and latency are themselves a form of reliability: the system behaves the same on the tenth step as the first, regardless of network or pricing weather.
Discipline Spillover
reliability work crosses domains
The push for predictable behavior connects hardware and process: silicon that makes local runs stable and engineers who make module composition stable are solving the same reliability problem from two ends. Each makes the other's job tractable.
Human Reliability Tangent
we are all trying here
Reliability is not only a machine property. People too are assembled from competing scripts (duty, fear, desire) and behave unpredictably when those modules collide, a tension that on-location human drama keeps re-staging. It is a small cultural touchpoint that the abstract failure mode quietly rhymes with.
hizine.net/ko/titles/we-are-all-trying-here
Colliding Scripts
drama as a reliability mirror
Stories set in a shared place put characters' conflicting motives in one frame and watch them interfere, which is the human version of modules sharing one context. The location guide is a pointer to that touchpoint and keeps the tangent grounded in something concrete rather than purely abstract.