Why the agent layer skipped the orchestration startups.

The orchestration-layer thesis circa late 2023 was that the agent ecosystem would develop the way the early cloud ecosystem developed: a foundation layer that handled the primitive (compute, in cloud; capability, in agents), an orchestration layer that handled coordination across primitives, and an application layer that handled the operator-facing workflow. The bet was that the orchestration layer would be a category of its own, sitting between the foundation models and the deployed application, and would collect the kind of margin that orchestration platforms collected in adjacent categories: container orchestration, workflow orchestration, data orchestration.
The companies that raised against this thesis were in trouble across 2024, and most of them failed. They failed not because the agent ecosystem failed but because the foundation-model layer absorbed the orchestration primitives directly, on a release cadence the orchestration startups could not match, and the venture math those startups had raised on no longer penciled out.
The technical reason the absorption happened faster than the orchestration thesis predicted is worth thinking about, because it explains which categories of agent infrastructure are still defensible and which are not. The reason is that the agent-orchestration problem turned out to be much closer to the foundation-model training problem than to the cloud-orchestration problem. Coordinating multiple model calls inside a workflow looks superficially like coordinating multiple containers inside a Kubernetes cluster. It is not. The container coordination problem can be solved by a layer outside the containers because the containers themselves do not need to know how they are being coordinated. The model-call coordination problem is much more efficiently solved inside the model itself, because the model is the entity making the planning decisions, and a planning decision that requires a round-trip to an external orchestration layer pays a latency penalty and a context-loss penalty that the in-model planner does not pay. Once the foundation labs realized this, the move was straightforward: ship planning, tool-use, multi-step decomposition, and inter-call state management as native primitives inside the model API. Each of those primitives, shipped natively, eats one of the wedges the orchestration startups raised against.
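The latency half of that argument can be made concrete with a back-of-envelope model. This is a sketch under loud assumptions: the function name, the idea that every externally orchestrated planning step costs one extra network round-trip plus the time to re-send context, and all of the numbers are hypothetical, chosen only to show the shape of the penalty.

```python
def workflow_latency(steps: int, model_s: float,
                     round_trip_s: float = 0.0,
                     resend_context_s: float = 0.0) -> float:
    """Wall-clock seconds for a sequential workflow of `steps` planning decisions.

    Each decision pays the model's own latency, plus (for an external
    orchestrator) a network round-trip and the time to re-send context.
    """
    return steps * (model_s + round_trip_s + resend_context_s)


# Hypothetical figures: 8 planning decisions at 1.5 s of model time each.
in_model = workflow_latency(steps=8, model_s=1.5)
external = workflow_latency(steps=8, model_s=1.5,
                            round_trip_s=0.2, resend_context_s=0.4)

print(f"in-model planner:      {in_model:.1f} s")
print(f"external orchestrator: {external:.1f} s")
```

The per-step overhead is small, but it multiplies with the number of decisions, and it says nothing yet about the context-loss penalty, which this toy model does not attempt to price.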
The labs shipped this fast. By the third quarter of 2023 the major model APIs were carrying tool-use as a native primitive. By early 2024 they were carrying multi-step task decomposition. The cadence on which these primitives ship is approximately quarterly, and each quarterly release closes a gap that the orchestration startups had been positioning their roadmap around.
The startups that raised in 2022 with eighteen months of runway and a sequencing plan that assumed the foundation layer would stay primitive for two more years found themselves in the second half of that runway with the plan invalidated. The new round did not come. The customer pipeline that had been lining up around their differentiated coordination layer was now evaluating them against the native primitives on the foundation API, which were good enough for most of the use cases the startup was targeting and an order of magnitude cheaper.
This is the consolidation story. It is a familiar shape.
The interesting question is the counterfactual one: which orchestration patterns were genuinely defensible against the foundation-layer absorption and _would_ have survived the consolidation, but were not in the market in time when it arrived. There are three.
The first is durable execution. Long-running agent workflows, in production, need state management that survives model API outages and model deprecations, and that satisfies the operator-grade requirement to audit what happened in a workflow that ran six months ago. Durable execution is the category that workflow orchestrators in adjacent domains have spent the last decade getting right (Temporal, AWS Step Functions, the older job-queue patterns), and the foundation-model layer has not addressed it because the foundation labs are not in the business of running operator infrastructure. The startup that built durable execution as the core primitive, with model-API-agnostic state management, would have had a defensible product even after the foundation layer absorbed planning and tool-use, because durable execution is genuinely outside the model. The startups that raised on this in 2022 either pivoted off it, because the team's instinct was that the planning layer was the more attention-getting wedge, or shipped it as a feature alongside a planning-layer pitch and got swallowed when the planning layer was absorbed. The pure-play durable-execution-for-agents startup is the company that should have raised in 2022 and would have had a clean exit in 2026. No one raised as that pure play.
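A minimal sketch of what "state outside the model" means in practice, assuming a local SQLite file as the durable store. The class name `DurableRun`, the `step` method, and the table layout are hypothetical illustrations, not the API of Temporal, Step Functions, or any vendor. Each step's result is journaled under a stable key, so a run that dies mid-workflow replays completed steps from the journal instead of calling the model again, and the journal doubles as the record an auditor can read later.

```python
import json
import sqlite3
from typing import Callable


class DurableRun:
    """Journals each step's result so a restarted run skips completed work."""

    def __init__(self, db_path: str, run_id: str) -> None:
        self.run_id = run_id
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS journal ("
            "run_id TEXT, step TEXT, result TEXT, "
            "PRIMARY KEY (run_id, step))"
        )
        self.db.commit()

    def step(self, name: str, fn: Callable[[], object]) -> object:
        """Run `fn` once per (run_id, name); later calls return the stored result."""
        row = self.db.execute(
            "SELECT result FROM journal WHERE run_id = ? AND step = ?",
            (self.run_id, name),
        ).fetchone()
        if row is not None:
            # Completed in an earlier attempt: replay from the journal.
            return json.loads(row[0])
        result = fn()  # may raise; nothing is journaled on failure
        self.db.execute(
            "INSERT INTO journal (run_id, step, result) VALUES (?, ?, ?)",
            (self.run_id, name, json.dumps(result)),
        )
        self.db.commit()
        return result
```

In use, each model call would be wrapped as `run.step("plan", lambda: call_model(prompt))`, where `call_model` stands in for whichever provider's client the operator uses. Nothing in the journal depends on which model produced the result, which is the model-API-agnostic property the paragraph describes. Results must be JSON-serializable in this sketch.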
The second is model-portfolio routing. Operators in production are not running on a single foundation model; they are running on a portfolio of three to five models with different cost and latency characteristics, routing requests across the portfolio based on the operator's evaluation of which model performs best on each kind of task. The routing logic is, structurally, outside any single model API, and the routing logic is operationally consequential because the cost-per-correct-answer differential across models can be an order of magnitude on the workflows the operator is most sensitive to. A serious model-routing layer (with task-level evaluation, cost-tracking, fallback policies, and model-deprecation continuity) is not something the foundation labs are going to ship, because it is in tension with their commercial interest. This is the second category that should have raised in 2022 and would have been clearly defensible in 2026. A few companies are working on it, but the ones that raised against it framed it as an orchestration play rather than a routing play, and they got absorbed alongside the orchestration cohort.
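The routing logic described above can be sketched in a few lines. Everything here is an assumption made for illustration: the names `ModelStats`, `rank_models`, and `route` are hypothetical, the per-call prices and accuracies would come from the operator's own evaluation set for one kind of task, and `clients` is a stand-in mapping from model name to whatever callable wraps that provider's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ModelStats:
    name: str
    cost_per_call: float  # dollars per request (illustrative)
    accuracy: float       # fraction correct on the operator's eval set

    @property
    def cost_per_correct(self) -> float:
        # Expected spend to obtain one correct answer.
        if self.accuracy <= 0:
            return float("inf")
        return self.cost_per_call / self.accuracy


def rank_models(portfolio: List[ModelStats],
                min_accuracy: float) -> List[ModelStats]:
    """Models that clear the quality bar, cheapest per correct answer first."""
    eligible = [m for m in portfolio if m.accuracy >= min_accuracy]
    return sorted(eligible, key=lambda m: m.cost_per_correct)


def route(prompt: str, portfolio: List[ModelStats], min_accuracy: float,
          clients: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Try eligible models in rank order; fall back to the next on any error."""
    failed = []
    for model in rank_models(portfolio, min_accuracy):
        try:
            return model.name, clients[model.name](prompt)
        except Exception:
            failed.append(model.name)
    raise RuntimeError(f"all eligible models failed: {failed}")
```

A production router would keep one such table per task type and refresh it as models are deprecated or repriced. The point of the sketch is only that none of this logic can live inside a single provider's API.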
The third is in-tenant deployment infrastructure. The largest operator-level customers (banks, hospitals, defense, federal) are not going to run their agent workloads on the foundation labs' shared infrastructure. They are going to run on dedicated capacity, in their own clouds or their own data centers, with their own keys and their own audit trails. Spinning up that kind of in-tenant agent infrastructure is hard in ways that the foundation labs are reluctant to handle directly, because handling it pulls the labs into a services business that is in tension with their product business. The startup that built in-tenant deployment as the core wedge, with the foundation labs as partners rather than competitors, would have an extremely defensible position, because the operator-grade procurement requirement of the largest customers is non-negotiable and the foundation labs are not going to meet it cleanly. A few companies are in this category. Most of them framed it as an orchestration play and got absorbed; one or two framed it correctly as a deployment play and are quietly building defensible businesses.
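To make "their own keys and their own audit trails" slightly more concrete, here is one way an operator-held key can make an audit trail tamper-evident. This is a sketch of a generic HMAC-chained log, not a description of how any lab or vendor does it; the function names and the entry layout are hypothetical, and a real deployment would keep the key in the operator's own key management system rather than in process memory.

```python
import hashlib
import hmac
import json
from typing import List


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    # Bind each entry to its predecessor so edits and deletions are detectable.
    payload = json.dumps({"prev": prev_mac, "event": event}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: List[dict], key: bytes, event: dict) -> dict:
    """Append an event, chained to the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    entry = {"prev": prev_mac, "event": event,
             "mac": _mac(key, prev_mac, event)}
    log.append(entry)
    return entry


def verify(log: List[dict], key: bytes) -> bool:
    """Recompute the chain with the operator's key; False on any mismatch."""
    prev_mac = ""
    for entry in log:
        if entry["prev"] != prev_mac:
            return False
        expected = _mac(key, entry["prev"], entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Because only the holder of the key can produce valid entries, the trail stays verifiable by the operator regardless of which model, or whose infrastructure, produced the events being logged.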
It is worth dwelling on why these three categories survived the absorption when the orchestration thesis did not, because the explanation generalizes to the next wave.
The orchestration thesis was a thesis about the _model side_ of the workflow: about what the model does, when, and in what sequence. The model is the entity that does that work, and any layer that sits between the model and the operator and re-implements that work is a layer the model can absorb the moment the lab decides it is worth absorbing.
The three surviving categories are theses about the _operator side_ of the workflow. Durable execution is about what happens when the model is not running: state at rest, state across outages, state across deprecation. Portfolio routing is about what happens before the model is selected: evaluation, cost-tracking, fallback. In-tenant deployment is about where the model runs: which cloud, which keys, which audit trail. None of those are work the model does. None of them are problems the lab can absorb without pulling itself into a business it does not want to be in.
That is the structural test for whether a category survives a foundation-layer absorption: is the work the operator is paying for work the model performs, or work that surrounds the model's performance? If the former, the absorption will come and the absorption will be ugly. If the latter, the category is clean.
The venture math behind the failed orchestration startups is also worth naming, because it explains why the cohort was much larger than the structural argument required. The orchestration thesis was, in 2022 and 2023, the most legible thesis on the agent stack. The investor reading the pitch deck could see the wedge: a layer between the model and the application, doing the coordination work. The wedge looked structurally similar to wedges that had been venture-attractive in the prior decade (Kubernetes, Snowflake, the data-orchestration cohort), and the analogy carried more weight than it would have if anyone had worked through the technical reasoning carefully. The result was that capital flowed disproportionately into the orchestration-layer pitch decks, the term sheets clustered, and a cohort of well-funded companies converged on a roadmap that the foundation layer was already preparing to absorb. The cohort's venture math was sound conditional on the absorption taking three years and arriving in two or three quarterly steps. The actual absorption took eighteen months and arrived in a single quarterly step from the dominant lab. Conditional on the actual absorption, the cohort's runway is the runway of a category that has lost. There have been a small number of acquihires at a fraction of the last-round price, a small number of pivots into the operator-side categories that should have been the original thesis, and a much larger number of orderly wind-downs.
All three of the surviving categories have a common shape: they are operator-side problems that the foundation-model layer has neither the incentive nor the operating posture to absorb. Durable execution is an infrastructure problem that the labs do not run. Model-portfolio routing is a procurement-and-evaluation problem that is in tension with the labs' commercial interest. In-tenant deployment is an enterprise-services problem that the labs do not want to be in. Each of the three problems sits on the operator side of a clear line, and a startup positioned on the operator side of that line, with a clear thesis about why the foundation layer will not cross it, would have raised cleanly and would be defensible now.
The startups that took the orchestration thesis literally (coordinate the model calls, manage the in-flight state, decompose the planning) were positioned on the wrong side of the line. The orchestration logic is on the model side, because the model is the planner. The orchestration thesis was, on the surface, the most legible-to-investors thesis in the agent stack, which made it the thesis that got the most capital, which is also why the consolidation has produced more failed companies than the structural argument required. Capital allocation in the agent layer in 2022 and 2023 was not adversely selected; it was attractively selected, against a thesis that mistook surface legibility for technical defensibility. The lesson for the next two waves of agent-stack investment is that the line between the model and the operator is much sharper than the cloud analogy suggested, and the categories that will compound through the consolidation are the ones that sit cleanly on the operator side of that line.
The summary, looking back from mid-2025, is that the agent-orchestration market compressed quickly. The startup pitch decks promising a coordination layer between the foundation model and the application were mostly retired across 2024 and the first half of 2025. Three categories of operator-side infrastructure (durable execution, portfolio routing, in-tenant deployment) survived the compression and will probably consolidate into one or two winners each over the next four to six years.
The operator who is procuring agent infrastructure today should evaluate every vendor against the question of whether the vendor's wedge sits on the model side or the operator side of the line. If it sits on the model side, the foundation labs are coming for it. If it sits cleanly on the operator side, and the vendor has framed it correctly, it is a real category.
The orchestration label is doing more harm than good at the present moment. It conflates the two sides of the line, and the operator-level purchasing decision will require the operator to do the disambiguation themselves.
—TJ