The Enterprise AI OS: My Thesis for the Next Five Years

The model is a commodity. I am willing to say that out loud now, two years into building a venture on the premise that this is an opportunity rather than something to fear.

Frontier labs will keep trading the lead every few months. The price of a token keeps falling. A capability that was a moat in spring is table stakes by autumn. If your company’s entire AI strategy is “we picked the best model,” you have built your house on a thing that is, by design, trying to make itself cheap. So the interesting question stopped being which model and became: what is the layer that does not get commoditized away, the one a company still depends on after the model underneath has been swapped out three times?

My answer, and the bet I left a director seat to make, is that the durable layer is an operating system. Not a metaphor I am reaching for. I mean the thing an OS actually is: a layer that sits between programs and hardware and gives every program an identity, a permission boundary, a way to call out to devices, a memory it can trust, and a log of what happened. Swap the model for the CPU and the enterprise’s systems for the hardware, and you are describing exactly what an agent needs and almost no one is selling.

Why the incumbents miss it

Go look at what the big players are actually shipping. One camp sells models: bigger context, better reasoning, a price cut on Tuesday. The other camp sells apps: a copilot bolted onto the product you already pay them for, a chat box in the corner of the CRM. Both are real businesses. Neither is the operating system.

They miss it for the same reason most companies miss the layer they are not structured to sell. A model lab is organized to make a model. Its whole apparatus, the researchers, the eval suites, the compute budget, points at one number going up. The connective tissue between that model and a company’s twenty years of accumulated systems is, to a model lab, somebody else’s integration problem. And the app vendor sells you a feature inside their walls; they have no incentive to build the thing that lets an agent reach across into a competitor’s system too. The OS is the part that belongs to nobody’s roadmap because it is nobody’s product. (This is usually where the durable businesses hide. The boring middle that everyone assumes someone else will handle.)

Here is the part nobody tells you. The hard problem in enterprise AI in 2026 is not reasoning. The models reason fine. The hard problem is that a reasoning engine, on its own, has no identity you can hold accountable, no permissions you can scope, no memory you can audit, and no record of what it touched. It is a brilliant contractor you let into the building with no badge, no access list, and no security camera. Nobody sane runs their company that way, which is exactly why the impressive demos keep dying on the threshold of production.

The layers

So what does this OS actually contain? Five layers, and they are not equally hard.

[Figure: the enterprise AI OS sits between the models and the company’s existing systems, with five layers: identity and permissions, tools and MCP, memory, orchestration, and an audit trail spanning all of them.]

The model is interchangeable. The systems are inherited. The OS in the middle is the part you own.

Identity and permissions comes first because nothing above it is safe without it. An agent is not a user and it is not a service account, and pretending it is either gets you breached. It needs its own identity, scoped to act on behalf of a specific person within that person’s rights and not one permission wider. When the finance agent asks the ledger a question, the ledger needs to know whose authority it is borrowing.
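To make “not one permission wider” concrete, here is a minimal sketch of delegated identity. Everything in it is an assumption for illustration: the `Principal` and `AgentIdentity` classes, the permission strings, and the `read_ledger` function are hypothetical names, not any product’s API. The one idea it shows is that an agent’s effective rights are the intersection of what it was granted and what the person it acts for actually holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """A human user and the permissions they hold (hypothetical model)."""
    user_id: str
    permissions: frozenset[str]


@dataclass(frozen=True)
class AgentIdentity:
    """An agent's own identity, acting on behalf of exactly one principal."""
    agent_id: str
    on_behalf_of: Principal
    granted_scopes: frozenset[str]

    def effective_permissions(self) -> frozenset[str]:
        # Intersection: the agent never holds more than the person it acts for.
        return self.granted_scopes & self.on_behalf_of.permissions

    def can(self, permission: str) -> bool:
        return permission in self.effective_permissions()


def read_ledger(agent: AgentIdentity, account: str) -> str:
    """Stand-in for a ledger call that checks whose authority is being borrowed."""
    if not agent.can("ledger:read"):
        raise PermissionError(
            f"{agent.agent_id} may not read the ledger for {agent.on_behalf_of.user_id}"
        )
    return f"{account} read by {agent.agent_id} on behalf of {agent.on_behalf_of.user_id}"


# Alice can read the ledger but not write to it.
alice = Principal("alice", frozenset({"ledger:read"}))
# The agent was granted both scopes, but only the shared one survives.
finance_agent = AgentIdentity(
    "finance-agent", alice, frozenset({"ledger:read", "ledger:write"})
)
result = read_ledger(finance_agent, "acct-42")
```

The agent was granted `ledger:write`, but because Alice does not hold it, `finance_agent.can("ledger:write")` is false. A real system would add expiry, revocation, and token signing; this sketch leaves all of that out.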

Tools is the layer that has matured most, and we have the Model Context Protocol to thank for it. MCP is about a year and a half old now and it did the unglamorous thing that actually mattered: it standardized how an agent discovers and calls a tool, so you stop writing a bespoke adapter for every system and every model pairing. It is the USB-C of this stack, and I have written before about why I bet on it early. The protocol is not the moat. It is the thing that makes the moat buildable.

Memory is where most teams are still naive. A context window is not memory. Memory is what an agent knows about your business across sessions, the durable, queryable, permission-aware store of state that lets the second conversation be smarter than the first. Get this wrong and every agent interaction starts from zero, which users experience as an assistant with amnesia and a confident voice.
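A rough sketch of what “durable, queryable, permission-aware” could look like at its smallest, using SQLite from the Python standard library. The `AgentMemory` class, the table layout, and the permission strings are assumptions made for illustration. The default path here is in-memory for convenience, so passing a file path is what would make it persist across sessions.

```python
import sqlite3


class AgentMemory:
    """Toy memory store: each fact carries the permission needed to read it."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across sessions.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "id INTEGER PRIMARY KEY, topic TEXT, content TEXT, "
            "required_permission TEXT)"
        )

    def remember(self, topic: str, content: str, required_permission: str) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO facts (topic, content, required_permission) "
                "VALUES (?, ?, ?)",
                (topic, content, required_permission),
            )

    def recall(self, topic: str, caller_permissions: set[str]) -> list[str]:
        rows = self.db.execute(
            "SELECT content, required_permission FROM facts "
            "WHERE topic = ? ORDER BY id",
            (topic,),
        ).fetchall()
        # Filter at read time: a caller only sees facts it is entitled to.
        return [content for content, needed in rows if needed in caller_permissions]


memory = AgentMemory()
memory.remember("vendor:acme", "Prefers net-60 payment terms", "ap:read")
memory.remember("vendor:acme", "Bank details changed last month", "treasury:read")

visible_to_ap = memory.recall("vendor:acme", {"ap:read"})
```

Here `visible_to_ap` contains only the payment-terms fact. The same question asked with treasury rights would return the other one. Retrieval quality, summarization, and forgetting are separate problems this sketch does not touch.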

Orchestration is the routing layer: which agent, which tool, which model, in what order, with what fallback when a step fails. Necessary, and increasingly well understood. Not, in my opinion, the part that decides who wins.
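The “with what fallback when a step fails” part is the piece most easily shown in code. Below is a deliberately small sketch: try candidates in order, return the first success, and keep the failures so they can be reported. The function names and the two stand-in “models” are hypothetical. Real orchestration adds timeouts, retries, and cost-aware routing on top of this.

```python
from typing import Callable, Sequence

Step = Callable[[str], str]


def run_with_fallback(
    task: str, candidates: Sequence[tuple[str, Step]]
) -> tuple[str, str]:
    """Try each (name, step) in order; return the first name and result that succeed."""
    errors: list[str] = []
    for name, step in candidates:
        try:
            return name, step(task)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all candidates failed: " + "; ".join(errors))


def primary_model(task: str) -> str:
    # Stand-in for a provider call that fails.
    raise TimeoutError("primary model timed out")


def backup_model(task: str) -> str:
    # Stand-in for a cheaper or slower fallback that succeeds.
    return f"handled: {task}"


winner, output = run_with_fallback(
    "classify invoice",
    [("primary", primary_model), ("backup", backup_model)],
)
```

After this runs, `winner` is `"backup"` and `output` is `"handled: classify invoice"`. Which candidate answered is exactly the kind of fact the audit trail should capture.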

The hard, valuable part

The part that decides who wins is the one I have drawn cutting across all the others, because it is not a layer you can add at the end. The transaction fabric: agents that can act, not just answer, and do it safely and auditably over real systems that move real money and real records.

Reading is easy. An agent that summarizes your invoices is a parlor trick and the bar for it is low, because the worst case is a wrong sentence a human catches. An agent that posts a journal entry, releases a payment, or amends a customer record lives in a different universe of risk, and it is the universe where the value actually is. Nobody pays enterprise money for a chatbot that talks about the work. They pay for the work to get done.

I learned the weight of this building a reconciliation platform over institutional finance, before I ever called any of it an operating system. The agentic part, the narration of a cash-flow anomaly, was a few weeks of work. The fabric underneath it, the part that let an agent touch a ledger while staying fully explainable, scoped, reversible, and logged down to the action, was most of the year. Every irreversible action gates behind a human who is shown the consequence, not the prompt. Every action an agent takes writes to a record a regulator could read cold. That audit trail is not a feature you sprinkle on top. It is the floor the whole thing stands on, which is why I drew it spanning the full width rather than sitting in a box.
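The two rules in that paragraph, gate irreversible actions behind a human who sees the consequence, and log every action, can be sketched in a few lines. This is an illustration under stated assumptions, not the platform described above: `ProposedAction`, `execute`, the event names, and the in-memory `audit_log` list are all hypothetical, and a real audit trail would be append-only, durable storage rather than a Python list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ProposedAction:
    agent_id: str
    on_behalf_of: str
    action: str
    consequence: str  # plain-language effect shown to the approver, not the prompt
    reversible: bool


audit_log: list[str] = []  # stand-in for durable, append-only storage


def record(event: str, action: ProposedAction) -> None:
    """Write one self-describing JSON line per event."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **asdict(action),
    }
    audit_log.append(json.dumps(entry))


def execute(
    action: ProposedAction,
    perform: Callable[[], None],
    approve: Callable[[str], bool],
) -> bool:
    """Run `perform` only if the action is reversible or a human approves it."""
    record("proposed", action)
    if not action.reversible:
        approved = approve(action.consequence)
        record("approved" if approved else "rejected", action)
        if not approved:
            return False
    perform()
    record("executed", action)
    return True


payments: list[str] = []
payment = ProposedAction(
    agent_id="ap-agent",
    on_behalf_of="alice",
    action="release_payment",
    consequence="Pay 12,400 EUR to Acme GmbH from the operating account",
    reversible=False,
)
# The approver here always says no, to show that nothing runs without consent.
was_executed = execute(payment, lambda: payments.append("paid"), approve=lambda text: False)
events = [json.loads(line)["event"] for line in audit_log]
```

With the rejecting approver, `was_executed` is false, `payments` stays empty, and the log still holds a `proposed` and a `rejected` entry. The refusal is itself part of the record, which is the property a regulator reading cold would care about.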

That is the inversion at the center of my thesis. The scarce, defensible thing is not autonomy. Letting an agent do more on its own is the easy direction and mostly the wrong one. The scarce thing is letting an agent act while remaining something a company can trust, govern, and answer for. Trust is the product. The model is just the engine you bolted into the chassis, and you will replace it long before you replace the chassis.

Five years out, I think the model question looks the way “which database” looks today: a real decision, made once a quarter, by people who are not in the room for the meeting that matters. The meeting that matters is about the layer in the middle. The one nobody is structured to sell.

So I am building it. If I am wrong, I am wrong about the most expensive thing in the industry. What would change your mind, or mine?