The Incomplete Agent
Every AI product shipping today is architecturally shallow. A 50-year-old formula explains why, and points to what comes next.
$29 billion for Cursor. $9 billion for Replit. $20 billion for Perplexity. $10 billion for Cognition.
Add it up and you’re looking at nearly $70 billion in combined valuation across four companies that run the same architecture: user prompt in, LLM response out.
None of them remember what they said yesterday. None of them check whether they were right. None of them scan the world between prompts. When you close the tab, they cease to exist.
These are not intelligent systems. They are expensive autocomplete with a loading spinner.
I don’t say this to be dismissive. I use these tools every day. Cursor changed how I build software. But there is a difference between a tool that is useful and a system that is viable, and the entire industry is conflating the two.
The Seven Failures
Every agentic AI product on the market today shares the same structural deficiencies. Not bugs. Not product gaps. Architectural constraints that no amount of prompt engineering will fix.
No state persistence. Every conversation starts from zero. The agent that helped you refactor your auth system last Tuesday has no memory of your auth system. You re-explain context like a patient re-describing symptoms to a doctor who shreds their file after every visit.
No self-correction. If the agent gave you a wrong answer last week, it doesn’t know. It can’t adjust. It will make the same class of error indefinitely because it has no mechanism to learn from its own outputs.
No environmental scanning. Between prompts, the system is dormant. The world changes — new data drops, markets move, dependencies break — and the agent sits inert until you type something. It’s reactive in the most literal sense.
No conflict resolution. When sources contradict each other, the agent hedges. It says “on one hand... on the other hand” and gives you a paragraph that sounds balanced but commits to nothing. Real systems need to surface disagreement as signal, not bury it in diplomatic prose.
No ground truth verification. This is the hallucination problem, but stated correctly. The issue isn’t that LLMs make things up. It’s that there is no mechanism anywhere in the pipeline that checks whether the output corresponds to reality. The system has no concept of “right” or “wrong” — only “plausible.”
Static outputs. An analysis generated on Monday is stale by Wednesday. But it doesn’t know that. It sits in your chat history with the same confidence it had when it was fresh, silently misleading anyone who references it.
Single-model brittleness. One model’s biases become the system’s blind spots. When your entire product is a wrapper around GPT-4 or Claude, you inherit every systematic error that model makes. You’ve built a monoculture.
These seven failures are not independent. They compound. A system with no memory can’t self-correct. A system that doesn’t scan the environment can’t know its outputs are stale. A system with no ground truth verification can’t resolve conflicts. Pull on any thread and the others unravel.
No amount of RAG, fine-tuning, or chain-of-thought prompting fixes this. The problem is upstream of the model. The problem is the architecture.
A Formula Nobody Uses
In 1972, a British cybernetician named Stafford Beer published Brain of the Firm. The book asked a question that sounds simple but turns out to be profound: what structural properties must a system have to remain viable?
Not optimal. Not efficient. Viable — meaning capable of maintaining its own existence in a changing environment.
Beer spent two decades studying organizations, biological systems, and governments. He identified five necessary subsystems. Remove any one and the system degrades. Eventually, it dies.
This framework is called the Viable System Model. It has been applied to corporations, militaries, national governments, and healthcare systems. It has never — to my knowledge — been systematically applied to AI.
Which is strange, because the failures I just described map perfectly onto its gaps.
The Five Systems
I’ll translate Beer’s model into the language of AI architecture, because the original cybernetic terminology is dense and this matters too much to leave locked in a 1972 textbook.
System 1 — Operations. The workers doing the work. In AI terms: the agents pulling data, running analyses, generating text. This is where 99% of the industry’s engineering effort goes. Better models, faster inference, cheaper tokens. System 1 improvements.
System 2 — Coordination. The mechanism that dampens oscillation between operational units. When two agents produce contradictory outputs, System 2 doesn’t pick a winner. It surfaces the disagreement as information. “Model A says recession probability is 35%. Model B says 60%. Here’s why they diverge.” That divergence is the signal. Most AI products bury it.
System 3 — Control. Resource allocation and coherence. The orchestrator ensuring that no part of the system contradicts another. If your research agent says the market is bearish and your portfolio agent is adding long positions, System 3 catches that. This is what people mean when they say “agentic workflow,” but most implementations are thin — a routing layer that dispatches prompts, not a coherence layer that maintains consistency.
System 3-star — Audit. This is the one nobody has. Sporadic, direct verification of outputs against ground truth. Not “does this sound right?” but “is this actually right?” System 3-star bypasses the normal reporting channels and checks the raw data. In an AI system, this means comparing the agent’s claims against real-world observations. Automatically. Continuously.
System 4 — Intelligence. Environmental scanning. The part of the system that watches the world between prompts, detects regime changes, identifies emerging threats and opportunities. Not reactive. Proactive. The system that notices interest rates just moved 50 basis points and propagates that information to every downstream component before anyone asks.
System 5 — Policy. Identity, purpose, and the balance between stability and adaptation. What does this system optimize for? What constraints does it respect? System 5 prevents the system from chasing every signal and losing coherence. It’s the constitutional layer.
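To make the mapping concrete, here is a deliberately toy sketch of the five layers as plain Python functions. Every name, signature, and data shape below is invented for illustration; Beer’s model specifies roles and information flows, not code, and no shipping product is structured this way.

```python
# Toy sketch of the five VSM layers. All names are illustrative.

def operate(task, models):
    """System 1 (Operations): each operational unit does the work."""
    return [model(task) for model in models]

def coordinate(outputs):
    """System 2 (Coordination): surface divergence as information."""
    views = sorted(set(outputs))
    return {"agreement": len(views) == 1, "views": views}

def control(report, constraints):
    """System 3 (Control): block incoherent output sets before they ship."""
    return all(check(report) for check in constraints)

def audit(claim, observed):
    """System 3-star (Audit): spot-check a claim against ground truth."""
    return claim == observed

def scan(world, state):
    """System 4 (Intelligence): fold environmental changes into shared state."""
    state.update(world)
    return state

def policy(report, coherent):
    """System 5 (Policy): decide under the system's standing purpose."""
    if not coherent:
        return "escalate: coherence check failed"
    if not report["agreement"]:
        return "disagreement is the signal: " + ", ".join(report["views"])
    return report["views"][0]
```

Feed it two toy “models” that disagree and System 5 reports the divergence instead of averaging it away; that single behavior is what most products today lack.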
Now look at the current landscape through this lens.
Most AI products are System 1 with a thin System 3 wrapper. Maybe some state persistence bolted on. No System 2 — disagreement is buried. No System 3-star — hallucinations persist because nobody checks. No System 4 — dormant between prompts. No System 5 — optimizes for token prediction, not for anything the user actually cares about.
This is a worker with a foreman. Not a viable system.
Who’s Getting Closer
Some companies have figured out pieces of this, even if they haven’t read Beer.
Harvey AI scrapped their proprietary model and rebuilt their legal AI to route between multiple LLMs. That’s a System 1 improvement with partial System 3 — they’re at least acknowledging that one model isn’t enough and that routing requires coherence logic.
Cognition’s Devin maintains a persistent workspace that indexes repositories and retains context across sessions. That’s partial System 4 — the system has some awareness of its environment beyond the immediate prompt.
Mem0 built a memory layer for LLM applications. Partial state persistence. A necessary but insufficient piece of the puzzle.
Bridgewater’s AIA Labs runs multi-model ensembles for macroeconomic analysis, treating model disagreement as a signal rather than noise. That’s the closest thing I’ve seen to System 2 in production — using divergence between models as information, not a bug to be averaged away.
But nobody has all five systems. And critically, nobody has System 3-star. Nobody is running systematic audits of their AI’s outputs against ground truth data. The industry has accepted hallucination as a feature of the technology rather than a failure of the architecture.
The Missing Ingredient
The difference between what exists and what’s needed has a name. It’s the difference between an open loop and a closed loop.
Open loop: prompt goes in, response comes out. Done. The system has no way to know if it was right, no mechanism to adjust, no connection to reality after the output is generated.
Closed loop: prompt goes in, response comes out, reality is observed, the observation is compared to the prediction, the delta feeds back into the system, and the next output is better than the last.
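The distinction fits in a few lines of code. The correction scheme below, an exponentially weighted running estimate of the system’s own error, is a generic illustration chosen for brevity, not a claim about how any production system implements its loop.

```python
# Open loop vs closed loop, reduced to a skeleton.
# The bias-correction rule is an illustrative assumption.

def open_loop(predict, x):
    return predict(x)  # generate, ship, forget

class ClosedLoop:
    def __init__(self, predict, gain=0.3):
        self.predict = predict
        self.gain = gain   # how fast past errors correct future outputs
        self.bias = 0.0    # running estimate of systematic error

    def output(self, x):
        # every output is corrected by what the loop has learned so far
        return self.predict(x) - self.bias

    def observe(self, x, reality):
        # compare the prediction with what actually happened,
        # and feed the delta back into the system's state
        delta = self.predict(x) - reality
        self.bias += self.gain * (delta - self.bias)
```

Give both a predictor that runs systematically 2.0 too high: the open loop repeats the error forever, while the closed loop’s bias estimate converges to 2.0 and the error vanishes.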
This is not a new idea. Weather forecasting has operated this way for thirty years. Every six hours, ensemble Kalman filters assimilate new observations — satellite data, weather balloons, surface stations — and update the forecast state. The forecast on Tuesday is better than the forecast on Monday because Tuesday’s forecast has absorbed Monday’s errors.
Finance does this. Regime detection models update their state estimates as new data arrives. The system’s view of the world at 4pm reflects everything it learned from market data between 9:30am and 4pm.
AI doesn’t do this. The response you get on day 100 is generated with the same architecture, the same weights, and the same ignorance of its own track record as the response on day 1.
What It Actually Looks Like
I’ve spent the past year working on what it means to close this loop in practice. Two problems stood out as fundamental, and solving them taught me more about viable AI architecture than any paper I’ve read.
The bridge problem. LLMs produce qualitative reasoning. “Tariff escalation increases recession risk.” “The labor market is softening.” These are useful observations, but they’re not math. If you want a system that updates its beliefs quantitatively — the way a Kalman filter updates a state estimate — you need to convert language into numbers.
The solution I landed on: treat LLM probability estimates as pseudo-observations. Synthetic data points that a state estimation filter can process. The LLM says “recession probability should increase” and this becomes a numerical observation that nudges the system’s state.
But here’s the critical design choice. The bridge includes a confidence discount. LLM-generated signals are weighted far less than real data. If a real economic indicator has measurement noise of 0.5, an LLM pseudo-observation gets a noise parameter of 5.0 or higher. The math automatically trusts observed reality ten times more than the model’s opinion.
This is System 3-star — audit — implemented as math rather than vibes. The ground truth doesn’t just outweigh the narrative. It outweighs it by a known, tunable, explicit factor. When real data comes in and contradicts what the LLM suggested, the system doesn’t debate. It updates. The observation wins.
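Reading the 0.5 and 5.0 above as noise variances, the tenfold discount falls out of a standard scalar Kalman measurement update. The update formula here is textbook; the prior belief and observation values are invented for illustration, not taken from any actual filter.

```python
# Scalar Kalman measurement update; prior and observations are illustrative.

def kalman_update(mean, var, obs, obs_var):
    """One measurement update: a precision-weighted average."""
    gain = var / (var + obs_var)        # trust placed in this observation
    return mean + gain * (obs - mean), (1 - gain) * var

# Prior belief: recession probability 0.30, prior variance 1.0.
prior_mean, prior_var = 0.30, 1.0

# A real indicator (noise variance 0.5) reads 0.50: a big move.
m_real, _ = kalman_update(prior_mean, prior_var, 0.50, obs_var=0.5)

# An LLM pseudo-observation (noise variance 5.0) makes the same claim:
# same direction, far smaller nudge. The observation-vs-opinion weighting
# lives in obs_var, an explicit tunable number, not in a prompt.
m_llm, _ = kalman_update(prior_mean, prior_var, 0.50, obs_var=5.0)
```

The same claim moves the estimate roughly a fifth as far when it arrives as an opinion instead of an observation, which is exactly the discount the text describes.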
The decay problem. An LLM analyzes a Fed speech on Monday. Useful analysis. But by Friday, three new data releases have come in. The speech analysis is stale — yet it’s still influencing the system’s probability estimates with the same weight it had when it was generated.
Without a solution, the system accumulates stale opinions like sediment. Old analyses from a different regime keep nudging probabilities in directions that are no longer relevant.
The fix: every LLM signal gets a half-life. The influence decays exponentially. Breaking news gets a six-hour half-life. Weekly analysis gets three days. A structural thesis — “we’re in a tightening cycle” — gets fourteen days.
Recent signals matter more. Old signals fade. The system doesn’t just watch the world — it forgets appropriately.
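The decay schedule above can be sketched directly. The half-lives come from the text; the signal tuple shape and the blending rule are illustrative assumptions.

```python
# Half-lives from the text; signal shape and blending are illustrative.
HALF_LIFE_HOURS = {
    "breaking_news": 6,
    "weekly_analysis": 3 * 24,
    "structural_thesis": 14 * 24,
}

def weight(kind, age_hours):
    """Exponential decay: at exactly one half-life, weight is 0.5."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS[kind])

def blended(signals, now_hours):
    """Decay-weighted average over (kind, issued_at_hours, value) signals."""
    num = den = 0.0
    for kind, issued_at, value in signals:
        w = weight(kind, now_hours - issued_at)
        num += w * value
        den += w
    return num / den if den else 0.0
```

A day-old breaking-news signal is already down to a 6% weight, so a fresh contradicting signal dominates the blend without anyone having to delete the old one.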
This is System 4 — environmental scanning — but the real insight is that intelligence isn’t just about what you perceive. It’s about what you let go. A system that remembers everything with equal weight is not intelligent. It’s hoarding.
The Formula
Here’s where this lands.
The Viable System Model says a system needs five subsystems operating in recursive coordination to maintain viability: Operations, Coordination, Control and its Audit channel, Intelligence, and Policy.
The AI industry has spent five years and hundreds of billions of dollars optimizing System 1 — making the workers faster, cheaper, and more articulate. The results are genuinely impressive. The models are extraordinary.
But a system that is all workers and no feedback is not an agent. It’s a parlor trick with an API. It generates outputs. It doesn’t learn from them. It doesn’t check them. It doesn’t notice when the world changes. It doesn’t resolve its own contradictions. It doesn’t have a concept of being wrong.
The formula for fixing this is not new. It’s fifty years old. It was published by a man who applied it to the Chilean economy and the British steel industry. It has been validated in domains ranging from military command structures to hospital management systems.
Nobody has applied it to intelligence itself.
The next generation of AI products — the ones that will make today’s agents look like toys — will not win on model quality. They will win on architecture. On closed loops. On systems that get better with time because they were built to.
The incomplete agent ships fast, demos well, and raises billions. But it doesn’t learn. It doesn’t adapt. It doesn’t survive.
The formula exists. The question is who builds it first.
References
Beer, Stafford. Brain of the Firm. Allen Lane, 1972.
Beer, Stafford. The Heart of Enterprise. John Wiley & Sons, 1979.
Ashby, W. Ross. “Requisite Variety and Its Implications for the Control of Complex Systems.” Cybernetica, 1958.
Evensen, Geir. “Sequential Data Assimilation with a Nonlinear Quasi-Geostrophic Model Using Monte Carlo Methods to Forecast Error Statistics.” Journal of Geophysical Research, 1994.
IBM Research. “Why Agentic AI Needs a Systems Theory.” IBM Research Blog, 2025.
Espejo, Raul and Harnden, Roger, eds. The Viable System Model: Interpretations and Applications of Stafford Beer’s VSM. John Wiley & Sons, 1989.

