AI Feels Like the Mainframe Era

The most useful analogy for today’s frontier AI is not the internet boom or the mobile revolution. It is the mainframe era: enormous capability concentrated in a few places, accessed remotely, priced by the hour, and surrounded by a priesthood of operators who understand the machine better than the people who depend on it.

Centralization as a feature

When computation was scarce, many organizations did not buy computers; they leased machines or bought time on them. The dominant model, IBM’s included, was less about selling boxes to everyone than about providing access to shared infrastructure with strict quotas, batch queues, and specialized interfaces. The user experience was secondary because the bottleneck was compute itself.

Modern LLM APIs rhyme with this history uncomfortably well. The model weights live in someone else’s datacenter. Inference is metered. Context windows are allocated. Rate limits behave like job schedulers. Developers write prompts the way operators once wrote JCL — carefully, with retries, aware that the machine may reject or truncate the work.
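To make the job-scheduler comparison concrete, here is a minimal sketch of the defensive pattern many developers end up writing around a metered endpoint: submit, get rejected, wait, resubmit. It assumes nothing about any real provider. The RateLimited exception, the submit_with_retries helper, and the make_flaky_call stand-in are all hypothetical names invented for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error: the provider rejected the request for quota reasons."""


def submit_with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call() and retry with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rejection to the caller
            # Wait 0.5s, 1s, 2s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


def make_flaky_call(failures):
    """Stand-in for a remote model call: rejected `failures` times, then succeeds."""
    state = {"remaining": failures}

    def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RateLimited("quota exceeded")
        return "completion text"

    return call


if __name__ == "__main__":
    # Two rejections, then success on the third attempt.
    print(submit_with_retries(make_flaky_call(2)))
```

The sleep function is passed in as a parameter so the waiting can be replaced in tests. A real client would also need to handle truncated output and context limits, which this sketch leaves out.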

This is not an indictment. Centralization made early computing possible. It may be the only way to train and serve models at current scale. But it does shape what gets built.

What personal computing changed

The PC revolution did not merely make computers cheaper. It changed ownership, latency, and creative control. Software could be written, run, and iterated locally. The feedback loop tightened from days to seconds. Entire categories of applications — spreadsheets, desktop publishing, indie games — became viable because the machine was yours.

AI has not yet had that moment at the model layer. We have local inference experiments and small open-weight models, but the default path for serious capability still routes through a remote API key and a vendor’s roadmap.

Implications for builders

If you are designing systems today, assume a hybrid decade:

Batch and async workloads will continue to map cleanly onto centralized inference.
Interactive and privacy-sensitive workloads will push toward edge and local models, even when quality is lower.
Tooling will matter as much as model quality — runtime orchestration, retrieval, caching, and evaluation become the “operating system” layer.
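The split described above can be sketched as a small routing policy. This is only an illustration under simple assumptions: the Workload fields, the choose_backend function, and the "local" and "remote" labels are hypothetical, and a real system would also weigh cost, quality targets, and device capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    interactive: bool        # a person is waiting on the response
    privacy_sensitive: bool  # inputs should not leave the device or network


def choose_backend(w: Workload) -> str:
    """Return 'local' or 'remote' following the hybrid split described above."""
    if w.privacy_sensitive or w.interactive:
        return "local"   # accept lower quality for latency or data control
    return "remote"      # batch and async work tolerates queues and metering


if __name__ == "__main__":
    jobs = [
        Workload("nightly-report-summaries", interactive=False, privacy_sensitive=False),
        Workload("editor-autocomplete", interactive=True, privacy_sensitive=False),
        Workload("medical-notes-redaction", interactive=False, privacy_sensitive=True),
    ]
    for job in jobs:
        print(f"{job.name}: {choose_backend(job)}")
```

Keeping the policy in one small function means the boundary between local and remote can move as local models improve, without touching the callers.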

The interesting question is not whether AI stays centralized forever. It is which parts of the stack become personal first, and what new applications become possible when the feedback loop tightens.

A working hypothesis

We are pre-PC in AI infrastructure. The PC era of AI, if it comes, will be defined less by bigger models and more by composable local runtimes that treat LLMs as one component among many, not the center of gravity.

That shift will feel obvious in hindsight. Right now, it still feels like renting time on a very eloquent mainframe.