When it comes to the AI soap opera — there is news every day, and which company is on top and which is on the bottom seems to shift by the quarter if not the month — the news that I find most intriguing and instructive this week is about physical goods and logistics. From Bloomberg:
Amazon.com Inc. unveiled a suite of logistics services that will let businesses buy its existing freight and distribution offerings as a package, sending shares of rival delivery companies such as FedEx Corp. and United Parcel Service Inc. lower. The world’s largest online retailer on Monday announced Amazon Supply Chain Services (ASCS), offering other companies access to its “full portfolio” of supply-chain and distribution offerings. The service largely consolidates a package of existing products — air and ocean freight, trucking and last-mile delivery — into a new suite it says companies like Procter & Gamble Co. and 3M Co. are already using.
This is a very satisfying announcement for Stratechery, given it’s the culmination of a prediction I made a decade ago in The Amazon Tax. Amazon at that point had two primary businesses — Amazon.com and AWS — and I made the case in that Article that they were actually very similar: in both cases Amazon built “primitives” that had Amazon itself as their first, best customer, justifying and driving initial development, but in both cases the ultimate play was to sell those primitives to other companies. It was already clear at the time that logistics would follow the same path:
It seems increasingly clear that Amazon intends to repeat the model when it comes to logistics: after experimenting with six planes last year the company recently leased 20 more to flesh out its private logistics network; this is on top of registering its China subsidiary as an ocean freight forwarder…
So how might this play out? Well, start with the fact that Amazon itself would be this logistics network’s first-and-best customer, just as was the case with AWS. This justifies the massive expenditure necessary to build out a logistics network that competes with UPS, FedEx, et al, and most outlets are framing these moves as a way for Amazon to rein in shipping costs and improve reliability, especially around the holidays.
However, I think it is a mistake to think that Amazon will stop there: just as they have with AWS and e-commerce distribution I expect the company to offer its logistics network to third parties, which will increase the returns to scale, and, by extension, deepen Amazon’s eventual moat.
Now, ten years later, we are here, with the official unveiling of Amazon Supply Chain Services, and I think the time frame is an important one: Amazon, more than any other company, actually operates with decade-long timeframes, consistently making real-world investments at massive scale that (1) convert their marginal costs into capital costs and (2) gain leverage on those capital costs by selling them to other businesses.
This is, by the way, still a story about AI.
A Brief History of AWS
Three years ago SemiAnalysis wrote an article entitled Amazon’s Cloud Crisis: How AWS Will Lose The Future Of Computing, and I found it very compelling. First, though, some history (much of which is covered in SemiAnalysis’ piece).
Amazon not only invented cloud computing, but also realized it would be a commodity market. While most people in tech think about building sustainable differentiation that allows you to charge higher prices, thus producing profit, commodity markets work differently: there, sustainable profits come from having structurally cheaper costs.
Amazon developed exactly that, first through having the largest scale — giving the company both buying power and the most leverage on their development costs — and second through genuine innovation. AWS built a specialized system called Nitro, running on their own chips, that offloaded server management, including networking, storage, and the hypervisor, from the expensive Intel and AMD servers that the company sold access to; this let Amazon run that many more virtual machines on a single server, significantly increasing utilization, i.e. delivering a structural cost advantage.
Amazon doubled down on their custom chip efforts with Graviton, their ARM processors. Graviton chips, particularly the first few generations, were inferior to Intel or AMD chips, but that didn’t mean they were useless. By that time AWS had expanded from simply being an Infrastructure-as-a-Service (IaaS) provider to being a Platform-as-a-Service (PaaS) provider as well. IaaS means you provide raw compute, storage, etc., on which customers can run things like operating systems or databases; PaaS means you provide that basic functionality as a service. Amazon Relational Database Service (RDS), for example, is a fully managed database that customers can access via a set of APIs without having to worry about actually managing the full database themselves, worrying about scaling, duplication, etc. This, by extension, means that customers don’t need to know and don’t need to care about the compute infrastructure that undergirds services like RDS — which has long been Graviton!
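To make the abstraction concrete, here is a minimal sketch, using boto3, of what provisioning a managed database looks like from the customer’s side. The identifiers and credentials are placeholders; the instance class shown is one of the Graviton-based (“g”) options, which a customer can pick for the price discount, but everything beneath that abstraction is invisible either way.

```python
# Minimal sketch: provisioning a managed Postgres database on RDS via boto3.
# The caller names an engine and an instance class; the hosts, hypervisor,
# and management plane underneath are AWS' problem, not the customer's.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="example-db",      # placeholder name
    Engine="postgres",
    DBInstanceClass="db.r6g.large",         # "6g" marks a Graviton-based class
    MasterUsername="exampleadmin",          # placeholder credentials
    MasterUserPassword="change-me-first",   # placeholder credentials
    AllocatedStorage=20,                    # GiB
)
```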
PaaS lets Amazon double-dip on profitability: first, AWS could sell PaaS products at a higher margin than IaaS products, and second, the company could leverage its own cheaper silicon to serve those products, reducing costs. Over time Graviton has become more competitive in performance — while still being cheaper — giving Amazon a lower-cost compute instance to sell to end users, but even without third-party take-up, the investment in building its own silicon has paid off.
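As a stylized illustration of that double-dip, with all numbers invented for the example: suppose a workload that bills $0.70/hour as a raw instance bills $1.00/hour as a managed service, and Graviton cuts the underlying compute cost from $0.40/hour to $0.30/hour. Then:

$$\text{IaaS margin} = \frac{0.70 - 0.40}{0.70} \approx 43\%, \qquad \text{PaaS-on-Graviton margin} = \frac{1.00 - 0.30}{1.00} = 70\%$$

Higher prices on top, lower costs underneath: both levers move margin in the same direction.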
Training vs. Inference
Fast forward to the AI era, and SemiAnalysis’ concern was that all of these optimizations left AWS ill-prepared for the new paradigm. One big problem was networking:
Rather than implement the best networking from Nvidia and/or Broadcom, Amazon is using its own Nitro and Elastic Fabric Adaptor (EFA) networking. This works well for many workloads, plus it delivers a cost, performance, and security advantage. There are business, cultural, and security reasons why Amazon will not implement other networking. The cultural one is important. Nitro and networking SoC’s generally have been Amazon’s biggest cost advantage for years. It’s ingrained into their DNA. Even EFA delivers on this too, but they don’t see how new workloads are evolving and that a new tier is needed due to the lack of foresight in their internal workload and infrastructure teams. Amazon is making a deliberate choice of not adopting that we believe will bite them in the future.
Another was Amazon’s insistence on building its own chips, which not only were inferior to the best Nvidia chips in terms of performance, but also risked Amazon getting fewer Nvidia chips going forward:
At least some other clouds will implement out-of-node NVLink. That’s where the discussion of prioritization now comes in. AI GPUs face tremendous shortages, for at least a full year. This is one of the most pivotal times for AI, and it may mark the haves and the have-nots. Nvidia is a complete monopoly right now. Why would Nvidia prioritize Amazon for these GPUs, when they know Amazon will move to their in-house chips as quickly as they can, for as many compute workloads as they can? Why would Nvidia ship tons of GPUs to the cloud that is not using any of their networking, thereby reducing their share of wallet?
Instead, Nvidia prioritizes the me-too clouds. Amazon does get meaningful volume, but nowhere close to where demand is. Amazon’s H100 GPU shipments relative to public cloud shipments is significantly lower than their share of the public cloud. Those other clouds also can’t satisfy demand, but they get a bigger percentage of the GPUs they ask Nvidia for, and as such, firms looking for GPUs for training or inference will move to those clouds. Nvidia is the kingmaker right now, and they are capitalizing on it. They have to spread the balance of power out to prevent compute share from clustering towards Amazon.
These concerns were well-founded in the 2023 time period when that article was written: that was a time when AI, thanks to ChatGPT, had hit the mainstream, but the largest share of compute still went to training. Training required all of the things that Amazon lacked, particularly the ability to network large numbers of Nvidia GPUs together into one coherent system. In such a system the most important capability was horizontal networking between chips, so that you could update weights during training, a step that needed to happen serially. It was absolutely the case that cloud providers like Microsoft or Oracle or the neoclouds, which implemented full Nvidia solutions instead of the standalone HGX racks that AWS favored, were much better suited to training large language models.
That is still the case, by the way. What has changed is that training is no longer the biggest AI compute market; inference is, thanks not only to increased AI adoption, but also to fundamental changes in how AI works. From an Update about Nvidia:
- The first inflection point was the emergence of LLMs — call this the ChatGPT moment. In this first paradigm tokens were generated by GPUs and presented as the answer to a question.
- The second inflection point was the emergence of reasoning models — call this the o1 moment. In this paradigm there are a very large number of tokens that are generated to figure out the answer before the answer is actually generated; this was an exponential increase in the addressable market for tokens.
- The third inflection point was the emergence of functional agents — call this the Opus 4.5 moment. In this paradigm those reasoning models are not triggered by humans asking a question, but by an agent solving a problem. This increases the market in two directions: first, humans can run multiple agents, and secondly, agents can leverage reasoning models multiple times to accomplish a task. This isn’t just an exponential increase in the addressable market for tokens, it’s two exponential increases squared.
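Those “two exponential increases” compound multiplicatively, which a back-of-the-envelope decomposition makes explicit (the symbols here are mine, illustrative rather than from the Update):

$$\text{tokens demanded} \approx u \times a \times r \times t$$

where $u$ is the number of users, $a$ the agents each user runs, $r$ the reasoning invocations per agent task, and $t$ the tokens per invocation. The ChatGPT moment established $u$ and $t$; the o1 moment inflated $t$ by orders of magnitude; the agent moment grows $a$ and $r$ at the same time, multiplying an already-multiplied market.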
Both the shift to inference and the shift in the nature of inference have been positives for AWS’ approach.
- First, while inference still requires significant memory, it needs far less than training does. It’s actually viable to store a model’s parameters on a single server; you don’t need to network together thousands of chips.
- Second, while reasoning and agentic workloads require significantly more tokens, and thus a massively larger KV cache, the increase is actually so large that even the most optimized Nvidia inference systems are being built with dedicated memory servers. This sort of architecture is much more compatible with Amazon’s networking approach than the thousands-of-chips-networked-together approach is.
- Third, agents are heavily CPU dependent, which has two important implications. First, fully utilizing accelerators is a function of having sufficient general compute; second, achieving maximum utilization of heterogeneous compute means unbundling CPUs and GPUs and routing workloads between resources, which is exactly the sort of disaggregated-resource abstraction that Amazon has been building with Nitro (a toy sketch of such routing follows below).
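To illustrate what that unbundling might look like, here is a toy scheduler sketch in Python; the pool and task names are hypothetical, and this is not any real AWS or Nitro API:

```python
# Toy sketch of disaggregated-resource routing: CPU-bound agent work
# (tool calls, orchestration) goes to CPU pools, token generation goes
# to accelerator pools, each chosen by current load. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    load: float = 0.0

CPU_POOLS = [Pool("cpu-a"), Pool("cpu-b")]
ACCEL_POOLS = [Pool("trn-a"), Pool("gpu-a")]

def route(task_kind: str) -> Pool:
    """Send inference to the least-loaded accelerator pool and everything
    else (tool calls, orchestration) to the least-loaded CPU pool."""
    pools = ACCEL_POOLS if task_kind == "inference" else CPU_POOLS
    target = min(pools, key=lambda p: p.load)
    target.load += 1.0  # naive bookkeeping; a real scheduler tracks capacity
    return target

for kind in ["inference", "tool_call", "inference", "orchestration"]:
    print(f"{kind:>13} -> {route(kind).name}")
```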
The utilization point is an important one. Nvidia CEO Jensen Huang made his case for Nvidia chips over custom ASICs at length at GTC 2025. Huang’s argument was that AI factories — to use his term — were ultimately constrained by power; that meant that the most important metric for profitability was not the cost of chips but rather tokens-per-watt. In other words, if you can’t increase watts, it’s worth spending more on chips to increase tokens on those watts.
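Huang’s arithmetic is straightforward if you grant the power constraint (the symbols and numbers below are invented for illustration): with a facility capped at $P$ watts,

$$\text{tokens/s} = P \times \tau, \qquad \tau = \text{tokens per second per watt}$$

so a chip that costs twice as much but delivers three times the tokens-per-watt generates three times the revenue from the same building, and under a binding power cap the premium pays for itself as long as the marginal token revenue exceeds the amortized difference in chip cost.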
There are, however, three reasons why this argument may not hold, particularly for a company like Amazon.
- First, if you have the money to buy that many Nvidia chips, you also have the money to spend on getting more power — which is exactly what AWS has been focused on. This very much fits AWS’ modus operandi, which is to invest more upstream (in this case in power) with the goal of spending less downstream (in this case, on Nvidia’s huge chip margins).
- Second, in the long term, electricity is more of a commodity than logic is. That makes power a market where innovation and competition are more likely to break the bottleneck, which is another way of saying that one’s own silicon is the investment most likely to deliver a durable return.
- Third, the nature of inference workloads — particularly agentic ones — is such that perfect accelerator utilization is going to be a much harder problem to solve than it is for training.
These points are moot, however, if you don’t have your own logic chip that is at least competitive, and here Amazon’s long-term outlook is paying off. Amazon bought Annapurna Labs, which makes their chips, in 2015, and launched their first AI-focused chip in 2019. No, it wasn’t very good, but critically, that was seven years ago: now Trainium 3 is decent and the trajectory is even better. AWS is positioned to have a sustainable cost advantage for inference going forward.
AWS’ Neutrality
Moreover, AWS is already replaying the Graviton playbook: Trainium chips help undergird Bedrock, its managed AI platform, which is to say that users are using Trainium chips even if they didn’t explicitly choose to do so. AWS CEO Matt Garman made this point explicitly in a Stratechery Interview:
I think just with GPUs, by the way, you’re going to interact with a lot of these accelerator chips through abstractions. So the vast majority of customers don’t interact with GPUs either, except maybe through their laptop or something like that, for graphics. But when you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs; if you’re talking to Claude, whether it’s through GPUs or Trainium or TPUs, you’re not talking to any of those chips, you’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models.
And so whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly, and that’s gonna be true going forward just because these systems are so complex, they’re very large. If you’re going to go train a model, not that many people have enough money to go train a model, not that many people have the expertise to actually manage it. They’re very complicated systems, and the OpenAI team is incredible in their ability to squeeze value out of a very large compute cluster. But not that many people have the team that can do that, independent of what the chip happens to be, and so I think that that’s going to be true for all accelerator chips, honestly.
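A minimal sketch of the abstraction Garman is describing, using Bedrock’s Converse API; the model ID is an example, and the point is that nothing in the call names the silicon serving it:

```python
# Minimal sketch: invoking a model through Amazon Bedrock. The caller
# addresses a model ID; whether the tokens come from Trainium, GPUs, or
# anything else is invisible at this layer.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Hello, world"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```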
The frontier models are an important factor in this, and that is an angle that I didn’t see coming. Huang explained in a recent interview with Dwarkesh Patel why Nvidia didn’t invest in Anthropic early on:
At the time, I didn’t deeply internalize how difficult it would be to build a foundation AI lab like OpenAI and Anthropic, and the fact that they needed huge investments from the supplier themselves. We just weren’t in a position to make the multi-billion dollar investment into Anthropic so that they could use our compute. But Google and AWS were. They put in huge investments in the beginning so that Anthropic, in return, used their compute. We just weren’t in a position to do that at the time.
I would say my mistake is I didn’t deeply internalize that they really had no other options, that a VC would never put in $5-10 billion of investment into an AI lab with the hopes of it turning out to be Anthropic. So that was my miss. But even if I understood it, I don’t think we would’ve been in a position to do that at the time. But I’m not going to make that same mistake again.
Amazon had both the money and the chips to invest in Anthropic precisely because they had built such a cash machine with AWS in the first place. That’s the thing with big investments in infrastructure: they take years to build, but the benefit of that investment compounds over time.
Anthropic, meanwhile, thanks to those investments from Amazon and Google, not only runs across a variety of chips, but for a long time offered the only frontier models available on all of the leading clouds, an important selling point for enterprises. Microsoft, in the end, needed to let go of Azure’s exclusive access to OpenAI’s API in part because that exclusivity was hurting the prospects of their mammoth stake in OpenAI.
You can also make the case that Amazon is the best choice for frontier model access in a world of limited compute: Microsoft’s core business is software, which is to say that the company faces massive pressure to invest in their own AI capabilities, even at the cost of de-prioritizing cloud customers. That’s exactly what happened at Microsoft earlier this year, when the company missed Azure growth projections because they devoted more compute to their internal workloads. It was an understandable decision: cloud demand is eternal, but the risk from AI for existing software businesses is existential.
This also applies to Google: the company’s core business is also digital, and while search has so far fended off the chatbot threat that many expected, the fundamental challenge is still one to be managed, not extinguished. Amazon’s core businesses, meanwhile, are very much rooted in the physical world: selling and shipping physical goods, and building data centers. Both are amenable to Amazon devoting the majority of its chips to customers’ workloads.
Amazon’s Future
If this week marks the resolution of one of Amazon’s long bets, you can see the outline of future resolutions in present-day announcements. One prominent example is Amazon Leo, the company’s satellite service that seems, at first glance, duplicative of SpaceX’s Starlink, which has the advantage of already existing at scale.
Remember Amazon’s formula, however, which CEO Andy Jassy stated explicitly with regard to Leo on the company’s most recent earnings call:
Today, if you ask what stops us from growing the business, we have to get the constellation into space. We have over 20 launches planned this year. We have over 30 launches planned in 2027. But I think the business has a chance to be a very large many billion-dollar revenue business. And I think it has some characteristics that are reminiscent of AWS in that it’s capital-intensive upfront where you’re committing a lot of capital and cash in the early years for assets that you get to leverage over a long period of time. And so I like the free cash flow and return on invested capital characteristics of that business in the medium to long term.
The fact that it is extremely capital-intensive is not the only thing about Leo that makes it like AWS: a critical factor is that Amazon is the first and best customer to give the service scale, and here it’s worth going back to logistics. I noted above that Amazon delivery still has marginal costs, and that is because humans have to make the delivery.
Amazon, however, pointed to the future a full 13 years ago, when the company first started talking publicly about drone delivery. It’s been a long slog, to be sure, but it’s increasingly plausible to imagine a future where delivery costs are a matter of depreciation on drone assets, and what would such a future require? How about reliable, widespread satellite coverage for communicating with and guiding those drones? And, if Amazon doesn’t want to be dependent on Jensen Huang for chips, do you think they want to be dependent on Elon Musk for drone connectivity?
Of course other businesses — like Apple — will be able to pay to use Amazon’s satellite infrastructure, just like they can now pay to use Amazon’s delivery service, or pay to use AWS, or pay to sell on Amazon.com. The world may change, in increasingly drastic ways, but Amazon’s approach, by virtue of its focus on long-term investments in the physical world, appears to be as sturdy as ever.
More generally, I increasingly suspect that long-term vulnerability to AI — or, to put it more positively, the long-term incentive to invest in AI — is strongly and inversely correlated with the degree to which a company interacts with the physical world, and secondarily, with the degree to which it feels secure in its control of distribution:
- Apple and Amazon feel comfortable not having leading-edge models, just access to them, because their businesses are rooted in the physical world.
- Microsoft has invested heavily in data centers, but doesn’t own their own model, perhaps because they feel their control of distribution to enterprises will protect their core business (or because they had too much of a dependency on OpenAI).
- Google and Meta are investing at a similar scale to Amazon, and are also heavily invested in their own models. Both are Aggregators, which is to say they have to continually earn attention from consumers, given that competition is only a click away; having good AI is existential to them.
This is, in the end, another advantage of making the sort of long-term bets Amazon specializes in: the threats are so distant that you have plenty of time to make new investments that address any weaknesses that develop in the meantime — or, as in the case of AI, wait for the market to tilt in your favor.