When it comes to the AI soap opera — there is news every day, and which company is on top and which is on the bottom seems to shift by the quarter if not the month — the news that I find most intriguing and instructive this week is about physical goods and logistics. From Bloomberg:
Amazon.com Inc. unveiled a suite of logistics services that will let businesses buy its existing freight and distribution offerings as a package, sending shares of rival delivery companies such as FedEx Corp. and United Parcel Service Inc. lower. The world’s largest online retailer on Monday announced Amazon Supply Chain Services (ASCS), offering other companies access to its “full portfolio” of supply-chain and distribution offerings. The service largely consolidates a package of existing products — air and ocean freight, trucking and last-mile delivery — into a new suite it says companies like Procter & Gamble Co. and 3M Co. are already using.
This is a very satisfying announcement for Stratechery, given it’s the culmination of a prediction I made a decade ago in The Amazon Tax. Amazon at that point had two primary businesses — Amazon.com and AWS — and I made the case in that Article that they were actually very similar: in both cases Amazon built “primitives” that had Amazon itself as their first, best customer, justifying and driving initial development, but in both cases the ultimate play was to sell those primitives to other companies. It was already clear at the time that logistics would follow the same path:
It seems increasingly clear that Amazon intends to repeat the model when it comes to logistics: after experimenting with six planes last year the company recently leased 20 more to flesh out its private logistics network; this is on top of registering its China subsidiary as an ocean freight forwarder…
So how might this play out? Well, start with the fact that Amazon itself would be this logistics network’s first-and-best customer, just as was the case with AWS. This justifies the massive expenditure necessary to build out a logistics network that competes with UPS, FedEx, et al, and most outlets are framing these moves as a way for Amazon to rein in shipping costs and improve reliability, especially around the holidays.
However, I think it is a mistake to think that Amazon will stop there: just as they have with AWS and e-commerce distribution I expect the company to offer its logistics network to third parties, which will increase the returns to scale, and, by extension, deepen Amazon’s eventual moat.
Now, ten years later, we are here, with the official unveiling of Amazon Supply Chain Services, and I think the time frame is an important one: Amazon, more than any other company, actually operates with decade-long timeframes, consistently making real-world investments at massive scale that (1) convert their marginal costs into capital costs and (2) gain leverage on those capital costs by selling them to other businesses.
This is, by the way, still a story about AI.
A Brief History of AWS
Three years ago SemiAnalysis wrote an article entitled Amazon’s Cloud Crisis: How AWS Will Lose The Future Of Computing, and I found it very compelling. First, though, some history (much of which is covered in SemiAnalysis’ piece).
Amazon not only invented cloud computing, but also realized it would be a commodity market. While most people in tech think about building sustainable differentiation that allows you to charge higher prices, thus producing profit, commodity markets work differently: there, sustainable profits come from having structurally cheaper costs.
Amazon developed exactly that, first through having the largest scale — giving the company both buying power and the most leverage on their development costs — and second through genuine innovation. AWS built a specialized system called Nitro, running on their own chips, that offloaded server management, including networking, storage, and the hypervisor, from the expensive Intel and AMD servers that the company sold access to; this let Amazon run that many more virtual machines on a single server, significantly increasing utilization, i.e. delivering a structural cost advantage.
Amazon doubled down on their custom chip efforts with Graviton, their ARM processors. Graviton chips, particularly the first few generations, were inferior to Intel or AMD chips, but that didn’t mean they were useless. By that time AWS had expanded from simply being an Infrastructure-as-a-Service (IaaS) provider to being a Platform-as-a-Service (PaaS) provider as well. IaaS means you provide raw compute, storage, etc., on which customers can run things like operating systems or databases; PaaS means you provide that basic functionality as a service. Amazon Relational Database Service (RDS), for example, is a fully managed database that customers can access via a set of APIs without having to worry about actually managing the full database themselves, worrying about scaling, duplication, etc. This, by extension, means that customers don’t need to know and don’t need to care about the compute infrastructure that undergirds services like RDS — which has long been Graviton!
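To make the abstraction concrete, here is a minimal sketch, using boto3, of what provisioning a managed database looks like from the customer’s side. The identifiers and credentials are placeholders; the instance class shown is one of the Graviton-based (“g”) options, which a customer can pick for the price discount, but everything beneath that abstraction is invisible either way.

```python
# Minimal sketch: provisioning a managed Postgres database on RDS via boto3.
# The caller names an engine and an instance class; the hosts, hypervisor,
# and management plane underneath are AWS' problem, not the customer's.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="example-db",      # placeholder name
    Engine="postgres",
    DBInstanceClass="db.r6g.large",         # "6g" marks a Graviton-based class
    MasterUsername="exampleadmin",          # placeholder credentials
    MasterUserPassword="change-me-first",   # placeholder credentials
    AllocatedStorage=20,                    # GiB
)
```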
PaaS lets Amazon double-dip on profitability: first, AWS could sell PaaS products at a higher margin than IaaS products, and second, the company could leverage its own cheaper silicon to serve those products, reducing costs. Over time Graviton has become more competitive in performance — while still being cheaper — giving Amazon a lower-cost compute instance to sell to end users, but even without third-party take-up, the investment in building its own silicon has paid off.
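As a stylized illustration of that double-dip, with all numbers invented for the example: suppose a workload that bills $0.70/hour as a raw instance bills $1.00/hour as a managed service, and Graviton cuts the underlying compute cost from $0.40/hour to $0.30/hour. Then:

$$\text{IaaS margin} = \frac{0.70 - 0.40}{0.70} \approx 43\%, \qquad \text{PaaS-on-Graviton margin} = \frac{1.00 - 0.30}{1.00} = 70\%$$

Higher prices on top, lower costs underneath: both levers move margin in the same direction.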
Training vs. Inference
Fast forward to the AI era, and SemiAnalysis’ concern was that all of these optimizations left AWS ill-prepared for the new paradigm. One big problem was networking:
Rather than implement the best networking from Nvidia and/or Broadcom, Amazon is using its own Nitro and Elastic Fabric Adaptor (EFA) networking. This works well for many workloads, plus it delivers a cost, performance, and security advantage. There are business, cultural, and security reasons why Amazon will not implement other networking. The cultural one is important. Nitro and networking SoC’s generally have been Amazon’s biggest cost advantage for years. It’s ingrained into their DNA. Even EFA delivers on this too, but they don’t see how new workloads are evolving and that a new tier is needed due to the lack of foresight in their internal workload and infrastructure teams. Amazon is making a deliberate choice of not adopting that we believe will bite them in the future.
Another was Amazon’s insistence on building its own chips, which not only were inferior to the best Nvidia chips in terms of performance, but also risked Amazon getting fewer Nvidia chips going forward:
At least some other clouds will implement out-of-node NVLink. That’s where the discussion of prioritization now comes in. AI GPUs face tremendous shortages, for at least a full year. This is one of the most pivotal times for AI, and it may mark the haves and the have-nots. Nvidia is a complete monopoly right now. Why would Nvidia prioritize Amazon for these GPUs, when they know Amazon will move to their in-house chips as quickly as they can, for as many compute workloads as they can? Why would Nvidia ship tons of GPUs to the cloud that is not using any of their networking, thereby reducing their share of wallet?
Instead, Nvidia prioritizes the me-too clouds. Amazon does get meaningful volume, but nowhere close to where demand is. Amazon’s H100 GPU shipments relative to public cloud shipments is significantly lower than their share of the public cloud. Those other clouds also can’t satisfy demand, but they get a bigger percentage of the GPUs they ask Nvidia for, and as such, firms looking for GPUs for training or inference will move to those clouds. Nvidia is the kingmaker right now, and they are capitalizing on it. They have to spread the balance of power out to prevent compute share from clustering towards Amazon.
These concerns were well-founded in the 2023 time period when that article was written: that was a time when AI, thanks to ChatGPT, had hit the mainstream, but the largest share of compute still went to training. Training required all of the things that Amazon lacked, particularly the ability to network large numbers of Nvidia GPUs together into one coherent system. In such a system the most important capability was horizontal networking between chips, so that you could update weights during training, a step that needed to happen serially. It was absolutely the case that cloud providers like Microsoft or Oracle or the neoclouds, which implemented full Nvidia solutions instead of the standalone HGX racks that AWS favored, were much better suited to training large language models.
That is still the case, by the way. What has changed is that training is no longer the biggest AI compute market; inference is, thanks not only to increased AI adoption, but also to fundamental changes in how AI works. From an Update about Nvidia:
- The first inflection point was the emergence of LLMs — call this the ChatGPT moment. In this first paradigm tokens were generated by GPUs and presented as the answer to a question.
- The second inflection point was the emergence of reasoning models — call this the o1 moment. In this paradigm there are a very large number of tokens that are generated to figure out the answer before the answer is actually generated; this was an exponential increase in the addressable market for tokens.
- The third inflection point was the emergence of functional agents — call this the Opus 4.5 moment. In this paradigm those reasoning models are not triggered by humans asking a question, but by an agent solving a problem. This increases the market in two directions: first, humans can run multiple agents, and secondly, agents can leverage reasoning models multiple times to accomplish a task. This isn’t just an exponential increase in the addressable market for tokens, it’s two exponential increases squared.
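Those “two exponential increases” compound multiplicatively, which a back-of-the-envelope decomposition makes explicit (the symbols here are mine, illustrative rather than from the Update):

$$\text{tokens demanded} \approx u \times a \times r \times t$$

where $u$ is the number of users, $a$ the agents each user runs, $r$ the reasoning invocations per agent task, and $t$ the tokens per invocation. The ChatGPT moment established $u$ and $t$; the o1 moment inflated $t$ by orders of magnitude; the agent moment grows $a$ and $r$ at the same time, multiplying an already-multiplied market.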
Both the shift to inference and the shift in the nature of inference have been positives for AWS’ approach.
- First, while inference still requires significant memory, it needs far less than training does. It’s actually viable to store a model’s parameters on a single server; you don’t need to network together thousands of chips.
- Second, while reasoning and agentic workloads require significantly more tokens, and thus a massively larger KV cache, the increase is actually so large that even the most optimized Nvidia inference systems are being built with dedicated memory servers. This sort of architecture is much more compatible with Amazon’s networking approach than the thousands-of-chips-networked-together approach is.
- Third, agents are heavily CPU dependent, which has two important implications. First, fully utilizing accelerators is a function of having sufficient general compute; second, achieving maximum utilization of heterogeneous compute means unbundling CPUs and GPUs and routing workloads between resources, which is exactly the sort of disaggregated-resource abstraction that Amazon has been building with Nitro (a toy sketch of such routing follows below).
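To illustrate what that unbundling might look like, here is a toy scheduler sketch in Python; the pool and task names are hypothetical, and this is not any real AWS or Nitro API:

```python
# Toy sketch of disaggregated-resource routing: CPU-bound agent work
# (tool calls, orchestration) goes to CPU pools, token generation goes
# to accelerator pools, each chosen by current load. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    load: float = 0.0

CPU_POOLS = [Pool("cpu-a"), Pool("cpu-b")]
ACCEL_POOLS = [Pool("trn-a"), Pool("gpu-a")]

def route(task_kind: str) -> Pool:
    """Send inference to the least-loaded accelerator pool and everything
    else (tool calls, orchestration) to the least-loaded CPU pool."""
    pools = ACCEL_POOLS if task_kind == "inference" else CPU_POOLS
    target = min(pools, key=lambda p: p.load)
    target.load += 1.0  # naive bookkeeping; a real scheduler tracks capacity
    return target

for kind in ["inference", "tool_call", "inference", "orchestration"]:
    print(f"{kind:>13} -> {route(kind).name}")
```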
The utilization point is an important one. Nvidia CEO Jensen Huang made his case for Nvidia chips over custom ASICs at length at GTC 2025. Huang’s argument was that AI factories — to use his term — were ultimately constrained by power; that meant that the most important metric for profitability was not the cost of chips but rather tokens-per-watt. In other words, if you can’t increase watts, it’s worth spending more on chips to increase tokens on those watts.
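Huang’s arithmetic is straightforward if you grant the power constraint (the symbols and numbers below are invented for illustration): with a facility capped at $P$ watts,

$$\text{tokens/s} = P \times \tau, \qquad \tau = \text{tokens per second per watt}$$

so a chip that costs twice as much but delivers three times the tokens-per-watt generates three times the revenue from the same building, and under a binding power cap the premium pays for itself as long as the marginal token revenue exceeds the amortized difference in chip cost.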
There are, however, three reasons why this argument may not hold, particularly for a company like Amazon.
- First, if you have the money to buy that many Nvidia chips, you also have the money to spend on getting more power — which is exactly what AWS has been focused on. This very much fits AWS’ modus operandi, which is to invest more upstream (in this case in power) with the goal of spending less downstream (in this case, on Nvidia’s huge chip margins).
- Second, in the long term, electricity is more of a commodity than logic is. That makes power a market where innovation and competition are more likely to break the bottleneck, which is another way of saying that one’s own silicon is the investment most likely to deliver a durable return.
- Third, the nature of inference workloads — particularly agentic ones — is such that perfect accelerator utilization is going to be a much harder problem to solve than it is for training.
These points are moot, however, if you don’t have your own logic chip that is at least competitive, and here Amazon’s long-term outlook is paying off. Amazon bought Annapurna Labs, which makes their chips, in 2015, and launched their first AI-focused chip in 2019. No, it wasn’t very good, but critically, that was seven years ago: now Trainium 3 is decent and the trajectory is even better. AWS is positioned to have a sustainable cost advantage for inference going forward.
AWS’ Neutrality
Moreover, AWS is already replaying the Graviton playbook: Trainium chips help undergird Bedrock, its managed AI platform, which is to say that users are using Trainium chips even if they didn’t explicitly choose to do so. AWS CEO Matt Garman made this point explicitly in a Stratechery Interview:
I think just with GPUs, by the way, you’re going to interact with a lot of these accelerator chips through abstractions. So the vast majority of customers don’t interact with GPUs either, except maybe through their laptop or something like that, for graphics. But when you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs; if you’re talking to Claude, whether it’s through GPUs or Trainium or TPUs, you’re not talking to any of those chips, you’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models.
And so whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly, and that’s gonna be true going forward just because these systems are so complex, they’re very large. If you’re going to go train a model, not that many people have enough money to go train a model, not that many people have the expertise to actually manage it. They’re very complicated systems, and the OpenAI team is incredible in their ability to squeeze value out of a very large compute cluster. But not that many people have the team that can do that, independent of what the chip happens to be, and so I think that that’s going to be true for all accelerator chips, honestly.
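A minimal sketch of the abstraction Garman is describing, using Bedrock’s Converse API; the model ID is an example, and the point is that nothing in the call names the silicon serving it:

```python
# Minimal sketch: invoking a model through Amazon Bedrock. The caller
# addresses a model ID; whether the tokens come from Trainium, GPUs, or
# anything else is invisible at this layer.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Hello, world"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```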
The frontier models are an important factor in this, and that is an angle that I didn’t see coming. Huang explained in a recent interview with Dwarkesh Patel why Nvidia didn’t invest in Anthropic early on:
At the time, I didn’t deeply internalize how difficult it would be to build a foundation AI lab like OpenAI and Anthropic, and the fact that they needed huge investments from the supplier themselves. We just weren’t in a position to make the multi-billion dollar investment into Anthropic so that they could use our compute. But Google and AWS were. They put in huge investments in the beginning so that Anthropic, in return, used their compute. We just weren’t in a position to do that at the time.
I would say my mistake is I didn’t deeply internalize that they really had no other options, that a VC would never put in $5-10 billion of investment into an AI lab with the hopes of it turning out to be Anthropic. So that was my miss. But even if I understood it, I don’t think we would’ve been in a position to do that at the time. But I’m not going to make that same mistake again.
Amazon had both the money and the chips to invest in Anthropic precisely because they had built such a cash machine with AWS in the first place. That’s the thing with big investments in infrastructure: they take years to build, but the benefit of that investment compounds over time.
Anthropic, meanwhile, thanks to those investments from Amazon and Google, not only runs across a variety of chips, but for a long time offered the only frontier models available on all of the leading clouds, an important selling point for enterprises. Microsoft, in the end, needed to let go of Azure’s exclusive access to OpenAI’s API in part because that exclusivity was hurting the prospects of their mammoth stake in OpenAI.
You can also make the case that Amazon is the best choice for frontier model access in a world of limited compute: Microsoft’s core business is software, which is to say that the company faces massive pressure to invest in their own AI capabilities, even at the cost of de-prioritizing cloud customers. That’s exactly what happened at Microsoft earlier this year, when the company missed Azure growth projections because they devoted more compute to their internal workloads. It was an understandable decision: cloud demand is eternal, but the risk from AI for existing software businesses is existential.
This also applies to Google: the company’s core business is also digital, and while search has so far fended off the chatbot threat that many expected, the fundamental challenge is still one to be managed, not extinguished. Amazon’s core businesses, meanwhile, are very much rooted in the physical world: selling and shipping physical goods, and building data centers. Both are amenable to Amazon devoting the majority of its chips to customers’ workloads.
Amazon’s Future
If this week marks the resolution of one of Amazon’s long bets, you can see the outline of future resolutions in present-day announcements. One prominent example is Amazon Leo, the company’s satellite service that seems, at first glance, duplicative of SpaceX’s Starlink, which has the advantage of already existing at scale.
Remember Amazon’s formula, however, which CEO Andy Jassy stated explicitly with regard to Leo on the company’s most recent earnings call:
Today, if you ask what stops us from growing the business, we have to get the constellation into space. We have over 20 launches planned this year. We have over 30 launches planned in 2027. But I think the business has a chance to be a very large many billion-dollar revenue business. And I think it has some characteristics that are reminiscent of AWS in that it’s capital-intensive upfront where you’re committing a lot of capital and cash in the early years for assets that you get to leverage over a long period of time. And so I like the free cash flow and return on invested capital characteristics of that business in the medium to long term.
The fact that it is extremely capital-intensive is not the only thing about Leo that makes it like AWS: a critical factor is that Amazon is the first and best customer to give the service scale, and here it’s worth going back to logistics. I noted above that Amazon delivery still has marginal costs, and that is because humans have to make the delivery.
Amazon, however, pointed to the future a full 13 years ago, when the company first started talking publicly about drone delivery. It’s been a long slog, to be sure, but it’s increasingly plausible to imagine a future where delivery costs are a matter of depreciation on drone assets, and what would such a future require? How about reliable, widespread satellite coverage for communicating with and guiding those drones? And, if Amazon doesn’t want to be dependent on Jensen Huang for chips, do you think they want to be dependent on Elon Musk for drone connectivity?
Of course other businesses — like Apple — will be able to pay to use Amazon’s satellite infrastructure, just like they can now pay to use Amazon’s delivery service, or pay to use AWS, or pay to sell on Amazon.com. The world may change, in increasingly drastic ways, but Amazon’s approach, by virtue of its focus on long-term investments in the physical world, appears to be as sturdy as ever.
More generally, I increasingly suspect that long-term vulnerability to AI — or, to put it more positively, the long-term incentive to invest in AI — is strongly and inversely correlated with the degree to which a company interacts with the physical world, and secondarily, with the degree to which it feels secure in its control of distribution:
- Apple and Amazon feel comfortable not having leading-edge models, just access to them, because their businesses are rooted in the physical world.
- Microsoft has invested heavily in data centers, but doesn’t own their own model, perhaps because they feel their control of distribution to enterprises will protect their core business (or because they had too much of a dependency on OpenAI).
- Google and Meta are investing at a similar scale to Amazon, and are also heavily invested in their own models. Both are Aggregators, which is to say they have to continually earn attention from consumers, given that competition is only a click away; having good AI is existential to them.
This is, in the end, another advantage of making the sort of long-term bets Amazon specializes in: the threats are so distant that you have plenty of time to make new investments that address any weaknesses that develop in the meantime — or, as in the case of AI, wait for the market to tilt in your favor.