An Interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman About Bedrock Managed Agents

Good morning,

As I noted yesterday, today’s Stratechery Interview is early in terms of timing — Tuesday instead of Thursday — and late in terms of delivery — 1pm Eastern instead of 6am — because the topic was embargoed. That embargo created a bit of a weird situation for me over the last several days.

So here we are.

I think the Microsoft-OpenAI deal makes a lot of sense for both sides. Here are the bullet points of the new arrangement from Microsoft’s post:

  • Microsoft remains OpenAI’s primary cloud partner, and OpenAI products will ship first on Azure, unless Microsoft cannot or chooses not to support the necessary capabilities. OpenAI can now serve all its products to customers across any cloud provider.
  • Microsoft will continue to have a license to OpenAI IP for models and products through 2032. Microsoft’s license will now be non-exclusive.
  • Microsoft will no longer pay a revenue share to OpenAI.
  • Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress, at the same percentage but subject to a total cap.
  • Microsoft continues to participate directly in OpenAI’s growth as a major shareholder.

I think the most important point is the last one. Azure had a real competitive advantage thanks to being the only hyperscaler able to offer OpenAI models, but this also hindered OpenAI, particularly once it became clear that many enterprises cared first and foremost about accessing models on their current cloud of choice; I’ve been noting for a while that this was a real competitive advantage for Anthropic. In other words, Azure’s exclusivity was actively damaging Microsoft’s investment in OpenAI, and given Anthropic’s rapid growth this year, Microsoft needed to tend to their investment, even if it diminished Azure’s differentiation.

OpenAI, meanwhile, clearly sees AWS as a massive opportunity — so much so that they are forgoing Azure-related revenue for the next few years (which, per the previous point, will help Azure management feel better about losing their exclusivity; their P&L is going to look a lot better without paying a revenue share to OpenAI). OpenAI is also releasing Microsoft from the AGI clause; now the agreement between the two companies will run through 2032 no matter what.

What does seem clear is that OpenAI’s focus is going to be on AWS, and the greatest evidence in that regard is the topic of this interview: Bedrock Managed Agents, powered by OpenAI. The easiest way to think about this offering is Codex in AWS; a lot of what makes Codex work is the fact that it is local, which takes care of a lot of complexity, particularly in terms of security, for free. It’s another thing entirely to figure out how to make agents work across an organization, and the goal of this offering is to make these workflows much more accessible for organizations that already have most of their data in AWS.

To that end, in this interview, we discuss how AWS created the entire cloud category, the impact it had on startups, and how AI is both similar to and different from that previous paradigm shift. Then we discuss Bedrock Managed Agents, what it is, and how it differs from Amazon’s existing AgentCore offering. We also touch on Trainium and why chips won’t matter to most AI users, and why partnering makes sense relative to Google’s focus on full integration.

As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.

On to the Interview:

An Interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman About Bedrock Managed Agents

This interview is lightly edited for clarity.

AWS and Startups

Matt Garman and Sam Altman — well, Matt, welcome to Stratechery — and Sam, welcome back [I previously interviewed Altman in October 2025, March 2025, and February 2023].

Sam Altman: Thank you.

Matt Garman: Thank you, thanks for having me.

So Matt, this is your first time on Stratechery. Alas, I think that Sam’s presence is going to preclude the usual getting-to-know-you section. Besides, he doesn’t want to hear us reminisce about our times at Kellogg Business School, but it is good to have a fellow alumnus on the podcast.

MG: Yeah, I’m happy to be here. I’ll come back another time and we can do a little deeper dive.

That’d be great. You’ve been working on AWS since you were an intern, and you’re now in charge of the entire organization during this AI wave. What aspects of building the AI business are the same as building the original commodity compute business, for lack of a better term, and what aspects are really different?

MG: I think that the parts that are the same are that I see that same excitement, and builders out there being able to do things that they were never able to do before. One of the cool things when we first started AWS is that developers all of a sudden could get their hands on infrastructure that had only been available to the largest companies, the ones with millions of dollars to go build data centers. With a credit card and a couple of dollars, they could spin up applications, and it really exploded what was possible for people building out there on the Internet. We took the idea that people could build whatever they want, that we weren’t going to presuppose what they should do, and that if we could put powerful tools in front of the creativity of the world out there, they’d build interesting and amazing things.

I think this is as transformational, if not more so, in terms of what it’s enabling builders out there to do. As you think about what’s possible, you don’t have to have gone to school and learned for 10 years to code in order to go build an application, you don’t have to have huge teams of hundreds of people and months and months of time to go build things. You can build things with small teams, you can build fast and iterate quickly, and AI is unlocking all sorts of innovation across every different aspect of the world. I think in many ways that’s very similar, and it’s super exciting to see what it’s enabling from the customer base out there.

There was a bit, though, where when AWS came along, you were the only one, so you got all the upsides and downsides and everything sort of for free. In the AWS era, a lot was about commodity compute — making it fungible, elastic, cheap — while in AI, particularly in training, it feels like the winning abstraction was more about really vertically integrated super clusters, really advanced networking, and really tight linkages between software and hardware. Was that sort of a surprise for you, coming at it now — not fresh, but with a particular way of looking at large-scale compute that, at least for the first few years of AI, maybe didn’t perfectly align?

MG: I don’t know that it was different for us. What was different, though, is just the incredibly rapid scale of adoption, and I think that that’s probably surprised everybody. Sam, you can weigh in if you disagree, but just the speed of adoption and how fast people have grabbed onto the capabilities there, I think, has surprised everyone.

It’s different if you go back to when we started cloud computing: it took us a really long time to explain why a bookseller would provide your compute power, and a lot of explanation of what cloud computing even was. There was a lot of hard work that people forget; back in 2006, it wasn’t a given that that’s how the world’s computing would move, and so there was a lot of hard work there.

Do you think you had to do a bit of explaining now though? Lots of people were anchoring on the training era and you’re saying, “We’re thinking about the inference era”, and that’s going to be something different — maybe you still had to get those explanatory powers going again?

MG: You do, but how quickly people understand what you’re talking about is just totally different. So yes, I think if you move from people saying, “That does seem kind of cool, and it’s really neat that I have this intelligent chatbot that I can talk to”, to, “This can actually do work in your enterprise”, that has been a little bit of an education, but it’s also been relatively quick in the scope of how fast technology moves.

We’re going to get to the product that we’re here for very quickly, I promise, but Sam — from the startup ecosystem perspective, when you look back, obviously AWS was transformational, it completely changed where the barrier was, and now anyone can get started. You have seeds, you have angel investors, and it moves back the cutoff point — you don’t have to pitch for servers with a PowerPoint, you can build an app and then go to your Series A or whatever it might be. What, though, is different or the same about what that enabled versus the world today, from your perspective?

SA: I think there have been four great moments for platform enablement of startups at mass scale: there was the Internet, there was cloud, there was mobile, and then there was AI. The first one of those that I was kind of an adult for was the cloud, in the early days of YC [Y Combinator] — it’s hard to overstate what a change this meant for startups. Before, you had these startups that were renting colo[cation] space and putting together servers and putting stuff in there, and it was this massively complex thing, and you had to raise all this money. Then all of a sudden — even though the cloud happened right after YC got started, I guess it was the year after —

I was just going to ask that — at the end of the day, were they really hand-in-hand more than you realized at the time?

SA: They felt incredibly hand-in-hand at the time, it felt like YC was, you know, surfing this wave of the cloud from the very beginning because there were some early pre-AWS examples.

You don’t need to put that much money into a startup to get something off the ground if AWS exists compared to what it might’ve been before.

SA: It was this huge enabling change and it was part of why YC sounded so crazy at the time. People were like, “Well, there’s no way you can fund a startup with a few tens of thousands of dollars, it’s impossible, the server costs more than that”, so it was this complete change to what startups could do with small amounts of capital.

Startups generally win when there is a big platform shift and you can do things with a faster cycle time and much less capital than before, that’s a classic way startups can beat big companies, and at the beginning of my career, I really witnessed that happen with the cloud, it actually feels quite directionally similar now watching what companies are doing building on AI, but as Matt was saying, the speed of it is crazy.

Is there a bit where the incumbents, the large companies, are adopting this way faster than they adopted the cloud?

SA: There’s definitely more of that, but I also mean just the rate that revenue is scaling at startups — I spoke at YC recently and I kind of asked at the end, “What are the expectations for revenue for a good company at the end of YC?”, and they’re like, “Well it’s kind of changing every month, maybe we’d have a different answer at the beginning of the batch versus the end of the batch”, and this never used to happen before. Just the rate at which people are able to build scaled businesses on this new platform is unlike anything I’ve seen before.

You were the cloud of choice for basically all startups, a huge advantage in that whole era, Matt. What makes you the cloud of choice today, given a lot of people are building on the OpenAI API? Or is it something where you felt, “Actually we’re coming at this market from a very different perspective, we have a huge installed base who’s begging us to get AI things, and we have less visibility into this whole cohort that Sam’s talking about”?

MG: I think there’s a couple of things. One is we’re quite excited about our partnership, and I think it’s going to be really meaningful to a bunch of startups out there. But today, even if you go and you talk to startups, the vast majority of scaling startups are still scaling on AWS, and there’s a whole bunch of reasons for that. The scale is there, the availability is there, the security is there, the reliability is there, that kind of partner ecosystem of other ISVs is in AWS, the customers are in AWS.

(laughing) Everyone’s used the AWS panel whether they wanted to or not, so they’re used to it.

MG: And we help them. We spend a ton of time enabling startups, whether it’s with credits — but it’s not just with credits, it’s advice on how to set up your systems, how to think about go-to-market, a bunch of those things that I think are really appreciated by a bunch of the startups. We invest a lot of time and effort because we really feel like startups are the lifeblood of AWS. They were from the beginning, as Sam was describing, and they remain so today, and I still go once a quarter out to Silicon Valley or other places to meet directly with startups, to hear what they’re doing, to make sure that what we’re building is landing with them. So there is more competition today than there was 20 years ago for that startup attention, but it’s just as important for us as it’s ever been, and we spend a ton of time to make sure that we’re meeting the needs of those startups.

Is it fair to say people building directly on the OpenAI API, as opposed to say the Azure version of it, are more likely to have a stack of AWS for regular compute and then OpenAI for their AI?

MG: I think that’s a very common pattern that a lot of startups have today, absolutely.

Bedrock Managed Agents

Well that brings us to today’s announcement: Bedrock Managed Agents, powered by OpenAI — I think I got that right. The pitch, as I understand it, is not simply that OpenAI models are available in AWS — I don’t think that’s allowed — it’s that OpenAI’s frontier models are being packaged inside an AWS-native agent runtime, with identity, permissions, state, logging, governance, and deployment. Sam, is that the right way to articulate it?

SA: Yeah, that was pretty good.

Thank you. What is this? Now explain it in English.

SA: I think the next phase of AI is going from you supply some text to an agent and get more text back, or even you supply a bunch of code and get more code back, to we are going to have these agents running inside of a company doing all different kinds of work.

Virtual co-workers is kind of the least bad of the ways I’ve heard this described, but no one has quite figured out the right language for this, and we are packaging a new product that we’re working on together to help enable companies that want to build these sorts of stateful agents and make them available. Again, I think we don’t know exactly how the world’s going to talk about these, use these, but if you look at what’s happening [with Codex], I think there’s a great example of where we can see this all going.

How important is the harness — the runtime around the model, the tools, state (to your point, a very important word for you), memory, permissions, evals — to making agents actually work?

SA: Hard to overstate how critical it is. I no longer think of the harness and the model as these entirely separable things. In my experience of using these, I am very aware of the fact that I don’t always know, when I fire something off in Codex and it does an amazing thing for me — I don’t know how much credit —

Was it that the model is amazing or the harness was amazing?

SA: Yeah, exactly.

To what extent is the harness developed in conjunction with the model? Where does that integration happen? Is it in post-training? Is it in the prompt? What makes this integration work?

SA: Both of those. It’s not really part of the pre-training process, but there’s a more interesting thing here, which is the fact that we’ve seen examples of this many times in the past, where things that we thought were very separable get baked in more and more and more. Like the way we initially thought about tool-calling, which is now a critical part of how we use these models, was not something that we thought about deeply integrating into the training process, and over time we’ve done more and more of that.

I would also suspect that model and harness come together more over time, and, for that matter, I would expect that pre-training and post-training eventually come together more over time as well. It’s such a cliché to say, but I’ll do it anyway, because I think it’s very, very true — we’re so early in the paradigm of all of this, this is still like the Homebrew Computer Club days in terms of how much this has really matured as an industry.

This is why I think this is so interesting — I wrote about this a few weeks ago — in any value chain, ultimately a point of integration emerges where it’s really important that two pieces go together to make it work, and over time, that’s obviously where a lot of value collects. My thesis, then, is that this harness-model integration is the key point. It’s in your interest to say so, but it sounds like you genuinely agree.

SA: It is in my interest, and I do agree, but I would also say even more broadly, what you care about is that you go type into Codex what you want to happen, and that it happens.

You don’t care about the implementation details.

SA: I don’t think you do. There have been so many examples, as we’ve been figuring all of this out, where we had to do something at the level of the system prompt that later we didn’t need to. The general observation here is that as the models get smarter, you have more flexibility to get them to behave in the ways you want, which sounds like an obvious statement, but it is—

It’s easier to tell a 10-year-old what to do than a 5-year-old.

SA: When I think back to what we had to do to squeeze any drop of utility out of these models back in the GPT-3 days — things that now you never would have to do, because of course the model just understands and does it well out of the box — that trend may keep going much further.

MG: I was just going to add to that — I completely agree, and I think when you talk to customers, they have ideas about exactly what they want these systems to do, but previous to this kind of joint collaboration that we worked on together, customers were kind of forced to pull that together themselves, right? They wanted these models and agents to have memory and work together well, and they wanted to integrate them into their existing systems — and it’s not just third-party tools, it’s their own tools. They want them to learn about their own data, their own applications, and their own operating environment, and all of that kind of integration today, at least, is left to every single customer to do on their own.

So part of this joint collaboration that we were leaning into together is co-building a new type of product that actually brings those things much closer together, so that customers can much more easily go accomplish the things they want to do — where identity is already built into the product, where the ability to go authenticate to your database all happens inside of your AWS VPC [Virtual Private Cloud]. You can do a bunch of these things that would be possible if the OpenAI APIs were over here and AWS over there, but by building this thing together, we make it much easier for customers to much more rapidly get to value and go accomplish the thing they want to do inside of their enterprise environment.
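
For developers wondering what that packaging might look like in practice, here is a minimal, hypothetical sketch. The API surface wasn’t discussed in this interview, so every identifier below is invented, and the call shape simply borrows the existing Bedrock Agents runtime operation as a stand-in:

```python
# Hypothetical sketch only: identifiers are invented, and whether Managed
# Agents exposes this exact surface is an assumption. The call shape borrows
# the existing Bedrock Agents runtime operation as a stand-in.
import boto3

# The agent runs inside your VPC; requests and data stay within AWS.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="EXAMPLEAGENTID",    # hypothetical agent
    agentAliasId="prod",         # hypothetical alias
    sessionId="ticket-4821",     # stateful: the session carries memory
    inputText="Summarize yesterday's failed deployments and draft a remediation plan.",
)

# The runtime streams the agent's output back as event chunks.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"))
```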

So you think that you can build a functional agent in a generic harness, it’s just way more difficult? You’re making it easier? Or is there a bit where actually there might not even be stuff you can do if you don’t have them tied together?

SA: To go back to your earlier analogy, pre-AWS days, you could do a lot if you were willing to go stand in a cage and buy a bunch of servers and figure out how to connect them and hire your own network engineer — you could make a lot of things happen. Then all of a sudden, as soon as you could just log into an AWS control panel and click, “I need another S3 bucket”, or whatever, you could make a lot more things happen, because the activation energy, the amount of work required for the basics, got way better. So you can do a lot with the models today.

Yet every time I watch someone use our models or try to set up some of this work Matt was describing, I am torn between being happy that they’re so impressed and feel like this is a magical technology, and pulling my hair out at how much pain and suffering they’re going through to get anything to work at all. And that’s not just true of developers building these products — even using ChatGPT, watching people copy and paste things from here to there and try to have this complicated set of prompts, I know that’s going to go away, and I’m thrilled. It’s still so early, and so bad.

Just don’t take away your integration with BBEdit, that’s all I ask, my number one favorite feature of the ChatGPT app.

SA: Fine.

(laughing) Thank you.

SA: A) This stuff is just way too hard to do, and we think if we can make it way easier it’ll bring way more value to developers and businesses, but B) there are a lot of things that you just can’t reliably get to work at all, and I think through our joint collaboration, not only will it be a story of ease of use and not having to go build out your own colo or whatever, but we are going to jointly figure out a lot of new things to build, where people will be able to build products and services that just can’t be done today even with a lot of pain and suffering.

Local vs. Cloud

I actually want to come back to that point about things to be built. But just to go back to Codex real quick — Codex is a harness and model, it runs locally. Why is it easier to get agents to work locally right now?

SA: Actually, we started with it running in the cloud, and I think eventually you do want it to run in the cloud.

For sure. I’m walking through the transition to this offering, which is in the cloud. But why did you go back to local?

SA: You have your whole environment there, your computer’s set up, your data is there, you don’t have to think about it — it was just easier to get to work, even though it’s not the end state. Getting to a world where agents do run in the cloud — where, if you have a very intensive thing, or you need to close your computer or whatever, you can hand stuff off to keep working in the cloud — I think is clearly going to be great. But in the short term, the ease of use we were able to deliver by using your local environment clearly won out.

One way that I think about it is that you have the old school security model, which is the castle-and-moat sort of thing, and you’re moving to a new security model of zero trust, where everything has the appropriate permission structure and authentication and all those bits and pieces. It feels to me like one way to frame running locally is as a self-imposed castle-and-moat: everything’s on there, I just assume it’s all fine and easy to do. And the way I’m thinking about this — Matt, let me know if that resonates with you — is that to get all those pieces to actually function in a production environment, you can’t have it all locally, you have to be operating in that environment from the get-go. Is that the right way to think about it?

MG: I don’t know that there’s any computing environment that’s gotten rid of the client — there are just benefits to operating locally. There’s a reason that most of your iPhone apps also have a local component, whether it’s connectivity or latency or just local compute or access to files and applications. The local client does have particular benefits — as Sam said, it’s easy, it works really well — but it’s constrained, there are limits to it.

You can’t scale out your local laptop, you have what you have, and once you start getting into an enterprise context, sharing between two people gets to be a little bit harder — thinking about permissions, thinking about security boundaries gets to be a little bit harder. So there are a number of those pieces where, I wouldn’t say that having the local environment is a bad thing, it’s just a different thing, and I think that you’re eventually going to want to have that bridge across both.

That’s my question, because in the cloud era, you had containers that helped you converge local and production environments, but it kind of feels like in this case, if you’re dealing with agents — to your point, say it’s a virtual co-worker or whatever it might be — if they have their own identity and their own permissions and all those sorts of things, then to even build them you need to be in the same environment you’re going to deploy them in. It would seem that way to me.

SA: I think there is so much to figure out here. Just to give one example, if you’re an employee at a company, do you want to have one account for when you use some service, and then should your agent just use your account, or should your agent use a different account so that the server can tell which is which?

Or what if you want lots of agents?

SA: Exactly. I suspect that what we actually want is something we haven’t figured out yet, and maybe it’s that when Ben’s agent is logging in as Ben, it uses Ben’s account but it notes that it’s an agent and not the real Ben. We don’t even have a primitive to think about that, but we may quickly need to figure that out, and my sense is there are going to be 50 other things like that where, as we have agents join the workforce and act with increasing levels of autonomy and complexity of tasks, a lot of the mental models that we have for how software works and how access control and permissions work inside of a company or on the broader Internet are all just going to have to evolve.
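
There is one existing primitive close to what Altman describes: the actor (“act”) claim from OAuth 2.0 Token Exchange (RFC 8693), which lets a credential say “the subject is Ben, but an agent is acting for him”. A minimal sketch of the claims such a delegated token might carry (the agent naming scheme is hypothetical):

```python
# JWT claims in the style of RFC 8693 token exchange: "sub" is still Ben,
# while the nested "act" (actor) claim records the software doing the work
# on his behalf. The agent identifier format is hypothetical.
claims = {
    "sub": "ben@example.com",                     # the human the work is attributed to
    "act": {"sub": "agent:codex/ben-workspace"},  # the acting agent, not the real Ben
    "scope": "tickets:read deployments:read",     # scoped narrower than Ben's own access
    "exp": 1767225600,                            # short-lived by design
}
```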

How do you think about it, Matt, in terms of security and access policies and whatnot for agents?

MG: Yeah, I do think that when you move more of these workloads into the cloud, you can have, as a central organization, more control over some of the security pieces of it. And when we talk to customers, it is what they worry about all of the time: “I love the promise of what I can do with some of these really powerful models and agents — how do I make sure that I don’t have a company-ending event where I screw it up?”. That worry is out there.

I think we can help with that, because these are solvable problems, they are, and we can give customers confidence: “Well, it operates inside of this VPC”, and you can at least control that boundary and know what it has access to, or it goes through this gateway, and you can give it permissions, much like you give a role to anything else inside of your environment. These are constructs where, over the last 20 years, we’ve built up a really rich set of capabilities, so that it’s not just Y Combinator startups but global banks, healthcare agencies, government agencies — everybody in the world — that can use AWS. Having built up all of that security structure around it, I think we can further accelerate how they take advantage of this technology and have these safeguards to run fast.

I think a lot of times when you’re in a company, particularly companies that are in risk-averse environments, having those safety guardrails where they say, “If it operates inside of the sandbox, I am excited to go fast”, can actually help many of our customers start to use these technologies for a much broader set of things.
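
The “it operates inside of this VPC” guardrail maps onto constructs AWS already ships. As an illustrative sketch (not anything specific to this product), the real aws:SourceVpce IAM condition key can deny an agent role any S3 access that doesn’t flow through the organization’s own VPC endpoint; the endpoint ID below is made up:

```python
import json

# Illustrative sandbox boundary: deny all S3 access unless the request comes
# through the organization's VPC endpoint. aws:SourceVpce is a real IAM
# condition key; the endpoint ID below is hypothetical.
agent_boundary_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}

print(json.dumps(agent_boundary_policy, indent=2))
```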

AgentCore vs. Managed Agents

A lot of these capabilities you’re talking about — the ones you’ve developed over 20 years and are trying to put in place for agents — are exposed today through AgentCore. So what is the relationship between Bedrock Managed Agents powered by OpenAI and Bedrock AgentCore?

MG: A lot of what we’ve built together builds on the building blocks of AgentCore in order to pull some of these pieces together.

So there’s like a superset that sits on top of that?

MG: The AWS team and the OpenAI team used AgentCore components together with the OpenAI models and a bunch of those pieces to go and co-build this product together.

AgentCore is kind of our set of primitives: just like with AWS generally, if you want to go and build your own agentic workflows, you can do that. You can have a memory component, you can have a safe execution environment, you can have a permissioning capability, and you can go and configure all of those, and we have customers running those in production today that are doing really cool things.

But not with OpenAI.

MG: But not with OpenAI, they have to use different models today, that’s true. Actually, that’s not true, we have people doing it with OpenAI.

Oh, just calling to another cloud or whatever.

MG: They just call directly to the OpenAI model. So we absolutely have people doing it with OpenAI today — not natively inside of Bedrock, but they’re still using that. And it’s an open ecosystem where you can pull different capabilities to go build whatever you want, and my bet is that people will continue to do that. To Sam’s analogy, we have builders out there that love to continue to build computers at home today, even though you don’t have to do that. People like to build, and we think that for a long time people will build their own agents, but the vast majority of them are going to want an easier way to do it, where they don’t have to go configure all of those pieces themselves, and that’s part of what we’ve launched in this collaboration together.
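
To make “configure all of those pieces yourself” concrete, here is a conceptual sketch — emphatically not the AgentCore SDK, all names invented — of the primitives Garman lists (memory, a safe execution environment, permissioning) wired together by hand, which is roughly the glue the managed offering aims to absorb:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Durable per-session state the agent can read and write."""
    store: dict = field(default_factory=dict)

@dataclass
class Sandbox:
    """A constrained execution environment: only allow-listed tools run."""
    allowed_tools: set = field(default_factory=set)

    def run(self, tool: str, payload: str) -> str:
        if tool not in self.allowed_tools:
            raise PermissionError(f"{tool} is outside this agent's permissions")
        return f"ran {tool} on {payload!r}"  # placeholder for real execution

@dataclass
class Agent:
    memory: Memory
    sandbox: Sandbox

    def act(self, task: str) -> str:
        self.memory.store["last_task"] = task  # remember what we were asked
        return self.sandbox.run("query_database", task)

# Every piece here is something a customer currently assembles on their own.
agent = Agent(Memory(), Sandbox(allowed_tools={"query_database"}))
print(agent.act("find stale customer records"))
```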

Just to be super clear: you have this managed experience with Bedrock Managed Agents, and you can also use AgentCore and pull from a model, whether on AWS or somewhere else. And just to make clear, Sam — this is a question for you — this is the distinction between OpenAI on, say, Azure, where you just have direct access to the API, and this managed service on Amazon. Is that correct?

SA: Correct, yep.

And you feel very good about that, that’s scoped correctly in all terms, it’s not going to be an issue going forward?

SA: Yeah, I think things will evolve over time, but I feel very good about this as a way to start.

Is this going to be an exclusive offering for AWS? Or do you anticipate having this sort of managed agent service on other clouds?

SA: Yeah, we’re doing this exclusively with Amazon, we’re excited about it.

How much of the exclusivity is, “Look, we’re using all Amazon’s APIs, of course it’s only on Amazon”, and how much is the overall idea of a managed experience — it’s not just “We’re using Amazon APIs”, it’s “Right now this is going to be on Amazon”?

SA: Spiritually, we want to do this as a joint effort between our companies.

Got it. The PR does say something that goes back to the point you mentioned earlier, Matt, about how you could call out to other APIs and glue this all together yourself. In this case, the customer data stays within AWS — so what exactly does OpenAI see, what does that mean?

MG: That’s right. So the whole thing kind of stays within your VPC and so data is protected inside of the Bedrock environment.

Got it. And this is going to be running on OpenAI models through Bedrock, and these are going to be on Trainium?

MG: They’ll be served through a mix — some of it will be on Trainium, some of it will be on GPUs.

Is that just a function of timing? Because I think as part of your announcement a couple of months ago —

MG: Some of it’s timing and capabilities, I think we’ll kind of be mixing in the different components of building the system together, using the right infrastructure for the right parts of it. But over time, more and more of it will be on Trainium.

SA: We are quite excited to get these models running on Trainium.

I can imagine.

Trainium

One quick question, just a general question about Trainium, Matt. Is it fair to think — and this is the way I’m thinking about it, so I want to make sure I have it right — that Trainium, very unfortunately named because it’s really going to be about inference going forward, will manifest primarily through managed services like Bedrock, where the customer doesn’t even necessarily know what compute they’re using?

MG: Number one, I take responsibility for bad naming across all AWS services.

Look, I have a word-of-mouth site named Stratechery, so I have all sympathy for bad naming.

SA: I think Trainium is a cool word.

MG: It is a cool word.

It is a cool word, it just feels like it’s an inference chip, not a training chip.

MG: It is. But, naming aside, it is useful for both training and inference. And look, it’s a chip that we’re incredibly excited about, both in the current generation as well as ongoing ones; we think it’s going to be a huge business and a real enabler for a lot of the things that we do together.

I think, just as with GPUs, by the way, you’re going to interact with a lot of these accelerator chips through abstractions. The vast majority of customers don’t interact with GPUs either, except maybe through their laptop or something like that, for graphics. When you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs; if you’re talking to Claude, whether it’s through GPUs or Trainium or TPUs, you’re not talking to any of those chips, you’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models.

And so whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly, and that’s going to be true going forward just because these systems are so complex, they’re very large. If you’re going to go train a model, not that many people have enough money to train a model, and not that many people have the expertise to actually manage it. They’re very complicated systems, and the OpenAI team is incredible in their ability to squeeze value out of a very large compute cluster. But not that many people have a team that can do that, independent of what the chip happens to be, and so I think that’s going to be true for all accelerator chips, honestly.

SA: Ben, I increasingly think that what we have to do as a company is be a token factory. But what the customer cares about is that we can deliver the best unit of intelligence at the lowest price, and as much of it as they want, with as much capacity as they want.

Do you think we stick with pricing as it is — pricing based on tokens — does that make sense in the long run?

SA: No. And in fact, there was an interesting example of this with our model that just came out, 5.5, where the per-token cost is much higher than 5.4, but it requires far fewer tokens to get the same answer — and you actually don’t care about how many tokens the answer takes, you just want the piece of work done, and you want, again, a price and an amount of capacity you can have for that.

So maybe I was wrong to say “token factory”, but we’re like an intelligence factory or something. We just want as many units of intelligence for the lowest price and whether that is a bigger model running fewer tokens, a smaller model running lots of tokens, whether a GPU or Trainium or something else, whether we do any of the other kind of number of things we could do about that creatively, I don’t think customers care.

In fact, they don’t really interact with that. When you go put something into Codex, or when you go build a new kind of agent in the SRE [Stateful Runtime Environment], you should never have to think about that; you should just be astonished at how much you get for how little cost.
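
The arithmetic behind that point is worth making concrete. With illustrative numbers only (not OpenAI’s actual pricing or token counts), a model can cost more per token yet less per completed task:

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Effective cost of one finished piece of work."""
    return price_per_million_tokens * tokens_per_task / 1_000_000

# Hypothetical figures: the newer model charges 2.5x more per token but
# needs a fifth of the tokens, so the finished task costs half as much.
older = cost_per_task(price_per_million_tokens=2.00, tokens_per_task=10_000)
newer = cost_per_task(price_per_million_tokens=5.00, tokens_per_task=2_000)
print(f"older: ${older:.3f}/task, newer: ${newer:.3f}/task")  # $0.020 vs $0.010
```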

Is the reduced token usage the model, or is it the harness?

SA: That’s mostly model, it’s a little bit harness.

Got it. By the way, Matt — I asked Sam the exclusivity question — do you anticipate offering a similar managed service for other models?

MG: We’re focused on doing this with OpenAI right now. We’re very excited about what we’re doing together, and the fullness of time is a long time.

The fullness of time is a long time, I’ll let you stick with that one. It’s fine, I had to ask the question.

Customer Demand

I do have a question as far as customers go — I’m curious for both of your inputs on this — when people are actually in production, where does OpenAI’s responsibility end and AWS’s begin? It sounds to me like, if all the data is on AWS and it’s staying there, and they’re operating at a higher level, this is ultimately AWS’s responsibility? Am I thinking about that correctly from a customer perspective?

MG: Yeah, I think that’s right. When you need to call somebody, you’ll call AWS support to help you out — it’s part of your AWS environment, you build it together, and your AWS account reps are going to help you there. And when we’re building it, we’ll bring in our OpenAI colleagues to help you figure out how to best take advantage of this. At some point, if we run into a bug that we need their help with, we’ll escalate over to them, but AWS will be the frontline support that you interact with.

Where do you see the scale of this business, Sam, relative to your core API business?

SA: I hope it’s going to be huge — we’re putting a lot of effort into this, we’re committing to buy a lot of compute, and I believe there will be a lot of revenue there to support this. The framework I’ve increasingly come to is that at a low enough price, demand for intelligence is essentially uncapped.

So is it very elastic in that regard? You decrease price, demand goes up?

SA: It’s certainly that, but again, you can decrease the price of water and maybe you’ll drink a little more water, maybe you’ll shower twice a day instead of once a day, there’s some elasticity there but at some point you’re like, “You know what, I have enough water”.

Also you will buy water no matter how much it costs if you have to.

SA: Other utilities — if electricity is cheaper you’ll certainly use more of it, but if you think about intelligence as a utility, there’s no other utility I know of where I’m just like, “I just want more, I’ll just use more as long as the price is low enough”.

MG: I will say, actually, that interestingly it’s largely been true of compute power — if you think about the cost of a compute cycle today versus what it was 30 years ago, I don’t even know how many orders of magnitude cheaper it is, and there’s more compute being sold today than ever.

Right. People don’t really think about the cost of compute, at least until they’re at extremely high levels where it’s material, but by and large, strategically speaking, it’s just assumed you have compute. What’s the runway to getting there with AI, where the number one thought process isn’t, “How much am I spending here?”.

SA: I don’t think that is the number one thought process. Right now we have way more customers asking us, “No matter what the price is, can you give me more? I just need more capacity, I’ll pay you extra”, than we have arguing with us about the price.

But I do think we are going to continue to bring the price down dramatically, and maybe the more we do that, the more work that wants to flow in just goes up more and more. But I am confident we will continue to be able to reduce the cost of today’s level of intelligence quite dramatically — one thing that has somewhat surprised me is how much, and I don’t know if this is going to stay the case or not, but at least today, how much of the total market demand is at the absolute frontier.

Right, there are a lot of questions about that. It’s very expensive to serve the frontier, people could just get the previous model, but you’re saying people just want to be on the frontier no matter what?

SA: So far they do.

MG: And I think that’s a good signal that we’re not anywhere close to where we want to be, and that there’s so much more demand. I really do think it’s like compute demand 40 years ago: a computer was crazy expensive, and now it’s dwarfed by the power that’s in everybody’s cell phone, and we sell billions more of those things. I do think that’s what’s going to happen in the AI world, where today everybody wants to use the frontier because that’s what you need in order to get a lot of useful work done, and everyone’s so excited about the capabilities out there.

I think over time you will have a mix of models, by the way, where you will have some smaller models that are able to do stuff that even the latest OpenAI models aren’t able to do yet, but that are smaller and cheaper and faster over time, and you’ll have the super big ones that are going to go try to cure cancer and other things like that. But I think we’re still at just the early stages of what’s possible, and when you see this much demand and this much growth at the early stages of what’s possible, it’s exciting for what the future holds.

Is there a bit of a cynical view here where, Sam, you had a bunch of customers that are like, “We’d love to use OpenAI models, but all our stuff’s in AWS, we’re not moving”. And Matt, you’re like, “Look, all our stuff’s in AWS, can you please go get OpenAI models?”, and this is just satisfying that need — and it turns out, because AWS is the biggest, that was an astronomical amount of need. Is that just the easiest answer? Or is there a bit here, too, where you actually think you can deliver something highly differentiated that will also draw new customers for each of you?

SA: We’re clearly thrilled to get access to AWS customers, and so many people love AWS. Yeah, that is a true statement.

MG: That part is definitely true.

(laughing) Right.

MG: And vice-versa, our customers are very excited to get access to OpenAI technology.

SA: But I do think there is something incredible and new to build together, and I am hopeful that when people look back on this in a year, the most important thing they will talk about is not, “Oh, finally, you can get access to these models via AWS”, or whatever, but rather, “Wow, we didn’t realize how important this new product was”. I think we are close, at a model and harness and capability level, to just a completely new kind of computing, and that will feel very different from the existing ways people have thought about, “I need an API to this model”, or whatever.

MG: I couldn’t agree more, that’s exactly it. The first part is great and is nice and the second part is, I think, what we all get super excited about.

Building the AI Stack

To that point — I mentioned earlier I wanted to come back to this — I have a theory, which may or may not be correct, and I’m curious for your points of view on it, about stuff to be built. Specifically, there may end up being this real middleware or middle layer: you have all these different databases and SaaS apps and all these bits and pieces of data in an organization that can stretch across things, you have this agent layer with the harness, I guess, sitting on top, and there’s something to be built in the middle — OpenAI Frontier gets at this a little bit. Is this part of this? Or is this something to be built? Or am I totally off base and we don’t need that at all?

SA: You are totally right that we need something there. When I’ve been talking to customers recently, like large enterprises, they’re like, “I want some sort of agent runtime environment, I want a management layer where I can connect my data to agents and also make sure that I understand where I’m spending on tokens and have some sort of oversight there, and I want some sort of workspace” — hopefully it’ll be Codex — “something like that for my employees”. That package of what people are asking for is getting remarkably consistent, but there is work to now go off and build all of that offering.

It feels like there’s almost a double agent layer that’s necessary. There’s the agent layer that maintains the middle layer, constantly spelunking down in all these data sources, and then there’s the actual user interface layer where people are actually interacting. Does that sort of fit with where we’re going, or is that off base?

SA: On both of those, I agree that that’s a picture of how the world looks today. As the models get really smart, I don’t think we know exactly what the architecture of the future is going to look like.

Right now people do, at this sort of, call it, user agent layer, want to interact with multiple agents, and we make it so that you can build agents for this thing and that thing and they can talk together and whatever else, and then at the company management layer, people have all these controls for how you help the AI go spelunk in files and file systems.

And at some point you realize that you’re just holding on to the past for no reason at all, this should just be in the model.

SA: That’s what I was going to say. At some point, you may say, “Actually, we have such incredible capabilities, let’s re-architect the whole thing”.

MG: Yeah, I agree. And I think there’s something different, and I’m not sure we all know what it is yet, but that’s part of the beauty also: you get customers using and building, and you can learn from them and figure out how you can make it easier, faster, better for them.

Sam, this is the second time we’ve done one of these product launch interviews, last time it was with Kevin Scott and New Bing — you were pretty confident about the threat you posed to Google then, how well do you think that worked out?

SA: I think we have done better than I expected. ChatGPT is, I think, the first really large-scale new consumer product since Facebook.

Is that actually the answer — you’ve done better than you expected, but it manifested mostly through ChatGPT as opposed to other areas?

SA: No, I think we’ve also done quite well on the API, particularly on Codex, but that was not what I was thinking at the time. At the time, I was thinking maybe these new kinds of language interfaces were going to change the way people find information on the Internet, and you know — Google is also just an absolutely phenomenal company, I think in many ways Google is still underrated just in terms of the breadth and depth of what they do, but I am happy with how ChatGPT has performed relatively.

I actually have a Google question for you, Matt, in a similar way. Google was just up there this week, Thomas Kurian talking about their fully integrated stack, all the way up and down from model to chip to agent layer, all that sort of thing. You’re here with another company’s executive, definitionally not fully integrated within Amazon. But is there a bit where everyone was critical of you for not having a frontier model, and now that we’re in this sort of inference era, where you’re used to serving a lot of companies, you maybe ended up in a better spot by being neutral? Was that on purpose, or did you accidentally end up in a great place that you didn’t realize it was going to be?

MG: A little bit on purpose. Since we started AWS, we have always embraced our partners as a key part of supporting our end customers. From the very beginning, an incredibly important part of our strategy has been to lean in with partners, and maybe differently than some others, we view it as our success if the partners are successful and they’re building on top of us or together with us — if they’re successful, then we’re successful, and that’s awesome.

We view it as growing the pie together — that’s a win — and it’s not necessarily how others view the world. Sometimes they say, “I have to own everything”, and that’s okay, that’s a view that people have. But I think that choice is important, because that way the best products win. And by the way, you can have first-party products in that world, you can have lots of third-party products in that world, but our view is we want the customers to be able to pick the best thing for them. And if the best thing is your own stuff that you’re building, awesome.

For us, if the best thing is what our partners are building on top of us, we view that as a win as well, because it’s the best thing for our customers. We’ve long thought that, and it’s actually how we built the Bedrock platform in the AI world: we want to support a broad set of models, we want to support a broad set of capabilities, and that’s been true across databases, compute platforms, and other things like that.

So I think it’s been an intentional strategy, I think it’s a strategy that customers appreciate, and we’re excited to continue to lean into it.

Yeah, it’s interesting. There’s the balance between software, platform, and infrastructure, and everyone says they’ll serve everyone. But it does feel like, if you go way back to when AWS started, you started with the I [Infrastructure], and that gives you the greatest flexibility, it feels like, from my perspective, to meet Sam in the middle. Sam’s got a great S [Software], and you guys are building a P [Platform] together, I guess is the way to put it.

MG: That’s right. It does make it hard where you say, “We have one S3” — there aren’t other S3 offerings, that part is true. So for some of those core components, like you said, at the infrastructure layer, we do lean in pretty heavily on the stuff that we build. But as you move up the stack, there’s a broader set of capabilities, and if you view the world that way — in no world do I think any one company is going to own every application, and as you get further down the stack, to the models and services layer, there are fewer of those, and when you get down to infrastructure, there are even fewer. Our view is that embracing that whole set of partners is great for our end customers.

Sam, any final words?

SA: I think that was very well put. I really do think there’s the potential for a new generation of the kinds of products that developers can now build, and given how steep we expect model capability progress to be over the next year, the fact that we’re going to go on this journey together and try to really build a platform to enable it is coming at a good time, and I think people are going to love it.

Very good. Matt, Sam, thanks for coming on Stratechery.

MG: Awesome. Thanks for having us.

SA: Thank you.


This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.

The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a supporter, and have a great day!