An Interview with Nvidia CEO Jensen Huang About AI’s iPhone Moment

Good morning,

I first spoke with Nvidia founder and CEO Jensen Huang after last March’s GTC conference, and again after last fall’s GTC; as I observe in the interview below, Nvidia’s semiannual conference frequency might seem very aggressive, but given Nvidia’s central role in AI, our last talk seems like it was years ago.

In this interview, conducted on the occasion of this week’s GTC, we discuss what Huang calls AI’s iPhone moment — ChatGPT — and how that has affected Nvidia’s business. We also touch on the biggest announcement from GTC — Nvidia’s new DGX Cloud service — while also discussing how Nvidia responded to the Biden administration’s export controls, TSMC’s new plant in Arizona, running AI locally, and Nvidia’s position in the stack in an LLM world.

To listen to this interview as a podcast, click the link at the top of this email to add Stratechery to your podcast player.

On to the interview:

An Interview with Nvidia CEO Jensen Huang About AI’s iPhone Moment

This interview is lightly edited for clarity.

The Impact of ChatGPT

Jensen Huang, welcome back to Stratechery.

JH: Thank you, Ben. Nice to see you.

The last time we talked was in September of last year; I’m honored that Stratechery is sort of your post-GTC go-to. Two months after that, ChatGPT was released. You said on your earnings call last month that, “There’s no question that whatever our views are of this year, they have been dramatically changed.” Let’s start with why. Why have your views dramatically changed? What was the impact of ChatGPT?

JH: Well, we’ve seen GPT-3 out there for a while and we’ve seen WebGPT and we’ve seen InstructGPT, we’ve seen pieces of GPT released and each one of them was really spectacular. But what really woke the whole industry up was ChatGPT. This is an extraordinary application, this is unquestionably the most easy-to-use application that has ever been created that performs tasks that are consistently surprising to just about everyone. You could be someone who doesn’t know very much about computing and interact with it and be surprised by the coherent response. You could be somebody who is deep in computer science and still marvel at the apparent reasoning and problem solving that it is able to do. I think that across the entire spectrum, people were surprised by the incredible capabilities of ChatGPT.

And that woke up the entire industry, in every industry. It was the conversation of every boardroom, I’m sure, it was the conversation of every engineer around the world, and it was a dinnertime conversation, children were talking about it, there’s never been anything like it. This is a big moment, of course everybody was surprised by the widespread use of ChatGPT, but what immediately happened after that is that every cloud and every software company woke up and asked, “What does this mean to us?” and simultaneously, hundreds of startups were being created and incredible applications were being built with generative AI, both image generative AI as well as language generative AI or a combination of the two, and these startups were coming out of the woodwork. VCs were funding it, the VCs were writing blogs about generative AI, this was all happening within about sixty days.

So if you were to apportion the surprise, because I kind of heard two answers there, one is just surprise at the capabilities and one is the mass response. Was it a little bit of column A, a little bit of column B as far as your surprise as the CEO of Nvidia? Like you said, you’d seen GPT before — GPT-3 and its various previous iterations — but was it not just that it was so capable but also like, “Holy cow, there’s so many people using it, this is going to have a meaningful impact on our business in a way I didn’t expect”?

JH: I think we were all surprised by how effective ChatGPT was in both its ease of use as well as incredible capability. Almost immediately after ChatGPT rippled around the world, cloud service providers and software vendors in all these different industries started to ask the question, what does it mean to them? It was the combination of both of those things, and they were both independently surprising I guess, but it happened very fast. It’s not as if transformers weren’t developing and getting larger and larger, and it wasn’t as if the innovation wasn’t done in plain sight, but it was the ChatGPT moment when it all came together. The user interface was incredible. It was fine-tuned to produce incredible results, and it was all of that that came together in a flashpoint and because it went around the world literally instantaneously, the ripples, the reverberation around it from every industry happened very quickly.

A lot of people never directly engaged with the work that we were doing. We were talking about it, but most of the executives that I knew, they see the GTC Keynotes, they’re really excited about them, but it doesn’t affect them directly. Not one executive that I know, not one, has not been awoken by ChatGPT; now they call me and say, “Now I understand what you were talking about. All of those things you were talking about, I get it now.” When I was explaining the transformers, these large language models, it’s first learning the language of humans, but it’s going to learn the language of everything, everything that has structure. So what has structure? Well, it turns out the physical world has structure, that’s why we’re symmetric, that’s why when I see the front of Ben, I have a feeling about the back of Ben.

You had an extensive bit on physics a couple of GTCs ago, I think, that was trying to make this point, but now people get it.

JH: Yeah, and there’s a language to proteins, there’s a language to chemicals, and if we can understand the language and represent it in computer science, imagine the scale at which we can move, we can understand, and we can generate. We can understand proteins and the functions that are associated with them, and we can generate new proteins with new properties and functions. We can do that with generative AI now; now all of a sudden those words make sense, and now they’re connecting, they’re fired up, and they’re applying it to their own fields and their own companies, and they see opportunity after opportunity for themselves to apply it. So I think the “AI Moment”, ChatGPT, was a very important deal, it kind of opened everybody’s mind.

The thing that is equally important is that ChatGPT and generative AI have several properties — one, of course, is that it caused and drove an inflection point in the adoption of AI, which is the reason why training is going up, which is the reason demand is going up. However, what it’s also done is it has created a new computing model where the way you program the computer and the applications that you can build with this new computer and the accessibility of this computer, meaning the people that can actually put it to work, is all brand new!

These three properties that I just described characterize computing platforms. You can use these three properties to describe minicomputers, workstations, PCs, the Internet, cloud, mobile cloud — each one of them you program in different ways, you write different applications, and the reach of each platform is different. So in the case of workstations it was measured in hundreds of thousands of people, in the case of PCs it was measured in hundreds of millions and billions, in the case of mobile devices it was billions, and what’s interesting is that the number of applications in each case grew. There are five million applications in app stores, there are probably several hundred PC applications, there’s 350,000, maybe 500,000 websites that matter — how many applications are there going to be on large language models? I think it will be hundreds of millions, and the reason for that is because we’re going to write our own! Everybody’s going to write their own.

This is a phase shift, a difference, an inflection point, a different way, there’s definitely a new computing model here, this is profound. We’re going to look back and we’re going to say “Hey, guess what, we were there”. The day this thing came to realization we were all there, we internalized what it meant to us, and we took advantage of it. A lot of companies are asking themselves, “What does this mean? What does this mean to us? What does this mean to our industry? What does this mean to our competition? What does this mean to our products? What does this mean to our business model?” This is happening right now as we speak, a very big deal, a really really big deal. Hopefully one of these days when we look back at this interview, we will remember when this was happening when you and I were talking.

I think that’s very well put. You had a section right in the center, right before you introduced the inference platforms, where you walk through ChatGPT and the implication of people being able to effectively program with natural language. I thought that was the most compelling part of the presentation. You always go so fast from product announcement to product announcement, to have that little bit in the middle where you went “Look, here’s context, this is where it’s a big deal”, I really enjoyed that part and that’s sort of what you were driving at now.

JH: Really big deal.

Nvidia’s ChatGPT Response

I guess the natural follow-up then is how have your views changed in a tangible business sense? What is Nvidia doing differently post-ChatGPT that they weren’t doing before?

JH: Well, the first thing we have to do is respond to demand. There is clearly an acceleration of demand, a step up increase in demand, and a great urgency to deploy the resources in two different ways. First is training — almost every major cloud service provider was already working on large language models, now they realize they have to get to the next level faster.

And not just the research side, but the inference side.

JH: That’s the first thing: just to train these models faster, to get to the larger models sooner, faster, to develop all of the supporting models for fine-tuning and alignment and guardrailing and augmenting and all of the supporting models that go around these large language models. Everybody’s got to work on that immediately, right now, and so there’s the urgency of getting to the next level of large language models, which means bigger models, more data, and the urgency of creating all of the surrounding models, that’s one. So training has experienced a step up, an acceleration in demand around the world.

Second, because generative AI is so useful, it’s being connected into Microsoft Office and Google Docs, the most pervasive applications in the world. All of a sudden generative AI is now connected into some very, very large applications, and web browsers of course, and so the inference demand has gone through the roof. So I think both of those things are happening, the step up in demand and the urgency of delivery, and both of those things are happening at the same time.

I want to get to the other shifts you might have made, but I do want to jump in on that meeting demand bit because I want your help to solve one of the biggest conundrums I’ve had, which was in the second half of last year when you were taking large write-downs on your inventory, it turned out when you dove into your financial statements that a big chunk of those write-downs was not just chips you’d already made and couldn’t sell, but also future purchase order obligations with TSMC, and they were famously being very strict about those when everyone wanted space. How should I, as an analyst, balance those write-downs with the perception of there being an explosion in AI applications? Is the explosion overstated or is it just a matter of that happened before ChatGPT and things exploded after that? Or is this a gaming issue and you can’t convert to AI? That disconnect has always stood out for me and I’d like your help in solving it.

JH: We had to take reserves, we had to take write-downs on gaming products and data center products, fairly obviously because we were expecting last year to be a lot bigger than it turned out and because cycle times were so long in the beginning of the year, it was fairly certain that we must have either built or made commitments to purchase a great deal more inventory and supply than we actually ended up selling. Last year was unquestionably a disappointing year.

Right, but it wasn’t just stuff that you didn’t sell, it was also those purchase obligations, which maybe I’m wrong, but I assume that’s stuff that hasn’t yet been built but you’ve committed to build in the future.

JH: In those cases, if we end up building it, then those commitments will likely be unwound, but we’ll see how it turns out. It just depends on how much is future commitments, how much of it was really written off, how much of it is previous generation. For example, as time goes on, the likelihood of requiring Amperes will be lower, because we want Adas and Hoppers, and we were expecting to sell a lot of Amperes last year, obviously. There’s a shelf life in that, and the Hoppers with the transformer engine are so good at these large language models, so we’ll see how it all turns out.

Got it. What is the constraint on building more and meeting this demand? Is it just the ability to make chips? Is it data centers that can house them? Is it the customer base actually coming to the table and putting money down? What’s the limiting factor?

JH: Well, you’ve kind of identified quite a few of them. First of all, people think that a GPU is a chip, but it’s not. If you think about what our data center GPUs are, it’s eight chips with two-and-a-half DCA [unclear] packaging interconnected with NVLink, with a thermal management system that is off the charts, delivering a few thousand amps at a few gigahertz. It is a very heavy computer; I think just lifting a Hopper out of the oven, it’s probably something along the lines of 70 pounds with 35,000 individual components. So to build that GPU, even if you had the five nanometer wafers, you’re still missing a whole lot of components, and then there’s the manufacturing part of it.

Now that’s just a GPU. In order to stand up one of these AI supercomputers, you’ve got switches, you have all the NICs, you have all the cables, and then you still have data center space you’ve got to stand up, the PDUs — all of it is critical path, none of it’s easy. These are the most advanced computers the world makes today, and we make them in volume, and so we’re moving as fast as we can. Everybody’s in a bit of a race and there’s a great deal of urgency to get there, and so we recognize that and we’re working as hard as we can.

To go back to what Nvidia is doing different post-ChatGPT — we hit number one, you’re just racing to meet demand — what else has changed about the way you think about the business?

JH: Inference. The scale of the inference business has gone through a step function, no doubt, and the type of inference that is being done right now, where you know that video will have generative AI added to it to augment the video, either to enhance the background, enhance the subject, relight the face, do eye reposing, augment with fun graphics, so on and so forth. All of that generative AI work is done in the cloud, and so video has generative AI. We know that there’s imaging and 3D graphics for generative AI, video for generative AI. One of my favorite companies is a company called RunwayML, which is basically video editing at full film quality scale with generative AI — the video capability is incredible, anybody could do video editing now, and that requires a different type of GPU.

Then there are large language models, and the large language models span from 175 billion parameters and more, scaling all the way down to maybe 5 or 8 billion parameters, with 20 billion and 40 billion and 75 billion and so on and so forth, all kinds of different sizes and shapes. So there’s the large language model inference where response time per token, per character, is really important, because you’re writing a very long letter or you’re writing a very long program and people want to see it relatively interactively. If it goes offline, the patience of people using it is going to be challenged. Then these inference workloads have retrieval models, and these retrieval models have AI databases, vector databases, which have their own type of inference model, and Grace Hopper is ideal for that.
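
To make the retrieval point concrete, here is a minimal sketch (my illustration, not Nvidia’s or anyone’s production stack) of how a vector-database lookup can feed retrieved context into an LLM prompt; the embed function and VectorStore class below are hypothetical stand-ins.

```python
# Toy sketch of retrieval-augmented inference: a tiny in-memory "vector
# database" plus a top-k lookup that assembles context for an LLM prompt.
# embed() is a stand-in for a real embedding model.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical embedding: hash words into a fixed-size unit vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorStore:
    """Minimal stand-in for a vector database."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scores = [float(q @ v) for v in self.vectors]  # cosine similarity on unit vectors
        best = np.argsort(scores)[::-1][:k]            # indices of the k most similar texts
        return [self.texts[i] for i in best]

store = VectorStore()
store.add("Hopper GPUs include a transformer engine aimed at large language models.")
store.add("DGX Cloud offers Nvidia AI supercomputing hosted through cloud partners.")
store.add("Grace Hopper pairs a CPU and GPU for memory-heavy retrieval workloads.")

question = "What hardware suits retrieval-heavy inference?"
context = "\n".join(store.top_k(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this assembled prompt would then be sent to the LLM for generation
```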

You had those four chips — or those four new platforms you called them — that came out that were all dedicated to those four use cases, have those all been in the works for a long time? Some of them have analogies to previous generations like the Ampere generation, but some of them, like the H100 language model one, I think that one’s new. What was the speed from coming up with these to actually announcing and developing them? Have they been in the works for a long time, or was there a bit where that was absolutely accelerated with the explosion in inference in particular? To the extent that’s the case, it’s certainly an affirmation of your ability to take a core architecture and rework it very quickly into something else, I’m just curious what the timing is there on those four new platforms?

JH: All of the core technology exists. Turning them into inference products at scale for data centers, that happened in about six months.

China and TSMC

You mentioned one thing, just as an example of how you can quickly change course: you basically took all these A100s and changed them to being the A800s for China. By and large, the main limitation you’re dealing with there is the speed of the memory interconnect, and the actual speed of the chip is fine, and now you’re going to have the H800 going forward. What’s the implication of doing that? You talked about the DGX supercomputer a little bit ago and all the complexity that goes into it; obviously that’s dependent on high speed memory interfaces between the chips. Is it even possible to build a DGX computer for China, or is it like they’re going to have to figure out how to tie those chips together on their side?

JH: The 800 series are fused, meaning they’re physically fused, so that it gears down either the floating point processing or the interconnect or both, in order to comply with our export controls, and so they’re physically fused and they cannot be unfused. However, it’s form-fit-function identical and it plugs into the same sockets, it runs the same software, just slower. And it’s slower, but it’s still the world’s best. So it serves the needs of the customers that use them and it complies. Ultimately, the most important thing is that it complies with regulation.

You’ve talked so much about the way you’re thinking about accelerated computing, and you love to mention that Moore’s Law is over, or slowed down, or however you want to frame it, and that it’s important to not just build a chip, but you have to build a system and then that has to scale up to even data centers. They still plug in, but with the fusing of the memory interconnects, are these still scalable to systems and to data centers and just slower? Or is that a fundamental limitation on how that scaling can work?

JH: It scales, but not as effectively; it just depends on what scale you would like to go to. The importance of interconnects is fairly significant. The computing fabric that connects processors needs to be quite high speed. The faster the processors, the greater the need for high speed computing fabrics, and so it’s a matter of scale and the effectiveness of the scale. For example, if you want to scale up to 1,000 processors, that scale-up would be less linear and it would plateau earlier if the interconnects were slower, and so that’s basically the trade-off. It’s just a matter of how far you can scale and what the effectiveness of the scaling is, the linearity of the scaling.
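
As a rough illustration of that trade-off (a back-of-envelope toy model with made-up numbers, not Nvidia’s data), a slower interconnect makes the speedup curve flatten out much earlier as you add GPUs:

```python
# Per-step time is modeled as compute (which shrinks as you add GPUs) plus
# communication (which grows, and grows faster on a slower interconnect).
# All constants here are illustrative, not measurements.

def speedup(n_gpus: int, comm_cost_per_gpu: float, compute_time: float = 1.0) -> float:
    """Speedup over one GPU when each step pays a per-GPU communication penalty."""
    step_time = compute_time / n_gpus + comm_cost_per_gpu * n_gpus
    return compute_time / step_time

for label, comm in [("fast interconnect", 1e-5), ("slow interconnect", 1e-4)]:
    print(label)
    for n in (8, 64, 256, 1000):
        print(f"  {n:>4} GPUs -> {speedup(n, comm):7.1f}x")
```

With these toy numbers, the fast-interconnect curve keeps climbing toward 1,000 GPUs while the slow-interconnect curve peaks and falls back well before that, which is the plateau-earlier behavior being described.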

On the same theme, you were in Arizona for TSMC’s ribbon cutting late last year, and TSMC has made comments in their earnings about how a part of their selling proposition going forward is going to be the cost incurred through increased geographic flexibility. That’s something you’re going to pay for as a TSMC customer, is that a cost you’re eager to pay or do you feel a little bit along for the ride here?

JH: Ultimately every company needs to have diversity and resilience; that resilience comes from diversity and redundancy, and achieving diversity and redundancy so that every company can have greater resilience implies building fabs in the United States and elsewhere, and those fabs are incrementally more expensive. In the grand scheme of things, those costs have to be taken into consideration. And so there’s a price to be paid for diversity and redundancy, and we invest in it ourselves in our company, as does every large company, in order to have resilience. There’s power redundancy, there’s storage redundancy, there’s security redundancy, there’s all kinds of redundancy systems. Even organizations — sales and marketing dovetail with each other so that they can have some diversity and some redundancy so that you have greater resilience, and engineering does the same thing. We have a lot of different ways to be a more resilient organization; this basically says we need to have more resilience in the supply chain. Now, in packaging and assembly and testing, we have diversity and redundancy already built in, and that costs money, and so these are all the things that large companies have to take into consideration.

DGX Cloud

This has been an ongoing theme of our conversations. It started when I first talked to you a year ago and I asked you if you would ever have a cloud service. It turns out, to go back to that interview, you pre-announced DGX Cloud in that you said, quote, “If we ever do services, we’ll run it all over the world in the GPUs that are in everybody’s clouds in addition to building something ourselves if we have to.” Now, last time we talked about Omniverse Cloud, which is the “Building something ourselves”, perhaps. But here, you announced DGX Cloud that runs in other people’s data centers. Were you already planning on DGX Cloud then, or is this something that you have really shifted your thinking around over the last year? What did you tell me, yet not tell me a year ago?

JH: (laughing) I guess that’s going to have to be for you to decode. If you look at GTC, most everything we’ve been building I’ve been describing, and it’s been built in plain sight, and the reason why it’s important for me to convey our direction fairly transparently at GTC is because we have so many partners that depend on us, and it’s important that they understand where we’re going. We’ve been very transparent and very consistent about our desire to build a computing platform that’s available everywhere, and this computing platform is built at data center scale. Today’s computer is not a PC, today’s computer is a data center; the data center is the computer, and you have to orchestrate that entire fleet of computers inside the data centers as if it’s one. That’s why they call it a single pane of glass, it’s managing one computer, and that’s why it has to be software-defined. That’s why you have to have a separation of the compute plane and the control plane.

All of those architectural reasons lead up to, basically, the data center is the computer. We build our entire system full stack, and we build it end-to-end at data center scale, but then when we go to market, we disaggregate this entire thing. This is the miracle of what we do: we’re full stack, we’re data center scale, we work in multiple domains, we have quantum computing here, we have computational lithography there, we have computer graphics here, and this architecture runs all of these different domains, in artificial intelligence and robotics and such, and we operate from the cloud to the edge. We built it as a full system, vertically integrated, but when we go to market, we disaggregate everything and we integrate it into the world’s computing fabric.

We integrate it into Dell, we integrate it into Lenovo, we integrate it into Quanta, we integrate it into Foxconn, we integrate it into Azure, GCP, AWS, OCI, so on and so forth, and everybody has their own needs and everybody has their own way of operating, and we work with them to understand what the APIs are. We take all of those considerations back into our company and we create a way that allows us to be a systems company on the one hand, vertically integrated, full stack, and yet when we go to market, we disaggregate everything and we connect it into the world. That is really the miracle of what we’ve done, and that’s only possible if I communicate to the ecosystem at all times what I intend to build so that they can be prepared for us on the one hand, and we can be prepared for them on the other.

Now, if you look at our computing platform today, let’s take an OEM for example. Nvidia is really an extension of Dell, we’re an extension of HP, and we’re an extension of them in the sense that whenever you want to buy an Nvidia computer, you call Dell and you specify the Nvidia computers that you want, and they have it all available to them, fully compliant with our model, fully compatible with all of our software stack, and the customer can fully expect to have a fully optimized computer. Well, we should be able to do exactly the same thing at all the CSPs (cloud service providers). We should be able to extend our architecture to Azure, we should be able to extend AWS with Nvidia, and that extension of the computer should be exactly the same as what they would enjoy with respect to running the Nvidia stack natively, and they should be able to enjoy the computer and run that stack on any cloud, multi-cloud, hybrid cloud, all the way to the edge, that same software stack.

When we talk about cloud, that’s essentially what I imagine, and now there’s a business extension. This is the largest, most significant business model extension we’ve ever had, and in this relationship not only is Nvidia in the clouds, the CSPs, but these instances we take to market ourselves, and when we win, they win. Of course they also have their own Nvidia instances and they’re welcome to take those to market, and we support them, and so it’s a bit of an interesting model in the sense that our salesforce and our marketing and their salesforce and marketing are now working so closely together, and in every instance we sell with them. Some of it comes directly through us because the customer wants to work with us directly, some of it goes through them because the customer enjoys working with them directly. We’re fully supportive and happy with either model.

I certainly get the bit about how Nvidia has been broadly accessible through the different clouds. But to your point, this does seem like a pretty meaningful shift where you built this system and then, as you mentioned, you disaggregate it to a certain extent so it fits in the different clouds. But now you are actually delivering the system as a whole, where you don’t go sign up with Oracle, you actually go to an Nvidia site and deal with Nvidia sales. You talked about how you’re going to help people with these models, and that entails a large and growing service organization.

There are a lot of different angles on this, but you’re launching with Oracle, that’s the first one, and that makes perfect sense to me because you walk in and you get to not only bring them immediate differentiation off the top, but also because they’re starting from behind, they will build the full Nvidia stack, including your networking, including your management layer, as opposed to the other extreme like AWS where they’ve invested heavily in their Nitro layer, which is kind of directly competitive with some of the stuff you’ve done to tie this together. Is that a good read of the situation where you now want to go directly to customers, and so just like a Dell or an HP is okay to build directly to your specification and be an implementation layer, is that a way to think about Oracle? We now have the OEM layer of clouds?

JH: It’s close. It’s not so prescriptive, in the sense that in the case of Oracle OCI, I worked with Clay [Magouyrk] and we worked with their architecture team to figure out, “What is the best way to integrate into their cloud architecture?” It’s not so much that we deliver our systems and they stand them up, it’s far from that; we work very closely to figure out, “What is the best architecture within their cloud that can be as performant as possible on our entire stack?” Our entire stack has to run natively, everything that we do has to run natively, including our orchestration layer, including the system management layer or the distributed computing layer. All of it has to run natively, and so we have to work closely with them to make sure that their cloud and our software architecture are going to work harmoniously together, so it’s a collaboration in every case.

When we did it with Azure, there’s a collaboration that includes, “What’s the best way to do the computing? What’s the best way to integrate with security? What’s the best way to connect the storage? How best to host it so that the customer with a lot of private data security issues and maybe industrial regulation issues for the data, that they could best manage it?”. So there’s a fair amount of collaboration around architecting for these different things.

So is there going to be a situation where maybe the experience is going to differ per cloud provider? There may be ones that are like, “Look, if you want the full Nvidia experience, this has our networking, it has our management layer”, “Oh this one, it has a different networking which we’ve accommodated with and we’re working with it, but it’s not going to scale quite as well because it’s not fully tuned”. Is this going to be a difference that’s tangible to customers and you’re going to communicate to them?

JH: Yeah, sure. Ultimately, the customer’s going to pay for value and performance is a very important part of that. But ultimately the architecture we’ve implemented is not so diverse that it’s going to be that noticeable and so all of the clouds have worked really closely with us to enhance and optimize to the best of our abilities and it should all be good, I’m expecting them all to be good.

When someone signs up for DGX Cloud, are they going to then choose which underlying host provider they want? How does that work? Do they have to also have a relationship with Oracle or with Azure or GCP or is Nvidia going to intermediate that relationship and you tell Nvidia what you want, then Nvidia will go find the place to put it?

JH: If you don’t care, then we suggest it. If you do care because you already have some kind of a pre-negotiated agreement with them, we’re happy to use that.

Got it. That makes sense. As far as Azure goes, for example, I think clearly DGX Cloud is very attractive to enterprises, particularly the bit where they can leverage your pre-trained models, they can put their own data in, they can have their own LLMs, and they probably already have a relationship with Azure, so it makes sense for Microsoft to want to be there, and also GCP to an extent.

AWS is notable by its absence. I don’t know if I’ve seen so many logos in an Nvidia presentation, almost every slide had a logo. There was clearly a strong emphasis on the importance of partnerships for you going forward, but that did make the absences more notable. At the same time, you announced there’s going to be the new P5 instance, which sounds like a DGX on AWS, presumably through their software layer. Is there a technical hangup there, or do they just feel like, for their business, it makes sense to stick with AWS instances and not be intermediated?

JH: There’s not much to read into it aside from the fact that we started with the other three. There’s dialogue, and it’s hard for us to stand up everybody all at the same time; we have every interest in doing it with AWS, and I think AWS has every interest in continuing to partner with us in a very deep way. Our joint press release described a very deep partnership. We’re working together on SageMaker, we’re working together on all the infrastructure, we work together on AI, there’s just a ton of stuff that we’re working together on: recommender systems, graph neural networks, their robotics system, the list of things that we work on together with AWS is really large. This new business model that we described, we started with OCI not because of anything aside from there being just a lot more compatibility there.

And they have the most to gain.

JH: They potentially have the most to gain but we want to work with everybody in this way, just as we work with HP and Dell and Lenovo. Their time to market on each one of our platforms is not always at the same time, but at some point, we work with everybody all in the same way.

So are you going to have to really build out and expand your salesforce and support organization? It really jumped out at me when you were talking about your Nemo and Picasso models and how they can be foundation models and industries can bring in their own data. You’re like, “We will be standing by to help you and assist you through this process.” That made my ears perk up a little bit because, to your point, previously someone would go to Dell to buy, and they’re buying an Nvidia computer, but Dell is handling that layer of interaction with the consumer. This sounds like a pretty substantial commitment from a go-to-market perspective and an ongoing support perspective to interface directly with Nvidia — is that something you need to build out going forward?

JH: We already work very deeply with end users and developers who do these things. We do that today, and we engage directly with the most important verticals that we focus on, whether it’s healthcare, automotive, or of course all the AI startups; we work with some 10,000 AI startups. So industry after industry, if there are industries where we could add a lot of value: the video game industry, we have direct coverage of just about every developer. The automotive industry, we have direct coverage of just about every single car company. The healthcare industry, we’re working with just about every drug discovery company. So we already do that today. It’s just that the fulfillment of the system ultimately comes from somebody else. If you want your stack accelerated, you work with Nvidia. We work directly with TSMC on cuLitho and we work directly with ASML; it’s not as if some other intermediary is doing that on our behalf. We work directly with the quantum researchers; it’s just that the fulfillment of the systems historically has been through somebody else. In this case, the fulfillment could still be through another CSP or some other OEM, or it could be directly through us.

Is there a bit where, with this new business model, it’s kind of a relief? Nvidia’s always been so famously cyclical and you went through one of the downsides of the cycle last year, so is there a bit in the back of your mind that’s like, “You know what? Some subscription revenue would be pretty nice instead of selling products”?

JH: That wasn’t really the consideration. The consideration was that there are many companies that are working on very important stacks themselves, and the stack could be their enterprise version of machine learning frameworks. There are companies that standardize not on somebody else’s ML ops, not on somebody else’s ML platform, but their own, and the reason for that is because they have a very large team internally and they have a lot of expertise and a lot of that stuff is quite proprietary. They believe they can do a better job, and in that particular case, we would work with them to accelerate their end-to-end ML ops platforms, and we would either host it if they choose to host it with us, partially or all of it, or if they host it with one of our CSP partners, all of it or partially, we’re completely happy either way. So this is really a way for us to be able to engage directly with customers all the way to hosting, standing up their service, standing up their applications in the cloud, and to be able to move faster and have greater reach. Now, because Nvidia’s in the browser, instead of having to build up a DGX system (you could still build up a DGX system), you can get going directly from a browser.

The DGX Cloud Customer

That’s the part that I’m very curious about. You mentioned these companies with large teams and large organizations that have their own structure; they are perfectly capable of getting a P5 instance from Amazon, for example, and getting that all set up on their own. But a point you emphasized when talking about DGX Cloud was all the enterprises that don’t have those teams, that woke up over the last six months to generative AI in particular and realized they needed something, and Nvidia’s well placed to provide that. So here’s my question: let’s say there’s an enterprise that comes up…

JH: Ben, maybe I’ll just interrupt you.

Absolutely.

JH: Your assumption was wrong. We have large teams working with Amazon on SageMaker. They have a large team and they need my large team to accelerate SageMaker. Vertex AI has a large team, they work with Nvidia’s large team to accelerate Vertex AI. Microsoft has a large team, but nobody knows accelerated computing better than we do. Nobody has a richer set of libraries that has to be integrated into their frameworks than we do, and nobody has the scale of resources to help them accelerate and optimize their entire end-to-end microservices. We do that for a living all day long for people who have large engineering teams. Some of them have large engineering teams and the machine learning framework they created is not for languages, not for computer vision, but for drug discovery. They need somebody like ourselves to help them accelerate that platform end-to-end, and stand it up on an accelerated cloud.

So by working with us directly, they can accelerate it end-to-end and they can run it on any cloud that has Nvidia on it and if they decide that they would like to host it with us partially — remember, if they host it with us partially or fully, then they’ll have the benefit of engineers working with them continuously refining their framework, refining their platform. Accelerated computing is not like spinning up a CPU as you know, there’s a lot of acceleration libraries that have to be created, there’s just a lot of computer science. Otherwise, how is it possible that you overcome Moore’s Law? How do you overcome physics except for cleverness, right? And so that cleverness is computer science.

Makes total sense. So with that as grounding, is it fair to say then, even with this large group of folks that work closely with Nvidia where Nvidia can help them do better and go faster, whether that’s hosting or whatever it might be, do you also see an opportunity with folks that have no idea what they’re doing and feel like they need to get into this space? What is the relative size of that market of people that are coming in saying, “This is something that we need for our business”, relative to the folks that you’re assisting and have always been assisting? Am I correct in seeing two different parts here, the newcomers, for lack of a better word, and the people that are already established? Or is this just an expansion of the people that are already established?

JH: For people who are very new to machine learning or AI, they ought to just use one of the clouds.

Not your cloud?

JH: Yeah, just one of the clouds, because SageMaker is terrific and Azure ML’s terrific, these are all terrific things. Unless you want to build something rather bespoke and rather optimized for a particular domain of use, you’ve got to just use one of the — if you’re new to ML anyways, just start with any of the clouds. They’re all excellent, and we’ve integrated our GPUs into most of the world’s AI frameworks, and so they’re all going to be accelerated somehow. Unless you’re building a bespoke framework platform that’s very specific to you, for your domain; maybe it’s for industrial AI, maybe it’s not. You’re creating something that’s really important and your whole company strategy’s behind it, and it’s for a domain that’s not generic enterprise AI, it’s not XGBoost, and so there you need some experts to help you develop it. Not just to develop the acceleration layer, but also maybe to create some of the algorithms for it. So that would be an example of a company that would really benefit from working directly with us.

Maybe you want to create a proprietary large language model and you want to make sure that you can scale up that large language model, also refine it and deploy it and have that language model be as performant as possible and you have a lot of domain expertise, you have a lot of data scientists, but maybe you just don’t have the large language model expertise we do — we can help you. I think that anybody who could use a public cloud, anybody who could use a cloud service today, use it. We should not be the world’s fifth cloud service provider. We’re not trying to be that.

CUDA and Commoditization

That last bit I’m really interested in because I’m curious — do you think there are going to be companies that never really got into CUDA, because that just didn’t make sense for that ecosystem as it was, but now they’re all in on LLMs? So it’s like they’re higher up, they want to build stuff on top of LLMs, and you talk about all the startups that have sprouted up in the post-ChatGPT moment. Is the explosion of innovation going to be in a fundamentally different layer of the stack, just further up on top, and is that something that you foresee going forward?

JH: It depends on what applications you perform. If you’re creating a generative AI-based video-based storytelling service and you’re generating video at very high quality, you’re going to want to work with us to accelerate the living daylights out of that. Because the processing of that is just really long and so we could add a lot of value. However, if you’re a spell checker, I’m pretty sure that I would recommend you to a whole bunch of services and there’s just no sense in proving that. So long as it’s already accelerated on GPU, and I’m pretty sure it’ll be accelerated on just about everybody’s cloud, I think it’s going to be pretty terrific. I would venture to say that 80% of the world should just go directly to the cloud, one of our partners, and we’ll work directly with our cloud service providers to make sure that the infrastructure and their services and their APIs are all as accelerated as possible, and that would be terrific. For some of them that can’t, for whatever reason, they’re welcome to work with us.

That makes total sense. But if there’s 80% of the world that should just go to your partners because they’re working at higher levels of abstraction, is that a risk for Nvidia in terms of commoditization? Do you have any concerns about efforts by companies like Meta trying to expand PyTorch to abstract CUDA away and lessen your lock-in, and AMD and Intel obviously want to fill in this space and they’re more than happy — a commoditization layer would be better for them because they’re behind. How much are you thinking about this, and does it have any impact on your shift to the cloud? Or are these several things that may or may not be happening at the same time?

JH: You build a great company by doing things that other people can’t do. You don’t build a great company by fighting other people, a whole bunch of other people, to do things that everybody can do. Notice what I said about our CSP partners: we’ve already accelerated their stacks, we’re delighted to be accelerating their stacks, and we’re working hard to accelerate their stacks, and we announced a whole bunch of machine learning platforms that we’ve accelerated. Some of them are in the cloud, some of them are from third parties; anybody with an ML ops pipeline, an ML ops framework, anybody with a framework, we’ll help you accelerate it. And if somebody can use that and make their lives easier and more convenient, we’re delighted by that; we should work on things that only we can work on.

Look at everything that Nvidia does. We don’t go work on a whole bunch of stuff that could be pin compatible or binary compatible, whatever it is. We’re just not a share company; we’re about solving problems that we should solve, that only we can solve, or that we’re just singularly best at solving. Notice the way I’ve explained everything so far is exactly that, and there’s no inconsistency between celebrating the success of our partners, even if it’s in an abstraction layer where we are further below, versus conserving our own energy to go work on things that only we should do and we can do; it’s just not a conflict to me. I’m really happy when AWS does well, I’m really happy when OpenAI does well, and this is our attitude about our partners and ecosystem, whether they use CUDA or not, it just doesn’t matter to us. CUDA has great value to us; whether it has great value to somebody else or not, so what? The most important thing is that they’re able to do whatever it is they’re trying to do as easily as possible, as cost effectively as possible. If one of our partners is the best answer, or if even somebody else is the best answer, so be it. We move on.

Centralized vs. Localized Compute

One other big question I have about AI generally is the question of centralized versus local. Obviously, centralized has huge advantages in terms of the compute it can command and its scalability. At the same time, there are costs attached to it and real controls that people may not always like. Meanwhile, we have seen models that run locally, like Stable Diffusion for image generation, and then these past few weeks Meta’s LLaMA model for language. Do you think that we will see meaningful generative AI applications run locally? And number two, do you see much of a market opportunity for Nvidia there? Right now the best option is obviously an Nvidia gaming GPU, get a 4090. Should there be a consumer AI GPU?

JH: Inference will be the way software is operated in the future. Inference is simply a piece of software that was written by a computer instead of a piece of software that was written by a human and every computer will just run inference someday. Every computer will be a generative AI someday. Why look up an answer if you already know the answer? Every question that you ask me, if I have to go look it up or go find a bunch of friends and caucus and then come back and give you the answer, that takes a lot more energy than what’s already in my brain, and just call it 25 watts. I’m sitting here producing answers all day long —

Producing answers for the last hour, which I appreciate!

JH: Right, so this is completely generative AI. Generative AI is the most energy-conserving way to do computing, there’s no question about that. And of course the question is, when can we do that on a large scale? Well, Ben, back in the old days, in order to run OpenGL, it started out in the data center on a Reality Engine, and then it was a $100,000 – $150,000 workstation, and now you run OpenGL on a phone. The same exact thing is going to happen with inference. Today, the largest large language models require an A100 HGX to run, so that’s a couple of hundred thousand dollars. But how long will it be before we have smaller versions of that, and quite performant versions of that, running on cell phones? No more than ten years. We’re going to run inference literally everywhere, at all different sizes. Distributed computing is very cost-effective, it’s not going to go away. However, in the future, you’ll do some inference on the phone, you’ll do some inference on your PC, but you’ll always have the backup, you’ll always be connected to a cloud model which is much more capable as a backup to the smaller version on the device, and so I have a lot of confidence that today’s computing model is going to remain.
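
A minimal sketch of that device-plus-cloud pattern, with both model calls as hypothetical stand-ins rather than any real API, might look like this:

```python
# Sketch of hybrid inference: run a small model on the device and fall back
# to a larger cloud model when the local answer is not confident enough.
# Both "models" below are placeholders for illustration only.

def local_small_model(prompt: str) -> tuple[str, float]:
    """Pretend on-device model: returns an answer and a confidence score."""
    if len(prompt.split()) < 8:           # toy heuristic: short prompts are "easy"
        return f"[on-device] answer to: {prompt}", 0.9
    return f"[on-device] partial answer to: {prompt}", 0.3

def cloud_large_model(prompt: str) -> str:
    """Pretend cloud model: assumed slower but more capable."""
    return f"[cloud] detailed answer to: {prompt}"

def answer(prompt: str, confidence_threshold: float = 0.7) -> str:
    text, confidence = local_small_model(prompt)
    if confidence >= confidence_threshold:
        return text                        # stay on device: cheap, low latency
    return cloud_large_model(prompt)       # back up to the bigger cloud model

print(answer("What is 2 + 2?"))
print(answer("Draft a detailed project plan for migrating our data warehouse."))
```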

Is that local inference going to be done on Nvidia chips? Or is this an area where Apple has a big advantage? What do you think?

JH: Well, it would be the people who build devices today and there will be simpler models of course, but these devices are really capable these days, they’ll run all kinds of interesting models.

I want the H1 Nvidia chip that I can stick in my computer. Well, it always seemed pretty aggressive for Nvidia to have GTC every six months, but it certainly seems appropriate now given the pace of change.

JH: Ben, this is a big observation. Remember when Moore’s Law was advancing at ten times every five years at ISO power and ISO cost? That’s the trend, that’s Moore’s Law: ten times every five years, ISO power, ISO cost. As a result, after 35, 40 years of owning PCs, a PC is still about $1,000. My first PC was $1,000 from Gateway; it’s still about $1,000, and the performance is way better. It still plugs into the wall, it’s still a couple of hundred watts, and the performance is way better, so that’s Moore’s Law.

Unfortunately, you’re not going to get ten times at ISO power or ISO cost this next five years, not going to happen. Meanwhile, what’s interesting is that in the same time, in this last ten years, there’s no question AI has advanced a million times, a million times. I mean, how is it possible that we went from AlexNet detecting cats to doing this today? So obviously, computing has advanced tremendously, and the way that’s happened, of course, is a complete reinvention of how computers write software, the computer architecture of it, and how the computer runs software. Every single layer from the chip to the system to the interconnect to the algorithms, all completely redesigned, and so with this way of doing full-stack computing, as you project out ten years, there’s no question in my mind that large language models, these very large language models, will have an opportunity to improve by another factor of a million. It just has to be full stack.
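
For the back-of-envelope arithmetic behind that comparison (illustrative only): ten times every five years compounds to one hundred times per decade, while a million-fold gain over ten years works out to roughly four times per year.

```python
# Quick arithmetic: Moore's Law at 10x every five years compounds to 100x per
# decade, while "a million times in ten years" implies roughly 4x per year.
moores_law_per_decade = 10 ** (10 / 5)    # 10x every 5 years -> 100x per decade
ai_per_decade = 1_000_000                 # the "million times" figure
ai_per_year = ai_per_decade ** (1 / 10)   # ~3.98x per year
print(f"Moore's Law per decade: {moores_law_per_decade:.0f}x")
print(f"AI per decade:          {ai_per_decade:,}x (~{ai_per_year:.1f}x per year)")
```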

Yeah, I mean that’s a perfect way to end it, that is sort of what I was driving at. I love the callback to — I remember when the 486 came out or the 386 — it was like, “Didn’t the 286 just come out? It’s so much better and so much faster”. It does feel that way, that sense of, “Wow, this is not slowing down at all”. Jensen Huang, it was good to talk to you again and very exciting times.

JH: Incredibly good to talk to you.


This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.

The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a supporter, and have a great day!