Nvidia Corp., the most valuable semiconductor maker in the US, unveiled a new type of graphics chip that uses enhanced artificial intelligence to create more realistic images in games. Codenamed Ada Lovelace, the new architecture underpins the company’s GeForce RTX 40 series of graphics cards, unveiled by co-founder and Chief Executive Officer Jensen Huang at an online event Tuesday. The top-of-the-line RTX 4090 will cost $1,599 and go on sale Oct. 12. Other versions that come in November will retail for $899 and $1,199.
The high-end version of the new chip will have 76 billion transistors and will be accompanied by 24GB of onboard memory on the RTX 4090, making it one of the most advanced in the industry. Nvidia is relying on Taiwan Semiconductor Manufacturing Co. to produce the processor with its so-called 4N technology while Micron Technology Inc. is the memory provider. Nvidia had been using Samsung Electronics Co. to make Ada’s predecessor. The new technology promises to speed up the rate at which cards generate images using the traditional method of calculating where pixels are located on the screen while at the same time using AI to simulate others. It’s continuing a shift that Nvidia is pioneering that allows computers to make images appear more natural by building them using calculations of the path of individual rays of light…
The approach could give customers a fresh reason to upgrade their technology — something Nvidia could use right now. The chipmaker is suffering from a steep slowdown in demand for PC components. Last month, Nvidia reported much lower quarterly sales than it originally predicted and gave a disappointing forecast. Nvidia has been forced to deliberately slow down shipments to make sure its customers — primarily makers of graphics cards sold as add-ins for high-end computers — work through their stockpiles of unused inventory. That process should be completed by the end of the year, Huang has said.
I had the opportunity to talk with Huang this week about his GTC keynote, which, despite the focus of news stories, spent relatively little time on the company’s new GPU; what is more important are the applications that are enabled by that GPU. I asked Huang about that shift, and how long it will take for Nvidia’s business results to reflect it. We also discussed the company’s shift into cloud services, its challenging last few quarters, President Biden’s China ban, and more.
To listen to this interview as a podcast, click the link at the top of this email to add Stratechery to your podcast player.
On to the interview:
An Interview With Nvidia CEO Jensen Huang About Building the Omniverse Cloud
This interview is lightly edited for clarity.
GPUs at GTC | The Omniverse Cloud | AI Democratization | Nvidia’s Enterprise Focus | GPU Glut, TSMC Power, and China | Measuring Progress
GPUs at GTC
Jensen Huang, it’s good to talk to you again on the occasion of your keynote at Nvidia’s GTC Conference. I actually had to look this up, but GTC stands for GPU Technology Conference — it certainly is the case that everything you talked about at the conference was built on and downstream from Nvidia’s creation and work in the GPU space.
However, what was really striking to me is that while you did indeed announce GPUs based on the Ada Lovelace architecture, available in the RTX 4090 and 4080 graphics cards, you only spent about fifteen minutes on that in a ninety-minute keynote. Did you find that striking at all, or is it one of those things where you’re the proverbial frog in the boiling water, and as Nvidia has become about so much more than gaming, it seemed perfectly normal that we were going to have this little announcement up front, and then talk about a bunch of other stuff that is enabled by it for the rest of the time?
Jensen Huang: Well, let’s take a step back. GTC was created as a developer’s conference. That’s really what it was about, and that’s what it continues to be. It’s about the developers, it’s about the ecosystem, it’s about all the people that can or should benefit from developing applications that run on GPUs, and inspiring people to develop GPU-accelerated applications. It’s not easy like it used to be with CPU-oriented applications, where you just write a program, compile it, and off you go; you have to understand what GPUs are good at. There are many things GPUs are unreasonably good at, if you make the effort. How is it possible that an application could run twenty times faster, or an application would run fifty times faster? I spoke of an application, a quantum computing simulator for Amazon, that ran nine hundred times faster. You wouldn’t expect anything like that. On the one hand, you want to educate people about what’s possible, the different things, the different ways, and the different applications that it could run, and then teach people how to do it. So that’s what GTC’s about.
In this particular GTC, we spoke about Ada, and I jumped right to it in the beginning of the keynote, because it is the driving force behind just about everything else that we do. Ada invented some very new capabilities, some of it’s related to artificial intelligence that we then speak about. I think it’s just amazing, and it came full circle, that the GPU was at the center of the modern AI Big Bang, and really enabled deep learning to scale and to achieve its results. But on the other hand, the invention of this new way of doing software, and these new families of algorithms, came back and revolutionized computer graphics altogether. And so on the one hand, the GPU, the graphics card made possible AI, on the other hand AI came back and reinvented computer graphics. I really loved that symmetry and I thought I would just celebrate it almost all together, right up front.
Then, of course, if not for the ability to do real-time ray-tracing on an entire scene, at giant scale, in the scenes that we showed (and by the way, nothing that I showed was pre-rendered, everything was real-time), if not for Ada doing that, we wouldn’t be able to accomplish a lot of the things that we wanted to accomplish with Omniverse, so I felt that it made sense to put it all up front and then talk about the impact of Ada after that.
I want to talk about ray-tracing, I want to talk about AI and graphics, and I want to talk about Omniverse. That was an excellent setting of the table by you, whereas I, like your keynote, just jumped right into it!
I think that some skeptics of Nvidia might argue that the real reason to spend relatively little time on gaming applications, particularly in this keynote, is that the previous Ampere generation is good enough. It definitely was striking that there weren’t a lot of gaming demos — one of the announcements is that you’re releasing a tool to mod older games to take advantage of the Lovelace GPUs, and you had a demo of a re-release of Portal. Is this a fair criticism in your view? My guess would be that your argument is that ray tracing is the future, it hasn’t been good enough until now, and only now can developers target it, but it will take time for games to come and fill the gap. Is that your view of the market?
JH: Fantastic. First of all, the answer is absolutely no. And let’s use some examples. The type of games that are going to be really fun in the future are going to be Minecraft-like games, Roblox-like games. These are games where the users are creating the virtual world, and you’re creating the world, or you’re playing in the world, and you’re helping build a world. Maybe the world came prebuilt, but you get to modify it. It could be a Battlefield, Call of Duty, something like that, and it came pre-built, but you get a chance to bunker down and create a small fort near a particular area. Well, in order to do that, in order to create these user-generated worlds, you can’t do pre-lighting, you can’t do pre-baking of the light, which is what most games do today. When you see a beautiful game, you go, “Wow, that looks almost like a movie.” Well, the reason for that is it was made like a movie.
Right. And in fact, that’s how they’re making movies now.
JH: (laughing) Exactly. So it was made like a movie by hundreds of artists, arduously rendering, pre-rendering, pre-lighting, adding all kinds of interesting textures, and it was largely pre-made. A lot of the physics is canned, and so it doesn’t allow for user-generated content. If you take down a wall in a particular area, then all of a sudden the light floods in, and it illuminates the room in a very different way, and all of a sudden it doesn’t look right. We would like to imagine worlds where either you build it completely, or you get to modify it, or you could augment it.
The other thing that you want to be able to do is have real natural physics. When you drop something, it ought to fall, when you throw something, it ought to behave the way it behaves, and so the amount of physics that has to be integrated into all the environments — you can just imagine physics and computer graphics go hand in hand, because when something falls apart, all of a sudden the way that you render it is completely different. The interaction of physical physics and electromagnetic physics is tightly wound, and so you can’t separate those things.
Then lastly, you want the world to have a whole bunch of AIs. You want the characters to animate and interact with you, and based on your conversation, have reasonable conversations back to you. You would like to have superpowers. For example, instead of building a fort completely piece, by piece, by piece, you might be able to give it an outline of the fort that you’re thinking about, and then just say, “Create a fort”, and this AI goes off and generates this amazing 3D fort. Then from there you get to modify it, or you keep it. “Show me another one,” or, “Show me another one,” and, “Show me another one,” and then, “Populate, surround this fort with a forest.” And, “Show me another one. Show me more tall trees, more this kind of tree. Add a lake.” So you want this environment to be a living, breathing thing, and that foundation of the future of computer graphics and the future of games is what RTX is all about.
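Huang’s pre-baked lighting point is worth making concrete. The sketch below is purely illustrative, not how any engine actually works; the function names and the one-number “lighting model” are invented to show why a value baked at build time goes stale the moment a player modifies the world:

```python
# Toy model: a room's brightness depends on which walls are standing.
# "Baking" computes lighting once, up front; if a player later knocks
# down a wall, the baked value is stale and the scene looks wrong.

def compute_lighting(walls_standing: set) -> float:
    """Light reaching the room: 1.0 per missing wall (toy model)."""
    all_walls = {"north", "south", "east", "west"}
    return float(len(all_walls - walls_standing))

# Pre-baked pipeline: lighting computed at build time, never again.
walls = {"north", "south", "east", "west"}
baked_light = compute_lighting(walls)          # 0.0: fully enclosed

walls.discard("east")                          # player takes down a wall
stale = baked_light                            # still 0.0: looks wrong
fresh = compute_lighting(walls)                # 1.0: light floods in

print(stale, fresh)   # 0.0 1.0
```

The real argument is simply that a user-modifiable world forces `compute_lighting` to run at play time, which is what the RTX hardware is for.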
We’re on the same path here, because one of the striking lines in your keynote was that “Future games will not have pre-baked worlds, it will be simulations”. What’s interesting about that is it’s getting into the economics of gaming, where it used to be that all the cost of a game was in the actual game development, in developing the engine and all those sorts of pieces, but at some point the cost of creating graphics for immersive games surged in line with the capability of chips like yours, and has resulted in this world of super-expensive AAA games, to your point, of all these artists laboring away at all these scenes. Then there are little independent games, and there’s not much in the middle. Is a world of pure simulation, as fantastical as that sounds, perhaps counterintuitively a more accessible one, because you can code the environment, or you can describe the environment, instead of laboriously drawing it?
JH: Well, there’s a perfect analogy for what you’ve just said. Artificial intelligence, people thought, would aggregate technology power in fewer companies, but in fact artificial intelligence democratizes computer science. It makes it possible for anybody to write software; it’s democratizing, it makes everybody a creator, it’s going to make everybody a game developer. If you can go to the extreme and say that a user could generate and create all kinds of really interesting games, why can’t small studios create interesting games? Why can’t larger studios than that create even larger-scale games, and so on and so forth? In the future, game development will be less about the hard and repetitive engineering of creating the world physics, which includes computer graphics and physics and everything around it, and more about the game. So it’s going to be more about creating interesting games and fun games, and gameplay, and interesting assets and things like that, and less about making physics work, because hopefully physics just works. In our world, physics just works.
The Omniverse Cloud
This leads me to one of the most interesting announcements of the keynote. You spent a lot of time, as you have for the last several keynotes, talking about your Omniverse offering, talking about Digital Twins, new libraries of assets, et cetera, and then at the end you sort of tossed off that Nvidia’s building its own cloud service for Omniverse. Of course I noticed this, because I asked you about this earlier this year and you said at the time:
If we ever do services, we will run it all over the world on the GPUs that are in everybody’s clouds, in addition to building something ourselves, if we have to. One of the rules of our company is to not squander the resources of our company to do something that already exists. If something already exists, for example, an x86 CPU, we’ll just use it. If something already exists, we’ll partner with them, because let’s not squander our rare resources on that. And so if something already exists in the cloud, we just absolutely use that or let them do it, which is even better. However, if there’s something that makes sense for us to do and it doesn’t make sense for them to do, we even approach them to do it; if other people don’t want to do it, then we might decide to do it. We try to be very selective about the things that we do, we’re quite determined not to do things that other people do.
Given that, what is it that you are going to do in the cloud that no one else will do, or they don’t see the opportunity, or they’re not capable of doing?
JH: First of all, Omniverse has several elements. At the core it’s about a shared world. Another way of saying it is, at the core, it’s a shared database, and because the state of the world, the assets of the world, the representation of that world is common and used by all of the people that are in it, we share a database. We share the representation of the tree, and the lake, and the fort, and so on and so forth, and so all of these representations are shared, which means that there’s a database somewhere that’s active that we can all see. We could pull stuff from the past. We could say that we’ve already built a fort, or we’ve already built a tree and, “I’m going to bring that up. I built it in another world. I want to bring it in.” It’s a database. It’s a living, breathing, very, very large-scale, high-fidelity, high-integrity database that you can use for virtual worlds and user-generated content, but very importantly that you can use for manufacturing. So the screw, the little tabs, the pieces of metal, all the way up to an engine, the wheels, the tires, all of it has to be its true form. It has to be the real asset, otherwise you can’t use it for manufacturing, you can’t use it for supply chain and such. So this is a database, it’s the first USD [Universal Scene Description] database of its kind. It’s very large scale, it has to be distributed, and it has to be shared. Meaning you and I could use it in different countries, we should be able to use it in different clouds, we should be able to use the same database, whether we’re on-prem, or in the cloud, or at home.
First of all, that database, what we call Omniverse Nucleus, doesn’t exist anywhere in the world; it’s the first of its kind, number one, and number two, it has to be put in the cloud, but it has to be put in every cloud, and there’s a storage backend associated with it. It has to be resilient, it has to be distributed. This is an industrial-scale database system that has never been built before, and so it’s a very, very big deal. That’s where the vast majority of the raw computer science work is currently going.
Then there’s the second part, which is our portals, our viewports into that world. Each one of us, we could be a robot that’s working in that world, and so you would have a viewport. So I gave you an example of a self-driving car, and its viewport into that virtual world is from the perspective of the sensors that it has. You could be a person in that world, and you could have an XR or VR head-mounted display, and you want to see that viewport into that world fully ray-traced. You could be a whole bunch of robots that are in that world. So you’ve got people, you’ve got robots, and every autonomous agent has a viewport into that world, and that viewport is physically simulated. Most of the rendering that you saw, the video game of Racer X or the DRIVE Sim or Isaac Sim, all of those things are viewports. They’re just the viewport, the perspective of an agent, an autonomous agent, into that virtual world. But what it’s really seeing is the database, it’s the active, real-time database.
And so Omniverse Cloud has several parts. The first part is just putting the Nucleus up in the cloud so that people can share and store and retrieve and process information, all the things that you do with databases. So there’s effectively going to be a SQL of Nucleus, right? There’s going to be a new language, and that new language is called USD, and it allows us to process the data and manipulate the data. Then there’s the second part, which is all the viewports, and all of that stuff has to be done in the cloud. The viewport has to be close to the agent, close to the people, because you want it to be interactive and low-latency and things like that. The database wants to be resilient, wants to be very large scale, and run on distributed computing, and it has to be multi-cloud, and that’s the reason why this is such a big deal. It’s never been done before.
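Nucleus’s internals aren’t public, so as a purely illustrative sketch (the class, method names, and paths below are all invented, not Nvidia’s API), the core idea Huang describes, one authoritative store of hierarchically named assets that every client reads and writes instead of keeping private copies, might look like this in miniature:

```python
# Toy "shared world" store: hierarchical asset paths -> properties.
# Many viewports/agents share one source of truth instead of copies.

class SceneStore:
    def __init__(self):
        self._prims = {}                      # path -> attribute dict

    def define(self, path: str, **attrs):
        self._prims.setdefault(path, {}).update(attrs)

    def get(self, path: str) -> dict:
        return dict(self._prims.get(path, {}))

    def children(self, path: str):
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self._prims if p.startswith(prefix))

world = SceneStore()

# A supplier publishes a part once...
world.define("/World/Engine/Screw_01", thread_mm=1.25, length_mm=16)

# ...and every other client sees the same, exact representation.
viewport_a = world.get("/World/Engine/Screw_01")
viewport_b = world.get("/World/Engine/Screw_01")
assert viewport_a == viewport_b

# An edit by one client is immediately visible to all the others.
world.define("/World/Engine/Screw_01", length_mm=20)
print(world.get("/World/Engine/Screw_01")["length_mm"])  # 20
```

The hard engineering Huang is pointing at is exactly what this toy elides: making that single source of truth distributed, resilient, and low-latency across clouds.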
Just from a technical perspective, you did talk about, in the context of Omniverse Nucleus, that you were also launching some aspect of that with AWS. It was super brief, so I actually watched it a couple of times, and I didn’t quite catch it, and then you were talking about on-premises applications and things along those lines. How does that actually work? You mentioned it has to be everywhere, and I think about the on-premises thing: why do companies have stuff on-premises? Maybe they have regulations around their data, like they have to do certain things. How does that mesh with this idea of a shared database where everyone has access to all these sorts of things? How do those two ideas, that we’re going to be multi-cloud, we’re going to serve all these different use cases, but also it’s going to be this shared sort of experience, fit together?
JH: The easiest analogy I would say is probably Snowflake, but it’s Snowflake in real time. Does that make sense? It’s not for structured data, it’s for 3D data. Think of it as a database in the cloud that runs USD, and you could load stuff into it, you could retrieve stuff from it, and it’s multi-cloud and it doesn’t matter where you are, you just kind of access it. You can almost think of it as Dropbox except it’s live.
Right. Universal Dropbox, but for 3D scenes. What is the tension and trade-off here? If you have this idea of these powerful graphics capabilities, you start doing this descriptive scene-making, where you can describe scenes that could be made on the fly — on one hand, that’s super compelling. You could see how that lends itself to a metaverse future where the marginal cost of content creation is drastically lower. On the other side, you have this idea of, well, you actually need these defined objects. If you have a USD of that screw, someone had to go in and actually make that piece and get it exactly right. You call it the Omniverse or the metaverse or whatever; is your idea that it would be both? The background or the atmosphere is maybe rendered in real time and appropriate to your situation, and then there are individual pieces that are pulled from this database? What’s the interaction between real-time generation of new content versus this database of what already exists?
JH: Well, if you want it to be shared by other people, and you want it to be shared in the exact representation and fidelity of what you wanted to share, like, for example, a component in the supply chain, it has to be exactly what it is. You have, for the very first time, the ability to do it just once, and not to create it for every single tool and everybody’s format. You have an opportunity to do it just one time, just like you do it now on the Internet; you do it just one time, and then from that one time, you have the ability to let all of your customers use it. Once that supplier builds that one component, then everybody else can use it, and then the next supplier, the next integrator, builds it one time, and everybody else can use it. All of those things ought to be done in libraries, and they could be third-party libraries. They could be part of our libraries; you could assemble a whole bunch of parts into your own part, now that becomes your own personal library, and then that’s stored in your database.
There’s a whole bunch of stuff that is related to virtual worlds that has to be alive. For example, every single session’s different. You have a whole bunch of robots that are moving around, and they’re not going to move around the same way every time, and your Digital Twin is being created in real time all the time. So that would be an example of data that is unique, bespoke to you, and it’s not pre-baked, and it’s a simulation, it’s a Digital Twin. So it spans the entire gamut of assets that are done by third parties, done by you, done in real time, and then there’s experimentation on top of that. Suppose I wanted to see how a factory is laid out and how it would change the throughput of the supply chain. Suppose I wanted to change the layout of my fort and where it’s located, and it might affect how defensible my fort is and I might decide to put my fort at a different part of the port so that it could be more defensible against future conquerors, and so it could affect my gameplay. I want to try different placements and simulate this outcome and so on and so forth. After I’m done simulating, I throw it away. I could decide that I want all these scenarios to be actually simulated by an AI so I would say, “Okay, now that I have a fort that I kind of like the structure of it, place it in a whole bunch of different places and tell me what is the most optimal place for defense”. Maybe it’s not the most optimal place for the best view, but it’s the most optimal placement for defense. The AI goes off and figures out what the loss function is, and then it tries all these different optimizations, and it says, “You know, right here. If you raise the fort, the whole base of the fort, the plinth of the fort by about ten feet, you’ll be able to see across all those small mountains and hills, and you’ll be able to see enemies coming from afar.” You know, whatever it is.
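The “place the fort and let the AI optimize it” loop can be sketched in its crudest possible form: enumerate candidate placements, score each with a defensibility metric, and take the argmax. Everything here, the terrain, the visibility metric, and the numbers, is invented for illustration; a real system would learn the loss rather than hand-write it:

```python
# Toy placement search: score each candidate site by how much of the
# surrounding terrain it can see (higher ground sees over hills).

terrain = [3, 1, 4, 1, 5, 9, 2, 6]            # heights along a ridge

def visibility(site: int, raise_by: int = 0) -> int:
    """How many tiles are lower than the fort at `site` (toy metric)."""
    h = terrain[site] + raise_by
    return sum(1 for t in terrain if t < h)

# "Try all these different optimizations" -> brute-force argmax here.
best_site = max(range(len(terrain)), key=visibility)
print(best_site, visibility(best_site))        # 5 7

# Raising the plinth, as in Huang's example, improves a mediocre site.
assert visibility(2, raise_by=10) > visibility(2)
```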
I’m curious though about this “Whatever it is”, because there’s definitely a view here where your calling it the Omniverse, not the Metaverse, and talking about all these sorts of industrial applications, is kind of like a great smoke screen: “No, actually we’re the ones that are actually building this Metaverse vision, this Neal Stephenson world, where it spans the globe and anyone can go in and do it.”
But I think one of the questions and challenges that occurs to me is you talk about things like going in forts and playing games, and then you talk about things like having the perfect screw for a supply chain. There’s a monetary aspect to this, a business model aspect, where, one, you maybe charge a whole bunch of money to access, because it’s super valuable, there are unique things. Yes, that screw is only made once, but having access to that screw is incredibly valuable, and then it’s going to be very expensive to get access to and that’s going to justify the investment. The other side is more of a consumer play, where you can’t charge consumers as much as you can charge in enterprise, but if it’s that much more accessible, the long-term potential upside outcome is just massive. As you’re talking about the technology and have all these ideas of what it could be used for, how is the business model opportunity playing into that? How do you think that this will work out?
JH: First of all, Omniverse is a platform and engine for other people to make metaverse applications.
On the one hand, the viewports are essentially, think of them as, metaverse engines. They’re simulation engines, unlike a game engine, which tends to have a lot of pre-baked and artistically created assets. Everything in Omniverse is intended to be, and optimized for, real-time and simulated assets. It’s an engine, on the one hand, for the viewports; on the other hand, the database, it’s just a database engine. It’s no different than Snowflake’s, and so I think that the business model basically has three parts.
There’s the lowest-level part, which is all the Ada Lovelace GPUs and the Omniverse computers, and some of it will be in the cloud, some of it will be on-prem, and some of it will be at the far edge. A long time ago, somebody came up with the content delivery network, the CDN, and Akamai probably commercialized it best. In the future, there’ll be a GDN, a graphics delivery network, and the GDN is a whole bunch of GPUs at the very far edge, and it makes it possible for content to be enjoyed interactively, and so low-latency that you can actually use VR and AR in the cloud very, very easily. It’ll track your head pose and it’ll track the world perfectly, and the round trip will not be tens of milliseconds, it’ll probably be ten milliseconds. So in that future world, the metaverse could manifest wherever and be rendered beautifully right there on the spot. So at the lowest level, it’s the graphics hardware, it’s the infrastructure, and it’s in the clouds and on-prem and such.
Then the next layer is the database, and the business model there is probably going to be very Snowflake-y, if you will. We’re going to make it possible for you to host your own database, in which case you could imagine that’s probably very Oracle-like or very SAP-like, except for the metaverse. Or it could be in the cloud, and it could be very Snowflake-like.
Then there are the viewports, and my sense is that there’ll be two different types of viewports. For most of our consumer application viewport providers, and they could be a social network, they could be a game developer, we’ll just let them do that for free. Then for the industrial applications, like Digital Twins, we might create something like DRIVE Sim or Isaac Sim that has a lot of the simulation, Digital Twin nature to it, and then give that to people for them to modify, and then we’ll monetize that per engine. There’s a whole different layer to it. The one thing that we won’t do is we absolutely will not build a video game, we won’t host a social network, we won’t host a virtual world, those kinds of things. We’ll tend to probably be in the background, creating the engines.
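Huang’s ten-millisecond GDN figure above can be sanity-checked with back-of-the-envelope arithmetic: cloud VR only works if the network round trip plus rendering and encoding fits inside the motion-to-photon budget a headset needs. The budget and component numbers below are illustrative assumptions, not Nvidia’s figures:

```python
# Rough motion-to-photon budget check for cloud-rendered VR.
# A commonly cited comfort budget is ~20 ms; all inputs are illustrative.

BUDGET_MS = 20.0

def fits_budget(network_rtt_ms: float, render_ms: float,
                encode_decode_ms: float) -> bool:
    return network_rtt_ms + render_ms + encode_decode_ms <= BUDGET_MS

# Distant datacenter: "tens of milliseconds" of RTT blows the budget.
print(fits_budget(network_rtt_ms=40.0, render_ms=5.0,
                  encode_decode_ms=3.0))       # False

# Far-edge GDN node: ~10 ms RTT leaves room to render and encode.
print(fits_budget(network_rtt_ms=10.0, render_ms=5.0,
                  encode_decode_ms=3.0))       # True
```

This is why the GPUs have to sit at the “very far edge” rather than in a handful of central regions: the speed of light spends the budget before rendering even starts.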
I did want to go back to this democratization of AI you mentioned, and that’s been a real mind-shift and wake-up call for me; I was definitely on the other side, assuming it would lead to more centralization. Stable Diffusion is the preeminent example of this, where it’s open source, you can run it locally, but it’s also already being modified in all these super interesting ways. At the same time, it’s still pretty expensive to run this stuff, and I’m curious, 1) did you always have a view that AI would be more democratic than people thought it would be, or has that shifted for you over the last little bit as well, and then 2) to what extent, if any, is some of this cloud offering not just about the big industrial applications? I’m particularly interested in using these GPUs on the edge, making them actually accessible for consumer-grade applications in a cost-effective way.
JH: First of all, if you think about the first principles of artificial intelligence, it’s about a computer that writes software by itself. The extent to which computing can help society is limited not by the cost of computers but by the number of people who know how to program computers, and so the number of people who can now program computers has gone up by several orders of magnitude. The reason for that is because just about everybody knows how to teach somebody else how to do something, or show somebody how to do something properly, but very few people know how to program C++. I think for artificial intelligence, the concept that you now have a computer scientist inside your computer, that’s the first part of democratization.
The second part of it has to do with the fact that you can write a piece of software, and this piece of software can now adapt to a whole bunch of other skills that you never intended to write it for; that’s the second layer of democratization. The first one already happened, right? You could argue that it even got a boost with GitHub’s Copilot, which helps you write software. So the number of people who could write software, artificial intelligence software, the concept of zero code, the concept of low code, all of that AI-assisted coding, all of that is really about democratizing programming. The next layer is what’s happening right now with large language models. It’s really quite an amazing thing that the scaling law caused not only the recognition of patterns and relationships; the scaling law made it possible for us to literally encode human knowledge into this neural network.
And everyone has access to the Internet to get all this data, to feed into it.
JH: It’s a little bit like this neural network now has memory, because it learned from everything that we’ve ever done and spoken about. What’s a really interesting application? Well, maybe there’s a rare form of cancer, and it was described in doctors’ notes in a lot of different ways. Maybe it was described in research, but it’s rarely seen, and the reason it’s rarely seen is because it’s rare. But you know the characteristics of it, the multi-modality characteristics of it, and you can now teach an AI to imagine a whole bunch of different iterations of that rare disease that you’ve never seen before, or have rarely seen, from the descriptions in the research. This AI image generator can generate a whole bunch of different versions of it, and you go, “Oh my goodness, this is what it looks like”, and then you could train a computer vision algorithm so that in the future, when something like this, or a whole bunch of versions of something like this, shows up in your ultrasound or your CT, or whatever it happens to be, it gets detected.
Now you can imagine how, in fact, this large language model, which has embodied and encoded so much human knowledge, could reduce the complexity and make it possible to solve problems that we’ve never solved before. Four or five people, or four or five companies or institutions, could team up together, just like we did with Hugging Face; a whole bunch of people can come together and train one model, which afterwards we could adapt, fine-tune, and prompt-learn into a whole bunch of other skills that it’s never been trained to do, and so now it’s been democratized.
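The “train one model together, then everyone adapts it cheaply” pattern can be illustrated at toy scale: a shared one-parameter model is “pretrained” on pooled data, and one party then fine-tunes it to its own task with only a few examples and steps. The model and the numbers are, of course, a cartoon of what happens with real language models:

```python
# Toy version of "train one big model together, then each party
# fine-tunes it cheaply": a 1-parameter linear model y = w*x adapted
# with a few gradient steps on a small, task-specific dataset.

def sgd(w: float, data, lr: float = 0.1, steps: int = 50) -> float:
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x        # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# "Pretraining": everybody pools data for the underlying relation y = 2x.
shared_w = sgd(0.0, [(1, 2), (2, 4), (3, 6)])

# "Fine-tuning": one party adapts the shared model to its own task
# (y = 3x) with only a handful of examples and steps.
tuned_w = sgd(shared_w, [(1, 3), (2, 6)], steps=20)

print(round(shared_w, 2), round(tuned_w, 2))  # 2.0 3.0
```

The point of the cartoon: fine-tuning starts from the shared parameters, so the expensive step is paid once and the adaptation step is cheap.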
To go back to the game discussion, one thing that was striking to me is that games are always characterized as being the cutting edge. Games are the first thing that people want to do on computers: they went to text, and they went to sprites, and they went to 3D, and then it’s assumed that the virtual world will be games. Meanwhile, to go back to our discussion about texture creation and all those sorts of things, there’s a bit where games even today are like handcrafted HTML. It’s as if we were still building every site and doing every sort of interaction bit by hand, whereas modern websites are adaptive and framework-driven, and you have a site like Facebook that is completely different every time you look at it. It’s actually a marvel of engineering; they’re serving billions of different Facebooks every single day.
What’s interesting, to tie it back into this discussion, is that this is almost a reason, I think, to be a little bit more bearish on game-like experiences in VR, at least in the nearer term, in part because the large language models have benefited from the explosion in content that was enabled by modern website development: sites that people can interact with super easily, and that are super adaptive, whereas games have almost been restricted because they’ve had to be handcrafted to date. On one hand, that’s a reason why all this tech stuff is actually further ahead in terms of large language models and their application to images and so on. On the other hand, to the extent your new chips and new approaches can enable auto-generated games, that’s almost a prerequisite for creating large, vast virtual worlds, because you need that core capability to actually make that stuff up in the future.
JH: One problem: the core of one is structured data, the core of the other is unstructured data.
Games have a ton of structured data, it’s the other bit they’re missing.
JH: If you look at a website today, most ads are square and most words are words. So you could reform it, you could replace that experience, relatively easily in two dimensions.
Right, that’s true.
JH: Sometimes it’s an image, sometimes a video, sometimes some generated words, but the constraints on the freedom of expression are relatively fixed: a website for e-commerce is relatively fixed. However, in a virtual world, it’s completely unstructured; the geometry is different in everything that you see. You have to be able to generate things that on the one hand are free-form, but on the other hand obey the laws of physics. For example, a cat can’t be ten stories tall. We have an understanding of what the world looks like, we have an understanding of how the world behaves, and so if you generate things, you can’t generate them randomly; they have to be within the realm of possibility. They have to obey the laws of physics, or if not, at least some physical law that’s understandable and representable, and be constrained by that and yet still be completely open. That capability is relatively new: we now have the ability to use AI to synthesize and predict physics that stay within the laws of physics, the laws of symmetry, the laws of physically-based worlds, that conserve energy, so on and so forth. Think of an AI-generated world where characters animate with kinematics that are sensible because of bone structure, or whose gait is sensible because they’re four-legged animals. We’ve seen horses and we’ve seen leopards and we’ve seen them move, so they’re sensible to us. Or trees that blow in the wind in a way that is sensible, from all the leaves of all the trees to the way that the surface of the ocean is moving. There’s a sensible way that the wind blows things; they’re not randomly blowing in every single direction. All of these things are now possible. I think that the next era of AI-generated virtual worlds, ones that you can believe and can still modify, is around the corner.
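A minimal way to picture “free-form generation constrained to the realm of possibility” is rejection sampling: propose freely, keep only what passes a physical plausibility check. The generator, the height ranges, and the rules below are all illustrative assumptions of mine, not anything Nvidia ships.

```python
import random

random.seed(1)

# Toy physical priors: plausible height ranges in meters.
# The constraint encodes "a cat can't be ten stories tall".
PLAUSIBLE_HEIGHT_M = {"cat": (0.15, 0.35), "horse": (1.3, 1.9), "tree": (1.0, 60.0)}

def propose(kind):
    # An unconstrained "generator" that knows no physics at all.
    return {"kind": kind, "height_m": random.uniform(0.01, 40.0)}

def is_plausible(obj):
    lo, hi = PLAUSIBLE_HEIGHT_M[obj["kind"]]
    return lo <= obj["height_m"] <= hi

def generate(kind, max_tries=100_000):
    # Constrained generation via rejection sampling: sample freely,
    # keep only proposals that obey the (toy) physical prior.
    for _ in range(max_tries):
        obj = propose(kind)
        if is_plausible(obj):
            return obj
    raise RuntimeError("no plausible sample found")

cats = [generate("cat") for _ in range(20)]
```

Real systems bake the constraint into the model rather than filtering afterwards, but the contract is the same: open-ended variety inside a physically sensible envelope.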
Nvidia’s Enterprise Focus
One other note on the keynote before I get to a couple of other questions I wanted to ask you. The thought I had watching the video was, “Wow, for what most people think of as a game company, it sure is very enterprise-y”. You, fortunately, did not have the sit-down-in-two-chairs-with-the-CEO-of-another-company part of enterprise keynotes, but still there was a lot of talk about the partnership with Siemens, and partnerships with all these different companies leveraging the Omniverse or whatever else it might be.
Then your final announcement, your “One More Thing”, was with this obvious gaming company, which is Deloitte — I’m obviously being a bit tongue-in-cheek, but it did kind of capture the keynote in a way. Even in your final “This is what we talked about” slide, the last logo on the left was Deloitte’s. Talk through why that is important for what you’re trying to do and where you’re trying to take Nvidia going forward.
JH: The first phase of modern AI was all about discovering new architectures, pushing the limits of those architectures, and scaling them up. It led from Convolutional Neural Networks (CNNs), as you know, to Recurrent Neural Networks (RNNs), to Long Short-Term Memory networks (LSTMs), to Generative Adversarial Networks (GANs), to what we know today to be rather scalable transformers, and all the vision transformers and so on, so forth. All these architectural ideas were the core of the work of the first ten years, and during that time the industrial applications of AI were associated with recommender systems and the user interfaces of mobile devices and such. That first generation also kicked off, even though it hasn’t been fully realized yet, all of the work in computer vision associated with self-driving cars and robotics. If you look at the work that we’re doing today, that was all really work that was kicked off ten years ago.
The next ten years is really about the application of AI, and if you think about it: “Are RNNs really useful to enterprises at scale? Are LSTMs really useful to enterprises at scale? Are GANs really useful to enterprises at scale?” The answer is yes, but they’re not really that easy to use.
Whereas large language models and prompts are much more clearly useful.
JH: Whereas large language models have made it possible for us to literally go into just about every single company and say, “Give us a high-impact challenge and let’s see if we can automate it”. There are several different aspects to it. The first aspect has to do with the data processing part, which is the reason why RAPIDS is so important. The second part has to do with the AI model; that’s where the large language model comes in. The third part has to do with the deployment: how do I deploy this AI in a way that my customers know how to interact with it? Chatbots are fine, typing into text boxes is fine, but nothing is more engaging, especially at the point of sale or where the digital meets the physical, than a virtual avatar that you can literally engage with, and you could decide whether that avatar is a cartoon brand ambassador or somebody that looks very human. That’s the last part of it, and then to be able to deploy at scale: Triton. So I just mentioned four major buckets of work, and these four pillars are now sufficiently well established. We put them in the cloud so that all of the world’s enterprises can use them, and now we need partners to work with us to scale this out to 100,000, 250,000 enterprises around the world. I think the big message from that is that the next ten years are going to be about the application of AI at enterprise scale, so that companies can revolutionize their products and services and revolutionize themselves.
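The four buckets can be read as a pipeline with four pluggable stages. The sketch below is a deliberately tiny toy, assuming made-up stage names and a stubbed “model”; it is not RAPIDS, Triton, or any Nvidia API, just the shape of the pipeline Huang lays out.

```python
# Pillar 1: data processing (the role RAPIDS plays at scale).
def process_data(raw):
    return [r.strip().lower() for r in raw if r.strip()]

# Pillar 2: the model itself (a stub standing in for an LLM).
def model_infer(records):
    return [{"text": r, "intent": "refund" if "refund" in r else "other"}
            for r in records]

# Pillar 3: the customer-facing interaction (chatbot or avatar).
def render_response(results):
    return [f"[avatar] I can help with your {r['intent']} request."
            for r in results]

# Pillar 4: deployment, running the whole pipeline over a batch
# (the role Triton plays when serving at scale).
def deploy(batch, pipeline):
    out = batch
    for stage in pipeline:
        out = stage(out)
    return out

replies = deploy(["  I want a REFUND ", "", "hello"],
                 [process_data, model_infer, render_response])
```

Each stage can be swapped independently, which is one reason the four pillars are worth naming separately.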
That makes total sense.
GPU Glut, TSMC Power, and China
One of the challenges facing Nvidia now is this GPU glut. Not only have sales slowed down from the pandemic highs, but you have all these Ethereum GPUs flooding the market. As I understand it, you did delay Ada a bit, and you’ve spent the last few quarters resetting your channel, lowering prices, and all those sorts of things, all of which makes sense. It’s very painful, but it makes sense going forward. Looking backwards over the last couple of years, are there some things you could have done differently to manage this, or was it sort of a perfect storm, and you’re just going to soldier through?
JH: I don’t think we could have seen it. I don’t think I would’ve done anything different, but what I did learn from previous examples is that when it finally happens to you, just take the hard medicine and get it behind you.
Yeah, which I think you did.
JH: Which is totally okay. Just take the hard medicine and get it behind us. We’ve had two bad quarters, and two bad quarters in the context of a company is frustrating for all the investors and difficult for all the employees.
We’ve been here before at Nvidia.
JH: Yeah, I think we just have to deal with it and not be overly emotional about it: realize how it happened and keep the company as agile as possible. When the facts presented themselves, we just made cold, hard decisions. We took care of our partners, we took care of our channel, and we took care of making sure that everybody had plenty of time. By delaying Ada, we made sure that everybody had plenty of time, and we repriced all the products such that even in the context of Ada, even if Ada were available, the products after they’ve been repriced are actually a really good value. I think we took care of as many things as we could; it resulted in two fairly horrific quarters. But in the grand scheme of things, we’ll come right back, so I think those were probably the lessons from the past.
Well, speaking of lessons over your career, one thing Nvidia has always done is avoid tying itself to one supplier, and that includes GPU manufacturing. Did you start with Texas Instruments? I think so, and then TSMC, then you were on to Samsung. Now you’re back to TSMC, but to extend the perfect storm analogy, you came back to TSMC at a time when they had real market power and could extract long-term commitments that are not great for margins. When you issued your earnings warning, I wrote at the time that one of the advantages of being a fabless company is that you’re supposed to be able to avoid the boom/bust cycle, but now it feels like you’re experiencing it all the same. Over the course of your career — you’ve been dealing with manufacturers for decades — has it been a surprise to you that we’ve ended up in this state where foundries can actually push their pain out to fabless companies?
JH: It’s always been the pain of the fabless companies, in fact. The only thing that fabless companies didn’t have to do was build the fab and operate the fab. It had very little to do with the capital as it turns out. A fab is $10 billion. So what? It was $5 billion a few years ago. So what? If you wanted to use a fab, if you wanted to build a fab, the capital part of it is easy. The part that’s hard is operating it. Just because you have a fab doesn’t mean you’re a foundry.

The magic of TSMC is —
JH: Yeah! The fact that their process technology is not just state-of-the-art, it’s adaptive to everybody. For example, at Nvidia we have our own custom process, it’s called 4N, and it’s tuned not for cell phones, it’s tuned very precisely for Nvidia’s GPUs, and they will work with us on that, and then there will be somebody else who comes up with something else. The fact that they have amazing technology, the fact that the process is adapted to us, and that their supply and product management and operations teams work with our supply chain so incredibly well, ups and downs, ins and outs, it’s just insane. To do this with 3,000 companies is beyond imagination. So I think that’s the genius of TSMC, and that’s the difference between, if you will, digging a hole in the ground, which is easy to do, versus operating a foundry, which apparently is nearly impossible to do. But to the question that you asked, Ben: from the very moment we place a PO, we’ve always owned all those wafers through TSMC’s fab or a foundry’s fab; we’ve always owned our supply chain from that moment on. What is fundamentally different this time around, and it has everything to do with scarcity of supply, is that the supply chain cycle time extended by a factor of three or four. What used to be a four-month cycle time became a year-and-a-half cycle time.
So you had to make an agreement a few years ago for 2022 and then here you are.
JH: Yeah, exactly. You have to place orders a year-and-a-half in advance, and you’re so short, the whole industry is so short, the whole market is so short, that you’re more than delighted to do so. In our case, we are running almost a $10 billion a quarter company, and $10 billion a quarter over a year and a half, that’s a lot of supply. The benefit, of course, is that our company is really agile and we moved the moment that we saw it; we have really great products and they’re going to be terrific for a long time, and we were fortunate to have just started a brand new product cycle. When we prepaid for a whole bunch of capacity, it was for capacity for Ada and Hopper and Orin and so on, so forth. We were fortunate in that regard, but the fundamental change is the cycle time of the world supply chain, and that really has to get shortened.
Yeah, that makes sense. One last question, on China. You mentioned in your keynote that Tencent used Merlin for its recommendation engine, but now there are certain restrictions on what you are allowed to sell, or you can get licenses. Number one, how much of a real-world impact do you think this will have on your business? I saw your notes from your financial analyst call today, and you seem pretty optimistic that only certain products are limited and you can work around that, which is comforting from my perspective as an analyst, because the big concern I have about this is not just a potential reduction in addressable market, but having an entire market that is now heavily incentivized to get away from CUDA. Is that something you’re thinking about? Could it shift your strategy there, or do you think that it’s not going to end up being that big of a deal?
JH: It’s excellent to know what the law is and it’s excellent to know what our limitations are. The limitations and the restrictions are very specific to a combination of computation level and multi-chip interconnection level. That restriction gives us plenty of envelope to go and run our business, and the vast majority of our customers in China are unaffected, because we have so many alternative products that they can use: they’re all architecturally compatible, and many of them are socket compatible because we use industry-standard sockets. So our ability to serve our customers in China with alternative products is quite acceptable, and the early response from all of our customers has been great.
For the customers that absolutely need that level of capability, we have to go for a license, that’s all. It isn’t intended to keep Nvidia’s products from China, it isn’t intended to keep Chinese customers from those products; it’s intended to make a specific use case of our products visible to the US government. I think the limitations and restrictions that are now known to us are an envelope that we can work with. For this particular quarter, we had to make a disclosure because it applied instantly; there was no grace period. In our particular case, we were right in the middle of the quarter, and so our ability to execute and pivot was something that we wanted to make sure that people understood. But for the vast majority of those customers that depend on our products, we will make sure that they have alternatives that are within the guidelines and the restrictions.
One last big picture question to wrap this all up. You mentioned in the context of self-driving that, “Well, it’s not quite here yet”, and it does feel like that is a descriptor that could be applied to a lot of the stuff here. On one hand, it is a developer conference, and your job is to sell a vision of the future so that people want to build on your platforms and do those sorts of things. Still, how long until the reality of Nvidia’s business matches the framing of this keynote, where “Yeah, there’s this nice little gaming bit over here”, but actually “We’re this industrial powerhouse, we’re doing all these sorts of different things, there’s self-driving cars out there, we’re on the edge, we’re in the middle and the center”? How far off is that, and how do you measure your progress there? How do you know you’re actually moving down that road? Is it just your data center sales? How do you think about that from almost a KPI perspective?
JH: There’s a section in the keynote that for most viewers is probably the driest part, but for the people that are using accelerated computing for their core work it’s really important. For example, the part about RAPIDS and Spark accelerated by RAPIDS, Triton, all of that area, all of the SDKs: it really is about the expansion of our accelerated computing platform into new markets, into new companies, into new industries. That is probably the single best early indicator of near-term future success, if you will, within the next six months, within the next year. So the fact that RAPIDS is so broadly accepted, that Spark accelerated by RAPIDS is now powering Dataproc on GCP, that Triton is used by 35,000 companies and increased by a factor of three year-over-year: those kinds of things are really about Nvidia’s SDKs and where they’re going. The work that we’re doing with graph databases, the most important database of the modern enterprise: now we can accelerate them, now we can bring them into deep learning. Those kinds of SDKs open us directly into new worlds, and those SDKs accelerate applications that people are already using, or databases they’re already using, or standards and application frameworks that they’re already using. So that area, with very little graphics fanfare, if you will, is very work-related. It’s very application-related, and so that’s one.
Right, you have this clear graphics GPU application. This is shifting it to, “No, you actually keep doing the same work you were doing before, but now it automagically is a million times faster because we’ve abstracted it all away”.
JH: That’s right. And it is automagic, exactly right. Not only can you scale graph database analytics by a hundred times now: what used to take a week now takes an hour, and what used to take several hours is now instantaneous, for example the work that HEAVY.AI was doing with Charter. So all of that stuff today is real-time, and you rinse and repeat it. Whatever the success is, rinse and repeat it.
With respect to the robotics and self-driving cars where you started: the important thing, number one, is that Orin sales are a good indicator of how we’re doing. But the thing that’s really important to understand is that robotics is not just about things that move, it’s about things that think about moving. A robotic system perceives the environment, it reasons about a worldview of the environment, it reasons about the things that are moving within it and all of its surroundings, and it makes a recommendation on what to do. The recommendation could be “Divert this box from this track to that track”, “Move this AMR from here to here”, “Open the door”, “Close the door”, “Set off the alarm”, “Put a color overlay on top of an ultrasound image to segment it and identify something”. So robotic systems are systems that basically do three things: they perceive the environment, they have a worldview, and they plan some action. I used Orin as the platform to explain that there are so many different types of robotics applications. It could be remote sensing, it could be a self-driving car, it could be an AMR.
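The three-part loop (perceive the environment, maintain a worldview, plan an action) is concrete enough to sketch. The events, the worldview fields, and the planning rules below are invented for illustration; they are not any Nvidia robotics API.

```python
from dataclasses import dataclass

# A toy of the loop described above: perception folds sensor events
# into a worldview, and planning recommends an action from it.

@dataclass
class WorldView:
    boxes_on_track_a: int = 0
    door_open: bool = False

def perceive(worldview, event):
    # Fold one raw sensor event into the worldview.
    if event == "box_arrived":
        worldview.boxes_on_track_a += 1
    elif event == "door_opened":
        worldview.door_open = True
    return worldview

def plan(worldview):
    # Recommend an action based on the current worldview.
    if worldview.boxes_on_track_a > 3:
        return "divert box from track A to track B"
    if worldview.door_open:
        return "close the door"
    return "wait"

wv = WorldView()
for event in ["box_arrived"] * 4 + ["door_opened"]:
    wv = perceive(wv, event)
action = plan(wv)
```

The same skeleton fits a sorting line, an AMR, or a self-driving car; only the sensors, the worldview, and the planner get richer.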
Yeah. I liked how you analogized the little drone thing. It’s a self-driving car in an unstructured environment. It’s like “Yeah, because it’s not going to run over someone and kill them”, so there’s so many more possibilities.
JH: (laughing) Yeah. So Orin is a home run. We invented the world’s first robotic processor with Xavier, and most people didn’t understand what it was for and how to use it. We’ve now developed the framework for programming it, the pipeline for processing with it, and environments for you to develop in, simulate in, and do CI/CD around, so we’ve really built up a whole robotics ecosystem. And then I just wanted to make sure that people realize the concept of robotics is really the concept of artificial intelligence plus interacting with something physical. So that world, how big is that world? It’s our entire world that we know.
And the analogy to games is super clear, right? We’re moving from a deterministic model where everything has to be pre-programmed, like robots used to be, to decision-making capability and dynamic response to what’s happening, and that might be light in the case of graphics and ray-tracing, or it might be an actual environment you’re interacting with in the case of robots. So the analogy makes sense there.
JH: That’s right. The symmetry makes sense. There’s a reason why all of these things are being done at Nvidia. I really appreciate you watching the keynote so carefully; you’re incredibly observant and you’ve captured a lot of it. There were four things that we did. We announced a whole new wave of new chips: we talked about Ada and Hopper and Orin and the new chips that are staging up. We talked about new software stacks that open new markets; the software stacks, the SDKs, directly translate to new sales because they directly translate to new applications. Thirdly, we talked about some things that we had to put in the cloud so that we could make them accessible. I didn’t do as much of it this time because it’s just the beginning, but you can imagine that next time I’m going to talk about Omniverse Cloud a lot, and the reason for that is because it’s groundbreaking science: it’s a new type of database and a new type of computing platform. I basically said that Omniverse is a new database platform, a new networking platform, and a new computing platform, so we want to put that in the cloud. Then lastly, new partnerships to get this incredible technology out into the hands of all the enterprises.
Yeah, it’s a phenomenal business opportunity, so I think it’s a good thing worth talking about a lot.
JH: (laughing) And Q2 and Q3 will be behind us.
Yes, that’s right, it’s all upside. Jensen Huang, it was good to talk to you again and congratulations on the keynote and I’ll look forward to talking to you again soon.
JH: Thanks Ben. Talk to you soon.
This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.
The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.
Thanks for being a supporter, and have a great day!