An Interview with Midjourney Founder David Holz about Generative AI, VR, and Silicon Valley

Good morning,

This Daily Update interview is another installment of the Stratechery Founder series; as a reminder, one of the challenges in covering startups is the lack of available data. My solution is to go in the opposite direction and interview founders directly, letting them give their subjective overview of their companies, while pressing them on their business model, background, and long-term potential.

David Holz is, in my estimation, a throwback to the old Silicon Valley. Holz first came to prominence as the founder and CEO of Leap Motion (not Magic Leap!), which created a new user interface paradigm for computers using cameras and hand gestures. Leap Motion was, ultimately, too early, and sold to Ultrahaptics in 2019. Holz then set up a studio to explore new opportunities, ultimately landing on AI-generative art; the self-funded company released the resultant product, Midjourney, as a Discord bot this summer. I used Midjourney to generate most of the images for The AI Unbundling, an article I wrote earlier this fall, including this image of a paperboy:

Paper boy, by Midjourney

This image was distinctly Midjourney: no, it wasn’t technically correct, but it had a certain feel to it that, more than any other AI-generative service, felt like art. Earlier this week, in an announcement on Discord (that has strangely gone uncovered by the media), Holz wrote:

We’re going to try alpha-testing a version of our V4 algorithms today…What’s new with the V4 base model?

  • Vastly more knowledge (of creatures, places, and more)
  • Much better at getting small details right (in all situations)
  • Handles more complex prompting (with multiple levels of detail)
  • Better with multi-object / multi-character scenes
  • Supports advanced functionality like image prompting and multi-prompts
  • Supports --chaos arg (set it from 0 to 100) to control the variety of image grids

[…] V4 is an entirely new codebase and totally new AI architecture. It’s our first model trained on a new Midjourney AI supercluster and has been in the works for over 9 months. V4 isn’t the final step, but our first step, and we hope you all feel it as the new beginning of something deep and unfathomable. Please use these incredible powers with joy, and wonder, responsibility and respect.
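
For reference, prompts go to the Midjourney bot as a Discord slash command; a V4 prompt using the new chaos parameter looks something like this (the exact values here are illustrative):

    /imagine prompt: paperboy on a bicycle --v 4 --chaos 50

A higher --chaos value asks for a more varied grid of candidate images; 0 keeps the grid closest to a literal reading of the prompt.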

Here is “paperboy on a bicycle” as created by Midjourney V4:

Paper boy, by Midjourney V4

There is, to be fair, a bit less of that Midjourney aesthetic, but it is, objectively speaking, a much better depiction (hand excepted). We get into the choices Holz has made in that regard, the business of Midjourney, and Holz’s journey to this point — including Leap Motion — in this interview. This, by the way, is what Midjourney imagined Holz looked like (using his Twitter avatar as a prompt):

David Holz, by Midjourney V4

On to the interview:

An Interview with David Holz about Midjourney, Generative AI, and Silicon Valley

This interview is lightly edited for clarity.

Holz’s Background

David Holz, I am incredibly excited to have you on the podcast, both because of the subject matter — you are the founder and CEO of Midjourney — but also because we did have the chance to talk offline previously and it was an incredibly fun conversation. So welcome to Stratechery.

David Holz: Thank you. Thank you for having me.

Tell me more, before we get to Midjourney and your V4 release, which we’ll definitely get to, about your background, and not just in tech. We’ll get to Leap Motion in a moment, but I want to go even further back. Where did you grow up? How did you get started in technology? What got you in front of a computer in the first place?

DH: Oh, that’s a good question. Gosh. I grew up in south Florida, Fort Lauderdale. I try to come up with a good reason for why I was around technology. It possibly goes even further back. My dad was a dentist. He had a dental office on a sailboat and he used to go around the Caribbean and South America. And then when he had me, he got a land office. But we were in a weird area of town where there weren’t really any other kids, so it was just me and old people. And so I would spend a lot of time by myself. And I guess somehow that ended up also becoming a lot of time around technology. I don’t totally know where the first computer came from. It was probably a computer my dad brought home from work; he had an early computer from being a dentist, and I just spent a lot of time on that.

And we got another computer, another computer, and eventually they tried to get me classes on computers. And so I was programming, I was programming Scheme, which is, I guess, an unusual programming language for teaching college foundations. But when I was five, six, it’s weird, I didn’t really like it. I don’t think I was really motivated to use computers until I learned that if I programmed the right sort of series of incantations, I could hack Jedi Knight Dark Forces to give me special force powers where I could shoot rockets out of my hands. So it actually came from reprogramming video games, which weirdly was the first time I became motivated to actually use the skills that I had.

I don’t think it’s that weird. I think both of those are very common origin stories. Number one, I wasn’t around a lot of kids or people who were interested in the same things as I was. It comes up again and again in these interviews. But then number two, in terms of modding video games, you need a motivation to accomplish something, because it’s not necessarily enough to say, “I’m going to learn how to program.” That isn’t particularly inspirational. It’s when you have an end goal in mind and you want to achieve something that you realize, “Well, I can achieve this if I want to.” I’ve heard that from lots and lots of folks.

DH: Yeah. I think I remember one of my first computer classes, they were teaching me the first things and I was like, “What is this for?” And they go, “So you can code.” I’m like, “Well, what’s at the end of the book? If I do the whole book, what’s at the end of the book?” And they go, “Well, you can do a tic-tac-toe program.” It’s like, “That doesn’t sound very interesting. I don’t care. Why am I doing this?” And I disconnected really early. But then the second it was Star Wars, where you could program to give yourself Star Wars powers, it was a totally different game.

Yeah, for sure.

DH: Very different motivations. But I was exposed to a lot of it, and so it was never scary. I was probably more of a scientist than I was an engineer or programmer. I was much more interested in science and doing experiments and exploring. I tended to wake my parents up with ideas for inventions, of how something could be better this way or that way, and I think it would often come down to, “Well, why isn’t it like that already?” And they’d be like, “Well, maybe you should do it that way.” “Maybe I will.” Luckily, my parents were always very good at never telling me a question was bad; they were very encouraging in spite of the fact that I’d be asking things that couldn’t really be answered very well.

Yeah, empowering you so that you can answer them yourself. Well, just to go back, one of the companies, or rather the company you started, was Leap Motion. For those who aren’t familiar, it was a new user interface that used hand motions and was very much associated with AR and VR, which I want to ask you about in a little bit. But you were a PhD math student at the time you started that, right? So it’s not like you were coming up to be a programmer. It wasn’t just science, you were all the way at the core, studying math.

DH: I think it was more that I wanted to understand the universe, and so I saw physics, and I looked up to the great physics minds of history, thinking, “These people really seemed to understand the universe. I want to be like them.” And so I started off by studying physics, and then somewhere along the line of studying physics, I also started studying math. So I ended up graduating with dual degrees. I had a math degree and a physics degree.

Then I had to go to graduate school and I was like, “What do I study, math or physics?” And they pretty much go, “You have to pick one.” And I just couldn’t quite handle this, because I was like, “Well, in physics it’s really more about reality over truth. We just want to have equations that fit the data. We don’t really care if the math is good.” And they go, “Yes.” And I go, “Yeah, and math is about truth over reality. It’s all about these different numbers, it doesn’t really matter if they have any connection back to the real world, right?” “Yes.” “So you’re telling me that for graduate school, I have to choose between truth and reality?” “Yes.” And I just couldn’t handle this as a concept.

So I went into applied math, which I thought was sort of a middle ground, and it was a little bit. There are these areas where you can use math where everything is the truest of the true and you can still touch the physical nature of the universe, but you end up having to go into really deep corners where the two align with each other really well. When they do align, it’s God-like powers, incredible knowledge. But I think that it was not quite right for me.

Around that time, I was actually doing a lot of things. I was working on this PhD in applied math at UNC Chapel Hill. I was working at NASA Langley on laser radars and atmospheric science and Mars missions. And I was working with the Max Planck Institute on some neuro-imaging projects, like mapping some of the early rat brains. It was around this time that I realized I was working on all these projects, and I was getting overwhelmed and asking myself, “Why am I doing all of this? Why?” And I realized it was because I didn’t really care enough about any one thing to just do one. And that’s when I decided to back way up and say, “I do want to actually focus on something, and I want to focus on one thing that I care about.”

One more thing I was doing at the time was the earliest prototypes of Leap Motion, just in my apartment. And I decided that was the thing that I actually cared about. I cared about it so much that I was willing to do it while doing all the other things, and I just wanted to focus on that. And the Leap Motion thing actually came from much earlier in my life too.

Leap Motion

Walk me through your journey with Leap Motion from the beginning, the idea, the concept. We’ll get to what happened when you actually started the company, but I’m curious about the origin.

DH: Leap Motion actually started in middle school. There was a typing class, and they told me if I could type 60 words a minute with a paper over my hand, I didn’t have to do the class, I could do whatever I wanted. I instantly did it. And then I decided, “What am I going to do? I’m going to try to learn how to do 3D modeling.” And I tried to learn how to do 3D modeling, and it was really, really hard. And it was weird for me, because I was really good at computers and there were lots of good 3D models being made on computers. And so I didn’t feel like there was anything wrong with me, and there was nothing wrong with computers. I felt like the problem had to be something in between the two, which was the way they interacted via the human-computer interface. So I’m 12 or 13 and I was like, “Something’s really wrong, and it’s the way that people interact with technology. There’s a bottleneck, there’s a limitation there, and that should be better”, and it stuck in my head and I always wanted it to be better.

I think sometime around the end of high school, I was like, “Maybe we could track hands with these sorts of multiple views, with algorithms.” I started designing stuff like that. And so it stayed with me through graduate school, as I was learning a lot of the skills to play with that on the side. And then finally, when I needed to quit everything to focus on one thing, I was like, “Well, I really care about humans and technology and the way they interact, and I want to try to start a company focused on just solving that problem, the human-computer interaction problem. That’s the one thing that I’m really passionate about.” And so that’s why.

It’s interesting. I feel like there’s a similarity here between your frustration in having to choose between truth and reality, math and physics, and this bit here where one of the frustrations with computers is they are so literal. They literally do exactly what you tell them to, which is why the coding process can be so frustrating. It’s like, “Why is this not doing the right thing?” It’s like, “Oh, because actually I told it to do the wrong thing and it’s doing exactly what I told it to do.” I sense a pattern of you seeing this sort of disconnect. In this case it’s, “Oh, me as a human, I know what I want to do. I can’t get the computer to manifest and do that. So there has to be a better way to interact with it.” Just like with, “Of course in the long run, math and physics have to resolve, but there’s something missing. There’s some missing piece to figure out how they link together.” Do you see a similar sort of line there?

DH: Yeah, I mean the thing I used to say was that the biggest limit in technology isn’t the size or the cost or the speed, but how we interact with it. And it doesn’t matter how smart we are, how good technology is, how we work together fundamentally determines what we can do. And so that was the thesis for Leap.

So Leap, this idea that you could basically use your hands, and you were using cameras and transferring that into a UI, this was over a decade ago.

DH: To give you context, we were doing mid-air gesture control in 3D before Windows supported touchscreens. Touchscreens were just starting to exist basically around that time, and I was like, “Well, touchscreens are awesome, and then hand interfaces will come after touchscreens, and it’s all going to be awesome”. But we were actually commercializing gesture stuff before touch had actually hit Windows or big tablets in general — I think we actually came before the big tablets too, the iPads and stuff — so it was very early. The general thought was that a lot of our intelligence isn’t abstract and symbolic, but physical and intuitive, and trying to build user interfaces more in that sort of physical, intuitive, manifested reality is one of the ways to leverage a lot of locked-up human intelligence. It’s the reason a lot of people have a hard time coding, but everybody can manipulate stuff in the real world.

So is this just a matter of you being way too early?

DH: That’s zooming way out. I think there were a lot of things that we didn’t know and that nobody else knew. So there were certain mistakes that everybody made. For example, there was a general idea that when they brought touchscreens to Windows, it would be awesome. And then it turned out to not be awesome. And then we hit the same thing, which is that we did a lot of just control stuff interacting with Windows, and that was also not awesome. It was pretty much exactly the same as touchscreens where Windows wasn’t really made for touchscreens. Things were a little small, the applications weren’t really that compelling with it, but touchscreens were great on smartphones and we knew that that succeeded.

And so around that time, we thought that maybe the same was true for hands: we’d either have to go to different devices or remake PCs. So we decided to go to different devices; we went to virtual reality and augmented reality. VR and AR felt like they would be to physical hand stuff what touch was to mobile, and I think there are a lot of arguments for that still being true. One of the challenges, of course, is that the market was going really fast and then it slowed down, and there was nothing we could do as part of that market to bring it over the finish line. But there are a lot of other challenges too. There are a lot of things in hindsight we feel like we could have done that would’ve been independent of VR and AR and would’ve been successful. But the second we wanted to go to the new devices of the new markets, our fate became linked to the general scale of those markets.

The Current State of VR

So I think this makes a lot of sense. A new sort of interface does come along with a new paradigm. That new paradigm has to have a completely different software stack that sort of assumes the presence of this new interface. And I think that mobile and touch is a really great example. And yeah, in the case of VR, the fact that you’d want to use your hands to control and manipulate stuff also makes a ton of sense.

What’s your view of the VR space now? Obviously Meta is the dominant player in the market right now, there are still lots of other headsets that can be tethered to your PC, and there are other products; Apple is rumored to be building something. Are you excited about what’s happening or are you bummed out about the way things are going?

DH: I think it’s fair for me to say that both I and probably every other pioneer of the space are disappointed with the present state of both AR and VR. I think we all saw a lot of potential in the early days and honestly, so did everybody else. But I think one of the challenges was that everybody got so excited that we effectively saw premature consolidation in the field. In general, that sort of consolidation is really good when a lot of the products and markets have been mapped out already, because then you can just push to scale. But when we were still at the stage of needing to try, and fail at, a lot of things collectively, the larger companies just haven’t been as good at that. Effectively what we got is instead of having 10 VR companies coming out with different headsets of different designs, we got one. And the one design was not the most adventurous or the most coherent design. It was conservative.

Right. It has considerations about scale built into the design; it was basically trying to scale too early. This is really interesting. I go back to mobile, and there used to be hundreds or thousands of new phones designed every year. You obviously had BlackBerry, which was a pretty significant success and obviously didn’t turn out to be the right form factor in the long run, but it pushed pretty far and it reached the limits of what you could do with that form factor. And then Apple comes in and yes, they completely redefine the space, but they certainly benefited, from observation if nothing else, from everything else that was out there. With VR, it’s like, “Okay, let’s see, everyone’s already down to Facebook or Apple,” and you wonder, “Where’s the BlackBerry in this space? Where’s the Nokia or the Sony or whatever, just pumping out a million different designs trying to see what sticks?”

DH: Yeah, it’s hard. I think that in smartphones, it became keyboard or no keyboard, physical keyboard and no keyboard. That was the design decision. But I think VR and AR have a lot more decisions than that. You can go bigger form factor or smaller form factor, higher price or lower price. You can be more transparent or more opaque. There’s a bigger design space to explore. And I think that’s because smartphones always existed within the design space of a phone. So it can only be so big-

The form factor was already defined. Yeah, that makes sense.

DH: In this case, that’s not there. The space is much larger, and I think VR and AR really needed 12 companies basically making products. And then I think in two generations of 12 companies, we would have been way closer to something that was really compelling for everyone. But instead we got one product, maybe two, and with three to four-year generation cycles, which is almost the worst possible thing that you could do in an innovative market. Nothing would have ever happened if that’s how any other area of technology worked.

It’s super interesting, because one of the criticisms of governments moving into a market is the crowd-out effect: you have an entity with relatively unlimited resources, but there’s a ton of inherent conservatism that comes with that approach, and the concern is that you crowd out innovation, because no one will even bother investing in an alternative, and you don’t get the iteration; you get a real strong fear of failure instead. Basically, the argument here is that because Facebook bought Oculus so early, and everyone could see they have so many resources and are so clearly committed to this space, everyone else, from venture capitalists to would-be entrepreneurs, was just like, “Well, there’s kind of no point in competing here.” And then you end up in a state where you have a relatively slow-moving entity leading, which is just a reality of a very large company with a lot to lose being in a space. And that’s sort of the critique that you have of the space generally.

DH: Yeah. Although I would point out that it wasn’t only Oculus that was acquired; there were probably two dozen other companies that were acquired by Google and Facebook and Apple and Microsoft. There was effectively an ecosystem prior to the big acquisitions where if you needed a camera system, you could go to someone; if you needed a scanning system, you could go to someone. There were different startups for every element of the stack. And then all of those went away, so if a new startup tries to come into play now, it doesn’t have any other startups to work with.

You have to build everything on your own, which is a gargantuan and difficult task.

DH: And all the startups were talking to each other. There’s a little book about Oculus, and it tells the story of how the first employee at Oculus spent his first day visiting Leap Motion. That’s actually how they closed him: he said, “You should really work with Leap Motion”, and Oculus was like, “Well, if you join today then you can come with us to Leap Motion.” He goes, “Okay.” So it really was deeply intertwined, all of these things. We were all talking to each other constantly and bouncing ideas off each other. It was pretty open. There wasn’t a lot of secrecy between startups. It wasn’t seen as particularly competitive, because we were kind of like, “Boy, either we figure this out together or we don’t.”

With the big companies though, there are huge firewalls. The second something is inside a big tech company, you don’t know about it. No one will talk to anyone. None of the big tech companies talk to each other. And so you lose all of the cross-pollination. You do get bigger teams, but a lot of the big teams, it’s kind of a different lifestyle and different group of people who join too. I knew a lot of people who were great, who are like, “I would never work for a startup, but I would work for Facebook.” And that’s when they entered the field. And that’s very different, and you start feeling very different forms of culture very quickly around these projects.

The Problem with Tech’s Big Companies

It’s kind of interesting to trace this evolution over tech history, because back when Microsoft was large, Microsoft was famous for not buying anyone. They’re like, “No, we’re just going to mimic it and build it ourselves.” And in retrospect, that was actually very freeing, because there was a large space and you had the advantage of being small versus being the big entity. Then I think Google, and probably YouTube was an example of this, it was like, “Look, it’s already invented. We can take it to scale.” And that maybe was an optimal place, because the ability to take a new invention and immediately expand it to billions of people is pretty incredible. Then Facebook pushed it even further by acquiring Instagram; again, the product was fully realized and then they took it to scale, they monetized it, all those sorts of things.

But now what I’m hearing from you, and I think it’s a very compelling argument, is that it’s kind of gone too far, because they’re buying before the products are fully realized, while they still need experimentation. We’ve gone from the Microsoft model, where they waited for the product and then mimicked it, to the opposite extreme of, “No, we’re just going to do it ourselves.” And it turns out big companies are bad at doing that.

DH: Yeah. I mean there’s an argument that you want to buy something before it gets big. Facebook did that with Instagram to some extent. There are a lot of reasons to buy a company. One of them is to literally just take it off the market and to have all the people working on that new market at your company. I think there have been a lot of weird realizations at this point that these companies can effectively buy out companies before they become competition, and that they can effectively hire up everybody they can to make it harder for anyone else to build teams. I think that’s one of the reasons why you’re seeing all the layoffs right now: the honest truth is that they know they don’t need those people to function as a business. That was never the goal. They didn’t hire all those people because they needed those people. They hired them because they wanted to take them off the market to some extent. And that’s kind of a dirty truth, but I think there’s something there that is worth admitting.

So it’s almost a reason for optimism. There’s always this cliche that, “Oh, all these people who are laid off will start new companies,” but in this case, it might literally be the case that they will, if not start companies, at least be open to joining them, and the talent opportunity for startups is going to be the best it’s been in a very long time over the next couple of years.

DH: I think that there’s a lot of innovation that’s going to happen, and I think everybody benefits from that. Obviously it’s a balance, and I think the balance has been off for a while, and maybe some of these things are starting to rebalance now, and that’s potentially really exciting. I don’t know if AR/VR is going to rebalance for a while though, that might take a few more years.

Yeah. Maybe Mark Zuckerberg has to throw in the towel first. I’m not sure if that’s going to ever happen.

DH: Yeah, I mean a lot of people are just quitting now from all these AR/VR teams at different companies. But still I think there’s a difference between quitting and then deciding to go into the space again. And that’s not just a question of engineers, it’s a question of investors. Both investors and engineers have to decide together that now is the time for us to try to build AR and VR companies again.

Such a bias has developed against hardware, and a belief that it’s not an investible area. Is that a fair bias? Is it really getting too hard, or do you think that’s mistaken?

DH: Generally, I don’t think hardware is hard. I think hardware is hard to get started, but once you have it started, it’s actually one of the easiest machines to keep going. It doesn’t have a lot of the disadvantages of software, where software is always on the edge of oblivion effectively; it’s always one line of code away from everything collapsing. With hardware, you get to start over each time, but you have all the supply chains and you have all the tools ready. And so that’s constantly being made fresh. I think that hardware actually, in a lot of ways, is a great industry.

I think one of the challenges with hardware has been effectively building platforms before products and that might be the real thing. That might be the single biggest thing. With Leap, there was a large period of time where there was this idea that, “Oh, Apple’s a platform or AWS is a platform. These are all platforms. You should build a platform.” But the fact is I don’t really know if anyone’s ever done that and been successful. They’ve always built a product first and then the platform came from there. And so I think maybe it’s possible that this sort of platform before product play — which was, I want to say more of a thesis of investors than a thesis of the entrepreneur community — that false thesis maybe led to a lot of the hardware companies that were not successful.

I think that’s really compelling, because I think you could say a similar thing about Meta’s Metaverse play, which is they’re so focused on it being a platform when, look, you have to make a great product first. I wrote an article ages ago about Apple relative to Microsoft. I said, “Look, the App Store would probably be better if Microsoft administered it. Microsoft’s good at cultivating developer platforms and having good policies about developers and growing the broader ecosystem, but they can’t make a consumer product that anyone wants to buy.” And that’s the price of entry. And Apple, as much as you want to critique or criticize their platform policies, and I certainly do it enough, you have to recognize they remain a fantastic product company, and that’s why they have platform power: because they made a great product first.

DH: The pressure on startups during that last generation of hardware to be platforms was huge, hard to overstate. Leap Motion had an app store. We had 300 apps in the app store. The guy who ran our app store ran the Apple App Store at one point. We had these things, but it turns out having 300 shitty apps doesn’t substitute for having a single good product. And in some ways we actually did have a really good product. Developers loved it, musicians loved it. There were a lot of really cool opportunities to actually hone in on those areas and make a really good product, but there was this huge focus on effectively being Apple from day one.

And it wasn’t really just us. I saw this applied to a lot of the other companies too, where they start having these app stores and these ecosystems, and you’re always kind of wondering, why are you doing this? Your widget is good, just focus on the widget. And I think that was just a mistake. I think the industry needs to acknowledge this as a mistake, that we have to build good products before we build platforms, and then just move on from this sort of false thesis, which is partially what I tried to do at Midjourney.

The Midjourney Story

I was waiting for you to pause, because I’m like, “This is a natural transition.” So Midjourney, it’s incredible. The news, and the reason I have you on this week, is you just launched V4, which dramatically ups the realism. One new feature is that you can use photos to prompt it, so I have this great image of David Holz podcasting, which I think looks pretty accurate; I’m looking at you right now. But tell me about the birth of Midjourney, and how you’ve been thinking about it as a product, not necessarily a platform. How have your learnings from Leap impacted the decisions you’ve made along the way?

DH: There are so many learnings, a lot of lessons. Leap actually had a pretty good business just selling a widget that people liked. The challenge was in trying to do the app stores and a full ecosystem and trying to be an OEM component vendor, effectively. And so in this case, we have a product that obviously we could license: we could license models, we could have an API, we could let people make mini games. But you know what, no, we’re just going to focus on the core experience of a person interacting with a computer, of imagining through a piece of technology the way a person might travel through a vehicle. And we’re going to try to look at what this thing is, at a really deep level, from day one. Don’t let someone else market it or put it into existing products before trying to figure out what it actually is.

There’s a temptation whenever you see something new to just stick it into whatever exists: take the gesture control and put it on Windows, or let’s do radio shows on television. There’s always this temptation, and I think that’s generally always been wrong. And so taking your time and just watching people use it and trying to discover the inner essence of the thing, I think, is the way to true art. When cinema was first invented, the people that ended up defining cinema were the people trying to figure out what this thing was in itself, not how to make a play with a video camera. And so trying to understand the nature of a new medium and of a new frontier before just shoving it into other stuff is, I think, a really important process of discovery.

And then usually in the act of doing that, you learn a lot of things that you would’ve not been able to figure out if you went in trying to make a particular product on day one. So first it’s very much an exploratory phase, an un-opinionated exploratory phase. I think that’s very important. With Leap, I think we didn’t do that until too late in a lot of ways. I would test things on myself, because I’m a human, but it turned out that I’m the son of kind of a surgeon, I’m fairly dexterous, and I was fairly young. The raw physical dexterity and the way that people use feedback between vision and haptics was not the same for everybody. And when we finally started doing really strong user tests at Leap, a lot of our stuff changed.

And so in this case, it’s like, “Oh, I can test things on myself. I find this compelling, but let me test it on people.” And we did our first Midjourney test a little over a year ago now, with 200 people. We immediately saw huge things we didn’t realize. 90% of people couldn’t learn how to tell it what they wanted. So that was like, “Okay, we’re going to lose 90% of people if we don’t solve that.” And then a lot of people liked using it together, and it was sort of like, “Oh, well there’s a social thing. Maybe people don’t really want to just go into a corner and imagine photos of dogs.” And so, okay, there’s a social element, there’s this teaching element, there’s this communication and language element, and that kind of informed our first version. And then we tested it again and again and again.

What prompted the interest in the AI-generation space in the first place? Was there an aha moment, a “this is what I need to do next” once you had sold Leap Motion? Or was it just a dawning realization?

DH: One of my goals at Midjourney is to build new human infrastructure; I think the world’s going to need a lot of new things, and we need infrastructure with which to build them. And so I think a lot about building new forms of human infrastructure, new pillars of infrastructure. So I needed my themes, and my pillars were reflection, imagination, and coordination. You have to reflect on who you are and what you want. You have to imagine what could be. And you have to coordinate to get there. And so I was actually working on projects for all three with a team. And because that’s the lens that we were using on everything, we started to see a few breakthroughs happening in the field of image synthesis, and it felt qualitatively different than anything that I had encountered before in the field of AI.

It helped that everybody was in San Francisco, all the AI people, a huge percentage of them. We’d gather up and we’d talk at parties, and I would say things like, “Boy, this paper on the diffusion model, this feels really different. This is not like the GANs.” And everybody would talk about it, and we’re like, “Yeah, no, it’s not the GANs. This is a totally new thing. There’s all these ramifications.” And we kept coming to the same conclusions, that the technical ramifications were pretty serious. And so the questions were what the human ramifications were, and what the user interface implications were.

And so even though I don’t think I necessarily am the one to help the world figure out how to make a fake dog photo, I do think that I have a unique ability to think about the human side of it. What does it mean for a human to imagine through a computer as they would travel through a car? And if people somehow interact together in the process of imagining things, if we have these co-imaginative spaces, what is it like to create these sort of shared imaginative environments that are somehow intermediated by machines? That all feels like it is in my wheelhouse.

Other things that were in my wheelhouse: I love it when a piece of AI doesn’t have a good cost function. Meaning that in general, if you have a really good way to measure whether or not something is working, it’s actually pretty easy to solve in AI. But the things that are hardest and most interesting are the things where it’s like, “Is that a good dog photo? I don’t know.” It’s really interesting. It’s like, “What is a good image? What do people want out of an image? What do they expect? How do they speak? How do the words they use for what they’re trying to come up with affect the thing that they actually want?” There’s this almost very philosophical and vague problem to solve. And I really love that.

That turned out to be a big thing that I had to get good at for Leap Motion, because “What does it mean for hand tracking to work well?” was a similarly difficult question. It was always in the context of: is it good for interaction? What does it feel like? There were a lot of these sorts of human elements and I got really good at that. And so I knew I could help there. I knew I could help on the design side. And I mean, I’m somewhat of a designer. I had a professional design business in high school, I did design a lot at Leap, and now this is a little bit design. I have enough knowledge of AI, I have enough knowledge of user interfaces, I have enough knowledge of design that I can take all three things together, and I think I have a lot to add.

I’m sensing a theme here; even the name Midjourney is about being in the middle. What really stood out to me is the fact that you don’t know if an image is good or not, or a computer can’t know. It goes back to the product bit. How do you make a good product? Great products have some sort of opinion and vision that drives them. And only once you have the great product and get the large user base do you start getting actionable feedback to improve the product on the edges. But so much of product development is trying to get data too early, trying to pull something out of user surveys or wherever it might be, when there needs to be vision. And I find it really striking also that you had these goals in place before you settled on doing visual AI. That’s the opposite of what you would think would happen for a company in this space.

DH: Well, I had this goal, which was that we needed to somehow create a more imaginative world. I mean, one of the biggest risks in the world, I think, is a collapse in belief: a belief in ourselves, a belief in the future. And part of that I think comes from a lack of imagination, a lack of imagination of what we can be, a lack of imagination of what the future can be. And so this imagination thing I think is an important pillar of something that we need in the world. And I was thinking about this and I saw this, and I’m like, “I can turn this into a force that can expand the imagination of the human species.” That’s what we put in our company mission now. And that felt realistic. So that was really exciting.

Well, your prompt is “/imagine”, which is perfect.

DH: So that was kind of the vision. But I mean, there was a lot of stuff we didn’t know. We didn’t know, how do people interact with this? What do they actually want out of it? What is the social thing? And there are a lot of things: What are the mechanisms? What are the interfaces? What are the components that you build these experiences through? And so we kind of just had to go into that without too many opinions and just try things. And I used a lot of lessons from Leap here, which was that instead of trying to go in and design a whole experience out of nothing, presupposing that you can somehow see 10 steps into the future, just make a bunch of things and see what’s cool and what people like. And then take a few of those and put them together.

It’s amazing how you try 10 things and you find the three coolest pieces, and you put them together, it feels like a lot more than three things. It kind of multiplies out in complexity and detail and it feels like it has depth, even though it doesn’t seem like a lot. And so yeah, there’s something magic about finding three cool things and then starting to build a product out of that.

Building Midjourney

Walk me through this because Midjourney was just publicly released this year, but when did you actually start? There seems to be a disconnect here between your articulating this value of discovery and seeing how communities respond and all these pieces, versus you have to actually go and train a model, which in some respects is doing work ahead of time. So when did you really get started? When did you start training your first model and what’s the timeline here?

DH: About 16 months ago. So we probably spent two months just using it ourselves and playing with it ourselves to form opinions.

How long did it take to train your first model?

DH: There were enough open source things that you could put stuff together to get a sense of the experience without having to train anything, which was really great. Very quickly we started to do more custom stuff, but we didn’t need to train anything to start to get a sense of the space. A lot of it was thanks to this woman named Katherine Crowson; she was an independent researcher in the middle of nowhere, just training stuff. She was one of the pioneers of the space. She didn’t work for anybody, which is really cool. So, that was kind of special. And there were some open source things from OpenAI, called CLIP, that didn’t make images but could help with the language stuff. So last year there were a lot of experiments made by lots of different researchers that were taped together to show that you could make an image, and it was kind of compelling. And that’s honestly what kind of got me; I was like, “This is obviously really cool and it’s obviously not what it could be.”
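
To make CLIP’s role here concrete: CLIP doesn’t generate images, it scores how well an image matches a piece of text, and that score is what those early taped-together experiments used to steer a generator with language. Here is a minimal sketch of the scoring step, using the openly released weights via the Hugging Face transformers library (the model name, file, and captions are illustrative, not Midjourney’s actual pipeline):

    # Score how well one image matches candidate captions with CLIP.
    # CLIP makes no images itself; it only measures image-text agreement,
    # which is the signal early "CLIP-guided" generators climbed.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("candidate.png")  # e.g., an image a generator proposed
    texts = ["a paperboy on a bicycle", "a big fluffy dog"]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    print(outputs.logits_per_image.softmax(dim=1))  # relative match scores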

And so you could sort of parallelize the work then? On one hand you could start to figure out the community aspects and user interface aspects. The bit about using it in Discord is brilliant to my mind, just for the speed of iteration it allows, and you’re not spending time building an entire user interface. Maybe in the long run Midjourney will have its own website, its own user interface, but for now you get a user interface for free as you figure this bit out. And then at the same time you say, “Okay, we’re starting to figure this out. We really should train our own model.” And you just came out with V4, which I think you said took nine months to train.

DH: Yeah, it took a really long time.

Right. So you started training the V4 model that just came out before Midjourney was even released publicly. But because there was this open source bit and you could pull enough pieces together, you could sort of parallelize that discovery.

DH: But maybe what happened was a little bit more like this: we started off with our first user test, and it had two algorithms. There was the sort of open source stuff, which made decent looking pictures in 20 minutes. It was quite slow back then, 20 minutes to get a picture. My thesis was, “We don’t need image quality that good, but we want speed.” And so I worked with Daniel, who was one of our first engineers, and we made a version that could do 15 seconds. So I was like, “Okay, let’s do a user test, 200 people, and there’s both algorithms, there’s a 20 minute one and there’s a 15 second one, we’ll build a good experience around both and we’ll start to see what happens.” And it turned out people had a really hard time using the higher quality one; they liked the one that was lower quality. But there were actually a lot of false signals around that. While we thought that speed was important and that quality wasn’t important, we’ve kind of learned over time that it’s not that speed is important and quality is unimportant, it’s that there’s a certain window of speed that allows people to learn.

Right. So they could iterate and do different prompts and things along those lines.

DH: If it’s too slow, you don’t learn. And it turns out if it’s really fast, it also can fail to feel good in its own way. And so we actually ended up making things a little slower. It turns out 10 seconds is too fast; 60 seconds turns out to be around the time where they can do a few things, get them all back, and it feels good. For a period of time we had two versions, a 60 second version and a 10 second version, and basically nobody cared about the 10 second version, because the 60 second one was prettier and 60 wasn’t too slow.

Right.

DH: Basically, there’s a golden zone, and we sort of just focused on quality within that zone.

Midjourney and Discord

DH: The Discord thing was interesting. We knew social was important, but actually the reason why we were in Discord was because we were a remote team and we built a bot that helped us just debug the system inside of Discord with the team. So the team was talking to each other and we’d be like, “What about this parameter?” And then we’d be like, “Oh, let’s make an image.” We’d do that. “What about that parameter?” We were using the bot to actually drive the development of the algorithm in the remote setting. And then from there we were like, “Oh, this is pretty fun. Maybe we should let somebody else use this.” And then we did a user test of that on Discord with the same bot. And everybody really liked it, and we kind of saw people spitballing ideas off each other in real time and creating these imaginative environments and sharing knowledge. And that seemed so good. I immediately was like, “Okay, we have to go all in on this. Let’s build out the Discord bot as a team.”
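
As an aside, the plumbing such a bot needs is small, which is part of why iterating inside Discord was so fast. Here is a minimal sketch of a bot of that shape in Python with the discord.py library; the /imagine command name mirrors Midjourney’s, but generate() is a hypothetical stand-in for whatever image backend a real bot would call:

    # Minimal Discord bot sketch: a slash command hands a prompt to an
    # image backend and posts the result back into the shared channel.
    import discord
    from discord import app_commands

    client = discord.Client(intents=discord.Intents.default())
    tree = app_commands.CommandTree(client)

    async def generate(prompt: str) -> str:
        # Hypothetical stand-in: a real bot would call a model server here.
        return f"https://example.com/{abs(hash(prompt))}.png"

    @tree.command(name="imagine", description="Generate an image from a prompt")
    async def imagine(interaction: discord.Interaction, prompt: str):
        await interaction.response.defer()  # generation takes a while
        url = await generate(prompt)
        # Posting to the channel makes the prompt and result visible to everyone.
        await interaction.followup.send(f"**{prompt}**\n{url}")

    @client.event
    async def on_ready():
        await tree.sync()  # register the slash command with Discord

    client.run("BOT_TOKEN")  # placeholder token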

Yeah. I mean, what’s so magical about the Discord experience is that all these pieces go into it. So number one, it takes a while, but it’s not dead space; you see the image emerging. So it’s very compelling to just sit there and say, “Wow, I think it’s getting there. Oh, what’s that going to be?” And then it sort of comes out. But then if you’re in one of the public channels, while you’re waiting for your image, you’re seeing all these other images come by. Number one, it’s inspirational, because some of them are absolutely incredible. Number two, you see the prompt they used: “Oh, they did this XYZ thing, I should try that on my next one.” And by the way, you can just put in another prompt while you’re waiting for yours to come out. There’s communal discovery and anticipation all wrapped into this sort of 60 second experience. That excitement of being in one of those channels is really compelling.

DH: Yeah, that was a surprise. I think that kind of naturally arose in the first test. When I saw it, I was like, “We have to go all in on that.” And yeah, I mean everyone’s like, “Are you crazy? You’re going to do everything in Discord?” I’m like, “People like it and nothing else matters.”

It’s funny, because you get a great mobile experience for free. You get all this syncing for free. I mean, there’s an aspect here where in tech, over time, stuff goes up the stack, and we’ve been in the app era for so long, where you have to have your own app, you have your own website, and you have to do all the syncing on the backend. What’s interesting about Midjourney is not just the image bits; I think the user interface bit is actually really compelling. This is arguably the biggest and most important chatbot app, where you’re assuming the presence of Discord and sitting on top of that. And it’s a higher level of abstraction than the app paradigm that we’ve been stuck at previously.

DH: I think it’s fair to say it’s probably the biggest chatbot app ever. It’s been hard for that market to work.

Yeah.

DH: I think a lot of chatbots try to be a person, which is bad. A chatbot should not try to be a person. And so in this case there were a lot of people who wanted to make the icon for the chatbot a little robot. And I was like, “Absolutely not. This is not a robot. You are not working with a robot. It’s a vehicle.” We put a little sailboat on there. It’s like, what do you see when you’re on the sailboat?

You’re “mid-journey”, right?

DH: Yeah. There’s no conversation between you and the computer, but then there is a conversation between you and other people.

Yes.

DH: I think that the Midjourney experience would not work at all if it was just you talking to a chatbot in a room by yourself, but the second that it’s in a room with lots of people, it becomes really interesting. And this is another thing that I’ve never seen before with chatbots: a chatbot in an environment with multiple humans that’s not trying to be a person too. It’s almost like a collaborative command line interface, or collaborative search, or some kind of collaborative space. And that’s really interesting.

No, I have to congratulate you, because you made this statement before that the important thing is being open to discovering what fits the new paradigm. An example I always go back to is when advertisements first went online: they just plastered an advertisement next to the text, because that’s what you did in newspapers. But why would you recreate that? Whatever works in the future has to take advantage of the nativeness of the interface that it’s on. So feed advertising was the example: you have an infinite feed, which you can’t have on paper, and you can sort of put advertisements in it.

In this case, all the original chatbots were too literal: you’re literally chatting with a bot. You see this every single time, the first version is too literal, and what you have to figure out is what’s actually uniquely made possible in the new space. And what this is, is users chatting, but with an ongoing prompt, not just a prompt for image generation, but a prompt as to what to chat about. If you get a bunch of strangers in a chat room, it’s going to devolve into insanity and hate speech and slurs and all that sort of stuff, but that doesn’t happen in Midjourney channels, because there’s something to talk about, which is this bot generating incredible images every few seconds.

DH: Yeah, it’s kind of like a bunch of people in a room and ideas are swirling in the room. “What if this?”

I think we’re going to look back at Midjourney and this is what chatbots will be in the long run: it will not be you chatting with the bot, it’ll be you chatting with other people, with a continual prompt of what to talk about.

DH: Yeah, I think that’s really cool. I think one of the things I also really was surprised about was how it helps a regular person who wouldn’t understand the product, actually understand the product better.

Yeah. And you don’t have to feel stupid and ask, “How do I do this?” You literally just sit and observe, which is how humans learn, generally.

DH: Well, but also, when I did user testing, it was kind of unbelievable. People ask, “Don’t you want a person to discover the product by themselves?” We would do this and we’d be like, “Okay, here’s a machine. It’ll let you make a picture of anything you want, anything you can imagine. What do you want?” And they just go, “Dog.”

And it’ll show them a photo of a dog and they go, “Okay.” And it’s like, “Well no, come on.” Because you’re there at first: “What do you want? Come on, a little bit more than that.” And they go, “Big dog.” And then I keep questioning and they go, “Big fluffy dog.” And at the end of it they’re so uninterested; it’s like, “This isn’t interesting, why would I care about this?” But then you throw these same people into that environment all of a sudden, with complete strangers, and they go, “Dog.” And someone else goes, “Space dog”, “Space dog with lasers”, “Space dog with lasers and angel wings”. And all of a sudden this person’s like, “Oh my god.”

Space dog with lasers and angel wings, by Midjourney V4

DH: They’ve been put into this imaginative environment and it starts to change their beliefs about themselves and what they can do and all of a sudden it’s creating an imaginative environment that actually makes people more imaginative too.

Yes.

DH: Which is really weird. If you had said to me a year ago, “David, imagine a product onboarding experience where you throw somebody into a room with other people and it makes them more imaginative,” I would’ve had a really hard time. It’s hard to imagine that. What would that be? And it’s like, yeah, it’s really cool that it kind of emerges. When you throw people together, there’s a lot of user interface stuff that’s kind of already solved, because we have user interfaces to other humans. So there’s a certain argument that letting that happen, letting all the good stuff there happen and then getting out of the way, is very powerful. So I don’t know. That’s been really, really interesting. I mean, it’s kind of gotten to the point where it’s the largest server on Discord now. Discord isn’t really meant for something of that size per se, it wasn’t built for multimillion person servers. And then also, the demographics are not necessarily Discord demographics.

Right. It’s not a lot of gamers in there. You tell the story of the trucker at the rest stop, right?

DH: Yeah. There’s this trucker guy, he’s in his 50s and he uses Midjourney from his smartphone at the pit stops. That’s not the normal Discord user. In fact, I think we found in our last survey that there are as many people over 45 as under 18 on the Discord server, which is completely unlike the normal Discord demographics. Also, the way that young people use it is actually totally different than the way older people use it. So there may actually be more than one product at play, but it’s hard to tell sometimes with all the people using it. The older people actually have more words and more experiences to create stuff with; writers tend to be older. And then younger people almost don’t have the words. They literally tell me, “David, I don’t have the words. I know I have the image but I don’t have the words.”

So sometimes they’ll actually make images out of other images, and it’s a very different process, which is interesting. The most important part is just to have ideas and hypotheses and things you want to test, but don’t go in there with such a strong opinion that you don’t see the true nature of the thing. In fact, the most important part of building a product is just being open to discovering the true nature of the thing.

Yeah.

DH: And there’s something so opinionated about the average product creation experience, where from day one you have to know your market and you have to know the six user steps to the thing. People are trying to create all this complexity, and it doesn’t really matter; all that matters is that people have an experience that’s emotionally resonant, that makes them feel an emotional response. And to some extent I think if people feel an emotional response to your product, you’re like 80% of the way to having a business.

Midjourney’s Business

How much of this ability to take it slow, to be so clearly focused on and invested in the discovery process, to not necessarily be building a platform from day one, comes from the fact that incredibly enough, and it blows everyone’s mind when they hear this, Midjourney is not venture-funded currently? Was that part of the reason to go a different route?

DH: I had a lot of traumatic experiences coming out of Leap Motion. So my first goal was to design a lab that did not require investors from day one to get started and to function. And so it was just kind of designed in a very particular way of, “I want to have something where I don’t need to go to anybody and explain to them what my business is in order to have money to start to do my research.” I have goals and I have ways of investigating those goals, and I’m going to test those on people, and at some point I’m going to charge for something and figure it out. It was a lot more like starting a business than starting a venture business.

That was really interesting; I'd never done that before. There are a lot of advantages to having venture funding. But in this case, when I first started in tech I was a random person with a cool resume and a co-founder, but I didn't have any credibility to work on crazy technology, whereas now I do. So when I needed to find a cloud vendor to give me 10,000 GPUs, I could just email the head of the cloud vendor and say, "Hey, this is David doing a thing." And they go, "This is David, he is doing a thing," and they could give me all the resources. I didn't need to have venture funding to do that because people effectively knew who I was. It was a huge advantage just being a known factor in technology.

Most people use that to get more funding. But in this case you were able to use it to get capabilities, particularly from the cloud vendors.

DH: I could build the team, I could get my team together, because I know whatever it is I'm doing, it's going to be cool, and I would be able to figure it out. And then I got the vendors to give us capabilities. I think everybody in the tech field kind of knows that whatever David works on, it's going to be cool. They might not know if it's going to be a business, but certainly from Leap Motion, they know it's going to be one of the coolest things. So maybe I had something to prove from a business standpoint, but I didn't have anything to prove from a technology standpoint, and everyone will help you build something cool. I could have asked investors; I'm sure if I had said, "I'm making a thing, give me a check," they would've done that too. They are very supportive out here. As a second-time founder, I could've gotten venture funding with no questions asked.

As a second-time founder, though, you used that advantage and, instead of going to investors, went to vendors, which I think is really interesting. But Midjourney is definitely making money now, right?

DH: Yeah (chuckles).

I think the chuckle says it all. I have an enterprise account, which, at least when I bought it, was $600 a year. For individuals it is, I actually don't have the price in front of me, but if you want to make more images than a free account allows, it's relatively cheap, and people just want to keep making more images, right?

DH: Yeah. It was pretty tricky to find the right business model for this. On one level, it’s a very honest business model. You want to make images, they happen on the cloud and we take a profit margin.

That's what it is: $10 a month for basic and $30 a month for standard.

DH: One of the reasons why I wanted to make this a paid business was because it felt like the most honest business model somebody could have. I'm running infrastructure, you're paying for that infrastructure, and there's a profit margin to pay a team. So I really like that. Figuring out the right price and the right sort of features and building that up was actually quite challenging. The $10 and $30 tiers, and there were a lot of other weird things with unlimited relaxed-mode images. That was a weird journey that I had not been through before. We had no idea what to charge for anything.

I had to check the prices because they’ve already iterated since I was last looking at them a month ago.

DH: Yeah, I think the first month we lost money, because the power users, the whales, just wiped out all of our margin and then took us negative. The second month we built in safeties to prevent that from happening, and then we started making too much money, so we tried to push the margins down, because we don't really need margins that high at day one for something like this. We just tried to push it into having better service for all the users. And then we can turn that knob until we're at a place where we feel like we can pay for R&D and stuff.
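
[To make those "safeties" concrete, here is a minimal sketch of the kind of mechanism Holz describes, assuming a metered "fast" GPU quota per subscription tier with a queued "relaxed" mode as the overflow. The tier names, quotas, and routing logic are illustrative assumptions, not Midjourney's actual implementation.]

```python
from dataclasses import dataclass

# Assumed per-tier monthly quotas of metered "fast" GPU seconds;
# purely illustrative numbers, not Midjourney's real limits.
FAST_SECONDS_PER_MONTH = {"basic": 3_600, "standard": 10_800}

@dataclass
class Subscriber:
    tier: str
    fast_seconds_used: float = 0.0

def route_job(user: Subscriber, est_gpu_seconds: float) -> str:
    """Pick a queue: metered fast capacity until the quota runs out,
    then a queued 'relaxed' mode so heavy users can't sink margins."""
    quota = FAST_SECONDS_PER_MONTH[user.tier]
    if user.fast_seconds_used + est_gpu_seconds <= quota:
        user.fast_seconds_used += est_gpu_seconds
        return "fast"     # runs immediately on paid-for GPU time
    return "relaxed"      # waits for idle capacity; cost stays bounded

# A "whale" on the basic tier gets shifted to relaxed mode eventually.
whale = Subscriber(tier="basic")
for _ in range(200):
    queue = route_job(whale, est_gpu_seconds=30)
print(queue)  # "relaxed" once 3,600 fast seconds are used up
```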

The Oncoming GPU Crunch

So how does the cost structure work on this? On one hand, V4 just came out, nine months training a model. Training, at least to a layman like me, is going to be very expensive. But then you have all these people in there actually generating the images, running these inference models, which I understand happens on GPUs. So in this case, I'm guessing your marginal cost per image generated is a much bigger chunk than the training? How does your cost structure work?

DH: I would say about 10% of our cloud costs are going into the training, and 90% is the inference for the users making images. So almost all the cost is in making the images.

And that making of the image is running on GPUs?

DH: Yes, yes. I'm not sure whether to say the exact numbers that we're using…

You’re using thousands and thousands and thousands of GPUs.

DH: More than thousands. Usually more than 10,000 GPUs.

Are you the largest GPU user in the world?

DH: No, but we are one of them.

I think you’re unique because the majority of your usage is for inference, not for training. Right?

DH: Yeah, and our inference usage is weird in that some users are willing to wait, some don't have to wait, and some won't wait. There's a large latency, and the usage pattern is really weird. So we did a lot of innovative logistics stuff early on to make the costs low. Right now, if you make an image, there are eight different regions of the world that the image might get made in, and you have no idea which. It might get made in Korea or Japan or the Netherlands or something; the GPUs are balancing between those eight regions. Where it's really cool is that we'll use a lot of the GPUs in Korea while it's nighttime there and everyone's sleeping, no one's using them. We can kind of load balance. You can basically race the darkness of the night across the earth.
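
[To illustrate the "race the darkness" idea, here is a minimal sketch that prefers regions deep in local night, where idle GPUs are plentiful. The region list, capacities, and scoring function are assumptions for illustration, not Midjourney's routing system.]

```python
from datetime import datetime, timezone, timedelta

# (UTC offset in hours, idle GPU count) per hypothetical region;
# all numbers are made up for illustration.
REGIONS = {
    "korea":       (9,  4000),
    "japan":       (9,  2500),
    "netherlands": (1,  3000),
    "us-east":     (-5, 3500),
}

def night_weight(utc_offset: int, now_utc: datetime) -> float:
    """Weight a region higher when it is local night (22:00-06:00),
    since local demand is low and capacity sits idle."""
    local_hour = (now_utc + timedelta(hours=utc_offset)).hour
    return 1.0 if local_hour >= 22 or local_hour < 6 else 0.2

def pick_region(now_utc: datetime) -> str:
    # Score = idle capacity scaled by how asleep the local population is.
    scores = {
        name: gpus * night_weight(offset, now_utc)
        for name, (offset, gpus) in REGIONS.items()
    }
    return max(scores, key=scores.get)

print(pick_region(datetime.now(timezone.utc)))
```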

Interesting.

DH: Otherwise the GPUs are in high demand. It's actually hard to get as many as we're using. I never really took this into account; there are just the logistics of doing this stuff at high scale, which is really interesting, and almost nobody realizes that we're effectively already within a factor of 10 of the world running out of GPUs for this kind of market. So hopefully in the next year we go up a factor of 10, otherwise the cloud just runs out of machines, which will be a really interesting thing to happen. And so we have to plan for that almost: what happens if we do that, and how do we run up to that?

Why is that? Are there just not enough GPUs being made? Is there not enough fab capacity? What's the limiting factor?

DH: Mostly just how many spare machines there are in all the data centers. On average, a cloud vendor tries to minimize the number of idle machines, because that's like minimizing your inventory in a retail store. There are a few vendors that don't have the same cost structure, where they have more cash and other internal usages for the machines, and so they have more computers. But even they will only have so many more than they need. So it's really a question of what's the excess capacity of the present cloud vendors, and what are the markets that drive that capacity? And I think we are, in some ways, the first large consumer use case of a big AI model.

That's why the bit about inference being the bigger portion of your cost is so interesting, because to date, if most of your GPUs are being used for training, there's a certain predictability to that, both in terms of how many I'm going to need and how long I'm going to need them for. But your demand is unpredictable. It's actually driven by consumers, so you can't forecast it. You can see Midjourney's growth, you can see you're going to need these sorts of things, but to build a GPU is a six-month process; there's all this lead time. You could see how this mismatch would happen, where you can see the demand coming, but that inflection point is going to hit, and maybe a lot of folks aren't going to realize the inflection was coming until it does hit.

DH: Yeah, we've been caught off-guard a few times, where we briefly ran out of GPUs at major cloud vendors. And when that happens, it's really hard, because people don't get their pictures, basically, and they get really upset. So we constantly try to make sure. It feels a little bit like whatever Apple does for their logistics, where they're trying to build the iPhone and make sure the supply chain's good. We almost have a computational supply chain, which is a really interesting concept that I think never existed before large models, basically.
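
[A toy version of that computational supply chain problem: with a roughly six-month lead time on new capacity, you have to order against a demand forecast, and fast consumer growth punishes under-ordering. The growth rate and fleet size below are assumptions, not Midjourney's numbers.]

```python
LEAD_TIME_MONTHS = 6     # roughly the GPU build/provision lead time cited above
MONTHLY_GROWTH = 1.20    # assumed 20% month-over-month demand growth

def gpus_to_order(current_demand_gpus: float) -> float:
    """Capacity you need by the time an order placed today arrives."""
    return current_demand_gpus * MONTHLY_GROWTH ** LEAD_TIME_MONTHS

demand_now = 10_000
print(f"order ~{gpus_to_order(demand_now):,.0f} GPUs today")  # ~29,860
```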

Yeah, because there has never been consumer-driven direct demand for cloud compute.

DH: Yeah, I mean, in this case, almost every dollar somebody spends on our service goes to paying for a GPU, which is kind of weird. It's all going to compute, more flops per dollar than any other service, certainly. So what's happened is there wasn't really that much consumer demand for compute before, and now we're showing that there is, and the clouds are basically not built out under the assumption that there's a large consumer demand for compute.

Basically, the cloud will have to readjust to this new reality, and that is a pretty big readjustment. If I had to guess, if we want a billion people in the world using large AI models, whether for AI image generation or for text or for anything, these are just large models, it doesn't really matter what they're being used for, then the cloud probably needs to be a thousand times larger, computationally speaking, than it is today. And a thousand times difference is not physically possible in the short term. So for the next few years, I think these markets are going to be limited computationally more than anything, which is one reason why we are relatively quiet on the marketing side, and relatively unconcerned with having a product for everybody.
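
[The back-of-envelope arithmetic behind that claim, where every input except the rough 10,000-GPU figure from earlier in the conversation is an assumption:]

```python
# Rough sanity check on "a thousand times larger": scale today's
# per-user GPU footprint up to a billion users.

gpus_today = 10_000            # roughly the fleet size mentioned above
users_today = 1_000_000        # assumed active users served by that fleet
target_users = 1_000_000_000   # "a billion people"

gpus_needed = gpus_today * (target_users / users_today)
print(f"~{gpus_needed:,.0f} GPUs")  # ~10,000,000, i.e. 1,000x today's fleet
```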

I was going to note that V4 in some respects got no press, right? It wasn't on Techmeme. I mean, Twitter, of course, is sort of drowning out everything, but it was just an announcement in your Discord server. And that's actually sort of an important thing for you, because you can't handle the entire world trying V4 right now.

DH: Yeah.

Is this an opportunity for a new cloud vendor? Is this a paradigm shift? It sure seems like it, right? All cloud vendors to date were building for other businesses, and there was a predictability to it: we give you flexibility and scalability, but we do that in the context of building this large infrastructure cloud in a steady, predictable way. What you're saying is, no, you need much more capacity and much more scalability than that. You think flexibility means serving a SaaS business? No, this is something completely different. And paradigm shifts are opportunities for new companies. Is there an opportunity for a new cloud vendor to serve this need?

DH: I have a lot of cloud vendors coming to me right now. What I'm usually telling them is, "Thank you for coming to me with the offer of 10,000 machines, but if I do all the work to get you online, I might only double my capacity." So I'll wait until I'm about to run out to do that, because it's a lot of extra work, and I would rather focus on fewer vendors instead of more vendors right now. Eventually I would take all of them if I needed them, but usually I don't have the room to just come in and do all this work for a 20% boost or even a 2x boost in capacity. Not yet. So the challenge is, if you want to come in as a new cloud vendor, you probably need to be willing to put down a billion dollars up front on basically buying GPUs and building out data centers. If you're willing to put that money down, you can immediately be an interesting player in the market; if it's less than that, it's probably not super interesting. That'd be my personal feeling.

It feels like this is what Nvidia’s cloud offering should be.

DH: Yeah, that could happen. If it's just a question of who can buy more GPUs, then probably Nvidia wins that game, or an existing vendor, because everyone has extra cash. I think it's more a question of: if we need a thousand times more compute in the cloud, having a thousand times more GPUs is going to be an incredible physical expenditure of energy, not electricity, but literally just making that many machines and that many data centers. And so if we actually do need that much more compute, I think the opportunity is probably in saying that what we need maybe isn't a GPU that might be twice as efficient or three times more efficient; the second you have to increase by a factor of a thousand, you really start to ask what it actually makes sense to build.

We saw this with Bitcoin, right? Bitcoin moved away from GPUs to dedicated chips, and maybe the argument here is that that needs to happen because the payoff is so tremendous.

DH: Yeah, I don't know what it will look like, but there are two possible worlds. One is that it just takes us seven years to scale a thousand X, and for the next seven years the market is computationally limited, which would be really interesting. Maybe five years, but it could just be five or seven years. The other argument is that sometime in that period we'll start to see significant energy put into custom chips, which could maybe drive it down another factor of 10, and then all of a sudden it happens in one year, you know? So I don't know what will happen there.

I'm aware of one really cool chip effort where maybe in a few years you would burn the neural network into the chip directly, and then there isn't even any memory anymore; it's the transistors themselves that hold the weights. So electricity comes in, and images come out. There's not necessarily even a clock, and you could just make these. That would be really cool. But if that's not what happens, then it's going to be the normal silicon industry doing their normal optimizations and trying to see what they can do. It's going to be a battle, basically: can we get the algorithms to eke out more performance on existing silicon? And I don't know what's going to come out of that. I mean, actually, it is possible that a new silicon vendor could come up. I've seen some exotic chip designs.

Midjourney V4

I have five more questions I really want to get to, but we're going a little long, so I'll finish with V4. V4 is so much more realistic, and I guess there are sort of two questions about this relative to V3. V3 had a very Midjourney look, right? You could tell it was Midjourney, and that was a vibe that I think a lot of people really enjoyed. But V4 feels more realistic, and some people are like, "I kind of miss the vibe, I'm missing the Midjourney vibe of V3." Your take on that is question number one.

Number two, with realism comes a much stronger ability to do deepfakes, where you can put someone in a particular situation. How are you thinking about this bit? Was the vibe of V3 just an accident because you couldn't make good enough images, and now you can, and now these other concerns emerge?

DH: For every version of the software, we spend a lot of time just trying to make it look good, as good as that version can be with whatever limitations there are at the time. Version one and version two were just very raw algorithms. We were trying to make them look nice. Version three was the first one where we could use user data to try to be more informed about what the aesthetic should be, and people really like that. Version four is in a weird place right now where we’re still using some of the data from the older versions to kind of define the look, but the system itself is so different. It really should have a new look of its own. And we haven’t had time to push that out yet. We will try to push it out soon. Hopefully in the next week we’ll try to define more of a look for V4. But the problem was it was so different from the old one that we basically had to try to get entirely new user data to figure out what it should be, but it’s hard because it’s doing so much more. The range is so wide.

Oh, the amount of detail, and not just that, the background detail is what really stands out to me. It's not just the subject in the foreground, which is what you asked for and is very detailed; there's so much atmosphere, and it's all coherent. It's all the same image. That's what has blown me away with V4.

DH: If anything, the current range of V4 has been restricted by our current style. We actually put a lot of guardrails on the current style of V4 in the first release, because we didn't have all the data collected yet; we just had to be very conservative. In theory it should actually have a much wider artistic range, but also the style should be, by default, more opinionated, and then you should be able to turn the stylize slider up and down.

Do you feel increasingly restrained because you have such a large user base now? Whereas before you could be more experimental, and now you’re like, “Oh, we need to get data but people are going to judge us right away.” Is that a challenge you’re facing?

DH: I would say that one challenge that we have found is that you basically can’t give anybody a model for more than a week without them becoming attached to it permanently.

Right.

DH: It was maybe too successful at making people see it as part of their own minds, because we can almost never get rid of anything, which is hard. But there are a lot of edge cases. Also, the tools are a lot more powerful than they seem. There are a lot more features on our stuff than most, I would say any, other platform. So when we launched V4, we didn't have weird aspect ratios, we didn't have all the style controls, and a lot of features were missing. People were really upset, and we're kind of like, there are already a lot of features on here. We have to support a lot from day one. So that's been really challenging.

I think one of the challenges is not so much having too many users, but the fact that there are actually multiple communities who want very different things out of Midjourney. There are professional designers. There are people who are just making images that look as realistic as possible. There are people who want things that look really strange. There's a large number of people who just generally use images. They are very different people, with very different uses for it. And then there's the reality that making something that works for everybody is not really possible, to some extent. So that's been a challenge. If anything, it would almost be nice if we could have a product for fewer people, because we're so far ahead right now that I feel like everybody comes to us, but it is challenging to be everything for everyone. And sometimes I am just like, look, hey, if we don't have that feature, great, you can go somewhere else. I don't mind. It's totally okay.

Is this where you do feel the pressure of, "Oh, maybe I should get some more money, maybe I should hire a bigger team and manifest what we've built as different products for different people"?

DH: It's really interesting. I would be open to making multiple products, but I think the challenge is that a product is not just a set of features; it's also a community and a brand and a vision. And it's hard for me to take a bucket of money and buy six visions of the future and six different types of communities and products out of that. I haven't figured out a way to scale that; I think perhaps no one's ever scaled that. For some reason, basically no product company tries to make 10 products at once, you know?

Yeah, no, it’s a good point. I mean, that was sort of Apple in its heyday. You were with Steve Jobs on his vision of computing and with Midjourney you’re on David Holz’s journey.

DH: Yeah.

It’s super interesting. It’s really compelling, and I have a bunch more questions on here, so I would love to have you back soon and get more into all this sort of stuff.

DH: Of course. Thank you.


This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.

The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a supporter, and have a great day!