The Apple Vision Pro’s Missing Apps

Monday, January 22, 2024 / Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

Om Malik has been observing, writing about, and investing in technology for going on three decades; that’s one reason I find his unabashed enthusiasm for the Apple Vision Pro to be notable. Malik wrote on his blog:

Apple touts Vision Pro as a new canvas for productivity and a new way to play games. Maybe, maybe not. Just as the Apple Watch is primarily a health-related device that also does other things, including phone calls, text messages, and making payments. Similarly, the primary function for Vision Pro is ‘media’ — especially how we consume it on the go. Give it a few weeks, and more people will come to the same conclusion.

In 2019, I wrote an essay about the future of television (screen):

With that caveat, I think both, the big (TV) and biggest (movie theater) screens are going to go the way of the DVD. We could replace those with a singular, more personal screen — that will sit on our face. Yes, virtual reality headsets are essentially the television and theaters of the future. They aren’t good enough just yet — but can get better in the years to come as technologies to make the headsets improve.

Apple has made that headset. Apple Vision Pro has ultra-high-resolution displays that deliver more pixels than a 4K TV for each eye. This gives you a screen that feels 100 feet wide with support for HDR content. The audio experience is just spectacular. In time, Apple’s marketing machine will push the simple message — for $3,500, you get a full-blown replacement for a reference-quality home theater, which would typically cost ten times as much and require you to live in a McMansion.

Malik expounded on this point last week in a Stratechery Interview:

But the thing is you actually have to be mobile-native to actually appreciate something like this. So if you’ve grown up watching a 75-inch screen television, you probably would not really appreciate it as much. But if you are like me who’s been watching iPad for ten-plus years as my main video consumption device, this is the obvious next step. If you live in Asia, like you live in Taiwan, people don’t have big homes, they don’t have 85-inch screen televisions. Plus, you have six, seven, eight people living in the same house, they don’t get screen time to watch things so they watch everything on their phone. I think you see that behavior and you see this is going to be the iPod.

The iPod was a truly personal device, which was not only what people wanted, but also a great business: why sell one stereo to a household when you can sell an iPod to every individual? You can imagine Apple feeling the same about the long-term trajectory of the Vision Pro: why sell a TV that sits on the wall of the living room when you can sell every individual a TV of their own? You can be sure that Apple isn’t just marketing this device to people who live alone: the EyeSight feature only makes sense if you are wearing the Vision Pro around other people.

I already commented about the dystopian nature of this vision when the Vision Pro was announced; for now I’m interested in the business aspects of this vision, and the iPod is a good place to start.

The iPod and the Music Labels

The iPod story actually starts with the Mac, and Apple’s vision of a “Digital Hub.” The company released iMovie in 1999, iDVD and iTunes two years later, and iPhoto a year after that. The release order is interesting: Apple thought that home movies would be the big new market for PCs, but the emergence of Napster in 1999 made it clear that music was a much more interesting market (digital cameras, meanwhile, were only just becoming a thing). That laid the groundwork for the iPod, which was released in the fall of 2001. I documented this history in Apple and the Oak Tree and noted:

One of my favorite artifacts from the brief period between the introduction of iTunes and the release of the iPod was Apple’s “Rip. Mix. Burn.” advertising campaign.

What is particularly amazing (that is, beyond the cringe-inducing television ad) is that Apple was arguably encouraging illegal behavior: it was likely legal to rip and probably legal to burn, presuming the CD that you made was for your own personal use. It certainly was not legal to share.

The iPod was predicated on the reality of file-sharing as well:

And yet, as much as “Rip. Mix. Burn.” may have walked the line of legality, the reality of iTunes — and the iPod that followed — was well on the other side of that line. Apple knew better than anyone that the iPod’s tagline — 1,000 songs in your pocket — was predicated on users having 1,000 digital songs, not via the laborious procedure of ripping legally purchased CDs, but rather via Napster and its progeny. By the spring of 2003 Apple had introduced the iTunes Music Store, a seamless and legal way to download DRM-protected digital music, but particularly in those early days the value of the iTunes Music Store to Apple was not so much that it was a selling point to consumers, but rather a means by which Apple could play dumb about how it was that its burgeoning number of iPod customers came to fill up their music libraries.

That description of the iTunes Music Store is perhaps a touch cynical, but it is impossible to ignore the importance of music piracy in Apple’s original deal with the record labels. Apple was able to make a deal in part because it was offering the carrot of increased digital revenue, but it was certainly aided by the stick of piracy obliterating CD sales.

Over the next few years the record labels would become increasingly resentful of Apple’s position in the market, but they certainly weren’t going anywhere; by 2008 iTunes was their biggest source of revenue, and it’s all but impossible for an ongoing business to give up revenue just because they think the arrangement under which they make that revenue is unfair.

The App Store

The iTunes Music Store does still exist, although its revenue contribution to the labels has long been eclipsed by streaming. Its more important contribution to modern computing is that it provided the foundation for the App Store.

The App Store didn’t exist when Apple launched its iPhone in 2007; Apple provided a suite of apps that made the iPhone more capable than anything else on the market, and assumed the web would take care of the rest. Developers, though, wanted to build apps; in September 2007 Iconfactory released Twitterrific, a Twitter client that ran on jail-broken iPhone devices, and more apps followed. The following year Apple gave its eager developers what they wanted: an officially supported SDK and an App Store to distribute their apps, for free or for pay; in the case of the latter Apple would, just as it did with songs, keep 30% of the purchase price (and cover processing fees).

This period of the App Store didn’t require any sticks: the capability of the iPhone was carrot enough, and, over the next few years, as the iPhone exploded in popularity, the market opportunity afforded by the App Store proved even more attractive. A better analogy to what Apple provided was gas for the fire, particularly with the release of in-app purchase capabilities in 2009. Now developers could offer free versions of their apps and convert consumers down the line, or sell consumables, a very profitable approach for games.

That, though, is where App Store innovation stopped, at least for a while. By 2013, when I started Stratechery, I was wondering Why Doesn’t Apple Enable Sustainable Businesses on the App Store?, by which I meant trials, paid updates, and built-in subscription support. The latter (along with associated trials) finally showed up in 2016, but at that point developer frustration with the App Store had been growing right alongside Apple’s services revenues: productivity apps shared my concerns about sustainability, while “reader” apps like streaming services were frustrated that they couldn’t sign up new users in the app, or even point them to the web; game developers, meanwhile, hated giving away 30% of their revenue.

It’s fair to note that an unacknowledged driver of much of this frustration was surely the fact that the app market matured from the heady days of the early App Store. No one is particularly worried about restrictions or missing capabilities or revenue shares when there is a landgrab for new users’ homescreens; by the end of the decade, though, mature businesses were locked in a zero sum game for user attention and dollars. In that environment the money Apple was taking, despite the lack of flexibility it allowed in terms of business model, was much more of an irritant; still, it’s all but impossible for an ongoing business to give up revenue just because they think the arrangement under which they make that revenue is unfair.

The Epic Case

I keep saying “all but impossible” because Epic is the exception that proved the rule: in August 2020 Epic updated Fortnite to include an alternative in-app purchase flow, was subsequently kicked out of the App Store by Apple, and proceeded to file an antitrust lawsuit against the iPhone maker. I documented this saga from beginning to end, including:

Apple, Epic, and the App Store, which provided a history of the App Store and Epic’s lawsuit at the time it was filed.
App Store Arguments, which I wrote at the conclusion of the trial, explained why I expected Epic to lose, even as I hoped that Apple would voluntarily make pro-developer changes in the App Store.
The Apple v. Epic Decision, which reviewed the judge’s decision that favored Apple in 10 of the 11 counts.

The 11th count that Epic prevailed on required Apple to allow developers to steer users to a website to make a purchase; its implementation was delayed while both parties filed appeals, but the lawsuit reached the end of the road last week when the Supreme Court denied certiorari. That meant that Apple had to allow steering, and the company did so in the most restrictive way possible: developers had to use an Apple-granted entitlement to put a link on one screen of their app, and pay Apple 27% of any conversions that happened on the developer’s website within 7 days of clicking said link.

Many developers were outraged, but the company’s tactics were exactly what I expected:

To that end, I wouldn’t be surprised if Apple does the same in this case: developers who steer users to their website may be required to provide auditable conversion numbers and give Apple 27%, and oh-by-the-way, they still have to include an in-app purchase flow (that costs 30% and includes payment processor fees and converts much better). In other words, nothing changes — unless it goes in the other direction: if Apple is going to go to the trouble to build out an auditing arm, then it could very well go after all of the revenue for everyone with an app in the App Store, whether they acquire a user through in-app purchase or not. The reason not to do so before was some combination of goodwill, questionable legality, and most importantly the sheer hassle of it all. At this point, though, it’s not clear if any of those will be deterrents going forward…

Apple has shown, again and again and again, that it is only going to give up App Store revenue kicking-and-screaming; indeed, the company has actually gone the other way, particularly with its crackdown over the last few years on apps that only sold subscriptions on the web (and didn’t include an in-app purchase as well). This is who Apple is, at least when it comes to the App Store.

The crackdown I’m referring to was pure stick: Apple refused to approve upgrades to SaaS apps that had been in the App Store for years unless they added in-app purchase; developers complained but this time the reality of it being impossible for an ongoing business to give up revenue meant they didn’t have any choice but to do extra work so that Apple could have a cut.

Vision Pro’s Missing Apps

The Apple Vision Pro started pre-sales last week, but the biggest surprise came via two stories from Bloomberg. First:

Netflix Inc. isn’t planning to launch an app for Apple Inc.’s upcoming Vision Pro headset, marking a high-profile snub of the new technology by the world’s biggest video subscription service. Rather than designing a Vision Pro app — or even just supporting its existing iPad app on the platform — Netflix is essentially taking a pass. The company, which competes with Apple in streaming, said in a statement that users interested in watching its content on the device can do so from the web.

Second:

Google’s YouTube and Spotify Technology SA, the world’s most popular video and music services, are joining Netflix Inc. in steering clear of Apple Inc.’s upcoming mixed-reality headset. YouTube said in a statement Thursday that it isn’t planning to launch a new app for the Apple Vision Pro, nor will it allow its longstanding iPad application to work on the device — at least, for now. YouTube, like Netflix, is recommending that customers use a web browser if they want to see its content: “YouTube users will be able to use YouTube in Safari on the Vision Pro at launch.” Spotify also isn’t currently planning a new app for visionOS — the Vision Pro’s operating system — and doesn’t expect to enable its iPad app to run on the device when it launches, according to a person familiar with matter. But the music service will still likely work from a web browser.

These are big losses: Malik made the case about why the Vision Pro is the best TV ever, but it will launch without native access to the largest premium streaming service and the largest repository of online video, period. I myself am very excited about the productivity use cases of the Vision Pro, which for me includes listening to music while I work; no Spotify makes that harder.

There are, to be sure, valid business reasons for all three services to have not built a native app; the latest prediction from Apple supply chain analyst Ming-Chi Kuo put first-year sales at around 500,000 units, which as a tiny percentage of these services’ user bases may not be worth the investment. Apple’s solution, though, is to simply use a pre-existing iPad app; that all three companies declined to do even that is notable. Nebula CEO Dave Wiskus observed on X:

2003: Steve Jobs brings the big five record labels together in a landmark deal to sell their songs digitally for $0.99 each on the iTunes Store.

2024: Apple can’t convince streaming video companies to check the “allow iPad app” box.

— Dave Wiskus (@dwiskus) January 19, 2024

The Apple Vision Pro app shelves will not be bare in terms of video content; the company says in a press release:

Users will also be able to download and stream TV shows, films, sports, and more with apps from top streaming services, including Disney+, ESPN, NBA, MLB, PGA Tour, Max, Discovery+, Amazon Prime Video, Paramount+, Peacock, Pluto TV, Tubi, Fubo, Crunchyroll, Red Bull TV, IMAX, TikTok, and the 2023 App Store Award-winning MUBI. Users can also watch popular online and streaming video using Safari and other browsers.

It’s not clear how many of these apps are truly native versus iPad apps with the Vision Pro check box, but the absence of Netflix and YouTube does stand out, and their absence is, without question, a total failure for Apple’s developer relations team.

The blame, though, likely goes to the App Store: Apple has been making Netflix in particular jump through hoops for years when it comes to precisely what language the service can or cannot present to customers who can’t sign up in the app, and also can’t be directed to the web. The current version’s language is fairly anodyne (although it has been spicier in the past):

Apple may be unhappy that Netflix viewers have to go to the Netflix website to watch the service on the Vision Pro (and thus can’t download shows for watching offline, like on a plane); Netflix might well point out that going to the web is exactly what Apple makes Netflix customers do to sign up for the service.¹

Developers On Strike

It’s certainly possible that I’m reading too much into these absences: maybe these three companies simply didn’t get enough Visions Pro to build a native app, and felt uncomfortable releasing their iPad versions without knowing how useful they would be. YouTube in particular, given that much of its usage is free, likely has less of a beef with Apple than Netflix or Spotify do, and it’s easy enough to believe that Google just isn’t a company that moves that fast these days.

Still, there’s no question that the biggest beneficiary of these companies being on the Vision Pro — and, correspondingly, the biggest loser from their absence — is Apple. The company is launching an audacious and ambitious new product, and there are major partners in its ecosystem that aren’t interested in helping.

This is the consequence of fashioning App Store policies as a stick: until there is a carrot of a massive user base, it’s hard to see why developers of any size would be particularly motivated to build experiences for the Vision Pro, which will make it that much more difficult to attract said massive user base. Apple was happy to remind users that, when it came to the iPhone, there’s an app for that; in the case of the Vision Pro, there may not be: this is the one and only chance for developers to go on strike without suffering an Epic-like fate, and some of them are taking it.

For now, Apple appears to be so supply-constrained that it doesn’t matter; the company will likely sell as many units as it can make. I would guess that Apple’s strategy with regard to developer hold-outs will be to wait them out, trusting that it can sell enough devices that developers can’t go on strike forever. I certainly think this approach is more likely than offering any sort of concessions to developers, on any of its platforms.

A Disney Double-Down?

The other option may be an even greater investment in content by Apple itself. This could take the form of more Apple TV+ shows and sports deals like MLS, but the most interesting possibility is deepening its partnership with Disney. The entertainment giant is looking for a tech partner to invest in its ESPN streaming service, and the Vision Pro makes Apple a compelling candidate. From an Update last summer:

What does seem notable was Iger’s call out of Apple’s headset; I can attest that the sports experience on the Vision Pro is extraordinary, and remember that Iger appeared on stage at the event to say that Disney would be working with Apple to bring content to the device; here is the sports portion of the video he played at WWDC:

I have to say, one almost gets the impression that the Apple Vision sports-watching experience might have single-handedly convinced Iger to keep ESPN! What does seem likely is that Apple is probably Iger’s preferred partner, and there certainly is upside for Apple — probably more upside than any other tech company — primarily because of the Vision Pro. The single most important factor in the Vision Pro’s success will likely be how quickly entertainment is built for it, and as Cook noted while introducing Iger, “The Walt Disney Company is the world’s leader in entertainment.”

I heard from a lot of people after that Update who were very skeptical that any sort of deal would be struck, in large part because Apple is so difficult to partner with (the company seems continually surprised that not everyone negotiates like the record labels under siege from Napster). And, it should be noted, Disney is showing up on Day One for the Vision Pro launch; why partner if the content is already there?

And yet, Apple’s most potent response to ecosystem intransigence may be to double down: Disney with a war chest (via an Apple partnership) would be a far more formidable competitor to Netflix, and ESPN with a VR camera at every game it televises would, in my estimation, make the Vision Pro an essential purchase for every sports fan. I once argued that Apple Should Buy Netflix the last time the two companies were at odds, but the weakness in that argument is that simply having money another company needs isn’t a compelling enough case; when it comes to Disney the payoff is the Apple Vision Pro having that much more great content that much sooner, not only making the headset a success but also making it impossible for other streaming businesses to not serve their customers just because they think the arrangement under which they operate is unfair.

¹ There is an exception for Netflix specifically: if you download a Netflix game you can sign up with in-app purchase, which the company would almost certainly prefer not to offer but, thanks to Apple’s aforementioned crack-down on SaaS app sign-ups, is required to.

The New York Times’ AI Opportunity

Monday, January 8, 2024 / Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

Christopher Rufo, the conservative activist who led the charge in surfacing evidence of plagiarism against now-former President of Harvard University Claudine Gay, was born in 1984; he joined X in 2015. Harvard, meanwhile, is the oldest university in the United States — older than the United States, in fact — having been founded in 1636. That mismatch is perhaps the most striking aspect of the Gay episode: a millennial on Twitter took down our most august institution’s president by employing the 4th of Saul Alinsky’s Rules for Radicals: “Make the enemy live up to its own book of rules.” In this case the book of rules was the Harvard University Plagiarism Policy:

It is expected that all homework assignments, projects, lab reports, papers, theses, and examinations and any other work submitted for academic credit will be the student’s own. Students should always take great care to distinguish their own ideas and knowledge from information derived from sources. The term “sources” includes not only primary and secondary material published in print or online, but also information and opinions gained directly from other people. Quotations must be placed properly within quotation marks and must be cited fully. In addition, all paraphrased material must be acknowledged completely. Whenever ideas or facts are derived from a student’s reading and research or from a student’s own writings, the sources must be indicated…

Students who, for whatever reason, submit work either not their own or without clear attribution to its sources will be subject to disciplinary action, up to and including requirement to withdraw from the College. Students who have been found responsible for any violation of these standards will not be permitted to submit course evaluation of the course in which the infraction occurred.

Rufo is certainly familiar with Alinsky; he cited the activist just a couple of months ago, celebrating the fact that The New Republic had called him dangerous. The New Republic article that I found more interesting, though, and yes, pertinent to Stratechery, was the one being passed around Twitter over the weekend: Christopher Rufo Claims a Degree from “Harvard.” Umm … Not Quite.

On paper, Christopher Rufo, the conservative activist who recently was appointed by Florida Governor Ron DeSantis to sit on the board of a small Sarasota liberal arts college whose curriculum the governor dislikes, presents his credentials as impeccable: Georgetown University for undergrad and “a master’s from Harvard,” according to his biographical page on the Manhattan Institute’s website.

But that description, and similar ones on Wikipedia, in the press release DeSantis’s office sent out, and on Rufo’s personal website, are at the very least misleading. Rufo received a Master’s in Liberal Arts in Government from Harvard Extension School in 2022, the school confirmed in an email to The New Republic. Harvard Extension School, in a nutshell, is part of the renowned institution, but it is not Harvard as most people know it (a Harvard student once joked that it’s the “back door” to Harvard). The school describes itself as an “open-enrollment institution prioritizing access, equity, and transparency.” Eligibility for the school is, according to its website, “largely based on your performance in up to three requisite Extension degree courses, depending on your field, that you must complete with distinction.” High school grades and SAT and ACT scores aren’t required at the institution.

What was interesting about this story is the extent to which those associated with Harvard — such as this professor and this political pundit — were baffled that people didn’t care about this distinction, and the extent to which everyone else was baffled at how much they did. That, at least, was the impression I got on X and in group chats, but I recognize I may be biased on two counts. First, I wrote when I left Microsoft in 2013 in a piece called Independence:

It’s interesting how some folks are always looking for some sort of institutional authority. I’ve been quoted as “Microsoft’s Ben Thompson,” as “former Apple intern Ben Thompson,” and “batshit crazy Ben Thompson.” I actually wish the third were true, because, unlike the first two, the descriptor rests on what I write, not on some sort of vague authority derived from whoever is signing my paychecks.

Besides, both workplace references are out-of-date: I was at Apple three years ago, and, as of July 1, I don’t work for Microsoft either. Instead, I am the author of Stratechery. What more is there to say? I’m a person, I put myself out there on this blog, and I trust that what I write represents me well.

One of the many transformative aspects of the Internet is how it empowers individuals to build their own institutions. In days gone by, my thoughts would have been confined to myself and a few close friends; now my friends are all over the world, and I communicate with them through an institution of my own making.

I’m not sure the use of the word “institution” is entirely correct, for the reasons I will lay out in this Article, but needless to say I’m not a fan of basing one’s worth on one’s institutional associations. For now, the second reason I may be biased is that I was, as I noted, basing my perception off of X and group chats: those are native Internet formats, and what seems clear is that the way that value and influence are created, captured, and leveraged on the Internet is fundamentally new and different from the analog world.

New York Times v. OpenAI

I may have been taking a break the last two weeks, but the New York Times’ legal team was not, nor were its in-house reporters; they write:

The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday, opening a new front in the increasingly intense legal battle over the unauthorized use of published work to train artificial intelligence technologies. The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.

The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.

There are two aspects of not just this case but all of the various copyright-related AI cases: inputs and outputs. To my mind the input question is obvious: I myself consume a lot of copyrighted content — including from the New York Times — and output content that is undoubtedly influenced by the content I have input into my brain. That is clearly not illegal, and while AI models operate at an entirely different scale, the core concept is the same (I am receptive to arguments, not just in this case but with respect to a whole range of issues, that the scale made possible by technology means a difference in kind; that, though, is a debate about the necessity for new laws, not changing the meaning of old ones).

For a copyright claim to hold water the output needs to be the same; this is where previous cases, like that filed by Sarah Silverman against Meta, have fallen apart. From The Hollywood Reporter:

Another of Silverman’s main theories — along with other creators suing AI firms – was that every output produced by AI models are infringing derivatives, with the companies benefiting from every answer initiated by third-party users allegedly constituting an act of vicarious infringement. The judge concluded that her lawyers, who also represent the artists suing StabilityAI, DeviantArt and Midjourney, are “wrong to say that” — because their books were duplicated in full as part of the LLaMA training process — evidence of substantially similar outputs isn’t necessary.

“To prevail on a theory that LLaMA’s outputs constitute derivative infringement, the plaintiffs would indeed need to allege and ultimately prove that the outputs ‘incorporate in some form a portion of’ the plaintiffs’ books,” Chhabria wrote. His reasoning mirrored that of Orrick, who found in the suit against StabilityAI that the “alleged infringer’s derivative work must still bear some similarity to the original work or contain the protected elements of the original work.”

This is why the most important part of the New York Times’ filing was Exhibit J, which contained “One Hundred Examples of GPT-4 Memorizing Content From the New York Times”. All of the examples are very similar in format; here is Example 1:

Here is the output as compared to the original article:

That is the same output! It also, more pertinently to this case’s prospects, addresses the specific reasons why previous cases have been thrown out.¹

Criminalizing Capability and Fair Use

This case was filed twelve days ago; as far as I can tell the issue has been fixed by OpenAI:

The fix does seem to be a general one: I wasn’t, in limited testing, able to recreate the behavior the New York Times’ case documents, either on New York Times content or other sources. I think this does, at a minimum, cast OpenAI in a very different light than Napster, which was found liable for copyright infringement in large part because it was very much aware of what its service was being primarily used for. In this case the New York Times used a very unusual prompt to elicit copyrighted content, and OpenAI moved quickly to close the loophole.

That, by extension, raises the question as to who exactly was at fault for these examples: if the New York Times placed an article onto a copy machine and pressed copy, surely it wouldn’t sue Xerox? Or consider Apple, which provides the opportunity to “print” any webpage on your iPhone, and on the print screen, convert said webpage to a PDF, complete with a share menu: is it the phone maker’s fault if I use that capability to send an article to a friend? How much different is this than using highly unusual prompts to derive copyrighted material?

This question strikes me as more than mere pedantry: another news story over the break was Substack and its refusal to censor Nazi content; to what extent is the newsletter provider culpable for content on its platform that users place there of their own volition? It’s not an easy question — I laid out my proposed approach broadly in A Framework for Moderation — but it does seem problematic to hold that a tool, simply because it is capable of an illegal or undesirable output when specifically directed by a user, is therefore guilty of illegality or of endorsing said output generally.

All of these questions will be explored by the court; in addition to the aforementioned Napster case, I expect the court to consider the precedent set by Authors Guild v. Google, i.e. the Google Books case, which is particularly pertinent because it involved a large tech company ingesting the entire content of copyrighted works (which is, I would imagine, a tremendous asset to Google’s own large language models). The Second Circuit Court of Appeals ruled in Google’s favor:

Google’s making of a digital copy to provide a search function is a transformative use, which augments public knowledge by making available information about Plaintiffs’ books without providing the public with a substantial substitute for matter protected by the Plaintiffs’ copyright interests in the original works or derivatives of them. The same is true, at least under present conditions, of Google’s provision of the snippet function. Plaintiffs’ contention that Google has usurped their opportunity to access paid and unpaid licensing markets for substantially the same functions that Google provides fails, in part because the licensing markets in fact involve very different functions than those that Google provides, and in part because an author’s derivative rights do not include an exclusive right to supply information (of the sort provided by Google) about her works. Google’s profit motivation does not in these circumstances justify denial of fair use. Google’s program does not, at this time and on the record before us, expose Plaintiffs to an unreasonable risk of loss of copyright value through incursions of hackers. Finally, Google’s provision of digital copies to participating libraries, authorizing them to make non-infringing uses, is non-infringing, and the mere speculative possibility that the libraries might allow use of their copies in an infringing manner does not make Google a contributory infringer.

This summary invokes the four-part balancing test for fair use; from the Stanford Library:

The only way to get a definitive answer on whether a particular use is a fair use is to have it resolved in federal court. Judges use four factors to resolve fair use disputes, as discussed in detail below. It’s important to understand that these factors are only guidelines that courts are free to adapt to particular situations on a case‑by‑case basis. In other words, a judge has a great deal of freedom when making a fair use determination, so the outcome in any given case can be hard to predict.

The four factors judges consider are:

The purpose and character of your use

The nature of the copyrighted work

The amount and substantiality of the portion taken, and

The effect of the use upon the potential market.

In my not-a-lawyer estimation, LLMs are clearly transformative (purpose and character);² the nature of the New York Times’ work also works in OpenAI’s favor, as there is generally more allowance given to disseminating factual information than to fiction. OpenAI is obviously taking all of the work for their models, but that was already addressed in the Google case. That leaves point four, and the potential “effect of the use upon the potential market.”

Market Effects and Hallucination

It seems likely the New York Times’ lawyers knew this would be the pertinent point: the first paragraph lays out the New York Times’ investment in journalism, and the second paragraph states:

Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service. Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more. While Defendants engaged in widescale copying from many sources, they gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works. Through Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.

Here again the Google Books case seems pertinent, particularly given the effort and intentionality necessary to generate copyrighted content (and which has already been limited by OpenAI). The district judge wrote:

[P]laintiffs argue that Google Books will negatively impact the market for books and that Google’s scans will serve as a “market replacement” for books. [The complaint] also argues that users could put in multiple searches, varying slightly the search terms, to access an entire book.

Neither suggestion makes sense. Google does not sell its scans, and the scans do not replace the books. While partner libraries have the ability to download a scan of a book from their collections, they owned the books already — they provided the original book to Google to scan. Nor is it likely that someone would take the time and energy to input countless searches to try and get enough snippets to comprise an entire book.

OpenAI does sell access to its large language models (along with Microsoft); in this case Google’s search dominance, and the resultant luxury of not needing to monetize complements like Google Books, gave it more legal cover. The New York Times, though, isn’t just arguing that people will read the New York Times via ChatGPT; this section about the Wirecutter was more compelling in terms of the direct impact on the company’s monetization:

Detailed synthetic search results that effectively reproduce Wirecutter recommendations create less incentive for users to navigate to the original source. Decreased traffic to Wirecutter articles, and in turn, decreased traffic to affiliate links, subsequently lead to a loss of revenue for Wirecutter. A user who already knows Wirecutter’s recommendations for the best cordless stick vacuum, and the basis for those recommendations, has little reason to visit the original Wirecutter article and click on the links within its site. In this way, Defendants’ generative AI products directly and unfairly compete with Times content and usurp commercial opportunities from The Times.

Here’s the problem, though: the New York Times immediately undoes its argument. From the same section of the lawsuit:

Users rely on Wirecutter for high-quality, well-researched recommendations, and Wirecutter’s brand is damaged by incidents that erode consumer trust and fuel a perception that Wirecutter’s recommendations are unreliable.

In response to a query regarding Wirecutter’s recommendations for the best office chair, GPT-4 not only reproduced the top four Wirecutter recommendations, but it also recommended the “La-Z-Boy Trafford Big & Tall Executive Chair” and the “Fully Balans Chair”—neither of which appears in Wirecutter’s recommendations—and falsely attributed these recommendations to Wirecutter…

As discussed in more detail below, this “hallucination” endangers Wirecutter’s reputation by falsely attributing a product recommendation to Wirecutter that it did not make and did not confirm as being a sound product.

That leads into an entire section about hallucination in general, and how it is damaging to the New York Times. In fact, though, this is why I think the New York Times has point four backwards.

Internet Value

Rufo was effective versus Harvard because he used their own rules about plagiarism against them; why, though, does Harvard have rules about plagiarism? I suspect it’s related to the fact that Harvard is 388 years old. The goal is the accumulation of and passing on of knowledge, not just to the students of today, but to the ones 300 years from now; that means that careful attention to detail and honesty in one’s work today will stand the test of time, and add to Harvard’s legacy.

What is notable is that plagiarism is arguably the currency of the Internet. I wrote two years ago in Mistakes and Memes:

Go back to the time before the printing press: while a limited number of texts were laboriously preserved by monks copying by hand, the vast majority of information transfer was verbal; this left room for information to evolve over time, but that evolution and its impact was limited by just how long it took to spread. The printing press, on the other hand, by necessity froze information so that it could be captured and conveyed.

This is obviously a gross simplification, but it is a simplification that was reflected in civilization in Europe in particular: local evolution and low conveyance of knowledge with overarching truths aligns to a world of city-states governed by the Catholic Church; printing books, meanwhile, gives an economic impetus to both unifying languages and a new kind of gatekeeper, aligning to a world of nation-states governed by the nobility.

The Internet, meanwhile, isn’t just about demand — my first mistake — nor is it just about supply — my second mistake. It’s about both happening at the same time, and feeding off of each other. It turns out that the literal meaning of “going viral” was, in fact, more accurate than its initial meaning of having an article or image or video spread far-and-wide. An actual virus mutates as it spreads, much as how over time the initial article or image or video that goes viral becomes nearly unrecognizable; it is now a meme.

Debating citations or quotation marks in a world of memes seems preposterous, which speaks to the overarching point: the way that information is created and disseminated on the Internet is fundamentally new and different from the analog world. The old New Yorker cartoon observed that “On the Internet, nobody knows you’re a dog”; the corollary here is that on X no one cares if your institution is 388 years old, unless, of course, it can be used as a means of attacking you.

This, by extension, explains why the attacks on Rufo’s degree didn’t land to most people online: no one cares. Impact on the Internet is a direct function of what you have done recently: a YouTuber is as popular as their latest video, a tweeter as their latest joke, or an influencer as their latest video. In the case of Rufo what mattered was whether he brought evidence for his claims or not; obsessing about the messenger is to miss the point that he might as well be the New Yorker dog.

The New York Times’ AI Opportunity

What makes this pertinent to the New York Times case is that the New York Times is portraying its value as being its accumulated archives that OpenAI used to train. That is an impressive edifice of its own, make no mistake, and there is a reason there is a pipeline from Harvard to the New York Times newsroom. The New York Times, though, to its immense credit, has transformed itself from a newspaper to an online juggernaut, which means de-prioritizing pure news. From Publishing is Back to the Future:

I am being pretty hard on publishers here, but the truth is that news is a very tough business on the Internet. The reason why readers don’t miss any one news source, should it disappear, is that news, the moment it is reported, immediately loses all economic value as it is reproduced and distributed for free, instantly. This was always true, of course; journalists just didn’t realize that people were paying for paper, newsprint, and delivery trucks, not their reporting, and that advertisers were paying for the people. Not that they cared about how the money was made, per tradition.

The publication that has figured this out better than anyone is the New York Times; that is why the newspaper, to its immense credit, has been clear about the importance of aligning its editorial approach with its business goals. From 2017’s 2020 Report:

We are, in the simplest terms, a subscription-first business. Our focus on subscribers sets us apart in crucial ways from many other media organizations. We are not trying to maximize clicks and sell low-margin advertising against them. We are not trying to win a pageviews arms race. We believe that the more sound business strategy for The Times is to provide journalism so strong that several million people around the world are willing to pay for it. Of course, this strategy is also deeply in tune with our longtime values. Our incentives point us toward journalistic excellence…

Our journalism must change to match, and anticipate, the habits, needs and desires of our readers, present and future. We need a report that even more people consider an indispensable destination, worthy of their time every day and of their subscription dollars.

Notice the focus on being a destination, a site that users go to directly; that is an essential quality of a subscription business model. From The Local News Business Model:

It is very important to clearly define what a subscription means. First, it’s not a donation: it is asking a customer to pay money for a product. What, then, is the product? It is not, in fact, any one article (a point that is missed by the misguided focus on micro-transactions). Rather, a subscriber is paying for the regular delivery of well-defined value.

Each of those words is meaningful:

Paying: A subscription is an ongoing commitment to the production of content, not a one-off payment for one piece of content that catches the eye.

Regular Delivery: A subscriber does not need to depend on the random discovery of content; said content can be delivered to the subscriber directly, whether that be email, a bookmark, or an app.

Well-defined Value: A subscriber needs to know what they are paying for, and it needs to be worth it.

None of this is about archives; it’s about production: impact on the Internet is a direct function of what you have done recently, which is to say that the New York Times’ value is a function of its daily ongoing production of high quality content. Here’s the thing about AI, though: I wrote last month in Regretful Accelerationism about the possibility that AI was going to make the web — already an increasingly inhospitable place for quality content — far worse, to the potential detriment of Google in particular. That, by extension, makes destination sites that much more valuable, which is to say it makes the New York Times more valuable.

Indeed, that is why the section on hallucination works against the New York Times’ argument, if not legally then at least philosophically: sure, GPT-4 might have 95% of the Wirecutter’s recommendations, but who knows which 5% is wrong? You will need to go to the authoritative source. Moreover, this won’t just apply to recliners: it will apply to basically everything. To the extent the web becomes even more probabilistic and hallucinatory the greater value there will be for authoritative content creators capable of living on Internet time, showing their worth not by their archives or rigidity but by their ability to create continuously.

¹ The lawsuit also demonstrates how you can repeatedly ask ChatGPT to generate the next paragraph of a particular article, once it has been prompted in a similar way to the sandbox examples above.
² One interesting exception is that the lawsuit notes that “OpenAI made numerous reproductions of copyrighted works owned by The Times in the course of ‘training’ the LLM.”; in other words the lawsuit isn’t just attacking the final output but intermediary outputs during training.

Holiday Break: December 25th to January 5th

Thursday, December 21, 2023 | Tuesday, December 26, 2023

Stratechery is on holiday from December 25, 2023 to January 5, 2024; the next Stratechery Update will be on Monday, January 8.

In addition, the next episode of Sharp Tech will be on Monday, January 8, and the next episode of Dithering will be on Tuesday, January 9. Sharp China will also return the week of January 8.

The full Stratechery posting schedule is here.

The 2023 Stratechery Year in Review

Thursday, December 21, 2023 | Thursday, January 11, 2024

It has been over a decade of Stratechery; this is the 11th Year in Review I have published. You can find previous years here:

2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

I am both proud and grateful to have made it to this milestone. Stratechery has changed my life; I hope it has had some small impact on yours.

At the beginning of last year’s review I said that the biggest story in tech was the emergence of AI; I can say the exact same thing about 2023, but even more so: 12 of Stratechery’s free Weekly Articles were about AI in some way, shape, or form. The second biggest topic was a Stratechery staple: the evolving content landscape; 2023 was particularly notable, though, for the dramatic shifts that are hitting Hollywood, highlighted by both strikes and the Disney-Charter standoff this fall. There were also big stories about the tech industry itself, from a bank failure to board room drama, and a “vision” of what might come next.

This year Stratechery published 27 free Articles, 105 subscriber Updates, and 37 Interviews. Today, as per tradition, I summarize the most popular and most important posts of the year.

The Five Most-Viewed Articles

The five most-viewed articles on Stratechery according to page views:

From Bing to Sydney — Microsoft launched a new conversational UI in Bing based on GPT-4; I got early access, and discovered Sydney, and had a series of conversations that blew my mind.
The Four Horsemen of the Tech Recession — Tech is increasingly divorced from the real economy thanks to the COVID hangover and Apple’s App Tracking Transparency.
OpenAI’s Misalignment and Microsoft’s Gain — The end of a dramatic weekend in tech is that OpenAI has split and Microsoft is partnered with one and has hired the other; this is the ultimate failure case of what should have been a for-profit company organized the wrong way.
Apple Vision — Apple Vision is incredibly compelling, first as a product, and second as far as potential use cases. What it says about society, though, is a bit more pessimistic.
The End of Silicon Valley (Bank) — Silicon Valley Bank bears responsibility for its demise, but it symbolizes a Silicon Valley reality that is very different from the myth — and the ultimate cause is tech itself.

AI Strategy

Is AI a sustaining technology that makes existing companies stronger, or a disruptive one that leads to new entrants?

AI and the Big Five — Given the success of existing companies with new epochs, the most obvious place to start when thinking about the impact of AI is with the big five: Apple, Amazon, Facebook, Google, and Microsoft.
Google I/O and the Coming AI Battles — Google I/O suggests that AI is a sustaining innovation for all of Big Tech; that means the real battle will be between incumbents and Big Tech on one side, and open source on the other.
Windows and the AI Platform Shift — Microsoft argued there is an AI platform shift, and the fact that Windows is interesting again — and that Apple is facing AI-related questions for its newest products — is evidence that the argument is correct.
The OpenAI Keynote — OpenAI’s developer keynote was exciting, both because AI was exciting, and because OpenAI has the potential to be a meaningful consumer tech company.
Google’s True Moonshot — Google could do more than just win the chatbot war: it is the one company that could make a universal assistant. The question is if the company is willing to risk it all.

AI Questions and Philosophy

AI doesn’t just raise strategic questions: it raises questions about the nature of computing, the future of society, and what it means to be human.

ChatGPT Gets a Computer — It’s possible that large language models are more like the human brain than we thought, given that it is about prediction; that is why ChatGPT needs its own computer in the form of plug-ins.
AI Philosophy — AI-generated content is not going to harm those with the capability of breaking through: it will make them stronger, aided by Zero Trust Authenticity.
Nvidia On the Mountaintop — Nvidia has gone from the valley to the mountain-top in less than a year, thanks to ChatGPT and the frenzy it inspired; whether or not there is a cliff depends on developing new kinds of demand that only GPUs can fulfill.
AI, Hardware, and Virtual Reality — Defining virtual reality as being about hardware is to miss the point: virtual reality is AI, and hardware is an (essential) means to an end.
Regretful Accelerationism — The Internet removed constraints from the analog world, and AI is finishing the job. That this may be the final blow for the Internet as a source for truth may ultimately be for the best.

Streaming and Hollywood

While the past, present, and future of content has always been a focus of Stratechery, this year felt like a tipping point for Hollywood in particular.

Netflix’s New Chapter — Netflix waited out Blockbuster with better economics, and it’s seeking to do the same with its competitors today; the key to the company’s differentiation, though, is increasingly creativity, not execution.
The Unified Content Business Model — Every content company is or should be moving to a model that incorporates both subscriptions and ads; creator platforms should help their publishers do the same.
Hollywood on Strike — The Hollywood strike is setting talent against studios, but the problem is that both are jointly threatened by the reality of the Internet and zero distribution costs.
Disney’s Taylor Swift Era — Not even Taylor Swift can fight the devaluation of recorded music, but she makes it up in physical experiences; Disney isn’t much different, but it looks much worse given the company’s old business model.
The Rise and Fall of ESPN’s Leverage — Charting ESPN’s rise, including how it built leverage over the cable TV providers, and its ongoing decline, caused by the Internet (See also: Charter-Disney Winners and Losers).

Regulation

It is, for better or worse, impossible to cover technology without discussing regulation, and 2023 was no different.

Amazon, Friction, and the FTC — The FTC’s Amazon complaint raises some fair points in isolation, but misses the bigger picture, both in terms of Amazon specifically and the Internet generally.
FTC Sues Amazon — The FTC is suing Amazon, and some of the complaints are compelling, but ultimately not convincing.
China Chips and Moore’s Law — Moore’s Law is not yet dead, nor is Moore’s Precept, even if AI computes differently. Addressing both is the key to succeeding with the China chip ban.
Attenuating Innovation (AI) — Innovation requires humility about the future and openness to what might be possible; Biden’s executive order proscribing AI development is the opposite, blocking progress and hindering the solutions to our greatest challenges.

Stratechery Interviews

Thursdays on Stratechery are for interviews — in podcast and transcript form — with public company executives, founders and private company executives, and other analysts.

Public Company Executive Interviews:

Startup/Private Company Executive Interviews:

Analysts:

Daniel Gross and Nat Friedman on AI in March, August, and December | Eric Seufert on digital advertising in February, May, and October | Michael Nathanson on Hollywood and streaming in January and December | Gregory C. Allen about China and chips in May and October | Jon Ostrower on the airline industry | Matthew Ball about streaming and the metaverse | John Kosner about sports | Chris Miller about Chip War | Marc Andreessen about AI | Eugene Wei about social media | Lisa Ellis about payments | Doug O’Laughlin and Dylan Patel about semiconductors | Craig Moffett about telecommunications | Bill Bishop about China

The Year in Stratechery Updates

Some of my favorite Stratechery Updates:

March 20: Microsoft Office AI, Copilot and Tech’s Two Philosophies, Business Chat and Appropriate Fear
March 28: The Accidental Consumer Tech Company; ChatGPT, Meta, and Product-Market Fit; Aggregation and APIs
April 10: Substack Notes, Twitter Blocks Substack, Substack Versus Writers
May 1: The Phoenix Suns Go Over-the-Air, Fans and Franchise Valuation, Attention and Customer Acquisition
May 8: Shopify Exits Logistics, The Shopify Logistics Side Quest, Whither Buy with Prime
May 10: Meta Open Sources Another AI Model, Moats and Open Source, Apple and Meta
June 12: Reddit Revolt, Apollo and Reddit’s Changes, Complement Complaints
June 21: EV Charging Standards, Tesla’s Strategy, Tesla’s Reward
June 28: Starlink Solution, Starlink Experience, Starlink Implications
July 12: Microsoft Can Acquire Activision, The FTC vs. the Record, The FTC’s Failed Vendetta
August 21: Adyen Earnings, Adyen’s European Context, Adyen vs. Stripe
September 6: Amazon and Shopify, Shopify and Its Merchants, The Payments Question
September 11: The Huawei Mate 60 Pro, 7nm Background, Implications and Reactions
September 18: Unity’s Business Model Change, Unity’s Strategy, Unity Leadership Questions
October 4: Spotify Subscription Audiobooks, Casual Fans and Bundles, Spotify’s Goals
November 8: Realtors Lose in Court, Zillow and Real Estate Aggregation, From Franchises to Businesses
November 13: Disney Earnings, Disney 3.0, Streaming and Sports
December 4: The College Football Playoff, Events Over Inventory, NASCAR’s New Deal
December 12: Google Loses Antitrust Case to Epic; The Differences Between Apple and Google, Revisited; The Tying Question
December 13: Netflix’s Data Drop, Power Laws, Netflix’s Motivations

I am so grateful to the subscribers that make it possible for me to do this as a job. I wish all of you a Merry Christmas and Happy New Year, and I’m looking forward to a great 2024!

Google’s True Moonshot

Monday, December 18, 2023 | Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

When I first went independent with Stratechery, I had a plan to make money on the side with speaking, consulting, etc.; what made me pull the plug on the latter was my last company speaking gig, with Google in November 2015 (I have always disclosed this on my About page). It didn’t seem tenable for me to have any sort of conflict of interest with companies I was covering, and the benefit of learning more about the companies I covered — the justification I told myself for taking the engagement — was outweighed by the inherent limitations that came from non-public data. And so, since late 2015, my business model has been fully aligned to my nature: fully independent, with access to the same information as everyone else.¹

I bring this up for three reasons, which I shall get to over the course of this Article. The first one has to do with titles: it was at that talk that a Google employee asked me what I thought of invoking the then-unannounced Google Assistant by saying “OK Google”. “OK Google” was definitely a different approach from Apple and Amazon’s “Siri” and “Alexa”, respectively, and I liked it: instead of pretending that the assistant was the dumbest human you have ever talked to, why not portray it as the smartest robot, leaning on the brand name that Google had built over time?

“OK Google” was, in practice, not as compelling as I hoped. It was better than Siri or Alexa, but it had all of the same limitations that were inherent to the natural language processing approach: you had to get the incantations right to get the best results, and the capabilities and responses were ultimately more deterministic than you might have hoped. That, though, wasn’t necessarily a problem for the brand: Google search is, at its core, still about providing the right incantations to get the set of results you are hoping for; Google Assistant, like Search, excelled in more mundane but critical attributes like speed and accuracy, if not personality and creativity.

What was different from search is that an Assistant needed to provide one answer, not a list of possible answers. This, though, was very much in keeping with Google’s fundamental nature; I once wrote in a Stratechery Article:

An assistant has to be far more proactive than, for example, a search results page; it’s not enough to present possible answers: rather, an assistant needs to give the right answer.

This is a welcome shift for Google the technology; from the beginning the search engine has included an “I’m Feeling Lucky” button, so confident was Google founder Larry Page that the search engine could deliver you the exact result you wanted, and while yesterday’s Google Assistant demos were canned, the results, particularly when it came to contextual awareness, were far more impressive than the other assistants on the market. More broadly, few dispute that Google is a clear leader when it comes to the artificial intelligence and machine learning that underlie their assistant.

That paragraph was from Google and the Limits of Strategy, where I first laid out some of the fundamental issues that have, over the last year, come into much sharper focus. On one hand, Google had the data, infrastructure, and customer touch points to win the “Assistant” competition; that remains the case today when it comes to generative AI, which promises the sort of experience I always hoped for from “OK Google.” On the other hand, “I’m feeling lucky” may have been core to Google’s nature, but it was counter to their business model; I continued in that Article:

A business, though, is about more than technology, and Google has two significant shortcomings when it comes to assistants in particular. First, as I explained after this year’s Google I/O, the company has a go-to-market gap: assistants are only useful if they are available, which in the case of hundreds of millions of iOS users means downloading and using a separate app (or building the sort of experience that, like Facebook, users will willingly spend extensive amounts of time in).

Secondly, though, Google has a business-model problem: the “I’m Feeling Lucky Button” guaranteed that the search in question would not make Google any money. After all, if a user doesn’t have to choose from search results, said user also doesn’t have the opportunity to click an ad, thus choosing the winner of the competition Google created between its advertisers for user attention. Google Assistant has the exact same problem: where do the ads go?

It is now eight years on from that talk, and seven years on from the launch of Google Assistant, but all of the old questions are as pertinent as ever.

Google’s Horizontal Webs

My first point brings me to the second reason I’m reminded of that Google talk: my presentation was entitled “The Opportunity — and the Enemy.” The opportunity was mobile, the best market the tech industry had ever seen; the enemy was Google itself, which even then was still under-investing in its iOS apps.

In the presentation I highlighted the fact that Google’s apps still didn’t support Force Touch, which Apple had introduced to iOS over a year earlier; to me this reflected the strategic mistake the company made in prioritizing Google Maps on Android, which culminated in Apple making its own mapping service. My point was one I had been making on Stratechery from the beginning: Google was a services company, which meant their optimal strategy was to serve all devices; by favoring Android they were letting the tail wag the dog.

Eight years on, and it’s clear I wasn’t the only one who saw the Maps fiasco as a disaster to be learned from: one of the most interesting revelations from the ongoing DOJ antitrust case against Google was reported by Bloomberg:

Two years after Apple Inc. dropped Google Maps as its default service on iPhones in favor of its own app, Google had regained only 40% of the mobile traffic it used to have on its mapping service, a Google executive testified in the antitrust trial against the Alphabet Inc. company. Michael Roszak, Google’s vice president for finance, said Tuesday that the company used the Apple Maps switch as “a data point” when modeling what might happen if the iPhone maker replaced Google’s search engine as the default on Apple’s Safari browser.

It’s a powerful data point, and I think the key to understanding what you might call the Google Aggregator Paradox: if Google wins by being better, then why does it fight so hard for defaults, both for search and, in the case of Android, the Play Store? The answer, I think, is that it is best to not even take the chance of alternative defaults being good enough. This is made easier given the structure of these deals, which are revenue shares, not payments; this does show up on Google’s income statement as Traffic Acquisition Costs (TAC), but from a cash flow perspective it is foregone zero marginal cost revenue. There is no pain of payment, just somewhat lower profitability on zero marginal cost searches.
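To make that arithmetic concrete, here is a minimal sketch in Python. Every name and number in it is hypothetical and chosen only for illustration, with one exception: the 40% figure is the Maps traffic number from the testimony quoted above, and applying it to search is an assumption, not something the testimony establishes.

```python
# Illustrative sketch only: the revenue figure and share rate are hypothetical,
# not taken from Google's filings or from the trial record.

def revenue_share_outcome(search_revenue, share_pct):
    """Split revenue under a default deal structured as a revenue share.

    Returns (tac, retained): the partner's cut, which would be reported as
    Traffic Acquisition Costs, and what the search provider keeps.
    """
    tac = search_revenue * share_pct / 100
    retained = search_revenue - tac
    return tac, retained


def lost_default_outcome(search_revenue, traffic_retained_pct):
    """Revenue kept if the default is lost and only part of the traffic returns."""
    return search_revenue * traffic_retained_pct / 100


revenue = 100.0  # hypothetical revenue from searches that arrive via the default
tac, kept_with_deal = revenue_share_outcome(revenue, share_pct=30)  # hypothetical rate
kept_without_deal = lost_default_outcome(revenue, traffic_retained_pct=40)  # Maps analogy

print(f"With the deal: pay {tac:.1f} in TAC, keep {kept_with_deal:.1f}")
print(f"Without the default: keep {kept_without_deal:.1f}")
```

Under these made-up inputs the deal costs nothing up front: the partner's share only exists if the searches happen, while losing the default puts most of the revenue at risk. Different inputs would change the size of the gap, not the structure of the tradeoff.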

The bigger cost is increasingly legal: the decision in the DOJ case won’t come down until next year, and Google may very well win; it’s hard to argue that the company ought not be able to bid on Apple’s default search placement if its competitors can (if anything the case demonstrates Apple’s power).

That’s not Google’s only legal challenge, though: last week the company lost another antitrust case, this time to Epic. I explained why the company lost — while Apple won — in last Tuesday’s Update:

That last point may seem odd in light of Apple’s victory, but again, Apple was offering an integrated product that it fully controlled and customers were fully aware of, and is thus, under U.S. antitrust law, free to set the price of entry however it chooses. Google, on the other hand, “entered into one or more agreements that unreasonably restrained trade” — that quote is from the jury instructions, and is taken directly from the Sherman Act — by which the jurors mean basically all of them: the Google Play Developer Distribution Agreement, investment agreements under the Games Velocity Program (i.e. Project Hug), and Android’s mobile application distribution agreement and revenue share agreements with OEMs, were all ruled illegal.

This goes back to the point I made above: Google’s fundamental legal challenge with Android is that it sought to have its cake and eat it too: it wanted all of the shine of open source and all of the reach and network effects of being a horizontal operating system provider and all of the control and profits of Apple, but the only way to do that was to pretty clearly (in my opinion) violate antitrust law.

Google’s Android strategy was, without question, brilliant, particularly when you realize that the ultimate goal was to protect search. By making it “open source”, Google got all of the phone carriers desperate for an iOS alternative on board, ensuring that hated rival Microsoft was not the alternative to Apple as it had been on PCs; a modular approach, though, is inherently more fragmented — and Google didn’t just want an alternative to Apple, they wanted to beat them, particularly in the early days of the smartphone wars — so the company spun a web of contracts and incentives to ensure that Android was only really usable with Google’s services. For this the company was rightly found guilty of antitrust violations in the EU, and now, for similar reasons, in the U.S.

The challenge for Google is that the smartphone market has a lot more friction than search: the company needs to coordinate both OEMs and developers; when it came to search the company could simply take advantage of the openness of the web. This resulted in tension between Google’s nature — being the one-stop shop for information — and the business model of being a horizontal app platform and operating system provider. It’s not dissimilar to the tension the company faces with its Assistant, and in the future with Generative AI: the company wants to simply give you the answer, but how to do that while still making money?

Infrastructure, Data, and Ecosystems

The third reason I remember that weekend in 2015 is it was the same month that Google open-sourced TensorFlow, its machine-learning framework. I thought it was a great move, and wrote in TensorFlow and Monetizing Intellectual Property:

I’m hardly qualified to judge the technical worth of TensorFlow, but I feel pretty safe in assuming that it is excellent and likely far beyond what any other company could produce. Machine learning, though, is about a whole lot more than a software system: specifically, it’s about a whole lot of data, and an infrastructure that can process that data. And, unsurprisingly, those are two areas where Google has a dominant position.

Indeed, as good as TensorFlow might be, I bet it’s the weakest of these three pieces Google needs to truly apply machine learning to all its various business, both those of today and those of the future. Why not, then, leverage the collective knowledge of machine learning experts all over the world to make TensorFlow better? Why not make a move to ensure the machine learning experts of the future grow up with TensorFlow as the default? And why not ensure that the industry’s default machine learning system utilizes standards set in place by Google itself, with a design already suited for Google’s infrastructure?

After all, contra Gates’ 2005 claim, it turns out the value of pure intellectual property is not derived from government-enforced exclusivity, but rather from the complementary pieces that surround that intellectual property which are far more difficult to replicate. Google is betting that its lead in both data and infrastructure are significant and growing, and that’s a far better bet in my mind than an all-too-often futile attempt to derive value from an asset that by its very nature can be replicated endlessly.

In fact, it turned out that TensorFlow was not so excellent — that link I used to support my position in the above excerpt now 404s — and it has been surpassed by Meta’s PyTorch in particular; at Google Cloud Next the company announced a partnership with Nvidia to build out OpenXLA as a compiler of sorts to ensure that output from TensorFlow, JAX, and PyTorch can run on any hardware. This matters for Google because those infrastructure advantages very much exist; the more important “Tensor” product for Google is its Tensor Processing Unit series of chips, the existence of which makes Google uniquely able to scale beyond whatever allocation it can get of Nvidia GPUs.

The importance of TPUs was demonstrated with the announcement of Gemini, Google’s latest AI model; the company claims the “Ultra” variant, which it hasn’t yet released, is better than GPT-4. What is notable is that Gemini was trained and will run inference on TPUs. While there are some questions about the ultimate scalability of TPUs, for now Google is the best positioned to both train and, more importantly, serve generative AI in a cost efficient way.

Then there is data: a recent report in The Information claims that Gemini relies heavily on data from YouTube, and that is not the only proprietary data Google has access to: free Gmail and Google Docs are another massive resource, although it is unclear to what extent Google is using that data, or if it is, for what. At a minimum there is little question that Google has the most accessible repository of Internet data going back a quarter of a century to when Larry Page and Sergey Brin first started crawling the open web from their dorm room.

And so we are back where we started: Google has incredible amounts of data and the best infrastructure, but once again, an unsteady relationship with the broader development community.

Gemini and Seamless AI

The part of the Gemini announcement that drew the most attention did not have anything to do with infrastructure or data: what everyone ended up talking about was the company’s Gemini demo, and the fact it wasn’t representative of Gemini’s actual capabilities. Here’s the demo:

Parmy Olson for Bloomberg Opinion was the first to highlight the problem:

In reality, the demo also wasn’t carried out in real time or in voice. When asked about the video by Bloomberg Opinion, a Google spokesperson said it was made by “using still image frames from the footage, and prompting via text,” and they pointed to a site showing how others could interact with Gemini with photos of their hands, or of drawings or other objects. In other words, the voice in the demo was reading out human-made prompts they’d made to Gemini, and showing them still images. That’s quite different from what Google seemed to be suggesting: that a person could have a smooth voice conversation with Gemini as it watched and responded in real time to the world around it.

This was obviously a misstep, and a bizarre one at that: as I noted in an Update, Google, given its long-term advantages in this space, would have been much better served by being transparent, particularly since it suddenly finds itself with a trustworthiness advantage relative to Microsoft and OpenAI. The goal for the company should be demonstrating competitiveness and competence; a fake demo did the opposite.

And yet, I can understand how the demo came to be; it is getting close to the holy grail of Assistants: an entity with which you can conduct a free-flowing conversation, without the friction of needing to invoke the right incantations or type and read big blocks of text. If Gemini Ultra really is better than GPT-4, or even roughly competitive, then I believe this capability is close. After all, I got a taste of it with GPT-4 and its voice capabilities; from AI, Hardware, and Virtual Reality:

The first AI announcement of the week was literally AI that can talk: OpenAI announced that you can now converse with ChatGPT, and I found the experience profound.

You have obviously been able to chat with ChatGPT via text for many months now; what I only truly appreciated after talking with ChatGPT, though, was just how much work it was to type out questions and read answers. There was, in other words, a human constraint in our conversations that made it feel like I was using a tool; small wonder that the vast majority of my interaction with ChatGPT has been to do some sort of research, or try to remember something on the edge of my memory, too fuzzy to type a clear search term into Google.

Simply talking, though, removed that barrier: I quickly found myself having philosophical discussions including, for example, the nature of virtual reality. It was the discussion itself that provided a clue: virtual reality feels real, but something can only feel real if human constraints are no longer apparent. In the case of conversation, there is no effort required to talk to another human in person, or on the phone; to talk to them via chat is certainly convenient, but there is a much more tangible separation. So it is with ChatGPT.

The problem is that this experience requires a pretty significant suspension of disbelief, because there is too much friction. You have to open the OpenAI app, then you have to set it to voice mode, then you have to wait for it to connect, then every question and answer contains a bit too much lag, and the answers start sounding like blocks of text instead of a conversation. Notice, though, that Google is much better placed than OpenAI to solve all of these challenges:

Google sells its own phones which could be configured to have a conversation UI by default (or with Google’s Pixel Buds). This removes the friction of opening an app and setting a mode. Google also has a fleet of home devices already designed for voice interaction.
Google has massive amounts of infrastructure all over the globe, with the lowest latency and fastest response. This undergirds search today, but it could undergird a new generative AI assistant tomorrow.
Google has access to gobs of data specifically tied to human vocal communication, thanks to YouTube in particular.

In short, the Gemini demo may have been faked, but Google is by far the company best positioned to make it real.

Pixie

There was one other interesting tidbit in The Information article (emphasis mine):

Over the next few months, Google will have to show it can integrate the AI models it groups under the Gemini banner into its products, without cannibalizing existing businesses such as search. It has already put a less advanced version of Gemini into Bard, the chatbot it created to compete with ChatGPT, which has so far seen limited uptake. In the future, it plans to use Gemini across nearly its entire line of products, from its search engine to its productivity applications and an AI assistant called Pixie that will be exclusive to its Pixel devices, two people familiar with the matter said. Products could also include wearable devices, such as glasses that could make use of the AI’s ability to recognize the objects a wearer is seeing, according to a person with knowledge of internal discussions. The device could then advise them, say, on how to use a tool, solve a math problem or play a musical instrument.

The details of Pixie, such as they were, came at the very end:

The rollout of Pixie, an AI assistant exclusively for Pixel devices, could boost Google’s hardware business at a time when tech companies are racing to integrate their hardware with new AI capabilities. Pixie will use the information on a customer’s phone — including data from Google products like Maps and Gmail — to evolve into a far more personalized version of the Google Assistant, according to one of the people with knowledge of the project. The feature could launch as soon as next year with the Pixel 9 and the 9 Pro, this person said.

That Google is readying a super-charged version of the Google Assistant is hardly a surprise; what is notable is the reporting that it will be exclusive to Pixel devices. This is counter to Gemini itself: the Gemini Nano model, which is designed to run on smartphones, will be available to all Android devices with neural processing units like Google’s Tensor G3. That is very much in-line with the post-Maps Google: services are the most valuable when they are available everywhere, and Pixel has a tiny amount of marketshare.

That, by extension, makes me think that the “Pixie exclusive to Pixel” report is mistaken, particularly since I’ve been taken in by this sort of thing before. That Google Assistant piece I quote above — Google and the Limits of Strategy — interpreted the launch of Google Assistant on Pixel devices as evidence that Google was trying to differentiate its own hardware:

Today’s world, though, is not one of (somewhat) standards-based browsers that treat every web page the same, creating the conditions for Google’s superior technology to become the door to the Internet; it is one of closed ecosystems centered around hardware or social networks, and having failed at the latter, Google is having a go at the former. To put it more generously, Google has adopted Alan Kay’s maxim that “People who are really serious about software should make their own hardware.” To that end the company introduced multiple hardware devices, including a new phone, the previously-announced Google Home device, new Chromecasts, and a new VR headset. Needless to say, all make it far easier to use Google services than any 3rd-party OEM does, much less Apple’s iPhone.

What is even more interesting is that Google has also introduced a new business model: the Pixel phone starts at $649, the same as an iPhone, and while it will take time for Google to achieve the level of scale and expertise to match Apple’s profit margins, the fact there is unquestionably a big margin built-in is a profound new direction for the company.

The most fascinating point of all, though, is how Google intends to sell the Pixel: the Google Assistant is, at least for now, exclusive to the first true Google phone, delivering a differentiated experience that, at least theoretically, justifies that margin. It is a strategy that certainly sounds familiar, raising the question of whether this is a replay of the turn-by-turn navigation disaster. Is Google forgetting that they are a horizontal company, one whose business model is designed to maximize reach, not limit it?

My argument was that Google was in fact being logical, for the business model reasons I articulated both in that Article and at the beginning of this year in AI and the Big Five: simply giving the user the right answer threatened the company’s core business model, which meant it made sense to start diversifying into new ones. And then, just a few months later, Google Assistant was available to other Android device makers. It was probably the right decision, for the same reason that the company should have never diminished its iOS maps product in favor of Android.

And yet, all of the reasoning I laid out for making the Google Assistant a differentiator still holds: AI is a threat to Search for all of the same reasons I laid out in 2016, and Google is uniquely positioned to create the best Assistant. The big potential difference with Pixie is that it might actually be good, and a far better differentiator than the Google Assistant. The reason, remember, is not just about Gemini versus GPT-4: it’s because Google actually sells hardware, and has the infrastructure and data to back it up.

Google’s True Moonshot

Google’s collection of moonshots — from Waymo to Google Fiber to Nest to Project Wing to Verily to Project Loon (and the list goes on) — have mostly been science projects that have, for the most part, served to divert profits from Google Search away from shareholders. Waymo is probably the most interesting, but even if it succeeds, it is ultimately a car service rather far afield from Google’s mission statement “to organize the world’s information and make it universally accessible and useful.”

What, though, if the mission statement were the moonshot all along? What if “I’m Feeling Lucky” were not a whimsical button on a spartan home page, but the default way of interacting with all of the world’s information? What if an AI Assistant were so good, and so natural, that anyone with seamless access to it simply used it all the time, without thought?

That, needless to say, is probably the only thing that truly scares Apple. Yes, Android has its advantages over iOS, but they aren’t particularly meaningful to most people, and even for those who care — like me — they are not large enough to give up on iOS’s overall superior user experience. The only things that drive meaningful shifts in platform marketshare are paradigm shifts, and while I doubt the v1 version of Pixie would be good enough to drive iPhone users to switch, there is at least a path to where it does exactly that.

Of course Pixel would need to win in the Android space first, and that would mean massively more investment by Google in go-to-market activities in particular, from opening stores to subsidizing carriers to ramping up production capacity. It would not be cheap, which is why it’s no surprise that Google hasn’t truly invested to make Pixel a meaningful player in the smartphone space.

The potential payoff, though, is astronomical: a world with Pixie everywhere means a world where Google makes real money from selling hardware, in addition to services for enterprises and schools, and cloud services that leverage Google’s infrastructure to provide the same capabilities to businesses. Moreover, it’s a world where Google is truly integrated: the company already makes the chips, in both its phones and its data centers, it makes the models, and it does it all with the largest collection of data in the world.

This path does away with the messiness of complicated relationships with OEMs and developers and the like, which I think suits the company: Google, at its core, has always been much more like Apple than Microsoft. It wants to control everything, it just needs to do it legally; that the best manifestation of AI is almost certainly dependent on a fully integrated (and thus fully seamless) experience means that the company can both control everything and, if it pulls this gambit off, serve everyone.

The problem is that the risks are massive: Google would not only be risking search revenue, it would also estrange its OEM partners, all while spending astronomical amounts of money. The attempt to be the one AI Assistant that everyone uses — and pays for — is the polar opposite of the conservative approach the company has taken to the Google Aggregator Paradox. Paying for defaults and buying off competitors is the strategy of a company seeking to protect what it has; spending on a bold assault on the most dominant company in tech is to risk it all.

And yet, to simply continue on the current path, folding AI into its current products and selling it via Google Cloud, is a risk of another sort. Google is not going anywhere anytime soon, and Search has a powerful moat in terms of usefulness, defaults, and most critically, user habits; Google Cloud, no matter the scenario, remains an attractive way to monetize Google AI and leverage its infrastructure, and perhaps that will be seen as enough. Where will such a path lead in ten or twenty years, though?

Ultimately, this is a question for leadership, and I thought Daniel Gross’s observation on this point in the recent Stratechery Interview with him and Nat Friedman was insightful:

So to me, yeah, does Google figure out how to master AI in the infrastructure side? Feels pretty obvious, they’ll figure it out, it’s not that hard. The deeper question is, on the much higher margin presumably, consumer angle, do they just cede too much ground to startups, Perplexity or ChatGPT or others? I don’t know what the answer is there and forecasting that answer is a little bit hard because it probably literally depends on three or four people at Google and whether they want to take the risk and do it.

We definitively know that if the founders weren’t in the story — we could not definitively, but forecast with pretty good odds — that it would just run its course and it would gradually lose market share over time and we’d all sail into a world of agents. However, we saw Sergey Brin as an individual contributor on the Gemini paper and we have friends that work on Gemini and they say that’s not a joke, he is involved day-to-day. He has a tremendous amount of influence, power, and control over Google so if he’s staring at that, together with his co-founder, I do think they could overnight kill a lot of startups, really damage ChatGPT, and just build a great product, but that requires a moment of [founder initiative].

It’s possible, it’s just hard to forecast if they will do it or not. In my head, that is the main question that matters in terms of whether Google adds or loses a zero. I think they’ll build the capability, there’s no doubt about it.

I agree. Google could build the AI to win it all. It’s not guaranteed they would succeed, but the opportunity is there if they want to go for it. That is the path that would be in the nature of the Google that conquered the web twenty years ago, the Google that saw advertising as the easiest way to monetize what was an unbridled pursuit of self-contained technological capability.

The question is whether that nature has been superseded by one focused on limiting losses and extracting profits; yes, there is still tremendous technological invention, but as Horace Dediu explained on Asymco, that is different than innovation, which means actually making products that move markets. Can Google still do that? Do they want to? Whither Google?

I do still speak at conferences, but last spoke for pay in January 2017.

Regretful Accelerationism

Tuesday, December 5, 2023

Ready Player One, the book that was issued to every new Oculus employee once upon a time, describes its world in a way that was perhaps edgy in 2011 but seems rather cliché today:

“You’re probably wondering what happened before you got here. An awful lot of stuff, actually. Once we evolved into humans, things got pretty interesting. We figured out how to grow food and domesticate animals so we didn’t have to spend all of our time hunting. Our tribes got much bigger, and we spread across the entire planet like an unstoppable virus. Then, after fighting a bunch of wars with each other over land, resources, and our made-up gods, we eventually got all of our tribes organized into a ‘global civilization.’ But, honestly, it wasn’t all that organized, or civilized, and we continued to fight a lot of wars with each other. But we also figured out how to do science, which helped us develop technology. For a bunch of hairless apes, we’ve actually managed to invent some pretty incredible things. Computers. Medicine. Lasers. Microwave ovens. Artificial hearts. Atomic bombs. We even sent a few guys to the moon and brought them back. We also created a global communications network that lets us all talk to each other, all around the world, all the time. Pretty impressive, right?

“But that’s where the bad news comes in. Our global civilization came at a huge cost. We needed a whole bunch of energy to build it, and we got that energy by burning fossil fuels, which came from dead plants and animals buried deep in the ground. We used up most of this fuel before you got here, and now it’s pretty much all gone. This means that we no longer have enough energy to keep our civilization running like it was before. So we’ve had to cut back. Big-time. We call this the Global Energy Crisis, and it’s been going on for a while now.

What is striking about this depiction is not the concept of a global energy crisis, or the lack of imagination about alternative energy and the tremendous progress that has been made over the last decade in separating emissions from energy production. Rather, it’s the disconnect between the global communications network and any sort of negative externalities; the former just happened to come about at the same time the real world fell apart.

This is a theme throughout the book, and many other depictions of virtual reality in science fiction; the physical world is a hellscape, while the online world is this oasis (pun intended) of vitality and adventure, and, crucially, one that is programmed and consistent. The central conceit of Ready Player One is that the creator of OASIS (I told you it was a pun!), the virtual world in which most of the story happens, left an easter egg in said world, the discovery of which would mean ownership of the company that made OASIS available to anyone on earth.

I’ve expressed my skepticism of a unitary shared environment previously, in 2021’s Metaverses; in that case I questioned a similar conceit in Snow Crash, the other origin text in terms of the Metaverse.

In this way the Metaverse is actually a unifying force for Stephenson’s dystopia: there is only one virtual world sitting beyond a real world that is fractured between independent entities. There are connections in the real world — roads and helicopters and airplanes exist — but those connections are subject to tolls and gatekeepers, in contrast to the interoperability and freedom of the Metaverse.

In other words, I think that Stephenson got the future exactly backwards: in our world the benevolent monopolist is the reality of atoms. Sure, we can construct borders and private clubs, just as the Metaverse has private property, but interoperability and a shared economy are inescapable in the real world; physical constraints are community. It is on the Internet, where anything is possible, that walled gardens flourish. Facebook has total control of Facebook, Apple of iOS, Google of Android, and so on down the stack. Yes, HTTP and SMTP and other protocols still exist, but it’s not an accident those were developed before anyone thought there was money to be made online; today’s APIs have commercial intent built-in from first principles.

I think this was directionally correct: the real world is one place, and the online world many, but what I didn’t appreciate even as recently as two years ago was that the online world as I knew it then was subject to more constraints than I realized; it is only as those constraints disappear that the idea of the Internet as a place of refuge seems ever more dubious.

Why Web Pages Suck

In 2015 I set out to answer a simple question: Why Web Pages Suck.

From the publishers’ perspective, the fixed cost of a printing press not only provided a moat from competition, it also meant that publishers displayed ads on their terms. To use the Conservation of Attractive Profits model that I discussed last week, publishers were exceptionally profitable for having integrated content and ads in this way:

As the description of programmatic advertising should make clear, though, that is no longer the case. Ad spots are effectively black boxes from the publisher perspective, and direct windows to the user from the ad network’s perspective. This has both modularized content and moved ad networks closer to users:

Here’s the simple truth: if you’re competing in a modular market, as today’s publishers are, profits are slim at best, and you generally take what you can get from a revenue perspective. To put it another way, publishers today have about as much bargaining power as do Uber drivers, and we’ve seen how that has gone.

The very next week I would write Aggregation Theory:

The last several articles on Stratechery have formed an unintentional series:

Airbnb and the Internet Revolution described how Airbnb and the sharing economy have commoditized trust, enabling a new business model based on aggregating resources and managing the customer relationship

Netflix and the Conservation of Attractive Profits placed this commodification/aggregation concept into Clay Christensen’s Conservation of Attractive Profits framework, which states that profits are earned by the integrated provider in a value chain, and that profits shift when another company successfully modularizes the incumbent and integrates another part of the value chain

Why Web Pages Suck was primarily about the effect of programmatic advertising on web page performance, but in the conclusion I noted that the way in which ad networks were commoditizing publishers also fit the “Conservation of Attractive Profits” framework

In retrospect, there is a clear thread. In fact, I believe this thread runs through nearly every post on Stratechery, not just the last three. I am calling that thread Aggregation Theory.

In a world of abundance like the web, economic power came from marshaling demand, and that demand was marshaled by being better at discovery, not distribution (after all, distribution was free; that’s why there was so much abundance in the first place!). Entities that controlled demand, then, had power in the value chain, which meant they were best placed to integrate into advertising in particular, leaving everyone else in the value chain as modularized pieces without any meaningful pricing power.

In this world Google and Facebook were the biggest winners — I called them Super Aggregators — but they were different when it came to the suppliers of their respective value chains. Facebook’s content was user-generated, and exclusive to Facebook. What was so compelling about this economically is that user-generated content is free, which meant that Facebook was more fully integrated than Google, which relied on the rest of the web to provide the content that made its search engine useful.

Free AI

The web does, of course, include lots of free content, much of which accrues to Google’s benefit: Wikipedia, Reddit, blogs, etc., are themselves user-generated content but on the open web. Lots of other free content, though, was monetized by ads, produced by publications employing professional writers. This was an inherently difficult business, though, thanks to that free distribution: that meant there was infinite competition, which meant the only route to profitability was continuing to cut costs.

What, then, should we have expected to happen once the world gained the means of generating human-level content at zero marginal cost? From Futurism:

There was nothing in Drew Ortiz’s author biography at Sports Illustrated to suggest that he was anything other than human. “Drew has spent much of his life outdoors, and is excited to guide you through his never-ending list of the best products to keep you from falling to the perils of nature,” it read. “Nowadays, there is rarely a weekend that goes by where Drew isn’t out camping, hiking, or just back on his parents’ farm.”

The only problem? Outside of Sports Illustrated, Drew Ortiz doesn’t seem to exist. He has no social media presence and no publishing history. And even more strangely, his profile photo on Sports Illustrated is for sale on a website that sells AI-generated headshots, where he’s described as “neutral white young-adult male with short brown hair and blue eyes.”…

According to a second person involved in the creation of the Sports Illustrated content who also asked to be kept anonymous, that’s because it’s not just the authors’ headshots that are AI-generated. At least some of the articles themselves, they said, were churned out using AI as well.

What makes this article particularly poignant is the property involved: Sports Illustrated was an icon of the print era; it transitioned to the web somewhat, in a partnership with CNN, but over the last several years it has laid off staff and passed from hand to increasingly unethical hand. Unethical, that is, if you prioritize journalistic integrity over making money: it’s hard to escape the sense that the two are irreconcilable. Journalism costs money, which means an uncompetitive cost structure, and Sports Illustrated isn’t the only one. Continuing from Futurism:

As powerful generative AI tools have debuted over the past few years, many publishers have quickly attempted to use the tech to churn out monetizable content…We caught CNET and Bankrate, both owned by Red Ventures, publishing barely-disclosed AI content that was filled with factual mistakes and even plagiarism; in the ensuing storm of criticism, CNET issued corrections to more than half its AI-generated articles. G/O Media also published AI-generated material on its portfolio of sites, resulting in embarrassing bungles at Gizmodo and The A.V. Club. We caught BuzzFeed publishing slapdash AI-generated travel guides. And USA Today and other Gannett newspapers were busted publishing hilariously garbled AI-generated sports roundups that one of the company’s own sports journalists described as “embarrassing,” saying they “shouldn’t ever” have been published.

These, of course, are the companies that were caught; in time, the AI will become good enough that no one will know what is real and what is not.

Google’s Missing Constraints

This wasn’t the only AI-generated content story of the week, though; this thread on X went viral as well:

We pulled off an SEO heist using AI.

1. Exported a competitor’s sitemap
2. Turned their list of URLs into article titles
3. Created 1,800 articles from those titles at scale using AI

18 months later, we have stolen:

– 3.6M total traffic
– 490K monthly traffic

— Jake Ward (@jakezward) November 24, 2023

That Google faces a challenge with SEO spam is obvious to anyone who uses the search engine. What is notable about this fight is that it is, from a certain perspective, simply too much of a good thing. Google is so important that every site on the Internet works to optimize itself for Google search; in other words, Google’s suppliers are incentivized to work for Google.

That was all fine and good in the early 2000s when Google came to prominence, and content on the Internet was, yes, freely distributed, but required significant marginal costs to produce (in time if not in money). What changed is that advertising became sufficiently lucrative that it was worth spending that marginal cost in a systemic way to get more traffic; thus began the cat-and-mouse game that is SEO and Google algorithm updates (which, I should note, have already demoted the site featured in that thread).

AI-generated content, though, will likely push the situation past the breaking point: yes, the amount of money that can be made from advertising by any individual page is continually decreasing, but if pages can be produced for no marginal cost then the number of pages that will be produced is effectively infinite.

This is one update to my thinking: when I wrote AI and the Big Five at the beginning of this year, I expressed the most concern about Google not because I doubted their AI chops, but rather because a chatbot approach seemed to threaten their advertising model:

Google has long been a leader in using machine learning to make its search and other consumer-facing products better (and has offered that technology as a service through Google Cloud). Search, though, has always depended on humans as the ultimate arbiter: Google will provide links, but it is the user that decides which one is the correct one by clicking on it. This extended to ads: Google’s offering was revolutionary because instead of charging advertisers for impressions — the value of which was very difficult to ascertain, particularly 20 years ago — it charged for clicks; the very people the advertisers were trying to reach would decide whether their ads were good enough.

If there aren’t links to click — because you simply got the answer — then the ads won’t be worth as much; what is even worse is if the links are all unreliable. In this view generative AI answers are actually a way out for Google in the long run: if it can no longer trust the web for supply, it will need to integrate backwards into its own.

Social Media Inhumanity

That, then, is the first constraint on the online world that is slipping away: the elimination of marginal cost for content creation. The second has been happening longer, and is represented by TikTok and its assault on Meta’s seemingly impregnable dominance of social media: user-generated content used to be constrained by who you knew, but TikTok (and YouTube) simply surfaced the most compelling content across the entire network. I’ve already written about the potential intersection of these two trends: custom content generated specifically for every user.

There are already examples of AI influencers, and Meta itself is experimenting with AI celebrities; one of the fastest growing AI startups, meanwhile, is reportedly character.ai, which lets you interact with your own AI friend. Just last week Roblox CEO David Baszucki spoke favorably to me about the possibility of interactive NPCs helping boost Roblox from just a gaming platform into a “3D communications platform.”

Still, as Baszucki made clear, the goal is still actual social networking: surely that will always be better than interacting with an AI! Or will it? It seems to me that perhaps the most important constraint on the web — to actually interact with people as if they are, well, people — disappeared a long time ago. It doesn’t take much time or prominence on X or any other social networking platform to realize that it is nothing like real life, and is only tolerable if you view the entire enterprise as something to be laughed at and, still, occasionally, learned from.

I do strongly believe that an essential quality for success, both on the Internet and off, is to not take social media too seriously. Humans simply weren’t meant to get feedback from thousands or sometimes millions of anonymous strangers all at once; the most successful creators I know are the most wary of getting sucked in to the online maelstrom. One wonders — hopes — that we can someday reach a similar conclusion collectively, and start treating X in particular more like the comments section and less like an assignment editor.

The Current Thing

In this the demise of the ad-supported Internet may be a blessing: the most sustainable model for media to date is subscriptions, and subscriptions mean answering to your subscribers, not social media generally. This isn’t perfect — we end up with never-ending niches that demand a particular point of view from their publications of choice — but it is at least a point of view that is something other than the amorphous rage and current thing-ism that dominates the web. I wrote in an Article about The Current Thing in 2022:

This dynamic is exactly what the meme highlights: sure, the Internet makes possible a wide range of viewpoints — you can absolutely find critics of Black Lives Matter, COVID policies, or pro-Ukraine policies — but the Internet, thanks to its lack of friction and instant feedback loops, also makes nearly every position but the dominant one untenable. If everyone believes one thing, the costs of believing something else increase dramatically, making the consensus opinion the only viable option; this is the same dynamic in which publishers become dependent on Google or Facebook, or retailers on Amazon, just because that is where money can be made.

Again, to be very clear, that does not mean the opinion is wrong; as I noted, I think the resonance of this meme is orthogonal to the rightness of the position it is critiquing, and is instead concerned with the sense that there is something unique about the depth of sentiment surrounding issues that don’t necessarily apply in any real-life way to the people feeling said sentiment.

There was a “current thing” in Ready Player One: the easter egg, and the protagonist’s progress in finding it stirred up worldwide interest. Again, though, this portrayal doesn’t match reality: we don’t have a unitary online world designed by a master architect driving offline interest; we have a churning mass of users absent their humanity coalescing around schelling points that are in many respects incidental to the mass hysteria they produce. The result is out of anyone’s control.

To put it more bluntly, despite the fact my personal and professional life are centered on — and blessed by — the Internet, I’m increasingly skeptical that it can be, as it was in Ready Player One, portrayed as a distinct development from a world increasingly in turmoil. Correlation may not be causation, but sometimes it absolutely is.

In this I do, with reluctance, adopt an accelerationist view of progress; call it r/acc: regretful accelerationism. I suspect we humans do better with constraints; the Internet stripped away the constraint of physical distribution, and now AI is removing the constraint of needing to actually produce content. That this is spoiling the Internet is perhaps the best hope for finding our way back to what is real. Let the virtual world be one of customized content for every individual, with the assumption it is all made-up; some may lose themselves to the algorithm and AI friends, but perhaps more will realize that the only way to survive online is to pay it increasingly little heed.

OpenAI’s Misalignment and Microsoft’s Gain

Monday, November 20, 2023 | Tuesday, February 6, 2024

I have, as you might expect, authored several versions of this Article, both in my head and on the page, as the most extraordinary weekend of my career has unfolded. To briefly summarize:

On Friday, then-CEO Sam Altman was fired from OpenAI by the board that governs the non-profit; then-President Greg Brockman was removed from the board and subsequently resigned.
Over the weekend rumors surged that Altman was negotiating his return, only for OpenAI to hire former Twitch CEO Emmett Shear as CEO.
Finally, late Sunday night, Satya Nadella announced via tweet that Altman and Brockman, “together with colleagues”, would be joining Microsoft.

This is, quite obviously, a phenomenal outcome for Microsoft. The company already has a perpetual license to all OpenAI IP (short of artificial general intelligence), including source code and model weights; the question was whether it would have the talent to exploit that IP if OpenAI suffered the sort of talent drain that was threatened upon Altman and Brockman’s removal. Indeed they will, as a good portion of that talent seems likely to flow to Microsoft; you can make the case that Microsoft just acquired OpenAI for $0 and zero risk of an antitrust lawsuit.¹

Microsoft’s gain, meanwhile, is a loss for OpenAI, which is dependent on the Redmond-based company for both money and compute: the work its employees will do on AI will either be Microsoft’s by virtue of that perpetual license, or Microsoft’s directly because said employees joined Altman’s team. OpenAI’s trump card is ChatGPT, which is well on its way to achieving the holy grail of tech — an at-scale consumer platform — but if the reporting this weekend is to be believed, OpenAI’s board may have already had second thoughts about the incentives ChatGPT placed on the company (more on this below).

The biggest loss of all, though, is a necessary one: the myth that anything but a for-profit corporation is the right way to organize a company.

OpenAI’s Non-Profit Model

OpenAI was founded in 2015 as a “non-profit intelligence research company.” From the initial blog post:

OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field.

I was pretty cynical about the motivations of OpenAI’s founders, at least Altman and Elon Musk; I wrote in a Daily Update:

Elon Musk and Sam Altman, who head organizations (Tesla and YCombinator, respectively) that look a lot like the two examples I just described of companies threatened by Google and Facebook’s data advantage, have done exactly that with OpenAI, with the added incentive of making the entire thing a non-profit; I say “incentive” because being a non-profit is almost certainly a lot less about being altruistic and a lot more about the line I highlighted at the beginning: “We hope this is what matters most to the best in the field.” In other words, OpenAI may not have the best data, but at least it has a mission structure that may help idealist researchers sleep better at night. That OpenAI may help balance the playing field for Tesla and YCombinator is, I guess we’re supposed to believe, a happy coincidence.

Whatever Altman and Musk’s motivations, the decision to make OpenAI a non-profit wasn’t just talk: the company is a 501(c)3; you can view their annual IRS filings here. The first question on Form 990 asks the organization to “Briefly describe the organization’s mission or most significant activities”; the first filing in 2016 stated:

OpenAI’s goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. We think that artificial intelligence technology will help shape the 21st century, and we want to help the world build safe AI technology and ensure that AI’s benefits are as widely and evenly distributed as possible. We’re trying to build AI as part of a larger community, and we want to openly share our plans and capabilities along the way.

Two years later, and the commitment to “openly share our plans and capabilities along the way” was gone; three years after that and the goal of “advanc[ing] digital intelligence” was replaced by “build[ing] general-purpose artificial intelligence”.

In 2018 Musk, according to a Semafor report earlier this year, attempted to take over the company, but was rebuffed; he left the board and, more critically, stopped paying for OpenAI’s operations. That led to the second critical piece of background: faced with the need to pay for massive amounts of compute power, Altman, now firmly in charge of OpenAI, created OpenAI Global, LLC, a capped profit company with Microsoft as minority owner. This image of OpenAI’s current structure is from their website:

OpenAI Global could raise money and, critically to its investors, make it, but it still operated under the auspices of the non-profit and its mission; OpenAI Global’s operating agreement states:

The Company exists to advance OpenAI, Inc.’s mission of ensuring that safe artificial general intelligence is developed and benefits all of humanity. The Company’s duty to this mission and the principles advanced in the OpenAI, Inc. Charter take precedence over any obligation to generate a profit. The Company may never make a profit, and the Company is under no obligation to do so. The Company is free to re-invest any or all of the Company’s cash flow into research and development activities and/or related expenses without any obligation to the Members.

Microsoft, despite this constraint on OpenAI Global, was not only an investor, but also a customer, incorporating OpenAI into all of its products.

ChatGPT Tribes

The third critical piece of background is the most well-known, and what has driven those ambitions to new heights: ChatGPT was released at the end of November 2022, and it has taken the world by storm. Today ChatGPT has over 100 million weekly users and over $1 billion in revenue; it has also fundamentally altered the conversation about AI for nearly every major company and government.

What was most compelling to me, though, was the possibility I noted above, in which ChatGPT becomes the foundation of a new major consumer tech company, the most valuable and most difficult kind of company to build. I wrote earlier this year in The Accidental Consumer Tech Company:

When it comes to meaningful consumer tech companies, the product is actually the most important. The key to consumer products is efficient customer acquisition, which means word-of-mouth and/or network effects; ChatGPT doesn’t really have the latter (yes, it gets feedback), but it has an astronomical amount of the former. Indeed, the product that ChatGPT’s emergence most reminds me of is Google: it simply was better than anything else on the market, which meant it didn’t matter that it came from a couple of university students (the origin stories are not dissimilar!). Moreover, just like Google — and in opposition to Zuckerberg’s obsession with hardware — ChatGPT is so good people find a way to use it. There isn’t even an app! And yet there is now, a mere four months in, a platform.

The platform I was referring to was ChatGPT plugins; it’s a compelling concept with a UI that didn’t quite work, and it was only eight months later at OpenAI’s first developer day that the company announced GPTs, their second take at being a platform. Meanwhile, Altman was reportedly exploring new companies outside of the OpenAI purview to build chips and hardware, apparently without the board’s knowledge. Some combination of these factors, or perhaps something else not yet reported, was the final straw for the board, which, led by Chief Scientist Ilya Sutskever, deposed Altman over the weekend. The Atlantic reported:

Altman’s dismissal by OpenAI’s board on Friday was the culmination of a power struggle between the company’s two ideological extremes—one group born from Silicon Valley techno optimism, energized by rapid commercialization; the other steeped in fears that AI represents an existential risk to humanity and must be controlled with extreme caution. For years, the two sides managed to coexist, with some bumps along the way.

This tenuous equilibrium broke one year ago almost to the day, according to current and former employees, thanks to the release of the very thing that brought OpenAI to global prominence: ChatGPT. From the outside, ChatGPT looked like one of the most successful product launches of all time. It grew faster than any other consumer app in history, and it seemed to single-handedly redefine how millions of people understood the threat — and promise — of automation. But it sent OpenAI in polar-opposite directions, widening and worsening the already present ideological rifts. ChatGPT supercharged the race to create products for profit as it simultaneously heaped unprecedented pressure on the company’s infrastructure and on the employees focused on assessing and mitigating the technology’s risks. This strained the already tense relationship between OpenAI’s factions — which Altman referred to, in a 2019 staff email, as “tribes.”

Altman’s tribe — the one that was making OpenAI into much more of a traditional tech company — is certainly the one that is more familiar to people in tech, including myself. I even had a paragraph in my Article about the developer day keynote that remarked on OpenAI’s transition, that I unfortunately edited out. Here is what I wrote:

It was around this time that I started to, once again, bemoan OpenAI’s bizarre corporate structure. As a long-time Silicon Valley observer it is enjoyable watching OpenAI follow the traditional startup path: the company is clearly in the rapid expansion stage where product managers are suddenly considered useful, as they occupy that sweet spot of finding and delivering low-hanging fruit for an entity that doesn’t yet have the time or moat to tolerate kingdom building and feature creep.

What gives me pause is that the goal is not an IPO, retiring to a yacht, and giving money to causes that do a better job of soothing the guilt of being fabulously rich than actually making the world a better place. There is something about making money and answering to shareholders that holds the more messianic impulses in check; when I hear that Altman doesn’t own any equity in OpenAI that makes me more nervous than relieved. Or maybe I’m just biased because I won’t have S-1s or 10-Ks to analyze.

Obviously I regret the edit, but then again, I didn’t realize how prescient my underlying nervousness about OpenAI’s structure would prove to be, largely because I clearly wasn’t worried enough.

Microsoft vs. the Board

Much of the discussion on tech Twitter over the weekend has been shock that a board would incinerate so much value. First off, Altman is one of the Valley’s most-connected executives, and a prolific fund-raiser and dealmaker; second is the fact that several OpenAI employees already resigned, and more are expected to follow in the coming days. OpenAI may have had two tribes previously; it’s reasonable to assume that going forward it will only have one, led by a new CEO in Shear who puts the probability of AI doom at between 5 and 50 percent and has advocated a significant slowdown in development.

Here’s the reality of the matter, though: whether or not you agree with the Sutskever/Shear tribe, the board’s charter and responsibility is not to make money. This is not a for-profit corporation with a fiduciary duty to its shareholders; indeed, as I laid out above, OpenAI’s charter specifically states that it is “unconstrained by a need to generate financial return”. From that perspective the board is in fact doing its job, as counterintuitive as that may seem: to the extent the board believes that Altman and his tribe were not “build[ing] general-purpose artificial intelligence that benefits humanity” it is empowered to fire him; they do, and so they did.

This gets at the irony in my concern about the company’s non-profit status: I was worried about Altman being unconstrained by the need to make money or the danger of having someone in charge without a financial stake in the outcome, when in fact it was those same factors that cost him his job. More broadly, my criticism was insufficiently expansive because philosophical concerns about unconstrained power pale — at least in the case of business analysis, Stratechery’s core competency — in the face of how much this structure made OpenAI a fundamentally unstable entity to make deals with. This refers, of course, to Microsoft, and as someone who has been a proponent of Satya Nadella’s leadership, I have to admit that my analysis of the company’s partnership with OpenAI was lacking.

Microsoft had, to its tremendous short-term benefit, bet a substantial portion of its future on its OpenAI partnership. This goes beyond money, which Microsoft has plenty of, and much of which it hasn’t yet paid out (or granted in terms of Azure credits); OpenAI’s technology is built into a whole host of Microsoft’s products, from Windows to Office to ones most people have never heard of (I see you Dynamics CRM nerds!). Microsoft is also investing massively in infrastructure that is custom-built for OpenAI — Nadella has been touting the financial advantages of specialization — and has just released a custom chip that was tuned for running OpenAI models. That this level of commitment was made to an entity not motivated by profit, and thus un-beholden to Microsoft’s status as an investor and revenue driver, now seems absurd.

Or, rather, it did, until Nadella tweeted the following at 11:53pm Pacific:

Satya Nadella's tweet announcing the hiring of Sam Altman

The counter to the argument I just put forth about Microsoft’s poor decision to partner with a non-profit is the reality of AI development, specifically the need for massive amounts of compute. It was the need for this compute that led OpenAI, which had barred itself from making a traditional venture capital deal, to surrender their IP to Microsoft in exchange for Azure credits. In other words, while the board may have had the charter of a non-profit, and an admirable willingness to act on and stick to their convictions, they ultimately had no leverage because they weren’t a for-profit company with the capital to be truly independent.

The end result is that an entity committed by charter to the safe development of AI has basically handed off all of its work and, probably soon enough, a sizable portion of its talent, to one of the largest for-profit entities on earth. Or, in an AI-relevant framing, the structure of OpenAI was ultimately misaligned with fulfilling its stated mission. Trying to organize incentives by fiat simply doesn’t account for all of the possible scenarios and variables at play in a dynamic situation; harvesting self-interest has, for good reason, long been the best way to align individuals and companies.

Altman Questions

There is one other angle of the board’s actions that ought to be acknowledged: it very well could have been for cause. I endorse Eric Newcomer’s thoughtful column on his eponymous Substack:

In its statement, the board said it had concluded Altman, “was not consistently candid in his communications with the board.” We shouldn’t let poor public messaging blind us from the fact that Altman has lost confidence of the board that was supposed to legitimize OpenAI’s integrity…

My understanding is that some members of the board genuinely felt Altman was dishonest and unreliable in his communications with them, sources tell me. Some members of the board believe that they couldn’t oversee the company because they couldn’t believe what Altman was saying. And yet, the existence of a nonprofit board was a key justification for OpenAI’s supposed trustworthiness.

I don’t think any of us really knows enough right now to urge the board to make a hasty decision. I want you to consider a couple things here:

Newcomer notes the board’s charter that I referenced above, the fact that Anthropic’s founders felt it necessary to leave OpenAI in the first place, Musk’s antipathy towards Altman, and Altman’s still somewhat murky and unexplained exit from YCombinator. Newcomer concludes:

I’m sure that writing this cautionary letter will not make me popular in many corners of Silicon Valley. But I think we should just slow down and get more facts. If OpenAI leads us to artificial general intelligence or anywhere close, we will want to have taken the time to think for more than a weekend about who we want to take us there…

Altman had been given a lot of power, the cloak of a nonprofit, and a glowing public profile that exceeds his more mixed private reputation. He lost the trust of his board. We should take that seriously.

Perhaps I am feeling a bit humbled by the aforementioned miss in my Microsoft analysis — much less my shock at the late night reversal in fortunes — but I will note that I have staked my claim in opposition to AI doomers and the call for regulation; to that end, I am wary of a narrative that confirms my priors about what drove the events of this weekend. And, I would note, I remain concerned about the philosophical question of executives who seek to control incredible capabilities without skin in the game.

To that end, a startup ecosystem fixture like Altman going to work for Microsoft is certainly surprising: that Microsoft is the one place that retains access to OpenAI’s IP, and can combine that with effectively unlimited funding and GPU access, certainly adds credence to the narrative that power over AI is Altman’s primary motivation.

The Altered Landscape

What is clear is that Altman and Microsoft are in the driver’s seat of AI. Microsoft has the IP and will soon have the team to combine with its cash and infrastructure, while shedding coordination problems inherent in their partnership with OpenAI previously (and, of course, they are still partners with OpenAI!).

I’ve also argued for a while that it made more sense for external companies to build on Azure’s API rather than OpenAI’s; Microsoft is a development platform by nature, whereas OpenAI is fun and exciting but likely to clone your functionality or deprecate old APIs. Now the choice is even more obvious. And, from the Microsoft side, this removes a major reason for enterprise customers, already accustomed to evaluating long-term risks, to avoid Azure because of the OpenAI dependency; Microsoft now owns the full stack.

Google, meanwhile, might need to make some significant changes; the company’s latest model, Gemini, has been delayed, and its Cloud business has been slowing as spending shifts to AI, the exact opposite outcome the company had hoped for. How long will the company’s founders and shareholders tolerate the perception that the company is moving too slow, particularly in comparison to the nimbleness and willingness to take risks demonstrated by Microsoft?

That leaves Anthropic, which looked like a big winner 12 hours ago, and now feels increasingly tenuous as a standalone entity. The company has struck partnership deals with both Google and Amazon, but it is now facing a competitor in Microsoft with effectively unlimited funds and GPU access; it’s hard to escape the sense that it makes sense as a part of AWS (and yes, B corps can be acquired, with considerably more ease than a non-profit).

Ultimately, though, one could make the argument that not much has changed at all: it has been apparent for a while that AI was, at least in the short to medium-term, a sustaining innovation, not a disruptive one, which is to say it would primarily benefit and be deployed by the biggest companies. The costs are so high that it’s hard for anyone else to get the money, and that’s even before you consider questions around channel and customer acquisition. If there were a company poised to join the ranks of the Big Five it was OpenAI, thanks to ChatGPT, but that seems less likely now (but not impossible). This, in the end, was Nadella’s insight: the key to winning if you are big is not to invent like a startup, but to leverage your size to acquire or fast-follow them; all the better if you can do it for the low price of $0.

I wrote a follow-up to this Article in this Daily Update.

¹ Microsoft’s original agreement with OpenAI also barred Microsoft from pursuing AGI based on OpenAI tech on its own; my understanding is that this clause was removed in the most recent agreement.

The OpenAI Keynote

Tuesday, November 7, 2023 | Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

In 2013, when I started Stratechery, there was no bigger event than the launch of the new iPhone; its only rival was Google I/O, which is when the newest version of Android was unveiled (hardware always breaks the tie, including with Apple’s iOS introductions at WWDC). It wasn’t just that smartphones were relatively new and still adding critical features, but that the strategic decisions and ultimate fates of the platforms were still open questions. More than that, the entire future of the tech industry was clearly tied up in said platforms and their corresponding operating systems and devices; how could keynotes not be a big deal?

Fast forward a decade and the tech keynote has diminished in importance and, in the case of Apple, disappeared completely, replaced by a pre-recorded marketing video. I want to be mad about it, but it makes sense: an iPhone introduction has been diminished not by Apple’s presentation, but rather Apple’s presentations reflect the reality that the most important questions around an iPhone are about marketing tactics. How do you segment the iPhone line? How do you price? What sort of brand affinity are you seeking to build? There, I just summarized the iPhone 15 introduction, and the reality that the smartphone era — The End of the Beginning — is over as far as strategic considerations are concerned. iOS and Android are a given, but what is next and yet unknown?

The answer is, clearly, AI, but even there, the energy seems muted: Apple hasn’t talked about generative AI other than to assure investors on earnings calls that they are working on it; Google I/O was of course about AI, but mostly in the context of Google’s own products (few of which have actually shipped), and my Article at the time was quickly side-tracked into philosophical discussions about the nature of AI innovation (sustaining versus disruptive), the question of tech revolution versus alignment, and a preview of the coming battles of regulation that arrived with last week’s Executive Order on AI.

Meta’s Connect keynote was much more interesting: not only were AI characters being added to Meta’s social networks, but next year you will be able to take AI with you via Smart Glasses (I told you hardware was interesting!). Nothing, though, seemed to match the energy around yesterday’s OpenAI developer conference, their first ever: there is nothing more interesting in tech than a consumer product with product-market fit. And that, for me, is enough to bring back an old Stratechery standby: the keynote day-after.

Keynote Metaphysics and GPT-4 Turbo

This was, first and foremost, a really good keynote, in the keynote-as-artifact sense. CEO Sam Altman, in a humorous exchange with Microsoft CEO Satya Nadella, promised, “I won’t take too much of your time”; never mind that Nadella was presumably in San Francisco just for this event: in this case he stood in for the audience who witnessed a presentation that was tight, with content that was interesting, leaving them with a desire to learn more.

Altman himself had a good stage presence, with the sort of nervous energy that is only present in a live keynote; the fact he never seemed to know which side of the stage a fellow presenter was coming from was humanizing. Meanwhile, the live demos not only went off without a hitch, but leveraged the fact that they were live: in one instance a presenter instructed a GPT she created to text Altman; he held up his phone to show he got the message. In another a GPT randomly selected five members of the audience to receive $500 in OpenAI API credits, only to then extend it to everyone.

New products and features, meanwhile, were available “today”, not weeks or months in the future, as is increasingly the case for events like I/O or WWDC; everything combined to give a palpable sense of progress and excitement, which, when it comes to AI, is mostly true.

GPT-4 Turbo is an excellent example of what I mean by “mostly”. The API consists of six new features:

Increased context length
More control, specifically in terms of model inputs and outputs
Better knowledge, which both means updating the cut-off date for knowledge about the world to April 2023 and providing the ability for developers to easily add their own knowledge base
New modalities, as DALL-E 3, Vision, and TTS (text-to-speech) will all be included in the API, with a new version of Whisper speech recognition coming
Customization, including fine-tuning, and custom models (which, Altman warned, won’t be cheap)
Higher rate limits

This is, to be clear, still the same foundational model (GPT-4); these features just make the API more usable, both in terms of features and also performance. It also speaks to how OpenAI is becoming more of a product company, with iterative enhancements of its core functionality. Yes, the mission still remains AGI (artificial general intelligence), and the core scientific team is almost certainly working on GPT-5, but Altman and team aren’t just tossing models over the wall for the rest of the industry to figure out.

Price and Microsoft

The next “feature” was tied into the GPT-4 Turbo introduction: the API is getting cheaper (3x cheaper for input tokens, and 2x cheaper for output tokens). Unsurprisingly this announcement elicited cheers from the developers in attendance; what I cheered as an analyst was Altman’s clear articulation of the company’s priorities: lower price first, speed later. You can certainly debate whether that is the right set of priorities (I think it is, because the biggest need now is for increased experimentation, not optimization), but what I appreciated was the clarity.
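To make the price cut concrete, here is a minimal sketch of the arithmetic. Only the 3x and 2x factors come from the keynote; the baseline per-token prices and the token counts are hypothetical placeholders chosen for illustration, not OpenAI's published price list.

```python
# Hypothetical baseline prices in dollars per 1,000 tokens (assumed for illustration).
BASE_INPUT_PER_1K = 0.03
BASE_OUTPUT_PER_1K = 0.06

# Reduction factors stated in the keynote: 3x for input tokens, 2x for output tokens.
INPUT_FACTOR = 3
OUTPUT_FACTOR = 2


def request_cost(input_tokens, output_tokens, input_per_1k, output_per_1k):
    """Dollar cost of one request at the given per-1K-token prices."""
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k


def compare(input_tokens, output_tokens):
    """Return (old_cost, new_cost, blended_reduction) for a single request."""
    old = request_cost(input_tokens, output_tokens,
                       BASE_INPUT_PER_1K, BASE_OUTPUT_PER_1K)
    new = request_cost(input_tokens, output_tokens,
                       BASE_INPUT_PER_1K / INPUT_FACTOR,
                       BASE_OUTPUT_PER_1K / OUTPUT_FACTOR)
    return old, new, old / new


if __name__ == "__main__":
    old, new, reduction = compare(input_tokens=10_000, output_tokens=2_000)
    print(f"old=${old:.2f} new=${new:.2f} reduction={reduction:.2f}x")
```

Whatever the baseline, the blended reduction for a given request falls between 2x and 3x: prompt-heavy workloads land closer to 3x, while generation-heavy workloads land closer to 2x.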

It’s also appropriate that the segment after that was the brief “interview” with Nadella: OpenAI’s pricing is ultimately a function of Microsoft’s ability to build the infrastructure to support that pricing. Nadella actually explained how Microsoft is accomplishing that on the company’s most recent earnings call:

It is true that the approach we have taken is a full stack approach all the way from whether it’s ChatGPT or Bing Chat or all our Copilots, all share the same model. So in some sense, one of the things that we do have is very, very high leverage of the one model that we used, which we trained, and then the one model that we are doing inferencing at scale. And that advantage sort of trickles down all the way to both utilization internally, utilization of third parties, and also over time, you can see the sort of stack optimization all the way to the silicon, because the abstraction layer to which the developers are riding is much higher up than low-level kernels, if you will.

So, therefore, I think there is a fundamental approach we took, which was a technical approach of saying we’ll have Copilots and Copilot stack all available. That doesn’t mean we don’t have people doing training for open source models or proprietary models. We also have a bunch of open source models. We have a bunch of fine-tuning happening, a bunch of RLHF happening. So there’s all kinds of ways people use it. But the thing is, we have scale leverage of one large model that was trained and one large model that’s being used for inference across all our first-party SaaS apps, as well as our API in our Azure AI service…

The lesson learned from the cloud side is — we’re not running a conglomerate of different businesses, it’s all one tech stack up and down Microsoft’s portfolio, and that, I think, is going to be very important because that discipline, given what the spend like — it will look like for this AI transition any business that’s not disciplined about their capital spend accruing across all their businesses could run into trouble.

The fact that Microsoft is benefiting from OpenAI is obvious; what this makes clear is that OpenAI uniquely benefits from Microsoft as well, in a way they would not from another cloud provider: because Microsoft is also a product company investing in the infrastructure to run OpenAI’s models for said products, it can afford to optimize and invest ahead of usage in a way that OpenAI alone, even with the support of another cloud provider, could not. In this case that is paying off in developers needing to pay less, or, ideally, have more latitude to discover use cases that result in them paying far more because usage is exploding.

GPTs and Computers

I mentioned GPTs before; you were probably confused, because this is a name that is either brilliant or a total disaster. Of course you could have said the same about ChatGPT: massive consumer uptake has a way of making arguably poor choices great ones in retrospect, and I can see why OpenAI is seeking to basically brand “GPT” — generative pre-trained transformer — as an OpenAI chatbot.

Regardless, this is how Altman explained GPTs:

GPTs are tailored versions of ChatGPT for a specific purpose. You can build a GPT (a customized version of ChatGPT) for almost anything, with instructions, expanded knowledge, and actions, and then you can publish it for others to use. And because they combine instructions, expanded knowledge, and actions, they can be more helpful to you. They can work better in many contexts, and they can give you better control. They’ll make it easier for you to accomplish all sorts of tasks or just have more fun, and you’ll be able to use them right within ChatGPT. You can, in effect, program a GPT, with language, just by talking to it. It’s easy to customize the behavior so that it fits what you want. This makes building them very accessible, and it gives agency to everyone.

We’re going to show you what GPTs are, how to use them, how to build them, and then we’re going to talk about how they’ll be distributed and discovered. And then after that, for developers, we’re going to show you how to build these agent-like experiences into your own apps.

Altman’s examples included a lesson-planning GPT from Code.org and a natural language vision design GPT from Canva. As Altman noted, the second example might have seemed familiar: Canva had a plugin for ChatGPT, and Altman explained that “we’ve evolved our plugins to be custom actions for GPTs.”

I found the plugin concept fascinating and a useful way to understand both the capabilities and limits of large language models; I wrote in ChatGPT Gets a Computer:

The implication of this approach is that computers are deterministic: if circuit X is open, then the proposition represented by X is true; 1 plus 1 is always 2; clicking “back” on your browser will exit this page. There are, of course, a huge number of abstractions and massive amounts of logic between an individual transistor and any action we might take with a computer — and an effectively infinite number of places for bugs — but the appropriate mental model for a computer is that they do exactly what they are told (indeed, a bug is not the computer making a mistake, but rather a manifestation of the programmer telling the computer to do the wrong thing)…

Large language models, though, with their probabilistic approach, are in many domains shockingly intuitive, and yet can hallucinate and are downright terrible at math; that is why the most compelling plug-in OpenAI launched was from Wolfram|Alpha. Stephen Wolfram explained:

For decades there’s been a dichotomy in thinking about AI between “statistical approaches” of the kind ChatGPT uses, and “symbolic approaches” that are in effect the starting point for Wolfram|Alpha. But now—thanks to the success of ChatGPT—as well as all the work we’ve done in making Wolfram|Alpha understand natural language—there’s finally the opportunity to combine these to make something much stronger than either could ever achieve on their own.

That is the exact combination that happened, which led to the title of that Article:

The fact this works so well is itself a testament to what Assistant AI’s are, and are not: they are not computing as we have previously understood it; they are shockingly human in their way of “thinking” and communicating. And frankly, I would have had a hard time solving those three questions as well — that’s what computers are for! And now ChatGPT has a computer of its own.

I still think the concept was incredibly elegant, but there was just one problem: the user interface was terrible. You had to get a plugin from the “marketplace”, then pre-select it before you began a conversation, and only then would you get workable results after a too-long process where ChatGPT negotiated with the plugin provider in question on the answer.

This new model somewhat alleviates the problem: now, instead of having to select the correct plug-in (and thus restart your chat), you simply go directly to the GPT in question. In other words, if I want to create a poster, I don’t enable the Canva plugin in ChatGPT, I go to Canva GPT in the sidebar. Notice that this doesn’t actually solve the problem of needing to have selected the right tool; what it does do is make the choice more apparent to the user at a more appropriate stage in the process, and that’s no small thing. I also suspect that GPTs will be much faster than plug-ins, given they are integrated from the get-go. Finally, standalone GPTs are a much better fit with the store model that OpenAI is trying to develop.

Still, there is a better way: Altman demoed it.

ChatGPT and the Universal Interface

Before Altman introduced the aforementioned GPTs he talked about improvements to ChatGPT:

Even though this is a developer conference, we can’t help resist making some improvements to ChatGPT. A small one, ChatGPT now uses GPT-4 Turbo, with all of the latest improvements, including the latest cut-off, which we’ll continue to update — that’s all live today. It can now browse the web when it needs to, write and run code, analyze data, generate images, and much more, and we heard your feedback that that model picker was extremely annoying: that is gone, starting today. You will not have to click around a drop-down menu. All of this will just work together. ChatGPT will just know what to use and when you need it. But that’s not the main thing.

You may wonder why I put this section after GPTs, given they were, according to Altman, the main thing: it’s because I think this feature enhancement is actually much more important. As I just noted, GPTs are a somewhat better UI on an elegant plugin concept, in which a probabilistic large language model gets access to a deterministic computer. The best UI, though, is no UI at all, or rather, just one UI, by which I mean “Universal Interface”.

In this case “browsing” or “image generation” are basically plug-ins: they are specialized capabilities that, before today, you had to explicitly invoke; going forward they will just work. ChatGPT will seamlessly switch between text generation, image generation, and web browsing, without the user needing to change context. What is necessary for the plug-in/GPT idea to ultimately take root is for the same capabilities to be extended broadly: if my conversation involved math, ChatGPT should know to use Wolfram|Alpha on its own, without me adding the plug-in or going to a specialized GPT.
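The idea of a single interface that picks the right capability on its own can be sketched as a toy dispatcher. This is an illustration only, not how ChatGPT works: the hand-written rules below stand in for judgment that a model would have to learn, and every name here (`route`, `calculator`, `image_generator`, `text_model`) is hypothetical.

```python
import operator
import re

# A simple "number operator number" pattern; a stand-in for "this conversation involves math".
ARITHMETIC = re.compile(r"(\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)")
OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def calculator(query):
    """Deterministic tool: evaluate the first simple arithmetic expression found."""
    left, op, right = ARITHMETIC.search(query).groups()
    return OPERATIONS[op](float(left), float(right))


def image_generator(query):
    """Placeholder for an image tool; returns a description instead of an image."""
    return f"<image for: {query}>"


def text_model(query):
    """Placeholder for the default text-generation path."""
    return f"<text reply to: {query}>"


# Each route pairs a name with a test for "does this query need the tool?" and the tool itself.
ROUTES = [
    ("calculator", lambda q: ARITHMETIC.search(q) is not None, calculator),
    ("image", lambda q: q.lower().startswith(("draw", "generate an image")), image_generator),
]


def route(query):
    """Pick the first matching tool, falling back to plain text generation."""
    for name, matches, tool in ROUTES:
        if matches(query):
            return name, tool(query)
    return "text", text_model(query)


if __name__ == "__main__":
    for prompt in ["What is 12 * 7?", "Draw a poster for a bake sale", "Tell me about Paris"]:
        print(route(prompt))
```

The user never names a tool; the selection happens behind one entry point. The hard part, which this sketch sidesteps with brittle rules, is making that selection reliably for tools far less obvious than arithmetic.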

I can understand why this capability doesn’t yet exist: the obvious technical challenges of properly exposing capabilities and training the model to know when to invoke those capabilities are a textbook example of Professor Clayton Christensen’s theory of integration and modularity, wherein integration works better when a product isn’t good enough; it is only when a product exceeds expectation that there is room for standardization and modularity. To that end, ChatGPT is only now getting the capability to generate an image without the mode being selected for it: I expect the ability to seek out less obvious tools will be fairly difficult.

In fact, it’s possible that the entire plug-in/GPT approach ends up being a dead-end; towards the end of the keynote Romain Huet, the head of developer experience at OpenAI, explicitly demonstrated ChatGPT programming a computer. The scenario was splitting the tab for an Airbnb in Paris:

Code Interpreter is now available today in the API as well. That gives the AI the ability to write and generate code on the fly, or even to generate files. So let’s see that in action. If I say here, “Hey, we’ll be 4 friends staying at this Airbnb, what’s my share of it plus my flights?”

Now here what’s happening is that Code Interpreter noticed that it should write some code to answer this query so now it’s computing the number of days in Paris, the number of friends, it’s also doing some exchange rate calculation behind the scene to get this answer for us. Not the most complex math, but you get the picture: imagine you’re building a very complex finance app that’s counting countless numbers, plotting charts, really any tasks you might tackle with code, then Code Interpreter will work great.

Uhm, what tasks do you not tackle with code? To be fair, Huet is referring to fairly simple math-oriented tasks, not the wholesale recreation of every app on the Internet, but it is interesting to consider for which problems ChatGPT will gain the wisdom to choose the right tool, and for which it will simply brute force a new solution; the history of computing would actually give the latter a higher probability: there are a lot of problems that were solved less with clever algorithms and more with the application of Moore’s Law.
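For a sense of scale, the kind of throwaway program Huet's demo implies is only a few lines long. The sketch below is an assumption about what such generated code might look like, not the code from the demo; the dates, nightly rate, exchange rate, and flight cost are all invented.

```python
from datetime import date


def trip_share(check_in, check_out, nightly_rate_eur, friends, eur_to_usd, flights_usd):
    """One person's total in USD: an equal share of lodging plus their own flights."""
    nights = (check_out - check_in).days          # number of nights in Paris
    lodging_usd = nights * nightly_rate_eur * eur_to_usd  # convert the stay to dollars
    return lodging_usd / friends + flights_usd


if __name__ == "__main__":
    # All figures are invented for illustration.
    share = trip_share(
        check_in=date(2023, 11, 10),
        check_out=date(2023, 11, 14),
        nightly_rate_eur=200,
        friends=4,
        eur_to_usd=1.10,
        flights_usd=600,
    )
    print(f"My share: ${share:.2f}")
```

Nothing here calls for a specialized tool or a plug-in; the model only needs to recognize that the question is computable and write the computation.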

Consumers and Hardware

Speaking of the first year of Stratechery, that is when I first wrote about integration and modularization, in What Clayton Christensen Got Wrong; as the title suggests I didn’t think the theory was universal:

Christensen himself laid out his theory’s primary flaw in the first quote excerpted above (from 2006):

You also see it in aircrafts and software, and medical devices, and over and over.

That is the problem: Consumers don’t buy aircraft, software, or medical devices. Businesses do.

Christensen’s theory is based on examples drawn from buying decisions made by businesses, not consumers. The reason this matters is that the theory of low-end disruption presumes:

Buyers are rational

Every attribute that matters can be documented and measured

Modular providers can become “good enough” on all the attributes that matter to the buyers

All three of the assumptions fail in the consumer market, and this, ultimately, is why Christensen’s theory fails as well. Let me take each one in turn:

To summarize the argument, consumers care about things in ways that are inconsistent with whatever price you might attach to their utility, they prioritize ease-of-use, and they care about the quality of the user experience and are thus especially bothered by the seams inherent in a modular solution. This means that integrated solutions win because nothing is ever “good enough”; as I noted in the context of Amazon, Divine Discontent is Disruption’s Antidote:

Bezos’s letter, though, reveals another advantage of focusing on customers: it makes it impossible to overshoot. When I wrote that piece five years ago, I was thinking of the opportunity provided by a focus on the user experience as if it were an asymptote: one could get ever closer to the ultimate user experience, but never achieve it:

In fact, though, consumer expectations are not static: they are, as Bezos memorably states, “divinely discontent”. What is amazing today is table stakes tomorrow, and, perhaps surprisingly, that makes for a tremendous business opportunity: if your company is predicated on delivering the best possible experience for consumers, then your company will never achieve its goal.

In the case of Amazon, that this unattainable and ever-changing objective is embedded in the company’s culture is, in conjunction with the company’s demonstrated ability to spin up new businesses on the profits of established ones, a sort of perpetual motion machine.

I see no reason why both Articles wouldn’t apply to ChatGPT: while I might make the argument that hallucination is, in a certain light, a feature not a bug, the fact of the matter is that a lot of people use ChatGPT for information despite the fact it has a well-documented flaw when it comes to the truth; that flaw is acceptable, because to the customer ease-of-use is worth the loss of accuracy. Or look at plug-ins: the concept as originally implemented has already been abandoned, because the complexity in the user interface was more detrimental than whatever utility might have been possible. It seems likely this pattern will continue: of course customers will say that they want accuracy and 3rd-party tools; their actions will continue to demonstrate that convenience and ease-of-use matter most.

This has two implications. First, while this may have been OpenAI’s first developer conference, I remain unconvinced that OpenAI is ever going to be a true developer-focused company. I think that was Altman’s plan, but reality in the form of ChatGPT intervened: ChatGPT is the most important consumer-facing product since the iPhone, making OpenAI The Accidental Consumer Tech Company. That, by extension, means that integration will continue to matter more than modularization, which is great for Microsoft’s compute stack and maybe less exciting for developers.

Second, there remains one massive patch of friction in using ChatGPT; from AI, Hardware, and Virtual Reality:

AI is truly something new and revolutionary and capable of being something more than just a homework aid, but I don’t think the existing interfaces are the right ones. Talking to ChatGPT is better than typing, but I still have to launch the app and set the mode; vision is an amazing capability, but it requires even more intent and friction to invoke. I could see a scenario where Meta’s AI is inferior technically to OpenAI, but more useful simply because it comes in a better form factor.

After highlighting some news stories about OpenAI potentially partnering with Jony Ive to build hardware, I concluded:

There are obviously many steps before a potential hardware product, including actually agreeing to build one. And there is, of course, the fact that Apple and Google already make devices everyone carries, with the latter in particular investing heavily in its own AI capabilities; betting on the hardware in market winning the hardware opportunity in AI is the safest bet. That may not be a reason for either OpenAI or Meta to abandon their efforts, though: waging a hardware battle against Google and Apple would be difficult, but it might be even worse to be “just an app” if the full realization of AI’s capabilities depends on fully removing human friction from the process.

This is the implication of a Universal Interface, which ChatGPT is striving to be: it also requires universal access, and that will always be a challenge for any company that is “just an app.” Yes, as I noted, the odds seem long, thanks to Apple and Google’s dominance, but I think there is an outside chance that the paradigm-shifting keynote is only just beginning its comeback.

Attenuating Innovation (AI)

Wednesday, November 1, 2023 | Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

In 2019, a very animated Bill Gates explained to Andrew Ross Sorkin why Microsoft lost mobile:

There’s no doubt that the antitrust lawsuit was bad for Microsoft. We would have been more focused on creating the phone operating system so that instead of using Android today, you would be using Windows Mobile. If it hadn’t been for the antitrust case, Microsoft would have…

You’re convinced?

Oh we were so close. I was just too distracted. I screwed that up because of the distraction. We were just three months too late with a release that Motorola would have used on a phone, so yes, it’s a winner-take-all game, that is for sure. Now nobody here has ever heard of Windows Mobile, but oh well. That’s a few hundred billion here or there.

This opinion is, to use a technical term favored by analysts, bullshit. Windows Mobile wasn’t three months late relative to Android; Windows Mobile launched as the Pocket PC 2000 operating system in, you guessed it, 2000, a full eight years before the first Android device hit the market.

The issue with Windows Mobile was, first and foremost, Gates himself: in his view of the world the Windows-based PC was the center of a user’s computing life, and the phone a satellite; small wonder that Windows Mobile looked and operated like a shrunken-down version of Windows: there was a Start button, and Windows Mobile 2003, the first version to have the “Windows Mobile” name, even had the same Sonoma Valley wallpaper as Windows XP.

If anything, the problem with Windows Mobile is that it was too early: Android, which originally looked like a Blackberry, had the benefit of copying the iPhone; the iPhone, in stark contrast to Windows Mobile, looked nothing like the Mac, despite sharing the same internals. Instead, Steve Jobs and company started with a new interface paradigm — multi-touch — and developed a user interface that was actually suited to a handheld device. Jobs — appropriately! — called it revolutionary.

Fast forward four months from the iPhone introduction, and Jobs and Gates were together on stage for the D5 Conference, and Gates still didn’t get it; when Walt Mossberg asked him about what devices we would be using in five years, Gates still had a Windows device at the center:

I don’t think you’ll have one device. I think you’ll have a full-screen device that you can carry around and you’ll do dramatically more reading off of that. I believe in the tablet form factor. I think you’ll have voice, I think you’ll have ink, I think you’ll have some way of having a hardware keyboard and some settings for that. And then you’ll have the device that fits in your pocket which the whole notion of how much function should you combine in there, there’s navigation computers, there’s media, there’s phone, technology is letting us put more things in there but then again, you really want to tune it so people what they expect. So there’s quite a bit of experimentation in that pocket-sized device. But I think those are natural form factors. We’ll have the evolution of the portable machine, and the evolution of the phone, will both be extremely high volume, complementary, that is if you own one you’re more likely to own the other.

In fact, in five years worldwide smartphone sales would total 700 million units, more than doubling the 348.7 million PCs that shipped that same year; yes, a lot of those smartphone sales went to people who already had PCs, but it was already apparent that for huge swathes of people — including in developed countries — the phone was the only device that you needed.

What is even more fascinating about this conversation, though, is the way in which it illustrated how Jobs and Apple were able to invent the future, while Microsoft utterly missed it.

Mossberg asked:

The core functions of the device form factor formerly known as the cellphone, whatever we want to call it — the pocket device — what would you say the core functions are five years out?

Gates’ answer was redolent of so many experts trying to predict the future: he had some ideas and some inside knowledge of new technology, but no real vision of what might come next:

How quickly all these things that have been somewhat specialized — the navigation device, the digital wallet, the phone, the camera, the video camera — how quickly those all come together, that’s hard to chart out, but eventually you’ll be able to make something that has the capability to do every one of those things. And yet given the small size, you still won’t want to edit your homework or edit a movie on a screen of that size, and so you’ll have something else that lets you do the reading and editing and those things. Now if we could ever get a screen that would just roll out like a scroll, then you might be able to have the device that did everything.

After a back-and-forth about e-ink and projection screens, Mossberg asked Jobs the same question, and his answer was profound:

I don’t know.

The reason I don’t know is because I wouldn’t have thought that there would have been maps on it five years ago. But something comes along, gets really popular, people love it, get used to it, you want it on there. People are inventing things constantly and I think the art of it is balancing what’s on there and what’s not — it’s the editing function.

That right there is the recipe for genuine innovation:

Embrace uncertainty and the fact one doesn’t know the future.
Understand that people are inventing things — and not just technologies, but also use cases — constantly.
Remember that the art comes in editing after the invention, not before.

To be like Gates and Microsoft is to do the opposite: to think that you know the future; to assume you know what technologies and applications are coming; to prescribe what people will do or not do ahead of time. It is a mindset that does not accelerate innovation, but rather attenuates it.

A Cynical Read on AI Alarm

Last week in a Stratechery Interview with Gregory Allen about the chip ban we discussed why Washington D.C. suddenly had so much urgency about AI. The first reason was of course ChatGPT; it was the second, though, that set off alarm bells in my head. Here’s Allen:

The other thing that’s happened that I do think is important just for folks to understand is, that Center for AI Safety letter that came out, that was signed by Sam Altman, that was signed by a bunch of other folks that said, “The risks of AI, including the risks of human extinction, should be viewed in the same light as nuclear weapons and pandemics.” The list of signatories to that letter was quite illustrious and quite long, and it’s really difficult to overstate the impact that that letter had on Washington, D.C. When you have the CEO of all these companies…when you get that kind of roster saying, “When you think of my technology, think of nuclear weapons,” you definitely get Washington’s attention.

It turns out you get more than that: on Monday the Biden administration released an Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. This Executive Order goes far beyond setting up a commission or study about AI, a field that is obviously still under rapid development; instead it goes straight to proscription.

Before I get to the executive order, though, I want to go back to Gates: that video at the top, where he blamed the Department of Justice for Microsoft having missed mobile, was the first thing I thought of during my interview with Allen. The fact of the matter is that Gates is the single most unreliable narrator about why Microsoft missed mobile, precisely because he was so intimately involved in the effort.

By the time that interview happened in 2019, it was obvious to everyone that Microsoft had utterly failed in mobile, and that it cost the company billions of dollars along the way. It is exceptionally difficult, particularly for someone as intelligent and successful as Gates, to admit the obvious truth: Microsoft missed mobile because Microsoft approached the space with the entirely wrong paradigm in mind. Or, to be more blunt, Gates got it wrong. It is much easier to blame someone else than to face that failure, particularly when the federal government is sitting right there!

In short, it is always necessary to carefully examine the motivations of a self-interested actor, and that certainly applies to the letter Allen referenced.

To rewind just a bit, last January I wrote AI and the Big Five, which posited that the initial wave of generative AI would largely benefit the dominant tech companies. Apple’s strategy was unclear, but it controlled the devices via which AI would be accessed, and had the potential to benefit even more if AI could be run locally. Amazon had AWS, which held much of the data over which companies might wish to apply AI, but also lacked its own foundational models. Google likely had the greatest capabilities, but also the greatest business model challenges. Meta controlled the apps through which consumers might be most likely to encounter AI generated content. Microsoft, meanwhile, thanks to its partnership with OpenAI, was the best placed to ride the initial wave generated by ChatGPT.

Nine months later and the Article holds up well: Apple is releasing ever more powerful devices, but still lacks a clear strategy; Amazon spent its last earnings call trying to convince investors that AI applications would come to their data, and talking up its partnership with Anthropic, OpenAI’s biggest competitor; Google has demonstrated great technology but has been slow to ship; Meta is pushing ahead with generative AI in its apps; and Microsoft is actually registering meaningful financial impact from its OpenAI partnership.

With this as context, it’s interesting to consider who signed that letter Allen referred to, which stated:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

There are 30 signatories from OpenAI, including the aforementioned CEO Sam Altman. There are 15 signatories from Anthropic, including CEO Dario Amodei. There are seven signatories from Microsoft, including CTO Kevin Scott. There are 81 signatories from Google, including Google DeepMind CEO Demis Hassabis. There are none from Apple or Amazon, and two low-level employees from Meta.

What is striking about this tally is the extent to which the totals and prominence align with the respective companies’ current positions in the market. OpenAI has the lead, at least in terms of consumer and developer mindshare, and the company is deriving real revenue from ChatGPT; Anthropic is second, and has signed deals with both Google and Amazon. Google has great products and an internal paralysis around shipping them for business model reasons; urging caution is very much in their interest. Microsoft is in the middle: it is making money from AI, but it doesn’t control its own models; Apple and Amazon are both waiting for the market to come to them.

In this ultra-cynical analysis the biggest surprise is probably Meta: the company has its own models, but no one of prominence has signed. These models, though, have been gradually open-sourced: Meta is betting on distributed innovation to generate value that will best be captured via the consumer touchpoints the company controls.

The point is this: if you accept the premise that regulation locks in incumbents, then it sure is notable that the early AI winners seem the most invested in generating alarm in Washington, D.C. about AI. This despite the fact that their concern is apparently not sufficiently high to, you know, stop their work. No, they are the responsible ones, the ones who care enough to call for regulation; all the better if concerns about imagined harms kneecap inevitable competitors.

An Executive Order on Attenuating Innovation

There is another quote I thought of this week. It was delivered by Senator Amy Klobuchar in a tweet:

I wrote at the time in an Update:

In 1991 — assuming that the “dawn of the Internet” was the launch of the World Wide Web — the following were the biggest companies by market cap:

$88 billion — General Electric

$80 billion — Exxon Mobil

$62 billion — Walmart

$54 billion — Coca-Cola

$42 billion — Merck

The only tech company in the top 10 was IBM, with a $31 billion market cap. Imagine proposing a bill then targeting companies with greater than $550 billion market caps, knowing that it is nothing but tech companies!

What doesn’t occur to Senator Klobuchar is the possibility that the massive increase in wealth, and even greater gain in consumer welfare, produced by tech companies since the “dawn of the Internet” may in fact be related to the fact that there hasn’t been any major regulation (the most important piece of regulation, Section 230, protected the Internet from lawsuits; this legislation invites them). I’m not saying that the lack of regulation is causal, but I am exceptionally skeptical that we would have had more growth with more regulation.

More broadly, tech sure seems like the only area where innovation and building is happening anywhere in the West. This isn’t to deny that the big tech companies are sometimes bad actors, and that platforms in particular do, at least in theory, need regulation. But given the sclerosis present everywhere but tech it sure seems like it would be prudent to be exceptionally skeptical about the prospect of new regulation; I definitely wouldn’t be celebrating it as if it were some sort of overdue accomplishment.

Unfortunately this week’s Executive Order takes the exact opposite approach to AI that we took to technology previously. As Steven Sinofsky explains in this excellent article:

This document is the work of aggregating policy inputs from an extended committee of interested constituencies while also navigating the law — literally what is it that can be done to throttle artificial intelligence legally without passing any new laws that might throttle artificial intelligence. There is no clear owner of this document. There is no leading science consensus or direction that we can discern. It is impossible to separate out the document from the process and approach used to “govern” AI innovation. Govern is quoted because it is the word used in the EO. This is so much less a document of what should be done with the potential of technology than it is a document pushing the limits of what can be done legally to slow innovation.

Much attention has been focused on the Executive Order’s ultra-specific limits on model sizes and attributes (you can exceed those limits if you are registered and approved, a game best played by large established companies like the list I just detailed); unfortunately that is only the beginning of the issues with this Executive Order, but again, I urge you to read Sinofsky’s post.

What is so disappointing to me is how utterly opposed this executive order is to how innovation actually happens:

The Biden administration is not embracing uncertainty: it is operating from an assumption that AI is dangerous, despite the fact that many of the listed harms, like learning how to build a bomb or synthesize dangerous chemicals or conduct cyber attacks, are already trivially accomplished on today’s Internet. What is completely lacking is anything other than the briefest of hand waves at AI’s potential upside. The government is Bill Gates, imagining what might be possible, when it ought to be Steve Jobs, humble enough to know it cannot predict the future.
The Biden administration is operating with a fundamental lack of trust in the capability of humans to invent new things, not just technologies, but also use cases, many of which will create new jobs. It can envision how the spreadsheet might imperil bookkeepers, but it can’t imagine how that same tool might unlock entire new industries.
The Biden administration is arrogantly insisting that it ought to have a role in dictating the outcomes of an innovation that few if any of its members understand, and almost certainly could not invent. There is, to be sure, a role for oversight and regulation, but that is a blunt instrument best applied after the invention, like an editor.

In short, this Executive Order is a lot like Gates’ approach to mobile: rooted in the past, yet arrogant about an unknowable future; proscriptive instead of adaptive; and, worst of all, trivially influenced by motivated reasoning best understood as some of the most cynical attempts at regulatory capture the tech industry has ever seen.

The Sclerotic Shoggoth

I fully endorse Sinofsky’s conclusion:

This approach to regulation is not about innovation despite all the verbiage proclaiming it to be. This Order is about stifling innovation and turning the next platform over to incumbents in the US and far more likely new companies in other countries that did not see it as a priority to halt innovation before it even happens.

I am by no means certain if AI is the next technology platform the likes of which will make the smartphone revolution that has literally benefitted every human on earth look small. I don’t know sitting here today if the AI products just in market less than a year are the next biggest thing ever. They may turn out to be a way stop on the trajectory of innovation. They may turn out to be ingredients that everyone incorporates into existing products. There are so many things that we do not yet know.

What we do know is that we are at the very earliest stages. We simply have no in-market products, and that means no in-market problems, upon which to base such concerns of fear and need to “govern” regulation. Alarmists or “existentialists” say they have enough evidence. If that’s the case then so be it, but then the only way to truly make that case is to embark on the legislative process and use democracy to validate those concerns. I just know that we have plenty of past evidence that every technology has come with its alarmists and concerns and somehow optimism prevailed. Why should the pessimists prevail now?

They should not. We should accelerate innovation, not attenuate it. Innovation — technology, broadly speaking — is the only way to grow the pie, and to solve the problems we face that actually exist in any sort of knowable way, from climate change to China, from pandemics to poverty, and from diseases to demographics. To attack the solution is denialism at best, outright sabotage at worst. Indeed, the shoggoth to fear is our societal sclerosis seeking to drag the most exciting new technology in years into an innovation anti-pattern.

Photo generated by Dall-E 3, with the following prompt: “Photo of a radiant, downscaled city teetering on the brink of an expansive abyss, with a dark, murky quagmire below containing decayed structures reminiscent of historic landmarks. The city is a beacon of the future, with flying cars, green buildings, and residents in futuristic attire. The influence of AI is subtly interwoven, with robots helping citizens and digital screens integrated into the environment. Below, the haunting silhouette of a shoggoth, with its eerie tendrils, endeavors to pull the city into the depths, illustrating the clash between forward-moving evolution and outdated forces.”

China Chips and Moore’s Law

Wednesday, October 18, 2023Tuesday, March 19, 2024

This Article is available as a video essay on YouTube

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year (see graph on next page). Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.
— Gordon Moore, Cramming More Components Onto Integrated Circuits

Moore’s law is dead.
– Jensen Huang

On Tuesday the Biden administration tightened export controls for advanced AI chips being sold to China; the primary target was Nvidia’s H800 and A800 chips, which were specifically designed to skirt controls put in place last year. The primary difference between the H800/A800 and H100/A100 is the bandwidth of their interconnects: the A100 has 600 GB/s interconnects (the H100 has 900 GB/s), which just so happened to be the limit set by last year’s export controls; the A800 and H800 were limited to 400 GB/s interconnects.

The reason why interconnect speed matters is tied up with Nvidia CEO Jensen Huang’s thesis that Moore’s Law is dead. Moore’s Law, as originally stated in 1965, predicted that the number of transistors in an integrated circuit would double every year. Moore revised his prediction 10 years later to a doubling every two years, which held until the last decade or so, when the pace slowed to a doubling about every three years.
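To make the difference between those three doubling cadences concrete, here is a minimal sketch of the compounding involved. The function name `growth_factor` and the ten-year horizon are illustrative assumptions, not figures from Moore's paper.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times the transistor count multiplies over `years`
    if it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon = 10  # illustrative horizon in years (an assumption)
    for period in (1, 2, 3):
        factor = growth_factor(horizon, period)
        print(f"doubling every {period} yr -> {factor:,.0f}x over {horizon} years")
```

Over a decade, annual doubling compounds to roughly a thousandfold increase, a two-year cadence to 32x, and a three-year cadence to about 10x, which is why the slowdown matters so much even though density is still rising.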

In practice, though, Moore’s Law has become something more akin to a fundamental precept underlying the tech industry: computing power will both increase and get cheaper over time. This precept — which I will call Moore’s Precept, for clarity — is connected to Moore’s technical prediction: smaller transistors can switch faster, and use less energy in the switching, even as more of them fit on a single wafer; this means that you can either get more chips per wafer or larger chips, either decreasing price or increasing power for the same price. In practice we got both.

What is critical is that the rest of the tech industry didn’t need to understand the technical or economic details of Moore’s Law: for 60 years it has been safe to simply assume that computers would get faster, which meant the optimal approach was always to build for the cutting edge or just beyond, and trust that processor speed would catch up to your use case. From an analyst perspective, it is Moore’s Precept that enables me to write speculative articles like AI, Hardware, and Virtual Reality: it is enough to see that a use case is possible, if not yet optimal; Moore’s Precept will provide the optimization.

The End of Moore’s Precept?

This distinction between Moore’s Law and Moore’s Precept is the key to understanding Nvidia CEO Jensen Huang’s repeated declarations that Moore’s Law is dead. From a technical perspective, it has certainly slowed, but density continues to increase; here is TSMC’s transistor density by node size, using the first (i.e. worse) iteration of each node size:¹

TSMC	Transistor Density (MTr/mm²)	Year Introduced
90 nm	3.4	2004
65 nm	5.6	2006
40 nm	9.8	2008
28 nm	16.6	2011
20 nm	20.9	2014
16 nm	28.9	2015
10 nm	52.5	2017
7 nm	91.2	2019
5 nm	138.2	2020
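As a rough check on the "doubling about every three years" pace, the table above can be turned into an implied doubling time. This is a sketch using only the first and last rows; the helper name `doubling_time_years` is hypothetical, and the calculation assumes smooth exponential growth between the two endpoints.

```python
import math

# (node, density in MTr/mm^2, year introduced), copied from the table above
TSMC_NODES = [
    ("90 nm", 3.4, 2004),
    ("65 nm", 5.6, 2006),
    ("40 nm", 9.8, 2008),
    ("28 nm", 16.6, 2011),
    ("20 nm", 20.9, 2014),
    ("16 nm", 28.9, 2015),
    ("10 nm", 52.5, 2017),
    ("7 nm", 91.2, 2019),
    ("5 nm", 138.2, 2020),
]


def doubling_time_years(d0: float, y0: int, d1: float, y1: int) -> float:
    """Years per doubling implied by density growing from d0 to d1."""
    doublings = math.log2(d1 / d0)
    return (y1 - y0) / doublings


if __name__ == "__main__":
    _, first_density, first_year = TSMC_NODES[0]
    _, last_density, last_year = TSMC_NODES[-1]
    years = doubling_time_years(first_density, first_year, last_density, last_year)
    print(f"implied doubling time, 90 nm to 5 nm: {years:.1f} years")
```

From 90 nm to 5 nm, density grew about 40x in 16 years, which works out to roughly three years per doubling.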

Remember, though, that cost matters; here is the same table with TSMC’s introductory price/wafer, and what that translates to in terms of price/billion transistors:

TSMC	MTr/mm²	Year Introduced	Price/Wafer	Price/BTr
90 nm	3.4	2004	$1,650	$6.87
65 nm	5.6	2006	$1,937	$4.89
40 nm	9.8	2008	$2,274	$3.28
28 nm	16.6	2011	$2,891	$2.46
20 nm	20.9	2014	$3,677	$2.49
16 nm	28.9	2015	$3,984	$1.95
10 nm	52.5	2017	$5,992	$1.61
7 nm	91.2	2019	$9,346	$1.45
5 nm	138.2	2020	$16,988	$1.74

Notice that number on the bottom right: with TSMC’s 5 nm process the price per transistor increased — and it increased a lot (20%). The reason was obvious: 5 nm was the first process that required ASML’s extreme ultraviolet (EUV) lithography, and EUV machines were hugely expensive — around $150 million each.² In other words, it appeared that while the technical definition of Moore’s Law would continue, the precept that chips would always get both faster and cheaper would not.
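The price/billion transistors column can be approximated from the other two columns. The sketch below assumes a standard 300 mm wafer with its entire area usable; that assumption is mine, not something stated above, though it does reproduce the table's figures. The function name is hypothetical.

```python
import math

# Assumption: 300 mm wafer, full area usable (not stated in the article).
WAFER_DIAMETER_MM = 300


def price_per_billion_transistors(
    density_mtr_per_mm2: float,
    wafer_price_usd: float,
    wafer_diameter_mm: float = WAFER_DIAMETER_MM,
) -> float:
    """Wafer price divided by the billions of transistors that fit on it."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    # Density is in millions of transistors per mm^2; divide by 1000 for billions.
    billions_per_wafer = density_mtr_per_mm2 * wafer_area_mm2 / 1000
    return wafer_price_usd / billions_per_wafer


if __name__ == "__main__":
    seven = price_per_billion_transistors(91.2, 9346)
    five = price_per_billion_transistors(138.2, 16988)
    print(f"7 nm: ${seven:.2f}/BTr")
    print(f"5 nm: ${five:.2f}/BTr")
    print(f"change from 7 nm to 5 nm: {(five / seven - 1) * 100:+.0f}%")
```

Under that assumption the 7 nm and 5 nm rows come out to $1.45 and $1.74, an increase of about 20%: the wafer price rose faster than density did.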

GPUs and Embarrassing Parallelism

Huang’s argument, to be clear, does not simply rest on the cost of 5 nm chips; remember Moore’s Precept is about speed as well as cost, and the truth is that a lot of those density gains have primarily gone towards power efficiency as energy became a constraint in everything from mobile to PCs to data centers. Huang’s thesis for several years now has been that Nvidia has the solution to making computing faster: use GPUs.

GPUs are much less complex than CPUs; that means they can execute instructions much more quickly, but those instructions have to be much simpler. The payoff is that you can run a lot of them simultaneously to achieve outsized results. Graphics is, unsurprisingly, the most obvious example: every “shader” (the primary processing component of a GPU) calculates what will be displayed on a single portion of the screen; the size of the portion is a function of how many shaders you have available. If you have 1,024 shaders, each shader draws 1/1,024 of the screen. Ergo, if you have 2,048 shaders, each draws half as much, and you can draw the screen twice as fast. Graphics performance is “embarrassingly parallel”, which is to say it scales with the number of processors you apply to the problem.
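The shader arithmetic can be sketched in a few lines. This is an idealized model, not a description of how any real GPU schedules work: it assumes every pixel costs the same to draw, that shaders never wait on each other, and a hypothetical 2,048 × 1,024 screen chosen so the division is exact.

```python
def pixels_per_shader(width: int, height: int, num_shaders: int) -> int:
    """Largest number of pixels any one shader must draw when the screen
    is split as evenly as possible across `num_shaders`."""
    total_pixels = width * height
    return -(-total_pixels // num_shaders)  # ceiling division


if __name__ == "__main__":
    width, height = 2048, 1024  # hypothetical screen size (an assumption)
    for shaders in (1024, 2048):
        work = pixels_per_shader(width, height, shaders)
        print(f"{shaders} shaders -> {work} pixels each")
```

Because frame time in this model is set by the busiest shader, halving the pixels per shader halves the time per frame; that linear scaling is what "embarrassingly parallel" means.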

This “embarrassing parallelism” is the key to GPUs’ outsized performance relative to CPUs, but the challenge is that not all software problems are easily parallelizable; Nvidia’s CUDA ecosystem is predicated on providing the tools to build software applications that can leverage GPU parallelism, and is one of the major moats undergirding Nvidia’s dominance, but most software applications still need the complexity of CPUs to run.

AI, though, is not most software. It turns out that AI, both in terms of training models and in leveraging them (i.e. inference), is an embarrassingly parallel application. Moreover, the optimum amount of scalability goes far beyond a computer monitor displaying graphics; this is why Nvidia AI chips feature the high-speed interconnects referenced by the chip ban: AI applications run across multiple AI chips at the same time, but the key to making sure those GPUs are busy is feeding them with data, and that requires those high-speed interconnects.

That noted, I’m skeptical about the wholesale shift of traditional data center applications to GPUs; from Nvidia On the Mountaintop:

Humans — and companies — are lazy, and not only are CPU-based applications easier to develop, they are also mostly already built. I have a hard time seeing what companies are going to go through the time and effort to port things that already run on CPUs to GPUs; at the end of the day, the applications that run in a cloud are determined by customers who provide the demand for cloud resources, not cloud providers looking to optimize FLOP/rack.

There’s another reason to think that traditional CPUs still have some life in them as well: it turns out that Moore’s Precept may be back on track.

EUV and Moore’s Precept

The table I posted above only ran through 5 nm; the iPhone 15 Pro, though, has an N3 chip, and check out the price/transistor:

TSMC	MTr/mm²	Year Introduced	Price/Wafer	Price/BTr
90 nm	3.4	2004	$1,650	$6.87
65 nm	5.6	2006	$1,937	$4.89
40 nm	9.8	2008	$2,274	$3.28
28 nm	16.6	2011	$2,891	$2.46
20 nm	20.9	2014	$3,677	$2.49
16 nm	28.9	2015	$3,984	$1.95
10 nm	52.5	2017	$5,992	$1.61
7 nm	91.2	2019	$9,346	$1.45
5 nm	138.2	2020	$16,988	$1.74
3 nm (N3B)	197.0	2023	$20,000	$1.44
3 nm (N3E)	215.6	2023	$20,000	$1.31

While I only included the first version of each node previously, the N3B process, which is used for the iPhone’s A17 Pro chip, is a dead-end; TSMC changed its approach with the N3E, which will be the basis of the N3 family going forward. It also makes the N3 leap even more impressive in terms of price/transistor: while N3B undid the 5 nm backslide, N3E is a marked improvement over 7 nm.
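The size of those moves is easy to quantify from the Price/BTr column. A short sketch, using the rounded table values (so the percentages are approximate) and a hypothetical helper `pct_change`:

```python
# Price per billion transistors (USD), copied from the table above.
PRICE_PER_BTR = {"7 nm": 1.45, "5 nm": 1.74, "N3B": 1.44, "N3E": 1.31}


def pct_change(old: float, new: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100


if __name__ == "__main__":
    comparisons = [("5 nm", "N3B"), ("7 nm", "N3B"), ("7 nm", "N3E")]
    for old_node, new_node in comparisons:
        change = pct_change(PRICE_PER_BTR[old_node], PRICE_PER_BTR[new_node])
        print(f"{old_node} -> {new_node}: {change:+.1f}%")
```

On these figures N3B is roughly 17% cheaper per transistor than 5 nm, which puts it back at about the 7 nm level, while N3E comes in close to 10% below 7 nm.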

Moreover, the gains are actually what you would expect: yes, those EUV machines cost a lot, but the price decreases embedded in Moore’s Precept are not a function of equipment getting cheaper — notice that the price/wafer has been increasing continuously. Rather, ever declining prices/transistor are a function of Moore’s Law, which is to say that new equipment, like EUV, lets us “Cram[] More Components Onto Integrated Circuits”.

What happened at 5 nm was similar to what happened at 20 nm, the last time the price/transistor increased: that was the node where TSMC started to use double-patterning, which means they had to do every lithography step twice; that both doubled the utilization of lithography equipment per wafer and also decreased yield. For that node, at least, the gains from making smaller transistors were outweighed by the costs. A year later, though, TSMC launched the 16 nm node that re-united Moore’s Law with Moore’s Precept. That is exactly what seems to have happened with 3 nm (the gains of EUV are now significantly outweighing the costs), and early rumors about 2 nm density and price points suggest the gains should continue for another node.

Chip Ban Angst

All of this is interesting in its own right, but it’s particularly pertinent in light of the recent angst in Washington DC over Huawei’s recent smartphone with a 7 nm chip, seemingly in defiance of those export controls. I already explained why that angst was misguided in this September Update. To summarize my argument:

TSMC had already shown that 7 nm chips could be made using deep ultraviolet (DUV)-based immersion lithography, and China had plenty of DUV lithography machines, given that DUV has been the standard for multiple generations of chips.
China’s Semiconductor Manufacturing International Corp. (SMIC) had already made a 7 nm chip in 2022; sure it was simpler than the one launched in that Huawei phone, but that is the exact sort of progression you should expect from a competent foundry.
SMIC is almost certainly not producing that 7 nm chip economically; Intel, for example, could make a 7 nm chip using DUV, they just couldn’t do it economically, which is why they ultimately switched to EUV.

In short, the problem with the chip ban was drawing the line at 10 nm: that line was arbitrary given that the equipment needed to make 10 nm chips had already been shown to be capable of producing 7 nm chips; that SMIC managed to do just that isn’t a surprise, and, crucially, is not evidence that the chip ban was a failure.

The line that actually matters is 5 nm, which is another way to say that the export control that will actually limit China’s long-term development is EUV. Fortunately the Trump administration had already persuaded the Netherlands to not allow the export of EUV machines, which the Biden administration further locked down with its chip ban and further coordination with the Netherlands. The reality is that a lot of chip-making equipment is “multi-nodal”; much of the machinery can be used at multiple nodes, but you must have EUV machines to realize Moore’s Precept, because it is the key piece of technology driving Moore’s Law.

By the same token, the A800/H800 loophole was a real one: the H800 is made on TSMC’s third-generation 5 nm process (confusingly called N4), which is to say it is made with EUV; the interconnect limits were meaningful, and would make AI development slower and more costly (because those GPUs would be starved of data more of the time), but they didn’t halt it. This matters because AI is the military application the U.S. should be the most concerned with: a lot of military applications run perfectly fine on existing chips (or even, in the case of guided weaponry, chips that were made decades ago); wars of the future, though, will almost certainly be undergirded by AI, a field that is only just now getting started.

This leads to a further point: the payoff from this chip ban will not come immediately. The only way the entire idea makes sense is if Moore’s Law continues to exist, because that means the chips that will be available in five or ten years will be that much faster and cheaper than the ones that exist today, increasing the gap. And, at the same time, the idea also depends on taking Huang’s argument seriously, because AI needs not just power but scale. Fortunately movement on both fronts is headed in the right direction.

There remain good arguments against the entire concept of the chip ban, including the obvious fact that China is heavily incentivized to build up replacements from scratch (and could have leverage on the U.S. on the trailing edge): perhaps in 20 years the U.S. will not only have lost its most potent point of leverage but will also see its most cutting edge companies undercut by Chinese competition. That die, though, has long since been cast; the results that matter are not a smartphone in 2023, but the capabilities of 2030 and beyond.

¹ I am not certain I have the exact right numbers for older nodes, but I have confirmed that the numbers are in the right ballpark.
² TSMC first used EUV with later iterations of its 7 nm process, but that was primarily to move down the learning curve; EUV was not strictly necessary, and the original 7 nm process used immersion DUV lithography exclusively.
