A quick update on scheduling: I will be traveling next week; my plan is to post on Monday and Tuesday, but not Wednesday. I will be appearing at an event and will post the audio for Thursday’s Interview. There will be a Dithering episode on Tuesday; Friday is TBD. Finally, there will only be one Sharp Tech episode, which we will release in the middle of the week.
Meanwhile, last week’s Sharp China covered the recent spate of raids on Western firms and discussed the complicated relationships between India, Pakistan, China, and the U.S., while Greatest of All Talk continues to cover a very compelling NBA playoffs.
On to the update, which is free to read:
Meta Open Sources Another AI Model
From The Verge:
Meta has announced a new open-source AI model that links together multiple streams of data, including text, audio, visual data, temperature, and movement readings. The model is only a research project at this point, with no immediate consumer or practical applications, but it points to a future of generative AI systems that can create immersive, multisensory experiences and shows that Meta continues to share AI research at a time when rivals like OpenAI and Google have become increasingly secretive.
The core concept of the research is linking together multiple types of data into a single multidimensional index (or “embedding space,” to use AI parlance). This idea may seem a little abstract, but it’s this same concept that underpins the recent boom in generative AI. For example, AI image generators like DALL-E, Stable Diffusion, and Midjourney all rely on systems that link together text and images during the training stage. They look for patterns in visual data while connecting that information to descriptions of the images. That’s what then enables these systems to generate pictures that follow users’ text inputs. The same is true of many AI tools that generate video or audio in the same way.
Meta says that its model, ImageBind, is the first to combine six types of data into a single embedding space. The six types of data included in the model are: visual (in the form of both image and video); thermal (infrared images); text; audio; depth information; and — most intriguing of all — movement readings generated by an inertial measuring unit, or IMU. (IMUs are found in phones and smartwatches, where they’re used for a range of tasks, from switching a phone from landscape to portrait to distinguishing between different types of physical activity.)
Meta has a demo page with canned examples of how, say, the sound of a dog barking might be associated with an image of a dog; the breakthrough here is that by including all of the modalities in the same embedding space you don’t need perfect 1:1 pairs for training, which means the AI can learn a lot more about what goes together from a much smaller amount of data.
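To make the "single embedding space" idea a bit more concrete, here is a minimal sketch of cross-modal lookup: once audio clips and images are mapped into the same vector space, matching a sound to a picture is just a nearest-neighbor search. To be clear, this is not ImageBind's code or API; the three-dimensional vectors are hand-made toy values, and every name here (`IMAGE_EMBEDDINGS`, `AUDIO_EMBEDDINGS`, `retrieve_image_for_audio`) is hypothetical and exists only for illustration. A real model would produce these vectors with trained encoders, in far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, hand-made vectors standing in for encoder outputs (an assumption,
# not real model output). Both modalities live in the same 3-d space.
IMAGE_EMBEDDINGS = {
    "photo_of_dog": [0.9, 0.1, 0.0],
    "photo_of_train": [0.1, 0.9, 0.1],
    "photo_of_beach": [0.0, 0.2, 0.9],
}
AUDIO_EMBEDDINGS = {
    "dog_barking": [0.8, 0.2, 0.1],
    "train_horn": [0.2, 0.8, 0.0],
    "waves_crashing": [0.1, 0.1, 0.9],
}

def nearest(query, candidates):
    """Return the candidate name whose vector is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))

def retrieve_image_for_audio(audio_name):
    """Look up the image that sits closest to an audio clip in the shared space."""
    return nearest(AUDIO_EMBEDDINGS[audio_name], IMAGE_EMBEDDINGS)

if __name__ == "__main__":
    for clip in AUDIO_EMBEDDINGS:
        print(clip, "->", retrieve_image_for_audio(clip))
```

The point of the sketch is only that the lookup needs no audio-specific or image-specific logic: once everything shares one space, comparing any two modalities is the same distance calculation.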
What I am particularly interested in, though — and building on last week’s Update about Facebook’s earnings — is the license. ImageBind is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA). To borrow the summary from Creative Commons:
CC BY-NC-SA: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.
What is interesting is that this is the fourth AI-related release Meta has made in the past few months, and all of them had different licenses. In addition to ImageBind:
- LLaMA, the large language model that I wrote about previously, was released under a Meta-created license that restricts usage to non-commercial research purposes (also Meta didn’t actually release the model explicitly; it was leaked by someone who received access).
- DINOv2, which trains high-performance computer vision models using self-supervision, was licensed under a Creative Commons Attribution-NonCommercial license; this is similar to the license for ImageBind, without the need to apply the same license to any work done to augment the model.
- Segment Anything, which identifies unique objects in images, was licensed under an Apache license, which is arguably the most permissive license there is: there are no restrictions on the use of code, and further developments have no license requirements on them and can also be patented (the clarity around patented material is the most important difference between the MIT and Apache licenses). Meta also released their own Segment Anything weights (i.e. the output of a large training run), but this was, like the LLaMA weights, restricted to research only.
While I am sure there are good reasons behind all of these licensing decisions, it definitely feels like tangible evidence of the mix of excitement and trepidation I identified in CEO Mark Zuckerberg’s comments on the earnings call: Meta can sense a real opportunity in Open Source, but the company isn’t yet quite sure how to take advantage of it.
Moats and Open Source
There was a fair bit of buzz last week over an alleged internal Google memo that was leaked to SemiAnalysis entitled We Have No Moat, And Neither Does OpenAI:
We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be? But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch. I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today. Just to name a few:
- LLMs on a Phone: People are running foundation models on a Pixel 6 at 5 tokens/sec.
- Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening.
- Responsible Release: This one isn’t “solved” so much as “obviated”. There are entire websites full of art models with no restrictions whatsoever, and text is not far behind.
- Multimodality: The current multimodal ScienceQA SOTA was trained in an hour.
While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.
The rest of the memo lays out the case; I thought the most compelling section was the part arguing that speed of iteration and stacking improvements were essential to getting better, and that open source had an advantage over large training runs in these two areas.
Still, while I’m obviously extremely interested in and excited about open source models, the author may have overstated his case a bit. Shital Shah has a long tweet arguing that most open source benchmarking isn’t reliable (which is to say that open source models’ performance isn’t nearly as good as, say, GPT-3, much less GPT-4), and Nathan Labenz has a thread arguing that, amongst other things, OpenAI’s aggressive movement down the pricing curve will foreclose any real economic opportunity for open source alternatives, ultimately starving them of resources (including user data). For my part I will note that one of the big takeaways I had from ChatGPT is just how important and effective reinforcement learning with human feedback is to the user experience; that’s also a fairly expensive proposition that open source alternatives may not be able to match.
The biggest pushback I have to the memo, though, is that any determination of a moat is inextricably tied to the business model a particular bit of technology is attached to, which is to say that any sort of blanket statement misses important nuance. Take Google itself: the general way in which its search engine works is understood, but small distinctions in quality still matter; it also matters that Google is the default almost everywhere, that it gets much more user data, has such a huge advertising base, etc. It may be the case that in a few years we look at ChatGPT in a similar way: yes, its capabilities may be generally matched, but it has small differences that help, it has more user data, a larger and more diverse plugin ecosystem, etc.
Identifiable differences in capabilities also matter when it comes to selling cloud services, whether that be OpenAI’s API, Azure, Google Cloud, etc; enterprise selling is the space where feeds and speeds move the needle, and while open source has its own attraction for some enterprises, you can bet that Google Cloud and OpenAI/Azure will leverage whatever advantages they have in their sales motion.
Here, though, I go back to one of Zuckerberg’s comments on that earnings call:
I think that there’s an important distinction between the products we offer and a lot of the technical infrastructure, especially the software that we write to support that. And historically, whether it’s the Open Compute project that we’ve done or just open sourcing a lot of the infrastructure that we’ve built, we’ve historically open sourced a lot of that infrastructure, even though we haven’t open sourced the code for our core products or anything like that.
And the reason why I think why we do this is that unlike some of the other companies in the space, we’re not selling a cloud computing service where we try to keep the different software infrastructure that we’re building proprietary. For us, it’s way better if the industry standardizes on the basic tools that we’re using and therefore we can benefit from the improvements that others make and others’ use of those tools can, in some cases like Open Compute, drive down the costs of those things which make our business more efficient too. So I think to some degree we’re just playing a different game on the infrastructure than companies like Google or Microsoft or Amazon, and that creates different incentives for us.
Zuckerberg was specifically talking about cloud infrastructure software, but the same point applies to AI capabilities as well: Meta isn’t selling its capabilities; rather, it sells a canvas for users to put whatever content they desire, and to consume the content created by other users. It follows, then, that Meta ought to be fairly agnostic about how and where that content is created; by extension, if Meta were to open source its content creation models, the most obvious place where the content of those models would be published is on Meta platforms. To put it another way, Meta’s entire business is predicated on content being a commodity; making creation into a commodity as well simply provides more grist for the mill.
What is compelling about this reality, and the reason I latched onto Zuckerberg’s comments in that call, is that Meta is uniquely positioned to overcome all of the limitations of open source, from training to verification to RLHF to data quality, precisely because the company’s business model doesn’t depend on having the best models, but simply on the world having a lot of them.
Apple and Meta
I didn’t cover Apple’s earnings last week because they really were quite boring, but I did want to highlight this bit from CFO Luca Maestri’s prepared remarks on the earnings call:
Moving to services, we reached a new all-time revenue record of $20.9 billion. And in addition to the all-time records Tim mentioned earlier, we set March quarter records for advertising, AppleCare and video. Despite these records, as we saw in recent quarters, certain services offerings, such as digital advertising and mobile gaming, continue to be affected by the current macroeconomic environment.
It’s a bit odd to simultaneously note that Apple set a record for advertising and also complain about the macro environment impacting digital advertising; what seems more likely is that Maestri didn’t want to talk about mobile gaming in isolation. After all, the most obvious reason why mobile gaming revenue would be down is Apple’s App Tracking Transparency, which impacted mobile gaming more than any other sector. This, by extension, explains why for most of 2022 we seemed to be in what Eric Seufert called The App Tracking Transparency Recession, which is the phenomenon where all of advertising, from TV to print to billboards to radio to search, seemed to be doing well, even as digital display advertising of the sort impacted by ATT fell off a cliff (which, to return to the original point, meant fewer mobile downloads and less monetization, which would impact Apple).
The reason to highlight this point is to reemphasize the degree to which Apple and Meta have complementary businesses: Smartphones made Facebook into an advertising juggernaut, and Facebook gave people something to do with their phones, even as its advertising built the App Store. You can imagine the two companies’ ventures in, say, VR and AR working out much less well for lack of the other: Meta would be better off if Apple built glasses, and Apple’s glasses will be compelling to the extent a company like Meta is layering on social experiences.
I bring this up because I imagine one of the things holding Meta back from truly open source models is the realization that the biggest beneficiary would be Apple. I noted in AI and the Big Five that Apple is poised to be the biggest beneficiary of open source, thanks to the compute it has in its devices and its ability to leverage its integration of the OS with its own chips; this would apply to Meta’s models just as much as anything else, which surely grinds Zuckerberg’s gears.
At the same time, this makes the point: Meta could be a massive beneficiary of Apple’s devices being able to do generative AI on the edge. After all, what apps might take advantage of that? Facebook and Instagram! And where would the output be seen by most people? On Facebook and Instagram! This is an example of where structure is stronger than antipathy: Apple and Meta are simply structurally compatible; the only decision for both of them is whether they want to work with each other willingly or through gritted teeth.
Amazon would be the other beneficiary of Meta open source models, hosting them much as they host any other open source project. This, though, wouldn’t have nearly the upside for Meta as Apple’s adoption would, even as it could undergird future Meta competitors. A possible solution here could be something like MongoDB’s Server Side Public License; no, it’s not open source by the strictest definition of the term, but the net effect, as MongoDB has shown, is to reap almost all of the benefits of open source while preventing the hyperscalers from taking away your business.
This Update will be available as a podcast later today. To receive it in your podcast player, visit Stratechery.