New Bing Errors, User Preferences and Company Reputations, Section 230 and LLMs

Good morning,

I promise that Stratechery is not going to become solely about chat bots (or has a chat bot taken over Stratechery?!). There is a great interview in the can this week about Disney, for example. However, there was a new piece of information today that is extremely interesting in its own right, and led to a further question that I haven’t seen anyone raise, so I appreciate your forbearance!

On to the update:

New Bing Errors

From Dmitri Brereton:

Bing AI got some answers completely wrong during their demo. But no one noticed. Instead, everyone jumped on the Bing hype train. Google’s Bard got an answer wrong during an ad, which everyone noticed. Now the narrative is “Google is rushing to catch up to Bing and making mistakes!”. That would be a fine narrative if Bing didn’t make even worse mistakes during its own demo.

Brereton lists three examples: a pet vacuum review doesn’t seem to match the information on the page Bing was allegedly pulling from; a Mexican nightlife itinerary doesn’t seem to match the reality of the locations it highlighted; and a financial statement summary gets numerous numbers completely wrong.

Brereton concludes:

Bing AI did a great job of creating media hype, but their product is no better than Google’s Bard. At least as far as we can tell from the limited information we have about both.

I am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good. I am even more shocked that this trick worked, and everyone jumped on the Bing AI hype train without doing an ounce of due diligence. Bing AI is incapable of extracting accurate numbers from a document, and confidently makes up information even when it claims to have sources. It is definitely not ready for launch, and should not be used by anyone who wants an accurate model of reality.

First off, tremendous kudos to Brereton for going through the presentation and verifying all of the demos. I will put myself in the camp of those who didn’t do a similar level of due diligence, and accept Brereton’s admonishment.

Secondly, wow, I too “am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good.” That raises a whole set of separate questions. The first is obviously whether anyone at Microsoft bothered checking. If yes, well, they did a very poor job. If no, why not? Did no one consider that the answers might be false?

I recognize the irony of asking this question, since I myself didn’t bother checking the answers, but with a bit of self-reflection, well, I trusted Microsoft; given that it’s a $2 trillion company hyping up this new product, I guess that seems like a reasonable thing to have done?

That, though, is a point in and of itself: an obvious takeaway here is the importance of verifying AI-generated data — that was the point of AI Homework — but in this case my failure was in not verifying that a company had verified AI-generated data; to put it another way, Microsoft effectively laundered new Bing’s output through its own reputation, to the point where no one other than Brereton even thought to check whether it was generating the correct answers.

What is just as interesting is what this says about Microsoft: the most obvious explanation is that the company is so enthusiastic about this technology, and so eager to take on Google, that it didn’t do its due diligence. What now, though? Will this be a reason to slow down? Or did Microsoft know that the results might have errors, and barrel forward anyway?

On the Google side, this is certainly a point in Sundar Pichai’s favor in terms of taking time to roll out a similar tool, above and beyond the business model and cost questions that I have raised previously.

User Preferences and Company Reputations

In another way, though, the Bard and Bing errors highlight exactly why the chat question is so vexing for Google in particular. The one thing we know is that users are going crazy about these tools:

[Embedded tweet from Hailey: a chart comparing ChatGPT’s cumulative user growth to the reported active users of other consumer apps]

This isn’t quite an apples-to-apples comparison, as Hailey’s tweet appears to be comparing cumulative ChatGPT users — themselves compiled by a third party — to reported active users from the comparison set. It does, though, seem directionally accurate, and it certainly helps explain why Microsoft seems to have thrown caution to the wind.

Google, though, well, it’s Google: yes, as I noted yesterday, the search engine’s answer box is wrong more often than you might think, but its errors are like the car accidents that kill tens of thousands of Americans every year; at this point we have accepted them, and no one notices, as tragic as they are. If an airplane crashes, though, or if a self-driving car takes a life, that’s headline news. This is basically what happened to Bard last week, and, thanks to Brereton’s diligence, new Bing this week. Can Google risk that sort of news cycle going forward? Can Microsoft?

At the same time, though, AI answers are not human lives — that’s where the analogy falls apart in a major way. And again, there is clear product-market fit for tools like this, which are bound to get better over time, even if they never are perfect. Balancing these two imperatives — reputation and accuracy versus user demand — is going to be a devilishly difficult proposition, particularly for Google: they both have the best reputation and have all of the users, so there is no perfect answer as far as their business is concerned.

That noted, I do wonder if Microsoft will blink first: on the one hand, the company has more freedom of movement, both because Bing doesn’t have the greatest reputation for accuracy to begin with, and because the company has nothing to lose in the search market. On the other hand, does the company want to risk its overall reputation, negative news cycles, and potential harms for a market in which it is only a bit player?

Section 230 and LLMs

That leads to one of the most interesting questions of all, prompted by this section in Brereton’s article:

According to this pros and cons list, the “Bissell Pet Hair Eraser Handheld Vacuum” sounds pretty bad. Limited suction power, a short cord, and it’s noisy enough to scare pets? Geez, how is this thing even a best seller? Oh wait, this is all completely made up information.

Bing AI was kind enough to give us its sources, so we can go to the hgtv article and check for ourselves. The cited article says nothing about limited suction power or noise. In fact, the top amazon review for this product talks about how quiet it is. The article also says nothing about the “short cord length of 16 feet” because it doesn’t have a cord. It’s a portable handheld vacuum.

I hope Bing AI enjoys being sued for libel.

First a caveat: there is in fact a corded Bissell Pet Hair Eraser Handheld Vacuum — here’s an Amazon link — although the reviews there suggest it has excellent suction power and there don’t seem to be any noise complaints. It’s possible that Brereton simply looked at the wrong vacuum — the Bing results don’t say cordless at all — which kind of makes the point that everyone makes mistakes, including humans.

The line I’m most interested in, though, is that last one: even if Bing has a plausible defense in this specific case, could Microsoft be sued for libel? To put it another way, is the output of large language models protected by Section 230?

I went over the genesis of Section 230 in this Daily Update. Here is the relevant text from the statute:

(c) Protection for “Good Samaritan” blocking and screening of offensive material

(1) Treatment of publisher or speaker
No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.
(2) Civil liability
No provider or user of an interactive computer service shall be held liable on account of—

(A) any action voluntarily taken in good faith to restrict access to or availability of material that the provider or user considers to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable, whether or not such material is constitutionally protected; or
(B) any action taken to enable or make available to information content providers or others the technical means to restrict access to material described in paragraph (1).

Here is the definition of “information content provider”:

The term “information content provider” means any person or entity that is responsible, in whole or in part, for the creation or development of information provided through the Internet or any other interactive computer service.

On platforms like Facebook, it is clear that the person making a post is a distinct “information content provider” from Facebook, which means that Facebook is not liable for the post, and does not assume liability for all posts if it moderates some of them. This also extends to search results: Google is not liable for search results because that information comes from a distinct “information content provider”. There is, I should note, a case before the Supreme Court right now — Gonzalez v. Google — about whether recommendation engines are covered by Section 230.

None of these, though, are comparable to something like new Bing or ChatGPT: yes, the corpus undergirding the large language model comes from the web, but all of the AI companies are very eager to assure everyone that the content they generate is original; after all, they don’t want to be sued for copyright violations! Stability AI, for example, is being sued for copyright infringement over Stable Diffusion; a group of “tech enthusiasts uninvolved in the case” took it upon themselves to write a defense at stablediffusionfrivolous.com. From the introduction:

AI art tools do not in any way, shape or form “collage together images”. No images are stored in the checkpoints in any way, shape or form, and indeed – as documented below – such a thing is a literal impossibility. Rather, AI art tools work akin to seeing shapes in clouds – starting with random latent noise and trying to make it “make more sense”, pushing and nudging it one step at a time, based on the relationships it has learned about what is statistically “normal” in images…

While one can certainly have sympathy for artists who are faced with change in their industry – as has happened many times in the past, to great resistance, such as with the advent of photography, and later, of digital tools like Photoshop – the simple facts are, the rights of creators are not unlimited. That’s literally what fair use is. In his very critique of AI art “misappropriating” images, the attorney for the plaintiffs takes the images of various researchers straight from their papers, “with no consent” and with “no compensation”. And that’s fine, because, again, there are limits to the rights of creators, and the world is better for the existence of fair use. Indeed, while the images were taken in their entirety, AI image generators make use of on the order of a byte or so per image. An entire artist’s portfolio may be represented in a tweet or two. A Wikipedia page on an artist stores far more. Google thumbnails store vastly more, by orders of magnitude. If using a byte or so from a work, to empower countless millions of people to create works not even resembling any input, cannot be considered fair use, then the entire notion of fair use has no meaning.
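To make the site’s cloud-watching analogy concrete, here is a minimal, deliberately toy sketch of the iterative denoising loop its introduction describes. To be clear, this is not Stable Diffusion’s actual code: the toy_denoiser function below is a hypothetical stand-in for the learned network, and real models operate on image latents and condition on a text prompt.

```python
# Toy illustration of diffusion-style generation: start from pure random
# noise, then repeatedly nudge it toward what the "model" considers normal.
# toy_denoiser is a hypothetical stand-in for the learned network; nothing
# here is taken from Stable Diffusion itself.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(latent: np.ndarray, step: int, total: int) -> np.ndarray:
    # Stand-in for the trained model: pull each value toward a "pattern"
    # (here, simply its own sign), with later steps nudging more strongly.
    target = np.sign(latent)
    strength = (step + 1) / total
    return latent + strength * (target - latent)

# "Starting with random latent noise..."
latent = rng.standard_normal((8, 8))

# "...pushing and nudging it one step at a time."
steps = 50
for step in range(steps):
    latent = toy_denoiser(latent, step, steps)

print(latent.round(2))  # the noise has settled into a coherent ±1 pattern
```

Note that no images are stored or consulted anywhere in the loop, which is the site’s point: whatever the system “knows” lives in the denoiser’s parameters. The “byte or so per image” figure, meanwhile, appears to be simple division: Stable Diffusion’s released checkpoint is a few gigabytes, while its LAION-derived training set is on the order of two billion images, which works out to roughly a byte or two of model weights per training image.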

Who, then, generates the images? Is it the model? Is it the person who wrote the prompt? Extend this to ChatGPT or Bing: who generates the output? Is it OpenAI or Microsoft? Is it the person writing the prompt? Who is responsible for libel?

I don’t have any clue what the answer is, other than to note that there is a fundamental tension between claiming that these tools violate copyright (i.e., that their output is really just other people’s content) and claiming that they ought to be liable for creating something they can be sued over (i.e., that the output is their own) — it can’t be both! What I do know, though, is that Microsoft — and Google — have very large legal departments that are arguing very strenuously in the Supreme Court in favor of Section 230 and, I would think, may read Brereton’s article with trepidation (of course Microsoft’s lawyers may have already thought all this through, but that seems inconsistent with not actually checking to see if the demonstrations were accurate).

That leads to an additional observation: this angle arguably increases the probability that there really might be a startup that breaks through in the consumer space. I still think that Google is the favorite, and Microsoft second, but there may be very tangible benefits in literally having nothing to lose. Or, perhaps, this will be YouTube 2.0, where a startup breaks through, but ultimately needs to be acquired to attain the pocketbooks and political power necessary to overcome legal obstacles (maybe that is a way to think about the OpenAI and Microsoft relationship).

One final point: it’s obvious on an intellectual level why it is “bad” to have wrong results. What is fascinating to me, though, is that I’m not sure humans care, particularly on the visceral level that drives a product to 100 million users in a matter of weeks. After all, it’s not as if humans are right 100% of the time, but we like talking to and learning from them all the same; the humanization of computers, even in the most primitive manifestation we have today, may very well be alluring enough that good-enough accuracy is sufficient to gain traction. This will, of course, be tremendously controversial, particularly amongst folks who see the world as something that can be tamed with the right set of facts and rules; I tend to think things are more complex, because humans themselves are more complex, and revealed preference certainly suggests that the appeal of the breakthrough AI product to date is not substance but style.


This Update will be available as a podcast later today. To receive it in your podcast player, visit Stratechery.

The Stratechery Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a subscriber, and have a great day!