Apple iCloud Encryption, CSAM Scanning and Convergent Encryption

Good morning,

Monday’s Sharp Tech, like Monday’s Article about Consoles and Competition, came out a few hours late; I apologize for the delay. In this episode we discussed TSMC’s Arizona announcements and ruminated about the future of remote work.

On to the update:

Apple iCloud Encryption

Last week, from the Wall Street Journal:

Apple Inc. is planning to significantly expand its data-encryption practices, a step that is likely to create tensions with law enforcement and governments around the world as the company continues to build new privacy protections for millions of iPhone users. The expanded end-to-end encryption system, an optional feature called Advanced Data Protection, would keep most data secure that is stored in iCloud, an Apple service used by many of its users to store photos, back up their iPhones or save specific device data such as Notes and Messages. The data would be protected in the event that Apple is hacked, and it also wouldn’t be accessible to law enforcement, even with a warrant.

While Apple has drawn attention in the past for being unable to help agencies such as the Federal Bureau of Investigation access data on its encrypted iPhones, it has been able to provide much of the data stored in iCloud backups upon a valid legal request. Last year, it responded to thousands of such requests in the U.S., according to the company. With these new security enhancements, Apple would no longer have the technical ability to comply with certain law-enforcement requests such as for iCloud backups—which could include iMessage chat logs and attachments and have been used in many investigations.

This news isn’t entirely a surprise, for a few different reasons.

First, just for clarification’s sake, data you store in iCloud is encrypted on Apple’s servers; until now, though, Apple had the keys to decrypt most of that data. Not all of it, though: things like your Keychain (which stores accounts and passwords), health data, payment data, etc. were end-to-end encrypted; that means the data was encrypted on your device, and only you had the keys (or, more accurately, those keys were stored on your devices). Apple could not access that data, even if you lost your password, or if law enforcement showed up with a warrant. On the other hand, Apple could access things like Photos, Backups, and iCloud Drive, which they might want to do for those same reasons (or to provide easy access in a browser).

Much of the discussion around end-to-end encryption is focused on the law enforcement angle, for understandable reasons; the workaround for law enforcement griping about iPhone encryption has been the fact that most of the data that law enforcement wants access to is found in iCloud backups, which are turned on by default. This includes private keys for things like iMessage, which are backed up and thus accessible by Apple. For the record, I always thought this was a reasonable middle ground; I wrote in a January 2020 Update:

This also splits the difference when it comes to principles: users have agency — they can ensure that everything they do is encrypted — while total privacy is available but not given by default.

I actually think that Apple does an excellent job of striking that balance today. When it comes to the iPhone itself, Apple is the only entity that can make it truly secure; no individual can build their own secure enclave that sits at the root of iPhone security. Therefore, they are right to do so: everyone has access to encryption.

From there it is possible to build a fully secure environment: use only encrypted communications, use encrypted backups to a computer secured by its own hardware-based authentication scheme, etc. Taking the slightly easier route, though — iCloud backups, Facebook messaging, etc. — means some degree of vulnerability that, let’s not forget, is sometimes justifiably leveraged. Law enforcement can get a warrant for those backups or chat logs, just as they can install a wire tap.

Again, this isn’t going to stop determined bad actors, but as I noted, nothing is. The question is what of the rest, those that get swept up by the worst sort of communities, and who commit legitimate crimes: what should their defaults be?

I would say my personal calculus in terms of defaults has shifted a bit over the last few years to be more on the “increased encryption” side of things, but it is a spectrum between everything encrypted by default using open source software and nothing being encrypted. Moreover, that spectrum applies to the other angle I referenced above: customer support. Indeed, I strongly suspect that by far the most common reason for Apple to use their keys (beyond displaying content in a browser) is to help customers recover lost data.

Despite these benefits of Apple’s current approach, though, the company’s controversial announcement in August 2021 that it would implement on-device scanning for Child Sexual Abuse Material (CSAM) for uploads to iCloud Photos strongly suggested that end-to-end encryption for things like photos was being worked on. I wrote in Apple’s Mistake:

Apple’s idealized outcome solves a lot of seemingly intractable problems. On one hand, CSAM is horrific and Apple hasn’t been doing anything about it; on the other hand, the company has a longstanding commitment to ever increasing amounts of encryption, ideally end-to-end. Apple’s system, if it works precisely as designed, preserves both goals: the company can not only keep end-to-end encryption in Messages, but also add it to iCloud Photos (which is not currently encrypted end-to-end), secure in the knowledge that it is doing its part to not only report CSAM but also help parents look after their children. And, from a business perspective, it means that Apple can continue to not make the massive investments that companies like Facebook have in trust-and-safety teams; the algorithm will take care of it.

Apple ultimately paused the rollout of on-device scanning for further study, which raised the question as to whether they were going to abandon end-to-end encryption for iCloud Photos in particular. In fact, the company continued to push forward, leading to last week’s announcement. These new capabilities, though, will not be on by default — and I think there is a good chance they never will be, and probably never should be. Again, the biggest risk that most customers face is losing access to their data; I think that Apple is correct to make this an opt-in so that users can be clear about the responsibility they are taking when enabling end-to-end encryption.

CSAM Scanning and Convergent Encryption

From Wired:

In August 2021, Apple announced a plan to scan photos that users stored in iCloud for child sexual abuse material (CSAM). The tool was meant to be privacy-preserving and allow the company to flag potentially problematic and abusive content without revealing anything else. But the initiative was controversial, and it soon drew widespread criticism from privacy and security researchers and digital rights groups who were concerned that the surveillance capability itself could be abused to undermine the privacy and security of iCloud users around the world. At the beginning of September 2021, Apple said it would pause the rollout of the feature to “collect input and make improvements before releasing these critically important child safety features.” In other words, a launch was still coming. Now the company says that in response to the feedback and guidance it received, the CSAM-detection tool for iCloud photos is dead.

Instead, Apple told WIRED this week, it is focusing its anti-CSAM efforts and investments on its “Communication Safety” features, which the company initially announced in August 2021 and launched last December. Parents and caregivers can opt into the protections through family iCloud accounts. The features work in Siri, Apple’s Spotlight search, and Safari Search to warn if someone is looking at or searching for child sexual abuse materials and provide resources on the spot to report the content and seek help. Additionally, the core of the protection is Communication Safety for Messages, which caregivers can set up to provide a warning and resources to children if they receive or attempt to send photos that contain nudity. The goal is to stop child exploitation before it happens or becomes entrenched and reduce the creation of new CSAM.

At first glance, the combination of not doing on-device scanning for CSAM (which uses the NCMEC database of known CSAM) and adding end-to-end encryption suggests that Apple is simply washing its hands of CSAM scanning in iCloud Photos. That, I would note, is legal: companies are required to report any CSAM they find, but they are not required to do proactive scanning. Still, most do: it is not good for business to be known as the safe place to store CSAM, which raises the question of whether that is exactly the risk Apple is now taking.

In fact, there is a loophole in Apple’s implementation of iCloud Photo end-to-end encryption; from Apple’s Advanced Data Protection for iCloud support document:

iCloud stores some data without the protection of user-specific CloudKit service keys, even when Advanced Data Protection is turned on. CloudKit Record fields must be explicitly declared as “encrypted” in the container’s schema to be protected, and reading and writing encrypted fields requires the use of dedicated APIs. Dates and times when a file or object was modified are used to sort a user’s information, and checksums of file and photo data are used to help Apple de-duplicate and optimize the user’s iCloud and device storage—all without having access to the files and photos themselves. Details about how encryption is used for specific data categories is available in the Apple Support article iCloud data security overview.

Decisions such as the use of checksums for data de-duplication — a well-known technique called convergent encryption — were part of the original design of iCloud services when they launched. This metadata is always encrypted, but the encryption keys are stored by Apple with standard data protection. To continue to strengthen security protections for all users, Apple is committed to ensuring more data, including this kind of metadata, is end-to-end encrypted when Advanced Data Protection is turned on.

De-duplication is particularly important for Apple from a cost perspective: if many people have the exact same document or photo or movie, etc., it is far cheaper to store that file once than to store a unique version for every single user who has that file on their device. If every user encrypts said file with their own unique key, though, then the resultant encrypted file will by definition be unique to each user (because the key is what is used to encrypt the file; to decrypt the file to its original form you need the unique key).

Convergent encryption solves this problem in a very clever way:

  • The way to make sure that every user with the same file ends up with an identical encrypted version of that file is to ensure they all use the same key.
  • However, you can’t share keys between users, because that defeats the entire point; you need a common reference point between users that is unknown to anyone but those users.
  • The answer is to use the file itself: the system creates a hash of the file’s content, and that hash (a long string of characters derived from a known algorithm) is the key that is used to encrypt said file.

If every iCloud user uses this technique — and given that Apple implements the system, they do — then every iCloud user with the same file will produce the same encrypted file, because they are all using the same key (which is derived from the file itself); that means that Apple only needs to store one version of that file even as it makes said file available to everyone who “uploaded” it (in truth, because iCloud integration goes down to the device, the file is probably never actually uploaded at all — Apple just includes a reference to the file that already exists on its servers, thus saving a huge amount of money on both storage costs and bandwidth).
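To make the mechanics concrete, here is a minimal Python sketch of the general technique; this is my own illustration, not Apple’s actual implementation (whose details are not public), and it assumes SHA-256 for the content hash, AES-GCM for the cipher, and the third-party `cryptography` package. The function name is hypothetical:

```python
# A minimal sketch of convergent encryption; not Apple's implementation.
# Assumes the third-party `cryptography` package is installed.
import hashlib

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt `plaintext` with a key derived from its own contents.

    Returns (key, blob). Identical plaintexts always produce identical
    blobs, which is what allows the server to de-duplicate them.
    """
    key = hashlib.sha256(plaintext).digest()                     # the content hash is the key
    nonce = hashlib.sha256(b"nonce:" + plaintext).digest()[:12]  # deterministic nonce
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, nonce + ciphertext


# Two users with the same photo produce byte-identical blobs...
photo = b"...the raw bytes of some photo..."
key_a, blob_a = convergent_encrypt(photo)
key_b, blob_b = convergent_encrypt(photo)
assert blob_a == blob_b  # ...so the server only needs to store one copy

# ...but only someone holding the key (i.e. someone who already had the file)
# can decrypt it:
nonce, ciphertext = blob_a[:12], blob_a[12:]
assert AESGCM(key_a).decrypt(nonce, ciphertext, None) == photo
```

A real deployment would use a purpose-built deterministic cipher and a more careful key-derivation step, but the property that matters is the same: identical files in, identical encrypted blobs out, without the server holding a key for any file it has not already seen.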

There is one huge flaw in convergent encryption, however, called “confirmation of file”: if you know the original file, you can by definition identify the encrypted version of that file (because the key is derived from the file itself). When it comes to CSAM, though, this flaw is a feature: because Apple uses convergent encryption for its end-to-end encryption, it can by definition do server-side scanning of files, exploiting the “confirmation of file” flaw to confirm whether CSAM exists and, by extension, who “uploaded” it. Apple’s extremely low rates of CSAM reporting suggest that the company is not currently pursuing this approach, but it is the most obvious way to scan for CSAM given it has abandoned its on-device plan.
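What would that look like in practice? Here is an equally hypothetical sketch, again my own rather than anything Apple has described: the server computes the same content-derived checksum for every file in a database of known CSAM and simply looks for exact matches among the checksums it already stores.

```python
# A hypothetical sketch of server-side "confirmation of file" scanning.
# Function names and data structures are illustrative, not Apple's.
import hashlib


def content_checksum(data: bytes) -> bytes:
    """The same content-derived value used for convergent keys and de-duplication."""
    return hashlib.sha256(data).digest()


def scan_for_known_files(stored: dict[str, bytes], known_files: list[bytes]) -> list[str]:
    """Return the IDs of stored items whose checksum exactly matches a known file."""
    known_checksums = {content_checksum(f) for f in known_files}
    return [item_id for item_id, checksum in stored.items()
            if checksum in known_checksums]


# The server holds only checksums (or encrypted blobs), yet it can still confirm
# the presence of a file it already knows, because that file always produces
# the same checksum.
stored = {"user-42/IMG_0001": content_checksum(b"...bytes of some photo...")}
matches = scan_for_known_files(stored, [b"...bytes of some photo..."])
assert matches == ["user-42/IMG_0001"]
```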

I wouldn’t, for the record, have any objection to Apple doing this; in fact, they should. Yes, this is a capability that could be extended to other content, but that is a risk inherent in uploading content to someone else’s servers. I wrote in the conclusion of Apple’s Mistake:

One’s device ought to be one’s property, with all of the expectations of ownership and privacy that entails; cloud services, meanwhile, are the property of their owners as well, with all of the expectations of societal responsibility and law-abiding which that entails. It’s truly disappointing that Apple got so hung up on its particular vision of privacy that it ended up betraying the fulcrum of user control: being able to trust that your device is truly yours.

There is another advantage of “confirmation of file” scanning over Apple’s previous plan: not only does it happen on Apple’s servers instead of on your device (although if you want to get really specific about the fact that each file is probably only uploaded once across Apple’s user base, the line is quite fuzzy), but it also only works when the file is an exact match. Apple’s previous plan called for a technique known as “perceptual hashing”, which could identify a known photo even if it were cropped or had minute pixel-level changes; these sorts of alterations would make a known photo undiscoverable with a “confirmation of file” approach, because the file is different! I think that’s a good thing, though: I wrote earlier this year about the horrors of getting this stuff wrong, and perceptual hashing does open the door to doing just that.
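A toy example, purely for illustration, of why exact matching is so brittle: flip a single bit of a file and its cryptographic hash (and therefore its convergent key, checksum, and encrypted blob) changes completely, while a perceptual hash is designed to stay close under small edits.

```python
import hashlib

original = b"...bytes of a known photo..."
altered = bytearray(original)
altered[0] ^= 0x01  # flip one bit of the first byte

# The cryptographic hashes (and thus any convergent key or checksum derived
# from them) are now entirely different, so an exact-match scan no longer fires:
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(bytes(altered)).hexdigest())

# A perceptual hash, by contrast, is computed from the image's visual features
# (e.g. via a library like `imagehash`), so a crop or a few changed pixels
# yields a nearby hash; more robust, but also more prone to false positives.
```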

To be fair to Apple’s on-device system, there were a lot of safeguards in place to avoid false positives, including a threshold for review and humans in the loop; my chief objection was to the very concept of on-device scanning, even if it made some technical sense, and even though simply using an iPhone entails trusting Apple. Opposing it was a hard call, because there are real trade-offs involved, and the system I am endorsing here is only very slightly different in implementation, and effectively the same in practice (but easier to defeat by slightly altering the files in question). I think the principle of on-device versus on-server is something worth defending though, even if it entails splitting hairs.


This Update will be available as a podcast later today. To receive it in your podcast player, visit Stratechery.

The Stratechery Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a subscriber, and have a great day!