Okay, let's play peon. Here are three perfectly legal and work-safe thumbnails of a famous singer: https://imgur.com/a/j40fMex. The singer is underage in precisely one of the three photos. Can you decide which one?
If your account has a large number of safety vouchers that trigger a CSAM match, then Apple will gather enough fragments to reassemble a secret key X (unique to your device) which they can use to decrypt the "visual derivatives" (very low resolution thumbnails) stored in all your matched safety vouchers.
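The voucher mechanism is essentially threshold secret sharing: any number of vouchers below the threshold reveals nothing, but 30 matches let Apple reconstruct the key. Here is a minimal Shamir-style sketch of the idea; the field, threshold, and share counts are toy assumptions, not Apple's actual construction:

```python
import random

P = 2**127 - 1  # a Mersenne prime; toy field size, not Apple's parameters

def split(secret, n, t):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the random polynomial
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x=0 recovers the secret from t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

key = 123456789                     # stand-in for the per-device key X
shares = split(key, n=100, t=30)    # one share per safety voucher, say
assert reconstruct(random.sample(shares, 30)) == key  # 30 matches: key recovered
```

Any 29 or fewer shares are statistically independent of the key, which is why Apple (honestly, as far as this mechanism goes) cannot decrypt anything below the threshold.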
An Apple employee looks at the thumbnails derived from your photos. The only judgment call this employee gets to make is whether it can be ruled out (based on the way the thumbnail looks) that your uploaded photo is CSAM-related. As long as the thumbnail contains a person, or something that looks like the depiction of a person (especially in a vaguely violent or vaguely sexual context, e.g. with nude skin or skin with injuries) they will not be able to rule out this possibility based on the thumbnail alone. And they will not have access to anything else.
Given the ability to produce hash collisions, an adversary can easily generate photos that fail this visual inspection as well. This can be accomplished straightforwardly by using perfectly legal violent or sexual material to produce the collision (e.g. most people would not suspect foul play if they got a photo of genitals from their Tinder date). But much more sophisticated attacks [2] are also possible: since the computation of the visual derivative happens on the client, an adversary will be able to reverse engineer the precise algorithm.
While 30 matching hashes are probably not sufficient to convict somebody, they're more than sufficient to make somebody a suspect. Reasonable suspicion is enough to get a warrant, which means search and seizure, computer equipment hauled away and subjected to forensic analysis, etc. If a victim works with children, they'll be fired for sure. And if they do charge somebody, it will be in Apple's very best interest not to assist the victim in any way: that would require admitting to faults in a high profile algorithm whose mere existence was responsible for significant negative publicity. In an absurdly unlucky case, the jury may even interpret "1 in 1 trillion chance of false positive" as "way beyond reasonable doubt".
Chances are the FBI won't have the time to go after every report. But an attack may have consequences even if it never gets to the "warrant/charge/conviction" stage. E.g. if a victim ever gets a job where they need to obtain a security clearance, the Background Investigation Process will reveal their "digital footprint", almost certainly including the fact that the FBI got a CyberTipline Report about them. That will prevent them from being granted interim determination, and will probably lead to them being denied a security clearance.
(See also my FAQ from the last thread [1], and an explanation of the algorithm [3])
Fair enough. I suppose it's true that you could create a colliding sexually explicit image where age is indeterminate, and the reviewer may not realize it isn't a match.
> Given the ability to produce hash collisions, an adversary can easily generate photos that fail this visual inspection as well.
Apple could easily fix this by also showing a low-res version of the CSAM image that was collided with, but I'll grant that they may not be able to do that legally (and reviewers probably don't want to look at actual CSAM).
The problem is that it is a scaled low-res version. There are well publicized attacks[1] showing you can completely change the contents of the image post scaling. There's also the added problem that if the scaled down image is small, even without the attack, it's impossible to make a reasonable human judgement call (as OP points out).
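To make the scaling attack concrete, here is a toy version against nearest-neighbour resampling. Real attacks target bilinear or bicubic resamplers and require optimization, but the principle is identical: only the pixels the resampler actually reads matter, so you can hide an arbitrary small image inside a large one.

```python
import numpy as np

def nn_downscale(img, out_h, out_w):
    """Nearest-neighbour resize: the kind of resampler this toy attack targets."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return img[rows][:, cols]

def plant_payload(cover, payload):
    """Overwrite only the pixels the resampler will read, so the full-size
    image still looks like `cover` but downscales to `payload`."""
    out = cover.copy()
    h, w = cover.shape[:2]
    ph, pw = payload.shape[:2]
    rows = (np.arange(ph) * h) // ph
    cols = (np.arange(pw) * w) // pw
    out[np.ix_(rows, cols)] = payload
    return out

cover   = np.zeros((640, 640), dtype=np.uint8)    # the "innocent" full-size image
payload = np.full((32, 32), 255, dtype=np.uint8)  # what the thumbnail will show
crafted = plant_payload(cover, payload)

assert np.array_equal(nn_downscale(crafted, 32, 32), payload)
# Only 32*32 of the 640*640 pixels changed, about 0.25% of the image.
```

Against smoothing resamplers the changed pixels must be spread and weighted, which is what the published attacks automate; the client-side resampler being known makes that tailoring possible.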
The problem isn't CSAM scanning in principle. The problem is that the shift to the client, plus the various privacy-preserving steps Apple is attempting, changes the actions taken in response to a match in a concerning way. The big problem isn't the cases where the authorities should investigate*, but that a malicious actor can act surreptitiously and leave behind almost no footprint of the attack. Given that SWATting is a real thing, imagine how it plays out when the accusation is child pornography. From the authorities' perspective, SWATting is low-incidence and not that big a deal. Very different perspective on the victim's side, though.
* One could argue about the civil liberties aspect & the fact that having CSAM images is not the same as actually abusing children. However, among the general population that line of reasoning just gets you dismissed as supporting child abuse & is only starting to become acknowledged in the psychiatry community.
You're adding quite a lot of technobabble gloss to an "attack vector" that boils down to "people can send you images that are visually indistinguishable from known CSAM".
Guess what, they can already do this but worse by just sending you actual illegal images of 17.9 year olds.
While it would be bad to be subjected to such an attack, and there is a small chance it would lead to some kind of interaction with law enforcement, the outcomes you present are just scaremongering and not reasonable.
I suggest you reread the comment, because "people can send you images that are visually indistinguishable from known CSAM" is not what is being said at all. Where did you even get that from?
The point is precisely that people can become victims of various new attacks, without ever touching photos that are actual "known CSAM". For Christ's sake, half the comments here are about how adversaries can create and spread political memes that trigger automated CSAM filters on people's phones just to "pwn the libz".
> Guess what, they can already do this but worse by just sending you actual illegal images of 17.9 year olds.
No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people. E.g. your own photos are not in the NCMEC databases, and you'd have to reveal your own illegal activities to get them in there. You (or malicious political organizations) especially cannot attack and expose "wrongthinking" groups of people by sending them photos of 17.9 year olds.
> No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people.
An attacker can embed a matching image inside of a PowerPoint zip file, and email it to any corporate employee using O365.
Or, an angry parent can call the police and let them know that a 16 year old possesses nude pictures of their 15 year old girlfriend.
The over-the-top response to this controversy is really disappointing.
Sure, your proposed attack, which requires the victim to have a 15 year old girlfriend, to break an (admittedly silly) law by having nude photos on their phone, for you to call the cops, and for them to take such a call seriously, is clearly comparable to a vector that can be used to target innocents, groups of individuals, etc. who did not break the law in any way, that does not require the attacker to handle prohibited material at all, and that requires Apple to keep a ton of information completely secret to provide even a weak semblance of security (it was shown to be completely broken, except possibly for one unknown hash, in two weeks). Clearly comparable. Sure. Clearly.
For one last time, the NeuralHash collisions make this tool perfectly unusable for catching pedos: all of the next generation of CSAM content will collide with hashes of popular, innocent images. Two weeks after it was deployed, Apple's CSAM scanning is now _only_ an attack vector and a privacy risk. It's completely useless for its nominal function. This would be a massive, hilarious own goal from Apple even if the public reaction was over the top (although it isn't). They just reduced the privacy and security of nearly all their customers, further exposed themselves to the whims of governments, and for no gain whatsoever.
Can you explain how these theoretical political memes hash-match to an image in the NCMEC database, and then also pass the visual check?
> "No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people."
Did I say "taking"? I am talking about sending (theoretical) actual images from the NCMEC database. This is functionally identical to the "attack" you describe.
Yes, I can. This is just one possible strategy; there are many others that differ in what is done and in what order.
You use the collider [1] and one of the many scaling attacks ([2] [3] [4], just the ones linked in this thread) to create an image that matches the hash of a reasonably fresh CSAM image currently circulating on the Internet, and resizes to some legal sexual or violent image. Note that knowing such a hash and having such an image are both perfectly legal. Moreover, since the resizing (the creation of the visual derivative) is done on the client, you can tailor your scaling attack to the specific resampling algorithm.
Eventually, someone will make a CyberTipline report about the actual CSAM image whose hash you used, and the image (being a genuine CSAM image) will make its way into the NCMEC hash database. You will even be able to tell precisely when this happens, since you have the client-side half of the PSI database, and you can execute the NeuralHash algorithm.
You can start circulating the meme before or after this step. Repeat until you have circulated enough photos to make sure that many people in the targeted group have exceeded the threshold.
Note that the memes will trigger automated CSAM matches, and pass the Apple employee's visual inspection: due to the safety voucher system, Apple will not inspect the full-size images at all, and they will have no way of telling that the NeuralHash is a false positive.
Okay, perhaps the three thumbnails was unclear. I didn't mean to illustrate any specific attack with it, just to convey the feeling of why it's difficult to tell apart legal and potentially illegal content based on thumbnails (i.e. why a reviewer would have to click "possible CSAM" even if the thumbnail looks like "vanilla" sexual or violent content that probably depicts adults). I'd splice in a sentence to clarify this, but I can't edit that particular comment anymore.
Ok yeah, I do agree this scaling attack potentially makes this feasible, if it essentially allows you to present a completely different image to the reviewer as to the user. Has anyone done this yet? i.e. an image that NeuralHashes to a target hash, and also scale-attacks to a target image, but looks completely different.
(Perhaps I misunderstood your original post, but this seems to be a completely different scenario to the one you originally described with reference to the three thumbnails)
This attack doesn’t work. If the resized image doesn’t match the CSAM image your NeuralHash mimicked, then when Apple runs its private perceptual hash, the hash value won’t match the expected value, and it will be ignored without any human looking at it.
We have no reason to believe that Apple's second, secret perceptual hash provides any meaningful protection against such attacks. At best, we can hope that it'll allow early detection of attacks in a few cases, but chances are that's the best it can do. We might not ever learn: Apple now has a very strong incentive not to admit to any evidence of abuse or to any faults in their algorithm.
(Sorry, this is going to be long. I know you understand most/all of this stuff; it's mostly there to provide a bit of context for the users reading our exchange.)
The term "hash function" is a bit of a misnomer here. When people hear "hash", they tend to think about cryptographic hash functions, such as SHA256 or BLAKE3. When two messages have the same hash value, we say that they collide. Fortunately, cryptographic hash functions have several good properties associated with them: for example, there is no known way to generate a message that yields a given predetermined hash value, no known way to find two different messages with the same hash value, and no known way to make a small change to a message without changing the corresponding hash value. These properties make cryptographic hash functions secure, trustworthy and collision-resistant even in the face of powerful adversaries. Generally, when you decide to use two unrelated cryptographic hash algorithms instead of one, executing a preimage attack against both hashes becomes much more difficult for the adversary.
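For instance, with a cryptographic hash, a one-character change in the input scrambles the output completely (the avalanche effect):

```python
import hashlib

a = hashlib.sha256(b"attack at dawn").hexdigest()
b = hashlib.sha256(b"attack at dawo").hexdigest()  # one character changed

# Count how many of the 256 digest bits differ between the two hashes.
diff = bin(int(a, 16) ^ int(b, 16)).count("1")
print(diff)  # roughly half of all 256 bits, as expected for a good hash
```

This is precisely the property a perceptual hash must give up: similar inputs are supposed to produce the same output.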
However, as you know, the hash functions that Apple uses for identifying CSAM images are not "cryptographic hash functions" at all. They are "perceptual hash functions". The purpose of a perceptual hash is the exact opposite of a cryptographic hash: two images that humans see/hear/perceive (hence the term perceptual) to be the same or similar should have the same perceptual hash. There is no known perceptual hash function that remains secure and trustworthy in any sense in the face of (even unsophisticated) adversaries. In particular, preimage attacks against perceptual hashes are very easy, compared to the same attacks against cryptographic hashes.
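A toy average hash makes both claims concrete: perceptually similar images collide by design, and a preimage for any target hash can be written down constructively. This is far simpler than NeuralHash, but the asymmetry with cryptographic hashes is the same.

```python
import numpy as np

def ahash(img):
    """Toy 64-bit average hash: 8x8 block means thresholded against the global mean."""
    h, w = img.shape
    blocks = img.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return (blocks > blocks.mean()).astype(int).ravel()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)

# Perceptual property: a uniform brightness shift leaves the hash unchanged,
# because both the block means and the global mean shift by the same amount.
assert np.array_equal(ahash(img), ahash(img + 10))

# Preimage "attack": paint blocks bright or dark to hit ANY 64-bit target exactly.
def preimage(bits):
    out = np.zeros((64, 64))
    for i, b in enumerate(bits):
        r, c = divmod(i, 8)
        out[r*8:(r+1)*8, c*8:(c+1)*8] = 200 if b else 50
    return out

target = np.array([1, 0] * 32)
assert np.array_equal(ahash(preimage(target)), target)
```

No search at all is needed here; against NeuralHash the collider has to optimize, but it still succeeds in minutes on commodity hardware, which is the point.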
Using two unrelated cryptographic hashes meaningfully increases resistance to collision and preimage attacks. Using ROT13 twice does not increase security in any meaningful sense. Using two perceptual hashes, while not as bad, is still much closer to the "using ROT13 twice for added security" than to the "using multiple cryptographic hashes" end.
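The ROT13 analogy is literal: composing two weak transformations can add nothing at all.

```python
import codecs

msg = "attack at dawn"
once = codecs.encode(msg, "rot13")
twice = codecs.encode(once, "rot13")

assert once == "nggnpx ng qnja"
assert twice == msg  # ROT13 is its own inverse: two layers give zero security
```

Two perceptual hashes are not quite this degenerate, but the composition is only as strong as the harder layer, and neither layer is hard.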
Finding a SHA1 collision took 22 years, and there are still no effective preimage attacks against it. Creating the NeuralHash collider took a single week. More importantly, even if you were to use two unrelated perceptual hash functions, executing preimage attacks against both hashes need not become much more difficult for the adversary: easy * easy is still easy. Layering cryptography upon cryptography is meaningful, but only as long as one of the layers is actually difficult to attack. This is not the case for perceptual hashes. In fact, in many similar contexts, these adversarial attacks tend to transfer: if they work against one technique or model, they often work against other models as well [3]. In the attack discussed above, the adversary has nearly full control over the "visual derivative", so even a very unsophisticated adversary can subject the target thumbnail itself to the collider before performing the resizing attack, and hope that it transfers against the second hash. If the second hash is a variant of NeuralHash (somewhat likely; it could even be NeuralHash performed on the thumbnail itself, we don't know anything about it!), or if it's an ML model trained on the same or similar datasets (quite likely), or if it's one of the known algorithms (say PhotoDNA), then some amount of transfer is likely to happen. And given an adversary that is going to distribute a large number of photos anyway, a 10% success rate is more than enough. Given the diminished state space (fixed-size thumbnails, almost certainly smaller than 64x64 for legal reasons), a 10% success rate is completely plausible even with these naive approaches. An adversary that has some (even very little) information about the second hash algorithm can do much more sophisticated stuff, and perform much better.
But what if we boldly rule out all transfer results? Doesn't Apple keep their algorithm secret?! Can we think of the weights (coefficients) of the second perceptual hash as some kind of secret key in the cryptographic sense? Alas, no. Apple would have to make sure that all the outputs of the secret perceptual hash are kept secret as well. Due to the way perceptual hashing algorithms work, they provide a natural training gradient: having access to sufficiently many input-output examples is probably enough to train a high-fidelity "clone" that allows one to generate adversarial examples and perform successful preimage attacks, even if the weights of the clone are completely different from the secret weights of the original network. This can be done with standard black box techniques [4]. It's much harder (but nowhere near crypto hard, still perfectly plausible) to pull this off when they only have access to one bit of output (match or no match). A single compromised Apple employee can gather enough data to do this given the ability to observe some inputs and outputs, even if said employee has no access to the innards or the magic numbers. The hash algorithm is kept secret because if it wasn't, an attack would be completely trivial: but an adversary does not need to learn this secret to mount an effective attack.
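Here is a toy version of that black-box extraction, with a single secret linear-threshold bit standing in for the hidden network. The real attack in [4] trains a neural clone, but the principle is the same: query, collect input-output pairs, fit.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w_secret = rng.normal(size=d)  # the "secret weights" the attacker never sees

def secret_hash_bit(x):
    """One output bit of the secret 'hash': a linear threshold function."""
    return (x @ w_secret > 0).astype(int)

# The attacker only observes (input, output) pairs from the black box.
X = rng.normal(size=(5000, d))
y = secret_hash_bit(X)

# Fit a clone by least squares on +/-1 labels: crude, but enough for a linear toy.
w_clone, *_ = np.linalg.lstsq(X, 2 * y - 1, rcond=None)

# The clone agrees with the secret function almost everywhere on fresh inputs,
# even though w_clone is numerically nothing like w_secret.
X_test = rng.normal(size=(1000, d))
agree = np.mean((X_test @ w_clone > 0).astype(int) == secret_hash_bit(X_test))
print(agree)  # close to 1.0
```

A real perceptual hash has thousands of such bits and a nonlinear network behind them, which is exactly why gradient-based cloning in [4] works better, not worse: the outputs leak far more than one bit per query.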
These are just two scenarios. There are many others. "Nobody has ever demonstrated such an attack working end-to-end" is not a good defense: it's been two weeks since the system was rolled out, and once an attack is executed, we probably won't learn about it for years to come. But the attacker can be rewarded way before "due process" kicks in: e.g. if a victim ever gets a job where they need to obtain a security clearance, the Background Investigation Process will reveal their "digital footprint", almost certainly including the fact that the NCMEC got a report about them, even if the FBI never followed up on it. That will prevent them from being granted interim determination, and will probably lead to them being denied a security clearance. If you pull off this attack on your political opponents, you can prevent them from getting government jobs, possibly without them ever learning why. And again, this is one single proposed attack. There were at least 6 different attacks proposed by regular HN users in the recent threads!
As a more general observation, cryptography tends to be resistant to attacks only if one can say things such as "the adversary cannot be successful unless they know some piece of information k, and we have very good mathematical reasons (e.g. computational hardness) to believe that they can't learn k". The technology is flawed: even the state of the art in perceptual hashes does not satisfy this criterion. Currently, they are at best technicool gadgets, but layering technicool upon technicool cannot make the system more secure. And Apple's system is a high-profile target if there ever was one.
Barring a major breakthrough in perceptual hashing (one that Apple decided to keep secret and leave out of both whitepapers), the claim that the secret second hash will prevent collision attacks is not justified. The chances of such a secret breakthrough are very slim: it'd be like learning that SpaceX has already built a base on the Moon and has been doing regular supply runs with secret spaceships. Vaguely plausible in theory (SpaceX has people who do rocketry, Apple has people who do cybersecurity), but vanishingly unlikely in practice.
And that's before we mention that the mere existence of the collider made the entire exercise completely pointless: the real pedos can now use the collider to effectively anonymize their CSAM drops, making sure that all of their content collides with innocent photos, and ensuring that none of the images will be picked up by NeuralHash anyway. For all practical purposes, Apple's CSAM detection is now _only_ an attack vector, and nothing else.
The first half of your post is predicated on the noise added to generate hash A under NeuralHash being likely to also produce a specific hash B under some unknown perceptual hashing function (which Apple specifically calls out [1] as independent of the NeuralHash function, precisely because they don’t want to make this easy; so speculating that it might be NeuralHash run again is incorrect). Hash A is generated via thousands of iterations of an optimization function, guessing and checking to produce a 96-bit (12-byte) number. What shows that the same noise would produce an identical match when run through a completely different hashing function that is designed very differently, specifically to avoid these attacks? Just one bit of difference will prevent a match. Nothing you’ve linked to shows any likelihood of that being anywhere close to 10 percent.
For the second part, yes: if an Apple engineer (who had access to this code) leaked the internal hash function they used, or a bunch of example image-to-hash-value pairs, that would allow these adversarial attacks.
Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
Here’s a thought exercise: how long would it have taken researchers to generate a hash collision with that dog image if the NeuralHash wasn’t public and you received no immediate feedback that you were “right” or getting closer along the way?
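That thought exercise can be quantified with a toy hash. With distance feedback, greedy search converges in a handful of queries; with only match/no-match feedback, the same search is expected to need on the order of 2^16 blind guesses for this 16-bit toy (and real hashes are 96 bits). The hash below is an illustrative stand-in, chosen so greedy search provably works:

```python
import random

random.seed(1)

def toy_hash(bits):
    """Toy 16-bit hash of a 64-bit input: one parity bit per 4-bit chunk."""
    return tuple(sum(bits[i*4:(i+1)*4]) % 2 for i in range(16))

def dist(a, b):
    return sum(p != q for p, q in zip(a, b))

target = tuple(random.randrange(2) for _ in range(16))
x = [random.randrange(2) for _ in range(64)]

# With distance ("getting closer") feedback, greedy hill climbing converges fast:
queries = 0
while toy_hash(x) != target:
    queries += 1
    i = random.randrange(64)
    before = dist(toy_hash(x), target)
    x[i] ^= 1
    if dist(toy_hash(x), target) >= before:
        x[i] ^= 1  # revert flips that don't reduce the distance

assert toy_hash(x) == target
print(queries)  # orders of magnitude fewer queries than the ~2**16 expected blind guesses
```

This is the crux of the debate: whether Apple's second hash really reduces the attacker to the match-only regime, or whether clones and transfer attacks restore the gradient.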
> Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
"There is no paper attacking ROT13 done twice, therefore it must be secure". Usually, it's on the one proposing the protocol to make a case for its security. Doubly so when it's supposed to last a long time, a lot of people are interested in attacking it, and successful attacks can put people in harm's way.
You know what, if you think that this is difficult, feel free to pick an existing perceptual hash function H, cough up some money, and we'll announce a modest prize (say $4000) on HN for the first person to have a working collision attack for NeuralHash+H. H will run on a scaled-down thumbnail, and we'll keep the precise identity of the algorithm secret. If the challenge gets any traction, but nobody succeeds within 40 days, I'll pay you $4000 for your effort. If you're right, this should be easy money. (cf SHA1, which lasted 22 years)
Heck, if Apple claims that this is difficult (afaict they don't; it would be unwise), they might even join in with their own preimage challenge for $$$. It'd be a no-brainer, a simple and cheap way of generating good publicity.
They claim their H is resistant to adversarial attacks, so they are claiming this to be difficult.
If I took an exact public perceptual hash function implementation and used that as H in your contest, it might be possible for a researcher attacking all public perceptual hash functions to stumble on the right one within 40 days.
I agree with you that we are trusting Apple to implement this competently. This isn’t something that can be proved to work mathematically where nothing about the implementation has to be kept secret.
So, worst case, everything you say could come true, but to imply that it is likely is wrong.
This leaves open the question of how the image gets on the device of the victim. You would have to craft a very specific image that the victim is likely to save, and the existence of such a specially crafted file would completely exonerate them.
1. Pick an innocuous photo the target is likely to have saved (e.g. a popular meme).
2. Generate an objectionable image with the same hash as the target's photo. (This is obviously illegal.)
3. Submit the objectionable image to the government database.
Now the target's photo will be flagged until manually reviewed.
This doesn't sound impossible as a targeted attack, and if done on a handful of images that millions of people might have saved (popular memes?) it might even grind the manual reviews to a halt. But maybe I'm not understanding something in this (very bad idea) system.
This requires the attacker to handle CSAM, which defeats the benefit. The risk in all these cases is that any time you actually handle CSAM, the attack is void, since you're now actually guilty of the crime yourself (very few will cross that line).
The point, though, is that this is something the victim's Apple phone is doing; the attacker's device is not. So the goal is to send hash-collided images through non-Apple channels (e.g. email), where there is a reasonably good chance the image will make its way into someone's device photo store and into automatic iCloud uploads.
Sending an MMS would work, for example, or sending a picture over Signal that someone then saves outside of Signal (a meme).
In all these cases, the original sender doesn't have an Apple device: so they're not getting scanned by the same algorithm, but more importantly their device is not spying on them. Importantly too: they've done nothing illegal.
But: the victim is getting flagged by their own device. And the victim has to have their device seized and analysed to determine (1) that it's not CSAM, (2) that they were sent those images that flagged and aren't trying to divert attention by getting themselves false pinged upfront, but then (3) the sender has committed no crime. There's no reason or even risk to investigate them, because by the time the victim has dealt with law enforcement, it's been established that no one had anything illegal.
It's the digital equivalent of a sock of cat litter testing positive for methamphetamine, except the sock turned up in your drive-through McDonald's order.
The goal is not to get convictions; the goal is harassment.
Perhaps that's true in the narrowest sense, but aren't the odds of accidentally generating a colliding file so low as to all but rule out coincidence, and therefore strongly indicate a premeditated cyber-attack (which is illegal)?
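Under an idealized uniform model this can be estimated directly. NeuralHash is reported (by the public reverse engineering) to produce 96-bit hashes; the photo and database counts below are illustrative assumptions, and real perceptual hashes are clustered rather than uniform, so this is only a ballpark for truly accidental matches:

```python
import math

BITS = 96      # NeuralHash output size, per the public reverse engineering
N = 10**12     # photos scanned across all users (illustrative assumption)
M = 10**6      # hashes in the database (illustrative assumption)

p_single = M / 2**BITS
# 1 - (1 - p)**N underflows in double precision for p this small,
# so compute it via log1p/expm1 for numerical stability.
p_any = -math.expm1(N * math.log1p(-p_single))
print(p_any)  # ~1.3e-11: a match is essentially never a coincidence under this model
```

Which supports the point: any observed collision almost certainly came from either deliberate crafting or the hash's non-uniform clustering, not chance.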
If I were law enforcement, at the very least I'd want to keep tabs on these sources of false positives. Probably easy enough to convince a judge that someone capable of the "tech wizardry" to collide a hash can un-collide one too, and therefore more thorough/invasive search warrants of the source are justified.
Your argument is "the technology is flawed, so let's also arrest anyone we suspect of generating false positives".
Like security researchers. Or the people currently inspecting the algorithm. And also frankly what are you going to do about overseas adversaries? The most likely people looking at how to exploit this would explicitly be state-sponsored Russian hackers - this is right up the alley of their desire to be able to cause low level chaos without committing to a serious attack.
And at the end of the day you've still succeeded: the point is that by the time you've established it was spurious, the target has already been through the legal wringer. The legal wringer is the point.
None of those thumbnails (or visual derivatives) will match the hash value of the known CSAM you are trying to simulate, since it won't be possible to know the target hash value: that hash function is private.
[1] https://news.ycombinator.com/item?id=28232625
[2] https://graphicdesign.stackexchange.com/questions/106260/ima...
[3] https://news.ycombinator.com/item?id=28231218