Yeah, I don’t have that answer, of course. But nothing prevents them from changing that secondary algorithm yearly (or at whatever rate the CSAM database owners would tolerate a full rehash), or from chaining together multiple hashes. They can literally tune it to whatever arbitrary false positive rate they want. Although, not knowing any better, I would guess that they would just use Microsoft’s PhotoDNA hash unchanged and keep it under wraps, since I think that’s what they already use for iCloud email attachment scanning. PhotoDNA just does a scaled-down, black-and-white edge/intensity gradient comparison, not neural-net feature detection. I would think using a completely different technology would make the pair of algorithms extremely robust taken together, but that’s not my field.
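For a concrete picture of that kind of comparison: PhotoDNA itself is proprietary, but the well-known dHash algorithm is built on the same basic idea (downscale, grayscale, compare adjacent intensities). A minimal sketch using Pillow - this illustrates the family of approach, not PhotoDNA itself:

```python
from PIL import Image

def dhash(path: str, size: int = 8) -> int:
    """Difference hash: shrink to a (size+1) x size grayscale image,
    then record one bit per pixel: is it brighter than its right-hand
    neighbour? Survives rescaling/recompression; ignores color."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    pixels = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = pixels[row * (size + 1) + col]
            right = pixels[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Bits that differ; a small distance means 'visually similar'."""
    return bin(a ^ b).count("1")
```

Matching is then just `hamming(dhash(a), dhash(b)) <= threshold` for some tuned threshold - which is also where the false-positive-rate knob lives.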
While there may not be an immovable obstacle standing between them and a complete recataloging, there are a lot of factors that would strongly disincentivise it. Chief among them is the fact that the project is already a radioactive cost center - and unless they plan on switching industries and giving Blue Coat a run for its money, it always will be.
> ...chaining together multiple hashes.
That would be the lazy-programmer way to do it, and it would very likely produce correlations between the hashes - the same reason DBAs were never advised to do some wacky md5/sha1 mashup to spare every user from rekeying in the wake of a digest upgrade.
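To make that concrete, here is roughly the kind of mashup being warned against (Python's hashlib; the function names are mine). Concatenation looks like it gives you the strength of both, but Joux's 2004 multicollision result shows it is barely stronger than the better component alone, and plain chaining is only as strong as its weakest (inner) link:

```python
import hashlib

def mashup_digest(data: bytes) -> bytes:
    # Naive "mashup": concatenate MD5 and SHA-1 digests. Looks like
    # 288 bits of hash, but its collision resistance is roughly that
    # of SHA-1 alone (Joux multicollisions).
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()

def chained_digest(data: bytes) -> bytes:
    # Naive chaining: sha1(md5(x)). Any MD5 collision is automatically
    # a collision of the whole chain, so the outer hash adds nothing.
    return hashlib.sha1(hashlib.md5(data).digest()).digest()
```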
> ...I would guess that they would just use Microsoft’s PhotoDNA hash unchanged...
That is a reasonable guess, because that is what all the NGOs have been using - IWF being one of the more notorious. It would be bad news, though, for anyone expecting the thumbnail perceptual-hashing step to provide meaningful protection.
> I would think using a completely different technology would make the pair of algorithms extremely robust...
Nope - there is a reason you don't see hybrid cryptographic hash constructions. Also, if they are using PhotoDNA for their verification step, then they implemented the thing totally backwards: the high-pass filter approach is what makes PhotoDNA resistant to adversarial collisions that are imperceptible to humans. That counts for nothing once the first algorithm has been fooled by such a collision (and this neural thing is definitely vulnerable to them), because the attacker would already be selecting for a thumbnail image that fools a human in the second step - and PhotoDNA looks for exactly the same thing a human does: points of contrast.
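To spell the ordering argument out as a pipeline (everything here is a hypothetical stand-in - the real hash functions and databases are unpublished):

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def two_stage_match(image, neural_db, pdna_db,
                    neural_hash, pdna_hash, max_distance=10):
    # Stage 1: neural perceptual hash against the database. An
    # adversarial collision walks an attacker past this gate with
    # an image that can look like anything at all.
    if neural_hash(image) not in neural_db:
        return False
    # Stage 2: contrast-based verification on the thumbnail. An
    # attacker selecting for a thumbnail that fools the human
    # reviewer is simultaneously selecting for a small distance
    # here, so the two checks fail together rather than
    # independently - the second hash adds essentially nothing
    # against a deliberate collision.
    h = pdna_hash(image)
    return any(hamming(h, ref) <= max_distance for ref in pdna_db)
```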
BTW, PhotoDNA is a black box with no outside scrutiny to speak of - you can count on one hand the number of papers where it is even mentioned (and only ever in passing).