Pornography itself is a slippery slope. You crave more and more explicit content and eventually get to CP.
Thank you for explaining things. I appreciate it.
I don’t think I was referring to false negatives. I was referring to false positives. A false negative would be something illegal that was not flagged.
So if a false positive is so incredibly rare - probably rarer than all false arrests combined - why not set the threshold to one or two and then involve the authorities?

I'm concerned about false positives, but if they are truly almost impossible, I say anyone having even one illegal image should be arrested.
> The point is that the false positive won't generally look anything like the target CSAM image.

Doubtful. I have experimented with many different perceptual hashing algorithms in my work, and there would always be false matches, like a picture of a cow on a green field producing the same hash as a picture of a tractor on a green field. Turn up the threshold to prevent this, and suddenly minimal cropping prevents matching of some otherwise identical pictures. It is hard to find a good balance between false positives and not missing what you want to catch, and it will never be perfect.
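For anyone who wants to see how fragile that balance is, here is a minimal sketch of one classic perceptual hash, a difference hash (dHash), in Python. The Pillow dependency, the file names, and the Hamming-distance threshold of 10 are illustrative assumptions, not any particular vendor's algorithm:

```python
# Minimal difference-hash (dHash) sketch: shrink the image, then record
# whether each pixel is brighter than its right-hand neighbour. Visually
# similar images (cow vs. tractor on the same green field) can land within
# the match threshold; cropping can push near-identical images outside it.
from PIL import Image  # assumes Pillow is installed

def dhash(path, hash_size=8):
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    px = img.load()
    bits = []
    for y in range(hash_size):
        for x in range(hash_size):
            bits.append(px[x, y] > px[x + 1, y])
    return sum(1 << i for i, b in enumerate(bits) if b)

def hamming(a, b):
    return bin(a ^ b).count("1")

# Match if the two 64-bit hashes differ in at most ~10 bits. That threshold
# is the knob discussed above: lower it and minimal cropping breaks matches;
# raise it and the cow/tractor false matches come back.
# print(hamming(dhash("cow.jpg"), dhash("tractor.jpg")) <= 10)
```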
> If macOS scans your files they will be no different than Google; at that point you might as well get an Android.

Why would I install a mobile OS on my Mac? And why do you claim Google scans people's private files on their Android devices?
> What are you guys arguing about? I think that really no one who talks about false positives and all that kind of 💩 gets it. It's not about the implementation or the technology - the question is who has control over this technology (it certainly won't be the smartphone user).

I think there is nothing wrong with discussing the technical aspects of such a system. However, I would agree that such a system should be rejected even if its implementation was proven to be flawless.
Something I do not see discussed much is the involvement of NCMEC, the organization that maintains the CSAM hash database, and I find that whole situation rather shady. NCMEC is a government-funded private NGO, a quasi-agency with agency powers but no oversight or transparency requirements. They effectively have the power to declare numbers illegal, and US companies are required by law to report user content matching such hash numbers to this private(?) organization.

There is no independent auditing of the NCMEC database; only they know what is in there. Insiders have claimed the database also includes individually harmless pictures that were confiscated along with CSAM. Whether true or not, I could see some sense in that, as possessing the exact same sunset picture that was also found on some creep's hard drive could indeed be a bit suspicious - but it would also make a detection algorithm that aims to find visually similar images all the more worrisome. The Swiss federal police have complained that 90% of the warnings they get from NCMEC turn out to be irrelevant.

Last but not least, it was NCMEC who urged Apple to ignore criticism as the "screeching voices of the minority". This is an organization with a serious holier-than-thou attitude: they know what is right, and everyone else had better shut up and do as they say. Fighting CSAM is a worthy cause, but I have little trust in NCMEC or methods involving them.
> They ask everyone to take two coins from their pocket and flip them. If they flip tails twice, it's off to the hoosegow. Each flip of a fair coin has a 50% false positive rate, but the chance of two tails in a row is only 25% (0.5^2), so you'll have a 25% false positive rate. Make everyone flip 30 coins and you have a false positive rate of 0.5^30, or about 1 in a billion.
>
> When the 30th hash fails, Apple can unlock the hashes and derivative images. A human can then look, not at the actual images themselves, but at some unexplained derivative of them.

...and you have neatly demonstrated the exact false logic that led to a massive miscarriage of justice in the Sally Clark case, and which is more generally known as "The Prosecutor's Fallacy".
Specifically, you can only say "the chance of two tails in a row is only 25% (0.5^2)" when you are talking about uncorrelated events. Look at any high school probability question and they'll usually make a point of saying that it is a "fair coin" - part of what that means is that each toss is completely independent of what went before. Your specific example is kinda valid, because real-life coins are a reasonably close approximation to theoretical "fair coins" - but you have to be very, very careful applying that to a different situation where you don't know that the events are totally random and uncorrelated.
The basis of the Sally Clark error - which convicted an innocent woman and possibly contributed to her early death - was the false logic that "the probability of one sudden infant death in a family is p so the probability of two sudden infant deaths is p x p which is so infinitesimal that it must be murder" - treating the deaths as two totally random events and ignoring the possibility that there might be some genetic or environmental cause that made the second death a near certainty.
(It's a very common misconception - I once had to call out a question on a major UK maths exam where the same basic probability question got recycled year after year in a different context - it was OK while it was tossing two fair coins, or throwing a fair die, or throwing a dart (maybe) - but then one year it was the probability of both seeds planted in the same pot germinating, which is obviously dependent on shared environmental factors, yet the expected answer was still p × p, as if the two germinations were independent.)
More generally, though - if you're selecting people from a large population, then after the fact that low, low false positive chance becomes misleading. It's the difference between checking a DNA fingerprint (which also has a false positive rate) against 10 actual suspects in a case - where a match would be pretty conclusive - vs. checking it against a database of 300 million DNA fingerprints (which brings a significant chance of at least one false match).
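A quick back-of-envelope illustration of that difference. The 1-in-a-million false match rate per comparison is a number I've invented for the example, not a real figure for DNA profiling:

```python
# Chance of at least one false match, for a small suspect pool vs. a
# population-scale database, assuming a hypothetical 1-in-a-million
# false positive rate per comparison.
p = 1e-6
for n, label in [(10, "10 actual suspects"), (300_000_000, "300M-entry database")]:
    print(f"{label}: {1 - (1 - p) ** n:.5f}")
# 10 actual suspects: 0.00001   (a match is strong evidence)
# 300M-entry database: 1.00000  (at least one false match is near certain)
```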
Ok, I'm going to try to explain this a bit by example. This isn't a rigorous mathematical treatment, but hopefully gives you an idea of what I mean. Please understand this is an example involving coins and not image hashes […]
The math for the Apple scheme gets far more complicated because there are more images, more things to match, and it's not just a binary coin flip. I don't think I can make any estimates of the underlying probabilities from what's published.
According to the interwebs there are about 1.46 billion active iPhone users. So a 1 in a billion chance of a false match per user means you would expect about 1.5 false matches across the whole user base - the probability of finding at least one is roughly 77%.
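For what it's worth, the arithmetic behind that claim (the user count is the rough figure quoted above):

```python
# Expected false matches across ~1.46 billion users, if each user
# independently has a 1-in-a-billion chance of a false match.
users = 1.46e9
p = 1e-9
print(users * p)             # ~1.46 expected false matches
print(1 - (1 - p) ** users)  # ~0.77 chance of at least one
```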
> ...if you read this and walk away thinking every image has a 50% chance of a match, you're understanding it wrong.

Yet somehow your entire critique rests on that assumption...
...and the contents of a person's picture collection are clearly not uncorrelated - they're likely to have lots of photos of the same or similar subjects, if not multiple copies of the same image. If someone has one image that is generating a false match, then it is very likely that they will have a second photo that matches.
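A toy simulation of that point; the per-image false match rate, the library size, and the "every photo has a near-duplicate" model are all invented for illustration:

```python
import random

# An account is flagged at a threshold of 2 matching images.
# "independent": 1,000 distinct photos, each falsely matching with p = 1e-4.
# "with duplicates": the same library, but every photo also has a
# near-duplicate (burst shot, edited copy) that matches whenever the
# original does.
p, photos, trials = 1e-4, 1000, 200_000

def flagged(duplicates):
    hits = sum(random.random() < p for _ in range(photos))
    if duplicates:
        hits *= 2  # each matching photo brings its twin along
    return hits >= 2

for dup in (False, True):
    rate = sum(flagged(dup) for _ in range(trials)) / trials
    print("with duplicates:" if dup else "independent:", rate)
# independent: ~0.005    (needs two separately unlucky photos)
# with duplicates: ~0.095 (one unlucky photo already crosses the threshold)
```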
...another claim that just doesn't add up, because I only see two possibilities:
(a) The derivative images resemble the original images, so Apple would be able to meaningfully check that the match had some basis in reality - in which case the "hash" isn't really a hash - it would have to contain a low-res version of the image - and the claims about Apple not being able to access your images "at all" would all be false (and the hashes of known CSAM would themselves be CSAM if they could be converted into something recognisable).
(b) The derivative images are a visual representation of the hash and bear no resemblance to the original image - and comparing them proves nothing about the relationship between the two actual images. That seems plausible, because it is awfully like this widely used technique for human-friendly comparisons of long numbers:
[attached image: an OpenSSH-style visual key fingerprint ("randomart")]
...which is a quick way of visually checking that the "fingerprint" of the key on the host you are logging into is what you expected. However, all that amounts to is a more convenient way of checking that two long (256 bit or more) "fingerprints" are equal - humans are better at comparing images than long strings of numbers and letters. It does not guard against the possibility that the host has, somehow, falsely generated the expected fingerprint. In this example, the "fingerprint" is a cryptographic hash specifically designed to make that very unlikely, and which will change in response to the tiniest change in the actual key. In the Apple example we're talking about an image recognition hash specifically designed to produce the same hash for non-identical images. If the algorithm has, for some reason, generated a false match, the hashes will be equal and the derived images will be the same.
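In case it helps, here is a toy sketch of that kind of scheme: render a 64-bit hash as a small block picture. The rendering is invented for illustration, but it shows the point - two equal hashes always produce identical pictures, no matter what the source images actually contained:

```python
import hashlib

# Render a 64-bit hash as an 8x8 block picture. Identical hashes yield
# identical pictures, regardless of what the original images looked like,
# so comparing the pictures only re-checks that the hashes are equal.
def derivative_image(hash64: int) -> str:
    rows = []
    for y in range(8):
        rows.append("".join("#" if (hash64 >> (y * 8 + x)) & 1 else "."
                            for x in range(8)))
    return "\n".join(rows)

h = int.from_bytes(hashlib.sha256(b"example").digest()[:8], "big")
print(derivative_image(h))
```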
What that technique (and the '30 hits' threshold) would help guard against is random bit flips due to memory faults, power glitches and cosmic rays - which is absolutely an issue if you're dealing with data on that scale (it's why people are complaining that the Mac Pro doesn't have ECC RAM) - but is completely unrelated to the chance of a 'false positive' caused by the algorithm matching some irrelevant feature of two images.
The real worry is that, if there are rebuttals to these concerns (and only the developers can say for sure), they should have been addressed at some length in the white paper, not glossed over.
> Doubtful. I have experimented with many different perceptual hashing algorithms in my work, and there would always be false matches like a picture of a cow on a green field producing the same hash as a picture of a tractor on a green field. […]

The training here matters, not just thresholds. If the network is trained to find contextually similar images, or not trained against that, it may cluster "things in fields" together. If the network is trained to find specific images and trained away from finding similar but distinct images, as Apple does here, it will better discriminate between them, generating hashes a greater distance from the target hash. If non-target images of the types people are concerned about (generally just any image involving nudity) are in the training set of distraction images, that will train the network to differentiate on features other than simple nudity.
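To sketch what "trained away from similar but distinct images" can look like in practice - this is a generic triplet-loss setup in PyTorch, with made-up network shapes and random tensors standing in for real data, not Apple's actual NeuralHash training pipeline:

```python
import torch
import torch.nn as nn

# Tiny embedding network whose sign pattern acts as the binary hash.
class HashNet(nn.Module):
    def __init__(self, bits=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, bits),
        )

    def forward(self, x):
        return self.net(x)

    def hash(self, x):
        return (self.net(x) > 0).int()  # sign of each dimension = hash bit

net = HashNet()
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# anchor: a target image; positive: the same image cropped/re-encoded;
# negative: a contextually similar but distinct "distraction" image.
anchor = torch.randn(8, 3, 64, 64)  # random stand-ins for real batches
positive = anchor + 0.05 * torch.randn_like(anchor)
negative = torch.randn(8, 3, 64, 64)

for step in range(100):
    opt.zero_grad()
    # Pulls anchor/positive embeddings together and pushes the negative
    # away, i.e. trains the network to discriminate rather than cluster.
    loss = loss_fn(net(anchor), net(positive), net(negative))
    loss.backward()
    opt.step()
```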
Just dropping an opinion droplet in this vast sea of comments:
Whether these ideas end up implemented or not, I totally believe they come from a good place… heck, maybe this is all a diversion.

HOWEVER, companies have proven themselves time and time again to be quite unreliable at even the simplest things, including tasks that were solved decades ago (submarines imploding in 2023, anyone?).

Not for a single second do I trust that this would be a flawless implementation, in either privacy or correct flagging… not when simple things like "Hey Siri, set a timer for 15 mins" often fail, and macOS's System Settings, which had been working for 30+ years, suddenly starts having bugs.
I can totally see cases where taking a picture of a cloudy sky sends the FBI your way.
Should it be completely abandoned? I don't know; it's debatable, and I don't think so… but only if it will work properly!

I don't know what's up; it boils my blood how bad things have gotten over time. It's hard to trust anything regarding sensitive matters.
> Isn't this detecting feature already on their servers for iCloud? When this controversy came up, I understood that they wanted to move the detection process to the customers' devices, so pictures on iCloud are already being monitored.
>
> Edit: A Telegraph post talking about it: "Apple scans photos to check for child sexual abuse images, an executive has said, as tech companies come under pressure to do more to tackle the crime." (www.telegraph.co.uk)

Yes. Most cloud-based photo storage systems have clear terms of service to not host illegal material.
> Make everyone flip 30 coins and you have a false positive rate of 0.5^30, or about 1 in a billion. If it's just a lawful citizen, they have a one in a billion chance of being flagged. If the city is Dallas, then the chance that someone in the population is falsely flagged is about one in a billion false positives for a population of about a million.

Let's assume everyone's honest, and that the probability any single test is wrong is 0.5^30, and we want to determine the chance we'll get one or more false positives in a city of 1M. Then I believe this is correct:
Probability any single test is wrong (false positive, since we've assume everyone's honest) = 0.5^30
Probability any single test is correct = 1 – 0.5^30
Probability that every one of the 10^6 tests is correct = (1 – 0.5^30)^(10^6)
Probability that the above is not the case, i.e., that one or more tests is wrong = 1 – (1 – 0.5^30)^(10^6) ≈ 0.001
Thus the probability that one or more honest citizens get a false positive is actually ≈ 0.1%. With a population of 100M, the probability increases to ≈ 9%.
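Quick check of those numbers, for anyone who wants to run it:

```python
# Probability of at least one false positive among n honest users, if each
# independently trips the 30-match threshold with probability 0.5^30.
p = 0.5 ** 30
for n in (10**6, 10**8):
    print(f"n = {n:>11,}: {1 - (1 - p) ** n:.4f}")
# n =   1,000,000: 0.0009  (≈ 0.1%)
# n = 100,000,000: 0.0889  (≈ 9%)
```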
Apple's analysis suggested one in a trillion accounts would trigger a false positive. So with a billion iCloud accounts, that means a 1 in 1,000 chance of ever needing a manual check. If that estimate is off by 5 or 6 orders of magnitude, it means one person double-checking the derivative data of 30 false positives a day. That doesn't sound like too much.
> You think if Apple scanned iOS they won't scan macOS?

Sure, Apple might scan macOS. I just don't see how Android is a realistic alternative.

As for Google, you must be kidding. You do know their business is to collect your data to sell to advertisers, right? They are worth $1.6T for that. So is Facebook.
> It only would run if you uploaded your photos to iCloud, so presumably you'd have to agree to the iCloud Ts & Cs...

Someone posted info about the company behind the CSAM database: it has no transparency requirements and is a semi-government organization. They would just need to add the hashes of "problematic material" (e.g., a pic depicting the US president as a clown) to the database and send the updated version to Apple.
> ...definitely likely to want to be used by less desirable governments.

I never understood this argument. What's to stop bad governments from wanting to do this anyway? If they want to insist that Apple scan everyone's photos, they can still demand that, and Apple will likely deny the request. If Apple implements this hash scanning, they can demand Apple expand it, and Apple will likely deny the request.
> Yes. Most cloud based photo storage systems have clear terms of service to not host illegal material.

Apple wanted to move this to being on-device. So regardless of whether you agreed to the iCloud Ts & Cs, they were going to scan every photo you have or take.

Super creepy, and definitely something less desirable governments would want to use.