Hi everyone,
I could not find any software (paid or free) that can handle Photos libraries with around 100,000 or more images.
I tried a number of tools, but none of them could scan the entire library in a reasonable amount of time.
In the end, I coded my own free solution:
https://github.com/peter-sk/photosdup
I would be interested to know whether this solution works for others, and where it breaks down.
The main idea is to first scale each image down to 50x50 pixels (as difPy does) and then build a k-d tree, a spatial data structure that allows finding similar pictures without having to compare every image with every other one.
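To make the idea concrete, here is a minimal sketch in plain Python. It is not the code from photosdup, just an illustration under some assumptions: each image has already been scaled down and flattened into a vector of pixel values, and two images count as similar when the Euclidean distance between their vectors is at most some radius. The names `KDNode`, `build_tree`, `search_radius` and `find_similar_pairs` are made up for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


class KDNode:
    """One node of the k-d tree: a point index plus the axis it splits on."""

    def __init__(self, index: int, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build_tree(points: List[Vector], indices: List[int],
               depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median, cycling through the axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return KDNode(
        ordered[mid],
        axis,
        build_tree(points, ordered[:mid], depth + 1),
        build_tree(points, ordered[mid + 1:], depth + 1),
    )


def search_radius(node: Optional[KDNode], points: List[Vector],
                  query: Vector, radius: float, found: List[int]) -> None:
    """Collect indices of all points within `radius` of `query`."""
    if node is None:
        return
    point = points[node.index]
    if math.dist(point, query) <= radius:
        found.append(node.index)
    # Only descend into a subtree if the search ball can reach its side
    # of the splitting plane; this is what avoids comparing everything.
    diff = query[node.axis] - point[node.axis]
    if diff <= radius:
        search_radius(node.left, points, query, radius, found)
    if diff >= -radius:
        search_radius(node.right, points, query, radius, found)


def find_similar_pairs(points: List[Vector],
                       radius: float) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, whose vectors are within `radius`."""
    tree = build_tree(points, list(range(len(points))))
    pairs = []
    for i, query in enumerate(points):
        found: List[int] = []
        search_radius(tree, points, query, radius, found)
        pairs.extend((i, j) for j in found if j > i)
    return sorted(pairs)


# Toy "thumbnails" with 4 pixel values each (a real 50x50 image has far more).
thumbs = [
    (0, 0, 0, 0),
    (1, 0, 0, 0),          # near-duplicate of thumbnail 0
    (100, 100, 100, 100),
    (100, 101, 100, 100),  # near-duplicate of thumbnail 2
    (50, 50, 50, 50),
]
print(find_similar_pairs(thumbs, radius=2.0))  # [(0, 1), (2, 3)]
```

The radius is the knob that decides how different two thumbnails may be and still be reported as duplicates. Note that the pruning in a k-d tree gets less effective as the number of dimensions grows, so how well this scales on full thumbnail vectors depends on the data.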
In my case, it helped me tag about 12,000 of about 66,000 images as duplicates using the keyword function of Mac Photos, so that I could review and delete them.
Looking forward to hearing about your experiences and feedback.
Peter