peter-sk · macrumors newbie · Original poster
Joined: Jul 2, 2022 · Messages: 3 · Reaction score: 6
Hi everyone,
I did not find any software (paid or free) that can handle Photos libraries with around 100,000 or more images.
I tried a number of them, but none could scan the entire library effectively in a reasonable time.

In the end, I coded my own free solution:

https://github.com/peter-sk/photosdup

I would be interested to know whether this solution works for others, and where it breaks down.

The main idea is that it first scales images down to 50x50 pixels (as difPy does) and then builds a smart data structure (a KD tree) that allows finding similar pictures without having to compare every image with every other image.
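A minimal sketch of that idea in pure Python, under stated assumptions: images are already decoded into grayscale pixel grids (lists of rows), "similar" means the Euclidean distance between downscaled vectors is within a chosen radius, and the KD tree is a small hand-written one. The names `downscale`, `build_kdtree`, `query_radius` and `find_duplicates` are hypothetical and not taken from photosdup, which may use a library KD tree and different thresholds.

```python
import math


def downscale(pixels, size):
    """Block-average a grayscale image (list of rows) to size x size.

    Returns a flat feature vector of length size * size.
    Assumes the image is at least `size` pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    vec = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            vec.append(sum(block) / len(block))
    return vec


def build_kdtree(points, indices, depth=0):
    """Recursively split point indices on one coordinate per tree level."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda k: points[k][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def query_radius(node, points, target, radius, found):
    """Append to `found` the indices of all points within `radius` of `target`."""
    if node is None:
        return
    idx, axis, left, right = node
    if math.dist(points[idx], target) <= radius:
        found.append(idx)
    diff = target[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    query_radius(near, points, target, radius, found)
    # Only search the other side if the splitting plane is within reach.
    if abs(diff) <= radius:
        query_radius(far, points, target, radius, found)


def find_duplicates(vectors, radius):
    """Return sorted index pairs (i, j), i < j, whose vectors lie within radius."""
    tree = build_kdtree(vectors, list(range(len(vectors))))
    pairs = set()
    for i, vec in enumerate(vectors):
        found = []
        query_radius(tree, vectors, vec, radius, found)
        pairs.update((i, j) for j in found if j > i)
    return sorted(pairs)


if __name__ == "__main__":
    a = [[r * 16 + c * 2 for c in range(8)] for r in range(8)]
    b = [[v + 1 for v in row] for row in a]  # near-identical copy of a
    checker = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]
    vectors = [downscale(img, 4) for img in (a, b, checker)]
    print(find_duplicates(vectors, radius=5.0))  # [(0, 1)]
```

The saving comes from the `abs(diff) <= radius` check: whole branches of the tree whose splitting plane is farther away than the radius are skipped, so each image is only compared against nearby candidates instead of the entire library.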

In my case, it helped me tag approx. 12,000 out of approx. 66,000 images as duplicates using the keyword function of Mac Photos, allowing me to review and delete these images.
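The post doesn't say how photosdup turns pairwise matches into the set of images to tag. One common way, sketched here as an assumption, is union-find: merge matched pairs into groups, keep one image per group (here, arbitrarily, the lowest index) and tag the rest. `group_duplicates` and `images_to_tag` are hypothetical names; actually applying the keyword in Photos is not shown.

```python
def group_duplicates(n, pairs):
    """Merge pairwise matches into duplicate groups with union-find.

    n: number of images (indices 0..n-1); pairs: iterable of (i, j) matches.
    Returns groups of size >= 2, each sorted by index.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return [g for g in groups.values() if len(g) > 1]


def images_to_tag(groups):
    """Keep the first image of each group; return the rest for keyword tagging."""
    return sorted(k for group in groups for k in group[1:])


if __name__ == "__main__":
    groups = group_duplicates(6, [(0, 1), (1, 2), (4, 5)])
    print(groups)                 # [[0, 1, 2], [4, 5]]
    print(images_to_tag(groups))  # [1, 2, 5]
```

Union-find groups chains transitively: if A matches B and B matches C, all three land in one group even if A and C were never matched directly, which is one reason to review tagged images before deleting them.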

Looking forward to your experiences and feedback :)

Peter
 

Tagbert · macrumors 603
Joined: Jun 22, 2011 · Messages: 6,264 · Reaction score: 7,287 · Location: Seattle
Interesting. I use PowerPhotos for that. My library is 118K items. I had not noticed a major performance problem with duplicate scanning, but I mainly did that when I used it to merge my libraries into one after I finally got a faster Mac. How long does a duplicate scan take for your library?
 

peter-sk · macrumors newbie · Original poster
I just compared. For a library with 16,000 images, PowerPhotos took close to 1 hour (26 minutes for the initial scan + approx. 30 minutes for generating all the classes, etc.). My Python script took 16 minutes.

For larger libraries, the difference grows.
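A rough worked comparison using only the numbers in this thread: the speedup on the 16,000-image library, and, for illustration, how the number of image pairs grows if a tool compared every image with every other one. How PowerPhotos actually searches is not stated here, so the pair counts are not a claim about it; 118,000 is Tagbert's library size.

```latex
\begin{aligned}
\text{speedup} &\approx \frac{(26+30)\,\text{min}}{16\,\text{min}} = \frac{56}{16} = 3.5\\
\binom{16\,000}{2} &= \frac{16\,000 \cdot 15\,999}{2} = 127\,992\,000\\
\binom{118\,000}{2} &= \frac{118\,000 \cdot 117\,999}{2} = 6\,961\,941\,000
\end{aligned}
```

About 7.4 times as many images means about 54 times as many pairs, which is why avoiding an all-pairs comparison matters more as libraries grow.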

Cheers,
Peter
 

Tagbert · macrumors 603
Nice
 

peter-sk · macrumors newbie · Original poster
Hi all,

I found that the (already quite good) performance was basically limited by my SSD's sustained read speed. I improved photosdup by using the thumbnails precomputed by the Photos app, and by first finding possible duplicates using 10x10 pixel images before spending more time on 50x50 pixel images.
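A sketch of the two-stage idea, under assumptions: input images are grayscale pixel grids, similarity is the root-mean-square (RMS) per-pixel difference, and candidates come from a small hand-written KD tree. All function names and the threshold are hypothetical, not photosdup's code, and the Photos thumbnails used by `--thumbs` are not modeled. When image dimensions divide evenly by both sizes, each coarse pixel is the average of a block of fine pixels, and averaging can only shrink the RMS difference, so the coarse stage should not discard a pair the fine check would accept.

```python
import math


def downscale(pixels, size):
    """Block-average a grayscale image (list of rows) to a flat size*size vector."""
    h, w = len(pixels), len(pixels[0])
    vec = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            vec.append(sum(block) / len(block))
    return vec


def rms(u, v):
    """Root-mean-square difference per pixel, comparable across vector sizes."""
    return math.dist(u, v) / math.sqrt(len(u))


def build_kdtree(points, indices, depth=0):
    """Recursively split point indices on one coordinate per tree level."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda k: points[k][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def query_radius(node, points, target, radius, found):
    """Append to `found` the indices of all points within `radius` of `target`."""
    if node is None:
        return
    idx, axis, left, right = node
    if math.dist(points[idx], target) <= radius:
        found.append(idx)
    diff = target[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    query_radius(near, points, target, radius, found)
    if abs(diff) <= radius:
        query_radius(far, points, target, radius, found)


def two_stage_duplicates(images, threshold, coarse=10, fine=50):
    """Hypothetical two-stage search: coarse candidates first, then a fine check."""
    small = [downscale(img, coarse) for img in images]
    tree = build_kdtree(small, list(range(len(small))))
    # rms <= threshold  is equivalent to  Euclidean distance <= threshold * coarse
    radius = threshold * coarse
    candidates = set()
    for i, vec in enumerate(small):
        found = []
        query_radius(tree, small, vec, radius, found)
        candidates.update((i, j) for j in found if j > i)
    # Only images that appear in some candidate pair are downscaled at the fine size.
    needed = {k for pair in candidates for k in pair}
    big = {k: downscale(images[k], fine) for k in needed}
    confirmed = sorted((i, j) for i, j in candidates if rms(big[i], big[j]) <= threshold)
    return sorted(candidates), confirmed


if __name__ == "__main__":
    a = [[r * 16 + c * 2 for c in range(8)] for r in range(8)]
    b = [[v + 1 for v in row] for row in a]
    # Same coarse averages as `a`, but alternating +/-20 2x2 blocks at the fine size.
    d = [[a[r][c] + (20 if (r // 2 + c // 2) % 2 == 0 else -20) for c in range(8)]
         for r in range(8)]
    candidates, confirmed = two_stage_duplicates([a, b, d], threshold=2.0, coarse=2, fine=4)
    print(candidates)  # [(0, 1), (0, 2), (1, 2)]
    print(confirmed)   # [(0, 1)]
```

In the example, `d` looks identical to `a` at the coarse size and is only rejected by the fine check, which is the trade-off: the cheap pass may let through false candidates, and the expensive pass only runs on those.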

With the --thumbs option, it just scanned a library with 26,637 photos containing 2,576 duplicates (mostly HEIC and JPEG versions of the same iPhone photos) in 2 minutes and 21 seconds :)

Cheers,
Peter
 

Mcrumors David · macrumors regular
Joined: Oct 8, 2014 · Messages: 190 · Reaction score: 77
Do any of these tools display the album that contains each photo? And can they link the remaining photo of a set of duplicates back to the albums the deleted duplicates were taken out of?
 

Unk1ne · Suspended
Joined: Mar 7, 2022 · Messages: 34 · Reaction score: 28 · Location: Reno, NV
[Screenshot]

The option to delete duplicates
 

Tagbert · macrumors 603
Does it link back the duplicates to the albums? I doubt it ...
I doubt that any of them do. They would need to design the feature around album merging to do that and I doubt that it is a common enough problem to focus on.

Would it work for you to merge two albums and then run a process to remove duplicates? Would that give you what you want?
 

Mcrumors David · macrumors regular
I second that ...

PowerPhotos does remove duplicates and keeps albums intact (i.e., it links the remaining photo back to the albums from which the duplicates were deleted)! (I have not tested this, but I got a reply from Herr Webster.)

[Attached image]


PS: When you have 130k photos, merging albums like that is, of course, not an option.
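A minimal sketch of what "linking back" the remaining photo could mean, assuming album membership is modeled as a dict from album name to a set of photo IDs, and each duplicate group lists the photo to keep first. This is not PowerPhotos' implementation, which isn't described in this thread; `relink_albums` and the photo IDs are hypothetical.

```python
def relink_albums(albums, groups):
    """Point album memberships of deleted duplicates at the photo that is kept.

    albums: dict mapping album name -> set of photo IDs
    groups: lists of duplicate photo IDs; the first ID in each list is kept
    Returns a new dict; the input is not modified.
    """
    keep_for = {dup: group[0] for group in groups for dup in group[1:]}
    return {
        name: {keep_for.get(photo, photo) for photo in members}
        for name, members in albums.items()
    }


if __name__ == "__main__":
    albums = {"Trip": {"IMG_2"}, "Family": {"IMG_1", "IMG_3"}}
    relinked = relink_albums(albums, [["IMG_1", "IMG_2"]])
    # "Trip" now contains IMG_1 instead of the deleted duplicate IMG_2.
    print(relinked["Trip"])
```

Because memberships are sets, an album that already held both copies simply ends up with the kept photo once.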
 

Tagbert · macrumors 603
PowerPhotos has other features, too. It can split and merge photo libraries. I used to do that a lot with iPhoto, as it had a hard time dealing with large photo libraries. Photos seems marginally better in that regard.
 

RhetTbull · macrumors member
Joined: Apr 18, 2022 · Messages: 99 · Reaction score: 73 · Location: Los Angeles, CA
Hi Peter. Thanks for sharing! This looks useful. I'm the author of the PhotoScript package you used to create folders and albums -- glad to see a good use for this! Cheers!
 