
0002378 (original poster), May 28, 2017

Let me explain the scenario.

I have a huge folder of documentaries on my internal HDD, and I also keep them backed up on my external drive.

Let's say that one day, I decide to reorganize my documentaries on my local drive into neat categories (folders). I don't add any new files, I just move 'em around locally.

So, my internal drive structure looks like:
blabla.../Documentaries/War/WW2 From Space.mp4
blabla.../Documentaries/Disaster/Runaway Train.mp4

Now, those same files already exist on my external (backup) drive, but are not yet neatly organized into categories/subfolders:
blabla.../Documentaries/WW2 From Space.mp4
blabla.../Documentaries/Runaway Train.mp4

So, when I now run rsync --update with the local drive as the source and the external drive as the destination, it will think that /War/WW2 From Space.mp4 doesn't exist, because it is not under the same relative path on the external drive. As a result, it will (needlessly) copy it over into a new /War directory.

Can I tell rsync to be smarter: look for a source file anywhere in the destination Documentaries folder and, if it's found, simply move it, creating new folders as needed, so the destination is organized the same way as the source (local Documentaries folder)? This would save me a TON of time, because a copy is obviously much more time- (and space-) consuming than a simple move.
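For illustration, here is a minimal sketch of the behavior being asked for, written in Python rather than Java. It assumes both trees are mounted locally, that files are matched by leaf name only, and that each leaf name is unique within the backup tree; the names `relative_files` and `mirror_layout` are hypothetical. With the default `dry_run=True` it only reports the moves it would make.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def mirror_layout(src_root, dst_root, dry_run=True):
    """Move files already present in dst_root so they end up at the same
    relative paths they have under src_root. Nothing is copied.

    Assumption: files are matched by leaf name only, and each leaf name
    is unique within dst_root.
    """
    src_files = relative_files(src_root)
    dst_files = relative_files(dst_root)
    # Index the backup by leaf name (later duplicates would overwrite earlier ones).
    by_leaf = {os.path.basename(p): p for p in dst_files}
    moves = []
    for rel in sorted(src_files - dst_files):
        old = by_leaf.get(os.path.basename(rel))
        if old is None or old in src_files:
            # Either a genuinely new file (leave it to rsync), or the backup
            # copy already sits at a path the source also uses.
            continue
        moves.append((old, rel))
        if not dry_run:
            target = os.path.join(dst_root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # A rename within one drive moves the file without copying data.
            os.rename(os.path.join(dst_root, old), target)
    return moves
```

Running it once with `dry_run=True` to review the list, then again with `dry_run=False`, and only then running rsync, would leave rsync with nothing but genuinely new files to copy.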

So far, I have written a Java program that does exactly what I want (Java is my go-to for stuff like this), but I'm wondering if I really need to reinvent the wheel here.

If not rsync, is there any other utility?
 
I don't think 'rsync' can do this, and a quick search of its man page for the word "move" doesn't suggest anything. I'm not surprised by this, since rsync is designed to sync things, not reorganize them. Part of syncing is maintaining structure, in addition to file contents, so I don't see why rsync would have this capability.

To move things around within a structure would require that rsync maintain all the hashes for all the files, so it could know which things are actually identical; it certainly can't rely on names alone to do that. The thing is, maintaining all the hashes only has a purpose when moving things around, which implies altering structure rather than maintaining it. And as noted above, rsync is designed to maintain structure, not alter it.
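To make the point about hashes concrete: identity can be established from a file's contents rather than its name, so a renamed or relocated copy still matches. Below is a minimal Python sketch using SHA-256 from `hashlib`. The function names and the 1 MiB chunk size are arbitrary choices for illustration, not anything rsync exposes. Hashing reads every byte, so for large video files it is slow in its own right.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_content(root):
    """Map content digest -> list of paths (relative to root) with that content."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index
```

Two files land under the same key only if their bytes are identical, regardless of what they are called or where they sit.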


If I were doing this, I might make an awk script. That's mainly because I'm thinking of the inputs and the actions, not the language to write it in.

One input would be the list of filenames in their uncategorized locations (plain leaf names). Another input would be the list of files in their categorized locations (partial pathnames, with dirnames). The action is to match each uncategorized name to its categorized name (pattern matching is awk's forte) and emit a 'mv' command line that performs the move. Then you manually check the output list of 'mv' commands to make sure it's sensible, and finally feed it to a bash shell as input.
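The same match-and-emit idea can be sketched in Python instead of awk: a dict plays the role of awk's associative array, and the output is a list of `mkdir -p` and `mv` lines to review before piping into bash. The function name `plan_moves` and the input format (one relative path per line, as `find` prints it with the leading `./` stripped) are assumptions; matching is on leaf name only.

```python
import posixpath
import shlex


def plan_moves(uncategorized, categorized):
    """Return shell command lines that move each uncategorized file
    (a relative path in the backup) to the categorized path with the
    same leaf name. Assumption: leaf names are unique."""
    # Leaf name -> categorized relative path, like an awk associative array.
    target_for = {posixpath.basename(p): p for p in categorized}
    made_dirs = set()
    lines = []
    for old in uncategorized:
        new = target_for.get(posixpath.basename(old))
        if new is None or new == old:
            continue  # no categorized counterpart, or already in place
        parent = posixpath.dirname(new)
        if parent and parent not in made_dirs:
            lines.append("mkdir -p " + shlex.quote(parent))
            made_dirs.add(parent)
        # shlex.quote protects names with spaces, e.g. 'WW2 From Space.mp4'.
        lines.append("mv -v " + shlex.quote(old) + " " + shlex.quote(new))
    return lines
```

Printing `"\n".join(plan_moves(...))` to a file gives the list to eyeball; once it looks sensible, it can be run from inside the backup's Documentaries folder with `bash`.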

The inputs might be produced with 'find' or 'ls -R' or maybe some other source.

If you wanted the awk script to match src and dst according to something other than leaf filename, you'd put that as an additional field in both the inputs (uncategorized and categorized name lists). Then awk would match those instead of leaf names.

You could even write the script so it always matches on a dedicated key field, then structure your inputs so that key is the leaf filename. That way you could use different criteria (metadata, hashes, leaf names, etc.) just by changing how you build the inputs, yet the script that actually produces the 'mv' commands would be the same, since all it's doing is matching field A of input set 1 with field A of input set 2, and producing a 'mv -fv {field_B_set_1} {field_B_set_2}'. Awk's associative arrays would be instrumental here.
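A Python sketch of that generic version: each input line is `key<TAB>path`, where the key could be a leaf name, an inode number, a hash, or anything else, and the matcher never needs to know which. The tab-separated format and the names `parse_records` and `match_records` are assumptions for illustration. Like the description above, it emits `mv -fv` lines for review.

```python
import shlex


def parse_records(lines):
    """Parse 'key<TAB>path' lines into a dict of key -> path.
    If a key repeats, the last line wins (duplicates are not detected here)."""
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, path = line.split("\t", 1)
        records[key] = path
    return records


def match_records(current_lines, wanted_lines):
    """Emit 'mv -fv CURRENT WANTED' for every key present in both inputs
    whose path differs. Assumes the destination directories already exist."""
    current = parse_records(current_lines)
    wanted = parse_records(wanted_lines)
    cmds = []
    for key in sorted(current.keys() & wanted.keys()):
        src, dst = current[key], wanted[key]
        if src != dst:
            cmds.append(f"mv -fv {shlex.quote(src)} {shlex.quote(dst)}")
    return cmds
```

Switching from leaf names to hashes would then only change how the two input lists are generated, not this code.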

Tools other than awk could also do this. Perl springs to mind, but I don't know it well enough to say how to accomplish this.

Clearly, a Java app could also do this (with a Map doing the job of awk's associative array), since the structured inputs are pretty easy to parse, and outputting a series of mv's is child's play.
 
Thanks, chown. By "hashes", do you mean the inodes?

I've already written a Java program to do this directly using java.io.File objects. It was trivial to do. I was just hoping that someone had already written something that has been out there forever and has therefore been tested to death and proven efficient and reliable.

Thanks, in any case.
 
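Inode numbers will work within a single drive (moving a file on the same filesystem keeps its inode number), but not between two drives, since the backup copy has its own unrelated inode. Anything else that's unique to the file would also work. In the schemes I outlined, it's just a key, used to match a source "thing" to a destination "place". Reorganization consists of matching keys between two sets, then generating the actions that transform one into the other.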
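Inode numbers identify a file only within one filesystem, so they can't match a file on the internal drive to its copy on the external one. What they can do is record a reorganization: a move on the same volume keeps the file's inode number, so a snapshot taken before and after reorganizing the local folder yields the list of moves to replay on the backup. A minimal Python sketch under that assumption; `snapshot` and `moves_between` are hypothetical names.

```python
import os
import stat


def snapshot(root):
    """Map inode number -> path (relative to root) for every regular file.
    Hard-linked files share an inode; the last path seen wins."""
    inodes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full, follow_symlinks=False)
            if stat.S_ISREG(st.st_mode):
                inodes[st.st_ino] = os.path.relpath(full, root)
    return inodes


def moves_between(before, after):
    """Return sorted (old_path, new_path) pairs for files whose inode
    survived the reorganization but whose relative path changed."""
    return sorted(
        (before[ino], after[ino])
        for ino in before.keys() & after.keys()
        if before[ino] != after[ino]
    )
```

The "before" snapshot has to be taken before any files are moved. Because the backup still uses the old relative paths, each resulting pair can then be replayed on the external drive as a `mkdir -p` plus `mv`.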

If "reorganization" amounts to "move THIS to THERE" repeated over a set of THISes and THEREs, then all that's needed is a unique key for each THIS to match it to its corresponding THERE. The circumstances will dictate what happens if a THERE doesn't have a THIS, if there are multiple THISes for a THERE, or even multiple THEREs for a THIS. In a corporate reorg, the THISes with no THEREs are let go (fired).
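Those edge cases can be made explicit before any command is emitted. Here is a Python sketch, assuming each side is given as a list of `(key, path)` pairs. It sorts keys into matched pairs, THISes with no THERE, THEREs with no THIS, and keys that appear more than once on either side, which probably deserve a human decision rather than a guess. The function name and result labels are illustrative only.

```python
from collections import defaultdict


def classify(this_pairs, there_pairs):
    """Group keys by how they match between the THIS side and the THERE side."""
    these = defaultdict(list)
    theres = defaultdict(list)
    for key, path in this_pairs:
        these[key].append(path)
    for key, path in there_pairs:
        theres[key].append(path)

    result = {"matched": [], "no_there": [], "no_this": [], "ambiguous": []}
    for key in sorted(these.keys() | theres.keys()):
        srcs, dsts = these.get(key, []), theres.get(key, [])
        if len(srcs) > 1 or len(dsts) > 1:
            result["ambiguous"].append(key)  # several THISes or THEREs share a key
        elif not dsts:
            result["no_there"].append(srcs[0])  # a THIS with nowhere to go
        elif not srcs:
            result["no_this"].append(dsts[0])  # a THERE with nothing to fill it
        else:
            result["matched"].append((srcs[0], dsts[0]))
    return result
```

Only the "matched" pairs would be turned into moves; the other three groups are reported so the circumstances can decide what happens to them.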
 
If the files are organized in folders as you want them on the external drive, couldn't you just delete the folder from your internal drive and replace it with the backup?
 
Yes, I could do that, but it would be very time-consuming, as my movies directory is huge! I'm talking > 1 TB, so replacing it each time is not a feasible solution.
 
Further, if there's no THERE there and no there here, then it stands to reason the here that's here is not where the here was to begin with. The user must then make the assumption that neither here nor there is actually here or there, but here, not there, and is actually WHERE. See?
 