
MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
Greetings. I am looking for advice.

I am looking for a multiplatform archival format for long-term storage that cannot be corrupted after writing (read-only, like a CD). I am also not sure whether I should use compression, because I am afraid it might introduce errors during compression or decompression in the future.

ZIP: compresses files, but doesn't preserve all metadata, and I'm not sure it can handle large files (10 GB+).

Tar: most likely option, but I'm not sure if it's safe to use GZ compression with it. I also need a GUI tool, as I am not a terminal expert.

DMG: pretty much does what I want, but it's macOS only AFAIK. There is also the sparse bundle disk image format; is this better?

Or should I just store them as plain folders and files, as that would be the safest option?

------

I also plan to upload it to the cloud. Which is safer in case the upload gets interrupted? Can the upload safely continue on a single huge 10 GB+ file?
 

Shirasaki

macrumors P6
May 16, 2015
16,249
11,745
Even ZIP is not entirely multi-platform, considering how each platform encodes text, including file names. And yeah, ZIP is simple, but it lacks many of the modern features that RAR, TAR, etc. enjoy. And I’m not sure if ZIP supports error correction.
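On the two open ZIP questions in this thread: the format stores a CRC-32 for every member, which detects corruption but cannot repair it, and the ZIP64 extension lifts the classic 4 GiB size limit. A minimal sketch using Python's standard `zipfile` module follows; the function name `make_and_check_zip` is just an illustrative choice, not anything from a real tool.

```python
import os
import zipfile


def make_and_check_zip(src_files, zip_path):
    """Write files into a ZIP archive, then re-read it and verify every CRC-32."""
    # allowZip64=True (the default) permits archives and members beyond 4 GiB.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for path in src_files:
            zf.write(path, arcname=os.path.basename(path))

    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and returns the name of the first one
        # whose CRC does not match, or None if all members check out.
        return zf.testzip()
```

A return value of `None` means every member passed its CRC check. A damaged member is only reported, not fixed, so a second copy is still needed.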

RAR is more or less multi-platform but unfortunately there is no official GUI client on macOS. Only a terminal utility.

I'm not a Linux guy, so I don't know much about TAR, but I feel it should be better than ZIP.

DMG, I think, is macOS only. ISO might be a better choice.

As for compression, you can choose not to use it, meaning you package the files into one file but apply zero compression. RAR, IIRC, supports this. Maybe TAR also supports it.
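TAR does support this: a plain `.tar` is just the files laid end to end with headers, and compression is an optional extra layer. A rough sketch with Python's standard `tarfile` module, where `pack_uncompressed` is a made-up helper name:

```python
import os
import tarfile


def pack_uncompressed(src_dir, tar_path):
    """Bundle src_dir into a plain .tar; mode "w" applies no compression."""
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(tar_path, "w") as tf:
        tf.add(src_dir, arcname=top)

    # Re-open the archive and list what was stored as a quick sanity check.
    with tarfile.open(tar_path, "r") as tf:
        return tf.getnames()
```

Switching the mode to `"w:gz"` would add gzip on top, so the choice between compressed and uncompressed is a one-word change.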

Modern uploading supports “continue where you left off”, so interruptions cause minimal disruption. But some cloud storage providers may not have this feature, meaning each file must be uploaded in one go. It depends.
 
  • Like
Reactions: MacBH928

Bigwaff

Contributor
Sep 20, 2013
2,693
1,809
I am looking for a multiplatform archival format for long-term storage that cannot be corrupted after writing (read-only, like a CD).
Currently, M-Discs are considered the best long term digital archival medium.

I am also not sure whether I should use compression, because I am afraid it might introduce errors during compression or decompression in the future.
Valid concern. You'll probably have to research and read to determine the best tactic.
 

wyrdness

macrumors 6502
Dec 2, 2008
274
322
I use TGZ (gzipped tar) on Mac and Linux. I'm not sure how well that's supported in Windows, since I don't use it. I curse anyone who sends me RAR files, as I often have to find and install software to extract them. I probably wouldn't use DMG as an archive format. Sometimes I find it preferable to have a single archive file and sometimes it's better to have an archive of plain folders and files. It's probably best to choose the latter unless you really need compression and a single file (e.g. for long time archiving) meets your needs.
 

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
Sometimes I find it preferable to have a single archive file and sometimes it's better to have an archive of plain folders and files. It's probably best to choose the latter unless you really need compression and a single file (e.g. for long time archiving) meets your needs.

So you think storing them as plain folder/directories is the safest choice?

I ask because I found out that, for some reason, transferring many small files puts more strain on the hardware and the upload/download process than transferring a single large file. It also scares me that when transferring a huge number of files in directories and sub-directories, a file here or there might get corrupted or go missing.
 

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
Currently, M-Discs are considered the best long term digital archival medium.


Valid concern. You'll probably have to research and read to determine the best tactic.

😂😂😂

I actually looked into that, and a Redditor, I believe, made a convincing argument that it's not the media I have to worry about but the media player/reader. As time goes on it gets harder and harder to find one, and I couldn't agree more. Think how much more difficult it is nowadays to get a floppy disk or VHS reader. Computers no longer ship with optical media readers.

Still, storing on Blu-ray could be a viable option.

Consider the alternative: if you have one big file, what do you do if that gets corrupted or goes missing?

Well, you tell me; I am looking for advice. If archiving/zipping large files is prone to errors, why do so many people use it?
I wonder what people who are in the long term archival business use and the guys at https://www.reddit.com/r/DataHoarder/
 
  • Haha
Reactions: Shirasaki

Bigwaff

Contributor
Sep 20, 2013
2,693
1,809
I actually looked into that, and a Redditor, I believe, made a convincing argument that it's not the media I have to worry about but the media player/reader. As time goes on it gets harder and harder to find one, and I couldn't agree more. Think how much more difficult it is nowadays to get a floppy disk or VHS reader. Computers no longer ship with optical media readers.
Yes. There are no guarantees the format or media you choose today will be supported tomorrow. That is why you need to accept, and plan for, the possibility that you will have to migrate your archived data to a new format and media in the future.
I wonder what people who are in the long term archival business use
 
  • Like
Reactions: MacBH928

Shirasaki

macrumors P6
May 16, 2015
16,249
11,745
I wonder what people who are in the long term archival business use
Believe it or not, magnetic tape (yes, like the VHS tapes and cassettes we no longer use) is a popular format for long-term storage. Its read/write speed won’t let you install and run an OS on it, but it is perfect for long-term storage and occasional data access, plus it's cheap compared to HDD and definitely SSD.
 
  • Like
Reactions: MacBH928

SierraVista

macrumors member
May 20, 2024
90
320
Magnetic media is the best format for long term (years scale) archive. You might consider LTO-8 or -7 tape, media is plentiful; the only downside is that the drives can be pretty expensive. There are also “WORM” (write once, read many) tapes that will address your read-only requirement.

Large companies still rely on LTO so the risk of support going away or hardware becoming unavailable should be pretty slim. I know OWC has a Mac-specific external LTO drive and there are also plenty of GUI apps available for working with tapes. Uncompressed Tar files are standard when working with tapes, in fact Tar means “Tape ARchive”.
 

Allen_Wentz

macrumors 68040
Dec 3, 2016
3,273
3,696
USA
😂😂😂

I actually looked into that, and a Redditor, I believe, made a convincing argument that it's not the media I have to worry about but the media player/reader. As time goes on it gets harder and harder to find one, and I couldn't agree more. Think how much more difficult it is nowadays to get a floppy disk or VHS reader. Computers no longer ship with optical media readers.

Still, storing on Blu-ray could be a viable option.



well you tell me, I am looking for advice. If archiving/zipping large files is prone to errors why so many people use it?
I wonder what people who are in the long term archival business use and the guys at https://www.reddit.com/r/DataHoarder/
You say that "it's not the media I have to worry about but the media player/reader. As time goes on it gets harder and harder to find one, and I couldn't agree more. Think how much more difficult it is nowadays to get a floppy disk or VHS reader."

I couldn't agree less. Once some medium is popular enough that huge numbers of copies are in circulation, the world will always keep building machines to read them. E.g. a quick Amazon search brings up a floppy disk reader, delivered tomorrow, for US$16. Niche storage methods should however be avoided IMO.

Deterioration of media is an issue, however, that needs to be dealt with by routine recopying and multiple backup copies.

IMO a primary issue that you have not discussed is UI. Secure archiving happens through humans one way or another. How well the human interface(s) to the process are dealt with (now and in the archival future and during catastrophes we have not yet thought up) is by far the most important parameter to consider. E.g. the process can be so time-consuming as to be boring, leading to user errors; or the process can be so obtuse as to lead to user errors; or the humans doing the work at some future time, or during absence of the primary operator may be poorly trained or unmotivated, leading to user errors; or the process may be so fully automated as to repeatedly corrupt itself without humans being aware; etc.

I bring up UI because some archival methodologies may be easier or more difficult than others to implement effectively over the long term.

Just my $0.02.
 
  • Like
Reactions: Shirasaki

chown33

Moderator
Staff member
Aug 9, 2009
10,990
8,874
A sea of green
I'd probably pick 'cpio' or 'tar', coupled with gz for compression of the overall archive.

I'm pretty sure 'cpio' will preserve symlinks, hard-links, xattrs, and all the usual Unixy stuff, such as mode, owner, and ACLs. I'm not sure 'tar' will, but it's worth testing it.

Regardless of what anyone suggests, you should do a test using some small files that have various things, such as symlinks, hard-links, xattrs, etc. Put the files into an archive, then restore to a different target disk. You can test restoring from the archive by writing to a disk-image in either HFS+ or APFS format.
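The test described above can be scripted. Here is a minimal sketch, assuming macOS or Linux and Python's standard `tarfile` module rather than the command-line `tar` or `cpio`; the helper name `roundtrip_metadata` and the file names are hypothetical. It only exercises symlinks, hard links, and permission bits, and does not cover xattrs or ACLs, which would need the same kind of test with the actual tool you plan to use.

```python
import os
import stat
import tarfile


def roundtrip_metadata(workdir):
    """Archive a tiny tree, restore it elsewhere, and report what survived."""
    src = os.path.join(workdir, "src")
    dst = os.path.join(workdir, "restored")
    os.makedirs(src)
    os.makedirs(dst)

    original = os.path.join(src, "original.txt")
    with open(original, "w", encoding="utf-8") as fh:
        fh.write("test data\n")
    os.chmod(original, 0o640)
    os.symlink("original.txt", os.path.join(src, "symlink.txt"))
    os.link(original, os.path.join(src, "hardlink.txt"))

    archive = os.path.join(workdir, "test.tar.gz")
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(src, arcname="src")

    with tarfile.open(archive, "r:gz") as tf:
        # Extraction filters exist on Python 3.12+ and patched older releases.
        if hasattr(tarfile, "data_filter"):
            tf.extractall(dst, filter="data")
        else:
            tf.extractall(dst)

    restored = os.path.join(dst, "src")
    restored_file = os.path.join(restored, "original.txt")
    return {
        "symlink": os.path.islink(os.path.join(restored, "symlink.txt")),
        "hardlink": os.path.samefile(
            restored_file, os.path.join(restored, "hardlink.txt")),
        "mode": stat.S_IMODE(os.stat(restored_file).st_mode) == 0o640,
    }
```

Every value in the returned dictionary should be `True` if the round trip kept that property. Pointing `workdir` at a mounted HFS+ or APFS disk image would make it a closer match to the real restore target.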
 
  • Like
Reactions: LockOn2B and Nermal

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
Believe it or not, magnetic tape (yes, like the VHS tapes and cassettes we no longer use) is a popular format for long-term storage. Its read/write speed won’t let you install and run an OS on it, but it is perfect for long-term storage and occasional data access, plus it's cheap compared to HDD and definitely SSD.

Magnetic media is the best format for long term (years scale) archive. You might consider LTO-8 or -7 tape, media is plentiful; the only downside is that the drives can be pretty expensive. There are also “WORM” (write once, read many) tapes that will address your read-only requirement.

Large companies still rely on LTO so the risk of support going away or hardware becoming unavailable should be pretty slim. I know OWC has a Mac-specific external LTO drive and there are also plenty of GUI apps available for working with tapes.

I still have a hard time believing this. Tape is the flimsiest storage medium, and the heads cause wear and tear on the tape, causing gradual degradation (data loss) over time. That was one of the selling points of optical media over VHS, cassettes, floppies, and vinyl.

I looked it up online, and it says tape will survive 30 years, compared to CD/DVD, which is estimated at 10. A real head-scratcher.

Uncompressed Tar files are standard when working with tapes, in fact Tar means “Tape ARchive”.

Just to be clear, storing data in a TAR archive is the more reliable, standard method for LTO, rather than just transferring the files as-is to the tape?
 

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
I'd probably pick 'cpio' or 'tar', coupled with gz for compression of the overall archive.

I'm pretty sure 'cpio' will preserve symlinks, hard-links, xattrs, and all the usual Unixy stuff, such as mode, owner, and ACLs. I'm not sure 'tar' will, but it's worth testing it.

Regardless of what anyone suggests, you should do a test using some small files that have various things, such as symlinks, hard-links, xattrs, etc. Put the files into an archive, then restore to a different target disk. You can test restoring from the archive by writing to a disk-image in either HFS+ or APFS format.

I highly appreciate your input. I had never heard of cpio before. What is your opinion on GZ compression? Is it possible to lose data in the compress/decompress process? Can I check that everything went fine without having to double-check every single file and folder?

I really do not want to tar.gz an HDD and find 7 years later that it was corrupted, with missing files.
 

Shirasaki

macrumors P6
May 16, 2015
16,249
11,745
I highly appreciate your input. I had never heard of cpio before. What is your opinion on GZ compression? Is it possible to lose data in the compress/decompress process? Can I check that everything went fine without having to double-check every single file and folder?

I really do not want to tar.gz an HDD and find 7 years later that it was corrupted, with missing files.
Compression for files and folders (as opposed to media files) is always lossless; otherwise the format would have been discarded a long time ago. The most rudimentary form of compression stores a single copy of data that appears many times, plus a lookup table pointing to where each occurrence belongs.
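If you want to convince yourself of the lossless part, a round trip is easy to try. This is a tiny sketch with Python's standard `gzip` module; the function name is just an illustrative choice.

```python
import gzip


def gzip_roundtrip_is_lossless(data: bytes) -> bool:
    """Compress, decompress, and confirm the bytes come back identical."""
    compressed = gzip.compress(data)
    restored = gzip.decompress(compressed)
    return restored == data
```

It returns `True` for any input, including random bytes that do not compress at all. The output just ends up slightly larger than the input in that case.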

If you are concerned about data integrity, modern formats also store a CRC checksum of every file as part of the archive. And you can checksum the archive immediately after creating it.
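Checksumming the archive right after creating it could look roughly like this. It is a sketch using Python's standard `hashlib`; the `.sha256` sidecar file name and the helper names are assumptions for illustration, not a convention any particular tool enforces.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so a 10 GB+ archive never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path):
    """Record the digest next to the archive as <archive>.sha256."""
    digest = sha256_of_file(path)
    with open(path + ".sha256", "w", encoding="utf-8") as fh:
        fh.write(f"{digest}  {path}\n")
    return digest


def verify_sidecar(path):
    """Re-hash the archive and compare against the recorded digest."""
    with open(path + ".sha256", encoding="utf-8") as fh:
        expected = fh.read().split()[0]
    return sha256_of_file(path) == expected
```

Run `write_sidecar` once after creating the archive, then `verify_sidecar` after every copy, upload, or yearly check. A `False` result means that copy should be replaced from another one.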

Ultimately, multiple copies on different physical media are a much better way to preserve the data.
 

SierraVista

macrumors member
May 20, 2024
90
320
I still have a hard time believing this. Tape is the flimsiest storage medium, and the heads cause wear and tear on the tape, causing gradual degradation (data loss) over time. That was one of the selling points of optical media over VHS, cassettes, floppies, and vinyl.
Those things are all real risks, sure. In situations where you have requirements for years-scale, offline backup of large amounts of data, such as in a regulated industry like banking or medicine, those risks are accounted for in your backup strategy.

This could include writing data to multiple sets of media to be stored in different locations, to reduce the possibility of data loss due to media failure, equipment failure, natural disasters, etc. In the past, that could have meant everything from building special tape storage areas into your data centers to the CFO taking a copy of the accounting backups home to put in a safe in his/her basement. Today, modern enterprise backup systems support a variety of storage types, so you could save one copy of your backups to tape, another to Azure Storage in the US, and a third to AWS S3 in Europe. There are managed services that can front a Tape interface with other storage, for legacy processes that can’t be upgraded for whatever reason.

I looked it up online, and it says tape will survive 30 years, compared to CD/DVD, which is estimated at 10. A real head-scratcher.
Those would be best-case scenarios. Tapes certainly can last that long, but this requires optimal storage conditions. Off-site archiving providers will maintain special warehouses at just the right temperature and humidity ranges to minimize the possibility of damage to or degradation of the tapes.

CDs and DVDs, especially recordable ones written in burners rather than stamped, are highly susceptible to “disc rot”, which is damage to the disc caused by chemical reactions between the surface of the disc and the air, UV light, or the components of the disc itself. This can start after only a few years, and can render the disc unreadable even if it appears visually fine. M-Discs were specifically designed to have increased resistance to these processes, but are not immune.

Just to be clear, storing data in a TAR archive is the more reliable, standard method for LTO, rather than just transferring the files as-is to the tape?
It’s been a loooong time and my memory is kind of fuzzy but in general, the upside to using TAR is that it keeps track of which files are on the tape and where on the tape they are located. Other formats/tools require this information to be stored elsewhere, such as in a database, and then you have to worry about backing up that database too or you won’t be able to tell what is on your tapes anymore.
 

9valkyrie

macrumors member
Feb 13, 2024
47
17
Windows does support TAR (it's even bundled with a variant of TAR). In *nix systems TAR does support permission retention, etc. For verifying integrity I'd recommend SHA256; or a mix of SHA1, MD5, and SHA256; or SHA512. If you are replicating an entire drive then dd is the best option.
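For the SHA256 route, a per-file manifest is one way to do it. Below is a minimal sketch using only Python's standard library; `build_manifest` is a hypothetical helper name, and skipping symlinks is a simplifying assumption.

```python
import hashlib
import os


def build_manifest(root, chunk_size=1024 * 1024):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are skipped in this sketch
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = digest.hexdigest()
    return manifest
```

Building the manifest before archiving and again after restoring, then comparing the two dictionaries, shows exactly which files changed or went missing.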
 

millerj123

macrumors 68030
Mar 6, 2008
2,601
2,703
Believe it or not, magnetic tape (yes, like the VHS tapes and cassettes we no longer use) is a popular format for long-term storage. Its read/write speed won’t let you install and run an OS on it, but it is perfect for long-term storage and occasional data access, plus it's cheap compared to HDD and definitely SSD.
When our data was our product, we had RAID, but did a daily backup to tape that was immediately stored offsite, then brought back after a week, and would eventually be brought back into rotation. Of course, we had IT required to periodically test that the data could be recovered from the tape as well.

For a while, I used CDs to back up data. Anymore, I don't trust the older ones, and most of that data is no longer terribly relevant. Now, I keep backups on several hard drives that I rotate through. Nothing I have is that time sensitive or critical in general.
 
  • Like
Reactions: Shirasaki

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
PAR2 enters the chat...

Interesting. This works great for a single file, but I have doubts about a large number of files, as it will have to increase the storage for every single file stored. Just imagine a 1TB HDD.
 

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
If you are concerned about data integrity, modern formats also store a CRC checksum of every file as part of the archive. And you can checksum the archive immediately after creating it.

Windows does support TAR (it's even bundled with a variant of TAR). In *nix systems TAR does support permission retention, etc. For verifying integrity I'd recommend SHA256; or a mix of SHA1, MD5, and SHA256; or SHA512. If you are replicating an entire drive then dd is the best option.

Great suggestions. I should learn how to use it. Is there a GUI app where I can add the main directory, have it checksum all the files in it, do the same for the compressed version, and compare them?
 

MacBH928

macrumors G3
Original poster
May 17, 2008
8,725
3,892
Those things are all real risks, sure. In situations where you have requirements for years-scale, offline backup of large amounts of data, such as in a regulated industry like banking or medicine, those risks are accounted for in your backup strategy.

This could include writing data to multiple sets of media to be stored in different locations, to reduce the possibility of data loss due to media failure, equipment failure, natural disasters, etc. In the past, that could have meant everything from building special tape storage areas into your data centers to the CFO taking a copy of the accounting backups home to put in a safe in his/her basement. Today, modern enterprise backup systems support a variety of storage types, so you could save one copy of your backups to tape, another to Azure Storage in the US, and a third to AWS S3 in Europe. There are managed services that can front a Tape interface with other storage, for legacy processes that can’t be upgraded for whatever reason.


Those would be best-case scenarios. Tapes certainly can last that long, but this requires optimal storage conditions. Off-site archiving providers will maintain special warehouses at just the right temperature and humidity ranges to minimize the possibility of damage to or degradation of the tapes.

CDs and DVDs, especially recordable ones written in burners rather than stamped, are highly susceptible to “disc rot”, which is damage to the disc caused by chemical reactions between the surface of the disc and the air, UV light, or the components of the disc itself. This can start after only a few years, and can render the disc unreadable even if it appears visually fine. M-Discs were specifically designed to have increased resistance to these processes, but are not immune.


It’s been a loooong time and my memory is kind of fuzzy but in general, the upside to using TAR is that it keeps track of which files are on the tape and where on the tape they are located. Other formats/tools require this information to be stored elsewhere, such as in a database, and then you have to worry about backing up that database too or you won’t be able to tell what is on your tapes anymore.

When our data was our product, we had RAID, but did a daily backup to tape that was immediately stored offsite, then brought back after a week, and would eventually be brought back into rotation. Of course, we had IT required to periodically test that the data could be recovered from the tape as well.

For a while, I used CDs to back up data. Anymore, I don't trust the older ones, and most of that data is no longer terribly relevant. Now, I keep backups on several hard drives that I rotate through. Nothing I have is that time sensitive or critical in general.

Thanks for the help and the tips. I can't imagine how long it would take to restore YouTube (if their DB of videos got corrupted) from LTO backups.
 

Woof Woof

macrumors member
Sep 15, 2004
94
17
Interesting. This works great for a single file, but I have doubts about a large number of files, as it will have to increase the storage for every single file stored. Just imagine a 1TB HDD.

Yeah, it is best suited for archived files that have been grouped and compressed and not a live, ever changing library.

It is a sort of RAID 6 for files. You can lose 1 (or more depending on redundancy level) files and still recover the original file.
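To show the idea, here is a toy sketch of recovering a lost block from parity. PAR2 itself uses Reed-Solomon codes and can rebuild several missing blocks; this example deliberately swaps in plain XOR parity (closer to RAID 5), which can rebuild exactly one missing block, so treat it as an illustration of the principle only. The function names are made up.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_parity(blocks):
    """The parity block is simply the XOR of all data blocks."""
    return xor_blocks(blocks)


def recover_missing(surviving_blocks, parity):
    """XOR of the survivors and the parity yields the one missing block."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

With three data blocks and one parity block, any single lost block can be rebuilt from the other three, at the cost of one extra block of storage.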

It was really popular in the usenet days where you would inevitably be missing a few parts due to replication errors between usenet servers.
 

chown33

Moderator
Staff member
Aug 9, 2009
10,990
8,874
A sea of green
What's your budget for this endeavor? Both initial outlay, and ongoing annual costs.

Also, what does the data budget look like? Is it mainly an initial capture of size X, then updates of size Y? If so, what are X, Y, and the update interval?

Some things you ask about, like data integrity, could be at least partly addressed by distributing identical copies across multiple storage forms. For example, multiple NAS devices, and multiple online storage services. You'd need checksums, like the above noted SHA ones, but you wouldn't directly need an error-correcting scheme, because errors would be handled by the redundant storage.
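As a rough sketch of that idea: with a known-good SHA-256 recorded at archive time, any copy that no longer matches can be overwritten from one that still does. The helper names below are hypothetical, and the paths stand in for wherever the copies actually live (NAS mounts, synced cloud folders, etc.).

```python
import hashlib
import shutil


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_from_good_copy(copy_paths, expected_digest):
    """Overwrite copies that fail the checksum with one that passes."""
    good = [p for p in copy_paths if sha256_of_file(p) == expected_digest]
    bad = [p for p in copy_paths if p not in good]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    for path in bad:
        shutil.copyfile(good[0], path)
    return bad  # the copies that had to be replaced
```

The checksum only tells you which copy is wrong; the redundancy is what fixes it. That is why no separate error-correcting scheme is strictly needed in this setup.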


You also haven't mentioned security. If you need to encrypt the data, then you'll also need to store the keys securely. If this is long-term, then that means secure against natural disasters or your own death. And now that Death has entered the scene, it brings legal issues into play (who owns what, who is responsible for what, etc.)
 
  • Wow
Reactions: Shirasaki

Shirasaki

macrumors P6
May 16, 2015
16,249
11,745
Great suggestions. I should learn how to use it. Is there a GUI app where I can add the main directory, have it checksum all the files in it, do the same for the compressed version, and compare them?
Uhh, different compression utilities have their own ways of storing and displaying checksum data. 7z, for example, shows the checksum in its GUI, IIRC. WinRAR does the same. Maybe some compression programs can output checksum data into a text file, but I’m not sure which ones can.
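Failing a GUI, a short script can do the comparison asked about. This is a sketch only, using Python's standard `tarfile` and `hashlib`; the function names are hypothetical, and it assumes the archive stores paths relative to the source directory. It hashes every file in the directory, hashes every regular file inside the `.tar.gz` without extracting to disk, and reports differences.

```python
import hashlib
import os
import posixpath
import tarfile


def _sha256_stream(fh, chunk_size=1024 * 1024):
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root):
    """Relative path -> SHA-256 for every regular file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                result[rel] = _sha256_stream(fh)
    return result


def hash_tar_members(tar_path):
    """Member path -> SHA-256 for regular files (link entries are skipped)."""
    result = {}
    with tarfile.open(tar_path, "r:*") as tf:
        for member in tf:
            if member.isfile():
                name = posixpath.normpath(member.name)  # "./a.txt" -> "a.txt"
                result[name] = _sha256_stream(tf.extractfile(member))
    return result


def compare_tree_to_tar(root, tar_path):
    """Return (files missing from the archive, files whose content differs)."""
    src = hash_directory(root)
    arc = hash_tar_members(tar_path)
    missing = sorted(set(src) - set(arc))
    changed = sorted(k for k in src if k in arc and src[k] != arc[k])
    return missing, changed
```

Two empty lists mean every file in the directory is in the archive with identical content.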
 