For media storage, where speed shouldn't matter that much, why not unRAID:
http://lime-technology.com/wiki/index.php?title=Overview and
http://lime-technology.com/
Macworld says this:
http://www.macworld.com/article/146120/2010/02/unraid_server.html
He could use that 20-bay Norco for the box. Maybe 2 of them?
I keep seeing all these recommendations for various forms of RAID, but why would he want to link multiple drives if speed wouldn't matter that much (after the initial loading of all that data onto them)?
I also argue that if he actually owns the DVDs/BDs for all that media, why not let them be his master copies for backup and not put the ISOs on there? That would dramatically reduce the storage needs while still keeping the digital copies he would use most of the time on a much smaller server.
For one thing, unRAID is ludicrously expensive -- $150 for up to 16 drives is the biggest package mentioned, which means this particular guy would have to negotiate an even pricier custom license for his needs.
There are several reasons for RAID. One is speed, but frankly, that's irrelevant in this case. The most important one is consolidation of multiple drives into a single filesystem. You do *not* want to manually distribute data over 40 drives. If the storage is pure archival, I suppose you *could* reasonably easily stuff the drives manually and hit something like 80-95 percent capacity, but that would come at the complete expense of any kind of organisation whatsoever. A single filesystem is *much* easier to deal with.
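As a toy illustration of what "stuffing the drives manually" actually means, here's a first-fit-decreasing bin-packing sketch in Python; the file sizes and the 2 TB drive size are invented numbers, not anything from this thread:

    # Rough sketch of manual drive-stuffing as greedy bin packing.
    # All sizes are made up for illustration (DVD rips ~1-9 GB,
    # BD rips up to ~45 GB).
    import random

    DRIVE_GB = 2000                      # hypothetical 2 TB drives
    random.seed(1)
    files = [random.uniform(1, 45) for _ in range(3000)]  # fake rip sizes, GB

    drives = []                          # free space remaining per drive
    for size in sorted(files, reverse=True):
        for i, free in enumerate(drives):
            if size <= free:             # first drive it still fits on
                drives[i] -= size
                break
        else:
            drives.append(DRIVE_GB - size)   # start a new drive

    used = sum(DRIVE_GB - free for free in drives)
    print(f"{len(drives)} drives, {used / (len(drives) * DRIVE_GB):.0%} average fill")

A greedy pass gets you the capacity numbers, but notice what the program never tracks: which drive any given title ended up on. That's the organisational cost.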
The other reason is fault tolerance. It's not *entirely* the case that failures are distributed evenly over time, but the spec-sheet numbers are a place to start. Consumer-grade HDDs, when they have specs for this at all, tend to be rated at between 0.8 and 1.2 million hours MTBF. That would mean that with 40 drives, you should expect the first fault at around 20,000 hours, give or take, which is 2.3 years. With 400 drives, however, you would expect a drive failure every 2,000 hours, i.e. every 2-3 months.
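Here's that arithmetic as a quick sanity check; it treats failures as a constant rate, which is all an MTBF figure gives you:

    # Spec-sheet MTBF arithmetic from the paragraph above. A constant
    # failure rate is what MTBF assumes; real drives also show infant
    # mortality and old-age wear-out.
    HOURS_PER_YEAR = 24 * 365

    for mtbf in (800_000, 1_200_000):          # spec-sheet range, hours
        for n_drives in (40, 400):
            between = mtbf / n_drives          # mean hours between failures
            print(f"MTBF {mtbf:>9,} h, {n_drives:>3} drives: "
                  f"one failure per {between:>6,.0f} h "
                  f"(~{between / HOURS_PER_YEAR:.2f} years)")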
Once the drives hit 3-5 years, you will start seeing actual lifetime-related failures rather than the type of failure measured by MTBF.
Now, that's pure math -- with 40 drives, which is big enough for statistics to take effect, I would expect at least a 10 percent mortality rate in the first year. So you *will* be dealing with drive failures regularly. If your data is unimportant enough to be lost without recovery at that rate, you might as well just chuck it now. And if you're willing to re-rip 2 TB worth of data every 3 months... well, you might as well just chuck the ISOs now and re-rip when you need them instead.
The question isn't "Why RAID?", it's "What kind of RAID?"
Something to get out of the way first: JBOD/RAID0, just say no. Either one would let you consolidate the disks into one filesystem, but any drive failure would trash the entire thing, which Would Be Bad.
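To put a number on "Would Be Bad": with a stripe or concat, the array only survives the year if every single drive does, so per-drive reliability compounds. The 3 percent annual failure rate below is an assumed, illustrative figure:

    # Survival odds of an n-drive stripe/concat over one year.
    # The 3% per-drive annual failure rate is assumed, not measured.
    afr = 0.03
    for n in (1, 8, 40):
        survives = (1 - afr) ** n      # every drive must survive
        print(f"{n:>2} drives: {survives:.0%} chance of a failure-free year")

At 40 drives that works out to roughly a 30 percent chance of getting through the year intact. Just say no.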
RAID1: Well, it's secure and easy, but losing fully half your capacity smarts. Even if you do go that route, you will probably want to run two twenty-disk RAID5/6 + hot spare arrays and mirror those, rather than mirroring individual drives and then striping or concatenating them, or mirroring two giant stripes/concats.
Which leaves RAID5/6 and their variants.
My gut feeling is that 8-drive RAID6 arrays still sound decent for a traditional RAID setup, but out of forty drives you should probably count on at least one, if not two, hot spares, simply so a rebuild starts immediately and there is less (but not remotely zero) chance of multiple drive failures taking out an array.
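For a rough capacity comparison of the layouts discussed above (drive counts only, ignoring filesystem overhead; the exact grouping of the 38 non-spare drives into RAID6 sets is my assumption):

    # Back-of-the-envelope usable capacity for 40 drives.
    TOTAL = 40

    # Mirror of two 20-disk RAID6 + hot spare arrays (the RAID1
    # suggestion above): each half is 17 data + 2 parity + 1 spare.
    mirrored_raid6 = 20 - 2 - 1

    # 8-drive RAID6 groups with 2 hot spares: 38 drives left to group,
    # assumed here as four groups of 8 plus one of 6.
    groups = [8, 8, 8, 8, 6]
    raid6_groups = sum(g - 2 for g in groups)

    print(f"mirrored RAID6 arrays:  {mirrored_raid6}/{TOTAL} drives usable")
    print(f"RAID6 groups + spares: {raid6_groups}/{TOTAL} drives usable")

Roughly 17 versus 28 drives' worth of usable space, which is exactly why straight RAID1 smarts.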
The Backblaze approach, with 45 drives hanging off nine 5-port port multipliers, is a pretty darn good one for this type of system. I've looked at port multipliers in the past but never managed to find them available anywhere for consumers to purchase; Backblaze seems to have solved that problem for the world.
FreeNAS with ZFS is probably the best easy, non-customized way to manage a system like this. If you have 45 devices, I'd say 3 hot spares plus three 14-device raidz2 (or raidz3) vdevs, combined into a single pool, sounds right. That leaves 36 or 33 devices' worth of effective storage.
You're still going to have to do backups, though, because if something unforeseen happens that causes some of the wrong disks to go out simultaneously, your data is toast.
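As a toy model of that risk: a 14-wide raidz2 vdev that has already lost one drive dies if two more of the remaining 13 go before the rebuild finishes, and losing any vdev loses the pool. The per-drive probability below is invented, and real failures correlate (same batch, shared heat, rebuild stress), so treat the answer as a lower bound:

    # Chance a degraded 14-wide raidz2 vdev dies mid-rebuild, assuming
    # independent failures at an invented per-rebuild-window rate.
    from math import comb

    p = 0.002      # assumed chance a drive dies during the rebuild window
    n = 13         # survivors in a degraded 14-wide raidz2 vdev

    # raidz2 minus one failure leaves one disk of redundancy; two more
    # losses among the 13 survivors kill the vdev (and the pool).
    dies = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2, n + 1))
    print(f"vdev death during one rebuild: {dies:.4%}")

Small per rebuild, but it compounds over years of rebuilds, and correlated failures make it worse. Hence the backups.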