The Data Storage Conundrum

We all need to store data: our documents, photos, music files, video files, and more. As time goes on, we have more and more data to store. In addition, we need to back up all that data. I have often said that it is not a question of whether a hard drive will die, but when it will.

As such, developing a strategy for storing data can be complicated. You have data on your computer, and if you have a large music and/or video library, you most likely have additional data on an external hard drive. In addition, you need backups for all that data. The best backup strategy includes multiple backups: one or more Time Machine backups, clones of your startup drive, and redundant backups of your media. Never forget that one backup isn’t enough: you should always have at least two, in case you lose your original data and you find that your backup is corrupted.

I have a 27″ iMac with a 256 GB internal SSD, and a 4 TB external drive for my media. I also have an additional 2 TB drive for other data: software installers, archives, and other miscellaneous files.

I use two Time Machine drives to back up my startup drive and my music library. I have two redundant backups for my media drive; this means that my music files are backed up both by Time Machine and these redundant backups. My video files, mostly rips of DVDs and Blu-rays that I own, are only backed up twice. As for that extra 2 TB drive, it, too, has double backups.

All this comes at a price. I have lots of hard drives. I have a total of five units, four of which each hold two hard drives. Two of these units are connected to my Mac by Thunderbolt, and the other three are USB-3 drives.

I would love to simplify this. I would love to have, say, one unit to store all my data, and another unit to back it up. But it’s not that simple. I’m not comfortable with a RAID unit, because the data is not recoverable unless the hard drives are in the exact same RAID unit. In addition, RAID units are noisy. Since they have so many drives, and processors, they need fans. All of the hard drive units I have are fanless, and the only noise they make is that of the hard drives spinning. My drives are in a shelf unit with boxes in front of them to dampen the noise.

You can buy enclosures that hold multiple drives and don’t use RAID, or configure a RAID unit as JBOD, or “just a bunch of drives.” In that case, each drive appears as a single drive on your computer, whereas a RAID unit shows all of the storage as if it were one drive. But these devices have the same problem: they have fans, and they are noisy.

Another option is using network drives. They would allow me to use either a RAID unit or a multiple-drive enclosure in a location other than my office. However, the limitation of network speed would be problematic at times. Gigabit ethernet may sound fast, but when you’re copying a lot of files, it’s not. Both Thunderbolt and USB-3 are much faster. As such, any device that is connected to a computer will copy files more quickly. This isn’t a big problem for, say, incremental backups, where only new or changed files get copied. If these happen over the network in the background, it doesn’t slow much down, and since these generally run at night (with the exception of Time Machine backups), I wouldn’t notice them anyway. But when you do need access to large files, it is slow. In addition, I would have to run an ethernet cable into another room, because Wi-Fi isn’t fast enough.
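To put rough numbers on the speed gap, here is a minimal sketch that estimates how long a full copy of a 4 TB media drive would take over each connection. The rates are nominal link speeds, which I am assuming as best-case figures; real transfers are slower, and with spinning hard drives the drive itself is often the bottleneck. The function name `hours_to_copy` is just for illustration.

```python
# Nominal link rates in gigabits per second. These are assumed best-case
# figures; real-world throughput is lower, especially with spinning drives.
NOMINAL_GBIT_PER_S = {
    "Gigabit Ethernet": 1.0,
    "USB 3.0": 5.0,
    "Thunderbolt (first generation)": 10.0,
}


def hours_to_copy(size_tb: float, rate_gbit_per_s: float) -> float:
    """Return the hours needed to move size_tb terabytes at the given rate."""
    bits = size_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (rate_gbit_per_s * 1e9)
    return seconds / 3600


for name, rate in NOMINAL_GBIT_PER_S.items():
    print(f"{name}: {hours_to_copy(4, rate):.1f} hours for 4 TB")
```

Even at these ideal rates, a full 4 TB copy over gigabit ethernet takes most of a working day, while an incremental backup of a few gigabytes takes well under a minute on any of these connections.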

So what’s the solution? For now, I haven’t found an ideal solution. Perhaps larger hard drives will make all of this easier: instead of needing, say, two 4 TB drives, one 8 TB drive would be enough. So I could cut the number of drives I use in half. But I still need at least two separate drives for Time Machine backups, and at least two separate drives to back up my media files. So I’m not even sure that larger drives will make that much of a difference. Because of the fragility of hard drives, storing data really is a conundrum.
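The arithmetic behind that doubt can be sketched in a few lines. This is a simplified model, not an inventory of my actual setup: it assumes each data set needs one original plus a fixed number of separate backup copies, and that a copy can span several drives. The function name `drives_needed` and the default of two backups are assumptions for illustration.

```python
import math


def drives_needed(data_tb: float, drive_tb: float, backups: int = 2) -> int:
    """Count the drives for one original plus `backups` separate copies."""
    per_copy = math.ceil(data_tb / drive_tb)  # drives to hold a single copy
    return per_copy * (1 + backups)


# 8 TB of data: doubling the drive size halves the count.
print(drives_needed(8, 4))  # 6 drives of 4 TB
print(drives_needed(8, 8))  # 3 drives of 8 TB

# 4 TB of data: bigger drives change nothing, the copies set the floor.
print(drives_needed(4, 4))  # 3 drives
print(drives_needed(4, 8))  # 3 drives
```

Larger drives only help while a single copy spans more than one drive. Once each copy fits on one drive, the count is fixed by the number of copies.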

12 thoughts on “The Data Storage Conundrum”

  1. I agree completely. I’ve also not found a clean and simple backup strategy. I have an iMac with less than 1TB of data on the internal drive. I also have an external drive connected by Thunderbolt that holds all my media files (about 3TB of data).

    For backups, I use two 5TB drives (one connected to my iMac, one stored offsite, alternating them about once per month). Each 5TB drive is configured in two partitions, a 1TB partition and a 4TB partition. The iMac backs up to the 1TB partition, the external drive backs up to the 4TB partition (Carbon Copy Cloner).

    I also separately back up the iMac to the cloud (CrashPlan), as well as some files from my external drive.

    All this works, but becomes more cumbersome as data volumes increase.

    • Unfortunately, since I moved a bit more than a month ago, I don’t have the upstream bandwidth for cloud backups. But that was part of my backup strategy for a while.

  2. No offsite back-ups? What about a thief that steals all your drives or a fire that burns your house down? Unlikely, sure, but so are multiple drives failing at the same time. It may be hard to backup 4TB remotely, but you should at least have a backup HD, or HDs, you store off site.

    • I do have a solution for that as well. But that’s not what the article is really about; it’s more about the stuff in my home office, and how I manage it and access it.

  3. I use a Synology NAS at home but would recommend the solution I use at work: a Promise-brand Thunderbolt array. It’s RAID and it’s been reliable and fast.

    I can back up over the cloud, so I use Backblaze in addition to Time Machine. For the NAS I use Amazon.

  4. I’m in the midst of figuring this out at home, and it’s a pita. I experimented with a consumer Seagate 2 bay NAS last winter, but gave up on it, even though it was easy to manage, and a bit faster than using a 2011 mini as a file server. The CPU wasn’t nearly fast enough to handle encryption though, and I also decided that I’d rather stick with HFS+ than multiple ext4 partitions per disk (even in jbod), since I already have mac disk recovery tools that I’m comfortable with. I also simply don’t trust raid, unless it’s multiply backed up, preferably not to the same brand of raid, because several times in the last couple of years at work I’ve seen low end raid arrays just fall apart, with no recovery possible. Disks were all fine, it was other glitches, some hardware, some software. I just don’t need the potential lack of downtime or faster speeds, which is what raid is good at.

    I made a table of the kinds and amounts of data I have, whether they and/or their backups should be encrypted, their backup priority, and a wild guess at how much the amount of data will grow in 5 years. I.e. there’s no point to backing up DVD rips, since it’s a large amount of space, I still have the originals, and watching videos of any sort is a low priority for me. Music rips I do want to backup, because it took a long time to fiddle with the metadata. But there’s no point in encryption if that would increase complications, and multiple backups would probably be overkill, though I am trickling them out to Crashplan on a low priority.

    For almost everything else, in principle I keep three backups. Time Machine for versioning, Crashplan for out-of-region offsite (often weeks behind because of my slow DSL), and daily Chronosync mirrors to encrypted drives that I can rotate for local offsite every month or so. But since I’m slowly rearranging everything, some of those backups aren’t getting done. I got a new mini that’s becoming the file server for several old macs, and the plan is to have everything, originals and backups, fully encrypted. Ultimately, I’ll probably have two USB 3 4-bay jbod enclosures (mix of 2TB – 4TB drives), one for originals, one for on site backups, plus a 6 TB media drive served by a different mini, and a couple of small drives to rotate offsite for the more important stuff. I got the first big enclosure a couple of months ago, the OWC Elite Pro Qx2. I was worried about noise, but it’s been great, maybe because my ears don’t work all that well anymore. But it only revs up the fan if it has to, which is less often than I expected. It’s about 18″ from my ear, and even when I first set it up I barely noticed it. Now I don’t hear anything but the drive chuckles.

    For photography, I’ll probably end up with a small drive just for that, so I can attach it locally to whatever machine I’m working on. GB ether is almost tolerable if you have jumbo frames on, but either FW 800 or USB 3 is better. I’ll experiment with ethernet over thunderbolt if I ever get a new work-horse mac.

    It’s all taking a long time to sort out, partly because I’m trying to weed duplicates of large files, and reorganize things. What I really want is three 1 PB thunderbolt 6 drives, one for real stuff and two for backups. That would keep me for decades unless I develop a yen to shoot 16K video…

    Do you still have and use the WD NAS you posted about a while ago?

    • I have the NAS, but I was testing it with two old 1 TB drives. I’ve been thinking of getting larger drives and setting it up for some of my backups. But I’m not sure if I should get 5 or 6 TB drives, or wait for 8 TB drives. The problem, as I explain in the article, is that it doesn’t matter if you have larger drives, you still need multiples for data security.

      One thing I kind of like these days is the small, portable, self-powered USB-3 drives you can get. They come up to 4 TB, and I was thinking of getting a half-dozen of them, connecting them to something via a hub, and using them for backups. The problem is that my Mac mini server – which I use for my video library with Plex – is too old to have USB-3, so I’ve been waiting for an update to the Mac mini to replace it. In that case, I may do all my backups that way, over the network.

      • My limited experience with an inexpensive (Transcend) USB 3 hub at work is that it’s not much more reliable with drives than USB 2. I get occasional unexpected ejects with a Corsair 128 GB flash drive that works perfectly when directly connected. I haven’t tried it with regular external hard drives because I really need to be able to count on good behavior when fiddling with other people’s stuff. It’s one of the reasons that I grumbled and went with the 4 bay enclosure(s) at home, and a couple of two bay sata docks for work. It’s quite possible that a higher quality hub would be ok, but plenty of testing seems in order before investing in drives to use with one.

  5. I think you have put your finger on a major problem with the current state of computing. Without re-stating here all of my own Heath Robinson solutions, the key point is: no-one offers an elegant, simple, reliable solution. Not at any price.

    Like you I have 3 desktop and 2 laptop computers with numerous USB hard drives which I must remember to rotate (including offsite) and reconnect regularly to a variety of automated scripts. And TEST — on more than one occasion I have found TM was not doing its job as specified, and the files I wanted could not be recovered.

    I use Carbon Copy Cloner (CCC) and have studied it extensively; an excellent program but still written from the simple perspective of providing a clone of one disk to another. Not surprising, but it means its functions have to be adapted to all the permutations of backup that I do by necessity, and manually. It would be nice if someone provided a backup management suite that handled all the permutations required.

    I do have cloud backup to a paid Dropbox subscription. That is a lot simpler and, of course, is offsite. It took days for the first iTunes Media backup but keeps up pretty well since then. I still need to figure out how to backup my photos (in Lightroom). Again, I was surprised that Dropbox themselves did not offer any advice or instructions on how to use their service for specific needs such as iTunes and Lightroom. There must be plenty of people who want to do the same.

    Even so, I would not rely on Dropbox alone. Which brings me back to your USB HD conundrum.

    • chronosync, by econtechnologies, may be of use for you. As reliable and sturdy as superduper/ccc but for more specific backing up/syncing.

    • I expect that Lightroom catalogs are SQLite or similar databases like Aperture/Photos libraries are. Databases need special care for backups, because if they’re backed up while running, the backup can be corrupted, and at least some data lost. The photos themselves should be fine since they don’t change. This is one reason why Aperture had the Vault backup scheme. It’s the equivalent of having SQL dump a set of restorable records in a sane fashion, then you can back up the dump. Time Machine has finally learned to be careful about Apple app databases, but I don’t know if it’s as careful about other companies’ apps. Since so many programs use databases these days, I try to remember to quit from pretty much everything each night so the chronosync backups will avoid possible pitfalls.

      This is one reason that it’s not a great idea to keep everything on Dropbox–it doesn’t know how to behave with active databases, and it really can’t cope with multiple computers trying to access the same data. If they have any file locking, it’s not very effective, since I’ve had several incidents of trying to unsnarl users’ Dropbox files, sometimes only possible if they also had a real backup.

      Dropbox is a poor backup in other respects, since it’s thoroughly designed as a sharing mechanism. Say you get hit with ransomware. Everything on Dropbox will also be encrypted, and that percolates to any other computers connected to those files. Ah, but dropbox keeps 30(?) days of versions! Unfortunately, the interface for restore is pretty much one file at a time through a really clunky web interface. Great for getting an older version back after you accidentally delete a file, not so much for recovering your life. There are the security implications too–because of the way they examine files for deduplication, they, and potentially hackers, can test against your files to figure out the contents. Not to mention that they have the at rest encryption keys, not you. Dropbox use is banned for any use at our university, not even including the lack of FERPA and HIPAA agreements. (Not that I can get anyone to actually stop using it, sigh.)
