To Backup or to Archive, That is the Question!

Both use tape…!?
December 3, 2012 – PresSTORE comes with a Backup and an Archive module. Both can use tape to save a copy of the data offline. So what exactly is the difference then?

To back up means to create a safety copy. Ideally, one will never need to use this copy. However, equipment fails and accidents happen. That is when you need to reach for the backup and recover either parts or all of your data. Accidents generally come in two kinds: either the online hardware fails and the data is lost in its most recent state, or the data is accidentally or maliciously altered or deleted and you want to go back in time and recover the latest state prior to this event. For this purpose, you would like to save your data as often as possible and keep the copies for as long as possible. Furthermore, you want your data in the backup to be organized exactly the same way as it originally was on the file system, and to use a minimum of space on your offline system. PresSTORE Backup is tailored for and equipped to offer you exactly that.

To archive means to preserve, to put away for safe and long keeping. Going about your everyday business produces data, lots of data. While you are working, this data resides in online storage, on the fastest and most expensive disk. Usually, once the work is done, the data produced is no longer required online, but must be kept, perhaps for legal purposes, or to be re-used and incorporated into new work at a later time. Disk storage, however, is limited and expensive in terms of space, energy consumption and maintenance. It is therefore a logical step to free disk space for new work by moving old work offline to a less expensive, less energy-hungry and less maintenance-intensive medium: tape. Since you will probably not be accessing this storage for a long time, you will want to organize it in such a way that you can easily find your work long after you have forgotten where it was located in the file system, what the files were named or what they contained. For this purpose you will want to tag your work with metadata, create preview pictures of the data and organize the finished work into collections or projects. You can easily do all this using the PresSTORE Archive module.

Whereas Backup tends to be done cyclically, re-using the space on tapes and replacing older copies of data with newer copies, Archive is a continually growing store whose tapes are written once and kept forever. The Backup tape storage is of the same order of magnitude as the file system it is backing up. Over time, the Archive storage will probably exceed the disk storage size by several orders of magnitude.
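
The difference in growth can be illustrated with a small simulation. This is only a sketch under simple assumptions, not a description of how PresSTORE manages its tapes: every backup cycle writes one full copy of the file system, only a fixed number of copies is retained, and every cycle hands a constant amount of finished work to the archive. The function and parameter names are hypothetical.

```python
from collections import deque


def simulate(fs_size_tb, archived_per_cycle_tb, cycles, retained_backups):
    """Return (backup_tb, archive_tb) on tape after the given number of cycles."""
    # Cyclic backup: once the pool is full, the oldest copy is overwritten.
    backup_pool = deque(maxlen=retained_backups)
    # Archive: written once and never recycled, so it only grows.
    archive = []
    for _ in range(cycles):
        backup_pool.append(fs_size_tb)
        archive.append(archived_per_cycle_tb)
    return sum(backup_pool), sum(archive)


backup_tb, archive_tb = simulate(
    fs_size_tb=10, archived_per_cycle_tb=2, cycles=100, retained_backups=3
)
print(backup_tb, archive_tb)
```

With these made-up numbers the backup pool stays at three copies of the file system, no matter how many cycles run, while the archive keeps growing with every cycle.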

Into the Details

The Backup and Archive modules are designed to optimally execute their respective tasks:

  • Backup tapes hold the corresponding backup index to allow bare metal recovery. Archive tapes do not store the archive index, because its large size would waste tape space.
  • Archive lets you attach additional information to the stored file to help find it later, such as previews, meta-data tags and an organization that is not bound to the file’s location on disk. The Backup module strictly reflects the file’s location in the file system.
  • Backup always shows a full state of the saved file system at a selected point in time. Archive allows defining different indexes for different purposes, each organized differently, for example for different work-groups.
  • Backup also allows saving only the changes made to the file system since the last backup. Archive saves all the files passed to it, no questions asked.
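
The last point can be sketched in a few lines. This is an illustration only and makes no claim about how PresSTORE detects changes: it assumes a file counts as changed when its modification time differs from the one recorded at the last backup, and all names are hypothetical.

```python
def files_for_incremental_backup(current, last_backup):
    """Return the paths that are new or modified since the last backup.

    Both arguments map a file path to its modification time.
    """
    return sorted(
        path for path, mtime in current.items() if last_backup.get(path) != mtime
    )


def files_for_archive(selected):
    """Archive takes every file handed to it, changed or not."""
    return sorted(selected)


current = {"a.mov": 1, "b.mov": 5, "c.mov": 3}
last_backup = {"a.mov": 1, "b.mov": 2}

print(files_for_incremental_backup(current, last_backup))
print(files_for_archive(current))
```

Here the incremental backup picks up only the modified `b.mov` and the new `c.mov`, whereas the archive stores all three files.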

Pitfalls

Nothing lasts forever. Materials fatigue and wear over time. Technologies change, rapidly. Even though tape vendors declare 30 years of shelf life, this is under ideal conditions. The truth lies far from that. A backup tape will last you 3 years if you are lucky. Each read of the tape shortens its lifetime, as does storing it in non-ideal conditions. The drives wear too, and a new technological generation is released every 2-3 years. The interfaces to the machines change. So it is not realistic to expect to be able to connect your drive to a machine after 10 years. Therefore, you must plan to migrate any data stored offline to a newer technology every now and then. LTO, for example, maintains read compatibility two generations back, so migration is recommended at the latest after the second generation switch (e.g. LTO-4 to LTO-6). PresSTORE provides functionality to ease migration.
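
The migration rule of thumb can be written down as a small check. This sketch assumes the two-generations-back read compatibility stated above; the window is kept as a parameter because it should be verified against the vendor's documentation for the drive generation in use. The function names are hypothetical.

```python
READ_BACK_GENERATIONS = 2  # assumption taken from the text; verify for your drives


def can_read(drive_gen, tape_gen, read_back=READ_BACK_GENERATIONS):
    """True if a drive of generation drive_gen can read a tape of tape_gen."""
    return drive_gen - read_back <= tape_gen <= drive_gen


def last_compatible_drive(tape_gen, read_back=READ_BACK_GENERATIONS):
    """Newest drive generation that can still read the tape."""
    return tape_gen + read_back


print(can_read(drive_gen=6, tape_gen=4))  # LTO-4 tape in an LTO-6 drive
print(can_read(drive_gen=7, tape_gen=4))  # one generation too late
print(last_compatible_drive(4))
```

Under this assumption an LTO-4 tape is still readable in an LTO-6 drive but not in the generation after it, which is why the data should have been migrated by then.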

With the falling cost of disk technology, attention has turned to backup to disk (or, God forbid, archive to disk). The pitfall here is that a disk which is not continually spinning but is committed to the shelf will probably not spin up when attached after a few years. This is due to gumming oil and other factors. Furthermore, keeping disks spinning is expensive, and even though disk storage has become cheap, it is still an order of magnitude more expensive than tape, and tape does not suffer the technological disadvantages of disk.

Lastly, “misusing” Backup in place of Archive is never a good idea. The drawbacks are not technological, but rather organizational in nature. The additional and different mechanisms that Archive offers take the burden of organization off your shoulders and make the data findable later on. The index organization supports that. When using Backup, you will quickly find yourself lost in your data, with no real ability to locate anything anymore. Furthermore, migrating to a new technology is only possible with Archive. And lastly, there is always the danger of overwriting your data due to the cyclic nature of backup and the automatic retention mechanisms that you explicitly have to override. All this makes Archive the right tool for archiving.

(Original by Sven Köster, revised by Ibrahim Tannir)