P5 Backup and Archive to the Cloud: 6 questions answered

Archiware Spotlight

By David Fox

1. What cloud support does P5 offer for Backup and Archiving?

Since P5 version 5.5, Archiware has offered flexible support for a variety of cloud storage services in both the Backup and Archive modules. This Spotlight article will help you consider cloud storage as another option in your data management workflows and learn how P5 can assist in the implementation.

Archive and Backup to the cloud

In order to add cloud storage support, the existing P5 Backup and Archive modules and their storage formats, based around ‘pools’ and ‘volumes’, are used alongside new ‘cloud storage’ services. This means that existing workflows in P5 can be easily modified or augmented to include an element of cloud storage without making wholesale changes to the configuration.

Archiware supports a number of different cloud storage providers, and more will be added over time, providing great flexibility. Details of the currently supported providers are listed below.

Accessing cloud storage in P5 is covered by the same licensing that allows use of tape and disk storage. If you already own P5 licenses, you can start experimenting with cloud today, implementing some test workflows to learn more about what works best for you.

Let’s begin by looking generally at why cloud storage is interesting, and focussing on some of the positives and negatives so that you can understand the potential impact for you and your data-management environment.

2. Should I be using cloud storage? How do I decide what is right for me?

With cloud storage technology maturing and pricing and feature sets settling down, now is a good time to investigate – especially if you haven’t paid much attention so far, beyond using Dropbox for personal file storage.

When comparing cloud storage with on-site storage, we’re really considering a shift in various trade-offs, e.g. up-front investment versus perpetual subscription costs, or local ownership and control versus trust in, and dependency upon, a third party.

Let’s begin with a list of general positive and negative aspects of cloud storage to consider – these points apply equally to backup and archive.

Pros

  • No up-front capital expenditure – compared with the significant up-front cost of purchasing tape/disk hardware. This makes startup costs for cloud almost zero and testing workflows easy.
  • Reduced infrastructure expenses – technical staff costs, data-centre provision, cooling, power and insurance are all replaced by ongoing charges.
  • Unlimited storage capacity – provided you can pay for what you’re using. Infinitely expandable.
  • High levels of reliability and security – generally more than can be achieved by provisioning disk storage yourself, thanks to redundant duplication across multiple locations and to encryption.
  • Low maintenance – the cloud provider takes care of replacing ageing hardware while your data remains available throughout. Contrast this with ageing tapes/disks having to be replaced, involving costly data migration.
  • Access from anywhere – enabling collaboration, unlike local storage.

Cons

  • Perpetual expenses – based on the amount of data being stored. Contrast this with investing in hardware and then having several years’ use for minimal ongoing costs, e.g. power, cooling, maintenance contracts.
  • Transaction costs – in addition to the cost per GB per month, uploads and downloads can also be charged for, making storage of smaller files uneconomical.
  • Slow access speeds – limited by both your WAN speed and the speed the cloud provider allows. Unlikely to be as fast as on-premise solutions.
  • Fast WAN link required – potential investment required to improve existing WAN connection or install a dedicated connection for cloud storage access.
  • Increased recovery times – if you’re in a hurry, fast access to large restores may not be possible. This can be mitigated by the cloud provider shipping a hard drive, at additional cost.
  • Dependency on cloud provider – and on the reliability of your WAN connection.

3. How do I decide if Backup to the cloud is viable for me?

Quick primer: Backup of data is needed so that we can recover from a ‘disaster’ that has caused us to lose data. Backup therefore requires copying all of our important data and any ongoing changes over time. This copy needs to be located somewhere unaffected by the hypothetical disaster – so-called ‘off-site’ backup. Since cloud storage is, by definition, always off-site, it gets off to a good start in terms of suitability for backup.

Let’s begin with the basic research required before cloud backup can be considered:

  • How much data currently exists that needs to be backed up?
  • How much data is created and how much changes per day?
  • For how many days do we wish to retain previous file versions in the backup?
  • How fast can we upload/download data to our chosen cloud storage provider?
  • How much does the cloud storage cost – per GB per month?

Much of this can be gleaned by investigating the existing backup strategy. From this information, it should be possible to calculate:

  • How long the first backup (full backup) will take to complete to cloud storage.
  • How long each subsequent daily backup of changes will take.
  • The total amount of storage needed to keep the current and historical data (e.g. 30 days).
  • How much the cloud storage will cost.
  • How long we can expect a given restore to take, including a restore of everything in the backup.
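
The calculations listed above can be roughed out in a few lines of Python. This is only a minimal sketch: the function name, its parameters and every figure in the example are illustrative assumptions, not values from Archiware or from any cloud provider. It assumes each retained day adds the full daily change volume to the stored total, and it ignores compression, protocol overhead and transaction fees.

```python
def estimate_cloud_backup(total_gb, daily_change_gb, retention_days,
                          upload_mbit_s, download_mbit_s, price_per_gb_month):
    """Rough planning figures for a cloud backup (all inputs are assumptions)."""

    def hours_to_transfer(gigabytes, mbit_per_s):
        # 1 GB = 8000 megabits (decimal units); 3600 seconds per hour
        return gigabytes * 8000 / mbit_per_s / 3600

    # Current data plus one set of changes for each retained day
    stored_gb = total_gb + daily_change_gb * retention_days

    return {
        "first_full_backup_hours": hours_to_transfer(total_gb, upload_mbit_s),
        "daily_backup_hours": hours_to_transfer(daily_change_gb, upload_mbit_s),
        "stored_gb": stored_gb,
        "monthly_cost": stored_gb * price_per_gb_month,
        "full_restore_hours": hours_to_transfer(total_gb, download_mbit_s),
    }


# Hypothetical figures: 10 TB of data, 50 GB changing per day, 30 days retention,
# 100 Mbit/s upload, 400 Mbit/s download, 0.02 (any currency) per GB per month.
estimate = estimate_cloud_backup(10_000, 50, 30, 100, 400, 0.02)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

With these made-up figures the first full backup needs roughly 222 hours of continuous uploading, a little over nine days, while each daily backup takes just over an hour. Changing one input at a time shows quickly which trade-off matters most in your own environment.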

Some organisations are regularly dipping into their backup to restore overwritten or accidentally deleted files while others rarely touch it. Your own use pattern will help determine what works for you, based on the average size of a restore and how long that will take from cloud storage.

It’s possible to fine-tune these outcomes by, for example, reducing the retention period, choosing cheaper cloud storage, increasing your WAN speed etc. We’re dealing with a set of trade-offs that can be tweaked to produce the best end result.

Many cloud storage providers will also be able to send you a hard disk in the mail, allowing the initial backup, or a large restore, to be performed without all the data having to travel across a relatively slow WAN connection.


CALCULATING THE COST OF CLOUD STORAGE – On request, Archiware will provide you with an Excel sheet providing cost calculations for supported cloud storage. You can plug in your own figures and estimate storage costs for your own use-cases. Email info [at] archiware.com to request this valuable resource.


4. How do I decide if Archive to the cloud is viable for me?

When we archive data, we’re doing so for one or more of the following reasons.

  • Moving completed work away from expensive storage where it need no longer reside.
  • Safely retaining data for the long term, effectively indefinitely.
  • Guaranteeing that completed work is safe from deletion.
  • Reducing the amount of data we have to back up (disaster recovery).
  • Remaining legally compliant with clients’ requirements.

You can see that a cloud archive solution has quite different needs and constraints. As with our cloud Backup workflow, we still need to consider:

  • How much data exists today that needs to be archived?
  • At what rate will we continue to archive data over time?
  • How fast can we upload/download data to our chosen cloud storage provider?
  • How much does the cloud storage cost – per GB per month?

Again, we can now calculate timings and costs for our cloud archive to see if it’s viable. If we plan on restoring on a regular basis, then we should also factor in the cost of retrieving data – an additional cost from the cloud storage vendor.
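
Because an archive keeps growing, the monthly bill grows with it. The following Python sketch projects that growth and adds a retrieval charge for regular restores. The function name, parameters and all prices are hypothetical placeholders chosen for illustration; real providers may bill retrieval differently (per request, per tier, with minimum storage periods), so treat this as a starting point only.

```python
def archive_cost_by_month(initial_gb, added_gb_per_month, months,
                          price_per_gb_month, restored_gb_per_month=0.0,
                          retrieval_price_per_gb=0.0):
    """Return a list of (month, stored_gb, cost_for_month) tuples."""
    rows = []
    stored_gb = initial_gb
    for month in range(1, months + 1):
        stored_gb += added_gb_per_month  # archived data is kept indefinitely
        storage_cost = stored_gb * price_per_gb_month
        retrieval_cost = restored_gb_per_month * retrieval_price_per_gb
        rows.append((month, stored_gb, storage_cost + retrieval_cost))
    return rows


# Hypothetical figures: 20 TB to start, 1 TB archived per month, 0.01 per GB
# per month, and 200 GB restored each month at 0.03 per GB retrieved.
rows = archive_cost_by_month(20_000, 1_000, 36, 0.01, 200, 0.03)
total = sum(cost for _, _, cost in rows)
print(f"Stored after 3 years: {rows[-1][1]:,.0f} GB")
print(f"Cost in final month:  {rows[-1][2]:,.2f}")
print(f"Total over 3 years:   {total:,.2f}")
```

Setting the two retrieval parameters to zero shows the pure storage cost, which makes it easy to see how much regular restores add to the bill.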

Some cloud storage is custom designed for archiving. Having data immediately available for restore can be traded off against the overall cost of the storage. If you’re prepared to wait a few hours for restored data to become available, you can pay less overall for storage (see Amazon Glacier storage in the next section).

Archiware P5 Archive is well suited to having data stored in the cloud, because a local index including previews and metadata is always available to browse the contents of your archive. Only when you choose what you wish to restore is the cloud storage accessed and required files retrieved.

A hybrid archive uses two or more different types of storage to provide greater redundancy. An LTO tape archive can be augmented with a second copy of the same data stored in the cloud. When restoring, the user can pick which storage they wish to restore from. A small restore might be convenient and quick from the cloud, while bringing LTO tapes back onsite might be best for a larger restore job.

Archiving to the cloud with additional redundancy of storage can be achieved in P5 with a single provider, storing the data in more than one physical location, or by using two different cloud providers and archiving separately to each.

Finally, cloud storage is well suited as an Archive format because the task of replacing hardware as it ages over the longer term is undertaken by the cloud provider. If you’ve archived to LTO tape, as time passes, newer tape technology becomes available and your existing tapes get older. After some years, the data will need to be re-archived to newer tape technology so that it can continue to exist into the future. Contrast this with archiving to the cloud, where the cloud provider is responsible for replacing ageing hardware, the cost of which is built in to what you’re already paying for the storage.

5. Which cloud storage services does P5 support?

At the time of writing, the following cloud services are available for use with Backup and Archive in P5 version 5.5.3: Amazon S3, Generic S3, Backblaze B2 and Amazon Glacier.

What are the differences between those different services?

Amazon S3 is charged per GB per month. It is a good general purpose option and probably the most popular commercial cloud storage product at the time of writing. Amazon S3 data is redundantly stored across more than one facility in a chosen geographical region. The level of redundancy can be tuned, affecting the price (see https://aws.amazon.com/s3/faqs). There’s a lot to read about S3. Start by Googling for the AWS S3 pricing page to see what it will cost. Note that pricing differs slightly depending upon which of the many geographic regions you wish to have your data stored in.

Amazon S3 uses its own HTTPS-based protocol and organises data as objects that reside inside ‘buckets’. There are open source implementations of the same protocol, allowing you to build your own S3-compatible storage box or use someone else’s – e.g. https://minio.io/. The Generic S3 option allows P5 to use this generic storage, including some commercial options, e.g. Wasabi (https://wasabi.com).

Backblaze B2 is an Amazon S3 competitor and offers similar functionality. It provides somewhat less redundancy and is hosted only in the US at the time of writing. It is considerably cheaper, however, and comes from a company that has been providing its own cloud workstation backup service for many years. https://www.backblaze.com/b2/cloud-storage.html

Finally, Amazon Glacier comes from the same stable as S3 but has some unique attributes which make it attractive for certain archive scenarios. Glacier is considerably cheaper than regular S3 (around one fifth of the price in Jan 2018). Costs are kept low by offering a choice of retrieval speeds: Glacier provides three options for access to data, ranging from a few minutes to several hours. Recovery of files is performed by making a request and waiting for it to be executed. P5 will handle this process for you, but if you’re likely to require your data back quickly, this is not the best choice. https://aws.amazon.com/glacier/

6. How do I configure Archiware P5 to use cloud storage?

Note: A full description of the required setup options is beyond the scope of this article. There is a more detailed and technical description available here: http://news.jpy.com/kbase/2018/1/19/p5-cloud-storage-setup-and-best-practise-guide

First, a little historical perspective: prior to the introduction of cloud storage in P5 version 5.5, the concept of a ‘Volume’ was used to represent either a physical tape or a large ‘container’ file on disk. Volumes could be written to by both Backup and Archive jobs.

With the advent of cloud storage support in the newer version, P5 retains the same Volume concept, meaning that most of the required storage configuration remains the same and will be intuitive for those already familiar with the product. Volumes on disk are now stored as folders, rather than monolithic container files. Inside these volume folders, your data is stored in a number of smaller files, called ‘chunks’. Chunks work perfectly for cloud storage – they’re smaller than volumes, so provide granular upload and download of your data to the cloud.
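
To illustrate the general idea of a volume folder holding many small chunk files, here is a toy Python sketch. It is not P5’s actual on-disk format: the chunk size, the `chunk-000000` file naming and both function names are assumptions invented for this example. It only shows why chunking helps, namely that each small file can be uploaded or downloaded on its own.

```python
import tempfile
from pathlib import Path


def write_volume_as_chunks(data: bytes, volume_dir: Path, chunk_size: int) -> list:
    """Store `data` as numbered chunk files inside a volume folder."""
    volume_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for number, start in enumerate(range(0, len(data), chunk_size)):
        chunk_path = volume_dir / f"chunk-{number:06d}"  # hypothetical naming
        chunk_path.write_bytes(data[start:start + chunk_size])
        chunk_paths.append(chunk_path)
    return chunk_paths


def read_volume_from_chunks(volume_dir: Path) -> bytes:
    """Reassemble the data by reading the chunk files in name order."""
    return b"".join(p.read_bytes() for p in sorted(volume_dir.glob("chunk-*")))


# Write 10,240 bytes of sample data as 4,096-byte chunks, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp) / "volume-10001"  # hypothetical volume folder name
    payload = bytes(range(256)) * 40
    chunks = write_volume_as_chunks(payload, volume, chunk_size=4096)
    chunk_count = len(chunks)
    restored_ok = read_volume_from_chunks(volume) == payload

print(chunk_count, restored_ok)
```

In this toy version a failed transfer costs at most one chunk’s worth of re-sending, rather than the whole volume.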

Cloud storage is accessed by creating a ‘Cloud Service’ where we can provide the details from our account with the cloud provider, allowing P5 to connect and access the storage. We then configure a ‘Disk Library’ to allow creation of Volumes on disk, local to the P5 server. However, when the volumes we create are linked to a ‘Cloud Service’ via a storage pool, P5 automatically synchronises all data written to the volumes to the cloud storage and optionally removes the local copy, depending on choices made during setup.

Restoring is performed by browsing an ‘index’ of the data contained within these Volumes and making a selection of files/folders to recover. P5 then handles locating the data within the Volume to restore the requested data. If required, the relevant ‘chunks’ of a volume are copied back from cloud storage in order to facilitate the restore.
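
The restore flow described above, from index selection to the chunks that must be fetched, can be sketched conceptually in Python. The index layout used here (file path mapped to volume name, byte offset and length) and the function name are purely hypothetical and do not describe how P5 stores its index; the point is only that a local index lets the software work out which chunks to download before touching the cloud.

```python
def chunks_needed(index, selection, chunk_size):
    """Map each volume to the set of chunk numbers that hold the selected files."""
    needed = {}
    for path in selection:
        volume, offset, length = index[path]
        if length == 0:
            continue  # an empty file occupies no chunk
        first = offset // chunk_size
        last = (offset + length - 1) // chunk_size
        needed.setdefault(volume, set()).update(range(first, last + 1))
    return needed


# Hypothetical index: file path -> (volume name, byte offset in volume, length)
index = {
    "/projects/a/final.mov": ("vol-1", 0, 9_000),
    "/projects/a/notes.txt": ("vol-1", 9_000, 500),
    "/projects/b/master.wav": ("vol-2", 2_000, 3_000),
}
selection = ["/projects/a/notes.txt", "/projects/b/master.wav"]
to_fetch = chunks_needed(index, selection, chunk_size=4_096)
print(to_fetch)  # {'vol-1': {2}, 'vol-2': {0, 1}}
```

Only three chunks are requested in this example, even though the two volumes hold more data than was selected.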