LTFS Reality Check

The fifth generation of the LTO (Linear Tape-Open) standard is known for embracing LTFS (Linear Tape File System). The goal of incorporating LTFS was to broaden LTO’s range of use, increase flexibility and facilitate the exchange and transport of data. It gives users disk-like browsing and OS-level file access without the need for dedicated backup software. But while LTFS does represent a major milestone for tape-based archiving, much work remains to be done before this technology can be considered fully developed.

Tape as a storage medium has historically had its share of detractors and opponents, and the need for positive information has likely been a factor in spurring the development of LTFS. This article will touch on the benefits of the marriage between LTO and LTFS, as well as the issues that still remain.

Tape and Its Reputation

One very interesting yet lesser-known fact about LTO is that the standard was created by combining the best features of several preceding technologies: the single-reel cartridge from DLT; the chip within the cartridge from AIT; the servo tracks from SLR; and error correction from DDS. This “Best of all Worlds” product is something of a rarity in the IT world, and it made LTO a highly professional and secure storage medium that has proven to be unmatched in cost and efficiency, particularly for long-term archiving and off-site storage. Yet tape is still much maligned, owing to misconceptions about reliability, speed and cost.

Now let’s have a closer look at LTFS. What exactly is LTFS, how is it implemented and what is the benefit for the user? LTO-5 introduced an option to partition tape media. Partition 0 (37GB in size) holds the index and metadata, whereas partition 1 holds the actual data as well as a backup copy of the index.

When a tape is read, the index is read from partition 0, loaded into the RAM of the connected computer and cached in a temporary directory. Changes are applied to this cached index, which is then written back to partition 0. In addition, a backup copy of the index is written after the stored data on partition 1. The index itself is an XML structure (in UTF-8) that holds the file names and directories of all the files written to tape. As a safeguard, the index is thus kept on the tape twice.
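To make the idea of an XML index more concrete, here is a minimal Python sketch that walks a small, hand-written index and lists the files it describes. The sample document and its element names (directory, contents, file, name, length) are simplified assumptions for illustration only; a real LTFS index carries many more fields and should be checked against the LTFS format specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified index. Not a complete LTFS index.
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<ltfsindex>
  <directory>
    <name>VOLUME01</name>
    <contents>
      <file><name>readme.txt</name><length>1200</length></file>
      <directory>
        <name>footage</name>
        <contents>
          <file><name>clip001.mov</name><length>734003200</length></file>
        </contents>
      </directory>
    </contents>
  </directory>
</ltfsindex>
"""


def list_files(directory, prefix=""):
    """Return (path, size_in_bytes) for every file below a directory element."""
    path = prefix + "/" + directory.findtext("name")
    entries = []
    contents = directory.find("contents")
    if contents is None:
        return entries
    for child in contents:
        if child.tag == "file":
            size = int(child.findtext("length"))
            entries.append((path + "/" + child.findtext("name"), size))
        elif child.tag == "directory":
            # Recurse into subdirectories, carrying the path along.
            entries.extend(list_files(child, path))
    return entries


root = ET.fromstring(SAMPLE_INDEX.encode("utf-8"))
files = list_files(root.find("directory"))
for file_path, size in files:
    print(file_path, size)
```

Because the whole directory tree lives in one document like this, a file listing can be shown from the index alone, without touching the data partition.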

(Almost) Like a Disk

When an LTFS tape (LTO-5) is inserted into a drive, it mounts and lists its files just like a disk (assuming the correct driver is installed). This is certainly an important improvement, since tapes written with older backup software did not mount by themselves. Mounting them would have been of little use in any case: every vendor wrote its own proprietary format or container files to tape, and because those container files hold all the attributes, ACLs, extended Finder attributes, etc., cross-platform compatibility depended on that format.

At present LTFS still has its issues with special characters in file names, as well as in the file path. Extended attributes are not always supported between platforms, and this generates warning messages that can sometimes be ambiguous. These conditions depend largely on which OS was used to write the LTO medium. To compensate, the user has to document this information for data exchange, which makes the process cumbersome.

Write, Read, Delete

To write a file to tape, you drag it onto the drive symbol or the window of the tape, just like you would with a disk. But this is where the similarity ends. Since tape is a linear medium, access to files works completely differently than with disks. Files are always appended after the data already written to the data partition. Deleting a file does not free up space. To actually reclaim the space, all the files that are to be kept must be read off the tape and written back again. Just reading a fully populated 1TB tape can take up to three hours. Writing back to tape can take just as long.
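The append-only behavior can be illustrated with a toy model in Python. This is only a sketch of the principle, not of how any LTFS driver is implemented; the class name, file names and sizes are made up for the example.

```python
class AppendOnlyTape:
    """Toy model of a linear, append-only medium (illustrative only)."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0   # space consumed on tape; never shrinks
        self.index = {}      # visible files: name -> size in GB

    def write(self, name, size_gb):
        if self.used_gb + size_gb > self.capacity_gb:
            raise IOError("tape full")
        # New data always goes after everything written before.
        self.used_gb += size_gb
        self.index[name] = size_gb

    def delete(self, name):
        # Only the index entry disappears; the data stays on the tape.
        del self.index[name]

    def free_gb(self):
        return self.capacity_gb - self.used_gb


tape = AppendOnlyTape(capacity_gb=1000)
tape.write("project_a", 400)
tape.write("project_b", 500)
tape.delete("project_a")
print(sorted(tape.index))  # ['project_b']
print(tape.free_gb())      # 100.0, deleting project_a freed nothing
```

In this model the only way to get the 400 GB back is to start over with an empty tape and write project_b again, which is exactly the rewrite cycle described above.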

For users accustomed to the non-linear access of a hard disk, USB stick or SSD, this linear process can be cumbersome. Marketing tape-based media as comparable to hard disks in this respect leads to unnecessary misunderstanding and disappointment. For example: “The tape can be utilized just like a hard disk or other removable media, including directory tree structures.” (www.trustlto.com). The issue of access rights further complicates things. Since the index is cached (in /tmp/ltfs) while the tape is in the drive, the user has to have access rights to its directory, as well as for the mount point of the LTO drive. (Oracle LTFS)

Search and Find

As with CDs, DVDs or disks, each LTFS tape holds a directory of all files contained on the medium. When searching for a file that might be located on one of several tapes, each of those tapes must be individually placed in the drive, and its index read before the directory can be displayed. Depending on the number of files on the tape, this process can take several minutes. If several tapes are to be screened, this can add up to half an hour or more just to locate a file. The actual reading process of the file also requires time, but is relatively fast at 140MB/s (assuming the host computer can support the data at this rate).

Traditional backup software like Archiware’s PresSTORE has a clear advantage over this procedure. All written files and versions are stored in an index that can be browsed or searched by the user at any time. When the desired file is located, the respective tape is called up and positioned at the exact location of the file, since this information is also stored.
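The principle of such a central index can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not PresSTORE’s actual data model; the class, the tape labels and the positions are hypothetical.

```python
from collections import defaultdict


class Catalog:
    """Toy catalog that remembers where every archived file lives."""

    def __init__(self):
        # file name -> list of (tape_label, position_on_tape)
        self._entries = defaultdict(list)

    def record(self, file_name, tape_label, position):
        self._entries[file_name].append((tape_label, position))

    def search(self, term):
        """Case-insensitive substring search; no tape needs to be mounted."""
        term = term.lower()
        return {
            name: list(locations)
            for name, locations in sorted(self._entries.items())
            if term in name.lower()
        }


catalog = Catalog()
catalog.record("spot_v1.mov", "TAPE-0001", 1024)
catalog.record("spot_v2.mov", "TAPE-0007", 88000)
catalog.record("budget.xlsx", "TAPE-0003", 512)

hits = catalog.search("spot")
for name, locations in hits.items():
    print(name, locations)
```

The search runs entirely against the catalog on disk. A tape is only requested once the user has picked a result, and the stored position tells the drive where to go.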

Vendor independent

One factor in favor of LTFS is that it is vendor-independent, since no software is required for reading and writing to tape; only driver software is needed. But it is here where the term “vendor-independent” can be somewhat misleading. Each of the three LTO drive manufacturers (HP, IBM and Quantum) creates their own driver software. Since each company supports OS X, Windows and Linux, there are altogether nine different drivers. How they will be kept up to date over the years is unknown.

Essentially, dependency on backup software vendors has been traded for dependency on drive manufacturers. Backup software vendors who support tape have invested heavily in gaining the necessary know-how and market share, and software development is at the core of their business. In contrast, drive manufacturers have the majority of their business invested in traditional storage solutions that are not using LTFS, thus giving them little incentive to develop and update their LTFS software. Users who have invested in their product can easily end up stuck with an outdated OS version or disk, losing access to their data. This recently happened when LTFS moved from version 1.0 to version 2.0 and new media would not even mount with older drivers (IBM troubleshooting).

Scalability

On a smaller scale, administering a few tapes and putting them in the drive to locate a file seems manageable. As the number of tapes increases this process becomes increasingly unwieldy. If seek time is relevant, LTFS can only be used for archiving a small number of tape volumes. It is certainly not feasible on a larger scale in a professional application.

Traditional archive software like PresSTORE P4 compiles a catalog of all tapes that are part of an archive. This catalog can be browsed and searched anytime, supported with media previews to streamline the process. Searching by file name, date of archiving, path or metadata (even individually formulated) is standard. Any file can be located within seconds. Multiple files can be combined in a restore selection. The physical tape is only called up by the software when starting the actual restore process, and in a tape library setting the tape will be mounted in the drive automatically.

With its emphasis on direct attached storage (DAS) at a single workstation, LTFS seems to underestimate LTO technology and its potential. For the user, the advantages of LTFS are more than offset by its high complexity and limited ease of use. With “datacenter on the desktop” and new technologies like Thunderbolt, there are additional challenges for the professional workstation user in the very near future. The more clearly those are presented and the more transparent they are, the more they will be embraced by users.

Who LTFS is for

Considering all the aforementioned factors, attributes and limitations, it is clear that LTFS is only appropriate in smaller computing environments. Nevertheless, maintaining a consistent archive structure and documentation to facilitate file location and retrieval is still crucial in all environments. The additional security of redundant, off-site backup is also critical. The total number of tapes should still be manageable manually, and it is thus clear that long-term archiving with LTFS is not advisable at this point. While LTO5 is an interesting medium for data transport due to its high data density and robust technology, long-term experience with driver development and future compatibility are still unknown factors.

LTFS is still something of a novelty for LTO tape. At this stage neither the idea nor the implementation is convincing. It bears many of the attributes of a professional medium, but with a somewhat ill-fitting feature set for most users. Communicating the real advantages of LTO tape more effectively and to a broader audience would go a long way toward attracting new users. One approach would be to emphasize the unrivaled TCO of tape versus disk. Tape drives and libraries could be designed to be more attractive, with better desktop compatibility for new users and applications. There is much left to do.

CatDV and P4 Archive Integration

After Apple announced that it would discontinue Final Cut Server, the search began for a good replacement. In many cases, the solution is CatDV by Square Box Systems.

CatDV has become a very popular asset management system. It can provide entire video workgroups with a DAM and automation solution.

Now there is an improved way to connect CatDV to P4 Archive, called Castor. This will help you connect your asset management to a tape storage system.

Castor is a script package that ties P4 to CatDV. More information about this great tool can be found at our partner’s website: www.provideotech.org/castor

For more in-depth information, please also check out this article, written by our reseller 318 Inc. www.techjournal.318.com/xsan/final-cut-server-eol’d-what-do-we-do-now

P4 Training MasterClass

Our English distributor JPY runs regular technical training sessions for their P4 reseller network. They have recorded one of these sessions and made it available as a free web resource.

In the P4 MasterClass you’ll learn how to install, configure and troubleshoot P4 via a collection of HD videos. Join David Fox as he presents a training course to engineers from UK resellers, including live demos, questions from the audience and whiteboard explanations.

Videos available at masterclass.jpy.com.

Get Your Head Out of the Cloud Part 3: Final Words on the Cloud

Backup is essentially a type of insurance – insurance against the loss of data. Your backup provider is providing the regular SAVING of your data as the insurance benefit. But you also need a guarantee for RESTORING your data. This is a critical, but often overlooked point. In an emergency you need your data right away, or at least within a short time frame, but no backup provider can realistically guarantee this, since anything outside of their own infrastructure is not under their control or responsibility.

Everyone has experienced fluctuations in transfer speeds in personal, non-critical use and no one really cares if a web page takes a few seconds more to load. But what happens if the restore that was supposed to take ten minutes suddenly drags on for more than two hours? Or even worse, just keeps failing? Who do you ask for help? The point is – there are no guarantees of speed, dependability or safety. If your data is critical to your production or your enterprise, one backup is not enough; you need a second one as a local reserve. This copy should be a duplicate of your data, which can be integrated into the production immediately, without the need for a restore. This copy should be created in the background during your production with the help of synchronization.

From a corporate perspective, online backup offers two important advantages: The costs are transparent and the service is scalable. The only important requirement is Internet access with the necessary upload and download capacity (depending on your backup volume). Unfortunately these high-performance connections are still very expensive. The biggest disadvantage is the lack of safety on the local level and the reduced availability of the data, e.g. in case of a total breakdown. The only remedy for this is an additional local synchronization of your data.

The deciding factor when thinking about online backup is the question of safety – whether one should entrust important company data to a third party. Even with encrypted data, there still remains a risk, because anything that is encoded can be decoded.

In conclusion, while Cloud backup is a great solution, it’s not appropriate for everyone and every purpose. This is particularly true for large data sets and/or operations needing fast and/or constant availability. A service that has no guaranteed data transfer rate may be a problem during upload and while you are restoring your data back to your systems. The idea of a failover solution within the Cloud may turn out to be useless, because you never know the variables you are dealing with.

What to take away from all this: If you would like to back up to the Cloud, do it with a local copy of your data. And what could be a better way to achieve this than to use PresSTORE Synchronize?

Get Your Head Out of the Cloud Part 2: Cloud Backup vs. Local Backup

Does data backup to the Cloud make sense and is it technically feasible? Consider these two important factors:

(1) How much data must be saved each day or night and

(2) What type of Internet connection is available?

Example: You have 20GB of modified or new data per day. Your uploading speed is 8 Mbit/sec (1 MB/sec).

You will need approximately six hours for the transfer of your data. Because the backup would overload the Internet connection during the day, the backup process should only be active in the nighttime. If the backup takes too long, the upload speed has to be increased by upgrading your bandwidth. For planning purposes, it would be necessary to obtain guaranteed transfer rates from your provider. But guaranteed bandwidth between two points is rarely possible, except perhaps at considerable expense and typically only in some major metropolitan areas.
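The arithmetic behind this example can be checked with a short Python sketch. It assumes decimal units (1 GB = 1000 MB) and a connection that sustains its nominal rate the whole time, which is the best case. The function name and the 1 TB restore scenario at the end are our own illustrative assumptions.

```python
def transfer_hours(data_gb, rate_mbit_per_s):
    """Hours needed to move data_gb gigabytes at rate_mbit_per_s megabits/s."""
    rate_mb_per_s = rate_mbit_per_s / 8       # 8 bits per byte
    seconds = data_gb * 1000 / rate_mb_per_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600


# The example from the text: 20 GB per day over an 8 Mbit/sec uplink.
nightly = transfer_hours(20, 8)
print(round(nightly, 1))  # 5.6 hours, i.e. roughly six hours

# Hypothetical worst case: pulling 1 TB back over a link of the same speed.
full_restore_days = transfer_hours(1000, 8) / 24
print(round(full_restore_days, 1))  # 11.6 days
```

Doubling the daily data volume doubles the transfer time, so a backup window that is comfortable today can be outgrown quickly.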

When handling large data files such as pictures, videos or music, keep in mind that data compression and byte-level differencing bring little or no benefit, because the data has already been compressed.

Even if backing up data to the Cloud presents no technical problems, it does not always make economic sense. There can be numerous disadvantages and limitations when compared to a local backup.

Costs

  • Cloud backup can represent potential savings if you can use your already existing Internet infrastructure at night, when it is not in use for other business purposes.
  • If the upload speed of your existing connection is not sufficient, additional costs for a more powerful connection must be added to the calculation.
  • Charges for the backup provider and/or storage space will also add to the cost.

Availability and Dependability

  • A dependable connection is critical to uncorrupted data backup. Typically, a large-scale data backup requires your system’s complete infrastructure for quite some time.
  • Another important consideration is how long you can afford to wait to restore your data in case of an emergency. The disaster recovery of a complete hard disk can take days over the Internet.

Safety and Security

  • How does the backup provider guarantee the safety of your data? Are there several redundant copies at various locations?
  • How is your data protected from unauthorized access by third parties? Is the data merely encrypted, or are there additional safety measures in place? Is your company permitted to store data off-site?
  • What happens to your data in case of termination of the service, and how is the restitution of important data provided for?

One effective method of evaluating the quality of the backup provider’s service is to ask for internationally recognized certifications such as SAS 70 Type II or ISO 27001. These certifications cover IT services ranging from infrastructure to the running of the system, its applications and the production of the software, as well as monitoring, reporting and the management of emergency and business continuity. This will at least provide some guarantee of the performance promised. (As noted earlier, this does not include a guarantee for the quality of the Internet connection between the backup provider and your company.)

Summary: Depending on your requirements and the amount of data, the costs for basic professional Cloud services can quickly add up. If you plan to transfer entire business groups to the Cloud, the costs will multiply accordingly.

Next up:  Get Your Head Out of the Cloud Part 3: Final Words on the Cloud

Get Your Head Out of the Cloud Part 1: Our Take on Cloud Computing

When we open our browsers in the morning for the latest news from the tech world, there is one topic that constantly follows us around. The Cloud.

The Cloud has opened the door to many new services that we have come to love, and that are making our lives easier. We wouldn’t want to do without our Basecamp, Campaign Monitor or Google Docs, and we are thankful for being able to access them from anywhere we want.

But not all services make sense in the Cloud, at least at the moment. One of them is data management and backup. It is tempting to think that you don’t need your own hardware anymore because you can simply push everything into the Cloud. This may turn out to be a mistake that you will regret if you don’t create your own particular Cloud Backup strategy.

For many applications, the Cloud makes good sense. But is backing up your company’s critical data to the Cloud practical and safe? What are the technical requirements for switching to Cloud computing and data storage? Is there really a potential for cost reduction? What are the actual costs you can expect? More importantly, what are the potential costs if things don’t go as planned?

What to consider with Cloud Backup

The term “Cloud computing” has no universally accepted definition, but it is generally used to refer to the approach of providing processing power, data storage or software services on demand, with these services accessed via Internet (public Cloud) or via local Intranet (private Cloud).

Not surprisingly, Internet firms like Amazon, Google and Yahoo have been the main drivers in advancing Cloud computing. The steadily growing number of users forced these companies to distribute their services over multiple systems and servers. This is the origin of distributed applications, data systems and data storage. The result was a significant boost of scalability, along with guaranteed availability of the necessary resources on demand, even at times of maximum load.

The use of redundancies (safety backups) makes Cloud computing clearly superior to conventional (i.e. non-distributed) applications in terms of safety and availability. But watch out: A lot of systems currently offered on the Internet do not scale over multiple systems. They are more like single server applications to begin with, and are only expanded as the number of users grows. It’s important to check with your provider to confirm the safety and immediate availability of your data.

Many applications are already offered online. These new solutions, which are handled completely by an external IT-provider and accessed via web browser, are called SaaS (Software as a Service). Because they don’t require any internal resources, it is tempting to use them within your production right away.

The potential for new and innovative solutions in the Cloud is huge, especially because they can reach a large market very quickly. How much of a company’s data is saved to the Cloud today or tomorrow depends largely on the costs, the technical possibilities and the safety requirements. Not every business application, however, is suitable for the Cloud.

Here are some factors to consider before moving your company’s data to the Cloud:

Dependability

Cloud Computing is only possible if a connection to the Internet is available.

  • But what happens if there is no connection?
  • What if the connection is spotty or intermittent?
  • Is the Cloud application of crucial importance for the company?
  • What is the speed of your connection to the Internet?

It is usually possible to purchase greater bandwidth for a faster Internet connection. The quality of your connection to your data, however, cannot be guaranteed. When sending your data to your backup facility, there are dozens of factors influencing the transfer rate. Many of them you cannot change; most of them you won’t even know about. Dozens of companies are involved in transferring your data from point A to point B, and nobody will give you any guarantees.

Remember that uploading speed is also critical for file sharing and backup services. Many Internet providers provide fast download speeds, but offer woefully slow upload speeds.

Safety

  • How is the safety of your data guaranteed? Is there a backup?
  • Who is liable in case of a breakdown or malfunction?
  • How is the security of the data guaranteed? (Access by third parties)

The high cost of a fast and dependable Internet connection is still an issue. The infrastructure to support large-scale Cloud computing is not yet mature enough to handle the massive amounts of traffic that can be required.

Next up:  Get Your Head Out of the Cloud Part 2: Cloud Backup vs. Local Backup

First Steps

We are starting a blog because we think it’s the best place to share our opinion on topics we care about.

Posts will be inspired by questions from customers, real world experiences from our developers, or just stuff we want to share with the world. Hope you enjoy reading it!