Backup Window and Amount of Data
One of the most critical steps in evaluating the storage media requirements is determining the actual backup window. This is the amount of time during which backups are really allowed to run, keeping in mind that while they run they tie up the backup hardware, consume a major part of the network, and use resources on the systems being backed up. The backup window is becoming much harder to define. You must be able to determine the number of hours in a day and in a week that can be dedicated to backups, as this is an integral part of the equation to determine media requirements.
Based on the data we collected in the earlier chapters, you should now have a very good idea of how much data needs to be backed up each day and each week. What you need to know here is how much data must be backed up during each window. Generally, the largest backups will be the full backups. In the past, most administrators performed daily incremental backups and did all their full backups over the weekend, when most people were not working. This is changing. It is now very common for a fraction of the systems, say one-fifth, to receive a full backup each day while the remaining systems receive incremental backups. The weekends are reserved for maintenance or for catching up. If this is closer to your model, then your window is the time each day when backups are performed, and the amount of data is the average total backed up per day: roughly a fifth of your total data in full backups, plus the incremental backups of the remaining systems. If you do not have specific operational information on the amount of data that will make up your incremental backups, you can estimate it using a percentage of change. It is common to use 20 percent unless you have a more accurate measurement. The goal here is to get as close as possible to your actual environment.
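The rotating-full model described above can be sketched as a quick estimate. This is a minimal illustration, not a sizing tool: the function name, the 1,000 GB total, and the default fractions are assumptions chosen to mirror the one-fifth and 20 percent figures in the text.

```python
def daily_backup_gb(total_gb, full_fraction=0.2, change_rate=0.2):
    """Estimate the data moved in one daily window under a rotating-full schedule.

    full_fraction: share of the data that gets a full backup each day (assumed 1/5).
    change_rate:   share of the remaining data that changes daily (assumed 20 percent).
    """
    full_gb = total_gb * full_fraction                      # full backups today
    incr_gb = total_gb * (1 - full_fraction) * change_rate  # incrementals on the rest
    return full_gb + incr_gb


# Hypothetical environment with 1,000 GB of total data.
print(round(daily_backup_gb(1000), 1))
```

Replace the default change rate with a measured value as soon as you have one, since the incremental portion is the least certain part of the estimate.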
Drives
Now that we have the amount of data and the number of hours available to store that data, all we have left to do is some basic math. Just take the total amount of data that has to be backed up daily and divide by the duration of the daily backup window:
Ideal data transfer rate = Amount of data to back up ÷ Backup window
If you have 100 GB of data and an 8-hour window, your ideal data transfer rate would be 12.5 GB/hr.
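The same division can be written as a small helper, which is handy when you want to try several window lengths. The function name is an assumption made for illustration; the 100 GB and 8-hour values come from the example above.

```python
def ideal_transfer_rate(data_gb, window_hours):
    """Return the ideal data transfer rate in GB/hr for a given backup window."""
    if window_hours <= 0:
        raise ValueError("backup window must be greater than zero hours")
    return data_gb / window_hours


print(ideal_transfer_rate(100, 8))  # 12.5
```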
After you have an idea of the ideal data transfer rate, you can then look at the different drive types to see which might offer the best fit for your needs. Not surprisingly, this is a little more complicated than just looking at the base numbers, though. With potential drive technology, you must consider both performance and capacity. In larger enterprise environments, one size usually does not fit all. As mentioned several times, you need to look at the recovery requirements first and work back. This might mean you will need two different types of drives, some that are very high performance but with less capacity and some that offer higher capacity with lower performance. Data that is being kept for long retention periods, especially to fulfill legal requirements, might be better suited for the lower-performance but higher-capacity media. Data that might be required for immediate restores where time is money might be better suited for the high-performance media. It is not uncommon to have backups done to high-performance drives and media and then the images vaulted to high-capacity drives and media for off-site storage.
A sample of tape drive transfer rates, capacities, and access times is given in Table 4.1. This information can be very helpful in determining which drive technology you need, but never forget these are all theoretical numbers and are given without taking into account the internal drive compression. Drive manufacturers advertise compression rates for the different drive technologies. These vary depending on the drive but are also theoretical numbers. These specifications can change with new firmware levels or versions of the drives. To get the most accurate numbers, contact the drive vendor or go to their Web site, where you'll find up-to-date specification sheets.
DRIVE | THEORETICAL TRANSFER RATE, GB/HR (NO COMPRESSION) | THEORETICAL CAPACITY, GB (NO COMPRESSION) | ACCESS TIME (EXCLUDING LOAD TIME) | COMPRESSION |
---|---|---|---|---|
4mm (HP DDS-2) | 1.8 | 4 | | |
4mm (HP DDS-3) | 3.6 | 12 | | |
Mammoth | 11 | 20 | 60 sec | 2:1 |
Mammoth-2 | 42.4 | 60 | 60 sec | 2:1 |
DLT 4000 | 5.4 | 20 | 68 sec | 2:1 |
DLT 7000 | 18 | 35 | 60 sec | 2:1 |
DLT 8000 | 21.5 | 40 | 60 sec | 2:1 |
SDLT | 39.6 | 110 | 70 sec | 2:1 |
9840 | 36 | 20 | 11 sec | 2.5:1 |
9940 | 36 | 60 | 41 sec | 3.5:1 |
LTO | 52.7 | 100 | 25 sec | 2:1 |
AIT-2 | 21.1 | 50 | 27 sec | 2.6:1 |
AIT-3 | 42 | 100 | 27 sec | 2.6:1 |
When you start actually figuring how many of which kind of drive you will need, we recommend using the native transfer rates and capacities without compression. It is very difficult to estimate what kind of compression rate you will experience, as it is totally dependent on the makeup of your data. Some data is very compressible, while other data will yield very little compression. If you do your architecture based on no compression, the only surprises you should experience should be good ones; you will have plenty of capacity with room for growth.
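As a rough sketch of that drive-count arithmetic, the ideal transfer rate can be divided by a drive's native rate and rounded up. The rates below are copied from Table 4.1; the function name and the 1,000 GB workload are hypothetical, and the sketch assumes every drive can be kept streaming at its native rate for the whole window, which ignores mount time, network limits, and client speed.

```python
import math

# Native (uncompressed) transfer rates in GB/hr, copied from Table 4.1.
NATIVE_RATE_GB_PER_HR = {
    "DLT 7000": 18,
    "DLT 8000": 21.5,
    "SDLT": 39.6,
    "LTO": 52.7,
    "AIT-3": 42,
}


def drives_needed(data_gb, window_hours, drive):
    """Minimum number of drives of one type needed to move data_gb in the window."""
    required_rate = data_gb / window_hours  # ideal transfer rate in GB/hr
    return math.ceil(required_rate / NATIVE_RATE_GB_PER_HR[drive])


# Hypothetical workload: 1,000 GB in an 8-hour window (125 GB/hr).
for drive in NATIVE_RATE_GB_PER_HR:
    print(drive, drives_needed(1000, 8, drive))
```

Treat the result as a floor. Real throughput is usually lower than the theoretical rate, so leave headroom rather than buying exactly the computed number.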
Capacity
After selecting the appropriate drive technology that provides the performance and cartridge capacity you need, you next want to look at how many cartridges you will need to have available. This involves all the elements we have looked at so far. The number of cartridges required depends on the amount of data that you are backing up, the frequency of your backups, your retention periods, and the capacity of the media used to store your backups. A simple formula that can be used is as follows:
Number of tapes = (Total data to back up × Frequency of backups × Retention period)/Tape capacity
Following is an example:
Total amount of data = 100 GB
Full backups per month = 4
Retention period for full backups = 6 months
Incremental backups per month = 30
Retention period for incremental backups = 1 month
Preliminary calculations:
Size of full backups = 100 GB × 4 per month × 6 months = 2.4 TB
Size of incremental backups = (20 percent of 100 GB) × 30 × 1 month = 600 GB
Total data stored = 2.4 TB + 600 GB = 3 TB
Solution:
Tape drive = DLT 7000
Tape capacity without compression = 31.5 GB
Total tapes needed for full backups = 2.4 TB / 31.5 GB ≈ 76.2, rounded up to 77
Total tapes needed for incremental backups = 600 GB / 31.5 GB ≈ 19.05, rounded up to 20
Total tapes needed = 77 + 20 = 97
Based on this example, you should expect to have a minimum of 97 active cartridges at any given time. That figure assumes no compression and assumes that every cartridge is filled to capacity, with no unused tape. It does, however, give you an idea of the steps necessary to plan for an appropriately sized tape library. We would never recommend implementing an enterprise backup strategy that does not include a robotic tape library with a barcode reader. Without them, media management can become overwhelming and very susceptible to human error. It is much better to turn media management over to an enterprise backup application.
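The worked example can be reproduced with a short script, which makes it easy to rerun the numbers for your own environment. The function names are assumptions for illustration. The 31.5 GB capacity is taken from the example as written; Table 4.1 lists 35 GB native for the DLT 7000, so pass in whichever capacity matches the media you actually use. Terabytes are treated as 1,000 GB, as in the example.

```python
import math


def tapes_needed(data_gb, backups_per_month, retention_months, tape_capacity_gb):
    """Tapes required to hold one class of backup for its full retention period."""
    stored_gb = data_gb * backups_per_month * retention_months
    return math.ceil(stored_gb / tape_capacity_gb)  # partial tapes count as whole tapes


def total_tapes(total_gb, tape_capacity_gb, change_rate=0.2):
    """Combine full and incremental backups using the schedule from the example."""
    fulls = tapes_needed(total_gb, 4, 6, tape_capacity_gb)                 # 4/month, kept 6 months
    incrs = tapes_needed(total_gb * change_rate, 30, 1, tape_capacity_gb)  # 30/month, kept 1 month
    return fulls, incrs, fulls + incrs


print(total_tapes(100, 31.5))  # (77, 20, 97)
```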
When figuring out how many slots are required to support your environment, do not forget to include some slots for cleaning tapes and at least two for the catalog backups. Actually, you will want to reserve twice as many slots for catalog backups as are needed so you can keep a copy of the catalog. If you are including an off-site storage solution of some type (vaulting) as part of your backup strategy, you need to include this in your total capacity calculations, since creating duplicate copies requires additional tapes.
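A slot count along these lines might be sketched as follows. The function name and the default of two cleaning tapes are assumptions, as is the idea that duplicate tapes occupy library slots until they are ejected for off-site storage. The two catalog tapes and the doubling for a catalog copy follow the text.

```python
def slots_required(data_tapes, cleaning_tapes=2, catalog_tapes=2, duplicate_tapes=0):
    """Estimate the library slots needed for data, cleaning, catalog, and vault tapes.

    Catalog slots are doubled so that a copy of the catalog can be kept.
    duplicate_tapes covers vaulting copies that sit in the library before ejection.
    """
    return data_tapes + cleaning_tapes + 2 * catalog_tapes + duplicate_tapes


# The 97 data tapes from the capacity example, with no vaulting.
print(slots_required(97))  # 103

# Same environment, also duplicating the 77 full-backup tapes for off-site storage.
print(slots_required(97, duplicate_tapes=77))  # 180
```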
Library?
As stated in the previous section, most enterprise backup strategies will include some type of robotic tape library. There are several library manufacturers, each with an entire line of libraries from small to very large. Part of this decision will be based on the drive technology you select, as some libraries support only certain drives. The considerations for selecting a library are as follows:
- Does it handle the desired drive type?
- Will it handle the required number of drives?
- Does it support the needed number of slots?
- Does it have expansion capability?
- What type of connection does it use, SCSI or Fibre Channel?
- Does it support barcode labels?
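One way to apply this checklist to a shortlist of candidates is to record each library's specifications and report which requirements it fails to meet. Everything in this sketch is hypothetical: the `Library` fields, the two candidate libraries, and their numbers are invented for illustration and do not describe any real product. Expansion capability is recorded but not enforced, since it matters for future growth rather than for today's requirements.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    drive_types: set      # drive technologies the library accepts
    max_drives: int
    max_slots: int
    expandable: bool      # informational only in this sketch
    connections: set      # e.g. {"SCSI", "Fibre Channel"}
    barcode_reader: bool


def unmet_requirements(lib, drive_type, drives, slots, connection):
    """Return the checklist items this library fails; an empty list means it fits."""
    unmet = []
    if drive_type not in lib.drive_types:
        unmet.append("drive type")
    if lib.max_drives < drives:
        unmet.append("drives")
    if lib.max_slots < slots:
        unmet.append("slots")
    if connection not in lib.connections:
        unmet.append("connection")
    if not lib.barcode_reader:
        unmet.append("barcode")
    return unmet


# Hypothetical candidates, not real products.
lib_a = Library("Library A", {"DLT 7000", "DLT 8000"}, 4, 120, True, {"SCSI"}, True)
lib_b = Library("Library B", {"LTO"}, 2, 60, False, {"SCSI", "Fibre Channel"}, True)

for lib in (lib_a, lib_b):
    print(lib.name, unmet_requirements(lib, "DLT 7000", 2, 103, "SCSI"))
```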
As you look at the different libraries available, you should also consider if your strategy is best served by one large library that contains all the drives and media or by smaller libraries that are distributed throughout your enterprise. We will discuss some of the reasons for picking one or the other in a later chapter, but part of this decision is whether you plan to implement a SAN or distributed media servers (or both). Generally, it is cheaper to buy one large library than two smaller libraries that equal the same capacity in drives and slots.
Library vendors include ADIC, ATL, Compaq, Exabyte, Fujitsu, HP, IBM, NEC, Sony, Spectra Logic, and StorageTek. Each of these companies has a Web site that describes its entire line of libraries and is an excellent place to go for current information.