The title of this chapter reflects the three rules of system administration: 1) Backup; 2) Backup; and 3) Backup! Although this advice may sound trite, the number of people who have lost important or valuable data, not to mention all the configuration information they spent days getting correct, is enormous. Even if you don't have a tape drive or other backup storage device, get in the habit of backing up the most important pieces of information. This chapter looks at how to properly back up information.
If you run a system that has many users, network access, e-mail, and so on, backups are a very important aspect of the daily routine. If your system is used for your own pleasure and is not used for any important files, backups are not as important except as a way to recover your configuration and setup information. You should make backups either way; the difference is the regularity with which you make them.
A backup is a copy of the filesystem or files on part of a filesystem stored onto another medium that can be used later to recreate the original. In most UNIX systems, the medium used for backups is tape, but you can also use floppy disks or secondary and removable hard disks.
So many potential sources of damage to a modern computer system exist that they can be overwhelming. Damage to your hard disks and their filesystems and data can occur from hardware failures, power interruptions, or badly typed commands. Part of the potential for damage with Linux is the nature of the operating system itself. Because Linux is a multiuser and multitasking operating system, many system files are open at any moment. Almost every millisecond, data is being written to or read from a hard disk (even when the system has no users or user-started background processes on it). Also, Linux maintains a lot of information in memory about its current state and the state of the filesystems. This information must be written to disk frequently. When CPU processes are interrupted, system files and tables can be lost from memory. Disk files can be left in a temporary state that doesn't match the real filesystem status.
Although damage to a filesystem can occur from many sources, not all of which are under the control of the system administrator, it is the administrator's task to make sure the system can be restored to a working state as quickly as possible. Having a backup is sometimes your only chance of getting back lost information. Although the process of making backups can be tiresome and time-consuming, this inconvenience is often outweighed by the time required to recoup any lost information in case of problems. With utilities like cron available, the task of backing up is much easier, too.
One final aspect of backups you need to consider is where to keep the backup media after they have been used. For most home users, the only option is to store the tapes, drives, floppy disks, or other media in the same place as the Linux machine. Make sure the location is away from magnetic fields (including telephones, modems, televisions, speakers, and so on). For systems that are used for more than pleasure, consider keeping copies away from the main machine, preferably away from the same physical location. This type of off-site backup enables you to recover in case of a catastrophe, such as a fire, that destroys your system and backup media library.
By far the most commonly used medium for backups is tape, especially tape cartridges. Tape is favored because it has a low cost, a relatively easy storage requirement, and reasonable speed. The process of writing and reading data from a tape is reliable, and tapes are portable from machine to machine. All you need, of course, is a tape drive. If you don't have one, you need to find another usable medium for backups.
Possible alternative media include removable hard disks of many different types, such as the Iomega Bernoulli or ZIP drives. These cartridges use magnetic head technology just like a normal hard drive. You can remove these disk-platter systems, which usually come in a protective cartridge, from the main system and store them elsewhere. You can then cycle through several of these disks as you would with tapes. In some cases, removable cartridges are available for a competitive price compared to tape cartridges, although some high-capacity removable cartridges cost more (but also offer more storage). The cost of the removable cartridge drive varies depending on the capacity, manufacturer, and technology, but it is also competitive with a tape drive in many cases.
Several new magneto-optical cartridge systems for DOS and Windows are usable under Linux, too. These systems tend to be small 3.5-inch cartridge systems that fit into a small drive unit. A 230M magneto-optical cartridge and drive can cost less than some tape drives, and they present a more secure backup medium because magneto-optical systems are not susceptible to magnetic fields. They have a potentially longer life, too. Large-capacity magneto-optical systems, now approaching 2.4G, are currently available, although they tend to cost as much as a new computer.
Another possibility is another hard disk. With the price of hard disks dropping all the time, you can add another hard disk just for backups to your system (or any other system connected by a network) and use it as a full backup.
The popularity of writable CD-ROM and WORM (write once, read many) drives makes them a possibility as well, although you must bear in mind that this type of media can only be written to once (the disks can't be reused). This type of media does have an advantage for archival purposes where you may need to prove certain file dates are accurate. CDs are also useful for permanent storage of important files like accounting records, personal letters, documents such as wills, and binaries. CD-ROM discs can hold 750M of data, although most consumer discs are designed for 650M.
Consider a floppy disk drive as a last resort backup device for large filesystems, although it is very good for backing up small files. High-capacity floppy disk drives are beginning to appear now, but the lack of Linux drivers makes them unusable for most backup situations.
One of the most important aspects of making backups is to make them regularly. Regularity is much more important for systems that support many users and have constantly changing filesystems. If your Linux machine is used only for your own purposes, you can make backups whenever you feel there is material that should be backed up.
For most systems with a few users, constant Internet access for e-mail or newsgroups, and similar daily changes to the filesystem, a daily backup schedule is important. You don't have to make a full backup of everything on your hard drives every day, but you should consider using incremental backups, which copy only those files that are new or have changed since the last backup.
Most UNIX system administrators prefer to perform backups during the night or early hours of the morning because few users are logged in, there is no real load on the CPU, and the system has the least number of open files at this time. Because backups are easily automated using cron (see Chapter 23, "The cron and at Programs"), you can set the exact backup time to minimize the impact on any other background processing tasks that the system may be running. Because you don't have to manually start the backup process, you can do it at any time. All the system administrator has to do in this kind of backup schedule is check that the backup was completed properly, change the backup media, and log the backup.
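As a rough sketch of the unattended nightly run described above, a small wrapper can run tar and print a single status line, which is the first thing to check the next morning. The function name nightly_backup, the script path and the 2:30 a.m. schedule in the sample crontab line are assumptions made for illustration; /dev/tape is the device name used in the tar examples later in this chapter.

```shell
# Hypothetical crontab entry (added with "crontab -e") that would run a
# script containing this function at 2:30 a.m., Monday through Friday:
#   30 2 * * 1-5 /usr/local/sbin/nightly_backup.sh

# nightly_backup DEVICE DIRECTORY
# Archives DIRECTORY to DEVICE and prints one status line for the log.
nightly_backup() {
    device=$1
    directory=$2
    if tar cf "$device" "$directory" 2>/dev/null; then
        echo "backup of $directory to $device completed"
    else
        echo "backup of $directory to $device FAILED"
        return 1
    fi
}

# Typical call from the script that cron runs (assumed device and directory):
# nightly_backup /dev/tape /usr
```

Because cron mails any output of a job to the job's owner on most systems, the status line normally reaches the administrator without further work.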
For those systems with a single user and a lightly loaded Linux system, backups can be done practically anytime, although it is a good idea to have the backups performed automatically if your system is on all the time. If your Linux system is only active when you want to use it, get in the habit of making a backup while you do other tasks on the system.
When DOS or Windows users move to UNIX, they sometimes have the bad habit of keeping a single tape (or other media) and continually recycling that one unit every time they make a backup. It is foolhardy to keep only one backup copy of a system as this prevents you from moving back to previous backups. For example, suppose you deleted a file a week ago and had it safely stored on a backup tape at that time. When you reuse the backup tape, the old contents are erased and you can never get the old file back.
Ideally, you should keep backup copies for days, or even weeks, before reusing them. On systems with several users, this habit is even more important because users only remember that they need a file they deleted two months ago after you have recycled the tape a few times. Some backup scheduling methods can help get around this problem, as you will see in a moment. The ideal backup routine varies depending on the system administrator's ideas about backups, but a comprehensive backup system requires at least two weeks of daily incremental backups and a full backup every week.
A full backup is a complete image of everything on the filesystem, including all files. The backup media required for full backups is usually close to the total size of your filesystem. For example, if you have 150M used in your filesystem, you need about 150M of tape or other media for a backup. With compression algorithms, some backup systems can get the requirements much lower, but compression is not always available. Also, you may need several volumes of media for a single full backup, depending on the capacity of the backup unit. If your tape drive can only store 80M on a cartridge and you have to back up 150M, you need two tapes in sequence for the one backup. Because the Linux system's cron utility can't change tapes automatically, full backups over several volumes require some operator interaction. Obviously, making a full system backup on low-capacity media (like floppy disks) is a long, tedious process because there are many volumes that must be switched.
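The volume count in the example above is a rounded-up division. The following minimal sketch shows the arithmetic; the function name volumes_needed is hypothetical, and the calculation ignores compression and any per-volume overhead.

```shell
# volumes_needed DATA_MB CAPACITY_MB
# Prints how many volumes are needed, rounding up to a whole volume.
volumes_needed() {
    data_mb=$1
    capacity_mb=$2
    echo $(( (data_mb + capacity_mb - 1) / capacity_mb ))
}

volumes_needed 150 80    # the example from the text: prints 2
```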
Incremental backups (sometimes loosely called differential backups, although strictly speaking a differential backup copies everything changed since the last full backup) back up only the files that have been changed or created since the last backup. Unlike DOS, Linux doesn't have a file indicator that shows what files have been backed up. However, you can use the modification date to effectively act like a backup indicator.
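One way to use the modification date as a backup indicator is to keep a marker file whose timestamp records the previous backup and to ask find for everything newer than it. This is only a sketch: the function name changed_since and the marker path /var/adm/last_backup are assumptions, not part of any standard backup tool.

```shell
# changed_since STAMP_FILE DIRECTORY
# Prints every regular file under DIRECTORY that was modified more
# recently than STAMP_FILE, the marker left by the previous backup.
changed_since() {
    find "$2" -type f -newer "$1" -print
}

# Assumed usage: list what has changed, then reset the marker once the
# backup has been made.
# changed_since /var/adm/last_backup /usr
# touch /var/adm/last_backup
```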
Incremental backups are sometimes difficult to make with Linux unless you restrict yourself to particular areas of the filesystem that are likely to have changed. For example, if your users are all in the /usr directory, you can back up only that filesystem area instead of the entire filesystem. This kind of backup is often called a partial backup, as only a part of the filesystem is saved. (Incremental backups can be made under any operating system by using a background process that logs all changes of files to a master list, and then uses the master list to create backups. Creating such a scheme is seldom worth the effort, though.)
How often should you back up your system? The usual rule is to back up whenever you can't afford to lose information. For many people, this criterion means daily backups. Imagine that you have been writing a document or program, and you lose all the work since the last backup. How long will it take to rewrite (if at all possible)? If the rewriting of the loss is more trouble than the time required to perform a backup, make a backup!
So how can you effectively schedule backups for your system, assuming you want to save your contents regularly? Assuming that your system has several users (friends calling in by modem or family members who use it) and a reasonable volume of changes (e-mail, newsgroups, word processing files, databases, or applications you are writing, for example), consider daily backups. The most common backup schedule for a small, medium-volume system requires between 10 and 14 tapes, depending on whether backups are performed on weekends. (The rest of this section uses tapes as the backup medium, but you can substitute any other device that you want.)
Label all backup tapes with names that reflect their use. For example, label your tapes Daily 1, Daily 2, and so on up to the total number of daily use tapes, such as Daily 10. Cycle through these daily use tapes, restarting the cycle after you have used all the tapes (so that Daily 1 follows after Daily 10). With this many tapes, you have a two week supply of backups (ignoring weekend backups, in this case), enabling you to recover anything going back two weeks. If you have more tapes available, use them to extend the backup cycle.
The backups can be either full or partial, depending on your needs. A good practice is to make one full backup for every four or five partial. You can make a full backup of your entire filesystem on Mondays, for instance, but only back up the /usr directories the other days of the week. Make an exception to this process if you make changes to the Linux configuration so that you have the changes captured with a full backup. You can keep track of the backups using a backup log, which is covered in the next section.
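The Monday-full, weekday-partial pattern just described could be expressed as a small helper that picks what to archive from the day of the week. The function name backup_target is hypothetical; the day numbering follows what date +%u prints (1 for Monday through 7 for Sunday).

```shell
# backup_target DAY_OF_WEEK   (1 = Monday ... 7 = Sunday)
# Prints the directory to archive on that day.
backup_target() {
    if [ "$1" -eq 1 ]; then
        echo "/"        # Monday: full backup of the entire filesystem
    else
        echo "/usr"     # other days: partial backup of /usr only
    fi
}

# Assumed usage, with the tape device named in this chapter's tar examples:
# tar cf /dev/tape "$(backup_target "$(date +%u)")"
```

After a configuration change, the helper would be bypassed and a full backup run by hand, as the text recommends.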
An expansion of this daily backup scheme that many administrators (including the author) prefer is the daily and weekly backup cycle. This backup system breaks up the number of tapes into daily and weekly use. For example, if you have 14 tapes, use 10 for a daily cycle as already mentioned. You can still call these tapes Daily 1 through Daily 10. Use the other four tapes in a biweekly cycle and name them Week 1, Week 2, Week 3, and Week 4.
To use this backup system, perform your daily backups as already mentioned, but use the next weekly tape when you get to the end of the daily cycle. Then you cycle through the daily tapes again, followed by the next weekly tape. (Your backup cycle is Daily 1 through Daily 10, Week 1, Daily 1 through Daily 10, Week 2, and so on.)
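The rotation above is regular enough to compute. The following sketch, which assumes exactly 10 daily tapes and 4 weekly tapes as in the example, prints the label of the tape to load for the Nth backup since the scheme started; the function name tape_label is made up for illustration.

```shell
# tape_label N
# Prints the tape to use for the Nth backup (N starts at 1), assuming
# 10 daily tapes and 4 weekly tapes.
tape_label() {
    n=$(( $1 - 1 ))
    slot=$(( n % 11 ))              # 10 daily slots plus 1 weekly slot
    if [ "$slot" -lt 10 ]; then
        echo "Daily $(( slot + 1 ))"
    else
        echo "Week $(( (n / 11) % 4 + 1 ))"
    fi
}

tape_label 11    # prints "Week 1"
tape_label 12    # prints "Daily 1"
```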
This backup cycle has one major advantage over a simple daily cycle. When the entire cycle is underway, there are 10 daily backups, which cover a two-week period. The biweekly tapes extend back over four complete daily cycles, or eight weeks. You can then recover a file or group of files from the filesystem as it was two months ago, instead of just two weeks. This backup method gives you a lot more flexibility in recovering information that was not noticed as missing or corrupt right away. If even more tapes are available, you can extend either the daily or biweekly cycle, or add monthly backups.
Many system administrators begin their careers by making regular backups, as they should. However, when they get to the point where they have to restore a file from a backup tape, they have no idea which tapes include the file or which tapes were used on what days. Some system administrators get around this problem by placing a piece of paper or sticky note on each tape with the date and contents on it. This solution means you have to flip through the tapes to find the one you want, though, which can be awkward when you have lots of tapes. For this reason, you should keep a backup log. (A log is a good idea for backups on other operating systems as well.)
Whenever you make a backup, you should update the backup log. A backup log doesn't have to be anything complex or elaborate. You can use the back of a notebook with a couple of vertical columns drawn in, use a form on the computer itself (which you should print out regularly, of course), or keep a loose-leaf binder with a few printed forms in it. A typical backup log needs the following information:
You can record these four bits of information in a few seconds. For larger systems, you can add a few other pieces of information to complete a full backup record:
The dates of the backup help you keep track of when the last backup was performed and also act as an index for file recovery. If one of your system users knows they deleted a file by accident a week ago, you can determine the proper backup tape for the file restoration from the backup log dates.
For convenience, keep the backup log near the system. Some administrators prefer to keep the log in the same location as the backup media storage instead. Some system administrators also keep a duplicate copy of the backup log in another site, just in case of catastrophe. Do what is appropriate for your system.
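If you keep the log on the computer, a plain text file with one line per backup is enough. The sketch below is only one possible layout: the function names log_backup and find_tape are hypothetical, and the four fields recorded (date, tape label, filesystem, and backup type) are an assumption chosen to fit the surrounding discussion.

```shell
# log_backup LOGFILE DATE TAPE FILESYSTEM TYPE
# Appends one tab-separated line to LOGFILE.
log_backup() {
    printf '%s\t%s\t%s\t%s\n' "$2" "$3" "$4" "$5" >> "$1"
}

# find_tape LOGFILE DATE
# Prints the tape label recorded for the backup made on DATE.
find_tape() {
    awk -F '\t' -v d="$2" '$1 == d { print $2 }' "$1"
}

# Assumed usage:
# log_backup /root/backup.log 1996-03-05 "Daily 2" /usr partial
# find_tape /root/backup.log 1996-03-05
```

Print the file regularly, because a log stored only on the disk it describes is lost along with that disk.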
The tar (tape archiver) program is usually the command you use to save files and directories to an archive medium and recover them later. The tar command works by creating an archive file, which is a single large entity that holds many files within it (much like PKZIP does in DOS, for example). The tar command only works with archives it creates.
The format of the command is a little awkward and takes some getting used to, but fortunately most users only need a few variations of the command. The format of the tar command is as follows:
tar switch modifiers files
The files section of the command indicates which files or directories you want to archive or restore. You probably want to archive a full filesystem such as /usr. In the case of recovery, you may want a single file such as /usr/tparker/big_file.
The switch controls how tar reads or writes to the backup media. You can use only one switch with tar at a time. The valid switches are as follows:
c | Creates a new archive |
r | Writes to end of existing archive |
t | Lists names of files in an archive |
u | Adds files that are not already in the archive or that have been modified since they were archived |
x | Extracts from the archive |
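You can add a number of modifiers to the tar command to control the archive and how tar uses it. The letters listed here follow the traditional UNIX tar; GNU tar, the version supplied with most Linux distributions, does not support all of them and assigns different meanings to some, so check the tar man page on your system. Valid modifiers include the following: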
A | Suppresses absolute filenames |
b | Provides a blocking factor (1-20) |
e | Prevents splitting files across volumes |
f | Specifies the archive media device name |
F | Specifies the name of a file for tar arguments |
k | Gives size of archive volume (in kilobytes) |
l | Displays error messages if links are unresolved |
m | Does not restore modification times |
n | Indicates the archive is not a tape |
p | Extracts files with their original permissions |
v | Provides verbose output (lists files on the console) |
w | Displays archive action and waits for user confirmation |
The tar command uses absolute pathnames for most actions, unless you specify the A modifier.
A few examples may help explain the tar command and how to use tar switches. If you are using a tape drive called /dev/tape and the entire filesystem to be archived totals less than the tape's capacity, you can create the tape archive with the following command:
tar cf /dev/tape /
The f option enables you to specify the device name, /dev/tape in this case. The entire root filesystem is archived in a new archive file (indicated by the c). Any existing contents on the tape are automatically overwritten when the new archive is created. (You are not asked whether you are sure you want to delete the existing contents of the tape, so make sure you are overwriting material you don't need.) If you include the v option in the command, tar echoes the filenames and their sizes to the console as they are archived.
If you need to restore the entire filesystem from the tape used in the preceding example, issue the command:
tar xf /dev/tape
This command restores all files on the tape because no specific directory has been indicated for recovery. The default, when no file or directory is specified, is the entire tape archive. If you want to restore a single file from the tape, use the command
tar xf /dev/tape /usr/tparker/big_file
which restores only the file /usr/tparker/big_file.
Sometimes you may want to obtain a list of all files on a tape archive. You can do this with the following command:
tar tvf /dev/tape
The t switch lists the files in the archive, and the v option makes the listing more detailed. If the list is long, you may want to redirect the output of the command to a file.
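The c, t, and x switches can be tried safely by pointing the f modifier at an ordinary file instead of a tape device. The following practice run is a sketch under that assumption; the directory and file names are invented for the exercise, and relative pathnames are used so that nothing is restored over live files.

```shell
# Scratch area; an ordinary file stands in for /dev/tape (assumption).
work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/restore"
echo "hello" > "$work/src/docs/note.txt"
echo "other" > "$work/src/docs/other.txt"

# c = create, f = archive name. Change directory first so the names
# stored in the archive are relative (docs/...).
(cd "$work/src" && tar cf "$work/practice.tar" docs)

# t = list the names of the files in the archive.
listing=$(tar tf "$work/practice.tar")
echo "$listing"

# x = extract. Naming one file restores only that file.
(cd "$work/restore" && tar xf "$work/practice.tar" docs/note.txt)
restored=$(cat "$work/restore/docs/note.txt")
```

The name given on the extraction command line must match the name shown in the listing, which is why it helps to run the t switch first.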
Most tapes require a blocking factor when creating an archive, but you don't need to specify a blocking factor when reading a tape because tar can figure it out automatically. The blocking factor tells tar how much data to write in a chunk on the tape. When archiving to a tape, you specify the blocking factor with the b modifier. For example, the command
tar cvfb /dev/tape 20 /usr
creates a new archive on /dev/tape that has a blocking factor of 20 and contains all the files in /usr. Most tapes can use a blocking factor of 20, and you can assume this factor as a default value unless your tape drive specifically won't work with this value. The only times blocking factors are changed are for floppy disks and other hard disk volumes. Note that the arguments following the modifiers are in the same order as the modifiers. The f precedes the b modifier so the arguments have the device before the blocking factor. The arguments must be in the same order as the modifiers, which can sometimes cause a little confusion.
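To put a number on the blocking factor: tar archives are made up of 512-byte blocks, and the blocking factor is the number of those blocks written to the tape in one chunk. The helper below (record_bytes is a hypothetical name) simply shows the multiplication.

```shell
# record_bytes BLOCKING_FACTOR
# Prints the size in bytes of each chunk written to the tape.
record_bytes() {
    echo $(( $1 * 512 ))
}

record_bytes 20     # prints 10240, that is, 10K per chunk
```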
Another common problem is that a tape may not be large enough to hold the entire archive, in which case more than one tape will be needed. To tell tar the size of each tape, you need the k option. This option uses an argument that is the capacity in kilobytes. For example, the command
tar cvbfk 20 /dev/tape 122880 /usr
tells tar to use a blocking factor of 20 for the device /dev/tape. The tape capacity is 122880 kilobytes (120M). Again, note that the order of arguments matches the order of the modifiers.
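The capacity argument is easy to get wrong because tape sizes are quoted in megabytes while the k modifier expects kilobytes. As a small illustration (the function name megabytes_to_kilobytes is hypothetical), the conversion for the 120M tape in the example is:

```shell
# megabytes_to_kilobytes MEGABYTES
megabytes_to_kilobytes() {
    echo $(( $1 * 1024 ))
}

capacity_kb=$(megabytes_to_kilobytes 120)

# Print the command from the text with the computed capacity in place.
# The arguments appear in the same order as the b, f, and k modifiers.
echo "tar cvbfk 20 /dev/tape $capacity_kb /usr"
```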
Floppy disks create another problem with tar, as the blocking factor is usually different. When you use floppy disks, archives usually require more than one disk. You use the k option to specify the archive volume's capacity. For example, to back up the /usr/tparker directory to 1.2M floppy disks, the command would be
tar cnfk /dev/fd0 1200 /usr/tparker
where /dev/fd0 is the device name of the floppy drive and 1200 is the size of the disk in kilobytes. The n modifier tells tar that this is not a tape. As a result, tar runs a little more efficiently than if the modifier had been left off.
This chapter looked at the basics of backups. You should maintain a backup log and make regular backups to protect your work. Although tar is a little awkward to use at first, it soon becomes second nature. You can use the tar command in combination with compression utilities such as compress or gzip (with gunzip to uncompress) to reduce the size of an archive. These utilities compress files but do not gather many files into one archive, so they complement tar rather than replace it. tar is still the most widely used archive utility and is therefore worth knowing.
A number of scripts are beginning to appear that automate the backup process or give you a menu-driven interface to the backup system. These scripts are not in general distribution, but you may want to check FTP and BBS sites for a utility that simplifies backups for you.