-->



Repairing Filesystems

Some disk data is kept in memory temporarily before being written to disk, for performance reasons (see the previous discussion of the sync mount option). If the kernel does not have an opportunity to actually write this data, the filesystem can become corrupted. This can happen in several ways:

As part of the boot process, Linux runs the fsck program, whose job is to check and repair filesystems. Most of the time the boot follows a controlled shutdown (see the manual page for shutdown), so the filesystems will have been unmounted before the reboot, and fsck reports that they are "clean." It knows this because, before unmounting them, the kernel writes a special signature on the filesystem to indicate that the data is intact. When the filesystem is mounted again for writing, this signature is removed.

If, on the other hand, one of the disasters listed takes place, the filesystems will not be marked "clean," and when fsck is invoked as usual, it will notice this and begin a full check of the filesystem. A full check also occurs if you specify the -f flag to fsck. To keep errors from accumulating unnoticed, fsck also enforces a periodic check: a full check is done at an interval specified on the filesystem itself (usually every 20 boots or 6 months, whichever comes sooner), even if the filesystem was unmounted cleanly.
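On an ext2 filesystem, the counters behind this periodic check can be displayed with dumpe2fs -h (or tune2fs -l). The following is a minimal sketch, not part of any standard tool: mounts_until_check is a hypothetical helper name, and it assumes the "Mount count:" and "Maximum mount count:" labels that e2fsprogs prints. It reads that output on standard input and reports how many mounts remain before a check is forced.

```shell
# Sketch: report how many mounts remain before fsck forces a periodic check.
# Reads the output of `dumpe2fs -h DEVICE` (or `tune2fs -l DEVICE`) on stdin.
# mounts_until_check is a hypothetical helper name, not part of e2fsprogs.
mounts_until_check() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            # A maximum of 0 or -1 means the mount-count check is turned off.
            if (max <= 0) { print "disabled"; exit }
            left = max - count
            if (left < 0) left = 0
            print left
        }'
}
```

It might be used as dumpe2fs -h /dev/hda1 | mounts_until_check, where /dev/hda1 is only a placeholder device name. The sketch looks at the mount count alone; the time-based interval is a separate setting.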

The boot process (see Chapter 4) checks the root filesystem and then mounts it read/write. (It's mounted read-only by the kernel; fsck asks for confirmation before operating on a read/write filesystem, and this is not desirable for an unattended reboot.) First, the root filesystem is checked with the following command:


fsck -V -a /

Then all the other filesystems are checked by executing this command:


fsck -R -A -V -a

These options specify that all the filesystems should be checked (-A) except the root filesystem, which doesn't need checking a second time (-R); that fsck should produce informational messages about what it is doing as it goes (-V); and that the process should not be interactive (-a). The last option is needed because there might not be anyone present to answer questions from fsck.


In the case of serious filesystem corruption, the approach breaks down because there are some things that fsck will not do to a filesystem without your say-so. In this case, it returns an error value to its caller (the startup script), and the startup script spawns a shell to allow the administrator to run fsck interactively. When this has happened, this message appears:


*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
Give root password for maintenance
(or type Control-D for normal startup):

This is a very troubling event, particularly because it might well appear if you have other problems with the system—for example, a lockup (leading you to press the reset button) or a spontaneous reboot. None of the online manuals are guaranteed to be available at this stage, because they might be stored on the filesystem whose check failed. This prompt is issued if the root filesystem check failed, or the filesystem check failed for any of the other disk filesystems.

When the automatic fsck fails, you need to log in by specifying the root password and run the fsck program manually. When you have typed in the root password, you are presented with the following prompt:


(Repair filesystem) #

You might worry about what command to enter here, or indeed what to do at all. At least one of the filesystems needs to be checked, but which one? The preceding messages from fsck should indicate which, but it isn't necessary to go hunting for them. There is a set of options you can give fsck that tells it to check everything interactively, and this is a good fallback:


fsck -A -V ; echo == $? ==

This is the same command as the one the boot process uses for the other filesystems, but the -R option is missing, in case the root filesystem needs to be checked, and the -a option is missing, so fsck is in its "interactive" mode. This might enable a check to succeed just because fsck can now ask you questions. The purpose of the echo == $? == command is to display the exit status of fsck unambiguously. If the value printed between the equals signs is less than 4, all is well. If this value is 4 or more, more recovery measures are needed. The meanings of the various values returned are as follows:

0 No errors
1 Filesystem errors corrected
2 System should be rebooted
4 Filesystem errors left uncorrected
8 Operational error
16 Usage or syntax error
128 Shared library error
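Because fsck reports its exit status as the sum of the conditions that applied, a value such as 5 means both 1 and 4. The sketch below decodes a status into the entries of the table above. describe_fsck_status is a hypothetical helper name introduced here for illustration; it is not part of fsck.

```shell
# Sketch: decode an fsck exit status into the conditions from the table above.
# The status is a sum of the listed values, so each bit is tested separately.
# describe_fsck_status is a hypothetical helper name, not part of fsck.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "No errors"
        return 0
    fi
    [ $((status & 1))   -ne 0 ] && echo "Filesystem errors corrected"
    [ $((status & 2))   -ne 0 ] && echo "System should be rebooted"
    [ $((status & 4))   -ne 0 ] && echo "Filesystem errors left uncorrected"
    [ $((status & 8))   -ne 0 ] && echo "Operational error"
    [ $((status & 16))  -ne 0 ] && echo "Usage or syntax error"
    [ $((status & 128)) -ne 0 ] && echo "Shared library error"
    return 0
}
```

At the repair prompt it could be run as fsck -A -V ; describe_fsck_status $? in place of the echo command, assuming the function has been typed in or sourced first.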


If this does not work, the cause might be a corrupted superblock: fsck starts its disk check by reading the superblock, and if the superblock is corrupted, the check can't start. By good design, the ext2 filesystem has many backup superblocks scattered regularly throughout the filesystem. Suppose the command announces that it has failed to clean some particular filesystem, for example, /dev/fubar. You can start fsck again with a backup superblock by using the following command:


fsck -t ext2 -b 8193 /dev/fubar

8193 is the block number for the first backup superblock. This backup superblock is at the start of block group 1 (the first block group is numbered 0). There are more backup superblocks at the start of block group 2 (16385) and block group 3 (24577); they are spaced at intervals of 8192 blocks. If you made a filesystem with settings other than the defaults, these numbers might change. mke2fs lists the superblocks that it creates as it goes, so that is a good time to pay attention if you're not using the default settings. There are further things you can attempt if fsck is still not succeeding, but the need for them is very rare and usually indicates hardware problems so severe that they prevent the proper operation of fsck. Examples include broken wires in the IDE connector cable and similar nasty problems. If fsck still fails with a backup superblock, you might seek expert help or try to fix the disk in a different machine.
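The block numbers quoted above follow a simple rule: block group n starts at n times 8192, plus 1. The following sketch computes that number. group_start_block is a hypothetical helper name, and the arithmetic assumes the default layout described above (8192 blocks per group, with block group 0 starting at block 1); whether a particular group actually holds a backup superblock depends on how the filesystem was made.

```shell
# Sketch: first block of a given block group, where a backup superblock may live.
# Assumes the defaults described above: 8192 blocks per group, first data block 1.
# group_start_block is a hypothetical helper name, not part of e2fsprogs.
group_start_block() {
    group=$1
    blocks_per_group=${2:-8192}   # optional second argument overrides the default
    echo $((group * blocks_per_group + 1))
}
```

For example, fsck -t ext2 -b "$(group_start_block 2)" /dev/fubar would try the backup in block group 2 if the one in block group 1 is also damaged.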

These extreme measures are very unlikely to be needed; a manual fsck, in the unusual circumstance where it is actually required, almost always fixes things. After the manual fsck has worked, the root shell that the startup scripts provide has served its purpose. Type exit to leave it. At this point, to make sure that everything goes according to plan, the boot process is started again from the beginning. This second time around, the filesystems should all be error-free and the system should boot normally.

Hardware

There are block devices under Linux for representing all sorts of random access devices—floppy disks, hard disks (XT, EIDE, and SCSI), Zip drives, CD-ROM drives, ramdisks, and loopback devices.

Hard Disks

Hard disks are large enough to make it useful to keep different filesystems on different parts of the hard disk. The scheme for dividing these disks up is called partitioning. Although it is common for computers running MS-DOS to have only one partition, it is possible to have several different partitions on each disk. The summary of how the disk is partitioned is kept in its partition table.

The Partition Table

A hard disk might be divided up like this:
