Linux System Administrator's Survival Guide




Chapter 20


Managing Processes


Everything that runs on a Linux system is a process. Knowing how to manage the processes running on your Linux system is a critical aspect of system administration. This chapter tells you how to find out which processes are running on your system and what they are doing. You can then use this information to manage the processes as necessary.

In the course of discussing processes, this chapter doesn't bother explaining the mechanics behind how processes are allocated or how the Linux kernel manages to time slice all the processes to run a multitasking operating system. Instead, this chapter looks at the nitty-gritty aspects of process control you need to keep your system running smoothly.

Understanding Processes


You may hear the terms process and job used when talking about operating systems. A formal definition of a process is that it is a single program running in its own virtual address space. Using this definition, everything running under Linux is a process. A job, on the other hand, may involve several commands executing in series. Likewise, a single command line issued at a shell prompt may involve more than one process, especially when pipes or redirection are involved.

Several types of processes are involved with the Linux operating system. Each has its own special features and attributes:


Using the ps Command


The easiest method of finding out which processes are running on your system is to use the ps (process status) command. The ps command is available to all system users, as well as root, although the output changes a little depending on whether you are logged in as root when you issue the command. When you are logged in as a normal system user (not root) and issue the ps command by itself, it displays information about every process you are running. The following output is an example of what you might see:


$ ps

 PID TTY STAT TIME COMMAND

 41 v01 S 0:00 -bash

 134 v01 R 0:00 ps

The output of the ps command is always organized in columns. The first column is labeled PID, which means process identification number. The PID is a number that Linux assigns to each process so that it can keep track of them all. PIDs start at zero and increment by one for each new process, up to a system-determined maximum (traditionally 32,767). When Linux reaches the highest number, it starts numbering from the lowest number again, skipping the numbers used by active processes. Usually, the lowest-numbered processes are the system kernel and daemons, which start when Linux boots and remain active as long as Linux is running. To manipulate processes (to terminate them, for example), you must use the PID.
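
As a small illustration of PIDs in practice, most shells expose their own PID as the special parameter $$. The sketch below wraps that in a helper and also reads the kernel's PID ceiling. The helper names shell_pid and pid_limit are invented for this example, and the /proc/sys/kernel/pid_max path is an assumption that holds on current Linux kernels but not necessarily elsewhere.

```shell
# shell_pid and pid_limit are hypothetical helper names used only here.

# $$ expands to the PID of the shell that is running the script.
shell_pid() {
    echo "$$"
}

# Print the kernel's PID ceiling (assigned PIDs stay below this value).
# The procfs path is assumed to exist; it is absent on non-Linux systems.
pid_limit() {
    if [ -r /proc/sys/kernel/pid_max ]; then
        cat /proc/sys/kernel/pid_max
    else
        echo "unknown"
    fi
}

echo "This shell is PID $(shell_pid); PID ceiling: $(pid_limit)"
```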

The TTY column in the ps command output shows you which terminal the process was started from. If you are logged in as a user, this column usually lists your terminal or console window. If you are running on multiple console windows, you see all the processes you started in every displayed window.

The STAT column in the ps command output shows you the current status of the process. The two most common entries in the STAT column are S for sleeping and R for running. A sleeping process is one that is waiting for something (such as input or a timer) and isn't currently using the CPU. A running process is one that is either executing on the CPU or ready to execute as soon as the CPU is free. Processes may switch between sleeping and running many times every second.

The TIME column shows the total amount of system (CPU) time used by the process so far. These numbers tend to be very small for most processes, as they require only a short time to complete. The numbers under the TIME column are a total of the CPU time, not the amount of time the process has been alive.
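
To compare CPU times in a script, it can help to turn a TIME value into plain seconds. The following is a minimal sketch; the function name cpu_seconds is hypothetical, and the accepted formats (M:SS as in the listings here, plus HH:MM:SS and DD-HH:MM:SS as printed by many current ps versions) are assumptions about what your ps emits.

```shell
# cpu_seconds is a hypothetical helper: it converts one TIME value to seconds.
# Accepted forms: M:SS, HH:MM:SS, or DD-HH:MM:SS.
cpu_seconds() {
    echo "$1" | awk '{
        days = 0
        t = $0
        # Split off an optional leading day count ("DD-").
        if (index(t, "-") > 0) {
            split(t, d, "-")
            days = d[1]
            t = d[2]
        }
        # Fold the colon-separated parts left to right, base 60.
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++)
            secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}
```

For instance, a TIME of 2:01 is two minutes and one second, so cpu_seconds 2:01 prints 121.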

Finally, the COMMAND column contains the name of the process you are running. This name is usually the command you entered, although some commands start up other processes. These processes are called child processes, and they show up in the ps output as though you had entered them as commands.

As a general convention, login shells have a hyphen placed before their name (such as -bash in the preceding output) to help you distinguish the startup shell from any shells you may have started afterwards. Any other shells that appear in the output don't have the hyphen in front of the name, as the following example shows:


$ ps

 PID TTY STAT TIME COMMAND

 46 v01 S 0:01 -bash

 75 v01 S 0:00 pdksh

 96 v01 R 0:00 bash

 123 v01 R 0:00 ps

This example shows that the user's startup shell is bash (PID 46) and that the user started up the Korn shell (pdksh, PID 75) and another bash shell (PID 96) afterwards. Notice also that the process status command, ps, appears in this output (and the previous one) because it was running when the listing was produced. The ps command always appears in its own output.
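
The hyphen convention can be checked mechanically. The sketch below is a small filter that keeps only the entries whose command name begins with a hyphen. The name login_shells is invented for this example, and the filter assumes its input has exactly two leading fields, PID and command, which is not the default ps layout.

```shell
# login_shells is a hypothetical filter: it reads "PID COMMAND" pairs on
# standard input and prints only those whose command starts with a hyphen.
login_shells() {
    awk '$2 ~ /^-/ { print $1, $2 }'
}

# Demonstration using the three shells from the listing above.
printf '46 -bash\n75 pdksh\n96 bash\n' | login_shells
```

On systems whose ps accepts the POSIX -o option, input in the expected shape can be produced with ps -o pid= -o args= and piped into the filter.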

When a user issues the ps command, that user sees only his own processes. If you issue the ps command when you are logged in as the superuser, you see all the processes on the system because the root login owns everything running. Because this command can produce very long outputs, especially on a system with several users, you may want to pipe the output from the ps command to a page filter (such as more or less) or save the output in a file for further examination. Both commands are shown in the following code:


ps | more

ps > /tmp/ps_file

The ps command has a number of options and arguments, although most system administrators use only a couple of common command line formats. A useful ps option for checking user processes is -u, which adds several columns to the output of the ps command. The following output is from a user (not root) command using this option:


$ ps -u

USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND

bill 41 0.1 6.8 364 472 v01 S 23:19 0:01 -bash

bill 138 0.0 3.3 72 228 v01 R 23:34 0:00 ps -u

The most important addition to the output is the USER column, which shows who started and owns the process. The name listed under the USER column is the user's login name, as found in the /etc/passwd file (ps does a lookup procedure in the /etc/passwd file to convert the user identification number to the proper username).

This option also adds the column labeled %CPU, which shows the percentage of CPU time that the process has used so far. The column %MEM shows the percentage of your system's memory currently used by the process. These numbers can be handy for finding processes that consume far too much CPU or memory. If you see a user process that has very high usage, check to make sure it is a valid process and not a runaway that will continue to drain your system's resources.
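
Scanning a long listing for heavy CPU users by eye is tedious, so a filter can do it. The following is a sketch only: the function name cpu_hogs and its default threshold are invented for illustration, and it assumes the column layout shown above (USER first, PID second, %CPU third, and the command starting in the eleventh field).

```shell
# cpu_hogs is a hypothetical filter. It reads a user-format ps listing on
# standard input and prints "PID USER %CPU COMMAND" for every process whose
# %CPU exceeds the threshold given as the first argument (default 5).
cpu_hogs() {
    limit=${1:-5}
    # NR > 1 skips the header line; "+ 0" forces numeric comparison.
    awk -v limit="$limit" 'NR > 1 && $3 + 0 > limit + 0 { print $2, $1, $3, $11 }'
}
```

Fed the two-line listing above with a threshold of 0.05, the filter reports only the -bash entry (PID 41), because the ps -u entry shows 0.0 in the %CPU column.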

When you issue this command logged in as root, you see all the processes running on the system. As before, consider paginating the output to make it readable. You also can use the -u option to specify a user's processes by adding the appropriate username. For example, if you are logged in as root and want to see only yvonne's processes, issue the following command:


ps -u yvonne

Most users can issue this command to examine other users' processes, as well. This command lets them find out who is hogging all the CPU time! The -u option also enables the superuser to see the processes users are running when they report problems, without having to wade through all the system processes as well. Finally, the -u option with a username is handy for finding user processes that need to be terminated because they are hung or have started to run away.

Users can see all the processes running on the system (instead of just the processes they started) by using the -a option. Because the superuser sees all the processes on the system anyway, the root login doesn't have to use this option, although it is still legal to do so; the output doesn't change. When issued by a user (not root), the -a option produces the following output:


$ ps -a

 PID TTY STAT TIME COMMAND

 1 psf S 0:00 init

 6 psf S 0:00 update (sync)

 23 psf S 0:00 /usr/sbin/crond -l10

 29 psf S 0:00 /usr/sbin/syslogd

 31 psf S 0:00 /usr/sbin/klogd

 33 psf S 0:00 /usr/sbin/lpd

 40 psf S 0:00 selection -t ms

 42 v02 S 0:01 -bash

 43 v03 S 0:00 /sbin/agetty 38400 tty3

 44 v04 S 0:00 /sbin/agetty 38400 tty4

 45 v05 S 0:00 /sbin/agetty 38400 tty5

 46 v06 S 0:00 /sbin/agetty 38400 tty6

 41 v01 S 0:01 -bash

 140 v01 R 0:00 ps -a

This relatively short output shows a very lightly loaded system. Most of the entries are the Linux operating system kernel and daemons, as well as agetty processes waiting for logins on the unused consoles. Only the last two commands were started by the user who issued the ps command. Of course, you can't tell who started each process with this output. To see who started each process, you can combine the -u and -a options (note that you use only one hyphen, followed by the option letters):


$ ps -au

USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND

root 1 0.0 3.0 44 208 psf S 23:19 0:00 init

root 6 0.0 1.8 24 128 psf S 23:19 0:00 update (sync)

root 23 0.0 3.0 56 212 psf S 23:19 0:00 /usr/sbin/crond -l10

root 29 0.0 3.4 61 236 psf S 23:19 0:00 /usr/sbin/syslogd

root 31 0.0 2.8 36 200 psf S 23:19 0:00 /usr/sbin/klogd

root 33 0.0 2.9 64 204 psf S 23:19 0:00 /usr/sbin/lpd

root 40 0.0 2.0 32 140 psf S 23:19 0:00 selection -t ms

root 42 0.1 6.9 372 480 v02 S 23:19 0:01 -bash

root 43 0.0 2.3 37 164 v03 S 23:19 0:00 /sbin/agetty 38400 tt

root 44 0.0 2.3 37 164 v04 S 23:19 0:00 /sbin/agetty 38400 tt

root 45 0.0 2.3 37 164 v05 S 23:19 0:00 /sbin/agetty 38400 tt

root 46 0.0 2.3 37 164 v06 S 23:19 0:00 /sbin/agetty 38400 tt

yvonne 41 0.0 6.8 364 472 v01 S 23:19 0:01 -bash

yvonne 2519 0.0 3.4 80 236 v01 R 23:39 0:00 ps -au

This command produces a list with all the same columns as the -u option, but it shows all the processes running on the system. The order in which you enter the options doesn't matter, so -au is functionally the same as -ua.

A few other ps command line options are occasionally useful. The -l option adds information about which processes started each process (useful when you want to identify child processes):


$ ps -l

 F UID PID PPID PRI NI SIZE RSS WCHAN STAT TTY TIME COMMAND

 0 501 41 1 15 0 364 472 114d9c S v01 0:00 -bash

 0 501 121 41 29 0 64 208 0 R v01 0:00 ps -l

The PPID (Parent Process ID) column shows which process started that particular process. The preceding extract shows that the ps command was started by the bash process, as the shell is the parent of all user commands. The PPID for the login shell is 1, which is the PID of init, the first process the operating system starts. (Think about what this relationship means. If init ever terminates, all other processes die, too.)
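
The parent and child relationship can also be followed with a short filter. The sketch below lists the direct children of a given PID. The name children_of is invented for this example, and the filter assumes three-field input (PID, PPID, command name) rather than the full ps -l layout.

```shell
# children_of is a hypothetical filter: given a parent PID as its argument,
# it reads "PID PPID COMMAND" triples on standard input and prints the PID
# and command of every process whose PPID matches.
children_of() {
    awk -v parent="$1" '$2 == parent { print $1, $3 }'
}

# Demonstration using the two processes from the listing above.
printf '41 1 -bash\n121 41 ps\n' | children_of 41
```

On systems whose ps accepts the POSIX -o option, suitable input for the whole system can be produced with ps -e -o pid= -o ppid= -o comm= and piped into the filter.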



The Linux version of the ps command has a few idiosyncrasies. The hyphen before an option is not strictly necessary, so ps u works as well as ps -u. However, because UNIX convention (and most UNIX versions) requires a hyphen, you should use one.

Most system administrators get by with just a few versions of the ps command (when logged in as root). To display information about the system as a whole, the following two command lines show practically everything there is to know about processes:


ps -ef

ps -le

The meaning of the primary columns in the output from the two commands was covered earlier in this section. The rest of the columns are either evident from their abbreviated headings or are not that important. For complete information, see the ps man page (which is not entirely accurate or complete, unfortunately).
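
A system-wide listing is often easier to digest once it has been summarized. As a sketch, the filter below counts processes per owner. The name count_by_user is invented for this example, and it assumes its input is one username per line.

```shell
# count_by_user is a hypothetical filter: it reads one username per line and
# prints "COUNT USER" lines, busiest user first.
count_by_user() {
    # uniq -c needs sorted input; the final awk strips uniq's padding.
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Demonstration with made-up input: three root processes, one for yvonne.
printf 'root\nyvonne\nroot\nroot\n' | count_by_user
```

On systems whose ps accepts the POSIX -o option, one username per process can be produced with ps -e -o user= and piped into the filter.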

Using kill


A process that locks up a terminal or doesn't do anything is generally referred to as a hung process. Sometimes a user has a process that doesn't terminate properly (especially common with programmers). This kind of process is called a runaway process. In both cases, the only way to get rid of the process and restore some normalcy to the system is to terminate the process by issuing the kill command.

To use kill, you must have access to another window or console where you can issue commands. If your terminal is completely locked up, you will have to find another one from which to log in. As a user, you can only kill your own processes; you cannot affect any process another user or the system is running. As root, you can terminate any process with the kill command.

In order to use the kill command, you need the process identification number (PID) of the process to be terminated. Use the ps command, as explained in the preceding section, to find out this information. Next, use the kill command with the PID as an argument. For example, the following terminal session shows a user process called bad_prog started by walter that has hung up and needs to be killed. The PID is obtained by displaying all of walter's processes:


$ ps -u walter

USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND

walter 561 0.1 6.8 364 472 v01 S 13:19 0:01 -bash

walter 598 9.3 4.1 2736 472 v01 R 15:26 2:01 bad_prog

$ kill 598

When you issue the kill command, you don't get any return message if it works properly. The usual way to verify that the process was properly terminated is to issue another ps command and look for the PID or process name.
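
In a script, the check can be made without reading ps output by sending signal 0, which performs the existence and permission checks but delivers nothing to the process. The following is a minimal sketch; the function name is_alive is invented for this example.

```shell
# is_alive is a hypothetical helper: it succeeds if a signal could be sent
# to the given PID. Signal 0 only tests for the process; nothing is delivered.
# Caveat: it also fails for a live process you lack permission to signal,
# and it still succeeds for a zombie that its parent has not yet collected.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# The current shell is certainly running, so this prints the message.
if is_alive "$$"; then
    echo "PID $$ is still running"
fi
```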

Because some processes spawn child processes with different PIDs, you must be sure to check that all the child processes are terminated as well. The best way to do this is to watch the names of the executing processes for a few minutes to ensure that the child isn't dormant, only to return later. This problem usually happens while the child processes are being generated by a parent. Check the PPID column (use the ps -l option) to see which process is the parent and terminate that process as well.



When you are killing processes and are logged in as root, make sure you type the correct PID or you may inadvertently terminate another process. Check the PID carefully! Also, don't kill any system processes unless you know what they do and why they need to be terminated.

If the process doesn't terminate properly with the kill command, you need to use sterner measures. The kill command works by sending a signal to the process, and different signals have different effects. When issued with no arguments other than the PID, the kill command sends signal 15 (SIGTERM), which asks the process to terminate gracefully (the process gets a chance to close any open files and clean up after itself). Specifying the -15 option has exactly the same effect as the default. For example, to politely terminate the process with PID 726, issue either kill 726 or the following command:


kill -15 726

If the process still doesn't terminate, it's time to get ruthless and use the -9 option, the most potent form of the kill command. Signal 9 (SIGKILL) cannot be caught or ignored by the process, so the process gets no chance to close open files or tidy up its child processes; Linux simply removes it. Only use this option when the gentler form of the kill command is not working. To use this option on the same sample process, issue the command:


kill -9 726

If that doesn't work, the process may be unkillable. This situation does happen with Linux from time to time (typically when a process is stuck waiting inside the kernel for a device), and the only solution is to shut down and reboot the machine.
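
The polite-then-forceful sequence is easy to script. The sketch below sends SIGTERM (signal 15, the kill default), waits a few seconds, and sends SIGKILL (signal 9, which a process cannot catch or ignore) only if the process is still present. The function name terminate and the five-second default are assumptions made for this example, not part of any standard tool.

```shell
# terminate is a hypothetical helper: terminate PID [GRACE_SECONDS]
# It asks politely with SIGTERM, then escalates to SIGKILL if needed.
terminate() {
    pid=$1
    grace=${2:-5}    # seconds to wait before escalating (illustrative default)

    # Polite request first; give up if the signal cannot be sent at all.
    kill -TERM "$pid" 2>/dev/null || return 1

    # Poll once per second until the process is gone or the grace period ends.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still present after the grace period: use the signal that cannot be caught.
    kill -0 "$pid" 2>/dev/null || return 0
    echo "PID $pid did not exit after SIGTERM; sending SIGKILL" >&2
    kill -KILL "$pid"
}
```

Calling terminate 726 10 would give the sample process ten seconds to exit cleanly before it is forcibly removed.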

To help prevent a user from killing other users' processes, Linux checks the process owner when you issue a kill command. If a user tries to kill another user's process, a message like the following one is displayed:


kill: - Not owner

The superuser doesn't get this message because the superuser login can kill anything except some system processes (such as init).

Using the top Command


Sometimes you may want to watch the system's behavior to spot problems, monitor system loading, or check for runaway processes. Instead of running the ps command at regular intervals, Linux offers the top command as an alternative. When you issue the top command, the screen shows a continual snapshot of the system, taken every five seconds (unless you specify a different time increment). By default, top shows the most CPU-intensive tasks on the system as a full-screen display.

The syntax of the top command allows you to alter much of the utility's behavior from the command line, although most changes are also available from within top:


top [-] [d delay] [q] [S] [s] [i]

The command line options supported by top are as follows:
d Specifies the delay between screen updates (can be changed from within top using the s command)
q Forces top to refresh without a delay
S Uses cumulative mode (the CPU time each listed process shows includes any children the process spawned)
s Runs top in secure mode (disables interactive commands)
i Ignores idle or zombie processes

The top command can be very useful when you are tweaking a system's performance or want to see how heavily used the system is when a large number of users or processes are involved. Many system administrators run top with a long delay (such as every 60 seconds) on a spare terminal or console window throughout the day to get a fast assessment of the system's performance and load. If you do run top for a long period, use the s option to switch on secure mode. This option disables many of the interactive commands that would otherwise let any user with access to the top screen manipulate processes.

The output from the top command shows several summary lines at the top of the screen, followed by a list of the most CPU-intensive processes:


1:58pm up 59 min, 2 users, load average: 0.13, 0.34, 0.98

26 processes: 25 sleeping, 1 running, 0 zombie, 0 stopped

CPU states: 0.9% user, 6.4% system, 0.0% nice, 92.7% idle

Mem: 14620K av, 6408K used, 8212K free, 4632K shrd, 2328K buff

Swap: 0K av, 0K used, 0K free

 PID USER PRI NI SIZE RES SHRD STAT %CPU %MEM TIME COMMAND

 236 root 19 0 93 316 344 R 7.3 2.1 0:00 top

 1 root 1 0 48 232 308 S 0.0 1.5 0:00 init

 63 root 2 0 388 556 572 S 0.0 3.8 0:00 -bash

 209 root 1 0 98 320 356 S 0.0 2.1 0:00 in.telnetd

 24 root 1 0 60 228 296 S 0.0 1.5 0:00 /usr/sbin/crond -l10

 ...

 6 root 1 0 36 164 336 S 0.0 1.1 0:00 bdflush (daemon)

 7 root 1 0 36 168 340 S 0.0 1.1 0:00 update (bdflush)

 38 root 1 0 73 280 332 S 0.0 1.9 0:00 /usr/sbin/syslogd

 40 root 1 0 44 240 320 S 0.0 1.6 0:00 /usr/sbin/klogd

 42 bin 1 0 84 240 320 S 0.0 1.6 0:00 /usr/sbin/rpc.portmap

 44 root 1 0 76 292 320 S 0.0 1.9 0:00 /usr/sbin/inetd

 46 root 1 0 68 212 304 S 0.0 1.4 0:00 /usr/sbin/lpd

 51 root 1 0 116 280 376 S 0.0 1.9 0:00 /usr/sbin/rpc.nfsd

The top utility displays several useful pieces of information in the first few lines. The uptime display on the first line shows the total amount of time the system has been up since the last reset. Following the uptime are three load averages that are constantly updated. The load averages show the average number of processes that were running or waiting to run over the last one, five, and fifteen minutes.
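
The three load averages are handy to capture in scripts, for example to log them at intervals. The sketch below pulls them out of a line shaped like the first header line shown above. The function name load_averages is invented for this example, and it assumes the text "load average:" followed by comma-separated values, which is the format in the sample output.

```shell
# load_averages is a hypothetical filter: it reads a header line such as the
# first line of the top display and prints the three averages separated by
# spaces. Lines without "load average:" produce no output.
load_averages() {
    sed -n 's/.*load average: *//p' | tr -d ','
}

# Demonstration using the header line from the sample output above.
echo '1:58pm up 59 min, 2 users, load average: 0.13, 0.34, 0.98' | load_averages
```

The uptime command prints a line in the same general shape on most Linux systems, so uptime | load_averages is a typical use.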

The total number of processes that exist at the time of the snapshot is shown on the second line, followed by a breakdown into the number of processes currently sleeping (not executing), running, zombie (terminated but not yet cleaned up by the parent process), and stopped.
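
A similar breakdown can be produced from a list of process state codes. The following is a sketch under stated assumptions: the function name state_summary is invented, the input is one state code per line, and only the first letter of each code is counted (so a code with extra flag characters after the S still counts as S).

```shell
# state_summary is a hypothetical filter: it reads one process state code per
# line and prints "LETTER COUNT" for each first letter, sorted by letter.
state_summary() {
    awk '{ count[substr($1, 1, 1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demonstration with made-up state codes: three sleeping, one running,
# one zombie.
printf 'S\nSs\nR+\nZ\nS\n' | state_summary
```

With the ps found on most Linux systems, ps -e -o stat= emits one state code per process; the stat format name is an assumption here, since it is not among the format names POSIX requires.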

The CPU states line (the third line of the header) shows the percentage of CPU time spent in user mode, in system mode, on niced tasks, and idle. (A niced process is one whose nice value has been changed from the default, which adjusts the priority of the process. Note that in this version of top the time spent on a niced task is also counted as user or system time, so the percentages may add up to more than 100 percent.)

The fourth header line of the top output shows memory usage, including the amount of available memory, the memory currently in use, the memory free at the moment of the snapshot, the amount of shared memory, and the amount of memory used for buffers. The last header line shows the swap statistics, which reflect the use of the system's swap space. The line shows the total (available) swap space, the swap space in use, and the free swap space. Following the header is the list of CPU-intensive processes, structured like the ps command's output.

While top is running, you can issue some commands to alter its behavior (unless you started top with the -s option to disable interactive commands). The following interactive commands are available:
^L Redraws the screen
h/? Displays help
k Kills a process (you are prompted for the PID and the signal level such as 9 or 15, as discussed earlier under the kill command)
i Ignores idle and zombie processes
n/# Changes the number of processes displayed
q Quits
r Renices a process (you are prompted for the PID and the nice value)
S Toggles cumulative mode
s Changes the delay between updates

Note that some terminals cannot display the output of the top command properly. When run, top should clear the entire screen and display a full screen of information. If you see overlapping lines or the screen has large blank areas, the terminal is not properly supported for top output. This problem often occurs when you use telnet across a network or emulate a terminal like a VT100.

Summary


This chapter has shown you how to obtain listings of the processes currently executing on your Linux system and how to terminate those processes when they require it. Although you may not have to use this knowledge often, every operating system has occasions when something gets out of hand and needs you to control it. The problems multiply as the number of users increases. Process commands enable you to correct the problem without terminating the operating system.
