
CHAPTER 16

Advanced System Administration

by David Pitts

A large portion of this book is devoted to advanced system administration, including script and automation development, configuring and building kernels, network management, security, and many other tasks. One task not addressed thus far is performance analysis. This chapter, then, looks at the initial steps of performance analysis, showing how to determine CPU, memory, and paging space usage. Two tools are examined: vmstat and top.

Basic Performance Analysis

Basic performance analysis is the process of identifying performance bottlenecks and involves a number of steps. The first step is to look at the big picture: Is the problem CPU or I/O related? If it is a CPU problem, what is the load average? You should probably check to see what processes are running and who is causing the problem. If it is an I/O problem, then is it paging or normal disk I/O? If it is paging, increasing memory might help. You can also try to isolate the program or the user causing the problem. If it is a disk problem, then is the disk activity balanced? If you have only one disk, perhaps you might want to install a second.
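
As a first look at the load average question, here is a minimal sketch in Python. It relies on the standard library call os.getloadavg(), which returns the 1-, 5-, and 15-minute load averages on Unix-like systems. The function name load_summary and the idea of dividing the load by the number of CPUs are my own illustrative choices (a common rule of thumb), not something the tools in this chapter do for you.

```python
import os


def load_summary():
    """Return the system load averages plus a rough per-CPU figure."""
    one, five, fifteen = os.getloadavg()   # 1-, 5-, and 15-minute averages
    cpus = os.cpu_count() or 1             # cpu_count() can return None
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "cpus": cpus,
        # Assumption: load divided by CPU count as a rough busyness measure.
        "per_cpu_1min": one / cpus,
    }


summary = load_summary()
print("load averages:", summary["1min"], summary["5min"], summary["15min"])
print("1-minute load per CPU:", round(summary["per_cpu_1min"], 2))
```

A per-CPU figure that stays well above 1 suggests more runnable work than the processors can handle, which is a hint to look at CPU usage first.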

The next section looks at several tools that can be used to determine the answers to the preceding questions.

Determining CPU Usage

CPU usage is the first test on the list. There are many ways to obtain a snapshot of the current CPU usage; the one I focus on here is vmstat. The vmstat command gives you several pieces of data, including the CPU usage. The following is the syntax for the command:


$ vmstat interval [count]

interval is the number of seconds between reports, and count is the total number of reports to give. If the count is not included, vmstat will run continuously until you stop it with Ctrl+C or kill the process.
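
If you want to collect reports from a script rather than from the shell, the same syntax can be driven from Python. The following is only a sketch: the helper names vmstat_command and run_vmstat are hypothetical, and it assumes vmstat is installed and on your PATH.

```python
import shutil
import subprocess


def vmstat_command(interval, count=None):
    """Build the argument list for `vmstat interval [count]`."""
    if interval < 1:
        raise ValueError("interval must be at least 1 second")
    args = ["vmstat", str(interval)]
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        args.append(str(count))
    return args


def run_vmstat(interval, count):
    """Run vmstat and return its report as text.

    A count is required here: without one, vmstat never exits.
    """
    if shutil.which("vmstat") is None:
        raise RuntimeError("vmstat was not found on PATH")
    result = subprocess.run(
        vmstat_command(interval, count),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(" ".join(vmstat_command(5, 5)))  # vmstat 5 5
```

Calling run_vmstat(5, 5) is the scripted equivalent of typing vmstat 5 5 and returns once the last report has been printed.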

Here is an example of the output from vmstat:


shell:/home/dpitts$ vmstat 5 5
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 0 0 0  1104  1412 10032 36228   0   0   10    8   31   15   7   4  24
 0 0 0  1104  1736 10032 36228   0   0    0    3  111   18   1   1  99
 0 0 0  1104  1816 10032 36228   0   0    0    1  115   23   2   2  96
 0 1 0  1104  1148 10096 36268   8   0    7    4  191  141   4   6  91
 0 0 0  1104  1868  9812 35676   6   0    2   10  148   39  25   4  70

The first line of the report displays the average values for each statistic since boot time, so ignore it when judging current activity. To determine CPU usage, you are interested in the last three columns, under the cpu heading. They are us, sy, and id and are explained in the following table.

CPU Description
us Percentage of CPU cycles spent on performing user tasks.
sy Percentage of CPU cycles spent on system tasks. These tasks include servicing I/O requests, performing general operating system functions, and so on.
id Percentage of CPU cycles not used. This is the amount of time the system was idle.

A high CPU percentage (or a low idle percentage) does not necessarily indicate an overall CPU problem. It could be that a number of batch jobs are running and simply need to be rescheduled. To determine whether there is actually a CPU problem, monitor the CPU percentages over a significant period of time. If the percentages stay high throughout that period, the CPU is very likely a bottleneck.
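
The idea of watching the percentages over time, rather than reacting to one reading, can be sketched in Python. The sample text below is vmstat output of the kind shown in this chapter; the function names and the 10 percent idle threshold are assumptions chosen for illustration, not values vmstat defines.

```python
SAMPLE = """\
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 1 0 0  1096  1848  4580 37524   0   0    9    8    8   17   7   3  29
 1 0 0  1096  1424  4580 37980   0   0   92   10  125   24  94   4   3
 2 0 0  1096   864  4536 38408   0   0  112   31  146   42  93   2   5
 2 0 0  1096   732  4360 38480  10   0   98    7  146   48  97   3   1
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    names, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":                      # the column-name line
            names = fields
        elif (names and len(fields) == len(names)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(names, map(int, fields))))
    return rows


def cpu_is_saturated(rows, idle_threshold=10):
    """True if every interval sample is below the idle threshold.

    The first row is skipped because it holds the since-boot averages.
    """
    samples = rows[1:]
    return bool(samples) and all(r["id"] < idle_threshold for r in samples)


rows = parse_vmstat(SAMPLE)
print(cpu_is_saturated(rows))  # True: idle was 3, 5, and 1 percent
```

In practice you would feed this many more samples than three, collected over a period long enough to rule out a short burst of batch work.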

Next, look at a different section of the vmstat output. If the problem is not CPU related, look to see whether it is a problem with paging or normal disk I/O. To determine whether it is a memory problem, look at the headings memory and swap:


shell:/home/dpitts$ vmstat 5 5
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 1 0 0  1096  1848  4580 37524   0   0    9    8    8   17   7   3  29
 1 0 0  1096  1424  4580 37980   0   0   92   10  125   24  94   4   3
 2 0 0  1096   864  4536 38408   0   0  112   31  146   42  93   2   5
 2 0 0  1096   732  4360 38480  10   0   98    7  146   48  97   3   1

Memory Description
swpd The amount of virtual memory used (KB)
free The amount of idle memory (KB)
buff The amount of memory used as buffers (KB)
cache The amount of memory used as cache (KB)

Swap Description
si The amount of memory swapped in from disk (KB/s)
so The amount of memory swapped to disk (KB/s)

The most important of these fields is the swap-in column, si. It shows memory being brought back in from disk, that is, pages that were swapped out earlier, even if the swapping out happened before the vmstat command was issued.
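
To pick out the intervals in which swapping actually occurred, you can scan the si and so columns. The following Python sketch does that for vmstat output of the kind shown in this chapter; the function name swap_activity is hypothetical, and treating any nonzero value as worth reporting is my assumption.

```python
SAMPLE = """\
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 1 0 0  1096  1848  4580 37524   0   0    9    8    8   17   7   3  29
 1 0 0  1096  1424  4580 37980   0   0   92   10  125   24  94   4   3
 2 0 0  1096   864  4536 38408   0   0  112   31  146   42  93   2   5
 2 0 0  1096   732  4360 38480  10   0   98    7  146   48  97   3   1
"""


def swap_activity(text):
    """Return (report_number, si, so) for each interval with swap traffic."""
    names = None
    report = 0
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":       # the column-name line
            names = fields
            continue
        if names is None or len(fields) != len(names):
            continue                          # blank line or section header
        report += 1
        if report == 1:
            continue                          # since-boot averages
        row = dict(zip(names, fields))
        si, so = int(row["si"]), int(row["so"])
        if si or so:
            hits.append((report, si, so))
    return hits


print(swap_activity(SAMPLE))  # [(4, 10, 0)]
```

Here only the fourth report shows swap traffic: 10KB per second coming in and nothing going out.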

The io section shows whether the problem lies with blocks being transferred to or from a block device:


shell:/home/dpitts$ vmstat 5 5
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 1 0 0  1096  1848  4580 37524   0   0    9    8    8   17   7   3  29
 1 0 0  1096  1424  4580 37980   0   0   92   10  125   24  94   4   3
 2 0 0  1096   864  4536 38408   0   0  112   31  146   42  93   2   5
 2 0 0  1096   732  4360 38480  10   0   98    7  146   48  97   3   1

The io section is described in the following table.

IO Description
bi The blocks received from a block device (blocks/s)
bo The blocks sent to a block device (blocks/s)

System Description
in The number of interrupts per second
cs The number of context switches per second

The bi and bo fields typically run from a few to several hundred blocks per second (maybe even several thousand). If you see a lot of block transfers in and out, the problem is probably here. Keep in mind, though, that a single reading is not indicative of the system as a whole; it is just a snapshot of the system at that time.

The procs section reports processes in three states: waiting for runtime, in uninterruptible sleep, and swapped out. These are defined in the following table.

Procs Description
r The number of processes waiting for runtime
b The number of processes in uninterruptible sleep
w The number of processes swapped out but otherwise able to run

A consistently high number of processes waiting for runtime is a good indication that there is a problem. The more processes waiting, the slower the system. More than likely, though, you won't be looking at vmstat unless you already know there is a bottleneck somewhere, so the r field mostly confirms what you already suspect.
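
Because one reading is only a snapshot, it helps to average the procs and io columns over several reports. The following Python sketch does that for vmstat output of the kind shown in this chapter. The function name column_averages is hypothetical, and averaging is simply one reasonable way to summarize the samples.

```python
from statistics import mean

SAMPLE = """\
 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 1 0 0  1096  1848  4580 37524   0   0    9    8    8   17   7   3  29
 1 0 0  1096  1424  4580 37980   0   0   92   10  125   24  94   4   3
 2 0 0  1096   864  4536 38408   0   0  112   31  146   42  93   2   5
 2 0 0  1096   732  4360 38480  10   0   98    7  146   48  97   3   1
"""


def column_averages(text, columns=("r", "b", "bi", "bo")):
    """Average the named columns over the interval samples."""
    header, samples = None, []
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["r"]:                 # the column-name line
            header = fields
        elif header and len(fields) == len(header):
            samples.append([int(f) for f in fields])
    samples = samples[1:]                       # drop the since-boot row
    if not samples:
        return {}
    return {c: mean(s[header.index(c)] for s in samples) for c in columns}


for name, value in column_averages(SAMPLE).items():
    print(f"{name}: {value:.1f}")
```

A run queue (r) that averages well above the number of processors, or block counts that stay high across many reports, tells you more than any single line of output.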

top

The top command provides another tool for identifying problems with a Linux system. It gives an ongoing look at processor activity in real time, listing the most CPU-intensive tasks on the system, and can provide an interactive interface for manipulating processes. The default is to update every five seconds. The following is an example of the output from top:


  1:36am  up 16 days,  7:50,  3 users,  load average: 1.41, 1.44, 1.21
60 processes: 58 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 89.0% user,  8.5% system, 92.4% nice,  3.9% idle
Mem:  63420K av, 62892K used,   528K free, 32756K shrd,  6828K buff
Swap: 33228K av,  1096K used, 32132K free               38052K cached
PID USER     PRI  NI  SIZE  RSS SHARE STATE  LIB %CPU %MEM   TIME COMMAND