-->
by David Pitts
A large portion of this book is devoted to advanced system administration, including script and automation development, configuring and building kernels, network management, security, and many other tasks. One task not addressed thus far is performance analysis. This chapter, then, looks at the initial steps of performance analysis, showing how to determine CPU, memory, and paging space usage. Two tools are examined: vmstat and top.
Basic performance analysis is the process of identifying performance bottlenecks and involves a number of steps. The first step is to look at the big picture: Is the problem CPU or I/O related? If it is a CPU problem, what is the load average? You should probably check to see what processes are running and who is causing the problem. If it is an I/O problem, then is it paging or normal disk I/O? If it is paging, increasing memory might help. You can also try to isolate the program or the user causing the problem. If it is a disk problem, then is the disk activity balanced? If you have only one disk, perhaps you might want to install a second.
The next section looks at several tools that can be used to determine the answers to the preceding questions.
CPU usage is the first test on the list. There are many different ways to obtain a snapshot of the current CPU usage. The one I am going to focus on here is vmstat. The vmstat command gives you several pieces of data, including the CPU usage. The following is the syntax for the command:
$ vmstat interval [count]
interval is the number of seconds between reports, and count is the total number of reports to give. If the count is not included, vmstat will run continuously until you stop it with Ctrl+C or kill the process.
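For scripted monitoring, the same invocation can be driven from a program. The following is a minimal Python sketch, not part of the original text: the helper names vmstat_command and run_vmstat are hypothetical, and the sketch assumes vmstat is installed and on the PATH.

```python
import subprocess


def vmstat_command(interval, count=None):
    """Build the argument list for `vmstat interval [count]`."""
    if interval < 1:
        raise ValueError("interval must be at least 1 second")
    args = ["vmstat", str(interval)]
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        args.append(str(count))
    return args


def run_vmstat(interval, count):
    """Run vmstat for a fixed number of reports and return its text output.

    count is required here: without it vmstat never exits on its own,
    so a blocking call would never return.
    """
    result = subprocess.run(
        vmstat_command(interval, count),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling run_vmstat(5, 5) corresponds to typing vmstat 5 5 at the prompt and takes about 20 seconds to return, because the first report is printed immediately and the remaining four follow at five-second intervals.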
Here is an example of the output from vmstat:
shell:/home/dpitts$ vmstat 5 5
procs                    memory    swap        io    system         cpu
r b w  swpd  free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
0 0 0  1104  1412  10032  36228   0   0   10    8   31   15   7   4  24
0 0 0  1104  1736  10032  36228   0   0    0    3  111   18   1   1  99
0 0 0  1104  1816  10032  36228   0   0    0    1  115   23   2   2  96
0 1 0  1104  1148  10096  36268   8   0    7    4  191  141   4   6  91
0 0 0  1104  1868   9812  35676   6   0    2   10  148   39  25   4  70
The first line of the report displays the average values for each statistic since boot time, so it should be ignored. To determine CPU usage, look at the last three columns, grouped under the cpu heading. They are us, sy, and id and are explained in the following table.
CPU | Description |
us | Percentage of CPU cycles spent on performing user tasks. |
sy | Percentage of CPU cycles spent as system tasks. These tasks include waiting on I/O, performing general operating system functions, and so on. |
id | Percentage of CPU cycles not used. This is the amount of time the system was idle. |
A high CPU percentage (or a low idle percentage) does not by itself indicate an overall CPU problem. It could be that a number of batch jobs are running and simply need to be rescheduled. To determine whether there is actually a CPU problem, monitor the CPU percentages over a significant period of time. If the percentages stay high throughout, there is very likely a problem.
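Averaging the CPU columns over several reports is one way to look past a single noisy reading. The following Python sketch parses the vmstat output shown earlier, discards the since-boot line, and averages us, sy, and id over the remaining reports. The function names parse_vmstat and average_cpu are hypothetical, and the parser assumes the 16-column layout of this chapter's examples, in which the column-name line begins with r.

```python
SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1104 1412 10032 36228 0 0 10 8 31 15 7 4 24
0 0 0 1104 1736 10032 36228 0 0 0 3 111 18 1 1 99
0 0 0 1104 1816 10032 36228 0 0 0 1 115 23 2 2 96
0 1 0 1104 1148 10096 36268 8 0 7 4 191 141 4 6 91
0 0 0 1104 1868 9812 35676 6 0 2 10 148 39 25 4 70
"""


def parse_vmstat(text):
    """Return one dict per report line, keyed by vmstat column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line (r b w swpd ...).
            columns = fields
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def average_cpu(rows):
    """Average us, sy and id, skipping the first (since-boot) report."""
    samples = rows[1:]
    if not samples:
        raise ValueError("need at least two reports")
    return {
        key: sum(row[key] for row in samples) / len(samples)
        for key in ("us", "sy", "id")
    }


rows = parse_vmstat(SAMPLE)
print(average_cpu(rows))
```

For the sample above, the four interval reports average to 8.0 percent user, 3.25 percent system, and 89.0 percent idle, which is a lightly loaded machine despite the 25 percent user reading in the last report.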
Next, look at a different section of the vmstat output. If the problem is not CPU related, look to see whether it is a problem with paging or normal disk I/O. To determine whether it is a memory problem, look at the headings memory and swap:
shell:/home/dpitts$ vmstat 5 5
procs                    memory    swap        io    system         cpu
r b w  swpd  free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
1 0 0  1096  1848   4580  37524   0   0    9    8    8   17   7   3  29
1 0 0  1096  1424   4580  37980   0   0   92   10  125   24  94   4   3
2 0 0  1096   864   4536  38408   0   0  112   31  146   42  93   2   5
2 0 0  1096   732   4360  38480  10   0   98    7  146   48  97   3   1
Memory | Description |
swpd | The amount of virtual memory used (KB) |
free | The amount of idle memory (KB) |
buff | The amount of memory used as buffers (KB) |
cache | The amount of memory used as cache (KB) |
Swap | Description |
si | The amount of memory swapped in from disk (KB/s) |
so | The amount of memory swapped to disk (KB/s) |
The most important of these fields is the swap-in (si) column. It shows memory being paged back in that was previously swapped out, even if the swap-out happened before the vmstat command was issued.
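A quick way to spot paging in a long run of reports is to list only the intervals in which si or so is nonzero. The following Python sketch does this for the sample output above. The names parse_vmstat and swap_activity are hypothetical, and the parser assumes the column layout used in this chapter's examples.

```python
SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 1096 1848 4580 37524 0 0 9 8 8 17 7 3 29
1 0 0 1096 1424 4580 37980 0 0 92 10 125 24 94 4 3
2 0 0 1096 864 4536 38408 0 0 112 31 146 42 93 2 5
2 0 0 1096 732 4360 38480 10 0 98 7 146 48 97 3 1
"""


def parse_vmstat(text):
    """Return one dict per report line, keyed by vmstat column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            columns = fields  # the column-name line
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def swap_activity(rows):
    """Return (report_index, si, so) for each interval that shows swapping.

    Report 0 is the since-boot average, so it is skipped.
    """
    return [
        (index, row["si"], row["so"])
        for index, row in enumerate(rows[1:], start=1)
        if row["si"] > 0 or row["so"] > 0
    ]


print(swap_activity(parse_vmstat(SAMPLE)))
```

In the sample, only the last report shows swap activity (10 KB/s swapped in, nothing swapped out), which coincides with free memory falling to 732 KB.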
The io section is used to determine if the problem is with blocks sent in or out of the device:
shell:/home/dpitts$ vmstat 5 5
procs                    memory    swap        io    system         cpu
r b w  swpd  free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
1 0 0  1096  1848   4580  37524   0   0    9    8    8   17   7   3  29
1 0 0  1096  1424   4580  37980   0   0   92   10  125   24  94   4   3
2 0 0  1096   864   4536  38408   0   0  112   31  146   42  93   2   5
2 0 0  1096   732   4360  38480  10   0   98    7  146   48  97   3   1
The io section is described in the following table.
IO | Description |
bi | The blocks received from a block device (blocks/s) |
bo | The blocks sent to a block device (blocks/s) |
cs | The number of context switches per second (reported under the system heading) |
These fields run from several to several hundred (maybe even several thousand). If you are seeing a lot of block transfers in and out, the problem is probably here. Keep in mind, though, that a single reading is not indicative of the system as a whole; it is just a snapshot of the system at that time.
The procs section reports processes in three states: waiting for run time, in uninterruptible sleep, and swapped out. These are defined in the following table.
Procs | Description |
r | The number of processes waiting for runtime |
b | The number of processes in uninterruptible sleep |
w | The number of processes swapped out but otherwise able to run |
A consistently large number of processes waiting for run time is a good indication that there is a problem. The more processes waiting, the slower the system. More than likely, you won't be looking at vmstat unless you already know there is a bottleneck somewhere, so the r field doesn't give you much new information.
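If you do want to quantify the r column, one rough approach is to measure how often the run queue is longer than the number of CPUs. The following Python sketch applies that rule of thumb to the sample output above. The threshold (r greater than the CPU count) is an assumption of this sketch, not something the vmstat output defines, and the names parse_vmstat and run_queue_pressure are hypothetical.

```python
SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 1096 1848 4580 37524 0 0 9 8 8 17 7 3 29
1 0 0 1096 1424 4580 37980 0 0 92 10 125 24 94 4 3
2 0 0 1096 864 4536 38408 0 0 112 31 146 42 93 2 5
2 0 0 1096 732 4360 38480 10 0 98 7 146 48 97 3 1
"""


def parse_vmstat(text):
    """Return one dict per report line, keyed by vmstat column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            columns = fields  # the column-name line
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def run_queue_pressure(rows, cpus=1):
    """Fraction of interval reports in which r exceeds the CPU count.

    The first report (since-boot averages) is skipped. The r > cpus
    threshold is a rule of thumb assumed for this sketch.
    """
    samples = rows[1:]
    if not samples:
        raise ValueError("need at least two reports")
    busy = sum(1 for row in samples if row["r"] > cpus)
    return busy / len(samples)


print(run_queue_pressure(parse_vmstat(SAMPLE), cpus=1))
```

With one CPU assumed, two of the three interval reports in the sample have more than one process waiting, which is consistent with the 93 to 97 percent user CPU shown in the same reports.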
The top command provides another tool for identifying problems with a Linux system. The top command displays the top CPU processes. More specifically, top provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system and can provide an interactive interface for manipulating processes. The default is to update every five seconds. The following is an example of the output from top:
1:36am up 16 days, 7:50, 3 users, load average: 1.41, 1.44, 1.21
60 processes: 58 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 89.0% user, 8.5% system, 92.4% nice, 3.9% idle
Mem: 63420K av, 62892K used, 528K free, 32756K shrd, 6828K buff
Swap: 33228K av, 1096K used, 32132K free 38052K cached
PID USER PRI NI SIZE RSS SHARE STATE LIB %CPU %MEM TIME COMMAND