The Sysadmin's Toolbox: sar
https://haxordoubt.blogspot.com/2012/09/the-sysadmins-toolbox-sar.html
There's an old saying: "When the cat's away the mice will play." The same is true for servers. It's as if servers wait until you aren't logged in (and usually in the middle of REM sleep) before they have problems. Logs can go a long way to help you isolate problems that happened in the past on a machine, but if the problem is due to high load, logs often don't tell the full story. In my March 2010 column "Linux Troubleshooting, Part I: High Load" (http://www.linuxjournal.com/article/10688), I discussed how to troubleshoot a system with high load using tools such as uptime and top. Those tools are great as long as the system still has high load when you are logged in, but if the system had high load while you were at lunch or asleep, you need some way to pull the same statistics top gives you, only from the past. That is where sar comes in.
Enable sar Logging
sar is a classic Linux tool that is part of the sysstat package and should be available in just about any major distribution with your regular package manager. Once installed, it will be enabled on a Red Hat-based system, but on a Debian-based system (like Ubuntu), you might have to edit /etc/default/sysstat, and make sure thatENABLED
is set to true. On a Red Hat-based system, sar will log seven days of statistics by default. If you want to log more than that, you can edit /etc/sysconfig/sysstat and change the HISTORY
option. Once sysstat is configured and enabled, it will collect statistics about your system every ten minutes and store them in a logfile under either /var/log/sysstat or /var/log/sa via a cron job in /etc/cron.d/sysstat. There is also a daily cron job that will run right before midnight and rotate out the day's statistics. By default, the logfiles will be date-stamped with the current day of the month, so the logs will rotate automatically and overwrite the log from a month ago.
CPU Statistics
After your system has had some time to collect statistics, you can use the sar tool to retrieve them. When run with no other arguments, sar displays the current day's CPU statistics:
$ sar
. . .
07:05:01 PM CPU %user %nice %system %iowait %steal %idle
. . .
08:45:01 PM all 4.62 0.00 1.82 0.44 0.00 93.12
08:55:01 PM all 3.80 0.00 1.74 0.47 0.00 93.99
09:05:01 PM all 5.85 0.00 2.01 0.66 0.00 91.48
09:15:01 PM all 3.64 0.00 1.75 0.35 0.00 94.26
Average: all 7.82 0.00 1.82 1.14 0.00 89.21
If you are familiar with the command-line tool top, the above CPU statistics should look familiar, as they are the same as you would get in real time from top. You can use these statistics just like you would with top, only in this case, you are able to see the state of the system back in time, along with an overall average at the bottom of the statistics, so you can get a sense of what is normal. Because I devoted an entire previous column to using these statistics to troubleshoot high load, I won't rehash all of that here, but essentially, sar provides you with all of the same statistics, just at ten-minute intervals in the past.
RAM Statistics
sar also supports a large number of different options you can use to pull out other statistics. For instance, with the-r
option, you can see RAM statistics:
$ sar -r
. . .
07:05:01 PM kbmemfree kbmemused %memused kbbuffers kbcached
kbcommit %commit
. . .
08:45:01 PM 881280 2652840 75.06 355284 1028636
8336664 183.87
08:55:01 PM 881412 2652708 75.06 355872 1029024
8337908 183.89
09:05:01 PM 879164 2654956 75.12 356480 1029428
8337040 183.87
09:15:01 PM 886724 2647396 74.91 356960 1029592
8332344 183.77
Average: 851787 2682333 75.90 338612 1081838
8341742 183.98
Just like with the CPU statistics, here I can see RAM statistics from the past similar to what I could find in top.
Disk Statistics
Back in my load troubleshooting column, I referenced sysstat as the source for a great disk I/O troubleshooting tool called iostat. Although that provides real-time disk I/O statistics, you also can pass sar the-b
option to get disk I/O data from the past:
$ sar -b
. . .
07:05:01 PM tps rtps wtps bread/s bwrtn/s
. . .
08:45:01 PM 2.03 0.33 1.70 9.90 31.30
08:55:01 PM 1.93 0.03 1.90 1.04 31.95
09:05:01 PM 2.71 0.02 2.69 0.69 48.67
09:15:01 PM 1.52 0.02 1.50 0.20 27.08
Average: 5.92 3.42 2.50 77.41 49.97
I figure these columns need a little explanation:
tps
: transactions per second.rtps
: read transactions per second.wtps
: write transactions per second.bread/s
: blocks read per second.bwrtn/s
: blocks written per second.
-A
option, which will return a complete dump of all the statistics it has for the day (or just browse its man page). Turn Back Time
So by default, sar returns statistics for the current day, but often you'll want to get information a few days in the past. This is especially useful if you want to see whether today's numbers are normal by comparing them to days in the past, or if you are troubleshooting a server that misbehaved over the weekend. For instance, say you noticed a problem on a server today between 5PM and 5:30PM. First, use the-s
and -e
options to tell sar to display data only between the start (-s
) and end (-e
) times you specify:
$ sar -s 17:00:00 -e 17:30:00
Linux 2.6.32-29-server (www.example.net) 02/06/2012 _x86_64_
(2 CPU)
05:05:01 PM CPU %user %nice %system %iowait %steal %idle
05:15:01 PM all 4.39 0.00 1.83 0.39 0.00 93.39
05:25:01 PM all 5.76 0.00 2.23 0.41 0.00 91.60
Average: all 5.08 0.00 2.03 0.40 0.00 92.50
To compare that data with the same time period from a different day, just use the
-f
option and point sar to one of the logfiles under /var/log/sysstat or /var/log/sa that correspond to that day. For instance, to pull statistics from the first of the month:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01
Linux 2.6.32-29-server (www.example.net) 02/01/2012 _x86_64_
(2 CPU)
05:05:01 PM CPU %user %nice %system %iowait %steal %idle
05:15:01 PM all 9.85 0.00 3.95 0.56 0.00 85.64
05:25:01 PM all 5.32 0.00 1.81 0.44 0.00 92.43
Average: all 7.59 0.00 2.88 0.50 0.00 89.04
You also can add all of the normal sar options when pulling from past logfiles, so you could run the same command and add the
-r
argument to get RAM statistics:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r
Linux 2.6.32-29-server (www.example.net) 02/01/2012 _x86_64_
(2 CPU)
05:05:01 PM kbmemfree kbmemused %memused kbbuffers kbcached
kbcommit %commit
05:15:01 PM 766452 2767668 78.31 361964 1117696
8343936 184.03
05:25:01 PM 813744 2720376 76.97 362524 1118808
8329568 183.71
Average: 790098 2744022 77.64 362244 1118252
8336752 183.87
As you can see, sar is a relatively simple but very useful troubleshooting tool. Although plenty of other programs exist that can pull trending data from your servers and graph them (and I use them myself), sar is great in that it doesn't require a network connection, so if your server gets so heavily loaded it doesn't respond over the network anymore, there's still a chance you could get valuable troubleshooting data with sar.