
lfd load average email - sort by CPU

Posted: 28 Jan 2007, 20:24
by silver_2000
I love the fact that I get an email when the load goes high and that it includes the process list from top.

What would make it even better is if the emailed list could be sorted by CPU usage, since that's the priority of the email...

Keep up the great work.

Posted: 29 Jan 2007, 09:59
by chirpy
The email is actually triggered by the load average on the server. You can have a high load average with low CPU utilisation, so it might give you a false view of the problem. The output that is emailed is actually from:

ps aux
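If you still want to eyeball that snapshot by CPU, here is a minimal Python sketch that re-sorts `ps aux` output on its third column (%CPU), keeping the header line first. It assumes the standard `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); `sort_ps_by_cpu` is a hypothetical helper, not something lfd provides. On Linux, procps `ps` can also produce this ordering directly with `ps aux --sort=-%cpu`.

```python
def sort_ps_by_cpu(ps_output: str) -> str:
    """Return `ps aux` output with the header first and rows sorted by %CPU, highest first."""
    lines = ps_output.rstrip("\n").splitlines()
    if not lines:
        return ""
    header, rows = lines[0], lines[1:]

    def cpu(row: str) -> float:
        # %CPU is the third whitespace-separated field; COMMAND may contain spaces,
        # so split at most 3 times and ignore everything after %CPU.
        fields = row.split(None, 3)
        try:
            return float(fields[2])
        except (IndexError, ValueError):
            return 0.0  # treat malformed rows as idle rather than failing

    rows.sort(key=cpu, reverse=True)
    return "\n".join([header] + rows)
```

Usage: pass it the text captured from `ps aux`, for example `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. As the reply points out, though, a CPU-sorted list says little about load caused by I/O or memory bottlenecks.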

Posted: 29 Jan 2007, 13:57
by silver_2000
I'm a newbie, but I assumed that load was CPU utilization. What else could it be?
If it's processes waiting to run, you have to assume that the CPUs are busy with the current processes, correct?

http://www.teamquest.com/resources/gunt ... /index.htm
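In short, it is the average of the number of processes waiting in the run queue plus the number currently executing, over 1-, 5- and 15-minute time periods.
It's calculated like this (at least in Linux), with a new sample every 5 seconds:
load(t) = load(t - 1) * e^(-5/(60m)) + n * (1 - e^(-5/(60m)))
where m is the averaging period in minutes (1, 5 or 15) and n is the number of processes running or waiting in the run queue.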

And that's why I'm confused: I have seen a load of 25 with top showing no particular PID using much CPU.
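To see how the quoted formula behaves, here is a minimal Python sketch that applies it once per 5-second sample. It uses floating point rather than the kernel's fixed-point arithmetic, and the constant queue of 25 processes is an illustrative assumption; `simulate_load` is a hypothetical name.

```python
import math


def simulate_load(n_per_sample, minutes, start=0.0, interval_s=5.0):
    """Apply load(t) = load(t-1) * e^(-5/(60m)) + n * (1 - e^(-5/(60m))) once per sample.

    n_per_sample: number of running or queued processes seen at each sample.
    minutes: averaging period m (1, 5 or 15).
    Returns the load value after every sample.
    """
    decay = math.exp(-interval_s / (60.0 * minutes))
    load = start
    history = []
    for n in n_per_sample:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


# Illustrative only: 25 processes in the queue for 5 minutes (60 samples of 5 s).
samples = [25] * 60
for m in (1, 5, 15):
    print(f"{m:>2}-minute average after 5 minutes: {simulate_load(samples, m)[-1]:.2f}")
```

With 25 processes queued for five minutes, the 1-minute average climbs to about 24.8 while the 5- and 15-minute averages only reach about 15.8 and 7.1. The inputs are queue lengths only; per-process CPU percentages never enter the calculation.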

Posted: 29 Jan 2007, 17:28
by chirpy
However, processes can end up queued for three main performance reasons (or any combination of them), which don't necessarily lead to high CPU usage (except the first one, obviously):

1. CPU activity

2. Memory use

3. I/O

For example, on a system with a disk I/O bottleneck, a single process reading a lot of data from disk can cause high load, even though CPU usage is light.
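On Linux, the load calculation counts not only runnable tasks (state R) but also tasks in uninterruptible sleep (state D), which usually means a process blocked waiting on disk I/O. Such tasks raise the load average while using almost no CPU. A minimal Python sketch, assuming a Linux /proc filesystem, that tallies R and D states from /proc/<pid>/stat; the function names are hypothetical, and it looks at processes rather than individual threads in a single snapshot, so it only approximates what the kernel samples. `ps` shows the same letters in its STAT column.

```python
import os
from collections import Counter


def parse_stat_line(stat_line: str):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' before reading the state letter.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def load_contributors(stat_lines):
    """Count tasks of the kinds Linux adds to the load: R (runnable) and D (uninterruptible)."""
    counts = Counter(parse_stat_line(line)[2] for line in stat_lines)
    return {"R": counts.get("R", 0), "D": counts.get("D", 0)}


def read_proc_stats(proc="/proc"):
    """Read /proc/<pid>/stat for every process that still exists (Linux only)."""
    lines = []
    for entry in os.listdir(proc):
        if entry.isdigit():
            try:
                with open(os.path.join(proc, entry, "stat")) as f:
                    lines.append(f.read())
            except OSError:
                pass  # the process exited between listdir() and open()
    return lines


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    print(load_contributors(read_proc_stats()))
```

A large D count alongside a quiet CPU column is the disk-bound pattern described above.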

Another example is memory thrashing, where there isn't enough physical memory for all the applications' needs, so memory is frantically swapped to and from disk. This causes high disk I/O, high memory usage (clearly) and high CPU usage due to the swapping activity. No single process is necessarily to blame in that scenario.
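One way to check for thrashing on Linux is to compare two readings of the swap counters `pswpin` and `pswpout` in /proc/vmstat (pages swapped in and out since boot); sustained non-zero rates in both directions suggest a shortage of physical memory. The `si`/`so` columns of `vmstat` report the same activity. A minimal Python sketch, assuming a Linux /proc filesystem; the function names and the 5-second interval are illustrative choices.

```python
import os
import time


def parse_vmstat(text: str) -> dict:
    """Parse /proc/vmstat ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before: dict, after: dict, seconds: float) -> tuple:
    """Pages swapped in and out per second between two /proc/vmstat readings."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / seconds, swapped_out / seconds


def read_vmstat(path="/proc/vmstat") -> dict:
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    interval = 5.0
    first = read_vmstat()
    time.sleep(interval)
    pages_in, pages_out = swap_rates(first, read_vmstat(), interval)
    print(f"swap-in: {pages_in:.1f} pages/s, swap-out: {pages_out:.1f} pages/s")
```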

In both of those cases, the CPU percentage of a given process doesn't necessarily have any bearing on the reason for the high load average; the bottleneck simply makes the process queue lengthen. So sorting by CPU percentage won't help determine where the problem lies.

Since a large proportion of performance issues that we see relate to bottlenecks and not runaway processes, individual process CPU usage is usually less of an issue.

Posted: 29 Jan 2007, 17:37
by silver_2000
Great info.
That clears up a lot.
I'm pretty sure the issue is disk I/O. Too bad it's hard to track.
Next time I upgrade my server I'll look at faster disks :)
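A rough way to track disk I/O pressure on Linux is the iowait share of CPU time: the fifth value on the aggregate `cpu` line of /proc/stat counts time the CPUs sat idle while I/O was outstanding, and the difference between two samples gives the percentage `top` shows as `wa`. A minimal Python sketch, assuming Linux; the function names are hypothetical. iowait is only an indirect indicator, and per-disk tools such as `iostat -x` from the sysstat package give a more direct view.

```python
import os
import time


def parse_cpu_line(stat_text: str) -> list:
    """Return the aggregate 'cpu' counters from /proc/stat as ints (clock ticks)."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list, after: list) -> float:
    """Share of CPU time spent in iowait between two samples.

    Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def read_cpu(path="/proc/stat") -> list:
    with open(path) as f:
        return parse_cpu_line(f.read())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu()
    time.sleep(5)
    print(f"iowait: {iowait_percent(first, read_cpu()):.1f}%")
```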