17 Feb 2012 15:59

## Are measurements cumulative?

Hi all,

sorry if this has been asked and answered before or if I was too blind to
find it inside the documentation and the mail archive.

How does the tool calculate

-a- quantities (like number of slabs, memory allocated, queuelength,...)
-b- utilisation (CPU used%,...)
-c- rates (disks read/write, iops,...)

when using different sampling intervals (-i ...)?

For rates it seems obvious(?) to divide the difference of counter values
at the end and the start by the sampling time.

Could someone explain what effect choosing longer sampling
intervals has, how 'averaging' is done and which measurements are
considered in the calculation?

Cheers

-Frank

17 Feb 2012 16:10

### Re: Are measurements cumulative?

On Fri, Feb 17, 2012 at 9:59 AM, Frank Heckes wrote:
> Hi all,
>
> sorry if this has been asked and answered before or if I was too blind to
> find it inside the documentation and the mail archive.

could be you're blind OR it could be it's not there.  ;)
lots of documentation, and when you're a one-person team you miss things.

> How does the tool calculate
>
> -a- quantities (like number of slabs, memory allocated, queuelength,...)

these are simply instantaneous values as reported in /proc/xxx

> -b- utilisation (CPU used%,...)

all cpu times reported in /proc/stat are in jiffies, so I just look at the change between two samples and add up the user, system and other times, ignoring iowait since that's not real cpu.  this then gives me the total number of jiffies in the interval.  now it's simply a matter of something like

user/total*100 to tell me the % of time spent in user time

some tools actually report based on a single core, so a cpu-bound 8-core system would be reported as 800%, but collectl reports 100%
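A minimal Python sketch of the CPU calculation just described, working on the aggregate `cpu` line of /proc/stat. The column order follows proc(5); leaving only iowait out of the total is taken from the description above and is an assumption about collectl's actual code. The two sample lines are made up.

```python
# Sketch: CPU utilization from two readings of the aggregate "cpu" line in
# /proc/stat. Column order follows proc(5); which columns go into the total
# (everything except iowait) follows the thread's description and is an
# assumption about collectl's exact code.

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a line like 'cpu  4705 150 1120 16250 520 30 45 0' into jiffies."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels have fewer columns
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percent of the interval's jiffies spent in each state.

    Only the change between the two readings matters. Because the aggregate
    line sums every core, a fully busy 8-core box tops out at 100%, not 800%.
    """
    delta = {k: after[k] - before[k] for k in FIELDS}
    counted = {k: v for k, v in delta.items() if k != "iowait"}
    total = sum(counted.values())
    if total == 0:
        return {k: 0.0 for k in counted}
    return {k: v / total * 100 for k, v in counted.items()}


if __name__ == "__main__":
    first = parse_cpu_line("cpu  100 0 50 800 40 0 10 0")
    second = parse_cpu_line("cpu  400 0 150 1360 90 0 50 0")
    pct = cpu_percentages(first, second)
    print(f"user {pct['user']:.1f}%  system {pct['system']:.1f}%")
```

On a live system the two lines would be the first line of /proc/stat, read one sampling interval apart.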

> -c- rates (disks read/write, iops,...)

that's simply the difference between start/finish divided by seconds.  if you include -on, it divides by 1 to give absolute numbers

> when using different sampling intervals (-i ...)?

numbers should all be the same unless -on specified
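To illustrate why the numbers do not depend on -i unless -on is given, here is a small sketch; `interval_value()` is a hypothetical helper, not part of collectl, and the workload is made up.

```python
# Sketch: turning an ever-increasing counter (sectors read, packets
# received, ...) into the value reported for one sampling interval.
# interval_value() is a hypothetical helper; normalize=False mimics the
# "-on divides by 1" behaviour described in the thread.

def interval_value(start, end, seconds, normalize=True):
    """Change in a counter over one interval.

    normalize=True: divide by the interval length, giving a per-second rate
    that stays the same whatever -i is.
    normalize=False: return the raw change, which grows with -i.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    delta = end - start
    return delta / seconds if normalize else float(delta)


if __name__ == "__main__":
    reads_per_sec = 200  # made-up steady workload
    for interval in (1, 10, 60):
        start = 5_000
        end = start + reads_per_sec * interval
        rate = interval_value(start, end, interval)
        raw = interval_value(start, end, interval, normalize=False)
        print(f"-i {interval:>2}: {rate:.0f}/sec normalized, {raw:.0f} raw")
```

The normalized column stays at 200/sec for every interval while the raw column scales with it. Counter wraparound is ignored here.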

> For rates it seems obvious(?) to divide the difference of counter values
> at the end and the start by the sampling time.

correct

> Could someone explain what effect choosing longer sampling
> intervals has, how 'averaging' is done and which measurements are
> considered in the calculation?

the only difference in using longer sampling intervals is loss of accuracy.  my favorite example is if you have a 30-second spike in the network and are only sampling every couple of minutes, you'll see an elevation but never know you were saturated for 1/2 minute.  even at 10 seconds you'll miss shorter spikes, but 10 seconds has proven to be a good compromise, though some users choose 5 or even 1 second.
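The spike example can be worked through numerically. This sketch assumes a hypothetical link that is saturated for 30 seconds out of 240 and turns each sample into an average rate over its interval, as described above.

```python
# Sketch: why long sampling intervals hide short spikes. A hypothetical
# link is saturated (100 units/sec) for 30 seconds and idle otherwise; the
# counter is sampled every `interval` seconds and each sample becomes an
# average rate over that interval.

LINK_MAX = 100          # hypothetical saturation rate, units/sec
DURATION = 240          # seconds simulated
SPIKE = range(60, 90)   # the 30-second burst


def counter_at(t):
    """Cumulative counter value at second t (sum of per-second traffic)."""
    return sum(LINK_MAX for s in range(t) if s in SPIKE)


def peak_rate(interval):
    """Highest per-interval average rate a sampler with this -i would see."""
    rates = []
    for start in range(0, DURATION, interval):
        end = min(start + interval, DURATION)
        rates.append((counter_at(end) - counter_at(start)) / (end - start))
    return max(rates)


if __name__ == "__main__":
    for interval in (120, 60, 10, 1):
        pct = peak_rate(interval) / LINK_MAX * 100
        print(f"sampling every {interval:>3}s: peak seen {pct:.0f}% of link")
```

Sampling every 120 seconds shows a peak of 25%, every 60 seconds 50%, and every 10 seconds the full 100%. Shrinking `SPIKE` to `range(60, 65)` makes even the 10-second sampler report only 50%, matching the remark that 10 seconds still misses shorter spikes.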

hope this helps

-mark


17 Feb 2012 17:50

### Re: Are measurements cumulative?

Hello Mark,

many thanks for the quick and detailed answer.

Cheers

-Frank

On Fri, 2012-02-17 at 16:10 +0100, Mark Seger wrote:
>
>> How does the tool calculate
>>
>> -a- quantities (like number of slabs, memory allocated, queuelength,...)
>
> these are simply instantaneous values as reported in /proc/xxx
>
Okay, this means that one might miss peak values when increasing the
measurement interval.

>> -b- utilisation (CPU used%,...)
>
> all cpu times reported in /proc/stat are in jiffies, so I just look at
> the change between two samples and add up the user, system and other
> times, ignoring iowait since that's not real cpu.  this then gives me
> the total number of jiffies in the interval.  now it's simply a matter
> of something like
>
>
> user/total*100 to tell me the % of time spent in user time
>
>
> some tools actually report based on a single core, so a cpu-bound
> 8-core system would be reported as 800%, but collectl reports 100%
>
Is the disk utilisation also based on jiffies?

>> -c- rates (disks read/write, iops,...)
>
> that's simply the difference between start/finish divided by seconds.
> if you include -on, it divides by 1 to give absolute numbers
>
>> when using different sampling intervals (-i ...)?
>
> numbers should all be the same unless -on specified
>
>> For rates it seems obvious(?) to divide the difference of counter values
>> at the end and the start by the sampling time.
>
> correct
>
>> Could someone explain what effect choosing longer sampling intervals
>> has, how 'averaging' is done and which measurements are considered in
>> the calculation?
>
> the only difference in using longer sampling intervals is loss of
> accuracy.  my favorite example is if you have a 30-second spike in the
> network and are only sampling every couple of minutes, you'll see an
> elevation but never know you were saturated for 1/2 minute.  even at
> 10 seconds you'll miss shorter spikes, but 10 seconds has proven to be
> a good compromise, though some users choose 5 or even 1 second.

Just for interest, would it make sense for the tool to have some
'internal' hidden counters that take a (maybe configurable) number of
'hidden' measurements and sum the values for each counter, so it can
take the average at the end?
Maybe this is dumb, because one could choose a smaller measurement
interval from the start, which would lead to the same computational
(CPU and memory) overhead, but it would help to get more 'accurate'
measurements even for a bigger interval, with a smaller amount of
performance measurement data(?).
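A sketch of the idea proposed here, purely to make it concrete; collectl does not do this (see the reply below), and `report_interval()`, its parameters and the readings are hypothetical.

```python
# Sketch of the proposal: several hidden sub-samples per reporting
# interval, summarized as average and maximum. This is NOT collectl
# behaviour; report_interval() and its parameters are hypothetical.
import time
from statistics import mean


def report_interval(read_value, interval, subsamples, sleep=time.sleep):
    """Call read_value() `subsamples` times spread evenly over `interval`
    seconds and summarize the readings."""
    if subsamples < 1:
        raise ValueError("need at least one sub-sample")
    pause = interval / subsamples
    values = []
    for _ in range(subsamples):
        sleep(pause)
        values.append(read_value())
    # "last" is what a plain sampler reading once per interval would report.
    return {"avg": mean(values), "max": max(values), "last": values[-1]}


if __name__ == "__main__":
    readings = iter([2, 2, 40, 3, 2, 2])  # made-up queue lengths
    print(report_interval(lambda: next(readings), interval=0.06, subsamples=6))
```

The summary reports a maximum of 40 and an average of 8.5, while a single read at the end of the interval would have seen only 2. Each hidden read costs about as much as a visible one, so this would save stored data rather than collection overhead.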

>
> hope this helps
>
Yes, very much. Many thanks!

Cheers

-Frank

17 Feb 2012 19:24

### Re: Are measurements cumulative?

>> these are simply instantaneous values as reported in /proc/xxx
>
> Okay, this means that one might miss peak values when increasing the
> measurement interval.

exactly.  that's one of the reasons I go crazy when I see people running sar at a 10-minute interval
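A small sketch of the point about instantaneous values: a gauge is only seen at the moment it is read. The queue-length series and the helper names are hypothetical.

```python
# Sketch: instantaneous values (queue length, memory in use, slab counts)
# are simply read at each sample, so anything that happens between reads
# is invisible. The queue-length series is made up.

def sampled_max(series, interval):
    """Largest value seen when reading `series` (one value per second)
    every `interval` seconds, starting at second 0."""
    return max(series[t] for t in range(0, len(series), interval))


def burst_series(length=600, base=2, peak=40, start=305, duration=20):
    """Hypothetical gauge: `base` everywhere except a short burst to `peak`."""
    return [peak if start <= t < start + duration else base for t in range(length)]


if __name__ == "__main__":
    queue = burst_series()
    print("true peak:", max(queue))
    for interval in (1, 10, 60, 600):
        print(f"read every {interval:>3}s -> peak seen {sampled_max(queue, interval)}")
```

Reading every 1 or 10 seconds catches the burst to 40, but reading every 60 or 600 seconds (the 10-minute sar case) reports a peak of 2.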

>> all cpu times reported in /proc/stat are in jiffies [...]
>
> Is the disk utilisation also based on jiffies?

if Time::HiRes is installed it uses that.  if not installed it does use jiffies
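A sketch of how a utilization figure could be computed, assuming it comes from the "time spent doing I/Os" column of /proc/diskstats divided by the elapsed interval. The thread only says which clock is used, so the column choice is an assumption; in Python, `time.monotonic()` would play the role of Time::HiRes. The sample lines are made up.

```python
# Sketch: disk utilization from two /proc/diskstats lines. Assumes it is
# derived from the "time spent doing I/Os" column (milliseconds; index 12
# after splitting, per the kernel's iostats documentation) divided by the
# elapsed time. The elapsed time is where a high-resolution clock
# (Time::HiRes in Perl, time.monotonic() in Python) beats jiffy resolution.

IO_TICKS_COL = 12  # major, minor, name, then stats; io_ticks is the 10th stat


def parse_io_ticks(line):
    """Return (device name, milliseconds spent doing I/O) from one line."""
    parts = line.split()
    return parts[2], int(parts[IO_TICKS_COL])


def disk_util(ticks_before, ticks_after, elapsed_seconds):
    """Percent of the interval during which the device had I/O in flight."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    busy_ms = ticks_after - ticks_before
    return busy_ms / (elapsed_seconds * 1000) * 100


if __name__ == "__main__":
    # Made-up readings taken one second apart.
    before = "   8       0 sda 1200 30 96000 4000 800 20 64000 3000 0 2500 7000"
    after = "   8       0 sda 1500 35 120000 5000 900 25 72000 3500 0 3250 8500"
    name, t0 = parse_io_ticks(before)
    _, t1 = parse_io_ticks(after)
    print(f"{name}: {disk_util(t0, t1, 1.0):.1f}% busy")
    # A coarse clock that measures the same interval as 0.99 s shifts the
    # result slightly, which is why the interval's timer resolution matters.
    print(f"{name}: {disk_util(t0, t1, 0.99):.1f}% busy (coarse timer)")
```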

> Just for interest, would it make sense for the tool to have some
> 'internal' hidden counters that take a (maybe configurable) number of
> 'hidden' measurements and sum the values for each counter, so it can
> take the average at the end?
> Maybe this is dumb, because one could choose a smaller measurement
> interval from the start, which would lead to the same computational
> (CPU and memory) overhead, but it would help to get more 'accurate'
> measurements even for a bigger interval, with a smaller amount of
> performance measurement data(?).

it could do lots of things, but in the spirit of simplicity (many would argue it lost its simplicity long ago ;)), it is what it is.  Also, collectl NEVER looks at the data it collects, at least not unless it is explicitly displaying results, as I wanted to keep the collection as lightweight as possible.

-mark


