Sunday, September 26, 2010

Pruning Bogus Value Spikes from MRTG Gauge Values after a Reboot

MRTG is awesome and simple.  I use it in a bunch of places to quickly figure out roughly what's going on over my networks and servers.  Traffic numbers, disk space, the usual stuff.  My usage of it is strictly ghetto, but it works really well for me even when I misuse it.

When I reboot a machine, though, I find my gauges all spike.  Traffic numbers, CPU load, and so on: there's a spike right at the point where I rebooted the machine.  It seems that MRTG sees the value has been reset, but treats it as an overflow, as if the counter had wrapped around, rather than as if it had been zeroed out.  So the spike may be its attempt to account for the massive jump in data that a wraparound would imply.
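
To make that guess concrete, here is a rough sketch of the arithmetic, not MRTG's actual code.  It assumes the value is polled as a 32-bit counter every 300 seconds; the function name and the sample numbers are made up for illustration.

```shell
# Hypothetical helper: the per-second rate a poller would report if it
# assumes that any drop in the value means a 32-bit counter wrapped.
wrapped_rate() {
  prev=$1; cur=$2; interval=$3
  if [ "$cur" -ge "$prev" ]; then
    delta=$(( cur - prev ))                # normal case: value went up
  else
    delta=$(( 4294967296 - prev + cur ))   # assumed wrap at 2^32
  fi
  echo $(( delta / interval ))
}

# Value was 3000000000 before the reboot and 1200 just after it.
wrapped_rate 3000000000 1200 300   # prints 4316561
```

A value that really only moved by 1200 gets reported as more than four million per second, which is the kind of spike that shows up on the graph.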

The only solution I have so far, because I can't use the AbsMax or MaxBytes parameters, is to prune the ugly data points:

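val=500000  # prune any sample with a value at or above this
for F in /var/www/html/mrtg/{cpu,dev-*}/*.{log,old}; do
  [ -f "$F" ] || continue  # skip glob patterns that matched nothing
  # NF < 4 keeps the first line of the log, which has only three fields
  awk -v X="$val" '
    NF < 4 || ($2 < X && $3 < X && $4 < X && $5 < X)
  ' "$F" | diff -u "$F" - | patch "$F"
done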
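
If you'd rather see what the filter throws away before letting it loose on real logs, the same awk condition can be tried on a small sample first.  This is only a sketch: the sample lines are invented, laid out the way MRTG logs are (a three-field first line, then timestamp, average in, average out, max in, max out), and prune_spikes is a name I made up.

```shell
# Invented sample data in MRTG's log layout.
sample='1285500000 1200 3400
1285500000 120 340 150 400
1285499700 4316561 98 4316561 110
1285499400 110 300 140 390'

# Hypothetical wrapper around the same awk condition; $1 is the threshold.
prune_spikes() {
  awk -v X="$1" 'NF < 4 || ($2 < X && $3 < X && $4 < X && $5 < X)'
}

# Prints the sample without the 1285499700 line, the only one over the limit.
printf '%s\n' "$sample" | prune_spikes 500000
```

Piping a real log through prune_spikes and into diff -u against the original shows exactly which lines would go, without touching the file.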

Really, though, you should use the AbsMax and MaxBytes parameters whenever and wherever you can.  They'll prevent this spike when you reboot your machine, because MRTG ignores any value above the limit you set.
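
For what it's worth, those limits are one-liners in the MRTG config.  Here is a small sketch that prints them for a target; the target name eth0 and the numbers are assumptions (12500000 bytes per second is a 100 Mbit/s link), and mrtg_limits is a made-up helper, so adjust to whatever your own config calls things.

```shell
# Hypothetical helper: print MaxBytes/AbsMax lines for one MRTG target.
# $1 = target name, $2 = MaxBytes, $3 = AbsMax (all assumed values).
mrtg_limits() {
  cat <<EOF
MaxBytes[$1]: $2
AbsMax[$1]: $3
EOF
}

mrtg_limits eth0 12500000 12500000
```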

Finally, I'm sorry if the above code snippet looks like absolute ass.  I added some formatting to make it easier to read (I usually do it all on one line), but what is with the pathetic format munging in this 'new, better' editor?  It's horrid!  Can we roll it back, please?
