Our client is wondering why the values in Solarwinds do not reflect the values found on their servers:
top - 17:58:42 up 1:44, 1 user, load average: 0.03, 0.06, 0.06
Tasks: 94 total, 1 running, 93 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.7%us, 0.2%sy, 0.0%ni, 94.8%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8174656k total, 1725996k used, 6448660k free, 39772k buffers
Swap: 8388600k total, 0k used, 8388600k free, 285544k cached
= ~21% Utilization
$ free -m
             total       used       free     shared    buffers     cached
Mem:          7983       1684       6298          0         39        278
-/+ buffers/cache:       1366       6616
Swap:         8191          0       8191
= ~21% Utilization
Solarwinds = 17% utilization
Figuring that this was just a case of SNMP sending slightly different data, I tried a basic snmpwalk against memory:
$ snmpwalk -v 2c -c xxxxxxxxxx localhost Memory
UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 8388600
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 8388600
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 8174656
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 6446020
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 14834620
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000
UCD-SNMP-MIB::memShared.0 = INTEGER: 0
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 42552
UCD-SNMP-MIB::memCached.0 = INTEGER: 285616
UCD-SNMP-MIB::memSwapError.0 = INTEGER: 0
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:
1-(memAvailReal/memTotalReal) = ~21%
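As a sanity check on the arithmetic, here is a minimal Python sketch that recomputes utilization from the snmpwalk values above in two ways: the 1-(memAvailReal/memTotalReal) formula, and a variant that also subtracts memBuffer and memCached from the used figure (the same idea as the "-/+ buffers/cache" row in free). Whether Solarwinds actually uses the second variant is only an assumption, not something confirmed here; the helper name pct is hypothetical.

```python
# Values copied from the snmpwalk output above (UCD-SNMP-MIB, in kB).
mem_total_real = 8174656
mem_avail_real = 6446020
mem_buffer = 42552
mem_cached = 285616


def pct(used, total):
    """Return used/total as a percentage rounded to one decimal place."""
    return round(100.0 * used / total, 1)


# Formula from the post: 1 - (memAvailReal / memTotalReal).
used_raw = mem_total_real - mem_avail_real
raw_pct = pct(used_raw, mem_total_real)

# ASSUMPTION: a poller might treat buffers and page cache as reclaimable
# and exclude them from "used". This is a guess, not a documented formula.
used_net = used_raw - mem_buffer - mem_cached
net_pct = pct(used_net, mem_total_real)

print(f"raw utilization:                {raw_pct}%")
print(f"excluding buffers and cache:    {net_pct}%")
```

With these inputs the first figure comes out at about 21% and the second at about 17%, so a buffers/cache adjustment is one candidate explanation worth ruling in or out.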
Even when I manually enter the OIDs I receive the same basic results.
$ snmpwalk -v 2c -c xxxxxxx localhost .1.3.6.1.4.1.2021.4.5.0
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 8174656
$ snmpwalk -v 2c -c xxxxxxx localhost .1.3.6.1.4.1.2021.4.6.0
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 6400580
= ~21%
I'm having a hard time explaining to our client why Solarwinds is reporting a 4% lower utilization than they are seeing on the server itself. 4% could be the difference between an alert being generated or not, so you can see where the dilemma is coming from.
We have seen similar situations on Linux disk monitors, but in that case we are able to see how the values are being pulled more or less directly from SNMP. When we can fall back on Solarwinds using the SNMP reported data we are able to explain why utilization levels in Solarwinds do not reflect those on the server itself. In this case we are really at a loss for an explanation.
Is Solarwinds using a different OID? If so, is there a way to change the OID that is being used to the ones I showed above without resorting to a Universal Device Poller (UnDP) or something similar? Can someone provide the formula that is used to calculate Memory Used in the CPU Load & Memory Utilization module?
Thanks in advance,
Bob