Troubleshooting CPU, Memory, and Disk with counter thresholds:
CPU
%RDY > 10: Percentage of time a VM was waiting to be scheduled. Values between 5 and 10% already deserve attention. Possible reasons: too many vCPUs, too many vSMP VMs, or a CPU limit setting.
%CSTP >= 3: Relevant if you are using vSMP virtual machines. Shows the percentage of time a ready-to-run VM has spent in the co-deschedule state. If the value is > 3, decrease the number of vCPUs of the VM concerned.
%MLMTD >= 1: Percentage of time a ready-to-run vCPU was not scheduled because of a CPU limit setting. Remove the limit for better performance.
%VMWAIT 100: Percentage of time a VM was waiting for some VMkernel activity to complete (such as I/O) before it can continue. Includes %SWPWT and “blocked” time, but not idle time (as %WAIT does). Possible causes: a storage performance issue, or latency to a device in the VM configuration, e.g. a USB device, serial pass-through device, or parallel pass-through device.
%SWPWT >= 5: How long a VM has to wait for swapped pages to be read from disk. A possible reason is memory overcommitment. Pay attention if %SWPWT is > 5!
%SYS > 10: Percentage of time spent by the system to process interrupts and to perform other system activities on behalf of the world. Possible cause: a VM with high I/O.
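The CPU thresholds above can be checked mechanically once the counters have been exported (for example from esxtop batch mode). A minimal Python sketch, assuming the values are already parsed into a dictionary keyed by counter name; the dictionary layout and the function name `check_cpu` are illustrative only and not part of esxtop:

```python
import operator

# Thresholds from the list above.
# Assumed reading: %RDY and %SYS use ">", the other counters use ">=".
CPU_THRESHOLDS = {
    "%RDY": (operator.gt, 10.0),
    "%CSTP": (operator.ge, 3.0),
    "%MLMTD": (operator.ge, 1.0),
    "%SWPWT": (operator.ge, 5.0),
    "%SYS": (operator.gt, 10.0),
}


def check_cpu(sample):
    """Return the names of the counters in `sample` that cross their threshold."""
    flagged = []
    for name, (compare, limit) in CPU_THRESHOLDS.items():
        value = sample.get(name)
        # Counters missing from the sample are skipped rather than flagged.
        if value is not None and compare(value, limit):
            flagged.append(name)
    return flagged


# Hypothetical values for one VM.
sample = {"%RDY": 12.4, "%CSTP": 0.5, "%MLMTD": 0.0, "%SWPWT": 6.1, "%SYS": 2.0}
print(check_cpu(sample))
```

For the sample values shown, %RDY and %SWPWT are flagged, which would point at vCPU sizing and memory overcommitment as the first things to look at.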
Disk
DAVG/cmd >= 25: Latency (ms) at the device driver level. Indicator of storage performance trouble.
KAVG/cmd >= 3: Latency (ms) caused by the VMkernel. Possible cause: queuing (wrong queue depth parameter or wrong failover policy).
GAVG/cmd >= 25: Latency (ms) as seen by the guest. GAVG = DAVG + KAVG.
ABRTS/s > 1: Commands aborted per second. If the storage system has not responded within 60 seconds, VMs running a Windows operating system will issue an abort.
RESETS/s > 1: Number of commands reset per second.
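Because GAVG = DAVG + KAVG, the guest can see latency above the threshold even when the device latency alone still looks acceptable. A small Python sketch of that arithmetic, assuming latencies in milliseconds and ">=" comparisons; the function name and the returned keys are made up for illustration:

```python
def disk_latency_report(davg_ms, kavg_ms):
    """Combine device and kernel latency and compare each part to its threshold."""
    gavg_ms = davg_ms + kavg_ms  # GAVG = DAVG + KAVG
    return {
        "GAVG": gavg_ms,
        "device_slow": davg_ms >= 25,  # DAVG threshold from the list above
        "kernel_slow": kavg_ms >= 3,   # KAVG threshold from the list above
        "guest_slow": gavg_ms >= 25,   # GAVG threshold from the list above
    }


# Hypothetical sample: device latency is below its threshold,
# but kernel queuing pushes the total over 25 ms.
print(disk_latency_report(22.0, 4.5))
```

In this sample the array itself is not the problem; the high KAVG suggests looking at queue depth or failover policy first.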
NUMA Node (in esxtop, press “m” and enable fields D and G)
N%L < 80: Percentage of VM memory located on the local NUMA node. If this value is below 80%, the VM will experience performance issues.
NLMEM: VM memory (in MB) located on the local node.
NRMEM: VM memory (in MB) located on remote nodes.
NMN: NUMA node on which the VM is located.
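esxtop reports N%L directly, but the relationship between the three memory counters can be sketched as follows. This assumes that all of the VM's memory is counted either as local (NLMEM) or remote (NRMEM); the function name is hypothetical:

```python
def numa_locality_percent(nlmem_mb, nrmem_mb):
    """Approximate N%L as the local share of the VM's NUMA-managed memory."""
    total_mb = nlmem_mb + nrmem_mb
    if total_mb == 0:
        return None  # nothing allocated yet, locality is undefined
    return 100.0 * nlmem_mb / total_mb


# Hypothetical VM with 6 GB local and 2 GB remote memory.
locality = numa_locality_percent(6144, 2048)
print(locality, "below threshold" if locality < 80 else "ok")
```

With 6144 MB local and 2048 MB remote, locality is 75%, which is below the 80% mark and would be worth investigating.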
Memory
MCTLSZ >= 1: Amount of guest physical memory (in MB) the ESXi host is reclaiming through the balloon driver. Possible cause: memory overcommitment.
SWCUR >= 1: Memory (in MB) that has been swapped by the VMkernel. Possible cause: memory overcommitment.
SWR/s >= 1: Rate at which the ESXi host is writing to or reading from swapped memory. Possible cause: memory overcommitment.
CACHEUSD >= 1: Memory (in MB) compressed by the ESXi host.
ZIP/s >= 1: Values larger than 0 indicate that the host is actively compressing memory.
UNZIP/s >= 1: Values larger than 0 indicate that the host is accessing compressed memory. Possible cause: memory overcommitment.
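All of the memory counters above point at the same root cause, so it can help to translate them into the reclamation technique they reveal. A rough Python sketch, assuming any value larger than 0 counts as a sign (slightly stricter than a threshold of 1); the labels and the function name are illustrative:

```python
# Counter name -> what a nonzero value suggests (wording is illustrative).
RECLAIM_COUNTERS = {
    "MCTLSZ": "ballooning",
    "SWCUR": "swapped memory present",
    "SWR/s": "active swap I/O",
    "CACHEUSD": "compressed memory present",
    "ZIP/s": "active compression",
    "UNZIP/s": "reads from compressed memory",
}


def reclamation_signs(sample):
    """List the reclamation techniques indicated by nonzero counters."""
    return [
        label
        for name, label in RECLAIM_COUNTERS.items()
        if sample.get(name, 0.0) > 0
    ]


# Hypothetical sample: the VM is ballooned and has swapped pages,
# but nothing is being swapped or compressed right now.
sample = {"MCTLSZ": 512, "SWCUR": 128, "SWR/s": 0.0, "ZIP/s": 0.0}
print(reclamation_signs(sample))
```

An empty list means none of the listed counters shows memory pressure for that VM.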
Memory State:
high: enough free memory available (normal TPS cycles)
clear <100% of minFree: ESXi actively calls TPS to collapse pages
soft <64% of minFree: Host reclaims memory by balloon driver + TPS
hard <32% of minFree: Host starts to swap, compress + TPS / no more ballooning
low <16% of minFree: ESXi blocks VMs from allocating more RAM +
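The state boundaries above can be written as a simple lookup. This is a simplified Python sketch that follows only the percentages in the list; it assumes free memory and minFree are given in the same unit, and the function name is hypothetical. The way a real host moves between states may differ in detail:

```python
def memory_state(free_mb, min_free_mb):
    """Map free host memory to a state name using the percentages listed above."""
    if min_free_mb <= 0:
        raise ValueError("min_free_mb must be positive")
    pct_of_min_free = 100.0 * free_mb / min_free_mb
    if pct_of_min_free < 16:
        return "low"
    if pct_of_min_free < 32:
        return "hard"
    if pct_of_min_free < 64:
        return "soft"
    if pct_of_min_free < 100:
        return "clear"
    return "high"


# Hypothetical host with minFree = 1000 MB.
for free in (1200, 800, 500, 200, 100):
    print(free, memory_state(free, 1000))
```

Checking the most severe state first keeps each branch to a single comparison.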