NUMA (non-uniform memory access) is a multiprocessing memory architecture in which memory access time is dependent on the memory location relative to the processor, and a CPU may access its own local memory faster than non-local memory. The benefits of NUMA are limited to specific workloads, especially on servers where data is regularly linked to individual processes or users.
NUMA systems are high-performance server architectures that link multiple system buses. They can integrate a large number of processors into a single system image, often resulting in an attractive price-performance ratio.
NUMA delivers considerable benefits when used in a virtualized environment (VMware ESXi, Microsoft Hyper-V, etc.) as long as a guest VM does not use more resources than a single NUMA node provides.
Architecture of NUMA
The NUMA architecture contains several CPU modules, each with multiple CPUs and its own local memory, I/O slots, and other resources. Each CPU can access the whole system’s memory, since the nodes use an interconnection module to link up and share information. Local memory access is much faster than remote memory access, which is exactly the non-uniform access the name refers to.
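The local-versus-remote cost difference can be sketched with a toy distance matrix. The layout mirrors the node-distance table that Linux tools such as `numactl --hardware` report (10 for local access, larger values for remote access); the numbers here are illustrative, not measurements.

```python
# Toy model of NUMA access latency. Distance 10 = local access;
# larger values mean slower remote access (illustrative numbers).
DISTANCES = [
    [10, 21],   # from node 0: to node 0 (local), to node 1 (remote)
    [21, 10],   # from node 1: to node 0 (remote), to node 1 (local)
]

def relative_access_cost(cpu_node: int, memory_node: int) -> float:
    """Cost of accessing memory_node from cpu_node, relative to local access."""
    return DISTANCES[cpu_node][memory_node] / DISTANCES[cpu_node][cpu_node]

print(relative_access_cost(0, 0))  # 1.0 -> local access
print(relative_access_cost(0, 1))  # 2.1 -> remote access costs ~2x in this model
```

In this model a remote access is roughly twice as expensive as a local one, which is why the scheduler works so hard to keep a VM's memory on its home node.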
Structurally, NUMA and MPP (massively parallel processing) systems are similar in many respects: both consist of numerous nodes, each with its own processors, memory, and I/O ports, and the nodes exchange information over an interconnect. There are, however, fundamental distinctions:
Node Interconnection Mechanism: The NUMA interconnect operates inside a single physical server, joining the nodes into one system image, whereas an MPP interconnect links otherwise independent nodes, each running its own operating system.
Memory Access Mechanism: In NUMA, every CPU can address the entire system’s memory, but when a CPU accesses memory on a remote node it must wait. This is the main reason why a NUMA server can’t offer linear performance scaling as CPU count increases. In MPP, each node can only access its own local memory.
NUMA Home Node
The NUMA home node is a logical representation of a node’s CPUs and local memory. It is critical during initial placement: if a VM’s CPU or memory allocation exceeds what the home node can provide, the VM is forced to balance across two nodes.
Each virtual machine that the NUMA scheduler supervises is assigned a home node. A home node is a NUMA node with its own processors and local memory.
When allocating memory to a VM, the ESXi host will always try to allocate memory from the home node. The VM’s virtual CPUs are constrained to running on the home node to enhance memory locality. When needed or viable, the VMkernel NUMA scheduler can dynamically shift a VM’s home node to respond to changes in system demand.
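The home-node preference can be sketched as a simple allocation rule: serve a VM's memory request from its home node first and spill over to a remote node only when the home node is full. This is a hypothetical illustration of the behavior described above, not the VMkernel's actual algorithm; all sizes are made up.

```python
def place_memory(request_mb: int, home_free_mb: int, remote_free_mb: int):
    """Return (local_mb, remote_mb): home node first, remote spillover last."""
    local = min(request_mb, home_free_mb)
    remote = request_mb - local
    if remote > remote_free_mb:
        raise MemoryError("not enough memory on either node")
    return local, remote

# A 48 GB VM on a host with only 40 GB free on its home node:
local, remote = place_memory(48 * 1024, 40 * 1024, 64 * 1024)
print(local, remote)  # 40960 8192 -> 8 GB lands on the remote node
```

Note how the spillover memory (8 GB here) is exactly the portion that will suffer remote-access latency until the scheduler migrates it back.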
However, the VMkernel is limited by physical and technological constraints, misconfigurations can cause performance problems, and the VMkernel alone cannot be relied on for effective load balancing of VMs.
As a result, it is necessary to start evaluating the current NUMA state, for example:
- Is remote NUMA access occurring?
- How frequently do VMs migrate their Home node?
- How much memory is moved when NUMA migration takes place and how many VMs are affected?
- Is this a generic or ESXi-specific problem?
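The checklist above can be folded into a minimal health check. The counter names and thresholds below are assumptions for illustration only, not VMkernel or CNIL metric names; the 80% locality threshold follows the rule of thumb used in this post.

```python
def numa_warnings(remote_bytes: int, migrations_per_day: int,
                  locality_pct: float) -> list:
    """Flag the NUMA symptoms worth investigating (hypothetical thresholds)."""
    warnings = []
    if remote_bytes >= 1 << 30:             # gigabytes of remote access
        warnings.append("heavy remote NUMA access")
    if migrations_per_day > 100:            # frequent home-node moves
        warnings.append("excessive home-node migrations")
    if locality_pct < 80:                   # low home-node locality
        warnings.append("low NUMA home-node locality")
    return warnings

print(numa_warnings(remote_bytes=2 << 30,
                    migrations_per_day=5000,
                    locality_pct=72))
```

A VM that trips all three flags at once is a strong candidate for resizing or for reviewing its NUMA-related settings.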
Then, on a per-VM or per-ESXi basis, begin changing default VMkernel settings or correcting misconfigurations, and track all improvements over time.
VMs can be severely impacted by thousands of migrations every day. Because every home-node migration also triggers a memory migration (the VMkernel optimizes by relocating remote-node memory to the new home node), the entire ESXi host can quickly slow down.
CNIL Metrics and Logs monitors NUMA home-node locality as a percentage; the lower this value, the greater the risk that poor NUMA locality is causing a performance problem. If the number drops below 80%, you should be concerned.
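A locality percentage like this can be derived from raw counters as the share of memory served by the home node. This is a sketch of the calculation, assuming you can read local and remote access volumes per VM:

```python
def home_node_locality_pct(local_bytes: int, remote_bytes: int) -> float:
    """Percentage of memory access served by the NUMA home node."""
    total = local_bytes + remote_bytes
    return 100.0 if total == 0 else 100.0 * local_bytes / total

# 30 GB served locally, 10 GB from a remote node:
pct = home_node_locality_pct(local_bytes=30 << 30, remote_bytes=10 << 30)
print(round(pct, 1))  # 75.0
print(pct < 80)       # True -> below the 80% threshold, worth investigating
```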
NUMA Remote Node
NUMA Remote Node Access displays the amount of memory accessed via a remote node, in bytes (the slowest memory access). Single-digit Mbyte values are no cause for concern. Gigabytes, however, call for action!
CNIL Metrics and Logs monitors this per VM as the amount of memory the VM accesses from a remote NUMA node. The greater this value, the more likely NUMA is to blame for a performance issue; once it reaches the Gigabyte range, you should act.
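Following the rule of thumb above (single-digit Mbytes are noise, Gigabytes need action), the metric can be bucketed into rough severity levels. The cutoffs are assumptions for illustration:

```python
def remote_access_severity(remote_bytes: int) -> str:
    """Rough severity bucket for remote NUMA access volume (assumed cutoffs)."""
    if remote_bytes < 10 * 1024**2:   # under ~10 MB: noise, ignore
        return "ok"
    if remote_bytes < 1024**3:        # MB range: keep an eye on it
        return "watch"
    return "act"                      # GB range: take action

print(remote_access_severity(5 * 1024**2))    # ok
print(remote_access_severity(300 * 1024**2))  # watch
print(remote_access_severity(4 * 1024**3))    # act
```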
How does the VMware ESXi Host use NUMA?
ESXi dynamically balances processor load to maximize memory locality using a sophisticated NUMA scheduler. Each VM that the NUMA scheduler supervises is assigned a home node, a NUMA node with its own processors and local memory.
When a VM requires memory, the ESXi host always tries to allocate it from the home node. The VM’s virtual CPUs are constrained to running on the home node to enhance memory locality.
The VMkernel NUMA scheduler may dynamically relocate a virtual machine’s home node to respond to changes in system demand when needed or feasible. However, the VMkernel is limited by physical and technological constraints, and misconfigurations might result in performance problems. For successful load balancing of your VMs, you can’t rely just on the VMkernel.
How to detect NUMA performance issues
There are various ways to run into NUMA performance difficulties, but monitoring them without third-party software is hard. As discussed above, NUMA home-node utilization is the most critical item to examine.
Check out CNIL Metrics and Logs if you’re searching for a simpler solution that retains and visualizes this information over long periods and across all of your ESXi hosts and VMs. To get started, take advantage of the 30-day free trial. Even in the Starter edition, you’ll find the most significant metrics under the Virtual Machine Memory Access Slowdown indicators on the VMware Virtual Machine Dashboard.
NUMA Home Node percent shows the proportion of memory access that stays in the NUMA home node (the fastest memory access). That number should always be at, or extremely close to, 100%. If it falls below 90% for an extended time, you should begin optimizing.
NUMA Remote Node Access displays the amount of memory accessed via the remote node in bytes; again, single-digit Mbyte values are no cause for concern.
Start your free Trial!
When memory is accessed locally, most workloads achieve optimum performance. To get the most out of the system, a VM’s vCPU and RAM configuration should mirror the workload’s needs, and VMs should typically be small enough to fit on a single NUMA node. NUMA optimizations help when a VM configuration must span several NUMA nodes, but if possible, stick to a single CPU package.
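The sizing rule above amounts to a simple check: a VM stays on one NUMA node when both its vCPU count and its memory fit inside a single node. The host dimensions below are hypothetical examples.

```python
def fits_single_numa_node(vcpus: int, vm_mem_gb: int,
                          cores_per_node: int, mem_per_node_gb: int) -> bool:
    """True if the VM's vCPUs and memory both fit inside one NUMA node."""
    return vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb

# Example host: 2 sockets x 16 cores, 128 GB of memory per NUMA node.
print(fits_single_numa_node(8, 64, 16, 128))    # True  -> stays on one node
print(fits_single_numa_node(24, 64, 16, 128))   # False -> spans two nodes
```

A VM that fails this check becomes a "wide" VM, which is exactly the case where the NUMA optimizations discussed in this post start to matter.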