
Performance Management Guide

Tuning mbuf pool performance

The network subsystem uses a memory management facility that revolves around a data structure called an mbuf. Mbufs are mostly used to store data in the kernel for incoming and outgoing network traffic. Having mbuf pools of the right size can have a positive effect on network performance. If the mbuf pools are configured incorrectly, both network and system performance can suffer. The upper limit of the mbuf pool size, which is the thewall tunable, is automatically determined by the operating system, based on the amount of memory in the system. As the system administrator, you can tune only the upper limit of the mbuf pool size.

The thewall tunable

The thewall network tunable option sets the upper limit for network kernel buffers. The system automatically sets the value of the thewall tunable to the maximum value and in general, you should not change the value. You could decrease it, which would reduce the amount of memory the system uses for network buffers, but it might affect network performance. Since the system only uses the necessary number of buffers at any given time, if the network subsystem is not being heavily used, the total number of buffers should be much lower than the thewall value.

The thewall tunable is expressed in units of 1 KB, so a value of 1048576 represents 1024 MB, or 1 GB, of RAM.
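As a sanity check on the units, the conversion can be sketched in shell arithmetic (the 1048576 value is the illustrative figure from the text, not a recommendation):

```shell
#!/bin/sh
# thewall is expressed in KB, so divide to convert; 1048576 KB = 1024 MB = 1 GB.
thewall_kb=1048576              # example thewall value, in KB
echo "$((thewall_kb / 1024)) MB"          # prints "1024 MB"
echo "$((thewall_kb / 1024 / 1024)) GB"   # prints "1 GB"
```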

32-bit versus 64-bit kernel

The AIX 32-bit kernel has up to 1 GB of mbuf buffer space, consisting of up to four memory segments of 256 MB each. This value might be lower, based on the total amount of memory in the system. The size of the thewall tunable is either 1 GB or half of the amount of system memory, whichever value is smaller.

The AIX 64-bit kernel has a much larger kernel buffer capacity. It has up to 65 GB of mbuf buffer space, consisting of 260 memory segments of 256 MB each. With the 64-bit kernel, the size of the thewall tunable is either 65 GB or half of the amount of system memory, whichever value is smaller.

Therefore, systems with large numbers of TCP connections, network adapters, or network I/O should consider using the 64-bit kernel if the mbuf pool is limiting capacity or performance.
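The sizing rule described above, thewall = min(kernel cap, half of system RAM), can be sketched as follows; the 4 GB RAM figure is a hypothetical example:

```shell
#!/bin/sh
# thewall is the smaller of the kernel cap and half of system memory.
# Caps: 1048576 KB (1 GB) on the 32-bit kernel; 68157440 KB (65 GB) on the 64-bit kernel.
cap_kb=1048576                  # 32-bit kernel cap
ram_kb=4194304                  # hypothetical system with 4 GB of RAM
half_ram_kb=$((ram_kb / 2))
if [ "$half_ram_kb" -lt "$cap_kb" ]; then
    thewall_kb=$half_ram_kb
else
    thewall_kb=$cap_kb
fi
echo "thewall = $thewall_kb KB"   # prints "thewall = 1048576 KB"
```

On this 4 GB example, half of the RAM (2097152 KB) exceeds the 32-bit cap, so the cap wins; on a 1 GB system, the half-RAM value (524288 KB) would win instead.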

The maxmbuf tunable

The value of the maxmbuf tunable limits how much real memory is used by the communications subsystem. You can also use the maxmbuf tunable to lower the thewall limit. You can view the maxmbuf tunable value by running the lsattr -E -l sys0 command. If the maxmbuf value is greater than 0, the maxmbuf value is used regardless of the value of the thewall tunable.

The default value for the maxmbuf tunable is 0. A value of 0 for the maxmbuf tunable indicates that the thewall tunable is used. You can change the maxmbuf tunable value by using the chdev or smitty commands.
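For example, viewing and changing the maxmbuf value might look like the following; the 2097152 KB figure is illustrative, and the commands must be run on AIX with root authority:

```shell
# Display the current maxmbuf setting (an attribute of the sys0 device):
lsattr -E -l sys0 -a maxmbuf

# Cap communications memory at 2097152 KB; setting maxmbuf back to 0
# returns control to the thewall tunable:
chdev -l sys0 -a maxmbuf=2097152
```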

The sockthresh and strthresh threshold tunables

The sockthresh and strthresh tunables set upper thresholds that limit the opening of new sockets or TCP connections and the creation of new streams resources. This prevents buffer resources from being exhausted and ensures that existing sessions and connections have the resources they need to continue operating.

The sockthresh tunable specifies a memory usage limit beyond which no new socket connections are allowed. The default value for the sockthresh tunable is 85%: once the total amount of allocated memory reaches 85% of the thewall or maxmbuf tunable value, no new socket connections are allowed (the socket() and socketpair() system calls fail with ENOBUFS) until buffer usage drops back below 85%.

Similarly, the strthresh tunable limits the amount of mbuf memory used for streams resources; its default value is also 85%. The async and TTY subsystems run in the streams environment. The strthresh tunable specifies that once the total amount of allocated memory reaches 85% of the thewall tunable value, no more memory is allocated to streams resources; calls to open streams, push modules, or write to streams devices fail with ENOSR.

You can tune the sockthresh and strthresh thresholds with the no command.
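For example, with the default 85% threshold and a hypothetical thewall of 1048576 KB, the level at which new sockets start failing can be computed as:

```shell
#!/bin/sh
# Memory level at which sockthresh (default 85%) starts refusing new sockets,
# assuming an example thewall of 1048576 KB (1 GB).
thewall_kb=1048576
sockthresh_pct=85
limit_kb=$((thewall_kb * sockthresh_pct / 100))
echo "socket() fails with ENOBUFS above $limit_kb KB"   # prints "... above 891289 KB"
```

Raising the threshold, for example with no -o sockthresh=90, moves this limit up correspondingly.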

Overview of the mbuf Management Facility

The mbuf management facility controls buffer sizes ranging from 32 bytes up to 16384 bytes. The pools are created from system memory by making an allocation request to the Virtual Memory Manager (VMM). The pools consist of pinned pieces of kernel virtual memory, which always reside in physical memory and are never paged out. As a result, the real memory available for paging application programs and data is reduced by the amount that the mbuf pools have been increased.

The network memory pool is split evenly among the processors. Each sub-pool is then split up into buckets, with each bucket holding buffers ranging in size from 32 to 16384 bytes. Each bucket can borrow memory from other buckets on the same processor, but a processor cannot borrow memory from another processor's network memory pool.

When a network service needs to transport data, it can call a kernel service such as m_get() to obtain a memory buffer. If the buffer is already available and pinned, it can get it immediately. If the upper limit has not been reached and the buffer is not pinned, then a buffer is allocated and pinned. Once pinned, the memory stays pinned but can be freed back to the network pool. If the number of free buffers reaches a high-water mark, then a certain number is unpinned and given back to the system for general use. This unpinning is done by the netm() kernel process. The caller of the m_get() subroutine can specify whether to wait for a network memory buffer. If the M_DONTWAIT flag is specified and no pinned buffers are available at that time, a failed counter is incremented. If the M_WAIT flag is specified, the process is put to sleep until the buffer can be allocated and pinned.

The netstat -m command to monitor mbuf pools

Use the netstat -m command to detect shortages or failures of network memory (mbufs/clusters) requests. You can use the netstat -Zm command to clear (or zero) the mbuf statistics, which is helpful when running tests so that you start with a clean set of statistics. The following fields are provided with the netstat -m command:

Field name    Definition
By size       Shows the size of the buffer.
inuse         Shows the number of buffers of that particular size in use.
calls         Shows the number of calls, or allocation requests, for each size of buffer.
failed        Shows how many allocation requests failed because no buffers were available.
delayed       Shows how many calls were delayed when that size of buffer was empty and the M_WAIT flag was set by the caller.
free          Shows the number of buffers of each size on the free list, ready to be allocated.
hiwat         Shows the maximum number of buffers, determined by the system, that can remain on the free list. Any free buffers above this limit are slowly freed back to the system.
freed         Shows the number of buffers that were freed back to the system when the free count went above the hiwat limit.

You should not see a large number of failed calls. There might be a few, which trigger the system to allocate more buffers as the buffer pool size increases. There is a predefined set of buffers of each size that the system starts with after each reboot, and the number of buffers increases as necessary.

The following is an example of the netstat -m command from a two-processor or CPU machine:

# netstat -m

Kernel malloc statistics:

******* CPU 0 *******
By size           inuse     calls failed   delayed    free   hiwat   freed
32                   68       693      0         0      60    2320       0
64                   55       115      0         0       9    1160       0
128                  21       451      0         0      11     580       0
256                1064      5331      0         0    1384    1392      42
512                  41       136      0         0       7     145       0
1024                 10       231      0         0       6     362       0
2048               2049      4097      0         0     361     362     844
4096                  2         8      0         0     435     435     453
8192                  2         4      0         0       0      36       0
16384                 0       513      0         0      86      87     470


******* CPU 1 *******
By size           inuse     calls failed   delayed    free   hiwat   freed
32                  139       710      0         0     117    2320       0
64                   53       125      0         0      11    1160       0
128                  41       946      0         0      23     580       0
256                  62      7703      0         0    1378    1392     120
512                  37       109      0         0      11     145       0
1024                 21       217      0         0       3     362       0
2048                  2      2052      0         0     362     362     843
4096                  7        10      0         0     434     435     449
8192                  0         4      0         0       1      36       0
16384                 0      5023      0         0      87      87    2667


***** Allocations greater than 16384 Bytes *****

By size           inuse     calls failed   delayed    free   hiwat   freed
65536                 2         2      0         0       0    4096       0

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures

ARP cache tuning

The Address Resolution Protocol (ARP) is a protocol used to map 32-bit IPv4 addresses into the 48-bit host adapter addresses required by the data link protocol. ARP is handled transparently by the system. However, the system maintains an ARP cache, which is a table that holds the 32-bit IP addresses and their associated 48-bit host addresses. You might need to change the size of the ARP cache in environments where large numbers of machines (clients) are connected.

The ARP cache is controlled by the following no command tunable parameters: arptab_nb, arptab_bsiz, arpqsize, and arpt_killc.

The ARP table size is composed of a number of buckets, defined by the arptab_nb parameter. Each bucket holds arptab_bsiz entries. The defaults are 73 buckets with 7 entries each, so the table can hold 511 (73 x 7) host addresses. If a server connects to 1000 client machines concurrently, then the default ARP table is too small, which causes AIX to thrash the ARP cache. The operating system then has to purge an entry in the cache and replace it with a new address. This requires the TCP or UDP packets to wait (be queued) while the ARP protocol exchanges this information. The arpqsize parameter determines how many of these waiting packets can be queued by the ARP layer until an ARP response is received back from an ARP request. If the ARP queue is overrun, outgoing TCP or UDP packets are dropped.
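The capacity arithmetic above can be sketched as follows; the 1000-client figure is the example from the text:

```shell
#!/bin/sh
# ARP table capacity = arptab_nb buckets * arptab_bsiz entries per bucket.
arptab_nb=73        # default number of buckets
arptab_bsiz=7       # default entries per bucket
capacity=$((arptab_nb * arptab_bsiz))
echo "ARP table capacity: $capacity entries"   # prints "ARP table capacity: 511 entries"
clients=1000        # example concurrent client count
if [ "$clients" -gt "$capacity" ]; then
    echo "table too small: expect ARP cache thrashing"
fi
```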

ARP cache thrashing might have a negative impact on performance for the following reasons:

  1. The current outgoing packet has to wait for the ARP protocol lookup over the network.
  2. Another ARP entry must be removed from the ARP cache. If all of the addresses are in use, the deleted entry must be resolved again as soon as packets are sent to that host.
  3. The ARP output queue might be overrun, which could cause dropped packets.

The arpqsize, arptab_bsiz, and arptab_nb parameters are all reboot parameters in that the system must be rebooted if their values change because they alter tables that are built at boot time or TCP/IP load time.

The arpt_killc parameter is the time, in minutes, before an ARP entry is deleted. The default value of the arpt_killc parameter is 20 minutes. ARP entries are deleted from the table every arpt_killc minutes to cover the case where a host system might change its 48-bit address, which can occur when its network adapter is replaced for example. This ensures that any stale entries in the cache are deleted, as these would prevent communication with such a host until its old address is removed. Increasing this time would reduce ARP lookups by the system, but can result in holding stale host addresses longer. The arpt_killc parameter is a dynamic parameter, so it can be changed on the fly without rebooting the system.
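As an illustration, the dynamic arpt_killc parameter can be changed immediately with the no command, while the load-time parameters take effect only after a reboot. The values below are illustrative; the -r flag for persisting a reboot value is available on newer AIX levels, while on older levels such settings are typically placed in /etc/rc.net:

```shell
# arpt_killc is dynamic; this takes effect immediately:
no -o arpt_killc=30

# arptab_nb requires a reboot to take effect;
# 149 buckets x 7 entries per bucket would cover roughly 1000 clients:
no -r -o arptab_nb=149
```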

The netstat -p arp command displays the ARP statistics. These statistics show how many total ARP requests have been sent and how many packets have been purged from the table when an entry is deleted to make room for a new entry. If this purged count is high, your ARP table size should be increased. The following is an example of the netstat -p arp command:

# netstat -p arp

arp:  
     6 packets sent
     0 packets purged

You can display the ARP table with the arp -a command. The command output shows which addresses are in the ARP table and into which buckets the addresses are hashed.

 ? (10.3.6.1) at 0:6:29:dc:28:71 [ethernet] stored 
                                                   
bucket:    0     contains:    0 entries            
bucket:    1     contains:    0 entries            
bucket:    2     contains:    0 entries            
bucket:    3     contains:    0 entries            
bucket:    4     contains:    0 entries            
bucket:    5     contains:    0 entries            
bucket:    6     contains:    0 entries            
bucket:    7     contains:    0 entries            
bucket:    8     contains:    0 entries            
bucket:    9     contains:    0 entries            
bucket:   10     contains:    0 entries            
bucket:   11     contains:    0 entries            
bucket:   12     contains:    0 entries            
bucket:   13     contains:    0 entries            
bucket:   14     contains:    1 entries            
bucket:   15     contains:    0 entries            


...lines omitted...

There are 1 entries in the arp table.
