This fixes a performance bug introduced by commit 90b1ebe ("mlx4: set
maximal number of default RSS queues"), which limits the number of IRQs
opened by the core module.
The limit should be on the number of queues in the indirection table -
rx_rings - and not on the number of IRQs. Also, applying the limit at
mlx4_core initialization instead of in mlx4_en prevented "ethtool -L" from
utilizing all the CPUs when performance mode is preferred, since limiting
this number to 8 reduces the overall packet rate by 15%-50% in multi-stream
TCP applications.
For example, after running ethtool -L <ethx> rx 16

                 Packet rate
Before the fix   897799
After the fix    1142070
Results were obtained using netperf:
S=200 ; ( for i in $(seq 1 $S) ; do ( \
netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
wait ; done | grep "1 1" | awk '{SUM+=$6} END {print SUM}' )
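
To illustrate the intended split between the two limits, below is a minimal
user-space sketch of the ring-count logic; the helpers and values (16 EQs,
an RSS default of 8) are illustrative stand-ins, not the driver code itself:

#include <stdio.h>

/* Stand-in for the kernel's rounddown_pow_of_two(). */
static int rounddown_pow_of_two(int n)
{
	int p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

int main(void)
{
	int num_of_eqs = 16;	/* assumed: completion EQs available for the port */
	int default_rss = 8;	/* stand-in for netif_get_num_default_rss_queues() */
	int num_rx_rings = num_of_eqs < default_rss ? num_of_eqs : default_rss;

	/* The default RX ring count is still capped by the RSS default... */
	printf("default rx_ring_num = %d\n", rounddown_pow_of_two(num_rx_rings));

	/* ...but the IRQ/EQ pool is no longer limited to that default, so
	 * "ethtool -L <ethx> rx 16" can raise the ring count up to num_of_eqs. */
	printf("max rx rings via ethtool -L = %d\n", rounddown_pow_of_two(num_of_eqs));
	return 0;
}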
CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
 {
 	int i;
 	int num_of_eqs;
+	int num_rx_rings;
 	struct mlx4_dev *dev = mdev->dev;

 	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
 					   dev->caps.comp_pool/
 					   dev->caps.num_ports) - 1;

+		num_rx_rings = min_t(int, num_of_eqs,
+				     netif_get_num_default_rss_queues());
 		mdev->profile.prof[i].rx_ring_num =
-			rounddown_pow_of_two(num_of_eqs);
+			rounddown_pow_of_two(num_rx_rings);
 	}
 }
 #include <linux/slab.h>
 #include <linux/io-mapping.h>
 #include <linux/delay.h>
-#include <linux/netdevice.h>
 #include <linux/kmod.h>
 #include <linux/mlx4/device.h>
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct msix_entry *entries;
 	int nreq = min_t(int, dev->caps.num_ports *
-			 min_t(int, netif_get_num_default_rss_queues() + 1,
+			 min_t(int, num_online_cpus() + 1,
 			       MAX_MSIX_P_PORT) + MSIX_LEGACY_SZ, MAX_MSIX);
 	int i;
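
For reference, a rough user-space sketch of how the MSI-X request count in
this hunk changes; the constants and the port/CPU counts below are
illustrative assumptions, not values taken from the mlx4 headers:

#include <stdio.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Illustrative placeholders for the real mlx4 constants. */
#define MAX_MSIX_P_PORT	17
#define MSIX_LEGACY_SZ	4
#define MAX_MSIX	64

int main(void)
{
	int num_ports = 2;	/* assumed dual-port adapter */
	int online_cpus = 16;	/* assumed num_online_cpus() */
	int default_rss = 8;	/* stand-in for netif_get_num_default_rss_queues() */

	/* Before the fix: the MSI-X request was capped by the RSS default. */
	int nreq_before = MIN(num_ports * MIN(default_rss + 1, MAX_MSIX_P_PORT) +
			      MSIX_LEGACY_SZ, MAX_MSIX);

	/* After the fix: the request scales with online CPUs, and the RSS
	 * default only limits mlx4_en's default ring count. */
	int nreq_after = MIN(num_ports * MIN(online_cpus + 1, MAX_MSIX_P_PORT) +
			     MSIX_LEGACY_SZ, MAX_MSIX);

	printf("nreq before: %d, after: %d\n", nreq_before, nreq_after);
	return 0;
}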