Mellanox CX6 vDPA hardware offload, the ovs-kernel way

This article walks through setting up vDPA hardware offload with a Mellanox ConnectX-6 Dx NIC.

Video walkthrough:

Introduction to vDPA hardware offload

Since we are talking about vDPA offload, let's first briefly cover what it is.

vDPA (virtio data path acceleration) is a kernel framework that was formally merged into the kernel in 2020. NIC vendors build vDPA NICs, meaning the data path follows the virtio specification while the control plane is provided by the vendor's driver.

Below is the architecture diagram of a vDPA deployment on a virtual machine platform:

Below is the architecture diagram of a vDPA deployment on a Kubernetes platform:

The diagrams above are borrowed from Red Hat's articles introducing the background of vDPA. Our experiment follows Mellanox's documentation, and from Mellanox's point of view there are two ways to do vDPA:

  1. Configure ovs-dpdk and add vDPA ports to OVS, which creates the sockets. The VM attaches the vDPA device through the socket.
  2. Configure ovs-kernel and run the vdpa DPDK application, which creates the sockets. The VM attaches the vDPA device through the socket.

For the first approach, Mellanox's official documentation says ovs-dpdk is only supported up to RHEL/CentOS 7. Our environment is RHEL/Rocky 8.4, so we use the second approach.

The background introduction here is deliberately brief; the reference below is a good place to dig deeper.

One DPDK-specific concept is the VF representor. The DPDK documentation describes it; roughly speaking, it is the VF's counterpart created for the control plane. A quick way to spot representors on a live host is shown after the diagram.

  • https://doc.dpdk.org/guides-18.11/prog_guide/switch_representation.html
   .-------------.                 .-------------. .-------------.
   | hypervisor  |                 |    VM 1     | |    VM 2     |
   | application |                 | application | | application |
   `--+---+---+--'                 `----------+--' `--+----------'
      |   |   |                               |       |
      |   |   `-------------------.           |       |
      |   `---------.             |           |       |
      |             |             |           |       |
.-----+-----. .-----+-----. .-----+-----.     |       |
| port_id 3 | | port_id 4 | | port_id 5 |     |       |
`-----+-----' `-----+-----' `-----+-----'     |       |
      |             |             |           |       |
    .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
    | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
    `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
      |             |             |           |       |
      |             |   .---------'           |       |
      `-----.       |   |   .-----------------'       |
            |       |   |   |   .---------------------'
            |       |   |   |   |
         .--+-------+---+---+---+--.
         | managed interconnection |
         `------------+------------'
                      |
                 .----+-----.
                 | physical |
                 |  port 0  |
                 `----------'
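Once a PF has been switched into switchdev mode (done later in this article), the representors show up as regular netdevs on the host. A minimal sketch for peeking at them, assuming the enp67s0f0 / 0000:43:00.0 device used in this lab:

# list eswitch ports; VF representors appear with flavour "pcivf" next to the physical port
devlink port show
# the representor netdevs are typically named <PF>_0, <PF>_1, ...
ip -br link show | grep 'enp67s0f0_'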

The architecture of this experiment is shown below:

System installation


export VAR_HOST='rl_panlab105'

# after the OS install is done, add kernel parameters, mainly intel_iommu=on iommu=pt, then reboot
cp /etc/default/grub /etc/default/grub.bak
sed -i "/GRUB_CMDLINE_LINUX/s/resume=[^[:space:]]*//"  /etc/default/grub
sed -i "/GRUB_CMDLINE_LINUX/s/rd.lvm.lv=${VAR_HOST}\\/swap//"  /etc/default/grub
# https://unix.stackexchange.com/questions/403706/sed-insert-text-after-nth-character-preceding-following-a-given-string
sed -i '/GRUB_CMDLINE_LINUX/s/"/ intel_iommu=on iommu=pt  default_hugepagesz=1G hugepagesz=1G hugepages=16 rdblacklist=nouveau"/2' /etc/default/grub

grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg

grub2-mkconfig -o /boot/grub2/grub.cfg
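
# These flags only take effect after the reboot that follows the driver install below.
# Once the host is back up, a quick sanity check could look like this (nothing here is environment specific):
grep -o 'intel_iommu=on' /proc/cmdline
grep Huge /proc/meminfo
# a non-empty list of IOMMU groups also indicates the IOMMU is active
ls /sys/kernel/iommu_groups/ | wc -l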

# enable nested virtualization for kvm cpu host mode; optional
cat << EOF > /etc/modprobe.d/kvm-nested.conf
options kvm_intel nested=1  
options kvm-intel enable_shadow_vmcs=1   
options kvm-intel enable_apicv=1         
options kvm-intel ept=1                  
EOF

# the default OS install has swap and home partitions; this is a test box, so remove them all
umount /home
swapoff  /dev/$VAR_HOST/swap

cp /etc/fstab /etc/fstab.bak
sed -i 's/^[^#]*home/#&/' /etc/fstab
sed -i 's/^[^#]*swap/#&/' /etc/fstab

lvremove -f /dev/$VAR_HOST/home
lvremove -f /dev/$VAR_HOST/swap

lvextend -l +100%FREE /dev/$VAR_HOST/root
xfs_growfs /dev/$VAR_HOST/root

# from here on, install the NIC driver
# 103 driver install
# https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
mkdir -p /data/down/
cd /data/down/
dnf groupinstall -y 'Development Tools'
dnf groupinstall -y "Server with GUI"

wget https://www.mellanox.com/downloads/ofed/MLNX_OFED-5.4-3.0.3.0/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64.tgz
tar zvxf *.tgz
cd /data/down/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64
dnf install -y tcl tk kernel-modules-extra python36 make gcc-gfortran tcsh unbound
./mlnxofedinstall --all --force --distro rhel8.4
# ./mlnxofedinstall --dpdk --ovs-dpdk --upstream-libs --add-kernel-support --force --distro rhel8.4

reboot

systemctl enable --now mst
systemctl enable --now openibd

cat << EOF > /etc/yum.repos.d/mlx.repo
[mlnx_ofed]
name=MLNX_OFED Repository
baseurl=file:///data/down/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64/RPMS
enabled=1
gpgcheck=0
EOF

dnf makecache 

# install the dpdk-related software
mkdir -p /data/soft
cd /data/soft

dnf config-manager --set-enabled powertools
dnf install -y ninja-build meson

# install the mlnx builds of the dpdk components and ovs
# dnf group list
# dnf groupinstall -y 'Development Tools'
# install dpdk
dnf install -y mlnx-dpdk mlnx-dpdk-devel numactl-devel openvswitch  openvswitch-selinux-policy libnl3-devel openssl-devel zlib-devel libpcap-devel elfutils-libelf-devel 
# https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#compilation-of-the-dpdk
pip3 install --user pyelftools

systemctl enable --now openvswitch

export PATH=$PATH:/opt/mellanox/dpdk/bin/
echo 'export PATH=$PATH:/opt/mellanox/dpdk/bin/' >> ~/.bash_profile

# build the upstream dpdk package, because we need its vdpa sample application
cd /data/soft/
wget https://fast.dpdk.org/rel/dpdk-20.11.3.tar.xz
tar vxf dpdk-20.11.3.tar.xz
# https://core.dpdk.org/doc/quick-start/
cd /data/soft/dpdk-stable-20.11.3/
# meson -Dexamples=all build
meson --reconfigure -Dexamples=all build
ninja -C build

export PKG_CONFIG_PATH=/opt/mellanox/dpdk/lib64/pkgconfig/
cd /data/soft/dpdk-stable-20.11.3/examples/vdpa
make -j 
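
# If the build went through, the interactive vdpa sample binary used later in this article
# should now exist; a quick check:
ls -l /data/soft/dpdk-stable-20.11.3/examples/vdpa/build/vdpa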

# install the kvm-related packages
# install kvm with qemu
# dnf -y groupinstall "Server with GUI"

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld
systemctl enable --now libvirtd

# finally, set the mlx NIC parameters and enable SR-IOV
# get the PCI address with lspci -D | grep -i mell or lshw -c network -businfo
lspci -D | grep -i mell
# 0000:04:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
# 0000:04:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

lshw -c network -businfo
# Bus info          Device     Class          Description
# =======================================================
# pci@0000:02:00.0  eno3       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:02:00.1  eno4       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:01:00.0  eno1       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:01:00.1  eno2       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:04:00.0  enp4s0f0   network        MT2892 Family [ConnectX-6 Dx]
# pci@0000:04:00.1  enp4s0f1   network        MT2892 Family [ConnectX-6 Dx]

# UCTX_EN is for enable DevX
# DevX allows to access firmware objects
mlxconfig -y -d 0000:04:00.0 set SRIOV_EN=1 UCTX_EN=1 NUM_OF_VFS=8
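
# mlxconfig stages these values in the NIC's non-volatile configuration; they only become
# active after a firmware reset or a cold reboot. A hedged follow-up sketch
# (mlxfwreset is part of the mft tools shipped with the OFED install above):
mlxconfig -d 0000:04:00.0 query | grep -E 'SRIOV_EN|UCTX_EN|NUM_OF_VFS'
mlxfwreset -d 0000:04:00.0 reset -y || reboot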

The ovs-kernel approach

NIC setup script

# the mlnx build of ovs is missing some selinux policy; add it here
# in a real project, add whatever selinux policy your environment turns out to need
semodule -i wzh-mellanox-ovs-dpdk.pp

# the script below configures and starts ovs: first wipe the ovs config, then set the NIC mode, then bring ovs up
cat << 'EOF' > /data/ovs-offload-env.sh
#!/usr/bin/env bash

set -e
set -x

systemctl restart openvswitch
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=try
systemctl restart openvswitch

ip link set dev ${IFNAME} down || true
ip link set dev ${IFNAME}_0 down || true
ip link set dev ${IFNAME}_1 down || true

ip link set dev ${IFNAME}v0 down || true
ip link set dev ${IFNAME}v1 down || true

ovs-vsctl del-port ovs-sriov ${IFNAME} || true
ovs-vsctl del-port ovs-sriov ${IFNAME}_0 || true
ovs-vsctl del-port ovs-sriov ${IFNAME}_1 || true
ovs-vsctl del-br ovs-sriov || true

ovs-vsctl del-port br0-ovs pf0vf0 || true
ovs-vsctl del-port br0-ovs pf0vf1 || true
ovs-vsctl del-port br0-ovs pf0 || true
ovs-vsctl del-br br0-ovs || true

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=false
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra=" "
ovs-vsctl --no-wait set Open_vSwitch . other_config={}

# Turn off SR-IOV on the PF device. 
echo 0 > /sys/class/net/$IFNAME/device/sriov_numvfs
cat /sys/class/net/$IFNAME/device/sriov_numvfs
# 0

systemctl restart openvswitch

# Turn ON SR-IOV on the PF device. 
echo 2 > /sys/class/net/$IFNAME/device/sriov_numvfs
cat /sys/class/net/$IFNAME/device/sriov_numvfs
# 2

ip link set $IFNAME vf 0 mac ${VF1MAC}
ip link set $IFNAME vf 1 mac ${VF2MAC}

echo ${PCINUM%%.*}.2 > /sys/bus/pci/drivers/mlx5_core/unbind || true
echo ${PCINUM%%.*}.3 > /sys/bus/pci/drivers/mlx5_core/unbind || true

devlink dev eswitch set pci/$PCINUM mode switchdev
devlink dev eswitch show pci/$PCINUM
# # pci/0000:43:00.0: mode switchdev inline-mode none encap-mode basic

echo ${PCINUM%%.*}.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo ${PCINUM%%.*}.3 > /sys/bus/pci/drivers/mlx5_core/bind

# systemctl enable --now openvswitch
# systemctl restart openvswitch

# Create an OVS bridge (here it's named ovs-sriov). 
ovs-vsctl add-br ovs-sriov

ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

systemctl restart openvswitch

ovs-vsctl add-port ovs-sriov ${IFNAME}
ovs-vsctl add-port ovs-sriov ${IFNAME}_0
ovs-vsctl add-port ovs-sriov ${IFNAME}_1

ip link set dev ${IFNAME} up
ip link set dev ${IFNAME}_0 up
ip link set dev ${IFNAME}_1 up

ip link set dev ${IFNAME}v0 up
ip link set dev ${IFNAME}v1 up

# systemctl restart openvswitch

# ip addr add ${VF1IP} dev ${IFNAME}v0
# ip addr add ${VF2IP} dev ${IFNAME}v1

EOF

# for 103
# export IFNAME=enp4s0f0
# export PCINUM=0000:04:00.0
# export VF1MAC=e4:11:22:33:44:50
# export VF2MAC=e4:11:22:33:44:51
# export VF1IP=192.168.55.21/24
# export VF2IP=192.168.55.22/24
# bash /data/ovs-offload-env.sh

# set the environment variables, then run the script to bring up ovs
# for 105
export IFNAME=enp67s0f0
export PCINUM=0000:43:00.0
export VF1MAC=e4:11:22:33:55:60
export VF2MAC=e4:11:22:33:55:61
# export VF1IP=192.168.55.31/24
# export VF2IP=192.168.55.32/24
bash /data/ovs-offload-env.sh
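
# Before starting the vdpa application, it is worth confirming the script left the NIC
# in the expected state; a small verification sketch using the same environment variables:
devlink dev eswitch show pci/$PCINUM                   # mode should be switchdev
ovs-vsctl get Open_vSwitch . other_config:hw-offload   # should be "true"
# the two representors (${IFNAME}_0/_1) and the two VF netdevs (${IFNAME}v0/v1) should be up
ip -br link show | grep -E "${IFNAME}(_|v)[01]"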

# we also need to start a DPDK application that provides the vdpa function and attaches it to the VF
/data/soft/dpdk-stable-20.11.3/examples/vdpa/build/vdpa -w ${PCINUM%%.*}.2,class=vdpa --log-level=pmd,info -- -i
create /tmp/sock-virtio0 0000:43:00.2
# EAL: Detected 24 lcore(s)
# EAL: Detected 2 NUMA nodes
# Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
# EAL: Detected shared linkage of DPDK
# EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
# EAL: Selected IOVA mode 'VA'
# EAL: No available hugepages reported in hugepages-2048kB
# EAL: Probing VFIO support...
# EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:43:00.2 (socket 1)
# mlx5_vdpa: ROCE is disabled by Netlink successfully.
# EAL: No legacy callbacks, legacy socket not created
# Interactive-mode selected
# vdpa> create /tmp/sock-virtio0 0000:43:00.2
# VHOST_CONFIG: vhost-user server: socket created, fd: 112
# VHOST_CONFIG: bind to /tmp/sock-virtio0
# vdpa>

vdpa> list
# device name     queue num       supported features
# 0000:43:00.2            256             0x114c60180b

vdpa> stats 0000:43:00.2 0
# Device 0000:43:00.2:
#         Virtq 0:
#                 received_descriptors                                             1024
#                 completed_descriptors                                            39
#                 bad descriptor errors                                            0
#                 exceed max chain                                                 0
#                 invalid buffer                                                   0
#                 completion errors                                                0
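
# The create command above should have left a vhost-user server socket on the host;
# the guest definition below points at it, so a quick check before building the VM:
ls -l /tmp/sock-virtio0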

kvm

Next, we create a KVM guest that uses our vDPA channel.

Since qemu needs permission to read the socket we created, we change the qemu user to root.

sed -i.bak 's/#user = "root"/user = "root"/' /etc/libvirt/qemu.conf
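
# libvirtd only rereads qemu.conf on restart, and the change only affects guests started
# afterwards, so restarting the daemon here is a safe extra step:
systemctl restart libvirtd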

# we also need a bridge hooked to the host's NIC so the guest can reach the network; this makes access and management easier
mkdir -p /data/kvm
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection, delete it in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh
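
# If the bridge came up correctly, the baremetal connection should be active with the
# address hard-coded in bridge.sh and the physical NIC enslaved to it; a quick check:
nmcli -f NAME,TYPE,DEVICE con show --active
ip -br addr show baremetal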

# first, create, start, and install a kvm guest the standard way
cd /data/kvm
export DOMAIN=cx6.1

virt-install --name="${DOMAIN}" --vcpus=2 --ram=8192 \
--cputune vcpupin0.vcpu=14,vcpupin1.vcpu=16 \
--memorybacking hugepages.page0.size=1,hugepages.page0.unit=GiB \
--cpu host-model \
--disk path=/data/kvm/${DOMAIN}.qcow2,bus=virtio,size=30 \
--os-variant rhel8.4 \
--network bridge=baremetal,model=virtio \
--graphics vnc,port=59000 \
--boot menu=on --location /data/kvm/Rocky-8.4-x86_64-minimal.iso \
--initrd-inject helper-ks-rocky.cfg --extra-args "inst.ks=file:/helper-ks-rocky.cfg" 

# next, configure this guest and attach the vdpa channel to it
# https://unix.stackexchange.com/questions/235414/libvirt-how-to-pass-qemu-command-line-args
# virt-xml $DOMAIN --edit --confirm --qemu-commandline 'env=MY-ENV=1234'
virt-xml $DOMAIN --edit --qemu-commandline='-chardev socket,id=charnet1,path=/tmp/sock-virtio0'
virt-xml $DOMAIN --edit --qemu-commandline='-netdev vhost-user,chardev=charnet1,queues=16,id=hostnet1'
virt-xml $DOMAIN --edit --qemu-commandline='-device virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=e4:11:c6:d3:45:f2,bus=pcie.0,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'

Next, manually edit the configuration as shown below. Note that the pinned CPU cores should all be on the same NUMA node; the sketch after this snippet shows how to check.

virsh edit cx6.1

  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='16'/>
  </cputune>

  <cpu mode='host-model' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
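
# To choose pinned cores on the same NUMA node as the NIC (so the vhost-user data path
# and the 1G hugepages stay local), checking the card's NUMA node first is a cheap guard;
# a sketch for this lab's 0000:43:00.0 device:
cat /sys/bus/pci/devices/0000:43:00.0/numa_node
lscpu | grep 'NUMA node'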
  

The final configuration is shown below; in a real project it can be used as a reference when troubleshooting.

virsh dumpxml cx6.1

<domain type='kvm' id='11' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>cx6.1</name>
  <uuid>5cbb6f7c-7122-4fc4-9706-ff46aed3bf25</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://redhat.com/rhel/8.4"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='16'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.2.0'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='ibrs'/>
    <feature policy='require' name='amd-stibp'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='pschange-mc-no'/>
    <numa>
      <cell id='0' cpus='0-1' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/data/kvm/cx6.1.qcow2' index='2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:8d:b6:8e'/>
      <source bridge='baremetal'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/6'>
      <source path='/dev/pts/6'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-11-cx6.1/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='59000' autoport='no' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='5'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c46,c926</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c46,c926</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+0</label>
    <imagelabel>+0:+0</imagelabel>
  </seclabel>
  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='socket,id=charnet1,path=/tmp/sock-virtio0'/>
    <qemu:arg value='-netdev'/>
    <qemu:arg value='vhost-user,chardev=charnet1,queues=16,id=hostnet1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=e4:11:c6:d3:45:f2,bus=pcie.0,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'/>
  </qemu:commandline>
</domain>

Give it a try

Now on to testing and playing with the setup.

# in cx6.1 kvm
# nmcli dev connect enp0s6
nmcli con modify enp0s6 ipv4.method manual ipv4.addresses 192.168.99.11/24
# nmcli con modify enp0s6 ipv4.method manual ipv4.addresses 192.168.55.91/24
nmcli con up enp0s6

# on peer machine (102)
nmcli con modify enp66s0f0 ipv4.method manual ipv4.addresses 192.168.99.21/24
# nmcli con modify enp66s0f0 ipv4.method manual ipv4.addresses 192.168.55.92/24
# nmcli dev connect enp66s0f0
nmcli con up enp66s0f0

# run after the tcpdump is running
ping 192.168.99.21
# PING 192.168.99.21 (192.168.99.21) 56(84) bytes of data.
# 64 bytes from 192.168.99.21: icmp_seq=1 ttl=64 time=0.089 ms
# 64 bytes from 192.168.99.21: icmp_seq=2 ttl=64 time=0.044 ms
# 64 bytes from 192.168.99.21: icmp_seq=3 ttl=64 time=0.046 ms
# ....

# on 105
tcpdump -i enp67s0f0_0 -w dump.test
# dropped privs to tcpdump
# tcpdump: listening on enp67s0f0_0, link-type EN10MB (Ethernet), capture size 262144 bytes
# ^C2 packets captured
# 2 packets received by filter
# 0 packets dropped by kernel

tcpdump -i enp67s0f0 -w dump.test
# dropped privs to tcpdump
# tcpdump: listening on enp67s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes
# ^C4 packets captured
# 4 packets received by filter
# 0 packets dropped by kernel

Opening the capture in Wireshark shows standard ICMP packets, which confirms that what we built is a raw data path, not some protocol encapsulation. Also note that although the ping sent many packets, we only captured the first packet of each flow: once the NIC offloads the flow, only that first packet, which goes up to the kernel for the flow-table lookup, is visible; everything after it is handled by the NIC and never reaches tcpdump.

The capture on the PF caught 4 packets, all of them the first packet of their flow; the rest are offloaded.

# ovs-dpctl dump-flows

# on 105
# look at the ovs flow table: there are 2 arp (0x0806) flows, forward and reverse,
# and 2 ip (0x0800) flows, forward and reverse
ovs-appctl dpctl/dump-flows type=offloaded
# recirc_id(0),in_port(2),eth(src=0c:42:a1:fa:18:8e,dst=e4:11:c6:d3:45:f2),eth_type(0x0800),ipv4(frag=no), packets:149, bytes:15198, used:0.510s, actions:3
# recirc_id(0),in_port(2),eth(src=0c:42:a1:fa:18:8e,dst=e4:11:c6:d3:45:f2),eth_type(0x0806), packets:0, bytes:0, used:8.700s, actions:3
# recirc_id(0),in_port(3),eth(src=e4:11:c6:d3:45:f2,dst=0c:42:a1:fa:18:8e),eth_type(0x0800),ipv4(frag=no), packets:149, bytes:14602, used:0.510s, actions:2
# recirc_id(0),in_port(3),eth(src=e4:11:c6:d3:45:f2,dst=0c:42:a1:fa:18:8e),eth_type(0x0806), packets:0, bytes:0, used:8.701s, actions:2

# now look at the tc configuration; ovs has pushed the rules down to tc
# this is the VF ingress side: the rule redirects the traffic to the uplink (parent) port, and it is implemented in hardware
tc -s filter show dev enp67s0f0_0 ingress
# filter protocol ip pref 2 flower chain 0
# filter protocol ip pref 2 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type ipv4
#   ip_flags nofrag
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 4 ref 1 bind 1 installed 318 sec used 0 sec
#         Action statistics:
#         Sent 30380 bytes 310 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 30380 bytes 310 pkt
#         backlog 0b 0p requeues 0
#         cookie 8be6df4d7d4c33fce08f01a46fa10a4a
#         no_percpu
#         used_hw_stats delayed

# now look at the VF egress side
# there are 2 rules, one for arp and one for ip
# both redirect the traffic to the uplink (parent) port, and both are implemented in hardware
tc -s filter show dev enp67s0f0_0 egress
# filter ingress protocol ip pref 2 flower chain 0
# filter ingress protocol ip pref 2 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type ipv4
#   ip_flags nofrag
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 4 ref 1 bind 1 installed 379 sec used 0 sec
#         Action statistics:
#         Sent 36260 bytes 370 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 36260 bytes 370 pkt
#         backlog 0b 0p requeues 0
#         cookie 8be6df4d7d4c33fce08f01a46fa10a4a
#         no_percpu
#         used_hw_stats delayed

# filter ingress protocol arp pref 4 flower chain 0
# filter ingress protocol arp pref 4 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type arp
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 3 ref 1 bind 1 installed 13 sec used 6 sec
#         Action statistics:
#         Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 60 bytes 1 pkt
#         backlog 0b 0p requeues 0
#         cookie 1fbfd56eae42f9dbe71bf99bd800cd6d
#         no_percpu
#         used_hw_stats delayed

tc qdisc show dev enp67s0f0_0
# qdisc mq 0: root
# qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
# qdisc ingress ffff: parent ffff:fff1 ----------------

# finally, record the system environment for later reference and for comparison against real projects
# on 105
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master baremetal state UP mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:28 brd ff:ff:ff:ff:ff:ff
# 4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:29 brd ff:ff:ff:ff:ff:ff
# 5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:2a brd ff:ff:ff:ff:ff:ff
# 6: enp67s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
#     vf 0     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
#     vf 1     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
# 7: enp67s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a3 brd ff:ff:ff:ff:ff:ff
# 8: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN mode DEFAULT group default qlen 256
#     link/infiniband 00:00:10:28:fe:80:00:00:00:00:00:00:98:03:9b:03:00:cc:71:2c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
# 9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 10: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 11: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 16: enp67s0f0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether fa:cf:0f:6a:ec:45 brd ff:ff:ff:ff:ff:ff
# 17: enp67s0f0_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether 76:65:93:70:96:ac brd ff:ff:ff:ff:ff:ff
# 18: enp67s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff
# 19: enp67s0f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff
# 20: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether f6:e9:fd:16:8a:ea brd ff:ff:ff:ff:ff:ff
# 21: ovs-sriov: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 22: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master baremetal state UNKNOWN mode DEFAULT group default qlen 1000
#     link/ether fe:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff

ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master baremetal state UP group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:28 brd ff:ff:ff:ff:ff:ff
# 4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:29 brd ff:ff:ff:ff:ff:ff
# 5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:2a brd ff:ff:ff:ff:ff:ff
# 6: enp67s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 7: enp67s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a3 brd ff:ff:ff:ff:ff:ff
# 8: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
#     link/infiniband 00:00:10:28:fe:80:00:00:00:00:00:00:98:03:9b:03:00:cc:71:2c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
# 9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
#     inet 172.21.6.105/24 brd 172.21.6.255 scope global noprefixroute baremetal
#        valid_lft forever preferred_lft forever
#     inet6 fe80::12a7:202d:c70b:be14/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever
# 10: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
#     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
#        valid_lft forever preferred_lft forever
# 11: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 16: enp67s0f0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether fa:cf:0f:6a:ec:45 brd ff:ff:ff:ff:ff:ff
# 17: enp67s0f0_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether 76:65:93:70:96:ac brd ff:ff:ff:ff:ff:ff
# 18: enp67s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.55.31/24 scope global enp67s0f0v0
#        valid_lft forever preferred_lft forever
# 19: enp67s0f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.55.32/24 scope global enp67s0f0v1
#        valid_lft forever preferred_lft forever
# 20: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
#     link/ether f6:e9:fd:16:8a:ea brd ff:ff:ff:ff:ff:ff
# 21: ovs-sriov: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 22: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master baremetal state UNKNOWN group default qlen 1000
#     link/ether fe:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff
#     inet6 fe80::fc54:ff:fe8d:b68e/64 scope link
#        valid_lft forever preferred_lft forever

ovs-vsctl show
# 8f3eddeb-c42c-4af4-9dc8-a46169d91a7c
#     Bridge ovs-sriov
#         Port enp67s0f0_1
#             Interface enp67s0f0_1
#         Port ovs-sriov
#             Interface ovs-sriov
#                 type: internal
#         Port enp67s0f0
#             Interface enp67s0f0
#         Port enp67s0f0_0
#             Interface enp67s0f0_0
#     ovs_version: "2.14.1"

# on kvm
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:c6:d3:45:f2 brd ff:ff:ff:ff:ff:ff
# 3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff

ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:c6:d3:45:f2 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.99.11/24 brd 192.168.99.255 scope global noprefixroute enp0s6
#        valid_lft forever preferred_lft forever
#     inet6 fe80::f3c:b686:1739:a748/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever
# 3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#     link/ether 52:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff
#     inet 172.21.6.11/24 brd 172.21.6.255 scope global noprefixroute enp1s0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::5054:ff:fe8d:b68e/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever

Performance testing

# on 102
dnf install -y iperf3
systemctl disable --now firewalld

iperf3 -s -p 6666

# on 11
dnf install -y iperf3

iperf3 -t 20 -p 6666 -c 192.168.99.21
Connecting to host 192.168.99.21, port 6666
[  5] local 192.168.99.11 port 50960 connected to 192.168.99.21 port 6666
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.40 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   1.00-2.00   sec  1.39 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   2.00-3.00   sec  1.39 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   3.00-4.00   sec  1.40 GBytes  12.0 Gbits/sec    0    624 KBytes
[  5]   4.00-5.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   5.00-6.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   6.00-7.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   7.00-8.00   sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]   8.00-9.00   sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]   9.00-10.00  sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  10.00-11.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  11.00-12.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  12.00-13.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  13.00-14.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  14.00-15.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  15.00-16.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  16.00-17.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  17.00-18.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  18.00-19.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  19.00-20.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  27.9 GBytes  12.0 Gbits/sec    0             sender
[  5]   0.00-20.04  sec  27.9 GBytes  11.9 Gbits/sec                  receiver

iperf Done.

# on 105
systemctl disable --now irqbalance.service
mlnx_affinity start

# on 102
systemctl disable --now irqbalance.service
mlnx_affinity start
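
# After disabling irqbalance and pinning interrupts with mlnx_affinity on both sides,
# the same iperf3 run can be repeated (optionally with parallel streams) to see how much
# the tuning helps; no numbers are claimed here, this is just the rerun:
# on 11 (the guest), repeat the test against 102
iperf3 -t 20 -p 6666 -c 192.168.99.21 -P 4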

# on 102
dnf install -y qperf
qperf

# on 105
qperf 192.168.88.21 tcp_bw
tcp_bw:
    bw  =  2.8 GB/sec

# on 101
qperf 192.168.99.21 tcp_bw
tcp_bw:
    bw  =  1.48 GB/sec