IB Network Configuration: Bonding

1. Introduction

This article describes how to bond InfiniBand (IB) network adapters and assign a static IP address to the bond on AlmaLinux 8.9.

   

2. Driver installation

2.1. Downloading the driver

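IB adapter drivers are supplied by the vendor; the driver described here is the official NVIDIA release (MLNX_OFED). Before downloading, check the model of your IB adapter and find a driver version that supports it. Download page: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/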
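
One way to find the adapter model before picking a driver is to filter the PCI device list for Mellanox entries. The following is a minimal sketch, assuming lspci (from the pciutils package) is installed; the function name list_ib_adapters is made up for illustration and is not part of any driver package.

```shell
# list_ib_adapters: read `lspci` output on stdin and keep only the lines
# that mention Mellanox, the vendor string lspci reports for these adapters.
# The function name is hypothetical.
list_ib_adapters() {
    grep -i 'mellanox' || true   # an empty result is not treated as an error
}

# Typical use on a live system (assumes pciutils is installed):
#   lspci | list_ib_adapters
```

The model string printed on each matching line is what to look up in the driver's supported-hardware list.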

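2.2. Installing the driver

Extract the driver archive, then run sudo ./mlnxofedinstall from the extracted directory to install the driver.

2.3. Starting the services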

/etc/init.d/openibd restart
/etc/init.d/opensmd restart

   

3. Testing the IB adapters

3.1. Checking IB adapter status

ibstat
----------------------------------
CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.39.2048
	Hardware version: 0
	Node GUID: 0xa088c203002cf0e0
	System image GUID: 0xa088c203002cf0e0
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 207
		LMC: 0
		SM lid: 1
		Capability mask: 0xa651e848
		Port GUID: 0xa088c203002cf0e0
		Link layer: InfiniBand
CA 'mlx5_1'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.39.2048
	Hardware version: 0
	Node GUID: 0xa088c203002d3ef0
	System image GUID: 0xa088c203002d3ef0
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 200
		LMC: 0
		SM lid: 1
		Capability mask: 0xa651e848
		Port GUID: 0xa088c203002d3ef0
		Link layer: InfiniBand

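The output above shows that two IB adapters were detected: mlx5_0 and mlx5_1. For each adapter, the two fields to focus on are State and Link layer. State: Active means the port is up and working normally, and Link layer: InfiniBand means the adapter is running in InfiniBand mode. The adapter can run in one of two link-layer modes: Ethernet or InfiniBand.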
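
On hosts with several adapters it can be convenient to reduce the ibstat output to just these two fields. The sketch below assumes the ibstat output layout shown above; ib_port_summary is a hypothetical helper name, not an existing tool.

```shell
# ib_port_summary: read `ibstat` output on stdin and print one line per
# port in the form "<CA name> <State> <Link layer>".
# Hypothetical helper; it relies on the field labels shown in the listing above.
ib_port_summary() {
    awk '
        # Header line of an adapter block, e.g. CA followed by the quoted name.
        $1 == "CA" && NF == 2 { ca = substr($2, 2, length($2) - 2) }  # strip the quotes
        # Logical port state; "Physical state:" starts with a different word.
        $1 == "State:" { state = $2 }
        # Link layer is the last field of a port block, so print here.
        $1 == "Link" && $2 == "layer:" { print ca, state, $3 }
    '
}

# Typical use on a live system:
#   ibstat | ib_port_summary
```

For the listing above, both adapters would be reported as Active and InfiniBand.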

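3.2. Connectivity test

Use the ibping command to test IB connectivity between two machines. Start the server side first, then run the client against the server's LID.

3.2.1. Server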

ibping -S -C mlx5_0 -P 1

3.2.2. Client

ibping -c 10000 -f -C mlx5_0 -L 205
-------------------------------------

--- node3.(none) (Lid 205) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 84 ms
rtt min/avg/max = 0.002/0.008/0.019 ms
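
The value passed to -L is the destination LID, that is, the Base lid that ibstat reports for the server's port. A minimal sketch for looking it up, assuming the ibstat layout shown in section 3.1; base_lid is a hypothetical helper name.

```shell
# base_lid: read `ibstat` output on stdin and print the "Base lid" of the
# first port of the adapter named in $1 (for example mlx5_0).
# Hypothetical helper, not part of the InfiniBand diagnostic tools.
base_lid() {
    awk -v want="$1" '
        $1 == "CA" && NF == 2 { ca = substr($2, 2, length($2) - 2) }  # strip the quotes
        ca == want && $1 == "Base" && $2 == "lid:" { print $3; exit }
    '
}

# On the server, before starting the client:
#   ibstat | base_lid mlx5_0
```

The number it prints on the server is what the client passes to ibping -L.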

3.3. Bandwidth test

3.3.1. Server

ib_write_bw -a -d mlx5_0 --report_gbits
-----------------------------------------

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF		Device         : mlx5_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xcd QPN 0x14ed7 PSN 0x6490b0 RKey 0x06c6d4 VAddr 0x007f14513e5000
 remote address: LID 0xcf QPN 0x15e2f PSN 0x93b196 RKey 0x00a8d1 VAddr 0x007fb537d34000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 8388608    5000             196.80             196.80 		   0.002933
---------------------------------------------------------------------------------------

3.3.2. Client

ib_write_bw -a -F 10.1.90.3 -d mlx5_0 --report_gbits
-------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF		Device         : mlx5_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xcf QPN 0x15e2f PSN 0x93b196 RKey 0x00a8d1 VAddr 0x007fb537d34000
 remote address: LID 0xcd QPN 0x14ed7 PSN 0x6490b0 RKey 0x06c6d4 VAddr 0x007f14513e5000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          5000           0.086643            0.086198            5.387353
 4          5000             0.17               0.17   		   5.444247
 8          5000             0.35               0.35   		   5.455403
 16         5000             0.70               0.70   		   5.458773
 32         5000             1.40               1.40   		   5.464472
 64         5000             2.82               2.81   		   5.484173
 128        5000             5.63               5.60   		   5.465125
 256        5000             11.23              11.19  		   5.465888
 512        5000             22.38              22.36  		   5.457910
 1024       5000             44.52              44.46  		   5.426791
 2048       5000             88.88              88.77  		   5.418079
 4096       5000             157.04             156.59 		   4.778655
 8192       5000             196.41             196.28 		   2.995013
 16384      5000             196.61             196.57 		   1.499719
 32768      5000             196.71             196.70 		   0.750358
 65536      5000             196.76             196.75 		   0.375268
 131072     5000             196.78             196.77 		   0.187653
 262144     5000             196.79             196.79 		   0.093837
 524288     5000             196.79             196.79 		   0.046918
 1048576    5000             196.80             196.80 		   0.023460
 2097152    5000             196.80             196.80 		   0.011730
 4194304    5000             196.79             196.79 		   0.005865
 8388608    5000             196.80             196.80 		   0.002933
---------------------------------------------------------------------------------------
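
In the table above the average bandwidth levels off at 196.80 Gb/sec, which is about 98.4% of the 200 rate reported by ibstat (196.80 / 200 = 0.984). To pull the best row out of a long run automatically, something like the following sketch could be used; peak_avg_bw is a hypothetical helper that assumes the five-column result layout shown above.

```shell
# peak_avg_bw: read ib_write_bw output on stdin and print the message size
# with the highest "BW average" value, as "<bytes> <Gb/sec>".
# Hypothetical helper; on a tie the smallest message size wins.
peak_avg_bw() {
    awk '
        # Result rows have exactly five fields and start with the byte count.
        NF == 5 && $1 ~ /^[0-9]+$/ {
            if ($4 + 0 > max) { max = $4 + 0; size = $1 }
        }
        END { if (size != "") printf "%s %.2f\n", size, max }
    '
}

# Typical use on the client (the server must already be waiting):
#   ib_write_bw -a -F 10.1.90.3 -d mlx5_0 --report_gbits | peak_avg_bw
```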

   

4. IB network configuration

4.1. Checking the IB device mapping

ibdev2netdev
---------------------------------------------------------------------------------------
mlx5_0 port 1 ==> ib0 (Up)
mlx5_1 port 1 ==> ib1 (Up)

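The output above shows that the two IB devices mlx5_0 and mlx5_1 are mapped to the network interfaces ib0 and ib1, respectively. When assigning IP addresses to the IB adapters, you therefore configure ib0 and ib1, not mlx5_0 and mlx5_1.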
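
When scripting the configuration, the interface name for a given device can be looked up from this mapping. A minimal sketch, assuming the ibdev2netdev line format shown above; ib_netdev is a hypothetical helper name.

```shell
# ib_netdev: read `ibdev2netdev` output on stdin and print the network
# interface mapped to the IB device named in $1.
# Hypothetical helper; it expects lines like "mlx5_0 port 1 ==> ib0 (Up)".
ib_netdev() {
    awk -v dev="$1" '$1 == dev && $4 == "==>" { print $5 }'
}

# Typical use on a live system:
#   ibdev2netdev | ib_netdev mlx5_0
```

With the mapping above, looking up mlx5_0 yields ib0 and mlx5_1 yields ib1.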

4.2. Bonding the IB interfaces

4.2.1. bond0 configuration

Create the file /etc/sysconfig/network-scripts/ifcfg-bond0 and add the following content:

CONNECTED_MODE=no
TYPE=bond
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=30.1.90.4
NETMASK=255.0.0.0
USERCTL=no
BONDING_MASTER=yes
BONDING_OPTS="mode=1 miimon=100"

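The most important fields are TYPE and BONDING_MASTER. Guides found online usually set TYPE to bonding; however, in our on-site configuration the bond only took effect when TYPE was set to bond.

Edit /etc/modprobe.d/ib_ipoib.conf and append the following: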

alias bond0 bonding
options bond0 miimon=100 mode=1 max_bonds=1

4.2.2. ib0 configuration

CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
NAME=ib0
DEVICE=ib0
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
PRIMARY=yes

4.2.3. ib1 configuration

CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
NAME=ib1
DEVICE=ib1
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
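
Because these files are plain KEY=value lines, a quick sanity check before reloading is to confirm that each member file names the right master. The sketch below is one possible check, not part of the original procedure; check_bond_slave is a hypothetical helper, and the file paths in the usage comment are assumed to follow the same naming as ifcfg-bond0.

```shell
# check_bond_slave: succeed only if the ifcfg file in $1 declares
# MASTER=$2 and SLAVE=yes. Hypothetical helper.
# The file is sourced in a subshell so its variables do not leak out.
check_bond_slave() (
    unset MASTER SLAVE
    . "$1" || exit 1
    [ "$MASTER" = "$2" ] && [ "$SLAVE" = "yes" ]
)

# Possible use (paths are assumptions based on the ifcfg-bond0 location):
#   check_bond_slave /etc/sysconfig/network-scripts/ifcfg-ib0 bond0 || echo "ib0 is not a bond0 member"
#   check_bond_slave /etc/sysconfig/network-scripts/ifcfg-ib1 bond0 || echo "ib1 is not a bond0 member"
```

Pass the file as a path containing a slash, since the shell's dot command otherwise searches PATH for it.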

4.2.4. Restarting the network

nmcli connection reload
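
nmcli connection reload re-reads the connection files from disk. If the bond does not come up by itself afterwards, activating it explicitly with nmcli connection up bond0 may be needed (this is an assumption; the original procedure only shows the reload). Once the bond is up, the bonding driver exposes its state in /proc/net/bonding/bond0, and the sketch below extracts the member that is currently carrying traffic. bond_active_slave is a hypothetical helper name.

```shell
# bond_active_slave: read the contents of /proc/net/bonding/<bond> on stdin
# and print the interface named on the "Currently Active Slave" line.
# Hypothetical helper.
bond_active_slave() {
    awk -F': ' '$1 == "Currently Active Slave" { print $2 }'
}

# Typical use once bond0 exists:
#   bond_active_slave < /proc/net/bonding/bond0
```

In active-backup mode (mode=1) only one member is active at a time, so this shows which of ib0 and ib1 is in use.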

4.2.5. Checking the IP address

ip a
----------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 34:73:79:be:24:30 brd ff:ff:ff:ff:ff:ff
    altname enp24s0f0
    altname ens64f0
    inet 10.1.90.4/16 brd 10.1.255.255 scope global noprefixroute eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::3673:79ff:febe:2430/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 34:73:79:be:24:31 brd ff:ff:ff:ff:ff:ff
    altname enp24s0f1
    altname ens64f1
4: ib0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2c:f0:e0 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
5: ib1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2d:3e:f0 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2d:3e:f0 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 30.1.90.4/8 brd 30.255.255.255 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever
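
The address on bond0 is shown as 30.1.90.4/8, which is the prefix form of the IPADDR=30.1.90.4 and NETMASK=255.0.0.0 settings in ifcfg-bond0. The conversion is just a count of the set bits in the mask, as this small sketch illustrates; mask_to_prefix is a hypothetical helper and assumes a valid, contiguous dotted-quad netmask.

```shell
# mask_to_prefix: print the prefix length for the dotted netmask in $1
# by counting its set bits. Hypothetical helper; no validation is done.
# The body runs in a subshell so IFS and the loop variables stay local.
mask_to_prefix() (
    bits=0
    IFS=.
    set -- $1                      # split the mask into its four octets
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
)

mask_to_prefix 255.0.0.0           # prints 8
```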

   

5. References