openshift 4.9 加载第三方驱动 / 内核模块

我们在项目中,会遇到特种硬件,比如 fpga 卡,软件供应商为这个 fpga 卡提供了驱动/内核模块,我们需要把这个驱动加载到系统中。本文就讲述,如何在 openshift 4.9 里面,通过 deployment / pod 的方式,想系统注入这个驱动/内核模块。

在本次实验中,物理机上有一块fpga卡,我们得到了对应的驱动 nr_drv_wr.ko ,这个驱动加载以后,会创建一个网卡,我们要初始化这个网卡。

好了,就让我们来看看是怎么做的吧。

制作镜像

我们把驱动拷贝到镜像里面,还把自动加载脚本也复制到镜像里面。自动加载脚本里面,有一个小技巧,就是 ko 文件,需要打上正确的selinux 标签,否则 insmod 会报错。


mkdir -p /data/wzh/fpga
cd /data/wzh/fpga

cat << 'EOF' > ./ocp4.install.sh
#!/bin/bash

set -e
set -x

if  chroot /host lsmod  | grep nr_drv > /dev/null 2>&1
then
    echo NR Driver Module had loaded!
else
    echo Inserting NR Driver Module
    # chroot /host rmmod nr_drv > /dev/null 2>&1

    if [ $(uname -r) == "4.18.0-305.19.1.rt7.91.el8_4.x86_64" ];
    then
        echo insmod nr_drv_wr.ko ...
        /bin/cp -f nr_drv_wr.ko /host/tmp/nr_drv_wr.ko
        chroot /host chcon -t modules_object_t /tmp/nr_drv_wr.ko
        chroot /host insmod /tmp/nr_drv_wr.ko load_xeth=1
        /bin/rm -f /host/tmp/nr_drv_wr.ko

        CON_NAME=`chroot /host nmcli -g GENERAL.CONNECTION dev show xeth`

        chroot /host nmcli connection modify "$CON_NAME" con-name xeth
        chroot /host nmcli connection modify xeth ipv4.method disabled ipv6.method disabled
        chroot /host nmcli dev conn xeth
    else
        echo insmod nr_drv_ko Failed!
    fi

fi
EOF

cat << EOF > ./fpga.dockerfile
FROM docker.io/busybox:1.34

USER root
COPY Driver.PKG /Driver.PKG

COPY ocp4.install.sh /ocp4.install.sh
RUN chmod +x /ocp4.install.sh

WORKDIR /
EOF

buildah bud -t registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07 -f fpga.dockerfile .

buildah push registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07

openshift 部署

部署之前,我们先给service account加上特权模式,我们这个实验,在default project里面,用了default service account,所以命令就在下面,但是到了具体项目中,一般是要创建单独的project,并且创建单独的service account的。

然后我们用了几个小技巧,首先用init container,把驱动复制进pod,传递给真正运行的容器,然后我们无限睡眠,保持这个pod运行,这么做是因为,如果容器正常退出了,deployment会自动重启,但是我们这里不想自动重启,所以我们无限睡眠,保持这个pod运行。好在这个 pod 消耗很小。

未来可能会优化成用 job / static pod 的方式来运行。


oc adm policy add-scc-to-user privileged -z default -n default

cat << EOF > /data/install/fpga.driver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fpga-driver
  # namespace: default
  labels:
    app: fpga-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fpga-driver
  template:
    metadata:
      labels:
        app: fpga-driver
    spec:
      hostPID: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - fpga-driver
              topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-0
      # restartPolicy: Never
      initContainers:
      - name: copy
        image: registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07
        command: ["/bin/sh", "-c", "tar zvxf /Driver.PKG --strip 1 -C /nep/driver/ && /bin/cp -f /ocp4.install.sh /nep/driver/ "]
        imagePullPolicy: Always
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
      containers:
      - name: driver
        image: registry.redhat.io/rhel8/support-tools:8.4
        # imagePullPolicy: Always
        command: [ "/usr/bin/bash","-c","cd /nep/driver/ && bash ./ocp4.install.sh && sleep infinity " ]
        # command: [ "/usr/bin/bash","-c","tail -f /dev/null || true " ]
        resources:
          requests:
            cpu: 10m
            memory: 20Mi
        securityContext:
          privileged: true
          # runAsUser: 0
          seLinuxOptions:
            level: "s0"
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
        - name: host
          mountPath: /host
      volumes: 
      - name: driver-files
        emptyDir: {}
      - name: host
        hostPath:
          path: /
          type: Directory
EOF
oc create -f /data/install/fpga.driver.yaml

# to restore
oc delete -f /data/install/fpga.driver.yaml


sign the kernel model

CHAPTER 4. SIGNING KERNEL MODULES FOR SECURE BOOT