
Loading third-party drivers / kernel modules on OpenShift 4.9

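In projects we sometimes run into special-purpose hardware, such as an FPGA card, for which the software vendor supplies a driver / kernel module that has to be loaded into the system. This article describes how to inject such a driver / kernel module into the host on OpenShift 4.9 by means of a deployment / pod.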

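In this lab, the physical machine has an FPGA card, and we have the matching driver, nr_drv_wr.ko. Once loaded, the driver creates a network interface, which we then need to initialize.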

Let's see how it is done.

Build the image

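We copy the driver into the image, together with an auto-load script. The script contains one small trick: the ko file must carry the correct SELinux label (modules_object_t in the script below), otherwise insmod reports an error.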


        mkdir -p /data/wzh/fpga
        cd /data/wzh/fpga
        
        cat << 'EOF' > ./ocp4.install.sh
        #!/bin/bash
        
        set -e
        set -x
        
        if  chroot /host lsmod  | grep nr_drv > /dev/null 2>&1
        then
            echo NR Driver Module is already loaded!
        else
            echo Inserting NR Driver Module
            # chroot /host rmmod nr_drv > /dev/null 2>&1
        
            if [ "$(uname -r)" = "4.18.0-305.19.1.rt7.91.el8_4.x86_64" ];
            then
                echo insmod nr_drv_wr.ko ...
                /bin/cp -f nr_drv_wr.ko /host/tmp/nr_drv_wr.ko
                chroot /host chcon -t modules_object_t /tmp/nr_drv_wr.ko
                chroot /host insmod /tmp/nr_drv_wr.ko load_xeth=1
                /bin/rm -f /host/tmp/nr_drv_wr.ko
        
                CON_NAME=`chroot /host nmcli -g GENERAL.CONNECTION dev show xeth`
        
                chroot /host nmcli connection modify "$CON_NAME" con-name xeth
                chroot /host nmcli connection modify xeth ipv4.method disabled ipv6.method disabled
                chroot /host nmcli dev conn xeth
            else
                echo "Kernel $(uname -r) does not match the driver build; nr_drv_wr.ko was not inserted"
            fi
        
        fi
        EOF
        
        cat << EOF > ./fpga.dockerfile
        FROM docker.io/busybox:1.34
        
        USER root
        COPY Driver.PKG /Driver.PKG
        
        COPY ocp4.install.sh /ocp4.install.sh
        RUN chmod +x /ocp4.install.sh
        
        WORKDIR /
        EOF
        
        buildah bud -t registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07 -f fpga.dockerfile .
        
        buildah push registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07
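
The two conditions the install script relies on, whether the module is already loaded and whether the ko file carries the expected SELinux type, can also be written as small helpers. The following is a minimal sketch, not part of the original script: the function names `module_loaded` and `selinux_type` are hypothetical, and the module name `nr_drv_wr` is assumed from the file name. Unlike `lsmod | grep nr_drv`, the first helper matches the module name exactly rather than as a substring.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the vendor package.

# module_loaded NAME [FILE]
# Succeeds if NAME is the first field of some line in FILE.
# FILE defaults to /proc/modules, the same source lsmod reads.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# selinux_type CONTEXT
# Prints the type field of an SELinux context string such as
# "system_u:object_r:modules_object_t:s0".
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}
```

Inside the install script these could be used as `module_loaded nr_drv_wr` in place of the grep, and as `selinux_type "$(chroot /host stat -c %C /tmp/nr_drv_wr.ko)"` right after the chcon step, before the temporary file is removed; the latter would be expected to print modules_object_t.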

Deploy on OpenShift

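Before deploying, we grant the service account privileged access. This lab runs in the default project and uses the default service account, so the command below targets those; in a real project you would normally create a dedicated project and a dedicated service account.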

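We also use a few small tricks. First, an init container copies the driver into the pod and hands it over to the container that does the real work. Second, once the driver is loaded, that container sleeps forever to keep the pod running: if the container exited normally, the deployment would restart it, and we do not want a restart here. Fortunately, this pod consumes very few resources.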

In the future this may be reworked to run as a job / static pod.


        oc adm policy add-scc-to-user privileged -z default -n default
        
        cat << EOF > /data/install/fpga.driver.yaml
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: fpga-driver
          # namespace: default
          labels:
            app: fpga-driver
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: fpga-driver
          template:
            metadata:
              labels:
                app: fpga-driver
            spec:
              hostPID: true
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    - labelSelector:
                        matchExpressions:
                          - key: "app"
                            operator: In
                            values:
                            - fpga-driver
                      topologyKey: "kubernetes.io/hostname"
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                    - matchExpressions:
                      - key: kubernetes.io/hostname
                        operator: In
                        values:
                        - worker-0
              # restartPolicy: Never
              initContainers:
              - name: copy
                image: registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07
                command: ["/bin/sh", "-c", "tar zvxf /Driver.PKG --strip 1 -C /nep/driver/ && /bin/cp -f /ocp4.install.sh /nep/driver/ "]
                imagePullPolicy: Always
                volumeMounts:
                - name: driver-files
                  mountPath: /nep/driver/
              containers:
              - name: driver
                image: registry.redhat.io/rhel8/support-tools:8.4
                # imagePullPolicy: Always
                command: [ "/usr/bin/bash","-c","cd /nep/driver/ && bash ./ocp4.install.sh && sleep infinity " ]
                # command: [ "/usr/bin/bash","-c","tail -f /dev/null || true " ]
                resources:
                  requests:
                    cpu: 10m
                    memory: 20Mi
                securityContext:
                  privileged: true
                  # runAsUser: 0
                  seLinuxOptions:
                    level: "s0"
                volumeMounts:
                - name: driver-files
                  mountPath: /nep/driver/
                - name: host
                  mountPath: /host
              volumes: 
              - name: driver-files
                emptyDir: {}
              - name: host
                hostPath:
                  path: /
                  type: Directory
        EOF
        oc create -f /data/install/fpga.driver.yaml
        
        # to restore
        
        oc delete -f /data/install/fpga.driver.yaml
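
After the deployment has been created, it can help to confirm that the driver container ran the script and that the module and the xeth interface exist on the node. A minimal sketch follows, under some assumptions: the function name `check_fpga_driver` is hypothetical, the node name `worker-0` and the container name `driver` are taken from the manifest above, the deployment is assumed to live in the default project, and the module name is assumed to start with nr_drv as in the install script.

```shell
# Hypothetical helper; run it from a workstation that is logged in with oc.
check_fpga_driver() {
    node="${1:-worker-0}"
    # Output of ocp4.install.sh (set -x traces every command it ran)
    oc logs deployment/fpga-driver -c driver -n default
    # Loaded modules as seen from the node itself
    oc debug "node/${node}" -- chroot /host sh -c 'lsmod | grep nr_drv'
    # State of the devices known to NetworkManager, including xeth
    oc debug "node/${node}" -- chroot /host nmcli device status
}

# Example:
#   check_fpga_driver worker-0
```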
        

Sign the kernel module

CHAPTER 4. SIGNING KERNEL MODULES FOR SECURE BOOT
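
On nodes with Secure Boot enabled, the kernel is expected to reject modules that are not signed with a trusted key, so nr_drv_wr.ko would need to be signed before the install script can insert it. The following is a rough sketch of the signing step only, assuming it runs on a build host where the kernel development package for the target kernel is installed and a key pair already exists. The wrapper `sign_module`, the `SIGN_FILE` override and the key file names are hypothetical; `sign-file` itself is the signing script shipped with the kernel sources.

```shell
# Path of the kernel's signing script (typically provided by the
# kernel development package on RHEL-like systems).
SIGN_FILE="${SIGN_FILE:-/usr/src/kernels/$(uname -r)/scripts/sign-file}"

# sign_module PRIVATE_KEY PUBLIC_CERT_DER MODULE.ko
# Signs the module in place using sha256 as the hash algorithm.
sign_module() {
    "$SIGN_FILE" sha256 "$1" "$2" "$3"
}

# Example (hypothetical key names):
#   sign_module my_signing_key.priv my_signing_key_pub.der nr_drv_wr.ko
#   modinfo nr_drv_wr.ko | grep signer
```

The public key also has to be trusted by the target system, for example by enrolling it with `mokutil --import`; that step is outside the scope of this sketch.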