
[!TIP] Ongoing and occasional updates and improvements.

using keepalived as a sidecar to maintain VIP for pods

The customer wants to use Kubernetes (K8s) as a traditional platform. In practice, they plan to expose IP addresses directly from pods. They also run two pods on two different hosts and need a VIP that can migrate between those pods. They do not make full use of the native features Kubernetes offers and treat it mainly as a container management tool.

In summary, the customer's requirements are:

  1. Multiple pods must be able to run on the development platform.
  2. Pod IP addresses must be exposed directly for external access.
  3. A VIP must be configured across multiple pods, and it must be able to migrate between them.
  4. After the VIP migrates from pod-01 to pod-02, it must not migrate back to pod-01 automatically. Any later migration is performed manually.
  5. Only node failures need to be detected; application failures are not checked.

We will attach the pods to a secondary network through macvlan to demonstrate the VIP, and run keepalived as a sidecar to maintain the VIP for the pods.

Here is the architecture diagram of our test setup:

Key points of this solution:

  1. keepalived runs as a sidecar to maintain the VIP for the pods.
  2. keepalived switches the default gateway in the pod's route table when the VIP migrates to another pod.
  3. The pods attach to a secondary network through macvlan.

[!TIP] Only VIPs require a public address; all other addresses can be private.
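
To make key point 2 concrete, the gateway switch can be sketched as a small shell function. This is a minimal sketch and not the notify scripts used by the sidecar: the function name `default_gateway_for` is hypothetical, and the gateway `192.168.77.1`, the cluster CIDR `10.132.0.0/14`, and the interface names `net1` and `eth0` are assumptions taken from the example configuration later in this document. It reads a route table as text, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch only: pick the default gateway for a pod based on its VRRP state.
# VIP_GATEWAY and CLUSTER_CIDR are assumptions copied from the example
# configuration in this document; adjust them for a real environment.
VIP_GATEWAY="192.168.77.1"
CLUSTER_CIDR="10.132.0.0/14"

# default_gateway_for STATE
# Reads `ip route` output on stdin and prints "<gateway> <device>".
default_gateway_for() {
  state="$1"
  if [ "$state" = "MASTER" ]; then
    # The VIP holder answers external clients, so replies leave via macvlan.
    cat > /dev/null   # drain stdin, it is not needed in this branch
    echo "$VIP_GATEWAY net1"
  else
    # Any other state falls back to the cluster network gateway, which is
    # the next hop of the cluster CIDR route on eth0.
    awk -v cidr="$CLUSTER_CIDR" '$1 == cidr && $2 == "via" { print $3, "eth0"; exit }'
  fi
}

# Example with a route table captured as text (no cluster needed):
routes='default via 10.134.0.1 dev eth0
10.132.0.0/14 via 10.134.0.1 dev eth0
10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.9'

printf '%s\n' "$routes" | default_gateway_for MASTER   # 192.168.77.1 net1
printf '%s\n' "$routes" | default_gateway_for BACKUP   # 10.134.0.1 eth0
```

Looking up the cluster gateway from the route table, rather than hard-coding it, matters because each node hands its pods a different gateway address.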

macvlan on 2nd network

First, we need to create the macvlan configuration in our cluster. Once it is in place, we can reference it when deploying Deployments and Pods.


        var_namespace='demo-playground'
        
        # create demo project
        
        oc new-project $var_namespace
        
        
        # create the macvlan config
        
        # note: IPAM is static, so each pod's IP address is assigned explicitly.
        
        oc delete -f ${BASE_DIR}/data/install/macvlan.conf
        
        var_namespace='demo-playground'
        cat << EOF > ${BASE_DIR}/data/install/macvlan.conf
        ---
        apiVersion: k8s.cni.cncf.io/v1
        kind: NetworkAttachmentDefinition
        metadata:
          name: $var_namespace-macvlan
          namespace: $var_namespace
        spec:
          config: |- 
            {
              "cniVersion": "0.3.1",
              "name": "macvlan-net",
              "type": "macvlan",
              "_master": "eth1",
              "linkInContainer": false,
              "mode": "bridge",
              "ipam": {
                  "type": "static"
                }
            }
        
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/macvlan.conf

test with pods

Next, we deploy two Pods in the cluster. Each Pod is assigned an IP address from the macvlan network. We will then run some commands on the Pods to verify the configuration.
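
The two Deployments below differ only in their name and static address, so the network annotation can be produced by a small helper. The following is a minimal sketch: `macvlan_annotation` is a hypothetical function, not part of `oc` or Multus, and it leaves out the disabled `_mac` and `_interface` keys that appear in the manifests.

```shell
#!/bin/sh
# Sketch only: macvlan_annotation is a hypothetical helper.
# It prints a value for the k8s.v1.cni.cncf.io/networks annotation.
macvlan_annotation() {
  namespace="$1"   # project that owns the NetworkAttachmentDefinition
  address="$2"     # static address in CIDR form, e.g. 192.168.99.91/24
  printf '[{"name": "%s-macvlan", "ips": ["%s"]}]\n' "$namespace" "$address"
}

macvlan_annotation demo-playground 192.168.99.91/24
macvlan_annotation demo-playground 192.168.99.92/24
```

The attachment name follows the `$var_namespace-macvlan` pattern used by the NetworkAttachmentDefinition created earlier.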


        # create demo pods
        
        oc delete -f ${BASE_DIR}/data/install/pod.yaml
        
        var_namespace='demo-playground'
        cat << EOF > ${BASE_DIR}/data/install/pod.yaml
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01
          namespace: $var_namespace
          labels:
            app: tinypod-01
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.91/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-02
                      topologyKey: "kubernetes.io/hostname"
              containers:
              - image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                name: agnhost-container
                command: [ "/agnhost", "serve-hostname"]
        
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02
          namespace: $var_namespace
          labels:
            app: tinypod-02
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.92/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-01
                      topologyKey: "kubernetes.io/hostname"
              containers:
              - image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                name: agnhost-container
                command: [ "/agnhost", "serve-hostname"]
        
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/pod.yaml
        
        # run commands on the pods that belong to both deployments
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip a"
          echo
        done
        
        # Pod: tinypod-01-64f74695d5-qzdkr
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:86:00:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.134.0.10/23 brd 10.134.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe86:a/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 12:8f:74:c6:ef:18 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.91/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::108f:74ff:fec6:ef18/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # Pod: tinypod-02-597bb4db87-wmh74
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:85:00:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.133.0.12/23 brd 10.133.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe85:c/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether c2:4f:09:dc:ea:43 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.92/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::c04f:9ff:fedc:ea43/64 scope link
        
        #        valid_lft forever preferred_lft forever
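
Reading the full `ip a` output by eye becomes tedious once more pods are involved. As a minimal sketch, the helper below extracts only the IPv4 addresses of one interface from that output. The name `ipv4_on_iface` is hypothetical, and the sample text is trimmed from the tinypod-01 output above.

```shell
#!/bin/sh
# Sketch only: ipv4_on_iface is a hypothetical helper.
# Reads `ip a` output on stdin and prints the IPv4 addresses (CIDR form)
# assigned to the interface named in $1.
ipv4_on_iface() {
  awk -v want="$1" '
    # Interface header lines look like "3: net1@if7: <...>".
    /^[0-9]+: / {
      name = $2
      sub(/:$/, "", name)      # drop the trailing colon
      sub(/@.*/, "", name)     # drop the "@if7" peer suffix
      current = name
      next
    }
    # Address lines look like "    inet 192.168.99.91/24 brd ...".
    $1 == "inet" && current == want { print $2 }
  '
}

sample='2: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
    inet 10.134.0.10/23 brd 10.134.1.255 scope global eth0
3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 192.168.99.91/24 brd 192.168.99.255 scope global net1
    inet6 fe80::108f:74ff:fec6:ef18/64 scope link'

printf '%s\n' "$sample" | ipv4_on_iface net1   # 192.168.99.91/24
```

Against a live pod, the same function could presumably be fed from `oc exec $pod -n $var_namespace -- ip a`. Dropping `-it` there avoids the carriage returns that a TTY adds to the output.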

keepalived as a sidecar

Next, we will deploy a keepalived container as a sidecar to maintain the VIP for the pods. The keepalived container will be responsible for monitoring the health of the pods and managing the VIP.

keepalived image

Some keepalived container images are available on GitHub, but they have not been updated for a long time, so we will build our own keepalived image.


        mkdir -p /data/keepalived
        cd /data/keepalived
        
        cat << EOF > init.sh
        #!/bin/bash
        
        set -e
        set -o pipefail
        
        /usr/sbin/keepalived -n -l -D -f /etc/keepalived/keepalived.conf
        EOF
        
        cat << EOF > Dockerfile
        FROM registry.access.redhat.com/ubi9
        
        # Update the image to get the latest CVE updates
        
        RUN dnf update -y \
         && dnf install -y --nodocs --allowerasing \
            bash       \
            curl       \
            iproute    \
            keepalived \
         && rm /etc/keepalived/keepalived.conf
        
        COPY init.sh /init.sh
        
        RUN chmod +x /init.sh
        
        CMD ["/init.sh"]
        EOF
        
        podman build -t quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01 .
        
        podman push quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01

settings


        # create scc for keepalived, we need to add NET_ADMIN, NET_BROADCAST, NET_RAW capabilities
        
        oc delete -f ${BASE_DIR}/data/install/keepalived-scc.yaml
        
        cat << EOF > ${BASE_DIR}/data/install/keepalived-scc.yaml
        apiVersion: security.openshift.io/v1
        kind: SecurityContextConstraints
        metadata:
          name: keepalived-scc
        allowPrivilegedContainer: false
        allowedCapabilities:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
        runAsUser:
          type: RunAsAny
        seLinuxContext:
          type: RunAsAny
        fsGroup:
          type: RunAsAny
        supplementalGroups:
          type: RunAsAny
        users: []
        groups: []
        EOF
        oc apply -f ${BASE_DIR}/data/install/keepalived-scc.yaml
        
        # create a sa
        
        oc delete -f ${BASE_DIR}/data/install/keepalived-sa.yaml
        var_namespace='demo-playground'
        cat << EOF > ${BASE_DIR}/data/install/keepalived-sa.yaml
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: keepalived-sa
          namespace: $var_namespace
        EOF
        oc apply -f ${BASE_DIR}/data/install/keepalived-sa.yaml
        
        # add scc to sa
        
        oc adm policy add-scc-to-user keepalived-scc -z keepalived-sa -n $var_namespace

run with 2 nodes

ip addresses:


        # create demo pods
        
        # 192.168.77.100 is our VIP
        
        oc delete -f ${BASE_DIR}/data/install/pod.yaml
        
        var_namespace='demo-playground'
        cat << EOF > ${BASE_DIR}/data/install/pod.yaml
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
            }
            vrrp_instance VI_1 {
                state MASTER
                # state BACKUP
                interface net1
                virtual_router_id 51
                priority 100
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip 
                }
                notify_master "/etc/keepalived/notify_master.sh"
                notify_backup "/etc/keepalived/notify_backup.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
            if curl --max-time 0.1 -s http://192.168.99.91:9376 > /dev/null 2>&1 ; then
              exit 0
            else
              exit 1
            fi
          notify_master.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
          notify_backup.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0   
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01
          namespace: $var_namespace
          labels:
            app: tinypod-01
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.91/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-02
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config
              - name: keepalived-scripts
                configMap:
                  name: keepalived-scripts
              - name: writable-scripts
                emptyDir: {}
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config-backup
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
                weight +20
            }
            vrrp_instance VI_1 {
                state BACKUP
                interface net1
                virtual_router_id 51
                priority 90
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip
                }
                notify_master "/etc/keepalived/notify_master.sh"
                notify_backup "/etc/keepalived/notify_backup.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-backup-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
            
            # ourself is ok?
            if curl --max-time 0.1 -s http://192.168.99.92:9376 > /dev/null 2>&1 ; then
              # exit 0
              # continue, only ourself is ok.
              :
            else
              exit 1
            fi
            
            # Define the local file to record failure
            FAILURE_RECORD_FILE="/tmp/failure_record.txt"
        
            # Check if the failure record file exists
            # if so, we will still be the master
            if [ -f "\$FAILURE_RECORD_FILE" ]; then
              exit 0  # return Success (will add weight)
            fi
        
            # if peer fail, we should add weight
            if curl --max-time 0.1 -s http://192.168.99.91:9376 > /dev/null 2>&1 ; then
              # exit 0
              exit 1  # curl ok, return Failure (no change in weight)
            else
              # exit 1
              # Record the failure by creating the file
              touch "\$FAILURE_RECORD_FILE"
              exit 0  # curl fail, return Success (will add weight)
            fi
          notify_master.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
          notify_backup.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0 
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02
          namespace: $var_namespace
          labels:
            app: tinypod-02
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.92/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-01
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config-backup
              - name: keepalived-scripts
                configMap:
                  name: keepalived-backup-scripts
              - name: writable-scripts
                emptyDir: {}
        
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/pod.yaml
        
        
        # run commands on the pods that belong to both deployments
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        # we try to check the ip address and route table
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip a"
          echo
        done
        
        # Pod: tinypod-01-7899f4c557-wnvd2
        
        # Defaulted container "agnhost-container" out of: agnhost-container, keepalived
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:86:00:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.134.0.11/23 brd 10.134.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe86:b/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 72:24:2e:8b:df:a5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.91/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet 192.168.77.100/24 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::7024:2eff:fe8b:dfa5/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # Pod: tinypod-02-65b5989698-q5t5p
        
        # Defaulted container "agnhost-container" out of: agnhost-container, keepalived
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:85:00:0e brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.133.0.14/23 brd 10.133.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe85:e/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 62:ec:10:28:3a:c9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.92/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::60ec:10ff:fe28:3ac9/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        # here is the route table
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip r"
          echo
        done
        
        # Pod: tinypod-01-7fbb8c856b-n22cd
        
        # Defaulted container "agnhost-container" out of: agnhost-container, keepalived, init-permissions (init)
        
        # default via 192.168.77.1 dev net1
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.19
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.91
        
        # Pod: tinypod-02-bb499c57-l8tng
        
        # Defaulted container "agnhost-container" out of: agnhost-container, keepalived, init-permissions (init)
        
        # default via 10.134.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.9
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.92
        
        
        # curl http://192.168.77.100:9376
        
        # curl http://192.168.99.91:9376
        
        # curl http://192.168.99.92:9376
        
        # poll the service through the VIP; each line shows a timestamp and the name of the pod serving the request.
        
        while true; do
          TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
          RESPONSE=$(curl --max-time 0.05 -s -w "%{http_code}" http://192.168.77.100:9376)
          HTTP_CODE="${RESPONSE: -3}"
          CONTENT="${RESPONSE:0:-3}"
        
          if [ "$HTTP_CODE" -eq 200 ]; then
              echo "$TIMESTAMP - $CONTENT"
          else
              echo "$TIMESTAMP - call failed"
          fi
        
          sleep 1
        done
        
        # after the node loses power, the VIP takes about 3 seconds to fail over to pod-02
        
        # 2024-09-05 23:10:49 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:50 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:51 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:52 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:53 - call failed
        
        # 2024-09-05 23:10:54 - call failed
        
        # 2024-09-05 23:10:55 - call failed
        
        # 2024-09-05 23:10:56 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:57 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:58 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:59 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:11:00 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:11:01 - tinypod-02-c788654d4-hlsw5
        
        
        # after the node is powered back on, the VIP does not move back to pod-01
        
        # because of the check_ip.sh script logic.
        
        # 2024-09-05 23:12:30 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:31 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:32 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:33 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:34 - tinypod-02-c788654d4-hlsw5
        
        
        # after a normal (graceful) node power-off, the VIP takes about 1 second to fail over to pod-02
        
        # 2024-09-05 23:14:45 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:46 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:47 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:48 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:49 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:50 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:51 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:52 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:53 - call failed
        
        # 2024-09-05 23:14:54 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:55 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:56 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:57 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:58 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:59 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:00 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:01 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:02 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:03 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:04 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:05 - tinypod-02-c788654d4-hlsw5
        
        # after a normal node power-on, the VIP will not move back to pod-01
        
        # because of the check_ip.sh script logic.
        
        # 2024-09-05 23:17:28 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:29 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:30 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:31 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:32 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:33 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:34 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:35 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:36 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:37 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:38 - tinypod-01-6fc4fb867-ml4rh
        
        # 2024-09-05 23:17:39 - tinypod-01-6fc4fb867-ml4rh
        
        oc get pod -o wide
        
        # NAME                         READY   STATUS        RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
        
        # tinypod-01-6fc4fb867-ml4rh   2/2     Running       0          97s   10.132.0.98   master-01-demo   <none>           <none>
        
        # tinypod-01-6fc4fb867-tcmgx   2/2     Terminating   2          14m   10.134.0.8    worker-02-demo   <none>           <none>
        
        # tinypod-02-c788654d4-hlsw5   2/2     Running       0          14m   10.133.0.24   worker-01-demo   <none>           <none>
        
        
        # on the host, run tcpdump to verify that the VIP is working
        
        # run `curl 8.8.8.8` in the master pod, then check the tcpdump output
        
        tcpdump -i br-ocp tcp and dst host 8.8.8.8
        
        # dropped privs to tcpdump
        
        # tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
        
        # listening on br-ocp, link-type EN10MB (Ethernet), snapshot length 262144 bytes
        
        # 07:25:20.189841 IP 192.168.77.100.33636 > dns.google.http: Flags [S], seq 1576966623, win 32120, options [mss 1460,sackOK,TS val 1509135670 ecr 0,nop,wscale 7], length 0
        
        # 07:25:21.229779 IP 192.168.77.100.33636 > dns.google.http: Flags [S], seq 1576966623, win 32120, options [mss 1460,sackOK,TS val 1509136710 ecr 0,nop,wscale 7], length 0
        
        # 07:25:23.277818 IP 192.168.77.100.33636 > dns.google.http: Flags [S], seq 1576966623, win 32120, options [mss 1460,sackOK,TS val 1509138758 ecr 0,nop,wscale 7], length 0
        
        # 07:25:27.309816 IP 192.168.77.100.33636 > dns.google.http: Flags [S], seq 1576966623, win 32120, options [mss 1460,sackOK,TS val 1509142790 ecr 0,nop,wscale 7], length 0

check on node level

The requirement is to fail over the VIP to another pod only when the node is down, so we should not check the application endpoint. Instead, we create a simple HTTP server in a separate pod on the same node and check whether that web server is reachable. If it is not, we treat the node as down.

Here is the architecture:

ip addresses:


        # create demo pods
        
        # 192.168.77.100 is our VIP
        
        oc delete -f ${BASE_DIR}/data/install/pod.yaml
        
        var_namespace='demo-playground'
        cat << EOF > ${BASE_DIR}/data/install/pod.yaml
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
            }
            vrrp_instance VI_1 {
                state MASTER
                # state BACKUP
                interface net1
                virtual_router_id 51
                priority 100
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip 
                }
                notify_master "/etc/keepalived/notify_master.sh"
                notify_backup "/etc/keepalived/notify_backup.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
            if curl --max-time 0.1 -s http://192.168.99.81:9376 > /dev/null 2>&1 ; then
              exit 0
            else
              exit 1
            fi
          notify_master.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
          notify_backup.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0   
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01
          namespace: $var_namespace
          labels:
            app: tinypod-01
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.91/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01
                wzh-run: tinypod-testing
            spec:
              # prefer not to run on the same node as tinypod-02
              # but it can run on the same node in extreme cases
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-02
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config
              - name: keepalived-scripts
                configMap:
                  name: keepalived-scripts
              - name: writable-scripts
                emptyDir: {}
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01-check
          namespace: $var_namespace
          labels:
            app: tinypod-01-check
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01-check
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.81/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01-check
                wzh-run: tinypod-testing
            spec:
              # must run on the same node as the tinypod-01 app
              affinity:
                podAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - tinypod-01
                    topologyKey: "kubernetes.io/hostname"
              containers:
              - name: endpoint-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config-backup
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
                weight +20
            }
            vrrp_instance VI_1 {
                state BACKUP
                interface net1
                virtual_router_id 51
                # priority should be lower than master
                # but we do not want to fail back
                priority 90
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip
                }
                notify_master "/etc/keepalived/notify_master.sh"
                notify_backup "/etc/keepalived/notify_backup.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-backup-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
        
            # Define the local file to record failure
            FAILURE_RECORD_FILE="/tmp/failure_record.txt"
        
            # Is our own node OK? (probe the check pod on our node)
            if curl --max-time 0.1 -s http://192.168.99.82:9376 > /dev/null 2>&1 ; then
              # Our node is OK: continue with the checks below.
              :
            else
              # Our node is not OK: return failure (no weight added),
              # so we will not become master. Also clear the failure record.
              /bin/rm -f "\$FAILURE_RECORD_FILE"
              exit 1
            fi
            
            # If the failure record file exists, the peer failed earlier and
            # we took over; keep the added weight so the VIP does not fail back.
            if [ -f "\$FAILURE_RECORD_FILE" ]; then
              exit 0  # return success (weight is added)
            fi
        
            # Probe the check pod on the peer (master) node.
            if curl --max-time 0.1 -s http://192.168.99.81:9376 > /dev/null 2>&1 ; then
              exit 1  # peer is up: return failure (no weight added)
            else
              # Peer is down: record the failure by creating the file.
              touch "\$FAILURE_RECORD_FILE"
              exit 0  # return success (weight is added)
            fi
          notify_master.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
          notify_backup.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0 
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02
          namespace: $var_namespace
          labels:
            app: tinypod-02
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.92/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-01
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config-backup
              - name: keepalived-scripts
                configMap:
                  name: keepalived-backup-scripts
              - name: writable-scripts
                emptyDir: {}
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02-check
          namespace: $var_namespace
          labels:
            app: tinypod-02-check
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02-check
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.82/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02-check
                wzh-run: tinypod-testing
            spec:
              # must run on the same node as the tinypod-02 app
              affinity:
                podAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - tinypod-02
                    topologyKey: "kubernetes.io/hostname"
              containers:
              - name: endpoint-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/pod.yaml
        
        
        # run commands in the pods that belong to these deployments
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        # first, check the IP addresses in each pod
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip a"
          echo
        done
        
        # Pod: tinypod-01-974b4cc84-x9rsz
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:86:00:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.134.0.21/23 brd 10.134.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe86:15/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 36:70:c0:f8:8d:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.91/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet 192.168.77.100/24 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::3470:c0ff:fef8:8d07/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # Pod: tinypod-01-check-668b5d9498-ncmk5
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:86:00:16 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.134.0.22/23 brd 10.134.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe86:16/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether ae:82:93:bd:38:f2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.81/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::ac82:93ff:febd:38f2/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # Pod: tinypod-02-97d4bfd8b-kxsbn
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:85:00:0f brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.133.0.15/23 brd 10.133.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe85:f/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 5a:2e:41:7c:8c:1c brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.92/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::582e:41ff:fe7c:8c1c/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # Pod: tinypod-02-check-645b69c854-6nk9r
        
        # 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        
        #     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        
        #     inet 127.0.0.1/8 scope host lo
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 ::1/128 scope host
        
        #        valid_lft forever preferred_lft forever
        
        # 2: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        
        #     link/ether 0a:58:0a:85:00:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 10.133.0.17/23 brd 10.133.1.255 scope global eth0
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::858:aff:fe85:11/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        # 3: net1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        
        #     link/ether 2e:5e:cb:96:ee:20 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        
        #     inet 192.168.99.82/24 brd 192.168.99.255 scope global net1
        
        #        valid_lft forever preferred_lft forever
        
        #     inet6 fe80::2c5e:cbff:fe96:ee20/64 scope link
        
        #        valid_lft forever preferred_lft forever
        
        
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        # here is the route table
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip r"
          echo
        done
        
        # Pod: tinypod-01-974b4cc84-x9rsz
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 192.168.77.1 dev net1
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.21
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.91
        
        # Pod: tinypod-01-check-668b5d9498-ncmk5
        
        # default via 10.134.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.22
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.81
        
        # Pod: tinypod-02-97d4bfd8b-kxsbn
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 10.133.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.15
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.92
        
        # Pod: tinypod-02-check-645b69c854-6nk9r
        
        # default via 10.133.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.17
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.82
        
        # curl http://192.168.77.100:9376
        
        # curl http://192.168.99.91:9376
        
        # curl http://192.168.99.92:9376
        
        # poll the service through the VIP; each line shows a timestamp and the name of the pod serving the request.
        
        while true; do
          TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
          RESPONSE=$(curl --max-time 0.05 -s -w "%{http_code}" http://192.168.77.100:9376)
          HTTP_CODE="${RESPONSE: -3}"
          CONTENT="${RESPONSE:0:-3}"
        
          if [ "$HTTP_CODE" -eq 200 ]; then
              echo "$TIMESTAMP - $CONTENT"
          else
              echo "$TIMESTAMP - call failed"
          fi
        
          sleep 1
        done
        
        # after the node's power is cut, the VIP takes 3 seconds to fail over to pod-02
        
        # 2024-09-05 23:10:49 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:50 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:51 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:52 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:10:53 - call failed
        
        # 2024-09-05 23:10:54 - call failed
        
        # 2024-09-05 23:10:55 - call failed
        
        # 2024-09-05 23:10:56 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:57 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:58 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:10:59 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:11:00 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:11:01 - tinypod-02-c788654d4-hlsw5
        
        
        # after the node powers back on, the VIP does not move back to pod-01
        
        # because of the check_ip.sh script logic.
        
        # 2024-09-05 23:12:30 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:31 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:32 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:33 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:12:34 - tinypod-02-c788654d4-hlsw5
        
        
        # after a normal (graceful) node power-off, the VIP takes 1 second to fail over to pod-02
        
        # 2024-09-05 23:14:45 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:46 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:47 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:48 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:49 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:50 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:51 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:52 - tinypod-01-6fc4fb867-tcmgx
        
        # 2024-09-05 23:14:53 - call failed
        
        # 2024-09-05 23:14:54 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:55 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:56 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:57 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:58 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:14:59 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:00 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:01 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:02 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:03 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:04 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:15:05 - tinypod-02-c788654d4-hlsw5
        
        # after a normal node power-on, the VIP does not move back to pod-01
        
        # because of the check_ip.sh script logic.
        
        # 2024-09-05 23:17:28 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:29 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:30 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:31 - tinypod-02-c788654d4-hlsw5
        
        # 2024-09-05 23:17:32 - tinypod-02-c788654d4-hlsw5
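
The no-failback behavior in the log comes from how the backup's check_ip.sh combines with `weight +20` in its keepalived configuration. The following is a minimal sketch of that decision logic, not the deployed script: the two curl probes are replaced by plain arguments (`1` = reachable, `0` = unreachable), and the names `check`, `effective_priority` and `RECORD` are made up for illustration. It assumes keepalived adds a positive weight to the base priority (90) while the script exits 0, which puts the backup above the master's priority of 100.

```shell
#!/bin/sh
# Sketch only: simulates the backup-side check with stubbed probes.
# RECORD stands in for /tmp/failure_record.txt (hypothetical path here).
RECORD="${TMPDIR:-/tmp}/failure_record.$$"

# check SELF_UP PEER_UP -> exit status as keepalived would see it
check() {
  self_up=$1
  peer_up=$2
  if [ "$self_up" != 1 ]; then
    rm -f "$RECORD"      # our own node is unhealthy: forget any takeover
    return 1
  fi
  if [ -f "$RECORD" ]; then
    return 0             # we took over earlier: keep the extra weight
  fi
  if [ "$peer_up" = 1 ]; then
    return 1             # peer is healthy: stay at base priority
  fi
  touch "$RECORD"        # peer is down: remember it and take over
  return 0
}

# Base priority 90, plus 20 while the check succeeds (assumed semantics).
effective_priority() {
  if check "$1" "$2"; then
    echo $((90 + 20))
  else
    echo 90
  fi
}

rm -f "$RECORD"
effective_priority 1 1   # both nodes healthy      -> 90  (master keeps the VIP)
effective_priority 1 0   # master node down        -> 110 (backup takes the VIP)
effective_priority 1 1   # master node back        -> 110 (no failback)
effective_priority 0 1   # backup node unhealthy   -> 90  (record cleared)
rm -f "$RECORD"
```

Because the record file survives until the backup's own node check fails, the backup stays at 110 after the master returns, so moving the VIP back needs a manual step such as removing that file.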
        

do not start app on backup node

The client has added further requirements:

  1. The app should not start on the backup node; it should only start after switching to the master node.
  2. The pod can have multiple VIPs, with each VIP maintained by a separate VRRP ID.

These requirements are not complicated; we just need a few more techniques within the same architecture:

  1. Run a script as a daemon process and start the app through this daemon.
  2. The daemon monitors a set of switch files; when all of them exist, it starts the app, and when any of them is missing, it shuts the app down.
  3. In Keepalived, configure multiple VRRP instances.
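
The switch-file rule from step 2 can be tried on its own. The sketch below uses a hypothetical helper, `all_switches_present`, and a temporary directory as a stand-in for `/mnt/switch`. It mirrors the check in the startup script of this section, where the app runs only while every switch file exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named switch file exists in $1.
all_switches_present() {
  dir=$1; shift
  for name in "$@"; do
    [ -f "$dir/$name" ] || return 1
  done
  return 0
}

# Demo in a temporary directory (stand-in for /mnt/switch).
demo_dir=$(mktemp -d)

touch "$demo_dir/77"    # only VI_1 has become master
all_switches_present "$demo_dir" 77 88 && echo "start app" || echo "stop app"

touch "$demo_dir/88"    # VI_2 has become master as well
all_switches_present "$demo_dir" 77 88 && echo "start app" || echo "stop app"

rm -rf "$demo_dir"
```

Requiring all files, rather than any one of them, means the app stays down while the two VRRP instances disagree about which pod is master.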

However, there are some shortcomings in this solution:

  1. The logs of the daemon process and those of the app will be mixed together in stdout, so other logging solutions may need to be considered further.
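
One lightweight mitigation, offered here as an assumption rather than part of the tested setup, is to tag every daemon message so that it can be filtered out of the combined stream. The `log` helper and the `[switch-daemon]` tag below are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: prefix daemon messages with a timestamp and a fixed tag.
log() {
  printf '%s [switch-daemon] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

log "All switch files found."
log "Starting agnhost..."
```

With the tag in place, piping the container log through `grep -v '\[switch-daemon\]'` would leave only the app's own output. A dedicated logging sidecar remains the more complete answer.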


        
        # create demo pods
        
        # 192.168.77.100 and 192.168.88.100 are our VIPs
        
        var_namespace='demo-playground'
        
        # create master pod
        
        oc delete -f ${BASE_DIR}/data/install/pod.yaml
        
        cat << EOF > ${BASE_DIR}/data/install/pod.yaml
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
            }
            vrrp_instance VI_1 {
                state MASTER
                interface net1
                virtual_router_id 51
                priority 100
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip 
                }
                notify_master "/etc/keepalived/notify_master_vi_1.sh"
                notify_backup "/etc/keepalived/notify_backup_vi_1.sh"
            }
            vrrp_instance VI_2 {
                state MASTER
                interface net1
                virtual_router_id 52
                priority 100
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 2222
                }
                virtual_ipaddress {
                    192.168.88.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip 
                }
                notify_master "/etc/keepalived/notify_master_vi_2.sh"
                notify_backup "/etc/keepalived/notify_backup_vi_2.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
            if curl --max-time 0.1 -s http://192.168.99.81:9376 > /dev/null 2>&1 ; then
              exit 0
            else
              exit 1
            fi
          notify_master_vi_1.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
        
            # create switch file, to trigger app start
            touch /mnt/switch/77
          notify_backup_vi_1.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0   
        
            # remove switch file, to trigger app stop
            rm /mnt/switch/77
          notify_master_vi_2.sh: |
            #!/bin/sh
        
            # create switch file, to trigger app start
            touch /mnt/switch/88
          notify_backup_vi_2.sh: |
            #!/bin/sh
        
            # remove switch file, to trigger app stop
            rm /mnt/switch/88
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: app-start-script
          namespace: $var_namespace
        data:
          startup.sh: |
            #!/bin/bash
        
            # Directory containing the switch files
            SWITCH_DIR="/mnt/switch"
            # List of switch files to monitor
            SWITCH_FILES=("88" "77")
        
            # Infinite loop
            while true; do
        
              # Check if agnhost process is running
              AGNHOST_PID=\$(pgrep -f "agnhost serve-hostname")
        
              # Flag to check if all switch files exist
              all_files_exist=true
        
              # Check each switch file
              for file in "\${SWITCH_FILES[@]}"; do
                if [[ ! -f "\$SWITCH_DIR/\$file" ]]; then
                  all_files_exist=false
                  break
                fi
              done
        
              if \$all_files_exist; then
                echo "All switch files found."
                # Check if agnhost process is running
                if [[ -z "\$AGNHOST_PID" ]]; then
                  # Start agnhost in the background if not running
                  echo "Starting agnhost..."
                  /agnhost serve-hostname &
                else
                  echo "agnhost is already running."
                fi
              else
                echo "Not all switch files found."
                # Find and terminate the agnhost process
                if [[ -n "\$AGNHOST_PID" ]]; then
                  kill -9 "\$AGNHOST_PID"
                  echo "agnhost process terminated."
                else
                  echo "agnhost process not running."
                fi
              fi
        
              # Sleep for 1 second
              echo "Sleeping for 1 second..."
              sleep 1
              
            done
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01
          namespace: $var_namespace
          labels:
            app: tinypod-01
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.91/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01
                wzh-run: tinypod-testing
            spec:
              # prefer not to run on the same node as tinypod-02,
              # but it can run on the same node in extreme cases
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-02
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/bin/bash", "/mnt/scripts/startup.sh"]
                volumeMounts:
                - name: share-switch
                  mountPath: /mnt/switch
                - name: app-start-script
                  mountPath: /mnt/scripts
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
                - name: share-switch
                  mountPath: /mnt/switch
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config
              - name: keepalived-scripts
                configMap:
                  name: keepalived-scripts
              - name: writable-scripts
                emptyDir: {}
              - name: share-switch
                emptyDir: {}
              - name: app-start-script
                configMap:
                  name: app-start-script
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-01-check
          namespace: $var_namespace
          labels:
            app: tinypod-01-check
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-01-check
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.81/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-01-check
                wzh-run: tinypod-check
            spec:
              # run with tinypod-01 app
              affinity:
                podAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - tinypod-01
                    topologyKey: "kubernetes.io/hostname"
              containers:
              - name: endpoint-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/pod.yaml
        
        
        # get the log of app container using label
        
        oc logs -n $var_namespace -l app=tinypod-01 -c application-container
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # All switch files found.
        
        # Starting agnhost...
        
        # I0922 04:15:22.582473      16 log.go:198] Serving on port 9376.
        
        
        # create backup pod
        
        oc delete -f ${BASE_DIR}/data/install/pod-02.yaml
        
        cat << EOF > ${BASE_DIR}/data/install/pod-02.yaml
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-config-backup
          namespace: $var_namespace
        data:
          keepalived.conf: |
            global_defs {
                log_level 7
                script_user root
                # enable_script_security
            }
            vrrp_script chk_ip {
                script "/etc/keepalived/check_ip.sh"
                interval 2
                weight +20
            }
            vrrp_instance VI_1 {
                state BACKUP
                interface net1
                virtual_router_id 51
                # the base priority must be lower than the master's (100),
                # but we do not want the VIP to fail back,
                # so when the master fails, chk_ip (weight +20) raises our
                # priority to 110 and we take over; it stays at 110 afterwards
                priority 90
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 1111
                }
                virtual_ipaddress {
                    192.168.77.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip
                }
                notify_master "/etc/keepalived/notify_master_vi_1.sh"
                notify_backup "/etc/keepalived/notify_backup_vi_1.sh"
            }
            vrrp_instance VI_2 {
                state BACKUP
                interface net1
                virtual_router_id 52
                priority 90
                advert_int 1
                authentication {
                    auth_type PASS
                    auth_pass 2222
                }
                virtual_ipaddress {
                    192.168.88.100/24 dev net1
                }
                track_interface {
                    net1
                }
                track_script {
                    chk_ip 
                }
                notify_master "/etc/keepalived/notify_master_vi_2.sh"
                notify_backup "/etc/keepalived/notify_backup_vi_2.sh"
            }
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: keepalived-backup-scripts
          namespace: $var_namespace
        data:
          check_ip.sh: |
            #!/bin/sh
        
            # Define the local file to record failure
            FAILURE_RECORD_FILE="/tmp/failure_record.txt"
        
            # are we healthy ourselves?
            if curl --max-time 0.1 -s http://192.168.99.82:9376 > /dev/null 2>&1 ; then
              # exit 0
              # do not exit yet; continue only if we are healthy.
              :
            else
              # return Failure (no change in weight)
              # will not be the master
              /bin/rm -f "\$FAILURE_RECORD_FILE"
              exit 1
            fi
            
            # Check if the failure record file exists
            # if so, we will still be the master
            if [ -f "\$FAILURE_RECORD_FILE" ]; then
              exit 0  # return Success (will add weight)
            fi
        
            # if the peer fails, we should add weight
            if curl --max-time 0.1 -s http://192.168.99.81:9376 > /dev/null 2>&1 ; then
              # exit 0
              exit 1  # curl ok, return Failure (no change in weight)
            else
              # exit 1
              # Record the failure by creating the file
              touch "\$FAILURE_RECORD_FILE"
              exit 0  # curl fail, return Success (will add weight)
            fi
          notify_master_vi_1.sh: |
            #!/bin/sh
            ip route del default
            ip route add default via 192.168.77.1 dev net1
        
            # create switch file, to trigger app start
            touch /mnt/switch/77
          notify_backup_vi_1.sh: |
            #!/bin/sh
            ip route del default
            GATEWAY=\$(ip r | grep "10.132.0.0/14" | awk '{print \$3}')
            ip route add default via \$GATEWAY dev eth0   
        
            # remove switch file, to trigger app stop
            rm /mnt/switch/77
          notify_master_vi_2.sh: |
            #!/bin/sh
        
            # create switch file, to trigger app start
            touch /mnt/switch/88
          notify_backup_vi_2.sh: |
            #!/bin/sh
        
            # remove switch file, to trigger app stop
            rm /mnt/switch/88
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02
          namespace: $var_namespace
          labels:
            app: tinypod-02
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.92/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02
                wzh-run: tinypod-testing
            spec:
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - tinypod-01
                      topologyKey: "kubernetes.io/hostname"
              serviceAccountName: keepalived-sa
              initContainers:
              - name: init-permissions
                image: docker.io/busybox
                command: ['sh', '-c', 'cp /etc/keepalived/*.sh /tmp/keepalived/ && chmod 755 /tmp/keepalived/*.sh && chown root:root /tmp/keepalived/*.sh']
                volumeMounts:
                - name: keepalived-scripts
                  mountPath: /etc/keepalived
                - name: writable-scripts
                  mountPath: /tmp/keepalived
              containers:
              - name: application-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/bin/bash", "/mnt/scripts/startup.sh"]
                volumeMounts:
                - name: share-switch
                  mountPath: /mnt/switch
                - name: app-start-script
                  mountPath: /mnt/scripts
              - name: keepalived
                image: quay.io/wangzheng422/qimgs:keepalived-2024-09-06-v01
                imagePullPolicy: IfNotPresent
                securityContext:
                  # privileged: true
                  # runAsUser: 0
                  capabilities:
                    add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
                volumeMounts:
                - name: keepalived-config
                  mountPath: /etc/keepalived/keepalived.conf
                  subPath: keepalived.conf
                - name: writable-scripts
                  mountPath: /etc/keepalived
                - name: share-switch
                  mountPath: /mnt/switch
              volumes:
              - name: keepalived-config
                configMap:
                  name: keepalived-config-backup
              - name: keepalived-scripts
                configMap:
                  name: keepalived-backup-scripts
              - name: writable-scripts
                emptyDir: {}
              - name: share-switch
                emptyDir: {}
              - name: app-start-script
                configMap:
                  name: app-start-script
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: tinypod-02-check
          namespace: $var_namespace
          labels:
            app: tinypod-02-check
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: tinypod-02-check
          template:
            metadata:
              annotations:
                k8s.v1.cni.cncf.io/networks: '[
                  {
                    "name": "$var_namespace-macvlan", 
                    "_mac": "02:03:04:05:06:07", 
                    "_interface": "myiface1", 
                    "ips": [
                      "192.168.99.82/24"
                      ] 
                  }
                ]'
              labels:
                app: tinypod-02-check
                wzh-run: tinypod-check
            spec:
              # run with tinypod-02 app
              affinity:
                podAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - tinypod-02
                    topologyKey: "kubernetes.io/hostname"
              containers:
              - name: endpoint-container
                image: registry.k8s.io/e2e-test-images/agnhost:2.43
                imagePullPolicy: IfNotPresent
                command: [ "/agnhost", "serve-hostname"]
        EOF
        
        oc apply -f ${BASE_DIR}/data/install/pod-02.yaml
        
        # get the log of app container using label
        
        oc logs -n $var_namespace -l app=tinypod-01 -c application-container
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # All switch files found.
        
        # Starting agnhost...
        
        # I0922 04:15:22.582473      16 log.go:198] Serving on port 9376.
        
        oc logs -n $var_namespace -l app=tinypod-02 -c application-container
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # Not all switch files found.
        
        # agnhost process not running.
        
        # Sleeping for 1 second...
        
        # Get the list of pod names
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Loop through each pod and execute the command
        
        # here is the route table
        
        for pod in $pods; do
          echo "Pod: $pod"
          oc exec -it $pod -n $var_namespace -- /bin/sh -c "ip r"
          echo
        done
        
        # Pod: tinypod-01-779d86fd54-xn4hb
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 192.168.77.1 dev net1
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.13
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100
        
        # 192.168.88.0/24 dev net1 proto kernel scope link src 192.168.88.100
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.91
        
        # Pod: tinypod-02-7d44467c79-zwk4h
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 10.134.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.7
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.92
        
        
        # get the ps -ef result of all app containers using the label
        
        # Get the list of pods with the label wzh-run=tinypod-testing
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Iterate over each pod and execute the ps -ef command inside the container
        
        for pod in $pods; do
          echo "ps -ef result for pod: $pod"
          oc exec $pod -n $var_namespace -- ip r
          oc exec -n $var_namespace $pod -- ps -ef
          oc exec -n $var_namespace $pod -- ls /mnt/switch
          echo "-----------------------------------"
        done
        
        # ps -ef result for pod: tinypod-01-779d86fd54-fqrnp
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 192.168.77.1 dev net1
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.25
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100
        
        # 192.168.88.0/24 dev net1 proto kernel scope link src 192.168.88.100
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.91
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # PID   USER     TIME  COMMAND
        
        #     1 root      0:00 /bin/bash /mnt/scripts/startup.sh
        
        #   214 root      0:00 /agnhost serve-hostname
        
        #   517 root      0:00 sleep 1
        
        #   518 root      0:00 ps -ef
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # 77
        
        # 88
        
        # -----------------------------------
        
        # ps -ef result for pod: tinypod-02-7d44467c79-8xpvz
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 10.134.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.16
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.92
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # PID   USER     TIME  COMMAND
        
        #     1 root      0:00 /bin/bash /mnt/scripts/startup.sh
        
        #   476 root      0:00 sleep 1
        
        #   483 root      0:00 ps -ef
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # -----------------------------------
        
        
        # scale deployment tinypod-01-check to 0, to simulate a failure on the pod-01 side
        
        oc scale deployment tinypod-01-check --replicas=0 -n $var_namespace
        oc scale deployment tinypod-02-check --replicas=1 -n $var_namespace
        
        
        # or scale deployment tinypod-02-check to 0, to simulate a failure on the pod-02 side
        
        oc scale deployment tinypod-02-check --replicas=0 -n $var_namespace
        oc scale deployment tinypod-01-check --replicas=1 -n $var_namespace
        
        # scale both check deployments back to 1
        
        oc scale deployment tinypod-01-check --replicas=1 -n $var_namespace
        oc scale deployment tinypod-02-check --replicas=1 -n $var_namespace
        
        
        
        
        # get the ps -ef result of all app containers using the label
        
        # Get the list of pods with the label wzh-run=tinypod-testing
        
        pods=$(oc get pods -n $var_namespace -l wzh-run=tinypod-testing -o jsonpath='{.items[*].metadata.name}')
        
        # Iterate over each pod and execute the ps -ef command inside the container
        
        for pod in $pods; do
          echo "ps -ef result for pod: $pod"
          oc exec $pod -n $var_namespace -- ip r
          oc exec -n $var_namespace $pod -- ps -ef
          oc exec -n $var_namespace $pod -- ls /mnt/switch
          echo "-----------------------------------"
        done
        
        # ps -ef result for pod: tinypod-01-779d86fd54-fqrnp
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 10.133.0.1 dev eth0
        
        # 10.132.0.0/14 via 10.133.0.1 dev eth0
        
        # 10.133.0.0/23 dev eth0 proto kernel scope link src 10.133.0.25
        
        # 100.64.0.0/16 via 10.133.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.133.0.1 dev eth0
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.91
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # PID   USER     TIME  COMMAND
        
        #     1 root      0:00 /bin/bash /mnt/scripts/startup.sh
        
        #   815 root      0:00 sleep 1
        
        #   816 root      0:00 ps -ef
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # -----------------------------------
        
        # ps -ef result for pod: tinypod-02-7d44467c79-8xpvz
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # default via 192.168.77.1 dev net1
        
        # 10.132.0.0/14 via 10.134.0.1 dev eth0
        
        # 10.134.0.0/23 dev eth0 proto kernel scope link src 10.134.0.16
        
        # 100.64.0.0/16 via 10.134.0.1 dev eth0
        
        # 172.22.0.0/16 via 10.134.0.1 dev eth0
        
        # 192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100
        
        # 192.168.88.0/24 dev net1 proto kernel scope link src 192.168.88.100
        
        # 192.168.99.0/24 dev net1 proto kernel scope link src 192.168.99.92
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # PID   USER     TIME  COMMAND
        
        #     1 root      0:00 /bin/bash /mnt/scripts/startup.sh
        
        #   582 root      0:00 /agnhost serve-hostname
        
        #   785 root      0:00 sleep 1
        
        #   792 root      0:00 ps -ef
        
        # Defaulted container "application-container" out of: application-container, keepalived, init-permissions (init)
        
        # 77
        
        # 88
        
        # -----------------------------------
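
The route tables above are enough to tell which pod currently holds a VIP: the holder has a connected route whose `src` is the VIP. A small sketch of that check follows, with a hypothetical function name `holds_vip`, reading `ip r` output on stdin. The sample line is copied from the tinypod-02 output after failover.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the route table on stdin has "src <vip>".
holds_vip() {
  awk -v vip="$1" '
    { for (i = 1; i < NF; i++) if ($i == "src" && $(i + 1) == vip) found = 1 }
    END { exit !found }
  '
}

# Sample route line taken from the output shown above.
sample='192.168.77.0/24 dev net1 proto kernel scope link src 192.168.77.100'
if printf '%s\n' "$sample" | holds_vip 192.168.77.100; then
  echo "VIP 192.168.77.100 is on this pod"
fi
```

Against a live pod, the input would come from `oc exec $pod -n $var_namespace -- ip r` instead of the sample string.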
        

end

demo application image

We need an application container image for the demo. It has to restart quickly, so that we can test keepalived failover without long waits.


        mkdir -p /data/caddy
        cd /data/caddy
        
        echo "wzh hello world" > index.html
        
        cat << EOF > Caddyfile
        :9376 {
                # Set this path to your site's directory.
                root * /usr/share/caddy
        
                # Enable the static file server.
                file_server
        
        }
        EOF
        
        cat << EOF > Dockerfile
        FROM docker.io/caddy:alpine
        
        COPY index.html /usr/share/caddy/index.html
        
        COPY Caddyfile /etc/caddy/Caddyfile
        
        EOF
        
        podman build -t quay.io/wangzheng422/qimgs:caddy-2024-09-09-v02 .
        
        podman push quay.io/wangzheng422/qimgs:caddy-2024-09-09-v02
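
To get a feel for how quickly the image comes back after a restart, one option is to poll the port from the Caddyfile until it answers. This is a minimal sketch, assuming `curl` is available. `wait_for_http` is a hypothetical helper, and the local URL is only an example.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_http() {
  url=$1; attempts=$2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --max-time 1 -s -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "still down after $attempts attempt(s)"
  return 1
}

# Example: probe the port configured in the Caddyfile on the local host.
wait_for_http http://127.0.0.1:9376 3 || true
```

If the image is started locally, for example with `podman run --rm -d -p 9376:9376 quay.io/wangzheng422/qimgs:caddy-2024-09-09-v02`, the helper reports how many one-second attempts were needed before the port answered.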

bottom