OCP 4.19 Agent-based Installation (ABI): Deep Dive and Solutions for Adding API / Ingress VIPs

In an OpenShift 4.19 bare-metal cluster deployed with the Agent-based Installer (ABI), the API and Ingress VIPs are treated as install-time settings: they are rendered into several layers of cluster-managed configuration and are not meant to be edited by hand. This document analyzes how that configuration is managed, from the on-disk files to the Operator reconciliation loops, and presents a robust way to add a VIP after installation.


1. Core Analysis: Why Manual Modifications Fail

In a Bare Metal IPI/ABI architecture, VIPs are not mere static configurations but are part of a multi-layered, declarative state managed by the cluster.

  • Filesystem Level: /etc/keepalived/keepalived.conf is regenerated from a template by the keepalived-monitor container (which runs dynkeepalived from the baremetal-runtimecfg image), so manual edits are overwritten.
  • Manifest Level: /etc/kubernetes/manifests/keepalived.yaml is rendered from a Machine Config Operator (MCO) template and written to each node by the machine-config-daemon. Manual edits are treated as drift from the rendered MachineConfig and are not preserved.
  • Resource Level: The status field of the Infrastructure resource is locked by the Cluster Version Operator (CVO). Since this involves the security of the API Server’s certificates (SAN list), the system prohibits dynamic changes to the management VIPs post-installation.
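
To see what the cluster currently records at the resource level, the VIPs can be read back from the Infrastructure resource. The following is a minimal sketch, assuming a bare-metal platform and a logged-in oc session; the function name show_platform_vips is illustrative and not part of any OpenShift tooling.

```shell
# Print the API and Ingress VIPs recorded in the Infrastructure resource.
# Assumption: the platform is bare metal, so the values live under
# .status.platformStatus.baremetal. Requires a logged-in oc session.
show_platform_vips() {
  oc get infrastructure cluster \
    -o jsonpath='API VIPs: {.status.platformStatus.baremetal.apiServerInternalIPs}{"\n"}Ingress VIPs: {.status.platformStatus.baremetal.ingressIPs}{"\n"}'
}

# Usage (read-only, safe to run at any time):
#   show_platform_vips
```

This only reads the status; it is a way to confirm which VIPs the cluster considers authoritative before planning any extension.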

2. Deep Dive: Source-Level Analysis of keepalived.yaml

Examining /etc/kubernetes/manifests/keepalived.yaml on a Master node shows why manual changes do not stick: the VIPs appear as literal values in the init container, the keepalived container, and the monitor container, all rendered from the same install-time source.

2.1 Static Pod Manifest Snippet

# The following definition determines how VIPs are announced on the node
initContainers:
  - name: render-config-keepalived
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d27...
    command:
    - runtimecfg
    - render
    - "/etc/kubernetes/kubeconfig"
    - "--api-vips"
    - "192.168.99.21"  # <--- Source of truth hardcoded at installation time
    - "--ingress-vips"
    - "192.168.99.22"
containers:
  - name: keepalived
    command:
    - /bin/bash
    - -c
    - |
      # The startup logic includes cleanup for any stale VIPs
      remove_vip "192.168.99.21"
      remove_vip "192.168.99.22"
      /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp ...
  - name: keepalived-monitor
    command:
    - /bin/bash
    - -c
    - |
      # Runtime dynamic maintainer
      api_vips=192.168.99.21
      ingress_vips=192.168.99.22
      dynkeepalived /var/lib/kubelet/kubeconfig /config/keepalived.conf.tmpl /etc/keepalived/keepalived.conf --api-vips "${api_vips}" --ingress-vips "${ingress_vips}"

Conclusion: The VIPs are written into the manifest as literal arguments when it is rendered, and the manifest itself is owned by the Machine Config Operator (MCO). If the “Actual State” on disk deviates from the “Desired State” of the rendered MachineConfig, the change is treated as drift and the file is overwritten with the rendered template.
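
To confirm which VIPs a node's manifest actually carries, the flag values can be pulled out of the file with a small text filter. This is a sketch that assumes the argument layout shown in the snippet above (each flag on one list line, its value on the next); the helper name extract_flag_value is hypothetical.

```shell
# Read a static pod manifest on stdin and print the value that follows
# the given flag (for example --api-vips) in the container argument list.
# Assumption: the flag and its value sit on consecutive YAML list lines.
extract_flag_value() {
  awk -v flag="\"$1\"" '
    found {
      gsub(/^[ \t-]+/, "")       # drop indentation and the list dash
      sub(/[ \t]*#.*$/, "")      # drop any trailing comment
      gsub(/"/, "")              # drop the surrounding quotes
      print
      exit
    }
    index($0, flag) { found = 1 }  # the value is on the next line
  '
}

# Usage on a Master node:
#   extract_flag_value --api-vips < /etc/kubernetes/manifests/keepalived.yaml
```

Comparing this output with the Infrastructure resource is a quick way to see that both layers are fed from the same install-time values.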


3. The “Standard Path”: Extending API VIP with MetalLB

If you need to announce an additional API VIP (e.g., 192.168.99.17) post-installation, MetalLB is the recommended approach for bare-metal clusters.

3.1 MetalLB IP Pool Configuration (IPAddressPool)

First, define a dedicated IP pool for the management VIP.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: api-vip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.99.17/32 # <--- The new API VIP to be added
  autoAssign: false    # Set to false to prevent accidental assignment to other services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: api-vip-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - api-vip-pool

3.2 Creating the API LoadBalancer Service

Since the API Server runs as a Static Pod on each Master node, this setup does not rely on a Service selector to find it. Instead, create a selector-less Service and manually map its Endpoints to the Master nodes’ physical IPs.

IMPORTANT (common pitfall): If the Service defines a name for the port (e.g., api-server), the Endpoints definition must use exactly the same name. Otherwise the association fails, the backend list stays empty, and clients get a Connection refused error.

apiVersion: v1
kind: Service
metadata:
  name: api-external-lb
  namespace: openshift-config
spec:
  type: LoadBalancer
  ports:
  - port: 6443
    targetPort: 6443
    name: api-server # <--- Port name defined here
  loadBalancerIP: 192.168.99.17 # <--- VIP used for the service
---
# Manually define Endpoints pointing to the physical IPs of the three Master nodes
apiVersion: v1
kind: Endpoints
metadata:
  name: api-external-lb
  namespace: openshift-config
subsets:
  - addresses:
      - ip: 192.168.99.23
      - ip: 192.168.99.24
      - ip: 192.168.99.25
    ports:
      - port: 6443
        name: api-server # <--- MUST match the Service port name!

4. Troubleshooting: Why “Connection refused”?

A common failure mode is that the L2 advertisement works, yet a curl test against the VIP returns Connection refused. The ARP entry looks healthy:

# ARP check shows successful L2 broadcast
? (192.168.99.17) at 00:50:56:8e:2a:32 [ether] on enp1s0

4.1 Analysis: Empty Backends (Endpoints Mismatch)

When a Service uses named ports, Kubernetes associates an Endpoints port with a Service port by name. If the names differ, the Service ends up with no usable backends.

  • Symptom: The OVN/iptables layer generates no forwarding entries for the Service. When a request reaches the node, the kernel responds with an RST packet.
  • Verification: Run oc describe svc api-external-lb -n openshift-config. If the Endpoints field is empty or <none>, this is the root cause.

For comparison, a correctly wired Service lists all three Master IPs in the Endpoints field:

oc describe svc api-external-lb -n openshift-config
# Name:                     api-external-lb
# Namespace:                openshift-config
# Labels:                   <none>
# Annotations:              metallb.io/ip-allocated-from-pool: api-vip-pool
# Selector:                 <none>
# Type:                     LoadBalancer
# IP Family Policy:         SingleStack
# IP Families:              IPv4
# IP:                       172.22.86.65
# IPs:                      172.22.86.65
# Desired LoadBalancer IP:  192.168.99.17
# LoadBalancer Ingress:     192.168.99.17 (VIP)
# Port:                     api-server  6443/TCP
# TargetPort:               6443/TCP
# NodePort:                 api-server  30848/TCP
# Endpoints:                192.168.99.24:6443,192.168.99.23:6443,192.168.99.25:6443
# Session Affinity:         None
# External Traffic Policy:  Cluster
# Internal Traffic Policy:  Cluster
# Events:
#   Type    Reason        Age                  From                Message
#   ----    ------        ----                 ----                -------
#   Normal  IPAllocated   7m16s                metallb-controller  Assigned IP ["192.168.99.17"]
#   Normal  nodeAssigned  95s (x2 over 7m15s)  metallb-speaker     announcing from node "master-02-demo" with protocol "layer2"
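
The port-name comparison can also be scripted instead of read off the describe output. The sketch below assumes a logged-in oc session and a single-port Service like the one in this document (with several ports, the ordering of the two lists could differ); the function name check_port_names is illustrative.

```shell
# Compare the port names of a Service with those of its same-named
# Endpoints object. Arguments: object name, namespace.
# Returns non-zero when the names differ (the empty-backend case).
check_port_names() {
  name="$1"
  ns="$2"
  svc_ports="$(oc get service "$name" -n "$ns" -o jsonpath='{.spec.ports[*].name}')"
  ep_ports="$(oc get endpoints "$name" -n "$ns" -o jsonpath='{.subsets[*].ports[*].name}')"
  if [ "$svc_ports" = "$ep_ports" ]; then
    echo "OK: port names match ($svc_ports)"
  else
    echo "MISMATCH: service=[$svc_ports] endpoints=[$ep_ports]"
    return 1
  fi
}

# Usage:
#   check_port_names api-external-lb openshift-config
```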

5. The “TLS Bypass” Logic: DNS Spoofing

Even after MetalLB successfully announces .17, certificate verification is the final hurdle.

5.1 Why URL-based access is mandatory

  • Direct IP Access: oc login https://192.168.99.17:6443 -> FAILS. The TLS certificate SAN list does not contain .17.
  • Domain Access via DNS:
    1. Update /etc/hosts or the external DNS server to point api.<cluster>.<domain> to the new VIP 192.168.99.17.
    2. Access oc login https://api.<cluster>.<domain>:6443.
    3. Result: SUCCESS. The client validates the Hostname. As long as the domain matches the certificate, the client does not care which IP was used for the connection.
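
The same hostname-versus-IP behavior can be checked for a single request without changing DNS or /etc/hosts, using curl's --resolve option. This is a sketch under a few assumptions: the CA bundle path passed as the third argument is whatever file holds the CA that signed the API serving certificate, the /version endpoint is assumed to be readable without authentication (a 401 or 403 response would still show that the TLS handshake succeeded), and the function name probe_api_via_vip is hypothetical.

```shell
# Probe the API through a specific VIP while keeping the hostname in the
# URL, so the certificate is validated against the hostname.
# --resolve pins host:port to the given IP for this one request only.
# Arguments: API hostname, VIP, path to a CA bundle.
probe_api_via_vip() {
  host="$1"
  vip="$2"
  ca="$3"
  curl --silent --show-error --cacert "$ca" \
    --resolve "${host}:6443:${vip}" \
    "https://${host}:6443/version"
}

# Usage (the CA file name is a placeholder):
#   probe_api_via_vip api.demo-01-rhsys.wzhlab.top 192.168.99.17 ./api-ca.crt
```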

5.2 Real-world Proof of Concept

Before the DNS change (Traffic goes to the original .21):

oc get node -v9 --kubeconfig ./kubeconfig
# I0302 10:45:44.383133   95401 loader.go:395] Config loaded from file:  /home/sno/data/install/auth/kubeconfig
# I0302 10:45:44.405033   95401 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json" -H "User-Agent: oc/4.18.0 (linux/amd64) kubernetes/3a48fc2" 'https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500'
# I0302 10:45:44.406769   95401 round_trippers.go:495] HTTP Trace: DNS Lookup for api.demo-01-rhsys.wzhlab.top resolved to [{192.168.99.21 }]
# I0302 10:45:44.408264   95401 round_trippers.go:510] HTTP Trace: Dial to tcp:192.168.99.21:6443 succeed
# I0302 10:45:44.428580   95401 round_trippers.go:553] GET https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500 200 OK in 23 milliseconds
# I0302 10:45:44.428626   95401 round_trippers.go:570] HTTP Statistics: DNSLookup 1 ms Dial 1 ms TLSHandshake 12 ms ServerProcessing 7 ms Duration 23 ms
# I0302 10:45:44.428644   95401 round_trippers.go:577] Response Headers:
# I0302 10:45:44.428662   95401 round_trippers.go:580]     Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# I0302 10:45:44.428675   95401 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: e2aadba0-d157-4853-891d-97509e17dec5
# I0302 10:45:44.428687   95401 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: c31bf2b8-4458-45e7-9166-b26415b6c301
# I0302 10:45:44.428701   95401 round_trippers.go:580]     Date: Mon, 02 Mar 2026 10:45:44 GMT
# I0302 10:45:44.428713   95401 round_trippers.go:580]     Audit-Id: d729dd28-2675-4c19-9dd0-2dce054fc3ce
# I0302 10:45:44.428723   95401 round_trippers.go:580]     Cache-Control: no-cache, private
# I0302 10:45:44.428733   95401 round_trippers.go:580]     Content-Type: application/json
# .........

Applying the Local DNS Hack:

# Force the local machine to use the new MetalLB VIP for the API server
sudo tee -a /etc/hosts << EOF

192.168.99.17 api.demo-01-rhsys.wzhlab.top

EOF
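
Because tee -a appends on every run, repeating the command leaves duplicate lines behind. A more careful variant adds the entry only when the hostname is not mapped yet. This is a sketch with a hypothetical helper name, ensure_hosts_entry; the optional third argument exists so the logic can be tried on a scratch file before touching /etc/hosts.

```shell
# Append "IP HOSTNAME" to a hosts file unless the hostname already has
# an entry. Arguments: IP, hostname, hosts file (default: /etc/hosts,
# which needs root privileges to modify).
ensure_hosts_entry() {
  ip="$1"
  host="$2"
  file="${3:-/etc/hosts}"
  # Look for the hostname in any non-comment line (fields after the IP).
  if awk -v h="$host" '
       !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }
     ' "$file"; then
    echo "already mapped, review the existing line for: $host"
  else
    printf '%s %s\n' "$ip" "$host" >> "$file"
  fi
}

# Usage (as root), then confirm what the resolver returns:
#   ensure_hosts_entry 192.168.99.17 api.demo-01-rhsys.wzhlab.top
#   getent hosts api.demo-01-rhsys.wzhlab.top
```

The helper deliberately refuses to add a second line when the hostname is already present, since an older entry pointing at the original VIP would otherwise take precedence and hide the new one.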

Verifying that the domain now resolves to the new VIP (.17) and succeeds via TLS:

# Debug verification pointing to the local kubeconfig
oc get node -v9 --kubeconfig ./kubeconfig
# I0302 19:02:32.646329    2344 loader.go:402] Config loaded from file:  ./kubeconfig
# I0302 19:02:32.647127    2344 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0302 19:02:32.647169    2344 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0302 19:02:32.647187    2344 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0302 19:02:32.647202    2344 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0302 19:02:32.663432    2344 helper.go:113] "Request Body" body=""
# I0302 19:02:32.663610    2344 round_trippers.go:473] curl -v -XGET  -H "User-Agent: oc/4.19.0 (linux/amd64) kubernetes/24755b6" -H "Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json" 'https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500'
# I0302 19:02:32.664475    2344 round_trippers.go:502] HTTP Trace: DNS Lookup for api.demo-01-rhsys.wzhlab.top resolved to [{192.168.99.17 }]
# I0302 19:02:32.668290    2344 round_trippers.go:517] HTTP Trace: Dial to tcp:192.168.99.17:6443 succeed
# I0302 19:02:32.692076    2344 round_trippers.go:560] GET https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500 200 OK in 28 milliseconds
# I0302 19:02:32.692155    2344 round_trippers.go:577] HTTP Statistics: DNSLookup 0 ms Dial 3 ms TLSHandshake 16 ms ServerProcessing 6 ms Duration 28 ms
# I0302 19:02:32.692188    2344 round_trippers.go:584] Response Headers:
# I0302 19:02:32.692219    2344 round_trippers.go:587]     Audit-Id: 5b6a5bca-da4b-46cf-b206-5a04c60392a6
# I0302 19:02:32.692243    2344 round_trippers.go:587]     Cache-Control: no-cache, private
# I0302 19:02:32.692281    2344 round_trippers.go:587]     Content-Type: application/json
# I0302 19:02:32.692314    2344 round_trippers.go:587]     Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# I0302 19:02:32.692356    2344 round_trippers.go:587]     X-Kubernetes-Pf-Flowschema-Uid: e2aadba0-d157-4853-891d-97509e17dec5
# I0302 19:02:32.692394    2344 round_trippers.go:587]     X-Kubernetes-Pf-Prioritylevel-Uid: c31bf2b8-4458-45e7-9166-b26415b6c301
# I0302 19:02:32.692430    2344 round_trippers.go:587]     Date: Mon, 02 Mar 2026 11:02:33 GMT
# ......

6. Summary

Solution            | Operation                                   | Result
--------------------|---------------------------------------------|---------------------------------------
MetalLB             | Create IPAddressPool + LoadBalancer Service | Successfully announces the network IP
DNS Integration     | Point API Domain to MetalLB VIP             | Successfully passes TLS verification
System Modification | Edit keepalived.yaml or Infrastructure      | FAILED (Reverted or Rejected)

The Final Verdict: For extending API management VIPs, “sidestepping” the Operator is smarter than “fighting” it. By leveraging MetalLB for the L2 announcement and DNS for identity validation, you can support multiple API entry points without violating the declarative integrity of the OpenShift cluster.