OCP 4.19 Agent-based Installation (ABI): Deep Dive and Solutions for Adding API / Ingress VIPs
In an OpenShift 4.19 (ABI) bare-metal environment, the management logic for API VIPs and Ingress VIPs is designed with high atomicity and immutability. This document provides a comprehensive analysis of the underlying configuration management, from low-level files to Operator reconciliation loops, and offers a robust solution for extending VIPs post-installation.
1. Core Analysis: Why Manual Modifications Fail
In a Bare Metal IPI/ABI architecture, VIPs are not mere static configurations but are part of a multi-layered, declarative state managed by the cluster.
- Filesystem Level: /etc/keepalived/keepalived.conf is continuously monitored and overwritten by the baremetal-runtimecfg container.
- Manifest Level: /etc/kubernetes/manifests/keepalived.yaml is under the strict supervision of the Cluster Network Operator (CNO). Any change in the file's hash triggers a near-instant revert.
- Resource Level: The status field of the Infrastructure resource is locked by the Cluster Version Operator (CVO). Since this involves the security of the API Server's certificates (SAN list), the system prohibits dynamic changes to the management VIPs post-installation.
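The resource-level state can be read back without touching any node. The following is a minimal sketch, assuming the cluster reports platform type baremetal so that the VIPs appear under .status.platformStatus.baremetal; cluster_vips is a hypothetical helper name, not part of oc.

```shell
# Print the API and Ingress VIPs recorded in the Infrastructure status,
# one JSON list per line.
# Assumption: platform type is "baremetal"; other platform types keep
# these fields under a different key of platformStatus.
cluster_vips() {
  oc get infrastructure cluster \
    -o jsonpath='{.status.platformStatus.baremetal.apiServerInternalIPs}{"\n"}{.status.platformStatus.baremetal.ingressIPs}{"\n"}'
}
```

Calling cluster_vips on a logged-in session prints the values that the rest of the stack is rendered from, which makes it a convenient baseline before and after any experiment.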
2. Deep Dive: Source-Level Analysis of keepalived.yaml
Analyzing the definition of /etc/kubernetes/manifests/keepalived.yaml on a Master node reveals the hierarchical management logic that ensures manual changes are always reverted by reconciliation.
2.1 Static Pod Manifest Snippet
# The following definition determines how VIPs are announced on the node
initContainers:
- name: render-config-keepalived
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d27...
  command:
  - runtimecfg
  - render
  - "/etc/kubernetes/kubeconfig"
  - "--api-vips"
  - "192.168.99.21" # <--- Source of truth hardcoded at installation time
  - "--ingress-vips"
  - "192.168.99.22"
containers:
- name: keepalived
  command:
  - /bin/bash
  - -c
  - |
    # The startup logic includes cleanup for any stale VIPs
    remove_vip "192.168.99.21"
    remove_vip "192.168.99.22"
    /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp ...
- name: keepalived-monitor
  command:
  - /bin/bash
  - -c
  - |
    # Runtime dynamic maintainer
    api_vips=192.168.99.21
    ingress_vips=192.168.99.22
    dynkeepalived /var/lib/kubelet/kubeconfig /config/keepalived.conf.tmpl /etc/keepalived/keepalived.conf --api-vips "${api_vips}" --ingress-vips "${ingress_vips}"
Conclusion: The file's integrity is verified by the Cluster Network Operator (CNO). If the "Actual State" on disk deviates from the "Desired State" in the Operator's cache, the Operator immediately overwrites the file with the standard template.
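To confirm which VIPs a node's manifest currently carries, the flag values can be pulled out of the file directly. This is a rough sketch that assumes the layout shown in the snippet above, where each flag and its value are consecutive quoted list items; manifest_vip is a hypothetical helper, and a manifest rendered differently would need a different parser.

```shell
# Print the value that follows a given flag (for example --api-vips)
# in a static pod manifest.
# Assumption: the flag and its value are consecutive list items,
# as in the keepalived.yaml snippet above.
manifest_vip() {
  local flag="$1" manifest="$2"
  awk -v flag="$flag" '
    # On the line after the flag: strip the list marker, the quotes,
    # and any trailing comment, then print the bare value.
    found { gsub(/^[ \t]*-[ \t]*"?|".*$/, ""); print; exit }
    # Remember that the quoted flag was seen on this line.
    index($0, "\"" flag "\"") { found = 1 }
  ' "$manifest"
}
```

On a Master node, manifest_vip --api-vips /etc/kubernetes/manifests/keepalived.yaml prints the installation-time API VIP. Running it before and after a manual edit is a simple way to watch the revert happen.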
3. The "Standard Path": Extending API VIP with MetalLB
If you need to announce an additional API VIP (e.g., 192.168.99.17) post-installation, MetalLB is the recommended approach for bare-metal clusters.
3.1 MetalLB IP Pool Configuration (IPAddressPool)
First, define a dedicated IP pool for the management VIP.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: api-vip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.99.17/32 # <--- The new API VIP to be added
  autoAssign: false # Set to false to prevent accidental assignment to other services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: api-vip-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - api-vip-pool
3.2 Creating the API LoadBalancer Service
Since the API Server itself is a Static Pod, it is not directly managed by a Service selector. You must create a transparent Service and manually map the Endpoints to the Master nodes' physical IPs.
[!IMPORTANT] Common Pitfall: If the Service defines a name for the port (e.g., api-server), the Endpoints definition must include the exact same name. Otherwise, the association will fail, resulting in an empty backend list and a Connection refused error.
apiVersion: v1
kind: Service
metadata:
  name: api-external-lb
  namespace: openshift-config
spec:
  type: LoadBalancer
  ports:
  - port: 6443
    targetPort: 6443
    name: api-server # <--- Port name defined here
  loadBalancerIP: 192.168.99.17 # <--- VIP used for the service
---
# Manually define Endpoints pointing to the physical IPs of the three Master nodes
apiVersion: v1
kind: Endpoints
metadata:
  name: api-external-lb
  namespace: openshift-config
subsets:
- addresses:
  - ip: 192.168.99.23
  - ip: 192.168.99.24
  - ip: 192.168.99.25
  ports:
  - port: 6443
    name: api-server # <--- MUST match the Service port name!
4. Troubleshooting: Why "Connection refused"?
If you observe successful ARP advertisements but receive a Connection refused during curl tests:
# ARP check shows successful L2 broadcast
? (192.168.99.17) at 00:50:56:8e:2a:32 [ether] on enp1s0
4.1 Analysis: Empty Backends (Endpoints Mismatch)
In Kubernetes, if a Service uses named ports but the Endpoints fail to match those names, the Service becomes invalid for forwarding.
- Symptom: The OVN/iptables rules do not generate forwarding entries. When a request reaches the node, the kernel responds with an RST packet.
- Verification: Run oc describe svc api-external-lb -n openshift-config. If the Endpoints field is empty or <none>, this is the root cause.
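The named-port comparison can also be scripted rather than read off by eye. The sketch below assumes the Service and its Endpoints share one name and namespace, as in the manifests above; check_port_names is a hypothetical helper, not an oc subcommand.

```shell
# Compare the port names declared on a Service with those on the
# Endpoints object of the same name.
# Prints "match" or "mismatch"; returns non-zero on mismatch.
check_port_names() {
  local name="$1" ns="$2"
  local svc_ports ep_ports
  # jsonpath prints the names space-separated; normalise to a sorted,
  # comma-joined list so that ordering does not affect the comparison.
  svc_ports=$(oc get service "$name" -n "$ns" -o jsonpath='{.spec.ports[*].name}' \
    | tr ' ' '\n' | sort -u | paste -sd, -)
  ep_ports=$(oc get endpoints "$name" -n "$ns" -o jsonpath='{.subsets[*].ports[*].name}' \
    | tr ' ' '\n' | sort -u | paste -sd, -)
  if [ "$svc_ports" = "$ep_ports" ]; then
    echo "match: ${svc_ports}"
  else
    echo "mismatch: service=[${svc_ports}] endpoints=[${ep_ports}]"
    return 1
  fi
}
```

For the objects in this document the call would be check_port_names api-external-lb openshift-config. A healthy setup, like the oc describe output that follows, reports matching names and a populated Endpoints field.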
oc describe svc api-external-lb -n openshift-config
# Name: api-external-lb
# Namespace: openshift-config
# Labels: <none>
# Annotations: metallb.io/ip-allocated-from-pool: api-vip-pool
# Selector: <none>
# Type: LoadBalancer
# IP Family Policy: SingleStack
# IP Families: IPv4
# IP: 172.22.86.65
# IPs: 172.22.86.65
# Desired LoadBalancer IP: 192.168.99.17
# LoadBalancer Ingress: 192.168.99.17 (VIP)
# Port: api-server 6443/TCP
# TargetPort: 6443/TCP
# NodePort: api-server 30848/TCP
# Endpoints: 192.168.99.24:6443,192.168.99.23:6443,192.168.99.25:6443
# Session Affinity: None
# External Traffic Policy: Cluster
# Internal Traffic Policy: Cluster
# Events:
# Type Reason Age From Message
# ---- ------ ---- ---- -------
# Normal IPAllocated 7m16s metallb-controller Assigned IP ["192.168.99.17"]
# Normal nodeAssigned 95s (x2 over 7m15s) metallb-speaker announcing from node "master-02-demo" with protocol "layer2"
5. The "TLS Bypass" Logic: DNS Spoofing
Even if MetalLB successfully announces .17, Certificate Verification is the final hurdle.
5.1 Why URL-based access is mandatory
- Direct IP Access: oc login https://192.168.99.17:6443 -> FAILS. The TLS certificate SAN list does not contain .17.
- Domain Access via DNS:
  - Update /etc/hosts or the external DNS server to point api.<cluster>.<domain> to the new VIP 192.168.99.17.
  - Access oc login https://api.<cluster>.<domain>:6443.
  - Result: SUCCESS. The client validates the Hostname. As long as the domain matches the certificate, the client does not care which IP was used for the connection.
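The reasoning above can be checked from a workstation. The sketch below is a simplified illustration, not the full TLS name-matching algorithm: show_sans prints the SAN list actually served at an endpoint (it assumes OpenSSL 1.1.1 or later for the -ext option), and san_matches mimics the client-side comparison for exact names and single-label wildcards. Both function names, and the example SAN values in the note after the block, are hypothetical.

```shell
# Print the subjectAltName entries of the certificate served at
# ENDPOINT (host:port) when SERVER_NAME is sent as SNI.
show_sans() {
  local endpoint="$1" server_name="$2"
  openssl s_client -connect "$endpoint" -servername "$server_name" </dev/null 2>/dev/null \
    | openssl x509 -noout -ext subjectAltName
}

# Return 0 if HOST matches one of the SAN entries given as further
# arguments (exact match, or a "*.domain" wildcard).
# Simplification: DNS and IP entries are compared as plain strings.
san_matches() {
  local host="$1"; shift
  local san suffix rest
  for san in "$@"; do
    case "$san" in
      "*."*)
        suffix="${san#\*}"   # "*.apps.example.com" -> ".apps.example.com"
        rest="${host#*.}"    # host with its first label removed
        # A wildcard covers exactly one leading label.
        [ "$rest" != "$host" ] && [ ".$rest" = "$suffix" ] && return 0
        ;;
      *)
        [ "$host" = "$san" ] && return 0
        ;;
    esac
  done
  return 1
}
```

With an illustrative SAN list of api.demo.example.com and 192.168.99.21, san_matches api.demo.example.com ... succeeds while san_matches 192.168.99.17 ... fails. The name the client dials decides the outcome, not the address the connection lands on.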
5.2 Real-world Proof of Concept
Before the DNS change (Traffic goes to the original .21):
oc get node -v9 --kubeconfig ./kubeconfig
# I0302 10:45:44.383133 95401 loader.go:395] Config loaded from file: /home/sno/data/install/auth/kubeconfig
# I0302 10:45:44.405033 95401 round_trippers.go:466] curl -v -XGET -H "Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json" -H "User-Agent: oc/4.18.0 (linux/amd64) kubernetes/3a48fc2" 'https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500'
# I0302 10:45:44.406769 95401 round_trippers.go:495] HTTP Trace: DNS Lookup for api.demo-01-rhsys.wzhlab.top resolved to [{192.168.99.21 }]
# I0302 10:45:44.408264 95401 round_trippers.go:510] HTTP Trace: Dial to tcp:192.168.99.21:6443 succeed
# I0302 10:45:44.428580 95401 round_trippers.go:553] GET https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500 200 OK in 23 milliseconds
# I0302 10:45:44.428626 95401 round_trippers.go:570] HTTP Statistics: DNSLookup 1 ms Dial 1 ms TLSHandshake 12 ms ServerProcessing 7 ms Duration 23 ms
# I0302 10:45:44.428644 95401 round_trippers.go:577] Response Headers:
# I0302 10:45:44.428662 95401 round_trippers.go:580] Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# I0302 10:45:44.428675 95401 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: e2aadba0-d157-4853-891d-97509e17dec5
# I0302 10:45:44.428687 95401 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: c31bf2b8-4458-45e7-9166-b26415b6c301
# I0302 10:45:44.428701 95401 round_trippers.go:580] Date: Mon, 02 Mar 2026 10:45:44 GMT
# I0302 10:45:44.428713 95401 round_trippers.go:580] Audit-Id: d729dd28-2675-4c19-9dd0-2dce054fc3ce
# I0302 10:45:44.428723 95401 round_trippers.go:580] Cache-Control: no-cache, private
# I0302 10:45:44.428733 95401 round_trippers.go:580] Content-Type: application/json
# .........
Applying the Local DNS Hack:
# Force the local machine to use the new MetalLB VIP for the API server
sudo tee -a /etc/hosts << EOF
192.168.99.17 api.demo-01-rhsys.wzhlab.top
EOF
Verifying that the domain now resolves to the new VIP (.17) and succeeds via TLS:
# Debug verification pointing to the local kubeconfig
oc get node -v9 --kubeconfig ./kubeconfig
# I0302 19:02:32.646329 2344 loader.go:402] Config loaded from file: ./kubeconfig
# I0302 19:02:32.647127 2344 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0302 19:02:32.647169 2344 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0302 19:02:32.647187 2344 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0302 19:02:32.647202 2344 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0302 19:02:32.663432 2344 helper.go:113] "Request Body" body=""
# I0302 19:02:32.663610 2344 round_trippers.go:473] curl -v -XGET -H "User-Agent: oc/4.19.0 (linux/amd64) kubernetes/24755b6" -H "Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json" 'https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500'
# I0302 19:02:32.664475 2344 round_trippers.go:502] HTTP Trace: DNS Lookup for api.demo-01-rhsys.wzhlab.top resolved to [{192.168.99.17 }]
# I0302 19:02:32.668290 2344 round_trippers.go:517] HTTP Trace: Dial to tcp:192.168.99.17:6443 succeed
# I0302 19:02:32.692076 2344 round_trippers.go:560] GET https://api.demo-01-rhsys.wzhlab.top:6443/api/v1/nodes?limit=500 200 OK in 28 milliseconds
# I0302 19:02:32.692155 2344 round_trippers.go:577] HTTP Statistics: DNSLookup 0 ms Dial 3 ms TLSHandshake 16 ms ServerProcessing 6 ms Duration 28 ms
# I0302 19:02:32.692188 2344 round_trippers.go:584] Response Headers:
# I0302 19:02:32.692219 2344 round_trippers.go:587] Audit-Id: 5b6a5bca-da4b-46cf-b206-5a04c60392a6
# I0302 19:02:32.692243 2344 round_trippers.go:587] Cache-Control: no-cache, private
# I0302 19:02:32.692281 2344 round_trippers.go:587] Content-Type: application/json
# I0302 19:02:32.692314 2344 round_trippers.go:587] Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# I0302 19:02:32.692356 2344 round_trippers.go:587] X-Kubernetes-Pf-Flowschema-Uid: e2aadba0-d157-4853-891d-97509e17dec5
# I0302 19:02:32.692394 2344 round_trippers.go:587] X-Kubernetes-Pf-Prioritylevel-Uid: c31bf2b8-4458-45e7-9166-b26415b6c301
# I0302 19:02:32.692430 2344 round_trippers.go:587] Date: Mon, 02 Mar 2026 11:02:33 GMT
# ......
6. Summary
| Solution | Operation | Result |
|---|---|---|
| MetalLB | Create IPAddressPool + LoadBalancer Service | Successfully announces the network IP |
| DNS Integration | Point API Domain to MetalLB VIP | Successfully passes TLS verification |
| System Modification | Edit keepalived.yaml or Infrastructure | FAILED (Reverted or Rejected) |
The Final Verdict: For extending API management VIPs, "sidestepping" the Operator is smarter than "fighting" it. By leveraging MetalLB for the L2 announcement and DNS for identity validation, you can support multiple API entry points without violating the declarative integrity of the OpenShift cluster.