Deep Dive: OVN Egress Service with MetalLB-Based BGP in OpenShift 4.19
Overview
In OpenShift 4.19, the Egress Service provides a powerful networking abstraction that combines the dynamic allocation of LoadBalancer IPs with the high-performance traffic interception of OVN-Kubernetes. By integrating MetalLB with the cluster’s shared FRR-K8s backbone, administrators can advertise Egress IPs to upstream BGP peers using standard Kubernetes CRDs. This document explores the architectural implementation, environment setup, and a deep-dive verification of the “Tromboning” traffic flow and SNAT mechanisms.
1. Laboratory Environment and Validation Topology
To demonstrate the feasibility of this solution, a verification environment was constructed using native Linux virtual machines (KVM/libvirt) on a CentOS 9 bare-metal host. The host bridge br-ocp uses the subnet 192.168.99.0/24 (Host IP: 192.168.99.1).
1.1 Environment Specifications
- OpenShift 4.19 Cluster:
  - 3-node compact cluster (combined Master + Worker roles).
  - Network interfaces attached to br-ocp.
  - CNI: OVN-Kubernetes (default).
  - Additional add-on: MetalLB Operator.
  - Node IPs: 192.168.99.23–192.168.99.25.
  - Egress IP pool: 192.168.66.100 (a single IP from the 192.168.66.0/24 range).
- KVM Router Node (CentOS 9):
  - Simulates a datacenter core switch running FRRouting (FRR).
  - eth0 (management/internal): 192.168.99.12.
  - eth1 (external simulator): 192.168.55.12.
- KVM Server Node (CentOS 9):
  - Acts as an external application server to verify SNAT.
  - eth0: 192.168.55.13, with its gateway pointing to 192.168.55.12.
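A simplified sketch of the topology (addresses as listed above; the "trombone" path exits the cluster via the Egress node, here master-01-demo at 192.168.99.23, per the BGP route shown later):

```text
            OpenShift 4.19 compact cluster (br-ocp, 192.168.99.0/24)
  +-----------------+   +-----------------+   +-----------------+
  | master-01-demo  |   | master-02-demo  |   | master-03-demo  |
  | 192.168.99.23   |   | 192.168.99.24   |   | 192.168.99.25   |
  +--------+--------+   +--------+--------+   +--------+--------+
           |                     |                     |
           +------------ iBGP (AS 64512) --------------+
                                 |
                     +-----------+------------+
                     | Router (FRR, CentOS 9) |
                     | eth0: 192.168.99.12    |
                     | eth1: 192.168.55.12    |
                     +-----------+------------+
                                 |
                     +-----------+------------+
                     | Server (CentOS 9)      |
                     | eth0: 192.168.55.13    |
                     +------------------------+
```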
2. Infrastructure Configuration
2.1 Router (FRR) Node Setup
The Router node acts as the BGP neighbor for the cluster. It must enable kernel forwarding and establish iBGP sessions with all OCP nodes.
# 1. Install base routing components
sudo dnf install -y frr
# Explicitly enable bgpd in the daemons file before starting
sudo sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons
sudo systemctl enable --now frr
# 2. Enable IPv4 kernel forwarding
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.d/99-ipforward.conf
sudo sysctl -p /etc/sysctl.d/99-ipforward.conf
# 3. Configure the core FRR configuration file
cat <<EOF | sudo tee /etc/frr/frr.conf
frr defaults traditional
log syslog informational
no ipv6 forwarding
!
router bgp 64512
bgp router-id 192.168.99.12
! Configure iBGP neighbors for each OCP cluster node
neighbor 192.168.99.23 remote-as 64512
neighbor 192.168.99.24 remote-as 64512
neighbor 192.168.99.25 remote-as 64512
! Configure iBGP ECMP and network advertisement within the address-family
address-family ipv4 unicast
network 192.168.55.0/24
maximum-paths ibgp 4
exit-address-family
!
line vty
!
EOF
# 4. Reload FRR to apply changes
sudo systemctl restart frr
2.2 Server Node Configuration (Traffic Destination)
The Server node requires a route back to the Egress IP range via the Router. We use Python to spin up listeners for connectivity verification.
# Configure route to the Egress IP pool via the Router
sudo ip route add 192.168.66.100 via 192.168.55.12
# Verify routing table
ip r
# default via 192.168.99.1 dev enp1s0 proto static metric 100
# 192.168.55.0/24 dev enp1s0 proto kernel scope link src 192.168.55.13 metric 100
# 192.168.66.100 via 192.168.55.12 dev enp1s0
# 192.168.99.0/24 dev enp1s0 proto kernel scope link src 192.168.99.13 metric 100
# Start background listeners to verify connectivity
# (Listening on ports 80 and 8080 to prove Egress IP transparency)
nohup python3 -m http.server 80 &
nohup python3 -m http.server 8080 &
# Monitor incoming traffic to observe the source (Egress IP 192.168.66.100)
sudo tcpdump -i any 'tcp port 80 or tcp port 8080' -n
3. OpenShift Cluster Preparation
3.1 Network Operator and Node Configuration
In OpenShift 4.19, the BGP capability is provided by the FRR-K8s daemon. We must enable this provider in the Network.operator API.
# Remove the auxiliary IP addresses from the host interfaces to support routing/testing
oc debug node/master-01-demo -- chroot /host /bin/bash -c "nmcli connection modify enp1s0 -ipv4.addresses 192.168.66.23/24"
oc debug node/master-02-demo -- chroot /host /bin/bash -c "nmcli connection modify enp1s0 -ipv4.addresses 192.168.66.24/24"
oc debug node/master-03-demo -- chroot /host /bin/bash -c "nmcli connection modify enp1s0 -ipv4.addresses 192.168.66.25/24"
# Enable FRR provider and Route Advertisements in OVN-Kubernetes
oc patch Network.operator.openshift.io cluster --type=merge -p \
'{
"spec": {
"additionalRoutingCapabilities": {
"providers": ["FRR"]
},
"defaultNetwork": {
"ovnKubernetesConfig": {
"routeAdvertisements": "Enabled"
}
}
}
}'
4. BGP and Egress Service Implementation
4.1 Establishing BGP Connectivity (MetalLB CRDs)
Instead of using manual FRRConfiguration objects, we leverage the MetalLB Operator’s BGP CRDs. First, create the MetalLB instance itself:
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
name: metallb
namespace: metallb-system
Then check the pods that support BGP:
# Check the status of the FRR pods
oc get pods -n openshift-frr-k8s
# NAME READY STATUS RESTARTS AGE
# frr-k8s-952fv 7/7 Running 0 63m
# frr-k8s-webhook-server-79fdc47779-w8lv9 1/1 Running 0 63m
# frr-k8s-z4xkk 7/7 Running 0 63m
# frr-k8s-zkwzh 7/7 Running 0 63m
# Check the status of the MetalLB pods
oc get pods -n metallb-system -l app=metallb
# NAME READY STATUS RESTARTS AGE
# controller-6df44df4f5-wgpxs 2/2 Running 0 63m
# metallb-operator-webhook-server-7866f494cc-68qs9 1/1 Running 0 63m
# speaker-7cxcm 2/2 Running 0 63m
# speaker-8sxp7 2/2 Running 0 62m
# speaker-fdhmj 2/2 Running 0 63m
4.2 Establishing the BGP Peer
We define the BGPPeer to establish the session with the external router.
# MetalLB Peer Configuration
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
name: peer-sample1
namespace: metallb-system
spec:
peerAddress: 192.168.99.12
peerASN: 64512
myASN: 64512
4.3 Egress IP Pool and Advertisement
We establish the IP address pool and the specific BGPAdvertisement rule, targeting the nodes managed by the EgressService.
# Define the Egress IP range for the cluster
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: egress-pool
namespace: metallb-system
spec:
addresses:
- 192.168.66.100-192.168.66.100 # Static Egress IP
---
# Advertisement logic
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
name: egress-identity-bgp-adv
namespace: metallb-system
spec:
ipAddressPools:
- egress-pool
# Only nodes labeled by OVN-K as Egress Gateways will advertise this IP
nodeSelectors:
- matchLabels:
egress-service.k8s.ovn.org/demo-egress-egress-identity: ""
4.4 Deploying the Egress Service
The EgressService CRD instructs OVN-K to use the LoadBalancer IP for SNAT.
oc new-project demo-egress
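The Service in the bundle below selects Pods labeled app: requester. The source does not show that workload, so the following minimal Deployment (its name, image, and command are illustrative assumptions) is one way to back the selector:

```yaml
# Hypothetical workload backing the "app: requester" selector (not from the source)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: requester
  namespace: demo-egress
spec:
  replicas: 2
  selector:
    matchLabels:
      app: requester
  template:
    metadata:
      labels:
        app: requester
    spec:
      containers:
      - name: requester
        image: registry.access.redhat.com/ubi9/ubi-minimal  # illustrative image
        command: ["sleep", "infinity"]
```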
# Apply the Service and EgressService bundle
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
name: egress-identity
namespace: demo-egress
annotations:
metallb.universe.tf/address-pool: egress-pool
spec:
type: LoadBalancer
selector:
app: requester
ports:
- name: http
protocol: TCP
port: 8080
targetPort: 8080
---
apiVersion: k8s.ovn.org/v1
kind: EgressService
metadata:
name: egress-identity
namespace: demo-egress
spec:
sourceIPBy: "LoadBalancerIP"
nodeSelector:
matchLabels:
node-role.kubernetes.io/master: ""
EOF
5. Verification and Result Analysis
5.1 Dynamic FRRConfiguration Sync
When MetalLB CRDs are applied in the metallb-system namespace (with the correct namespace labels or cluster configuration), the MetalLB Operator automatically synthesizes FRRConfiguration objects in the openshift-frr-k8s namespace.
oc get FRRConfiguration -A
# NAMESPACE NAME AGE
# openshift-frr-k8s metallb-master-01-demo 15m
# openshift-frr-k8s metallb-master-02-demo 15m
# openshift-frr-k8s metallb-master-03-demo 15m
# Inspecting the dynamically generated configurations
oc get FRRConfiguration -n openshift-frr-k8s -o yaml
# apiVersion: v1
# items:
# - apiVersion: frrk8s.metallb.io/v1beta1
# kind: FRRConfiguration
# metadata:
# creationTimestamp: "2026-03-05T02:22:03Z"
# generation: 2
# name: metallb-master-01-demo
# namespace: openshift-frr-k8s
# resourceVersion: "366323"
# uid: 1b71c010-5e53-4973-8c4e-49896cd8181a
# spec:
# bgp:
# routers:
# - asn: 64512
# neighbors:
# - address: 192.168.99.12
# asn: 64512
# disableMP: false
# dualStackAddressFamily: false
# passwordSecret: {}
# port: 179
# toAdvertise:
# allowed:
# mode: filtered
# prefixes:
# - 192.168.66.100/32
# toReceive:
# allowed:
# mode: filtered
# prefixes:
# - 192.168.66.100/32
# nodeSelector:
# matchLabels:
# kubernetes.io/hostname: master-01-demo
# raw: {}
# - apiVersion: frrk8s.metallb.io/v1beta1
# kind: FRRConfiguration
# metadata:
# creationTimestamp: "2026-03-05T02:22:03Z"
# generation: 1
# name: metallb-master-02-demo
# namespace: openshift-frr-k8s
# resourceVersion: "365356"
# uid: 99415279-10a5-44fe-aef8-78a342ef589c
# spec:
# bgp:
# routers:
# - asn: 64512
# neighbors:
# - address: 192.168.99.12
# asn: 64512
# disableMP: false
# dualStackAddressFamily: false
# passwordSecret: {}
# port: 179
# toAdvertise:
# allowed:
# mode: filtered
# toReceive:
# allowed:
# mode: filtered
# nodeSelector:
# matchLabels:
# kubernetes.io/hostname: master-02-demo
# raw: {}
# - apiVersion: frrk8s.metallb.io/v1beta1
# kind: FRRConfiguration
# metadata:
# creationTimestamp: "2026-03-05T02:22:03Z"
# generation: 1
# name: metallb-master-03-demo
# namespace: openshift-frr-k8s
# resourceVersion: "365355"
# uid: 38743b7a-903c-41c7-8027-7e905d0f49d4
# spec:
# bgp:
# routers:
# - asn: 64512
# neighbors:
# - address: 192.168.99.12
# asn: 64512
# disableMP: false
# dualStackAddressFamily: false
# passwordSecret: {}
# port: 179
# toAdvertise:
# allowed:
# mode: filtered
# toReceive:
# allowed:
# mode: filtered
# nodeSelector:
# matchLabels:
# kubernetes.io/hostname: master-03-demo
# raw: {}
# kind: List
# metadata:
# resourceVersion: ""
5.2 Connectivity and BGP Route Check
Verify that the external router has learned the /32 route from the correct node.
# Verify route on the external FRR router
vtysh -c 'show ip route bgp'
# B>* 192.168.66.100/32 [200/0] via 192.168.99.23, enp1s0, weight 1, 00:18:37
# Execute curl from Pods on different nodes
oc exec -it $NODE3_POD -n demo-egress -- curl -I http://192.168.55.13 --connect-timeout 2
6. Under the Hood: Deep Dive into Interception
The EgressService in OCP 4.19 relies on a sophisticated interplay between OVS registers and native Linux nftables.
6.1 OVN Address Sets
OVN-K creates an AddressSet containing the IPs of all Pods in the namespace to track who should be SNATed.
6.2 OVS OpenFlow Redirection (Stage 1)
On the originating node (master-03), OVS modifies registers to redirect the packet into the Geneve tunnel towards the Egress node (master-01).
# Check OVS flows for the source Pod IP
oc exec -n openshift-ovn-kubernetes $OVN_NODE_03_POD -c ovn-controller -- ovs-ofctl dump-flows br-int | grep "10.133.0.3"
# table=25, priority=101, ip, nw_src=10.133.0.3 actions=load:0x64580004->NXM_NX_XXREG0[96..127]...
The value 0x64580004 translates to 100.88.0.4, the Transit Switch IP of master-01-demo.
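The register value can be decoded with plain shell arithmetic, a quick sanity check that needs no cluster access:

```shell
# Decode the OVS register value 0x64580004 into dotted-quad form.
val=$(( 0x64580004 ))
printf '%d.%d.%d.%d\n' \
  $(( (val >> 24) & 255 )) \
  $(( (val >> 16) & 255 )) \
  $(( (val >> 8) & 255 )) \
  $(( val & 255 ))
# prints 100.88.0.4
```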
6.3 NFTables Dynamic Maps (Stage 2)
On the egress node (master-01), the final SNAT transformation occurs using nftables maps for maximum throughput.
# Extract the dynamic SNAT map from the host
oc debug node/master-01-demo -- chroot /host /bin/bash -c "nft list ruleset | grep 'egress-service-snat-v4' -A 5 -B 5"
# map egress-service-snat-v4 {
# type ipv4_addr : ipv4_addr
# elements = { 10.132.0.39 comment "demo-egress/egress-identity" : 192.168.66.100, 10.133.0.4 comment "demo-egress/egress-identity" : 192.168.66.100 }
# }
oc debug node/master-02-demo -- chroot /host /bin/bash -c "nft list ruleset | grep 'egress-service-snat-v4' -A 5 -B 5"
# map egress-service-snat-v4 {
# type ipv4_addr : ipv4_addr
# }
oc debug node/master-03-demo -- chroot /host /bin/bash -c "nft list ruleset | grep 'egress-service-snat-v4' -A 5 -B 5"
# map egress-service-snat-v4 {
# type ipv4_addr : ipv4_addr
# }
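The map on the egress node behaves like a simple key-value rewrite: a packet whose source address matches a key is SNATed to the mapped Egress IP, and anything else passes through unchanged. A toy bash model of that semantics (an illustration only, not the real datapath):

```shell
#!/usr/bin/env bash
# Toy model of the egress-service-snat-v4 map lookup (illustration only).
declare -A SNAT_V4=(
  [10.132.0.39]="192.168.66.100"  # demo-egress/egress-identity
  [10.133.0.4]="192.168.66.100"   # demo-egress/egress-identity
)
snat() { echo "${SNAT_V4[$1]:-$1}"; }   # map miss => source IP unchanged
snat 10.133.0.4    # prints 192.168.66.100
snat 10.133.0.99   # prints 10.133.0.99
```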
# Extract the DNAT rules
oc debug node/master-01-demo -- chroot /host /bin/bash -c "iptables -L -v -n -t nat | grep 'Chain OVN-KUBE-EXTERNALIP' -A 4 -B 0"
# Chain OVN-KUBE-EXTERNALIP (2 references)
# pkts bytes target prot opt in out source destination
# 0 0 DNAT tcp -- * * 0.0.0.0/0 192.168.66.100 tcp dpt:8080 to:172.22.88.148:8080
oc debug node/master-02-demo -- chroot /host /bin/bash -c "iptables -L -v -n -t nat | grep 'Chain OVN-KUBE-EXTERNALIP' -A 4 -B 0"
# Chain OVN-KUBE-EXTERNALIP (2 references)
# pkts bytes target prot opt in out source destination
# 0 0 DNAT tcp -- * * 0.0.0.0/0 192.168.66.100 tcp dpt:8080 to:172.22.88.148:8080
oc debug node/master-03-demo -- chroot /host /bin/bash -c "iptables -L -v -n -t nat | grep 'Chain OVN-KUBE-EXTERNALIP' -A 4 -B 0"
# Chain OVN-KUBE-EXTERNALIP (2 references)
# pkts bytes target prot opt in out source destination
# 0 0 DNAT tcp -- * * 0.0.0.0/0 192.168.66.100 tcp dpt:8080 to:172.22.88.148:8080
Conclusion
The evolution from manual FRRConfiguration objects to integrated MetalLB CRDs in OpenShift 4.19 represents a significant improvement in usability and automation. By sharing the cluster’s FRR-K8s instance (enabled via namespace labeling), administrators can use standard Kubernetes networking abstractions to manage complex BGP routing policies, achieving high-performance Egress traffic management with minimal manual overhead.