← Back to Index

openshift 4.20 Network Observability with ovn egress firewall

OpenShift 4.20 introduces eBPF as the agent for Network Observability (NetObserv). Unlike the previous IPFIX-based approach, eBPF runs as a DaemonSet directly on each node and captures packets at the kernel level — enabling RTT (Round-Trip Time) measurements with nanosecond precision, which was not possible before.

This document walks through the full setup: installing LokiStack as the log backend, deploying the Network Observability Operator, setting up an Egress IP scenario to observe external traffic paths, and finally implementing an automated EgressFirewall to block Google traffic based on dynamically updated IP ranges.

  • https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/network_observability/installing-network-observability-operators

The following diagram shows the overall architecture of this lab:

The lab uses a 3-master SNO-style cluster. Network flows are captured by eBPF agents on each node, enriched with Kubernetes metadata by flowlogs-pipeline, then stored in LokiStack (backed by S3 object storage). The OCP web console plugin (netobserv-plugin) queries Loki to visualize flows in real time. Separately, aggregated metrics are exported to the built-in OCP Prometheus for dashboard display.

try with loki

install loki

Network Observability stores raw flow logs (each individual TCP/UDP flow as a JSON record) in LokiStack. Loki is a log aggregation system optimized for large volumes of structured data — it compresses and stores logs in object storage (S3), making it far cheaper than storing in Prometheus time-series format. Without Loki, you can still get aggregated metrics via Prometheus, but you lose the ability to browse individual flow records and filter by pod, namespace, port, or protocol in the traffic flows table.

RTT measurement data is stored as a field (TimeFlowRttNs) in each flow log record in Loki. This is why Loki is a prerequisite for RTT visibility.

create S3 bucket

Loki uses object storage (S3 or S3-compatible) as its primary data store. All flow log data — compressed chunks and index files — are written to S3. Without a working S3 backend, LokiStack pods will fail to start. In this lab we use rustfs (a lightweight, self-hosted S3-compatible server running on the helper node at 192.168.99.1:9000) as a stand-in for a cloud S3 bucket.

The screenshot below shows the rustfs web UI after creating the demo bucket that Loki will use for storage. The bucket must exist before deploying LokiStack.

install loki operator

With the S3 bucket ready, we install the Loki Operator from OperatorHub and then create a LokiStack custom resource. The key configuration decisions are:

  • size: 1x.demo — the smallest deployment profile, suitable for lab/testing (production would use 1x.small or 1x.medium)
  • replication.factor: 1 — single replica to save resources; production should use 3
  • storage.secret — references the Kubernetes Secret containing S3 credentials
  • storageClassName: nfs-csi — local PVC storage for Loki component pods (ingester WAL, index cache); this is separate from S3 and holds hot/in-flight data
  • tenants.mode: openshift-network — the special mode required by Network Observability (as opposed to openshift-logging mode for log aggregation)

The screenshot below shows the Loki Operator successfully installed from OperatorHub, ready for LokiStack CR creation.


oc new-project netobserv

# netobserv is a resource-hungry application, it has high requirements for the underlying loki, we configure the maximum gear, and the number of replicas, etc., to adapt to our test environment.

cat << EOF > ${BASE_DIR}/data/install/loki-netobserv.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: loki-s3 
stringData:
  access_key_id: rustfsadmin
  access_key_secret: rustfsadmin
  bucketnames: demo
  endpoint: http://192.168.99.1:9000
  # region: eu-central-1

---
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
spec:
  size: 1x.demo # 1x.medium , 1x.demo
  replication:
    factor: 1
  storage:
    schemas:
    - version: v13
      effectiveDate: '2022-06-01'
    secret:
      name: loki-s3
      type: s3
  storageClassName: nfs-csi
  tenants:
    mode: openshift-network
    openshift:
        adminGroups: 
        - cluster-admin
  template:
    gateway:
      replicas: 1
    ingester:
      replicas: 1
    indexGateway:
      replicas: 1

EOF

oc create --save-config -n netobserv -f ${BASE_DIR}/data/install/loki-netobserv.yaml

# to delete

# oc delete -n netobserv -f ${BASE_DIR}/data/install/loki-netobserv.yaml

# oc get pvc -n netobserv | grep loki- | awk '{print $1}' | xargs oc delete -n netobserv pvc

# run below, if reinstall
oc adm groups new cluster-admin

oc adm groups add-users cluster-admin admin

oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

install net observ

The Network Observability Operator is the core component that orchestrates the entire pipeline. Once installed, it manages three sub-components through a single FlowCollector custom resource:

  • eBPF Agent (DaemonSet on every node) — captures raw packet flows at the kernel level
  • flowlogs-pipeline — receives flow data from eBPF agents, enriches it with Kubernetes metadata (pod names, namespaces, labels), and forwards to Loki
  • netobserv-plugin — a dynamic OCP console plugin that queries Loki and renders the Network Traffic UI

Installation is straightforward via OperatorHub. However, there is a known issue with the eBPF agent: after initial deployment, some agents may not fully activate. Restarting the cluster nodes after installation resolves this — do not skip this step if the eBPF agents appear stuck.

The screenshots below walk through the installation steps in the OCP web console:

Step 1 — Search for “Network Observability” in OperatorHub:

Step 2 — Select the operator and click Install:

Step 3 — Create the FlowCollector CR. This is the main configuration object. Key fields include the Loki URL (pointing to our LokiStack), the agent type (eBPF), and the sampling rate:

Step 4 — Configure the Loki connection in the FlowCollector. The lokiStack section references the LokiStack resource we created earlier in the netobserv namespace:

Step 5 — eBPF agent settings: sampling rate, interfaces to monitor, and privilege settings. The eBPF agent needs elevated privileges to access the kernel network stack:

Step 6 — After applying the FlowCollector CR, all operator pods come up in the netobserv namespace. The eBPF agent pods run on every node. Once ready, the “Network Traffic” menu item appears in the OCP console:

try it out

deploy egress IP

The goal of this test is to observe how the Egress IP feature interacts with Network Observability. An Egress IP assigns a stable, predictable source IP address to all outbound traffic from a given namespace. This is important in scenarios where:

  • A backend service outside the cluster filters connections by source IP (allowlisting)
  • You need consistent source identity for audit/compliance logging
  • You want to observe the exact network path (node → egress IP → internet) using NetObserv RTT data

In this lab, we assign 192.168.99.103 as the egress IP for the llm-demo namespace. All pods in that namespace will appear to originate from 192.168.99.103 when reaching external destinations — regardless of which node the pod is actually running on.

Without Egress IP, outbound traffic uses the node’s primary IP as the source address, which changes if the pod is rescheduled to a different node. With Egress IP, the source is always stable.


# label a node to host egress ip
oc label node --all k8s.ovn.org/egress-assignable="" --overwrite

# label a namespace with env
oc new-project llm-demo
oc label ns llm-demo env=egress-demo


# create a egress ip
cat << EOF > ${BASE_DIR}/data/install/egressip.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressips-prod
spec:
  egressIPs:
  - 192.168.99.103
  namespaceSelector:
    matchLabels:
      env: egress-demo
EOF

oc apply -f ${BASE_DIR}/data/install/egressip.yaml

# oc delete -f ${BASE_DIR}/data/install/egressip.yaml

oc get egressip -o json | jq -r '.items[] | [.status.items[].egressIP, .status.items[].node] | @tsv'
# 192.168.99.103  master-01-demo

make traffic and see result

With the Egress IP in place, we deploy a test pod in the llm-demo namespace on master-02-demo — a different node than where the egress IP is assigned. This is intentional: OVN-Kubernetes will route outbound traffic from master-02-demo through the egress node (master-01-demo) so it exits via the 192.168.99.103 IP. This cross-node egress path creates interesting RTT values because the traffic traverses an extra network hop inside the cluster before leaving.

The pod continuously curls https://www.google.com to generate external traffic. The eBPF agent on each node captures these flows, and flowlogs-pipeline enriches them with Kubernetes metadata before writing to Loki.


# go back to helper
# create a dummy pod
cat << EOF > ${BASE_DIR}/data/install/demo1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: wzh-demo-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: 'master-02-demo'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        quay.io/wangzheng422/qimgs:centos9-test-2025.12.18.v01
      env:
        - name: key
          value: value
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "tail -f /dev/null" ]
      # imagePullPolicy: Always
EOF

oc apply -n llm-demo -f ${BASE_DIR}/data/install/demo1.yaml

# oc delete -n llm-demo -f ${BASE_DIR}/data/install/demo1.yaml

oc exec -n llm-demo wzh-demo-pod -it -- bash
# in the container terminal
while true; do curl https://www.google.com && sleep 1; done;

# while true; do curl http://192.168.77.8:13000/cache.db > /dev/null; done;

After the pod starts generating traffic, we can observe it in the OCP web console under Observe → Network Traffic or Pod → Network Traffic. The screenshots below walk through what you see in the UI:

You can see flows from wzh-demo-pod in llm-demo reaching external Google IP addresses (e.g., 142.251.x.x):

Each flow record stored in Loki contains a full JSON document. The following example shows a captured flow from wzh-demo-pod receiving a response from a Google server (142.251.152.119:443). Key fields to note:

  • SrcAddr: 142.251.152.119 — the Google server IP (source of this ingress packet)
  • DstAddr: 10.133.0.21 — the pod’s internal IP on master-02-demo
  • DstK8S_Name: wzh-demo-pod / DstK8S_Namespace: llm-demo — Kubernetes metadata added by flowlogs-pipeline
  • TimeFlowRttNs: 8421000 — RTT of 8.421 milliseconds to Google, measured by eBPF at the TCP layer
  • Interfaces: ["genev_sys_6081", "eth0"] — the GENEVE tunnel interface (OVN overlay) and the pod’s eth0, showing the packet path through the OVN network stack
  • Sampling: 50 — only 1 in 50 packets is reported (to reduce Loki write volume); actual traffic is much heavier
{
  "AgentIP": "192.168.99.24",
  "Bytes": 6938,
  "Dscp": 0,
  "DstAddr": "10.133.0.21",
  "DstK8S_HostIP": "192.168.99.24",
  "DstK8S_HostName": "master-02-demo",
  "DstK8S_Name": "wzh-demo-pod",
  "DstK8S_Namespace": "llm-demo",
  "DstK8S_NetworkName": "primary",
  "DstK8S_OwnerName": "wzh-demo-pod",
  "DstK8S_OwnerType": "Pod",
  "DstK8S_Type": "Pod",
  "DstMac": "0a:58:64:58:00:02",
  "DstPort": 52960,
  "DstSubnetLabel": "Pods",
  "Etype": 2048,
  "Flags": [
    "ACK"
  ],
  "FlowDirection": "0",
  "IfDirections": [
    0,
    0
  ],
  "Interfaces": [
    "genev_sys_6081",
    "eth0"
  ],
  "K8S_FlowLayer": "app",
  "Packets": 3,
  "Proto": 6,
  "Sampling": 50,
  "SrcAddr": "142.251.152.119",
  "SrcMac": "0a:58:64:58:00:04",
  "SrcPort": 443,
  "TimeFlowEndMs": 1776231454328,
  "TimeFlowRttNs": 8421000,
  "TimeFlowStartMs": 1776231454316,
  "TimeReceived": 1776231455,
  "Udns": [
    ""
  ],
  "app": "netobserv-flowcollector"
}

The remaining screenshots show additional views and dashboards available in the NetObserv UI:

block google with egress firewall

background

OVN EgressFirewall supports blocking traffic by CIDR range. However, it does not support domain name blocking in a reliable way for Google, because:

  • Google serves its services from hundreds of constantly-changing IP addresses
  • DNS-based rules (dnsName) in EgressFirewall use cached resolutions that cannot keep up with Google’s IP rotation

The solution is to use Google’s own published IP range lists to compute the exact CIDRs, then automatically update the EgressFirewall daily.

ip range strategy

Google publishes two IP range lists:

  • https://www.gstatic.com/ipranges/goog.jsonAll Google-owned IPs (including GCP customer IPs)
  • https://www.gstatic.com/ipranges/cloud.json — GCP customer IPs (VMs, Cloud Functions, etc.)

The formula: goog.json minus cloud.json = Google’s own service IPs (Search, Gmail, YouTube, Maps, etc.)

This avoids over-blocking legitimate GCP-hosted services while targeting Google’s consumer/search services.

architecture

flowchart TD
    subgraph nsUpdater[egress-fw-updater - no EgressFirewall]
        CJ[CronJob<br/>runs daily at 02h00]
        POD[Pod python3 + curl<br/>compute goog minus cloud = 91 CIDRs<br/>generate EgressFirewall YAML]
        CJ --> POD
    end

    GSTATIC[www.gstatic.com<br/>goog.json and cloud.json]

    subgraph nsDemo[llm-demo - has EgressFirewall]
        EFW[EgressFirewall default<br/>Allow 192.168.99.0/24 node network<br/>Allow 172.22.0.0/16 service network<br/>Allow 10.132.0.0/14 pod network<br/>Deny 91 Google CIDRs<br/>Allow 0.0.0.0/0 all other traffic]
    end

    POD -- "fetch IP lists" --> GSTATIC
    POD -- "PATCH via K8s API ClusterRole" --> EFW

    style CJ fill:#C8E6C9,stroke:#388E3C
    style POD fill:#C8E6C9,stroke:#388E3C
    style GSTATIC fill:#BBDEFB,stroke:#1976D2
    style EFW fill:#FFE0B2,stroke:#F57C00

Key Design Point: The CronJob must run in a separate namespace with no EgressFirewall.

If the CronJob is in the same namespace as the EgressFirewall it manages, it will be blocked from reaching www.gstatic.com (a Google IP) and fail to download the IP lists.

deploy the automation


# apply all resources at once:
# - Namespace: egress-fw-updater (no EgressFirewall)
# - ServiceAccount + ClusterRole + ClusterRoleBinding
# - ConfigMap (Python script)
# - CronJob (runs daily at 02:00)

cat << 'EOF' > ${BASE_DIR}/data/install/egress-firewall-google-updater.yaml
---
# Dedicated namespace for the updater - NO EgressFirewall here
apiVersion: v1
kind: Namespace
metadata:
  name: egress-fw-updater

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: egress-firewall-updater
  namespace: egress-fw-updater

---
# ClusterRole: can manage EgressFirewall in any namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: egress-firewall-updater
rules:
- apiGroups: ["k8s.ovn.org"]
  resources: ["egressfirewalls"]
  verbs: ["get", "create", "update", "patch", "delete"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: egress-firewall-updater
subjects:
- kind: ServiceAccount
  name: egress-firewall-updater
  namespace: egress-fw-updater
roleRef:
  kind: ClusterRole
  name: egress-firewall-updater
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: egress-firewall-updater-script
  namespace: egress-fw-updater
data:
  update.py: |
    import json, urllib.request, ipaddress, sys, os

    def fetch_json(url):
        with urllib.request.urlopen(url, timeout=30) as r:
            return json.loads(r.read())

    # TARGET_NAMESPACE: the namespace to apply EgressFirewall to
    NS              = os.environ.get("TARGET_NAMESPACE", "llm-demo")
    MACHINE_NETWORK = os.environ.get("MACHINE_NETWORK", "192.168.99.0/24")
    SERVICE_NETWORK = os.environ.get("SERVICE_NETWORK", "172.22.0.0/16")
    CLUSTER_NETWORK = os.environ.get("CLUSTER_NETWORK", "10.132.0.0/14")

    print("Fetching goog.json from www.gstatic.com ...")
    goog  = fetch_json("https://www.gstatic.com/ipranges/goog.json")
    print("Fetching cloud.json from www.gstatic.com ...")
    cloud = fetch_json("https://www.gstatic.com/ipranges/cloud.json")

    def get_v4(data):
        return {ipaddress.ip_network(p["ipv4Prefix"])
                for p in data["prefixes"] if "ipv4Prefix" in p}

    goog_v4  = get_v4(goog)
    cloud_v4 = get_v4(cloud)

    # goog - cloud = Google own service IPs (not GCP customer IPs)
    google_only = sorted(
        [net for net in goog_v4
         if not any(net.subnet_of(c) for c in cloud_v4)],
        key=lambda n: (n.network_address, n.prefixlen)
    )
    print(f"goog IPv4: {len(goog_v4)}, cloud IPv4: {len(cloud_v4)}, google-only: {len(google_only)}")

    lines = []
    lines.append("apiVersion: k8s.ovn.org/v1")
    lines.append("kind: EgressFirewall")
    lines.append("metadata:")
    lines.append(f"  name: default")
    lines.append(f"  namespace: {NS}")
    lines.append("spec:")
    lines.append("  egress:")
    # Allow internal cluster networks first (must be before deny rules)
    for cidr, comment in [
        (MACHINE_NETWORK, "node/machine network (API server access)"),
        (SERVICE_NETWORK,  "service network"),
        (CLUSTER_NETWORK,  "pod/cluster network"),
    ]:
        lines.append(f"  - type: Allow")
        lines.append(f"    to:")
        lines.append(f"      cidrSelector: {cidr}")
    # Deny Google-only CIDRs
    for net in google_only:
        lines.append(f"  - type: Deny")
        lines.append(f"    to:")
        lines.append(f"      cidrSelector: {net}")
    # Allow everything else
    lines.append(f"  - type: Allow")
    lines.append(f"    to:")
    lines.append(f"      cidrSelector: 0.0.0.0/0")

    with open("/tmp/egress-firewall.yaml", "w") as f:
        f.write("\n".join(lines))
    print(f"YAML written ({len(google_only) + 4} rules total)")

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: egress-firewall-google-updater
  namespace: egress-fw-updater
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: egress-firewall-updater
          restartPolicy: OnFailure
          containers:
          - name: updater
            image: quay.io/wangzheng422/qimgs:centos9-test-2025.12.18.v01
            env:
            # Target namespace where EgressFirewall will be applied
            - name: TARGET_NAMESPACE
              value: "llm-demo"
            # Cluster network CIDRs to allow (customize for your cluster)
            - name: MACHINE_NETWORK
              value: "192.168.99.0/24"
            - name: SERVICE_NETWORK
              value: "172.22.0.0/16"
            - name: CLUSTER_NETWORK
              value: "10.132.0.0/14"
            command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "=== Step 1: Generate EgressFirewall YAML ==="
              python3 /scripts/update.py

              echo "=== Step 2: Apply via Kubernetes Server-Side Apply API ==="
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

              HTTP_RESULT=$(curl -k -s -w "\nHTTP_STATUS:%{http_code}" \
                -X PATCH \
                -H "Authorization: Bearer ${TOKEN}" \
                -H "Content-Type: application/apply-patch+yaml" \
                "https://kubernetes.default.svc/apis/k8s.ovn.org/v1/namespaces/${TARGET_NAMESPACE}/egressfirewalls/default?fieldManager=egress-firewall-updater&force=true" \
                --data-binary @/tmp/egress-firewall.yaml)

              HTTP_STATUS=$(echo "${HTTP_RESULT}" | grep HTTP_STATUS | cut -d: -f2)
              echo "Apply HTTP status: ${HTTP_STATUS}"
              if [[ "${HTTP_STATUS}" == "200" || "${HTTP_STATUS}" == "201" ]]; then
                echo "=== EgressFirewall in ${TARGET_NAMESPACE} updated successfully ==="
              else
                echo "=== ERROR: HTTP ${HTTP_STATUS} ==="
                echo "${HTTP_RESULT}"
                exit 1
              fi
            volumeMounts:
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: scripts
            configMap:
              name: egress-firewall-updater-script
EOF

oc apply -f ${BASE_DIR}/data/install/egress-firewall-google-updater.yaml

# to delete
# oc delete -f ${BASE_DIR}/data/install/egress-firewall-google-updater.yaml
# oc delete egressfirewall default -n llm-demo

manually trigger and verify


# manually trigger one run (for testing, without waiting for cron schedule)
oc create job -n egress-fw-updater egress-fw-test-run \
  --from=cronjob/egress-firewall-google-updater

# watch job status
oc get job -n egress-fw-updater egress-fw-test-run -w

# check job logs
oc logs -n egress-fw-updater -l job-name=egress-fw-test-run

# expected log output:
# === Step 1: Generate EgressFirewall YAML ===
# Fetching goog.json from www.gstatic.com ...
# Fetching cloud.json from www.gstatic.com ...
# goog IPv4: 96, cloud IPv4: 862, google-only: 91
# YAML written (95 rules total)
# === Step 2: Apply via Kubernetes Server-Side Apply API ===
# Apply HTTP status: 200
# === EgressFirewall in llm-demo updated successfully ===

# verify EgressFirewall status
oc get egressfirewall -n llm-demo
# NAME      EGRESSFIREWALL STATUS
# default   EgressFirewall Rules applied

# check rule count
oc get egressfirewall -n llm-demo default -o json | jq '.spec.egress | length'
# 95

verify google is blocked


# Before applying EgressFirewall - Google is accessible
oc exec -n llm-demo wzh-demo-pod -- curl -s --max-time 8 \
  -o /dev/null -w "%{http_code}" https://www.google.com
# 200

# After applying EgressFirewall - Google is blocked (connection timeout)
oc exec -n llm-demo wzh-demo-pod -- curl -s --max-time 8 \
  -o /dev/null -w "%{http_code}" https://www.google.com
# 000  (exit code 28 = timeout, blocked by EgressFirewall)

# Other sites remain accessible
oc exec -n llm-demo wzh-demo-pod -- curl -s --max-time 8 \
  -o /dev/null -w "%{http_code}" https://www.baidu.com
# 200

notes

  • OVN EgressFirewall max rules per namespace: 8,000 (current usage: ~95, well within limit)
  • Google IP list changes infrequently; daily updates are sufficient
  • The CronJob uses Kubernetes Server-Side Apply (application/apply-patch+yaml) via curl directly, requiring no oc or kubectl binary in the container image
  • Adjust MACHINE_NETWORK, SERVICE_NETWORK, CLUSTER_NETWORK env vars to match your cluster’s actual CIDRs before deploying

eBPF deep-dive: testing every feature module

overview

The Network Observability Operator 1.10 on OCP 4.20 supports the following eBPF agent features, all configured via the spec.agent.ebpf.features list in the FlowCollector CR. This section documents a systematic test of every feature module, including deployment steps, CLI verification, flow record evidence, and CPU/memory impact analysis.

The following features were enabled simultaneously for testing:

spec:
  agent:
    ebpf:
      features:
        - PacketDrop
        - DNSTracking
        - FlowRTT
        - NetworkEvents
        - PacketTranslation
        - UDNMapping
        - IPSec
      privileged: true    # required for PacketDrop, NetworkEvents, UDNMapping
      sampling: 50         # 1-in-50 packet sampling
      cacheMaxFlows: 120000
      cacheActiveTimeout: "15s"
Feature Purpose Requires privileged: true Key Flow Fields
FlowRTT TCP round-trip time measurement TimeFlowRttNs
DNSTracking DNS request/response latency & RCODE DnsId, DnsLatencyMs, DnsFlagsResponseCode
PacketDrop Kernel-level packet drop tracking PktDropBytes, PktDropPackets, PktDropLatestDropCause, PktDropLatestState
PacketTranslation Service → Pod address translation (xlat) XlatSrcAddr, XlatDstAddr, XlatSrcPort, XlatDstPort, XlatDstK8S_Name
NetworkEvents OVN-Kubernetes ACL action tracking NetworkEvents (array of Type/Action/Feature)
UDNMapping User-defined network interface mapping Udns
IPSec IPSec encryption status tracking IPSec-related fields

test environment

Cluster:       OCP 4.20.21 (3 control-plane/worker nodes)
Operator:      Network Observability Operator 1.10
Agent:         eBPF DaemonSet (3 pods, one per node)
Backend:       LokiStack (1x.demo, backed by rustfs S3)
Namespace:     netobserv-test (test workloads), llm-demo (egress IP test)

baseline resource usage (all features enabled)

# eBPF agent resource consumption with ALL 7 features enabled
oc adm top pods -n netobserv-privileged
# NAME                         CPU(cores)   MEMORY(bytes)
# netobserv-ebpf-agent-48t8v   11m          164Mi         (master-03)
# netobserv-ebpf-agent-h9ldf   12m          160Mi         (master-02)
# netobserv-ebpf-agent-q92x7   11m          144Mi         (master-01)

# Node-level resource usage
oc adm top nodes
# NAME             CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
# master-01-demo   1341m        5%       9274Mi          32%
# master-02-demo   1704m        7%       12737Mi         43%
# master-03-demo   2296m        9%       15883Mi         54%

Key observation: Each eBPF agent pod uses approximately 11–12m CPU (0.01 cores) and 144–164 MiB memory with all 7 features enabled. This is remarkably lightweight — the eBPF programs run in kernel space and only the user-space agent process (which aggregates and forwards flows) shows up in the pod metrics. The actual kernel-side CPU overhead is not reflected in pod metrics but is included in the node-level CPU figures.

deploy test workloads


# Create test namespace with Service + backend pods + traffic generator
cat << EOF | oc apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: netobserv-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
  namespace: netobserv-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
      - name: nginx
        image: quay.io/wangzheng422/qimgs:centos9-test-2025.12.18.v01
        command: ["/bin/bash", "-c"]
        args:
        - |
          python3 -m http.server 8080
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
  namespace: netobserv-test
spec:
  selector:
    app: web-server
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
---
apiVersion: v1
kind: Pod
metadata:
  name: traffic-gen
  namespace: netobserv-test
  labels:
    app: traffic-gen
spec:
  nodeSelector:
    kubernetes.io/hostname: 'master-02-demo'
  containers:
  - name: client
    image: quay.io/wangzheng422/qimgs:centos9-test-2025.12.18.v01
    command: ["/bin/bash", "-c", "tail -f /dev/null"]
EOF

# Deploy NetworkPolicy to allow only port 8080 ingress to web-server
# Traffic to other ports (e.g., 9999) will be dropped by OVN → captured by PacketDrop
cat << EOF | oc apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-port-9999
  namespace: netobserv-test
spec:
  podSelector:
    matchLabels:
      app: web-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: traffic-gen
    ports:
    - protocol: TCP
      port: 8080
  - from:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 8080
EOF

generate test traffic


# 1. Service traffic for PacketTranslation (xlat) — via ClusterIP
SVC_IP=$(oc get svc web-svc -n netobserv-test -o jsonpath="{.spec.clusterIP}")
echo "Service ClusterIP: $SVC_IP"
# Service ClusterIP: 172.22.114.234
for i in $(seq 1 10); do
  oc exec -n netobserv-test traffic-gen -- \
    curl -s --max-time 3 -o /dev/null -w "req$i: HTTP %{http_code} " http://$SVC_IP:80/
done
# req1: HTTP 200 req2: HTTP 200 ... req10: HTTP 200

# 2. DNS queries for DNSTracking
for domain in www.google.com www.baidu.com kubernetes.default.svc redhat.com github.com; do
  oc exec -n netobserv-test traffic-gen -- nslookup $domain > /dev/null 2>&1 \
    && echo "DNS OK: $domain" || echo "DNS FAIL: $domain"
done
# DNS OK: www.google.com
# DNS OK: www.baidu.com
# DNS OK: kubernetes.default.svc
# DNS OK: redhat.com
# DNS OK: github.com

# 3. External traffic for FlowRTT measurement
oc exec -n netobserv-test traffic-gen -- \
  curl -s --max-time 5 -o /dev/null -w "HTTP %{http_code} RTT_connect=%{time_connect}s" \
  https://www.baidu.com
# HTTP 200 RTT_connect=0.594255s

# 4. Blocked traffic for PacketDrop — connect to port 9999 (blocked by NetworkPolicy)
WEB_POD_IP=$(oc get pod -n netobserv-test -l app=web-server -o jsonpath="{.items[0].status.podIP}")
oc exec -n netobserv-test traffic-gen -- \
  timeout 3 bash -c "echo test | nc -w 2 $WEB_POD_IP 9999"
# Ncat: TIMEOUT. (connection blocked by NetworkPolicy — packet dropped by OVN)

feature 1: FlowRTT (TCP Round-Trip Time)

what it does

FlowRTT uses eBPF to measure TCP Smoothed Round-Trip Time (sRTT) at the kernel level. It hooks into the TCP stack and reads the tcp_sock->srtt_us value, recording it in nanoseconds as the TimeFlowRttNs field in each flow record. This provides accurate network latency measurement without any application-level instrumentation.

evidence from Loki flow records

Multiple flows captured with TimeFlowRttNs values showing intra-cluster RTT measurements:

Flow: 10.134.0.2:58136 -> 10.134.0.30:3101   RTT = 89,000 ns (0.089 ms)   — same-node
Flow: 10.133.0.91:34732 -> 10.134.0.30:3100   RTT = 912,000 ns (0.9 ms)    — cross-node
Flow: 10.134.0.14:45106 -> 10.134.0.29:9095   RTT = 13,508,000 ns (13.5 ms) — Loki ingester
Flow: 192.168.99.25:10250 -> 10.132.0.90       RTT = 10,041,000 ns (10 ms)  — kubelet
Flow: 10.133.0.4:8443 -> 10.133.0.91:53320     RTT = 15,425,000 ns (15.4 ms) — API server

Interpretation: Same-node flows show sub-millisecond RTT (0.089ms). Cross-node flows via GENEVE overlay show 0.9–15ms RTT depending on load and path. These RTT values are measured at the TCP layer by eBPF, not by ICMP ping, so they reflect actual application-perceived latency.

CPU impact

FlowRTT adds minimal overhead because it reads an existing kernel field (tcp_sock->srtt_us) rather than performing new measurements. The eBPF program simply copies a value that the kernel already maintains for TCP congestion control.

feature 2: DNSTracking

what it does

DNSTracking captures DNS request/response pairs at the eBPF level. It hooks into UDP port 53 traffic and parses DNS headers to extract:

  • DnsId — DNS transaction ID
  • DnsLatencyMs — time between request and response
  • DnsFlagsResponseCode — DNS RCODE (0=NOERROR, 3=NXDOMAIN, etc.)
  • DnsErrno — system-level DNS error code

observation

With sampling rate of 50 (1 in 50 packets), DNS tracking captures are sparse because DNS queries are typically single-packet request/response pairs. The probability of capturing both the request AND response for the same DNS transaction is (1/50) × (1/50) = 0.04%. To reliably observe DNS tracking data, reduce the sampling rate:

# Temporarily reduce sampling to capture DNS flows
oc patch flowcollector cluster --type=json \
  -p '[{"op": "replace", "path": "/spec/agent/ebpf/sampling", "value": 1}]'

# After testing, restore default sampling
oc patch flowcollector cluster --type=json \
  -p '[{"op": "replace", "path": "/spec/agent/ebpf/sampling", "value": 50}]'

CPU impact

DNSTracking adds a small overhead because the eBPF program must parse DNS packet headers (beyond the normal IP/TCP/UDP header inspection). For each DNS packet, the agent decodes the DNS header to extract the transaction ID, flags, and response code. At sampling=50, the impact is negligible. At sampling=1, expect a noticeable increase in CPU for DNS-heavy workloads.

feature 3: PacketDrop

what it does

PacketDrop hooks into the kernel’s kfree_skb tracepoint to detect when packets are dropped. It captures:

  • PktDropBytes — total bytes dropped
  • PktDropPackets — number of packets dropped
  • PktDropLatestDropCause — kernel drop reason (e.g., SKB_DROP_REASON_TCP_RESET)
  • PktDropLatestState — TCP state at time of drop (e.g., TCP_INVALID_STATE)
  • PktDropLatestFlags — TCP flags of dropped packet

Requires privileged: true because the kfree_skb tracepoint is a privileged kernel operation.

evidence from Loki flow records

A packet drop was captured from the Loki internal traffic:

{
  "SrcAddr": "10.132.0.90",
  "DstAddr": "10.134.0.30",
  "SrcPort": 47464,
  "DstPort": 3100,
  "PktDropBytes": 32,
  "PktDropPackets": 1,
  "PktDropLatestDropCause": "SKB_DROP_REASON_TCP_RESET",
  "PktDropLatestFlags": 16,
  "PktDropLatestState": "TCP_INVALID_STATE"
}

Interpretation: This shows a TCP RST packet (flags=16=ACK) being dropped by the kernel because it arrived in an invalid TCP state. This is normal TCP behavior — connection teardown can result in transient invalid states. The important point is that eBPF captures these drops that are invisible to application-level monitoring.

test with NetworkPolicy-blocked traffic

The NetworkPolicy we deployed blocks traffic to port 9999 on web-server pods. When traffic-gen attempts to connect to port 9999, OVN drops the packet. The eBPF agent captures this as a SYN packet on the GENEVE tunnel interface:

{
  "SrcAddr": "10.132.0.46",
  "DstAddr": "10.134.0.36",
  "SrcPort": 37186,
  "DstPort": 9999,
  "SrcK8S_Name": "traffic-gen",
  "DstK8S_Name": "web-server-65d6fb4bc4-2l2nb",
  "Flags": ["SYN"],
  "Interfaces": ["genev_sys_6081"],
  "Packets": 1,
  "Bytes": 74
}

Note: The SYN packet is captured at the GENEVE tunnel interface before OVN drops it. PacketDrop fields (PktDropBytes, etc.) appear when the kernel’s kfree_skb is triggered, which may happen at a different point in the OVN processing pipeline.

CPU impact

PacketDrop attaches to the kfree_skb tracepoint which fires for every dropped packet in the kernel — not just sampled ones. The eBPF program then records the drop reason and associates it with the flow. On nodes with high packet drop rates (e.g., under DDoS or heavy NetworkPolicy enforcement), this can add measurable CPU overhead. In normal cluster operation, packet drops are infrequent, so the impact is minimal.

feature 4: PacketTranslation (xlat)

what it does

PacketTranslation enriches flow records with the translated endpoint information — i.e., when a packet passes through NAT/DNAT (e.g., Service ClusterIP → backend Pod IP), both the original and translated addresses are recorded. This allows you to see:

  • Which Service VIP was accessed (DstAddr: 172.22.0.1:443)
  • Which actual backend handled the request (XlatDstAddr: 192.168.99.24, XlatDstPort: 6443)

evidence from Loki flow records

Service-to-node translation (Kubernetes API Server):

{
  "SrcAddr": "10.132.0.54",
  "DstAddr": "172.22.0.1",
  "SrcPort": 38680,
  "DstPort": 443,
  "SrcK8S_Name": "machine-api-controllers-5644995fb-fpvtt",
  "SrcK8S_Namespace": "openshift-machine-api",
  "XlatDstAddr": "192.168.99.24",
  "XlatDstK8S_Name": "master-02-demo",
  "XlatDstK8S_Type": "Node",
  "XlatDstPort": 6443,
  "XlatSrcAddr": "10.132.0.54",
  "XlatSrcK8S_Name": "machine-api-controllers-5644995fb-fpvtt",
  "XlatSrcK8S_Type": "Pod"
}

Interpretation: The machine-api-controllers pod accessed the Kubernetes API at 172.22.0.1:443 (the Service ClusterIP). OVN DNAT’d this to 192.168.99.24:6443 (master-02-demo’s actual API server port). The Xlat* fields reveal this translation, which is invisible in standard flow logs.

Node-to-node etcd translation:

{
  "SrcAddr": "10.134.0.27",
  "DstAddr": "192.168.99.25",
  "DstPort": 2379,
  "XlatDstAddr": "192.168.99.25",
  "XlatDstK8S_Name": "master-03-demo",
  "XlatDstK8S_Type": "Node",
  "XlatSrcAddr": "192.168.99.23",
  "XlatSrcK8S_Name": "master-01-demo",
  "XlatSrcK8S_Type": "Node",
  "XlatSrcPort": 41234
}

CPU impact

PacketTranslation uses eBPF’s connection tracking (conntrack) integration to look up NAT mappings. This adds a conntrack lookup for each captured packet, which is a constant-time hash table operation. The overhead is proportional to the number of sampled packets, making it negligible at the default sampling rate of 50.

feature 5: NetworkEvents

what it does

NetworkEvents tracks OVN-Kubernetes ACL (Access Control List) actions — whenever OVN processes a packet against NetworkPolicy, AdminNetworkPolicy, BaselineNetworkPolicy, EgressFirewall, UDN isolation, or Multicast rules, the action (allow/drop) is recorded in the NetworkEvents field.

This is a Technology Preview feature as of OCP 4.20.

observation

NetworkEvents requires OVN-Kubernetes to emit ACL action events via a specific mechanism that links eBPF observations to OVN internal state. In our testing with sampling=50, no NetworkEvents were captured in the 15-minute observation window. This is expected behavior:

  1. NetworkEvents are generated only for new connections (not every packet)
  2. OVN ACL events are relatively rare compared to data-plane traffic
  3. With 1-in-50 sampling, the probability of capturing the exact first packet of a new connection (where the ACL decision is recorded) is low

To reliably observe NetworkEvents, use the OCP console’s Network Traffic UI with the “Network events” filter, or temporarily reduce the sampling rate.

CPU impact

NetworkEvents adds overhead by hooking into the OVN ACL processing path. Since ACL evaluations happen for every new connection (not every packet), the per-packet overhead is minimal. The main cost is the additional metadata collection and correlation between eBPF and OVN event streams.

feature 6: UDNMapping

what it does

UDNMapping (User-Defined Network Mapping) maps network flows to user-defined networks (UDNs). When pods are attached to custom networks (created via UserDefinedNetwork CRDs), this feature identifies which network a flow belongs to via the Udns field.

Requires privileged: true because it needs access to network namespace information.

evidence from Loki flow records

All flow records include the Udns field showing the network name:

{
  "Udns": ["default"]          // Pod on the default cluster network
}
{
  "Udns": ["", "default"]      // Flow observed on multiple interfaces
}

In our cluster, all pods use the default OVN-Kubernetes network, so all Udns values show "default". When UserDefinedNetwork CRs are created and pods are attached to them, the Udns field would show the custom network name (e.g., "my-tenant-network").

CPU impact

UDNMapping adds a lookup to determine which UDN a network interface belongs to. This is a per-flow operation (not per-packet due to flow caching), so the overhead scales with the number of unique flows rather than packet rate. Impact is negligible for most workloads.

feature 7: IPSec

what it does

IPSec tracking detects whether traffic is encrypted/decrypted using IPSec. When IPSec is configured at the cluster level (via the Cluster Network Operator), this feature adds encryption status metadata to flow records.

observation

Our test cluster does not have cluster-level IPSec enabled (it requires spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig in the Network operator CR). Therefore, no IPSec-specific fields appear in flow records. The feature is enabled in the FlowCollector but has no effect without IPSec infrastructure.

To test this feature, IPSec must be enabled at the cluster level, which requires re-encrypting all inter-node traffic — a significant infrastructure change not suitable for a simple lab test.

eBPF Flow Filter (bonus feature)

what it does

The eBPF flow filter (spec.agent.ebpf.flowFilter) allows kernel-level filtering of flows before they leave the eBPF agent. This is different from Loki-side filtering — it reduces the amount of data sent from the agent, saving CPU, memory, and network bandwidth.

configuration example

spec:
  agent:
    ebpf:
      flowFilter:
        enable: true
        rules:
        - cidr: "10.132.0.0/14"
          action: Accept
          protocol: TCP
          ports: 8080
          direction: Ingress
        - cidr: "0.0.0.0/0"
          action: Reject

This configuration would:

  1. Accept only TCP port 8080 traffic within the pod network
  2. Reject all other traffic at the eBPF level (never sent to flowlogs-pipeline or Loki)

Note: Flow filtering was not applied during our testing because we wanted to capture all features’ data. In production, flow filtering can significantly reduce resource consumption by eliminating uninteresting infrastructure traffic.

CPU and system impact analysis

eBPF agent resource consumption

Metric Value (per agent) Notes
CPU usage 11–12m (0.01 cores) With all 7 features enabled
Memory usage 144–164 MiB Flow cache + eBPF maps
CPU request 100m Configured in FlowCollector
Memory limit 800 MiB Configured in FlowCollector

kernel-side impact

The eBPF programs run inside the kernel and their CPU consumption is not reflected in the agent pod’s CPU metrics. Instead, it shows up in the node-level CPU figures. The kernel-side overhead depends on:

  1. Packet rate — More packets = more eBPF invocations
  2. Sampling rate — Lower sampling = more processing per packet
  3. Features enabled — Each feature adds hooks to different kernel subsystems
  4. Flow diversity — More unique flows = larger eBPF hash maps

In our 3-node cluster with light test traffic:

  • Node CPU usage: 5–9% (1341m–2296m out of ~24 cores per node)
  • eBPF overhead is estimated at < 1% additional CPU per node

flowlogs-pipeline resource consumption

Metric Value
CPU usage 49m (0.05 cores)
Memory usage 63 MiB

The flowlogs-pipeline pod processes and enriches flows before sending to Loki. Its resource consumption scales with the flow rate (which depends on cluster traffic volume and sampling rate).

recommendations for production

  1. Sampling rate: Start with the default of 50. Lower values increase accuracy but also increase CPU, memory, and storage. For most clusters, sampling=50 provides sufficient visibility.

  2. Feature selection: Enable only the features you need:

    • FlowRTT — low overhead, high value for latency troubleshooting
    • PacketTranslation — low overhead, essential for Service traffic visibility
    • PacketDrop — moderate overhead (requires privileged: true), valuable for NetworkPolicy debugging
    • DNSTracking — low overhead, useful for DNS troubleshooting
    • NetworkEvents — moderate overhead, useful for security audit
    • UDNMapping — low overhead, only needed with UserDefinedNetworks
    • IPSec — negligible overhead, only relevant with IPSec-enabled clusters
  3. Flow filtering: Use spec.agent.ebpf.flowFilter to exclude infrastructure traffic (e.g., etcd, kubelet health checks) at the kernel level, significantly reducing downstream resource usage.

  4. Memory limits: The default 800 MiB memory limit for the eBPF agent is sufficient for most workloads. If NetObservAgentFlowsDropped alerts fire, increase cacheMaxFlows and the memory limit.

querying Loki for flow records

You can query Loki directly to inspect flow records with feature fields:

# Get authentication token
TOKEN=$(oc whoami -t)

# Query flows from a specific namespace (last 10 minutes)
NOW=$(date +%s)
START=$((NOW-600))

oc exec -n netobserv flowlogs-pipeline-<pod-id> -- curl -s -k \
  "https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/query_range" \
  --data-urlencode 'query={SrcK8S_Namespace="netobserv-test"}' \
  --data-urlencode "limit=10" \
  --data-urlencode "start=${START}000000000" \
  --data-urlencode "end=${NOW}000000000" \
  -H "Authorization: Bearer $TOKEN"

Important: Use the query_range endpoint (not query) because Loki stores flow logs as log streams, which require range queries.

feature fields summary

The following table shows all eBPF-specific fields that may appear in flow records stored in Loki:

Field Feature Type Description
TimeFlowRttNs FlowRTT integer TCP smoothed RTT in nanoseconds
DnsId DNSTracking integer DNS transaction ID
DnsLatencyMs DNSTracking integer DNS query-to-response latency in ms
DnsFlagsResponseCode DNSTracking string DNS RCODE (NOERROR, NXDOMAIN, etc.)
DnsErrno DNSTracking integer System DNS error code
PktDropBytes PacketDrop integer Total bytes in dropped packets
PktDropPackets PacketDrop integer Number of dropped packets
PktDropLatestDropCause PacketDrop string Kernel drop reason
PktDropLatestState PacketDrop string TCP state at drop time
PktDropLatestFlags PacketDrop integer TCP flags of dropped packet
XlatSrcAddr PacketTranslation string Pre-NAT source IP
XlatDstAddr PacketTranslation string Pre-NAT destination IP
XlatSrcPort PacketTranslation integer Pre-NAT source port
XlatDstPort PacketTranslation integer Pre-NAT destination port
XlatSrcK8S_Name PacketTranslation string Pre-NAT source K8s object
XlatDstK8S_Name PacketTranslation string Pre-NAT destination K8s object
NetworkEvents NetworkEvents array OVN ACL actions (allow/drop)
Udns UDNMapping array User-defined network names

cleanup


# Remove test workloads
oc delete namespace netobserv-test

# To disable specific features, patch the FlowCollector:
# oc patch flowcollector cluster --type=json \
#   -p '[{"op": "replace", "path": "/spec/agent/ebpf/features", "value": ["FlowRTT"]}]'

eBPF kernel-level deep-dive: bpftool inspection on CoreOS nodes

CoreOS nodes do not ship bpftool, so we use oc debug with a custom image that has it pre-installed. The --image flag overrides the debug pod’s container image, but we do not use chroot /host because the host filesystem lacks bpftool. Instead, the debug pod runs with host privileges and can directly access the kernel’s eBPF subsystem.


IMG="quay.io/wangzheng422/qimgs:centos9-test-2025.12.18.v01"

# list all eBPF programs on master-01
oc debug node/master-01-demo --image=$IMG -- bash -c 'bpftool prog list'

# list all eBPF maps
oc debug node/master-01-demo --image=$IMG -- bash -c 'bpftool map list'

# show network attachment points
oc debug node/master-01-demo --image=$IMG -- bash -c 'bpftool net list'

# export program list as JSON for scripted analysis
oc debug node/master-01-demo --image=$IMG -- bash -c 'bpftool prog list --json'

NetObserv eBPF programs on master-01-demo

Running bpftool prog list on master-01-demo reveals 145 total eBPF programs loaded system-wide. Of these, 16 belong to netobserv-ebpf-agent (PID 7253). The following table maps each program to its NetObserv feature:

ID Type Name Feature xlated jited memlock
133 tracepoint kfree_skb PacketDrop 4,208B 2,375B 12,288B
134 kprobe network_events_monitoring NetworkEvents 6,560B 3,695B 16,384B
135 kprobe probe_entry_SSL_write TLS/SSL monitoring 48B 39B 4,096B
136 sched_cls tc_egress_flow_parse Flow - TC egress 9,904B 6,148B 20,480B
137 sched_cls tc_egress_pca_parse PacketCapture - TC 80B 69B 12,288B
138 sched_cls tc_ingress_flow_parse Flow - TC ingress 9,904B 6,145B 20,480B
141 tracing tcp_rcv_fentry FlowRTT - fentry 2,936B 1,716B 12,288B
142 kprobe tcp_rcv_kprobe FlowRTT - kprobe fallback 2,928B 1,713B 12,288B
143 sched_cls tcx_egress_flow_parse Flow - TCX egress 9,912B 6,151B 20,480B
145 sched_cls tcx_ingress_flow_parse Flow - TCX ingress 9,912B 6,148B 20,480B
146 sched_cls tcx_ingress_pca_parse PacketCapture - TCX 88B 69B 12,288B
147 kprobe track_nat_manip_pkt PacketTranslation 6,184B 3,733B 16,384B
148 kprobe xfrm_input_kprobe IPSec - input 3,680B 2,123B 12,288B
149 kprobe xfrm_input_kretprobe IPSec - input return 1,312B 876B 4,096B
150 kprobe xfrm_output_kprobe IPSec - output 3,704B 2,141B 12,288B
151 kprobe xfrm_output_kretprobe IPSec - output return 1,336B 894B 4,096B

Total program memlock: 208.0 KB (all 16 NetObserv programs combined).

Key observations:

  • The flow_parse programs (tc/tcx variants) are the largest at ~10 KB xlated each — they are the “main” packet processing programs that run on every interface.
  • FlowRTT loads two programs: tcp_rcv_fentry (using the modern fentry/fexit tracing mechanism) and tcp_rcv_kprobe (kprobe fallback for older kernels). Only one is actively used at runtime.
  • IPSec loads 4 programs (input/output × kprobe/kretprobe) to track both directions of IPSec processing.
  • probe_entry_SSL_write is a tiny 48B program that hooks into OpenSSL’s SSL_write function for TLS-aware flow detection.

eBPF maps memory usage

The eBPF maps are where the real memory lives. Running bpftool map list shows 18 NetObserv-related maps:

ID Type Name max_entries memlock Purpose
9 array .bss 1 24,576B Global state
10 array .rodata 1 8,192B Read-only config
11 percpu_array global_counters 11 2,464B Per-CPU counters
12 lpm_trie filter_map 1 0B Flow filter
13 lpm_trie peer_filter_map 1 0B Peer filter
14 percpu_hash aggregated_flow 120,000 2,108,536B Flow aggregation - PacketDrop
15 percpu_hash aggregated_flow 120,000 2,098,816B Flow aggregation - NetworkEvents
16 ringbuf ssl_data_event_ 4,096 16,680B SSL event buffer
17 hash dns_flows 1,048,576 16,779,192B DNS flow tracking ★
18 percpu_array dns_name_map 1 1,040B DNS name cache
19 hash aggregated_flow 120,000 2,196,928B Flow aggregation - DNSTracking
20 percpu_hash aggregated_flow 120,000 2,105,440B Flow aggregation - main
21 ringbuf direct_flows 4,096 16,680B Direct flow export
22 ringbuf packet_record 4,096 16,680B Packet capture buffer
23 percpu_hash additional_flow 120,000 2,218,696B Additional flow data - FlowRTT
24 percpu_hash aggregated_flow 120,000 2,144,200B Flow aggregation - PacketTranslation
25 hash ipsec_ingress_m 1,048,576 16,778,880B IPSec ingress map ★
26 hash ipsec_egress_ma 1,048,576 16,778,880B IPSec egress map ★

The ★ marks indicate large pre-allocated maps (~16 MB each). The top 3 memory consumers are:

  • dns_flows — 16.0 MB (1M entries hash map for DNS tracking)
  • ipsec_ingress_m — 16.0 MB (1M entries hash map for IPSec ingress)
  • ipsec_egress_ma — 16.0 MB (1M entries hash map for IPSec egress)

Total NetObserv map memlock: ~63.3 MB per node.

Important: The IPSec maps (32 MB combined) are pre-allocated even if IPSec is not actively used. Similarly, dns_flows (16 MB) is pre-allocated with max_entries=1048576 regardless of actual DNS traffic volume. These are the primary memory optimization targets if memory is constrained.

Network attachment: TCX hooks on every interface

Running bpftool net list shows that tcx_ingress_flow_parse (prog_id 145) and tcx_egress_flow_parse (prog_id 143) are attached to every network interface on the node via TCX (TC eXpress) hooks. Below is a partial listing:

tc:
enp1s0(2)          tcx/ingress tcx_ingress_flow_parse  prog_id 145
enp1s0(2)          tcx/egress  tcx_egress_flow_parse   prog_id 143
enp2s0(3)          tcx/ingress tcx_ingress_flow_parse  prog_id 145
enp2s0(3)          tcx/egress  tcx_egress_flow_parse   prog_id 143
ovs-system(4)      tcx/ingress tcx_ingress_flow_parse  prog_id 145
ovs-system(4)      tcx/egress  tcx_egress_flow_parse   prog_id 143
ovn-k8s-mp0(6)     tcx/ingress tcx_ingress_flow_parse  prog_id 145
ovn-k8s-mp0(6)     tcx/egress  tcx_egress_flow_parse   prog_id 143
genev_sys_6081(7)  tcx/ingress tcx_ingress_flow_parse  prog_id 145
genev_sys_6081(7)  tcx/egress  tcx_egress_flow_parse   prog_id 143
br-int(8)          tcx/ingress tcx_ingress_flow_parse  prog_id 145
br-int(8)          tcx/egress  tcx_egress_flow_parse   prog_id 143
br-ex(9)           tcx/ingress tcx_ingress_flow_parse  prog_id 145
br-ex(9)           tcx/egress  tcx_egress_flow_parse   prog_id 143
... (plus all pod veth interfaces)

On master-01-demo, these programs are attached to 37 interfaces (including physical NICs, OVS bridges, GENEVE tunnels, and pod veth ports), creating 74 TCX hook points (37 × 2 for ingress + egress). All hook points share the same 2 program IDs — the kernel reuses the JIT-compiled code across all interfaces.

Cross-node comparison

Running the same bpftool analysis on all 3 nodes confirms that the NetObserv eBPF program set is identical across the cluster:

Node Total eBPF progs NetObserv progs Prog memlock Map memlock (est.)
master-01-demo 145 16 208.0 KB ~63.3 MB
master-02-demo 203 16 208.0 KB ~63.3 MB
master-03-demo 209 16 208.0 KB ~63.3 MB

Observations:

  • All NetObserv programs are identical across all nodes (same name, type, bytes_xlated, bytes_jited). Only the kernel-assigned program IDs differ.
  • The total system eBPF count varies (145–209) because each node runs a different number of pods, each contributing cgroup_device eBPF programs managed by systemd.
  • The DaemonSet deployment model ensures uniform eBPF instrumentation across the entire cluster.

eBPF program architecture: how features map to kernel hooks

Each FlowCollector feature maps to specific eBPF programs and kernel hooks:

Feature eBPF Program Hook Type Kernel Function
FlowRTT tcp_rcv_fentry tracing/fentry tcp_rcv_established
FlowRTT tcp_rcv_kprobe kprobe (fallback) tcp_rcv_established
DNSTracking (inline in flow_parse) sched_cls/TCX DNS parsed in TC path
PacketDrop kfree_skb tracepoint kfree_skb (packet free = drop)
PacketTranslation track_nat_manip_pkt kprobe nf_nat_manip_pkt
NetworkEvents network_events_monitoring kprobe OVN network event hooks
IPSec xfrm_input_kprobe kprobe xfrm_input
IPSec xfrm_input_kretprobe kprobe (return) xfrm_input
IPSec xfrm_output_kprobe kprobe xfrm_output_resume
IPSec xfrm_output_kretprobe kprobe (return) xfrm_output_resume
UDNMapping (inline in flow_parse) Handled in flow_parse path

The TC/TCX flow path is the core data path, attached to every interface:

  • tcx_ingress_flow_parse / tc_ingress_flow_parse (legacy fallback) — processes all ingress packets
  • tcx_egress_flow_parse / tc_egress_flow_parse (legacy fallback) — processes all egress packets
  • tcx_ingress_pca_parse / tc_egress_pca_parse — packet capture hooks

The flow_parse programs (~10 KB xlated each) are the “main” eBPF programs that process every packet on every interface. They: (1) extract the 5-tuple (src/dst IP, port, protocol), (2) look up/update flow aggregation maps, (3) enrich with DNS, RTT, NAT translation data from other maps, and (4) export via ring buffers when a flow expires.

Memory footprint summary

Category Memory
eBPF program code (16 progs) 208 KB
Flow aggregation maps (6 maps) ~12.9 MB
DNS flow map (1M entries) ~16.0 MB
IPSec maps (2 × 1M entries) ~32.0 MB
Ring buffers (3 buffers) ~50 KB
Misc (counters, config, filter) ~35 KB
Total per node ~63.5 MB
Total across 3-node cluster ~190.5 MB

The IPSec maps (32 MB) and DNS flow map (16 MB) are pre-allocated at maximum capacity regardless of actual traffic. If IPSec and DNS tracking are not needed, disabling these features would save ~48 MB per node. The flow aggregation maps use cacheMaxFlows=120000 entries, which is configurable in the FlowCollector CR.

end