[!TIP] Ongoing and occasional updates and improvements.

using gitops/aap to collect heap dump

The customer needs to collect heap dumps from several OpenShift clusters into a central location. We will try to achieve this using GitOps and Ansible.

Here are the proposed steps:

The architecture is like this:

install acm

install acm from operator:

create an acm instance:

Use basic mode rather than HA mode, so that ACM does not run multiple replicas of the same component.
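The basic-mode choice can also be expressed declaratively. The following is a minimal sketch, assuming the ACM operator lives in the open-cluster-management namespace and the instance is named multiclusterhub (common defaults, not confirmed for this environment). It only writes the manifest to a file so it can be reviewed before applying.

```shell
# Sketch: write a MultiClusterHub manifest that requests basic (non-HA) mode.
# Assumptions: namespace "open-cluster-management" and name "multiclusterhub".
WORK_DIR="${BASE_DIR:-/tmp}/data/install"
mkdir -p "${WORK_DIR}"

cat << 'EOF' > "${WORK_DIR}/acm-hub.yaml"
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  availabilityConfig: Basic
EOF

# Review the file, then apply it on the hub cluster:
# oc apply -f "${WORK_DIR}/acm-hub.yaml"
```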

Now we import the managed cluster, which in our case is the sno-demo cluster.

Before the import, we need to get the API URL and API token from the managed cluster, sno-demo.

Now, you can see the api url, and api token:

api url: https://api.demo-01-rhsys.wzhlab.top:6443

api token: sha256~636nYarACWldNeNTx69kGOYPWaQUWcjcMtCHGLNm3Gk
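If you prefer the command line to the console, both values can be read with oc. This is a small sketch, assuming you are already logged in to sno-demo with a user that is allowed to import the cluster; the function name is made up for illustration.

```shell
# Sketch: print the API URL and the current session token of the cluster
# that oc is logged in to. The function name is hypothetical.
show_import_credentials() {
  echo "api url: $(oc whoami --show-server)"
  echo "api token: $(oc whoami --show-token)"
}

# Usage, after `oc login` to the managed cluster:
# show_import_credentials
```

Treat the printed token like a password, and avoid pasting it into shared documents or chat.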

Now we go back to the ACM hub cluster to create the imported cluster. Set the name for the cluster and select the import mode; we will use the API token.

We will not use Ansible automation to help with the import, so skip that step.

Review, and import.

After the import, we can see the managed cluster in acm hub.

We can see that it is a single-node OpenShift cluster.

The add-ons are also installed in the imported cluster.
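The console import above can also be approximated with manifests on the hub. The sketch below assumes the auto-import secret mechanism (a Secret named auto-import-secret in the cluster's namespace); API_URL and API_TOKEN are placeholders, and add-on configuration is not covered. It only writes a file for review.

```shell
# Sketch: manifests for importing a cluster with an API URL and token.
# CLUSTER_NAME comes from this walkthrough; API_URL and API_TOKEN are
# placeholders that must be replaced with the values of your managed cluster.
CLUSTER_NAME="${CLUSTER_NAME:-sno-demo}"
API_URL="${API_URL:-https://api.example.invalid:6443}"
API_TOKEN="${API_TOKEN:-REPLACE_WITH_TOKEN}"
WORK_DIR="${BASE_DIR:-/tmp}/data/install"
mkdir -p "${WORK_DIR}"

cat << EOF > "${WORK_DIR}/import-${CLUSTER_NAME}.yaml"
apiVersion: v1
kind: Namespace
metadata:
  name: ${CLUSTER_NAME}
---
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: ${CLUSTER_NAME}
  labels:
    cloud: auto-detect
    vendor: auto-detect
spec:
  hubAcceptsClient: true
---
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: ${CLUSTER_NAME}
type: Opaque
stringData:
  autoImportRetry: "5"
  server: "${API_URL}"
  token: "${API_TOKEN}"
EOF

# The file contains a credential, so restrict access to it.
chmod 600 "${WORK_DIR}/import-${CLUSTER_NAME}.yaml"

# Apply on the hub cluster, then check the import status:
# oc apply -f "${WORK_DIR}/import-${CLUSTER_NAME}.yaml"
# oc get managedcluster "${CLUSTER_NAME}"
```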

install gitops

We need openshift gitops to create gitops configuration.

install gitops from operator on acm hub cluster.

You can see that a default instance has been created.

install aap / ansible platform

Find the aap operator.

try cluster-scoped channel first.

Then create an aap instance.

Follow the official documentation.

set service type and ingress type, and patch the config

        spec:
          controller:
            disabled: false
        
          eda:
            disabled: false
        
          hub:
            disabled: false
            storage_type: file
            file_storage_storage_class: wzhlab-top-nfs
            file_storage_size: 10Gi

Get the URL to access the aap platform.

AAP needs a subscription file from the Red Hat portal.

Go to the Red Hat portal and request a trial.

Download the subscription file, and upload it to the aap platform.

The aap installation will then continue.

Set up a credential for OpenShift. This is different from the token used when importing the cluster into ACM.


        # for sno-demo cluster
        
        cd ${BASE_DIR}/data/install
        
        wget https://raw.githubusercontent.com/wangzheng422/docker_env/refs/heads/dev/redhat/ocp4/4.16/files/ansible-sa.yaml
        
        oc new-project aap-namespace
        
        oc apply -f ansible-sa.yaml
        
        oc create token containergroup-service-account --duration=876000h -n aap-namespace
        
        # very long output
        
        # for acm-demo cluster
        
        cd ${BASE_DIR}/data/install
        
        wget https://raw.githubusercontent.com/wangzheng422/docker_env/refs/heads/dev/redhat/ocp4/4.16/files/ansible-sa.yaml
        
        oc new-project aap-namespace
        
        oc apply -f ansible-sa.yaml
        
        oc create token containergroup-service-account --duration=876000h -n aap-namespace
        
        # very long output

Define the credential to connect to openshift cluster:

Set the url and the token generated.

Define a project, which references the source code repository.

Then define the job template.

Set the parameters of the job, such as the target cluster credential, the project (git repo), and the Ansible playbook (its path in the git repo).

gitops source code

In our example, the GitOps code and the Ansible playbook code are in the same repo:

Use upstream k8s_core collection:
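To illustrate how a playbook might use that collection for a heap dump, here is a minimal sketch. It is not the playbook from the repo. The file name, pod name and dump path are hypothetical. It assumes the JVM runs as PID 1, that jcmd and tar exist in the container image, and that the AAP OpenShift credential supplies the cluster connection to the modules.

```shell
# Sketch: write an illustrative playbook that dumps the heap inside a pod
# and copies the file out. All variable values below are hypothetical.
WORK_DIR="${BASE_DIR:-/tmp}/data/install"
mkdir -p "${WORK_DIR}"

cat << 'EOF' > "${WORK_DIR}/heap-dump-sketch.yaml"
- name: Collect a heap dump from a Java pod (illustrative)
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    target_namespace: wzh-demo-01
    target_pod: REPLACE_WITH_POD_NAME
    dump_path: /tmp/heap.hprof
  tasks:
    - name: Trigger the heap dump inside the pod
      kubernetes.core.k8s_exec:
        namespace: "{{ target_namespace }}"
        pod: "{{ target_pod }}"
        command: "jcmd 1 GC.heap_dump {{ dump_path }}"

    - name: Copy the dump file out of the pod
      kubernetes.core.k8s_cp:
        namespace: "{{ target_namespace }}"
        pod: "{{ target_pod }}"
        remote_path: "{{ dump_path }}"
        local_path: "/tmp/{{ target_pod }}-heap.hprof"
        state: from_pod
EOF
```

A real workflow would also need a step that uploads the copied file to the central location, which depends on where the dumps are stored.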

deploy app using gitops

The source code of gitops is in the repo:

We will use argocd push mode, because the pull mode needs additional configuration.

Set the application name, and select the argo server, which runs on the hub cluster. Also switch on the yaml button, you can see the yaml file that will be created.

Select git type, set the github url, branch, and the path to the yaml that will be deployed. And set the target namespace, which will be created on the target ocp cluster.

Set the sync policy, which will be applied to argo cd.

Then set the placement, which tells argo cd which clusters to target.

The placement contains an expression that matches the cluster name, which is sno-demo. You can also select clusters based on other labels.

And match the value with different logic.
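As an illustration of label-based selection with different operators, the sketch below writes a Placement that picks every cluster labelled vendor=OpenShift except the hub itself. The placement name is hypothetical, and it assumes the labels ACM commonly sets on managed clusters (vendor, name) and that the hub is registered as local-cluster.

```shell
# Sketch: a Placement that combines two label expressions (they are ANDed).
# The placement name is hypothetical; label keys are assumed ACM defaults.
WORK_DIR="${BASE_DIR:-/tmp}/data/install"
mkdir -p "${WORK_DIR}"

cat << 'EOF' > "${WORK_DIR}/placement-sketch.yaml"
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: label-based-placement-sketch
  namespace: openshift-gitops
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: vendor
              operator: In
              values:
                - OpenShift
            - key: name
              operator: NotIn
              values:
                - local-cluster
EOF
```

The operators follow the usual Kubernetes label selector set: In, NotIn, Exists and DoesNotExist.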

Here is the yaml file that will be created, for your reference:

        apiVersion: argoproj.io/v1alpha1
        kind: ApplicationSet
        metadata:
          name: java-app-threads
          namespace: openshift-gitops
        spec:
          generators:
            - clusterDecisionResource:
                configMapRef: acm-placement
                labelSelector:
                  matchLabels:
                    cluster.open-cluster-management.io/placement: java-app-threads-placement
                requeueAfterSeconds: 180
          template:
            metadata:
              name: java-app-threads-{{name}}
              labels:
                velero.io/exclude-from-backup: "true"
            spec:
              destination:
                namespace: wzh-demo-01
                server: "{{server}}"
              project: default
              sources:
                - path: gitops/threads
                  repoURL: https://github.com/wangzheng422/demo-acm-app-gitops
                  targetRevision: main
                  repositoryType: git
              syncPolicy:
                automated:
                  prune: true
                  selfHeal: true
                syncOptions:
                  - CreateNamespace=true
                  - PruneLast=true
        ---
        apiVersion: cluster.open-cluster-management.io/v1beta1
        kind: Placement
        metadata:
          name: java-app-threads-placement
          namespace: openshift-gitops
        spec:
          predicates:
            - requiredClusterSelector:
                labelSelector:
                  matchExpressions:
                    - key: name
                      operator: In
                      values:
                        - sno-demo

Now we open argocd to see what happened. We can see that a new application has been created.

Go into the application, and click on the first icon.

You can see it will create the deployment on the target/another cluster.

create job/job template in aap

The ansible job/job template uses ansible playbooks, which are located in this repo:

We create 3 job templates in aap for 3 playbooks in the repo:

We then define a workflow and add the 3 job templates to it. We introduce the workflow here because a job template only works against one ocp cluster, but the use case needs to operate on 2 ocp clusters.

Then we run the workflow, and it completes successfully.

[!TIP] You can define the ansible job and ansible workflow using openshift aap operator’s CR, but it is not recommended right now, as it is not very well documented.

Maintain multi-cluster consistency using policy

So far, we have deployed the application and collected dump files from pods using ansible. The next step is to keep the clusters consistent with each other. We can use policy to do this.

Define the policy name and the namespace that the policy will live in, which is on the acm hub ocp. We use the openshift-gitops namespace, because the default cluster set is bound to this namespace.

Then we define the content of the policy. There are some built-in templates; we will use the policy-namespace template, which creates a namespace on the target cluster.

As we can see, there are several built-in templates. We use the simple one, and then we can see the yaml file that will be created.

Then set the parameter of the namespace template, which is the namespace name.

For cluster-level consistency, we could have the policy enforced automatically, but in the author's experience this is not recommended. It is better to report a warning and let the administrator decide what action to take.

Then, define the placement, which is the target cluster, which is sno-demo.

Then, define some annotations for the policy, which describe the standard that the policy is based on.

Review the configuration, and create the policy.

After the policy is created, we can see the policy in the acm hub cluster. And we can see the policy is applied to the target cluster, a warning is reported.

Now, we can see the detail of the warning, it reports the namespace is not created on the target cluster.

Here is the yaml file that will be created, for your reference. You can see that it defines object-templates, which is the skeleton of the object that will be created on the target cluster.

        apiVersion: policy.open-cluster-management.io/v1
        kind: Policy
        metadata:
          name: must-have-namespace-demo-target
          namespace: openshift-gitops
          annotations:
            policy.open-cluster-management.io/categories: CM Configuration Management
            policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
            policy.open-cluster-management.io/standards: NIST SP 800-53
        spec:
          disabled: false
          policy-templates:
            - objectDefinition:
                apiVersion: policy.open-cluster-management.io/v1
                kind: ConfigurationPolicy
                metadata:
                  name: policy-namespace
                spec:
                  object-templates:
                    - complianceType: musthave
                      objectDefinition:
                        apiVersion: v1
                        kind: Namespace
                        metadata:
                          name: demo-target
                  pruneObjectBehavior: None
                  remediationAction: inform
                  severity: low
        ---
        apiVersion: cluster.open-cluster-management.io/v1beta1
        kind: Placement
        metadata:
          name: must-have-namespace-demo-target-placement
          namespace: openshift-gitops
        spec:
          clusterSets:
            - default
          predicates:
            - requiredClusterSelector:
                labelSelector:
                  matchExpressions:
                    - key: name
                      operator: In
                      values:
                        - sno-demo
          tolerations:
            - key: cluster.open-cluster-management.io/unreachable
              operator: Exists
            - key: cluster.open-cluster-management.io/unavailable
              operator: Exists
        ---
        apiVersion: policy.open-cluster-management.io/v1
        kind: PlacementBinding
        metadata:
          name: must-have-namespace-demo-target-placement
          namespace: openshift-gitops
        placementRef:
          name: must-have-namespace-demo-target-placement
          apiGroup: cluster.open-cluster-management.io
          kind: Placement
        subjects:
          - name: must-have-namespace-demo-target
            apiGroup: policy.open-cluster-management.io
            kind: Policy

using policy to enforce prometheus alert rule

We now use policy to enforce a prometheus alert rule. Here is the prometheus rule example:

        apiVersion: monitoring.coreos.com/v1
        kind: PrometheusRule
        metadata:
          name: wzh-cpu-alerts
          namespace: openshift-monitoring  # Ensure this is the correct namespace for your setup
        spec:
          groups:
            - name: cpu-alerts
              rules:
                - alert: HighCpuUsage
                  expr: sum(rate(container_cpu_usage_seconds_total{container!="POD"}[5m])) by (pod) > 0.8
                  for: 5m
                  labels:
                    severity: warning
                  annotations:
                    summary: "High CPU usage detected"
                    description: "Pod {{ $labels.pod }} is using more than 80% CPU for the last 5 minutes."

Before configuring this in acm, we need to wrap it in a policy ourselves, because the built-in acm policy templates do not include one for prometheus rules.

        apiVersion: policy.open-cluster-management.io/v1
        kind: Policy
        metadata:
          name: must-have-prometheus-alert-rule
          namespace: policies
          annotations:
            policy.open-cluster-management.io/categories: CM Configuration Management
            policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
            policy.open-cluster-management.io/standards: NIST SP 800-53
        spec:
          disabled: false
          remediationAction: enforce
          policy-templates:
            - objectDefinition:
                apiVersion: policy.open-cluster-management.io/v1
                kind: ConfigurationPolicy
                metadata:
                  name: policy-alert-rule
                spec:
                  object-templates:
                    - complianceType: musthave
                      objectDefinition:
                        apiVersion: monitoring.coreos.com/v1
                        kind: PrometheusRule
                        metadata:
                          name: wzh-cpu-alerts
                          namespace: openshift-monitoring  # Ensure this is the correct namespace for your setup
                        spec:
                          groups:
                            - name: cpu-alerts
                              rules:
                                - alert: HighCpuUsage
                                  expr: sum(rate(container_cpu_usage_seconds_total{container!="POD"}[5m])) by (pod) > 0.8
                                  for: 5m
                                  labels:
                                    severity: warning
                                  annotations:
                                    summary: "High CPU usage detected"
                                    description: "Pod {{`{{$labels.pod}}`}} is using more than 80% CPU for the last 5 minutes."
                  pruneObjectBehavior: DeleteIfCreated
                  remediationAction: enforce
                  severity: low

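Please note that we set pruneObjectBehavior: DeleteIfCreated, so if the policy is deleted, the prometheus rule it created is deleted as well.

We also write the label reference as {{`{{$labels.pod}}`}}. ACM processes policy templates with Go templating, so this raw-string wrapper makes ACM emit the literal {{$labels.pod}}, which prometheus then expands to the pod label when the alert fires.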
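When converting an existing rule file, the wrapping can be scripted. The following is a small sketch with a hypothetical function name. It assumes the input has not been escaped yet (running it twice would wrap the expression twice) and that no template expression contains a closing brace of its own.

```shell
# Sketch: wrap every {{ ... }} expression read from stdin in a Go raw string,
# so that policy template processing passes it through unchanged.
# The function name is hypothetical.
escape_for_acm_policy() {
  sed -E 's/\{\{([^}]*)\}\}/{{`{{\1}}`}}/g'
}

# Usage with a rule file of your own:
# escape_for_acm_policy < prometheus-rule.yaml > prometheus-rule-escaped.yaml
```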

Here is how to create using webUI:

  1. navigate to governance -> policies -> create policy

  2. set the policy name, and namespace

  3. copy the content of policy-template from above example, and select enforce. You can see the prune policy is set to DeleteIfCreated

  4. select the placement.

  5. finally, the policy is deployed. And the prometheus rule is created. So the policy is compliant.
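The placement chosen in step 4 can also be written as manifests, following the same pattern as the namespace policy earlier. This is a sketch: the placement and binding names are hypothetical, and it assumes that the default cluster set is bound to the policies namespace, which this walkthrough does not confirm.

```shell
# Sketch: Placement and PlacementBinding for the prometheus rule policy.
# The names ending in "-placement" are hypothetical; the policy name,
# namespace and target cluster are taken from the examples above.
WORK_DIR="${BASE_DIR:-/tmp}/data/install"
mkdir -p "${WORK_DIR}"

cat << 'EOF' > "${WORK_DIR}/prometheus-rule-placement.yaml"
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: must-have-prometheus-alert-rule-placement
  namespace: policies
spec:
  clusterSets:
    - default
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: name
              operator: In
              values:
                - sno-demo
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: must-have-prometheus-alert-rule-placement
  namespace: policies
placementRef:
  name: must-have-prometheus-alert-rule-placement
  apiGroup: cluster.open-cluster-management.io
  kind: Placement
subjects:
  - name: must-have-prometheus-alert-rule
    apiGroup: policy.open-cluster-management.io
    kind: Policy
EOF

# Review the file, then apply it on the hub cluster:
# oc apply -f "${WORK_DIR}/prometheus-rule-placement.yaml"
```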

when to use policy and when to use application

We now have 2 ways to deploy yaml to ocp: application and policy.

So when to use policy and when to use application?

In general, use application to deploy applications, and use policy to enforce cluster-wide configuration. If your yaml has no namespace, it is better to use policy, because the config is cluster-scoped. If your yaml has a namespace, it is better to use application, because the config is namespace-scoped.

Sometimes, however, your yaml configures an operator and is effectively cluster-wide even though it carries a namespace. In that case you can still use policy to deploy it. The prometheus rule example above is such a case: it applies cluster-wide, but it has a namespace in the yaml.

end