OpenShift4, Step by Step

This repository collects the author's day-to-day technical notes. The author regularly works hands-on with many systems — PoCs, validation of new products, and solution exploration — covering operating system installation, IaaS and PaaS platform builds, middleware validation, and application development and testing. Many of these procedures are long and fiddly, so a central place is needed to record them, keep them organized, and share them online right away.

The author also built a Chrome extension that shows Bing's daily image on the new-tab page. It is simple and clean — you are welcome to try it.

The author also publishes many demo videos; you are welcome to subscribe to the channel.

License

The code in this book is licensed under GNU GPL v3.

Copyright Notice

This book is published under the CC-BY-NC-SA 4.0 license. Commercial reproduction requires the prior authorization of the author, wangzheng422. When reproducing, always credit the source. The author reserves the right of final interpretation and the right to pursue legal remedies.

Get the OpenShift4 pull secret for free

4.6 disconnected installation, media preparation

The steps in this document are best performed on a VPS in the US; the result is then packaged and transferred back.

The steps to prepare the offline installation source are:

  • Prepare the operator hub catalog; what we mainly need from it is the date stamp inside
  • Run the script that builds the offline installation source

environment preparation

# on vultr
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

yum -y install htop byobu ethtool dstat

rm -rf /data/ocp4
mkdir -p /data/ocp4
cd /data/ocp4

yum -y install podman docker-distribution pigz skopeo docker buildah jq python3-pip git python36

pip3 install yq

# https://blog.csdn.net/ffzhihua/article/details/85237411
# wget http://mirror.centos.org/centos/7/os/x86_64/Packages/python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm
# rpm2cpio python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm | cpio -iv --to-stdout ./etc/rhsm/ca/redhat-uep.pem | tee /etc/rhsm/ca/redhat-uep.pem

systemctl enable --now docker

# systemctl start docker

docker login -u ****** -p ******** registry.redhat.io
docker login -u ****** -p ******** registry.access.redhat.com
docker login -u ****** -p ******** registry.connect.redhat.com

podman login -u ****** -p ******** registry.redhat.io
podman login -u ****** -p ******** registry.access.redhat.com
podman login -u ****** -p ******** registry.connect.redhat.com

# to download the pull-secret.json, open following link
# https://cloud.redhat.com/openshift/install/metal/user-provisioned
cat << 'EOF' > /data/pull-secret.json
{"auths":{"cloud.openshift.com":*********************
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.redhat.ren
EOF

# configure the registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data/ocp4
# systemctl stop docker-distribution

/bin/rm -rf /data/registry
mkdir -p /data/registry
cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/registry
    delete:
        enabled: true
http:
    addr: :5443
    tls:
       certificate: /etc/crts/redhat.ren.crt
       key: /etc/crts/redhat.ren.key
compatibility:
  schema1:
    enabled: true
EOF
# systemctl restart docker
# systemctl enable docker-distribution

# systemctl restart docker-distribution

# podman login registry.redhat.ren:5443 -u a -p a

systemctl enable --now docker-distribution
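
# As a quick sanity check (not part of the original flow), confirm the registry
# answers over TLS before mirroring anything into it; the catalog is simply
# empty at this point. Add -k if the CA has not been trusted yet.
curl -s https://registry.redhat.ren:5443/v2/_catalog
# {"repositories":[]}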

operator hub catalog

mkdir -p /data/ocp4
cd /data/ocp4

export BUILDNUMBER=4.6.28

wget -O openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
wget -O openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/

wget -O operator.sh https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/scripts/operator.sh

bash operator.sh

# 2021.05.07.0344

build the offline installation source

rm -rf /data/ocp4
mkdir -p /data/ocp4
cd /data/ocp4
# wget -O build.dist.sh https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/scripts/build.dist.sh

# bash build.dist.sh

wget -O prepare.offline.content.sh https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/scripts/prepare.offline.content.sh

# git clone https://github.com/wangzheng422/docker_env.git
# cd docker_env
# git checkout dev
# cp redhat/ocp4/4.6/scripts/prepare.offline.content.sh /data/ocp4/
# cd /data/ocp4
# rm -rf docker_env

bash prepare.offline.content.sh -v 4.6.28 -m 4.6 -h 2021.05.07.0344

output of the image mirroring

Success
Update image:  registry.redhat.ren:5443/ocp4/openshift4:4.6.5
Mirror prefix: registry.redhat.ren:5443/ocp4/openshift4

To use the new mirrored repository to install, add the following section to the install-config.yaml:

imageContentSources:
- mirrors:
  - registry.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev


To use the new mirrored repository for upgrades, use the following to create an ImageContentSourcePolicy:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example
spec:
  repositoryDigestMirrors:
  - mirrors:
    - registry.redhat.ren:5443/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - registry.redhat.ren:5443/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

########################################
##
Success
Update image:  openshift/release:4.3.3

To upload local images to a registry, run:

    oc image mirror --from-dir=/data/mirror_dir file://openshift/release:4.3.3* REGISTRY/REPOSITORY


download images for components

########################################
# your images
cd /data/ocp4/
export MIRROR_DIR='/data/install.image'
/bin/rm -rf ${MIRROR_DIR}
bash add.image.sh install.image.list ${MIRROR_DIR}

export MIRROR_DIR='/data/poc.image'
/bin/rm -rf ${MIRROR_DIR}
bash add.image.sh poc.image.list ${MIRROR_DIR}

########################################
# common function: from a two-column list file (<image reference> <operator bundle>),
# pick the newest bundle of the given operator and append its images to the output file
build_image_list() {
  VAR_INPUT_FILE=$1    # input list, e.g. redhat-operator-image.list
  VAR_OUTPUT_FILE=$2   # output mapping list, later fed to add.image.sh
  VAR_OPERATOR=$3      # operator name to filter on

  # newest bundle of this operator (2nd column, sorted, last entry wins)
  VAR_FINAL=`cat $VAR_INPUT_FILE | grep $VAR_OPERATOR | awk '{if ($2) print $2;}' | sort | uniq | tail -1`

  echo $VAR_FINAL

  # append the image references (1st column) that belong to that bundle
  cat $VAR_INPUT_FILE | grep $VAR_FINAL | awk '{if ($2) print $1;}' >> $VAR_OUTPUT_FILE
}
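
# To make the expected list format concrete, here is a small, purely hypothetical
# example of how build_image_list behaves. The image references and the operator
# name below are made up; the real *-image.list files are produced by operator.sh.
cat << 'EOF' > /tmp/demo-image.list
registry.example.com/foo/bar@sha256:111 demo-operator.v1.0.0
registry.example.com/foo/bar@sha256:222 demo-operator.v1.2.0
registry.example.com/foo/baz@sha256:333 demo-operator.v1.2.0
EOF

# keep only the images belonging to the newest bundle of "demo-operator"
build_image_list /tmp/demo-image.list /tmp/demo-mapping.list demo-operator
# demo-operator.v1.2.0

cat /tmp/demo-mapping.list
# registry.example.com/foo/bar@sha256:222
# registry.example.com/foo/baz@sha256:333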

########################################
# redhat operator hub
export MIRROR_DIR='/data/redhat-operator'

/bin/rm -rf ${MIRROR_DIR}
/bin/rm -f /data/ocp4/mapping-redhat.list
wanted_operator_list=$(cat redhat-operator-image.list | awk '{if ($2) print $2;}' \
  | sed 's/\..*//g' | sort | uniq
)

while read -r line; do
    build_image_list '/data/ocp4/redhat-operator-image.list' '/data/ocp4/mapping-redhat.list' $line
done <<< "$wanted_operator_list"

bash add.image.sh mapping-redhat.list ${MIRROR_DIR}

# /bin/cp -f pull.add.image.failed.list pull.add.image.failed.list.bak
# bash add.image.resume.sh pull.add.image.failed.list.bak ${MIRROR_DIR}

cd ${MIRROR_DIR%/*}
tar cf - ${MIRROR_DIR##*/}/ | pigz -c > ${MIRROR_DIR##*/}.tgz

# to load image back
bash add.image.load.sh '/data/redhat-operator' 'registry.redhat.ren:5443'

######################################
# certified operator hub
export MIRROR_DIR='/data/certified-operator'

/bin/rm -rf ${MIRROR_DIR}
/bin/rm -f /data/ocp4/mapping-certified.list
wanted_operator_list=$(cat certified-operator-image.list | awk '{if ($2) print $2;}' \
  | sed 's/\..*//g' | sort | uniq
)

while read -r line; do
    build_image_list '/data/ocp4/certified-operator-image.list' '/data/ocp4/mapping-certified.list' $line
done <<< "$wanted_operator_list"

bash add.image.sh mapping-certified.list ${MIRROR_DIR}

# /bin/cp -f pull.add.image.failed.list pull.add.image.failed.list.bak
# bash add.image.resume.sh pull.add.image.failed.list.bak ${MIRROR_DIR}

cd ${MIRROR_DIR%/*}
tar cf - ${MIRROR_DIR##*/}/ | pigz -c > ${MIRROR_DIR##*/}.tgz

# bash add.image.sh mapping-certified.txt

#######################################
# community operator hub
export MIRROR_DIR='/data/community-operator'

/bin/rm -rf ${MIRROR_DIR}
/bin/rm -f /data/ocp4/mapping-community.list
wanted_operator_list=$(cat community-operator-image.list | awk '{if ($2) print $2;}' \
  | sed 's/\..*//g' | sort | uniq
)

while read -r line; do
    build_image_list '/data/ocp4/community-operator-image.list' '/data/ocp4/mapping-community.list' $line
done <<< "$wanted_operator_list"

bash add.image.sh mapping-community.list ${MIRROR_DIR}

# /bin/cp -f pull.add.image.failed.list pull.add.image.failed.list.bak
# bash add.image.resume.sh pull.add.image.failed.list.bak ${MIRROR_DIR}

cd ${MIRROR_DIR%/*}
tar cf - ${MIRROR_DIR##*/}/ | pigz -c > ${MIRROR_DIR##*/}.tgz

# bash add.image.sh mapping-community.txt

# to load image back
bash add.image.load.sh '/data/community-operator' 'registry.redhat.ren:5443'

#####################################
# samples operator
export MIRROR_DIR='/data/is.samples'

/bin/rm -rf ${MIRROR_DIR}
bash add.image.sh is.openshift.list  ${MIRROR_DIR}


image registry proxy

Preparing an offline image registry is very tedious. Fortunately we have an internet-connected host available, so we can build an image registry proxy with nexus: run the PoC once in the connected environment, and the offline images can then all be obtained from the image registry proxy.

  • https://mtijhof.wordpress.com/2018/07/23/using-nexus-oss-as-a-proxy-cache-for-docker-images/
#####################################################
# init build the nexus fs
mkdir -p /data/ccn/nexus-image
chown -R 200 /data/ccn/nexus-image

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.29.0

podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

podman stop nexus-image
podman rm nexus-image

# get the admin password
cat /data/ccn/nexus-image/admin.password && echo
# 84091bcd-c82f-44a3-8b7b-dfc90f5b7da1

# open http://nexus.ocp4.redhat.ren:8082

# enable https
# https://blog.csdn.net/s7799653/article/details/105378645
# https://help.sonatype.com/repomanager3/system-configuration/configuring-ssl#ConfiguringSSL-InboundSSL-ConfiguringtoServeContentviaHTTPS
mkdir -p /data/install/tmp
cd /data/install/tmp

# export the certificate in PKCS#12 format
# you will be prompted for an export password: use 'password'
openssl pkcs12 -export -out keystore.pkcs12 -inkey /etc/crts/redhat.ren.key -in /etc/crts/redhat.ren.crt

cat << EOF >> Dockerfile
FROM docker.io/sonatype/nexus3:3.29.0
USER root
COPY keystore.pkcs12 /keystore.pkcs12
RUN keytool -v -importkeystore -srckeystore keystore.pkcs12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS -storepass password -srcstorepass password  &&\
    cp keystore.jks /opt/sonatype/nexus/etc/ssl/
USER nexus
EOF
buildah bud --format=docker -t docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh -f Dockerfile .
buildah push docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

######################################################
# go to helper, update proxy setting for ocp cluster
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

mkdir -p /etc/containers/registries.conf.d
/bin/cp -f image.registries.conf /etc/containers/registries.conf.d/

cd /data/ocp4
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

######################################################
# dump the nexus image fs out
podman stop nexus-image

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus-image | pigz -c > nexus-image.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus-image.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:image-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
buildah push docker.io/wangzheng422/nexus-fs:image-$var_date
echo "docker.io/wangzheng422/nexus-fs:image-$var_date"

# The build below can serve as the initial image proxy; it already contains the nfs provisioner and the sample operator metadata. A pleasant discovery: image streams do not pull the full images up front, apparently only the metadata, and the actual image is downloaded only when it is first used.
# docker.io/wangzheng422/nexus-fs:image-2020-12-26-1118

##################################################
## call nexus api to get image list
# https://community.sonatype.com/t/how-can-i-get-a-list-of-tags-for-a-docker-image-akin-to-the-docker-hub-list/3210
# https://help.sonatype.com/repomanager3/rest-and-integration-api/search-api
curl -k -u admin:84091bcd-c82f-44a3-8b7b-dfc90f5b7da1 -X GET 'http://nexus.ocp4.redhat.ren:8082/service/rest/v1/search?repository=registry.redhat.io'

curl -u admin:84091bcd-c82f-44a3-8b7b-dfc90f5b7da1 -X GET 'http://nexus.ocp4.redhat.ren:8082/service/rest/v1/components?repository=registry.redhat.io'

podman pull docker.io/anoxis/registry-cli
podman run --rm anoxis/registry-cli -l admin:84091bcd-c82f-44a3-8b7b-dfc90f5b7da1 -r https://nexus.ocp4.redhat.ren:8083

# https://github.com/rpardini/docker-registry-proxy

REPO_URL=nexus.ocp4.redhat.ren:8083

curl -k -s -X GET https://$REPO_URL/v2/_catalog \
 | jq '.repositories[]' \
 | sort \
 | xargs -I _ curl -s -k -X GET https://$REPO_URL/v2/_/tags/list



##################################################
## prepare for baidu disk
mkdir -p /data/ccn/baidu
cd /data/ccn

tar cf - ./nexus-image | pigz -c > /data/ccn/baidu/nexus-image.tgz 

cd /data/ccn/baidu
split -b 20000m nexus-image.tgz  nexus-image.tgz.
rm -f nexus-image.tgz

yum -y install python3-pip
pip3 install --user bypy 
/root/.local/bin/bypy list
/root/.local/bin/bypy upload

upload to baidu disk

export BUILDNUMBER=4.6.28

mkdir -p /data/bypy
cd /data
tar -cvf - ocp4/ | pigz -c > /data/bypy/ocp.$BUILDNUMBER.tgz
tar -cvf - registry/ | pigz -c > /data/bypy/registry.$BUILDNUMBER.tgz

cd /data/bypy
# https://github.com/houtianze/bypy
yum -y install python3-pip
pip3 install --user bypy 
/root/.local/bin/bypy list
/root/.local/bin/bypy upload


openshift 4.9 single node, assisted install mode, without dhcp, connected

This article describes how to use the assisted service to install a single-node OpenShift 4 cluster. The special part: by default, OpenShift 4 expects a DHCP service on the network so that a booting node can obtain an IP address, pull container images, and talk to the assisted service to fetch its configuration. Most customer networks do not allow DHCP, however, so here we use a capability of the assisted service that is still hidden to deploy in static IP mode.

The customer environment/requirements assumed in this lab are:

  1. The lab network has no DHCP
  2. The lab network can reach the internet
  3. There are 2 hosts in the lab environment
  4. Single-node OpenShift 4 (baremetal mode) will be installed on one of the hosts

Because of the limits of the author's lab, KVM guests stand in for bare metal.

The installation flow is roughly:

  1. Start the helper VM and configure a DNS service on it
  2. Start the local assisted service
  3. Configure the cluster in the assisted service
  4. Download the ISO from the assisted service
  5. Boot the KVM guest / bare metal host from the ISO
  6. Finish the configuration in the assisted service and start the installation
  7. Watch and wait for the installation to finish
  8. Collect the OpenShift 4 credentials and log in to the cluster.

The architecture diagram for this lab:

deploy dns

In assisted install mode with static IPs, a DNS service is needed on the lab network. Since we deploy single node openshift, it is enough to point the following 4 names at the same IP address; of course, you need to decide on the domain in advance. A minimal dnsmasq sketch follows the list.

  • api.ocp4s.redhat.ren
  • api-int.ocp4s.redhat.ren
  • *.apps.ocp4s.redhat.ren
  • ocp4-sno.ocp4s.redhat.ren
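
One possible way to publish these records — a minimal sketch assuming dnsmasq on the helper node and the node IP 172.21.6.13 used later in this lab; adapt the names and IP to your environment:

yum -y install dnsmasq

cat << EOF > /etc/dnsmasq.d/ocp4s.conf
# host and wildcard records for the single node cluster
address=/api.ocp4s.redhat.ren/172.21.6.13
address=/api-int.ocp4s.redhat.ren/172.21.6.13
address=/.apps.ocp4s.redhat.ren/172.21.6.13
address=/ocp4-sno.ocp4s.redhat.ren/172.21.6.13
EOF

systemctl enable --now dnsmasq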

deploy the assisted install service

There are two flavors of the assisted install service: the hosted one on cloud.redhat.com and a local one. They provide the same functionality; since we need to customize it, we use the local version.

# https://github.com/openshift/assisted-service/blob/master/docs/user-guide/assisted-service-on-local.md

# https://github.com/openshift/assisted-service/tree/master/deploy/podman

podman version
# Version:      3.4.2
# API Version:  3.4.2
# Go Version:   go1.16.12
# Built:        Wed Feb  2 07:59:28 2022
# OS/Arch:      linux/amd64

mkdir -p /data/assisted-service/
cd /data/assisted-service/

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

wget https://raw.githubusercontent.com/openshift/assisted-service/master/deploy/podman/configmap.yml
wget https://raw.githubusercontent.com/openshift/assisted-service/master/deploy/podman/pod.yml

unset http_proxy
unset https_proxy

sed -i 's/ SERVICE_BASE_URL:.*/ SERVICE_BASE_URL: "http:\/\/172.21.6.103:8090"/' configmap.yml

# start the local assisted service
podman play kube --configmap configmap.yml pod.yml

# use the following commands to stop/remove the local assisted service
cd /data/assisted-service/
podman play kube --down pod.yml

⚠️ Note: the local assisted service downloads the ISOs of several versions from mirror.openshift.com, about 6GB in total. Wait for the download to finish.

podman exec assisted-installer-image-service du -h /data
# 6.3G    /data

Once the service is running, open the following url

http://172.21.6.103:8080

create cluster

Open the local assisted install service and create a cluster.

Fill in the basic cluster information.

Paste your own pull-secret and click next.

On the next page, click add host.

Click generate discovery iso directly; we will customize the ssh key later, so nothing needs to be configured here.

Write down the download command, because we need the infra env id inside it (it can also be listed through the REST API; see the sketch after the command).

The command in our case is

wget -O discovery_image_ocp4s.iso 'http://127.0.0.1:8888/images/78506b3c-46e4-47f7-8a18-ec1ca4baa3b9?arch=x86_64&type=full-iso&version=4.9'
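
The infra env id can also be read back from the assisted service REST API instead of copying it out of the UI — a sketch, assuming the same service URL that is used later in this section:

ASSISTED_SERVICE_URL=http://172.21.6.103:8080
# list all infra-envs with their ids and names
curl -s ${ASSISTED_SERVICE_URL}/api/assisted-install/v2/infra-envs \
  | jq -r '.[] | [.id, .name] | @tsv'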

customize the assisted install service configuration

The ISO created by the assisted install service assumes the target network provides DHCP. We want static IPs, so we customize the assisted install service and activate a capability that is still hidden (not yet officially supported).

# on helper
cd /data/sno

SNO_IP=172.21.6.13
SNO_GW=172.21.6.254
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=ocp4-sno
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=172.21.1.1
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

echo ${SNO_IF_MAC} > /data/sno/sno.mac

ASSISTED_SERVICE_URL=http://172.21.6.103:8080
# infra id is part of download url on UI
INFRA_ENV_ID=78506b3c-46e4-47f7-8a18-ec1ca4baa3b9
NODE_SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"
request_body=$(mktemp)

cat << EOF > /data/sno/server-a.yaml
dns-resolver:
  config:
    server:
    - ${SNO_DNS}
interfaces:
- ipv4:
    address:
    - ip: ${SNO_IP}
      prefix-length: ${SNO_NETMAST_S}
    dhcp: false
    enabled: true
  name: ${SNO_IF}
  state: up
  type: ethernet
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: ${SNO_GW}
    next-hop-interface: ${SNO_IF}
    table-id: 254
EOF

cat << EOF > /data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip
storage:
  files:
    - path: /etc/NetworkManager/system-connections/${SNO_IF}.nmconnection
      overwrite: true
      contents:
        inline: |
          [connection]
          id=${SNO_IF}
          type=ethernet
          autoconnect-retries=1
          interface-name=${SNO_IF}
          multi-connect=1
          permissions=
          wait-device-timeout=60000

          [ethernet]
          mac-address-blacklist=

          [ipv4]
          address1=${SNO_IP}/${SNO_NETMAST_S=24},${SNO_GW}
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns=${SNO_DNS};
          dns-search=
          may-fail=false
          method=manual

          [ipv6]
          addr-gen-mode=eui64
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns-search=
          method=disabled

          [proxy]

EOF

# https://access.redhat.com/solutions/6194821
# butane /data/sno/static.ip.bu | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))'


# https://stackoverflow.com/questions/2854655/command-to-escape-a-string-in-bash
# VAR_PULL_SEC=`printf "%q" $(cat  /data/pull-secret.json)`
tmppath=$(mktemp)
butane /data/sno/static.ip.bu | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' | jq -c '.spec.config | .ignition.version = "3.1.0" ' > ${tmppath}
VAR_NMSTATIC=$(cat ${tmppath})
# rm -f ${tmppath}

jq -n --arg SSH_KEY "$NODE_SSH_KEY" \
  --arg NMSTATE_YAML1 "$(cat server-a.yaml)" \
  --arg MAC_ADDR "$(cat /data/sno/sno.mac)" \
  --arg PULL_SEC "$(cat  /data/pull-secret.json)" \
  --arg NMSTATIC "${VAR_NMSTATIC}" \
'{
    "proxy":{"http_proxy":"","https_proxy":"","no_proxy":""},
    "ssh_authorized_key":$SSH_KEY,
    "pull_secret":$PULL_SEC,
    "image_type":"full-iso",
    "ignition_config_override":$NMSTATIC,
    "static_network_config": [
      {
        "network_yaml": $NMSTATE_YAML1,
        "mac_interface_map": [{"mac_address": $MAC_ADDR, "logical_nic_name": "enp1s0"}]
      }
    ]
}' > $request_body

# review the request body we just built
cat $request_body

# send the request to the assisted install service to apply the customization
curl -H "Content-Type: application/json" -X PATCH -d @$request_body ${ASSISTED_SERVICE_URL}/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID
# {"cluster_id":"850934fd-fa64-4057-b9d2-1eeebd890e1a","cpu_architecture":"x86_64","created_at":"2022-02-11T03:54:46.632598Z","download_url":"http://127.0.0.1:8888/images/89cc84a1-2dfd-4d7e-9ca3-903342c40d60?arch=x86_64&type=full-iso&version=4.9","email_domain":"Unknown","expires_at":"0001-01-01T00:00:00.000Z","href":"/api/assisted-install/v2/infra-envs/89cc84a1-2dfd-4d7e-9ca3-903342c40d60","id":"89cc84a1-2dfd-4d7e-9ca3-903342c40d60","kind":"InfraEnv","name":"ocp4s_infra-env","openshift_version":"4.9","proxy":{"http_proxy":"","https_proxy":"","no_proxy":""},"pull_secret_set":true,"ssh_authorized_key":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCrkO4oLIFTwjkGON+aShlQRKwXHOf3XKrGDmpb+tQM3UcbsF2U7klsr9jBcGObQMZO7KBW8mlRu0wC2RxueBgjbqvylKoFacgVZg6PORfkclqE1gZRYFwoxDkLo2c5y5B7OhcAdlHO0eR5hZ3/0+8ZHZle0W+A0AD7qqowO2HlWLkMMt1QXFD7R0r6dzTs9u21jASGk3jjYgCOw5iHvqm2ueVDFAc4yVwNZ4MXKg5MRvqAJDYPqhaRozLE60EGIziy9SRj9HWynyNDncCdL1/IBK2z9T0JwDebD6TDNcPCtL+AeKIpaHed52PkjnFf+Q+8/0Z0iXt6GyFYlx8OkxdsiMgMxiXx43yIRaWZjx54kVtc9pB6CL50UKPQ2LjuFPIZSfaCab5KDgPRtzue82DE6Mxxg4PS+FTW32/bq1WiOxCg9ABrZ0n1CGaZWFepJkSw47wodMnvlBkcKY3Rn/SsLZVOUsJysd+b08LQgl1Fr3hjVrEQMLbyU0UxvoerYfk= root@ocp4-helper","static_network_config":"dns-resolver:\n  config:\n    server:\n    - 172.21.1.1\ninterfaces:\n- ipv4:\n    address:\n    - ip: 172.21.6.13\n      prefix-length: 24\n    dhcp: false\n    enabled: true\n  name: enp1s0\n  state: up\n  type: ethernet\nroutes:\n  config:\n  - destination: 0.0.0.0/0\n    next-hop-address: 172.21.6.254\n    next-hop-interface: enp1s0\n    table-id: 254HHHHH00:60:2F:8B:42:88=enp1s0","type":"full-iso","updated_at":"2022-02-11T04:01:14.008388Z","user_name":"admin"}

# on helper
cd /data/sno/
wget -O discovery_image_ocp4s.iso "http://172.21.6.103:8888/images/${INFRA_ENV_ID}?arch=x86_64&type=full-iso&version=4.9"

# coreos-installer iso kargs modify -a \
#   " ip=${SNO_IP}::${SNO_GW}:${SNO_NETMAST}:${SNO_HOSTNAME}:${SNO_IF}:none nameserver=${SNO_DNS}" \
#   /data/sno/discovery_image_ocp4s.iso

/bin/mv -f /data/sno/discovery_image_ocp4s.iso /data/sno/sno.iso

start kvm

We go back to the KVM host, start the guest, and begin the single node openshift installation.

# back to kvm host

create_lv() {
    var_vg=$1
    var_lv=$2
    var_size=$3
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L $var_size -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

create_lv vgdata lvsno 120G

export KVM_DIRECTORY=/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/sno/sno.* ${KVM_DIRECTORY}/

# on kvm host
# export KVM_DIRECTORY=/data/kvm
virt-install --name=ocp4-sno --vcpus=16 --ram=65536 \
--cpu=host-model \
--disk path=/dev/vgdata/lvsno,device=disk,bus=virtio,format=raw \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=$(<sno.mac) \
--graphics vnc,port=59012 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/sno.iso
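
Optionally, before going back to the web UI, confirm from the KVM host that the guest is up and carries the MAC address we generated — a minimal sketch:

# the domain should show up as running
virsh list --all

# the vNIC should carry the MAC from sno.mac
virsh domiflist ocp4-sno

# the installer console can also be watched over VNC on port 59012, as configured above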

configure the sno parameters in the assisted install service

Back in the assisted install service webUI, the node shows up as discovered.

Click next and configure the machine network subnet.

Click next and review the cluster configuration.

Start the installation; from here on we just wait (progress can also be followed through the REST API, see the sketch below).

After a while, typically 20 to 30 minutes with a reasonably good network, the installation completes.

⚠️ Do not forget to download the cluster certificates and the webUI username and password.
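
Besides the web UI, the installation progress can be followed through the assisted service REST API — a sketch, assuming the same ASSISTED_SERVICE_URL as above:

# overall cluster state (insufficient / ready / preparing-for-installation / installing / installed ...)
curl -s ${ASSISTED_SERVICE_URL}/api/assisted-install/v2/clusters \
  | jq -r '.[] | [.name, .status, .status_info] | @tsv'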

access the sno cluster

# back to helper
# copy kubeconfig from web browser to /data/sno
export KUBECONFIG=/data/sno/auth/kubeconfig

oc get node
# NAME       STATUS   ROLES           AGE   VERSION
# ocp4-sno   Ready    master,worker   71m   v1.22.3+e790d7f

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.9.18    True        False         False      54m
# baremetal                                  4.9.18    True        False         False      58m
# cloud-controller-manager                   4.9.18    True        False         False      63m
# cloud-credential                           4.9.18    True        False         False      68m
# cluster-autoscaler                         4.9.18    True        False         False      59m
# config-operator                            4.9.18    True        False         False      69m
# console                                    4.9.18    True        False         False      54m
# csi-snapshot-controller                    4.9.18    True        False         False      68m
# dns                                        4.9.18    True        False         False      58m
# etcd                                       4.9.18    True        False         False      62m
# image-registry                             4.9.18    True        False         False      55m
# ingress                                    4.9.18    True        False         False      57m
# insights                                   4.9.18    True        False         False      63m
# kube-apiserver                             4.9.18    True        False         False      58m
# kube-controller-manager                    4.9.18    True        False         False      61m
# kube-scheduler                             4.9.18    True        False         False      60m
# kube-storage-version-migrator              4.9.18    True        False         False      68m
# machine-api                                4.9.18    True        False         False      59m
# machine-approver                           4.9.18    True        False         False      60m
# machine-config                             4.9.18    True        False         False      63m
# marketplace                                4.9.18    True        False         False      68m
# monitoring                                 4.9.18    True        False         False      54m
# network                                    4.9.18    True        False         False      68m
# node-tuning                                4.9.18    True        False         False      64m
# openshift-apiserver                        4.9.18    True        False         False      55m
# openshift-controller-manager               4.9.18    True        False         False      60m
# openshift-samples                          4.9.18    True        False         False      57m
# operator-lifecycle-manager                 4.9.18    True        False         False      60m
# operator-lifecycle-manager-catalog         4.9.18    True        False         False      60m
# operator-lifecycle-manager-packageserver   4.9.18    True        False         False      58m
# service-ca                                 4.9.18    True        False         False      68m
# storage                                    4.9.18    True        False         False      63m

access the cluster webUI

https://console-openshift-console.apps.ocp4s.redhat.ren/

Username / password: kubeadmin / 3QS3M-HA3Px-376HD-bvfif

reference

https://github.com/openshift/assisted-service/tree/master/docs/user-guide

  • https://access.redhat.com/solutions/6135171
  • https://github.com/openshift/assisted-service/blob/master/docs/user-guide/assisted-service-on-local.md
  • https://github.com/openshift/assisted-service/blob/master/docs/user-guide/restful-api-guide.md

search

  • pre-network-manager-config.sh
  • /Users/wzh/Desktop/dev/assisted-service/internal/constants/scripts.go
  • NetworkManager

https://superuser.com/questions/218340/how-to-generate-a-valid-random-mac-address-with-bash-shell

end


cat << EOF > test
02:00:00:2c:23:a5=enp1s0
EOF
cat test | cut -d= -f1 | tr '[:lower:]' '[:upper:]'

printf '00-60-2F-%02X-%02X-%02X\n' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]
virsh domifaddr freebsd11.1

openshift 4.9 single node, assisted install mode, without dhcp, disconnected

This article describes how to use the assisted service to install a single-node OpenShift 4 cluster. The special part: by default, OpenShift 4 expects a DHCP service on the network so that a booting node can obtain an IP address, pull container images, and talk to the assisted service to fetch its configuration. Most customer networks do not allow DHCP, however, so here we use a capability of the assisted service that is still hidden to deploy in static IP mode.

The customer environment/requirements assumed in this lab are:

  1. The lab network has no DHCP
  2. The lab network cannot reach the internet
  3. There are 2 hosts in the lab environment
  4. Single-node OpenShift 4 (baremetal mode) will be installed on one of the hosts

Because of the limits of the author's lab, KVM guests stand in for bare metal.

The installation flow is roughly:

  1. Start the helper VM and configure a DNS service on it
  2. Start the local assisted service
  3. Configure the cluster in the assisted service
  4. Download the ISO from the assisted service
  5. Boot the KVM guest / bare metal host from the ISO
  6. Finish the configuration in the assisted service and start the installation
  7. Watch and wait for the installation to finish
  8. Collect the OpenShift 4 credentials and log in to the cluster.

The architecture diagram for this lab:

installation media

This installation uses openshift 4.9.12. For convenience, the author has packaged the installation media; besides the openshift images it contains some auxiliary software and tools.

The packaged bundle can be downloaded from Baidu Netdisk; the version is 4.9.12 (a merge/unpack sketch follows the link):

  • 4.9.12
    • Link: https://pan.baidu.com/s/1Wj5MUBLMFli1kOit1eafug Extraction code: ur8r
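
If the bundle on the netdisk is split into parts (as the 4.6 package later in this book is), merge and unpack it on the helper with something like the following — the part and archive names here are hypothetical, use whatever the download actually contains:

# merge the split parts back into one archive, then unpack with pigz
cat ocp.4.9.12.tgz.* > ocp.4.9.12.tgz
pigz -dc ocp.4.9.12.tgz | tar xf -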

deploy dns

In assisted install mode with static IPs, a DNS service is needed on the lab network. Since we deploy single node openshift, it is enough to point the following 4 names at the same IP address; of course, you need to decide on the domain in advance. In this disconnected lab the records are created by the helper playbook below.

  • api.ocp4s-ais.redhat.ren
  • api-int.ocp4s-ais.redhat.ren
  • *.apps.ocp4s-ais.redhat.ren
  • ocp4-sno.ocp4s-ais.redhat.ren

cd /data/ocp4/ocp4-upi-helpernode-master/
cat << 'EOF' > /data/ocp4/ocp4-upi-helpernode-master/vars.yaml
---
ocp_version: 4.9.12
ssh_gen_key: false
staticips: true
firewalld: false
dns_forward: yes
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso" # this is internal file, just leave as it.
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "sno"
  forwarder1: "172.21.1.1"
  forwarder2: "172.21.1.1"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.112"
  interface: "enp1s0"
  install_drive: "vda"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.113"
    interface: "enp1s0"
    install_drive: "vda"
  # - name: "master-1"
  #   ipaddr: "192.168.7.14"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
  # - name: "master-2"
  #   ipaddr: "192.168.7.15"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
workers:
  - name: "worker-0"
    ipaddr: "192.168.7.116"
    interface: "eno1"
    install_drive: "sda"
  - name: "worker-1"
    ipaddr: "192.168.7.117"
    interface: "enp1s0"
    install_drive: "sda"
  # - name: "worker-2"
  #   ipaddr: "192.168.7.18"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-0"
  #   ipaddr: "192.168.7.19"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-1"
  #   ipaddr: "192.168.7.20"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-3"
  #   ipaddr: "192.168.7.21"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-4"
  #   ipaddr: "192.168.7.22"
  #   interface: "enp1s0"
  #   install_drive: "vda"
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
  - name: "yum"
    ipaddr: "192.168.7.1"
  - name: "quay"
    ipaddr: "192.168.7.1"
  - name: "nexus"
    ipaddr: "192.168.7.1"
  - name: "git"
    ipaddr: "192.168.7.1"
otherdomains:
  - domain: "infra.redhat.ren"
    hosts:
    - name: "registry"
      ipaddr: "192.168.7.1"
    - name: "yum"
      ipaddr: "192.168.7.1"
    - name: "quay"
      ipaddr: "192.168.7.1"
    - name: "quaylab"
      ipaddr: "192.168.7.1"
    - name: "nexus"
      ipaddr: "192.168.7.1"
    - name: "git"
      ipaddr: "192.168.7.1"
  - domain: "ocp4s-ais.redhat.ren"
    hosts:
    - name: "api"
      ipaddr: "192.168.7.13"
    - name: "api-int"
      ipaddr: "192.168.7.13"
    - name: "ocp4-sno"
      ipaddr: "192.168.7.13"
    - name: "*.apps"
      ipaddr: "192.168.7.13"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/{{ ocp_version }}/openshift-client-linux-{{ ocp_version }}.tar.gz"
ocp_installer: "file:///data/ocp4/{{ ocp_version }}/openshift-install-linux-{{ ocp_version }}.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.11"
      options: iburst
setup_registry: # don't worry about this, just leave it here
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
ocp_filetranspiler: "file:///data/ocp4/filetranspiler.tgz"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

ansible-playbook -e @vars.yaml tasks/main.yml

/bin/cp -f /data/ocp4/rhcos-live.x86_64.iso /var/www/html/install/live.iso
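
After the playbook finishes, it is worth checking that the helper's DNS answers for the cluster records — a quick sketch using the addresses from vars.yaml above (assumes dig/bind-utils is installed on the helper):

# all four records should resolve to the node IP 192.168.7.13
dig +short api.ocp4s-ais.redhat.ren @192.168.7.11
dig +short api-int.ocp4s-ais.redhat.ren @192.168.7.11
dig +short ocp4-sno.ocp4s-ais.redhat.ren @192.168.7.11
dig +short test.apps.ocp4s-ais.redhat.ren @192.168.7.11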

deploy the assisted install service

There are two flavors of the assisted install service: the hosted one on cloud.redhat.com and a local one. They provide the same functionality; since we need to customize it, we use the local version.

# https://github.com/openshift/assisted-service/blob/master/docs/user-guide/assisted-service-on-local.md

# https://github.com/openshift/assisted-service/tree/master/deploy/podman

podman version
# Version:      3.4.2
# API Version:  3.4.2
# Go Version:   go1.16.12
# Built:        Wed Feb  2 07:59:28 2022
# OS/Arch:      linux/amd64

mkdir -p /data/assisted-service/
cd /data/assisted-service/

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

wget https://raw.githubusercontent.com/openshift/assisted-service/master/deploy/podman/configmap.yml
wget https://raw.githubusercontent.com/openshift/assisted-service/master/deploy/podman/pod.yml

/bin/cp -f configmap.yml configmap.yml.bak

unset http_proxy
unset https_proxy

sed -i 's/ SERVICE_BASE_URL:.*/ SERVICE_BASE_URL: "http:\/\/172.21.6.103:8090"/' configmap.yml

cat << EOF > /data/assisted-service/os_image.json
[{
  "openshift_version": "4.9",
  "cpu_architecture": "x86_64",
  "url": "http://192.168.7.11:8080/install/live.iso",
  "rootfs_url": "http://192.168.7.11:8080/install/rootfs.img",
  "version": "49.84.202110081407-0"
}]
EOF
cat << EOF > /data/assisted-service/release.json
[{
  "openshift_version": "4.9",
  "cpu_architecture": "x86_64",
  "url": "quaylab.infra.redhat.ren/ocp4/openshift4:4.9.12-x86_64",
  "version": "4.9.12",
  "default": true
}]
EOF

cat configmap.yml.bak \
  | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' \
  | jq --arg OSIMAGE "$(jq -c . /data/assisted-service/os_image.json)" '. | .data.OS_IMAGES = $OSIMAGE ' \
  | jq --arg RELEASE_IMAGES "$(jq -c . /data/assisted-service/release.json)" '. | .data.RELEASE_IMAGES = $RELEASE_IMAGES ' \
  | python3 -c 'import yaml, sys; print(yaml.dump(yaml.load(sys.stdin), default_flow_style=False))' \
  > configmap.yml

# start the local assisted service
cd /data/assisted-service/
podman play kube --configmap configmap.yml pod.yml

# inject the CA certificate of the offline image registry
podman cp /etc/crts/redhat.ren.ca.crt assisted-installer-service:/etc/pki/ca-trust/source/anchors/quaylab.crt
podman exec assisted-installer-service update-ca-trust

# use the following commands to stop/remove the local assisted service
cd /data/assisted-service/
podman play kube --down pod.yml

podman exec assisted-installer-image-service du -h /data
# 1.1G    /data

Once the service is running, open the following url

http://172.21.6.103:8080

create cluster

Open the local assisted install service and create a cluster, ocp4s-ais.redhat.ren.

Fill in the basic cluster information.

Paste your own pull-secret and click next.

On the next page, click add host.

Click generate discovery iso directly; we will customize the ssh key later, so nothing needs to be configured here.

Write down the download command, because we need the infra env id inside it.

The command in our case is

wget -O discovery_image_ocp4s-ais.iso 'http://127.0.0.1:8888/images/b6b173ab-f080-4378-a9e0-bb6ff02f78bb?arch=x86_64&type=full-iso&version=4.9'

customize the assisted install service configuration

The ISO created by the assisted install service assumes the target network provides DHCP. We want static IPs, so we customize the assisted install service and activate a capability that is still hidden (not yet officially supported).

# on helper
cd /data/sno

ASSISTED_SERVICE_URL=http://172.21.6.103:8080
# infra id is part of download url on UI
INFRA_ENV_ID=b6b173ab-f080-4378-a9e0-bb6ff02f78bb
NODE_SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"

SNO_IP=192.168.7.13
SNO_GW=192.168.7.1
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=ocp4-sno
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

echo ${SNO_IF_MAC} > /data/sno/sno.mac

cat << EOF > /data/sno/server-a.yaml
dns-resolver:
  config:
    server:
    - ${SNO_DNS}
interfaces:
- ipv4:
    address:
    - ip: ${SNO_IP}
      prefix-length: ${SNO_NETMAST_S}
    dhcp: false
    enabled: true
  name: ${SNO_IF}
  state: up
  type: ethernet
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: ${SNO_GW}
    next-hop-interface: ${SNO_IF}
    table-id: 254
EOF

cat << EOF > /data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip

EOF

VAR_INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren
cat << EOF > /data/sno/install.images.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-install-images
storage:
  files:
    - path: /etc/containers/registries.conf.d/base.registries.conf
      overwrite: true
      contents:
        inline: |
          unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
          short-name-mode = ""

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-release"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${VAR_INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

            [[registry.mirror]]
              location = "${VAR_INSTALL_IMAGE_REGISTRY}/ocp4/release"

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${VAR_INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

            [[registry.mirror]]
              location = "${VAR_INSTALL_IMAGE_REGISTRY}/ocp4/release"

EOF

cat << EOF > /data/sno/install.crts.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-install-crts
storage:
  files:
    - path: /etc/pki/ca-trust/source/anchors/quaylab.crt
      overwrite: true
      contents:
        inline: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/          /g' )

EOF

mkdir -p /data/sno/disconnected/
# copy ntp related config
/bin/cp -f  /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/sno/disconnected/

# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.infra.redhat.ren:8083

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml /data/sno/disconnected/
/bin/cp -f /data/ocp4/99-master-container-registries.yaml /data/sno/disconnected/

cd /data/sno/

# script to build ignition file entries from a yaml file
# run under bash
# 1st parameter: the file name which will be written to coreos for the first boot
# 2nd parameter: the file whose content will be read
get_file_content_for_ignition() {
  VAR_FILE_NAME=$1
  VAR_FILE_CONTENT_IN_FILE=$2

  tmppath=$(mktemp)

cat << EOF > $tmppath
      {
        "overwrite": true,
        "path": "$VAR_FILE_NAME",
        "user": {
          "name": "root"
        },
        "contents": {
          "source": "data:text/plain,$(cat $VAR_FILE_CONTENT_IN_FILE | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )"
        }
      }
EOF

  RET_VAL=$(cat $tmppath | jq -c .)

  FILE_JSON=$(cat $VAR_FILE_CONTENT_IN_FILE | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))')

cat << EOF > $tmppath
      {
        "overwrite": true,
        "path": "$(echo $FILE_JSON | jq -r .spec.config.storage.files[0].path )",
        "user": {
          "name": "root"
        },
        "contents": {
          "source": "$( echo $FILE_JSON | jq -r .spec.config.storage.files[0].contents.source )"
        }
      }
EOF
  # cat $tmppath

  RET_VAL_2=$(cat $tmppath | jq -c .)

  /bin/rm -f $tmppath
}

get_file_content_for_ignition "/opt/openshift/openshift/99-master-chrony-configuration.yaml" "/data/sno/disconnected/99-master-chrony-configuration.yaml"
VAR_99_master_chrony=$RET_VAL
VAR_99_master_chrony_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-worker-chrony-configuration.yaml" "/data/sno/disconnected/99-worker-chrony-configuration.yaml"
VAR_99_worker_chrony=$RET_VAL
VAR_99_worker_chrony_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-master-container-registries.yaml" "/data/sno/disconnected/99-master-container-registries.yaml"
VAR_99_master_container_registries=$RET_VAL
VAR_99_master_container_registries_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-worker-container-registries.yaml" "/data/sno/disconnected/99-worker-container-registries.yaml"
VAR_99_worker_container_registries=$RET_VAL
VAR_99_worker_container_registries_2=$RET_VAL_2

butane /data/sno/install.images.bu > /data/sno/disconnected/99-zzz-master-install-images.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-install-images.yaml" "/data/sno/disconnected/99-zzz-master-install-images.yaml"
VAR_99_master_install_images=$RET_VAL
VAR_99_master_install_images_2=$RET_VAL_2

butane /data/sno/install.crts.bu > /data/sno/disconnected/99-zzz-master-install-crts.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-install-crts.yaml" "/data/sno/disconnected/99-zzz-master-install-crts.yaml"
VAR_99_master_install_crts=$RET_VAL
VAR_99_master_install_crts_2=$RET_VAL_2

# https://access.redhat.com/solutions/6194821
# butane /data/sno/static.ip.bu | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))'

# https://stackoverflow.com/questions/2854655/command-to-escape-a-string-in-bash
# VAR_PULL_SEC=`printf "%q" $(cat  /data/pull-secret.json)`

# https://access.redhat.com/solutions/221403
# VAR_PWD_HASH="$(openssl passwd -1 -salt 'openshift' 'redhat')"
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

tmppath=$(mktemp)
butane /data/sno/static.ip.bu \
  | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' \
  | jq '.spec.config | .ignition.version = "3.1.0" ' \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq --argjson VAR "$VAR_99_master_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_images" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_crts" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_chrony_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_images_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_crts_2" '.storage.files += [$VAR] ' \
  | jq -c . \
  > ${tmppath}
VAR_IGNITION=$(cat ${tmppath})
rm -f ${tmppath}

# cat /run/user/0/containers/auth.json
# {
#         "auths": {
#                 "quaylab.infra.redhat.ren": {
#                         "auth": "cXVheWFkbWluOnBhc3N3b3Jk"
#                 }
#         }
# }
request_body=$(mktemp)

jq -n --arg SSH_KEY "$NODE_SSH_KEY" \
  --arg NMSTATE_YAML1 "$(cat server-a.yaml)" \
  --arg MAC_ADDR "$(cat /data/sno/sno.mac)" \
  --arg IF_NIC "${SNO_IF}" \
  --arg PULL_SEC '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"quaylab.infra.redhat.ren": {"auth": "cXVheWFkbWluOnBhc3N3b3Jk","email": "noemail@localhost"}}}' \
  --arg IGNITION "${VAR_IGNITION}" \
'{
    "proxy":{"http_proxy":"","https_proxy":"","no_proxy":""},
    "ssh_authorized_key":$SSH_KEY,
    "pull_secret":$PULL_SEC,
    "image_type":"full-iso",
    "ignition_config_override":$IGNITION,
    "static_network_config": [
      {
        "network_yaml": $NMSTATE_YAML1,
        "mac_interface_map": [{"mac_address": $MAC_ADDR, "logical_nic_name": $IF_NIC}]
      }
    ]
}' > $request_body

# review the request body we just built
cat $request_body

# send the request to the assisted install service to apply the customization
curl -H "Content-Type: application/json" -X PATCH -d @$request_body ${ASSISTED_SERVICE_URL}/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID
# {"cluster_id":"850934fd-fa64-4057-b9d2-1eeebd890e1a","cpu_architecture":"x86_64","created_at":"2022-02-11T03:54:46.632598Z","download_url":"http://127.0.0.1:8888/images/89cc84a1-2dfd-4d7e-9ca3-903342c40d60?arch=x86_64&type=full-iso&version=4.9","email_domain":"Unknown","expires_at":"0001-01-01T00:00:00.000Z","href":"/api/assisted-install/v2/infra-envs/89cc84a1-2dfd-4d7e-9ca3-903342c40d60","id":"89cc84a1-2dfd-4d7e-9ca3-903342c40d60","kind":"InfraEnv","name":"ocp4s_infra-env","openshift_version":"4.9","proxy":{"http_proxy":"","https_proxy":"","no_proxy":""},"pull_secret_set":true,"ssh_authorized_key":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCrkO4oLIFTwjkGON+aShlQRKwXHOf3XKrGDmpb+tQM3UcbsF2U7klsr9jBcGObQMZO7KBW8mlRu0wC2RxueBgjbqvylKoFacgVZg6PORfkclqE1gZRYFwoxDkLo2c5y5B7OhcAdlHO0eR5hZ3/0+8ZHZle0W+A0AD7qqowO2HlWLkMMt1QXFD7R0r6dzTs9u21jASGk3jjYgCOw5iHvqm2ueVDFAc4yVwNZ4MXKg5MRvqAJDYPqhaRozLE60EGIziy9SRj9HWynyNDncCdL1/IBK2z9T0JwDebD6TDNcPCtL+AeKIpaHed52PkjnFf+Q+8/0Z0iXt6GyFYlx8OkxdsiMgMxiXx43yIRaWZjx54kVtc9pB6CL50UKPQ2LjuFPIZSfaCab5KDgPRtzue82DE6Mxxg4PS+FTW32/bq1WiOxCg9ABrZ0n1CGaZWFepJkSw47wodMnvlBkcKY3Rn/SsLZVOUsJysd+b08LQgl1Fr3hjVrEQMLbyU0UxvoerYfk= root@ocp4-helper","static_network_config":"dns-resolver:\n  config:\n    server:\n    - 172.21.1.1\ninterfaces:\n- ipv4:\n    address:\n    - ip: 172.21.6.13\n      prefix-length: 24\n    dhcp: false\n    enabled: true\n  name: enp1s0\n  state: up\n  type: ethernet\nroutes:\n  config:\n  - destination: 0.0.0.0/0\n    next-hop-address: 172.21.6.254\n    next-hop-interface: enp1s0\n    table-id: 254HHHHH00:60:2F:8B:42:88=enp1s0","type":"full-iso","updated_at":"2022-02-11T04:01:14.008388Z","user_name":"admin"}

rm -f ${request_body}

# on helper
cd /data/sno/
wget -O discovery_image_ocp4s.iso "http://172.21.6.103:8888/images/${INFRA_ENV_ID}?arch=x86_64&type=full-iso&version=4.9"

# coreos-installer iso kargs modify -a \
#   " ip=${SNO_IP}::${SNO_GW}:${SNO_NETMAST}:${SNO_HOSTNAME}:${SNO_IF}:none nameserver=${SNO_DNS}" \
#   /data/sno/discovery_image_ocp4s.iso

/bin/mv -f /data/sno/discovery_image_ocp4s.iso /data/sno/sno.iso

start kvm

We go back to the KVM host, start the guest, and begin the single node openshift installation.

# back to kvm host

create_lv() {
    var_vg=$1
    var_lv=$2
    var_size=$3
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L $var_size -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

create_lv vgdata lvsno 120G

export KVM_DIRECTORY=/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/sno/sno.* ${KVM_DIRECTORY}/

# on kvm host
# export KVM_DIRECTORY=/data/kvm
virt-install --name=ocp4-sno --vcpus=16 --ram=65536 \
--cpu=host-model \
--disk path=/dev/vgdata/lvsno,device=disk,bus=virtio,format=raw \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=$(<sno.mac) \
--graphics vnc,port=59012 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/sno.iso

configure the sno parameters in the assisted install service

Back in the assisted install service webUI, the node shows up as discovered.

Click next and configure the machine network subnet.

Click next and review the cluster configuration.

Start the installation; from here on we just wait.

After a while, typically 20 to 30 minutes with a reasonably good network, the installation completes. The assisted service logs can be tailed while waiting (see the sketch below).

⚠️ Do not forget to download the cluster certificates and the webUI username and password.
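
While waiting, the assisted service log on the helper can be tailed to see what the installer is doing — a sketch, using the container name from the podman pod above:

# follow the assisted service log during the installation
podman logs -f assisted-installer-service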

access the sno cluster

# back to helper
# copy kubeconfig from web browser to /data/sno
export KUBECONFIG=/data/sno/auth/kubeconfig

oc get node
# NAME       STATUS   ROLES           AGE   VERSION
# ocp4-sno   Ready    master,worker   9h    v1.22.3+e790d7f

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.9.12    True        False         False      6h6m
# baremetal                                  4.9.12    True        False         False      9h
# cloud-controller-manager                   4.9.12    True        False         False      9h
# cloud-credential                           4.9.12    True        False         False      9h
# cluster-autoscaler                         4.9.12    True        False         False      9h
# config-operator                            4.9.12    True        False         False      9h
# console                                    4.9.12    True        False         False      9h
# csi-snapshot-controller                    4.9.12    True        False         False      9h
# dns                                        4.9.12    True        False         False      6h6m
# etcd                                       4.9.12    True        False         False      9h
# image-registry                             4.9.12    True        False         False      9h
# ingress                                    4.9.12    True        False         False      9h
# insights                                   4.9.12    True        False         False      9h
# kube-apiserver                             4.9.12    True        False         False      9h
# kube-controller-manager                    4.9.12    True        False         False      9h
# kube-scheduler                             4.9.12    True        False         False      9h
# kube-storage-version-migrator              4.9.12    True        False         False      9h
# machine-api                                4.9.12    True        False         False      9h
# machine-approver                           4.9.12    True        False         False      9h
# machine-config                             4.9.12    True        False         False      9h
# marketplace                                4.9.12    True        False         False      9h
# monitoring                                 4.9.12    True        False         False      9h
# network                                    4.9.12    True        False         False      9h
# node-tuning                                4.9.12    True        False         False      9h
# openshift-apiserver                        4.9.12    True        False         False      6h4m
# openshift-controller-manager               4.9.12    True        False         False      9h
# openshift-samples                          4.9.12    True        False         False      6h4m
# operator-lifecycle-manager                 4.9.12    True        False         False      9h
# operator-lifecycle-manager-catalog         4.9.12    True        False         False      9h
# operator-lifecycle-manager-packageserver   4.9.12    True        False         False      9h
# service-ca                                 4.9.12    True        False         False      9h
# storage                                    4.9.12    True        False         False      9h
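
Since this is a disconnected install, it is worth confirming that the mirror configuration injected through the ignition override actually landed on the node — a sketch using oc debug:

# the file was written by the 99-zzz-master-install-images override
oc debug node/ocp4-sno -- chroot /host cat /etc/containers/registries.conf.d/base.registries.conf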

access the cluster webUI

https://console-openshift-console.apps.ocp4s-ais.redhat.ren/

Username / password: kubeadmin / Sb7Fp-U466I-SkPB4-6bpEn

reference

https://github.com/openshift/assisted-service/tree/master/docs/user-guide

  • https://access.redhat.com/solutions/6135171
  • https://github.com/openshift/assisted-service/blob/master/docs/user-guide/assisted-service-on-local.md
  • https://github.com/openshift/assisted-service/blob/master/docs/user-guide/restful-api-guide.md

search

  • pre-network-manager-config.sh
  • /Users/wzh/Desktop/dev/assisted-service/internal/constants/scripts.go
  • NetworkManager

https://superuser.com/questions/218340/how-to-generate-a-valid-random-mac-address-with-bash-shell

end


cat << EOF > test
02:00:00:2c:23:a5=enp1s0
EOF
cat test | cut -d= -f1 | tr '[:lower:]' '[:upper:]'

printf '00-60-2F-%02X-%02X-%02X\n' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]
virsh domifaddr freebsd11.1

cat configmap.yml | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' | jq -r .data.OS_IMAGES | jq '.[] | select( .openshift_version == "4.9" and .cpu_architecture == "x86_64" ) ' | jq .
# {
#   "openshift_version": "4.9",
#   "cpu_architecture": "x86_64",
#   "url": "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/rhcos-4.9.0-x86_64-live.x86_64.iso",
#   "rootfs_url": "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/rhcos-live-rootfs.x86_64.img",
#   "version": "49.84.202110081407-0"
# }

cat configmap.yml | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' | jq -r .data.RELEASE_IMAGES | jq -r .
# [
#   {
#     "openshift_version": "4.6",
#     "cpu_architecture": "x86_64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.6.16-x86_64",
#     "version": "4.6.16"
#   },
#   {
#     "openshift_version": "4.7",
#     "cpu_architecture": "x86_64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.7.42-x86_64",
#     "version": "4.7.42"
#   },
#   {
#     "openshift_version": "4.8",
#     "cpu_architecture": "x86_64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.8.29-x86_64",
#     "version": "4.8.29"
#   },
#   {
#     "openshift_version": "4.9",
#     "cpu_architecture": "x86_64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.9.18-x86_64",
#     "version": "4.9.18",
#     "default": true
#   },
#   {
#     "openshift_version": "4.9",
#     "cpu_architecture": "arm64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.9.18-aarch64",
#     "version": "4.9.18"
#   },
#   {
#     "openshift_version": "4.10",
#     "cpu_architecture": "x86_64",
#     "url": "quay.io/openshift-release-dev/ocp-release:4.10.0-rc.0-x86_64",
#     "version": "4.10.0-rc.0"
#   }
# ]


cat << EOF > /data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip
# passwd:
#   users:
#     name: wzh
#     password_hash: "$(openssl passwd -1 wzh)"
# storage:
#   files:
#     - path: /etc/NetworkManager/system-connections/${SNO_IF}.nmconnection
#       overwrite: true
#       contents:
#         inline: |
#           [connection]
#           id=${SNO_IF}
#           type=ethernet
#           autoconnect-retries=1
#           interface-name=${SNO_IF}
#           multi-connect=1
#           permissions=
#           wait-device-timeout=60000

#           [ethernet]
#           mac-address-blacklist=

#           [ipv4]
#           address1=${SNO_IP}/${SNO_NETMAST_S=24},${SNO_GW}
#           dhcp-hostname=${SNO_HOSTNAME}
#           dhcp-timeout=90
#           dns=${SNO_DNS};
#           dns-search=
#           may-fail=false
#           method=manual

#           [ipv6]
#           addr-gen-mode=eui64
#           dhcp-hostname=${SNO_HOSTNAME}
#           dhcp-timeout=90
#           dns-search=
#           method=disabled

#           [proxy]

EOF


# https://access.redhat.com/solutions/221403
# VAR_PWD_HASH="$(openssl passwd -1 -salt 'openshift' 'redhat')"
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

tmppath=$(mktemp)
butane /data/sno/static.ip.bu \
  | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))' \
  | jq '.spec.config | .ignition.version = "3.1.0" ' \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq --argjson VAR "$VAR_99_master_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_images" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_crts" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_chrony_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_images_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_crts_2" '.storage.files += [$VAR] ' \
  | jq -c . \
  > ${tmppath}
VAR_IGNITION=$(cat ${tmppath})
rm -f ${tmppath}
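
# optional sanity check of the generated ignition (just a suggestion; jq is already used above)
echo "${VAR_IGNITION}" | jq '{version: .ignition.version, files: (.storage.files | length), users: (.passwd.users | length)}'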


openshift 4.6 static IP offline baremetal installation, with operator hub

Installation walkthrough video

This document describes installing ocp4.6 on baremetal (simulated with kvm) using static IPs, including the operator hub steps.

Architecture diagram

Offline installation package download

The ocp4 offline installation package is prepared differently from 3.11; prepare it as follows. Because the default baremetal flow requires a dhcp/pxe environment, you need a helper machine running dhcp, tftp, haproxy and similar services. To make on-site project work easier, a tool for editing ignition files is also included, so the offline package carries a few extra third-party tools.

https://github.com/wangzheng422/ocp4-upi-helpernode is the tool used to build the helper machine.

https://github.com/wangzheng422/filetranspiler is the tool used to modify ignition files.

The packaged installation media can be downloaded from the Baidu drive link below; this build is version 4.6.5:

Link: https://pan.baidu.com/s/1-5QWpayV2leinq4DOtiFEg Password: gjoe

It contains the following kinds of files:

  • ocp4.tgz contains the iso and other installation media, all of the install scripts, and the full list of downloaded images. Copy it to the kvm host and to the helper machine.
  • registry.tgz is the packaged docker image registry repository. If you need to add images to it first, follow 4.6.add.image.md
  • install.image.tgz holds the extra images needed while installing the cluster.
  • rhel-data.7.9.tgz is the yum repository for rhel 7 hosts; it is large because it also contains gpu, epel and other content. It is mainly used for the kvm host, the helper machine, and rhel worker nodes.

To merge the split download files, use a command like the following

cat registry.?? > registry.tgz
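
# optional: check the merged archive is intact before copying it around (it is gzip-compatible, having been packed with pigz)
ls -lh registry.tgz
tar tzf registry.tgz > /dev/null && echo "registry.tgz looks intact"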

Prepare the offline installation source on an Internet-facing cloud host

The documentation for preparing the offline installation media has moved here: 4.6.build.dist.md

KVM host preparation

This lab runs the install tests in many virtual machines on a single 32C, 256G host, so prepare that host first.

If you use more than one host, be sure to adjust the time settings so that all hosts' clocks stay closely in sync; otherwise certificates will run into problems.

The main preparation steps are

  • configure the yum repo
  • configure dns
  • install the image registry
  • set up the vnc environment
  • create the network that kvm needs
  • create the helper kvm
  • set up an haproxy that forwards external traffic into the kvm network

Of the steps above, the dns part needs to be adjusted to the actual project environment.

The host for this run is a rhel7 machine

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren
EOF

# 准备yum更新源
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak
cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://127.0.0.1/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y install byobu htop 

systemctl disable --now firewalld

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts
openssl req \
   -newkey rsa:2048 -nodes -keyout redhat.ren.key \
   -x509 -days 3650 -out redhat.ren.crt -subj \
   "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.ocp4.redhat.ren" \
   -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "[SAN]\nsubjectAltName=DNS:registry.ocp4.redhat.ren,DNS:*.ocp4.redhat.ren,DNS:*.redhat.ren"))

/bin/cp -f /etc/crts/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
yum -y install podman docker-distribution pigz skopeo
# pigz -dc registry.tgz | tar xf -
cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/4.6.5/registry
    delete:
        enabled: true
http:
    addr: :5443
    tls:
       certificate: /etc/crts/redhat.ren.crt
       key: /etc/crts/redhat.ren.key
compatibility:
  schema1:
    enabled: true
EOF
# systemctl restart docker
# systemctl stop docker-distribution
systemctl enable --now docker-distribution
# systemctl restart docker-distribution
# podman login registry.redhat.ren:5443 -u a -p a

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# 加载更多的镜像
# 解压缩 ocp4.tgz
bash add.image.load.sh /data/4.6.5/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# 准备vnc环境

yum -y install tigervnc-server tigervnc gnome-terminal gnome-session \
  gnome-classic-session gnome-terminal nautilus-open-terminal \
  control-center liberation-mono-fonts google-noto-sans-cjk-fonts \
  google-noto-sans-fonts fonts-tweak-tool

yum install -y    qgnomeplatform   xdg-desktop-portal-gtk \
  NetworkManager-libreswan-gnome   PackageKit-command-not-found \
  PackageKit-gtk3-module   abrt-desktop   at-spi2-atk   at-spi2-core   \
  avahi   baobab   caribou   caribou-gtk2-module   caribou-gtk3-module   \
  cheese   compat-cheese314   control-center   dconf   empathy   eog   \
  evince   evince-nautilus   file-roller   file-roller-nautilus   \
  firewall-config   firstboot   fprintd-pam   gdm   gedit   glib-networking   \
  gnome-bluetooth   gnome-boxes   gnome-calculator   gnome-classic-session   \
  gnome-clocks   gnome-color-manager   gnome-contacts   gnome-dictionary   \
  gnome-disk-utility   gnome-font-viewer   gnome-getting-started-docs   \
  gnome-icon-theme   gnome-icon-theme-extras   gnome-icon-theme-symbolic   \
  gnome-initial-setup   gnome-packagekit   gnome-packagekit-updater   \
  gnome-screenshot   gnome-session   gnome-session-xsession   \
  gnome-settings-daemon   gnome-shell   gnome-software   gnome-system-log   \
  gnome-system-monitor   gnome-terminal   gnome-terminal-nautilus   \
  gnome-themes-standard   gnome-tweak-tool   nm-connection-editor   orca   \
  redhat-access-gui   sane-backends-drivers-scanners   seahorse   \
  setroubleshoot   sushi   totem   totem-nautilus   vinagre   vino   \
  xdg-user-dirs-gtk   yelp

yum install -y    cjkuni-uming-fonts   dejavu-sans-fonts   \
  dejavu-sans-mono-fonts   dejavu-serif-fonts   gnu-free-mono-fonts   \
  gnu-free-sans-fonts   gnu-free-serif-fonts   \
  google-crosextra-caladea-fonts   google-crosextra-carlito-fonts   \
  google-noto-emoji-fonts   jomolhari-fonts   khmeros-base-fonts   \
  liberation-mono-fonts   liberation-sans-fonts   liberation-serif-fonts   \
  lklug-fonts   lohit-assamese-fonts   lohit-bengali-fonts   \
  lohit-devanagari-fonts   lohit-gujarati-fonts   lohit-kannada-fonts   \
  lohit-malayalam-fonts   lohit-marathi-fonts   lohit-nepali-fonts   \
  lohit-oriya-fonts   lohit-punjabi-fonts   lohit-tamil-fonts   \
  lohit-telugu-fonts   madan-fonts   nhn-nanum-gothic-fonts   \
  open-sans-fonts   overpass-fonts   paktype-naskh-basic-fonts   \
  paratype-pt-sans-fonts   sil-abyssinica-fonts   sil-nuosu-fonts   \
  sil-padauk-fonts   smc-meera-fonts   stix-fonts   \
  thai-scalable-waree-fonts   ucs-miscfixed-fonts   vlgothic-fonts   \
  wqy-microhei-fonts   wqy-zenhei-fonts

vncpasswd

cat << EOF > ~/.vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
vncconfig &
gnome-session &
EOF
chmod +x ~/.vnc/xstartup

vncserver :1 -geometry 1280x800
# 如果你想停掉vnc server,这么做
vncserver -kill :1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

# https://www.cyberciti.biz/faq/how-to-install-kvm-on-centos-7-rhel-7-headless-server/

# 配置kvm环境
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

lsmod | grep -i kvm
brctl show
virsh net-list
virsh net-dumpxml default

# 创建实验用虚拟网络

cat << EOF >  /data/virt-net.xml
<network>
  <name>openshift4</name>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='openshift4' stp='on' delay='0'/>
  <domain name='openshift4'/>
  <ip address='192.168.7.1' netmask='255.255.255.0'>
  </ip>
</network>
EOF

virsh net-define --file virt-net.xml
virsh net-autostart openshift4
virsh net-start openshift4

# restore back
virsh net-destroy openshift4
virsh net-undefine openshift4

# 创建工具机

mkdir -p /data/kvm
cd /data/kvm

lvremove -f datavg/helperlv
lvcreate -y -L 430G -n helperlv datavg

virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/datavg/helperlv,device=disk,bus=virtio,format=raw \
--os-variant centos7.0 --network network=openshift4,model=virtio \
--boot menu=on --location /data/kvm/rhel-server-7.8-x86_64-dvd.iso \
--initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
cat << EOF > /etc/chrony.conf
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

Helper machine preparation

The following installation steps are performed inside the helper machine.

The main steps are

  • configure the yum repo
  • run the ansible playbook that configures the helper automatically
  • upload the customized install configuration file
  • generate the ignition files

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak/
cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://192.168.7.1/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y install ansible git unzip podman python36

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# 这里使用了一个ansible的项目,用来部署helper节点的服务。
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# 这里使用了一个ignition文件合并的项目,用来帮助自定义ignition文件。
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

# 接下来,我们使用ansible来配置helper节点,装上各种openshift集群需要的服务
# 根据现场环境,修改 ocp4-upi-helpernode-master/vars-static.yaml
# 主要是修改各个节点的网卡和硬盘参数,还有IP地址
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.yaml -e '{staticips: true}' tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# GOTO image registry host
# copy crt files to helper node
scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.key root@192.168.7.11:/data/install/

# GO back to help node
/bin/cp -f /data/install/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# 定制ignition
cd /data/install

# 根据现场环境,修改 install-config.yaml
# 至少要修改ssh key, 还有 additionalTrustBundle,这个是镜像仓库的csr 

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ppa.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/helper_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /data/install/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cd /data/install/
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

openshift-install create ignition-configs --dir=/data/install

cd /data/ocp4/ocp4-upi-helpernode-master
# 我们来为每个主机,复制自己版本的ign,并复制到web server的目录下
ansible-playbook -e @vars-static.yaml -e '{staticips: true}' tasks/ign.yml
# 如果对每个主机有自己ign的独特需求,在这一步,去修改ign。

# 以下操作本来是想设置网卡地址,但是实践发现是不需要的。
# 保留在这里,是因为他可以在安装的时候注入文件,非常有用。
# mkdir -p bootstrap/etc/sysconfig/network-scripts/
# cat <<EOF > bootstrap/etc/sysconfig/network-scripts/ifcfg-ens3
# DEVICE=ens3
# BOOTPROTO=none
# ONBOOT=yes
# IPADDR=192.168.7.12
# NETMASK=255.255.255.0
# GATEWAY=192.168.7.1
# DNS=192.168.7.11
# DNS1=192.168.7.11
# DNS2=192.168.7.1
# DOMAIN=redhat.ren
# PREFIX=24
# DEFROUTE=yes
# IPV6INIT=no
# EOF
# filetranspiler -i bootstrap.ign -f bootstrap -o bootstrap-static.ign
# /bin/cp -f bootstrap-static.ign /var/www/html/ignition/

# 我们为每个节点创建各自的iso文件
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.yaml -e '{staticips: true}' tasks/iso.yml

Back on the KVM host

In principle the installation could start right here, but we know that installing coreos by hand means typing a very long kernel command line; in practice you will never get it right, one wrong character and the install fails, and you have to reboot and type it all again...

To avoid that tedium, following practices described online, we customize an iso for each host. Fortunately the earlier steps already produced the needed isos with ansible; copy them to the kvm host and continue.

There is one gotcha: we do not know the hosts' NIC names in advance. You have to boot the coreos iso once, drop into the single-user shell, and run ip a to find out; usually it is ens3.

Likewise, when installing on physical machines, use the same method to confirm which disk device to install to. It is also recommended to install rhel 8 on the physical machine first, to confirm that the hardware can run coreos at all. If a physical install does not write to disk, try adding the boot parameter: ignition.firstboot=1 . A quick way to collect this information is sketched below.
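
A minimal check from the rhcos live ISO shell (plain Linux tools, nothing cluster-specific):

ip a                           # note the NIC name, e.g. ens3 / eno1
lsblk -d -o NAME,SIZE,MODEL    # note the target disk, e.g. vda / sda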

# on kvm host

export KVM_DIRECTORY=/data/kvm

cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/

create_lv() {
    var_name=$1
    lvremove -f datavg/$var_name
    lvcreate -y -L 120G -n $var_name datavg
    # wipefs --all --force /dev/datavg/$var_name
}

create_lv bootstraplv
create_lv master0lv
create_lv master1lv
create_lv master2lv
create_lv worker0lv
create_lv worker1lv
create_lv worker2lv

# finally, we can start install :)
# 你可以一口气把虚拟机都创建了,然后喝咖啡等着。
# 从这一步开始,到安装完毕,大概30分钟。
virt-install --name=ocp4-bootstrap --vcpus=4 --ram=8192 \
--disk path=/dev/datavg/bootstraplv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-bootstrap.iso   

# 想登录进coreos一探究竟?那么这么做
# ssh core@bootstrap 
# journalctl -b -f -u bootkube.service

virt-install --name=ocp4-master0 --vcpus=4 --ram=16384 \
--disk path=/dev/datavg/master0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-0.iso 

# ssh core@192.168.7.13

virt-install --name=ocp4-master1 --vcpus=4 --ram=16384 \
--disk path=/dev/datavg/master1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-1.iso 

virt-install --name=ocp4-master2 --vcpus=4 --ram=16384 \
--disk path=/dev/datavg/master2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-2.iso 

virt-install --name=ocp4-worker0 --vcpus=4 --ram=32768 \
--disk path=/dev/datavg/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso 

virt-install --name=ocp4-worker1 --vcpus=4 --ram=16384 \
--disk path=/dev/datavg/worker1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-1.iso 

virt-install --name=ocp4-worker2 --vcpus=4 --ram=16384 \
--disk path=/dev/datavg/worker2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-2.iso 

# on workstation
# open http://192.168.7.11:9000/
# to check

# if you want to stop or delete vm, try this
virsh list --all
virsh destroy ocp4-bootstrap
virsh destroy ocp4-master0 
virsh destroy ocp4-master1 
virsh destroy ocp4-master2 
virsh destroy ocp4-worker0 
virsh destroy ocp4-worker1 
virsh destroy ocp4-worker2
virsh undefine ocp4-bootstrap
virsh undefine ocp4-master0 
virsh undefine ocp4-master1 
virsh undefine ocp4-master2 
virsh undefine ocp4-worker0 
virsh undefine ocp4-worker1 
virsh undefine ocp4-worker2

On the helper machine

At this point the installation has already started automatically; all we have to do is go back to the helper machine and watch.

During the bootstrap and master installation phases, use these commands to follow progress.

cd /data/install
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

cd /data/install
openshift-install wait-for bootstrap-complete --log-level debug

If everything is normal, you will see output like this.

Sometimes the certificates have expired. To verify, log into the bootstrap node and check the expiry time. If they really have expired, clear every cached file generated by openshift-install and start over.

echo | openssl s_client -connect localhost:6443 | openssl x509 -noout -text | grep Not

Generally, if you deleted the cached files before this openshift-install step as documented, the expiry problem will not occur.

oc get nodes

At this point you can only see the masters, because the workers' csr have not been approved yet. If the virtual machines were all created in one go, you will most likely not hit the problem below.

oc get csr

You will find quite a few that are not yet approved

Approve them

yum -y install jq
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr -o name | xargs oc adm certificate approve

The worker nodes' cpu then spikes, and after that the workers show up.

Wait a while; when you see this, everything is on track.

Once the steps above are done, you can finish the installation

openshift-install wait-for install-complete --log-level debug
# here is the output
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "7MXaT-vqouq-UukdG-uzNEi"

Our helper machine provides nfs, so configure proper nfs-backed storage rather than relying on emptydir

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh
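
# quick check that the provisioner came up (generic commands, not tied to the script's resource names)
oc get pods -A | grep -i nfs
oc get sc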

# oc edit configs.imageregistry.operator.openshift.io
# 修改 storage 部分
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# 把imagepruner给停掉
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

Configure local dns (resolve *.apps.ocp4.redhat.ren to 192.168.7.11) so that it points at the haproxy on the helper machine; you can then open the management console in a browser. A minimal sketch for a Linux workstation follows.
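
One minimal option (an assumption, not part of the original setup) is to list the console and oauth routes explicitly in /etc/hosts, since /etc/hosts cannot express the *.apps wildcard; a local dnsmasq entry such as address=/apps.ocp4.redhat.ren/192.168.7.11 would handle the wildcard case.

cat << EOF >> /etc/hosts
192.168.7.11 console-openshift-console.apps.ocp4.redhat.ren oauth-openshift.apps.ocp4.redhat.ren
EOF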

chrony/NTP setup

In ocp 4.6, ntp synchronization needs to be configured. The earlier ansible run already generated the ntp MachineConfig; just apply it to the cluster.

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/
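
# confirm the chrony MachineConfig landed and watch the rollout
oc get mc | grep -i chrony
watch oc get mcp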

Operator Hub offline installation

https://docs.openshift.com/container-platform/4.2/operators/olm-restricted-networks.html

https://github.com/operator-framework/operator-registry

https://www.cnblogs.com/ericnie/p/11777384.html?from=timeline&isappinstalled=0

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html-single/images/index

Preparing operator hub happens on two levels. The first level, described in this article, is building the offline operator hub resources and mirroring the operator images themselves; with that done, a disconnected ocp4 cluster can show the operator hub and deploy operators. The second level is the images an operator pulls when it deploys the components you actually want to use; those also have to be mirrored offline, but since every operator needs different images and there is no single place that lists them, each project site has to mirror the extras it needs. This project mirrors as many of them as it can, but gaps currently cannot be ruled out. An example of mirroring one extra image is sketched below.
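
For that second level, something along these lines can be used to copy one extra operand image into the local registry; run it wherever both registries are reachable (typically the Internet-facing host). The source image path and tag below are placeholders only, not a requirement of any specific operator.

# placeholder source image; substitute whatever image your operator actually pulls
skopeo copy --authfile /data/pull-secret.json \
  docker://registry.redhat.io/some-namespace/some-operand-image:some-tag \
  docker://registry.ocp4.redhat.ren:5443/some-namespace/some-operand-image:some-tag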

# on helper node, 在工具机上
cd /data/ocp4

# scp /etc/crts/redhat.ren.crt 192.168.7.11:/root/ocp4/
# https://docs.openshift.com/container-platform/4.4/builds/setting-up-trusted-ca.html
oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.crt
# 如果你想删除这个config map,这么做
# oc delete configmap ca.for.registry
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge
# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["registry.redhat.ren"]}}}'  --type=merge
oc get image.config.openshift.io/cluster -o yaml

# 以下这个步骤是官网文档要做的,实践中发现,disconnected环境不需要
# oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
# 如果你不小心还是照着官网做了,用如下步骤删掉
# oc patch OperatorHub cluster --type json  -p '[{"op": "remove", "path": "/spec/disableAllDefaultSources"}]'

oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml

# yum -y install python36
# 根据项目现场情况,调整参数,运行以下命令,生成配置文件,指向内网镜像仓库
cd /data/ocp4/
bash image.registries.conf.sh registry.ocp4.redhat.ren:5443

# 由于某些ocp 4.2的更新机制,以下操作会触发集群更新,
# 集群节点会逐个重启,集群组件也会逐个重启,请等待集群重启完毕。
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

# !!!正常情况,以下操作不需要!!!
# 以下操作,删除mirror镜像信息,也会触发集群更新操作,请等待集群重启完毕
oc delete -f ./99-worker-container-registries.yaml -n openshift-config
oc delete -f ./99-master-container-registries.yaml -n openshift-config

watch oc get machineconfigpools

watch oc get node

From the monitoring console you can see the nodes upgrading and rebooting.


# on helper node

# params for operator hub images
export var_date='2020.11.23.0135'
echo $var_date
export var_major_version='4.6'
echo ${var_major_version}

export LOCAL_REG='registry.ocp4.redhat.ren:5443'

# 如果想看到redhat的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:redhat-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > redhat-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators-catalog
  namespace: openshift-marketplace
spec:
  displayName: Red Hat Operators
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:redhat-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f redhat-operator-catalog.yaml

# 如果想看到certified的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:certified-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > certified-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: certified-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:certified-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f certified-operator-catalog.yaml

# 如果想看到community的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:community-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:community-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f community-operator-catalog.yaml

cat <<EOF > marketplace-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-marketplace-catalog
  namespace: openshift-marketplace
spec:
  displayName: Red Hat Marketplace
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:redhat-marketplace-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f marketplace-operator-catalog.yaml

# 想删除这些离线operator hub,就这样做。
# find . -name "*-operator-catalog.yaml" -exec oc delete -f {} \;

oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc get packagemanifest -n openshift-marketplace

You can now see the operator list

Deploying an operator also works; a sketch follows
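
As a sketch using the offline catalog defined above: the namespace, operator name, and channel below are placeholders, so check oc get packagemanifest -n openshift-marketplace for real values first.

oc new-project wzh-operator-test

cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: wzh-operator-test
  namespace: wzh-operator-test
spec:
  targetNamespaces:
  - wzh-operator-test
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: some-operator
  namespace: wzh-operator-test
spec:
  channel: stable
  name: some-operator
  source: redhat-operators-catalog
  sourceNamespace: openshift-marketplace
EOF

# the operator's ClusterServiceVersion should eventually reach the Succeeded phase
oc get csv -n wzh-operator-test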

# set master and worker combine
# https://github.com/openshift-telco/openshift4x-poc/blob/master/MASTER-WORKER-COMBINED.md
oc edit schedulers cluster
# apiVersion: config.openshift.io/v1
# kind: Scheduler
# metadata:
# name: cluster
# spec:
#     mastersSchedulable: true
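
# the same change can also be applied non-interactively, equivalent to the edit above
oc patch schedulers.config.openshift.io cluster --type merge -p '{"spec":{"mastersSchedulable":true}}'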

Other links

https://www.cnblogs.com/ericnie/p/11764124.html

The following are reference materials

https://blog.openshift.com/openshift-4-2-disconnected-install/

https://blog.openshift.com/openshift-4-bare-metal-install-quickstart/

https://github.com/christianh814/ocp4-upi-helpernode#ocp4-upi-helper-node-playbook

https://github.com/openshift/cluster-samples-operator/blob/master/manifests/image-references

https://github.com/e-minguez/ocp4-upi-bm-pxeless-staticips/blob/master/docs/12-post-installation.md

https://www.openshift.com/blog/deploying-a-upi-environment-for-openshift-4-1-on-vms-and-bare-metal

openshift 4.6 static IP offline baremetal installation, with operator hub

Installation walkthrough video

This document describes installing ocp4.6 on baremetal (simulated with kvm) using static IPs, including the operator hub steps.

Offline installation package download

The ocp4 offline installation package is prepared differently from 3.11; prepare it as follows. Because the default baremetal flow requires a dhcp/pxe environment, you need a helper machine running dhcp, tftp, haproxy and similar services. To make on-site project work easier, a tool for editing ignition files is also included, so the offline package carries a few extra third-party tools.

https://github.com/wangzheng422/ocp4-upi-helpernode is the tool used to build the helper machine.

https://github.com/wangzheng422/filetranspiler is the tool used to modify ignition files.

The packaged installation media can be downloaded from the Baidu drive link below; this build is version 4.6.28:

  • Link: https://pan.baidu.com/s/1XFbiOAcz7nul-N9U0aDxHg Password: 6qtt

It contains the following kinds of files:

  • ocp4.tgz contains the iso and other installation media, all of the install scripts, and the full list of downloaded images. Copy it to the kvm host and to the helper machine.
  • registry.tgz is the packaged docker image registry repository. If you need to add images to it first, follow 4.6.add.image.md
  • install.image.tgz holds the extra images needed while installing the cluster.
  • rhel-data.7.9.tgz is the yum repository for rhel 7 hosts; it is large because it also contains gpu, epel and other content. It is mainly used for the kvm host, the helper machine, and rhel worker nodes.

To merge the split download files, use a command like the following

cat registry.?? > registry.tgz

Prepare the offline installation source on an Internet-facing cloud host

The documentation for preparing the offline installation media has moved here: 4.6.build.dist.md

KVM host preparation

This lab runs the install tests in many virtual machines on a single 32C, 256G host, so prepare that host first.

If you use more than one host, be sure to adjust the time settings so that all hosts' clocks stay closely in sync; otherwise certificates will run into problems.

The main preparation steps are

  • configure the yum repo
  • configure dns
  • install the image registry
  • set up the vnc environment
  • create the network that kvm needs
  • create the helper kvm
  • set up an haproxy that forwards external traffic into the kvm network

Of the steps above, the dns part needs to be adjusted to the actual project environment.

The host for this run is a rhel8 machine; see rhel8.build.kernel.repo.cache.md for the basic setup such as the offline repo.

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren
EOF

dnf clean all
dnf repolist

dnf -y install byobu htop 

systemctl disable --now firewalld

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts
openssl req \
   -newkey rsa:2048 -nodes -keyout redhat.ren.key \
   -x509 -days 3650 -out redhat.ren.crt -subj \
   "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.ocp4.redhat.ren" \
   -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "[SAN]\nsubjectAltName=DNS:registry.ocp4.redhat.ren,DNS:*.ocp4.redhat.ren,DNS:*.redhat.ren"))

/bin/cp -f /etc/crts/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

podman run --name local-registry -p 5443:5000 \
  -d --restart=always \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# 加载更多的镜像
# 解压缩 ocp4.tgz
bash add.image.load.sh /data/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# 准备vnc环境
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
desktop=sandbox
geometry=1280x800
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

systemctl start vncserver@:1
# 如果你想停掉vnc server,这么做
systemctl stop vncserver@:1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

# 创建实验用虚拟网络

cat << EOF >  /data/kvm/virt-net.xml
<network>
  <name>openshift4</name>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='openshift4' stp='on' delay='0'/>
  <domain name='openshift4'/>
  <ip address='192.168.7.1' netmask='255.255.255.0'>
  </ip>
</network>
EOF

virsh net-define --file /data/kvm/virt-net.xml
virsh net-autostart openshift4
virsh net-start openshift4

# restore back
virsh net-destroy openshift4
virsh net-undefine openshift4

# 创建工具机

mkdir -p /data/kvm
cd /data/kvm

lvremove -f rhel/helperlv
lvcreate -y -L 200G -n helperlv rhel

virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/rhel/helperlv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --location /data/kvm/rhel-8.3-x86_64-dvd.iso \
--initrd-inject helper-ks-rhel8.cfg --extra-args "inst.ks=file:/helper-ks-rhel8.cfg" 

# restore kvm
virsh destroy ocp4-aHelper
virsh undefine ocp4-aHelper

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.default
cat << EOF > /etc/chrony.conf
# pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# setup ftp data root
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf


Helper machine preparation

The following installation steps are performed inside the helper machine.

The main steps are

  • configure the yum repo
  • run the ansible playbook that configures the helper automatically
  • upload the customized install configuration file
  • generate the ignition files

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-epel]
name=epel
baseurl=ftp://${YUMIP}/dnf/epel
enabled=1
gpgcheck=0

[remote-epel-modular]
name=epel-modular
baseurl=ftp://${YUMIP}/dnf/epel-modular
enabled=1
gpgcheck=0

[remote-appstream]
name=appstream
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[remote-baseos]
name=baseos
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[remote-baseos-source]
name=baseos-source
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-source-rpms
enabled=1
gpgcheck=0

[remote-supplementary]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-supplementary-rpms
enabled=1
gpgcheck=0

[remote-codeready-builder]
name=codeready-builder
baseurl=ftp://${YUMIP}/dnf/codeready-builder-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

yum clean all
yum makecache
yum repolist

yum -y install ansible git unzip podman python3

yum -y update

reboot

# yum -y install ansible git unzip podman python36

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp /data/down/ocp4.tgz root@192.168.7.11:/data/
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# 这里使用了一个ansible的项目,用来部署helper节点的服务。
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# 这里使用了一个ignition文件合并的项目,用来帮助自定义ignition文件。
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

# 接下来,我们使用ansible来配置helper节点,装上各种openshift集群需要的服务
# 根据现场环境,修改 ocp4-upi-helpernode-master/vars-static.yaml
# 主要是修改各个节点的网卡和硬盘参数,还有IP地址
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}' tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# GOTO image registry host
# copy crt files to helper node
scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.key root@192.168.7.11:/data/install/

# GO back to help node
/bin/cp -f /data/install/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# 定制ignition
cd /data/install

# 根据现场环境,修改 install-config.yaml
# 至少要修改ssh key, 还有 additionalTrustBundle,这个是镜像仓库的csr 

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ppa.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/helper_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /data/install/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cd /data/install/
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

openshift-install create ignition-configs --dir=/data/install

cd /data/ocp4/ocp4-upi-helpernode-master
# 我们来为每个主机,复制自己版本的ign,并复制到web server的目录下
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}' tasks/ign.yml
# 如果对每个主机有自己ign的独特需求,在这一步,去修改ign。

# 以下操作本来是想设置网卡地址,但是实践发现是不需要的。
# 保留在这里,是因为他可以在安装的时候注入文件,非常有用。
# mkdir -p bootstrap/etc/sysconfig/network-scripts/
# cat <<EOF > bootstrap/etc/sysconfig/network-scripts/ifcfg-ens3
# DEVICE=ens3
# BOOTPROTO=none
# ONBOOT=yes
# IPADDR=192.168.7.12
# NETMASK=255.255.255.0
# GATEWAY=192.168.7.1
# DNS=192.168.7.11
# DNS1=192.168.7.11
# DNS2=192.168.7.1
# DOMAIN=redhat.ren
# PREFIX=24
# DEFROUTE=yes
# IPV6INIT=no
# EOF
# filetranspiler -i bootstrap.ign -f bootstrap -o bootstrap-static.ign
# /bin/cp -f bootstrap-static.ign /var/www/html/ignition/

# 我们为每个节点创建各自的iso文件
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}' tasks/iso.yml

Back on the KVM host

In principle the installation could start right here, but we know that installing coreos by hand means typing a very long kernel command line; in practice you will never get it right, one wrong character and the install fails, and you have to reboot and type it all again...

To avoid that tedium, following practices described online, we customize an iso for each host. Fortunately the earlier steps already produced the needed isos with ansible; copy them to the kvm host and continue.

There is one gotcha: we do not know the hosts' NIC names in advance. You have to boot the coreos iso once, drop into the single-user shell, and run ip a to find out; usually it is ens3.

Likewise, when installing on physical machines, use the same method to confirm which disk device to install to. It is also recommended to install rhel 8 on the physical machine first, to confirm that the hardware can run coreos at all. If a physical install does not write to disk, try adding the boot parameter: ignition.firstboot=1

# on kvm host

export KVM_DIRECTORY=/data/kvm

cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/

create_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L 120G -n $var_lv $var_vg
    # wipefs --all --force /dev/datavg/$var_name
}

create_lv rhel bootstraplv
create_lv nvme master0lv
create_lv nvme master1lv
create_lv nvme master2lv
create_lv rhel worker0lv
create_lv rhel worker1lv
create_lv rhel worker2lv

# finally, we can start install :)
# 你可以一口气把虚拟机都创建了,然后喝咖啡等着。
# 从这一步开始,到安装完毕,大概30分钟。
virt-install --name=ocp4-bootstrap --vcpus=4 --ram=8192 \
--disk path=/dev/rhel/bootstraplv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-bootstrap.iso   

# 想登录进coreos一探究竟?那么这么做
# ssh core@bootstrap
# journalctl -b -f -u bootkube.service

virt-install --name=ocp4-master0 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-0.iso 

# ssh core@192.168.7.13

virt-install --name=ocp4-master1 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-1.iso 

virt-install --name=ocp4-master2 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-2.iso 

virt-install --name=ocp4-worker0 --vcpus=4 --ram=32768 \
--disk path=/dev/rhel/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso 

virt-install --name=ocp4-worker1 --vcpus=4 --ram=16384 \
--disk path=/dev/rhel/worker1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-1.iso 

virt-install --name=ocp4-worker2 --vcpus=4 --ram=16384 \
--disk path=/dev/rhel/worker2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-2.iso 

# on workstation
# open http://192.168.7.11:9000/
# to check

# if you want to stop or delete vm, try this
virsh list --all
virsh destroy ocp4-bootstrap
virsh destroy ocp4-master0 
virsh destroy ocp4-master1 
virsh destroy ocp4-master2 
virsh destroy ocp4-worker0 
virsh destroy ocp4-worker1 
virsh destroy ocp4-worker2
virsh undefine ocp4-bootstrap
virsh undefine ocp4-master0 
virsh undefine ocp4-master1 
virsh undefine ocp4-master2 
virsh undefine ocp4-worker0 
virsh undefine ocp4-worker1 
virsh undefine ocp4-worker2

On the helper machine

At this point the installation has already started automatically; all we have to do is go back to the helper machine and watch.

During the bootstrap and master installation phases, use these commands to follow progress.

cd /data/ocp4
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

cd /data/install
openshift-install wait-for bootstrap-complete --log-level debug

If everything is normal, you will see output like this.

Sometimes the certificates have expired. To verify, log into the bootstrap node and check the expiry time. If they really have expired, clear every cached file generated by openshift-install and start over.

echo | openssl s_client -connect localhost:6443 | openssl x509 -noout -text | grep Not

Generally, if you deleted the cached files before this openshift-install step as documented, the expiry problem will not occur.

oc get nodes

At this point you can only see the masters, because the workers' csr have not been approved yet. If the virtual machines were all created in one go, you will most likely not hit the problem below.

oc get csr

You will find quite a few that are not yet approved

Approve them

yum -y install jq
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr -o name | xargs oc adm certificate approve

The worker nodes' cpu then spikes, and after that the workers show up.

Wait a while; when you see this, everything is on track.

Once the steps above are done, you can finish the installation

openshift-install wait-for install-complete --log-level debug
# here is the output
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "6yL7t-uDCaN-6grKP-VtYkx"

Our helper machine provides nfs, so configure proper nfs-backed storage rather than relying on emptydir

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh

# oc edit configs.imageregistry.operator.openshift.io
# 修改 storage 部分
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# 把imagepruner给停掉
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

Configure local dns (resolve *.apps.ocp4.redhat.ren to 192.168.7.11) so that it points at the haproxy on the helper machine; you can then open the management console in a browser.

chrony/NTP setup

In ocp 4.6, ntp synchronization needs to be configured. The earlier ansible run already generated the ntp MachineConfig; just apply it to the cluster.

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/

Operator Hub offline installation

https://docs.openshift.com/container-platform/4.2/operators/olm-restricted-networks.html

https://github.com/operator-framework/operator-registry

https://www.cnblogs.com/ericnie/p/11777384.html?from=timeline&isappinstalled=0

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html-single/images/index

Preparing operator hub happens on two levels. The first level, described in this article, is building the offline operator hub resources and mirroring the operator images themselves; with that done, a disconnected ocp4 cluster can show the operator hub and deploy operators. The second level is the images an operator pulls when it deploys the components you actually want to use; those also have to be mirrored offline, but since every operator needs different images and there is no single place that lists them, each project site has to mirror the extras it needs. This project mirrors as many of them as it can, but gaps currently cannot be ruled out.

# on helper node, 在工具机上
cd /data/ocp4

# scp /etc/crts/redhat.ren.crt 192.168.7.11:/root/ocp4/
# https://docs.openshift.com/container-platform/4.4/builds/setting-up-trusted-ca.html
oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.crt
# 如果你想删除这个config map,这么做
# oc delete configmap ca.for.registry
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge
# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["registry.redhat.ren"]}}}'  --type=merge
oc get image.config.openshift.io/cluster -o yaml

# 以下这个步骤是官网文档要做的,实践中发现,disconnected环境不需要
# oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
# 如果你不小心还是照着官网做了,用如下步骤删掉
# oc patch OperatorHub cluster --type json  -p '[{"op": "remove", "path": "/spec/disableAllDefaultSources"}]'

oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml

# yum -y install python36
# 根据项目现场情况,调整参数,运行以下命令,生成配置文件,指向内网镜像仓库
cd /data/ocp4/
bash image.registries.conf.sh registry.ocp4.redhat.ren:5443

# 由于某些ocp 4.2的更新机制,以下操作会触发集群更新,
# 集群节点会逐个重启,集群组件也会逐个重启,请等待集群重启完毕。
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

# !!!正常情况,以下操作不需要!!!
# 以下操作,删除mirror镜像信息,也会触发集群更新操作,请等待集群重启完毕
oc delete -f ./99-worker-container-registries.yaml -n openshift-config
oc delete -f ./99-master-container-registries.yaml -n openshift-config

watch oc get machineconfigpools

watch oc get node

From the monitoring console you can see the nodes upgrading and rebooting.


# on helper node

# params for operator hub images
export var_date='2020.11.23.0135'
echo $var_date
export var_major_version='4.6'
echo ${var_major_version}

export LOCAL_REG='registry.ocp4.redhat.ren:5443'

# 如果想看到redhat的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:redhat-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > redhat-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators-catalog
  namespace: openshift-marketplace
spec:
  displayName: Red Hat Operators
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:redhat-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f redhat-operator-catalog.yaml

# 如果想看到certified的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:certified-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > certified-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: certified-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:certified-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f certified-operator-catalog.yaml

# 如果想看到community的operator,这样做
# 镜像源在 docker.io/wangzheng422/operator-catalog:community-$var_major_version-$var_date
# 后面的参数,去build.dist.sh文件里面,查看
# var_date 和 var_major_version 参数得到
cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:community-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f community-operator-catalog.yaml

cat <<EOF > marketplace-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-marketplace-catalog
  namespace: openshift-marketplace
spec:
  displayName: Red Hat Marketplace
  sourceType: grpc
  image: ${LOCAL_REG}/ocp4/operator-catalog:redhat-marketplace-${var_major_version}-${var_date}
  publisher: Red Hat
EOF
oc create -f marketplace-operator-catalog.yaml

# 想删除这些离线operator hub,就这样做。
# find . -name "*-operator-catalog.yaml" -exec oc delete -f {} \;

oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc get packagemanifest -n openshift-marketplace

You can now see the operator list

Deploying an operator also works

# set master and worker combine
# https://github.com/openshift-telco/openshift4x-poc/blob/master/MASTER-WORKER-COMBINED.md
oc edit schedulers cluster
# apiVersion: config.openshift.io/v1
# kind: Scheduler
# metadata:
# name: cluster
# spec:
#     mastersSchedulable: true

Other links

https://www.cnblogs.com/ericnie/p/11764124.html

The following are reference materials

https://blog.openshift.com/openshift-4-2-disconnected-install/

https://blog.openshift.com/openshift-4-bare-metal-install-quickstart/

https://github.com/christianh814/ocp4-upi-helpernode#ocp4-upi-helper-node-playbook

https://github.com/openshift/cluster-samples-operator/blob/master/manifests/image-references

https://github.com/e-minguez/ocp4-upi-bm-pxeless-staticips/blob/master/docs/12-post-installation.md

https://www.openshift.com/blog/deploying-a-upi-environment-for-openshift-4-1-on-vms-and-bare-metal

openshift 4.6 offline baremetal IPI (fully automated) installation, single-network mode

Introduction

Video walkthrough

This document describes the IPI (fully automated) installation of ocp4.6 on baremetal (simulated with kvm).

According to the openshift documentation, a baremetal IPI installation has two modes: one with a separate provisioning network, and one where the provisioning network is merged with the baremetal (service) network. Given typical PoC site environments, this lab uses the simpler network layout, i.e. the merged mode.

The architecture diagram for this lab is below:

Offline installation package download

The packaged installation media can be downloaded from the Baidu drive link below; this build is version 4.6.9-ccn:

Link: https://pan.baidu.com/s/1jJU0HLnZMnvCNMNq1OEDxA Password: uaaw

It contains the following kinds of files:

  • ocp4.tgz contains the iso and other installation media, all of the install scripts, and the full list of downloaded images. Copy it to the kvm host and to the helper machine.
  • registry.tgz is the packaged docker image registry repository. If you need to add images to it first, follow 4.6.add.image.md
  • nexus-image.tgz is the packaged nexus image repository; the cluster's image proxy points at nexus, which provides the image cache
  • poc.image.tgz contains extra images that supplement registry.tgz, mainly for ccn; the list of added images is in poc.image.list, and the procedure is in 4.6.add.image.md

To merge the split download files, use a command like the following

cat registry.?? > registry.tgz

Note that you may need to update the helper ansible scripts inside the offline package.

Prepare the offline installation source on an Internet-facing cloud host

The documentation for preparing the offline installation media has moved here: 4.6.build.dist.md

KVM host preparation

This lab runs the install tests in many virtual machines on a single 32C, 256G host, so prepare that host first.

If you use more than one host, be sure to adjust the time settings so that all hosts' clocks stay closely in sync; otherwise certificates will run into problems.

The main preparation steps are

  • configure the yum repo
  • configure dns
  • install the image registry
  • set up the vnc environment
  • create the network that kvm needs
  • create the helper kvm

Of the steps above, the dns part needs to be adjusted to the actual project environment.

The host for this run is a rhel8 machine; see rhel8.build.kernel.repo.cache.md for the basic setup such as the offline repo.

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren nexus.ocp4.redhat.ren git.ocp4.redhat.ren
EOF

dnf clean all
dnf repolist

dnf -y install byobu htop jq ipmitool

systemctl disable --now firewalld

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

podman run --name local-registry -p 5443:5000 \
  -d --restart=always \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

podman start local-registry

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# 加载更多的镜像
# 解压缩 ocp4.tgz
bash add.image.load.sh /data/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# 准备vnc环境
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
desktop=sandbox
geometry=1440x855
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

systemctl start vncserver@:1
# 如果你想停掉vnc server,这么做
systemctl stop vncserver@:1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

# 创建实验用虚拟网络

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.105/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
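
# the script above is only written to disk here; run it once on the host before the nmcli tweak below
# (it assumes the physical NIC is eno1 and that the addresses above match your environment)
bash /data/kvm/bridge.sh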

nmcli con mod baremetal +ipv4.address '192.168.7.1/24'
nmcli networking off; nmcli networking on

# 创建工具机

mkdir -p /data/kvm
cd /data/kvm

lvremove -f rhel/helperlv
lvcreate -y -L 200G -n helperlv rhel

virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/rhel/helperlv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --location /data/kvm/rhel-8.3-x86_64-dvd.iso \
--initrd-inject helper-ks-rhel8-ipi.cfg --extra-args "inst.ks=file:/helper-ks-rhel8-ipi.cfg" 

virsh start ocp4-aHelper

# DO NOT USE, restore kvm
virsh destroy ocp4-aHelper
virsh undefine ocp4-aHelper

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.default
cat << EOF > /etc/chrony.conf
# pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# setup ftp data root
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf

# create the master and worker vm, but not start them
export KVM_DIRECTORY=/data/kvm
mkdir -p ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
# scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/

remove_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
}

create_lv() {
    var_vg=$1
    var_lv=$2
    lvcreate -y -L 120G -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

remove_lv nvme master0lv
remove_lv nvme master1lv
remove_lv nvme master2lv
remove_lv rhel worker0lv
remove_lv rhel worker1lv
remove_lv rhel worker2lv

# create_lv rhel bootstraplv
create_lv nvme master0lv
create_lv nvme master1lv
create_lv nvme master2lv
create_lv rhel worker0lv
create_lv rhel worker1lv
create_lv rhel worker2lv

virt-install --name=ocp4-master0 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master0.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master0.xml

virt-install --name=ocp4-master1 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master1.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master1.xml

virt-install --name=ocp4-master2 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master2.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master2.xml

virt-install --name=ocp4-worker0 --vcpus=8 --ram=65536 \
--disk path=/dev/rhel/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker0.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker0.xml

virt-install --name=ocp4-worker1 --vcpus=4 --ram=32768 \
--disk path=/dev/rhel/worker1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker1.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker1.xml

virt-install --name=ocp4-worker2 --vcpus=2 --ram=8192 \
--disk path=/dev/rhel/worker2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker2.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker2.xml

cd /data/kvm/
for i in master{0..2} worker{0..2}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ocp4-${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list
cat /data/kvm/mac.list
# master0 52:54:00:7b:5b:83
# master1 52:54:00:9b:f4:bc
# master2 52:54:00:72:16:ac
# worker0 52:54:00:19:f4:65
# worker1 52:54:00:88:4f:2c
# worker2 52:54:00:ed:25:30

# GOTO image registry & kvm host
# copy crt files to helper node
ssh-copy-id root@192.168.7.11

ssh root@192.168.7.11 mkdir -p /data/install
ssh root@192.168.7.11 mkdir -p /data/ocp4
scp /data/down/ocp4.tgz root@192.168.7.11:/data/
rsync -e ssh --info=progress2 -P --delete -arz /data/ocp4/ 192.168.7.11:/data/ocp4/

scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /data/kvm/mac.list root@192.168.7.11:/data/install/

# install redfish for kvm
# https://access.redhat.com/solutions/4315581
# https://access.redhat.com/solutions/3057171
# https://docs.openstack.org/virtualbmc/latest/user/index.html
# https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html
dnf -y install python3-pip
# pip3 install --user sushy-tools

mkdir -p /data/install
cd /data/install

# podman create --name swap docker.io/wangzheng422/imgs:openshift-baremetal-install-4.6.5 ls
# podman cp swap:/openshift-baremetal-install ./
# podman rm -fv swap

podman create --name swap docker.io/wangzheng422/imgs:ocp.bm.ipi.python.dep.rhel8-4.6.7 ls
podman cp swap:/wheelhouse.tar.gz - > wheelhouse.tar.gz
tar zvxf wheelhouse.tar.gz
podman rm -fv swap

pip3 install --user -r wheelhouse/requirements.txt --no-index --find-links wheelhouse

/root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key

# curl https://registry.ocp4.redhat.ren:8000/redfish/v1/Systems/

# DO NOT USE, restore 
# if you want to stop or delete vm, try this
virsh list --all
# virsh destroy ocp4-bootstrap
virsh destroy ocp4-master0 
virsh destroy ocp4-master1 
virsh destroy ocp4-master2 
virsh destroy ocp4-worker0 
virsh destroy ocp4-worker1 
virsh destroy ocp4-worker2
# virsh undefine ocp4-bootstrap
virsh undefine ocp4-master0 --nvram
virsh undefine ocp4-master1 --nvram
virsh undefine ocp4-master2 --nvram
virsh undefine ocp4-worker0 --nvram
virsh undefine ocp4-worker1 --nvram
virsh undefine ocp4-worker2 --nvram

Helper node preparation

The following installation steps are performed inside the helper (bastion) node.

The main tasks are:

  • configure the yum repositories
  • run the ansible playbook to configure the helper node automatically
  • upload the customized installation configuration file
  • generate the ignition files

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

systemctl disable --now firewalld

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-epel]
name=epel
baseurl=ftp://${YUMIP}/dnf/epel
enabled=1
gpgcheck=0

[remote-epel-modular]
name=epel-modular
baseurl=ftp://${YUMIP}/dnf/epel-modular
enabled=1
gpgcheck=0

[remote-appstream]
name=appstream
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[remote-baseos]
name=baseos
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[remote-baseos-source]
name=baseos-source
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-source-rpms
enabled=1
gpgcheck=0

[remote-supplementary]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-supplementary-rpms
enabled=1
gpgcheck=0

[remote-codeready-builder]
name=codeready-builder
baseurl=ftp://${YUMIP}/dnf/codeready-builder-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

yum clean all
yum makecache
yum repolist

yum -y install ansible git unzip podman python3

yum -y update

reboot

# yum -y install ansible git unzip podman python36

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp /data/down/ocp4.tgz root@192.168.7.11:/data/
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# this ansible project is used to deploy the services on the helper node
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# this project merges ignition files and helps customize them
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

mkdir -p /data/install

mkdir -p /data/ocp4/
cd /data/ocp4/
cat << 'EOF' > redfish.sh
#!/usr/bin/env bash

curl -k -s https://192.168.7.1:8000/redfish/v1/Systems/ | jq -r '.Members[]."@odata.id"' >  list

while read -r line; do
    curl -k -s https://192.168.7.1:8000/$line | jq -j '.Id, " ", .Name, "\n" '
done < list

EOF
bash redfish.sh > /data/install/vm.list
cat /data/install/vm.list
# 9cc02fbc-cbfe-4006-b5a9-f04712321157 ocp4-worker0
# b1a13dd1-7864-4b61-bd0c-851c11f87199 ocp4-master0
# 0a121472-6d24-47ae-9715-8e8e175ab397 ocp4-master2
# b30891d1-b14b-4645-9b05-504a58e1e059 ocp4-worker1
# fb261d6c-31c5-4e7e-8020-2789d5cc63e3 ocp4-aHelper
# 4497d313-390c-4c6b-a5d6-3f533e397aaf ocp4-master1
# f9b0a86d-1587-47ea-9a92-a2762b0684fd ocp4-worker2

cat << EOF > /data/ocp4/ocp4-upi-helpernode-master/vars-dhcp.rhel8.yaml
---
ssh_gen_key: true
staticips: false
bm_ipi: true
firewalld: false
dns_forward: false
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso"
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "192.168.7.1"
  forwarder2: "192.168.7.1"
  api_vip: "192.168.7.100"
  ingress_vip: "192.168.7.101"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.70"
  poolend: "192.168.7.90"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.12"
  interface: "enp1s0"
  install_drive: "vda"
  macaddr: "52:54:00:7e:f8:f7"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep master0 | awk '{print $2}')"
  - name: "master-1"
    ipaddr: "192.168.7.14"
    interface: "enp1s0"
    install_drive: "vda"    
    macaddr: "$(cat /data/install/mac.list | grep master1 | awk '{print $2}')"
  - name: "master-2"
    ipaddr: "192.168.7.15"
    interface: "enp1s0"
    install_drive: "vda"   
    macaddr: "$(cat /data/install/mac.list | grep master2 | awk '{print $2}')"
workers:
  - name: "worker-0"
    ipaddr: "192.168.7.16"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker0 | awk '{print $2}')"
  - name: "worker-1"
    ipaddr: "192.168.7.17"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker1 | awk '{print $2}')"
  - name: "worker-2"
    ipaddr: "192.168.7.18"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker2 | awk '{print $2}')"
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "yum"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "quay"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "nexus"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "git"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
otherdomains:
  - domain: "rhv.redhat.ren"
    hosts:
    - name: "manager"
      ipaddr: "192.168.7.71"
    - name: "rhv01"
      ipaddr: "192.168.7.72"
  - domain: "cmri-edge.redhat.ren"
    hosts:
    - name: "*"
      ipaddr: "192.168.7.71"
    - name: "*.apps"
      ipaddr: "192.168.7.72"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/4.6.9/openshift-client-linux-4.6.9.tar.gz"
ocp_installer: "file:///data/ocp4/4.6.9/openshift-install-linux-4.6.9.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.1"
      options: iburst
setup_registry:
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

# next, use ansible to configure the helper node and install the services the openshift cluster needs
# adjust ocp4-upi-helpernode-master/vars-static.yaml according to the on-site environment
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-dhcp.rhel8.yaml -e '{ staticips: false, bm_ipi: true }'  tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# GO back to help node
/bin/cp -f /data/install/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# adjust install-config.yaml according to the on-site environment
# at minimum change the ssh key and additionalTrustBundle, which is the CA certificate of the image registry

# copy your pull secret file into helper
# SEC_FILE='/data/pull-secret.json'
# cat << 'EOF' > $SEC_FILE

# customize the ignition files
cd /data/install

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
platform:
  baremetal:
    apiVIP: 192.168.7.100
    ingressVIP: 192.168.7.101
    bootstrapProvisioningIP: 192.168.7.102
    provisioningHostIP: 192.168.7.103
    provisioningNetwork: "Disabled"
    bootstrapOSImage: http://192.168.7.11:8080/install/rhcos-qemu.x86_64.qcow2.gz?sha256=$(zcat /var/www/html/install/rhcos-qemu.x86_64.qcow2.gz | sha256sum | awk '{print $1}')
    clusterOSImage: http://192.168.7.11:8080/install/rhcos-openstack.x86_64.qcow2.gz?sha256=$(sha256sum /var/www/html/install/rhcos-openstack.x86_64.qcow2.gz | awk '{print $1}')
    hosts:
      - name: master-0
        role: master
        bmc:
          address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep master0 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master0 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: master-1
        role: master
        bmc:
          address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep master1 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master1 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: master-2
        role: master
        bmc:
          address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep master2 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master2 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: worker-0
        role: worker
        bmc:
          address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep worker0 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep worker0 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: worker-1
        role: worker
        bmc:
          address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep worker1 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep worker1 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "/dev/vda"
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR: 192.168.7.0/24
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
  platform:
    baremetal: {}
pullSecret: '$( cat /data/pull-secret.json )'
sshKey: |
$( cat /root/.ssh/helper_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /data/install/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

# GO back to host
mkdir -p /data/install
cd /data/install
scp root@192.168.7.11:/data/install/install-config.yaml /data/install/

cd /data/install
for i in $(sudo virsh list --all | tail -n +3 | grep bootstrap | awk {'print $2'});
do
  sudo virsh destroy $i;
  sudo virsh undefine $i;
  sudo virsh vol-delete $i --pool default;
  sudo virsh vol-delete $i.ign --pool default;
  virsh pool-destroy $i
  virsh pool-delete $i
  virsh pool-undefine $i
done
/bin/rm -rf .openshift_install.log .openshift_install_state.json terraform* auth tls 
/data/ocp4/4.6.9/openshift-baremetal-install --dir /data/install/ --log-level debug create cluster

# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "tjRNB-xHf2f-fFh8n-ppNXi"

# on kvm host, copy back auth folder
rsync -arz /data/install/auth root@192.168.7.11:/data/install/

# Go back to helper
ansible localhost -m lineinfile -a 'path=$HOME/.bashrc regexp="^export KUBECONFIG" line="export KUBECONFIG=/data/install/auth/kubeconfig"'
source $HOME/.bashrc

oc get node
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api
oc get bmh -n openshift-machine-api
# NAME       STATUS   PROVISIONING STATUS      CONSUMER                    BMC                                                                                               HARDWARE PROFILE   ONLINE   ERROR
# master-0   OK       externally provisioned   ocp4-zn8lq-master-0         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/965c420a-f127-4639-9184-fe3546d2bde4                      true
# master-1   OK       externally provisioned   ocp4-zn8lq-master-1         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/46f9dff4-1b44-4286-8a7c-691673340030                      true
# master-2   OK       externally provisioned   ocp4-zn8lq-master-2         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/9e544eb6-1b98-4b0a-ad32-7df232ae582a                      true
# worker-0   OK       provisioned              ocp4-zn8lq-worker-0-mv4d7   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/c399c6b7-525a-4f4e-8280-0472b6494fc5   unknown            true
# worker-1   OK       provisioned              ocp4-zn8lq-worker-0-9frt6   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/a4052132-7598-4879-b3e1-c48c47cf67ed   unknown            true

Now we can see the BareMetalHost output. In the web console the node configuration points to the BareMetalHost, and the long-missed machine objects are back as well.

Adding a new node

In IPI mode adding a new node is very easy: just define a BareMetalHost.

cd /data/install/
cat << EOF > /data/install/bmh.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-2-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-2
spec:
  online: true
  bootMACAddress: $(cat mac.list | grep worker2 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep worker2 | awk '{print $1}')
    credentialsName: worker-2-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
EOF
oc -n openshift-machine-api create -f bmh.yaml

# DO NOT USE, restore, delete the vm
oc -n openshift-machine-api delete -f bmh.yaml

oc get bmh -n openshift-machine-api
# NAME       STATUS   PROVISIONING STATUS      CONSUMER                    BMC                                                                                               HARDWARE PROFILE   ONLINE   ERROR
# master-0   OK       externally provisioned   ocp4-zn8lq-master-0         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/965c420a-f127-4639-9184-fe3546d2bde4                      true
# master-1   OK       externally provisioned   ocp4-zn8lq-master-1         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/46f9dff4-1b44-4286-8a7c-691673340030                      true
# master-2   OK       externally provisioned   ocp4-zn8lq-master-2         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/9e544eb6-1b98-4b0a-ad32-7df232ae582a                      true
# worker-0   OK       provisioned              ocp4-zn8lq-worker-0-mv4d7   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/c399c6b7-525a-4f4e-8280-0472b6494fc5   unknown            true
# worker-1   OK       provisioned              ocp4-zn8lq-worker-0-9frt6   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/a4052132-7598-4879-b3e1-c48c47cf67ed   unknown            true
# worker-2   OK       inspecting                                           redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/2eee2e57-e18b-460b-bb3f-7f048f84c69b                      true

oc get machinesets -n openshift-machine-api
# NAME                  DESIRED   CURRENT   READY   AVAILABLE   AGE
# ocp4-zn8lq-worker-0   2         2         2       2           155m

oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name

# scale the workers to 3 replicas, which triggers the deployment of worker-2
oc scale --replicas=3 machineset $(oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name) -n openshift-machine-api
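
# after scaling, a quick way to watch the new worker being provisioned (a sketch, not strictly required):
oc get bmh,machines -n openshift-machine-api
oc get nodes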

Image registry proxy

Preparing an offline image registry is quite tedious. Fortunately we have a host with internet access, so we can build an image registry proxy with nexus, run the PoC once in the connected environment, and then harvest the offline images through that proxy; a quick verification sketch follows the reference link below.

  • https://mtijhof.wordpress.com/2018/07/23/using-nexus-oss-as-a-proxy-cache-for-docker-images/
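
Once the docker proxy repository has been created in nexus, a simple sanity check is to pull a small public image through it; a minimal sketch (assuming docker hub is the upstream and port 8083 serves the docker proxy):

podman pull --tls-verify=false nexus.ocp4.redhat.ren:8083/library/busybox:latest
podman rmi nexus.ocp4.redhat.ren:8083/library/busybox:latest
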
#####################################################
# init build the nexus fs
/bin/cp -f nexus-image.tgz /data/ccn/
tar zxf nexus-image.tgz
chown -R 200 /data/ccn/nexus-image

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.29.0

podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

podman stop nexus-image
podman rm nexus-image

# get the admin password
cat /data/ccn/nexus-image/admin.password && echo
# 84091bcd-c82f-44a3-8b7b-dfc90f5b7da1

# open http://nexus.ocp4.redhat.ren:8082

# enable https
# https://blog.csdn.net/s7799653/article/details/105378645
# https://help.sonatype.com/repomanager3/system-configuration/configuring-ssl#ConfiguringSSL-InboundSSL-ConfiguringtoServeContentviaHTTPS
mkdir -p /data/install/tmp
cd /data/install/tmp

# export the certificate into pkcs12 format
# you will be prompted for an export password, use: password
openssl pkcs12 -export -out keystore.pkcs12 -inkey /etc/crts/redhat.ren.key -in /etc/crts/redhat.ren.crt

cat << EOF >> Dockerfile
FROM docker.io/sonatype/nexus3:3.29.0
USER root
COPY keystore.pkcs12 /keystore.pkcs12
RUN keytool -v -importkeystore -srckeystore keystore.pkcs12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS -storepass password -srcstorepass password  &&\
    cp keystore.jks /opt/sonatype/nexus/etc/ssl/
USER nexus
EOF
buildah bud --format=docker -t docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh -f Dockerfile .
buildah push docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

######################################################
# go to helper, update proxy setting for ocp cluster
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

mkdir -p /etc/containers/registries.conf.d
/bin/cp -f image.registries.conf /etc/containers/registries.conf.d/
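
# for reference, the generated image.registries.conf is roughly of the following shape
# (a sketch only; the real content comes from image.registries.conf.sh and lists more registries):
# unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
# [[registry]]
#   prefix = ""
#   location = "quay.io/openshift-release-dev/ocp-release"
#   mirror-by-digest-only = true
#   [[registry.mirror]]
#     location = "nexus.ocp4.redhat.ren:8083/openshift-release-dev/ocp-release"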

cd /data/ocp4
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

######################################################
# dump the nexus image fs out
podman stop nexus-image

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus-image | pigz -c > nexus-image.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus-image.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:image-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
buildah push docker.io/wangzheng422/nexus-fs:image-$var_date
echo "docker.io/wangzheng422/nexus-fs:image-$var_date"

# the following version can be used to seed the image proxy; it already contains the nfs provisioner images and the sample operator metadata.
# a pleasant discovery: image streams do not pull the full images, they seem to fetch only the metadata, and the real layers are downloaded on first use.
# docker.io/wangzheng422/nexus-fs:image-2020-12-26-1118

Configure the CA of the image registry

The registry CA was already injected during installation, but the image streams apparently do not trust it, so let's add it again.

oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.ca.crt \
    --from-file=nexus.ocp4.redhat.ren..8083=/data/install/redhat.ren.ca.crt 
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["nexus.ocp4.redhat.ren:8083"]}}}'  --type=merge

oc get image.config.openshift.io/cluster -o yaml

# re-import the image streams under the openshift project
oc get is -o json | jq -r '.items[].metadata.name' | xargs -L1 oc import-image --all 

Configure the internal registry

Our helper node provides NFS, so let's back the internal registry with proper NFS storage instead of emptyDir.

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh
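
# a quick check that the nfs storage class is in place and (after the patch below) the registry PVC binds -- a sketch:
oc get sc
oc get pvc -n openshift-image-registry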

# oc edit configs.imageregistry.operator.openshift.io
# change the storage section to
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# suspend the imagepruner
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

Configure the samples operator

OpenShift ships with a built-in samples operator that carries image streams and templates for a whole pile of Red Hat products.

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed", "samplesRegistry": "nexus.ocp4.redhat.ren:8083"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

chrony/NTP settings

In OCP 4.6 NTP synchronization must be configured. The ansible playbook we ran earlier already generated the chrony MachineConfig files, so we just apply them to the cluster.

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/
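
For reference, a chrony MachineConfig of the kind generated by the ansible playbook looks roughly like the sketch below (worker role shown; the NTP server 192.168.7.1 follows this lab, the file name is made up for illustration, and the files under machineconfig/ are the ones actually applied above):

cat << EOF > /tmp/99-worker-chrony-example.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-chrony-example
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/chrony.conf
          mode: 420
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,$(printf 'server 192.168.7.1 iburst\ndriftfile /var/lib/chrony/drift\nmakestep 1.0 3\nrtcsync\nlogdir /var/log/chrony\n' | base64 -w 0)
EOF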

Operator Hub offline installation

With nexus acting as an image proxy this offline step is no longer required. However, for projects like CCN that bring their own catalog, we may still want to disable the default operator hub sources to avoid conflicts.


oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml
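
If operators are still needed after the default sources are disabled, the usual follow-up is to point the cluster at a mirrored catalog with a CatalogSource. A minimal sketch (the index image path below is only an assumption for illustration; use whatever index your mirror registry really hosts):

cat << EOF > /data/install/catalogsource-example.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators-mirror
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.ocp4.redhat.ren:5443/ocp4/redhat-operator-index:v4.6
  displayName: Mirrored Red Hat Operators
  publisher: wangzheng422
EOF
# oc apply -f /data/install/catalogsource-example.yaml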

Patch the image streams in the openshift project

In a proxied network environment we need to patch the image streams under the openshift project.

cd /data/ocp4
bash is.patch.sh registry.ocp4.redhat.ren:5443/ocp4/openshift4

Replace the certificate of the router / ingress

Sometimes we need a certificate signed by a public CA for the router, so here is how to set that up.

https://docs.openshift.com/container-platform/4.6/security/certificates/replacing-default-ingress-certificate.html


mkdir -p /data/ccn/ingress-keys/etc
mkdir -p /data/ccn/ingress-keys/lib
cd /data/ccn/ingress-keys
podman run -it --rm --name certbot \
            -v "/data/ccn/ingress-keys/etc:/etc/letsencrypt":Z \
            -v "/data/ccn/ingress-keys/lib:/var/lib/letsencrypt":Z \
            docker.io/certbot/certbot certonly  -d "*.apps.ocp4.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/apps.ocp4.redhat.ren/fullchain1.pem apps.ocp4.redhat.ren.crt
cp ./etc/archive/apps.ocp4.redhat.ren/privkey1.pem apps.ocp4.redhat.ren.key

ssh root@192.168.7.11 mkdir -p /data/install/ingress-key

scp apps.* root@192.168.7.11:/data/install/ingress-key

# on helper
cd /data/install/ingress-key

oc create secret tls wzh-ingress-key \
     --cert=apps.ocp4.redhat.ren.crt \
     --key=apps.ocp4.redhat.ren.key \
     -n openshift-ingress

oc patch ingresscontroller.operator default \
     --type=merge -p \
     '{"spec":{"defaultCertificate": {"name": "wzh-ingress-key"}}}' \
     -n openshift-ingress-operator
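
# verify the new certificate is actually served once the router pods in openshift-ingress have rolled -- a sketch:
oc get pod -n openshift-ingress
echo | openssl s_client -connect console-openshift-console.apps.ocp4.redhat.ren:443 \
  -servername console-openshift-console.apps.ocp4.redhat.ren 2>/dev/null | openssl x509 -noout -issuer -dates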

Troubleshooting tips


# login to bootstrap to debug
# find the ip from kvm console
ssh -i ~/.ssh/helper_rsa core@192.168.7.75
journalctl -b -f -u release-image.service -u bootkube.service
journalctl -b -u release-image.service -u bootkube.service | grep -i baremetal
sudo -i
export KUBECONFIG=/etc/kubernetes/kubeconfig
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api

# debug why bootstrap can't be ping...
cat .openshift_install_state.json | jq  '."*bootstrap.Bootstrap"'.Config.storage.files[].path

cat .openshift_install_state.json | jq -r '."*bootstrap.Bootstrap"'.File.Data | base64 -d | jq -r . > ign.json

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[].contents.source ' | sed 's/.*base64,//g' | base64 -d > decode

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[] | .path, .contents.source ' | while read -r line ; do if [[ $line =~ .*base64,.* ]]; then echo $(echo $line | sed 's/.*base64,//g' | base64 -d) ; else echo $line; fi; done > files


openshift 4.6 offline baremetal IPI (fully automatic) installation with a dedicated provisioning network (dual-network mode)

Introduction

Video walkthrough

This chapter describes the IPI (fully automatic) installation of OCP 4.6 on baremetal (emulated with KVM).

According to the openshift documentation, a baremetal IPI installation supports two modes: one with a dedicated provisioning network, and one where the provisioning network is merged into the baremetal (service) network. To match the PoC on-site environment, this lab uses the more complex deployment, i.e. separate baremetal and provisioning networks.

The architecture of this lab is shown below:

Offline installation package download

The pre-built installation packages can be downloaded from Baidu Netdisk, version 4.6.28:

  • link: https://pan.baidu.com/s/1XFbiOAcz7nul-N9U0aDxHg  password: 6qtt

They include the following types of files:

  • ocp4.tgz contains the ISO and other installation media, the various installation scripts, the full list of downloaded images, and so on. It needs to be copied to both the KVM host and the helper node.
  • registry.tgz is the packaged docker image registry storage. If you need to add more images first, follow: 4.6.add.image.md

Merge the split files with a command similar to the following

cat registry.?? > registry.tgz
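
Before copying the merged archive around, a quick integrity check saves time; a minimal sketch:

sha256sum registry.tgz
tar tf registry.tgz | head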

Note that the ansible scripts for the helper node inside the offline package may need to be updated.

Prepare the offline installation source on a cloud host with internet access

The documentation for preparing the offline installation media has moved here: 4.6.build.dist.md

KVM host preparation

This lab runs the whole installation test in virtual machines on a single 32C / 256G host, so we prepare that KVM host first.

If multiple KVM hosts are involved, make sure their clocks are adjusted so that the time is roughly consistent across them, otherwise the certificates will run into trouble.
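
A simple way to eyeball the skew before starting (the host names below are hypothetical, replace them with your real KVM hosts):

for h in kvm-host-1 kvm-host-2; do echo -n "$h: "; ssh root@$h date +%s; done
date +%s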

The main preparation tasks are

  • configure the yum repositories
  • configure dns
  • install the image registry
  • set up the vnc environment
  • create the networks needed by kvm
  • create the helper kvm

Among these, the dns part needs to be adapted to the actual project environment.

The KVM host in this lab is a rhel8 machine; refer to rhel8.build.kernel.repo.cache.md for the basic setup such as the offline repos.

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren
EOF

dnf clean all
dnf repolist

dnf -y install byobu htop jq ipmitool

systemctl disable --now firewalld

# configure the registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 365 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

podman run --name local-registry -p 5443:5000 \
  -d --restart=always \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

podman start local-registry

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# load additional images
# extract ocp4.tgz first
bash add.image.load.sh /data/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# prepare the vnc environment
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
desktop=sandbox
geometry=1280x800
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

systemctl start vncserver@:1
# if you want to stop the vnc server, do this
systemctl stop vncserver@:1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

# create the virtual networks for the lab

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.105/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
pkill dhclient;dhclient baremetal
nmcli con down baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh
nmcli con mod baremetal +ipv4.address '192.168.7.1/24'

cat << 'EOF' > /data/kvm/bridge.provisioning.sh
#!/usr/bin/env bash

PUB_CONN='eno2'
PUB_IP='172.22.0.1/24'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down provisioning
nmcli con delete provisioning
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname provisioning type bridge con-name provisioning ipv4.addresses $PUB_IP ipv4.method manual
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master provisioning
nmcli con down provisioning
nmcli con up provisioning
EOF
bash /data/kvm/bridge.provisioning.sh

nmcli networking off; nmcli networking on
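
# quick check that both bridges are up with the expected addresses -- a sketch:
nmcli -f NAME,TYPE,DEVICE,STATE con show --active
ip -br addr show baremetal
ip -br addr show provisioning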

# create the helper vm

mkdir -p /data/kvm
cd /data/kvm

lvremove -f rhel/helperlv
lvcreate -y -L 100G -n helperlv rhel

virt-install --name="ocp4-aHelper" --vcpus=4 --ram=6144 \
--disk path=/dev/rhel/helperlv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --location /data/kvm/rhel-8.3-x86_64-dvd.iso \
--initrd-inject helper-ks-rhel8-ipi.cfg --extra-args "inst.ks=file:/helper-ks-rhel8-ipi.cfg" 

virsh start ocp4-aHelper

# DO NOT USE, restore kvm
virsh destroy ocp4-aHelper
virsh undefine ocp4-aHelper

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.default
cat << EOF > /etc/chrony.conf
# pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# setup ftp data root
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf

# create the master and worker vm, but not start them
export KVM_DIRECTORY=/data/kvm

# cd ${KVM_DIRECTORY}
# scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/

create_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L 120G -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

# create_lv rhel bootstraplv
create_lv nvme master0lv
create_lv nvme master1lv
create_lv nvme master2lv
create_lv rhel worker0lv
create_lv rhel worker1lv
create_lv rhel worker2lv

virt-install --name=ocp4-master0 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master0.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master0.xml

virt-install --name=ocp4-master1 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master1.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master1.xml

virt-install --name=ocp4-master2 --vcpus=4 --ram=16384 \
--disk path=/dev/nvme/master2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master2.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master2.xml

virt-install --name=ocp4-worker0 --vcpus=4 --ram=32768 \
--disk path=/dev/rhel/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker0.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker0.xml

virt-install --name=ocp4-worker1 --vcpus=4 --ram=16384 \
--disk path=/dev/rhel/worker1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker1.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker1.xml

virt-install --name=ocp4-worker2 --vcpus=4 --ram=16384 \
--disk path=/dev/rhel/worker2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=provisioning,model=virtio \
--network bridge=baremetal,model=virtio \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-worker2.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-worker2.xml

cd /data/kvm/
for i in master{0..2} worker{0..2}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ocp4-${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list
cat /data/kvm/mac.list
# master0 52:54:00:a8:77:90       52:54:00:1f:1c:1f
# master1 52:54:00:8a:97:b3       52:54:00:a1:d6:df
# master2 52:54:00:54:8f:4a       52:54:00:0b:7c:61
# worker0 52:54:00:4c:8a:80       52:54:00:f0:f4:2b
# worker1 52:54:00:89:eb:62       52:54:00:ee:e4:2b
# worker2 52:54:00:e1:ec:6e       52:54:00:1b:d6:b5

# GOTO image registry & kvm host
# copy crt files to helper node
ssh-copy-id root@192.168.7.11

ssh root@192.168.7.11 mkdir -p /data/install
ssh root@192.168.7.11 mkdir -p /data/ocp4
scp /data/down/ocp4.tgz root@192.168.7.11:/data/

scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /data/kvm/mac.list root@192.168.7.11:/data/install/

# install redfish for kvm
# https://access.redhat.com/solutions/4315581
# https://access.redhat.com/solutions/3057171
# https://docs.openstack.org/virtualbmc/latest/user/index.html
# https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html
dnf -y install python3-pip
# pip3 install --user sushy-tools

mkdir -p /data/install
cd /data/install

podman create --name swap docker.io/wangzheng422/imgs:openshift-baremetal-install-4.6.5 ls
podman cp swap:/openshift-baremetal-install ./
podman rm -fv swap

podman create --name swap docker.io/wangzheng422/imgs:ocp.bm.ipi.python.dep.rhel8-4.6.7 ls
podman cp swap:/wheelhouse.tar.gz - > wheelhouse.tar.gz
tar zvxf wheelhouse.tar.gz
podman rm -fv swap

pip3 install --user -r wheelhouse/requirements.txt --no-index --find-links wheelhouse

ps -ef | grep vbmcd | awk '{print $2}' | xargs kill
/bin/rm -f /root/.vbmc/master.pid
/root/.local/bin/vbmcd

# curl https://registry.ocp4.redhat.ren:8000/redfish/v1/Systems/

virsh list --all
# /root/.local/bin/vbmc add ocp4-bootstrap --port 6230 --username admin --password password
# /root/.local/bin/vbmc start ocp4-bootstrap

var_i=1
for i in master{0..2} worker{0..2}
do
  /root/.local/bin/vbmc add ocp4-$i --port $(( 6230 + $var_i )) --username admin --password password
  /root/.local/bin/vbmc start ocp4-$i
  (( var_i += 1))
done

/root/.local/bin/vbmc list
# +--------------+---------+---------+------+
# | Domain name  | Status  | Address | Port |
# +--------------+---------+---------+------+
# | ocp4-master0 | running | ::      | 6231 |
# | ocp4-master1 | running | ::      | 6232 |
# | ocp4-master2 | running | ::      | 6233 |
# | ocp4-worker0 | running | ::      | 6234 |
# | ocp4-worker1 | running | ::      | 6235 |
# | ocp4-worker2 | running | ::      | 6236 |
# +--------------+---------+---------+------+

/root/.local/bin/vbmc show ocp4-master0
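
# before kicking off the IPI install it is worth confirming that the emulated BMCs answer IPMI -- a sketch against master0:
ipmitool -I lanplus -H 192.168.7.1 -p 6231 -U admin -P password power status
ipmitool -I lanplus -H 192.168.7.1 -p 6231 -U admin -P password chassis status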

# DO NOT USE, restore

var_i=1
for i in master{0..2} worker{0..2}
do
  /root/.local/bin/vbmc stop ocp4-$i
  /root/.local/bin/vbmc delete ocp4-$i
  (( var_i += 1))
done

# if you want to stop or delete vm, try this
virsh list --all
# virsh destroy ocp4-bootstrap
virsh destroy ocp4-master0 
virsh destroy ocp4-master1 
virsh destroy ocp4-master2 
virsh destroy ocp4-worker0 
virsh destroy ocp4-worker1 
virsh destroy ocp4-worker2
# virsh undefine ocp4-bootstrap
virsh undefine ocp4-master0 --nvram
virsh undefine ocp4-master1 --nvram
virsh undefine ocp4-master2 --nvram
virsh undefine ocp4-worker0 --nvram
virsh undefine ocp4-worker1 --nvram
virsh undefine ocp4-worker2 --nvram

Helper node preparation

The following installation steps are performed inside the helper (bastion) node.

The main tasks are:

  • configure the yum repositories
  • run the ansible playbook to configure the helper node automatically
  • upload the customized installation configuration file
  • generate the ignition files

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

systemctl disable --now firewalld

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-epel]
name=epel
baseurl=ftp://${YUMIP}/dnf/epel
enabled=1
gpgcheck=0

[remote-epel-modular]
name=epel-modular
baseurl=ftp://${YUMIP}/dnf/epel-modular
enabled=1
gpgcheck=0

[remote-appstream]
name=appstream
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[remote-baseos]
name=baseos
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[remote-baseos-source]
name=baseos-source
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-source-rpms
enabled=1
gpgcheck=0

[remote-supplementary]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-supplementary-rpms
enabled=1
gpgcheck=0

[remote-codeready-builder]
name=codeready-builder
baseurl=ftp://${YUMIP}/dnf/codeready-builder-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

yum clean all
yum makecache
yum repolist

yum -y install ansible git unzip podman python3

yum -y update

reboot

# yum -y install ansible git unzip podman python36

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp /data/down/ocp4.tgz root@192.168.7.11:/data/
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# this ansible project is used to deploy the services on the helper node
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# this project merges ignition files and helps customize them
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

mkdir -p /data/install

mkdir -p /data/ocp4/

cat << EOF > /data/ocp4/ocp4-upi-helpernode-master/vars-dhcp.rhel8.yaml
---
ssh_gen_key: true
staticips: false
bm_ipi: true
firewalld: false
dns_forward: false
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso"
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "192.168.7.1"
  forwarder2: "192.168.7.1"
  api_vip: "192.168.7.100"
  ingress_vip: "192.168.7.101"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.70"
  poolend: "192.168.7.90"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.12"
  interface: "enp1s0"
  install_drive: "vda"
  macaddr: "52:54:00:7e:f8:f7"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep master0 | awk '{print $3}')"
  - name: "master-1"
    ipaddr: "192.168.7.14"
    interface: "enp1s0"
    install_drive: "vda"    
    macaddr: "$(cat /data/install/mac.list | grep master1 | awk '{print $3}')"
  - name: "master-2"
    ipaddr: "192.168.7.15"
    interface: "enp1s0"
    install_drive: "vda"   
    macaddr: "$(cat /data/install/mac.list | grep master2 | awk '{print $3}')"
workers:
  - name: "worker-0"
    ipaddr: "192.168.7.16"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker0 | awk '{print $3}')"
  - name: "worker-1"
    ipaddr: "192.168.7.17"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker1 | awk '{print $3}')"
  - name: "worker-2"
    ipaddr: "192.168.7.18"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: "$(cat /data/install/mac.list | grep worker2 | awk '{print $3}')"
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "yum"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "quay"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
otherdomains:
  - domain: "rhv.redhat.ren"
    hosts:
    - name: "manager"
      ipaddr: "192.168.7.71"
    - name: "rhv01"
      ipaddr: "192.168.7.72"
  - domain: "cmri-edge.redhat.ren"
    hosts:
    - name: "*"
      ipaddr: "192.168.7.71"
    - name: "*.apps"
      ipaddr: "192.168.7.72"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/4.6.5/openshift-client-linux-4.6.5.tar.gz"
ocp_installer: "file:///data/ocp4/4.6.5/openshift-install-linux-4.6.5.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.1"
      options: iburst
setup_registry:
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

# next, use ansible to configure the helper node and install the services the openshift cluster needs
# adjust ocp4-upi-helpernode-master/vars-static.yaml according to the on-site environment
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-dhcp.rhel8.yaml -e '{ staticips: false, bm_ipi: true }'  tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

# GO back to help node
/bin/cp -f /data/install/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# customize the ignition files
cd /data/install

# adjust install-config.yaml according to the on-site environment
# at minimum change the ssh key and additionalTrustBundle, which is the CA certificate of the image registry

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
platform:
  baremetal:
    apiVIP: 192.168.7.100
    ingressVIP: 192.168.7.101
    # provisioningBridge: provisioning
    provisioningNetworkCIDR: 172.22.0.0/24
    # provisioningDHCPRange: 172.22.0.10,172.22.0.100
    # clusterProvisioningIP: 172.22.0.3
    # bootstrapProvisioningIP: 172.22.0.2
    # provisioningNetwork: Managed
    provisioningNetworkInterface: enp1s0
    # externalBridge: baremetal
    bootstrapOSImage: http://192.168.7.11:8080/install/rhcos-qemu.x86_64.qcow2.gz?sha256=$(zcat /var/www/html/install/rhcos-qemu.x86_64.qcow2.gz | sha256sum | awk '{print $1}')
    clusterOSImage: http://192.168.7.11:8080/install/rhcos-openstack.x86_64.qcow2.gz?sha256=$(sha256sum /var/www/html/install/rhcos-openstack.x86_64.qcow2.gz | awk '{print $1}')
    hosts:
      - name: master-0
        role: master
        bmc:
          address: ipmi://192.168.7.1:6231
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master0 | awk '{print $2}')
        hardwareProfile: default 
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: master-1
        role: master
        bmc:
          address: ipmi://192.168.7.1:6232
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master1 | awk '{print $2}')
        hardwareProfile: default 
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: master-2
        role: master
        bmc:
          address: ipmi://192.168.7.1:6233
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master2 | awk '{print $2}')
        hardwareProfile: default 
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: worker-0
        role: worker
        bmc:
          address: ipmi://192.168.7.1:6234
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep worker0 | awk '{print $2}')
        hardwareProfile: unknown         
        rootDeviceHints:
          deviceName: "/dev/vda"
      - name: worker-1
        role: worker
        bmc:
          address: ipmi://192.168.7.1:6235
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep worker1 | awk '{print $2}')
        hardwareProfile: unknown         
        rootDeviceHints:
          deviceName: "/dev/vda"
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR: 192.168.7.0/24
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
  platform:
    baremetal: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ppa.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/helper_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /data/install/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

# GO back to host
mkdir -p /data/install
cd /data/install
scp root@192.168.7.11:/data/install/install-config.yaml /data/install/

cd /data/install
for i in $(sudo virsh list --all | tail -n +3 | grep bootstrap | awk {'print $2'});
do
  sudo virsh destroy $i;
  sudo virsh undefine $i;
  sudo virsh vol-delete $i --pool default;
  sudo virsh vol-delete $i.ign --pool default;
  virsh pool-destroy $i
  virsh pool-delete $i
  virsh pool-undefine $i
done
/bin/rm -rf .openshift_install.log .openshift_install_state.json terraform* auth tls 
./openshift-baremetal-install --dir /data/install/ --log-level debug create cluster
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "dTSbu-aIIZr-gxRxT-njrEr"

The installation is fully automatic, so there is nothing to do but wait; in the provisioning-network mode you can see that the masters bring up two network interfaces.

Next we can go to the helper node and operate the cluster with the familiar oc commands.

# on kvm host, copy back auth folder
rsync -arz /data/install/auth root@192.168.7.11:/data/install/

# Go back to helper
ansible localhost -m lineinfile -a 'path=$HOME/.bashrc regexp="^export KUBECONFIG" line="export KUBECONFIG=/data/install/auth/kubeconfig"'
source $HOME/.bashrc

oc get node
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api
oc get bmh -n openshift-machine-api
# NAME       STATUS   PROVISIONING STATUS      CONSUMER                    BMC                       HARDWARE PROFILE   ONLINE   ERROR
# master-0   OK       externally provisioned   ocp4-sbsqb-master-0         ipmi://192.168.7.1:6231                      true
# master-1   OK       externally provisioned   ocp4-sbsqb-master-1         ipmi://192.168.7.1:6232                      true
# master-2   OK       externally provisioned   ocp4-sbsqb-master-2         ipmi://192.168.7.1:6233                      true
# worker-0   OK       provisioned              ocp4-sbsqb-worker-0-kcz5t   ipmi://192.168.7.1:6234   unknown            true
# worker-1   OK       provisioned              ocp4-sbsqb-worker-0-5ktqw   ipmi://192.168.7.1:6235   unknown            true
# worker-2   OK       ready                                                ipmi://192.168.7.1:6236   unknown            false

oc get pod -n openshift-kni-infra

Now we can see the BareMetalHost output.

In the web console the node configuration points to the BareMetalHost.

We can also see the long-missed machine objects again.

Adding a new node

In IPI mode adding a new node is very easy: just define a BareMetalHost.

cd /data/install/
cat << EOF > /data/install/bmh.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-2-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-2
spec:
  online: true
  bootMACAddress: $(cat mac.list | grep worker2 | awk '{print $2}')
  bmc:
    address: ipmi://192.168.7.1:6236
    credentialsName: worker-2-bmc-secret
    disableCertificateVerification: true
  hardwareProfile: unknown         
  rootDeviceHints:
    deviceName: /dev/vda
EOF
oc -n openshift-machine-api create -f bmh.yaml

# DO NOT USE, restore, delete the vm
oc -n openshift-machine-api delete -f bmh.yaml

oc get bmh -n openshift-machine-api
# NAME       STATUS   PROVISIONING STATUS      CONSUMER                    BMC                       HARDWARE PROFILE   ONLINE   ERROR
# master-0   OK       externally provisioned   ocp4-sbsqb-master-0         ipmi://192.168.7.1:6231                      true
# master-1   OK       externally provisioned   ocp4-sbsqb-master-1         ipmi://192.168.7.1:6232                      true
# master-2   OK       externally provisioned   ocp4-sbsqb-master-2         ipmi://192.168.7.1:6233                      true
# worker-0   OK       provisioned              ocp4-sbsqb-worker-0-kcz5t   ipmi://192.168.7.1:6234   unknown            true
# worker-1   OK       provisioned              ocp4-sbsqb-worker-0-5ktqw   ipmi://192.168.7.1:6235   unknown            true
# worker-2   OK       ready                                                ipmi://192.168.7.1:6236   unknown            false

oc get machinesets -n openshift-machine-api
# NAME                  DESIRED   CURRENT   READY   AVAILABLE   AGE
# ocp4-sbsqb-worker-0   2         2         2       2           99m

oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name

# scale the workers to 3 replicas, which triggers the deployment of worker-2
oc scale --replicas=3 machineset $(oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name) -n openshift-machine-api

Troubleshooting tips


# login to bootstrap to debug
# find the ip from kvm console
ssh -i ~/.ssh/helper_rsa core@192.168.7.75
journalctl -b -f -u release-image.service -u bootkube.service
journalctl -b -u release-image.service -u bootkube.service | grep -i baremetal
sudo -i
export KUBECONFIG=/etc/kubernetes/kubeconfig
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api

# debug why bootstrap can't be ping...
cat .openshift_install_state.json | jq  '."*bootstrap.Bootstrap"'.Config.storage.files[].path

cat .openshift_install_state.json | jq -r '."*bootstrap.Bootstrap"'.File.Data | base64 -d | jq -r . > ign.json

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[].contents.source ' | sed 's/.*base64,//g' | base64 -d > decode

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[] | .path, .contents.source ' | while read -r line ; do if [[ $line =~ .*base64,.* ]]; then echo $(echo $line | sed 's/.*base64,//g' | base64 -d) ; else echo $line; fi; done > files


openshift 4.6 offline baremetal installation with static IPs, using the cilium network plugin

based on: https://docs.cilium.io/en/stable/gettingstarted/k8s-install-openshift-okd/

based on cilium v1.9.4

Installation walkthrough video

This chapter describes how to install OCP 4.6 on baremetal (emulated with KVM) with static IPs and the cilium network plugin.

The architecture of this lab is shown below:

Offline installation package download

The ocp4 offline installation package is prepared differently from 3.11; prepare it as follows. In addition, because the default baremetal flow requires a dhcp / pxe environment, a helper node is needed that carries dhcp, tftp, haproxy and similar tools. To make on-site project work easier a tool for modifying ignition files is also included, so the offline package bundles a few extra third-party tools.

https://github.com/wangzheng422/ocp4-upi-helpernode is the tool used to build the helper node.

https://github.com/wangzheng422/filetranspiler is the tool used to modify ignition files.

The pre-built installation packages can be downloaded from Baidu Netdisk, version 4.6.12:

They include the following types of files:

  • ocp4.tgz contains the ISO and other installation media, the various installation scripts, the full list of downloaded images, and so on. It needs to be copied to both the KVM host and the helper node.
  • registry.tgz is the packaged docker image registry storage. If you need to add more images first, follow: 4.6.add.image.md
  • install.image.tgz contains the extra images needed while installing the cluster.
  • rhel-data.7.9.tgz is the yum update source for rhel 7 hosts; it is this large because it carries gpu, epel and other content. It is mainly used for the KVM host, the helper node, and rhel machines acting as compute nodes.

Merge the split files with a command similar to the following

cat registry.?? > registry.tgz

Prepare the offline installation source on a cloud host with internet access

The documentation for preparing the offline installation media has moved here: 4.6.build.dist.md

KVM host preparation

This lab runs the whole installation test in virtual machines on a single 32C / 256G host, so we prepare that KVM host first.

If multiple KVM hosts are involved, make sure their clocks are adjusted so that the time is roughly consistent across them, otherwise the certificates will run into trouble.

The main preparation tasks are

  • configure the yum repositories
  • configure dns
  • install the image registry
  • set up the vnc environment
  • create the networks needed by kvm
  • create the helper kvm
  • configure a haproxy that forwards external traffic into the kvm network (see the sketch after this list)
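
A minimal haproxy sketch for that last item (the external ports and backends are assumptions; here traffic is simply forwarded to the API and ingress VIPs used in this lab, and SELinux may need the haproxy_connect_any boolean enabled as shown commented below):

dnf -y install haproxy

cat << EOF > /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

frontend api
    bind *:6443
    default_backend api
backend api
    server apivip 192.168.7.100:6443 check

frontend ingress-http
    bind *:80
    default_backend ingress-http
backend ingress-http
    server ingressvip 192.168.7.101:80 check

frontend ingress-https
    bind *:443
    default_backend ingress-https
backend ingress-https
    server ingressvip 192.168.7.101:443 check
EOF

# setsebool -P haproxy_connect_any 1
systemctl enable --now haproxy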

Among these, the dns part needs to be adapted to the actual project environment.

The KVM host in this lab is a rhel8 machine; refer to rhel8.build.kernel.repo.cache.md for the basic setup.

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren
EOF

dnf clean all
dnf repolist

dnf -y install byobu htop 

systemctl disable --now firewalld

# configure the registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

podman run --name local-registry -p 5443:5000 \
  -d --restart=always \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# load additional images
# extract ocp4.tgz first
bash add.image.load.sh /data/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# prepare the vnc environment
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
desktop=sandbox
geometry=1280x800
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

systemctl start vncserver@:1
# if you want to stop the vnc server, do this
systemctl stop vncserver@:1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

# create the virtual network for the lab

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.105/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF

nmcli con mod baremetal +ipv4.address '192.168.7.1/24'
nmcli networking off; nmcli networking on


# create the helper VM

mkdir -p /data/kvm
cd /data/kvm

lvremove -f rhel/helperlv
lvcreate -y -L 200G -n helperlv rhel

virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/rhel/helperlv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network network=openshift4,model=virtio \
--boot menu=on --location /data/kvm/rhel-8.3-x86_64-dvd.iso \
--initrd-inject helper-ks-rhel8.cfg --extra-args "inst.ks=file:/helper-ks-rhel8.cfg" 

# restore kvm
virsh destroy ocp4-aHelper
virsh undefine ocp4-aHelper

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.default
cat << EOF > /etc/chrony.conf
# pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# setup ftp data root
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf
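# (optional) a minimal sketch to make the bind mount survive reboots; paths follow the two commands above
cat << EOF >> /etc/fstab
/data/dnf /var/ftp/dnf none bind 0 0
EOF
# verify the fstab entry mounts cleanly
mount -a && findmnt /var/ftp/dnf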


Helper node preparation

The following installation steps are performed inside the helper node.

The main tasks are:

  • configure the yum repos
  • run the ansible playbook to configure the helper node automatically
  • upload the customized install configuration file
  • generate the ignition files

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-epel]
name=epel
baseurl=ftp://${YUMIP}/dnf/epel
enabled=1
gpgcheck=0

[remote-epel-modular]
name=epel-modular
baseurl=ftp://${YUMIP}/dnf/epel-modular
enabled=1
gpgcheck=0

[remote-appstream]
name=appstream
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[remote-baseos]
name=baseos
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[remote-baseos-source]
name=baseos-source
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-source-rpms
enabled=1
gpgcheck=0

[remote-supplementary]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-supplementary-rpms
enabled=1
gpgcheck=0

[remote-codeready-builder]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/codeready-builder-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

yum clean all
yum makecache
yum repolist

yum -y install ansible git unzip podman python3

yum -y update

reboot

# yum -y install ansible git unzip podman python36

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp /data/down/ocp4.tgz root@192.168.7.11:/data/
# rsync -e ssh --info=progress2 -P --delete -arz  /data/ocp4/ root@192.168.7.11:/data/ocp4/
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# this ansible project deploys the services needed on the helper node
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# this project merges extra files into ignition configs, which helps with customizing them
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

# next, use ansible to configure the helper node with all the services the openshift cluster needs
# adjust ocp4-upi-helpernode-master/vars-static.yaml for your environment
# mainly the NIC and disk parameters of each node, plus the IP addresses

cat << EOF > /data/ocp4/ocp4-upi-helpernode-master/vars-static.rhel8.yaml
---
ssh_gen_key: true
staticips: true
bm_ipi: false
firewalld: false
dns_forward: false
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso"
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "192.168.7.1"
  forwarder2: "192.168.7.1"
  api_vip: "192.168.7.11"
  ingress_vip: "192.168.7.11"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.70"
  poolend: "192.168.7.90"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.12"
  interface: "enp1s0"
  install_drive: "vda"
  macaddr: "52:54:00:7e:f8:f7"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: ""
  - name: "master-1"
    ipaddr: "192.168.7.14"
    interface: "enp1s0"
    install_drive: "vda"    
    macaddr: ""
  - name: "master-2"
    ipaddr: "192.168.7.15"
    interface: "enp1s0"
    install_drive: "vda"   
    macaddr: ""
workers:
  - name: "worker-0"
    ipaddr: "192.168.7.16"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: ""
  - name: "worker-1"
    ipaddr: "192.168.7.17"
    interface: "enp1s0"
    install_drive: "vda"
    macaddr: ""
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "yum"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "quay"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "nexus"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
  - name: "git"
    ipaddr: "192.168.7.1"
    macaddr: "52:54:00:7e:f8:f7"
otherdomains:
  - domain: "rhv.redhat.ren"
    hosts:
    - name: "manager"
      ipaddr: "192.168.7.71"
    - name: "rhv01"
      ipaddr: "192.168.7.72"
  - domain: "cmri-edge.redhat.ren"
    hosts:
    - name: "*"
      ipaddr: "192.168.7.71"
    - name: "*.apps"
      ipaddr: "192.168.7.72"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/4.6.16/openshift-client-linux-4.6.16.tar.gz"
ocp_installer: "file:///data/ocp4/4.6.16/openshift-install-linux-4.6.16.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.1"
      options: iburst
setup_registry:
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}' tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# GOTO image registry host
# copy crt files to helper node
scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.key root@192.168.7.11:/data/install/

# go back to the helper node
/bin/cp -f /data/install/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# customize the ignition files
cd /data/install

# adjust install-config.yaml for your environment
# at minimum change the ssh key and additionalTrustBundle, which is the CA certificate of the image registry

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: Cilium
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '$( cat /data/pull-secret.json )'
sshKey: |
$( cat /root/.ssh/helper_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /data/install/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cd /data/install/
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

openshift-install create manifests --dir "./"

cat << EOF > "/data/install/manifests/cluster-network-03-cilium-namespace.yaml"
apiVersion: v1
kind: Namespace
metadata:
  name: cilium
  annotations:
    # node selector is required to make cilium-operator run on control plane nodes
    openshift.io/node-selector: ""
  labels:
    name: cilium
    # run level sets priority for Cilium to be deployed prior to other components
    openshift.io/run-level: "0"
    # enable cluster logging for Cilium namespace
    openshift.io/cluster-logging: "true"
    # enable cluster monitoring for Cilium namespace
    openshift.io/cluster-monitoring: "true"
EOF

Go to a host with Internet access

cp /data/ocp4/clients/helm-linux-amd64 /usr/local/bin/helm
chmod  +x /usr/local/bin/helm

mkdir -p /data/cilium
cd /data/cilium

helm repo add cilium https://helm.cilium.io/

helm template cilium/cilium --version 1.9.4  \
   --namespace cilium \
   --set ipam.mode=cluster-pool \
   --set cni.binPath=/var/lib/cni/bin \
   --set cni.confPath=/var/run/multus/cni/net.d \
   --set ipam.operator.clusterPoolIPv4PodCIDR=10.254.0.0/16 \
   --set ipam.operator.clusterPoolIPv4MaskSize=24 \
   --set nativeRoutingCIDR=10.254.0.0/16 \
   --set bpf.masquerade=false \
   --set endpointRoutes.enabled=true \
   --set hubble.enabled=true \
   --set hubble.listenAddress=":4244" \
   --set hubble.relay.enabled=true \
   --set hubble.ui.enabled=true \
   --output-dir "/data/cilium/"


Back on the helper node

# upload /data/cilium/cilium/templates/ to /data/install/cilium/templates
cd /data/install
for resource in cilium/templates/*
    do cp "${resource}" "./manifests/cluster-network-04-cilium-$(basename ${resource})"
done

# our environment has a nexus, so patch in the image registry proxy config
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

mkdir -p /etc/containers/registries.conf.d
/bin/cp -f image.registries.conf /etc/containers/registries.conf.d/

cd /data/install
cp  /data/ocp4/99-worker-container-registries.yaml ./manifests/
cp  /data/ocp4/99-master-container-registries.yaml ./manifests/

cp /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* ./manifests/

openshift-install create ignition-configs --dir=/data/install

cd /data/ocp4/ocp4-upi-helpernode-master
# create a per-host copy of the ignition file and place it under the web server directory
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}' tasks/ign.yml
# if a host needs its own customized ignition, this is the step to modify it

# the following was originally meant to set the NIC address, but in practice it turned out to be unnecessary.
# it is kept here because injecting files into the ignition at install time like this is very useful.
# mkdir -p bootstrap/etc/sysconfig/network-scripts/
# cat <<EOF > bootstrap/etc/sysconfig/network-scripts/ifcfg-ens3
# DEVICE=ens3
# BOOTPROTO=none
# ONBOOT=yes
# IPADDR=192.168.7.12
# NETMASK=255.255.255.0
# GATEWAY=192.168.7.1
# DNS=192.168.7.11
# DNS1=192.168.7.11
# DNS2=192.168.7.1
# DOMAIN=redhat.ren
# PREFIX=24
# DEFROUTE=yes
# IPV6INIT=no
# EOF
# filetranspiler -i bootstrap.ign -f bootstrap -o bootstrap-static.ign
# /bin/cp -f bootstrap-static.ign /var/www/html/ignition/

# create a dedicated iso for each node
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars-static.rhel8.yaml -e '{staticips: true}'  tasks/iso.yml

Back on the KVM host

In principle the installation could start right here, but installing CoreOS by hand means typing a very long kernel command line. In practice it is almost impossible to get it right: one wrong character and the install fails, and you have to reboot and type it all again...

To avoid that tedium, following common practice, we build a customized iso for each host. Luckily the previous step already generated the needed isos with ansible, so we just copy them to the KVM host and continue.

One pitfall: we do not know the NIC name of the host in advance. Boot once from the plain CoreOS iso, drop into the single-user shell, and run ip a to find out; usually it is ens3.

Likewise, when installing on bare metal, use the same method to find out which disk to install to. It is also recommended to install RHEL 8 on the physical machine first, to verify that it can run CoreOS at all. If the bare-metal install does not write to disk, try adding the boot parameter ignition.firstboot=1.
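A minimal sketch of what to run from the CoreOS live shell to identify both; the device names in the comments are only examples:

# list network interfaces to find the real NIC name (e.g. ens3 / enp1s0)
ip -o link show | awk -F': ' '{print $2}'
# list block devices to find the install target disk (e.g. vda / sda / nvme0n1)
lsblk -d -o NAME,SIZE,MODEL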

# on kvm host

export KVM_DIRECTORY=/data/kvm

cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/

remove_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
}

create_lv() {
    var_vg=$1
    var_lv=$2
    lvcreate -y -L 120G -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

remove_lv rhel bootstraplv
remove_lv nvme master0lv
remove_lv nvme master1lv
remove_lv nvme master2lv
remove_lv rhel worker0lv
remove_lv rhel worker1lv

create_lv rhel bootstraplv
create_lv nvme master0lv
create_lv nvme master1lv
create_lv nvme master2lv
create_lv rhel worker0lv
create_lv rhel worker1lv

# finally, we can start install :)
# you can create all the VMs in one go, then grab a coffee and wait.
# from this step to the end of the installation takes roughly 30 minutes.
virt-install --name=ocp4-bootstrap --vcpus=4 --ram=8192 \
--cpu=host-model \
--disk path=/dev/rhel/bootstraplv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-bootstrap.iso   

# want to log into coreos and poke around? do this:
# ssh core@bootstrap
# journalctl -b -f -u bootkube.service

virt-install --name=ocp4-master0 --vcpus=6 --ram=36864 \
--cpu=host-model \
--disk path=/dev/nvme/master0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-0.iso 

# ssh core@192.168.7.13

virt-install --name=ocp4-master1 --vcpus=6 --ram=36864 \
--cpu=host-model \
--disk path=/dev/nvme/master1lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-1.iso 

virt-install --name=ocp4-master2 --vcpus=6 --ram=36864 \
--cpu=host-model \
--disk path=/dev/nvme/master2lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-2.iso 

# we add gpu passthrough into kvm
# look ./4.6.gpu.passthrough.md to find how to 
lspci -n | grep 10de:1eb8
# 05:00.0 0302: 10de:1eb8 (rev a1)
virsh nodedev-list | grep pci | grep 05_00_0
# pci_0000_05_00_0
# https://docs.nvidia.com/grid/11.0/grid-vgpu-user-guide/index.html#using-gpu-pass-through
virsh nodedev-dumpxml pci_0000_05_00_0| egrep 'domain|bus|slot|function'
    # <domain>0</domain>
    # <bus>5</bus>
    # <slot>0</slot>
    # <function>0</function>
    # <capability type='virt_functions' maxCount='16'/>
    #   <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>

# if it is gpu passthrough
virt-install --name=ocp4-worker0 --vcpus=6 --ram=36864 \
--cpu=host-model \
--disk path=/dev/rhel/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--host-device=pci_0000_05_00_0 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso 

# if it is vgpu 
virt-install --name=ocp4-worker0 --vcpus=6 --ram=36864 \
--cpu=host-model \
--disk path=/dev/rhel/worker0lv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso

# on workstation
# open http://192.168.7.11:9000/
# to check

# if you want to stop or delete vm, try this
virsh list --all
virsh destroy ocp4-bootstrap
virsh undefine ocp4-bootstrap

virsh destroy ocp4-master0 
virsh destroy ocp4-master1 
virsh destroy ocp4-master2 
virsh destroy ocp4-worker0 
virsh destroy ocp4-worker1

virsh undefine ocp4-master0 
virsh undefine ocp4-master1 
virsh undefine ocp4-master2 
virsh undefine ocp4-worker0
virsh undefine ocp4-worker1

On the helper node

At this point the installation is already running on its own; just go back to the helper node and watch quietly.

During the bootstrap and master installation phases, follow the progress with these commands.

cd /data/ocp4
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

cd /data/install
openshift-install wait-for bootstrap-complete --log-level debug

If everything is fine, you will see output like this.

Sometimes the certificates have expired. To verify, log into the bootstrap node and check the expiry time. If they really are expired, clear all the cached files generated by openshift-install and start over.

echo | openssl s_client -connect localhost:6443 | openssl x509 -noout -text | grep Not

Generally, if the cached files were deleted before the openshift-install step as this document describes, the expiry problem does not appear.
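A hedged cleanup sketch before re-running openshift-install; the asset paths follow this document, and the ~/.cache location is an assumption about where the installer keeps its download cache:

cd /data/install
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9]
# the installer may also keep a download cache under ~/.cache (exact directory varies by version)
/bin/rm -rf ~/.cache/openshift-install*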

oc get nodes

At this point only the masters show up, because the workers' CSRs have not been approved yet. If the VMs were created all in one go, you will most likely not hit the issue below.

oc get csr

You will find quite a few pending CSRs.

Approve them:

yum -y install jq
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr -o name | xargs oc adm certificate approve

The worker nodes' CPU will then spike, and after that the workers show up.

Wait a while and you will see output like this, which means it worked.

Once the steps above are done, the final part of the installation can complete.

openshift-install wait-for install-complete --log-level debug
# here is the output
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "ngc8Z-hWogN-HcVYJ-UGXcs"

# the cilium pods do not seem to come up cleanly on first start, so restart them all once
oc -n cilium delete pod --all 

Test cilium

# wget https://raw.githubusercontent.com/cilium/cilium/1.9.4/examples/kubernetes/connectivity-check/connectivity-check.yaml

kubectl apply -f - << EOF
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: cilium-test
allowHostPorts: true
allowHostNetwork: true
users:
  - system:serviceaccount:cilium-test:default
priority: null
readOnlyRootFilesystem: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
volumes: null
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostPID: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities: null
defaultAddCapabilities: null
requiredDropCapabilities: null
groups: null
EOF

cd /data/install

kubectl create ns cilium-test

kubectl apply -n cilium-test -f /data/install/cilium/connectivity-check.yaml

kubectl get pods -n cilium-test
# the result see below pic

# restore
kubectl delete ns cilium-test
kubectl delete scc cilium-test

Because this is an offline environment, the last two tests, which need Internet access, are bound to fail; everything else passes, so the result is fine.
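If the two failing external checks are too noisy, a hedged way to drop just those deployments (assuming their names contain "external", as in the upstream connectivity-check manifest):

kubectl -n cilium-test get deploy -o name | grep external | xargs -r kubectl -n cilium-test delete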

Next, install hubble, which serves as a front end.


oc apply -f /data/install/cilium/templates/

kubectl -n cilium get pods 

# restore
# oc delete -f /data/install/cilium/templates/

cilium still seems to have some stability issues; a few pods need to be restarted before hubble installs successfully.

Next, let's try out the slick UI.

open: http://hubble-ui-cilium.apps.ocp4.redhat.ren/

# kubectl port-forward -n kube-system svc/hubble-ui --address 0.0.0.0 --address :: 12000:80

# create a test application
# wget https://raw.githubusercontent.com/cilium/cilium/1.9.4/examples/minikube/http-sw-app.yaml
oc create -n default -f /data/install/cilium/http-sw-app.yaml

oc expose svc hubble-ui -n cilium

oc project default
kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing

# kubectl port-forward -n cilium svc/hubble-ui --address 0.0.0.0 --address :: 12000:80

# kubectl -n cilium get pods -l k8s-app=cilium

POD=$( oc -n cilium get pod -l k8s-app=cilium -o json | jq -r '.items[0].metadata | select( .name | contains("cilium") ) | .name' ) 
echo $POD

kubectl -n cilium exec $POD -- cilium endpoint list
# ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])
#               IPv6   IPv4           STATUS
#            ENFORCEMENT        ENFORCEMENT

# 292        Disabled           Disabled          47012      k8s:app=network-metrics-daemon
#                      10.254.2.215   ready
#                                                            k8s:component=network

#                                                            k8s:io.cilium.k8s.namespace.labels.name=openshift-multus

#                                                            k8s:io.cilium.k8s.namespace.labels.olm.operatorgroup.uid/2280aae5-4b08-41a3-a491-$
# fb50008331e
#                                                            k8s:io.cilium.k8s.namespace.labels.openshift.io/cluster-monitoring=true

#                                                            k8s:io.cilium.k8s.namespace.labels.openshift.io/run-level=0

#                                                            k8s:io.cilium.k8s.policy.cluster=default

#                                                            k8s:io.cilium.k8s.policy.serviceaccount=metrics-daemon-sa

#                                                            k8s:io.kubernetes.pod.namespace=openshift-multus

#                                                            k8s:openshift.io/component=network

#                                                            k8s:type=infra


kubectl -n cilium exec $POD -- cilium status
# KVStore:                Ok   Disabled
# Kubernetes:             Ok   1.19 (v1.19.0+e49167a) [linux/amd64]
# Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
# KubeProxyReplacement:   Probe   [enp1s0 (Direct Routing)]
# Cilium:                 Ok      OK
# NodeMonitor:            Listening for events on 6 CPUs with 64x4096 of shared memory
# Cilium health daemon:   Ok
# IPAM:                   IPv4: 10/255 allocated from 10.254.2.0/24,
# BandwidthManager:       Disabled
# Host Routing:           Legacy
# Masquerading:           IPTables
# Controller Status:      50/50 healthy
# Proxy Status:           OK, ip 10.254.2.79, 0 redirects active on ports 10000-20000
# Hubble:                 Ok              Current/Max Flows: 4096/4096 (100.00%), Flows/s: 71.25   Metrics: Disabled
# Cluster health:         3/3 reachable   (2021-03-08T13:10:50Z)

kubectl -n cilium exec $POD -- cilium status --all-addresses
# KVStore:                Ok   Disabled
# Kubernetes:             Ok   1.19 (v1.19.0+e49167a) [linux/amd64]
# Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
# KubeProxyReplacement:   Probe   [enp1s0 (Direct Routing)]
# Cilium:                 Ok      OK
# NodeMonitor:            Listening for events on 6 CPUs with 64x4096 of shared memory
# Cilium health daemon:   Ok
# IPAM:                   IPv4: 12/255 allocated from 10.254.2.0/24,
# Allocated addresses:
#   10.254.2.131 (openshift-multus/multus-admission-controller-c56l8 [restored])
#   10.254.2.132 (openshift-monitoring/alertmanager-main-0 [restored])
#   10.254.2.15 (openshift-monitoring/prometheus-k8s-0 [restored])
#   10.254.2.171 (cilium/hubble-relay-95df958c6-2jxwl)
#   10.254.2.192 (cilium/hubble-ui-5df5fb587d-6l75r)
#   10.254.2.195 (health)
#   10.254.2.215 (openshift-multus/network-metrics-daemon-vjkmr [restored])
#   10.254.2.245 (openshift-controller-manager/controller-manager-bj29h [restored])
#   10.254.2.35 (openshift-apiserver/apiserver-5c67746947-bd2wp [restored])
#   10.254.2.79 (router)
#   10.254.2.90 (openshift-dns/dns-default-hggxx [restored])
#   10.254.2.97 (openshift-oauth-apiserver/apiserver-549f94565d-kl6bb [restored])
# BandwidthManager:    Disabled
# Host Routing:        Legacy
# Masquerading:        IPTables
# Controller Status:   66/66 healthy
# Proxy Status:        OK, ip 10.254.2.79, 0 redirects active on ports 10000-20000
# Hubble:              Ok              Current/Max Flows: 4096/4096 (100.00%), Flows/s: 68.96   Metrics: Disabled
# Cluster health:      3/3 reachable   (2021-03-08T14:47:09Z)

kubectl get cn master-0 -o yaml
# ...
# spec:
#   addresses:
#   - ip: 192.168.7.13
#     type: InternalIP
#   - ip: 10.254.0.184
#     type: CiliumInternalIP
#   azure: {}
#   encryption: {}
#   eni: {}
#   health:
#     ipv4: 10.254.0.167
#   ipam:
#     podCIDRs:
#     - 10.254.0.0/24
# ...

# use bpftool to see what is loaded in the system
oc -n cilium exec $POD -- bpftool net list
# xdp:

# tc:
# enp1s0(2) clsact/ingress bpf_netdev_enp1s0.o:[from-netdev] id 1077
# enp1s0(2) clsact/egress bpf_netdev_enp1s0.o:[to-netdev] id 1083
# cilium_net(3) clsact/ingress bpf_host_cilium_net.o:[to-host] id 1070
# cilium_host(4) clsact/ingress bpf_host.o:[to-host] id 1032
# cilium_host(4) clsact/egress bpf_host.o:[from-host] id 1045
# cilium_vxlan(5) clsact/ingress bpf_overlay.o:[from-overlay] id 923
# cilium_vxlan(5) clsact/egress bpf_overlay.o:[to-overlay] id 928
# lxc2818365dcf52(9) clsact/ingress bpf_lxc.o:[from-container] id 984
# lxc2818365dcf52(9) clsact/egress bpf_lxc.o:[to-container] id 1020
# lxcc899b45a3e26(11) clsact/ingress bpf_lxc.o:[from-container] id 944
# lxcc899b45a3e26(11) clsact/egress bpf_lxc.o:[to-container] id 975
# lxcad116d75c844(15) clsact/ingress bpf_lxc.o:[from-container] id 950
# lxcad116d75c844(15) clsact/egress bpf_lxc.o:[to-container] id 980
# lxc2fda292e3d0b(19) clsact/ingress bpf_lxc.o:[from-container] id 939
# lxc2fda292e3d0b(19) clsact/egress bpf_lxc.o:[to-container] id 968
# lxc3c9c10774fe9(21) clsact/ingress bpf_lxc.o:[from-container] id 948
# lxc3c9c10774fe9(21) clsact/egress bpf_lxc.o:[to-container] id 986
# lxc17eb3d9acd1f(25) clsact/ingress bpf_lxc.o:[from-container] id 1009
# lxc17eb3d9acd1f(25) clsact/egress bpf_lxc.o:[to-container] id 1063
# lxccd70d8b98510(27) clsact/ingress bpf_lxc.o:[from-container] id 1005
# lxccd70d8b98510(27) clsact/egress bpf_lxc.o:[to-container] id 1043
# lxc90379056af7f(29) clsact/ingress bpf_lxc.o:[from-container] id 941
# lxc90379056af7f(29) clsact/egress bpf_lxc.o:[to-container] id 977
# lxc_health(51) clsact/ingress bpf_lxc.o:[from-container] id 1089
# lxc_health(51) clsact/egress bpf_lxc.o:[to-container] id 1095
# lxc9b377d665f9f(53) clsact/ingress bpf_lxc.o:[from-container] id 1007
# lxc9b377d665f9f(53) clsact/egress bpf_lxc.o:[to-container] id 1040
# lxc1e3c45bc89b2(55) clsact/ingress bpf_lxc.o:[from-container] id 1018
# lxc1e3c45bc89b2(55) clsact/egress bpf_lxc.o:[to-container] id 1050

# flow_dissector:



The loaded bpf programs are also visible on the node itself.
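A hedged sketch of looking at the cilium-managed interfaces from the node via oc debug (the node name is just an example):

oc debug node/worker-0 -- chroot /host ip -o link | grep -E 'cilium|lxc'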

# the ip inside a pod
ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host 
#        valid_lft forever preferred_lft forever
# 18: eth0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
#     link/ether f2:93:a4:bd:7b:38 brd ff:ff:ff:ff:ff:ff
#     inet 10.254.1.174/32 scope global eth0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::f093:a4ff:febd:7b38/64 scope link 
#        valid_lft forever preferred_lft forever

image registry proxy

Preparing an offline image registry is quite painful. Fortunately we have an Internet-connected host, so we can build an image registry proxy with nexus, run the PoC once in the online environment, and then harvest the offline images through the proxy (a quick pull test through the proxy is sketched after the reference link below).

  • https://mtijhof.wordpress.com/2018/07/23/using-nexus-oss-as-a-proxy-cache-for-docker-images/
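Once the docker proxy repository is listening on the HTTPS connector port 8083 configured below, a quick hedged pull test through the proxy looks like this; depending on the repository's anonymous-access setting a podman login against nexus may be needed first, and library/busybox is only a placeholder upstream image:

podman pull nexus.ocp4.redhat.ren:8083/library/busybox:latest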
#####################################################
# init build the nexus fs
/bin/cp -f nexus-image.tgz /data/ccn/
tar zxf nexus-image.tgz
chown -R 200 /data/ccn/nexus-image

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.29.0

podman run -d -p 8082:8081 -p 8083:8083 -p 8084:8084 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

podman stop nexus-image
podman rm nexus-image

# get the admin password
cat /data/ccn/nexus-image/admin.password && echo
# 84091bcd-c82f-44a3-8b7b-dfc90f5b7da1

# open http://nexus.ocp4.redhat.ren:8082

# how to cleanup
# https://github.com/wangzheng422/nexus-docker-cleanup

# enable https
# https://blog.csdn.net/s7799653/article/details/105378645
# https://help.sonatype.com/repomanager3/system-configuration/configuring-ssl#ConfiguringSSL-InboundSSL-ConfiguringtoServeContentviaHTTPS
mkdir -p /data/install/tmp
cd /data/install/tmp

# export the certificate into pkcs12 format
# you will be prompted for an export password; use "password"
openssl pkcs12 -export -out keystore.pkcs12 -inkey /etc/crts/redhat.ren.key -in /etc/crts/redhat.ren.crt

cat << EOF >> Dockerfile
FROM docker.io/sonatype/nexus3:3.29.0
USER root
COPY keystore.pkcs12 /keystore.pkcs12
RUN keytool -v -importkeystore -srckeystore keystore.pkcs12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS -storepass password -srcstorepass password  &&\
    cp keystore.jks /opt/sonatype/nexus/etc/ssl/
USER nexus
EOF
buildah bud --format=docker -t docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh -f Dockerfile .
buildah push docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

######################################################
# go to helper, update proxy setting for ocp cluster
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

mkdir -p /etc/containers/registries.conf.d
/bin/cp -f image.registries.conf /etc/containers/registries.conf.d/

cd /data/ocp4
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

######################################################
# dump the nexus image fs out
podman stop nexus-image

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus-image | pigz -c > nexus-image.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus-image.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:image-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
buildah push docker.io/wangzheng422/nexus-fs:image-$var_date
echo "docker.io/wangzheng422/nexus-fs:image-$var_date"

# the following version can be used to bootstrap the image proxy; it already contains the nfs provisioner images and the sample operator metadata. A pleasant discovery: image streams do not pull the full images, they seem to fetch only the metadata, and the images themselves are pulled only when actually used.
# docker.io/wangzheng422/nexus-fs:image-2020-12-26-1118

Configure the image registry CA

The registry CA was already injected during installation, but the image streams do not seem to trust it, so let's add it again.

oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.ca.crt \
    --from-file=nexus.ocp4.redhat.ren..8083=/data/install/redhat.ren.ca.crt 
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["nexus.ocp4.redhat.ren:8083"]}}}'  --type=merge

oc get image.config.openshift.io/cluster -o yaml

# re-import the image streams in the openshift project
oc get is -o json | jq -r '.items[].metadata.name' | xargs -L1 oc import-image --all 

Our helper node provides NFS, so configure the nicer NFS-backed storage for the image registry instead of emptyDir.

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh

# oc edit configs.imageregistry.operator.openshift.io
# edit the storage section
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# suspend the imagepruner
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

Configure local DNS (point *.apps.ocp4.redhat.ren to 192.168.7.11), i.e. to the haproxy on the helper node, and the web console becomes reachable from a browser.
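Since /etc/hosts does not support wildcards, a minimal sketch for a workstation is to list the common hostnames explicitly (add further *.apps routes as you expose them):

cat << EOF >> /etc/hosts
192.168.7.11 console-openshift-console.apps.ocp4.redhat.ren
192.168.7.11 oauth-openshift.apps.ocp4.redhat.ren
192.168.7.11 api.ocp4.redhat.ren
EOF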

chrony/NTP setup

OCP 4.6 requires NTP synchronization to be configured. The ansible playbook we ran earlier already generated the NTP MachineConfig, so just apply it to the cluster.

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/

Operator Hub offline installation

https://docs.openshift.com/container-platform/4.2/operators/olm-restricted-networks.html

https://github.com/operator-framework/operator-registry

https://www.cnblogs.com/ericnie/p/11777384.html?from=timeline&isappinstalled=0

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.2/html-single/images/index

There are two levels to preparing the operator hub. The first, described here, is building the offline operator hub resources and mirroring the operator images; with that done, the operator hub shows up on a disconnected ocp4.2 cluster and operators can be deployed. However, when an operator is then used to deploy its operands, it pulls additional images. Those operand images also have to be mirrored offline, but every operator needs different images and there is no single place where they are all listed, so each project site has to mirror them as needed. This project tries to pre-download as many of the required images as possible, but omissions are currently unavoidable.
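When a missing operand image is discovered on site, besides the add.image.sh workflow described later, a hedged fallback is a direct skopeo copy into the internal registry; the source image below is only a placeholder:

# replace the placeholder source with whatever image the operator pod actually fails to pull;
# registry.redhat.io normally needs credentials, e.g. --authfile /data/pull-secret.json
skopeo copy --all \
  docker://registry.redhat.io/some-namespace/some-operand:tag \
  docker://registry.ocp4.redhat.ren:5443/some-namespace/some-operand:tag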

# on the helper node
cd /data/ocp4

# scp /etc/crts/redhat.ren.crt 192.168.7.11:/root/ocp4/
# https://docs.openshift.com/container-platform/4.4/builds/setting-up-trusted-ca.html
oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.crt
# if you want to delete this config map, do this
# oc delete configmap ca.for.registry
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge
# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["registry.redhat.ren"]}}}'  --type=merge
oc get image.config.openshift.io/cluster -o yaml

# the official docs say to run the following step, but in practice a disconnected environment does not need it
# oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
# if you followed the official docs anyway, undo it like this
# oc patch OperatorHub cluster --type json  -p '[{"op": "remove", "path": "/spec/disableAllDefaultSources"}]'

oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml

# yum -y install python36
# adjust the parameters for your site and run the following command to generate the config files pointing at the internal registry
cd /data/ocp4/
bash image.registries.conf.sh registry.ocp4.redhat.ren:5443

# due to the way ocp 4.2 rolls out machine configs, the following step triggers a cluster update;
# the nodes and cluster components restart one by one, so wait until the cluster has settled.
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

# !!! normally the following is NOT needed !!!
# deleting the mirror configuration also triggers a cluster update; wait until the cluster has settled.
oc delete -f ./99-worker-container-registries.yaml -n openshift-config
oc delete -f ./99-master-container-registries.yaml -n openshift-config

watch oc get machineconfigpools

watch oc get node

From the monitoring console you can watch the nodes upgrading and rebooting.


oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc get packagemanifest -n openshift-marketplace

The operator list is now visible.

Deploying an operator also succeeds.

other tips

# disable cluster upgrade check, and insight check
oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
oc scale --replicas 0 -n openshift-insights deployments/insights-operator


# set master and worker combine
# https://github.com/openshift-telco/openshift4x-poc/blob/master/MASTER-WORKER-COMBINED.md
oc edit schedulers cluster
# apiVersion: config.openshift.io/v1
# kind: Scheduler
# metadata:
# name: cluster
# spec:
#     mastersSchedulable: true

References:

https://www.openshift.com/blog/delivering-a-three-node-architecture-for-edge-deployments

nvidia gpu for openshift 4.6 disconnected

Introduction

This lab is part of the openshift edge GPU scenario and focuses on how to install the nvidia gpu stack in a disconnected environment. For how to pass the GPU through to kvm to simulate an edge GPU host, see this document.

Here is the walkthrough video.

Here is the architecture diagram for this lab:

Build a rhel8 repo / install source

The nvidia gpu operator downloads packages online to compile the driver, so in a disconnected scenario we first need to prepare a rhel8 repo.

export PROXY="127.0.0.1:18801"

subscription-manager --proxy=$PROXY release --list

subscription-manager --proxy=$PROXY release --set=8

subscription-manager --proxy=$PROXY repos --disable="*"
subscription-manager --proxy=$PROXY repos \
    --enable="rhel-8-for-x86_64-baseos-rpms" \
    --enable="rhel-8-for-x86_64-baseos-source-rpms" \
    --enable="rhel-8-for-x86_64-appstream-rpms" \
    --enable="rhel-8-for-x86_64-supplementary-rpms" \
    --enable="codeready-builder-for-rhel-8-x86_64-rpms" \
    --enable="rhocp-4.6-for-rhel-8-x86_64-rpms" \
    --enable="rhel-8-for-x86_64-baseos-eus-rpms" \
    # endline

mkdir -p /data/dnf/gaps
cd /data/dnf/gaps

# subscription-manager --proxy=$PROXY release --set=8.2
# subscription-manager --proxy=$PROXY release --set=8

# dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
dnf copr enable frostyx/modulemd-tools
dnf install -y modulemd-tools 
# dnf install -y https://kojipkgs.fedoraproject.org//packages/modulemd-tools/0.9/1.fc32/noarch/modulemd-tools-0.9-1.fc32.noarch.rpm

# note: to know which packages are needed, deploy the gpu operator once, read the driver pod log to see what it tries to install, and substitute those packages here; different gpu operator versions require different packages, so this list varies.
/bin/rm -rf /data/dnf/gaps/*
# dnf download --resolve --releasever=8.2 --alldeps \
# --repo rhel-8-for-x86_64-baseos-eus-rpms,rhel-8-for-x86_64-baseos-rpms,rhel-8-for-x86_64-appstream-rpms,ubi-8-baseos,ubi-8-appstream \
# kernel-headers.x86_64 kernel-devel.x86_64 kernel-core.x86_64 systemd-udev.x86_64 elfutils-libelf.x86_64 elfutils-libelf-devel.x86_64 \
# kernel-headers-4.18.0-193.40.1.el8_2.x86_64 kernel-devel-4.18.0-193.40.1.el8_2.x86_64 kernel-core-4.18.0-193.40.1.el8_2.x86_64 systemd-udev-239-31.el8_2.2.x86_64 kernel-headers-4.18.0-193.41.1.el8_2.x86_64 kernel-devel-4.18.0-193.41.1.el8_2.x86_64 \
# elfutils-libelf-0.180-1.el8.x86_64

subscription-manager --proxy=$PROXY release --set=8.2

dnf download --resolve --releasever=8.2 --alldeps \
--repo rhel-8-for-x86_64-baseos-eus-rpms,rhel-8-for-x86_64-baseos-rpms,rhel-8-for-x86_64-appstream-rpms,ubi-8-baseos,ubi-8-appstream \
kernel-headers.x86_64 kernel-devel.x86_64 kernel-core.x86_64 systemd-udev.x86_64 elfutils-libelf.x86_64 elfutils-libelf-devel.x86_64 \
kernel-headers-4.18.0-193.41.1.el8_2.x86_64 kernel-devel-4.18.0-193.41.1.el8_2.x86_64 kernel-core-4.18.0-193.41.1.el8_2.x86_64

subscription-manager --proxy=$PROXY release --set=8

dnf download --resolve --alldeps \
--repo rhel-8-for-x86_64-baseos-rpms,rhel-8-for-x86_64-appstream-rpms,ubi-8-baseos,ubi-8-appstream \
elfutils-libelf.x86_64 elfutils-libelf-devel.x86_64 

# https://access.redhat.com/solutions/4907601
createrepo ./
repo2module . \
    --module-name foo \
    --module-stream devel \
    --module-version 123 \
    --module-context f32
createrepo_mod .

Now the local /data/dnf/gaps/ directory is the repo directory; expose it over an ftp service and you are done. For the detailed steps, see here.
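A hedged check that the repo is actually reachable over FTP; /data/dnf was already bind-mounted to /var/ftp/dnf earlier in this document, so the gaps directory is served as soon as vsftpd (assumed installed, with anonymous read access) is running:

systemctl enable --now vsftpd
# from a client on the 192.168.7.0/24 network
curl -s ftp://192.168.7.1/dnf/gaps/repodata/repomd.xml | head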

Customize the nvidia driver image

By default the nvidia gpu driver pod downloads packages from the Internet, which also involves subscriptions; this is painful and simply does not work offline.

Since we just built an offline repo, we customize the driver image so that it uses the offline repo directly.

Official driver image downloads: https://ngc.nvidia.com/catalog/containers/nvidia:driver/tags


mkdir -p /data/install/
cd /data/install
# /bin/rm -rf /etc/yum.repos.d/* 
export YUMIP="192.168.7.1"
cat << EOF > ./remote.repo
[gaps]
name=gaps
baseurl=ftp://${YUMIP}/dnf/gaps
enabled=1
gpgcheck=0

EOF

oc create configmap repo-config -n gpu-operator-resources --from-file=./remote.repo

You can install the ClusterPolicy from the operator UI; remember to adjust the driver repo config: repo-config -> /etc/yum.repos.d
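After the ClusterPolicy is up, a hedged way to confirm the repo file really landed in the driver container; the pod is selected by name, mirroring the jq selection used in the test section below, and the /etc/yum.repos.d destination follows the repo-config mapping above:

POD_NAME=$(oc -n gpu-operator-resources get pods -o json | jq -r '.items[] | select( .metadata.name | contains("nvidia-driver-daemonset") ) | .metadata.name' | head -1 )
oc -n gpu-operator-resources exec -it $POD_NAME -- cat /etc/yum.repos.d/remote.repo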

Customize the driver image

If we have special requirements for the driver image, customize it like this.

Here is a reference: https://github.com/dmc5179/nvidia-driver

Some people bake the rpm packages directly into the driver image, which is also a good approach.


# driver image
# nvidia-driver-daemonset
podman pull nvcr.io/nvidia/driver:450.80.02-rhcos4.6

# you can test the driver image, like this:
# podman run --rm -it --entrypoint='/bin/bash' nvcr.io/nvidia/driver:450.80.02-rhcos4.6

podman run --rm -it --entrypoint='/bin/bash' nvcr.io/nvidia/driver:460.32.03-rhcos4.6

mkdir -p /data/gpu/
cd /data/gpu
export YUMIP="192.168.7.1"
cat << EOF > /data/gpu/remote.repo
[gaps]
name=gaps
baseurl=ftp://${YUMIP}/dnf/gaps
enabled=1
gpgcheck=0

EOF

cat << EOF > /data/gpu/Dockerfile
FROM nvcr.io/nvidia/driver:450.80.02-rhcos4.6

RUN /bin/rm -rf /etc/yum.repos.d/* 
COPY remote.repo /etc/yum.repos.d/remote.repo

EOF

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

buildah bud --format=docker -t docker.io/wangzheng422/imgs:nvidia-gpu-driver-$var_date-rhcos4.6 -f Dockerfile .

# podman run --rm -it --entrypoint='/bin/bash' docker.io/wangzheng422/imgs:nvidia-gpu-driver-2021-01-21-0942

buildah push docker.io/wangzheng422/imgs:nvidia-gpu-driver-$var_date-rhcos4.6
echo "docker.io/wangzheng422/imgs:nvidia-gpu-driver-$var_date-rhcos4.6"

# docker.io/wangzheng422/imgs:nvidia-gpu-driver-2021-02-05-1131-rhcos4.6

Done. When referencing the final image by tag, remember to leave off the trailing rhcos4.6, because the operator UI appends it automatically.

Start the offline gpu operator installation

Follow nvidia's official installation documentation.

First, install node feature discovery (nfd). Previously nfd could only scan standalone worker nodes, so the 3-node edge topology was not supported back then. Later versions of nfd fixed this limitation, and the 3-node edge topology now works too.

Then create an nfd instance; just click create without changing anything, and note that the namespace is openshift-operator.

Next, create the gpu-operator-resources namespace, then install the nvidia gpu operator.

In the gpu operator, create a cluster policy and adjust its parameters. The example here uses the customized driver image; if you did not customize it, leave that parameter alone.

Finally, the detected gpu shows up in the node labels.
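A hedged command-line check; the exact label keys depend on the nfd and gpu-feature-discovery versions, so treat them as examples:

# nodes where nfd detected an NVIDIA PCI device (vendor id 10de)
oc get node -l feature.node.kubernetes.io/pci-10de.present=true
# dump the nvidia.com/* labels on a worker (node name is an example)
oc get node worker-0 -o json | jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'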

Test it

# first try the steps from the official docs
oc project gpu-operator-resources

POD_NAME=$(oc get pods -o json | jq -r '.items[] | select( .metadata.name | contains("nvidia-driver-daemonset") ) | .metadata.name' | head )

oc exec -it $POD_NAME -- nvidia-smi 
# Thu Jan 21 04:12:36 2021
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |                               |                      |               MIG M. |
# |===============================+======================+======================|
# |   0  Tesla T4            On   | 00000000:05:00.0 Off |                  Off |
# | N/A   27C    P8    14W /  70W |      0MiB / 16127MiB |      0%      Default |
# |                               |                      |                  N/A |
# +-------------------------------+----------------------+----------------------+

# +-----------------------------------------------------------------------------+
# | Processes:                                                                  |
# |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
# |        ID   ID                                                   Usage      |
# |=============================================================================|
# |  No running processes found                                                 |
# +-----------------------------------------------------------------------------+

# now try starting an application
# https://nvidia.github.io/gpu-operator/

# https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt
# build a test image following this official document

# goto helper
cd /data/ocp4

cat << EOF > /data/ocp4/gpu.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  labels:
    app: demo1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'worker-0'
      restartPolicy: Always
      containers:
        - name: demo1
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2-ubi8"

EOF
oc project demo
oc apply -f gpu.yaml
# [Vector addition of 50000 elements]
# Copy input data from the host memory to the CUDA device
# CUDA kernel launch with 196 blocks of 256 threads
# Copy output data from the CUDA device to the host memory
# Test PASSED
# Done

oc delete -f gpu.yaml

# on build host
# https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt
# podman run -it nvcr.io/nvidia/tensorrt:20.12-py3
mkdir -p /data/gpu
cd /data/gpu
cat << EOF > /data/gpu/Dockerfile
FROM docker.io/wangzheng422/imgs:tensorrt-ljj

CMD tail -f /dev/null

EOF
var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

buildah bud --format=docker -t docker.io/wangzheng422/imgs:tensorrt-ljj-$var_date -f Dockerfile .

buildah push docker.io/wangzheng422/imgs:tensorrt-ljj-$var_date
echo "docker.io/wangzheng422/imgs:tensorrt-ljj-$var_date"

# docker.io/wangzheng422/imgs:tensorrt-ljj-2021-01-21-1151

# go back to helper node
cat << EOF > /data/ocp4/gpu.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  labels:
    app: demo1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'worker-0'
      restartPolicy: Always
      containers:
        - name: demo1
          image: docker.io/wangzheng422/imgs:tensorrt-ljj-2021-01-21-1151

EOF
oc project demo
oc apply -f gpu.yaml

# oc rsh into the pod, run sample program using gpu
# cd tensorrt/bin/
# ./sample_mnist
# you will see this correct result
# &&&& PASSED TensorRT.sample_mnist # ./sample_mnist

oc delete -f gpu.yaml


tips

  1. if nfd fails to detect the gpu model, rebooting the node fixes it
  2. if gpu feature discovery misbehaves, rebooting the node fixes it too
cat /proc/driver/nvidia/version

reference

https://www.openshift.com/blog/simplifying-deployments-of-accelerated-ai-workloads-on-red-hat-openshift-with-nvidia-gpu-operator

https://www.openshift.com/blog/how-to-use-entitled-image-builds-to-build-drivercontainers-with-ubi-on-openshift

https://access.redhat.com/solutions/5232901

https://docs.nvidia.com/datacenter/kubernetes/openshift-on-gpu-install-guide/

https://access.redhat.com/solutions/4907601

https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html

The following were dead ends, kept for reference

# add ubi support
cat << EOF > /etc/yum.repos.d/ubi.repo
[ubi-8-baseos]
name=ubi-8-baseos
baseurl=https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/baseos/os
enabled=1
gpgcheck=1

[ubi-8-appstream]
name=ubi-8-appstream
baseurl=https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/appstream/os
enabled=1
gpgcheck=1

[ubi-8-codeready-builder]
name=ubi-8-codeready-builder
baseurl=https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/codeready-builder/os/
enabled=1
gpgcheck=1

EOF

cd /data/dnf
dnf reposync -m --download-metadata --delete -n



cat << EOF > /data/ocp4/gpu.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  labels:
    app: demo1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'worker-0'
      restartPolicy: Always
      containers:
        - name: demo1
          image: nvidia/cuda:11.1.1-devel-centos8

EOF
oc project demo
oc apply -f gpu.yaml

oc delete -f gpu.yaml

cat << EOF > /data/gpu/Dockerfile
FROM nvcr.io/nvidia/tensorrt:20.12-py3

RUN /opt/tensorrt/python/python_setup.sh
RUN /opt/tensorrt/install_opensource.sh 
RUN /opt/tensorrt/install_opensource.sh -b master
# RUN cd /workspace/tensorrt/samples && make -j4

CMD tail -f /dev/null

EOF

This section describes how to supplement missing offline images at a project site.

Thanks to william shen and kevin lin, whose help and reminders greatly simplified the ocp4.3 image-supplement process.

The rough workflow is:

  • edit the add.image.list file and add the images you want; lines starting with # are comments, and make sure there are a few trailing newlines at the end of the file.
  • on the Internet-connected host, run add.image.sh; it downloads the images to the specified directory, which you then compress into a tgz.
  • on the intranet helper node, upload the tgz and unpack it.
  • on the intranet helper node, cd /data/ocp4 and run add.image.load.sh to load the images.
# on the Internet-connected cloud host
# on vultr
# edit add.image.list

export MIRROR_DIR='/data/redhat-operator'
/bin/rm -rf ${MIRROR_DIR}

cd /data/ocp4
bash add.image.sh add.image.list ${MIRROR_DIR}
# bash add.image.sh is.openshift.list

# on the intranet helper node
# scp back /data/mirror_dir.tgz to /data/ocp4

bash add.image.load.sh /data/mirror_dir 'registry.redhat.ren:5443'
# bash add.image.load.sh /data/remote/4.3.3/is.samples/mirror_dir

openshift: supplement the sample images

The openshift project inside an openshift cluster ships many built-in image streams that point to public registries. In a disconnected environment, how do we import those images and update the image stream definitions?

# import the images
# unpack is.samples.tgz into /data
pigz -dc is.samples.tgz | tar xf -
# adjust add.image.load.sh for your environment and run it
bash add.image.load.sh /data/is.samples/mirror_dir/

# fix up the image stream definitions
# adjust is.patch.sh for your environment
bash is.patch.sh

openshift 4.3 calico disconnected deployment

https://docs.projectcalico.org/getting-started/openshift/requirements

image prepare


cd /data/ocp4

cat << EOF > add.image.list
quay.io/tigera/operator-init:v1.3.3
quay.io/tigera/operator:v1.3.3
docker.io/calico/ctl:v3.13.2
docker.io/calico/kube-controllers:v3.13.2
docker.io/calico/node:v3.13.2
docker.io/calico/typha:v3.13.2
docker.io/calico/pod2daemon-flexvol:v3.13.2
docker.io/calico/cni:v3.13.2
EOF

bash add.image.sh add.image.list

bash add.image.load.sh /data/down/mirror_dir

install


# scp install-config.yaml into /root/ocp4
# sed -i 's/OpenShiftSDN/Calico/' install-config.yaml
openshift-install create manifests --dir=/root/ocp4
# scp calico/manifests to manifests
openshift-install create ignition-configs --dir=/root/ocp4

# follow 4.3.disconnect.operator.md to install

oc get tigerastatus

oc get pod -n tigera-operator

oc get pod -n calico-system

# see which images are in use
oc project tigera-operator

oc get pod -o json | jq -r '.items[].spec.containers[].image' | sort | uniq
# quay.io/tigera/operator-init:v1.3.3
# quay.io/tigera/operator:v1.3.3

oc project calico-system

oc get pod -o json | jq -r '.items[].spec.containers[].image' | sort | uniq
# calico/ctl:v3.13.2
# docker.io/calico/kube-controllers:v3.13.2
# docker.io/calico/node:v3.13.2
# docker.io/calico/typha:v3.13.2

# docker.io/calico/pod2daemon-flexvol:v3.13.2
# docker.io/calico/cni:v3.13.2

# install the calicoctl command line
oc apply -f calicoctl.yaml

oc exec calicoctl -n calico-system -it -- /calicoctl get node -o wide

oc exec calicoctl -n calico-system -it -- /calicoctl ipam show --show-blocks

oc exec calicoctl -n calico-system -it -- /calicoctl get ipPool -o wide

calico: create pods with a specific ip pool

Video walkthrough

  • https://youtu.be/GJSFF7DDCe8
  • https://www.bilibili.com/video/BV14Z4y1p7wa/

https://www.tigera.io/blog/calico-ipam-explained-and-enhanced/

# create the ip pools
cat << EOF > calico.ip.pool.yaml
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ip-pool-1
spec:
  cidr: 172.110.110.0/24
  ipipMode: Always
  natOutgoing: true
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ip-pool-2
spec:
  cidr: 172.110.220.0/24
  ipipMode: Always
  natOutgoing: true
EOF
cat calico.ip.pool.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

# check that the ip pools were created
oc exec calicoctl -n calico-system -it -- /calicoctl get ipPool -o wide

cat << EOF > calico.pod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod2
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod3
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-2"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod4
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod5
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-2"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
EOF
oc apply -f calico.pod.yaml

# check the pods' IP assignment; addresses come from the ip pools we specified
oc get pod -o wide -n demo
# [root@helper ocp4]# oc get pod -o wide -n demo
# NAME        READY   STATUS    RESTARTS   AGE     IP                NODE                       NOMINATED NODE   READINESS GATES
# demo-pod1   1/1     Running   0          8m52s   172.110.110.67    worker-0.ocp4.redhat.ren   <none>           <none>
# demo-pod2   1/1     Running   0          8m52s   172.110.110.68    worker-0.ocp4.redhat.ren   <none>           <none>
# demo-pod3   1/1     Running   0          8m52s   172.110.220.64    worker-0.ocp4.redhat.ren   <none>           <none>
# demo-pod4   1/1     Running   0          8m52s   172.110.110.128   worker-1.ocp4.redhat.ren   <none>           <none>
# demo-pod5   1/1     Running   0          8m52s   172.110.220.130   worker-1.ocp4.redhat.ren   <none>           <none>

# get the ip addresses of all pods except demo-pod1
oc get pod -o json | jq -r '.items[] | select(.metadata.name != "demo-pod1") | .status.podIP'

# ping those ip addresses from demo-pod1; all of them are reachable.
for var_i in $(oc get pod -o json | jq -r '.items[] | select(.metadata.name != "demo-pod1") | .status.podIP'); do
    oc exec -n demo demo-pod1 -it -- ping -c 5 ${var_i}
done

# clean up
oc delete -f calico.pod.yaml

cat calico.ip.pool.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl delete -f -

calico + multus

Video walkthrough

  • https://youtu.be/MQRv6UASZcA
  • https://www.bilibili.com/video/BV1zi4y147sk/
  • https://www.ixigua.com/i6825969911781655048/
# create the ip addresses needed by the multus macvlan networks
cat << EOF > calico.macvlan.yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks: 
  - name: multus-macvlan-0
    namespace: demo
    type: SimpleMacvlan
    simpleMacvlanConfig:
      ipamConfig:
        type: static
        staticIPAMConfig:
          addresses:
          - address: 10.123.110.11/24
          routes:
  - name: multus-macvlan-1
    namespace: demo
    type: SimpleMacvlan
    simpleMacvlanConfig:
      ipamConfig:
        type: static
        staticIPAMConfig:
          addresses:
          - address: 10.123.110.22/24

EOF
oc apply -f calico.macvlan.yaml

# check the created addresses
oc get Network.operator.openshift.io -o yaml

# create pods configured with multus macvlan attachments
cat << EOF > calico.pod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
  annotations:
    k8s.v1.cni.cncf.io/networks: '
      [{
        "name": "multus-macvlan-0"
      }]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod2
  namespace: demo
  annotations:
    k8s.v1.cni.cncf.io/networks: '
      [{
        "name": "multus-macvlan-1"
      }]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always

EOF
oc apply -f calico.pod.yaml

# look up the ip addresses on demo-pod2
var_ips=$(oc get pod -o json | jq -r '.items[] | select(.metadata.name != "demo-pod1") | .metadata.annotations["k8s.v1.cni.cncf.io/networks-status"] | fromjson | .[].ips[0] ' )
echo -e "$var_ips"

# oc get pod -o json | jq -r ' .items[] | select(.metadata.name != "demo-pod1") | { podname: .metadata.name, ip: ( .metadata.annotations["k8s.v1.cni.cncf.io/networks-status"] | fromjson | .[].ips[0] ) } | [.podname, .ip] | @tsv'

# ping demo-pod2's two ip addresses from demo-pod1
for var_i in $var_ips; do
  oc exec -n demo demo-pod1 -it -- ping -c 5 ${var_i}
done

# restore
oc delete -f calico.pod.yaml

cat << EOF > calico.macvlan.yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
EOF
oc apply -f calico.macvlan.yaml

calico + static ip

https://docs.projectcalico.org/networking/use-specific-ip

Video walkthrough

  • https://youtu.be/q8FtuOzBixA
  • https://www.bilibili.com/video/BV1zz411q78i/
# create a test deployment with a static ip, plus a test pod
cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
      annotations:
        "cni.projectcalico.org/ipAddrs": '["10.254.22.33"]'
    spec:
      nodeSelector:
        # kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
EOF
oc apply -n demo -f demo.yaml

# 检查pod的ip地址
oc get pod -o wide
# NAME                    READY   STATUS    RESTARTS   AGE   IP              NODE                       NOMINATED NODE   READINESS GATES
# demo-8688cf4477-s26rs   1/1     Running   0          5s    10.254.22.33    worker-1.ocp4.redhat.ren   <none>           <none>
# demo-pod1               1/1     Running   0          6s    10.254.115.48   worker-0.ocp4.redhat.ren   <none>           <none>

# ping测试
oc exec -n demo demo-pod1 -it -- ping -c 5 10.254.22.33

# 移动pod到其他node
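# 一种做法(示意):先把当前运行 demo 的节点 cordon 掉,再删除 pod,让 deployment 把它重新调度到另一个节点
# oc adm cordon worker-1.ocp4.redhat.ren
# oc delete pod -n demo -l app=demo
# oc adm uncordon worker-1.ocp4.redhat.ren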
oc get pod -o wide

# ping测试
oc exec -n demo demo-pod1 -it -- ping -c 5 10.254.22.33

# clean up
oc delete -n demo -f demo.yaml

calico + mtu

https://docs.projectcalico.org/networking/mtu

视频讲解

  • https://youtu.be/hTafoKlQiY0
  • https://www.bilibili.com/video/BV1Tk4y167Zs/
# 先检查一下已有的mtu
cat << EOF > demo.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod2
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
EOF
oc apply -n demo -f demo.yaml

# 检查 mtu,现在tunl上是1480,eth0上是1410
oc exec -it demo-pod1 -- ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
#     link/ipip 0.0.0.0 brd 0.0.0.0
# 4: eth0@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UP group default
#     link/ether c2:e9:6a:c8:62:77 brd ff:ff:ff:ff:ff:ff link-netnsid 0
#     inet 10.254.115.50/32 scope global eth0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::c0e9:6aff:fec8:6277/64 scope link
#        valid_lft forever preferred_lft forever

# 把mtu 从1410改成700
oc get installations.operator.tigera.io -o yaml

oc edit installations.operator.tigera.io
# spec:
#   calicoNetwork:
#     mtu: 700

# 重启calico node pod
# oc delete deploy calico-kube-controllers -n calico-system
# oc delete deploy calico-typha -n calico-system
# oc delete ds calico-node -n calico-system
oc delete -n demo -f demo.yaml

# 重启worker node

# 重新创建pod
# oc apply -n demo -f demo.yaml

# 查看mtu
oc exec -i demo-pod1 -- ip a
oc exec -i demo-pod2 -- ip a

# 各种ping测试
var_ip=$(oc get pod -o json | jq -r '.items[] | select(.metadata.name == "demo-pod1") | .status.podIP')
echo $var_ip
# ICMP+IP 的包头有 28 bytes
#  the IP stack of your system adds ICMP and IP headers which equals to 28 bytes
oc exec -i demo-pod2 -- ping -M do -s $((600-28)) -c 5 $var_ip
oc exec -i demo-pod2 -- ping -M do -s $((700-28)) -c 5 $var_ip
oc exec -i demo-pod2 -- ping -M do -s $((800-28)) -c 5 $var_ip
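
# (可选,示意)pod 里已经用 iperf3 -s -p 6666 启动了服务端,
# 也可以顺便用 iperf3 客户端比较一下调小 mtu 前后的带宽
# oc exec -it demo-pod2 -- iperf3 -c $var_ip -p 6666 -t 10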

# 把mtu从700恢复成1410
oc edit installations.operator.tigera.io
# spec:
#   calicoNetwork:
#     mtu: 1410

oc get installations.operator.tigera.io -o yaml

# 重启calico node pod
# oc delete deploy calico-kube-controllers -n calico-system
# oc delete deploy calico-typha -n calico-system
# oc delete ds calico-node -n calico-system
oc delete -n demo -f demo.yaml

# 重启worker node

# 重新创建pod
# oc apply -n demo -f demo.yaml

# 查看mtu
oc exec -i demo-pod1 -- ip a
oc exec -i demo-pod2 -- ip a

# 各种ping测试
var_ip=$(oc get pod -o json | jq -r '.items[] | select(.metadata.name == "demo-pod1") | .status.podIP')
echo $var_ip
# ICMP+IP 的包头有 28 bytes
#  the IP stack of your system adds ICMP and IP headers which equals to 28 bytes
oc exec -i demo-pod2 -- ping -M do -s $((600-28)) -c 5 $var_ip
oc exec -i demo-pod2 -- ping -M do -s $((700-28)) -c 5 $var_ip
oc exec -i demo-pod2 -- ping -M do -s $((800-28)) -c 5 $var_ip

# restore
oc delete -n demo -f demo.yaml

calico + ipv4/v6 dual stack

视频讲解

  • https://youtu.be/ju4d7jWs7DQ
  • https://www.bilibili.com/video/BV1va4y1e7c1/
  • https://www.ixigua.com/i6827830624431112715/
# 在集群安装之前,配置文件写入ipv6地址信息
# install openshift with calico and ipv6 config
# networking:
#   clusterNetworks:
#   - cidr: 10.254.0.0/16
#     hostPrefix: 24
#   - cidr: fd00:192:168:7::/64
#     hostPrefix: 80

# 在安装集群的过程中,给主机添加ipv6地址,安装就可以顺利继续了
## add ipv6 address to hosts
# helper
nmcli con modify eth0 ipv6.address "fd00:192:168:7::11/64" ipv6.gateway fd00:192:168:7::1
nmcli con modify eth0 ipv6.method manual
nmcli con reload
nmcli con up eth0

# master0
nmcli con modify ens3 ipv6.address fd00:192:168:7::13/64 ipv6.gateway fd00:192:168:7::1 ipv6.method manual
nmcli con reload
nmcli con up ens3

# master1
nmcli con modify ens3 ipv6.address fd00:192:168:7::14/64 ipv6.gateway fd00:192:168:7::1 ipv6.method manual
nmcli con reload
nmcli con up ens3

# master2
nmcli con modify ens3 ipv6.address fd00:192:168:7::15/64 ipv6.gateway fd00:192:168:7::1 ipv6.method manual
nmcli con reload
nmcli con up ens3

# worker0
nmcli con modify ens3 ipv6.address fd00:192:168:7::16/64 ipv6.gateway fd00:192:168:7::1 ipv6.method manual
nmcli con reload
nmcli con up ens3

# worker1
nmcli con modify ens3 ipv6.address fd00:192:168:7::17/64 ipv6.gateway fd00:192:168:7::1 ipv6.method manual
nmcli con reload
nmcli con up ens3

oc apply -f calicoctl.yaml

oc exec calicoctl -n calico-system -it -- /calicoctl get node -o wide

oc exec calicoctl -n calico-system -it -- /calicoctl ipam show --show-blocks

oc exec calicoctl -n calico-system -it -- /calicoctl get ipPool -o wide

# 在openshift的开发者视图上部署一个tomcat
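
# 如果不想用图形界面,也可以在一个测试 project 里用命令行部署 tomcat
# (示意:镜像地址只是假设,离线环境请换成内部仓库里的镜像)
# oc new-app --docker-image=docker.io/library/tomcat:9.0 --name=tomcat
# oc expose svc/tomcat
# oc get route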

# 从浏览器上,直接访问route入口,测试ipv4的效果。

# 在master0上直接访问worker1上的pod ipv6地址
curl -g -6 'http://[fd00:192:168:7:697b:8c59:3298:b950]:8080/'

# 在集群外,直接访问worker0上的pod ipv6地址
ip -6 route add fd00:192:168:7:697b:8c59:3298::/112 via fd00:192:168:7::17 dev eth0
curl -g -6 'http://[fd00:192:168:7:697b:8c59:3298:b950]:8080/'

calico + bgp


cat << EOF > calico.serviceip.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.96.0.0/16
EOF
cat calico.serviceip.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

oc exec calicoctl -n calico-system -i -- /calicoctl patch bgpconfiguration default -p '{"spec": {"nodeToNodeMeshEnabled": true}}'

oc exec calicoctl -n calico-system -it -- /calicoctl get bgpconfig default -o yaml

oc exec calicoctl -n calico-system -it -- /calicoctl get node -o wide

oc exec calicoctl -n calico-system -it -- /calicoctl ipam show --show-blocks

oc exec calicoctl -n calico-system -it -- /calicoctl get ipPool -o wide


oc exec calicoctl -n calico-system -it -- /calicoctl get workloadEndpoint

oc exec calicoctl -n calico-system -it -- /calicoctl get BGPPeer

cat << EOF > calico.bgp.yaml
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-global-peer
spec:
  peerIP: 192.168.7.11
  asNumber: 64513
EOF
cat calico.bgp.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

# on helper
# https://www.vultr.com/docs/configuring-bgp-using-quagga-on-vultr-centos-7
yum install quagga
systemctl start zebra
systemctl start bgpd
cp /usr/share/doc/quagga-*/bgpd.conf.sample /etc/quagga/bgpd.conf
vtysh
show running-config
configure terminal
no router bgp 7675
router bgp 64513
no auto-summary
no synchronization
neighbor 192.168.7.13 remote-as 64512
neighbor 192.168.7.13 description "calico"
neighbor 192.168.7.13 attribute-unchanged next-hop
neighbor 192.168.7.13 ebgp-multihop 255
neighbor 192.168.7.13 next-hop-self
# no neighbor 192.168.7.13 next-hop-self
neighbor 192.168.7.13 activate
interface eth0
exit
exit
write
show running-config
show ip bgp summary
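
# 退出 vtysh 回到 helper 的 shell 之后,也可以粗略验证一下(示意):
# 通过 bgpd/zebra 学到并写入内核的路由,会以 proto zebra 的形式出现
ip route | grep zebra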

# 测试一下
cat << EOF > calico.ip.pool.yaml
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ip-pool-1
spec:
  cidr: 172.110.110.0/24
  ipipMode: Always
  natOutgoing: false
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ip-pool-2
spec:
  cidr: 172.110.220.0/24
  ipipMode: Always
  natOutgoing: false
EOF
cat calico.ip.pool.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

oc exec calicoctl -n calico-system -it -- /calicoctl get ipPool -o wide

cat << EOF > calico.pod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod2
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod3
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-2"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod4
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-1"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod5
  namespace: demo
  annotations:
    cni.projectcalico.org/ipv4pools: '["ip-pool-2"]'
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
EOF
oc apply -f calico.pod.yaml

run calico/node with --backend=none

CALICO_NETWORKING_BACKEND none

https://docs.projectcalico.org/reference/node/configuration
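
下面是一个查看/设置这个环境变量的思路示意。注意 calico 是由 tigera operator 管理的,直接改 daemonset 很可能会被 operator 还原,这里仅作演示:

oc get ds calico-node -n calico-system -o yaml | grep -B2 -A2 CALICO_NETWORKING_BACKEND
# oc set env ds/calico-node -n calico-system CALICO_NETWORKING_BACKEND=none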

backups


skopeo copy docker://quay.io/tigera/operator-init:v1.3.3 docker://registry.redhat.ren:5443/tigera/operator-init:v1.3.3
skopeo copy docker://quay.io/tigera/operator:v1.3.3 docker://registry.redhat.ren:5443/tigera/operator:v1.3.3

skopeo copy docker://docker.io/calico/ctl:v3.13.2 docker://registry.redhat.ren:5443/calico/ctl:v3.13.2
skopeo copy docker://docker.io/calico/kube-controllers:v3.13.2 docker://registry.redhat.ren:5443/calico/kube-controllers:v3.13.2
skopeo copy docker://docker.io/calico/node:v3.13.2 docker://registry.redhat.ren:5443/calico/node:v3.13.2
skopeo copy docker://docker.io/calico/typha:v3.13.2 docker://registry.redhat.ren:5443/calico/typha:v3.13.2
skopeo copy docker://docker.io/calico/pod2daemon-flexvol:v3.13.2 docker://registry.redhat.ren:5443/calico/pod2daemon-flexvol:v3.13.2
skopeo copy docker://docker.io/calico/cni:v3.13.2 docker://registry.redhat.ren:5443/calico/cni:v3.13.2

curl https://docs.projectcalico.org/manifests/ocp/crds/01-crd-installation.yaml -o manifests/01-crd-installation.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/01-crd-tigerastatus.yaml -o manifests/01-crd-tigerastatus.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-bgpconfiguration.yaml -o manifests/02-crd-bgpconfiguration.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-bgppeer.yaml -o manifests/02-crd-bgppeer.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-blockaffinity.yaml -o manifests/02-crd-blockaffinity.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-clusterinformation.yaml -o manifests/02-crd-clusterinformation.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-felixconfiguration.yaml -o manifests/02-crd-felixconfiguration.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-globalnetworkpolicy.yaml -o manifests/02-crd-globalnetworkpolicy.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-globalnetworkset.yaml -o manifests/02-crd-globalnetworkset.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-hostendpoint.yaml -o manifests/02-crd-hostendpoint.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-ipamblock.yaml -o manifests/02-crd-ipamblock.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-ipamconfig.yaml -o manifests/02-crd-ipamconfig.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-ipamhandle.yaml -o manifests/02-crd-ipamhandle.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-ippool.yaml -o manifests/02-crd-ippool.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-networkpolicy.yaml -o manifests/02-crd-networkpolicy.yaml
curl https://docs.projectcalico.org/manifests/ocp/crds/calico/kdd/02-crd-networkset.yaml -o manifests/02-crd-networkset.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/00-namespace-tigera-operator.yaml -o manifests/00-namespace-tigera-operator.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-rolebinding-tigera-operator.yaml -o manifests/02-rolebinding-tigera-operator.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-role-tigera-operator.yaml -o manifests/02-role-tigera-operator.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-serviceaccount-tigera-operator.yaml -o manifests/02-serviceaccount-tigera-operator.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-configmap-calico-resources.yaml -o manifests/02-configmap-calico-resources.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-configmap-tigera-install-script.yaml -o manifests/02-configmap-tigera-install-script.yaml
curl https://docs.projectcalico.org/manifests/ocp/tigera-operator/02-tigera-operator.yaml -o manifests/02-tigera-operator.yaml
curl https://docs.projectcalico.org/manifests/ocp/01-cr-installation.yaml -o manifests/01-cr-installation.yaml

curl https://docs.projectcalico.org/manifests/calicoctl.yaml -o manifests/calicoctl.yaml

oc get Network.operator.openshift.io -o yaml
  # defaultNetwork:
  #   calicoSDNConfig:
  #     mtu: 700
  #   openshiftSDNConfig:
  #     mtu: 700
oc api-resources | grep -i calico
oc api-resources | grep -i tigera

oc get FelixConfiguration -o yaml

oc exec calicoctl -n calico-system -it -- /calicoctl get bgpconfig default

cat << EOF > calico.serviceip.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.96.0.0/16
  - cidr: fd00:192:168:7:1:1::/112
EOF
cat calico.serviceip.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

oc exec calicoctl -n calico-system -it -- /calicoctl get workloadEndpoint
oc exec calicoctl -n calico-system -it -- /calicoctl get BGPPeer

cat << EOF > calico.bgp.yaml
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-global-peer
spec:
  peerIP: 192.168.7.11
  asNumber: 64513
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-global-peer-v6
spec:
  peerIP: fd00:192:168:7::11
  asNumber: 64513
EOF
cat calico.bgp.yaml | oc exec calicoctl -n calico-system -i -- /calicoctl apply -f -

# on helper
# https://www.vultr.com/docs/configuring-bgp-using-quagga-on-vultr-centos-7
yum install quagga
systemctl start zebra
systemctl start bgpd
cp /usr/share/doc/quagga-*/bgpd.conf.sample /etc/quagga/bgpd.conf
vtysh
show running-config
configure terminal
no router bgp 7675
router bgp 64513
no auto-summary
no synchronization
neighbor 192.168.7.13 remote-as 64512
neighbor 192.168.7.13 description "calico"
neighbor fd00:192:168:7::13 remote-as 64512
neighbor fd00:192:168:7::13 description "calico"
interface eth0
?? no ipv6 nd suppress-ra
exit
exit
write
show running-config
show ip bgp summary

# https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/networking/cluster-network-operator

oc get Network.operator.openshift.io -o yaml
oc edit Network.operator.openshift.io cluster
  # - cidr: fd01:192:168:7:11::/64
  #   hostPrefix: 80

oc get network.config/cluster
oc edit network.config/cluster

oc get installations.operator.tigera.io -o yaml
oc edit installations.operator.tigera.io
    # nodeAddressAutodetectionV6:
    #   firstFound: true
    - blockSize: 122
      cidr: fd01:192:168:7:11::/80
      encapsulation: None
      natOutgoing: Disabled
      nodeSelector: all()

openshift4 集群升级

4.2的集群升级很简单,更新一下镜像仓库,然后运行一个命令,等着就好了。

# on base host
cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/4.2.7/registry
    delete:
        enabled: true
http:
    addr: :443
    tls:
       certificate: /etc/crts/redhat.ren.crt
       key: /etc/crts/redhat.ren.key
EOF

systemctl restart docker-distribution

# on helper node
# oc patch OperatorHub cluster --type json  -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc patch OperatorHub cluster --type json  -p '[{"op": "remove", "path": "/spec/disableAllDefaultSources"}]'

oc patch configs.samples.operator.openshift.io cluster -p '{"spec":{"managementState":"Removed"}}'  --type=merge

oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings=true --force=true --to-image=registry.redhat.ren/ocp4/openshift4:4.2.7 
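
# 升级命令下发之后,可以用类似下面的命令观察进度(示意)
oc get clusterversion
oc get co
# watch -n 10 "oc get clusterversion; oc get co"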

openshift4 缩小 / & sysroot 分区大小

openshift4默认安装的时候,会把sda/vda整个硬盘占满,如果我们是baremetal安装,一般会配置SSD/NVME, 1T大小,这样非常浪费。我们完全可以把硬盘空间节省下来,分一些分区,给local storage operator用。

视频讲解:

# backup the ignition file you want
/bin/cp -f /var/www/html/ignition/worker-1.ign /var/www/html/ignition/worker-1.ign.bak

# 修改 /data/ocp4/partition.sh ,
# 主要是修改里面的root分区大小,默认是200G
# 然后是想要创建的数据分区的个数和大小参数,
# 默认会创建5个10G分区,5个5G分区。
bash /data/ocp4/partition.sh

butane /data/ocp4/root-partition.bu -r -o /data/install/partition-ric.ign

/bin/cp -f /var/www/html/ignition/worker-1.ign.bak /var/www/html/ignition/worker-1.ign

# merge the 2 ignition files
jq -s '.[0] * .[1]' /var/www/html/ignition/worker-1.ign /data/install/partition-ric.ign | jq -c . > /var/www/html/ignition/worker-1.ign.new

/bin/cp -f /var/www/html/ignition/worker-1.ign.new /var/www/html/ignition/worker-1.ign

# then install using iso

# login to worker-1
lsblk
# NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sr0      11:0    1 1024M  0 rom
# vda     252:0    0    1T  0 disk
# ├─vda1  252:1    0    1M  0 part
# ├─vda2  252:2    0  127M  0 part
# ├─vda3  252:3    0  384M  0 part /boot
# ├─vda4  252:4    0  200G  0 part /sysroot
# ├─vda5  252:5    0   10G  0 part
# ├─vda6  252:6    0   10G  0 part
# ├─vda7  252:7    0   10G  0 part
# ├─vda8  252:8    0   10G  0 part
# ├─vda9  252:9    0   10G  0 part
# ├─vda10 252:10   0    5G  0 part
# ├─vda11 252:11   0    5G  0 part
# ├─vda12 252:12   0    5G  0 part
# ├─vda13 252:13   0    5G  0 part
# └─vda14 252:14   0    5G  0 part /var/lib/kubelet/pods/a364c83a-deae-4431-b7c3-bcef8457aed6/volumes/kubernetes.io~local-volume/local-pv-9fa7f

# let's check what we created
cat /data/ocp4/root-partition.bu
# variant: openshift
# version: 4.8.0
# metadata:
#   name: root-storage
#   labels:
#     machineconfiguration.openshift.io/role: worker
# storage:
#   disks:
#     - device: /dev/vda
#       wipe_table: false
#       partitions:
#         - number: 4
#           label: root
#           size_mib: 204800
#           resize: true
#         - label: data_10G_1
#           size_mib: 10240
#         - label: data_10G_2
#           size_mib: 10240
#         - label: data_10G_3
#           size_mib: 10240
#         - label: data_10G_4
#           size_mib: 10240
#         - label: data_10G_5
#           size_mib: 10240
#         - label: data_5G_1
#           size_mib: 5120
#         - label: data_5G_2
#           size_mib: 5120
#         - label: data_5G_3
#           size_mib: 5120
#         - label: data_5G_4
#           size_mib: 5120
#         - label: data_5G_5
#           size_mib: 5120

cat /data/install/partition-ric.ign | jq .
# {
#   "ignition": {
#     "version": "3.2.0"
#   },
#   "storage": {
#     "disks": [
#       {
#         "device": "/dev/vda",
#         "partitions": [
#           {
#             "label": "root",
#             "number": 4,
#             "resize": true,
#             "sizeMiB": 204800
#           },
#           {
#             "label": "data_10G_1",
#             "sizeMiB": 10240
#           },
#           {
#             "label": "data_10G_2",
#             "sizeMiB": 10240
#           },
#           {
#             "label": "data_10G_3",
#             "sizeMiB": 10240
#           },
#           {
#             "label": "data_10G_4",
#             "sizeMiB": 10240
#           },
#           {
#             "label": "data_10G_5",
#             "sizeMiB": 10240
#           },
#           {
#             "label": "data_5G_1",
#             "sizeMiB": 5120
#           },
#           {
#             "label": "data_5G_2",
#             "sizeMiB": 5120
#           },
#           {
#             "label": "data_5G_3",
#             "sizeMiB": 5120
#           },
#           {
#             "label": "data_5G_4",
#             "sizeMiB": 5120
#           },
#           {
#             "label": "data_5G_5",
#             "sizeMiB": 5120
#           }
#         ],
#         "wipeTable": false
#       }
#     ]
#   }
# }

local storage operator

我们有了很多分区,那么赶快来测试一下如何把他们变成 PV 吧

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1
  storageClassDevices:
    - storageClassName: "local-sc" 
      volumeMode: Filesystem 
      fsType: xfs 
      devicePaths: 
        - /dev/vda5
        - /dev/vda14
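
把上面的 LocalVolume 内容保存成文件并创建(下面的文件名只是示意),然后就可以检查 operator 自动创建出来的 storage class 和 PV:

oc create -f /data/install/local-volume.yaml
oc get sc
oc get pv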

我们可以看到配置已经生效,系统已经帮我们创建好了PV。

我们创建pod,创建和使用pvc,然后弄点数据,然后删掉pod,删掉pvc。然后重新创建pod,创建和使用pvc,看看里面的数据是否会清空。

cat << EOF > /data/install/pvc-demo.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem 
  resources:
    requests:
      storage: 2Gi 
  storageClassName: local-sc 
---
kind: Pod
apiVersion: v1
metadata:
  annotations:
  name: demo1
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-1'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        quay.io/wangzheng422/qimgs:centos7-test
      env:
        - name: key
          value: value
      command: 
        - sleep
        - infinity
      imagePullPolicy: Always
      volumeMounts:
        - mountPath: /data
          name: demo 
          readOnly: false
  volumes:
    - name: demo 
      persistentVolumeClaim:
        claimName: local-pvc-demo 
EOF
oc create -n default -f /data/install/pvc-demo.yaml

我们能看到 PVC 已经创建,PV 也已经挂载。

oc rsh pod/demo1
df -h
# Filesystem      Size  Used Avail Use% Mounted on
# overlay         200G  8.4G  192G   5% /
# tmpfs            64M     0   64M   0% /dev
# tmpfs            24G     0   24G   0% /sys/fs/cgroup
# shm              64M     0   64M   0% /dev/shm
# tmpfs            24G   64M   24G   1% /etc/hostname
# /dev/vda14      5.0G   68M  5.0G   2% /data
# /dev/vda4       200G  8.4G  192G   5% /etc/hosts
# tmpfs            24G   20K   24G   1% /run/secrets/kubernetes.io/serviceaccount
# tmpfs            24G     0   24G   0% /proc/acpi
# tmpfs            24G     0   24G   0% /proc/scsi
# tmpfs            24G     0   24G   0% /sys/firmware

echo wzh > /data/1
cat /data/1
# wzh

# destroy the pvc and pod
oc delete -n default -f /data/install/pvc-demo.yaml

# recreate 
oc create -n default -f /data/install/pvc-demo.yaml

PVC重新创建了,PV也重新挂载了。

我们发现,PV release以后,重新挂载,之前的存储内容,就都没有了。

oc rsh pod/demo1
sh-4.2# cd /data
sh-4.2# ls
sh-4.2# ls -hl
total 0

openshift4 离线升级服务 / disconnected update service

openshift4默认的集群管理界面,会向公网的升级服务请求升级信息,如果是离线安装,这个升级信息是拿不到的,于是集群的管理界面就会一堆报错,很难看。现在openshift4有一个update service operator,它可以在集群内部创建一个离线的update service,提供升级信息,这样集群的管理界面就不会那么难看啦。

本次实验的部署架构:

视频讲解:

based on:

  • https://www.openshift.com/blog/openshift-update-service-update-manager-for-your-cluster
  • https://docs.openshift.com/container-platform/4.8/updating/installing-update-service.html

离线安装以后,不配置的话,系统管理页面是这个鬼样子:

# search OpenShift Update Service in operator hub, and install

# build a update container
mkdir -p /data/update
cd /data/update
cat << EOF > /data/update/Dockerfile
FROM registry.access.redhat.com/ubi8

RUN curl -L -o cincinnati-graph-data.tar.gz https://github.com/openshift/cincinnati-graph-data/archive/master.tar.gz

CMD exec /bin/bash -c "tar xvzf cincinnati-graph-data.tar.gz -C /var/lib/cincinnati/graph-data/ --strip-components=1"
EOF

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

buildah bud -f ./Dockerfile -t quay.io/wangzheng422/graph-data-image:$var_date
podman push quay.io/wangzheng422/graph-data-image:$var_date

echo quay.io/wangzheng422/graph-data-image:$var_date
# quay.io/wangzheng422/graph-data-image:2021-09-07-0709
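
# 下面是把 graph-data 镜像搬运到内网仓库的示意命令:
# 仓库地址 nexus.ocp4.redhat.ren:8083 只是为了和下面 update.yaml 里的地址保持一致的假设,请按实际环境调整
# skopeo copy docker://quay.io/wangzheng422/graph-data-image:$var_date docker://nexus.ocp4.redhat.ren:8083/wangzheng422/graph-data-image:$var_date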

cat << EOF > /data/install/update.yaml
apiVersion: updateservice.operator.openshift.io/v1
kind: UpdateService
metadata:
  namespace: openshift-update-service
  name: sample
spec:
  graphDataImage: 'nexus.ocp4.redhat.ren:8083/wangzheng422/graph-data-image:2021-09-07-0709'
  releases: 'registry.ocp4.redhat.ren:5443/ocp4/release'
  replicas: 1
EOF
oc create -f /data/install/update.yaml

# to restore
oc delete -f /data/install/update.yaml

# 部署完了update service 以后,发现报错
# 发现update service operator依赖有password的registry
# 我们之前默认安装的registry是没有密码的,就不行
# 所以重新部署一个需要密码认证的registry就可以了。

oc get secret/pull-secret -n openshift-config -o json | jq '.data.".dockerconfigjson"' | jq -r . | base64 -d | jq .
# {
#   "auths": {
#     "registry.ocp4.redhat.ren:5443": {
#       "username": "admin",
#       "password": "redhat",
#       "auth": "YWRtaW46cmVkaGF0",
#       "email": "admin@redhat.ren"
#     }
#   }
# }

oc delete cm ca.for.registry -n openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/etc/crts/redhat.ren.ca.crt \
    --from-file=updateservice-registry=/etc/crts/redhat.ren.ca.crt

oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

# oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

# our router's https certs is self-sign, 
# update service will report error on this certs
# so we create a http route, to avoid this error
cat << EOF > /data/install/update-wzh-route.yaml
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: update-wzh
  namespace: openshift-update-service
  labels:
    app: sample-policy-engine
spec:
  to:
    kind: Service
    name: sample-policy-engine
    weight: 100
  port:
    targetPort: policy-engine
EOF
oc create -f /data/install/update-wzh-route.yaml

oc patch clusterversion version --type='json' -p='[{"op": "replace", "path": "/spec/upstream", "value": "http://update-wzh-openshift-update-service.apps.ocp4.redhat.ren/api/upgrades_info/v1/graph"}]'

oc get clusterversion version -o yaml | more

可以在operator的图形界面中,配置离线的update service参数

离线update service配置好了以后,看上去就非常舒适了。

windows node in openshift 4.8

在本文中,我们将安装一个win10节点,并加入到openshift 4.8集群中去。之后会部署一个演示应用。

经过测试,我们发现,当前的win10当作worker节点,还是不太适合,原因如下:

  • windows要求容器的基础镜像版本,和宿主机的版本严格一致,这样就不能像rhel一样,在rhel8上运行rhel7的容器,在部署的时候会造成很大困惑。
  • windows的容器,不能运行GUI app。虽然也有很多.net的web服务应用,但是更多的老旧windows应用,应该还是包含GUI的程序。这样大大的限制了windows容器的应用访问。
  • docker for windows版本,只能设置proxy,不能为第三方镜像仓库设置mirror,这样对于离线部署,就很难受了。
  • 目前版本,对静态IP部署还不友好,需要手动配置windows网卡。
  • 目前版本的稳定性还有待加强,会出现k8s的服务崩溃现象,只能做开发测试,体验用,当然如果我们用windows server来做,稳定性会好很多。

本次部署的架构图:

视频讲解:

安装 win10

安装win10,需要注意选择正确的版本,因为win10的docker镜像版本,要求和宿主机一致。 在这里查看 win10 docker image version.

在本文撰写的时候,版本是win10 20H2,在这里找到并下载这个版本的ISO.

选择好版本,我们就要开始安装了。

# 先要准备一下 virtio 的驱动,因为 win10 里面没有, 安装的时候找不到硬盘。
podman pull registry.redhat.io/container-native-virtualization/virtio-win
podman run --rm -it --name swap registry.redhat.io/container-native-virtualization/virtio-win bash
podman create --name swap registry.redhat.io/container-native-virtualization/virtio-win ls
podman cp swap:/disk/virtio-win.iso - > virtio-win.iso.tar
gzip virtio-win.iso.tar
podman rm swap

# 直接创建kvm, 自动开始安装啦。
export KVM_DIRECTORY=/data/kvm
virt-install --name=ocp4-windows --vcpus=6,cores=6 --ram=12288 \
--cpu=host-model \
--disk path=/data/nvme/ocp4-windows.qcow2,bus=virtio,size=100 \
--os-variant win10 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59017 \
--boot menu=on \
--cdrom ${KVM_DIRECTORY}/win10.iso \
--disk ${KVM_DIRECTORY}/virtio-win.iso,device=cdrom

win10的话,必须选择专业版。

选择自定义安装,因为我们要加载硬盘驱动

选择加载驱动程序

选择正确的驱动程序位置

选择驱动,下一步

默认安装整个硬盘

安装就自动进行

安装完成后,进入系统,把剩下的驱动,一口气都装了。

系统识别出了网卡,那就设置IP地址吧

我们需要装ssh服务端,从 设置-应用 中找

点击可选功能

点击添加功能

搜索ssh服务器,并安装

安装完了ssh是这样样子的

我们还需要打开防火墙端口,从网络配置进入

选择高级设置

新建入站规则

根据文档要求,打开 22, 10250 端口

允许连接

所有网络位置都允许

给起个名字

ssh服务默认不是自动启动的,我们把它设置成自动启动

选择自动

从外面,就能ssh到windows了

我把实验用的win10,打包到了一个镜像里面,需要的可以下载使用。

用户名密码是: wzh / redhat

ssh wzh@worker-1
# Microsoft Windows [版本 10.0.19043.1237]
# (c) Microsoft Corporation。保留所有权利。

# wzh@DESKTOP-FUIF19L C:\Users\wzh>

设置 ssh key auth

我们需要设置ssh使用key的方式自动登录,那么要有几个特殊的步骤。

首先,是解除win10的powershell的限制

Set-ExecutionPolicy unrestricted

接下来准备2个文件

参考这个文章,写一个允许ssh自动key登录的脚本,我们在里面还加上了自动激活hyper-v, windows container的步骤。

# the script here also enable hyper-v and windows container
cat << 'EOF' > /data/install/win-ssh.ps1
$acl = Get-Acl C:\ProgramData\ssh\administrators_authorized_keys
$acl.SetAccessRuleProtection($true, $false)
$administratorsRule = New-Object system.security.accesscontrol.filesystemaccessrule("Administrators","FullControl","Allow")
$systemRule = New-Object system.security.accesscontrol.filesystemaccessrule("SYSTEM","FullControl","Allow")
$acl.SetAccessRule($administratorsRule)
$acl.SetAccessRule($systemRule)
$acl | Set-Acl

Enable-WindowsOptionalFeature -Online -FeatureName $("Microsoft-Hyper-V", "Containers") -All
EOF

# 把脚本, key, 还有安装文件,复制到win10上 
scp /data/install/win-ssh.ps1 wzh@worker-1:c:\\win-ssh.ps1

scp /root/.ssh/id_rsa.pub wzh@worker-1:C:\\ProgramData\\ssh\\administrators_authorized_keys

scp /data/down/Docker\ Desktop\ Installer.exe wzh@worker-1:c:\\docker-install.exe

scp /data/down/wsl_update_x64.msi wzh@worker-1:c:\\wsl_update_x64.msi

用管理员权限,打开power shell

运行我们的脚本

重启win10, 然后你就可以用key自动登录啦。

安装docker,并切换到windows container。

第一次启动docker,会说什么wsl2 linux kernel要更新,可以用我提供的文件,直接更新,也可以直接切换windows container,不用理会那个报警。

设置 docker for windows,使用 process 来隔离。因为kvm上的某种未知配置问题,默认的hyper-v隔离方式启动不了容器,我们换成process来隔离。

{
  "registry-mirrors": [],
  "insecure-registries": [],
  "debug": true,
  "experimental": false,
  "exec-opts": [
    "isolation=process"
  ]
}

配置界面长这样

记得改一下windows的主机名

backup win10 kvm

我们备份一下win10 kvm,并上传quay.io,方便以后重新做实验。

我们可以参考这里,来备份和恢复kvm。

# poweroff your win10 vm

mkdir -p /data/nvme/bak

cd /data/nvme

virsh dumpxml ocp4-windows > /data/nvme/bak/ocp4-windows.xml
pigz -c ocp4-windows.qcow2 > /data/nvme/bak/ocp4-windows.qcow2.gz

cd /data/nvme/bak

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

buildah from --name onbuild-container scratch
buildah copy onbuild-container ocp4-windows.xml  /
buildah copy onbuild-container ocp4-windows.qcow2.gz  /
buildah umount onbuild-container 
buildah commit --rm onbuild-container quay.io/wangzheng422/qimgs:win7-ssh-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
echo "quay.io/wangzheng422/qimgs:win7-ssh-$var_date"
buildah push quay.io/wangzheng422/qimgs:win7-ssh-$var_date

# so, we got a image contain win10, and feature enabled.
# this is for win10 version 10.0.19043.1237
# quay.io/wangzheng422/qimgs:win7-ssh-2021-09-30-1340

你可以使用上面的这个版本的镜像,拉取到本地,并从中取出win10虚拟机,然后自己尝试啦。

安装 ocp, 使用 ovn with hybrid mode

参考官方文档:

  • https://docs.openshift.com/container-platform/4.8/windows_containers/byoh-windows-instance.html
  • https://docs.openshift.com/container-platform/4.8/windows_containers/enabling-windows-container-workloads.html

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/16
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ppa.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  - registry.ocp4.redhat.ren:5443/ocp4/release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  - registry.ocp4.redhat.ren:5443/ocp4/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cat << EOF > /data/install/manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      hybridOverlayConfig:
        hybridClusterNetwork: 
        - cidr: 10.132.0.0/16
          hostPrefix: 23
        hybridOverlayVXLANPort: 9898 
EOF

安装 windows machine config operator

# 导入ssh key
oc create secret generic cloud-private-key --from-file=private-key.pem=/root/.ssh/id_rsa \
    -n openshift-windows-machine-config-operator

# 配置win10自动登录用户名和ip地址
cat << EOF > /data/install/win-node.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  192.168.7.17: |- 
    username=wzh
EOF
oc create -f /data/install/win-node.yaml

# to restore
oc delete -f /data/install/win-node.yaml

# csr is automatically approved
oc get csr
# NAME                                       AGE   SIGNERNAME                                    REQUESTOR                                                                         CONDITION
# csr-ff7q5                                  63m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
# csr-gzlpq                                  53s   kubernetes.io/kubelet-serving                 system:node:worker-1                                                              Approved,Issued
# csr-rgdzv                                  59s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
# csr-zkw8c                                  63m   kubernetes.io/kubelet-serving                 system:node:master-0                                                              Approved,Issued
# system:openshift:openshift-authenticator   59m   kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-authentication-operator:authentication-operator   Approved,Issued

估计是当前实现的bug或者其他原因,windows默认网卡上的协议会被disable掉,造成windows node加入集群失败。目前暂时手动把这些协议都enable,只留一个不激活。当然,你也可以只enable ipv4的配置。

之后就等着好了,openshift会自动上传程序和配置,并配置好windows node,加入集群,成功以后,我们就能看到如下的日志。

{"level":"info","ts":1633004643.789956,"logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":1633004674.0080738,"logger":"wc 192.168.7.17","msg":"configuring"}
{"level":"info","ts":1633004675.3135288,"logger":"wc 192.168.7.17","msg":"transferring files"}
{"level":"info","ts":1633004693.670281,"logger":"wc 192.168.7.17","msg":"configured","service":"windows_exporter","args":"--collectors.enabled cpu,cs,logical_disk,net,os,service,system,textfile,container,memory,cpu_info\""}
{"level":"info","ts":1633004697.0266535,"logger":"controllers.CertificateSigningRequests","msg":"CSR approved","CSR":"csr-rgdzv"}
{"level":"info","ts":1633004703.104529,"logger":"controllers.CertificateSigningRequests","msg":"CSR approved","CSR":"csr-gzlpq"}
{"level":"info","ts":1633004726.9497287,"logger":"wc 192.168.7.17","msg":"configured kubelet","cmd":"C:\\k\\\\wmcb.exe initialize-kubelet --ignition-file C:\\Windows\\Temp\\worker.ign --kubelet-path C:\\k\\kubelet.exe --node-ip=192.168.7.17","output":"Bootstrapping completed successfully"}
{"level":"info","ts":1633004757.078427,"logger":"wc 192.168.7.17","msg":"configure","service":"hybrid-overlay-node","args":"--node worker-1 --hybrid-overlay-vxlan-port=9898 --k8s-kubeconfig c:\\k\\kubeconfig --windows-service --logfile C:\\var\\log\\hybrid-overlay\\hybrid-overlay.log\" depend= kubelet"}
{"level":"info","ts":1633004880.6788793,"logger":"wc 192.168.7.17","msg":"configured","service":"hybrid-overlay-node","args":"--node worker-1 --hybrid-overlay-vxlan-port=9898 --k8s-kubeconfig c:\\k\\kubeconfig --windows-service --logfile C:\\var\\log\\hybrid-overlay\\hybrid-overlay.log\" depend= kubelet"}
{"level":"info","ts":1633004928.5883121,"logger":"wc 192.168.7.17","msg":"configured kubelet for CNI","cmd":"C:\\k\\wmcb.exe configure-cni --cni-dir=\"C:\\k\\cni\\ --cni-config=\"C:\\k\\cni\\config\\cni.conf","output":"CNI configuration completed successfully"}
{"level":"info","ts":1633004941.3937094,"logger":"wc 192.168.7.17","msg":"configured","service":"kube-proxy","args":"--windows-service --v=4 --proxy-mode=kernelspace --feature-gates=WinOverlay=true --hostname-override=worker-1 --kubeconfig=c:\\k\\kubeconfig --cluster-cidr=10.132.0.0/24 --log-dir=C:\\var\\log\\kube-proxy\\ --logtostderr=false --network-name=OVNKubernetesHybridOverlayNetwork --source-vip=10.132.0.14 --enable-dsr=false --feature-gates=IPv6DualStack=false\" depend= hybrid-overlay-node"}
{"level":"info","ts":1633004956.4613981,"logger":"nc 192.168.7.17","msg":"instance has been configured as a worker node","version":"3.1.0+06e96071"}
{"level":"info","ts":1633004956.4949114,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1633004956.5283544,"logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":1633004956.5387952,"logger":"controllers.configmap","msg":"instance is up to date","node":"worker-1","version":"3.1.0+06e96071"}
{"level":"info","ts":1633004956.5493839,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}

我们能看到 windows节点了。

oc get node
# NAME       STATUS   ROLES           AGE     VERSION
# master-0   Ready    master,worker   19h     v1.21.1+a620f50
# worker-1   Ready    worker          4m50s   v1.21.1-1398+98073871f173ba

oc get node --show-labels
# NAME       STATUS   ROLES           AGE     VERSION                       LABELS
# master-0   Ready    master,worker   4h13m   v1.21.1+a620f50               beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
# worker-1   Ready    worker          5m25s   v1.21.1-1398+98073871f173ba   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=windows,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker-1,kubernetes.io/os=windows,node-role.kubernetes.io/worker=,node.kubernetes.io/windows-build=10.0.19042,node.openshift.io/os_id=Windows,windowsmachineconfig.openshift.io/byoh=true

# 看来windows节点不占用machine config pool
oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-607708e411d75c10e680d8bf5e24de6f   True      False      False      1              1                   1                     0                      19h
# worker   rendered-worker-cacf7f7f871c77ae92070b0a44fe0b91   True      False      False      0              0                   0                     0                      19h

探索一下装了什么

进入win10,可以看到C:\下面,有一个k目录,还有一个var目录,k目录下面就是配置和可执行程序啦。

wzh@WORKER-1 c:\>dir
 驱动器 C 中的卷没有标签。
 卷的序列号是 C607-13D4

 c:\ 的目录

2021/09/28  19:37       535,444,968 Docker Desktop Installer.exe
2021/09/29  11:12    <DIR>          k
2019/12/07  17:14    <DIR>          PerfLogs
2021/09/28  19:57    <DIR>          Program Files
2021/04/09  21:57    <DIR>          Program Files (x86)
2021/09/29  11:12    <DIR>          Temp
2021/09/28  08:25    <DIR>          Users
2021/09/29  11:11    <DIR>          var
2021/09/28  17:51               428 win-ssh.ps1
2021/09/28  16:34    <DIR>          Windows
               2 个文件    535,445,396 字节
               8 个目录 19,381,813,248 可用字节

wzh@WORKER-1 c:\>dir k
 驱动器 C 中的卷没有标签。
 卷的序列号是 C607-13D4

 c:\k 的目录

2021/09/29  11:12    <DIR>          .
2021/09/29  11:12    <DIR>          ..
2021/09/29  11:12            10,908 bootstrap-kubeconfig
2021/09/29  11:12    <DIR>          cni
2021/09/29  11:12    <DIR>          etc
2021/09/29  11:12        47,493,632 hybrid-overlay-node.exe
2021/09/29  11:12        47,809,536 kube-proxy.exe
2021/09/29  11:12            10,132 kubeconfig
2021/09/29  11:12             5,875 kubelet-ca.crt
2021/09/29  11:12               739 kubelet.conf
2021/09/29  11:12       117,698,048 kubelet.exe
2021/09/29  11:12    <DIR>          usr
2021/09/29  11:12        16,986,112 windows_exporter.exe
2021/09/29  11:12        16,331,776 wmcb.exe
               9 个文件    246,346,758 字节
               5 个目录 19,381,317,632 可用字节

wzh@WORKER-1 c:\>dir var\log
 驱动器 C 中的卷没有标签。
 卷的序列号是 C607-13D4

 c:\var\log 的目录

2021/09/29  11:12    <DIR>          .
2021/09/29  11:12    <DIR>          ..
2021/09/29  11:12    <DIR>          containers
2021/09/29  11:12    <DIR>          hybrid-overlay
2021/09/29  11:16    <DIR>          kube-proxy
2021/09/29  11:12    <DIR>          kubelet
2021/09/29  11:12    <DIR>          pods
               0 个文件              0 字节
               7 个目录 19,381,059,584 可用字节

wzh@WORKER-1 c:\>dir var\lib
 驱动器 C 中的卷没有标签。
 卷的序列号是 C607-13D4

 c:\var\lib 的目录

2021/09/28  20:36    <DIR>          .
2021/09/28  20:36    <DIR>          ..
2021/09/28  20:36    <DIR>          dockershim
2021/09/28  20:38    <DIR>          kubelet
               0 个文件              0 字节
               4 个目录 19,381,043,200 可用字节

删除windows节点

除了按官方文档所说修改config map之外,我们发现最好还是重启一下windows node。

改了config map,耐心等着,最后oc get node,就会看到windows node没有了。
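
一个简单的操作示意(假设还是用前面创建的 /data/install/win-node.yaml 来管理 windows-instances 这个 config map):

oc delete -f /data/install/win-node.yaml
# 或者 oc edit cm windows-instances -n openshift-windows-machine-config-operator 把对应ip的条目删掉
oc get node -w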

从operator的日志里面,可以看到如下的日志信息。

{"level":"info","ts":1632916600.248877,"logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":1632916610.646764,"logger":"wc 192.168.7.17","msg":"deconfiguring"}
{"level":"info","ts":1632916641.877409,"logger":"wc 192.168.7.17","msg":"deconfigured","service":"windows_exporter"}
{"level":"info","ts":1632916672.9587948,"logger":"wc 192.168.7.17","msg":"deconfigured","service":"kube-proxy"}
{"level":"info","ts":1632916703.9290483,"logger":"wc 192.168.7.17","msg":"deconfigured","service":"hybrid-overlay-node"}
{"level":"info","ts":1632916734.8715909,"logger":"wc 192.168.7.17","msg":"deconfigured","service":"kubelet"}
{"level":"info","ts":1632916734.8733184,"logger":"wc 192.168.7.17","msg":"removing directories"}
{"level":"info","ts":1632916735.4904935,"logger":"wc 192.168.7.17","msg":"removing HNS networks"}
{"level":"info","ts":1632916924.5720427,"logger":"nc 192.168.7.17","msg":"instance has been deconfigured","node":"worker-1"}
{"level":"info","ts":1632916924.6041753,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1632916924.6054258,"logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":1632916924.6281445,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}

resize qcow2 disk

https://computingforgeeks.com/how-to-extend-increase-kvm-virtual-machine-disk-size/

qemu-img info /data/nvme/ocp4-windows.qcow2
# image: /data/nvme/ocp4-windows.qcow2
# file format: qcow2
# virtual size: 50 GiB (53687091200 bytes)
# disk size: 43.3 GiB
# cluster_size: 65536
# Format specific information:
#     compat: 1.1
#     lazy refcounts: true
#     refcount bits: 16
#     corrupt: false

qemu-img resize /data/nvme/ocp4-windows.qcow2 +20G
# Image resized.
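
# 改完以后,可以再看一下镜像信息,确认 virtual size 已经变大(示意)
qemu-img info /data/nvme/ocp4-windows.qcow2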


windows workload

似乎现在的 docker for windows 并不支持给 mcr.microsoft.com 做镜像代理,只能配置一个proxy,这个太讨厌了,等以后迁移到 podman 或者 containerd 吧。所以我们现在基本上属于联网或者半联网的部署模式。

在这里查找windows镜像的版本

# pod pause的镜像
# mcr.microsoft.com/oss/kubernetes/pause:3.4.1

# 创建runtime class
cat << EOF > /data/install/win-runtime.yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: runtime-class-win10
handler: 'docker'
scheduling:
  nodeSelector: 
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.19042'
  tolerations: 
  - effect: NoSchedule
    key: os
    operator: Equal
    value: "Windows"
EOF
oc create -f /data/install/win-runtime.yaml

# https://hub.docker.com/_/microsoft-windows
# mcr.microsoft.com/windows:20H2
cat << 'EOF' > /data/install/win-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  selector:
    matchLabels:
      app: win-webserver
  replicas: 1
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      tolerations:
      - key: "os"
        value: "Windows"
        effect: "NoSchedule"
      containers:
      - name: windowswebserver
        image: mcr.microsoft.com/windows:20H2
        imagePullPolicy: IfNotPresent
        command:
        - powershell.exe
        - -command
        - $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Red Hat OpenShift + Windows Container Workloads</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
        securityContext:
          windowsOptions:
            runAsUserName: "ContainerAdministrator"
      nodeSelector:
        beta.kubernetes.io/os: windows
EOF
oc create -f /data/install/win-dep.yaml

# to restore
oc delete -f /data/install/win-dep.yaml

cat << EOF > /data/install/win-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
  labels:
    app: win-webserver
spec:
  ports:
    # the port that this service should serve on
  - port: 80
    targetPort: 80
  selector:
    app: win-webserver
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: win-webserver
spec:
  port:
    targetPort: 80
  to:
    kind: Service
    name: win-webserver
---
EOF
oc create -f /data/install/win-svc.yaml

# try windows server core, if you run on windows server
# otherwize, it will failed, say os not match with host: 
# "The container operating system does not match the host operating system."
# https://hub.docker.com/_/microsoft-windows-servercore
# mcr.microsoft.com/windows/servercore:20H2

cat << EOF > /data/install/test-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mypod
  labels:
    app: mypod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mypod
  template:
    metadata:
      labels:
        app: mypod
    spec:
      containers:
      - name: mypod
        image: quay.io/wangzheng422/qimgs:centos7-test
        command:
          - sleep
          - infinity
EOF
oc create -f /data/install/test-pod.yaml

oc get all
# NAME                                READY   STATUS    RESTARTS   AGE
# pod/mypod-6b8b7b46cb-rrfmd          1/1     Running   1          21h
# pod/win-webserver-9f98c76d4-8nb2q   1/1     Running   0          110s

# NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                            PORT(S)   AGE
# service/kubernetes      ClusterIP      172.30.0.1      <none>                                 443/TCP   26h
# service/openshift       ExternalName   <none>          kubernetes.default.svc.cluster.local   <none>    25h
# service/win-webserver   ClusterIP      172.30.240.75   <none>                                 80/TCP    21h

# NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/mypod           1/1     1            1           21h
# deployment.apps/win-webserver   1/1     1            1           110s

# NAME                                      DESIRED   CURRENT   READY   AGE
# replicaset.apps/mypod-6b8b7b46cb          1         1         1       21h
# replicaset.apps/win-webserver-9f98c76d4   1         1         1       110s

# NAME                                     HOST/PORT                                    PATH   SERVICES        PORT   TERMINATION   WILDCARD
# route.route.openshift.io/win-webserver   win-webserver-default.apps.ocp4.redhat.ren          win-webserver   80                   None

curl win-webserver-default.apps.ocp4.redhat.ren && echo
# <html><body><H1>Red Hat OpenShift + Windows Container Workloads</H1></body></html>

oc exec -it pod/win-webserver-9f98c76d4-8nb2q -- cmd

Microsoft Windows [Version 10.0.19042.1237]
(c) Microsoft Corporation. All rights reserved.

C:\>tasklist

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
System Idle Process              0                            0          8 K
System                           4                            0        148 K
smss.exe                      9992                            0      1,760 K
csrss.exe                     6788 Services                   3      4,524 K
wininit.exe                   7096 Services                   3      5,260 K
services.exe                  6456 Services                   3      6,668 K
lsass.exe                     3324 Services                   3     12,536 K
fontdrvhost.exe               5736 Services                   3      2,860 K
svchost.exe                   4948 Services                   3     12,896 K
svchost.exe                   6960 Services                   3      8,180 K
svchost.exe                   3332 Services                   3     16,952 K
svchost.exe                    756 Services                   3     53,864 K
svchost.exe                   5924 Services                   3      9,728 K
svchost.exe                   6412 Services                   3      8,012 K
svchost.exe                   5628 Services                   3      6,740 K
svchost.exe                   9488 Services                   3      4,688 K
svchost.exe                   8912 Services                   3     12,896 K
CExecSvc.exe                  5616 Services                   3      4,020 K
svchost.exe                   5916 Services                   3     28,600 K
svchost.exe                   2780 Services                   3      4,404 K
powershell.exe                2816 Services                   3     78,156 K
CompatTelRunner.exe           3056 Services                   3      2,852 K
svchost.exe                   9412 Services                   3     11,104 K
conhost.exe                   7748 Services                   3     10,824 K
svchost.exe                   3636 Services                   3      7,404 K
conhost.exe                   1288 Services                   3      3,800 K
cmd.exe                       5112 Services                   3      2,884 K
svchost.exe                   4492 Services                   3      8,900 K
MicrosoftEdgeUpdate.exe       8808 Services                   3      1,760 K
svchost.exe                   7612 Services                   3     10,112 K
conhost.exe                   4944 Services                   3      5,176 K
cmd.exe                       9848 Services                   3      5,140 K
MoUsoCoreWorker.exe           3016 Services                   3     17,220 K
WmiPrvSE.exe                  7924 Services                   3      9,340 K
WmiPrvSE.exe                  5976 Services                   3      9,384 K
spoolsv.exe                   6204 Services                   3      6,580 K
conhost.exe                   6184 Services                   3      5,208 K
cmd.exe                       5680 Services                   3      4,428 K
tasklist.exe                  8424 Services                   3      8,812 K

在win10上,我们能从docker界面上,看到有2个container启动了。

同样,在docker界面上,我们能看到他下载了2个镜像,并且正在使用中。

排错

如果发现有异常,首先要做的是,查看kubelet, kubeproxy, hybrid-overlay-node 这3个服务,是不是还在运行,当前的版本,似乎这几个服务,很容易崩溃。
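
一个简单的检查思路(示意,假设还能像前面一样用 ssh key 登录 worker-1):

ssh wzh@worker-1 "sc query kubelet"
ssh wzh@worker-1 "sc query kube-proxy"
ssh wzh@worker-1 "sc query hybrid-overlay-node"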

之后,就是看看默认网卡的ipv4配置,是否被禁用了,估计未来兼容性好了,就不用操心这个了。

# on windows cmd
netsh interface dump

openshift 4.9 静态IP 半离线 baremetal 安装,包含SNO(single node openshift)

安装过程视频

本文描述ocp4.9在baremetal(kvm模拟)上面,静态ip安装的方法。包括operator hub步骤。

架构图

离线安装包下载

ocp4的离线安装包下载和3.11不太一样,按照如下方式准备。另外,由于默认的baremetal是需要dhcp, pxe环境的,那么需要准备一个工具机,上面有dhcp, tftp, haproxy等工具,另外为了方便项目现场工作,还准备了ignition文件的修改工具,所以离线安装包需要一些其他第三方的工具。

https://github.com/wangzheng422/ocp4-upi-helpernode 这个工具,是创建工具机用的。

https://github.com/wangzheng422/filetranspiler 这个工具,是修改ignition文件用的。

打包好的安装包,在这里下载,百度盘下载链接,版本是 4.9.12 :

  • 4.9.12
    • 链接: https://pan.baidu.com/s/1Wj5MUBLMFli1kOit1eafug 提取码: ur8r

其中包括如下类型的文件:

  • ocp4.tgz 这个文件包含了iso等安装介质,以及各种安装脚本,全部下载的镜像列表等。需要复制到宿主机,以及工具机上去。
  • registry.tgz 这个文件也是docker image registry的仓库打包文件。需要先补充镜像的话,按照这里操作: 4.6.add.image.md

合并这些切分文件,使用类似如下的命令

cat registry.?? > registry.tgz

在外网云主机上面准备离线安装源

准备离线安装介质的文档,已经转移到了这里:4.9.build.dist.md

宿主机准备

本次实验,是在一个32C, 256G 的主机上面,用很多个虚拟机安装测试。所以先准备这个宿主机。

如果是多台宿主机,记得一定要调整时间配置,让这些宿主机的时间基本一致,否则证书会出问题。

主要的准备工作有

  • 配置yum源
  • 配置dns
  • 安装镜像仓库
  • 配置vnc环境
  • 配置kvm需要的网络
  • 创建helper kvm
  • 配置一个haproxy,从外部导入流量给kvm

以上准备工作,dns部分需要根据实际项目环境有所调整。

本次的宿主机是两台rocky linux

kvm host 101


# 因为是半离线,我们的host os还有helper os是联线的,那么我们就用在线的源吧。
# dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
dnf install -y epel-release

dnf install -y byobu htop dstat

# 准备vnc环境
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
# desktop=sandbox
geometry=1280x800
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

# systemctl disable vncserver@:1
systemctl start vncserver@:1
# 如果你想停掉vnc server,这么做
systemctl stop vncserver@:1

/usr/libexec/vncsession-start :1


# 配置kvm环境
dnf -y groupinstall "Server with GUI"

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld
systemctl enable --now libvirtd


# 创建实验用虚拟网络

mkdir -p /data/kvm
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.104/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh
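# 如果想确认网桥是否创建成功,可以看一下连接和 IP(示例命令,输出以实际环境为准)
nmcli con show
ip addr show baremetal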

# 创建工具机

mkdir -p /data/kvm
cd /data/kvm

osinfo-query os | grep rhel8
#  rhel8-unknown        | Red Hat Enterprise Linux 8 Unknown                 | 8-unknown | http://redhat.com/rhel/8-unknown
#  rhel8.0              | Red Hat Enterprise Linux 8.0                       | 8.0      | http://redhat.com/rhel/8.0
#  rhel8.1              | Red Hat Enterprise Linux 8.1                       | 8.1      | http://redhat.com/rhel/8.1
#  rhel8.2              | Red Hat Enterprise Linux 8.2                       | 8.2      | http://redhat.com/rhel/8.2
#  rhel8.3              | Red Hat Enterprise Linux 8.3                       | 8.3      | http://redhat.com/rhel/8.3
#  rhel8.4              | Red Hat Enterprise Linux 8.4                       | 8.4      | http://redhat.com/rhel/8.4

# lvremove -f rhel/data
lvcreate -y -l 100%FREE -n data nvme
mkfs.xfs /dev/nvme/data
mkdir -p /data/nvme
mount /dev/nvme/data /data/nvme

cat << EOF >> /etc/fstab
/dev/nvme/data /data/nvme                   xfs     defaults        0 0
EOF

cd /data/kvm
wget https://mirrors.sjtug.sjtu.edu.cn/rocky/8.4/isos/x86_64/Rocky-8.4-x86_64-minimal.iso

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.9/scripts/helper-ks-rocky.cfg

unset http_proxy
unset https_proxy

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=enp1s0 --gateway=192.168.7.1 --ip=192.168.7.71  --netmask=255.255.255.0 --nameserver=192.168.7.71  --ipv6=auto --activate/' helper-ks-rocky.cfg
# https://stackoverflow.com/questions/18620153/find-matching-text-and-replace-next-line
sed -i '/^network.*/{n;s/^network.*/network  --hostname=sno-helper/}' helper-ks-rocky.cfg

export KVM_DIRECTORY=/home/data/kvm
virt-install --name="sno-aHelper" --vcpus=2 --ram=4096 \
--cpu=host-model \
--disk path=${KVM_DIRECTORY}/sno-aHelper.qcow2,bus=virtio,size=20 \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59200 \
--boot menu=on \
--location ${KVM_DIRECTORY}/rhel-8.3-x86_64-dvd.iso \
--disk ${KVM_DIRECTORY}/rhel-8.3-x86_64-dvd.iso,device=cdrom \
--initrd-inject helper-ks-rocky.cfg --extra-args "inst.ks=file:/helper-ks-rocky.cfg" 

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

# start chrony/ntp server on host
# cat << EOF > /etc/chrony.conf
# driftfile /var/lib/chrony/drift
# makestep 1.0 3
# rtcsync
# allow 192.0.0.0/8
# local stratum 10
# logdir /var/log/chrony
# EOF

# echo "allow 192.0.0.0/8" >> /etc/chrony.conf
# systemctl enable --now chronyd
# # systemctl restart chronyd
# chronyc tracking
# chronyc sources -v
# chronyc sourcestats -v
# chronyc makestep

工具机准备

以下是在工具机里面,进行的安装操作。

主要的操作有

  • 配置yum源
  • 运行ansible脚本,自动配置工具机
  • 上传定制的安装配置文件
  • 生成ignition文件

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-ftp]
name=ftp
baseurl=ftp://${YUMIP}/
enabled=1
gpgcheck=0

EOF

# dnf install -y epel-release
# dnf install -y byobu
dnf update -y
reboot

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

echo "allow 192.0.0.0/8" >> /etc/chrony.conf
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# nmcli con mod enp1s0 +ipv4.addresses "192.168.7.71/24"
# nmcli con up enp1s0

dnf -y install ansible git unzip podman python3 buildah skopeo

mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp * root@172.21.6.11:/data/
cd /data
tar zvxf ocp.*.tgz
tar zvxf registry.*.tgz
cd /data/ocp4

rm -f /data/*.tgz

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data
# mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

systemctl disable --now firewalld

podman run --name local-registry -p 5443:5443 \
  -d --restart=always \
  -v /home/ocp.4.9.5/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_ADDR=0.0.0.0:5443 \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

podman start local-registry

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/building_running_and_managing_containers/assembly_porting-containers-to-systemd-using-podman_building-running-and-managing-containers

# podman generate systemd --new --files --name local-registry
podman generate systemd --files --name local-registry
# /root/container-local-registry.service
cp -Z container-local-registry.service  /usr/lib/systemd/system

systemctl enable --now container-local-registry.service
systemctl status container-local-registry.service
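# 简单验证一下 registry 是否工作,列出仓库 catalog(示例,在跑 registry 的这台工具机上本地执行,
# 用 -k 是因为证书的 SAN 里没有 localhost)
curl -s -k https://localhost:5443/v2/_catalog | jq .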

# podman rm --storage 7cb9fcea76ad384313a682a469be6784786eb5004a190ad2abe68978b1566416

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# 加载更多的镜像
# 解压缩 ocp4.tgz
# bash add.image.load.sh /data/4.6.5/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

# in helper node
# mkdir /etc/yum.repos.d.bak
# mv /etc/yum.repos.d/* /etc/yum.repos.d.bak/
# cat << EOF > /etc/yum.repos.d/remote.repo
# [remote]
# name=RHEL FTP
# baseurl=ftp://192.168.7.1/data
# enabled=1
# gpgcheck=0

# EOF

# yum clean all
# yum repolist

# 这里使用了一个ansible的项目,用来部署helper节点的服务。
# https://github.com/wangzheng422/ocp4-upi-helpernode
cd /data/ocp4
unzip ocp4-upi-helpernode.zip
# 这里使用了一个ignition文件合并的项目,用来帮助自定义ignition文件。
# https://github.com/wangzheng422/filetranspiler
# podman load -i filetranspiler.tgz

# podman load -i nexus.tgz

ssh-keygen

# 接下来,我们使用ansible来配置helper节点,装上各种openshift集群需要的服务
# 根据现场环境,修改 ocp4-upi-helpernode-master/vars-static.yaml
# 主要是修改各个节点的网卡和硬盘参数,还有IP地址
cd /data/ocp4/ocp4-upi-helpernode-master
# ansible-playbook -e @vars-static.yaml -e '{staticips: true}' tasks/main.yml

cat << 'EOF' > /data/ocp4/ocp4-upi-helpernode-master/vars.yaml
---
ocp_version: 4.9.5
ssh_gen_key: false
staticips: true
firewalld: false
dns_forward: yes
iso:
  iso_dl_url: "/data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso" # this is internal file, just leave as it.
helper:
  name: "helper"
  ipaddr: "192.168.7.71"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4s"
  forwarder1: "202.106.0.20"
  forwarder2: "202.106.0.20"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.72"
  interface: "enp1s0"
  install_drive: "vda"
  manual: false
masters:
  - name: "master-0"
    ipaddr: "192.168.7.73"
    interface: "enp103s0f1"
    install_drive: "sda"
    disable_interfaces:
      - interface: "enp3s0"
        ipaddr: "10.44.44.44"
      - interface: "enp4s0"
        ipaddr: "10.44.44.45"
      - interface: "enp103s0f0"
        ipaddr: "10.44.44.46"
    manual: false
  # - name: "master-1"
  #   ipaddr: "192.168.7.14"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
  # - name: "master-2"
  #   ipaddr: "192.168.7.15"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
# workers:
  # - name: "worker-0"
  #   ipaddr: "192.168.7.16"
  #   interface: "ens3f0"
  #   install_drive: "sda"
  # - name: "worker-1"
  #   ipaddr: "192.168.7.17"
  #   interface: "enp1s0"
  #   install_drive: "sda"
  # - name: "worker-2"
  #   ipaddr: "192.168.7.18"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-0"
  #   ipaddr: "192.168.7.19"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-1"
  #   ipaddr: "192.168.7.20"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-3"
  #   ipaddr: "192.168.7.21"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-4"
  #   ipaddr: "192.168.7.22"
  #   interface: "enp1s0"
  #   install_drive: "vda"
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
  - name: "yum"
    ipaddr: "192.168.7.1"
  - name: "quay"
    ipaddr: "192.168.7.11"
  - name: "nexus"
    ipaddr: "192.168.7.1"
  - name: "git"
    ipaddr: "192.168.7.11"
otherdomains:
  - domain: "rhv.redhat.ren"
    hosts:
    - name: "manager"
      ipaddr: "192.168.7.71"
    - name: "rhv01"
      ipaddr: "192.168.7.72"
  - domain: "others.redhat.ren"
    hosts:
    - name: "*"
      ipaddr: "192.168.7.71"
    - name: "*.apps"
      ipaddr: "192.168.7.71"
  - domain: "ocp4.redhat.ren"
    hosts:
      - name: "registry"
        ipaddr: "192.168.7.1"
      - name: "yum"
        ipaddr: "192.168.7.1"
      - name: "quay"
        ipaddr: "192.168.7.11"
      - name: "nexus"
        ipaddr: "192.168.7.1"
      - name: "git"
        ipaddr: "192.168.7.11"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/{{ ocp_version }}/openshift-client-linux-{{ ocp_version }}.tar.gz"
ocp_installer: "file:///data/ocp4/{{ ocp_version }}/openshift-install-linux-{{ ocp_version }}.tar.gz"
ocp_bios: "file:///data/ocp4/rhcos-metal.x86_64.raw.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.1"
      options: iburst
setup_registry: # don't worry about this, just leave it here
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
ocp_filetranspiler: "file:///data/ocp4/filetranspiler.tgz"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

ansible-playbook -e @vars.yaml tasks/main.yml

# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# # GOTO image registry host
# # copy crt files to helper node
# scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
# scp /etc/crts/redhat.ren.crt root@192.168.7.11:/data/install/
# scp /etc/crts/redhat.ren.key root@192.168.7.11:/data/install/

# GO back to help node
# /bin/cp -f /data/install/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
# update-ca-trust extract

# 定制ignition
cd /data/install

# 根据现场环境,修改 install-config.yaml
# 至少要修改ssh key, 还有 additionalTrustBundle,这个是镜像仓库的 CA 证书

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1
metadata:
  name: ocp4s
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ppa.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  - registry.ocp4.redhat.ren:5443/ocp4/release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ocp4.redhat.ren:5443/ocp4/openshift4
  - registry.ocp4.redhat.ren:5443/ocp4/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF
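# install-config.yaml 会在 create manifests / create ignition-configs 的时候被安装程序消耗掉,
# 如果以后还想重复使用或对比,可以先留一个备份(示例,备份文件名随意)
/bin/cp -f /data/install/install-config.yaml /data/install/install-config.yaml.bak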

cd /data/install/
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

openshift-install create manifests --dir=/data/install

# copy ntp related config
/bin/cp -f  /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/install/openshift/

# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

/bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml /data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml /data/install/openshift

cd /data/install/
openshift-install create ignition-configs --dir=/data/install
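# 点火文件生成后,可以确认一下产物,一般会有 bootstrap.ign / master.ign / worker.ign / metadata.json
# 以及 auth/kubeconfig, auth/kubeadmin-password
ls -l /data/install/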

cd /data/ocp4/ocp4-upi-helpernode-master
# 我们来为每个主机,复制自己版本的ign,并复制到web server的目录下
ansible-playbook -e @vars.yaml tasks/ign.yml
# 如果对每个主机有自己ign的独特需求,在这一步,去修改ign。

# 以下操作本来是想设置网卡地址,但是实践发现是不需要的。
# 保留在这里,是因为他可以在安装的时候注入文件,非常有用。
# mkdir -p bootstrap/etc/sysconfig/network-scripts/
# cat <<EOF > bootstrap/etc/sysconfig/network-scripts/ifcfg-ens3
# DEVICE=ens3
# BOOTPROTO=none
# ONBOOT=yes
# IPADDR=192.168.7.12
# NETMASK=255.255.255.0
# GATEWAY=192.168.7.1
# DNS=192.168.7.11
# DNS1=192.168.7.11
# DNS2=192.168.7.1
# DOMAIN=redhat.ren
# PREFIX=24
# DEFROUTE=yes
# IPV6INIT=no
# EOF
# filetranspiler -i bootstrap.ign -f bootstrap -o bootstrap-static.ign
# /bin/cp -f bootstrap-static.ign /var/www/html/ignition/

# 我们为每个节点创建各自的iso文件
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars.yaml tasks/iso.yml
# ansible-playbook -e @vars.yaml tasks/iso.small.yml

# if boot using live-iso, you need to run this cmd during install
nmtui
coreos-installer install --copy-network \
     --ignition-url=http://192.168.7.71:8080/ignition/master-0.ign --insecure-ignition  --image-url=http://192.168.7.71:8080/install/bios.raw.gz  --insecure /dev/sda 

回到宿主机

本来,到了这一步,就可以开始安装了,但是我们知道coreos装的时候,要手动输入很长的命令行,实际操作的时候,那是不可能输入对的,输入错一个字符,安装就失败,要重启,重新输入。。。

为了避免这种繁琐的操作,参考网上的做法,我们就需要为每个主机定制iso了。好在,之前的步骤,我们已经用ansible创建了需要的iso,我们把这些iso复制到宿主机上,就可以继续了。

这里面有一个坑,我们是不知道主机的网卡名称的,只能先用coreos iso安装启动一次,进入单用户模式以后,用 ip a 来查看一下,才能知道,一般来说,是ens3。

另外,如果是安装物理机,disk是哪个,也需要上述的方法,来看看具体的盘符。另外,推荐在物理机上安装rhel 8 来测试一下物理机是不是支持coreos。物理机安装的时候,遇到不写盘的问题,可以尝试添加启动参数: ignition.firstboot=1

# on kvm host 172.21.6.101
export KVM_DIRECTORY=/home/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@192.168.7.71:/data/install/{*boot*,*master-0,*worker-0}.iso ${KVM_DIRECTORY}/

# on kvm host 172.21.6.101

# finally, we can start install :)
# 你可以一口气把虚拟机都创建了,然后喝咖啡等着。
# 从这一步开始,到安装完毕,大概30分钟。
# export KVM_DIRECTORY=/data/kvm
virt-install --name=sno-bootstrap --vcpus=4 --ram=8192 \
--disk path=${KVM_DIRECTORY}/ocp4-bootstrap.qcow2,bus=virtio,size=30 \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59101 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-bootstrap.iso   

# 想登录进coreos一探究竟?那么这么做
# ssh core@bootstrap 
# journalctl -b -f -u bootkube.service
# export KVM_DIRECTORY=/data/kvm
virt-install --name=sno-master-0 --vcpus=16 --ram=49152 \
--cpu=host-model \
--disk path=${KVM_DIRECTORY}/ocp4-master-0.qcow2,bus=virtio,size=120 \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59002 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-0.iso 

# virt-install --name=ocp4-master-1 --vcpus=10 --ram=20480 \
# --cpu=host-model \
# --disk path=/data/nvme/ocp4-master-1.qcow2,bus=virtio,size=120 \
# --os-variant rhel8.4 --network bridge=baremetal,model=virtio \
# --graphics vnc,port=59003 \
# --boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-1.iso 

# # ssh core@192.168.7.13

# # on kvm host 172.21.6.103
# export KVM_DIRECTORY=/data/kvm

# virt-install --name=ocp4-master-2 --vcpus=22 --ram=30720 \
# --cpu=host-model \
# --disk path=/data/kvm/ocp4-master-2.qcow2,bus=virtio,size=120 \
# --os-variant rhel8.4 --network bridge=baremetal,model=virtio \
# --graphics vnc,port=59004 \
# --boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-2.iso 

# on kvm host 172.21.6.104
export KVM_DIRECTORY=/data/kvm

# virt-install --name=ocp4-worker-0 --vcpus=4 --ram=10240 \
# --disk path=/data/kvm/ocp4-worker-0.qcow2,bus=virtio,size=120 \
# --os-variant rhel8.4 --network bridge=baremetal,model=virtio \
# --graphics vnc,port=59005 \
# --boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso 

# if install on baremetal
nmtui
coreos-installer install --copy-network --ignition-url=http://192.168.7.71:8080/ignition/master-0.ign --insecure-ignition /dev/sda 

# on workstation
# open http://192.168.7.11:9000/
# to check

# if you want to stop or delete vm, try this
virsh list --all
virsh destroy ocp4-bootstrap
virsh destroy ocp4-master-0 
# virsh destroy ocp4-master-1 
# virsh destroy ocp4-master-2 
# virsh destroy ocp4-worker0 
# virsh destroy ocp4-worker1 
# virsh destroy ocp4-worker2
virsh undefine ocp4-bootstrap --remove-all-storage
virsh undefine ocp4-master-0 --remove-all-storage
# virsh undefine ocp4-master-1 --remove-all-storage
# virsh undefine ocp4-master-2 --remove-all-storage
# virsh undefine ocp4-worker0 
# virsh undefine ocp4-worker1 
# virsh undefine ocp4-worker2

在工具机上面

这个时候,安装已经自动开始了,我们只需要回到工具机上静静的观察就可以了。

在bootstrap和装master阶段,用这个命令看进度。

cd /data/install
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

cd /data/install
openshift-install wait-for bootstrap-complete --log-level debug

一切正常的话,会看到这个。

有时候证书会过期,验证方法是登录 bootstrap, 看看过期时间。如果确定过期,要清除所有的openshift-install生成配置文件的缓存,重新来过。

echo | openssl s_client -connect localhost:6443 | openssl x509 -noout -text | grep Not

一般来说,如果在openshift-install这一步之前,按照文档,删除了缓存文件,就不会出现过期的现象。

oc get nodes

这个时候,只能看到master,是因为worker的csr没有批准。如果虚拟机是一口气创建的,那么多半不会遇到下面的问题。

oc get csr

会发现有很多没有被批准的

批准之

yum -y install jq
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr -o name | xargs oc adm certificate approve
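# csr 往往会分批出现,如果不想一直手动批准,可以用一个小循环反复批准(示意,沿用上面的 jq 过滤,间隔和次数按需调整)
for i in {1..10}; do
  oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs -r oc adm certificate approve
  sleep 60
done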

然后worker 节点cpu飙高,之后就能看到worker了。

等一会,会看到这个,就对了。

上面的操作完成以后,就可以完成最后的安装了

cd /data/install
openshift-install wait-for install-complete --log-level debug
# here is the output
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4s.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "pg4cY-hBERh-GrAmI-Srku5"
# INFO Time elapsed: 0s

# we are testing env, so we don't need ingress replicas.
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 1}}' --type=merge ingresscontroller/default

# after install finish, delete bootstrap vm,
# and we need to fix the dns setting, 
# remove them from helper to master-0
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars.yaml tasks/sno.dns.yml


镜像仓库代理 / image registry proxy

准备离线镜像仓库非常麻烦,好在我们找到了一台在线的主机,那么我们可以使用nexus构造image registry proxy,在在线环境上面,做一遍PoC,然后就能通过image registry proxy得到离线镜像了

  • https://mtijhof.wordpress.com/2018/07/23/using-nexus-oss-as-a-proxy-cache-for-docker-images/
#############################################
## build nexus docker image
# 开启 https
# https://blog.csdn.net/s7799653/article/details/105378645
# https://help.sonatype.com/repomanager3/system-configuration/configuring-ssl#ConfiguringSSL-InboundSSL-ConfiguringtoServeContentviaHTTPS
mkdir -p /data/install/tmp
cd /data/install/tmp

# 将证书导出成pkcs格式
# 这里需要输入密码  用 password,
# openssl pkcs12 -export -out keystore.pkcs12 -inkey /etc/crts/redhat.ren.key -in /etc/crts/redhat.ren.crt

# cat << EOF >> Dockerfile
# FROM docker.io/sonatype/nexus3:3.29.0
# USER root
# COPY keystore.pkcs12 /keystore.pkcs12
# RUN keytool -v -importkeystore -srckeystore keystore.pkcs12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS -storepass password -srcstorepass password  &&\
#     cp keystore.jks /opt/sonatype/nexus/etc/ssl/
# USER nexus
# EOF
# buildah bud --format=docker -t quay.io/wangzheng422/qimgs:nexus3-3.29.0-wzh -f Dockerfile .
# buildah push quay.io/wangzheng422/qimgs:nexus3-3.29.0-wzh


#####################################################
# init build the nexus fs
# /bin/cp -f nexus-image.tgz /data/ccn/
# tar zxf nexus-image.tgz
# chown -R 200 /data/ccn/nexus-image

###################################################
## import nexus fs
mkdir -p /data/ccn
cd /data/ccn

podman create --name swap quay.io/wangzheng422/qimgs:nexus-fs-image-2021-05-08-1516 ls
podman cp swap:/nexus-image.tgz - > /data/ccn/nexus-image.tgz.tar
podman rm -fv swap
tar vxf nexus-image.tgz.tar
tar zvxf nexus-image.tgz
rm -f nexus-image.tgz*

chown -R 200 /data/ccn/nexus-image

#########################################
## run the nexus for image

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.30.1

podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.33.1

podman generate systemd --files --name nexus-image
# /root/container-nexus-image.service
cp -Z container-nexus-image.service  /usr/lib/systemd/system

systemctl enable --now container-nexus-image.service
systemctl status container-nexus-image.service

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z quay.io/wangzheng422/qimgs:nexus3-3.29.0-wzh

# 如果想停掉并删除 nexus-image 容器,这么做
podman stop nexus-image
podman rm nexus-image



# get the admin password
cat /data/ccn/nexus-image/admin.password && echo
# 84091bcd-c82f-44a3-8b7b-dfc90f5b7da1

# open http://nexus.ocp4.redhat.ren:8082

######################################################
# go to helper, update proxy setting for ocp cluster
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

# mkdir -p /etc/containers/registries.conf.d
# /bin/cp -f image.registries.conf /etc/containers/registries.conf.d/

cd /data/ocp4
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

######################################################
# dump the nexus image fs out
podman stop nexus-image

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus-image | pigz -c > nexus-image.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus-image.tgz  /
buildah umount onbuild-container 
buildah commit --rm onbuild-container quay.io/wangzheng422/qimgs:nexus-fs-image-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
buildah push quay.io/wangzheng422/qimgs:nexus-fs-image-$var_date
echo "quay.io/wangzheng422/qimgs:nexus-fs-image-$var_date"

# 以下这个版本,可以作为初始化的image proxy,里面包含了nfs provision,以及sample operator的metadata。很高兴地发现,image stream并不会完全下载镜像,好像只是下载metadata,真正用的时候,才去下载。
# quay.io/wangzheng422/qimgs:nexus-fs-image-2022-01-14-2155

配置镜像仓库的ca

安装过程里面,已经把镜像仓库的ca放进去了,但是好像image stream不认,让我们再试试

oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/etc/crts/redhat.ren.ca.crt 
#     --from-file=nexus.ocp4.redhat.ren..8083=/data/install/redhat.ren.ca.crt 

oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["nexus.ocp4.redhat.ren:8083"]}}}'  --type=merge

oc get image.config.openshift.io/cluster -o yaml

# openshift project下面的image stream重新加载一下吧
oc get is -o json | jq -r '.items[].metadata.name' | xargs -L1 oc import-image --all 

配置internal registry

我们的工具机是带nfs的,那么就给internal registry配置高档一些的nfs存储吧,不要用emptydir

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh

# oc edit configs.imageregistry.operator.openshift.io
# 修改 storage 部分
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

# if you want to restore
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# 把imagepruner给停掉
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

配置sample operator

openshift内置了一个sample operator,里面有一大堆红帽的产品。

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed", "samplesRegistry": "nexus.ocp4.redhat.ren:8083"}}' --type=merge

# if you want to restore
oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

# if you want to get rid of sample operator
oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

chrony/NTP 设置

从 ocp 4.6 开始,需要设定ntp同步,我们之前的ansible脚本,已经创建好了ntp的mco配置,把它打到系统里面就好了。

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/
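# machineconfig 打进去以后,节点会分批重启,可以用下面的命令观察滚动进度(示例)
oc get mcp
oc get node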

Operator Hub 离线安装

使用nexus作为image proxy以后,就不需要做这个离线操作了。有些情况下,我们可能还需要屏蔽掉默认的operator hub


oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml

oc get OperatorHub
# NAME      AGE
# cluster   20h

给 openshift project image stream 打补丁

在有代理的网络环境中,我们需要给openshift project下的image stream打一些补丁。

cd /data/ocp4
bash is.patch.sh registry.ocp4.redhat.ren:5443/ocp4/openshift4

disable official helm chart & enable helm proxy

我们是半离线环境,所以openshift4内置的官方helm chart是无法访问的,我们禁用之。

oc get HelmChartRepository
# NAME               AGE
# redhat-helm-repo   19h

# oc patch HelmChartRepository redhat-helm-repo --type json \
#     -p '[{"op": "add", "path": "/spec/disabled", "value": true}]'

oc patch  --patch='{"spec": {"disabled": true}}' --type=merge HelmChartRepository/openshift-helm-charts

cat << EOF > /data/install/helm.ocp.yaml
apiVersion: helm.openshift.io/v1beta1
kind: HelmChartRepository
metadata:
  name: openshift-helm-charts-wzh
spec:
 # optional name that might be used by console
  name: openshift-helm-charts-wzh
  connectionConfig:
    url: http://nexus.ocp4.redhat.ren:8082/repository/charts.openshift.io/
EOF
oc create -f /data/install/helm.ocp.yaml

给 router / ingress 更换证书

有时候,我们需要公网CA认证的证书,给router来用,那么我们就搞一下

https://docs.openshift.com/container-platform/4.6/security/certificates/replacing-default-ingress-certificate.html


mkdir -p /data/ccn/ingress-keys/etc
mkdir -p /data/ccn/ingress-keys/lib
cd /data/ccn/ingress-keys
podman run -it --rm --name certbot \
            -v "/data/ccn/ingress-keys/etc:/etc/letsencrypt":Z \
            -v "/data/ccn/ingress-keys/lib:/var/lib/letsencrypt":Z \
            docker.io/certbot/certbot certonly  -d "*.apps.ocp4.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/apps.ocp4.redhat.ren/fullchain1.pem apps.ocp4.redhat.ren.crt
cp ./etc/archive/apps.ocp4.redhat.ren/privkey1.pem apps.ocp4.redhat.ren.key

ssh root@192.168.7.11 mkdir -p /data/install/ingress-key

scp apps.* root@192.168.7.11:/data/install/ingress-key

# on helper
cd /data/install/ingress-key

oc create secret tls wzh-ingress-key \
     --cert=apps.ocp4.redhat.ren.crt \
     --key=apps.ocp4.redhat.ren.key \
     -n openshift-ingress

oc patch ingresscontroller.operator default \
     --type=merge -p \
     '{"spec":{"defaultCertificate": {"name": "wzh-ingress-key"}}}' \
     -n openshift-ingress-operator
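# 证书替换后,router pod 会滚动重启,可以简单验证一下新证书是否生效(示例,域名按实际集群替换)
oc get pod -n openshift-ingress
echo | openssl s_client -connect console-openshift-console.apps.ocp4.redhat.ren:443 2>/dev/null | openssl x509 -noout -issuer -dates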

更改系统默认时区

https://access.redhat.com/solutions/5487331

cat << EOF > /data/install/timezone.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-custom-timezone-configuration
spec:
  config:
    ignition:
      version: 2.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=set timezone
          After=network-online.target

          [Service]
          Type=oneshot
          ExecStart=timedatectl set-timezone Asia/Shanghai

          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: custom-timezone.service
EOF
oc create -f /data/install/timezone.yaml
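# 上面的 MachineConfig 是 worker 角色的,本文是单节点集群,只有 master 节点,
# 如果需要对 master 生效,可以照样再生成一份 master 角色的(示意)
sed 's/worker/master/g' /data/install/timezone.yaml > /data/install/timezone-master.yaml
oc create -f /data/install/timezone-master.yaml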

time sync between kvm and hosts

CHAPTER 8. KVM GUEST TIMING MANAGEMENT


echo ptp_kvm > /etc/modules-load.d/ptp_kvm.conf

echo "refclock PHC /dev/ptp0 poll 2" >> /etc/chrony.conf

systemctl restart chronyd

chronyc sources
# 210 Number of sources = 1
# MS Name/IP address         Stratum Poll Reach LastRx Last sample
# ===============================================================================
# #* PHC0                          0   2   377     5  -4784ns[  -62us] +/-   56us

排错技巧


# login to bootstrap to debug
# find the ip from kvm console
ssh -i ~/.ssh/helper_rsa core@192.168.7.75
journalctl -b -f -u release-image.service -u bootkube.service
journalctl -b -u release-image.service -u bootkube.service | grep -i baremetal
sudo -i
export KUBECONFIG=/etc/kubernetes/kubeconfig
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api

# debug why bootstrap can't be ping...
cat .openshift_install_state.json | jq  '."*bootstrap.Bootstrap"'.Config.storage.files[].path

cat .openshift_install_state.json | jq -r '."*bootstrap.Bootstrap"'.File.Data | base64 -d | jq -r . > ign.json

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[].contents.source ' | sed 's/.*base64,//g' | base64 -d > decode

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[] | .path, .contents.source ' | while read -r line ; do if [[ $line =~ .*base64,.* ]]; then echo $(echo $line | sed 's/.*base64,//g' | base64 -d) ; else echo $line; fi; done > files

# https://serverfault.com/questions/329287/free-up-not-used-space-on-a-qcow2-image-file-on-kvm-qemu
virt-sparsify disk.img new-file.img

# https://access.redhat.com/solutions/24029
xz -dc < /boot/initrd-$(uname -r).img | cpio -idmv

openshift 4.10 离线 baremetal IPI (全自动)安装 单网络 静态IP模式

简介

本文描述ocp4.10在baremetal(kvm模拟)上面,IPI (全自动)安装。由于4.10支持nmstat,所以他原生支持静态IP安装了。

根据openshift文档,baremetal IPI安装有两种模式,一种是provisioning网络独立,另外一种是provisioning网络和baremetal(服务)网络合并的模式。考虑到POC现场的环境,本次实验,使用简单的网络部署,也就是合并的网络模式。

以下是本次实验的架构图:

注意:本文使用 single node (sno) 模式,并使用 IPI (全自动) 安装,这种组合官方是不支持的。我们这么做,是为了后续的 ACM zero touch provision (ZTP) 实验:ZTP 实验需要 ACM hub 集群是 IPI 模式安装的,而我们做实验资源紧张,所以搞了一个 sno with IPI 模式的安装步骤。本文中有一些手动执行的步骤,都是因为官方 IPI 不支持 sno,我们需要做一些小小的 patch 操作。

离线安装包下载

打包好的安装包,在这里下载,百度盘下载链接,版本是4.10.4:

链接:https://pan.baidu.com/s/16H8goM8AQ5ASXXPsWT4GAg?pwd=426x 提取码:426x

其中包括如下类型的文件:

  • ocp4.tgz 这个文件包含了iso等安装介质,以及各种安装脚本,全部下载的镜像列表等。需要复制到宿主机,以及工具机上去。
  • registry.tgz 这个文件也是docker image registry的仓库打包文件。需要先补充镜像的话,按照这里操作: 4.6.add.image.md
  • nexus-image.tgz 这个是nexus的镜像仓库打包,集群的镜像proxy指向nexus,由nexus提供镜像的cache
  • poc.image.tgz 这个是给registry.tgz补充的一些镜像,主要是ccn使用,补充的镜像列表在这里 poc.image.list ,按照这里操作: 4.6.add.image.md

合并这些切分文件,使用类似如下的命令

cat registry.?? > registry.tgz

注意,可能需要更新离线镜像包中的helper用的ansible脚本。

在外网云主机上面准备离线安装源

准备离线安装介质的文档,已经转移到了这里:4.10.build.dist.md

前期准备,主要在宿主机上

本次实验,是在一个24C, 128G 的主机上面,用很多个虚拟机安装测试。所以先准备这个宿主机。

如果是多台宿主机,记得一定要调整时间配置,让这些宿主机的时间基本一致,否则证书会出问题。

主要的准备工作有

  • 配置yum源
  • 配置dns
  • 安装镜像仓库
  • 配置vnc环境
  • 配置kvm需要的网络
  • 创建helper kvm

以上准备工作,dns部分需要根据实际项目环境有所调整。

本次的宿主机是一台rhel8, 参考这里进行离线repo等基本的配置rhel8.build.kernel.repo.cache.md

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren nexus.ocp4.redhat.ren git.ocp4.redhat.ren
EOF

dnf clean all
dnf repolist

dnf -y install byobu htop jq ipmitool

systemctl disable --now firewalld

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

配置镜像仓库

这里是旧的,使用docker registry的配置镜像仓库的方法,如果想配置quay,可以参考这里

cd /data
mkdir -p /data/registry
# tar zxf registry.tgz
dnf -y install podman pigz skopeo jq 
# pigz -dc registry.tgz | tar xf -
cd /data/ocp4
podman load -i /data/ocp4/registry.tgz

podman run --name local-registry -p 5443:5000 \
  -d --restart=always \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2

podman start local-registry

# firewall-cmd --permanent --add-port=5443/tcp
# firewall-cmd --reload

# 加载更多的镜像
# 解压缩 ocp4.tgz
bash add.image.load.sh /data/install.image 'registry.ocp4.redhat.ren:5443'

# https://github.com/christianh814/ocp4-upi-helpernode/blob/master/docs/quickstart.md

oc image mirror -a /data/registry.auth.json --from-dir=/data/file.registry/ 'file://openshift/release:4.10.4-x86_64*' quaylab.infra.redhat.ren/ocp4/openshift4
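# mirror 完成后,可以用 oc 检查一下 release 镜像是否能从本地仓库读到(示例,tag 以 mirror 实际输出为准)
oc adm release info -a /data/registry.auth.json quaylab.infra.redhat.ren/ocp4/openshift4:4.10.4-x86_64 | head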

准备vnc环境

vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
desktop=sandbox
geometry=1440x855
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

systemctl start vncserver@:1
# 如果你想停掉vnc server,这么做
systemctl stop vncserver@:1

# firewall-cmd --permanent --add-port=6001/tcp
# firewall-cmd --permanent --add-port=5901/tcp
# firewall-cmd --reload

# connect vnc at port 5901
# export DISPLAY=:1

创建实验用虚拟网络

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.105/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF

bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.address '192.168.7.1/24'
nmcli networking off; nmcli networking on

创建工具机

mkdir -p /data/kvm
cd /data/kvm

lvremove -f rhel/helperlv
lvcreate -y -L 200G -n helperlv rhel

virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/rhel/helperlv,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio \
--boot menu=on --location /data/kvm/rhel-8.3-x86_64-dvd.iso \
--initrd-inject helper-ks-rhel8-ipi.cfg --extra-args "inst.ks=file:/helper-ks-rhel8-ipi.cfg" 

virsh start ocp4-aHelper

# DO NOT USE, restore kvm
virsh destroy ocp4-aHelper
virsh undefine ocp4-aHelper

# virt-viewer --domain-name ocp4-aHelper
# virsh start ocp4-aHelper
# virsh list --all

配置时间服务

# start chrony/ntp server on host
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.default
cat << EOF > /etc/chrony.conf
# pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

# setup ftp data root
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf
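# mount --bind 重启后会丢失,如果希望持久化,可以写到 fstab 里(示意)
cat << EOF >> /etc/fstab
/data/dnf /var/ftp/dnf none bind 0 0
EOF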

在helper上配置静态变量

在 helper / 工具机上,配置静态变量。这些变量,将帮助配置工作可以在不同项目之间复用。后续也许可以考虑把相关的脚本,放到ansible项目里面去。

# on helper define static parameter

NODE_SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren

PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'quayadmin:password' | openssl base64 )'","email": "noemail@localhost"}}}'

NTP_SERVER=192.168.7.1
HELP_SERVER=192.168.7.11
KVM_HOST=192.168.7.1
API_VIP=192.168.7.100
INGRESS_VIP=192.168.7.101
CLUSTER_PROVISION_IP=192.168.7.103
BOOTSTRAP_IP=192.168.7.12

ACM_DEMO_MNGED_CLUSTER=acm-demo1
ACM_DEMO_MNGED_SNO_IP=192.168.7.15

echo $PULL_SECRET

# 定义单节点集群的节点信息
SNO_CLUSTER_NAME=acm-demo-hub
SNO_BASE_DOMAIN=redhat.ren
SNO_IP=192.168.7.13
SNO_GW=192.168.7.1
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

mkdir -p /data/sno
echo ${SNO_IF_MAC} > /data/sno/sno.mac

创建openshift4集群节点vm模板

# back to kvm host

# create the master and worker vm, but not start them
export KVM_DIRECTORY=/data/kvm
mkdir -p ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
# scp root@192.168.7.11:/data/install/*.iso ${KVM_DIRECTORY}/
scp root@192.168.7.11:/data/sno/sno.mac ${KVM_DIRECTORY}/

remove_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
}

create_lv() {
    var_vg=$1
    var_lv=$2
    lvcreate -y -L 120G -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

remove_lv vgdata lvacmdemo1
# remove_lv vgdata lvbootstrap
# remove_lv vgdata lvdata01
# remove_lv vgdata lvdata02
remove_lv vgdata lvmaster0
# remove_lv vgdata lvsno

# create_lv rhel bootstraplv
create_lv vgdata lvmaster0

virt-install --name=ocp4-master0 --vcpus=16 --ram=49152 \
--cpu=host-model \
--disk path=/dev/vgdata/lvmaster0,device=disk,bus=virtio,format=raw \
--os-variant rhel8.0 --network bridge=baremetal,model=virtio,mac=$(<sno.mac) \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
--print-xml > ${KVM_DIRECTORY}/ocp4-master0.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-master0.xml

# --boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on  \
# --boot hd,cdrom,menu=on  \

cd /data/kvm/
# for i in master{0..2} worker{0..2}
for i in master{0..0}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ocp4-${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list
cat /data/kvm/mac.list
# master0 00:60:2f:86:fc:ba

# GOTO image registry & kvm host
# copy crt files to helper node
ssh-copy-id root@192.168.7.11

ssh root@192.168.7.11 mkdir -p /data/install
ssh root@192.168.7.11 mkdir -p /data/ocp4
ssh root@192.168.7.11 mkdir -p /etc/crts
scp /data/down/ocp4.tgz root@192.168.7.11:/data/
rsync -e ssh --info=progress2 -P --delete -arz /data/ocp4/ 192.168.7.11:/data/ocp4/

# scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/data/install/
scp /etc/crts/redhat.ren.ca.crt root@192.168.7.11:/etc/crts/

scp /data/kvm/mac.list root@192.168.7.11:/data/install/

配置redfish模拟


# install redfish for kvm
# https://access.redhat.com/solutions/4315581
# https://access.redhat.com/solutions/3057171
# https://docs.openstack.org/virtualbmc/latest/user/index.html
# https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html
dnf -y install python3-pip
# pip3 install --user sushy-tools

mkdir -p /data/install
cd /data/install

# podman create --name swap docker.io/wangzheng422/imgs:openshift-baremetal-install-4.6.5 ls
# podman cp swap:/openshift-baremetal-install ./
# podman rm -fv swap
# quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-4.6.7

podman create --name swap quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-4.10.4 ls
podman cp swap:/wheelhouse.tar.gz - > wheelhouse.tar.gz.tar
tar xvf wheelhouse.tar.gz.tar
tar zvxf wheelhouse.tar.gz
podman rm -fv swap

dnf groupinstall -y 'Development Tools'
dnf -y install python3-pip libvirt libvirt-devel python3-devel openssl-devel

pip3 install --user --no-index --find-links wheelhouse setuptools-rust
# export CRYPTOGRAPHY_DONT_BUILD_RUST=1
dnf install -y rust cargo
pip3 install --user -r wheelhouse/requirements.txt --no-index --find-links wheelhouse

/root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key
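# sushy-emulator 是前台运行的,实际使用时可以放在 byobu 里,
# 或者简单地用 nohup 放到后台(示意,日志文件路径随意)
# nohup /root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key > /var/log/sushy-emulator.log 2>&1 &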

# curl https://registry.ocp4.redhat.ren:8000/redfish/v1/Systems/

# DO NOT USE, restore 
# if you want to stop or delete vm, try this
virsh list --all
# virsh destroy ocp4-bootstrap
virsh destroy ocp4-master0 
# virsh destroy ocp4-master1 
# virsh destroy ocp4-master2 
# virsh destroy ocp4-worker0 
# virsh destroy ocp4-worker1 
# virsh destroy ocp4-worker2
# virsh undefine ocp4-bootstrap
virsh undefine ocp4-master0 --nvram
# virsh undefine ocp4-master1 --nvram
# virsh undefine ocp4-master2 --nvram
# virsh undefine ocp4-worker0 --nvram
# virsh undefine ocp4-worker1 --nvram
# virsh undefine ocp4-worker2 --nvram

工具机上的准备工作

以下是在工具机里面,进行的安装操作。

主要的操作有

  • 配置yum源
  • 运行ansible脚本,自动配置工具机
  • 上传定制的安装配置文件
  • 生成ignition文件

工具机的基础配置


sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

systemctl disable --now firewalld

# in helper node
mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[remote-epel]
name=epel
baseurl=ftp://${YUMIP}/dnf/epel
enabled=1
gpgcheck=0

[remote-epel-modular]
name=epel-modular
baseurl=ftp://${YUMIP}/dnf/epel-modular
enabled=1
gpgcheck=0

[remote-appstream]
name=appstream
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[remote-baseos]
name=baseos
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[remote-baseos-source]
name=baseos-source
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-baseos-source-rpms
enabled=1
gpgcheck=0

[remote-supplementary]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/rhel-8-for-x86_64-supplementary-rpms
enabled=1
gpgcheck=0

[remote-codeready-builder]
name=supplementary
baseurl=ftp://${YUMIP}/dnf/codeready-builder-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

yum clean all
yum makecache
yum repolist

yum -y install ansible git unzip podman python3

yum -y update

reboot

# yum -y install ansible git unzip podman python36

准备 openshift 的定制化 ansible 安装工具


mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp /data/down/ocp4.tgz root@192.168.7.11:/data/
cd /data
tar zvxf ocp4.tgz
cd /data/ocp4

# 这里使用了一个ansible的项目,用来部署helper节点的服务。
# https://github.com/wangzheng422/ocp4-upi-helpernode
unzip ocp4-upi-helpernode.zip
# 这里使用了一个ignition文件合并的项目,用来帮助自定义ignition文件。
# https://github.com/wangzheng422/filetranspiler
podman load -i filetranspiler.tgz

mkdir -p /data/install

# on helper

mkdir -p /data/ocp4/
cd /data/ocp4/
cat << EOF > redfish.sh
#!/usr/bin/env bash

curl -k -s https://${KVM_HOST}:8000/redfish/v1/Systems/ | jq -r '.Members[]."@odata.id"' >  list

while read -r line; do
    curl -k -s https://${KVM_HOST}:8000/\$line | jq -j '.Id, " ", .Name, "\n" '
done < list

EOF
bash redfish.sh > /data/install/vm.list
cat /data/install/vm.list
# 1bc0116f-d376-45e2-b28c-d6b4b772b2bf ocp4-master0
# e70f66bc-7878-4617-811d-89cdaf62cc8c ocp4-Helper

# 配置ansible脚本的参数,注意修改里面的静态参数

cat << EOF > /data/ocp4/ocp4-upi-helpernode-master/vars.yaml
---
ocp_version: 4.10.4
ssh_gen_key: false
staticips: true
bm_ipi: true
firewalld: false
dns_forward: true
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso"
helper:
  name: "helper"
  ipaddr: "${HELP_SERVER}"
  networkifacename: "enp1s0"
  gateway: "${SNO_GW}"
  netmask: "${SNO_NETMAST}"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "172.21.1.1"
  forwarder2: "172.21.1.1"
  api_vip: "${API_VIP}"
  ingress_vip: "${INGRESS_VIP}"
bootstrap:
  name: "bootstrap"
  ipaddr: "${BOOTSTRAP_IP}"
  interface: "enp1s0"
  install_drive: "vda"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
others:
  - name: "registry"
    ipaddr: "192.168.7.103"
  - name: "yum"
    ipaddr: "172.21.6.103"
  - name: "quay"
    ipaddr: "172.21.6.103"
  - name: "nexus"
    ipaddr: "172.21.6.103"
  - name: "git"
    ipaddr: "172.21.6.103"
otherdomains:
  - domain: "infra.redhat.ren"
    hosts:
    - name: "registry"
      ipaddr: "192.168.7.1"
    - name: "yum"
      ipaddr: "192.168.7.1"
    - name: "quay"
      ipaddr: "192.168.7.1"
    - name: "quaylab"
      ipaddr: "192.168.7.1"
    - name: "nexus"
      ipaddr: "192.168.7.1"
    - name: "git"
      ipaddr: "192.168.7.1"
  - domain: "${ACM_DEMO_MNGED_CLUSTER}.${SNO_BASE_DOMAIN}"
    hosts:
    - name: "api"
      ipaddr: "${ACM_DEMO_MNGED_SNO_IP}"
    - name: "api-int"
      ipaddr: "${ACM_DEMO_MNGED_SNO_IP}"
    - name: "${ACM_DEMO_MNGED_CLUSTER}-master"
      ipaddr: "${ACM_DEMO_MNGED_SNO_IP}"
    - name: "*.apps"
      ipaddr: "${ACM_DEMO_MNGED_SNO_IP}"
  - domain: "${SNO_CLUSTER_NAME}.${SNO_BASE_DOMAIN}"
    hosts:
    - name: "api"
      ipaddr: "${SNO_IP}"
    - name: "api-int"
      ipaddr: "${SNO_IP}"
    - name: "${SNO_CLUSTER_NAME}-master"
      ipaddr: "${SNO_IP}"
    - name: "*.apps"
      ipaddr: "${SNO_IP}"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/{{ ocp_version }}/openshift-client-linux-{{ ocp_version }}.tar.gz"
ocp_installer: "file:///data/ocp4/{{ ocp_version }}/openshift-install-linux-{{ ocp_version }}.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "${NTP_SERVER}"
      options: iburst
setup_registry: # don't worry about this, just leave it here
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
ocp_filetranspiler: "file:///data/ocp4/filetranspiler.tgz"

EOF

# 接下来,我们使用ansible来配置helper节点,装上各种openshift集群需要的服务
# 根据现场环境,修改 ocp4-upi-helpernode-master/vars-static.yaml
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars.yaml -e '{ staticips: true, bm_ipi: true }'  tasks/main.yml


# generate image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083


# try this:
/usr/local/bin/helpernodecheck

mkdir -p /data/install

# GO back to help node
# apply registry's CA
/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

配置 ignition 点火配置文件

openshift4安装的关键,就是ignition文件,更准确的说,是rhcos的点火配置文件,所有项目现场想做的定制,都在ignition文件里面。

rhcos就是一个rhel,所有你想要的定制化,都可以写成配置文件和脚本,加到ignition文件中去。但是,openshift4在安装过程中,至少要重启3次,我们的ignition文件中的配置,更多的是影响第一次启动,而之后的启动,rhcos会根据自身的升级机制,使用新的ignition去启动。这个新的ignition文件在哪里?怎么影响这个ignition文件的生成?作者现在也还在探索中,但是大致的方向是定制 /opt/openshift/openshift/ 下面的machine config yaml文件,把machine config写进去。

# on helper

# 根据现场环境,修改 install-config.yaml
# 至少要修改ssh key, 还有 additionalTrustBundle,这个是镜像仓库的 CA 证书

# copy your pull secret file into helper
# SEC_FILE='/data/pull-secret.json'
# cat << 'EOF' > $SEC_FILE

# 定制ignition
mkdir -p /data/install

cd /data/install

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: ${SNO_BASE_DOMAIN}
# bootMode: legacy
platform:
  baremetal:
    apiVIP: ${API_VIP}
    ingressVIP: ${INGRESS_VIP}
    bootstrapProvisioningIP: ${BOOTSTRAP_IP}
    clusterProvisioningIP: ${CLUSTER_PROVISION_IP}
    provisioningNetwork: "Disabled"
    externalBridge: baremetal
    bootstrapOSImage: http://${HELP_SERVER}:8080/install/rhcos-qemu.x86_64.qcow2.gz?sha256=$(zcat /var/www/html/install/rhcos-qemu.x86_64.qcow2.gz | sha256sum | awk '{print $1}')
    clusterOSImage: http://${HELP_SERVER}:8080/install/rhcos-openstack.x86_64.qcow2.gz?sha256=$(zcat /var/www/html/install/rhcos-openstack.x86_64.qcow2.gz | sha256sum  | awk '{print $1}')
    hosts:
      - name: ${SNO_HOSTNAME}
        role: master
        bmc:
          address: redfish-virtualmedia://${KVM_HOST}:8000/redfish/v1/Systems/$(cat vm.list | grep master0 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat mac.list | grep master0 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "${SNO_DISK}"
        networkConfig: 
          dns-resolver:
            config:
              server:
              - ${SNO_DNS}
          interfaces:
          - ipv4:
              address:
              - ip: ${SNO_IP}
                prefix-length: ${SNO_NETMAST_S}
              # - ip: ${API_VIP}
              #   prefix-length: 32
              # - ip: ${INGRESS_VIP}
              #   prefix-length: 32
              # - ip: ${CLUSTER_PROVISION_IP}
              #   prefix-length: 32
              dhcp: false
              enabled: true
            name: ${SNO_IF}
            state: up
            type: ethernet
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: ${SNO_GW}
              next-hop-interface: ${SNO_IF}
              table-id: 254
metadata:
  name: ${SNO_CLUSTER_NAME}
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR: 192.168.7.0/24
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
  platform:
    baremetal: {}
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat /root/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

在宿主机上开始安装

将配置文件复制到宿主机上


# GO back to host
mkdir -p /data/install
cd /data/install
/bin/rm -rf .openshift_install.log .openshift_install_state.json terraform* auth tls *

scp root@192.168.7.11:/data/install/install-config.yaml /data/install/
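# create manifests 的时候 install-config.yaml 会被安装程序消耗掉,反复试验时建议先留一个备份(示例)
/bin/cp -f /data/install/install-config.yaml /data/install/install-config.yaml.bak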

cd /data/install
for i in $(sudo virsh list --all | tail -n +3 | grep bootstrap | awk {'print $2'});
do
  sudo virsh destroy $i;
  sudo virsh undefine $i;
  sudo virsh vol-delete $i --pool default;
  sudo virsh vol-delete $i.ign --pool default;
  virsh pool-destroy $i
  virsh pool-delete $i
  virsh pool-undefine $i
done

从ignition点火配置文件创建安装配置文件


export BUILDNUMBER=4.10.4

/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/ create manifests

# copy ntp related config
scp root@192.168.7.11:/data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/install/openshift/

# /bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/

scp root@192.168.7.11:/data/ocp4/99-worker-container-registries.yaml /data/install/openshift
scp root@192.168.7.11:/data/ocp4/99-master-container-registries.yaml /data/install/openshift

# /data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/ --log-level debug create cluster
/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/ create ignition-configs

定制 bootstrap 的 ignition 点火配置文件

mkdir -p /data/sno/disconnected/

# 定义单节点集群的节点信息
BTS_CLUSTER_NAME=ocp4s-ais
BTS_BASE_DOMAIN=redhat.ren
BTS_IP=192.168.7.12
BTS_GW=192.168.7.1
BTS_NETMAST=255.255.255.0
BTS_NETMAST_S=24
BTS_HOSTNAME=ocp4s-ais-bootstrap
# SNO_CON="Wired connection 1"
BTS_CON="ens3"
BTS_IF=ens3
BTS_DNS=192.168.7.11
BTS_DISK=/dev/vda
BTS_CORE_PWD=redhat

SNO_HOSTNAME=acm-demo-hub-master

cat << EOF > /data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip
storage:
  files:
    - path: /etc/NetworkManager/system-connections/${BTS_CON}.nmconnection
      mode: 0600
      overwrite: true
      contents:
        inline: |
          [connection]
          id=${BTS_IF}
          # uuid=$(uuidgen)
          type=ethernet
          interface-name=${BTS_IF}
          autoconnect=true

          [ipv4]
          address1=${BTS_IP}/${BTS_NETMAST_S=24},${BTS_GW}
          dns=${BTS_DNS};
          dns-search=
          method=manual

          [ipv6]
          addr-gen-mode=eui64
          dhcp-hostname=${BTS_HOSTNAME}
          dhcp-timeout=90
          dns-search=
          method=disabled

          [proxy]

EOF

# set static hostname for master
# only works for sno
# do not use this in 3-master cluster
# in 3-master cluster, use dhcp to set hostname instead.

cat << EOF > /data/sno/static.hostname.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-hostname
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      overwrite: true
      contents:
        inline: |
          ${SNO_HOSTNAME}

EOF

source /data/ocp4/acm.fn.sh

butane /data/sno/static.ip.bu > /data/sno/disconnected/99-zzz-bootstrap-ip.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-bootstrap-ip.yaml" "/data/sno/disconnected/99-zzz-bootstrap-ip.yaml"
VAR_99_master_bootstrap_ip=$RET_VAL
VAR_99_master_bootstrap_ip_2=$RET_VAL_2

butane /data/sno/static.hostname.bu > /data/sno/disconnected/99-zzz-master-static-hostname.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-static-hostname.yaml" "/data/sno/disconnected/99-zzz-master-static-hostname.yaml"
VAR_99_master_master_static_hostname=$RET_VAL
VAR_99_master_master_static_hostname_2=$RET_VAL_2

VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

tmppath=$(mktemp)
cat /data/install/bootstrap.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq --argjson VAR "$VAR_99_master_bootstrap_ip_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_master_static_hostname" '.storage.files += [$VAR] ' \
  | jq -c . \
  > ${tmppath}
/bin/cp -f ${tmppath} /data/install/bootstrap.ign
rm -f ${tmppath}

开始 IPI 安装 openshift4


/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/ --log-level debug create cluster

安装自动开始,等2分钟以后,可以看到自动创建了一个bootstrap虚拟机

bootstrap运行一段时间后,会通过redfish,启动 master vm.
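可以顺手在 kvm 宿主机上盯着虚拟机列表,确认 bootstrap 虚拟机确实被自动创建出来了(示例命令):

# on kvm host
watch -n 5 "virsh list --all"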


# we can login to the bootstrap by using username and password ( wzh/redhat ) in console
# or we can login using ssh
ssh core@192.168.7.12

# 在安装过程中,安装程序会检查master-0节点的hostname,如果还是localhost,就会一直等待网络配置完成
# 这个超时时间还有点长,等不及的话,登录到master-0节点上,直接用以下命令改一下
# hostnamectl set-hostname acm-demo-hub-master

# 在安装过程中,也许是bug,apiVIP, ingressVIP 无法漂移到master-0上正常加载
# 我们手动加上去就好了
# 这并不是一个bug,而是一个解决方案,因为IPI安装的设计,是要求3个master节点. 也许以后会内置支持吧。
# on master-0 kvm
nmcli con mod enp1s0 +ipv4.addresses 192.168.7.100/32
nmcli con mod enp1s0 +ipv4.addresses 192.168.7.101/32
nmcli con mod enp1s0 +ipv4.addresses 192.168.7.103/32
nmcli con up enp1s0

/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/ wait-for bootstrap-complete --log-level debug
# DEBUG Bootstrap status: complete
# INFO It is now safe to remove the bootstrap resources
# DEBUG Time elapsed per stage:
# DEBUG Bootstrap Complete: 14s
# DEBUG                API: 14s
# INFO Time elapsed: 14s

/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir /data/install/  wait-for install-complete --log-level debug
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.acm-demo-hub.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "FpbMV-zasXr-8xczB-SSuIy"
# DEBUG Time elapsed per stage:
# DEBUG Cluster Operators: 8m39s
# INFO Time elapsed: 8m39s

# on kvm host, copy back auth folder to helper node
rsync -arz /data/install/auth root@192.168.7.11:/data/install/

# Go back to helper
ansible localhost -m lineinfile -a 'path=$HOME/.bashrc regexp="^export KUBECONFIG" line="export KUBECONFIG=/data/install/auth/kubeconfig"'
source $HOME/.bashrc

oc get node
# NAME                  STATUS   ROLES           AGE    VERSION
# acm-demo-hub-master   Ready    master,worker   143m   v1.23.3+e419edf

oc get pod -n openshift-machine-api
# NAME                                          READY   STATUS    RESTARTS       AGE
# cluster-autoscaler-operator-86fb4975-ljssk    2/2     Running   8              137m
# cluster-baremetal-operator-5946dc9f9b-sksrh   2/2     Running   6              137m
# machine-api-controllers-9688d969d-qgn2j       7/7     Running   32 (34m ago)   135m
# machine-api-operator-568bb89984-s28kx         2/2     Running   6              137m
# metal3-d88947f6f-rbp9m                        7/7     Running   24 (35m ago)   134m
# metal3-image-cache-vf548                      1/1     Running   3              134m
# metal3-image-customization-577f886bb4-v7xg5   1/1     Running   3              134m

oc get all -n openshift-kni-infra
# NAME                                 READY   STATUS    RESTARTS   AGE
# pod/coredns-acm-demo-hub-master      2/2     Running   4          92m
# pod/haproxy-acm-demo-hub-master      2/2     Running   4          93m
# pod/keepalived-acm-demo-hub-master   2/2     Running   4          92m

oc get BareMetalHost -n openshift-machine-api
# NAME                  STATE                    CONSUMER                      ONLINE   ERROR   AGE
# acm-demo-hub-master   externally provisioned   acm-demo-hub-6rh7s-master-0   true             157m

oc get bmh -n openshift-machine-api
# NAME                  STATE                    CONSUMER                      ONLINE   ERROR   AGE
# acm-demo-hub-master   externally provisioned   acm-demo-hub-6rh7s-master-0   true             161m

可以看到web console上node的配置指向了bm

我们也可以看到久违的machine配置

machine set 也有了

有了machine 自然 machine health check 也有了

有一个单独的 baremetal hosts 的页面也出来了

静态添加 vip for api_server, ingress

我们定制的是 SNO IPI,其实不需要 api server 和 ingress 的 vip,所以我们就把这些 vip 写死到节点的启动脚本中,静态加上。但是默认 IPI 安装会有一个 keepalived static pod,启动的时候会清除掉这些 vip,那么我们还要把这个 keepalived static pod 关掉,否则会导致 vip 不可用。

# on helper

cat << EOF > /data/install/wzh.script
#!/bin/bash

nmcli con mod enp1s0 +ipv4.addresses 192.168.7.100/32
nmcli con mod enp1s0 +ipv4.addresses 192.168.7.101/32
nmcli con mod enp1s0 +ipv4.addresses 192.168.7.103/32
nmcli con up enp1s0

EOF

var_local=$(cat /data/install/wzh.script | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )

cat <<EOF > /data/install/45-master-wzh-service.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 45-master-wzh-service
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain,${var_local}
          verification: {}
        filesystem: root
        mode: 0755
        path: /etc/rc.d/wzh.local
      - path: /etc/kubernetes/manifests/keepalived.yaml
        contents:
          source: data:text/plain,
          verification: {}
        filesystem: root
        mode: 0644
        overwrite: true
    systemd:
      units:
      - name: wzh.service
        enabled: true
        contents: |
          [Unit]
          Description=/etc/rc.d/wzh.local Compatibility
          ConditionFileIsExecutable=/etc/rc.d/wzh.local
          After=network.target

          [Service]
          Type=oneshot
          User=root
          Group=root
          ExecStart=/bin/bash -c /etc/rc.d/wzh.local

          [Install]
          WantedBy=multi-user.target

EOF
oc apply -f /data/install/45-master-wzh-service.yaml
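MachineConfig 生效、节点重启之后,可以登录 master-0(本文环境是 192.168.7.13)确认这几个 vip 已经静态加上了(示例命令):

ssh core@192.168.7.13 "ip -4 addr show enp1s0" | grep -E '192\.168\.7\.(100|101|103)'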

安装后的操作

添加一个新节点(sno未验证)

IPI 模式下,添加一个新节点非常方便,只要定义一个BareMetalHost就好了。

cd /data/install/
cat << EOF > /data/install/bmh.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-2-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-2
spec:
  online: true
  bootMACAddress: $(cat mac.list | grep worker2 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat vm.list | grep worker2 | awk '{print $1}')
    credentialsName: worker-2-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
EOF
oc -n openshift-machine-api create -f bmh.yaml

# DO NOT USE, restore, delete the vm
oc -n openshift-machine-api delete -f bmh.yaml

oc get bmh -n openshift-machine-api
# NAME       STATUS   PROVISIONING STATUS      CONSUMER                    BMC                                                                                               HARDWARE PROFILE   ONLINE   ERROR
# master-0   OK       externally provisioned   ocp4-zn8lq-master-0         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/965c420a-f127-4639-9184-fe3546d2bde4                      true
# master-1   OK       externally provisioned   ocp4-zn8lq-master-1         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/46f9dff4-1b44-4286-8a7c-691673340030                      true
# master-2   OK       externally provisioned   ocp4-zn8lq-master-2         redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/9e544eb6-1b98-4b0a-ad32-7df232ae582a                      true
# worker-0   OK       provisioned              ocp4-zn8lq-worker-0-mv4d7   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/c399c6b7-525a-4f4e-8280-0472b6494fc5   unknown            true
# worker-1   OK       provisioned              ocp4-zn8lq-worker-0-9frt6   redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/a4052132-7598-4879-b3e1-c48c47cf67ed   unknown            true
# worker-2   OK       inspecting                                           redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/2eee2e57-e18b-460b-bb3f-7f048f84c69b                      true

oc get machinesets -n openshift-machine-api
# NAME                  DESIRED   CURRENT   READY   AVAILABLE   AGE
# ocp4-zn8lq-worker-0   2         2         2       2           155m

oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name

# 扩容worker到3副本,会触发worker-2的部署
oc scale --replicas=3 machineset $(oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name) -n openshift-machine-api
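扩容之后,可以用下面的命令观察 BareMetalHost 和 machine 的状态变化,直到新节点部署完成并加入集群(示例命令):

oc get bmh -n openshift-machine-api -w

oc get machine -n openshift-machine-api

oc get node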

镜像仓库代理 / image registry proxy

准备离线镜像仓库非常麻烦,好在我们找到了一台在线的主机,那么我们可以使用nexus构造image registry proxy,在在线环境上面,做一遍PoC,然后就能通过image registry proxy得到离线镜像了

  • https://mtijhof.wordpress.com/2018/07/23/using-nexus-oss-as-a-proxy-cache-for-docker-images/
#####################################################
# init build the nexus fs
/bin/cp -f nexus-image.tgz /data/ccn/
cd /data/ccn
tar zxf nexus-image.tgz
chown -R 200 /data/ccn/nexus-image

# podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.29.0

podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh

podman stop nexus-image
podman rm nexus-image

# get the admin password
cat /data/ccn/nexus-image/admin.password && echo
# 84091bcd-c82f-44a3-8b7b-dfc90f5b7da1

# open http://nexus.ocp4.redhat.ren:8082

# 开启 https
# https://blog.csdn.net/s7799653/article/details/105378645
# https://help.sonatype.com/repomanager3/system-configuration/configuring-ssl#ConfiguringSSL-InboundSSL-ConfiguringtoServeContentviaHTTPS
mkdir -p /data/install/tmp
cd /data/install/tmp

# 将证书导出成pkcs格式
# 这里需要输入密码  用 password,
openssl pkcs12 -export -out keystore.pkcs12 -inkey /etc/crts/redhat.ren.key -in /etc/crts/redhat.ren.crt

cat << EOF > Dockerfile
FROM docker.io/sonatype/nexus3:3.29.0
USER root
COPY keystore.pkcs12 /keystore.pkcs12
RUN keytool -v -importkeystore -srckeystore keystore.pkcs12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS -storepass password -srcstorepass password  &&\
    cp keystore.jks /opt/sonatype/nexus/etc/ssl/
USER nexus
EOF
buildah bud --format=docker -t docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh -f Dockerfile .
buildah push docker.io/wangzheng422/imgs:nexus3-3.29.0-wzh
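
# (示意)nexus 上把 docker proxy 仓库配置好以后,可以用标准的 registry v2 接口确认 8083 端口已经能正常服务
# 如果仓库开了认证,再加上 -u 用户名:密码
curl -k -s https://nexus.ocp4.redhat.ren:8083/v2/_catalog | jq .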

######################################################
# go to helper, update proxy setting for ocp cluster
cd /data/ocp4
bash image.registries.conf.sh nexus.ocp4.redhat.ren:8083

mkdir -p /etc/containers/registries.conf.d
/bin/cp -f image.registries.conf /etc/containers/registries.conf.d/

cd /data/ocp4
oc apply -f ./99-worker-container-registries.yaml -n openshift-config
oc apply -f ./99-master-container-registries.yaml -n openshift-config

######################################################
# dump the nexus image fs out
podman stop nexus-image

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus-image | pigz -c > nexus-image.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus-image.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:image-$var_date
# buildah rm onbuild-container
# rm -f nexus-image.tgz 
buildah push docker.io/wangzheng422/nexus-fs:image-$var_date
echo "docker.io/wangzheng422/nexus-fs:image-$var_date"

# 以下这个版本,可以作为初始化的image proxy,里面包含了nfs provision,以及sample operator的metadata。很高兴地发现,image stream并不会完全下载镜像,好像只是下载metadata,真正用的时候,才去下载。
# docker.io/wangzheng422/nexus-fs:image-2020-12-26-1118

配置镜像仓库的ca

安装过程里面,已经把镜像仓库的ca放进去了,但是好像image stream不认,让我们再试试

oc project openshift-config
oc create configmap ca.for.registry -n openshift-config \
    --from-file=registry.ocp4.redhat.ren..5443=/data/install/redhat.ren.ca.crt \
    --from-file=nexus.ocp4.redhat.ren..8083=/data/install/redhat.ren.ca.crt 
oc patch image.config.openshift.io/cluster -p '{"spec":{"additionalTrustedCA":{"name":"ca.for.registry"}}}'  --type=merge

# oc patch image.config.openshift.io/cluster -p '{"spec":{"registrySources":{"insecureRegistries":["nexus.ocp4.redhat.ren:8083"]}}}'  --type=merge

oc get image.config.openshift.io/cluster -o yaml

# openshift project下面的image stream重新加载一下吧
oc get is -n openshift -o json | jq -r '.items[].metadata.name' | xargs -L1 oc -n openshift import-image --all

配置internal registry

我们的工具机是带nfs的,那么就给internal registry配置高档一些的nfs存储吧,不要用emptydir

bash /data/ocp4/ocp4-upi-helpernode-master/files/nfs-provisioner-setup.sh

# oc edit configs.imageregistry.operator.openshift.io
# 修改 storage 部分
# storage:
#   pvc:
#     claim:
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

# 如果不需要 internal registry,也可以反过来把它禁用掉
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry

oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# 把imagepruner给停掉
# https://bugzilla.redhat.com/show_bug.cgi?id=1852501#c24
# oc patch imagepruner.imageregistry/cluster --patch '{"spec":{"suspend":true}}' --type=merge
# oc -n openshift-image-registry delete jobs --all

配置sample operator

openshift内置了一个sample operator,里面有一大堆红帽的产品。

oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Managed", "samplesRegistry": "nexus.ocp4.redhat.ren:8083"}}' --type=merge

# 如果不想用 sample operator,也可以把它设置为 Unmanaged 或者 Removed
oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

chrony/NTP 设置

在 ocp 4.6 里面,需要设定ntp同步,我们之前的ansible脚本,已经创建好了ntp的mco配置,把它打到系统里面就好了。

oc apply -f /data/ocp4/ocp4-upi-helpernode-master/machineconfig/

Operator Hub 离线安装

使用nexus作为image proxy以后,就不需要做这个离线操作了,但是如果我们想搞CCN这种项目,因为他自带了一个catalog,为了避免冲突,我们可能还是需要屏蔽掉默认的operator hub


oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

oc get OperatorHub cluster -o yaml

给 openshift project image stream 打补丁

在有代理的网络环境中,我们需要给openshift project下的image stream打一些补丁。

cd /data/ocp4
bash is.patch.sh registry.ocp4.redhat.ren:5443/ocp4/openshift4

给 router / ingress 更换证书

有时候,我们需要公网CA认证的证书,给router来用,那么我们就搞一下

https://docs.openshift.com/container-platform/4.6/security/certificates/replacing-default-ingress-certificate.html


mkdir -p /data/ccn/ingress-keys/etc
mkdir -p /data/ccn/ingress-keys/lib
cd /data/ccn/ingress-keys
podman run -it --rm --name certbot \
            -v "/data/ccn/ingress-keys/etc:/etc/letsencrypt":Z \
            -v "/data/ccn/ingress-keys/lib:/var/lib/letsencrypt":Z \
            docker.io/certbot/certbot certonly  -d "*.apps.ocp4.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/apps.ocp4.redhat.ren/fullchain1.pem apps.ocp4.redhat.ren.crt
cp ./etc/archive/apps.ocp4.redhat.ren/privkey1.pem apps.ocp4.redhat.ren.key

ssh root@192.168.7.11 mkdir -p /data/install/ingress-key

scp apps.* root@192.168.7.11:/data/install/ingress-key

# on helper
cd /data/install/ingress-key

oc create secret tls wzh-ingress-key \
     --cert=apps.ocp4.redhat.ren.crt \
     --key=apps.ocp4.redhat.ren.key \
     -n openshift-ingress

oc patch ingresscontroller.operator default \
     --type=merge -p \
     '{"spec":{"defaultCertificate": {"name": "wzh-ingress-key"}}}' \
     -n openshift-ingress-operator
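证书替换完成后,可以用 openssl 确认 router 已经换上了新证书(示例命令):

echo | openssl s_client -connect console-openshift-console.apps.ocp4.redhat.ren:443 \
  -servername console-openshift-console.apps.ocp4.redhat.ren 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates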

build the pip dependencies for rhel8


export BUILDNUMBER=4.10.4

dnf groupinstall -y 'Development Tools'
dnf -y install python3-pip libvirt libvirt-devel python3-devel

pip3 uninstall -y $(pip3 list --user --format=legacy | awk '{print $1}' | tr '\n' ' ' )

pip3 install --user setuptools-rust
pip3 install --user virtualbmc
pip3 install --user sushy-tools
pip3 freeze --user > requirements.txt
# pip3 install -r requirements.txt --user
mkdir -p wheelhouse
pip3 download -r requirements.txt -d wheelhouse
/bin/cp -f requirements.txt wheelhouse/
tar -zcf wheelhouse.tar.gz wheelhouse

buildah from --name onbuild-container scratch
buildah copy onbuild-container wheelhouse.tar.gz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-${BUILDNUMBER}
# buildah rm onbuild-container
buildah push quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-${BUILDNUMBER}
echo "quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-${BUILDNUMBER}"

# quay.io/wangzheng422/qimgs:ocp.bm.ipi.python.dep.rhel8-4.10.4
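打包出来的 wheelhouse,拿到离线的 rhel8 主机上解开后,大致可以这样安装(示例命令):

tar zxf wheelhouse.tar.gz
pip3 install --user --no-index --find-links ./wheelhouse -r ./wheelhouse/requirements.txt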

排错技巧


# login to bootstrap to debug
# find the ip from kvm console
ssh -i ~/.ssh/helper_rsa core@192.168.7.75
journalctl -b -f -u release-image.service -u bootkube.service
journalctl -b -u release-image.service -u bootkube.service | grep -i baremetal
sudo -i
export KUBECONFIG=/etc/kubernetes/kubeconfig
oc get pod -n openshift-machine-api
oc get BareMetalHost -n openshift-machine-api

# debug why bootstrap can't be ping...
cat .openshift_install_state.json | jq  '."*bootstrap.Bootstrap"'.Config.storage.files[].path

cat .openshift_install_state.json | jq -r '."*bootstrap.Bootstrap"'.File.Data | base64 -d | jq -r . > ign.json

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[].contents.source ' | sed 's/.*base64,//g' | base64 -d > decode

cat .openshift_install_state.json | jq  -r '."*bootstrap.Bootstrap".Config.storage.files[] | .path, .contents.source ' | while read -r line ; do if [[ $line =~ .*base64,.* ]]; then echo $(echo $line | sed 's/.*base64,//g' | base64 -d) ; else echo $line; fi; done > files

cat bootstrap.ign | jq '.storage.files[] | select ( .path == "/opt/openshift/openshift/99_baremetal-provisioning-config.yaml" ) ' | jq  -r .contents.source | sed 's/.*base64,//g' | base64 -d

cat bootstrap.ign | jq '.storage.files[] | select ( .path | contains("/opt/openshift/openshift/") ) ' | jq  -r .contents.source | sed 's/.*base64,//g' | base64 -d


openshift4.10 acm with ztp disconnected static-ip auto

本文介绍,在openshift4.10上,装ACM组件以后,如何通过zero touch provision的方式,来部署一个单节点openshift4.10的集群(SNO),在部署的过程中,我们模拟离线的网络环境,并且禁止DHCP,只用静态IP。

ZTP(zero touch provision)模式之所以诱人,是因为他只需要baremetal的bmc信息,以及网卡的mac地址,就可以完成集群的部署。ACM会创建一个iso,并调用bmc的api,去挂载这个iso并启动。
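为了更直观地理解"调用 bmc 的 api 挂载 iso 并启动"这件事,下面给出一个示意性的 redfish 调用(仅演示原理;iso 地址是假设的,Manager/System 的 ID 和 VirtualMedia 设备名以 sushy-tools 实际返回为准,真实流程由 ACM/metal3 自动完成):

# 示意:通过标准的 Redfish VirtualMedia 接口挂载 iso
curl -k -X POST \
  -H "Content-Type: application/json" \
  -d '{"Image": "http://192.168.7.11:8080/install/acm.demo1.iso", "Inserted": true}' \
  https://192.168.7.1:8000/redfish/v1/Managers/<manager-id>/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia

# 示意:再通过标准的 Redfish Reset 动作给主机上电
curl -k -X POST \
  -H "Content-Type: application/json" \
  -d '{"ResetType": "On"}' \
  https://192.168.7.1:8000/redfish/v1/Systems/<system-id>/Actions/ComputerSystem.Reset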

本次实验,使用一个半自动流程,就是让ACM创建iso,但是手动用iso启动kvm。整个流程如下:

  1. 在openshift4上安装ACM组件
  2. 在ACM上配置cluster, infra env等配置。
  3. ACM通过网络启动kvm
  4. kvm自动开始集群安装,但是由于kvm+redfish的限制,需要手动调整启动顺序,让之后的重启都从硬盘启动。
  5. 集群安装完成,保存集群登录信息

本次实验的部署架构图:

本次实验有一个前导实验,就是用一个单机版本的assisted install service部署一个SNO集群,这个SNO集群是本次实验部署ACM的基础。这个前导实验如何做,请参见这里

参考资料:

  • https://github.com/jparrill/ztp-the-hard-way/blob/main/docs/connected-ZTP-flow-hub-deployment.md
  • https://github.com/jparrill/ztp-the-hard-way/blob/main/docs/disconnected-ZTP-flow-hub-deployment.md

视频讲解

静态变量和 kvm 配置

assisted install 模式下,如果想静态ip安装,需要在实验网络上部署一个dns服务。因为我们部署的是single node openshift,只需要把如下4个域名,指向同一个ip地址就可以。当然,你需要提前想好域名。同时,我们的实验环境里面,其实有2个SNO,所以要配置2套域名。

  • acm-demo-hub.redhat.ren
    • api.acm-demo-hub.redhat.ren
    • api-int.acm-demo-hub.redhat.ren
    • *.apps.acm-demo-hub.redhat.ren
    • acm-demo-hub-master.acm-demo-hub.redhat.ren
  • acm-demo1.redhat.ren
    • api.acm-demo1.redhat.ren
    • api-int.acm-demo1.redhat.ren
    • *.apps.acm-demo1.redhat.ren
    • acm-demo1-master.acm-demo1.redhat.ren

我们复用本作者基于上游改的一套ansible脚本来配置这个dns

# on helper

# 做一些配置参数定义
INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren
PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'quayadmin:password' | openssl base64 )'","email": "noemail@localhost"}}}'

ACM_DEMO_CLUSTER=acm-demo1

SNO_BASE_DOMAIN=redhat.ren
SNO_IP=192.168.7.15
SNO_GW=192.168.7.1
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=acm-demo1-master
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

echo ${SNO_IF_MAC} > /data/install/acm.demo1.mac

# back to kvm host

create_lv() {
    var_vg=$1
    var_lv=$2
    var_size=$3
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L $var_size -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

create_lv vgdata lvacmdemo1 120G

export KVM_DIRECTORY=/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/install/acm.demo1.mac ${KVM_DIRECTORY}/

# on kvm host
# export KVM_DIRECTORY=/data/kvm
virt-install --name=ocp4-acm-demo1-master0 --vcpus=16 --ram=32768 \
--cpu=host-model \
--disk path=/dev/vgdata/lvacmdemo1,device=disk,bus=virtio,format=raw \
--disk device=cdrom \
--os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=$(<acm.demo1.mac) \
--graphics vnc,port=59013 \
--boot uefi,nvram_template=/usr/share/OVMF/OVMF_VARS.fd,menu=on \
--print-xml > ${KVM_DIRECTORY}/ocp4-acm-demo1.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-acm-demo1.xml


cd /data/kvm/
# for i in master{0..2} worker{0..2}
for i in acm-demo1-master{0..0}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ocp4-${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list
cat /data/kvm/mac.list
# acm-demo1-master0       00:60:2f:ee:aa:4e

scp /data/kvm/mac.list root@192.168.7.11:/data/install/

DNS 配置

# back to helper

# set up dns
cd /data/ocp4/ocp4-upi-helpernode-master/
cat << 'EOF' > /data/ocp4/ocp4-upi-helpernode-master/vars.yaml
---
ocp_version: 4.10.4
ssh_gen_key: false
staticips: true
bm_ipi: true
firewalld: false
dns_forward: true
iso:
  iso_dl_url: "file:///data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso"
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "172.21.1.1"
  forwarder2: "172.21.1.1"
  api_vip: "192.168.7.100"
  ingress_vip: "192.168.7.101"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.12"
  interface: "enp1s0"
  install_drive: "vda"
  # macaddr: "52:54:00:7e:f8:f7"
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
    # macaddr: "$(cat /data/install/mac.list | grep master0 | awk '{print $2}')"
#   - name: "master-1"
#     ipaddr: "192.168.7.14"
#     interface: "enp1s0"
#     install_drive: "vda"    
#     macaddr: "$(cat /data/install/mac.list | grep master1 | awk '{print $2}')"
#   - name: "master-2"
#     ipaddr: "192.168.7.15"
#     interface: "enp1s0"
#     install_drive: "vda"   
#     macaddr: "$(cat /data/install/mac.list | grep master2 | awk '{print $2}')"
# workers:
#   - name: "worker-0"
#     ipaddr: "192.168.7.16"
#     interface: "enp1s0"
#     install_drive: "vda"
#     macaddr: "$(cat /data/install/mac.list | grep worker0 | awk '{print $2}')"
#   - name: "worker-1"
#     ipaddr: "192.168.7.17"
#     interface: "enp1s0"
#     install_drive: "vda"
#     macaddr: "$(cat /data/install/mac.list | grep worker1 | awk '{print $2}')"
#   - name: "worker-2"
#     ipaddr: "192.168.7.18"
#     interface: "enp1s0"
#     install_drive: "vda"
#     macaddr: "$(cat /data/install/mac.list | grep worker2 | awk '{print $2}')"
others:
  - name: "registry"
    ipaddr: "192.168.7.103"
  - name: "yum"
    ipaddr: "172.21.6.103"
  - name: "quay"
    ipaddr: "172.21.6.103"
  - name: "nexus"
    ipaddr: "172.21.6.103"
  - name: "git"
    ipaddr: "172.21.6.103"
otherdomains:
  - domain: "infra.redhat.ren"
    hosts:
    - name: "registry"
      ipaddr: "192.168.7.1"
    - name: "yum"
      ipaddr: "192.168.7.1"
    - name: "quay"
      ipaddr: "192.168.7.1"
    - name: "quaylab"
      ipaddr: "192.168.7.1"
    - name: "nexus"
      ipaddr: "192.168.7.1"
    - name: "git"
      ipaddr: "192.168.7.1"
  - domain: "acm-demo1.redhat.ren"
    hosts:
    - name: "api"
      ipaddr: "192.168.7.15"
    - name: "api-int"
      ipaddr: "192.168.7.15"
    - name: "acm-demo1-master"
      ipaddr: "192.168.7.15"
    - name: "*.apps"
      ipaddr: "192.168.7.15"
  - domain: "acm-demo-hub.redhat.ren"
    hosts:
    - name: "api"
      ipaddr: "192.168.7.13"
    - name: "api-int"
      ipaddr: "192.168.7.13"
    - name: "acm-demo-hub-master"
      ipaddr: "192.168.7.13"
    - name: "*.apps"
      ipaddr: "192.168.7.13"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/{{ ocp_version }}/openshift-client-linux-{{ ocp_version }}.tar.gz"
ocp_installer: "file:///data/ocp4/{{ ocp_version }}/openshift-install-linux-{{ ocp_version }}.tar.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.11"
      options: iburst
setup_registry: # don't worry about this, just leave it here
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
ocp_filetranspiler: "file:///data/ocp4/filetranspiler.tgz"
registry_server: "registry.ocp4.redhat.ren:5443"
EOF

ansible-playbook -e @vars.yaml tasks/main.yml

# then following AIS, to install sno using 192.168.7.13
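
dns 配置好之后,可以用 dig 抽查几个关键域名的解析是否正确(示例命令,本文环境的 dns 服务器是 192.168.7.11,xyz 是随便取的,用来验证 *.apps 的泛域名解析):

for h in api api-int acm-demo-hub-master xyz.apps; do
  dig +short ${h}.acm-demo-hub.redhat.ren @192.168.7.11
done
# 以上域名都应该解析到 192.168.7.13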

部署CNV

我们部署ACM,是需要存储的,最简单的存储,就是本地目录啦,那我们就需要一个自动的auto provisioner,正好CNV带有一个hostpath auto provisioner,所以作者就犯懒,部署一个CNV,为的是里面的本地目录的自动部署。

# 首先需要一个本地目录
cat << EOF > /data/install/host-path.yaml
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-set-selinux-for-hostpath-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Set SELinux chcon for hostpath baicell
            Before=kubelet.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStartPre=-mkdir -p /var/hostpath
            ExecStart=chcon -Rt container_file_t /var/hostpath/

            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: hostpath-baicell.service
EOF
oc create -f /data/install/host-path.yaml

# install operator OpenShift Virtualization
# active HostPathProvisioner deployment

# https://docs.openshift.com/container-platform/4.9/virt/install/installing-virt-cli.html

cat << EOF > /data/install/cnv.subscript.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  # startingCSV: kubevirt-hyperconverged-operator.v4.9.3
  channel: "stable" 
EOF
oc create -f /data/install/cnv.subscript.yaml
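订阅创建之后,稍等片刻,确认 CNV 的 operator 已经装好,再继续后面的步骤(示例命令):

oc get csv -n openshift-cnv
# 等 kubevirt-hyperconverged-operator 对应的 PHASE 变成 Succeeded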

# 创建hostpath配置
cat << EOF > /data/install/host-path-provision.yaml
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  pathConfig:
    path: "/var/hostpath" 
    useNamingPrefix: false 

EOF
oc create -f /data/install/host-path-provision.yaml -n openshift-cnv

# 创建storage class配置
cat << EOF > /data/install/host-path-storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-provisioner 
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
provisioner: kubevirt.io/hostpath-provisioner
reclaimPolicy: Delete 
volumeBindingMode: WaitForFirstConsumer 
EOF
oc create -f /data/install/host-path-storage-class.yaml
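storage class 创建好之后,可以用一个测试 PVC 验证 hostpath provisioner 是否正常工作(示例,PVC 的名字是随便起的):

cat << EOF > /data/install/test.pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-hostpath-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hostpath-provisioner
EOF
oc create -f /data/install/test.pvc.yaml -n default

oc get pvc -n default
# 因为 volumeBindingMode 是 WaitForFirstConsumer,PVC 会一直 Pending,直到有 pod 真正使用它,这是正常现象

# 验证完可以删掉
oc delete -f /data/install/test.pvc.yaml -n default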

部署完了这样:

部署ACM

接下来,我们就部署ACM,我们用最简单的部署模式。

# install operator Advanced Cluster Management for Kubernetes

# https://docs.openshift.com/container-platform/4.9/scalability_and_performance/ztp-deploying-disconnected.html#enabling-assisted-installer-service-on-bare-metal_ztp-deploying-disconnected
# https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#installing-from-the-cli


cat << EOF > /data/install/acm.subscript.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: open-cluster-management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name:  open-cluster-management-wzh
  namespace: open-cluster-management
spec:
  targetNamespaces:
    - open-cluster-management
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  sourceNamespace: openshift-marketplace
  source: redhat-operators
  channel: release-2.4
  installPlanApproval: Automatic
  name: advanced-cluster-management
  # startingCSV: advanced-cluster-management.v2.4.2
EOF
oc create -f /data/install/acm.subscript.yaml

# RHACM create the MultiClusterHub resource

cat << EOF > /data/install/acm.mch.mch.yaml
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec: {}
EOF
oc create -f /data/install/acm.mch.mch.yaml
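MultiClusterHub 创建之后,可以用命令行确认它的状态(示例命令,字段名以实际 CR 为准):

oc get multiclusterhub -n open-cluster-management

oc get multiclusterhub multiclusterhub -n open-cluster-management -o jsonpath='{.status.phase}' && echo
# 等状态变成 Running 再继续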

装好了是这样:

我们可以通过webUI访问ACM: https://multicloud-console.apps.acm-demo-hub.redhat.ren/overview

我们可以看到有一个local cluster,这个就是ACM自己运行的集群:

用ZTP模式部署一个SNO

如果有过部署assisted install service,并通过AIS来部署SNO的经验,那么通过ACM,用ZTP的模式来部署,就容易理解了。整个过程一样,都是配置ACM里面的assisted install service,然后创建一个iso出来,调用BMC API,直接挂载iso,并启动主机。

命令行配置新集群

ACM 2.4 UI 上并不是完全支持ZTP,所以有些配置要用命令行完成,后续版本会把UI补上。

# https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/clusters/index#infra-env-prerequisites
oc project open-cluster-management

# do not need, because now, it is acm 2.4.2
# but it seems doesn't matter, if you enable it
oc patch hiveconfig hive --type merge -p '{"spec":{"targetNamespace":"hive","logLevel":"debug","featureGates":{"custom":{"enabled":["AlphaAgentInstallStrategy"]},"featureSet":"Custom"}}}'

oc get hiveconfig hive -o yaml
# ......
# spec:
#   featureGates:
#     custom:
#       enabled:
#       - AlphaAgentInstallStrategy
#     featureSet: Custom
#   logLevel: debug
#   targetNamespace: hive
# ......

oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'

oc get provisioning provisioning-configuration -o yaml
# ......
# spec:
#   preProvisioningOSDownloadURLs: {}
#   provisioningIP: 192.168.7.103
#   provisioningMacAddresses:
#   - 00:60:2f:ab:66:f6
#   provisioningNetwork: Disabled
#   provisioningNetworkCIDR: 192.168.7.0/24
#   provisioningOSDownloadURL: http://192.168.7.11:8080/install/rhcos-openstack.x86_64.qcow2.gz?sha256=6b5731d90fa78eb50c07928811675d$f9c1d3f94eca0a94afef17cfbce706ddf
#   watchAllNamespaces: true
# ......

cat << EOF > /data/install/acm.ocp.release.yaml
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-v4.10.4
  namespace: open-cluster-management
spec:
  releaseImage: ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4:4.10.4-x86_64
EOF
oc create -f /data/install/acm.ocp.release.yaml

oc get ClusterImageSet
# NAME                RELEASE
# openshift-v4.10.4   quaylab.infra.redhat.ren/ocp4/openshift4:4.10.4-x86_64

cat << EOF > /data/install/acm.cm.asc.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: assisted-service-config
  namespace: open-cluster-management
  labels:
    app: assisted-service
data:
  LOG_LEVEL: "debug"
EOF
oc create -f /data/install/acm.cm.asc.yaml


cat << EOF > /data/install/acm.secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: assisted-deployment-pull-secret
  namespace: open-cluster-management
stringData:
  .dockerconfigjson: '$PULL_SECRET'
EOF
oc create -f /data/install/acm.secret.yaml

# oc get pod -A | grep metal3
# the result is empty, so we will go in manual way
oc get pod -A | grep metal3
# openshift-machine-api                              metal3-697fb46867-8zxxw                                           7/7     Running     8 (42m ago)    4h40m
# openshift-machine-api                              metal3-image-cache-hhvnm                                          1/1     Running     1              4h40m
# openshift-machine-api                              metal3-image-customization-577f886bb4-cwl2l                       1/1     Running     1              4h40m

# curl -s https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.9.12/release.txt | grep 'machine-os '
cat /data/ocp4/4.10.4/release.txt | grep 'machine-os '
  # machine-os 410.84.202203081640-0 Red Hat Enterprise Linux CoreOS

cat << EOF > /data/install/acm.mirror.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hyper1-mirror-config
  namespace: open-cluster-management
  labels:
    app: assisted-service
data:
  ca-bundle.crt: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/    /g' )
  registries.conf: |
    unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-release"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

---
EOF
oc create -f /data/install/acm.mirror.yaml


cat << EOF > /data/install/acm.agentservicecofnig.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
  namespace: open-cluster-management
  ### This is the annotation that injects modifications in the Assisted Service pod
  annotations:
    unsupported.agent-install.openshift.io/assisted-service-configmap: "assisted-service-config"
###
spec:
  databaseStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 40Gi
  filesystemStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 40Gi
  ### This is a ConfigMap that only will make sense on Disconnected environments
  mirrorRegistryRef:
    name: "hyper1-mirror-config"
  ###
  osImages:
    - openshiftVersion: "4.10"
      version: "410.84.202203081640-0"
      url: "http://192.168.7.11:8080/install/live.iso"
      rootFSUrl: "http://192.168.7.11:8080/install/rootfs.img.4.9"
      cpuArchitecture: x86_64
EOF
oc create -f /data/install/acm.agentservicecofnig.yaml
# oc delete -f /data/install/acm.agentservicecofnig.yaml

oc get AgentServiceConfig/agent -n open-cluster-management -o yaml  
# ......
# status:
#   conditions:
#   - lastTransitionTime: "2022-04-06T09:38:21Z"
#     message: AgentServiceConfig reconcile completed without error.
#     reason: ReconcileSucceeded
#     status: "True"
#     type: ReconcileCompleted

# logs in infrastructure-operator 

# stop here, and wait the assisted-service pod run into ok status
oc get pod -n open-cluster-management | grep assisted
# assisted-image-service-b686cf67d-4hs2t                            1/1     Running   0                44s
# assisted-service-7476bfdd8c-lnnn8                                 2/2     Running   0                44s

# begin to create new cluster

oc create ns ${ACM_DEMO_CLUSTER}
oc project ${ACM_DEMO_CLUSTER}

cat << EOF > /data/install/acm.managed.secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: assisted-deployment-pull-secret
  namespace: ${ACM_DEMO_CLUSTER}
stringData:
  .dockerconfigjson: '$PULL_SECRET'
EOF
oc create -f /data/install/acm.managed.secret.yaml

cat << EOF > /data/install/acm.nmsc.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
 name: ${ACM_DEMO_CLUSTER}
 namespace: ${ACM_DEMO_CLUSTER}
 labels:
   nmstate-conf-cluster-name: ${ACM_DEMO_CLUSTER}
spec:
 config:
   interfaces:
     - name: ${SNO_IF}
       type: ethernet
       state: up
       ipv4:
         enabled: true
         address:
           - ip: ${SNO_IP}
             prefix-length: ${SNO_NETMAST_S}
         dhcp: false
   dns-resolver:
     config:
       server:
         - ${SNO_DNS}
   routes:
     config:
       - destination: 0.0.0.0/0
         next-hop-address: ${SNO_GW}
         next-hop-interface: ${SNO_IF}
         table-id: 254
 interfaces:
   - name: "${SNO_IF}" 
     macAddress: ${SNO_IF_MAC}
EOF
oc create -f /data/install/acm.nmsc.yaml

cat << EOF > /data/install/acm.clusterdeployment.yaml
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
spec:
  baseDomain: ${SNO_BASE_DOMAIN}
  clusterName: ${ACM_DEMO_CLUSTER}
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  clusterInstallRef:
    group: extensions.hive.openshift.io
    kind: AgentClusterInstall
    name: ${ACM_DEMO_CLUSTER}
    version: v1beta1
  platform:
    agentBareMetal:
      agentSelector:
        matchLabels:
          cluster-name: "${ACM_DEMO_CLUSTER}"
  pullSecretRef:
    name: assisted-deployment-pull-secret
EOF
oc create -f /data/install/acm.clusterdeployment.yaml

oc get ClusterDeployment/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o json | jq .status | head
# {
#   "conditions": [
#     {
#       "lastProbeTime": "2022-04-08T14:58:11Z",
#       "lastTransitionTime": "2022-04-08T14:58:11Z",
#       "message": "Platform credentials passed authentication check",
#       "reason": "PlatformAuthSuccess",
#       "status": "False",
#       "type": "AuthenticationFailure"
#     },

cat << EOF > /data/install/acm.agentclusterinstall.yaml
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
  # Only include the annotation if using OVN, otherwise omit the annotation
#   annotations:
#     agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
spec:
  clusterDeploymentRef:
    name: ${ACM_DEMO_CLUSTER}
  imageSetRef:
    name: openshift-v4.10.4
  networking:
    clusterNetwork:
      - cidr: "10.128.0.0/14"
        hostPrefix: 23
    serviceNetwork:
      - "172.30.0.0/16"
    machineNetwork:
      - cidr: "192.168.7.0/24"
  provisionRequirements:
    controlPlaneAgents: 1
  sshPublicKey: "$(< ~/.ssh/id_rsa.pub)"
EOF
oc create -f /data/install/acm.agentclusterinstall.yaml
# oc delete -f /data/install/acm.agentclusterinstall.yaml

# wait a moment, and this will be ok
oc get AgentClusterInstall/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o json | jq .status | head
# {
#   "conditions": [
#     {
#       "lastProbeTime": "2022-04-06T13:50:43Z",
#       "lastTransitionTime": "2022-04-06T13:50:43Z",
#       "message": "SyncOK",
#       "reason": "SyncOK",
#       "status": "True",
#       "type": "SpecSynced"
#     },

cat << EOF > /data/install/acm.klusterletaddonconfig.yaml
apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
spec:
  clusterName: ${ACM_DEMO_CLUSTER}
  clusterNamespace: ${ACM_DEMO_CLUSTER}
  clusterLabels:
    cloud: auto-detect
    vendor: auto-detect
  applicationManager:
    enabled: true
  certPolicyController:
    enabled: true
  iamPolicyController:
    enabled: true
  policyController:
    enabled: true
  searchCollector:
    enabled: true 
EOF
oc create -f /data/install/acm.klusterletaddonconfig.yaml

oc get KlusterletAddonConfig/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o yaml
# apiVersion: agent.open-cluster-management.io/v1
# kind: KlusterletAddonConfig
# metadata:
#   creationTimestamp: "2022-04-06T13:51:19Z"
#   generation: 1
#   name: acm-demo1
#   namespace: acm-demo1
#   resourceVersion: "2187935"
#   uid: 1615ed53-80a3-48d2-823f-8dff08a97d75
# spec:
#   applicationManager:
#     enabled: true
#   certPolicyController:
#     enabled: true
#   clusterLabels:
#     cloud: auto-detect
#     vendor: auto-detect
#   clusterName: acm-demo1
#   clusterNamespace: acm-demo1
#   iamPolicyController:
#     enabled: true
#   policyController:
#     enabled: true
#   searchCollector:
#     enabled: true
# status:
#   conditions:
#   - lastTransitionTime: "2022-04-06T13:51:19Z"
#     message: The cluster is not provisioned by ACM.
#     reason: OCPGlobalProxyNotDetected
#     status: "False"
#     type: OCPGlobalProxyDetected
#   ocpGlobalProxy: {}

cat << EOF > /data/install/acm.managedcluster.yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: ${ACM_DEMO_CLUSTER}
spec:
  hubAcceptsClient: true
EOF
oc create -f /data/install/acm.managedcluster.yaml

# 我们是离线安装,所以要定制一下启动配置文件
# generate the ignition

cat << EOF > /data/sno/ign.base.json
{
  "ignition": {
    "version": "3.1.0"
  }
}
EOF

cat << EOF > /data/sno/install.images.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-install-images
storage:
  files:
    - path: /etc/containers/registries.conf.d/base.registries.conf
      overwrite: true
      contents:
        inline: |
          unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
          short-name-mode = ""

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-release"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4"

EOF

cat << EOF > /data/sno/install.crts.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-install-crts
storage:
  files:
    - path: /etc/pki/ca-trust/source/anchors/quaylab.crt
      overwrite: true
      contents:
        inline: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/          /g' )

EOF

mkdir -p /data/sno/disconnected/
# copy ntp related config
/bin/cp -f  /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/sno/disconnected/

# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.infra.redhat.ren:8083

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml /data/sno/disconnected/
/bin/cp -f /data/ocp4/99-master-container-registries.yaml /data/sno/disconnected/

cd /data/sno/
# load ignition file generation function
source /data/ocp4/acm.fn.sh


get_file_content_for_ignition "/opt/openshift/openshift/99-master-chrony-configuration.yaml" "/data/sno/disconnected/99-master-chrony-configuration.yaml"
VAR_99_master_chrony=$RET_VAL
VAR_99_master_chrony_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-worker-chrony-configuration.yaml" "/data/sno/disconnected/99-worker-chrony-configuration.yaml"
VAR_99_worker_chrony=$RET_VAL
VAR_99_worker_chrony_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-master-container-registries.yaml" "/data/sno/disconnected/99-master-container-registries.yaml"
VAR_99_master_container_registries=$RET_VAL
VAR_99_master_container_registries_2=$RET_VAL_2

get_file_content_for_ignition "/opt/openshift/openshift/99-worker-container-registries.yaml" "/data/sno/disconnected/99-worker-container-registries.yaml"
VAR_99_worker_container_registries=$RET_VAL
VAR_99_worker_container_registries_2=$RET_VAL_2

butane /data/sno/install.images.bu > /data/sno/disconnected/99-zzz-master-install-images.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-install-images.yaml" "/data/sno/disconnected/99-zzz-master-install-images.yaml"
VAR_99_master_install_images=$RET_VAL
VAR_99_master_install_images_2=$RET_VAL_2

butane /data/sno/install.crts.bu > /data/sno/disconnected/99-zzz-master-install-crts.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-install-crts.yaml" "/data/sno/disconnected/99-zzz-master-install-crts.yaml"
VAR_99_master_install_crts=$RET_VAL
VAR_99_master_install_crts_2=$RET_VAL_2

# https://access.redhat.com/solutions/6194821
# butane /data/sno/static.ip.bu | python3 -c 'import json, yaml, sys; print(json.dumps(yaml.load(sys.stdin)))'

# https://stackoverflow.com/questions/2854655/command-to-escape-a-string-in-bash
# VAR_PULL_SEC=`printf "%q" $(cat  /data/pull-secret.json)`

# https://access.redhat.com/solutions/221403
# VAR_PWD_HASH="$(openssl passwd -1 -salt 'openshift' 'redhat')"
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

tmppath=$(mktemp)
cat /data/sno/ign.base.json \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq --argjson VAR "$VAR_99_master_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_chrony" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_worker_container_registries" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_chrony_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_container_registries_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_images_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_install_crts_2" '.storage.files += [$VAR] ' \
  | jq -c . \
  > ${tmppath}
VAR_IGNITION=$(cat ${tmppath})
rm -f ${tmppath}
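把拼好的 ignition 塞进 InfraEnv 之前,建议先确认它是合法的 json(示例命令):

echo "${VAR_IGNITION}" | jq empty && echo "VAR_IGNITION is valid json"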


cat << EOF > /data/install/acm.infraenv.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
spec:
  additionalNTPSources:
    - 192.168.7.11
  clusterRef:
    name: ${ACM_DEMO_CLUSTER}
    namespace: ${ACM_DEMO_CLUSTER}
  sshAuthorizedKey: "$(< ~/.ssh/id_rsa.pub)"
  pullSecretRef:
    name: assisted-deployment-pull-secret
  ignitionConfigOverride: '${VAR_IGNITION}'
  nmStateConfigLabelSelector:
    matchLabels:
      nmstate-conf-cluster-name: ${ACM_DEMO_CLUSTER}
  # imageType: "full-iso"
EOF
oc create -f /data/install/acm.infraenv.yaml
# oc delete -f /data/install/acm.infraenv.yaml

oc get infraenv/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o json | jq .status
# {
#   "agentLabelSelector": {
#     "matchLabels": {
#       "infraenvs.agent-install.openshift.io": "acm-demo1"
#     }
#   },
#   "conditions": [
#     {
#       "lastTransitionTime": "2022-04-06T13:52:54Z",
#       "message": "Image has been created",
#       "reason": "ImageCreated",
#       "status": "True",
#       "type": "ImageCreated"
#     }
#   ],
#   "createdTime": "2022-04-06T13:52:54Z",
#   "debugInfo": {
#     "eventsURL": ""
#   },
#   "isoDownloadURL": "https://assisted-image-service-open-cluster-management.apps.acm-demo-hub.redhat.ren/images/a87141aa-d980-4f34-ba59-d236e2158c98?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJhODcxNDFhYS1kOTgwLTRmMzQtYmE1OS1kMjM2ZTIxNThjOTgifQ.muD_hlhMIgcNaAZk00M09QW-EwI1REGkxavKo26P-CZ_IkPR3GcdPhWLVtBjdTkrcAOgt__pcWkmJQyko5sqtw&arch=x86_64&type=minimal-iso&version=4.10"
# }


# VAR_ISO=`oc get infraenv ${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o jsonpath={.status.isoDownloadURL}`

# cd /data/install/
# wget --no-check-certificate -O acm.demo1.iso $VAR_ISO

oc get pod -A | grep metal3
# openshift-machine-api                              metal3-6b7b4665f6-knwzr                                           7/7     Running     0               39m
# openshift-machine-api                              metal3-image-cache-hhvnm                                          1/1     Running     1               6h14m
# openshift-machine-api                              metal3-image-customization-577f886bb4-cwl2l                       1/1     Running     1               6h13m

cd /data/ocp4/
cat << 'EOF' > redfish.sh
#!/usr/bin/env bash

curl -k -s https://192.168.7.1:8000/redfish/v1/Systems/ | jq -r '.Members[]."@odata.id"' >  list

while read -r line; do
    curl -k -s https://192.168.7.1:8000/$line | jq -j '.Id, " ", .Name, "\n" '
done < list

EOF
bash redfish.sh > /data/install/vm.list
cat /data/install/vm.list
# 075b17f7-9be9-4576-8d72-2ddd99909e19 ocp4-acm-demo1-master0
# c991312a-26de-438d-8c2d-6aa6cd586bca ocp4-master0
# e70f66bc-7878-4617-811d-89cdaf62cc8c ocp4-Helper

# oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true}}'

cat << EOF > /data/install/acm.demo.secret.bmc.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ${ACM_DEMO_CLUSTER}-bmc-master-0
  namespace: ${ACM_DEMO_CLUSTER}
data:
  password: $(echo -n "password" | base64)
  username: $(echo -n "admin" | base64)
type: Opaque
EOF
oc create -f /data/install/acm.demo.secret.bmc.yaml

cat << EOF > /data/install/acm.demo.bmh.master.yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${ACM_DEMO_CLUSTER}-master0
  namespace: ${ACM_DEMO_CLUSTER}
  labels:
    infraenvs.agent-install.openshift.io: "${ACM_DEMO_CLUSTER}"
  annotations:
    ## Disable the Introspection
    inspect.metal3.io: disabled
    ## Set Static Hostname
    bmac.agent-install.openshift.io/hostname: "${SNO_HOSTNAME}"
    ## Set Static Role
    bmac.agent-install.openshift.io/role: "master"
spec:
  online: true
  bmc:
    address: redfish-virtualmedia://192.168.7.1:8000/redfish/v1/Systems/$(cat /data/install/vm.list | grep acm-demo1-master0 | awk '{print $1}')
    credentialsName: ${ACM_DEMO_CLUSTER}-bmc-master-0
    disableCertificateVerification: true
  bootMACAddress: $(cat /data/install/mac.list | grep acm-demo1-master0 | awk '{print $2}')
  automatedCleaningMode: disabled
EOF
oc create -f /data/install/acm.demo.bmh.master.yaml
# oc delete -f /data/install/acm.demo.bmh.master.yaml
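BareMetalHost 创建之后,可以用下面的命令观察 agent 的注册和安装进度(示例命令):

oc get agent -n ${ACM_DEMO_CLUSTER}

oc get agentclusterinstall ${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o json \
  | jq -r '.status.conditions[] | "\(.type): \(.status) \(.message)"'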

我们回到ACM的界面中,能从基础架构中,看到我们新创建的HOST了,能看到ACM正在通过redfish配置这个kvm

这个bare metal host其实是调用的openshift4平台上的服务创建的,所以从openshift4的console上也能看得到:

能从openshift4 console上看到这个bare metal host的详细信息:

回到ACM的界面中,我们能看到安装正在继续:

从ACM的cluster界面中,我们能看到安装的详细进展情况:

但是安装的中途,提示我们需要动手操作一下。这是因为我们是用kvm模拟的物理机,并且模拟了一个redfish,这个redfish功能比较简单,在安装ocp的过程中,kvm会重启,但是远程挂载的光盘没有卸载,所以我们需要卸载掉这个光驱,然后继续安装:

进入kvm的界面,调整一下启动顺序:

然后重启kvm,等待一段时间,infra env就安装完成了。

不过,cluster还在继续安装,我们安心等待安装过程完成。

安装完成

装好了以后,我们在ACM里面就能看到如下景象: https://multicloud-console.apps.acm-demo-hub.redhat.ren/overview

cluster 也能看到了绿色的正常状态了, 这里面local-cluster是ACM hub所在的集群:

看cluster的详细信息,也正常了:

⚠️一定记得,下载kubeconfig文件,还有密码

cluster的node tab也有内容了:

cluster的add-on,也装上了我们之前配置的组件:

infra env也绿色状态了 https://multicloud-console.apps.acm-demo-hub.redhat.ren/multicloud/infra-environments

详细信息和原来一样:

hosts tab 也完成了


# on helper
useradd -m wzh

su - wzh
mkdir ~/auth

# upload kubeconfig.json to /home/wzh/auth/

ansible localhost -m lineinfile -a 'path=$HOME/.bashrc regexp="^export KUBECONFIG" line="export KUBECONFIG=~/auth/kubeconfig.json"'
source $HOME/.bashrc

check if dhcp exists

我们是静态IP安装,那么就要确认一下环境里面是不是真的把 DHCP 给关了,检查的方法如下。

https://superuser.com/questions/750359/check-if-a-dhcp-server-existing-in-my-network-using-bash

dnf install nmap -y

# 检查环境里有没有 IPv4 的 DHCP server
nmap --script broadcast-dhcp-discover -e enp1s0

# 检查环境里有没有 IPv6 的 DHCP server
nmap --script broadcast-dhcp6-discover -e enp1s0

end


# revert the order
tac << EOF 
oc delete -f /data/install/acm.ocp.release.yaml
oc delete -f /data/install/acm.cm.asc.yaml
oc delete -f /data/install/acm.secret.yaml
oc delete -f /data/install/acm.mirror.yaml
oc delete -f /data/install/acm.agentservicecofnig.yaml
oc delete -f /data/install/acm.managed.secret.yaml
oc delete -f /data/install/acm.agentclusterinstall.yaml
oc delete -f /data/install/acm.nmsc.yaml
oc delete -f /data/install/acm.clusterdeployment.yaml
oc delete -f /data/install/acm.klusterletaddonconfig.yaml
oc delete -f /data/install/acm.managedcluster.yaml
oc delete -f /data/install/acm.infraenv.yaml
EOF
oc delete -f /data/install/acm.infraenv.yaml
oc delete -f /data/install/acm.managedcluster.yaml
oc delete -f /data/install/acm.klusterletaddonconfig.yaml
oc delete -f /data/install/acm.clusterdeployment.yaml
oc delete -f /data/install/acm.nmsc.yaml
oc delete -f /data/install/acm.agentclusterinstall.yaml
oc delete -f /data/install/acm.managed.secret.yaml
oc delete -f /data/install/acm.agentservicecofnig.yaml
oc delete -f /data/install/acm.mirror.yaml
oc delete -f /data/install/acm.secret.yaml
oc delete -f /data/install/acm.cm.asc.yaml
oc delete -f /data/install/acm.ocp.release.yaml


coreos 启动分析

我们知道,coreos 是内核和根文件系统,一起打包升级的,也就是所谓的 A/B 切换升级,那么他到底是怎么实现这个的呢?现在我们就来分析一下。

视频讲解

首先我们分析一下 /boot 分区

ls -hl /boot/loader/entries/

total 4.0K
-rw-r--r--. 1 root root 629 Apr  9 11:57 ostree-1-rhcos.conf
-rw-r--r--. 1 root root 630 Apr  9 11:57 ostree-2-rhcos.conf

cat /boot/loader/entries/*.conf

title Red Hat Enterprise Linux CoreOS 49.84.202110081407-0 (Ootpa) (ostree:1)
version 1
options random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.0/rhcos/a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/0 ip=enp1s0:dhcp root=UUID=0a0d4701-04bf-45a2-8b9b-f761542a617a rw rootflags=prjquota
linux /ostree/rhcos-a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/vmlinuz-4.18.0-305.19.1.el8_4.x86_64
initrd /ostree/rhcos-a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/initramfs-4.18.0-305.19.1.el8_4.x86_64.img

title Red Hat Enterprise Linux CoreOS 410.84.202203081640-0 (Ootpa) (ostree:0)
version 2
options random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.0/rhcos/838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/0 ip=enp1s0:dhcp root=UUID=0a0d4701-04bf-45a2-8b9b-f761542a617a rw rootflags=prjquota
linux /ostree/rhcos-838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/vmlinuz-4.18.0-305.40.1.el8_4.x86_64
initrd /ostree/rhcos-838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/initramfs-4.18.0-305.40.1.el8_4.x86_64.img

我们可以清晰地看到,这里面定义了2个启动入口,每个入口分别对应 /boot/ostree/rhcos-<哈希>/vmlinuz-<内核版本> 和 /boot/ostree/rhcos-<哈希>/initramfs-<内核版本>.img,另外启动参数 options 里的 ostree=/ostree/boot.0/rhcos/<哈希>/0 指向了对应的 ostree 部署。所以,我们得出结论,这个A/B切换,在系统启动的时候,是通过grub2的这些启动项配置实现的,看上去并不复杂。

参考文档:

  • https://access.redhat.com/solutions/5847011
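除了直接看 /boot/loader/entries,也可以用 ostree 自带的命令查看这两个部署,效果是一样的(示例命令):

rpm-ostree status

ostree admin status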

再分析一下 mount

lsblk

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sr0     11:0    1   104M  0 rom
vda    252:0    0   120G  0 disk
├─vda1 252:1    0     1M  0 part
├─vda2 252:2    0   127M  0 part
├─vda3 252:3    0   384M  0 part /boot
└─vda4 252:4    0 119.5G  0 part /sysroot

mount | grep vda4

/dev/vda4 on /sysroot type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /etc type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /usr type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/containers/storage/overlay type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/1 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/2 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/3 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/4 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/5 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)

This is confusing at first: the mount output shows vda4 mounted many times, each time at a different path. Why is that?

cat /proc/1/mountinfo  | grep vda4

99 102 252:4 / /sysroot ro,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
102 1 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0 / rw,relatime shared:1 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
103 102 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc /etc rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
104 102 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/usr /usr ro,relatime shared:3 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
133 102 252:4 /ostree/deploy/rhcos/var /var rw,relatime shared:4 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
299 133 252:4 /ostree/deploy/rhcos/var/lib/containers/storage/overlay /var/lib/containers/storage/overlay rw,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
7886 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/modprobe.d /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/1 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
5920 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysconfig /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/2 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
7429 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysctl.d /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/3 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
7965 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysctl.conf /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/4 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
8491 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/systemd /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/5 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota

The answer is in /proc/1/mountinfo. Let's go through its content carefully, especially the entries for the root filesystem.

  1. In the first line, /dev/vda4 is the device, xfs is the filesystem on that device, the first / is the subtree of the device being mounted, and /sysroot is where that subtree is mounted in the current mount namespace.
  2. In the second line, /dev/vda4 is again the device, xfs the filesystem, /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0 is the subtree on the device, and / is where it is mounted.

So, to summarize: the on-disk layout of /dev/vda4 is not a normal root filesystem layout; after the system boots, the key paths are bind-mounted into place out of the ostree deployment directories.
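
findmnt presents the same information as /proc/1/mountinfo in a friendlier table; a sketch to see which subtree of vda4 backs each mount point:

# FSROOT is the directory on the device, TARGET is where it ends up in the hierarchy
findmnt -t xfs -o TARGET,FSROOT,SOURCE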


find /sysroot -maxdepth 3

/sysroot
/sysroot/boot
/sysroot/ostree
/sysroot/ostree/repo
/sysroot/ostree/repo/config
/sysroot/ostree/repo/tmp
/sysroot/ostree/repo/extensions
/sysroot/ostree/repo/state
/sysroot/ostree/repo/refs
/sysroot/ostree/repo/objects
/sysroot/ostree/repo/.lock
/sysroot/ostree/deploy
/sysroot/ostree/deploy/rhcos
/sysroot/ostree/boot.0.1
/sysroot/ostree/boot.0.1/rhcos
/sysroot/ostree/boot.0
/sysroot/.coreos-aleph-version.json

Investigating the systemd units related to mounting the filesystems

systemctl cat ostree-remount.service

[Unit]
Description=OSTree Remount OS/ Bind Mounts
Documentation=man:ostree(1)
DefaultDependencies=no
ConditionKernelCommandLine=ostree
OnFailure=emergency.target
Conflicts=umount.target
# Run after core mounts
After=-.mount var.mount
After=systemd-remount-fs.service
# But we run *before* most other core bootup services that need write access to /etc and /var
Before=local-fs.target umount.target
Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service
Before=systemd-tmpfiles-setup.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/ostree/ostree-remount
StandardInput=null
StandardOutput=journal
StandardError=journal+console

[Install]
WantedBy=local-fs.target

systemctl list-unit-files | grep mount

proc-sys-fs-binfmt_misc.automount                                      static
boot.mount                                                             generated
dev-hugepages.mount                                                    static
dev-mqueue.mount                                                       static
proc-fs-nfsd.mount                                                     static
proc-sys-fs-binfmt_misc.mount                                          static
run-vmblock\x2dfuse.mount                                              disabled
sys-fs-fuse-connections.mount                                          static
sys-kernel-config.mount                                                static
sys-kernel-debug.mount                                                 static
tmp.mount                                                              disabled
var-lib-nfs-rpc_pipefs.mount                                           static
var.mount                                                              generated
dracut-mount.service                                                   static
dracut-pre-mount.service                                               static
nfs-mountd.service                                                     static
ostree-remount.service                                                 disabled
systemd-remount-fs.service                                             static
umount.target                                                          static

systemctl cat dracut-mount.service
# /usr/lib/systemd/system/../../dracut/modules.d/98dracut-systemd/dracut-mount.service
#  This file is part of dracut.
#
# See dracut.bootup(7) for details

[Unit]
Description=dracut mount hook
Documentation=man:dracut-mount.service(8)
After=initrd-root-fs.target initrd-parse-etc.service
After=dracut-initqueue.service dracut-pre-mount.service
ConditionPathExists=/usr/lib/initrd-release
ConditionDirectoryNotEmpty=|/lib/dracut/hooks/mount
ConditionKernelCommandLine=|rd.break=mount
DefaultDependencies=no
Conflicts=shutdown.target emergency.target

[Service]
Environment=DRACUT_SYSTEMD=1
Environment=NEWROOT=/sysroot
Type=oneshot
ExecStart=-/bin/dracut-mount
StandardInput=null
StandardOutput=syslog
StandardError=syslog+console
KillMode=process
RemainAfterExit=yes

# Bash ignores SIGTERM, so we send SIGHUP instead, to ensure that bash
# terminates cleanly.
KillSignal=SIGHUP

Reference:

  • https://man7.org/linux/man-pages/man7/dracut.bootup.7.html
  • https://ostreedev.github.io/ostree/adapting-existing/#booting-and-initramfs-technology

openshift 4.10 single node, installer-based installation, disconnected with static IP

openshift single node can be installed with the installer, but many customers run into problems with it, so let's walk through it here.

This document has a prerequisite lab: creating the helper node, which serves as a jump host and simulates the proxy of a disconnected environment.

The internal installation logic of the installer:

Video explanation

on helper node


NODE_SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren:8443

PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:shadowman' | openssl base64 )'","email": "noemail@localhost"}}}'

NTP_SERVER=192.168.7.11
HELP_SERVER=192.168.7.11
KVM_HOST=192.168.7.11
API_VIP=192.168.7.100
INGRESS_VIP=192.168.7.101
CLUSTER_PROVISION_IP=192.168.7.103
BOOTSTRAP_IP=192.168.7.12

ACM_DEMO_MNGED_CLUSTER=acm-demo1
ACM_DEMO_MNGED_SNO_IP=192.168.7.15

# define the node info for the single node cluster
SNO_CLUSTER_NAME=acm-demo-hub
SNO_BASE_DOMAIN=redhat.ren
SNO_IP=192.168.7.13
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

echo ${SNO_IF_MAC} > /data/sno/sno.mac


mkdir -p /data/install
cd /data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0 
controlPlane:
  name: master
  replicas: 1 
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
bootstrapInPlace:
  installationDisk: $SNO_DISK
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat /root/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

openshift-install create manifests --dir=/data/install

/bin/cp -f  /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/install/openshift/

# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.infra.redhat.ren:8083

/bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml /data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml /data/install/openshift

cd /data/install/

openshift-install --dir=/data/install create single-node-ignition-config

alias coreos-installer='podman run --privileged --rm \
        -v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
        -w /data quay.io/coreos/coreos-installer:release'

# /bin/cp -f bootstrap-in-place-for-live-iso.ign iso.ign

cat << EOF > /data/sno/static.hostname.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-hostname
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      overwrite: true
      contents:
        inline: |
          ${SNO_HOSTNAME}

EOF


cat << EOF > /data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip
storage:
  files:
    - path: /etc/NetworkManager/system-connections/${SNO_IF}.nmconnection
      mode: 0600
      overwrite: true
      contents:
        inline: |
          [connection]
          id=${SNO_IF}
          type=ethernet
          autoconnect-retries=1
          interface-name=${SNO_IF}
          multi-connect=1
          permissions=
          wait-device-timeout=60000

          [ethernet]
          mac-address-blacklist=

          [ipv4]
          address1=${SNO_IP}/${SNO_NETMAST_S},${SNO_GW}
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns=${SNO_DNS};
          dns-search=
          may-fail=false
          method=manual

          [ipv6]
          addr-gen-mode=eui64
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns-search=
          method=disabled

          [proxy]

EOF

source /data/ocp4/acm.fn.sh

# butane /data/sno/static.bootstrap.ip.bu > /data/sno/disconnected/99-zzz-bootstrap-ip.yaml
# get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-bootstrap-ip.yaml" "/data/sno/disconnected/99-zzz-bootstrap-ip.yaml"
# VAR_99_master_bootstrap_ip=$RET_VAL
# VAR_99_master_bootstrap_ip_2=$RET_VAL_2

butane /data/sno/static.hostname.bu > /data/sno/disconnected/99-zzz-master-static-hostname.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-static-hostname.yaml" "/data/sno/disconnected/99-zzz-master-static-hostname.yaml"
VAR_99_master_master_static_hostname=$RET_VAL
VAR_99_master_master_static_hostname_2=$RET_VAL_2

butane /data/sno/static.ip.bu > /data/sno/disconnected/99-zzz-master-ip.yaml
get_file_content_for_ignition "/opt/openshift/openshift/99-zzz-master-ip.yaml" "/data/sno/disconnected/99-zzz-master-ip.yaml"
VAR_99_master_ip=$RET_VAL
VAR_99_master_ip_2=$RET_VAL_2


# we create a wzh user with the password redhat, so that on first boot you can log in directly from the console/ssh with username and password
# handy for troubleshooting and investigation
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

# tmppath=$(mktemp)
cat /data/install/bootstrap-in-place-for-live-iso.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq --argjson VAR "$VAR_99_master_ip_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_master_static_hostname" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_ip" '.storage.files += [$VAR] ' \
  | jq -c . \
  > /data/install/iso.ign
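
A quick sanity check on the merged ignition before embedding it (a sketch, assuming the helper function builds file entries with the 99-zzz target paths used above):

# the extra user and the injected files should show up in the merged ignition
jq '.passwd.users[].name' /data/install/iso.ign
jq -r '.storage.files[].path' /data/install/iso.ign | grep 99-zzz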

# jump to other document here, if you want to customize the ignition file for partition and user
# then comeback

/bin/cp -f /data/ocp4/rhcos-live.x86_64.iso sno.iso

coreos-installer iso ignition embed -fi iso.ign sno.iso
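
coreos-installer can also print the ignition back out of the ISO, which is handy to confirm the embed really happened (a sketch, using the container alias defined above):

coreos-installer iso ignition show sno.iso | jq '.passwd.users[].name'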

on kvm host ( 103 )


# create the virtual network for the lab

mkdir -p /data/kvm
cd /data/kvm

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.103/24"
nmcli con up baremetal

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

pvcreate -y /dev/vdb
vgcreate vgdata /dev/vdb

# https://access.redhat.com/articles/766133
lvcreate -y -n poolA -L 500G vgdata
lvcreate -y -n poolA_meta -L 10G vgdata
lvconvert -y --thinpool vgdata/poolA --poolmetadata vgdata/poolA_meta
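
To confirm the thin pool exists before carving thin LVs out of it (a sketch):

lvs -a -o lv_name,lv_size,pool_lv,lv_attr vgdata
# poolA should show a 't' (thin pool) attribute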

scp root@192.168.7.11:/data/install/sno.iso /data/kvm/

virsh destroy ocp4-acm-hub
virsh undefine ocp4-acm-hub

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub 100G recreate
create_lv vgdata poolA lvacmhub-data 100G recreate

SNO_MEM=64

virt-install --name=ocp4-acm-hub-master01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacmhub,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio \
  --graphics vnc,port=59002 \
  --boot menu=on --cdrom /data/kvm/sno.iso 

# --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \

on helper to see result

cd /data/install
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

cd /data/install
openshift-install wait-for install-complete --log-level debug
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.acm-demo-hub.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "M5hQw-NizfX-qKzEq-eUnNk"
# DEBUG Time elapsed per stage:
# DEBUG Cluster Operators: 9m39s
# INFO Time elapsed: 9m39s

back and merge kubeconfig


mkdir -p ~/.kube/bak/

var_date=$(date '+%Y-%m-%d-%H%M')

/bin/cp -f /data/install/auth/kubeconfig ~/.kube/bak/kubeconfig-$var_date
/bin/cp -f /data/install/auth/kubeadmin-password ~/.kube/bak/kubeadmin-password-$var_date

sed "s/admin/admin\/$SNO_CLUSTER_NAME/g" /data/install/auth/kubeconfig > /tmp/config.new

# https://medium.com/@jacobtomlinson/how-to-merge-kubernetes-kubectl-config-files-737b61bd517d
/bin/cp -f ~/.kube/config ~/.kube/config.bak && KUBECONFIG=~/.kube/config:/tmp/config.new kubectl config view --flatten > /tmp/config && /bin/mv -f /tmp/config ~/.kube/config

unset KUBECONFIG
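
After the merge, the renamed context should be visible next to any existing ones (a sketch):

oc config get-contexts
# the admin/<cluster-name> entry comes from the sed rename above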

add worker node

Now that the single node is installed, we can add worker nodes to it, turning the single node cluster into a single-master cluster.


# first, lets stick ingress to master
oc label node acm-demo-hub-master  ocp-ingress-run="true"

oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"ocp-ingress-run":"true"}}}}}'

# we are testing env, so we don't need ingress replicas.
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 1}}' --type=merge ingresscontroller/default

oc get -n openshift-ingress-operator ingresscontroller/default -o yaml

# then we get worker's ignition file, and start worker node, add it to cluster

oc extract -n openshift-machine-api secret/worker-user-data --keys=userData --to=- > /var/www/html/ignition/sno-worker.ign


HELP_SERVER=192.168.7.11

# define the node info for the new worker node
SNO_IP=192.168.7.16
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-worker-01
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_MEM=16

BOOT_ARG=" ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS coreos.inst.install_dev=${SNO_DISK##*/} coreos.inst.ignition_url=http://$HELP_SERVER:8080/ignition/sno-worker.ign"

/bin/cp -f /data/ocp4/rhcos-live.x86_64.iso sno.iso

coreos-installer iso kargs modify -a "$BOOT_ARG" sno.iso

# go to kvm host ( 103 )
scp root@192.168.7.11:/data/install/sno.iso /data/kvm/

virsh destroy ocp4-acm-hub-worker01
virsh undefine ocp4-acm-hub-worker01

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub-worker01 120G recreate
# create_lv vgdata poolA lvacmhub-worker01-data 100G remove

virt-install --name=ocp4-acm-hub-worker01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacmhub-worker01,device=disk,bus=virtio,format=raw \
  `# --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw` \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio \
  --graphics vnc,port=59003 \
  --boot menu=on --cdrom /data/kvm/sno.iso 

# after 2 boot up,
# go back to helper
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
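
Worker nodes raise CSRs in two rounds (first the node-bootstrapper client cert, then the kubelet serving cert), so the approval usually has to be repeated; a small loop like this (just a sketch) avoids re-running the command by hand:

# approve any pending CSRs every 30s until the worker shows up Ready, then Ctrl-C
while true; do
  oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs -r oc adm certificate approve
  oc get nodes
  sleep 30
done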

end

openshift 4.10, single node with customized partition

A common problem in PoCs: the customer's host has only one 1 TB SSD. We know that a default ocp install takes over this first disk and creates 4 partitions, with the root partition filling almost the whole disk. In practice ocp only uses around 100 GB at runtime, so most of that 1 TB SSD is wasted. We would like to carve roughly 800 GB out of the disk for a storage solution such as ceph/odf. So how do we customize the installer during installation so that it creates a 5th partition for us?

This document describes how, on single node openshift with a single disk, to create additional partitions on that disk to use as data partitions.

This is useful in some resource-constrained PoC scenarios, for example running ceph on a single-disk single node.

There is a piece of background knowledge, or rather a prerequisite lab, for this document: how to install a plain single node openshift.

The internal installation logic is as follows:

Video explanation

additional steps during sno install


# download yq and install
mkdir tmp; cd tmp
wget https://github.com/mikefarah/yq/releases/download/v4.25.1/yq_linux_amd64.tar.gz
tar -zxvf yq_linux_amd64.tar.gz
install yq_linux_amd64 /usr/local/bin/yq

# calculate a password
# VAR_PWD=`podman run -ti --rm quay.io/coreos/mkpasswd --method=yescrypt redhat`
# $y$j9T$UCg7ef5in/0aw0C2ZqSFo.$n8gC9.kDzWwlq0GmXKDVH8KUuGNdj7l6tnAsR4RZaG5

VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

# # https://docs.fedoraproject.org/en-US/fedora-coreos/storage/#_setting_up_separate_var_mounts
# cat << EOF > /data/sno/root-partition.bu
# variant: openshift
# version: 4.8.0
# metadata:
#   name: root-storage
#   labels:
#     machineconfiguration.openshift.io/role: master
# storage:
#   disks:
#     - device: /dev/vda
#       wipe_table: false
#       partitions:
#         - number: 4
#           label: root
#           size_mib: $(( 120 * 1024 ))
#           resize: true
#         - label: data_odf_lvm
#           size_mib: 0
# EOF

# butane /data/sno/root-partition.bu -r -o /data/install/partition-ric.ign

# # merge the 2 ignition files
# jq -s '.[0] * .[1]' /data/install/iso.ign /data/install/partition-ric.ign | jq -c . > /data/install/iso.ign.new

# /bin/cp -f /data/install/iso.ign.new  /data/install/iso.ign

# https://github.com/openshift/installer/blob/master/data/data/bootstrap/bootstrap-in-place/files/opt/openshift/bootstrap-in-place/master-update.fcc
# cat iso.ign | jq ' .storage.files[] | select ( .path == "/opt/openshift/bootstrap-in-place/master-update.fcc" ) ' | jq -r .contents.source | sed 's/.*base64,//g' | base64 -d > /data/install/master-update.fcc



cat << EOF > /data/install/root-partition.fc
variant: fcos
version: 1.3.0
#   !!! do not include passwd / users in production system !!!
passwd:
  users:
    - name: wzh
      password_hash: $VAR_PWD_HASH
      system: true
      ssh_authorized_keys:
        - $NODE_SSH_KEY
      groups:
        - adm
        - wheel
        - sudo
        - systemd-journal
storage:
  disks:
    - device: /dev/vda
      wipe_table: false
      partitions:
        - number: 4
          label: root
          size_mib: $(( 120 * 1024 ))
          resize: true
        # - label: data_01
        #   size_mib: $(( 5 * 1024 ))
        - label: data_odf_lvm
          size_mib: 0
EOF

butane /data/install/root-partition.fc -r -o /data/install/partition-ric.ign


cat << EOF > /data/sno/root-partition.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-root-partition
storage:
  files:
    - path: /opt/openshift/partition-ric.ign
      mode: 0644
      overwrite: true
      contents:
        local: partition-ric.ign

EOF


# yq '. *= load("/data/install/master-update.fcc")' /data/install/root-partition.fc > /data/install/root-partition.fcc

# config_source=$(cat /data/install/root-partition.fcc | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )
# VAR_FCC_FILE="data:text/plain,${config_source}"

butane -d /data/install /data/sno/root-partition.bu > /data/sno/disconnected/99-zzz-master-root-partition.yaml
get_file_content_for_ignition "/opt/openshift/partition-ric.ign" "/data/sno/disconnected/99-zzz-master-root-partition.yaml"
VAR_99_master_fcc=$RET_VAL
VAR_99_master_fcc_2=$RET_VAL_2


cat iso.ign | jq ' .storage.files[] | select ( .path == "/usr/local/bin/bootstrap-in-place.sh" ) ' | jq -r .contents.source | sed 's/.*base64,//g' | base64 -d > /data/install/bootstrap-in-place.sh

# try to replace

# merge the 2 ignition files
cat << EOF > /data/install/bootstrap-in-place.sh.patch
jq -s '.[0] * .[1]' /opt/openshift/master.ign /opt/openshift/partition-ric.ign | jq -c . > /opt/openshift/master.ign.new
/bin/cp -f /opt/openshift/master.ign.new /opt/openshift/master.ign
EOF

# https://stackoverflow.com/questions/26141347/using-sed-to-insert-file-content-into-a-file-before-a-pattern
sed $'/touch master-ignition.done/{e cat \/data\/install\/bootstrap-in-place.sh.patch\n}' /data/install/bootstrap-in-place.sh > /data/install/bootstrap-in-place.sh.new
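
To confirm the sed actually injected the merge step right before the marker line (a sketch):

grep -B 4 'touch master-ignition.done' /data/install/bootstrap-in-place.sh.new
# the two lines from bootstrap-in-place.sh.patch should appear just above the marker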

cat << EOF > /data/sno/bootstrap-in-place.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-bootstrap-in-place
storage:
  files:
    - path: /usr/local/bin/bootstrap-in-place.sh
      mode: 0555
      overwrite: true
      contents:
        local: bootstrap-in-place.sh.new

EOF

butane -d /data/install /data/sno/bootstrap-in-place.bu > /data/sno/disconnected/99-zzz-master-bootstrap-in-place.yaml
get_file_content_for_ignition "/usr/local/bin/bootstrap-in-place.sh" "/data/sno/disconnected/99-zzz-master-bootstrap-in-place.yaml"
VAR_99_master_bootstrap_sh=$RET_VAL
VAR_99_master_bootstrap_sh_2=$RET_VAL_2

cat iso.ign | jq ' del ( .storage.files[] | select ( .path == "/usr/local/bin/bootstrap-in-place.sh" ) )' > /data/install/iso.ign.new

# cat iso.ign | jq ' .storage.files[] | select ( .path == "/usr/local/bin/bootstrap-in-place.sh" ) ' | jq -r .contents.source 

cat /data/install/iso.ign.new \
  | jq --argjson VAR "$VAR_99_master_fcc_2" '.storage.files += [$VAR] ' \
  | jq --argjson VAR "$VAR_99_master_bootstrap_sh_2" '.storage.files += [$VAR] ' \
  | jq -c . \
  > /data/install/iso.ign

# cat iso.ign | jq ' .storage.files[] | select ( .path == "/opt/openshift/bootstrap-in-place/master-update.fcc" ) ' | jq -r .contents.source 
# cat iso.ign | jq ' .storage.files[] | select ( .path == "/opt/openshift/partition-ric.ign" ) ' | jq -r .contents.source 

check the result

ssh -tt core@192.168.7.13 -- lsblk
# NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sr0     11:0    1  1024M  0 rom
# vda    252:0    0   400G  0 disk
# ├─vda1 252:1    0     1M  0 part
# ├─vda2 252:2    0   127M  0 part
# ├─vda3 252:3    0   384M  0 part /boot
# ├─vda4 252:4    0   120G  0 part /sysroot
# └─vda5 252:5    0 279.5G  0 part

local storage operator

Now that we have the extra partitions, let's quickly test turning them into PVs.

cat << EOF > /data/install/local-storage.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
spec:
  targetNamespaces:
  - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: "stable"
  installPlanApproval: Manual
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/local-storage.yaml
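
Because the subscription uses installPlanApproval: Manual, the operator will not actually install until its InstallPlan is approved; a sketch of doing that from the CLI (the same applies to the other Manual subscriptions in this document, just change the namespace):

oc get installplan -n openshift-local-storage
oc patch installplan -n openshift-local-storage \
  $(oc get installplan -n openshift-local-storage -o jsonpath='{.items[0].metadata.name}') \
  --type merge --patch '{"spec":{"approved":true}}'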


cat << EOF > /data/install/local-storage-lv.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - acm-demo-hub-master
  storageClassDevices:
    - storageClassName: "local-sc" 
      volumeMode: Filesystem 
      fsType: xfs 
      devicePaths: 
        - /dev/vda5
EOF
oc create -f /data/install/local-storage-lv.yaml


We create a pod, create and use a pvc, write some data into it, then delete the pod and the pvc. Then we recreate the pod and the pvc and check whether the data inside has been wiped.

cat << EOF > /data/install/pvc-demo.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem 
  resources:
    requests:
      storage: 2Gi 
  storageClassName: local-sc 
---
kind: Pod
apiVersion: v1
metadata:
  annotations:
  name: demo1
spec:
  nodeSelector:
    kubernetes.io/hostname: 'acm-demo-hub-master'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        quay.io/wangzheng422/qimgs:centos7-test
      env:
        - name: key
          value: value
      command: 
        - sleep
        - infinity
      imagePullPolicy: Always
      volumeMounts:
        - mountPath: /data
          name: demo 
          readOnly: false
  volumes:
    - name: demo 
      persistentVolumeClaim:
        claimName: local-pvc-demo 
EOF
oc create -n default -f /data/install/pvc-demo.yaml
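
A minimal version of the wipe test described above (a sketch; it reuses the pod/pvc names from the manifest):

# write a marker file, recreate pod+pvc, then check whether it survived
oc exec -n default demo1 -- touch /data/test-file
oc delete -n default -f /data/install/pvc-demo.yaml
oc create -n default -f /data/install/pvc-demo.yaml
oc wait -n default pod/demo1 --for=condition=Ready --timeout=300s
oc exec -n default demo1 -- ls -l /data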

install lvm operator

lvm operator does NOT work.

tips

cat iso.ign | jq .storage.files[].path | grep fcc
# "/opt/openshift/bootstrap-in-place/master-update.fcc"

cat iso.ign | jq ' .storage.files[] | select ( .path == "/opt/openshift/bootstrap-in-place/master-update.fcc" ) ' | jq -r .contents.source | sed 's/.*base64,//g' | base64 -d
# variant: fcos
# version: 1.1.0
# ignition:
#   config:
#     merge:
#       - local: original-master.ign
# storage:
#   trees:
#     - local: kubernetes/bootstrap-configs
#       path: /etc/kubernetes/bootstrap-configs
#     - local: tls/
#       path: /etc/kubernetes/bootstrap-secrets
#     - local: etcd-bootstrap/etc-kubernetes/static-pod-resources/etcd-member/
#       path: /etc/kubernetes/static-pod-resources/etcd-member
#     - local: etcd-data
#       path: /var/lib/etcd
#   files:
#     - path: /etc/kubernetes/bootstrap-secrets/kubeconfig
#       contents:
#         local: auth/kubeconfig-loopback
#     - path: /etc/kubernetes/manifests/etcd-pod.yaml
#       contents:
#         local: etcd-bootstrap/etc-kubernetes/manifests/etcd-member-pod.yaml
#     - path: /etc/kubernetes/manifests/kube-apiserver-pod.yaml
#       contents:
#         local: bootstrap-manifests/kube-apiserver-pod.yaml
#     - path: /etc/kubernetes/manifests/kube-controller-manager-pod.yaml
#       contents:
#         local: bootstrap-manifests/kube-controller-manager-pod.yaml
#     - path: /etc/kubernetes/manifests/kube-scheduler-pod.yaml
#       contents:
#         local: bootstrap-manifests/kube-scheduler-pod.yaml
#     - path: /usr/local/bin/bootstrap-in-place-post-reboot.sh
#       contents:
#         local: bootstrap-in-place/bootstrap-in-place-post-reboot.sh
#       mode: 0555
#     - path: /var/log/log-bundle-bootstrap.tar.gz
#       contents:
#         local: log-bundle-bootstrap.tar.gz
#     - path: /usr/local/bin/installer-masters-gather.sh
#       contents:
#         local: bin/installer-masters-gather.sh
#       mode: 0555
#     - path: /usr/local/bin/installer-gather.sh
#       contents:
#         local: bin/installer-gather.sh
#       mode: 0555
# systemd:
#   units:
#     - name: bootkube.service
#       enabled: true
#       contents: |
#         [Unit]
#         Description=Bootkube - bootstrap in place post reboot
#         Wants=kubelet.service
#         After=kubelet.service
#         ConditionPathExists=/etc/kubernetes/bootstrap-secrets/kubeconfig
#         [Service]
#         Type=oneshot
#         ExecStart=/usr/local/bin/bootstrap-in-place-post-reboot.sh
#         RestartSec=5s
#         [Install]
#         WantedBy=multi-user.target

cat iso.ign | jq ' .storage.files[] | select ( .path == "/usr/local/bin/bootstrap-in-place.sh" ) ' | jq -r .contents.source | sed 's/.*base64,//g' | base64 -d
# ......

#   echo "Adding bootstrap control plane and bootstrap installer-gather bundle to master ignition"
#   bootkube_podman_run \
#     --rm \
#     --privileged \
#     --volume "$PWD:/assets:z" \
#     --volume "/usr/local/bin/:/assets/bin" \
#     --volume "/var/lib/etcd/:/assets/etcd-data" \
#     --volume "/etc/kubernetes:/assets/kubernetes" \
#     "${CLUSTER_BOOTSTRAP_IMAGE}" \
#     bootstrap-in-place \
#     --asset-dir /assets \
#     --input /assets/bootstrap-in-place/master-update.fcc \
#     --output /assets/master.ign

#   touch master-ignition.done
#   record_service_stage_success
# fi

# https://github.com/openshift/cluster-bootstrap
cd /data/install

podman run --rm -it  \
    --privileged \
    --volume "$PWD:/assets:z" \
    --volume "/usr/local/bin/:/assets/bin" \
    --volume "/var/lib/etcd/:/assets/etcd-data" \
    --volume "/etc/kubernetes:/assets/kubernetes" \
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c29cb321d7ac72d86a86ba4a74a0774ed2ebf9910d65c1805245a17d7b005b88 \
    bootstrap-in-place \
    --asset-dir /assets \
    --input /assets/bootstrap-in-place/master-update.fcc \
    --output /assets/master.ign

lsblk -o PARTUUID,NAME,FSTYPE,LABEL,UUID,MOUNTPOINT
# PARTUUID                             NAME   FSTYPE LABEL      UUID                                 MOUNTPOINT
#                                      sr0
#                                      vda
# e23d3123-1d83-4665-8b0f-1c39f8e8f533 ├─vda1
# ed26d305-052e-4148-9b44-05357053742a ├─vda2 vfat   EFI-SYSTEM 1533-24B8
# ae634b25-a5b9-4667-85ce-119455a92e53 ├─vda3 ext4   boot       85555068-e37d-4773-837c-d279550eb818 /boot
# ef1b4117-0c2d-4f53-abd4-d3019ecf267e ├─vda4 xfs    root       936512ae-5449-4a2f-808e-1c698859c877 /sysroot
# e7b459fb-f2e1-43c9-b638-c732898eedf5 ├─vda5
# 9f0f85c7-51c6-4f2a-b7b7-c8ea3131fb32 └─vda6

end


// browser console snippet: list the OperatorHub catalog tile titles on the current page
Array.from(document.querySelectorAll("div[class='catalog-tile-pf-title']")).forEach(txt => console.log(txt.textContent));

openshift 4.10 single node, post-install, lvm and nfs

On single node ocp, if there is a dedicated extra disk, the lvm operator can carve lvm volumes out of it automatically and provide storage to applications. On top of that storage we then configure nfs, giving us an nfs service running inside the cluster.

Video explanation

install lvm operator

we need local storage, and since this is single node openshift, we use the lvm operator; find the operator in operator hub and install it:

the lvm operator is still TP (Tech Preview), so it is buggy and we need some fixes.


# oc create ns lvm-operator-system

cat << EOF > /data/install/lvm-operator.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: lvm-operator-system
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: lvm-operator-system
  namespace: lvm-operator-system
spec:
  targetNamespaces:
  - lvm-operator-system
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-lvm-operator
  namespace: lvm-operator-system
spec:
  channel: "stable-4.10"
  installPlanApproval: Manual
  name: odf-lvm-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/lvm-operator.yaml

# oc delete -f /data/install/lvm-operator.yaml


ssh -tt core@192.168.7.13 -- lsblk
# NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sr0     11:0    1  1024M  0 rom
# vda    252:0    0   120G  0 disk
# ├─vda1 252:1    0     1M  0 part
# ├─vda2 252:2    0   127M  0 part
# ├─vda3 252:3    0   384M  0 part /boot
# └─vda4 252:4    0 119.5G  0 part /sysroot
# vdb    252:16   0   100G  0 disk

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:topolvm-controller -n lvm-operator-system

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:vg-manager -n lvm-operator-system

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:topolvm-node -n lvm-operator-system

cat << EOF > /data/install/lvm.op.yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster-sample
spec:
  storage:
    deviceClasses:
    - name: vg1
    #   thinPoolConfig:
    #     name: thin-pool-1
    #     sizePercent: 50
    #     overprovisionRatio: 50
EOF
oc create -n lvm-operator-system -f /data/install/lvm.op.yaml

kubectl patch storageclass odf-lvm-vg1 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'


cat << EOF > /data/install/lvm.op.pvc.sample.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-file-pvc
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: odf-lvm-vg1
EOF
oc create -f /data/install/lvm.op.pvc.sample.yaml -n default

cat <<EOF > /data/install/lvm.op.app.sample.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-file
spec:
  containers:
  - name: app-file
    image: registry.access.redhat.com/ubi8/ubi:8.4
    imagePullPolicy: IfNotPresent
    command: ["/usr/bin/bash", "-c", "/usr/bin/tail -f /dev/null"]
    volumeMounts:
    - mountPath: "/mnt/file"
      name: lvm-file-pvc
  volumes:
    - name: lvm-file-pvc
      persistentVolumeClaim:
        claimName: lvm-file-pvc
EOF
oc create -f /data/install/lvm.op.app.sample.yaml -n default
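
To check that topolvm actually provisioned and mounted the volume (a sketch):

oc get pvc -n default lvm-file-pvc
oc exec -n default app-file -- df -h /mnt/file
# on the node, the backing LV shows up under the vg1 volume group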

install nfs service inside cluster


oc create ns nfs-system

oc project nfs-system

cd /data/install

wget -O nfs.all.yaml https://raw.githubusercontent.com/wangzheng422/nfs-ganesha-server-and-external-provisioner/wzh/deploy/openshift/nfs.all.yaml

oc create -n nfs-system -f nfs.all.yaml

# try it out

wget -O nfs.demo.yaml https://raw.githubusercontent.com/wangzheng422/nfs-ganesha-server-and-external-provisioner/wzh/deploy/openshift/nfs.demo.yaml

oc create -n default -f nfs.demo.yaml
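
The demo manifest creates a PVC against the nfs provisioner plus a consumer pod; a generic way to check the result (a sketch, since the exact resource names come from the downloaded yaml):

oc get pod,svc -n nfs-system
oc get sc
oc get pvc,pod -n default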

start install openshift 4.10 single node by booting from linux

When doing PoCs, customers often hand us VMs on their virtualization platform. In that case we need to boot coreos from inside the existing Linux VM and start the installation from there; this document describes the steps.

Internal flow of the installation process:

Video explanation

from centos7


# install a centos7 vm

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=eth0 --gateway=192.168.7.1 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks.cfg

virsh destroy ocp4-acm-hub
virsh undefine ocp4-acm-hub

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub 100G recreate
create_lv vgdata poolA lvacmhub-data 100G recreate


virt-install --name="ocp4-acm-hub" --vcpus=16 --ram=$((52*1024)) \
    --cpu=host-model \
    --disk path=/dev/vgdata/lvacmhub,device=disk,bus=virtio,format=raw \
    --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \
    --os-variant rhel8.5 --network bridge=baremetal,model=virtio \
    --graphics vnc,port=59000 \
    --boot menu=on --location /data/kvm/CentOS-7-x86_64-Minimal-2009.iso \
    --initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

# copy ignition file to webserver
# /bin/cp -f iso.ign /var/www/html/ignition/iso.ign

# copy rhcos-live.x86_64.iso to centos
ssh-copy-id root@192.168.7.12

scp /data/install/sno.iso root@192.168.7.12:~/

# goto centos
ssh root@192.168.7.12

mount -o ro sno.iso /mnt

/bin/cp -f /mnt/images/pxeboot/{initrd.img,vmlinuz} /boot/
/bin/cp -f /mnt/images/ignition.img /boot/

SNO_IP=192.168.7.13
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_ROOTFS=http://192.168.7.11:8080/install/rootfs.img
SNO_IGN=http://192.168.7.11:8080/ignition/iso.ign

cat << EOF >> /etc/grub.d/40_custom
menuentry 'coreos' --class fedora --class gnu-linux --class gnu --class os {
    insmod gzio
    insmod part_msdos
    insmod xfs
    set root='hd0,msdos1'
    echo  'Loading coreos kernel ...'
    linux /vmlinuz rd.neednet=1 console=tty0 console=ttyS0 coreos.live.rootfs_url=$SNO_ROOTFS  ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS ignition.firstboot ignition.platform.id=metal random.trust_cpu=on
    echo  'Loading coreos initrd ...'
    initrd /initrd.img /ignition.img
}
EOF

sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT="coreos"/' /etc/default/grub 

grub2-mkconfig -o /etc/grub2.cfg

reboot

from rocky linux 8



# install a rocky8 vm

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=enp1s0 --gateway=192.168.7.1 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks-rocky.cfg

virsh destroy ocp4-acm-hub
virsh undefine ocp4-acm-hub

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub 100G recreate
create_lv vgdata poolA lvacmhub-data 100G recreate


virt-install --name="ocp4-acm-hub" --vcpus=16 --ram=$((52*1024)) \
    --cpu=host-model \
    --disk path=/dev/vgdata/lvacmhub,device=disk,bus=virtio,format=raw \
    --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \
    --os-variant rhel8.5 --network bridge=baremetal,model=virtio \
    --graphics vnc,port=59000 \
    --boot menu=on --location /data/kvm/Rocky-8.6-x86_64-minimal.iso \
    --initrd-inject helper-ks-rocky.cfg --extra-args "inst.ks=file:/helper-ks-rocky.cfg" 

# copy ignition file to webserver
# /bin/cp -f iso.ign /var/www/html/ignition/iso.ign

# copy rhcos-live.x86_64.iso to the rocky vm
ssh-copy-id root@192.168.7.12

scp /data/install/sno.iso root@192.168.7.12:~/

# go to the rocky vm
ssh root@192.168.7.12

mount -o ro sno.iso /mnt

/bin/cp -f /mnt/images/pxeboot/{initrd.img,vmlinuz} /boot/
/bin/cp -f /mnt/images/ignition.img /boot/

SNO_IP=192.168.7.13
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_ROOTFS=http://192.168.7.11:8080/install/rootfs.img
SNO_IGN=http://192.168.7.11:8080/ignition/iso.ign

cat << EOF >> /etc/grub.d/40_custom
menuentry 'coreos' --class fedora --class gnu-linux --class gnu --class os {
    insmod gzio
    insmod part_msdos
    insmod xfs
    set root='hd0,msdos1'
    echo  'Loading coreos kernel ...'
    linux /vmlinuz rd.neednet=1 console=tty0 console=ttyS0 coreos.live.rootfs_url=$SNO_ROOTFS  ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS ignition.firstboot ignition.platform.id=metal random.trust_cpu=on
    echo  'Loading coreos initrd ...'
    initrd /initrd.img /ignition.img
}
EOF

sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT="coreos"/' /etc/default/grub 

grub2-mkconfig -o /etc/grub2.cfg

reboot


openshift 4.10 single node with ODF

We can set up ceph/odf storage for single node openshift, either on a separate disk or on the extra data partition carved out of the installation disk.

The prerequisite lab for this document is how to install a plain single node openshift.

Video explanation

reference:

install ceph components to ocp


# cat << EOF > /data/install/local-storage.yaml
# ---
# apiVersion: v1
# kind: Namespace
# metadata:
#   name: openshift-local-storage
#   annotations:
#     workload.openshift.io/allowed: management
# ---
# apiVersion: operators.coreos.com/v1
# kind: OperatorGroup
# metadata:
#   name: openshift-local-storage
#   namespace: openshift-local-storage
# spec:
#   targetNamespaces:
#   - openshift-local-storage
# ---
# apiVersion: operators.coreos.com/v1alpha1
# kind: Subscription
# metadata:
#   name: local-storage-operator
#   namespace: openshift-local-storage
# spec:
#   channel: "stable"
#   installPlanApproval: Manual
#   name: local-storage-operator
#   source: redhat-operators
#   sourceNamespace: openshift-marketplace
# EOF
# oc create -f /data/install/local-storage.yaml

cat << EOF > /data/install/openshift-storage.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: "stable-4.10"
  installPlanApproval: Manual
  name: odf-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/openshift-storage.yaml

cd /data/install

cat << EOF > /data/install/ceph-cluster.yaml
---
apiVersion: ceph.rook.io/v1
kind: CephCluster

metadata:
  name: main
  namespace: openshift-storage

spec:
  storage:
    useAllNodes: true
    useAllDevices: true
  cephVersion:
    # Ceph 16 (pacific)
    image: quay.io/ceph/ceph:v16.2.6 # https://quay.io/repository/ceph/ceph?tab=tags
    #image: registry.redhat.io/rhceph/rhceph-5-rhel8:5-14 # https://catalog.redhat.com/software/containers/rhceph/rhceph-5-rhel8/60ec72a74a6a2c7844abe5fb?tag=all

    # Ceph 14 (nautilus)
    #image: quay.io/ceph/ceph:v14.2.22
    #image: registry.redhat.io/rhceph/rhceph-4-rhel8:4-59 # https://catalog.redhat.com/software/containers/detail/5e39df7cd70cc54b02baf33f?tag=all

    # Ceph 12 (luminous)
    #image: registry.redhat.io/rhceph/rhceph-3-rhel7:3-51 # https://catalog.redhat.com/software/containers/rhceph/rhceph-3-rhel7/5a15ec17ecb5244d5b553577?tag=all
  mon:
    allowMultiplePerNode: true
  mgr:
    allowMultiplePerNode: true
    modules:
    - name: balancer
      enabled: true
    - name: pg_autoscaler
      enabled: true
    - name: rook
      enabled: true
  dashboard:
    enabled: true
    port: 8443
    ssl: false
  monitoring:
    enabled: true
    rulesNamespace: openshift-storage
  logCollector:
    enabled: true
    periodicity: 24h
  disruptionManagement:
    managePodBudgets: true
    machineDisruptionBudgetNamespace: openshift-machine-api
  priorityClassNames:
    mgr: system-node-critical
    mon: system-node-critical
    osd: system-node-critical
  dataDirHostPath: /var/lib/rook # under /host in CoreOS
  continueUpgradeAfterChecksEvenIfNotHealthy: true

---

kind: ConfigMap
apiVersion: v1

metadata:
  name: rook-config-override # this name is required!
  namespace: openshift-storage

data:
  config: |
    [global]
    osd_pool_default_size = 1
    mon_warn_on_pool_no_redundancy = false
EOF

oc create -f /data/install/ceph-cluster.yaml

# oc apply -f /data/install/ceph-cluster.yaml

oc exec deployment/rook-ceph-operator -n openshift-storage -- \
    ceph -c /var/lib/rook/openshift-storage/openshift-storage.config -s
#   cluster:
#     id:     17cb663d-e4f4-4f9b-9993-ce33c971496a
#     health: HEALTH_OK

#   services:
#     mon: 3 daemons, quorum a,b,c (age 8m)
#     mgr: a(active, since 7m)
#     osd: 1 osds: 1 up (since 7m), 1 in (since 7m)

#   data:
#     pools:   1 pools, 128 pgs
#     objects: 0 objects, 0 B
#     usage:   5.4 MiB used, 100 GiB / 100 GiB avail
#     pgs:     128 active+clean

# oc expose svc/rook-ceph-mgr-dashboard -n openshift-storage
oc create route edge --service=rook-ceph-mgr-dashboard -n openshift-storage

oc get route -n openshift-storage
# NAME                      HOST/PORT                                                                PATH   SERVICES                  PORT             TERMINATION   WILDCARD
# rook-ceph-mgr-dashboard   rook-ceph-mgr-dashboard-openshift-storage.apps.acm-demo-hub.redhat.ren          rook-ceph-mgr-dashboard   http-dashboard                 None

oc get secret rook-ceph-dashboard-password --output=jsonpath="{['data']['password']}" -n openshift-storage | base64 -d && echo
# d%`1E#/jBL?7NcG0G5\*

# access the dashboard at http://rook-ceph-mgr-dashboard-openshift-storage.apps.acm-demo-hub.redhat.ren/
# with username admin and the password extracted above

add cephfs support

cat << EOF > /data/install/ceph-cluster-config.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem

metadata:
  name: main
  namespace: openshift-storage

# See:
# https://github.com/rook/rook/blob/master/Documentation/ceph-filesystem.md
# https://github.com/rook/rook/blob/master/Documentation/ceph-filesystem-crd.md
# https://github.com/rook/rook/blob/master/Documentation/ceph-pool-crd.md

spec:
  metadataPool:
    replicated:
      size: 1
      requireSafeReplicaSize: false
  dataPools:
  - failureDomain: osd
    replicated:
      size: 1
      requireSafeReplicaSize: false
  metadataServer:
    activeCount: 1
    activeStandby: true

---

apiVersion: storage.k8s.io/v1
kind: StorageClass

metadata:
  name: ceph-fs

reclaimPolicy: Delete
provisioner: openshift-storage.cephfs.csi.ceph.com
parameters:
  clusterID: openshift-storage
  fsName: main
  pool: main-data0

  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner

  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner

  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
EOF
oc create -f /data/install/ceph-cluster-config.yaml

# oc delete -f /data/install/ceph-cluster-config.yaml

oc exec deployment/rook-ceph-operator -n openshift-storage --     ceph -c /var/lib/rook/openshift-storage/openshift-storage.config -s
  # cluster:
  #   id:     3e7d32b0-9160-4421-9c7e-217116279601
  #   health: HEALTH_OK

  # services:
  #   mon: 3 daemons, quorum a,b,c (age 4m)
  #   mgr: a(active, since 3m)
  #   mds: 1/1 daemons up, 1 hot standby
  #   osd: 1 osds: 1 up (since 3m), 1 in (since 4m)

  # data:
  #   volumes: 1/1 healthy
  #   pools:   3 pools, 192 pgs
  #   objects: 22 objects, 2.3 KiB
  #   usage:   6.2 MiB used, 100 GiB / 100 GiB avail
  #   pgs:     192 active+clean

  # io:
  #   client:   852 B/s rd, 1 op/s rd, 0 op/s wr

  # progress:

add ceph-rbd support

cat << EOF > /data/install/ceph-cluster-config-rdb.yaml
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: openshift-storage
spec:
  failureDomain: osd
  replicated:
    size: 1
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ceph-rbd
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
    # clusterID is the namespace where the rook cluster is running
    clusterID: openshift-storage
    # Ceph pool into which the RBD image shall be created
    pool: replicapool

    # (optional) mapOptions is a comma-separated list of map options.
    # For krbd options refer
    # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
    # For nbd options refer
    # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
    # mapOptions: lock_on_read,queue_depth=1024

    # (optional) unmapOptions is a comma-separated list of unmap options.
    # For krbd options refer
    # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
    # For nbd options refer
    # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
    # unmapOptions: force

    # RBD image format. Defaults to "2".
    imageFormat: "2"

    # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only layering feature.
    imageFeatures: layering

    # The secrets contain Ceph admin credentials.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage

    csi.storage.k8s.io/fstype: ext4

# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete

# Optional, if you want to add dynamic resize for PVC.
# For now only ext3, ext4, xfs resize support provided, like in Kubernetes itself.
allowVolumeExpansion: true

EOF
oc create -f /data/install/ceph-cluster-config-rdb.yaml

# oc delete -f /data/install/ceph-cluster-config-rdb.yaml

kubectl patch storageclass ceph-rbd -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'


oc exec deployment/rook-ceph-operator -n openshift-storage --     ceph -c /var/lib/rook/openshift-storage/openshift-storage.config -s
#   cluster:
#     id:     17cb663d-e4f4-4f9b-9993-ce33c971496a
#     health: HEALTH_WARN
#             too many PGs per OSD (302 > max 250)

#   services:
#     mon: 3 daemons, quorum a,b,c (age 67m)
#     mgr: a(active, since 38m)
#     mds: 1/1 daemons up, 1 hot standby
#     osd: 1 osds: 1 up (since 38m), 1 in (since 67m)

#   data:
#     volumes: 1/1 healthy
#     pools:   4 pools, 302 pgs
#     objects: 28 objects, 2.3 KiB
#     usage:   33 MiB used, 100 GiB / 100 GiB avail
#     pgs:     0.331% pgs not active
#              301 active+clean
#              1   peering

#   progress:
#     Global Recovery Event (4s)
#       [===========================.]

add object storage / s3 support

cat << EOF > /data/install/ceph-cluster-config-object-store.yaml
---
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: openshift-storage
spec:
  metadataPool:
    failureDomain: osd
    replicated:
      size: 1
  dataPool:
    failureDomain: osd
    # erasureCoded:
    #   dataChunks: 2
    #   codingChunks: 1
  preservePoolsOnDelete: true
  gateway:
    sslCertificateRef:
    port: 80
    # securePort: 443
    instances: 1
  healthCheck:
    bucket:
      disabled: false
      interval: 60s

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ceph-bucket
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: openshift-storage.ceph.rook.io/bucket
reclaimPolicy: Delete
parameters:
  objectStoreName: my-store
  objectStoreNamespace: openshift-storage

EOF
oc create -f /data/install/ceph-cluster-config-object-store.yaml

# test out
cat << EOF > /data/install/ceph-cluster-config-s3.yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-bucket
spec:
  generateBucketName: ceph-bkt
  storageClassName: ceph-bucket
EOF
oc create -n default -f /data/install/ceph-cluster-config-s3.yaml

# oc get -n default  ObjectBucketClaim

# get parameters from ceph's object storage
export AWS_HOST=$(kubectl -n default get cm ceph-bucket -o jsonpath='{.data.BUCKET_HOST}')
export PORT=$(kubectl -n default get cm ceph-bucket -o jsonpath='{.data.BUCKET_PORT}')
export BUCKET_NAME=$(kubectl -n default get cm ceph-bucket -o jsonpath='{.data.BUCKET_NAME}')
export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
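
With those variables set, any S3 client can talk to the RGW endpoint; a sketch with the aws CLI (it assumes the client runs somewhere that can reach the in-cluster BUCKET_HOST, e.g. from a pod, or after exposing the rgw service with a route):

# list the provisioned bucket and upload a test object
aws --endpoint-url "http://${AWS_HOST}:${PORT}" s3 ls "s3://${BUCKET_NAME}"
echo hello > /tmp/hello.txt
aws --endpoint-url "http://${AWS_HOST}:${PORT}" s3 cp /tmp/hello.txt "s3://${BUCKET_NAME}/"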

customized coreos/rhcos for openshift4 / 定制 openshift4 的底层 coreos/rhcos 操作系统

我们做项目的时候,经常有对底层操作系统做修改的需求,比如

  • 添加一些顺手的工具,如 htop, tcpdump, iperf 等,都是系统出故障的时候,排查用的利器
  • 添加一些内核驱动程序,特别是我们有特殊的硬件插了进来,比如 DPU,GPU
  • 我们有一些特殊的软件方案,需要在操作系统一层进行启动。

When we are working on projects, we often need to modify the underlying operating system, such as

  • Add some handy tools, such as htop, tcpdump, iperf, etc., which are all useful tools for troubleshooting when the system fails
  • Add some kernel drivers, especially we have special hardware plugged in, like DPU, GPU
  • We have some special software solutions that need to be activated at the OS level.

而 openshift4 设计的初衷,是云原生安全,于是把底层操作系统,使用 coreos / rhcos 的方式提供,并且 rhcos 官方没有定制化方法和文档。这种方法确实提高了 openshift4 的安全性,稳定性,和全局的一致性,但是项目中也确实遇到了很多尴尬。

The original intention of the openshift4 design is cloud-native security, so the underlying operating system is provided in the form of coreos / rhcos, and rhcos officially has no customization methods or documentation. This approach does improve the security, stability, and global consistency of openshift4, but it also causes a lot of awkwardness in real projects.

本文就针对以上问题,摸索出了如何定制底层 rhcos , 并且应用到 openshift4 之上的方法。其实这些方法,在 openshift 的 github 项目文档中都有,只不过之前没仔细研究罢了。

In view of the above problems, this article finds out how to customize the underlying rhcos and apply it to openshift4. In fact, these methods are available in the github project documentation of openshift, but they have not been carefully studied before.

⚠️⚠️⚠️注意,本文所述方法,涉及到了以下问题,不能使用在生产环境中,只能作为 PoC 应急,或者研究学习之用。如果确实是项目急需,请和红帽GPS部门沟(gěi)通(qián),获得支持。

  • ⚠️编译需要多个 rhel 相关的特种源,而且还是 eus, tus 版本,这些都需要单独购买
  • ⚠️编译需要一个红帽内部的 repo 源,属于红帽机密
  • ⚠️自定义的 rhcos 不能得到红帽 CEE 支持

⚠️⚠️⚠️ Note that the method described in this article involves the following issues and cannot be used in a production environment. It can only be used as a PoC emergency or for research and learning. If it is really urgent for the project, please communicate with the Red Hat GPS department for support.

  • ⚠️ Compilation requires multiple rhel-related special sources, and they are also eus and tus versions, which need to be purchased separately
  • ⚠️ Compilation requires a Red Hat internal repo source, which is Red Hat Confidential
  • ⚠️ Custom rhcos cannot be supported by Red Hat CEE

本次实验的架构图如下: The architecture diagram of this experiment is as follows:

过程中,重度使用了 cosa , 这个是 coreos-assembler 工具集中的命令,他封装了一系列的工具,根据一个配置文件项目,来自动化的编译出来 coreos/rhcos 镜像。

In the process, cosa is heavily used, which is a command in the coreos-assembler tool set. It encapsulates a series of tools and automatically compiles the coreos/rhcos image according to a configuration file project.

视频讲解 / Video explanation

准备 dnf repo 源 / Prepare the dnf repo source

注意,这些 repo 源都是需要特殊单独购买,请联系红帽销售和GPS服务部门。

Note that these repo sources must be purchased separately; please contact Red Hat Sales and GPS Services.


# install a rhel on vultr

# disable user/passwd login
# ChallengeResponseAuthentication no
# PasswordAuthentication no
# UsePAM no
sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
# sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config

systemctl restart sshd

ssh root@v.redhat.ren -o PubkeyAuthentication=no
# root@v.redhat.ren: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

subscription-manager register --auto-attach --username ******** --password ********

# subscription-manager release --list
# subscription-manager release --set=8.4

# subscription-manager config --rhsm.baseurl=https://china.cdn.redhat.com

subscription-manager repos --list > list

subscription-manager repos \
    --enable="rhel-8-for-x86_64-baseos-rpms" \
    --enable="rhel-8-for-x86_64-appstream-rpms" \
    --enable="codeready-builder-for-rhel-8-x86_64-rpms" \
    # 

dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

dnf install -y byobu htop

# byobu
dnf update -y

reboot

mkdir -p /data/dnf

# Create new empty partitions, and filesystem
parted -s /dev/vdb mklabel gpt
parted -s /dev/vdb unit mib mkpart primary 0% 100%

mkfs.ext4 /dev/vdb1

cat << EOF >> /etc/fstab
/dev/vdb1               /data/dnf      ext4    defaults,noatime,nofail 0 0
EOF

mount /dev/vdb1 /data/dnf

mkdir -p /data/dnf/dnf-ocp-4.10

cd /data/dnf/dnf-ocp-4.10

subscription-manager release --set=8.4

dnf reposync --repoid rhel-8-for-x86_64-baseos-eus-rpms -m --download-metadata --delete -n
dnf reposync --repoid=rhel-8-for-x86_64-appstream-eus-rpms -m --download-metadata --delete -n
# dnf reposync --repoid=rhel-8-for-x86_64-nfv-tus-rpms -m --download-metadata --delete -n
dnf reposync --repoid=rhel-8-for-x86_64-nfv-rpms -m --download-metadata --delete -n
dnf reposync --repoid=advanced-virt-for-rhel-8-x86_64-eus-rpms -m --download-metadata --delete -n
dnf reposync --repoid=fast-datapath-for-rhel-8-x86_64-rpms -m --download-metadata --delete -n

subscription-manager release --set=8

dnf -y install vsftpd

mkdir -p /var/ftp/dnf
mount --bind /data/dnf/dnf-ocp-4.10 /var/ftp/dnf
chcon -R -t public_content_t /var/ftp/dnf

sed -i "s/anonymous_enable=NO/anonymous_enable=YES/" /etc/vsftpd/vsftpd.conf

cat << EOF >> /etc/vsftpd/vsftpd.conf

pasv_enable=YES
pasv_max_port=10100
pasv_min_port=10090

EOF

systemctl disable --now firewalld
systemctl enable --now vsftpd
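# (optional) verify the repo is reachable over anonymous ftp from another host;
# the v.redhat.ren hostname used later in this document is an assumption here
curl -s ftp://v.redhat.ren/dnf/ | head
# or: lftp -e 'ls /dnf; bye' v.redhat.ren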

准备 build 服务器 / Prepare the build server

注意,build 服务器需要支持 kvm ,如果选用的云平台,需要云平台支持嵌套虚拟化。

本次实验,我们选用了一台 centos stream 8 的云主机。

Note that the build server needs to support kvm. If you choose a cloud platform, the cloud platform needs to support nested virtualization.

In this experiment, we chose a cloud host of centos stream 8.

# install a centos stream 8 on digitalocean, 
# 2c 2G for ostree only
# 4c 8G for iso because it needs metal first

dnf install -y epel-release

dnf install -y byobu htop

dnf update -y

reboot

dnf groupinstall -y server

dnf install -y lftp podman

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld

systemctl enable --now libvirtd

开始编译 rhcos / Start compiling rhcos

cosa 的输入是一个配置文件项目,上游是 https://github.com/openshift/os , 我们做了下游扩展,加入了 epel 源,并且把操作系统名字,加入了 wzh 的标记,并且添加了 htop, tcpdump, iperf3 这几个常用的排错命令,作为演示。我们还从 epel 引入 pdns, pdns-recursor , 支持离线环境 dns 内嵌。 同时,我们从 https://github.com/distribution/distribution/releases 下载 docker image registry 的二进制文件加入进来,支持镜像仓库内嵌。

The input of cosa is a configuration file project; the upstream is https://github.com/openshift/os. We made downstream extensions: added the epel source, added a wzh mark to the operating system name, and added htop, tcpdump and iperf3 as examples of commonly used troubleshooting tools. We also pull in pdns and pdns-recursor from epel to support an embedded dns in offline environments, and we download the docker image registry binary from https://github.com/distribution/distribution/releases to support an embedded image registry.

# machine-os-images just copy a iso into container
# machine-os-content is our target

# follow coreos-assembler instruction
# https://github.com/coreos/coreos-assembler/blob/main/docs/building-fcos.md
# https://coreos.github.io/coreos-assembler/
# https://github.com/openshift/os/blob/master/docs/development-rhcos.md
# https://github.com/openshift/os/blob/master/docs/development.md

# https://github.com/openshift/os/blob/master/docs/development.md
# https://github.com/openshift/release/blob/master/core-services/release-controller/README.md#rpm-mirrors

export COREOS_ASSEMBLER_CONTAINER=quay.io/coreos-assembler/coreos-assembler:rhcos-4.10
# export COREOS_ASSEMBLER_CONTAINER=quay.io/coreos-assembler/coreos-assembler:latest
podman pull $COREOS_ASSEMBLER_CONTAINER

podman login ************* quay.io

cosa() {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && $(podman image exists ${COREOS_ASSEMBLER_CONTAINER_LATEST}); then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" ${COREOS_ASSEMBLER_CONTAINER_LATEST} | awk '{print $1}')"
       local -r cosa_build_date="$(date -d ${cosa_build_date_str} +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*7)) ]] ; then
         echo -e "\e[0;33m----" >&2
         echo "The COSA container image is more that a week old and likely outdated." >&2
         echo "You should pull the latest version with:" >&2
         echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
         echo -e "----\e[0m" >&2
         sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged                                    \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536                          \
              -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse                                  \
              -v /run/user/0/containers/auth.json:/home/builder/.docker/config.json                      \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa                                         \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro}   \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro}  \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS}                                            \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}

rm -rf /data/rhcos
mkdir -p /data/rhcos
cd /data/rhcos

cosa init --branch wzh-ocp-4.10 https://github.com/wangzheng422/machine-os-content

sed -i 's/REPO_IP/v.redhat.ren/g' /data/rhcos/src/config/wzh.repo

wget -O src/config/overlay.d/99wzh-pdns/usr/bin/registry.tgz https://github.com/distribution/distribution/releases/download/v2.8.1/registry_2.8.1_linux_amd64.tar.gz

tar zvxf src/config/overlay.d/99wzh-pdns/usr/bin/registry.tgz -C src/config/overlay.d/99wzh-pdns/usr/bin

/bin/rm -f src/config/overlay.d/99wzh-pdns/usr/bin/registry.tgz
/bin/rm -f src/config/overlay.d/99wzh-pdns/usr/bin/LICENSE
/bin/rm -f src/config/overlay.d/99wzh-pdns/usr/bin/README.md
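# sanity check: only the registry binary should remain in the overlay dir;
# the 'file' check is optional and assumes the file utility is installed
ls -l src/config/overlay.d/99wzh-pdns/usr/bin/
file src/config/overlay.d/99wzh-pdns/usr/bin/registry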

cosa fetch

cosa build ostree
# ......
# Ignored user missing from new passwd file: root
# New passwd entries: clevis, dnsmasq, gluster, systemd-coredump, systemd-journal-remote, systemd-resolve, tcpdump, unbound
# Ignored group missing from new group file: root
# New group entries: clevis, dnsmasq, gluster, input, kvm, printadmin, render, systemd-coredump, systemd-journal-remote, systemd-resolve, tcpdump, unbound
# Committing... done
# Metadata Total: 10907
# Metadata Written: 3721
# Content Total: 6584
# Content Written: 1344
# Content Cache Hits: 22043
# Content Bytes Written: 328474751
# 3721 metadata, 24647 content objects imported; 2.4 GB content written
# Wrote commit: 12876365301ad8f07ecf89b4fbe184f000a0816c895c6659ebc6822ef9c18ff7
# New image input checksum: 05e3c499a794b62d22ba12d8d73404ce5970d24b4f7a664b71d17c5cf50ccd4c
# None
# New build ID: 410.84.wzh.202208220647-0
# sha256:fa305389ffa50b73e259000d8f21753049de7e4c217c12df470798d34bd4b209
# Total objects: 28612
# No unreachable objects
# Ignoring non-directory /srv/builds/.build-commit
# + rc=0
# + set +x

# or build with default setting, ostree and qcow2
# cosa build

cosa list
# 410.84.wzh.202208220647-0
#    Timestamp: 2022-08-22T06:59:55Z (0:02:49 ago)
#    Artifacts: ostree
#       Config: wzh-ocp-4.10 (16c263bb4b5c) (dirty)

cosa upload-oscontainer --name "quay.io/wangzheng422/ocp"
# quay.io/wangzheng422/ocp:410.84.202208220734-wzh-0 afbdcfab3ffa897842f181505897e6b448f40e961014f74d94996e0589934b7e

# for pdns
# quay.io/wangzheng422/ocp:410.84.202208251115-wzh-0 75beaec896b43eaa910e04f9c405687419baff09eb627c984382698f67066e8a

# for pdns, registry
# quay.io/wangzheng422/ocp:410.84.202208260926-wzh-0 13942b16d990b5934f8fc1dd344ffc2b7a009459a0af7d26624601b01a3ebe30

cosa buildextend-metal
# ......
# + cosa meta --workdir /srv --build 410.84.202208221336-wzh-0 --artifact metal --artifact-json /srv/tmp/build.metal/meta.json.new
# /srv/builds/410.84.202208221336-wzh-0/x86_64/meta.json wrote with version stamp 1661176393037675276
# + /usr/lib/coreos-assembler/finalize-artifact rhcos-410.84.202208221336-wzh-0-metal.x86_64.raw /srv/builds/410.84.202208221336-wzh-0/x86_64/rhcos-410.84.202208221336-wzh-0-metal.x86_64.raw
# + set +x
# Successfully generated: rhcos-410.84.202208221336-wzh-0-metal.x86_64.raw

cosa buildextend-metal4k
# ......
# + cosa meta --workdir /srv --build 410.84.202208221336-wzh-0 --artifact metal4k --artifact-json /srv/tmp/build.metal4k/meta.json.new
# /srv/builds/410.84.202208221336-wzh-0/x86_64/meta.json wrote with version stamp 1661176647683428498
# + /usr/lib/coreos-assembler/finalize-artifact rhcos-410.84.202208221336-wzh-0-metal4k.x86_64.raw /srv/builds/410.84.202208221336-wzh-0/x86_64/rhcos-410.84.202208221336-wzh-0-metal4k.x86_64.raw
# + set +x
# Successfully generated: rhcos-410.84.202208221336-wzh-0-metal4k.x86_64.raw

cosa buildextend-live
# ......
# Writing:   Extension record                        Start Block 43
# Done with: Extension record                        Block(s)    1
# Writing:   The File(s)                             Start Block 44
#   9.70% done, estimate finish Mon Aug 22 14:14:12 2022
#  19.36% done, estimate finish Mon Aug 22 14:14:12 2022
#  29.05% done, estimate finish Mon Aug 22 14:14:12 2022
#  38.71% done, estimate finish Mon Aug 22 14:14:12 2022
#  48.40% done, estimate finish Mon Aug 22 14:14:12 2022
#  58.06% done, estimate finish Mon Aug 22 14:14:12 2022
#  67.75% done, estimate finish Mon Aug 22 14:14:12 2022
#  77.41% done, estimate finish Mon Aug 22 14:14:12 2022
#  87.10% done, estimate finish Mon Aug 22 14:14:12 2022
#  96.78% done, estimate finish Mon Aug 22 14:14:12 2022
# Total translation table size: 2048
# Total rockridge attributes bytes: 2838
# Total directory bytes: 12288
# Path table size(bytes): 96
# Done with: The File(s)                             Block(s)    51483
# Writing:   Ending Padblock                         Start Block 51527
# Done with: Ending Padblock                         Block(s)    150
# Max brk space used 24000
# 51677 extents written (100 MB)
# + /usr/bin/isohybrid --uefi /srv/tmp/buildpost-live/rhcos-410.84.202208221336-wzh-0-live.x86_64.iso.minimal
# + isoinfo -lR -i /srv/tmp/buildpost-live/rhcos-410.84.202208221336-wzh-0-live.x86_64.iso
# Embedded 262144 bytes Ignition config space at 4872192
# + coreos-installer iso extract pack-minimal-iso /srv/tmp/buildpost-live/rhcos-410.84.202208221336-wzh-0-live.x86_64.iso /srv/tmp/buildpost-live/rhcos-410.84.202208221336-wzh-0-live.x86_64.iso.minimal --consume
# Packing minimal ISO
# Matched 17 files of 17
# Total bytes skipped: 105419322
# Total bytes written: 486854
# Total bytes written (compressed): 2808
# Verifying that packed image matches digest
# Packing successful!
# Updated: builds/410.84.202208221336-wzh-0/x86_64/meta.json

# Create a new release based on openshift 4.10.28 and override a single image
oc adm release new -a /data/pull-secret.json \
  --from-release quay.io/openshift-release-dev/ocp-release@sha256:2127608ebd67a2470860c42368807a0de2308dba144ec4c298bec1c03d79cb52 \
  machine-os-content=quay.io/wangzheng422/ocp:410.84.202208260926-wzh-0 \
  --to-image docker.io/wangzheng422/ocp:4.10-demo-pdns

oc image mirror docker.io/wangzheng422/ocp:4.10-demo-pdns quay.io/wangzheng422/ocp:4.10-demo-pdns

oc adm release info quay.io/wangzheng422/ocp:4.10-demo --commit-urls=true
# Name:      4.10.28
# Digest:    sha256:57add9e36d950ea7eacfe8704279573952cfbed3192449b7cdcc8a72c4d28921
# Created:   2022-08-22T08:13:38Z
# OS/Arch:   linux/amd64
# Manifests: 544

# Pull From: quay.io/wangzheng422/ocp@sha256:57add9e36d950ea7eacfe8704279573952cfbed3192449b7cdcc8a72c4d28921

# Release Metadata:
#   Version:  4.10.28
#   Upgrades: 4.9.19, 4.9.21, 4.9.22, 4.9.23, 4.9.24, 4.9.25, 4.9.26, 4.9.27, 4.9.28, 4.9.29, 4.9.30, 4.9.31, 4.9.32, 4.9.33, 4.9.34, 4.9.35, 4.9.36, 4.9.37, 4.9.38, 4.9.39, 4.9.40, 4.9.41, 4.9.42, 4.9.43, 4.9.44, 4.9.45, 4.9.46, 4.10.3, 4.10.4, 4.10.5, 4.10.6, 4.10.7, 4.10.8, 4.10.9, 4.10.10, 4.10.11, 4.10.12, 4.10.13, 4.10.14, 4.10.15, 4.10.16, 4.10.17, 4.10.18, 4.10.20, 4.10.21, 4.10.22, 4.10.23, 4.10.24, 4.10.25, 4.10.26, 4.10.27
#   Metadata:
#     url: https://access.redhat.com/errata/RHBA-2022:6095

# Component Versions:
#   kubernetes 1.23.5
#   machine-os 410.84.202208220734-wzh-0 Red Hat Enterprise Linux CoreOS

# Images:
#   NAME                                           URL
#   alibaba-cloud-controller-manager               https://github.com/openshift/cloud-provider-alibaba-cloud/commit/db2d118ad70ff62a2111e83a8d14c5b32e176b38
#   alibaba-cloud-csi-driver                       https://github.com/openshift/alibaba-cloud-csi-driver/commit/3ddbb2b9d4994206183b5ffd6a0872ad9a5ce193
#   alibaba-disk-csi-driver-operator               https://github.com/openshift/alibaba-disk-csi-driver-operator/commit/f0d6966321e3d416efec2ac7405494b057cb35f8
#   alibaba-machine-controllers                    https://github.com/openshift/cluster-api-provider-alibaba/commit/0206121348c9a0d220dd6805cea79d1eae7fd3e0
# ......

oc adm release info quay.io/wangzheng422/ocp:4.10-demo
# ......
#   machine-config-operator                        sha256:6f0daed53e44e6377b0ac440f6293949278b912051b933b2dddfce0e6af2c70b
#   machine-os-content                             quay.io/wangzheng422/ocp:410.84.202208220734-wzh-0
#   machine-os-images                              sha256:783daa259e91647dec5b3e82ce496f8733345d707910d7dbbbdcaadcd75d599b
# ......

oc adm release info quay.io/wangzheng422/ocp:4.10-demo-pdns
# ......
#   machine-config-operator                        sha256:6f0daed53e44e6377b0ac440f6293949278b912051b933b2dddfce0e6af2c70b
#   machine-os-content                             sha256:99bfb9b88cd8bddcea353d304032a59d0734b2ef10353e105dbe4b6538207b88
#   machine-os-images                              sha256:783daa259e91647dec5b3e82ce496f8733345d707910d7dbbbdcaadcd75d599b
# ......

应用到 openshift4 / Apply to openshift4

我们编译好了 rhcos, 那么怎么应用到 openshift4 集群上呢,一般来说,有3种办法,github上有文章写,笔者认为,直接集群级别强制升级最简单。当然,不同项目,不同情况,需要根据情况分析。

We have compiled rhcos, so how do we apply it to an openshift4 cluster? Generally speaking there are 3 ways, which are described in articles on github; the author believes a direct, cluster-level forced upgrade is the easiest. Of course, different projects and situations need to be analyzed case by case.

直接强制升级 / Direct forced upgrade

# test upgrade
oc adm upgrade \
  --to-image=quay.io/wangzheng422/ocp@sha256:b6d6fd197df2acf3ceafe60a7cc1023e7b192dffb43c675ddfcdfb4322828ddb \
  --allow-explicit-upgrade --allow-upgrade-with-warnings=true --force=true 
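# (optional) read-only checks to watch the forced upgrade and the
# machine-config rollout while it proceeds
oc get clusterversion
oc get mcp
oc get node -o wide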

# after cluster upgrade
# check the os-release
cat /etc/os-release 
# NAME="Red Hat Enterprise Linux CoreOS"
# VERSION="410.84.202208220734-wzh-0"
# ID="rhcos"
# ID_LIKE="rhel fedora"
# VERSION_ID="4.10"
# PLATFORM_ID="platform:el8"
# PRETTY_NAME="Red Hat Enterprise Linux CoreOS 410.84.202208220734-wzh-0 (Ootpa)"
# ANSI_COLOR="0;31"
# CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
# HOME_URL="https://www.redhat.com/"
# DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.10/"
# BUG_REPORT_URL="https://bugzilla.redhat.com/"
# REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
# REDHAT_BUGZILLA_PRODUCT_VERSION="4.10"
# REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
# REDHAT_SUPPORT_PRODUCT_VERSION="4.10"
# OPENSHIFT_VERSION="4.10"
# RHEL_VERSION="8.4"
# OSTREE_VERSION='410.84.202208220734-wzh-0'

以下是截屏,这里是 os-release, 可以看到有 wzh 的标记: / Below is a screenshot of os-release, where the wzh mark is visible:

这里是在 rhcos 上直接运行 htop 的界面: / And here is htop running directly on rhcos:

analyze the content

我们可以 dump 这个 machine-os-content 的内容出来仔细分析。

We can dump the contents of this machine-os-content and analyze it carefully.


export BUILDNUMBER=4.10.28

wget -O openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
wget -O openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/

mkdir -p /data/ostree
cd /data/ostree

oc image extract --path /:/data/ostree --registry-config /run/user/0/containers/auth.json quay.io/wangzheng422/ocp:410.84.wzh.202208211552-0
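# (optional) peek at what was extracted; if the image carries an ostree repo,
# the repo path below is an assumption, adjust to what you actually see
find /data/ostree -maxdepth 2 | head -n 30
# ostree refs --repo=/data/ostree/srv/repo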

end

rhcos / coreos install rpm using rpm-ostree | 给 rhcos / coreos 安装 rpm包

⚠️注意,本文所述操作,涉及更改 openshift 4 底层操作系统 rhcos,这有可能导致失去红帽支持资格,具体的情况,请和对口的红帽 GPS 团队沟通, 或者联系红帽 CEE 团队确认。

⚠️ Note that the operations described in this article involve changing rhcos, the underlying operating system of openshift 4, which may cause you to lose eligibility for Red Hat support. For your specific situation, please communicate with your Red Hat GPS team, or confirm with the Red Hat CEE team.

rhcos 是一个特殊版本的coreos, 它是openshift 4的底座操作系统,在openshift 4的官方文档中,rhcos被描述成为不可变操作系统,这会让人误以为,rhcos是不可改变的。这个错误的认识,让openshift 4在项目实施的过程中,遇到很多尴尬,也让很多场景,支持起来非常的别扭。

rhcos is a special build of coreos and is the base operating system of openshift 4. In the official openshift 4 documentation, rhcos is described as an immutable operating system, which easily leads people to believe that rhcos cannot be changed at all. This misunderstanding has caused a lot of awkwardness in openshift 4 projects and makes many scenarios hard to support.

本文我们就来探索一下,如何在 rhcos / coreos 上安装rpm包,并正确理解一下不可变操作系统。

In this article we explore how to install rpm packages on rhcos / coreos, and how to correctly understand what "immutable operating system" means.

先说结论吧,笔者认为 rhcos / coreos 的 immutable os / 不可变操作系统的意思是这样的

The conclusion first: in the author's view, the immutable os of rhcos / coreos means the following

  1. 操作系统的 /usr /lib /boot 等重要分区是只读的
  2. 操作系统的 /etc /var 是可写的,并且升级,重启保留/合并客户的修改内容。
  3. 操作系统的整个文件系统,使用类似 git 版本的方式管理,并且(当前)最多有2个版本
  4. 由于使用git方式管理,操作系统的改动,可以分为版本切换,和patch(layered package)。其中版本切换,是中心下发的大版本升级,而patch可以认为是各个设备上做的小的修改。

  1. Important partitions such as /usr, /lib and /boot are read-only
  2. /etc and /var are writable, and user changes are preserved / merged across upgrades and reboots
  3. The whole filesystem is managed in a git-like, versioned way, and (currently) at most 2 versions are kept
  4. Because of this git-like management, OS changes fall into version switches and patches (layered packages): a version switch is a big, centrally pushed upgrade, while a patch is a small local modification made on each device.

而最终的实验结果,告诉我们,rhcos / coreos 是可以安装rpm的,安装命令是 rpm-ostree 。

And the final experiment result tells us that rpm packages can indeed be installed on rhcos / coreos, and the command to do it is rpm-ostree.

接下来,我们就开始做实验,探索一下。以下是实验的部署架构图,部署结构很简单,就是一个openshift 4.10.26的6节点机器,并且有一个外部的rhel 8.4作为repo源。

Next, let's start the experiment. Below is the deployment architecture of the experiment; the layout is very simple: a 6-node openshift 4.10.26 cluster, plus an external rhel 8.4 host serving as the repo source.

视频讲解 / Video explanation

reference

openshift 4 using rpm-ostree install

使用rpm-ostree install并不神秘,openshift 4 支持的 machine config extension 操作的时候,就使用 rpm-ostree install来装软件包的。比如,如果我们激活 openshift 4 real-time kernel的支持,在node上看,就能看到他是通过装了更多的rpm来实现的。

Using rpm-ostree install is nothing mysterious: the machine config extension feature of openshift 4 itself uses rpm-ostree install to install packages. For example, if we enable the real-time kernel support of openshift 4, on the node we can see that it is implemented by installing additional rpms.

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:480e39d63063bae8992542905d48442fd1d9d1325a5136a3be8256d123efe490
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 49.84.202110220538-0 (2021-10-22T05:41:35Z)
#        RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-305.19.1.el8_4
#            LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:480e39d63063bae8992542905d48442fd1d9d1325a5136a3be8256d123efe490
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 49.84.202110220538-0 (2021-10-22T05:41:35Z)

在这里,我们可以看到,他是装了real-time kernel相关的rpm包来实现的,同时,他还删除了一些kernel相关的包。

Here we can see that it is implemented by installing the real-time kernel related rpm packages, and at the same time some of the standard kernel packages are removed.

using single rpm file

我们先做一个准备实验,如果我们有一个rpm文件,我们能下载并且直接安装吗?后面,如果openshift 4升级了,这个安装的rpm还在吗?

As a preparatory experiment: if we have a single rpm file, can we download it and install it directly? And later, after openshift 4 is upgraded, is the installed rpm still there?

为了回答这个问题,我们就从epel上,下载一个 htop 的 rpm, 然后安装一下看看。

To answer this question, we download an htop rpm from epel and install it to see.

# login to worker: ip-10-0-139-149 shell
curl -o htop-3.2.1-1.el8.x86_64.rpm  https://rpmfind.net/linux/epel/8/Everything/x86_64/Packages/h/htop-3.2.1-1.el8.x86_64.rpm

rpm-ostree install ./htop-3.2.1-1.el8.x86_64.rpm
# Checking out tree 8b334e0... done
# No enabled rpm-md repositories.
# Importing rpm-md... done
# Resolving dependencies... done
# Checking out packages... done
# Running pre scripts... done
# Running post scripts... done
# Running posttrans scripts... done
# Writing rpmdb... done
# Writing OSTree commit... done
# Staging deployment... done
# Added:
#   htop-3.2.1-1.el8.x86_64
# Run "systemctl reboot" to start a reboot

systemctl reboot

# after reboot

rpm-ostree status
# State: idle
# Deployments:
# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#              LocalPackages: htop-3.2.1-1.el8.x86_64

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)

oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-c3ceea1602f442fde75df6aab905c41e   True      False      False      3              3                   3                     0                      11h
# worker   rendered-worker-c527565b03d522c2eb9bf6f33c419175   True      False      False      3              3                   3                     0                      11h

oc get node
# NAME                                         STATUS   ROLES    AGE   VERSION
# ip-10-0-133-232.us-east-2.compute.internal   Ready    master   11h   v1.23.5+012e945
# ip-10-0-139-149.us-east-2.compute.internal   Ready    worker   11h   v1.23.5+012e945
# ip-10-0-159-38.us-east-2.compute.internal    Ready    master   11h   v1.23.5+012e945
# ip-10-0-167-145.us-east-2.compute.internal   Ready    worker   11h   v1.23.5+012e945
# ip-10-0-189-34.us-east-2.compute.internal    Ready    master   11h   v1.23.5+012e945
# ip-10-0-215-151.us-east-2.compute.internal   Ready    worker   11h   v1.23.5+012e945

# upgrade from 4.10.26 to 4.10.28
rpm-ostree status
# State: idle
# Deployments:
# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#              LocalPackages: htop-3.2.1-1.el8.x86_64

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#              LocalPackages: htop-3.2.1-1.el8.x86_64


可以看到,htop rpm文件,是可以单独安装的,并且集群升级以后,这个rpm还在,patch的类型是LocalPackages。

As you can see, the htop rpm file can be installed on its own, and after the cluster upgrade the rpm is still there; the patch shows up as type LocalPackages.

using repo source

我们平常项目里面,要装的rpm很多,并且这些rpm还有依赖,那么rhcos能用传统的repo的方式,我们指定repo源,它就帮我们自动解析依赖,自动安装呢?

In real projects we usually need to install many rpms, and these rpms have dependencies. Can rhcos work in the traditional repo way: we point it at a repo source, and it resolves the dependencies and installs them automatically?

接下来,我们就从配置一个rpm源开始,一步一步的操作看看结果。

Next, we start from configuring an rpm repo source and walk through the steps to see the result.

rpm simplified list

首先,我们确定一下,我们要装如下的rpm包。

htop lshw numactl libhugetlbfs-utils iperf3 tcpdump pdns pdns-recursor

build the repo

然后,我们要做一个rpm的repo。


export REPO_IP=http://v.redhat.ren:5180

cat << EOF > /etc/yum.repos.d/wzh.repo 
# RHEL repos
[rhel-8-baseos]
baseurl=${REPO_IP}/rhel-8-for-x86_64-baseos-eus-rpms

[rhel-8-appstream]
baseurl=${REPO_IP}/rhel-8-for-x86_64-appstream-eus-rpms

[rhel-8-fast-datapath]
baseurl=${REPO_IP}/fast-datapath-for-rhel-8-x86_64-rpms

[rhel-8-advanced-virt]
baseurl=${REPO_IP}/advanced-virt-for-rhel-8-x86_64-eus-rpms

[rhel-8-nfv]
baseurl=${REPO_IP}/rhel-8-for-x86_64-nfv-tus-rpms

# upstream: http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/plashets/4.10-el8/building/x86_64/os/
# it is internal resource right now, confidential.
# or: https://mirror.openshift.com/enterprise/reposync/
# https://mirror.openshift.com/enterprise/reposync/4.10/rhel-8-server-ose-rpms/
# it also require logins.
[rhel-8-server-ose]
baseurl=${REPO_IP}/rhel-8-server-ose

# mirror list
# https://mirrors.fedoraproject.org/mirrorlist?repo=epel-8&arch=x86_64&&country=us
[epel]
baseurl=https://mirror.fcix.net/epel/8/Everything/x86_64/
enabled=1
repo_gpgcheck=0
gpgcheck=0

EOF

mv /etc/yum.repos.d/redhat.repo /etc/yum.repos.d/redhat.repo.wzh

rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

dnf install -y byobu htop createrepo_c python39

mkdir -p /data/dnf-ocp-4.10-simple
cd /data/dnf-ocp-4.10-simple

# 注意,这里是把rpm的依赖也一起下载了。
dnf download --resolve htop lshw numactl libhugetlbfs-utils iperf3 tcpdump pdns pdns-recursor

createrepo ./
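# sanity check: the directory should now hold the downloaded rpms plus repodata/
ls ./*.rpm | wc -l
ls ./repodata/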

至此,我们就有一个目录,目录里面是一个小小的rpm repo。 / At this point we have a directory containing a small rpm repo.

setup repo source

接下来,我们就把这个目录,通过http的方式发布出去,让openshift 4的节点,能使用到。 / Next we publish this directory over http so that the openshift 4 nodes can reach it.


systemctl disable --now firewalld

mkdir -p /data/dnf

mount /dev/vdb1 /data/dnf

cd /data/dnf/dnf-ocp-4.10-simple

python3 -m http.server 5180

install to rhcos

我们使用前面提供好的rpm repo,并且在worker节点上安装我们需要的包。 / We use the rpm repo prepared above and install the packages we need on a worker node.


export REPO_IP=http://v.redhat.ren:5180

cat << EOF > /etc/yum.repos.d/wzh.repo 
# RHEL repos
[simple]
baseurl=${REPO_IP}/
enabled=1
repo_gpgcheck=0
gpgcheck=0

EOF

rpm-ostree install htop lshw numactl libhugetlbfs-utils iperf3 tcpdump pdns pdns-recursor

systemctl reboot

# after reboot
rpm-ostree status
# State: idle
# Deployments:
# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: htop iperf3 libhugetlbfs-utils lshw numactl pdns pdns-recursor tcpdump

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)

可以看到,安装完成,多了很多包。那么我们把集群升级一下,会是什么效果呢? / As you can see, after the install there are many more packages. So what happens if we upgrade the cluster?

# upgrade from 4.10.26 -> 4.10.28
rpm-ostree status
# State: idle
# Deployments:
# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#            LayeredPackages: htop iperf3 libhugetlbfs-utils lshw numactl pdns pdns-recursor tcpdump

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: htop iperf3 libhugetlbfs-utils lshw numactl pdns pdns-recursor tcpdump

可以看到,升级完成以后,我们装的包,依然都在。 / As you can see, after the upgrade completes, the packages we installed are all still there.

research

setup repo source


systemctl disable --now firewalld

mkdir -p /data/dnf

mount /dev/vdb1 /data/dnf

cd /data/dnf/dnf-ocp-4.10

python3 -m http.server 5180

install to rhcos


export REPO_IP=http://v.redhat.ren:5180

cat << EOF > /etc/yum.repos.d/wzh.repo 
# RHEL repos
[rhel-8-baseos]
baseurl=${REPO_IP}/rhel-8-for-x86_64-baseos-eus-rpms

[rhel-8-appstream]
baseurl=${REPO_IP}/rhel-8-for-x86_64-appstream-eus-rpms

[rhel-8-fast-datapath]
baseurl=${REPO_IP}/fast-datapath-for-rhel-8-x86_64-rpms

[rhel-8-advanced-virt]
baseurl=${REPO_IP}/advanced-virt-for-rhel-8-x86_64-eus-rpms

[rhel-8-nfv]
baseurl=${REPO_IP}/rhel-8-for-x86_64-nfv-tus-rpms

# upstream: http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/plashets/4.10-el8/building/x86_64/os/
# it is internal resource right now, confidential.
# or: https://mirror.openshift.com/enterprise/reposync/
# https://mirror.openshift.com/enterprise/reposync/4.10/rhel-8-server-ose-rpms/
# it also require logins.
[rhel-8-server-ose]
baseurl=${REPO_IP}/rhel-8-server-ose

# mirror list
# https://mirrors.fedoraproject.org/mirrorlist?repo=epel-8&arch=x86_64&&country=us
[epel]
baseurl=https://mirror.fcix.net/epel/8/Everything/x86_64/
enabled=1
repo_gpgcheck=0
gpgcheck=0

EOF

rpm-ostree install htop
# Checking out tree 203abe6... done
# Enabled rpm-md repositories: rhel-8-baseos rhel-8-appstream rhel-8-fast-datapath rhel-8-advanced-virt rhel-8-nfv rhel-8-server-ose epel
# rpm-md repo 'rhel-8-baseos' (cached); generated: 2022-07-19T19:30:27Z
# Updating metadata for 'rhel-8-appstream'... done
# rpm-md repo 'rhel-8-appstream'; generated: 2022-08-16T17:13:40Z
# Updating metadata for 'rhel-8-fast-datapath'... done
# rpm-md repo 'rhel-8-fast-datapath'; generated: 2022-08-01T13:46:17Z
# Updating metadata for 'rhel-8-advanced-virt'... done
# rpm-md repo 'rhel-8-advanced-virt'; generated: 2022-06-13T11:46:08Z
# Updating metadata for 'rhel-8-nfv'... done
# rpm-md repo 'rhel-8-nfv'; generated: 2022-07-19T19:21:36Z
# Updating metadata for 'rhel-8-server-ose'... done
# rpm-md repo 'rhel-8-server-ose'; generated: 2022-08-20T01:24:13Z
# Updating metadata for 'epel'... done
# rpm-md repo 'epel'; generated: 2022-09-01T10:12:52Z
# Importing rpm-md... done
# Resolving dependencies... done
# Will download: 1 package (173.6 kB)
# Downloading from 'epel'... done
# Importing packages... done
# Checking out packages... done
# Running pre scripts... done
# Running post scripts... done
# Running posttrans scripts... done
# Writing rpmdb... done
# Writing OSTree commit... done
# Staging deployment... done
# Freed: 1.2 GB (pkgcache branches: 0)
# Added:
#   htop-3.2.1-1.el8.x86_64
# Run "systemctl reboot" to start a reboot

rpm-ostree status
# State: idle
# Deployments:
#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#                       Diff: 1 added
#            LayeredPackages: htop

# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)

systemctl reboot

# after reboot
rpm-ostree status
# State: idle
# Deployments:
# * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#            LayeredPackages: htop

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)

ostree admin status
# * rhcos 43a75eb50db67449b7b546dec6e30866a907d6b85317ce1ba5af71d07c903755.0
#     Version: 410.84.202208161501-0
#     origin: <unknown origin type>
#   rhcos 203abe66048544a0415be2c3089e236da15b3a468f9e2bf3c6e2590c31ecc8db.0 (rollback)
#     Version: 410.84.202208161501-0
#     origin refspec: 203abe66048544a0415be2c3089e236da15b3a468f9e2bf3c6e2590c31ecc8db

which htop
# /usr/bin/htop

end

openshift 4 组件的版本 / components version of openshift 4

客户在项目中提出了一个问题,就是openshift 4是由很多开源组件构成,并且打了补丁的,那么这些开源组件是什么版本呢?

A customer raised a question in a project: openshift 4 is built from many open source components, with patches applied, so what versions are these open source components?

针对这个问题,红帽有一个官方知识库,里面有核心组件的版本信息:

For this question, Red Hat has an official knowledge base article that lists the versions of the core components:

但是上面的知识库,只告诉了我们,crio, etcd, ovs, ovn 的版本信息,但是并没有说其他的,比如 multus 的版本信息。

However, that knowledge base only tells us the versions of crio, etcd, ovs and ovn; it says nothing about others, such as multus.

客户需要很多其他组件的版本信息,好和已有的解决方法进行匹配度检查。那么我们就来一步一步的看,怎么找到这些版本信息吧。

The customer needs the versions of many other components in order to check compatibility with existing solutions. So let's see, step by step, how to find this version information.

本文,用 multus 来举例子。

This article uses multus as the example.

视频讲解 / Video explanation

begin

首先,我们可以从openshift的发布信息中,得到multus的源代码地址。 / First, from the openshift release information we can get the source code location of multus.

oc adm release info `curl -s https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.9.30/release.txt | grep ocp-release | awk '{print $3}'` --commit-urls=true | grep multus
# ......
#   multus-admission-controller                    https://github.com/openshift/multus-admission-controller/commit/3c28a57a831d11380e612a616820bf8a42261d9d
#   multus-cni                                     https://github.com/openshift/multus-cni/commit/c2499377b6fb43320618025876eb5b9751006222
#   multus-networkpolicy                           https://github.com/openshift/multus-networkpolicy/commit/fd12fedeb9e05637279386aa2aacd443ac1c0da7
#   multus-route-override-cni                      https://github.com/openshift/route-override-cni/commit/1953205643c2739486c315d4ea58e17d29cfa610
#   multus-whereabouts-ipam-cni                    https://github.com/openshift/whereabouts-cni/commit/43552df5f301618a1857c9a1c2b51cbb7188ad38
# ......

我们可以看到,openshift 4 使用的multus,源代码地址是: https://github.com/openshift/multus-cni ,使用的版本id是 c2499377b6fb43320618025876eb5b9751006222

We can see that the multus used by openshift 4 comes from https://github.com/openshift/multus-cni, at commit id c2499377b6fb43320618025876eb5b9751006222.

我们用git clone出来这个源代码项目,并且打开git history,查找 c2499377b6fb43320618025876eb5b9751006222 这个commit id,可以看到,他对应的是 release-4.9 分支。

We git clone this source project, open the git history, and look up commit c2499377b6fb43320618025876eb5b9751006222; we can see it belongs to the release-4.9 branch.

那么,我们就筛选 release-4.9 分支,看看 git 的历史上有些什么 tag 信息。

Then we filter on the release-4.9 branch and see what tags exist in the git history.

我们可以看到,在release-4.9 分支上,有很多的补丁,但是最近的一个 tag 是 3.7.1

We can see that on the release-4.9 branch there are many patches, and the most recent tag is 3.7.1.

到这里,我们就知道了,openshift 4.9 使用的 multus ,是在 3.7.1 版本基础上打补丁出来的版本。

At this point we know that the multus used by openshift 4.9 is version 3.7.1 plus patches. The same lookup can also be done from the command line, as sketched right below.
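A rough command-line sketch, assuming git is installed; the repository URL and commit id are the ones found above, and the exact branch/tag output depends on the state of the repository when you run it.

git clone https://github.com/openshift/multus-cni
cd multus-cni

# which remote branches contain this commit?
git branch -r --contains c2499377b6fb43320618025876eb5b9751006222

# the nearest ancestor tag of this commit
git describe --tags c2499377b6fb43320618025876eb5b9751006222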

openshift4 内置 dns, haproxy, image registry / openshift4 embeds dns, haproxy, image registry

⚠️注意,本文所述操作,涉及更改 openshift 4 底层操作系统 rhcos,这有可能导致失去红帽支持资格,具体的情况,请和对口的红帽 GPS 团队沟通, 或者联系红帽 CEE 团队确认。这是因为本方案:

  • 没有经过严格的测试
  • 将在rhcos上安装rpm
  • rpm来自于epel, DIY

⚠️Note that the operation described in this article involves changing the underlying operating system rhcos of openshift 4, which may lead to the loss of Red Hat support qualification. For specific circumstances, please communicate with the corresponding Red Hat GPS team, or contact the Red Hat CEE team for confirmation. This is because this solution:

  • Not rigorously tested
  • will install rpm on rhcos
  • rpm from epel, DIY

rhcos 是一个特殊版本的coreos, 它是openshift 4的底座操作系统,在openshift 4的官方文档中,rhcos被描述成为不可变操作系统,这会让人误以为,rhcos是不可改变的。这个错误的认识,让openshift 4在项目实施的过程中,遇到很多尴尬,也让很多场景,支持起来非常的别扭。

rhcos is a special build of coreos and is the base operating system of openshift 4. In the official openshift 4 documentation, rhcos is described as an immutable operating system, which easily leads people to believe that rhcos cannot be changed at all. This misunderstanding has caused a lot of awkwardness during openshift 4 project implementation and makes many scenarios very awkward to support.

比如,我们有一个边缘的5GC的场景,客户要求服务器数量尽量少,并且要求高可用。而openshift 4如果要做到高可用,必须3台服务器,而如果考虑到,需要外部的dns, 负载分担,镜像仓库,并且考虑他们的HA,那么还需要2个服务器,这样一共就5台服务器了。这对于一个边缘部署来说,太重了。

For example, we have an edge 5GC scenario where the customer requires as few servers as possible, plus high availability. For openshift 4 to be highly available, 3 servers are required; if we also need external dns, load sharing and a mirror registry, with HA for those as well, that adds another 2 servers, for 5 servers in total. This is too heavy for an edge deployment.

openshift 4的竞品们,一般都是把dns,负载分担,镜像仓库等等周边组件,融入到集群内部,也就是在操作系统上直接部署,而openshift 4号称操作系统不可变,那是不是这些服务就不能部署到内部去呢?本文我们就来探索一下。

Competitors of openshift 4 generally integrate dns, load sharing, the mirror registry and other peripheral components into the cluster itself, that is, they deploy them directly on the operating system. openshift 4, on the other hand, claims the operating system is immutable; does that mean these services cannot be deployed inside the cluster? In this article we explore exactly that.

openshift4 虽然号称支持单节点,3节点的边缘部署模式,但是实际项目实施的时候,往往需要多一个节点,这个节点需要承载的任务有:

  • DNS服务 : 因为k8s的各种内部服务,都依赖DNS解析
  • load balancer 服务 : 3 k8s master是需要负载均衡服务的。
  • 镜像仓库 : 这个是因为crio会在系统重启的时候,检查是否是意外重启,如果是,会清空本机镜像缓存,重新从镜像仓库下载。
  • NTP服务 : 集群节点之间的时间同步服务,好在现在大多数 交换机/路由器 都可以提供这个服务。

Although openshift4 claims to support the edge deployment mode of single node and 3 nodes, when the actual project is implemented, one more node is often required. The tasks that this node needs to carry include:

  • DNS service: Because various internal services of k8s rely on DNS resolution
  • Load balancer service: 3 k8s master needs load balancing service.
  • Mirror registry: This is because crio will check whether it is an accidental restart when the system restarts. If so, it will clear the local container image cache and download it from the mirror registry again.
  • NTP service: Time synchronization service between cluster nodes. Fortunately, most switches/routers can provide this service.

上述服务,当然可以集中部署到核心区域,但是有些场景,比如私有5G核心网,我们必须把上述服务部署到边缘站点中,这是因为,私有5GC是断外网的环境。

The above services can of course be deployed in the core area, but in some scenarios, such as private 5G core networks, we must deploy the above services to edge sites, because private 5GC is an environment where the external network is disconnected.

我们还知道,openshift4 本身就是基于 rhcos / coreos 操作系统之上的 k8s, 我们自然希望可以把上述的服务,内嵌到 rhcos / coreos 里面去,实现真正意义上的 单节点/3节点 的部署模式。

We also know that openshift4 itself is based on k8s on the rhcos / coreos operating system. We naturally hope that the above services can be embedded in rhcos / coreos to achieve a true single-node/3-node deployment mode.

如果没有本方案,那么我们的部署会是这个样子的,可以看到,必须要有一个 helper 节点,提供辅助功能。

Without this solution, our deployment would look like this. As you can see, there must be a helper node to provide auxiliary functions.

以下是本方案的架构设计: / The following is the architectural design of this scheme:

让我们开始吧。 / Let's begin

视频讲解 / Video explanation

on single node ocp

我们从最简单的单节点openshift 4 集群开始。我们的目标,是把helper上的以下组件,用openshift 4的单节点中的组件替代:

We start with the simplest single node openshift 4 cluster. Our goal is to replace the following components on the helper with components in a single node of openshift 4:

  • dns -> pdns (power dns)
  • image registry -> docker distribution

我们不考虑 haproxy ,是因为单节点,没有外部负载分担的需要。

We do not consider haproxy because it is a single node and there is no need for external load sharing.

而NTP服务,我们认为网络交换机/路由器可以提供。或者在SNO场景下,可以不用NTP服务。也可以在SNO节点上直接启动一个NTP服务都可以。

As for NTP, we assume the network switch/router can provide it; in an SNO scenario NTP may not be needed at all, or an NTP service can be run directly on the SNO node, as sketched below.
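As a reference, here is a rough sketch of serving NTP from the SNO node itself by shipping a chrony configuration through a MachineConfig, in the same butane style used later in this section. The subnet, stratum value and file names are assumptions for this lab, and ${BASE_DIR} is the same variable exported in the later steps.

cat << EOF > ${BASE_DIR}/data/sno/chrony.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-chrony
storage:
  files:
    - path: /etc/chrony.conf
      overwrite: true
      mode: 420
      user:
        name: root
      contents:
        inline: |
          # serve time to the lab subnet even without an upstream source
          local stratum 10
          allow 192.168.7.0/24
          driftfile /var/lib/chrony/drift
EOF

butane ${BASE_DIR}/data/sno/chrony.bu > ${BASE_DIR}/data/sno/99-zzz-master-chrony.yaml
# oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-chrony.yaml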

这里是这个single node ocp的day-0的部署过程记录

Here is the deployment process record of day-0 of this single node ocp.

以下是day-0的时候,部署的架构图: / The following is the architecture diagram of the deployment at day-0:

我们的目标,是通过day-2的操作,把他变成这个样子: / Our goal is to make it look like this through the operation of day-2:

prepare docker registry content

我们需要先准备以下离线镜像仓库,openshift支持了一个oc-mirror的工具,我们可以方便的使用。我们先把离线镜像仓库下载到文件中。留着后面使用。

We need to prepare the offline image registry content first. openshift ships an oc-mirror tool that makes this easy: we first download the offline image set to a file and keep it for later use.


# setup a stand alone docker registry
# on helper

cat > /data/ocp4/mirror.yaml << EOF
apiVersion: mirror.openshift.io/v1alpha1
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  ocp:
    channels:
      - name: stable-4.10
        versions:
          - '4.10.28'
          - '4.10.26'
  additionalImages:
    - name: registry.redhat.io/redhat/redhat-operator-index:v4.10
    - name: registry.redhat.io/redhat/certified-operator-index:v4.10
    - name: registry.redhat.io/redhat/community-operator-index:v4.10
    - name: registry.redhat.io/redhat/redhat-marketplace-index:v4.10

EOF

mkdir -p /data/install/mirror-tmp
cd /data/install/mirror-tmp

oc-mirror --config /data/ocp4/mirror.yaml file:///data/install/mirror-tmp
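# the result is one or more archive files that will be carried into the
# disconnected environment (the tar name below is the one used when uploading later)
ls -lh /data/install/mirror-tmp/
# expect something like: mirror_seq1_000000.tar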

install rpm to rhcos

我们需要向rhcos直接安装pdns, docker distribution等软件,为什么不能用openshift的容器来提供这些服务呢?这里面有一个crio的bug,简单说,如果主机意外重启,crio会把本地镜像全部作废,然后重新从镜像仓库下载。所以,我们的dns, registry服务就不能用容器来启动,否则如果宿主机暴力重启,dns, registry的容器服务都启动不了,这个节点的openshift服务就无法启动了。

We need to install pdns, docker distribution and other software directly to rhcos, why can't we use openshift containers to provide these services? There is a crio bug here. Simply speaking, if the host restarts unexpectedly, crio will invalidate all the local images and download them from the mirror repository again. Therefore, our dns and registry services cannot be started with containers. Otherwise, if the host restarts violently, the container services of dns and registry cannot be started, and the openshift service of this node cannot be started.

有同事建议,可以使用podman/systemd的方式,在systemd里面注册一个服务,在服务里面通过podman启动pdns, registry,经过实验测试,断电重启的情况下,podman的镜像,也会丢失,所以对应的systemd service也启动不了。所以我们就彻底放弃容器解决方案。

A colleague suggested using podman with systemd: register a systemd service that starts pdns and the registry through podman. Experimental testing showed that after a power-loss restart the podman images are lost as well, so the corresponding systemd services also fail to start. So we dropped the container solution entirely.

我们还需要做一个rpm repo源,这里作者做好了一个demo rpm repo源,注意,这个源引用了epel的rpm, 还有作者自己打包的rpm。所以这个源只能作为学习和测试之用。

We also need to make an rpm repo source. Here the author has prepared a demo rpm repo source. Note that this source refers to the rpm of epel and the rpm packaged by the author himself. So this source should only be used for learning and testing purposes.

最后,用rpm-ostree向rhcos装rpm,这个技术是openshift 4自己就在使用的,openshift 4 extension功能,比如real-time kernel extension, 就是通过rpm-ostree向rhcos装了对应的kernel包实现的。

Finally, use rpm-ostree to install rpm to rhcos. This technology is used by openshift 4 itself. Openshift 4 extension functions, such as real-time kernel extension, are implemented by installing the corresponding kernel package to rhcos through rpm-ostree.


# on helper
mkdir -p /data/repo
cd /data/repo

# here is the demo simple repo
# you can build the repo by youself, just following rhel8.4 way
wget https://github.com/wangzheng422/release/releases/download/ocp.4.10.28.simple.repo/dnf-ocp-4.10-simple.tgz

tar zvxf dnf-ocp-4.10-simple.tgz

cd /data/repo/dnf-ocp-4.10-simple/

# start http server to serve the rpm repo
python3 -m http.server 5180
# Serving HTTP on 0.0.0.0 port 5180 (http://0.0.0.0:5180/) ...

# login into single node

export REPO_IP=http://192.168.7.11:5180

cat << EOF > /etc/yum.repos.d/wzh.repo 
# RHEL repos
[simple]
baseurl=${REPO_IP}/
enabled=1
repo_gpgcheck=0
gpgcheck=0

EOF

rpm-ostree install htop pdns pdns-recursor docker-distribution
# Checking out tree 8b334e0... done
# Enabled rpm-md repositories: simple
# Updating metadata for 'simple'... done
# rpm-md repo 'simple'; generated: 2022-09-09T06:17:17Z
# Importing rpm-md... done
# Resolving dependencies... done
# Will download: 11 packages (12.9 MB)
# Downloading from 'simple'... done
# Importing packages... done
# Checking out packages... done
# Running pre scripts... done
# Running post scripts... done
# Running posttrans scripts... done
# Writing rpmdb... done
# Writing OSTree commit... done
# Staging deployment... done
# Added:
#   boost-context-1.66.0-10.el8.x86_64
#   boost-filesystem-1.66.0-10.el8.x86_64
#   boost-program-options-1.66.0-10.el8.x86_64
#   boost-system-1.66.0-10.el8.x86_64
#   docker-distribution-2.8.1-0.el8.x86_64
#   htop-3.2.1-1.el8.x86_64
#   libsodium-1.0.18-2.el8.x86_64
#   luajit-2.1.0-0.16beta3.el8.x86_64
#   pdns-4.6.2-1.el8.x86_64
#   pdns-recursor-4.3.6-1.el8.x86_64
#   protobuf-3.5.0-13.el8.x86_64
# Run "systemctl reboot" to start a reboot

systemctl reboot

# after reboot
rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)

重启以后,我们就能看到LayeredPackages了,以后版本的 openshift 4 会在集群层面支持 LayeredPackages 功能。目前我们只能直接登录rhcos来手动做安装。

After restarting, we can see the LayeredPackages entry. Future versions of openshift 4 will support the LayeredPackages function at the cluster / k8s level; for now we can only log in to rhcos directly and do the installation manually.

embed registry on single node ocp

我们需要的软件,已经装在节点上了,接下来,我们就做一些配置,把本地的镜像仓库激活。注意,这里面我们使用的是docker distribution, 我们把之前helper上的镜像仓库的证书拿来直接给他用,这样之后,我们只要更改dns指向就可以了。

The software we need is now installed on the node. Next we do some configuration to activate the local image registry. Note that we use docker distribution here, and we reuse the certificate of the image registry that was on the helper, so that afterwards we only need to change where the dns records point.

我们的配置文件位于/etc下面, 上传的镜像位于/var下面,那么节点重启,集群升级,这些目录会不会被重置呢?目前的实测表明不会,按照文档的说法,/etc下面的内容,在升级的时候会进行合并,/var下面的内容,会保留。

Our configuration file lives under /etc and the uploaded images live under /var. If the node is rebooted or the cluster is upgraded, will these directories be reset? Testing so far shows they will not: according to the documentation, content under /etc is merged during an upgrade, and content under /var is preserved.


export BASE_DIR='/home/sno/'
export VAR_CERT_DIR=/etc/crts/

echo "obase=8;ibase=10;420" | bc
# 644

echo "obase=10;ibase=8;700" | bc
# 448

#########################
# run with root
# to grant read access to key
chmod og+r $VAR_CERT_DIR/redhat.ren.key
#########################

cat << EOF > ${BASE_DIR}/data/sno/registry.images.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-registry
storage:
  files:
    - path: /etc/wzh/redhat.ren.crt
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.crt )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/redhat.ren.key
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.key )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/registry-config.yml
      overwrite: true
      contents:
        inline: |
          version: 0.1
          log:
            accesslog:
                disabled: true
            fields:
                service: registry
          storage:
              cache:
                  layerinfo: inmemory
              filesystem:
                  rootdirectory: /var/wzh-registry
              delete:
                  enabled: true
              maintenance:
                  readonly:
                      enabled: false
          http:
              addr: :8443
              tls:
                certificate: /etc/wzh/redhat.ren.crt
                key: /etc/wzh/redhat.ren.key
      mode: 420
      user:
        name: root

systemd:
  units:
    - contents: |
        [Unit]
        Description=Set SELinux chcon for image registry
        Before=docker-distribution.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        User=root
        ExecStartPre=-mkdir -p /var/wzh-registry
        ExecStart=/usr/bin/chcon -Rt container_file_t /var/wzh-registry

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: hostpath-registry.service

    - contents: |
        [Unit]
        Description=v2 Registry server for Docker
        After=network.target hostpath-registry.service
        Requires=hostpath-registry.service
        Before=kubelet.service

        [Service]
        Type=simple
        ExecStart=/usr/bin/registry serve /etc/wzh/registry-config.yml

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: docker-distribution.service

    - name: kubelet.service
      dropins:
      - name: 99-after-registry.conf
        contents: |
          [Unit]
          Requires=docker-distribution.service
          After=docker-distribution.service

EOF

butane ${BASE_DIR}/data/sno/registry.images.bu > ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

# oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml
# oc delete -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml
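# (optional) once the machineconfig has rolled out and the node rebooted,
# check that the embedded registry answers; ip and port follow the config above
curl -k https://192.168.7.13:8443/v2/_catalog
# on the node itself (assuming ssh as core works):
# ssh core@192.168.7.13 sudo systemctl is-active hostpath-registry docker-distribution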

upload registry content

有了镜像仓库,我们就把之前下载的离线镜像文件,导入到节点内置的镜像仓库中。

With the mirror registry, we import the offline mirror file downloaded before into the built-in mirror registry of the node.

# on helper
oc-mirror --dest-skip-tls --from mirror_seq1_000000.tar docker://192.168.7.13:8443

(optional) update registry config to read only

我们的离线镜像上传了,就不希望别别人改掉,那么我们可以把本地的镜像仓库设置成只读模式。

Our offline images have been uploaded and we don't want anyone to change them, so we can switch the local registry to read-only mode.

cat << EOF > ${BASE_DIR}/data/sno/registry.images.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-registry
storage:
  files:
    - path: /etc/wzh/redhat.ren.crt
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.crt )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/redhat.ren.key
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.key )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/registry-config.yml
      overwrite: true
      contents:
        inline: |
          version: 0.1
          log:
            accesslog:
                disabled: true
            fields:
                service: registry
          storage:
              cache:
                  layerinfo: inmemory
              filesystem:
                  rootdirectory: /var/wzh-registry
              delete:
                  enabled: false
              maintenance:
                  readonly:
                      enabled: true
          http:
              addr: :5443
              tls:
                certificate: /etc/wzh/redhat.ren.crt
                key: /etc/wzh/redhat.ren.key
      mode: 420
      user:
        name: root

systemd:
  units:
    - contents: |
        [Unit]
        Description=Set SELinux chcon for image registry
        Before=docker-distribution.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        User=root
        ExecStartPre=-mkdir -p /var/wzh-registry
        ExecStart=/usr/bin/chcon -Rt container_file_t /var/wzh-registry

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: hostpath-registry.service

    - contents: |
        [Unit]
        Description=v2 Registry server for Docker
        After=network.target hostpath-registry.service
        Requires=hostpath-registry.service
        Before=kubelet.service

        [Service]
        Type=simple
        ExecStart=/usr/bin/registry serve /etc/wzh/registry-config.yml

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: docker-distribution.service

    - name: kubelet.service
      dropins:
      - name: 99-after-registry.conf
        contents: |
          [Unit]
          Requires=docker-distribution.service
          After=docker-distribution.service

EOF

butane ${BASE_DIR}/data/sno/registry.images.bu > ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

deploy power dns (pdns) as local dns service

我们配置本地的power dns,把我们需要的dns记录都写进去,并且配置它在kubelet之前启动。

We configure the local power dns, write all the dns records we need, and configure it to start before the kubelet.


oc patch mcp/master --patch '{"spec":{"paused":true}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":true}}' --type=merge

cat > ${BASE_DIR}/data/sno/pdns.bu << 'EOF' 
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-pdns
storage:
  files:
    - path: /etc/pdns/pdns.conf
      overwrite: true
      contents:
        inline: |
          launch=bind
          local-address=0.0.0.0
          local-port=53
          setgid=pdns
          setuid=pdns
          bind-config=/etc/pdns/bind.conf
          bind-check-interval=300
          enable-lua-records=yes
      mode: 420
      user:
        name: root

    - path: /etc/pdns/bind.conf
      overwrite: true
      contents:
        inline: |
          zone "acm-demo-hub.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
          zone "infra.redhat.ren" { type master; file "/etc/pdns/infra.xyz"; };
      mode: 420
      user:
        name: root

    - path: /etc/pdns/inside-out.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.acm-demo-hub.redhat.ren. postmaster.acm-demo-hub.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    A    192.168.7.13
          ;ns1     IN A 8.8.8.8 
          ;ns2     IN A 8.8.4.4
          helper  IN      A       192.168.7.11
          ;
          ;
          ; The api points to the IP of your load balancer
          api             IN    A    192.168.7.13
          api-int         IN    A    192.168.7.13
          ;
          ; The wildcard also points to the load balancer
          *.apps          IN    A    192.168.7.13
          ;
          ; Create entry for the bootstrap host
          ; bootstrap       IN      A       192.168.7.12
          ;
          ; Create entries for the master hosts
          master-0                IN      A       192.168.7.13
          ;master-1                IN      A       192.168.7.14
          ;master-2                IN      A       192.168.7.15
          ;
          ; Create entries for the worker hosts
          ;worker-0                IN      A       192.168.7.16
          ;worker-1                IN      A       192.168.7.17
          ;worker-2                IN      A       192.168.7.18
          ;
          ; The ETCd cluster lives on the masters...so point these to the IP of the masters
          ;etcd-0  IN      A       192.168.7.13
          ;etcd-1  IN      A       192.168.7.14
          ;etcd-2  IN      A       192.168.7.15
          ;
          ; Create entries for the other hosts
          registry        IN      A       192.168.7.13
          yum             IN      A       192.168.7.1
          nexus           IN      A       192.168.7.1
          git             IN      A       192.168.7.11
          tmp-registry    IN      A       192.168.7.177
      mode: 420
      user:
        name: root

    - path: /etc/pdns/infra.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.infra.redhat.ren. postmaster.infra.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    A    192.168.7.13
          quay            IN    A    192.168.7.13
          quaylab         IN    A    192.168.7.13

      mode: 420
      user:
        name: root
systemd:
  units:
    - name: pdns.service
      enabled: true

    - name: kubelet.service
      dropins:
      - name: 99-after-pdns.conf
        contents: |
          [Unit]
          Requires=pdns.service
          After=pdns.service

EOF

butane ${BASE_DIR}/data/sno/pdns.bu > ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

# oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

update registries.conf to point to local registry

默认情况下,这一步并不需要,但是作者的集群装的时候,对registries.conf做过特殊的配置,这里就要把镜像仓库配置重新调整一下。image.registries.conf.sh脚本的源代码在这里

By default, this step is not required, but the author's cluster was installed with a customized registries.conf, so the image registry configuration has to be adjusted again here. The source code of the image.registries.conf.sh script is here.


######################
# run as root
cd /data/ocp4
bash image.registries.conf.sh quay.infra.redhat.ren:8443
######################

oc apply -f /data/ocp4/99-worker-container-registries.yaml
oc apply -f /data/ocp4/99-master-container-registries.yaml
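
For reference, the two MachineConfigs generated by the script embed a /etc/containers/registries.conf; one mirror stanza in it looks roughly like the following (the repository paths here are only illustrative, the real ones come from the script):

[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
  mirror-by-digest-only = true

  [[registry.mirror]]
    location = "quay.infra.redhat.ren:8443/openshift/release"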

set sno dns to local dns service

更改single node ocp的dns配置,根据集群安装的方法不同而不同。本次实验的集群的安装方法在这里,于是我们就这样来更改dns指向。

Change the DNS configuration of the single node ocp; the exact method depends on how the cluster was installed. The cluster in this lab was installed as described here, so we change the DNS setting as follows.


NTP_SERVER=192.168.7.11
HELP_SERVER=192.168.7.11
KVM_HOST=192.168.7.11
API_VIP=192.168.7.100
INGRESS_VIP=192.168.7.101
CLUSTER_PROVISION_IP=192.168.7.103
BOOTSTRAP_IP=192.168.7.12

ACM_DEMO_MNGED_CLUSTER=acm-demo1
ACM_DEMO_MNGED_SNO_IP=192.168.7.15

# define the node information of the single node cluster
SNO_CLUSTER_NAME=acm-demo-hub
SNO_BASE_DOMAIN=redhat.ren
SNO_IP=192.168.7.13
# ocp bug, gateway needs to be online, otherwise, ovn will mis-behaviour, and ingress failed to start.
SNO_GW=192.168.7.9
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

export BASE_DIR='/home/sno/'

cat << EOF > ${BASE_DIR}/data/sno/static.ip.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-static-ip
storage:
  files:
    - path: /etc/NetworkManager/system-connections/${SNO_IF}.nmconnection
      mode: 0600
      overwrite: true
      contents:
        inline: |
          [connection]
          id=${SNO_IF}
          type=ethernet
          autoconnect-retries=1
          interface-name=${SNO_IF}
          multi-connect=1
          permissions=
          wait-device-timeout=60000

          [ethernet]
          mac-address-blacklist=

          [ipv4]
          address1=${SNO_IP}/${SNO_NETMAST_S},${SNO_GW}
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns=${SNO_IP};
          dns-search=
          may-fail=false
          method=manual

          [ipv6]
          addr-gen-mode=eui64
          dhcp-hostname=${SNO_HOSTNAME}
          dhcp-timeout=90
          dns-search=
          method=disabled

          [proxy]

EOF

butane ${BASE_DIR}/data/sno/static.ip.bu > ${BASE_DIR}/data/sno/disconnected/99-zzz-master-ip.yaml

oc apply -f ${BASE_DIR}/data/sno/disconnected/99-zzz-master-ip.yaml

oc patch mcp/master --patch '{"spec":{"paused":false}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":false}}' --type=merge
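
After the MCPs are unpaused, the new MachineConfigs roll out and the node reboots. A quick way to follow the rollout and confirm that pdns is really running on the node; the node name below is just what SNO_HOSTNAME is set to in this lab, check the real one with oc get node first:

oc get mcp
# wait until the master pool shows UPDATED=True again

oc get node

oc debug node/acm-demo-hub-master -- chroot /host systemctl is-active pdns
# should print: active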

test with force power off

我们知道,如果ocp node意外断电的话,启动的时候,他会重新下载集群需要的基础镜像,那么我们就暴力断电,来测试sno能否启动吧。

We know that if the ocp node is powered off unexpectedly, it re-downloads the base images the cluster needs when it starts up, so let's cut the power abruptly and test whether the SNO can come back up.
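
A minimal sketch of how to simulate the power loss from the KVM host (KVM_HOST above); the VM domain name here is hypothetical, check the real one with virsh list first:

# on the kvm host (192.168.7.11)
virsh list --all
# assuming the SNO VM domain is called acm-demo-hub-master (hypothetical)
virsh destroy acm-demo-hub-master    # hard power off, no graceful shutdown
virsh start acm-demo-hub-master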

重启之后,正常启动。 / After the restart, it boots up normally.

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.26   True        False         False      30m
# baremetal                                  4.10.26   True        False         False      4d22h
# cloud-controller-manager                   4.10.26   True        False         False      4d22h
# cloud-credential                           4.10.26   True        False         False      4d22h
# cluster-autoscaler                         4.10.26   True        False         False      4d22h
# config-operator                            4.10.26   True        False         False      4d22h
# console                                    4.10.26   True        False         False      7m23s
# csi-snapshot-controller                    4.10.26   True        False         False      4d22h
# dns                                        4.10.26   True        False         False      20m
# etcd                                       4.10.26   True        False         False      4d22h
# image-registry                             4.10.26   True        False         False      4d22h
# ingress                                    4.10.26   True        False         False      4d22h
# insights                                   4.10.26   True        False         False      40s
# kube-apiserver                             4.10.26   True        False         False      4d22h
# kube-controller-manager                    4.10.26   True        False         False      4d22h
# kube-scheduler                             4.10.26   True        False         False      4d22h
# kube-storage-version-migrator              4.10.26   True        False         False      3d18h
# machine-api                                4.10.26   True        False         False      4d22h
# machine-approver                           4.10.26   True        False         False      4d22h
# machine-config                             4.10.26   True        False         False      4d22h
# marketplace                                4.10.26   True        False         False      4d22h
# monitoring                                 4.10.26   True        False         False      4d22h
# network                                    4.10.26   True        False         False      4d22h
# node-tuning                                4.10.26   True        False         False      30m
# openshift-apiserver                        4.10.26   True        False         False      3d22h
# openshift-controller-manager               4.10.26   True        False         False      2d19h
# openshift-samples                          4.10.26   True        False         False      3d23h
# operator-lifecycle-manager                 4.10.26   True        False         False      4d22h
# operator-lifecycle-manager-catalog         4.10.26   True        False         False      4d22h
# operator-lifecycle-manager-packageserver   4.10.26   True        False         False      7m48s
# service-ca                                 4.10.26   True        False         False      4d22h
# storage                                    4.10.26   True        False         False      4d22h

test with ocp upgrade

我们上传的镜像,包括了4.10.26, 4.10.28两个版本,那么我们就来试试升级吧

The images we uploaded include two versions, 4.10.26 and 4.10.28, so let's try an upgrade.

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)

# before upgrade, make sure the rpm repo is online
# rpm-ostree will call rpm repo during upgrade
# although it will not download anything

# upgrade ocp to 4.10.28
oc adm upgrade \
  --to-image=quay.io/openshift-release-dev/ocp-release@sha256:2127608ebd67a2470860c42368807a0de2308dba144ec4c298bec1c03d79cb52 \
  --allow-explicit-upgrade --allow-upgrade-with-warnings=true --force=true 
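
The upgrade of a single node cluster includes a node reboot; the progress can be watched with the standard commands:

oc adm upgrade
# shows the target version and whether the update is progressing

oc get clusterversion -w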

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.28   True        False         False      26m
# baremetal                                  4.10.28   True        False         False      130m
# cloud-controller-manager                   4.10.28   True        False         False      130m
# cloud-credential                           4.10.28   True        False         False      154m
# cluster-autoscaler                         4.10.28   True        False         False      130m
# config-operator                            4.10.28   True        False         False      142m
# console                                    4.10.28   True        False         False      26m
# csi-snapshot-controller                    4.10.28   True        False         False      32m
# dns                                        4.10.28   True        False         False      26m
# etcd                                       4.10.28   True        False         False      138m
# image-registry                             4.10.28   True        False         False      36m
# ingress                                    4.10.28   True        False         False      141m
# insights                                   4.10.28   True        False         False      17s
# kube-apiserver                             4.10.28   True        False         False      131m
# kube-controller-manager                    4.10.28   True        False         False      136m
# kube-scheduler                             4.10.28   True        False         False      133m
# kube-storage-version-migrator              4.10.28   True        False         False      141m
# machine-api                                4.10.28   True        False         False      130m
# machine-approver                           4.10.28   True        False         False      141m
# machine-config                             4.10.28   True        False         False      138m
# marketplace                                4.10.28   True        False         False      141m
# monitoring                                 4.10.28   True        False         False      35m
# network                                    4.10.28   True        False         False      142m
# node-tuning                                4.10.28   True        False         False      36m
# openshift-apiserver                        4.10.28   True        False         False      36m
# openshift-controller-manager               4.10.28   True        False         False      131m
# openshift-samples                          4.10.28   True        False         False      36m
# operator-lifecycle-manager                 4.10.28   True        False         False      130m
# operator-lifecycle-manager-catalog         4.10.28   True        False         False      130m
# operator-lifecycle-manager-packageserver   4.10.28   True        False         False      104m
# service-ca                                 4.10.28   True        False         False      141m
# storage                                    4.10.28   True        False         False      130m

我们可以看到,能够正常的升级和启动。

We can see that the cluster upgrades and boots up normally.

3 node cluster

接下来,我们尝试 3 node openshift / compact cluster。我们的目标,是把helper上的以下组件,用openshift 4的节点中的组件替代:

Next, we try a 3 node openshift / compact cluster. Our goal is to replace the following components on the helper with components running on the openshift 4 nodes:

  • dns -> pdns (power dns)
  • haproxy -> pdns lua plugin (ifportup)
  • image registry -> docker distribution

而NTP服务,我们依然认为网络交换机/路由器可以提供。

As for the NTP service, we still assume that the network switch/router can provide it.
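
If you want to double check that the nodes really sync time from the switch/router, a quick probe through a debug pod works (assuming the default chrony setup of RHCOS; the node names in this lab are master-01-demo and so on):

oc debug node/master-01-demo -- chroot /host chronyc sources
# the configured NTP server should show up with a '*' (current sync source)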

install rpm to rhcos

这个步骤,和single node ocp是一样的,只不过需要在 3 master 上都执行一遍。另外,我们多安装了一个pdns-selinux, 这个包和docker-distribution都是作者自己打包的,pdns-selinux补充了selinux规则,运行pdns能够做对外的端口检查。

This step is the same as for single node ocp, except that it needs to be executed on all 3 masters. In addition, we install one extra package, pdns-selinux. Both this package and docker-distribution are packaged by the author himself; pdns-selinux adds the SELinux rules that allow pdns to probe ports on other hosts (a sketch of how such a module is generated follows the rpm-ostree output below).

# Delete cached rpm repo metadata
# rpm-ostree cleanup -m

rpm-ostree install htop pdns pdns-recursor docker-distribution pdns-selinux
# Added:
#   pdns-selinux-0.0.1-0.el8.x86_64
# Run "systemctl reboot" to start a reboot

reboot

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor pdns-selinux

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor
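
The pdns-selinux package mentioned above is essentially the policy module produced in the research section later in these notes; a rough sketch of how such a module is generated on a test machine, using the same ausearch / audit2allow flow:

# reproduce the denial first (pdns probing tcp/6443), then:
ausearch -m avc --start recent -i
audit2allow -a -M wzh-pdns
semodule -i wzh-pdns.pp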

embed registry on each ocp node

这个步骤,也和 single node ocp是一样的。

This step is also the same as for single node ocp.


export BASE_DIR='/home/3node/'
export VAR_CERT_DIR=/etc/crts/

# ......

upload registry content

这个步骤,和single node ocp是一样的,只不过需要为 3 master 都执行一遍。

This step is the same as for single node ocp, except that it needs to be executed for all 3 masters.

oc-mirror --dest-skip-tls --from mirror_seq1_000000.tar docker://192.168.7.13:8443

oc-mirror --dest-skip-tls --from mirror_seq1_000000.tar docker://192.168.7.14:8443

oc-mirror --dest-skip-tls --from mirror_seq1_000000.tar docker://192.168.7.15:8443
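
A quick sanity check after each upload: list the repositories in the embedded registry through its v2 API (self-signed certificate, hence -k; jq is optional):

curl -ks https://192.168.7.13:8443/v2/_catalog | jq .
curl -ks https://192.168.7.14:8443/v2/_catalog | jq .
curl -ks https://192.168.7.15:8443/v2/_catalog | jq .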

deploy power dns (pdns) as local dns service

我们配置本地的power dns,把我们需要的dns记录都写进去,并且配置它在kubelet之前启动。这一步和之前的single node ocp不一样,需要用到pdns lua plugin,用 ifportup 的方法,探测对应节点上的端口是否打开,如果没有打开,认为对应的服务没有启动,或者节点掉线,然后 pdns 就不会返回对应节点的解析。我们用这种方法,来代替haproxy。

We configure the local power dns, write in all the DNS records we need, and configure it to start before kubelet. This step differs from the single node ocp case: here we use the pdns lua plugin and the ifportup function to probe whether the relevant port on each node is open. If the port is not open, pdns assumes the corresponding service is not running or the node is offline, and stops returning that node's address in its answers. We use this mechanism to replace haproxy.


cat > ${BASE_DIR}/data/sno/pdns.bu << 'EOF' 
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-pdns
storage:
  files:
    - path: /etc/pdns/pdns.conf
      overwrite: true
      contents:
        inline: |
          launch=bind
          local-address=0.0.0.0
          local-port=53
          setgid=pdns
          setuid=pdns
          bind-config=/etc/pdns/bind.conf
          bind-check-interval=300
          enable-lua-records=yes
      mode: 420
      user:
        name: root

    - path: /etc/pdns/bind.conf
      overwrite: true
      contents:
        inline: |
          zone "acm-demo-hub.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
          zone "infra.redhat.ren" { type master; file "/etc/pdns/infra.xyz"; };
      mode: 420
      user:
        name: root

    - path: /etc/pdns/inside-out.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.acm-demo-hub.redhat.ren. postmaster.acm-demo-hub.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          ;ns1     IN A 8.8.8.8 
          ;ns2     IN A 8.8.4.4
          helper  IN      A       192.168.7.11
          ;
          ;
          ; The api points to the IP of your load balancer
          api             IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          api-int         IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          ;
          ; The wildcard also points to the load balancer
          *.apps          IN    LUA    A    "ifportup(443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          ;
          ; Create entry for the bootstrap host
          ; bootstrap       IN      A       192.168.7.12
          ;
          ; Create entries for the master hosts
          ;master-0                IN      A       192.168.7.13
          ;master-1                IN      A       192.168.7.14
          ;master-2                IN      A       192.168.7.15
          ;
          ; Create entries for the worker hosts
          ;worker-0                IN      A       192.168.7.16
          ;worker-1                IN      A       192.168.7.17
          ;worker-2                IN      A       192.168.7.18
          ;
          ; The ETCd cluster lives on the masters...so point these to the IP of the masters
          ;etcd-0  IN      A       192.168.7.13
          ;etcd-1  IN      A       192.168.7.14
          ;etcd-2  IN      A       192.168.7.15
          ;
          ; Create entries for the other hosts
          ;registry        IN    LUA    A    "ifportup(8443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          ;yum             IN      A       192.168.7.1
          ;quay            IN    LUA    A    "ifportup(8443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          nexus           IN      A       192.168.7.1
          git             IN      A       192.168.7.11
          tmp-registry    IN      A       192.168.7.177
      mode: 420
      user:
        name: root

    - path: /etc/pdns/infra.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.infra.redhat.ren. postmaster.infra.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    A    192.168.7.13
          quay            IN    LUA    A    "ifportup(8443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
          quaylab         IN    LUA    A    "ifportup(8443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"

      mode: 420
      user:
        name: root
systemd:
  units:
    - name: pdns.service
      enabled: true

    - name: kubelet.service
      dropins:
      - name: 99-after-pdns.conf
        contents: |
          [Unit]
          Requires=pdns.service
          After=pdns.service

EOF

butane ${BASE_DIR}/data/sno/pdns.bu > ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

# oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

update registries.conf to point to local registry

这个步骤,也和 single node ocp是一样的。根据集群的安装方法不同,而不同。

This step is also the same as for single node ocp; the details depend on how the cluster was installed.


######################
# run as root
cd /data/ocp4
bash image.registries.conf.sh quay.infra.redhat.ren:8443
######################

oc patch mcp/master --patch '{"spec":{"paused":true}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":true}}' --type=merge

oc apply -f /data/ocp4/99-worker-container-registries.yaml
oc apply -f /data/ocp4/99-master-container-registries.yaml

oc patch mcp/master --patch '{"spec":{"paused":false}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":false}}' --type=merge

set sno dns to local dns service

把dns指向到本地的 power dns, 指向的方法根据集群安装的方法各不相同。作者的 3 node / compact cluster 是这么安装的,因为网络使用ovn,dns配置信息会在启动的时候,从网卡copy到 br-ex 上,所以作者需要在每个节点上,修改网卡的dns指向,然后重启。

Point the DNS at the local power dns; the exact method depends on how the cluster was installed. The author's 3 node / compact cluster was installed as described here. Because the network uses OVN, the DNS configuration is copied from the NIC to br-ex at startup, so the author needs to change the DNS setting of the NIC on each node and then reboot.

# for master-01
nmcli con mod enp1s0 ipv4.dns 192.168.7.13
reboot

# for master-02
nmcli con mod enp1s0 ipv4.dns 192.168.7.14
reboot

# for master-03
nmcli con mod enp1s0 ipv4.dns 192.168.7.15
reboot

# after reboot, test the dns
dig @127.0.0.1 quaylab.infra.redhat.ren
# ; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> @127.0.0.1 quaylab.infra.redhat.ren
# ; (1 server found)
# ;; global options: +cmd
# ;; Got answer:
# ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55590
# ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
# ;; WARNING: recursion requested but not available

# ;; OPT PSEUDOSECTION:
# ; EDNS: version: 0, flags:; udp: 1232
# ;; QUESTION SECTION:
# ;quaylab.infra.redhat.ren.      IN      A

# ;; ANSWER SECTION:
# quaylab.infra.redhat.ren. 10    IN      A       192.168.7.15

# ;; Query time: 7 msec
# ;; SERVER: 127.0.0.1#53(127.0.0.1)
# ;; WHEN: Thu Sep 15 02:23:09 UTC 2022
# ;; MSG SIZE  rcvd: 69


dig @127.0.0.1 api.acm-demo-hub.redhat.ren
# ; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> @127.0.0.1 api.acm-demo-hub.redhat.ren
# ; (1 server found)
# ;; global options: +cmd
# ;; Got answer:
# ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14103
# ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
# ;; WARNING: recursion requested but not available

# ;; OPT PSEUDOSECTION:
# ; EDNS: version: 0, flags:; udp: 1232
# ;; QUESTION SECTION:
# ;api.acm-demo-hub.redhat.ren.   IN      A

# ;; ANSWER SECTION:
# api.acm-demo-hub.redhat.ren. 10 IN      A       192.168.7.15

# ;; Query time: 1 msec
# ;; SERVER: 127.0.0.1#53(127.0.0.1)
# ;; WHEN: Thu Sep 15 02:24:19 UTC 2022
# ;; MSG SIZE  rcvd: 72

dig @127.0.0.1 a.apps.acm-demo-hub.redhat.ren
# ; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> @127.0.0.1 a.apps.acm-demo-hub.redhat.ren
# ; (1 server found)
# ;; global options: +cmd
# ;; Got answer:
# ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16264
# ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
# ;; WARNING: recursion requested but not available

# ;; OPT PSEUDOSECTION:
# ; EDNS: version: 0, flags:; udp: 1232
# ;; QUESTION SECTION:
# ;a.apps.acm-demo-hub.redhat.ren.        IN      A

# ;; ANSWER SECTION:
# a.apps.acm-demo-hub.redhat.ren. 10 IN   A       192.168.7.14

# ;; Query time: 1 msec
# ;; SERVER: 127.0.0.1#53(127.0.0.1)
# ;; WHEN: Thu Sep 15 02:25:20 UTC 2022
# ;; MSG SIZE  rcvd: 75

test with force power off

我们知道,如果ocp node意外断电的话,启动的时候,他会重新下载集群需要的基础镜像,那么我们就暴力断电其中一个节点,来测试这个节点能否启动吧。

We know that if an ocp node is powered off unexpectedly, it re-downloads the base images the cluster needs when it starts up, so let's cut the power of one of the nodes abruptly and test whether that node can come back up.

oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-80dda25e010fb6de88514875eefd7c19   True      False      False      3              3                   3                     0                      19h
# worker   rendered-worker-df248a1c64755ca00714f4f2b6d13e48   True      False      False      0              0                   0                     0                      19h

oc get node
# NAME             STATUS   ROLES           AGE   VERSION
# master-01-demo   Ready    master,worker   19h   v1.23.5+012e945
# master-02-demo   Ready    master,worker   19h   v1.23.5+012e945
# master-03-demo   Ready    master,worker   19h   v1.23.5+012e945

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.26   True        False         False      3m14s
# baremetal                                  4.10.26   True        False         False      19h
# cloud-controller-manager                   4.10.26   True        False         False      19h
# cloud-credential                           4.10.26   True        False         False      19h
# cluster-autoscaler                         4.10.26   True        False         False      19h
# config-operator                            4.10.26   True        False         False      19h
# console                                    4.10.26   True        False         False      3m58s
# csi-snapshot-controller                    4.10.26   True        False         False      19h
# dns                                        4.10.26   True        False         False      153m
# etcd                                       4.10.26   True        False         False      19h
# image-registry                             4.10.26   True        False         False      19h
# ingress                                    4.10.26   True        False         False      130m
# insights                                   4.10.26   True        False         False      55s
# kube-apiserver                             4.10.26   True        False         False      19h
# kube-controller-manager                    4.10.26   True        False         False      19h
# kube-scheduler                             4.10.26   True        False         False      19h
# kube-storage-version-migrator              4.10.26   True        False         False      71m
# machine-api                                4.10.26   True        False         False      19h
# machine-approver                           4.10.26   True        False         False      19h
# machine-config                             4.10.26   True        False         False      12h
# marketplace                                4.10.26   True        False         False      19h
# monitoring                                 4.10.26   True        False         False      19h
# network                                    4.10.26   True        False         False      19h
# node-tuning                                4.10.26   True        False         False      19h
# openshift-apiserver                        4.10.26   True        False         False      131m
# openshift-controller-manager               4.10.26   True        False         False      19h
# openshift-samples                          4.10.26   True        False         False      19h
# operator-lifecycle-manager                 4.10.26   True        False         False      19h
# operator-lifecycle-manager-catalog         4.10.26   True        False         False      19h
# operator-lifecycle-manager-packageserver   4.10.26   True        False         False      131m
# service-ca                                 4.10.26   True        False         False      19h
# storage                                    4.10.26   True        False         False      19h

测试结果,能正常启动。

The test shows that the node boots up normally.

test shutdown 1 master

我们关掉一个节点,然后看集群的状态

We shut down one node and then check the status of the cluster.

oc get node
# NAME             STATUS     ROLES           AGE   VERSION
# master-01-demo   NotReady   master,worker   19h   v1.23.5+012e945
# master-02-demo   Ready      master,worker   19h   v1.23.5+012e945
# master-03-demo   Ready      master,worker   19h   v1.23.5+012e945

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.26   True        False         False      8m5s
# baremetal                                  4.10.26   True        False         False      19h
# cloud-controller-manager                   4.10.26   True        False         False      19h
# cloud-credential                           4.10.26   True        False         False      19h
# cluster-autoscaler                         4.10.26   True        False         False      19h
# config-operator                            4.10.26   True        False         False      19h
# console                                    4.10.26   True        False         False      14m
# csi-snapshot-controller                    4.10.26   True        False         False      19h
# dns                                        4.10.26   True        True          False      164m    DNS "default" reports Progressing=True: "Have 2 available node-resolver pods, want 3."
# etcd                                       4.10.26   True        False         True       19h     ClusterMemberControllerDegraded: unhealthy members found during reconciling members...
# image-registry                             4.10.26   True        False         False      19h
# ingress                                    4.10.26   True        False         False      141m
# insights                                   4.10.26   True        False         False      93s
# kube-apiserver                             4.10.26   True        False         True       19h     NodeControllerDegraded: The master nodes not ready: node "master-01-demo" not ready since 2022-09-15 03:33:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
# kube-controller-manager                    4.10.26   True        False         True       19h     NodeControllerDegraded: The master nodes not ready: node "master-01-demo" not ready since 2022-09-15 03:33:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
# kube-scheduler                             4.10.26   True        False         True       19h     NodeControllerDegraded: The master nodes not ready: node "master-01-demo" not ready since 2022-09-15 03:33:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
# kube-storage-version-migrator              4.10.26   True        False         False      82m
# machine-api                                4.10.26   True        False         False      19h
# machine-approver                           4.10.26   True        False         False      19h
# machine-config                             4.10.26   True        False         False      12h
# marketplace                                4.10.26   True        False         False      19h
# monitoring                                 4.10.26   True        False         False      19h
# network                                    4.10.26   True        True          False      19h     DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)...
# node-tuning                                4.10.26   True        False         False      19h
# openshift-apiserver                        4.10.26   True        False         False      8m
# openshift-controller-manager               4.10.26   True        False         False      19h
# openshift-samples                          4.10.26   True        False         False      19h
# operator-lifecycle-manager                 4.10.26   True        False         False      19h
# operator-lifecycle-manager-catalog         4.10.26   True        False         False      19h
# operator-lifecycle-manager-packageserver   4.10.26   True        False         False      142m
# service-ca                                 4.10.26   True        False         False      19h
# storage                                    4.10.26   True        False         False      19h


关闭了一个节点,集群还能工作。

After shutting down a node, the cluster still works.

看看web console能否使用? / Let's see whether the web console still works.
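
A quick check from the command line, assuming the default console route name of openshift:

oc get route -n openshift-console console

curl -k -s -o /dev/null -w '%{http_code}\n' https://console-openshift-console.apps.acm-demo-hub.redhat.ren
# expect 200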

test with ocp upgrade

我们上传的镜像,包括了4.10.26, 4.10.28两个版本,那么我们就来试试升级吧

The images we uploaded include two versions, 4.10.26 and 4.10.28, so let's try an upgrade.

oc get node
# NAME             STATUS   ROLES           AGE   VERSION
# master-01-demo   Ready    master,worker   19h   v1.23.5+012e945
# master-02-demo   Ready    master,worker   19h   v1.23.5+012e945
# master-03-demo   Ready    master,worker   19h   v1.23.5+012e945

oc get clusterversion
# NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
# version   4.10.26   True        False         19h     Cluster version is 4.10.26

# upgrade ocp to 4.10.28
oc adm upgrade \
  --to-image=quay.io/openshift-release-dev/ocp-release@sha256:2127608ebd67a2470860c42368807a0de2308dba144ec4c298bec1c03d79cb52 \
  --allow-explicit-upgrade --allow-upgrade-with-warnings=true --force=true 

# after upgrade
oc get clusterversion
# NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
# version   4.10.28   True        False         43m     Cluster version is 4.10.28

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.28   True        False         False      62m
# baremetal                                  4.10.28   True        False         False      21h
# cloud-controller-manager                   4.10.28   True        False         False      21h
# cloud-credential                           4.10.28   True        False         False      22h
# cluster-autoscaler                         4.10.28   True        False         False      21h
# config-operator                            4.10.28   True        False         False      21h
# console                                    4.10.28   True        False         False      148m
# csi-snapshot-controller                    4.10.28   True        False         False      21h
# dns                                        4.10.28   True        False         False      4h58m
# etcd                                       4.10.28   True        False         False      21h
# image-registry                             4.10.28   True        False         False      21h
# ingress                                    4.10.28   True        False         False      4h35m
# insights                                   4.10.28   True        False         False      81s
# kube-apiserver                             4.10.28   True        False         False      21h
# kube-controller-manager                    4.10.28   True        False         False      21h
# kube-scheduler                             4.10.28   True        False         False      21h
# kube-storage-version-migrator              4.10.28   True        False         False      54m
# machine-api                                4.10.28   True        False         False      21h
# machine-approver                           4.10.28   True        False         False      21h
# machine-config                             4.10.28   True        False         False      129m
# marketplace                                4.10.28   True        False         False      21h
# monitoring                                 4.10.28   True        False         False      21h
# network                                    4.10.28   True        False         False      21h
# node-tuning                                4.10.28   True        False         False      100m
# openshift-apiserver                        4.10.28   True        False         False      142m
# openshift-controller-manager               4.10.28   True        False         False      21h
# openshift-samples                          4.10.28   True        False         False      98m
# operator-lifecycle-manager                 4.10.28   True        False         False      21h
# operator-lifecycle-manager-catalog         4.10.28   True        False         False      21h
# operator-lifecycle-manager-packageserver   4.10.28   True        False         False      4h36m
# service-ca                                 4.10.28   True        False         False      21h
# storage                                    4.10.28   True        False         False      21h

oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-24f4773e2eb47a6524572c1e7185e836   True      False      False      3              3                   3                     0                      21h
# worker   rendered-worker-28261f188bfcb7348c5f6aab2e876b2e   True      False      False      0              0                   0                     0                      21h

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:822737b305b28aa4890f7bf847ebebc896cd7b549318195fc8c953ae3008cc44
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208161501-0 (2022-08-16T15:04:45Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor pdns-selinux

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23d0609643c25efcd30a7a64483fdee2343ced26b1fd08c0cbf8d03a5d405939
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 410.84.202208030316-0 (2022-08-03T03:19:21Z)
#            LayeredPackages: docker-distribution htop pdns pdns-recursor pdns-selinux

我们可以看到,升级成功,各个后安装的软件包也都在。

We can see that the upgrade succeeds, and all the layered packages we installed are still there.

web console工作也正常。 / The web console works fine too.

finished

notes

research


yum install -y pdns pdns-recursor

mv /etc/pdns/pdns.conf /etc/pdns/pdns.conf.bak

cat << EOF > /etc/pdns/pdns.conf
launch=bind
local-address=127.0.0.1
local-port=5301
setgid=pdns
setuid=pdns
bind-config=/etc/pdns/bind.conf
bind-check-interval=300
enable-lua-records=yes
EOF

cat << EOF > /etc/pdns/bind.conf
zone "ocp4.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
EOF

cat << 'EOF' > /etc/pdns/inside-out.xyz
$TTL 180 
@ IN SOA ns1.ocp4.redhat.ren. postmaster.ocp4.redhat.ren. (
        2014080704 ; Serial Number (date YYYYMMDD++) 
        3H              ; refresh (3 hours)
        30M             ; retry (30 minutes)
        2W              ; expiry (2 weeks)
        1W )            ; minimum (1 week)
        IN NS ns1.ocp4.redhat.ren.
        IN NS ns2.ocp4.redhat.ren.
@       IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
ns1     IN A 8.8.8.8 
ns2     IN A 8.8.4.4
helper  IN      A       192.168.7.11
;
;
; The api points to the IP of your load balancer
api             IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
api-int         IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
;
; The wildcard also points to the load balancer
*.apps          IN    LUA    A    "ifportup(443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
;
; Create entry for the bootstrap host
; bootstrap       IN      A       192.168.7.12
;
; Create entries for the master hosts
master-0                IN      A       192.168.7.13
master-1                IN      A       192.168.7.14
master-2                IN      A       192.168.7.15
;
; Create entries for the worker hosts
worker-0                IN      A       192.168.7.16
worker-1                IN      A       192.168.7.17
worker-2                IN      A       192.168.7.18
;
; The ETCd cluster lives on the masters...so point these to the IP of the masters
etcd-0  IN      A       192.168.7.13
etcd-1  IN      A       192.168.7.14
etcd-2  IN      A       192.168.7.15
;
; Create entries for the other hosts
registry        IN    LUA    A    "ifportup(5443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
yum             IN      A       192.168.7.1
quay            IN    LUA    A    "ifportup(5443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
nexus           IN      A       192.168.7.1
git             IN      A       192.168.7.11
tmp-registry    IN      A       192.168.7.177
EOF

# ausearch -c 'pdns_server' --raw | audit2allow -M my-pdnsserver
# semodule -X 300 -i my-pdnsserver.pp

# SELinux is preventing /usr/sbin/pdns_server from name_connect access on the tcp_socket port 6443.

# *****  Plugin connect_ports (92.2 confidence) suggests   *********************

# If you want to allow /usr/sbin/pdns_server to connect to network port 6443
# Then you need to modify the port type.
# Do
# # semanage port -a -t PORT_TYPE -p tcp 6443
#     where PORT_TYPE is one of the following: dns_port_t, dnssec_port_t, kerberos_port_t, ocsp_port_t.
#                                                                                                                                                                                                       *****  Plugin catchall_boolean (7.83 confidence) suggests   ******************

# If you want to allow system to run with NIS
# Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.

# Do
# setsebool -P nis_enabled 1

# *****  Plugin catchall (1.41 confidence) suggests   **************************

# If you believe that pdns_server should be allowed name_connect access on the port 6443 tcp_socket by default.
# Then you should report this as a bug.
# You can generate a local policy module to allow this access.
# Do
# allow this access for now by executing:
# # ausearch -c 'pdns/distributo' --raw | audit2allow -M my-pdnsdistributo
# # semodule -X 300 -i my-pdnsdistributo.pp

systemctl enable --now pdns

pdnsutil check-all-zones

mv /etc/pdns-recursor/recursor.conf /etc/pdns-recursor/recursor.conf.bak

cat << EOF > /etc/pdns-recursor/recursor.conf
local-address=0.0.0.0 ::
allow-from=192.168.7.0/0    # allow requests from all clients
dnssec=off    # turn off dnssec
forward-zones=ocp4.redhat.ren=127.0.0.1:5301 
forward-zones-recurse=.=114.114.114.114
setgid=pdns-recursor
setuid=pdns-recursor
security-poll-suffix=
EOF

systemctl enable --now pdns-recursor

ausearch -m avc --start recent -i

audit2allow -a -M wzh-pdns

semodule -i wzh-pdns.pp


systemctl restart pdns


dig @127.0.0.1 helper.ocp4.redhat.ren

dig @127.0.0.1 api.ocp4.redhat.ren

dig @127.0.0.1 c.apps.ocp4.redhat.ren

dig @127.0.0.1 registry.ocp4.redhat.ren

test standalone


dnf install -y epel-release

dnf install -y pdns pdns-recursor

dnf update -y

semodule -i wzh-pdns.pp

cat << EOF > /etc/pdns/pdns.conf
launch=bind
local-address=0.0.0.0 ::
# local-port=5301
setgid=pdns
setuid=pdns
bind-config=/etc/pdns/bind.conf
bind-check-interval=300
enable-lua-records=yes
EOF

cat << EOF > /etc/pdns/bind.conf
zone "ocp4.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
EOF

cat << 'EOF' > /etc/pdns/inside-out.xyz
$TTL 180 
@ IN SOA ns1.ocp4.redhat.ren. postmaster.ocp4.redhat.ren. (
        2014080704 ; Serial Number (date YYYYMMDD++) 
        3H              ; refresh (3 hours)
        30M             ; retry (30 minutes)
        2W              ; expiry (2 weeks)
        1W )            ; minimum (1 week)
        IN NS ns1.ocp4.redhat.ren.
        IN NS ns2.ocp4.redhat.ren.
@       IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
ns1     IN A 8.8.8.8 
ns2     IN A 8.8.4.4
helper  IN      A       192.168.7.11
;
;
; The api points to the IP of your load balancer
api             IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
api-int         IN    LUA    A    "ifportup(6443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
;
; The wildcard also points to the load balancer
*.apps          IN    LUA    A    "ifportup(443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
;
; Create entry for the bootstrap host
; bootstrap       IN      A       192.168.7.12
;
; Create entries for the master hosts
master-0                IN      A       192.168.7.13
master-1                IN      A       192.168.7.14
master-2                IN      A       192.168.7.15
;
; Create entries for the worker hosts
worker-0                IN      A       192.168.7.16
worker-1                IN      A       192.168.7.17
worker-2                IN      A       192.168.7.18
;
; The ETCd cluster lives on the masters...so point these to the IP of the masters
etcd-0  IN      A       192.168.7.13
etcd-1  IN      A       192.168.7.14
etcd-2  IN      A       192.168.7.15
;
; Create entries for the other hosts
registry        IN    LUA    A    "ifportup(5443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
yum             IN      A       192.168.7.1
quay            IN    LUA    A    "ifportup(5443, {'192.168.7.13', '192.168.7.14', '192.168.7.15'})"
nexus           IN      A       192.168.7.1
git             IN      A       192.168.7.11
tmp-registry    IN      A       192.168.7.177
EOF

systemctl enable --now pdns

dig @127.0.0.1 helper.ocp4.redhat.ren

dig @127.0.0.1 api.ocp4.redhat.ren

dig @127.0.0.1 c.apps.ocp4.redhat.ren

dig @127.0.0.1 registry.ocp4.redhat.ren

test install


======================================================================================================================================================================================================
 Package                                                Architecture                            Version                                              Repository                                  Size
======================================================================================================================================================================================================
Installing:
 pdns                                                   x86_64                                  4.6.2-1.el8                                          epel                                       3.7 M
 pdns-recursor                                          x86_64                                  4.3.6-1.el8                                          epel                                       2.0 M
Installing dependencies:
 boost-context                                          x86_64                                  1.66.0-10.el8                                        appstream                                   15 k
 boost-program-options                                  x86_64                                  1.66.0-10.el8                                        appstream                                  140 k
 libsodium                                              x86_64                                  1.0.18-2.el8                                         epel                                       162 k
 luajit                                                 x86_64                                  2.1.0-0.16beta3.el8                                  epel                                       359 k
 protobuf                                               x86_64                                  3.5.0-13.el8                                         appstream                                  892 k

Transaction Summary
======================================================================================================================================================================================================
Install  7 Packages



registry


cat << EOF > /usr/lib/systemd/system/docker-distribution.service
[Unit]
Description=v2 Registry server for Docker
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/registry serve /etc/wzh/registry-config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

EOF

mkdir -p /etc/wzh

cat << EOF > /etc/wzh/registry-config.yml
version: 0.1
log:
  accesslog:
    disabled: true
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /var/wzh-registry
    delete:
        enabled: false
    maintenance:
        readonly:
            enabled: true
http:
    addr: :5443
    tls:
       certificate: /etc/wzh/redhat.ren.crt
       key: /etc/wzh/redhat.ren.key
EOF


# 配置registry
export VAR_CERT_DIR=/etc/wzh/
mkdir -p ${VAR_CERT_DIR} && cd ${VAR_CERT_DIR}

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out ${VAR_CERT_DIR}/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key ${VAR_CERT_DIR}/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out ${VAR_CERT_DIR}/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out ${VAR_CERT_DIR}/redhat.ren.key 2048

openssl req -new -sha256 \
    -key ${VAR_CERT_DIR}/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out ${VAR_CERT_DIR}/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in ${VAR_CERT_DIR}/redhat.ren.csr \
    -CA ${VAR_CERT_DIR}/redhat.ren.ca.crt \
    -CAkey ${VAR_CERT_DIR}/redhat.ren.ca.key \
    -CAcreateserial -out ${VAR_CERT_DIR}/redhat.ren.crt

openssl x509 -in ${VAR_CERT_DIR}/redhat.ren.crt -text

/bin/cp -f ${VAR_CERT_DIR}/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cat << EOF >> /etc/hosts

127.0.0.1       registry.redhat.ren

EOF

mkdir -p /var/wzh-registry

systemctl restart docker-distribution
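
To confirm that the standalone registry serves with the self-signed certificate (the CA was added to the system trust store above), a couple of quick checks:

openssl s_client -connect registry.redhat.ren:5443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

curl -s https://registry.redhat.ren:5443/v2/_catalog
# an empty registry returns {"repositories":[]}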

podman for pdns & registry


mkdir -p /data/pdns/conf
cd /data/pdns

cat > /data/pdns/pdns.Dockerfile << EOF
FROM docker.io/library/almalinux:8

RUN dnf -y install epel-release

RUN dnf -y update

RUN dnf -y install pdns pdns-recursor

ENTRYPOINT ["/usr/sbin/pdns_server"]
CMD ["--socket-dir=/tmp/pdns", "--guardian=no", "--daemon=no", "--disable-syslog", "--log-timestamp=no", "--write-pid=no"]
EOF

podman build --squash -t quay.io/nepdemo/pdns:4.6.2-alma8 -f pdns.Dockerfile .

podman push quay.io/nepdemo/pdns:4.6.2-alma8

cat > /data/pdns/pdns.Dockerfile << EOF
FROM registry.access.redhat.com/ubi8

RUN dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

RUN dnf -y update

RUN dnf -y install pdns pdns-recursor

ENTRYPOINT ["/usr/sbin/pdns_server"]
CMD ["--socket-dir=/tmp/pdns", "--guardian=no", "--daemon=no", "--disable-syslog", "--log-timestamp=no", "--write-pid=no"]
EOF

podman build --squash -t quay.io/nepdemo/pdns:4.6.2-ubi8 -f pdns.Dockerfile .

podman push quay.io/nepdemo/pdns:4.6.2-ubi8

cat > /data/pdns/conf/pdns.conf << EOF
launch=bind
local-address=0.0.0.0
local-port=53
setgid=pdns
setuid=pdns
bind-config=/etc/pdns/bind.conf
bind-check-interval=300
enable-lua-records=yes
EOF

cat > /data/pdns/conf/bind.conf << EOF
zone "acm-demo-hub.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
zone "infra.redhat.ren" { type master; file "/etc/pdns/infra.xyz"; };
EOF

cat > /data/pdns/conf/inside-out.xyz << 'EOF'
$TTL 10 
@ IN SOA ns1.acm-demo-hub.redhat.ren. postmaster.acm-demo-hub.redhat.ren. (
        2014080704 ; Serial Number (date YYYYMMDD++) 
        3H              ; refresh (3 hours)
        30M             ; retry (30 minutes)
        2W              ; expiry (2 weeks)
        1W )            ; minimum (1 week)
        ;IN NS ns1.ocp4.redhat.ren.
        ;IN NS ns2.ocp4.redhat.ren.
@       IN    A    192.168.7.13
;ns1     IN A 8.8.8.8 
;ns2     IN A 8.8.4.4
helper  IN      A       192.168.7.11
;
;
; The api points to the IP of your load balancer
api             IN    A    192.168.7.13
api-int         IN    A    192.168.7.13
;
; The wildcard also points to the load balancer
*.apps          IN    A    192.168.7.13
;
; Create entry for the bootstrap host
; bootstrap       IN      A       192.168.7.12
;
; Create entries for the master hosts
master-0                IN      A       192.168.7.13
;master-1                IN      A       192.168.7.14
;master-2                IN      A       192.168.7.15
;
; Create entries for the worker hosts
;worker-0                IN      A       192.168.7.16
;worker-1                IN      A       192.168.7.17
;worker-2                IN      A       192.168.7.18
;
; The ETCd cluster lives on the masters...so point these to the IP of the masters
;etcd-0  IN      A       192.168.7.13
;etcd-1  IN      A       192.168.7.14
;etcd-2  IN      A       192.168.7.15
;
; Create entries for the other hosts
registry        IN      A       192.168.7.13
yum             IN      A       192.168.7.1
nexus           IN      A       192.168.7.1
git             IN      A       192.168.7.11
tmp-registry    IN      A       192.168.7.177
EOF

cat > /data/pdns/conf/infra.xyz << 'EOF'
$TTL 10 
@ IN SOA ns1.infra.redhat.ren. postmaster.infra.redhat.ren. (
        2014080704 ; Serial Number (date YYYYMMDD++) 
        3H              ; refresh (3 hours)
        30M             ; retry (30 minutes)
        2W              ; expiry (2 weeks)
        1W )            ; minimum (1 week)
        ;IN NS ns1.ocp4.redhat.ren.
        ;IN NS ns2.ocp4.redhat.ren.
@       IN    A    192.168.7.13
quay            IN    LUA    A    "ifportup(5180, {'158.247.225.4', '192.168.7.14', '192.168.7.15'})"
quaylab         IN    A    192.168.7.13
EOF

rm -f /tmp/pdns-*

podman run \
  --name local-pdns \
  --network=host \
  -v /data/pdns/conf/:/etc/pdns/:z \
  --conmon-pidfile /tmp/pdns-pid \
  --cidfile /tmp/pdns-cid \
  --cgroups=no-conmon \
  --replace \
  quay.io/nepdemo/pdns:4.6.2-ubi8

/usr/bin/podman stop --ignore --cidfile /tmp/pdns-cid -t 1
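
If the container should survive reboots, the podman run above can be wrapped in a systemd unit, following the same pattern as the docker-distribution unit further below; a sketch:

cat << 'EOF' > /etc/systemd/system/local-pdns.service
[Unit]
Description=PowerDNS authoritative server in a container
After=network-online.target

[Service]
Type=simple
ExecStartPre=-/bin/rm -f %t/%n-pid %t/%n-cid
ExecStart=/usr/bin/podman run \
    --name local-pdns \
    --network=host \
    -v /data/pdns/conf/:/etc/pdns/:z \
    --conmon-pidfile %t/%n-pid \
    --cidfile %t/%n-cid \
    --cgroups=no-conmon \
    --replace \
    quay.io/nepdemo/pdns:4.6.2-ubi8
ExecStop=-/usr/bin/podman stop --ignore --cidfile %t/%n-cid -t 1
ExecStopPost=-/usr/bin/podman rm --ignore -f --cidfile %t/%n-cid
PIDFile=%t/%n-pid
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now local-pdns.service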

registry


cat << EOF > ${BASE_DIR}/data/sno/registry.images.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-registry
storage:
  files:
    - path: /etc/wzh/redhat.ren.crt
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.crt )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/redhat.ren.key
      overwrite: true
      contents:
        source: data:text/plain;charset=utf-8;base64,$( base64 -w 0 < ${VAR_CERT_DIR}/redhat.ren.key )
      mode: 420
      user:
        name: root

    - path: /etc/wzh/registry-config.yml
      overwrite: true
      contents:
        inline: |
          version: 0.1
          log:
            accesslog:
                disabled: true
            fields:
                service: registry
          storage:
              cache:
                  layerinfo: inmemory
              filesystem:
                  rootdirectory: /var/wzh-registry
              delete:
                  enabled: true
              maintenance:
                  readonly:
                      enabled: false
          http:
              addr: :8443
              tls:
                certificate: /etc/wzh/redhat.ren.crt
                key: /etc/wzh/redhat.ren.key
      mode: 420
      user:
        name: root

systemd:
  units:
    - contents: |
        [Unit]
        Description=Set SELinux chcon for image registry
        Before=docker-distribution.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        User=root
        ExecStartPre=-mkdir -p /var/wzh-registry
        ExecStart=/usr/bin/chcon -Rt container_file_t /var/wzh-registry

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: hostpath-registry.service

    - contents: |
        [Unit]
        Description=v2 Registry server for Docker
        After=network.target hostpath-registry.service
        Requires=hostpath-registry.service
        Before=kubelet.service

        [Service]
        Type=simple
        TimeoutStartSec=5m
        ExecStartPre=-/bin/rm -f %t/%n-pid %t/%n-cid
        ExecStart=/usr/bin/podman run \
            --name local-registry \
            --network=host \
            -v /var/wzh-registry/:/var/lib/registry:z \
            -v /etc/wzh:/certs:z \
            -e REGISTRY_HTTP_ADDR=0.0.0.0:8443 \
            -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
            -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
            --conmon-pidfile %t/%n-pid \
            --cidfile %t/%n-cid \
            --cgroups=no-conmon \
            --replace \
            docker.io/library/registry:2

        ExecStop=-/usr/bin/podman stop --ignore --cidfile %t/%n-cid -t 1
        ExecStopPost=-/usr/bin/podman rm --ignore -f --cidfile %t/%n-cid
        PIDFile=%t/%n-pid
        KillMode=none
        Restart=always
        RestartSec=30

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: docker-distribution.service

    - name: kubelet.service
      dropins:
      - name: 99-after-registry.conf
        contents: |
          [Unit]
          Requires=docker-distribution.service
          After=docker-distribution.service

EOF

butane ${BASE_DIR}/data/sno/registry.images.bu > ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml

# oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml
# oc delete -f ${BASE_DIR}/data/sno/99-zzz-master-registry.yaml
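
After the MachineConfig is created, the machine-config-operator renders it into the master pool and reboots the node. A small sketch for watching the rollout and then probing the registry (standard oc and curl calls; the quaylab.infra.redhat.ren name assumes the pdns zones below are in use):

oc get mc 99-zzz-master-registry
oc get mcp master
# wait until the master pool reports UPDATED=True

# once the node is back, the registry should answer over TLS on 8443
curl -k https://quaylab.infra.redhat.ren:8443/v2/_catalog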


pdns



cat > ${BASE_DIR}/data/sno/pdns.bu << 'EOF' 
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-pdns
storage:
  files:
    - path: /etc/pdns/pdns.conf
      overwrite: true
      contents:
        inline: |
          launch=bind
          local-address=0.0.0.0
          local-port=53
          setgid=pdns
          setuid=pdns
          bind-config=/etc/pdns/bind.conf
          bind-check-interval=300
          enable-lua-records=yes
      mode: 420
      user:
        name: root

    - path: /etc/pdns/bind.conf
      overwrite: true
      contents:
        inline: |
          zone "acm-demo-hub.redhat.ren" { type master; file "/etc/pdns/inside-out.xyz"; };
          zone "infra.redhat.ren" { type master; file "/etc/pdns/infra.xyz"; };
      mode: 420
      user:
        name: root

    - path: /etc/pdns/inside-out.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.acm-demo-hub.redhat.ren. postmaster.acm-demo-hub.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    A    192.168.7.13
          ;ns1     IN A 8.8.8.8 
          ;ns2     IN A 8.8.4.4
          helper  IN      A       192.168.7.11
          ;
          ;
          ; The api points to the IP of your load balancer
          api             IN    A    192.168.7.13
          api-int         IN    A    192.168.7.13
          ;
          ; The wildcard also points to the load balancer
          *.apps          IN    A    192.168.7.13
          ;
          ; Create entry for the bootstrap host
          ; bootstrap       IN      A       192.168.7.12
          ;
          ; Create entries for the master hosts
          master-0                IN      A       192.168.7.13
          ;master-1                IN      A       192.168.7.14
          ;master-2                IN      A       192.168.7.15
          ;
          ; Create entries for the worker hosts
          ;worker-0                IN      A       192.168.7.16
          ;worker-1                IN      A       192.168.7.17
          ;worker-2                IN      A       192.168.7.18
          ;
          ; The ETCd cluster lives on the masters...so point these to the IP of the masters
          ;etcd-0  IN      A       192.168.7.13
          ;etcd-1  IN      A       192.168.7.14
          ;etcd-2  IN      A       192.168.7.15
          ;
          ; Create entries for the other hosts
          registry        IN      A       192.168.7.13
          yum             IN      A       192.168.7.1
          nexus           IN      A       192.168.7.1
          git             IN      A       192.168.7.11
          tmp-registry    IN      A       192.168.7.177
      mode: 420
      user:
        name: root

    - path: /etc/pdns/infra.xyz
      overwrite: true
      contents:
        inline: |
          $TTL 10 
          @ IN SOA ns1.infra.redhat.ren. postmaster.infra.redhat.ren. (
                  2014080704 ; Serial Number (date YYYYMMDD++) 
                  3H              ; refresh (3 hours)
                  30M             ; retry (30 minutes)
                  2W              ; expiry (2 weeks)
                  1W )            ; minimum (1 week)
                  ;IN NS ns1.ocp4.redhat.ren.
                  ;IN NS ns2.ocp4.redhat.ren.
          @       IN    A    192.168.7.13
          quay            IN    A    192.168.7.13
          quaylab         IN    A    192.168.7.13

      mode: 420
      user:
        name: root
systemd:
  units:
    - contents: |
        [Unit]
        Description=PowerDNS Authoritative Server
        After=network.target
        Before=kubelet.service

        [Service]
        Type=simple
        TimeoutStartSec=5m
        ExecStartPre=-/bin/rm -f %t/%n-pid %t/%n-cid
        ExecStart=/usr/bin/podman run \
            --name local-pdns \
            --network=host \
            -v /etc/pdns/:/etc/pdns/:z \
            --conmon-pidfile %t/%n-pid \
            --cidfile %t/%n-cid \
            --cgroups=no-conmon \
            --replace \
            quay.io/nepdemo/pdns:4.6.2-ubi8

        ExecStop=-/usr/bin/podman stop --ignore --cidfile %t/%n-cid -t 1
        ExecStopPost=-/usr/bin/podman rm --ignore -f --cidfile %t/%n-cid
        PIDFile=%t/%n-pid
        KillMode=none
        Restart=always
        SyslogIdentifier=pdns_server
        User=pdns
        Group=pdns
        RestartSec=1
        StartLimitInterval=0
        RuntimeDirectory=pdns

        [Install]
        WantedBy=multi-user.target
      name: pdns.service
      enabled: true

    - name: kubelet.service
      dropins:
      - name: 99-after-pdns.conf
        contents: |
          [Unit]
          Requires=pdns.service
          After=pdns.service

EOF

butane ${BASE_DIR}/data/sno/pdns.bu > ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

oc create --save-config -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml

# oc apply -f ${BASE_DIR}/data/sno/99-zzz-master-pdns.yaml
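
Once this MachineConfig has rolled out, pdns runs directly on the node; a rough way to verify it (substitute your real node name, and query the node IP from the helper):

oc debug node/master-0 -- chroot /host systemctl is-active pdns.service
dig +short @192.168.7.13 api.acm-demo-hub.redhat.ren
# expect "active" and 192.168.7.13 respectively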


end

upgrade openshift 4.10 based rhcos to rhel 9.1 / 升级 openshift 4.10 基础操作系统到 rhel 9.1 支持海光x86 cpu

我们项目中,要求openshift支持海光x86 cpu,linux kernel大概是在4.20以后,合并了对海光x86 cpu支持的代码。但是当前版本的openshift(<4.12)都是基于rhel8的,rhel8的内核是基于4.18版本改造而来,还没有海光x86 cpu的支持。

好在redhat已经推出了rhel9, 是基于kernel 5.14的,经过实际测试,rhel9.1是能在海光x86 cpu上正常安装和运行的,那么我们就来试试,把openshift 4.10的底层操作系统rhcos,升级到rhel9.1的内核。

In our project, openshift is required to support the Hygon x86 cpu. Support for Hygon was merged into the linux kernel around version 4.20. However, the current openshift releases (<4.12) are based on rhel8, whose kernel is derived from 4.18 and therefore has no Hygon x86 cpu support.

Fortunately, redhat has released rhel9, which is based on kernel 5.14. Our testing shows that rhel9.1 installs and runs normally on the Hygon x86 cpu, so let's try upgrading rhcos, the underlying operating system of openshift 4.10, to the rhel9.1 kernel.

⚠️⚠️⚠️注意,本文所述方法,涉及到了以下问题,不能使用在生产环境中,只能作为 PoC 应急,或者研究学习之用。如果确实是项目急需,请和红帽GPS部门沟(gěi)通(qián),获得支持。

  • ⚠️编译需要多个 rhel 相关的特种源,而且还是 eus, tus 版本,这些都需要单独购买
  • ⚠️编译需要一个红帽内部的 repo 源,属于红帽机密
  • ⚠️自定义的 rhcos 不能得到红帽 CEE 支持

⚠️⚠️⚠️ Note that the method described in this article has the following issues and must not be used in a production environment. It is only suitable as a PoC stopgap or for research and study. If the project really needs it, please talk to the Red Hat GPS department for support.

  • ⚠️ The build requires several special rhel repos, including eus and tus variants, which must be purchased separately
  • ⚠️ The build requires a Red Hat internal repo, which is Red Hat confidential
  • ⚠️ A custom rhcos cannot be supported by Red Hat CEE

本次实验的架构图如下: The architecture diagram of this experiment is as follows:

过程中,重度使用了 cosa , 这个是 coreos-assembler 工具集中的命令,他封装了一系列的工具,根据一个配置文件项目,来自动化的编译出来 coreos/rhcos 镜像。

In the process we rely heavily on cosa, the command from the coreos-assembler tool set. It wraps a series of tools and builds a coreos/rhcos image automatically from a configuration-file project.
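
For orientation, the canonical cosa flow is only a handful of subcommands; the fully parameterized run used in this article follows below, but the skeleton looks like this (sketch only):

cosa init <config-git-repo>    # clone the configuration-file project into the workdir
cosa fetch                     # download the rpms referenced by the config
cosa build                     # build the ostree commit plus the qcow2 image
cosa buildextend-metal         # optionally build the raw metal image
cosa buildextend-live          # optionally build the live iso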

编译成果 / compiling result

以下是编译成果 / The following is the compiling result

  • openshift4.10.41 release image
    • quay.io/wangzheng422/ocp:4.10.41-rhel-9.1-v02
  • openshift4.10.41 os images
    • 百度分享 / baidu sharing: https://pan.baidu.com/s/16_T72CqQeS2rLJ4MzW4dEQ?pwd=zpbg

⚠️⚠️⚠️ 另外,编译成果并没有严格测试,还需要客户根据自己的场景,完善的测试以后,才可以使用。

⚠️⚠️⚠️ In addition, the build artifacts have not been rigorously tested; customers need to test them thoroughly against their own scenarios before putting them to use.

视频讲解 / Video explanation

准备 dnf repo 源 / Prepare the dnf repo source

注意,这些 repo 源都是需要特殊单独购买,请联系红帽销售和GPS服务部门。

Note that these repos must be purchased separately; please contact Red Hat Sales and the GPS services team.

rhel 9.1

我们首先要做的,是准备一个rhel9.1的rpm repo,这里有准备步骤。很遗憾,其中有几个openshift专用的repo,是不公开的。如果客户必须要这些repo的访问权限,请联系对口的SA,在公司内部申请试试。

The first thing to do is prepare a rhel9.1 rpm repo; the preparation steps are below. Unfortunately, a few of the openshift-specific repos are not public. If a customer really needs access to them, contact the account SA and try to apply for access internally.


# install a rhel on vultr

# disable user/passwd login
# ChallengeResponseAuthentication no
# PasswordAuthentication no
# UsePAM no
# sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
# sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config

cat << EOF > /etc/ssh/sshd_config.d/99-wzh.conf
PasswordAuthentication no
UsePAM no
EOF

systemctl restart sshd

ssh root@v.redhat.ren -o PubkeyAuthentication=no
# root@v.redhat.ren: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

subscription-manager register --auto-attach --username ******** --password ********

# subscription-manager release --list
# subscription-manager release --set=8.4

# subscription-manager config --rhsm.baseurl=https://china.cdn.redhat.com

subscription-manager repos --list > list

subscription-manager repos \
    --enable="rhel-9-for-x86_64-baseos-rpms" \
    --enable="rhel-9-for-x86_64-appstream-rpms" \
    --enable="codeready-builder-for-rhel-9-x86_64-rpms" \
    # 

dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm

dnf install -y htop createrepo_c

dnf install -y https://download-ib01.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/b/byobu-5.133-1.el8.noarch.rpm

# byobu
dnf update -y

reboot

mkdir -p /data/dnf

# Create new empty partitions, and filesystem
parted -s /dev/vdb mklabel gpt
parted -s /dev/vdb unit mib mkpart primary 0% 100%

mkfs.ext4 /dev/vdb1

cat << EOF >> /etc/fstab
/dev/vdb1               /data/dnf      ext4    defaults,noatime,nofail 0 0
EOF

mount /dev/vdb1 /data/dnf

mkdir -p /data/dnf/dnf-ocp

cd /data/dnf/dnf-ocp

# subscription-manager release --set=9.0

# dnf reposync --repoid rhel-9-for-x86_64-baseos-eus-rpms -m --download-metadata --delete -n
# dnf reposync --repoid=rhel-9-for-x86_64-appstream-eus-rpms -m --download-metadata --delete -n
dnf reposync --repoid rhel-9-for-x86_64-baseos-rpms -m --download-metadata --delete -n
dnf reposync --repoid=rhel-9-for-x86_64-appstream-rpms -m --download-metadata --delete -n
dnf reposync --repoid=rhel-9-for-x86_64-nfv-rpms -m --download-metadata --delete -n
# dnf reposync --repoid=advanced-virt-for-rhel-8-x86_64-eus-rpms -m --download-metadata --delete -n
dnf reposync --repoid=fast-datapath-for-rhel-9-x86_64-rpms -m --download-metadata --delete -n

subscription-manager release --set=9

# fix for coreos-installer version
mkdir -p /data/dnf/dnf-ocp/fixes
cd /data/dnf/dnf-ocp/fixes
# dnf download --resolve --alldeps coreos-installer coreos-installer-bootinfra
dnf download --resolve coreos-installer coreos-installer-bootinfra selinux-policy
createrepo ./

# username and password are confidential
cat << 'EOF' > /etc/yum.repos.d/ose.repo
[rhel-8-server-ose]
name=rhel-8-server-ose
enabled=1
gpgcheck=0
baseurl=https://mirror.openshift.com/enterprise/reposync/4.10/rhel-8-server-ose-rpms/
module_hotfixes=true
username=??????
password=??????

[rhel-9-server-ose]
name=rhel-9-server-ose
enabled=1
gpgcheck=0
baseurl=https://mirror.openshift.com/enterprise/reposync/4.13/rhel-9-server-ose-rpms/
module_hotfixes=true
username=??????
password=??????

[rhel-9-server-ironic]
name=rhel-9-server-ironic
enabled=1
gpgcheck=0
baseurl=https://mirror.openshift.com/enterprise/reposync/4.13/rhel-9-server-ironic-rpms/
module_hotfixes=true
username=??????
password=??????
EOF

dnf reposync --repoid=rhel-8-server-ose -m --download-metadata --delete -n
dnf reposync --repoid=rhel-9-server-ose -m --download-metadata --delete -n
dnf reposync --repoid=rhel-9-server-ironic -m --download-metadata --delete -n

systemctl disable --now firewalld

# host the repo with web service
cd /data/dnf/dnf-ocp
python3 -m http.server 5180

准备 build 服务器 / Prepare the build server

注意,build 服务器需要支持 kvm ,如果选用的云平台,需要云平台支持嵌套虚拟化。

本次实验,我们选用了一台 centos stream 8 的云主机。

Note that the build server must support kvm. If you use a cloud platform, it needs to support nested virtualization.

For this experiment we chose a centos stream 8 cloud instance.
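
Before starting a build, it is worth a quick check that kvm is actually usable inside this host (simple sanity checks):

# cosa needs /dev/kvm inside this host to run its build VM
ls -l /dev/kvm
# the cpu flags should expose vmx (intel) or svm (amd) when nested virt is enabled
grep -c -E 'vmx|svm' /proc/cpuinfo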

# install a centos stream 8 on digitalocean, 
# 2c 2G for ostree only
# 4c 8G for iso because it needs metal first

dnf install -y epel-release

dnf install -y byobu htop

dnf update -y

reboot

dnf groupinstall -y server

dnf install -y lftp podman

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld

systemctl enable --now libvirtd

开始编译 rhcos / Start compiling rhcos

cosa 的输入是一个配置文件项目,上游是 https://github.com/openshift/os , 我们做了下游扩展,加入了各种repo源,并且把操作系统名字,加入了 wzh 的标记。

The input to cosa is a configuration-file project; the upstream is https://github.com/openshift/os . We made downstream extensions: we added the various rpm repo sources and tagged the operating system name with the wzh mark.
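
The repo definitions inside that config project use a REPO_IP placeholder, which the sed below swaps for the address of the dnf server prepared earlier; one entry looks roughly like this (illustrative sketch only, the real content lives in src/config/rhel-9.0.repo of the config repo):

# illustrative shape of an entry in src/config/rhel-9.0.repo
[rhel-9-for-x86_64-baseos-rpms]
name=rhel-9-for-x86_64-baseos-rpms
baseurl=http://REPO_IP/rhel-9-for-x86_64-baseos-rpms/
enabled=1
gpgcheck=0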

# machine-os-images just copy a iso into container
# machine-os-content is our target

# follow coreos-assembler instruction
# https://github.com/coreos/coreos-assembler/blob/main/docs/building-fcos.md
# https://coreos.github.io/coreos-assembler/
# https://github.com/openshift/os/blob/master/docs/development-rhcos.md
# https://github.com/openshift/os/blob/master/docs/development.md

# https://github.com/openshift/os/blob/master/docs/development.md
# https://github.com/openshift/release/blob/master/core-services/release-controller/README.md#rpm-mirrors

podman login ************* quay.io

# export COREOS_ASSEMBLER_CONTAINER=quay.io/coreos-assembler/coreos-assembler:rhcos-4.12
export COREOS_ASSEMBLER_CONTAINER=quay.io/coreos-assembler/coreos-assembler:latest
podman pull $COREOS_ASSEMBLER_CONTAINER

cosa() {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && $(podman image exists ${COREOS_ASSEMBLER_CONTAINER_LATEST}); then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" ${COREOS_ASSEMBLER_CONTAINER_LATEST} | awk '{print $1}')"
       local -r cosa_build_date="$(date -d ${cosa_build_date_str} +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*7)) ]] ; then
         echo -e "\e[0;33m----" >&2
         echo "The COSA container image is more that a week old and likely outdated." >&2
         echo "You should pull the latest version with:" >&2
         echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
         echo -e "----\e[0m" >&2
         sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged                                    \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536                          \
              -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse                                  \
              -v /run/user/0/containers/auth.json:/home/builder/.docker/config.json                      \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa                                         \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro}   \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro}  \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS}                                            \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}

rm -rf /data/rhcos
mkdir -p /data/rhcos
cd /data/rhcos

# cosa init --branch wzh-ocp-4.8-rhel-9.1 https://github.com/wangzheng422/machine-os-content

cosa init \
      --branch wzh-ocp-4.10-based-on-4.13-rhel-9 \
      --variant rhel-coreos-9 \
      https://github.com/wangzheng422/machine-os-content

sed -i 's/REPO_IP/45.77.125.88:5180/g' /data/rhcos/src/config/rhel-9.0.repo

cosa fetch

# cosa build ostree
# ......
# Ignored user missing from new passwd file: root
# New passwd entries: clevis, dnsmasq, gluster, systemd-coredump, systemd-journal-remote, unbound
# Ignored group missing from new group file: root
# New group entries: clevis, dnsmasq, gluster, input, kvm, printadmin, render, systemd-coredump, systemd-journal-remote, unbound
# Committing... done
# Metadata Total: 9777
# Metadata Written: 3156
# Content Total: 6635
# Content Written: 1456
# Content Cache Hits: 19307
# Content Bytes Written: 149555523
# 3156 metadata, 22414 content objects imported; 2.0 GB content written
# Wrote commit: 9c9831a17f276a55d263c7856aa61af722ec84d9780405018ac46b3c2c7aa5d6
# New image input checksum: 9062762601fde9b726033297ef1c442589066328334c88268d3952dcf1014826
# None
# New build ID: 48.90.202211260320-wzh-0
# Running:  rpm-ostree compose container-encapsulate --max-layers=50 --format-version=1 --repo=/srv/tmp/repo --label=coreos-assembler.image-config-checksum=e748dfefac80583a123d35bfdfe87fcce2c2757f15d8251e8482d1aeb7e4b7a0 --label=coreos-assembler.image-input-checksum=9062762601fde9b726033297ef1c442589066328334c88268d3952dcf1014826 --label=org.opencontainers.image.source=https://github.com/wangzheng422/machine-os-content --label=org.opencontainers.image.revision=331baaa292509c237e8647b598a9768aefbb984d 48.90.202211260320-wzh-0 oci-archive:rhcos-48.90.202211260320-wzh-0-ostree.x86_64.ociarchive.tmp:latest
# Reading packages... done
# Building package mapping... done
# 22414 objects in 511 packages (332 source)
# rpm size: 1978859148
# Earliest changed package: nss-altfiles-2.18.1-20.el9.x86_64 at 2021-08-02 15:39:20 UTC
# 1488 duplicates
# Multiple owners:
#   /usr/lib/.build-id/93/1521a98c6e8ca8485e3508ac3ee12e7a0bb233
#   /usr/lib/.build-id/fb/c60f5edbc2853811a813d9fb404cdaddfaf70a
#   /usr/share/licenses/systemd/LICENSE.LGPL2.1
# Generating container image... done
# Pushed digest: sha256:95ea1eeff653f2ec7ee9a3826978cbe5cadad2e9894d76edffb6a425892fdbab
# Total objects: 25866
# No unreachable objects
# Ignoring non-directory /srv/builds/.build-commit
# + rc=0
# + set +x

# or build with default setting, ostree and qcow2
cosa build
# ......
# + cosa meta --workdir /srv --build 48.90.202211270909-wzh-0 --artifact qemu --artifact-json /srv/tmp/build.qemu/meta.json.new
# /srv/builds/48.90.202211270909-wzh-0/x86_64/meta.json wrote with version stamp 1669540779194835967
# + /usr/lib/coreos-assembler/finalize-artifact rhcos-48.90.202211270909-wzh-0-qemu.x86_64.qcow2 /srv/builds/48.90.202211270909-wzh-0/x86_64/rhcos-48.90.202211270909-wzh-0-qemu.x86_64.qcow2
# + set +x
# Successfully generated: rhcos-48.90.202211270909-wzh-0-qemu.x86_64.qcow2

cosa list
# 48.90.202211270909-wzh-0
#    Timestamp: 2022-11-27T09:14:21Z (0:05:40 ago)
#    Artifacts: ostree qemu
#       Config: wzh-ocp-4.8-based-on-4.13-rhel-9.0 (64094f653298) (dirty)

cosa upload-oscontainer --name "quay.io/wangzheng422/ocp"
# ......
# 2022-11-27 09:22:35,785 INFO - Running command: ['ostree', '--repo=/srv/tmp/containers-storage/vfs/dir/da857426a657461466a3d17f4faa848f71a9a311b2fec5165946adabf5ea3900/srv/repo', 'pull-local', '--disable-fsync', '/srv/tmp/repo', '3c009c9794dc1deea6b419e84e56d17247954d236777842de59abef6ef82658f']
# Writing objects: 55
# 2022-11-27 09:22:41,424 INFO - Running command: ['tar', '-xf', '/srv/builds/48.90.202211270909-wzh-0/x86_64/rhcos-48.90.202211270909-wzh-0-extensions.x86_64.tar']
# 2022-11-27 09:22:41,665 INFO - Running command: ['buildah', '--root=./tmp/containers-storage', '--storage-driver', 'vfs', 'config', '--entrypoint', '["/noentry"]', '-l', 'com.coreos.ostree-commit=3c009c9794dc1deea6b419e84e56d17247954d236777842de59abef6ef82658f', '-l', 'version=48.90.202211270909-wzh-0', '-l', 'com.coreos.rpm.cri-o=1.25.0-53.rhaos4.12.git2002c49.el9.x86_64', '-l', 'com.coreos.rpm.ignition=2.13.0-1.el9.x86_64', '-l', 'com.coreos.rpm.kernel=5.14.0-70.30.1.el9_0.x86_64', '-l', 'com.coreos.rpm.ostree=2022.5-1.el9_0.x86_64', '-l', 'com.coreos.rpm.rpm-ostree=2022.2-2.el9.x86_64', '-l', 'com.coreos.rpm.runc=4:1.1.3-2.el9_0.x86_64', '-l', 'com.coreos.rpm.systemd=250-6.el9_0.1.x86_64', '-l', 'com.coreos.coreos-assembler-commit=538402ec655961f7a79e9745c9a3af67e1123e39', '-l', 'com.coreos.redhat-coreos-commit=64094f6532982cd2118224785b88ba2890659aee', '-l', 'com.coreos.os-extensions=kerberos;kernel-devel;kernel-rt;usbguard;sandboxed-containers', '-l', 'com.coreos.rpm.kernel=5.14.0-70.30.1.el9_0.x86_64', '-l', 'com.coreos.rpm.kernel-rt-core=5.14.0-70.30.1.rt21.102.el9_0.x86_64', '-l', 'io.openshift.build.version-display-names=machine-os=Red Hat Enterprise Linux CoreOS', '-l', 'io.openshift.build.versions=machine-os=48.90.202211270909-wzh-0', 'ubi-working-container']
# WARN[0000] cmd "/bin/bash" exists and will be passed to entrypoint as a parameter
# Committing container...
# Getting image source signatures
# Copying blob 33204bfe17ee skipped: already exists
# Copying blob 06081b81a130 done
# Copying config 031de9981c done
# Writing manifest to image destination
# Storing signatures
# quay.io/wangzheng422/ocp:48.90.202211270909-wzh-0 031de9981c87301aeaffa5c7a0166067dad7a5c7f86166e999694953b89ef264
# Pushing container
# 2022-11-27 09:23:24,398 INFO - Running command: ['buildah', '--root=./tmp/containers-storage', '--storage-driver', 'vfs', 'push', '--tls-verify', '--authfile=/home/builder/.docker/config.json', '--digestfile=tmp/oscontainer-digest', '--format=v2s2', 'quay.io/wangzheng422/ocp:48.90.202211270909-wzh-0']
# Getting image source signatures
# Copying blob 06081b81a130 done
# Copying blob 33204bfe17ee done
# Copying config 031de9981c done
# Writing manifest to image destination
# Storing signatures

cosa buildextend-metal
# ......
# + cosa meta --workdir /srv --build 48.90.202211270909-wzh-0 --artifact metal --artifact-json /srv/tmp/build.metal/meta.json.new
# /srv/builds/48.90.202211270909-wzh-0/x86_64/meta.json wrote with version stamp 1669541240634979743
# + /usr/lib/coreos-assembler/finalize-artifact rhcos-48.90.202211270909-wzh-0-metal.x86_64.raw /srv/builds/48.90.202211270909-wzh-0/x86_64/rhcos-48.90.202211270909-wzh-0-metal.x86_64.raw
# + set +x
# Successfully generated: rhcos-48.90.202211270909-wzh-0-metal.x86_64.raw

cosa buildextend-metal4k
# ......
# + cosa meta --workdir /srv --build 48.90.202211270909-wzh-0 --artifact metal4k --artifact-json /srv/tmp/build.metal4k/meta.json.new
# /srv/builds/48.90.202211270909-wzh-0/x86_64/meta.json wrote with version stamp 1669541380398141511
# + /usr/lib/coreos-assembler/finalize-artifact rhcos-48.90.202211270909-wzh-0-metal4k.x86_64.raw /srv/builds/48.90.202211270909-wzh-0/x86_64/rhcos-48.90.202211270909-wzh-0-metal4k.x86_64.raw
# + set +x
# Successfully generated: rhcos-48.90.202211270909-wzh-0-metal4k.x86_64.raw

cosa buildextend-live
# ......
# 2022-11-27 09:38:49,575 INFO - Running command: ['/usr/bin/isohybrid', '--uefi', '/srv/tmp/buildpost-live/rhcos-48.90.202211270909-wzh-0-live.x86_64.iso.minimal']
# 2022-11-27 09:38:49,661 INFO - Running command: ['/usr/lib/coreos-assembler/runvm-coreos-installer', 'builds/48.90.202211270909-wzh-0/x86_64/rhcos-48.90.202211270909-wzh-0-metal.x86_64.raw', '', 'pack', 'minimal-iso', '/srv/tmp/buildpost-live/rhcos-48.90.202211270909-wzh-0-live.x86_64.iso', '/srv/tmp/buildpost-live/rhcos-48.90.202211270909-wzh-0-live.x86_64.iso.minimal', '--consume']
# + RUST_BACKTRACE=full
# + chroot /sysroot/ostree/deploy/rhcos/deploy/3c009c9794dc1deea6b419e84e56d17247954d236777842de59abef6ef82658f.0 env -C /srv coreos-installer pack minimal-iso /srv/tmp/buildpost-live/rhcos-48.90.202211270909-wzh-0-live.x86_64.iso /srv/tmp/buildpost-live/rhcos-48.90.202211270909-wzh-0-live.x86_64.iso.minimal --consume
# Packing minimal ISO
# Matched 16 files of 16
# Total bytes skipped: 89430463
# Total bytes written: 747073
# Total bytes written (compressed): 2788
# Verifying that packed image matches digest
# Packing successful!
# + '[' -f /var/tmp/coreos-installer-output ']'
# Updated: builds/48.90.202211270909-wzh-0/x86_64/meta.json


# run them all
cat << 'EOF' > /root/build.sh
# exit when any command fails
set -e
set -x

rm -rf /data/rhcos
mkdir -p /data/rhcos
cd /data/rhcos

export COREOS_ASSEMBLER_CONTAINER=quay.io/coreos-assembler/coreos-assembler:latest
podman pull $COREOS_ASSEMBLER_CONTAINER

cosa() {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && $(podman image exists ${COREOS_ASSEMBLER_CONTAINER_LATEST}); then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" ${COREOS_ASSEMBLER_CONTAINER_LATEST} | awk '{print $1}')"
       local -r cosa_build_date="$(date -d ${cosa_build_date_str} +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*7)) ]] ; then
         echo -e "\e[0;33m----" >&2
         echo "The COSA container image is more that a week old and likely outdated." >&2
         echo "You should pull the latest version with:" >&2
         echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
         echo -e "----\e[0m" >&2
         sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged                                    \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536                          \
              -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse                                  \
              -v /run/user/0/containers/auth.json:/home/builder/.docker/config.json                      \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa                                         \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro}   \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro}  \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS}                                            \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}


cosa init \
      --branch wzh-ocp-4.10-based-on-4.13-rhel-9 \
      --variant rhel-coreos-9 \
      https://github.com/wangzheng422/machine-os-content

sed -i 's/REPO_IP/45.76.173.230:5180/g' /data/rhcos/src/config/rhel-9.0.repo

cosa fetch

cosa build
cosa upload-oscontainer --name "quay.io/wangzheng422/ocp"
cosa buildextend-metal
cosa buildextend-metal4k
cosa buildextend-live

EOF

cd /root
bash /root/build.sh

# podman pull quay.io/wangzheng422/ocp:410.91.202211291516-wzh-0
# podman pull quay.io/wangzheng422/ocp@sha256:c7209dcadf2d27892eab9c692e8afb6a752307270526231961500647591d7129

ls -l /data/rhcos/builds/latest/x86_64/
# total 10333424
# -r--r--r--. 1 root root      66639 Nov 29 15:24 commitmeta.json
# -r--r--r--. 1 root root        473 Nov 29 15:16 coreos-assembler-config-git.json
# -r--r--r--. 1 root root     346037 Nov 29 15:16 coreos-assembler-config.tar.gz
# -rw-r--r--. 1 root root      14107 Nov 29 15:16 manifest.json
# -r--r--r--. 1 root root      33628 Nov 29 15:21 manifest-lock.generated.x86_64.json
# -rw-r--r--. 1 root root       6965 Nov 29 15:43 meta.json
# -r--r--r--. 1 root root      34844 Nov 29 15:21 ostree-commit-object
# -rw-r--r--. 1 root root  347832320 Nov 29 15:28 rhcos-410.91.202211291516-wzh-0-extensions.x86_64.tar
# -rw-r--r--. 1 root root   80525940 Nov 29 15:42 rhcos-410.91.202211291516-wzh-0-live-initramfs.x86_64.img
# -rw-r--r--. 1 root root   11649784 Nov 29 15:43 rhcos-410.91.202211291516-wzh-0-live-kernel-x86_64
# -rw-r--r--. 1 root root  930239488 Nov 29 15:42 rhcos-410.91.202211291516-wzh-0-live-rootfs.x86_64.img
# -rw-r--r--. 1 root root 1028653056 Nov 29 15:43 rhcos-410.91.202211291516-wzh-0-live.x86_64.iso
# -r--r--r--. 1 root root 3596615680 Nov 29 15:34 rhcos-410.91.202211291516-wzh-0-metal4k.x86_64.raw
# -r--r--r--. 1 root root 3596615680 Nov 29 15:32 rhcos-410.91.202211291516-wzh-0-metal.x86_64.raw
# -r--r--r--. 1 root root  965853184 Nov 29 15:24 rhcos-410.91.202211291516-wzh-0-ostree.x86_64.ociarchive
# -r--r--r--. 1 root root 2383609856 Nov 29 15:26 rhcos-410.91.202211291516-wzh-0-qemu.x86_64.qcow2

# ocp 4.8 is too buggy, we switch to ocp 4.10
# https://bugzilla.redhat.com/show_bug.cgi?id=2044808

# Create a new release based on openshift 4.10.41 and override a single image
export BUILDNUMBER=4.10.41
export VAR_RELEASE_VER=$BUILDNUMBER-rhel-9.1-v02

oc adm release new -a /data/pull-secret.json \
  --from-release `  curl -s https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/$BUILDNUMBER/release.txt | grep "Pull From:"  | awk '{print $3}'  ` \
  machine-os-content=quay.io/wangzheng422/ocp@sha256:c7209dcadf2d27892eab9c692e8afb6a752307270526231961500647591d7129 \
  --to-image docker.io/wangzheng422/ocp:$VAR_RELEASE_VER

# docker.io/wangzheng422/ocp:4.10.41-rhel-9.1-v02

oc image mirror docker.io/wangzheng422/ocp:$VAR_RELEASE_VER quay.io/wangzheng422/ocp:$VAR_RELEASE_VER

# podman pull quay.io/wangzheng422/ocp:4.10.41-rhel-9.1-v02
# podman pull quay.io/wangzheng422/ocp@sha256:73394d5833b12a848fed80154953fe97962362cc153b239e513afade7f87fb3c
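
To double-check that the machine-os-content override actually landed in the new release, oc adm release info can print the image a release references (read-only check; uses the same pull secret):

oc adm release info -a /data/pull-secret.json \
  --image-for=machine-os-content \
  quay.io/wangzheng422/ocp:$VAR_RELEASE_VER
# should print the quay.io/wangzheng422/ocp@sha256:c7209dca... digest used above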


try to install using UPI

我们已经准备好了镜像,那就试试装一个集群出来看看什么样子的。

We have prepared the image, so let's try to install a cluster to see what it looks like.

on vps, download image and binary for 4.10.41

第一步,还是在公网上,下载一些安装用的文件,这一步不是必须的。我们主要用里面的ansible工具,配置我们环境的dns。

The first step, still on the public network, is to download some installation files. This step is not strictly necessary; we mainly use the bundled ansible tooling to configure the dns of our environment.

# download client and installer binaries
# on vultr

rm -rf /data/ocp4/

mkdir -p /data/ocp4/
cd /data/ocp4

export BUILDNUMBER=4.11.18

wget -O openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
wget -O openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/

wget -O opm-linux.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/opm/4.6.1/opm-linux-4.6.1.tar.gz
tar -xzf opm-linux.tar.gz -C /usr/local/bin/

wget https://github.com/operator-framework/operator-registry/releases/download/v1.26.2/linux-amd64-opm
chmod +x linux-amd64-opm
install linux-amd64-opm /usr/local/bin/opm

rm -rf /data/ocp4/

mkdir -p /data/ocp4/tmp
cd /data/ocp4/tmp
git clone https://github.com/wangzheng422/openshift4-shell
cd openshift4-shell
git checkout ocp-4.8
/bin/cp -f prepare.content.with.oc.mirror.sh /data/ocp4/

rm -rf /data/ocp4/tmp

cd /data/ocp4

# bash prepare.content.with.oc.mirror.sh -v 4.11.5,${BUILDNUMBER}, -m ${BUILDNUMBER%.*} -b ocp-4.11
bash prepare.content.with.oc.mirror.sh -v ${BUILDNUMBER}, -m ${BUILDNUMBER%.*} -b ocp-4.8

import ocp content into quay

第二步,根据我们自定义的release image,同步安装镜像,到我们内部的镜像仓库,并且抽取安装二进制文件。

In the second step, based on our custom release image, we mirror the installation images into our internal registry and extract the installer binaries.


export BUILDNUMBER=4.11.18

pushd /data/ocp4/${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
# tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
# chmod +x /usr/local/bin/oc-mirror
install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane
# install -m 755 /data/ocp4/clients/coreos-installer_amd64 /usr/local/bin/coreos-installer
popd

SEC_FILE="$XDG_RUNTIME_DIR/containers/auth.json"
# $XDG_RUNTIME_DIR/containers
mkdir -p ${SEC_FILE%/*}

# OR
# SEC_FILE="$HOME/.docker/config.json"
SEC_FILE="$HOME/.config/containers/auth.json"
mkdir -p ${SEC_FILE%/*}

# copy the password file 

podman login quaylab.infra.redhat.ren:8443 --username admin --password redhatadmin

export VAR_RELEASE_VER=4.10.41-rhel-9.1-v02

oc adm release mirror -a $SEC_FILE \
  --from=quay.io/wangzheng422/ocp:$VAR_RELEASE_VER \
  --to=quaylab.infra.wzhlab.top:5443/ocp4/openshift4
# ......
# Success
# Update image:  quaylab.infra.wzhlab.top:5443/ocp4/openshift4:4.10.41-x86_64
# Mirror prefix: quaylab.infra.wzhlab.top:5443/ocp4/openshift4

# To use the new mirrored repository to install, add the following section to the install-config.yaml:

# imageContentSources:
# - mirrors:
#   - quaylab.infra.wzhlab.top:5443/ocp4/openshift4
#   source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
# - mirrors:
#   - quaylab.infra.wzhlab.top:5443/ocp4/openshift4
#   source: quay.io/wangzheng422/ocp


# To use the new mirrored repository for upgrades, use the following to create an ImageContentSourcePolicy:

# apiVersion: operator.openshift.io/v1alpha1
# kind: ImageContentSourcePolicy
# metadata:
#   name: example
# spec:
#   repositoryDigestMirrors:
#   - mirrors:
#     - quaylab.infra.wzhlab.top:5443/ocp4/openshift4
#     source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
#   - mirrors:
#     - quaylab.infra.wzhlab.top:5443/ocp4/openshift4
#     source: quay.io/wangzheng422/ocp

# !!!! 注意,以下步骤必须执行,因为版本信息在可执行程序里面 !!!
# !!!! Note: the following steps must be executed, because the version information is embedded in the extracted binaries !!!

mkdir -p /data/work/ext-client
cd /data/work/ext-client

RELEASE_IMAGE=quay.io/wangzheng422/ocp:$VAR_RELEASE_VER
LOCAL_SECRET_JSON=/data/pull-secret.json

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='openshift-baremetal-install' ${RELEASE_IMAGE}

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='openshift-install' ${RELEASE_IMAGE}

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='oc' ${RELEASE_IMAGE}

# oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --tools=true ${RELEASE_IMAGE}

./openshift-install version
# ./openshift-install 4.10.41
# built from commit 14145f0cbc879ca19cfcb583c86bd01595afb9d5
# release image quay.io/wangzheng422/ocp@sha256:1c6a539ac44c65e2d1005a270e5d05442deaa9b3a0101edab695010a90f09aed
# release architecture amd64

install -m 755 /data/work/ext-client/openshift-install /usr/local/bin/openshift-install
install -m 755 /data/work/ext-client/oc /usr/local/bin/oc
# install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane

mirror for disconnected

我们把operator用到的镜像,都mirror到内部镜像仓库试试。

Let's also try mirroring all the images used by the operators into the internal registry.

# we use oc-mirror from ocp 4.11

cat > /data/ocp4/mirror.yaml << EOF
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  platform:
    architectures:
      - amd64
      # - arm64
    channels:
      # - name: stable-4.11
      #   type: ocp
      #   minVersion: 4.11.18
      #   maxVersion: 4.11.18
      #   shortestPath: true
      # - name: stable-4.10
      #   type: ocp
      #   minVersion: 4.10.45
      #   maxVersion: 4.10.45
      #   shortestPath: true
    graph: false
  additionalImages:
    - name: registry.redhat.io/redhat/redhat-operator-index:v4.10
    - name: registry.redhat.io/redhat/certified-operator-index:v4.10
    - name: registry.redhat.io/redhat/community-operator-index:v4.10
    - name: registry.redhat.io/redhat/redhat-marketplace-index:v4.10 
    - name: quay.io/wangzheng422/local-storage-operator:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-storage-bundle:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-diskmaker:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-storage-operator:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-must-gather:wzh-ocp-4.10-v01
    - name: quay.io/openshift/origin-kube-rbac-proxy:latest
    - name: quay.io/wangzheng422/debug-pod:alma-9.1
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10  
      packages:
      - name: cluster-logging                                   
        channels:
        - name: stable
          minVersion: 5.5.5
      - name: elasticsearch-operator                               
        channels:
        - name: stable
          minVersion: 5.5.5
      - name: jaeger-product                             
        channels:
        - name: stable
          minVersion: 1.39.0-3
      - name: kubernetes-nmstate-operator                               
        channels:
        - name: stable
          minVersion: 4.10.0-202212061900
      - name: odf-operator                                 
        channels:
        - name: stable-4.10
          minVersion: 4.10.9
      - name: sriov-network-operator                             
        channels:
        - name: stable
          minVersion: 4.10.0-202212061900
      - name: kubevirt-hyperconverged
        channels:
        - name: stable
          minVersion: 4.10.7
    - catalog: quay.io/wangzheng422/local-storage-index:wzh-ocp-4.10-v01
      packages:
      - name: local-storage-operator                              
        channels:
        - name: preview
EOF

mkdir -p /data/install/mirror-tmp
cd /data/install/mirror-tmp

oc-mirror --config /data/ocp4/mirror.yaml docker://quaylab.infra.wzhlab.top:5443


mirror to files and import back

之前我们都是直接mirror到内部镜像仓库,但是实际项目环境,是根本不会联网的,所以我们需要先镜像到本地目录/文件,然后从目录/文件导入到内部镜像仓库。这里就按照这个流程做一遍。

So far we have mirrored directly into the internal registry, but a real project environment has no internet connectivity at all, so we first mirror to a local directory/file and then import from that directory/file into the internal registry. Below we walk through exactly that flow.

mkdir -p /data/ocp4/
mkdir -p /data/ocp-install/images/
mkdir -p /data/ocp-install/clients/

cd /data/ocp4/

export BUILDNUMBER=4.10.41

wget -O openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
wget -O openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/

export BUILDNUMBER=4.11.18

wget -O oc-mirror.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${BUILDNUMBER}/oc-mirror.tar.gz
tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
chmod +x /usr/local/bin/oc-mirror

# SEC_FILE="$XDG_RUNTIME_DIR/containers/auth.json"
# # $XDG_RUNTIME_DIR/containers
# mkdir -p ${SEC_FILE%/*}

# OR
SEC_FILE="$HOME/.config/containers/auth.json"
mkdir -p ${SEC_FILE%/*}

# copy the password file 

# podman login quaylab.infra.redhat.ren:8443 --username admin --password redhatadmin

export VAR_RELEASE_VER=4.10.41-rhel-9.1-v02

oc adm release mirror -a $SEC_FILE \
  --from=quay.io/wangzheng422/ocp:$VAR_RELEASE_VER \
  --to-dir=/data/ocp-install/images/
# ......
# Success
# Update image:  openshift/release:4.10.41-x86_64

# To upload local images to a registry, run:

#     oc image mirror --from-dir=/data/ocp-install/images/ 'file://openshift/release:4.10.41-x86_64*' REGISTRY/REPOSITORY


cd /data/ocp-install/clients/

RELEASE_IMAGE=quay.io/wangzheng422/ocp:$VAR_RELEASE_VER
LOCAL_SECRET_JSON=$SEC_FILE

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='openshift-baremetal-install' ${RELEASE_IMAGE}

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='openshift-install' ${RELEASE_IMAGE}

oc adm release extract --registry-config ${LOCAL_SECRET_JSON} --command='oc' ${RELEASE_IMAGE}

/bin/cp -f /usr/local/bin/oc-mirror ./


cat > /data/ocp4/mirror.yaml << EOF
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  platform:
    architectures:
      - amd64
      # - arm64
    channels:
      # - name: stable-4.11
      #   type: ocp
      #   minVersion: 4.11.18
      #   maxVersion: 4.11.18
      #   shortestPath: true
      # - name: stable-4.10
      #   type: ocp
      #   minVersion: 4.10.45
      #   maxVersion: 4.10.45
      #   shortestPath: true
    graph: false
  additionalImages:
    - name: registry.redhat.io/redhat/redhat-operator-index:v4.10
    - name: registry.redhat.io/redhat/certified-operator-index:v4.10
    - name: registry.redhat.io/redhat/community-operator-index:v4.10
    - name: registry.redhat.io/redhat/redhat-marketplace-index:v4.10 
    - name: quay.io/wangzheng422/local-storage-operator:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-storage-bundle:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-diskmaker:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-storage-operator:wzh-ocp-4.10-v01
    - name: quay.io/wangzheng422/local-must-gather:wzh-ocp-4.10-v01
    - name: quay.io/openshift/origin-kube-rbac-proxy:latest
    - name: quay.io/wangzheng422/debug-pod:alma-9.1
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10  
      packages:
      - name: cluster-logging                                   
        channels:
        - name: stable
          minVersion: 5.5.5
      - name: elasticsearch-operator                               
        channels:
        - name: stable
          minVersion: 5.5.5
      - name: jaeger-product                             
        channels:
        - name: stable
          minVersion: 1.39.0-3
      - name: kubernetes-nmstate-operator                               
        channels:
        - name: stable
          minVersion: 4.10.0-202212061900
      - name: odf-operator                                 
        channels:
        - name: stable-4.10
          minVersion: 4.10.9
      - name: sriov-network-operator                             
        channels:
        - name: stable
          minVersion: 4.10.0-202212061900
      - name: kubevirt-hyperconverged
        channels:
        - name: stable
          minVersion: 4.10.7
    - catalog: quay.io/wangzheng422/local-storage-index:wzh-ocp-4.10-v01
      packages:
      - name: local-storage-operator                              
        channels:
        - name: preview
EOF

mkdir -p /data/ocp-install/oc-mirror/
cd /data/ocp-install/oc-mirror/

oc-mirror --config /data/ocp4/mirror.yaml file:///data/ocp-install/oc-mirror/



mkdir -p /data/bypy
cd /data/bypy

cd /data
# export BUILDNUMBER=4.8.17

tar -cvf - ocp-install/ | pigz -c > /data/bypy/ocp-install.tgz

cd /data/bypy
# https://github.com/houtianze/bypy
# yum -y install python3-pip
# pip3 install --user bypy 
# /root/.local/bin/bypy list
/root/.local/bin/bypy upload


# test import
tar zvxf ocp-install.tgz

/bin/cp -f ./ocp-install/clients/* /usr/local/bin/

oc image mirror --from-dir=./ocp-install/images/ 'file://openshift/release:4.10.41-x86_64*' quaylab.infra.wzhlab.top:5443/ocp4/openshift4

oc-mirror --from=./ocp-install/oc-mirror/mirror_seq1_000000.tar \
  docker://quaylab.infra.wzhlab.top:5443
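
oc-mirror also writes the ImageContentSourcePolicy and CatalogSource manifests it generated into an oc-mirror-workspace/results-* directory; a minimal sketch for applying them to the cluster (the exact results path depends on your run):

ls oc-mirror-workspace/
oc apply -f oc-mirror-workspace/results-*/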

try to config the ocp install

然后,我们就开始定义ocp的安装install配置文件,并且由于我们是UPI安装,我们还要定制iso。

Then we define the ocp install-config file, and since this is a UPI installation we also need to customize the iso.


# export BUILDNUMBER=4.8.53

# pushd /data/ocp4/${BUILDNUMBER}
# tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
# tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
# tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
# chmod +x /usr/local/bin/oc-mirror
# install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane
# install -m 755 /data/ocp4/clients/coreos-installer_amd64 /usr/local/bin/coreos-installer
# popd


# create a user and create the cluster under the user


useradd -m 3node
# useradd -G wheel 3node

usermod -aG wheel 3node

echo -e "%wheel\tALL=(ALL)\tNOPASSWD: ALL" > /etc/sudoers.d/020_sudo_for_me

su - 3node

ssh-keygen

cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

chmod 600 ~/.ssh/config

cat << 'EOF' >> ~/.bashrc

export BASE_DIR='/home/3node/'

EOF

# export BASE_DIR='/home/3node/'

mkdir -p ${BASE_DIR}/data/{sno/disconnected,install}

# set some parameters of your cluster

NODE_SSH_KEY="$(cat ${BASE_DIR}/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.wzhlab.top:5443

PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:redhatadmin' | openssl base64 )'","email": "noemail@localhost"}}}'

# NTP_SERVER=192.168.7.11
# HELP_SERVER=192.168.7.11
# KVM_HOST=192.168.7.11
# API_VIP=192.168.7.100
# INGRESS_VIP=192.168.7.101
# CLUSTER_PROVISION_IP=192.168.7.103
# BOOTSTRAP_IP=192.168.7.12

# define the cluster name and base domain
SNO_CLUSTER_NAME=acm-demo-one
SNO_BASE_DOMAIN=wzhlab.top

# echo ${SNO_IF_MAC} > /data/sno/sno.mac

mkdir -p ${BASE_DIR}/data/install
cd ${BASE_DIR}/data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

cat << EOF > ${BASE_DIR}/data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0 
controlPlane:
  name: master
  replicas: 3 
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
  machineNetwork:
  - cidr: 10.0.0.0/16
  - cidr: fd03::/64
platform:
  none: {}
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat ${BASE_DIR}/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/wangzheng422/ocp
EOF

/bin/cp -f ${BASE_DIR}/data/install/install-config.yaml ${BASE_DIR}/data/install/install-config.yaml.bak

openshift-install create manifests --dir=${BASE_DIR}/data/install

# additional ntp config
/bin/cp -f  /data/ocp4/ansible-helper/files/* ${BASE_DIR}/data/install/openshift/

#############################################
# run as root if you have not run below, at least one time
# it will generate registry configuration
# copy image registry proxy related config
# cd /data/ocp4
# bash image.registries.conf.sh nexus.infra.redhat.ren:8083

# /bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/
#############################################

sudo bash -c "cd /data/ocp4 ; bash image.registries.conf.sh quaylab.infra.wzhlab.top:5443 ;"

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml ${BASE_DIR}/data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml ${BASE_DIR}/data/install/openshift

cd ${BASE_DIR}/data/install/

openshift-install --dir=${BASE_DIR}/data/install create ignition-configs 

BOOTSTRAP_IP=192.168.77.22
MASTER_01_IP=192.168.77.23
MASTER_02_IP=192.168.77.24
MASTER_03_IP=192.168.77.25

BOOTSTRAP_IPv6=fd03::22
MASTER_01_IPv6=fd03::23
MASTER_02_IPv6=fd03::24
MASTER_03_IPv6=fd03::25

BOOTSTRAP_HOSTNAME=bootstrap-demo
MASTER_01_HOSTNAME=master-01-demo
MASTER_02_HOSTNAME=master-02-demo
MASTER_03_HOSTNAME=master-03-demo

BOOTSTRAP_INTERFACE=enp1s0
MASTER_01_INTERFACE=enp1s0
MASTER_02_INTERFACE=enp1s0
MASTER_03_INTERFACE=enp1s0

BOOTSTRAP_DISK=/dev/vda
MASTER_01_DISK=/dev/vda
MASTER_02_DISK=/dev/vda
MASTER_03_DISK=/dev/vda

OCP_GW=192.168.77.11
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.77.11

OCP_GW_v6=fd03::11
OCP_NETMASK_v6=64

# HTTP_PATH=http://192.168.7.11:8080/ignition

source /data/ocp4/acm.fn.sh

# We create a wzh user with password redhat, so that on first boot you can log in directly from the console/ssh with username and password
# This makes troubleshooting and investigation easier
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

cat ${BASE_DIR}/data/install/bootstrap.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq -c . \
  > ${BASE_DIR}/data/install/bootstrap-iso.ign

cat ${BASE_DIR}/data/install/master.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq -c . \
  > ${BASE_DIR}/data/install/master-iso.ign
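
A quick way to confirm the edited ignition files are still valid json and carry the extra user (plain jq over the files generated above):

jq -r '.passwd.users[].name' ${BASE_DIR}/data/install/bootstrap-iso.ign
jq -r '.passwd.users[].name' ${BASE_DIR}/data/install/master-iso.ign
# each list should include the wzh user appended above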

VAR_IMAGE_VER=410.91.202211291516-wzh-0

cd ${BASE_DIR}/data/install/
/bin/cp -f /data/work/ext-client/iso/rhcos-$VAR_IMAGE_VER-live.x86_64.iso bootstrap.iso
/bin/cp -f bootstrap.iso master01.iso
/bin/cp -f bootstrap.iso master02.iso
/bin/cp -f bootstrap.iso master03.iso
sudo /bin/cp -f /data/work/ext-client/iso/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw /data/dnf/
sudo /bin/cp -f ${BASE_DIR}/data/install/{bootstrap,master}-iso.ign /data/dnf/

# for ipv4 only
coreos-installer iso kargs modify -a "ip=$BOOTSTRAP_IP::$OCP_GW:$OCP_NETMASK:$BOOTSTRAP_HOSTNAME:$BOOTSTRAP_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$BOOTSTRAP_DISK coreos.inst.ignition_url=http://192.168.77.11:5000/bootstrap-iso.ign coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw coreos.inst.insecure" bootstrap.iso

coreos-installer iso kargs modify -a "ip=$MASTER_01_IP::$OCP_GW:$OCP_NETMASK:$MASTER_01_HOSTNAME:$MASTER_01_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$MASTER_01_DISK coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw coreos.inst.insecure" master01.iso

coreos-installer iso kargs modify -a "ip=$MASTER_02_IP::$OCP_GW:$OCP_NETMASK:$MASTER_02_HOSTNAME:$MASTER_02_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$MASTER_02_DISK coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw coreos.inst.insecure" master02.iso

coreos-installer iso kargs modify -a "ip=$MASTER_03_IP::$OCP_GW:$OCP_NETMASK:$MASTER_03_HOSTNAME:$MASTER_03_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$MASTER_03_DISK coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw coreos.inst.insecure" master03.iso


# for ipv4 / ipv6 dual stack
coreos-installer iso kargs modify -a " ip=$BOOTSTRAP_IP::$OCP_GW:$OCP_NETMASK:$BOOTSTRAP_HOSTNAME:$BOOTSTRAP_INTERFACE:none   nameserver=$OCP_DNS   ip=[$BOOTSTRAP_IPv6]::[$OCP_GW_v6]:$OCP_NETMASK_v6:$BOOTSTRAP_HOSTNAME:$BOOTSTRAP_INTERFACE:none   coreos.inst.install_dev=$BOOTSTRAP_DISK   coreos.inst.ignition_url=http://192.168.77.11:5000/bootstrap-iso.ign   coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw   coreos.inst.insecure " bootstrap.iso

coreos-installer iso kargs modify -a " ip=$MASTER_01_IP::$OCP_GW:$OCP_NETMASK:$MASTER_01_HOSTNAME:$MASTER_01_INTERFACE:none   nameserver=$OCP_DNS   ip=[$MASTER_01_IPv6]::[$OCP_GW_v6]:$OCP_NETMASK_v6:$MASTER_01_HOSTNAME:$MASTER_01_INTERFACE:none  coreos.inst.install_dev=$MASTER_01_DISK   coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign   coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw   coreos.inst.insecure " master01.iso

coreos-installer iso kargs modify -a " ip=$MASTER_02_IP::$OCP_GW:$OCP_NETMASK:$MASTER_02_HOSTNAME:$MASTER_02_INTERFACE:none   nameserver=$OCP_DNS   ip=[$MASTER_02_IPv6]::[$OCP_GW_v6]:$OCP_NETMASK_v6:$MASTER_02_HOSTNAME:$MASTER_02_INTERFACE:none  coreos.inst.install_dev=$MASTER_02_DISK   coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign   coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw   coreos.inst.insecure " master02.iso

coreos-installer iso kargs modify -a " ip=$MASTER_03_IP::$OCP_GW:$OCP_NETMASK:$MASTER_03_HOSTNAME:$MASTER_03_INTERFACE:none   nameserver=$OCP_DNS   ip=[$MASTER_03_IPv6]::[$OCP_GW_v6]:$OCP_NETMASK_v6:$MASTER_03_HOSTNAME:$MASTER_03_INTERFACE:none  coreos.inst.install_dev=$MASTER_03_DISK   coreos.inst.ignition_url=http://192.168.77.11:5000/master-iso.ign   coreos.inst.image_url=http://192.168.77.11:5000/rhcos-$VAR_IMAGE_VER-metal.x86_64.raw   coreos.inst.insecure " master03.iso

deploy on kvm host

有了iso文件,我们就可以用他们启动kvm,开始安装了,这一部分,可以参考引用文档,这里就不重复写了。

With the iso files we can boot the kvm guests from them and start the installation. For this part, refer to the referenced documents; it is not repeated here.

result

等着安装完成,什么都不需要做,然后运行下面的命令,就能得到我们集群的登录参数了。

之后,我们登录到节点,就能看到,节点的kernel已经升级好了。

Wait for the installation to complete; nothing else needs to be done. Then run the following command to get the login credentials of our cluster.

After that, log in to a node and you can see that the node's kernel has been upgraded.


openshift-install wait-for install-complete --log-level debug
# ......
# INFO Waiting up to 10m0s (until 12:31PM) for the openshift-console route to be created...
# DEBUG Route found in openshift-console namespace: console
# DEBUG OpenShift console route is admitted
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/3node/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.acm-demo-one.wzhlab.top
# INFO Login to the console with user: "kubeadmin", and password: "NpBWx-CM25p-oykYx-TBAoy"
# DEBUG Time elapsed per stage:
# DEBUG Cluster Operators: 6m44s
# INFO Time elapsed: 6m44s

password login and oc config


# init setting for helper node
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
chmod 600 ~/.ssh/config


cat > ${BASE_DIR}/data/install/crack.txt << 'EOF'

echo redhat | sudo passwd --stdin root

sudo sh -c 'echo "PasswordAuthentication yes" > /etc/ssh/sshd_config.d/99-wzh.conf '
sudo sh -c 'echo "PermitRootLogin yes" >> /etc/ssh/sshd_config.d/99-wzh.conf '
sudo sh -c 'echo "ClientAliveInterval 1800" >> /etc/ssh/sshd_config.d/99-wzh.conf '

sudo systemctl restart sshd

sudo sh -c 'echo "export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig" >> /root/.bashrc'

sudo sh -c 'echo "RET=\`oc config use-context system:admin\`" >> /root/.bashrc'

EOF

for i in 23 24 25
do
  ssh core@192.168.77.$i < ${BASE_DIR}/data/install/crack.txt
done

from other host

# https://unix.stackexchange.com/questions/230084/send-the-password-through-stdin-in-ssh-copy-id
dnf install -y sshpass

for i in 23 24 25
do
  sshpass -p 'redhat' ssh-copy-id root@192.168.77.$i
done

log into ocp to check

我们登录到openshift里面,看看成果吧。

Let's log in to openshift and see the results.

# login to master-01
uname -a
# Linux master-01-demo 5.14.0-162.6.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 30 07:36:03 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/os-release
# NAME="Red Hat Enterprise Linux CoreOS"
# ID="rhcos"
# ID_LIKE="rhel fedora"
# VERSION="410.91.202211291516-wzh-0"
# VERSION_ID="4.10"
# VARIANT="CoreOS"
# VARIANT_ID=coreos
# PLATFORM_ID="platform:el9"
# PRETTY_NAME="Red Hat Enterprise Linux CoreOS 410.91.202211291516-wzh-0 (Plow)"
# ANSI_COLOR="0;31"
# CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
# HOME_URL="https://www.redhat.com/"
# DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.10/"
# BUG_REPORT_URL="https://bugzilla.redhat.com/"
# REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
# REDHAT_BUGZILLA_PRODUCT_VERSION="4.10"
# REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
# REDHAT_SUPPORT_PRODUCT_VERSION="4.10"
# OPENSHIFT_VERSION="4.10"
# RHEL_VERSION="9.1"
# OSTREE_VERSION="410.91.202211291516-wzh-0"

lscpu
# Architecture:            x86_64
#   CPU op-mode(s):        32-bit, 64-bit
#   Address sizes:         48 bits physical, 48 bits virtual
#   Byte Order:            Little Endian
# CPU(s):                  128
#   On-line CPU(s) list:   0-127
# Vendor ID:               HygonGenuine
#   BIOS Vendor ID:        Chengdu Hygon
#   Model name:            Hygon C86 7285 32-core Processor
#     BIOS Model name:     Hygon C86 7285 32-core Processor
#     CPU family:          24
#     Model:               1
#     Thread(s) per core:  2
#     Core(s) per socket:  32
#     Socket(s):           2
#     Stepping:            1
#     Frequency boost:     enabled
#     CPU max MHz:         2000.0000
#     CPU min MHz:         1200.0000
#     BogoMIPS:            4000.04
#     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rd
#                          tscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_
#                          2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce
#                           topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clf
#                          lushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid d
#                          ecodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sme sev sev_es
# Virtualization features:
#   Virtualization:        AMD-V
# Caches (sum of all):
#   L1d:                   2 MiB (64 instances)
#   L1i:                   4 MiB (64 instances)
#   L2:                    32 MiB (64 instances)
#   L3:                    128 MiB (16 instances)
# NUMA:
#   NUMA node(s):          8
#   NUMA node0 CPU(s):     0-7,64-71
#   NUMA node1 CPU(s):     8-15,72-79
#   NUMA node2 CPU(s):     16-23,80-87
#   NUMA node3 CPU(s):     24-31,88-95
#   NUMA node4 CPU(s):     32-39,96-103
#   NUMA node5 CPU(s):     40-47,104-111
#   NUMA node6 CPU(s):     48-55,112-119
#   NUMA node7 CPU(s):     56-63,120-127
# Vulnerabilities:
#   Itlb multihit:         Not affected
#   L1tf:                  Not affected
#   Mds:                   Not affected
#   Meltdown:              Not affected
#   Mmio stale data:       Not affected
#   Retbleed:              Mitigation; untrained return thunk; SMT vulnerable
#   Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
#   Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
#   Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
#   Srbds:                 Not affected
#   Tsx async abort:       Not affected


oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-e21c00ca880030866d0c598d24ca301b   True      False      False      3              3                   3                     0                      40m
# worker   rendered-worker-537f39ac419adbe3ede22a4d09132329   True      False      False      0              0                   0                     0                      40m

oc get node
# NAME             STATUS   ROLES           AGE   VERSION
# master-01-demo   Ready    master,worker   45m   v1.23.12+8a6bfe4
# master-02-demo   Ready    master,worker   44m   v1.23.12+8a6bfe4
# master-03-demo   Ready    master,worker   43m   v1.23.12+8a6bfe4

oc get clusterversion
# NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
# version   4.10.41   True        False         5h30m   Cluster version is 4.10.41

oc get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# authentication                             4.10.41   True        False         False      19m
# baremetal                                  4.10.41   True        False         False      43m
# cloud-controller-manager                   4.10.41   True        False         False      49m
# cloud-credential                           4.10.41   True        False         False      50m
# cluster-autoscaler                         4.10.41   True        False         False      43m
# config-operator                            4.10.41   True        False         False      44m
# console                                    4.10.41   True        False         False      28m
# csi-snapshot-controller                    4.10.41   True        False         False      32m
# dns                                        4.10.41   True        False         False      32m
# etcd                                       4.10.41   True        False         False      42m
# image-registry                             4.10.41   True        False         False      30m
# ingress                                    4.10.41   True        False         False      32m
# insights                                   4.10.41   True        False         False      90s
# kube-apiserver                             4.10.41   True        False         False      40m
# kube-controller-manager                    4.10.41   True        False         False      41m
# kube-scheduler                             4.10.41   True        False         False      40m
# kube-storage-version-migrator              4.10.41   True        False         False      30m
# machine-api                                4.10.41   True        False         False      43m
# machine-approver                           4.10.41   True        False         False      43m
# machine-config                             4.10.41   True        False         False      43m
# marketplace                                4.10.41   True        False         False      43m
# monitoring                                 4.10.41   True        False         False      36m
# network                                    4.10.41   True        False         False      44m
# node-tuning                                4.10.41   True        False         False      43m
# openshift-apiserver                        4.10.41   True        False         False      32m
# openshift-controller-manager               4.10.41   True        False         False      32m
# openshift-samples                          4.10.41   True        False         False      37m
# operator-lifecycle-manager                 4.10.41   True        False         False      43m
# operator-lifecycle-manager-catalog         4.10.41   True        False         False      43m
# operator-lifecycle-manager-packageserver   4.10.41   True        False         False      32m
# service-ca                                 4.10.41   True        False         False      44m
# storage                                    4.10.41   True        False         False      44m

other config to fix hygon deploy errors

disk treated as removable disk (flag RM)

不知道是海光x86 cpu的问题,还是这个服务器主板的问题,所有内置硬盘都会认成移动硬盘。在主板bios里面,sata controller没有可以配置的项,只有海光cpu有相关的配置,AHCI的相关配置,没有关闭热插拔的选项。

We are not sure whether it is an issue with the Hygon x86 CPU or with this server's motherboard, but all internal disks are recognized as removable disks. In the motherboard BIOS there is nothing configurable for the SATA controller; only the Hygon CPU has related AHCI settings, and there is no option to disable hot-plug.

这个问题,对于安装openshift倒是没看出来有什么影响,但是会影响安装openshift data foundation(odf),因为odf安装的时候,会默认扫描节点的硬盘,然后把移动硬盘都排除。结果,海光cpu的服务器,就变成没有硬盘可以用来装了。

This does not seem to affect the openshift installation itself, but it does affect installing OpenShift Data Foundation (ODF), because the ODF installation scans the node's disks by default and excludes all removable disks. As a result, the server with the Hygon CPU ends up with no disk available to install on.

没办法,我们只好定制local storage operator,这个东西是odf的底层,真正的硬盘扫描,就是这个operator干的。

So we have no choice but to customize the local storage operator, which is the layer underneath ODF; the actual disk scanning is done by this operator.

# you can see the RM flag is set to 1
lsblk
# NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
# sda      8:0    1   3.6T  0 disk
# |-sda1   8:1    1     1M  0 part
# |-sda2   8:2    1   127M  0 part
# |-sda3   8:3    1   384M  0 part /boot
# `-sda4   8:4    1   3.6T  0 part /var/lib/kubelet/pods/9c993e46-ed1f-4f5c-a48a-bf563a29d6b8/volume-subpaths/etc/tuned/5
#                                  /var/lib/kubelet/pods/9c993e46-ed1f-4f5c-a48a-bf563a29d6b8/volume-subpaths/etc/tuned/4
#                                  /var/lib/kubelet/pods/9c993e46-ed1f-4f5c-a48a-bf563a29d6b8/volume-subpaths/etc/tuned/3
#                                  /var/lib/kubelet/pods/9c993e46-ed1f-4f5c-a48a-bf563a29d6b8/volume-subpaths/etc/tuned/2
#                                  /var/lib/kubelet/pods/9c993e46-ed1f-4f5c-a48a-bf563a29d6b8/volume-subpaths/etc/tuned/1
#                                  /var/lib/containers/storage/overlay
#                                  /var
#                                  /sysroot/ostree/deploy/rhcos/var
#                                  /usr
#                                  /etc
#                                  /
#                                  /sysroot
# sdb      8:16   1 447.1G  0 disk
# sdc      8:32   1 447.1G  0 disk

dmesg | grep sdb
# [    6.900118] sd 16:0:0:0: [sdb] 937703088 512-byte logical blocks: (480 GB/447 GiB)
# [    6.900134] sd 16:0:0:0: [sdb] 4096-byte physical blocks
# [    6.900206] sd 16:0:0:0: [sdb] Write Protect is off
# [    6.900211] sd 16:0:0:0: [sdb] Mode Sense: 00 3a 00 00
# [    6.900908] sd 16:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
# [    6.953704] sd 16:0:0:0: [sdb] Attached SCSI removable disk

udevadm info --query all --path /sys/block/sdb --attribute-walk
  # looking at device '/devices/pci0000:00/0000:00:08.1/0000:05:00.2/ata18/host17/target17:0:0/17:0:0:0/block/sdb':
  #   KERNEL=="sdb"
  #   SUBSYSTEM=="block"
  #   DRIVER==""
  #   ATTR{alignment_offset}=="0"
  #   ATTR{capability}=="1"
  #   ATTR{discard_alignment}=="0"
  #   ATTR{diskseq}=="2"
  #   ATTR{events}=="media_change"
  #   ATTR{events_async}==""
  #   ATTR{events_poll_msecs}=="-1"
  #   ATTR{ext_range}=="256"
  #   ATTR{hidden}=="0"
  #   ATTR{inflight}=="       0        0"
  #   ATTR{integrity/device_is_integrity_capable}=="0"
  #   ATTR{integrity/format}=="none"
  #   ATTR{integrity/protection_interval_bytes}=="0"
  #   ATTR{integrity/read_verify}=="0"
  #   ATTR{integrity/tag_size}=="0"
  #   ATTR{integrity/write_generate}=="0"
  #   ATTR{mq/0/cpu_list}=="0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127"
  #   ATTR{mq/0/nr_reserved_tags}=="0"
  #   ATTR{mq/0/nr_tags}=="32"
  #   ATTR{power/control}=="auto"
  #   ATTR{power/runtime_active_time}=="0"
  #   ATTR{power/runtime_status}=="unsupported"
  #   ATTR{power/runtime_suspended_time}=="0"
  #   ATTR{queue/add_random}=="0"
  #   ATTR{queue/chunk_sectors}=="0"
  #   ATTR{queue/dax}=="0"
  #   ATTR{queue/discard_granularity}=="4096"
  #   ATTR{queue/discard_max_bytes}=="2147450880"
  #   ATTR{queue/discard_max_hw_bytes}=="2147450880"
  #   ATTR{queue/discard_zeroes_data}=="0"
  #   ATTR{queue/fua}=="0"
  #   ATTR{queue/hw_sector_size}=="512"
  #   ATTR{queue/io_poll}=="0"
  #   ATTR{queue/io_poll_delay}=="-1"
  #   ATTR{queue/io_timeout}=="30000"
  #   ATTR{queue/iosched/async_depth}=="48"
  #   ATTR{queue/iosched/fifo_batch}=="16"
  #   ATTR{queue/iosched/front_merges}=="1"
  #   ATTR{queue/iosched/prio_aging_expire}=="10000"
  #   ATTR{queue/iosched/read_expire}=="500"
  #   ATTR{queue/iosched/write_expire}=="5000"
  #   ATTR{queue/iosched/writes_starved}=="2"
  #   ATTR{queue/iostats}=="1"
  #   ATTR{queue/logical_block_size}=="512"
  #   ATTR{queue/max_discard_segments}=="1"
  #   ATTR{queue/max_hw_sectors_kb}=="32767"
  #   ATTR{queue/max_integrity_segments}=="0"
  #   ATTR{queue/max_sectors_kb}=="1280"
  #   ATTR{queue/max_segment_size}=="65536"
  #   ATTR{queue/max_segments}=="168"
  #   ATTR{queue/minimum_io_size}=="4096"
  #   ATTR{queue/nomerges}=="0"
  #   ATTR{queue/nr_requests}=="64"
  #   ATTR{queue/nr_zones}=="0"
  #   ATTR{queue/optimal_io_size}=="0"
  #   ATTR{queue/physical_block_size}=="4096"
  #   ATTR{queue/read_ahead_kb}=="128"
  #   ATTR{queue/rotational}=="0"
  #   ATTR{queue/rq_affinity}=="1"
  #   ATTR{queue/scheduler}=="[mq-deadline] kyber bfq none"
  #   ATTR{queue/stable_writes}=="0"
  #   ATTR{queue/virt_boundary_mask}=="0"
  #   ATTR{queue/wbt_lat_usec}=="2000"
  #   ATTR{queue/write_cache}=="write back"
  #   ATTR{queue/write_same_max_bytes}=="0"
  #   ATTR{queue/write_zeroes_max_bytes}=="0"
  #   ATTR{queue/zone_append_max_bytes}=="0"
  #   ATTR{queue/zone_write_granularity}=="0"
  #   ATTR{queue/zoned}=="none"
  #   ATTR{range}=="16"
  #   ATTR{removable}=="1"
  #   ATTR{ro}=="0"
  #   ATTR{size}=="937703088"
  #   ATTR{stat}=="      94        0     4504       16        0        0        0        0        0       16       16        0        0        0        0        0        0"
  #   ATTR{trace/act_mask}=="disabled"
  #   ATTR{trace/enable}=="0"
  #   ATTR{trace/end_lba}=="disabled"
  #   ATTR{trace/pid}=="disabled"
  #   ATTR{trace/start_lba}=="disabled"

build local-storage-operator

我们要做的,就是修改local-storage-operator里面的源代码,在源代码里面,写死了移动硬盘不能作为local-storage使用,我们就把这个限制放开。比较走运的是,这个项目的代码逻辑还算是简单,让我们比较方便的找到了写死的地方。

What we need to do is modify the source code of the local-storage-operator: the code hard-codes that removable disks cannot be used for local storage, so we relax that restriction. Luckily, the code logic of this project is fairly simple, which made it easy to find the hard-coded check.

# https://github.com/wangzheng422/local-storage-operator

# dnf module -y install go-toolset docker ruby

dnf module -y install go-toolset  ruby
dnf install -y docker

rm -rf /data/operator
mkdir -p /data/operator
cd /data/operator

git clone https://github.com/wangzheng422/local-storage-operator
cd local-storage-operator
git checkout wzh-ocp-4.10
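
# before changing anything, a quick way to locate the hard-coded check that
# excludes removable disks is a plain grep over the go sources (a sketch; the
# exact file and symbol names it turns up may differ)
grep -rn --include='*.go' -i 'removable' .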

export REGISTRY=quay.io/wangzheng422/
export VERSION=wzh-ocp-4.10-v01

sed -i 's/REPO_IP/45.76.77.134:5180/g' wzh.repo

make images

make push-images
# quay.io/wangzheng422/local-diskmaker:wzh-ocp-4.10-v01
# quay.io/wangzheng422/local-storage-operator:wzh-ocp-4.10-v01
# quay.io/wangzheng422/local-must-gather:wzh-ocp-4.10-v01

make bundle
# quay.io/wangzheng422/local-storage-bundle:wzh-ocp-4.10-v01
# quay.io/wangzheng422/ocp/local-storage-index:wzh-ocp-4.10-v01

deploy RM hotfix to openshift

我们编译好了自定义版本的local-storage-operator,里面包括了operator本身,还有catalog source。接下来,我们就部署这个local-storage-operator版本,然后在这个基础之上,再部署odf。

We have built a customized version of the local-storage-operator, including the operator itself and a catalog source. Next we deploy this local-storage-operator build, and then deploy ODF on top of it.


cat << EOF >> ~/wzh/disk.fix.project.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-local-storage
EOF
oc create --save-config -f ~/wzh/disk.fix.project.yaml

oc project openshift-local-storage

cat << EOF > ~/wzh/disk.fix.sub.yaml
# apiVersion: v1
# kind: Namespace
# metadata:
#   labels:
#     openshift.io/cluster-monitoring: "true"
#   name: openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
  - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: localstorage-operator-manifests
  namespace: openshift-local-storage
spec:
  sourceType: grpc
  # replace this with your index image
  image: quay.io/wangzheng422/local-storage-index:wzh-ocp-4.10-v01
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-subscription
  namespace: openshift-local-storage
spec:
  channel: preview   # this is the default channel name defined in config bundle file
  name: local-storage-operator
  source: localstorage-operator-manifests
  sourceNamespace: openshift-local-storage
EOF

oc create --save-config -f ~/wzh/disk.fix.sub.yaml

# if you want to restore
# oc delete -f ~/wzh/disk.fix.sub.yaml

# after deploy ODF, set default storage class to rbd
oc patch storageclass ocs-storagecluster-ceph-rbd -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
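
To confirm the change took effect, list the storage classes; the ocs-storagecluster-ceph-rbd entry should now be marked as the default.

oc get storageclass
# the ocs-storagecluster-ceph-rbd entry should show "(default)" next to its name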

sriov fix

实验环境,有sriov官方不支持的网卡,那么我们需要激活这些网卡支持,要做2个事情,一个是禁用webhook,另外一个是配置一个config map,把网卡识别信息放进去。

The lab environment has NICs that are not officially supported for SR-IOV, so we need to enable support for them. Two things are required: disable the admission webhook, and create a config map that contains the NIC identification information.

# disable sriov webhook
# https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/configuring-sriov-operator.html#disable-enable-sr-iov-operator-admission-control-webhook_configuring-sriov-operator
oc patch sriovoperatorconfig default --type=merge \
  -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableOperatorWebhook": false } }'

# add unsupported nic ids

cat << EOF > ~/wzh/sriov-unsupport.yaml
apiVersion: v1
data:
  INTEL: 8086 10fb 10ed
  I350:  8086 1521 1520
  Wuxi:  8848 1000 1080
kind: ConfigMap
metadata:
  name: unsupported-nic-ids
  namespace: openshift-sriov-network-operator
EOF
oc apply -f ~/wzh/sriov-unsupport.yaml


# how do we find the NIC parameters above? they can be read from sysfs on the node.
VAR_IF=ens19f0
cat /sys/class/net/$VAR_IF/device/vendor
# 0x8086
cat /sys/class/net/$VAR_IF/device/device
# 0x1521
cat /sys/class/net/$VAR_IF/device/sriov_vf_device
# 1520


[root@master1 device]# dmesg  |grep i40
[    3.700084] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[    3.700088] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[    3.718875] i40e 0000:23:00.0: fw 8.84.66032 api 1.14 nvm 8.40 0x8000af82 20.5.13 [8086:1572] [8086:0006]
[    3.815120] i40e 0000:23:00.0: MAC address: 6c:fe:54:44:29:60
[    3.815438] i40e 0000:23:00.0: FW LLDP is enabled
[    3.832075] i40e 0000:23:00.0: PCI-Express: Speed 8.0GT/s Width x8
[    3.862256] i40e 0000:23:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[    3.892534] i40e 0000:23:00.1: fw 8.84.66032 api 1.14 nvm 8.40 0x8000af82 20.5.13 [8086:1572] [8086:0006]
[    3.977303] i40e 0000:23:00.1: MAC address: 6c:fe:54:44:29:61
[    3.980272] i40e 0000:23:00.1: FW LLDP is enabled
[    3.993587] i40e 0000:23:00.1: PCI-Express: Speed 8.0GT/s Width x8
[    4.009877] i40e 0000:23:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[    4.033807] i40e 0000:51:00.0: fw 6.0.48442 api 1.7 nvm 6.01 0x80003554 1.1747.0 [8086:158b] [8086:0001]
[    4.115076] i40e 0000:51:00.0: MAC address: 3c:fd:fe:c5:58:68
[    4.120848] i40e 0000:51:00.0: FW LLDP is enabled
[    4.136188] i40e 0000:51:00.0: PCI-Express: Speed 8.0GT/s Width x8
[    4.139533] i40e 0000:51:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[    4.158734] i40e 0000:51:00.1: fw 6.0.48442 api 1.7 nvm 6.01 0x80003554 1.1747.0 [8086:158b] [8086:0001]
[    4.245403] i40e 0000:51:00.1: MAC address: 3c:fd:fe:c5:58:69
[    4.248148] i40e 0000:51:00.1: FW LLDP is enabled
[    4.260198] i40e 0000:51:00.1: PCI-Express: Speed 8.0GT/s Width x8
[    4.262961] i40e 0000:51:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA


[root@master3 device]# dmesg  |grep i40
[    3.776216] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[    3.778116] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[    3.798495] i40e 0000:23:00.0: fw 8.13.63341 api 1.12 nvm 8.15 0x8000a4e8 1.2879.0 [8086:1572] [1bd4:0042]
[    3.902899] i40e 0000:23:00.0: MAC address: b4:05:5d:e1:71:3e
[    3.904856] i40e 0000:23:00.0: FW LLDP is disabled
[    3.906678] i40e 0000:23:00.0: FW LLDP is disabled, attempting SW DCB
[    3.924126] i40e 0000:23:00.0: SW DCB initialization succeeded.
[    3.942003] i40e 0000:23:00.0: PCI-Express: Speed 8.0GT/s Width x8
[    3.963194] i40e 0000:23:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[    3.981141] i40e 0000:23:00.1: fw 8.13.63341 api 1.12 nvm 8.15 0x8000a4e8 1.2879.0 [8086:1572] [1bd4:0042]
[    4.067137] i40e 0000:23:00.1: MAC address: b4:05:5d:e1:71:3f
[    4.070012] i40e 0000:23:00.1: FW LLDP is disabled
[    4.072641] i40e 0000:23:00.1: FW LLDP is disabled, attempting SW DCB
[    4.085208] i40e 0000:23:00.1: SW DCB initialization succeeded.
[    4.103701] i40e 0000:23:00.1: PCI-Express: Speed 8.0GT/s Width x8
[    4.116830] i40e 0000:23:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[    4.127157] i40e 0000:23:00.1 ens22f1: renamed from eth0
[    4.160401] i40e 0000:23:00.0 ens22f0: renamed from eth1

lspci -vs 0000:23:00.0
# 23:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
#         Subsystem: Intel Corporation Ethernet 10G 2P X710 Adapter
#         Physical Slot: 22
#         Flags: bus master, fast devsel, latency 0, IRQ 105, NUMA node 2, IOMMU group 53
#         Memory at d7000000 (64-bit, prefetchable) [size=16M]
#         Memory at d8008000 (64-bit, prefetchable) [size=32K]
#         Expansion ROM at d9180000 [disabled] [size=512K]
#         Capabilities: [40] Power Management version 3
#         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
#         Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
#         Capabilities: [a0] Express Endpoint, MSI 00
#         Capabilities: [e0] Vital Product Data
#         Capabilities: [100] Advanced Error Reporting
#         Capabilities: [140] Device Serial Number 60-29-44-ff-ff-54-fe-6c
#         Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
#         Capabilities: [1a0] Transaction Processing Hints
#         Capabilities: [1b0] Access Control Services
#         Capabilities: [1d0] Secondary PCI Express
#         Kernel driver in use: i40e
#         Kernel modules: i40e

lspci -vs 0000:23:00.0
# 23:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
#         Subsystem: Inspur Electronic Information Industry Co., Ltd. 10G SFP+ DP EP102Fi4 Adapter
#         Physical Slot: 22
#         Flags: bus master, fast devsel, latency 0, IRQ 130, NUMA node 2, IOMMU group 53
#         Memory at d7000000 (64-bit, prefetchable) [size=8M]
#         Memory at d8008000 (64-bit, prefetchable) [size=32K]
#         Expansion ROM at d9180000 [disabled] [size=512K]
#         Capabilities: [40] Power Management version 3
#         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
#         Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
#         Capabilities: [a0] Express Endpoint, MSI 00
#         Capabilities: [100] Advanced Error Reporting
#         Capabilities: [140] Device Serial Number 3e-71-e1-ff-ff-5d-05-b4
#         Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
#         Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
#         Capabilities: [1a0] Transaction Processing Hints
#         Capabilities: [1b0] Access Control Services
#         Capabilities: [1d0] Secondary PCI Express
#         Kernel driver in use: i40e
#         Kernel modules: i40e

hugepage numa allocation

默认hugepage会平均分配在numa node之上,而dpdk程序,是绑定numa node运行的,所以一个不小心,就会出现hugepage不足,导致dpdk启动不了的情况。

By default, hugepages are distributed evenly across the NUMA nodes, while DPDK applications are pinned to a specific NUMA node. If you are not careful, that node can run out of hugepages and DPDK will fail to start.

这里,我们先看看这个环境里面的numa是个什么情况。

Here we first take a look at the NUMA layout of this environment.

debug pod

为了方便测试,我们搞一个debug pod,然后oc debug node的方式,来运行这个pod,这样以后方便查询各种主机上的硬件信息。

To make testing easier, we build a debug pod and run it with oc debug node, which makes it convenient to query all kinds of hardware information on the hosts later.

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-memory-configuring-huge-pages

# build debug pod
mkdir -p /data/pod
cd /data/pod

cat << EOF > debugpod.dockerfile
FROM docker.io/library/almalinux:9

RUN dnf install -y epel-release && dnf update -y 
RUN dnf repolist

RUN dnf install -y --allowerasing which iproute bind-utils wget htop btop bash-completion curl net-tools java-1.8.0-openjdk git iperf3 tcpdump stress-ng fio numactl hwloc-gui lshw nc nmap-ncat dmidecode

RUN dnf clean all -y
EOF

# VAR_IMAGE=quay.io/wangzheng422/debug-pod:alma-9.1

podman build --squash -t quay.io/wangzheng422/debug-pod:alma-9.2 -f debugpod.dockerfile ./

podman push quay.io/wangzheng422/debug-pod:alma-9.2

podman tag quay.io/wangzheng422/debug-pod:alma-9.1 quaylab.infra.wzhlab.top:5443/wangzheng422/debug-pod:alma-9.1

podman push quaylab.infra.wzhlab.top:5443/wangzheng422/debug-pod:alma-9.1

# try it
oc debug node/master1.ocp.ytl.com --image=quay.io/wangzheng422/debug-pod:alma-9.1

numastat -cm | egrep 'Node|Huge'
#                  Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7  Total
# AnonHugePages      3570   1796   1830   2920    934   1366   2486   4482  19384
# ShmemHugePages        0      0      0      0      0      0      0      0      0
# HugePages_Total       0      0  24576      0      0      0      0      0  24576
# HugePages_Free        0      0  15360      0      0      0      0      0  15360
# HugePages_Surp        0      0      0      0      0      0      0      0      0

lstopo --of png > test.png 

# check nic belongs to numa node
cat /sys/class/net/ens22f0/device/numa_node
# 2

# check hugepage belongs to numa node
cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages
# 24

config numa hugepage binding

我们参考官方文档,配置hugepage和numa的绑定关系。

Following the official documentation, we configure the binding between hugepages and NUMA nodes.

  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-memory-configuring-huge-pages

oc patch mcp/master --patch '{"spec":{"paused":true}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":true}}' --type=merge

cat << EOF > ~/wzh/master-hugepage.yaml
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  #name: 80-worker-hugepages
  name: 80-master-hugepages
  labels:
   # machineconfiguration.openshift.io/role: worker
    machineconfiguration.openshift.io/role: master
spec:
  osImageURL: ""
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
    - hugepagesz=1G
    - hugepages=32
    - hugepagesz=2M
    - hugepages=0
    - default_hugepagesz=1G
    - intel_iommu=on
    - iommu=pt
EOF

oc apply -f ~/wzh/master-hugepage.yaml

cat << 'EOF' > ~/wzh/hugepage.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-hugetlb-gigantic-pages
storage:
  files:

    - path: /etc/lib/systemd/hugetlb-reserve-pages.sh
      overwrite: true
      contents:
        inline: |
          #!/bin/sh

          nodes_path=/sys/devices/system/node/
          if [ ! -d $nodes_path ]; then
            echo "ERROR: $nodes_path does not exist"
            exit 1
          fi

          reserve_pages()
          {
            echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
          }

          reserve_pages 0 node0
          reserve_pages 0 node1
          reserve_pages 16 node2
          reserve_pages 0 node3
          reserve_pages 0 node4
          reserve_pages 16 node5
          reserve_pages 0 node6
          reserve_pages 0 node7
      mode: 493
      user:
        name: root

systemd:
  units:

    - contents: |
        [Unit]
        Description=HugeTLB Gigantic Pages Reservation
        DefaultDependencies=no
        Before=dev-hugepages.mount
        ConditionPathExists=/sys/devices/system/node
        ConditionKernelCommandLine=hugepagesz=1G

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/etc/lib/systemd/hugetlb-reserve-pages.sh

        [Install]
        WantedBy=sysinit.target
      enabled: true
      name: hugetlb-gigantic-pages.service

EOF

butane ~/wzh/hugepage.bu > ~/wzh/hugepage.yaml

oc apply -f ~/wzh/hugepage.yaml

# oc create --save-config -f ~/wzh/hugepage.yaml

# oc delete -f ~/wzh/hugepage.yaml

oc patch mcp/master --patch '{"spec":{"paused":false}}' --type=merge
oc patch mcp/worker --patch '{"spec":{"paused":false}}' --type=merge
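
After the pools are unpaused, the machine config operator rolls the new kernel arguments and the hugetlb unit out to the nodes (rebooting them); you can watch the pools until they report updated again before re-checking the per-NUMA allocation.

# wait until the master pool reports UPDATED=True again
oc get mcp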

cnv disable auto import

实验室环境的外网非常慢,而cnv安装完了,会自动导入centos, rhel的镜像,这些镜像我们根本用不到,那么就禁止这种自动下载和导入。

The external network of the lab environment is very slow, and once CNV is installed it automatically imports CentOS and RHEL images. We never use these images, so we disable this automatic download and import.

  • https://docs.openshift.com/container-platform/4.10/virt/virtual_machines/advanced_vm_management/virt-automatic-bootsource-updates.html

oc patch hco kubevirt-hyperconverged -n openshift-cnv --type json -p '[{"op": "replace", "path": "/spec/featureGates/enableCommonBootImageImport", "value": false}]'

cluster logging storage sizing

默认ocp cluster logging operator,会使用200G的存储,如果我们集群内部ODF的存储很小,那么我们要调整,减小存储需求,并且配置他每天清除旧的日志。

By default the OCP cluster logging operator uses 200G of storage. If the ODF storage inside our cluster is small, we need to adjust this: reduce the storage request and configure it to purge old logs every day.


oc get clusterlogging/instance -n openshift-logging -o yaml
# ......
#   logStore:
#     elasticsearch:
#       nodeCount: 3
#       proxy:
#         resources:
#           limits:
#             memory: 256Mi
#           requests:
#             memory: 256Mi
#       redundancyPolicy: SingleRedundancy
#       resources:
#         limits:
#           cpu: 1
#           memory: 8Gi
#         requests:
#           cpu: 500m
#           memory: 8Gi
#       storage:
#         size: 52Gi
#     retentionPolicy:
#       application:
#         maxAge: 1d
#       audit:
#         maxAge: 1d
#       infra:
#         maxAge: 1d
#     type: elasticsearch
# ......
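
To get values like the ones above, the ClusterLogging CR can be patched; a minimal sketch is shown below (the field names follow the CR structure printed above, and the size and retention values are just examples). Note that the elasticsearch storage size normally has to be set before the log store is first created; resizing an existing PVC is a separate operation.

oc patch clusterlogging/instance -n openshift-logging --type merge -p \
  '{"spec":{"logStore":{"elasticsearch":{"storage":{"size":"52Gi"}},"retentionPolicy":{"application":{"maxAge":"1d"},"audit":{"maxAge":"1d"},"infra":{"maxAge":"1d"}}}}}'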

numa

默认有一些系统关于numa和cpu manager的配置。

These are some system-level settings related to NUMA and the CPU manager.


cat << EOF > ~/wzh/numa.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-cpumanager-enabled
spec:
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    maxPods: 1000
    topologyManagerPolicy: single-numa-node
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: worker-cpumanager-enabled
EOF

oc apply -f ~/wzh/numa.yaml
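
The KubeletConfig above only selects machine config pools labeled custom-kubelet=worker-cpumanager-enabled, so the label has to be added to the target pool for it to take effect.

# label the worker pool so the kubeletconfig gets picked up
oc label machineconfigpool worker custom-kubelet=worker-cpumanager-enabled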

node taint


oc adm taint nodes worker1.ocp.ytl.com intel_cpu=true:NoExecute
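
To remove the taint later, the standard oc taint syntax with a trailing dash undoes it.

oc adm taint nodes worker1.ocp.ytl.com intel_cpu=true:NoExecute-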

odf error fix

odf有过一些故障,有一个比较大的,是硬盘上有分区信息,于是ceph不纳管。另外,就是pod有遗留的volume,导致新的pod无法创建,于是手动清理了这些volume,让新的pod能继续创建。

ODF has had a few failures. A fairly big one was that a disk still carried partition information, so Ceph refused to manage it. Another was that pods left orphaned volumes behind, which prevented new pods from being created; we cleaned up these volumes manually so that new pods could be created again.

# https://access.redhat.com/solutions/5512711
journalctl -r -u kubelet | grep 'orphaned pod' | head -1

journalctl -r -u kubelet | grep 'orphaned pod' | head -1 | sed 's/.*orphaned pod//' | sed 's/ found.*//' | xargs printf | xargs printf && echo

POD_NAME=`journalctl -r -u kubelet | grep 'orphaned pod' | head -1 | sed 's/.*orphaned pod//' | sed 's/ found.*//' | xargs printf | xargs printf`
echo $POD_NAME

# cd /var/lib/kubelet/pods
rm -rf /var/lib/kubelet/pods/$POD_NAME/volumes

journalctl -r -u kubelet | grep 'orphaned pod' | head -1
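
For the first problem (leftover partition information keeping Ceph from claiming a disk), the usual fix is to wipe the disk signatures before ODF scans it again. A minimal sketch, assuming the affected disk is /dev/sdb and that it holds no data you still need:

# wipe filesystem/partition signatures so ceph can consume the disk
wipefs -a /dev/sdb
sgdisk --zap-all /dev/sdb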

timezone

openshift默认的时区是UTC+0的,我们要按照中国时区显示时间,就可以这么做。

The default timezone of openshift is UTC+0. If we want to display the time in the China timezone, we can do it like this.


TZ=":Asia/Shanghai" date

disable container image wipe

openshift在节点上有一个crio-wipe.service的systemd启动服务,他会运行crio wipe来清空本地镜像缓存,至于原因,说是因为重启,会有一定概率损坏容器存储,所以最简单的办法,就是删掉,重新下载。

openshift ships a systemd unit called crio-wipe.service on the nodes. It runs crio wipe to clear the local image cache; the stated reason is that a reboot has a certain probability of corrupting the container storage, so the simplest fix is to wipe it and download everything again.

我们可以把这个服务屏蔽掉,然后看看实际测试的效果,如果可以接受,那么就避免重启后去下载系统镜像了。

We can mask this service and then see how it behaves in practice. If the result is acceptable, we avoid re-downloading the system images after every reboot.


cat << EOF > ${BASE_DIR}/data/install/crio-wipe.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-disable-crio-wipe-master

systemd:
  units:
    - name: crio-wipe.service
      mask: true

EOF

butane -d ${BASE_DIR}/data/install ${BASE_DIR}/data/install/crio-wipe.bu > ${BASE_DIR}/data/install/99-zzz-disable-crio-wipe-master.yaml

oc create --save-config -f ${BASE_DIR}/data/install/99-zzz-disable-crio-wipe-master.yaml
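
After the machine config rolls out and the nodes reboot, you can confirm on a node that the unit is indeed masked.

systemctl is-enabled crio-wipe.service
# masked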

sctp with externalIP

ipv4 single stack

# https://docs.openshift.com/container-platform/4.10/networking/using-sctp.html

cat << EOF > ${BASE_DIR}/data/install/99-load-sctp-module-master.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-load-sctp-module-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/modprobe.d/sctp-blacklist.conf
          mode: 0644
          overwrite: true
          contents:
            source: data:,
        - path: /etc/modules-load.d/sctp-load.conf
          mode: 0644
          overwrite: true
          contents:
            source: data:,sctp
EOF
oc create --save-config -f ${BASE_DIR}/data/install/99-load-sctp-module-master.yaml
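
Once the machine config has been applied and the nodes have rebooted, verify on a node that the sctp module is loaded.

# run on a master node (e.g. via oc debug node/... -- chroot /host)
lsmod | grep sctp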

cat << EOF > ${BASE_DIR}/data/install/sctp.demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: sctpserver
  labels:
    app: sctpserver
spec:
  containers:
    - name: sctpserver
      image: quay.io/wangzheng422/debug-pod:alma-9.1
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        [" ncat -l 30102 --sctp -v "]
      ports:
        - containerPort: 30102
          name: sctpserver
          protocol: SCTP
---
apiVersion: v1
kind: Service
metadata:
  name: sctpservice
  labels:
    app: sctpserver
spec:
  type: ClusterIP
  externalIPs:
    - 192.168.77.88
  selector:
    app: sctpserver
  ports:
    - name: sctpserver
      protocol: SCTP
      port: 30102
      targetPort: 30102
---
apiVersion: v1
kind: Pod
metadata:
  name: sctpclient
  labels:
    app: sctpclient
spec:
  containers:
    - name: sctpclient
      image: quay.io/wangzheng422/debug-pod:alma-9.1
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        ["sleep inf"]
---
EOF

oc create --save-config -n default -f ${BASE_DIR}/data/install/sctp.demo.yaml

# to restore
oc delete -n default -f ${BASE_DIR}/data/install/sctp.demo.yaml

oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
# 172.30.40.207


oc get pod -n default -o wide
# NAME         READY   STATUS    RESTARTS       AGE   IP            NODE             NOMINATED NODE   READINESS GATES
# sctpclient   1/1     Running   0              15m   10.128.0.45   master-03-demo   <none>           <none>
# sctpserver   1/1     Running   4 (4m8s ago)   15m   10.128.0.44   master-03-demo   <none>           <none>

oc rsh sctpclient

echo '111' | ncat 172.30.40.207 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to 172.30.40.207:30102.
# Ncat: 4 bytes sent, 0 bytes received in 0.13 seconds.

# login to master-01
ssh root@192.168.77.23

echo '1111' | ncat 192.168.77.88 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to 192.168.77.88:30102.
# Ncat: 5 bytes sent, 0 bytes received in 0.12 seconds.

ipv4 and ipv6 dual stack

oc edit network/cluster
# remove external ip policy by set it to null.
# by setting .spec.externalIP.policy -> null

# apiVersion: config.openshift.io/v1
# kind: Network
# metadata:
#   creationTimestamp: "2023-01-10T06:39:18Z"
#   generation: 2
#   name: cluster
#   resourceVersion: "3473"
#   uid: c871e247-f941-426f-8f0e-02ecd2d497b8
# spec:
#   clusterNetwork:
#   - cidr: 10.128.0.0/14
#     hostPrefix: 23
#   - cidr: fd01::/48
#     hostPrefix: 64
#   externalIP:
#     policy: {}
#   networkType: OVNKubernetes
#   serviceNetwork:
#   - 172.30.0.0/16
#   - fd02::/112

# https://docs.openshift.com/container-platform/4.10/networking/using-sctp.html

cat << EOF > ${BASE_DIR}/data/install/99-load-sctp-module-master.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-load-sctp-module-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/modprobe.d/sctp-blacklist.conf
          mode: 0644
          overwrite: true
          contents:
            source: data:,
        - path: /etc/modules-load.d/sctp-load.conf
          mode: 0644
          overwrite: true
          contents:
            source: data:,sctp
EOF
oc create --save-config -f ${BASE_DIR}/data/install/99-load-sctp-module-master.yaml

cat << EOF > ${BASE_DIR}/data/install/sctp.demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: sctpserver
  labels:
    app: sctpserver
spec:
  containers:
    - name: sctpserver
      image: quay.io/wangzheng422/debug-pod:alma-9.1
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        ["sleep inf"]
        # [" while true; do ncat -l 30102 --sctp -v 2>&1 ; done; "]
      ports:
        - containerPort: 30102
          name: sctpserver
          protocol: SCTP
---
apiVersion: v1
kind: Service
metadata:
  name: sctpservice
  labels:
    app: sctpserver
spec:
  type: ClusterIP
  ipFamilyPolicy: RequireDualStack  
  ipFamilies:
    - IPv4
    - IPv6
  externalIPs:
    - 192.168.77.88
    - fd03::88
  selector:
    app: sctpserver
  ports:
    - name: sctpserver
      protocol: SCTP
      port: 30102
      targetPort: 30102
# ---
# apiVersion: v1
# kind: Service
# metadata:
#   name: sctpservice-v6
#   labels:
#     app: sctpserver
# spec:
#   type: ClusterIP
#   ipFamilyPolicy: SingleStack
#   ipFamilies:
#     - IPv6
#   externalIPs:
#     - fd03::88
#   selector:
#     app: sctpserver
#   ports:
#     - name: sctpserver
#       protocol: SCTP
#       port: 30102
#       targetPort: 30102
---
apiVersion: v1
kind: Pod
metadata:
  name: sctpclient
  labels:
    app: sctpclient
spec:
  containers:
    - name: sctpclient
      image: quay.io/wangzheng422/debug-pod:alma-9.1
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        ["sleep inf"]
---
EOF

oc create --save-config -n default -f ${BASE_DIR}/data/install/sctp.demo.yaml

# run below command in terminal windows of sctp server
# while true; do ncat -l 30102 --sctp -v 2>&1 ; done;

# to restore
oc delete -n default -f ${BASE_DIR}/data/install/sctp.demo.yaml

oc get pod -n default -o wide
# NAME         READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
# sctpclient   1/1     Running   0          112s   10.128.0.71   master-03-demo   <none>           <none>
# sctpserver   1/1     Running   0          112s   10.128.0.70   master-03-demo   <none>           <none>

oc get services sctpservice -n default -o json | jq -r .spec.clusterIPs[]
# 172.30.74.183
# fd02::776e

oc rsh -n default sctpclient

echo '12345' | ncat 172.30.74.183 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to 172.30.74.183:30102.
# Ncat: 6 bytes sent, 0 bytes received in 0.12 seconds.

echo '123456' | ncat fd02::776e 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to fd02::776e:30102.
# Ncat: 7 bytes sent, 0 bytes received in 0.12 seconds.

# login to master-01
ssh root@192.168.77.23

echo '12' | ncat 192.168.77.88 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to 192.168.77.88:30102.
# Ncat: 3 bytes sent, 0 bytes received in 0.13 seconds.

echo '123' |  ncat fd03::88 30102 --sctp -v
# Ncat: Version 7.91 ( https://nmap.org/ncat )
# Ncat: Connected to fd03::88:30102.
# Ncat: 4 bytes sent, 0 bytes received in 3.11 seconds.

podman run -it --rm --network=host quay.io/wangzheng422/debug-pod:alma-9.1 bash


nmstate operator

https://github.com/openshift/kubernetes-nmstate

end

other backup

grow fs

dnf install -y cloud-utils-growpart 

lsblk
# NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
# sr0     11:0    1 1024M  0 rom
# vda    253:0    0   50G  0 disk
# ├─vda1 253:1    0    1G  0 part /boot
# ├─vda2 253:2    0    2G  0 part [SWAP]
# └─vda3 253:3    0   47G  0 part /
# vdb    253:16   0   80G  0 disk
# └─vdb1 253:17   0   40G  0 part

growpart /dev/vdb 1

e2fsck -fp /dev/vdb1

mount /dev/vdb1 /data/dnf

resize2fs /dev/vdb1


disable udisk


cat << EOF > ~/wzh/blk.rm.flag.bu
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-blk-rm-flag
storage:
  files:
    - path: /etc/udev/rules.d/62-internal-disk.rules
      mode: 0644
      overwrite: true
      contents:
        inline: |
           KERNEL=="sd[b-c]*",ENV{UDISKS_IGNORE}="1"
EOF

# /etc/udev/rules.d/62-internal-disk.rules
#  KERNEL=="sd[b-c]*",ENV{UDISKS_IGNORE}="1"

butane ~/wzh/blk.rm.flag.bu > ~/wzh/99-zzz-master-blk-rm-flag.yaml

oc create --save-config -f ~/wzh/99-zzz-master-blk-rm-flag.yaml

openshift4.11 acm with hypershift on baremetal

本文介绍,在openshift4.11上,装 ACM 组件以后,然后通过hypershift的方式,来部署一个单worker节点openshift4.11的控制面托管的集群,在部署的过程中,我们模拟离线的网络环境,并且禁止DHCP,只用静态IP。

This document describes how to deploy a control-plane-hosted openshift 4.11 cluster with a single worker node, using hypershift on an ocp 4.11 hub cluster with ACM installed. During the deployment we simulate an offline network environment, disable DHCP, and use only static IPs.

控制面托管(hypershift)模式,之所以诱人,是因为他能够让控制面变成一个namespace,然后托管到中心控制面集群上,这样就能把多个集群的控制面集中到一个中心集群上,能大大提高master节点的计算密度,节约master节点的成本。并且能够把集群master节点的运行维护工作,交给专业团队运维的控制面集群,作为最终用户,只要关心worker节点的运行和维护,而worker节点的运行维护相对来说,是非常简单的。

The control-plane hosting (hypershift) mode is attractive because it turns the control plane into a namespace hosted on a central cluster, so the control planes of many clusters can be concentrated on one central cluster. This greatly increases the computing density of the master nodes and saves their cost. The operation and maintenance of the master nodes can also be handed over to a professionally operated central cluster; as an end user, you only need to care about operating and maintaining the worker nodes, which is comparatively simple.

对比SNO,compact cluster这种master/worker混合部署的方案,hypershift通过剥离控制面业务负载,到中心集群,防止work load对master的不利影响,比如用户部署了一个UPF这种极度消耗CPU的应用,就会无意间影响master,从而让整个集群垮掉。而hypershift就从方案层面,避免了这种情况。而从中心集群的角度来说,他的业务负载种类比较单一,就能刚好的有针对性的优化和运维。

Compared with the combined master/worker deployment mode of SNO and compact clusters, hypershift strips the control-plane workload out and moves it to the central cluster, preventing workloads from adversely affecting the master. For example, if a user deploys an extremely CPU-hungry application such as a UPF, it can inadvertently affect the master and bring the whole cluster down. Hypershift avoids this at the architecture level. From the perspective of the central cluster, its workload is relatively uniform, so it can be optimized and operated in a targeted way.

本次实验,整个流程如下:

  1. 在openshift4上安装ACM组件。
  2. 在ACM上配置cluster, infra env等配置。
  3. MCE通过网络 redfish 协议启动kvm
  4. kvm自动开始集群安装,但是由于kvm+redfish的限制,安装过程中的重启,需要手动停止kvm,配置由硬盘启动,然后再手动启动kvm。
  5. 集群安装完成,保存集群登录信息

In this experiment, the whole process is as follows:

  1. Install the ACM component on openshift4.
  2. Configure cluster, infra env and other configurations on ACM.
  3. MCE starts kvm through network redfish protocol
  4. Kvm automatically starts the cluster installation, but due to the limitation of kvm+redfish, the restart during the installation process requires manually stopping kvm, configuring it to start from the hard disk, and then manually starting kvm.
  5. The cluster installation is complete, save the cluster login information

本次实验的部署架构图:

The deployment architecture diagram of this experiment:

本次实验的网络架构,和服务器, kvm部属架构,是依托之前的一个未完成的实验,工厂模式,虽然工厂模式实验的网络模型比较复杂,但是我们就不重复配置环境了。如果想了解IPI模式如何部署集群,可以参考上述文档。

The network architecture of this experiment, as well as the server and kvm deployment architecture, are based on a previous unfinished experiment, Factory Mode. Although the network model of the factory-mode experiment is fairly complicated, we will not repeat the environment configuration here. If you want to know how to deploy a cluster in IPI mode, you can refer to the above document.

参考资料:

reference:

  • https://cloud.redhat.com/blog/how-to-build-bare-metal-hosted-clusters-on-red-hat-advanced-cluster-management-for-kubernetes
  • https://cloud.redhat.com/blog/a-guide-to-red-hat-hypershift-on-bare-metal

静态变量 / static variable

根据factory的安装过程,我们弄了一个 3 node IPI 模式安装的 openshift, 是一个 ipi 的 compact cluster. 我们把这个集群作为hub集群,里面要装ACM组件。

According to the installation process of the factory, we have installed openshift in 3 node IPI mode, which is an ipi compact cluster. We use this cluster as a hub cluster, and ACM components must be installed in it.

以下的参数,是我们用这个hub集群,通过hypershift创建出来新集群的参数,新集群只有1个worker节点。

The following parameters are the parameters of the new cluster created by using this hub cluster through hypershift. The new cluster has only one worker node.

# on helper

# define some configuration parameters
INSTALL_IMAGE_REGISTRY=quaylab.infra.wzhlab.top:8443
# PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:redhatadmin' | openssl base64 )'","email": "noemail@localhost"}}}'
PULL_SECRET=$(cat /data/pull-secret.json)

ACM_DEMO_CLUSTER=edge01

SNO_BASE_DOMAIN=wzhlab.top
SNO_IP=192.168.12.33
SNO_GW=192.168.12.1
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=edge-worker-01
SNO_IF=enp1s0
SNO_IF_MAC=52:54:00:20:a2:01
SNO_DNS=192.168.77.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

另外,要说明的是,我们发现参考材料里面,对dns的配置不需要那么搞,至少对于单一worker节点来说,apps都指向这个worker节点就可以,api,api-int的域名指向并不重要,因为我们的实验,通过nodeport暴露API server,然后ip地址和端口号被静态的写入了kubelet的配置。

In addition, it should be noted that the dns configuration does not need to be as elaborate as in the reference materials. At least for a single worker node, it is enough for the apps wildcard to point to this worker node; where the api and api-int domain names point does not really matter, because in our experiment the API server is exposed through a nodeport, and that ip address and port number are written statically into the kubelet configuration.

部署ACM / deploy ACM

接下来,我们就部署ACM,我们用最简单的部署模式。

Next, we deploy ACM, we use the simplest deployment mode.

# install operator Advanced Cluster Management for Kubernetes

cat << EOF > ${BASE_DIR}/data/install/acm.subscript.ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: open-cluster-management
EOF
oc create -f ${BASE_DIR}/data/install/acm.subscript.ns.yaml

cat << EOF > ${BASE_DIR}/data/install/acm.subscript.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name:  open-cluster-management-wzh
  namespace: open-cluster-management
spec:
  targetNamespaces:
    - open-cluster-management
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  sourceNamespace: openshift-marketplace
  source: redhat-operators
  channel: release-2.6
  installPlanApproval: Automatic
  name: advanced-cluster-management
EOF

oc create -f ${BASE_DIR}/data/install/acm.subscript.yaml

# RHACM create the MultiClusterHub resource

cat << EOF > ${BASE_DIR}/data/install/acm.mch.mch.yaml
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec: {}
EOF
oc create -f ${BASE_DIR}/data/install/acm.mch.mch.yaml

oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"hypershift-preview","enabled": true}]}}}'

# wait here until you can see the local-cluster
oc get ManagedCluster -A
# NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
# local-cluster   true           https://api.factory.wzhlab.top:6443   True     True        5h22m

cat << EOF > ${BASE_DIR}/data/install/managed-cluster-addon.yaml
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: hypershift-addon
  namespace: local-cluster
spec:
  installNamespace: open-cluster-management-agent-addon
EOF
oc create --save-config -f ${BASE_DIR}/data/install/managed-cluster-addon.yaml
# oc delete -f ${BASE_DIR}/data/install/managed-cluster-addon.yaml

oc get managedclusteraddons -A
# NAMESPACE       NAME                          AVAILABLE   DEGRADED   PROGRESSING
# local-cluster   application-manager           True
# local-cluster   cert-policy-controller        True
# local-cluster   cluster-proxy                 True
# local-cluster   config-policy-controller      True
# local-cluster   governance-policy-framework   True
# local-cluster   hypershift-addon              True
# local-cluster   iam-policy-controller         True
# local-cluster   work-manager                  True


装好了是这样,我们能看到装了2个operator, ACM和MCE

After the installation it looks like this: we can see that two operators are installed, ACM and MCE.

我们可以通过webUI访问ACM:

We can access ACM through webUI:

https://console-openshift-console.apps.factory.wzhlab.top/multicloud/infrastructure/clusters/managed

可以看到,默认有一个local-cluster,类型是hub,这个就是我们这个装了ACM的集群。

As you can see, there is a local-cluster by default, the type is hub, and this is our cluster with ACM installed.

点击进去,就能看到这个cluster的详细信息。

Click into it, you can see the detailed information of this cluster.

以及这个cluster包含的节点。

And the nodes contained in this cluster.

这个集群装的ACM插件。

The ACM addon installed in this cluster.

新版本的ACM还有一个cluster set的概念,用来分类cluster.

The new version of ACM also has a concept of cluster set, which is used to classify clusters.

在ACM概览页面,能看到这个ACM管理的多云环境。

On the ACM overview page, you can see the multi-cloud environment managed by this ACM.

其他的链接,都没有内容,页面是空的。

Other links have no content and the page is empty.

用hypershift模式部署集群 / Deploy the cluster using hypershift

有过部署assisted install service,并通过AIS来部署SNO的经验,那么通过ACM,用hypershift的模式来部署,就容易理解了,整个过程一样,都是配置ACM里面的assisted install service,然后定义infra env,调用BMC API,来直接挂载iso,并启动主机。不同的地方,以前的实验,之后是定义一个 ClusterDeployment, 现在要定义一个 HostedCluster,这个hosted cluster会帮助我们创建 cluster deployment 。

If you have experience deploying the assisted install service and using AIS to deploy SNO, then deploying through ACM in hypershift mode is easy to understand. The overall process is the same: configure the assisted install service in ACM, define an infra env, and call the BMC API to mount the ISO directly and boot the host. The difference is that in the previous experiment we then defined a ClusterDeployment, whereas now we define a HostedCluster; this hosted cluster will create the cluster deployment for us.

setup ACM for agent service

ACM 2.6 UI 是完全支持hypershift的,但是,我们现在的实验,是为了项目上能定制,所以有些配置要用命令行完成。

ACM 2.6 UI fully supports hypershift, but our current experiment is for project customization, so some configurations need to be done using the command line.

本文就是手动创建yaml,然后一步一步的做,更深入的理解一下hypershift的过程。

This article is to manually create yaml, and then do it step by step to understand the process of hypershift more deeply.


oc project open-cluster-management

oc get hiveconfig hive -n multicluster-engine -o yaml
# ......
# spec: {}
# status:
#   aggregatorClientCAHash: b30ffa769079a2ac0e37e40172084089
#   conditions:
#   - lastProbeTime: "2023-01-13T09:10:10Z"
#     lastTransitionTime: "2023-01-13T09:10:10Z"
#     message: Hive is deployed successfully
#     reason: DeploymentSuccess
#     status: "True"
#     type: Ready
#   configApplied: true
#   observedGeneration: 1

oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'

oc get provisioning provisioning-configuration -o yaml
# ......
# spec:
#   preProvisioningOSDownloadURLs: {}
#   provisioningMacAddresses:
#   - 52:54:00:20:a1:01
#   - 52:54:00:20:a1:02
#   - 52:54:00:20:a1:03
#   provisioningNetwork: Disabled
#   provisioningOSDownloadURL: http://192.168.77.11:8080/rhcos-openstack.x86_64.qcow2.gz?sha256=506bb66f8cb407c74061a8201f13e7b1edd44000d944be85eb7a4df7058dcb79
#   watchAllNamespaces: true
# ......

cat << EOF > ${BASE_DIR}/data/install/acm.ocp.release.yaml
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-v4.11.21
  namespace: multicluster-engine
spec:
  releaseImage: ${INSTALL_IMAGE_REGISTRY}/openshift/release-images:4.11.21-x86_64
EOF
oc create -f ${BASE_DIR}/data/install/acm.ocp.release.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.ocp.release.yaml

oc get ClusterImageSet
# NAME                 RELEASE
# openshift-v4.11.21   quaylab.infra.wzhlab.top:8443/openshift/release-images:4.11.21-x86_64

cat << EOF > ${BASE_DIR}/data/install/acm.cm.asc.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: assisted-service-config
  namespace: multicluster-engine
  labels:
    app: assisted-service
data:
  LOG_LEVEL: "debug"
EOF
oc create -f ${BASE_DIR}/data/install/acm.cm.asc.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.cm.asc.yaml

openshift-install version
# openshift-install 4.11.21
# built from commit d3fb15afdbf1558344ea88a1e134c8e9a011440f
# release image quay.io/openshift-release-dev/ocp-release@sha256:860cc37824074671c4cf76e02d224d243e670d2298e6dab8923ee391fbd0ae1c
# release architecture amd64

openshift-install coreos print-stream-json | jq .architectures.x86_64.artifacts.metal.release -r
# 411.86.202210041459-0

VAR_COREOS_VERSION=`openshift-install coreos print-stream-json | jq .architectures.x86_64.artifacts.metal.release -r`

# the config of CA is important here.
# assisted service will not use cluster's CA config
cat << EOF > ${BASE_DIR}/data/install/acm.mirror.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hyper1-mirror-config
  namespace: multicluster-engine
  labels:
    app: assisted-service
data:
  ca-bundle.crt: |
$( cat /etc/crts/infra.wzhlab.top.crt | sed 's/^/    /g' )
  registries.conf: |
    unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-release"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "${INSTALL_IMAGE_REGISTRY}/openshift/release-images"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "${INSTALL_IMAGE_REGISTRY}/openshift/release"

---
EOF
oc create -f ${BASE_DIR}/data/install/acm.mirror.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.mirror.yaml

cat << EOF > ${BASE_DIR}/data/install/acm.agentservicecofnig.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
  namespace: multicluster-engine
  ### This is the annotation that injects modifications in the Assisted Service pod
  annotations:
    unsupported.agent-install.openshift.io/assisted-service-configmap: "assisted-service-config"
###
spec:
  databaseStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 40Gi
  filesystemStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 40Gi
  ### This is a ConfigMap that only will make sense on Disconnected environments
  mirrorRegistryRef:
    name: "hyper1-mirror-config"
  ###
  osImages:
    - openshiftVersion: "4.11"
      version: "$VAR_COREOS_VERSION"
      url: "http://192.168.77.11:8080/rhcos-live.x86_64.iso"
      rootFSUrl: "http://192.168.77.11:8080/rhcos-live-rootfs.x86_64.img"
      cpuArchitecture: x86_64
EOF
oc create -f ${BASE_DIR}/data/install/acm.agentservicecofnig.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.agentservicecofnig.yaml

# oc get pod -n multicluster-engine -o json | jq .items[].metadata.name -r | xargs -I DEMO oc logs -n multicluster-engine --prefix=true DEMO | grep 'failed to add release image '

# wait here to see all the status is True
oc get AgentServiceConfig/agent -n multicluster-engine -o yaml  
# ......
# status:
#   conditions:
#   - lastTransitionTime: "2023-01-13T01:38:25Z"
#     message: AgentServiceConfig reconcile completed without error.
#     reason: ReconcileSucceeded
#     status: "True"
#     type: ReconcileCompleted
#   - lastTransitionTime: "2023-01-13T01:40:25Z"
#     message: All the deployments managed by Infrastructure-operator are healthy.
#     reason: DeploymentSucceeded
#     status: "True"
#     type: DeploymentsHealthy

# stop here, and wait until the assisted-service pods reach Running status
oc get pod -n multicluster-engine | grep assisted
# assisted-image-service-0                               1/1     Running   0               4m38s
# assisted-service-764cd98cf7-2r2db                      2/2     Running   1 (2m59s ago)   4m40s

create the infra env

The concept of an infra env is a bit odd. It represents the configuration shared by a group of similar hosts. What exactly is shared? Mainly the network parameters, the customization of the discovery/boot ISO, and so on; the sketch below shows how to browse the full list of fields.
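
The CRD itself is the quickest reference for everything an InfraEnv can carry. A minimal sketch using standard oc discovery commands (it assumes the agent-install CRDs are already installed, which is the case after the steps above):

oc get crd infraenvs.agent-install.openshift.io
# list the spec fields of the InfraEnv CRD
oc explain infraenv.spec
# drill into a single field, e.g. the selector that links an InfraEnv to NMStateConfig objects
oc explain infraenv.spec.nmStateConfigLabelSelector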


oc create ns ${ACM_DEMO_CLUSTER}
oc project ${ACM_DEMO_CLUSTER}


cat << EOF > ${BASE_DIR}/data/install/acm.managed.secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: assisted-deployment-pull-secret
  namespace: ${ACM_DEMO_CLUSTER}
stringData:
  .dockerconfigjson: '$PULL_SECRET'
EOF
oc create -f ${BASE_DIR}/data/install/acm.managed.secret.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.managed.secret.yaml


cat << EOF > ${BASE_DIR}/data/install/acm.nmsc.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
 name: ${ACM_DEMO_CLUSTER}
 namespace: ${ACM_DEMO_CLUSTER}
 labels:
   nmstate-conf-cluster-name: ${ACM_DEMO_CLUSTER}
spec:
 config:
   interfaces:
     - name: ${SNO_IF}
       type: ethernet
       state: up
       ipv4:
         enabled: true
         address:
           - ip: ${SNO_IP}
             prefix-length: ${SNO_NETMAST_S}
         dhcp: false
   dns-resolver:
     config:
       server:
         - ${SNO_DNS}
   routes:
     config:
       - destination: 0.0.0.0/0
         next-hop-address: ${SNO_GW}
         next-hop-interface: ${SNO_IF}
         table-id: 254
 interfaces:
   - name: "${SNO_IF}" 
     macAddress: ${SNO_IF_MAC}
EOF
oc create -f ${BASE_DIR}/data/install/acm.nmsc.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.nmsc.yaml

oc get NMStateConfig/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER}
# NAME     AGE
# edge01   3h30m


cat << EOF > ${BASE_DIR}/data/install/acm.infraenv.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
  labels:
    agentclusterinstalls.extensions.hive.openshift.io/location: ${ACM_DEMO_CLUSTER}
    networkType: static
spec:
  agentLabels:
    'agentclusterinstalls.extensions.hive.openshift.io/location': ${ACM_DEMO_CLUSTER}
  additionalNTPSources:
    - 192.168.77.11
  # clusterRef:
  #   name: ${ACM_DEMO_CLUSTER}
  #   namespace: ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER}
  sshAuthorizedKey: "$(< ~/.ssh/id_rsa.pub)"
  pullSecretRef:
    name: assisted-deployment-pull-secret
  # ignitionConfigOverride: '${VAR_IGNITION}'
  nmStateConfigLabelSelector:
    matchLabels:
      nmstate-conf-cluster-name: ${ACM_DEMO_CLUSTER}
  # imageType: "full-iso"
EOF
oc create -f ${BASE_DIR}/data/install/acm.infraenv.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.infraenv.yaml

oc get infraenv/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o yaml 
# additionalNTPSources:
#   - 192.168.77.11
# agentLabels:
#   agentclusterinstalls.extensions.hive.openshift.io/location: edge01
# cpuArchitecture: x86_64
# ipxeScriptType: DiscoveryImageAlways
# nmStateConfigLabelSelector:
#   matchLabels:
#     infraenvs.agent-install.openshift.io: edge01
# pullSecretRef:
#   name: pullsecret-edge01
# sshAuthorizedKey: ssh-rsa .....

oc get infraenv/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o json | jq .status
# {
#   "agentLabelSelector": {
#     "matchLabels": {
#       "infraenvs.agent-install.openshift.io": "edge01"
#     }
#   },
#   "bootArtifacts": {
#     "initrd": "https://assisted-image-service-multicluster-engine.apps.factory.wzhlab.top/images/c70485f3-0b12-437f-9efe-85b17f0c627f/pxe-initrd?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJjNzA0ODVmMy0wYjEyLTQzN2YtOWVmZS04NWIxN2YwYzYyN2YifQ.rrkRFxLVcMjEw16W3brxl_YCxHtJtUu-h0KMHcvj3DO701_ZPUM6cDg765Q02CviGSNcSTmu0ic5g06AkU0Zzg&arch=x86_64&version=4.11",
#     "ipxeScript": "https://assisted-service-multicluster-engine.apps.factory.wzhlab.top/api/assisted-install/v2/infra-envs/c70485f3-0b12-437f-9efe-85b17f0c627f/downloads/files?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJjNzA0ODVmMy0wYjEyLTQzN2YtOWVmZS04NWIxN2YwYzYyN2YifQ.3j_oKrmfOVQn85v2S3laLojUKaCTRqgkv_aSBPo-z_7k8-n2swb2m9aNT3uPr3CEstV4UVurkYwShtawFed0Cg&file_name=ipxe-script",
#     "kernel": "https://assisted-image-service-multicluster-engine.apps.factory.wzhlab.top/boot-artifacts/kernel?arch=x86_64&version=4.11",
#     "rootfs": "https://assisted-image-service-multicluster-engine.apps.factory.wzhlab.top/boot-artifacts/rootfs?arch=x86_64&version=4.11"
#   },
#   "conditions": [
#     {
#       "lastTransitionTime": "2023-01-13T03:15:17Z",
#       "message": "Image has been created",
#       "reason": "ImageCreated",
#       "status": "True",
#       "type": "ImageCreated"
#     }
#   ],
#   "createdTime": "2023-01-13T03:15:16Z",
#   "debugInfo": {
#     "eventsURL": "https://assisted-service-multicluster-engine.apps.factory.wzhlab.top/api/assisted-install/v2/events?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJjNzA0ODVmMy0wYjEyLTQzN2YtOWVmZS04NWIxN2YwYzYyN2YifQ.W_KCQgx4SwgbErK6eiyh7EmxPb9L8KKawXLOWPgBoPxVPH79QXq5wb-X5DT48b6qBlk3xk-F7MCT_bEG1f30Ww&infra_env_id=c70485f3-0b12-437f-9efe-85b17f0c627f"
#   },
#   "isoDownloadURL": "https://assisted-image-service-multicluster-engine.apps.factory.wzhlab.top/images/c70485f3-0b12-437f-9efe-85b17f0c627f?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJjNzA0ODVmMy0wYjEyLTQzN2YtOWVmZS04NWIxN2YwYzYyN2YifQ.4FqFWSqfYijmGGWAKopqHIiKghDZBZ2NAqTY1hmUhwNfTzuKlFLZ2pDZAevAxtmf7aN96-6UCeNewIfqoLzPVQ&arch=x86_64&type=minimal-iso&version=4.11"
# }

# VAR_ISO=`oc get infraenv ${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o jsonpath={.status.isoDownloadURL}`

# cd /data/install/
# wget --no-check-certificate -O acm.demo1.iso $VAR_ISO

After defining the infra env, we can see it on the ACM web interface.

The details page of the infra env is not particularly interesting; it just shows some common configuration.

In the host list of the infra env, we can see that no host has been added yet.

add host to infra env

The next thing to do is to add hosts to the infra env. From the web interface there are roughly three ways to do this: manually mount the discovery ISO and let the infra env discover the host automatically; add the host through the web interface by filling in the BMC and other parameters; or import the host by uploading a yaml configuration file.

This article does it from the command line, which corresponds to the last option in the web UI: importing the host by uploading yaml. For reference, a sketch of the manual discovery-ISO route is included below.
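
The manual discovery-ISO route boils down to downloading the ISO referenced in the infra env status and attaching it to the machine yourself. This is only a sketch, mirroring the commented-out commands earlier in this section:

# fetch the discovery ISO url from the infra env status
VAR_ISO=$(oc get infraenv ${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o jsonpath='{.status.isoDownloadURL}')
# the assisted image service is signed by our self-signed CA, so skip certificate verification
curl -kL -o /data/install/acm.demo1.iso "$VAR_ISO"
# then attach /data/install/acm.demo1.iso to the target machine (virt-manager, BMC virtual media, ...) and boot from it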

# let's confirm that the metal3 components are ready,
# so that we can use ocp to manage the baremetal hosts
oc get pod -A | grep metal3
# openshift-machine-api                              metal3-8666f4cf4d-2bkfb                                           5/5     Running     5               12h
# openshift-machine-api                              metal3-image-cache-8jhtr                                          1/1     Running     1               13h
# openshift-machine-api                              metal3-image-cache-9jfs7                                          1/1     Running     1               13h
# openshift-machine-api                              metal3-image-cache-fl545                                          1/1     Running     1               13h
# openshift-machine-api                              metal3-image-customization-868d87999b-x2mnw                       1/1     Running     1               13h


cat << EOF > ${BASE_DIR}/data/install/acm.demo.secret.bmc.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ${ACM_DEMO_CLUSTER}-bmc-master-01
  namespace: ${ACM_DEMO_CLUSTER}
data:
  password: $(echo password | base64)
  username: $(echo admin | base64)
type: Opaque
EOF
oc create -f ${BASE_DIR}/data/install/acm.demo.secret.bmc.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.demo.secret.bmc.yaml

cat << EOF > ${BASE_DIR}/data/install/acm.demo.bmh.master.yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${ACM_DEMO_CLUSTER}-${SNO_HOSTNAME}
  namespace: ${ACM_DEMO_CLUSTER}
  labels:
    infraenvs.agent-install.openshift.io: "${ACM_DEMO_CLUSTER}"
  annotations:
    ## Disable the Introspection
    inspect.metal3.io: disabled
    ## Set Static Hostname
    bmac.agent-install.openshift.io/hostname: "${SNO_HOSTNAME}"
    ## Set Static Role, auto-assign?
    bmac.agent-install.openshift.io/role: "worker"
spec:
  online: true
  bmc:
    address: redfish-virtualmedia://192.168.77.101:8000/redfish/v1/Systems/$(cat /data/install/vm.list.* | grep ocp4-ipi-edge-master-01 | awk '{print $1}')
    credentialsName: ${ACM_DEMO_CLUSTER}-bmc-master-01
    disableCertificateVerification: true
  bootMACAddress: $(cat /data/install/mac.list.* | grep ocp4-ipi-edge-master-01 | awk '{print $2}')
  automatedCleaningMode: disabled
EOF
oc create -f ${BASE_DIR}/data/install/acm.demo.bmh.master.yaml
# oc delete -f ${BASE_DIR}/data/install/acm.demo.bmh.master.yaml

oc get BareMetalHost/${ACM_DEMO_CLUSTER}-${SNO_HOSTNAME} -n ${ACM_DEMO_CLUSTER} -o yaml
# ......
# metadata:
#   annotations:
#     bmac.agent-install.openshift.io/hostname: edge-worker-01
#     bmac.agent-install.openshift.io/role: worker
#     inspect.metal3.io: disabled
#   creationTimestamp: "2023-01-18T15:08:22Z"
#   finalizers:
#   - baremetalhost.metal3.io
#   generation: 2
#   labels:
#     infraenvs.agent-install.openshift.io: edge01
#   name: edge01-edge-worker-01
#   namespace: edge01
#   resourceVersion: "111945"
#   uid: b21c5b31-c28c-4b43-b8c1-a0ba80581e60
# spec:
#   automatedCleaningMode: disabled
#   bmc:
#     address: redfish-virtualmedia://192.168.77.101:8000/redfish/v1/Systems/a176e428-fea7-43ff-95c7-a927514227ed
#     credentialsName: edge01-bmc-master-01
#     disableCertificateVerification: true
#   bootMACAddress: 52:54:00:20:a2:01
#   image:
#     format: live-iso
#     url: https://assisted-image-service-multicluster-engine.apps.factory.wzhlab.top/images/2e9fa857-17c6-493f-8030-b4cb2b736fd1?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiIyZTlmYTg1Ny0xN2M2LTQ5M2YtODAzMC1iNGNiMmI3MzZmZDEifQ.yF_UwtoDKWdc6dYUkcYpNDOWzLt_jVS1ZSqU-SLzZq4QZwt6v7x5Hl8azM3S9THX0xi0K-ert3gqVLbNV62s9Q&arch=x86_64&type=minimal-iso&version=4.11
#   online: true
# ......

Once the configuration is applied, the host shows up on the web interface. In fact the baremetal host is also visible in the OpenShift console, and we can see that the system is trying to provision it.

Under the hood, the target KVM guest boots a customized CoreOS live CD; once it is up, it runs a discovery service that collects the machine's information and reports it back. If all of that goes well, the host information shown on the web interface is updated; the same can be followed from the CLI, as sketched below.
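
The inventory collected by the discovery service ends up in the agent CR's status, so the registration can also be followed from the command line. A small sketch:

# watch for the agent CR to appear once the live cd phones home
oc get agent -n ${ACM_DEMO_CLUSTER} -w
# after it shows up, take a look at the reported hardware inventory
oc get agent -n ${ACM_DEMO_CLUSTER} -o json | jq '.items[].status.inventory | {hostname, interfaces: [.interfaces[].name]}'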

Each host shown here is backed by an agent CR, and we can inspect the agent's details from the command line.


oc get agent -n ${ACM_DEMO_CLUSTER}
# NAME                                   CLUSTER   APPROVED   ROLE     STAGE
# a176e428-fea7-43ff-95c7-a927514227ed             true       worker

oc get agent/a176e428-fea7-43ff-95c7-a927514227ed -n ${ACM_DEMO_CLUSTER} -o yaml 
# ......
# metadata:
#   annotations:
#     inventory.agent-install.openshift.io/version: "0.1"
#   creationTimestamp: "2023-01-18T15:11:47Z"
#   finalizers:
#   - agent.agent-install.openshift.io/ai-deprovision
#   generation: 2
#   labels:
#     agent-install.openshift.io/bmh: edge01-edge-worker-01
#     agent-install.openshift.io/clusterdeployment-namespace: ""
#     agentclusterinstalls.extensions.hive.openshift.io/location: edge01
#     infraenvs.agent-install.openshift.io: edge01
#     inventory.agent-install.openshift.io/cpu-architecture: x86_64
#     inventory.agent-install.openshift.io/cpu-virtenabled: "false"
#     inventory.agent-install.openshift.io/host-isvirtual: "true"
#     inventory.agent-install.openshift.io/host-manufacturer: RedHat
#     inventory.agent-install.openshift.io/host-productname: KVM
#     inventory.agent-install.openshift.io/storage-hasnonrotationaldisk: "false"
#   name: a176e428-fea7-43ff-95c7-a927514227ed
#   namespace: edge01
#   resourceVersion: "117085"
#   uid: c410d01b-1bdb-4ade-b5e6-630aadf634b3
# spec:
#   approved: true
#   hostname: edge-worker-01
#   role: worker
# ......

begin to create new cluster - control plane

With everything in place, we can start creating a new cluster whose control plane is hosted and managed by hypershift.


cat << EOF > ${BASE_DIR}/data/install/capi-role-${ACM_DEMO_CLUSTER}.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: capi-provider-role
  namespace: ${ACM_DEMO_CLUSTER}
rules:
- apiGroups:
  - agent-install.openshift.io
  resources:
  - agents
  verbs:
  - '*'
EOF
oc create --save-config -f ${BASE_DIR}/data/install/capi-role-${ACM_DEMO_CLUSTER}.yaml

# nodepool -> config -> config map -> machine config
# we have a container image cache, so we add the customized config through a machine config
cat << EOF > ${BASE_DIR}/data/install/hyper.mirror.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hyper-mirror-config
  namespace: ${ACM_DEMO_CLUSTER}
data:
  config: |
$( cat /data/ocp4/99-worker-container-registries.yaml | sed 's/^/    /g' )

---
EOF
oc create -f ${BASE_DIR}/data/install/hyper.mirror.yaml
# oc delete -f ${BASE_DIR}/data/install/hyper.mirror.yaml


cat << EOF > ${BASE_DIR}/data/sno/install.images.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-zzz-master-install-images
storage:
  files:
    - path: /etc/containers/registries.conf.d/base.registries.conf
      overwrite: true
      contents:
        inline: |
          unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
          short-name-mode = ""

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-release"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${INSTALL_IMAGE_REGISTRY}/openshift/release-images"

          [[registry]]
            prefix = ""
            location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
            mirror-by-digest-only = true

            [[registry.mirror]]
              location = "${INSTALL_IMAGE_REGISTRY}/openshift/release"

EOF
butane ${BASE_DIR}/data/sno/install.images.bu > ${BASE_DIR}/data/sno/disconnected/99-zzz-master-install-images.yaml

cat << EOF > ${BASE_DIR}/data/install/hyper.mirror.main.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hyper-mirror-main-config
  namespace: ${ACM_DEMO_CLUSTER}
data:
  config: |
$( cat ${BASE_DIR}/data/sno/disconnected/99-zzz-master-install-images.yaml | sed 's/^/    /g' )

---
EOF
oc create -f ${BASE_DIR}/data/install/hyper.mirror.main.yaml



cat << EOF > ${BASE_DIR}/data/install/hosted-cluster-${ACM_DEMO_CLUSTER}.yaml 
---
apiVersion: hypershift.openshift.io/v1alpha1
kind: HostedCluster
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
  labels:
    "cluster.open-cluster-management.io/clusterset": 'default'
spec:
  release:
    image: ${INSTALL_IMAGE_REGISTRY}/openshift/release-images:4.11.21-x86_64
  pullSecret:
    name: pullsecret-cluster-${ACM_DEMO_CLUSTER}
  sshKey:
    name: sshkey-cluster-${ACM_DEMO_CLUSTER}
  networking:
    podCIDR: 10.132.0.0/14
    serviceCIDR: 172.31.0.0/16
    machineCIDR: 192.168.12.0/24
    networkType: OpenShiftSDN
  platform:
    type: Agent
    agent:
      agentNamespace: ${ACM_DEMO_CLUSTER}
  infraID: ${ACM_DEMO_CLUSTER}
  dns:
    baseDomain: '$SNO_BASE_DOMAIN'
  services:
  - service: APIServer
    servicePublishingStrategy:
        nodePort:
          address: 192.168.12.23
          port: 30000
        type: NodePort
  - service: OAuthServer
    servicePublishingStrategy:
      type: Route
  - service: OIDC
    servicePublishingStrategy:
      type: Route
  - service: Konnectivity
    servicePublishingStrategy:
      type: Route
  - service: Ignition
    servicePublishingStrategy:
      type: Route
---
apiVersion: v1
kind: Secret
metadata:
  name: pullsecret-cluster-${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
stringData:
  '.dockerconfigjson': '$PULL_SECRET'
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Secret
metadata:
  name: sshkey-cluster-${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
stringData:
  id_rsa.pub: '$(< ~/.ssh/id_rsa.pub)'
---
apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
  name: 'nodepool-${ACM_DEMO_CLUSTER}-01'
  namespace: ${ACM_DEMO_CLUSTER}
spec:
  clusterName: ${ACM_DEMO_CLUSTER}
  config:
    - name: hyper-mirror-config
    - name: hyper-mirror-main-config
  replicas: 1
  management:
    autoRepair: false
    upgradeType: InPlace
  platform:
    type: Agent
    agent:
      agentLabelSelector:
        matchLabels: {}
  release:
    image: ${INSTALL_IMAGE_REGISTRY}/openshift/release-images:4.11.21-x86_64
---
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  labels:
    cloud: hypershift
    name: ${ACM_DEMO_CLUSTER}
    cluster.open-cluster-management.io/clusterset: 'default'
  name: ${ACM_DEMO_CLUSTER}
spec:
  hubAcceptsClient: true
---
apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
  name: ${ACM_DEMO_CLUSTER}
  namespace: ${ACM_DEMO_CLUSTER}
spec:
  clusterName: ${ACM_DEMO_CLUSTER}
  clusterNamespace: ${ACM_DEMO_CLUSTER}
  clusterLabels:
    cloud: ai-hypershift
  applicationManager:
    enabled: true
  policyController:
    enabled: true
  searchCollector:
    enabled: true
  certPolicyController:
    enabled: true
  iamPolicyController:
    enabled: true
EOF

oc create --save-config -f ${BASE_DIR}/data/install/hosted-cluster-${ACM_DEMO_CLUSTER}.yaml 
# oc delete -f ${BASE_DIR}/data/install/hosted-cluster-${ACM_DEMO_CLUSTER}.yaml 

oc get HostedCluster -A
# NAMESPACE   NAME     VERSION   KUBECONFIG                PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
# edge01      edge01             edge01-admin-kubeconfig   Partial    True        False         The hosted control plane is available

oc get HostedCluster/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} -o yaml | yq .spec
# autoscaling: {}
# clusterID: 8c0fb18c-22dd-4fb9-a2a6-420ee19d9f8a
# controllerAvailabilityPolicy: SingleReplica
# dns:
#   baseDomain: wzhlab.top
# etcd:
#   managed:
#     storage:
#       persistentVolume:
#         size: 4Gi
#       type: PersistentVolume
#   managementType: Managed
# fips: false
# infraID: edge01
# infrastructureAvailabilityPolicy: SingleReplica
# issuerURL: https://kubernetes.default.svc
# networking:
#   clusterNetwork:
#     - cidr: 10.132.0.0/14
#   machineNetwork:
#     - cidr: 192.168.12.0/24
#   networkType: OVNKubernetes
#   serviceNetwork:
#     - cidr: 172.31.0.0/16
# olmCatalogPlacement: management
# platform:
#   agent:
#     agentNamespace: edge01
#   type: Agent
# pullSecret:
#   name: pullsecret-cluster-edge01
# release:
#   image: quaylab.infra.wzhlab.top:8443/openshift/release-images:4.11.21-x86_64
# services:
#   - service: APIServer
#     servicePublishingStrategy:
#       nodePort:
#         address: 192.168.12.23
#         port: 30000
#       type: NodePort
#   - service: OAuthServer
#     servicePublishingStrategy:
#       type: Route
#   - service: OIDC
#     servicePublishingStrategy:
#       type: Route
#   - service: Konnectivity
#     servicePublishingStrategy:
#       type: Route
#   - service: Ignition
#     servicePublishingStrategy:
#       type: Route
# sshKey:
#   name: sshkey-cluster-edge01

oc get clusterdeployment -A
# NAMESPACE       NAME     INFRAID                                PLATFORM          REGION   VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE   AGE
# edge01-edge01   edge01   39d863f0-57f8-4ff4-a2b5-61e3e654c4db   agent-baremetal            4.11.21                 Provisioned                    122m

oc get clusterdeployment/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER} -o yaml | yq .spec
# baseDomain: wzhlab.top
# clusterInstallRef:
#   group: extensions.hive.openshift.io
#   kind: AgentClusterInstall
#   name: edge01
#   version: v1beta1
# clusterMetadata:
#   adminKubeconfigSecretRef:
#     name: admin-kubeconfig
#   clusterID: 28c54029-b032-4c48-8486-deb1dabe8ea8
#   infraID: 28c54029-b032-4c48-8486-deb1dabe8ea8
# clusterName: edge01
# controlPlaneConfig:
#   servingCertificates: {}
# installed: true
# platform:
#   agentBareMetal:
#     agentSelector: {}
# pullSecretRef:
#   name: pull-secret

oc get AgentClusterInstall -A
# NAMESPACE       NAME     CLUSTER   STATE
# edge01-edge01   edge01   edge01    adding-hosts

oc get AgentClusterInstall/${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER} -o yaml | yq .spec
# clusterDeploymentRef:
#   name: edge01
# ignitionEndpoint:
#   caCertificateReference:
#     name: ignition-server-ca-cert
#     namespace: edge01-edge01
#   url: https://ignition-server-edge01-edge01.apps.factory.wzhlab.top
# networking:
#   userManagedNetworking: true
# provisionRequirements:
#   controlPlaneAgents: 3

oc get agent -n ${ACM_DEMO_CLUSTER}
# NAME                                   CLUSTER   APPROVED   ROLE     STAGE
# a176e428-fea7-43ff-95c7-a927514227ed             true       worker

oc get agent/a176e428-fea7-43ff-95c7-a927514227ed -n ${ACM_DEMO_CLUSTER} -o yaml | yq .spec
# approved: true
# clusterDeploymentName:
#   name: edge01
#   namespace: edge01-edge01
# hostname: edge-worker-01
# ignitionEndpointTokenReference:
#   name: agent-user-data-nodepool-edge01-01-e3fdfbf8
#   namespace: edge01-edge01
# machineConfigPool: ignition
# role: worker


# wait here, and check the control plane creation.
oc get pod -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER}
# NAME                                             READY   STATUS    RESTARTS   AGE
# capi-provider-87b88465c-zgrx2                    1/1     Running   0          10m
# catalog-operator-7dcf86576f-vffl6                2/2     Running   0          7m33s
# certified-operators-catalog-7b4bdcb679-25gls     1/1     Running   0          7m39s
# cluster-api-5984dc678b-46ms7                     1/1     Running   0          10m
# cluster-autoscaler-5cd6b96d55-nzw4x              1/1     Running   0          9m33s
# cluster-network-operator-547f6988f4-6q2f2        1/1     Running   0          7m49s
# cluster-policy-controller-857bf8594f-9dhhj       1/1     Running   0          7m56s
# cluster-version-operator-85f5fd968f-rhchm        1/1     Running   0          7m55s
# community-operators-catalog-f6d797bc-87f9k       1/1     Running   0          7m38s
# control-plane-operator-65444fdff8-fzhvb          1/1     Running   0          10m
# etcd-0                                           1/1     Running   0          9m36s
# hosted-cluster-config-operator-cb8bd76f7-wvtfl   1/1     Running   0          7m41s
# ignition-server-57fbf98b8b-wvkv2                 1/1     Running   0          9m26s
# ingress-operator-594bdd5d6d-2t6kw                2/2     Running   0          7m46s
# konnectivity-agent-67bd878b88-bwxcp              1/1     Running   0          9m35s
# konnectivity-server-764ffdb8fd-xgxqq             1/1     Running   0          9m36s
# kube-apiserver-7f85bd5d7f-cvd7r                  3/3     Running   0          9m34s
# kube-controller-manager-7bd7ff884f-2c4jr         1/1     Running   0          6m35s
# kube-scheduler-68858b678d-jlpmx                  1/1     Running   0          8m30s
# machine-approver-c6b6f6ff8-jh445                 1/1     Running   0          9m33s
# oauth-openshift-5bb59d5596-55mtw                 2/2     Running   0          6m15s
# olm-operator-949f6f76b-r8kkz                     2/2     Running   0          7m32s
# openshift-apiserver-5ddbbd9847-n2824             2/2     Running   0          6m35s
# openshift-controller-manager-7cdd5bcc7b-p7kfb    1/1     Running   0          7m56s
# openshift-oauth-apiserver-8c76cb9b9-t9nts        1/1     Running   0          7m58s
# packageserver-58d5b997b9-wdn58                   2/2     Running   0          7m32s
# redhat-marketplace-catalog-85748dc79-tl8sr       1/1     Running   0          7m38s
# redhat-operators-catalog-74849cb9d6-9bg49        1/1     Running   0          7m38s


oc get pod -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER} | tail -n +2 | wc -l
# 28

After the configuration is applied, we can see an additional cluster, edge01, whose type is hosted.

The installation takes a little while; during this period we can watch the cluster status and the nodepool status change. The same progress can be followed from the CLI, as sketched below.
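
A few read-only checks on the hub are enough to follow the progress from the command line (just a sketch):

oc get hostedcluster -n ${ACM_DEMO_CLUSTER}
oc get nodepool -n ${ACM_DEMO_CLUSTER}
# the agent moves through the install stages while the node is being provisioned
oc get agent -n ${ACM_DEMO_CLUSTER} -w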

We can also see that the hub cluster now has an edge01-edge01 namespace containing the pods of the hosted control plane, including the familiar etcd and kube-apiserver.

import the hosted cluster

After a while the new cluster is installed successfully, but the page indicates that it still has to be imported manually. We copy the commands from the page and run them on the helper node; they log in to the hosted control plane and then create a few CRDs/CRs in it.

# on helper

# copy/paste the 1st command
oc login https://192.168.12.23:30000 -u kubeadmin -p z2I9i-BZF8L-sYvUC-47c7x

# copy/paste the 2nd command
# it is too long, so most of it is omitted here
echo "Ci0tLQphc............" | base64 -d | oc create -f - || test $? -eq 0 && sleep 2 && echo "Ci0tLQphcGlWZ............" | base64 -d | oc apply -f - || echo "VGhlIGNsdXN..............." | base64 -d
# namespace/open-cluster-management-agent created
# serviceaccount/klusterlet created
# clusterrole.rbac.authorization.k8s.io/klusterlet created
# clusterrole.rbac.authorization.k8s.io/open-cluster-management:klusterlet-admin-aggregate-clusterrole created
# clusterrolebinding.rbac.authorization.k8s.io/klusterlet created
# Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "klusterlet" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "klusterlet" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "klusterlet" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "klusterlet" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
# deployment.apps/klusterlet created
# secret/bootstrap-hub-kubeconfig created
# klusterlet.operator.open-cluster-management.io/klusterlet created

# let's decode the first 2 base64 payloads; the 3rd one is just a message.

We are curious what exactly is being imported, so let's decode it. The first yaml imported into the hosted control plane is a CRD.

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: klusterlets.operator.open-cluster-management.io
spec:
  conversion:
    strategy: None
  group: operator.open-cluster-management.io
  names:
    kind: Klusterlet
    listKind: KlusterletList
    plural: klusterlets
    singular: klusterlet
  scope: Cluster
  preserveUnknownFields: false
  versions:
    - name: v1
      schema:
        openAPIV3Schema:
          description: Klusterlet represents controllers to install the resources for a managed cluster. When configured, the Klusterlet requires a secret named bootstrap-hub-kubeconfig in the agent namespace to allow API requests to the hub for the registration protocol. In Hosted mode, the Klusterlet requires an additional secret named external-managed-kubeconfig in the agent namespace to allow API requests to the managed cluster for resources installation.
          type: object
          properties:
            apiVersion:
              description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
              type: string
            kind:
              description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
              type: string
            metadata:
              type: object
            spec:
              description: Spec represents the desired deployment configuration of Klusterlet agent.
              type: object
              properties:
                clusterName:
                  description: ClusterName is the name of the managed cluster to be created on hub. The Klusterlet agent generates a random name if it is not set, or discovers the appropriate cluster name on OpenShift.
                  type: string
                deployOption:
                  description: DeployOption contains the options of deploying a klusterlet
                  type: object
                  properties:
                    mode:
                      description: 'Mode can be Default or Hosted. It is Default mode if not specified In Default mode, all klusterlet related resources are deployed on the managed cluster. In Hosted mode, only crd and configurations are installed on the spoke/managed cluster. Controllers run in another cluster (defined as management-cluster) and connect to the mangaged cluster with the kubeconfig in secret of "external-managed-kubeconfig"(a kubeconfig of managed-cluster with cluster-admin permission). Note: Do not modify the Mode field once it''s applied.'
                      type: string
                externalServerURLs:
                  description: ExternalServerURLs represents the a list of apiserver urls and ca bundles that is accessible externally If it is set empty, managed cluster has no externally accessible url that hub cluster can visit.
                  type: array
                  items:
                    description: ServerURL represents the apiserver url and ca bundle that is accessible externally
                    type: object
                    properties:
                      caBundle:
                        description: CABundle is the ca bundle to connect to apiserver of the managed cluster. System certs are used if it is not set.
                        type: string
                        format: byte
                      url:
                        description: URL is the url of apiserver endpoint of the managed cluster.
                        type: string
                namespace:
                  description: 'Namespace is the namespace to deploy the agent. The namespace must have a prefix of "open-cluster-management-", and if it is not set, the namespace of "open-cluster-management-agent" is used to deploy agent. Note: in Detach mode, this field will be **ignored**, the agent will be deployed to the namespace with the same name as klusterlet.'
                  type: string
                nodePlacement:
                  description: NodePlacement enables explicit control over the scheduling of the deployed pods.
                  type: object
                  properties:
                    nodeSelector:
                      description: NodeSelector defines which Nodes the Pods are scheduled on. The default is an empty list.
                      type: object
                      additionalProperties:
                        type: string
                    tolerations:
                      description: Tolerations is attached by pods to tolerate any taint that matches the triple <key,value,effect> using the matching operator <operator>. The default is an empty list.
                      type: array
                      items:
                        description: The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
                        type: object
                        properties:
                          effect:
                            description: Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
                            type: string
                          key:
                            description: Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys.
                            type: string
                          operator:
                            description: Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category.
                            type: string
                          tolerationSeconds:
                            description: TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system.
                            type: integer
                            format: int64
                          value:
                            description: Value is the taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise just a regular string.
                            type: string
                registrationImagePullSpec:
                  description: RegistrationImagePullSpec represents the desired image configuration of registration agent. quay.io/open-cluster-management.io/registration:latest will be used if unspecified.
                  type: string
                workImagePullSpec:
                  description: WorkImagePullSpec represents the desired image configuration of work agent. quay.io/open-cluster-management.io/work:latest will be used if unspecified.
                  type: string
            status:
              description: Status represents the current status of Klusterlet agent.
              type: object
              properties:
                conditions:
                  description: 'Conditions contain the different condition statuses for this Klusterlet. Valid condition types are: Applied: Components have been applied in the managed cluster. Available: Components in the managed cluster are available and ready to serve. Progressing: Components in the managed cluster are in a transitioning state. Degraded: Components in the managed cluster do not match the desired configuration and only provide degraded service.'
                  type: array
                  items:
                    description: "Condition contains details for one aspect of the current state of this API Resource. --- This struct is intended for direct use as an array at the field path .status.conditions.  For example, type FooStatus struct{     // Represents the observations of a foo's current state.     // Known .status.conditions.type are: \"Available\", \"Progressing\", and \"Degraded\"     // +patchMergeKey=type     // +patchStrategy=merge     // +listType=map     // +listMapKey=type     Conditions []metav1.Condition `json:\"conditions,omitempty\" patchStrategy:\"merge\" patchMergeKey:\"type\" protobuf:\"bytes,1,rep,name=conditions\"` \n     // other fields }"
                    type: object
                    required:
                      - lastTransitionTime
                      - message
                      - reason
                      - status
                      - type
                    properties:
                      lastTransitionTime:
                        description: lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed.  If that is not known, then using the time when the API field changed is acceptable.
                        type: string
                        format: date-time
                      message:
                        description: message is a human readable message indicating details about the transition. This may be an empty string.
                        type: string
                        maxLength: 32768
                      observedGeneration:
                        description: observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
                        type: integer
                        format: int64
                        minimum: 0
                      reason:
                        description: reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
                        type: string
                        maxLength: 1024
                        minLength: 1
                        pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      status:
                        description: status of the condition, one of True, False, Unknown.
                        type: string
                        enum:
                          - "True"
                          - "False"
                          - Unknown
                      type:
                        description: type of condition in CamelCase or in foo.example.com/CamelCase. --- Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be useful (see .node.status.conditions), the ability to deconflict is important. The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
                        type: string
                        maxLength: 316
                        pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                generations:
                  description: Generations are used to determine when an item needs to be reconciled or has changed in a way that needs a reaction.
                  type: array
                  items:
                    description: GenerationStatus keeps track of the generation for a given resource so that decisions about forced updates can be made. The definition matches the GenerationStatus defined in github.com/openshift/api/v1
                    type: object
                    properties:
                      group:
                        description: group is the group of the resource that you're tracking
                        type: string
                      lastGeneration:
                        description: lastGeneration is the last generation of the resource that controller applies
                        type: integer
                        format: int64
                      name:
                        description: name is the name of the resource that you're tracking
                        type: string
                      namespace:
                        description: namespace is where the resource that you're tracking is
                        type: string
                      resource:
                        description: resource is the resource type of the resource that you're tracking
                        type: string
                      version:
                        description: version is the version of the resource that you're tracking
                        type: string
                observedGeneration:
                  description: ObservedGeneration is the last generation change you've dealt with
                  type: integer
                  format: int64
                relatedResources:
                  description: RelatedResources are used to track the resources that are related to this Klusterlet.
                  type: array
                  items:
                    description: RelatedResourceMeta represents the resource that is managed by an operator
                    type: object
                    properties:
                      group:
                        description: group is the group of the resource that you're tracking
                        type: string
                      name:
                        description: name is the name of the resource that you're tracking
                        type: string
                      namespace:
                        description: namespace is where the thing you're tracking is
                        type: string
                      resource:
                        description: resource is the resource type of the resource that you're tracking
                        type: string
                      version:
                        description: version is the version of the thing you're tracking
                        type: string
      served: true
      storage: true
      subresources:
        status: {}
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []

The second yaml looks like this: it creates a new namespace and then deploys the klusterlet operator and its configuration. What exactly the klusterlet does, the author cannot explain in detail for now; a quick way to verify the result on the hosted cluster is sketched after the yaml.


---
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    workload.openshift.io/allowed: "management"
  name: "open-cluster-management-agent"

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: klusterlet
  namespace: "open-cluster-management-agent"

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: klusterlet
rules:
- apiGroups: [""]
  resources: ["secrets", "configmaps", "serviceaccounts"]
  verbs: ["create", "get", "list", "update", "watch", "patch", "delete"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create", "get", "list", "update", "watch", "patch"]
- apiGroups: ["authorization.k8s.io"]
  resources: ["subjectaccessreviews"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["create", "get", "list", "watch","delete"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["", "events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "get", "list", "update", "watch", "patch", "delete"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterrolebindings", "rolebindings"]
  verbs: ["create", "get", "list", "update", "watch", "patch", "delete"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles", "roles"]
  verbs: ["create", "get", "list", "update", "watch", "patch", "delete", "escalate", "bind"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["create", "get", "list", "update", "watch", "patch", "delete"]
- apiGroups: ["operator.open-cluster-management.io"]
  resources: ["klusterlets"]
  verbs: ["get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["operator.open-cluster-management.io"]
  resources: ["klusterlets/status"]
  verbs: ["update", "patch"]
- apiGroups: ["work.open-cluster-management.io"]
  resources: ["appliedmanifestworks"]
  verbs: ["list", "update", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: open-cluster-management:klusterlet-admin-aggregate-clusterrole
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
- apiGroups: ["operator.open-cluster-management.io"]
  resources: ["klusterlets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: klusterlet
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: klusterlet
subjects:
- kind: ServiceAccount
  name: klusterlet
  namespace: "open-cluster-management-agent"

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: klusterlet
  namespace: "open-cluster-management-agent"
  labels:
    app: klusterlet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: klusterlet
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
      labels:
        app: klusterlet
    spec:
      serviceAccountName: klusterlet
      tolerations:
      - key: "node-role.kubernetes.io/infra"
        value: ""
        effect: "NoSchedule"
        operator: "Exists"
      containers:
      - name: klusterlet
        image: registry.redhat.io/multicluster-engine/registration-operator-rhel8@sha256:183dc28f1991ad2aa2fcb987d217fc63863909497ae9291b14a96079640463d3
        imagePullPolicy: IfNotPresent
        args:
          - "/registration-operator"
          - "klusterlet"
          - "--disable-leader-election"
        livenessProbe:
          httpGet:
            path: /healthz
            scheme: HTTPS
            port: 8443
          initialDelaySeconds: 2
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /healthz
            scheme: HTTPS
            port: 8443
          initialDelaySeconds: 2

---
apiVersion: v1
kind: Secret
metadata:
  name: "bootstrap-hub-kubeconfig"

  namespace: "open-cluster-management-agent"

type: Opaque
data:
  kubeconfig: "YXBpVmVyc2............"

---
apiVersion: operator.open-cluster-management.io/v1
kind: Klusterlet
metadata:
  name: klusterlet
spec:
  deployOption:
    mode: Default
  registrationImagePullSpec: "registry.redhat.io/multicluster-engine/registration-rhel8@sha256:52efbbbd9deef8517ea2c96b1d4756c154ebf342a6331603c6942cf0a64ee133"
  workImagePullSpec: "registry.redhat.io/multicluster-engine/work-rhel8@sha256:3e1a592361dc8176dae1eb5d2bc82bd3aabb6e370add47ae84325ddeb00d661c"
  clusterName: "edge01"
  namespace: "open-cluster-management-agent"
  nodePlacement:
    tolerations:
    - key: "node-role.kubernetes.io/infra"
      value: ""
      effect: "NoSchedule"
      operator: "Exists"

After importing the configuration, we can see that the cluster was imported successfully.

The cluster set page also shows everything as healthy.

The cluster details page shows everything as healthy as well.

The host page of the new cluster now lists a new worker node.

The add-on page in the cluster details also shows everything as healthy.

Let's log in to the management console of the newly installed edge01 cluster and have a look.

The new edge01 cluster cannot be upgraded from within; the console points out that this is a special, hosted cluster. How an upgrade would actually be driven from the hub is sketched below.
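
On a hosted cluster the desired version is driven from the hub, not from the in-cluster settings: the hypershift operator reconciles the control plane to the release image referenced by the HostedCluster, and the workers to the release image of the NodePool. As a sketch only (the release image value is a placeholder, and in this disconnected lab the target release would first have to be mirrored):

# control-plane version comes from the HostedCluster spec on the hub
oc patch hostedcluster ${ACM_DEMO_CLUSTER} -n ${ACM_DEMO_CLUSTER} --type merge \
  -p '{"spec":{"release":{"image":"<new-release-image>"}}}'
# worker version comes from the NodePool spec on the hub
oc patch nodepool nodepool-${ACM_DEMO_CLUSTER}-01 -n ${ACM_DEMO_CLUSTER} --type merge \
  -p '{"spec":{"release":{"image":"<new-release-image>"}}}'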

Recall that in the ACM console, edge01 shows up as a hosted cluster.

Let's take a brief look at the resource consumption of this hosted control plane; roughly the same numbers the console shows can be pulled from the CLI, as sketched below.
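
A sketch for the CLI view (it assumes cluster metrics are available to oc adm top):

# per-pod cpu/memory of the hosted control plane, which lives in the edge01-edge01 namespace
oc adm top pod -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER}
# plus the persistent volumes it claims (etcd storage, etc.)
oc get pvc -n ${ACM_DEMO_CLUSTER}-${ACM_DEMO_CLUSTER}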

Let's also take a look at which pods make up this control plane.

cli login into the hosted cluster

Next, we log in to the new edge01 cluster from the command line to see what is special about it there.

oc extract -n ${ACM_DEMO_CLUSTER} secret/${ACM_DEMO_CLUSTER}-admin-kubeconfig --to=- > ${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER}

# approve the worker node's CSRs if the node fails to join
# under normal circumstances, this is not needed.
oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get csr | grep -v Approved
oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} adm certificate approve


oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get co
# NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# console                                    4.11.21   True        False         False      6h22m
# csi-snapshot-controller                    4.11.21   True        False         False      6h24m
# dns                                        4.11.21   True        False         False      6h23m
# image-registry                             4.11.21   True        False         False      6h23m
# ingress                                    4.11.21   True        False         False      6h39m
# insights                                   4.11.21   True        False         False      6h25m
# kube-apiserver                             4.11.21   True        False         False      6h40m
# kube-controller-manager                    4.11.21   True        False         False      6h40m
# kube-scheduler                             4.11.21   True        False         False      6h40m
# kube-storage-version-migrator              4.11.21   True        False         False      6h24m
# monitoring                                 4.11.21   True        False         False      6h20m
# network                                    4.11.21   True        False         False      6h24m
# openshift-apiserver                        4.11.21   True        False         False      6h40m
# openshift-controller-manager               4.11.21   True        False         False      6h40m
# openshift-samples                          4.11.21   True        False         False      6h23m
# operator-lifecycle-manager                 4.11.21   True        False         False      6h40m
# operator-lifecycle-manager-catalog         4.11.21   True        False         False      6h40m
# operator-lifecycle-manager-packageserver   4.11.21   True        False         False      6h40m
# service-ca                                 4.11.21   True        False         False      6h25m
# storage                                    4.11.21   True        False         False      6h25m

oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get node
# NAME             STATUS   ROLES    AGE   VERSION
# edge-worker-01   Ready    worker   17h   v1.24.6+5658434

oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get mcp
# error: the server doesn't have a resource type "mcp"

oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get mc
# error: the server doesn't have a resource type "mc"

oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get all -o wide -n openshift-ingress
# NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE             NOMINATED NODE   READINESS GATES
# pod/router-default-bb569f544-cknjw   1/1     Running   0          6h41m   192.168.12.33   edge-master-01   <none>           <none>

# NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE     SELECTOR
# service/router-internal-default   ClusterIP   172.31.152.115   <none>        80/TCP,443/TCP,1936/TCP   6h41m   ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

# NAME                             READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES                                                                                                                   SELECTOR
# deployment.apps/router-default   1/1     1            1           6h41m   router       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0dc935b7825a800e32eac69fafa2d238e1d6eb2f344cdf29345cb1123c26a22   ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

# NAME                                       DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES                                                                                                                   SELECTOR
# replicaset.apps/router-default-bb569f544   1         1         1       6h41m   router       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0dc935b7825a800e32eac69fafa2d238e1d6eb2f344cdf29345cb1123c26a22   ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default,pod-template-hash=bb569f544


oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get pod -A | wc -l
# 56

oc --kubeconfig=${BASE_DIR}/data/install/kubeconfig-${ACM_DEMO_CLUSTER} get clusterversion
# NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
# version   4.11.21   True        False         6h35m   Cluster version is 4.11.21


post operation

With the installation done, we tweak the cluster nodes a little for the convenience of later experiments. This lowers the security of the cluster, but for a lab environment it does not matter.


# on helper

# VAR_CLUSTER=edge01
# oc get secret/$VAR_CLUSTER-keypair -n $VAR_CLUSTER --template='{{index .data "id_rsa.key" | base64decode}}' > ${BASE_DIR}/data/install/edge.key

# chmod 600 ${BASE_DIR}/data/install/edge.key

# ssh -i ${BASE_DIR}/data/install/edge.key core@192.168.12.33

cat > ${BASE_DIR}/data/install/crack.txt << EOF

echo redhat | sudo passwd --stdin root

sudo sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

sudo systemctl restart sshd

sudo sh -c 'echo "export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig" >> /root/.bashrc'

sudo sh -c 'echo "RET=\\\`oc config use-context system:admin\\\`" >> /root/.bashrc'

EOF

for i in 33
do
  ssh core@192.168.12.$i < ${BASE_DIR}/data/install/crack.txt
done


for i in 33
do
  sshpass -p 'redhat' ssh-copy-id root@192.168.12.$i
done


ssh root@192.168.12.33

end

openshift 4.12 UPI in agent way, 3 node.

OpenShift already has quite a few installation methods, and now there is one more: the agent-based installer. Its biggest selling point is that no extra bootstrap node is required. This is great news, because in the past, when discussing installations with customers, they could never understand why Red Hat claimed to support a 3-node deployment yet asked for 4 servers. You cannot really blame them: by the usual understanding, Red Hat did not support a strict 3-node deployment before, precisely because of that bootstrap node. With the agent-based installer, a 3-node deployment is finally supported in the everyday sense of the term.

According to the official documentation, the bootstrap node can be dropped because the bootstrap-related services are squeezed onto one of the master nodes, and the assisted-installer workflow is used to achieve a true 3-node installation.

  • https://docs.openshift.com/container-platform/4.12/installing/installing_with_agent_based_installer/preparing-to-install-with-agent-based-installer.html

In this article we use the agent-based installer to install a 3-node OCP cluster. Unlike a single-node cluster, a 3-node cluster needs VIPs configured to carry the API server and ingress traffic; a quick DNS sanity check for these VIPs is sketched below.
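
Before starting, it is worth checking that the API and ingress VIPs resolve from the helper node. A sketch, using the cluster name, base domain and VIP addresses defined in the variables further down (adjust to your own DNS setup):

dig +short api.osp-demo.wzhlab.top
# expected: 192.168.77.99 (the api vip)
dig +short x.apps.osp-demo.wzhlab.top
# expected: 192.168.77.98 (the ingress vip, normally served by a wildcard *.apps record)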

on helper node


# switch to your install version

export BUILDNUMBER=4.12.9

pushd /data/ocp4/${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
# tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
# chmod +x /usr/local/bin/oc-mirror
install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane
install -m 755 /data/ocp4/clients/coreos-installer_amd64 /usr/local/bin/coreos-installer
popd

# create a user and create the cluster under the user

useradd -m 3node

su - 3node

ssh-keygen

cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

chmod 600 ~/.ssh/config

cat << 'EOF' >> ~/.bashrc

export BASE_DIR='/home/3node/'

EOF

# export BASE_DIR='/home/3node/'

export BUILDNUMBER=4.12.9

mkdir -p ${BASE_DIR}/data/{sno/disconnected,install}

# set some parameters of your cluster

NODE_SSH_KEY="$(cat ${BASE_DIR}/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.wzhlab.top:5443

# PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:shadowman' | openssl base64 )'","email": "noemail@localhost"}}}'
PULL_SECRET=$(cat /data/pull-secret.json)

NTP_SERVER=192.168.77.11
# HELP_SERVER=192.168.7.11
# KVM_HOST=192.168.7.11
API_VIP=192.168.77.99
INGRESS_VIP=192.168.77.98
# CLUSTER_PROVISION_IP=192.168.7.103
# BOOTSTRAP_IP=192.168.7.12

# define the node information for the cluster
SNO_CLUSTER_NAME=osp-demo
SNO_BASE_DOMAIN=wzhlab.top

BOOTSTRAP_IP=192.168.77.42
MASTER_01_IP=192.168.77.43
MASTER_02_IP=192.168.77.44
MASTER_03_IP=192.168.77.45

BOOTSTRAP_IPv6=fd03::42
MASTER_01_IPv6=fd03::43
MASTER_02_IPv6=fd03::44
MASTER_03_IPv6=fd03::45

BOOTSTRAP_HOSTNAME=bootstrap-demo
MASTER_01_HOSTNAME=master-01-demo
MASTER_02_HOSTNAME=master-02-demo
MASTER_03_HOSTNAME=master-03-demo

BOOTSTRAP_INTERFACE=enp1s0
MASTER_01_INTERFACE=enp1s0
MASTER_02_INTERFACE=enp1s0
MASTER_03_INTERFACE=enp1s0

MASTER_01_INTERFACE_MAC=52:54:00:12:A1:01
MASTER_02_INTERFACE_MAC=52:54:00:12:A1:02
MASTER_03_INTERFACE_MAC=52:54:00:12:A1:03

BOOTSTRAP_DISK=/dev/vda
MASTER_01_DISK=/dev/vda
MASTER_02_DISK=/dev/vda
MASTER_03_DISK=/dev/vda

OCP_GW=192.168.77.11
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.77.11

OCP_GW_v6=fd03::11
OCP_NETMASK_v6=64

# echo ${SNO_IF_MAC} > /data/sno/sno.mac

mkdir -p ${BASE_DIR}/data/install
cd ${BASE_DIR}/data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] *

cat << EOF > ${BASE_DIR}/data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0 
controlPlane:
  name: master
  replicas: 3 
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  clusterNetwork:
    - cidr: 172.21.0.0/16
      hostPrefix: 23
    # - cidr: fd02::/48
    #   hostPrefix: 64
  machineNetwork:
    - cidr: 192.168.77.0/24
    # - cidr: 2001:DB8::/32
  serviceNetwork:
    - 172.22.0.0/16
    # - fd03::/112
platform:
  baremetal:
    apiVIPs:
    - $API_VIP
    # - 2001:DB8::4
    ingressVIPs:
    - $INGRESS_VIP
    # - 2001:DB8::5
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat ${BASE_DIR}/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF
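A quick, optional sanity check before generating the manifests: confirm the pull secret is valid JSON and that the CA really got indented into the rendered yaml.

echo "$PULL_SECRET" | jq -r '.auths | keys[]'
grep -A 2 'additionalTrustBundle' ${BASE_DIR}/data/install/install-config.yaml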

cat << EOF > ${BASE_DIR}/data/install/agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: $SNO_CLUSTER_NAME
rendezvousIP: $MASTER_01_IP
additionalNTPSources:
- $NTP_SERVER
hosts:
  - hostname: $MASTER_01_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_01_DISK"
    interfaces:
      - name: $MASTER_01_INTERFACE
        macAddress: $MASTER_01_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_01_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_01_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_01_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_01_INTERFACE
            table-id: 254
  - hostname: $MASTER_02_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_02_DISK"
    interfaces:
      - name: $MASTER_02_INTERFACE
        macAddress: $MASTER_02_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_02_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_02_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_02_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_02_INTERFACE
            table-id: 254
  - hostname: $MASTER_03_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_03_DISK" 
    interfaces:
      - name: $MASTER_03_INTERFACE
        macAddress: $MASTER_03_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_03_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_03_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_03_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_03_INTERFACE
            table-id: 254            
EOF

/bin/cp -f ${BASE_DIR}/data/install/install-config.yaml ${BASE_DIR}/data/install/install-config.yaml.bak

openshift-install --dir=${BASE_DIR}/data/install agent create cluster-manifests

sudo bash -c "/bin/cp -f mirror/registries.conf /etc/containers/registries.conf.d/; chmod +r /etc/containers/registries.conf.d/*"

# /bin/cp -f  /data/ocp4/ansible-helper/files/* ${BASE_DIR}/data/install/openshift/

sudo bash -c "cd /data/ocp4 ; bash image.registries.conf.sh quaylab.infra.wzhlab.top:5443 ;"

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml ${BASE_DIR}/data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml ${BASE_DIR}/data/install/openshift

cd ${BASE_DIR}/data/install/

# openshift-install --dir=${BASE_DIR}/data/install create ignition-configs 

mkdir -p ~/.cache/agent/image_cache/
/bin/cp -f /data/ocp-$BUILDNUMBER/rhcos-live.x86_64.iso ~/.cache/agent/image_cache/coreos-x86_64.iso

openshift-install --dir=${BASE_DIR}/data/install agent create image --log-level=debug
# ......
# DEBUG Fetching image from OCP release (oc adm release info --image-for=machine-os-images --insecure=true --icsp-file=/tmp/icsp-file3636774741 quay.io/openshift-release-dev/ocp-release@sha256:96bf74ce789ccb22391deea98e0c5050c41b67cc17defbb38089d32226dba0b8)
# DEBUG The file was found in cache: /home/3node/.cache/agent/image_cache/coreos-x86_64.iso
# INFO Verifying cached file
# DEBUG extracting /coreos/coreos-x86_64.iso.sha256 to /tmp/cache1876698393, oc image extract --path /coreos/coreos-x86_64.iso.sha256:/tmp/cache1876698393 --confirm --icsp-file=/tmp/icsp-file455852761 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:052130abddf741195b6753888cf8a00757dedeb7010f7d4dcc4b842b5bc705f6
# ......

coreos-installer iso ignition show agent.x86_64.iso > ignition.ign

# HTTP_PATH=http://192.168.7.11:8080/ignition

source /data/ocp4/acm.fn.sh

# we create a user named wzh with password redhat, so that on the first boot we can log in
# directly from the console/ssh with username and password, which is handy for troubleshooting and research
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

cat ${BASE_DIR}/data/install/ignition.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq '. += { "kernel_arguments" : { "should_exist" : [ "systemd.debug-shell=1" ] } }' \
  | jq -c . \
  > ${BASE_DIR}/data/install/ignition-iso.ign

coreos-installer iso ignition embed -f -i ignition-iso.ign agent.x86_64.iso
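Optionally, confirm the customized ignition really got embedded into the iso; this just re-reads it and lists the users, so the wzh user added above should show up.

coreos-installer iso ignition show agent.x86_64.iso | jq '.passwd.users[].name'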

# VAR_IMAGE_VER=rhcos-410.86.202303200936-AnolisOS-0-live.x86_64.iso


on kvm host ( 103 )

cleanup


create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}
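# note: in this cleanup step create_lv is called without the 5th "recreate" argument,
# so it only runs lvremove; the thin LVs are recreated with "recreate" later, right before virt-install.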

virsh destroy ocp4-acm-one-bootstrap
virsh undefine ocp4-acm-one-bootstrap

create_lv vgdata poolA lvacm-one-bootstrap 500G 
create_lv vgdata poolA lvacm-one-bootstrap-data 500G 

virsh destroy ocp4-acm-one-master-01
virsh undefine ocp4-acm-one-master-01

create_lv vgdata poolA lvacm-one-master-01 500G 
create_lv vgdata poolA lvacm-one-master-01-data 500G 

virsh destroy ocp4-acm-one-master-02
virsh undefine ocp4-acm-one-master-02

create_lv vgdata poolA lvacm-one-master-02 500G 
create_lv vgdata poolA lvacm-one-master-02-data 500G 

virsh destroy ocp4-acm-one-master-03
virsh undefine ocp4-acm-one-master-03

create_lv vgdata poolA lvacm-one-master-03 500G 
create_lv vgdata poolA lvacm-one-master-03-data 500G 

begin


cat << EOF >> /etc/sysctl.d/99-wzh-sysctl.conf

vm.overcommit_memory = 1

EOF
sysctl --system

# create the virtual network for the lab

mkdir -p /data/kvm
cd /data/kvm

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.103/24"
nmcli con up baremetal

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

pvcreate -y /dev/vdb
vgcreate vgdata /dev/vdb

# https://access.redhat.com/articles/766133
lvcreate -y -n poolA -L 500G vgdata
lvcreate -y -n poolA_meta -L 10G vgdata
lvconvert -y --thinpool vgdata/poolA --poolmetadata vgdata/poolA_meta

lvextend -l +100%FREE vgdata/poolA
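A quick check that the thin pool came out as expected:

lvs -a vgdata
# poolA should show up as a thin pool together with its metadata volume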

mkdir -p /data/kvm/one/

scp root@192.168.77.11:/home/3node/data/install/agent.x86_64.iso /data/kvm/one/

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}


SNO_MEM=32

virsh destroy ocp4-acm-one-master-01
virsh undefine ocp4-acm-one-master-01

create_lv vgdata poolA lvacm-one-master-01 500G recreate
create_lv vgdata poolA lvacm-one-master-01-data 500G recreate

virt-install --name=ocp4-acm-one-master-01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacm-one-master-01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacm-one-master-01-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=52:54:00:12:A1:01 \
  --graphics vnc,port=59003 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/one/agent.x86_64.iso

virsh destroy ocp4-acm-one-master-02
virsh undefine ocp4-acm-one-master-02

create_lv vgdata poolA lvacm-one-master-02 500G recreate
create_lv vgdata poolA lvacm-one-master-02-data 500G recreate

virt-install --name=ocp4-acm-one-master-02 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacm-one-master-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacm-one-master-02-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=52:54:00:12:A1:02 \
  --graphics vnc,port=59004 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/one/agent.x86_64.iso

virsh destroy ocp4-acm-one-master-03
virsh undefine ocp4-acm-one-master-03

create_lv vgdata poolA lvacm-one-master-03 500G recreate
create_lv vgdata poolA lvacm-one-master-03-data 500G recreate

virt-install --name=ocp4-acm-one-master-03 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacm-one-master-03,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacm-one-master-03-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=52:54:00:12:A1:03 \
  --graphics vnc,port=59005 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/one/agent.x86_64.iso

on helper to see result

For an unknown reason, the VMs shut down instead of rebooting during the install, so you have to power them on manually.
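On the kvm host, a small loop like this (a sketch using the VM names defined above) can be re-run to power them back on whenever they stop:

for vm in ocp4-acm-one-master-01 ocp4-acm-one-master-02 ocp4-acm-one-master-03
do
  # start the VM again if the agent installer powered it off
  if [ "$(virsh domstate $vm)" == "shut off" ]; then
    virsh start $vm
  fi
done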

cd ${BASE_DIR}/data/install
export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig
echo "export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig" >> ~/.bashrc
# oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null


cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for bootstrap-complete \
    --log-level=debug
# INFO Uploaded logs for host master-02-demo cluster b1d26586-caae-4b49-a0c7-30c6f8c3b9db
# INFO Host: master-01-demo, reached installation stage Writing image to disk: 100%
# INFO Host: master-01-demo, reached installation stage Waiting for control plane: Waiting for masters to join bootstrap control plane
# INFO Bootstrap Kube API Initialized
# INFO Host: master-02-demo, reached installation stage Configuring
# INFO Host: master-03-demo, reached installation stage Configuring
# INFO Host: master-02-demo, reached installation stage Joined
# INFO Host: master-01-demo, reached installation stage Waiting for bootkube
# INFO Host: master-03-demo, reached installation stage Done
# INFO Host: master-01-demo, reached installation stage Waiting for controller: waiting for controller pod ready event
# INFO Bootstrap configMap status is complete
# INFO cluster bootstrap is complete

cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for install-complete 
# INFO Waiting for cluster install to initialize. Sleeping for 30 seconds
# INFO Bootstrap Kube API Initialized
# INFO Bootstrap configMap status is complete
# INFO cluster bootstrap is complete
# INFO Cluster is installed
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run
# INFO     export KUBECONFIG=/home/3node/data/install/auth/kubeconfig
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.osp-demo.wzhlab.top
# INFO Login to the console with user: "kubeadmin", and password: "LsWCT-b8oaw-tvEKY-RUwKC"
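A few routine post-install checks (nothing specific to this lab, just the usual commands):

oc get node
oc get co
oc get clusterversion
# all cluster operators should be Available=True, Progressing=False, Degraded=False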

password login and oc config


# init setting for helper node
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
chmod 600 ~/.ssh/config

# ssh core@*****

# sudo -i

# # change password for root
# echo 'redhat' | passwd --stdin root

# sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
# sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
# sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

# systemctl restart sshd

# # set env, so oc can be used
# cat << EOF >> ~/.bashrc

# export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig

# RET=`oc config use-context system:admin`

# EOF

cat > ${BASE_DIR}/data/install/crack.txt << EOF

echo redhat | sudo passwd --stdin root

sudo sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

sudo systemctl restart sshd

sudo sh -c 'echo "export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig" >> /root/.bashrc'

sudo sh -c 'echo "RET=\\\`oc config use-context system:admin\\\`" >> /root/.bashrc'

EOF

for i in 23 24 25
do
  ssh core@192.168.7.$i < ${BASE_DIR}/data/install/crack.txt
done

from other host

# https://unix.stackexchange.com/questions/230084/send-the-password-through-stdin-in-ssh-copy-id
dnf install -y sshpass

for i in 23 24 25
do
  sshpass -p 'redhat' ssh-copy-id root@192.168.7.$i
done

poweroff


for i in 23 24 25
do
  ssh root@192.168.7.$i poweroff
done

poweron


virsh start ocp4-acm-one-master-01

virsh start ocp4-acm-one-master-02

virsh start ocp4-acm-one-master-03

back and merge kubeconfig


mkdir -p ~/.kube/bak/

var_date=$(date '+%Y-%m-%d-%H%M')

/bin/cp -f /data/install/auth/kubeconfig ~/.kube/bak/kubeconfig-$var_date
/bin/cp -f /data/install/auth/kubeadmin-password ~/.kube/bak/kubeadmin-password-$var_date

sed "s/admin/admin\/$SNO_CLUSTER_NAME/g" /data/install/auth/kubeconfig > /tmp/config.new

# https://medium.com/@jacobtomlinson/how-to-merge-kubernetes-kubectl-config-files-737b61bd517d
/bin/cp -f ~/.kube/config ~/.kube/config.bak && KUBECONFIG=~/.kube/config:/tmp/config.new kubectl config view --flatten > /tmp/config && /bin/mv -f /tmp/config ~/.kube/config

unset KUBECONFIG
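To confirm the merge worked, list the contexts; the renamed admin/$SNO_CLUSTER_NAME context should be there:

kubectl config get-contexts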

add worker node

With the cluster installed, we can now add a worker node; in the single-node case this turns the single node cluster into a single-master cluster.


# first, let's stick ingress to the master
oc label node acm-demo-hub-master  ocp-ingress-run="true"

oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"ocp-ingress-run":"true"}}}}}'

# we are in a testing env, so we don't need more ingress replicas.
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 1}}' --type=merge ingresscontroller/default

oc get -n openshift-ingress-operator ingresscontroller/default -o yaml

# then we get worker's ignition file, and start worker node, add it to cluster

oc extract -n openshift-machine-api secret/worker-user-data --keys=userData --to=- > /var/www/html/ignition/sno-worker.ign


HELP_SERVER=192.168.7.11

# define the node information (the SNO_* variable names are reused for the worker)
SNO_IP=192.168.7.16
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-worker-01
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_MEM=16

BOOT_ARG=" ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS coreos.inst.install_dev=${SNO_DISK##*/} coreos.inst.ignition_url=http://$HELP_SERVER:8080/ignition/sno-worker.ign"

/bin/cp -f /data/ocp4/rhcos-live.x86_64.iso sno.iso

coreos-installer iso kargs modify -a "$BOOT_ARG" sno.iso
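Optionally, double check that the kernel arguments were really appended to the iso:

coreos-installer iso kargs show sno.iso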

# go to kvm host ( 103 )
scp root@192.168.7.11:/data/install/sno.iso /data/kvm/

virsh destroy ocp4-acm-hub-worker01
virsh undefine ocp4-acm-hub-worker01

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub-worker01 500G recreate
# create_lv vgdata poolA lvacmhub-worker01-data 500G remove

virt-install --name=ocp4-acm-hub-worker01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacmhub-worker01,device=disk,bus=virtio,format=raw \
  `# --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw` \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio \
  --graphics vnc,port=59003 \
  --boot menu=on --cdrom /data/kvm/sno.iso 

# after it boots up the second time,
# go back to helper
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
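Note that a new worker needs CSR approval twice (first the node-bootstrapper CSR, then the node's serving certificate CSR), so re-run the approve one-liner until the node appears and goes Ready:

oc get node -o wide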

end

openshift 4.12 UPI in agent way, single node

OpenShift already has plenty of installation methods, and now there is one more: the agent based installer. Its biggest selling point is that no extra bootstrap node is needed. That is great news, because before an installation we always had to explain to customers why Red Hat claims to support a 3-node deployment but still asks for 4 servers. You can hardly blame them; strictly speaking, Red Hat did not support a true 3-node deployment before, precisely because of the bootstrap node. With the agent based installer, a 3-node deployment in the everyday sense is finally supported.

From the official documentation, the bootstrap node can be dropped because the bootstrap services are squeezed onto one of the master nodes, and the assisted installer workflow is used to achieve a true 3-node installation.

  • https://docs.openshift.com/container-platform/4.12/installing/installing_with_agent_based_installer/preparing-to-install-with-agent-based-installer.html

In this article we use the agent based installer to install a single-node ocp cluster.

on helper node


# switch to your install version

export BUILDNUMBER=4.12.9

pushd /data/ocp4/${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
# tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
# chmod +x /usr/local/bin/oc-mirror
install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane
install -m 755 /data/ocp4/clients/coreos-installer_amd64 /usr/local/bin/coreos-installer
popd

# create a user and create the cluster under the user

useradd -m 3node

su - 3node

ssh-keygen

cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

chmod 600 ~/.ssh/config

cat << 'EOF' >> ~/.bashrc

export BASE_DIR='/home/3node/'

EOF

# export BASE_DIR='/home/3node/'

export BUILDNUMBER=4.12.9

mkdir -p ${BASE_DIR}/data/{sno/disconnected,install}

# set some parameters of your cluster

NODE_SSH_KEY="$(cat ${BASE_DIR}/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.wzhlab.top:5443

# PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:shadowman' | openssl base64 )'","email": "noemail@localhost"}}}'
PULL_SECRET=$(cat /data/pull-secret.json)

NTP_SERVER=192.168.77.11
# HELP_SERVER=192.168.7.11
# KVM_HOST=192.168.7.11
# API_VIP=192.168.77.99
# INGRESS_VIP=192.168.77.98
# CLUSTER_PROVISION_IP=192.168.7.103
# BOOTSTRAP_IP=192.168.7.12
MACHINE_NETWORK='192.168.77.0/24'

# define the node information for the single node cluster
SNO_CLUSTER_NAME=osp-demo
SNO_BASE_DOMAIN=wzhlab.top

BOOTSTRAP_IP=192.168.77.42
MASTER_01_IP=192.168.77.43
MASTER_02_IP=192.168.77.44
MASTER_03_IP=192.168.77.45

BOOTSTRAP_IPv6=fd03::42
MASTER_01_IPv6=fd03::43
MASTER_02_IPv6=fd03::44
MASTER_03_IPv6=fd03::45

BOOTSTRAP_HOSTNAME=bootstrap-demo
MASTER_01_HOSTNAME=master-01-demo
MASTER_02_HOSTNAME=master-02-demo
MASTER_03_HOSTNAME=master-03-demo

BOOTSTRAP_INTERFACE=enp1s0
MASTER_01_INTERFACE=enp1s0
MASTER_02_INTERFACE=enp1s0
MASTER_03_INTERFACE=enp1s0

MASTER_01_INTERFACE_MAC=52:54:00:12:A1:01
MASTER_02_INTERFACE_MAC=52:54:00:12:A1:02
MASTER_03_INTERFACE_MAC=52:54:00:12:A1:03

BOOTSTRAP_DISK=/dev/vda
MASTER_01_DISK=/dev/vda
MASTER_02_DISK=/dev/vda
MASTER_03_DISK=/dev/vda

OCP_GW=192.168.77.11
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.77.11

OCP_GW_v6=fd03::11
OCP_NETMASK_v6=64

# echo ${SNO_IF_MAC} > /data/sno/sno.mac

mkdir -p ${BASE_DIR}/data/install
cd ${BASE_DIR}/data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] *

cat << EOF > ${BASE_DIR}/data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0 
controlPlane:
  name: master
  replicas: 1
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  clusterNetwork:
    - cidr: 172.21.0.0/16
      hostPrefix: 23
    # - cidr: fd02::/48
    #   hostPrefix: 64
  machineNetwork:
    - cidr: $MACHINE_NETWORK
    # - cidr: 2001:DB8::/32
  serviceNetwork:
    - 172.22.0.0/16
    # - fd03::/112
platform: 
  none: {}
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat ${BASE_DIR}/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cat << EOF > ${BASE_DIR}/data/install/agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: $SNO_CLUSTER_NAME
rendezvousIP: $MASTER_01_IP
additionalNTPSources:
- $NTP_SERVER
hosts:
  - hostname: $MASTER_01_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_01_DISK"
    interfaces:
      - name: $MASTER_01_INTERFACE
        macAddress: $MASTER_01_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_01_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_01_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_01_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_01_INTERFACE
            table-id: 254        
EOF

/bin/cp -f ${BASE_DIR}/data/install/install-config.yaml ${BASE_DIR}/data/install/install-config.yaml.bak

openshift-install --dir=${BASE_DIR}/data/install agent create cluster-manifests

sudo bash -c "/bin/cp -f mirror/registries.conf /etc/containers/registries.conf.d/; chmod +r /etc/containers/registries.conf.d/*"

# /bin/cp -f  /data/ocp4/ansible-helper/files/* ${BASE_DIR}/data/install/openshift/

sudo bash -c "cd /data/ocp4 ; bash image.registries.conf.sh quaylab.infra.wzhlab.top:5443 ;"

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml ${BASE_DIR}/data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml ${BASE_DIR}/data/install/openshift

cd ${BASE_DIR}/data/install/

# openshift-install --dir=${BASE_DIR}/data/install create ignition-configs 

mkdir -p ~/.cache/agent/image_cache/
/bin/cp -f /data/ocp-$BUILDNUMBER/rhcos-live.x86_64.iso ~/.cache/agent/image_cache/coreos-x86_64.iso

openshift-install --dir=${BASE_DIR}/data/install agent create image --log-level=debug
# ......
# DEBUG Fetching image from OCP release (oc adm release info --image-for=machine-os-images --insecure=true --icsp-file=/tmp/icsp-file3636774741 quay.io/openshift-release-dev/ocp-release@sha256:96bf74ce789ccb22391deea98e0c5050c41b67cc17defbb38089d32226dba0b8)
# DEBUG The file was found in cache: /home/3node/.cache/agent/image_cache/coreos-x86_64.iso
# INFO Verifying cached file
# DEBUG extracting /coreos/coreos-x86_64.iso.sha256 to /tmp/cache1876698393, oc image extract --path /coreos/coreos-x86_64.iso.sha256:/tmp/cache1876698393 --confirm --icsp-file=/tmp/icsp-file455852761 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:052130abddf741195b6753888cf8a00757dedeb7010f7d4dcc4b842b5bc705f6
# ......

coreos-installer iso ignition show agent.x86_64.iso > ignition.ign

# HTTP_PATH=http://192.168.7.11:8080/ignition

source /data/ocp4/acm.fn.sh

# we create a user named wzh with password redhat, so that on the first boot we can log in
# directly from the console/ssh with username and password, which is handy for troubleshooting and research
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

cat ${BASE_DIR}/data/install/ignition.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq '. += { "kernel_arguments" : { "should_exist" : [ "systemd.debug-shell=1" ] } }' \
  | jq -c . \
  > ${BASE_DIR}/data/install/ignition-iso.ign

coreos-installer iso ignition embed -f -i ignition-iso.ign agent.x86_64.iso

# VAR_IMAGE_VER=rhcos-410.86.202303200936-AnolisOS-0-live.x86_64.iso


on kvm host ( 103 )

cleanup


create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

virsh destroy ocp4-acm-one-bootstrap
virsh undefine ocp4-acm-one-bootstrap

create_lv vgdata poolA lvacm-one-bootstrap 500G 
create_lv vgdata poolA lvacm-one-bootstrap-data 500G 

virsh destroy ocp4-acm-one-master-01
virsh undefine ocp4-acm-one-master-01

create_lv vgdata poolA lvacm-one-master-01 500G 
create_lv vgdata poolA lvacm-one-master-01-data 500G 

virsh destroy ocp4-acm-one-master-02
virsh undefine ocp4-acm-one-master-02

create_lv vgdata poolA lvacm-one-master-02 500G 
create_lv vgdata poolA lvacm-one-master-02-data 500G 

virsh destroy ocp4-acm-one-master-03
virsh undefine ocp4-acm-one-master-03

create_lv vgdata poolA lvacm-one-master-03 500G 
create_lv vgdata poolA lvacm-one-master-03-data 500G 

begin


cat << EOF >> /etc/sysctl.d/99-wzh-sysctl.conf

vm.overcommit_memory = 1

EOF
sysctl --system

# create the virtual network for the lab

mkdir -p /data/kvm
cd /data/kvm

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.103/24"
nmcli con up baremetal

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

pvcreate -y /dev/vdb
vgcreate vgdata /dev/vdb

# https://access.redhat.com/articles/766133
lvcreate -y -n poolA -L 500G vgdata
lvcreate -y -n poolA_meta -L 10G vgdata
lvconvert -y --thinpool vgdata/poolA --poolmetadata vgdata/poolA_meta

lvextend -l +100%FREE vgdata/poolA

mkdir -p /data/kvm/one/

scp root@192.168.77.11:/home/3node/data/install/agent.x86_64.iso /data/kvm/one/

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}


SNO_MEM=64

virsh destroy ocp4-acm-one-master-01
virsh undefine ocp4-acm-one-master-01

create_lv vgdata poolA lvacm-one-master-01 500G recreate
create_lv vgdata poolA lvacm-one-master-01-data 500G recreate

virt-install --name=ocp4-acm-one-master-01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacm-one-master-01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacm-one-master-01-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio,mac=52:54:00:12:A1:01 \
  --graphics vnc,port=59003 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/one/agent.x86_64.iso


on helper to see result

For an unknown reason, the VM shuts down instead of rebooting during the install, so you have to power it on manually.

cd ${BASE_DIR}/data/install
export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig
echo "export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig" >> ~/.bashrc
# oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null


cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for bootstrap-complete --log-level=debug
# ......
# DEBUG RendezvousIP from the AgentConfig 192.168.77.43
# INFO Bootstrap Kube API Initialized
# INFO Bootstrap configMap status is complete
# INFO cluster bootstrap is complete

cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for install-complete --log-level=debug
# ......
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run
# INFO     export KUBECONFIG=/home/3node/data/install/auth/kubeconfig
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.osp-demo.wzhlab.top
# INFO Login to the console with user: "kubeadmin", and password: "UmfI2-99uAb-BRdaS-LLjQ9"

password login and oc config


# init setting for helper node
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
chmod 600 ~/.ssh/config

# ssh core@*****

# sudo -i

# # change password for root
# echo 'redhat' | passwd --stdin root

# sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
# sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
# sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

# systemctl restart sshd

# # set env, so oc can be used
# cat << EOF >> ~/.bashrc

# export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig

# RET=`oc config use-context system:admin`

# EOF

cat > ${BASE_DIR}/data/install/crack.txt << EOF

echo redhat | sudo passwd --stdin root

sudo sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

sudo systemctl restart sshd

sudo sh -c 'echo "export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig" >> /root/.bashrc'

sudo sh -c 'echo "RET=\\\`oc config use-context system:admin\\\`" >> /root/.bashrc'

EOF

for i in 23 24 25
do
  ssh core@192.168.7.$i < ${BASE_DIR}/data/install/crack.txt
done

from other host

# https://unix.stackexchange.com/questions/230084/send-the-password-through-stdin-in-ssh-copy-id
dnf install -y sshpass

for i in 23 24 25
do
  sshpass -p 'redhat' ssh-copy-id root@192.168.7.$i
done

poweroff


for i in 23 24 25
do
  ssh root@192.168.7.$i poweroff
done

poweron


virsh start ocp4-acm-one-master-01

virsh start ocp4-acm-one-master-02

virsh start ocp4-acm-one-master-03

back and merge kubeconfig


mkdir -p ~/.kube/bak/

var_date=$(date '+%Y-%m-%d-%H%M')

/bin/cp -f /data/install/auth/kubeconfig ~/.kube/bak/kubeconfig-$var_date
/bin/cp -f /data/install/auth/kubeadmin-password ~/.kube/bak/kubeadmin-password-$var_date

sed "s/admin/admin\/$SNO_CLUSTER_NAME/g" /data/install/auth/kubeconfig > /tmp/config.new

# https://medium.com/@jacobtomlinson/how-to-merge-kubernetes-kubectl-config-files-737b61bd517d
/bin/cp -f ~/.kube/config ~/.kube/config.bak && KUBECONFIG=~/.kube/config:/tmp/config.new kubectl config view --flatten > /tmp/config && /bin/mv -f /tmp/config ~/.kube/config

unset KUBECONFIG

add worker node

Now that the single node is installed, we can add a worker node to it, turning the single node cluster into a single-master cluster.


# first, let's stick ingress to the master
oc label node acm-demo-hub-master  ocp-ingress-run="true"

oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"ocp-ingress-run":"true"}}}}}'

# we are in a testing env, so we don't need more ingress replicas.
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 1}}' --type=merge ingresscontroller/default

oc get -n openshift-ingress-operator ingresscontroller/default -o yaml

# then we get worker's ignition file, and start worker node, add it to cluster

oc extract -n openshift-machine-api secret/worker-user-data --keys=userData --to=- > /var/www/html/ignition/sno-worker.ign


HELP_SERVER=192.168.7.11

# define the node information (the SNO_* variable names are reused for the worker)
SNO_IP=192.168.7.16
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-worker-01
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_MEM=16

BOOT_ARG=" ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS coreos.inst.install_dev=${SNO_DISK##*/} coreos.inst.ignition_url=http://$HELP_SERVER:8080/ignition/sno-worker.ign"

/bin/cp -f /data/ocp4/rhcos-live.x86_64.iso sno.iso

coreos-installer iso kargs modify -a "$BOOT_ARG" sno.iso

# go to kvm host ( 103 )
scp root@192.168.7.11:/data/install/sno.iso /data/kvm/

virsh destroy ocp4-acm-hub-worker01
virsh undefine ocp4-acm-hub-worker01

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacmhub-worker01 500G recreate
# create_lv vgdata poolA lvacmhub-worker01-data 500G remove

virt-install --name=ocp4-acm-hub-worker01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacmhub-worker01,device=disk,bus=virtio,format=raw \
  `# --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw` \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio \
  --graphics vnc,port=59003 \
  --boot menu=on --cdrom /data/kvm/sno.iso 

# after it boots up the second time,
# go back to helper
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

end

Build a kernel driver rpm for openshift's coreos

The author already has documents and projects describing how to build kernel drivers for devices, but on openshift the kernel used by rh-coreos is only available with a premium subscription. We cannot get a rhel with the same kernel as rh-coreos, so we cannot build the .ko that way.

Fortunately, the openshift release ships a container that contains the development packages of that premium-subscription-only kernel version, which lets us build the .ko and then package it into an rpm. Let's walk through it step by step.

Build a tool image

The openshift release comes with a driver-toolkit image that contains the kernel development packages needed for compilation. Since our goal is to build an rpm, we need to extend this tool image a little.

OCP_VERSION=$(oc get clusterversion/version -ojsonpath={.status.desired.version})
DRIVER_TOOLKIT_IMAGE=$(oc adm release info $OCP_VERSION --image-for=driver-toolkit)

echo $OCP_VERSION
# 4.11.39
echo $DRIVER_TOOLKIT_IMAGE
# quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dfed734e35163b1ab8483568780d13b528b4c0f558f8e727538af723b7a41ed4

# build a new image based on driver toolkit
# on a rhel
mkdir -p /data/driver
cd /data/driver

cat << EOF > docker.file
FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dfed734e35163b1ab8483568780d13b528b4c0f558f8e727538af723b7a41ed4

RUN dnf install -y rpm-build 
RUN cd /root && git clone https://github.com/wangzheng422/nic-rpm-rnp
RUN cd /root/nic-rpm-rnp && git checkout ocp-4.11.36
RUN mv /root/nic-rpm-rnp/rpmbuild /root/
EOF

podman build --no-cache --authfile /data/pull-secret.json -t quay.io/wangzheng422/driver-toolkit:nic-rpm-rnp-v03 -f docker.file .

podman push quay.io/wangzheng422/driver-toolkit:nic-rpm-rnp-v03

Build the rpm inside openshift

With the tool image in place, we run it in privileged mode, then go into the pod and run the build commands to produce the rpm.

# come back to your cluster
# https://master.sdk.operatorframework.io/docs/best-practices/pod-security-standards/
oc create ns driver-build
oc label --overwrite ns driver-build \
   pod-security.kubernetes.io/enforce=privileged

# oc create serviceaccount -n driver-build demo-app
# oc adm policy add-scc-to-user privileged -z demo-app -n driver-build

cat << EOF > ~/wzh/build.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kmod-driver-samplepod
  annotations:
    openshift.io/scc: privileged
    # openshift.io/scc: restricted-v2

spec:
  # serviceAccountName: demo-app
  containers:
  - image: quay.io/wangzheng422/driver-toolkit:nic-rpm-rnp-v03
    name: simple-kmod-driver-container
    imagePullPolicy: Always
    command: [sleep, infinity]
    securityContext:
      # privileged: true
      AllowPrivilegedContainer: true
  # nodeSelector:
  #   node-role.kubernetes.io/worker: ""
EOF

oc create --save-config -n driver-build -f ~/wzh/build.yaml

# oc delete -n driver-build -f ~/wzh/build.yaml

# oc get all -n driver-build
# NAME                        READY   STATUS    RESTARTS   AGE
# pod/kmod-driver-samplepod   1/1     Running   0          22m

oc rsh -n driver-build pod/kmod-driver-samplepod

bash
cd ~/nic-rpm-rnp
tar zvxf rnp-nic-drv-0.1.6.rc44-35c40ea.tgz
cd rnp-nic-drv-0.1.6.rc44-35c40ea
cd rnp
bash do_build.sh
#   MODPOST 1 modules
#   CC      /root/nic-rpm-rnp/rnp-nic-drv-0.1.6.rc44-35c40ea/rnp/rnp.mod.o
#   LD [M]  /root/nic-rpm-rnp/rnp-nic-drv-0.1.6.rc44-35c40ea/rnp/rnp.ko
# make[1]: Leaving directory '/usr/src/kernels/4.18.0-372.52.1.el8_6.x86_64'
exit

# copy the rpm out to helper node
mkdir -p ~/wzh/rsync
oc project driver-build
oc rsync kmod-driver-samplepod:/root/rpmbuild/RPMS/x86_64/ ~/wzh/rsync/

scp ~/wzh/rsync/rnp-nic-drv-0.1.6.rc44_35c40ea-1.el8.x86_64.rpm core@172.29.17.61:~/

Install the rpm

Now that we have the driver rpm, install it directly on the node and see the result.

ssh core@172.29.17.61
sudo -i
rpm-ostree install /home/core/rnp-nic-drv-0.1.6.rc44_35c40ea-1.el8.x86_64.rpm

# wait 1 mins at least, then
systemctl reboot

rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9b2f4d103a9116e5fb0e5237dd7c932360dda0ef77d3d435374692eaa26dad7c
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 411.86.202304190130-0 (2023-04-19T01:34:04Z)
#              LocalPackages: rnp-nic-drv-0.1.6.rc44_35c40ea-1.el8.x86_64

################
# nic driver update
oc project driver-build

oc cp ./rnp-0.2.0-wzh.tar.gz driver-build/kmod-driver-samplepod:/root/rnp-0.2.0-wzh.tar.gz

oc rsh -n driver-build pod/kmod-driver-samplepod

bash
cd /root
rpmbuild -tb rnp-0.2.0-wzh.tar.gz

oc cp driver-build/kmod-driver-samplepod:/root/rpmbuild/RPMS/x86_64/rnp-0.2.0-1.x86_64.rpm ./rnp-0.2.0-1.x86_64.rpm

scp rnp-0.2.0-1.x86_64.rpm core@172.29.17.61:~/

ssh core@172.29.17.61
sudo -i

rpm-ostree install /home/core/rnp-0.2.0-1.x86_64.rpm

end


#### Update the NIC firmware with ethtool
> The new firmware only takes effect after the device is rebooted

1.1 Copy the firmware to /lib/firmware on the Linux system

cp xxx.img.bin /lib/firmware

1.2 Run the flash command; replace <ethx> with the actual port name

ethtool -f <ethx> xxx.img.bin 0

@Note: pick any one port of the NIC and run the firmware update once; that is enough
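After the reboot, the running driver and firmware version can be checked with ethtool (a sketch; <ethx> is a placeholder for the real port name):

ethtool -i <ethx> | grep -E 'driver|firmware-version'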

ocp crash

  • https://access.redhat.com/solutions/5907731
rpm-ostree kargs --append='crashkernel=256M slub_debug=FZPU'

rpm-ostree kargs --delete='crashkernel=256M'

rpm-ostree kargs --delete='slub_debug=FZPU'

rpm-ostree kargs --append='slub_debug=F'
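rpm-ostree stages kernel argument changes for the next boot; to see what is currently set (and to confirm after a reboot), list them:

rpm-ostree kargs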

1. demo lab for openshift 4.12

In this document, we will record the steps to build a demo lab, to show the capability of openshift.

The key points to show include:

  1. agent based install ( 3 master node ) with static ip allocation
  2. worker node scale out
  3. data foundation install

Some additional technical skills include:

  1. simulate bmc for kvm
  2. lvm thin provision for kvm
  3. ansible tips

Suggested demo scenarios for the future:

  1. ODF DR
  2. CNV
  3. osp on ocp

The architecture of the demo lab is:

The purpose of this document is to show a practical way to build an openshift demo lab, so partners know where to start when building their own lab. For a production env, please contact Red Hat professional services (GPS) for assistance.

2. remote access config

We will use zerotier to connect to the demo lab, and use the BM 192.168.25.90 as the jumpbox.

# on 192.168.25.90
# install zerotier
curl -s https://install.zerotier.com | sudo bash

# join zerotier network
zerotier-cli join xxxxxxxxxxxx

# using a moon to accelerate network speed
zerotier-cli orbit xxxxxxxxxxxx xxxxxxxxxxxx

# enable gui
dnf groupinstall -y 'server with gui' 

# add some handy tools
dnf install -y \
  https://download-ib01.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/b/byobu-5.133-1.el8.noarch.rpm  \
  https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/s/screen-4.6.2-12.el8.x86_64.rpm \
  https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/h/htop-3.2.1-1.el8.x86_64.rpm

# add support for kvm and vnc
dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

# auto start libvirt
systemctl enable --now libvirtd

# create password for vnc
# replace xxxxxx with your password
printf 'xxxxxx\nxxxxxx\n\n' | vncpasswd

# create vnc config for vnc starting up
cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
# desktop=sandbox
# localhost
geometry=1440x855
alwaysshared
EOF

# auto start vnc session for root user at port 5902
cat << EOF >> /etc/tigervnc/vncserver.users
:2=root
EOF

# auto start vnc session
systemctl enable --now vncserver@:2

# disable firewalld totally, just because I am lazy.
# DO NOT use at production env.
systemctl disable --now firewalld

3. setup helper node

We need a helper node, also called the base station, to host several services such as the container image registry, dns, the load balancer for the api server, and a yum repo (depending on the use case). The helper node is also the operation console; the login key and kubeconfig are stored on the helper node by default.

We will use the helper node as the default gw for our disconnected openshift cluster. Openshift needs the gateway to be alive; the gateway does not need to be fully functional (for example, able to forward packets to the outside), as long as it can be pinged by the openshift nodes. If the gateway is lost or cannot be pinged, the openshift installation behaves strangely and eventually fails.

We will bring in a few hacks: we use powerdns as the dns service and replace the load balancer (normally haproxy) with the lua plugin of powerdns. DO NOT use this in a production env; it is just convenient for the author.

As this is a disconnected env, we will download the installation media on a VPS and sync it to the helper node.

3.1. config host BM (97)

# DO NOT use at production env.
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# setup ntp server on BM node
sed -i "s/#allow.*/allow all/" /etc/chrony.conf
systemctl enable --now chronyd

chronyc sources -v
#   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
#  / .- Source state '*' = current best, '+' = combined, '-' = not combined,
# | /             'x' = may be in error, '~' = too variable, '?' = unusable.
# ||                                                 .- xxxx [ yyyy ] +/- zzzz
# ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
# ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
# ||                                \     |          |  zzzz = estimated error.
# ||                                 |    |           \
# MS Name/IP address         Stratum Poll Reach LastRx Last sample
# ===============================================================================
# ^+ 111.235.248.121               1   8   377    31   -210us[ -210us] +/- 2855us
# ^- static.home.twn.sciurida>     2   7   377   129   +468us[ +448us] +/- 9558us
# ^* twtpe2-ntp-002.aaplimg.c>     1   7   377    33    -50us[  -76us] +/- 1457us
# ^- 114-33-15-129.hinet-ip.h>     2   9   377   335   +994us[ +957us] +/- 8159us

3.2. create helper vm


SNO_MEM=32

# clean up kvm, if we created it before.
virsh destroy ocp4-helper
virsh undefine ocp4-helper

virt-install --name=ocp4-helper --vcpus=8 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-helper.qcow2,bus=virtio,size=800 \
  --os-variant rhel8.3 --network bridge=br-int,model=virtio,mac=52:54:00:12:A1:01 \
  --graphics vnc,port=59003 --noautoconsole \
  --boot menu=on --cdrom /home/rhel-8.8-x86_64-dvd.iso


3.3. setup helper vm

# DO NOT use at production env.
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# DO NOT use at production env.
systemctl disable --now firewalld

# ntp
mv /etc/chrony.conf /etc/chrony.conf.bak

cat << EOF > /etc/chrony.conf
server 192.168.10.90 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow all
logdir /var/log/chrony
EOF
systemctl restart chronyd

systemctl enable --now chronyd

# wait sometime, then check the status
chronyc sources -v
#   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
#  / .- Source state '*' = current best, '+' = combined, '-' = not combined,
# | /             'x' = may be in error, '~' = too variable, '?' = unusable.
# ||                                                 .- xxxx [ yyyy ] +/- zzzz
# ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
# ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
# ||                                \     |          |  zzzz = estimated error.
# ||                                 |    |           \
# MS Name/IP address         Stratum Poll Reach LastRx Last sample
# ===============================================================================
# ^* 192.168.10.90                 3   6     7    10   -859ns[-1112ms] +/- 2795us

# setup http web server for yum repo
mkdir -p /data/yum.repos

rsync -P -arz  root@192.168.10.90:/mnt/disc/BaseOS /data/yum.repos/
rsync -P -arz  root@192.168.10.90:/mnt/disc/AppStream /data/yum.repos/

cat << EOF > /etc/yum.repos.d/wzh.repo
[BaseOS]
name=BaseOS
baseurl=file:////data/yum.repos/BaseOS
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=file:////data/yum.repos/AppStream
enabled=1
gpgcheck=0
EOF

dnf groupinstall -y 'development'

dnf install -y python3 nmstate ansible-core

cat << EOF > /etc/systemd/system/local-webserver-yum.service
[Unit]
Description=local-webserver-yum

[Service]
User=root
WorkingDirectory=/data/yum.repos
ExecStart=/bin/bash -c 'python3 -m http.server 5000'
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

systemctl enable --now local-webserver-yum.service

cat << EOF > /etc/yum.repos.d/wzh.repo
[BaseOS]
name=BaseOS
baseurl=http://192.168.10.10:5000/BaseOS
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=http://192.168.10.10:5000/AppStream
enabled=1
gpgcheck=0

[epel-fix]
name=epel-fix
baseurl=http://192.168.10.10:5000/epel-fix
enabled=1
gpgcheck=0

EOF

3.4. download installation media

We will download the installation media on a VPS and sync it back to the helper node.

3.4.1. on a VPS with vultr

# on a vultr
dnf install -y createrepo_c

# add your ocp pull secret, the content can be downloaded from the redhat portal
SEC_FILE='/data/pull-secret.json'

cat << 'EOF' > $SEC_FILE
{"auths":xxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF

SEC_FILE="$HOME/.docker/config.json"
mkdir -p ${SEC_FILE%/*}

cat << 'EOF' > $SEC_FILE
{"auths":xxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF

/bin/rm -rf /data/ocp4

/bin/rm -rf /data/ocp4/tmp/
mkdir -p /data/ocp4/tmp/
cd /data/ocp4/tmp/
# export http_proxy="http://127.0.0.1:18801"
# export https_proxy=${http_proxy}
git clone https://github.com/wangzheng422/openshift4-shell
# unset http_proxy
# unset https_proxy

cd /data/ocp4/tmp/openshift4-shell
git checkout ocp-4.12
# git pull origin ocp-${var_major_version}
/bin/cp -rf /data/ocp4/tmp/openshift4-shell/* /data/ocp4/

/bin/rm -rf /data/ocp4/tmp/

mkdir -p /data/ocp4/container.images
cd /data/ocp4/container.images

podman pull registry.access.redhat.com/ubi8/pause:8.7-6
podman save registry.access.redhat.com/ubi8/pause:8.7-6 | pigz -c > pause.tgz

cd /data/ocp4/
bash helper.node.client.sh -v 4.12.16

tar -xzf /data/ocp-4.12.16/oc-mirror.tar.gz -C /usr/local/bin/
chmod +x /usr/local/bin/oc-mirror

cat > /data/ocp4/mirror.yaml << EOF
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  platform:
    architectures:
      - amd64
      # - arm64
    channels:
      - name: stable-4.12
        type: ocp
        minVersion: 4.12.16
        maxVersion: 4.12.16
        shortestPath: true
    graph: false
  additionalImages:
    - name: registry.redhat.io/redhat/redhat-operator-index:v4.12
    - name: registry.redhat.io/redhat/certified-operator-index:v4.12
    - name: registry.redhat.io/redhat/community-operator-index:v4.12
    - name: registry.redhat.io/redhat/redhat-marketplace-index:v4.12 
    - name: quay.io/openshift/origin-kube-rbac-proxy:latest
    - name: quay.io/wangzheng422/debug-pod:alma-9.1
  # operators:
  #   - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10  
  #     packages:
  #     - name: cluster-logging                                   
  #       channels:
  #       - name: stable
  #         minVersion: 5.6.3
  #     - name: elasticsearch-operator                               
  #       channels:
  #       - name: stable
  #         minVersion: 5.6.3
  #     - name: jaeger-product                             
  #       channels:
  #       - name: stable
  #         minVersion: 1.39.0-3
  #     - name: kubernetes-nmstate-operator                               
  #       channels:
  #       - name: stable
  #         minVersion: 4.10.0-202303022128
  #     - name: odf-operator                                 
  #       channels:
  #       - name: stable-4.10
  #         minVersion: 4.10.11
  #     - name: sriov-network-operator                             
  #       channels:
  #       - name: stable
  #         minVersion: 4.10.0-202302280915
  #     - name: kubevirt-hyperconverged
  #       channels:
  #       - name: stable
  #         minVersion: 4.10.8
EOF


mkdir -p /data/ocp-install/oc-mirror/
cd /data/ocp-install/oc-mirror/

oc-mirror --config /data/ocp4/mirror.yaml file:///data/ocp-install/oc-mirror/

# sync back to demo lab jumpbox
cd /data
rsync -P -arz  /data/ocp4 root@10.229.104.55:/home/wzh/
rsync -P -arz  /data/ocp-4.12.16 root@10.229.104.55:/home/wzh/
rsync -P -arz  /data/ocp-install root@10.229.104.55:/home/wzh/

3.4.2. on helper vm node

sync back from demo lab jumpbox

# on helper vm node
rsync -P -arz  root@192.168.10.90:/home/wzh/* /data/

mkdir -p /data/yum.repos/epel-fix
rsync -P -arz /data/ocp4/rpms/* /data/yum.repos/epel-fix/

3.5. automatically set up power dns

Set up pdns using an ansible playbook. RedHatters have built several ansible projects to help deploy openshift; our ansible playbook reuses some scripts from them.


dnf install -y ansible-core

cd /data/ocp4/ansible-helper

cat > var.yaml << EOF
helper:
  ip_addr: 192.168.10.10
  nic: enp1s0
pdns:
  bind: 0.0.0.0
  port: 53
  recursor_port: 5301
  # forward: 172.21.1.1
  static:
    - base_domain: demolab-infra.wzhlab.top
      record:
        - name: registry
          ip_addr: 192.168.10.10
        - name: quay
          ip_addr: 192.168.10.10
ntp:
  server: 192.168.10.10

# the section below is not needed after ocp-4.12 for the agent based installer,
# because coredns and haproxy moved to static pods
# and they are configured to support local resolution and redirection.
# kept here for legacy compatibility
cluster:
  - base_domain: demolab-ocp.wzhlab.top
    node: 
      - ip_addr: 192.168.10.21
        name: master-01
      - ip_addr: 192.168.10.22
        name: master-02
      - ip_addr: 192.168.10.23
        name: master-03
      - ip_addr: 192.168.10.31
        name: infra-01
      - ip_addr: 192.168.10.32
        name: infra-02
      - ip_addr: 192.168.10.33
        name: infra-03
      - ip_addr: 192.168.10.41
        name: worker-01
      - ip_addr: 192.168.10.42
        name: worker-02
      - ip_addr: 192.168.10.51
        name: scale-01
      - ip_addr: 192.168.10.52
        name: scale-02
      - ip_addr: 192.168.10.53
        name: scale-03
    api:
      - ip_addr: 192.168.10.11
    api_int:
      - ip_addr: 192.168.10.11
    apps:
      - ip_addr: 192.168.12.12
ptr: 
  - addr: 192.168.10
    domain: ptr01.wzhlab.top
EOF

cd /data/ocp4/ansible-helper
# ansible-playbook -vvv -e @var.yaml  helper.yaml
ansible-playbook  -e @var.yaml  helper.yaml


Also configure a public dns record if your workstation's dns does not point to our helper node's power dns.
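To verify what the helper's powerdns is serving (a quick check, assuming dig from bind-utils is available):

dig +short @192.168.10.10 quay.demolab-infra.wzhlab.top
# 192.168.10.10
dig +short @192.168.10.10 api.demolab-ocp.wzhlab.top
# 192.168.10.11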

3.6. create ca key and crt

# on helper vm

mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/wzhlab.top.ca.key 4096

openssl req -x509 \
  -new -nodes \
  -key /etc/crts/wzhlab.top.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/wzhlab.top.ca.crt \
  -subj /CN="Local wzh lab Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/wzhlab.top.key 2048

openssl req -new -sha256 \
    -key /etc/crts/wzhlab.top.key \
    -subj "/O=Local wzh lab /CN=*.demolab-infra.wzhlab.top" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.demolab-infra.wzhlab.top,DNS:*.demolab-ocp.wzhlab.top,DNS:*.wzhlab.top\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/wzhlab.top.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.demolab-infra.wzhlab.top,DNS:*.demolab-ocp.wzhlab.top,DNS:*.wzhlab.top\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/wzhlab.top.csr \
    -CA /etc/crts/wzhlab.top.ca.crt \
    -CAkey /etc/crts/wzhlab.top.ca.key \
    -CAcreateserial -out /etc/crts/wzhlab.top.crt

openssl x509 -in /etc/crts/wzhlab.top.crt -text

/bin/cp -f /etc/crts/wzhlab.top.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract
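
Optionally verify that the server certificate really chains to the CA we just trusted, with plain openssl:

# should print: /etc/crts/wzhlab.top.crt: OK
openssl verify -CAfile /etc/crts/wzhlab.top.ca.crt /etc/crts/wzhlab.top.crt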



3.7. setup image registry


# https://docs.openshift.com/container-platform/4.12/installing/disconnected_install/installing-mirroring-creating-registry.html

ssh-copy-id root@192.168.10.10

podman load -i /data/ocp4/container.images/pause.tgz

mkdir -p /data/quay 
cd /data/ocp4/clients
tar zvxf mirror-registry.tar.gz

# replace the xxxxxx with your password
./mirror-registry install -v \
  --initPassword xxxxxx --initUser admin \
  -k ~/.ssh/id_rsa \
  --quayHostname quay.demolab-infra.wzhlab.top --quayRoot /data/quay \
  --targetHostname quay.demolab-infra.wzhlab.top \
  --sslKey /etc/crts/wzhlab.top.key --sslCert /etc/crts/wzhlab.top.crt

# ......
# PLAY RECAP ****************************************************************************************************************************************************************
# root@quay.demolab-infra.wzhlab.top : ok=48   changed=26   unreachable=0    failed=0    skipped=19   rescued=0    ignored=0

# INFO[2023-05-25 13:04:43] Quay installed successfully, config data is stored in /data/quay
# INFO[2023-05-25 13:04:43] Quay is available at https://quay.demolab-infra.wzhlab.top:8443 with credentials (admin, xxxxxx)


podman pod ps
# POD ID        NAME        STATUS      CREATED        INFRA ID      # OF CONTAINERS
# 5afa94fc84fc  quay-pod    Running     9 minutes ago  b911a67bf5cb  4
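
Before pushing anything, a quick login test confirms the TLS cert and the admin credential work (a small check; replace xxxxxx with the password you used above):

podman login -u admin -p xxxxxx quay.demolab-infra.wzhlab.top:8443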


# import installation media into quay
mkdir -p $HOME/.local/bin
cat << 'EOF' >> ~/.bash_profile

PATH=$HOME/.local/bin:$PATH
export PATH

EOF

export BUILDNUMBER=4.12.16

pushd /data/ocp-${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C ~/.local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C ~/.local/bin/
tar -xzf oc-mirror.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/oc-mirror
/bin/cp -f openshift-baremetal-install ~/.local/bin/
popd


SEC_FILE="$HOME/.docker/config.json"
mkdir -p ${SEC_FILE%/*}

cat << 'EOF' > $SEC_FILE
{"auths":xxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF

mkdir -p /data/wzh.work
cd /data/wzh.work
oc-mirror --from=/data/ocp-install/oc-mirror/mirror_seq1_000000.tar \
  docker://quay.demolab-infra.wzhlab.top:8443

After the import, you can check the result from the web console; you will see that several repositories have been created.
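
If you also want to double check from the cli, skopeo can list what landed in one of the mirrored repositories (a minimal sketch, assuming skopeo is installed on the helper; it reuses the docker config created above):

# list tags in one of the repositories oc-mirror just pushed
skopeo list-tags docker://quay.demolab-infra.wzhlab.top:8443/openshift/release-images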

4. install 3 master compact cluster

All dependency services are installed and ready, so now we will start to install the 3-master compact cluster. We will begin with the 3-node compact cluster, and then demo scaling out 3 kvm worker nodes, adding 3 infra nodes and 2 baremetal worker nodes.

4.1. config on helper node

# create a user to hold the config env for the new ocp cluster
useradd -m 3node

usermod -aG wheel 3node

echo -e "%wheel\tALL=(ALL)\tNOPASSWD: ALL" > /etc/sudoers.d/020_sudo_for_me

su - 3node

ssh-keygen

cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

chmod 600 ~/.ssh/config

cat << 'EOF' >> ~/.bashrc

export BASE_DIR='/home/3node/'

EOF


export BUILDNUMBER=4.12.16

mkdir -p ~/.local/bin
pushd /data/ocp-${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C ~/.local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C ~/.local/bin/
install -m 755 /data/ocp4/clients/butane-amd64 ~/.local/bin/butane
install -m 755 /data/ocp4/clients/coreos-installer_amd64 ~/.local/bin/coreos-installer
popd


export BUILDNUMBER=4.12.16

mkdir -p ${BASE_DIR}/data/{sno/disconnected,install}

# set some parameters of your cluster

NODE_SSH_KEY="$(cat ${BASE_DIR}/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quay.demolab-infra.wzhlab.top:8443

# update the xxxxxx with your password for the image registry
PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:xxxxxx' | openssl base64 )'","email": "noemail@localhost"}}}'


NTP_SERVER=192.168.10.10
# HELP_SERVER=192.168.7.11
# KVM_HOST=192.168.7.11
API_VIP=192.168.10.11
INGRESS_VIP=192.168.10.12
# CLUSTER_PROVISION_IP=192.168.7.103
# BOOTSTRAP_IP=192.168.7.12

# define the node information for the cluster
SNO_CLUSTER_NAME=demolab-ocp
SNO_BASE_DOMAIN=wzhlab.top

# BOOTSTRAP_IP=192.168.77.42
MASTER_01_IP=192.168.10.21
MASTER_02_IP=192.168.10.22
MASTER_03_IP=192.168.10.23

# BOOTSTRAP_IPv6=fd03::42
MASTER_01_IPv6=fd03::21
MASTER_02_IPv6=fd03::22
MASTER_03_IPv6=fd03::23

# BOOTSTRAP_HOSTNAME=bootstrap-demo
MASTER_01_HOSTNAME=master-01
MASTER_02_HOSTNAME=master-02
MASTER_03_HOSTNAME=master-03

# BOOTSTRAP_INTERFACE=enp1s0
MASTER_01_INTERFACE=enp1s0
MASTER_02_INTERFACE=enp1s0
MASTER_03_INTERFACE=enp1s0

MASTER_01_INTERFACE_MAC=52:54:00:13:A1:21
MASTER_02_INTERFACE_MAC=52:54:00:13:A1:22
MASTER_03_INTERFACE_MAC=52:54:00:13:A1:23

# BOOTSTRAP_DISK=/dev/vda
MASTER_01_DISK=/dev/vda
MASTER_02_DISK=/dev/vda
MASTER_03_DISK=/dev/vda

OCP_GW=192.168.10.10
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.10.10

OCP_GW_v6=fd03::10
OCP_NETMASK_v6=64

# echo ${SNO_IF_MAC} > /data/sno/sno.mac

mkdir -p ${BASE_DIR}/data/install
cd ${BASE_DIR}/data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] *

cat << EOF > ${BASE_DIR}/data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0 
controlPlane:
  name: master
  replicas: 3 
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  clusterNetwork:
    - cidr: 172.21.0.0/16
      hostPrefix: 23
    # - cidr: fd02::/48
    #   hostPrefix: 64
  machineNetwork:
    - cidr: 192.168.10.0/24
    # - cidr: 2001:DB8::/32
  serviceNetwork:
    - 172.22.0.0/16
    # - fd03::/112
platform:
  baremetal:
    apiVIPs:
    - $API_VIP
    # - 2001:DB8::4
    ingressVIPs:
    - $INGRESS_VIP
    # - 2001:DB8::5
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat ${BASE_DIR}/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/wzhlab.top.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/openshift/release-images
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/openshift/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

cat << EOF > ${BASE_DIR}/data/install/agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: $SNO_CLUSTER_NAME
rendezvousIP: $MASTER_01_IP
additionalNTPSources:
- $NTP_SERVER
hosts:
  - hostname: $MASTER_01_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_01_DISK"
    interfaces:
      - name: $MASTER_01_INTERFACE
        macAddress: $MASTER_01_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_01_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_01_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_01_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_01_INTERFACE
            table-id: 254
  - hostname: $MASTER_02_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_02_DISK"
    interfaces:
      - name: $MASTER_02_INTERFACE
        macAddress: $MASTER_02_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_02_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_02_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_02_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_02_INTERFACE
            table-id: 254
  - hostname: $MASTER_03_HOSTNAME
    role: master
    rootDeviceHints:
      deviceName: "$MASTER_03_DISK" 
    interfaces:
      - name: $MASTER_03_INTERFACE
        macAddress: $MASTER_03_INTERFACE_MAC
    networkConfig:
      interfaces:
        - name: $MASTER_03_INTERFACE
          type: ethernet
          state: up
          mac-address: $MASTER_03_INTERFACE_MAC
          ipv4:
            enabled: true
            address:
              - ip: $MASTER_03_IP
                prefix-length: $OCP_NETMASK_S
            dhcp: false
      dns-resolver:
        config:
          server:
            - $OCP_DNS
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: $OCP_GW
            next-hop-interface: $MASTER_03_INTERFACE
            table-id: 254            
EOF

/bin/cp -f ${BASE_DIR}/data/install/install-config.yaml ${BASE_DIR}/data/install/install-config.yaml.bak
/bin/cp -f ${BASE_DIR}/data/install/agent-config.yaml ${BASE_DIR}/data/install/agent-config.yaml.bak

openshift-install --dir=${BASE_DIR}/data/install agent create cluster-manifests

sudo bash -c "/bin/cp -f mirror/registries.conf /etc/containers/registries.conf.d/; chmod +r /etc/containers/registries.conf.d/*"

mkdir -p ${BASE_DIR}/data/install/openshift/

# this is used to copy ntp config for ocp
# but not used anymore for agent based install mode
# /bin/cp -f  /data/ocp4/ansible-helper/files/* ${BASE_DIR}/data/install/openshift/

sudo bash -c "cd /data/ocp4 ; bash image.registries.conf.sh quay.demolab-infra.wzhlab.top:8443 ;"

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml ${BASE_DIR}/data/install/openshift/
/bin/cp -f /data/ocp4/99-master-container-registries.yaml ${BASE_DIR}/data/install/openshift/

cd ${BASE_DIR}/data/install/

# openshift-install --dir=${BASE_DIR}/data/install create ignition-configs 

mkdir -p ~/.cache/agent/image_cache/
/bin/cp -f /data/ocp-$BUILDNUMBER/rhcos-live.x86_64.iso ~/.cache/agent/image_cache/coreos-x86_64.iso

openshift-install --dir=${BASE_DIR}/data/install agent create image --log-level=debug
# ......
# DEBUG Fetching image from OCP release (oc adm release info --image-for=machine-os-images --insecure=true --icsp-file=/tmp/icsp-file3636774741 quay.io/openshift-release-dev/ocp-release@sha256:96bf74ce789ccb22391deea98e0c5050c41b67cc17defbb38089d32226dba0b8)
# DEBUG The file was found in cache: /home/3node/.cache/agent/image_cache/coreos-x86_64.iso
# INFO Verifying cached file
# DEBUG extracting /coreos/coreos-x86_64.iso.sha256 to /tmp/cache1876698393, oc image extract --path /coreos/coreos-x86_64.iso.sha256:/tmp/cache1876698393 --confirm --icsp-file=/tmp/icsp-file455852761 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:052130abddf741195b6753888cf8a00757dedeb7010f7d4dcc4b842b5bc705f6
# ......


# we will add another user for debugging
# DO NOT USE IN PRODUCTION
coreos-installer iso ignition show agent.x86_64.iso > ignition.ign

# HTTP_PATH=http://192.168.7.11:8080/ignition

source /data/ocp4/acm.fn.sh

# we will create a 'wzh' user with password 'redhat', so on first boot you can log in directly from the console/ssh with username and password
# handy for troubleshooting and research
VAR_PWD_HASH="$(python3 -c 'import crypt,getpass; print(crypt.crypt("redhat"))')"

cat ${BASE_DIR}/data/install/ignition.ign \
  | jq --arg VAR "$VAR_PWD_HASH" --arg VAR_SSH "$NODE_SSH_KEY" '.passwd.users += [{ "name": "wzh", "system": true, "passwordHash": $VAR , "sshAuthorizedKeys": [ $VAR_SSH ], "groups": [ "adm", "wheel", "sudo", "systemd-journal"  ] }]' \
  | jq -c . \
  > ${BASE_DIR}/data/install/ignition-iso.ign

coreos-installer iso ignition embed -f -i ignition-iso.ign agent.x86_64.iso
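
You can double check that the debug user really made it into the embedded ignition before copying the iso around:

# the output should include the 'wzh' user we just added
coreos-installer iso ignition show agent.x86_64.iso | jq -r '.passwd.users[].name'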

# VAR_IMAGE_VER=rhcos-410.86.202303200936-AnolisOS-0-live.x86_64.iso



4.2. boot 3 kvm for master node

# on helper node
# copy back the iso to baremetal 97
scp /home/3node/data/install/agent.x86_64.iso  root@192.168.10.90:/home/wzh.iso/


# on baremetal 97

# cleanup
virsh destroy ocp4-master-01
virsh undefine ocp4-master-01
/bin/rm -f /image/ocp4-master-01.qcow2

virsh destroy ocp4-master-02
virsh undefine ocp4-master-02
/bin/rm -f /image/ocp4-master-02.qcow2

virsh destroy ocp4-master-03
virsh undefine ocp4-master-03
/bin/rm -f /image/ocp4-master-03.qcow2



SNO_MEM=48

virsh destroy ocp4-master-01
virsh undefine ocp4-master-01

virt-install --name=ocp4-master-01 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-master-01.qcow2,bus=virtio,size=120 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:21 \
  --graphics vnc,port=59021 --noautoconsole \
  --boot menu=on --cdrom /home/wzh.iso/agent.x86_64.iso

virsh destroy ocp4-master-02
virsh undefine ocp4-master-02

virt-install --name=ocp4-master-02 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-master-02.qcow2,bus=virtio,size=120 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:22 \
  --graphics vnc,port=59022 --noautoconsole \
  --boot menu=on --cdrom /home/wzh.iso/agent.x86_64.iso

virsh destroy ocp4-master-03
virsh undefine ocp4-master-03

virt-install --name=ocp4-master-03 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-master-03.qcow2,bus=virtio,size=120 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:23 \
  --graphics vnc,port=59023 --noautoconsole \
  --boot menu=on --cdrom /home/wzh.iso/agent.x86_64.iso

The vms will reboot. During the first reboot, the kvm does not power back on after powering off, so keep an eye on the kvm manager and start it manually.

4.3. wait and check the result

cd ${BASE_DIR}/data/install
export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig
echo "export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig" >> ~/.bashrc
# oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null


cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for bootstrap-complete \
    --log-level=debug

# DEBUG Host master-02 validation: Host subnets are not overlapping
# DEBUG Host master-02 validation: cnv is disabled
# DEBUG Host master-02 validation: lso is disabled
# DEBUG Host master-02 validation: lvm is disabled
# DEBUG Host master-02 validation: odf is disabled
# INFO Host: master-03, reached installation stage Done
# INFO Host: master-01, reached installation stage Waiting for controller: waiting for controller pod ready event
# INFO Bootstrap configMap status is complete
# INFO cluster bootstrap is complete

# if for some reason master-01 is pending approval to join the cluster,
# approve the csr to add master-01 back
# you should not need the commands below in the normal case.
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve


cd ${BASE_DIR}/data/install
openshift-install --dir=${BASE_DIR}/data/install agent wait-for install-complete 
# INFO Bootstrap Kube API Initialized
# INFO Bootstrap configMap status is complete
# INFO cluster bootstrap is complete
# INFO Cluster is installed
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run
# INFO     export KUBECONFIG=/home/3node/data/install/auth/kubeconfig
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.demolab-ocp.wzhlab.top
# INFO Login to the console with user: "kubeadmin", and password: "jxjb8-PPkX5-4WF78-5w8eL"


# customize registry config for quay
# oc patch mcp/master --patch '{"spec":{"paused":true}}' --type=merge
# oc patch mcp/worker --patch '{"spec":{"paused":true}}' --type=merge

# oc create -f ${BASE_DIR}/data/install/99-worker-container-registries.yaml
# oc create -f ${BASE_DIR}/data/install/99-master-container-registries.yaml

# oc patch mcp/master --patch '{"spec":{"paused":false}}' --type=merge
# oc patch mcp/worker --patch '{"spec":{"paused":false}}' --type=merge

5. scale out 3 kvm worker nodes

We will build 3 kvm worker nodes and let openshift scale out worker nodes onto these kvms, so we can demo the openshift scale-out and scale-in functions.

The lab BM's bmc is connected to br-mgmt, and br-int cannot route to br-mgmt, so the metal3 pod cannot reach the bmc/idrac to insert the boot image. That is why we have to demo the scale-out function using kvm.

5.1. config on host server


# on baremetal 97
mkdir -p /home/wzh.work

# cleanup
virsh destroy ocp4-scale-01
virsh undefine ocp4-scale-01
/bin/rm -f /image/ocp4-scale-01.qcow2

virsh destroy ocp4-scale-02
virsh undefine ocp4-scale-02
/bin/rm -f /image/ocp4-scale-02.qcow2

virsh destroy ocp4-scale-03
virsh undefine ocp4-scale-03
/bin/rm -f /image/ocp4-scale-03.qcow2

# define scale worker


SNO_MEM=48

virsh destroy ocp4-scale-01
virsh undefine ocp4-scale-01

virt-install --name=ocp4-scale-01 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-scale-01.qcow2,bus=virtio,size=100 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:51 \
  --graphics vnc,port=59051 --noautoconsole \
  --print-xml > /home/wzh.work/ocp4-scale-01.xml
virsh define --file /home/wzh.work/ocp4-scale-01.xml

virsh destroy ocp4-scale-02
virsh undefine ocp4-scale-02

virt-install --name=ocp4-scale-02 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-scale-02.qcow2,bus=virtio,size=100 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:52 \
  --graphics vnc,port=59052 --noautoconsole \
  --print-xml > /home/wzh.work/ocp4-scale-02.xml
virsh define --file /home/wzh.work/ocp4-scale-02.xml

virsh destroy ocp4-scale-03
virsh undefine ocp4-scale-03

virt-install --name=ocp4-scale-03 --vcpus=12 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/image/ocp4-scale-03.qcow2,bus=virtio,size=100 \
  --os-variant rhel8.3 \
  --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:53 \
  --graphics vnc,port=59053 --noautoconsole \
  --print-xml > /home/wzh.work/ocp4-scale-03.xml
virsh define --file /home/wzh.work/ocp4-scale-03.xml

# setup and start bmc simulator for kvm
dnf -y install python3-pip
python3 -m pip install --upgrade pip --user
pip3 install --user sushy-tools

mkdir -p /etc/crts
scp root@192.168.10.10:/etc/crts/* /etc/crts/

# /root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key

# try to deploy as systemd service
cat << EOF > /etc/systemd/system/sushy-emulator.service
[Unit]
Description=sushy-emulator

[Service]
User=root
WorkingDirectory=/root
ExecStart=/bin/bash -c '/root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/wzhlab.top.crt --ssl-key /etc/crts/wzhlab.top.key'
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

systemctl enable --now sushy-emulator.service


# collect mac and vm info for helper

# on helper clean all
# /bin/rm -f /data/install/mac.list.*
# /bin/rm -f /data/install/vm.list.*

# back to 103
cd /home/wzh.work
for i in ocp4-scale-0{1..3}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list.97
cat /home/wzh.work/mac.list.97
# ocp4-scale-01   52:54:00:13:a1:51
# ocp4-scale-02   52:54:00:13:a1:52
# ocp4-scale-03   52:54:00:13:a1:53

cat << 'EOF' > redfish.sh
#!/usr/bin/env bash

curl -k -s https://127.0.0.1:8000/redfish/v1/Systems/ | jq -r '.Members[]."@odata.id"' >  list

while read -r line; do
    curl -k -s https://127.0.0.1:8000/$line | jq -j '.Id, " ", .Name, "\n" '
done < list

EOF
bash redfish.sh | grep ocp4-scale > /home/wzh.work/vm.list.97
cat /home/wzh.work/vm.list.97
# e0113aa6-1465-40da-9128-ae9087c76924 ocp4-scale-02
# 97d16a4b-fae3-43b7-bd5b-711c83cf840f ocp4-scale-01
# 25dda43c-fb42-4ac8-bea1-46c46635e7fa ocp4-scale-03

scp /home/wzh.work/{mac,vm}.list.* root@192.168.10.10:/home/3node/data/install/

cat > /home/wzh.work/crack.txt << 'EOF'

chown 3node: /home/3node/data/install/*

EOF
ssh root@192.168.10.10 < /home/wzh.work/crack.txt

5.2. config on helper node


# on helper node

cd ${BASE_DIR}/data/install/

cat << EOF > ${BASE_DIR}/data/install/bmh-01.yaml
# below is for ocp4-scale-01
---
apiVersion: v1
kind: Secret
metadata:
  name: scale-01-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: v1
kind: Secret
metadata:
  name: ocp4-scale-01-network-config-secret
type: Opaque
stringData:
  nmstate: |
    dns-resolver:
      config:
        server:
        - 192.168.10.10
    interfaces:
    - ipv4:
        address:
        - ip: 192.168.10.51
          prefix-length: 24
        dhcp: false
        enabled: true
      name: enp1s0
      state: up
      type: ethernet
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.10.10
        next-hop-interface: enp1s0
        table-id: 254
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ocp4-scale-01
spec:
  online: false
  bootMode: legacy 
  # externallyProvisioned: true
  # hardwareProfile: unknown
  bootMACAddress: $(cat ${BASE_DIR}/data/install/mac.list.* | grep ocp4-scale-01 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.10.90:8000/redfish/v1/Systems/$(cat ${BASE_DIR}/data/install/vm.list.* | grep ocp4-scale-01 | awk '{print $1}')
    credentialsName: scale-01-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
  preprovisioningNetworkDataName: ocp4-scale-01-network-config-secret

# below is for ocp4-scale-02
---
apiVersion: v1
kind: Secret
metadata:
  name: scale-02-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: v1
kind: Secret
metadata:
  name: ocp4-scale-02-network-config-secret
type: Opaque
stringData:
  nmstate: |
    dns-resolver:
      config:
        server:
        - 192.168.10.10
    interfaces:
    - ipv4:
        address:
        - ip: 192.168.10.52
          prefix-length: 24
        dhcp: false
        enabled: true
      name: enp1s0
      state: up
      type: ethernet
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.10.10
        next-hop-interface: enp1s0
        table-id: 254
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ocp4-scale-02
spec:
  online: false
  bootMode: legacy 
  # externallyProvisioned: true
  # hardwareProfile: unknown
  bootMACAddress: $(cat ${BASE_DIR}/data/install/mac.list.* | grep ocp4-scale-02 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.10.90:8000/redfish/v1/Systems/$(cat ${BASE_DIR}/data/install/vm.list.* | grep ocp4-scale-02 | awk '{print $1}')
    credentialsName: scale-02-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
  preprovisioningNetworkDataName: ocp4-scale-02-network-config-secret

# below is for ocp4-scale-03
---
apiVersion: v1
kind: Secret
metadata:
  name: scale-03-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: v1
kind: Secret
metadata:
  name: ocp4-scale-03-network-config-secret
type: Opaque
stringData:
  nmstate: |
    dns-resolver:
      config:
        server:
        - 192.168.10.10
    interfaces:
    - ipv4:
        address:
        - ip: 192.168.10.53
          prefix-length: 24
        dhcp: false
        enabled: true
      name: enp1s0
      state: up
      type: ethernet
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.10.10
        next-hop-interface: enp1s0
        table-id: 254
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ocp4-scale-03
spec:
  online: false
  bootMode: legacy 
  # externallyProvisioned: true
  # hardwareProfile: unknown
  bootMACAddress: $(cat ${BASE_DIR}/data/install/mac.list.* | grep ocp4-scale-03 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.10.90:8000/redfish/v1/Systems/$(cat ${BASE_DIR}/data/install/vm.list.* | grep ocp4-scale-03 | awk '{print $1}')
    credentialsName: scale-03-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
  preprovisioningNetworkDataName: ocp4-scale-03-network-config-secret

EOF
oc -n openshift-machine-api create -f ${BASE_DIR}/data/install/bmh-01.yaml

After applying the baremetal host config, the kvm will boot, and ocp will detect the machine hardware config using ironic. The kvm will be powered off after the inspection finishes.

Then you can see that the baremetal host is ready to provision.
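
The same state is visible from the cli, since metal3 exposes it as a baremetalhost CR; the three ocp4-scale-0x hosts should end up in the available state after inspection:

oc -n openshift-machine-api get bmh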

5.3. scale out and check the result

Find the machineset entry in the web console and set the machine count to '1', or scale it from the cli as sketched below.
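
If you prefer the cli (a sketch; the machineset name is generated per cluster, so look it up first and replace the placeholder):

oc -n openshift-machine-api get machineset
# replace <machineset-name> with the worker machineset from the output above
oc -n openshift-machine-api scale machineset <machineset-name> --replicas=1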

You will see the kvm boot and get provisioned as a worker node.

After some time, the worker node is provisioned, and you can see it from the cli.

oc get node
# NAME                              STATUS   ROLES                         AGE     VERSION
# master-01                         Ready    control-plane,master,worker   3h19m   v1.25.8+37a9a08
# master-02                         Ready    control-plane,master,worker   4h1m    v1.25.8+37a9a08
# master-03                         Ready    control-plane,master,worker   4h3m    v1.25.8+37a9a08
# scale-01.demolab-ocp.wzhlab.top   Ready    worker                        3m55s   v1.25.8+37a9a08

You can see the new baremetal host provisioned in the web console.

You can also see the new machine created in the web console.

And you can see the new node created in the web console.

5.4. scale in and check the result

Scaling in is very simple: open the machineset config and decrease the replica count, or use the cli as below.
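
The cli equivalent is the same scale command with a smaller replica count (again, replace the placeholder with your machineset name):

oc -n openshift-machine-api scale machineset <machineset-name> --replicas=0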

Then the vm is powered off, and the related CRs, such as the machine and node, are deleted.

You can confirm that from the cli.

oc get node
# NAME        STATUS   ROLES                         AGE     VERSION
# master-01   Ready    control-plane,master,worker   3h52m   v1.25.8+37a9a08
# master-02   Ready    control-plane,master,worker   4h33m   v1.25.8+37a9a08
# master-03   Ready    control-plane,master,worker   4h35m   v1.25.8+37a9a08

6. add 3 infra nodes

Adding the 3 infra kvm nodes is simple, because we will not use metal3 to scale out automatically. We will build an ISO file for each kvm and boot the kvms from them.

6.1. config on helper node

# get ignition file for worker node
cd ${BASE_DIR}/data/install/

oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign

# copy the ignition file to root of local web server
# later, during the rhcos booting, it will fetch the ignition file
# from the webserver
sudo mkdir -p /data/yum.repos/conf
sudo /bin/cp -f worker.ign /data/yum.repos/conf/
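
It is worth checking that the helper's repo web server (already listening on 192.168.10.10:5000 for the yum repos) really serves the file at the URL the kernel args below will use:

# should print the ignition spec version
curl -s http://192.168.10.10:5000/conf/worker.ign | jq -r '.ignition.version'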


# some env

# BOOTSTRAP_IP=192.168.77.42
INFRA_01_IP=192.168.10.31
INFRA_02_IP=192.168.10.32
INFRA_03_IP=192.168.10.33

# BOOTSTRAP_IPv6=fd03::42
INFRA_01_IPv6=fd03::31
INFRA_02_IPv6=fd03::32
INFRA_03_IPv6=fd03::33

# BOOTSTRAP_HOSTNAME=bootstrap-demo
INFRA_01_HOSTNAME=infra-01
INFRA_02_HOSTNAME=infra-02
INFRA_03_HOSTNAME=infra-03

# BOOTSTRAP_INTERFACE=enp1s0
INFRA_01_INTERFACE=enp1s0
INFRA_02_INTERFACE=enp1s0
INFRA_03_INTERFACE=enp1s0


# BOOTSTRAP_DISK=/dev/vda
INFRA_01_DISK=/dev/vda
INFRA_02_DISK=/dev/vda
INFRA_03_DISK=/dev/vda

OCP_GW=192.168.10.10
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.10.10

OCP_GW_v6=fd03::10
OCP_NETMASK_v6=64


# build the iso file for each of kvm
export BUILDNUMBER=4.12.16
cd ${BASE_DIR}/data/install/

/bin/cp -f /data/ocp-${BUILDNUMBER}/rhcos-live.x86_64.iso infra-01.iso

coreos-installer iso kargs modify -a "ip=$INFRA_01_IP::$OCP_GW:$OCP_NETMASK:$INFRA_01_HOSTNAME:$INFRA_01_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$INFRA_01_DISK coreos.inst.ignition_url=http://192.168.10.10:5000/conf/worker.ign coreos.inst.insecure systemd.debug-shell=1 " infra-01.iso

/bin/cp -f /data/ocp-${BUILDNUMBER}/rhcos-live.x86_64.iso infra-02.iso

coreos-installer iso kargs modify -a "ip=$INFRA_02_IP::$OCP_GW:$OCP_NETMASK:$INFRA_02_HOSTNAME:$INFRA_02_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$INFRA_02_DISK coreos.inst.ignition_url=http://192.168.10.10:5000/conf/worker.ign coreos.inst.insecure systemd.debug-shell=1 " infra-02.iso

/bin/cp -f /data/ocp-${BUILDNUMBER}/rhcos-live.x86_64.iso infra-03.iso

coreos-installer iso kargs modify -a "ip=$INFRA_03_IP::$OCP_GW:$OCP_NETMASK:$INFRA_03_HOSTNAME:$INFRA_03_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$INFRA_03_DISK coreos.inst.ignition_url=http://192.168.10.10:5000/conf/worker.ign coreos.inst.insecure systemd.debug-shell=1 " infra-03.iso

# transfer the iso file to kvm host server ( 98 )
scp infra-01.iso root@192.168.10.92:/data/kvm/
scp infra-02.iso root@192.168.10.92:/data/kvm/
scp infra-03.iso root@192.168.10.92:/data/kvm/

6.2. config infra host BM nodes (98)

# dnf setup for the server
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# DO NOT use at production env.
systemctl disable --now firewalld

# ntp
mv /etc/chrony.conf /etc/chrony.conf.bak

cat << EOF > /etc/chrony.conf
server 192.168.10.90 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow all
logdir /var/log/chrony
EOF
systemctl restart chronyd

systemctl enable --now chronyd

cat << EOF > /etc/yum.repos.d/wzh.repo
[BaseOS]
name=BaseOS
baseurl=http://192.168.10.10:5000/BaseOS
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=http://192.168.10.10:5000/AppStream
enabled=1
gpgcheck=0

[epel-fix]
name=epel-fix
baseurl=http://192.168.10.10:5000/epel-fix
enabled=1
gpgcheck=0

EOF

dnf groupinstall -y 'server with gui'

# add support for kvm and vnc
dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

# auto start libvirt
systemctl enable --now libvirtd

# create password for vnc
# replace 'xxxxxx' with your password
printf 'xxxxxx\nxxxxxx\n\n' | vncpasswd

# create vnc config for vnc starting up
cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
# desktop=sandbox
geometry=1440x855
alwaysshared
EOF

# auto start vnc session for root user at port 5902
cat << EOF >> /etc/tigervnc/vncserver.users
:2=root
EOF

# auto start vnc session
systemctl enable --now vncserver@:2

#setup network bridge
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='192.168.10.92/24'
PUB_GW='192.168.10.10'
PUB_DNS='192.168.10.10'
BR_IF='br-int'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down "$BR_IF"
nmcli con delete "$BR_IF"
# RHEL 8.1 appends the word "System" in front of the connection, delete it in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname "$BR_IF" type bridge con-name "$BR_IF" ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master "$BR_IF"
nmcli con down "$PUB_CONN";pkill dhclient;dhclient "$BR_IF"
nmcli con up "$BR_IF"
EOF

bash /data/kvm/bridge.sh


# setup the thin provision lvm
lsblk
# NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sda      8:0    0  4.4T  0 disk
# ├─sda1   8:1    0  600M  0 part /boot/efi
# ├─sda2   8:2    0    1G  0 part /boot
# └─sda3   8:3    0  500G  0 part /
# sr0     11:0    1 1024M  0 rom

fdisk /dev/sda
# n -> to create new partition
# w -> to write out the new partition

lsblk
# NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sda      8:0    0  4.4T  0 disk
# ├─sda1   8:1    0  600M  0 part /boot/efi
# ├─sda2   8:2    0    1G  0 part /boot
# ├─sda3   8:3    0  500G  0 part /
# └─sda4   8:4    0  3.9T  0 part
# sr0     11:0    1 1024M  0 rom

pvcreate -y /dev/sda4
vgcreate vgdata /dev/sda4

# https://access.redhat.com/articles/766133
lvcreate -y -n poolA -L 100G vgdata
lvcreate -y -n poolA_meta -L 1G vgdata
lvconvert -y --thinpool vgdata/poolA --poolmetadata vgdata/poolA_meta
  # Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
  # WARNING: Converting vgdata/poolA and vgdata/poolA_meta to thin pool's data and metadata volumes with metadata wiping.
  # THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  # Converted vgdata/poolA and vgdata/poolA_meta to thin pool.

lvextend -l +100%FREE vgdata/poolA
  # Rounding size to boundary between physical extents: <3.88 GiB.
  # Size of logical volume vgdata/poolA_tmeta changed from 1.00 GiB (256 extents) to <3.88 GiB (992 extents).
  # Size of logical volume vgdata/poolA_tdata changed from 100.00 GiB (25600 extents) to <3.87 TiB (1013929 extents).
  # Logical volume vgdata/poolA successfully resized.

lsblk
# NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sda                      8:0    0  4.4T  0 disk
# ├─sda1                   8:1    0  600M  0 part /boot/efi
# ├─sda2                   8:2    0    1G  0 part /boot
# ├─sda3                   8:3    0  500G  0 part /
# └─sda4                   8:4    0  3.9T  0 part
#   ├─vgdata-poolA_tmeta 253:0    0  3.9G  0 lvm
#   │ └─vgdata-poolA     253:2    0  3.9T  0 lvm
#   └─vgdata-poolA_tdata 253:1    0  3.9T  0 lvm
#     └─vgdata-poolA     253:2    0  3.9T  0 lvm
# sr0                     11:0    1 1024M  0 rom

Why do we use thin-provisioned lvm instead of qcow2 files on an xfs filesystem? Because of performance: lvm is 2x or 3x faster than a qcow2 file on a filesystem.
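
One operational note: thin pools can over-commit, so it helps to keep an eye on data and metadata usage once the vm disks start filling up:

lvs vgdata -o lv_name,lv_size,data_percent,metadata_percent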

6.3. boot 3 infra kvm

# cleanup the kvm config

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

virsh destroy ocp4-infra-01
virsh undefine ocp4-infra-01

create_lv vgdata poolA lv-ocp4-infra-01 100G 
create_lv vgdata poolA lv-ocp4-infra-01-data 1024G 

virsh destroy ocp4-infra-02
virsh undefine ocp4-infra-02

create_lv vgdata poolA lv-ocp4-infra-02 100G 
create_lv vgdata poolA lv-ocp4-infra-02-data 1024G 

virsh destroy ocp4-infra-03
virsh undefine ocp4-infra-03

create_lv vgdata poolA lv-ocp4-infra-03 100G 
create_lv vgdata poolA lv-ocp4-infra-03-data 1024G 

# start the kvm

SNO_MEM=32

virsh destroy ocp4-infra-01
virsh undefine ocp4-infra-01

create_lv vgdata poolA lv-ocp4-infra-01 100G recreate
create_lv vgdata poolA lv-ocp4-infra-01-data 1024G recreate

virt-install --name=ocp4-infra-01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-infra-01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-infra-01-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:31 \
  --graphics vnc,port=59031 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/infra-01.iso


virsh destroy ocp4-infra-02
virsh undefine ocp4-infra-02

create_lv vgdata poolA lv-ocp4-infra-02 100G recreate
create_lv vgdata poolA lv-ocp4-infra-02-data 1024G recreate

virt-install --name=ocp4-infra-02 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-infra-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-infra-02-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:32 \
  --graphics vnc,port=59032 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/infra-02.iso


virsh destroy ocp4-infra-03
virsh undefine ocp4-infra-03

create_lv vgdata poolA lv-ocp4-infra-03 100G recreate
create_lv vgdata poolA lv-ocp4-infra-03-data 1024G recreate

virt-install --name=ocp4-infra-03 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-infra-03,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-infra-03-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=br-int,model=virtio,mac=52:54:00:13:A1:33 \
  --graphics vnc,port=59033 --noautoconsole \
  --boot menu=on --cdrom /data/kvm/infra-03.iso

6.4. wait and check the result

# approval is automatic; if it is not,
# approve the new infra nodes to join the cluster manually
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

oc get node
# NAME        STATUS   ROLES                         AGE     VERSION
# infra-01    Ready    worker                        117s    v1.25.8+37a9a08
# infra-02    Ready    worker                        111s    v1.25.8+37a9a08
# infra-03    Ready    worker                        110s    v1.25.8+37a9a08
# master-01   Ready    control-plane,master,worker   6h3m    v1.25.8+37a9a08
# master-02   Ready    control-plane,master,worker   6h45m   v1.25.8+37a9a08
# master-03   Ready    control-plane,master,worker   6h47m   v1.25.8+37a9a08

7. add 2 worker BM nodes

Adding the 2 worker baremetal nodes works the same way as the 3 infra nodes.

If you still want to scale out through metal3, use the config parameters below after you plug the bmc cable into br-int:

worker-01
F0:D4:E2:EA:6F:E0
idrac-virtualmedia://<ip of bmc>/redfish/v1/Systems/System.Embedded.1

7.1. config on helper node


# some env

# BOOTSTRAP_IP=192.168.77.42
WORKER_01_IP=192.168.10.41
WORKER_02_IP=192.168.10.42
# INFRA_03_IP=192.168.10.33

# BOOTSTRAP_IPv6=fd03::42
WORKER_01_IPv6=fd03::41
WORKER_02_IPv6=fd03::42
# INFRA_03_IPv6=fd03::33

# BOOTSTRAP_HOSTNAME=bootstrap-demo
WORKER_01_HOSTNAME=worker-01
WORKER_02_HOSTNAME=worker-02
# INFRA_03_HOSTNAME=infra-03

# BOOTSTRAP_INTERFACE=enp1s0
WORKER_01_INTERFACE=eno1
WORKER_02_INTERFACE=eno1
# INFRA_03_INTERFACE=enp1s0


# BOOTSTRAP_DISK=/dev/vda
WORKER_01_DISK=/dev/sdb
WORKER_02_DISK=/dev/sda
# INFRA_03_DISK=/dev/vda

OCP_GW=192.168.10.10
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.10.10

OCP_GW_v6=fd03::10
OCP_NETMASK_v6=64


# build the iso file for each of kvm
export BUILDNUMBER=4.12.16
cd ${BASE_DIR}/data/install/

/bin/cp -f /data/ocp-${BUILDNUMBER}/rhcos-live.x86_64.iso worker-01.iso

coreos-installer iso kargs modify -a "ip=$WORKER_01_IP::$OCP_GW:$OCP_NETMASK:$WORKER_01_HOSTNAME:$WORKER_01_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$WORKER_01_DISK coreos.inst.ignition_url=http://192.168.10.10:5000/conf/worker.ign coreos.inst.insecure systemd.debug-shell=1 " worker-01.iso

/bin/cp -f /data/ocp-${BUILDNUMBER}/rhcos-live.x86_64.iso worker-02.iso

coreos-installer iso kargs modify -a "ip=$WORKER_02_IP::$OCP_GW:$OCP_NETMASK:$WORKER_02_HOSTNAME:$WORKER_02_INTERFACE:none nameserver=$OCP_DNS coreos.inst.install_dev=$WORKER_02_DISK coreos.inst.ignition_url=http://192.168.10.10:5000/conf/worker.ign coreos.inst.insecure systemd.debug-shell=1 " worker-02.iso


# transfer the iso file to host server
scp worker-01.iso root@192.168.10.90:/home/wzh.iso/
scp worker-02.iso root@192.168.10.90:/home/wzh.iso/


7.2. boot the BM and check the result

Power on the BM after attaching the ISO image to the virtual cdrom.

Before booting the BM with the iso, it is better to reset the integrated raid card config and reset all vdisks. Otherwise you will run into uefi boot issues.

Some machines, like the BM server in the demo lab, need the virtual cdrom removed manually during the first reboot.

# approve the new worker nodes to join the cluster manually
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve


8. set up infra role on cluster

The official documents are here:

  • https://docs.openshift.com/container-platform/4.12/machine_management/creating-infrastructure-machinesets.html#creating-an-infra-node_creating-infrastructure-machinesets
  • https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/managing_and_allocating_storage_resources/index#manual_creation_of_infrastructure_nodes

I will not create an infra machineset, because I cannot find documentation on how to create one for a baremetal cluster.

8.1. basic cluster config

# currently the cluster looks like this
oc get node
# NAME        STATUS   ROLES                         AGE    VERSION
# infra-01    Ready    worker                        23h    v1.25.8+37a9a08
# infra-02    Ready    worker                        23h    v1.25.8+37a9a08
# infra-03    Ready    worker                        23h    v1.25.8+37a9a08
# master-01   Ready    control-plane,master,worker   29h    v1.25.8+37a9a08
# master-02   Ready    control-plane,master,worker   30h    v1.25.8+37a9a08
# master-03   Ready    control-plane,master,worker   30h    v1.25.8+37a9a08
# worker-01   Ready    worker                        3h4m   v1.25.8+37a9a08
# worker-02   Ready    worker                        99m    v1.25.8+37a9a08

oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# master   rendered-master-6b284ac2e77636bd9f5fe05b8f68bf3a   True      False      False      3              3                   3                     0                      30h
# worker   rendered-worker-8404cadc036bdaa800e4924522f5ace6   True      False      False      5              5                   5                     0                      30h


# add node lable for infra
for i in worker-0{1..2}; do
  oc label node $i node-role.kubernetes.io/app=""
done

for i in infra-0{1..3}; do
  oc label node $i node-role.kubernetes.io/infra=""
  # enable below if you want to run only ODF on infra
  oc label node $i cluster.ocs.openshift.io/openshift-storage=""
done

oc get node
# NAME        STATUS   ROLES                         AGE     VERSION
# infra-01    Ready    infra,worker                  23h     v1.25.8+37a9a08
# infra-02    Ready    infra,worker                  23h     v1.25.8+37a9a08
# infra-03    Ready    infra,worker                  23h     v1.25.8+37a9a08
# master-01   Ready    control-plane,master,worker   29h     v1.25.8+37a9a08
# master-02   Ready    control-plane,master,worker   30h     v1.25.8+37a9a08
# master-03   Ready    control-plane,master,worker   30h     v1.25.8+37a9a08
# worker-01   Ready    app,worker                    3h12m   v1.25.8+37a9a08
# worker-02   Ready    app,worker                    107m    v1.25.8+37a9a08

cat << EOF > ${BASE_DIR}/data/install/infra.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} 
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: "" 
EOF
oc create --save-config -f ${BASE_DIR}/data/install/infra.mcp.yaml

oc get mcp
# NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
# infra    rendered-infra-8404cadc036bdaa800e4924522f5ace6    True      False      False      3              3                   3                     0                      2m43s
# master   rendered-master-6b284ac2e77636bd9f5fe05b8f68bf3a   True      False      False      3              3                   3                     0                      30h
# worker   rendered-worker-8404cadc036bdaa800e4924522f5ace6   True      False      False      2              2                   2                     0                      30h

# taint infra node
for i in infra-0{1..3}; do
  # oc adm taint nodes $i node-role.kubernetes.io/infra=reserved:NoExecute
  # remove the taint, just for our demo lab env
  oc adm taint nodes $i node-role.kubernetes.io/infra:NoExecute-
  # enable below if you want to run only ODF on infra
  oc adm taint node $i node.ocs.openshift.io/storage="true":NoSchedule
done
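
Quick check that the label and taint really landed on the three infra nodes:

oc get node -l node-role.kubernetes.io/infra=
oc describe node infra-01 | grep -A 3 Taints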

# fix for dns
# https://access.redhat.com/solutions/6592171
cat << EOF > ${BASE_DIR}/data/install/patch-dns.yaml
spec:
  nodePlacement:
    tolerations:
    - operator: Exists
EOF
oc patch dns.operator/default --type merge \
  --patch-file=${BASE_DIR}/data/install/patch-dns.yaml
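
After the patch, the dns daemonset should be able to land on every node, including the tainted infra ones:

oc get pod -n openshift-dns -o wide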

8.2. move workload to infra node

DO NOT move workloads to the infra nodes if you only have 3 infra nodes in the cluster, because we will dedicate those 3 infra nodes to ODF.

8.2.1. for router

# for router
oc get ingresscontroller default -n openshift-ingress-operator -o json | jq .spec
# {
#   "clientTLS": {
#     "clientCA": {
#       "name": ""
#     },
#     "clientCertificatePolicy": ""
#   },
#   "httpCompression": {},
#   "httpEmptyRequestsPolicy": "Respond",
#   "httpErrorCodePages": {
#     "name": ""
#   },
#   "replicas": 2,
#   "tuningOptions": {
#     "reloadInterval": "0s"
#   },
#   "unsupportedConfigOverrides": null
# }

oc get pod -n openshift-ingress -o wide
# NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE        NOMINATED NODE   READINESS GATES
# router-default-656fd575-d7w8s   1/1     Running   0          40h   192.168.10.22   master-02   <none>           <none>
# router-default-656fd575-s6tl6   1/1     Running   0          40h   192.168.10.23   master-03   <none>           <none>


cat << EOF > ${BASE_DIR}/data/install/patch-router.yaml
spec:
  nodePlacement:
    nodeSelector: 
      matchLabels:
        node-role.kubernetes.io/infra: ""
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/infra
      value: reserved
    - effect: NoExecute
      key: node-role.kubernetes.io/infra
      value: reserved
EOF
oc patch -n openshift-ingress-operator ingresscontroller/default --type merge \
  --patch-file=${BASE_DIR}/data/install/patch-router.yaml

# to roll back only:
# do not run this unless you want to revert, it removes the patch above
cat << EOF > ${BASE_DIR}/data/install/patch-router.yaml
spec:
  nodePlacement: null
EOF
oc patch -n openshift-ingress-operator ingresscontroller/default --type merge \
  --patch-file=${BASE_DIR}/data/install/patch-router.yaml

oc get pod -n openshift-ingress -o wide
# NAME                              READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
# router-default-788c864f85-5dj9f   1/1     Running   0          90s    192.168.10.32   infra-02   <none>           <none>
# router-default-788c864f85-qcmv7   1/1     Running   0          2m4s   192.168.10.33   infra-03   <none>           <none>

8.2.2. for internal registry

oc get configs.imageregistry.operator.openshift.io/cluster -o json | jq .spec
# {
#   "logLevel": "Normal",
#   "managementState": "Removed",
#   "observedConfig": null,
#   "operatorLogLevel": "Normal",
#   "proxy": {},
#   "replicas": 1,
#   "requests": {
#     "read": {
#       "maxWaitInQueue": "0s"
#     },
#     "write": {
#       "maxWaitInQueue": "0s"
#     }
#   },
#   "rolloutStrategy": "RollingUpdate",
#   "storage": {},
#   "unsupportedConfigOverrides": null
# }

oc get pods -o wide -n openshift-image-registry |grep registry


cat << EOF > ${BASE_DIR}/data/install/patch-registry.yaml
spec:
  nodeSelector: 
    node-role.kubernetes.io/infra: ""
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
    value: reserved
  - effect: NoExecute
    key: node-role.kubernetes.io/infra
    value: reserved
EOF
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge \
  --patch-file=${BASE_DIR}/data/install/patch-registry.yaml

# to roll back only:
# do not run this unless you want to revert, it removes the patch above
cat << EOF > ${BASE_DIR}/data/install/patch-registry.yaml
spec:
  nodeSelector: null
  tolerations: null
EOF
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge \
  --patch-file=${BASE_DIR}/data/install/patch-registry.yaml

8.2.3. for monitor

oc get pod -n openshift-monitoring -o wide
# NAME                                                     READY   STATUS    RESTARTS      AGE   IP              NODE        NOMINATED NODE   READINESS GATES
# alertmanager-main-0                                      6/6     Running   0             40h   172.21.0.107    master-03   <none>           <none>
# alertmanager-main-1                                      6/6     Running   1 (40h ago)   40h   172.21.2.40     master-02   <none>           <none>
# cluster-monitoring-operator-7dd6795794-v9mqh             2/2     Running   0             40h   172.21.2.23     master-02   <none>           <none>
# kube-state-metrics-6b66b788d5-j2v2j                      3/3     Running   0             40h   172.21.0.95     master-03   <none>           <none>
# node-exporter-54x95                                      2/2     Running   0             35h   192.168.10.33   infra-03    <none>           <none>
# node-exporter-7gtr5                                      2/2     Running   0             35h   192.168.10.31   infra-01    <none>           <none>
# node-exporter-bfbt6                                      2/2     Running   2             42h   192.168.10.22   master-02   <none>           <none>
# node-exporter-cz8p8                                      2/2     Running   0             35h   192.168.10.32   infra-02    <none>           <none>
# node-exporter-d759x                                      2/2     Running   2             42h   192.168.10.23   master-03   <none>           <none>
# node-exporter-jplrr                                      2/2     Running   0             14h   192.168.10.42   worker-02   <none>           <none>
# node-exporter-k498r                                      2/2     Running   2             41h   192.168.10.21   master-01   <none>           <none>
# node-exporter-xcxv5                                      2/2     Running   0             15h   192.168.10.41   worker-01   <none>           <none>
# openshift-state-metrics-86884485c8-4zcpf                 3/3     Running   0             40h   172.21.0.91     master-03   <none>           <none>
# prometheus-adapter-68759db859-m8hw7                      1/1     Running   0             18h   172.21.4.36     master-01   <none>           <none>
# prometheus-adapter-68759db859-mlxfz                      1/1     Running   0             11h   172.21.12.7     worker-01   <none>           <none>
# prometheus-k8s-0                                         6/6     Running   0             40h   172.21.0.109    master-03   <none>           <none>
# prometheus-k8s-1                                         6/6     Running   0             40h   172.21.2.34     master-02   <none>           <none>
# prometheus-operator-78b549956b-676kt                     2/2     Running   0             40h   172.21.0.100    master-03   <none>           <none>
# prometheus-operator-admission-webhook-746c7d6ffb-nmglp   1/1     Running   0             40h   172.21.2.28     master-02   <none>           <none>
# prometheus-operator-admission-webhook-746c7d6ffb-w8tz6   1/1     Running   0             40h   172.21.0.105    master-03   <none>           <none>
# thanos-querier-6b5bcc9cb-b9r4j                           6/6     Running   0             40h   172.21.0.104    master-03   <none>           <none>
# thanos-querier-6b5bcc9cb-xvsvl                           6/6     Running   0             40h   172.21.2.30     master-02   <none>           <none>


cat << EOF > ${BASE_DIR}/data/install/cm-monitor.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |+
    alertmanagerMain:
      nodeSelector: 
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
EOF
oc create --save-config -n openshift-monitoring -f ${BASE_DIR}/data/install/cm-monitor.yaml

# oc delete -n openshift-monitoring -f ${BASE_DIR}/data/install/cm-monitor.yaml

oc get pod -n openshift-monitoring -o wide
# NAME                                                    READY   STATUS    RESTARTS        AGE     IP              NODE        NOMINATED NODE   READINESS GATES
# alertmanager-main-0                                     6/6     Running   1 (2m29s ago)   2m33s   172.21.10.12    infra-03    <none>           <none>
# alertmanager-main-1                                     6/6     Running   1 (3m2s ago)    3m7s    172.21.6.11     infra-01    <none>           <none>
# cluster-monitoring-operator-7dd6795794-v9mqh            2/2     Running   0               40h     172.21.2.23     master-02   <none>           <none>
# kube-state-metrics-857fc67cb9-snbpc                     3/3     Running   0               3m11s   172.21.8.8      infra-02    <none>           <none>
# node-exporter-54x95                                     2/2     Running   0               35h     192.168.10.33   infra-03    <none>           <none>
# node-exporter-7gtr5                                     2/2     Running   0               35h     192.168.10.31   infra-01    <none>           <none>
# node-exporter-bfbt6                                     2/2     Running   2               42h     192.168.10.22   master-02   <none>           <none>
# node-exporter-cz8p8                                     2/2     Running   0               35h     192.168.10.32   infra-02    <none>           <none>
# node-exporter-d759x                                     2/2     Running   2               42h     192.168.10.23   master-03   <none>           <none>
# node-exporter-jplrr                                     2/2     Running   0               14h     192.168.10.42   worker-02   <none>           <none>
# node-exporter-k498r                                     2/2     Running   2               42h     192.168.10.21   master-01   <none>           <none>
# node-exporter-xcxv5                                     2/2     Running   0               15h     192.168.10.41   worker-01   <none>           <none>
# openshift-state-metrics-6469575fd-sknv5                 3/3     Running   0               3m11s   172.21.8.9      infra-02    <none>           <none>
# prometheus-adapter-765d86b6c9-ffps5                     1/1     Running   0               3m10s   172.21.6.9      infra-01    <none>           <none>
# prometheus-adapter-765d86b6c9-s8r7p                     1/1     Running   0               3m10s   172.21.8.10     infra-02    <none>           <none>
# prometheus-k8s-0                                        6/6     Running   0               2m1s    172.21.10.13    infra-03    <none>           <none>
# prometheus-k8s-1                                        6/6     Running   0               3m3s    172.21.8.11     infra-02    <none>           <none>
# prometheus-operator-5d45f8bb65-fhgsf                    2/2     Running   0               3m17s   172.21.10.10    infra-03    <none>           <none>
# prometheus-operator-admission-webhook-b847d7dd4-82s44   1/1     Running   0               3m22s   172.21.10.9     infra-03    <none>           <none>
# prometheus-operator-admission-webhook-b847d7dd4-f7gnt   1/1     Running   0               3m22s   172.21.6.8      infra-01    <none>           <none>
# thanos-querier-696b585794-gwvdj                         6/6     Running   0               3m8s    172.21.6.10     infra-01    <none>           <none>
# thanos-querier-696b585794-ws5rr                         6/6     Running   0               3m8s    172.21.10.11    infra-03    <none>           <none>

8.2.4. for logging

The default OpenShift installation does not include the logging stack, so there is nothing to move for now. If logging is installed later, it can be pinned to the infra nodes with the same pattern; a hedged sketch follows the link below.

The official document:

  • https://docs.openshift.com/container-platform/4.12/machine_management/creating-infrastructure-machinesets.html#infrastructure-moving-logging_creating-infrastructure-machinesets
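
If you do install logging later, the following is a minimal sketch of how the ClusterLogging instance could be pinned to the infra nodes, following the document linked above. It assumes the default instance name and the openshift-logging namespace, and is not needed as long as logging stays uninstalled.

cat << EOF > ${BASE_DIR}/data/install/cm-logging-infra.yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
  visualization:
    type: kibana
    kibana:
      replicas: 1
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF
# oc apply -f ${BASE_DIR}/data/install/cm-logging-infra.yaml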

9. install ODF

9.1. download additional installation media

# on helper
# try to find out the correct operator version.

# to list all channel
oc get PackageManifest -o json | jq -r ' .items[] | "\(.metadata.name),\(.status.channels[].name),\(.status.channels[].currentCSVDesc.version)" ' | column -ts $',' | grep odf
# ocs-client-operator                               stable-4.12                4.12.3-rhodf
# odf-multicluster-orchestrator                     stable-4.11                4.11.8
# odf-multicluster-orchestrator                     stable-4.12                4.11.8
# odf-multicluster-orchestrator                     stable-4.11                4.12.3-rhodf
# odf-multicluster-orchestrator                     stable-4.12                4.12.3-rhodf
# ocs-operator                                      stable-4.12                4.12.3-rhodf
# ocs-operator                                      stable-4.11                4.12.3-rhodf
# odr-hub-operator                                  stable-4.11                4.12.3-rhodf
# odr-hub-operator                                  stable-4.12                4.12.3-rhodf
# ibm-storage-odf-operator                          stable-v1.3                1.3.0
# mcg-operator                                      stable-4.11                4.12.3-rhodf
# mcg-operator                                      stable-4.12                4.12.3-rhodf
# odf-operator                                      stable-4.11                4.11.8
# odf-operator                                      stable-4.12                4.11.8
# odf-operator                                      stable-4.11                4.12.3-rhodf
# odf-operator                                      stable-4.12                4.12.3-rhodf
# odf-csi-addons-operator                           stable-4.11                4.11.8
# odf-csi-addons-operator                           stable-4.12                4.11.8
# odf-csi-addons-operator                           stable-4.11                4.12.3-rhodf
# odf-csi-addons-operator                           stable-4.12                4.12.3-rhodf
# odr-cluster-operator                              stable-4.11                4.12.3-rhodf
# odr-cluster-operator                              stable-4.12                4.12.3-rhodf

# on vultr host

cat > /data/ocp4/mirror.yaml << EOF
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  platform:
    architectures:
      - amd64
      # - arm64
  #   channels:
  #     - name: stable-4.12
  #       type: ocp
  #       minVersion: 4.12.16
  #       maxVersion: 4.12.16
  #       shortestPath: true
  #   graph: false
  # additionalImages:
  #   - name: registry.redhat.io/redhat/redhat-operator-index:v4.12
  #   - name: registry.redhat.io/redhat/certified-operator-index:v4.12
  #   - name: registry.redhat.io/redhat/community-operator-index:v4.12
  #   - name: registry.redhat.io/redhat/redhat-marketplace-index:v4.12 
  #   - name: quay.io/openshift/origin-kube-rbac-proxy:latest
  #   - name: quay.io/wangzheng422/debug-pod:alma-9.1
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      packages:
      - name: odf-operator                                 
        channels:
        - name: stable-4.12
          minVersion: 4.12.3-rhodf
      - name: local-storage-operator
        channels:
        - name: stable
          minVersion: 4.12.0-202305101515
EOF


mkdir -p /data/ocp-install/oc-mirror/
cd /data/ocp-install/oc-mirror/

cd /data/wzh.work
oc-mirror --config /data/ocp4/mirror.yaml file:///data/ocp-install/oc-mirror/

# sync back to demo lab jumpbox
cd /data
rsync -P -arz  /data/ocp-install root@10.229.104.55:/home/wzh/

# on helper vm node
rsync -P -arz  root@192.168.10.90:/home/wzh/ocp-install /data/

# import the image to internal registry
oc-mirror --from=/data/ocp-install/oc-mirror/mirror_seq1_000000.tar \
  docker://quay.demolab-infra.wzhlab.top:8443

# as user 3node
oc get OperatorHub/cluster -o yaml
# ......
# spec: {}
# status:
#   sources:
#   - disabled: false
#     name: certified-operators
#     status: Success
#   - disabled: false
#     name: community-operators
#     status: Success
#   - disabled: false
#     name: redhat-marketplace
#     status: Success
#   - disabled: false
#     name: redhat-operators
#     status: Success


cat << EOF > ${BASE_DIR}/data/install/hub.disable.yaml
spec:
  sources: [ 
    {
      name: "certified-operators",
      disabled: true
    },
    {
      name: "community-operators",
      disabled: true
    },
    {
      name: "redhat-marketplace",
      disabled: true
    }
  ]
EOF
oc patch OperatorHub/cluster --type merge \
  --patch-file=${BASE_DIR}/data/install/hub.disable.yaml
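
To confirm the patch took effect, check the operator hub status and the remaining catalog sources; only redhat-operators should stay enabled. These are read-only queries.

oc get operatorhub cluster -o json | jq -r '.status.sources[] | "\(.name) disabled: \(.disabled)"'

oc get catalogsource -n openshift-marketplace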

9.2. install ODF

Installing ODF is straightforward. Just follow the official document:

  • https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/deploying_openshift_data_foundation_using_bare_metal_infrastructure/index#deploy-using-local-storage-devices-bm

First, install the local storage operator. It initializes the disks and exposes them for ODF to consume.

Click install.

Enable monitoring; this is optional.

Just wait until the operator is ready.

Then, install ODF.

Click install.

Keep the default config.

After some time, the odf operator is ready.

Next, initialize ODF by creating a storage system.

Keep the default config in the first step.

Then a config is applied to the local storage operator, which auto-discovers the local disks. Wait for a while and it will list all the nodes in the cluster along with their disks. Select only the infra nodes.

After clicking next, a local volume set is created in the local storage operator. The local disks are encapsulated as local volumes and consumed by ODF.

In the next step, keep the defaults, or optionally taint the nodes. We already tainted the nodes, so nothing to do here.

In the next step, keep the default config.

Review your config, and begin the creation.

Wait for a while until it is ready; remember to refresh the web console.

You can see that block and file storage are healthy.

The object service is healthy as well. You can also double check from the CLI, as shown below.
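
# double check ODF status from the CLI (read-only queries)
oc get storagecluster -n openshift-storage
oc get cephcluster -n openshift-storage
oc get pod -n openshift-storage | head
oc get sc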

The default storage class does not fit our needs, so we will create a new one. To save space we will use 2 replicas instead of the default 3, and we will also enable compression.

Select the rbd provisioner, and create a new pool.

In the popup, set the data replication policy to 2-way, and enable compression.

The block pool is created successfully.

Keep the other config at its defaults.

Then the new storage class is ready to use. A rough YAML equivalent of what the console wizard creates is sketched below.
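
For reference, this is roughly what the console wizard creates behind the scenes. It is a sketch only: the pool and storage class names (wzh-blockpool-2rep, wzh-ceph-rbd-2rep) are made up here, and the CSI secret parameters follow the usual ODF defaults, so compare with the objects the wizard actually generated before reusing it.

cat << EOF > ${BASE_DIR}/data/install/ceph-blockpool-2rep.yaml
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: wzh-blockpool-2rep
  namespace: openshift-storage
spec:
  # 2-way replication to save space (default is 3-way)
  replicated:
    size: 2
  # enable on-the-fly compression for this pool
  compressionMode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: wzh-ceph-rbd-2rep
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: wzh-blockpool-2rep
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
# oc create --save-config -f ${BASE_DIR}/data/install/ceph-blockpool-2rep.yaml

# compare with what the wizard actually created
oc get cephblockpool -n openshift-storage
oc get sc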

9.3. patch for csi components

Bringing the csi components to the infra nodes is NOT needed in our demo lab.

You can skip the steps below.

Official document:

  • https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/managing_and_allocating_storage_resources/index#managing-container-storage-interface-component-placements_rhodf
# bring csi components to infra node
# NOT needed in our demo lab
# you can ignore it
oc get configmap rook-ceph-operator-config -n openshift-storage -o yaml
# apiVersion: v1
# kind: ConfigMap
# metadata:
#   creationTimestamp: "2023-06-01T13:00:58Z"
#   name: rook-ceph-operator-config
#   namespace: openshift-storage
#   resourceVersion: "1139866"
#   uid: 94177029-8189-4725-b712-0dbbc6fef71a

cat << EOF > ${BASE_DIR}/data/install/odf.csi-patch.yaml
data:
  CSI_PLUGIN_TOLERATIONS: |
    - key: nodetype
      operator: Equal
      value: infra
      effect: NoSchedule
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
EOF
oc patch configmap rook-ceph-operator-config -n openshift-storage --type merge \
  --patch-file=${BASE_DIR}/data/install/odf.csi-patch.yaml

10. enable the integrated image registry

With ODF installed we now have backend storage, so we can activate the internal image registry with this ODF backend.

official document:


# create the pvc for image registry
cat << EOF > ${BASE_DIR}/data/install/pvc.image.registry.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ocs4registry
  namespace: openshift-image-registry
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
  storageClassName: ocs-storagecluster-cephfs
EOF

oc create --save-config -f ${BASE_DIR}/data/install/pvc.image.registry.yaml

# then patch the cluster object to enable image register and use the pvc
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":"ocs4registry"}}}}' --type=merge

# if you want to restore
oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Removed"}}' --type=merge

oc get clusteroperator image-registry
# NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
# image-registry   4.12.16   True        True          False      3d5h    Progressing: The deployment has not completed...

# wait some time for the image registry to be ready

# export the image registry route
oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge

HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
echo $HOST
# default-route-openshift-image-registry.apps.demolab-ocp.wzhlab.top

# login as kubeadmin, and podman login to try
oc login https://api.demolab-ocp.wzhlab.top:6443 -u kubeadmin

podman login -u kubeadmin -p $(oc whoami -t) --tls-verify=false $HOST
# Login Succeeded!

# push a demo image to default namespace
podman pull quay.demolab-infra.wzhlab.top:8443/wangzheng422/debug-pod:alma-9.1
podman tag quay.demolab-infra.wzhlab.top:8443/wangzheng422/debug-pod:alma-9.1 $HOST/default/debug-pod:alma-9.1
podman push $HOST/default/debug-pod:alma-9.1
# Getting image source signatures
# Copying blob 4bfa56c571a7 skipped: already exists
# Copying blob 326f1ea0e5d2 skipped: already exists
# Copying config 1d9fb6b20f done
# Writing manifest to image destination
# Storing signatures

# you can see the image as an image stream on the ocp platform
oc get is
# NAME        IMAGE REPOSITORY                                                                       TAGS       UPDATED
# debug-pod   default-route-openshift-image-registry.apps.demolab-ocp.wzhlab.top/default/debug-pod   alma-9.1   About a minute ago

# oc config switch context back
oc config use-context admin

oc get configs.imageregistry.operator.openshift.io cluster -o json | jq .spec
# {
#   "defaultRoute": true,
#   "httpSecret": "5bb05db2f8a67fcfe9809bf83ae4a492d4ebcf51a50a29a10bea5fda938300f7d6e6c12618a618f8444d0f9579e5ca2f26120b8d90c480a564011cbf356d3528",
#   "logLevel": "Normal",
#   "managementState": "Managed",
#   "observedConfig": null,
#   "operatorLogLevel": "Normal",
#   "proxy": {},
#   "replicas": 1,
#   "requests": {
#     "read": {
#       "maxWaitInQueue": "0s"
#     },
#     "write": {
#       "maxWaitInQueue": "0s"
#     }
#   },
#   "rolloutStrategy": "RollingUpdate",
#   "storage": {
#     "managementState": "Unmanaged",
#     "pvc": {
#       "claim": "ocs4registry"
#     }
#   },
#   "unsupportedConfigOverrides": null
# }

You can see the image stream from the web console.

11. end

Install and run openstack on openshift 4.11

➡️In English (google translated)

This article describes how to activate the openstack operator on openshift and eventually install an openstack overcloud. The goal is to bring up an openstack on openshift cluster in a home lab, for further study and research.

Due to limits of the author's skill, time and lab environment, the approach used here has many shortcomings and perhaps even errors; corrections are welcome.

Background

Before diving in, a few words about why we run this openstack on openshift experiment at all.

People say this is the cloud era, and usually that means k8s and containers as the cloud foundation. But before k8s appeared, clouds were built on openstack and virtual machines. For historical reasons openstack still has a large installed base, especially in telco, where many private clouds are built on it. How to move those users toward today's k8s/container clouds is a big question.

openstack on openshift is one attempt, one technical approach, to drive that transition. Most solutions we have seen so far use openstack as the VM cloud foundation, deploy k8s as an application/paas on top of the VMs, and hand that paas to the customer. openstack on openshift takes the opposite approach: k8s/openshift is the cloud foundation, openstack is deployed as an application on the container cloud, and the resulting openstack overcloud cluster is handed to the customer.

In this article we show how openshift, as the cloud foundation, makes this possible.

Lab environment and architecture diagram

The experiment runs mainly on one physical machine with 24 cores and 128G of memory. The deployment architecture is shown below.

  • Two bridges are created on the physical host (this will be improved later with ovs and vlan configuration).
  • Four kvm guests are created, each with 4 network interfaces and, for convenience, 4 disks; in fact only the worker node really needs the 3 extra disks.
  • A virtual BMC is created; the openshift IPI installation needs it.
  • Memory overcommit and thin-provisioned disks are enabled on the host, to squeeze the most out of the hardware.

Steps

  1. Install openshift in IPI mode, using the 3 node / compact cluster layout
  2. Adopt the worker node / kvm
  3. Install the cnv, nmstate, sriov, nfs and other openshift add-ons
  4. Install the openstack operator
  5. Configure the openstack parameters and deploy the openstack overcloud

Video walkthrough

prepare host

Our lab runs on a single physical machine with limited resources, so we enable memory overcommit in order to create more virtual machines. Also, the openstack controller runs as a virtual machine inside an openshift node; since our nodes are already kvm guests, nested virtualization must be enabled.

Another important step is preparing the network. We create 2 bridges, which is the openshift official approach. It is not a perfect fit for this experiment but it works; the drawbacks are that multiple vlans end up mixed into the bridges, and that in a multi-host deployment, interfaces under a vlan do not reach across hosts. This will be improved later with a manually deployed ovs; for now we make do.

memory over commit

First, let's enable memory overcommit. Without the following setting, the total memory of all kvm guests cannot exceed the physical memory, which is very inconvenient for our experiments, so we turn it on. No reboot is needed.


cat << EOF >> /etc/sysctl.d/99-wzh-sysctl.conf

vm.overcommit_memory = 1

EOF
sysctl --system
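
# quick check: the kernel now always allows overcommit (1 = always overcommit)
cat /proc/sys/vm/overcommit_memory
# 1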

nested virtualization

Next, enable nested virtualization so we can run a kvm inside a kvm. Admittedly the inner kvm is really slow, but for an experiment, being able to run it at all is what matters. No reboot is needed.

# first, go to kvm host to config nested kvm
# https://zhuanlan.zhihu.com/p/35320117
cat /sys/module/kvm_intel/parameters/nested
# 0

cat << EOF > /etc/modprobe.d/kvm-nested.conf
options kvm_intel nested=1  
options kvm-intel enable_shadow_vmcs=1   
options kvm-intel enable_apicv=1         
options kvm-intel ept=1                  
EOF

modprobe -r kvm_intel   # unload the kvm_intel kernel module; make sure all VMs are shut down before doing this
modprobe -a kvm_intel   # reload the module

cat /sys/module/kvm_intel/parameters/nested
# 1

cat /proc/cpuinfo | grep vmx

# on guest os
# if you see this file, means it is success.
ls /dev/kvm
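
# optional: libvirt ships a validator that confirms hardware virt is usable
# (assumes the libvirt client tools are installed on the host you run it on)
virt-host-validate qemu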

prepare network on 103

Next we create 2 bridges on the physical host: baremetal and provisioning. Why two? Because that is what the openshift IPI installation requires. Actually IPI supports two network modes: a single-network mode that only needs baremetal, and a dual-network mode that needs both baremetal and provisioning. So can we get away with the single baremetal network?

The openstack official documentation says the provisioning network is mandatory, and unfortunately this time the documentation is right. The author tried the baremetal-only mode: at the final step of deploying the openstack overcloud computeHCI node, the openstack operator instructs openshift to flash an operating system image onto the worker node, and this time it is not coreos but the rhel image we provide. At this step, probably due to a limitation of the openstack operator, the image must be served over the provisioning network. Perhaps a future release will drop the dual-network requirement.

In the architecture diagram, this step corresponds to this part:

bridge: baremetal

Following the openshift official documentation, we first create the baremetal bridge.


# create the virtual network for the lab

mkdir -p /data/kvm
cd /data/kvm

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.103/24"
nmcli con up baremetal

vxlan: provisioning

Then we need the other bridge, provisioning. If the physical host had 2 NICs we could create this bridge on the second NIC, which is what the author used to do. Here we try another approach, as preparation for moving to an sdn later.

On the same NIC as before, we create a vxlan interface and attach it to the provisioning bridge.

We just follow the official documentation.


nmcli connection add type bridge con-name br-prov ifname br-prov ipv4.method disabled ipv6.method disabled

nmcli con modify br-prov ipv4.method manual ipv4.address 172.22.0.1/24

nmcli connection add type vxlan slave-type bridge con-name br-prov-vxlan5 ifname vxlan5 id 5 local 172.21.6.103 remote 172.21.6.102 master br-prov

nmcli connection up br-prov

bridge fdb show dev vxlan5
# c2:d2:b2:e7:6e:f5 vlan 1 master br-prov permanent
# c2:d2:b2:e7:6e:f5 master br-prov permanent
# 00:00:00:00:00:00 dst 172.21.6.103 self permanent
# ce:1f:f5:9e:f8:7f dst 172.21.6.103 self

cat << EOF > /data/kvm/vxlan5-bridge.xml
<network>
 <name>provisioning</name>
 <forward mode="bridge" />
 <bridge name="br-prov" />
</network>
EOF

virsh net-define /data/kvm/vxlan5-bridge.xml
virsh net-start provisioning
virsh net-autostart provisioning

virsh net-list
#  Name           State    Autostart   Persistent
# -------------------------------------------------
#  default        active   yes         yes
#  provisioning   active   yes         yes

prepare rpm repo on helper

Deploying openstack on openshift is conceptually not complicated: openshift provides the virtual machines (through cnv) and the physical machines (through the machine api), all with the operating system already installed. openshift then prepares a set of ansible playbooks; the administrator goes into the designated pod and runs the openstack installation playbooks there. From that point on, the steps are the same as a regular openstack installation.

Since the process is the same as installing openstack, we have to prepare the rpm repos that openstack requires. As usual, we download them on an overseas vps first and then sync them back.

Next we write a repo config file for the downloaded rpm repos, so they can be imported later during the openstack installation.

Finally we set up an auto-starting web server to serve the rpm repos.

# sync repo on vultr
dnf install -y yum-utils

cd /mnt/blockstorage/

subscription-manager release --set=8.6

declare -a arr=("rhel-8-for-x86_64-baseos-eus-rpms" 
                "rhel-8-for-x86_64-appstream-eus-rpms" 
                "rhel-8-for-x86_64-highavailability-eus-rpms"
                "ansible-2.9-for-rhel-8-x86_64-rpms"
                openstack-16.2-for-rhel-8-x86_64-rpms
                fast-datapath-for-rhel-8-x86_64-rpms
                )

for i in "${arr[@]}"
do
   dnf reposync --repoid="$i" -m --download-metadata -n --delete
done

# baseos should be synced with the old version as well
dnf reposync --repoid=rhel-8-for-x86_64-baseos-eus-rpms -m --download-metadata --delete 

# on local / helper
declare -a arr=("rhel-8-for-x86_64-baseos-eus-rpms" 
                "rhel-8-for-x86_64-appstream-eus-rpms" 
                "rhel-8-for-x86_64-highavailability-eus-rpms"
                "ansible-2.9-for-rhel-8-x86_64-rpms"
                openstack-16.2-for-rhel-8-x86_64-rpms
                fast-datapath-for-rhel-8-x86_64-rpms
                )

VAR_IP=158.247.234.245

for i in "${arr[@]}"
do
   rsync -P --delete -arz root@$VAR_IP:/mnt/blockstorage/$i /data/dnf/
done

# after download , we create a repo config file
# this will be used later when install openstack
echo > /data/dnf/osp.repo
for i in "${arr[@]}"
do
cat << EOF >> /data/dnf/osp.repo
[$i]
name=$i
baseurl=http://192.168.7.11:5000/$i
enabled=1
gpgcheck=0
EOF
done

# setup web server startup service
# let the web server auto start
cat << EOF > /etc/systemd/system/local-webserver-osp.service
[Unit]
Description=local-webserver-osp

[Service]
User=root
WorkingDirectory=/data/dnf
ExecStart=/bin/bash -c 'python3 -m http.server 5000'
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

systemctl enable --now local-webserver-osp.service

lvs config

To squeeze the server resources we also configure lvm thin provisioning, which uses disk space efficiently and avoids waste. Simply put, lvm thin provisioning is overselling the disk.


pvcreate -y /dev/sdb
vgcreate vgdata /dev/sdb

# https://access.redhat.com/articles/766133
lvcreate -y -n poolA -L 500G vgdata
lvcreate -y -n poolA_meta -L 1G vgdata
lvconvert -y --thinpool vgdata/poolA --poolmetadata vgdata/poolA_meta
  # Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
  # WARNING: Converting vgdata/poolA and vgdata/poolA_meta to thin pool's data and metadata volumes with metadata wiping.
  # THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  # Converted vgdata/poolA and vgdata/poolA_meta to thin pool.

lvextend -l +100%FREE vgdata/poolA
  # Rounding size to boundary between physical extents: <1.09 GiB.
  # Size of logical volume vgdata/poolA_tmeta changed from 1.00 GiB (256 extents) to <1.09 GiB (279 extents).
  # Size of logical volume vgdata/poolA_tdata changed from 500.00 GiB (128000 extents) to <1.09 TiB (285457 extents).
  # Logical volume vgdata/poolA successfully resized.
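
# double check the thin pool layout and usage (Data% shows how oversold the pool is)
lvs vgdata
vgs vgdata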

kvm setup

With the preparation above done, we start creating the kvm guests. Experiments get reinstalled over and over, so we first have a cleanup script, and then separate scripts to create the kvm guests. Note that we only define the kvm guests here; we do not start them.

cleanup

We prepared a script to clean up the kvm guests and return the physical host to a clean state.


create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

virsh destroy ocp4-ipi-osp-master-01
virsh undefine ocp4-ipi-osp-master-01

create_lv vgdata poolA lv-ocp4-ipi-osp-master-01 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data-03 100G 

virsh destroy ocp4-ipi-osp-master-02
virsh undefine ocp4-ipi-osp-master-02

create_lv vgdata poolA lv-ocp4-ipi-osp-master-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data-03 100G 

virsh destroy ocp4-ipi-osp-master-03
virsh undefine ocp4-ipi-osp-master-03

create_lv vgdata poolA lv-ocp4-ipi-osp-master-03 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data-03 100G 

virsh destroy ocp4-ipi-osp-worker-01
virsh undefine ocp4-ipi-osp-worker-01

create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01 200G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data-03 100G 

virsh destroy ocp4-ipi-osp-worker-02
virsh undefine ocp4-ipi-osp-worker-02

create_lv vgdata poolA lv-ocp4-ipi-osp-worker-02 200G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-02-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-02-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-02-data-03 100G 

virsh destroy ocp4-ipi-osp-worker-03
virsh undefine ocp4-ipi-osp-worker-03

create_lv vgdata poolA lv-ocp4-ipi-osp-worker-03 200G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-03-data 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-03-data-02 100G 
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-03-data-03 100G 

VAR_VM=`virsh list --all | grep bootstrap | awk '{print $2}'`
virsh destroy $VAR_VM
virsh undefine $VAR_VM
VAR_POOL=`virsh pool-list --all | grep bootstrap | awk '{print $1}'`
virsh pool-destroy $VAR_POOL
virsh pool-undefine $VAR_POOL
/bin/rm -rf /var/lib/libvirt/openshift-images/*
/bin/rm -rf /var/lib/libvirt/images/*


define kvm on 103

Now we can define the kvm guests. They must not be started here: the defined guests have no boot disk, so starting them would not install anything. In IPI mode, the installer calls the virtual bmc redfish interface to attach a boot image to each kvm and kick off the installation.

For simplicity, every kvm gets 4 disks and 4 NICs, although only the worker node kvm actually uses the 4 disks. The vda disk is larger because it also hosts the in-cluster nfs server. Since we configured lvm thin provisioning, we can be generous with the lv sizes.


/bin/rm -rf /var/lib/libvirt/images/*

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

SNO_MEM=32
export KVM_DIRECTORY=/data/kvm

virsh destroy ocp4-ipi-osp-master-01
virsh undefine ocp4-ipi-osp-master-01

create_lv vgdata poolA lv-ocp4-ipi-osp-master-01 500G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data-02 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-01-data-03 100G recreate

virt-install --name=ocp4-ipi-osp-master-01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-01-data,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-01-data-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-01-data-03,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.4 \
  --network bridge=baremetal,model=virtio \
  --network network:provisioning,model=virtio \
  --network bridge=baremetal,model=virtio \
  --network bridge=baremetal,model=virtio \
  --print-xml > ${KVM_DIRECTORY}/ocp4-ipi-osp-master-01.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-ipi-osp-master-01.xml

virsh destroy ocp4-ipi-osp-master-02
virsh undefine ocp4-ipi-osp-master-02

create_lv vgdata poolA lv-ocp4-ipi-osp-master-02 500G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data-02 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-02-data-03 100G recreate

virt-install --name=ocp4-ipi-osp-master-02 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-02-data,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-02-data-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-02-data-03,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.4 \
  --network bridge=baremetal,model=virtio \
  --network network:provisioning,model=virtio \
  --network bridge=baremetal,model=virtio \
  --network bridge=baremetal,model=virtio \
  --print-xml > ${KVM_DIRECTORY}/ocp4-ipi-osp-master-02.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-ipi-osp-master-02.xml


# SNO_MEM=64

virsh destroy ocp4-ipi-osp-master-03
virsh undefine ocp4-ipi-osp-master-03

create_lv vgdata poolA lv-ocp4-ipi-osp-master-03 500G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data-02 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-master-03-data-03 100G recreate

virt-install --name=ocp4-ipi-osp-master-03 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-03,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-03-data,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-03-data-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-master-03-data-03,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.4 \
  --network bridge=baremetal,model=virtio \
  --network network:provisioning,model=virtio \
  --network bridge=baremetal,model=virtio \
  --network bridge=baremetal,model=virtio \
  --print-xml > ${KVM_DIRECTORY}/ocp4-ipi-osp-master-03.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-ipi-osp-master-03.xml

SNO_MEM=16

virsh destroy ocp4-ipi-osp-worker-01
virsh undefine ocp4-ipi-osp-worker-01

create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01 500G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data-02 100G recreate
create_lv vgdata poolA lv-ocp4-ipi-osp-worker-01-data-03 100G recreate

virt-install --name=ocp4-ipi-osp-worker-01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-worker-01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-worker-01-data,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-worker-01-data-02,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lv-ocp4-ipi-osp-worker-01-data-03,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.4 \
  --network bridge=baremetal,model=virtio \
  --network network:provisioning,model=virtio \
  --network bridge=baremetal,model=virtio \
  --network bridge=baremetal,model=virtio \
  --print-xml > ${KVM_DIRECTORY}/ocp4-ipi-osp-worker-01.xml
virsh define --file ${KVM_DIRECTORY}/ocp4-ipi-osp-worker-01.xml

bmc simulator

Having defined the kvm guests, we need a matching virtual BMC / redfish endpoint to control them. This simulates real physical machines, where the openshift installer calls the redfish interface to control the hardware.

We use the sushy tool from the openstack project as the virtual BMC. A single sushy instance can manage all kvm guests on the same physical host, which is simple and convenient.

Finally we define a systemd service so sushy starts automatically.

In the architecture diagram, this step corresponds to this part:

# try to install and run it manually
dnf -y install python3-pip
pip3 install --user sushy-tools

mkdir -p /etc/crts
scp root@192.168.7.11:/etc/crts/* /etc/crts/

/root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key

# try to deploy as systemd service
cat << EOF > /etc/systemd/system/sushy-emulator.service
[Unit]
Description=sushy-emulator

[Service]
User=root
WorkingDirectory=/root
ExecStart=/bin/bash -c '/root/.local/bin/sushy-emulator -i 0.0.0.0 --ssl-certificate /etc/crts/redhat.ren.crt --ssl-key /etc/crts/redhat.ren.key'
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

systemctl enable --now sushy-emulator.service

get mac and vm list on 103

With the virtual BMC in place, we extract the parameters the openshift installer needs: the kvm mac addresses and the uuids used by redfish.

The following script collects them automatically and uploads them to the helper node.


# on helper clean all
/bin/rm -f /data/install/mac.list.*
/bin/rm -f /data/install/vm.list.*

# back to 103
cd /data/kvm/
for i in ocp4-ipi-osp-master-0{1..3} ocp4-ipi-osp-worker-0{1..1}
do
  echo -ne "${i}\t" ; 
  virsh dumpxml ${i} | grep "mac address" | cut -d\' -f2 | tr '\n' '\t'
  echo 
done > mac.list.103
cat /data/kvm/mac.list.103
# ocp4-ipi-osp-master-01  52:54:00:67:64:5f       52:54:00:e8:28:e7       52:54:00:4a:a4:39
# ocp4-ipi-osp-master-02  52:54:00:ac:ed:36       52:54:00:b5:34:c4       52:54:00:87:36:75
# ocp4-ipi-osp-master-03  52:54:00:ae:72:e5       52:54:00:87:19:c2       52:54:00:99:55:12
# ocp4-ipi-osp-worker-01  52:54:00:17:b2:2d       52:54:00:ca:74:c0       52:54:00:f4:5e:a8

cat << 'EOF' > redfish.sh
#!/usr/bin/env bash

curl -k -s https://127.0.0.1:8000/redfish/v1/Systems/ | jq -r '.Members[]."@odata.id"' >  list

while read -r line; do
    curl -k -s https://127.0.0.1:8000/$line | jq -j '.Id, " ", .Name, "\n" '
done < list

EOF
bash redfish.sh | grep ipi > /data/kvm/vm.list.103
cat /data/kvm/vm.list.103
# 6b9a4f6b-d751-4fd5-9493-39792039e9e2 ocp4-ipi-osp-worker-01
# 1a2d1e2a-5f50-49cf-920e-11f7b7f136dc ocp4-ipi-osp-master-02
# 9c7085a2-ed0c-4cbf-94ca-065d3e8db335 ocp4-ipi-osp-master-01
# 14474c89-152c-4580-8bbb-7f03e4e370e0 ocp4-ipi-osp-master-03

scp /data/kvm/{mac,vm}.list.* root@192.168.7.11:/data/install/

on helper node

All the preparation is finally done, and we start the openshift installation from the helper. Before this there is also a step to configure the helper node itself, mainly dns and similar services, which we do not repeat here; see the documentation here if needed.

get installer binary

First we extract the installer binaries from the installation media directory.


# switch to you install version

export BUILDNUMBER=4.11.6

pushd /data/ocp4/${BUILDNUMBER}
tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf oc-mirror.tar.gz -C /usr/local/bin/
chmod +x /usr/local/bin/oc-mirror
install -m 755 /data/ocp4/clients/butane-amd64 /usr/local/bin/butane
install -m 755 /data/ocp4/clients/coreos-installer_amd64 /usr/local/bin/coreos-installer
popd

prepare web server for iso/images

Next we set up an auto-starting web server to serve the iso and other images.

############################
# as root create web server
cd /data/ocp4

python3 -m http.server 8080

cat << EOF > /etc/systemd/system/local-webserver.service
[Unit]
Description=local-webserver

[Service]
User=root
WorkingDirectory=/data/ocp4
ExecStart=/bin/bash -c 'python3 -m http.server 8080'
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

systemctl enable --now local-webserver.service

# end as root
############################

create the install yaml

Next we create the installation config files. The key piece is the yaml template: in it we enable the IPI installation mode, configure the redfish endpoints of the 3 masters, and enable the static IP installation method with the static IP information.

After the install-config yaml is created, we call the installer to turn it into ignition and the other actual installation artifacts, and copy them, together with the baremetal installer binary, to the physical host.

There are 2 binaries involved. One is the regular openshift installer, which is enough for common scenarios such as public or private clouds: it creates the ignition files and calls the cloud APIs to create the VMs and start the installation.

For the baremetal scenario, however, there is a separate openshift-baremetal-install binary, which reads the config and calls the machines' BMC interfaces to drive the installation. That is how current openshift versions are split; it may change in the future.

# create a user and create the cluster under the user

useradd -m 3nodeipi

su - 3nodeipi

ssh-keygen

cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

chmod 600 ~/.ssh/config

cat << 'EOF' >> ~/.bashrc

export BASE_DIR='/home/3nodeipi/'

EOF

export BASE_DIR='/home/3nodeipi/'
export BUILDNUMBER=4.11.6

mkdir -p ${BASE_DIR}/data/{sno/disconnected,install}

# set some parameters of your cluster

NODE_SSH_KEY="$(cat ${BASE_DIR}/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren:8443

PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:shadowman' | openssl base64 )'","email": "noemail@localhost"}}}'

NTP_SERVER=192.168.7.11
HELP_SERVER=192.168.7.11
KVM_HOST=192.168.7.11
API_VIP=192.168.7.100
INGRESS_VIP=192.168.7.101
CLUSTER_PROVISION_IP=192.168.7.103
BOOTSTRAP_IP=192.168.7.12

# define the cluster node information
SNO_CLUSTER_NAME=acm-demo-one
SNO_BASE_DOMAIN=redhat.ren

BOOTSTRAP_IP=192.168.7.22
MASTER_01_IP=192.168.7.23
MASTER_02_IP=192.168.7.24
MASTER_03_IP=192.168.7.25
WORKER_01_IP=192.168.7.26

BOOTSTRAP_HOSTNAME=bootstrap-demo
MASTER_01_HOSTNAME=master-01-demo
MASTER_02_HOSTNAME=master-02-demo
MASTER_03_HOSTNAME=master-03-demo
WORKER_01_HOSTNAME=worker-01-demo

BOOTSTRAP_INTERFACE=enp1s0
MASTER_01_INTERFACE=enp1s0
MASTER_02_INTERFACE=enp1s0
MASTER_03_INTERFACE=enp1s0
WORKER_01_INTERFACE=enp1s0

BOOTSTRAP_DISK=/dev/vda
MASTER_01_DISK=/dev/vda
MASTER_02_DISK=/dev/vda
MASTER_03_DISK=/dev/vda
WORKER_01_DISK=/dev/vda

OCP_GW=192.168.7.11
OCP_NETMASK=255.255.255.0
OCP_NETMASK_S=24
OCP_DNS=192.168.7.11

# echo ${SNO_IF_MAC} > /data/sno/sno.mac

mkdir -p ${BASE_DIR}/data/install
cd ${BASE_DIR}/data/install

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] openshift

cat << EOF > ${BASE_DIR}/data/install/install-config.yaml 
apiVersion: v1
baseDomain: $SNO_BASE_DOMAIN
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 3 
metadata:
  name: $SNO_CLUSTER_NAME
networking:
  # OVNKubernetes , OpenShiftSDN
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.31.0.0/16
  machineNetwork:
  - cidr: 192.168.7.0/24
pullSecret: '${PULL_SECRET}'
sshKey: |
$( cat ${BASE_DIR}/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/openshift/release-images
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ${INSTALL_IMAGE_REGISTRY}/openshift/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
platform:
  baremetal:
    apiVIP: $API_VIP
    ingressVIP: $INGRESS_VIP
    provisioningNetwork: "Managed"
    provisioningNetworkCIDR: 172.22.0.0/24
    provisioningNetworkInterface: enp2s0
    provisioningBridge: br-prov
    clusterProvisioningIP: 172.22.0.6
    bootstrapProvisioningIP: 172.22.0.7
    bootstrapExternalStaticIP: 192.168.7.22/24
    bootstrapExternalStaticGateway: 192.168.7.11
    externalBridge: baremetal
    bootstrapOSImage: http://192.168.7.11:8080/rhcos-qemu.x86_64.qcow2.gz?sha256=$(zcat /data/ocp4/rhcos-qemu.x86_64.qcow2.gz | sha256sum | awk '{print $1}')
    clusterOSImage: http://192.168.7.11:8080/rhcos-openstack.x86_64.qcow2.gz?sha256=$(sha256sum /data/ocp4/rhcos-openstack.x86_64.qcow2.gz | awk '{print $1}')
    hosts:
      - name: ocp4-ipi-osp-master-01
        role: master
        bootMode: legacy
        bmc:
          address: redfish-virtualmedia://192.168.7.103:8000/redfish/v1/Systems/$(cat /data/install/vm.list.* | grep master-01 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat /data/install/mac.list.* | grep master-01 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "$MASTER_01_DISK"
        networkConfig: 
          dns-resolver:
            config:
              server:
              - ${OCP_DNS}
          interfaces:
          - ipv4:
              address:
              - ip: ${MASTER_01_IP}
                prefix-length: ${OCP_NETMASK_S}
              dhcp: false
              enabled: true
            name: ${MASTER_01_INTERFACE}
            state: up
            type: ethernet
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: ${OCP_GW}
              next-hop-interface: ${MASTER_01_INTERFACE}
              table-id: 254
      - name: ocp4-ipi-osp-master-02
        role: master
        bootMode: legacy
        bmc:
          address: redfish-virtualmedia://192.168.7.103:8000/redfish/v1/Systems/$(cat /data/install/vm.list.* | grep master-02 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat /data/install/mac.list.* | grep master-02 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "$MASTER_02_DISK"
        networkConfig: 
          dns-resolver:
            config:
              server:
              - ${OCP_DNS}
          interfaces:
          - ipv4:
              address:
              - ip: ${MASTER_02_IP}
                prefix-length: ${OCP_NETMASK_S}
              dhcp: false
              enabled: true
            name: ${MASTER_02_INTERFACE}
            state: up
            type: ethernet
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: ${OCP_GW}
              next-hop-interface: ${MASTER_02_INTERFACE}
              table-id: 254
      - name: ocp4-ipi-osp-master-03
        role: master
        bootMode: legacy
        bmc:
          address: redfish-virtualmedia://192.168.7.103:8000/redfish/v1/Systems/$(cat /data/install/vm.list.* | grep master-03 | awk '{print $1}')
          username: admin
          password: password
          disableCertificateVerification: True
        bootMACAddress: $(cat /data/install/mac.list.* | grep master-03 | awk '{print $2}')
        rootDeviceHints:
          deviceName: "$MASTER_03_DISK"
        networkConfig: 
          dns-resolver:
            config:
              server:
              - ${OCP_DNS}
          interfaces:
          - ipv4:
              address:
              - ip: ${MASTER_03_IP}
                prefix-length: ${OCP_NETMASK_S}
              dhcp: false
              enabled: true
            name: ${MASTER_03_INTERFACE}
            state: up
            type: ethernet
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: ${OCP_GW}
              next-hop-interface: ${MASTER_03_INTERFACE}
              table-id: 254
EOF

/bin/cp -f ${BASE_DIR}/data/install/install-config.yaml ${BASE_DIR}/data/install/install-config.yaml.bak

/data/ocp4/${BUILDNUMBER}/openshift-baremetal-install --dir ${BASE_DIR}/data/install/ create manifests

/bin/cp -f  /data/ocp4/ansible-helper/files/* ${BASE_DIR}/data/install/openshift/

#############################################
# run as root, if you have not run the steps below at least once
# they generate the registry configuration
# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.infra.redhat.ren:8083

/bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/
#############################################

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml ${BASE_DIR}/data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml ${BASE_DIR}/data/install/openshift

cd ${BASE_DIR}/data/install/


# then, we copy baremetal install binary to kvm host

sshpass -p panpan ssh-copy-id root@172.21.6.103

scp /data/ocp4/${BUILDNUMBER}/openshift-baremetal-install root@172.21.6.103:/usr/local/bin/

# then, we copy the configuration files to the kvm host

cat << EOF > ${BASE_DIR}/data/install/scp.sh
ssh root@172.21.6.103 "rm -rf /data/install;"

scp -r ${BASE_DIR}/data/install root@172.21.6.103:/data/install
EOF

bash ${BASE_DIR}/data/install/scp.sh

kvm host (103) to begin install

Everything is now in place, and we can start the actual installation on the physical host. There is nothing special to do at this point: IPI mode is fully automatic, so we run the command, wait for a successful result, and record the passwords from the output.


cd /data/install
openshift-baremetal-install --dir /data/install/ --log-level debug create cluster
# ......
# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run
# INFO     export KUBECONFIG=/data/install/auth/kubeconfig
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.acm-demo-one.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "JgTXJ-d9Nsb-QHGS2-Puor3"
# DEBUG Time elapsed per stage:
# DEBUG          bootstrap: 23s
# DEBUG            masters: 16m31s
# DEBUG Bootstrap Complete: 19m11s
# DEBUG  Bootstrap Destroy: 11s
# DEBUG  Cluster Operators: 7m10s
# INFO Time elapsed: 43m37s

# tail -f /data/install/.openshift_install.log

on helper to see result

We copy the credentials and other key files from the physical host back to the helper node, to make later operations easier.

# on helper node
scp -r root@172.21.6.103:/data/install/auth ${BASE_DIR}/data/install/auth

cd ${BASE_DIR}/data/install
export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig
echo "export KUBECONFIG=${BASE_DIR}/data/install/auth/kubeconfig" >> ~/.bashrc
# oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null


# if you power off the cluster for a long time
# you will need to re-approve the csr
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

password login and oc config

The installation is done; now we adjust some ssh login settings on the nodes. By default openshift disables root ssh login and enables a session timeout, which is painful for experiments, so the script below removes those restrictions. In the end we can simply ssh in remotely as root with a password.


# init setting for helper node
cat << EOF > ~/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
chmod 600 ~/.ssh/config


cat > ${BASE_DIR}/data/install/crack.txt << EOF

echo redhat | sudo passwd --stdin root

sudo sed -i "s|^PasswordAuthentication no$|PasswordAuthentication yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^PermitRootLogin no$|PermitRootLogin yes|g" /etc/ssh/sshd_config
sudo sed -i "s|^#ClientAliveInterval 180$|ClientAliveInterval 1800|g" /etc/ssh/sshd_config

sudo systemctl restart sshd

sudo sh -c 'echo "export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig" >> /root/.bashrc'

sudo sh -c 'echo "RET=\\\`oc config use-context system:admin\\\`" >> /root/.bashrc'

EOF

for i in 23 24 25
do
  ssh core@192.168.7.$i < ${BASE_DIR}/data/install/crack.txt
done

from other host

Now that password login works, we also want passwordless ssh. Since there are many openshift nodes, configuring them one by one is tedious, so here is a script to do it in bulk.

# https://unix.stackexchange.com/questions/230084/send-the-password-through-stdin-in-ssh-copy-id
dnf install -y sshpass

for i in 23 24 25
do
  sshpass -p 'redhat' ssh-copy-id root@192.168.7.$i
done

power off

A home lab is usually powered off when idle to save electricity, so here is a script to conveniently shut down the openshift cluster.


for i in 23 24 25
do
  ssh root@192.168.7.$i poweroff
done

reboot

Sometimes all openshift nodes need a reboot to clear errors. Here is a script for that as well.


for i in 23 24 25
do
  ssh root@192.168.7.$i reboot
done

power on

Likewise, a script helps start the virtual machines in bulk.


# or

for i in {1..3}
do
  virsh start ocp4-ipi-osp-master-0$i
done

check info

We also routinely collect information from the cluster nodes and run small scripted operations; here are script templates to help with that daily work.


for i in 23 24 25
do
  ssh root@192.168.7.$i "ip a"
done

cat > ${BASE_DIR}/data/install/crack.txt << 'EOF'

for i in {3..8}
do
  nmcli con down enp${i}s0
  nmcli con del enp${i}s0
done

EOF

for i in 23 24 25
do
  ssh root@192.168.7.$i < ${BASE_DIR}/data/install/crack.txt
done

try to deploy gitea

Next we deploy a git service on the helper node. During the openstack installation, the install scripts and configuration are first pushed to a git server as a commit; the actual deployment then downloads from that commit and executes it.

We use gitea to provide this git service; there are plenty of tutorials online, so we install it in the simplest possible way. openstack has a special requirement for the git service: it must be reachable over ssh, which needs testing. We also use a non-standard ssh port, which later caused a series of compatibility errors.

In any case, ssh access to the git service has to be configured, and we also need ssh key based authentication.

Finally, the command lines to test ssh access to the git service are given for verification.

rm -rf /data/ccn/gitea
mkdir -p /data/ccn/gitea
chown -R 1000:1000 /data/ccn/gitea

podman run -d --replace --name gitea \
  -v /data/ccn/gitea:/data:Z \
  -v /etc/localtime:/etc/localtime:ro \
  -e USER_UID=1000 \
  -e USER_GID=1000 \
  -p 10090:3000 \
  -p 10022:22 \
  docker.io/gitea/gitea:1.17.3

# use systemd to auto-run gitea
podman generate systemd --files --name gitea
/bin/cp -Zf container-gitea.service  /etc/systemd/system/

systemctl daemon-reload
systemctl enable --now  container-gitea.service

# http://quaylab.infra.redhat.ren:10090/
# root / redhat

# setup ssh key for gitea
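# a hedged sketch: generate a dedicated key pair and register the public key
# in the gitea web console (user settings -> SSH / GPG Keys); the key path and
# comment below are arbitrary choices, adjust to taste
ssh-keygen -t rsa -b 4096 -N '' -C 'root@helper for gitea' -f ~/.ssh/id_rsa.gitea
cat ~/.ssh/id_rsa.gitea.pub
# if you use a non-default key path like above, pass it explicitly when testing:
# ssh -T -p 10022 -i ~/.ssh/id_rsa.gitea git@quaylab.infra.redhat.ren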

# test the ssh git access
ssh -T -p 10022 git@quaylab.infra.redhat.ren

git clone ssh://git@quaylab.infra.redhat.ren:10022/root/demo.git

add baremetal host

In IPI mode, adding a new node is very easy: just define a BareMetalHost. For the osp on ocp experiment one extra worker node is enough; later the osp operator will re-image this worker node and add it to the osp cluster.

The configuration is simple, but what does openshift actually do behind the scenes to manage this baremetal node? After repeated experiments, the author summarizes the behavior roughly as follows:

  1. When a BareMetalHost is first defined, ocp calls the redfish endpoint and boots the node from a customized rhcos iso, which carries network parameters and self-starting services. On boot, an ironic agent starts and connects to the machine api to fetch its tasks; when there is nothing special to do, it inspects the host (cores, memory, storage and so on), reports back to the machine api, and powers the node off.
  2. If the BareMetalHost also defines an image, the ironic agent downloads that image (already converted by the metal3 service), writes it to disk and reboots. The commands it runs automatically were captured by the author and recorded here.

In the architecture diagram, this step corresponds to this part:


cd ${BASE_DIR}/data/install/

cat << EOF > ${BASE_DIR}/data/install/bmh-01.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-1-bmc-secret
type: Opaque
data:
  username: $(echo -ne "admin" | base64)
  password: $(echo -ne "password" | base64)
---
apiVersion: v1
kind: Secret
metadata:
  name: ocp4-ipi-osp-worker-01-network-config-secret
type: Opaque
stringData:
  nmstate: |
    dns-resolver:
      config:
        server:
        - 192.168.7.11
    interfaces:
    - ipv4:
        address:
        - ip: 192.168.7.26
          prefix-length: 24
        dhcp: false
        enabled: true
      name: enp1s0
      state: up
      type: ethernet
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.7.11
        next-hop-interface: enp1s0
        table-id: 254
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ocp4-ipi-osp-worker-01
spec:
  online: false
  bootMode: legacy 
  # externallyProvisioned: true
  # hardwareProfile: unknown
  bootMACAddress: $(cat /data/install/mac.list.* | grep worker-01 | awk '{print $2}')
  bmc:
    address: redfish-virtualmedia://192.168.7.103:8000/redfish/v1/Systems/$(cat /data/install/vm.list.* | grep worker-01 | awk '{print $1}')
    credentialsName: worker-1-bmc-secret
    disableCertificateVerification: true
  rootDeviceHints:
    deviceName: /dev/vda
  preprovisioningNetworkDataName: ocp4-ipi-osp-worker-01-network-config-secret
EOF
oc -n openshift-machine-api create -f ${BASE_DIR}/data/install/bmh-01.yaml

# oc delete -f ${BASE_DIR}/data/install/bmh.yaml -n openshift-machine-api 

# DO NOT USE, restore, delete the vm
# oc -n openshift-machine-api delete -f ${BASE_DIR}/data/install/bmh.yaml


# oc delete -f ${BASE_DIR}/data/install/bmh-03.yaml -n openshift-machine-api 

oc get bmh -n openshift-machine-api 
# NAME                     STATE                    CONSUMER                      ONLINE   ERROR   AGE
# ocp4-ipi-osp-master-01   externally provisioned   acm-demo-one-42z8b-master-0   true             3h23m
# ocp4-ipi-osp-master-02   externally provisioned   acm-demo-one-42z8b-master-1   true             3h23m
# ocp4-ipi-osp-master-03   externally provisioned   acm-demo-one-42z8b-master-2   true             3h23m
# ocp4-ipi-osp-worker-01   externally provisioned                                 true             54s

oc get machinesets -n openshift-machine-api
# NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
# acm-demo-one-42z8b-worker-0   0         0                             3h25m

# oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name

# # scale up the worker machineset; this triggers deployment of the new worker
# oc scale --replicas=1 machineset $(oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name) -n openshift-machine-api

# oc scale --replicas=0 machineset $(oc get machinesets -n openshift-machine-api -o json | jq -r .items[0].metadata.name) -n openshift-machine-api

install nfs

To run anything non-trivial on the ocp cluster we need storage. This is a home lab, so we want a lightweight storage solution; red hat's own ODF needs a lot of resources, so we pick the k8s sig NFS provisioner and run an NFS service inside the cluster. This approach turns one cluster node into the storage node, using a directory on that node as the data store, while pods can still run on any node. In short: simple, but performance is limited.

If you have other needs, consider cnv's host-path provisioner, or openEBS.

add local volume

We create a local volume to serve as the backend storage for the k8s sig NFS provisioner.


# go to master-03, this is as storage node
# create the storage path
mkdir -p /var/wzh-local-pv/
chcon -Rt container_file_t /var/wzh-local-pv/

# on helper
cat << EOF > ${BASE_DIR}/data/install/local-pv.yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-volume
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 450Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-volume
  local:
    path: /var/wzh-local-pv/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - one-master-03.acm-demo-one.redhat.ren
EOF
oc create --save-config -f ${BASE_DIR}/data/install/local-pv.yaml

# oc delete -f ${BASE_DIR}/data/install/local-pv.yaml

install nfs based on local pv

Next we deploy the NFS service directly from yaml. We have already customized the k8s sig NFS deployment yaml; download it, tweak a few parameters and apply it. It creates the corresponding role, deployment and other resources.


oc create ns nfs-system

# oc project nfs-system

cd ${BASE_DIR}/data/install

export http_proxy="http://127.0.0.1:18801"
export https_proxy=${http_proxy}

wget -O nfs.all.yaml https://raw.githubusercontent.com/wangzheng422/nfs-ganesha-server-and-external-provisioner/wzh/deploy/openshift/nfs.all.local.pv.yaml

unset http_proxy
unset https_proxy

/bin/cp -f nfs.all.yaml nfs.all.yaml.run

# sed -i 's/storageClassName: odf-lvm-vg1/storageClassName: local-volume/' nfs.all.yaml.run
sed -i 's/one-master-03.acm-demo-one.redhat.ren/one-master-03.acm-demo-one.redhat.ren/' nfs.all.yaml.run
sed -i 's/storage: 5Gi/storage: 450Gi/' nfs.all.yaml.run

oc create --save-config -n nfs-system -f nfs.all.yaml.run

# oc delete -n nfs-system -f nfs.all.yaml.run
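
A quick way to confirm the provisioner works is to create a small test PVC against it. This is only a sketch: the storage class name "nfs" is an assumption based on the provisioner defaults, so check the actual name with oc get sc first.

oc get sc

cat << EOF > ${BASE_DIR}/data/install/nfs-test-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wzh-nfs-test
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  # assumption: the storage class created by the nfs provisioner is called "nfs"
  storageClassName: nfs
EOF
oc create --save-config -n default -f ${BASE_DIR}/data/install/nfs-test-pvc.yaml

# the pvc should turn Bound after a short while
oc get pvc -n default

# cleanup
# oc delete -n default -f ${BASE_DIR}/data/install/nfs-test-pvc.yaml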

install cnv, nmstate, sriov operator

Following the openstack operator official documentation, we install a few prerequisite operators:

  1. cnv: runs kvm virtual machines inside the openshift cluster; the openstack overcloud controllers run as kvm guests started by cnv.
  2. nmstate: a plugin to configure NIC parameters on openshift nodes; openstack defines quite complex network parameters, as the architecture diagram shows. Note that nmstate can only modify NICs of nodes managed by the ocp cluster; once a node has been re-imaged and taken over by osp, this plugin can no longer touch it.
  3. sriov: configures NIC passthrough. The author is not entirely sure where it is used; it currently looks like cnv uses sriov to pass a set of NICs through into the kvm guests when starting them.

During installation we first use automatic approval, which saves the manual approval clicks, and afterwards switch the subscriptions to manual approval to prevent the cluster from auto-upgrading the operators. Auto-upgrade is a nice feature, but if an installed cluster upgrades an operator behind your back and the upgrade fails, your cluster functionality may be affected.

install cnv

First we install CNV, which is used to run virtual machines inside ocp. CNV brings up a lot of pods, so the installation takes a while.


# install cnv
cat << EOF > ${BASE_DIR}/data/install/cnv.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.11.0
  channel: "stable" 
EOF
oc create --save-config -f ${BASE_DIR}/data/install/cnv.yaml

# oc delete -f ${BASE_DIR}/data/install/cnv.yaml

oc get csv -n openshift-cnv
# NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
# kubevirt-hyperconverged-operator.v4.11.0   OpenShift Virtualization   4.11.0    kubevirt-hyperconverged-operator.v4.10.5   Succeeded

cat << EOF > ${BASE_DIR}/data/install/patch.yaml
spec:
  installPlanApproval: Manual
EOF
oc patch -n openshift-cnv subscription/hco-operatorhub --type merge --patch-file=${BASE_DIR}/data/install/patch.yaml


cat << EOF > ${BASE_DIR}/data/install/cnv-hc.yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
EOF
oc create --save-config -f ${BASE_DIR}/data/install/cnv-hc.yaml

# cat << EOF > ${BASE_DIR}/data/install/hostpath.yaml
# apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
# kind: HostPathProvisioner
# metadata:
#   name: hostpath-provisioner
# spec:
#   imagePullPolicy: IfNotPresent
#   storagePools: 
#   - name: wzh-cnv-hostpath-storage-pool
#     path: "/var/wzh-cnv-hostpath" 
# workload:
#   nodeSelector:
#     kubernetes.io/os: linux
# EOF

# oc create --save-config -f ${BASE_DIR}/data/install/hostpath.yaml

# cat << EOF > ${BASE_DIR}/data/install/sc.yaml
# apiVersion: storage.k8s.io/v1
# kind: StorageClass
# metadata:
#   name: hostpath-csi 
# provisioner: kubevirt.io.hostpath-provisioner
# reclaimPolicy: Delete 
# # volumeBindingMode: WaitForFirstConsumer 
# volumeBindingMode: Immediate
# parameters:
#   storagePool: wzh-cnv-hostpath-storage-pool
# EOF

# oc create --save-config -f ${BASE_DIR}/data/install/sc.yaml

# oc delete -f ${BASE_DIR}/data/install/sc.yaml

install nmstate

Next we install nmstate, which is used to configure NICs. Remember not to touch the NIC that the ocp cluster itself uses for communication.


cat << EOF > ${BASE_DIR}/data/install/nmstat.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: openshift-nmstate
    name: openshift-nmstate
  name: openshift-nmstate
spec:
  finalizers:
  - kubernetes
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  annotations:
    olm.providedAPIs: NMState.v1.nmstate.io
  generateName: openshift-nmstate-
  name: openshift-nmstate-wzh
  namespace: openshift-nmstate
spec:
  targetNamespaces:
  - openshift-nmstate
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/kubernetes-nmstate-operator.openshift-nmstate: ""
  name: kubernetes-nmstate-operator
  namespace: openshift-nmstate
spec:
  channel: "4.11"
  name: kubernetes-nmstate-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create --save-config -f ${BASE_DIR}/data/install/nmstat.yaml

oc get csv -n openshift-nmstate
# NAME                                              DISPLAY                       VERSION               REPLACES   PHASE
# kubernetes-nmstate-operator.4.11.0-202210250857   Kubernetes NMState Operator   4.11.0-202210250857              Succeeded

cat << EOF > ${BASE_DIR}/data/install/patch.yaml
spec:
  installPlanApproval: Manual
EOF
oc patch -n openshift-nmstate subscription/kubernetes-nmstate-operator --type merge --patch-file=${BASE_DIR}/data/install/patch.yaml


cat << EOF > ${BASE_DIR}/data/install/nmstat-stat.yaml
---
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
EOF
oc create --save-config -f ${BASE_DIR}/data/install/nmstat-stat.yaml
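
A sketch to confirm the nmstate handlers rolled out and to peek at the current node network state (nns is the short name for NodeNetworkState):

oc get pod -n openshift-nmstate

oc get nns
# one NodeNetworkState object per node is expected

# dump one node's interfaces (the node name here is just an example)
# oc get nns/master-01 -o yaml | head -n 40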

install sriov

Finally, install the SR-IOV operator, which is used to configure NIC passthrough for the KVM guests started by CNV.

# oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management

cat << EOF > ${BASE_DIR}/data/install/sriov.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.11"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create --save-config -f ${BASE_DIR}/data/install/sriov.yaml

oc get csv -n openshift-sriov-network-operator
# NAME                                         DISPLAY                   VERSION               REPLACES   PHASE
# sriov-network-operator.4.11.0-202210250857   SR-IOV Network Operator   4.11.0-202210250857              Succeeded

oc get subscription -n openshift-sriov-network-operator
# NAME                                  PACKAGE                  SOURCE             CHANNEL
# sriov-network-operator-subscription   sriov-network-operator   redhat-operators   4.11

oc get subscription/sriov-network-operator-subscription -n openshift-sriov-network-operator -o json | jq .spec
# {
#   "channel": "4.11",
#   "name": "sriov-network-operator",
#   "source": "redhat-operators",
#   "sourceNamespace": "openshift-marketplace"
# }

cat << EOF > ${BASE_DIR}/data/install/patch.yaml
spec:
  installPlanApproval: Manual
EOF
oc patch -n openshift-sriov-network-operator subscription/sriov-network-operator-subscription --type merge --patch-file=${BASE_DIR}/data/install/patch.yaml
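
Once the SR-IOV operator settles it discovers the SR-IOV capable NICs on each node; a sketch to check what it found:

oc get sriovnetworknodestates -n openshift-sriov-network-operator

# list the discovered interface names across all nodes
oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.interfaces[*].name}'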

install osp operator

We are about to start installing the OpenStack components. We follow the official documentation; it is carefully written and quite good, but errors are still unavoidable, and we will correct them in the steps below.

build operator images

Surprisingly, the very first installation step is to build the osp operator image ourselves. Fair enough; this is Tech Preview software, so some rough edges are understandable. Following the documentation, we find the latest version and build the operator catalog image ourselves. This image is an operator hub catalog: think of it as opening a new storefront called openstack in OCP's application store, and that storefront sells exactly one product, also called openstack.


# https://github.com/openstack-k8s-operators/osp-director-operator
# [osp-director-operator](https://catalog.redhat.com/software/containers/rhosp-rhel8-tech-preview/osp-director-operator/607dd3bf87c834779d77611b) 
# [osp-director-operator-bundle](https://catalog.redhat.com/software/containers/rhosp-rhel8-tech-preview/osp-director-operator-bundle/607dd43903f4b3563ab483b3)

#########################
# on helper
# run as root
cd /data/ocp4/4.11.6/
tar zvxf opm-linux-4.11.6.tar.gz
install opm /usr/local/bin/

/bin/cp -f /etc/containers/policy.json /etc/containers/policy.json.bak
cat << EOF > /etc/containers/policy.json
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ],
    "transports":
        {
            "docker-daemon":
                {
                    "": [{"type":"insecureAcceptAnything"}]
                }
        }
}
EOF

# end run as root
#########################

# registry.redhat.io/rhosp-rhel8-tech-preview/osp-director-operator-bundle:1.2.3-12
BUNDLE_IMG="registry.redhat.io/rhosp-rhel8-tech-preview/osp-director-operator-bundle:1.2.3-12"
INDEX_IMG="quay.io/wangzheng422/osp-director-operator-index:1.2.3-12"
opm index add --bundles ${BUNDLE_IMG} --tag ${INDEX_IMG} -u podman --pull-tool podman
podman push ${INDEX_IMG}
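
To verify that the freshly built index really contains the osp-director-operator package, one option (a sketch, assuming grpcurl is available on the build host) is to serve the index locally and query its registry API:

podman run -d --name osp-index-check -p 50051:50051 ${INDEX_IMG}
grpcurl -plaintext localhost:50051 api.Registry/ListPackages
# expect: {"name": "osp-director-operator"}
podman rm -f osp-index-check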

install openstack director operator

With the OpenStack catalog image built, we use it to deploy a CatalogSource and install the OpenStack director operator.

oc new-project openstack

cat << EOF > ${BASE_DIR}/data/install/osp-director-operator.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: osp-director-operator-index
  namespace: openstack
spec:
  sourceType: grpc
  # image: quay.io/openstack-k8s-operators/osp-director-operator-index:1.0.0-1
  # image: quay.io/openstack-k8s-operators/osp-director-operator-index:1.2.3
  image: quay.io/wangzheng422/osp-director-operator-index@sha256:ac810497a3b29662573e0843715285a1ad69e3fe7a8c7b6e5fe43d2f6d5bda8d
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: "osp-director-operator-group"
  namespace: openstack
spec:
  targetNamespaces:
  - openstack
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: osp-director-operator-subscription
  namespace: openstack
spec:
  config:
    env:
    - name: WATCH_NAMESPACE
      value: openstack,openshift-machine-api,openshift-sriov-network-operator
  source: osp-director-operator-index
  sourceNamespace: openstack
  name: osp-director-operator
EOF

oc create --save-config -f ${BASE_DIR}/data/install/osp-director-operator.yaml

# oc delete -f ${BASE_DIR}/data/install/osp-director-operator.yaml

oc get operators
# NAME                                                      AGE
# kubernetes-nmstate-operator.openshift-nmstate             21h
# kubevirt-hyperconverged.openshift-cnv                     22h
# osp-director-operator.openstack                           17m
# sriov-network-operator.openshift-sriov-network-operator   21h

oc get csv -n openstack
# NAME                           DISPLAY                 VERSION   REPLACES   PHASE
# osp-director-operator.v1.2.3   OSP Director Operator   1.2.3                Succeeded

try to deploy osp

With the OpenStack director operator in place, we can finally start installing an OpenStack overcloud step by step.

The document we follow is: Chapter 7. Director operator deployment scenario: Overcloud with Hyper-Converged Infrastructure (HCI)

upload rhel image

OpenStack is a VM platform, so we need to prepare an operating system image. We download the official RHEL 8.6 KVM image and apply the small customizations required by the documentation.

Then we use virtctl, the CNV command line client, to upload the image.


# download rhel-8.6-x86_64-kvm.qcow2 from redhat website
ls -l /data/down | grep rhel
# -rw-r--r--. 1 root root 8770508800 Apr 27  2022 rhel-8.6-aarch64-dvd.iso
# -rw-r--r--. 1 root root  832438272 May 10 13:23 rhel-8.6-x86_64-kvm.qcow2

export PROXY="http://127.0.0.1:18801"

subscription-manager repos --proxy=$PROXY --enable=cnv-4.11-for-rhel-8-x86_64-rpms

dnf install -y kubevirt-virtctl libguestfs-tools-c

/bin/cp -f /data/down/rhel-8.6-x86_64-kvm.qcow2 /data/down/rhel-8.6-x86_64-kvm-wzh.qcow2

virt-customize -a /data/down/rhel-8.6-x86_64-kvm-wzh.qcow2 --run-command 'sed -i -e "s/^\(kernelopts=.*\)net.ifnames=0 \(.*\)/\1\2/" /boot/grub2/grubenv'
virt-customize -a /data/down/rhel-8.6-x86_64-kvm-wzh.qcow2 --run-command 'sed -i -e "s/^\(GRUB_CMDLINE_LINUX=.*\)net.ifnames=0 \(.*\)/\1\2/" /etc/default/grub'
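
A sketch to confirm net.ifnames=0 was really stripped before uploading the image (virt-cat ships with libguestfs-tools-c):

virt-cat -a /data/down/rhel-8.6-x86_64-kvm-wzh.qcow2 /etc/default/grub | grep GRUB_CMDLINE_LINUX
# the output should no longer contain net.ifnames=0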

virtctl version
# Client Version: version.Info{GitVersion:"v0.36.5-2-gdd266dff9", GitCommit:"dd266dff95f7de9f79e3e0e5d4867c5ba9d50c9d", GitTreeState:"clean", BuildDate:"2022-04-01T22:51:18Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
# dial tcp [::1]:8080: connect: connection refused

# copy qcow2 to helper
scp /data/down/rhel-8.6-x86_64-kvm-wzh.qcow2    root@192.168.7.11:/data/swap/

# on helper download virtctl 
export http_proxy="http://127.0.0.1:18801"
export https_proxy=${http_proxy}

export VERSION=v0.53.2
wget https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-linux-amd64

install -m 755 virtctl-${VERSION}-linux-amd64 /usr/local/bin/virtctl

unset http_proxy
unset https_proxy

su - 3nodeipi

oc get storageclass
# NAME             PROVISIONER                        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
# hostpath-csi     kubevirt.io.hostpath-provisioner   Delete          Immediate           false                  8m36s
# redhat-ren-nfs   redhat.ren/nfs                     Delete          Immediate           false                  3m27s

virtctl image-upload dv openstack-base-img -n openstack --size=50Gi --image-path=/data/swap/rhel-8.6-x86_64-kvm-wzh.qcow2  --storage-class redhat-ren-nfs --access-mode ReadWriteOnce --insecure
# PVC openstack/openstack-base-img not found
# DataVolume openstack/openstack-base-img created
# Waiting for PVC openstack-base-img upload pod to be ready...
# Pod now ready
# Uploading data to https://cdi-uploadproxy-openshift-cnv.apps.acm-demo-one.redhat.ren

#  797.50 MiB / 797.50 MiB [===============================================================================================================================================================] 100.00% 13s

# Uploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress
# Processing completed successfully
# Uploading rhel-8.6-x86_64-kvm-wzh.qcow2 completed successfully

# virtctl image-upload dv openstack-base-img -n openstack --no-create --size=50Gi --image-path=/data/swap/rhel-8.6-x86_64-kvm-wzh.qcow2  --storage-class redhat-ren-nfs --access-mode ReadWriteOnce --insecure

oc get datavolume
# NAME                 PHASE         PROGRESS   RESTARTS   AGE
# openstack-base-img   UploadReady   N/A        1          113s

# in some cases the import fails; just delete the data volume to restart it
# oc delete datavolume/openstack-base-img

# ensure there is only one pvc
oc get pv
# NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
# example-local-pv   450Gi      RWO            Retain           Bound    nfs-system/lvm-file-pvc   local-volume            42m

# in some cases the import never succeeds,
# because cdi gets killed by OOM; the root cause is unknown.
# just reboot master-03 to work around it.

config key for git service, define default password

Next, we import the SSH key for the git service; later, OpenStack will push the generated installation playbooks there.

Then we also set the default user name and password for the hosts.

oc create secret generic git-secret -n openstack --from-file=git_ssh_identity=${BASE_DIR}/.ssh/id_rsa --from-literal=git_url=ssh://git@quaylab.infra.redhat.ren:10022/root/openstack.git
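
A sketch to double-check that the key stored in git-secret can actually reach the git service on its non-standard port:

oc get secret git-secret -n openstack -o jsonpath='{.data.git_url}' | base64 -d; echo

GIT_SSH_COMMAND="ssh -i ${BASE_DIR}/.ssh/id_rsa -o StrictHostKeyChecking=no" \
  git ls-remote ssh://git@quaylab.infra.redhat.ren:10022/root/openstack.git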

# Setting the root password for nodes
echo -n "redhat" | base64
# cmVkaGF0

cat << EOF > ${BASE_DIR}/data/install/openstack-userpassword.yaml
apiVersion: v1
kind: Secret
metadata:
  name: userpassword
  namespace: openstack
data:
  NodeRootPassword: "`echo -n "redhat" | base64`"
EOF

oc create --save-config -f ${BASE_DIR}/data/install/openstack-userpassword.yaml -n openstack

define network parameter

We now define the network parameters used by OpenStack. This part is a bit confusing: the IP address ranges in this definition are consumed by both the OpenStack controller and compute nodes, while the bridges, the NICs backing those bridges, and the bridge each network attaches to only apply to the OpenShift nodes.

In short, this network configuration targets the OpenShift nodes, even though the resource is called OpenStackNetConfig.

This step corresponds to the following part of the architecture diagram:

# network link names must be no longer than 15 characters
# https://access.redhat.com/solutions/2425471
# https://github.com/openstack-k8s-operators/osp-director-dev-tools/blob/osp16_tech_preview/ansible/templates/osp/tripleo_heat_envs/vlan/network-environment.yaml.j2
# https://github.com/openstack-k8s-operators/osp-director-dev-tools/blob/master/ansible/templates/osp/netconfig/osnetconfig.yaml.j2

cat << EOF > ${BASE_DIR}/data/install/openstacknetconfig.yaml
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackNetConfig
metadata:
  name: openstacknetconfig
spec:
  attachConfigurations:
    br-osp:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/master: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp4s0
            description: Linux bridge with enp4s0 as a port
            name: br-osp
            state: up
            type: linux-bridge
            mtu: 1500
    br-osp-ex:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/master: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp3s0
            description: Linux bridge with enp3s0 as a port
            name: br-osp-ex
            state: up
            type: linux-bridge
            mtu: 1500

  # optional DnsServers list
  dnsServers:
  - 192.168.7.11
  # optional DnsSearchDomains list
  dnsSearchDomains:
  - osp-demo.redhat.ren
  - some.other.domain
  # DomainName of the OSP environment
  domainName: osp-demo.redhat.ren
  networks:
  - name: Control
    nameLower: ctlplane
    subnets:
    - name: ctlplane
      ipv4:
        allocationEnd: 192.168.7.60
        allocationStart: 192.168.7.40
        cidr: 192.168.7.0/24
        gateway: 192.168.7.11
      attachConfiguration: br-osp
  - name: InternalApi
    nameLower: internal_api
    mtu: 1350
    subnets:
    - name: internal_api
      attachConfiguration: br-osp
      vlan: 20
      ipv4:
        allocationEnd: 172.17.0.250
        allocationStart: 172.17.0.10
        cidr: 172.17.0.0/24
  - name: External
    nameLower: external
    subnets:
    - name: external
      ipv4:
        allocationEnd: 172.21.6.60
        allocationStart: 172.21.6.40
        cidr: 172.21.6.0/24
        gateway: 172.21.6.254
      attachConfiguration: br-osp-ex
  - name: Storage
    nameLower: storage
    mtu: 1500
    subnets:
    - name: storage
      ipv4:
        allocationEnd: 172.18.0.250
        allocationStart: 172.18.0.10
        cidr: 172.18.0.0/24
      vlan: 30
      attachConfiguration: br-osp
  - name: StorageMgmt
    nameLower: storage_mgmt
    mtu: 1500
    subnets:
    - name: storage_mgmt
      ipv4:
        allocationEnd: 172.19.0.250
        allocationStart: 172.19.0.10
        cidr: 172.19.0.0/24
      vlan: 40
      attachConfiguration: br-osp
  - name: Tenant
    nameLower: tenant
    vip: False
    mtu: 1500
    subnets:
    - name: tenant
      ipv4:
        allocationEnd: 172.20.0.250
        allocationStart: 172.20.0.10
        cidr: 172.20.0.0/24
      vlan: 50
      attachConfiguration: br-osp
EOF

oc create --save-config -f ${BASE_DIR}/data/install/openstacknetconfig.yaml -n openstack

# oc delete -f ${BASE_DIR}/data/install/openstacknetconfig.yaml -n openstack
# oc apply -f ${BASE_DIR}/data/install/openstacknetconfig.yaml -n openstack

oc get openstacknetconfig/openstacknetconfig -n openstack
# NAME                 ATTACHCONFIG DESIRED   ATTACHCONFIG READY   NETWORKS DESIRED   NETWORKS READY   PHYSNETWORKS DESIRED   PHYSNETWORKS READY   STATUS       REASON
# openstacknetconfig   2                      2                    6                  6                1                      1                    Configured   OpenStackNetConfig openstacknetconfig all resources configured

# oc get openstacknetattach -n openstack

oc get openstacknet -n openstack
# NAME          CIDR             DOMAIN                            MTU    VLAN   VIP     GATEWAY        ROUTES   RESERVED IPS   STATUS
# ctlplane      192.168.7.0/24   ctlplane.osp-demo.redhat.ren      1500   0      true    192.168.7.11   []       0              Configured
# external      172.21.6.0/24    external.osp-demo.redhat.ren      1500   0      true    172.21.6.254   []       0              Configured
# internalapi   172.17.0.0/24    internalapi.osp-demo.redhat.ren   1350   20     true                   []       0              Configured
# storage       172.18.0.0/24    storage.osp-demo.redhat.ren       1500   30     true                   []       0              Configured
# storagemgmt   172.19.0.0/24    storagemgmt.osp-demo.redhat.ren   1500   40     true                   []       0              Configured
# tenant        172.20.0.0/24    tenant.osp-demo.redhat.ren        1500   50     false                  []       0              Configured

oc get network-attachment-definitions -n openstack
# NAME                 AGE
# ctlplane             2m12s
# ctlplane-static      2m11s
# external             2m11s
# external-static      2m11s
# internalapi          2m11s
# internalapi-static   2m11s
# storage              2m11s
# storage-static       2m11s
# storagemgmt          2m10s
# storagemgmt-static   2m10s
# tenant               2m10s
# tenant-static        2m10s

oc get nncp
# NAME        STATUS      REASON
# br-osp      Available   SuccessfullyConfigured
# br-osp-ex   Available   SuccessfullyConfigured
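
The bridge policies above are applied per node; a sketch to confirm every enactment succeeded (nnce is the short name for NodeNetworkConfigurationEnactment):

oc get nnce
# each <node>.br-osp and <node>.br-osp-ex enactment should report SuccessfullyConfigured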

deploy controller

We define a single-node controller. Once this definition is saved, OpenShift starts a KVM virtual machine through CNV, using the RHEL image we uploaded earlier as its operating system. After booting, it just sits there quietly as an empty OS.

At the same time, an openstack client pod is started; all of our day-to-day OpenStack operations later on will be done inside this openstack client.

Note that the documentation has a bug here: it uses API version v1beta2, while our image only ships v1beta1, so we need to make some small adjustments to the configuration.

This step corresponds to the following part of the architecture diagram:

# here the version mismatches the official document.
# we follow an older official document, which can no longer be found. :(
cat << EOF > ${BASE_DIR}/data/install/openstack-controller.yaml
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: overcloud
  namespace: openstack
spec:
  # openStackClientImageURL: registry.redhat.io/rhosp-beta/openstack-tripleoclient:16.2
  openStackClientNetworks:
        - ctlplane
        - external
        - internal_api
  openStackClientStorageClass: redhat-ren-nfs
  passwordSecret: userpassword
  gitSecret: git-secret
  virtualMachineRoles:
    controller:
      roleName: Controller
      roleCount: 1
      networks:
        - ctlplane
        - internal_api
        - external
        - tenant
        - storage
        - storage_mgmt
      cores: 6
      memory: 12
      diskSize: 60
      baseImageVolumeName: openstack-base-img
      storageClass: redhat-ren-nfs
EOF
oc create --save-config -f ${BASE_DIR}/data/install/openstack-controller.yaml -n openstack

# oc delete -f ${BASE_DIR}/data/install/openstack-controller.yaml -n openstack
# oc apply -f ${BASE_DIR}/data/install/openstack-controller.yaml -n openstack

# here it will take a long time, because it clones the base pvc into 3 pvcs
# half an hour to 1 hour, depending on your disk performance.

oc get openstackcontrolplane/overcloud -n openstack
# NAME        VMSETS DESIRED   VMSETS READY   CLIENT READY   STATUS        REASON
# overcloud   1                1              true           Provisioned   All requested OSVMSets have been provisioned

oc get openstackcontrolplane -n openstack
# NAME        VMSETS DESIRED   VMSETS READY   CLIENT READY   STATUS        REASON
# overcloud   1                1              true           Provisioned   All requested OSVMSets have been provisioned

oc get openstackvmsets -n openstack
# NAME         DESIRED   READY   STATUS        REASON
# controller   3         3       Provisioned   All requested VirtualMachines have been provisioned

oc get virtualmachines -n openstack
# NAME           AGE    STATUS    READY
# controller-0   107m   Running   True
# controller-1   107m   Running   True
# controller-2   107m   Running   True
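
While the controller VMs boot, a sketch for poking at them directly (virtctl was installed earlier; leave the console with Ctrl+]):

oc get vmi -n openstack

# attach to the serial console of one controller, name taken from the list above
# virtctl console controller-0 -n openstack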

define openstack install script

Next, following the official documentation, we define the OpenStack install script, which configures the networking of the compute nodes.

The official documentation has a bug: it does not define the StorageMgmt network, so we simply add it back in.

Defining the install script takes several steps: some are copy-pasted straight from the official documentation, others are done inside the openstack client pod, where templates are generated and then exported. Just follow the steps; it is not hard.


# on helper
mkdir -p ${BASE_DIR}/data/custom_templates
mkdir -p ${BASE_DIR}/data/custom_environment_files

/bin/rm -rf ${BASE_DIR}/data/custom_templates/*
/bin/rm -rf ${BASE_DIR}/data/custom_environment_files/*

cat << 'EOF' > ${BASE_DIR}/data/custom_templates/net-config-two-nic-vlan-computehci.yaml
heat_template_version: rocky
description: >
  Software Config to drive os-net-config to configure VLANs for the Compute role.
parameters:
  ControlPlaneIp:
    default: ''
    description: IP address/subnet on the ctlplane network
    type: string
  ControlPlaneSubnetCidr:
    default: ''
    description: >
      The subnet CIDR of the control plane network. (The parameter is
      automatically resolved from the ctlplane subnet's cidr attribute.)
    type: string
  ControlPlaneDefaultRoute:
    default: ''
    description: The default route of the control plane network. (The parameter
      is automatically resolved from the ctlplane subnet's gateway_ip attribute.)
    type: string
  ControlPlaneStaticRoutes:
    default: []
    description: >
      Routes for the ctlplane network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  ControlPlaneMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the network.
      (The parameter is automatically resolved from the ctlplane network's mtu attribute.)
    type: number
  StorageIpSubnet:
    default: ''
    description: IP address/subnet on the storage network
    type: string
  StorageNetworkVlanID:
    default: 30
    description: Vlan ID for the storage network traffic.
    type: number
  StorageMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the
      Storage network.
    type: number
  StorageInterfaceRoutes:
    default: []
    description: >
      Routes for the storage network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  StorageMgmtIpSubnet:
    default: ''
    description: IP address/subnet on the storage_mgmt network
    type: string
  StorageMgmtNetworkVlanID:
    default: 40
    description: Vlan ID for the storage_mgmt network traffic.
    type: number
  StorageMgmtMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the
      StorageMgmt network.
    type: number
  StorageMgmtInterfaceRoutes:
    default: []
    description: >
      Routes for the storage_mgmt network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  InternalApiIpSubnet:
    default: ''
    description: IP address/subnet on the internal_api network
    type: string
  InternalApiNetworkVlanID:
    default: 20
    description: Vlan ID for the internal_api network traffic.
    type: number
  InternalApiMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the
      InternalApi network.
    type: number
  InternalApiInterfaceRoutes:
    default: []
    description: >
      Routes for the internal_api network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  TenantIpSubnet:
    default: ''
    description: IP address/subnet on the tenant network
    type: string
  TenantNetworkVlanID:
    default: 50
    description: Vlan ID for the tenant network traffic.
    type: number
  TenantMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the
      Tenant network.
    type: number
  TenantInterfaceRoutes:
    default: []
    description: >
      Routes for the tenant network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  ExternalMtu:
    default: 1500
    description: The maximum transmission unit (MTU) size(in bytes) that is
      guaranteed to pass through the data path of the segments in the
      External network.
    type: number
  DnsServers: # Override this via parameter_defaults
    default: []
    description: >
      DNS servers to use for the Overcloud (2 max for some implementations).
      If not set the nameservers configured in the ctlplane subnet's
      dns_nameservers attribute will be used.
    type: comma_delimited_list
  DnsSearchDomains: # Override this via parameter_defaults
    default: []
    description: A list of DNS search domains to be added (in order) to resolv.conf.
    type: comma_delimited_list

resources:

  MinViableMtu:
    # This resource resolves the minimum viable MTU for interfaces, bonds and
    # bridges that carry multiple VLANs. Each VLAN may have different MTU. The
    # bridge, bond or interface must have an MTU to allow the VLAN with the
    # largest MTU.
    type: OS::Heat::Value
    properties:
      type: number
      value:
        yaql:
          expression: $.data.max()
          data:
            - {get_param: ControlPlaneMtu}
            - {get_param: StorageMtu}
            - {get_param: StorageMgmtMtu}
            - {get_param: InternalApiMtu}
            - {get_param: TenantMtu}
            - {get_param: ExternalMtu}

  OsNetConfigImpl:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        str_replace:
          template:
            get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
          params:
            $network_config:
              network_config:
              - type: interface
                name: nic4
                mtu:
                  get_attr: [MinViableMtu, value]
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                domain:
                  get_param: DnsSearchDomains
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                  list_concat_unique:
                    - get_param: ControlPlaneStaticRoutes
                    - - default: true
                        next_hop:
                          get_param: ControlPlaneDefaultRoute
              - type: vlan
                mtu:
                  get_param: StorageMtu
                device: nic4
                vlan_id:
                  get_param: StorageNetworkVlanID
                addresses:
                - ip_netmask:
                    get_param: StorageIpSubnet
                routes:
                  list_concat_unique:
                    - get_param: StorageInterfaceRoutes
              - type: vlan
                device: nic4
                mtu:
                  get_param: StorageMgmtMtu
                vlan_id:
                  get_param: StorageMgmtNetworkVlanID
                addresses:
                - ip_netmask:
                    get_param: StorageMgmtIpSubnet
                routes:
                  list_concat_unique:
                    - get_param: StorageMgmtInterfaceRoutes                    
              - type: vlan
                mtu:
                  get_param: InternalApiMtu
                device: nic4
                vlan_id:
                  get_param: InternalApiNetworkVlanID
                addresses:
                - ip_netmask:
                    get_param: InternalApiIpSubnet
                routes:
                  list_concat_unique:
                    - get_param: InternalApiInterfaceRoutes

              - type: ovs_bridge
                # This will default to br-ex, anything else   requires specific
                # bridge mapping entries for it to be used.
                name: bridge_name
                mtu:
                  get_param: ExternalMtu
                use_dhcp: false
                members:
                - type: interface
                  name: nic3
                  mtu:
                    get_param: ExternalMtu
                  use_dhcp: false
                  primary: true
                - type: vlan
                  mtu:
                    get_param: TenantMtu
                  vlan_id:
                    get_param: TenantNetworkVlanID
                  addresses:
                  - ip_netmask:
                      get_param: TenantIpSubnet
                  routes:
                    list_concat_unique:
                      - get_param: TenantInterfaceRoutes
outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value:
      get_resource: OsNetConfigImpl
EOF


oc rsh -n openstack openstackclient
# in the shell
unset OS_CLOUD
cd /home/cloud-admin/
openstack overcloud roles generate Controller ComputeHCI > roles_data.yaml
exit

oc cp openstack/openstackclient:home/cloud-admin/roles_data.yaml ${BASE_DIR}/data/custom_templates/roles_data.yaml

cd ${BASE_DIR}/data/custom_templates
tar -cvzf custom-config.tar.gz *.yaml
oc delete configmap tripleo-tarball-config -n openstack
oc create configmap tripleo-tarball-config --from-file=custom-config.tar.gz -n openstack

oc get configmap/tripleo-tarball-config -n openstack
# NAME                     DATA   AGE
# tripleo-tarball-config   1      7s
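
A sketch to confirm the tarball configmap really carries both the NIC template and roles_data.yaml:

oc extract configmap/tripleo-tarball-config -n openstack --to=/tmp --confirm
tar tzvf /tmp/custom-config.tar.gz
# expect net-config-two-nic-vlan-computehci.yaml and roles_data.yaml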

cat << EOF > ${BASE_DIR}/data/custom_environment_files/network-environment.yaml
resource_registry:
  OS::TripleO::ComputeHCI::Net::SoftwareConfig: net-config-two-nic-vlan-computehci.yaml

# parameter_defaults:
#   # self define
#   NeutronBridgeMappings: datacentre:br-osp-ex
EOF

cat << EOF > ${BASE_DIR}/data/custom_environment_files/compute-hci.yaml
resource_registry:
  OS::TripleO::Services::CephMgr: deployment/ceph-ansible/ceph-mgr.yaml
  OS::TripleO::Services::CephMon: deployment/ceph-ansible/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: deployment/ceph-ansible/ceph-osd.yaml
  OS::TripleO::Services::CephClient: deployment/ceph-ansible/ceph-client.yaml

parameter_defaults:
  # needed for now because of the repo used to create tripleo-deploy image
  CephAnsibleRepo: "rhelosp-ceph-4-tools"
  CephAnsiblePlaybookVerbosity: 3
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  CinderEnableNfsBackend: false
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  CinderRbdPoolName: "volumes"
  NovaRbdPoolName: "vms"
  GlanceRbdPoolName: "images"
  CephPoolDefaultPgNum: 32
  CephPoolDefaultSize: 2
  CephAnsibleDisksConfig:
    devices:
      - '/dev/vdb'
      - '/dev/vdc'
      - '/dev/vdd'
    osd_scenario: lvm
    osd_objectstore: bluestore
  CephAnsibleExtraConfig:
    is_hci: true
  CephConfigOverrides:
    rgw_swift_enforce_content_length: true
    rgw_swift_versioning_enabled: true
EOF

oc delete configmap -n openstack heat-env-config 
oc create configmap -n openstack heat-env-config --from-file=${BASE_DIR}/data/custom_environment_files/ --dry-run=client -o yaml | oc apply -f -

oc get configmap/heat-env-config -n openstack
# NAME              DATA   AGE
# heat-env-config   2      4s
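
A quick check that both environment files made it into the configmap (jq is already on the helper):

oc get configmap heat-env-config -n openstack -o json | jq -r '.data | keys[]'
# compute-hci.yaml
# network-environment.yaml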

define compute node

Next, we define the compute node. Before this step, the OpenShift cluster already has one worker node, and it is completely empty. By defining an OpenStackBaremetalSet we invoke OpenShift's metal3 machinery to reimage that worker with the image we specify, turning it into a RHEL node.

This step corresponds to the following part of the architecture diagram:


cat << EOF > ${BASE_DIR}/data/install/openstack-hcicompute.yaml
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackBaremetalSet
metadata:
  name: computehci
  namespace: openstack
spec:
  count: 1
  baseImageUrl: http://192.168.7.11:8080/rhel-8.6-x86_64-kvm-wzh.qcow2
  deploymentSSHSecret: osp-controlplane-ssh-keys
  ctlplaneInterface: enp4s0
  networks:
    - ctlplane
    - internal_api
    - tenant
    - storage
    - storage_mgmt
  roleName: ComputeHCI
  passwordSecret: userpassword
EOF
oc create --save-config -f ${BASE_DIR}/data/install/openstack-hcicompute.yaml -n openstack

# oc delete -f ${BASE_DIR}/data/install/openstack-hcicompute.yaml -n openstack

# very tricky: after reading the source code, there is buggy logic around checking whether online is false.
# cat << EOF > ${BASE_DIR}/data/install/patch.yaml
# spec:
#   online: false
# EOF
# oc patch -n openshift-machine-api BareMetalHost/ocp4-ipi-osp-worker-01 --type merge --patch-file=${BASE_DIR}/data/install/patch.yaml

# ssh into the worker-1, and add public access ip address
# so it can download ironic agent podman image
# and the ironic agent will write base image to disk
# but first, it will boot using coreos live iso
# ssh -i id_rsa core@172.22.0.199
# sudo -i
# nmcli con add ifname enp1s0 con-name enp1s0 type ethernet ipv4.method manual ipv4.addresses 192.168.7.26/24 ipv4.dns 192.168.7.11
# nmcli con up enp1s0

# /bin/qemu-img convert -O host_device -t directsync -S 0 -W /tmp/compressed-rhel-8.6-x86_64-kvm-wzh.qcow2 /dev/vda
# sgdisk -e /dev/vda
# sgdisk -Z /dev/vda3

oc describe crd openstackbaremetalset

oc get openstackbaremetalset -n openstack
# NAME         DESIRED   READY   STATUS        REASON
# computehci   1         1       Provisioned   All requested BaremetalHosts have been provisioned

oc get openstackbaremetalset/computehci -n openstack
# NAME         DESIRED   READY   STATUS        REASON
# computehci   1         1       Provisioned   All requested BaremetalHosts have been provisioned

oc get baremetalhosts -n openshift-machine-api
# NAME                     STATE                    CONSUMER                      ONLINE   ERROR   AGE
# ocp4-ipi-osp-master-01   externally provisioned   acm-demo-one-8zwdl-master-0   true             126m
# ocp4-ipi-osp-master-02   externally provisioned   acm-demo-one-8zwdl-master-1   true             126m
# ocp4-ipi-osp-master-03   externally provisioned   acm-demo-one-8zwdl-master-2   true             126m
# ocp4-ipi-osp-worker-01   provisioned              computehci                    true             54m
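
Writing the base image onto the worker over the provisioning network takes a while; a sketch for watching both views together until the host reaches provisioned:

watch 'oc get baremetalhosts -n openshift-machine-api; echo; oc get openstackbaremetalset -n openstack'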

patch for openstack nodes

Before continuing, we need to push some configuration to the existing OpenStack nodes (controller and compute). Since this is an offline environment, the main task is to point both the yum repos and the container registry sources at the internal network.

###########################
# add repo for osp nodes
oc rsh -n openstack openstackclient

cd /home/cloud-admin

VAR_URL=http://192.168.7.11:5000/osp.repo

# ansible-playbook -i /home/cloud-admin/ctlplane-ansible-inventory ./rhsm.yaml

ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "sudo dnf config-manager --add-repo $VAR_URL"

ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "sudo mkdir -p /etc/cni/net.d"

scp root@192.168.7.11:/data/ocp4/image.registries.conf ./
sed -i 's/nexus.infra.redhat.ren/192.168.7.11/g' image.registries.conf 

ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "sudo mkdir -p /etc/containers/registries.conf.d/"
ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -b -m copy -a "src=./image.registries.conf dest=/etc/containers/registries.conf.d/image.registries.conf"

cat << EOF > ./policy.json
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ],
    "transports":
        {
            "docker-daemon":
                {
                    "": [{"type":"insecureAcceptAnything"}]
                }
        }
}
EOF
ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -b -m copy -a "src=./policy.json dest=/etc/containers/policy.json"
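
A sketch to verify that the repo and registry settings actually landed on every overcloud node:

ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "sudo dnf repolist"
ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "cat /etc/containers/registries.conf.d/image.registries.conf"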

generate ansible playbooks

We have finally reached the last step: generating the ansible playbooks used for the installation.

There is a small osp operator bug here. Because our git SSH service is not on port 22, our git URI is not of the form git@server:root/openstack.git but ssh://git@quaylab.infra.redhat.ren:10022/root/openstack.git, and the operator's script fails to parse it. There is no clean fix for now; we have to go into the pod and patch the ssh config file by hand.

cat << EOF > ${BASE_DIR}/data/install/openstack-config-generator.yaml
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackConfigGenerator
metadata:
  name: default
  namespace: openstack
spec:
  enableFencing: false
  gitSecret: git-secret
  # imageURL: registry.redhat.io/rhosp-rhel8/openstack-tripleoclient:16.2
  heatEnvConfigMap: heat-env-config
  # List of heat environment files to include from tripleo-heat-templates/environments
  # heatEnvs:
  # - ssl/tls-endpoints-public-dns.yaml
  # - ssl/enable-tls.yaml
  tarballConfigMap: tripleo-tarball-config
  # interactive: true
EOF
oc create --save-config -f ${BASE_DIR}/data/install/openstack-config-generator.yaml -n openstack

# oc delete -f ${BASE_DIR}/data/install/openstack-config-generator.yaml -n openstack

oc get openstackconfiggenerator/default -n openstack
# NAME      STATUS
# default   Initializing

# fix for an ssh connection bug:
# if the git host serves ssh on a port other than 22,
# the osp script misbehaves
watch oc get pod -l job-name=generate-config-default

oc rsh $(oc get pod -o name -l job-name=generate-config-default)
# ls -la /home/cloud-admin/
cat /home/cloud-admin/.ssh/config

cat << EOF >> /home/cloud-admin/.ssh/config

Host quaylab.infra.redhat.ren
    User root
    IdentityFile /home/cloud-admin/.ssh/git_id_rsa
    StrictHostKeyChecking no

EOF
# git clone ssh://git@quaylab.infra.redhat.ren:10022/root/openstack.git

oc get openstackconfiggenerator/default -n openstack
# NAME      STATUS
# default   Generating
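
Config generation runs as a job in the openstack namespace; a sketch for following its log until the generator status moves past Generating:

oc logs -f $(oc get pod -o name -l job-name=generate-config-default) -n openstack

watch oc get openstackconfiggenerator/default -n openstack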

run ansible playbooks

We will run the ansible playbooks inside the openstack client pod to deploy our overcloud. Here we again diverge from the official documentation: it uses an OpenStackDeploy resource, which our operator version does not have yet, so we run the playbooks manually.

Because our controller is a nested KVM guest, it runs very slowly; the whole installation takes more than two hours in the author's home lab.


# official doc version mismatch
# there is no openstackDeploy CRD for current osd operator

# cat << EOF > ${BASE_DIR}/data/install/openstack-deployment.yaml
# apiVersion: osp-director.openstack.org/v1beta1
# kind: OpenStackDeploy
# metadata:
#   name: default
# spec:
#   configVersion: n54dh548h5d7h5f5h648h95h5b5h686h64bhf8h566h65fh5f7h674hdchdh59dh58hf7h667h7h57fh85h557hdh59bh5dh54ch7dh547h579hcfq
#   configGenerator: default
# EOF
# oc create --save-config -f ${BASE_DIR}/data/install/openstack-deployment.yaml -n openstack

oc rsh -n openstack openstackclient

cd /home/cloud-admin
ansible -i /home/cloud-admin/ctlplane-ansible-inventory overcloud -a "sudo dnf -y install python3 lvm2"

# run ansible driven OpenStack deployment
cat << EOF >> /home/cloud-admin/.ssh/config

Host quaylab.infra.redhat.ren
    User root
    IdentityFile /home/cloud-admin/.ssh/git_id_rsa
    StrictHostKeyChecking no

EOF
chmod 600 /home/cloud-admin/.ssh/config

# it is better to run on local machine through crictl exec -it **** bash
./tripleo-deploy.sh -a

/bin/cp tripleo-deploy.sh tripleo-deploy.wzh.sh
sed -i 's/ansible-playbook /ansible-playbook -v /g' tripleo-deploy.wzh.sh
./tripleo-deploy.wzh.sh -p

# because it is nested virtualization
# it will take almost 2 hours to deploy

# debug 
ssh cloud-admin@192.168.7.43

# podman pull registry.redhat.io/rhosp-rhel8/openstack-cron:16.2
# podman pull registry.redhat.io/rhosp-rhel8/openstack-ovn-controller:16.2
# podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-libvirt:16.2
# podman pull registry.redhat.io/rhosp-rhel8/openstack-iscsid:16.2
# podman pull registry.redhat.io/rhosp-rhel8/openstack-nova-compute:16.2
# podman pull registry.redhat.io/rhosp-rhel8/openstack-neutron-metadata-agent-ovn:16.2


# access the webpage
oc get secret tripleo-passwords -o jsonpath='{.data.tripleo-overcloud-passwords\.yaml}' | base64 -d | grep AdminPassword
  # AdminPassword: 9dhv6qr7xlsbkrzvlvvjtndks
  # CephDashboardAdminPassword: 74rpbjqm8qt586v79rtcwhr2c
  # CephGrafanaAdminPassword: hlg8k7m6fg2zqqvq799xpmsxx
  # HeatStackDomainAdminPassword: flqfdp86gk7xf8rjh2f6nkxhl

# http://172.21.6.40/
# admin / ....

use the openstack overcloud

We now have a working OpenStack overcloud. Next, let's try it out by creating a VM on it.

This step corresponds to the following part of the architecture diagram:

openstack endpoint list
# +----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
# | ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
# +----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
# | 0841809b67a84e8a9f8b5fe1c3fe78b0 | regionOne | glance       | image          | True    | internal  | http://172.17.0.10:9292                       |
# | 0fc84999e6e948cf82d2abb1ff8ffbaf | regionOne | heat         | orchestration  | True    | public    | http://172.21.6.40:8004/v1/%(tenant_id)s      |
# | 1447065d224943c4a3ff886c3bb8c4b3 | regionOne | heat         | orchestration  | True    | internal  | http://172.17.0.10:8004/v1/%(tenant_id)s      |
# | 1c57db71dfcf438b8607cf2549929757 | regionOne | cinderv3     | volumev3       | True    | admin     | http://172.17.0.10:8776/v3/%(tenant_id)s      |
# | 21e24d92d592457782f7c8b39b55ab41 | regionOne | nova         | compute        | True    | public    | http://172.21.6.40:8774/v2.1                  |
# | 26e93f7d318149d492268d7abbeebb8c | regionOne | placement    | placement      | True    | public    | http://172.21.6.40:8778/placement             |
# | 3338679b75b949b0810c807a3dd5b175 | regionOne | heat-cfn     | cloudformation | True    | admin     | http://172.17.0.10:8000/v1                    |
# | 459fa0db22ce4edcba52309bf5157bd6 | regionOne | nova         | compute        | True    | internal  | http://172.17.0.10:8774/v2.1                  |
# | 494e26a5e80d40258e07a32d7f7cd527 | regionOne | placement    | placement      | True    | admin     | http://172.17.0.10:8778/placement             |
# | 4a612665e36840eeb354ebfc1540d372 | regionOne | swift        | object-store   | True    | internal  | http://172.18.0.10:8080/v1/AUTH_%(tenant_id)s |
# | 4ca6bd346e714c86934de3fd80199490 | regionOne | glance       | image          | True    | admin     | http://172.17.0.10:9292                       |
# | 6d1cc6552c4f4bd99b78cfa173f4d30e | regionOne | nova         | compute        | True    | admin     | http://172.17.0.10:8774/v2.1                  |
# | 73a38a88baa342f0882fe846bcf20c23 | regionOne | keystone     | identity       | True    | internal  | http://172.17.0.10:5000                       |
# | 7a194566071140f1a4a88da42e131520 | regionOne | cinderv3     | volumev3       | True    | internal  | http://172.17.0.10:8776/v3/%(tenant_id)s      |
# | 8058f1fd49b6474bab4b2c889bfb8769 | regionOne | cinderv3     | volumev3       | True    | public    | http://172.21.6.40:8776/v3/%(tenant_id)s      |
# | 8158d87f9fb648939e70ebf84398dcb2 | regionOne | neutron      | network        | True    | admin     | http://172.17.0.10:9696                       |
# | 8b6ccf76ea67428ba49431cb58a7d749 | regionOne | cinderv2     | volumev2       | True    | admin     | http://172.17.0.10:8776/v2/%(tenant_id)s      |
# | 8e462b16b50840c380d20a04c05ef19d | regionOne | heat-cfn     | cloudformation | True    | internal  | http://172.17.0.10:8000/v1                    |
# | 96f9cc117a744e74af9a8c889bdcc294 | regionOne | neutron      | network        | True    | internal  | http://172.17.0.10:9696                       |
# | 97996f0256db4ecd97d24dd09d122fed | regionOne | swift        | object-store   | True    | admin     | http://172.18.0.10:8080                       |
# | a874381f22ef480eb858c2062ee0bc84 | regionOne | cinderv2     | volumev2       | True    | internal  | http://172.17.0.10:8776/v2/%(tenant_id)s      |
# | ae9e4589fda4420ebfac1e4d385ebf39 | regionOne | heat-cfn     | cloudformation | True    | public    | http://172.21.6.40:8000/v1                    |
# | c25acb62d7fd4c0c9450c62f99e257e9 | regionOne | neutron      | network        | True    | public    | http://172.21.6.40:9696                       |
# | e16700f441b64798beca1d95982743e0 | regionOne | keystone     | identity       | True    | public    | http://172.21.6.40:5000                       |
# | e433ea6ae0bf4ce19b2fe5424204d35b | regionOne | heat         | orchestration  | True    | admin     | http://172.17.0.10:8004/v1/%(tenant_id)s      |
# | ef151edac10b4838909009e8892fa3a4 | regionOne | placement    | placement      | True    | internal  | http://172.17.0.10:8778/placement             |
# | f31465304ea249f5a4888e52269c6891 | regionOne | keystone     | identity       | True    | admin     | http://192.168.7.40:35357                     |
# | f42b5746d871425db507cf32f4d7d536 | regionOne | cinderv2     | volumev2       | True    | public    | http://172.21.6.40:8776/v2/%(tenant_id)s      |
# | fbe5ee79ad504ef995d111bb2e76a032 | regionOne | swift        | object-store   | True    | public    | http://172.21.6.40:8080/v1/AUTH_%(tenant_id)s |
# | fdefc2aa2c36430294c7c436662cfb16 | regionOne | glance       | image          | True    | public    | http://172.21.6.40:9292                       |
# +----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+


openstack flavor create --ram 512 --disk 1 --vcpu 1 --public tiny
# +----------------------------+--------------------------------------+
# | Field                      | Value                                |
# +----------------------------+--------------------------------------+
# | OS-FLV-DISABLED:disabled   | False                                |
# | OS-FLV-EXT-DATA:ephemeral  | 0                                    |
# | disk                       | 1                                    |
# | id                         | 511a9cc2-9f68-4850-a3e1-de40a68db8d7 |
# | name                       | tiny                                 |
# | os-flavor-access:is_public | True                                 |
# | properties                 |                                      |
# | ram                        | 512                                  |
# | rxtx_factor                | 1.0                                  |
# | swap                       |                                      |
# | vcpus                      | 1                                    |
# +----------------------------+--------------------------------------+

#####################
# on helper
wget https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img

oc cp /data/swap/cirros-0.4.0-x86_64-disk.img openstack/openstackclient:/home/cloud-admin/swap/
# end on helper
#####################

openstack image create cirros --container-format bare --disk-format qcow2 --public --file /home/cloud-admin/swap/cirros-0.4.0-x86_64-disk.img
# +------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field            | Value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
# +------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | checksum         | 443b7623e27ecf03dc9e01ee93f67afe                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
# | container_format | bare                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | created_at       | 2022-11-18T15:38:57Z                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | disk_format      | qcow2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
# | file             | /v2/images/2d66e2af-8fcb-4d33-8232-5949787a6164/file                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | id               | 2d66e2af-8fcb-4d33-8232-5949787a6164                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | min_disk         | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
# | min_ram          | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
# | name             | cirros                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
# | owner            | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
# | properties       | direct_url='rbd://1438a42b-6d15-4143-8e73-8d9f2c9488be/images/2d66e2af-8fcb-4d33-8232-5949787a6164/snap', locations='[{'url': 'rbd://1438a42b-6d15-4143-8e73-8d9f2c9488be/images/2d66e2af-8fcb-4d33-8232-5949787a6164/snap', 'metadata': {'store': 'default_backend'}}]', os_hash_algo='sha512', os_hash_value='6513f21e44aa3da349f248188a44bc304a3653a04122d8fb4535423c8e1d14cd6a153f735bb0982e2161b5b5186106570c17a9e58b64dd39390617cd5a350f78', os_hidden='False', stores='default_backend' |
# | protected        | False                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
# | schema           | /v2/schemas/image                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
# | size             | 12716032                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
# | status           | active                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
# | tags             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
# | updated_at       | 2022-11-18T15:39:03Z                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | virtual_size     | None                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
# | visibility       | public                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
# +------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

ssh-keygen -m PEM -t rsa -b 2048 -f ~/.ssh/id_rsa_pem

openstack keypair create --public-key ~/.ssh/id_rsa_pem.pub default
# +-------------+-------------------------------------------------+
# | Field       | Value                                           |
# +-------------+-------------------------------------------------+
# | fingerprint | 64:34:f6:33:9f:87:10:27:d6:5f:80:4c:e7:03:a7:2a |
# | name        | default                                         |
# | user_id     | 0049debaf5d34a83a54486fd418b6981                |
# +-------------+-------------------------------------------------+
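
# the RSA key above is generated in PEM format mainly for compatibility with
# older SSH clients such as the one shipped in the cirros image.
# optional verification sketch: confirm the keypair was registered with nova.
openstack keypair list
openstack keypair show default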

openstack security group create basic
# +-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field           | Value                                                                                                                                                                     |
# +-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | created_at      | 2022-11-18T15:40:45Z                                                                                                                                                      |
# | description     | basic                                                                                                                                                                     |
# | id              | c3f80883-589f-4cf4-b1b5-059e7966ae82                                                                                                                                      |
# | location        | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name            | basic                                                                                                                                                                     |
# | project_id      | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | revision_number | 1                                                                                                                                                                         |
# | rules           | created_at='2022-11-18T15:40:46Z', direction='egress', ethertype='IPv4', id='184384ca-4048-4711-9419-b2a9c3c685f8', updated_at='2022-11-18T15:40:46Z'                     |
# |                 | created_at='2022-11-18T15:40:46Z', direction='egress', ethertype='IPv6', id='19cf0c67-6724-49e3-a13c-e366e662b63e', updated_at='2022-11-18T15:40:46Z'                     |
# | tags            | []                                                                                                                                                                        |
# | updated_at      | 2022-11-18T15:40:46Z                                                                                                                                                      |
# +-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack security group rule create basic --protocol tcp --dst-port 22:22 --remote-ip 0.0.0.0/0
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field             | Value                                                                                                                                                                     |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | created_at        | 2022-11-18T15:41:24Z                                                                                                                                                      |
# | description       |                                                                                                                                                                           |
# | direction         | ingress                                                                                                                                                                   |
# | ether_type        | IPv4                                                                                                                                                                      |
# | id                | a44aa3ee-9fb3-4559-a655-eb90ea974cf8                                                                                                                                      |
# | location          | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name              | None                                                                                                                                                                      |
# | port_range_max    | 22                                                                                                                                                                        |
# | port_range_min    | 22                                                                                                                                                                        |
# | project_id        | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | protocol          | tcp                                                                                                                                                                       |
# | remote_group_id   | None                                                                                                                                                                      |
# | remote_ip_prefix  | 0.0.0.0/0                                                                                                                                                                 |
# | revision_number   | 0                                                                                                                                                                         |
# | security_group_id | c3f80883-589f-4cf4-b1b5-059e7966ae82                                                                                                                                      |
# | tags              | []                                                                                                                                                                        |
# | updated_at        | 2022-11-18T15:41:24Z                                                                                                                                                      |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack security group rule create --protocol icmp basic
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field             | Value                                                                                                                                                                     |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | created_at        | 2022-11-18T15:42:26Z                                                                                                                                                      |
# | description       |                                                                                                                                                                           |
# | direction         | ingress                                                                                                                                                                   |
# | ether_type        | IPv4                                                                                                                                                                      |
# | id                | dfe1760d-c76e-4797-9bbe-cf5cbb5a8386                                                                                                                                      |
# | location          | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name              | None                                                                                                                                                                      |
# | port_range_max    | None                                                                                                                                                                      |
# | port_range_min    | None                                                                                                                                                                      |
# | project_id        | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | protocol          | icmp                                                                                                                                                                      |
# | remote_group_id   | None                                                                                                                                                                      |
# | remote_ip_prefix  | 0.0.0.0/0                                                                                                                                                                 |
# | revision_number   | 0                                                                                                                                                                         |
# | security_group_id | c3f80883-589f-4cf4-b1b5-059e7966ae82                                                                                                                                      |
# | tags              | []                                                                                                                                                                        |
# | updated_at        | 2022-11-18T15:42:26Z                                                                                                                                                      |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack security group rule create --protocol udp --dst-port 53:53 basic
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field             | Value                                                                                                                                                                     |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | created_at        | 2022-11-18T15:42:58Z                                                                                                                                                      |
# | description       |                                                                                                                                                                           |
# | direction         | ingress                                                                                                                                                                   |
# | ether_type        | IPv4                                                                                                                                                                      |
# | id                | 339c1bfd-e812-472d-84db-77ba47425dfc                                                                                                                                      |
# | location          | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name              | None                                                                                                                                                                      |
# | port_range_max    | 53                                                                                                                                                                        |
# | port_range_min    | 53                                                                                                                                                                        |
# | project_id        | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | protocol          | udp                                                                                                                                                                       |
# | remote_group_id   | None                                                                                                                                                                      |
# | remote_ip_prefix  | 0.0.0.0/0                                                                                                                                                                 |
# | revision_number   | 0                                                                                                                                                                         |
# | security_group_id | c3f80883-589f-4cf4-b1b5-059e7966ae82                                                                                                                                      |
# | tags              | []                                                                                                                                                                        |
# | updated_at        | 2022-11-18T15:42:58Z                                                                                                                                                      |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
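
# optional verification sketch: list the rules attached to the "basic"
# security group; you should see the default egress rules plus the
# tcp/22, icmp and udp/53 ingress rules created above.
openstack security group rule list basic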

openstack network create --external --provider-physical-network datacentre --provider-network-type flat public
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field                     | Value                                                                                                                                                                     |
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | admin_state_up            | UP                                                                                                                                                                        |
# | availability_zone_hints   |                                                                                                                                                                           |
# | availability_zones        |                                                                                                                                                                           |
# | created_at                | 2022-11-18T15:47:04Z                                                                                                                                                      |
# | description               |                                                                                                                                                                           |
# | dns_domain                |                                                                                                                                                                           |
# | id                        | 38ae7119-1628-45c6-8763-24dd5eb967cc                                                                                                                                      |
# | ipv4_address_scope        | None                                                                                                                                                                      |
# | ipv6_address_scope        | None                                                                                                                                                                      |
# | is_default                | False                                                                                                                                                                     |
# | is_vlan_transparent       | None                                                                                                                                                                      |
# | location                  | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | mtu                       | 1500                                                                                                                                                                      |
# | name                      | public                                                                                                                                                                    |
# | port_security_enabled     | True                                                                                                                                                                      |
# | project_id                | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | provider:network_type     | flat                                                                                                                                                                      |
# | provider:physical_network | datacentre                                                                                                                                                                |
# | provider:segmentation_id  | None                                                                                                                                                                      |
# | qos_policy_id             | None                                                                                                                                                                      |
# | revision_number           | 1                                                                                                                                                                         |
# | router:external           | External                                                                                                                                                                  |
# | segments                  | None                                                                                                                                                                      |
# | shared                    | False                                                                                                                                                                     |
# | status                    | ACTIVE                                                                                                                                                                    |
# | subnets                   |                                                                                                                                                                           |
# | tags                      |                                                                                                                                                                           |
# | updated_at                | 2022-11-18T15:47:06Z                                                                                                                                                      |
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack network create --internal private
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field                     | Value                                                                                                                                                                     |
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | admin_state_up            | UP                                                                                                                                                                        |
# | availability_zone_hints   |                                                                                                                                                                           |
# | availability_zones        |                                                                                                                                                                           |
# | created_at                | 2022-11-18T15:48:08Z                                                                                                                                                      |
# | description               |                                                                                                                                                                           |
# | dns_domain                |                                                                                                                                                                           |
# | id                        | a33927cd-7983-490b-8ab1-e70887abc398                                                                                                                                      |
# | ipv4_address_scope        | None                                                                                                                                                                      |
# | ipv6_address_scope        | None                                                                                                                                                                      |
# | is_default                | False                                                                                                                                                                     |
# | is_vlan_transparent       | None                                                                                                                                                                      |
# | location                  | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | mtu                       | 1442                                                                                                                                                                      |
# | name                      | private                                                                                                                                                                   |
# | port_security_enabled     | True                                                                                                                                                                      |
# | project_id                | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | provider:network_type     | geneve                                                                                                                                                                    |
# | provider:physical_network | None                                                                                                                                                                      |
# | provider:segmentation_id  | 44033                                                                                                                                                                     |
# | qos_policy_id             | None                                                                                                                                                                      |
# | revision_number           | 1                                                                                                                                                                         |
# | router:external           | Internal                                                                                                                                                                  |
# | segments                  | None                                                                                                                                                                      |
# | shared                    | False                                                                                                                                                                     |
# | status                    | ACTIVE                                                                                                                                                                    |
# | subnets                   |                                                                                                                                                                           |
# | tags                      |                                                                                                                                                                           |
# | updated_at                | 2022-11-18T15:48:08Z                                                                                                                                                      |
# +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
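
# optional verification sketch: both networks should now be visible,
# "public" as the flat external provider network and "private" as a
# geneve tenant network.
openstack network list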


export GATEWAY=172.21.6.254
# export STANDALONE_HOST=192.168.25.2
export PUBLIC_NETWORK_CIDR=172.21.6.0/24
export PRIVATE_NETWORK_CIDR=192.168.100.0/24
export PUBLIC_NET_START=172.21.6.70
export PUBLIC_NET_END=172.21.6.80
export DNS_SERVER=172.21.1.1
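# the values above reflect the author's lab: 172.21.6.0/24 is the external
# provider network and 192.168.100.0/24 the tenant network; adjust
# GATEWAY / PUBLIC_NET_START / PUBLIC_NET_END / DNS_SERVER to match your
# own environment before creating the subnets below.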

openstack subnet create public-net \
    --subnet-range $PUBLIC_NETWORK_CIDR \
    --no-dhcp \
    --gateway $GATEWAY \
    --allocation-pool start=$PUBLIC_NET_START,end=$PUBLIC_NET_END \
    --network public
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field             | Value                                                                                                                                                                     |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | allocation_pools  | 172.21.6.70-172.21.6.80                                                                                                                                                   |
# | cidr              | 172.21.6.0/24                                                                                                                                                             |
# | created_at        | 2022-11-18T15:51:01Z                                                                                                                                                      |
# | description       |                                                                                                                                                                           |
# | dns_nameservers   |                                                                                                                                                                           |
# | enable_dhcp       | False                                                                                                                                                                     |
# | gateway_ip        | 172.21.6.254                                                                                                                                                              |
# | host_routes       |                                                                                                                                                                           |
# | id                | 812aa93f-aa5b-42b5-97ef-63ae59e8c1da                                                                                                                                      |
# | ip_version        | 4                                                                                                                                                                         |
# | ipv6_address_mode | None                                                                                                                                                                      |
# | ipv6_ra_mode      | None                                                                                                                                                                      |
# | location          | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name              | public-net                                                                                                                                                                |
# | network_id        | 38ae7119-1628-45c6-8763-24dd5eb967cc                                                                                                                                      |
# | prefix_length     | None                                                                                                                                                                      |
# | project_id        | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | revision_number   | 0                                                                                                                                                                         |
# | segment_id        | None                                                                                                                                                                      |
# | service_types     |                                                                                                                                                                           |
# | subnetpool_id     | None                                                                                                                                                                      |
# | tags              |                                                                                                                                                                           |
# | updated_at        | 2022-11-18T15:51:01Z                                                                                                                                                      |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack subnet create private-net \
    --subnet-range $PRIVATE_NETWORK_CIDR \
    --network private
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field             | Value                                                                                                                                                                     |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | allocation_pools  | 192.168.100.2-192.168.100.254                                                                                                                                             |
# | cidr              | 192.168.100.0/24                                                                                                                                                          |
# | created_at        | 2022-11-18T15:52:06Z                                                                                                                                                      |
# | description       |                                                                                                                                                                           |
# | dns_nameservers   |                                                                                                                                                                           |
# | enable_dhcp       | True                                                                                                                                                                      |
# | gateway_ip        | 192.168.100.1                                                                                                                                                             |
# | host_routes       |                                                                                                                                                                           |
# | id                | 0a378d92-386f-437b-acf8-564595e394ba                                                                                                                                      |
# | ip_version        | 4                                                                                                                                                                         |
# | ipv6_address_mode | None                                                                                                                                                                      |
# | ipv6_ra_mode      | None                                                                                                                                                                      |
# | location          | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name              | private-net                                                                                                                                                               |
# | network_id        | a33927cd-7983-490b-8ab1-e70887abc398                                                                                                                                      |
# | prefix_length     | None                                                                                                                                                                      |
# | project_id        | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | revision_number   | 0                                                                                                                                                                         |
# | segment_id        | None                                                                                                                                                                      |
# | service_types     |                                                                                                                                                                           |
# | subnetpool_id     | None                                                                                                                                                                      |
# | tags              |                                                                                                                                                                           |
# | updated_at        | 2022-11-18T15:52:06Z                                                                                                                                                      |
# +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
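
# optional verification sketch: confirm both subnets and their CIDRs.
openstack subnet list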

# NOTE: In this case an IP will be automatically assigned
# from the allocation pool for the subnet.
openstack router create vrouter
# +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field                   | Value                                                                                                                                                                     |
# +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | admin_state_up          | UP                                                                                                                                                                        |
# | availability_zone_hints |                                                                                                                                                                           |
# | availability_zones      |                                                                                                                                                                           |
# | created_at              | 2022-11-18T15:53:12Z                                                                                                                                                      |
# | description             |                                                                                                                                                                           |
# | external_gateway_info   | null                                                                                                                                                                      |
# | flavor_id               | None                                                                                                                                                                      |
# | id                      | cb1fcb45-1716-4676-8ecd-9a0ee22ce936                                                                                                                                      |
# | location                | cloud='overcloud', project.domain_id=, project.domain_name='Default', project.id='3647f67bbd5844e38e13f418143c4b57', project.name='admin', region_name='regionOne', zone= |
# | name                    | vrouter                                                                                                                                                                   |
# | project_id              | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                          |
# | revision_number         | 1                                                                                                                                                                         |
# | routes                  |                                                                                                                                                                           |
# | status                  | ACTIVE                                                                                                                                                                    |
# | tags                    |                                                                                                                                                                           |
# | updated_at              | 2022-11-18T15:53:12Z                                                                                                                                                      |
# +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack router set vrouter --external-gateway public

openstack router add subnet vrouter private-net
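
# setting the external gateway allocates an address for the router from the
# public-net allocation pool, and adding the private-net subnet plugs the
# router into the tenant network.
# optional verification sketch:
openstack router show vrouter
openstack port list --router vrouter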

openstack floating ip create public
# +---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | Field               | Value                                                                                                                                                                                               |
# +---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# | created_at          | 2022-11-18T15:56:17Z                                                                                                                                                                                |
# | description         |                                                                                                                                                                                                     |
# | dns_domain          |                                                                                                                                                                                                     |
# | dns_name            |                                                                                                                                                                                                     |
# | fixed_ip_address    | None                                                                                                                                                                                                |
# | floating_ip_address | 172.21.6.79                                                                                                                                                                                         |
# | floating_network_id | 38ae7119-1628-45c6-8763-24dd5eb967cc                                                                                                                                                                |
# | id                  | de30c4aa-3aac-4216-af83-f335aac2765e                                                                                                                                                                |
# | location            | Munch({'cloud': 'overcloud', 'region_name': 'regionOne', 'zone': None, 'project': Munch({'id': '3647f67bbd5844e38e13f418143c4b57', 'name': 'admin', 'domain_id': None, 'domain_name': 'Default'})}) |
# | name                | 172.21.6.79                                                                                                                                                                                         |
# | port_details        | None                                                                                                                                                                                                |
# | port_id             | None                                                                                                                                                                                                |
# | project_id          | 3647f67bbd5844e38e13f418143c4b57                                                                                                                                                                    |
# | qos_policy_id       | None                                                                                                                                                                                                |
# | revision_number     | 0                                                                                                                                                                                                   |
# | router_id           | None                                                                                                                                                                                                |
# | status              | DOWN                                                                                                                                                                                                |
# | subnet_id           | None                                                                                                                                                                                                |
# | tags                | []                                                                                                                                                                                                  |
# | updated_at          | 2022-11-18T15:56:17Z                                                                                                                                                                                |
# +---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
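
# optional verification sketch: the new floating IP stays in DOWN status
# until it is associated with an instance port.
openstack floating ip list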


openstack server create --flavor tiny --image cirros --key-name default --network private --security-group basic myserver
# +-------------------------------------+-----------------------------------------------+
# | Field                               | Value                                         |
# +-------------------------------------+-----------------------------------------------+
# | OS-DCF:diskConfig                   | MANUAL                                        |
# | OS-EXT-AZ:availability_zone         |                                               |
# | OS-EXT-SRV-ATTR:host                | None                                          |
# | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
# | OS-EXT-SRV-ATTR:instance_name       |                                               |
# | OS-EXT-STS:power_state              | NOSTATE                                       |
# | OS-EXT-STS:task_state               | scheduling                                    |
# | OS-EXT-STS:vm_state                 | building                                      |
# | OS-SRV-USG:launched_at              | None                                          |
# | OS-SRV-USG:terminated_at            | None                                          |
# | accessIPv4                          |                                               |
# | accessIPv6                          |                                               |
# | addresses                           |                                               |
# | adminPass                           | r9QVNEs5r8Ji                                  |
# | config_drive                        |                                               |
# | created                             | 2022-11-18T15:57:49Z                          |
# | flavor                              | tiny (511a9cc2-9f68-4850-a3e1-de40a68db8d7)   |
# | hostId                              |                                               |
# | id                                  | c6488d98-bdc8-4439-a586-f74c7d31e64d          |
# | image                               | cirros (2d66e2af-8fcb-4d33-8232-5949787a6164) |
# | key_name                            | default                                       |
# | name                                | myserver                                      |
# | progress                            | 0                                             |
# | project_id                          | 3647f67bbd5844e38e13f418143c4b57              |
# | properties                          |                                               |
# | security_groups                     | name='c3f80883-589f-4cf4-b1b5-059e7966ae82'   |
# | status                              | BUILD                                         |
# | updated                             | 2022-11-18T15:57:50Z                          |
# | user_id                             | 0049debaf5d34a83a54486fd418b6981              |
# | volumes_attached                    |                                               |
# +-------------------------------------+-----------------------------------------------+

openstack server add floating ip myserver 172.21.6.79
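
# optional verification sketch: wait until the instance goes ACTIVE and
# confirm that both the private address and the floating IP are attached.
openstack server list
openstack server show myserver -c status -c addresses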

ssh -i ~/.ssh/id_rsa_pem cirros@172.21.6.79
ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc pfifo_fast qlen 1000
#     link/ether fa:16:3e:f5:62:0f brd ff:ff:ff:ff:ff:ff
#     inet 192.168.100.108/24 brd 192.168.100.255 scope global eth0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::f816:3eff:fef5:620f/64 scope link
#        valid_lft forever preferred_lft forever
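
# a minimal in-guest connectivity sketch (run inside the cirros VM):
# 192.168.100.1 is the private-net gateway created above, so it should
# answer pings through the tenant network.
ip route
ping -c 3 192.168.100.1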

network config on the computeHCI node (computehci-0)

[root@computehci-0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:4a:71:e9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e112:e339:d036:a9ae/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:ec:40:fa brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.25/24 brd 172.22.0.255 scope global dynamic noprefixroute enp2s0
       valid_lft 79sec preferred_lft 79sec
    inet6 fe80::a494:89f6:d082:30ec/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:7c:23:1a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe7c:231a/64 scope link
       valid_lft forever preferred_lft forever
5: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:36:ee:42 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.43/24 brd 192.168.7.255 scope global enp4s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe36:ee42/64 scope link
       valid_lft forever preferred_lft forever
6: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:88:1f:84 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::67c4:3ad8:90c6:9196/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
7: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:92:b5:ed brd ff:ff:ff:ff:ff:ff
    inet6 fe80::6f9:254d:87cb:7f3e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
8: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:d9:66:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d41b:2e70:f90:39a7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
9: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:d9:61:82 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b779:5bdb:b449:3d9/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:c6:40:67:40:f9 brd ff:ff:ff:ff:ff:ff
11: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:7c:23:1a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe7c:231a/64 scope link
       valid_lft forever preferred_lft forever
12: vlan50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether ea:13:bc:e1:ae:8d brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.11/24 brd 172.20.0.255 scope global vlan50
       valid_lft forever preferred_lft forever
    inet6 fe80::e813:bcff:fee1:ae8d/64 scope link
       valid_lft forever preferred_lft forever
13: vlan30@enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:36:ee:42 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.12/24 brd 172.18.0.255 scope global vlan30
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe36:ee42/64 scope link
       valid_lft forever preferred_lft forever
14: vlan40@enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:36:ee:42 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.12/24 brd 172.19.0.255 scope global vlan40
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe36:ee42/64 scope link
       valid_lft forever preferred_lft forever
15: vlan20@enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:36:ee:42 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.13/24 brd 172.17.0.255 scope global vlan20
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe36:ee42/64 scope link
       valid_lft forever preferred_lft forever
16: br-int: <BROADCAST,MULTICAST> mtu 1442 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:0d:9a:c5:a4:eb brd ff:ff:ff:ff:ff:ff
17: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 92:b3:8a:d2:73:ac brd ff:ff:ff:ff:ff:ff
    inet6 fe80::90b3:8aff:fed2:73ac/64 scope link
       valid_lft forever preferred_lft forever
18: tapa46cc8cc-20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:f5:62:0f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fef5:620f/64 scope link
       valid_lft forever preferred_lft forever
19: tapa33927cd-70@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
    link/ether 6a:49:ac:e8:bd:37 brd ff:ff:ff:ff:ff:ff link-netns ovnmeta-a33927cd-7983-490b-8ab1-e70887abc398
    inet6 fe80::6849:acff:fee8:bd37/64 scope link
       valid_lft forever preferred_lft forever

[root@computehci-0 ~]# ip netns exec  ovnmeta-a33927cd-7983-490b-8ab1-e70887abc398 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tapa33927cd-71@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:b0:e5:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.169.254/16 brd 169.254.255.255 scope global tapa33927cd-71
       valid_lft forever preferred_lft forever
    inet 192.168.100.2/24 brd 192.168.100.255 scope global tapa33927cd-71
       valid_lft forever preferred_lft forever

[root@computehci-0 ~]# ovs-vsctl show
54ff9cc9-ffe9-4a1d-9136-4a15c60f43dd
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-ex
        fail_mode: standalone
        Port patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int
            Interface patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int
                type: patch
                options: {peer=patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674}
        Port vlan50
            tag: 50
            Interface vlan50
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port enp3s0
            Interface enp3s0
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674
            Interface patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674
                type: patch
                options: {peer=patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int}
        Port tapa46cc8cc-20
            Interface tapa46cc8cc-20
        Port ovn-8c68c1-0
            Interface ovn-8c68c1-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.20.0.10"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
        Port br-int
            Interface br-int
                type: internal
        Port tapa33927cd-70
            Interface tapa33927cd-70
    ovs_version: "2.15.7"

[root@computehci-0 ~]# ovs-dpctl dump-flows | grep 172.21.6.79
recirc_id(0),in_port(4),eth(src=00:0c:29:d9:f0:d9,dst=fa:16:3e:df:80:af),eth_type(0x0800),ipv4(src=172.21.6.0/255.255.255.192,dst=172.21.6.79,proto=1,ttl=64,frag=no), packets:1962, bytes:192276, used:0.645s, actions:ct(zone=1,nat),recirc(0x3)


[root@computehci-0 ~]# ovs-dpctl dump-flows
recirc_id(0),in_port(4),eth(src=02:2e:8d:00:00:06,dst=32:df:00:f0:a3:4d),eth_type(0x8100),vlan(vid=50,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:3438, bytes:412560, used:0.467s, actions:pop_vlan,5
recirc_id(0),in_port(4),eth(src=52:54:00:d9:66:89,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0/255.0.0.0,dst=240.0.0.0/240.0.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(4),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.22,op=1/0xff), packets:2111, bytes:126660, used:0.018s, actions:3
recirc_id(0),in_port(4),eth(src=90:b1:1c:44:d6:0f,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.21.6.103,tip=172.21.6.102,op=1/0xff,sha=90:b1:1c:44:d6:0f), packets:2280, bytes:95760, used:1.595s, actions:3
recirc_id(0x3),in_port(4),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:0c:29:d9:f0:d9,dst=fa:16:3e:df:80:af),eth_type(0x0800),ipv4(dst=192.168.100.50,proto=1,ttl=64,frag=no), packets:1988, bytes:194824, used:0.580s, actions:ct_clear,set(eth(src=fa:16:3e:89:22:a5,dst=fa:16:3e:bd:98:6b)),set(ipv4(ttl=63)),ct(zone=7),recirc(0x6)
recirc_id(0),in_port(6),eth(src=fa:16:3e:bd:98:6b,dst=fa:16:3e:89:22:a5),eth_type(0x0800),ipv4(src=192.168.100.50,dst=172.21.6.0/255.255.255.192,proto=1,frag=no), packets:1989, bytes:194922, used:0.579s, actions:ct(zone=7),recirc(0xa)
recirc_id(0),in_port(4),eth(src=52:54:00:3d:23:56,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0/255.0.0.0,dst=240.0.0.0/240.0.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(4),eth(src=52:54:00:aa:41:08,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.101,tip=192.168.7.101,op=1/0xff), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(4),eth(src=52:54:00:1e:bb:a4,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0/255.0.0.0,dst=240.0.0.0/240.0.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),tunnel(tun_id=0x0,src=172.20.0.10,dst=172.20.0.11,flags(-df+csum+key)),in_port(2),eth(),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:3437, bytes:226842, used:0.467s, actions:userspace(pid=3165666405,slow_path(bfd))
recirc_id(0),in_port(5),eth(src=32:df:00:f0:a3:4d,dst=02:2e:8d:00:00:06),eth_type(0x0800),ipv4(frag=no), packets:3458, bytes:401128, used:0.744s, actions:push_vlan(vid=50,pcp=0),4
recirc_id(0),in_port(4),eth(src=52:54:00:d9:66:89,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(dst=ff02::16,proto=58,hlimit=1,frag=no), packets:1, bytes:90, used:5.509s, actions:3
recirc_id(0xe),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:df:80:af,dst=00:0c:29:d9:f0:d9),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.21.6.0/255.255.255.192,frag=no), packets:1988, bytes:194824, used:0.580s, actions:ct_clear,4
recirc_id(0),in_port(4),eth(src=00:17:94:73:12:8b,dst=01:00:0c:cc:cc:cc),eth_type(0/0xffff), packets:1, bytes:398, used:7.440s, actions:drop
recirc_id(0),in_port(4),eth(src=52:54:00:95:3f:da,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(dst=ff02::16,proto=58,hlimit=1,frag=no), packets:1, bytes:150, used:3.723s, actions:3
recirc_id(0),in_port(4),eth(src=52:54:00:95:3f:da,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0/255.0.0.0,dst=240.0.0.0/240.0.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(4),eth(src=02:2e:8d:00:00:06,dst=32:df:00:f0:a3:4d),eth_type(0x8100),vlan(vid=50,pcp=0),encap(eth_type(0x0806)), packets:1, bytes:46, used:5.278s, actions:pop_vlan,5
recirc_id(0xa),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:bd:98:6b,dst=fa:16:3e:89:22:a5),eth_type(0x0800),ipv4(src=192.168.100.50,dst=172.21.6.11,proto=1,ttl=64,frag=no), packets:1989, bytes:194922, used:0.580s, actions:ct_clear,set(eth(src=fa:16:3e:df:80:af,dst=00:0c:29:d9:f0:d9)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0xe)
recirc_id(0),in_port(4),eth(src=fe:54:00:7c:23:1a,dst=01:80:c2:00:00:00),eth_type(0/0xffff), packets:15467, bytes:804284, used:0.468s, actions:drop
recirc_id(0),in_port(4),eth(src=00:0c:29:d9:f0:d9,dst=fa:16:3e:df:80:af),eth_type(0x0800),ipv4(src=172.21.6.0/255.255.255.192,dst=172.21.6.79,proto=1,ttl=64,frag=no), packets:2011, bytes:197078, used:0.581s, actions:ct(zone=1,nat),recirc(0x3)
recirc_id(0x6),in_port(4),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:89:22:a5,dst=fa:16:3e:bd:98:6b),eth_type(0x0800),ipv4(dst=192.168.100.50,proto=1,frag=no), packets:1988, bytes:194824, used:0.581s, actions:6
recirc_id(0),in_port(4),eth(src=52:54:00:73:0f:fb,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0/255.0.0.0,dst=240.0.0.0/240.0.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(4),eth(src=52:54:00:3d:23:56,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(dst=ff02::16,proto=58,hlimit=1,frag=no), packets:1, bytes:150, used:1.293s, actions:3
recirc_id(0),in_port(5),eth(src=32:df:00:f0:a3:4d,dst=02:2e:8d:00:00:06),eth_type(0x0806), packets:1, bytes:42, used:5.278s, actions:push_vlan(vid=50,pcp=0),4
recirc_id(0),in_port(4),eth(src=52:54:00:73:0f:fb,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(dst=ff02::16,proto=58,hlimit=1,frag=no), packets:1, bytes:150, used:9.113s, actions:3
recirc_id(0),in_port(4),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.26,op=1/0xff), packets:2117, bytes:127020, used:0.021s, actions:3
recirc_id(0),in_port(4),eth(src=52:54:00:1e:bb:a4,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(dst=ff02::16,proto=58,hlimit=1,frag=no), packets:1, bytes:150, used:5.710s, actions:3

network config on controller

[root@controller-0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.2/24 brd 10.0.2.255 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.42/24 brd 192.168.7.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet 192.168.7.40/32 brd 192.168.7.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:1/64 scope link
       valid_lft forever preferred_lft forever
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2e:8dff:fe00:2/64 scope link
       valid_lft forever preferred_lft forever
5: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc mq state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.12/24 brd 172.17.0.255 scope global enp4s0
       valid_lft forever preferred_lft forever
    inet 172.17.0.10/32 brd 172.17.0.255 scope global enp4s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:3/64 scope link
       valid_lft forever preferred_lft forever
6: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:04 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.11/24 brd 172.18.0.255 scope global enp5s0
       valid_lft forever preferred_lft forever
    inet 172.18.0.10/32 brd 172.18.0.255 scope global enp5s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:4/64 scope link
       valid_lft forever preferred_lft forever
7: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:05 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.11/24 brd 172.19.0.255 scope global enp6s0
       valid_lft forever preferred_lft forever
    inet 172.19.0.10/32 brd 172.19.0.255 scope global enp6s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:5/64 scope link
       valid_lft forever preferred_lft forever
8: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 02:2e:8d:00:00:06 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2e:8dff:fe00:6/64 scope link
       valid_lft forever preferred_lft forever
9: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7a:50:65:fe:ff:25 brd ff:ff:ff:ff:ff:ff
10: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:2e:8d:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.21.6.42/24 brd 172.21.6.255 scope global br-ex
       valid_lft forever preferred_lft forever
    inet 172.21.6.40/32 brd 172.21.6.255 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:2/64 scope link
       valid_lft forever preferred_lft forever
11: br-tenant: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:2e:8d:00:00:06 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.10/24 brd 172.20.0.255 scope global br-tenant
       valid_lft forever preferred_lft forever
    inet6 fe80::2e:8dff:fe00:6/64 scope link
       valid_lft forever preferred_lft forever
12: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7a:5c:8b:47:78:45 brd ff:ff:ff:ff:ff:ff
13: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether ce:f7:9c:bb:43:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ccf7:9cff:febb:43b7/64 scope link
       valid_lft forever preferred_lft forever



[root@controller-0 ~]# ovs-vsctl show
78811126-7766-4789-98e3-e8c3cf1cff0e
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674
            Interface patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674
                type: patch
                options: {peer=patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int}
        Port ovn-980774-0
            Interface ovn-980774-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.20.0.11"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tenant
        fail_mode: standalone
        Port enp7s0
            Interface enp7s0
        Port br-tenant
            Interface br-tenant
                type: internal
    Bridge br-ex
        fail_mode: standalone
        Port br-ex
            Interface br-ex
                type: internal
        Port patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int
            Interface patch-provnet-bfad145a-fb08-420c-8454-2c69a86ac674-to-br-int
                type: patch
                options: {peer=patch-br-int-to-provnet-bfad145a-fb08-420c-8454-2c69a86ac674}
        Port enp3s0
            Interface enp3s0
    ovs_version: "2.15.7"


[root@controller-0 ~]# ovs-dpctl dump-flows
recirc_id(0),in_port(2),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.26,op=1/0xff), packets:2057, bytes:123420, used:0.758s, actions:1
recirc_id(0),in_port(2),eth(src=00:17:94:73:12:8b,dst=01:00:0c:00:00:00),eth_type(0/0xffff), packets:0, bytes:0, used:never, actions:drop
recirc_id(0),in_port(5),eth(src=00:17:94:73:12:8b,dst=01:00:0c:00:00:00),eth_type(0/0xffff), packets:0, bytes:0, used:never, actions:drop
recirc_id(0),in_port(5),eth(src=52:54:00:4a:71:e9,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:90, used:9.340s, actions:6
recirc_id(0),in_port(5),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.22,op=1/0xff), packets:2046, bytes:122760, used:0.756s, actions:6
recirc_id(0),in_port(2),eth(src=00:17:94:73:12:8b,dst=01:00:0c:cc:cc:cc),eth_type(0/0xffff), packets:0, bytes:0, used:never, actions:drop
recirc_id(0),in_port(5),eth(src=fe:54:00:b9:09:0e,dst=01:80:c2:00:00:00),eth_type(0/0xffff), packets:1706, bytes:88712, used:0.181s, actions:drop
recirc_id(0),in_port(6),eth(src=02:2e:8d:00:00:06,dst=32:df:00:f0:a3:4d),eth_type(0x0806), packets:1, bytes:42, used:0.636s, actions:5
recirc_id(0),tunnel(tun_id=0x0,src=172.20.0.11,dst=172.20.0.10,flags(-df+csum+key)),in_port(4),eth(),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:3544, bytes:233904, used:0.822s, actions:userspace(pid=2361299521,slow_path(bfd))
recirc_id(0),in_port(2),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.22,op=1/0xff), packets:2057, bytes:123420, used:0.758s, actions:1
recirc_id(0),in_port(5),eth(src=32:df:00:f0:a3:4d,dst=02:2e:8d:00:00:06),eth_type(0x0806), packets:1, bytes:42, used:0.636s, actions:6
recirc_id(0),in_port(5),eth(src=00:17:94:73:12:8b,dst=01:00:0c:cc:cc:cc),eth_type(0/0xffff), packets:0, bytes:0, used:never, actions:drop
recirc_id(0),in_port(2),eth(src=fe:54:00:5c:61:1f,dst=01:80:c2:00:00:00),eth_type(0/0xffff), packets:1714, bytes:89128, used:0.181s, actions:drop
recirc_id(0),in_port(2),eth(src=52:54:00:72:47:a9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.22.0.6,tip=172.22.0.240,op=1/0xff), packets:0, bytes:0, used:never, actions:1
recirc_id(0),in_port(5),eth(src=52:54:00:72:47:a9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.22.0.6,tip=172.22.0.240,op=1/0xff), packets:0, bytes:0, used:never, actions:6
recirc_id(0),in_port(2),eth(src=90:b1:1c:44:d6:0f,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.21.6.103,tip=172.21.6.102,op=1/0xff), packets:2584, bytes:108528, used:1.372s, actions:1
recirc_id(0),in_port(5),eth(src=32:df:00:f0:a3:4d,dst=02:2e:8d:00:00:06),eth_type(0x0800),ipv4(frag=no), packets:3545, bytes:411220, used:0.822s, actions:6
recirc_id(0),in_port(5),eth(src=90:b1:1c:44:d6:0f,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.21.6.103,tip=172.21.6.102,op=1/0xff), packets:2571, bytes:107982, used:1.371s, actions:6
recirc_id(0),in_port(5),eth(src=00:0c:29:d9:f0:d9,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.7.11,tip=192.168.7.26,op=1/0xff), packets:2046, bytes:122760, used:0.756s, actions:6
recirc_id(0),in_port(6),eth(src=02:2e:8d:00:00:06,dst=32:df:00:f0:a3:4d),eth_type(0x0800),ipv4(frag=no), packets:3525, bytes:408900, used:0.280s, actions:5
recirc_id(0),in_port(2),eth(src=52:54:00:4a:71:e9,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=fe80::/ffc0::,dst=ff02::16,proto=58,hlimit=1,frag=no),icmpv6(type=143), packets:1, bytes:90, used:9.344s, actions:1

tips

delete


oc delete -f ${BASE_DIR}/data/install/overcloud-network.yaml -n openstack
oc delete -f ${BASE_DIR}/data/install/ctlplane-network.yaml -n openstack
oc delete -f ${BASE_DIR}/data/install/openstack-userpassword.yaml -n openstack
oc delete -f ${BASE_DIR}/data/install/osp-director-operator.yaml

oc delete project openstack

fix nics

virsh list --all
#  Id   Name                     State
# -----------------------------------------
#  -    ocp4-acm-hub-master01    shut off
#  -    ocp4-acm-one-bootstrap   shut off
#  -    ocp4-acm-one-master-01   shut off
#  -    ocp4-acm-one-master-02   shut off
#  -    ocp4-acm-one-master-03   shut off
#  -    ocp4-ipi-osp-master-01   shut off
#  -    ocp4-ipi-osp-master-02   shut off
#  -    ocp4-ipi-osp-master-03   shut off
#  -    ocp4-ipi-osp-worker-01   shut off
#  -    osp-17-0-all-in-one      shut off

for i in {1..3}
do
  for j in {1..4}
  do
    echo ocp4-ipi-osp-master-0$i
    # virsh attach-interface --domain ocp4-ipi-osp-master-0$i --type bridge --source baremetal --model virtio
  done
done

for i in 23 24 25
do
  ssh root@192.168.7.$i "nmcli con del br-osp "
done


question

  1. network topology
  2. two hosts with ovs and vlan

end

FlexRAN 20.11 enable on ocp4, pf mode, option 7.2

This article describes how to port Intel's oran solution flexran (version 20.11) to the openshift platform.

This environment runs on openshift 4.9.5; the hardware includes an intel e810 NIC and an ACC100 accelerator card. Due to software limitations, the NIC is switched to VF mode, while the ACC100 stays in PF mode without VFs. The PTP component does not use the ptp operator that ships with openshift, but an upgraded custom build. The diagram below shows how the running container relates to the operator and the hardware:

The overall network architecture diagram of this experiment:

What the intel E810 NIC looks like:

What the intel ACC100 looks like:

The RU used for the experiment looks like this:

Video walkthrough

For how to build the related base image, please refer to the environment development documentation.

Application image build

We have already built a base image, quay.io/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4-core-conf. The image is large (>5G), so an image registry at the project site is very much needed. On site we also need to adjust the bbu application parameters, which is done by injecting a script through a config map.

Core configuration

The bbu application is a large dpdk application, and for dpdk applications the cpu pinning configuration is critical: a bad configuration leads straight to a dpdk core dump, or even hangs the physical machine. Here we provide a 16-core configuration template. It uses cores 1-16, and actual testing shows the stability is acceptable.

A characteristic of the demo bbu application is that the physical layer uses 8 cores and l2/l3 use the remaining 8 cores; if these cores conflict with each other, the physical layer core-dumps.

Building the image

The upstream image is a ubi-init image with systemd inside; it ships a set_ip.sh script and the corresponding systemd service. During the actual project, however, applications such as the bbu would exit inexplicably when started through systemd, so we keep using this ubi-init image but do not run the default init at startup; instead we point it at our own script.

Since we run our own script anyway, we do the environment initialization in that script and also put the bbu core-pinning parameters there. Finally, the bbu application is started from the k8s configuration.

The detailed image-build steps are quite tedious; see this document.

deploy on ocp 4.9.5

With the images ready, we start deploying and testing on openshift4.

set security for temp image registry

We created a temporary image registry, so we need to push its configuration into the cluster, mainly to tell the ocp cluster not to verify the certificate of this new registry.

oc patch schedulers.config.openshift.io/cluster --type merge -p '{"spec":{"mastersSchedulable":false}}'

install /data/ocp4/clients/butane-amd64 /usr/local/bin/butane

cat << EOF > /data/sno/tmp.images.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-zzz-worker-temp-images
storage:
  files:
    - path: /etc/containers/registries.conf.d/temp.registries.conf
      overwrite: true
      contents:
        inline: |

            [[registry]]
            location = "tmp-registry.ocp4.redhat.ren:5443"
            insecure = true
            blocked = false
            mirror-by-digest-only = false
            prefix = ""

EOF

butane /data/sno/tmp.images.bu > /data/sno/99-zzz-worker-temp-images.yaml

oc create -f /data/sno/99-zzz-worker-temp-images.yaml
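After the machine config rolls out, it is worth checking that the drop-in file really landed on the node. A minimal sketch, assuming oc debug access to worker-2 and that the worker pool has finished updating:

oc get mcp worker

oc debug node/worker-2.ocp4.redhat.ren -- chroot /host \
  cat /etc/containers/registries.conf.d/temp.registries.conf
# should print the [[registry]] stanza defined above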

worker-2 node, rt-kernel setting

The bbu application needs real-time operating system support, so we use openshift's performance addon operator (PAO) for this. PAO can enable the rt-kernel and also set kernel parameters; we need to configure hugepages and cpu isolation, and blacklist the e810 (ice) driver.


cat << EOF > /data/install/performance-2.yaml
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
   name: wzh-performanceprofile-2
spec:
  additionalKernelArgs:
    - nmi_watchdog=0
    - isolcpus=1-18
    - nohz_full=1-18
    - rcu_nocbs=1-18
    - kthread_cpus=0,19
    - irqaffinity=0,19
    - iommu=pt
    - intel_iommu=on
    - intel_pstate=disable
    # try to upgrade e810 driver
    - module_name.blacklist=1 
    - rd.driver.blacklist=ice
    # profile creator
    - audit=0
    - mce=off
    - nmi_watchdog=0
  globallyDisableIrqLoadBalancing: true
  cpu:
      isolated: "1-18"
      reserved: "0,19"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size:  "1G"
      count:  24
  realTimeKernel:
      enabled: true
  numa:  
      topologyPolicy: "single-numa-node"
  nodeSelector:
      node-role.kubernetes.io/worker-rt-2: ""
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-rt-2
EOF
oc create  --save-config  -f /data/install/performance-2.yaml

# oc apply -f /data/install/performance-2.yaml
# oc delete -f /data/install/performance-2.yaml

oc label node worker-2.ocp4.redhat.ren node-role.kubernetes.io/worker-rt-2=""
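A quick way to follow the rollout (assuming the worker-rt-2 MachineConfigPool was created earlier); once the node reboots, the kernel version should carry an rt component:

oc get mcp worker-rt-2 -w

oc get node worker-2.ocp4.redhat.ren -o jsonpath='{.status.nodeInfo.kernelVersion}' && echo
# a realtime kernel shows up with ".rt" in the version string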

intel e810 driver

The e810 driver ice.ko that ships with the coreos of openshift 4.9.5 is a fairly old version and does not support ptp, so we need to upgrade the driver. Upgrading a driver on coreos is troublesome, so we create a systemd service that starts before kubelet; in this service podman runs a privileged container, and inside the container we insmod ice.ko. The precondition for all of this is that the automatic loading of ice is blacklisted via kernel parameters.


cat << EOF > /data/sno/static-pod.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt-2
  name: 99-zzz-e810-dpdk-driver-static-worker-rt-2
storage:
  files:
    - path: /etc/modprobe.d/blacklist-ice.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          blacklist ice
systemd:
  units:
    - name: driver.ice.service
      enabled: true
      contents: |
        [Unit]
        Description=driver.ice service
        Wants=network-online.target
        After=network-online.target

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        User=root
        WorkingDirectory=/root/
        ExecStart=podman run --privileged --rm -it quay.io/nepdemo/intel-driver:8.4-rt-1.9.7 /bin/sh -c " rmmod ice; rmmod auxiliary ; insmod /diy/auxiliary.ko; insmod /diy/ice.ko ; "

        [Install]
        WantedBy=multi-user.target
    - name: kubelet.service
      dropins:
      - name: 99-after-ice.conf
        contents: |
          [Unit]
          Requires=driver.ice.service
          After=driver.ice.service

EOF

butane -d /data/install /data/sno/static-pod.bu > /data/install/99-zzz-e810-dpdk-driver-static-worker-rt-2.yaml

oc create --save-config -f /data/install/99-zzz-e810-dpdk-driver-static-worker-rt-2.yaml

# oc apply -f /data/install/99-zzz-e810-dpdk-driver-static-worker-rt-2.yaml
# oc delete -f /data/install/99-zzz-e810-dpdk-driver-static-worker-rt-2.yaml
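After the node comes back, a quick sanity check that the service ran and the upgraded modules are loaded. A sketch, assuming oc debug access to the node:

oc debug node/worker-2.ocp4.redhat.ren -- chroot /host systemctl status driver.ice.service --no-pager

oc debug node/worker-2.ocp4.redhat.ren -- chroot /host sh -c "lsmod | grep -E '^(ice|auxiliary)'"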

linuxptp 3.1.1

vRAN applications, especially the option 7.2 solution, need a ptp clock. Physically, either the GPS grandmaster connects to a switch and the switch distributes time, or the GPS grandmaster connects directly to the NIC. On the server side you need a NIC that supports PTP timing over the network, the ptp services must run on the host, and ntp must be turned off.

As the figure above shows, ptp4l takes time from the network and writes it to the NIC, while phc2sys copies it from the NIC to the system clock. ts2phc is presumably used to synchronize the local time out to other devices.

The ptp operator bundled with openshift is an older version; since we need a newer one, we build our own image and service.

build linuxptp container image

On the external network we build an image from linuxptp 3.1.1, with support for parameter injection so that it is easy to tune on the project site.

# http://linuxptp.sourceforge.net/
# download linuxptp-3.1.1

mkdir -p /data/ptp
cd /data/ptp
wget https://nchc.dl.sourceforge.net/project/linuxptp/v3.1/linuxptp-3.1.1.tgz
tar zvxf linuxptp-3.1.1.tgz
cd linuxptp-3.1.1
make

cat << 'EOF' > ptp4l.sh
#!/bin/bash

if [ -z "$DEMO_ENV_PRIO" ]; then
  /usr/local/sbin/ptp4l -f /etc/ptp4l.conf -m $DEMO_ENV_PTP4L_ARG
else
  /usr/bin/chrt -f $DEMO_ENV_PRIO /usr/local/sbin/ptp4l -f /etc/ptp4l.conf -m $DEMO_ENV_PTP4L_ARG
fi

EOF

cat << 'EOF' > phc2sys.sh
#!/bin/bash

if [ -z "$DEMO_ENV_PRIO" ]; then
  /usr/local/sbin/phc2sys -m -z /var/run/ptp4l -t [phc2sys] $DEMO_ENV_PHC2SYS_ARG
else
  /usr/bin/chrt -f $DEMO_ENV_PRIO /usr/local/sbin/phc2sys -m -z /var/run/ptp4l -t [phc2sys] $DEMO_ENV_PHC2SYS_ARG
fi

EOF

cat << 'EOF' > ts2phc.sh
#!/bin/bash

if [ -z "$DEMO_ENV_PRIO" ]; then
  /usr/local/sbin/ts2phc -f /etc/ts2phc.cfg -m $DEMO_ENV_TS2PHC_ARG
else
  /usr/bin/chrt -f $DEMO_ENV_PRIO /usr/local/sbin/ts2phc -f /etc/ts2phc.cfg -m $DEMO_ENV_TS2PHC_ARG
fi

EOF

cat << EOF > ./ptp.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

COPY hwstamp_ctl nsm phc2sys phc_ctl pmc ptp4l timemaster ts2phc incdefs.sh version.sh ptp4l.sh phc2sys.sh ts2phc.sh /usr/local/sbin/
RUN cd /usr/local/sbin/ && chmod +x hwstamp_ctl nsm phc2sys phc_ctl pmc ptp4l timemaster ts2phc incdefs.sh version.sh ptp4l.sh phc2sys.sh ts2phc.sh

EOF

podman build --squash -t quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v06 -f ptp.dockerfile ./

podman push quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v06
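A quick smoke test of the image: it should simply print the ptp4l version that was compiled in:

podman run --rm quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v06 /usr/local/sbin/ptp4l -v
# 3.1.1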

deploy linux ptp

With the image in place, we create a deployment to start ptp. Note that it contains 3 containers, plus several configmaps that inject the config files. At the project site, remember to adjust the config-file parameters.


oc new-project vbbu-demo

oc project vbbu-demo

export REG_TMP='tmp-registry.ocp4.redhat.ren:5443'

# kernel driver deployment
oc create serviceaccount svcacct-driver -n vbbu-demo
oc adm policy add-scc-to-user privileged -z svcacct-driver -n vbbu-demo
# oc adm policy add-scc-to-user anyuid -z mysvcacct -n vbbu-demo

# !!! remember to disable chronyd on dest host !!!
# we do not use the ptp operator, so we need to do it manually
# TODO
# https://docs.openshift.com/container-platform/4.10/scalability_and_performance/ztp-configuring-single-node-cluster-deployment-during-installation.html#sno-du-disabling-ntp_sno-du-deploying-distributed-units-manually-on-single-node-openshift

cat << 'EOF' > /data/install/ptp.chrony.conf
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt-2
  name: disable-chronyd
spec:
  config:
    systemd:
      units:
        - contents: |
            [Unit]
            Description=NTP client/server
            Documentation=man:chronyd(8) man:chrony.conf(5)
            After=ntpdate.service sntp.service ntpd.service
            Conflicts=ntpd.service systemd-timesyncd.service
            ConditionCapability=CAP_SYS_TIME
            [Service]
            Type=forking
            PIDFile=/run/chrony/chronyd.pid
            EnvironmentFile=-/etc/sysconfig/chronyd
            ExecStart=/usr/sbin/chronyd $OPTIONS
            ExecStartPost=/usr/libexec/chrony-helper update-daemon
            PrivateTmp=yes
            ProtectHome=yes
            ProtectSystem=full
            [Install]
            WantedBy=multi-user.target
          enabled: false
          name: chronyd.service
    ignition:
      version: 2.2.0
EOF
oc create -f /data/install/ptp.chrony.conf

cat << EOF > /data/install/ptp4l.conf
[global]
#
# Default Data Set
#
twoStepFlag              1
slaveOnly                0
priority1                128
priority2                128
domainNumber             24
clockClass               248
clockAccuracy            0xFE
offsetScaledLogVariance  0xFFFF
free_running             0
freq_est_interval        0
#
# Port Data Set
# 16 TS a second use logSyncInterval        -4
#
#logAnnounceInterval      4
logAnnounceInterval      1
logSyncInterval          -4
logMinDelayReqInterval   0
logMinPdelayReqInterval  0
announceReceiptTimeout   3
syncReceiptTimeout       0
delayAsymmetry           0
fault_reset_interval     4
neighborPropDelayThresh  20000000
#
# Run time options
#
assume_two_step          0
logging_level            6
path_trace_enabled       0
follow_up_info           0
tx_timestamp_timeout     200
use_syslog               1
verbose                  0
summary_interval         0
kernel_leap              1
check_fup_sync           0
#
# Servo Options
#
pi_proportional_const    0.0
pi_integral_const        0.0
pi_proportional_scale    0.0
pi_proportional_exponent -0.3
pi_proportional_norm_max 0.7
pi_integral_scale        0.0
pi_integral_exponent     0.4
pi_integral_norm_max     0.3
step_threshold           0.00000002
first_step_threshold     0.00002
max_frequency            900000000
clock_servo              nullf
sanity_freq_limit        200000000
ntpshm_segment           0
#
# Transport options
#
transportSpecific        0x0
ptp_dst_mac              01:1B:19:00:00:00
p2p_dst_mac              01:80:C2:00:00:0E
udp6_scope               0x0E
uds_address              /var/run/ptp4l
#
# Default interface options
#
network_transport        UDPv4
#network_transport        L2
delay_mechanism          E2E
time_stamping            hardware
delay_filter             moving_median
delay_filter_length      10
egressLatency            0
ingressLatency           0
boundary_clock_jbod      0
#
# Clock description
#
productDescription       ;;
revisionData             ;;
manufacturerIdentity     00:00:00
userDescription          ;
timeSource               0xA0
EOF

cat << EOF > /data/install/ts2phc.cfg
[global]
use_syslog              0
verbose                 1
logging_level           7
ts2phc.pulsewidth       100000000
# For GNSS module
ts2phc.nmea_serialport /dev/ttyGNSS_6500_0
[ens2f0]
ts2phc.extts_polarity rising
EOF

oc delete configmap ptp-config -n vbbu-demo

oc create configmap ptp-config -n vbbu-demo --from-file=/data/install/ptp4l.conf --from-file=/data/install/ts2phc.cfg --save-config=true

# 06 for fifo
# 07 for nice
export VAR_IMAGE='quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v06'

cat << EOF > /data/install/ptp.demo.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nepdemo-linuxptp-daemon
  labels:
    app: nepdemo-linuxptp-daemon
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nepdemo-linuxptp-daemon
  template:
    metadata:
      annotations:
      labels:
        app: nepdemo-linuxptp-daemon
      name: nepdemo-linuxptp-daemon
      # namespace: openshift-ptp
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchFields:
              - key: metadata.name
                operator: In
                values:
                - worker-2.ocp4.redhat.ren
      tolerations:
      - key: "vbbu"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: ptp4l
        image: $VAR_IMAGE
        command: ["/bin/sh", "-c", "--"]
        args: [" /usr/local/sbin/ptp4l.sh ;"]
        env:
        - name: DEMO_ENV_PTP4L_ARG
          value: " -i ens2f0 -2 "
        - name: DEMO_ENV_PRIO
          value: "65"
        securityContext:
          privileged: true
          runAsUser: 0 
        volumeMounts:
        - mountPath: /etc/ptp4l.conf
          subPath: ptp4l.conf
          name: config-volume
        - mountPath: /var/run
          name: socket-dir
      - name: phc2sys
        image: $VAR_IMAGE
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c", "--"]
        args: [" /usr/local/sbin/phc2sys.sh ;"]
        env:
        - name: DEMO_ENV_PHC2SYS_ARG
          # value: " -s ens2f0 -O 0 -R 8 "  
          value: " -s ens2f0 -r -u 1 -O 0 -R 8 "
        - name: DEMO_ENV_PRIO
          value: "65"
        securityContext:
          privileged: true
          runAsUser: 0     
        volumeMounts:
        - mountPath: /etc/ptp4l.conf
          subPath: ptp4l.conf
          name: config-volume
        - mountPath: /var/run
          name: socket-dir
      - name: ts2phc
        image: $VAR_IMAGE
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c", "--"]
        args: [" /usr/local/sbin/ts2phc.sh ;"]
        env:
        - name: DEMO_ENV_TS2PHC_ARG
          value: " -s generic -c ens2f0 "
        - name: DEMO_ENV_PRIO
          value: "65"
        securityContext:
          privileged: true
          runAsUser: 0      
        volumeMounts:
        - mountPath: /etc/ts2phc.cfg
          subPath: ts2phc.cfg
          name: config-volume
        - mountPath: /var/run
          name: socket-dir
        - name: dev
          mountPath: /dev
      hostNetwork: true
      # hostPID: true
      serviceAccountName: svcacct-driver
      volumes:
      - configMap:
          defaultMode: 420
          name: ptp-config
        name: config-volume
      - name: socket-dir
        emptyDir: {}
      - name: dev
        hostPath:
          path: "/dev"
EOF

oc create --save-config -n vbbu-demo -f /data/install/ptp.demo.yaml

# oc delete -n vbbu-demo -f /data/install/ptp.demo.yaml
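Once the deployment is up, a minimal check of the three containers and their logs (the ptp4l offsets only converge when a grandmaster is actually reachable on ens2f0):

oc get pod -n vbbu-demo -l app=nepdemo-linuxptp-daemon -o wide

oc logs -n vbbu-demo deployment/nepdemo-linuxptp-daemon -c ptp4l --tail=20
oc logs -n vbbu-demo deployment/nepdemo-linuxptp-daemon -c phc2sys --tail=20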

setup sriov operator

openshift has an sriov operator that officially supports the intel e810 NIC, so we simply use it.

The environment has an Intel E810 NIC (vendor 8086, device 1593).
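To double-check the PCI IDs on the node before writing the policy, a sketch assuming lspci is available on the host:

oc debug node/worker-2.ocp4.redhat.ren -- chroot /host lspci -nn | grep -i 1593
# should list the four E810 ports at 0000:65:00.x with vendor:device [8086:1593]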

# install sriov operator
cat << EOF > /data/install/sriov.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "stable"
  installPlanApproval: Manual
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/sriov.yaml

oc get SriovNetworkNodeState -n openshift-sriov-network-operator
# NAME                       AGE
# master-0                   42m
# worker-0.ocp4.redhat.ren   42m
# worker-1                   42m
# worker-2.ocp4.redhat.ren   42m

oc get SriovNetworkNodeState/worker-2.ocp4.redhat.ren -n openshift-sriov-network-operator -o yaml
# apiVersion: sriovnetwork.openshift.io/v1
# kind: SriovNetworkNodeState
# metadata:
#   creationTimestamp: "2022-05-06T14:34:54Z"
#   generation: 61
#   name: worker-2.ocp4.redhat.ren
#   namespace: openshift-sriov-network-operator
#   ownerReferences:
#   - apiVersion: sriovnetwork.openshift.io/v1
#     blockOwnerDeletion: true
#     controller: true
#     kind: SriovNetworkNodePolicy
#     name: default
#     uid: 4eca5eea-e1e5-410f-8833-dd2de1434e53
#   resourceVersion: "93262422"
#   uid: 1d122c8e-b788-4f1e-a3d5-865c6230a476
# spec:
#   dpConfigVersion: "93222170"
# status:
#   interfaces:
#   - deviceID: "1593"
#     driver: ice
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 40:a6:b7:82:0e:4c
#     mtu: 1500
#     name: ens2f0
#     pciAddress: 0000:65:00.0
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: "1593"
#     driver: ice
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 40:a6:b7:82:0e:4d
#     mtu: 1500
#     name: ens2f1
#     pciAddress: 0000:65:00.1
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: "1593"
#     driver: ice
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 40:a6:b7:82:0e:4e
#     mtu: 1500
#     name: ens2f2
#     pciAddress: 0000:65:00.2
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: "1593"
#     driver: ice
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 40:a6:b7:82:0e:4f
#     mtu: 1500
#     name: ens2f3
#     pciAddress: 0000:65:00.3
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: 37d1
#     driver: i40e
#     linkSpeed: 1000 Mb/s
#     linkType: ETH
#     mac: ac:1f:6b:ea:5b:32
#     mtu: 1500
#     name: eno1
#     pciAddress: 0000:b5:00.0
#     totalvfs: 32
#     vendor: "8086"
#   - deviceID: 37d1
#     driver: i40e
#     linkSpeed: 1000 Mb/s
#     linkType: ETH
#     mac: ac:1f:6b:ea:5b:33
#     mtu: 1500
#     name: eno2
#     pciAddress: 0000:b5:00.1
#     totalvfs: 32
#     vendor: "8086"
#   syncStatus: Succeeded


# how to use the sriov to create VF and attach to pod, depends on use case from nep demo request
# remember to active SRIOV in bios
# remember to active VT-d in bios
cat << EOF > /data/install/sriov.policy.yaml
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-810-nic01-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_810_nic01_rt2
  nodeSelector:
    kubernetes.io/hostname: worker-2.ocp4.redhat.ren
  numVfs: 2
  nicSelector:
    vendor: "8086"
    deviceID: "1593"
    rootDevices:
      - "0000:65:00.0"
    # pfNames:
    #   - "ens2f0"
  # linkType: eth
  # isRdma: false
  deviceType: vfio-pci 
EOF
oc create -f /data/install/sriov.policy.yaml

# oc delete -f /data/install/sriov.policy.yaml

oc get sriovnetworknodestates/worker-2.ocp4.redhat.ren -n openshift-sriov-network-operator  -o jsonpath='{.status.syncStatus}' && echo
# Succeeded
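After the policy is synced, the VFs should appear as an allocatable extended resource on the node (openshift.io/ is the operator's default resource prefix):

oc get node worker-2.ocp4.redhat.ren -o json | jq '.status.allocatable'
# expect an entry like "openshift.io/intel_810_nic01_rt2": "2"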


cat << EOF > /data/install/sriov.attach.yaml
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-810-nic01-vf0-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_810_nic01_rt2
  networkNamespace: vbbu-demo
  vlan: 5
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-810-nic01-vf1-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_810_nic01_rt2
  networkNamespace: vbbu-demo
  vlan: 5
EOF
oc create -f /data/install/sriov.attach.yaml

# oc delete -f /data/install/sriov.attach.yaml

oc get net-attach-def -n vbbu-demo
# NAME                      AGE
# intel-810-nic01-vf0-rt2   2m19s
# intel-810-nic01-vf1-rt2   2m19s


nepdemo license file

Put the license file into a config map and inject it into the container.

For now, though, we simply copy the license into the container image during the image-build step.


# load the license file into a config map
oc create configmap -n vbbu-demo license.for.nepdemo  \
    --from-file=license=./3496531EC238AD91DED6DBA5BD6B.lic

# to updated config map
oc create configmap -n vbbu-demo license.for.nepdemo --from-file=license=./3496531EC238AD91DED6DBA5BD6B.lic -o yaml --dry-run=client | oc apply -f -

create deployment for release/production

Finally, we start the service. It is a dpdk program, so we set the resource requests and limits so that the cores get pinned.
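For the resource requests and limits in the deployment below to result in exclusive pinned cores, the kubelet on worker-2 has to run the static cpu manager; the performance profile created earlier should have generated that. A sanity check, assuming the PAO-generated KubeletConfig is in place:

oc get kubeletconfig -o json | jq '.items[].spec.kubeletConfig.cpuManagerPolicy'
# expect "static"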


oc new-project vbbu-demo

oc project vbbu-demo

# kernel driver deployment
oc create serviceaccount svcacct-driver -n vbbu-demo
oc adm policy add-scc-to-user privileged -z svcacct-driver -n vbbu-demo
# oc adm policy add-scc-to-user anyuid -z mysvcacct -n vbbu-demo

16 cpu core config, auto convert

cpu core 1-16


cat << 'EOF' > /data/install/bbu.core.conf.sh
#!/bin/bash
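# NOTE: the to_hex / to_dec helpers used below are not defined in this script;
# they are assumed to be provided by the base image environment, converting a
# comma-separated core list into a hex / decimal core bitmap.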

sed -i 's/<systemThread>.*</<systemThread>2, 0, 0</'  /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml
sed -i 's/<timerThread>.*</<timerThread>1, 96, 0</'   /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml
sed -i 's/<FpgaDriverCpuInfo>.*</<FpgaDriverCpuInfo>3, 96, 0</'   /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml
sed -i 's/<FrontHaulCpuInfo>.*</<FrontHaulCpuInfo>3, 96, 0</'     /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml
sed -i 's/<radioDpdkMaster>.*</<radioDpdkMaster>2, 99, 0</'       /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml
sed -i "s/<BbuPoolThreadDefault_0_63>.*</<BbuPoolThreadDefault_0_63>0x$(to_hex '10,11,12,13,14,15')</"   /root/flexran/bin/nr5g/gnb/l1/phycfg_xran.xml

sed -i 's/<xRANThread>.*</<xRANThread>9, 96, 0</'     /root/flexran/bin/nr5g/gnb/l1/xrancfg_sub6.xml
sed -i "s/<xRANWorker>.*</<xRANWorker>0x$(to_hex '16'), 96, 0</" /root/flexran/bin/nr5g/gnb/l1/xrancfg_sub6.xml

sed -i "s/OAM_SHARED_CORE_BITMAP=.*/OAM_SHARED_CORE_BITMAP=$(to_dec '3,4')/"  /etc/BBU_cfg/cu_cfg/gNodeB_CU_Configuration.cfg
sed -i "s/L3_SHARED_CORE_BITMAP=.*/L3_SHARED_CORE_BITMAP=$(to_dec '4,5')/"    /etc/BBU_cfg/cu_cfg/gNodeB_CU_Configuration.cfg
sed -i "s/PDCP_SHRED_CORE_BITMAP=.*/PDCP_SHRED_CORE_BITMAP=$(to_dec '7,8')/"  /etc/BBU_cfg/cu_cfg/gNodeB_CU_Configuration.cfg
sed -i "s/RRM_SHARED_CORE_BITMAP=.*/RRM_SHARED_CORE_BITMAP=$(to_dec '1,8')/"  /etc/BBU_cfg/cu_cfg/gNodeB_CU_Configuration.cfg
sed -i "s/SON_SHARED_CORE_BITMAP=.*/SON_SHARED_CORE_BITMAP=$(to_dec '1,2')/"  /etc/BBU_cfg/cu_cfg/gNodeB_CU_Configuration.cfg

# https://unix.stackexchange.com/questions/487451/sed-replace-a-pattern-between-a-pattern-and-the-end-of-file
sed -i "/<oam_shm_logger_cfg>/,\$s/<cpu_bitmap>.*</<cpu_bitmap>$(to_dec '7')</" /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i "/<shm_logger_cfg>/,\$s/<cpu_bitmap>.*</<cpu_bitmap>$(to_dec '7')</"     /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<L3Params>/,$s/<core_no>.*</<core_no>2</'        /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<process_name>gnb_cu_son</,$s/<process_args>.* /<process_args>2 /' /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<process_name>gnb_cu_rrm</,$s/<process_args>.* /<process_args>2 /' /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<pdcp_index>0</,$s/<core_num_for_worker_thread>.*</<core_num_for_worker_thread>3</' /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<pdcp_index>1</,$s/<core_num_for_worker_thread>.*</<core_num_for_worker_thread>3</' /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<egtpu_instance>0</,$s/<core_num_of_worker_thread>.*</<core_num_of_worker_thread>6</'  /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<egtpu_instance>1</,$s/<core_num_of_worker_thread>.*</<core_num_of_worker_thread>6</'  /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<f1u_instance>0</,$s/<core_num_of_worker_thread>.*</<core_num_of_worker_thread>6</'    /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i '/<f1u_instance>1</,$s/<core_num_of_worker_thread>.*</<core_num_of_worker_thread>6</'    /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml
sed -i 's/<core_num_mapping>.*</<core_num_mapping>4,4</'    /etc/BBU_cfg/cu_cfg/Proprietary_gNodeB_CU_Data_Model.xml

sed -i 's/MAC_BINREAD_CORE_NUM=.*/MAC_BINREAD_CORE_NUM=1/'  /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg
sed -i 's/RLC_BINREAD_CORE_NUM=.*/RLC_BINREAD_CORE_NUM=1/'  /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg
sed -i 's/MAC_HP_CORE_NUM=.*/MAC_HP_CORE_NUM=5/'            /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg
sed -i 's/RLC_MASTER_CORE_NUM=.*/RLC_MASTER_CORE_NUM=2/'    /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg
sed -i "s/SHARED_CORE_BITMAP=.*/SHARED_CORE_BITMAP=$(to_dec '5,6')/"      /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg
sed -i 's/RELAY_ADAPTER_RECVR_THREAD_CORE_NUM=.*/RELAY_ADAPTER_RECVR_THREAD_CORE_NUM=4/'  /etc/BBU_cfg/du_cfg/gNB_DU_Configuration.cfg

sed -i '/<RlcProvsioningParams>/,$s/<CoreNumWorkerThread>.*</<CoreNumWorkerThread>7</'    /etc/BBU_cfg/du_cfg/Proprietary_gNodeB_DU_Data_Model.xml
sed -i '/<RlclSystemParams>/,$s/<CoreNumWorkerThread>.*</<CoreNumWorkerThread>7</'        /etc/BBU_cfg/du_cfg/Proprietary_gNodeB_DU_Data_Model.xml
sed -i '/<F1uProvisioningParams>/,$s/<numCoreWorkerThreads>.*</<numCoreWorkerThreads>7</' /etc/BBU_cfg/du_cfg/Proprietary_gNodeB_DU_Data_Model.xml

#  --a=8 --t=8  --b=8  
sed -i 's/\.\/gnb_cu_pdcp .* >/\.\/gnb_cu_pdcp --r=2 --a=8 --t=8 --m=2 --i=0 --b=8 --p=0 --s=50 --n=10 >/' /home/BaiBBU_XSS/BaiBBU_SXSS/gNB_app

EOF

oc delete configmap vbbu-core-config -n vbbu-demo

oc create configmap vbbu-core-config -n vbbu-demo --from-file=/data/install/bbu.core.conf.sh --save-config=true


create vbbu deployment

oc adm taint nodes worker-2.ocp4.redhat.ren vbbu=realtime:NoSchedule 
# oc adm taint nodes worker-2.ocp4.redhat.ren vbbu=realtime:NoExecute-

oc get nodes -o json | jq '.items[] | .metadata.name, (.spec.taints | tostring )' | paste - -
# "master-0"      "null"
# "worker-1"      "null"
# "worker-2.ocp4.redhat.ren"      "[{\"effect\":\"NoSchedule\",\"key\":\"vbbu\",\"value\":\"realtime\"}]"

export REG_TMP='tmp-registry.ocp4.redhat.ren:5443'

# the pod with vbbu container and dev container
# later, it will change to deployment
cat << EOF > /data/install/vran.intel.flexran.yaml
---

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: host-device-vbbu-demo
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "host-device",
    "device": "ens2f2",
    "ipam": {
      "type": "static",
      "addresses": [
        {
          "address": "192.168.12.20/24"
        },
        {
          "address": "192.168.12.19/24"
        }
      ]
    }
  }'


---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexran-binary-release-deployment
  labels:
    app: flexran-binary-release-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flexran-binary-release
  template:
    metadata:
      labels:
        app: flexran-binary-release
      name: flexran-binary-release
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            { 
              "name": "host-device-vbbu-demo"
            },
            {
              "name": "intel-810-nic01-vf0-rt2",
              "mac": "00:11:22:33:44:66"
            },
            {
              "name": "intel-810-nic01-vf1-rt2",
              "mac": "00:11:22:33:44:67"
            }
          ]
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - flexran-binary-release
              topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-2.ocp4.redhat.ren
      tolerations:
      - key: "vbbu"
        operator: "Exists"
        effect: "NoSchedule"
      serviceAccountName: svcacct-driver
      containers:
      - name: flexran-release-running
        securityContext:
          privileged: true
          runAsUser: 0
        # command: [ "/sbin/init" ]
        command: [ "/bin/sh","-c","--" ]
        args: [" /root/systemd/set_ip.sh ; cd /home/BaiBBU_XSS/tools/ ; ./XRAN_BBU start ; trap  '{ cd /home/BaiBBU_XSS/tools/ ; ./XRAN_BBU stop ; exit 255; }'  SIGINT SIGTERM ERR EXIT ; sleep infinity ; "]
        tty: true
        stdin: true
        image: ${REG_TMP}/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4-core-conf
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 16
            memory: "48Gi" 
            hugepages-1Gi: 24Gi  
          limits:
            cpu: 16
            memory: "48Gi"
            hugepages-1Gi: 24Gi
        volumeMounts:
        - name: hugepage
          mountPath: /hugepages
          readOnly: False
        - name: varrun
          mountPath: /var/run/dpdk
          readOnly: false
        - name: lib-modules
          mountPath: /lib/modules
        - name: src
          mountPath: /usr/src
        - name: dev
          mountPath: /dev
        - name: cache-volume
          mountPath: /dev/shm
        - name: license-volume
          mountPath: /nepdemo/lic
        - mountPath: /root/bbu.core.conf.sh
          subPath: bbu.core.conf.sh
          name: vbbu-core-config-volume

      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
      - name: varrun
        emptyDir: {}
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: src
        hostPath:
          path: /usr/src
      - name: dev
        hostPath:
          path: "/dev"
      - name: cache-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
      - name: license-volume
        configMap:
          name: license.for.nepdemo
          items:
          - key: license
            path: license.lic
      - name: vbbu-core-config-volume
        configMap:
          defaultMode: 420
          name: vbbu-core-config
---

apiVersion: v1
kind: Service
metadata:
  name: vbbu-http 
spec:
  ports:
  - name: http
    port: 80
    targetPort: 80 
    nodePort: 31071
  type: NodePort 
  selector:
    app: flexran-binary-release

---

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: vbbu-http 
spec:
  port:
    targetPort: http
  to:
    kind: Service
    name: vbbu-http 

---
EOF
oc create -n vbbu-demo -f /data/install/vran.intel.flexran.yaml

# oc delete -n vbbu-demo -f /data/install/vran.intel.flexran.yaml

# below, used for debug 

POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc rsh -c flexran-release-running ${POD_ID}
# below runs the command in the pod
bash

tail -100 /root/flexran/bin/nr5g/gnb/l1/Phy.log
# ......
# ==== l1app Time: 315002 ms NumCarrier: 1 NumBbuCores: 6. Tti2Tti Time: [  0.00..  0.00..  0.00] usces
# ==== [o-du0][rx 17639351 pps 55999 kbps 1585561][tx 58086520 pps 184408 kbps 5133280] [on_time 17639351 early 0 late 0 corrupt 0 pkt_dupl 8 Total 17639351]
#      Pusch[   64000    63999    64000    64000        0        0        0        0] SRS[       0]
# -------------------------------------------------------------------------------------------------------------------------------------------------------
#       Cell        DL Tput           UL Tput         UL BLER         SRS SNR    MIMO    PCI
#       0 (Kbps)  1,329,880     34,314 /    36,537      0.00%         0 Db       4T4R    21
# -------------------------------------------------------------------------------------------------------------------------------------------------------
# Core Utilization [6 BBU core(s)]:
#      Core Id:  10  11  12  13  14  15   Avg
#      Util %:   14  20  18  17  16  21 17.67
#      Xran Id:   9  16     Master Core Util:  61 %
# -------------------------------------------------------------------------------------------------------------------------------------------------------

top to show thread and core

When tuning, especially when assigning core bindings, we often need to see which threads exist, which cores they occupy, and which cores are busy, so that we can rebalance. We need a handy tool for this, and luckily top itself can do it: just run top with -H and enable the display column that shows the core each thread is bound to.


sudo -i
# /root/.config/procps/toprc

mkdir /root/wzh

cat << 'EOF' > /root/wzh/.toprc
top's Config File (Linux processes with windows)
Id:i, Mode_altscr=0, Mode_irixps=1, Delay_time=3.0, Curwin=0
Def     fieldscur=ķ&')*+,-./01258<>?ABCFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij
        winflags=193844, sortindx=18, maxtasks=0, graph_cpus=0, graph_mems=0
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=(Ļ@<)*+,-./012568>?ABCFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij
        winflags=193844, sortindx=0, maxtasks=0, graph_cpus=0, graph_mems=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=<MBND34&'()*+,-./0125689FGHIJKLOPQRSTUVWXYZ[\]^_`abcdefghij
        winflags=193844, sortindx=21, maxtasks=0, graph_cpus=0, graph_mems=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=)+,-./1234568;<=>?@ABCFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij
        winflags=193844, sortindx=3, maxtasks=0, graph_cpus=0, graph_mems=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
Fixed_widest=0, Summ_mscale=1, Task_mscale=0, Zero_suppress=0

EOF

HOME="/root/wzh/" top -H
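
If you would rather not ship a toprc, roughly the same view can be reached interactively; a quick sketch based on procps-ng top (key bindings may vary slightly between versions):

top -H
# inside top:
#   press 'f' to open Fields Management and enable "P = Last Used Cpu (SMP)",
#     which shows the core each thread last ran on
#   press 'H' to toggle thread mode if top was started without -H
#   press '1' to show per-CPU utilization and spot the busy cores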

end

show netflow table in openshift 4.10

beginning with openshift 4.10, an admin can configure OVS to export netflow records to a remote collector
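
A minimal sketch of doing that through the Cluster Network Operator's exportNetworkFlows API; the collector address 192.168.7.13:2055 below is only an example and must point at your own collector:

oc patch network.operator cluster --type merge \
  -p '{"spec":{"exportNetworkFlows":{"netFlow":{"collectors":["192.168.7.13:2055"]}}}}'

# to undo (JSON merge patch with null removes the field)
# oc patch network.operator cluster --type merge -p '{"spec":{"exportNetworkFlows":null}}'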

install lvm operator

we need local storage, and since this is a single-node openshift cluster we use the lvm operator. find the operator in operator hub and install it:

the lvm operator is still in Tech Preview and somewhat buggy, so it needs a few fixes.


oc create ns lvm-operator-system

ssh -tt core@192.168.7.13 -- lsblk
# NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sr0     11:0    1  1024M  0 rom
# vda    252:0    0   120G  0 disk
# ├─vda1 252:1    0     1M  0 part
# ├─vda2 252:2    0   127M  0 part
# ├─vda3 252:3    0   384M  0 part /boot
# └─vda4 252:4    0 119.5G  0 part /sysroot
# vdb    252:16   0   100G  0 disk

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:topolvm-controller -n lvm-operator-system

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:vg-manager -n lvm-operator-system

oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:lvm-operator-system:topolvm-node -n lvm-operator-system

cat << EOF > /data/install/lvm.op.yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster-sample
spec:
  storage:
    deviceClasses:
    - name: vg1
    #   thinPoolConfig:
    #     name: thin-pool-1
    #     sizePercent: 50
    #     overprovisionRatio: 50
EOF
oc create -n lvm-operator-system -f /data/install/lvm.op.yaml

kubectl patch storageclass odf-lvm-vg1 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

ssh -tt core@192.168.7.13 -- sudo pvs
#   PV         VG  Fmt  Attr PSize    PFree
#   /dev/vdb   vg1 lvm2 a--  <100.00g <100.00g

ssh -tt core@192.168.7.13 -- sudo vgs
#   VG  #PV #LV #SN Attr   VSize    VFree
#   vg1   1   0   0 wz--n- <100.00g <100.00g

oc get lvmvolumegroup vg1 -oyaml -n lvm-operator-system
# apiVersion: lvm.topolvm.io/v1alpha1
# kind: LVMVolumeGroup
# metadata:
#   creationTimestamp: "2022-05-19T08:59:24Z"
#   generation: 1
#   name: vg1
#   namespace: lvm-operator-system
#   resourceVersion: "37141"
#   uid: c67e2c71-06bc-42f8-be3e-18b7df220725
# spec: {}

oc get lvmvolumegroupnodestatuses.lvm.topolvm.io acm-demo-hub-master -oyaml -n lvm-operator-system
# apiVersion: lvm.topolvm.io/v1alpha1
# kind: LVMVolumeGroupNodeStatus
# metadata:
#   creationTimestamp: "2022-05-19T09:02:34Z"
#   generation: 1
#   name: acm-demo-hub-master
#   namespace: lvm-operator-system
#   resourceVersion: "38271"
#   uid: bc37f640-444c-4cca-bb2e-9235408b52e1
# spec:
#   nodeStatus:
#   - devices:
#     - /dev/vdb
#     name: vg1
#     status: Ready

oc get storageclass
# NAME          PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
# odf-lvm-vg1   topolvm.cybozu.com   Delete          WaitForFirstConsumer   true                   17m

kubectl patch storageclass odf-lvm-vg1 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

cat << EOF > /data/install/lvm.op.pvc.sample.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-file-pvc
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: odf-lvm-vg1
EOF
oc create -f /data/install/lvm.op.pvc.sample.yaml -n default

cat <<EOF > /data/install/lvm.op.app.sample.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-file
spec:
  containers:
  - name: app-file
    image: registry.access.redhat.com/ubi8/ubi:8.4
    imagePullPolicy: IfNotPresent
    command: ["/usr/bin/bash", "-c", "/usr/bin/tail -f /dev/null"]
    volumeMounts:
    - mountPath: "/mnt/file"
      name: lvm-file-pvc
  volumes:
    - name: lvm-file-pvc
      persistentVolumeClaim:
        claimName: lvm-file-pvc
EOF
oc create -f /data/install/lvm.op.app.sample.yaml -n default

ssh -tt core@192.168.7.13 -- sudo lvs
#   LV                                   VG  Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   34f10bb3-ebd0-4eab-acc9-41b68de832d0 vg1 -wi-ao---- 5.00g
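
To double-check the result, verify that the PVC is bound and that the volume is actually mounted inside the pod:

oc get pvc lvm-file-pvc -n default
# STATUS should be Bound, STORAGECLASS odf-lvm-vg1

oc exec -n default app-file -- df -h /mnt/file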

install NetObserv Operator

install loki

following the netobserv operator's installation guide, you can install a simplified version of loki.

# install Loki
kubectl create namespace network-observability

# oc delete ns network-observability

wget -O /data/install/1-storage.yaml https://raw.githubusercontent.com/netobserv/documents/main/examples/zero-click-loki/1-storage.yaml
wget -O /data/install/2-loki.yaml https://raw.githubusercontent.com/netobserv/documents/main/examples/zero-click-loki/2-loki.yaml

kubectl apply -f /data/install/1-storage.yaml -n network-observability
kubectl apply -f /data/install/2-loki.yaml -n network-observability

# oc delete -f /data/install/2-loki.yaml -n network-observability
# oc delete -f /data/install/1-storage.yaml -n network-observability

install NetObserv Operator

find the netobserv operator in operator hub and install it:

create a FlowCollector with the default configuration:
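
If you prefer YAML over the console wizard, a minimal FlowCollector sketch looks roughly like the following; the apiVersion and field names are based on the operator's sample CR for this early release and may differ in other versions, so verify them against the installed CRD first:

cat << EOF > /data/install/netobserv.flowcollector.yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: network-observability
  loki:
    url: http://loki:3100/
EOF
oc create -f /data/install/netobserv.flowcollector.yaml

# oc delete -f /data/install/netobserv.flowcollector.yaml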


# check the result 
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}'); do  echo; echo $pod; oc -n openshift-ovn-kubernetes exec -c ovnkube-node $pod \
  -- bash -c 'for type in ipfix sflow netflow ; do ovs-vsctl find $type ; done'; done
# ovnkube-node-988rk
# _uuid               : 6a6c11b7-157c-4cce-be66-9bafec4627de
# cache_active_timeout: 60
# cache_max_flows     : 100
# external_ids        : {}
# obs_domain_id       : []
# obs_point_id        : []
# other_config        : {}
# sampling            : 400
# targets             : ["192.168.7.13:2055"]

install grafana

select the Grafana community operator

create a Grafana instance with the default settings
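
A rough YAML equivalent of clicking through the console; the CR shape follows the community grafana-operator v4 samples and should be treated as an assumption to verify:

cat << EOF > /data/install/grafana.instance.yaml
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana
spec:
  config:
    log:
      mode: console
    auth:
      disable_signout_menu: false
EOF
oc create -n network-observability -f /data/install/grafana.instance.yaml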


# create a route by yourself
oc expose service/grafana-service -n network-observability

oc get route  -n network-observability
# NAME              HOST/PORT                                                            PATH   SERVICES          PORT      TERMINATION   WILDCARD
# grafana-service   grafana-service-network-observability.apps.acm-demo-hub.redhat.ren          grafana-service   grafana                 None

# get username and password of the grafana
oc get secret/grafana-admin-credentials  -n network-observability -o json | jq -r .data.GF_SECURITY_ADMIN_USER | base64 -d && echo
# admin
oc get secret/grafana-admin-credentials  -n network-observability -o json | jq -r .data.GF_SECURITY_ADMIN_PASSWORD | base64 -d && echo
# ggQhu8PwVS0poQ==

# create a grafana and import dashboards
# https://github.com/netobserv/network-observability-operator/blob/release-4.10/config/samples/dashboards/Network%20Observability.json

import dashboards from :

  • https://github.com/netobserv/network-observability-operator/blob/release-4.10/config/samples/dashboards/Network%20Observability.json

create a Loki datasource:
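
The datasource can also be declared as a GrafanaDataSource CR instead of adding it in the UI; again this follows the grafana-operator v4 samples, and the Loki URL assumes the zero-click Loki deployment above (service loki on port 3100 in the same namespace):

cat << EOF > /data/install/grafana.loki.datasource.yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: loki-datasource
spec:
  name: loki.yaml
  datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
EOF
oc create -n network-observability -f /data/install/grafana.loki.datasource.yaml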

then the result:

from openshift console

end

install loki operator

FlexRAN 20.11 enable on ocp4

This article describes how to port Intel's O-RAN solution, FlexRAN, onto the OpenShift platform.

Container image build and runtime architecture, and the file/directory layout:

How the running containers relate to the operators and the underlying hardware:

open questions

  1. The PTP service is configured, but how does the vBBU actually consume it?

prepare public cloud env

we first build the images in a public cloud environment and push them to quay.io

basic init setup


# vultr, ssh enhance

# disable user/passwd login
# ChallengeResponseAuthentication no
# PasswordAuthentication no
# UsePAM no
sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config

systemctl restart sshd

ssh root@v.redhat.ren -o PubkeyAuthentication=no
# root@v.redhat.ren: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

subscription-manager register --auto-attach --username ******** --password ********

subscription-manager release --list
subscription-manager release --set=8.4

subscription-manager repos \
    --enable="codeready-builder-for-rhel-8-x86_64-rpms" 

dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

dnf install -y byobu htop fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true
# [recidive]
# enabled = true
EOF

systemctl enable --now fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true
[recidive]
enabled = true
EOF

systemctl restart fail2ban

# byobu
dnf update -y

reboot

install ocp rhcos rt kernel

mkdir -p /data/ostree

export BUILDNUMBER=4.9.5

wget -O openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
wget -O openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/
tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/

oc image extract --path /:/data/ostree --registry-config /data/pull-secret.json   `  curl -s https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/$BUILDNUMBER/release.txt | grep machine-os-content | awk '{print $2}'  `

mkdir -p /data/dnf
mv /data/ostree/extensions /data/dnf/
rm -rf /data/ostree

mkdir -p /etc/yum.repos.d
cat > /etc/yum.repos.d/rt.repo << 'EOF'
[rt]
name=rt
baseurl=file:///data/dnf/extensions
gpgcheck=0
EOF

dnf install -y kernel-rt-core kernel-rt-devel kernel-rt-modules kernel-rt-modules-extra kernel-headers libhugetlbfs-devel zlib-devel numactl-devel cmake gcc gcc-c++

reboot

build flexran with intel icc/icx

dnf groupinstall -y 'Development Tools'
dnf install -y cmake

# flexran install on host
# yum install centos-release-scl devtoolset-8 -y

# install intel icc icx

cd /data/down
tar zvxf  system_studio_2019_update_3_ultimate_edition_offline.tar.gz

cd /data/down/system_studio_2019_update_3_ultimate_edition_offline

cat > s.cfg << 'EOF'
ACCEPT_EULA=accept
CONTINUE_WITH_OPTIONAL_ERROR=yes
PSET_INSTALL_DIR=/opt/intel
CONTINUE_WITH_INSTALLDIR_OVERWRITE=yes
COMPONENTS=ALL
PSET_MODE=install
ACTIVATION_SERIAL_NUMBER=******************
ACTIVATION_TYPE=serial_number
EOF

./install.sh -s s.cfg

echo "source  /opt/intel/system_studio_2019/bin/compilervars.sh intel64" >> /root/.bashrc  


cd /data/down/

# wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18236/l_BaseKit_p_2021.4.0.3422_offline.sh

bash l_BaseKit_p_2021.4.0.3422_offline.sh

# source /opt/intel/oneapi/setvars.sh
echo "source /opt/intel/oneapi/setvars.sh" >> /root/.bashrc  

download dpdk and patch it, and compile the flexran sdk

cd /data/down/

# wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz

tar xf dpdk-19.11.tar.xz
rm -rf /opt/dpdk-19.11
mv /data/down/dpdk-19.11 /opt

export RTE_SDK=/opt/dpdk-19.11
cd $RTE_SDK 
patch -p1 < /data/down/dpdk_19.11_20.11.7.patch

# patch flexran
pip3 install meson ninja
# dnf install -y ninja-build

# dnf install -y cmake

rm -rf /data/flexran/
mkdir -p /data/flexran/
cd /data/down
tar zvxf FlexRAN-20.11.tar.gz -C /data/flexran/

export RTE_SDK=/opt/dpdk-19.11
cd /data/flexran
./extract.sh

cd /data/flexran
source set_env_var.sh -d
# for intel: /opt/intel/system_studio_2019/
# for dpdk: /opt/dpdk-19.11

# sourcing /opt/intel/system_studio_2019//bin/iccvars.sh  intel64 -platform linux
# Set RTE_SDK=/opt/dpdk-19.11
# Set RTE_TARGET=x86_64-native-linuxapp-icc


# ====================================================================================
# Environment Variables:
# ====================================================================================
# RTE_SDK=/opt/dpdk-19.11
# RTE_TARGET=x86_64-native-linuxapp-icc
# WIRELESS_SDK_TARGET_ISA=avx512
# RPE_DIR=/data/flexran/libs/ferrybridge
# CPA_DIR=/data/flexran/libs/cpa
# ROE_DIR=/data/flexran/libs/roe
# XRAN_DIR=/data/flexran/xran
# DIR_WIRELESS_SDK_ROOT=/data/flexran/sdk
# SDK_BUILD=build-avx512-icc
# DIR_WIRELESS_SDK=/data/flexran/sdk/build-avx512-icc
# FLEXRAN_SDK=/data/flexran/sdk/build-avx512-icc/install
# DIR_WIRELESS_FW=/data/flexran/framework
# DIR_WIRELESS_TEST_4G=/data/flexran/tests/lte
# DIR_WIRELESS_TEST_5G=/data/flexran/tests/nr5g
# DIR_WIRELESS_TABLE_5G=/data/flexran/bin/nr5g/gnb/l1/table
# ====================================================================================

./flexran_build.sh -e -r 5gnr_sub6 -i avx512 -m sdk

# https://www.i4k.xyz/article/qq_40982287/119571504
sed -i "s/.ndo_tx_timeout = kni_net_tx_timeout,/\/\/.ndo_tx_timeout = kni_net_tx_timeout,/g" /opt/dpdk-19.11/kernel/linux/kni/kni_net.c

sed -i 's/DEFAULT_PATH=.*/DEFAULT_PATH=\/opt\/intel\/system_studio_2019\/bin\/iccvars.sh/' /opt/dpdk-19.11/usertools/dpdk-setup.sh

sed -i 's/CONFIG_RTE_BBDEV_SDK_AVX2=.*/CONFIG_RTE_BBDEV_SDK_AVX2=y/' /opt/dpdk-19.11/config/common_base
sed -i 's/CONFIG_RTE_BBDEV_SDK_AVX512=.*/CONFIG_RTE_BBDEV_SDK_AVX512=y/' /opt/dpdk-19.11/config/common_base
# DEFAULT_PATH=/opt/intel/system_studio_2019/bin/iccvars.sh
# sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=.*/CONFIG_RTE_BUILD_SHARED_LIB=y/' /opt/dpdk-19.11/config/common_base

sed -i 's/MODULE_CFLAGS += -Wall -Werror/#MODULE_CFLAGS += -Wall -Werror/' /opt/dpdk-19.11/kernel/linux/kni/Makefile

cd /opt/dpdk-19.11/usertools/
./dpdk-setup.sh
# 39
# 62

sed -i 's/#include <linux\/bootmem.h>/\/\/#include <linux\/bootmem.h>/' /data/flexran/libs/cpa/sub6/rec/drv/src/nr_dev.c

# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/flexran/wls_mod/lib

# export CC=icc

# export DEV_OPT=" -Wl,--exclude-libs,/usr/lib64/libmvec_nonshared.a "

# export LDFLAGS=" -Wl,--exclude-libs,/usr/lib64/libmvec_nonshared.a "

# export RTE_LIBS=" -Wl,--exclude-libs,/usr/lib64/libmvec_nonshared.a "

# -Wl,--exclude-libs=libmvec_nonshared.a
# -Wl,--allow-multiple-definition

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread  -Wl,--allow-multiple-definition/' /data/flexran/build/nr5g/gnb/l1app/makefile_phy

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread -Wl,--allow-multiple-definition -Wl,-lrte_port -Wl,-lrte_cryptodev -Wl,-lrte_eventdev/' /data/flexran/build/nr5g/gnb/testapp/linux/makefile_phy

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread -Wl,--allow-multiple-definition/' /data/flexran/build/lte/l1app_nbiot/makefile

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread -Wl,--allow-multiple-definition/' /data/flexran/build/lte/bbdevapp/Makefile

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread -Wl,--allow-multiple-definition/' /data/flexran/build/lte/l1app/makefile

sed -i 's/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread/@$(LD) -o $@ $(LD_FLAGS) -Wl,-L $(BUILDDIR) $(INC_LIBS) -lm -lrt -lpthread -Wl,--allow-multiple-definition/' /data/flexran/build/nr5g/gnb/bbdevapp/Makefile

sed -i 's/@$(CC) -o $(APP) $(OBJS) $(RTE_LIBS) $(LDFLAGS)/@$(CC) -o $(APP) $(OBJS) $(RTE_LIBS) $(LDFLAGS) -Wl,--allow-multiple-definition/' /data/flexran/build/nr5g/gnb/testmac/makefile

sed -i 's/@$(CC) -o $(APP) $(OBJS) $(RTE_LIBS) $(LDFLAGS)/@$(CC) -o $(APP) $(OBJS) $(RTE_LIBS) $(LDFLAGS) -Wl,--allow-multiple-definition/' /data/flexran/build/lte/l1app_nbiot/makefile

# -Wl,-lrte_port -Wl,-lrte_cryptodev -Wl,-lrte_eventdev
# build/nr5g/gnb/testapp/linux/makefile_phy:540

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/flexran/wls_mod/lib

cd /data/flexran
./flexran_build.sh -e -r 5gnr_sub6 -i avx512 -b


# dnf install -y podman-docker

# export RTE_SDK=/opt/dpdk-19.11
# cd /data/flexran
# bash ./flexran_build_dockerfile.sh -v -e -i avx512 -r 5gnr_sub6 -b -m all

# podman image ls
# # REPOSITORY                           TAG         IMAGE ID      CREATED         SIZE
# # flexran.docker.registry/flexran_vdu  latest      8c5460a697e6  16 minutes ago  1.36 GB
# # quay.io/centos/centos                7.9.2009    8652b9f0cb4c  17 months ago   212 MB

# podman tag flexran.docker.registry/flexran_vdu:latest  quay.io/nepdemo/flexran_vdu:flexran-20.11-dpdk-20.11.3-ocp4.9.5-centos-7.9

# podman push quay.io/nepdemo/flexran_vdu:flexran-20.11-dpdk-20.11.3-ocp4.9.5-centos-7.9

vsftpd

We need a local FTP server to host the rt-kernel repo; the container image builds later on need access to this temporary repo.

dnf install -y vsftpd
sed -i 's/anonymous_enable=NO/anonymous_enable=YES/g' /etc/vsftpd/vsftpd.conf
systemctl disable --now firewalld
systemctl enable --now vsftpd

mkdir -p /var/ftp/dnf
mount --bind /data/dnf /var/ftp/dnf
chcon -R -t public_content_t  /var/ftp/dnf
find /data/dnf/extensions -type f -exec chmod 644 {} \;

chmod +x /etc/rc.d/rc.local
cat << EOF >>/etc/rc.d/rc.local

iptables -A INPUT -d 10.88.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 21 -j REJECT

EOF
systemctl enable --now rc-local
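
Quick check that anonymous FTP actually serves the repo; 10.88.0.1 is assumed to be the podman bridge gateway, matching the baseurl used in the Dockerfiles below:

curl ftp://10.88.0.1/dnf/extensions/ | head
# should return a directory listing of the kernel-rt rpms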

flexran_vdu for rhel8.4

dnf install -y podman-docker

export RTE_SDK=/opt/dpdk-19.11
cd /data/flexran

bash ./flexran_build_dockerfile.wzh.sh -v -e -i avx512 -r 5gnr_sub6 -b -m all

podman tag flexran.docker.registry/flexran_vdu:latest  quay.io/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4

podman push quay.io/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4

copy flexran sdk to image

cat << 'EOF' > /data/flexran.sdk.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

RUN dnf repolist
RUN sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
RUN sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
RUN sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
RUN mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

RUN dnf -y update
RUN dnf -y install rsync

COPY flexran /data/flexran
EOF

cd /data
podman build --squash -t quay.io/nepdemo/flexran_basekit:flexran-sdk-20.11-ocp-4.9.5-ubi-8.4 -f flexran.sdk.dockerfile ./

podman push quay.io/nepdemo/flexran_basekit:flexran-sdk-20.11-ocp-4.9.5-ubi-8.4

copy intel icc to image

cat << 'EOF' > /opt/intel/flexran.intel.icc.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

RUN dnf repolist
RUN sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
RUN sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
RUN sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
RUN mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

RUN dnf -y update
RUN dnf -y install rsync

COPY system_studio_2019 /opt/intel/system_studio_2019
COPY licenses /opt/intel/licenses
COPY packagemanager /opt/intel/packagemanager
EOF

cd /opt/intel
podman build --squash -t quay.io/nepdemo/flexran_basekit:intel.icc-21.11-ocp-4.9.5-ubi-8.4 -f flexran.intel.icc.dockerfile ./

podman push quay.io/nepdemo/flexran_basekit:intel.icc-21.11-ocp-4.9.5-ubi-8.4

copy intel icx to image

cat << 'EOF' > /opt/intel/flexran.intel.icx.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

RUN dnf repolist
RUN sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
RUN sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
RUN sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
RUN mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

RUN dnf -y update
RUN dnf -y install rsync

COPY oneapi /opt/intel/oneapi
COPY licenses /opt/intel/licenses
COPY packagemanager /opt/intel/packagemanager
EOF

cd /opt/intel
podman build --squash -t quay.io/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4 -f flexran.intel.icx.dockerfile ./

podman push quay.io/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4

build dev docker image with dpdk 19.11

cat << 'EOF' > /opt/flexran.dpdk.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

RUN dnf repolist
RUN sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
RUN sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
RUN sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
RUN mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

RUN echo -e "\
[localrepo]\n\
name=LocalRepo\n\
baseurl=ftp://10.88.0.1/dnf/extensions/\n\
enabled=1\n\
gpgcheck=0" \
> /etc/yum.repos.d/local.repo

RUN dnf -y update
RUN dnf -y install rsync

RUN dnf -y install kernel-rt-core kernel-rt-devel kernel-rt-modules kernel-rt-modules-extra kernel-headers libhugetlbfs-devel zlib-devel numactl-devel cmake gcc gcc-c++ libhugetlbfs-utils libhugetlbfs-devel libhugetlbfs numactl-devel pciutils libaio libaio-devel net-tools libpcap python3-pip
RUN dnf install -y --allowerasing coreutils
RUN dnf groupinstall -y development server
RUN pip-3 install meson ninja

COPY dpdk-19.11 /opt/dpdk-19.11
# RUN ln -s /opt/dpdk-stable-20.11.3 /opt/dpdk-20.11

EOF

cd /opt/
podman build --squash -t quay.io/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 -f flexran.dpdk.dockerfile ./

podman push quay.io/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 

build in nepdemo env

Build the images inside nepdemo's intranet environment and push them to nepdemo's image registry.

create an image registry to hold the large container images

# found a centos7 host

mkdir /etc/crts/ && cd /etc/crts
openssl req \
   -newkey rsa:2048 -nodes -keyout redhat.ren.key \
   -x509 -days 3650 -out redhat.ren.crt -subj \
   "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.redhat.ren"

cp /etc/crts/redhat.ren.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

mkdir -p /home/data/registry
cd /data
# tar zxf registry.tgz
yum -y install docker-distribution

cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /home/data/registry
    delete:
        enabled: true
http:
    addr: :5443
    tls:
       certificate: /etc/crts/redhat.ren.crt
       key: /etc/crts/redhat.ren.key
EOF
# systemctl restart docker
# systemctl stop docker-distribution
systemctl enable --now docker-distribution
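
Verify the registry is answering over TLS (standard Docker Registry v2 API; -k because the certificate is self-signed):

curl -k https://localhost:5443/v2/_catalog
# {"repositories":[]}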

build container image for intel sdk

cat << EOF >>  /etc/hosts
192.168.123.252 reg-tmp.redhat.ren
EOF

export REG_TMP="reg-tmp.redhat.ren:5443"

podman tag flexran.docker.registry/flexran_vdu:latest  ${REG_TMP}/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4

# copy flexran sdk to image
cd /data
podman build --squash -t ${REG_TMP}/nepdemo/flexran_basekit:flexran-sdk-20.11-ocp-4.9.5-ubi-8.4 -f flexran.sdk.dockerfile ./

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_basekit:flexran-sdk-20.11-ocp-4.9.5-ubi-8.4

# dpdk-kmods
cd /data/git
podman build --squash -t ${REG_TMP}/nepdemo/flexran_vdu:dpdk-kmods-ocp-4.9.5-ubi -f flexran.sdk.dockerfile ./

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_vdu:dpdk-kmods-ocp-4.9.5-ubi

# copy intel icc to image
cd /opt/intel
podman build --squash -t ${REG_TMP}/nepdemo/flexran_basekit:intel.icc-21.11-ocp-4.9.5-ubi-8.4 -f flexran.intel.icc.dockerfile ./

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_basekit:intel.icc-21.11-ocp-4.9.5-ubi-8.4

# copy intel icx to image
cd /opt/intel
podman build --squash -t ${REG_TMP}/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4 -f flexran.intel.icx.dockerfile ./

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4


# build dev docker image with dpdk 19.11
cat << 'EOF' > /opt/flexran.dpdk.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

RUN dnf repolist
RUN sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
RUN sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
RUN sed -i 's|cdn.redhat.com|china.cdn.redhat.com|g' /etc/yum.repos.d/redhat.repo
RUN sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
RUN mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

RUN echo -e "\
[localrepo]\n\
name=LocalRepo\n\
baseurl=ftp://192.168.122.1/dnf/extensions/\n\
enabled=1\n\
gpgcheck=0" \
> /etc/yum.repos.d/local.repo

RUN dnf -y update
RUN dnf -y install rsync

RUN dnf -y install kernel-rt-core kernel-rt-devel kernel-rt-modules kernel-rt-modules-extra kernel-headers libhugetlbfs-devel zlib-devel numactl-devel cmake gcc gcc-c++ libhugetlbfs-utils libhugetlbfs-devel libhugetlbfs numactl-devel pciutils libaio libaio-devel net-tools libpcap python3-pip
RUN dnf install -y --allowerasing coreutils
RUN dnf groupinstall -y development server
RUN pip-3 install meson ninja

COPY dpdk-19.11 /opt/dpdk-19.11
# RUN ln -s /opt/dpdk-19.11 /opt/dpdk-20.11

EOF

cd /opt/
podman build --squash -t ${REG_TMP}/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 -f flexran.dpdk.dockerfile ./

podman push --tls-verify=false ${REG_TMP}/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 

deploy on ocp 4.9.5

With all the images ready, we start the deployment test on OpenShift 4.

set security for temp image registry

We created a temporary image registry, so we need to push this configuration into the cluster, mainly so that the OCP cluster does not verify the certificate of this new registry.

oc patch schedulers.config.openshift.io/cluster --type merge -p '{"spec":{"mastersSchedulable":false}}'

install /data/ocp4/clients/butane-amd64 /usr/local/bin/butane

cat << EOF > /data/sno/tmp.images.bu
variant: openshift
version: 4.9.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-zzz-worker-temp-images
storage:
  files:
    - path: /etc/containers/registries.conf.d/temp.registries.conf
      overwrite: true
      contents:
        inline: |

            [[registry]]
            location = "tmp-registry.ocp4.redhat.ren:5443"
            insecure = true
            blocked = false
            mirror-by-digest-only = false
            prefix = ""

EOF

butane /data/sno/tmp.images.bu > /data/sno/99-zzz-worker-temp-images.yaml

oc create -f /data/sno/99-zzz-worker-temp-images.yaml
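
The new MachineConfig triggers a rolling update (and reboot) of the worker pool; wait for it to finish before moving on:

oc get mcp worker
# wait until UPDATED is True and UPDATING is False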

set a host-path dir for flexran sdk

We need to create a local directory on worker-2 to hold the huge flexran sdk, intel icc and intel icx directories and files, mainly because the development team needs to develop and test on the container platform. In a production runtime environment this local directory should not exist.

# do not need, as it is already deployed
cat << EOF > /data/install/host-path.yaml
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-set-selinux-for-hostpath-nepdemo-worker-rt-2
  labels:
    machineconfiguration.openshift.io/role: worker-rt-2
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Set SELinux chcon for hostpath nepdemo
            Before=kubelet.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStartPre=-mkdir -p /var/nepdemo
            ExecStart=chcon -Rt container_file_t /var/nepdemo/

            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: hostpath-nepdemo.service
EOF
oc create -f /data/install/host-path.yaml

using jobs to copy files to a local path

We use Jobs to copy the flexran sdk and the intel icc/icx sdks to the local directory on worker-2 for later use. A Job fits because this is a one-off task, and since rsync is installed in the container images, if the local directory ever gets corrupted we can simply rerun the Jobs below to resync it quickly.


export REG_TMP='tmp-registry.ocp4.redhat.ren:5443'

# copy dpdk to local
cat << EOF > /data/install/job.flexran.dpdk.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: flexran.basekit.dpdk.copy
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: files
        image: ${REG_TMP}/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 
        command: [ "/bin/sh","-c","--" ]
        # command: ["rsync", "--delete", "-arz", "/opt/dpdk-19.11", "/nepdemo/"]
        args: [" rsync -P --delete -arz /opt/dpdk-19.11 /nepdemo/ "]
        volumeMounts:
          - name: nepdemo
            mountPath: /nepdemo
      restartPolicy: Never
      nodeName: worker-2.ocp4.redhat.ren
      volumes:
        - name: nepdemo
          hostPath:
            path: /var/nepdemo      
EOF

oc create -f /data/install/job.flexran.dpdk.yaml

# copy flexran sdk to local
cat << EOF > /data/install/job.flexran.sdk.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: flexran.basekit.sdk.copy
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: files
        image: ${REG_TMP}/nepdemo/flexran_basekit:flexran-sdk-20.11-ocp-4.9.5-ubi-8.4
        command: [ "/bin/sh","-c","--" ]
        # command: ["rsync", "--delete", "-arz", "/data/flexran", "/nepdemo/"]
        args: [" rsync -P --delete -arz /data/flexran /nepdemo/ "]
        volumeMounts:
          - name: nepdemo
            mountPath: /nepdemo
      restartPolicy: Never
      nodeName: worker-2.ocp4.redhat.ren
      volumes:
        - name: nepdemo
          hostPath:
            path: /var/nepdemo      
EOF

oc create -f /data/install/job.flexran.sdk.yaml

# copy intel icc sdk to local
cat << EOF > /data/install/job.intel.icc.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: flexran.basekit.intel.icc.copy
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: files
        image: ${REG_TMP}/nepdemo/flexran_basekit:intel.icc-21.11-ocp-4.9.5-ubi-8.4
        command: [ "/bin/sh","-c","--" ]
        # command: ["rsync", "--delete", "-arz", "/opt/intel/system_studio_2019", "/nepdemo/"]
        args: [" rsync -P --delete -arz /opt/intel/system_studio_2019 /nepdemo/ "]
        volumeMounts:
          - name: nepdemo
            mountPath: /nepdemo
      restartPolicy: Never
      nodeName: worker-2.ocp4.redhat.ren
      volumes:
        - name: nepdemo
          hostPath:
            path: /var/nepdemo      
EOF

oc create -f /data/install/job.intel.icc.yaml

# copy intel icx sdk to local
cat << EOF > /data/install/job.intel.icx.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: flexran.basekit.intel.icx.copy
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: files
        image: ${REG_TMP}/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4
        command: [ "/bin/sh","-c","--" ]
        # command: ["rsync", "--delete", "-arz", "/opt/intel/oneapi", "/nepdemo/"]
        args: [" rsync -P --delete -arz /opt/intel/oneapi /nepdemo/ "]
        volumeMounts:
          - name: nepdemo
            mountPath: /nepdemo
      restartPolicy: Never
      nodeName: worker-2.ocp4.redhat.ren
      volumes:
        - name: nepdemo
          hostPath:
            path: /var/nepdemo      
EOF

oc create -f /data/install/job.intel.icx.yaml

# copy intel license to local
cat << EOF > /data/install/job.intel.license.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: flexran.basekit.intel.license.copy
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: files
        image: ${REG_TMP}/nepdemo/flexran_basekit:intel.icx-21.11-ocp-4.9.5-ubi-8.4
        command: [ "/bin/sh","-c","--" ]
        args: ["rsync -P --delete -arz /opt/intel/licenses /nepdemo/ ; rsync -P --delete -arz /opt/intel/packagemanager /nepdemo/ "]
        volumeMounts:
          - name: nepdemo
            mountPath: /nepdemo
      restartPolicy: Never
      nodeName: worker-2.ocp4.redhat.ren
      volumes:
        - name: nepdemo
          hostPath:
            path: /var/nepdemo      
EOF

oc create -f /data/install/job.intel.license.yaml
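
Once the copy Jobs finish, confirm they completed and that the files landed in the host path on worker-2 (assuming you can ssh to the node as core):

oc get job -n default
# COMPLETIONS should be 1/1 for every copy job

ssh core@worker-2.ocp4.redhat.ren -- ls /var/nepdemo
# expect: dpdk-19.11  flexran  licenses  oneapi  packagemanager  system_studio_2019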

setup sriov operator

OpenShift ships an SR-IOV operator that officially supports the Intel X710 NIC, so we can use it directly.

the env has an Intel X710 NIC (vendor:device 8086:1572)

# install sriov operator
cat << EOF > /data/install/sriov.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.9"
  installPlanApproval: Manual
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/sriov.yaml
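
Because the Subscription uses installPlanApproval: Manual, the pending InstallPlan has to be approved before the operator actually installs (the same applies to the FEC and PTP subscriptions later on):

oc get installplan -n openshift-sriov-network-operator

oc patch installplan $(oc get installplan -n openshift-sriov-network-operator -o jsonpath='{.items[0].metadata.name}') \
  -n openshift-sriov-network-operator --type merge -p '{"spec":{"approved":true}}'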

oc get SriovNetworkNodeState -n openshift-sriov-network-operator
# NAME                       AGE
# master-0                   42m
# worker-0.ocp4.redhat.ren   42m
# worker-1                   42m
# worker-2.ocp4.redhat.ren   42m

oc get SriovNetworkNodeState/worker-2.ocp4.redhat.ren -n openshift-sriov-network-operator -o yaml
# apiVersion: sriovnetwork.openshift.io/v1
# kind: SriovNetworkNodeState
# metadata:
#   creationTimestamp: "2022-05-06T14:34:54Z"
#   generation: 1
#   name: worker-2.ocp4.redhat.ren
#   namespace: openshift-sriov-network-operator
#   ownerReferences:
#   - apiVersion: sriovnetwork.openshift.io/v1
#     blockOwnerDeletion: true
#     controller: true
#     kind: SriovNetworkNodePolicy
#     name: default
#     uid: 4eca5eea-e1e5-410f-8833-dd2de1434e53
#   resourceVersion: "70932404"
#   uid: 1d122c8e-b788-4f1e-a3d5-865c6230a476
# spec:
#   dpConfigVersion: "70930693"
# status:
#   interfaces:
#   - deviceID: "1572"
#     driver: i40e
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 90:e2:ba:a8:29:e6
#     mtu: 1500
#     name: ens2f0
#     pciAddress: 0000:65:00.0
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: "1572"
#     driver: i40e
#     linkSpeed: -1 Mb/s
#     linkType: ETH
#     mac: 90:e2:ba:a8:29:e7
#     mtu: 1500
#     name: ens2f1
#     pciAddress: 0000:65:00.1
#     totalvfs: 64
#     vendor: "8086"
#   - deviceID: 37d1
#     driver: i40e
#     linkSpeed: 1000 Mb/s
#     linkType: ETH
#     mac: ac:1f:6b:ea:5b:32
#     mtu: 1500
#     name: eno1
#     pciAddress: 0000:b5:00.0
#     totalvfs: 32
#     vendor: "8086"
#   - deviceID: 37d1
#     driver: i40e
#     linkSpeed: 1000 Mb/s
#     linkType: ETH
#     mac: ac:1f:6b:ea:5b:33
#     mtu: 1500
#     name: eno2
#     pciAddress: 0000:b5:00.1
#     totalvfs: 32
#     vendor: "8086"
#   syncStatus: Succeeded

# how to use the sriov to create VF and attach to pod, depends on use case from nep demo request
# remember to active SRIOV in bios
# remember to active VT-d in bios
cat << EOF > /data/install/sriov.policy.yaml
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-710-nic01-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_710_nic01_rt2
  nodeSelector:
    kubernetes.io/hostname: worker-2.ocp4.redhat.ren
  numVfs: 4
  nicSelector:
    vendor: "8086"
    deviceID: "1572"
    rootDevices:
      - "0000:65:00.0"
    # pfNames:
    #   - "ens2f0"
  # linkType: eth
  # isRdma: false
  deviceType: vfio-pci 
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-710-nic02-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_710_nic02_rt2
  nodeSelector:
    kubernetes.io/hostname: worker-2.ocp4.redhat.ren
  numVfs: 4
  nicSelector:
    vendor: "8086"
    deviceID: "1572"
    rootDevices:
      - "0000:65:00.1"
    # pfNames:
    #   - "ens2f1"
  # linkType: eth
  # isRdma: false
  deviceType: vfio-pci 
EOF
oc create -f /data/install/sriov.policy.yaml

# oc delete -f /data/install/sriov.policy.yaml

oc get sriovnetworknodestates/worker-2.ocp4.redhat.ren -n openshift-sriov-network-operator  -o jsonpath='{.status.syncStatus}' && echo
# Succeeded


cat << EOF > /data/install/sriov.attach.yaml
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-710-nic01-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_710_nic01_rt2
  networkNamespace: vbbu-demo
  ipam: |-
    {
      "type": "static",
      "addresses": [
        {
          "address": "192.168.12.21/24"
        }
      ]
    }
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-710-nic02-rt2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_710_nic02_rt2
  networkNamespace: vbbu-demo
  # ipam: |-
  #   {
  #     "type": "dhcp"
  #   }
  ipam: |-
    {
      "type": "static",
      "addresses": [
        {
          "address": "192.168.22.21/24"
        }
      ]
    }  
EOF
oc create -f /data/install/sriov.attach.yaml

# oc delete -f /data/install/sriov.attach.yaml

oc get net-attach-def -n vbbu-demo
# NAME                  AGE
# intel-710-nic01-rt2   34s
# intel-710-nic02-rt2   34s


setup fec sriov operator

Intel already provides an operator for its FEC accelerator card, along with very detailed documentation, so happily we can use it as-is.

# install sriov operator
cat << EOF > /data/install/sriov.fec.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: vran-acceleration-operators
  annotations:
    workload.openshift.io/allowed: management
  labels:
     openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: vran-operators
  namespace: vran-acceleration-operators
spec:
  targetNamespaces:
    - vran-acceleration-operators
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-fec-subscription
  namespace: vran-acceleration-operators
spec:
  channel: stable
  installPlanApproval: Manual
  name: sriov-fec
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/sriov.fec.yaml

oc get csv -n vran-acceleration-operators
# NAME                                DISPLAY                                             VERSION   REPLACES   PHASE
# performance-addon-operator.v4.9.0   Performance Addon Operator                          4.9.0                Succeeded
# sriov-fec.v2.2.1                    SEO SR-IOV Operator for Wireless FEC Accelerators   2.2.1                Succeeded

oc get sriovfecnodeconfig -n vran-acceleration-operators 
# No resources found in vran-acceleration-operators namespace.

cat << EOF > /data/install/sriov.fec.config.yaml
apiVersion: sriovfec.intel.com/v2
kind: SriovFecClusterConfig
metadata:
  name: config
  namespace: vran-acceleration-operators 
spec:
  priority: 1
  nodeSelector:
    kubernetes.io/hostname: worker-2.ocp4.redhat.ren
  acceleratorSelector:
    pciAddress: 0000:17:00.0
  physicalFunction:  
    pfDriver: "pci-pf-stub"
    vfDriver: "vfio-pci"
    vfAmount: 16
    bbDevConfig:
      acc100:
        # Programming mode: 0 = VF Programming, 1 = PF Programming
        # true = PF Programming, false = VF Programming
        pfMode: true
        numVfBundles: 16
        maxQueueSize: 1024
        uplink4G:
          numQueueGroups: 0
          numAqsPerGroups: 16
          aqDepthLog2: 4
        downlink4G:
          numQueueGroups: 0
          numAqsPerGroups: 16
          aqDepthLog2: 4
        uplink5G:
          numQueueGroups: 4
          numAqsPerGroups: 16
          aqDepthLog2: 4
        downlink5G:
          numQueueGroups: 4
          numAqsPerGroups: 16
          aqDepthLog2: 4
EOF
oc create -f /data/install/sriov.fec.config.yaml

# oc delete -f /data/install/sriov.fec.config.yaml

oc get sriovfecnodeconfig -n vran-acceleration-operators
# NAME                       CONFIGURED
# worker-2.ocp4.redhat.ren   Succeeded

oc get sriovfecnodeconfig -n vran-acceleration-operators worker-2.ocp4.redhat.ren -o yaml
# apiVersion: sriovfec.intel.com/v2
# kind: SriovFecNodeConfig
# metadata:
#   creationTimestamp: "2022-05-09T06:51:45Z"
#   generation: 2
#   name: worker-2.ocp4.redhat.ren
#   namespace: vran-acceleration-operators
#   resourceVersion: "72789505"
#   uid: 265c42ae-f898-407c-a4bc-7f17aa8b94bb
# spec:
#   physicalFunctions:
#   - bbDevConfig:
#       acc100:
#         downlink4G:
#           aqDepthLog2: 4
#           numAqsPerGroups: 16
#           numQueueGroups: 0
#         downlink5G:
#           aqDepthLog2: 4
#           numAqsPerGroups: 16
#           numQueueGroups: 4
#         maxQueueSize: 1024
#         numVfBundles: 16
#         pfMode: true
#         uplink4G:
#           aqDepthLog2: 4
#           numAqsPerGroups: 16
#           numQueueGroups: 0
#         uplink5G:
#           aqDepthLog2: 4
#           numAqsPerGroups: 16
#           numQueueGroups: 4
#     pciAddress: "0000:17:00.0"
#     pfDriver: pci-pf-stub
#     vfAmount: 16
#     vfDriver: vfio-pci
# status:
#   conditions:
#   - lastTransitionTime: "2022-05-09T12:48:10Z"
#     message: Configured successfully
#     observedGeneration: 2
#     reason: Succeeded
#     status: "True"
#     type: Configured
#   inventory:
#     sriovAccelerators:
#     - deviceID: 0d5c
#       driver: pci-pf-stub
#       maxVirtualFunctions: 16
#       pciAddress: "0000:17:00.0"
#       vendorID: "8086"
#       virtualFunctions:
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.0"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.1"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.2"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.3"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.4"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.5"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.6"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.7"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.2"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.3"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.4"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.5"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.6"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:00.7"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.0"
#       - deviceID: 0d5d
#         driver: vfio-pci
#         pciAddress: "0000:18:01.1"



setup ptp

The Intel FlexRAN documentation says PTP must be used, which makes sense: in the O-RAN architecture PTP is mandatory.

# install ptp operator
cat << EOF > /data/install/ptp.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ptp
  annotations:
    workload.openshift.io/allowed: management
  labels:
    name: openshift-ptp
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ptp-operators
  namespace: openshift-ptp
spec:
  targetNamespaces:
  - openshift-ptp
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ptp-operator-subscription
  namespace: openshift-ptp
spec:
  channel: "4.9"
  installPlanApproval: Manual
  name: ptp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
oc create -f /data/install/ptp.yaml

oc get csv -n openshift-ptp
# NAME                                DISPLAY                      VERSION              REPLACES   PHASE
# performance-addon-operator.v4.9.0   Performance Addon Operator   4.9.0                           Succeeded
# ptp-operator.4.9.0-202204211825     PTP Operator                 4.9.0-202204211825              Succeeded

oc get csv -n openshift-ptp \
  -o custom-columns=Name:.metadata.name,Phase:.status.phase
# Name                                Phase
# performance-addon-operator.v4.9.0   Succeeded
# ptp-operator.4.9.0-202204211825     Succeeded

# nepdemo asked for the phc2sys service to be disabled, but we keep it enabled here.
# note: the ptp4lConf block must not contain blank lines; I had to read the source code to figure that out
cat << EOF > /data/install/ptp.config.yaml
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: ordinary-clock-ptp-config-worker-2 
  namespace: openshift-ptp
spec:
  profile: 
  - name: "profile1" 
    interface: "ens2f1" 
    ptp4lOpts: "-2 -m" 
    phc2sysOpts: "-a -r" 
    ptp4lConf: |-
      [global]
      #
      # Default Data Set
      #
      twoStepFlag             1
      slaveOnly               0
      priority1               128
      priority2               128
      domainNumber            24
      #utc_offset             37
      clockClass              248
      clockAccuracy           0xFE
      offsetScaledLogVariance 0xFFFF
      free_running            0
      freq_est_interval       1
      dscp_event              0
      dscp_general            0
      dataset_comparison      ieee1588
      G.8275.defaultDS.localPriority  128
      #
      # Port Data Set
      # 16 TS a second use logSyncInterval  -4
      logAnnounceInterval     1
      logSyncInterval         -4
      logMinDelayReqInterval  0
      logMinPdelayReqInterval 0
      announceReceiptTimeout  3
      syncReceiptTimeout      0
      delayAsymmetry          0
      fault_reset_interval    4
      neighborPropDelayThresh 20000000
      masterOnly              0
      G.8275.portDS.localPriority     128
      #
      # Run time options
      #
      assume_two_step         0
      logging_level           6
      path_trace_enabled      0
      follow_up_info          0
      hybrid_e2e              0
      inhibit_multicast_service       0
      net_sync_monitor        0
      tc_spanning_tree        0
      tx_timestamp_timeout    1
      unicast_listen          0
      unicast_master_table    0
      unicast_req_duration    3600
      use_syslog              1
      verbose                 0
      summary_interval        0
      kernel_leap             1
      check_fup_sync          0
      #
      # Servo Options
      #
      pi_proportional_const   0.0
      pi_integral_const       0.0
      pi_proportional_scale   0.0
      pi_proportional_exponent        -0.3
      pi_proportional_norm_max        0.7
      pi_integral_scale       0.0
      pi_integral_exponent    0.4
      pi_integral_norm_max    0.3
      step_threshold          0.0
      first_step_threshold    0.00002
      max_frequency           900000000
      clock_servo             pi
      sanity_freq_limit       200000000
      ntpshm_segment          0
      #
      # Transport options
      #
      transportSpecific       0x0
      ptp_dst_mac             01:1B:19:00:00:00
      p2p_dst_mac             01:80:C2:00:00:0E
      udp_ttl                 1
      udp6_scope              0x0E
      uds_address             /var/run/ptp4l
      #
      # Default interface options
      #
      clock_type              OC
      network_transport       UDPv4
      delay_mechanism         E2E
      time_stamping           hardware
      tsproc_mode             filter
      delay_filter            moving_median
      delay_filter_length     10
      egressLatency           0
      ingressLatency          0
      boundary_clock_jbod     0
      #
      # Clock description
      #
      productDescription      ;;
      revisionData            ;;
      manufacturerIdentity    00:00:00
      userDescription         ;
      timeSource              0xA0
    ptpSchedulingPolicy: SCHED_FIFO  
    ptpSchedulingPriority: 65 
  recommend: 
  - profile: "profile1" 
    priority: 10 
    match: 
    - nodeLabel: "node-role.kubernetes.io/worker" 
      nodeName: "worker-2.ocp4.redhat.ren" 
EOF
oc create -f /data/install/ptp.config.yaml

# oc delete -f /data/install/ptp.config.yaml
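
To confirm ptp4l actually locks onto a master, check the linuxptp daemon logs on worker-2; the container name linuxptp-daemon-container is taken from the PTP operator's daemonset and may differ in other versions:

POD_ID=$(oc get pod -n openshift-ptp -o json | jq -r '.items[] | select(.spec.nodeName=="worker-2.ocp4.redhat.ren") | .metadata.name | select( contains("linuxptp-daemon") )')
oc logs -n openshift-ptp $POD_ID -c linuxptp-daemon-container | tail -50
# look for the periodic "rms ... max ..." offset lines, which show the clock is synchronizing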

create deployment (put it all together)

In the end we can assemble a complete deployment. Our deployment is a single pod with two containers. One container is the vBBU application container, built the way the Intel SDK suggests: it only carries the compiled application itself rather than all of its build dependencies, which keeps the image fairly small, around 2 GB. The other container is for development, because the dev team needs an environment where they can build things and then copy them into the vBBU application container.

Here, the flexran-release-running container is the one used at runtime, while flexran-dev-env is the development environment.

This is currently the development version. Once development and testing are finished, flexran-dev-env will be removed, and the local host-path directories (the local copies of the Intel SDKs) will be deleted as well.


oc new-project vbbu-demo

oc project vbbu-demo

export REG_TMP='tmp-registry.ocp4.redhat.ren:5443'

# kernel driver deployment
oc create serviceaccount svcacct-driver -n vbbu-demo
oc adm policy add-scc-to-user privileged -z svcacct-driver -n vbbu-demo
# oc adm policy add-scc-to-user anyuid -z mysvcacct -n vbbu-demo

cat << EOF > /data/install/dpdk.kmod.driver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dpdk-kmod-driver
  # namespace: default
  labels:
    app: dpdk-kmod-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dpdk-kmod-driver
  template:
    metadata:
      labels:
        app: dpdk-kmod-driver
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - dpdk-kmod-driver
              topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-2.ocp4.redhat.ren
      # restartPolicy: Never
      serviceAccountName: svcacct-driver
      initContainers:
      - name: copy
        image: ${REG_TMP}/nepdemo/flexran_vdu:dpdk-kmods-ocp-4.9.5-ubi
        command: ["/bin/sh", "-c", "--"]
        args: ["/bin/cp -rf /data/* /nepdemo/"]
        # imagePullPolicy: Always
        volumeMounts:
        - name: driver-files
          mountPath: /nepdemo
      containers:
      - name: driver
        image: ${REG_TMP}/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4
        imagePullPolicy: Always
        command: ["/bin/sh", "-c", "--"]
        args: ["insmod /nepdemo/dpdk-kmods/linux/igb_uio/igb_uio.ko ; sleep infinity ;"]
        resources:
          requests:
            cpu: 10m
            memory: 20Mi
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
        - name: driver-files
          mountPath: /nepdemo
        # - name: host
        #   mountPath: /host
      volumes: 
      - name: driver-files
        emptyDir: {}
      # - name: host
      #   hostPath:
      #     path: /
      #     type: Directory
EOF
oc create -n vbbu-demo -f /data/install/dpdk.kmod.driver.yaml

# to restore
# oc delete -f /data/install/dpdk.kmod.driver.yaml


# the pod with vbbu container and dev container
# later, it will change to deployment
cat << EOF > /data/install/vran.intel.flexran.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexran-binary-release-deployment
  labels:
    app: flexran-binary-release-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flexran-binary-release
  template:
    metadata:
      labels:
        app: flexran-binary-release
      name: flexran-binary-release
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {
              "name": "intel-710-nic01-rt2",
              "mac": "00:11:22:33:44:01"
            },
            {
              "name": "intel-710-nic02-rt2",
              "mac": "00:11:22:33:44:02"
            }
          ]
        cpu-load-balancing.crio.io: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - flexran-binary-release
              topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-2.ocp4.redhat.ren
      # nodeSelector:
      #   kubernetes.io/hostname: worker-2.ocp4.redhat.ren
      runtimeClassName: performance-wzh-performanceprofile-2
      serviceAccountName: svcacct-driver
      containers:
      - securityContext:
          privileged: false
          capabilities:
            add:
              #- SYS_ADMIN
              - IPC_LOCK
              - SYS_NICE
              - SYS_RESOURCE
              - NET_RAW
        command: [ "/sbin/init" ]
        # command: [ "/bin/sh","-c","--" ]
        # args: ["  sleep infinity ; "]
        # tty: true
        # stdin: true
        image: ${REG_TMP}/nepdemo/flexran_vdu:flexran-20.11-dpdk-19.11-ocp4.9.5-ubi-8.4
        # image: ${REG_TMP}/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 
        # imagePullPolicy: Always
        name: flexran-release-running
        resources:
          requests:
            memory: "24Gi" 
            intel.com/intel_fec_acc100: '1'
            hugepages-1Gi: 16Gi  
          limits:
            memory: "24Gi"
            intel.com/intel_fec_acc100: '1'
            hugepages-1Gi: 16Gi
        volumeMounts:
        - name: hugepage
          mountPath: /hugepages
        - name: varrun
          mountPath: /var/run/dpdk
          readOnly: false
        # - name: oneapi
        #   mountPath: /opt/intel/oneapi
        #   readOnly: false
        # - name: system-studio-2019
        #   mountPath: /opt/intel/system_studio_2019
        #   readOnly: false   
        # - name: licenses
        #   mountPath: /opt/intel/licenses
        #   readOnly: false
        # - name: packagemanager
        #   mountPath: /opt/intel/packagemanager
        #   readOnly: false 
        - name: dpdk-19-11
          mountPath: /opt/dpdk-19.11
          readOnly: false
        - name: flexran
          mountPath: /data/flexran
          readOnly: false   
        - name: sys
          mountPath: /sys/
          readOnly: false

      - securityContext:
          privileged: false
        command: [ "/bin/sh","-c","--" ]
        args: [" echo 'source  /opt/intel/system_studio_2019/bin/compilervars.sh intel64' >> /root/.bashrc ; echo 'source /opt/intel/oneapi/setvars.sh' >> /root/.bashrc ; sleep infinity"]
        # tty: true
        # stdin: true
        # env:
        image: ${REG_TMP}/nepdemo/flexran_basekit:dpdk-19.11-ocp-4.9.5-ubi-8.4 
        name: flexran-dev-env
        volumeMounts:
        - name: oneapi
          mountPath: /opt/intel/oneapi
          readOnly: false
        - name: system-studio-2019
          mountPath: /opt/intel/system_studio_2019
          readOnly: false   
        - name: licenses
          mountPath: /opt/intel/licenses
          readOnly: false
        - name: packagemanager
          mountPath: /opt/intel/packagemanager
          readOnly: false   
        - name: dpdk-19-11
          mountPath: /opt/dpdk-19-11
          readOnly: false
        - name: flexran
          mountPath: /data/flexran
          readOnly: false            
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
      - name: varrun
        emptyDir: {}
      - name: dpdk-19-11
        hostPath:
          path: "/var/nepdemo/dpdk-19.11"
      - name: flexran
        hostPath:
          path: "/var/nepdemo/flexran"
      - name: oneapi
        hostPath:
          path: "/var/nepdemo/oneapi"
      - name: system-studio-2019
        hostPath:
          path: "/var/nepdemo/system_studio_2019"
      - name: licenses
        hostPath:
          path: "/var/nepdemo/licenses"
      - name: packagemanager
        hostPath:
          path: "/var/nepdemo/packagemanager"
      - name: sys
        hostPath:
          path: "/sys/"

EOF
oc create -n vbbu-demo -f /data/install/vran.intel.flexran.yaml

# oc delete -n vbbu-demo -f /data/install/vran.intel.flexran.yaml

POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc rsh -c flexran-dev-env ${POD_ID}
# switch to bash, which will run .bashrc and bring in the intel icc/icx sdk env.
# bash

# 我们从fec的device plugin里面,能看到设备已经提供出来了
POD_ID=$(oc get pod -n vran-acceleration-operators -o json | jq -r ' .items[].metadata.name | select( contains( "device-plugin" ) ) ')
oc logs -n vran-acceleration-operators $POD_ID
# ......
# I0509 12:53:38.288275       1 server.go:119] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:18:01.2],},},}
# I0509 12:53:38.288326       1 accelResourcePool.go:46] GetDeviceSpecs(): for devices: [0000:18:01.2]
# I0509 12:53:38.288435       1 pool_stub.go:97] GetEnvs(): for devices: [0000:18:01.2]
# I0509 12:53:38.288443       1 pool_stub.go:113] GetMounts(): for devices: [0000:18:01.2]
# I0509 12:53:38.288447       1 server.go:128] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_INTEL_COM_INTEL_FEC_ACC100: 0000:18:01.2,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:mrw,},&DeviceSpec{ContainerPath:/dev/vfio/110,HostPath:/dev/vfio/110,Permissions:mrw,},},Annotations:map[string]string{},},},}

POD_ID=$(oc get pod -n openshift-sriov-network-operator -o json | jq -r ' .items[].metadata.name | select( contains( "device-plugin" ) ) ')
oc logs -n openshift-sriov-network-operator $POD_ID
# ......
# I0511 13:03:13.167902       1 server.go:115] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:65:02.0],},},}
# I0511 13:03:13.167961       1 netResourcePool.go:50] GetDeviceSpecs(): for devices: [0000:65:02.0]
# I0511 13:03:13.168068       1 pool_stub.go:97] GetEnvs(): for devices: [0000:65:02.0]
# I0511 13:03:13.168077       1 pool_stub.go:113] GetMounts(): for devices: [0000:65:02.0]
# I0511 13:03:13.168082       1 server.go:124] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_INTEL_710_NIC01_RT2: 0000:65:02.0,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:mrw,},&DeviceSpec{ContainerPath:/dev/vfio/108,HostPath:/dev/vfio/108,Permissions:mrw,},},Annotations:map[string]string{},},},}
# I0511 13:03:13.168369       1 server.go:115] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:65:0a.0],},},}
# I0511 13:03:13.168393       1 netResourcePool.go:50] GetDeviceSpecs(): for devices: [0000:65:0a.0]
# I0511 13:03:13.168470       1 pool_stub.go:97] GetEnvs(): for devices: [0000:65:0a.0]
# I0511 13:03:13.168477       1 pool_stub.go:113] GetMounts(): for devices: [0000:65:0a.0]
# I0511 13:03:13.168481       1 server.go:124] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_INTEL_710_NIC02_RT2: 0000:65:0a.0,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:mrw,},&DeviceSpec{ContainerPath:/dev/vfio/112,HostPath:/dev/vfio/112,Permissions:mrw,},},Annotations:map[string]string{},},},}

# 到vbbu pod里面验证一下,也能看到设备出现了。
POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc exec -c flexran-release-running  ${POD_ID} -- ls /dev/vfio
# Defaulted container "flexran-release-running" out of: flexran-release-running, flexran-dev-env
# 110
# 112
# 97
# vfio

POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc rsh -c flexran-release-running ${POD_ID}

POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc exec -c flexran-release-running  ${POD_ID} -- ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 3: eth0@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
#     link/ether 0a:58:0a:fe:0a:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0

POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )
oc exec -c flexran-release-running  ${POD_ID} -- python3 /root/dpdk-19.11/usertools/dpdk-devbind.py -s
# Network devices using DPDK-compatible driver
# ============================================
# 0000:65:02.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:02.1 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:02.2 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:02.3 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:0a.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:0a.1 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:0a.2 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio
# 0000:65:0a.3 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf,igb_uio

# Network devices using kernel driver
# ===================================
# 0000:65:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f0 drv=i40e unused=igb_uio,vfio-pci
# 0000:65:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f1 drv=i40e unused=igb_uio,vfio-pci
# 0000:b5:00.0 'Ethernet Connection X722 for 1GbE 37d1' if=eno1 drv=i40e unused=igb_uio,vfio-pci
# 0000:b5:00.1 'Ethernet Connection X722 for 1GbE 37d1' if=eno2 drv=i40e unused=igb_uio,vfio-pci

# Baseband devices using DPDK-compatible driver
# =============================================
# 0000:18:00.0 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.1 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.2 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.3 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.4 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.5 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.6 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:00.7 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.0 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.1 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.2 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.3 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.4 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.5 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.6 'Device 0d5d' drv=vfio-pci unused=igb_uio
# 0000:18:01.7 'Device 0d5d' drv=vfio-pci unused=igb_uio

# Baseband devices using kernel driver
# ====================================
# 0000:17:00.0 'Device 0d5c' drv=pci-pf-stub unused=igb_uio,vfio-pci

# No 'Crypto' devices detected
# ============================

# No 'Eventdev' devices detected
# ==============================

# No 'Mempool' devices detected
# =============================

# No 'Compress' devices detected
# ==============================

# Misc (rawdev) devices using kernel driver
# =========================================
# 0000:00:04.0 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.1 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.2 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.3 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.4 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.5 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.6 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci
# 0000:00:04.7 'Sky Lake-E CBDMA Registers 2021' drv=ioatdma unused=igb_uio,vfio-pci

# No 'Regex' devices detected
# ===========================

oc debug node/worker-2.ocp4.redhat.ren -- ip link
# Starting pod/worker-2ocp4redhatren-debug ...
# To use host binaries, run `chroot /host`
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:e2:ba:a8:29:e6 brd ff:ff:ff:ff:ff:ff
#     vf 0     link/ether 06:b4:8a:df:01:b6 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 1     link/ether 6a:f3:e9:2e:ce:95 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 2     link/ether 86:23:2b:24:12:8f brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 3     link/ether 00:11:22:33:44:01 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
# 3: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:e2:ba:a8:29:e7 brd ff:ff:ff:ff:ff:ff
#     vf 0     link/ether 00:11:22:33:44:02 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 1     link/ether f6:9f:b3:a4:f2:da brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 2     link/ether 36:44:0f:fa:b9:84 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
#     vf 3     link/ether fa:5b:75:f2:77:8c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
# 4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether ac:1f:6b:ea:5b:32 brd ff:ff:ff:ff:ff:ff
# 5: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether ac:1f:6b:ea:5b:33 brd ff:ff:ff:ff:ff:ff
# 10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether 52:50:27:19:21:e2 brd ff:ff:ff:ff:ff:ff
# 11: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether fe:7b:d1:84:da:4f brd ff:ff:ff:ff:ff:ff
# 12: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
#     link/ether b6:c9:1d:9d:77:aa brd ff:ff:ff:ff:ff:ff
# 13: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/ether 36:7a:65:37:c1:33 brd ff:ff:ff:ff:ff:ff
# 14: vethf21a4c33@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether ae:f2:57:a5:67:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
# 15: veth8662e3e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether 9e:49:15:3f:7c:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 1
# 16: veth5d3ab571@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether aa:ad:f7:cc:b9:57 brd ff:ff:ff:ff:ff:ff link-netnsid 2
# 17: veth20ff5e06@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether 82:72:8e:6d:1a:4a brd ff:ff:ff:ff:ff:ff link-netnsid 3
# 18: vethd11f4604@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether 96:df:20:6a:a0:6f brd ff:ff:ff:ff:ff:ff link-netnsid 4
# 20: vethc860c9be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether c6:c6:37:fb:1d:48 brd ff:ff:ff:ff:ff:ff link-netnsid 5
# 30: vethfe0374a4@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
#     link/ether 1e:a1:67:b2:00:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 6
# 32: vethecce46ea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
#     link/ether 2e:1d:11:80:37:29 brd ff:ff:ff:ff:ff:ff link-netnsid 8

# Removing debug pod ...


The above is a development-environment deployment. Note that the development results under /data/flexran have to be copied into /root/flexran, and the tests are then run with the release container.

Later, once development is finished, a dedicated release container will be rebuilt; the dev-related containers are not used in the production environment, and likewise the file-copy jobs will not run on the production system.
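A minimal sketch of that copy step, assuming the dev results sit on the shared /data/flexran hostPath mount and the release container expects them under /root/flexran as described above (any other path would be an assumption):

# run the copy inside the release container of the vBBU pod; both containers
# mount the same /data/flexran hostPath volume, so the dev output is visible here
POD_ID=$(oc get pod -n vbbu-demo -o json | jq -r '.items[].metadata.name | select(. | contains("flexran-binary-release"))' )

oc exec -n vbbu-demo -c flexran-release-running ${POD_ID} -- \
  bash -c 'mkdir -p /root/flexran && cp -a /data/flexran/. /root/flexran/'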

end

linuxptp 3.1.1

# http://linuxptp.sourceforge.net/
# download linuxptp-3.1.1

# on a rhel8.4
dnf install -y linuxptp

# /etc/ptp4l.conf
# /etc/sysconfig/phc2sys
# /etc/sysconfig/ptp4l
# /etc/timemaster.conf
# /usr/lib/systemd/system/phc2sys.service
# /usr/lib/systemd/system/ptp4l.service
# /usr/lib/systemd/system/timemaster.service

cat /etc/sysconfig/phc2sys
# OPTIONS="-a -r"

cat /etc/sysconfig/ptp4l
# OPTIONS="-f /etc/ptp4l.conf -i eth0"

systemctl cat phc2sys
# # /usr/lib/systemd/system/phc2sys.service
# [Unit]
# Description=Synchronize system clock or PTP hardware clock (PHC)
# After=ntpdate.service ptp4l.service

# [Service]
# Type=simple
# EnvironmentFile=-/etc/sysconfig/phc2sys
# ExecStart=/usr/sbin/phc2sys $OPTIONS

# [Install]
# WantedBy=multi-user.target

systemctl cat ptp4l.service
# # /usr/lib/systemd/system/ptp4l.service
# [Unit]
# Description=Precision Time Protocol (PTP) service
# After=network-online.target
# Wants=network-online.target

# [Service]
# Type=simple
# EnvironmentFile=-/etc/sysconfig/ptp4l
# ExecStart=/usr/sbin/ptp4l $OPTIONS

# [Install]
# WantedBy=multi-user.target

mkdir -p /data/ptp
cd /data/ptp
wget https://nchc.dl.sourceforge.net/project/linuxptp/v3.1/linuxptp-3.1.1.tgz
tar zvxf linuxptp-3.1.1.tgz
cd linuxptp-3.1.1
make

cat << 'EOF' > ptp4l.sh
#!/bin/bash

# echo $DEMO_ENV_NIC > /demo.txt
# echo $DEMO_ENV_PTP4L_ARG >> /demo.txt

# ./ptp4l -f ./configs/default_zill.cfg -2 -i enp101s0f0   -m  > /home/ptp4l.log  2>&1 &
# /usr/local/sbin/ptp4l -f /etc/ptp4l.conf -2 -m -i $DEMO_ENV_NIC
/usr/local/sbin/ptp4l -f /etc/ptp4l.conf -m $DEMO_ENV_PTP4L_ARG

EOF

cat << 'EOF' > phc2sys.sh
#!/bin/bash

# echo $DEMO_ENV_NIC > /demo.1.txt
# echo $DEMO_ENV_PHC2SYS_ARG >> /demo1.txt

# ./phc2sys -s  enp101s0f0  -O 0 -m -R 8 >/home/phc2sys.log   2>&1 &
# /usr/local/sbin/phc2sys -s $DEMO_ENV_NIC -a -r -m -u 1 -O 0 -R 8 -z /var/run/ptp4l -t [phc2sys]
/usr/local/sbin/phc2sys -m -z /var/run/ptp4l -t [phc2sys] $DEMO_ENV_PHC2SYS_ARG

EOF

cat << 'EOF' > ts2phc.sh
#!/bin/bash

# echo $DEMO_ENV_NIC > /demo.2.txt
# echo $DEMO_ENV_TS2PHC_ARG >> /demo2.txt

# ./ts2phc -f ./configs/ts2phc-generic_GNSS0.cfg -s generic -m -c enp23s0f0 > /home/ts2phc.log 2>&1 &
# /usr/local/sbin/ts2phc -f /etc/ts2phc.cfg -s generic -m -c $DEMO_ENV_NIC
/usr/local/sbin/ts2phc -f /etc/ts2phc.cfg -m $DEMO_ENV_TS2PHC_ARG

EOF

cat << EOF > ./ptp.dockerfile
FROM registry.access.redhat.com/ubi8/ubi:8.4

COPY hwstamp_ctl nsm phc2sys phc_ctl pmc ptp4l timemaster ts2phc incdefs.sh version.sh ptp4l.sh phc2sys.sh ts2phc.sh /usr/local/sbin/
RUN cd /usr/local/sbin/ && chmod +x hwstamp_ctl nsm phc2sys phc_ctl pmc ptp4l timemaster ts2phc incdefs.sh version.sh ptp4l.sh phc2sys.sh ts2phc.sh

EOF

podman build --squash -t quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v04 -f ptp.dockerfile ./

podman push quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v04

cat << EOF > /data/install/ptp4l.conf
[global]
#
# Default Data Set
#
twoStepFlag             1
slaveOnly               0
priority1               128
priority2               128
domainNumber            24
#utc_offset             37
clockClass              248
clockAccuracy           0xFE
offsetScaledLogVariance 0xFFFF
free_running            0
freq_est_interval       1
dscp_event              0
dscp_general            0
dataset_comparison      ieee1588
G.8275.defaultDS.localPriority  128
#
# Port Data Set
# 16 TS a second use logSyncInterval  -4
logAnnounceInterval     1
logSyncInterval         -4
logMinDelayReqInterval  0
logMinPdelayReqInterval 0
announceReceiptTimeout  3
syncReceiptTimeout      0
delayAsymmetry          0
fault_reset_interval    4
neighborPropDelayThresh 20000000
masterOnly              0
G.8275.portDS.localPriority     128
#
# Run time options
#
assume_two_step         0
logging_level           6
path_trace_enabled      0
follow_up_info          0
hybrid_e2e              0
inhibit_multicast_service       0
net_sync_monitor        0
tc_spanning_tree        0
tx_timestamp_timeout    1
unicast_listen          0
unicast_master_table    0
unicast_req_duration    3600
use_syslog              1
verbose                 0
summary_interval        0
kernel_leap             1
check_fup_sync          0
#
# Servo Options
#
pi_proportional_const   0.0
pi_integral_const       0.0
pi_proportional_scale   0.0
pi_proportional_exponent        -0.3
pi_proportional_norm_max        0.7
pi_integral_scale       0.0
pi_integral_exponent    0.4
pi_integral_norm_max    0.3
step_threshold          0.0
first_step_threshold    0.00002
max_frequency           900000000
clock_servo             pi
sanity_freq_limit       200000000
ntpshm_segment          0
#
# Transport options
#
transportSpecific       0x0
ptp_dst_mac             01:1B:19:00:00:00
p2p_dst_mac             01:80:C2:00:00:0E
udp_ttl                 1
udp6_scope              0x0E
uds_address             /var/run/ptp4l
#
# Default interface options
#
clock_type              OC
network_transport       UDPv4
delay_mechanism         E2E
time_stamping           hardware
tsproc_mode             filter
delay_filter            moving_median
delay_filter_length     10
egressLatency           0
ingressLatency          0
boundary_clock_jbod     0
#
# Clock description
#
productDescription      ;;
revisionData            ;;
manufacturerIdentity    00:00:00
userDescription         ;
timeSource              0xA0
EOF

cat << EOF > /data/install/ts2phc.cfg
[global]
use_syslog              0
verbose                 1
logging_level           7
ts2phc.pulsewidth       100000000
# For GNSS module
ts2phc.nmea_serialport /dev/ttyGNSS_6500_0
[ens18f0]
ts2phc.extts_polarity rising
EOF

oc delete configmap ptp-config -n vbbu-demo

oc create configmap ptp-config -n vbbu-demo --from-file=/data/install/ptp4l.conf --from-file=/data/install/ts2phc.cfg --save-config=true

cat << 'EOF' > /data/install/ptp.demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
  labels:
    app: nepdemo-linuxptp-daemon
  name: nepdemo-linuxptp-daemon
  # namespace: openshift-ptp
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - worker-0.ocp4.redhat.ren
  containers:
  - name: ptp4l
    image: quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v04
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "--"]
    args: [" /usr/local/sbin/ptp4l.sh ;"]
    env:
    - name: DEMO_ENV_PTP4L_ARG
      value: " -i ens18f0 -2 "
    securityContext:
      privileged: true
      runAsUser: 0    
    volumeMounts:
    - mountPath: /etc/ptp4l.conf
      subPath: ptp4l.conf
      name: config-volume
    - mountPath: /var/run/ptp4l
      name: socket-dir
  - name: phc2sys
    image: quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v04
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "--"]
    args: [" /usr/local/sbin/phc2sys.sh ;"]
    env:
    - name: DEMO_ENV_PHC2SYS_ARG
      value: " -s ens18f0 -r -u 1 -O 0 -R 8 "      
    securityContext:
      privileged: true
      runAsUser: 0    
    volumeMounts:
    - mountPath: /etc/ptp4l.conf
      subPath: ptp4l.conf
      name: config-volume
    - mountPath: /var/run/ptp4l
      name: socket-dir
  - name: ts2phc
    image: quay.io/nepdemo/linuxptp:3.1.1-ubi-8.4-v04
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "--"]
    args: [" /usr/local/sbin/ts2phc.sh ;"]
    env:
    - name: DEMO_ENV_TS2PHC_ARG
      value: " -s generic -c ens18f0 "      
    securityContext:
      privileged: true
      runAsUser: 0    
    volumeMounts:
    - mountPath: /etc/ts2phc.cfg
      subPath: ts2phc.cfg
      name: config-volume
    - mountPath: /var/run/ptp4l
      name: socket-dir
    - name: dev
      mountPath: /dev
  hostNetwork: true
  hostPID: true
  serviceAccountName: svcacct-driver
  volumes:
  - configMap:
      defaultMode: 420
      name: ptp-config
    name: config-volume
  - hostPath:
      path: /var/run/ptp
      type: DirectoryOrCreate
    name: socket-dir
  - name: dev
    hostPath:
      path: "/dev"
EOF

oc create -n vbbu-demo -f /data/install/ptp.demo.yaml

# oc delete -n vbbu-demo -f /data/install/ptp.demo.yaml
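After the pod is created, a quick way to check that the three linuxptp containers started and are synchronizing is to tail their logs (the container names are the ones defined in the pod spec above):

# the pod runs on the host network of worker-0, one container per daemon
oc get pod -n vbbu-demo nepdemo-linuxptp-daemon -o wide

# ptp4l should report a port state and decreasing offsets
oc logs -n vbbu-demo nepdemo-linuxptp-daemon -c ptp4l --tail=20

# phc2sys and ts2phc log their offsets as well
oc logs -n vbbu-demo nepdemo-linuxptp-daemon -c phc2sys --tail=20
oc logs -n vbbu-demo nepdemo-linuxptp-daemon -c ts2phc --tail=20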

baicell bbu

cd /home/BaiBBU_XSS/tools
./XRAN_BBU stop

./XRAN_BBU start

cat /home/BaiBBU_XSS/BaiBBU_SXSS/DU/bin/logs_gNB_DU

tail -f /home/BaiBBU_XSS/BaiBBU_SXSS/DU/bin/logs_gNB_DU

export tmp_path='/home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY'

cd /data/flexran
cp -r libs $tmp_path/
cp -r sdk $tmp_path/
#cp -r tests flexran_build/flexran/
cp -r wls_mod $tmp_path/
cp -r xran $tmp_path/
#cd flexran_build/flexran/
#add remove flexran source code
rm -rf $tmp_path/sdk/test
rm -rf $tmp_path/sdk/source
rm -rf $tmp_path/libs/ferrybridge

cd /home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY

cat /home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY/bin/l1.sh

cat /home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY/bin/Phy.log

# patch /home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY/bin/l1.sh
# add env variable
# export DIR_WIRELESS_SDK=/data/flexran/sdk/build-avx512-icc
# export -n DIR_WIRELESS_SDK
# export DIR_WIRELESS_SDK=/home/BaiBBU_XSS-A/BaiBBU_PXSS/PHY/sdk/build-avx512-icc

cat /data/flexran/bin/nr5g/gnb/l1/l1.sh

cat /data/flexran/bin/nr5g/gnb/l1/Phy.log



Finally, we found out that l1app co-works with gnb_du_mac, but both run as 'EAL: Auto-detected process type: PRIMARY'; the DPDK docs say multiple processes can work together.

dhcp for ru


nmcli dev con ens1f0

nmcli connection mod ens1f0 ipv4.add 192.168.160.1/24 ipv4.method manual
nmcli con up ens1f0

cat /etc/sysconfig/dhcpd
# .......
# DHCPDARGS=ens1f0

cat /etc/dhcp/dhcpd.conf
# option callhomeip code 43 = string;
# subnet 192.168.160.0 netmask 255.255.255.0 {
#         range 192.168.160.10 192.168.160.100;
#         option domain-name-servers 192.168.160.1;
#         option routers 192.168.160.1;
#         option callhomeip 81:04:C0:A8:A0:A2;
#         default-lease-time 600;
#         max-lease-time 7200;
# }
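The callhomeip value above is DHCP option 43 encoded as a TLV hex string: sub-option code 0x81, length 0x04, then the four IP bytes (C0:A8:A0:A2 = 192.168.160.162). A small sketch for generating that string from an IP; the sub-option code 0x81 is simply taken from the config above and is whatever the RU firmware expects:

# print a dhcpd option-43 style TLV string for a call-home IP,
# e.g. ip2callhome 192.168.160.162  ->  81:04:C0:A8:A0:A2
ip2callhome() {
  local ip=$1
  printf '81:04'
  for octet in ${ip//./ } ; do
    printf ':%02X' "$octet"
  done
  printf '\n'
}

ip2callhome 192.168.160.162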

some tests, not used here

# intel icc repo
# https://www.intel.com/content/www/us/en/developer/articles/guide/installing-intel-parallel-studio-xe-runtime-2020-using-yum-repository.html


# offical oneapi docker image build
# https://hub.docker.com/r/intel/oneapi-basekit
# https://github.com/intel/oneapi-containers/blob/12932f721dd0201dfae85cacb62495924ecf42cf/images/docker/basekit/Dockerfile.centos-8

# using files/flexran.dockerfile
# buildah bud --squash -t quay.io/nepdemo/flexran_basekit:oneapi-basekit-official-ocp-4.9.5-ubi-8.4 -f flexran.dockerfile ./

# buildah push quay.io/nepdemo/flexran_basekit:oneapi-basekit-official-ocp-4.9.5-ubi-8.4

podman build --squash -t quay.io/nepdemo/flexran_basekit:oneapi-basekit-official-ocp-4.9.5-ubi-8.4 -f flexran.dockerfile ./

podman push quay.io/nepdemo/flexran_basekit:oneapi-basekit-official-ocp-4.9.5-ubi-8.4

# in container
echo 'distroverpkg=redhat-release' >> /etc/yum.conf

rpm -q --qf %{version} redhat-release;echo
# 8.4

rpm -q --provides $(rpm -q --whatprovides "system-release(releasever)")
# base-module(platform:el8)
# config(redhat-release) = 8.4-0.6.el8
# redhat-release = 8.4-0.6.el8
# redhat-release(x86-64) = 8.4-0.6.el8
# redhat-release-client
# redhat-release-computenode
# redhat-release-server
# redhat-release-workstation
# system-release = 8.4-0.6.el8
# system-release(releasever) = 8

dnf repolist
sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf
sed -i 's|$releasever|8.4|g' /etc/yum.repos.d/redhat.repo
sed -i '/codeready-builder-for-rhel-8-x86_64-rpms/,/\[/ s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo
mv -f /etc/yum.repos.d/ubi.repo /etc/yum.repos.d/ubi.repo.bak

cache dnf repo

mkdir -p /data/dnf
cd /data/dnf

dnf reposync -m --download-metadata --delete -n

dnf copr enable frostyx/modulemd-tools
dnf install -y modulemd-tools 

createrepo ./
repo2module . \
    --module-name foo \
    --module-stream devel \
    --module-version 123 \
    --module-context f32
createrepo_mod .

sriov setting for non-dpdk


# oc label node worker-2.ocp4.redhat.ren feature.node.kubernetes.io/network-sriov.capable="true"

# https://docs.openshift.com/container-platform/4.9/networking/hardware_networks/configuring-sriov-ib-attach.html
# Dynamic IP address (DHCP) assignment configuration
# require a dhcp server in cluster
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    type: Raw
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "ipam": {
          "type": "dhcp"
        }
      }
  # ...

oc get Network.operator.openshift.io/cluster -o yaml
# ......
# spec:
#   clusterNetwork:
#   - cidr: 10.254.0.0/16
#     hostPrefix: 24
#   defaultNetwork:
#     type: OpenShiftSDN
#   disableNetworkDiagnostics: false
#   logLevel: Normal
#   managementState: Managed
#   observedConfig: null
#   operatorLogLevel: Normal
#   serviceNetwork:
#   - 172.30.0.0/16
#   unsupportedConfigOverrides: null
# ......

# if you use ipam dhcp, then you do this, otherwise skip
oc edit Network.operator.openshift.io/cluster

oc get pod -n openshift-multus  | grep dhcp
# dhcp-daemon-4s2c4                     1/1     Running   0          3h11m
# dhcp-daemon-9lvch                     1/1     Running   0          3h11m
# dhcp-daemon-lhss5                     1/1     Running   0          3h11m
# dhcp-daemon-q8qmh                     1/1     Running   0          3h11m

A simple, working end-to-end CI/CD process demo

Customer requirements:

  1. Implement a simple CI/CD flow, since there is currently no CI/CD flow for containers.
  2. Do not disturb the existing development process; hook into it manually and take the binaries it already produces.
  3. Public cloud services may be used, including github and quay.io.
  4. The CI/CD flow is triggered manually, and deployment to the test environment is triggered manually as well.

Constraints at the customer site:

  1. The internet connection is slow, roughly 1MB/s.
  2. Server disk space is limited.
  3. The servers are lab machines, so they may be temporarily repurposed for other uses.

Architecture:

Key design points:

  1. Use github and quay.io as the public-cloud services to persistently store code and images, guarding against unstable intranet servers or insufficient disk space; the base images are also built on the public cloud.
  2. Deploy gitea and quay on the company intranet and keep them synchronized with the public-cloud services.
  3. Use openshift's pipeline and gitops features to implement the CI/CD flow.

Video walkthrough:

Base image

Let's first set up base-image builds on the public cloud. We use quay.io to store container images and github actions to build them.

We use github actions because, going forward, the base image will be built on top of the redhat ubi image, which requires importing a redhat subscription file during the build. That puts some flexibility requirements on the public-cloud CI/CD tool, so for now we build the base image with github actions.

quay.io

On quay.io, create a robot account.

View and record the robot account's username and password.

Grant the robot account permissions on the repository.

reference:

  1. https://event-driven.io/en/how_to_buid_and_push_docker_image_with_github_actions/
  2. https://github.com/docker/build-push-action
  3. https://docs.github.com/cn/actions/publishing-packages/publishing-docker-images

github

A dedicated github project has been created as the source for the image build. Its centos7 directory contains a dockerfile that starts from the centos7 base image, installs some software, and pushes the result to quay.io. This dockerfile depends on another image, because it needs a large installation package carried by that image. We designed it this way because we could not find a suitable free place on the public internet to host the package, so we packed it into an image, pushed that image to a public registry, and consume it through a multi-stage build whenever it is needed.

How the image containing the installation package is built is described in detail in the project documentation:

buildah from --name onbuild-container scratch
buildah copy onbuild-container nr5g_19.10.03.bz2 /
buildah umount onbuild-container 
buildah commit --rm onbuild-container quay.io/baicell/nr5g:latest
buildah push quay.io/baicell/nr5g:latest
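A sketch of how a downstream dockerfile can then consume that package image through a multi-stage build; the centos:7 base and the target path are illustrative assumptions, the real dockerfile lives in the centos7 directory of the demo repo:

# stage 1: the image that only carries the big installation package
FROM quay.io/baicell/nr5g:latest AS pkg

# stage 2: the actual build image
FROM docker.io/library/centos:7

# pull just the package out of the carrier image instead of downloading it
COPY --from=pkg /nr5g_19.10.03.bz2 /tmp/nr5g_19.10.03.bz2

# ... install software and unpack /tmp/nr5g_19.10.03.bz2 here ...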

The main.yml file in the project's .github/workflow directory describes how the github action is triggered and the CI/CD steps; refer to it to see how the image is built on the public cloud.

The github action needs the quay.io robot account credentials; we store them with github's secret feature, for example as sketched below.
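A minimal sketch of what such a main.yml can look like, assuming the robot credentials are stored as the repository secrets QUAY_USERNAME and QUAY_PASSWORD; the secret names, build context and tag are assumptions, and the real file is the one in the repo:

name: build base image
on: workflow_dispatch          # manual trigger, matching the "trigger by hand" requirement
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: docker/login-action@v1
        with:
          registry: quay.io
          username: ${{ secrets.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}
      - uses: docker/build-push-action@v2
        with:
          context: centos7
          push: true
          tags: quay.io/baicell/base:latest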

gitee

For reasons we will not go into, access to github from inside China is unreliable, so we use gitee to clone the github repo and effectively act as a git proxy: github clone to gitee.

http proxy

Our openshift environment simulates a fully disconnected network, but some steps of this demo need internet access, so we deploy an http proxy to mimic the internet proxy commonly found inside enterprises.

podman run -d --rm --name tinyproxy -p 18080:8888 ghcr.io/wangzheng422/tinyproxy:latest

export http_proxy="http://192.168.7.1:18080"
export https_proxy=${http_proxy}

curl https://ifconfig.co

unset http_proxy
unset https_proxy


quay

We deploy a quay service and enable the remote repository mirroring feature. Given the architecture (base images are already scanned on the public cloud) and the limited server resources, we do not enable image scanning.

# on 103
cat << EOF >> /etc/hosts

172.21.6.103 quaylab.infra.redhat.ren
EOF

export QUAY=/data/quay

# generate cert for *.redhat.ren

# 配置registry
mkdir -p /etc/crts/ && cd /etc/crts

# https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.1/html/installation_guide/installing-codeready-workspaces-in-tls-mode-with-self-signed-certificates_crw
openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.ocp4a.redhat.ren,DNS:*.apps.ocp4a.redhat.ren,DNS:*.ocp4b.redhat.ren,DNS:*.apps.ocp4b.redhat.ren,DNS:*.ocp4c.redhat.ren,DNS:*.apps.ocp4c.redhat.ren,DNS:*.ocp4s.redhat.ren,DNS:*.apps.ocp4s.redhat.ren,DNS:*.infra.redhat.ren,DNS:*.tool.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.ocp4a.redhat.ren,DNS:*.apps.ocp4a.redhat.ren,DNS:*.ocp4b.redhat.ren,DNS:*.apps.ocp4b.redhat.ren,DNS:*.ocp4c.redhat.ren,DNS:*.apps.ocp4c.redhat.ren,DNS:*.ocp4s.redhat.ren,DNS:*.apps.ocp4s.redhat.ren,DNS:*.infra.redhat.ren,DNS:*.tool.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 365 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# first config quay
mkdir -p $QUAY/postgres-quay
setfacl -m u:26:-wx $QUAY/postgres-quay
podman run -d --rm --name postgresql-quay \
  -e POSTGRESQL_USER=quayuser \
  -e POSTGRESQL_PASSWORD=quaypass \
  -e POSTGRESQL_DATABASE=quay \
  -e POSTGRESQL_ADMIN_PASSWORD=adminpass \
  -p 5432:5432 \
  -v $QUAY/postgres-quay:/var/lib/pgsql/data:Z \
  registry.redhat.io/rhel8/postgresql-10:1

# Ensure that the Postgres pg_trgm module is installed, as it is required by Quay
podman exec -it postgresql-quay /bin/bash -c 'echo "CREATE EXTENSION IF NOT EXISTS pg_trgm" | psql -d quay -U postgres'  
# CREATE EXTENSION

podman run -d --rm --name redis \
  -p 6379:6379 \
  -e REDIS_PASSWORD=strongpassword \
  registry.redhat.io/rhel8/redis-5:1

podman run --rm -it --name quay_config -p 80:8080 -p 443:8443 registry.redhat.io/quay/quay-rhel8:v3.6.2 config secret

# go to http://quaylab.infra.redhat.ren
# Log in with the username quayconfig and password secret
# make config, and download the config

Database Type: Postgres
Database Server: quaylab.infra.redhat.ren:5432
Username: quayuser
Password: quaypass
Database Name: quay

Redis Hostname: quaylab.infra.redhat.ren
Redis port: 6379 (default)
Redis password: strongpassword

log path: /logarchive

super user: quayadmin

Press ctrl-c to exit the container.

# then run the quay
mkdir $QUAY/config
cp ~/Downloads/quay-config.tar.gz $QUAY/config
cd $QUAY/config
tar xvf quay-config.tar.gz

mkdir $QUAY/storage
setfacl -m u:1001:-wx $QUAY/storage

podman run -d --rm -p 80:8080 -p 443:8443  \
   --name=quay \
   -v $QUAY/config:/conf/stack:Z \
   -v $QUAY/storage:/datastorage:Z \
   registry.redhat.io/quay/quay-rhel8:v3.6.2

Open http://quaylab.infra.redhat.ren in a browser.

On first use, create a user directly. We create the user quayadmin, because quayadmin was configured as the super user earlier.

# try it out
podman login quaylab.infra.redhat.ren
# Username: quayadmin
# Password: password

/bin/cp -f /run/user/0/containers/auth.json /data/registry.auth.json

# setup quay mirror
podman run -d --name mirroring-worker \
  -v $QUAY/config:/conf/stack:Z \
  registry.redhat.io/quay/quay-rhel8:v3.6.2 repomirror

# auto restart
cd ~/
podman generate systemd --new --files --name redis
podman generate systemd --new --files --name postgresql-quay
podman generate systemd --new --files --name quay
podman generate systemd --new --files --name mirroring-worker

cp -Z container-redis.service /usr/lib/systemd/system
cp -Z container-postgresql-quay.service /usr/lib/systemd/system
cp -Z container-quay.service /usr/lib/systemd/system
cp -Z container-mirroring-worker.service /usr/lib/systemd/system

systemctl daemon-reload

systemctl enable --now container-redis.service
systemctl enable --now container-postgresql-quay.service
systemctl enable --now container-quay.service
systemctl enable --now container-mirroring-worker.service

rm -f container*

Log in with the newly created quayadmin user and configure mirroring:

  1. Create an organization.
  2. Inside the organization, create the image repo: base.
  3. To let this repo sync automatically from quay.io, set this internal repo to the mirror type.
  4. Create a robot account for the sync operation; it only needs a name.
  5. Grant the robot account permissions on the repo; since we pull the remote repo into it, the robot account needs write permission.
  6. In the repo, configure the sync parameters: the upstream repo location, the tags to sync, the sync interval, and so on.
  7. After saving, the sync parameters take effect; click sync now to start a sync manually.
  8. The sync progress shows up in the repo's history, and the repo tags show that the remote repo has been synced over.
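If a one-off copy is needed outside the scheduled mirror, the same sync can be done by hand with skopeo (a sketch; the source image is just an example, and /data/registry.auth.json is the login file saved earlier):

# manual one-shot sync of a tag from quay.io into the internal quay
skopeo copy \
  --dest-authfile /data/registry.auth.json \
  docker://quay.io/baicell/nr5g:latest \
  docker://quaylab.infra.redhat.ren/baicell/nr5g:latest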


openshift4

Our demo revolves around openshift4, the container platform on the company network, so we install an openshift4 cluster plus the components we need.

install ocp4

We install a minimal openshift4: a single node that acts as both master and worker, running as a kvm guest.

Besides the openshift4 node itself, we also need a helper kvm. Installing and running openshift4 assumes a cloud-like environment (load balancer, dns, and so on); in our lab these services have to be provided by ourselves, so we create a helper kvm to host and simulate them.

# 配置openshift版本
# import openshift4 install images into quay
export BUILDNUMBER=4.9.12

# 解压缩openshift 客户端软件
tar -xzf /data/ocp4/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/sbin/

# 向内部的容器镜像仓库quay,导入openshift4安装介质。
oc image mirror -a /data/registry.auth.json --from-dir=/data/file.registry/ 'file://openshift/release:4.9.12-x86_64*' quaylab.infra.redhat.ren/ocp4/openshift4

# 我们的openshift4是模拟离线模式,所以我们需要一个容器镜像proxy,来桥接下载容器镜像。
# setup nexus
mkdir -p /data/ccn
cd /data/ccn

podman create --name swap quay.io/wangzheng422/qimgs:nexus-fs-image-2022-01-14-2155 ls
podman cp swap:/nexus-image.tgz - > /data/ccn/nexus-image.tgz.tar
podman rm -fv swap
tar vxf nexus-image.tgz.tar
tar zvxf nexus-image.tgz
rm -f nexus-image.tgz*

chown -R 200 /data/ccn/nexus-image

podman run -d -p 8082:8081 -p 8083:8083 -it --name nexus-image -v /data/ccn/nexus-image:/nexus-data:Z docker.io/sonatype/nexus3:3.33.1

# auto start nexus
cd ~/
podman generate systemd --files --name nexus-image
cp -Z container-nexus-image.service  /usr/lib/systemd/system
systemctl daemon-reload
systemctl enable --now container-nexus-image.service

# 我们准备安装helper节点
# we follow single node ocp4 deployment
cd /data/kvm

wget -O rhel8.iso 'https://access.cdn.redhat.com/content/origin/files/sha256/1f/1f78e705cd1d8897a05afa060f77d81ed81ac141c2465d4763c0382aa96cadd0/rhel-8.5-x86_64-dvd.iso?user=a768b217cf6ae8041b67586bb4dd5c77&_auth_=1642400208_d400d34f0d5e2caab120537d05b0b8c9'

create_lv() {
    var_vg=$1
    var_lv=$2
    var_size=$3
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L $var_size -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

create_lv vgdata lvhelper 120G

create_lv vgdata lvbootstrap 120G
create_lv vgdata lvmaster0 120G

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.9/scripts/helper-ks-rhel8.cfg

unset http_proxy
unset https_proxy

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=enp1s0 --gateway=192.168.7.1 --ip=192.168.7.11  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks-rhel8.cfg
# https://stackoverflow.com/questions/18620153/find-matching-text-and-replace-next-line
sed -i '/^network.*/{n;s/^network.*/network  --hostname=ocp4-helper/}' helper-ks-rhel8.cfg

export KVM_DIRECTORY=/data/kvm
virt-install --name="ocp4-Helper" --vcpus=2 --ram=4096 \
--cpu=host-model \
--disk path=/dev/vgdata/lvhelper,device=disk,bus=virtio,format=raw \
--os-variant rhel8.5 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59200 \
--boot menu=on \
--location ${KVM_DIRECTORY}/rhel8.iso \
--disk ${KVM_DIRECTORY}/rhel8.iso,device=cdrom \
--initrd-inject helper-ks-rhel8.cfg --extra-args "inst.ks=file:/helper-ks-rhel8.cfg" 

# 装好了helper vm,我们需要配置一下他
# config helper vm
ssh root@192.168.7.11

export YUMIP="192.168.7.1"
cat << EOF > /etc/yum.repos.d/remote.repo
[BaseOS]
name=BaseOS
baseurl=ftp://$YUMIP/rhel/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=ftp://$YUMIP/rhel/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

[Ansible]
name=Ansible
baseurl=ftp://$YUMIP/rhel/dnf/ansible-2.9-for-rhel-8-x86_64-rpms
enabled=1
gpgcheck=0

EOF

sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

echo "allow 192.0.0.0/8" >> /etc/chrony.conf
systemctl enable --now chronyd
# systemctl restart chronyd
chronyc tracking
chronyc sources -v
chronyc sourcestats -v
chronyc makestep

dnf update -y
reboot

dnf -y install ansible git unzip podman python3 buildah skopeo jq pigz

# copy in the ocp installer
mkdir -p /data/ocp4/
# scp ocp4.tgz to /data
# scp * root@192.168.7.11:/data/
cd /data
tar zvxf ocp.*.tgz
tar zvxf registry.*.tgz
cd /data/ocp4

rm -f /data/*.tgz

# update the certification for quay
mkdir -p /etc/crts/ && cd /etc/crts
# scp * root@192.168.7.11:/etc/crts/

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

# create ssh key
ssh-keygen

# extract install ansible playbooks
cd /data/ocp4
unzip ocp4-upi-helpernode.zip
cd /data/ocp4/ocp4-upi-helpernode-master

# 给ansible playbook配置参数文件
cat << 'EOF' > /data/ocp4/ocp4-upi-helpernode-master/vars.yaml
---
ocp_version: 4.9.12
ssh_gen_key: false
staticips: true
firewalld: false
dns_forward: yes
iso:
  iso_dl_url: "/data/ocp4/rhcos-live.x86_64.iso"
  my_iso: "rhcos-live.iso" # this is internal file, just leave as it.
helper:
  name: "helper"
  ipaddr: "192.168.7.11"
  networkifacename: "enp1s0"
  gateway: "192.168.7.1"
  netmask: "255.255.255.0"
dns:
  domain: "redhat.ren"
  clusterid: "ocp4"
  forwarder1: "192.168.7.1"
  forwarder2: "192.168.7.1"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.12"
  interface: "enp1s0"
  install_drive: "vda"
  manual: false
masters:
  - name: "master-0"
    ipaddr: "192.168.7.13"
    interface: "enp1s0"
    install_drive: "vda"
    manual: false
  # - name: "master-1"
  #   ipaddr: "192.168.7.14"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
  # - name: "master-2"
  #   ipaddr: "192.168.7.15"
  #   interface: "enp1s0"
  #   install_drive: "vda"    
workers:
  - name: "worker-0"
    ipaddr: "192.168.7.16"
    interface: "eno1"
    install_drive: "sda"
  # - name: "worker-1"
  #   ipaddr: "192.168.7.17"
  #   interface: "enp1s0"
  #   install_drive: "sda"
  # - name: "worker-2"
  #   ipaddr: "192.168.7.18"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-0"
  #   ipaddr: "192.168.7.19"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "infra-1"
  #   ipaddr: "192.168.7.20"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-3"
  #   ipaddr: "192.168.7.21"
  #   interface: "enp1s0"
  #   install_drive: "vda"
  # - name: "worker-4"
  #   ipaddr: "192.168.7.22"
  #   interface: "enp1s0"
  #   install_drive: "vda"
others:
  - name: "registry"
    ipaddr: "192.168.7.1"
  - name: "yum"
    ipaddr: "192.168.7.1"
  - name: "quay"
    ipaddr: "192.168.7.1"
  - name: "nexus"
    ipaddr: "192.168.7.1"
  - name: "git"
    ipaddr: "192.168.7.1"
otherdomains:
  - domain: "rhv.redhat.ren"
    hosts:
    - name: "manager"
      ipaddr: "192.168.7.71"
    - name: "rhv01"
      ipaddr: "192.168.7.72"
  - domain: "others.redhat.ren"
    hosts:
    - name: "*"
      ipaddr: "192.168.7.71"
    - name: "*.apps"
      ipaddr: "192.168.7.71"
  - domain: "infra.redhat.ren"
    hosts:
      - name: "registry"
        ipaddr: "192.168.7.1"
      - name: "yum"
        ipaddr: "192.168.7.1"
      - name: "quaylab"
        ipaddr: "192.168.7.1"
      - name: "nexus"
        ipaddr: "192.168.7.1"
      - name: "git"
        ipaddr: "192.168.7.1"
force_ocp_download: false
remove_old_config_files: false
ocp_client: "file:///data/ocp4/{{ ocp_version }}/openshift-client-linux-{{ ocp_version }}.tar.gz"
ocp_installer: "file:///data/ocp4/{{ ocp_version }}/openshift-install-linux-{{ ocp_version }}.tar.gz"
ocp_bios: "file:///data/ocp4/rhcos-metal.x86_64.raw.gz"
ppc64le: false
arch: 'x86_64'
chronyconfig:
  enabled: true
  content:
    - server: "192.168.7.11"
      options: iburst
setup_registry: # don't worry about this, just leave it here
  deploy: false
  registry_image: docker.io/library/registry:2
  local_repo: "ocp4/openshift4"
  product_repo: "openshift-release-dev"
  release_name: "ocp-release"
  release_tag: "4.6.1-x86_64"
ocp_filetranspiler: "file:///data/ocp4/filetranspiler.tgz"
registry_server: "registry.infra.redhat.ren:5443"
EOF

# ansible 脚本要运行很多次,这是第一次,主要是装云服务,配置他们
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars.yaml tasks/main.yml

mkdir -p /data/install
cd /data/install

# vi install-config.yaml 
cat << EOF > /data/install/install-config.yaml 
apiVersion: v1
baseDomain: redhat.ren
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"quaylab.infra.redhat.ren": {"auth": "cXVheWFkbWluOnBhc3N3b3Jk","email": "noemail@localhost"}}}'
sshKey: |
$( cat /root/.ssh/id_rsa.pub | sed 's/^/   /g' )
additionalTrustBundle: |
$( cat /etc/crts/redhat.ren.ca.crt | sed 's/^/   /g' )
imageContentSources:
- mirrors:
  - quaylab.infra.redhat.ren/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - quaylab.infra.redhat.ren/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF

# 清空之前的openshift安装缓存,并且创建新的ignition files
cd /data/install/
/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap manifests master*[0-9] worker*[0-9] 

openshift-install create manifests --dir=/data/install

# 我们有一些自定义的ignition配置,把他们加进去
# copy ntp related config
/bin/cp -f  /data/ocp4/ocp4-upi-helpernode-master/machineconfig/* /data/install/openshift/

# copy image registry proxy related config
cd /data/ocp4
bash image.registries.conf.sh nexus.infra.redhat.ren:8083

/bin/cp -f /data/ocp4/image.registries.conf /etc/containers/registries.conf.d/

/bin/cp -f /data/ocp4/99-worker-container-registries.yaml /data/install/openshift
/bin/cp -f /data/ocp4/99-master-container-registries.yaml /data/install/openshift

# 创建 ignition 文件
cd /data/install/
openshift-install create ignition-configs --dir=/data/install

cd /data/ocp4/ocp4-upi-helpernode-master
# 我们来为每个主机,复制自己版本的ign,并复制到 web server 的目录下
ansible-playbook -e @vars.yaml tasks/ign.yml

# 我们为每个节点创建各自的iso文件
cd /data/ocp4/ocp4-upi-helpernode-master
ansible-playbook -e @vars.yaml tasks/iso.yml

# 接下来,我们把 master, worker 的启动iso复制到宿主机上
# 并启动kvm,将自动开始安装 master, worker 节点
# on kvm host 172.21.6.103
export KVM_DIRECTORY=/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@192.168.7.11:/data/install/{*boot*,*master-0,*worker-0}.iso ${KVM_DIRECTORY}/

virt-install --name=ocp4-bootstrap --vcpus=4 --ram=8192 \
--disk path=/dev/vgdata/lvbootstrap,device=disk,bus=virtio,format=raw \
--os-variant rhel8.5 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59001 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-bootstrap.iso   

virt-install --name=ocp4-master-0 --vcpus=16 --ram=73728 \
--cpu=host-model \
--disk path=/dev/vgdata/lvmaster0,device=disk,bus=virtio,format=raw \
--os-variant rhel8.5 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59002 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-master-0.iso 

# 回到helper vm上,等待安装结束
# back to helper vm
cd /data/install
export KUBECONFIG=/data/install/auth/kubeconfig
echo "export KUBECONFIG=/data/install/auth/kubeconfig" >> ~/.bashrc
oc completion bash | sudo tee /etc/bash_completion.d/openshift > /dev/null

dnf -y install jq
oc get csr | grep -v Approved
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

cd /data/install
openshift-install wait-for bootstrap-complete --log-level debug

cd /data/install
openshift-install wait-for install-complete --log-level debug
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "eLVhg-TUx3X-fWYL9-dHepi"

install tekton(ci/cd pipeline)

The official openshift pipelines installation document is well written; just follow it and click through.

install argocd(ci/cd gitops)

The official openshift gitops installation document is well written; just follow it and click through.

install hostpath-provisioner from kubevirt

We need a simple storage solution on openshift, so we borrow it from openshift virtualization, which ships a hostpath provisioner component.

The key configuration steps are below.

# 在节点上创建对应目录,并设置selinux权限
cat << EOF > /data/install/host-path.yaml
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-set-selinux-for-hostpath-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Set SELinux chcon for hostpath baicell
            Before=kubelet.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStartPre=-mkdir -p /var/hostpath
            ExecStart=chcon -Rt container_file_t /var/hostpath/

            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: hostpath-baicell.service
EOF
oc create -f /data/install/host-path.yaml

# 创建hostpath配置
cat << EOF > /data/install/host-path-provision.yaml
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  pathConfig:
    path: "/var/hostpath" 
    useNamingPrefix: false 

EOF
oc create -f /data/install/host-path-provision.yaml -n openshift-cnv

# 创建storage class配置
cat << EOF > /data/install/host-path-storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-provisioner 
provisioner: kubevirt.io/hostpath-provisioner
reclaimPolicy: Delete 
volumeBindingMode: WaitForFirstConsumer 
EOF
oc create -f /data/install/host-path-storage-class.yaml
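A quick way to sanity-check the storage class is a small test pvc; note that with WaitForFirstConsumer it stays Pending until a pod actually mounts it (the pvc name, size and file path here are arbitrary examples):

cat << EOF > /data/install/hostpath-test-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hostpath-provisioner
EOF
oc create -f /data/install/hostpath-test-pvc.yaml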

CI/CD in shell

CI/CD is a mindset: move quickly from a business idea to product code, to production rollout, and then on to automated maintenance and upgrades. Which tools to use differs per team and company, so do your own research; the principle is simply to use what you know, can control, and can use to solve problems quickly.

Following our overall CI/CD design, we build two versions of the CI/CD flow: one with plain shell scripts, and one with the tools built into openshift4. Neither is strictly better. The script approach fits small teams; the openshift4 tooling fits large teams, because in a large team communication is very expensive, and these tools greatly reduce the internal communication cost, while the simplified, templated configuration reduces the chance of misconfiguration, so it is recommended for teams.

Container image version tags

Every container image has a version tag; in quay.io/baicell/fpga-driver:set_ip.v06, for example, the tag is set_ip.v06. The tag format can be defined to fit the company and team, and usually carries the software version, architecture, build date and so on. In this demo we mostly use a date/time stamp. Sometimes the builder information is also attached to the image as labels, but that is less visible, so the needed information is usually squeezed into the tag itself.

Note that although the tag format is arbitrary, once it is defined inside the company/team it should be followed consistently.
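A sketch of generating such a tag from the build date plus, optionally, the short git commit; the naming scheme itself is just an example:

# date/time stamp, the format used throughout this demo
var_date=$(date '+%Y-%m-%d-%H%M')

# optionally append the short git hash so a tag maps back to a commit
var_commit=$(git rev-parse --short HEAD 2>/dev/null || echo nogit)

var_tag="${var_date}-${var_commit}"
echo "quaylab.infra.redhat.ren/baicell/vbbu:${var_tag}"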

build image && sync image

Let's first look at how to automate the container image build and the upload/sync with shell scripts.

for vbbu app

First, the container image build for the vBBU application. This image is very large: its build base (about 6GB) was already built on the public cloud and mirrored asynchronously into the company intranet with quay's mirroring feature. Here we do an incremental build on top of it and push the result to the internal registry.

# on helper vm
# get git repo from gitee, and copy to helper
mkdir -p /data/cicd
cd /data/cicd
wget -O main.zip https://gitee.com/wangzheng422/container.build.demo/repository/archive/main.zip
# scp main.zip root@192.168.7.11:/data/tmp/

cd /data/cicd
unzip main.zip
cd /data/cicd/container.build.demo-main/vbbu

var_date=$(date '+%Y-%m-%d-%H%M')
podman build --no-cache --build-arg REGISTRY=quaylab.infra.redhat.ren -t quaylab.infra.redhat.ren/baicell/vbbu:$var_date .
podman push quaylab.infra.redhat.ren/baicell/vbbu:$var_date

echo quaylab.infra.redhat.ren/baicell/vbbu:$var_date

# sync to public cloud
podman tag quaylab.infra.redhat.ren/baicell/vbbu:$var_date quay.io/baicell/vbbu:$var_date
podman push quay.io/baicell/vbbu:$var_date

for fpga driver

Next, how the fpga driver container image is built. This image is small, so we simply build it automatically with a github action.

The steps below assume we build it manually on the public cloud.

# on public cloud host (vultr)
git clone https://github.com/wangzheng422/container.build.demo
cd container.build.demo/fpga

var_date=$(date '+%Y-%m-%d-%H%M')
podman build --no-cache -t quay.io/baicell/fpga-driver:$var_date -f driver.Dockerfile .

podman push quay.io/baicell/fpga-driver:$var_date

auto deploy to openshift

For automated deployment we use kustomize, which is natively supported by k8s. We picked kustomize not because it is powerful, but because it is simple and lets us bring the whole application up and down in one shot.
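A minimal sketch of what the kustomization in deploy.demo can look like (the resource file names are assumptions; the real files are in the demo repo):

# kustomization.yaml - everything listed here goes up with `oc apply -k .`
# and comes down again with `oc delete -k .`
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: baicell
resources:
  - vbbu-deployment.yaml
  - fpga-driver-daemonset.yaml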

# on helper vm
oc new-project baicell
oc project baicell

oc create sa demo
oc adm policy add-scc-to-user privileged -z demo

mkdir -p /data/cicd
cd /data/cicd
wget -O main.zip https://gitee.com/wangzheng422/container.build.demo/repository/archive/main.zip

unzip main.zip
cd container.build.demo-main/deploy.demo/

# oc new-project baicell

oc -n baicell kustomize .
oc -n baicell apply -k .

# to restore
oc -n baicell delete -k .

CI/CD in openshift4

Now let's look at how to implement CI/CD the open-source way, using the tooling that ships with openshift4.

tekton / pipeline

First, pipelines/tekton. We recorded the configuration process as screenshots.

We have already defined a pipeline with two tasks: one clones a project from a remote git repo, the other builds the image with buildah.

Clicking edit on the pipeline opens the pipeline editor. Skipping the name field, there is a flow-chart editor; hovering over a step lets you add further steps/tasks. Note the workspace at the end: we need to attach storage there so data can flow between tasks.

Clicking a step/task lets you configure its parameters; for buildah, for example, we set the image name and other parameters.

Every execution of a pipeline is recorded as a pipeline run; you can open each pipeline run and read the logs of that execution.

The pipeline also keeps simple statistics over all of its pipeline runs.

Next, a few configuration details.

oc new-project demo

oc project demo

# 要给service account创建push用的token
# https://docs.openshift.com/container-platform/4.9/openshift_images/managing_images/using-image-pull-secrets.html
oc create secret docker-registry pipeline-push-quaylab \
    --docker-server=quaylab.infra.redhat.ren \
    --docker-username=quayadmin \
    --docker-password=password \
    --docker-email=quayadmin@redhat.ren

oc secrets link pipeline pipeline-push-quaylab --for=pull,mount

# 我们需要定义存储,给pipeline使用
# we define a pvc for the pipeline
cat << EOF > /data/cicd/pipeline.pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pipeline-vbbu-image-build
  namespace: demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hostpath-provisioner
  volumeMode: Filesystem
EOF
oc create -f /data/cicd/pipeline.pvc.yaml

# 我们在界面上定义的pipeline,实际的yaml长这个样子,可以直接在命令行上创建。
# we define a pipeline
cat << EOF > /data/cicd/pipeline.yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: vbbu-build-image-pipeline
  namespace: demo
spec:
  params:
    - default: demo
      description: docker image tag
      name: image_tag
      type: string
  tasks:
    - name: git-clone
      params:
        - name: url
          value: 'https://gitee.com/wangzheng422/container.build.demo'
        - name: httpProxy
          value: 'http://192.168.7.1:18080'
        - name: httpsProxy
          value: 'http://192.168.7.1:18080'
      taskRef:
        kind: ClusterTask
        name: git-clone
      workspaces:
        - name: output
          workspace: workspace-demo
    - name: buildah
      params:
        - name: IMAGE
          value: 'quaylab.infra.redhat.ren/baicell/vbbu:$(params.image_tag)'
        - name: DOCKERFILE
          value: vbbu/Dockerfile
        - name: CONTEXT
          value: vbbu/
        - name: TLSVERIFY
          value: 'false'
        - name: BUILD_EXTRA_ARGS
          value: '--build-arg REGISTRY=''quaylab.infra.redhat.ren'''
      runAfter:
        - git-clone
      taskRef:
        kind: ClusterTask
        name: buildah
      workspaces:
        - name: source
          workspace: workspace-demo
  workspaces:
    - name: workspace-demo
EOF
oc create -f /data/cicd/pipeline.yaml
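Besides starting the pipeline from the console, it can also be triggered from the command line with a PipelineRun that passes the image_tag parameter and binds the pvc created above (a sketch; generateName, the file path and the timestamp value are arbitrary examples):

cat << EOF > /data/cicd/pipelinerun.yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: vbbu-build-image-pipeline-run-
  namespace: demo
spec:
  pipelineRef:
    name: vbbu-build-image-pipeline
  params:
    - name: image_tag
      value: "2022-01-01-0000"
  workspaces:
    - name: workspace-demo
      persistentVolumeClaim:
        claimName: pipeline-vbbu-image-build
EOF
# generateName requires create (not apply)
oc create -f /data/cicd/pipelinerun.yaml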

argocd / gitops

Next, gitops in openshift4, which is implemented with argocd.

gitops/argocd has its own UI; find the login entry, click it, and log in with SSO. The operator configures all of this by default.

After logging in to the gitops/argocd UI, configure the git source.

Then create the application.

The key application settings are the git source, the target cluster (the default is fine), and the path inside the git repo. The yaml shown later can be used directly.

This is what the application looks like once it is created.

Open the application to see the topology view; click sync to make gitops take effect. At that point the system creates the k8s objects from the yaml in git.

After gitops succeeds, the topology looks even better: the implicitly created system objects are shown as well.

Back on the overview page, we can see that our application is now healthy.

Below are some of the commands used.

# 给被管理project打标签,让这个project被gitops管理。
oc label namespace baicell argocd.argoproj.io/managed-by=openshift-gitops

oc api-resources  | grep argo
# applications                          app,apps           argoproj.io/v1alpha1                           true         Application
# applicationsets                       appset,appsets     argoproj.io/v1alpha1                           true         ApplicationSet
# appprojects                           appproj,appprojs   argoproj.io/v1alpha1                           true         AppProject
# argocds                                                  argoproj.io/v1alpha1                           true         ArgoCD

oc project openshift-gitops

# 创建我们的gitops应用,这个可以直接创建,剩的在界面上敲字了。
cat << EOF > /data/cicd/gitops-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo
  namespace: openshift-gitops
spec:
  destination:
    namespace: baicell
    server: https://kubernetes.default.svc
  project: default
  source:
    path: deploy.demo
    repoURL: https://gitee.com/wangzheng422/container.build.demo
EOF
oc create -f /data/cicd/gitops-app.yaml

oc get applications
# NAME   SYNC STATUS   HEALTH STATUS
# demo   Synced        Healthy

end

openshift/k8s: how remote shell / oc exec works

When operating openshift/k8s day to day, we often run oc exec, for example:

oc exec -it $pod_name -- bash

But sometimes this session cannot be established, and sometimes it drops in the middle, so let's look at what happens behind this command.

oc exec -v 6 -it pod/du-deployment1-58944f9f85-8m49m -- bash
# I1230 14:38:39.347429  188014 loader.go:372] Config loaded from file:  /data/install/auth/kubeconfig
# I1230 14:38:39.370718  188014 round_trippers.go:454] GET https://api.ocp4s.redhat.ren:6443/api/v1/namespaces/default/pods/du-deployment1-58944f9f85-8m49m 200 OK in 10 milliseconds
# I1230 14:38:39.376109  188014 podcmd.go:88] Defaulting container name to du-container1
# I1230 14:38:39.396350  188014 round_trippers.go:454] POST https://api.ocp4s.redhat.ren:6443/api/v1/namespaces/default/pods/du-deployment1-58944f9f85-8m49m/exec?command=bash&container=du-container1&stdin=true&stdout=true&tty=true 101 Switching Protocols in 19 milliseconds
#                                                                                                                             [root@du-deployment1-58944f9f85-8m49m /]#


oc exec -v 7 -it pod/du-deployment1-58944f9f85-8m49m -- bash
# I1230 14:39:13.441167  188023 loader.go:372] Config loaded from file:  /data/install/auth/kubeconfig
# I1230 14:39:13.450807  188023 round_trippers.go:432] GET https://api.ocp4s.redhat.ren:6443/api/v1/namespaces/default/pods/du-deployment1-58944f9f85-8m49m
# I1230 14:39:13.450830  188023 round_trippers.go:438] Request Headers:
# I1230 14:39:13.450837  188023 round_trippers.go:442]     Accept: application/json, */*
# I1230 14:39:13.450842  188023 round_trippers.go:442]     User-Agent: oc/4.9.0 (linux/amd64) kubernetes/96e95ce
# I1230 14:39:13.465425  188023 round_trippers.go:457] Response Status: 200 OK in 14 milliseconds
# I1230 14:39:13.473072  188023 podcmd.go:88] Defaulting container name to du-container1
# I1230 14:39:13.475155  188023 round_trippers.go:432] POST https://api.ocp4s.redhat.ren:6443/api/v1/namespaces/default/pods/du-deployment1-58944f9f85-8m49m/exec?command=bash&container=du-container1&stdin=true&stdout=true&tty=true
# I1230 14:39:13.475182  188023 round_trippers.go:438] Request Headers:
# I1230 14:39:13.475187  188023 round_trippers.go:442]     X-Stream-Protocol-Version: v4.channel.k8s.io
# I1230 14:39:13.475191  188023 round_trippers.go:442]     X-Stream-Protocol-Version: v3.channel.k8s.io
# I1230 14:39:13.475195  188023 round_trippers.go:442]     X-Stream-Protocol-Version: v2.channel.k8s.io
# I1230 14:39:13.475199  188023 round_trippers.go:442]     X-Stream-Protocol-Version: channel.k8s.io
# I1230 14:39:13.475203  188023 round_trippers.go:442]     User-Agent: oc/4.9.0 (linux/amd64) kubernetes/96e95ce
# I1230 14:39:13.496289  188023 round_trippers.go:457] Response Status: 101 Switching Protocols in 21 milliseconds
# [root@du-deployment1-58944f9f85-8m49m /]#

上面两个命令打开了不同等级的日志。可以看到,oc exec其实是先调用api server上pod的exec子资源接口,然后通过 101 Switching Protocols 把这个HTTP连接升级成流式通道(请求头里的 X-Stream-Protocol-Version / channel.k8s.io),之后的stdin/stdout都走这条长连接。

那么当我们在项目上发现oc exec不稳定时,就要先去看api server是不是正常;再看通往api server的通路上是不是有haproxy之类的代理,代理本身是否正常,对协议升级和长连接的超时设置是否合理。这样逐步地排查。
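
下面是一个排查思路的小例子(api 地址沿用上面的环境,命令和路径只是示意,按实际集群调整):

# 1. 先确认 api server 自身是否健康(/readyz 正常会返回 ok;
#    如果集群不允许匿名访问而返回 401/403,至少也说明到 api server 的通路是通的)
curl -k https://api.ocp4s.redhat.ren:6443/readyz

# 2. 带上 token,确认普通的 pod GET 请求能正常返回 200
TOKEN=$(oc whoami -t)
curl -k -H "Authorization: Bearer $TOKEN" \
  https://api.ocp4s.redhat.ren:6443/api/v1/namespaces/default/pods/du-deployment1-58944f9f85-8m49m

# 3. 如果以上都正常而 exec 仍然不稳定,重点怀疑中间的 haproxy / 负载均衡
#    对 101 协议升级和长连接空闲超时的处理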

reference

  • https://www.cnblogs.com/a00ium/p/10905279.html
  • https://cloud.redhat.com/blog/executing-commands-in-pods-using-k8s-api
  • https://docs.openshift.com/container-platform/4.9/rest_api/workloads_apis/pod-core-v1.html#apiv1namespacesnamespacepodsnameexec

nf_conntrack 在 openshift4.9上的处理

最近看到一个case:在一台运行高负载docker应用的主机上,nf_conntrack报告table full。其实这是一个老问题了,原因是docker在处理容器网络的时候,默认会用nat的方式,也就是容器里面看到的是私有地址空间,需要操作系统的iptables/nftables来做地址转换,而这个转换,就需要nf_conntrack来追踪每一条连接。

这个问题没有什么特别好的解决办法:要么用host network绕过nat,要么用no tracking的方式,让iptables不再对这类流量做conntrack记录。
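
排查的时候,可以先简单确认一下追踪表的水位(下面的 sysctl 参数是内核标准参数,输出数值仅为示意):

# 当前追踪条目数 vs 上限,接近上限就会开始丢包
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# net.netfilter.nf_conntrack_count = 760
# net.netfilter.nf_conntrack_max = 262144

# 内核日志里的报错类似这样
dmesg | grep -i conntrack
# nf_conntrack: table full, dropping packet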

openshift

openshift是一个容器平台,那么openshift上是怎么处理的呢?我们实际来看看。

# 可以看到,openshift上,对应vxlan的通讯,不进行nf_conntrack的追踪,也就是说,对于vxlan的通讯,不会被记录在nf_conntrack中。
iptables -L -v -n -t raw
# Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
#  pkts bytes target     prot opt in     out     source               destination
#   88M   39G OPENSHIFT-NOTRACK  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* disable conntrack for vxlan */

# Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
#  pkts bytes target     prot opt in     out     source               destination
#   87M   54G OPENSHIFT-NOTRACK  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* disable conntrack for vxlan */

# Chain OPENSHIFT-NOTRACK (2 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 CT         udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:4789 NOTRACK
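
顺带一提,前面说的那种普通 docker 主机,也可以参考这个思路手工加 NOTRACK 规则,比如针对某个大流量的 UDP 端口(端口只是示例,按实际情况调整):

# 在 raw 表里对进出两个方向都加 NOTRACK,之后这类流量不再进入 conntrack 表
iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK
iptables -t raw -A OUTPUT     -p udp --dport 4789 -j NOTRACK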

# 统计总的连接跟踪数
conntrack -L -o extended | wc -l
# conntrack v1.4.4 (conntrack-tools): 760 flow entries have been shown.
# 760
 
# 统计 TCP 协议各个状态的连接跟踪数
conntrack -L -o extended | awk '/^.*tcp.*$/ {sum[$6]++} END {for(i in sum) print i, sum[i]}'
# conntrack v1.4.4 (conntrack-tools): 774 flow entries have been shown.
# LAST_ACK 1
# CLOSE 78
# ESTABLISHED 428
# SYN_SENT 1
# TIME_WAIT 214
 
# 统计各个源 IP 的连接跟踪数
conntrack -L -o extended | awk '{print $7}' | cut -d "=" -f 2 | sort | uniq -c | sort -nr | head -n 10
# conntrack v1.4.4 (conntrack-tools): 805 flow entries have been shown.
#     226 10.128.0.1
#     225 192.168.7.73
#      74 192.168.7.71
#      68 10.128.0.36
#      61 172.30.0.10
#      38 127.0.0.1
#      13 10.128.0.16
#      10 10.128.0.34
#       7 10.128.0.39
#       5 10.128.0.9

# 如果没有安装conntrack工具的话,也可以直接用下面的命令查看 /proc/net/nf_conntrack
awk -F'=' '{c[$2]++} END {for ( i in c) print i,c[i]}' /proc/net/nf_conntrack | sort -g -k 3
# ...ignored...
# 10.128.0.16 dst 13
# 10.128.0.37 dst 14
# 10.128.0.43 dst 27
# 127.0.0.1 dst 38
# 192.168.7.71 dst 62
# 10.128.0.36 dst 67
# 0000:0000:0000:0000:0000:0000:0000:0001 dst 208
# 192.168.7.73 dst 220
# 10.128.0.1 dst 230

# 如果连接数量太大的话,用下面的命令先把连接信息导出来,然后在别的机器上排序
awk -F'=' '{print $2}' /proc/net/nf_conntrack > list

awk '{c[$1]++} END {for ( i in c) print i,c[i]}' list | sort -g -k 2

reference

  • https://blog.cloudflare.com/conntrack-turns-a-blind-eye-to-dropped-syns/

  • https://blog.cloudflare.com/conntrack-tales-one-thousand-and-one-flows/

  • https://www.codeleading.com/article/31982187817/

  • https://blog.longwin.com.tw/2018/07/linux-nf-conntrack-table-full-drop-packet-2018/

  • https://www.reddit.com/r/docker/comments/iq04tw/nated_containers_conntrack_table_full_inside/

  • https://forum.proxmox.com/threads/how-to-disable-nf_conntrack-completely.17957/

  • https://docs.docker.com/network/iptables/

  • https://blog.csdn.net/chunnidong6528/article/details/100975427

  • https://www.cnblogs.com/sreops/p/14023368.html

  • https://zyh.cool/posts/f41d0763/

  • https://www.redhat.com/en/blog/mitigate-tcp-syn-flood-attacks-red-hat-enterprise-linux-7-beta

  • https://access.redhat.com/discussions/6307391

  • https://access.redhat.com/solutions/781873

openshift 4.9 加载第三方驱动 / 内核模块

我们在项目中,会遇到特种硬件,比如 fpga 卡,软件供应商为这个 fpga 卡提供了驱动/内核模块,我们需要把这个驱动加载到系统中。本文就讲述,如何在 openshift 4.9 里面,通过 deployment / pod 的方式,向系统注入这个驱动/内核模块。

在本次实验中,物理机上有一块fpga卡,我们得到了对应的驱动 nr_drv_wr.ko ,这个驱动加载以后,会创建一个网卡,我们要初始化这个网卡。

好了,就让我们来看看是怎么做的吧。

制作镜像

我们把驱动和自动加载脚本一起复制到镜像里面。加载脚本里有一个小技巧:ko 文件需要先打上正确的 selinux 标签(modules_object_t),否则 insmod 会报错。


mkdir -p /data/wzh/fpga
cd /data/wzh/fpga

cat << 'EOF' > ./ocp4.install.sh
#!/bin/bash

set -e
set -x

if  chroot /host lsmod  | grep nr_drv > /dev/null 2>&1
then
    echo NR Driver Module had loaded!
else
    echo Inserting NR Driver Module
    # chroot /host rmmod nr_drv > /dev/null 2>&1

    if [ $(uname -r) == "4.18.0-305.19.1.rt7.91.el8_4.x86_64" ];
    then
        echo insmod nr_drv_wr.ko ...
        /bin/cp -f nr_drv_wr.ko /host/tmp/nr_drv_wr.ko
        chroot /host chcon -t modules_object_t /tmp/nr_drv_wr.ko
        chroot /host insmod /tmp/nr_drv_wr.ko load_xeth=1
        /bin/rm -f /host/tmp/nr_drv_wr.ko

        CON_NAME=`chroot /host nmcli -g GENERAL.CONNECTION dev show xeth`

        chroot /host nmcli connection modify "$CON_NAME" con-name xeth
        chroot /host nmcli connection modify xeth ipv4.method disabled ipv6.method disabled
        chroot /host nmcli dev conn xeth
    else
        echo insmod nr_drv_ko Failed!
    fi

fi
EOF

cat << EOF > ./fpga.dockerfile
FROM docker.io/busybox:1.34

USER root
COPY Driver.PKG /Driver.PKG

COPY ocp4.install.sh /ocp4.install.sh
RUN chmod +x /ocp4.install.sh

WORKDIR /
EOF

buildah bud -t registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07 -f fpga.dockerfile .

buildah push registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07

openshift 部署

部署之前,我们先给service account加上特权(privileged scc)。本实验在default project里用的是default service account,对应的命令在下面;但到了具体项目中,一般要创建单独的project和单独的service account,可以参考紧接着的补充草图。
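
下面是正式项目里更推荐的做法的一个草图(project 名和 service account 名是假设的,按需替换):

oc new-project fpga-driver
oc create serviceaccount fpga-driver-sa -n fpga-driver
oc adm policy add-scc-to-user privileged -z fpga-driver-sa -n fpga-driver
# 然后在 deployment 的 spec.template.spec 里加上 serviceAccountName: fpga-driver-sa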

然后我们用了几个小技巧:首先用init container把驱动从镜像里解出来,通过emptyDir传递给真正运行的容器;然后让主容器在加载完驱动后无限睡眠(sleep infinity),保持pod一直处于运行状态。这么做是因为,如果容器正常退出,deployment会不停地重启它,而我们并不希望驱动被反复加载。好在这个常驻pod的资源消耗很小。

未来可能会优化成用 job / static pod 的方式来运行。
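
如果改用 job 的方式,大致会是下面这个样子(只是一个未经验证的草图,镜像、脚本和特权 service account 都沿用上面 deployment 的假设,细节按实际情况调整):

cat << EOF > /data/install/fpga.driver.job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: fpga-driver-load
spec:
  backoffLimit: 3
  template:
    spec:
      # job 跑完就结束,不再需要常驻的 sleep infinity
      restartPolicy: Never
      hostPID: true
      nodeSelector:
        kubernetes.io/hostname: worker-0
      initContainers:
      - name: copy
        image: registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07
        command: ["/bin/sh", "-c", "tar zvxf /Driver.PKG --strip 1 -C /nep/driver/ && /bin/cp -f /ocp4.install.sh /nep/driver/ "]
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
      containers:
      - name: driver
        image: registry.redhat.io/rhel8/support-tools:8.4
        # 与 deployment 不同,这里加载完驱动就正常退出,由 job 记录完成状态
        command: [ "/usr/bin/bash","-c","cd /nep/driver/ && bash ./ocp4.install.sh" ]
        securityContext:
          privileged: true
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
        - name: host
          mountPath: /host
      volumes:
      - name: driver-files
        emptyDir: {}
      - name: host
        hostPath:
          path: /
          type: Directory
EOF
oc create -f /data/install/fpga.driver.job.yaml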


oc adm policy add-scc-to-user privileged -z default -n default

cat << EOF > /data/install/fpga.driver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fpga-driver
  # namespace: default
  labels:
    app: fpga-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fpga-driver
  template:
    metadata:
      labels:
        app: fpga-driver
    spec:
      hostPID: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - fpga-driver
              topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-0
      # restartPolicy: Never
      initContainers:
      - name: copy
        image: registry.ocp4.redhat.ren:5443/nep/fgpa-driver:v07
        command: ["/bin/sh", "-c", "tar zvxf /Driver.PKG --strip 1 -C /nep/driver/ && /bin/cp -f /ocp4.install.sh /nep/driver/ "]
        imagePullPolicy: Always
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
      containers:
      - name: driver
        image: registry.redhat.io/rhel8/support-tools:8.4
        # imagePullPolicy: Always
        command: [ "/usr/bin/bash","-c","cd /nep/driver/ && bash ./ocp4.install.sh && sleep infinity " ]
        # command: [ "/usr/bin/bash","-c","tail -f /dev/null || true " ]
        resources:
          requests:
            cpu: 10m
            memory: 20Mi
        securityContext:
          privileged: true
          # runAsUser: 0
          seLinuxOptions:
            level: "s0"
        volumeMounts:
        - name: driver-files
          mountPath: /nep/driver/
        - name: host
          mountPath: /host
      volumes: 
      - name: driver-files
        emptyDir: {}
      - name: host
        hostPath:
          path: /
          type: Directory
EOF
oc create -f /data/install/fpga.driver.yaml

# to restore
oc delete -f /data/install/fpga.driver.yaml


sign the kernel module

CHAPTER 4. SIGNING KERNEL MODULES FOR SECURE BOOT

helm chart / helm operator 制作

2021.12 helm chart/helm operator

build helm operator

mkdir -p /data/down
cd /data/down
wget https://mirror.openshift.com/pub/openshift-v4/clients/operator-sdk/latest/operator-sdk-linux-x86_64.tar.gz
tar zvxf operator-sdk-linux-x86_64.tar.gz
install operator-sdk /usr/local/bin/

operator-sdk init --plugins helm --help

mkdir -p /data/helm
cd /data/helm

# 初始化项目
operator-sdk init \
    --plugins=helm \
    --project-name nep-helm-operator \
    --domain=nep.com \
    --group=apps \
    --version=v1alpha1 \
    --kind=VBBU 

make bundle
# operator-sdk generate kustomize manifests -q

# Display name for the operator (required):
# > nep vBBU

# Description for the operator (required):
# > nep vRAN application including fpga driver, vCU, vDU

# Provider's name for the operator (required):
# > nep

# Any relevant URL for the provider name (optional):
# > na.nep.com

# Comma-separated list of keywords for your operator (required):
# > nep,vbbu,vran,vcu,vdu

# Comma-separated list of maintainers and their emails (e.g. 'name1:email1, name2:email2') (required):
# >
# No list provided.
# Comma-separated list of maintainers and their emails (e.g. 'name1:email1, name2:email2') (required):
# > wangzheng:wangzheng422@foxmail.com
# cd config/manager && /data/helm/bin/kustomize edit set image controller=quay.io/nep/nep-helm-operator:latest
# /data/helm/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 0.0.1
# INFO[0001] Creating bundle.Dockerfile
# INFO[0001] Creating bundle/metadata/annotations.yaml
# INFO[0001] Bundle metadata generated suceessfully
# operator-sdk bundle validate ./bundle
# INFO[0000] All validation tests have completed successfully

cd /data/helm/helm-charts/vbbu
helm lint

dnf install -y podman-docker

cd /data/helm/
make docker-build
# docker build -t quay.io/nep/nep-helm-operator:v01 .
# Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
# STEP 1/5: FROM registry.redhat.io/openshift4/ose-helm-operator:v4.9
# STEP 2/5: ENV HOME=/opt/helm
# --> 1eec2f9c094
# STEP 3/5: COPY watches.yaml ${HOME}/watches.yaml
# --> 1836589a08c
# STEP 4/5: COPY helm-charts  ${HOME}/helm-charts
# --> b6cd9f24e47
# STEP 5/5: WORKDIR ${HOME}
# COMMIT quay.io/nep/nep-helm-operator:v01
# --> 1f9bcc4cecc
# Successfully tagged quay.io/nep/nep-helm-operator:v01
# 1f9bcc4cecc55e68170e2a6f45dad7b318018df8bf3989bd990f567e3ccdfcd9

make docker-push
# docker push quay.io/nep/nep-helm-operator:v01
# Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
# Getting image source signatures
# Copying blob 8cd9b2cfbe06 skipped: already exists
# Copying blob 5bc03dec6239 skipped: already exists
# Copying blob 525ed45dbdb1 skipped: already exists
# Copying blob 758ace4ace74 skipped: already exists
# Copying blob deb6b0f93acd skipped: already exists
# Copying blob ac83cd3b61fd skipped: already exists
# Copying blob 12f964d7475b [--------------------------------------] 0.0b / 0.0b
# Copying config 1f9bcc4cec [--------------------------------------] 0.0b / 4.0KiB
# Writing manifest to image destination
# Copying config 1f9bcc4cec [--------------------------------------] 0.0b / 4.0KiB
# Writing manifest to image destination
# Storing signatures

make bundle-build BUNDLE_IMG=quay.io/nep/nep-helm-operator:bundle-v01
# docker build -f bundle.Dockerfile -t quay.io/nep/nep-helm-operator:bundle-v01 .
# Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
# STEP 1/14: FROM scratch
# STEP 2/14: LABEL operators.operatorframework.io.bundle.mediatype.v1=registry+v1
# --> Using cache b67edfbd23d6ba9c3f484a1e01f9da79fbffdc44e913423e2f616e477df372e1
# --> b67edfbd23d
# STEP 3/14: LABEL operators.operatorframework.io.bundle.manifests.v1=manifests/
# --> Using cache f2eef5180d3c9c63f40a98880ec95088b8395845e0f90960a194326d77a6f3b4
# --> f2eef5180d3
# STEP 4/14: LABEL operators.operatorframework.io.bundle.metadata.v1=metadata/
# --> Using cache 6fc10718a71e30d31cc652b47ac27ca87901ff4fda17a25e2d6bc53344e50673
# --> 6fc10718a71
# STEP 5/14: LABEL operators.operatorframework.io.bundle.package.v1=nep-helm-operator
# --> Using cache 6664d1d6c64c0954c18a432194845551e5a0c6f9bba33175d77c8791e2b0f6e0
# --> 6664d1d6c64
# STEP 6/14: LABEL operators.operatorframework.io.bundle.channels.v1=alpha
# --> Using cache 32878b9e903851bb51b6c0635c77112b4244f4ce7e9d8a7b0a0d8cf7fe7bbe0e
# --> 32878b9e903
# STEP 7/14: LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.10.1-ocp
# --> Using cache c5482c80a3287494a5f35ee8df782f4499ad6def2aaa55652e5fc57d4dfa8f0d
# --> c5482c80a32
# STEP 8/14: LABEL operators.operatorframework.io.metrics.mediatype.v1=metrics+v1
# --> Using cache 68822f2fae03c5efc8b980882f66e870d8942d80dbf697e3d784c46f95c50437
# --> 68822f2fae0
# STEP 9/14: LABEL operators.operatorframework.io.metrics.project_layout=helm.sdk.operatorframework.io/v1
# --> Using cache a85519d2774008b3071baf6098ec59561102ef1f337acd19b2c7ef739ebae89e
# --> a85519d2774
# STEP 10/14: LABEL operators.operatorframework.io.test.mediatype.v1=scorecard+v1
# --> Using cache 17a1b08e1dca2295f98e3288d592a08636d15d7461e25e11744a499160a1546c
# --> 17a1b08e1dc
# STEP 11/14: LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/
# --> Using cache 9b6a20b0ff75b501a321fe4fbdfd1d284763e65596dc85675f119e5e3de69657
# --> 9b6a20b0ff7
# STEP 12/14: COPY bundle/manifests /manifests/
# --> Using cache ff3aa5b299dae11f464d8ad56f4ae5130974e1cebd0cf273bc03aba11fcb7377
# --> ff3aa5b299d
# STEP 13/14: COPY bundle/metadata /metadata/
# --> Using cache 19395ef3259bbb4e1f5da9616195139698a3ef18e7f904a2a1cd7515cd9829f3
# --> 19395ef3259
# STEP 14/14: COPY bundle/tests/scorecard /tests/scorecard/
# --> Using cache 2268eb0a731f424f70e5b46222a1accd5344560ac9ab609ca3ccb5a4d0cd6669
# COMMIT quay.io/nep/nep-helm-operator:bundle-v01
# --> 2268eb0a731
# Successfully tagged quay.io/nep/nep-helm-operator:bundle-v01
# Successfully tagged quay.io/nep/nep-helm-operator-bundle:v0.0.1
# 2268eb0a731f424f70e5b46222a1accd5344560ac9ab609ca3ccb5a4d0cd6669


make bundle-push BUNDLE_IMG=quay.io/nep/nep-helm-operator:bundle-v01
# make docker-push IMG=quay.io/nep/nep-helm-operator:bundle-v01
# make[1]: Entering directory '/data/helm'
# docker push quay.io/nep/nep-helm-operator:bundle-v01
# Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
# Getting image source signatures
# Copying blob 24b54377030e skipped: already exists
# Copying blob 1929cd83db02 skipped: already exists
# Copying blob 44ef63131a17 [--------------------------------------] 0.0b / 0.0b
# Copying config 2268eb0a73 done
# Writing manifest to image destination
# Copying config 2268eb0a73 [--------------------------------------] 0.0b / 3.3KiB
# Writing manifest to image destination
# Storing signatures
# make[1]: Leaving directory '/data/helm'

make catalog-build CATALOG_IMG=quay.io/nep/nep-helm-operator:catalog-v01  BUNDLE_IMG=quay.io/nep/nep-helm-operator:bundle-v01 
# ./bin/opm index add --mode semver --tag quay.io/nep/nep-helm-operator:catalog-v01 --bundles quay.io/nep/nep-helm-operator:bundle-v01
# INFO[0000] building the index                            bundles="[quay.io/nep/nep-helm-operator:bundle-v01]"
# INFO[0000] resolved name: quay.io/nep/nep-helm-operator:bundle-v01
# INFO[0000] fetched                                       digest="sha256:1365e5913f05b733124a2a88c3113899db0c42f62b5758477577ef2117aff09f"
# INFO[0000] fetched                                       digest="sha256:be008c9c2b4f2c031b301174608accb8622c8d843aba2d1af4d053d8b00373c2"
# INFO[0000] fetched                                       digest="sha256:2268eb0a731f424f70e5b46222a1accd5344560ac9ab609ca3ccb5a4d0cd6669"
# INFO[0000] fetched                                       digest="sha256:d8e28b323fec2e4de5aecfb46c4ce3e315e20f49b78f43eb7a1d657798695655"
# INFO[0000] fetched                                       digest="sha256:c19ac761be31fa163ea3da95cb63fc0c2aaca3b316bfb049f6ee36f77522d323"
# INFO[0001] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:d8e28b323fec2e4de5aecfb46c4ce3e315e20f49b78f43eb7a1d657798695655 2985 [] map[] <nil>}
# INFO[0001] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:c19ac761be31fa163ea3da95cb63fc0c2aaca3b316bfb049f6ee36f77522d323 398 [] map[] <nil>}
# INFO[0001] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:be008c9c2b4f2c031b301174608accb8622c8d843aba2d1af4d053d8b00373c2 438 [] map[] <nil>}
# INFO[0001] Could not find optional dependencies file     dir=bundle_tmp582129875 file=bundle_tmp582129875/metadata load=annotations
# INFO[0001] found csv, loading bundle                     dir=bundle_tmp582129875 file=bundle_tmp582129875/manifests load=bundle
# INFO[0001] loading bundle file                           dir=bundle_tmp582129875/manifests file=apps.nep.com_vbbus.yaml load=bundle
# INFO[0001] loading bundle file                           dir=bundle_tmp582129875/manifests file=nep-helm-operator-controller-manager-metrics-service_v1_service.yaml load=bundle
# INFO[0001] loading bundle file                           dir=bundle_tmp582129875/manifests file=nep-helm-operator-manager-config_v1_configmap.yaml load=bundle
# INFO[0001] loading bundle file                           dir=bundle_tmp582129875/manifests file=nep-helm-operator-metrics-reader_rbac.authorization.k8s.io_v1_clusterrole.yaml load=bundle
# INFO[0001] loading bundle file                           dir=bundle_tmp582129875/manifests file=nep-helm-operator.clusterserviceversion.yaml load=bundle
# INFO[0001] Generating dockerfile                         bundles="[quay.io/nep/nep-helm-operator:bundle-v01]"
# INFO[0001] writing dockerfile: index.Dockerfile322782265  bundles="[quay.io/nep/nep-helm-operator:bundle-v01]"
# INFO[0001] running podman build                          bundles="[quay.io/nep/nep-helm-operator:bundle-v01]"
# INFO[0001] [podman build --format docker -f index.Dockerfile322782265 -t quay.io/nep/nep-helm-operator:catalog-v01 .]  bundles="[quay.io/nep/nep-helm-operator:bundle-v01]"

make catalog-push CATALOG_IMG=quay.io/nep/nep-helm-operator:catalog-v01
# make docker-push IMG=quay.io/nep/nep-helm-operator:catalog-v01
# make[1]: Entering directory '/data/helm'
# docker push quay.io/nep/nep-helm-operator:catalog-v01
# Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
# Getting image source signatures
# Copying blob 8a20ae5d4166 done
# Copying blob a98a386b6ec2 skipped: already exists
# Copying blob 4e7f383eb531 skipped: already exists
# Copying blob bc276c40b172 skipped: already exists
# Copying blob b15904f6a114 skipped: already exists
# Copying blob 86aadf4df7dc skipped: already exists
# Copying config 5d5d1c219c done
# Writing manifest to image destination
# Storing signatures
# make[1]: Leaving directory '/data/helm'

export OPERATOR_VERION=v04

make docker-build IMG=quay.io/nep/nep-helm-operator:$OPERATOR_VERION

make docker-push IMG=quay.io/nep/nep-helm-operator:$OPERATOR_VERION

make bundle IMG=quay.io/nep/nep-helm-operator:$OPERATOR_VERION

make bundle-build bundle-push catalog-build catalog-push \
    BUNDLE_IMG=quay.io/nep/nep-helm-operator:bundle-$OPERATOR_VERION \
    CATALOG_IMG=quay.io/nep/nep-helm-operator:catalog-$OPERATOR_VERION

# on openshift helper node
cat << EOF > /data/install/nep.catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: nep
  namespace: openshift-marketplace
spec:
  displayName: nep
  publisher: nep
  sourceType: grpc
  image: ghcr.io/wangzheng422/nep-helm-operator:catalog-2021-12-03-0504
  updateStrategy:
    registryPoll:
      interval: 10m
EOF
oc create -f /data/install/nep.catalog.yaml
# to restore
oc delete -f /data/install/nep.catalog.yaml

helm repository

https://medium.com/@mattiaperi/create-a-public-helm-chart-repository-with-github-pages-49b180dbb417

# try to build the repo, and add it into github action
# mkdir -p /data/helm/helm-repo
cd /data/helm/helm-repo

helm package ../helm-charts/*

helm repo index --url https://wangzheng422.github.io/nep-helm-operator/ .

# try to use the repo
helm repo add myhelmrepo https://wangzheng422.github.io/nep-helm-operator/

helm repo list
# NAME            URL
# myhelmrepo      https://wangzheng422.github.io/nep-helm-operator/

helm search repo vbbu
# NAME            CHART VERSION   APP VERSION     DESCRIPTION
# myhelmrepo/vbbu 0.1.0           1.16.0          A Helm chart for Kubernetes
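
repo 配好之后,也可以直接从里面装一个 release 验证一下(release 名和 namespace 只是示例):

helm install vbbu-demo myhelmrepo/vbbu -n vbbu-demo --create-namespace

helm list -n vbbu-demo

# 验证完清理掉
helm uninstall vbbu-demo -n vbbu-demo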

# for ocp, if you are disconnected
cat << EOF > /data/install/helm.nep.yaml
apiVersion: helm.openshift.io/v1beta1
kind: HelmChartRepository
metadata:
  name: nep-helm-charts-wzh
spec:
 # optional name that might be used by console
  name: nep-helm-charts-wzh
  connectionConfig:
    url: http://nexus.ocp4.redhat.ren:8082/repository/wangzheng422.github.io/
EOF
oc create -f /data/install/helm.nep.yaml
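
创建之后可以确认一下这个 CR(HelmChartRepository 是集群级资源),之后在 web console 的 Developer 视图里就应该能看到这个 repo 提供的 chart:

oc get helmchartrepository
# 能看到 nep-helm-charts-wzh 即可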

MetalLB layer2 mode on openshift 4.8

openshift对外提供服务,默认是router的方式,里面是一个haproxy,但是默认只是支持http/https,定制一下,可以支持tcp。这种配置方法不是很直观,特别是tcp的支持也很鸡肋。

我们已经知道metalLB可以帮助service暴露external IP,并且通过BGP的方式广播出去,但是在PoC的时候,BGP路由器还是比较难搞,好在metalLB还提供了layer2的方式,可以更简单地对外暴露external IP。

本次实验部署架构图:

安装 MetalLB

安装MetalLB非常简单

https://metallb.universe.tf/installation/clouds/#metallb-on-openshift-ocp


mkdir -p /data/install/metallb
cd /data/install/metallb

wget https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/namespace.yaml
wget https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/metallb.yaml

sed -i '/runAsUser: 65534/d' ./metallb.yaml

oc create -f /data/install/metallb/namespace.yaml
oc adm policy add-scc-to-user privileged -n metallb-system -z speaker
oc create -f /data/install/metallb/metallb.yaml

# to restore
oc delete -f /data/install/metallb/metallb.yaml

配置 MetalLB

# on helper
cat << EOF > /data/install/metal-bgp.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: my-ip-space
      protocol: layer2
      addresses:
        - 192.168.7.150-192.168.7.200
EOF
oc create -f /data/install/metal-bgp.yaml

# to restore
oc delete -f /data/install/metal-bgp.yaml

创建测试应用

# back to helper vm

cat << EOF > /data/install/demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: test-0
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'master-0'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
apiVersion: v1
kind: Pod
metadata:
  name: test-1
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'worker-0'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
kind: Service
apiVersion: v1
metadata:
  name: demo
spec:
  type: LoadBalancer
  ports:
    - name: "http"
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    env: test
EOF
oc create -f /data/install/demo.yaml

# to restore
oc delete -f /data/install/demo.yaml

oc get all
# NAME                         READY   STATUS              RESTARTS   AGE
# pod/mypod-787d79b456-4f4xr   1/1     Running             4          4d17h
# pod/test-0                   0/1     ContainerCreating   0          4s
# pod/test-1                   1/1     Running             0          4s

# NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP                            PORT(S)        AGE
# service/demo         LoadBalancer   172.30.178.14   192.168.7.150                          80:30781/TCP   4s
# service/kubernetes   ClusterIP      172.30.0.1      <none>                                 443/TCP        5d16h
# service/openshift    ExternalName   <none>          kubernetes.default.svc.cluster.local   <none>         5d16h

# NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/mypod   1/1     1            1           4d17h

# NAME                               DESIRED   CURRENT   READY   AGE
# replicaset.apps/mypod-787d79b456   1         1         1       4d17h

oc get pod -o wide
# NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
# mypod-787d79b456-4f4xr   1/1     Running   4          4d17h   10.254.1.19   worker-0   <none>           <none>
# test-0                   1/1     Running   0          9m36s   10.254.0.74   master-0   <none>           <none>
# test-1                   1/1     Running   0          9m36s   10.254.1.65   worker-0   <none>           <none>

oc get svc/demo -o yaml
# apiVersion: v1
# kind: Service
# metadata:
#   creationTimestamp: "2021-08-31T06:39:39Z"
#   name: demo
#   namespace: default
#   resourceVersion: "2277414"
#   uid: 6f36e7a4-ee2e-4f86-802e-6053debecfb2
# spec:
#   clusterIP: 172.30.178.14
#   clusterIPs:
#   - 172.30.178.14
#   externalTrafficPolicy: Cluster
#   ipFamilies:
#   - IPv4
#   ipFamilyPolicy: SingleStack
#   ports:
#   - name: http
#     nodePort: 30781
#     port: 80
#     protocol: TCP
#     targetPort: 80
#   selector:
#     env: test
#   sessionAffinity: None
#   type: LoadBalancer
# status:
#   loadBalancer:
#     ingress:
#     - ip: 192.168.7.150

for i in {1..10}
do
   curl 192.168.7.150 && echo
done
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.74
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.74
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.74
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.65

arp -a
# ? (10.88.0.3) at 9a:b9:62:83:0f:75 [ether] on cni-podman0
# master-2.ocp4.redhat.ren (192.168.7.15) at <incomplete> on enp1s0
# ? (10.88.0.2) at 4e:de:d9:d5:f8:f1 [ether] on cni-podman0
# master-1.ocp4.redhat.ren (192.168.7.14) at <incomplete> on enp1s0
# ? (192.168.7.150) at 52:54:00:d2:ba:43 [ether] on enp1s0
# worker-1.ocp4.redhat.ren (192.168.7.17) at <incomplete> on enp1s0
# _gateway (172.21.6.254) at 00:17:94:73:12:c2 [ether] on enp1s0
# master-0.ocp4.redhat.ren (192.168.7.13) at 52:54:00:d2:ba:43 [ether] on enp1s0
# worker-0.ocp4.redhat.ren (192.168.7.16) at 90:b1:1c:44:d6:0f [ether] on enp1s0
# bootstrap.ocp4.redhat.ren (192.168.7.12) at <incomplete> on enp1s0
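
注意上面 192.168.7.150 的 MAC 地址(52:54:00:d2:ba:43)和 master-0 完全一样,这正是 layer2 模式的工作方式:MetalLB 选出一个节点,由该节点上的 speaker 用自己的 MAC 应答这个 external IP 的 ARP 请求,外部流量先到达这个节点,再由 kube-proxy 的规则转发到各个后端 pod。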

到worker-0上,看看 nft 规则

# go to worker-0 to analyze the nat rules
nft list ruleset | grep 192.168.7.150
                # meta l4proto tcp ip daddr 192.168.7.150  tcp dport 80 counter packets 0 bytes 0 jump KUBE-FW-CTBMGJDNUDRWEDVR

nft list ruleset | grep KUBE-FW-CTBMGJDNUDRWEDVR -A 5
#                 meta l4proto tcp ip daddr 192.168.7.150  tcp dport 80 counter packets 0 bytes 0 jump KUBE-FW-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.35.8  tcp dport 80 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.35.8  tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-T3U64PSX3UGU57NF
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.152.93  tcp dport 80 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.152.93  tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-ZOXDBRX7A3I2MI4S
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.99.142  tcp dport 8443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
# --
#         chain KUBE-FW-CTBMGJDNUDRWEDVR {
#                  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                  counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                  counter packets 0 bytes 0 jump KUBE-MARK-DROP
#         }


nft list ruleset | grep KUBE-SVC-CTBMGJDNUDRWEDVR -A 3
#                 meta l4proto tcp ip daddr 172.30.178.14  tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp ip daddr 192.168.7.150  tcp dport 80 counter packets 0 bytes 0 jump KUBE-FW-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.35.8  tcp dport 80 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.35.8  tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-T3U64PSX3UGU57NF
# --
#                 meta l4proto tcp  tcp dport 30781 counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#         }

#         chain KUBE-SVC-HH47JV2DWEPNMQEX {
# --
#         chain KUBE-SVC-CTBMGJDNUDRWEDVR {
#                   counter packets 0 bytes 0 jump KUBE-SEP-CGMBWTJH33MIKSJY
#                  counter packets 0 bytes 0 jump KUBE-SEP-V5VBCVCJRZSWQ4D6
#         }
# --
#                  counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                  counter packets 0 bytes 0 jump KUBE-MARK-DROP
#         }

nft list ruleset | grep KUBE-SEP-CGMBWTJH33MIKSJY -A 3
#                   counter packets 0 bytes 0 jump KUBE-SEP-CGMBWTJH33MIKSJY
#                  counter packets 0 bytes 0 jump KUBE-SEP-V5VBCVCJRZSWQ4D6
#         }

# --
#         chain KUBE-SEP-CGMBWTJH33MIKSJY {
#                 ip saddr 10.254.0.74  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.254.0.74:80
#         }

nft list ruleset | grep KUBE-SEP-V5VBCVCJRZSWQ4D6 -A 3
#                  counter packets 0 bytes 0 jump KUBE-SEP-V5VBCVCJRZSWQ4D6
#         }

#         chain KUBE-FW-CTBMGJDNUDRWEDVR {
# --
#         chain KUBE-SEP-V5VBCVCJRZSWQ4D6 {
#                 ip saddr 10.254.1.65  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.254.1.65:80
#         }


nft --handle --numeric list ruleset | grep random
                #  counter packets 0 bytes 0 masquerade  random-fully  # handle 13

看看iptables的规则

iptables -L -v -n -t nat | grep 192.168.7.150
    # 0     0 KUBE-FW-CTBMGJDNUDRWEDVR  tcp  --  *      *       0.0.0.0/0            192.168.7.150        /* default/demo:http loadbalancer IP */ tcp dpt:80

iptables -L -v -n -t nat | grep KUBE-FW-CTBMGJDNUDRWEDVR -A 5
#     0     0 KUBE-FW-CTBMGJDNUDRWEDVR  tcp  --  *      *       0.0.0.0/0            192.168.7.150        /* default/demo:http loadbalancer IP */ tcp dpt:80
#     0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.254.0.0/16        172.30.210.66        /* openshift-kube-scheduler-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-SVC-HH47JV2DWEPNMQEX  tcp  --  *      *       0.0.0.0/0            172.30.210.66        /* openshift-kube-scheduler-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.254.0.0/16        172.30.55.237        /* openshift-apiserver-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-SVC-CIUYVLZDADCHPTYT  tcp  --  *      *       0.0.0.0/0            172.30.55.237        /* openshift-apiserver-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.254.0.0/16        172.30.134.31        /* openshift-pipelines/tekton-pipelines-controller:probes cluster IP */ tcp dpt:8080
# --
# Chain KUBE-FW-CTBMGJDNUDRWEDVR (1 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 KUBE-MARK-MASQ  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http loadbalancer IP */
#     0     0 KUBE-SVC-CTBMGJDNUDRWEDVR  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http loadbalancer IP */
#     0     0 KUBE-MARK-DROP  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http loadbalancer IP */

iptables -L -v -n -t nat | grep KUBE-SVC-CTBMGJDNUDRWEDVR -A 4
#     0     0 KUBE-SVC-CTBMGJDNUDRWEDVR  tcp  --  *      *       0.0.0.0/0            172.30.178.14        /* default/demo:http cluster IP */ tcp dpt:80
#     0     0 KUBE-FW-CTBMGJDNUDRWEDVR  tcp  --  *      *       0.0.0.0/0            192.168.7.150        /* default/demo:http loadbalancer IP */ tcp dpt:80
#     0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.254.0.0/16        172.30.210.66        /* openshift-kube-scheduler-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-SVC-HH47JV2DWEPNMQEX  tcp  --  *      *       0.0.0.0/0            172.30.210.66        /* openshift-kube-scheduler-operator/metrics:https cluster IP */ tcp dpt:443
#     0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.254.0.0/16        172.30.55.237        /* openshift-apiserver-operator/metrics:https cluster IP */ tcp dpt:443
# --
#     0     0 KUBE-SVC-CTBMGJDNUDRWEDVR  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */ tcp dpt:30781

# Chain KUBE-SVC-HH47JV2DWEPNMQEX (1 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 KUBE-SEP-XIWZUKNCQE6LJCFA  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* openshift-kube-scheduler-operator/metrics:https */
# --
# Chain KUBE-SVC-CTBMGJDNUDRWEDVR (3 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 KUBE-SEP-CGMBWTJH33MIKSJY  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */ statistic mode random probability 0.50000000000
#     0     0 KUBE-SEP-V5VBCVCJRZSWQ4D6  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */

# --
#     0     0 KUBE-SVC-CTBMGJDNUDRWEDVR  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http loadbalancer IP */
#     0     0 KUBE-MARK-DROP  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http loadbalancer IP */

# Chain KUBE-SEP-V5VBCVCJRZSWQ4D6 (1 references)
#  pkts bytes target     prot opt in     out     source               destination

iptables -L -v -n -t nat | grep KUBE-SEP-CGMBWTJH33MIKSJY -A 3
#     0     0 KUBE-SEP-CGMBWTJH33MIKSJY  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */ statistic mode random probability 0.50000000000
#     0     0 KUBE-SEP-V5VBCVCJRZSWQ4D6  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */

# Chain KUBE-FW-CTBMGJDNUDRWEDVR (1 references)
# --
# Chain KUBE-SEP-CGMBWTJH33MIKSJY (1 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 KUBE-MARK-MASQ  all  --  *      *       10.254.0.74          0.0.0.0/0            /* default/demo:http */
#     0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */ tcp to:10.254.0.74:80

iptables -L -v -n -t nat | grep KUBE-SEP-V5VBCVCJRZSWQ4D6 -A 3
#     0     0 KUBE-SEP-V5VBCVCJRZSWQ4D6  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */

# Chain KUBE-FW-CTBMGJDNUDRWEDVR (1 references)
#  pkts bytes target     prot opt in     out     source               destination
# --
# Chain KUBE-SEP-V5VBCVCJRZSWQ4D6 (1 references)
#  pkts bytes target     prot opt in     out     source               destination
#     0     0 KUBE-MARK-MASQ  all  --  *      *       10.254.1.65          0.0.0.0/0            /* default/demo:http */
#     0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/demo:http */ tcp to:10.254.1.65:80


MetalLB BGP mode on openshift 4.8

openshift对外提供服务,默认是router的方式,里面是一个haproxy,但是默认只是支持http/https,定制一下,可以支持tcp。这种配置方法不是很直观,特别是tcp的支持也很鸡肋。我们希望的方式,是k8s service直接暴露一个对外服务ip,并且通过bgp广播出去。今天,我们就看看metalLB项目如何帮助我们达到这个目的。

本次实验部署架构图:

视频讲解:

安装 MetalLB

安装MetalLB非常简单

https://metallb.universe.tf/installation/clouds/#metallb-on-openshift-ocp


mkdir -p /data/install/metallb
cd /data/install/metallb

wget https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/namespace.yaml
wget https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/metallb.yaml

sed -i '/runAsUser: 65534/d' ./metallb.yaml

oc create -f namespace.yaml
oc adm policy add-scc-to-user privileged -n metallb-system -z speaker
oc create -f metallb.yaml

创建路由器

我们用一个 kvm 来模拟 bgp 路由器

  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/setting-your-routing-protocols_configuring-and-managing-networking#intro-to-frr_setting-your-routing-protocols
  • https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/configuration/xe-16/irg-xe-16-book/bgp-dynamic-neighbors.html
  • https://ipbgp.com/2018/02/07/quagga/
  • https://docs.frrouting.org/en/latest/bgp.html
# to setup a router vm for testing
# go to kvm host
cd /data/kvm

wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.8/scripts/helper-ks-rocky.cfg

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=enp1s0 --gateway=172.21.6.254 --ip=172.21.6.10  --netmask=255.255.255.0 --nameserver=172.21.1.1  --ipv6=auto --activate/' helper-ks-rocky.cfg

sed -i '0,/^network  --hostname.*/s/^network  --hostname.*/network  --hostname=bgp-router/' helper-ks-rocky.cfg

virt-install --name="bgp-router" --vcpus=2 --ram=2048 \
--cpu=host-model \
--disk path=/data/nvme/bgp-router.qcow2,bus=virtio,size=30 \
--os-variant rhel8.4 --network bridge=baremetal,model=virtio \
--graphics vnc,port=49000 \
--boot menu=on --location /data/kvm/Rocky-8.4-x86_64-minimal.iso \
--initrd-inject helper-ks-rocky.cfg --extra-args "inst.ks=file:/helper-ks-rocky.cfg" 

# in the bgp-router vm
nmcli con mod enp1s0 +ipv4.addresses "192.168.7.10/24"
nmcli con up enp1s0

systemctl disable --now firewalld

dnf install -y frr

sed -i 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
systemctl enable --now frr

# 进入路由器配置界面
vtysh
# 以下是 bgp 路由器配置
router bgp 64512
 neighbor metallb peer-group
 neighbor metallb remote-as 64512
 bgp listen limit 200
 bgp listen range 192.168.7.0/24 peer-group metallb
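
补充一点(非原文内容):vtysh 进来之后默认在查看模式,上面的 router bgp 配置通常要先执行 configure terminal 进入配置模式才能输入;配置完成后建议保存,防止 frr 重启后丢失。完整的一遍操作大致如下:

configure terminal
router bgp 64512
 neighbor metallb peer-group
 neighbor metallb remote-as 64512
 bgp listen limit 200
 bgp listen range 192.168.7.0/24 peer-group metallb
end
write memory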

配置 MetalLB 和 bgp-router 进行配对

# on helper
cat << EOF > /data/install/metal-bgp.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - my-asn: 64512
      peer-asn: 64512
      peer-address: 192.168.7.10
    address-pools:
    - name: my-ip-space
      protocol: bgp
      avoid-buggy-ips: true
      addresses:
      - 198.51.100.0/24
EOF
oc create -f /data/install/metal-bgp.yaml

# to restore
oc delete -f /data/install/metal-bgp.yaml

回到 bgp-router 看看路由情况

# back to bgp-router vm
vtysh

bgp-router# show ip bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.7.10, local AS number 64512 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 43 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
*192.168.7.13   4      64512         2         2        0    0    0 00:00:25            0        0
*192.168.7.16   4      64512         2         2        0    0    0 00:00:25            0        0

Total number of neighbors 2
* - dynamic neighbor
2 dynamic neighbor(s), limit 200

我们看到,集群里面的2个node,分别和路由器建立了peer关系。

创建测试应用

# back to helper vm

cat << EOF > /data/install/demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: test-0
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'master-0'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
apiVersion: v1
kind: Pod
metadata:
  name: test-1
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'worker-0'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
kind: Service
apiVersion: v1
metadata:
  name: demo
spec:
  type: LoadBalancer
  ports:
    - name: "http"
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    env: test
EOF
oc create -f /data/install/demo.yaml

# to restore
oc delete -f /data/install/demo.yaml

oc get all
# NAME                         READY   STATUS    RESTARTS   AGE
# pod/mypod-787d79b456-4f4xr   1/1     Running   3          3d23h
# pod/test-0                   1/1     Running   0          2m28s
# pod/test-1                   1/1     Running   0          2m28s

# NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP                            PORT(S)        AGE
# service/demo         LoadBalancer   172.30.82.87   198.51.100.1                           80:32203/TCP   2m28s
# service/kubernetes   ClusterIP      172.30.0.1     <none>                                 443/TCP        4d22h
# service/openshift    ExternalName   <none>         kubernetes.default.svc.cluster.local   <none>         4d22h

# NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/mypod   1/1     1            1           3d23h

# NAME                               DESIRED   CURRENT   READY   AGE
# replicaset.apps/mypod-787d79b456   1         1         1       3d23h

oc get pod -o wide
# NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
# mypod-787d79b456-4f4xr   1/1     Running   3          4d      10.254.1.2     worker-0   <none>           <none>
# test-0                   1/1     Running   0          8m38s   10.254.0.66    master-0   <none>           <none>
# test-1                   1/1     Running   0          8m38s   10.254.1.230   worker-0   <none>           <none>

oc get svc/demo -o yaml
# apiVersion: v1
# kind: Service
# metadata:
#   creationTimestamp: "2021-08-30T12:42:21Z"
#   name: demo
#   namespace: default
#   resourceVersion: "2046159"
#   uid: 1af07435-5234-4062-994d-4715453118c6
# spec:
#   clusterIP: 172.30.82.87
#   clusterIPs:
#   - 172.30.82.87
#   externalTrafficPolicy: Cluster
#   ipFamilies:
#   - IPv4
#   ipFamilyPolicy: SingleStack
#   ports:
#   - name: http
#     nodePort: 32203
#     port: 80
#     protocol: TCP
#     targetPort: 80
#   selector:
#     env: test
#   sessionAffinity: None
#   type: LoadBalancer
# status:
#   loadBalancer:
#     ingress:
#     - ip: 198.51.100.1

回到 bgp-router 看看路由更新情况

# back to bgp-router

bgp-router# show ip bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.7.10, local AS number 64512 vrf-id 0
BGP table version 1
RIB entries 1, using 192 bytes of memory
Peers 2, using 43 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
*192.168.7.13   4      64512        73        72        0    0    0 00:35:16            1        0
*192.168.7.16   4      64512        73        72        0    0    0 00:35:16            1        0

Total number of neighbors 2
* - dynamic neighbor
2 dynamic neighbor(s), limit 200

bgp-router# show ip bgp neighbors 192.168.7.13 routes
BGP table version is 1, local router ID is 192.168.7.10, vrf id 0
Default local pref 100, local AS 64512
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i198.51.100.1/32  192.168.7.13                    0      0 ?

Displayed  1 routes and 2 total paths
bgp-router#
bgp-router# show ip bgp neighbors 192.168.7.16 routes
BGP table version is 1, local router ID is 192.168.7.10, vrf id 0
Default local pref 100, local AS 64512
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*=i198.51.100.1/32  192.168.7.16                    0      0 ?

Displayed  1 routes and 2 total paths

在路由器的shell界面上看看

ip r
# default via 172.21.6.254 dev enp1s0 proto static metric 100
# 172.21.6.0/24 dev enp1s0 proto kernel scope link src 172.21.6.10 metric 100
# 192.168.7.0/24 dev enp1s0 proto kernel scope link src 192.168.7.10 metric 100
# 198.51.100.1 proto bgp metric 20
#         nexthop via 192.168.7.13 dev enp1s0 weight 1
#         nexthop via 192.168.7.16 dev enp1s0 weight 1

[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.230
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.1.230
[root@bgp-router ~]# curl 198.51.100.1 && echo
Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>10.254.0.66

到worker-0上,看看 nft 规则

# go to worker-0 to analyze the nat rules
nft list ruleset | grep 198.51
                # meta l4proto tcp ip daddr 198.51.100.1  tcp dport 80 counter packets 0 bytes 0 jump KUBE-FW-CTBMGJDNUDRWEDVR

nft list ruleset | grep KUBE-FW-CTBMGJDNUDRWEDVR -A 5
#                 meta l4proto tcp ip daddr 198.51.100.1  tcp dport 80 counter packets 0 bytes 0 jump KUBE-FW-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.145.124  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.145.124  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-L54HVQEJKTL2PXFK
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.16.253  tcp dport 8443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.16.253  tcp dport 8443 counter packets 0 bytes 0 jump KUBE-SVC-YVQ2VVJT4ABSS56R
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.185.119  tcp dport 9091 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
# --
#         chain KUBE-FW-CTBMGJDNUDRWEDVR {
#                  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                  counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                  counter packets 0 bytes 0 jump KUBE-MARK-DROP
#         }


nft list ruleset | grep KUBE-SVC-CTBMGJDNUDRWEDVR -A 3
#                 meta l4proto tcp ip daddr 172.30.82.87  tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp ip daddr 198.51.100.1  tcp dport 80 counter packets 11 bytes 660 jump KUBE-FW-CTBMGJDNUDRWEDVR
#                 meta l4proto tcp @nh,96,16 != 2814 ip daddr 172.30.145.124  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp ip daddr 172.30.145.124  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-L54HVQEJKTL2PXFK
# --
#                 meta l4proto tcp  tcp dport 32203 counter packets 0 bytes 0 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#         }

#         chain KUBE-SVC-DCLNKYLNAMROIJRV {
# --
#         chain KUBE-SVC-CTBMGJDNUDRWEDVR {
#                   counter packets 9 bytes 540 jump KUBE-SEP-BKD3LMWAJNKW5GNU
#                  counter packets 2 bytes 120 jump KUBE-SEP-M5WVBCWAFJ2J2M2U
#         }
# --
#                  counter packets 11 bytes 660 jump KUBE-SVC-CTBMGJDNUDRWEDVR
#                  counter packets 0 bytes 0 jump KUBE-MARK-DROP
#         }

nft list ruleset | grep KUBE-SEP-BKD3LMWAJNKW5GNU -A 3
#                   counter packets 9 bytes 540 jump KUBE-SEP-BKD3LMWAJNKW5GNU
#                  counter packets 2 bytes 120 jump KUBE-SEP-M5WVBCWAFJ2J2M2U
#         }

# --
#         chain KUBE-SEP-BKD3LMWAJNKW5GNU {
#                 ip saddr 10.254.0.66  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp   counter packets 9 bytes 540 dnat to 10.254.0.66:80
#         }

nft list ruleset | grep KUBE-SEP-M5WVBCWAFJ2J2M2U -A 3
#                  counter packets 2 bytes 120 jump KUBE-SEP-M5WVBCWAFJ2J2M2U
#         }

#         chain KUBE-FW-CTBMGJDNUDRWEDVR {
# --
#         chain KUBE-SEP-M5WVBCWAFJ2J2M2U {
#                 ip saddr 10.254.1.230  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
#                 meta l4proto tcp   counter packets 2 bytes 120 dnat to 10.254.1.230:80
#         }

Kata / sandbox container in openshift 4.8

红帽 openshift 4.8 容器平台最新支持了kata,或者叫沙盒容器。它是在物理机上先启动一个轻量级vm,再在vm里面运行容器进程的技术,初衷是进一步提高安全性,消除用户对容器逃逸问题的顾虑。虽然目前还是TP阶段,但是已经可以一探究竟啦。

https://docs.openshift.com/container-platform/4.8/sandboxed_containers/understanding-sandboxed-containers.html

视频讲解:

首先我们来安装它,在operator hub里面选择sandbox container,点击安装。

然后在operator里面创建一个kata config,默认就可以,现在是TP阶段,也没什么花活。

创建好了以后,kata operator就会在系统里面创建一些配置,我们来一个一个看一下。

# 首先是runtime class,这个是指出了pod可以使用kata作为runtime, 
# 注意里面的overhead,这个配置的意思是,kata用qemu作为虚拟机,所以会有一些额外的消耗,
# 这些消耗在scheduling的时候需要计算进去,这里就把这个计算量静态地配置好了。
# 虽然我觉得这样不太灵活,但是目前就是这样的。
oc get runtimeclass/kata -o yaml
# apiVersion: node.k8s.io/v1
# handler: kata
# kind: RuntimeClass
# metadata:
#   name: kata
# overhead:
#   podFixed:
#     cpu: 250m
#     memory: 350Mi
# scheduling:
#   nodeSelector:
#     node-role.kubernetes.io/worker: ""
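
拿上面的数值算一笔账:一个申请了 500m CPU / 1Gi 内存的 kata pod,调度时实际会按 500m+250m=750m CPU、1024Mi+350Mi=1374Mi 内存来计算占用,节点资源紧张时要把这部分 overhead 也考虑进去。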

# ocp会把kata通过machine config的方式,配置到节点里面去
oc get mc
# NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
# 00-master                                          723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 00-worker                                          723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 01-master-container-runtime                        723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 01-master-kubelet                                  723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 01-worker-container-runtime                        723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 01-worker-kubelet                                  723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 50-enable-sandboxed-containers-extension                                                      3.2.0             51m
# 99-master-chrony-configuration                                                                2.2.0             15h
# 99-master-container-registries                                                                3.1.0             15h
# 99-master-generated-registries                     723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 99-master-ssh                                                                                 3.2.0             15h
# 99-worker-chrony-configuration                                                                2.2.0             15h
# 99-worker-container-registries                                                                3.1.0             15h
# 99-worker-generated-registries                     723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# 99-worker-ssh                                                                                 3.2.0             15h
# rendered-master-8c1e34a69aa4b919b6f2eec350570491   723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h
# rendered-worker-4afd90ddf39588aae385def4519e8da9   723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             51m
# rendered-worker-5abff4814eef2f9bc7535e5cbb10564c   723a8a4992f42530af95202e51e5a940d2a3d169   3.2.0             15h

# 那这个machine config里面是什么呢?我们看一看
# 原来是加了一个extension, 
# 经过查看源代码,这个sandboxed-containers extension就是对应了kata-containers rpm
oc get mc/50-enable-sandboxed-containers-extension -o yaml
# apiVersion: machineconfiguration.openshift.io/v1
# kind: MachineConfig
# metadata:
#   labels:
#     app: example-kataconfig
#     machineconfiguration.openshift.io/role: worker
#   name: 50-enable-sandboxed-containers-extension
# spec:
#   config:
#     ignition:
#       version: 3.2.0
#   extensions:
#   - sandboxed-containers

# 我们到worker-0上看看,发现确实是安装了一个新的kata-containers rpm
rpm-ostree status
# State: idle
# Deployments:
# ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6ddc94ab09a4807ea3d1f29a922fe15f0b4ee863529258c486a04e7fb7b95a4b
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 48.84.202108161759-0 (2021-08-16T18:03:02Z)
#            LayeredPackages: kata-containers

#   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6ddc94ab09a4807ea3d1f29a922fe15f0b4ee863529258c486a04e7fb7b95a4b
#               CustomOrigin: Managed by machine-config-operator
#                    Version: 48.84.202108161759-0 (2021-08-16T18:03:02Z)

# 我们看看这个kata-containers rpm里面都提供了什么文件
rpm -ql kata-containers
# /etc/crio/crio.conf.d/50-kata
# /usr/bin/containerd-shim-kata-v2
# /usr/bin/kata-collect-data.sh
# /usr/bin/kata-monitor
# /usr/bin/kata-runtime
# /usr/lib/.build-id
# /usr/lib/.build-id/0f
# /usr/lib/.build-id/0f/dc6751937c4b54a2e10ed431f7969bfd85d2d7
# /usr/lib/.build-id/5e
# /usr/lib/.build-id/5e/ad1e1eca5ab8111a23bf094caf6acbd3b9d7af
# /usr/lib/.build-id/67
# /usr/lib/.build-id/67/e5107c68c0e147f24f6e8f4e96104564b8f223
# /usr/lib/.build-id/be
# /usr/lib/.build-id/be/0add7df48b5f06a305e95497355666a1e04e39
# /usr/lib/systemd/system/kata-osbuilder-generate.service
# /usr/libexec/kata-containers
# /usr/libexec/kata-containers/VERSION
# /usr/libexec/kata-containers/agent
# /usr/libexec/kata-containers/agent/usr
# /usr/libexec/kata-containers/agent/usr/bin
# /usr/libexec/kata-containers/agent/usr/bin/kata-agent
# /usr/libexec/kata-containers/agent/usr/lib
# /usr/libexec/kata-containers/agent/usr/lib/systemd
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system/kata-agent.service
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system/kata-containers.target
# /usr/libexec/kata-containers/kata-netmon
# /usr/libexec/kata-containers/osbuilder
# /usr/libexec/kata-containers/osbuilder/dracut
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d/05-base.conf
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d/15-dracut-rhel.conf
# /usr/libexec/kata-containers/osbuilder/initrd-builder
# /usr/libexec/kata-containers/osbuilder/initrd-builder/README.md
# /usr/libexec/kata-containers/osbuilder/initrd-builder/initrd_builder.sh
# /usr/libexec/kata-containers/osbuilder/kata-osbuilder.sh
# /usr/libexec/kata-containers/osbuilder/nsdax
# /usr/libexec/kata-containers/osbuilder/rootfs-builder
# /usr/libexec/kata-containers/osbuilder/rootfs-builder/README.md
# /usr/libexec/kata-containers/osbuilder/rootfs-builder/rootfs.sh
# /usr/libexec/kata-containers/osbuilder/scripts
# /usr/libexec/kata-containers/osbuilder/scripts/lib.sh
# /usr/share/bash-completion/completions/kata-runtime
# /usr/share/doc/kata-containers
# /usr/share/doc/kata-containers/CONTRIBUTING.md
# /usr/share/doc/kata-containers/README.md
# /usr/share/kata-containers
# /usr/share/kata-containers/defaults
# /usr/share/kata-containers/defaults/configuration.toml
# /usr/share/licenses/kata-containers
# /usr/share/licenses/kata-containers/LICENSE
# /var/cache/kata-containers

# 我们看看kata-containers 使用的虚拟机镜像
ls -Rl /var/cache/kata-containers
# /var/cache/kata-containers:
# total 0
# lrwxrwxrwx. 1 root root 121 Aug 26 05:22 kata-containers-initrd.img -> '/var/cache/kata-containers/osbuilder-images/4.18.0-305.12.1.el8_4.x86_64/"rhcos"-kata-4.18.0-305.12.1.el8_4.x86_64.initrd'
# drwxr-xr-x. 3 root root  42 Aug 26 05:22 osbuilder-images
# lrwxrwxrwx. 1 root root  50 Aug 26 05:22 vmlinuz.container -> /lib/modules/4.18.0-305.12.1.el8_4.x86_64//vmlinuz

# /var/cache/kata-containers/osbuilder-images:
# total 0
# drwxr-xr-x. 2 root root 62 Aug 26 05:22 4.18.0-305.12.1.el8_4.x86_64

# /var/cache/kata-containers/osbuilder-images/4.18.0-305.12.1.el8_4.x86_64:
# total 19224
# -rw-r--r--. 1 root root 19682871 Aug 26 05:22 '"rhcos"-kata-4.18.0-305.12.1.el8_4.x86_64.initrd'

# now let's look at where kata plugs into cri-o: a drop-in under crio's configuration directory
cat /etc/crio/crio.conf.d/50-kata
# [crio.runtime.runtimes.kata]
#   runtime_path = "/usr/bin/containerd-shim-kata-v2"
#   runtime_type = "vm"
#   runtime_root = "/run/vc"
#   privileged_without_host_devices = true
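
This crio handler is what the runtimeClassName: kata used further below resolves to; on the kubernetes side it is exposed through a RuntimeClass object. The real one is created by the sandboxed containers operator, so the manifest below is only an illustrative sketch:

oc get runtimeclass kata -o yaml
# (not captured output, just a sketch) a hand-written equivalent would look roughly like:
# apiVersion: node.k8s.io/v1
# kind: RuntimeClass
# metadata:
#   name: kata
# handler: kata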

# 我们能看到,系统启动的时候,会根据当前操作系统,编译一个kata使用的虚拟机镜像。
# 后面如果项目上有需要,可以在这个步骤上,做定制,做一个客户需要的虚拟机镜像。
systemctl cat kata-osbuilder-generate.service
# # /usr/lib/systemd/system/kata-osbuilder-generate.service
# [Unit]
# Description=Generate Kata appliance image for host kernel

# [Service]
# Type=oneshot
# ExecStart=/usr/libexec/kata-containers/osbuilder/kata-osbuilder.sh -c
# ExecReload=/usr/libexec/kata-containers/osbuilder/kata-osbuilder.sh

# [Install]
# WantedBy=kubelet.service
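
A quick way to see what actually ended up in the generated guest initrd is dracut's lsinitrd (it should be available on the node, since dracut is a dependency of the kata-containers rpm); a sketch:

lsinitrd /var/cache/kata-containers/kata-containers-initrd.img | head -n 40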

# 我们来搞一个pod,测试一下。
cat << EOF > /data/install/kata.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mypod
  labels:
    app: mypod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mypod
  template:
    metadata:
      labels:
        app: mypod
    spec:
      runtimeClassName: kata
      containers:
      - name: mypod
        image: quay.io/wangzheng422/qimgs:centos7-test
        command:
          - sleep
          - infinity
EOF
oc create -f /data/install/kata.yaml

# to restore
oc delete -f /data/install/kata.yaml
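
A quick sanity check (sketch) that the pod is really scheduled with the kata runtime class, and on which node it landed:

oc get pod -l app=mypod -o wide
oc get pod -l app=mypod -o jsonpath='{.items[0].spec.runtimeClassName}{"\n"}'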

# 到worker-0上,可以看到qemu进程。
ps aufx ww | grep qemu
# root       99994  0.0  0.0  12816  1076 pts/0    S+   06:22   0:00                      \_ grep --color=auto qemu
# root       93561  1.3  0.9 2466300 326724 ?      Sl   06:19   0:03 /usr/libexec/qemu-kiwi -name sandbox-42f003b365352a71ab87e8a1f49b1c301b6c3c856ec5520b4986aa8b9e43151f -uuid 1cd86e5c-3f86-45e8-bce2-96b16dce635a -machine q35,accel=kvm,kernel_irqchip -cpu host,pmu=off -qmp unix:/run/vc/vm/42f003b365352a71ab87e8a1f49b1c301b6c3c856ec5520b4986aa8b9e43151f/qmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=33122M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/42f003b365352a71ab87e8a1f49b1c301b6c3c856ec5520b4986aa8b9e43151f/console.sock,server=on,wait=off -device virtio-scsi-pci,id=scsi0,disable-modern=false -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-976011602,guest-cid=976011602 -chardev socket,id=char-b4b86634faff36bb,path=/run/vc/vm/42f003b365352a71ab87e8a1f49b1c301b6c3c856ec5520b4986aa8b9e43151f/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-b4b86634faff36bb,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=0a:58:0a:fe:01:1a,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/lib/modules/4.18.0-305.12.1.el8_4.x86_64/vmlinuz -initrd /var/cache/kata-containers/osbuilder-images/4.18.0-305.12.1.el8_4.x86_64/"rhcos"-kata-4.18.0-305.12.1.el8_4.x86_64.initrd -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 quiet panic=1 nr_cpus=24 scsi_mod.scan=none -pidfile /run/vc/vm/42f003b365352a71ab87e8a1f49b1c301b6c3c856ec5520b4986aa8b9e43151f/pid -smp 1,cores=1,threads=1,sockets=24,maxcpus=24

# 我们很好奇kata的详细配置,那么我们看看kata的配置文件在哪里
kata-runtime --show-default-config-paths
# /etc/kata-containers/configuration.toml
# /usr/share/kata-containers/defaults/configuration.toml

# 我们看看kata的配置文件内容
cat /usr/share/kata-containers/defaults/configuration.toml

result check here

# 我们看看kata runtime感知到的配置内容
kata-runtime env
# [Meta]
#   Version = "1.0.25"

# [Runtime]
#   Debug = false
#   Trace = false
#   DisableGuestSeccomp = true
#   DisableNewNetNs = false
#   SandboxCgroupOnly = true
#   Path = "/usr/bin/kata-runtime"
#   [Runtime.Version]
#     OCI = "1.0.1-dev"
#     [Runtime.Version.Version]
#       Semver = "2.1.0"
#       Major = 2
#       Minor = 1
#       Patch = 0
#       Commit = "fa7b9408555e863d0f36f7d0640134069b0c70c8"
#   [Runtime.Config]
#     Path = "/usr/share/kata-containers/defaults/configuration.toml"

# [Hypervisor]
#   MachineType = "q35"
#   Version = "QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+11536+725e25d9.2)\nCopyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers"
#   Path = "/usr/libexec/qemu-kiwi"
#   BlockDeviceDriver = "virtio-scsi"
#   EntropySource = "/dev/urandom"
#   SharedFS = "virtio-fs"
#   VirtioFSDaemon = "/usr/libexec/virtiofsd"
#   Msize9p = 8192
#   MemorySlots = 10
#   PCIeRootPort = 0
#   HotplugVFIOOnRootBus = false
#   Debug = false

# [Image]
#   Path = ""

# [Kernel]
#   Path = "/usr/lib/modules/4.18.0-305.12.1.el8_4.x86_64/vmlinuz"
#   Parameters = "scsi_mod.scan=none"

# [Initrd]
#   Path = "/var/cache/kata-containers/osbuilder-images/4.18.0-305.12.1.el8_4.x86_64/\"rhcos\"-kata-4.18.0-305.12.1.el8_4.x86_64.initrd"

# [Agent]
#   Debug = false
#   Trace = false
#   TraceMode = ""
#   TraceType = ""

# [Host]
#   Kernel = "4.18.0-305.12.1.el8_4.x86_64"
#   Architecture = "amd64"
#   VMContainerCapable = true
#   SupportVSocks = true
#   [Host.Distro]
#     Name = "Red Hat Enterprise Linux CoreOS"
#     Version = "4.8"
#   [Host.CPU]
#     Vendor = "GenuineIntel"
#     Model = "Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz"
#     CPUs = 24
#   [Host.Memory]
#     Total = 32868716
#     Free = 27704960
#     Available = 29880404

# [Netmon]
#   Path = "/usr/libexec/kata-containers/kata-netmon"
#   Debug = false
#   Enable = false
#   [Netmon.Version]
#     Semver = "2.1.0"
#     Major = 2
#     Minor = 1
#     Patch = 0
#     Commit = "<<unknown>>"
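
The sizing seen in the qemu command line earlier (-m 2048M, a single vcpu) normally comes from the runtime configuration; a quick way to check, with key names as found in upstream kata 2.x configuration.toml:

grep -E '^[[:space:]]*(default_vcpus|default_maxvcpus|default_memory|memory_slots)[[:space:]]*=' /usr/share/kata-containers/defaults/configuration.toml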

# 我们看看这个构建kata虚拟机镜像的脚本
cat /usr/libexec/kata-containers/osbuilder/kata-osbuilder.sh

result check here

try to debug

# try to debug
# 为了能进入到kata虚拟机内部,我们需要修改一下kata的配置文件,激活debug console
mkdir -p /etc/kata-containers/
install -o root -g root -m 0640 /usr/share/kata-containers/defaults/configuration.toml /etc/kata-containers
sed -i -e 's/^# *\(debug_console_enabled\).*=.*$/\1 = true/g' /etc/kata-containers/configuration.toml

# 然后重启pod,我们就能直接连进去kata虚拟机了。
# ps -ef | grep qemu-kiwi | sed 's/.* sandbox-\([^ ]*\) .*/\1/p' | grep -v qemu-kiwi
# note: what this actually captures is the sandbox ID (not a pid), which is exactly what kata-runtime exec expects
KATA_SANDBOX_ID=`ps -ef | grep qemu-kiwi | sed 's/.* sandbox-\([^ ]*\) .*/\1/g' | grep -v qemu-kiwi`
kata-runtime exec $KATA_SANDBOX_ID

in the kata vm

# 虚拟机里面,是个超级简化的系统,命令奇缺
bash-4.4# cd /etc

# ls都没有,只能echo * 代替。
bash-4.4# echo *
chrony.conf cmdline.d conf.d group ld.so.cache ld.so.conf ld.so.conf.d machine-id modules-load.d passwd resolv.conf systemd udev virc

# 可以看到,操作系统和宿主机一样,因为启动的时候,用宿主机的内核构建出来的
bash-4.4# uname -a
Linux mypod-787d79b456-4f4xr 4.18.0-305.12.1.el8_4.x86_64 #1 SMP Mon Jul 26 08:06:24 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

# 看看激活了什么内核模块
bash-4.4# lsmod
Module                  Size  Used by
mcryptd                16384  0
virtio_blk             20480  0
virtio_console         36864  0
virtio_net             53248  0
net_failover           24576  1 virtio_net
sg                     40960  0
virtio_scsi            20480  0
virtiofs               28672  1
failover               16384  1 net_failover
vmw_vsock_virtio_transport    16384  2
vmw_vsock_virtio_transport_common    32768  1 vmw_vsock_virtio_transport
vsock                  45056  10 vmw_vsock_virtio_transport_common,vmw_vsock_virtio_transport
fuse                  151552  1 virtiofs

# 看看挂载了什么分区
bash-4.4# mount
rootfs on / type rootfs (rw,size=964048k,nr_inodes=241012)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=964064k,nr_inodes=241016,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
configfs on /sys/kernel/config type configfs (rw,relatime)
nsfs on /run/sandbox-ns/ipc type nsfs (rw)
nsfs on /run/sandbox-ns/uts type nsfs (rw)
kataShared on /run/kata-containers/shared/containers type virtiofs (rw,relatime)
shm on /run/kata-containers/sandbox/shm type tmpfs (rw,relatime)
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,mode=755)
kataShared on /run/kata-containers/8330bf4c2a98360975ce16244af81c4a5dfa74d4ea3c8a520d9244f0c14e541b/rootfs type virtiofs (rw,relatime)
kataShared on /run/kata-containers/bc201bf92ec8dcad3435ff4191912a41efb64a1e0fb463ad4a651b4dea94a8a5/rootfs type virtiofs (rw,relatime)

# 看看都有什么进程
bash-4.4# ps efx ww
    PID TTY      STAT   TIME COMMAND
      2 ?        S      0:00 [kthreadd]
      3 ?        I<     0:00  \_ [rcu_gp]
      4 ?        I<     0:00  \_ [rcu_par_gp]
      6 ?        I<     0:00  \_ [kworker/0:0H-events_highpri]
      7 ?        I      0:00  \_ [kworker/0:1-virtio_vsock]
      8 ?        I      0:00  \_ [kworker/u48:0-events_unbound]
      9 ?        I<     0:00  \_ [mm_percpu_wq]
     10 ?        S      0:00  \_ [ksoftirqd/0]
     11 ?        I      0:00  \_ [rcu_sched]
     12 ?        S      0:00  \_ [migration/0]
     13 ?        S      0:00  \_ [watchdog/0]
     14 ?        S      0:00  \_ [cpuhp/0]
     16 ?        S      0:00  \_ [kdevtmpfs]
     17 ?        I<     0:00  \_ [netns]
     18 ?        S      0:00  \_ [kauditd]
     19 ?        S      0:00  \_ [khungtaskd]
     20 ?        S      0:00  \_ [oom_reaper]
     21 ?        I<     0:00  \_ [writeback]
     22 ?        S      0:00  \_ [kcompactd0]
     23 ?        SN     0:00  \_ [ksmd]
     24 ?        SN     0:00  \_ [khugepaged]
     25 ?        I<     0:00  \_ [crypto]
     26 ?        I<     0:00  \_ [kintegrityd]
     27 ?        I<     0:00  \_ [kblockd]
     28 ?        I<     0:00  \_ [blkcg_punt_bio]
     29 ?        I<     0:00  \_ [tpm_dev_wq]
     30 ?        I<     0:00  \_ [md]
     31 ?        I<     0:00  \_ [edac-poller]
     32 ?        S      0:00  \_ [watchdogd]
     33 ?        I<     0:00  \_ [kworker/0:1H]
     35 ?        I      0:00  \_ [kworker/u48:1]
     49 ?        S      0:00  \_ [kswapd0]
    132 ?        I<     0:00  \_ [kthrotld]
    133 ?        I<     0:00  \_ [acpi_thermal_pm]
    134 ?        S      0:00  \_ [hwrng]
    135 ?        I<     0:00  \_ [kmpath_rdacd]
    136 ?        I<     0:00  \_ [kaluad]
    137 ?        I<     0:00  \_ [ipv6_addrconf]
    138 ?        I<     0:00  \_ [kstrp]
    203 ?        I      0:00  \_ [kworker/0:3-mm_percpu_wq]
    206 ?        S      0:00  \_ [scsi_eh_0]
    207 ?        I<     0:00  \_ [scsi_tmf_0]
    218 ?        S      0:00  \_ [khvcd]
      1 ?        Ss     0:00 /init HOME=/ TERM=linux
    193 ?        Ss     0:00 /usr/lib/systemd/systemd-journald PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin NOTIFY_SOCKET=/run/systemd/notify LISTEN_PID=193 LISTEN_FDS=3 LISTEN_FDNAMES=systemd-journald-dev-log.socket:systemd-journald.socket:systemd-journald.socket WATCHDOG_PID=193 WATCHDOG_USEC=180000000 INVOCATION_ID=00385279d7314bf5a02002d5f1e33050
    201 ?        Ss     0:00 /usr/lib/systemd/systemd-udevd PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin NOTIFY_SOCKET=/run/systemd/notify LISTEN_PID=201 LISTEN_FDS=2 LISTEN_FDNAMES=systemd-udevd-kernel.socket:systemd-udevd-control.socket WATCHDOG_PID=201 WATCHDOG_USEC=180000000 INVOCATION_ID=b3e4a3cd29b34c91a192bc9527da10cf JOURNAL_STREAM=9:10719
    225 ?        Ssl    0:02 /usr/bin/kata-agent PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=5683abfd11c542fe98c5f7ece1afa599 TERM=vt220
    231 ?        S      0:00  \_ /usr/bin/pod PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm HOME=/root
    235 ?        S      0:00  \_ sleep infinity PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm HOSTNAME=mypod-787d79b456-4f4xr NSS_SDB_USE_CACHE=no KUBERNETES_SERVICE_HOST=172.30.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT=tcp://172.30.0.1:443 KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1 HOME=/root
    236 pts/0    Ss     0:00  \_ [bash] PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=5683abfd11c542fe98c5f7ece1afa599 TERM=vt220 RUST_BACKTRACE=full
    268 pts/0    R+     0:00  |   \_ ps efx ww RUST_BACKTRACE=full INVOCATION_ID=5683abfd11c542fe98c5f7ece1afa599 PWD=/proc/net TERM=vt220 SHLVL=1 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin OLDPWD=/proc _=/usr/bin/ps
    247 pts/1    Ss+    0:00  \_ /bin/sh TERM=screen-256color HOSTNAME=mypod-787d79b456-4f4xr KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT=tcp://172.30.0.1:443 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_HOST=172.30.0.1 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SHLVL=1 HOME=/root KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_SERVICE_PORT_HTTPS=443 NSS_SDB_USE_CACHE=no KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1 KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443 _=/bin/sh

# 看看有多少内存
bash-4.4# free -h
              total        used        free      shared  buff/cache   available
Mem:          1.9Gi        30Mi       1.8Gi        58Mi        72Mi       1.7Gi
Swap:            0B          0B          0B

# 看看内核启动参数
bash-4.4# cat /proc/cmdline
tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 quiet panic=1 nr_cpus=24 scsi_mod.scan=none agent.debug_console agent.debug_console_vport=1026

# 没有ip命令,只能用内核接口,凑合看一下本机ip 地址
bash-4.4# cat /proc/net/fib_trie
Main:
  +-- 0.0.0.0/0 3 0 4
     +-- 0.0.0.0/4 2 0 2
        |-- 0.0.0.0
           /0 universe UNICAST
        +-- 10.254.0.0/23 2 0 1
           |-- 10.254.0.0
              /16 universe UNICAST
           +-- 10.254.1.0/28 2 0 2
              |-- 10.254.1.0
                 /32 link BROADCAST
                 /24 link UNICAST
              |-- 10.254.1.14
                 /32 host LOCAL
           |-- 10.254.1.255
              /32 link BROADCAST
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     |-- 172.30.0.0
        /16 universe UNICAST
     |-- 224.0.0.0
        /4 universe UNICAST
Local:
  +-- 0.0.0.0/0 3 0 4
     +-- 0.0.0.0/4 2 0 2
        |-- 0.0.0.0
           /0 universe UNICAST
        +-- 10.254.0.0/23 2 0 1
           |-- 10.254.0.0
              /16 universe UNICAST
           +-- 10.254.1.0/28 2 0 2
              |-- 10.254.1.0
                 /32 link BROADCAST
                 /24 link UNICAST
              |-- 10.254.1.14
                 /32 host LOCAL
           |-- 10.254.1.255
              /32 link BROADCAST
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     |-- 172.30.0.0
        /16 universe UNICAST
     |-- 224.0.0.0
        /4 universe UNICAST

# 看看systemctl的服务
bash-4.4# systemctl list-units
  UNIT                          LOAD   ACTIVE SUB     DESCRIPTION
  sys-devices-pci0000:00-0000:00:01.0-virtio0-virtio\x2dports-vport0p0.device loaded active plugged /sys/devices/pci0000:00/0000:00:01.0/virtio0/virtio-ports/vport0p0
  sys-devices-pci0000:00-0000:00:07.0-virtio5-net-eth0.device loaded active plugged /sys/devices/pci0000:00/0000:00:07.0/virtio5/net/eth0
  sys-devices-platform-serial8250-tty-ttyS0.device loaded active plugged /sys/devices/platform/serial8250/tty/ttyS0
  sys-devices-platform-serial8250-tty-ttyS1.device loaded active plugged /sys/devices/platform/serial8250/tty/ttyS1
  sys-devices-platform-serial8250-tty-ttyS2.device loaded active plugged /sys/devices/platform/serial8250/tty/ttyS2
  sys-devices-platform-serial8250-tty-ttyS3.device loaded active plugged /sys/devices/platform/serial8250/tty/ttyS3
  sys-devices-virtual-tty-hvc0.device loaded active plugged /sys/devices/virtual/tty/hvc0
  sys-devices-virtual-tty-hvc1.device loaded active plugged /sys/devices/virtual/tty/hvc1
  sys-devices-virtual-tty-hvc2.device loaded active plugged /sys/devices/virtual/tty/hvc2
  sys-devices-virtual-tty-hvc3.device loaded active plugged /sys/devices/virtual/tty/hvc3
  sys-devices-virtual-tty-hvc4.device loaded active plugged /sys/devices/virtual/tty/hvc4
  sys-devices-virtual-tty-hvc5.device loaded active plugged /sys/devices/virtual/tty/hvc5
  sys-devices-virtual-tty-hvc6.device loaded active plugged /sys/devices/virtual/tty/hvc6
  sys-devices-virtual-tty-hvc7.device loaded active plugged /sys/devices/virtual/tty/hvc7
  sys-module-configfs.device    loaded active plugged /sys/module/configfs
  sys-module-fuse.device        loaded active plugged /sys/module/fuse
  sys-subsystem-net-devices-eth0.device loaded active plugged /sys/subsystem/net/devices/eth0
  -.mount                       loaded active mounted Root Mount
  etc-resolv.conf.mount         loaded active mounted /etc/resolv.conf
  run-kata\x2dcontainers-3daea1739ff15b732a2a1e7cf76d64b49f128a5a55bb8807c5ddde96d378e5cd-rootfs.mount loaded active mounted /run/kata-containers/3daea1739ff15b732a2a1e7cf76d64b49f128a5a55bb8807c5ddde96d378e5cd/rootfs
  run-kata\x2dcontainers-e47a609923ce835a252c87d71fc3ba92adb974f00fdae194576b3d388b1bc770-rootfs.mount loaded active mounted /run/kata-containers/e47a609923ce835a252c87d71fc3ba92adb974f00fdae194576b3d388b1bc770/rootfs
  run-kata\x2dcontainers-sandbox-shm.mount loaded active mounted /run/kata-containers/sandbox/shm
  run-kata\x2dcontainers-shared-containers.mount loaded active mounted /run/kata-containers/shared/containers
  run-sandbox\x2dns-ipc.mount   loaded active mounted /run/sandbox-ns/ipc
  run-sandbox\x2dns-uts.mount   loaded active mounted /run/sandbox-ns/uts
  sys-kernel-config.mount       loaded active mounted Kernel Configuration File System
  tmp.mount                     loaded active mounted Temporary Directory (/tmp)
  systemd-ask-password-console.path loaded active waiting Dispatch Password Requests to Console Directory Watch
  init.scope                    loaded active running System and Service Manager
  kata-agent.service            loaded active running Kata Containers Agent
  kmod-static-nodes.service     loaded active exited  Create list of required static device nodes for the current kernel
  systemd-journald.service      loaded active running Journal Service
● systemd-modules-load.service  loaded failed failed  Load Kernel Modules
  systemd-sysctl.service        loaded active exited  Apply Kernel Variables
  systemd-tmpfiles-setup-dev.service loaded active exited  Create Static Device Nodes in /dev
  systemd-tmpfiles-setup.service loaded active exited  Create Volatile Files and Directories
  systemd-udev-trigger.service  loaded active exited  udev Coldplug all Devices
  systemd-udevd.service         loaded active running udev Kernel Device Manager
  -.slice                       loaded active active  Root Slice
  system.slice                  loaded active active  System Slice
  systemd-journald-dev-log.socket loaded active running Journal Socket (/dev/log)
  systemd-journald.socket       loaded active running Journal Socket
  systemd-udevd-control.socket  loaded active running udev Control Socket
  systemd-udevd-kernel.socket   loaded active running udev Kernel Socket
  basic.target                  loaded active active  Basic System
  kata-containers.target        loaded active active  Kata Containers Agent Target
  local-fs.target               loaded active active  Local File Systems
  multi-user.target             loaded active active  Multi-User System
  paths.target                  loaded active active  Paths
  slices.target                 loaded active active  Slices
  sockets.target                loaded active active  Sockets
  swap.target                   loaded active active  Swap
  sysinit.target                loaded active active  System Initialization
  timers.target                 loaded active active  Timers

# 有一个kata-containers的服务,我们很感兴趣,看看什么内容。
bash-4.4# systemctl cat kata-containers.target
# /usr/lib/systemd/system/kata-containers.target
#
# Copyright (c) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

[Unit]
Description=Kata Containers Agent Target
Requires=basic.target
Requires=tmp.mount
Wants=chronyd.service
Requires=kata-agent.service
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

bash-4.4# systemctl cat kata-agent.service
# /usr/lib/systemd/system/kata-agent.service
#
# Copyright (c) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

[Unit]
Description=Kata Containers Agent
Documentation=https://github.com/kata-containers/kata-containers
Wants=kata-containers.target

[Service]
# Send agent output to tty to allow capture debug logs
# from a VM vsock port
StandardOutput=tty
Type=simple
ExecStart=/usr/bin/kata-agent
LimitNOFILE=1048576
# ExecStop is required for static agent tracing; in all other scenarios
# the runtime handles shutting down the VM.
ExecStop=/bin/sync ; /usr/bin/systemctl --force poweroff
FailureAction=poweroff
# Discourage OOM-killer from touching the agent
OOMScoreAdjust=-997

# 我们的容器都在哪里呢?找到了。
bash-4.4# pwd
/run/kata-containers/e47a609923ce835a252c87d71fc3ba92adb974f00fdae194576b3d388b1bc770/rootfs
bash-4.4# echo *
anaconda-post.log bin check.sh dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var

从helper登录到容器里面,看看什么情况。

[root@helper ~]# oc rsh pod/mypod-787d79b456-4f4xr
sh-4.2# ls
anaconda-post.log  bin  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
sh-4.2# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether 0a:58:0a:fe:01:0e brd ff:ff:ff:ff:ff:ff
    inet 10.254.1.14/24 brd 10.254.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fefe:10e/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::5c25:c3ff:fe29:f429/64 scope link
       valid_lft forever preferred_lft forever

sh-4.2# ps efx ww
    PID TTY      STAT   TIME COMMAND
      2 ?        Ss     0:00 /bin/sh TERM=screen-256color HOSTNAME=mypod-787d79b456-4f4xr KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT=tcp://172.30.0.1:443 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_HOST=172.30.0.1 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SHLVL=1 HOME=/root KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_SERVICE_PORT_HTTPS=443 NSS_SDB_USE_CACHE=no KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1 KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443 _=/bin/sh
      9 ?        R+     0:00  \_ ps efx ww HOSTNAME=mypod-787d79b456-4f4xr KUBERNETES_PORT=tcp://172.30.0.1:443 KUBERNETES_PORT_443_TCP_PORT=443 TERM=screen-256color KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_HOST=172.30.0.1 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ HOME=/root SHLVL=2 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_SERVICE_PORT_HTTPS=443 NSS_SDB_USE_CACHE=no KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1 KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443 _=/usr/bin/ps
      1 ?        S      0:00 sleep infinity PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm HOSTNAME=mypod-787d79b456-4f4xr NSS_SDB_USE_CACHE=no KUBERNETES_SERVICE_HOST=172.30.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT=tcp://172.30.0.1:443 KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1 HOME=/root       

研究一下网络

kata的网络模型,我们很关心,官方有文档

# 我们在worker-0上,看看namespace情况
[root@worker-0 ~]# lsns --output NS,TYPE,NETNSID,PID,COMMAND | grep qemu
4026533791 net             5 20394 /usr/libexec/qemu-kiwi -name sandbox-0f60fb9af6dbf8c8e355b9e27a62debe8276aa76f4246857e46520fa677ce40e -uuid 0a101364-3814-42a4-91b9-c8a81fc377ef -machine q35,accel=kvm,kernel_irqchip -cpu host,pmu=off -qmp unix:/run/vc/vm/0f60fb9af6dbf8c8e355b9e27a62debe8276aa76f4246857e46520fa677ce40e/qmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=33122M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/0f60fb9af6dbf8c8e355b9e27a62debe8276aa76f4246857e46520fa677ce40e/console.sock,server=on,wait=off -device virtio-scsi-pci,id=scsi0,disable-modern=false -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-2809816003,guest-cid=2809816003 -chardev socket,id=char-3bb1f59f00a0b873,path=/run/vc/vm/0f60fb9af6dbf8c8e355b9e27a62debe8276aa76f4246857e46520fa677ce40e/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-3bb1f59f00a0b873,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=0a:58:0a:81:00:12,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/lib/modules/4.18.0-305.19.1.el8_4.x86_64/vmlinuz -initrd /var/cache/kata-containers/osbuilder-images/4.18.0-305.19.1.el8_4.x86_64/"rhcos"-kata-4.18.0-305.19.1.el8_4.x86_64.initrd -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 quiet panic=1 nr_cpus=24 scsi_mod.scan=none agent.debug_console agent.debug_console_vport=1026 -pidfile /run/vc/vm/0f60fb9af6dbf8c8e355b9e27a62debe8276aa76f4246857e46520fa677ce40e/pid -smp 1,cores=1,threads=1,sockets=24,maxcpus=24

# let's step into kata's netns and look at the network; the @if22 suffix on eth0 means its veth peer is interface index 22 on the other side
[root@worker-0 ~]# nsenter -t 20394 -n ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:81:00:12 brd ff:ff:ff:ff:ff:ff link-netns a4db0b05-2ff7-4a29-98da-1df2491622fb
    inet 10.129.0.18/23 brd 10.129.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe81:12/64 scope link
       valid_lft forever preferred_lft forever
4: tap0_kata: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 56:51:b2:40:7c:56 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5451:b2ff:fe40:7c56/64 scope link
       valid_lft forever preferred_lft forever

# back on worker-0 we can indeed see interface 22, and it is the peer of interface 3 inside the kata netns
[root@worker-0 ~]# ip link | grep 22 -A3
    link/ether 9e:88:4d:e5:55:80 brd ff:ff:ff:ff:ff:ff link-netns 7ccc8362-c042-4bf3-9ddc-fa4fef322134
18: 6f53bb03a970cf7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default
    link/ether 8e:a7:85:94:de:7b brd ff:ff:ff:ff:ff:ff link-netns 5f33c5e4-1788-4ab6-883b-78bf7ab5372e
22: 0f60fb9af6dbf8c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default
    link/ether 02:3c:63:91:ae:7f brd ff:ff:ff:ff:ff:ff link-netns 50226e1e-a0fd-48e3-b05c-7d5aa1d41acf
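
The link-netns id shown above can be resolved back to a named network namespace; a sketch (assuming cri-o keeps pod netns mounted under /var/run/netns, as it does by default):

ip netns list | grep 50226e1e
ip netns exec 50226e1e-a0fd-48e3-b05c-7d5aa1d41acf ip -br addr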

# 我们看看kata netns里面有没有nftables
[root@worker-0 ~]# nsenter -t 20394 -n nft list ruleset
table ip filter {
        chain INPUT {
                type filter hook input priority filter; policy accept;
        }

        chain FORWARD {
                type filter hook forward priority filter; policy accept;
                meta l4proto tcp tcp dport 22623 tcp flags & (fin|syn|rst|ack) == syn counter packets 0 bytes 0 reject
                meta l4proto tcp tcp dport 22624 tcp flags & (fin|syn|rst|ack) == syn counter packets 0 bytes 0 reject
                meta l4proto tcp ip daddr 169.254.169.254 tcp dport != 53 counter packets 0 bytes 0 reject
                meta l4proto udp ip daddr 169.254.169.254 udp dport 53 counter packets 0 bytes 0 reject
        }

        chain OUTPUT {
                type filter hook output priority filter; policy accept;
                meta l4proto tcp tcp dport 22623 tcp flags & (fin|syn|rst|ack) == syn counter packets 0 bytes 0 reject
                meta l4proto tcp tcp dport 22624 tcp flags & (fin|syn|rst|ack) == syn counter packets 0 bytes 0 reject
                meta l4proto tcp ip daddr 169.254.169.254 tcp dport != 53 counter packets 0 bytes 0 reject
                meta l4proto udp ip daddr 169.254.169.254 udp dport 53 counter packets 0 bytes 0 reject
        }
}

TC ( traffic control ) 的配置还是需要好好学习的,命令行比较复杂,可以参考以下的一些内容

可以使用的 man 命令

  • man tc-mirred
  • man tc-ctinfo
  • man tc-u32
  • man tc-actions

Note the "stolen" verdict on the action: it means the packet is consumed by the mirred action (here, redirected to the peer device), so no further tc or normal kernel stack processing happens for it on the original device. A small standalone reproduction of this setup follows below.
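
To make the cross-redirect easier to digest, here is a tiny standalone reproduction in a scratch namespace (a sketch with made-up interface names, unrelated to the cluster):

ip netns add tcdemo
ip netns exec tcdemo ip link add veth0 type veth peer name veth1
ip netns exec tcdemo ip tuntap add dev tap0 mode tap
ip netns exec tcdemo ip link set veth0 up
ip netns exec tcdemo ip link set tap0 up
ip netns exec tcdemo tc qdisc add dev veth0 ingress
ip netns exec tcdemo tc filter add dev veth0 parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev tap0
ip netns exec tcdemo tc -s -p filter show dev veth0 ingress
ip netns delete tcdemo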

# now look at the tc setup described in the docs: it cross-redirects traffic between eth0 and tap0_kata

# per the tc documentation, 'tc qdisc add dev eth0 handle ffff: ingress' is equivalent to 'tc qdisc add dev eth0 ingress', and shows up in 'tc qdisc show' as the 'qdisc ingress ffff: parent ffff:fff1' line below

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p qdisc show dev eth0
qdisc noqueue 0: root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 192 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

# per the docs, the output below is the result of: tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev tap0_kata

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev eth0 root
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 2 success 2)
  match 00000000/00000000 at 0 (success 2 )
        action order 1: mirred (Egress Redirect to device tap0_kata) stolen
        index 1 ref 1 bind 1 installed 2310 sec used 2310 sec firstused 2310 sec
        Action statistics:
        Sent 192 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev eth0 ingress
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 2 success 2)
  match 00000000/00000000 at 0 (success 2 )
        action order 1: mirred (Egress Redirect to device tap0_kata) stolen
        index 1 ref 1 bind 1 installed 1797 sec used 1797 sec firstused 1797 sec
        Action statistics:
        Sent 192 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev eth0 egress
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 2 success 2)
  match 00000000/00000000 at 0 (success 2 )
        action order 1: mirred (Egress Redirect to device tap0_kata) stolen
        index 1 ref 1 bind 1 installed 2330 sec used 2330 sec firstused 2330 sec
        Action statistics:
        Sent 192 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

# and similarly, the output below is the result of: tc filter add dev tap0_kata parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev eth0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p qdisc show dev tap0_kata
qdisc mq 0: root
 Sent 1296 bytes 16 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1414 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
 Sent 1296 bytes 16 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 880 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev tap0_kata root
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 15 success 15)
  match 00000000/00000000 at 0 (success 15 )
        action order 1: mirred (Egress Redirect to device eth0) stolen
        index 2 ref 1 bind 1 installed 2383 sec used 247 sec firstused 2380 sec
        Action statistics:
        Sent 936 bytes 15 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev tap0_kata ingress
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 14 success 14)
  match 00000000/00000000 at 0 (success 14 )
        action order 1: mirred (Egress Redirect to device eth0) stolen
        index 2 ref 1 bind 1 installed 1690 sec used 636 sec firstused 1687 sec
        Action statistics:
        Sent 880 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[root@worker-0 ~]# nsenter -t 20394 -n tc -s -p filter show dev tap0_kata egress
filter parent ffff: protocol all pref 49152 u32 chain 0
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  (rule hit 15 success 15)
  match 00000000/00000000 at 0 (success 15 )
        action order 1: mirred (Egress Redirect to device eth0) stolen
        index 2 ref 1 bind 1 installed 2400 sec used 264 sec firstused 2397 sec
        Action statistics:
        Sent 936 bytes 15 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

qemu-kiwi rpm sourcing

我们来看看 qemu-kiwi 这个rpm是从哪里来的。红帽官网也有工具查。答案是 Red Hat Enterprise Linux Advanced Virtualization 8 x86_64 ( advanced-virt-for-rhel-8-x86_64-rpms )

rpm -qpi kata-containers-2.1.0-6.el8.x86_64.rpm
# warning: kata-containers-2.1.0-6.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# Name        : kata-containers
# Version     : 2.1.0
# Release     : 6.el8
# Architecture: x86_64
# Install Date: (not installed)
# Group       : Unspecified
# Size        : 104672045
# License     : ASL 2.0
# Signature   : RSA/SHA256, Fri 13 Aug 2021 07:38:35 AM UTC, Key ID 199e2f91fd431d51
# Source RPM  : kata-containers-2.1.0-6.el8.src.rpm
# Build Date  : Thu 29 Jul 2021 08:43:06 PM UTC
# Build Host  : x86-vm-56.build.eng.bos.redhat.com
# Relocations : (not relocatable)
# Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
# Vendor      : Red Hat, Inc.
# URL         : https://github.com/kata-containers/kata-containers
# Summary     : Kata Containers version 2.x repository
# Description :

# Kata Containers version 2.x repository. Kata Containers is an open source
# project and community working to build a standard implementation of lightweight
# Virtual Machines (VMs) that feel and perform like containers, but provide the
# workload isolation and security advantages of VMs. https://katacontainers.io/.

# %gopkg

rpm -qp --fileprovide kata-containers-2.1.0-6.el8.x86_64.rpm
# warning: kata-containers-2.1.0-6.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# /etc/crio/crio.conf.d/50-kata
# /usr/bin/containerd-shim-kata-v2
# /usr/bin/kata-collect-data.sh
# /usr/bin/kata-monitor
# /usr/bin/kata-runtime
# /usr/lib/.build-id
# /usr/lib/.build-id/05
# /usr/lib/.build-id/05/4f48f5aef5a7120fe76e8f41bc2e96fe82cb20
# /usr/lib/.build-id/50
# /usr/lib/.build-id/50/a5e84ca71250993215cb19c1fed802800fb358
# /usr/lib/.build-id/b1
# /usr/lib/.build-id/b1/b275acd0ff5df77c6f5abc9b6c8c5b2b4ac88e
# /usr/lib/.build-id/e7
# /usr/lib/.build-id/e7/6ecd091d646ac823c7292c65b2a186d40b8023
# /usr/lib/systemd/system/kata-osbuilder-generate.service
# /usr/libexec/kata-containers
# /usr/libexec/kata-containers/VERSION
# /usr/libexec/kata-containers/agent
# /usr/libexec/kata-containers/agent/usr
# /usr/libexec/kata-containers/agent/usr/bin
# /usr/libexec/kata-containers/agent/usr/bin/kata-agent
# /usr/libexec/kata-containers/agent/usr/lib
# /usr/libexec/kata-containers/agent/usr/lib/systemd
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system/kata-agent.service
# /usr/libexec/kata-containers/agent/usr/lib/systemd/system/kata-containers.target
# /usr/libexec/kata-containers/kata-netmon
# /usr/libexec/kata-containers/osbuilder
# /usr/libexec/kata-containers/osbuilder/dracut
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d/05-base.conf
# /usr/libexec/kata-containers/osbuilder/dracut/dracut.conf.d/15-dracut-rhel.conf
# /usr/libexec/kata-containers/osbuilder/initrd-builder
# /usr/libexec/kata-containers/osbuilder/initrd-builder/README.md
# /usr/libexec/kata-containers/osbuilder/initrd-builder/initrd_builder.sh
# /usr/libexec/kata-containers/osbuilder/kata-osbuilder.sh
# /usr/libexec/kata-containers/osbuilder/nsdax
# /usr/libexec/kata-containers/osbuilder/rootfs-builder
# /usr/libexec/kata-containers/osbuilder/rootfs-builder/README.md
# /usr/libexec/kata-containers/osbuilder/rootfs-builder/rootfs.sh
# /usr/libexec/kata-containers/osbuilder/scripts
# /usr/libexec/kata-containers/osbuilder/scripts/lib.sh
# /usr/share/bash-completion/completions/kata-runtime
# /usr/share/doc/kata-containers
# /usr/share/doc/kata-containers/CONTRIBUTING.md
# /usr/share/doc/kata-containers/README.md
# /usr/share/kata-containers
# /usr/share/kata-containers/defaults
# /usr/share/kata-containers/defaults/configuration.toml
# /usr/share/licenses/kata-containers
# /usr/share/licenses/kata-containers/LICENSE
# /var/cache/kata-containers

rpm -qp --requires kata-containers-2.1.0-6.el8.x86_64.rpm
# warning: kata-containers-2.1.0-6.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# /bin/bash
# /bin/sh
# /bin/sh
# /bin/sh
# dracut
# kernel
# libc.so.6()(64bit)
# libc.so.6(GLIBC_2.10)(64bit)
# libc.so.6(GLIBC_2.14)(64bit)
# libc.so.6(GLIBC_2.15)(64bit)
# libc.so.6(GLIBC_2.17)(64bit)
# libc.so.6(GLIBC_2.18)(64bit)
# libc.so.6(GLIBC_2.2.5)(64bit)
# libc.so.6(GLIBC_2.3)(64bit)
# libc.so.6(GLIBC_2.3.2)(64bit)
# libc.so.6(GLIBC_2.3.4)(64bit)
# libc.so.6(GLIBC_2.4)(64bit)
# libc.so.6(GLIBC_2.7)(64bit)
# libc.so.6(GLIBC_2.9)(64bit)
# libdl.so.2()(64bit)
# libdl.so.2(GLIBC_2.2.5)(64bit)
# libgcc_s.so.1()(64bit)
# libgcc_s.so.1(GCC_3.0)(64bit)
# libgcc_s.so.1(GCC_3.3)(64bit)
# libgcc_s.so.1(GCC_4.2.0)(64bit)
# libm.so.6()(64bit)
# libm.so.6(GLIBC_2.2.5)(64bit)
# libpthread.so.0()(64bit)
# libpthread.so.0(GLIBC_2.2.5)(64bit)
# libpthread.so.0(GLIBC_2.3.2)(64bit)
# libpthread.so.0(GLIBC_2.3.3)(64bit)
# libutil.so.1()(64bit)
# libutil.so.1(GLIBC_2.2.5)(64bit)
# qemu-kiwi >= 5.1.0-16
# rpmlib(CompressedFileNames) <= 3.0.4-1
# rpmlib(FileDigests) <= 4.6.0-1
# rpmlib(PayloadFilesHavePrefix) <= 4.0-1
# rpmlib(PayloadIsXz) <= 5.2-1
# rtld(GNU_HASH)
# systemd
# systemd
# systemd

rpm -qpi qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm
# warning: qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# Name        : qemu-kiwi
# Epoch       : 15
# Version     : 5.2.0
# Release     : 16.module+el8.4.0+13460+2e130eec.13
# Architecture: x86_64
# Install Date: (not installed)
# Group       : Development/Tools
# Size        : 12941413
# License     : GPLv2 and GPLv2+ and CC-BY
# Signature   : RSA/SHA256, Tue 30 Nov 2021 10:43:30 PM UTC, Key ID 199e2f91fd431d51
# Source RPM  : qemu-kvm-5.2.0-16.module+el8.4.0+13460+2e130eec.13.src.rpm
# Build Date  : Fri 26 Nov 2021 09:59:08 PM UTC
# Build Host  : x86-037.build.eng.bos.redhat.com
# Relocations : (not relocatable)
# Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
# Vendor      : Red Hat, Inc.
# URL         : http://www.qemu.org/
# Summary     : qemu-kiwi components
# Description :
# qemu-kiwi is a version of qemu-kvm with a restricted set of features
# intended for use by specific applications.
# It's experimental and unsupported.

rpm -qp --fileprovide qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm
# warning: qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# /usr/lib/.build-id
# /usr/lib/.build-id/02
# /usr/lib/.build-id/02/3daf3e2bc89b7e0363ac89ea46bb70ddd74ae7
# /usr/libexec/qemu-kiwi
# /usr/share/systemtap/tapset/qemu-kiwi-log.stp
# /usr/share/systemtap/tapset/qemu-kiwi-simpletrace.stp
# /usr/share/systemtap/tapset/qemu-kiwi.stp

rpm -qp --requires qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm
# warning: qemu-kiwi-5.2.0-16.module+el8.4.0+13460+2e130eec.13.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
# libaio.so.1()(64bit)
# libaio.so.1(LIBAIO_0.1)(64bit)
# libaio.so.1(LIBAIO_0.4)(64bit)
# libc.so.6()(64bit)
# libc.so.6(GLIBC_2.10)(64bit)
# libc.so.6(GLIBC_2.11)(64bit)
# libc.so.6(GLIBC_2.12)(64bit)
# libc.so.6(GLIBC_2.14)(64bit)
# libc.so.6(GLIBC_2.17)(64bit)
# libc.so.6(GLIBC_2.2.5)(64bit)
# libc.so.6(GLIBC_2.25)(64bit)
# libc.so.6(GLIBC_2.27)(64bit)
# libc.so.6(GLIBC_2.28)(64bit)
# libc.so.6(GLIBC_2.3)(64bit)
# libc.so.6(GLIBC_2.3.2)(64bit)
# libc.so.6(GLIBC_2.3.4)(64bit)
# libc.so.6(GLIBC_2.4)(64bit)
# libc.so.6(GLIBC_2.7)(64bit)
# libc.so.6(GLIBC_2.8)(64bit)
# libc.so.6(GLIBC_2.9)(64bit)
# libgcc_s.so.1()(64bit)
# libgcc_s.so.1(GCC_3.0)(64bit)
# libgcc_s.so.1(GCC_3.3.1)(64bit)
# libgcc_s.so.1(GCC_3.4)(64bit)
# libgcc_s.so.1(GCC_4.7.0)(64bit)
# libgio-2.0.so.0()(64bit)
# libglib-2.0.so.0()(64bit)
# libgobject-2.0.so.0()(64bit)
# libm.so.6()(64bit)
# libm.so.6(GLIBC_2.2.5)(64bit)
# libnuma.so.1()(64bit)
# libnuma.so.1(libnuma_1.1)(64bit)
# libpixman-1.so.0()(64bit)
# libpmem.so.1()(64bit)
# libpmem.so.1(LIBPMEM_1.0)(64bit)
# libpthread.so.0()(64bit)
# libpthread.so.0(GLIBC_2.12)(64bit)
# libpthread.so.0(GLIBC_2.2.5)(64bit)
# libpthread.so.0(GLIBC_2.3.2)(64bit)
# libseccomp.so.2()(64bit)
# libutil.so.1()(64bit)
# libutil.so.1(GLIBC_2.2.5)(64bit)
# libz.so.1()(64bit)
# libz.so.1(ZLIB_1.2.0)(64bit)
# qemu-kvm-common = 15:5.2.0-16.module+el8.4.0+13460+2e130eec.13
# rpmlib(CompressedFileNames) <= 3.0.4-1
# rpmlib(FileDigests) <= 4.6.0-1
# rpmlib(PayloadFilesHavePrefix) <= 4.0-1
# rpmlib(PayloadIsXz) <= 5.2-1
# rtld(GNU_HASH)


end

sriov on openshift4 with unsupported NIC

openshift4 ships with SR-IOV support, but only certain NIC models are certified, so the SR-IOV operator carries a built-in NIC whitelist enforced by an admission webhook, and only those NICs are accepted. What if the lab hardware is not on that list but the NIC itself does support SR-IOV? This section shows how to get the experiment working anyway.

实验拓扑图

视频讲解:

there is a built-in NIC whitelist in openshift4's sriov operator; to disable it, see:

  • https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/configuring-sriov-operator.html#disable-enable-sr-iov-operator-admission-control-webhook_configuring-sriov-operator

openshift

# we do not run this SR-IOV lab inside kvm: passing the PF into a guest and creating VFs there is problematic, so a bare-metal worker node is used instead

# check vendor id and device id
# https://access.redhat.com/solutions/56081

# on worker-1
lspci -vv | grep -i Mellanox
# 04:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011

lspci -nvv | grep "04:00.0\|04:00.1"
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.0 0200: 15b3:101d
# 04:00.1 0200: 15b3:101d

cat /sys/class/net/*/device/sriov_numvfs
# 0
# 0
cat /sys/class/net/*/device/sriov_totalvfs
# 8
# 8
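
Before handing the card to the operator, a quick manual check (sketch) that VFs can really be created through sysfs; enp4s0f0 is the ConnectX-6 PF name seen later in the node state:

echo 2 > /sys/class/net/enp4s0f0/device/sriov_numvfs
lspci | grep -i "virtual function"
ip link show enp4s0f0
# put it back to zero so the operator starts from a clean state
echo 0 > /sys/class/net/enp4s0f0/device/sriov_numvfs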

install NFD ( node feature discovery) operator

install SRIOV operator

oc create namespace openshift-sriov-network-operator

oc create -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF
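
The OperatorGroup alone does not install anything; a Subscription is still needed. A sketch follows; the channel and catalog source are assumptions and must match your cluster's catalog:

oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.8"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF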

# https://catalog.redhat.com/software/containers/openshift4/dpdk-base-rhel8/5e32be6cdd19c77896004a41
# registry.redhat.io/openshift4/dpdk-base-rhel8:latest

# oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status}'  | jq

# we can see the operator has discovered the SR-IOV capable NICs on worker-1 (totalvfs is reported); no VFs exist yet at this point
oc get sriovnetworknodestates -n openshift-sriov-network-operator -o json  | jq ".items[] | (.metadata.name, .status)"
"master-0"
{
  "interfaces": [
    {
      "deviceID": "1000",
      "driver": "virtio-pci",
      "pciAddress": "0000:00:03.0",
      "vendor": "1af4"
    }
  ],
  "syncStatus": "Succeeded"
}
"master-1"
{
  "interfaces": [
    {
      "deviceID": "1000",
      "driver": "virtio-pci",
      "pciAddress": "0000:00:03.0",
      "vendor": "1af4"
    }
  ],
  "syncStatus": "Succeeded"
}
"master-2"
{
  "interfaces": [
    {
      "deviceID": "1000",
      "driver": "virtio-pci",
      "pciAddress": "0000:00:03.0",
      "vendor": "1af4"
    }
  ],
  "syncStatus": "Succeeded"
}
"worker-0"
{
  "interfaces": [
    {
      "deviceID": "1000",
      "driver": "virtio-pci",
      "pciAddress": "0000:00:03.0",
      "vendor": "1af4"
    }
  ],
  "syncStatus": "Succeeded"
}
"worker-1"
{
  "interfaces": [
    {
      "deviceID": "165f",
      "driver": "tg3",
      "linkSpeed": "1000 Mb/s",
      "linkType": "ETH",
      "mac": "90:b1:1c:44:d6:0f",
      "mtu": 1500,
      "name": "eno1",
      "pciAddress": "0000:01:00.0",
      "vendor": "14e4"
    },
    {
      "deviceID": "165f",
      "driver": "tg3",
      "linkSpeed": "-1 Mb/s",
      "linkType": "ETH",
      "mac": "90:b1:1c:44:d6:10",
      "mtu": 1500,
      "name": "eno2",
      "pciAddress": "0000:01:00.1",
      "vendor": "14e4"
    },
    {
      "deviceID": "165f",
      "driver": "tg3",
      "linkSpeed": "-1 Mb/s",
      "linkType": "ETH",
      "mac": "90:b1:1c:44:d6:11",
      "mtu": 1500,
      "name": "eno3",
      "pciAddress": "0000:02:00.0",
      "vendor": "14e4"
    },
    {
      "deviceID": "165f",
      "driver": "tg3",
      "linkSpeed": "-1 Mb/s",
      "linkType": "ETH",
      "mac": "90:b1:1c:44:d6:12",
      "mtu": 1500,
      "name": "eno4",
      "pciAddress": "0000:02:00.1",
      "vendor": "14e4"
    },
    {
      "deviceID": "101d",
      "driver": "mlx5_core",
      "linkSpeed": "-1 Mb/s",
      "linkType": "ETH",
      "mac": "0c:42:a1:fa:18:52",
      "mtu": 1500,
      "name": "enp4s0f0",
      "pciAddress": "0000:04:00.0",
      "totalvfs": 8,
      "vendor": "15b3"
    },
    {
      "deviceID": "101d",
      "driver": "mlx5_core",
      "linkSpeed": "-1 Mb/s",
      "linkType": "ETH",
      "mac": "0c:42:a1:fa:18:53",
      "mtu": 1500,
      "name": "enp4s0f1",
      "pciAddress": "0000:04:00.1",
      "totalvfs": 8,
      "vendor": "15b3"
    }
  ],
  "syncStatus": "Succeeded"
}
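
The full dump above is long; a jq filter (sketch) that keeps only the SR-IOV capable ports can be handier:

oc get sriovnetworknodestates worker-1 -n openshift-sriov-network-operator -o json | jq '[.status.interfaces[] | select(.totalvfs != null) | {name, vendor, deviceID, totalvfs}]'
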
# config worker-1 with hugepage

cat << EOF > /data/install/worker-performance.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-performance
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-performance]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-performance: ""

EOF
oc create -f /data/install/worker-performance.yaml

# to restore
oc delete -f /data/install/worker-performance.yaml

oc label node worker-1 node-role.kubernetes.io/worker-performance=""

cat << EOF > /data/install/worker-1-hugepage.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: worker-1-hugepage
spec:
  cpu:
    isolated: "5-23"
    reserved: "0-4"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 4
        size: 1G
  nodeSelector:
    node-role.kubernetes.io/worker-performance: ''
EOF
oc create -f /data/install/worker-1-hugepage.yaml

# to restore
oc delete -f /data/install/worker-1-hugepage.yaml

# on worker-1
grep -i huge /proc/meminfo
# before
# AnonHugePages:    448512 kB
# ShmemHugePages:        0 kB
# HugePages_Total:       0
# HugePages_Free:        0
# HugePages_Rsvd:        0
# HugePages_Surp:        0
# Hugepagesize:       2048 kB
# Hugetlb:               0 kB

# after
# AnonHugePages:    376832 kB
# ShmemHugePages:        0 kB
# HugePages_Total:       4
# HugePages_Free:        4
# HugePages_Rsvd:        0
# HugePages_Surp:        0
# Hugepagesize:    1048576 kB
# Hugetlb:         4194304 kB
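
For reference, this is roughly how a workload would consume the 1G hugepages configured above (a sketch; names and sizes are illustrative only):

cat << EOF > /data/install/hugepage-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepage-test
spec:
  nodeSelector:
    node-role.kubernetes.io/worker-performance: ""
  containers:
  - name: hugepage-test
    image: quay.io/wangzheng422/qimgs:centos7-test
    command: [ "sleep", "infinity" ]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        hugepages-1Gi: 1Gi
        memory: 1Gi
        cpu: "1"
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
# oc create -f /data/install/hugepage-test.yaml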


cat << EOF > /data/install/sriov-cx4.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-cx4-net-1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: cx4nic1
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  nicSelector:
    vendor: "15b3"
    deviceID: "101d"
    # rootDevices:
    #   - "0000:19:00.0"
  deviceType: netdevice 
  isRdma: true
EOF
oc create -f /data/install/sriov-cx4.yaml
# Error from server (vendor/device 15b3/101d is not supported): error when creating "/data/install/sriov-cx4.yaml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: vendor/device 15b3/101d is not supported

# to restore
oc delete -f /data/install/sriov-cx4.yaml

oc get sriovoperatorconfig default -n openshift-sriov-network-operator -o yaml | yq e '.spec' -
# enableInjector: true
# enableOperatorWebhook: true
# logLevel: 2

oc patch sriovoperatorconfig default --type=merge \
  -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableOperatorWebhook": false } }'

oc get sriovoperatorconfig default -n openshift-sriov-network-operator -o yaml | yq e '.spec' -
# enableInjector: true
# enableOperatorWebhook: false
# logLevel: 2
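
# to re-enable the webhook later, just flip the flag back
oc patch sriovoperatorconfig default --type=merge \
  -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableOperatorWebhook": true } }'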

oc create -f /data/install/sriov-cx4.yaml
# sriovnetworknodepolicy.sriovnetwork.openshift.io/policy-cx4-net-1 created

# you can see the number of VFs is now set to 4
# oc get sriovnetworknodestates worker-1 -n openshift-sriov-network-operator -o json  | jq "(.metadata.name, .status)"
oc get sriovnetworknodestates worker-1 -n openshift-sriov-network-operator -o yaml | yq e "del(.metadata.managedFields)" -
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2021-06-30T16:00:09Z"
  generation: 4
  name: worker-1
  namespace: openshift-sriov-network-operator
  ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: cef00fc5-7952-42ec-b863-980fdc1e6318
  resourceVersion: "4425538"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/worker-1
  uid: fcf58d46-3127-4956-ac2f-df5ce2e2ac8c
spec:
  dpConfigVersion: "4381421"
  interfaces:
    - name: enp4s0f0
      numVfs: 4
      pciAddress: "0000:04:00.0"
      vfGroups:
        - deviceType: netdevice
          policyName: policy-cx4-net-1
          resourceName: cx4nic1
          vfRange: 0-3
    - name: enp4s0f1
      numVfs: 4
      pciAddress: "0000:04:00.1"
      vfGroups:
        - deviceType: netdevice
          policyName: policy-cx4-net-1
          resourceName: cx4nic1
          vfRange: 0-3
status:
  interfaces:
    - deviceID: 165f
      driver: tg3
      linkSpeed: 1000 Mb/s
      linkType: ETH
      mac: 90:b1:1c:44:d6:0f
      mtu: 1500
      name: eno1
      pciAddress: "0000:01:00.0"
      vendor: "14e4"
    - deviceID: 165f
      driver: tg3
      linkSpeed: -1 Mb/s
      linkType: ETH
      mac: 90:b1:1c:44:d6:10
      mtu: 1500
      name: eno2
      pciAddress: "0000:01:00.1"
      vendor: "14e4"
    - deviceID: 165f
      driver: tg3
      linkSpeed: -1 Mb/s
      linkType: ETH
      mac: 90:b1:1c:44:d6:11
      mtu: 1500
      name: eno3
      pciAddress: "0000:02:00.0"
      vendor: "14e4"
    - deviceID: 165f
      driver: tg3
      linkSpeed: -1 Mb/s
      linkType: ETH
      mac: 90:b1:1c:44:d6:12
      mtu: 1500
      name: eno4
      pciAddress: "0000:02:00.1"
      vendor: "14e4"
    - Vfs:
        - deviceID: 101e
          driver: mlx5_core
          mac: 36:da:1c:a9:47:9a
          mtu: 1500
          name: enp4s0f0v0
          pciAddress: "0000:04:00.2"
          vendor: 15b3
          vfID: 0
        - deviceID: 101e
          driver: mlx5_core
          mac: 62:ab:95:db:e6:cc
          mtu: 1500
          name: enp4s0f0v1
          pciAddress: "0000:04:00.3"
          vendor: 15b3
          vfID: 1
        - deviceID: 101e
          driver: mlx5_core
          pciAddress: "0000:04:00.4"
          vendor: 15b3
          vfID: 2
        - deviceID: 101e
          driver: mlx5_core
          mac: 5e:9f:cc:cc:e4:a1
          mtu: 1500
          name: enp4s0f0v3
          pciAddress: "0000:04:00.5"
          vendor: 15b3
          vfID: 3
      deviceID: 101d
      driver: mlx5_core
      eSwitchMode: legacy
      linkSpeed: -1 Mb/s
      linkType: ETH
      mac: 0c:42:a1:fa:18:52
      mtu: 1500
      name: enp4s0f0
      numVfs: 4
      pciAddress: "0000:04:00.0"
      totalvfs: 4
      vendor: 15b3
    - Vfs:
        - deviceID: 101e
          driver: mlx5_core
          mac: e6:75:48:6f:56:33
          mtu: 1500
          name: enp4s0f1v0
          pciAddress: "0000:04:00.6"
          vendor: 15b3
          vfID: 0
        - deviceID: 101e
          driver: mlx5_core
          mac: 5a:74:7a:e7:3d:2b
          mtu: 1500
          name: enp4s0f1v1
          pciAddress: "0000:04:00.7"
          vendor: 15b3
          vfID: 1
        - deviceID: 101e
          driver: mlx5_core
          mac: 62:f8:19:98:d5:5f
          mtu: 1500
          name: enp4s0f1v2
          pciAddress: "0000:04:01.0"
          vendor: 15b3
          vfID: 2
        - deviceID: 101e
          driver: mlx5_core
          mac: f2:14:1e:93:e9:39
          mtu: 1500
          name: enp4s0f1v3
          pciAddress: "0000:04:01.1"
          vendor: 15b3
          vfID: 3
      deviceID: 101d
      driver: mlx5_core
      eSwitchMode: legacy
      linkSpeed: -1 Mb/s
      linkType: ETH
      mac: 0c:42:a1:fa:18:53
      mtu: 1500
      name: enp4s0f1
      numVfs: 4
      pciAddress: "0000:04:00.1"
      totalvfs: 4
      vendor: 15b3
  syncStatus: Succeeded
cat << EOF > /data/install/sriov-network.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: demo
  ipam: "{}"
  resourceName: cx4nic1
EOF
oc create -f /data/install/sriov-network.yaml

# to restore
oc delete -f /data/install/sriov-network.yaml

# https://github.com/openshift/sriov-network-operator/issues/133

lspci -vv | grep -i Mellanox
# 04:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:01.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011
# 04:01.1 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
#         Subsystem: Mellanox Technologies Device 0011

lspci -nvv | grep "04:00.0\|04:00.1"
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.0 0200: 15b3:101d
# 04:00.1 0200: 15b3:101d

lspci | grep -i Mellanox | awk '{print $1}' | xargs -I DEMO sh -c "lspci -nvv | grep DEMO "
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.0 0200: 15b3:101d
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.1 0200: 15b3:101d
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.2 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.3 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.4 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.5 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.6 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:00.7 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:01.0 0200: 15b3:101e
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 04:01.1 0200: 15b3:101e

# <human readable name>: <vendor ID> <pf ID> <vf ID>
cat << EOF > /data/install/sriov-unsupport.yaml
apiVersion: v1
data:
  CX6DX: 15b3 101d 101e
kind: ConfigMap
metadata:
  name: unsupported-nic-ids
  namespace: openshift-sriov-network-operator
EOF
oc create -f /data/install/sriov-unsupport.yaml
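
# with the ConfigMap above in place, the operator webhook can presumably be re-enabled
# (a sketch, simply mirroring the earlier patch command with the value flipped back):
oc patch sriovoperatorconfig default --type=merge \
  -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableOperatorWebhook": true } }'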

# try to deploy a demo pod
cat << EOF > /data/install/dpdk-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
spec:
  containers:
  - name: testpmd
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.6
    securityContext:
     capabilities:
        add: ["IPC_LOCK"] 
    volumeMounts:
    - mountPath: /dev/hugepages 
      name: hugepage
    resources:
      limits:
        openshift.io/cx4nic1: "1" 
        memory: "1Gi"
        cpu: "4" 
        hugepages-1Gi: "4Gi" 
      requests:
        openshift.io/cx4nic1: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
oc create -n demo -f /data/install/dpdk-test.yaml

# to restore
oc delete -n demo -f /data/install/dpdk-test.yaml

# in the pod
rpm -ql dpdk-tools
# /usr/sbin/dpdk-devbind
# /usr/share/dpdk/usertools
# /usr/share/dpdk/usertools/cpu_layout.py
# /usr/share/dpdk/usertools/dpdk-devbind.py
# /usr/share/dpdk/usertools/dpdk-pmdinfo.py
# /usr/share/dpdk/usertools/dpdk-telemetry-client.py

/usr/share/dpdk/usertools/dpdk-devbind.py --status-dev net
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12
# lspci: Unable to load libkmod resources: error -12

# Network devices using kernel driver
# ===================================
# 0000:01:00.0 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if= drv=tg3 unused= 
# 0000:01:00.1 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if= drv=tg3 unused= 
# 0000:02:00.0 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if= drv=tg3 unused= 
# 0000:02:00.1 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if= drv=tg3 unused= 
# 0000:04:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if= drv=mlx5_core unused= 
# 0000:04:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if= drv=mlx5_core unused= 
# 0000:04:00.2 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:00.3 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:00.4 'ConnectX Family mlx5Gen Virtual Function 101e' if=net1 drv=mlx5_core unused= 
# 0000:04:00.5 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:00.6 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:00.7 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:01.0 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
# 0000:04:01.1 'ConnectX Family mlx5Gen Virtual Function 101e' if= drv=mlx5_core unused= 
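
# a minimal testpmd sketch inside the pod (an addition, not from the original run; it assumes the
# SR-IOV device plugin exposes the allocated VF's PCI address via an env var named
# PCIDEVICE_OPENSHIFT_IO_CX4NIC1, matching the openshift.io/cx4nic1 resource requested above,
# that the image ships a testpmd binary, and that this DPDK version still accepts the -w whitelist option)
env | grep PCIDEVICE
# e.g. PCIDEVICE_OPENSHIFT_IO_CX4NIC1=0000:04:00.4
testpmd -l 1-3 --in-memory -w ${PCIDEVICE_OPENSHIFT_IO_CX4NIC1} -- --nb-cores=2 --forward-mode=mac --stats-period 1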

KVM doesn't support SR-IOV PF passthrough,

it only supports VF passthrough (see the sketch after the references below).

  • https://www.cnblogs.com/dion-90/articles/8522733.html
  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-pci_devices-pci_passthrough
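
# a minimal sketch (an addition, not part of the original steps): create VFs on the hypervisor first,
# so that individual VFs, rather than the PF, can be passed through; the interface name enp4s0f0 and the
# PCI bus 04 follow host 103 used later in this section, adjust them for your host
echo 4 > /sys/class/net/enp4s0f0/device/sriov_numvfs
lspci | grep -i "Virtual Function"
virsh nodedev-list | grep 000_04   # the new VFs show up as additional pci_0000_04_xx_x nodes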
# on 101
ls /sys/class/net/

lspci -vv | grep -i Mellanox
# pcilib: sysfs_read_vpd: read failed: Input/output error
# 05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
#         Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT
# 05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
#         Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT
# 07:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
#         Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT
# 07:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
#         Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT

virsh nodedev-list | grep 000_05
# pci_0000_05_00_0
# pci_0000_05_00_1

virsh nodedev-dumpxml pci_0000_05_00_0
<device>
  <name>pci_0000_05_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:06.0/0000:05:00.0</path>
  <parent>pci_0000_00_06_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>5</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1015'>MT27710 Family [ConnectX-4 Lx]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='64'/>
    <iommuGroup number='17'>
      <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='5' width='4'/>
    </pci-express>
  </capability>
</device>
virsh nodedev-dumpxml pci_0000_05_00_1
<device>
  <name>pci_0000_05_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:06.0/0000:05:00.1</path>
  <parent>pci_0000_00_06_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>5</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x1015'>MT27710 Family [ConnectX-4 Lx]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='64'/>
    <iommuGroup number='18'>
      <address domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='5' width='4'/>
    </pci-express>
  </capability>
</device>

on 103

ls /sys/class/net/
# baremetal  eno1  eno2  eno3  eno4  enp4s0f0  enp4s0f1  lo  virbr0  virbr0-nic
echo 0 > /sys/class/net/enp4s0f0/device/sriov_numvfs
echo 0 > /sys/class/net/enp4s0f1/device/sriov_numvfs

lspci -vv | grep -i Mellanox
# 04:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011
# 04:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
#         Subsystem: Mellanox Technologies Device 0011

virsh nodedev-list | grep 000_04
# pci_0000_04_00_0
# pci_0000_04_00_1

virsh nodedev-dumpxml pci_0000_04_00_0
<device>
  <name>pci_0000_04_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0</path>
  <parent>pci_0000_00_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x101d'>MT2892 Family [ConnectX-6 Dx]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='8'/>
    <iommuGroup number='27'>
      <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='16' width='16'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>
virsh nodedev-dumpxml pci_0000_04_00_1
<device>
  <name>pci_0000_04_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1</path>
  <parent>pci_0000_00_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x101d'>MT2892 Family [ConnectX-6 Dx]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='8'/>
    <iommuGroup number='28'>
      <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='16' width='16'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>

For ocp4-aHelper, change the KVM config below

<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </hostdev>

to

    <interface type='hostdev' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
    </interface>
    <interface type='hostdev' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
      </source>
    </interface>
virsh edit ocp4-aHelper
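
# a quick way to verify after the edit (a sketch; ocp4-aHelper is the VM name used above):
virsh destroy ocp4-aHelper
virsh start ocp4-aHelper
# inside the guest, the two ConnectX-4 ports should now show up, e.g.:
# lspci | grep -i mellanox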

keepalived operator in openshift4

Pain points

With a standard OpenShift 4 installation, traffic is brought into the cluster through the router (HAProxy) acting as ingress. By default this only works at layer 7; although it can be customized to work at layer 4, it is inconvenient both in terms of managing the externally exposed IP addresses and in handling application port conflicts.

The root cause is that an on-premises OpenShift 4 installation does not support the LoadBalancer service type. So today we bring in the keepalived operator to fill this gap.

Video walkthrough:

This article references the following posts from the OpenShift blog:

  • https://www.openshift.com/blog/self-hosted-load-balancer-for-openshift-an-operator-based-approach
  • https://github.com/redhat-cop/keepalived-operator

Lab architecture diagram

As you can see, keepalived creates a secondary IP on a node according to the Service definition, and external traffic enters the cluster through that IP. This is one way to implement the Kubernetes LoadBalancer type; compared with the ingress controller approach, it natively supports layer-4 TCP forwarding.
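
Once the demo LoadBalancer Service later in this section has been created, the secondary IP can be checked directly on the node (a sketch; the external IP 172.21.6.50 and node master-2 are taken from the demo output below):

oc debug node/master-2 -- chroot /host ip -4 addr show | grep 172.21.6.50
# keepalived adds the VIP as a secondary address on the elected node, so it should be listed here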

Installing the keepalived operator is simple.

After finishing the installation in the web console, label the nodes and adjust the permissions:

oc label node master-2 node-role.kubernetes.io/loadbalancer=""
oc label node master-1 node-role.kubernetes.io/loadbalancer=""

oc adm policy add-scc-to-user privileged -z default -n keepalived-operator

Next, let's look at what is special about the keepalived deployment.

We can see that the keepalived pod uses hostNetwork and privileged: true, but it does not mount any special host directories.
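
To see what actually got deployed, a quick check can be run (a sketch; the KeepalivedGroup is assumed to be named keepalivedgroup-workers, which matches the pod definition shown later in this section):

oc -n keepalived-operator get daemonset
oc -n keepalived-operator get pods -o wide
# the rendered keepalived.conf lives in a ConfigMap named after the KeepalivedGroup
oc -n keepalived-operator get cm keepalivedgroup-workers -o yaml | yq e '.data' -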

Deploy a test application

cat << 'EOF' > /data/install/network-patch.yaml
spec:
  externalIP:
    policy:
      allowedCIDRs:
      - ${ALLOWED_CIDR}
    autoAssignCIDRs:
      - "${AUTOASSIGNED_CIDR}"
EOF

# export VERSION="4.9.4"
# export BINARY="yq_linux_amd64"
# wget https://github.com/mikefarah/yq/releases/download/${VERSION}/${BINARY} -O /usr/local/bin/yq && chmod +x /usr/local/bin/yq

# CIDR prefix length -> number of addresses:
# /24 256
# /25 128
# /26 64
# /27 32
# /28 16
cd /data/install
export ALLOWED_CIDR="172.21.6.33/27"
export AUTOASSIGNED_CIDR="172.21.6.33/27"
oc patch network cluster -p "$(envsubst < ./network-patch.yaml | yq eval -j -)" --type=merge

oc get network cluster -o yaml
# spec:
#   clusterNetwork:
#   - cidr: 10.254.0.0/16
#     hostPrefix: 24
#   externalIP:
#     autoAssignCIDRs:
#     - 172.21.6.33/27
#     policy:
#       allowedCIDRs:
#       - 172.21.6.33/27
#   networkType: OpenShiftSDN
#   serviceNetwork:
#   - 172.30.0.0/16
# status:
#   clusterNetwork:
#   - cidr: 10.254.0.0/16
#     hostPrefix: 24
#   clusterNetworkMTU: 1450
#   networkType: OpenShiftSDN
#   serviceNetwork:
#   - 172.30.0.0/16

oc new-project demo

cat << EOF > /data/install/demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: test-0
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'master-0'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
apiVersion: v1
kind: Pod
metadata:
  name: test-1
  labels:
    env: test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: 'master-2'
  containers:
  - name: php
    image: "quay.io/wangzheng422/php:demo.02"
---
kind: Service
apiVersion: v1
metadata:
  name: demo
  annotations:
    keepalived-operator.redhat-cop.io/keepalivedgroup: keepalived-operator/keepalivedgroup-workers
spec:
  type: LoadBalancer
  ports:
    - name: "http"
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    env: test
EOF
oc create -n demo -f /data/install/demo.yaml

# to restore
oc delete -n demo -f /data/install/demo.yaml

Analyze the application's behavior

Looking at the service configuration, you can see that an external IP has been assigned:

oc get svc
# NAME   TYPE           CLUSTER-IP       EXTERNAL-IP               PORT(S)        AGE
# demo   LoadBalancer   172.30.203.237   172.21.6.50,172.21.6.50   80:31682/TCP   14m

curl http://172.21.6.50/
# Hello!<br>Welcome to RedHat Developer<br>Enjoy all of the ad-free articles<br>

On master-2, the related iptables configuration:

    0     0 KUBE-FW-ZFZLPEKTCJ3DBGAL  tcp  --  *      *       0.0.0.0/0            172.21.6.50          /* demo/demo:http loadbalancer IP */ tcp dpt:80

You can see that the service's firewall rule distributes the traffic on to the pods.
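
To dig one level deeper on the node, the chain can be listed directly (a sketch; the chain name KUBE-FW-ZFZLPEKTCJ3DBGAL comes from the output above and is different for every service):

oc debug node/master-2 -- chroot /host iptables -t nat -nvL KUBE-FW-ZFZLPEKTCJ3DBGAL
# it typically jumps to the corresponding KUBE-SVC-* chain, which load-balances across the endpoint pods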

keepalived pod definition

we can see that it uses hostNetwork and privileged: true

kind: Pod
apiVersion: v1
metadata:
  generateName: keepalivedgroup-workers-
  annotations:
    openshift.io/scc: privileged
  selfLink: /api/v1/namespaces/keepalived-operator/pods/keepalivedgroup-workers-fgzv8
  resourceVersion: '2700532'
  name: keepalivedgroup-workers-fgzv8
  uid: 1addc7c7-4e6d-49c7-ae5e-3a4e2963755b
  creationTimestamp: '2021-06-09T08:51:40Z'
  namespace: keepalived-operator
  ownerReferences:
    - apiVersion: apps/v1
      kind: DaemonSet
      name: keepalivedgroup-workers
      uid: dba36a9c-f2aa-4951-aa60-a3836275ae1b
      controller: true
      blockOwnerDeletion: true
  labels:
    controller-revision-hash: 7459c85f64
    keepalivedGroup: keepalivedgroup-workers
    pod-template-generation: '1'
spec:
  nodeSelector:
    node-role.kubernetes.io/loadbalancer: ''
  restartPolicy: Always
  initContainers:
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: config-setup
      command:
        - bash
        - '-c'
        - /usr/local/bin/notify.sh
      env:
        - name: file
          value: /etc/keepalived.d/src/keepalived.conf
        - name: dst_file
          value: /etc/keepalived.d/dst/keepalived.conf
        - name: reachip
        - name: create_config_only
          value: 'true'
      securityContext:
        runAsUser: 0
      imagePullPolicy: Always
      volumeMounts:
        - name: config
          readOnly: true
          mountPath: /etc/keepalived.d/src
        - name: config-dst
          mountPath: /etc/keepalived.d/dst
      terminationMessagePolicy: File
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
  serviceAccountName: default
  imagePullSecrets:
    - name: default-dockercfg-2d5d5
  priority: 0
  schedulerName: default-scheduler
  hostNetwork: true
  enableServiceLinks: false
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchFields:
              - key: metadata.name
                operator: In
                values:
                  - master-1
  terminationGracePeriodSeconds: 30
  shareProcessNamespace: true
  preemptionPolicy: PreemptLowerPriority
  nodeName: master-1
  securityContext: {}
  containers:
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: keepalived
      command:
        - /bin/bash
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
      securityContext:
        privileged: true
      imagePullPolicy: Always
      volumeMounts:
        - name: lib-modules
          readOnly: true
          mountPath: /lib/modules
        - name: config-dst
          readOnly: true
          mountPath: /etc/keepalived.d
        - name: pid
          mountPath: /etc/keepalived.pid
        - name: stats
          mountPath: /tmp
      terminationMessagePolicy: File
      image: registry.redhat.io/openshift4/ose-keepalived-ipfailover
      args:
        - '-c'
        - >
          exec /usr/sbin/keepalived --log-console --log-detail --dont-fork
          --config-id=${POD_NAME} --use-file=/etc/keepalived.d/keepalived.conf
          --pid=/etc/keepalived.pid/keepalived.pid
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: config-reloader
      command:
        - bash
        - '-c'
        - /usr/local/bin/notify.sh
      env:
        - name: pid
          value: /etc/keepalived.pid/keepalived.pid
        - name: file
          value: /etc/keepalived.d/src/keepalived.conf
        - name: dst_file
          value: /etc/keepalived.d/dst/keepalived.conf
        - name: reachip
        - name: create_config_only
          value: 'false'
      securityContext:
        runAsUser: 0
      imagePullPolicy: Always
      volumeMounts:
        - name: config
          readOnly: true
          mountPath: /etc/keepalived.d/src
        - name: config-dst
          mountPath: /etc/keepalived.d/dst
        - name: pid
          mountPath: /etc/keepalived.pid
      terminationMessagePolicy: File
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: prometheus-exporter
      command:
        - /usr/local/bin/keepalived_exporter
      securityContext:
        privileged: true
      ports:
        - name: metrics
          hostPort: 9650
          containerPort: 9650
          protocol: TCP
      imagePullPolicy: Always
      volumeMounts:
        - name: lib-modules
          readOnly: true
          mountPath: /lib/modules
        - name: stats
          mountPath: /tmp
      terminationMessagePolicy: File
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
      args:
        - '-web.listen-address'
        - ':9650'
        - '-web.telemetry-path'
        - /metrics
  automountServiceAccountToken: false
  serviceAccount: default
  volumes:
    - name: lib-modules
      hostPath:
        path: /lib/modules
        type: ''
    - name: config
      configMap:
        name: keepalivedgroup-workers
        defaultMode: 420
    - name: config-dst
      emptyDir: {}
    - name: pid
      emptyDir:
        medium: Memory
    - name: stats
      emptyDir: {}
  dnsPolicy: ClusterFirst
  tolerations:
    - operator: Exists
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/memory-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/pid-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/unschedulable
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/network-unavailable
      operator: Exists
      effect: NoSchedule
status:
  containerStatuses:
    - restartCount: 0
      started: true
      ready: true
      name: config-reloader
      state:
        running:
          startedAt: '2021-06-09T08:52:34Z'
      imageID: >-
        quay.io/redhat-cop/keepalived-operator@sha256:dab32df252b705b07840dc0488fce0577ed743aaa33bed47e293f115bdda9348
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
      lastState: {}
      containerID: 'cri-o://2d9c37aea1c623f1ff4afb50233c1d67567d3315ea64d10476cd613e8ccc2d04'
    - restartCount: 0
      started: true
      ready: true
      name: keepalived
      state:
        running:
          startedAt: '2021-06-09T08:52:34Z'
      imageID: >-
        registry.redhat.io/openshift4/ose-keepalived-ipfailover@sha256:385f014b07acc361d1bb41ffd9d3abc151ab64e01f42dacba80053a4dfcbd242
      image: 'registry.redhat.io/openshift4/ose-keepalived-ipfailover:latest'
      lastState: {}
      containerID: 'cri-o://02b384c94506b7dcbd18cbf8ceadef83b366c356de36b8e2646cc233f1c23902'
    - restartCount: 0
      started: true
      ready: true
      name: prometheus-exporter
      state:
        running:
          startedAt: '2021-06-09T08:52:34Z'
      imageID: >-
        quay.io/redhat-cop/keepalived-operator@sha256:dab32df252b705b07840dc0488fce0577ed743aaa33bed47e293f115bdda9348
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
      lastState: {}
      containerID: 'cri-o://daeb85bf94923d9562a0cc777664397269ed642bd0d86cf993f12a2ff6fff925'
  qosClass: BestEffort
  podIPs:
    - ip: 192.168.7.14
  podIP: 192.168.7.14
  hostIP: 192.168.7.14
  startTime: '2021-06-09T08:51:40Z'
  initContainerStatuses:
    - name: config-setup
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2021-06-09T08:51:54Z'
          finishedAt: '2021-06-09T08:51:54Z'
          containerID: >-
            cri-o://9ecc0e9a469a0518a7ca2fc5feef551d56c052dfe569dba391d0c0fc998b2f41
      lastState: {}
      ready: true
      restartCount: 0
      image: 'quay.io/redhat-cop/keepalived-operator:latest'
      imageID: >-
        quay.io/redhat-cop/keepalived-operator@sha256:dab32df252b705b07840dc0488fce0577ed743aaa33bed47e293f115bdda9348
      containerID: 'cri-o://9ecc0e9a469a0518a7ca2fc5feef551d56c052dfe569dba391d0c0fc998b2f41'
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2021-06-09T08:51:55Z'
    - type: Ready
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2021-06-09T08:52:35Z'
    - type: ContainersReady
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2021-06-09T08:52:35Z'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2021-06-09T08:51:40Z'
  phase: Running

Prepare a PHP test image

# prepare a PHP test image

cat << 'EOF' > index.php
<?php
$localIP = getHostByName(getHostName());
echo "Hello!<br>";
echo "Welcome to RedHat Developer<br>";
echo "Enjoy all of the ad-free articles<br>".$localIP;
?>
EOF

cat << EOF > php.dockerfile
FROM php:apache
COPY . /var/www/html/
EOF

buildah bud -t quay.io/wangzheng422/php:demo.02 -f php.dockerfile .

buildah push quay.io/wangzheng422/php:demo.02

Real-Time Kernel for Openshift4

A 5G RAN vDU has very demanding real-time requirements on the operating system and basically has to run on a real-time OS. OpenShift 4 is a PaaS platform tightly coupled with its operating system, and it ships a real-time OS variant: it uses the RHEL 8 kernel and is packaged with ostree.

OpenShift 4 can enable the real-time operating system on a node in two ways. One is through the performance-addon operator:

  • https://docs.openshift.com/container-platform/4.7/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html

The other is to do it directly with machine configs:

  • https://docs.openshift.com/container-platform/4.7/post_installation_configuration/machine-configuration-tasks.html#nodes-nodes-rtkernel-arguments_post-install-machine-configuration-tasks

Deployment architecture diagram for this lab

Video walkthrough:

How to do it at the operating-system level

The point of using a real-time OS is performance. So if we had a bare-metal machine, leaving the container platform aside, how would we configure it to get the most out of the real-time OS?

In general, there are two common sets of configuration:

  • Configure the real-time operating system itself and apply system tuning.
  • Configure the server BIOS: disable hyper-threading, disable IRQ balancing, and disable power-saving features such as CPU C-states.

For the first one, the real-time OS configuration, see below:

  • install kernel-rt
  • install rt-tests (see the sketch below)
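
# a minimal sketch for the two bullets above (an addition; it assumes RHEL 8 with the RT repo enabled,
# and package/repo names may differ in your environment)
subscription-manager repos --enable rhel-8-for-x86_64-rt-rpms
yum -y install kernel-rt rt-tests tuned-profiles-realtime
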
cat /etc/tuned/realtime-variables.conf
# isolated_cores=1-30
# isolate_managed_irq=Y
tuned-adm profile realtime
reboot

swapoff -a
systemctl stop irqbalance

For the second one, the BIOS settings on the physical server, consult the server vendor's documentation for the official low-latency configuration guide. For example:

| System Setup Screen | Setting           | Default                 | Recommended Alternative for Low-Latency Environments |
| ------------------- | ----------------- | ----------------------- | ----------------------------------------------------- |
| Processor Settings  | Logical Processor | Enabled                 | Disabled                                               |
| Processor Settings  | Turbo Mode        | Enabled                 | Disabled                                               |
| Processor Settings  | C-States          | Enabled                 | Disabled                                               |
| Processor Settings  | C1E               | Enabled                 | Disabled                                               |
| Power Management    | Power Management  | Active Power Controller | Maximum Performance                                    |

First we use the performance addon operator; this is the officially recommended method.

The performance addon operator is an operator in OpenShift 4. Its job is to let the user write a simple piece of YAML, and then the operator takes care of the complex kernel parameter, kubelet, and tuned configuration on the user's behalf.
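
Once the PerformanceProfile shown later in this section has been created, a quick way to see what the operator generated (a sketch; object names depend on the profile name, here example-performanceprofile):

oc get performanceprofile
oc get kubeletconfig
oc get tuned -n openshift-cluster-node-tuning-operator
oc get mcp worker-rt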

# on 104, create a new worker node
export KVM_DIRECTORY=/data/kvm

mkdir -p  ${KVM_DIRECTORY}
cd ${KVM_DIRECTORY}
scp root@172.21.6.11:/data/install/{*worker-0}.iso ${KVM_DIRECTORY}/

virt-install --name=ocp4-worker0 --vcpus=4 --ram=8192 \
--disk path=/data/kvm/ocp4-worker0.qcow2,bus=virtio,size=120 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--graphics vnc,listen=127.0.0.1,port=59005 \
--boot menu=on --cdrom ${KVM_DIRECTORY}/rhcos_install-worker-0.iso 

# go back to helper
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

# install performance addon operator following offical document
# https://docs.openshift.com/container-platform/4.7/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html

cat << EOF > /data/install/worker-rt.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    machineconfiguration.openshift.io/role: worker-rt
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""

EOF
oc create -f /data/install/worker-rt.yaml

oc label MachineConfigPool/worker-rt machineconfiguration.openshift.io/role=worker-rt

# to restore
oc delete -f /data/install/worker-rt.yaml

oc label node worker-0 node-role.kubernetes.io/worker-rt=""

# the configuration below reserves cores 0-1 for the system and gives the remaining cores 2-3 to applications; on a real physical machine it is usually 2-19 for applications.
cat << EOF > /data/install/performance.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
   name: example-performanceprofile
spec:
  additionalKernelArgs:
    - selinux=0
    - intel_iommu=on
  globallyDisableIrqLoadBalancing: true
  cpu:
      isolated: "2-3"
      reserved: "0-1"
  hugepages:
      defaultHugepagesSize: "1G"
      pages:
         - size: "1G"
           count: 2
           node: 0
  realTimeKernel:
      enabled: true
  numa:  
      topologyPolicy: "single-numa-node"
  nodeSelector:
      node-role.kubernetes.io/worker-rt: ""

EOF
oc create -f /data/install/performance.yaml

# restore
oc delete -f /data/install/performance.yaml

# check the result
ssh core@worker-0
uname -a
# Linux worker-0 4.18.0-240.22.1.rt7.77.el8_3.x86_64 #1 SMP PREEMPT_RT Fri Mar 26 18:44:48 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

remove worker-0

oc delete node worker-0

virsh destroy ocp4-worker0 

virsh undefine ocp4-worker0 

Try it with machine config plus tuned; this is the DIY approach if you like :)

You can use machine config directly; this gives you full customization capability. If the customer has already fine-tuned their application on a real-time kernel OS, the machine config approach lets you apply their existing kernel parameters directly on OpenShift 4, without having to work out the tuning parameters again yourself.

# enable the real-time kernel on the node
# cat << EOF > /data/install/99-worker-realtime.yaml
# apiVersion: machineconfiguration.openshift.io/v1
# kind: MachineConfig
# metadata:
#   labels:
#     machineconfiguration.openshift.io/role: "worker-rt"
#   name: 99-worker-realtime
# spec:
#   kernelType: realtime
# EOF
# oc create -f  /data/install/99-worker-realtime.yaml

# configure kernel boot arguments, one argument per line
# http://abcdxyzk.github.io/blog/2015/02/11/kernel-base-param/
# no_timer_check clocksource=tsc tsc=perfect intel_pstate=disable selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 isolcpus=2-19 nohz_full=2-19 idle=poll default_hugepagesz=1G hugepagesz=1G hugepages=32  skew_tick=1 rcu_nocbs=2-19 kthread_cpus=0-1 irqaffinity=0-1 rcu_nocb_poll iommu=pt intel_iommu=on
cat << EOF > /data/install/05-worker-kernelarg-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
  name: 05-worker-kernelarg-realtime
spec:
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
    - no_timer_check  # disable the kernel's timer-IRQ-source defect detection; mainly works around high CPU usage and fast-running clocks on some AMD platforms
    - clocksource=tsc # clocksource={jiffies|acpi_pm|hpet|tsc}; the TSC (Time Stamp Counter) is a 64-bit register inside the CPU. Unlike a traditional periodic, interrupt-driven clock, it is a monotonically increasing counter that software reads on demand when it needs the time, so it is more precise and faster, but only usable on newer CPUs (Sandy Bridge and later).
    - tsc=perfect
    - intel_pstate=disable  # the intel_pstate driver handles frequency/power management on modern Intel processors; intel_pstate=disable forces the legacy acpi_cpufreq driver instead
    - selinux=0
    - enforcing=0
    - nmi_watchdog=0  # configure the NMI (non-maskable interrupt) watchdog; 0 turns the watchdog off
    - softlockup_panic=0  # whether the kernel should panic when a soft lockup is detected
    - isolcpus=2-19 # remove the listed CPUs from the kernel SMP balancing and scheduling algorithms. They are not completely unusable afterwards: the OS can still pin specific processes onto them (e.g. with taskset). The goal is to have only designated processes run on those CPUs.
    - nohz_full=2-19  # on a 16-core system, nohz_full=1-15 enables dynamic tickless operation on cores 1 to 15 and moves all timekeeping to the only core left out (core 0). Notes: (1) the "boot CPU" (usually CPU 0) is unconditionally excluded from the list; (2) the CPUs listed here must also be listed in "rcu_nocbs=...".
    - idle=poll # extra control over how the CPU enters idle states; poll essentially disables sleeping (no C-states), which gives a slight CPU performance gain at a large cost in power consumption; generally not recommended.
    - default_hugepagesz=1G
    - hugepagesz=1G
    - hugepages=32
    - skew_tick=1 # Offset the periodic timer tick per cpu to mitigate xtime_lock contention on larger systems, and/or RCU lock contention on all systems with CONFIG_MAXSMP set. Note: increases power consumption, thus should only be enabled if running jitter sensitive (HPC/RT) workloads.
    - rcu_nocbs=2-19  # which CPUs are No-CB (no-callback) CPUs
    - kthread_cpus=0-1
    - irqaffinity=0-1 # irqaffinity=[cpu list] sets the affinity of Linux interrupts; afterwards, interrupts that are not bound to a specific CPU are handled by these cores by default. This keeps Linux interrupts away from cpu2/cpu3, where the real-time application runs, and directs them to cpu0/cpu1.
    - rcu_nocb_poll # reduces the wake-ups that would otherwise have to be performed from the offloaded CPUs, avoiding explicit wake-ups of the rcuo kthreads; on the other hand this increases power consumption
    - iommu=pt
    - intel_iommu=on
  kernelType: realtime
EOF
oc create -f /data/install/05-worker-kernelarg-realtime.yaml

# CPU/NUMA pinning is usually required; this is done in the kubelet configuration
cat << EOF > /data/install/cpumanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static 
     cpuManagerReconcilePeriod: 5s 
     topologyManagerPolicy: single-numa-node 
     reservedSystemCPUs: "0,1" 
EOF
oc create -f  /data/install/cpumanager-kubeletconfig.yaml
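
# note (an addition, not in the original steps): the KubeletConfig above selects MachineConfigPools
# labeled custom-kubelet=cpumanager-enabled; assuming you want it applied to the worker-rt pool created
# earlier, that pool presumably needs the label as well:
oc label mcp worker-rt custom-kubelet=cpumanager-enabled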

# if irqbalance is already disabled in the BIOS, you can skip the step below.
# cat << EOF > /data/install/99-custom-disable-irqbalance-worker.yaml
# apiVersion: machineconfiguration.openshift.io/v1
# kind: MachineConfig
# metadata:
#     labels:
#         machineconfiguration.openshift.io/role: worker-rt
#     name: 99-custom-disable-irqbalance-worker
# spec:
#     config:
#         ignition:
#             version: 2.2.0
#         systemd:
#             units:
#             - enabled: false
#               mask: true
#               name: irqbalance.service
# EOF
# oc create -f /data/install/99-custom-disable-irqbalance-worker.yaml


# based on the performance addon operator's example, modified; this time we base it on the realtime profile
cat << EOF > /data/install/tuned.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: wzh-realtime
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=wzh version for realtime, 5G RAN
      include=openshift-node,realtime

      # Different values will override the original values in parent profiles.

      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7

      isolated_cores=2-19
      isolate_managed_irq=Y

      [service]
      service.stalld=start,enable

    name: wzh-realtime
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: worker-rt
    priority: 20
    profile: wzh-realtime
EOF
oc create -f /data/install/tuned.yaml

# to restore
oc delete -f  /data/install/tuned.yaml

# https://zhuanlan.zhihu.com/p/336381111
# yum install rt-tests
# at the test site, after running the whole night, the system's real-time behavior turned out to be very good
# target result: the maximum latency should not exceed 6μs
cyclictest -m -p95 -d0 -a 2-17 -t 16
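
# reading the result (a note added here, not from the original run): cyclictest prints per-thread
# Min/Act/Avg/Max latencies in microseconds; the Max column is what should stay within the target above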

Try to deploy a vDU pod using the following YAML

---

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: host-device-du
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "host-device",
    "device": "ens81f1np1",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.12.0/24",
      "rangeStart": "192.168.12.105",
      "rangeEnd": "192.168.12.105",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "192.168.12.1"
    }
  }'

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: du-deployment1
  labels:
    app: du-deployment1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: du-pod1
  template:
    metadata:
      labels: 
        app: du-pod1
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "host-device-du",
            "interface": "net1" }
          ]'
    spec:
      containers:
      - name: du-container1
        image: "registry.ocp4.redhat.ren:5443/ocp4/centos:7.6.1810"
        imagePullPolicy: IfNotPresent
        tty: true
        stdin: true
        env:
          - name: duNetProviderDriver
            value: "host-netdevice"
        command:
          - sleep
          - infinity
        securityContext:
            privileged: true
            capabilities:
                add:
                - CAP_SYS_ADMIN
        volumeMounts:
          - mountPath: /hugepages
            name: hugepage
          - name: lib-modules
            mountPath: /lib/modules
          - name: src
            mountPath: /usr/src
          - name: dev
            mountPath: /dev
          - name: cache-volume
            mountPath: /dev/shm
        resources:
          requests:
            cpu: 16
            memory: 48Gi
            hugepages-1Gi: 8Gi
          limits:
            cpu: 16
            memory: 48Gi
            hugepages-1Gi: 8Gi
      volumes:
        - name: hugepage
          emptyDir:
            medium: HugePages
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: src
          hostPath:
            path: /usr/src
        - name: dev
          hostPath:
            path: "/dev"
        - name: cache-volume
          emptyDir:
            medium: Memory
            sizeLimit: 16Gi
      nodeSelector:
        node-role.kubernetes.io/worker-rt: ""

research


oc get Tuned -n openshift-cluster-node-tuning-operator
# NAME                                                    AGE
# default                                                 18d
# openshift-node-performance-example-performanceprofile   12d
# rendered                                                18d

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  creationTimestamp: "2021-05-05T16:09:36Z"
  generation: 1
  name: default
  namespace: openshift-cluster-node-tuning-operator
  resourceVersion: "6067"
  selfLink: /apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds/default
  uid: 205c01c5-2609-4f2f-b676-ad746ea3c9f3
spec:
  profile:
  - data: |
      [main]
      summary=Optimize systems running OpenShift (parent profile)
      include=${f:virt_check:virtual-guest:throughput-performance}

      [selinux]
      avc_cache_threshold=8192

      [net]
      nf_conntrack_hashsize=131072

      [sysctl]
      net.ipv4.ip_forward=1
      kernel.pid_max=>4194304
      net.netfilter.nf_conntrack_max=1048576
      net.ipv4.conf.all.arp_announce=2
      net.ipv4.neigh.default.gc_thresh1=8192
      net.ipv4.neigh.default.gc_thresh2=32768
      net.ipv4.neigh.default.gc_thresh3=65536
      net.ipv6.neigh.default.gc_thresh1=8192
      net.ipv6.neigh.default.gc_thresh2=32768
      net.ipv6.neigh.default.gc_thresh3=65536
      vm.max_map_count=262144

      [sysfs]
      /sys/module/nvme_core/parameters/io_timeout=4294967295
      /sys/module/nvme_core/parameters/max_retries=10
    name: openshift
  - data: |
      [main]
      summary=Optimize systems running OpenShift control plane
      include=openshift

      [sysctl]
      # ktune sysctl settings, maximizing i/o throughput
      #
      # Minimal preemption granularity for CPU-bound tasks:
      # (default: 1 msec#  (1 + ilog(ncpus)), units: nanoseconds)
      kernel.sched_min_granularity_ns=10000000
      # The total time the scheduler will consider a migrated process
      # "cache hot" and thus less likely to be re-migrated
      # (system default is 500000, i.e. 0.5 ms)
      kernel.sched_migration_cost_ns=5000000
      # SCHED_OTHER wake-up granularity.
      #
      # Preemption granularity when tasks wake up.  Lower the value to
      # improve wake-up latency and throughput for latency critical tasks.
      kernel.sched_wakeup_granularity_ns=4000000
    name: openshift-control-plane
  - data: |
      [main]
      summary=Optimize systems running OpenShift nodes
      include=openshift

      [sysctl]
      net.ipv4.tcp_fastopen=3
      fs.inotify.max_user_watches=65536
      fs.inotify.max_user_instances=8192
    name: openshift-node
  recommend:
  - match:
    - label: node-role.kubernetes.io/master
    - label: node-role.kubernetes.io/infra
    operand:
      debug: false
    priority: 30
    profile: openshift-control-plane
  - operand:
      debug: false
    priority: 40
    profile: openshift-node
status: {}
oc get Tuned/openshift-node-performance-example-performanceprofile -o yaml -n openshift-cluster-node-tuning-operator
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-node-performance-example-performanceprofile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: "[main]\nsummary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)\ninclude=openshift-node,cpu-partitioning\n\n# Inheritance of base profiles legend:\n# cpu-partitioning -> network-latency -> latency-performance\n# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf\n# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf\n# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf\n\n# All values are mapped with a comment where a parent profile contains them.\n# Different values will override the original values in parent profiles.\n\n[variables]\n# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7\n\nisolated_cores=2-3 \n\n\nnot_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}\n\n[cpu]\nforce_latency=cstate.id:1|3                   #  latency-performance  (override)\ngovernor=performance                          #  latency-performance \nenergy_perf_bias=performance                  #  latency-performance \nmin_perf_pct=100                              #  latency-performance \n\n[service]\nservice.stalld=start,enable\n\n[vm]\ntransparent_hugepages=never                   #  network-latency\n\n\n[irqbalance]\n# Override the value set by cpu-partitioning with an empty one\nbanned_cpus=\"\"\n\n\n[scheduler]\ngroup.ksoftirqd=0:f:11:*:ksoftirqd.*\ngroup.rcuc=0:f:11:*:rcuc.*\n\ndefault_irq_smp_affinity = ignore\n\n\n[sysctl]\nkernel.hung_task_timeout_secs = 600           # cpu-partitioning #realtime\nkernel.nmi_watchdog = 0                       # cpu-partitioning #realtime\nkernel.sched_rt_runtime_us = -1               # realtime \nkernel.timer_migration = 0                    # cpu-partitioning (= 1) #realtime (= 0)\nkernel.numa_balancing=0                       # network-latency\nnet.core.busy_read=50                         # network-latency\nnet.core.busy_poll=50                         # network-latency\nnet.ipv4.tcp_fastopen=3                       # network-latency\nvm.stat_interval = 10                         # cpu-partitioning  #realtime\n\n# ktune sysctl settings for rhel6 servers, maximizing i/o throughput\n#\n# Minimal preemption granularity for CPU-bound tasks:\n# (default: 1 msec#  (1 + ilog(ncpus)), units: nanoseconds)\nkernel.sched_min_granularity_ns=10000000      # latency-performance\n\n# If a workload mostly uses anonymous memory and it hits this limit, the entire\n# working set is buffered for I/O, and any more write buffering would require\n# swapping, so it's time to throttle writes until I/O can catch up.  Workloads\n# that mostly use file mappings may be able to use even higher values.\n#\n# The generator of dirty data starts writeback at this percentage (system default\n# is 20%)\nvm.dirty_ratio=10                             # latency-performance\n\n# Start background writeback (via writeback threads) at this percentage (system\n# default is 10%)\nvm.dirty_background_ratio=3                   # latency-performance\n\n# The swappiness parameter controls the tendency of the kernel to move\n# processes out of physical memory and onto the swap disk.\n# 0 tells the kernel to avoid swapping processes out of physical memory\n# for as long as possible\n# 100 tells the kernel to aggressively swap processes out of physical memory\n# and move them to swap cache\nvm.swappiness=10                              # latency-performance\n\n# The total time the scheduler will consider a migrated process\n# \"cache hot\" and thus less likely to be re-migrated\n# (system default is 500000, i.e. 0.5 ms)\nkernel.sched_migration_cost_ns=5000000        # latency-performance\n\n[selinux]\navc_cache_threshold=8192                      # Custom (atomic host)\n\n[net]\nnf_conntrack_hashsize=131072                  # Custom (atomic host)\n\n[bootloader]\n# set empty values to disable RHEL initrd setting in cpu-partitioning \ninitrd_remove_dir=     \ninitrd_dst_img=\ninitrd_add_dir=\n# overrides cpu-partitioning cmdline\ncmdline_cpu_part=+nohz=on rcu_nocbs=${isolated_cores} tuned.non_isolcpus=${not_isolated_cpumask} intel_pstate=disable nosoftlockup\n\ncmdline_realtime=+tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,${isolated_cores} systemd.cpu_affinity=${not_isolated_cores_expanded}\n\ncmdline_hugepages=+ default_hugepagesz=1G  \ncmdline_additionalArg=+\n"
    name: openshift-node-performance-example-performanceprofile
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: worker-rt
    priority: 20
    profile: openshift-node-performance-example-performanceprofile
status: {}
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-node-performance-example-performanceprofile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
      include=openshift-node,cpu-partitioning

      # Inheritance of base profiles legend:
      # cpu-partitioning -> network-latency -> latency-performance
      # https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
      # https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
      # https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

      # All values are mapped with a comment where a parent profile contains them.
      # Different values will override the original values in parent profiles.

      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7

      isolated_cores=2-3


      not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

      [cpu]
      force_latency=cstate.id:1|3                   #  latency-performance  (override)
      governor=performance                          #  latency-performance
      energy_perf_bias=performance                  #  latency-performance
      min_perf_pct=100                              #  latency-performance

      [service]
      service.stalld=start,enable

      [vm]
      transparent_hugepages=never                   #  network-latency


      [irqbalance]
      # Override the value set by cpu-partitioning with an empty one
      banned_cpus=""


      [scheduler]
      group.ksoftirqd=0:f:11:*:ksoftirqd.*
      group.rcuc=0:f:11:*:rcuc.*

      default_irq_smp_affinity = ignore


      [sysctl]
      kernel.hung_task_timeout_secs = 600           # cpu-partitioning #realtime
      kernel.nmi_watchdog = 0                       # cpu-partitioning #realtime
      kernel.sched_rt_runtime_us = -1               # realtime
      kernel.timer_migration = 0                    # cpu-partitioning (= 1) #realtime (= 0)
      kernel.numa_balancing=0                       # network-latency
      net.core.busy_read=50                         # network-latency
      net.core.busy_poll=50                         # network-latency
      net.ipv4.tcp_fastopen=3                       # network-latency
      vm.stat_interval = 10                         # cpu-partitioning  #realtime

      # ktune sysctl settings for rhel6 servers, maximizing i/o throughput
      #
      # Minimal preemption granularity for CPU-bound tasks:
      # (default: 1 msec#  (1 + ilog(ncpus)), units: nanoseconds)
      kernel.sched_min_granularity_ns=10000000      # latency-performance

      # If a workload mostly uses anonymous memory and it hits this limit, the entire
      # working set is buffered for I/O, and any more write buffering would require
      # swapping, so it's time to throttle writes until I/O can catch up.  Workloads
      # that mostly use file mappings may be able to use even higher values.
      #
      # The generator of dirty data starts writeback at this percentage (system default
      # is 20%)
      vm.dirty_ratio=10                             # latency-performance

      # Start background writeback (via writeback threads) at this percentage (system
      # default is 10%)
      vm.dirty_background_ratio=3                   # latency-performance

      # The swappiness parameter controls the tendency of the kernel to move
      # processes out of physical memory and onto the swap disk.
      # 0 tells the kernel to avoid swapping processes out of physical memory
      # for as long as possible
      # 100 tells the kernel to aggressively swap processes out of physical memory
      # and move them to swap cache
      vm.swappiness=10                              # latency-performance

      # The total time the scheduler will consider a migrated process
      # "cache hot" and thus less likely to be re-migrated
      # (system default is 500000, i.e. 0.5 ms)
      kernel.sched_migration_cost_ns=5000000        # latency-performance

      [selinux]
      avc_cache_threshold=8192                      # Custom (atomic host)

      [net]
      nf_conntrack_hashsize=131072                  # Custom (atomic host)

      [bootloader]
      # set empty values to disable RHEL initrd setting in cpu-partitioning
      initrd_remove_dir=
      initrd_dst_img=
      initrd_add_dir=
      # overrides cpu-partitioning cmdline
      cmdline_cpu_part=+nohz=on rcu_nocbs=${isolated_cores} tuned.non_isolcpus=${not_isolated_cpumask} intel_pstate=disable nosoftlockup

      cmdline_realtime=+tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,${isolated_cores} systemd.cpu_affinity=${not_isolated_cores_expanded}

      cmdline_hugepages=+ default_hugepagesz=1G
      cmdline_additionalArg=+
    name: openshift-node-performance-example-performanceprofile
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: worker-rt
    priority: 20
    profile: openshift-node-performance-example-performanceprofile
# for the tuned settings, anything already covered in the BIOS can be ignored. We take the performance addon example and modify it.
cat << EOF > /data/install/tuned.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-node-wzh-performance-profile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
      include=openshift-node,cpu-partitioning

      # Inheritance of base profiles legend:
      # cpu-partitioning -> network-latency -> latency-performance
      # https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
      # https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
      # https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

      # All values are mapped with a comment where a parent profile contains them.
      # Different values will override the original values in parent profiles.

      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7

      isolated_cores=2-19
      isolate_managed_irq=Y

      not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

      [cpu]
      # force_latency=cstate.id:1|3                   #  latency-performance  (override)
      governor=performance                          #  latency-performance
      energy_perf_bias=performance                  #  latency-performance
      min_perf_pct=100                              #  latency-performance

      [service]
      service.stalld=start,enable

      [vm]
      transparent_hugepages=never                   #  network-latency


      [irqbalance]
      # Override the value set by cpu-partitioning with an empty one
      banned_cpus=""


      [scheduler]
      group.ksoftirqd=0:f:11:*:ksoftirqd.*
      group.rcuc=0:f:11:*:rcuc.*

      default_irq_smp_affinity = ignore


      [sysctl]
      kernel.hung_task_timeout_secs = 600           # cpu-partitioning #realtime
      kernel.nmi_watchdog = 0                       # cpu-partitioning #realtime
      kernel.sched_rt_runtime_us = -1               # realtime
      kernel.timer_migration = 0                    # cpu-partitioning (= 1) #realtime (= 0)
      kernel.numa_balancing=0                       # network-latency
      net.core.busy_read=50                         # network-latency
      net.core.busy_poll=50                         # network-latency
      net.ipv4.tcp_fastopen=3                       # network-latency
      vm.stat_interval = 10                         # cpu-partitioning  #realtime

      # ktune sysctl settings for rhel6 servers, maximizing i/o throughput
      #
      # Minimal preemption granularity for CPU-bound tasks:
      # (default: 1 msec#  (1 + ilog(ncpus)), units: nanoseconds)
      kernel.sched_min_granularity_ns=10000000      # latency-performance

      # If a workload mostly uses anonymous memory and it hits this limit, the entire
      # working set is buffered for I/O, and any more write buffering would require
      # swapping, so it's time to throttle writes until I/O can catch up.  Workloads
      # that mostly use file mappings may be able to use even higher values.
      #
      # The generator of dirty data starts writeback at this percentage (system default
      # is 20%)
      vm.dirty_ratio=10                             # latency-performance

      # Start background writeback (via writeback threads) at this percentage (system
      # default is 10%)
      vm.dirty_background_ratio=3                   # latency-performance

      # The swappiness parameter controls the tendency of the kernel to move
      # processes out of physical memory and onto the swap disk.
      # 0 tells the kernel to avoid swapping processes out of physical memory
      # for as long as possible
      # 100 tells the kernel to aggressively swap processes out of physical memory
      # and move them to swap cache
      vm.swappiness=10                              # latency-performance

      # The total time the scheduler will consider a migrated process
      # "cache hot" and thus less likely to be re-migrated
      # (system default is 500000, i.e. 0.5 ms)
      kernel.sched_migration_cost_ns=5000000        # latency-performance

      [selinux]
      avc_cache_threshold=8192                      # Custom (atomic host)

      [net]
      nf_conntrack_hashsize=131072                  # Custom (atomic host)

      [bootloader]
      # set empty values to disable RHEL initrd setting in cpu-partitioning
      initrd_remove_dir=
      initrd_dst_img=
      initrd_add_dir=
      # overrides cpu-partitioning cmdline
      cmdline_cpu_part=+nohz=on rcu_nocbs= tuned.non_isolcpus= intel_pstate=disable nosoftlockup

      cmdline_realtime=+tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq, systemd.cpu_affinity=

      cmdline_hugepages=+ default_hugepagesz=1G
      cmdline_additionalArg=+
    name: openshift-node-wzh-performance-profile
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: worker-rt
    priority: 20
    profile: openshift-node-wzh-performance-profile
EOF
oc create -f /data/install/tuned.yaml

# using the profile from the performance example gives very good results
cyclictest -m -p95 -d0 -a 2-17 -t 16

example config

oc get mc
NAME                                                  GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                             791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
00-worker                                             791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
01-master-container-runtime                           791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
01-master-kubelet                                     791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
01-worker-container-runtime                           791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
01-worker-kubelet                                     791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
05-worker-kernelarg-rtran                                                                        3.1.0             62d
50-nto-worker-rt                                                                                 3.1.0             58d
99-master-generated-registries                        791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
99-master-ssh                                                                                    3.1.0             62d
99-worker-generated-registries                        791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
99-worker-realtime                                                                                                 62d
99-worker-rt-generated-kubelet                        791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
99-worker-ssh                                                                                    3.1.0             62d
load-sctp-module                                                                                 3.1.0             6d9h
rendered-master-0629f16bcba29a60e894f3d9e14e47b9      791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
rendered-worker-7497d1b2e86631a4f390a6eba0aef74f      791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
rendered-worker-rt-1e40da418635be6c6b81ebc33a1f0640   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
rendered-worker-rt-35d27df9ed0ff75a6a192700313a88f8   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
rendered-worker-rt-3e87a41fe1e455977a4a972f8d4258aa   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
rendered-worker-rt-4ba64193fdbace8fc101541335067ad4   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
rendered-worker-rt-7497d1b2e86631a4f390a6eba0aef74f   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             62d
rendered-worker-rt-9cf8ebbc1c0cf88bb3a9716b6d66e60e   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
rendered-worker-rt-bb3c16a689e7797fb4c828cec877c9ed   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
rendered-worker-rt-ea53e6c4fc58b5f9f505ebed3cb32345   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             58d
rendered-worker-rt-fd13902df04099f149d7653da3552f5d   791d1cc2626d1e4e5da59f15c1a6166fd398aef8   3.1.0             6d9h
oc get mc/05-worker-kernelarg-rtran -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "machineconfiguration.openshift.io/v1",
  "kind": "MachineConfig",
  "metadata": {
    "labels": {
      "machineconfiguration.openshift.io/role": "worker-rt"
    },
    "name": "05-worker-kernelarg-rtran"
  },
  "spec": {
    "config": {
      "ignition": {
        "version": "3.1.0"
      }
    },
    "kernelArguments": [
      "no_timer_check",
      "clocksource=tsc",
      "tsc=perfect",
      "selinux=0",
      "enforcing=0",
      "nmi_watchdog=0",
      "softlockup_panic=0",
      "isolcpus=2-19",
      "nohz_full=2-19",
      "idle=poll",
      "default_hugepagesz=1G",
      "hugepagesz=1G",
      "hugepages=16",
      "skew_tick=1",
      "rcu_nocbs=2-19",
      "kthread_cpus=0-1",
      "irqaffinity=0-1",
      "rcu_nocb_poll",
      "iommu=pt",
      "intel_iommu=on"
    ]
  }
}
oc get mc/50-nto-worker-rt -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "machineconfiguration.openshift.io/v1",
  "kind": "MachineConfig",
  "metadata": {
    "annotations": {
      "tuned.openshift.io/generated-by-controller-version": "v4.6.0-202104221811.p0-0-gfdb7aec-dirty"
    },
    "labels": {
      "machineconfiguration.openshift.io/role": "worker-rt"
    },
    "name": "50-nto-worker-rt"
  },
  "spec": {
    "config": {
      "ignition": {
        "config": {
          "replace": {
            "verification": {}
          }
        },
        "proxy": {},
        "security": {
          "tls": {}
        },
        "timeouts": {},
        "version": "3.1.0"
      },
      "passwd": {},
      "storage": {},
      "systemd": {}
    },
    "extensions": null,
    "fips": false,
    "kernelArguments": [
      "skew_tick=1",
      "isolcpus=managed_irq,domain,2-19",
      "intel_pstate=disable",
      "nosoftlockup",
      "tsc=nowatchdog"
    ],
    "kernelType": "",
    "osImageURL": ""
  }
}
oc get mc/99-worker-realtime -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "machineconfiguration.openshift.io/v1",
  "kind": "MachineConfig",
  "metadata": {
    "labels": {
      "machineconfiguration.openshift.io/role": "worker-rt"
    },
    "name": "99-worker-realtime"
  },
  "spec": {
    "kernelType": "realtime"
  }
}
oc get mc/load-sctp-module -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "machineconfiguration.openshift.io/v1",
  "kind": "MachineConfig",
  "metadata": {
    "labels": {
      "machineconfiguration.openshift.io/role": "worker-rt"
    },
    "name": "load-sctp-module"
  },
  "spec": {
    "config": {
      "ignition": {
        "version": "3.1.0"
      },
      "storage": {
        "files": [
          {
            "contents": {
              "source": "data:,"
            },
            "mode": 420,
            "overwrite": true,
            "path": "/etc/modprobe.d/sctp-blacklist.conf"
          },
          {
            "contents": {
              "source": "data:,sctp"
            },
            "mode": 420,
            "overwrite": true,
            "path": "/etc/modules-load.d/sctp-load.conf"
          }
        ]
      }
    }
  }
}

oc get Tuned -n openshift-cluster-node-tuning-operator
NAME           AGE
default        62d
rendered       62d
wzh-realtime   58d

oc get Tuned/wzh-realtime -n openshift-cluster-node-tuning-operator -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)" 
{
  "apiVersion": "tuned.openshift.io/v1",
  "kind": "Tuned",
  "metadata": {
    "name": "wzh-realtime",
    "namespace": "openshift-cluster-node-tuning-operator"
  },
  "spec": {
    "profile": [
      {
        "data": "[main]\nsummary=wzh version for realtime, 5G RAN\ninclude=openshift-node,realtime\n\n# Inheritance of base profiles legend:\n# cpu-partitioning -> network-latency -> latency-performance\n# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf\n# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf\n# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf\n\n# All values are mapped with a comment where a parent profile contains them.\n# Different values will override the original values in parent profiles.\n\n[variables]\n# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7\n\nisolated_cores=2-19\nisolate_managed_irq=Y\n",
        "name": "wzh-realtime"
      }
    ],
    "recommend": [
      {
        "machineConfigLabels": {
          "machineconfiguration.openshift.io/role": "worker-rt"
        },
        "priority": 20,
        "profile": "wzh-realtime"
      }
    ]
  }
}
oc get Tuned/wzh-realtime -n openshift-cluster-node-tuning-operator -o json | jq ".spec.profile[0].data" | jq -r
[main]
summary=wzh version for realtime, 5G RAN
include=openshift-node,realtime

# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.

[variables]
# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7

isolated_cores=2-19
isolate_managed_irq=Y
oc get deployment.apps/du-deployment1 -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "annotations": {
      "deployment.kubernetes.io/revision": "1",
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"du-deployment1\"},\"name\":\"du-deployment1\",\"namespace\":\"default\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app\":\"du-pod1\"}},\"template\":{\"metadata\":{\"annotations\":{\"k8s.v1.cni.cncf.io/networks\":\"[ { \\\"name\\\": \\\"host-device-du\\\", \\\"interface\\\": \\\"veth11\\\" } ]\"},\"labels\":{\"app\":\"du-pod1\"}},\"spec\":{\"containers\":[{\"command\":[\"sleep\",\"infinity\"],\"env\":[{\"name\":\"duNetProviderDriver\",\"value\":\"host-netdevice\"}],\"image\":\"registry.ocp4.redhat.ren:5443/ocp4/du:v1-wzh\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"du-container1\",\"resources\":{\"limits\":{\"cpu\":16,\"hugepages-1Gi\":\"8Gi\",\"memory\":\"48Gi\"},\"requests\":{\"cpu\":16,\"hugepages-1Gi\":\"8Gi\",\"memory\":\"48Gi\"}},\"securityContext\":{\"capabilities\":{\"add\":[\"CAP_SYS_ADMIN\"]},\"privileged\":true},\"stdin\":true,\"tty\":true,\"volumeMounts\":[{\"mountPath\":\"/hugepages\",\"name\":\"hugepage\"},{\"mountPath\":\"/lib/modules\",\"name\":\"lib-modules\"},{\"mountPath\":\"/usr/src\",\"name\":\"src\"},{\"mountPath\":\"/dev\",\"name\":\"dev\"},{\"mountPath\":\"/dev/shm\",\"name\":\"cache-volume\"}]}],\"nodeSelector\":{\"node-role.kubernetes.io/worker-rt\":\"\"},\"volumes\":[{\"emptyDir\":{\"medium\":\"HugePages\"},\"name\":\"hugepage\"},{\"hostPath\":{\"path\":\"/lib/modules\"},\"name\":\"lib-modules\"},{\"hostPath\":{\"path\":\"/usr/src\"},\"name\":\"src\"},{\"hostPath\":{\"path\":\"/dev\"},\"name\":\"dev\"},{\"emptyDir\":{\"medium\":\"Memory\",\"sizeLimit\":\"16Gi\"},\"name\":\"cache-volume\"}]}}}}\n"
    },
    "labels": {
      "app": "du-deployment1"
    },
    "name": "du-deployment1",
    "namespace": "default"
  },
  "spec": {
    "progressDeadlineSeconds": 600,
    "replicas": 1,
    "revisionHistoryLimit": 10,
    "selector": {
      "matchLabels": {
        "app": "du-pod1"
      }
    },
    "strategy": {
      "rollingUpdate": {
        "maxSurge": "25%",
        "maxUnavailable": "25%"
      },
      "type": "RollingUpdate"
    },
    "template": {
      "metadata": {
        "annotations": {
          "k8s.v1.cni.cncf.io/networks": "[ { \"name\": \"host-device-du\", \"interface\": \"veth11\" } ]"
        },
        "creationTimestamp": null,
        "labels": {
          "app": "du-pod1"
        }
      },
      "spec": {
        "containers": [
          {
            "command": [
              "sleep",
              "infinity"
            ],
            "env": [
              {
                "name": "duNetProviderDriver",
                "value": "host-netdevice"
              }
            ],
            "image": "registry.ocp4.redhat.ren:5443/ocp4/du:v1-wzh",
            "imagePullPolicy": "IfNotPresent",
            "name": "du-container1",
            "resources": {
              "limits": {
                "cpu": "16",
                "hugepages-1Gi": "8Gi",
                "memory": "48Gi"
              },
              "requests": {
                "cpu": "16",
                "hugepages-1Gi": "8Gi",
                "memory": "48Gi"
              }
            },
            "securityContext": {
              "capabilities": {
                "add": [
                  "CAP_SYS_ADMIN"
                ]
              },
              "privileged": true
            },
            "stdin": true,
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "tty": true,
            "volumeMounts": [
              {
                "mountPath": "/hugepages",
                "name": "hugepage"
              },
              {
                "mountPath": "/lib/modules",
                "name": "lib-modules"
              },
              {
                "mountPath": "/usr/src",
                "name": "src"
              },
              {
                "mountPath": "/dev",
                "name": "dev"
              },
              {
                "mountPath": "/dev/shm",
                "name": "cache-volume"
              }
            ]
          }
        ],
        "dnsPolicy": "ClusterFirst",
        "nodeSelector": {
          "node-role.kubernetes.io/worker-rt": ""
        },
        "restartPolicy": "Always",
        "schedulerName": "default-scheduler",
        "securityContext": {},
        "terminationGracePeriodSeconds": 30,
        "volumes": [
          {
            "emptyDir": {
              "medium": "HugePages"
            },
            "name": "hugepage"
          },
          {
            "hostPath": {
              "path": "/lib/modules",
              "type": ""
            },
            "name": "lib-modules"
          },
          {
            "hostPath": {
              "path": "/usr/src",
              "type": ""
            },
            "name": "src"
          },
          {
            "hostPath": {
              "path": "/dev",
              "type": ""
            },
            "name": "dev"
          },
          {
            "emptyDir": {
              "medium": "Memory",
              "sizeLimit": "16Gi"
            },
            "name": "cache-volume"
          }
        ]
      }
    }
  },
  "status": {
    "availableReplicas": 1,
    "conditions": [
      {
        "lastTransitionTime": "2021-07-21T06:21:57Z",
        "lastUpdateTime": "2021-07-21T06:23:05Z",
        "message": "ReplicaSet \"du-deployment1-d5dc9854d\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2021-07-21T11:07:55Z",
        "lastUpdateTime": "2021-07-21T11:07:55Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      }
    ],
    "observedGeneration": 7,
    "readyReplicas": 1,
    "replicas": 1,
    "updatedReplicas": 1
  }
}

oc get net-attach-def
# NAME             AGE
# host-device-du   6h32m
# macvlan-conf     23d

oc get net-attach-def/host-device-du -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)"
{
  "apiVersion": "k8s.cni.cncf.io/v1",
  "kind": "NetworkAttachmentDefinition",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"k8s.cni.cncf.io/v1\",\"kind\":\"NetworkAttachmentDefinition\",\"metadata\":{\"annotations\":{},\"name\":\"host-device-du\",\"namespace\":\"default\"},\"spec\":{\"config\":\"{ \\\"cniVersion\\\": \\\"0.3.0\\\", \\\"type\\\": \\\"host-device\\\", \\\"device\\\": \\\"ens81f1np1\\\", \\\"ipam\\\": { \\\"type\\\": \\\"host-local\\\", \\\"subnet\\\": \\\"192.168.12.0/24\\\", \\\"rangeStart\\\": \\\"192.168.12.105\\\", \\\"rangeEnd\\\": \\\"192.168.12.105\\\", \\\"routes\\\": [{ \\\"dst\\\": \\\"0.0.0.0/0\\\" }], \\\"gateway\\\": \\\"192.168.12.1\\\" } }\"}}\n"
    },
    "name": "host-device-du",
    "namespace": "default"
  },
  "spec": {
    "config": "{ \"cniVersion\": \"0.3.0\", \"type\": \"host-device\", \"device\": \"ens18f1\", \"ipam\": { \"type\": \"host-local\", \"subnet\": \"192.168.12.0/24\", \"rangeStart\": \"192.168.12.105\", \"rangeEnd\": \"192.168.12.105\", \"routes\": [{ \"dst\": \"0.0.0.0/0\" }], \"gateway\": \"192.168.12.1\" } }"
  }
}
oc get net-attach-def/host-device-du -o json | jq "del(.metadata.managedFields, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.generation, .metadata.creationTimestamp)" | jq .spec.config | jq "fromjson"
{
  "cniVersion": "0.3.0",
  "type": "host-device",
  "device": "ens18f1",
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.12.0/24",
    "rangeStart": "192.168.12.105",
    "rangeEnd": "192.168.12.105",
    "routes": [
      {
        "dst": "0.0.0.0/0"
      }
    ],
    "gateway": "192.168.12.1"
  }
}

Injecting kernel modules (kmod / driver) into the host from a container

The biggest use case for injecting a kmod/driver from a container into the host is installing GPU and DPU drivers on a container platform; NVIDIA's GPU driver (the nvidia gpu operator) is delivered exactly this way, injected from a container into the host.

Another major use case is security platforms such as RHACS/StackRox, which inject a kernel module into the host for system monitoring.

Video walkthrough:

First, test a standalone version with podman

# on a centos8 to test the driver build
# https://blog.sourcerer.io/writing-a-simple-linux-kernel-module-d9dc3762c234

yum install -y epel-release
yum update -y
yum install -y byobu podman buildah

mkdir -p /data/kmod
cd /data/kmod

podman run -it --rm quay.io/generic/centos8 bash

# below will input/run in the container
dnf update -y

dnf install -y make gcc wget perl createrepo kernel-core-$(uname -r) kernel-devel-$(uname -r) pciutils python36-devel ethtool lsof elfutils-libelf-devel rpm-build kernel-rpm-macros python36 tk numactl-libs libmnl tcl binutils kmod procps git autoconf automake libtool hostname

mkdir -p ~/src/lkm_example
cd ~/src/lkm_example

cat << 'EOF' > lkm_example.c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Wandering Star");
MODULE_DESCRIPTION("A simple example Linux module.");
MODULE_VERSION("0.01");
static int __init lkm_example_init(void) {
 printk(KERN_INFO "Hello, World, Wandering Star!\n");
 return 0;
}
static void __exit lkm_example_exit(void) {
 printk(KERN_INFO "Goodbye, World, Wandering Star!\n");
}
module_init(lkm_example_init);
module_exit(lkm_example_exit);

EOF

cat << EOF > Makefile
obj-m += lkm_example.o
all:
    make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
clean:
    make -C/lib/modules/$(uname -r)/build M=$(pwd) clean
EOF
sed -i 's/^    /\t/g' Makefile

make
insmod lkm_example.ko
# insmod: ERROR: could not insert module lkm_example.ko: Operation not permitted

# PoC again with privileged
podman run -it --rm --privileged quay.io/generic/centos8 bash

# do the same above again
# yum install .............. 
# ........
# make

insmod lkm_example.ko

# go to host
dmesg | grep Wandering
# [ 5197.673179] Hello, World, Wandering Star!

lsmod | grep example
# lkm_example            16384  0

try the demo on openshift4

first, we try to get rpm repo offline

  • https://www.openshift.com/blog/how-to-use-entitled-image-builds-to-build-drivercontainers-with-ubi-on-openshift
# on a vultr host, centos7
mkdir -p /data/rhel8/entitle
cd /data/rhel8/entitle

# goto https://access.redhat.com/management/subscriptions
# search employee sku, find a system, go into, and download from subscription
# or goto: https://access.redhat.com/management/systems/4d1e4cc0-2c99-4431-99ce-2f589a24ea11/subscriptions
yum install -y unzip 
unzip *
unzip consumer_export.zip
find . -name *.pem -exec cp {} ./ \;

# podman run -ti --mount type=bind,source=/data/rhel8/entitle/$(ls *.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement.pem  --mount type=bind,source=/data/rhel8/entitle/$(ls *.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement-key.pem registry.access.redhat.com/ubi8:latest bash -c "dnf search kernel-devel --showduplicates"

mkdir -p /data/rhel8/dnf

podman run -it --rm -v /data/rhel8/dnf:/data/dnf:z \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement.pem  \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement-key.pem \
    registry.access.redhat.com/ubi8:8.3 bash

cd /data/dnf
# dnf -y --enablerepo=rhel-8-for-x86_64-baseos-rpms --releasever=8.3 install make gcc wget perl createrepo  pciutils python36-devel ethtool lsof elfutils-libelf-devel rpm-build kernel-rpm-macros python36 tk numactl-libs libmnl tcl binutils kmod procps git autoconf automake libtool hostname kernel-core-$(uname -r) kernel-devel-$(uname -r)

dnf -y --enablerepo=rhel-8-for-x86_64-baseos-rpms --releasever=8.3 install createrepo  

dnf -y download --resolve --alldeps --releasever=8.3 \
make gcc wget perl createrepo  pciutils python36-devel ethtool lsof elfutils-libelf-devel rpm-build kernel-rpm-macros python36 tk numactl-libs libmnl tcl binutils kmod procps git autoconf automake libtool hostname kernel-core-4.18.0-240.22.1.el8_3.x86_64 kernel-devel-4.18.0-240.22.1.el8_3.x86_64

dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# dnf install -y https://kojipkgs.fedoraproject.org//packages/modulemd-tools/0.9/1.fc32/noarch/modulemd-tools-0.9-1.fc32.noarch.rpm
# https://copr.fedorainfracloud.org/coprs/frostyx/modulemd-tools/
dnf copr enable -y frostyx/modulemd-tools
dnf install -y modulemd-tools

createrepo ./
repo2module . \
    --module-name foo \
    --module-stream devel \
    --module-version 123 \
    --module-context f32
createrepo_mod .

# back to host
cd /data/rhel8
tar zcvf dnf.tgz dnf/

# upload dnf.tgz to helper /var/www/html/
# on helper
cd /var/www/html/
tar zvxf dnf.tgz


We will use an entrypoint script; the entrypoint script file is located here (a rough sketch follows).
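The script itself is the one linked above; purely as an illustration, a minimal kmod.entrypoint.sh along these lines could rebuild and insert the example module using the offline repo prepared earlier (the repo baseurl and the location of the module sources inside the image are assumptions):

# hypothetical sketch only, not the original kmod.entrypoint.sh
cat << 'EOF' > /data/kmod/kmod.entrypoint.sh
#!/bin/bash
set -e

# point dnf at the offline repo prepared on the helper node
# (assumed baseurl, adjust to your helper)
cat << 'REPO' > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL-Mirror
baseurl=http://helper.example.com/dnf
enabled=1
gpgcheck=0
REPO

# toolchain plus the kernel-devel matching the host kernel
dnf install -y make gcc kmod kernel-devel-$(uname -r) kernel-core-$(uname -r)

# build and insert the example module
# (assumes the lkm_example sources were also copied into the image)
cd /root/lkm_example
make
insmod lkm_example.ko || true

# keep the pod running so the result stays easy to inspect
sleep infinity
EOF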

# on helper
mkdir -p /data/kmod
cd /data/kmod

cat << EOF > /data/kmod/Dockerfile
FROM registry.access.redhat.com/ubi8

WORKDIR /
COPY kmod.entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

EOF

buildah bud -t quay.io/wangzheng422/qimgs:kmod-demo.02 -f Dockerfile .
buildah push quay.io/wangzheng422/qimgs:kmod-demo.02

cd /data/install
cat << EOF > kmod-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kmod-example
spec:
  nodeSelector:
    kubernetes.io/hostname: 'master-2'
  restartPolicy: Never
  containers:
  - securityContext:
      privileged: true
    image: quay.io/wangzheng422/qimgs:kmod-demo.02
    imagePullPolicy: Always
    name: kmod-example

EOF
oc create -n demo -f kmod-pod.yaml

# to restore
oc delete -n demo -f kmod-pod.yaml

# login to master-2
ssh core@master-2
lsmod | grep example
# lkm_example            16384  0

dmesg | grep Wandering
# [40933.691925] Hello, World, Wandering Star!

RHACS/StackRox use case

We have already injected the kernel module into the host, but for software like this to work properly we usually also need to mount host directories such as /sys and /dev into the container; a sketch of that mounting pattern for RHACS/StackRox follows.
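This is not the actual RHACS/StackRox manifest, just a minimal sketch of the pattern: a privileged pod that mounts the host's /sys and /dev (the image is the demo image from above, used purely as a placeholder):

cat << EOF > /data/install/host-mount-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-mount-demo
spec:
  restartPolicy: Never
  containers:
  - name: collector-like
    image: quay.io/wangzheng422/qimgs:kmod-demo.02
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /host/sys
      name: sys
      readOnly: true
    - mountPath: /host/dev
      name: dev
  volumes:
  - name: sys
    hostPath:
      path: /sys
  - name: dev
    hostPath:
      path: /dev
EOF
oc create -n demo -f /data/install/host-mount-demo.yaml

# to restore
oc delete -n demo -f /data/install/host-mount-demo.yaml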

others


mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak
cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL-Mirror
baseurl=http://v.redhat.ren:8080/
enabled=1
gpgcheck=0

EOF


GPU/vGPU sharing on openshift4

More and more AI/ML applications run on openshift/k8s clusters, and most of them need GPUs. However, in the official NVIDIA/k8s device plugin, GPUs are scheduled one whole card at a time, which at the k8s scheduling level leads to wasted GPU resources.

Fortunately the community has several solutions for this. Aliyun's, for example, is relatively simple, with correspondingly simple functionality. This article tries to run aliyun's GPU sharing solution on openshift4.

Because aliyun-style solutions are mostly built on nvidia-docker while openshift4 uses cri-o, a bit of customization is needed.

Due to time constraints, this article only gets the solution roughly working end to end; a polished setup needs more customization, which is left for future project work.

Note

  • This is a sharing solution at the scheduling level; it does not isolate the shared workloads from each other.

todo

  • Verify it in a real multi-GPU environment.
  • Harden the security of the scheduler extender.

Video walkthrough

Deploy and run the scheduler extender

Aliyun-style solutions all work by extending the k8s scheduler to add the missing capability. In recent openshift4 releases this scheduler extension can be activated purely through configuration.

cd /data/install
cat << EOF > ./policy.cfg
    {
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
            {"name" : "MaxGCEPDVolumeCount"},
            {"name" : "GeneralPredicates"},
            {"name" : "MaxAzureDiskVolumeCount"},
            {"name" : "MaxCSIVolumeCountPred"},
            {"name" : "CheckVolumeBinding"},
            {"name" : "MaxEBSVolumeCount"},
            {"name" : "MatchInterPodAffinity"},
            {"name" : "CheckNodeUnschedulable"},
            {"name" : "NoDiskConflict"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "PodToleratesNodeTaints"}
            ],
    "priorities" : [
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
            {"name" : "NodeAffinityPriority", "weight" : 1},
            {"name" : "TaintTolerationPriority", "weight" : 1},
            {"name" : "ImageLocalityPriority", "weight" : 1},
            {"name" : "SelectorSpreadPriority", "weight" : 1},
            {"name" : "InterPodAffinityPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
            ],
    "extenders": [
            {
              "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
              "filterVerb": "filter",
              "bindVerb":   "bind",
              "enableHttps": false,
              "nodeCacheCapable": true,
              "managedResources": [
                {
                  "name": "aliyun.com/gpu-mem",
                  "ignoredByScheduler": false
                }
              ],
              "ignorable": false
            }
          ]
    }
   
EOF
oc delete configmap -n openshift-config  scheduler-policy
oc create configmap -n openshift-config --from-file=policy.cfg scheduler-policy

oc patch Scheduler cluster --type=merge -p '{"spec":{"policy":{"name":"scheduler-policy"}}}'

Then we can deploy the scheduler extender.

curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
# replace docker image
cd /data/install
sed -i 's/image:.*/image: quay.io\/wangzheng422\/qimgs:gpushare-scheduler-extender-2021-02-26-1339/' gpushare-schd-extender.yaml
oc delete -f gpushare-schd-extender.yaml
oc create -f gpushare-schd-extender.yaml

Add a catalog source to the operator hub

We customized the nvidia gpu-operator, so we need to add our new operator to the operator hub.

#
cat << EOF > /data/ocp4/my-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: wzh-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: WZH Operator Catalog
  image: 'quay.io/wangzheng422/qimgs:registry-wzh-index.2021-02-28-1446'
  publisher: WZH
  sourceType: grpc
EOF
oc create -f  /data/ocp4/my-catalog.yaml

oc delete -f /data/ocp4/my-catalog.yaml

At this point we can find 2 gpu-operators in the operator hub.

Install the gpu-operator and configure ClusterPolicies

Click install on the "nvidia & wzh" one.

After the installation succeeds, create the project gpu-operator-resources.

Then, in the project gpu-operator-resources, create a ClusterPolicy for the gpu-operator using the template below. Note that the driver needs an offline yum repo; prepare it following the referenced steps (a sketch of the repo-config ConfigMap it expects is shown right below).
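Before creating the ClusterPolicy, the driver container needs the repo-config ConfigMap referenced by spec.driver.repoConfig below. A hedged sketch, assuming the offline yum repo is served the same way as the remote.repo example later in this document:

cat << EOF > /data/ocp4/local.repo
[local]
name=RHEL-Mirror
baseurl=http://v.redhat.ren:8080/
enabled=1
gpgcheck=0
EOF
oc create configmap repo-config -n gpu-operator-resources --from-file=/data/ocp4/local.repo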


apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  dcgmExporter:
    nodeSelector: {}
    imagePullSecrets: []
    resources: {}
    affinity: {}
    podSecurityContext: {}
    repository: nvcr.io/nvidia/k8s
    securityContext: {}
    version: 'sha256:85016e39f73749ef9769a083ceb849cae80c31c5a7f22485b3ba4aa590ec7b88'
    image: dcgm-exporter
    tolerations: []
  devicePlugin:
    nodeSelector: {}
    imagePullSecrets: []
    resources: {}
    affinity: {}
    podSecurityContext: {}
    repository: quay.io/wangzheng422
    securityContext: {}
    version: gpu-aliyun-device-plugin-2021-02-24-1346
    image: qimgs
    tolerations: []
    args:
      - 'gpushare-device-plugin-v2'
      - '-logtostderr'
      - '--v=5'
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
  driver:
    nodeSelector: {}
    imagePullSecrets: []
    resources: {}
    affinity: {}
    podSecurityContext: {}
    repository: nvcr.io/nvidia
    securityContext: {}
    repoConfig:
      configMapName: repo-config
      destinationDir: /etc/yum.repos.d
    version: 'sha256:324e9dc265dec320207206aa94226b0c8735fd93ce19b36a415478c95826d934'
    image: driver
    tolerations: []
  gfd:
    nodeSelector: {}
    imagePullSecrets: []
    resources: {}
    affinity: {}
    podSecurityContext: {}
    repository: nvcr.io/nvidia
    securityContext: {}
    version: 'sha256:8d068b7b2e3c0b00061bbff07f4207bd49be7d5bfbff51fdf247bc91e3f27a14'
    image: gpu-feature-discovery
    tolerations: []
    migStrategy: single
    sleepInterval: 60s
  operator:
    defaultRuntime: crio
    validator:
      image: cuda-sample
      imagePullSecrets: []
      repository: nvcr.io/nvidia/k8s
      version: 'sha256:2a30fe7e23067bc2c3f8f62a6867702a016af2b80b9f6ce861f3fea4dfd85bc2'
    deployGFD: true
  toolkit:
    nodeSelector: {}
    imagePullSecrets: []
    resources: {}
    affinity: {}
    podSecurityContext: {}
    repository: nvcr.io/nvidia/k8s
    securityContext: {}
    version: 'sha256:81295a9eca36cbe5d94b80732210b8dc7276c6ef08d5a60d12e50479b9e542cd'
    image: container-toolkit
    tolerations: []

With that, the gpu-operator installation is done. You may notice that the device-plugin validator does not run: this is because we customized the scheduler, and nvidia.com/gpu has been replaced by aliyun.com/gpu-mem. Fixing this cleanly needs more customization, but since the system already behaves as expected, that work is left for a future project.

Test it

Let's actually test the effect.
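Before creating the test deployment, a quick way to confirm that the node now advertises aliyun.com/gpu-mem instead of nvidia.com/gpu (worker-0 is an assumed node name):

oc describe node worker-0 | grep -E 'aliyun.com/gpu-mem|nvidia.com/gpu'

# or across all nodes
oc get node -o json | jq '.items[].status.allocatable | with_entries(select(.key | contains("gpu")))'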

cat << EOF > /data/ocp4/gpu.test.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  labels:
    app: demo1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1
    spec:
      # nodeSelector:
      #   kubernetes.io/hostname: 'worker-0'
      restartPolicy: Always
      containers:
        - name: demo1
          image: "docker.io/wangzheng422/imgs:tensorrt-ljj-2021-01-21-1151"
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['ALIYUN_COM_GPU_MEM_IDX']
          resources:
            limits:
              # GiB
              aliyun.com/gpu-mem: 3

EOF
oc create -n demo -f /data/ocp4/gpu.test.yaml


Exec into the test container and look at the environment variables: NVIDIA_VISIBLE_DEVICES has been set automatically.

Looking at the scheduler extender logs, we can see the scheduler trying to add an annotation to the pod.

Looking at the device-plugin logs, we can see the device-plugin comparing memory and picking a GPU device. A hedged set of commands for these checks follows.
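The namespaces and label selectors below are assumptions based on the deployments above and may differ in your environment:

# env injected into the test pod
POD=$(oc get pod -n demo -l app=demo1 -o jsonpath='{.items[0].metadata.name}')
oc exec -n demo $POD -- env | grep NVIDIA_VISIBLE_DEVICES

# annotations added by the scheduler extender
oc get pod -n demo $POD -o yaml | grep -i ALIYUN_COM_GPU_MEM

# find and read the scheduler extender and device-plugin logs
oc get pod -n kube-system | grep gpushare
oc get pod -n gpu-operator-resources | grep device-plugin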

headless service with router

This section looks at how the k8s/ocp ingress handles a headless service, and how that differs from a regular (non-headless) service.

Demo video

The conclusion: headless or not, openshift resolves the service down to the final pod IPs and writes those pod IPs directly into the ingress/router/haproxy configuration, so traffic goes straight to the pods. A check of the generated haproxy configuration follows the demo script below.

# demo environment deployment script
cat << EOF > headless.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slb-001
spec:
  replicas: 1
  selector: 
    matchLabels: 
      pod: slb-001
  template: 
    metadata: 
      labels: 
        pod: slb-001
    spec:
      restartPolicy: Always
      
      containers:
      - name: slb-001-pg
        image: registry.redhat.ren:5443/docker.io/etherpad/etherpad:latest
        imagePullPolicy: IfNotPresent

---
apiVersion: v1
kind: Service
metadata:
  name: slb-001-service
spec:
  selector:
    pod: slb-001
  ports:
    - port: 9001
      protocol: TCP
      targetPort: 9001
---
apiVersion: v1
kind: Service
metadata:
  name: slb-002-service
spec:
  selector:
    pod: slb-001
  clusterIP: None
  ports:
    - port: 9001
      protocol: TCP
      targetPort: 9001

---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: slb-001
spec:
  to:
    kind: Service
    name: slb-001-service
  port:
    targetPort: 9001
---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: slb-002
spec:
  to:
    kind: Service
    name: slb-002-service
  port:
    targetPort: 9001
EOF
oc apply -n demo -f headless.yaml
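To verify the conclusion above, look at the backends haproxy generated for the two routes; a minimal check, assuming the default router deployment name and the usual be_http:<namespace>:<route> backend naming:

# both backends should point at the same pod IP, headless or not
oc -n openshift-ingress rsh deploy/router-default \
  grep -A 6 'be_http:demo:slb-00' /var/lib/haproxy/conf/haproxy.config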


volume testing

this is for a single-node cluster:

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-hostpath.html

local volume

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-local.html

There is a gotcha with local volumes: they mount block devices, not directories on the node. So to use them you first have to carve the node's disks into many devices with lvm, and then have the local volume operator mount those... which is rather clumsy. A commercial cloud-native storage solution is still the better choice. A minimal lvm sketch follows.
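A minimal sketch of that lvm carve-out on a storage node, assuming a volume group named datavg already exists (the same VG name is used later in this section):

# on the storage node
lvcreate -L 100G -n lv01 datavg
lvcreate -L 100G -n lv02 datavg
ls /dev/datavg/
# the resulting /dev/datavg/lv01 and /dev/datavg/lv02 can then be listed
# under storageClassDevices.devicePaths in a LocalVolume CR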

# on worker-0
mkdir -p /data/demo

# on helper
oc project demo
oc get sa
oc create serviceaccount -n demo demo-app
oc adm policy add-scc-to-user privileged -z demo-app


local volume block share

When a local volume is used in block mode, it can be shared by pods on the same node.

video

  • https://youtu.be/P33sxtR57u8
  • https://www.ixigua.com/i6841022539582407180/
  • https://www.bilibili.com/video/BV115411W7FV/
# on infra0 create a lv
lvcreate --type raid0 -L 40G --stripes 12 -n sharelv datavg

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-share-block-disks"
  namespace: "local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - infra0.hsc.redhat.ren
  storageClassDevices:
    - storageClassName: "local-share-block-sc"
      volumeMode: Block 
      devicePaths: 
        - /dev/datavg/sharelv

cat << EOF > storage.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: localpvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block 
  resources:
    requests:
      storage: 40Gi 
  storageClassName: local-share-block-sc
EOF
oc apply -n demo -f storage.yaml

cat << EOF > demo1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra0.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 300000; done;" ]
      imagePullPolicy: Always
      securityContext:
        privileged: true
      volumeDevices:
        - devicePath: /mnt/block
          name: demo
  serviceAccount: demo-app
  volumes:
    - name: demo 
      persistentVolumeClaim:
        claimName: localpvc 
---
kind: Pod
apiVersion: v1
metadata:
  annotations:
  name: demo2
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra0.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 300000; done;" ]
      imagePullPolicy: Always
      securityContext:
        privileged: true
      volumeDevices:
        - devicePath: /mnt/block
          name: demo
  serviceAccount: demo-app
  volumes:
    - name: demo 
      persistentVolumeClaim:
        claimName: localpvc 
EOF
oc apply -f demo1.yaml

# write to the block device
oc exec -it -n demo demo1 -- bash -c "echo 'test 1' > /mnt/block"
oc exec -it -n demo demo1 -- head -n 1 /mnt/block

oc exec -it -n demo demo2 -- head -n 2 /mnt/block

oc delete -f demo1.yaml

local volume fs

cat << EOF > demo1.yaml
---
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-0
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem 
      fsType: xfs 
      devicePaths: 
        - /data/lv01
        - /data/lv02
EOF

oc apply -f demo1.yaml
oc delete -f demo1.yaml

oc get all -n local-storage
oc get pv

cat << EOF > demo1.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-pvc-name 
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem 
  resources:
    requests:
      storage: 100Gi 
  storageClassName: local-sc 
---
kind: Pod
apiVersion: v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 300000; done;" ]
      imagePullPolicy: Always
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /data
          name: demo 
          readOnly: false
  serviceAccount: demo-app
  volumes:
    - name: demo 
      persistentVolumeClaim:
        claimName: local-pvc-name
EOF
oc apply -f demo1.yaml

demo for hostpath

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-hostpath.html

video

  • https://www.bilibili.com/video/BV1MV411Z7ZK/
  • https://youtu.be/Dzq-xZW3O5E

oc project demo
oc get sa
oc create serviceaccount -n demo demo-app
oc adm policy add-scc-to-user privileged -z demo-app

cat << EOF > demo1.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /data
              name: demo 
              readOnly: false
      serviceAccount: demo-app
      volumes:
        - name: demo 
          hostPath:
            path: /data
            type: Directory
EOF
oc apply -f demo1.yaml

oc delete -f demo1.yaml

demo for emptydir

https://kubernetes.io/docs/concepts/storage/volumes/

cat << EOF > demo1.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data
              name: demo 
              readOnly: false
      volumes:
        - name: demo 
          emptyDir: {}
EOF
oc apply -f demo1.yaml

oc delete -f demo1.yaml

secret

https://docs.openshift.com/container-platform/4.3/nodes/pods/nodes-pods-secrets.html

cat << EOF > demo1.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: demo
data:
  username: dmFsdWUtMQ0K     
  password: dmFsdWUtMQ0KDQo= 
stringData:
  hostname: myapp.mydomain.com 
  secret.properties: |-     
    property1=valueA
    property2=valueB
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data
              name: demo 
              readOnly: true
      volumes:
        - name: demo 
          secret:
            secretName: test-secret
EOF
oc apply -f demo1.yaml

oc delete -f demo1.yaml

configmap

https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/

cat << EOF > demo1.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: demo
data:
  SPECIAL_LEVEL: very
  SPECIAL_TYPE: charm
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data
              name: demo 
              readOnly: true
      volumes:
        - name: demo 
          configMap:
            name: special-config
EOF
oc apply -f demo1.yaml

oc delete -f demo1.yaml

nfs manual

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-nfs.html

video

  • https://www.bilibili.com/video/BV1Ng4y1z7Dj/
  • https://youtu.be/DIM9fLGJZLU
# on helper
mkdir -p /data/export/lv01
mkdir -p /data/export/lv02

chown -R nfsnobody:nfsnobody /data/export/lv01
chown -R nfsnobody:nfsnobody /data/export/lv02

chmod 777 /data/export/lv01
chmod 777 /data/export/lv02

cat << EOF > /etc/exports
/data/export    *(rw,sync,root_squash)
/data/export/lv01    *(rw,sync,root_squash)
/data/export/lv02    *(rw,sync,root_squash)

EOF

systemctl restart nfs-server

exportfs  -s

cat << EOF > demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001 
  labels:
    storage-purpose: demo
spec:
  capacity:
    storage: 5Gi 
  accessModes:
  - ReadWriteOnce 
  nfs: 
    path: /data/export/lv01 
    server: 117.177.241.16
  persistentVolumeReclaimPolicy: Retain 
EOF

oc create -n demo -f demo.yaml

oc get pv

cat << EOF > demo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim1
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce 
  resources:
    requests:
      storage: 5Gi
  selector: 
    matchLabels:
      storage-purpose: demo
EOF

oc create -n demo -f demo.yaml

cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data
              name: demo 
      volumes:
        - name: demo 
          persistentVolumeClaim:
            claimName: nfs-claim1
EOF
oc apply -n demo -f demo.yaml

nfs auto

https://github.com/kubernetes-incubator/external-storage/blob/master/nfs-client/deploy/test-claim.yaml

video

  • https://www.bilibili.com/video/BV1vt4y1272R/
  • https://youtu.be/aSfiv-G67Gg
cat << EOF > demo.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-demo
  annotations:
    volume.beta.kubernetes.io/storage-class: nfs-storage-provisioner
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF

oc create -n demo -f demo.yaml

cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data
              name: demo 
      volumes:
        - name: demo 
          persistentVolumeClaim:
            claimName: pvc-demo
EOF
oc apply -n demo -f demo.yaml

oc delete -n demo -f demo.yaml

openshift 4.3 enable SupportPodPidsLimit

By default /sys/fs/cgroup/pids/pids.max is 1024, and some workloads need to exceed it. If the limit is not relaxed you get errors like "read init-p: connection reset by peer", you cannot rsh into the pod, and customer Java programs may fail to create threads.

The working approach is not to enable the cluster-level PodPidsLimit feature as described in the docs, but to relax the pid limit of crio.conf through a machine config (a ContainerRuntimeConfig).

https://www.redhat.com/en/blog/red-hat-openshift-container-platform-4-now-defaults-cri-o-underlying-container-engine

https://docs.openshift.com/container-platform/4.3/nodes/clusters/nodes-cluster-enabling-features.html

https://blog.spider.im/post/pid-limit-in-k8s/

This pids limit counts threads plus processes, roughly the number of entries you see with pstree -pl.
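A rough way to see that usage on a node, picking the first running container as an example (pids.current and pids.max come from the pids cgroup controller):

CID=$(crictl ps -q | head -n 1)
crictl exec $CID cat /sys/fs/cgroup/pids/pids.current
crictl exec $CID cat /sys/fs/cgroup/pids/pids.max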

https://docs.openshift.com/container-platform/4.3/scalability_and_performance/recommended-host-practices.html

https://github.com/openshift/machine-config-operator/blob/master/pkg/apis/machineconfiguration.openshift.io/v1/types.go

https://github.com/openshift/machine-config-operator/blob/master/vendor/k8s.io/kubelet/config/v1beta1/types.go

https://github.com/cri-o/cri-o/issues/1921

Correct

Override the pid limit in /etc/crio/crio.conf directly


# check current pids limit
crictl ps | awk '{print $1}' | xargs -I DEMO crictl exec DEMO cat /sys/fs/cgroup/pids/pids.max

oc label mcp worker custom-kubelet-pod-pids-limit=true

cat << EOF > crio.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-log-and-pid
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet-pod-pids-limit: 'true'
  containerRuntimeConfig:
    pidsLimit: 10240
EOF
oc apply -f crio.yaml

oc delete -f crio.yaml

Incorrect


# PodPidsLimit
oc label mcp worker custom-kubelet-pod-pids-limit=true

cat << EOF > PodPidsLimit.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: pod-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet-pod-pids-limit: 'true'
  kubeletConfig:
    PodPidsLimit: 4096
EOF
oc apply -f PodPidsLimit.yaml

oc delete -f PodPidsLimit.yaml

cat << EOF > PodPidsLimit.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: pod-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet-pod-pids-limit: 'true'
  kubeletConfig:
    PodPidsLimit: 10240
EOF
oc apply -f PodPidsLimit.yaml

Using RH SSO as the oauth provider for openshift

https://access.redhat.com/documentation/en-us/red_hat_single_sign-on/7.3/html/red_hat_single_sign-on_for_openshift/index

The official documentation is very good, but it is based on ocp 3.11, so a few configuration points need adjusting:

  • When deploying from the catalog, be sure to set the admin username and password.
  • issuer url: https://sso-sso-app-demo.apps.ocpef0a.sandbox1717.opentlc.com/auth/realms/OpenShift
  • Valid Redirect URIs: https://oauth-openshift.apps.ocpef0a.sandbox1717.opentlc.com/*
  • ca.crt can be uploaded through the web console, but which file is it? It is the tls.crt from the router-ca secret in openshift-ingress-operator (a sketch of the resulting OAuth configuration follows this list).
  • The openid login option sometimes refuses to show up on the login page. In that case, back all the way out to the console, jump back in, then refresh; refreshing the login page itself does not help. It looks like a small front-end bug.
  • After a user signs in through rh sso, logging out of openshift and switching to another user does not work. You have to log in to rh sso and end the previous user's session there before openshift will accept a different user.
  • Adding an oauth Identity Provider is easy, but there is no UI to delete one. You have to edit the Identity Providers yaml directly and remove the relevant configuration.
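Putting those points together, a minimal sketch of the resulting cluster OAuth resource; the client id, the client secret name (a secret in openshift-config) and the CA configmap name are assumptions, and the issuer is the example URL above:

cat << EOF > /data/install/oauth-rhsso.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: rhsso
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: openshift
      clientSecret:
        name: openid-client-secret
      ca:
        name: ca-config-map
      claims:
        preferredUsername:
        - preferred_username
        name:
        - name
        email:
        - email
      issuer: https://sso-sso-app-demo.apps.ocpef0a.sandbox1717.opentlc.com/auth/realms/OpenShift
EOF
oc apply -f /data/install/oauth-rhsso.yaml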

Detailed steps

Here is a screen recording of the configuration process:

  • https://www.ixigua.com/i6800709743808610827/
  • https://youtu.be/Ak9qdgIbOic

Create the project sso-app-demo

Create SSO from the catalog, and be sure to set the sso admin password to save trouble later. https://access.redhat.com/documentation/en-us/red_hat_single_sign-on/7.3/html-single/red_hat_single_sign-on_for_openshift/index#deploying_the_red_hat_single_sign_on_image_using_the_application_template

Then log in to rh sso and configure it following the official documentation: https://access.redhat.com/documentation/en-us/red_hat_single_sign-on/7.3/html-single/red_hat_single_sign-on_for_openshift/index#OSE-SSO-AUTH-TUTE

Reference commands


# oc -n openshift import-image redhat-sso73-openshift:1.0

# oc new-project sso-app-demo
# oc policy add-role-to-user view system:serviceaccount:$(oc project -q):default

# oc policy remove-role-from-user view system:serviceaccount:$(oc project -q):default

# get issuer url
curl -k https://sso-sso-app-demo.apps.ocpef0a.sandbox1717.opentlc.com/auth/realms/OpenShift/.well-known/openid-configuration | python -m json.tool | grep issuer

# curl -k https://sso-sso-app-demo.apps.ocpef0a.sandbox1717.opentlc.com/auth/realms/OpenShift/.well-known/openid-configuration | jq | less

# # on mac create a ca
# cd ~/Downloads/tmp/tmp/
# openssl req \
#    -newkey rsa:2048 -nodes -keyout redhat.ren.key \
#    -x509 -days 3650 -out redhat.ren.crt -subj \
#    "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.redhat.ren"
# # upload crt to ocp
# oc create configmap ca-config-map --from-file=ca.crt=./redhat.ren.crt -n openshift-config

# oc delete configmap ca-config-map -n openshift-config

oc get secrets router-ca -n openshift-ingress-operator -o jsonpath='{.data.tls\.crt}' | base64 -d > router.ca.crt

# oc get secrets router-ca -n openshift-ingress-operator -o jsonpath='{.data.tls\.key}' | base64 -d

# oc get OAuthClient

# if you want to debug, https://bugzilla.redhat.com/show_bug.cgi?id=1744599
oc patch authentication.operator cluster --type=merge -p "{\"spec\":{\"operatorLogLevel\": \"TraceAll\"}}"
oc patch authentication.operator cluster --type=merge -p "{\"spec\":{\"operatorLogLevel\": \"\"}}"

# update image streams for offline use
oc patch -n openshift is mysql -p "{\"spec\":{\"tags\":[{\"name\": \"5.7\",\"from\":{\"name\":\"registry.redhat.ren:5443/registry.redhat.io/rhscl/mysql-57-rhel7:latest\"}}]}}"
oc patch -n openshift is mysql -p "{\"spec\":{\"tags\":[{\"name\": \"8.0\",\"from\":{\"name\":\"registry.redhat.ren:5443/registry.redhat.io/rhscl/mysql-80-rhel7:latest\"}}]}}"
oc patch -n openshift is redhat-sso73-openshift -p "{\"spec\":{\"tags\":[{\"name\": \"1.0\",\"from\":{\"name\":\"registry.redhat.ren:5443/registry.redhat.io/redhat-sso-7/sso73-openshift:1.0\"}}]}}"
oc patch -n openshift is redhat-sso73-openshift -p "{\"spec\":{\"tags\":[{\"name\": \"latest\",\"from\":{\"name\":\"registry.redhat.ren:5443/registry.redhat.io/redhat-sso-7/sso73-openshift:1.0\"}}]}}"

oc create is ipa-server -n openshift

ocp scc

SecComp

https://docs.openshift.com/container-platform/4.3/authentication/managing-security-context-constraints.html

https://docs.docker.com/engine/security/seccomp/

https://docs.openshift.com/container-platform/4.3/nodes/nodes/nodes-nodes-managing.html

https://docs.openshift.com/container-platform/3.11/admin_guide/seccomp.html

https://gardener.cloud/050-tutorials/content/howto/secure-seccomp/

video

  • https://www.bilibili.com/video/BV1Sa4y1x7UP/
  • https://youtu.be/gwu53N4dIws

Experiments show that changing the date inside the container does not affect the host.

# create a privileged service account in kube-system
oc project kube-system
oc create serviceaccount -n kube-system demo-app
oc adm policy add-scc-to-user privileged -z demo-app

# create the seccomp security profile; here we create a profile that only blocks clock_settime
cat << EOF > demo.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profile
  namespace: kube-system
data:
  my-profile.json: |
    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "syscalls": [
        {
          "name": "clock_settime",
          "action": "SCMP_ACT_ERRNO"
        }
      ]
    }
EOF
oc apply -f demo.yaml

# create a daemonset that copies our custom seccomp profile into each node's kubelet seccomp directory (/var/lib/kubelet/seccomp), so the container runtime can use this profile when a pod asks for it
cat << EOF > demo.yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: seccomp
  namespace: kube-system
  labels:
    security: seccomp
spec:
  selector:
    matchLabels:
      security: seccomp
  template:
    metadata:
      labels:
        security: seccomp
    spec:
      initContainers:
      - name: installer
        image: docker.io/library/alpine:latest
        command: ["/bin/sh", "-c", "cp -r -L /seccomp/*.json /host/seccomp/"]
        securityContext:
            privileged: true
        volumeMounts:
        - name: profiles
          mountPath: /seccomp
        - name: hostseccomp
          mountPath: /host/seccomp
          readOnly: false
      containers:
      - name: pause
        image: gcr.io/google_containers/pause-amd64:3.0
      terminationGracePeriodSeconds: 5
      serviceAccount: demo-app
      volumes:
      - name: hostseccomp
        hostPath:
          path: /var/lib/kubelet/seccomp
      - name: profiles
        configMap:
          name: seccomp-profile
EOF
oc apply -f demo.yaml

# create a privileged service account in our demo project
oc project demo
oc create serviceaccount -n demo demo-app
oc adm policy add-scc-to-user privileged -z demo-app

# in the demo project, create an app that uses the seccomp profile we just created; to show the effect we explicitly request the clock-setting capability, and as we will see below, the seccomp profile still blocks the call
cat << EOF > demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: "localhost/my-profile.json"
  name: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'worker-0.ocp4.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["/bin/bash", "-c", "--" ]
      args: [ "trap : TERM INT; sleep infinity & wait" ]
      imagePullPolicy: Always
      securityContext:
        capabilities:
            add: ["CAP_SYS_TIME"]
  serviceAccount: demo-app
EOF
oc apply -n demo -f demo.yaml

# exec into this pod and run the command below; it fails
# this will fail, even though the capability was added
date -s "1 second"
# date: cannot set date: Operation not permitted
# Tue Mar 24 02:10:49 UTC 2020

# for comparison, change the seccomp profile to allow everything
# try to allow
cat << EOF > demo.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profile
  namespace: kube-system
data:
  my-profile.json: |
    {
      "defaultAction": "SCMP_ACT_ALLOW"
    }
EOF
oc apply -f demo.yaml

# patch the DaemonSet's pod template so its pods restart and the updated security profile is copied to every node again
# restart the daemonset and its pods.
# oc annotate -n kube-system ds seccomp last-update="`date`"
oc patch -n kube-system ds seccomp -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}"
oc get pod -n kube-system
# exec into the demo pod and run the command again; this time it succeeds.
# this command will succeed.
date -s "1 second"

# finally, to avoid leaving a security hole, reset the security profile to deny everything by default and restart the DaemonSet so the change reaches all nodes.
# finally, restore
cat << EOF > demo.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profile
  namespace: kube-system
data:
  my-profile.json: |
    {
      "defaultAction": "SCMP_ACT_ERRNO"
    }
EOF
oc apply -f demo.yaml
# restart the daemonset and its pods.
oc patch -n kube-system ds seccomp -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}"

capabilities

video

  • https://youtu.be/yLdJghw-7xs
  • https://www.bilibili.com/video/BV1x64y1T7BZ/

# create a deployment with a restricted security context: the CAP_SYS_TIME (clock setting) capability is dropped.

cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          securityContext:
            capabilities:
                drop: ["CAP_SYS_TIME"]
      serviceAccount: demo-app

EOF
oc apply -n demo -f demo.yaml

# exec into the pod and run the following command; it fails
date -s "1 second"

# update the deployment to add the CAP_SYS_TIME (clock setting) capability
cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
          securityContext:
            capabilities:
                add: ["CAP_SYS_TIME"]
      serviceAccount: demo-app

EOF
oc apply -n demo -f demo.yaml

# exec into the pod and run the command; this time it succeeds
date -s "1 second"

# clean up the demo application
oc delete -n demo -f demo.yaml


MCS

https://access.redhat.com/documentation/en-us/openshift_container_platform/3.3/html/installation_and_configuration/configuring-persistent-storage#selinuxoptions

http://www.178linux.com/98614

https://access.redhat.com/sites/default/files/video/files/mls_-_wide_8.pdf

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/selinux_users_and_administrators_guide/mls

https://www.cnblogs.com/charlieroro/p/10830721.html

video

  • https://youtu.be/XoQ11ZXEL7Y
  • https://www.bilibili.com/video/BV115411t777/

# on worker-0
# yum install selinux-policy-mls
# vi /etc/selinux/config
# # SELINUXTYPE=mls
# # SELINUXTYPE=mls
# getenforce
# fixfiles -F onboot
# cat /.autorelabel | less
# semanage login -l
# # semanage login --modify --range s0-s15:c0.c1023 root
# chcon -R -t default_t /data/mcs

# create a test directory on the node and give it a specific SELinux level and ownership.
mkdir /data/mcs
chcon -R -l s0:c100 /data/mcs
chcon -R -t container_file_t /data/mcs
chown -R 1000:2000 /data/mcs
chmod -R 775 /data/mcs
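
# a quick check (sketch, run on the same node) that the labels took effect
ls -dZ /data/mcs
# expect something like: system_u:object_r:container_file_t:s0:c100 /data/mcs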

# semanage fcontext -l | grep default_t

oc get project demo -o yaml 
# metadata:
#   annotations:
#     openshift.io/description: ""
#     openshift.io/display-name: ""
#     openshift.io/requester: kube:admin
#     openshift.io/sa.scc.mcs: s0:c23,c22
#     openshift.io/sa.scc.supplemental-groups: 1000550000/10000
#     openshift.io/sa.scc.uid-range: 1000550000/10000

# create a pod that requests the SELinux level s0:c99.
cat << EOF > demo.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
  name: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["/bin/bash", "-c", "--" ]
      args: [ "trap : TERM INT; sleep infinity & wait" ]
      imagePullPolicy: Always
      securityContext:
        runAsUser: 1000
        runAsGroup: 2000 
        seLinuxOptions:
          level: 's0:c99'
      volumeMounts:
        - mountPath: /data
          name: demo 
          readOnly: false
  serviceAccount: demo-app
  volumes:
    - name: demo 
      hostPath:
        path: /data/mcs
        type: Directory
EOF
oc apply -n demo -f demo.yaml

# exec into the pod and check access: because the pod runs at s0:c99 while the directory is labeled s0:c100, the operation below fails
# below will fail
cd /data

# relabel the directory on the host so it matches the SELinux level declared in the pod
# after changing the host path SELinux label
chcon -R -l s0:c99 /data/mcs
# system_u:object_r:default_t:s0:c99
# system_u:system_r:container_t:s0:c99
# seinfo -tcontainer_t
# seinfo -rsystem_r

# exec into the pod again; the same operations now succeed.
# then, the commands below will work
cd /data
ls
touch test


ocp 4.3 recover from node not ready

https://access.redhat.com/solutions/4923031

cat << "EOF" > recover_kubeconfig.sh
#!/bin/bash

set -eou pipefail

# context
intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")
context="$(oc config current-context)"
# cluster
cluster="$(oc config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
server="$(oc config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
# token
ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token  -o "jsonpath={.data.namespace}" | base64 --decode)"
token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"

export KUBECONFIG="$(mktemp)"
oc config set-credentials "kubelet" --token="$token" >/dev/null
ca_crt="$(mktemp)"; echo "$ca_crt_data" > $ca_crt
oc config set-cluster $cluster --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
oc config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
oc config use-context kubelet >/dev/null
cat "$KUBECONFIG"
EOF

chmod 755 recover_kubeconfig.sh
./recover_kubeconfig.sh > kubeconfig-bootstrap

# scp kubeconfig-bootstrap to each affected node
scp kubeconfig-bootstrap core@node.ip.address:~/

# on each affected node
systemctl stop kubelet
mkdir -p /root/backup-certs
cp -a /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig /root/backup-certs
rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig
cp /home/core/kubeconfig-bootstrap /etc/kubernetes/kubeconfig
systemctl start kubelet

# on helper
oc get node
oc get csr
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

openshift 4.3 QoS

https://docs.openshift.com/container-platform/4.3/nodes/pods/nodes-pods-configuring.html

https://docs.openshift.com/container-platform/3.11/admin_guide/managing_pods.html#admin-guide-manage-pods-limit-bandwidth

video

  • https://youtu.be/ghObMDoLcAQ
  • https://www.bilibili.com/video/BV16Z4y1W75P/

# create a server pod running iperf3, with its bandwidth limited to 1 Mbit/s via pod annotations, and a client pod that uses iperf3 as the client.
cat << EOF > demo.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: iperf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf
  template:
    metadata:
      labels:
        app: iperf  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra0.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: iperf
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always
EOF
oc apply -n demo -f demo.yaml

# find the server pod IP
oc get pod -o wide

# run the bandwidth test from the client pod
oc exec -it iperf-5b95866ff5-c9p9m -- iperf3 -t 20 -b 2M -p 6666 -c 10.254.5.52

# check the server pod's logs to see the measured throughput
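# for example (a sketch, using the pod name from the manifest above):
oc logs -n demo demo-pod -f
# the reported bandwidth should stay around the 1M annotation limit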

# change the server bandwidth limit to 2 Mbit/s
oc delete pod -n demo demo-pod

cat << EOF > demo1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod
  annotations:
    kubernetes.io/ingress-bandwidth: 2M
    kubernetes.io/egress-bandwidth: 2M
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always

EOF
oc apply -n demo -f demo1.yaml

# find the server pod IP
oc get pod -o wide

# run the bandwidth test from the client pod
oc exec -it iperf-5b95866ff5-c9p9m -- iperf3 -t 20 -b 2M -p 6666 -c 10.254.5.53

# check the server pod's logs to see the measured throughput

oc delete -n demo -f demo.yaml

openshift 4.3 QoS

This section tests how the pod bandwidth limiting of openshift (ovs) behaves under heavy traffic.

video

  • https://youtu.be/IaWdkPsRinw
  • https://www.bilibili.com/video/BV1cV411d7LV/

References:

https://docs.openshift.com/container-platform/4.3/nodes/pods/nodes-pods-configuring.html

https://docs.openshift.com/container-platform/3.11/admin_guide/managing_pods.html#admin-guide-manage-pods-limit-bandwidth


# check the NIC speed on infra0 and infra1; both are 10GbE ports
ethtool em1
# Settings for em1:
#         Supported ports: [ FIBRE ]
#         Supported link modes:   1000baseT/Full
#                                 10000baseT/Full
#         Supported pause frame use: Symmetric Receive-only
#         Supports auto-negotiation: No
#         Supported FEC modes: Not reported
#         Advertised link modes:  10000baseT/Full
#         Advertised pause frame use: No
#         Advertised auto-negotiation: No
#         Advertised FEC modes: Not reported
#         Speed: 10000Mb/s
#         Duplex: Full
#         Port: FIBRE
#         PHYAD: 1
#         Transceiver: internal
#         Auto-negotiation: off
#         Supports Wake-on: g
#         Wake-on: d
#         Current message level: 0x00000000 (0)

#         Link detected: yes

# create two server pods running iperf3 with no bandwidth limit, and one client pod that uses iperf3 as the client.
cat << EOF > demo.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 4.0
          memory: 8Gi
        limits:
          cpu: 60.0
          memory: 100Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod2
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 4.0
          memory: 8Gi
        limits:
          cpu: 60.0
          memory: 100Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: iperf
  namespace: zte
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra0.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: iperf
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["/bin/bash", "-c", "--" ]
      args: [ "trap : TERM INT; sleep infinity & wait" ]
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 4.0
          memory: 8Gi
        limits:
          cpu: 60.0
          memory: 100Gi
EOF
oc apply -f demo.yaml

# find the server pod IPs
oc get pod -A -o wide | grep demo-pod
oc get pod -n zte -o wide

pod_demo1_ip=$(oc get pod -n demo demo-pod1 -o json | jq -r '.status.podIPs[0].ip')

pod_demo2_ip=$(oc get pod -n default demo-pod2 -o json | jq -r '.status.podIPs[0].ip')

echo $pod_demo1_ip
echo $pod_demo2_ip

# from the client pod, run bandwidth tests against both server pods
/bin/rm -f nohup.out
nohup oc exec -n zte -it iperf -- iperf3 -T demo1 -i 10 -t 30 -b 3G -P 6 -p 6666 -c $pod_demo1_ip 2>&1 &

nohup oc exec -n zte -it iperf -- iperf3 -T demo2 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo2_ip 2>&1 &

tail -f nohup.out

# adjust the traffic and test both server pods again
/bin/rm -f nohup.out
nohup oc exec -n zte -it iperf -- iperf3 -T demo1 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo1_ip 2>&1 &

nohup oc exec -n zte -it iperf -- iperf3 -T demo2 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo2_ip 2>&1 &

tail -f nohup.out

# adjust the traffic and test both server pods again
/bin/rm -f nohup.out
nohup oc exec -n zte -it iperf -- iperf3 -T demo1 -i 10 -t 30 -b 8G -P 6 -p 6666 -c $pod_demo1_ip 2>&1 &

nohup oc exec -n zte -it iperf -- iperf3 -T demo2 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo2_ip 2>&1 &

tail -f nohup.out

# check the server pods' logs to see the measured throughput

# change the server bandwidth limit to 6 Gbit/s
oc delete pod -n demo demo-pod1

cat << EOF > demo1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
  annotations:
    kubernetes.io/ingress-bandwidth: 6G
    kubernetes.io/egress-bandwidth: 6G
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always

EOF
oc apply -n demo -f demo1.yaml

# find the server pod IPs
oc get pod -A -o wide | grep demo-pod
oc get pod -n zte -o wide

pod_demo1_ip=$(oc get pod -n demo demo-pod1 -o json | jq -r '.status.podIPs[0].ip')

pod_demo2_ip=$(oc get pod -n default demo-pod2 -o json | jq -r '.status.podIPs[0].ip')

echo $pod_demo1_ip
echo $pod_demo2_ip

# adjust the traffic and test both server pods again
/bin/rm -f nohup.out
nohup oc exec -n zte -it iperf -- iperf3 -T demo1 -i 10 -t 30 -b 8G -P 6 -p 6666 -c $pod_demo1_ip 2>&1 &

nohup oc exec -n zte -it iperf -- iperf3 -T demo2 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo2_ip 2>&1 &

tail -f nohup.out

# check the server pods' logs to see the measured throughput

# change the server bandwidth limit to 3 Gbit/s
oc delete pod -n demo demo-pod1

cat << EOF > demo1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo-pod1
  namespace: demo
  annotations:
    kubernetes.io/ingress-bandwidth: 3G
    kubernetes.io/egress-bandwidth: 3G
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf3", "-s", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always

EOF
oc apply -n demo -f demo1.yaml

# find the server pod IPs
oc get pod -A -o wide | grep demo-pod
oc get pod -n zte -o wide

pod_demo1_ip=$(oc get pod -n demo demo-pod1 -o json | jq -r '.status.podIPs[0].ip')

pod_demo2_ip=$(oc get pod -n default demo-pod2 -o json | jq -r '.status.podIPs[0].ip')

echo $pod_demo1_ip
echo $pod_demo2_ip

# adjust the traffic and test both server pods again
/bin/rm -f nohup.out
nohup oc exec -n zte -it iperf -- iperf3 -T demo1 -i 10 -t 30 -b 8G -P 6 -p 6666 -c $pod_demo1_ip 2>&1 &

nohup oc exec -n zte -it iperf -- iperf3 -T demo2 -i 10 -t 30 -b 6G -P 6 -p 6666 -c $pod_demo2_ip 2>&1 &

tail -f nohup.out

# check the server pods' logs to see the measured throughput

oc delete -f demo.yaml

package size


oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 1500 -p 6666 -c $pod_demo1_ip
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  3.66 GBytes  3.15 Gbits/sec  221             sender
# demo1:  [  4]   0.00-10.00  sec  3.66 GBytes  3.14 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 38.5% (1.8%u/36.6%s), remote/receiver 9.6% (0.4%u/9.2%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 1000 -p 6666 -c $pod_demo1_ip
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  2.68 GBytes  2.30 Gbits/sec  304             sender
# demo1:  [  4]   0.00-10.00  sec  2.68 GBytes  2.30 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 22.8% (1.0%u/21.7%s), remote/receiver 2.4% (0.2%u/2.2%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 500 -p 6666 -c $pod_demo1_ip
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  1.32 GBytes  1.14 Gbits/sec  195             sender
# demo1:  [  4]   0.00-10.00  sec  1.32 GBytes  1.13 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 13.6% (0.9%u/12.7%s), remote/receiver 4.2% (0.3%u/4.0%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 100 -p 6666 -c $pod_demo1_ip
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec   224 MBytes   188 Mbits/sec  590             sender
# demo1:  [  4]   0.00-10.00  sec   223 MBytes   187 Mbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 3.5% (0.2%u/3.3%s), remote/receiver 10.2% (0.1%u/10.1%s)


oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 1500 -P 10 -p 6666 -c $pod_demo1_ip
# demo1:  [SUM]   0.00-10.00  sec  9.21 GBytes  7.91 Gbits/sec  4804             sender
# demo1:  [SUM]   0.00-10.00  sec  9.20 GBytes  7.90 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 65.3% (2.5%u/62.8%s), remote/receiver 28.5% (0.4%u/28.1%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 1000 -P 10 -p 6666 -c $pod_demo1_ip
# demo1:  [SUM]   0.00-10.00  sec  8.62 GBytes  7.40 Gbits/sec  4354             sender
# demo1:  [SUM]   0.00-10.00  sec  8.61 GBytes  7.40 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 73.7% (2.4%u/71.3%s), remote/receiver 19.7% (0.9%u/18.8%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 500 -P 10 -p 6666 -c $pod_demo1_ip
# demo1:  [SUM]   0.00-10.00  sec  4.72 GBytes  4.05 Gbits/sec  7142             sender
# demo1:  [SUM]   0.00-10.00  sec  4.71 GBytes  4.05 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 49.4% (2.0%u/47.3%s), remote/receiver 17.6% (0.6%u/17.1%s)

oc exec -n zte -it iperf -- iperf3 -T demo1 -V -b 10G -M 100 -P 10 -p 6666 -c $pod_demo1_ip
# demo1:  [SUM]   0.00-10.00  sec   895 MBytes   750 Mbits/sec  10362             sender
# demo1:  [SUM]   0.00-10.00  sec   889 MBytes   745 Mbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 14.4% (0.6%u/13.7%s), remote/receiver 22.6% (0.3%u/22.3%s)



iperf3 -T demo1 -V -b 10G -M 1500 -p 6666 -c 117.177.241.24
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  10.5 GBytes  8.98 Gbits/sec    0             sender
# demo1:  [  4]   0.00-10.00  sec  10.4 GBytes  8.98 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 52.8% (2.7%u/50.2%s), remote/receiver 30.6% (1.0%u/29.5%s)

iperf3 -T demo1 -V -b 10G -M 1000 -p 6666 -c 117.177.241.24
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  9.28 GBytes  7.97 Gbits/sec    0             sender
# demo1:  [  4]   0.00-10.00  sec  9.27 GBytes  7.96 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 54.4% (3.2%u/51.2%s), remote/receiver 19.2% (0.1%u/19.1%s)

iperf3 -T demo1 -V -b 10G -M 500 -p 6666 -c 117.177.241.24
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  6.14 GBytes  5.28 Gbits/sec  5857             sender
# demo1:  [  4]   0.00-10.00  sec  6.14 GBytes  5.27 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 30.6% (2.1%u/28.5%s), remote/receiver 12.6% (0.1%u/12.5%s)

iperf3 -T demo1 -V -b 10G -M 100 -p 6666 -c 117.177.241.24
# demo1:  Test Complete. Summary Results:
# demo1:  [ ID] Interval           Transfer     Bandwidth       Retr
# demo1:  [  4]   0.00-10.00  sec  1.41 GBytes  1.21 Gbits/sec  3499             sender
# demo1:  [  4]   0.00-10.00  sec  1.40 GBytes  1.21 Gbits/sec                  receiver
# demo1:  CPU Utilization: local/sender 8.2% (0.9%u/7.4%s), remote/receiver 23.8% (0.1%u/23.7%s)


How to add an http_proxy for pulling images

If the deployment environment has an image proxy, the cluster can be configured to pull images through that proxy.

The key is cri-o's environment variables, so we drop an environment file into /etc/systemd/system/crio.service.d/.

cat << EOF > crio-env.conf
[Service]
Environment=HTTP_PROXY=http://v.redhat.ren:8080
Environment=HTTPS_PROXY=http://v.redhat.ren:8080
Environment=NO_PROXY=redhat.ren,10.254.0.0/16,172.30.0.0/16
EOF

config_source=$(cat ./crio-env.conf | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )

cat <<EOF > 50-crio-env-conf.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-crio-env-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain,${config_source}
          verification: {}
        filesystem: root
        mode: 0420
        path: /etc/systemd/system/crio.service.d/20-wzh-env.conf
      - contents:
          source: data:text/plain,${config_source}
          verification: {}
        filesystem: root
        mode: 0420
        path: /etc/systemd/system/kubelet.service.d/20-wzh-env.conf
      - contents:
          source: data:text/plain,${config_source}
          verification: {}
        filesystem: root
        mode: 0420
        path: /etc/systemd/system/machine-config-daemon-host.service.d/20-wzh-env.conf
      - contents:
          source: data:text/plain,${config_source}
          verification: {}
        filesystem: root
        mode: 0420
        path: /etc/systemd/system/pivot.service.d/20-wzh-env.conf
EOF
oc apply -f 50-crio-env-conf.yaml -n openshift-config

After the cluster nodes have rebooted, test it:

cat << EOF > test-local-dc.yaml
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  name: busybox
  labels:
    run: busybox
spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: busybox
    spec:
      containers:
        - name: busybox
          image: 'docker.io/busybox:1.28.0-glibc'
          command:
            - sleep
            - '36000'

EOF
oc apply -f test-local-dc.yaml

Because of network problems in the lab the pull did not actually succeed, but the pull traffic could be seen going through the proxy.
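
One way to confirm the environment variables really landed on a worker node (a sketch; <worker-node> is a placeholder for one of your nodes):

# print the drop-in file written by the MachineConfig
oc debug node/<worker-node> -- chroot /host cat /etc/systemd/system/crio.service.d/20-wzh-env.conf
# confirm crio picked the variables up
oc debug node/<worker-node> -- chroot /host systemctl show crio -p Environment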

The following were dead ends

With this in place, images can be pulled through the internal proxy server.

Tuning /etc/crio/crio.conf does not work: after reading the source code, only three of the options described in the link below are actually supported, the rest are ignored. https://www.redhat.com/en/blog/red-hat-openshift-container-platform-4-now-defaults-cri-o-underlying-container-engine

The source code also shows, happily, that /etc/systemd/system/crio.service.d/10-default-env.conf is rendered from the cluster proxy configuration. https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/files/etc-systemd-system-crio.service.d-10-default-env.conf.yaml

Configure a cluster-wide proxy: https://access.redhat.com/solutions/3442811

apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpProxy: http://v.redhat.ren:8080 
  httpsProxy: http://v.redhat.ren:8080 
  noProxy: example.com 

cat << EOF > proxy.yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpProxy: http://v.redhat.ren:8080 
  httpsProxy: http://v.redhat.ren:8080 
  readinessEndpoints:
  - http://www.google.com 
  noProxy: example.com 
  trustedCA:
    name: ca.for.proxy
EOF
oc apply -f proxy.yaml

cat << EOF > proxy.yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec: {}
EOF
oc apply -f proxy.yaml

cat /etc/systemd/system/crio.service.d/10-default-env.conf

cat << EOF > ca.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ca.for.proxy
  namespace: openshift-config 
data:
  ca-bundle.crt: | 
    -----BEGIN PRIVATE KEY-----
    MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCpRqAtwkQsmdA5
    qDyAV7ABoRmZdDh7aaH9OY+gHRVtMDYbEH1e3u4oIJ5CoAK4EiZ/AZA2Pb5xFO+5
    63YwMFEucg0TcCAs20yFbhkRXac1UxsGmx3zUSfex6/A6yxwyx14/HBoli6Trqpr
    oPxUFDFoHHe6zIqgQkdjdYttL/vwrVg2yH2Z3IS1qQ/uN8UpyL/yY48VRimQsGjX
    9FmRusONsUdRYh29gbOI76hJ7ooCNGvgbXq/6L6OGu6by+g6MgqHtBWMjnObWkWV
    ln1lRRfmhwlGO0136lURt58diJSIWPXOpSO4Ulc2JMH9D+pgAD59JU4pm1PvGotc
    e+WIxvJ9AgMBAAECggEACpulcBirgwwEk4hqejSEkCWTYB17aKh/AUp5KLSJ4jTS
    PzHyWV6pGBSrNkumv/hLN0xWyD9oTtfcCg+qcWylub5l+WDec1Eu43G52m+/CcVy
    fSB9aQEd+YUUC4fxWgQwjaNsO/Gla5XXkjUdevtk+TxHeIpW6aIdrSrxmN8X78Yj
    F0FIPYSAM4Lh2ZdykFS9igbteRN27WGlypKF6D7efDfbh4TLuVtSMRyehjewyy3U
    DAYkkMm1SD/TH4HJQU8eU3Gp3ZZmP4uSTESfBc/6lrSy/ooXqtc/x8dv0SQtky0I
    FQu/bTdrSjz3gOKZVfaLsG4LMiMo7M4SekyU2EGulQKBgQDUobsMXV0WrwVF4JFF
    ug3PxXwcatlnesrlcOPQQdhZz4ngk3z49GxPrXykzFQ5KtMCsgyOhNpXOVu6vqew
    0QmxJvF8Mo0GhwIOANlrQSn/Flt5s5GIPqteAE//RxSsAhRm6fDnxKik2aT5XOYl
    9GQvFvPDtjSR0nBHQg5BuBgtbwKBgQDLzSDr61tbU02/bV/td6CkVMSSpGHpfUU+
    0rGC9/JzBmBDr/mC5fDUN0bno1zq35HURxzhk306BJSbMMwnwmUFgWxPuJwlVo2V
    Zs3x41eYzTj7JOPZ/AphR+6pdpXlsoxpXUQRgWq1j8hq0wUqDL8s0ltzoDJFMxri
    J9N7fv6A0wKBgQChFk3Q1kKZ1sqV38XvHz8rcx/Nn51I6hwgqt/MfLXdhH+eJd59
    9R7BVluhtjLwhGMMHbuplTic8BVwatQ7/oHrNeepAdsZYNrLpRUSTnH0kQmIL+RH
    ZcMKGg6BBWbB0WmHdiBOVgy1pzV2vUyW4ImtqyPN15IID3eEZKTMYR3f/QKBgFke
    QBEp/+71hH/64gHDV/nEH5lITJB/ePI5y+nLZrepyBqRLvhweFk0Oss8Anuqe+hp
    mFWD2zStoBYkxoF0XhyENcq+nXkuWgdExzXJBhsJUqtvvDssHZXgkJqGApJI+2Fv
    qT5Ga1UtpKQh1pZGsKp26gqruI/OAyl15OKR69SFAoGADAOAADooY3Qcn9AWH1e8
    ebSDdimi4j1H9yFvcByaJkNrGhNgKwYYYeLsCvwxGLjRontoH6xOJAVdwmadV/CH
    6Ket3yJLWRIuu1N1IKvfLEqLsp2sbWKInhohEfh5yZmvCeTUjJKkz62DYS20JsN0
    1+gdBRElKgEz14GTvj7lpas=
    -----END PRIVATE KEY-----
    -----BEGIN CERTIFICATE-----
    MIIDVzCCAj+gAwIBAgIJANzkXo7TCVYVMA0GCSqGSIb3DQEBCwUAMEIxCzAJBgNV
    BAYTAlhYMRUwEwYDVQQHDAxEZWZhdWx0IENpdHkxHDAaBgNVBAoME0RlZmF1bHQg
    Q29tcGFueSBMdGQwHhcNMjAwMjIyMDMxOTMxWhcNMjEwMjIxMDMxOTMxWjBCMQsw
    CQYDVQQGEwJYWDEVMBMGA1UEBwwMRGVmYXVsdCBDaXR5MRwwGgYDVQQKDBNEZWZh
    dWx0IENvbXBhbnkgTHRkMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
    qUagLcJELJnQOag8gFewAaEZmXQ4e2mh/TmPoB0VbTA2GxB9Xt7uKCCeQqACuBIm
    fwGQNj2+cRTvuet2MDBRLnINE3AgLNtMhW4ZEV2nNVMbBpsd81En3sevwOsscMsd
    ePxwaJYuk66qa6D8VBQxaBx3usyKoEJHY3WLbS/78K1YNsh9mdyEtakP7jfFKci/
    8mOPFUYpkLBo1/RZkbrDjbFHUWIdvYGziO+oSe6KAjRr4G16v+i+jhrum8voOjIK
    h7QVjI5zm1pFlZZ9ZUUX5ocJRjtNd+pVEbefHYiUiFj1zqUjuFJXNiTB/Q/qYAA+
    fSVOKZtT7xqLXHvliMbyfQIDAQABo1AwTjAdBgNVHQ4EFgQUaTkD399lxrjHrHkl
    Mq1se4L+yr0wHwYDVR0jBBgwFoAUaTkD399lxrjHrHklMq1se4L+yr0wDAYDVR0T
    BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAkuBFWQV2dFfwVChhVGKxynQ3JD48
    tT27b8G0YHMIM1WGkYIO7jWOx4Vvpo0ykqvwP1r7gVLHectPynCt55c1/lN9FxuV
    o+VTGN2ObA8AyEr4pPUJf7rav9GBlyJlIGL2IM4A9b0aCqfwIg0OyTSQzI5E5Cv8
    SDj1XTCPwkZT+Vq8aXorpej4dNhz//0AA872pAtwp9ex+KPOVRRZM4cQfQof3saB
    oPSkc8R2sA1TYNweeF4cWctWz2G0Vy/uo0fwcTb9NJwpzZlRBclg2S9WA9dMwnV8
    LVnyLpo2cf4R2z8zDcfDoQV7i6JxzfTQCeUO1Zy4zPTbtKt1k8g3dYfF0w==
    -----END CERTIFICATE-----
EOF
oc apply -f ca.yaml

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: set-log-and-pid
spec:
 machineConfigPoolSelector:
   matchLabels:
     debug-crio: config1
 containerRuntimeConfig:
   conmon_env: "[ HTTP_PROXY=http://v.redhat.ren:8080, HTTPS_PROXY=http://v.redhat.ren:8080 ]" 
cat << EOF > crio.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: set-log-and-pid
spec:
 machineConfigPoolSelector:
   matchLabels:
     debug-crio: config1
 containerRuntimeConfig:
   conmon_env: '[HTTP_PROXY=http://v.redhat.ren:8080,HTTPS_PROXY=http://v.redhat.ren:8080]'
EOF

oc apply -f crio.yaml

oc delete -f crio.yaml

oc edit MachineConfigPool/worker

oc get ContainerRuntimeConfig -o yaml

oc get MachineConfigs

python3 -c "import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))" $(oc get MachineConfig/rendered-worker-a01b5da25ec85d2f0ffabfeb1fbe996d -o YAML | grep -B4 crio.conf | grep source | tail -n 1 | cut -d, -f2) | grep conmon

numa

https://docs.openshift.com/container-platform/4.3/scalability_and_performance/using-topology-manager.html#topology_manager_policies_using-topology-manager

https://www.sharcnet.ca/help/index.php/Using_numactl

video

  • https://youtu.be/J2VQQZxk3eY
  • https://www.bilibili.com/video/BV1HK4y1r7Di/

oc get featuregate/cluster -o yaml

oc patch featuregate/cluster -p '{"spec": { "featureSet": "LatencySensitive" } }' --type=merge

oc get KubeletConfig -o yaml

cat << EOF > cpumanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: single-numa-node 
EOF
oc apply -f cpumanager-kubeletconfig.yaml

oc project demo 

cat << EOF > cpumanager-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause-amd64:3.0
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"
EOF
oc apply -f cpumanager-pod.yaml
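
# a sketch of how to verify the static CPU manager assignment (<labeled-node> is a placeholder for the node labeled cpumanager=true);
# the kubelet records exclusive CPU assignments in its state file
oc debug node/<labeled-node> -- chroot /host cat /var/lib/kubelet/cpu_manager_state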

# on the worker node
yum install numactl
# run a command on NUMA node 0 (both CPU and memory come from node 0)
numactl --cpunodebind=0 --membind=0 COMMAND
# run a command with CPUs from NUMA node 1 and memory preferably from node 1; fall back to node 0 if node 1 runs out of memory
numactl --cpunodebind=1 --preferred=1 COMMAND
# get the CPU affinity mask of a process
taskset -p <pid>
# pid 26624's current affinity mask: ff  (no mask has been set)

# the per-NUMA-node memory usage of a process can be obtained with:
numastat <pid>
# Per-node process memory usage (in MBs) for PID 26624 (firefox)
#                            Node 0           Total
#                   --------------- ---------------
# Huge                         0.00            0.00
# Heap                         0.00            0.00
# Stack                        0.08            0.08
# Private                    208.50          208.50
# ----------------  --------------- ---------------
# Total                      208.58          208.58
# i.e. how much memory the process uses on each NUMA node

# find which NUMA node a PCI NIC is attached to
cat /sys/class/net/<devicename>/device/numa_node


# back to normal
cat << EOF > cpumanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: none 
EOF
oc apply -f cpumanager-kubeletconfig.yaml

# delete them all
oc delete -f cpumanager-kubeletconfig.yaml

openshift 4.3 network policy demo

https://docs.openshift.com/container-platform/4.3/networking/configuring-networkpolicy.html

video

  • https://youtu.be/pbV2VwIExVg
  • https://www.bilibili.com/video/BV1vz411B7pC/

# configure network policies for the zxcdn and demo namespaces: only traffic from pods inside the same namespace and from the ingress controller is allowed, all other traffic is denied.
cat << EOF > demo.yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-other-namespaces
spec:
  podSelector: null
  ingress:
    - from:
        - podSelector: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
  - Ingress
EOF
oc apply -n zxcdn -f demo.yaml
oc apply -n demo -f demo.yaml

# create a test pod in each of the demo and zxcdn namespaces
cat << EOF > demo.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo  
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >- 
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: ["/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]
          imagePullPolicy: Always

EOF
oc apply -n demo -f demo.yaml
oc apply -n zxcdn -f demo.yaml

# find the IP address of the zxcdn test pod
oc get pod -o wide -n zxcdn

# exec into the demo pod and ping the zxcdn pod; the ping should fail
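# for example (a sketch; <zxcdn-pod-ip> is the IP found above), expect 100% packet loss while the deny policy is in effect
oc exec -n demo deploy/demo -- ping -c 3 <zxcdn-pod-ip>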

# update the network policy in the zxcdn namespace to also allow traffic from the demo namespace
oc label namespace demo name=demo

cat << EOF > demo.yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-other-namespaces
spec:
  podSelector: null
  ingress:
    - from:
        - podSelector: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-other
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: demo
  podSelector: {}
  policyTypes:
  - Ingress
EOF
oc apply -n zxcdn -f demo.yaml

# exec into the demo pod and ping the zxcdn pod again; now the ping should succeed


# exec into a pod in the zxcdn project and ping the demo pod; the ping should fail
oc get pod -n demo -o wide

# update the network policy in the demo namespace to allow traffic from the zxcdn namespace
oc label namespace zxcdn name=zxcdn

cat << EOF > demo.yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-other-namespaces
spec:
  podSelector: null
  ingress:
    - from:
        - podSelector: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-other
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: zxcdn
  podSelector: {}
  policyTypes:
  - Ingress
EOF
oc apply -n demo -f demo.yaml

# exec into a pod in the zxcdn project and ping the demo pod again; now the ping should succeed



oc delete -n zxcdn -f demo.yaml
oc delete -n demo -f demo.yaml


openshift 4.3 multicast

This section tests multicast traffic between openshift pods.

video:

  • https://youtu.be/4UriNYHRbHk
  • https://www.bilibili.com/video/BV1wk4y1k7sS/

References:

https://docs.openshift.com/container-platform/4.3/networking/openshift_sdn/using-multicast.html

https://pktgen-dpdk.readthedocs.io/en/latest/getting_started.html

https://access.redhat.com/solutions/406553

https://wenku.baidu.com/view/9a7c3c3dbdd126fff705cc1755270722182e5943.html?rec_flag=default

# enable multicast on the target project's netnamespace
oc annotate netnamespace demo \
    netnamespace.network.openshift.io/multicast-enabled=true
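
# a quick check (sketch) that the annotation was recorded
oc get netnamespace demo -o yaml | grep multicast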

# create two multicast server pods and one multicast test (client) pod
cat << EOF > demo.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: demo1
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf", "-s", "-u ","-B", "224.0.0.1", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: demo2
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: demo1
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["iperf", "-s", "-u ","-B", "224.0.0.1", "-p" ]
      args: [ "6666" ]
      imagePullPolicy: Always
---
kind: Pod
apiVersion: v1
metadata:
  name: iperf
spec:
  nodeSelector:
    kubernetes.io/hostname: 'infra0.hsc.redhat.ren'
  restartPolicy: Always
  containers:
    - name: iperf
      image: >- 
        registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
      env:
        - name: key
          value: value
      command: ["/bin/bash", "-c", "--" ]
      args: [ "trap : TERM INT; sleep infinity & wait" ]
      imagePullPolicy: Always      
EOF
oc apply -n demo -f demo.yaml

oc project demo

# check that the pods are running and scheduled on the expected nodes
oc get pod -o wide

# check the multicast addresses joined by server pod demo1
oc exec -it demo1 -- ipmaddr show dev eth0
# 3:      eth0
#         link  33:33:00:00:00:01
#         link  01:00:5e:00:00:01
#         link  33:33:ff:07:a8:2e
#         inet  224.0.0.1
#         inet6 ff02::1:ff07:a82e
#         inet6 ff02::1
#         inet6 ff01::1

# check the multicast addresses joined by server pod demo2
oc exec -it demo2 -- ipmaddr show dev eth0
# 3:      eth0
#         link  33:33:00:00:00:01
#         link  01:00:5e:00:00:01
#         link  33:33:ff:5c:ba:66
#         inet  224.0.0.1
#         inet6 ff02::1:ff5c:ba66
#         inet6 ff02::1
#         inet6 ff01::1

# from the test pod iperf, generate multicast traffic to 224.0.0.1
oc exec -it iperf -- iperf -c 224.0.0.1 -u -p 6666 -t 30 -i 1

# capture traffic on server pod demo1; the multicast traffic to 224.0.0.1 is visible
oc exec -it demo1 -- tcpdump -i eth0 -nn
# capture traffic on server pod demo2; the multicast traffic to 224.0.0.1 is visible
oc exec -it demo2 -- tcpdump -i eth0 -nn

# from the test pod iperf, generate multicast traffic to 225.0.0.2
oc exec -it iperf -- iperf -c 225.0.0.2 -u -p 6666 -t 30 -i 1

# capture traffic on server pod demo1; the multicast traffic to 225.0.0.2 is visible
oc exec -it demo1 -- tcpdump -i eth0 -nn
# capture traffic on server pod demo2; the multicast traffic to 225.0.0.2 is visible
oc exec -it demo2 -- tcpdump -i eth0 -nn


# clean up
oc delete -f demo.yaml

pktgen

oc annotate netnamespace demo \
    netnamespace.network.openshift.io/multicast-enabled=true

# do the following on the node before creating the pods
modprobe pktgen

ps aux | grep pktgen

ls /proc/net/pktgen/

# create pod
oc project demo
oc get sa
oc create serviceaccount -n demo demo-app
oc adm policy add-scc-to-user privileged -z demo-app

cat << EOF > demo1.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: demo1
  namespace: demo
  labels:
    app: demo1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo1
  template:
    metadata:
      labels:
        app: demo1
    spec:
      nodeSelector:
        kubernetes.io/hostname: 'worker-0'
      restartPolicy: Always
      containers:
        - name: demo1
          image: >-
            registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
          env:
            - name: key
              value: value
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 300000; done;" ]
          imagePullPolicy: Always
          securityContext:
            privileged: true
      serviceAccount: demo-app
EOF
oc apply -f demo1.yaml

ipmaddr show dev eth0
# 3:      eth0
#         link  33:33:00:00:00:01
#         link  01:00:5e:00:00:01
#         link  33:33:ff:ff:9d:55
#         inet  224.0.0.1
#         inet6 ff02::1:ffff:9d55
#         inet6 ff02::1
#         inet6 ff01::1

export IF=if581

echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
echo "add_device eth0@${IF}" > /proc/net/pktgen/kpktgend_0
echo "max_before_softirq 100000" > /proc/net/pktgen/kpktgend_0

echo "count 100" > /proc/net/pktgen/eth0@${IF}
echo "clone_skb 1000000" > /proc/net/pktgen/eth0@${IF}
echo "pkt_size 1300" > /proc/net/pktgen/eth0@${IF}
echo "delay 0" > /proc/net/pktgen/eth0@${IF}
echo "dst 224.0.0.2" > /proc/net/pktgen/eth0@${IF}
echo "dst_mac 01:00:5e:00:00:02" > /proc/net/pktgen/eth0@${IF}

echo start > /proc/net/pktgen/pgctrl

cat /proc/net/pktgen/eth0@${IF}

# oc rsh <another pod>
tcpdump -i eth0 -nn

openshift 4.3 firewall

This section records how to apply firewall rules on the openshift cluster hosts. This is useful for customers that run internal security scans and audits.

The approach is simple: use systemd to inject a new service that runs a local custom script.

The same trick can be used for anything else you want to hack onto coreos :)

coreos

For coreos nodes, especially the masters.


cat << EOF > wzh.script
#!/bin/bash

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -j ACCEPT
iptables -A INPUT -s 223.87.20.0/24 -j ACCEPT
iptables -A INPUT -s 117.177.241.0/24 -j ACCEPT
iptables -A INPUT -s 39.134.200.0/24 -j ACCEPT
iptables -A INPUT -s 192.168.7.0/24 -j ACCEPT
iptables -A INPUT -s 112.44.102.224/27 -j ACCEPT
iptables -A INPUT -s 47.93.86.113/32 -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

var_local=$(cat ./wzh.script | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )

cat <<EOF > 45-master-wzh-service.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 45-master-wzh-service
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain,${var_local}
          verification: {}
        filesystem: root
        mode: 0755
        path: /etc/rc.d/wzh.local
    systemd:
      units:
      - name: wzh.service
        enabled: true
        contents: |
          [Unit]
          Description=/etc/rc.d/wzh.local Compatibility
          Documentation=zhengwan@redhat.com
          ConditionFileIsExecutable=/etc/rc.d/wzh.local
          After=network.target

          [Service]
          Type=oneshot
          User=root
          Group=root
          ExecStart=/bin/bash -c /etc/rc.d/wzh.local

          [Install]
          WantedBy=multi-user.target

EOF
oc apply -f 45-master-wzh-service.yaml -n openshift-config
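
# a sketch of how to watch the rollout and confirm the rules are active (<master-node> is a placeholder)
oc get mcp master
# after the pool reports UPDATED=True, the custom INPUT rules should be present on each master
oc debug node/<master-node> -- chroot /host iptables -L INPUT -n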

oc delete -f 45-master-wzh-service.yaml -n openshift-config

for rhel with firewalld

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/sec-setting_and_controlling_ip_sets_using_firewalld

https://unix.stackexchange.com/questions/159873/whitelist-source-ip-addresses-in-centos-7


firewall-cmd --get-ipset-types
firewall-cmd --permanent --get-ipsets

firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

# firewall-cmd --permanent --info-ipset=my-allow-list

cat > /root/ocp4/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all

# firewall-cmd --permanent --zone=trusted --add-source=192.168.7.0/24
firewall-cmd --get-active-zones
# firewall-cmd --zone=block --change-interface=em1

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload

firewall-cmd --list-all-zones

firewall-cmd --get-default-zone

for rhel with iptables

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/sec-setting_and_controlling_ip_sets_using_iptables


# secure for anti-scan
cat << EOF >> /etc/rc.local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl start rc-local

ipset list

# 221.226.0.75
# 210.21.236.182
# 61.132.54.2
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32

other record


# https://bugzilla.redhat.com/show_bug.cgi?id=1723327
# https://access.redhat.com/solutions/4264181
for i in $(oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | xargs); do oc rsh -n openshift-machine-config-operator $i chroot /rootfs rm -rf /run/pivot/reboot-needed; done

rpm-ostree rollback --reboot

cat << EOF > wzh.service
[Unit]
Description=/etc/rc.d/wzh.local Compatibility
Documentation=zhengwan@redhat.com
ConditionFileIsExecutable=/etc/rc.d/wzh.local
After=network.target

[Service]
Type=oneshot
User=root
Group=root
ExecStart=/bin/bash -c /etc/rc.d/wzh.local

[Install]
WantedBy=multi-user.target
EOF

var_service=$(cat ./wzh.service | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )


openshift 4.3 using ldap

The demo scenario is as follows:

  • Deploy openldap, together with a web front end (phpldapadmin)
  • Create two groups in openldap, admins and normals, each containing one user
  • Configure LDAP authentication on ocp
  • Use the ocp CLI to sync groups from LDAP and confirm that the groups and users have been created
  • Log in to ocp with these users and see that they cannot do anything yet
  • Use the CLI to grant the admins group cluster-wide read permission and the normals group view permission on the demo project
  • Log in again / refresh the page: the admin user can now see the whole cluster, and the normal user has access to the demo project

video

  • https://youtu.be/Sg3euS3ip4k
  • https://www.bilibili.com/video/BV1XA411b7N6/

References:

  • https://docs.openshift.com/container-platform/4.3/authentication/identity_providers/configuring-ldap-identity-provider.html
  • https://docs.openshift.com/container-platform/4.3/authentication/ldap-syncing.html
  • https://www.cnblogs.com/ericnie/p/10063816.html
  • https://access.redhat.com/solutions/2484371
  • https://access.redhat.com/solutions/3419841

openldap


skopeo copy docker://docker.io/osixia/openldap:latest docker://registry.redhat.ren:5443/docker.io/osixia/openldap:latest

skopeo copy docker://docker.io/osixia/phpldapadmin:latest docker://registry.redhat.ren:5443/docker.io/osixia/phpldapadmin:latest

# start the openldap service
podman run -p 389:389 --name openldap --hostname ldap.redhat.ren --env LDAP_ORGANISATION="redhat" --env LDAP_DOMAIN="redhat.ren" --env LDAP_ADMIN_PASSWORD="ldap123" --detach registry.redhat.ren:5443/docker.io/osixia/openldap:latest
# default login user: admin

podman run -d -p 5080:80 --name phpldapadmin --env PHPLDAPADMIN_HTTPS=false --env PHPLDAPADMIN_LDAP_HOSTS=117.177.241.16 --detach registry.redhat.ren:5443/docker.io/osixia/phpldapadmin:latest
# http://helper.hsc.redhat.ren:5080
# Login DN: cn=admin,dc=redhat,dc=ren
# Password: ldap123

podman rm -fv phpldapadmin
podman rm -fv openldap

yum install -y openldap openldap-clients openldap-servers

systemctl status slapd

# add test user data to ldap
cat << EOF > base.ldif
dn: ou=users,dc=redhat,dc=ren
objectClass: organizationalUnit
objectClass: top
ou: users

dn: ou=groups,dc=redhat,dc=ren
objectClass: organizationalUnit
objectClass: top
ou: groups  
EOF

ldapadd -x -D "cn=admin,dc=redhat,dc=ren" -w ldap123 -f base.ldif

# generate a password hash for the test users (password: redhat)
slappasswd -s redhat
# {SSHA}yiR9306gQWh4mdeOuJ1KUg5cxQ8uoWKK

cat << EOF >users.ldif 
dn: cn=ocpadm,ou=users,dc=redhat,dc=ren
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: ocpadm
sn: ocpadm
uid: ocpadm
displayName: ocpadm
mail: ocpadm@redhat.ren
userPassword: {SSHA}yiR9306gQWh4mdeOuJ1KUg5cxQ8uoWKK

dn: cn=wzh,ou=users,dc=redhat,dc=ren
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: wzh
sn: wzh
uid: wzh
displayName: wzh
mail: wzh@redhat.ren
userPassword: {SSHA}yiR9306gQWh4mdeOuJ1KUg5cxQ8uoWKK

dn: cn=admins,ou=groups,dc=redhat,dc=ren
objectClass: groupOfNames
cn: admins
owner: cn=admin,dc=redhat,dc=ren
member: cn=ocpadm,ou=users,dc=redhat,dc=ren

dn: cn=normals,ou=groups,dc=redhat,dc=ren
objectClass: groupOfNames
cn: normals
owner: cn=admin,dc=redhat,dc=ren
member: cn=wzh,ou=users,dc=redhat,dc=ren

EOF
ldapadd -x -D "cn=admin,dc=redhat,dc=ren" -w ldap123 -f users.ldif 

ldapsearch -x -D "cn=admin,dc=redhat,dc=ren" -w ldap123 -b dc=redhat,dc=ren 

ocp operation

oc get user
oc get group
oc get identity

# clean up stale ldap users and identities
oc get user | grep ldap | awk '{print $1}' | xargs -I DEMO oc delete user DEMO
oc get identity | grep ldap | awk '{print $1}' | xargs -I DEMO oc delete identity DEMO

# create a secret holding the ldap bind password
oc create secret generic ldap-secret --from-literal=bindPassword=ldap123 -n openshift-config

# add an ldap identity provider to the cluster OAuth configuration
cat << EOF > ldap.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: "Local Password"
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd
  - name: ldapidp 
    mappingMethod: claim 
    type: LDAP
    ldap:
      attributes:
        id: 
        - dn
        email: 
        - mail
        name: 
        - cn
        preferredUsername: 
        - uid
      bindDN: "cn=admin,dc=redhat,dc=ren"
      bindPassword: 
        name: ldap-secret
      insecure: true 
      url: "ldap://registry.redhat.ren:389/ou=users,dc=redhat,dc=ren?uid" 
EOF
oc apply -f ldap.yaml

# sync group data from ldap
cat << EOF > ldapsync.yaml
kind: LDAPSyncConfig
apiVersion: v1
url: ldap://registry.redhat.ren:389
insecure: true
bindDN: cn=admin,dc=redhat,dc=ren
bindPassword: ldap123 
groupUIDNameMapping:
  "cn=admins,ou=groups,dc=redhat,dc=ren": Administrators 
  "cn=normals,ou=groups,dc=redhat,dc=ren": NormalUsers 
rfc2307:
    groupsQuery:
        baseDN: "ou=groups,dc=redhat,dc=ren"
        scope: sub
        derefAliases: never
        pageSize: 0
        filter: (objectclass=groupOfNames)
    groupUIDAttribute: dn 
    groupNameAttributes: [ cn ] 
    groupMembershipAttributes: [ member ]
    usersQuery:
        baseDN: "ou=users,dc=redhat,dc=ren"
        scope: sub
        derefAliases: never
        pageSize: 0
    userUIDAttribute: dn 
    userNameAttributes: [ cn ]
    tolerateMemberNotFoundErrors: false
    tolerateMemberOutOfScopeErrors: false
EOF

oc adm groups sync --sync-config=ldapsync.yaml --confirm

# prune groups that have already been deleted on the ldap side
# oc adm prune groups --sync-config=ldapsync.yaml --confirm

# at this point you can log in as wzh / ocpadm, but these users cannot see any project yet
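# a sketch of the check: log in as the normal ldap user (password 'redhat', hashed with slappasswd above); the API URL is a placeholder for your cluster
# oc login -u wzh -p redhat https://api.<cluster-domain>:6443
# oc get projects   # expect an empty list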

# inspect the available roles before granting permissions to the groups
oc get clusterrole
oc get role 

# grant different permissions to the admin and normal groups
oc adm policy add-cluster-role-to-group cluster-reader Administrators
oc policy add-role-to-group view NormalUsers -n demo 
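
# a quick check (sketch) of the bindings that were just created
oc get clusterrolebinding -o wide | grep Administrators
oc get rolebinding -n demo -o wide | grep NormalUsers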

# log in again; the users now have the corresponding permissions

# revoke the group permissions
oc adm policy remove-cluster-role-from-group cluster-reader Administrators
oc policy remove-role-from-group view NormalUsers -n demo 

# remove ldap 
# clean up stale ldap users and identities
oc get user | grep ldap | awk '{print $1}' | xargs -I DEMO oc delete user DEMO
oc get identity | grep ldap | awk '{print $1}' | xargs -I DEMO oc delete identity DEMO

cat << EOF > ldap.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: "Local Password"
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd
EOF
oc apply -f ldap.yaml

free ipa

skopeo copy docker://docker.io/freeipa/freeipa-server:latest docker://registry.redhat.ren:5443/docker.io/freeipa/freeipa-server:latest

mkdir -p /data/freeipa
cat << EOF > /data/freeipa/ipa-server-install-options
--realm=redhat.ren
--ds-password=The-directory-server-password
--admin-password=The-admin-password
EOF

# setsebool -P container_manage_cgroup 1

docker run --name freeipa-server-container -ti --privileged   \
    -e IPA_SERVER_IP=10.66.208.240 \
    -p 3080:80 -p 3443:443 -p 389:389 -p 636:636 -p 88:88 -p 464:464 \
    -p 88:88/udp -p 464:464/udp -p 123:123/udp \
   -h ipa.redhat.ren \
   -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
   --tmpfs /run --tmpfs /tmp \
   -v /data/freeipa:/data:Z \
   docker.io/freeipa/freeipa-server ipa-server-install

docker start -ai freeipa-server-container

docker rm -fv $(docker ps -qa)

firewall-cmd --zone=public --add-port=3443/tcp --permanent
firewall-cmd --reload

image pull secret

https://docs.openshift.com/container-platform/4.3/openshift_images/managing_images/using-image-pull-secrets.html

https://docs.openshift.com/container-platform/4.3/installing/install_config/installing-restricted-networks-preparations.html

# across projects
oc policy add-role-to-user \
    system:image-puller system:serviceaccount:project-a:default \
    --namespace=project-b
oc policy add-role-to-group \
    system:image-puller system:serviceaccounts:project-a \
    --namespace=project-b

# ref outside
oc create secret generic <pull_secret_name> \
    --from-file=.dockercfg=<path/to/.dockercfg> \
    --type=kubernetes.io/dockercfg
oc create secret generic <pull_secret_name> \
    --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
    --type=kubernetes.io/dockerconfigjson
oc create secret docker-registry <pull_secret_name> \
    --docker-server=<registry_server> \
    --docker-username=<user_name> \
    --docker-password=<password> \
    --docker-email=<email>
oc secrets link default <pull_secret_name> --for=pull
oc secrets link builder <pull_secret_name>


# global
oc get secret/pull-secret -n openshift-config -o yaml

oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d

oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull-secret-location> 

cat ./pull-secret.text | jq .  > <path>/<pull-secret-file>

# <credentials>
echo -n '<user_name>:<password>' | base64 -w0 
#   "auths": {
# ...
#     "<local_registry_host_name>:<local_registry_host_port>": { 
#       "auth": "<credentials>", 
#       "email": "you@example.com"
#   },
# ...
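
# a minimal sketch of merging a new credential into the pull secret before pushing it back
# (registry.redhat.ren:5443 is the local registry used elsewhere in these notes; user/password are placeholders)
AUTH=$(echo -n '<user_name>:<password>' | base64 -w0)
jq --arg auth "$AUTH" \
  '.auths["registry.redhat.ren:5443"] = {"auth": $auth, "email": "you@example.com"}' \
  ./pull-secret.text > pull-secret.merged.json
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=pull-secret.merged.json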



openshift 4.3 huge page

video

  • https://youtu.be/T7R-j0B9eSY
  • https://www.bilibili.com/video/BV1De411W7JU/

https://docs.openshift.com/container-platform/4.3/scalability_and_performance/what-huge-pages-do-and-how-they-are-consumed-by-apps.html

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-configuring_transparent_huge_pages

# check original status
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never

cat /sys/kernel/mm/transparent_hugepage/defrag
# [always] madvise never

# begin to test 
oc label node infra1.hsc.redhat.ren hugepages=true

cat << EOF > hugepages_tuning.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: hugepages 
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile: 
  - data: |
      [main]
      summary=Configuration for hugepages
      include=openshift-node

      [vm]
      transparent_hugepages=never

      [sysctl]
      vm.nr_hugepages=1024
    name: node-hugepages
  recommend:
  - match: 
    - label: hugepages
    priority: 30
    profile: node-hugepages
EOF

oc create -f hugepages_tuning.yaml

oc get pod -o wide -n openshift-cluster-node-tuning-operator

oc logs tuned-86g8b \
    -n openshift-cluster-node-tuning-operator | grep 'applied$' | tail -n1

# check result
cat /sys/kernel/mm/transparent_hugepage/enabled
# always madvise [never]

cat /sys/kernel/mm/transparent_hugepage/defrag
# [always] madvise never

# the node label match has automatically triggered selection of the tuned profile.

cat << EOF > hugepages-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: hugepages-volume-
spec:
  containers:
  - securityContext:
      privileged: true
    image: registry.redhat.ren:5443/docker.io/wangzheng422/centos:centos7-test
    imagePullPolicy: Always
    command:
    - sleep
    - inf
    name: example
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi 
        memory: "1Gi"
        cpu: "1"
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
oc create -n demo -f hugepages-pod.yaml

# login into pod
oc rsh hugepages-volume-9nwlv

mount | grep page
# nodev on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2Mi)

# check the system huge page status
# yum install libhugetlbfs-utils
hugeadm --explain

# According to the following two posts, huge pages are meant for program memory allocation and cannot be demonstrated with plain file operations:
# https://serverfault.com/questions/811670/how-to-create-copy-a-file-into-hugetlbfs
# https://stackoverflow.com/questions/40285971/how-to-load-text-segments-of-shared-libraries-into-huge-pages-on-linux

# sysbench memory --memory-hugetlb=on --memory-total-size=200M run
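# /proc/meminfo is another quick way to confirm the pool was reserved; with the
# tuned profile above it should report HugePages_Total around 1024 (2048 kB pages)
grep -i huge /proc/meminfo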

# restore
oc delete -f hugepages_tuning.yaml
# reboot

openshift 4.3 helm

This article describes how to demonstrate the Helm functionality on OpenShift 4.3.

video

  • https://youtu.be/L6ioq_JMOtE
  • https://www.bilibili.com/video/BV1qp4y197yH/

References:

https://docs.openshift.com/container-platform/4.3/cli_reference/helm_cli/getting-started-with-helm-on-openshift-container-platform.html

https://chartmuseum.com/docs/#installing-chartsinto-kubernetes

https://whmzsu.github.io/helm-doc-zh-cn/chart/chart_repository-zh_cn.html

Steps

# prepare the environment
skopeo copy docker://docker.io/gogs/gogs docker://registry.redhat.ren:5443/docker.io/gogs/gogs

skopeo copy docker://docker.io/chartmuseum/chartmuseum:latest docker://registry.redhat.ren:5443/docker.io/chartmuseum/chartmuseum:latest

skopeo copy docker://docker.io/ananwaresystems/webarchive:1.0 docker://registry.redhat.ren:5443/docker.io/ananwaresystems/webarchive:1.0

skopeo copy docker://docker.io/tomcat:7.0 docker://registry.redhat.ren:5443/docker.io/tomcat:7.0 

# https://github.com/helm/charts/tree/master/stable/chartmuseum

# run a helm chart repository
mkdir -p /data/ocp4/helm/charts

podman run --rm -it \
  -p 18080:8080 \
  -v /data/ocp4/helm/charts:/charts:Z \
  -e DEBUG=true \
  -e STORAGE=local \
  -e STORAGE_LOCAL_ROOTDIR=/charts \
  --privileged \
  registry.redhat.ren:5443/docker.io/chartmuseum/chartmuseum:latest
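# from another terminal: chartmuseum serves its chart index on the same
# /api/charts endpoint used below for uploads; an empty repository returns {}
curl -s http://localhost:18080/api/charts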

# prepare the helm client
curl -L https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64 -o /usr/local/bin/helm

chmod +x /usr/local/bin/helm

helm version

helm repo add chartmuseum http://localhost:18080
helm repo list

# build a helm chart and upload it to the chart repository
cd /data/ocp4/helm/tomcat
helm lint
helm package .
curl --data-binary "@tomcat-0.4.1.tgz" http://localhost:18080/api/charts
helm repo update
helm search repo

# create a tomcat deployment from the helm chart
oc project demo
helm install example-tomcat chartmuseum/tomcat
helm list
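# the chart is expected to have created a tomcat deployment and service in the demo project
oc get deployment,svc -n demo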

# restore the environment
helm uninstall example-tomcat
helm repo remove chartmuseum

/bin/rm -f /data/ocp4/helm/charts/*

openshift tcp-router

This article describes how, by customizing the haproxy template, a route can expose a TCP port to the outside world simply by adding an annotation. The related scripts and files are in the scripts directory.

Motivation and principle

L4 load-balancing tests come up frequently in OpenShift PoCs. We know the default OCP router is built on haproxy and by default only supports http and https; tls/sni passthrough can be considered one way of carrying TCP, but it is still layer 7. The official documentation simply says that other requirements can be met by customizing the haproxy template, yet it says very little about how, and examples are scarce. This article shows how to customize the haproxy template so that route configuration is watched dynamically and TCP ports are opened dynamically.

Customizing the haproxy template requires understanding a few key points about how the OpenShift router works:

  • The OpenShift router is not just haproxy; it also has a Go program that watches the OpenShift configuration and writes out a set of map files, which are the key inputs for the haproxy template.
  • The tls passthrough mode in the OpenShift router corresponds to tcp mode in the haproxy configuration; this is exactly where our customization hooks in.
  • The customization focuses on masking out the http/https edge and reencrypt parts, and opening a tls passthrough frontend for routes that carry the annotation.
  • The route annotation takes the form haproxy.router.openshift.io/external-tcp-port: "13306".
  • OCP4 does not yet support customizing the template of the default router, so this article creates a dedicated router deployment directly.
  • When implementing on site, remember to change the router image; the image for each release can be found in its release.txt file.

Since this is aimed at PoCs, there are of course limitations:

  • The TCP port defined by the route annotation is assigned manually and is exposed cluster-wide across all projects, which will inevitably lead to port conflicts. A port-management scheme is needed; we leave that to GPS.

Below is an example route configuration

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: ottcache-002
  annotations:
    haproxy.router.openshift.io/wzh-router-name: "wzh-router-1"
    haproxy.router.openshift.io/external-tcp-port: "6620"
spec:
  to:
    kind: Service
    name: ottcache-002-service
  port:
    targetPort: 6620
  tls:
    termination: passthrough
    insecureEdgeTerminationPolicy: None

Below are the key customization points inside the template

{{/*try to add tcp support*/}}

{{- if eq (env "WZH_ROUTER_NAME" "wzh-router-name") (index $cfg.Annotations "haproxy.router.openshift.io/wzh-router-name") }}
  {{- if (isInteger (index $cfg.Annotations "haproxy.router.openshift.io/external-tcp-port")) }} 
  frontend tcp-{{ (index $cfg.Annotations "haproxy.router.openshift.io/external-tcp-port") }}
    bind *:{{ (index $cfg.Annotations "haproxy.router.openshift.io/external-tcp-port") }}
    mode tcp
    default_backend {{genBackendNamePrefix $cfg.TLSTermination}}:{{$cfgIdx}}

  {{- end}}{{/* end haproxy.router.openshift.io */}}
{{- end}}{{/* end WZH_ROUTER_NAME */}}

{{/*end try to add tcp support*/}}

Test steps

The test steps are straightforward: create a new router, then go to any other project, create an application, and add the annotation to its route.

The example in this article contains two applications, a web application and a mysql instance, both exposed externally over TCP ports.

# tcp-router will install in the same project with openshift router
oc project openshift-ingress

# install the tcp-router and demo
oc create configmap customrouter-wzh --from-file=haproxy-config.template
oc apply -f haproxy.router.yaml

oc apply -f haproxy.demo.yaml

# test your tcp-router, replace ip with router ip, both command will success.
curl 192.168.7.18:18080

podman run -it --rm registry.redhat.ren:5443/docker.io/mysql mysql -h 192.168.7.18 -P 13306 -u user -D db -p
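# optional sanity check: make sure the customized template rendered the tcp
# frontends inside the router pod (the haproxy.config path is assumed from the
# default openshift router image layout)
oc rsh -n openshift-ingress deployment/router-wzh grep -A 3 'frontend tcp-' /var/lib/haproxy/conf/haproxy.config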

# if you want to delete the tcp-router and demo
oc delete -f haproxy.router.yaml
oc delete configmap customrouter-wzh

oc delete -f haproxy.demo.yaml

# oc set volume deployment/router-wzh --add --overwrite \
#     --name=config-volume \
#     --mount-path=/var/lib/haproxy/conf/custom \
#     --source='{"configMap": { "name": "customrouter-wzh"}}'

# oc set env dc/router \
#     TEMPLATE_FILE=/var/lib/haproxy/conf/custom/haproxy-config.template

References

https://docs.openshift.com/container-platform/3.11/install_config/router/customized_haproxy_router.html#go-template-actions

https://www.haproxy.com/blog/introduction-to-haproxy-maps/

https://access.redhat.com/solutions/3495011

https://blog.zhaw.ch/icclab/openshift-custom-router-with-tcpsni-support/

The following were dead ends

Reading the source code, we can see that the OpenShift router does extend haproxy: the map files are generated by the router's extensions in order to talk to endpoints directly and bypass services. So for TCP forwarding we can lean on the sni-tcp path to implement it.

openshift 4.3 grafana

A demonstration of the Grafana functionality on OpenShift 4.3.

video

  • https://youtu.be/xGry0_LWFNw
  • https://www.bilibili.com/video/BV1yV411d7vR/

cpu manager

https://docs.openshift.com/container-platform/4.3/scalability_and_performance/using-cpu-manager.html

video

  • https://youtu.be/gzdb2AURhvo
  • https://www.bilibili.com/video/BV1Ua4y1t7aQ/

oc get node

oc label node ip-10-0-138-181.us-west-2.compute.internal cpumanager=true
oc label node worker-0 cpumanager=true

oc label node infra0.hsc.redhat.ren --overwrite cpumanager=false
oc label node worker-0.ocpsc.redhat.ren --overwrite cpumanager=true
oc label node worker-1.ocpsc.redhat.ren --overwrite cpumanager=true
oc label node worker-2.ocpsc.redhat.ren --overwrite cpumanager=true
oc label node worker-3.ocpsc.redhat.ren --overwrite cpumanager=true

oc get machineconfigpool worker -o yaml

# oc edit machineconfigpool worker
# metadata:
#   creationTimestamp: 2019-xx-xxx
#   generation: 3
#   labels:
#     custom-kubelet: cpumanager-enabled
oc patch machineconfigpool worker -p '{"metadata":{"labels": { "custom-kubelet": "cpumanager-enabled" } } }' --type=merge

cat << EOF > cpumanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 6s
EOF
oc apply -f cpumanager-kubeletconfig.yaml
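# the kubelet config is rolled out by the machine config operator; worker nodes are
# cordoned, drained and rebooted one by one, so wait until the pool reports UPDATED=True
oc get mcp worker -w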

alias urldecode='python3 -c "import sys, urllib.parse as ul; \
    print(ul.unquote_plus(sys.argv[1]))"'

alias urlencode='python3 -c "import sys, urllib.parse as ul; \
    print (ul.quote_plus(sys.argv[1]))"'

worker_mc_kubelet_yaml=$(oc get mc | grep kubelet | grep 99 | awk '{print $1}')

urldecode $(oc get mc ${worker_mc_kubelet_yaml} -o json | jq -r .spec.config.storage.files[0].contents.source | sed "s/data:text\/plain,//g") | jq

oc debug node/infra0.hsc.redhat.ren
cat /host/etc/kubernetes/kubelet.conf | grep cpuManager

# cat /etc/kubernetes/kubelet.conf | grep cpuManager

cat << EOF > cpumanager-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause-amd64:3.0
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"
EOF
# note: the pod spec uses generateName, so it must be created (oc apply requires a fixed metadata.name)
oc create -f cpumanager-pod.yaml
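# static cpu pinning only applies to pods in the Guaranteed QoS class (requests
# equal to limits), so it is worth double-checking the generated pod:
oc get pod -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass | grep cpumanager
# cpumanager-xxxxx   Guaranteed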

systemctl status
# └─kubepods.slice
#   ├─kubepods-podcc529083_9d0a_43aa_9d9f_1fc0dc3b626b.slice
#   │ ├─crio-conmon-b67ba6af381740b5f9b459482e41a14d4ced2cd8e9431598d84066d20027ef06.scope
#   │ │ └─1434963 /usr/libexec/crio/conmon -s -c b67ba6af381740b5f9b459482e41a14d4ced2cd8e9431598d84066d20027ef06 -n k8s_cpumanager_>
#   │ ├─crio-conmon-4ab85736504471dcca960aea960ca01ab0fa582439e444d407ac8d001d6dbd2b.scope
#   │ │ └─1434127 /usr/libexec/crio/conmon -s -c 4ab85736504471dcca960aea960ca01ab0fa582439e444d407ac8d001d6dbd2b -n k8s_POD_cpumana>
#   │ ├─crio-b67ba6af381740b5f9b459482e41a14d4ced2cd8e9431598d84066d20027ef06.scope
#   │ │ └─1434975 /pause
#   │ └─crio-4ab85736504471dcca960aea960ca01ab0fa582439e444d407ac8d001d6dbd2b.scope
#   │   └─1434151 /usr/bin/pod

cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-podcc529083_9d0a_43aa_9d9f_1fc0dc3b626b.slice/crio-b67ba6af381740b5f9b459482e41a14d4ced2cd8e9431598d84066d20027ef06.scope

for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done
# cpuset.cpus 12
# tasks 30894

grep Cpus_allowed_list /proc/1434975/status
# Cpus_allowed_list:      12

systemctl status
# ├─kubepods-burstable.slice
# │ ├─kubepods-burstable-podb8410218_65e9_4ec2_b944_6f0f1709e6a9.slice
# │ │ │ └─6696 /usr/bin/configmap-reload --webhook-url=http://localhost:8080/-/reload --volume-dir=/etc/serving-certs-ca-bundle
# │ │ ├─crio-conmon-958273b72d8d6f1a06a640bd158aa1f5dcc9372b232c79af9f3731068b0bcb9f.scope
# │ │ │ └─6922 /usr/libexec/crio/conmon -s -c 958273b72d8d6f1a06a640bd158aa1f5dcc9372b232c79af9f3731068b0bcb9f -n k8s_kube-rbac-pr>
# │ │ ├─crio-conmon-dc78df658a47a6bcad1772c5f0154c058b3b517f924c842eb9ba2c878edf86a3.scope
# │ │ │ └─6256 /usr/libexec/crio/conmon -s -c dc78df658a47a6bcad1772c5f0154c058b3b517f924c842eb9ba2c878edf86a3 -n k8s_telemeter-cl>
# │ │ ├─crio-958273b72d8d6f1a06a640bd158aa1f5dcc9372b232c79af9f3731068b0bcb9f.scope
# │ │ │ └─6958 /usr/bin/kube-rbac-proxy --secure-listen-address=:8443 --upstream=http://127.0.0.1:8080/ --tls-cert-file=/etc/tls/p>
# │ │ ├─crio-conmon-7a9aaeff818804cb48c6de76ef604e1241717ef25f9d2e31502bca5e03a0a126.scope
# │ │ │ └─5215 /usr/libexec/crio/conmon -s -c 7a9aaeff818804cb48c6de76ef604e1241717ef25f9d2e31502bca5e03a0a126 -n k8s_POD_telemete>
# │ │ ├─crio-dc78df658a47a6bcad1772c5f0154c058b3b517f924c842eb9ba2c878edf86a3.scope
# │ │ │ └─6321 /usr/bin/telemeter-client --id=02b8c3b4-9aed-4268-b1b7-84c998b50184 --from=https://prometheus-k8s.openshift-monitor>
# │ │ ├─crio-conmon-6cefa86b950deb57dac809b57246fb553e0c96fc31ae1cd7b8efa43207995749.scope
# │ │ │ └─6635 /usr/libexec/crio/conmon -s -c 6cefa86b950deb57dac809b57246fb553e0c96fc31ae1cd7b8efa43207995749 -n k8s_reload_telem>
# │ │ └─crio-7a9aaeff818804cb48c6de76ef604e1241717ef25f9d2e31502bca5e03a0a126.scope
# │ │   └─5292 /usr/bin/pod

cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8410218_65e9_4ec2_b944_6f0f1709e6a9.slice/crio-dc78df658a47a6bcad1772c5f0154c058b3b517f924c842eb9ba2c878edf86a3.scope/cpuset.cpus
# 0-1,3

oc describe node ip-10-0-138-181.us-west-2.compute.internal

# You can see that the other pods are restricted from using CPU 12; the processes that are not restricted are control processes.
# cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/
cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-besteffort.slice
find . -name cpuset.cpus | grep crio | xargs -I DEMO cat DEMO
# 0-11,13-23
# 0-23
# 0-11,13-23
# 0-23
# 0-11,13-23
# 0-23
# 0-11,13-23
# 0-23
# 0-11,13-23
# 0-23
# 0-23


# in pod
cat /sys/fs/cgroup/cpuset/cpuset.cpus
# 0-1,19

cat /proc/1/status | grep -i cpus_allow
# Cpus_allowed:   7fffc
# Cpus_allowed_list:      2-18

oc get pod -A | grep Running | awk '{print $1 "\t" $2}' > list
while read -r pod; do
  echo "$pod"
  oc exec -n $pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
  oc exec -n $pod -- cat /proc/1/status | grep -i cpus_allow
done < list

ls /proc | egrep '^[0-9]+$' | xargs -I DEMO echo " grep -s -i name /proc/DEMO/status | tr -d '\n'; echo -n -e '\t'; grep -s -i cpus_allowed_list /proc/DEMO/status ; " | sh
# Name:   systemd Cpus_allowed_list:      0-1
# Name:   ksoftirqd/0     Cpus_allowed_list:      0
# Name:   migration/10    Cpus_allowed_list:      10
# Name:   posixcputmr/10  Cpus_allowed_list:      10
# Name:   rcuc/10 Cpus_allowed_list:      10
# Name:   ksoftirqd/10    Cpus_allowed_list:      10
# Name:   kworker/10:0-mm_percpu_wq       Cpus_allowed_list:      10
# Name:   kworker/10:0H   Cpus_allowed_list:      10
# Name:   rcuop/10        Cpus_allowed_list:      0-1
# Name:   cpuhp/11        Cpus_allowed_list:      11
# Name:   watchdog/11     Cpus_allowed_list:      0-19
# Name:   migration/11    Cpus_allowed_list:      11
# Name:   systemd-journal Cpus_allowed_list:      0-1
# Name:   rcu_preempt     Cpus_allowed_list:      0-1
# Name:   posixcputmr/11  Cpus_allowed_list:      11
# Name:   rcuc/11 Cpus_allowed_list:      11
# Name:   systemd-udevd   Cpus_allowed_list:      0-1
# Name:   ksoftirqd/11    Cpus_allowed_list:      11
# Name:   kworker/11:0-mm_percpu_wq       Cpus_allowed_list:      11
# Name:   irq/149-ioat-ms Cpus_allowed_list:      0
# Name:   irq/151-ioat-ms Cpus_allowed_list:      1
# Name:   irq/152-ioat-ms Cpus_allowed_list:      0
# Name:   kworker/11:0H   Cpus_allowed_list:      11
# Name:   irq/153-ioat-ms Cpus_allowed_list:      1
# Name:   irq/154-ioat-ms Cpus_allowed_list:      0
# Name:   irq/155-ioat-ms Cpus_allowed_list:      1
# Name:   irq/156-ioat-ms Cpus_allowed_list:      1
# Name:   irq/157-mei_me  Cpus_allowed_list:      0
# Name:   irq/158-ioat-ms Cpus_allowed_list:      0
# Name:   irq/16-i801_smb Cpus_allowed_list:      1
# Name:   rcub/2  Cpus_allowed_list:      0-1
# Name:   kipmi0  Cpus_allowed_list:      0-1
# Name:   ib-comp-wq      Cpus_allowed_list:      0-19
# Name:   kworker/u41:0   Cpus_allowed_list:      0-1
# Name:   ib-comp-unb-wq  Cpus_allowed_list:      0-19
# Name:   ib_mcast        Cpus_allowed_list:      0-19
# Name:   rcuop/11        Cpus_allowed_list:      0-1
# Name:   ib_nl_sa_wq     Cpus_allowed_list:      0-19
# Name:   bnxt_re Cpus_allowed_list:      0-19
# Name:   irq/159-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/160-bnxt_qp Cpus_allowed_list:      1
# Name:   ttm_swap        Cpus_allowed_list:      0-19
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/161-bnxt_qp Cpus_allowed_list:      1
# Name:   cpuhp/12        Cpus_allowed_list:      12
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/162-bnxt_qp Cpus_allowed_list:      1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/163-bnxt_qp Cpus_allowed_list:      1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/164-bnxt_qp Cpus_allowed_list:      0-1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/165-bnxt_qp Cpus_allowed_list:      1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/166-bnxt_qp Cpus_allowed_list:      1
# Name:   watchdog/12     Cpus_allowed_list:      0-19
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/167-bnxt_qp Cpus_allowed_list:      1
# Name:   ib_mad1 Cpus_allowed_list:      0-19
# Name:   irq/168-bnxt_qp Cpus_allowed_list:      1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/169-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/170-bnxt_qp Cpus_allowed_list:      1
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   migration/12    Cpus_allowed_list:      12
# Name:   irq/171-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/172-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/173-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/174-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   irq/175-bnxt_qp Cpus_allowed_list:      0
# Name:   bnxt_qplib_nq   Cpus_allowed_list:      0-19
# Name:   rcub/1  Cpus_allowed_list:      0-1
# Name:   posixcputmr/12  Cpus_allowed_list:      12
# Name:   irq/176-bnxt_qp Cpus_allowed_list:      0
# Name:   nfit    Cpus_allowed_list:      0-19
# Name:   ib_mad1 Cpus_allowed_list:      0-19
# Name:   rcuc/12 Cpus_allowed_list:      12
# Name:   rdma-ndd        Cpus_allowed_list:      0-1
# Name:   ksoftirqd/12    Cpus_allowed_list:      12
# Name:   rdma_cm Cpus_allowed_list:      0-19
# Name:   kworker/12:0-mm_percpu_wq       Cpus_allowed_list:      12
# Name:   iw_cxgb4        Cpus_allowed_list:      0-19
# Name:   kworker/12:0H   Cpus_allowed_list:      12
# Name:   Register_iWARP_ Cpus_allowed_list:      0-19
# Name:   rcuop/12        Cpus_allowed_list:      0-1
# Name:   rpciod  Cpus_allowed_list:      0-19
# Name:   xprtiod Cpus_allowed_list:      0-19
# Name:   cpuhp/13        Cpus_allowed_list:      13
# Name:   watchdog/13     Cpus_allowed_list:      0-19
# Name:   migration/13    Cpus_allowed_list:      13
# Name:   posixcputmr/13  Cpus_allowed_list:      13
# Name:   irq/119-i40e-ve Cpus_allowed_list:      0
# Name:   irq/120-i40e-ve Cpus_allowed_list:      1
# Name:   irq/121-i40e-ve Cpus_allowed_list:      0
# Name:   irq/122-i40e-ve Cpus_allowed_list:      1
# Name:   irq/123-i40e-ve Cpus_allowed_list:      0
# Name:   irq/124-i40e-ve Cpus_allowed_list:      1
# Name:   irq/125-i40e-ve Cpus_allowed_list:      0
# Name:   irq/126-i40e-ve Cpus_allowed_list:      1
# Name:   irq/127-i40e-ve Cpus_allowed_list:      0
# Name:   irq/128-i40e-ve Cpus_allowed_list:      1
# Name:   irq/129-i40e-ve Cpus_allowed_list:      0
# Name:   irq/130-i40e-ve Cpus_allowed_list:      0
# Name:   irq/131-i40e-ve Cpus_allowed_list:      0
# Name:   irq/132-i40e-ve Cpus_allowed_list:      1
# Name:   irq/133-i40e-ve Cpus_allowed_list:      1
# Name:   irq/134-i40e-ve Cpus_allowed_list:      0
# Name:   irq/135-i40e-ve Cpus_allowed_list:      1
# Name:   irq/136-i40e-ve Cpus_allowed_list:      1
# Name:   irq/137-i40e-ve Cpus_allowed_list:      0
# Name:   irq/138-i40e-ve Cpus_allowed_list:      0
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-19
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   sleep   Cpus_allowed_list:      2-18
# Name:   runc    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      2-18
# Name:   runc    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      2-18
# Name:   runc    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      2-18
# Name:   runc    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      2-18
# Name:   rcuc/0  Cpus_allowed_list:      0
# Name:   rcuc/13 Cpus_allowed_list:      13
# Name:   jbd2/sda1-8     Cpus_allowed_list:      0-1
# Name:   ext4-rsv-conver Cpus_allowed_list:      0-19
# Name:   ksoftirqd/13    Cpus_allowed_list:      13
# Name:   kworker/13:0-mm_percpu_wq       Cpus_allowed_list:      13
# Name:   kworker/13:0H   Cpus_allowed_list:      13
# Name:   srp_remove      Cpus_allowed_list:      0-19
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   rcuop/13        Cpus_allowed_list:      0-1
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-18
# Name:   bin_reader      Cpus_allowed_list:      2-18
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   gnb_cu_son      Cpus_allowed_list:      2-4
# Name:   target_completi Cpus_allowed_list:      0-19
# Name:   xcopy_wq        Cpus_allowed_list:      0-19
# Name:   cpuhp/14        Cpus_allowed_list:      14
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   watchdog/14     Cpus_allowed_list:      0-19
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   migration/14    Cpus_allowed_list:      14
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-3
# Name:   gnb_cu_pdcp     Cpus_allowed_list:      2-5
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   dumgr   Cpus_allowed_list:      17-18
# Name:   gnb_du_layer2   Cpus_allowed_list:      5
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   gnb_cu_son      Cpus_allowed_list:      2-4
# Name:   gnb_cu_l3       Cpus_allowed_list:      7-9
# Name:   posixcputmr/14  Cpus_allowed_list:      14
# Name:   auditd  Cpus_allowed_list:      0-1
# Name:   rcuc/14 Cpus_allowed_list:      14
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-3
# Name:   gnb_cu_pdcp     Cpus_allowed_list:      2-5
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   dumgr   Cpus_allowed_list:      17-18
# Name:   gnb_du_layer2   Cpus_allowed_list:      5
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   gnb_cu_son      Cpus_allowed_list:      2-4
# Name:   gnb_cu_l3       Cpus_allowed_list:      7-9
# Name:   posixcputmr/0   Cpus_allowed_list:      0
# Name:   ksoftirqd/14    Cpus_allowed_list:      14
# Name:   kworker/14:0-mm_percpu_wq       Cpus_allowed_list:      14
# Name:   chronyd Cpus_allowed_list:      0-1
# Name:   sssd    Cpus_allowed_list:      0-1
# Name:   kworker/14:0H   Cpus_allowed_list:      14
# Name:   dbus-daemon     Cpus_allowed_list:      0-1
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   rcuop/14        Cpus_allowed_list:      0-1
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-3
# Name:   gnb_cu_pdcp     Cpus_allowed_list:      2-4,9
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   dumgr   Cpus_allowed_list:      17-18
# Name:   gnb_du_layer2   Cpus_allowed_list:      5
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   gnb_cu_son      Cpus_allowed_list:      2-4
# Name:   gnb_cu_rrm      Cpus_allowed_list:      5-6
# Name:   gnb_cu_l3       Cpus_allowed_list:      2-18
# Name:   cpuhp/15        Cpus_allowed_list:      15
# Name:   watchdog/15     Cpus_allowed_list:      0-19
# Name:   migration/15    Cpus_allowed_list:      15
# Name:   posixcputmr/15  Cpus_allowed_list:      15
# Name:   sssd_be Cpus_allowed_list:      0-1
# Name:   licManager      Cpus_allowed_list:      2-18
# Name:   sh      Cpus_allowed_list:      2-18
# Name:   post-office     Cpus_allowed_list:      2-18
# Name:   oam     Cpus_allowed_list:      2-18
# Name:   tr069-v2        Cpus_allowed_list:      2-18
# Name:   ftp-func        Cpus_allowed_list:      2-18
# Name:   o-ru-controller Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   lighttpd        Cpus_allowed_list:      2-18
# Name:   gnb_cu_oam      Cpus_allowed_list:      2-3
# Name:   gnb_cu_pdcp     Cpus_allowed_list:      2-4,9
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   bin_reader      Cpus_allowed_list:      18
# Name:   duoam   Cpus_allowed_list:      17-18
# Name:   dumgr   Cpus_allowed_list:      17-18
# Name:   gnb_du_layer2   Cpus_allowed_list:      5
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   bin_reader      Cpus_allowed_list:      2
# Name:   gnb_cu_son      Cpus_allowed_list:      2-4
# Name:   gnb_cu_rrm      Cpus_allowed_list:      5-6
# Name:   gnb_cu_l3       Cpus_allowed_list:      2-18
# Name:   kworker/u40:0-events_unbound    Cpus_allowed_list:      0-1
# Name:   rcuc/15 Cpus_allowed_list:      15
# Name:   ksoftirqd/15    Cpus_allowed_list:      15
# Name:   migration/0     Cpus_allowed_list:      0
# Name:   kworker/15:0-mm_percpu_wq       Cpus_allowed_list:      15
# Name:   sssd_nss        Cpus_allowed_list:      0-1
# Name:   kworker/15:0H   Cpus_allowed_list:      15
# Name:   systemd-logind  Cpus_allowed_list:      0-1
# Name:   rcuop/15        Cpus_allowed_list:      0-1
# Name:   vim     Cpus_allowed_list:      2-18
# Name:   ovsdb-server    Cpus_allowed_list:      0-1
# Name:   cpuhp/16        Cpus_allowed_list:      16
# Name:   watchdog/16     Cpus_allowed_list:      0-19
# Name:   migration/16    Cpus_allowed_list:      16
# Name:   posixcputmr/16  Cpus_allowed_list:      16
# Name:   rcuc/16 Cpus_allowed_list:      16
# Name:   ksoftirqd/16    Cpus_allowed_list:      16
# Name:   kworker/16:0-mm_percpu_wq       Cpus_allowed_list:      16
# Name:   watchdog/0      Cpus_allowed_list:      0
# Name:   kworker/16:0H   Cpus_allowed_list:      16
# Name:   rcuop/16        Cpus_allowed_list:      0-1
# Name:   cpuhp/17        Cpus_allowed_list:      17
# Name:   ovs-vswitchd    Cpus_allowed_list:      0-1
# Name:   watchdog/17     Cpus_allowed_list:      0-19
# Name:   kworker/u40:1-events_unbound    Cpus_allowed_list:      0-1
# Name:   migration/17    Cpus_allowed_list:      17
# Name:   kworker/0:0-events      Cpus_allowed_list:      0
# Name:   posixcputmr/17  Cpus_allowed_list:      17
# Name:   rcuc/17 Cpus_allowed_list:      17
# Name:   NetworkManager  Cpus_allowed_list:      0-1
# Name:   kworker/0:1-events      Cpus_allowed_list:      0
# Name:   ksoftirqd/17    Cpus_allowed_list:      17
# Name:   kworker/u40:2-events_unbound    Cpus_allowed_list:      0-1
# Name:   sshd    Cpus_allowed_list:      0-1
# Name:   sshd    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      0-1
# Name:   sudo    Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      0-1
# Name:   kworker/17:0-mm_percpu_wq       Cpus_allowed_list:      17
# Name:   irq/79-i40e-ens Cpus_allowed_list:      1
# Name:   irq/80-i40e-ens Cpus_allowed_list:      0
# Name:   irq/81-i40e-ens Cpus_allowed_list:      0
# Name:   irq/82-i40e-ens Cpus_allowed_list:      1
# Name:   kworker/17:0H   Cpus_allowed_list:      17
# Name:   irq/83-i40e-ens Cpus_allowed_list:      0
# Name:   irq/84-i40e-ens Cpus_allowed_list:      0
# Name:   kworker/1:1-xfs-cil/dm-0        Cpus_allowed_list:      1
# Name:   irq/85-i40e-ens Cpus_allowed_list:      0
# Name:   irq/86-i40e-ens Cpus_allowed_list:      0
# Name:   irq/87-i40e-ens Cpus_allowed_list:      0
# Name:   irq/88-i40e-ens Cpus_allowed_list:      0
# Name:   irq/89-i40e-ens Cpus_allowed_list:      0
# Name:   irq/90-i40e-ens Cpus_allowed_list:      0
# Name:   irq/91-i40e-ens Cpus_allowed_list:      0
# Name:   irq/92-i40e-ens Cpus_allowed_list:      0
# Name:   cpuhp/0 Cpus_allowed_list:      0
# Name:   rcuop/17        Cpus_allowed_list:      0-1
# Name:   irq/93-i40e-ens Cpus_allowed_list:      1
# Name:   irq/94-i40e-ens Cpus_allowed_list:      0
# Name:   irq/95-i40e-ens Cpus_allowed_list:      1
# Name:   irq/96-i40e-ens Cpus_allowed_list:      0
# Name:   irq/97-i40e-ens Cpus_allowed_list:      1
# Name:   irq/98-i40e-ens Cpus_allowed_list:      1
# Name:   kworker/1:4-xfs-cil/dm-0        Cpus_allowed_list:      1
# Name:   cpuhp/18        Cpus_allowed_list:      18
# Name:   kworker/0:0H-xfs-log/dm-0       Cpus_allowed_list:      0
# Name:   kworker/u40:3-events_unbound    Cpus_allowed_list:      0-1
# Name:   watchdog/18     Cpus_allowed_list:      0-19
# Name:   kworker/1:0-events      Cpus_allowed_list:      1
# Name:   migration/18    Cpus_allowed_list:      18
# Name:   sleep   Cpus_allowed_list:      0-1,19
# Name:   sleep   Cpus_allowed_list:      0-1,19
# Name:   sh      Cpus_allowed_list:      0-1
# Name:   posixcputmr/18  Cpus_allowed_list:      18
# Name:   agetty  Cpus_allowed_list:      0-1
# Name:   agetty  Cpus_allowed_list:      0-1
# Name:   rcuc/18 Cpus_allowed_list:      18
# Name:   ksoftirqd/18    Cpus_allowed_list:      18
# Name:   kworker/18:0-mm_percpu_wq       Cpus_allowed_list:      18
# Name:   kworker/18:0H   Cpus_allowed_list:      18
# Name:   rcuop/18        Cpus_allowed_list:      0-1
# Name:   cpuhp/1 Cpus_allowed_list:      1
# Name:   cpuhp/19        Cpus_allowed_list:      19
# Name:   irq/70-ens81f0n Cpus_allowed_list:      0
# Name:   irq/71-ens81f0n Cpus_allowed_list:      1
# Name:   irq/72-ens81f0n Cpus_allowed_list:      0
# Name:   irq/73-ens81f0n Cpus_allowed_list:      1
# Name:   irq/74-ens81f0n Cpus_allowed_list:      0
# Name:   watchdog/19     Cpus_allowed_list:      0-19
# Name:   irq/75-ens81f0n Cpus_allowed_list:      1
# Name:   irq/76-ens81f0n Cpus_allowed_list:      0
# Name:   irq/77-ens81f0n Cpus_allowed_list:      1
# Name:   migration/19    Cpus_allowed_list:      19
# Name:   posixcputmr/19  Cpus_allowed_list:      19
# Name:   rcuc/19 Cpus_allowed_list:      19
# Name:   irq/110-ens81f1 Cpus_allowed_list:      0
# Name:   irq/111-ens81f1 Cpus_allowed_list:      1
# Name:   irq/112-ens81f1 Cpus_allowed_list:      0
# Name:   irq/113-ens81f1 Cpus_allowed_list:      1
# Name:   irq/114-ens81f1 Cpus_allowed_list:      0
# Name:   irq/115-ens81f1 Cpus_allowed_list:      1
# Name:   irq/116-ens81f1 Cpus_allowed_list:      0
# Name:   irq/117-ens81f1 Cpus_allowed_list:      1
# Name:   ksoftirqd/19    Cpus_allowed_list:      19
# Name:   kworker/19:0-mm_percpu_wq       Cpus_allowed_list:      19
# Name:   kworker/19:0H   Cpus_allowed_list:      19
# Name:   rcuop/19        Cpus_allowed_list:      0-1
# Name:   watchdog/1      Cpus_allowed_list:      0-19
# Name:   irq/4-ttyS0     Cpus_allowed_list:      0
# Name:   kdevtmpfs       Cpus_allowed_list:      0-1
# Name:   netns   Cpus_allowed_list:      0-19
# Name:   rcu_tasks_kthre Cpus_allowed_list:      0-1
# Name:   kauditd Cpus_allowed_list:      0-1
# Name:   sshd    Cpus_allowed_list:      0-1
# Name:   rpcbind Cpus_allowed_list:      0-1
# Name:   rpc.statd       Cpus_allowed_list:      0-1
# Name:   khungtaskd      Cpus_allowed_list:      0-1
# Name:   oom_reaper      Cpus_allowed_list:      0-1
# Name:   kthreadd        Cpus_allowed_list:      0-1
# Name:   migration/1     Cpus_allowed_list:      1
# Name:   writeback       Cpus_allowed_list:      0-19
# Name:   kcompactd0      Cpus_allowed_list:      0-1
# Name:   ksmd    Cpus_allowed_list:      0-1
# Name:   crypto  Cpus_allowed_list:      0-19
# Name:   kintegrityd     Cpus_allowed_list:      0-19
# Name:   kblockd Cpus_allowed_list:      0-19
# Name:   irq/9-acpi      Cpus_allowed_list:      0
# Name:   tpm_dev_wq      Cpus_allowed_list:      0-19
# Name:   posixcputmr/1   Cpus_allowed_list:      1
# Name:   md      Cpus_allowed_list:      0-19
# Name:   crio    Cpus_allowed_list:      0-1
# Name:   edac-poller     Cpus_allowed_list:      0-19
# Name:   watchdogd       Cpus_allowed_list:      0-1
# Name:   rcuc/1  Cpus_allowed_list:      1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   ksoftirqd/1     Cpus_allowed_list:      1
# Name:   kswapd0 Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kworker/2:1-mm_percpu_wq        Cpus_allowed_list:      2
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   kworker/3:1-mm_percpu_wq        Cpus_allowed_list:      3
# Name:   kworker/4:1-mm_percpu_wq        Cpus_allowed_list:      4
# Name:   kworker/1:1H-kblockd    Cpus_allowed_list:      1
# Name:   kworker/5:1-mm_percpu_wq        Cpus_allowed_list:      5
# Name:   kworker/6:1-mm_percpu_wq        Cpus_allowed_list:      6
# Name:   kworker/7:1-mm_percpu_wq        Cpus_allowed_list:      7
# Name:   kworker/8:1-mm_percpu_wq        Cpus_allowed_list:      8
# Name:   kworker/9:1-mm_percpu_wq        Cpus_allowed_list:      9
# Name:   kworker/10:1-mm_percpu_wq       Cpus_allowed_list:      10
# Name:   kworker/11:1-mm_percpu_wq       Cpus_allowed_list:      11
# Name:   kworker/12:1-mm_percpu_wq       Cpus_allowed_list:      12
# Name:   kworker/13:1-mm_percpu_wq       Cpus_allowed_list:      13
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kworker/14:1-mm_percpu_wq       Cpus_allowed_list:      14
# Name:   kworker/15:1-mm_percpu_wq       Cpus_allowed_list:      15
# Name:   cpuhp/2 Cpus_allowed_list:      2
# Name:   kworker/16:1-mm_percpu_wq       Cpus_allowed_list:      16
# Name:   sh      Cpus_allowed_list:      0-1,19
# Name:   kworker/17:1-mm_percpu_wq       Cpus_allowed_list:      17
# Name:   tail    Cpus_allowed_list:      0-1,19
# Name:   kworker/18:1-mm_percpu_wq       Cpus_allowed_list:      18
# Name:   kworker/19:1-mm_percpu_wq       Cpus_allowed_list:      19
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   watchdog/2      Cpus_allowed_list:      0-19
# Name:   kubelet Cpus_allowed_list:      0-1
# Name:   systemd Cpus_allowed_list:      0-1
# Name:   (sd-pam)        Cpus_allowed_list:      0-1
# Name:   podman pause    Cpus_allowed_list:      0-1
# Name:   machine-config- Cpus_allowed_list:      0-1,19
# Name:   kworker/0:1H-kblockd    Cpus_allowed_list:      0
# Name:   openshift-sdn-n Cpus_allowed_list:      0-1,19
# Name:   migration/2     Cpus_allowed_list:      2
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   run     Cpus_allowed_list:      0-1,19
# Name:   posixcputmr/2   Cpus_allowed_list:      2
# Name:   rcu_gp  Cpus_allowed_list:      0-19
# Name:   rcuc/2  Cpus_allowed_list:      2
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   oauth-proxy     Cpus_allowed_list:      0-1,19
# Name:   kube-rbac-proxy Cpus_allowed_list:      0-1,19
# Name:   openshift-tuned Cpus_allowed_list:      0-1,19
# Name:   ksoftirqd/2     Cpus_allowed_list:      2
# Name:   kworker/1:2H-kblockd    Cpus_allowed_list:      1
# Name:   polkitd Cpus_allowed_list:      0-1
# Name:   kworker/2:0-mm_percpu_wq        Cpus_allowed_list:      2
# Name:   journalctl      Cpus_allowed_list:      0-1,19
# Name:   kworker/2:0H    Cpus_allowed_list:      2
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   node_exporter   Cpus_allowed_list:      0-1,19
# Name:   rcuop/2 Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kthrotld        Cpus_allowed_list:      0-19
# Name:   irq/24-PCIe PME Cpus_allowed_list:      1
# Name:   irq/26-PCIe PME Cpus_allowed_list:      1
# Name:   kube-rbac-proxy Cpus_allowed_list:      0-1,19
# Name:   irq/26-pciehp   Cpus_allowed_list:      1
# Name:   irq/26-s-pciehp Cpus_allowed_list:      1
# Name:   irq/27-PCIe PME Cpus_allowed_list:      1
# Name:   cpuhp/3 Cpus_allowed_list:      3
# Name:   irq/65-PCIe PME Cpus_allowed_list:      0
# Name:   irq/65-aerdrv   Cpus_allowed_list:      0
# Name:   irq/65-s-aerdrv Cpus_allowed_list:      0
# Name:   acpi_thermal_pm Cpus_allowed_list:      0-19
# Name:   kmpath_rdacd    Cpus_allowed_list:      0-19
# Name:   kaluad  Cpus_allowed_list:      0-19
# Name:   irq/66-xhci_hcd Cpus_allowed_list:      1
# Name:   irq/8-rtc0      Cpus_allowed_list:      0
# Name:   ipv6_addrconf   Cpus_allowed_list:      0-19
# Name:   kstrp   Cpus_allowed_list:      0-19
# Name:   watchdog/3      Cpus_allowed_list:      0-19
# Name:   migration/3     Cpus_allowed_list:      3
# Name:   posixcputmr/3   Cpus_allowed_list:      3
# Name:   rcuc/3  Cpus_allowed_list:      3
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   network-metrics Cpus_allowed_list:      0-1,19
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   rcu_par_gp      Cpus_allowed_list:      0-19
# Name:   ksoftirqd/3     Cpus_allowed_list:      3
# Name:   pod     Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kube-rbac-proxy Cpus_allowed_list:      0-1,19
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kworker/3:0-mm_percpu_wq        Cpus_allowed_list:      3
# Name:   entrypoint.sh   Cpus_allowed_list:      0-1,19
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   coredns Cpus_allowed_list:      0-1,19
# Name:   kworker/3:0H    Cpus_allowed_list:      3
# Name:   rcuop/3 Cpus_allowed_list:      0-1
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   kube-rbac-proxy Cpus_allowed_list:      0-1,19
# Name:   cpuhp/4 Cpus_allowed_list:      4
# Name:   conmon  Cpus_allowed_list:      0-1
# Name:   bash    Cpus_allowed_list:      0-1,19
# Name:   watchdog/4      Cpus_allowed_list:      0-19
# Name:   migration/4     Cpus_allowed_list:      4
# Name:   posixcputmr/4   Cpus_allowed_list:      4
# Name:   tuned   Cpus_allowed_list:      0-1,19
# Name:   rcuc/4  Cpus_allowed_list:      4
# Name:   ksoftirqd/4     Cpus_allowed_list:      4
# Name:   irqbalance      Cpus_allowed_list:      0-1
# Name:   stalld  Cpus_allowed_list:      0-1
# Name:   kworker/4:0-mm_percpu_wq        Cpus_allowed_list:      4
# Name:   kworker/4:0H    Cpus_allowed_list:      4
# Name:   iscsi_eh        Cpus_allowed_list:      0-19
# Name:   rcuop/4 Cpus_allowed_list:      0-1
# Name:   cpuhp/5 Cpus_allowed_list:      5
# Name:   watchdog/5      Cpus_allowed_list:      0-19
# Name:   migration/5     Cpus_allowed_list:      5
# Name:   posixcputmr/5   Cpus_allowed_list:      5
# Name:   rcuc/5  Cpus_allowed_list:      5
# Name:   ksoftirqd/5     Cpus_allowed_list:      5
# Name:   kworker/5:0-mm_percpu_wq        Cpus_allowed_list:      5
# Name:   kworker/5:0H    Cpus_allowed_list:      5
# Name:   rcuop/5 Cpus_allowed_list:      0-1
# Name:   cpuhp/6 Cpus_allowed_list:      6
# Name:   watchdog/6      Cpus_allowed_list:      0-19
# Name:   cnic_wq Cpus_allowed_list:      0-19
# Name:   bnx2i_thread/0  Cpus_allowed_list:      0
# Name:   bnx2i_thread/1  Cpus_allowed_list:      1
# Name:   bnx2i_thread/2  Cpus_allowed_list:      2
# Name:   bnx2i_thread/3  Cpus_allowed_list:      3
# Name:   bnx2i_thread/4  Cpus_allowed_list:      4
# Name:   bnx2i_thread/5  Cpus_allowed_list:      5
# Name:   bnx2i_thread/6  Cpus_allowed_list:      6
# Name:   migration/6     Cpus_allowed_list:      6
# Name:   bnx2i_thread/7  Cpus_allowed_list:      7
# Name:   bnx2i_thread/8  Cpus_allowed_list:      8
# Name:   bnx2i_thread/9  Cpus_allowed_list:      9
# Name:   bnx2i_thread/10 Cpus_allowed_list:      10
# Name:   bnx2i_thread/11 Cpus_allowed_list:      11
# Name:   bnx2i_thread/12 Cpus_allowed_list:      12
# Name:   bnx2i_thread/13 Cpus_allowed_list:      13
# Name:   bnx2i_thread/14 Cpus_allowed_list:      14
# Name:   bnx2i_thread/15 Cpus_allowed_list:      15
# Name:   bnx2i_thread/16 Cpus_allowed_list:      16
# Name:   posixcputmr/6   Cpus_allowed_list:      6
# Name:   bnx2i_thread/17 Cpus_allowed_list:      17
# Name:   bnx2i_thread/18 Cpus_allowed_list:      18
# Name:   bnx2i_thread/19 Cpus_allowed_list:      19
# Name:   rcuc/6  Cpus_allowed_list:      6
# Name:   ksoftirqd/6     Cpus_allowed_list:      6
# Name:   kworker/6:0-mm_percpu_wq        Cpus_allowed_list:      6
# Name:   kmpathd Cpus_allowed_list:      0-19
# Name:   kworker/6:0H    Cpus_allowed_list:      6
# Name:   kmpath_handlerd Cpus_allowed_list:      0-19
# Name:   rcuop/6 Cpus_allowed_list:      0-1
# Name:   cpuhp/7 Cpus_allowed_list:      7
# Name:   watchdog/7      Cpus_allowed_list:      0-19
# Name:   migration/7     Cpus_allowed_list:      7
# Name:   posixcputmr/7   Cpus_allowed_list:      7
# Name:   rcuc/7  Cpus_allowed_list:      7
# Name:   ksoftirqd/7     Cpus_allowed_list:      7
# Name:   kworker/7:0-mm_percpu_wq        Cpus_allowed_list:      7
# Name:   kworker/7:0H    Cpus_allowed_list:      7
# Name:   ata_sff Cpus_allowed_list:      0-19
# Name:   i40e    Cpus_allowed_list:      0-19
# Name:   rcuop/7 Cpus_allowed_list:      0-1
# Name:   bnxt_pf_wq      Cpus_allowed_list:      0-19
# Name:   irq/67-ahci[000 Cpus_allowed_list:      0
# Name:   scsi_eh_0       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_0      Cpus_allowed_list:      0-19
# Name:   scsi_eh_1       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_1      Cpus_allowed_list:      0-19
# Name:   scsi_eh_2       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_2      Cpus_allowed_list:      0-19
# Name:   cpuhp/8 Cpus_allowed_list:      8
# Name:   scsi_eh_3       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_3      Cpus_allowed_list:      0-19
# Name:   scsi_eh_4       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_4      Cpus_allowed_list:      0-19
# Name:   scsi_eh_5       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_5      Cpus_allowed_list:      0-19
# Name:   watchdog/8      Cpus_allowed_list:      0-19
# Name:   irq/99-i40e-000 Cpus_allowed_list:      0
# Name:   irq/78-i40e-000 Cpus_allowed_list:      0
# Name:   irq/109-ahci[00 Cpus_allowed_list:      1
# Name:   scsi_eh_6       Cpus_allowed_list:      0-1
# Name:   migration/8     Cpus_allowed_list:      8
# Name:   scsi_tmf_6      Cpus_allowed_list:      0-19
# Name:   scsi_eh_7       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_7      Cpus_allowed_list:      0-19
# Name:   scsi_eh_8       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_8      Cpus_allowed_list:      0-19
# Name:   scsi_eh_9       Cpus_allowed_list:      0-1
# Name:   scsi_tmf_9      Cpus_allowed_list:      0-19
# Name:   scsi_eh_10      Cpus_allowed_list:      0-1
# Name:   scsi_tmf_10     Cpus_allowed_list:      0-19
# Name:   posixcputmr/8   Cpus_allowed_list:      8
# Name:   scsi_eh_11      Cpus_allowed_list:      0-1
# Name:   scsi_tmf_11     Cpus_allowed_list:      0-19
# Name:   scsi_eh_12      Cpus_allowed_list:      0-1
# Name:   scsi_tmf_12     Cpus_allowed_list:      0-19
# Name:   scsi_eh_13      Cpus_allowed_list:      0-1
# Name:   scsi_tmf_13     Cpus_allowed_list:      0-19
# Name:   irq/139-i40e-00 Cpus_allowed_list:      0
# Name:   rcuc/8  Cpus_allowed_list:      8
# Name:   irq/118-i40e-00 Cpus_allowed_list:      1
# Name:   ksoftirqd/8     Cpus_allowed_list:      8
# Name:   kworker/8:0-mm_percpu_wq        Cpus_allowed_list:      8
# Name:   kworker/8:0H    Cpus_allowed_list:      8
# Name:   rcuop/8 Cpus_allowed_list:      0-1
# Name:   cpuhp/9 Cpus_allowed_list:      9
# Name:   mm_percpu_wq    Cpus_allowed_list:      0-19
# Name:   watchdog/9      Cpus_allowed_list:      0-19
# Name:   migration/9     Cpus_allowed_list:      9
# Name:   posixcputmr/9   Cpus_allowed_list:      9
# Name:   kdmflush        Cpus_allowed_list:      0-19
# Name:   rcuc/9  Cpus_allowed_list:      9
# Name:   xfsalloc        Cpus_allowed_list:      0-19
# Name:   xfs_mru_cache   Cpus_allowed_list:      0-19
# Name:   ksoftirqd/9     Cpus_allowed_list:      9
# Name:   xfs-buf/dm-0    Cpus_allowed_list:      0-19
# Name:   xfs-conv/dm-0   Cpus_allowed_list:      0-19
# Name:   xfs-cil/dm-0    Cpus_allowed_list:      0-19
# Name:   xfs-reclaim/dm- Cpus_allowed_list:      0-19
# Name:   xfs-log/dm-0    Cpus_allowed_list:      0-19
# Name:   xfs-eofblocks/d Cpus_allowed_list:      0-19
# Name:   xfsaild/dm-0    Cpus_allowed_list:      0-1
# Name:   kworker/9:0-mm_percpu_wq        Cpus_allowed_list:      9
# Name:   kworker/9:0H    Cpus_allowed_list:      9
# Name:   rcuop/9 Cpus_allowed_list:      0-1
# Name:   cpuhp/10        Cpus_allowed_list:      10
# Name:   watchdog/10     Cpus_allowed_list:      0-19

openshift 4.3 build config & hpa

video for build config & scale up

  • https://youtu.be/O0TjPBisMVo
  • https://www.bilibili.com/video/BV1rT4y137QJ/
  • https://www.ixigua.com/i6824464593977344525/

video for scale up & service

  • https://youtu.be/6fMe7T4RlCI
  • https://www.bilibili.com/video/BV1Xt4y1y7xG/
  • https://www.ixigua.com/i6824739572237206023/

php build config


# prepare a php test image
cat << EOF > php.dockerfile
FROM php:apache
COPY . /var/www/html/
EOF

cat <<EOF > index.php
<?php
ECHO "Hello!<br>";
echo "Welcome to RedHat Developer<br>";
EcHo "Enjoy all of the ad-free articles<br>";
?>
EOF

buildah build-using-dockerfile -t docker.io/wangzheng422/php:demo -f php.dockerfile .

podman run -it --rm -p 18080:80 --name my-running-app docker.io/wangzheng422/php:demo

# create a git server using gogs; after it starts, some configuration is needed:
# configure resolv.conf
# configure app.ini
# [webhook]
# SKIP_TLS_VERIFY  = true

mkdir -p /data/ocp4/gogs
podman run -d --name=gogs -p 10022:22 -p 10080:3000 -v /data/ocp4/gogs:/data:Z registry.redhat.ren:5443/docker.io/gogs/gogs

podman stop gogs
podman start gogs
# http://registry.redhat.ren:10080

# create the build config in the demo project
oc project demo

oc import-image php:apache-wzh --from=registry.redhat.ren:5443/docker.io/library/php:apache-wzh --confirm

# oc import-image php:apache-wzh --from=registry.redhat.ren:5443/docker.io/wangzheng422/php:apache --confirm

oc create is php-sample -n demo

cat << EOF > bc.is.yaml
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
  name: "php-sample-build" 
spec:
  runPolicy: "Serial" 
  triggers: 
    - type: "Generic"
      generic:
        secret: "secret101"
    -
      type: "ImageChange"
  source: 
    git:
      uri: "http://registry.redhat.ren:10080/root/php"
    dockerfile: "FROM php:apache\nCOPY . /var/www/html/" 
  strategy: 
    dockerStrategy:
      from:
        kind: "ImageStreamTag"
        name: "php:apache-wzh"
  output: 
    to:
      kind: "ImageStreamTag"
      name: "php-sample:demo"
EOF
oc apply -f bc.is.yaml
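# besides the generic webhook and image-change triggers, a build can be started by
# hand to verify the build config end to end
oc start-build php-sample-build -n demo --follow
oc get build -n demo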

# In the web console, create an application from the image, then push a code change to trigger a rebuild and redeployment of the application.

hpa

Here we show how OpenShift automatically scales the number of pods up and down based on CPU load.

video

  • https://youtu.be/_UTncz3StXE
  • https://www.bilibili.com/video/BV1Tk4y1r7Be/

# oc autoscale dc/php-sample \
#   --min 1 \
#   --max 3 \
#   --cpu-percent=50 

# create an HPA for the existing deployment
cat << 'EOF' > demo.hpa.yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: php-sample
  namespace: demo
spec:
  scaleTargetRef:
    kind: Deployment
    name: php-sample
    apiVersion: apps/v1
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50
EOF
oc apply -n demo -f demo.hpa.yaml

# To avoid interfering with other tests, we pin the php pods to the same host
cat << 'EOF'
nodeSelector:
    kubernetes.io/hostname: 'infra1.hsc.redhat.ren'
EOF

# To make the scaling effect easier to see, we limit the cpu usage
cat << 'EOF'
    resources:
      requests:
        cpu: '100m'
        memory: "1G"
      limits:
        cpu: '100m'
        memory: "1G"
EOF

# start generating load
ab -c 100 -n 99999 http://php-sample-demo.apps.ocpsc.redhat.ren/

# check the current status of the hpa
oc describe hpa/php-sample
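# watch the replica count follow the load: it should climb toward maxReplicas=3 under
# pressure and drop back to 1 a few minutes after the ab run stops
oc get hpa php-sample -n demo -w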

Dead ends

skopeo copy docker://docker.io/php:apache docker-archive:///root/tmp/php.tar
gzip php.tar

skopeo copy docker-archive:///data/ocp4/tmp/php.tar.gz docker://registry.redhat.ren:5443/docker.io/library/php:apache

skopeo copy docker://docker.io/wangzheng422/php:apache docker://registry.redhat.ren:5443/docker.io/wangzheng422/php:apache

cat << EOF > docker.php.sh
#!/usr/bin/env bash

set -e
set -x

buildah from --name onbuild-container docker.io/php:apache
buildah run onbuild-container sed -i "s/80/8080/g" /etc/apache2/sites-available/000-default.conf /etc/apache2/ports.conf
buildah umount onbuild-container 
buildah config -p 8080 onbuild-container
buildah commit --squash --rm --format=docker onbuild-container docker.io/wangzheng422/php:apache
buildah push docker.io/wangzheng422/php:apache
EOF
bash docker.php.sh

cat << EOF > docker.php.sh
#!/usr/bin/env bash

set -e
set -x

buildah from --name onbuild-container registry.redhat.ren:5443/docker.io/library/php:apache
buildah run onbuild-container sed -i "s/80/8080/g" /etc/apache2/sites-available/000-default.conf /etc/apache2/ports.conf
buildah umount onbuild-container 
buildah config -p 8080 onbuild-container
buildah commit --squash --rm --format=docker onbuild-container registry.redhat.ren:5443/docker.io/library/php:apache-wzh
buildah push registry.redhat.ren:5443/docker.io/library/php:apache-wzh
EOF
bash docker.php.sh

# we do not need the complex templates
oc get template -n openshift | grep php

# the source-to-image feature is enough, so look up the image streams
oc get is -A | grep php

# change the samples operator's management state
oc get configs.samples.operator.openshift.io/cluster -o yaml

oc patch configs.samples.operator.openshift.io/cluster -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

export LOCAL_REG='registry.redhat.ren:5443'

var_is_name='php'
var_json=$(oc get is ${var_is_name} -n openshift -o json)

var_j=0
for var_is_tag in $(echo $var_json | jq -r ".spec.tags[].name"); do
    var_is_image_name=$(echo $var_json | jq -r ".spec.tags[${var_j}].from.name")
        
    var_is_image_kind=$(echo $var_json | jq -r ".spec.tags[${var_j}].from.kind")
    
    if [[ $var_is_image_kind =~ 'DockerImage'  ]]; then
        var_new_is_image_name="${LOCAL_REG}/$var_is_image_name"
        
        echo "###############################"
        echo $var_is_image_name
        echo $var_is_image_kind
        echo $var_new_is_image_name
        echo $var_is_tag

        oc patch -n openshift is ${var_is_name} --type='json' -p="[{\"op\": \"replace\", \"path\": \"/spec/tags/${var_j}/from/name\", \"value\":\"${var_new_is_image_name}\"}]"
    fi
    var_j=$((var_j+1))
done
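# verify that every DockerImage tag of the image stream now points at the local registry
oc get is ${var_is_name} -n openshift -o json | jq -r '.spec.tags[].from.name'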

containered cloud-native (ccn) roadshow offline deployment

CCN is a good set of training material that demonstrates CI/CD, cloud-native development, Istio, and serverless on top of OpenShift; the teaching content is very rich.

The first module focuses on how to split a monolithic application and how the split-out services move to the cloud.

The second module covers how to debug online and how to monitor the applications running on the cloud.

The third module converts the application to a service mesh (Istio) architecture.

The fourth module develops the application on a serverless (Knative) architecture.

Training videos

Installation videos

However, the upstream CCN is based on the RH demo system and must be run online, so an offline version was made here for customers to use in disconnected environments.

Offline deployment architecture

This offline CCN build is based on OCP 4.4.7 and contains 4 modules.

Making CCN work offline mainly involves the following 3 pieces of work:

  • taking the github repositories offline
  • taking the maven and npm dependencies offline
  • taking the required images offline

The deployment architecture used in the lab is as follows, for reference:

As you can see, it is no different from the standard deployment architecture, except that gogs and nexus are added on the helper node.

安装介质下载

请到如下的链接,下载安装介质,注意,这个安装介质是基于ocp 4.4.7 制作。

  • 链接: https://pan.baidu.com/s/1f3EcbojFss5cDDQBPBzA-A 密码: 1jun

Because the upload was split into 5GB chunks, merge the downloaded parts back together with a command like the following:

cat registry.?? > registry.tgz
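
For reference, a sketch of how such 5GB chunks are typically produced before upload (an assumption, not necessarily the exact command used), plus a quick check that the merged tarball is readable:

split -b 5G -d registry.tgz registry.
tar tzf registry.tgz | head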

百度盘上还会有补丁文件,比如,当有一个 agnosticd.zip 文件时, 这个就是补丁文件,上传到helper上,替换ocp4.tgz解压缩出来的同名文件即可。

教材修订

教材根据上游的项目做了修订,主要是源代码,为了应对纯离线环境,做了小的修改。如果在教学现场,发现有步骤做不下去,多半是因为离线环境的问题,请参考教学视频录像,里面会有如何绕过离线环境问题的技巧。

基础ocp4.4环境的部署细节

  • 按照离线的方法安装ocp4,里面要特别注意要有这些安装细节
    • 打上离线registries.conf的补丁
    • 打上local image registry ca的补丁
    • 配置image registry
    • 配置sample operator,并打上image stream的补丁
    • 部署离线operator hub

ccn for ocp-4.4 安装步骤

建议用独立的ocp4集群来安装ccn教材,因为ccn教材会全局的激活多个operator,这些operator也许对集群中的其他环境有影响。

# on helper
# deploy gogs
export LOCAL_REG='registry.redhat.ren:5443/'
# export LOCAL_REG=''
# gogs_var_date='2020-07-06'
podman stop gogs
podman rm -fv gogs
cd /data/ccn
rm -rf /data/ccn/gogs
podman run -d --name gogs-fs --entrypoint "tail" ${LOCAL_REG}docker.io/wangzheng422/gogs-fs:2020-07-17-1412 -f /dev/null
podman cp gogs-fs:/gogs.tgz /data/ccn/
tar zxf gogs.tgz
podman rm -fv gogs-fs

# change /data/ccn/gogs/resolv.conf to fit your env
# change /data/ccn/gogs/gogs/conf/app.ini to fit your env
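
# Example only -- the nameserver below is an assumption, point it at your own DNS:
cat << EOF > /data/ccn/gogs/resolv.conf
nameserver 192.168.7.11
EOF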

# generally, tag latest works
podman run -d --name=gogs -p 10022:22 -p 10080:3000 -v /data/ccn/gogs:/data:Z -v /data/ccn/gogs/resolv.conf:/etc/resolv.conf:Z ${LOCAL_REG}docker.io/gogs/gogs
# for those not using provided source, try a specific tag.
podman run -d --name=gogs -p 10022:22 -p 10080:3000 -v /data/ccn/gogs:/data:Z -v /data/ccn/gogs/resolv.conf:/etc/resolv.conf:Z ${LOCAL_REG}docker.io/gogs/gogs:0.12.3

# restore if you need.
podman stop gogs
podman rm -fv gogs

# deploy nexus
mkdir -p /data/ccn/nexus
cd /data/ccn
rm -rf /data/ccn/nexus
podman run -d --name nexus-fs --entrypoint "tail" ${LOCAL_REG}docker.io/wangzheng422/nexus-fs:2020-07-20-0320 -f /dev/null
podman cp nexus-fs:/nexus.tgz /data/ccn/
tar zxf nexus.tgz ./
podman rm -fv nexus-fs

chown -R 200:root /data/ccn/nexus

# generally, tag latest works
podman run -d -p 8081:8081 -it --name nexus -v /data/ccn/nexus:/nexus-data:Z ${LOCAL_REG}docker.io/sonatype/nexus3:latest
# for those not using provided source, try a specific tag.
podman run -d -p 8081:8081 -it --name nexus -v /data/ccn/nexus:/nexus-data:Z ${LOCAL_REG}docker.io/sonatype/nexus3:3.26.1


# restore if you need.
podman stop nexus
podman rm -fv nexus

# deploy etherpad
mkdir -p /data/ccn/etherpad
chown -R 5001 /data/ccn/etherpad

podman run -d -p 9001:9001 -it --name etherpad -v /data/ccn/etherpad:/opt/etherpad-lite/var:z ${LOCAL_REG}docker.io/etherpad/etherpad:latest

# restore if you need.
podman stop etherpad
podman rm -fv etherpad

# agnosticd on helper

mkdir -p /data/pip3
cd /data/pip3
podman create --name swap registry.redhat.ren:5443/docker.io/wangzheng422/base-fs:pip3-whl-2020-07-05 ls
podman cp swap:/wheelhouse.tar.gz - > wheelhouse.tar.gz
tar vxf wheelhouse.tar.gz
podman rm -fv swap

pip3 install --user --upgrade -r wheelhouse/requirements.txt --no-index --find-links wheelhouse

# yum downgrade ansible-2.8.12-1.el7ae

# 安装ccn环境的参数
oc login -u kubeadmin
# oc login -u system:admin
# TARGET_HOST="bastion.rhte-b5c8.openshiftworkshop.com"
OCP_USERNAME="system:admin"
WORKLOAD="ocp4-workload-ccnrd"
GUID=b5c8
USER_COUNT=2
MODULE_TYPE="m1;m2;m3;m4"
SSH_KEY=~/.ssh/id_rsa
WZH_SUBDOMIN_BASE=base.ocp4.redhat.ren
WZH_REGISTRY_SERVER=registry.redhat.ren:5443
WZH_GOGS_SERVER=gogs.redhat.ren:10080

# create users
BASE_DIR="/root/ocp4"
mkdir -p ${BASE_DIR}
cd ${BASE_DIR}
/bin/rm -f ${BASE_DIR}/htpasswd
touch ${BASE_DIR}/htpasswd

for i in $(seq 1 $USER_COUNT)
do 
    htpasswd -Bb ${BASE_DIR}/htpasswd user${i} redhat
done

oc create secret generic htpasswd --from-file=${BASE_DIR}/htpasswd -n openshift-config

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: Local Password
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd
EOF

# oc delete secret htpasswd -n openshift-config

# 以下是安装步骤
# a TARGET_HOST is specified in the command line, without using an inventory file
cd /root/ocp4/agnosticd/ansible
ansible-playbook -i localhost, ./configs/ocp-workloads/ocp-workload.yml \
    -e"ansible_ssh_private_key_file=${SSH_KEY}" \
    -e"ansible_user=root" \
    -e"ocp_username=${OCP_USERNAME}" \
    -e"ocp_workload=${WORKLOAD}" \
    -e"silent=False" \
    -e"guid=${GUID}" \
    -e"num_users=${USER_COUNT}" \
    -e"user_count=${USER_COUNT}" \
    -e"module_type=${MODULE_TYPE}" \
    -e"wzh_registry_server=${WZH_REGISTRY_SERVER}" \
    -e"wzh_gogs_server=${WZH_GOGS_SERVER}" \
    -e"ansible_python_interpreter=/usr/bin/python3" \
    -e"subdomain_base=${WZH_SUBDOMIN_BASE}" \
    -v \
    -e"ACTION=create"

# The demo sites in the lab pull a few static assets from the public internet. If the client browsers
# cannot reach the internet (or cannot get online "fast" enough), add DNS records that resolve those
# CDN domains to the platform router; the offline install media ships a static-html service for these files.
# at.alicdn.com
# maxcdn.bootstrapcdn.com
# cdnjs.cloudflare.com
# ajax.googleapis.com
# code.jquery.com

# 以下是删除ccn的步骤,注意大部分的operator不会删除。
# a TARGET_HOST is specified in the command line, without using an inventory file
ansible-playbook -i localhost, ./configs/ocp-workloads/ocp-workload.yml \
    -e"ansible_ssh_private_key_file=${SSH_KEY}" \
    -e"ansible_user=root" \
    -e"ocp_username=${OCP_USERNAME}" \
    -e"ocp_workload=${WORKLOAD}" \
    -e"silent=False" \
    -e"guid=${GUID}" \
    -e"num_users=${USER_COUNT}" \
    -e"user_count=${USER_COUNT}" \
    -e"module_type=${MODULE_TYPE}" \
    -e"wzh_registry_server=${WZH_REGISTRY_SERVER}" \
    -e"wzh_gogs_server=${WZH_GOGS_SERVER}" \
    -e"ansible_python_interpreter=/usr/bin/python3" \
    -e"subdomain_base=${WZH_SUBDOMIN_BASE}" \
    -v \
    -e"ACTION=remove"


其他备忘

yum install -y wget jq

# Keycloak credentials: admin / 2kBdjDwcZK94
# STACK_ID: stacksq1xbet4os1uioep

manually patch image stream

  • jenkins:2 to registry.redhat.ren/ocp4/openshift4@sha256:*****
  • jenkins:latest to registry.redhat.ren/ocp4/openshift4@sha256:*****
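
A hedged sketch of that manual patch using oc tag; the digest is left as a placeholder and must be looked up in the mirror registry first:

MIRROR_DIGEST=sha256:*****
oc tag --source=docker registry.redhat.ren/ocp4/openshift4@${MIRROR_DIGEST} openshift/jenkins:2
oc tag --source=docker registry.redhat.ren/ocp4/openshift4@${MIRROR_DIGEST} openshift/jenkins:latest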

tips

oc get istio-io -n opentlc-mgr-tutorial

oc new-build -i openshift/redhat-openjdk18-openshift:1.5 --binary --name=inventory-quarkus -l app=inventory-quarkus

npm run nodeshift --  --dockerImage=registry.redhat.ren:5443/docker.io/wangzheng422/cloudnative-workspaces-quarkus  --imageTag=nodejs-10-2020-07-16-2155

todo

  • PPT

离线ccn, containered cloud native 制作

基本思路

  • 需要一个离线的github
    • 目前看,gogs没有体现在离线部署脚本中。
    • gogs集群外部署,不外置数据库。以后在考虑如何集群内部署,如何pv import
    • 研究gogs api,批量创建用户和project
  • 需要一个maven的离线proxy
    • 目前看,没有包含在离线脚本中,但是crw里面有个配置,指向了离线proxy,似乎好做。
    • nexus集群外部署.
  • 需要各种镜像
    • 目前看,用的大多是image stream,反而好做

additional need:

  • maven repository cache
  • github clone site
    • https://github.com/wangzheng422/cloud-native-workshop-v2m1-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2m2-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2m3-guides
    • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2m4-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2-infra
      • branch: dev-ocp-4.2
    • https://github.com/wangzheng422/cloud-native-workshop-v2m1-labs
    • https://github.com/wangzheng422/cloud-native-workshop-v2m2-labs
    • https://github.com/wangzheng422/cloud-native-workshop-v2m3-labs
    • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2m4-labs

image need:

  • gitlab/gitlab-ce:latest
  • quay.io/osevg/workshopper
  • quay.io/openshiftlabs/rhamt-web-openshift-messaging-executor:4.2.1.Final
  • quay.io/openshiftlabs/rhamt-web-openshift:4.2.1.Final
  • registry.redhat.io/openshift-service-mesh/istio-rhel8-operator:1.0.3
  • is: jenkins:2 from ocp 4.2 install
  • is: quarkus-stack:1.3 quay.io/openshiftlabs/cloudnative-workspaces-quarkus:1.3 to change .m2/settings.xml to add my mirror

reference:

  • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2-infra/tree/ocp-3.11 , we use ocp-4.2 branch right now.

my upstream repository

  • https://github.com/wangzheng422/cloud-native-workshop-v2-infra
  • quay.io/wangzheng422/gogs-fs
  • quay.io/wangzheng422/nexus-fs

build github clone site, using gogs,

似乎gogs并没有在离线部署脚本中

# http://gogs.redhat.ren:10080/

yum install firewalld
systemctl enable firewalld
systemctl start firewalld

yum -y install podman pigz skopeo buildah

podman stop gogs || true
podman rm -fv gogs || true
podman stop nexus || true
podman rm -fv nexus || true
podman stop etherpad || true
podman rm -fv etherpad || true

podman image prune -a

cd /data/ccn
rm -rf /data/ccn/gogs
podman run -d --name gogs-fs --entrypoint "tail" docker.io/wangzheng422/gogs-fs:init -f /dev/null
podman cp gogs-fs:/gogs.tgz /data/ccn/
tar zxf gogs.tgz
podman rm -fv gogs-fs

firewall-cmd --permanent --add-port=10080/tcp
firewall-cmd --reload
firewall-cmd --list-all

podman run -d --name=gogs -p 10022:22 -p 10080:3000 -v /data/ccn/gogs:/data:Z docker.io/gogs/gogs

# Custom config '/data/ccn/gogs/gogs/conf/app.ini'
# find the access key in pwd file
export ACCESS_KEY=""

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m1-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m2-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m3-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m4-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m1-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m2-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m3-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m4-labs

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m1-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m1-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m2-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m2-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m3-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m3-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m4-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m4-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m1-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m1-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m2-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m2-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m3-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m3-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m4-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m4-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://gogs.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/spring-projects/spring-petclinic.git"'", "uid": '"1"', "repo_name": "'"spring-petclinic"'" }' 


podman logs -f gogs

podman stop gogs
podman rm -fv gogs

# bash demo.env.build.sh
cd /data/ccn

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

podman stop gogs
podman rm -fv gogs

tar cf - ./gogs | pigz -c > gogs.tgz
buildah from --name onbuild-container docker.io/library/centos:centos7
buildah copy onbuild-container gogs.tgz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/gogs-fs:$var_date
# buildah rm onbuild-container
buildah push docker.io/wangzheng422/gogs-fs:$var_date
echo "docker.io/wangzheng422/gogs-fs:$var_date"


build maven repository cache

# http://nexus.redhat.ren:8081
mkdir -p /data/ccn
cd /data/ccn
rm -rf /data/ccn/nexus
podman run -d --name nexus-fs --entrypoint "tail" docker.io/wangzheng422/nexus-fs:2020-10-25-0919 -f /dev/null
podman cp nexus-fs:/nexus.tgz /data/ccn/
tar zxf nexus.tgz ./
podman rm -fv nexus-fs

podman run -d -p 8081:8081 -it --name nexus -v /data/ccn/nexus:/nexus-data:Z docker.io/sonatype/nexus3:3.26.1


## change code ready workspace
# change maven settings.xml for maven proxy

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

# on vultr init stack image
# mkdir -p /data/ccn/workspaces
# cd /data/ccn/workspaces
# # wget -O settings.xml https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/settings.xml
# wget -O settings.xml https://raw.githubusercontent.com/wangzheng422/agnosticd/wzh-dev/ansible/roles/ocp4-workload-ccnrd/files/settings.xml
# wget -O .npmrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/.npmrc
# wget -O stack.Dockerfile https://raw.githubusercontent.com/wangzheng422/agnosticd/wzh-dev/ansible/roles/ocp4-workload-ccnrd/files/stack.Dockerfile

# buildah bud --squash --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:init-2.1 -f stack.Dockerfile .

# buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:init-2.1

# on vultr, update the stack image
var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

mkdir -p /data/ccn/workspaces
cd /data/ccn/workspaces
# /bin/cp -f /data/order-service.tgz ./
wget -O settings.xml https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/settings.xml
wget -O .npmrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/.npmrc
wget -O .bowerrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/.bowerrc
wget --no-check-certificate --no-cache --no-cookies -O stack.Dockerfile https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/stack.dev.Dockerfile

buildah bud --squash --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:$var_date -f stack.Dockerfile .

buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:$var_date

# on site stack update
buildah from --name onbuild-container registry.redhat.ren:5443/docker.io/wangzheng422/cloudnative-workspaces-quarkus:2020-07-08-1594213447
buildah run onbuild-container /bin/rm -rf /tmp/*
buildah umount onbuild-container 
buildah commit --rm --squash --format=docker onbuild-container registry.redhat.ren:5443/docker.io/wangzheng422/cloudnative-workspaces-quarkus:$var_date
# buildah rm onbuild-container
buildah push registry.redhat.ren:5443/docker.io/wangzheng422/cloudnative-workspaces-quarkus:$var_date
echo "registry.redhat.ren:5443/docker.io/wangzheng422/cloudnative-workspaces-quarkus:$var_date"


# get nexus fs
podman stop nexus
podman rm -fv nexus

cd /data/ccn

tar cf - ./nexus | pigz -c > nexus.tgz 
buildah from --name onbuild-container docker.io/library/centos:centos7
buildah copy onbuild-container nexus.tgz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:$var_date
# buildah rm onbuild-container
buildah push docker.io/wangzheng422/nexus-fs:$var_date
echo "docker.io/wangzheng422/nexus-fs:$var_date"
# docker.io/wangzheng422/nexus-fs:2020-10-25-0919

nodejs image


# on vultr, update the stack image
var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

mkdir -p /data/ccn/workspaces
cd /data/ccn/workspaces
# /bin/cp -f /data/order-service.tgz ./
wget -O settings.xml https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/settings.xml
wget -O .npmrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/.npmrc
wget --no-check-certificate --no-cache --no-cookies -O stack.Dockerfile https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/nodejs-10.Dockerfile

buildah bud --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:nodejs-10-$var_date -f stack.Dockerfile .

buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:nodejs-10-$var_date


build static html file


# get source to image 
# https://github.com/openshift/source-to-image
wget -O source-to-image.tgz https://github.com/openshift/source-to-image/releases/download/v1.3.0/source-to-image-v1.3.0-eed2850f-linux-amd64.tar.gz
tar zvxf source-to-image.tgz
mv s2i /usr/local/bin/

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

rm -rf /data/ccn/static-html
mkdir -p /data/ccn/static-html/files
cd /data/ccn/static-html/files

mkdir -p bootstrap/3.3.5/css/
wget -O bootstrap/3.3.5/css/bootstrap.min.css https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css
wget -O bootstrap/3.3.5/css/bootstrap-theme.min.css https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap-theme.min.css

mkdir -p bootstrap/3.3.5/js/
wget -O bootstrap/3.3.5/js/bootstrap.min.js https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js

mkdir -p ajax/libs/jquery/2.1.4/
wget -O ajax/libs/jquery/2.1.4/jquery.min.js https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js

mkdir -p bootstrap/3.3.5/fonts/
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.woff2  https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.woff2
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.woff https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.woff
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.ttf https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.ttf

mkdir -p t/
wget -O t/font_148784_v4ggb6wrjmkotj4i.woff      https://at.alicdn.com/t/font_148784_v4ggb6wrjmkotj4i.woff
wget -O t/font_148784_v4ggb6wrjmkotj4i.ttf       https://at.alicdn.com/t/font_148784_v4ggb6wrjmkotj4i.ttf

mkdir -p bootstrap/4.0.0-beta/css/
wget -O bootstrap/4.0.0-beta/css/bootstrap.min.css       https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/css/bootstrap.min.css

mkdir -p ajax/libs/patternfly/3.24.0/css/
wget -O ajax/libs/patternfly/3.24.0/css/patternfly.min.css      https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0/css/patternfly.min.css
wget -O ajax/libs/patternfly/3.24.0/css/patternfly-additions.min.css    https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0/css/patternfly-additions.min.css

wget -O jquery-3.2.1.min.js     https://code.jquery.com/jquery-3.2.1.min.js

mkdir -p ajax/libs/jquery-timeago/1.6.1/
wget -O ajax/libs/jquery-timeago/1.6.1/jquery.timeago.min.js    https://cdnjs.cloudflare.com/ajax/libs/jquery-timeago/1.6.1/jquery.timeago.min.js

mkdir -p ajax/libs/angularjs/1.4.8/
wget -O ajax/libs/angularjs/1.4.8/angular.min.js        https://ajax.googleapis.com/ajax/libs/angularjs/1.4.8/angular.min.js

cd /data/ccn/static-html/

s2i build --rm  files/  registry.redhat.io/rhscl/nginx-114-rhel7:latest  nginx-sample-app

docker tag nginx-sample-app docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date
docker push docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date
echo docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date

wget -O mime.types https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/mime.types
wget -O nginx.conf https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/nginx.conf

cat << EOF > nginx.Dockerfile
FROM docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date

USER root
COPY mime.types /etc/nginx/
COPY nginx.conf /etc/nginx/

USER 1001
EOF

buildah bud --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date -f nginx.Dockerfile .

buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date
echo "docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date"


docker image prune -f
podman image prune -a

# oc -n labs-infra create route edge static-html-0 --service=static-html --hostname=maxcdn.bootstrapcdn.com 
# oc -n labs-infra create route edge static-html-1 --service=static-html   --hostname=ajax.googleapis.com 
# oc -n labs-infra create route edge static-html-2 --service=static-html   --hostname=at.alicdn.com
# oc -n labs-infra create route edge static-html-3 --service=static-html   --hostname=cdnjs.cloudflare.com
# oc -n labs-infra create route edge static-html-4 --service=static-html   --hostname=code.jquery.com

pip for agnosticd

# on vultr, prepare pip
# https://www.linuxtechi.com/use-ansible-galaxy-roles-ansible-playbook/
# https://docs.ansible.com/ansible/latest/scenario_guides/guide_kubernetes.html
# https://stackoverflow.com/questions/11091623/how-to-install-packages-offline
# https://www.activestate.com/resources/quick-reads/how-to-update-all-python-packages/
# yum install -y python2-pip
mkdir -p /data/pip3
cd /data/pip3
# pip install --upgrade pip
pip3 install --user --upgrade kubernetes openshift requests
pip3 freeze > requirements.txt
pip3 install -r requirements.txt --upgrade
mkdir -p wheelhouse
pip3 download -r requirements.txt -d wheelhouse
/bin/cp -f requirements.txt wheelhouse/
tar -zcf wheelhouse.tar.gz wheelhouse


var_date=$(date '+%Y-%m-%d')
echo $var_date

buildah from --name onbuild-container scratch
buildah copy onbuild-container wheelhouse.tar.gz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/base-fs:pip3-whl-$var_date
# buildah rm onbuild-container
buildah push docker.io/wangzheng422/base-fs:pip3-whl-$var_date
echo "docker.io/wangzheng422/base-fs:pip3-whl-$var_date"

labs sync


rsync -e ssh --info=progress2 -P --delete -arz bastion.fd21.example.opentlc.com:/data/ccn/nexus/  /data/ccn/nexus/

rsync -e ssh -P --delete -arz root@bastion.fd21.example.opentlc.com:/data/ccn/nexus/  ./nexus/ 

rsync -e ssh -P --delete -arz  ./nexus/  root@192.168.7.11:/data/ccn/nexus/   

chown -R 200:root nexus

rsync -e ssh --info=progress2 -P --delete -arz   192.168.252.11:/data/ccn/nexus/   ./nexus/   



other tips

find object blocks deleting namespace/project

  • https://access.redhat.com/solutions/4165791
PROJECT_NAME=user1-cloudnativeapps

oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -n $PROJECT_NAME

oc api-resources --verbs=list --cached --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -n $PROJECT_NAME


configuration.serving.knative.dev/payment
service.serving.knative.dev/payment
route.serving.knative.dev/payment


service mesh & knative

oc project istio-system
oc get pod -o json | jq -r '.items[].spec.containers[].image' > tmp.list

oc project istio-operator
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project knative-eventing
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project knative-serving
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project tekton-pipelines
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc get pod -o json | jq -r '.items[].spec.initContainers[].image' >> tmp.list

oc project openshift-operators
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list



cat tmp.list | sort | uniq

oc project user0-catalog
oc get pod -o json | jq -r '.items[].spec.containers[].image'| sort | uniq 


try the install shell

cd
git clone https://github.com/wangzheng422/cloud-native-workshop-v2-infra
cd cloud-native-workshop-v2-infra
git fetch origin 
git checkout -b dev-ocp-4.2 origin/dev-ocp-4.2

# in local vm
rsync -e ssh --info=progress2 -P --delete -arz /data/registry-add root@base-pvg.redhat.ren:/data/

# on base-pvg
ansible localhost -m lineinfile -a 'path=/etc/hosts line="127.0.0.1 registry-add.redhat.ren"'

cat > /etc/dnsmasq.d/origin-upstream-dns.conf << EOF 
server=10.66.208.137
EOF

systemctl restart dnsmasq

podman run -d --name mirror-registry \
-p 5000:5000 --restart=always \
-v /data/registry-add:/var/lib/registry:z \
-v /etc/crts/:/certs:z \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
registry:2


###
skopeo copy docker://docker.io/wangzheng422/gogs-fs:2020-01-01 docker://registry.redhat.ren/docker.io/wangzheng422/gogs-fs:2020-01-01 
skopeo copy docker://docker.io/wangzheng422/nexus-fs:2020-01-01 docker://registry.redhat.ren/docker.io/wangzheng422/nexus-fs:2020-01-01 


# spring.datasource.initialization-mode: always

tips

  • spring.datasource.initialization-mode=always
  • prometheus: [ url ]

nodejs

git clone https://github.com/wangzheng422/cloud-native-workshop-v2m4-labs && cd cloud-native-workshop-v2m4-labs && git checkout ocp-4.4 && cd coolstore-ui

cat << EOF > Dockerfile
FROM docker.io/wangzheng422/cloudnative-workspaces-quarkus:nodejs-10-2020-07-16-2155

# Add application sources to a directory that the assemble script expects them
# and set permissions so that the container runs without root access
USER 0
ADD . /tmp/src
RUN chown -R 1001:0 /tmp/src
USER 1001

# Install the dependencies
RUN /usr/libexec/s2i/assemble

# Set the default command for the resulting image
CMD /usr/libexec/s2i/run
EOF


cat << "EOF" > post_install.sh
#!/bin/bash
var_new_domain="static-html-labs-infra.apps.redhat.container-contest.top"
var_new_domain_enc=$(echo $var_new_domain | sed "s/\./\\\./g")

# node_modules/.bin/bower install

# grep -rni "at.alicdn.com" *
# grep -rl 'at.alicdn.com' * | xargs sed -i "s/at\.alicdn\.com/$var_new_domain_enc/g" 
grep -rl 'code.jquery.com' * | xargs sed -i "s/code\.jquery\.com/$var_new_domain_enc/g" 

grep -rni "code.jquery.com" * || true
EOF

# change package.json:
# point its postinstall script at the shell above,
# and fix the CDN domain issues that way
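
# One way to do that (a sketch, using jq): wire the post_install.sh written above into the postinstall hook.
jq '.scripts.postinstall = "bash post_install.sh"' package.json > package.json.new && mv package.json.new package.json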

podman build -t node-app .

以下是弯路

build github clone site, using gitlab

yum -y install podman

rm -rf /data/ccn/gitlab
mkdir -p /data/ccn/gitlab/config
mkdir -p /data/ccn/gitlab/logs
mkdir -p /data/ccn/gitlab/data


# podman run --detach \
#   --hostname local.redhat.ren \
#   --env GITLAB_OMNIBUS_CONFIG="external_url 'http://local.redhat.ren:7080/'; gitlab_rails['lfs_enabled'] = true;" \
#   --publish 7443:443 --publish 7080:80 --publish 7022:22 \
#   --name gitlab \
#   --restart always \
#   --volume /data/ocp4/demo/gitlab/config:/etc/gitlab:Z \
#   --volume /data/ocp4/demo/gitlab/logs:/var/log/gitlab:Z \
#   --volume /data/ocp4/demo/gitlab/data:/var/opt/gitlab:Z \
#   gitlab/gitlab-ce:latest

podman run --detach \
  --hostname local.redhat.ren \
  --publish 7443:443 --publish 7080:80 --publish 7022:22 \
  --name gitlab \
  --restart always \
  --volume /data/ccn/gitlab/config:/etc/gitlab:Z \
  --volume /data/ccn/gitlab/logs:/var/log/gitlab:Z \
  --volume /data/ccn/gitlab/data:/var/opt/gitlab:Z \
  gitlab/gitlab-ce:latest

# set default username / password
# root / redhat2019

podman stop gitlab

podman rm -fv gitlab

cd /data/ccn
# tar zcf gitlab.tgz ./gitlab 
cat << EOF > /data/ccn/gitlab.files.Dockerfile
FROM registry.redhat.io/ubi7/ubi
COPY gitlab /gitlab
EOF
podman build --no-cache -f /data/ccn/gitlab.files.Dockerfile -t quay.io/wangzheng422/gitlab-fs /data/ccn/
podman push quay.io/wangzheng422/gitlab-fs

podman exec -it gitlab update-permissions
podman restart gitlab
podman logs -f gitlab
getfacl /data/ccn/gitlab/

# now we try to use it
rm -rf /data/ccn/gitlab
podman run -d --name gitlab-fs --entrypoint "tail" quay.io/wangzheng422/gitlab-fs -f /dev/null
podman cp gitlab-fs:/gitlab /data/ccn/
podman rm -fv gitlab-fs
# tar zxf gitlab.tgz
# chown -R root: /data/ccn/gitlab/

containered cloud-native (ccn) roadshow 离线部署

CCN是一个不错的演示openshift之上,ci/cd, cloud-native, istio, serverless的演示教材,教学的内容非常丰富。

第一个模块,着重讲解如何拆分单体应用,以及拆分的应用如何上云。

第二个模块,讲解如何在线debug, 如何监控上云的应用

第三个模块,应用转换到服务网格service mesh/istio架构

第四个模块,应用使用无服务架构serverless/knative架构开发

培训过程视频

安装过程视频

不过 upstream 的 CCN 是基于 rh demo system 的,必须在线,这里就做了一个离线的版本,供给客户离线使用。

离线部署架构描述

本次CCN离线,是基于ocp 4.6.9 制作。一共有4个module。

做CCN的离线,主要有以下3部分工作

  • github 离线
  • maven, npm 离线
  • 需要的镜像离线

在实验室的部署架构如下,供参考:

可以看到,与标准的部署架构没什么区别,就是在helper节点上面,加了gogs, nexus。

安装介质下载

请到如下的链接,下载安装介质,注意,这个安装介质是基于ocp 4.6.9 制作。

链接: https://pan.baidu.com/s/1jJU0HLnZMnvCNMNq1OEDxA 密码: uaaw

其中包括如下类型的文件:

  • ocp4.tgz 这个文件包含了iso等安装介质,以及各种安装脚本,全部下载的镜像列表等。需要复制到宿主机,以及工具机上去。
  • registry.tgz 这个文件也是docker image registry的仓库打包文件。需要先补充镜像的话,按照这里操作: 4.6.add.image.md
  • nexus-image.tgz 这个是nexus的镜像仓库打包,集群的镜像proxy指向nexus,由nexus提供镜像的cache
  • poc.image.tgz 这个是给registry.tgz补充的一些镜像,主要是ccn使用,补充的镜像列表在这里 poc.image.list ,按照这里操作: 4.6.add.image.md
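
A minimal sketch of unpacking them on the helper node (the target path is an assumption, adjust to your own layout):

tar zxf ocp4.tgz -C /data/
# registry.tgz / nexus-image.tgz / poc.image.tgz are unpacked the same way, once the split parts have been merged (see below)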

Because the upload was split into 5GB chunks, merge the downloaded parts back together with a command like the following:

cat registry.?? > registry.tgz

百度盘上还会有补丁文件,比如,当有一个 agnosticd.zip 文件时, 这个就是补丁文件,上传到helper上,替换ocp4.tgz解压缩出来的同名文件即可。

教材修订

教材根据上游的项目做了修订,主要是源代码,为了应对纯离线环境,做了小的修改。如果在教学现场,发现有步骤做不下去,多半是因为离线环境的问题,请参考教学视频录像,里面会有如何绕过离线环境问题的技巧。

基础ocp4.6环境的部署细节

  • 按照离线的方法安装ocp4,里面要特别注意要有这些安装细节
    • 部署nexus镜像仓库代理
    • 打上离线registries.conf的补丁,指向nexus
    • 给ingress配置真证书
    • 配置image registry
    • 配置sample operator,并打上image stream的补丁
    • 部署离线operator hub

ccn for ocp-4.6 安装步骤

建议用独立的ocp4集群来安装ccn教材,因为ccn教材会全局的激活多个operator,这些operator也许对集群中的其他环境有影响。

# on helper
# deploy gitea
export LOCAL_REG='registry.ocp4.redhat.ren:5443'
# export LOCAL_REG=''
# gogs_var_date='2020-07-06'
podman stop gitea
podman rm -fv gitea

mkdir -p /data/ccn/gitea

cd /data/ccn
podman create --name swap $LOCAL_REG/wangzheng422/gogs-fs:gitea-2020-12-26-1325 ls
podman cp swap:/gitea.tgz /data/ccn/gitea.tgz
podman rm -fv swap
tar zvxf gitea.tgz
rm -f gitea.tgz
chown -R 1000:1000 /data/ccn/gitea

podman run -d --name gitea \
  -v /data/ccn/gitea:/data:Z \
  -e USER_UID=1000 \
  -e USER_GID=1000 \
  -p 10080:3000 \
  -p 10022:22 \
  ${LOCAL_REG}/gitea/gitea:1.13.0

# deploy nexus for maven
mkdir -p /data/ccn/nexus
cd /data/ccn/
podman create --name swap $LOCAL_REG/wangzheng422/nexus-fs:maven-2020-12-25-2024 ls
podman cp swap:/nexus.tgz /data/ccn/nexus.tgz
podman rm -fv swap
tar zvxf nexus.tgz
rm -f nexus.tgz
chown -R 200 /data/ccn/nexus

podman run -d -p 8081:8081 --name nexus -v /data/ccn/nexus:/nexus-data:Z $LOCAL_REG/sonatype/nexus3:3.29.0


# deploy etherpad for notes
mkdir -p /data/ccn/etherpad
chown -R 5001 /data/ccn/etherpad

podman run -d -p 9001:9001 -it --name etherpad -v /data/ccn/etherpad:/opt/etherpad-lite/var:z $LOCAL_REG/etherpad/etherpad:latest

# deploy mta vscode extenstion to helper web server
mkdir -p /data/ccn/vscode
mkdir -p /var/www/html/ccn/
cd /data/ccn/vscode
podman create --name swap $LOCAL_REG/wangzheng422/imgs:mta-vscode-extension.vsix-2020-12-30-1012 ls
podman cp swap:/mta-vscode-extension.vsix /var/www/html/ccn/mta-vscode-extension.vsix
podman cp swap:/logo-eclipseche.svg /var/www/html/ccn/logo-eclipseche.svg
podman rm -fv swap


# agnosticd on helper
mkdir -p /data/pip3
cd /data/pip3
podman create --name swap $LOCAL_REG/wangzheng422/base-fs:pip3-whl-2020-07-05 ls
podman cp swap:/wheelhouse.tar.gz wheelhouse.tar.gz
tar vxf wheelhouse.tar.gz
podman rm -fv swap

pip3 install --user --upgrade -r wheelhouse/requirements.txt --no-index --find-links wheelhouse

# 集群证书
# ccn 环境,高度依赖ingress证书,需要配置一个公网CA签发的真证书,给 *.apps.ocp4.redhat.ren
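
# A hedged sketch of swapping that certificate in; wildcard.crt / wildcard.key and the secret name are placeholders:
oc create secret tls wzh-wildcard-cert --cert=wildcard.crt --key=wildcard.key -n openshift-ingress
oc patch ingresscontroller.operator default -n openshift-ingress-operator --type=merge \
  -p '{"spec":{"defaultCertificate":{"name":"wzh-wildcard-cert"}}}'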

# install chrome on kvm host
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum install ./google-chrome-stable_current_*.rpm
google-chrome --no-sandbox --ignore-certificate-errors &

# fix js cache issue
cat << EOF >> /etc/hosts
127.0.0.1 maxcdn.bootstrapcdn.com ajax.googleapis.com at.alicdn.com cdnjs.cloudflare.com code.jquery.com
EOF

# 安装ccn环境的参数
# oc login -u kubeadmin
oc login -u system:admin
# TARGET_HOST="bastion.rhte-b5c8.openshiftworkshop.com"
OCP_USERNAME="system:admin"
WORKLOAD="ocp4-workload-ccnrd"
GUID=b5c8
USER_COUNT=2
MODULE_TYPE="m1;m2;m3;m4"
SSH_KEY=~/.ssh/helper_rsa
WZH_SUBDOMIN_BASE=base.ocp4.redhat.ren
WZH_REGISTRY_SERVER=nexus.ocp4.redhat.ren:8083
WZH_GOGS_SERVER=git.ocp4.redhat.ren:10080
WZH_WEB_SERVER=helper.ocp4.redhat.ren:8080

ssh-copy-id -i ~/.ssh/helper_rsa.pub root@localhost

# create users
BASE_DIR="/data/install"
mkdir -p ${BASE_DIR}
cd ${BASE_DIR}
/bin/rm -f ${BASE_DIR}/htpasswd
touch ${BASE_DIR}/htpasswd

for i in $(seq 1 $USER_COUNT)
do 
    htpasswd -Bb ${BASE_DIR}/htpasswd user${i} redhat
done

oc create secret generic htpasswd --from-file=${BASE_DIR}/htpasswd -n openshift-config

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: HTPassword
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd
EOF

# oc delete secret htpasswd -n openshift-config

# 以下是安装步骤
# a TARGET_HOST is specified in the command line, without using an inventory file
oc project default
cd /data/ocp4/agnosticd/ansible
ansible-playbook -i localhost, ./configs/ocp-workloads/ocp-workload.yml \
    -e"ansible_ssh_private_key_file=${SSH_KEY}" \
    -e"ansible_user=root" \
    -e"ocp_username=${OCP_USERNAME}" \
    -e"ocp_workload=${WORKLOAD}" \
    -e"silent=False" \
    -e"guid=${GUID}" \
    -e"num_users=${USER_COUNT}" \
    -e"user_count=${USER_COUNT}" \
    -e"module_type=${MODULE_TYPE}" \
    -e"wzh_registry_server=${WZH_REGISTRY_SERVER}" \
    -e"wzh_gogs_server=${WZH_GOGS_SERVER}" \
    -e"wzh_web_server=${WZH_WEB_SERVER}" \
    -e"ansible_python_interpreter=/usr/bin/python3" \
    -e"subdomain_base=${WZH_SUBDOMIN_BASE}" \
    -v \
    -e"ACTION=create"

# The demo sites in the lab pull a few static assets from the public internet. If the client browsers
# cannot reach the internet (or cannot get online "fast" enough), add DNS records that resolve those
# CDN domains to the platform router; the offline install media ships a static-html service for these files.
# at.alicdn.com
# maxcdn.bootstrapcdn.com
# cdnjs.cloudflare.com
# ajax.googleapis.com
# code.jquery.com

# 以下是删除ccn的步骤,注意大部分的operator不会删除。
# a TARGET_HOST is specified in the command line, without using an inventory file
cd /data/ocp4/agnosticd/ansible
ansible-playbook -i localhost, ./configs/ocp-workloads/ocp-workload.yml \
    -e"ansible_ssh_private_key_file=${SSH_KEY}" \
    -e"ansible_user=root" \
    -e"ocp_username=${OCP_USERNAME}" \
    -e"ocp_workload=${WORKLOAD}" \
    -e"silent=False" \
    -e"guid=${GUID}" \
    -e"num_users=${USER_COUNT}" \
    -e"user_count=${USER_COUNT}" \
    -e"module_type=${MODULE_TYPE}" \
    -e"wzh_registry_server=${WZH_REGISTRY_SERVER}" \
    -e"wzh_gogs_server=${WZH_GOGS_SERVER}" \
    -e"wzh_web_server=${WZH_WEB_SERVER}" \
    -e"ansible_python_interpreter=/usr/bin/python3" \
    -e"subdomain_base=${WZH_SUBDOMIN_BASE}" \
    -v \
    -e"ACTION=remove"


做练习中需要注意的地方

# git 链接要改成gitea上的地址
# http://git.ocp4.redhat.ren:10080/root/cloud-native-workshop-v2m1-labs.git
# http://git.ocp4.redhat.ren:10080/root/cloud-native-workshop-v2m2-labs.git
# http://git.ocp4.redhat.ren:10080/root/cloud-native-workshop-v2m3-labs.git
# http://git.ocp4.redhat.ren:10080/root/cloud-native-workshop-v2m4-labs.git
# http://git.ocp4.redhat.ren:10080/root/vote-api.git
# http://git.ocp4.redhat.ren:10080/root/vote-ui.git
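
# For a lab repo that is already cloned, one way (a sketch) to re-point it at the local gitea:
git remote set-url origin http://git.ocp4.redhat.ren:10080/root/cloud-native-workshop-v2m1-labs.git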

# oc 命令有引用镜像的地方,都要改成nexus上的地址
oc new-build --docker-image=nexus.ocp4.redhat.ren:8083/ubi8/openjdk-11 --binary --name=catalog-springboot -l app=catalog-springboot

# in module 4, nodeshift编译命令要改一下。
npm run nodeshift --  --dockerImage=nexus.ocp4.redhat.ren:8083/wangzheng422/imgs --imageTag=nodejs-10-wzh-2021-01-05

其他备忘

yum install -y wget jq

# Keycloak credentials: admin / 2kBdjDwcZK94
# STACK_ID: stacksq1xbet4os1uioep

todo

  • PPT

离线ccn, containered cloud native 制作

基本思路

  • 需要一个离线的github
    • 目前看,gogs没有体现在离线部署脚本中。
    • gogs集群外部署,不外置数据库。以后在考虑如何集群内部署,如何pv import
    • 研究gogs api,批量创建用户和project
  • 需要一个maven的离线proxy
    • 目前看,没有包含在离线脚本中,但是crw里面有个配置,指向了离线proxy,似乎好做。
    • nexus集群外部署.
  • 需要各种镜像
    • 目前看,用的大多是image stream,反而好做

additional need:

  • maven repository cache
  • github clone site
    • https://github.com/wangzheng422/cloud-native-workshop-v2m1-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2m2-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2m3-guides
    • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2m4-guides
    • https://github.com/wangzheng422/cloud-native-workshop-v2-infra
      • branch: dev-ocp-4.2
    • https://github.com/wangzheng422/cloud-native-workshop-v2m1-labs
    • https://github.com/wangzheng422/cloud-native-workshop-v2m2-labs
    • https://github.com/wangzheng422/cloud-native-workshop-v2m3-labs
    • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2m4-labs

image need:

  • registry.redhat.io/openshift-service-mesh/istio-rhel8-operator:1.0.3
  • is: jenkins:2 from ocp 4.2 install
  • is: quarkus-stack:1.3 quay.io/openshiftlabs/cloudnative-workspaces-quarkus:1.3 to change .m2/settings.xml to add my mirror

reference:

  • https://github.com/RedHat-Middleware-Workshops/cloud-native-workshop-v2-infra/tree/ocp-3.11 , we use ocp-4.2 branch right now.

my upstream repository

  • quay.io/wangzheng422/gogs-fs
  • quay.io/wangzheng422/nexus-fs

build github clone site, using gitea

似乎 gitea 并没有在离线部署脚本中

# http://git.ocp4.redhat.ren:10080/

cat << EOF >>  /etc/hosts
127.0.0.1 registry.ocp4.redhat.ren nexus.ocp4.redhat.ren git.ocp4.redhat.ren
EOF

yum install -y firewalld
systemctl disable --now firewalld
# systemctl start firewalld

yum -y install podman pigz skopeo buildah

podman image prune -a

############################################
# build init fs
mkdir -p /data/ccn/gitea
cd /data/ccn
rm -rf /data/ccn/gitea

mkdir -p /data/ccn/gitea
chown -R 1000:1000 /data/ccn/gitea

podman run -d --name gitea \
  -v /data/ccn/gitea:/data:Z \
  -e USER_UID=1000 \
  -e USER_GID=1000 \
  -p 10080:3000 \
  -p 10022:22 \
  docker.io/gitea/gitea:1.13.0

# admin user: root / redhat
# api call token : 6d47a0172d53e567737f7a81bbb6dbff4c1565d1

cd /data/ccn
tar cf - ./gitea | pigz -c > gitea.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container gitea.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/gogs-fs:gitea-init
rm -f gitea.tgz 
buildah push docker.io/wangzheng422/gogs-fs:gitea-init
echo "docker.io/wangzheng422/gogs-fs:gitea-init"

######################################################
# build gitea based on init fs
mkdir -p /data/ccn/gitea
cd /data/ccn
rm -rf /data/ccn/gitea

mkdir -p /data/ccn/gitea
chown -R 1000:1000 /data/ccn/gitea

cd /data/ccn
podman create --name swap docker.io/wangzheng422/gogs-fs:gitea-init ls
podman cp swap:/gitea.tgz - > gitea.tgz
podman rm -fv swap
tar zvxf gitea.tgz
rm -f gitea.tgz
chown -R 1000:1000 /data/ccn/gitea

podman run -d --name gitea \
  -v /data/ccn/gitea:/data:Z \
  -e USER_UID=1000 \
  -e USER_GID=1000 \
  -p 10080:3000 \
  -p 10022:22 \
  docker.io/gitea/gitea:1.13.0


# Custom config '/data/ccn/gitea/gitea/conf/app.ini'
# find the access key in pwd file
export ACCESS_KEY="6d47a0172d53e567737f7a81bbb6dbff4c1565d1"

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m1-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m2-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m3-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m4-guides

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m1-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m2-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m3-labs

# curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X DELETE http://gogs.redhat.ren:10080/api/v1/repos/root/cloud-native-workshop-v2m4-labs

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m1-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m1-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m2-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m2-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m3-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m3-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m4-guides.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m4-guides"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m1-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m1-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m2-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m2-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m3-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m3-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/cloud-native-workshop-v2m4-labs.git"'", "uid": '"1"', "repo_name": "'"cloud-native-workshop-v2m4-labs"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/spring-projects/spring-petclinic.git"'", "uid": '"1"', "repo_name": "'"spring-petclinic"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/vote-api.git"'", "uid": '"1"', "repo_name": "'"vote-api"'" }' 

curl -v -s -w '%{http_code}' -H "Authorization: token ${ACCESS_KEY}" -X POST http://git.ocp4.redhat.ren:10080/api/v1/repos/migrate \
        -H "Content-Type: application/json" \
        -d '{"clone_addr": "'"https://github.com/wangzheng422/vote-ui.git"'", "uid": '"1"', "repo_name": "'"vote-ui"'" }' 


podman logs -f gitea

podman stop gitea
podman rm -fv gitea

# bash demo.env.build.sh
cd /data/ccn

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

tar cf - ./gitea | pigz -c > gitea.tgz
buildah from --name onbuild-container scratch
buildah copy onbuild-container gitea.tgz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/gogs-fs:gitea-$var_date
rm -f gitea.tgz
buildah push docker.io/wangzheng422/gogs-fs:gitea-$var_date
echo "docker.io/wangzheng422/gogs-fs:gitea-$var_date"

# docker.io/wangzheng422/gogs-fs:gitea-2021-01-06-0652

create an online nexus maven proxy

我们使用一个在线的nexus proxy,来cache maven

  • https://blog.csdn.net/kq1983/article/details/83066102
# get old fs
mkdir -p /data/ccn/nexus
cd /data/ccn/
podman create --name swap docker.io/wangzheng422/nexus-fs:2020-10-25-0919 ls
podman cp swap:/nexus.tgz - > /data/ccn/nexus.tgz
podman rm -fv swap
tar zvxf nexus.tgz
rm -f nexus.tgz

chown -R 200 /data/ccn/nexus

#####################################################
# init build the nexus fs
mkdir -p /data/ccn/nexus
chown -R 200 /data/ccn/nexus

podman run -d -p 8081:8081 --name nexus -v /data/ccn/nexus:/nexus-data:Z docker.io/sonatype/nexus3:3.29.0

podman stop nexus
podman rm nexus

# get the admin password
cat /data/ccn/nexus/admin.password && echo
# 8c9862da-5dcd-430c-a026-e3557539459a

# open http://nexus.ocp4.redhat.ren:8081

# add aliyun maven proxy
# https://blog.csdn.net/kq1983/article/details/83066102

######################################################
# dump the nexus image fs out

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date
cd /data/ccn

tar cf - ./nexus | pigz -c > nexus.tgz 
buildah from --name onbuild-container scratch
buildah copy onbuild-container nexus.tgz  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/nexus-fs:maven-$var_date
# buildah rm onbuild-container
rm -f nexus.tgz 
buildah push docker.io/wangzheng422/nexus-fs:maven-$var_date
echo "docker.io/wangzheng422/nexus-fs:maven-$var_date"

# docker.io/wangzheng422/nexus-fs:maven-2021-01-06-1456

create code ready workspace image

CRW starts a container for each workshop session, and that container's image is the developer workbench. We customize this workbench image so that maven and the other build tools all point at the internal proxies.
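
The core of that customization is a Maven mirror entry pointing at the local nexus. A rough sketch of the relevant settings.xml fragment (the nexus URL and the default maven-public group name are assumptions; the real file is fetched below):

cat << 'EOF' > settings.xml.example
<settings>
  <mirrors>
    <mirror>
      <id>nexus</id>
      <mirrorOf>*</mirrorOf>
      <url>http://nexus.ocp4.redhat.ren:8081/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>
EOF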


mkdir -p /data/ccn/workspaces
cd /data/ccn/workspaces
# /bin/cp -f /data/order-service.tgz ./
wget -O settings.xml https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/settings.xml
wget -O .npmrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/.npmrc
wget -O .bowerrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/.bowerrc
wget --no-check-certificate --no-cache --no-cookies -O stack.Dockerfile https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/stack.dev.Dockerfile

buildah bud --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:2.4.1-wzh -f stack.Dockerfile .

buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:2.4.1-wzh


mta vscode extension

ccn 4.6 做了一个vscode上的extension,这个需要做离线

################################3
## build mta extension
# install nodejs
curl -sL https://rpm.nodesource.com/setup_10.x | sudo bash -
yum install -y nodejs
npm install -g typescript vsce

mkdir -p /data/ccn/vscode
cd /data/ccn/vscode
git clone https://github.com/wangzheng422/rhamt-vscode-extension
cd rhamt-vscode-extension
git checkout ocp-4.6-ccn

npm install
npm run vscode:prepublish
vsce package -o mta-vscode-extension.vsix

cp mta-vscode-extension.vsix ../
cd /data/ccn/vscode

###################################
## use redhat upstream
var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

mkdir -p /data/ccn/vscode
cd /data/ccn/vscode
# wget -O mta-vscode-extension.vsix https://download.jboss.org/jbosstools/adapters/snapshots/mta-vscode-extension/mta-vscode-extension-0.0.48-662.vsix
wget https://www.eclipse.org/che/images/logo-eclipseche.svg

buildah from --name onbuild-container scratch
buildah copy onbuild-container mta-vscode-extension.vsix  /
buildah copy onbuild-container logo-eclipseche.svg  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-$var_date
cd /data/ccn
# rm -rf /data/ccn/vscode
buildah push docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-$var_date
echo "docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-$var_date"
# docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-2020-12-30-1012

##############################
# use real upstream
var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

buildah from --name onbuild-container quay.io/windupeng/mta-vscode-extension
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/imgs:mta-vscode-extension.base-$var_date
buildah push docker.io/wangzheng422/imgs:mta-vscode-extension.base-$var_date
echo "docker.io/wangzheng422/imgs:mta-vscode-extension.base-$var_date"
# docker.io/wangzheng422/imgs:mta-vscode-extension.base-2020-12-30-1340

# if you want to use prebuild newer version
# https://raw.githubusercontent.com/windup/rhamt-che-demo/master/meta.yaml
mkdir -p /data/ccn/vscode
cd /data/ccn/vscode
wget -O mta-vscode-extension.vsix https://download.jboss.org/jbosstools/adapters/snapshots/mta-vscode-extension/mta-vscode-extension-0.0.58-790.vsix
wget https://www.eclipse.org/che/images/logo-eclipseche.svg

buildah from --name onbuild-container scratch
buildah copy onbuild-container mta-vscode-extension.vsix  /
buildah copy onbuild-container logo-eclipseche.svg  /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-0.0.48-662
cd /data/ccn
# rm -rf /data/ccn/vscode
buildah push docker.io/wangzheng422/imgs:mta-vscode-extension.vsix-0.0.48-662

oc get pod -o json | jq -r .items[0].metadata.name
oc get pod -o json | jq -r .items[0].spec.containers[].name
oc get pod -o json | jq -r .items[0].spec.initContainers[].name

oc rsh -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("rhamt-extension") ) | .name')  $(oc get pod -o json | jq -r .items[0].metadata.name)

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("rhamt-extension") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("theia-ide") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("vscode-quarkus") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("che-jwtproxy") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("quarkus-tools") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("che-machine-exe") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.initContainers[] | select( .name | contains("remote-runtime-inject") ) | .name')

oc logs $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.initContainers[] | select( .name | contains("pluginbroker-artifacts-rhel8") ) | .name')

oc exec $(oc get pod -o json | jq -r .items[0].metadata.name) -c $(oc get pod -o json | jq -r '.items[0].spec.containers[] | select( .name | contains("rhamt-extension") ) | .name') -- /usr/sbin/killall5


build static html file


# get source to image 
# https://github.com/openshift/source-to-image
wget -O source-to-image.tgz https://github.com/openshift/source-to-image/releases/download/v1.3.0/source-to-image-v1.3.0-eed2850f-linux-amd64.tar.gz
tar zvxf source-to-image.tgz
mv s2i /usr/local/bin/

var_date=$(date '+%Y-%m-%d-%H%M')
echo $var_date

rm -rf /data/ccn/static-html
mkdir -p /data/ccn/static-html/files
cd /data/ccn/static-html/files

download_url() {
  # https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css
  var_url=$1

  # bootstrap/3.3.5/css/bootstrap.min.css
  var_file=${var_url#*.*/}
  
  # bootstrap/3.3.5/css
  var_path=${var_file%/*}
  
  mkdir -p $var_path
  wget -O $var_file $var_url

}

download_url https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css
download_url https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap-theme.min.css
download_url https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css
download_url https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js
download_url https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/css/bootstrap.min.css

download_url https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js
download_url https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.map
download_url https://ajax.googleapis.com/ajax/libs/angularjs/1.4.8/angular.min.js

download_url https://at.alicdn.com/t/font_148784_v4ggb6wrjmkotj4i.woff
download_url https://at.alicdn.com/t/font_148784_v4ggb6wrjmkotj4i.ttf

download_url https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0/css/patternfly.min.css
download_url https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0/css/patternfly-additions.min.css
download_url https://cdnjs.cloudflare.com/ajax/libs/jquery-cookie/1.4.1/jquery.cookie.js
download_url https://cdnjs.cloudflare.com/ajax/libs/jquery-timeago/1.6.1/jquery.timeago.min.js

wget -O jquery-3.2.1.min.js     https://code.jquery.com/jquery-3.2.1.min.js
wget -O jquery-latest.min.js    http://code.jquery.com/jquery-latest.min.js

mkdir -p bootstrap/3.3.5/fonts/
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.woff2  https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.woff2
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.woff https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.woff
wget -O bootstrap/3.3.5/fonts/glyphicons-halflings-regular.ttf https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/glyphicons-halflings-regular.ttf

cd /data/ccn/static-html/

s2i build --rm  files/  registry.redhat.io/rhscl/nginx-114-rhel7:latest  nginx-sample-app

docker tag nginx-sample-app docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date
docker push docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date
echo docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date

wget -O mime.types https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/mime.types
wget -O nginx.conf https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.4/ccn/nginx.conf

cat << EOF > nginx.Dockerfile
FROM docker.io/wangzheng422/cloudnative-workspaces-quarkus:swap-$var_date

USER root
COPY mime.types /etc/nginx/
COPY nginx.conf /etc/nginx/

USER 1001
EOF

buildah bud --format=docker -t docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date -f nginx.Dockerfile .

buildah push docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date
echo "docker.io/wangzheng422/cloudnative-workspaces-quarkus:static-html-$var_date"


docker image prune -f
podman image prune -a

# oc -n labs-infra create route edge static-html-0 --service=static-html --hostname=maxcdn.bootstrapcdn.com 
# oc -n labs-infra create route edge static-html-1 --service=static-html   --hostname=ajax.googleapis.com 
# oc -n labs-infra create route edge static-html-2 --service=static-html   --hostname=at.alicdn.com
# oc -n labs-infra create route edge static-html-3 --service=static-html   --hostname=cdnjs.cloudflare.com
# oc -n labs-infra create route edge static-html-4 --service=static-html   --hostname=code.jquery.com

pip for agnosticd

# on vultr, prepare pip
# https://www.linuxtechi.com/use-ansible-galaxy-roles-ansible-playbook/
# https://docs.ansible.com/ansible/latest/scenario_guides/guide_kubernetes.html
# https://stackoverflow.com/questions/11091623/how-to-install-packages-offline
# https://www.activestate.com/resources/quick-reads/how-to-update-all-python-packages/
# yum install -y python2-pip
mkdir -p /data/pip3
cd /data/pip3
# pip install --upgrade pip
pip3 install --user --upgrade kubernetes openshift requests
pip3 freeze --user > requirements.txt
# pip3 install -r requirements.txt --upgrade
mkdir -p wheelhouse
pip3 download -r requirements.txt -d wheelhouse
/bin/cp -f requirements.txt wheelhouse/
tar -zcf wheelhouse.tar.gz wheelhouse
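
On the disconnected side, the wheelhouse can then be installed without reaching any index server. A minimal sketch, assuming the tarball has been copied over and unpacked in place:

# install purely from the local wheel files, no network access needed
tar zxf wheelhouse.tar.gz
pip3 install --user --no-index --find-links=wheelhouse -r wheelhouse/requirements.txt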


var_date=$(date '+%Y-%m-%d')
echo $var_date

buildah from --name onbuild-container scratch
buildah copy onbuild-container wheelhouse.tar.gz /
buildah umount onbuild-container 
buildah commit --rm --format=docker onbuild-container docker.io/wangzheng422/base-fs:pip3-whl-$var_date
# buildah rm onbuild-container
buildah push docker.io/wangzheng422/base-fs:pip3-whl-$var_date
echo "docker.io/wangzheng422/base-fs:pip3-whl-$var_date"

nodejs

# docker.io/wangzheng422/cloudnative-workspaces-quarkus:nodejs-10-2020-07-16-2155
# this image is built using nodejs-10.Dockerfile

mkdir -p /data/ccn/nodejs
cd /data/ccn/nodejs

var_date=$(date '+%Y-%m-%d')
echo $var_date

wget -O .npmrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/.npmrc
wget -O .bowerrc https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/.bowerrc
wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/nodejs-10.Dockerfile

buildah bud --format=docker -t docker.io/wangzheng422/imgs:nodejs-10-wzh-$var_date -f nodejs-10.Dockerfile .
buildah push docker.io/wangzheng422/imgs:nodejs-10-wzh-$var_date 

echo "docker.io/wangzheng422/imgs:nodejs-10-wzh-$var_date"

# docker.io/wangzheng422/imgs:nodejs-10-wzh-2021-01-05

build dist

cd /data/ocp4
wget -O poc.image.list https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.6/ccn/poc.image.list

export MIRROR_DIR='/data/poc.image'
/bin/rm -rf ${MIRROR_DIR}
bash add.image.sh poc.image.list ${MIRROR_DIR}
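
add.image.sh is the author's helper script prepared earlier in this repo and is not reproduced here; conceptually it walks the image list and copies every image into the mirror directory. A rough sketch of that idea with skopeo, assuming the list file holds one image reference per line:

# hypothetical stand-in for add.image.sh: mirror each listed image into a local directory
mkdir -p "${MIRROR_DIR}"
while read -r img; do
  [ -z "$img" ] && continue
  skopeo copy docker://"$img" dir:"${MIRROR_DIR}/$(echo "$img" | tr '/:' '__')"
done < poc.image.list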


labs sync


rsync -e ssh --info=progress2 -P --delete -arz bastion.fd21.example.opentlc.com:/data/ccn/nexus/  /data/ccn/nexus/

rsync -e ssh -P --delete -arz root@bastion.fd21.example.opentlc.com:/data/ccn/nexus/  ./nexus/ 

rsync -e ssh -P --delete -arz  ./nexus/  root@192.168.7.11:/data/ccn/nexus/   

chown -R 200:root nexus

rsync -e ssh --info=progress2 -P --delete -arz   192.168.252.11:/data/ccn/nexus/   ./nexus/   



other tips

find the objects that block deleting a namespace/project

  • https://access.redhat.com/solutions/4165791
PROJECT_NAME=user1-cloudnativeapps

oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -n $PROJECT_NAME

oc api-resources --verbs=list --cached --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -n $PROJECT_NAME


configuration.serving.knative.dev/payment
service.serving.knative.dev/payment
route.serving.knative.dev/payment
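
The knative objects above are typical culprits. Once the blocking object is found, clearing its finalizers usually lets the namespace deletion finish; a hedged example using the resource name from the output above:

oc -n $PROJECT_NAME patch route.serving.knative.dev/payment --type=merge -p '{"metadata":{"finalizers":null}}'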


service mesh & knative

oc project istio-system
oc get pod -o json | jq -r '.items[].spec.containers[].image' > tmp.list

oc project istio-operator
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project knative-eventing
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project knative-serving
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc project tekton-pipelines
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list

oc get pod -o json | jq -r '.items[].spec.initContainers[].image' >> tmp.list

oc project openshift-operators
oc get pod -o json | jq -r '.items[].spec.containers[].image' >> tmp.list



cat tmp.list | sort | uniq
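
The deduplicated list can then be fed into the same mirroring flow used in the "build dist" section above; a sketch reusing the author's add.image.sh helper (the list file name and mirror directory here are hypothetical):

cat tmp.list | sort | uniq > mesh.image.list
export MIRROR_DIR='/data/mesh.image'
bash /data/ocp4/add.image.sh mesh.image.list ${MIRROR_DIR}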

oc project user0-catalog
oc get pod -o json | jq -r '.items[].spec.containers[].image'| sort | uniq 


The following is a detour (an approach that turned out not to work)

build a local github mirror site using gitlab

yum -y install podman

rm -rf /data/ccn/gitlab
mkdir -p /data/ccn/gitlab/config
mkdir -p /data/ccn/gitlab/logs
mkdir -p /data/ccn/gitlab/data


# podman run --detach \
#   --hostname local.redhat.ren \
#   --env GITLAB_OMNIBUS_CONFIG="external_url 'http://local.redhat.ren:7080/'; gitlab_rails['lfs_enabled'] = true;" \
#   --publish 7443:443 --publish 7080:80 --publish 7022:22 \
#   --name gitlab \
#   --restart always \
#   --volume /data/ocp4/demo/gitlab/config:/etc/gitlab:Z \
#   --volume /data/ocp4/demo/gitlab/logs:/var/log/gitlab:Z \
#   --volume /data/ocp4/demo/gitlab/data:/var/opt/gitlab:Z \
#   gitlab/gitlab-ce:latest

podman run --detach \
  --hostname local.redhat.ren \
  --publish 7443:443 --publish 7080:80 --publish 7022:22 \
  --name gitlab \
  --restart always \
  --volume /data/ccn/gitlab/config:/etc/gitlab:Z \
  --volume /data/ccn/gitlab/logs:/var/log/gitlab:Z \
  --volume /data/ccn/gitlab/data:/var/opt/gitlab:Z \
  gitlab/gitlab-ce:latest

# set default username / password
# root / redhat2019

podman stop gitlab

podman rm -fv gitlab

cd /data/ccn
# tar zcf gitlab.tgz ./gitlab 
cat << EOF > /data/ccn/gitlab.files.Dockerfile
FROM registry.redhat.io/ubi7/ubi
COPY gitlab /gitlab
EOF
podman build --no-cache -f /data/ccn/gitlab.files.Dockerfile -t quay.io/wangzheng422/gitlab-fs /data/ccn/
podman push quay.io/wangzheng422/gitlab-fs

podman exec -it gitlab update-permissions
podman restart gitlab
podman logs -f gitlab
getfacl /data/ccn/gitlab/

# now we try to use it
rm -rf /data/ccn/gitlab
podman run -d --name gitlab-fs --entrypoint "tail" quay.io/wangzheng422/gitlab-fs -f /dev/null
podman cp gitlab-fs:/gitlab /data/ccn/
podman rm -fv gitlab-fs
# tar zxf gitlab.tgz
# chown -R root: /data/ccn/gitlab/

openshift 4.10 ACM with observability use case

By default, observability is disabled. Here we enable it and see what it looks like.

Here is the architecture of the acm observability:

create the acm hub cluster

  • install a sno cluster with 16C, 64GB memory, and 2 100GB disks
  • install ODF from operator hub, and create a ceph cluster following the steps here
  • install ACM from operator hub
  • install multiclusterhub with the default settings

create the managed cluster

install a sno cluster with 16C, 32GB memory.


NODE_SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"
INSTALL_IMAGE_REGISTRY=quaylab.infra.redhat.ren:8443

PULL_SECRET='{"auths":{"registry.redhat.io": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"registry.ocp4.redhat.ren:5443": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"},"'${INSTALL_IMAGE_REGISTRY}'": {"auth": "'$( echo -n 'admin:shadowman' | openssl base64 )'","email": "noemail@localhost"}}}'

NTP_SERVER=192.168.7.11
HELP_SERVER=192.168.7.11
KVM_HOST=192.168.7.11
API_VIP=192.168.7.100
INGRESS_VIP=192.168.7.101
CLUSTER_PROVISION_IP=192.168.7.103
BOOTSTRAP_IP=192.168.7.12

ACM_DEMO_MNGED_CLUSTER=acm-demo-man01
ACM_DEMO_MNGED_SNO_IP=192.168.7.23

# node information for the single-node cluster
SNO_CLUSTER_NAME=acm-demo-man01
SNO_BASE_DOMAIN=redhat.ren
SNO_IP=192.168.7.23
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_NETMAST_S=24
SNO_HOSTNAME=acm-demo-man01-master
SNO_IF=enp1s0
SNO_IF_MAC=`printf '00:60:2F:%02X:%02X:%02X' $[RANDOM%256] $[RANDOM%256] $[RANDOM%256]`
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_CORE_PWD=redhat

echo ${SNO_IF_MAC} > /data/sno/sno.mac

# goto kvm host ( 103 )

scp root@192.168.7.11:/data/install/sno.iso /data/kvm/

virsh destroy ocp4-acm-man01
virsh undefine ocp4-acm-man01

create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

create_lv vgdata poolA lvacm-man01 100G recreate
create_lv vgdata poolA lvacm-man01-data 100G recreate

SNO_MEM=32

virt-install --name=ocp4-acm-man01-master01 --vcpus=16 --ram=$(($SNO_MEM*1024)) \
  --cpu=host-model \
  --disk path=/dev/vgdata/lvacm-man01,device=disk,bus=virtio,format=raw \
  --disk path=/dev/vgdata/lvacm-man01-data,device=disk,bus=virtio,format=raw \
  --os-variant rhel8.3 --network bridge=baremetal,model=virtio \
  --graphics vnc,port=59003 \
  --boot menu=on --cdrom /data/kvm/sno.iso 


# INFO Install complete!
# INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
# INFO Access the OpenShift web-console here: https://console-openshift-console.apps.acm-demo-man01.redhat.ren
# INFO Login to the console with user: "kubeadmin", and password: "FohuH-IwyJe-3UQPL-AakHm"
# INFO Time elapsed: 0s

enable ACM observability on the ACM hub cluster

Official document: ACM observability enablement; enabling it on the hub enables observability on the managed clusters automatically.

# try to install acm observ
# https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/observability/index
oc create namespace open-cluster-management-observability

DOCKER_CONFIG_JSON=`oc extract secret/pull-secret -n openshift-config --to=-`

oc create secret generic multiclusterhub-operator-pull-secret \
    -n open-cluster-management-observability \
    --from-literal=.dockerconfigjson="$DOCKER_CONFIG_JSON" \
    --type=kubernetes.io/dockerconfigjson
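
The four variables referenced in the secret below ($BUCKET_NAME, $AWS_HOST, $AWS_ACCESS_KEY_ID, $AWS_SECRET_ACCESS_KEY) are assumed to come from an ODF / NooBaa ObjectBucketClaim created beforehand; a minimal sketch of pulling them out (the OBC name obc-acm-observ and its namespace are hypothetical):

OBC_NAME=obc-acm-observ
OBC_NS=open-cluster-management-observability
BUCKET_NAME=$(oc -n ${OBC_NS} get configmap ${OBC_NAME} -o jsonpath='{.data.BUCKET_NAME}')
AWS_HOST=$(oc -n ${OBC_NS} get configmap ${OBC_NAME} -o jsonpath='{.data.BUCKET_HOST}')
AWS_ACCESS_KEY_ID=$(oc -n ${OBC_NS} get secret ${OBC_NAME} -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
AWS_SECRET_ACCESS_KEY=$(oc -n ${OBC_NS} get secret ${OBC_NAME} -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)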

cat << EOF > /data/install/acm.observ.secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: thanos-object-storage
  namespace: open-cluster-management-observability
type: Opaque
stringData:
  thanos.yaml: |
    type: s3
    config:
      bucket: $BUCKET_NAME
      endpoint: $AWS_HOST
      insecure: true
      access_key: $AWS_ACCESS_KEY_ID
      secret_key: $AWS_SECRET_ACCESS_KEY
EOF
oc create -f /data/install/acm.observ.secret.yaml

# oc delete -f /data/install/acm.observ.secret.yaml

cat << EOF > /data/install/acm.observ.yaml
---
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  observabilityAddonSpec: {}
  storageConfig:
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
---
EOF
oc create -f /data/install/acm.observ.yaml -n open-cluster-management

# oc delete -f /data/install/acm.observ.yaml -n open-cluster-management
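
After the MultiClusterObservability resource is created, the observability components are rolled out into the open-cluster-management-observability namespace; a simple check:

# wait for the observability pods (grafana, thanos/observatorium components, etc.) to come up
oc get pod -n open-cluster-management-observability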


import the second cluster into ACM using its kubeconfig file

click 'Grafana' at the top right, and you will see the grafana dashboards

you can see there are 3 default dashboards included with ACM; 2 of them are usable for OCP4

look at the 'ACM - Clusters Overview' dashboard

look at the 'ACM - Resource Optimization / Cluster' dashboard

for acm hub cluster:

for managed cluster:

ansible platform 2.1 install

The customer wants to install Ansible Automation Platform in a fully disconnected environment, so we install it in the simplest single-node mode. The whole process is: install the base OS from the RHEL 8.5 DVD, use the DVD as the system's dnf source, then import the 3 container images and start a local docker registry service.

Note: the single node needs at least 8 GB of memory, otherwise the installer's preflight checks will fail.

Install the operating system and configure the base services

# install rhel 8.5 using dvd iso

# reboot, and set dvd iso as dnf source
blkid | grep sr0
# /dev/sr0: BLOCK_SIZE="2048" UUID="2021-10-13-03-57-25-00" LABEL="RHEL-8-5-0-BaseOS-x86_64" TYPE="iso9660" PTUUID="4d694e6c" PTTYPE="dos"
blkid /dev/sr0 -o value | sed -n 2p
# 2021-10-13-03-57-25-00
mkdir -p /media/cdrom

mount /dev/sr0 /media/cdrom

cat << EOF >> /etc/fstab
UUID=`blkid /dev/sr0 -o value | sed -n 2p`            /media/cdrom                iso9660 ro,user,auto  0 0
EOF

cat << EOF > /etc/yum.repos.d/dvd.repo
[dvd-base]
name=dvd-base
baseurl=file:///media/cdrom/BaseOS
enabled=1
gpgcheck=0

[dvd-app]
name=dvd-app
baseurl=file:///media/cdrom/AppStream
enabled=1
gpgcheck=0
EOF

# we need to setup a docker registry
# and we need copy docker registry image into the disconnected host
podman pull docker.io/library/registry:2
podman save docker.io/library/registry:2 | pigz -c > registry.tgz

podman load -i registry.tgz
# Loaded image(s): docker.io/library/registry:2

# this is for testing/demo purposes only,
# do not turn off firewalld on a production system
systemctl disable --now firewalld

cat << EOF >>  /etc/hosts
127.0.0.1 registry.redhat.ren
EOF

# configure the local registry
mkdir -p /etc/crts/ && cd /etc/crts

openssl genrsa -out /etc/crts/redhat.ren.ca.key 4096
openssl req -x509 \
  -new -nodes \
  -key /etc/crts/redhat.ren.ca.key \
  -sha256 \
  -days 36500 \
  -out /etc/crts/redhat.ren.ca.crt \
  -subj /CN="Local Red Hat Ren Signer" \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat /etc/pki/tls/openssl.cnf \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature'))

openssl genrsa -out /etc/crts/redhat.ren.key 2048

openssl req -new -sha256 \
    -key /etc/crts/redhat.ren.key \
    -subj "/O=Local Red Hat Ren /CN=*.ocp4.redhat.ren" \
    -reqexts SAN \
    -config <(cat /etc/pki/tls/openssl.cnf \
        <(printf "\n[SAN]\nsubjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth")) \
    -out /etc/crts/redhat.ren.csr

openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=DNS:*.ocp4.redhat.ren,DNS:*.apps.ocp4.redhat.ren,DNS:*.redhat.ren\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment, keyAgreement, dataEncipherment\nextendedKeyUsage=serverAuth") \
    -days 36500 \
    -in /etc/crts/redhat.ren.csr \
    -CA /etc/crts/redhat.ren.ca.crt \
    -CAkey /etc/crts/redhat.ren.ca.key \
    -CAcreateserial -out /etc/crts/redhat.ren.crt

openssl x509 -in /etc/crts/redhat.ren.crt -text

/bin/cp -f /etc/crts/redhat.ren.ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

cd /data/ocp4
# systemctl stop docker-distribution

/bin/rm -rf /data/registry
mkdir -p /data/registry

podman run -d --restart=always --name local-registry -p 5443:5443 \
  -v /data/registry/:/var/lib/registry:z \
  -v /etc/crts:/certs:z \
  -e REGISTRY_HTTP_ADDR=0.0.0.0:5443 \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/redhat.ren.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/redhat.ren.key \
  docker.io/library/registry:2
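
A quick check that the registry answers over TLS with the self-signed chain we just trusted (the catalog is empty until images are pushed):

curl --cacert /etc/crts/redhat.ren.ca.crt https://registry.redhat.ren:5443/v2/_catalog
# {"repositories":[]}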

Install Ansible Automation Platform

The official documentation is very clear; just follow it. The only gap is that, for a fully disconnected install, it does not explain in much detail how to handle the container images, so we fill that in here.

# document is here
# https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform/2.1/pdf/red_hat_ansible_automation_platform_installation_guide/red_hat_ansible_automation_platform-2.1-red_hat_ansible_automation_platform_installation_guide-en-us.pdf

# goto https://access.redhat.com/downloads/content/480
# to download Ansible Automation Platform 2.1.0 Setup Bundle
mkdir -p /data
cd /data

tar zvxf ansible-automation-platform-setup-bundle-2.1.0-1.tar.gz
cd /data/ansible-automation-platform-setup-bundle-2.1.0-1

podman load -i images/ee-29-rhel8.tgz
# Loaded image(s): registry.redhat.io/ansible-automation-platform-21/ee-29-rhel8:latest
podman load -i images/ee-minimal-rhel8.tgz
# Loaded image(s): registry.redhat.io/ansible-automation-platform-21/ee-minimal-rhel8:latest
podman load -i images/ee-supported-rhel8.tgz
# Loaded image(s): registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8:latest

podman tag registry.redhat.io/ansible-automation-platform-21/ee-29-rhel8:latest registry.redhat.ren:5443/ansible-automation-platform-21/ee-29-rhel8:latest
podman push registry.redhat.ren:5443/ansible-automation-platform-21/ee-29-rhel8:latest

podman tag registry.redhat.io/ansible-automation-platform-21/ee-minimal-rhel8:latest registry.redhat.ren:5443/ansible-automation-platform-21/ee-minimal-rhel8:latest
podman push registry.redhat.ren:5443/ansible-automation-platform-21/ee-minimal-rhel8:latest

podman tag registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8:latest registry.redhat.ren:5443/ansible-automation-platform-21/ee-supported-rhel8:latest
podman push registry.redhat.ren:5443/ansible-automation-platform-21/ee-supported-rhel8:latest

/bin/cp -f inventory inventory.bak

cat << EOF > inventory
[automationcontroller]
127.0.0.1 ansible_connection=local

[database]

[all:vars]
admin_password='password'
pg_host=''
pg_port=''
pg_database='awx'
pg_username='awx'
pg_password='password'
registry_url='registry.redhat.ren:5443'

EOF

./setup.sh -e gpgcheck=0

# login using admin / password
# open browser to https://172.16.218.2/

The installation ends here. Open a browser, go to https://

and log in with username admin and password password.

Activate the subscription

On the first login after a fresh install, you are asked to activate with a subscription. Since we installed in offline mode, there is an extra step of exporting an offline manifest from the Red Hat customer portal.

After logging into the Ansible platform, you see the activation screen; click the link to go to the Red Hat portal.

On the Red Hat portal, click to create a new subscription allocation.

Give the new allocation an easy-to-remember name. A subscription allocation is really a mechanism for distributing subscription certificates: once the allocation is created, you can add the products you have purchased to it (ansible, rhel, and so on), and then download everything packed into a single zip file, which is very convenient.

After the allocation is created, click 'Subscriptions'.

Then click to add a subscription.

Search for the product by keyword; if you only own a few products, skip the search and pick directly from the list. Behind the product you want, adjust the entitlement count; for example, to activate one system, set the entitlement to 1.

After clicking submit, you can see the subscription has been added successfully.

Click 'Export Manifest' to export the subscription manifest.

You will get a file with a name like manifest_ansible_20220107T110649Z.zip. Import this file into the Ansible platform.

Uncheck the user data / analytics feedback option, because we are offline and cannot reach Red Hat's public systems.

After submitting, we land on the Ansible platform home page.

virus test for docker image security scanning

Almost every container platform has a container security solution, the famous clair for example. But their scanning is not a deep scan: they read the package manager databases inside the image (yum, apk, and so on), check which software versions were installed that way, and decide from that whether there are vulnerabilities.

This approach is of course chosen for performance, but it causes trouble in daily practice: engineers easily assume that once a container security platform is in place they can relax, and that is not the case.

Below we walk through a concrete example to see the effect, and then think about how to respond.

test quay / clair / docker hub

We take a test virus from the Internet, copy it into a container image, build it, push it to image registries, and look at their scan results. To make it more representative, we also copy the virus over /usr/bin/java, and push the image to both quay.io and docker hub.

mkdir -p /data/tmp
cd /data/tmp

cat << EOF > ./virus.Dockerfile
FROM registry.access.redhat.com/ubi8/ubi-minimal
ADD https://www.ikarussecurity.com/wp-content/downloads/eicar_com.zip /wzh
ADD https://github.com/MalwareSamples/Linux-Malware-Samples/blob/main/00ae07c9fe63b080181b8a6d59c6b3b6f9913938858829e5a42ab90fb72edf7a /wzh01
ADD https://github.com/MalwareSamples/Linux-Malware-Samples/blob/main/00ae07c9fe63b080181b8a6d59c6b3b6f9913938858829e5a42ab90fb72edf7a /usr/bin/java

RUN chmod +x /wzh*
RUN chmod +x /usr/bin/java
EOF

buildah bud -t quay.io/wangzheng422/qimgs:virus -f virus.Dockerfile ./

buildah push quay.io/wangzheng422/qimgs:virus

buildah bud -t docker.io/wangzheng422/virus -f virus.Dockerfile ./

buildah push docker.io/wangzheng422/virus

We find that neither registry flags the image containing the virus. This confirms that ordinary scanning tools only read the package manager databases, and cannot detect malicious files that were not installed through a package manager.

log4jshell

The now-famous log4j vulnerability is detected differently from platform to platform. The platforms that can detect it do so by finding the jar files inside the container, reading the MANIFEST.MF inside each jar, checking the package version recorded there, and raising an alert.
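
In other words, the detection relies on jar metadata rather than on the code itself. A hedged illustration of the kind of metadata such a scanner keys on, using a hypothetical jar name:

unzip -p log4j-core-2.14.1.jar META-INF/MANIFEST.MF | grep -i 'Implementation-Version'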

Look at the scan result of the image quay.io/apoczeka/log4shell-vuln on quay.io: it fails to find the log4j vulnerability.

ACS

So let's see whether Red Hat's RHACS container security platform can detect it.

# on vultr
wget https://mirror.openshift.com/pub/rhacs/assets/latest/bin/linux/roxctl
install -m 755 roxctl /usr/local/bin/

# on ACS platform
# Integrations -> API Token -> Create Integration
# role -> continous-integration -> create
# copy the API token out
export ROX_API_TOKEN=<api_token>
export ROX_CENTRAL_ADDRESS=central-stackrox.apps.cluster-ms246.ms246.sandbox1059.opentlc.com:443

roxctl -e "$ROX_CENTRAL_ADDRESS" --insecure-skip-tls-verify image scan -i docker.io/elastic/logstash:7.13.0 | jq '.scan.components[] | .vulns[]? | select(.cve == "CVE-2021-44228") | .cve'
# "CVE-2021-44228"
# "CVE-2021-44228"

roxctl -e "$ROX_CENTRAL_ADDRESS" --insecure-skip-tls-verify image scan -i quay.io/apoczeka/log4shell-vuln | jq '.scan.components[] | .vulns[]? | select(.cve == "CVE-2021-44228") | .cve'
# "CVE-2021-44228"

roxctl -e "$ROX_CENTRAL_ADDRESS" --insecure-skip-tls-verify image check -r 0 -o json -i docker.io/elastic/logstash:7.13.0 

We can see that ACS successfully detects the log4j vulnerability. We can therefore integrate ACS into our CI/CD pipeline to do the vulnerability scanning.
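
For example, a CI stage can fail the build on policy violations, assuming roxctl image check returns a non-zero exit code when a build-enforced policy is violated (verify this behavior against your ACS version):

if ! roxctl -e "$ROX_CENTRAL_ADDRESS" --insecure-skip-tls-verify image check -i docker.io/elastic/logstash:7.13.0 ; then
  echo "image rejected by ACS policy, failing the pipeline"
  exit 1
fi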

Of course, we can also use ACS's built-in UI to quickly define policies and block the relevant vulnerabilities right away.

Here is the effect after the policy takes effect (it needs a little time to propagate if the cluster is busy): ACS prevents the vulnerable image from running.

However, at the moment ACS only supports deployment-style workloads; if you modify the deployment, or simply deploy a bare pod directly, you bypass ACS's check. We will have to wait for future ACS releases to address this.

grype

There are many other command-line tools similar to ACS; here is one example.

# https://github.com/anchore/grype

grype -q quay.io/apoczeka/log4shell-vuln | grep log4j
# log4j-api          2.14.1       2.15.0       GHSA-jfh8-c2jp-5v3q  Critical
# log4j-api          2.14.1       2.16.0       GHSA-7rjr-3q55-vv33  Medium
# log4j-api          2.14.1                    CVE-2021-44228       Critical
# log4j-core         2.14.1       2.15.0       GHSA-jfh8-c2jp-5v3q  Critical
# log4j-core         2.14.1       2.16.0       GHSA-7rjr-3q55-vv33  Medium
# log4j-core         2.14.1                    CVE-2021-44228       Critical
# log4j-jul          2.14.1                    CVE-2021-44228       Critical
# log4j-slf4j-impl   2.14.1                    CVE-2021-44228       Critical


trivy

https://github.com/aquasecurity/trivy
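
A minimal usage sketch with trivy's standard image subcommand (whether it flags CVE-2021-44228 on this image depends on the trivy version and its vulnerability database):

trivy image quay.io/apoczeka/log4shell-vuln | grep -i log4j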

openshift install cnv with ocs and external ceph

The target business scenario of this test: a VM running a CS (Counter-Strike) game service, hosted on OpenShift with CNV, with the VM image backed by ceph; we also test VM live migration and VM cloning.

Because of test environment limits, we set up a single-node ceph with 3 x 5.5 TB disks. The ceph node is a KVM guest with 16 cores / 32 GB of memory; in practice 8 cores / 32 GB would probably be enough.

Deployment architecture diagram

diag

Video walkthrough

Single-node ceph install, OCS install, connecting to external ceph storage

CNV install, VM image import, live migration, cloning

install ceph

First we install the ceph node.


#####################################
## start to install ceph
cd /backup/wzh

lvremove -f ocp4/cephlv
lvcreate -y -L 230G -n cephlv ocp4

lvremove -f ocp4/cephdata01lv
lvcreate -y -L 3T -n cephdata01lv ocp4

lvremove -f ocp4/cephdata02lv
lvcreate -y -L 3T -n cephdata02lv ocp4

lvremove -f ocp4/cephdata03lv
lvcreate -y -L 3T -n cephdata03lv ocp4

virt-install --name=ocp4-ceph --vcpus=16 --ram=32768 \
--disk path=/dev/ocp4/cephlv,device=disk,bus=virtio,format=raw \
--disk path=/dev/ocp4/cephdata01lv,device=disk,bus=virtio,format=raw \
--disk path=/dev/ocp4/cephdata02lv,device=disk,bus=virtio,format=raw \
--disk path=/dev/ocp4/cephdata03lv,device=disk,bus=virtio,format=raw \
--os-variant centos7.0 --network network:openshift4,model=virtio \
--boot menu=on --location /home/data/openshift/ocp.4.3.21/rhel-server-7.8-x86_64-dvd.iso \
--initrd-inject rhel-ks-ceph.cfg --extra-args "inst.ks=file:/rhel-ks-ceph.cfg" 

#######################################
#  kvm's host bond and vlan

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-configure_802_1q_vlan_tagging_using_the_command_line_tool_nmcli

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-vlan_on_bond_and_bridge_using_the_networkmanager_command_line_tool_nmcli
nmcli con add type bond \
    con-name bond-24 \
    ifname bond-24 \
    mode 802.3ad ipv4.method disabled ipv6.method ignore
    
nmcli con mod id bond-24 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname enp176s0f0 con-name enp176s0f0 master bond-24
nmcli con add type bond-slave ifname enp59s0f0 con-name enp59s0f0 master bond-24

nmcli con up bond-24

nmcli connection add type bridge con-name br-ceph ifname br-ceph ip4 192.168.18.200/24

nmcli con up br-ceph

nmcli con add type vlan con-name vlan-ceph ifname vlan-ceph dev bond-24 id 501 master br-ceph slave-type bridge

nmcli con up vlan-ceph

# no need below
# cat << EOF >  /backup/wzh/virt-net.xml
# <network>
#   <name>vm-br-ceph</name>
#   <forward mode='bridge'>
#     <bridge name='br-ceph'/>
#   </forward>
# </network>
# EOF
# virsh net-define --file virt-net.xml
# virsh net-autostart br-ceph
# virsh net-start br-ceph
# virsh net-list

# # restore
# virsh net-undefine br-ceph
# virsh net-destroy br-ceph

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

EOF

# restore
nmcli con del vlan-ceph
nmcli con del br-ceph
nmcli con del enp59s0f0
nmcli con del enp176s0f0
nmcli con del bond-24

########################################
# go to ceph vm

# https://www.cyberciti.biz/faq/linux-list-network-cards-command/
cat /proc/net/dev

nmcli con add type ethernet ifname eth1 con-name eth1
nmcli con modify eth1 ipv4.method manual ipv4.addresses 192.168.18.203/24
nmcli con modify eth1 connection.autoconnect yes
nmcli con reload
nmcli con up eth1

# restore
nmcli con del eth1

##########################################
# go to worker2 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.209/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

# restore
nmcli con del ens9
nmcli con del 'Wired connection 1'

##########################################
# go to worker1 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.208/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

# restore
nmcli con del ens9
nmcli con del 'Wired connection 1'

##########################################
# go to worker0 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.207/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

##########################################
# go to master2 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.206/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

# restore
nmcli con del ens9
nmcli con del 'Wired connection 1'

##########################################
# go to master1 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.205/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

# restore
nmcli con del ens9
nmcli con del 'Wired connection 1'

##########################################
# go to master0 vm, to test the ceph vlan
nmcli con add type ethernet ifname ens9 con-name ens9
nmcli con modify ens9 ipv4.method manual ipv4.addresses 192.168.18.204/24
nmcli con modify ens9 connection.autoconnect yes
nmcli con reload
nmcli con up ens9

# restore
nmcli con del ens9
nmcli con del 'Wired connection 1'

##########################################
# go to worker4 baremetal, to test the ceph vlan
nmcli con del 'Wired connection 1'
nmcli con del 'Wired connection 2'
nmcli con del 'Wired connection 3'
nmcli con del 'Wired connection 4'
nmcli con del 'Wired connection 5'
nmcli con del ens35f0.991
nmcli con del ens35f1

# https://access.redhat.com/solutions/1526613
nmcli con add type bond \
    con-name bond-24 \
    ifname bond-24 \
    mode 802.3ad ipv4.method disabled ipv6.method ignore
    
nmcli con mod id bond-24 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname ens49f0 con-name ens49f0 master bond-24
nmcli con add type bond-slave ifname ens35f0 con-name ens35f0 master bond-24

nmcli con up bond-24

nmcli con add type vlan con-name vlan-ceph ifname vlan-ceph dev bond-24 id 501 ip4 192.168.18.211/24

nmcli con up vlan-ceph

# restore
nmcli con del vlan-ceph
nmcli con del ens49f0 ens35f0
nmcli con del bond-24

#############################################
# go to worker3 baremetal, to test the ceph vlan

nmcli con del 'Wired connection 1'
nmcli con del 'Wired connection 2'
nmcli con del 'Wired connection 3'
nmcli con del 'Wired connection 4'
nmcli con del 'Wired connection 5'

nmcli con add type bond \
    con-name bond-24 \
    ifname bond-24 \
    mode 802.3ad ipv4.method disabled ipv6.method ignore
    
nmcli con mod id bond-24 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname ens49f0 con-name ens49f0 master bond-24
nmcli con add type bond-slave ifname ens35f0 con-name ens35f0 master bond-24

nmcli con up bond-24

nmcli con add type vlan con-name vlan-ceph ifname vlan-ceph dev bond-24 id 501  ip4 192.168.18.210/24

nmcli con up vlan-ceph

# restore
nmcli con del vlan-ceph
nmcli con del ens49f0 ens35f0
nmcli con del bond-24


#################################################
## for ceph vm
# install a 'fast' http proxy, then

subscription-manager --proxy=127.0.0.1:6666 register --username **** --password ********
# subscription-manager --proxy=127.0.0.1:6666 refresh

subscription-manager config --rhsm.baseurl=https://china.cdn.redhat.com
# subscription-manager config --rhsm.baseurl=https://cdn.redhat.com
subscription-manager --proxy=127.0.0.1:6666 refresh

# https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/installation_guide/index
subscription-manager --proxy=127.0.0.1:6666 repos --disable=*
subscription-manager --proxy=127.0.0.1:6666 repos --enable=rhel-7-server-rpms \
--enable=rhel-7-server-extras-rpms \
--enable=rhel-7-server-supplementary-rpms \
--enable=rhel-7-server-optional-rpms \
--enable=rhel-7-server-rhceph-4-tools-rpms --enable=rhel-7-server-ansible-2.8-rpms \
--enable=rhel-7-server-rhceph-4-mon-rpms \
--enable=rhel-7-server-rhceph-4-osd-rpms \
--enable=rhel-7-server-rhceph-4-tools-rpms 


yum clean all
yum makecache

yum update -y

systemctl enable --now firewalld
systemctl start firewalld
systemctl status firewalld

firewall-cmd --zone=public --add-port=6789/tcp
firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7300/tcp
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --zone=public --add-port=8080/tcp
firewall-cmd --zone=public --add-port=8080/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp
firewall-cmd --zone=public --add-port=443/tcp --permanent
# firewall-cmd --zone=public --add-port=9090/tcp
# firewall-cmd --zone=public --add-port=9090/tcp --permanent

ssh-keygen

sed -i 's/#UseDNS yes/UseDNS no/' /etc/ssh/sshd_config
systemctl restart sshd

ssh-copy-id root@ceph

yum install -y ceph-ansible docker

cd /usr/share/ceph-ansible

# yum install -y docker
systemctl enable --now docker

cd /usr/share/ceph-ansible
/bin/cp -f  group_vars/all.yml.sample group_vars/all.yml
/bin/cp -f  group_vars/osds.yml.sample group_vars/osds.yml
/bin/cp -f  site-docker.yml.sample site-docker.yml
/bin/cp -f  site.yml.sample site.yml
/bin/cp -f  group_vars/rgws.yml.sample group_vars/rgws.yml
/bin/cp -f  group_vars/mdss.yml.sample group_vars/mdss.yml

# remember to set the env
# https://access.redhat.com/RegistryAuthentication
# REGISTRY_USER_NAME=
# REGISTRY_TOKEN=

cat << EOF > ./group_vars/all.yml
fetch_directory: ~/ceph-ansible-keys
monitor_interface: eth1 
public_network: 192.168.18.0/24
# ceph_docker_image: rhceph/rhceph-4-rhel8
# ceph_docker_image_tag: "latest"
# containerized_deployment: true
ceph_docker_registry: registry.redhat.io
ceph_docker_registry_auth: true
ceph_docker_registry_username: ${REGISTRY_USER_NAME}
ceph_docker_registry_password: ${REGISTRY_TOKEN}
ceph_origin: repository
ceph_repository: rhcs
# ceph_repository_type: cdn
ceph_repository_type: iso
ceph_rhcs_iso_path: /root/rhceph-4.1-rhel-7-x86_64.iso
ceph_rhcs_version: 4
bootstrap_dirs_owner: "167"
bootstrap_dirs_group: "167"
dashboard_admin_user: admin
dashboard_admin_password: Redhat!23
node_exporter_container_image: registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.1
grafana_admin_user: admin
grafana_admin_password: Redhat!23
grafana_container_image: registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8
prometheus_container_image: registry.redhat.io/openshift4/ose-prometheus:4.1
alertmanager_container_image: registry.redhat.io/openshift4/ose-prometheus-alertmanager:4.1
radosgw_interface: eth1
radosgw_address_block: 192.168.18.0/24
radosgw_civetweb_port: 8080
radosgw_civetweb_num_threads: 512
ceph_conf_overrides:
  global:
    osd_pool_default_size: 3
    osd_pool_default_min_size: 2
    osd_pool_default_pg_num: 32
    osd_pool_default_pgp_num: 32
  osd:
   osd_scrub_begin_hour: 22
   osd_scrub_end_hour: 7

EOF

cat << EOF > ./group_vars/osds.yml
devices:
  - /dev/vdb
  - /dev/vdc
  - /dev/vdd
EOF

cat << EOF > ./hosts
[grafana-server]
ceph
[mons]
ceph
[osds]
ceph
[mgrs]
ceph

EOF

sed -i "s/#copy_admin_key: false/copy_admin_key: true/" ./group_vars/rgws.yml

cd /usr/share/ceph-ansible

mkdir -p ~/ceph-ansible-keys
ansible all -m ping -i hosts

ansible-playbook -vv site.yml -i hosts

#  You can access your dashboard web UI at http://ceph:8443/ as an 'admin' user with 'Redhat!23' password

cd /root
ceph osd getcrushmap -o crushmap
crushtool -d crushmap -o crushmap.txt
sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0 type osd/' crushmap.txt
grep 'step chooseleaf' crushmap.txt
crushtool -c crushmap.txt -o crushmap-new
ceph osd setcrushmap -i crushmap-new
cd /usr/share/ceph-ansible

# test the result
ceph health detail
ceph osd pool create test 8
ceph osd pool set test pg_num 128
ceph osd pool set test pgp_num 128
ceph osd pool application enable test rbd
ceph -s
ceph osd tree
ceph osd pool ls
ceph pg dump
cat << EOF > hello-world.txt
wangzheng
EOF
rados --pool test put hello-world hello-world.txt
rados --pool test get hello-world fetch.txt
cat fetch.txt

# continue to install
cat << EOF >> ./hosts
[rgws]
ceph
[mdss]
ceph

EOF

ansible-playbook -vv site.yml --limit mdss -i hosts

ansible-playbook -vv site.yml --limit rgws -i hosts

# change mon param for S3
# 416 (InvalidRange)
# https://www.cnblogs.com/flytor/p/11380026.html
# https://www.cnblogs.com/fuhai0815/p/12144214.html
# https://access.redhat.com/solutions/3328431
# add config line
vi /etc/ceph/ceph.conf
# mon_max_pg_per_osd = 300

systemctl restart ceph-mon@ceph.service

ceph tell mon.* injectargs '--mon_max_pg_per_osd=1000'

ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname -s`.asok config show | grep mon_max_pg_per_osd

ceph --admin-daemon /var/run/ceph/ceph-mgr.`hostname -s`.asok config set mon_max_pg_per_osd 1000

ceph osd lspools
ceph osd dump | grep 'replicated size'

install ocs

Next we install the OCS components inside OpenShift 4 and connect them to the ceph node installed above.

# check ceph version
ceph tell osd.* version

python ceph-external-cluster-details-exporter.py --help

python ceph-external-cluster-details-exporter.py --rbd-data-pool-name test --rgw-endpoint 192.168.18.203:8080 --run-as-user client.ocs
# [{"kind": "ConfigMap", "data": {"maxMonId": "0", "data": "ceph=192.168.18.203:6789", "mapping": "{}"}, "name": "rook-ceph-mon-endpoints"}, {"kind": "Secret", "data": {"mon-secret": "mon-secret", "fsid": "bfaeb4fb-2f44-41e7-9539-1ca75bb394a8", "cluster-name": "openshift-storage", "admin-secret": "admin-secret"}, "name": "rook-ceph-mon"}, {"kind": "Secret", "data": {"userKey": "AQBZUWdfavnEDBAA0qwn1WLRbFV+0bUY+8ZnMQ==", "userID": "client.ocs"}, "name": "rook-ceph-operator-creds"}, {"kind": "Secret", "data": {"userKey": "AQBZUWdfC1EzDhAAjVV7+S3jKk8LcPUxxkIF9A==", "userID": "csi-rbd-node"}, "name": "rook-csi-rbd-node"}, {"kind": "StorageClass", "data": {"pool": "test"}, "name": "ceph-rbd"}, {"kind": "Secret", "data": {"userKey": "AQBZUWdfG8pvEBAAnldlqNj72gqBRvSxc8FB+g==", "userID": "csi-rbd-provisioner"}, "name": "rook-csi-rbd-provisioner"}, {"kind": "Secret", "data": {"adminID": "csi-cephfs-provisioner", "adminKey": "AQBZUWdfCxXWExAAiiaU1KIyjFsBxZB6h9WVtw=="}, "name": "rook-csi-cephfs-provisioner"}, {"kind": "Secret", "data": {"adminID": "csi-cephfs-node", "adminKey": "AQBZUWdf52L9ERAAXbK5upV2lO5phttDrwzJyg=="}, "name": "rook-csi-cephfs-node"}, {"kind": "StorageClass", "data": {"pool": "cephfs_data", "fsName": "cephfs"}, "name": "cephfs"}, {"kind": "StorageClass", "data": {"endpoint": "192.168.18.203:8080", "poolPrefix": "default"}, "name": "ceph-rgw"}]

oc get cephcluster -n openshift-storage

oc get storagecluster -n openshift-storage

# install chrome on kvm host
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum install ./google-chrome-stable_current_*.rpm
google-chrome &

install cnv

# upload win10.qcow2 to http server(helper)
scp win10.qcow2.gz root@192.168.8.202:/var/www/html/

# on helper
chmod 644 /var/www/html/win10.qcow2.gz

oc project demo
cat << EOF > win10.dv.yaml
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: "example-import-dv-win10"
spec:
  source:
      http:
         url: "http://192.168.8.202:8080/win10.qcow2.gz" 
  pvc:
    volumeMode: Block
    storageClassName: ocs-external-storagecluster-ceph-rbd
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: "40Gi"
EOF
oc apply -n demo -f win10.dv.yaml

oc get dv,pvc

# create a vm, and test the live migration

###############################################################
# network

#####################################
# worker4 baremetal, nic bond + vlan + bridge for business
nmcli con add type bond \
    con-name bond-13 \
    ifname bond-13 \
    mode 802.3ad ipv4.method disabled ipv6.method ignore
    
nmcli con mod id bond-13 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname ens49f1 con-name ens49f1 master bond-13
nmcli con add type bond-slave ifname ens35f1 con-name ens35f1 master bond-13

nmcli con up bond-13

nmcli connection add type bridge con-name br-business ifname br-business ip4 172.17.4.211/24

nmcli con up br-business

nmcli con add type vlan con-name vlan-business ifname vlan-business dev bond-13 id 991 master br-business slave-type bridge

nmcli con up vlan-business

#####################################
# worker4 baremetal, nic bond + vlan + bridge for business
nmcli con add type bond \
    con-name bond-13 \
    ifname bond-13 \
    mode 802.3ad ipv4.method disabled ipv6.method ignore
    
nmcli con mod id bond-13 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname ens49f1 con-name ens49f1 master bond-13
nmcli con add type bond-slave ifname ens35f1 con-name ens35f1 master bond-13

nmcli con up bond-13

nmcli connection add type bridge con-name br-business ifname br-business ip4 172.17.4.210/24

nmcli con up br-business

nmcli con add type vlan con-name vlan-business ifname vlan-business dev bond-13 id 991 master br-business slave-type bridge

nmcli con up vlan-business

###############################
# try to add 2nd nic
cat << EOF > nic.vm.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-business
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br-business 
spec:
  config: '{
    "cniVersion": "0.3.1",
    "name": "bridge-network-business", 
    "plugins": [
      {
        "type": "cnv-bridge", 
        "bridge": "br-business" 
      },
      {
        "type": "cnv-tuning" 
      }
    ]
  }'
EOF
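
The NetworkAttachmentDefinition above is only written to a file here; presumably it still has to be applied to the namespace where the VM runs, e.g.:

oc apply -n demo -f nic.vm.yaml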

CS game workload scenario test


###################################
# add management vlan to kvm host

nmcli con add type vlan con-name vlan-management ifname vlan-management dev bond-24 id 500  ip4 1.41.0.124/27

nmcli con up vlan-management

#restore
nmcli con del vlan-management

# upload cs server image
# for python3
python -m http.server 7800
# for python2
python -m SimpleHTTPServer 7800

oc project demo
cat << EOF > cnv.cs.dv.yaml
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: "import-dv-cs-yitu"
spec:
  source:
      http:
         url: "http://192.168.8.251:7800/yitu.raw" 
  pvc:
    volumeMode: Block
    storageClassName: ocs-external-storagecluster-ceph-rbd
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: "150Gi"
EOF
oc apply -n demo -f cnv.cs.dv.yaml

oc get dv,pvc


The test server runs a CS workload on Ubuntu 14. We start the VM and configure its network. Interface config /etc/network/interfaces.d/eth0.cfg for the CS server (Ubuntu 14):

# The primary network interface
auto eth0
iface eth0 inet static
    address 172.17.4.215
    netmask 255.255.255.0
    gateway 172.17.4.254
    dns-nameservers 114.114.114.114
ifdown eth0
ifup eth0

cnv live migration

# upload cs server image
# for python3
python -m http.server 7800
# for python2
python -m SimpleHTTPServer 7800

oc project demo
cat << EOF > cnv.cs.dv.yaml
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: "import-dv-rhel-78"
spec:
  source:
      http:
         url: "http://192.168.8.251:7800/rhel7.8.img" 
  pvc:
    volumeMode: Block
    storageClassName: ocs-external-storagecluster-ceph-rbd
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: "10Gi"
EOF
oc apply -n demo -f cnv.cs.dv.yaml

oc get dv,pvc

############################################
# try to debug the vm being stuck after a node failure; it turns out this does not work.
# we try to decrease the pdb, but it is no use: the vm still does not move to another node.
oc get pdb -n demo
# NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# kubevirt-disruption-budget-j5zlc   2               N/A               0                     12m
# kubevirt-disruption-budget-qsk9j   2               N/A               0                     12m

oc patch pdb kubevirt-disruption-budget-j5zlc -n demo --type=merge -p '{"spec":{"minAvailable":0}}'
oc patch pdb kubevirt-disruption-budget-qsk9j -n demo --type=merge -p '{"spec":{"minAvailable":0}}'

# Cannot evict pod as it would violate the pod's disruption budget.
oc adm drain worker-3.ocp4.redhat.ren --grace-period=10 --force --delete-local-data --ignore-daemonsets

oc adm uncordon worker-3.ocp4.redhat.ren

debug for node failure scenario

# evictionStrategy: LiveMigrate
# power off and power on the VM 

# remove evictionStrategy: LiveMigrate settings, 
# and find out this doesn't work
oc patch -n demo vm/rhel78 --type json  -p '[{"op": "remove", "path": "/spec/template/spec/evictionStrategy"}]'
oc get vm/rhel78 -o yaml | grep evictionStrategy

# restore evictionStrategy: LiveMigrate settings
oc patch -n demo vm/rhel78 --type=merge -p '{"spec": {"template": {"spec": {"evictionStrategy":"LiveMigrate"} } } }'

# oc delete pod -n openshift-storage noobaa-db-0 --force --grace-period=0
# oc get pod -n openshift-storage

# no output from these 2 commands
oc get pod/virt-launcher-rhel78-r6d9m -o yaml | grep -A2 finalizer
oc get vm/rhel78 -o yaml | grep -A2 finalizer

# we can see there are finalizers on vmi
oc get vmi/rhel78 -o yaml | grep -A2 finalizer
  # finalizers:
  # - foregroundDeleteVirtualMachine
  # generation: 20

# poweroff the node, to reproduce the issue
# when the node is notready, and pod is terminating
oc get node
# NAME                       STATUS     ROLES        AGE    VERSION
# master-0.ocp4.redhat.ren   Ready      master       102d   v1.18.3+6c42de8
# master-1.ocp4.redhat.ren   Ready      master       102d   v1.18.3+6c42de8
# master-2.ocp4.redhat.ren   Ready      master       102d   v1.18.3+6c42de8
# worker-0.ocp4.redhat.ren   Ready      worker       102d   v1.18.3+6c42de8
# worker-1.ocp4.redhat.ren   Ready      worker       102d   v1.18.3+6c42de8
# worker-2.ocp4.redhat.ren   Ready      worker       102d   v1.18.3+6c42de8
# worker-3.ocp4.redhat.ren   NotReady   cnv,worker   93d    v1.18.3+6c42de8
# worker-4.ocp4.redhat.ren   Ready      cnv,worker   91d    v1.18.3+6c42de8
oc get pod
# NAME                          READY   STATUS        RESTARTS   AGE
# v2v-vmware-568b875554-lsj57   1/1     Running       0          2d6h
# virt-launcher-rhel78-r6d9m    1/1     Terminating   0          44m
oc get vmi
# NAME     AGE   PHASE     IP               NODENAME
# rhel78   52m   Running   172.17.4.15/24   worker-3.ocp4.redhat.ren

# below is working
oc patch -n demo vmi/rhel78 --type=merge -p '{"metadata": {"finalizers":null}}'

# after node failure, delete vmi
oc delete vmi/rhel78
oc get pod
# NAME                          READY   STATUS        RESTARTS   AGE
# v2v-vmware-568b875554-lsj57   1/1     Running       0          2d6h
# virt-launcher-rhel78-f5ltc    1/1     Running       0          32s
# virt-launcher-rhel78-r6d9m    1/1     Terminating   0          46m

# no use below, because we are on bare metal.
cat << EOF > healthcheck.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example 
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: cnv
  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s" 
    status: "False"
  - type:    "Ready"
    timeout: "300s" 
    status: "Unknown"
  maxUnhealthy: "80%" 
EOF
oc apply -f healthcheck.yaml
oc get MachineHealthCheck -n openshift-machine-api
# NAME      MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
# example   80%

Other notes

oc get nns worker-4.ocp4.redhat.ren -o yaml
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkState
metadata:
  creationTimestamp: "2020-09-16T03:15:51Z"
  generation: 1
  managedFields:
  - apiVersion: nmstate.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"135e4844-bf87-465a-8f6a-5fc1f85e5beb"}:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:status:
        .: {}
        f:currentState:
          .: {}
          f:dns-resolver:
            .: {}
            f:config:
              .: {}
              f:search: {}
              f:server: {}
            f:running:
              .: {}
              f:search: {}
              f:server: {}
          f:interfaces: {}
          f:route-rules:
            .: {}
            f:config: {}
          f:routes:
            .: {}
            f:config: {}
            f:running: {}
        f:lastSuccessfulUpdateTime: {}
    manager: kubernetes-nmstate
    operation: Update
    time: "2020-09-23T01:38:50Z"
  name: worker-4.ocp4.redhat.ren
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: worker-4.ocp4.redhat.ren
    uid: 135e4844-bf87-465a-8f6a-5fc1f85e5beb
  resourceVersion: "43763614"
  selfLink: /apis/nmstate.io/v1alpha1/nodenetworkstates/worker-4.ocp4.redhat.ren
  uid: 095a8223-d139-4add-9fcf-0e0435191f78
status:
  currentState:
    dns-resolver:
      config:
        search: []
        server:
        - 192.168.8.202
      running:
        search: []
        server:
        - 192.168.8.202
    interfaces:
    - ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      link-aggregation:
        mode: 802.3ad
        options:
          ad_actor_system: "00:00:00:00:00:00"
          lacp_rate: fast
          miimon: "100"
          xmit_hash_policy: layer2+3
        slaves:
        - ens49f1
        - ens35f1
      mac-address: B8:59:9F:EF:71:5D
      mtu: 1500
      name: bond-13
      state: up
      type: bond
    - ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      link-aggregation:
        mode: 802.3ad
        options:
          ad_actor_system: "00:00:00:00:00:00"
          lacp_rate: fast
          miimon: "100"
          xmit_hash_policy: layer2+3
        slaves:
        - ens49f0
        - ens35f0
      mac-address: B8:59:9F:EF:71:5C
      mtu: 1500
      name: bond-24
      state: up
      type: bond
    - bridge:
        options:
          group-forward-mask: 0
          mac-ageing-time: 300
          multicast-snooping: true
          stp:
            enabled: true
            forward-delay: 15
            hello-time: 2
            max-age: 20
            priority: 32768
        port:
        - name: vlan-business
          stp-hairpin-mode: false
          stp-path-cost: 100
          stp-priority: 32
      ipv4:
        address:
        - ip: 172.17.4.211
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        address:
        - ip: fe80::1a6a:4414:8fec:940e
          prefix-length: 64
        auto-dns: true
        auto-gateway: true
        auto-routes: true
        autoconf: true
        dhcp: true
        enabled: true
      mac-address: B8:59:9F:EF:71:5D
      mtu: 1500
      name: br-business
      state: up
      type: linux-bridge
    - ipv4:
        enabled: false
      ipv6:
        enabled: false
      mac-address: 1e:d4:cc:be:5e:49
      mtu: 1450
      name: br0
      state: down
      type: ovs-interface
    - ethernet:
        auto-negotiation: true
        duplex: full
        speed: 10000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: B8:59:9F:EF:71:5C
      mtu: 1500
      name: ens35f0
      state: up
      type: ethernet
    - ethernet:
        auto-negotiation: true
        duplex: full
        speed: 10000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: B8:59:9F:EF:71:5D
      mtu: 1500
      name: ens35f1
      state: up
      type: ethernet
    - ipv4:
        enabled: false
      ipv6:
        enabled: false
      mac-address: B4:96:91:67:2D:A4
      mtu: 1500
      name: ens47f0
      state: down
      type: ethernet
    - ethernet:
        auto-negotiation: true
        duplex: full
        speed: 1000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        address:
        - ip: 192.168.8.211
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        address:
        - ip: fe80::b696:91ff:fe67:2da5
          prefix-length: 64
        autoconf: false
        dhcp: false
        enabled: true
      mac-address: B4:96:91:67:2D:A5
      mtu: 1500
      name: ens47f1
      state: up
      type: ethernet
    - ethernet:
        auto-negotiation: true
        duplex: full
        speed: 10000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: B8:59:9F:EF:71:5C
      mtu: 1500
      name: ens49f0
      state: up
      type: ethernet
    - ethernet:
        auto-negotiation: true
        duplex: full
        speed: 10000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: B8:59:9F:EF:71:5D
      mtu: 1500
      name: ens49f1
      state: up
      type: ethernet
    - ipv4:
        enabled: false
      ipv6:
        enabled: false
      mtu: 65536
      name: lo
      state: down
      type: unknown
    - ipv4:
        enabled: false
      ipv6:
        enabled: false
      mac-address: de:b2:ca:03:6b:fa
      mtu: 1450
      name: tun0
      state: down
      type: ovs-interface
    - ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: B8:59:9F:EF:71:5D
      mtu: 1500
      name: vlan-business
      state: up
      type: vlan
      vlan:
        base-iface: bond-13
        id: 991
    - ipv4:
        address:
        - ip: 192.168.18.211
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        address:
        - ip: fe80::e852:70de:e7be:8f04
          prefix-length: 64
        auto-dns: true
        auto-gateway: true
        auto-routes: true
        autoconf: true
        dhcp: true
        enabled: true
      mac-address: B8:59:9F:EF:71:5C
      mtu: 1500
      name: vlan-ceph
      state: up
      type: vlan
      vlan:
        base-iface: bond-24
        id: 501
    - ipv4:
        enabled: false
      ipv6:
        enabled: false
      mac-address: C2:AE:59:84:C6:E0
      mtu: 65000
      name: vxlan_sys_4789
      state: down
      type: vxlan
      vxlan:
        base-iface: ""
        destination-port: 4789
        id: 0
        remote: ""
    route-rules:
      config: []
    routes:
      config:
      - destination: 0.0.0.0/0
        metric: -1
        next-hop-address: 192.168.8.1
        next-hop-interface: ens47f1
        table-id: 0
      running:
      - destination: 172.17.4.0/24
        metric: 425
        next-hop-address: ""
        next-hop-interface: br-business
        table-id: 254
      - destination: 0.0.0.0/0
        metric: 104
        next-hop-address: 192.168.8.1
        next-hop-interface: ens47f1
        table-id: 254
      - destination: 192.168.8.0/24
        metric: 104
        next-hop-address: ""
        next-hop-interface: ens47f1
        table-id: 254
      - destination: 192.168.18.0/24
        metric: 400
        next-hop-address: ""
        next-hop-interface: vlan-ceph
        table-id: 254
      - destination: fe80::/64
        metric: 425
        next-hop-address: ""
        next-hop-interface: br-business
        table-id: 254
      - destination: fe80::/64
        metric: 256
        next-hop-address: ""
        next-hop-interface: ens47f1
        table-id: 254
      - destination: fe80::/64
        metric: 400
        next-hop-address: ""
        next-hop-interface: vlan-ceph
        table-id: 254
      - destination: ff00::/8
        metric: 256
        next-hop-address: ""
        next-hop-interface: br-business
        table-id: 255
      - destination: ff00::/8
        metric: 256
        next-hop-address: ""
        next-hop-interface: ens47f1
        table-id: 255
      - destination: ff00::/8
        metric: 256
        next-hop-address: ""
        next-hop-interface: vlan-ceph
        table-id: 255
  lastSuccessfulUpdateTime: "2020-09-23T01:38:50Z"

next step

Multi-Queue

  • https://kubevirt.io/user-guide/#/creation/disks-and-volumes?id=virtio-block-multi-queue
  • https://kubevirt.io/user-guide/#/creation/interfaces-and-networks?id=virtio-net-multiqueue

cases:

  • https://access.redhat.com/support/cases/#/case/02763144

RHACS / stackrox

The official installation documentation is very detailed and accurate; just follow it.

  • https://help.stackrox.com/docs/get-started/quick-start/

Video walkthrough

install rhacs

# below is no use for v3.0.59.1
cat <<EOF | oc apply -f -
apiVersion: helm.openshift.io/v1beta1
kind: HelmChartRepository
metadata:
  name: rhacs-repo
spec:
  name: rhacs-repo
  connectionConfig:
    url: http://registry.ocp4.redhat.ren:8080/rhacs-chart/
EOF

# restore
oc delete HelmChartRepository rhacs-repo

mkdir -p /data/install/rhacs
cd /data/install/rhacs

roxctl central generate interactive
# password: redhat

# Enter path to the backup bundle from which to restore keys and certificates (optional):
# Enter PEM cert bundle file (optional):
# Enter administrator password (default: autogenerated):
# Re-enter administrator password:
# Enter orchestrator (k8s, openshift): openshift
# Enter the directory to output the deployment bundle to (default: "central-bundle"):
# Enter the OpenShift major version (3 or 4) to deploy on (default: "0"): 4
# Enter Istio version when deploying into an Istio-enabled cluster (leave empty when not running Istio) (optional):
# Enter the method of exposing Central (route, lb, np, none) (default: "none"): route
# Enter main image to use (default: "stackrox.io/main:3.0.59.1"): registry.redhat.io/rh-acs/main:3.0.59.1
# Enter whether to run StackRox in offline mode, which avoids reaching out to the Internet (default: "false"): true
# Enter whether to enable telemetry (default: "true"):
# Enter the deployment tool to use (kubectl, helm, helm-values) (default: "kubectl"):
# Enter Scanner DB image to use (default: "stackrox.io/scanner-db:2.13.0"): registry.redhat.io/rh-acs/scanner-db:2.13.0
# Enter Scanner image to use (default: "stackrox.io/scanner:2.13.0"): registry.redhat.io/rh-acs/scanner:2.13.0
# Enter Central volume type (hostpath, pvc): pvc
# Enter external volume name (default: "stackrox-db"):
# Enter external volume size in Gi (default: "100"): 100
# Enter storage class name (optional if you have a default StorageClass configured):
# Generating deployment bundle...
# NOTE: Unless run in offline mode, StackRox Kubernetes Security Platform collects and transmits aggregated usage and system health information.  If you want to OPT OUT from this, re-generate the deployment bundle with the '--enable-telemetry=false' flag
# Done!

# Wrote central bundle to "central-bundle"

# To deploy:
#   - If you need to add additional trusted CAs, run central/scripts/ca-setup.sh.

#   - Deploy Central
#     - Run central/scripts/setup.sh
#     - Run oc create -R -f central

#   - Deploy Scanner
#      If you want to run the StackRox Scanner:
#      - Run scanner/scripts/setup.sh
#      - Run oc create -R -f scanner

# PLEASE NOTE: The recommended way to deploy StackRox is by using Helm. If you have
# Helm 3.1+ installed, please consider choosing this deployment route instead. For your
# convenience, all required files have been written to the helm/ subdirectory, along with
# a README file detailing the Helm-based deployment process.

# For administrator login, select the "Login with username/password" option on
# the login page, and log in with username "admin" and the password found in the
# "password" file located in the same directory as this README.

./central-bundle/central/scripts/setup.sh

oc -n stackrox get route central
# NAME      HOST/PORT                               PATH   SERVICES   PORT    TERMINATION   WILDCARD
# central   central-stackrox.apps.ocp4.redhat.ren          central    https   passthrough   None

cat central-bundle/password
# redhat

# open https://central-stackrox.apps.ocp4.redhat.ren 
# with admin / redhat

./central-bundle/scanner/scripts/setup.sh

oc create -R -f central-bundle/scanner
# serviceaccount/scanner created
# clusterrole.rbac.authorization.k8s.io/stackrox-scanner-psp created
# rolebinding.rbac.authorization.k8s.io/stackrox-scanner-psp created
# podsecuritypolicy.policy/stackrox-scanner created
# securitycontextconstraints.security.openshift.io/scanner created
# secret/scanner-db-password created
# secret/scanner-tls created
# secret/scanner-db-tls created
# configmap/scanner-config created
# networkpolicy.networking.k8s.io/scanner created
# networkpolicy.networking.k8s.io/scanner-db created
# deployment.apps/scanner created
# deployment.apps/scanner-db created
# service/scanner created
# service/scanner-db created
# horizontalpodautoscaler.autoscaling/scanner created

install sensor

The sensor is the core of StackRox's runtime scanning. Essentially it is a kernel module / eBPF probe, and it is injected from inside a container; I will explain how this works in a separate video.

To install the sensor, we need to add a cluster on the Central platform. Log in, go to the system configuration, Clusters, and add a cluster:

In the add-cluster form there are two parameters for the sensor image addresses. We naturally use the registry.redhat.io addresses, which do not require applying for a license; fill in the fields as follows:

  • registry.redhat.io/rh-acs/main
  • registry.redhat.io/rh-acs/collector

After clicking Next, download the generated file, then continue on the helper node.

cd  /data/install/rhacs/

/bin/cp -f ~/Downloads/sensor-ocp4.zip /data/install/rhacs/
unzip -d sensor sensor-ocp4.zip

./sensor/sensor.sh
# namespace/stackrox annotated
# Now using project "stackrox" on server "https://api.ocp4.redhat.ren:6443".
# Creating sensor secrets...
# secret/sensor-tls created
# Creating sensor RBAC roles...
# serviceaccount/sensor created
# clusterrole.rbac.authorization.k8s.io/stackrox:view-cluster created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:monitor-cluster created
# role.rbac.authorization.k8s.io/edit created
# rolebinding.rbac.authorization.k8s.io/manage-namespace created
# clusterrole.rbac.authorization.k8s.io/stackrox:edit-workloads created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:enforce-policies created
# clusterrole.rbac.authorization.k8s.io/stackrox:network-policies created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:network-policies-binding created
# clusterrole.rbac.authorization.k8s.io/stackrox:update-namespaces created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:update-namespaces-binding created
# clusterrole.rbac.authorization.k8s.io/stackrox:create-events created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:create-events-binding created
# clusterrole.rbac.authorization.k8s.io/stackrox:review-tokens created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:review-tokens-binding created
# Creating sensor security context constraints...
# securitycontextconstraints.security.openshift.io/sensor created
# Creating sensor network policies...
# networkpolicy.networking.k8s.io/sensor created
# Creating sensor pod security policies...
# clusterrole.rbac.authorization.k8s.io/stackrox-sensor-psp created
# rolebinding.rbac.authorization.k8s.io/stackrox-sensor-psp created
# podsecuritypolicy.policy/stackrox-sensor created
# Enter username for docker registry at registry.redhat.io: wandering.star
# Enter password for wandering.star @ registry.redhat.io: secret/collector-stackrox created
# Creating admission controller security context constraints...
# securitycontextconstraints.security.openshift.io/admission-control created
# Creating admission controller secrets...
# secret/admission-control-tls created
# Creating admission controller RBAC roles...
# serviceaccount/admission-control created
# role.rbac.authorization.k8s.io/watch-config created
# rolebinding.rbac.authorization.k8s.io/admission-control-watch-config created
# Creating admission controller network policies...
# networkpolicy.networking.k8s.io/admission-control-no-ingress created
# Creating admission controller pod security policies...
# podsecuritypolicy.policy/stackrox-admission-control created
# clusterrole.rbac.authorization.k8s.io/stackrox-admission-control-psp created
# rolebinding.rbac.authorization.k8s.io/stackrox-admission-control-psp created
# Creating admission controller deployment...
# deployment.apps/admission-control created
# service/admission-control created
# W0507 18:24:56.251769   13915 warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
# W0507 18:24:56.272199   13915 warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
# validatingwebhookconfiguration.admissionregistration.k8s.io/stackrox created
# Creating collector security context constraints...
# securitycontextconstraints.security.openshift.io/collector created
# Creating collector secrets...
# secret/collector-tls created
# Creating collector RBAC roles...
# serviceaccount/collector created
# Creating collector network policies...
# networkpolicy.networking.k8s.io/collector-no-ingress created
# Creating collector pod security policies...
# clusterrole.rbac.authorization.k8s.io/stackrox-collector-psp created
# rolebinding.rbac.authorization.k8s.io/stackrox-collector-psp created
# podsecuritypolicy.policy/stackrox-collector created
# Creating collector daemon set...
# daemonset.apps/collector created
# Creating sensor deployment...
# deployment.apps/sensor created
# service/sensor created
# service/sensor-webhook created
# secret/helm-effective-cluster-name created
# Creating upgrader service account
# serviceaccount/sensor-upgrader created
# clusterrolebinding.rbac.authorization.k8s.io/stackrox:upgrade-sensors created

Let's take a quick peek at dmesg on a master node after the sensor is installed: a collector kmod has been loaded, and it makes use of CPU instruction-set features.

Running lsmod on the master node also shows this collector kmod:

lsmod | grep coll
# collector             651264  22

remove sensor

cd /data/install/rhacs

# ./sensor/delete-sensor.sh

kubectl delete --raw /apis/security.openshift.io/v1/securitycontextconstraints/collector

./sensor/delete-sensor.sh


bugfix for https://access.redhat.com/solutions/5911951

cd /data/install/rhacs


upgrade

https://help.stackrox.com/docs/upgrade-stackrox/from-44/

An application scenario for RHACS: a security & compliance testing cloud

Video walkthrough

Thoughts on Red Hat cutting off CentOS

First, a disclaimer: the author's knowledge is limited. If you disagree with this article, then you are surely right and the author is wrong.

Red Hat recently announced that it would stop publishing sources to git.centos.org. There has been plenty of debate at home and abroad, and most commentators think Red Hat has violated open-source licenses and the open-source spirit. Here is the author's analysis.

Conclusion first: in the author's view, what Red Hat is doing is exploring a sustainable path for the open-source movement.

Now, why do I think so? First, Red Hat's statement indirectly admits that it needs money; in other words, its business is under pressure. That is easy to understand: with the rise of public cloud, RHEL's customer base has taken a big hit. Downstream rebuilds like Rocky, which claim to track RHEL in real time, also eat into RHEL's revenue; there are public statements of customers partnering with Rocky Linux.

Is this money problem unique to Red Hat? Clearly not. Think of the famous log4j vulnerability outbreak: only then did we realize that such a widely used component was maintained by unpaid volunteers, running purely on goodwill. We all know what eventually happens if we keep demanding that people work for free forever.

There is a book that says the essence of business is secrets. For a company, what do secrets correspond to? In the author's view: property rights (capital, IP) and operations. A company like Red Hat has no proprietary IP, so it relies on operations, the operation of an open-source ecosystem, and operations are built on rules. The ultimate form of operations is traffic operations, which should sound familiar: it is the attention economy of the internet. In that sense Red Hat's model looks a lot like an internet business model: use free offerings to beat competitors, gain traffic (a dominant position), then monetize the traffic. The difference is that Red Hat runs this business loop around open source, and if the model succeeds, it underwrites the success of open source itself.

Speaking of rules, this reminds me of an interview about commodity trade, the patent system and open-source licenses: all of them are products of particular stages of commercial civilization, institutional guarantees that let commerce run efficiently, and the commercial benefit always flows to whoever sets the rules, which so far has largely been the United States. Red Hat is a representative beneficiary of the open-source licensing regime, so naturally it has to find a way forward for that regime. Fortunately Red Hat still believes in open source and actively contributes back, which is good news for the movement. China has the openEuler community and the Mulan license and is writing its own rules; presumably the future beneficiaries of that regime will be Chinese. Let's wait and see.

So the author believes Red Hat is exploring a sustainable path for open source in a new era, or, put differently, it is patching the GPL. Red Hat's patch is implemented through service fees. The GPL says you cannot charge a license fee for redistributing the software; can you charge a service fee instead? Let's leave that question to the lawyers.

The author is no authority on the GPL, but I still remember attending a talk by a young Stallman at school many years ago. What struck me most was that open source lets programmers see the source code and work and collaborate better and more efficiently. Years later we have seen open source succeed enormously, and the author has benefited too. But we should not forget that what open source protects is a way of working. If we can protect it for free, great; if not, we have to think about how much we are willing to pay to protect it.

So Red Hat's current exploration is meaningful: it is protecting the open-source way of working. If there is a shortcoming, it is that progress is too slow; we cannot yet evaluate how much money it takes to protect this way of working, or how that compares with the closed-source alternative.

Finally, a prediction. The author guesses Red Hat will reach agreements with the downstream RHEL rebuilds (Rocky, Alma and so on) under which Red Hat's patches and source code only reach the downstream distributions after a delay of six months to a year. The reasoning: today, getting Red Hat's source code through official channels requires a Red Hat subscription, the service-fee agreement mentioned above, and that agreement has an expiry, typically one year. Source code obtained under the agreement is bound by that term; once the year is up, you can do whatever you want with it.

In summary, Red Hat is exploring service fees as a way to protect the open-source way of working, which creates an artificial time lag for downstream distributions. For developers and programmers the open-source way of working does not change, but large end users should plan to increase their spending.

satellite as a yum repo: a simple demo

A customer bought Red Hat RHEL subscriptions and wants to use them inside China. Activating a RHEL subscription on the OS requires reaching servers outside China, and for well-known reasons that access is sometimes unstable; sometimes the remote servers themselves are slow. In this situation the customer needs a subscription proxy server, and Red Hat's Satellite product is exactly such a subscription registration proxy.

Of course, Satellite is a rich product, and subscription registration proxying is only one small feature. The standard Satellite scenario is: the customer has an isolated network with one host that can reach the internet; Satellite is installed on that host, a subscription manifest is imported into it, and it runs yum repo mirroring, iPXE, DHCP, DNS and other services. Together these let the other hosts on the internal network install RHEL automatically on power-on, and after RHEL is installed, Satellite keeps providing updates.

So Satellite is a full-lifecycle OS management product that includes the installation source. The official documentation is here:

  • https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13

This article demonstrates the simplest possible scenario: install Satellite, register an internal RHEL host against it to activate the subscription, and use Satellite as the yum repo source.

Lab architecture diagram. Note that the Satellite features and scenario shown in this lab are very simple; many other Satellite features, such as content views, Satellite clustering, disconnected operation and so on, are left for you to explore.

Install the satellite server

The full Satellite architecture has a server plus standalone capsules. We are doing a minimal deployment, and the server has a built-in capsule, so deploying just the server is enough.

The server is a VM with 16 cores, 32G RAM and a 500G HDD; in a real project the disk should be larger.

Also, the server needs a domain name, with reverse DNS resolution configured.
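
Before running the installer it is worth double-checking that forward and reverse resolution both work for the server. A minimal check, assuming the DNS zone for infra.wzhlab.top already carries these records (dig comes from the bind-utils package):

# on satellite server, verify forward and reverse resolution
dnf install -y bind-utils

dig +short panlab-satellite-server.infra.wzhlab.top
# expect: 172.21.6.171

dig +short -x 172.21.6.171
# expect: panlab-satellite-server.infra.wzhlab.top.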

# satellite server
# 172.21.6.171
# dns resolve and reverse to panlab-satellite-server.infra.wzhlab.top

# satellite client host
# 172.21.6.172

# on satellite server
ssh root@172.21.6.171

# https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13/html-single/installing_satellite_server_in_a_connected_network_environment/index

systemctl disable --now firewalld.service

hostnamectl set-hostname panlab-satellite-server.infra.wzhlab.top

ping -c1 localhost
# PING localhost(localhost (::1)) 56 data bytes
# 64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.043 ms

ping -c1 `hostname -f`
# PING panlab-satellite-server.wzhlab.top (172.21.6.171) 56(84) bytes of data.
# 64 bytes from bogon (172.21.6.171): icmp_seq=1 ttl=64 time=0.047 ms

# active subscrition on this rhel.
subscription-manager register --auto-attach --username xxxxxxxxx --password xxxxxxxxxx

# add repo for satellite
subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms \
  --enable=rhel-8-for-x86_64-appstream-rpms \
  --enable=satellite-6.13-for-rhel-8-x86_64-rpms \
  --enable=satellite-maintenance-6.13-for-rhel-8-x86_64-rpms
# Repository 'rhel-8-for-x86_64-baseos-rpms' is enabled for this system.
# Repository 'rhel-8-for-x86_64-appstream-rpms' is enabled for this system.
# Repository 'satellite-6.13-for-rhel-8-x86_64-rpms' is enabled for this system.
# Repository 'satellite-maintenance-6.13-for-rhel-8-x86_64-rpms' is enabled for this system.

dnf module enable satellite:el8

dnf update -y

dnf install satellite chrony sos -y

systemctl enable --now chronyd

# begin install satellite
satellite-installer --scenario satellite \
--foreman-initial-organization "My_Organization" \
--foreman-initial-location "My_Location" \
--foreman-initial-admin-username admin \
--foreman-initial-admin-password redhat
# ......
# 2023-05-16 22:41:17 [NOTICE] [configure] System configuration has finished.
#   Success!
#   * Satellite is running at https://panlab-satellite-server.infra.wzhlab.top
#       Initial credentials are admin / redhat

#   * To install an additional Capsule on separate machine continue by running:

#       capsule-certs-generate --foreman-proxy-fqdn "$CAPSULE" --certs-tar "/root/$CAPSULE-certs.tar"
#   * Capsule is running at https://panlab-satellite-server.infra.wzhlab.top:9090

#   The full log is at /var/log/foreman-installer/satellite.log
# Package versions are being locked.

The installation is easy but takes a while, a dozen minutes or more; the official docs recommend running the installer inside tmux. Once it finishes, just open the URL in a browser.
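
If you need to re-run or upgrade the installer later, a minimal way to keep it alive in tmux so a dropped ssh session does not kill it:

# start a named tmux session and run the installer inside it
tmux new-session -s satellite-install
# ... run the same satellite-installer command as above inside the session ...
# detach with Ctrl-b d; re-attach later with:
tmux attach -t satellite-install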

In the UI we can see that the Satellite server itself already exists as a host.

Download the subscription manifest

Our scenario is that the internal hosts all register to Satellite, so we must import the subscription information from the Red Hat portal into Satellite. Let's go through it step by step.

First, on the Red Hat portal, create a subscription allocation. If we have 100 subscriptions that will all be used on this Satellite, allocate 100 of them. For this lab we allocate just 1, so that later we can experiment with over-use and with adding more.

The type of the subscription allocation must match the Satellite version we installed.

Switch to the Subscriptions tab:

Adding a subscription opens a page where you search the subscriptions you own and pick one:

After choosing the subscription, set the quantity as needed; normally you would add all the subscriptions you have. For the lab we set it to 1, then download the manifest.

Import the subscription manifest

Now that we have the manifest file, we go back to the Satellite admin UI and import it.

Once that's done, we can see the subscription information.

Configure the yum repo mirror

The goal of the lab is to set up a yum repo mirror. By default Satellite downloads RPMs on demand; we want it to download everything up front, so we need to change a setting.

Enable the immediate-download policy
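
The same policy change can also be made from the CLI with hammer. This is only a hedged sketch: the option and setting names below (--download-policy, default_download_policy) are assumptions based on recent Satellite/hammer versions, so verify them with hammer --help on yours.

# list repos and switch an existing one to immediate download
hammer repository list --organization "My_Organization"
hammer repository update --id 1 --download-policy immediate

# optionally make "immediate" the default for newly enabled repos
# (the exact setting name may differ between Satellite versions)
hammer settings set --name default_download_policy --value immediate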

With the download policy set, we add the yum repos.

First search for appstream.

Then pick a minor version.

To make the lab results obvious, we deliberately choose 8.6, which is not the latest release.

Then search for baseos and also choose the 8.6 version.

With the yum repos selected, we start a manual sync.

Select the repos to sync and start.

After a long wait, the download completes.

The basic services on the Satellite server side are now configured; let's look at the system status.


satellite-maintain service list
# Running Service List
# ================================================================================
# List applicable services:
# dynflow-sidekiq@.service                   indirect
# foreman-proxy.service                      enabled
# foreman.service                            enabled
# httpd.service                              enabled
# postgresql.service                         enabled
# pulpcore-api.service                       enabled
# pulpcore-content.service                   enabled
# pulpcore-worker@.service                   indirect
# redis.service                              enabled
# tomcat.service                             enabled

# All services listed                                                   [OK]
# --------------------------------------------------------------------------------

df -h
# Filesystem      Size  Used Avail Use% Mounted on
# devtmpfs         16G     0   16G   0% /dev
# tmpfs            16G  148K   16G   1% /dev/shm
# tmpfs            16G  8.9M   16G   1% /run
# tmpfs            16G     0   16G   0% /sys/fs/cgroup
# /dev/sda3       499G  106G  393G  22% /
# /dev/sda2      1014M  265M  749M  27% /boot
# /dev/sda1       599M  9.6M  590M   2% /boot/efi
# tmpfs           3.2G     0  3.2G   0% /run/user/0

free -h
#               total        used        free      shared  buff/cache   available
# Mem:           31Gi        21Gi       1.7Gi       567Mi       7.9Gi       8.6Gi
# Swap:            0B          0B          0B

Memory usage is 21G and disk usage about 110G; this gives a rough sizing reference for future deployments.

We can also check the capsule's resource usage.

Configure the activation key

We have imported the subscription; to let RHEL hosts use it, we need to create an activation key and bind the subscription to it. Activation keys give flexible control over how many RHEL hosts can activate, so we never use more subscriptions than we own.

Give it any name you like.

In the activation key's detailed settings we set the host limit to unlimited; in practice you should set a concrete number to make sure you do not over-use. We also choose an environment; for a simple scenario the default is fine; this setting lets us split hosts into different groups for management. The content view is also left at the default; it lets different host groups see different RPM versions. Release version is left empty; it sets the default release version for the hosts.

As you can see, Satellite has a lot of features and is designed for large-scale host deployment.
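
For reference, the same activation key can also be created from the CLI with hammer. A hedged sketch, assuming the defaults used in this lab (Library environment, Default Organization View); check hammer activation-key create --help on your version for the exact options.

# create an activation key with an explicit host limit instead of unlimited
hammer activation-key create \
  --name "demo-activate" \
  --organization "My_Organization" \
  --lifecycle-environment "Library" \
  --content-view "Default Organization View" \
  --max-hosts 1

# list keys to confirm
hammer activation-key list --organization "My_Organization"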

Then we attach the subscription to the activation key.

Our organization has Simple Content Access (SCA) enabled. For comparison, we first disable it; later we will turn it back on and compare the behavior.

Disable SCA

Register a host

We now create a registration URL. On the target RHEL host, curl this URL to download a script; running that script registers the host against our Satellite server.

Configure it as shown in the figure. Note: enable insecure, because we are using a self-signed certificate.

In the detailed settings we disable all the extra features, because we do not need Satellite to provision the server for us. We let the URL stay valid indefinitely.

After clicking Generate we get a command; copy it and keep it.

With the command in hand, we find a RHEL host and try it.

# on client host

curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&setup_remote_execution_pull=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2ODQzMDU1MTYsImp0aSI6IjdiODBkNzdmMjVjYzY1MDZjODQ3OGI2Y2VjNzRkZWZjOGM2YjAyMDUxMDQ4YTcyYTJlMWE1YzRiNTgyMjE5NzAiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.EVXyW9gjWyAQIFYUxnwwdxAigrPmUo_XYWnqn-Wh1Fw' | bash

# #
# # Running registration
# #
# Updating Subscription Management repositories.
# Unable to read consumer identity

# This system is not registered with an entitlement server. You can use subscription-manager to register.

# Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
# The system has been registered with ID: e9d03372-d3f4-4970-bb38-3a2282458e29
# The registered system name is: panlab-satellite-client
# Installed Product Current Status:
# Product Name: Red Hat Enterprise Linux for x86_64
# Status:       Subscribed

# # Running [panlab-satellite-client] host initial configuration
# Refreshing subscription data
# All local data refreshed
# Host [panlab-satellite-client] successfully configured.
# Successfully updated the system facts.

subscription-manager status
# +-------------------------------------------+
#    System Status Details
# +-------------------------------------------+
# Overall Status: Current

# System Purpose Status: Not Specified

subscription-manager release --list
# +-------------------------------------------+
#           Available Releases
# +-------------------------------------------+
# 8.6

subscription-manager release --set=8.6

subscription-manager config
# [server]
#    hostname = panlab-satellite-server.infra.wzhlab.top
# ......
# [rhsm]
#    auto_enable_yum_plugins = [1]
#    baseurl = https://panlab-satellite-server.infra.wzhlab.top/pulp/content
# ......

dnf repolist
# Updating Subscription Management repositories.
# repo id                                                                   repo name
# rhel-8-for-x86_64-appstream-rpms                                          Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
# rhel-8-for-x86_64-baseos-rpms                                             Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)

dnf makecache
# Updating Subscription Management repositories.
# Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)                                                                                       63 kB/s | 4.1 kB     00:00
# Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)                                                                                    65 kB/s | 4.5 kB     00:00
# Metadata cache created.

subscription-manager repos
# +----------------------------------------------------------+
#     Available Repositories in /etc/yum.repos.d/redhat.repo
# +----------------------------------------------------------+
# Repo ID:   rhel-8-for-x86_64-baseos-rpms
# Repo Name: Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)
# Repo URL:  https://panlab-satellite-server.infra.wzhlab.top/pulp/content/My_Organization/Library/content/dist/rhel8/8.6/x86_64/baseos/os
# Enabled:   1

# Repo ID:   rhel-8-for-x86_64-appstream-rpms
# Repo Name: Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
# Repo URL:  https://panlab-satellite-server.infra.wzhlab.top/pulp/content/My_Organization/Library/content/dist/rhel8/8.6/x86_64/appstream/os
# Enabled:   1


Back in the activation key we can see the repos that have been enabled.

We did not configure a host collection, so that list is empty.

Finally, in the activation key's host list we see the host we just registered.

Clicking into it, we can see that Satellite already reports the RPM security issues on the host.

That's a lot of issues; let's run an update and see.

# on satellite-client
dnf update -y

And the issues are all gone.

We can see that one subscription is now consumed.

The subscription details also show one activation key.

The product content included in the subscription is baseos and appstream.

The host list now contains the host we just activated.

Increase the subscription quantity

If we buy more subscriptions, how do we add them? Here we simulate adding one more subscription.

Go to the Red Hat portal and open the subscription allocation we created earlier.

Adjust the quantity to 2.

Back in Satellite, we manage our manifest.

Click refresh and it updates online.

After the refresh, the quantity becomes 2.

What happens on over-use

We restore the subscription allocation to 1, then try to activate the subscription on a second host. What happens?

# on client-02 , to try over use
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&setup_remote_execution_pull=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2ODQzMDU1MTYsImp0aSI6IjdiODBkNzdmMjVjYzY1MDZjODQ3OGI2Y2VjNzRkZWZjOGM2YjAyMDUxMDQ4YTcyYTJlMWE1YzRiNTgyMjE5NzAiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.EVXyW9gjWyAQIFYUxnwwdxAigrPmUo_XYWnqn-Wh1Fw' | bash
# #
# # Running registration
# #
# Updating Subscription Management repositories.
# Unable to read consumer identity

# This system is not registered with an entitlement server. You can use subscription-manager to register.

# Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
# The system has been registered with ID: 43e38f76-2416-49db-890f-1a3ad3973828
# The registered system name is: satellite-client-02
# Installed Product Current Status:
# Product Name: Red Hat Enterprise Linux for x86_64
# Status:       Not Subscribed

# Unable to find available subscriptions for all your installed products.

subscription-manager list --consumed
# No consumed subscription pools were found.

subscription-manager repos
# This system has no repositories available through subscriptions.

subscription-manager status
# +-------------------------------------------+
#    System Status Details
# +-------------------------------------------+
# Overall Status: Invalid

# Red Hat Enterprise Linux for x86_64:
# - Not supported by a valid subscription.

# System Purpose Status: Not Specified

We can see that the subscription was not activated. To confirm: in the subscription view the consumption is 1,

but in the activation key, the host count is 2.

However, one of the hosts in that host list is not activated.

Enable Simple Content Access (SCA)

We enable SCA and limit the activation key's host count; this balances ease of use against not over-using the subscription.

Enable SCA

Limit the host count to 1

Then we try activating on the second host again.

# on client-02 , to try over use
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&setup_remote_execution_pull=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2ODQzMDU1MTYsImp0aSI6IjdiODBkNzdmMjVjYzY1MDZjODQ3OGI2Y2VjNzRkZWZjOGM2YjAyMDUxMDQ4YTcyYTJlMWE1YzRiNTgyMjE5NzAiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.EVXyW9gjWyAQIFYUxnwwdxAigrPmUo_XYWnqn-Wh1Fw' | bash
# #
# # Running registration
# #
# Updating Subscription Management repositories.
# Unable to read consumer identity

# This system is not registered with an entitlement server. You can use subscription-manager to register.

# Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
# Max Hosts (1) reached for activation key 'demo-activate' (HTTP error code 409: Conflict)

The activation fails.

The over-use case

If SCA is enabled but the host count is not limited, can activation still succeed when we are over-used? Let's run an experiment.

First import an offline manifest that contains only 1 subscription.

Then remove the host limit on the activation key.

Next, we activate on two hosts.


# on 172
# try to register
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top:6443/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2OTM4ODg5NzIsImp0aSI6IjFlNzdkNDE1OWM4NmE3OGVjOWY5NjViMWQwODRlOWY5NThlN2ExZDBkNWZhZTJjZjY3NjMzMTQ1Nzk5NTRkNWEiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.aM6lKpNfBCH-FMHU0xkc6q4XaNeuS8JezLIQCf2faxI' | bash
# #
# # Running registration
# #
# Updating Subscription Management repositories.
# Unable to read consumer identity

# This system is not registered with an entitlement server. You can use subscription-manager to register.

# Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
# The system has been registered with ID: b853bd17-204a-4eeb-83c7-1d07f3dea7c6
# The registered system name is: client-0-changed
# # Running [client-0-changed] host initial configuration
# Refreshing subscription data
# All local data refreshed
# Host [client-0-changed] successfully configured.
# Successfully updated the system facts.


# on 173
# try to register
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top:6443/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2OTM4ODg5NzIsImp0aSI6IjFlNzdkNDE1OWM4NmE3OGVjOWY5NjViMWQwODRlOWY5NThlN2ExZDBkNWZhZTJjZjY3NjMzMTQ1Nzk5NTRkNWEiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.aM6lKpNfBCH-FMHU0xkc6q4XaNeuS8JezLIQCf2faxI' | bash
# #
# # Running registration
# #
# Updating Subscription Management repositories.
# Unable to read consumer identity

# This system is not registered with an entitlement server. You can use subscription-manager to register.

# Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
# The system has been registered with ID: 4bba0a26-4f91-4bae-8752-4b073eeaee13
# The registered system name is: satellite-client-02
# # Running [satellite-client-02] host initial configuration
# Refreshing subscription data
# All local data refreshed
# Host [satellite-client-02] successfully configured.
# Successfully updated the system facts.

Use the API to deregister hosts

Normally a host stays in Satellite once registered, but in a cloud environment hosts register and deregister frequently, so we need an automated way for the cloud platform to call the Satellite API and deregister hosts automatically.

Deregister by hostname

The official Satellite documentation already provides an API that can deregister a host.

In this experiment we try deleting client-2.


curl -s --request DELETE --insecure --user admin:redhat \
https://panlab-satellite-server.infra.wzhlab.top/api/v2/hosts/satellite-client-02 | jq .
# {
#   "id": 3,
#   "name": "satellite-client-02",
#   "last_compile": "2023-05-17T10:21:24.000Z",
#   "last_report": null,
#   "updated_at": "2023-05-17T10:21:24.861Z",
#   "created_at": "2023-05-17T10:19:49.756Z",
#   "root_pass": null,
#   "architecture_id": 1,
#   "operatingsystem_id": 2,
#   "ptable_id": null,
#   "medium_id": null,
#   "build": false,
#   "comment": null,
#   "disk": null,
#   "installed_at": null,
#   "model_id": 1,
#   "hostgroup_id": null,
#   "owner_id": 1,
#   "owner_type": "User",
#   "enabled": true,
#   "puppet_ca_proxy_id": null,
#   "managed": false,
#   "use_image": null,
#   "image_file": "",
#   "uuid": null,
#   "compute_resource_id": null,
#   "puppet_proxy_id": null,
#   "certname": "satellite-client-02",
#   "image_id": null,
#   "organization_id": 1,
#   "location_id": 2,
#   "otp": null,
#   "realm_id": null,
#   "compute_profile_id": null,
#   "provision_method": "build",
#   "grub_pass": null,
#   "discovery_rule_id": null,
#   "global_status": 2,
#   "lookup_value_matcher": "fqdn=satellite-client-02",
#   "openscap_proxy_id": null,
#   "pxe_loader": null,
#   "initiated_at": null,
#   "build_errors": null,
#   "content_facet_attributes": {
#     "id": 2,
#     "host_id": 3,
#     "uuid": null,
#     "content_view_id": 1,
#     "lifecycle_environment_id": 1,
#     "kickstart_repository_id": null,
#     "content_source_id": null,
#     "installable_security_errata_count": 0,
#     "installable_enhancement_errata_count": 0,
#     "installable_bugfix_errata_count": 0,
#     "applicable_rpm_count": 0,
#     "upgradable_rpm_count": 0,
#     "applicable_module_stream_count": 0,
#     "upgradable_module_stream_count": 0,
#     "applicable_deb_count": 0,
#     "upgradable_deb_count": 0
#   }
# }


After calling the API, we can see that the host client-2 has been deregistered.

This deregistration method has a potential problem: will the hostname ever change? If we change the hostname on the host itself, does Satellite update it automatically or not? Let's keep experimenting.

First let's check what the current hostname is.

hostnamectl
  #  Static hostname: client-0
  #        Icon name: computer-vm
  #          Chassis: vm
  #       Machine ID: 75587495919e40b7a0d39f7168df895e
  #          Boot ID: a15f631019d0463395d12c332873eb52
  #   Virtualization: vmware
  # Operating System: Red Hat Enterprise Linux 8.6 (Ootpa)
  #      CPE OS Name: cpe:/o:redhat:enterprise_linux:8::baseos
  #           Kernel: Linux 4.18.0-372.32.1.el8_6.x86_64
  #     Architecture: x86-64

Confirm the hostname in Satellite.

Next, change the hostname and refresh.

hostnamectl set-hostname client-0-changed

subscription-manager refresh

We confirm in Satellite: the hostname has not changed.

So when does the hostname in Satellite change? In the author's testing, the host must unregister and then re-register for it to update; a hedged sketch of that flow follows.
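
A minimal sketch of the unregister / re-register flow on the managed host, assuming the global registration command generated earlier is still valid:

# on the managed host: drop the old registration
subscription-manager unregister
subscription-manager clean

# then re-run the full 'curl ... | bash' registration command generated earlier;
# satellite picks up the new hostname on the fresh registration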

Deregister by host id

# get host uuid on the managed host
subscription-manager facts | grep system.uuid
# dmi.system.uuid: 4C6B4D56-ACB7-585F-EB20-90FD676DEA4B

# check how many uuid you can find, from another host
subscription-manager facts | grep uuid
# dmi.system.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122
# virt.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122

# get host id from satellite by uuid
curl -s --request GET --insecure --user admin:redhat \
  https://panlab-satellite-server.infra.wzhlab.top/api/v2/hosts?search=facts.dmi::system::uuid=4C6B4D56-ACB7-585F-EB20-90FD676DEA4B | \
  jq .results[0].id
# 8

# get host id from satellite by name
curl -s --request GET --insecure --user admin:redhat \
https://panlab-satellite-server.infra.wzhlab.top/api/hosts/panlab-satellite-client | jq .id
# 2

# delete host using host id
curl -s --request DELETE --insecure --user admin:redhat \
https://panlab-satellite-server.infra.wzhlab.top/api/hosts/2 | jq .
# {
#   "id": 2,
#   "name": "panlab-satellite-client",
#   "last_compile": "2023-05-17T12:26:28.000Z",
#   "last_report": null,
#   "updated_at": "2023-05-17T12:26:28.289Z",
#   "created_at": "2023-05-17T06:43:46.628Z",
#   "root_pass": null,
#   "architecture_id": 1,
#   "operatingsystem_id": 2,
#   "ptable_id": null,
#   "medium_id": null,
#   "build": false,
#   "comment": null,
#   "disk": null,
#   "installed_at": "2023-05-17T06:44:01.221Z",
#   "model_id": 1,
#   "hostgroup_id": null,
#   "owner_id": 1,
#   "owner_type": "User",
#   "enabled": true,
#   "puppet_ca_proxy_id": null,
#   "managed": false,
#   "use_image": null,
#   "image_file": "",
#   "uuid": null,
#   "compute_resource_id": null,
#   "puppet_proxy_id": null,
#   "certname": "panlab-satellite-client",
#   "image_id": null,
#   "organization_id": 1,
#   "location_id": 2,
#   "otp": null,
#   "realm_id": null,
#   "compute_profile_id": null,
#   "provision_method": "build",
#   "grub_pass": null,
#   "discovery_rule_id": null,
#   "global_status": 0,
#   "lookup_value_matcher": "fqdn=panlab-satellite-client",
#   "openscap_proxy_id": null,
#   "pxe_loader": null,
#   "initiated_at": "2023-05-17T06:43:59.574Z",
#   "build_errors": null,
#   "content_facet_attributes": {
#     "id": 1,
#     "host_id": 2,
#     "uuid": "e9d03372-d3f4-4970-bb38-3a2282458e29",
#     "content_view_id": 1,
#     "lifecycle_environment_id": 1,
#     "kickstart_repository_id": null,
#     "content_source_id": null,
#     "installable_security_errata_count": 0,
#     "installable_enhancement_errata_count": 0,
#     "installable_bugfix_errata_count": 0,
#     "applicable_rpm_count": 0,
#     "upgradable_rpm_count": 0,
#     "applicable_module_stream_count": 0,
#     "upgradable_module_stream_count": 0,
#     "applicable_deb_count": 0,
#     "upgradable_deb_count": 0
#   },
#   "subscription_facet_attributes": {
#     "id": 1,
#     "host_id": 2,
#     "uuid": "e9d03372-d3f4-4970-bb38-3a2282458e29",
#     "last_checkin": "2023-06-26T03:27:43.457Z",
#     "service_level": "",
#     "release_version": "8.6",
#     "autoheal": true,
#     "registered_at": "2023-05-17T06:43:47.000Z",
#     "registered_through": "panlab-satellite-server.infra.wzhlab.top",
#     "user_id": null,
#     "hypervisor": false,
#     "hypervisor_host_id": null,
#     "purpose_usage": "",
#     "purpose_role": "",
#     "dmi_uuid": "4C6B4D56-ACB7-585F-EB20-90FD676DEA4B"
#   }
# }

After calling this API, we can see the host has been deregistered.

Network firewall ports

The customer's network is tightly restricted: reaching the public internet requires specific firewall openings. So what firewall rules does Satellite need?

Based on the following official knowledge base articles:

  1. How to access Red Hat Subscription Manager (RHSM) through a firewall or proxy
  2. Public CIDR Lists for Red Hat (IP Addresses for cdn.redhat.com)
  3. Downloading Packages via Red Hat Official Network is Slow in mainland China
  4. What is the IP address range for 'subscription.rhn.redhat.com' and 'subscription.rhsm.redhat.com'?
  5. How do I configure my firewall for api.access.redhat.com?

we summarize the domains that need to be allowed:

  1. subscription.rhn.redhat.com:443 [https] AND subscription.rhsm.redhat.com:443 [https] (This is the new default address in newer versions of RHEL 7)
  2. cdn.redhat.com:443 [https]
  3. *.akamaiedge.net:443 [https] OR *.akamaitechnologies.com:443 [https]
  4. china.cdn.redhat.com:443 [https]

If the customer's firewall only supports IP addresses, a series of network ranges must be opened. However, based on the author's own testing, the published IP list is not accurate, or at least not updated promptly; a sketch for resolving the hostnames yourself follows.
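
If the firewall team insists on IP rules, a more reliable approach than a static list is to resolve the hostnames at change time and feed the results into the firewall request. A minimal sketch using dig:

# resolve the current CDN / subscription endpoints to IPs
for h in subscription.rhsm.redhat.com cdn.redhat.com china.cdn.redhat.com; do
  echo "== $h"
  dig +short "$h"
done
# note: these are CDN-backed names, so the answers change over time;
# re-check them periodically instead of trusting a one-off snapshot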

Port forwarding

The customer's internal network has strict traffic controls and does not allow traffic on port 443, so Satellite's HTTPS port 443 has to become 6443. Let's try it.

First make a setting change on the web console.

# on satellite server
# redirect 6443 to 443
iptables -t nat -A PREROUTING -p tcp --dport 6443 -j REDIRECT --to-port 443

# block 443 traffic REJECT
# iptables -A INPUT -p tcp --dport 443 -j DROP
# iptables -A INPUT -p tcp --dport 443 -j REJECT
# iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# iptables -A INPUT -p tcp --dport 80 -j ACCEPT

# persistent
iptables-save > /etc/sysconfig/iptables

cat << EOF > /etc/systemd/system/iptables.service
[Unit]
Description=iptables Firewall Rules
After=network.target

[Service]
ExecStart=/sbin/iptables-restore /etc/sysconfig/iptables
Type=oneshot
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

systemctl enable --now iptables.service

# systemctl disable --now iptables.service

# sed -i "s/443/6443/g" /etc/httpd/conf/ports.conf

# semanage port -a -t http_port_t -p tcp 6443

# on client node
# to register
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top:6443/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2ODk4MjMxNDksImp0aSI6IjUxNTNiZmFjMDIxMjNjYjEzZDdjZjM5NWRkMWIyZWEzMWQ3NzA3YTczNzgxNzRhOWI5MDMzMzdjOTA4MzBlY2UiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.idNFXNsi6mz0fKef42yn_XwVWvwdKD2R3FolAHsrRmo' > sub.sh

sed -i 's/--server.port="443"/--server.port="6443"/g' sub.sh 

sed -i 's|https://panlab-satellite-server.infra.wzhlab.top/|https://panlab-satellite-server.infra.wzhlab.top:6443/|g' sub.sh 

# manually modify the shell script:
# comment out the 2 registration steps at the end of the script.
# our scenario is simple, so the remaining steps are not needed

#     #register_katello_host | bash
# 	echo 'skip step'
# else
#     #register_host | bash
# 	echo 'skip step'
# fi

bash sub.sh


subscription-manager release --list
# +-------------------------------------------+
#           Available Releases
# +-------------------------------------------+
# 8.6

subscription-manager release --set=8.6

subscription-manager config
# [server]
  #  hostname = panlab-satellite-server.infra.wzhlab.top
  #  insecure = [0]
  #  no_proxy = []
  #  port = 6443
# ......
# [rhsm]
#    auto_enable_yum_plugins = [1]
#    baseurl = https://panlab-satellite-server.infra.wzhlab.top:6443/pulp/content
# ......

subscription-manager list --installed
# +-------------------------------------------+
#     Installed Product Status
# +-------------------------------------------+
# Product Name: Red Hat Enterprise Linux for x86_64
# Product ID:   479
# Version:      8.6
# Arch:         x86_64

subscription-manager repos
# +----------------------------------------------------------+
#     Available Repositories in /etc/yum.repos.d/redhat.repo
# +----------------------------------------------------------+
# Repo ID:   rhel-8-for-x86_64-appstream-rpms
# Repo Name: Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
# Repo URL:  https://panlab-satellite-server.infra.wzhlab.top:6443/pulp/content/My_Organization/Library/content/dist/rhel8/8.6/x86_64/appstream/os
# Enabled:   1

# Repo ID:   rhel-8-for-x86_64-baseos-rpms
# Repo Name: Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)
# Repo URL:  https://panlab-satellite-server.infra.wzhlab.top:6443/pulp/content/My_Organization/Library/content/dist/rhel8/8.6/x86_64/baseos/os
# Enabled:   1

# try to unregister using satellite API
# get host id from satellite
# do not run the command below on the satellite server itself; the iptables redirect rule will break it there
curl -s --request GET --insecure --user admin:redhat \
https://panlab-satellite-server.infra.wzhlab.top:6443/api/hosts/client-0-changed | jq .id
# 6

# delete host using host id
curl -s --request DELETE --insecure --user admin:redhat \
https://panlab-satellite-server.infra.wzhlab.top:6443/api/hosts/6 | jq .
# {
#   "id": 6,
#   "name": "client-0-changed",
#   "last_compile": "2023-07-20T04:02:48.000Z",
#   "last_report": null,
#   "updated_at": "2023-07-20T04:02:48.132Z",
#   "created_at": "2023-07-20T03:19:39.676Z",
#   "root_pass": null,
#   "architecture_id": 1,
#   "operatingsystem_id": 2,
#   "ptable_id": null,
#   "medium_id": null,
#   "build": false,
#   "comment": null,
#   "disk": null,
#   "installed_at": "2023-07-20T03:20:28.857Z",
#   "model_id": 1,
#   "hostgroup_id": null,
#   "owner_id": 1,
#   "owner_type": "User",
#   "enabled": true,
#   "puppet_ca_proxy_id": null,
#   "managed": false,
#   "use_image": null,
#   "image_file": "",
#   "uuid": null,
#   "compute_resource_id": null,
#   "puppet_proxy_id": null,
#   "certname": "client-0-changed",
#   "image_id": null,
#   "organization_id": 1,
#   "location_id": 2,
#   "otp": null,
#   "realm_id": null,
#   "compute_profile_id": null,
#   "provision_method": "build",
#   "grub_pass": null,
#   "discovery_rule_id": null,
#   "global_status": 0,
#   "lookup_value_matcher": "fqdn=client-0-changed",
#   "openscap_proxy_id": null,
#   "pxe_loader": null,
#   "initiated_at": "2023-07-20T03:20:27.055Z",
#   "build_errors": null,
#   "content_facet_attributes": {
#     "id": 5,
#     "host_id": 6,
#     "uuid": "e91e4f8d-6ace-4a7a-8af0-dd7311786042",
#     "content_view_id": 1,
#     "lifecycle_environment_id": 1,
#     "kickstart_repository_id": null,
#     "content_source_id": null,
#     "installable_security_errata_count": 0,
#     "installable_enhancement_errata_count": 0,
#     "installable_bugfix_errata_count": 0,
#     "applicable_rpm_count": 0,
#     "upgradable_rpm_count": 0,
#     "applicable_module_stream_count": 0,
#     "upgradable_module_stream_count": 0,
#     "applicable_deb_count": 0,
#     "upgradable_deb_count": 0
#   },
#   "subscription_facet_attributes": {
#     "id": 9,
#     "host_id": 6,
#     "uuid": "e91e4f8d-6ace-4a7a-8af0-dd7311786042",
#     "last_checkin": "2023-07-20T04:02:46.801Z",
#     "service_level": "",
#     "release_version": "8.6",
#     "autoheal": true,
#     "registered_at": "2023-07-20T03:20:16.000Z",
#     "registered_through": "panlab-satellite-server.infra.wzhlab.top",
#     "user_id": null,
#     "hypervisor": false,
#     "hypervisor_host_id": null,
#     "purpose_usage": "",
#     "purpose_role": "",
#     "dmi_uuid": "4C6B4D56-ACB7-585F-EB20-90FD676DEA4B"
#   }
# }


From the operations above we can see that, for such a simple requirement, we can indeed change the port from 443 to 6443. Roughly: configure the entry URL on the web console, set up an iptables port-forward rule on the host, and then customize the downloaded shell script on the managed node.

Note, however, that changing port 443 is a customization that Red Hat does not officially support, so it should only be used in very simple scenarios. A quick check of the forwarded port is sketched below.
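
A quick sanity check of the forwarded port from a managed node, before touching any registration scripts:

# on a client node: confirm 6443 answers with the satellite web server
curl -skI https://panlab-satellite-server.infra.wzhlab.top:6443/ | head -n 3

# and confirm the pulp content path is reachable through the forwarded port
curl -sk -o /dev/null -w '%{http_code}\n' \
  https://panlab-satellite-server.infra.wzhlab.top:6443/pulp/content/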

Acceleration for the China region

By default Satellite downloads RPMs from cdn.redhat.com, which is slow from the customer's network; downloading from china.cdn.redhat.com is much faster. How do we configure Satellite to use the China mirror?

Step one, create a Content Credential. Note that the file in question is /etc/rhsm/ca/redhat-uep.pem; you can download it and upload it, or paste its contents.

Step two, configure a "Custom CDN"; be sure to set the SSL CA Content Credential to the one created in step one.

Step three, refresh.

At this point the configuration is done and we can sync. A quick reachability comparison is sketched below.
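
Before switching, it may be worth confirming from the Satellite server that china.cdn.redhat.com is actually reachable and faster than the default CDN. A rough comparison:

# compare TLS connect time to both CDNs from the satellite server
for h in cdn.redhat.com china.cdn.redhat.com; do
  echo "== $h"
  curl -so /dev/null -w 'connect: %{time_connect}s  tls: %{time_appconnect}s\n' "https://$h/"
done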

Install the Insights plugin

Satellite can also act as a proxy for Insights. Before trying it, follow the official knowledge base below and turn on the corresponding Insights switch.

For our lab environment, that means going to the target host and enabling the Insights switch for it individually.

It is off (false) by default.

We turn it on.

Then we go to the target host to operate. First we simulate an offline environment with iptables rules that block all outbound traffic except to the Satellite server.

# block traffic to outside
# except to satellite

iptables -A OUTPUT -p tcp -d 172.21.6.171 -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
iptables -A OUTPUT -p tcp -j REJECT


# try to register on insight
insights-client --register
# Successfully registered host client-0-changed
# Automatic scheduling for Insights has been enabled.
# Starting to collect Insights data for client-0-changed
# Uploading Insights data.
# Successfully uploaded report from client-0-changed to account 5910538.
# View the Red Hat Insights console at https://console.redhat.com/insights/

insights-client --check-results

insights-client --show-results
# [
#  {
#   "rule": {
#    "rule_id": "generate_vmcore_failed_during_makedumpfile|GENERATE_VMCORE_FAILED_DURING_MAKEDUMPFILE",
#    "created_at": "2023-02-08T08:31:18.561333Z",
#    "updated_at": "2023-03-05T08:31:21.314917Z",
#    "description": "The vmcore generation fails in RHEL 8.6 when \"cgroup_disable=memory\" is configured due to a known bug in the kernel",
#    "active": true,
#    "category": {
#     "id": 1,
#     "name": "Availability"
#    },
#    "impact": {
#     "name": "Kernel Panic",
#     "impact": 4
#    },
#    "likelihood": 3,
#    "node_id": "6969010",
#    "tags": "kdump kernel panic",
#    "reboot_required": true,
#    "publish_date": "2023-03-05T03:26:00Z",
#    "summary": "The vmcore generation fails in RHEL 8.6 when \"cgroup_disable=memory\" is configured due to a known bug in the kernel.\n",
#    "generic": "The vmcore generation fails in RHEL 8.6 when \"cgroup_disable=memory\" is configured due to a known bug in the kernel.\n",
#    "reason": "This host is running **RHEL 8.6** with **kernel-{{=pydata.rhel_version}}** and \n**\"cgroup_disable=memory\"** is appended to the **KDUMP_COMMANDLINE_APPEND** in\nthe `/etc/sysconfig/kdump`:\n~~~\nKDUMP_COMMANDLINE_APPEND=\"{{=pydata.kdump_data_append}}\"\n~~~\n\nHowever, due to a known bug in the kernel versions prior to **4.18.0-372.40.1.el8_6**, \nthe vmcore generation fails when **\"cgroup_disable=memory\"** is appended to \n**KDUMP_COMMANDLINE_APPEND** in the `/etc/sysconfig/kdump`.\n",
#    "more_info": "",
#    "resolution_set": [
#     {
#      "system_type": 105,
#      "resolution": "Red Hat recommends that you perform the following steps:\n\n{{?pydata.cur_lock && pydata.rcm_locks}}\n* Unset the release lock.\n  ~~~\n  # subscription-manager release --unset\n  ~~~\n{{?}}\n\n{{?pydata.no_base &&\n  (pydata.cur_lock==null || (pydata.cur_lock && pydata.rcm_locks))}}\n* Enable the RHEL base repo:\n  ~~~\n  # subscription-manager repos --enable={{=pydata.no_base}}\n  ~~~\n  Note: To fix the issue in the base channel, you have to enable the base channel at first.\n{{?}}\n\n{{?pydata.cur_lock && pydata.req_repos && pydata.rcm_locks==null}}\n* {{?Object.keys(pydata.req_repos).length > 1}}Enable one of the following channels{{??}}Enable the following channel{{?}}:\n  ~~~\n  {{~pydata.req_repos:e}}# subscription-manager repos --enable={{=e}}\n  {{~}}\n  ~~~\n  Note: Red Hat only provides the resolution in the required channel{{?Object.keys(pydata.req_repos).length > 1}}s{{?}}. \n{{?}}\n* Update the `kernel` package:\n  ~~~\n  # yum update kernel\n  ~~~\n* Reboot the system with the new kernel:\n  ~~~\n  # reboot\n  ~~~\n{{?pydata.cur_lock && pydata.rcm_locks}}\n**Alternatively**, if unsetting the release lock is not an option, fix this issue by re-setting the release lock to {{?Object.keys(pydata.rcm_locks).length > 1}}one of the RHEL releases ``{{~pydata.rcm_locks:e}}{{=e}}, {{~}}``{{??}}the RHEL release ``{{=pydata.rcm_locks[0]}}``{{?}} and updating the package.{{?}}\n\n\nAfter applying the remediation, refresh the results of Advisor analysis by running the `insights-client` command on the system. \n~~~ \n# insights-client \n~~~ \n",
#      "resolution_risk": {
#       "name": "Upgrade Kernel",
#       "risk": 3
#      },
#      "has_playbook": true
#     }
#    ],
#    "total_risk": 3
#   },
#   "details": {
#    "type": "rule",
#    "error_key": "GENERATE_VMCORE_FAILED_DURING_MAKEDUMPFILE",
#    "rhel_version": "4.18.0-372.32.1.el8_6.x86_64",
#    "kdump_data_append": "irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable"
#   },
#   "resolution": {
#    "system_type": 105,
#    "resolution": "Red Hat recommends that you perform the following steps:\n\n{{?pydata.cur_lock && pydata.rcm_locks}}\n* Unset the release lock.\n  ~~~\n  # subscription-manager release --unset\n  ~~~\n{{?}}\n\n{{?pydata.no_base &&\n  (pydata.cur_lock==null || (pydata.cur_lock && pydata.rcm_locks))}}\n* Enable the RHEL base repo:\n  ~~~\n  # subscription-manager repos --enable={{=pydata.no_base}}\n  ~~~\n  Note: To fix the issue in the base channel, you have to enable the base channel at first.\n{{?}}\n\n{{?pydata.cur_lock && pydata.req_repos && pydata.rcm_locks==null}}\n* {{?Object.keys(pydata.req_repos).length > 1}}Enable one of the following channels{{??}}Enable the following channel{{?}}:\n  ~~~\n  {{~pydata.req_repos:e}}# subscription-manager repos --enable={{=e}}\n  {{~}}\n  ~~~\n  Note: Red Hat only provides the resolution in the required channel{{?Object.keys(pydata.req_repos).length > 1}}s{{?}}. \n{{?}}\n* Update the `kernel` package:\n  ~~~\n  # yum update kernel\n  ~~~\n* Reboot the system with the new kernel:\n  ~~~\n  # reboot\n  ~~~\n{{?pydata.cur_lock && pydata.rcm_locks}}\n**Alternatively**, if unsetting the release lock is not an option, fix this issue by re-setting the release lock to {{?Object.keys(pydata.rcm_locks).length > 1}}one of the RHEL releases ``{{~pydata.rcm_locks:e}}{{=e}}, {{~}}``{{??}}the RHEL release ``{{=pydata.rcm_locks[0]}}``{{?}} and updating the package.{{?}}\n\n\nAfter applying the remediation, refresh the results of Advisor analysis by running the `insights-client` command on the system. \n~~~ \n# insights-client \n~~~ \n",
#    "resolution_risk": {
#     "name": "Upgrade Kernel",
#     "risk": 3
#    },
#    "has_playbook": true
#   },
#   "impacted_date": "2023-09-05T05:09:54.945795Z"
#  },
#  {
#   "rule": {
#    "rule_id": "el8_to_el9_upgrade|RHEL8_TO_RHEL9_UPGRADE_AVAILABLE_V1",
#    "created_at": "2023-07-18T08:45:10.136263Z",
#    "updated_at": "2023-09-04T20:32:26.551599Z",
#    "description": "RHEL 8 system is eligible for an in-place upgrade to RHEL 9 using the Leapp utility",
#    "active": true,
#    "category": {
#     "id": 4,
#     "name": "Performance"
#    },
#    "impact": {
#     "name": "Best Practice",
#     "impact": 1
#    },
#    "likelihood": 1,
#    "node_id": "6955478",
#    "tags": "autoack kernel leapp",
#    "reboot_required": true,
#    "publish_date": "2023-08-11T08:32:29Z",
#    "summary": "Red Hat provides `leapp` utility to support upgrade from **RHEL 8** to **RHEL 9**. The current **RHEL 8** version is eligible for upgrade to **RHEL 9** via `leapp` utility. Red Hat recommends that you install `leapp` packages.\n\nOne way to install `leapp` is during execution of **RHEL preupgrade analysis utility** in *Automation Toolkit -> Tasks* service. Run this task to understand the impact of an upgrade on your fleet and make a remediation plan before your maintenance window begins.\n",
#    "generic": "Red Hat provides `leapp` utility to support upgrade from **RHEL 8** to **RHEL 9**. The current **RHEL 8** version is eligible for upgrade to **RHEL 9** via `leapp` utility. Red Hat recommends that you install `leapp` packages.\n\nOne way to install `leapp` is during execution of **RHEL preupgrade analysis utility** in *Automation Toolkit -> Tasks* service. Run this task to understand the impact of an upgrade on your fleet and make a remediation plan before your maintenance window begins.\n",
#    "reason": "{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE\"}}\nThe current **RHEL** version **{{=pydata.supported_path[0]}}** is eligible for upgrade to **RHEL** version {{? pydata.supported_path.length > 2}}**{{=pydata.supported_path[2]}}** (default) or **{{=pydata.supported_path[1]}}**{{??}}**{{=pydata.supported_path[1]}}**{{?}} via the Leapp utility.\n{{?}}\n\n{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE_RPMS\"}}\nThe Leapp utility is available on this system. The current **RHEL** version **{{=pydata.supported_path[0]}}** is eligible for upgrade to **RHEL** version {{? pydata.supported_path.length > 2}}**{{=pydata.supported_path[2]}}** (default) or **{{=pydata.supported_path[1]}}**{{??}}**{{=pydata.supported_path[1]}}**{{?}} via the Leapp utility.\n{{?}}\n",
#    "more_info": "One way to install `leapp` is during execution of **RHEL preupgrade analysis utility** in [Automation Toolkit -> Tasks](https://console.redhat.com/insights/tasks) service. Run this task to understand the impact of an upgrade on your fleet and make a remediation plan before your maintenance window begins.\n",
#    "resolution_set": [
#     {
#      "system_type": 105,
#      "resolution": "Red Hat recommends that you upgrade to **RHEL9** with the following steps:\n\n{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE\"}}\n1. Planning an upgrade according to these [points](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/planning-an-upgrade_upgrading-from-rhel-8-to-rhel-9)\n1. Preparing a RHEL 8 system for the upgrade according to this [procedure](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/assembly_preparing-for-the-upgrade_upgrading-from-rhel-8-to-rhel-9).\n\n1. Install `leapp` utility.\n   ~~~\n   # dnf install leapp-upgrade\n   ~~~\n1. Identify potential upgrade problems before upgrade.\n   ~~~\n   # leapp preupgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n   **Note**: Check `/var/log/leapp/leapp-report.txt` or web console for any pre-check failure and refer to [Reviewing the pre-upgrade report](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/reviewing-the-pre-upgrade-report_upgrading-from-rhel-8-to-rhel-9) for more details. \n1. Start the upgrade.\n   ~~~\n   # leapp upgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n1. Reboot system.\n   ~~~\n   # reboot\n   ~~~\n{{?}}\n\n{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE_RPMS\"}}\n1. Planning an upgrade according to these [points](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/planning-an-upgrade_upgrading-from-rhel-8-to-rhel-9)\n1. Preparing a RHEL 8 system for the upgrade according to this [procedure](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/assembly_preparing-for-the-upgrade_upgrading-from-rhel-8-to-rhel-9).\n\n1. Identify potential upgrade problems before upgrade.\n   ~~~\n   # leapp preupgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n1. Start the upgrade.\n   ~~~\n   # leapp upgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n   **Note**: Check `/var/log/leapp/leapp-report.txt` or web console for any pre-check failure and refer to [Reviewing the pre-upgrade report](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/reviewing-the-pre-upgrade-report_upgrading-from-rhel-8-to-rhel-9) for more details.\n1. Reboot system.\n   ~~~\n   # reboot\n   ~~~\n{{?}}\nFor more details about upgrading, refer to [Upgrading to RHEL9](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/index).\n\n\nAfter applying the remediation, refresh the results of Advisor analysis by running the `insights-client` command on the system. \n~~~ \n# insights-client \n~~~ \n",
#      "resolution_risk": {
#       "name": "Upgrade RHEL",
#       "risk": 3
#      },
#      "has_playbook": false
#     }
#    ],
#    "total_risk": 1
#   },
#   "details": {
#    "type": "rule",
#    "error_key": "RHEL8_TO_RHEL9_UPGRADE_AVAILABLE_V1",
#    "supported_path": [
#     "8.6",
#     "9.0"
#    ]
#   },
#   "resolution": {
#    "system_type": 105,
#    "resolution": "Red Hat recommends that you upgrade to **RHEL9** with the following steps:\n\n{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE\"}}\n1. Planning an upgrade according to these [points](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/planning-an-upgrade_upgrading-from-rhel-8-to-rhel-9)\n1. Preparing a RHEL 8 system for the upgrade according to this [procedure](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/assembly_preparing-for-the-upgrade_upgrading-from-rhel-8-to-rhel-9).\n\n1. Install `leapp` utility.\n   ~~~\n   # dnf install leapp-upgrade\n   ~~~\n1. Identify potential upgrade problems before upgrade.\n   ~~~\n   # leapp preupgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n   **Note**: Check `/var/log/leapp/leapp-report.txt` or web console for any pre-check failure and refer to [Reviewing the pre-upgrade report](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/reviewing-the-pre-upgrade-report_upgrading-from-rhel-8-to-rhel-9) for more details. \n1. Start the upgrade.\n   ~~~\n   # leapp upgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n1. Reboot system.\n   ~~~\n   # reboot\n   ~~~\n{{?}}\n\n{{? pydata.error_key == \"RHEL8_TO_RHEL9_UPGRADE_AVAILABLE_RPMS\"}}\n1. Planning an upgrade according to these [points](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/planning-an-upgrade_upgrading-from-rhel-8-to-rhel-9)\n1. Preparing a RHEL 8 system for the upgrade according to this [procedure](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/assembly_preparing-for-the-upgrade_upgrading-from-rhel-8-to-rhel-9).\n\n1. Identify potential upgrade problems before upgrade.\n   ~~~\n   # leapp preupgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n1. Start the upgrade.\n   ~~~\n   # leapp upgrade --target {{? pydata.supported_path.length > 2}}{{=pydata.supported_path[2]}}{{??}}{{=pydata.supported_path[1]}}{{?}}\n   ~~~\n   **Note**: Check `/var/log/leapp/leapp-report.txt` or web console for any pre-check failure and refer to [Reviewing the pre-upgrade report](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/reviewing-the-pre-upgrade-report_upgrading-from-rhel-8-to-rhel-9) for more details.\n1. Reboot system.\n   ~~~\n   # reboot\n   ~~~\n{{?}}\nFor more details about upgrading, refer to [Upgrading to RHEL9](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/index).\n\n\nAfter applying the remediation, refresh the results of Advisor analysis by running the `insights-client` command on the system. \n~~~ \n# insights-client \n~~~ \n",
#    "resolution_risk": {
#     "name": "Upgrade RHEL",
#     "risk": 3
#    },
#    "has_playbook": false
#   },
#   "impacted_date": "2023-09-05T05:09:54.945795Z"
#  }
# ]

Finally, we can see our host on the public Insights console.

At the same time, note that although the host shows up on the public Insights console, it is not visible in the subscription management view on access.redhat.com.

To summarize: Satellite can act as an Insights proxy, but in my testing the Insights results are only visible on the host itself and on the public Insights console. Satellite does have an entry page for Insights results, but it stays empty; perhaps some extra configuration is needed.

Reinstall the OS

The customer has a special scenario: if RHEL is reinstalled, what happens to the existing record in Satellite? Can we skip deleting the record in Satellite, register directly on the reinstalled RHEL, and reuse the previous registration? Let's try.

We reinstall one host in the lab environment.

# before reinstall, we check the uuid
subscription-manager facts | grep uuid
# dmi.system.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122
# virt.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122

# after reinstall os
# we can see the uuid is the same
subscription-manager facts | grep uuid
# dmi.system.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122
# virt.uuid: 8DF84D56-895F-6163-962B-30EF44BDE122

# try to register again
curl -sS --insecure 'https://panlab-satellite-server.infra.wzhlab.top/register?activation_keys=demo-activate&location_id=2&organization_id=1&setup_insights=false&setup_remote_execution=false&setup_remote_execution_pull=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE2ODQzMDU1MTYsImp0aSI6IjdiODBkNzdmMjVjYzY1MDZjODQ3OGI2Y2VjNzRkZWZjOGM2YjAyMDUxMDQ4YTcyYTJlMWE1YzRiNTgyMjE5NzAiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.EVXyW9gjWyAQIFYUxnwwdxAigrPmUo_XYWnqn-Wh1Fw' | bash
# ......
# The DMI UUID of this host (8DF84D56-895F-6163-962B-30EF44BDE122) matches other registered hosts: satellite-client-02 (HTTP error code 422: Unprocessable Entity)

好了,我们看到了结论,satellite发现,已经有一个相同的uuid主机存在,不能再注册了。我们能做的,就是先在satellite里面,把现在已经存在的这个主机给删掉。

change uuid

我们知道了,uuid是注册satellite的一个key,但是,如果我们的环境特殊,uuid就是重复的,那么怎么办呢?官方有解决方案

[root@client ~]# vi /etc/rhsm/facts/uuid.facts 
{"dmi.system.uuid": "customuuid"}

* customuuid = hostname which is unique for every machine.
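下面是一个最小化的示意(假设主机名在全网唯一,用它来生成自定义 uuid),写好 facts 文件后可以直接验证:

# 用 hostname 作为自定义 uuid(示意)
cat << EOF > /etc/rhsm/facts/uuid.facts
{"dmi.system.uuid": "$(hostname)"}
EOF

# 确认 subscription-manager 已经读到了新的 fact
subscription-manager facts --list | grep dmi.system.uuid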

监控 subscription / 订阅

客户想自动化的监控订阅的过期时间,好及时的更新订阅。虽然我们可以在红帽的portal上面方便的看到订阅的状态,但是,如果我们是运维组,没有访问红帽portal的权限(内部沟通协调问题,你懂的),还是需要一个监控的工具来做这件事情。

那么我们就用 satellite 的 API 来做这件事情。


# you can get the org_id by search a host, the result json contain org_id
curl -s --request GET --insecure --user admin:redhat \
  https://panlab-satellite-server.infra.wzhlab.top:6443/katello/api/subscriptions?organization_id=1 | \
  jq -r '["Name","End Date"], (.results[] | [.name, .end_date] ) | @tsv '
# Name    End Date
# Employee SKU    2027-01-01 04:59:59 UTC

从上面的例子,我们可以看到,从 satellite API 里面,能直接拿到订阅的过期时间。方便运维组监控。
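在此基础上,还可以把 end_date 换算成剩余天数,方便接入告警脚本。下面是一个简单的示意(沿用上面的 API 和认证方式,具体阈值和告警逻辑请按需调整):

curl -s --request GET --insecure --user admin:redhat \
  "https://panlab-satellite-server.infra.wzhlab.top:6443/katello/api/subscriptions?organization_id=1" | \
  jq -r '.results[] | [.name, .end_date] | @tsv' | \
  while IFS=$'\t' read -r sub_name sub_end; do
    # 用 GNU date 把到期时间换算成剩余天数
    days_left=$(( ( $(date -d "$sub_end" +%s) - $(date +%s) ) / 86400 ))
    echo "$sub_name : $days_left days left"
  done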

end

next

RHEL 订阅在线注册相关问题

在线注册过程

国内客户,购买了rhel订阅以后,就可以把自己的系统,在线注册了。一般用如下的命令:

subscription-manager register --auto-attach --username ********* --password ********

上述命令在国内的网络情况下,经常出现速度慢,超时等错误。这是因为,register过程,要访问国外的服务器(subscription.rhsm.redhat.com)。那我们可以搞一个proxy,然后让注册过程走proxy,就能加速。

How to access Red Hat Subscription Manager (RHSM) through a firewall or proxy

export PROXY="127.0.0.1:18801"

subscription-manager register --proxy=$PROXY --auto-attach --username ********* --password ********

官方知识库: https://access.redhat.com/solutions/253273
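如果不想每次都带 --proxy 参数,也可以把代理写进 rhsm 的配置,这样 subscription-manager 的后续操作都会走代理。下面是一个示意(代理地址沿用上面的假设):

subscription-manager config --server.proxy_hostname=127.0.0.1 --server.proxy_port=18801

# 确认配置已经写入 /etc/rhsm/rhsm.conf 的 [server] 小节
grep -E 'proxy_hostname|proxy_port' /etc/rhsm/rhsm.conf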

debug

如果不太清楚慢的原因,那么就需要打开rhsm的log,看看日志,确定问题原因了。

sed -i 's/default_log_level = .*/default_log_level = DEBUG/' /etc/rhsm/rhsm.conf

subscription-manager status

cat /var/log/rhsm/rhsm.log
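排查完以后,建议把日志级别改回默认的 INFO,避免 rhsm.log 过大(示意):

sed -i 's/default_log_level = .*/default_log_level = INFO/' /etc/rhsm/rhsm.conf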

后台持续访问红帽服务器

rhsm带了一些服务,其中有一个服务 rhsmcertd.service 默认是激活的。

systemctl list-unit-files | grep rhsm
# rhsm-facts.service                         disabled
# rhsm.service                               disabled
# rhsmcertd.service                          enabled

systemctl cat rhsmcertd.service
# # /usr/lib/systemd/system/rhsmcertd.service
# [Unit]
# Description=Enable periodic update of entitlement certificates.
# After=network.target

# [Service]
# Type=forking
# ExecStart=/usr/bin/rhsmcertd

# [Install]
# WantedBy=multi-user.target

我们可以看到,rhsmcertd 是一个由 systemd 管理、随系统启动的服务,我们可以 man rhsmcertd 看看这个服务是做什么的。原来,它是定期去红帽服务器检查和更新 entitlement 证书的。我们是在线系统,留着它就好。
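rhsmcertd 的检查周期在 /etc/rhsm/rhsm.conf 的 [rhsmcertd] 小节里,可以这样确认一下(下面注释里的数值是笔者理解的常见默认值,以实际环境为准):

grep -A 5 '^\[rhsmcertd\]' /etc/rhsm/rhsm.conf
# certCheckInterval = 240
# autoAttachInterval = 1440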

dnf using subscription-manager as plugin

我们平常使用dnf的时候,会不会触发subscription-manager里面的功能呢?笔者认为不会,这是因为RHEL的dnf里面,有一个plugin

cat /etc/dnf/plugins/subscription-manager.conf
# [main]
# enabled=1

# # When following option is set to 1, then all repositories defined outside redhat.repo will be disabled
# # every time subscription-manager plugin is triggered by dnf or yum
# disable_system_repos=0

我们可以看到,dnf 有一个 subscription-manager 的 plugin。具体它做什么,可以看看 dnf-plugin-subscription-manager 这个 rpm:里面就是几个 python 脚本,只有在对接 satellite 的特定场景下,才会通过 upload subcommand 触发类似 subscription-manager 的逻辑,向 satellite 汇报本机情况。
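想自己确认的话,可以直接列出这个 rpm 里带了哪些文件(输出会因版本不同而略有差异):

rpm -ql dnf-plugin-subscription-manager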

Simple Content Access

红帽提供了一种新的消费订阅的模式,Simple Content Access,原来管理员需要一台主机一台主机的register, 然后在主机上添加订阅。这么操作有点麻烦。在新的 SCA 模式下,管理员只需要 register 这个主机就可以了,主机可以使用任何当前 org 下的订阅。

红帽的 SCA 政策,就是变相的鼓励大家超用订阅,然后第二年红帽销售就有理由管客户多要一笔钱了。这也是为什么,笔者建议激活SCA之前,要研究一下如何限制订阅的使用方法和措施。

官方文档:

activation key

SCA 太好用了,怎么才能严格地控制使用量呢?方法是 activation key:activation key 可以指定 host 数量,就可以避免超量使用啦。

具体方法见官方文档: https://access.redhat.com/articles/1378093

取消订阅过程

如果vm要销毁了,那么怎么取消订阅的使用呢,很简单。但是一定要记得,在vm销毁之前运行哦。。。

subscription-manager unregister
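如果想清理得更干净一些,也可以在取消注册前后多做两步(示意,非必须):

# 先移除本机消费的订阅
subscription-manager remove --all
# 取消注册
subscription-manager unregister
# 清掉本地缓存的订阅数据
subscription-manager clean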

离线注册过程

如果客户网络情况太特殊,那么我们还可以走离线注册过程。背后的原理是,之前的在线注册,经过用户名密码验证后,系统会下载一个证书,保存在系统里面,后续再和红帽系统建立连接,就使用这个证书了。

离线注册流程,就是去手动下载这个证书,导入到系统中去,然后走后续流程。

具体步骤,见这个在线知识库: https://access.redhat.com/solutions/3121571
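核心动作就是把手动下载到的 entitlement 证书导入系统,一个最简化的示意如下(证书路径为假设,完整步骤以上面的知识库为准):

# 导入手动下载的 entitlement 证书(路径为示意)
subscription-manager import --certificate=/path/to/entitlement-cert.pem

# 确认导入结果
subscription-manager list --consumed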

CCSP订阅的注册过程

CCSP订阅是为云主机厂商提供的一种订阅方式。有了CCSP订阅,云主机厂商需要去维护一套RHUI(Red Hat Update Infrastructure),然后云上的rhel都去访问RHUI来获得更新。

rpm CDN 加速

上面说的都是注册过程,注册完了,就是下载rpm了。红帽的rpm有全球的CDN加速,由于众所周知的原因,如果客户感觉下载慢,可以切换国内的CDN

subscription-manager config --rhsm.baseurl=https://china.cdn.redhat.com
subscription-manager refresh
yum clean all
yum makecache

官方知识库: https://access.redhat.com/solutions/5090421

satellite

企业用户的私有云,都是离线的环境。红帽提供了一个产品叫satellite,相当于一个注册服务器的代理和rpm源的私有CDN。

local repo mirror

如果客户认为使用 satellite 太复杂、部署太麻烦,那么还有一种笨拙但简单的方法:先注册一台主机,把红帽官方的 repo 镜像到本地,在这台主机上开启 web 服务,把它变成一个本地 repo 源,其他主机指向这个本地源就可以了。

官方知识库: https://access.redhat.com/solutions/23016
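下面是一个最小化的操作示意(repo id、发布目录和 IP 都是假设,以实际需要镜像的 repo 为准):

# 在已注册的主机上同步官方 repo 到本地,并用 httpd 发布
yum install -y yum-utils createrepo httpd

reposync -l -m -r rhel-7-server-rpms -p /var/www/html/repo/
createrepo /var/www/html/repo/rhel-7-server-rpms/

systemctl enable --now httpd

# 其他主机上配置一个指向这个源的 repo 文件(IP 为示意)
cat << 'EOF' > /etc/yum.repos.d/local-mirror.repo
[local-rhel-7-server-rpms]
name=local-rhel-7-server-rpms
baseurl=http://192.168.7.11/repo/rhel-7-server-rpms/
enabled=1
gpgcheck=0
EOF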

第二年续订

通常,rhel订阅都是一年的,第二年续订就好。。。但是,续订以后,大部分情况,订阅号会改变,这种情况下,rhel上要做什么操作呢?需要刷新,并重新绑定。

# 刷新订阅信息
subscription-manager refresh

# 自动选择匹配的订阅附加
subscription-manager auto-attach

如果不清楚,或者忘记了有什么订阅了?那么用以下命令查看

# 此命令是查看当前账号下所有可用的有效订阅,其中也可以看到每个订阅的有效期
subscription-manager list --available

# 此命令是查看当前这台机器所消耗的订阅类型,其中也包括有效时间
subscription-manager list --consumed

如果担心订阅号改变,会影响业务,那么我们可以在RHSM web console上,把新的订阅号加上,然后提前 subscription-manager refresh , 这样就可以了。在RHSM web console上的操作,也可以通过rest api完成,方便有大量订阅的客户,自动化处理。
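关于用 rest api 查订阅,大致流程是先用 offline token 换取 access token,再调用订阅接口。下面是一个粗略的示意(端点和参数以红帽 API 官方文档为准,OFFLINE_TOKEN 需要自己在红帽的 API token 页面生成):

# 用 offline token 换取短期的 access token
OFFLINE_TOKEN='......'
ACCESS_TOKEN=$(curl -s https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
  -d grant_type=refresh_token -d client_id=rhsm-api -d refresh_token=$OFFLINE_TOKEN | jq -r .access_token)

# 列出账号下的订阅(分页等参数请参考官方文档)
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
  "https://api.access.redhat.com/management/v1/subscriptions" | jq .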

通过新增系统启动项来原地重装操作系统

很多时候,我们有一台centos7的主机,但是没有cd-rom的访问权限,有可能也没有console的访问权限,而我们做实验,又需要把这个主机刷新成rhel8等操作系统,那么我们怎么才能把这个centos7主机,原地重新安装成其他操作系统呢?

之前,已经有文章,描述怎么从centos7开始一个openshift/coreos的安装。那么,本文就探讨一下,如何从centos7,自动化安装一个alma8。同时,为了探索在安装的时候,能加载某些第三方驱动,我们也试试如何从centos7 boot进入alma8的手动安装界面。

boot into auto install

我们先来做一个从centos7的系统,做一些配置以后,重启,自动化安装成alma8系统。

这个部署就是一个 kvm,这个 kvm 原来装的是 centos7。但是我们需要一个安装源,也就是一个外部的 http web server,用来提供安装光盘的内容和 kickstart 配置文件。按理说,我们也可以把安装光盘里的文件和 kickstart 文件都放到本地硬盘里,但那样就要在启动参数里指定硬盘 id,作者实在不知道怎么找到这些硬盘 id;好在如果用外部的 http web server,只要知道 URL 就可以了。

参考资料:

  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/system_design_guide/index#starting-a-kickstart-installation-manually_starting-kickstart-installations
  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/system_design_guide/index#updating-drivers-during-installation_system-design-guide

# create a kickstart file and copy to /data/dnf/
# 本文的目录里面,有kickstart的配置模板,我们改一下里面的IP地址配置和安装源的URL就能用了。
sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=enp1s0 --gateway=192.168.7.11 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks-alma.cfg

sed -i '0,/^url --url.*/s/^url --url.*/url --url="http:\/\/192.168.7.11:5000\/cdrom\/"/' helper-ks-alma.cfg

# create a centos7 kvm

# 要装kvm,我们需要一个bridge
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.102/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.102/24"
nmcli con up baremetal

mkdir -p /data/kvm
cd /data/kvm

# 我们就用centos7 minimal iso来安装好了。
# 先下载这个iso,作者发现,南京大学的mirror是真的快啊。。。
wget -O centos.iso http://mirrors.nju.edu.cn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-Minimal-2207-02.iso

# 同样,作者提供了一个kvm安装centos的kickstart配置文件模板,替换一下ip地址配置就能用了。
sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=eth0 --gateway=192.168.7.9 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks.cfg

# 接下来,我们定义kvm,给他创建存储空间,启动kvm,就开始自动安装centos kvm了
create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

virsh destroy ocp4-acm-hub
virsh undefine ocp4-acm-hub

create_lv vgdata poolA lvacmhub 100G 
create_lv vgdata poolA lvacmhub-data 100G 

create_lv vgdata poolA lvacmhub 100G recreate
create_lv vgdata poolA lvacmhub-data 100G recreate

virt-install --name="ocp4-acm-hub" --vcpus=16 --ram=$((4*1024)) \
    --cpu=host-model \
    --disk path=/dev/vgdata/lvacmhub,device=disk,bus=virtio,format=raw \
    --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \
    --os-variant rhel8.5 --network bridge=baremetal,model=virtio \
    --graphics vnc,port=59000 \
    --boot menu=on --location /data/kvm/centos.iso \
    --initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

# 等一会,我们就有了一个centos kvm了。

# on helper web server
# 然后,我们在web server上,下载alma8的安装光盘,我们用minimal的版本就好了。
cd /data/dnf
wget -O alma8.iso http://mirrors.nju.edu.cn/almalinux/8/isos/x86_64/AlmaLinux-8-latest-x86_64-minimal.iso
# wget -O rocky8.iso http://mirrors.nju.edu.cn/rocky/8/isos/x86_64/Rocky-x86_64-minimal.iso

# 我们把光盘挂载在本地,然后我们的web server会自动的发布出去。
mkdir -p /data/dnf/cdrom
mount alma8.iso /data/dnf/cdrom
# mount rocky8.iso /data/dnf/cdrom
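
# 如果手头还没有现成的 web server,可以临时用 python 起一个简易 http 服务来发布 /data/dnf
# (端口 5000 与上文 URL 中的假设保持一致,仅为示意,正式环境建议用 nginx/httpd)
cd /data/dnf
nohup python3 -m http.server 5000 --bind 0.0.0.0 &> /data/dnf/httpd.log &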

# on the centos7 vm
# 登录到新安装的centos7, 修改启动项,让下次启动的时候,直接进入安装界面
sshpass -p 'redhat' ssh-copy-id root@192.168.7.12

ssh root@192.168.7.12

# 在centos7里面,下载alma8的内核和ram disk.
HELPER_URL=http://192.168.7.11:5000/cdrom/

curl -o /boot/initrd.img  $HELPER_URL/images/pxeboot/initrd.img
curl -o /boot/vmlinuz     $HELPER_URL/images/pxeboot/vmlinuz

SNO_IP=192.168.7.13
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_ROOTFS=http://192.168.7.11:5000/cdrom/
SNO_IGN=http://192.168.7.11:5000/helper-ks-alma.cfg

# 根据参数,我们自定义一个启动项,
# 这个启动项,用alma8的内核和ram disk启动,带IP地址参数,
# kickstart配置文件指向web server, 安装文件源也指向web server
cat << EOF > /etc/grub.d/40_custom
#!/bin/sh
exec tail -n +3 \$0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry 'coreos' --class fedora --class gnu-linux --class gnu --class os {
    insmod gzio
    insmod part_msdos
    insmod xfs
    set root='hd0,msdos1'
    echo  'Loading coreos kernel ...'
    linux /vmlinuz rd.neednet=1 ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS  inst.ks=$SNO_IGN inst.repo=$SNO_ROOTFS
    echo  'Loading coreos initrd ...'
    initrd /initrd.img
}
EOF

# 把默认启动项设置为我们刚才新定义的 'coreos' 启动项
sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT="coreos"/' /etc/default/grub 

# 写入我们的配置到grub
grub2-mkconfig -o /etc/grub2.cfg

# 重启等待就好了。
reboot

boot into install console

有的时候,我们是能接触到 console 的,而且自动化配置需要的很多参数我们也不知道,那么就必须用手动的方式安装。同样的,我们假设设备已经装好了 centos7,我们从这里开始。

这里的步骤和上面的区别很小:在启动参数里面,不要加 inst.ks 这个参数,也就是不提供自动化安装的配置文件,这样重启以后,我们就能停留在熟悉的安装界面上了。


# create a centos7 kvm

cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.102/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

nmcli con mod baremetal +ipv4.addresses "192.168.7.102/24"
nmcli con up baremetal


mkdir -p /data/kvm
cd /data/kvm

wget -O centos.iso http://mirrors.nju.edu.cn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-Minimal-2207-02.iso

sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=eth0 --gateway=192.168.7.9 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.11  --ipv6=auto --activate/' helper-ks.cfg


create_lv() {
    var_vg=$1
    var_pool=$2
    var_lv=$3
    var_size=$4
    var_action=$5
    lvremove -f $var_vg/$var_lv
    # lvcreate -y -L $var_size -n $var_lv $var_vg
    if [ "$var_action" == "recreate" ]; then
      lvcreate --type thin -n $var_lv -V $var_size --thinpool $var_vg/$var_pool
      wipefs --all --force /dev/$var_vg/$var_lv
    fi
}

virsh destroy ocp4-acm-hub
virsh undefine ocp4-acm-hub

create_lv vgdata poolA lvacmhub 100G 
create_lv vgdata poolA lvacmhub-data 100G 


create_lv vgdata poolA lvacmhub 100G recreate
create_lv vgdata poolA lvacmhub-data 100G recreate


virt-install --name="ocp4-acm-hub" --vcpus=16 --ram=$((4*1024)) \
    --cpu=host-model \
    --disk path=/dev/vgdata/lvacmhub,device=disk,bus=virtio,format=raw \
    --disk path=/dev/vgdata/lvacmhub-data,device=disk,bus=virtio,format=raw \
    --os-variant rhel8.5 --network bridge=baremetal,model=virtio \
    --graphics vnc,port=59000 \
    --boot menu=on --location /data/kvm/centos.iso \
    --initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

# on helper web server
cd /data/dnf
wget -O alma8.iso http://mirrors.nju.edu.cn/almalinux/8/isos/x86_64/AlmaLinux-8-latest-x86_64-minimal.iso

mkdir -p /data/dnf/cdrom
mount alma8.iso /data/dnf/cdrom

# on the centos7 vm
sshpass -p 'redhat' ssh-copy-id root@192.168.7.12

ssh root@192.168.7.12

HELPER_URL=http://192.168.7.11:5000/cdrom/

curl -o /boot/initrd.img  $HELPER_URL/images/pxeboot/initrd.img
curl -o /boot/vmlinuz     $HELPER_URL/images/pxeboot/vmlinuz

SNO_IP=192.168.7.12
SNO_GW=192.168.7.11
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=acm-demo-hub-master
SNO_IF=enp1s0
SNO_DNS=192.168.7.11
SNO_DISK=/dev/vda
SNO_ROOTFS=http://192.168.7.11:5000/cdrom/
SNO_IGN=http://192.168.7.11:5000/helper-ks-alma8.cfg


cat << EOF > /etc/grub.d/40_custom
#!/bin/sh
exec tail -n +3 \$0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry 'coreos' --class fedora --class gnu-linux --class gnu --class os {
    insmod gzio
    insmod part_msdos
    insmod xfs
    set root='hd0,msdos1'
    echo  'Loading coreos kernel ...'
    linux /vmlinuz rd.neednet=1 ip=$SNO_IP::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS inst.repo=$SNO_ROOTFS
    echo  'Loading coreos initrd ...'
    initrd /initrd.img 
}
EOF

sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT="coreos"/' /etc/default/grub 

grub2-mkconfig -o /etc/grub2.cfg

reboot

build nano boot disk

我们尝试做一个迷你启动盘,在这个迷你启动盘里面,除了内核和ram disk之外,什么也没有。同时,增加内核参数,把安装介质和kickstart配置文件,都指向web server。

  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html-single/installation_guide/index#s2-kickstart2-boot-media

dnf install -y isomd5sum

cd /data/dnf

wget -O rocky8.iso http://mirrors.nju.edu.cn/rocky/8/isos/x86_64/Rocky-x86_64-minimal.iso
mount rocky8.iso /data/dnf/cdrom

# create a kickstart file and copy to /data/dnf/
# 本文的目录里面,有kickstart的配置模板,我们改一下里面的IP地址配置和安装源的URL就能用了。
cd /data/dnf
/bin/cp -f helper-ks-alma.cfg helper-ks-alma-wutong.cfg

# for 101
SNO_IP=172.21.6.101
SNO_IP_INSTALL=172.21.6.199
SNO_GW=172.21.6.254
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=panlab-101
SNO_IF=eno1
SNO_DNS=172.21.1.1
SNO_DISK=/dev/sda
SNO_ROOTFS=http://172.21.6.11:5000/cdrom/
SNO_IGN=http://172.21.6.11:5000/helper-ks-alma-wutong.cfg

# for 102
SNO_IP=172.21.6.102
SNO_IP_INSTALL=172.21.6.199
SNO_GW=172.21.6.254
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=panlab-102
SNO_IF=eno1
SNO_DNS=172.21.1.1
SNO_DISK=/dev/sda
SNO_ROOTFS=http://172.21.6.11:5000/cdrom/
SNO_IGN=http://172.21.6.11:5000/helper-ks-alma-wutong.cfg

# for 103
SNO_IP=172.21.6.103
SNO_IP_INSTALL=172.21.6.199
SNO_GW=172.21.6.254
SNO_NETMAST=255.255.255.0
SNO_HOSTNAME=panlab-103
SNO_IF=eno1
SNO_DNS=172.21.1.1
SNO_DISK=/dev/sda
SNO_ROOTFS=http://172.21.6.11:5000/cdrom/
SNO_IGN=http://172.21.6.11:5000/helper-ks-alma-wutong.cfg


sed -i "0,/^network.*/s/^network.*/network  --bootproto=static --device=$SNO_IF --gateway=$SNO_GW --ip=$SNO_IP  --netmask=$SNO_NETMAST --nameserver=$SNO_DNS  --ipv6=auto --activate/" helper-ks-alma-wutong.cfg
sed -i "s/network  --hostname=.*/network  --hostname=$SNO_HOSTNAME/" helper-ks-alma-wutong.cfg

sed -i "0,/^url --url.*/s#^url --url.*#url --url=\"$SNO_ROOTFS\"#" helper-ks-alma-wutong.cfg

sed -i 's/vda/sda/g' helper-ks-alma-wutong.cfg


# mount /data/dnf/alma8.iso /data/dnf/cdrom

mkdir -p /data/tmp/

/bin/cp -pRf /data/dnf/cdrom/ /data/tmp/

cd /data/tmp/cdrom

rm -rf BaseOS/
rm -rf Minimal/
rm -f images/install.img

cat <<EOF > isolinux/isolinux.cfg
default vesamenu.c32
timeout 5

display boot.msg

# Clear the screen when exiting the menu, instead of leaving the menu displayed.
# For vesamenu, this means the graphical background is still displayed without
# the menu itself for as long as the screen remains in graphics mode.
menu clear
menu background splash.png
menu title AlmaLinux 8.7
menu vshift 8
menu rows 18
menu margin 8
#menu hidden
menu helpmsgrow 15
menu tabmsgrow 13

# Border Area
menu color border * #00000000 #00000000 none

# Selected item
menu color sel 0 #ffffffff #00000000 none

# Title bar
menu color title 0 #ff7ba3d0 #00000000 none

# Press [Tab] message
menu color tabmsg 0 #ff3a6496 #00000000 none

# Unselected menu item
menu color unsel 0 #84b8ffff #00000000 none

# Selected hotkey
menu color hotsel 0 #84b8ffff #00000000 none

# Unselected hotkey
menu color hotkey 0 #ffffffff #00000000 none

# Help text
menu color help 0 #ffffffff #00000000 none

# A scrollbar of some type? Not sure.
menu color scrollbar 0 #ffffffff #ff355594 none

# Timeout msg
menu color timeout 0 #ffffffff #00000000 none
menu color timeout_msg 0 #ffffffff #00000000 none

# Command prompt text
menu color cmdmark 0 #84b8ffff #00000000 none
menu color cmdline 0 #ffffffff #00000000 none

# Do not display the actual menu unless the user presses a key. All that is displayed is a timeout message.

menu tabmsg Press Tab for full configuration options on menu items.

menu separator # insert an empty line
menu separator # insert an empty line

label linux
  menu label ^Install WZH Linux 8.7
  menu default
  kernel vmlinuz
  append initrd=initrd.img rd.neednet=1 ip=$SNO_IP_INSTALL::$SNO_GW:$SNO_NETMAST:$SNO_HOSTNAME:$SNO_IF:none nameserver=$SNO_DNS  inst.ks=$SNO_IGN inst.repo=$SNO_ROOTFS

label check
  menu label Test this ^media & install WZH Linux 8.7
  kernel vmlinuz
  append initrd=initrd.img inst.stage2=hd:LABEL=AlmaLinux-8-7-x86_64-dvd rd.live.check quiet

menu separator # insert an empty line

# utilities submenu
menu begin ^Troubleshooting
  menu title Troubleshooting

label vesa
  menu indent count 5
  menu label Install AlmaLinux 8.7 in ^basic graphics mode
  text help
        Try this option out if you're having trouble installing
        AlmaLinux 8.7.
  endtext
  kernel vmlinuz
  append initrd=initrd.img inst.stage2=hd:LABEL=AlmaLinux-8-7-x86_64-dvd nomodeset quiet

label rescue
  menu indent count 5
  menu label ^Rescue a AlmaLinux system
  text help
        If the system will not boot, this lets you access files
        and edit config files to try to get it booting again.
  endtext
  kernel vmlinuz
  append initrd=initrd.img inst.stage2=hd:LABEL=AlmaLinux-8-7-x86_64-dvd inst.rescue quiet

label memtest
  menu label Run a ^memory test
  text help
        If your system is having issues, a problem with your
        system's memory may be the cause. Use this utility to
        see if the memory is working correctly.
  endtext
  kernel memtest

menu separator # insert an empty line

label local
  menu label Boot from ^local drive
  localboot 0xffff

menu separator # insert an empty line
menu separator # insert an empty line

label returntomain
  menu label Return to ^main menu
  menu exit

menu end

EOF


genisoimage -U -r -v -T -J -joliet-long -V "RHEL-6.9" -volset "RHEL-6.9" -A "RHEL-6.9" -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot -o ../wzh.iso .

implantisomd5 ../wzh.iso
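
做好的 iso 可以用同一个 isomd5sum 包里的 checkisomd5 验证一下刚才植入的校验值(示意):

checkisomd5 --verbose ../wzh.iso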

update minimal


dnf update-minimal --security

end

others

https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/installation_guide/chap-anaconda-boot-options

  • inst.graphical
  • inst.resolution=800x600

dmsetup info -c -o name,blkdevname,devnos_used,blkdevs_used
# Name                  BlkDevName       DevNosUsed       BlkDevNamesUsed
# vgdata-lvacmhub       dm-4             253:2            dm-2
# vgdata-lvacmhub--data dm-5             253:2            dm-2
# vgdata-poolA          dm-3             253:2            dm-2
# vgdata-poolA-tpool    dm-2             253:1,253:0      dm-1,dm-0
# vgdata-poolA_tdata    dm-1             8:16             sdb
# vgdata-poolA_tmeta    dm-0             8:16             sdb

Relax and Recover(ReaR) / disaster recovery

Red Hat ships a system backup and recovery solution called ReaR (Relax-and-Recover); although it is not fully supported by Red Hat, it works. Now we test it out.

reference:

  • https://access.redhat.com/solutions/2115051

# on nfs server
yum install -y nfs-utils

mkdir -p /storage

cat << EOF > /etc/exports
/storage        *(fsid=0,rw,sync,no_root_squash,no_subtree_check,crossmnt)
EOF

cat /etc/exports
# /storage        *(fsid=0,rw,sync,no_root_squash,no_subtree_check,crossmnt)  

systemctl enable --now nfs

systemctl disable --now firewalld

# on target server

yum install -y rear pstree nfs-utils

cat << EOF > /etc/rear/local.conf
OUTPUT=ISO
OUTPUT_URL=nfs://192.168.203.134/storage
BACKUP=NETFS
BACKUP_URL=nfs://192.168.203.134/storage
BACKUP_PROG_EXCLUDE=("${BACKUP_PROG_EXCLUDE[@]}" '/media' '/var/tmp' '/var/crash')
NETFS_KEEP_OLD_BACKUP_COPY=
EOF

rear -d -v mkbackup

# on nfs server, new files created from target centos7 vm

tree /storage
# /storage
# └── target-centos7
#     ├── backup.log
#     ├── backup.tar.gz
#     ├── README
#     ├── rear-target-centos7.iso
#     ├── rear-target-centos7.log
#     ├── selinux.autorelabel
#     └── VERSION

# now destroy the target centos vm, and recreate a new one
# boot the new vm using rear-target-centos7.iso
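# 用这个 iso 启动新虚拟机后,会进入 ReaR 的 rescue 环境,
# 在提示符下执行恢复命令即可(ReaR 的标准恢复流程):
# rear recover
# 恢复完成后 reboot,即可回到备份时的系统状态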

After reboot, the system comes back.

no-cost rhel subscription / 红帽免费开发者订阅

自从centos宣布停止支持后,红帽为了照顾广大的开发者群体,推出了免费的开发者订阅,可以激活16个系统,还能免费看红帽的知识库,超值,现在就把注册和激活开发者账号的流程走一遍。

注册账户和激活订阅

首先,登录 https://access.redhat.com/ 去创建一个账号

然后访问: https://developers.redhat.com/products/rhel/download

接下来,我们确认一下我们的账号是否有 developer subscription, 访问 https://access.redhat.com/management

我们能够看到,我们刚刚激活了2个subscription,其中一个就是我们要的developer subscription

激活一个系统

接下来,我们用我们的用户名,密码,来激活一个rhel系统


subscription-manager register --auto-attach --username ******** --password ********

dnf repolist
# Updating Subscription Management repositories.
# repo id                                               repo name
# rhel-8-for-x86_64-appstream-rpms                      Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
# rhel-8-for-x86_64-baseos-rpms                         Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)

访问 https://access.redhat.com/management/systems , 可以看到系统已经激活

能看知识库了

访问这个知识库文章,确认自己能访问知识库啦: https://access.redhat.com/solutions/6178422

调整分区

默认rhel安装,会给一个很大的/home,但是我们做实验,最好把空间都给 / , 不然很容易出现 / 空间不足的情况,那么怎么把 /home 删掉,并且扩大 / 分区呢?

lsblk
# NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sr0              11:0    1 1024M  0 rom
# vda             252:0    0   60G  0 disk
# ├─vda1          252:1    0    1G  0 part /boot
# └─vda2          252:2    0   59G  0 part
#   ├─rhel_v-root 253:0    0 38.3G  0 lvm  /
#   ├─rhel_v-swap 253:1    0  2.1G  0 lvm  [SWAP]
#   └─rhel_v-home 253:2    0 18.7G  0 lvm  /home

umount /home
lvremove -f /dev/rhel_v/home
#   Logical volume "home" successfully removed.

# comment out the following line to skip the /home partition
sed -i -E 's/^(.*\/home)/# \1/g' /etc/fstab

lvextend -l +100%FREE /dev/rhel_v/root
#   Size of logical volume rhel_v/root changed from <38.26 GiB (9794 extents) to <56.94 GiB (14576 extents).
#   Logical volume rhel_v/root successfully resized.

xfs_growfs /dev/rhel_v/root
# meta-data=/dev/mapper/rhel_v-root isize=512    agcount=4, agsize=2507264 blks
#          =                       sectsz=512   attr=2, projid32bit=1
#          =                       crc=1        finobt=1, sparse=1, rmapbt=0
#          =                       reflink=1
# data     =                       bsize=4096   blocks=10029056, imaxpct=25
#          =                       sunit=0      swidth=0 blks
# naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
# log      =internal log           bsize=4096   blocks=4897, version=2
#          =                       sectsz=512   sunit=0 blks, lazy-count=1
# realtime =none                   extsz=4096   blocks=0, rtextents=0
# data blocks changed from 10029056 to 14925824

lsblk
# NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
# sr0              11:0    1 1024M  0 rom
# vda             252:0    0   60G  0 disk
# ├─vda1          252:1    0    1G  0 part /boot
# └─vda2          252:2    0   59G  0 part
#   ├─rhel_v-root 253:0    0   57G  0 lvm  /
#   └─rhel_v-swap 253:1    0  2.1G  0 lvm  [SWAP]


reference

  • https://developers.redhat.com/blog/2021/02/10/how-to-activate-your-no-cost-red-hat-enterprise-linux-subscription#
  • https://developers.redhat.com/products/rhel/download

在红帽官网查询rpm包属于哪个repo

红帽rhel8系统,可以配置多个repo,默认是baseos和appstream。但是我们在项目中,经常会被问到,某个rpm是从哪个repo中获取的。如果这个rpm不在baseos、appstream这两个默认的repo中,那么我们在系统上用命令是查不出来的(至少我现在还不知道有什么办法),必须先把repo加到系统上,才能用dnf命令查出来。这样就变成一个鸡生蛋、蛋生鸡的游戏:我们还是不知道应该加哪个repo。

好在,红帽官网提供了一个工具,可以查询rpm包属于哪个repo,只要把rpm包信息输入到系统上,然后用这个工具查询,就可以知道属于哪个repo了。

我们访问这个网页,查询一个包 qemu-kiwi,我们注意到,他是根据我当前的订阅权限查询的,好在作者当前的订阅包含的内容还算多。

点击 x86_64,我们就能看到rpm包的具体信息,包括repo的名字、版本、描述等等。

于是,我们就得到了最终的答案,qemu-kiwi这个rpm包,是属于 advanced-virt-for-rhel-8-x86_64-rpms 这个repo的

离线环境下 原地从 rhel7 向 rhel8 升级

随着rhel7生命周期结束的日期越来越近,同时rhel8的很多新特性在功能和性能上很有优势,很多客户都在考虑从rhel7向rhel8升级。一般来说,系统升级是高风险操作,非常推荐客户找备份主机,新装rhel8,把应用迁移过去,然后把rhel7这台主机的操作系统删除重装成rhel8,之后再把应用迁移回来。

但是很多客户的生产主机是非常高配的,并没有足够的备份主机,做上述操作。这样就要考虑在原地从rhel7向rhel8升级。由于原地升级风险很大,强烈建议客户联系专业团队,如红帽GPS,做全面的原地升级计划。

一般来说,原地升级要考虑如下问题:

  1. 系统存储情况,分区配置
  2. 是否有第三方内核驱动
  3. 操作系统做了什么定制化
  4. 启动了什么应用

红帽官方提供了leapp以及boom来支持原地升级,但这并不能完全消除原地升级的风险。本文就在一台宿主机上,安装一个rhel7 vm,然后模拟离线环境,升级到rhel8。目的是让准备实施原地升级的客户,能进行原地升级演练,更好地模拟目标系统的状态,并尽早发现原地升级过程中的问题和风险。

视频讲解:

参考材料:

leapp

leapp是红帽官方的升级工具,在rhel8官方文档中,有详细的描述。本文聚焦在全离线环境下,如何使用leapp的方式来进行升级。注意,如果只使用leapp升级系统,那升级过程是单向的,也就是说,一旦开始升级,就不能再恢复到原来的状态,也不能降级。如何恢复或者降级,在后面boom的章节中描述。

订阅离线证书

为了在离线环境进行升级,我们需要先准备 rhel7 & rhel8 repo,那么这就需要订阅离线证书。这里我们就看看怎么下载,并解压缩出来。

我们访问红帽在线订阅系统,选择一个系统,给这个系统添加正确的订阅,然后点击下载,我们就能得到一个zip文件。

然后我们上传这个zip文件到服务器上,再解压缩。

# on host, we use rocky 8.5
# prepare redhat subscription cert
mkdir -p /data/rhel8/entitle
cd /data/rhel8/entitle

# goto https://access.redhat.com/management/subscriptions
# search employee sku, find a system, go into, and download from subscription
# or goto: https://access.redhat.com/management/systems/4d1e4cc0-2c99-4431-99ce-2f589a24ea11/subscriptions
dnf install -y unzip 
unzip *
unzip consumer_export.zip
find . -name *.pem -exec cp {} ./ \;

mkdir -p /data/dockerfile/
cd /data/dockerfile/

ls /data/rhel8/entitle/*.pem | sed -n '2p' | xargs -I DEMO /bin/cp -f DEMO ./ 

用容器的方式构建离线repo

我们采用容器的方式构建离线repo,这样就可以在离线环境中进行升级了。至于为什么用容器的方式,那是因为我们需要同时为rhel7, rhel8两个系统构建离线repo,一般的方法,这就需要2个系统,而我们只需要一个系统,那么我们就用容器的方式,来模拟两个操作系统环境,进行构建。

这里面的一个问题是,在容器环境中,红帽的订阅机制是不生效的,我们需要一些技巧来解决这个问题。

# prepare rhel8 repo
mkdir -p /data/rhel/dnf

podman run -it --rm -v /data/rhel/dnf:/data/dnf:z \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement.pem,relabel=shared  \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement-key.pem,relabel=shared \
    registry.access.redhat.com/ubi8 bash

# in podman shell
dnf -y update || true && \
    sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf && \
    sed -i 's|%(ca_cert_dir)sredhat-uep.pem|/etc/rhsm/ca/redhat-uep.pem|g' /etc/yum.repos.d/redhat.repo && \
    sed -i '/ansible-2.9-for-rhel-8-x86_64-rpms/,/enabled = 0/s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo && \
    sed -i 's|cdn.redhat.com|china.cdn.redhat.com|g' /etc/yum.repos.d/redhat.repo && \
    dnf -y update && \
    cd /data/dnf && \
    dnf reposync -m --download-metadata --delete -n

# prepare rhel7 repo
mkdir -p /data/rhel/yum

podman run -it --rm -v /data/rhel/yum:/data/yum:z \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement.pem,relabel=shared  \
    --mount type=bind,source=$(ls /data/rhel8/entitle/*.pem | sed -n '2p'),target=/etc/pki/entitlement/entitlement-key.pem,relabel=shared \
    registry.access.redhat.com/ubi7 bash

# in podman shell
# https://unix.stackexchange.com/questions/677719/search-and-replace-lines-after-a-regex-match-using-sed
# https://stackoverflow.com/questions/148451/how-to-use-sed-to-replace-only-the-first-occurrence-in-a-file
sed -i 's|%(ca_cert_dir)sredhat-uep.pem|/etc/rhsm/ca/redhat-uep.pem|g' /etc/rhsm/rhsm.conf && \
  yum -y update || true && \
  sed -i 's|enabled=1|enabled=0|g' /etc/yum/pluginconf.d/subscription-manager.conf && \
  sed -i 's|%(ca_cert_dir)sredhat-uep.pem|/etc/rhsm/ca/redhat-uep.pem|g' /etc/yum.repos.d/redhat.repo && \
  sed -i 's|cdn.redhat.com|china.cdn.redhat.com|g' /etc/yum.repos.d/redhat.repo && \
  sed -i '/rhel-7-server-extras-rpms/,/enabled = 0/s/enabled = 0/enabled = 1/' /etc/yum.repos.d/redhat.repo && \
  yum -y update && \
  cd /data/yum && \
  yum install -y yum-utils createrepo && \
  reposync -n -d -l -m && \
  createrepo ./

使用ftp来提供repo服务

我们已经准备好了离线repo,那么我们启动一个ftp服务,来提供离线repo的服务。这里面会有一些权限, selinux的问题和解决技巧。

# setup ftp service for repo
dnf -y install vsftpd
sed -i 's/anonymous_enable=NO/anonymous_enable=YES/g' /etc/vsftpd/vsftpd.conf
systemctl enable --now vsftpd

systemctl disable --now firewalld

cd /data/
chcon -R -t public_content_t  rhel
chown -R ftp:ftp rhel

cd /var/ftp 
mkdir -p /var/ftp/rhel
# https://stackoverflow.com/questions/34736743/ftp-550-failed-to-change-directory
mount --bind /data/rhel /var/ftp/rhel
cat << EOF >> /etc/fstab
/data/rhel /var/ftp/rhel none bind 0 0
EOF

dnf install -y lftp
# try the ftp server
lftp 127.0.0.1
# ls rhel/yum/rhel-7-server-rpms/Packages/a/

安装 rhel7 vm

至此,我们的准备工作都完成了,开始安装rhel7的虚拟机。

# setup bridge for vm
mkdir -p /data/kvm
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno2'
PUB_IP='192.168.7.11/24'
PUB_GW='192.168.7.11'
PUB_DNS='192.168.7.11'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

# install rhel7 vm
cd /data/kvm

osinfo-query os | grep rhel7
#  rhel7-unknown        | Red Hat Enterprise Linux 7 Unknown                 | 7-unknown | http://redhat.com/rhel/7-unknown
#  rhel7.0              | Red Hat Enterprise Linux 7.0                       | 7.0      | http://redhat.com/rhel/7.0
#  rhel7.1              | Red Hat Enterprise Linux 7.1                       | 7.1      | http://redhat.com/rhel/7.1
#  rhel7.2              | Red Hat Enterprise Linux 7.2                       | 7.2      | http://redhat.com/rhel/7.2
#  rhel7.3              | Red Hat Enterprise Linux 7.3                       | 7.3      | http://redhat.com/rhel/7.3
#  rhel7.4              | Red Hat Enterprise Linux 7.4                       | 7.4      | http://redhat.com/rhel/7.4
#  rhel7.5              | Red Hat Enterprise Linux 7.5                       | 7.5      | http://redhat.com/rhel/7.5
#  rhel7.6              | Red Hat Enterprise Linux 7.6                       | 7.6      | http://redhat.com/rhel/7.6
#  rhel7.7              | Red Hat Enterprise Linux 7.7                       | 7.7      | http://redhat.com/rhel/7.7
#  rhel7.8              | Red Hat Enterprise Linux 7.8                       | 7.8      | http://redhat.com/rhel/7.8
#  rhel7.9              | Red Hat Enterprise Linux 7.9                       | 7.9      | http://redhat.com/rhel/7.9

# download rhel7 iso
wget -O rhel7.iso 'https://access.cdn.redhat.com/content/origin/files/sha256/19/19d653ce2f04f202e79773a0cbeda82070e7527557e814ebbce658773fbe8191/rhel-server-7.9-x86_64-dvd.iso?user=a768b217cf6ae8041b67586bb4dd5c77&_auth_=1641893589_4f48191c0168e22e5cedac1a1ef79ef8'

pvcreate /dev/sdb
vgcreate vgdata /dev/sdb

create_lv() {
    var_vg=$1
    var_lv=$2
    lvremove -f $var_vg/$var_lv
    lvcreate -y -L 120G -n $var_lv $var_vg
    wipefs --all --force /dev/$var_vg/$var_lv
}

create_lv vgdata lvrhel7

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/notes/2022/files/helper-ks.cfg

unset http_proxy
unset https_proxy

# https://octowhale.gitbooks.io/kickstart/content/chapter2-kickstart-options-logvol.html
# https://octowhale.gitbooks.io/kickstart/content/chapter2-kickstart-options-network.html
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#network_kickstart-commands-for-network-configuration
sed -i '0,/^network.*/s/^network.*/network  --bootproto=static --device=eth0 --gateway=192.168.7.1 --ip=192.168.7.12  --netmask=255.255.255.0 --nameserver=192.168.7.1  --noipv6 --activate/' helper-ks.cfg
sed -i 's/logvol \/  --fstype="xfs" .*/logvol \/  --fstype="xfs" --name=root --vgname=vg0 --percent=50/' helper-ks.cfg

# 配置kvm环境
dnf -y groupinstall "Server with GUI"

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld
systemctl enable --now libvirtd

# 准备vnc环境
vncpasswd

cat << EOF > ~/.vnc/config
session=gnome
securitytypes=vncauth,tlsvnc
# desktop=sandbox
geometry=1280x800
alwaysshared
EOF

cat << EOF >> /etc/tigervnc/vncserver.users
:1=root
EOF

# systemctl disable vncserver@:1
systemctl start vncserver@:1
# 如果你想停掉vnc server,这么做
systemctl stop vncserver@:1

# start to install the rhel7 vm
virt-install --name="rhel7" --vcpus=8 --ram=8192 \
--cpu=host-model \
--disk path=/dev/vgdata/lvrhel7,device=disk,bus=virtio,format=raw \
--os-variant rhel7.9 --network bridge=baremetal,model=virtio \
--graphics vnc,port=59000 \
--boot menu=on --location /data/kvm/rhel7.iso \
--initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

配置 rhel 7 vm

rhel7虚拟机装好以后,我们要对他做一些简单的配置,把他的更新源指向我们的离线repo

# setup rhel7 vm
ssh root@192.168.7.12

# disable dns lookup in sshd when ssh login
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
systemctl restart sshd

# link to local repo
cat << 'EOF' > /etc/yum.repos.d/remote.repo
[remote-rhel7]
name=remote-rhel7
baseurl=ftp://192.168.7.11/rhel/yum
enabled=1
gpgcheck=0
EOF

yum update -y
reboot

开始升级

我们使用leapp来升级。leapp会检查系统配置,并给出系统上有什么问题导致不能原地升级。我们要根据leapp的提示,调整系统配置,配置完成后再试一次,如果检查通过,就可以原地升级了。

本文的系统环境非常简单,还遇到了2个问题,可以想象到,如果是生产环境,会遇到更多的问题。

# perform upgrade
# 先安装升级需要的软件
yum install -y leapp leapp-repository leapp-repository-deps lvm2-python-boom

# 配置升级过程中的安装源
cat << 'EOF' > /etc/leapp/files/leapp_upgrade_repositories.repo 
[BaseOS]
name=BaseOS
baseurl=ftp://192.168.7.11/rhel/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=ftp://192.168.7.11/rhel/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

EOF

# 因为我们是离线环境,需要有一些升级用的参数文件,需要手动的下载和导入
# https://access.redhat.com/articles/3664871
# download the leapp-data15.tar.gz to server
tar -xzf leapp-data15.tar.gz -C /etc/leapp/files 

# 做第一次的升级前检测
# 从结果看,发现了2个问题,导致不能升级
leapp preupgrade --no-rhsm --enablerepo BaseOS --enablerepo AppStream
# .........
# ====> * verify_check_results
#         Check all generated results messages and notify user about them.

# ============================================================
#                      UPGRADE INHIBITED
# ============================================================

# Upgrade has been inhibited due to the following problems:
#     1. Inhibitor: Possible problems with remote login using root account
#     2. Inhibitor: Missing required answers in the answer file
# Consult the pre-upgrade report for details and possible remediation.

# ============================================================
#                      UPGRADE INHIBITED
# ============================================================


# Debug output written to /var/log/leapp/leapp-preupgrade.log

# ============================================================
#                            REPORT
# ============================================================

# A report has been generated at /var/log/leapp/leapp-report.json
# A report has been generated at /var/log/leapp/leapp-report.txt

# ============================================================
#                        END OF REPORT
# ============================================================

# Answerfile has been generated at /var/log/leapp/answerfile

# 我们看看这两个问题是什么
# 还好,红帽工具给出了解决问题的方法和命令
cat /var/log/leapp/leapp-report.txt
# Risk Factor: high (inhibitor)
# Title: Possible problems with remote login using root account
# Summary: OpenSSH configuration file does not explicitly state the option PermitRootLogin in sshd_config file, which will default in RHEL8 to "prohibit-password".
# Remediation: [hint] If you depend on remote root logins using passwords, consider setting up a different user for remote administration or adding "PermitRootLogin yes" to sshd_config.
# Key: 3d21e8cc9e1c09dc60429de7716165787e99515f
# ----------------------------------------
# Risk Factor: high (inhibitor)
# Title: Missing required answers in the answer file
# Summary: One or more sections in answerfile are missing user choices: remove_pam_pkcs11_module_check.confirm
# For more information consult https://leapp.readthedocs.io/en/latest/dialogs.html
# Remediation: [hint] Please register user choices with leapp answer cli command or by manually editing the answerfile.
# [command] leapp answer --section remove_pam_pkcs11_module_check.confirm=True
# Key: d35f6c6b1b1fa6924ef442e3670d90fa92f0d54b
# ----------------------------------------
# ............

# 我们应用红帽的解决方案
sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config

leapp answer --section remove_pam_pkcs11_module_check.confirm=True

# 开始升级
leapp upgrade --no-rhsm --enablerepo BaseOS --enablerepo AppStream
# ..............
# Transaction Summary
# =========================================================================================================================
# Install    213 Packages
# Upgrade    285 Packages
# Remove      66 Packages
# Downgrade    7 Packages

# Total size: 589 M
# DNF will only download packages, install gpg keys, and check the transaction.
# Downloading Packages:
# Running transaction check
# Transaction check succeeded.
# Running transaction test
# Transaction test succeeded.
# Complete!
# ====> * add_upgrade_boot_entry
#         Add new boot entry for Leapp provided initramfs.
# A reboot is required to continue. Please reboot your system.


# Debug output written to /var/log/leapp/leapp-upgrade.log

# ============================================================
#                            REPORT
# ============================================================

# A report has been generated at /var/log/leapp/leapp-report.json
# A report has been generated at /var/log/leapp/leapp-report.txt

# ============================================================
#                        END OF REPORT
# ============================================================

# Answerfile has been generated at /var/log/leapp/answerfile

reboot

第一次重启,我们能看到多了一个特殊的启动项,不需要任何操作,让它自动继续。

我们能看到启动过程是不一样的,在继续做系统升级的操作。

然后,系统会自动重启,我们能看到,重启以后,重新进行selinux relabel

之后,会再次自动重启,就完成升级了,可以看到简单的完成状态信息

升级之后的配置

至此,我们就完成了rhel7->rhel8的升级,我们要做一点配置,也就是把 rhel8 的更新源给配置进去。

# ssh into the new upgraded rhel8
cat << 'EOF' > /etc/yum.repos.d/remote.repo
[BaseOS]
name=BaseOS
baseurl=ftp://192.168.7.11/rhel/dnf/rhel-8-for-x86_64-baseos-rpms
enabled=1
gpgcheck=0

[AppStream]
name=AppStream
baseurl=ftp://192.168.7.11/rhel/dnf/rhel-8-for-x86_64-appstream-rpms
enabled=1
gpgcheck=0

EOF

dnf makecache

dnf upgrade -y
# ......
# Dependencies resolved.
# Nothing to do.
# Complete!

BOOM

之前说的leapp方法,有一个问题,就是如果系统升级失败,会让系统进入不可用状态。遗憾的是,对于定制化很多的生产系统,升级失败并不是小概率事件。为了避免系统升级失败,导致系统完全不可用的情况发生,红帽提供了boom工具,来帮助在升级之前,做一个系统快照,如果升级失败,那么就可以从这个系统快照中恢复系统。

boom工具并不是专为系统原地升级打造的,boom是一个老工具。一个常用的使用场景是:先给系统做一个快照,然后对系统进行配置;如果发现系统配置正确,那么删除这个快照;如果发现系统配置不正确,那么就从这个快照恢复。

可以看出来,系统原地升级,只不过是boom的一个使用场景。

参考材料:

  1. Upgrading from RHEL 7 to RHEL 8 with Leapp and BOOM
  2. Boom! Booting RHEL from LVM snapshots

创建系统快照

我们检查系统当前的状态,并创建系统快照

# after rhel7 vm created
vgs
#   VG  #PV #LV #SN Attr   VSize    VFree
#   vg0   1   2   0 wz--n- <119.00g 59.25g

lvs
#   LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   root vg0 -wi-ao---- <59.25g
#   swap vg0 -wi-ao---- 512.00m

lvcreate -s -L 10G -n rollback vg0/root
#   Logical volume "rollback" created.

lvs
#   LV       VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   rollback vg0 swi-a-s---  10.00g      root   0.01
#   root     vg0 owi-aos--- <59.25g
#   swap     vg0 -wi-ao---- 512.00m

yum install -y leapp leapp-repository leapp-repository-deps lvm2-python-boom

boom create --title "RHEL7 Snapshot" --rootlv vg0/rollback
# WARNING - Boom configuration not found in grub.cfg
# WARNING - Run 'grub2-mkconfig > /boot/grub2/grub.cfg' to enable
# Created entry with boot_id 982beff:
#   title RHEL7 Snapshot
#   machine-id 036bb4e6c07a4ba9856c4bf68c1bd250
#   version 3.10.0-1160.49.1.el7.x86_64
#   linux /vmlinuz-3.10.0-1160.49.1.el7.x86_64
#   initrd /initramfs-3.10.0-1160.49.1.el7.x86_64.img
#   options root=/dev/vg0/rollback ro rd.lvm.lv=vg0/rollback
#   grub_users $grub_users
#   grub_arg --unrestricted
#   grub_class kernel

grub2-mkconfig > /boot/grub2/grub.cfg
# Generating grub configuration file ...
# Found linux image: /boot/vmlinuz-3.10.0-1160.49.1.el7.x86_64
# Found initrd image: /boot/initramfs-3.10.0-1160.49.1.el7.x86_64.img
# Found linux image: /boot/vmlinuz-3.10.0-1160.el7.x86_64
# Found initrd image: /boot/initramfs-3.10.0-1160.el7.x86_64.img
# Found linux image: /boot/vmlinuz-0-rescue-036bb4e6c07a4ba9856c4bf68c1bd250
# Found initrd image: /boot/initramfs-0-rescue-036bb4e6c07a4ba9856c4bf68c1bd250.img
# done

boom list
# BootID  Version                     Name                            RootDevice
# 982beff 3.10.0-1160.49.1.el7.x86_64 Red Hat Enterprise Linux Server /dev/vg0/rollback

lvs
#   LV       VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   rollback vg0 swi-a-s---  10.00g      root   0.40
#   root     vg0 owi-aos--- <59.25g
#   swap     vg0 -wi-ao---- 512.00m

reboot

升级系统

接下来,我们按照之前leapp的步骤,进行原地升级操作。重启以后,我们看看系统状态,可以看到快照卷已经有接近50%的使用量。这就提醒我们,需要给快照卷留足够大的空间,否则快照卷会失效,丧失系统恢复的功能。

启动过程中,我们选择默认的kernel

# perform upgrade to rhel8

# after upgrade
lvs
#   LV       VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   rollback vg0 swi-a-s---  10.00g      root   44.57
#   root     vg0 owi-aos--- <59.25g
#   swap     vg0 -wi-ao---- 512.00m

rollback to rhel7

接下来我们尝试恢复到rhel7。我们重启系统,选择snapshot启动系统。

然后做卷的恢复操作。

# boot using the snapshot
lvconvert --merge /dev/vg0/rollback
#   Delaying merge since snapshot is open.
#   Merging of snapshot vg0/rollback will occur on next activation of vg0/root.

reboot

重启后,选择老的rhel7的kernel启动系统。

重装kernel,让rhel7最新的kernel作为默认的kernel。

lvs
#   LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   root vg0 Owi-aos--- <59.25g             11.06
#   swap vg0 -wi-ao---- 512.00m

yum list kernel*
# Installed Packages
# kernel.x86_64                                                3.10.0-1160.el7                                       @anaconda/7.9
# kernel.x86_64                                                3.10.0-1160.49.1.el7                                  @remote-rhel7
# kernel-tools.x86_64                                          3.10.0-1160.49.1.el7                                  @remote-rhel7
# kernel-tools-libs.x86_64                                     3.10.0-1160.49.1.el7                                  @remote-rhel7
# Available Packages
# kernel-abi-whitelists.noarch                                 3.10.0-1160.49.1.el7                                  remote-rhel7
# kernel-debug.x86_64                                          3.10.0-1160.49.1.el7                                  remote-rhel7
# kernel-debug-devel.x86_64                                    3.10.0-1160.49.1.el7                                  remote-rhel7
# kernel-devel.x86_64                                          3.10.0-1160.49.1.el7                                  remote-rhel7
# kernel-doc.noarch                                            3.10.0-1160.49.1.el7                                  remote-rhel7
# kernel-headers.x86_64                                        3.10.0-1160.49.1.el7                                  remote-rhel7

# https://access.redhat.com/solutions/4094081
yum remove -y kernel-3.10.0-1160.49.1.el7.x86_64 ; yum install -y kernel-3.10.0-1160.49.1.el7.x86_64
# grubby fatal error: unable to find a suitable template
# grubby: doing this would leave no kernel entries. Not writing out new config.
#   Verifying  : kernel-3.10.0-1160.49.1.el7.x86_64                                                                           1/1

# Installed:
#   kernel.x86_64 0:3.10.0-1160.49.1.el7

grub2-mkconfig -o /boot/grub2/grub.cfg

yum remove -y kernel-3.10.0-1160.49.1.el7.x86_64 ; yum install -y kernel-3.10.0-1160.49.1.el7.x86_64

reboot

重启以后,我们能看到rhel7最新的kernel已经作为系统默认启动的kernel选项。

accept the upgraded rhel8

最后,我们看看如果原地升级成功,要如何接受这个升级。过程也很简单,就是在boom中删除snapshot的启动项,并且把snapshot卷删掉。

# boot into the rhel8
uname -a
# Linux helper 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 8 21:51:17 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

lvs
#   LV       VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   rollback vg0 swi-a-s---  10.00g      root   43.99
#   root     vg0 owi-aos--- <59.25g
#   swap     vg0 -wi-ao---- 512.00m

boom list
# WARNING - Options for BootEntry(boot_id=d291021) do not match OsProfile: marking read-only
# BootID  Version                     Name                            RootDevice
# 6d82dac 3.10.0-1160.49.1.el7.x86_64 Red Hat Enterprise Linux Server /dev/vg0/rollback
# e1f4484 3.10.0-1160.49.1.el7.x86_64 Red Hat Enterprise Linux Server /dev/mapper/vg0-root
# f7da13a 3.10.0-1160.el7.x86_64      Red Hat Enterprise Linux Server /dev/mapper/vg0-root
# d291021 4.18.0-348.7.1.el8_5.x86_64 Red Hat Enterprise Linux        /dev/mapper/vg0-root

boom entry delete 6d82dac
# WARNING - Options for BootEntry(boot_id=d291021) do not match OsProfile: marking read-only
# Deleted 1 entry

boom list
# WARNING - Options for BootEntry(boot_id=d291021) do not match OsProfile: marking read-only
# BootID  Version                     Name                            RootDevice
# e1f4484 3.10.0-1160.49.1.el7.x86_64 Red Hat Enterprise Linux Server /dev/mapper/vg0-root
# f7da13a 3.10.0-1160.el7.x86_64      Red Hat Enterprise Linux Server /dev/mapper/vg0-root
# d291021 4.18.0-348.7.1.el8_5.x86_64 Red Hat Enterprise Linux        /dev/mapper/vg0-root

lvs
#   LV       VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   rollback vg0 swi-a-s---  10.00g      root   44.41
#   root     vg0 owi-aos--- <59.25g
#   swap     vg0 -wi-ao---- 512.00m

lvremove -f /dev/vg0/rollback
#   Logical volume "rollback" successfully removed.

lvs
#   LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#   root vg0 -wi-ao---- <59.25g
#   swap vg0 -wi-ao---- 512.00m

reboot

重启以后,我们能看到,snapshot启动项没有了。

lvm snapshot full issue

这里提供一个背景知识,如果snapshot卷满了,那么snapshot卷就失效了,我们也就不能恢复了。

Why do I get I/O errors when my LVM snapshot reaches 100% usage?
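
所以在升级期间,最好盯一下快照的使用率;也可以考虑打开 lvm 的快照自动扩容。下面是一个示意(参数在 /etc/lvm/lvm.conf 中,数值仅为示意):

# 随时观察快照卷的使用率
lvs -o lv_name,origin,data_percent vg0

# /etc/lvm/lvm.conf 中与快照自动扩容相关的参数(示意数值)
# snapshot_autoextend_threshold = 70
# snapshot_autoextend_percent = 20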

end

sysctl.conf 里面设置的参数无法加载

客户遇到一个很奇怪的问题,明明在sysctl.conf里面配置了net.netfilter.nf_conntrack_max参数,但是重启以后,这个参数还是没有生效,这个问题是什么原因呢?

答案在红帽的知识库里面

  • https://access.redhat.com/solutions/548813

我们知道,sysctl的配置是在系统启动的时候,由systemd-sysctl.service加载的。而客户环境里面又有docker,docker因为会调用iptables的nat功能,从而隐式地加载内核模块nf_conntrack。我们猜测,应该是docker服务和systemd-sysctl服务在系统启动时的先后顺序配合有问题,才造成了内核模块参数加载失败的问题。

接下来,我们做个实验来看看。

实验环境准备

我们装一台centos7,关闭firewalld,再安装社区版本的docker-ce。注意,我们在这里,并没有激活docker服务的自动启动。

# disable firewalld
systemctl disable --now firewalld

# install docker ce
yum install -y yum-utils
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io

# reboot is important
reboot

环境调查

系统重启以后,我们看看是否加载了内核模块nf_conntrack,并且看看有没有对应的参数。

# check nf_conntrack module status
lsmod | grep nf_conntrack
# nothing

sysctl net.netfilter.nf_conntrack_max
# sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory

可以看到,没有加载nf_conntrack模块,而且没有对应的参数。那么我们手动启动docker服务,然后再看看。

# enable docker service and check nf_conntrack again
systemctl start docker

lsmod | grep nf_conntrack
# nf_conntrack_netlink    36396  0
# nf_conntrack_ipv4      15053  2
# nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
# nf_conntrack          139264  6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
# libcrc32c              12644  2 nf_nat,nf_conntrack

sysctl net.netfilter.nf_conntrack_max
# net.netfilter.nf_conntrack_max = 65536

# check we didn't set the parameter for net.netfilter.nf_conntrack_max
find /etc -type f -exec grep -H nf_conntrack_max {} \;
# nothing

可以看到,加载了docker服务以后,内核模块nf_conntrack已经加载了,而且参数net.netfilter.nf_conntrack_max已经设置了。我们查找了一下/etc目录下面,发现没有配置net.netfilter.nf_conntrack_max的参数。

那么我们可以得出结论,不做任何配置,nf_conntrack内核模块,只有在docker服务启动的时候,才会加载到内核。如果systemd-sysctl这个服务,在系统启动的时候,早于docker服务,那么他就无法设置参数net.netfilter.nf_conntrack_max,就会造成我们所看到的故障。

接下来,我们确认一下我们的猜测,我们罗列一下,有哪些服务是在docker服务启动之前,必须要启动的。或者说,docker服务依赖哪些别的服务。

# check systemd init sequence
systemctl enable --now docker

# we can see, docker start after systemd-sysctl
systemctl list-dependencies docker
# docker.service
# ● ├─containerd.service
# ● ├─docker.socket
# ● ├─system.slice
# ● ├─basic.target
# ● │ ├─microcode.service
# ● │ ├─rhel-dmesg.service
# ● │ ├─selinux-policy-migrate-local-changes@targeted.service
# ● │ ├─paths.target
# ● │ ├─slices.target
# ● │ │ ├─-.slice
# ● │ │ └─system.slice
# ● │ ├─sockets.target
# ● │ │ ├─dbus.socket
# ● │ │ ├─systemd-initctl.socket
# ● │ │ ├─systemd-journald.socket
# ● │ │ ├─systemd-shutdownd.socket
# ● │ │ ├─systemd-udevd-control.socket
# ● │ │ └─systemd-udevd-kernel.socket
# ● │ ├─sysinit.target
# ● │ │ ├─dev-hugepages.mount
# ● │ │ ├─dev-mqueue.mount
# ● │ │ ├─kmod-static-nodes.service
# ● │ │ ├─plymouth-read-write.service
# ● │ │ ├─plymouth-start.service
# ● │ │ ├─proc-sys-fs-binfmt_misc.automount
# ● │ │ ├─rhel-autorelabel-mark.service
# ● │ │ ├─rhel-autorelabel.service
# ● │ │ ├─rhel-domainname.service
# ● │ │ ├─rhel-import-state.service
# ● │ │ ├─rhel-loadmodules.service
# ● │ │ ├─sys-fs-fuse-connections.mount
# ● │ │ ├─sys-kernel-config.mount
# ● │ │ ├─sys-kernel-debug.mount
# ● │ │ ├─systemd-ask-password-console.path
# ● │ │ ├─systemd-binfmt.service
# ● │ │ ├─systemd-firstboot.service
# ● │ │ ├─systemd-hwdb-update.service
# ● │ │ ├─systemd-journal-catalog-update.service
# ● │ │ ├─systemd-journal-flush.service
# ● │ │ ├─systemd-journald.service
# ● │ │ ├─systemd-machine-id-commit.service
# ● │ │ ├─systemd-modules-load.service
# ● │ │ ├─systemd-random-seed.service
# ● │ │ ├─systemd-sysctl.service
# ● │ │ ├─systemd-tmpfiles-setup-dev.service
# ● │ │ ├─systemd-tmpfiles-setup.service
# ● │ │ ├─systemd-udev-trigger.service
# ● │ │ ├─systemd-udevd.service
# ● │ │ ├─systemd-update-done.service
# ● │ │ ├─systemd-update-utmp.service
# ● │ │ ├─systemd-vconsole-setup.service
# ● │ │ ├─cryptsetup.target
# ● │ │ ├─local-fs.target
# ● │ │ │ ├─-.mount
# ● │ │ │ ├─rhel-readonly.service
# ● │ │ │ ├─systemd-fsck-root.service
# ● │ │ │ └─systemd-remount-fs.service
# ● │ │ └─swap.target
# ● │ └─timers.target
# ● │   └─systemd-tmpfiles-clean.timer
# ● └─network-online.target

我们可以很清晰的看到,systemd-sysctl.service服务是在docker服务启动之前,必须要启动的。

解决问题

红帽知识库给出了解决办法,就是使用系统自带的rhel-loadmodules.service服务,这个服务,在systemd-sysctl.service启动之前启动,我们来看看他的内容。

systemctl cat rhel-loadmodules.service
# # /usr/lib/systemd/system/rhel-loadmodules.service
# [Unit]
# Description=Load legacy module configuration
# DefaultDependencies=no
# Conflicts=shutdown.target
# After=systemd-readahead-collect.service systemd-readahead-replay.service
# Before=sysinit.target shutdown.target
# ConditionPathExists=|/etc/rc.modules
# ConditionDirectoryNotEmpty=|/etc/sysconfig/modules/

# [Service]
# ExecStart=/usr/lib/systemd/rhel-loadmodules
# Type=oneshot
# TimeoutSec=0
# RemainAfterExit=yes

# [Install]
# WantedBy=sysinit.target

我们可以看到,rhel-loadmodules.service服务会检查/etc/rc.modules文件是否存在,或者/etc/sysconfig/modules/目录是否非空,满足条件就运行程序/usr/lib/systemd/rhel-loadmodules。那我们看看/usr/lib/systemd/rhel-loadmodules的内容。

cat /usr/lib/systemd/rhel-loadmodules
# #!/bin/bash

# # Load other user-defined modules
# for file in /etc/sysconfig/modules/*.modules ; do
#   [ -x $file ] && $file
# done

# # Load modules (for backward compatibility with VARs)
# if [ -f /etc/rc.modules ]; then
#         /etc/rc.modules
# fi

/usr/lib/systemd/rhel-loadmodules的内容很简单,就是遍历/etc/sysconfig/modules/目录下的所有*.modules文件,如果文件存在,就运行它。

那么我们就有了解决办法,创建对应的module文件,让这些内核模块,在系统启动的早期就加载了,这样之后的systemd-sysctl.service服务就可以正常的去设置参数了。

echo "modprobe nf_conntrack" >> /etc/sysconfig/modules/nf_conntrack.modules && chmod 775 /etc/sysconfig/modules/nf_conntrack.modules

echo "net.netfilter.nf_conntrack_max = 2097152" >> /etc/sysctl.d/99-nf_conntrack.conf

reboot

重启以后,我们检查一下系统状态,如我们所预期的,一切正常。

systemctl status rhel-loadmodules.service
# ● rhel-loadmodules.service - Load legacy module configuration
#    Loaded: loaded (/usr/lib/systemd/system/rhel-loadmodules.service; enabled; vendor preset: enabled)
#    Active: active (exited) since Sun 2022-01-02 08:08:22 UTC; 40s ago
#   Process: 350 ExecStart=/usr/lib/systemd/rhel-loadmodules (code=exited, status=0/SUCCESS)
#  Main PID: 350 (code=exited, status=0/SUCCESS)
#     Tasks: 0
#    Memory: 0B
#    CGroup: /system.slice/rhel-loadmodules.service

# Jan 02 08:08:22 vultr.guest systemd[1]: Started Load legacy module configuration.

sysctl net.netfilter.nf_conntrack_max
# net.netfilter.nf_conntrack_max = 2097152

错误的方法 rc-local.service

面对sysctl参数无法加载的情况,我们第一时间想到的,可能就是rc-local.service服务。这个服务读取/etc/rc.local文件,并执行这个文件中的内容。不过,我们发现写在这里面的sysctl -w 命令依然不起作用。我们猜测,这是由于rc-local.service并不能保证在docker.service之后执行导致的。

那么我们就来确认一下。

systemctl cat rc-local
# # /usr/lib/systemd/system/rc-local.service
# #  This file is part of systemd.
# #
# #  systemd is free software; you can redistribute it and/or modify it
# #  under the terms of the GNU Lesser General Public License as published by
# #  the Free Software Foundation; either version 2.1 of the License, or
# #  (at your option) any later version.

# # This unit gets pulled automatically into multi-user.target by
# # systemd-rc-local-generator if /etc/rc.d/rc.local is executable.
# [Unit]
# Description=/etc/rc.d/rc.local Compatibility
# ConditionFileIsExecutable=/etc/rc.d/rc.local
# After=network.target

# [Service]
# Type=forking
# ExecStart=/etc/rc.d/rc.local start
# TimeoutSec=0
# RemainAfterExit=yes

systemctl list-dependencies rc-local
# rc-local.service
# ● ├─system.slice
# ● └─basic.target
# ●   ├─microcode.service
# ●   ├─rhel-dmesg.service
# ●   ├─selinux-policy-migrate-local-changes@targeted.service
# ●   ├─paths.target
# ●   ├─slices.target
# ●   │ ├─-.slice
# ●   │ └─system.slice
# ●   ├─sockets.target
# ●   │ ├─dbus.socket
# ●   │ ├─systemd-initctl.socket
# ●   │ ├─systemd-journald.socket
# ●   │ ├─systemd-shutdownd.socket
# ●   │ ├─systemd-udevd-control.socket
# ●   │ └─systemd-udevd-kernel.socket
# ●   ├─sysinit.target
# ●   │ ├─dev-hugepages.mount
# ●   │ ├─dev-mqueue.mount
# ●   │ ├─kmod-static-nodes.service
# ●   │ ├─plymouth-read-write.service
# ●   │ ├─plymouth-start.service
# ●   │ ├─proc-sys-fs-binfmt_misc.automount
# ●   │ ├─rhel-autorelabel-mark.service
# ●   │ ├─rhel-autorelabel.service
# ●   │ ├─rhel-domainname.service
# ●   │ ├─rhel-import-state.service
# ●   │ ├─rhel-loadmodules.service
# ●   │ ├─sys-fs-fuse-connections.mount
# ●   │ ├─sys-kernel-config.mount
# ●   │ ├─sys-kernel-debug.mount
# ●   │ ├─systemd-ask-password-console.path
# ●   │ ├─systemd-binfmt.service
# ●   │ ├─systemd-firstboot.service
# ●   │ ├─systemd-hwdb-update.service
# ●   │ ├─systemd-journal-catalog-update.service
# ●   │ ├─systemd-journal-flush.service
# ●   │ ├─systemd-journald.service
# ●   │ ├─systemd-machine-id-commit.service
# ●   │ ├─systemd-modules-load.service
# ●   │ ├─systemd-random-seed.service
# ●   │ ├─systemd-sysctl.service
# ●   │ ├─systemd-tmpfiles-setup-dev.service
# ●   │ ├─systemd-tmpfiles-setup.service
# ●   │ ├─systemd-udev-trigger.service
# ●   │ ├─systemd-udevd.service
# ●   │ ├─systemd-update-done.service
# ●   │ ├─systemd-update-utmp.service
# ●   │ ├─systemd-vconsole-setup.service
# ●   │ ├─cryptsetup.target
# ●   │ ├─local-fs.target
# ●   │ │ ├─-.mount
# ●   │ │ ├─rhel-readonly.service
# ●   │ │ ├─systemd-fsck-root.service
# ●   │ │ └─systemd-remount-fs.service
# ●   │ └─swap.target
# ●   └─timers.target
# ●     └─systemd-tmpfiles-clean.timer

As we can see, rc-local.service has no ordering relationship with the docker service, so rc-local cannot be guaranteed to run after docker.service. When it happens to run first, the relevant kernel module has not been loaded yet, and the sysctl setting cannot be applied.
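
For completeness: if you insisted on the rc-local route, the missing ordering would have to be added by hand, for example with a systemd drop-in like the sketch below. This is only an illustration of the ordering problem, not the approach used in this note; the rhel-loadmodules method above is cleaner.

# NOT used here -- sketch only, to make rc-local run after docker.service
mkdir -p /etc/systemd/system/rc-local.service.d
cat << EOF > /etc/systemd/system/rc-local.service.d/wzh-after-docker.conf
[Unit]
After=docker.service
EOF
systemctl daemon-reload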

reference

  • https://access.redhat.com/solutions/548813
  • https://www.dazhuanlan.com/bygxb/topics/1709928

others


systemctl list-unit-files | grep docker
# docker.service                                enabled
# docker.socket                                 disabled

# https://stackoverflow.com/questions/29309717/is-there-any-way-to-list-systemd-services-in-linux-in-the-order-of-they-were-l#fromHistory
# systemd-analyze plot > startup_order.svg

yum install -y graphviz

systemd-analyze dot | dot -Tsvg > systemd.svg
#    Color legend: black     = Requires
#                  dark blue = Requisite
#                  dark grey = Wants
#                  red       = Conflicts
#                  green     = After


# CONNTRACK_MAX = conntrack hash table size (HASHSIZE) * bucket size
modinfo nf_conntrack
# filename:       /lib/modules/3.10.0-1160.49.1.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz
# license:        GPL
# retpoline:      Y
# rhelversion:    7.9
# srcversion:     358A2186187A7E81339334C
# depends:        libcrc32c
# intree:         Y
# vermagic:       3.10.0-1160.49.1.el7.x86_64 SMP mod_unload modversions
# signer:         CentOS Linux kernel signing key
# sig_key:        77:15:99:7F:C4:81:91:84:C7:45:27:B6:08:4B:C7:F9:BB:15:62:7D
# sig_hashalgo:   sha256
# parm:           tstamp:Enable connection tracking flow timestamping. (bool)
# parm:           acct:Enable connection tracking flow accounting. (bool)
# parm:           nf_conntrack_helper:Enable automatic conntrack helper assignment (default 1) (bool)
# parm:           expect_hashsize:uint

cat /proc/sys/net/nf_conntrack_max
# 65536
cat /proc/sys/net/netfilter/nf_conntrack_max
# 65536

cat /proc/sys/net/netfilter/nf_conntrack_buckets
# 16384

# so the bucket size = 4
echo "`cat /proc/sys/net/netfilter/nf_conntrack_max` / `cat /proc/sys/net/netfilter/nf_conntrack_buckets`" | bc
# 4
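
Side note: raising net.netfilter.nf_conntrack_max on its own leaves the hash table (the buckets above) at its old size, so the average chain per bucket gets longer. nf_conntrack also has a hashsize parameter that can grow the table itself; the numbers below are only a sketch that keeps the bucket size of 4 observed above for nf_conntrack_max = 2097152.

# sketch only: 2097152 / 4 = 524288 buckets, set at module load time
cat << EOF > /etc/modprobe.d/nf_conntrack.conf
options nf_conntrack hashsize=524288
EOF

# or change it on a running system and verify
echo 524288 > /sys/module/nf_conntrack/parameters/hashsize
cat /proc/sys/net/netfilter/nf_conntrack_buckets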

Flash the Mellanox BF2 NIC with the official image and test the DPI URL-filter scenario

This article sets up the URL-filter scenario of the DPI feature on the BF2: when network traffic passes through the BF2, its DPI engine analyzes the packets and blocks them according to the rules.

The rough plan of the experiment: the host runs Rocky Linux, the BF2 card is flashed with the official (Ubuntu-based) firmware image and configured, and then we run some tests from the host.

This article also includes a section on how to flash the official image onto the BF2 card when the host runs Rocky Linux.

install host with rocky 8.5

First, install Rocky Linux 8.5 on the host.

# install rocky 8.5

export VAR_HOST='rl_panlab104'

# After the OS is installed, add kernel parameters, mainly intel_iommu=on iommu=pt, then reboot
cp /etc/default/grub /etc/default/grub.bak
sed -i "/GRUB_CMDLINE_LINUX/s/resume=[^[:space:]]*//"  /etc/default/grub
sed -i "/GRUB_CMDLINE_LINUX/s/rd.lvm.lv=${VAR_HOST}\\/swap//"  /etc/default/grub
# https://unix.stackexchange.com/questions/403706/sed-insert-text-after-nth-character-preceding-following-a-given-string
sed -i '/GRUB_CMDLINE_LINUX/s/"/ intel_iommu=on iommu=pt pci=realloc  default_hugepagesz=1G hugepagesz=1G hugepages=16 rdblacklist=nouveau"/2' /etc/default/grub

grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg

grub2-mkconfig -o /boot/grub2/grub.cfg

# add support for kvm cpu host mode; optional
cat << EOF > /etc/modprobe.d/kvm-nested.conf
options kvm_intel nested=1  
options kvm-intel enable_shadow_vmcs=1   
options kvm-intel enable_apicv=1         
options kvm-intel ept=1                  
EOF

# The default OS install has swap and home partitions; this is a test box, so remove them all.
umount /home
swapoff  /dev/$VAR_HOST/swap

cp /etc/fstab /etc/fstab.bak
sed -i 's/^[^#]*home/#&/' /etc/fstab
sed -i 's/^[^#]*swap/#&/' /etc/fstab

lvremove -f /dev/$VAR_HOST/home
lvremove -f /dev/$VAR_HOST/swap

lvextend -l +100%FREE /dev/$VAR_HOST/root
xfs_growfs /dev/$VAR_HOST/root

# on 104
# first, is console
# https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed

dnf install -y epel-release
dnf install -y byobu htop
dnf groupinstall -y 'Development Tools'
dnf groupinstall -y "Server with GUI"
dnf config-manager --set-enabled powertools

# https://bugzilla.redhat.com/show_bug.cgi?id=1814682
dnf install -y kernel-modules-extra psmisc

mkdir -p /data/down/
cd /data/down/

# Next install a few BF2-specific packages to activate the serial console device that the BF2 exposes to the host.
# https://docs.nvidia.com/doca/sdk/installation-guide/index.html
# wget https://developer.nvidia.com/networking/secure/doca-sdk/doca_1.2.0/doca_120_b215/rshim-2.0.6-3.ge329c69.el7.centos.x86_64.rpm
yum install -y rshim*.rpm

dnf install -y rshim expect wget minicom rpm-build lshw
systemctl enable --now rshim
systemctl status rshim --no-pager -l
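
# optional check (paths assumed): once rshim is up, the card's channels appear under /dev/rshim0/,
# including the boot and console entries used by bfb-install and minicom later on
ls /dev/rshim0/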
dnf install -y openssl-devel

export http_proxy="http://192.168.195.54:5085"
export https_proxy=${http_proxy}

git clone https://github.com/Mellanox/mstflint
cd mstflint
./autogen.sh
./configure --disable-inband
make && make install

# Next, configure the host as a NAT router so the OS on the BF2 can also reach the internet.
# nat router on host
# https://access.redhat.com/discussions/4642721
cat << EOF >> /etc/sysctl.d/99-wzh-sysctl.conf

net.ipv4.ip_forward = 1

EOF
sysctl --system

systemctl disable --now firewalld

# on host
cat << EOF >> /etc/rc.d/rc.local

iptables -t nat -A POSTROUTING -o eno2 -j MASQUERADE

EOF
chmod +x /etc/rc.d/rc.local
systemctl enable --now rc-local

flash bf2 with official image

if you want to flash the bf2 with the official doca ubuntu image, follow the steps here.

# on host
mkdir -p /data/soft
cd /data/soft

cat << EOF > pwd
panpan
EOF

cat << EOF > bf.cfg
ubuntu_PASSWORD='`openssl passwd -1 -in pwd`'
EOF

dnf install -y pv

# https://docs.nvidia.com/doca/sdk/installation-guide/index.html
bfb-install --bfb /data/down/DOCA_v1.2.0_BlueField_OS_Ubuntu_20.04-5.4.0-1022-bluefield-5.5-1.0.3.2-3.8.0.11969-1.signed-aarch64.bfb --config bf.cfg --rshim rshim0

# console=hvc0 console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc quiet

# on host
# set ip address to connect to bf2
# nmcli conn add type tun mode tap con-name tmfifo_net0 ifname tmfifo_net0 autoconnect yes ip4 192.168.100.1
nmcli conn modify tmfifo_net0 ipv4.address 192.168.100.1/30
nmcli conn up tmfifo_net0
# if you want to connect to bf2 through serial console
minicom --color on --baudrate 115200 --device /dev/rshim0/console

# on bf2
# login using ubuntu / panpan

sudo -i
passwd

sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart sshd

# set ip address to connect from host
cat << EOF > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
network: {config: disabled}
EOF
cat << EOF > /etc/netplan/50-netcfg-wzh.yaml
network:
    ethernets:
        oob_net0:
            dhcp4: true
        tmfifo_net0:
            addresses:
            - 192.168.100.2/30
            dhcp4: false
            nameservers:
                addresses:
                - 172.21.1.1
            routes:
            -   metric: 1025
                to: 0.0.0.0/0
                via: 192.168.100.1
    renderer: NetworkManager
    version: 2
EOF
netplan apply
/etc/init.d/networking restart

# on host
# from now on we can comfortably ssh from the host into the BF2 card
ssh root@192.168.100.2

dpi url-filter test

https://docs.nvidia.com/doca/sdk/url-filter/index.html

We follow the official documentation to test the DPI URL-Filter.

# on bf2
cd /opt/mellanox/doca/examples/url_filter/bin

echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
systemctl restart mlx-regex
systemctl status mlx-regex
# ● mlx-regex.service - Regex daemon for BlueField 2
#      Loaded: loaded (/etc/systemd/system/mlx-regex.service; enabled; vendor preset: enabled)
#      Active: active (running) since Thu 2021-12-16 11:47:01 UTC; 7s ago
#    Main PID: 55816 (mlx-regex)
#       Tasks: 1 (limit: 19083)
#      Memory: 564.0K
#      CGroup: /system.slice/mlx-regex.service
#              └─55816 /usr/bin/mlx-regex

# Dec 16 11:47:01 localhost systemd[1]: Started Regex daemon for BlueField 2.

/opt/mellanox/iproute2/sbin/mlxdevm port show
# pci/0000:03:00.0/294912: type eth netdev en3f0pf0sf0 flavour pcisf controller 0 pfnum 0 sfnum 0
#   function:
#     hw_addr 02:56:ae:76:cd:e9 state active opstate attached roce true max_uc_macs 128 trust off
# pci/0000:03:00.1/360448: type eth netdev en3f1pf1sf0 flavour pcisf controller 0 pfnum 1 sfnum 0
#   function:
#     hw_addr 02:26:61:34:13:9e state active opstate attached roce true max_uc_macs 128 trust off

/opt/mellanox/iproute2/sbin/mlxdevm port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 4
/opt/mellanox/iproute2/sbin/mlxdevm port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 5 

/opt/mellanox/iproute2/sbin/mlxdevm port show
# pci/0000:03:00.0/294912: type eth netdev en3f0pf0sf0 flavour pcisf controller 0 pfnum 0 sfnum 0
#   function:
#     hw_addr 02:56:ae:76:cd:e9 state active opstate attached roce true max_uc_macs 128 trust off
# pci/0000:03:00.0/294913: type eth netdev en3f0pf0sf4 flavour pcisf controller 0 pfnum 0 sfnum 4
#   function:
#     hw_addr 00:00:00:00:00:00 state inactive opstate detached roce true max_uc_macs 128 trust off
# pci/0000:03:00.0/294914: type eth netdev en3f0pf0sf5 flavour pcisf controller 0 pfnum 0 sfnum 5
#   function:
#     hw_addr 00:00:00:00:00:00 state inactive opstate detached roce true max_uc_macs 128 trust off
# pci/0000:03:00.1/360448: type eth netdev en3f1pf1sf0 flavour pcisf controller 0 pfnum 1 sfnum 0
#   function:
#     hw_addr 02:26:61:34:13:9e state active opstate attached roce true max_uc_macs 128 trust off

/opt/mellanox/iproute2/sbin/mlxdevm port function set pci/0000:03:00.0/294913 hw_addr 02:25:f2:8d:a2:4c trust on state active
/opt/mellanox/iproute2/sbin/mlxdevm port function set pci/0000:03:00.0/294914 hw_addr 02:25:f2:8d:a2:5c trust on state active

ovs-vsctl del-br ovsbr1 

ovs-vsctl add-br sf_bridge1
ovs-vsctl add-br sf_bridge2
ovs-vsctl add-port sf_bridge1 p0
ovs-vsctl add-port sf_bridge1 en3f0pf0sf4
ovs-vsctl add-port sf_bridge2 pf0hpf
ovs-vsctl add-port sf_bridge2 en3f0pf0sf5 

ovs-vsctl show
# 04d25b73-2f63-4e47-b7d9-2362cc4d7fda
#     Bridge ovsbr2
#         Port p1
#             Interface p1
#         Port en3f1pf1sf0
#             Interface en3f1pf1sf0
#         Port ovsbr2
#             Interface ovsbr2
#                 type: internal
#         Port pf1hpf
#             Interface pf1hpf
#     Bridge sf_bridge2
#         Port sf_bridge2
#             Interface sf_bridge2
#                 type: internal
#         Port en3f0pf0sf5
#             Interface en3f0pf0sf5
#         Port pf0hpf
#             Interface pf0hpf
#     Bridge sf_bridge1
#         Port sf_bridge1
#             Interface sf_bridge1
#                 type: internal
#         Port en3f0pf0sf4
#             Interface en3f0pf0sf4
#         Port p0
#             Interface p0
#     ovs_version: "2.15.1"

ifconfig en3f0pf0sf4 up
ifconfig en3f0pf0sf5 up

echo mlx5_core.sf.4  > /sys/bus/auxiliary/drivers/mlx5_core.sf_cfg/unbind
echo mlx5_core.sf.4  > /sys/bus/auxiliary/drivers/mlx5_core.sf/bind
echo mlx5_core.sf.5  > /sys/bus/auxiliary/drivers/mlx5_core.sf_cfg/unbind
echo mlx5_core.sf.5  > /sys/bus/auxiliary/drivers/mlx5_core.sf/bind

ls /sys/bus/auxiliary/devices/mlx5_core.sf.*
# /sys/bus/auxiliary/devices/mlx5_core.sf.2:
# driver  infiniband  infiniband_mad  infiniband_verbs  mlx5_core.eth.2  mlx5_core.rdma.2  net  power  sfnum  subsystem  uevent

# /sys/bus/auxiliary/devices/mlx5_core.sf.3:
# driver  infiniband  infiniband_mad  infiniband_verbs  mlx5_core.eth.3  mlx5_core.rdma.3  net  power  sfnum  subsystem  uevent

# /sys/bus/auxiliary/devices/mlx5_core.sf.4:
# driver  infiniband  infiniband_mad  infiniband_verbs  mlx5_core.eth.4  mlx5_core.rdma.4  net  power  sfnum  subsystem  uevent

# /sys/bus/auxiliary/devices/mlx5_core.sf.5:
# driver  infiniband  infiniband_mad  infiniband_verbs  mlx5_core.eth.5  mlx5_core.rdma.5  net  power  sfnum  subsystem  uevent

cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
# 4

# on 104 host with bf2
# nmcli con modify enp6s0f1 ipv4.method manual ipv4.addresses 192.168.99.11/24
nmcli con down enp6s0f1
nmcli con modify enp6s0f0 ipv4.method manual ipv4.addresses 192.168.99.11/24
nmcli con up enp6s0f0

# on 104 bf2
# create the url filter rules
/opt/mellanox/doca/examples/url_filter/bin/doca_url_filter -a 0000:03:00.0,class=regex -a auxiliary:mlx5_core.sf.4,sft_en=1 -a auxiliary:mlx5_core.sf.5,sft_en=1 -- -p
URL FILTER>> create database
URL FILTER>> filter http wzh_hits_msg wzhtest
URL FILTER>> commit database /tmp/signature.txt
# /tmp/104052/signatures.rules
# rules file is /tmp/104052/signatures.rules
# Info: Setting target hardware version to v5.7...done
# Info: Setting virtual prefix mode to 0...done
# Info: Setting prefix capacity to 32K...done
# Info: Setting compiler objective value to 5...done
# Info: Setting number of threads for compilation to 1...done
# Info: Reading ruleset...done
# Info: Detected 2 rules
# Info: Enabling global single-line mode...done
# Info: Setting maximum TPE data width to 4...done
# Info: Scanning rules...[==============================]...done
# Info: Analising possible prefix usage...[==============================]...done
# Info: Mapping prefixes, phase 1...[==============================]...done
# Info: Mapping prefixes, phase 2...[==============================]...done
# Info: Running rules analysis...[==============================]...done
# Info: Optimizing memory map...[==============================]...done
# Info: Analyzing memory map...[==============================]...done
# Info: Calculating thread instructions...[==============================]...done
# Info: Beginning to write memory map for ROF2...done
# Info: PPE total 1-byte prefix usage: 0/256 (0%)
# Info: PPE total 2-byte prefix usage: 0/2048 (0%)
# Info: PPE total 3-byte prefix usage: 0/2048 (0%)
# Info: PPE total 4-byte prefix usage: 1/32768 (0.00305176%)
# Info: TPE instruction RAM TCM partition usage: 2048/2048 (100%)
# Info: TPE instruction RAM external memory partition usage: 6207/13M (0.0455343%)
# Info: TPE class RAM usage: 1/256 (0.390625%)
# Info: Estimated threads/byte: 5.183e-10
# Info: Finalizing memory map for ROF2...done
# Info: Storing ROF2 data...done
# Info: Number of rules compiled = 2/2
# Info: Writing ROF2 file to /tmp/104052/rof/signatures_compiled.rof2
# Info: Writing binary ROF2 file to /tmp/104052/rof/signatures_compiled.rof2.binary...done
URL FILTER>> [12:36:50:606702][DOCA][I][UFLTR::Core]: SIG ID: 1, URL MSG: wzh_hits_msg, SFT_FID: 1

# on 101
curl http://192.168.99.11
# ....
#       <footer class="col-sm-12">
#       <a href="https://apache.org">Apache&trade;</a> is a registered trademark of <a href="https://apache.org">the Apache Software Foundation</a> in the United States and/or other countries.<br />
#       <a href="https://nginx.org">NGINX&trade;</a> is a registered trademark of <a href="https://">F5 Networks, Inc.</a>.
#       </footer>

#   </body>
# </html>

curl http://192.168.99.11/test
# <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
# <html><head>
# <title>404 Not Found</title>
# </head><body>
# <h1>Not Found</h1>
# <p>The requested URL was not found on this server.</p>
# </body></html>

# The URL below matches the rule, so the request fails.
# URLs that do not match any rule can still reach the http service.
curl http://192.168.99.11/wzhtest
# curl: (56) Recv failure: Connection timed out

performance test

A quick performance test. Because of the limitations of the physical setup, the numbers are not accurate.

# on 104 host
dnf install -y iperf3
iperf3 -s -p 6666

# on 101 host
iperf3 -c 192.168.99.11 -p 6666
# Connecting to host 192.168.99.11, port 6666
# [  5] local 192.168.99.21 port 37060 connected to 192.168.99.11 port 6666
# [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
# [  5]   0.00-1.00   sec  1.40 GBytes  12.1 Gbits/sec   17    905 KBytes
# [  5]   1.00-2.00   sec  1.46 GBytes  12.6 Gbits/sec   26    795 KBytes
# [  5]   2.00-3.00   sec  1.41 GBytes  12.1 Gbits/sec   71    922 KBytes
# [  5]   3.00-4.00   sec  1.49 GBytes  12.8 Gbits/sec    0    998 KBytes
# [  5]   4.00-5.00   sec  1.44 GBytes  12.4 Gbits/sec   44   1010 KBytes
# [  5]   5.00-6.00   sec  1.34 GBytes  11.5 Gbits/sec  101    796 KBytes
# [  5]   6.00-7.00   sec  1.45 GBytes  12.5 Gbits/sec    9    925 KBytes
# [  5]   7.00-8.00   sec  1.39 GBytes  11.9 Gbits/sec    0   1014 KBytes
# [  5]   8.00-9.00   sec  1.45 GBytes  12.4 Gbits/sec   62    930 KBytes
# [  5]   9.00-10.00  sec  1.44 GBytes  12.3 Gbits/sec  157   1.07 MBytes
# - - - - - - - - - - - - - - - - - - - - - - - - -
# [ ID] Interval           Transfer     Bitrate         Retr
# [  5]   0.00-10.00  sec  14.3 GBytes  12.3 Gbits/sec  487             sender
# [  5]   0.00-10.04  sec  14.3 GBytes  12.2 Gbits/sec                  receiver

# iperf Done.

ethtool enp5s0f1
# Settings for enp5s0f1:
#         Supported ports: [ Backplane ]
#         Supported link modes:   1000baseKX/Full
#                                 10000baseKR/Full
#                                 25000baseCR/Full
#                                 25000baseKR/Full
#                                 25000baseSR/Full
#         Supported pause frame use: Symmetric
#         Supports auto-negotiation: Yes
#         Supported FEC modes: None        RS      BASER
#         Advertised link modes:  1000baseKX/Full
#                                 10000baseKR/Full
#                                 25000baseCR/Full
#                                 25000baseKR/Full
#                                 25000baseSR/Full
#         Advertised pause frame use: Symmetric
#         Advertised auto-negotiation: Yes
#         Advertised FEC modes: None       RS      BASER
#         Link partner advertised link modes:  Not reported
#         Link partner advertised pause frame use: No
#         Link partner advertised auto-negotiation: Yes
#         Link partner advertised FEC modes: Not reported
#         Speed: 25000Mb/s
#         Duplex: Full
#         Auto-negotiation: on
#         Port: Direct Attach Copper
#         PHYAD: 0
#         Transceiver: internal
#         Supports Wake-on: d
#         Wake-on: d
#         Current message level: 0x00000004 (4)
#                                link
#         Link detected: yes

others


# firewall-cmd --permanent --direct --add-rule ipv4 nat POSTROUTING 0 -o eth_ext -j MASQUERADE
# firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i eth_int -o eth_ext -j ACCEPT
# firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i eth_ext -o eth_int -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --permanent --add-port=80/tcp
# firewall-cmd --permanent --add-port=443/tcp
# firewall-cmd --permanent --add-port=53/tcp
# firewall-cmd --permanent --add-port=53/udp
# firewall-cmd --permanent --add-masquerade
# firewall-cmd --reload

# firewall-cmd --permanent --direct --remove-rule ipv4 nat POSTROUTING 0 -o eth_ext -j MASQUERADE
# firewall-cmd --permanent --direct --remove-rule ipv4 filter FORWARD 0 -i eth_int -o eth_ext -j ACCEPT
# firewall-cmd --permanent --direct --remove-rule ipv4 filter FORWARD 0 -i eth_ext -o eth_int -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --permanent --remove-port=80/tcp
# firewall-cmd --permanent --remove-port=443/tcp
# firewall-cmd --permanent --remove-port=53/tcp
# firewall-cmd --permanent --remove-port=53/udp
# firewall-cmd --permanent --remove-masquerade
# firewall-cmd --reload

Enable SNAP on the Mellanox BF2 NIC and configure NVMe over Fabrics support

This article describes how to use the Mellanox BF2 NIC with SNAP to expose a remote NVMe device to the host, achieving NVMe over Fabrics.

The lab environment is two physical machines, both running Rocky Linux 8.5 as the host OS; one machine has an NVMe device, the other has a BF2 card.

The lab architecture is as follows:

Setting up the lab environment

First, flash the BF2 with the official Mellanox DOCA bfb image (roughly speaking, flashing the NIC firmware); follow the documentation referenced here.

Next, configure NVMe over Fabrics on the host that owns the NVMe device.


# on 101
# config nvme storage server side
# https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/8/html/managing_storage_devices/overview-of-nvme-over-fabric-devicesmanaging-storage-devices
nmcli con modify enp5s0f1 ipv4.method manual ipv4.addresses 192.168.99.21/24
nmcli con up enp5s0f1

yum install -y nvmetcli

cd /data/down/
# wget http://git.infradead.org/users/hch/nvmetcli.git/blob_plain/0a6b088db2dc2e5de11e6f23f1e890e4b54fee64:/rdma.json
cat << EOF > /data/down/rdma.json
{
  "hosts": [
    {
      "nqn": "hostnqn"
    }
  ],
  "ports": [
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "192.168.99.21",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 2,
      "referrals": [],
      "subsystems": [
        "testnqn"
      ]
    }
  ],
  "subsystems": [
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "ef90689c-6c46-d44c-89c1-4067801309a8",
            "path": "/dev/nvme0n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "testnqn"
    }
  ]
}
EOF
modprobe nvmet-rdma
nvmetcli restore /data/down/rdma.json

dmesg
# ........
# [32664.912901] nvmet: adding nsid 1 to subsystem testnqn
# [32664.914013] nvmet_rdma: enabling port 2 (192.168.99.21:4420)

# to clear the config
nvmetcli clear

nvme list
# Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev
# --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
# /dev/nvme0n1          CVCQ726600A0400AGN   INTEL SSDPEDMW400G4                      1         400.09  GB / 400.09  GB    512   B +  0 B   8EV10171

# quick test
yum install nvme-cli
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.99.21 -s 4420
# Discovery Log Number of Records 1, Generation counter 2
# =====Discovery Log Entry 0======
# trtype:  rdma
# adrfam:  ipv4
# subtype: nvme subsystem
# treq:    not specified, sq flow control disable supported
# portid:  2
# trsvcid: 4420
# subnqn:  testnqn
# traddr:  192.168.99.21
# rdma_prtype: not specified
# rdma_qptype: connected
# rdma_cms:    rdma-cm
# rdma_pkey: 0x0000
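
As an optional sanity check, any RDMA-capable initiator can attach to the exported subsystem directly with plain nvme-cli commands; this is only a sketch and not part of the BF2 flow below.

# connect, look at the new block device, then disconnect
nvme connect -t rdma -a 192.168.99.21 -s 4420 -n testnqn
nvme list
nvme disconnect -n testnqn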

Next, we do some configuration on the BF2.

# on 104 bf2
# check the basic information
ovs-vsctl show
# 04d25b73-2f63-4e47-b7d9-2362cc4d7fda
#     Bridge ovsbr2
#         Port p1
#             Interface p1
#         Port en3f1pf1sf0
#             Interface en3f1pf1sf0
#         Port ovsbr2
#             Interface ovsbr2
#                 type: internal
#         Port pf1hpf
#             Interface pf1hpf
#     Bridge ovsbr1
#         Port en3f0pf0sf0
#             Interface en3f0pf0sf0
#         Port pf0hpf
#             Interface pf0hpf
#         Port p0
#             Interface p0
#         Port ovsbr1
#             Interface ovsbr1
#                 type: internal
#     ovs_version: "2.15.1"

# nmcli con modify enp3s0f0s0 ipv4.method manual ipv4.addresses 192.168.99.11/24
# nmcli con up enp3s0f0s0

# ip addr add 192.168.99.11/24 dev enp3s0f0s0 
# ip addr del 192.168.99.11/24 dev enp3s0f0s0 

# give one SF port an IP address so the BF2 can reach the remote nvme service
cat << EOF > /etc/netplan/70-wzh-mlnx.yaml
network:
    ethernets:
        enp3s0f0s0:
            addresses:
            - 192.168.99.11/24
            dhcp4: false
    renderer: NetworkManager
    version: 2

EOF

# configure BF2 parameters
mlxconfig -y -d /dev/mst/mt41686_pciconf0 s \
                PF_BAR2_ENABLE=0 \
                PER_PF_NUM_SF=1
mlxconfig -y -d /dev/mst/mt41686_pciconf0 s \
                PCI_SWITCH_EMULATION_ENABLE=1 \
                PCI_SWITCH_EMULATION_NUM_PORT=16 \
                VIRTIO_NET_EMULATION_ENABLE=1 \
                VIRTIO_NET_EMULATION_NUM_VF=0 \
                VIRTIO_NET_EMULATION_NUM_PF=0 \
                VIRTIO_NET_EMULATION_NUM_MSIX=16 \
                ECPF_ESWITCH_MANAGER=1 \
                ECPF_PAGE_SUPPLIER=1 \
                SRIOV_EN=0 \
                PF_SF_BAR_SIZE=8 \
                PF_TOTAL_SF=64
mlxconfig -y -d /dev/mst/mt41686_pciconf0.1 s \
                PF_SF_BAR_SIZE=10 \
                PF_TOTAL_SF=64
mlxconfig -y -d /dev/mst/mt41686_pciconf0 s \
                VIRTIO_BLK_EMULATION_ENABLE=1 \
                VIRTIO_BLK_EMULATION_NUM_PF=0 \
                VIRTIO_BLK_EMULATION_NUM_VF=0 \
                VIRTIO_BLK_EMULATION_NUM_MSIX=16 \
                EXP_ROM_VIRTIO_BLK_UEFI_x86_ENABLE=0

# clear the original snap configuration
# the system creates a demo nvme device by default; to keep the experiment clean, wipe the default config
/bin/cp -f /etc/mlnx_snap/snap_rpc_init_bf2.conf /etc/mlnx_snap/snap_rpc_init_bf2.conf.wzh
/bin/cp -f /etc/mlnx_snap/spdk_rpc_init.conf /etc/mlnx_snap/spdk_rpc_init.conf.wzh

echo "" > /etc/mlnx_snap/snap_rpc_init_bf2.conf
echo "" > /etc/mlnx_snap/spdk_rpc_init.conf

# remember to COLD reboot
poweroff

# on bf2
# after the reboot, set up the snap service manually, step by step, to better understand spdk and snap
# set the snap step by step
snap_rpc.py subsystem_nvme_create Mellanox_NVMe_SNAP "Mellanox NVMe SNAP Controller"
# {
#   "nqn": "nqn.2021-06.mlnx.snap:8b82f658f138ceaf83e3bfc261a7fb14:0",
#   "subsys_id": 0
# }

snap_rpc.py controller_nvme_create mlx5_0 --subsys_id 0 --pf_id 0
# {
#   "name": "NvmeEmu0pf0",
#   "cntlid": 0,
#   "version": "1.3.0",
#   "offload": false,
#   "mempool": false,
#   "max_nsid": 1024,
#   "max_namespaces": 1024
# }

spdk_rpc.py bdev_nvme_attach_controller -b Nvme0 -t rdma -a 192.168.99.21 -f ipv4 -s 4420 -n testnqn
# Nvme0n1

snap_rpc.py controller_nvme_namespace_attach -c NvmeEmu0pf0 spdk Nvme0n1 1

snap_rpc.py emulation_device_attach --num_msix 8 mlx5_0 virtio_blk 
# {
#   "emulation_manager": "mlx5_0",
#   "emulation_type": "virtio_blk",
#   "pci_type": "physical function",
#   "pci_index": 0
# }

snap_rpc.py controller_virtio_blk_create mlx5_0 --bdev_type spdk --bdev Nvme0n1 --pf_id 0 --num_queues 7
# VblkEmu0pf0

# everything is configured, now check the status
# check status 
snap_rpc.py controller_nvme_namespace_list -n nqn.2021-06.mlnx.snap:8b82f658f138ceaf83e3bfc261a7fb14:0 -i 0
# {
#   "name": "NvmeEmu0pf0",
#   "cntlid": 0,
#   "Namespaces": [
#     {
#       "nsid": 1,
#       "bdev": "Nvme0n1",
#       "bdev_type": "spdk",
#       "qn": "",
#       "protocol": "",
#       "snap-direct": true
#     }
#   ]
# }

snap_rpc.py emulation_managers_list
# [
#   {
#     "emulation_manager": "mlx5_0",
#     "hotplug_support": true,
#     "supported_types": [
#       "nvme",
#       "virtio_blk",
#       "virtio_net"
#     ]
#   }
# ]

spdk_rpc.py bdev_nvme_get_controllers
# [
#   {
#     "name": "Nvme0",
#     "trid": {
#       "trtype": "RDMA",
#       "adrfam": "IPv4",
#       "traddr": "192.168.99.21",
#       "trsvcid": "4420",
#       "subnqn": "testnqn"
#     }
#   }
# ]

snap_rpc.py controller_list
# [
#   {
#     "mempool": false,
#     "name": "VblkEmu0pf0",
#     "emulation_manager": "mlx5_0",
#     "type": "virtio_blk",
#     "pci_index": 0,
#     "pci_bdf": "07:00.0"
#   },
#   {
#     "subnqn": "nqn.2021-06.mlnx.snap:8b82f658f138ceaf83e3bfc261a7fb14:0",
#     "cntlid": 0,
#     "version": "1.3.0",
#     "offload": false,
#     "mempool": false,
#     "max_nsid": 1024,
#     "max_namespaces": 1024,
#     "name": "NvmeEmu0pf0",
#     "emulation_manager": "mlx5_0",
#     "type": "nvme",
#     "pci_index": 0,
#     "pci_bdf": "06:00.2"
#   }
# ]

Test

# on 101, rocky linux
lsblk
# NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sda                  8:0    0 278.9G  0 disk
# ├─sda1               8:1    0     1G  0 part /boot
# └─sda2               8:2    0 277.9G  0 part
#   └─rl_lab101-root 253:0    0 277.9G  0 lvm  /
# sr0                 11:0    1  1024M  0 rom
# nvme0n1            259:0    0 372.6G  0 disk
# └─nvme-data        253:1    0 372.6G  0 lvm

# on 104 host, rocky linux
# before snap setting
lsblk
# NAME                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sda                     8:0    0 278.9G  0 disk
# ├─sda1                  8:1    0   600M  0 part /boot/efi
# ├─sda2                  8:2    0     1G  0 part /boot
# └─sda3                  8:3    0 277.3G  0 part
#   └─rl_panlab104-root 253:0    0 277.3G  0 lvm  /

# after snap setting
lsblk
# NAME                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
# sda                     8:0    0 278.9G  0 disk
# ├─sda1                  8:1    0   600M  0 part /boot/efi
# ├─sda2                  8:2    0     1G  0 part /boot
# └─sda3                  8:3    0 277.3G  0 part
#   └─rl_panlab104-root 253:0    0 277.3G  0 lvm  /
# vda                   252:0    0 372.6G  0 disk
# └─nvme-data           253:1    0 372.6G  0 lvm

mount /dev/mapper/nvme-data /mnt
ls /mnt
# bgp-router.qcow2  ocp4-master-0.qcow2  ocp4-windows.qcow2

Persisting the configuration

The configuration above was experimental and done step by step by hand; to make it persistent, do the following.

# on 104, bf2
cat << EOF > snap_rpc_init_bf2.conf
subsystem_nvme_create Mellanox_NVMe_SNAP "Mellanox NVMe SNAP Controller"
controller_nvme_create mlx5_0 --subsys_id 0 --pf_id 0
controller_nvme_namespace_attach -c NvmeEmu0pf0 spdk Nvme0n1 1
emulation_device_attach --num_msix 8 mlx5_0 virtio_blk 
controller_virtio_blk_create mlx5_0 --bdev_type spdk --bdev Nvme0n1 --pf_id 0 --num_queues 7

EOF

cat << EOF > spdk_rpc_init.conf
bdev_nvme_attach_controller -b Nvme0 -t rdma -a 192.168.99.21 -f ipv4 -s 4420 -n testnqn
EOF

# cold reboot
poweroff

Mellanox CX6 vDPA hardware offload, ovs-kernel mode

This article explains how to implement vDPA hardware offload using a Mellanox CX6 Dx NIC.

Video walkthrough:

Introduction to vDPA hardware offload

Since we are talking about vDPA offload, let's first briefly explain what it is.

vDPA (virtio data path acceleration) is a kernel framework formally merged into the kernel in 2020. NIC vendors build vDPA NICs, meaning the data path follows the virtio specification while the control plane is provided by the vendor driver.

This is the architecture of vDPA deployed on a virtualization platform:

And this is the architecture of vDPA deployed on a k8s platform:

The diagrams above are borrowed from Red Hat articles on the background of vDPA. Our experiment follows the Mellanox documentation; from Mellanox's point of view, there are two ways to do vDPA:

  1. ovs-dpdk: OVS is configured with a vdpa port and creates the socket; the VM attaches the vdpa device through the socket.
  2. ovs-kernel: start the vdpa-dpdk application, which creates the socket; the VM attaches the vdpa device through the socket.

For the first method, the Mellanox documentation says its ovs-dpdk is only supported up to rhel/centos 7, while our environment is rhel/rocky 8.4, so we use the second method.

The background here is deliberately brief; the reference links below go deeper:

One DPDK-specific concept is the vf representor; as the DPDK documentation describes, it can roughly be understood as a shadow of the VF created for the control plane.

  • https://doc.dpdk.org/guides-18.11/prog_guide/switch_representation.html
   .-------------.                 .-------------. .-------------.
   | hypervisor  |                 |    VM 1     | |    VM 2     |
   | application |                 | application | | application |
   `--+---+---+--'                 `----------+--' `--+----------'
      |   |   |                               |       |
      |   |   `-------------------.           |       |
      |   `---------.             |           |       |
      |             |             |           |       |
.-----+-----. .-----+-----. .-----+-----.     |       |
| port_id 3 | | port_id 4 | | port_id 5 |     |       |
`-----+-----' `-----+-----' `-----+-----'     |       |
      |             |             |           |       |
    .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
    | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
    `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
      |             |             |           |       |
      |             |   .---------'           |       |
      `-----.       |   |   .-----------------'       |
            |       |   |   |   .---------------------'
            |       |   |   |   |
         .--+-------+---+---+---+--.
         | managed interconnection |
         `------------+------------'
                      |
                 .----+-----.
                 | physical |
                 |  port 0  |
                 `----------'

The architecture of this experiment is as follows:

OS installation


export VAR_HOST='rl_panlab105'

# After the OS is installed, add kernel parameters, mainly intel_iommu=on iommu=pt, then reboot
cp /etc/default/grub /etc/default/grub.bak
sed -i "/GRUB_CMDLINE_LINUX/s/resume=[^[:space:]]*//"  /etc/default/grub
sed -i "/GRUB_CMDLINE_LINUX/s/rd.lvm.lv=${VAR_HOST}\\/swap//"  /etc/default/grub
# https://unix.stackexchange.com/questions/403706/sed-insert-text-after-nth-character-preceding-following-a-given-string
sed -i '/GRUB_CMDLINE_LINUX/s/"/ intel_iommu=on iommu=pt  default_hugepagesz=1G hugepagesz=1G hugepages=16 rdblacklist=nouveau"/2' /etc/default/grub

grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg

grub2-mkconfig -o /boot/grub2/grub.cfg

# add support for kvm cpu host mode; optional
cat << EOF > /etc/modprobe.d/kvm-nested.conf
options kvm_intel nested=1  
options kvm-intel enable_shadow_vmcs=1   
options kvm-intel enable_apicv=1         
options kvm-intel ept=1                  
EOF

# The default OS install has swap and home partitions; this is a test box, so remove them all.
umount /home
swapoff  /dev/$VAR_HOST/swap

cp /etc/fstab /etc/fstab.bak
sed -i 's/^[^#]*home/#&/' /etc/fstab
sed -i 's/^[^#]*swap/#&/' /etc/fstab

lvremove -f /dev/$VAR_HOST/home
lvremove -f /dev/$VAR_HOST/swap

lvextend -l +100%FREE /dev/$VAR_HOST/root
xfs_growfs /dev/$VAR_HOST/root

# now install the NIC driver
# 103 driver install
# https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
mkdir -p /data/down/
cd /data/down/
dnf groupinstall -y 'Development Tools'
dnf groupinstall -y "Server with GUI"

wget https://www.mellanox.com/downloads/ofed/MLNX_OFED-5.4-3.0.3.0/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64.tgz
tar zvxf *.tgz
cd /data/down/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64
dnf install -y tcl tk kernel-modules-extra python36 make gcc-gfortran tcsh unbound
./mlnxofedinstall --all --force --distro rhel8.4
# ./mlnxofedinstall --dpdk --ovs-dpdk --upstream-libs --add-kernel-support --force --distro rhel8.4

reboot

systemctl enable --now mst
systemctl enable --now openibd

cat << EOF > /etc/yum.repos.d/mlx.repo
[mlnx_ofed]
name=MLNX_OFED Repository
baseurl=file:///data/down/MLNX_OFED_LINUX-5.4-3.0.3.0-rhel8.4-x86_64/RPMS
enabled=1
gpgcheck=0
EOF

dnf makecache 

# install the dpdk-related software
mkdir -p /data/soft
cd /data/soft

dnf config-manager --set-enabled powertools
dnf install -y ninja-build meson

# install the mlnx build of the dpdk components and the ovs packages
# dnf group list
# dnf groupinstall -y 'Development Tools'
# install dpdk
dnf install -y mlnx-dpdk mlnx-dpdk-devel numactl-devel openvswitch  openvswitch-selinux-policy libnl3-devel openssl-devel zlib-devel libpcap-devel elfutils-libelf-devel 
# https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#compilation-of-the-dpdk
pip3 install --user pyelftools

systemctl enable --now openvswitch

export PATH=$PATH:/opt/mellanox/dpdk/bin/
echo 'export PATH=$PATH:/opt/mellanox/dpdk/bin/' >> ~/.bash_profile

# build the upstream dpdk package, because we need the vdpa sample application inside it
cd /data/soft/
wget https://fast.dpdk.org/rel/dpdk-20.11.3.tar.xz
tar vxf dpdk-20.11.3.tar.xz
# https://core.dpdk.org/doc/quick-start/
cd /data/soft/dpdk-stable-20.11.3/
# meson -Dexamples=all build
meson --reconfigure -Dexamples=all build
ninja -C build

export PKG_CONFIG_PATH=/opt/mellanox/dpdk/lib64/pkgconfig/
cd /data/soft/dpdk-stable-20.11.3/examples/vdpa
make -j 

# install the kvm-related packages
# install kvm with qemu
# dnf -y groupinstall "Server with GUI"

dnf -y install qemu-kvm libvirt libguestfs-tools virt-install virt-viewer virt-manager tigervnc-server

systemctl disable --now firewalld
systemctl enable --now libvirtd

# finally, set the mlx NIC parameters and enable sriov
# get the pci address with lspci -D | grep -i mell or lshw -c network -businfo
lspci -D | grep -i mell
# 0000:04:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
# 0000:04:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

lshw -c network -businfo
# Bus info          Device     Class          Description
# =======================================================
# pci@0000:02:00.0  eno3       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:02:00.1  eno4       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:01:00.0  eno1       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:01:00.1  eno2       network        NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
# pci@0000:04:00.0  enp4s0f0   network        MT2892 Family [ConnectX-6 Dx]
# pci@0000:04:00.1  enp4s0f1   network        MT2892 Family [ConnectX-6 Dx]

# UCTX_EN is for enable DevX
# DevX allows to access firmware objects
mlxconfig -y -d 0000:04:00.0 set SRIOV_EN=1 UCTX_EN=1 NUM_OF_VFS=8

The ovs-kernel approach

NIC setup script

# the default mlx ovs build is missing some selinux rules, add them here
# in a real project, add whatever selinux rules are missing as needed
semodule -i wzh-mellanox-ovs-dpdk.pp

# this script configures and starts ovs: it first wipes the ovs config, then sets the NIC mode, then starts ovs
cat << 'EOF' > /data/ovs-offload-env.sh
#!/usr/bin/env bash

set -e
set -x

systemctl restart openvswitch
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=try
systemctl restart openvswitch

ip link set dev ${IFNAME} down || true
ip link set dev ${IFNAME}_0 down || true
ip link set dev ${IFNAME}_1 down || true

ip link set dev ${IFNAME}v0 down || true
ip link set dev ${IFNAME}v1 down || true

ovs-vsctl del-port ovs-sriov ${IFNAME} || true
ovs-vsctl del-port ovs-sriov ${IFNAME}_0 || true
ovs-vsctl del-port ovs-sriov ${IFNAME}_1 || true
ovs-vsctl del-br ovs-sriov || true

ovs-vsctl del-port br0-ovs pf0vf0 || true
ovs-vsctl del-port br0-ovs pf0vf1 || true
ovs-vsctl del-port br0-ovs pf0 || true
ovs-vsctl del-br br0-ovs || true

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=false
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra=" "
ovs-vsctl --no-wait set Open_vSwitch . other_config={}

# Turn off SR-IOV on the PF device. 
echo 0 > /sys/class/net/$IFNAME/device/sriov_numvfs
cat /sys/class/net/$IFNAME/device/sriov_numvfs
# 0

systemctl restart openvswitch

# Turn ON SR-IOV on the PF device. 
echo 2 > /sys/class/net/$IFNAME/device/sriov_numvfs
cat /sys/class/net/$IFNAME/device/sriov_numvfs
# 2

ip link set $IFNAME vf 0 mac ${VF1MAC}
ip link set $IFNAME vf 1 mac ${VF2MAC}

echo ${PCINUM%%.*}.2 > /sys/bus/pci/drivers/mlx5_core/unbind || true
echo ${PCINUM%%.*}.3 > /sys/bus/pci/drivers/mlx5_core/unbind || true

devlink dev eswitch set pci/$PCINUM mode switchdev
devlink dev eswitch show pci/$PCINUM
# # pci/0000:43:00.0: mode switchdev inline-mode none encap-mode basic

echo ${PCINUM%%.*}.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo ${PCINUM%%.*}.3 > /sys/bus/pci/drivers/mlx5_core/bind

# systemctl enable --now openvswitch
# systemctl restart openvswitch

# Create an OVS bridge (here it's named ovs-sriov). 
ovs-vsctl add-br ovs-sriov

ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

systemctl restart openvswitch

ovs-vsctl add-port ovs-sriov ${IFNAME}
ovs-vsctl add-port ovs-sriov ${IFNAME}_0
ovs-vsctl add-port ovs-sriov ${IFNAME}_1

ip link set dev ${IFNAME} up
ip link set dev ${IFNAME}_0 up
ip link set dev ${IFNAME}_1 up

ip link set dev ${IFNAME}v0 up
ip link set dev ${IFNAME}v1 up

# systemctl restart openvswitch

# ip addr add ${VF1IP} dev ${IFNAME}v0
# ip addr add ${VF2IP} dev ${IFNAME}v1

EOF

# for 103
# export IFNAME=enp4s0f0
# export PCINUM=0000:04:00.0
# export VF1MAC=e4:11:22:33:44:50
# export VF2MAC=e4:11:22:33:44:51
# export VF1IP=192.168.55.21/24
# export VF2IP=192.168.55.22/24
# bash /data/ovs-offload-env.sh

# set the environment variables, then run the script to bring up ovs
# for 105
export IFNAME=enp67s0f0
export PCINUM=0000:43:00.0
export VF1MAC=e4:11:22:33:55:60
export VF2MAC=e4:11:22:33:55:61
# export VF1IP=192.168.55.31/24
# export VF2IP=192.168.55.32/24
bash /data/ovs-offload-env.sh

# We also need to start a DPDK application that provides the vdpa function and attaches to the vf.
/data/soft/dpdk-stable-20.11.3/examples/vdpa/build/vdpa -w ${PCINUM%%.*}.2,class=vdpa --log-level=pmd,info -- -i
create /tmp/sock-virtio0 0000:43:00.2
# EAL: Detected 24 lcore(s)
# EAL: Detected 2 NUMA nodes
# Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
# EAL: Detected shared linkage of DPDK
# EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
# EAL: Selected IOVA mode 'VA'
# EAL: No available hugepages reported in hugepages-2048kB
# EAL: Probing VFIO support...
# EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:43:00.2 (socket 1)
# mlx5_vdpa: ROCE is disabled by Netlink successfully.
# EAL: No legacy callbacks, legacy socket not created
# Interactive-mode selected
# vdpa> create /tmp/sock-virtio0 0000:43:00.2
# VHOST_CONFIG: vhost-user server: socket created, fd: 112
# VHOST_CONFIG: bind to /tmp/sock-virtio0
# vdpa>

vdpa> list
# device name     queue num       supported features
# 0000:43:00.2            256             0x114c60180b

vdpa> stats 0000:43:00.2 0
# Device 0000:43:00.2:
#         Virtq 0:
#                 received_descriptors                                             1024
#                 completed_descriptors                                            39
#                 bad descriptor errors                                            0
#                 exceed max chain                                                 0
#                 invalid buffer                                                   0
#                 completion errors                                                0

kvm

Next we create a kvm guest that uses our vdpa channel.

Since we created a socket and qemu needs permission to read it, we change the qemu user to root.

sed -i.bak 's/#user = "root"/user = "root"/' /etc/libvirt/qemu.conf
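
# assumed follow-up step: restart libvirtd so the new qemu user in qemu.conf takes effect
systemctl restart libvirtd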

# We also need a bridge so the kvm guest can reach the network through the host uplink, for easier access and management.
mkdir -p /data/kvm
cat << 'EOF' > /data/kvm/bridge.sh
#!/usr/bin/env bash

PUB_CONN='eno1'
PUB_IP='172.21.6.103/24'
PUB_GW='172.21.6.254'
PUB_DNS='172.21.1.1'

nmcli con down "$PUB_CONN"
nmcli con delete "$PUB_CONN"
nmcli con down baremetal
nmcli con delete baremetal
# RHEL 8.1 appends the word "System" in front of the connection,delete in case it exists
nmcli con down "System $PUB_CONN"
nmcli con delete "System $PUB_CONN"
nmcli connection add ifname baremetal type bridge con-name baremetal ipv4.method 'manual' \
    ipv4.address "$PUB_IP" \
    ipv4.gateway "$PUB_GW" \
    ipv4.dns "$PUB_DNS"
    
nmcli con add type bridge-slave ifname "$PUB_CONN" master baremetal
nmcli con down "$PUB_CONN";pkill dhclient;dhclient baremetal
nmcli con up baremetal
EOF
bash /data/kvm/bridge.sh

# First create, start and install a kvm guest the standard way
cd /data/kvm
export DOMAIN=cx6.1

virt-install --name="${DOMAIN}" --vcpus=2 --ram=8192 \
--cputune vcpupin0.vcpu=14,vcpupin1.vcpu=16 \
--memorybacking hugepages.page0.size=1,hugepages.page0.unit=GiB \
--cpu host-model \
--disk path=/data/kvm/${DOMAIN}.qcow2,bus=virtio,size=30 \
--os-variant rhel8.4 \
--network bridge=baremetal,model=virtio \
--graphics vnc,port=59000 \
--boot menu=on --location /data/kvm/Rocky-8.4-x86_64-minimal.iso \
--initrd-inject helper-ks-rocky.cfg --extra-args "inst.ks=file:/helper-ks-rocky.cfg" 

# Next, configure this kvm guest and add the vdpa channel to it.
# https://unix.stackexchange.com/questions/235414/libvirt-how-to-pass-qemu-command-line-args
# virt-xml $DOMAIN --edit --confirm --qemu-commandline 'env=MY-ENV=1234'
virt-xml $DOMAIN --edit --qemu-commandline='-chardev socket,id=charnet1,path=/tmp/sock-virtio0'
virt-xml $DOMAIN --edit --qemu-commandline='-netdev vhost-user,chardev=charnet1,queues=16,id=hostnet1'
virt-xml $DOMAIN --edit --qemu-commandline='-device virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=e4:11:c6:d3:45:f2,bus=pcie.0,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'

Next, the following parts of the configuration need to be edited by hand; note that the pinned CPU cores should all be on the same NUMA node (see the check after the snippet).

virsh edit cx6.1

  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='16'/>
  </cputune>

  <cpu mode='host-model' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
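
To find cores on the same NUMA node as the CX6 card, something like the following helps (a sketch; the PCI address is the one used on host 105 in this note, and the output is machine-specific).

# NUMA node of the CX6 PF
cat /sys/bus/pci/devices/0000:43:00.0/numa_node

# CPU list per NUMA node; pick the pinned cpuset (14,16 above) from that node
lscpu | grep -i numa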
  

The final configuration looks like the example below; in a project it can be used as a reference for troubleshooting.

virsh dumpxml cx6.1

<domain type='kvm' id='11' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>cx6.1</name>
  <uuid>5cbb6f7c-7122-4fc4-9706-ff46aed3bf25</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://redhat.com/rhel/8.4"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='16'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.2.0'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='ibrs'/>
    <feature policy='require' name='amd-stibp'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='pschange-mc-no'/>
    <numa>
      <cell id='0' cpus='0-1' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/data/kvm/cx6.1.qcow2' index='2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:8d:b6:8e'/>
      <source bridge='baremetal'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/6'>
      <source path='/dev/pts/6'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-11-cx6.1/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='59000' autoport='no' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='5'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c46,c926</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c46,c926</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+0</label>
    <imagelabel>+0:+0</imagelabel>
  </seclabel>
  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='socket,id=charnet1,path=/tmp/sock-virtio0'/>
    <qemu:arg value='-netdev'/>
    <qemu:arg value='vhost-user,chardev=charnet1,queues=16,id=hostnet1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=e4:11:c6:d3:45:f2,bus=pcie.0,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'/>
  </qemu:commandline>
</domain>

Let's try it out

Now for the testing.

# in cx6.1 kvm
# nmcli dev connect enp0s6
nmcli con modify enp0s6 ipv4.method manual ipv4.addresses 192.168.99.11/24
# nmcli con modify enp0s6 ipv4.method manual ipv4.addresses 192.168.55.91/24
nmcli con up enp0s6

# on peer machine (102)
nmcli con modify enp66s0f0 ipv4.method manual ipv4.addresses 192.168.99.21/24
# nmcli con modify enp66s0f0 ipv4.method manual ipv4.addresses 192.168.55.92/24
# nmcli dev connect enp66s0f0
nmcli con up enp66s0f0

# run after the tcpdump is running
ping 192.168.99.21
# PING 192.168.99.21 (192.168.99.21) 56(84) bytes of data.
# 64 bytes from 192.168.99.21: icmp_seq=1 ttl=64 time=0.089 ms
# 64 bytes from 192.168.99.21: icmp_seq=2 ttl=64 time=0.044 ms
# 64 bytes from 192.168.99.21: icmp_seq=3 ttl=64 time=0.046 ms
# ....

# on 105
tcpdump -i enp67s0f0_0 -w dump.test
# dropped privs to tcpdump
# tcpdump: listening on enp67s0f0_0, link-type EN10MB (Ethernet), capture size 262144 bytes
# ^C2 packets captured
# 2 packets received by filter
# 0 packets dropped by kernel

tcpdump -i enp67s0f0 -w dump.test
# dropped privs to tcpdump
# tcpdump: listening on enp67s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes
# ^C4 packets captured
# 4 packets received by filter
# 0 packets dropped by kernel

Opening the capture in wireshark shows standard icmp packets, which confirms that we built a data path rather than some protocol encapsulation. Also, although many pings were sent, only the first couple of packets were captured: the NIC offloads the flow, so only the first packet that enters the kernel to look up the flow table is visible; everything after that is handled by the NIC and never reaches tcpdump.

The capture on the pf (shown above) caught 4 packets; each is the first packet of its flow, and the rest were offloaded.

# ovs-dpctl dump-flows

# on 105
# Look at the ovs flow table: there are 2 arp (0x0806) flows, forward and reverse,
# and 2 ip (0x0800) flows, forward and reverse.
ovs-appctl dpctl/dump-flows type=offloaded
# recirc_id(0),in_port(2),eth(src=0c:42:a1:fa:18:8e,dst=e4:11:c6:d3:45:f2),eth_type(0x0800),ipv4(frag=no), packets:149, bytes:15198, used:0.510s, actions:3
# recirc_id(0),in_port(2),eth(src=0c:42:a1:fa:18:8e,dst=e4:11:c6:d3:45:f2),eth_type(0x0806), packets:0, bytes:0, used:8.700s, actions:3
# recirc_id(0),in_port(3),eth(src=e4:11:c6:d3:45:f2,dst=0c:42:a1:fa:18:8e),eth_type(0x0800),ipv4(frag=no), packets:149, bytes:14602, used:0.510s, actions:2
# recirc_id(0),in_port(3),eth(src=e4:11:c6:d3:45:f2,dst=0c:42:a1:fa:18:8e),eth_type(0x0806), packets:0, bytes:0, used:8.701s, actions:2

# Now check the tc configuration; ovs has pushed the rules down to tc.
# This is the vf ingress traffic: it is redirected to the parent (uplink) port, and the rule is implemented in hardware.
tc -s filter show dev enp67s0f0_0 ingress
# filter protocol ip pref 2 flower chain 0
# filter protocol ip pref 2 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type ipv4
#   ip_flags nofrag
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 4 ref 1 bind 1 installed 318 sec used 0 sec
#         Action statistics:
#         Sent 30380 bytes 310 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 30380 bytes 310 pkt
#         backlog 0b 0p requeues 0
#         cookie 8be6df4d7d4c33fce08f01a46fa10a4a
#         no_percpu
#         used_hw_stats delayed

# Now check the vf egress traffic.
# There are 2 rules, one for arp and one for ip.
# Both redirect the traffic to the parent port, and the rules are implemented in hardware.
tc -s filter show dev enp67s0f0_0 egress
# filter ingress protocol ip pref 2 flower chain 0
# filter ingress protocol ip pref 2 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type ipv4
#   ip_flags nofrag
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 4 ref 1 bind 1 installed 379 sec used 0 sec
#         Action statistics:
#         Sent 36260 bytes 370 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 36260 bytes 370 pkt
#         backlog 0b 0p requeues 0
#         cookie 8be6df4d7d4c33fce08f01a46fa10a4a
#         no_percpu
#         used_hw_stats delayed

# filter ingress protocol arp pref 4 flower chain 0
# filter ingress protocol arp pref 4 flower chain 0 handle 0x1
#   dst_mac 0c:42:a1:fa:18:8e
#   src_mac e4:11:c6:d3:45:f2
#   eth_type arp
#   in_hw in_hw_count 1
#         action order 1: mirred (Egress Redirect to device enp67s0f0) stolen
#         index 3 ref 1 bind 1 installed 13 sec used 6 sec
#         Action statistics:
#         Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
#         Sent software 0 bytes 0 pkt
#         Sent hardware 60 bytes 1 pkt
#         backlog 0b 0p requeues 0
#         cookie 1fbfd56eae42f9dbe71bf99bd800cd6d
#         no_percpu
#         used_hw_stats delayed

tc qdisc show dev enp67s0f0_0
# qdisc mq 0: root
# qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
# qdisc ingress ffff: parent ffff:fff1 ----------------

# Finally, record the system environment for later reference and for comparison with project deployments.
# on 105
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master baremetal state UP mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:28 brd ff:ff:ff:ff:ff:ff
# 4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:29 brd ff:ff:ff:ff:ff:ff
# 5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:2a brd ff:ff:ff:ff:ff:ff
# 6: enp67s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
#     vf 0     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
#     vf 1     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
# 7: enp67s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a3 brd ff:ff:ff:ff:ff:ff
# 8: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN mode DEFAULT group default qlen 256
#     link/infiniband 00:00:10:28:fe:80:00:00:00:00:00:00:98:03:9b:03:00:cc:71:2c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
# 9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 10: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 11: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 16: enp67s0f0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether fa:cf:0f:6a:ec:45 brd ff:ff:ff:ff:ff:ff
# 17: enp67s0f0_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
#     link/ether 76:65:93:70:96:ac brd ff:ff:ff:ff:ff:ff
# 18: enp67s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff
# 19: enp67s0f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff
# 20: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether f6:e9:fd:16:8a:ea brd ff:ff:ff:ff:ff:ff
# 21: ovs-sriov: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 22: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master baremetal state UNKNOWN mode DEFAULT group default qlen 1000
#     link/ether fe:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff

ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master baremetal state UP group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
# 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:28 brd ff:ff:ff:ff:ff:ff
# 4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:29 brd ff:ff:ff:ff:ff:ff
# 5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
#     link/ether 90:b1:1c:40:59:2a brd ff:ff:ff:ff:ff:ff
# 6: enp67s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 7: enp67s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a3 brd ff:ff:ff:ff:ff:ff
# 8: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
#     link/infiniband 00:00:10:28:fe:80:00:00:00:00:00:00:98:03:9b:03:00:cc:71:2c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
# 9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#     link/ether 90:b1:1c:40:59:27 brd ff:ff:ff:ff:ff:ff
#     inet 172.21.6.105/24 brd 172.21.6.255 scope global noprefixroute baremetal
#        valid_lft forever preferred_lft forever
#     inet6 fe80::12a7:202d:c70b:be14/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever
# 10: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
#     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
#        valid_lft forever preferred_lft forever
# 11: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
#     link/ether 52:54:00:8f:4a:bc brd ff:ff:ff:ff:ff:ff
# 16: enp67s0f0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether fa:cf:0f:6a:ec:45 brd ff:ff:ff:ff:ff:ff
# 17: enp67s0f0_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
#     link/ether 76:65:93:70:96:ac brd ff:ff:ff:ff:ff:ff
# 18: enp67s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:22:33:55:60 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.55.31/24 scope global enp67s0f0v0
#        valid_lft forever preferred_lft forever
# 19: enp67s0f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:22:33:55:61 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.55.32/24 scope global enp67s0f0v1
#        valid_lft forever preferred_lft forever
# 20: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
#     link/ether f6:e9:fd:16:8a:ea brd ff:ff:ff:ff:ff:ff
# 21: ovs-sriov: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
#     link/ether 0c:42:a1:fa:18:a2 brd ff:ff:ff:ff:ff:ff
# 22: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master baremetal state UNKNOWN group default qlen 1000
#     link/ether fe:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff
#     inet6 fe80::fc54:ff:fe8d:b68e/64 scope link
#        valid_lft forever preferred_lft forever

ovs-vsctl show
# 8f3eddeb-c42c-4af4-9dc8-a46169d91a7c
#     Bridge ovs-sriov
#         Port enp67s0f0_1
#             Interface enp67s0f0_1
#         Port ovs-sriov
#             Interface ovs-sriov
#                 type: internal
#         Port enp67s0f0
#             Interface enp67s0f0
#         Port enp67s0f0_0
#             Interface enp67s0f0_0
#     ovs_version: "2.14.1"

# on kvm
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether e4:11:c6:d3:45:f2 brd ff:ff:ff:ff:ff:ff
# 3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#     link/ether 52:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff

ip a
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
#     inet 127.0.0.1/8 scope host lo
#        valid_lft forever preferred_lft forever
#     inet6 ::1/128 scope host
#        valid_lft forever preferred_lft forever
# 2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether e4:11:c6:d3:45:f2 brd ff:ff:ff:ff:ff:ff
#     inet 192.168.99.11/24 brd 192.168.99.255 scope global noprefixroute enp0s6
#        valid_lft forever preferred_lft forever
#     inet6 fe80::f3c:b686:1739:a748/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever
# 3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#     link/ether 52:54:00:8d:b6:8e brd ff:ff:ff:ff:ff:ff
#     inet 172.21.6.11/24 brd 172.21.6.255 scope global noprefixroute enp1s0
#        valid_lft forever preferred_lft forever
#     inet6 fe80::5054:ff:fe8d:b68e/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever

Performance test

# on 102
dnf install -y iperf3
systemctl disable --now firewalld

iperf3 -s -p 6666

# on 11
dnf install -y iperf3

iperf3 -t 20 -p 6666 -c 192.168.99.21
Connecting to host 192.168.99.21, port 6666
[  5] local 192.168.99.11 port 50960 connected to 192.168.99.21 port 6666
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.40 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   1.00-2.00   sec  1.39 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   2.00-3.00   sec  1.39 GBytes  12.0 Gbits/sec    0    594 KBytes
[  5]   3.00-4.00   sec  1.40 GBytes  12.0 Gbits/sec    0    624 KBytes
[  5]   4.00-5.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   5.00-6.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   6.00-7.00   sec  1.40 GBytes  12.0 Gbits/sec    0    659 KBytes
[  5]   7.00-8.00   sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]   8.00-9.00   sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]   9.00-10.00  sec  1.40 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  10.00-11.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  11.00-12.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  12.00-13.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  13.00-14.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  14.00-15.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  15.00-16.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  16.00-17.00  sec  1.39 GBytes  12.0 Gbits/sec    0   1.03 MBytes
[  5]  17.00-18.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  18.00-19.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
[  5]  19.00-20.00  sec  1.39 GBytes  11.9 Gbits/sec    0   1.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  27.9 GBytes  12.0 Gbits/sec    0             sender
[  5]   0.00-20.04  sec  27.9 GBytes  11.9 Gbits/sec                  receiver

iperf Done.
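
# A single TCP stream levels off around 12 Gbit/s here. As a hedged follow-up (same iperf3
# server as above), several parallel streams show whether the limit is the link or the single flow:
# on 11
iperf3 -t 20 -P 4 -p 6666 -c 192.168.99.21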

# on 105
systemctl disable --now irqbalance.service
mlnx_affinity start

# on 102
systemctl disable --now irqbalance.service
mlnx_affinity start

# on 102
dnf install -y qperf
qperf

# on 105
qperf 192.168.88.21 tcp_bw
tcp_bw:
    bw  =  2.8 GB/sec

# on 101
qperf 192.168.99.21 tcp_bw
tcp_bw:
    bw  =  1.48 GB/sec

RHEL/CentOS 8 build kernel

This section describes how to build a custom kernel on RHEL 8.

The background: a customer needs advanced features of a Mellanox NIC that are only available when certain kernel config options are enabled, so we build a new kernel with those options turned on.

Video walkthrough

Lab steps

# https://access.redhat.com/articles/3938081
# grubby --info=ALL | grep title

# https://blog.packagecloud.io/eng/2015/04/20/working-with-source-rpms/

export PROXY="192.168.253.1:5085"
export PROXY="192.168.203.1:5085"

# We need rhel8.3, and at this point 8.3 is still in beta, so a special (beta) subscription has to be registered.
subscription-manager --proxy=$PROXY register --username **** --password ********

# subscription-manager config --rhsm.baseurl=https://china.cdn.redhat.com
# subscription-manager config --rhsm.baseurl=https://cdn.redhat.com
subscription-manager --proxy=$PROXY refresh

subscription-manager --proxy=$PROXY repos --help

subscription-manager --proxy=$PROXY repos --list > list

cat list | grep 'Repo ID' | grep -v source | grep -v debug

subscription-manager --proxy=$PROXY repos --disable="*"

subscription-manager --proxy=$PROXY repos \
    --enable="rhel-8-for-x86_64-baseos-beta-rpms" \
    --enable="rhel-8-for-x86_64-appstream-beta-rpms" \
    --enable="rhel-8-for-x86_64-supplementary-beta-rpms" \
    --enable="rhel-8-for-x86_64-rt-beta-rpms" \
    --enable="rhel-8-for-x86_64-highavailability-beta-rpms" \
    --enable="rhel-8-for-x86_64-nfv-beta-rpms" \
    --enable="fast-datapath-beta-for-rhel-8-x86_64-rpms" \
    --enable="codeready-builder-beta-for-rhel-8-x86_64-rpms" \
    # --enable="dirsrv-beta-for-rhel-8-x86_64-rpms" \
    # ansible-2.9-for-rhel-8-x86_64-rpms

cat << EOF >> /etc/dnf/dnf.conf
proxy=http://$PROXY
EOF

# Building the kernel needs packages from both the RHEL 7 and RHEL 8 EPEL repositories.
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

yum -y install yum-utils rpm-build

yum list kernel.x86_64

# download the kernel source RPM
yumdownloader --source kernel.x86_64

# install the source RPM
rpm -ivh /root/kernel-4.18.0-221.el8.src.rpm

cd /root/rpmbuild/SPECS
# https://stackoverflow.com/questions/13227162/automatically-install-build-dependencies-prior-to-building-an-rpm-package
# install the build dependencies
yum-builddep kernel.spec

# unpack the source tree and generate the configs
rpmbuild -bp --target=x86_64 kernel.spec

# libbabeltrace-devel

# https://www.cnblogs.com/luohaixian/p/9313863.html
KERNELVERION=`uname -r | sed "s/.$(uname -m)//"`
KERNELRV=$(uname -r)
/bin/cp -f /root/rpmbuild/BUILD/kernel-${KERNELVERION}/linux-${KERNELRV}/configs/* /root/rpmbuild/SOURCES/

cd /root/rpmbuild/BUILD/kernel-${KERNELVERION}/linux-${KERNELRV}/

/bin/cp -f configs/kernel-4.18.0-`uname -m`.config .config
# cp /boot/config-`uname -r`   .config

make oldconfig
# customize the config; see the video for the options chosen
make menuconfig

# vi .config

# CONFIG_MLX5_TC_CT=y
# CONFIG_NET_ACT_CT=m
# CONFIG_SKB_EXTENSIONS=y
# CONFIG_NET_TC_SKB_EXT=y
# CONFIG_NF_FLOW_TABLE=m
# CONFIG_NF_FLOW_TABLE_IPV4=m  x
# CONFIG_NF_FLOW_TABLE_IPV6=m  x
# CONFIG_NF_FLOW_TABLE_INET=m
# CONFIG_NET_ACT_CONNMARK=m x
# CONFIG_NET_ACT_IPT=m  x
# CONFIG_NET_EMATCH_IPT=m   x
# CONFIG_NET_ACT_IFE=m  x

# mark the config as being for x86_64
# x86_64
sed -i '1s/^/# x86_64\n/' .config

/bin/cp -f .config configs/kernel-4.18.0-`uname -m`.config
/bin/cp -f .config configs/kernel-x86_64.config

/bin/cp -f configs/* /root/rpmbuild/SOURCES/

cd /root/rpmbuild/SPECS

# cp kernel.spec kernel.spec.orig
# https://fedoraproject.org/wiki/Building_a_custom_kernel

# customize the kernel name (buildid suffix)
sed -i "s/# define buildid \\.local/%define buildid \\.wzh/" kernel.spec

# rpmbuild -bb --target=`uname -m` --without kabichk  kernel.spec 2> build-err.log | tee build-out.log

# rpmbuild -bb --target=`uname -m` --without debug --without debuginfo --without kabichk kernel.spec 2> build-err.log | tee build-out.log

rpmbuild -bb --target=`uname -m` --with baseonly --without debug --without debuginfo --without kabichk kernel.spec 2> build-err.log | tee build-out.log

cd /root/rpmbuild/RPMS/x86_64/

# install the kernel we just built
INSTALLKV=4.18.0-221.el8.wzh

yum install ./kernel-$INSTALLKV.x86_64.rpm ./kernel-core-$INSTALLKV.x86_64.rpm ./kernel-modules-$INSTALLKV.x86_64.rpm

# after rebooting into the new kernel, verify that the module can be loaded
grep -R --include=Makefile CONFIG_NET_ACT_IFE
# rpmbuild/BUILD/kernel-4.18.0-221.el8/linux-4.18.0-221.el8.wzh.x86_64/net/sched/Makefile:obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
modprobe act_ife
lsmod | grep act_ife
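
# To double-check that the toggled options made it into the running kernel, inspect the
# installed config (assuming the kernel RPM dropped its config under /boot, as the RHEL
# spec normally does):
grep -E 'CONFIG_MLX5_TC_CT|CONFIG_NET_TC_SKB_EXT|CONFIG_NET_ACT_CT' /boot/config-$(uname -r)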

The RHEL kernel packages built in this lab can be downloaded here:

Link: https://pan.baidu.com/s/1AG07HxpXy9hoCLMq9qXi0Q  password: 7hkt

Check whether the host is a virtual machine, plus basic host information

The output below comes from a virtual machine.

https://www.cnblogs.com/klb561/p/10527197.html

dmidecode -s system-product-name
# OpenStack Nova

lshw -class system
# sz-mec-dev02
#     description: Computer
#     product: OpenStack Nova
#     vendor: OpenStack Foundation
#     version: 13.2.1-20190604220711
#     serial: 261977f6-fc7a-49f3-954e-cf9feb70fc2c
#     width: 64 bits
#     capabilities: smbios-2.8 dmi-2.8 smp vsyscall32
#     configuration: boot=normal family=Virtual Machine uuid=8C0EE55A-5F37-554D-8300-313E29EF58B0
#   *-pnp00:00
#        product: PnP device PNP0b00
#        physical id: 1
#        capabilities: pnp
#        configuration: driver=rtc_cmos

dmesg |grep -i virtual
# [    0.145659] Booting paravirtualized kernel on KVM
# [    1.177345] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4
# [    1.178356] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
# [    1.223866] systemd[1]: Detected virtualization kvm.

check core

https://www.cyberciti.biz/faq/check-how-many-cpus-are-there-in-linux-system/

echo "Number of CPU/cores online at $HOSTNAME: $(getconf _NPROCESSORS_ONLN)"

check memory

https://www.networkworld.com/article/3336174/how-much-memory-is-installed-and-being-used-on-your-linux-systems.html

dmidecode -t 17 | grep "Size.*MB" | awk '{s+=$2} END {print s / 1024 "GB"}'
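
# dmidecode reads the installed DIMM sizes and needs root; a quick cross-check against
# what the kernel actually sees (slightly less than installed):
free -h
grep MemTotal /proc/meminfo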

openshift 4 kvm+ovs install

A common situation when installing openshift4: the cluster has to be spread across several hosts with modest specs, so the KVM guests on different hosts need to talk to each other. A plain bridge with directly routable IP addresses would be enough, but IP address management constraints leave us without routable addresses, so we use OVS plus VXLAN tunnels to solve the problem.

This section uses 2 hosts and shows how to configure OVS and how to start the KVM guests.

References:

  • https://stackoverflow.com/questions/30622680/kvm-ovs-bridged-network-how-to-configure
  • https://stackoverflow.com/questions/31566658/setup-private-networking-between-two-hosts-and-two-vms-with-libvirt-openvswitc
  • https://blog.csdn.net/wuliangtianzu/article/details/81870551
  • https://pinrojas.com/2017/05/03/how-to-use-virt-install-to-connect-at-openvswitch-bridges/
  • https://www.jianshu.com/p/658332deac99
  • https://developer.gnome.org/NetworkManager/stable/nm-openvswitch.html

MTU tuning:

  • https://www.cnblogs.com/JacZhu/p/11006738.html
  • https://stackoom.com/question/3gFcR/%E6%97%A0%E6%B3%95%E5%9C%A8OVS%E9%9A%A7%E9%81%93%E4%B8%AD%E6%8D%95%E8%8E%B7%E5%A4%A7%E4%BA%8EMTU-%E7%9A%84%E6%B5%81%E9%87%8F
  • https://serverfault.com/questions/680635/mtu-on-open-vswitch-bridge-port
  • https://stackoverflow.com/questions/54398827/unable-to-capture-traffic-greater-than-mtu-1500-in-ovs-tunnel

vxlan

  • https://blog.csdn.net/a363344923/article/details/98033856
  • https://prolinuxhub.com/configure-start-up-scripts-for-ovs-on-centos-and-red-hat/

nat

  • https://www.sdnlab.com/19842.html
  • https://www.sdnlab.com/19802.html
  • https://www.sdnlab.com/19765.html

For an ocp4 installation based on this setup, see the note: https://github.com/wangzheng422/docker_env/blob/master/redhat/prepare/cmri/lab.md

on redhat-01


yum -y install openvswitch2.11 NetworkManager-ovs
# install pkg for vnc and kvm

systemctl enable --now openvswitch
systemctl status openvswitch

systemctl enable --now libvirtd

cat << 'EOF' > /etc/sysconfig/network-scripts/ifcfg-br-int 
DEVICE=br-int
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
HOTPLUG=no
IPADDR=192.168.7.1
PREFIX=24
MTU=1450
EOF

cat << 'EOF' > /etc/sysconfig/network-scripts/ifcfg-vxlan1
DEVICE=vxlan1
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSTunnel
OVS_BRIDGE=br-int
OVS_TUNNEL_TYPE=vxlan
OVS_TUNNEL_OPTIONS="options:remote_ip=172.29.159.100"
BOOTPROTO=static
HOTPLUG=no
EOF

systemctl restart network

ovs-vsctl show

# ovs-vsctl set int br-int mtu_request=1450
# ovs-vsctl set int br-int mtu_request=[]

mkdir -p /data/kvm
cd /data/kvm

# bridge mode
cat << 'EOF' > ovsnet.xml
<network>
  <name>br-int</name>
  <forward mode='bridge'/>
  <bridge name='br-int'/>
  <virtualport type='openvswitch'/>
</network>
EOF

virsh net-define ovsnet.xml
virsh net-start br-int
virsh net-autostart br-int
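
# quick sanity check that libvirt registered the OVS-backed network and will auto-start it
virsh net-list --all
virsh net-dumpxml br-int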

# restore
virsh net-destroy br-int
virsh net-undefine br-int
/bin/rm -f /etc/sysconfig/network-scripts/ifcfg-br-int 
/bin/rm -f /etc/sysconfig/network-scripts/ifcfg-vxlan1
systemctl restart network


on redhat-02

 
yum -y install openvswitch2.11 NetworkManager-ovs
# install pkg for vnc and kvm

systemctl enable --now openvswitch
systemctl status openvswitch

systemctl enable --now libvirtd

ovs-vsctl show

cat << 'EOF' > /etc/sysconfig/network-scripts/ifcfg-br-int 
DEVICE=br-int
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
HOTPLUG=no
IPADDR=192.168.7.2
PREFIX=24
MTU=1450
EOF

cat << 'EOF' > /etc/sysconfig/network-scripts/ifcfg-vxlan1
DEVICE=vxlan1
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSTunnel
OVS_BRIDGE=br-int
OVS_TUNNEL_TYPE=vxlan
OVS_TUNNEL_OPTIONS="options:remote_ip=172.29.159.99"
BOOTPROTO=static
HOTPLUG=no
EOF

systemctl restart network

ovs-vsctl show

# ovs-vsctl set int br-int mtu_request=1450

mkdir -p /data/kvm
cd /data/kvm

# bridge mode
cat << 'EOF' > ovsnet.xml
<network>
  <name>br-int</name>
  <forward mode='bridge'/>
  <bridge name='br-int'/>
  <virtualport type='openvswitch'/>
</network>
EOF

virsh net-define ovsnet.xml
virsh net-start br-int
virsh net-autostart br-int

# restore
virsh net-destroy br-int
virsh net-undefine br-int



Create the virtual machines

When creating the VMs, remember to adjust the MTU of each guest. The key point is inside the guest OS: the NIC MTU there is really a matter of kernel boot parameters set at install time; see https://www.man7.org/linux/man-pages/man7/dracut.cmdline.7.html (a hedged example follows below).
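A minimal sketch of what that can look like (hedged; the addresses and the interface name below are placeholders): the MTU can be appended as the last field of the dracut ip= kernel argument at install time, or persisted in the guest's ifcfg file afterwards.

# dracut boot argument, placeholders only: ip=<client-IP>::<gateway>:<netmask>:<hostname>:<iface>:none:<mtu>
# ip=192.168.7.13::192.168.7.1:255.255.255.0:helper:ens3:none:1450

# or, inside an already-installed RHEL guest (interface name is a placeholder):
echo "MTU=1450" >> /etc/sysconfig/network-scripts/ifcfg-ens3
systemctl restart network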


mkdir -p /data/kvm
cd /data/kvm

lvremove -f datavg/helperlv
lvcreate -y -L 230G -n helperlv datavg

# 230G
virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/dev/datavg/helperlv,device=disk,bus=virtio,format=raw \
--os-variant centos7.0 --network network:br-int,model=virtio \
--boot menu=on --location /data/kvm/rhel-server-7.8-x86_64-dvd.iso \
--initrd-inject /data/kvm/helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" 

Dead ends and detours

For VMs attached to the OVS bridge, TCP MTU probing needs to be turned on.


sysctl -w net.ipv4.tcp_mtu_probing=1

cat << 'EOF' > /etc/sysctl.d/99-sysctl-wzh.conf
net.ipv4.tcp_mtu_probing = 1
EOF

sysctl --system

ovs-vsctl add-port br-int vxlan1 -- \
  set Interface vxlan1 type=vxlan options:remote_ip=172.29.159.99

ovs-vsctl set int br-int mtu_request=1450

nmcli connection add type vxlan id 100 remote 172.29.159.99 ipv4.addresses 192.168.77.2/24 ipv4.method manual ifname vxlan1 connection.id vxlan1 vxlan.parent enp2s0f0 
nmcli conn up vxlan1

nmcli conn del vxlan1

ovs-vsctl add-port br-int vxlan1 -- \
  set Interface vxlan1 type=vxlan options:remote_ip=172.29.159.100

ovs-vsctl set int br-int mtu_request=1450
ovs-vsctl set int br-int mtu_request=[]

systemctl restart network

# restore
ovs-vsctl del-port br-int vxlan1
ovs-vsctl del-br br-int
rm -f /etc/sysconfig/network-scripts/ifcfg-br-int 
systemctl restart network

man nm-openvswitch

nmcli con add type ovs-bridge \
    con-name br-private \
    ifname br-private \
    ipv4.method 'manual' \
    ipv4.address '192.168.7.1/24' 

nmcli connection modify br-private ipv4.addresses 192.168.7.1/24
nmcli connection modify eno2 ipv4.gateway 192.168.39.254
nmcli connection modify eno2 ipv4.dns 192.168.39.129
nmcli connection modify br-private ipv4.method manual
nmcli connection modify br-private connection.autoconnect yes
nmcli connection reload

nmcli con del br-private

nmcli connection add type vxlan id 100 remote 172.29.159.100 ipv4.addresses 192.168.77.1/24 ipv4.method manual ifname vxlan1 connection.id vxlan1 vxlan.parent enp2s0f0 
nmcli conn up vxlan1

nmcli conn del vxlan1

nmcli conn add type ovs-bridge conn.interface bridge0
nmcli conn add type ovs-port conn.interface port0 master bridge0
nmcli conn add type ovs-interface conn.interface iface0 master port0 \
             ipv4.method manual ipv4.address 192.168.7.1/24

nmcli conn del ovs-slave-iface0
nmcli conn del ovs-slave-port0
nmcli conn del ovs-bridge-bridge0

ovs-vsctl add-br br-private

ovs-dpctl show
ovs-ofctl show br0


OpenShift and Container Storage for Administrators

This section covers a hands-on administrator workshop for openshift4. The highlights are the OpenShift storage module (OCS), OpenShift centralized logging, and OpenShift metering/chargeback; these modules need a fair amount of underlying resources, so environments where they can actually be tried out are rare.

workshop upstream github: https://github.com/openshift/openshift-cns-testdrive

WORKSHOP MODULES

The training materials for each module are listed below.

ocs (openshift container storage)

Centralized logging

Metering and chargeback

poc for sc

rhel host maintain

aliyun host


ssh-copy-id root@

cat << EOF > /root/.ssh/config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

EOF

export VULTR_HOST=helper.hsc.redhat.ren

rsync -e ssh --info=progress2 -P --delete -arz /data/rhel-data/data ${VULTR_HOST}:/data/rhel-data

rsync -e ssh --info=progress2 -P --delete -arz /data/registry ${VULTR_HOST}:/data/

rsync -e ssh --info=progress2 -P --delete -arz /data/ocp4 ${VULTR_HOST}:/data/

rsync -e ssh --info=progress2 -P --delete -arz /data/is.samples ${VULTR_HOST}:/data/

cd /data
tar -cvf - registry/ | pigz -c > registry.tgz
tar -cvf - ocp4/ | pigz -c > ocp4.tgz
tar -cvf - data/ | pigz -c > rhel-data.tgz
tar -cvf - is.samples/ | pigz -c > /data_hdd/down/is.samples.tgz

helper host

######################################################
# on helper

find . -name vsftp*
yum -y install ./data/rhel-7-server-rpms/Packages/vsftpd-3.0.2-25.el7.x86_64.rpm
systemctl start vsftpd
systemctl restart vsftpd
systemctl enable vsftpd

firewall-cmd --permanent --add-service=ftp
firewall-cmd --reload

mv data /var/ftp/
chcon -R -t public_content_t /var/ftp/data

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname helper.hsc.redhat.ren
nmcli connection modify em1 ipv4.dns 114.114.114.114
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid10 -l 100%FREE --stripes 6 -n datalv datavg

umount /data_hdd
lvremove /dev/datavg/datalv

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                   xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -h -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm 5
iostat -m -x dm-24 5

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

firewall-cmd --get-zones
# block dmz drop external home internal public trusted work
firewall-cmd --zone=public --list-all

firewall-cmd --permanent --zone=public --remove-port=2049/tcp

firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" port port="2049" protocol="tcp" source address="117.177.241.0/24" accept'
firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" port port="2049" protocol="tcp" source address="39.137.101.0/24" accept'

# firewall-cmd --permanent --zone=public --add-port=4443/tcp

firewall-cmd --reload
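
# verify that the rich rules are active in the runtime configuration after the reload
firewall-cmd --zone=public --list-rich-rules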

showmount -a
exportfs -s

cd /data_ssd/
scp *.tgz root@117.177.241.17:/data_hdd/down/

# https://access.redhat.com/solutions/3341191
# subscription-manager register --org=ORG ID --activationkey= Key Name
cat /var/log/rhsm/rhsm.log

subscription-manager config --rhsm.manage_repos=0
cp /etc/yum/pluginconf.d/subscription-manager.conf /etc/yum/pluginconf.d/subscription-manager.conf.orig
cat << EOF  > /etc/yum/pluginconf.d/subscription-manager.conf
[main]
enabled=0
EOF

# https://access.redhat.com/products/red-hat-insights/#getstarted
subscription-manager register --auto-attach
yum --disableplugin=subscription-manager install insights-client
insights-client --register

yum --disableplugin=subscription-manager install ncdu

helper host day 2

####################################
# anti scan
firewall-cmd --permanent --zone=public --remove-rich-rule='rule family="ipv4" port port="2049" protocol="tcp" source address="117.177.241.0/24" accept'
firewall-cmd --permanent --zone=public --remove-rich-rule='rule family="ipv4" port port="2049" protocol="tcp" source address="39.137.101.0/24" accept'

firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

cat > /root/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
221.226.0.75/32
210.21.236.182/32
61.132.54.0/24
112.44.102.228/32
223.87.20.7/32
10.88.0.0/16
223.86.0.14/32
39.134.204.0/24
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all
firewall-cmd --get-active-zones

firewall-cmd --zone=block --change-interface=em1

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload
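
# check the allow-list ipset and the default zone after the reload
firewall-cmd --permanent --info-ipset=my-allow-list
firewall-cmd --get-default-zone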

# setup time server
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.bak

cat << EOF > /etc/chrony.conf
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst
server 3.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
allow 39.134.0.0/16
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

useradd -m zte

groupadd docker
usermod -aG docker zte

# https://github.com/containers/libpod/issues/5049
loginctl enable-linger zte
su -l zte

# https://www.redhat.com/en/blog/preview-running-containers-without-root-rhel-76
echo 10000 > /proc/sys/user/max_user_namespaces
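
# the echo above only affects the running kernel; a hedged way to make it survive a
# reboot is a sysctl drop-in (the file name is arbitrary)
cat << EOF > /etc/sysctl.d/99-userns.conf
user.max_user_namespaces = 10000
EOF
sysctl --system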

####################################
## trust podman
firewall-cmd --permanent --zone=trusted --add-interface=cni0
firewall-cmd --permanent --zone=trusted --remove-interface=cni0

firewall-cmd --reload

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
allow 39.134.0.0/16
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

bootstrap host

######################################################
# bootstrap

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname bootstrap.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid10 -l 100%FREE --stripes 6 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                   xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -h -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm 5
iostat -m -x dm-24 5

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

master0 host

#####################################################
# master0

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname master0.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid0 -l 100%FREE --stripes 12 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data
mkdir -p /data_hdd

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data_hdd                  xfs     defaults        0 0

EOF

mount -a

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

master1 host

######################################################
# master1

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname master1.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

mkdir -p /data_hdd
mkfs.xfs -f /dev/sdb

cat << EOF >> /etc/fstab
/dev/sdb /data_hdd                   xfs     defaults        0 0
EOF

mount -a

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

master2 host

######################################################
# master2

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname master2.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true
EOF

systemctl enable fail2ban
systemctl restart fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

fail2ban-client status
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid0 -l 100%FREE --stripes 12 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data
mkdir -p /data_hdd

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data_hdd                   xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm 5
iostat -m -x dm-12 5

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

infra0 host

######################################################
# infra0

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname infra0.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid0 -l 100%FREE --stripes 12 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data
mkdir -p /data_hdd

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                   xfs     defaults        0 0

EOF

mount -a

# https://access.redhat.com/solutions/769403
fuser -km /data
lvremove -f datavg/datalv
vgremove datavg
pvremove /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lvcreate --type raid0 -L 400G --stripes 12 -n monitorlv datavg

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm 5
iostat -m -x dm-12 5

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

infra1 host

######################################################
# infra1

mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum repolist

yum -y update

hostnamectl set-hostname infra1.hsc.redhat.ren

nmcli connection modify em1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up em1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lsblk | grep 446 | awk '{print $1}' | wc -l
# 12

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes
yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

vgs

lvcreate --type raid0 -l 100%FREE --stripes 12 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data
mkdir -p /data_hdd

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                   xfs     defaults        0 0

EOF

mount -a

# https://access.redhat.com/solutions/769403
fuser -km /data
lvremove -f datavg/datalv
vgremove datavg
pvremove /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lvcreate --type raid0 -L 400G --stripes 12 -n monitorlv datavg

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm 5
iostat -m -x dm-12 5

yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

worker-0 host


mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum --disableplugin=subscription-manager  repolist

yum -y update

hostnamectl set-hostname worker-0.ocpsc.redhat.ren

nmcli connection modify enp3s0f0 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up enp3s0f0

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 446 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
lsblk | grep 446 | awk '{print $1}' | wc -l
# 11

yum install -y lvm2

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk 

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

vgs

lvcreate --type raid0 -l 100%FREE --stripes 10 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                  xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk 5
iostat -m -x dm-10 5



####################################
# ntp
yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

systemctl disable --now firewalld.service

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking

#######################################
# nic bond
cat << 'EOF' > /root/nic.bond.sh
#!/bin/bash

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete uuid ${i} ; done 

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad \
    ipv4.method 'manual' \
    ipv4.address '39.137.101.28/25' \
    ipv4.gateway '39.137.101.126' \
    ipv4.dns '117.177.241.16'
    
nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname enp3s0f0 con-name enp3s0f0 master bond0
nmcli con add type bond-slave ifname enp3s0f1 con-name enp3s0f1 master bond0

# nmcli con down enp3s0f0 && nmcli con start enp3s0f0
# nmcli con down enp3s0f1 && nmcli con start enp3s0f1
# nmcli con down bond0 && nmcli con start bond0

systemctl restart network

EOF

cat > /root/nic.restore.sh << 'EOF'
#!/bin/bash

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete uuid ${i} ; done 

# re-create primary connection 
nmcli con add type ethernet \
    con-name enp3s0f0 \
    ifname enp3s0f0 \
    ipv4.method 'manual' \
    ipv4.address '39.137.101.28/25' \
    ipv4.gateway '39.137.101.126' \
    ipv4.dns '117.177.241.16'

# restart interface
# nmcli con down enp3s0f0 && nmcli con up enp3s0f0

systemctl restart network

exit 0
EOF

chmod +x /root/nic.restore.sh

cat > ~/cron-network-con-recreate << EOF
*/2 * * * * /bin/bash /root/nic.restore.sh
EOF

crontab ~/cron-network-con-recreate

bash /root/nic.bond.sh
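
# once the script has run, verify that the 802.3ad bond negotiated and carries the
# address (assuming the bond is named bond0 as in the script)
cat /proc/net/bonding/bond0
ip addr show bond0
nmcli con show --active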

worker-0 disk


#########################################
# ssd cache + hdd
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/logical_volume_manager_administration/index#lvm_cache_volume_creation
umount /data
lsblk -d -o name,rota

lvremove  /dev/datavg/datalv

pvcreate /dev/nvme0n1

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/vg_grow
vgextend datavg /dev/nvme0n1

## raid5 + cache
lvcreate --type raid5 -L 1G --stripes 9 -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

lvcreate --type raid5 -L 3.8T --stripes 9 -n mixlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

lvcreate -L 1G -n ssdlv datavg /dev/nvme0n1

# lvcreate --type cache-pool -L 300G -n cache1 datavg /dev/nvme0n1

lvcreate -L 1.4T -n cache1 datavg /dev/nvme0n1

lvcreate -L 14G -n cache1meta datavg /dev/nvme0n1

lvconvert --type cache-pool --poolmetadata datavg/cache1meta datavg/cache1

lvconvert --type cache --cachepool datavg/cache1 datavg/mixlv

# lvcreate --type raid5 --stripes 9 -L 1T -I 16M -R 4096K -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# lvcreate --type raid5 --stripes 9 -L 1T -I 16M -R 4096K -n datalv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# lvcreate --type raid5 --stripes 9 -L 1T -n datalv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

## raid0 + cache

lvcreate --type raid0 -L 4T --stripes 10 -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk










lvcreate --type raid0 -L 1T --stripes 10 -n mixlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

lvcreate -L 300G -n ssdlv datavg /dev/nvme0n1

lvcreate --type cache-pool -L 300G -n cpool datavg /dev/nvme0n1

lvs -a -o name,size,attr,devices datavg

# lvconvert --type cache --cachepool cpool datavg/datalv

lvconvert --type cache --cachepool cpool datavg/mixlv

# lvconvert --type cache --cachepool cpool --cachemode writeback datavg/datalv

# lvs -a -o name,size,attr,devices datavg
# lvs -o+cache_mode datavg

# mkfs.xfs /dev/datavg/datalv
mkfs.xfs /dev/datavg/hddlv
mkfs.xfs /dev/datavg/ssdlv
mkfs.xfs /dev/datavg/mixlv

mkdir -p /data/
mkdir -p /data_ssd/
mkdir -p /data_mix/

cat /etc/fstab

cat << EOF >> /etc/fstab
/dev/datavg/hddlv /data                  xfs     defaults        0 0
/dev/datavg/ssdlv /data_ssd                  xfs     defaults        0 0
/dev/datavg/mixlv /data_mix                  xfs     defaults        0 0
EOF

mount -a
df -h | grep \/data

# cleanup
umount /data/
umount /data_ssd/
umount /data_mix/
lvremove -f /dev/datavg/hddlv
lvremove -f /dev/datavg/ssdlv
lvremove -f /dev/datavg/mixlv
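
# hedged alternative: if only the cache should go while keeping the origin LV and its
# data, detach the cache pool instead of removing the whole LV
# (--uncache drops the cache pool and leaves datavg/mixlv as a plain LV)
lvconvert --uncache datavg/mixlv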

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=sync --size=100g 

blktrace /dev/datavg/mixlv /dev/nvme0n1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

blkparse -o /dev/null -i dm-42 -d dm-42.bin
btt -i dm-42.bin

blkparse -o /dev/null -i nvme0n1 -d nvme0n1.bin
btt -i nvme0n1.bin | less

blkparse -o /dev/null -i sdb -d sdb.bin
btt -i sdb.bin | less


dstat -D /dev/mapper/datavg-hddlv,sdd,nvme0n1 -N enp3s0f0

dstat -D /dev/mapper/datavg-hddlv,sdd,nvme0n1 --disk-util 

bmon -p ens8f0,ens8f1,enp3s0f0,enp3s0f1

lvs -o+lv_all datavg/mixlv_corig

lvs -o+Layout datavg/mixlv_corig

lvs -o+CacheReadHits,CacheReadMisses

lvs -o+Layout

blockdev --report 
# https://access.redhat.com/solutions/3588841
/sbin/blockdev --setra 262144 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 8192 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 0 /dev/mapper/datavg-hddlv


hdparm -t /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 4096 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 8192 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 16384 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 32768 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 65536 /dev/mapper/datavg-hddlv
/sbin/blockdev --setra 131072 /dev/mapper/datavg-hddlv

for f in /dev/mapper/datavg-hddlv_rimage_*; do /sbin/blockdev --setra 65536 $f ; done

for f in /dev/mapper/datavg-hddlv_rimage_*; do /sbin/blockdev --setra 131072 $f ; done

blktrace /dev/datavg/hddlv /dev/nvme0n1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# Generate distribution of file sizes from the command prompt
# https://superuser.com/questions/565443/generate-distribution-of-file-sizes-from-the-command-prompt
find /data/mnt/ -type f > list
cat list | xargs ls -l > list.size
cat list.size | awk '{ n=int(log($5)/log(2));                         \
          if (n<10) n=10;                                               \
          size[n]++ }                                                   \
      END { for (i in size) printf("%d %d\n", 2^i, size[i]) }'          \
 | sort -n                                                              \
 | awk 'function human(x) { x[1]/=1024;                                 \
                            if (x[1]>=1024) { x[2]++;                   \
                                              human(x) } }              \
        { a[1]=$1;                                                      \
          a[2]=0;                                                       \
          human(a);                                                     \
          printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }' 
#   1k:      2
#  16k: 18875840
#  64k: 7393088
# 128k: 5093147
# 512k: 1968632
#   1M: 914486

cat list.size | awk '{size[int(log($5)/log(2))]++}END{for (i in size) printf("%10d %3d\n", 2^i, size[i])}' | sort -n

# 5.5
var_basedir="/data_ssd/mnt"
find $var_basedir -type f -size -16k  > list.16k
find $var_basedir -type f -size -128k  -size +16k > list.128k
find $var_basedir -type f -size +128k > list.+128k
find $var_basedir -type f > list


dstat --output /root/dstat.csv -D /dev/mapper/datavg-mixlv,/dev/mapper/datavg-mixlv_corig,sdh,sdab -N bond0

dstat -D /dev/mapper/datavg-hddlv,/dev/datavg/ext4lv,sdh,sdab -N bond0

i=0
while read f; do
  /bin/cp -f $f /data_mix/mnt/$i
  ((i++))
done < list

find /data_mix/mnt/ -type f > list

cat list | shuf > list.shuf.all

cat list.16k | shuf > list.shuf.16k
cat list.128k | shuf > list.shuf.128k
cat list.+128k | shuf > list.shuf.+128k
cat list.128k list.+128k | shuf > list.shuf.+16k

# zte use 1800
var_total=10
rm -f split.list.*


split -n l/$var_total list.shuf.all split.list.all.

split -n l/$var_total list.shuf.16k split.list.16k.
split -n l/$var_total list.shuf.128k split.list.128k.
split -n l/$var_total list.shuf.+128k split.list.+128k.
split -n l/$var_total list.shuf.+16k split.list.+16k.


for f in split.list.16k.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
# for f in split.list.+16k.*; do 
#     cat $f | xargs -I DEMO cat DEMO > /dev/null &
# done
for f in split.list.128k.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
for f in split.list.+128k.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.all.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

ps -ef | grep /data_ssd/mnt | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO

echo "wait to finish"
wait
# while true; do
#   for f in split.list.all.*; do 
#       cat $f | xargs -I DEMO cat DEMO > /dev/null &
#   done
#   echo "wait to finish"
#   wait
# done
kill -9 $(jobs -p)

jobs -p  | xargs kill

ps -ef | grep /mnt/zxdfs | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO

ps -ef | grep /data_mix/mnt | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO




worker-1 host


mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum --disableplugin=subscription-manager  repolist

yum install -y byobu htop sysstat

yum -y update

hostnamectl set-hostname worker-1.ocpsc.redhat.ren

nmcli connection modify eno1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up eno1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log
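
# sketch: if a test client gets banned by mistake, unban it by hand
# (the address below is a placeholder)
# fail2ban-client set sshd unbanip 192.0.2.10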

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 5.5 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
lsblk | grep 5.5 | awk '{print $1}' | wc -l
# 24
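
# sketch for building the 24-disk argument list automatically instead of typing
# it out; assumes all data HDDs report exactly "5.5T" in lsblk as above
var_disks=$(lsblk -dn -o NAME,SIZE | awk '$2=="5.5T" {printf "/dev/%s ", $1}')
echo $var_disks
# pvcreate -y $var_disks
# vgcreate datavg $var_disks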

yum install -y lvm2

pvcreate -y /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

vgcreate datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

vgs

lvcreate --type raid0 -l 100%FREE --stripes 24 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                  xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk 5
iostat -m -x dm-10 5


########################################
# ntp
yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

systemctl disable --now firewalld.service

# setup time server
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.bak

cat << EOF > /etc/chrony.conf
server 117.177.241.16 iburst
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst
server 3.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking
chronyc sources -v

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking
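
# sketch: force an immediate clock step to the new server instead of waiting
# for the slow slew
chronyc makestep
chronyc sources -v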

worker-1 disk

##################################
## config
mkdir -p /app_conf/zxcdn


#########################################
# ssd cache + hdd
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/logical_volume_manager_administration/index#lvm_cache_volume_creation
umount /data
lsblk -d -o name,rota

lvremove  /dev/datavg/datalv

# lsblk | grep 894 | awk '{print $1}'

pvcreate /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/vg_grow
vgextend datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

## raid5

lvcreate --type raid5 -L 3T --stripes 23 -n hddlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 1G --stripes 10 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid5 -L 3T --stripes 23 -n mixlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 1T --stripes 9 -n cache1 datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid5 -L 10G --stripes 9 -n cache1meta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cache1meta datavg/cache1

lvconvert --type cache --cachepool datavg/cache1 datavg/mixlv
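
# optional sketch: the cache is created writethrough by default; writeback
# trades safety for write speed (assumes mixlv is the cached LV created above)
# lvchange --cachemode writeback datavg/mixlv
lvs -o+cache_mode,cache_policy datavg/mixlv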

# lvcreate --type raid5 --stripes 9 -L 1T -I 16M -R 4096K -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk



lvcreate --type raid5 -L 12T --stripes 23 -n mix0lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 4T --stripes 10 -n cachemix0 datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 40G --stripes 10 -n cachemix0meta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cachemix0meta datavg/cachemix0

lvconvert --type cache --cachepool datavg/cachemix0 datavg/mix0lv


lvcreate --type raid5 -L 1T --stripes 23 -n mix0weblv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 162G --stripes 10 -n cachemix0web datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 2G --stripes 10 -n cachemix0webmeta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cachemix0webmeta datavg/cachemix0web

lvconvert --type cache --cachepool datavg/cachemix0web datavg/mix0weblv


# lvcreate --type raid0 -L 200G --stripes 10 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 200G --stripes 4 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/ssd0lv

## raid0 + stripe

lvcreate --type raid0 -L 130T --stripes 24 -n hddlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx







lvcreate --type raid0 -L 900G --stripesize 128k --stripes 24 -n testfslv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

mkfs.ext4 /dev/datavg/testfslv
mount /dev/datavg/testfslv /data_mix






lvcreate --type raid0 -L 5T --stripes 10 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid5 -L 5T --stripes 9 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

mkfs.ext4 /dev/datavg/ssdlv
mount /dev/datavg/ssdlv /data_ssd

rsync -e ssh --info=progress2 -P --delete -ar --files-from=list.20k / 39.134.201.65:/data_ssd/mnt/

rsync -e ssh --info=progress2 -P --delete -ar /data/mnt/ 39.134.201.65:/data_ssd/mnt/

rsync -e ssh --info=progress2 -P --delete -ar /data/mnt/zxdfs/webcache-011/   39.134.201.65:/data_ssd/mnt/zxdfs/webcache-011/

rsync -e ssh --info=progress2 -P --delete -ar /data/mnt/zxdfs/webcache-012/   39.134.201.65:/data_ssd/mnt/zxdfs/webcache-012/
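
# sketch: dry-run first to preview what would be copied or deleted
# (same flags as above plus -n)
rsync -e ssh -n --info=progress2 --delete -ar /data/mnt/ 39.134.201.65:/data_ssd/mnt/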







# slow
lvcreate --type raid0 -L 400G --stripesize 128k --stripes 12 -n testfslv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl 

# Generate distribution of file sizes from the command prompt
# https://superuser.com/questions/565443/generate-distribution-of-file-sizes-from-the-command-prompt
cat list | xargs ls -l > list.size
cat list.size | awk '{ n=int(log($5)/log(2));                         \
          if (n<10) n=10;                                               \
          size[n]++ }                                                   \
      END { for (i in size) printf("%d %d\n", 2^i, size[i]) }'          \
 | sort -n                                                              \
 | awk 'function human(x) { x[1]/=1024;                                 \
                            if (x[1]>=1024) { x[2]++;                   \
                                              human(x) } }              \
        { a[1]=$1;                                                      \
          a[2]=0;                                                       \
          human(a);                                                     \
          printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }' 




lvcreate --type raid0 -L 1T --stripes 24 -n mixlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 300G --stripes 10 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 300G --stripes 10 -n cache1 datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 3G --stripes 10 -n cache1meta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cache1meta datavg/cache1

# lvs -a -o name,size,attr,devices datavg

lvconvert --type cache --cachepool datavg/cache1 datavg/mixlv

# lvs -a -o name,size,attr,devices datavg
# lvs -o+cache_mode datavg

mkfs.xfs /dev/datavg/hddlv
mkfs.xfs /dev/datavg/ssdlv
mkfs.xfs /dev/datavg/mixlv
mkfs.xfs /dev/datavg/mix0lv
mkfs.xfs /dev/datavg/mix0weblv

mkdir -p /data/
mkdir -p /data_ssd/
mkdir -p /data_mix/
mkdir -p /data_mix0
mkdir -p /data_mix0_web/

cat /etc/fstab

cat << EOF >> /etc/fstab
/dev/datavg/hddlv /data                  xfs     defaults        0 0
# /dev/datavg/ssdlv /data_ssd                  xfs     defaults        0 0
# /dev/datavg/mixlv /data_mix                  xfs     defaults        0 0
# /dev/datavg/mix0lv  /data_mix0                  xfs     defaults        0 0
# /dev/datavg/mix0weblv  /data_mix0_web                  xfs     defaults        0 0
EOF

mount -a
df -h | grep \/data

dd if=/dev/zero of=/data/testfile bs=4k count=9999 oflag=dsync
dd if=/dev/zero of=/data_ssd/testfile bs=4k count=9999 oflag=dsync
dd if=/dev/zero of=/data_mix/testfile bs=4k count=9999 oflag=dsync

dd if=/dev/zero of=/data/testfile bs=4M count=9999 oflag=dsync
dd if=/dev/zero of=/data_ssd/testfile bs=4M count=9999 oflag=dsync
dd if=/dev/zero of=/data_mix/testfile bs=4M count=9999 oflag=dsync

dd if=/data/testfile of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_ssd/testfile of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_mix/testfile of=/dev/null bs=4k count=9999 oflag=dsync

dd if=/dev/zero of=/data/testfile.large bs=4M count=9999 oflag=direct
dd if=/dev/zero of=/data_ssd/testfile.large bs=4M count=9999 oflag=direct
dd if=/dev/zero of=/data_mix/testfile.large bs=4M count=9999 oflag=direct

dd if=/dev/zero of=/data/testfile.large bs=4M count=9999
dd if=/dev/zero of=/data_ssd/testfile.large bs=4M count=9999 
dd if=/dev/zero of=/data_mix/testfile.large bs=4M count=9999 

dd if=/data/testfile.large of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_ssd/testfile.large of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_mix/testfile.large of=/dev/null bs=4k count=9999 oflag=dsync

dd if=/data/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync
dd if=/data_ssd/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync
dd if=/data_mix/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync

dd if=/data/testfile.large of=/dev/null bs=4M count=9999
dd if=/data_ssd/testfile.large of=/dev/null bs=4M count=9999
dd if=/data_mix/testfile.large of=/dev/null bs=4M count=9999

dd if=/data/testfile.large of=/dev/null bs=40M count=9999
dd if=/data_ssd/testfile.large of=/dev/null bs=40M count=9999
dd if=/data_mix/testfile.large of=/dev/null bs=40M count=9999
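
# the re-read numbers above can be served from page cache; drop it between runs
# to measure the disks themselves
sync
echo 3 > /proc/sys/vm/drop_caches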

# cleanup
umount /data/
umount /data_ssd/
umount /data_mix/
umount /data_mix0/
lvremove -f /dev/datavg/hddlv
lvremove -f /dev/datavg/ssdlv
lvremove -f /dev/datavg/mixlv
lvremove -f /dev/datavg/mix0lv

# ssd tunning
# https://serverfault.com/questions/80134/linux-md-vs-lvm-performance
hdparm -tT /dev/md0

# https://www.ibm.com/developerworks/cn/linux/l-lo-io-scheduler-optimize-performance/index.html
cat /sys/block/*/queue/scheduler

lsblk | grep 894 | awk '{print $1}' | xargs -I DEMO cat /sys/block/DEMO/queue/scheduler

lsblk | grep 894 | awk '{print "echo deadline > /sys/block/"$1"/queue/scheduler"}' 
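
# sketch: pipe the generated commands into a shell to apply them; the scheduler
# name must be one listed in the queue/scheduler file on this kernel
lsblk | grep 894 | awk '{print "echo deadline > /sys/block/"$1"/queue/scheduler"}' | bash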

iostat -x -m 3 /dev/mapper/datavg-mix0weblv /dev/mapper/datavg-mix0weblv_corig /dev/mapper/datavg-cachemix0web_cdata /dev/mapper/datavg-cachemix0web_cmeta


dstat -D /dev/mapper/datavg-hddlv,sdh,sdab -N bond0

dstat -D /dev/mapper/datavg-hddlv,sdh,sdab --disk-util 

bmon -p eno1,eno2,ens2f0,ens2f1,bond0

lvs -o+lv_all datavg/mixlv_corig

lvs -o+Layout datavg/mixlv_corig

lvs -o+CacheReadHits,CacheReadMisses

lvs -o+Layout

blockdev --report 
# https://access.redhat.com/solutions/3588841
/sbin/blockdev --setra 1048576 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 524288 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 262144 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 131072 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 65536 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 32768 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 16384 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 8192 /dev/mapper/datavg-hddlv

/sbin/blockdev --setra 8192 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx


for f in /dev/mapper/datavg-hddlv_rimage_*; do /sbin/blockdev --setra 8192 $f ; done

for f in /dev/mapper/datavg-hddlv_rimage_*; do /sbin/blockdev --setra 16384 $f ; done

blktrace /dev/datavg/hddlv  /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

blkparse -o /dev/null -i dm-24 -d dm-24.bin
btt -i dm-24.bin | less

blkparse -o /dev/null -i sda -d sda.bin
btt -i sda.bin | less


# 5.5
# find /data/mnt/ -type f -size -2M -size +512k  > list
var_basedir="/data_mix/mnt"
find $var_basedir -type f -size -2M  > list.2m
find $var_basedir -type f -size -10M  -size +2M > list.10m
find $var_basedir -type f -size +10M > list.100m

find /data/mnt/ -type f > list
dstat --output /root/dstat.csv -D /dev/mapper/datavg-mixlv,/dev/mapper/datavg-mixlv_corig,sdh,sdab -N bond0

dstat -D /dev/mapper/datavg-hddlv,/dev/datavg/testfslv,sdh,sdab -N bond0

mkdir -p /data_mix/mnt
i=11265199
while read f; do
  /bin/cp -f $f /data_mix/mnt/$i &
  ((i++))
  if (( $i % 200 == 0 )) ; then
    wait
  fi
done < list.100m

while true; do
  df -h | grep /data
  sleep 60
done

find /data_mix/mnt/ -type f > list

cat list | shuf > list.shuf.all

cat list.2m | shuf > list.shuf.2m
cat list.10m | shuf > list.shuf.10m
cat list.100m | shuf > list.shuf.100m
cat list.10m list.100m | shuf > list.shuf.+2m

# zte use 1800
var_total=10
rm -f split.list.*

split -n l/$var_total list.shuf.all split.list.all.
split -n l/$var_total list.shuf.2m split.list.2m.
split -n l/$var_total list.shuf.10m split.list.10m.
split -n l/$var_total list.shuf.100m split.list.100m.
split -n l/$var_total list.shuf.+2m split.list.+2m.

for f in split.list.2m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
# for f in split.list.+2m.*; do 
#     cat $f | xargs -I DEMO cat DEMO > /dev/null &
# done
for f in split.list.10m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
for f in split.list.100m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.all.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

jobs -p | xargs kill


ps -ef | grep xargs | grep DEMO | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO

ps -ef | grep /data_mix/mnt | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO


rclone sync /data/mnt/ /data/backup/mnt/ -P -L --transfers 64
rclone sync /data/home/ /data/backup/home/ -P -L --transfers 64
rclone sync /data/ztecdn/ /data/backup/ztecdn/ -P -L --transfers 64

rclone sync /data/backup/mnt/ /data/mnt/ -P -L --transfers 64
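
# sketch: verify that source and backup really match after the sync
rclone check /data/mnt/ /data/backup/mnt/ -P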


# check sn
dmidecode -t 1
# # dmidecode 3.2
# Getting SMBIOS data from sysfs.
# SMBIOS 3.0.0 present.

# Handle 0x0001, DMI type 1, 27 bytes
# System Information
#         Manufacturer: Huawei
#         Product Name: 5288 V5
#         Version: Purley
#         Serial Number: 2102312CJSN0K9000028
#         UUID: a659bd21-cc64-83c1-e911-6cd6de4f8050
#         Wake-up Type: Power Switch
#         SKU Number: Purley
#         Family: Purley

# check disk
lshw -c disk
  # *-disk:0
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.2.0
  #      bus info: scsi@0:0.2.0
  #      logical name: /dev/sda
  #      version: T010
  #      serial: xLkuQ2-XVVp-sfs3-8Rgm-vRgS-uysW-ncIudq
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:1
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.3.0
  #      bus info: scsi@0:0.3.0
  #      logical name: /dev/sdb
  #      version: T010
  #      serial: 5d2geD-fGih-Q6yK-2xVs-lWUG-tH38-qQWRC6
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:2
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.c.0
  #      bus info: scsi@0:0.12.0
  #      logical name: /dev/sdk
  #      version: T010
  #      serial: fePKOb-MTZv-j4Xz-qNjo-cPTr-078I-vZYiPH
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:3
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.d.0
  #      bus info: scsi@0:0.13.0
  #      logical name: /dev/sdl
  #      version: T010
  #      serial: fUTBJp-fXg0-0uJX-V4Qp-vSfZ-yxmb-G8LNam
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:4
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.e.0
  #      bus info: scsi@0:0.14.0
  #      logical name: /dev/sdm
  #      version: T010
  #      serial: SNfxce-ytX2-7j4p-opnQ-lOxC-AFIp-VbCfec
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:5
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.f.0
  #      bus info: scsi@0:0.15.0
  #      logical name: /dev/sdn
  #      version: T010
  #      serial: HJqH2G-XT7i-2R27-dSb0-q36n-T4Ut-Ml4GiE
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:6
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.10.0
  #      bus info: scsi@0:0.16.0
  #      logical name: /dev/sdo
  #      version: T010
  #      serial: IBh87y-SOWJ-rI3R-Mshu-agWM-TyHs-6ko0iu
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:7
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.11.0
  #      bus info: scsi@0:0.17.0
  #      logical name: /dev/sdp
  #      version: T010
  #      serial: erBKxc-gBsD-msEq-aXMJ-8akE-FGRb-SjBk1w
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:8
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.12.0
  #      bus info: scsi@0:0.18.0
  #      logical name: /dev/sdq
  #      version: T010
  #      serial: HsiL2h-6736-4x4H-0OTz-HuXj-My1c-RRShQP
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:9
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.13.0
  #      bus info: scsi@0:0.19.0
  #      logical name: /dev/sdr
  #      version: T010
  #      serial: yZQ8MH-7SCw-KIFL-fphN-S0W0-GS4V-Wc2gwx
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:10
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.14.0
  #      bus info: scsi@0:0.20.0
  #      logical name: /dev/sds
  #      version: T010
  #      serial: pp6xvN-MBT9-aLkB-65hF-7fwE-29vt-hA51K9
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:11
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.15.0
  #      bus info: scsi@0:0.21.0
  #      logical name: /dev/sdt
  #      version: T010
  #      serial: jXj3cL-qvoJ-JWP0-jvp9-WEbn-yD63-e6vFmP
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:12
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.4.0
  #      bus info: scsi@0:0.4.0
  #      logical name: /dev/sdc
  #      version: T010
  #      serial: Ca6Nyo-Oq5p-UdAY-oqIs-DlK5-1PPy-ugvF3P
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:13
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.16.0
  #      bus info: scsi@0:0.22.0
  #      logical name: /dev/sdu
  #      version: T010
  #      serial: GOTXh2-34fo-rZfh-IB5d-RkwW-o5EC-rDD4R1
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:14
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.17.0
  #      bus info: scsi@0:0.23.0
  #      logical name: /dev/sdv
  #      version: T010
  #      serial: 7Yn8xd-68Xu-A0RC-nx5Q-YEvJ-QPEG-CwjkP0
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:15
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.18.0
  #      bus info: scsi@0:0.24.0
  #      logical name: /dev/sdw
  #      version: T010
  #      serial: hdz5tv-f2Zm-wuf8-qtKO-XIlN-4Z1E-uHapKc
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:16
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.19.0
  #      bus info: scsi@0:0.25.0
  #      logical name: /dev/sdx
  #      version: T010
  #      serial: C3VFhO-mh9a-vKIR-Gi1o-pc05-LOqY-oErH8r
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:17
  #      description: SCSI Disk
  #      product: HW-SAS3408
  #      vendor: AVAGO
  #      physical id: 2.0.0
  #      bus info: scsi@0:2.0.0
  #      logical name: /dev/sdy
  #      version: 5.06
  #      serial: 00457f537b174eb025007018406c778a
  #      size: 446GiB (478GB)
  #      capabilities: gpt-1.00 partitioned partitioned:gpt
  #      configuration: ansiversion=5 guid=f72b8f56-6e5d-4a0c-a2a0-bf641ac2c2ff logicalsectorsize=512 sectorsize=4096
  # *-disk:18
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.5.0
  #      bus info: scsi@0:0.5.0
  #      logical name: /dev/sdd
  #      version: T010
  #      serial: 1sulWQ-pttz-zf0P-WTEe-cydl-lY6Q-CdX4Hv
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:19
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.6.0
  #      bus info: scsi@0:0.6.0
  #      logical name: /dev/sde
  #      version: T010
  #      serial: JF6q37-XaYh-qoXg-mPeZ-4Ofr-Qrkt-nh21RR
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:20
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.7.0
  #      bus info: scsi@0:0.7.0
  #      logical name: /dev/sdf
  #      version: T010
  #      serial: vvF48a-k1sq-7v1m-dpSh-yb50-KLLk-otk7lA
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:21
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.8.0
  #      bus info: scsi@0:0.8.0
  #      logical name: /dev/sdg
  #      version: T010
  #      serial: NHU0VX-vm31-DyRP-V4dc-gx7T-dXGI-Bb8qlw
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:22
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.9.0
  #      bus info: scsi@0:0.9.0
  #      logical name: /dev/sdh
  #      version: T010
  #      serial: jCIRNL-K08S-oYZc-Q5Eb-Y2ht-0NYt-0luz1T
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:23
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.a.0
  #      bus info: scsi@0:0.10.0
  #      logical name: /dev/sdi
  #      version: T010
  #      serial: wiQiLJ-Arua-8vcg-m6ta-KgSL-f1kD-rgzKxD
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:24
  #      description: ATA Disk
  #      product: HUS726T6TALE600
  #      physical id: 0.b.0
  #      bus info: scsi@0:0.11.0
  #      logical name: /dev/sdj
  #      version: T010
  #      serial: T7vZ96-uTGr-tvFz-jKoZ-479j-vRvh-WeCVRJ
  #      size: 5589GiB (6001GB)
  #      capacity: 5589GiB (6001GB)
  #      capabilities: 7200rpm lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:0
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.e.0
  #      bus info: scsi@15:0.14.0
  #      logical name: /dev/sdz
  #      version: M030
  #      serial: HE21uM-4KRw-heFX-IFVf-zO8Y-Rzah-ncwlwL
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:1
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.f.0
  #      bus info: scsi@15:0.15.0
  #      logical name: /dev/sdaa
  #      version: M030
  #      serial: RGeqtd-dTEc-hV8g-Xd9o-I1Ke-sDH1-UK6mZg
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:2
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.10.0
  #      bus info: scsi@15:0.16.0
  #      logical name: /dev/sdab
  #      version: M030
  #      serial: 1ROsNp-0J4j-DuWM-1nNl-Fo3K-gWfg-d7VDLq
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:3
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.11.0
  #      bus info: scsi@15:0.17.0
  #      logical name: /dev/sdac
  #      version: M030
  #      serial: s0XeSI-Zl3B-0xcU-8wi3-BvVo-vU3k-cLZx22
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:4
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.12.0
  #      bus info: scsi@15:0.18.0
  #      logical name: /dev/sdad
  #      version: M030
  #      serial: rZZ7yM-KImV-6Ld8-xmOJ-KyiC-Wstp-4t35S3
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:5
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.13.0
  #      bus info: scsi@15:0.19.0
  #      logical name: /dev/sdae
  #      version: M030
  #      serial: LI50dd-vn2G-RiYE-5iuL-nxYI-TXCT-zs1lSY
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:6
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.14.0
  #      bus info: scsi@15:0.20.0
  #      logical name: /dev/sdaf
  #      version: M030
  #      serial: 2hkDxG-90a2-mkEJ-GxmQ-doAv-SPT1-8qyo10
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:7
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.15.0
  #      bus info: scsi@15:0.21.0
  #      logical name: /dev/sdag
  #      version: M030
  #      serial: bMQrTa-IKF7-vDFU-5RSR-cj4a-cOUL-QAY2yI
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:8
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.16.0
  #      bus info: scsi@15:0.22.0
  #      logical name: /dev/sdah
  #      version: M030
  #      serial: q0VZpE-4sub-HKbe-RkRx-G0wM-HOeU-NDRXRe
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:9
  #      description: ATA Disk
  #      product: MTFDDAK960TDC-1A
  #      physical id: 0.17.0
  #      bus info: scsi@15:0.23.0
  #      logical name: /dev/sdai
  #      version: M030
  #      serial: fEj7Rr-FSS8-ruwb-IjSj-xW6l-oj6v-q1pSNV
  #      size: 894GiB (960GB)
  #      capacity: 894GiB (960GB)
  #      capabilities: lvm2
  #      configuration: ansiversion=6 logicalsectorsize=512 sectorsize=4096
  # *-disk:10
  #      description: SCSI Disk
  #      product: HW-SAS3408
  #      vendor: AVAGO
  #      physical id: 2.0.0
  #      bus info: scsi@15:2.0.0
  #      logical name: /dev/sdaj
  #      version: 5.06
  #      serial: 00a6b489499e4cb02500904af3624ac6
  #      size: 893GiB (958GB)
  #      capabilities: partitioned partitioned:dos
  #      configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=550d3974

yum -y install fio

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-ev-performance-testing

lvs -o+cache_policy,cache_settings,chunksize datavg/mix0weblv
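
# tuning sketch: the policy and settings printed above can be changed online;
# the values here are only an illustration, not measured recommendations
# lvchange --cachepolicy smq datavg/mix0weblv
# lvchange --cachesettings 'migration_threshold=16384' datavg/mix0weblv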

# https://access.redhat.com/solutions/2961861
for i in  /proc/[0-9]* ; do echo $i >> /tmp/mountinfo ;  grep -q "/dev/mapper/datavg-mix0weblv" $i/mountinfo ; echo $? >> /tmp/mountinfo ; done

grep -B 1 '^0$' /tmp/mountinfo 
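
# simpler sketch to list the processes holding the mount point, assuming the LV
# is mounted at /data_mix0_web as in the fstab entries above
fuser -vm /data_mix0_web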

lvcreate --type raid5 -L 120G --stripes 23 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=587MiB/s (615MB/s), 587MiB/s-587MiB/s (615MB/s-615MB/s), io=79.9GiB (85.8GB), run=139473-139473msec
#   WRITE: bw=147MiB/s (155MB/s), 147MiB/s-147MiB/s (155MB/s-155MB/s), io=20.1GiB (21.6GB), run=139473-139473msec

lvcreate --type raid6 -L 120G --stripes 22 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=586MiB/s (614MB/s), 586MiB/s-586MiB/s (614MB/s-614MB/s), io=79.9GiB (85.8GB), run=139739-139739msec
#   WRITE: bw=147MiB/s (154MB/s), 147MiB/s-147MiB/s (154MB/s-154MB/s), io=20.1GiB (21.6GB), run=139739-139739msec

lvcreate --type raid0 -L 120G --stripes 24 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=1139MiB/s (1194MB/s), 1139MiB/s-1139MiB/s (1194MB/s-1194MB/s), io=79.9GiB (85.8GB), run=71841-71841msec
#   WRITE: bw=286MiB/s (300MB/s), 286MiB/s-286MiB/s (300MB/s-300MB/s), io=20.1GiB (21.6GB), run=71841-71841msec

lvcreate --type raid0 -L 100G --stripes 10 -n mixtestlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=1358MiB/s (1424MB/s), 1358MiB/s-1358MiB/s (1424MB/s-1424MB/s), io=79.9GiB (85.8GB), run=60282-60282msec
#   WRITE: bw=341MiB/s (358MB/s), 341MiB/s-341MiB/s (358MB/s-358MB/s), io=20.1GiB (21.6GB), run=60282-60282msec


lvcreate --type raid5 -L 100G --stripes 9 -n mixtestlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv



lvcreate --type raid6 -L 100G --stripes 9 -n mixtestlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

lvremove -f datavg/mixtestlv



lvcreate --type raid5 -L 120G --stripes 23 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 40G --stripes 10 -n cachetest datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 400M --stripes 10 -n cachetestmeta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cachetestmeta datavg/cachetest

lvconvert --type cache --cachepool datavg/cachetest datavg/mixtestlv

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g -random_distribution=zoned:60/10:30/20:8/30:2/40

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=716MiB/s (750MB/s), 716MiB/s-716MiB/s (750MB/s-750MB/s), io=31.0GiB (34.3GB), run=45744-45744msec
#   WRITE: bw=180MiB/s (189MB/s), 180MiB/s-180MiB/s (189MB/s-189MB/s), io=8228MiB (8628MB), run=45744-45744msec

lvcreate --type raid5 -L 120G --stripes 23 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 40G --stripes 9 -n cachetest datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid5 -L 400M --stripes 9 -n cachetestmeta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cachetestmeta datavg/cachetest

lvconvert --type cache --cachepool datavg/cachetest datavg/mixtestlv

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g -random_distribution=zoned:60/10:30/20:8/30:2/40

lvremove -f datavg/mixtestlv
# Run status group 0 (all jobs):
#    READ: bw=487MiB/s (511MB/s), 487MiB/s-487MiB/s (511MB/s-511MB/s), io=79.9GiB (85.8GB), run=167880-167880msec
#   WRITE: bw=122MiB/s (128MB/s), 122MiB/s-122MiB/s (128MB/s-128MB/s), io=20.1GiB (21.6GB), run=167880-167880msec

lvcreate -L 100G -n singledisklv datavg /dev/sda

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/singledisklv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g -random_distribution=zoned:60/10:30/20:8/30:2/40

lvremove -f datavg/singledisklv
# Run status group 0 (all jobs):
#    READ: bw=151MiB/s (158MB/s), 151MiB/s-151MiB/s (158MB/s-158MB/s), io=44.2GiB (47.5GB), run=300031-300031msec
#   WRITE: bw=37.0MiB/s (39.8MB/s), 37.0MiB/s-37.0MiB/s (39.8MB/s-39.8MB/s), io=11.1GiB (11.9GB), run=300031-300031msec

lvcreate -L 20G -n singledisklv datavg /dev/sdai

fio --rw=rw --rwmixread=80 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/singledisklv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=20g -random_distribution=zoned:60/10:30/20:8/30:2/40

lvremove -f datavg/singledisklv
# Run status group 0 (all jobs):
#    READ: bw=431MiB/s (452MB/s), 431MiB/s-431MiB/s (452MB/s-452MB/s), io=16.0GiB (17.2GB), run=38005-38005msec
#   WRITE: bw=108MiB/s (113MB/s), 108MiB/s-108MiB/s (113MB/s-113MB/s), io=4088MiB (4287MB), run=38005-38005msec

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=sync --size=100g 

blktrace /dev/datavg/mixlv 
# http benchmark tools
yum install httpd-tools
# https://github.com/philipgloyne/apachebench-for-multi-url
# https://hub.docker.com/r/chrisipa/ab-multi-url
# https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/
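
# single-url sketch with ab from httpd-tools (the url below is a placeholder)
# ab -n 10000 -c 64 http://39.134.201.65/somefile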


fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g

fio --rw=rw --rwmixread=99 --bsrange=128k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g



worker-1 nic bond

ip link show
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bd:24 brd ff:ff:ff:ff:ff:ff
# 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bd:25 brd ff:ff:ff:ff:ff:ff
# 4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 08:4f:0a:b5:a2:be brd ff:ff:ff:ff:ff:ff
# 5: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bd:26 brd ff:ff:ff:ff:ff:ff
# 6: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bd:27 brd ff:ff:ff:ff:ff:ff
# 7: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 08:4f:0a:b5:a2:bf brd ff:ff:ff:ff:ff:ff

ip a s eno1
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether cc:64:a6:59:bd:24 brd ff:ff:ff:ff:ff:ff
#     inet 39.134.201.65/27 brd 39.134.201.95 scope global noprefixroute eno1
#        valid_lft forever preferred_lft forever
#     inet6 fe80::149f:d0ce:2700:4bf2/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever

ethtool eno1  # 10000baseT/Full
ethtool eno2  # 10000baseT/Full
ethtool eno3  # 1000baseT/Full
ethtool eno4  # 1000baseT/Full
ethtool ens2f0  #  10000baseT/Full
ethtool ens2f1  #  10000baseT/Full

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad 

nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname eno2 con-name eno2 master bond0
nmcli con add type bond-slave ifname ens2f0 con-name ens2f0 master bond0
nmcli con add type bond-slave ifname ens2f1 con-name ens2f1 master bond0

nmcli con down eno2
nmcli con up eno2
nmcli con down ens2f0
nmcli con up ens2f0
nmcli con down ens2f1
nmcli con up ens2f1
nmcli con down bond0
nmcli con up bond0


#######################################
# nic bond
cat > /root/nic.bond.sh << 'EOF'
#!/bin/bash

set -x 

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete  ${i} ; done 

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad \
    ipv4.method 'manual' \
    ipv4.address '39.134.201.65/27' \
    ipv4.gateway '39.134.201.94' \
    ipv4.dns '117.177.241.16'
    
nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3

nmcli con add type bond-slave ifname eno1 con-name eno1 master bond0    
nmcli con add type bond-slave ifname eno2 con-name eno2 master bond0
nmcli con add type bond-slave ifname ens2f0 con-name ens2f0 master bond0
nmcli con add type bond-slave ifname ens2f1 con-name ens2f1 master bond0

systemctl restart network

EOF

cat > /root/nic.restore.sh << 'EOF'
#!/bin/bash

set -x 

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete  ${i} ; done 

# re-create primary connection 
nmcli con add type ethernet \
    con-name eno1 \
    ifname eno1 \
    ipv4.method 'manual' \
    ipv4.address '39.134.201.65/27' \
    ipv4.gateway '39.134.201.94' \
    ipv4.dns '117.177.241.16'

systemctl restart network

exit 0
EOF

chmod +x /root/nic.restore.sh

cat > ~/cron-network-con-recreate << EOF
*/20 * * * * /bin/bash /root/nic.restore.sh
EOF

crontab ~/cron-network-con-recreate

bash /root/nic.bond.sh

# debug
cat /proc/net/bonding/bond0
cat /sys/class/net/bond*/bonding/xmit_hash_policy
# https://access.redhat.com/solutions/666853
ip -s -h link show master bond0

worker-2 host


mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum --disableplugin=subscription-manager  repolist

yum install -y byobu htop sysstat

yum -y update

hostnamectl set-hostname worker-2.ocpsc.redhat.ren

nmcli connection modify eno1 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up eno1

yum -y install fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

EOF

systemctl enable fail2ban
systemctl restart fail2ban

cat << EOF > /etc/fail2ban/jail.d/wzh.conf
[sshd]
enabled = true

[recidive]
enabled = true

EOF

systemctl restart fail2ban

fail2ban-client status sshd
fail2ban-client status recidive
systemctl status fail2ban
tail -F /var/log/fail2ban.log

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK
sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config

diff /etc/ssh/sshd_config /etc/ssh/sshd_config.BAK

systemctl restart sshd

passwd

useradd -m wzh

lsblk | grep 5.5 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
lsblk | grep 5.5 | awk '{print $1}' | wc -l
# 24

yum install -y lvm2

pvcreate -y /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

vgcreate datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

vgs

lvcreate --type raid0 -l 100%FREE --stripes 24 -n datalv datavg

mkfs.xfs /dev/datavg/datalv

lvdisplay /dev/datavg/datalv -m

mkdir -p /data

cp /etc/fstab /etc/fstab.bak

cat << EOF >> /etc/fstab
/dev/datavg/datalv /data                  xfs     defaults        0 0

EOF

mount -a

yum install -y sysstat
lsblk | grep disk | awk '{print $1}' | xargs -I DEMO echo -n "DEMO "
# sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm
iostat -m -x sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk 5
iostat -m -x dm-10 5


########################################
# ntp
yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

systemctl disable --now firewalld.service

# setup time server
/bin/cp -f /etc/chrony.conf /etc/chrony.conf.bak

cat << EOF > /etc/chrony.conf
server 117.177.241.16 iburst
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst
server 3.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking
chronyc sources -v

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking


worker-2 disk



#########################################
# ssd cache + hdd
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/logical_volume_manager_administration/index#lvm_cache_volume_creation
umount /data
lsblk -d -o name,rota

lvremove  /dev/datavg/datalv

# lsblk | grep 894 | awk '{print $1}'

pvcreate /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/vg_grow
vgextend datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

## raid5

lvcreate --type raid5 -L 1G --stripes 23 -n hddlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 1G --stripes 23 -n mixlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 1G --stripes 9 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai


lvcreate --type raid5 -L 3T --stripes 23 -n mix0lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx


lvcreate --type raid0 -L 1.3536T --stripes 10 -n cachemix0 datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 13G --stripes 10 -n cachemix0meta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cachemix0meta datavg/cachemix0

lvconvert --type cache --cachepool datavg/cachemix0 datavg/mix0lv

# lvcreate --type raid5 --stripes 9 -L 1T -I 16M -R 4096K -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk


## raid0 + stripe



lvcreate --type raid0 -L 1T --stripes 24 -n hdd0lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2453MiB/s (2572MB/s), 2453MiB/s-2453MiB/s (2572MB/s-2572MB/s), io=98.0GiB (106GB), run=41331-41331msec
#   WRITE: bw=24.9MiB/s (26.1MB/s), 24.9MiB/s-24.9MiB/s (26.1MB/s-26.1MB/s), io=1029MiB (1079MB), run=41331-41331msec
lvs -o+stripesize,chunksize datavg/hdd0lv
  # LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe Chunk
  # hdd0lv datavg rwi-aor--- 1.00t                                                     64.00k    0
lvremove -f datavg/hdd0lv

lvcreate --type raid0 -L 1T -I 128 --stripes 24 -n hdd1lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd1lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2674MiB/s (2804MB/s), 2674MiB/s-2674MiB/s (2804MB/s-2804MB/s), io=98.0GiB (106GB), run=37912-37912msec
#   WRITE: bw=27.1MiB/s (28.4MB/s), 27.1MiB/s-27.1MiB/s (28.4MB/s-28.4MB/s), io=1029MiB (1079MB), run=37912-37912msec
lvs -o+stripesize,chunksize datavg/hdd1lv
  # LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # hdd1lv datavg rwi-a-r--- 1.00t                                                     128.00k    0
lvremove -f datavg/hdd1lv

lvcreate --type raid0 -L 1T -I 256 --stripes 24 -n hdd1lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd1lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2674MiB/s (2804MB/s), 2674MiB/s-2674MiB/s (2804MB/s-2804MB/s), io=98.0GiB (106GB), run=37912-37912msec
#   WRITE: bw=27.1MiB/s (28.4MB/s), 27.1MiB/s-27.1MiB/s (28.4MB/s-28.4MB/s), io=1029MiB (1079MB), run=37912-37912msec
lvs -o+stripesize,chunksize datavg/hdd1lv
  # LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # hdd1lv datavg rwi-a-r--- 1.00t                                                     256.00k    0
lvremove -f datavg/hdd1lv


lvcreate --type raid0 -L 300G --stripes 10 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2602MiB/s (2728MB/s), 2602MiB/s-2602MiB/s (2728MB/s-2728MB/s), io=98.0GiB (106GB), run=38965-38965msec
#   WRITE: bw=26.4MiB/s (27.7MB/s), 26.4MiB/s-26.4MiB/s (27.7MB/s-27.7MB/s), io=1029MiB (1079MB), run=38965-38965msec
lvs -o+stripesize,chunksize datavg/ssd0lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe Chunk
  # ssd0lv datavg rwi-a-r--- 300.00g                                                     64.00k    0
lvremove -f datavg/ssd0lv

lvcreate --type raid0 -L 300G -I 128 --stripes 10 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2438MiB/s (2556MB/s), 2438MiB/s-2438MiB/s (2556MB/s-2556MB/s), io=98.0GiB (106GB), run=41584-41584msec
#   WRITE: bw=24.7MiB/s (25.9MB/s), 24.7MiB/s-24.7MiB/s (25.9MB/s-25.9MB/s), io=1029MiB (1079MB), run=41584-41584msec
lvs -o+stripesize,chunksize datavg/ssd0lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # ssd0lv datavg rwi-a-r--- 300.00g                                                     128.00k    0
lvremove -f datavg/ssd0lv

lvcreate --type raid0 -L 300G -I 256 --stripes 10 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=1908MiB/s (2000MB/s), 1908MiB/s-1908MiB/s (2000MB/s-2000MB/s), io=98.0GiB (106GB), run=53135-53135msec
#   WRITE: bw=19.4MiB/s (20.3MB/s), 19.4MiB/s-19.4MiB/s (20.3MB/s-20.3MB/s), io=1029MiB (1079MB), run=53135-53135msec
lvs -o+stripesize,chunksize datavg/ssd0lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # ssd0lv datavg rwi-a-r--- 300.00g                                                     256.00k    0
lvremove -f datavg/ssd0lv


lvcreate --type raid5 -L 120G --stripes 23 -n hdd5lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd5lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=474MiB/s (497MB/s), 474MiB/s-474MiB/s (497MB/s-497MB/s), io=98.0GiB (106GB), run=214073-214073msec
#   WRITE: bw=4920KiB/s (5038kB/s), 4920KiB/s-4920KiB/s (5038kB/s-5038kB/s), io=1029MiB (1079MB), run=214073-214073msec
lvs -o+stripesize,chunksize datavg/hdd5lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe Chunk
  # hdd5lv datavg rwi-a-r--- 120.03g                                    100.00           64.00k    0
lvremove -f datavg/hdd5lv


lvcreate --type raid5 -L 120G -I 128 --stripes 23 -n hdd5lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd5lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=449MiB/s (471MB/s), 449MiB/s-449MiB/s (471MB/s-471MB/s), io=98.0GiB (106GB), run=225892-225892msec
#   WRITE: bw=4663KiB/s (4775kB/s), 4663KiB/s-4663KiB/s (4775kB/s-4775kB/s), io=1029MiB (1079MB), run=225892-225892msec
lvs -o+stripesize,chunksize datavg/hdd5lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # hdd5lv datavg rwi-a-r--- 120.03g                                    100.00           128.00k    0
lvremove -f datavg/hdd5lv


lvcreate --type raid5 -L 120G --stripes 23 -n mixtestlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 40G --stripes 10 -n cachetest datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 1G --stripes 10 -n cache1testmeta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cache1testmeta datavg/cachetest

lvconvert --type cache --cachepool datavg/cachetest datavg/mixtestlv

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/mixtestlv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=449MiB/s (471MB/s), 449MiB/s-449MiB/s (471MB/s-471MB/s), io=98.0GiB (106GB), run=225892-225892msec
#   WRITE: bw=4663KiB/s (4775kB/s), 4663KiB/s-4663KiB/s (4775kB/s-4775kB/s), io=1029MiB (1079MB), run=225892-225892msec
lvs -o+stripesize,chunksize datavg/mixtestlv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe  Chunk
  # hdd5lv datavg rwi-a-r--- 120.03g                                    100.00           128.00k    0
lvremove -f datavg/mixtestlv



lvcreate --type raid0 -L 1T --stripes 24 -n hdd1lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

fio --rw=randrw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/hdd1lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=2453MiB/s (2572MB/s), 2453MiB/s-2453MiB/s (2572MB/s-2572MB/s), io=98.0GiB (106GB), run=41331-41331msec
#   WRITE: bw=24.9MiB/s (26.1MB/s), 24.9MiB/s-24.9MiB/s (26.1MB/s-26.1MB/s), io=1029MiB (1079MB), run=41331-41331msec
lvs -o+stripesize,chunksize datavg/hdd1lv
  # LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe Chunk
  # hdd0lv datavg rwi-aor--- 1.00t                                                     64.00k    0
lvremove -f datavg/hdd1lv



lvcreate --type raid0 -L 300G --stripes 10 -n ssd0lv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

fio --rw=randrw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --filename=/dev/datavg/ssd0lv --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=1 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 
# Run status group 0 (all jobs):
#    READ: bw=1527MiB/s (1601MB/s), 1527MiB/s-1527MiB/s (1601MB/s-1601MB/s), io=98.0GiB (106GB), run=66375-66375msec
#   WRITE: bw=15.5MiB/s (16.2MB/s), 15.5MiB/s-15.5MiB/s (16.2MB/s-16.2MB/s), io=1029MiB (1079MB), run=66375-66375msec
lvs -o+stripesize,chunksize datavg/ssd0lv
  # LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Stripe Chunk
  # ssd0lv datavg rwi-a-r--- 300.00g                                                     64.00k    0
lvremove -f datavg/ssd0lv








lvcreate --type raid0 -L 1G --stripes 24 -n hddlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx



lvcreate --type raid0 -L 130T --stripes 24 -n mixlv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

# lvcreate --type raid0 -L 300G --stripes 10 -n ssdlv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 8.6T --stripes 10 -n cache1 datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 40G --stripes 10 -n cache1meta datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool --poolmetadata datavg/cache1meta datavg/cache1

# lvs -a -o name,size,attr,devices datavg

lvconvert --type cache --cachepool datavg/cache1 datavg/mixlv

lvconvert --splitcache datavg/mixlv
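# note: --splitcache only detaches the cache pool (both LVs are kept, so the cache can be
# re-attached later); --uncache would flush and remove the cache pool entirely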

# lvs -a -o name,size,attr,devices datavg
# lvs -o+cache_mode datavg

mkfs.xfs /dev/datavg/hddlv
mkfs.xfs /dev/datavg/ssdlv
mkfs.xfs /dev/datavg/mixlv
mkfs.xfs /dev/datavg/mix0lv

mkdir -p /data/
mkdir -p /data_ssd/
mkdir -p /data_mix/
mkdir -p /data_mix0

cat /etc/fstab

cat << EOF >> /etc/fstab
/dev/datavg/hddlv /data                  xfs     defaults        0 0
/dev/datavg/ssdlv /data_ssd                  xfs     defaults        0 0
/dev/datavg/mixlv /data_mix                  xfs     defaults        0 0
/dev/datavg/mix0lv  /data_mix0                  xfs     defaults        0 0
EOF

mount -a
df -h | grep \/data

dd if=/dev/zero of=/data/testfile bs=4k count=9999 oflag=dsync
dd if=/dev/zero of=/data_ssd/testfile bs=4k count=9999 oflag=dsync
dd if=/dev/zero of=/data_mix/testfile bs=4k count=9999 oflag=dsync

dd if=/dev/zero of=/data/testfile bs=4M count=9999 oflag=dsync
dd if=/dev/zero of=/data_ssd/testfile bs=4M count=9999 oflag=dsync
dd if=/dev/zero of=/data_mix/testfile bs=4M count=9999 oflag=dsync

dd if=/dev/zero of=/data/testfile.large bs=4M count=9999 oflag=direct
dd if=/dev/zero of=/data_ssd/testfile.large bs=4M count=9999 oflag=direct
dd if=/dev/zero of=/data_mix/testfile.large bs=4M count=9999 oflag=direct

dd if=/dev/zero of=/data/testfile.large bs=4M count=9999
dd if=/dev/zero of=/data_ssd/testfile.large bs=4M count=9999 
dd if=/dev/zero of=/data_mix/testfile.large bs=4M count=9999 

dd if=/data/testfile.large of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_ssd/testfile.large of=/dev/null bs=4k count=9999 oflag=dsync
dd if=/data_mix/testfile.large of=/dev/null bs=4k count=999999 oflag=dsync

dd if=/data/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync
dd if=/data_ssd/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync
dd if=/data_mix/testfile.large of=/dev/null bs=4M count=9999 oflag=dsync

dd if=/data/testfile.large of=/dev/null bs=4M count=9999
dd if=/data_ssd/testfile.large of=/dev/null bs=4M count=9999
dd if=/data_mix/testfile.large of=/dev/null bs=4M count=9999

# cleanup
umount /data/
umount /data_ssd/
umount /data_mix/
umount /data_mix0/
lvremove -f /dev/datavg/hddlv
lvremove -f /dev/datavg/ssdlv
lvremove -f /dev/datavg/mixlv
lvremove -f /dev/datavg/mix0lv


# ssd tuning
# https://serverfault.com/questions/80134/linux-md-vs-lvm-performance
hdparm -tT /dev/md0

# https://www.ibm.com/developerworks/cn/linux/l-lo-io-scheduler-optimize-performance/index.html
cat /sys/block/*/queue/scheduler

lsblk | grep 894 | awk '{print $1}' | xargs -I DEMO cat /sys/block/DEMO/queue/scheduler

lsblk | grep 894 | awk '{print "echo deadline > /sys/block/"$1"/queue/scheduler"}' 
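# the line above only prints the echo commands; to actually apply them
# (assuming the 894G devices are the SSDs, as in the grep above):
lsblk | grep 894 | awk '{print $1}' | xargs -I DEMO sh -c 'echo deadline > /sys/block/DEMO/queue/scheduler'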

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=libaio --numjobs=1 --thread \
    --norandommap --runtime=300 --direct=0 --iodepth=8 \
    --scramble_buffers=1 --offset=0 --size=100g 

fio --rw=rw --rwmixread=99 --bsrange=4k-256k --name=vdo \
    --directory=./ --ioengine=sync --size=100g 

blktrace /dev/datavg/mix0lv /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai     /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

blkparse -o /dev/null -i dm-244 -d dm-244.bin
btt -i dm-244.bin | less

blkparse -o /dev/null -i sdaa -d sdaa.bin
btt -i sdaa.bin | less

blkparse -o /dev/null -i sda -d sda.bin
btt -i sda.bin | less


blktrace /dev/datavg/ssd0lv /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai    


lvmconfig --typeconfig default --withcomments --withspaces

lvmconfig --type default --withcomments allocation/cache_policy
lvmconfig --type default --withcomments allocation/cache_settings
lvmconfig --type list --withcomments allocation/cache_settings

iostat -x -m 3 /dev/mapper/datavg-mixlv sdh sdab

dstat -D /dev/mapper/datavg-mixlv,/dev/mapper/datavg-mixlv_corig,sdh,sdab -N bond0

dstat -D /dev/mapper/datavg-mixlv,/dev/mapper/datavg-mixlv_corig,sdh,sdab --disk-util 

bmon -p eno1,eno2,ens2f0,ens2f1,bond0

lvs -o+lv_all datavg/mixlv_corig

lvs -o+Layout datavg/mixlv_corig

lvs -o+CacheReadHits,CacheReadMisses

lvs -o+Layout


blockdev --report    
# RO    RA   SSZ   BSZ   StartSec            Size   Device
# rw  8192   512  4096          0    478998953984   /dev/sdy
# rw  8192   512   512       2048      1073741824   /dev/sdy1
# rw  8192   512  4096    2099200      1073741824   /dev/sdy2
# rw  8192   512  4096    4196352    476849373184   /dev/sdy3
# rw  8192   512  4096          0    958999298048   /dev/sdaj
# rw  8192   512  4096       2048    958998249472   /dev/sdaj1
# rw  8192   512  4096          0   6001175126016   /dev/sda
# rw  8192   512  4096          0   6001175126016   /dev/sdd
# rw  8192   512  4096          0   6001175126016   /dev/sde
# rw  8192   512  4096          0   6001175126016   /dev/sdc
# rw  8192   512  4096          0   6001175126016   /dev/sdf
# rw  8192   512  4096          0   6001175126016   /dev/sdb
# rw  8192   512  4096          0   6001175126016   /dev/sdg
# rw  8192   512  4096          0   6001175126016   /dev/sdh
# rw  8192   512  4096          0   6001175126016   /dev/sdk
# rw  8192   512  4096          0   6001175126016   /dev/sdi
# rw  8192   512  4096          0   6001175126016   /dev/sdm
# rw  8192   512  4096          0   6001175126016   /dev/sdj
# rw  8192   512  4096          0   6001175126016   /dev/sdl
# rw  8192   512  4096          0   6001175126016   /dev/sdn
# rw  8192   512  4096          0   6001175126016   /dev/sdo
# rw  8192   512  4096          0   6001175126016   /dev/sdp
# rw  8192   512  4096          0   6001175126016   /dev/sdx
# rw  8192   512  4096          0   6001175126016   /dev/sdq
# rw  8192   512  4096          0   6001175126016   /dev/sdr
# rw  8192   512  4096          0   6001175126016   /dev/sdu
# rw  8192   512  4096          0   6001175126016   /dev/sdw
# rw  8192   512  4096          0   6001175126016   /dev/sds
# rw  8192   512  4096          0   6001175126016   /dev/sdt
# rw  8192   512  4096          0   6001175126016   /dev/sdv
# rw  8192   512  4096          0    960197124096   /dev/sdz
# rw  8192   512  4096          0    960197124096   /dev/sdaa
# rw  8192   512  4096          0    960197124096   /dev/sdac
# rw  8192   512  4096          0    960197124096   /dev/sdab
# rw  8192   512  4096          0    960197124096   /dev/sdad
# rw  8192   512  4096          0    960197124096   /dev/sdae
# rw  8192   512  4096          0    960197124096   /dev/sdag
# rw  8192   512  4096          0    960197124096   /dev/sdaf
# rw  8192   512  4096          0    960197124096   /dev/sdai
# rw  8192   512  4096          0    960197124096   /dev/sdah
# rw  8192   512  4096          0   5955689381888   /dev/dm-0
# rw  8192   512  4096          0   5955689381888   /dev/dm-1
# rw  8192   512  4096          0   5955689381888   /dev/dm-2
# rw  8192   512  4096          0   5955689381888   /dev/dm-3
# rw  8192   512  4096          0   5955689381888   /dev/dm-4
# rw  8192   512  4096          0   5955689381888   /dev/dm-5
# rw  8192   512  4096          0   5955689381888   /dev/dm-6
# rw  8192   512  4096          0   5955689381888   /dev/dm-7
# rw  8192   512  4096          0   5955689381888   /dev/dm-8
# rw  8192   512  4096          0   5955689381888   /dev/dm-9
# rw  8192   512  4096          0   5955689381888   /dev/dm-10
# rw  8192   512  4096          0   5955689381888   /dev/dm-11
# rw  8192   512  4096          0   5955689381888   /dev/dm-12
# rw  8192   512  4096          0   5955689381888   /dev/dm-13
# rw  8192   512  4096          0   5955689381888   /dev/dm-14
# rw  8192   512  4096          0   5955689381888   /dev/dm-15
# rw  8192   512  4096          0   5955689381888   /dev/dm-16
# rw  8192   512  4096          0   5955689381888   /dev/dm-17
# rw  8192   512  4096          0   5955689381888   /dev/dm-18
# rw  8192   512  4096          0   5955689381888   /dev/dm-19
# rw  8192   512  4096          0   5955689381888   /dev/dm-20
# rw  8192   512  4096          0   5955689381888   /dev/dm-21
# rw  8192   512  4096          0   5955689381888   /dev/dm-22
# rw  8192   512  4096          0   5955689381888   /dev/dm-23
# rw  8192   512  4096          0 142936545165312   /dev/dm-24
# rw  8192   512  4096          0    945580670976   /dev/dm-25
# rw  8192   512  4096          0    945580670976   /dev/dm-26
# rw  8192   512  4096          0    945580670976   /dev/dm-27
# rw  8192   512  4096          0    945580670976   /dev/dm-28
# rw  8192   512  4096          0    945580670976   /dev/dm-29
# rw  8192   512  4096          0    945580670976   /dev/dm-30
# rw  8192   512  4096          0    945580670976   /dev/dm-31
# rw  8192   512  4096          0    945580670976   /dev/dm-32
# rw  8192   512  4096          0    945580670976   /dev/dm-33
# rw  8192   512  4096          0    945580670976   /dev/dm-34
# rw  8192   512  4096          0   9455806709760   /dev/dm-35
# rw  8192   512  4096          0      4294967296   /dev/dm-36
# rw  8192   512  4096          0      4294967296   /dev/dm-37
# rw  8192   512  4096          0      4294967296   /dev/dm-38
# rw  8192   512  4096          0      4294967296   /dev/dm-39
# rw  8192   512  4096          0      4294967296   /dev/dm-40
# rw  8192   512  4096          0      4294967296   /dev/dm-41
# rw  8192   512  4096          0      4294967296   /dev/dm-42
# rw  8192   512  4096          0      4294967296   /dev/dm-43
# rw  8192   512  4096          0      4294967296   /dev/dm-44
# rw  8192   512  4096          0      4294967296   /dev/dm-45
# rw  8192   512  4096          0     42949672960   /dev/dm-46
# rw  8192   512  4096          0 142936545165312   /dev/dm-47
# rw  8192   512  4096          0        46137344   /dev/dm-48
# rw  8192   512  4096          0        46137344   /dev/dm-49
# rw  8192   512  4096          0        46137344   /dev/dm-50
# rw  8192   512  4096          0        46137344   /dev/dm-51
# rw  8192   512  4096          0        46137344   /dev/dm-52
# rw  8192   512  4096          0        46137344   /dev/dm-53
# rw  8192   512  4096          0        46137344   /dev/dm-54
# rw  8192   512  4096          0        46137344   /dev/dm-55
# rw  8192   512  4096          0        46137344   /dev/dm-56
# rw  8192   512  4096          0        46137344   /dev/dm-57
# rw  8192   512  4096          0        46137344   /dev/dm-58
# rw  8192   512  4096          0        46137344   /dev/dm-59
# rw  8192   512  4096          0        46137344   /dev/dm-60
# rw  8192   512  4096          0        46137344   /dev/dm-61
# rw  8192   512  4096          0        46137344   /dev/dm-62
# rw  8192   512  4096          0        46137344   /dev/dm-63
# rw  8192   512  4096          0        46137344   /dev/dm-64
# rw  8192   512  4096          0        46137344   /dev/dm-65
# rw  8192   512  4096          0        46137344   /dev/dm-66
# rw  8192   512  4096          0        46137344   /dev/dm-67
# rw  8192   512  4096          0        46137344   /dev/dm-68
# rw  8192   512  4096          0        46137344   /dev/dm-69
# rw  8192   512  4096          0        46137344   /dev/dm-70
# rw  8192   512  4096          0        46137344   /dev/dm-71
# rw  8192   512  4096          0      1107296256   /dev/dm-72    

# https://access.redhat.com/solutions/3588841
/sbin/blockdev --setra 4096 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 8192 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 16384 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 32768 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 65536 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 131072 /dev/mapper/datavg-mixlv
/sbin/blockdev --setra 262144 /dev/mapper/datavg-mixlv

# final config
/sbin/blockdev --setra 16384 /dev/mapper/datavg-mixlv
for f in /dev/mapper/datavg-mixlv_corig_rimage_*; do /sbin/blockdev --setra 16384  $f ; done
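# note: blockdev --setra does not survive a reboot. one way to persist it is a udev rule;
# this is only a sketch -- the DM_LV_NAME match key is an assumption and may need adjusting.
# 16384 sectors (512B each) == 8192 KiB of read-ahead
cat << 'EOF' > /etc/udev/rules.d/99-datavg-readahead.rules
ACTION=="add|change", KERNEL=="dm-*", ENV{DM_LV_NAME}=="mixlv", ATTR{queue/read_ahead_kb}="8192"
EOF
udevadm control --reload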

# worker2
# 5.5
find /data_mix/mnt/ -type f > list
dstat --output /root/dstat.csv -D /dev/mapper/datavg-mixlv,/dev/mapper/datavg-mixlv_corig,sdh,sdab -N bond0

var_basedir="/data_mix/mnt"
find $var_basedir -type f -size -511M  > list.512m
find $var_basedir -type f -size -2049M  -size +511M > list.2g
find $var_basedir -type f -size +2049M > list.+2g

cat list | shuf > list.shuf.all

cat list.512m | shuf > list.shuf.512m
cat list.2g | shuf > list.shuf.2g
cat list.+2g | shuf > list.shuf.+2g
cat list.2g list.+2g | shuf > list.shuf.+512m

rm -f split.list.*
# zte use 1800
var_total=10
# split -n l/$var_total list.shuf.all split.list.all.
split -n l/$var_total list.shuf.512m split.list.512m.
split -n l/$var_total list.shuf.2g split.list.2g.
split -n l/$var_total list.shuf.+2g split.list.+2g.
split -n l/$var_total list.shuf.+512m split.list.+512m.

for f in split.list.512m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
# for f in split.list.+512m.*; do 
#     cat $f | xargs -I DEMO cat DEMO > /dev/null &
# done
for f in split.list.2g.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
for f in split.list.+2g.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

ps -ef | grep /data_mix/mnt | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO

tmux kill-window -t 3


# rm -f split.*

# 2.8
var_num=`echo "scale=0;$(cat list | wc -l  )/5" | bc -l`
head -n $var_num list > list.20
tail -n +$((var_num + 1)) list > list.80
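# sanity check: list.20 and list.80 together should cover the whole list
wc -l list list.20 list.80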

var_total=1500
# split -n l/$(echo "scale=0;$var_total/5*4"|bc -l) list.20 split.list.20.
# while true; do
#   for f in split.list.20.*; do 
#       cat $f | xargs -I DEMO cat DEMO > /dev/null &
#   done
#   echo "wait to finish"
#   wait
# done
var_runtimes=$(echo "scale=0;$var_total/5*4"|bc -l)
while true; do
  for ((i=1; i<=$var_runtimes; i++)); do
    echo "Welcome $i times"
    cat list.20 | shuf | xargs -I DEMO cat DEMO > /dev/null &
  done
  echo "wait to finish"
  wait
done

var_total=1500
# split -n l/$(echo "scale=0;$var_total/5*1"|bc -l) list.80 split.list.80.
# while true; do
#   for f in split.list.80.*; do 
#       cat $f | xargs -I DEMO cat DEMO > /dev/null &
#   done
#   echo "wait to finish"
#   wait
# done
var_runtimes=$(echo "scale=0;$var_total/5*1"|bc -l)
while true; do
  for ((i=1; i<=$var_runtimes; i++)); do
    echo "Welcome $i times"
    cat list.80 | shuf | xargs -I DEMO cat DEMO > /dev/null &
  done
  echo "wait to finish"
  wait
done
# 500M-1.2GB/s
ps -ef | grep /data_mix/mnt | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO



worker-2 disk tuning


# 8.6T cache / 130T hdd = 6.6%
# 660G cache / 10T hdd 

lvcreate --type raid0 -L 10T --stripesize 2048k --stripes 24 -n ext02lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 10T --stripesize 4096k --stripes 24 -n ext04lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 10T --stripesize 2048k --stripes 23 -n ext52lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 10T --stripesize 2048k --stripes 11 -n ext52lv12 datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl 



lvcreate --type raid0 -L 10T --stripesize 2048k --stripes 24 -n xfs02lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid0 -L 10T --stripesize 4096k --stripes 24 -n xfs04lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 10T --stripesize 2048k --stripes 23 -n xfs52lv datavg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

lvcreate --type raid5 -L 10T --stripesize 2048k --stripes 11 -n xfs52lv12 datavg /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx


lvcreate --type raid0 -L 3.5T --stripesize 1024k --stripes 10 -n ext01lvssd datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 3.5T --stripesize 1024k --stripes 10 -n xfs01lvssd datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvcreate --type raid0 -L 700G --stripesize 2048k --stripes 10 -n cachelv datavg /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai

lvconvert --type cache-pool datavg/cachelv

lvconvert --type cache --cachepool datavg/cachelv datavg/ext02lv

# lvconvert --splitcache datavg/ext02lv
# lvconvert --uncache datavg/ext02lv

lvs -o+layout,stripesize
  # LV         VG     Attr       LSize  Pool      Origin          Data%  Meta%  Move Log Cpy%Sync Convert Layout              Stripe
  # ext01lvssd datavg rwi-a-r---  3.50t                                                                   raid,raid0           1.00m
  # ext02lv    datavg Cwi-a-C--- 10.00t [cachelv] [ext02lv_corig] 0.01   16.41           0.00             cache                   0
  # ext04lv    datavg rwi-a-r--- 10.00t                                                                   raid,raid0           4.00m
  # ext52lv    datavg rwi-a-r--- 10.00t                                                  9.72             raid,raid5,raid5_ls  2.00m
  # xfs01lvssd datavg rwi-a-r---  3.50t                                                                   raid,raid0           1.00m

mkdir -p /data_ext02
mkdir -p /data_ext04
mkdir -p /data_ext52
mkdir -p /data_ext01
mkdir -p /data_xfs01
mkdir -p /data_xfs02
mkdir -p /data_xfs04
mkdir -p /data_xfs52

mkdir -p /data_ext52_12
mkdir -p /data_xfs52_12

mkfs.ext4 /dev/datavg/ext02lv
mkfs.ext4 /dev/datavg/ext04lv
mkfs.ext4 /dev/datavg/ext52lv
mkfs.ext4 /dev/datavg/ext01lvssd
mkfs.xfs  /dev/datavg/xfs01lvssd
mkfs.xfs  /dev/datavg/xfs02lv
mkfs.xfs  /dev/datavg/xfs04lv
mkfs.xfs  /dev/datavg/xfs52lv

mkfs.ext4 /dev/datavg/ext52lv12
mkfs.xfs  /dev/datavg/xfs52lv12

mount /dev/datavg/ext02lv /data_ext02
mount /dev/datavg/ext04lv /data_ext04
mount /dev/datavg/ext52lv /data_ext52
mount /dev/datavg/ext01lvssd /data_ext01
mount /dev/datavg/xfs01lvssd /data_xfs01
mount /dev/datavg/xfs02lv /data_xfs02
mount /dev/datavg/xfs04lv /data_xfs04
mount /dev/datavg/xfs52lv /data_xfs52

mount /dev/datavg/ext52lv12 /data_ext52_12
mount /dev/datavg/xfs52lv12 /data_xfs52_12

dstat -d -D /dev/datavg/ext02lv,/dev/datavg/ext04lv,/dev/datavg/ext52lv,/dev/datavg/ext01lvssd,/dev/datavg/xfs01lvssd,/dev/datavg/xfs02lv,/dev/datavg/xfs04lv,/dev/datavg/xfs52lv,/dev/datavg/ext52lv12,/dev/datavg/xfs52lv12,/dev/sdaa
dstat -d -D /dev/datavg/ext02lv,/dev/datavg/ext04lv,/dev/datavg/ext52lv,/dev/datavg/ext01lvssd,/dev/datavg/xfs01lvssd,/dev/datavg/xfs02lv,/dev/datavg/xfs04lv,/dev/datavg/xfs52lv,/dev/datavg/ext52lv12,/dev/datavg/xfs52lv12,/dev/sdaa,/dev/sdb --disk-util
bmon -p bond0,enp*

# on worker1
rclone config
rclone lsd worker-2:
rclone sync /data_ssd/mnt/ worker-2:/data_ext01/mnt/ -P -L --transfers 64


# on worker-2

# fill data
# for 512M
var_basedir_ext="/data_ext04/mnt"

mkdir -p $var_basedir_ext

# how many concurrent writers
var_total_write=10
# size of each file, in MB
# 512M
var_size=512
# total size to write, in TB
# write 3T
var_total_size=3

var_number=$(echo "scale=0;$var_total_size*1024*1024/$var_size/$var_total_write"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)
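# worked example with the values above: var_number = 3*1024*1024/512/10 = 614 rounds,
# var_len = 512*1024 = 524288 (KiB per file), i.e. 614 rounds x 10 writers x 512 MiB ~= 3 TiB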

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total_write; j++)); do
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done



# fill data
# for 1G
var_basedir_ext="/data_ext04/mnt"

mkdir -p $var_basedir_ext

# how many concurrent writers
var_total_write=10
# size of each file, in MB
# 1G
var_size=1024
# total size to write, in TB
# write 3T
var_total_size=3

var_number=$(echo "scale=0;$var_total_size*1024*1024/$var_size/$var_total_write"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total_write; j++)); do
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done



# fill data
# for 2G
var_basedir_ext="/data_ext04/mnt"

mkdir -p $var_basedir_ext

# how many concurrent writers
var_total_write=10
# size of each file, in MB
# 2G
var_size=2048
# total size to write, in TB
# write 3T
var_total_size=3

var_number=$(echo "scale=0;$var_total_size*1024*1024/$var_size/$var_total_write"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total_write; j++)); do
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done


# copy data
rclone sync /data_ext01/mnt/ /data_xfs01/mnt/ -P -L --transfers 64
rclone sync /data_ext04/mnt/ /data_xfs02/mnt/ -P -L --transfers 64

rclone sync /data_ext04/mnt/ /data_xfs04/mnt/ -P -L --transfers 10
rclone sync /data_ext04/mnt/ /data_xfs52/mnt/ -P -L --transfers 10
rclone sync /data_ext04/mnt/ /data_xfs52_12/mnt/ -P -L --transfers 10

rclone sync /data_ext04/mnt/ /data_ext02/mnt/ -P -L --transfers 10
rclone sync /data_ext04/mnt/ /data_ext52/mnt/ -P -L --transfers 10
rclone sync /data_ext04/mnt/ /data_ext52_12/mnt/ -P -L --transfers 10




var_truebase="/data_xfs52"
mkdir -p $var_truebase/list.tmp
cd $var_truebase/list.tmp

var_basedir="$var_truebase/mnt"
find $var_basedir -type f -size -600M  > list.512m
find $var_basedir -type f -size -1100M  -size +600M > list.1g
find $var_basedir -type f -size +1100M > list.+1g
find $var_basedir -type f > list

cat list | xargs ls -l > list.size
cat list.size | awk '{ n=int(log($5)/log(2));                         \
          if (n<10) n=10;                                               \
          size[n]++ }                                                   \
      END { for (i in size) printf("%d %d\n", 2^i, size[i]) }'          \
 | sort -n                                                              \
 | awk 'function human(x) { x[1]/=1024;                                 \
                            if (x[1]>=1024) { x[2]++;                   \
                                              human(x) } }              \
        { a[1]=$1;                                                      \
          a[2]=0;                                                       \
          human(a);                                                     \
          printf("%3d - %4d %s: %6d\n", a[1], a[1]*2,substr("kMGTEPYZ",a[2]+1,1),$2) }' 


# separate read
for i in 512m 1g +1g ; do
  cat list.$i | shuf > list.shuf.$i
done

rm -f split.list.*
# zte use 1800
var_total=30

for i in 512m 1g +1g ; do
  split -n l/$var_total list.shuf.$i split.list.$i.
done


for f in split.list.512m.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.1g.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.+1g.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done


# mix read
for i in 512m 1g +1g ; do
  cat list.$i | shuf > list.shuf.$i
done

rm -f split.list.*
# zte use 1800
var_total=10

for i in 512m 1g +1g ; do
  split -n l/$var_total list.shuf.$i split.list.$i.
done

for i in 512m 1g +1g ; do
  for f in split.list.$i.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
  done
done



ps -ef | grep xargs | grep DEMO | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO

ps -ef | grep cat | grep /data | awk '{print $2}' | xargs -I DEMO kill -9 DEMO

lvconvert --splitcache datavg/ext02lv



var_truebase="/data_ext01"
mkdir -p $var_truebase/list.tmp
cd $var_truebase/list.tmp

var_basedir="$var_truebase/mnt"
find $var_basedir -type f -size -16k  > list.16k
find $var_basedir -type f -size -128k  -size +16k > list.128k
find $var_basedir -type f -size +128k > list.+128k
find $var_basedir -type f > list

cat list | xargs ls -l > list.size
cat list.size | awk '{ n=int(log($5)/log(2));                         \
          if (n<10) n=10;                                               \
          size[n]++ }                                                   \
      END { for (i in size) printf("%d %d\n", 2^i, size[i]) }'          \
 | sort -n                                                              \
 | awk 'function human(x) { x[1]/=1024;                                 \
                            if (x[1]>=1024) { x[2]++;                   \
                                              human(x) } }              \
        { a[1]=$1;                                                      \
          a[2]=0;                                                       \
          human(a);                                                     \
          printf("%3d - %4d %s: %6d\n", a[1], a[1]*2,substr("kMGTEPYZ",a[2]+1,1),$2) }' 


# separate read
for i in 16k 128k +128k ; do
  cat list.$i | shuf > list.shuf.$i
done

rm -f split.list.*
# zte use 1800
var_total=30

for i in 16k 128k +128k ; do
  split -n l/$var_total list.shuf.$i split.list.$i.
done


for f in split.list.16k.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.128k.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.+128k.*; do 
  cat $f | xargs -I DEMO cat DEMO > /dev/null &
done


# mix read
for i in 16k 128k +128k ; do
  cat list.$i | shuf > list.shuf.$i
done

rm -f split.list.*
# zte use 1800
var_total=10

for i in 16k 128k +128k ; do
  split -n l/$var_total list.shuf.$i split.list.$i.
done

for i in 16k 128k +128k ; do
  for f in split.list.$i.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
  done
done

ps -ef | grep xargs | grep DEMO | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO



worker-2 nic bond

ip link show
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bb:80 brd ff:ff:ff:ff:ff:ff
# 3: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 08:4f:0a:b5:a4:6e brd ff:ff:ff:ff:ff:ff
# 4: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bb:81 brd ff:ff:ff:ff:ff:ff
# 5: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bb:82 brd ff:ff:ff:ff:ff:ff
# 6: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether 08:4f:0a:b5:a4:6f brd ff:ff:ff:ff:ff:ff
# 7: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
#     link/ether cc:64:a6:59:bb:83 brd ff:ff:ff:ff:ff:ff

ip a s eno1
# 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether cc:64:a6:59:bb:80 brd ff:ff:ff:ff:ff:ff
#     inet 39.134.201.66/27 brd 39.134.201.95 scope global noprefixroute eno1
#        valid_lft forever preferred_lft forever
#     inet6 fe80::f690:1c45:b8c3:96d/64 scope link noprefixroute
#        valid_lft forever preferred_lft forever

ethtool eno1  # 10000baseT/Full
ethtool eno2  # 10000baseT/Full
ethtool eno3  # 1000baseT/Full
ethtool eno4  # 1000baseT/Full
ethtool ens2f0  #  10000baseT/Full
ethtool ens2f1  #  10000baseT/Full

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad 

nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3

nmcli con add type bond-slave ifname eno2 con-name eno2 master bond0
nmcli con add type bond-slave ifname ens2f0 con-name ens2f0 master bond0
nmcli con add type bond-slave ifname ens2f1 con-name ens2f1 master bond0

nmcli con down eno2
nmcli con up eno2
nmcli con down ens2f0
nmcli con up ens2f0
nmcli con down ens2f1
nmcli con up ens2f1
nmcli con down bond0
nmcli con up bond0
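# verify the bond is up and that all slaves negotiated LACP
cat /proc/net/bonding/bond0
ip -br link show master bond0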


#######################################
# nic bond
cat > /root/nic.bond.sh << 'EOF'
#!/bin/bash

set -x 

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete  ${i} ; done 

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad \
    ipv4.method 'manual' \
    ipv4.address '39.134.201.66/27' \
    ipv4.gateway '39.134.201.94' \
    ipv4.dns '117.177.241.16'
    
nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3

nmcli con add type bond-slave ifname eno1 con-name eno1 master bond0    
nmcli con add type bond-slave ifname eno2 con-name eno2 master bond0
nmcli con add type bond-slave ifname ens2f0 con-name ens2f0 master bond0
nmcli con add type bond-slave ifname ens2f1 con-name ens2f1 master bond0

systemctl restart network

EOF

cat > /root/nic.restore.sh << 'EOF'
#!/bin/bash

set -x 

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete  ${i} ; done 

# re-create primary connection 
nmcli con add type ethernet \
    con-name eno1 \
    ifname eno1 \
    ipv4.method 'manual' \
    ipv4.address '39.134.201.66/27' \
    ipv4.gateway '39.134.201.94' \
    ipv4.dns '117.177.241.16'

systemctl restart network

exit 0
EOF

chmod +x /root/nic.restore.sh

cat > ~/cron-network-con-recreate << EOF
*/20 * * * * /bin/bash /root/nic.restore.sh
EOF

crontab ~/cron-network-con-recreate
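# confirm the recovery cron job is installed
crontab -l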

bash /root/nic.bond.sh


worker-3 host


systemctl stop firewalld
systemctl disable firewalld

cat << EOF > /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32

ipset add my-allow-set 39.134.198.0/24

ipset add my-allow-set 39.134.204.0/24

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local
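# after a reboot (or a manual "systemctl restart rc-local") verify the allow-list took effect
ipset list my-allow-set | head
iptables -S INPUT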


#######################################
# nic bond
cat << 'EOF' > /root/nic.bond.sh
#!/bin/bash

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete uuid ${i} ; done 

nmcli con add type bond \
    con-name bond0 \
    ifname bond0 \
    mode 802.3ad \
    ipv4.method 'manual' \
    ipv4.address '39.134.204.73/27' \
    ipv4.gateway '39.134.204.65' \
    ipv4.dns '117.177.241.16'
    
nmcli con mod id bond0 bond.options \
    mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer2+3
    
nmcli con add type bond-slave ifname enp176s0f0 con-name enp176s0f0 master bond0
nmcli con add type bond-slave ifname enp176s0f1 con-name enp176s0f1 master bond0

systemctl restart network

EOF

cat > /root/nic.restore.sh << 'EOF'
#!/bin/bash

# delete all connection 
nmcli -g uuid con | while read i ; do nmcli c delete uuid ${i} ; done 

# re-create primary connection 
nmcli con add type ethernet \
    con-name enp176s0f0 \
    ifname enp176s0f0 \
    ipv4.method 'manual' \
    ipv4.address '39.134.204.73/27' \
    ipv4.gateway '39.134.204.65' \
    ipv4.dns '117.177.241.16'

systemctl restart network

exit 0
EOF

chmod +x /root/nic.restore.sh

cat > ~/cron-network-con-recreate << EOF
*/2 * * * * /bin/bash /root/nic.restore.sh
EOF

crontab ~/cron-network-con-recreate



mkdir /etc/yum.repos.d.bak
mv /etc/yum.repos.d/* /etc/yum.repos.d.bak

cat << EOF > /etc/yum.repos.d/remote.repo
[remote]
name=RHEL FTP
baseurl=ftp://117.177.241.16/data
enabled=1
gpgcheck=0

EOF

yum clean all
yum --disableplugin=subscription-manager  repolist

yum -y update

hostnamectl set-hostname worker-3.ocpsc.redhat.ren

nmcli connection modify enp176s0f0 ipv4.dns 117.177.241.16
nmcli connection reload
nmcli connection up enp176s0f0



# ntp
yum install -y chrony
systemctl enable chronyd
systemctl restart chronyd
systemctl status chronyd
chronyc tracking

systemctl disable --now firewalld.service

# update ntp
cat << EOF > /etc/chrony.conf
server 223.87.20.100 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

systemctl restart chronyd
systemctl status chronyd
chronyc tracking
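# confirm the 223.87.20.100 server is reachable and selected
chronyc sources -v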





worker-3 disk

lshw -class disk

lsblk | grep 5.5 | awk '{print $1}' | xargs -I DEMO echo -n "/dev/DEMO "
# /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy
lsblk | grep 5.5 | awk '{print $1}' | wc -l
# 24

pvcreate -y /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

vgcreate datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

lsblk -d -o name,rota

lvcreate --type raid0 -L 120T  --stripesize 128k --stripes 24 -n hddlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy


mkfs.ext4 /dev/datavg/hddlv
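# (optional) ext4 can also be told about the raid geometry at mkfs time; a sketch assuming
# 4 KiB blocks and the 128 KiB stripesize x 24 stripes used above:
#   stride = 128k / 4k = 32, stripe-width = 32 * 24 = 768
# mkfs.ext4 -E stride=32,stripe-width=768 /dev/datavg/hddlv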



lvcreate --type raid0 -L 5T  --stripesize 512k --stripes 24 -n xfslv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

lvcreate --type raid0 -L 110T  --stripesize 4096k --stripes 24 -n extzxlv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

lvcreate --type raid0 -L 3.5T  --stripesize 4096k --stripes 24 -n ext04lv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

lvcreate --type raid6 -L 3.5T  --stripesize 2048k --stripes 22 -n ext62lv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy

lvcreate --type raid5 -L 3.5T  --stripesize 2048k --stripes 23 -n ext52lv datavg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy



mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/fc-root

mkfs.xfs /dev/datavg/xfslv
mkfs.ext4 /dev/datavg/extlv



mkfs.ext4 /dev/datavg/ext04lv
mkfs.ext4 /dev/datavg/ext62lv

mkfs.ext4 /dev/datavg/ext52lv

mkfs.ext4 /dev/datavg/extzxlv
# mkfs.xfs /dev/datavg/extzxlv
mount /dev/datavg/extzxlv /data
rclone sync /data_ext04/mnt/ /data/redhat_mnt/  -P -L --transfers 64

mount /dev/datavg/xfslv /data_xfs
mount /dev/datavg/extlv /data_ext

mkdir -p /data_ext02
mkdir -p /data_ext04
mkdir -p /data_ext62
mkdir -p /data_ext52

mount /dev/datavg/ext02lv /data_ext02
mount /dev/datavg/ext04lv /data_ext04
# mount /dev/datavg/ext62lv /data_ext62
mount /dev/datavg/ext52lv /data_ext52

umount /data_xfs
lvremove -f datavg/xfslv
# rsync --info=progress2 -P -ar  /data_ext/mnt/ /data_xfs/mnt/
rclone sync /data_ext/mnt/ /data_xfs/mnt/ -P -L --transfers 64

umount /data_ext
lvremove -f datavg/extlv
rclone sync /data_xfs/mnt/ /data_ext/mnt/ -P -L --transfers 64

umount /data_ext52
rclone sync /data_xfs/mnt/ /data_ext04/mnt/ -P -L --transfers 64
rclone sync /data_xfs/mnt/ /data_ext62/mnt/ -P -L --transfers 64
rclone sync /data_xfs/mnt/ /data_ext52/mnt/ -P -L --transfers 64

lvs -o+stripesize

dstat -D /dev/datavg/xfslv,/dev/datavg/extlv,/dev/sdb,/dev/sdc 5
dstat -D /dev/datavg/xfslv,/dev/datavg/extlv,/dev/sdb,/dev/sdc --disk-util
bmon -p bond0,enp*

blockdev --report 
# https://access.redhat.com/solutions/3588841
# orig: 12288
/sbin/blockdev --setra 131072 /dev/datavg/xfslv
/sbin/blockdev --setra 131072 /dev/datavg/extlv

/sbin/blockdev --setra 12288 /dev/datavg/xfslv
/sbin/blockdev --setra 12288 /dev/datavg/extlv


mkdir -p /data/

cat /etc/fstab

cat << EOF >> /etc/fstab
/dev/datavg/hddlv /data                  ext4     defaults        0 0
EOF

mount -a
df -h | grep \/data

while true; do df -h | grep /data; sleep 10; done

dstat -D /dev/datavg/hddlv 
dstat -D /dev/sdb,/dev/sdc
dstat -D /dev/sdb,/dev/sdc --disk-util

mkfs.xfs -f /dev/sdb
mkfs.ext4 -F /dev/sdc

mkdir -p /data_xfs
mkdir -p /data_ext

mount /dev/sdb /data_xfs
mount /dev/sdc /data_ext


# fill data
# for 1.5M
var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"

mkdir -p $var_basedir_xfs
mkdir -p $var_basedir_ext


var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"
var_total=10
# 512k
var_size=0.5
# write 1T
var_number=$(echo "scale=0;1024*1024/$var_size/$var_total"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total; j++)); do
    # echo "Welcome $i times"
    head -c ${var_len}K < /dev/urandom > $var_basedir_xfs/$var_size-$j-$i &
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done

var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"
var_total=10
# 4M
var_size=4
# write 1T
var_number=$(echo "scale=0;1024*1024/$var_size/$var_total"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total; j++)); do
    # echo "Welcome $i times"
    head -c ${var_len}K < /dev/urandom > $var_basedir_xfs/$var_size-$j-$i &
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done


var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"
var_total=10
# 8M
var_size=8
# write 1T
var_number=$(echo "scale=0;1024*1024/$var_size/$var_total"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total; j++)); do
    # echo "Welcome $i times"
    head -c ${var_len}K < /dev/urandom > $var_basedir_xfs/$var_size-$j-$i &
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done

var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"
var_total=10
# 32M
var_size=32
# write 1T
var_number=$(echo "scale=0;1024*1024/$var_size/$var_total"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total; j++)); do
    # echo "Welcome $i times"
    head -c ${var_len}K < /dev/urandom > $var_basedir_xfs/$var_size-$j-$i &
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done

var_basedir_xfs="/data_xfs/mnt"
var_basedir_ext="/data_ext/mnt"
var_total=10
# 64M
var_size=64
# write 1T
var_number=$(echo "scale=0;1024*1024/$var_size/$var_total"|bc -l)
var_len=$(echo "scale=0;$var_size*1024/1"|bc -l)

for ((i=1; i<=$var_number; i++)); do
  for ((j=1; j<=$var_total; j++)); do
    # echo "Welcome $i times"
    head -c ${var_len}K < /dev/urandom > $var_basedir_xfs/$var_size-$j-$i &
    head -c ${var_len}K < /dev/urandom > $var_basedir_ext/$var_size-$j-$i &
  done
  echo "wait to finish: $i"
  wait
done

mkdir -p /data_xfs/list.tmp
cd /data_xfs/list.tmp
var_basedir="/data_xfs/mnt"
find $var_basedir -type f -size -2M  > list.2m
find $var_basedir -type f -size -10M  -size +2M > list.10m
find $var_basedir -type f -size +10M > list.100m
find $var_basedir -type f > list


var_truebase="/data"
mkdir -p $var_truebase/list.tmp
cd $var_truebase/list.tmp

var_basedir="$var_truebase/mnt"
find $var_basedir -type f -size -2M  > list.2m
find $var_basedir -type f -size -10M  -size +2M > list.10m
find $var_basedir -type f -size +10M > list.100m
find $var_basedir -type f > list

cat list | xargs ls -l > list.size
cat list.size | awk '{ n=int(log($5)/log(2));                         \
          if (n<10) n=10;                                              \
          size[n]++ }                                                   \
      END { for (i in size) printf("%d %d\n", 2^i, size[i]) }'          \
 | sort -n                                                              \
 | awk 'function human(x) { x[1]/=1024;                                 \
                            if (x[1]>=1024) { x[2]++;                   \
                                              human(x) } }              \
        { a[1]=$1;                                                      \
          a[2]=0;                                                       \
          human(a);                                                     \
          printf("%3d - %4d %s: %6d\n", a[1], a[1]*2,substr("kMGTEPYZ",a[2]+1,1),$2) }' 





cat list | shuf > list.shuf.all

cat list.2m | shuf > list.shuf.2m
cat list.10m | shuf > list.shuf.10m
cat list.100m | shuf > list.shuf.100m
cat list.10m list.100m | shuf > list.shuf.+2m

rm -f split.list.*
# zte use 1800
var_total=10
split -n l/$var_total list.shuf.all split.list.all.
split -n l/$var_total list.shuf.2m split.list.2m.
split -n l/$var_total list.shuf.10m split.list.10m.
split -n l/$var_total list.shuf.100m split.list.100m.
split -n l/$var_total list.shuf.+2m split.list.+2m.

for f in split.list.2m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
# for f in split.list.+2m.*; do 
#     cat $f | xargs -I DEMO cat DEMO > /dev/null &
# done

for f in split.list.10m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done
for f in split.list.100m.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

for f in split.list.all.*; do 
    cat $f | xargs -I DEMO cat DEMO > /dev/null &
done

jobs -p | xargs kill

ps -ef | grep xargs | grep DEMO | grep cat | awk '{print $2}' | xargs -I DEMO kill DEMO



install ocp

helper node day1

############################################################
# on macbook
mkdir -p /Users/wzh/Documents/redhat/tools/redhat.ren/etc
mkdir -p /Users/wzh/Documents/redhat/tools/redhat.ren/lib
mkdir -p /Users/wzh/Documents/redhat/tools/ocpsc.redhat.ren/etc
mkdir -p /Users/wzh/Documents/redhat/tools/ocpsc.redhat.ren/lib
rm -rf /Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/
mkdir -p /Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/etc
mkdir -p /Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/lib

cd /Users/wzh/Documents/redhat/tools/redhat.ren/
docker run -it --rm --name certbot \
            -v "/Users/wzh/Documents/redhat/tools/redhat.ren/etc:/etc/letsencrypt" \
            -v "/Users/wzh/Documents/redhat/tools/redhat.ren/lib:/var/lib/letsencrypt" \
            certbot/certbot certonly  -d "*.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/redhat.ren/fullchain4.pem redhat.ren.crt
cp ./etc/archive/redhat.ren/privkey4.pem redhat.ren.key

cd /Users/wzh/Documents/redhat/tools/ocpsc.redhat.ren/
docker run -it --rm --name certbot \
            -v "/Users/wzh/Documents/redhat/tools/ocpsc.redhat.ren/etc:/etc/letsencrypt" \
            -v "/Users/wzh/Documents/redhat/tools/ocpsc.redhat.ren/lib:/var/lib/letsencrypt" \
            certbot/certbot certonly  -d "*.ocpsc.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/ocpsc.redhat.ren/fullchain1.pem ocpsc.redhat.ren.crt
cp ./etc/archive/ocpsc.redhat.ren/privkey1.pem ocpsc.redhat.ren.key


cd /Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/
docker run -it --rm --name certbot \
            -v "/Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/etc:/etc/letsencrypt" \
            -v "/Users/wzh/Documents/redhat/tools/apps.ocpsc.redhat.ren/lib:/var/lib/letsencrypt" \
            certbot/certbot certonly  -d "*.apps.ocpsc.redhat.ren" --manual --preferred-challenges dns-01  --server https://acme-v02.api.letsencrypt.org/directory

cp ./etc/archive/apps.ocpsc.redhat.ren/fullchain1.pem apps.ocpsc.redhat.ren.crt
cp ./etc/archive/apps.ocpsc.redhat.ren/privkey1.pem apps.ocpsc.redhat.ren.key

# scp these keys to helper
# /data/cert/*
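# e.g. from the macbook (helper address is an assumption, adjust to your environment):
# scp *.crt *.key root@117.177.241.16:/data/cert/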

####################################################
# on helper node
yum -y install podman docker-distribution pigz skopeo httpd-tools

# https://access.redhat.com/solutions/3175391
htpasswd -cbB /etc/docker-distribution/registry_passwd admin ***************

cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/registry
    delete:
        enabled: true
http:
    addr: :5443
    tls:
       certificate: /data/cert/redhat.ren.crt
       key: /data/cert/redhat.ren.key
auth:
  htpasswd:
    realm: basic-realm
    path: /etc/docker-distribution/registry_passwd
EOF
# systemctl restart docker
systemctl stop docker-distribution
systemctl enable docker-distribution
systemctl restart docker-distribution
# 

firewall-cmd --permanent --add-port=5443/tcp
firewall-cmd --reload

podman login registry.redhat.ren:5443 -u admin -p *******************

yum install -y docker
systemctl start docker
docker login registry.redhat.ren:5443 -u admin

# upload vars-static.yaml to helper
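# e.g. from the machine holding the file (helper address and target dir are assumptions):
# scp vars-static.yaml root@117.177.241.16:/data/ocp4/ocp4-upi-helpernode/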
yum -y install ansible-2.8.10 git unzip podman python36

cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/main.yml

# upload install-config.yaml to helper /data/ocp4
cd /data/ocp4

/bin/rm -rf *.ign .openshift_install_state.json auth bootstrap master0 master1 master2 worker0 worker1 worker2

openshift-install create ignition-configs --dir=/data/ocp4

/bin/cp -f bootstrap.ign /var/www/html/ignition/bootstrap-static.ign
/bin/cp -f master.ign /var/www/html/ignition/master-0.ign
/bin/cp -f master.ign /var/www/html/ignition/master-1.ign
/bin/cp -f master.ign /var/www/html/ignition/master-2.ign
/bin/cp -f worker.ign /var/www/html/ignition/worker-0.ign
/bin/cp -f worker.ign /var/www/html/ignition/worker-1.ign
/bin/cp -f worker.ign /var/www/html/ignition/worker-2.ign

chmod 644 /var/www/html/ignition/*

########################################################
# on helper node, create iso
yum -y install genisoimage libguestfs-tools
systemctl start libvirtd

export NGINX_DIRECTORY=/data/ocp4
export RHCOSVERSION=4.3.0
export VOLID=$(isoinfo -d -i ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso | awk '/Volume id/ { print $3 }')
TEMPDIR=$(mktemp -d)
echo $VOLID
echo $TEMPDIR

cd ${TEMPDIR}
# Extract the ISO content using guestfish (to avoid sudo mount)
guestfish -a ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso \
  -m /dev/sda tar-out / - | tar xvf -

# Helper function to modify the config files
modify_cfg(){
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    # Append the proper image and ignition urls
    sed -e '/coreos.inst=yes/s|$| coreos.inst.install_dev=vda coreos.inst.image_url='"${URL}"'\/install\/'"${BIOSMODE}"'.raw.gz coreos.inst.ignition_url='"${URL}"'\/ignition\/'"${NODE}"'.ign ip='"${IP}"'::'"${GATEWAY}"':'"${NETMASK}"':'"${FQDN}"':'"${NET_INTERFACE}"':none:'"${DNS}"' nameserver='"${DNS}"'|' ${file} > $(pwd)/${NODE}_${file##*/}
    # Boot directly in the installation
    sed -i -e 's/default vesamenu.c32/default linux/g' -e 's/timeout 600/timeout 10/g' $(pwd)/${NODE}_${file##*/}
  done
}

URL="http://117.177.241.16:8080/"
GATEWAY="117.177.241.1"
NETMASK="255.255.255.0"
DNS="117.177.241.16"

# BOOTSTRAP
# TYPE="bootstrap"
NODE="bootstrap-static"
IP="117.177.241.243"
FQDN="vm-bootstrap"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

# MASTERS
# TYPE="master"
# MASTER-0
NODE="master-0"
IP="117.177.241.240"
FQDN="vm-master0"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

# MASTER-1
NODE="master-1"
IP="117.177.241.241"
FQDN="vm-master1"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

# MASTER-2
NODE="master-2"
IP="117.177.241.242"
FQDN="vm-master2"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

# WORKERS
NODE="worker-0"
IP="117.177.241.244"
FQDN="vm-worker0"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="worker-1"
IP="117.177.241.245"
FQDN="vm-worker1"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg
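# NOTE: the loop below also builds a worker-2 iso; add a NODE="worker-2" block here
# (with its own IP/FQDN) before running it, otherwise worker-2_grub.cfg / worker-2_isolinux.cfg will be missing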


# Generate the images, one per node as the IP configuration is different...
# https://github.com/coreos/coreos-assembler/blob/master/src/cmd-buildextend-installer#L97-L103
for node in master-0 master-1 master-2 worker-0 worker-1 worker-2 bootstrap-static; do
  # Overwrite the grub.cfg and isolinux.cfg files for each node type
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    /bin/cp -f $(pwd)/${node}_${file##*/} ${file}
  done
  # As regular user!
  genisoimage -verbose -rock -J -joliet-long -volset ${VOLID} \
    -eltorito-boot isolinux/isolinux.bin -eltorito-catalog isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    -eltorito-alt-boot -efi-boot images/efiboot.img -no-emul-boot \
    -o ${NGINX_DIRECTORY}/${node}.iso .
done

# Optionally, clean up
cd /data/ocp4
rm -Rf ${TEMPDIR}

cd ${NGINX_DIRECTORY}

# mkdir -p /data/ocp4
# mkdir -p /data/kvm
scp master-*.iso root@117.177.241.17:/data/ocp4/

scp master-*.iso root@117.177.241.21:/data/ocp4/
scp worker-*.iso root@117.177.241.21:/data/ocp4/
scp bootstrap-*.iso root@117.177.241.21:/data/ocp4/

scp master-*.iso root@117.177.241.18:/data/ocp4/

# after you create and boot the master and worker VMs, you can track the installation progress
export KUBECONFIG=/data/ocp4/auth/kubeconfig
echo "export KUBECONFIG=/data/ocp4/auth/kubeconfig" >> ~/.bashrc
source ~/.bashrc
oc get nodes

openshift-install wait-for bootstrap-complete --log-level debug

oc get csr
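# if any CSRs stay Pending, approve them (same one-liner used later for the router nodes):
# oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve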

openshift-install wait-for install-complete

bash add.image.load.sh /data_ssd/is.samples/mirror_dir/

oc apply -f ./99-worker-zzz-container-registries.yaml -n openshift-config
oc apply -f ./99-master-zzz-container-registries.yaml -n openshift-config

helper node day1 oper


# https://docs.openshift.com/container-platform/4.3/openshift_images/managing_images/using-image-pull-secrets.html#images-update-global-pull-secret_using-image-pull-secrets
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/data/pull-secret.json

# https://docs.openshift.com/container-platform/4.3/networking/ingress-operator.html#nw-ingress-controller-tls-profiles_configuring-ingress
oc --namespace openshift-ingress-operator get ingresscontrollers

oc --namespace openshift-ingress create secret tls custom-certs-default --cert=/data/cert/apps.ocpsc.redhat.ren.crt --key=/data/cert/apps.ocpsc.redhat.ren.key

oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'

oc get --namespace openshift-ingress-operator ingresscontrollers/default \
  --output jsonpath='{.spec.defaultCertificate}'

# upgrade ingress ca
oc --namespace openshift-ingress create secret tls custom-certs-default-01 --cert=/data/cert/apps.ocpsc.redhat.ren.crt --key=/data/cert/apps.ocpsc.redhat.ren.key

oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default-01"}}}'

##################################################3
# add rhel hw node, and remove vm worker node
ssh-copy-id root@infra-0.ocpsc.redhat.ren
ssh root@infra-0.ocpsc.redhat.ren

ssh-copy-id root@infra-1.ocpsc.redhat.ren
ssh root@infra-1.ocpsc.redhat.ren

# disable firewalld on infra-0, infra-1

yum -y install openshift-ansible openshift-clients jq

# create rhel-ansible-host
cat <<EOF > /data/ocp4/rhel-ansible-host
[all:vars]
ansible_user=root 
#ansible_become=True 

openshift_kubeconfig_path="/data/ocp4/auth/kubeconfig" 

[new_workers] 
infra-0.ocpsc.redhat.ren
infra-1.ocpsc.redhat.ren

EOF

ansible-playbook -i /data/ocp4/rhel-ansible-host /usr/share/ansible/openshift-ansible/playbooks/scaleup.yml

# then remove old vm-worker0, vm-worker1
oc get nodes -o wide
oc adm cordon vm-worker-0.ocpsc.redhat.ren
oc adm cordon vm-worker-1.ocpsc.redhat.ren
oc adm drain vm-worker-0.ocpsc.redhat.ren --force --delete-local-data --ignore-daemonsets
oc adm drain vm-worker-1.ocpsc.redhat.ren --force --delete-local-data --ignore-daemonsets  
oc delete nodes vm-worker-0.ocpsc.redhat.ren
oc delete nodes vm-worker-1.ocpsc.redhat.ren
oc get nodes -o wide

# create nfs storage and enable image operator
bash ocp4-upi-helpernode/files/nfs-provisioner-setup.sh

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"managementState": "Managed","storage":{"pvc":{"claim":""}}}}' --type=merge

# create operator catalog
oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

cat <<EOF > redhat-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Redhat Operator Catalog
  sourceType: grpc
  image: registry.redhat.ren:5443/docker.io/wangzheng422/operator-catalog:redhat-2020-03-23
  publisher: Red Hat
EOF
oc create -f redhat-operator-catalog.yaml

# create infra node
# https://access.redhat.com/solutions/4287111
oc get node

oc label node infra0.hsc.redhat.ren node-role.kubernetes.io/infra=""
oc label node infra1.hsc.redhat.ren node-role.kubernetes.io/infra=""

oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'

oc patch configs.imageregistry.operator.openshift.io/cluster -n openshift-image-registry --type=merge --patch '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

oc get pod -o wide -n openshift-image-registry --sort-by=".spec.nodeName"

cat <<EOF > /data/ocp4/monitoring-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |+
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      volumeClaimTemplate:
        metadata:
          name: localpvc
        spec:
          storageClassName: local-sc
          resources:
            requests:
              storage: 400Gi
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
EOF

oc apply -f /data/ocp4/monitoring-cm.yaml -n openshift-monitoring

oc get pods -n openshift-monitoring -o wide --sort-by=".spec.nodeName"

###########################################
## add user for zte
cd /data/ocp4
touch /data/ocp4/htpasswd
htpasswd -B /data/ocp4/htpasswd zteca
htpasswd -B /data/ocp4/htpasswd zteadm

oc create secret generic htpasswd --from-file=/data/ocp4/htpasswd -n openshift-config

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: Local Password
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd
EOF

watch oc get pod -n openshift-authentication

oc adm policy add-cluster-role-to-user cluster-admin  zteca

oc new-project zte
oc adm policy add-role-to-user admin zteadm -n zte

oc get clusterrolebinding.rbac

oc get clusterrole.rbac

oc adm policy add-cluster-role-to-user cluster-reader  zteadm
oc adm policy remove-cluster-role-from-user cluster-reader  zteadm

#########################################
# add more rhel-ansible-host

# scp vars_static.yaml to helper
cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/main.yml

ssh-copy-id root@worker-0.ocpsc.redhat.ren

cat <<EOF > /data/ocp4/rhel-ansible-host
[all:vars]
ansible_user=root 
#ansible_become=True 

openshift_kubeconfig_path="/data/ocp4/auth/kubeconfig" 

[workers] 
infra-0.ocpsc.redhat.ren
infra-1.ocpsc.redhat.ren

[new_workers]
worker-0.ocpsc.redhat.ren

EOF

ansible-playbook -i /data/ocp4/rhel-ansible-host /usr/share/ansible/openshift-ansible/playbooks/scaleup.yml

#########################################
# add more rhel-ansible-host
cat << EOF  > /etc/yum/pluginconf.d/subscription-manager.conf
[main]
enabled=0
EOF
# scp vars_static.yaml to helper
cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/main.yml

ssh-copy-id root@worker-1.ocpsc.redhat.ren
ssh-copy-id root@worker-2.ocpsc.redhat.ren

cat <<EOF > /data/ocp4/rhel-ansible-host
[all:vars]
ansible_user=root 
#ansible_become=True 

openshift_kubeconfig_path="/data/ocp4/auth/kubeconfig" 

[workers] 
infra-0.ocpsc.redhat.ren
infra-1.ocpsc.redhat.ren
worker-0.ocpsc.redhat.ren

[new_workers]
worker-1.ocpsc.redhat.ren
worker-2.ocpsc.redhat.ren

EOF

ansible-playbook -i /data/ocp4/rhel-ansible-host /usr/share/ansible/openshift-ansible/playbooks/scaleup.yml


#########################################
# add worker-3 rhel-ansible-host
# upload vars-static.yaml 
cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/main.yml

cat << EOF  > /etc/yum/pluginconf.d/subscription-manager.conf
[main]
enabled=0
EOF
# scp vars_static.yaml to helper
cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/main.yml

ssh-copy-id root@worker-3.ocpsc.redhat.ren

cat <<EOF > /data/ocp4/rhel-ansible-host
[all:vars]
ansible_user=root 
#ansible_become=True 

openshift_kubeconfig_path="/data/ocp4/auth/kubeconfig" 

[workers] 
infra-0.ocpsc.redhat.ren
infra-1.ocpsc.redhat.ren
worker-0.ocpsc.redhat.ren
worker-1.ocpsc.redhat.ren
worker-2.ocpsc.redhat.ren

[new_workers]
worker-3.ocpsc.redhat.ren

EOF

ansible-playbook -i /data/ocp4/rhel-ansible-host /usr/share/ansible/openshift-ansible/playbooks/scaleup.yml


helper node day 2 sec


cat << EOF > wzh.script
#!/bin/bash

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -j ACCEPT
iptables -A INPUT -s 223.87.20.0/24 -j ACCEPT
iptables -A INPUT -s 117.177.241.0/24 -j ACCEPT
iptables -A INPUT -s 39.134.200.0/24 -j ACCEPT
iptables -A INPUT -s 39.134.201.0/24 -j ACCEPT
iptables -A INPUT -s 39.137.101.0/24 -j ACCEPT
iptables -A INPUT -s 192.168.7.0/24 -j ACCEPT
iptables -A INPUT -s 112.44.102.224/27 -j ACCEPT
iptables -A INPUT -s 47.93.86.113/32 -j ACCEPT
iptables -A INPUT -s 39.134.204.0/24 -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF
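# the next step url-encodes wzh.script so it can be embedded as a data: URL in the MachineConfig that follows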

var_local=$(cat ./wzh.script | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )

cat <<EOF > 45-wzh-service.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 45-wzh-service
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain,${var_local}
          verification: {}
        filesystem: root
        mode: 0755
        path: /etc/rc.d/wzh.local
    systemd:
      units:
      - name: wzh.service
        enabled: true
        contents: |
          [Unit]
          Description=/etc/rc.d/wzh.local Compatibility
          Documentation=zhengwan@redhat.com
          ConditionFileIsExecutable=/etc/rc.d/wzh.local
          After=network.target

          [Service]
          Type=oneshot
          User=root
          Group=root
          ExecStart=/bin/bash -c /etc/rc.d/wzh.local

          [Install]
          WantedBy=multi-user.target

EOF
oc apply -f 45-wzh-service.yaml -n openshift-config


helper node quay

# on helper node
firewall-cmd --permanent --zone=public --add-port=4443/tcp
firewall-cmd --reload

podman pod create --infra-image registry.redhat.ren:5443/gcr.io/google_containers/pause-amd64:3.0 --name quay -p 4443:8443 
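# the pod maps host port 4443 to quay's 8443; the containers below all join this pod and reach each other via 127.0.0.1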

cd /data
rm -rf /data/quay
podman run -d --name quay-fs --entrypoint "tail" registry.redhat.ren:5443/docker.io/wangzheng422/quay-fs:3.2.0-init -f /dev/null
podman cp quay-fs:/quay.tgz /data/
tar zxf quay.tgz
podman rm -fv quay-fs

export MYSQL_CONTAINER_NAME=quay-mysql
export MYSQL_DATABASE=enterpriseregistrydb
export MYSQL_PASSWORD=zvbk3fzp5f5m2a8j
export MYSQL_USER=quayuser
export MYSQL_ROOT_PASSWORD=q98u335musckfqxe

podman run \
    --detach \
    --restart=always \
    --env MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \
    --env MYSQL_USER=${MYSQL_USER} \
    --env MYSQL_PASSWORD=${MYSQL_PASSWORD} \
    --env MYSQL_DATABASE=${MYSQL_DATABASE} \
    --name ${MYSQL_CONTAINER_NAME} \
    --privileged=true \
    --pod quay \
    -v /data/quay/lib/mysql:/var/lib/mysql/data:Z \
    registry.redhat.ren:5443/registry.access.redhat.com/rhscl/mysql-57-rhel7

podman run -d --restart=always \
    --pod quay \
    --privileged=true \
    --name quay-redis \
    -v  /data/quay/lib/redis:/var/lib/redis/data:Z \
    registry.redhat.ren:5443/registry.access.redhat.com/rhscl/redis-32-rhel7

sleep 10

/bin/cp -f /data/cert/redhat.ren.crt /data/quay/config/extra_ca_certs/redhat.ren.crt
/bin/cp -f /data/cert/redhat.ren.crt /data/quay/config/ssl.cert
/bin/cp -f /data/cert/redhat.ren.key /data/quay/config/ssl.key

podman run --restart=always \
    --sysctl net.core.somaxconn=4096 \
    --privileged=true \
    --name quay-master \
    --pod quay \
    --add-host mysql:127.0.0.1 \
    --add-host redis:127.0.0.1 \
    --add-host clair:127.0.0.1 \
    -v /data/quay/config:/conf/stack:Z \
    -v /data/quay/storage:/datastorage:Z \
    -d registry.redhat.ren:5443/quay.io/redhat/quay:v3.2.1

# https://registry.redhat.ren:4443/

podman run --name clair-postgres --pod quay \
    -v /data/quay/lib/postgresql/data:/var/lib/postgresql/data:Z \
    -d registry.redhat.ren:5443/docker.io/library/postgres

# change /data/quay/clair-config/config.yaml
# https://registry.redhat.ren:4443 -> https://registry.redhat.ren:8443
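# a minimal sketch of that edit, assuming the URL appears verbatim in the file:
# sed -i 's|https://registry.redhat.ren:4443|https://registry.redhat.ren:8443|g' /data/quay/clair-config/config.yaml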
podman run --restart=always -d \
    --name clair \
    -v /data/quay/clair-config:/clair/config:Z \
    -v /data/quay/clair-config/ca.crt:/etc/pki/ca-trust/source/anchors/ca.crt  \
    --pod quay \
    registry.redhat.ren:5443/quay.io/redhat/clair-jwt:v3.2.1

# stop and restart
podman stop clair
podman stop clair-postgres
podman stop quay-master
podman stop quay-redis
podman stop quay-mysql

podman rm quay-master
podman rm quay-redis
podman rm quay-mysql

podman rm clair
podman rm clair-postgres

podman pod ps
podman pod stop quay
podman pod rm quay

helper node zte oper

cd /data/ocp4/zte

oc project zxcdn
oc adm policy add-role-to-user admin zteadm -n zxcdn

oc create serviceaccount -n zxcdn zxcdn-app
oc adm policy add-scc-to-user privileged -z zxcdn-app -n zxcdn

# oc adm policy remove-scc-from-user privileged -z  zxcdn-app

oc get networks.operator.openshift.io cluster -o yaml

oc apply -f zte-macvlan.yaml

oc apply -f slbl7-configmap.yaml  
# oc apply -f slbl7-deployment.yaml 
oc apply -f slbl7-pod.yaml

oc apply -f ottcache-configmap.yaml  
oc apply -f ottcache-pod.yaml

# oc apply -f ott-service.yaml

oc delete -f slbl7-pod.yaml
oc delete -f ottcache-pod.yaml

## web cache
oc apply -f slb-configmap.yaml  
oc apply -f slb-deployment.yaml

oc delete -f slb-deployment.yaml

oc apply -f webcache-configmap.yaml  
oc apply -f webcache-deployment.yaml

oc delete -f webcache-deployment.yaml

helper host add vm-router


cd /data/ocp4/ocp4-upi-helpernode
ansible-playbook -e @vars-static.yaml -e staticips=true tasks/config.files.yml

# upload install-config.yaml to helper /data/ocp4
cd /data/ocp4

/bin/cp -f worker.ign /var/www/html/ignition/router-0.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-1.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-2.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-3.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-4.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-5.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-6.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-7.ign
/bin/cp -f worker.ign /var/www/html/ignition/router-8.ign

chmod 644 /var/www/html/ignition/*


export NGINX_DIRECTORY=/data/ocp4
export RHCOSVERSION=4.3.0
export VOLID=$(isoinfo -d -i ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso | awk '/Volume id/ { print $3 }')
TEMPDIR=$(mktemp -d)
echo $VOLID
echo $TEMPDIR

cd ${TEMPDIR}
# Extract the ISO content using guestfish (to avoid sudo mount)
guestfish -a ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso \
  -m /dev/sda tar-out / - | tar xvf -

# Helper function to modify the config files
modify_cfg(){
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    # Append the proper image and ignition urls
    sed -e '/coreos.inst=yes/s|$| coreos.inst.install_dev=vda coreos.inst.image_url='"${URL}"'\/install\/'"${BIOSMODE}"'.raw.gz coreos.inst.ignition_url='"${URL}"'\/ignition\/'"${NODE}"'.ign ip='"${IP}"'::'"${GATEWAY}"':'"${NETMASK}"':'"${FQDN}"':'"${NET_INTERFACE}"':none:'"${DNS}"' nameserver='"${DNS}"'|' ${file} > $(pwd)/${NODE}_${file##*/}
    # Boot directly in the installation
    sed -i -e 's/default vesamenu.c32/default linux/g' -e 's/timeout 600/timeout 10/g' $(pwd)/${NODE}_${file##*/}
  done
}

URL="http://117.177.241.16:8080/"
GATEWAY="117.177.241.1"
NETMASK="255.255.255.0"
DNS="117.177.241.16"

NODE="router-0"
IP="117.177.241.243"
FQDN="vm-router-0"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-1"
IP="117.177.241.244"
FQDN="vm-router-1"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-2"
IP="117.177.241.245"
FQDN="vm-router-2"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-3"
IP="117.177.241.246"
FQDN="vm-router-3"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-4"
IP="117.177.241.247"
FQDN="vm-router-4"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-5"
IP="117.177.241.248"
FQDN="vm-router-5"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-6"
IP="117.177.241.249"
FQDN="vm-router-6"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-7"
IP="117.177.241.250"
FQDN="vm-router-7"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

NODE="router-8"
IP="117.177.241.251"
FQDN="vm-router-8"
BIOSMODE="bios"
NET_INTERFACE="ens3"
modify_cfg

# Generate the images, one per node as the IP configuration is different...
# https://github.com/coreos/coreos-assembler/blob/master/src/cmd-buildextend-installer#L97-L103
for node in router-0 router-1 router-2 router-3 router-4 router-5 router-6 router-7 router-8; do
  # Overwrite the grub.cfg and isolinux.cfg files for each node type
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    /bin/cp -f $(pwd)/${node}_${file##*/} ${file}
  done
  # As regular user!
  genisoimage -verbose -rock -J -joliet-long -volset ${VOLID} \
    -eltorito-boot isolinux/isolinux.bin -eltorito-catalog isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    -eltorito-alt-boot -efi-boot images/efiboot.img -no-emul-boot \
    -o ${NGINX_DIRECTORY}/${node}.iso .
done

# Optionally, clean up
cd /data/ocp4
rm -Rf ${TEMPDIR}

cd ${NGINX_DIRECTORY}

scp router-*.iso root@117.177.241.21:/data/ocp4/

# after the router VMs are created and booted on the KVM host
oc get csr
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

oc label node vm-router-0.ocpsc.redhat.ren node-role.kubernetes.io/router=''
oc label node vm-router-1.ocpsc.redhat.ren node-role.kubernetes.io/router=''
oc label node vm-router-2.ocpsc.redhat.ren node-role.kubernetes.io/router=''
oc label node vm-router-3.ocpsc.redhat.ren node-role.kubernetes.io/router=''
oc label node vm-router-4.ocpsc.redhat.ren node-role.kubernetes.io/router=''
# oc label node vm-router-5.ocpsc.redhat.ren node-role.kubernetes.io/router=''
# oc label node vm-router-6.ocpsc.redhat.ren node-role.kubernetes.io/router=''
# oc label node vm-router-7.ocpsc.redhat.ren node-role.kubernetes.io/router=''
# oc label node vm-router-8.ocpsc.redhat.ren node-role.kubernetes.io/router=''

##########################
## secure the router vm

cat << EOF > router.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: router
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,router]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/router: ""
EOF
oc apply -f router.mcp.yaml

cat << EOF > wzh.script
#!/bin/bash

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -j ACCEPT
iptables -A INPUT -s 223.87.20.0/24 -j ACCEPT
iptables -A INPUT -s 117.177.241.0/24 -j ACCEPT
iptables -A INPUT -s 39.134.200.0/24 -j ACCEPT
iptables -A INPUT -s 39.134.201.0/24 -j ACCEPT
iptables -A INPUT -s 39.137.101.0/24 -j ACCEPT
iptables -A INPUT -s 192.168.7.0/24 -j ACCEPT
iptables -A INPUT -s 112.44.102.224/27 -j ACCEPT
iptables -A INPUT -s 47.93.86.113/32 -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

var_local=$(cat ./wzh.script | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(''.join(sys.stdin.readlines())))"  )

cat <<EOF > 45-router-wzh-service.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: router
  name: 45-router-wzh-service
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain,${var_local}
          verification: {}
        filesystem: root
        mode: 0755
        path: /etc/rc.d/wzh.local
    systemd:
      units:
      - name: wzh.service
        enabled: true
        contents: |
          [Unit]
          Description=/etc/rc.d/wzh.local Compatibility
          Documentation=zhengwan@redhat.com
          ConditionFileIsExecutable=/etc/rc.d/wzh.local
          After=network.target

          [Service]
          Type=oneshot
          User=root
          Group=root
          ExecStart=/bin/bash -c /etc/rc.d/wzh.local

          [Install]
          WantedBy=multi-user.target

EOF
oc apply -f 45-router-wzh-service.yaml -n openshift-config

# DO NOT
# cp 99-master-zzz-container-registries.yaml 99-router-zzz-container-registries.yaml 
# # change: machineconfiguration.openshift.io/role: router
# oc apply -f ./99-router-zzz-container-registries.yaml -n openshift-config

# on helper node
cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/registry
    delete:
        enabled: true
http:
    addr: :5443
    tls:
       certificate: /data/cert/redhat.ren.crt
       key: /data/cert/redhat.ren.key

EOF
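# note: unlike the earlier registry config, this one drops the auth/htpasswd section, so the registry now accepts anonymous access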

systemctl restart docker-distribution


helper node zte tcp-router


oc project openshift-ingress

# install the tcp-router and demo
oc create configmap customrouter-wzh --from-file=haproxy-config.template
oc apply -f haproxy.router.yaml

oc project zxcdn

oc apply -f ott-service.tcp.route.yaml


helper node cluster tuning

# tuning for pid.max

oc label mcp worker custom-kubelet-pod-pids-limit=true

cat << EOF > PodPidsLimit.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: pod-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet-pod-pids-limit: 'true'
  kubeletConfig:
    podPidsLimit: 4096
EOF
oc apply -f PodPidsLimit.yaml

cat << EOF > crio.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: set-log-and-pid
spec:
 machineConfigPoolSelector:
   matchLabels:
     custom-kubelet-pod-pids-limit: 'true'
 containerRuntimeConfig:
   pidsLimit: 10240
EOF
oc apply -f crio.yaml


helper node local storage

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-local.html


oc new-project local-storage


apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - infra0.hsc.redhat.ren
          - infra1.hsc.redhat.ren
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem 
      fsType: xfs 
      devicePaths: 
        - /dev/datavg/monitorlv
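# save the LocalVolume above as e.g. local-volume.yaml (file name is just an example) and apply it:
# oc apply -f local-volume.yaml -n local-storage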


bootstrap node day1

##########################################################3
## on bootstrap
yum -y install tigervnc-server tigervnc gnome-terminal gnome-session gnome-classic-session gnome-terminal nautilus-open-terminal control-center liberation-mono-fonts google-noto-sans-cjk-fonts google-noto-sans-fonts fonts-tweak-tool

yum install -y    qgnomeplatform   xdg-desktop-portal-gtk   NetworkManager-libreswan-gnome   PackageKit-command-not-found   PackageKit-gtk3-module   abrt-desktop   at-spi2-atk   at-spi2-core   avahi   baobab   caribou   caribou-gtk2-module   caribou-gtk3-module   cheese   compat-cheese314   control-center   dconf   empathy   eog   evince   evince-nautilus   file-roller   file-roller-nautilus   firewall-config   firstboot   fprintd-pam   gdm   gedit   glib-networking   gnome-bluetooth   gnome-boxes   gnome-calculator   gnome-classic-session   gnome-clocks   gnome-color-manager   gnome-contacts   gnome-dictionary   gnome-disk-utility   gnome-font-viewer   gnome-getting-started-docs   gnome-icon-theme   gnome-icon-theme-extras   gnome-icon-theme-symbolic   gnome-initial-setup   gnome-packagekit   gnome-packagekit-updater   gnome-screenshot   gnome-session   gnome-session-xsession   gnome-settings-daemon   gnome-shell   gnome-software   gnome-system-log   gnome-system-monitor   gnome-terminal   gnome-terminal-nautilus   gnome-themes-standard   gnome-tweak-tool   nm-connection-editor   orca   redhat-access-gui   sane-backends-drivers-scanners   seahorse   setroubleshoot   sushi   totem   totem-nautilus   vinagre   vino   xdg-user-dirs-gtk   yelp

yum install -y    cjkuni-uming-fonts   dejavu-sans-fonts   dejavu-sans-mono-fonts   dejavu-serif-fonts   gnu-free-mono-fonts   gnu-free-sans-fonts   gnu-free-serif-fonts   google-crosextra-caladea-fonts   google-crosextra-carlito-fonts   google-noto-emoji-fonts   jomolhari-fonts   khmeros-base-fonts   liberation-mono-fonts   liberation-sans-fonts   liberation-serif-fonts   lklug-fonts   lohit-assamese-fonts   lohit-bengali-fonts   lohit-devanagari-fonts   lohit-gujarati-fonts   lohit-kannada-fonts   lohit-malayalam-fonts   lohit-marathi-fonts   lohit-nepali-fonts   lohit-oriya-fonts   lohit-punjabi-fonts   lohit-tamil-fonts   lohit-telugu-fonts   madan-fonts   nhn-nanum-gothic-fonts   open-sans-fonts   overpass-fonts   paktype-naskh-basic-fonts   paratype-pt-sans-fonts   sil-abyssinica-fonts   sil-nuosu-fonts   sil-padauk-fonts   smc-meera-fonts   stix-fonts   thai-scalable-waree-fonts   ucs-miscfixed-fonts   vlgothic-fonts   wqy-microhei-fonts   wqy-zenhei-fonts

vncpasswd

cat << EOF > ~/.vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
gnome-session &
EOF
chmod +x ~/.vnc/xstartup

vncserver :1 -geometry 1280x800
# to stop the vnc server, run:
vncserver -kill :1

firewall-cmd --permanent --add-port=6001/tcp
firewall-cmd --permanent --add-port=5901/tcp
firewall-cmd --reload

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

brctl show
virsh net-list

cat << EOF >  /data/virt-net.xml
<network>
  <name>br0</name>
  <forward mode='bridge'>
    <bridge name='br0'/>
  </forward>
</network>
EOF

virsh net-define --file /data/virt-net.xml
virsh net-dumpxml br0
# virsh net-undefine openshift4
# virsh net-destroy openshift4
virsh net-autostart br0
virsh net-start br0

cp /etc/sysconfig/network-scripts/ifcfg-em1 /etc/sysconfig/network-scripts/ifcfg-em1.orig

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-em1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=em1
DEVICE=em1
ONBOOT=yes
# IPADDR=117.177.241.21
# PREFIX=24
# GATEWAY=117.177.241.1
IPV6_PRIVACY=no
# DNS1=117.177.241.16
BRIDGE=br0
EOF

cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br0 
TYPE=Bridge
BOOTPROTO=static
IPADDR=117.177.241.21
GATEWAY=117.177.241.1
DNS1=117.177.241.16
ONBOOT=yes
DEFROUTE=yes
NAME=br0
DEVICE=br0
PREFIX=24
EOF

systemctl restart network

virt-install --name=ocp4-bootstrap --vcpus=2 --ram=16384 \
--disk path=/data/kvm/ocp4-bootstrap.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/bootstrap-static.iso   

virt-install --name=ocp4-master0 --vcpus=8 --ram=65536 \
--disk path=/data/kvm/ocp4-master0.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/master-0.iso 

# virt-install --name=ocp4-master1 --vcpus=20 --ram=200704 \
# --disk path=/data/kvm/ocp4-master1.qcow2,bus=virtio,size=200 \
# --os-variant rhel8.0 --network bridge=br0,model=virtio \
# --boot menu=on --cdrom /data/ocp4/master-1.iso 

virt-install --name=ocp4-master2 --vcpus=8 --ram=65536 \
--disk path=/data/kvm/ocp4-master2.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/master-2.iso 

virt-install --name=ocp4-worker0 --vcpus=4 --ram=32768 \
--disk path=/data/kvm/ocp4-worker0.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/worker-0.iso 

virt-install --name=ocp4-worker1 --vcpus=4 --ram=32768 \
--disk path=/data/kvm/ocp4-worker1.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/worker-1.iso 


tar -cvf - ocp4-master0.qcow2 | pigz -c > /data/kvm/ocp4-master0.qcow2.tgz
rsync -e "ssh -c chacha20-poly1305@openssh.com" --info=progress2 -P -arz  /data/kvm/ocp4-master0.qcow2.tgz root@117.177.241.18:/data/kvm/

tar -cvf - ocp4-master2.qcow2 | pigz -c > /data/kvm/ocp4-master2.qcow2.tgz
rsync -e "ssh -c chacha20-poly1305@openssh.com" --info=progress2 -P -arz  /data/kvm/ocp4-master2.qcow2.tgz root@117.177.241.22:/data/kvm/

# anti scan
firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

cat > /root/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=/root/iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all
firewall-cmd --get-active-zones

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload

# https://access.redhat.com/solutions/39604
virsh list

virsh dump ocp4-router-0 /data/tmp/ocp4-router-0.dump --memory-only --verbose

virsh dump ocp4-router-1 /data/tmp/ocp4-router-1.dump --memory-only --verbose

virsh dump ocp4-router-2 /data/tmp/ocp4-router-2.dump --memory-only --verbose

virsh dump ocp4-router-3 /data/tmp/ocp4-router-3.dump --memory-only --verbose

cd /data
tar -cvf - tmp/ | pigz -c > virsh.dump.tgz



################################
## add more router vm
virt-install --name=ocp4-router-0 --vcpus=4 --ram=16384 \
--disk path=/data/kvm/ocp4-router-0.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/router-0.iso 

virt-install --name=ocp4-router-1 --vcpus=4 --ram=16384 \
--disk path=/data/kvm/ocp4-router-1.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/router-1.iso 

virt-install --name=ocp4-router-2 --vcpus=4 --ram=16384 \
--disk path=/data/kvm/ocp4-router-2.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/router-2.iso 

virt-install --name=ocp4-router-3 --vcpus=4 --ram=16384 \
--disk path=/data/kvm/ocp4-router-3.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/router-3.iso 

virt-install --name=ocp4-router-4 --vcpus=4 --ram=16384 \
--disk path=/data/kvm/ocp4-router-4.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/router-4.iso 

# virt-install --name=ocp4-router-5 --vcpus=2 --ram=8192 \
# --disk path=/data/kvm/ocp4-router-5.qcow2,bus=virtio,size=200 \
# --os-variant rhel8.0 --network bridge=br0,model=virtio \
# --boot menu=on --cdrom /data/ocp4/router-5.iso 

# virt-install --name=ocp4-router-6 --vcpus=2 --ram=8192 \
# --disk path=/data/kvm/ocp4-router-6.qcow2,bus=virtio,size=200 \
# --os-variant rhel8.0 --network bridge=br0,model=virtio \
# --boot menu=on --cdrom /data/ocp4/router-6.iso 

# virt-install --name=ocp4-router-7 --vcpus=2 --ram=8192 \
# --disk path=/data/kvm/ocp4-router-7.qcow2,bus=virtio,size=200 \
# --os-variant rhel8.0 --network bridge=br0,model=virtio \
# --boot menu=on --cdrom /data/ocp4/router-7.iso 

# virt-install --name=ocp4-router-8 --vcpus=2 --ram=8192 \
# --disk path=/data/kvm/ocp4-router-8.qcow2,bus=virtio,size=200 \
# --os-variant rhel8.0 --network bridge=br0,model=virtio \
# --boot menu=on --cdrom /data/ocp4/router-8.iso 


# helper node operation


master1 node day1

##########################################################3
## on master1
yum -y install tigervnc-server tigervnc gnome-terminal gnome-session gnome-classic-session gnome-terminal nautilus-open-terminal control-center liberation-mono-fonts google-noto-sans-cjk-fonts google-noto-sans-fonts fonts-tweak-tool

yum install -y    qgnomeplatform   xdg-desktop-portal-gtk   NetworkManager-libreswan-gnome   PackageKit-command-not-found   PackageKit-gtk3-module   abrt-desktop   at-spi2-atk   at-spi2-core   avahi   baobab   caribou   caribou-gtk2-module   caribou-gtk3-module   cheese   compat-cheese314   control-center   dconf   empathy   eog   evince   evince-nautilus   file-roller   file-roller-nautilus   firewall-config   firstboot   fprintd-pam   gdm   gedit   glib-networking   gnome-bluetooth   gnome-boxes   gnome-calculator   gnome-classic-session   gnome-clocks   gnome-color-manager   gnome-contacts   gnome-dictionary   gnome-disk-utility   gnome-font-viewer   gnome-getting-started-docs   gnome-icon-theme   gnome-icon-theme-extras   gnome-icon-theme-symbolic   gnome-initial-setup   gnome-packagekit   gnome-packagekit-updater   gnome-screenshot   gnome-session   gnome-session-xsession   gnome-settings-daemon   gnome-shell   gnome-software   gnome-system-log   gnome-system-monitor   gnome-terminal   gnome-terminal-nautilus   gnome-themes-standard   gnome-tweak-tool   nm-connection-editor   orca   redhat-access-gui   sane-backends-drivers-scanners   seahorse   setroubleshoot   sushi   totem   totem-nautilus   vinagre   vino   xdg-user-dirs-gtk   yelp

yum install -y    cjkuni-uming-fonts   dejavu-sans-fonts   dejavu-sans-mono-fonts   dejavu-serif-fonts   gnu-free-mono-fonts   gnu-free-sans-fonts   gnu-free-serif-fonts   google-crosextra-caladea-fonts   google-crosextra-carlito-fonts   google-noto-emoji-fonts   jomolhari-fonts   khmeros-base-fonts   liberation-mono-fonts   liberation-sans-fonts   liberation-serif-fonts   lklug-fonts   lohit-assamese-fonts   lohit-bengali-fonts   lohit-devanagari-fonts   lohit-gujarati-fonts   lohit-kannada-fonts   lohit-malayalam-fonts   lohit-marathi-fonts   lohit-nepali-fonts   lohit-oriya-fonts   lohit-punjabi-fonts   lohit-tamil-fonts   lohit-telugu-fonts   madan-fonts   nhn-nanum-gothic-fonts   open-sans-fonts   overpass-fonts   paktype-naskh-basic-fonts   paratype-pt-sans-fonts   sil-abyssinica-fonts   sil-nuosu-fonts   sil-padauk-fonts   smc-meera-fonts   stix-fonts   thai-scalable-waree-fonts   ucs-miscfixed-fonts   vlgothic-fonts   wqy-microhei-fonts   wqy-zenhei-fonts

vncpasswd

cat << EOF > ~/.vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
gnome-session &
EOF
chmod +x ~/.vnc/xstartup

vncserver :1 -geometry 1280x800
# to stop the vnc server, run:
vncserver -kill :1

firewall-cmd --permanent --add-port=6001/tcp
firewall-cmd --permanent --add-port=5901/tcp
firewall-cmd --reload

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

brctl show
virsh net-list

cat << EOF >  /data/virt-net.xml
<network>
  <name>br0</name>
  <forward mode='bridge'>
    <bridge name='br0'/>
  </forward>
</network>
EOF

virsh net-define --file /data/virt-net.xml
virsh net-dumpxml br0
# virsh net-undefine openshift4
# virsh net-destroy openshift4
virsh net-autostart br0
virsh net-start br0

cp /etc/sysconfig/network-scripts/ifcfg-em1 /etc/sysconfig/network-scripts/ifcfg-em1.orig

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-em1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=em1
DEVICE=em1
ONBOOT=yes
# IPADDR=117.177.241.17
# PREFIX=24
# GATEWAY=117.177.241.1
IPV6_PRIVACY=no
# DNS1=117.177.241.16
BRIDGE=br0
EOF

cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br0 
TYPE=Bridge
BOOTPROTO=static
IPADDR=117.177.241.17
GATEWAY=117.177.241.1
DNS1=117.177.241.16
ONBOOT=yes
DEFROUTE=yes
NAME=br0
DEVICE=br0
PREFIX=24
EOF

systemctl restart network

virt-install --name=ocp4-master1 --vcpus=20 --ram=200704 \
--disk path=/data/kvm/ocp4-master1.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on --cdrom /data/ocp4/master-1.iso 

virsh list --all

virsh start ocp4-master1

# anti scan
firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

cat > /root/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=/root/iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all
firewall-cmd --get-active-zones

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload

master0 node day1

########################################################
# master0 
yum -y install tigervnc-server tigervnc gnome-terminal gnome-session gnome-classic-session gnome-terminal nautilus-open-terminal control-center liberation-mono-fonts google-noto-sans-cjk-fonts google-noto-sans-fonts fonts-tweak-tool

yum install -y    qgnomeplatform   xdg-desktop-portal-gtk   NetworkManager-libreswan-gnome   PackageKit-command-not-found   PackageKit-gtk3-module   abrt-desktop   at-spi2-atk   at-spi2-core   avahi   baobab   caribou   caribou-gtk2-module   caribou-gtk3-module   cheese   compat-cheese314   control-center   dconf   empathy   eog   evince   evince-nautilus   file-roller   file-roller-nautilus   firewall-config   firstboot   fprintd-pam   gdm   gedit   glib-networking   gnome-bluetooth   gnome-boxes   gnome-calculator   gnome-classic-session   gnome-clocks   gnome-color-manager   gnome-contacts   gnome-dictionary   gnome-disk-utility   gnome-font-viewer   gnome-getting-started-docs   gnome-icon-theme   gnome-icon-theme-extras   gnome-icon-theme-symbolic   gnome-initial-setup   gnome-packagekit   gnome-packagekit-updater   gnome-screenshot   gnome-session   gnome-session-xsession   gnome-settings-daemon   gnome-shell   gnome-software   gnome-system-log   gnome-system-monitor   gnome-terminal   gnome-terminal-nautilus   gnome-themes-standard   gnome-tweak-tool   nm-connection-editor   orca   redhat-access-gui   sane-backends-drivers-scanners   seahorse   setroubleshoot   sushi   totem   totem-nautilus   vinagre   vino   xdg-user-dirs-gtk   yelp

yum install -y    cjkuni-uming-fonts   dejavu-sans-fonts   dejavu-sans-mono-fonts   dejavu-serif-fonts   gnu-free-mono-fonts   gnu-free-sans-fonts   gnu-free-serif-fonts   google-crosextra-caladea-fonts   google-crosextra-carlito-fonts   google-noto-emoji-fonts   jomolhari-fonts   khmeros-base-fonts   liberation-mono-fonts   liberation-sans-fonts   liberation-serif-fonts   lklug-fonts   lohit-assamese-fonts   lohit-bengali-fonts   lohit-devanagari-fonts   lohit-gujarati-fonts   lohit-kannada-fonts   lohit-malayalam-fonts   lohit-marathi-fonts   lohit-nepali-fonts   lohit-oriya-fonts   lohit-punjabi-fonts   lohit-tamil-fonts   lohit-telugu-fonts   madan-fonts   nhn-nanum-gothic-fonts   open-sans-fonts   overpass-fonts   paktype-naskh-basic-fonts   paratype-pt-sans-fonts   sil-abyssinica-fonts   sil-nuosu-fonts   sil-padauk-fonts   smc-meera-fonts   stix-fonts   thai-scalable-waree-fonts   ucs-miscfixed-fonts   vlgothic-fonts   wqy-microhei-fonts   wqy-zenhei-fonts

vncpasswd

cat << EOF > ~/.vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
gnome-session &
EOF
chmod +x ~/.vnc/xstartup

vncserver :1 -geometry 1280x800
# to stop the vnc server, run:
vncserver -kill :1

firewall-cmd --permanent --add-port=6001/tcp
firewall-cmd --permanent --add-port=5901/tcp
firewall-cmd --reload

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

brctl show
virsh net-list

cat << EOF >  /data/virt-net.xml
<network>
  <name>br0</name>
  <forward mode='bridge'>
    <bridge name='br0'/>
  </forward>
</network>
EOF

virsh net-define --file /data/virt-net.xml
virsh net-dumpxml br0
# virsh net-undefine openshift4
# virsh net-destroy openshift4
virsh net-autostart br0
virsh net-start br0

cp /etc/sysconfig/network-scripts/ifcfg-em1 /etc/sysconfig/network-scripts/ifcfg-em1.orig

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-em1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=em1
DEVICE=em1
ONBOOT=yes
# IPADDR=117.177.241.18
# PREFIX=24
# GATEWAY=117.177.241.1
IPV6_PRIVACY=no
# DNS1=117.177.241.16
BRIDGE=br0
EOF

cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br0 
TYPE=Bridge
BOOTPROTO=static
IPADDR=117.177.241.18
GATEWAY=117.177.241.1
DNS1=117.177.241.16
ONBOOT=yes
DEFROUTE=yes
NAME=br0
DEVICE=br0
PREFIX=24
EOF

systemctl restart network

mkdir -p /data/ocp4
mkdir -p /data/kvm

cd /data/kvm && pigz -dc ocp4-master0.qcow2.tgz | tar xf -

virt-install --name=ocp4-master0 --vcpus=20 --ram=200704 \
--disk path=/data/kvm/ocp4-master0.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on 

virsh list --all

virsh start ocp4-master0

# anti scan
firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

cat > /root/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=/root/iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all
firewall-cmd --get-active-zones

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload

master2 node day1

########################################################
# master2 
yum -y install tigervnc-server tigervnc gnome-terminal gnome-session gnome-classic-session gnome-terminal nautilus-open-terminal control-center liberation-mono-fonts google-noto-sans-cjk-fonts google-noto-sans-fonts fonts-tweak-tool

yum install -y    qgnomeplatform   xdg-desktop-portal-gtk   NetworkManager-libreswan-gnome   PackageKit-command-not-found   PackageKit-gtk3-module   abrt-desktop   at-spi2-atk   at-spi2-core   avahi   baobab   caribou   caribou-gtk2-module   caribou-gtk3-module   cheese   compat-cheese314   control-center   dconf   empathy   eog   evince   evince-nautilus   file-roller   file-roller-nautilus   firewall-config   firstboot   fprintd-pam   gdm   gedit   glib-networking   gnome-bluetooth   gnome-boxes   gnome-calculator   gnome-classic-session   gnome-clocks   gnome-color-manager   gnome-contacts   gnome-dictionary   gnome-disk-utility   gnome-font-viewer   gnome-getting-started-docs   gnome-icon-theme   gnome-icon-theme-extras   gnome-icon-theme-symbolic   gnome-initial-setup   gnome-packagekit   gnome-packagekit-updater   gnome-screenshot   gnome-session   gnome-session-xsession   gnome-settings-daemon   gnome-shell   gnome-software   gnome-system-log   gnome-system-monitor   gnome-terminal   gnome-terminal-nautilus   gnome-themes-standard   gnome-tweak-tool   nm-connection-editor   orca   redhat-access-gui   sane-backends-drivers-scanners   seahorse   setroubleshoot   sushi   totem   totem-nautilus   vinagre   vino   xdg-user-dirs-gtk   yelp

yum install -y    cjkuni-uming-fonts   dejavu-sans-fonts   dejavu-sans-mono-fonts   dejavu-serif-fonts   gnu-free-mono-fonts   gnu-free-sans-fonts   gnu-free-serif-fonts   google-crosextra-caladea-fonts   google-crosextra-carlito-fonts   google-noto-emoji-fonts   jomolhari-fonts   khmeros-base-fonts   liberation-mono-fonts   liberation-sans-fonts   liberation-serif-fonts   lklug-fonts   lohit-assamese-fonts   lohit-bengali-fonts   lohit-devanagari-fonts   lohit-gujarati-fonts   lohit-kannada-fonts   lohit-malayalam-fonts   lohit-marathi-fonts   lohit-nepali-fonts   lohit-oriya-fonts   lohit-punjabi-fonts   lohit-tamil-fonts   lohit-telugu-fonts   madan-fonts   nhn-nanum-gothic-fonts   open-sans-fonts   overpass-fonts   paktype-naskh-basic-fonts   paratype-pt-sans-fonts   sil-abyssinica-fonts   sil-nuosu-fonts   sil-padauk-fonts   smc-meera-fonts   stix-fonts   thai-scalable-waree-fonts   ucs-miscfixed-fonts   vlgothic-fonts   wqy-microhei-fonts   wqy-zenhei-fonts

vncpasswd

cat << EOF > ~/.vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
gnome-session &
EOF
chmod +x ~/.vnc/xstartup

vncserver :1 -geometry 1280x800
# to stop the vnc server, run:
vncserver -kill :1

firewall-cmd --permanent --add-port=6001/tcp
firewall-cmd --permanent --add-port=5901/tcp
firewall-cmd --reload

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

brctl show
virsh net-list

cat << EOF >  /data/virt-net.xml
<network>
  <name>br0</name>
  <forward mode='bridge'>
    <bridge name='br0'/>
  </forward>
</network>
EOF

virsh net-define --file /data/virt-net.xml
virsh net-dumpxml br0
# virsh net-undefine openshift4
# virsh net-destroy openshift4
virsh net-autostart br0
virsh net-start br0

cp /etc/sysconfig/network-scripts/ifcfg-em1 /etc/sysconfig/network-scripts/ifcfg-em1.orig

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-em1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=em1
DEVICE=em1
ONBOOT=yes
# IPADDR=117.177.241.22
# PREFIX=24
# GATEWAY=117.177.241.1
IPV6_PRIVACY=no
# DNS1=117.177.241.16
BRIDGE=br0
EOF

cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br0 
TYPE=Bridge
BOOTPROTO=static
IPADDR=117.177.241.22
GATEWAY=117.177.241.1
DNS1=117.177.241.16
ONBOOT=yes
DEFROUTE=yes
NAME=br0
DEVICE=br0
PREFIX=24
EOF

systemctl restart network

mkdir -p /data/ocp4
mkdir -p /data/kvm

cd /data/kvm && pigz -dc ocp4-master2.qcow2.tgz | tar xf -

virt-install --name=ocp4-master2 --vcpus=20 --ram=200704 \
--disk path=/data/kvm/ocp4-master2.qcow2,bus=virtio,size=200 \
--os-variant rhel8.0 --network bridge=br0,model=virtio \
--boot menu=on 

virsh list --all

virsh start ocp4-master2

# anti scan
firewall-cmd --permanent --new-ipset=my-allow-list --type=hash:net
firewall-cmd --permanent --get-ipsets

cat > /root/iplist.txt <<EOL
127.0.0.1/32
223.87.20.0/24
117.177.241.0/24
39.134.200.0/24
39.134.201.0/24
39.137.101.0/24
192.168.7.0/24
112.44.102.224/27
47.93.86.113/32
EOL

firewall-cmd --permanent --ipset=my-allow-list --add-entries-from-file=/root/iplist.txt

firewall-cmd --permanent --ipset=my-allow-list --get-entries

firewall-cmd --permanent --zone=trusted --add-source=ipset:my-allow-list 
firewall-cmd --reload

firewall-cmd --list-all
firewall-cmd --get-active-zones

firewall-cmd --set-default-zone=block
firewall-cmd --runtime-to-permanent
firewall-cmd --reload

infra0 node day1

systemctl disable firewalld.service
systemctl stop firewalld.service

# secure for anti-scan
cat << EOF >> /etc/rc.local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32

ipset add my-allow-set 39.134.204.0/24

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

infra1 node day1

systemctl disable firewalld.service
systemctl stop firewalld.service

# secure for anti-scan
cat << EOF >> /etc/rc.local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local

# set up the KVM environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

worker-0 day2 oper


podman login registry.redhat.ren:4443 -u zteadm

# localhost/ottcache-img:6.01.05.01T03
skopeo copy docker-archive:ZXCDN-OTT-IAS-IMGV6.01.05.01_TEST.tar docker://registry.redhat.ren:4443/zteadm/ottcache-img:6.01.05.01T03

# localhost/slbl7-img:6.01.05.01T03
skopeo copy docker-archive:ZXCDN-OTT-SLBL7-IMGV6.01.05.01_TEST.tar docker://registry.redhat.ren:4443/zteadm/slbl7-img:6.01.05.01T03

# localhost/webcache-img:v6.01.04.03
skopeo copy docker-archive:ZXCDN-CACHE-WEBCACHE-IMGV6.01.04.03.tar docker://registry.redhat.ren:4443/zteadm/webcache-img:v6.01.04.03

# localhost/pg-img:v1.01.01.01
skopeo copy docker-archive:ZXCDN-PG-IMGV1.01.01.01.tar docker://registry.redhat.ren:4443/zteadm/pg-img:v1.01.01.01

# localhost/slb-img:v6.01.04.03
skopeo copy docker-archive:ZXCDN-CACHE-SLB-IMGV6.01.04.03.tar docker://registry.redhat.ren:4443/zteadm/slb-img:v6.01.04.03
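
Once the copies finish, the images can be checked directly against the registry; a sketch (use --tls-verify=false if the self-signed CA is not trusted on this host):

skopeo inspect --tls-verify=false docker://registry.redhat.ren:4443/zteadm/ottcache-img:6.01.05.01T03

# newer skopeo releases can also list every tag in a repository
# skopeo list-tags --tls-verify=false docker://registry.redhat.ren:4443/zteadm/ottcache-img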

# io speed test
dd if=/dev/zero of=/data/testfile bs=1G count=10
# 10+0 records in
# 10+0 records out
# 10737418240 bytes (11 GB) copied, 6.85688 s, 1.6 GB/s

dd if=/dev/zero of=/data/testfile bs=1G count=10 oflag=direct
# 10+0 records in
# 10+0 records out
# 10737418240 bytes (11 GB) copied, 3.98098 s, 2.7 GB/s

dd if=/dev/zero of=/data/testfile bs=5M count=9999
# 9999+0 records in
# 9999+0 records out
# 52423557120 bytes (52 GB) copied, 27.8529 s, 1.9 GB/s

dd if=/dev/zero of=/data/testfile bs=5M count=9999 oflag=direct
# 9999+0 records in
# 9999+0 records out
# 52423557120 bytes (52 GB) copied, 16.1121 s, 3.3 GB/s

dd if=/dev/zero of=/data/testfile bs=5M count=9999 oflag=dsync
# 9999+0 records in
# 9999+0 records out
# 52423557120 bytes (52 GB) copied, 51.2713 s, 1.0 GB/s

# read tests (the test file was just written, so these reads are likely served largely from the page cache)
dd if=/data/testfile of=/dev/null bs=1M count=9999 oflag=dsync
# 9999+0 records in
# 9999+0 records out
# 10484711424 bytes (10 GB) copied, 1.9141 s, 5.5 GB/s

dd if=/data/testfile of=/dev/null bs=5M count=9999 oflag=dsync
# 9999+0 records in
# 9999+0 records out
# 52423557120 bytes (52 GB) copied, 9.3676 s, 5.6 GB/s
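
dd only measures single-threaded sequential I/O; something closer to a real workload can be simulated with fio. A sketch (the fio package and the parameters below are assumptions, tune block size, queue depth and job count to the actual scenario):

yum -y install fio

# 4k random write, direct I/O, 4 jobs, 30 seconds, 1G test file under /data
fio --name=randwrite --filename=/data/fio.test --size=1G \
  --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
  --direct=1 --numjobs=4 --runtime=30 --time_based --group_reporting

# 1M sequential read
fio --name=seqread --filename=/data/fio.test --size=1G \
  --rw=read --bs=1M --ioengine=libaio --iodepth=8 \
  --direct=1 --runtime=30 --time_based --group_reporting

rm -f /data/fio.test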

# secure for anti-scan
cat << EOF > /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32
ipset add my-allow-set 39.134.198.0/24

ipset add my-allow-set 218.205.236.16/28

ipset add my-allow-set 39.134.204.0/24

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local

# also add the new entries to the running ipset (rc.local only applies them at boot)
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32

# set up the kvm environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

worker-1 day2 oper

cat << EOF > /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32
ipset add my-allow-set 39.134.198.0/24

ipset add my-allow-set 218.205.236.16/28

ipset add my-allow-set 39.134.204.0/24

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local

# set up the kvm environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd

worker-2 day2 oper

cat << EOF > /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

ipset create my-allow-set hash:net
ipset add my-allow-set 127.0.0.1/32
ipset add my-allow-set 223.87.20.0/24
ipset add my-allow-set 117.177.241.0/24
ipset add my-allow-set 39.134.200.0/24
ipset add my-allow-set 39.134.201.0/24
ipset add my-allow-set 39.137.101.0/24
ipset add my-allow-set 192.168.7.0/24
ipset add my-allow-set 112.44.102.224/27
ipset add my-allow-set 47.93.86.113/32
ipset add my-allow-set 221.226.0.75/32
ipset add my-allow-set 210.21.236.182/32
ipset add my-allow-set 61.132.54.2/32
ipset add my-allow-set 39.134.198.0/24

ipset add my-allow-set 218.205.236.16/28

ipset add my-allow-set 39.134.204.0/24

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m set --match-set my-allow-set src -j ACCEPT
iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -p udp -j REJECT

EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local

# systemctl restart rc-local

# set up the kvm environment
yum -y install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install virt-viewer virt-manager

systemctl enable libvirtd
systemctl start libvirtd
systemctl status libvirtd

systemctl stop libvirtd
systemctl disable libvirtd
# Installed:
#   libguestfs-tools.noarch 1:1.40.2-5.el7_7.3          libvirt.x86_64 0:4.5.0-23.el7_7.5          libvirt-python.x86_64 0:4.5.0-1.el7
#   qemu-kvm.x86_64 10:1.5.3-167.el7_7.4                virt-install.noarch 0:1.5.0-7.el7          virt-manager.noarch 0:1.5.0-7.el7
#   virt-viewer.x86_64 0:5.0-15.el7

# Dependency Installed:
#   adwaita-cursor-theme.noarch 0:3.28.0-1.el7                            adwaita-icon-theme.noarch 0:3.28.0-1.el7
#   at-spi2-atk.x86_64 0:2.26.2-1.el7                                     at-spi2-core.x86_64 0:2.28.0-1.el7
#   atk.x86_64 0:2.28.1-1.el7                                             augeas-libs.x86_64 0:1.4.0-9.el7
#   autogen-libopts.x86_64 0:5.18-5.el7                                   cairo.x86_64 0:1.15.12-4.el7
#   cairo-gobject.x86_64 0:1.15.12-4.el7                                  cdparanoia-libs.x86_64 0:10.2-17.el7
#   celt051.x86_64 0:0.5.1.3-8.el7                                        colord-libs.x86_64 0:1.3.4-1.el7
#   cyrus-sasl.x86_64 0:2.1.26-23.el7                                     dbus-x11.x86_64 1:1.10.24-13.el7_6
#   dconf.x86_64 0:0.28.0-4.el7                                           dejavu-fonts-common.noarch 0:2.33-6.el7
#   dejavu-sans-fonts.noarch 0:2.33-6.el7                                 flac-libs.x86_64 0:1.3.0-5.el7_1
#   fontconfig.x86_64 0:2.13.0-4.3.el7                                    fontpackages-filesystem.noarch 0:1.44-8.el7
#   fribidi.x86_64 0:1.0.2-1.el7_7.1                                      fuse.x86_64 0:2.9.2-11.el7
#   fuse-libs.x86_64 0:2.9.2-11.el7                                       gdk-pixbuf2.x86_64 0:2.36.12-3.el7
#   genisoimage.x86_64 0:1.1.11-25.el7                                    glib-networking.x86_64 0:2.56.1-1.el7
#   glusterfs-api.x86_64 0:3.12.2-47.2.el7                                glusterfs-cli.x86_64 0:3.12.2-47.2.el7
#   gnome-icon-theme.noarch 0:3.12.0-1.el7                                gnutls.x86_64 0:3.3.29-9.el7_6
#   gnutls-dane.x86_64 0:3.3.29-9.el7_6                                   gnutls-utils.x86_64 0:3.3.29-9.el7_6
#   gperftools-libs.x86_64 0:2.6.1-1.el7                                  graphite2.x86_64 0:1.3.10-1.el7_3
#   gsettings-desktop-schemas.x86_64 0:3.28.0-2.el7                       gsm.x86_64 0:1.0.13-11.el7
#   gstreamer1.x86_64 0:1.10.4-2.el7                                      gstreamer1-plugins-base.x86_64 0:1.10.4-2.el7
#   gtk-update-icon-cache.x86_64 0:3.22.30-3.el7                          gtk-vnc2.x86_64 0:0.7.0-3.el7
#   gtk3.x86_64 0:3.22.30-3.el7                                           gvnc.x86_64 0:0.7.0-3.el7
#   harfbuzz.x86_64 0:1.7.5-2.el7                                         hexedit.x86_64 0:1.2.13-5.el7
#   hicolor-icon-theme.noarch 0:0.12-7.el7                                hivex.x86_64 0:1.3.10-6.9.el7
#   ipxe-roms-qemu.noarch 0:20180825-2.git133f4c.el7                      iso-codes.noarch 0:3.46-2.el7
#   jasper-libs.x86_64 0:1.900.1-33.el7                                   jbigkit-libs.x86_64 0:2.0-11.el7
#   json-glib.x86_64 0:1.4.2-2.el7                                        lcms2.x86_64 0:2.6-3.el7
#   libICE.x86_64 0:1.0.9-9.el7                                           libSM.x86_64 0:1.2.2-2.el7
#   libX11.x86_64 0:1.6.7-2.el7                                           libX11-common.noarch 0:1.6.7-2.el7
#   libXau.x86_64 0:1.0.8-2.1.el7                                         libXcomposite.x86_64 0:0.4.4-4.1.el7
#   libXcursor.x86_64 0:1.1.15-1.el7                                      libXdamage.x86_64 0:1.1.4-4.1.el7
#   libXext.x86_64 0:1.3.3-3.el7                                          libXfixes.x86_64 0:5.0.3-1.el7
#   libXft.x86_64 0:2.3.2-2.el7                                           libXi.x86_64 0:1.7.9-1.el7
#   libXinerama.x86_64 0:1.1.3-2.1.el7                                    libXmu.x86_64 0:1.1.2-2.el7
#   libXrandr.x86_64 0:1.5.1-2.el7                                        libXrender.x86_64 0:0.9.10-1.el7
#   libXt.x86_64 0:1.1.5-3.el7                                            libXtst.x86_64 0:1.2.3-1.el7
#   libXv.x86_64 0:1.0.11-1.el7                                           libXxf86misc.x86_64 0:1.0.3-7.1.el7
#   libXxf86vm.x86_64 0:1.1.4-1.el7                                       libarchive.x86_64 0:3.1.2-14.el7_7
#   libasyncns.x86_64 0:0.8-7.el7                                         libcacard.x86_64 40:2.5.2-2.el7
#   libconfig.x86_64 0:1.4.9-5.el7                                        libepoxy.x86_64 0:1.5.2-1.el7
#   libglvnd.x86_64 1:1.0.1-0.8.git5baa1e5.el7                            libglvnd-egl.x86_64 1:1.0.1-0.8.git5baa1e5.el7
#   libglvnd-glx.x86_64 1:1.0.1-0.8.git5baa1e5.el7                        libgovirt.x86_64 0:0.3.4-3.el7
#   libguestfs.x86_64 1:1.40.2-5.el7_7.3                                  libguestfs-tools-c.x86_64 1:1.40.2-5.el7_7.3
#   libgusb.x86_64 0:0.2.9-1.el7                                          libibverbs.x86_64 0:22.1-3.el7
#   libiscsi.x86_64 0:1.9.0-7.el7                                         libjpeg-turbo.x86_64 0:1.2.90-8.el7
#   libmodman.x86_64 0:2.0.1-8.el7                                        libogg.x86_64 2:1.3.0-7.el7
#   libosinfo.x86_64 0:1.1.0-3.el7                                        libproxy.x86_64 0:0.4.11-11.el7
#   librdmacm.x86_64 0:22.1-3.el7                                         libsndfile.x86_64 0:1.0.25-10.el7
#   libsoup.x86_64 0:2.62.2-2.el7                                         libthai.x86_64 0:0.1.14-9.el7
#   libtheora.x86_64 1:1.1.1-8.el7                                        libtiff.x86_64 0:4.0.3-32.el7
#   libusal.x86_64 0:1.1.11-25.el7                                        libusbx.x86_64 0:1.0.21-1.el7
#   libvirt-bash-completion.x86_64 0:4.5.0-23.el7_7.5                     libvirt-client.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon.x86_64 0:4.5.0-23.el7_7.5                              libvirt-daemon-config-network.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-config-nwfilter.x86_64 0:4.5.0-23.el7_7.5              libvirt-daemon-driver-interface.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-lxc.x86_64 0:4.5.0-23.el7_7.5                   libvirt-daemon-driver-network.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-nodedev.x86_64 0:4.5.0-23.el7_7.5               libvirt-daemon-driver-nwfilter.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-qemu.x86_64 0:4.5.0-23.el7_7.5                  libvirt-daemon-driver-secret.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-storage.x86_64 0:4.5.0-23.el7_7.5               libvirt-daemon-driver-storage-core.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-storage-disk.x86_64 0:4.5.0-23.el7_7.5          libvirt-daemon-driver-storage-gluster.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-storage-iscsi.x86_64 0:4.5.0-23.el7_7.5         libvirt-daemon-driver-storage-logical.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-storage-mpath.x86_64 0:4.5.0-23.el7_7.5         libvirt-daemon-driver-storage-rbd.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-daemon-driver-storage-scsi.x86_64 0:4.5.0-23.el7_7.5          libvirt-daemon-kvm.x86_64 0:4.5.0-23.el7_7.5
#   libvirt-glib.x86_64 0:1.0.0-1.el7                                     libvirt-libs.x86_64 0:4.5.0-23.el7_7.5
#   libvisual.x86_64 0:0.4.0-16.el7                                       libvorbis.x86_64 1:1.3.3-8.el7.1
#   libwayland-client.x86_64 0:1.15.0-1.el7                               libwayland-cursor.x86_64 0:1.15.0-1.el7
#   libwayland-egl.x86_64 0:1.15.0-1.el7                                  libwayland-server.x86_64 0:1.15.0-1.el7
#   libxcb.x86_64 0:1.13-1.el7                                            libxkbcommon.x86_64 0:0.7.1-3.el7
#   libxshmfence.x86_64 0:1.2-1.el7                                       lsof.x86_64 0:4.87-6.el7
#   lzop.x86_64 0:1.03-10.el7                                             mesa-libEGL.x86_64 0:18.3.4-6.el7_7
#   mesa-libGL.x86_64 0:18.3.4-6.el7_7                                    mesa-libgbm.x86_64 0:18.3.4-6.el7_7
#   mesa-libglapi.x86_64 0:18.3.4-6.el7_7                                 mtools.x86_64 0:4.0.18-5.el7
#   netcf-libs.x86_64 0:0.2.8-4.el7                                       nettle.x86_64 0:2.7.1-8.el7
#   numad.x86_64 0:0.5-18.20150602git.el7                                 opus.x86_64 0:1.0.2-6.el7
#   orc.x86_64 0:0.4.26-1.el7                                             osinfo-db.noarch 0:20190319-2.el7
#   osinfo-db-tools.x86_64 0:1.1.0-1.el7                                  pango.x86_64 0:1.42.4-4.el7_7
#   pcre2.x86_64 0:10.23-2.el7                                            perl-Sys-Guestfs.x86_64 1:1.40.2-5.el7_7.3
#   perl-Sys-Virt.x86_64 0:4.5.0-2.el7                                    perl-hivex.x86_64 0:1.3.10-6.9.el7
#   perl-libintl.x86_64 0:1.20-12.el7                                     pixman.x86_64 0:0.34.0-1.el7
#   pulseaudio-libs.x86_64 0:10.0-5.el7                                   pulseaudio-libs-glib2.x86_64 0:10.0-5.el7
#   pycairo.x86_64 0:1.8.10-8.el7                                         python-gobject.x86_64 0:3.22.0-1.el7_4.1
#   qemu-img.x86_64 10:1.5.3-167.el7_7.4                                  qemu-kvm-common.x86_64 10:1.5.3-167.el7_7.4
#   radvd.x86_64 0:2.17-3.el7                                             rdma-core.x86_64 0:22.1-3.el7
#   rest.x86_64 0:0.8.1-2.el7                                             scrub.x86_64 0:2.5.2-7.el7
#   seabios-bin.noarch 0:1.11.0-2.el7                                     seavgabios-bin.noarch 0:1.11.0-2.el7
#   sgabios-bin.noarch 1:0.20110622svn-4.el7                              spice-glib.x86_64 0:0.35-4.el7
#   spice-gtk3.x86_64 0:0.35-4.el7                                        spice-server.x86_64 0:0.14.0-7.el7
#   squashfs-tools.x86_64 0:4.3-0.21.gitaae0aff4.el7                      supermin5.x86_64 0:5.1.19-1.el7
#   syslinux.x86_64 0:4.05-15.el7                                         syslinux-extlinux.x86_64 0:4.05-15.el7
#   trousers.x86_64 0:0.3.14-2.el7                                        unbound-libs.x86_64 0:1.6.6-1.el7
#   usbredir.x86_64 0:0.7.1-3.el7                                         virt-manager-common.noarch 0:1.5.0-7.el7
#   vte-profile.x86_64 0:0.52.2-2.el7                                     vte291.x86_64 0:0.52.2-2.el7
#   xkeyboard-config.noarch 0:2.24-1.el7                                  xml-common.noarch 0:0.6.3-39.el7
#   xorg-x11-server-utils.x86_64 0:7.7-20.el7                             xorg-x11-xauth.x86_64 1:1.0.9-1.el7
#   xorg-x11-xinit.x86_64 0:1.3.4-2.el7                                   yajl.x86_64 0:2.0.4-4.el7

tips

  1. config the local storage operator (see the sketch below)
  2. config storage for the monitoring stack (see the sketch below)
  3. benchmark the storage using a real scenario
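
A minimal sketch for items 1 and 2, assuming the Local Storage Operator is already installed in the openshift-local-storage namespace, that the worker nodes expose a spare disk at /dev/vdb, and using a placeholder storage class name "local-sc" (the /data/install scratch directory is also just an assumption; adjust all of these for the real environment):

cat << EOF > /data/install/local-volume.yaml
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: local-sc
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/vdb
EOF
oc create -f /data/install/local-volume.yaml

# point the monitoring stack (prometheus) at the new storage class
cat << EOF > /data/install/cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-sc
          resources:
            requests:
              storage: 40Gi
EOF
oc apply -f /data/install/cluster-monitoring-config.yaml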

How to record system audio on macOS (OSX)

Recording system audio on macOS has long been a problem. For example, when attending online meetings you may want to record them locally, but by default macOS cannot capture its own audio output. Windows does not have this limitation; for whatever reason it is much more troublesome on macOS.

The way around it is BackgroundMusic.

First, download and install BackgroundMusic following the steps on its official site.

Next, open the Audio MIDI Setup application and create an aggregate device.

Confirm the system's input and output settings.

Finally, start the screen-recording software and select the aggregate device as its input device; that is all that is needed.