← Back to Index

coreos 启动分析

我们知道,coreos 是内核和根文件系统,一起打包升级的,也就是所谓的 A/B 切换升级,那么他到底是怎么实现这个的呢?现在我们就来分析一下。

视频讲解

首先我们分析一下 /boot 分区

ls -hl /boot/loader/entries/
        
        total 4.0K
        -rw-r--r--. 1 root root 629 Apr  9 11:57 ostree-1-rhcos.conf
        -rw-r--r--. 1 root root 630 Apr  9 11:57 ostree-2-rhcos.conf
        
        cat /boot/loader/entries/*.conf
        
        title Red Hat Enterprise Linux CoreOS 49.84.202110081407-0 (Ootpa) (ostree:1)
        version 1
        options random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.0/rhcos/a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/0 ip=enp1s0:dhcp root=UUID=0a0d4701-04bf-45a2-8b9b-f761542a617a rw rootflags=prjquota
        linux /ostree/rhcos-a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/vmlinuz-4.18.0-305.19.1.el8_4.x86_64
        initrd /ostree/rhcos-a10b07df1aa66c008cd3b9acb17d765f0755702cadfa0090155dced4d2e9bfe0/initramfs-4.18.0-305.19.1.el8_4.x86_64.img
        
        title Red Hat Enterprise Linux CoreOS 410.84.202203081640-0 (Ootpa) (ostree:0)
        version 2
        options random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.0/rhcos/838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/0 ip=enp1s0:dhcp root=UUID=0a0d4701-04bf-45a2-8b9b-f761542a617a rw rootflags=prjquota
        linux /ostree/rhcos-838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/vmlinuz-4.18.0-305.40.1.el8_4.x86_64
        initrd /ostree/rhcos-838cd9a10892dbd5e32ffdbec249a4c0db18f6d1c56f416f7a59a2f806f55941/initramfs-4.18.0-305.40.1.el8_4.x86_64.img

我们可以清晰的看到,这里面定义了2个入口,并且每个入口,对应了/boot/ostree//vmlinuz- 和 /boot/ostree//initramfs-.img , 另外启动参数option定义了 ostree的路径,指向了对应的/boot/ostree/boot.0/ , 所以,我们得出结论,这个A/B切换,在系统启动的时候,是通过grub2的一些特殊配置实现的,看上去并不复杂。

参考文档:

再分析一下 mount

lsblk
        
        NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
        sr0     11:0    1   104M  0 rom
        vda    252:0    0   120G  0 disk
        ├─vda1 252:1    0     1M  0 part
        ├─vda2 252:2    0   127M  0 part
        ├─vda3 252:3    0   384M  0 part /boot
        └─vda4 252:4    0 119.5G  0 part /sysroot
        
        mount | grep vda4
        
        /dev/vda4 on /sysroot type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /etc type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /usr type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/containers/storage/overlay type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/1 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/2 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/3 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/4 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
        /dev/vda4 on /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/5 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)

我们非常困惑,mount命令的输出显示,vda4被挂载了很多次,每次都是不同的路径,这是为什么呢?

cat /proc/1/mountinfo  | grep vda4
        
        99 102 252:4 / /sysroot ro,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        102 1 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0 / rw,relatime shared:1 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        103 102 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc /etc rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        104 102 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/usr /usr ro,relatime shared:3 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        133 102 252:4 /ostree/deploy/rhcos/var /var rw,relatime shared:4 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        299 133 252:4 /ostree/deploy/rhcos/var/lib/containers/storage/overlay /var/lib/containers/storage/overlay rw,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        7886 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/modprobe.d /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/1 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        5920 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysconfig /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/2 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        7429 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysctl.d /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/3 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        7965 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/sysctl.conf /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/4 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
        8491 133 252:4 /ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0/etc/systemd /var/lib/kubelet/pods/80389395-c0f4-4342-a2ee-2b8c31dbbdbc/volume-subpaths/etc/tuned/5 rw,relatime shared:2 - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota

答案在/proc/1/mountinfo中,我们来仔细分析一下里面的内容,特别是根文件系统的挂载。

  1. 我们看第一行,/dev/vda4说的是设备,xfs说的是这个设备上的文件系统,/ 说的是设备上的本来的目录, /sysroot 说的是挂载到当前进程空间的什么目录
  2. 我们再来看第二行,/dev/vda4说的是设备,xfs说的是这个设备上的文件系统,/ostree/deploy/rhcos/deploy/b1df1247e3ad53173c1e13a913ec645d48a22f6a294e70e2ca5bda8c31f78d78.0 说的是设备上的本来的目录, / 说的是挂载到当前进程空间的什么目录

所以,总结下来,/dev/vda4 上面的目录结构,和我们一般的目录结果不一样,系统启动以后,关键的路径被重新的安排就位了一下。


        find /sysroot -maxdepth 3
        
        /sysroot
        /sysroot/boot
        /sysroot/ostree
        /sysroot/ostree/repo
        /sysroot/ostree/repo/config
        /sysroot/ostree/repo/tmp
        /sysroot/ostree/repo/extensions
        /sysroot/ostree/repo/state
        /sysroot/ostree/repo/refs
        /sysroot/ostree/repo/objects
        /sysroot/ostree/repo/.lock
        /sysroot/ostree/deploy
        /sysroot/ostree/deploy/rhcos
        /sysroot/ostree/boot.0.1
        /sysroot/ostree/boot.0.1/rhcos
        /sysroot/ostree/boot.0
        /sysroot/.coreos-aleph-version.json

调查和 mount fs 相关的systemd

systemctl cat ostree-remount.service
        
        [Unit]
        Description=OSTree Remount OS/ Bind Mounts
        Documentation=man:ostree(1)
        DefaultDependencies=no
        ConditionKernelCommandLine=ostree
        OnFailure=emergency.target
        Conflicts=umount.target
        
        # Run after core mounts
        
        After=-.mount var.mount
        After=systemd-remount-fs.service
        
        # But we run *before* most other core bootup services that need write access to /etc and /var
        
        Before=local-fs.target umount.target
        Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service
        Before=systemd-tmpfiles-setup.service
        
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/usr/lib/ostree/ostree-remount
        StandardInput=null
        StandardOutput=journal
        StandardError=journal+console
        
        [Install]
        WantedBy=local-fs.target
        
        systemctl list-unit-files | grep mount
        
        proc-sys-fs-binfmt_misc.automount                                      static
        boot.mount                                                             generated
        dev-hugepages.mount                                                    static
        dev-mqueue.mount                                                       static
        proc-fs-nfsd.mount                                                     static
        proc-sys-fs-binfmt_misc.mount                                          static
        run-vmblock\x2dfuse.mount                                              disabled
        sys-fs-fuse-connections.mount                                          static
        sys-kernel-config.mount                                                static
        sys-kernel-debug.mount                                                 static
        tmp.mount                                                              disabled
        var-lib-nfs-rpc_pipefs.mount                                           static
        var.mount                                                              generated
        dracut-mount.service                                                   static
        dracut-pre-mount.service                                               static
        nfs-mountd.service                                                     static
        ostree-remount.service                                                 disabled
        systemd-remount-fs.service                                             static
        umount.target                                                          static
        
        systemctl cat dracut-mount.service
        
        # /usr/lib/systemd/system/../../dracut/modules.d/98dracut-systemd/dracut-mount.service
        
        #  This file is part of dracut.
        
        #
        
        # See dracut.bootup(7) for details
        
        [Unit]
        Description=dracut mount hook
        Documentation=man:dracut-mount.service(8)
        After=initrd-root-fs.target initrd-parse-etc.service
        After=dracut-initqueue.service dracut-pre-mount.service
        ConditionPathExists=/usr/lib/initrd-release
        ConditionDirectoryNotEmpty=|/lib/dracut/hooks/mount
        ConditionKernelCommandLine=|rd.break=mount
        DefaultDependencies=no
        Conflicts=shutdown.target emergency.target
        
        [Service]
        Environment=DRACUT_SYSTEMD=1
        Environment=NEWROOT=/sysroot
        Type=oneshot
        ExecStart=-/bin/dracut-mount
        StandardInput=null
        StandardOutput=syslog
        StandardError=syslog+console
        KillMode=process
        RemainAfterExit=yes
        
        # Bash ignores SIGTERM, so we send SIGHUP instead, to ensure that bash
        
        # terminates cleanly.
        
        KillSignal=SIGHUP

参考文档: