Let’s go on an adventure. I’ve learnt a lot more Rust over the last year, and I want to get back into writing properly, so my plan is to write a Linux Operating System. While writing it, I’ll be taking notes in my repo (https://github.com/sinkingpoint/qos/tree/main/notes), and every now and then formalising them into more structured blog posts over here, once I’ve learnt enough to make something interesting.
Welcome to the first of these formalisations: Getting something booting.
I’m a bit of a rabbit hole learner. I start one place and easily get distracted into others. It helps to keep an end goal in mind in order to act as a north star, so getting something booting seems like a noble first goal to work towards. Let’s see what rabbit holes we can find.
What to boot?
When one wants to boot something, it helps to have something to boot. Well, that’s easy - I’m writing this on my laptop, and that’s already booted something. But what has it booted? Now there’s a good question. Let’s ask:
$ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.8-200.fc39.x86_64 root=UUID=e2cd75ff-3ee9-41ce-b23f-28f7d78f4a4f ro rootflags=subvol=root rd.luks.uuid=luks-f44121b1-5be9-47d8-a206-72c52309e7dd rhgb quiet
/proc/cmdline contains all the kernel parameters that my laptop booted with. BOOT_IMAGE= there looks promising!
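I’ll be poking at these parameters a lot, so here’s a minimal Rust sketch that splits /proc/cmdline into key/value pairs. It assumes simple rules as I understand them: whitespace separates parameters, double quotes can group a value that contains spaces, and only the first = splits the key from the value (so rootflags=subvol=root keeps subvol=root as its value). The function names are my own.

```rust
/// Split a kernel command line into (key, optional value) pairs.
/// Sketch only: whitespace separates parameters, double quotes group a
/// value containing spaces, and the first '=' splits key from value.
fn parse_cmdline(cmdline: &str) -> Vec<(String, Option<String>)> {
    let mut params = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;

    for c in cmdline.chars() {
        match c {
            // Toggle quoting; the quote character itself is dropped.
            '"' => in_quotes = !in_quotes,
            // Unquoted whitespace (including the trailing newline) ends a parameter.
            c if c.is_whitespace() && !in_quotes => {
                if !current.is_empty() {
                    params.push(split_param(&current));
                    current.clear();
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        params.push(split_param(&current));
    }
    params
}

/// "key=value" becomes (key, Some(value)); a bare flag like "ro" becomes (key, None).
fn split_param(param: &str) -> (String, Option<String>) {
    match param.split_once('=') {
        Some((key, value)) => (key.to_string(), Some(value.to_string())),
        None => (param.to_string(), None),
    }
}

fn main() {
    // Fall back to a sample line if /proc/cmdline isn't available.
    let cmdline = std::fs::read_to_string("/proc/cmdline").unwrap_or_else(|_| {
        "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.8-200.fc39.x86_64 ro quiet".to_string()
    });
    for (key, value) in parse_cmdline(&cmdline) {
        println!("{key} => {value:?}");
    }
}
```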
While we’re really only interested in one parameter (BOOT_IMAGE) there, it can’t hurt to enumerate the rest.
🐇 Rabbithole: Kernel Parameters
root=UUID=e2cd75ff-3ee9-41ce-b23f-28f7d78f4a4f
root allows us to specify which disk should be mounted as our root filesystem. The structure of this one is interesting to note, however: UUID= isn’t something that the kernel specifically understands. Instead, the value is read and interpreted by the initramfs (Spoilers!) in order to find the disk. This one in particular tells systemd to mount the filesystem identified by UUID e2cd75ff-3ee9-41ce-b23f-28f7d78f4a4f. Other valid options include LABEL, PARTLABEL, PARTUUID, and ID (see: https://man7.org/linux/man-pages/man8/mount.8.html).
ro
ro tells the kernel to mount the root filesystem as read-only when we boot.
rootflags=subvol=root
rootflags allows us to pass specific options when mounting the filesystem. In particular, it’s the data argument to the mount syscall. subvol=root tells the call which BTRFS subvolume to mount.
rd.luks.uuid
rd.luks.uuid isn’t interpreted by the kernel at all. Instead, systemd-cryptsetup-generator reads it to decide which LUKS device to activate when booting. The rd. at the front indicates that it’s only handled by the initramfs (rd = “ram disk”).
rhgb & quiet
While technically two separate arguments, these work hand in hand: rhgb enables the Red Hat specific “Red Hat Graphical Boot” mode, and quiet suppresses most of the kernel’s messages during boot. Together they let Fedora show a nice splash screen when booting, rather than 👻spooky👻 kernel messages.
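Circling back to root=UUID=: here’s a rough sketch of how an initramfs might turn that into a device path. It assumes udev has already populated the /dev/disk/by-* symlink directories. The tag-to-directory mapping is my assumption based on the udev directories I know of, not code lifted from systemd. It also ignores udev’s escaping of unusual characters in labels, and resolve_root_spec is a hypothetical name.

```rust
use std::path::PathBuf;

/// Map a root= value to a device path.
/// Assumption: udev has created the /dev/disk/by-* symlinks, and the tag
/// names follow the list in mount(8).
fn resolve_root_spec(spec: &str) -> Option<PathBuf> {
    if spec.starts_with('/') {
        // Already a device path, e.g. /dev/vda1.
        return Some(PathBuf::from(spec));
    }
    let (tag, value) = spec.split_once('=')?;
    let dir = match tag {
        "UUID" => "by-uuid",
        "LABEL" => "by-label",
        "PARTUUID" => "by-partuuid",
        "PARTLABEL" => "by-partlabel",
        "ID" => "by-id",
        _ => return None, // unknown tag
    };
    Some(PathBuf::from("/dev/disk").join(dir).join(value))
}

fn main() {
    // "LABEL=fedora" is a hypothetical example label.
    for spec in ["UUID=e2cd75ff-3ee9-41ce-b23f-28f7d78f4a4f", "LABEL=fedora", "/dev/vda1"] {
        match resolve_root_spec(spec) {
            Some(path) => println!("{spec} -> {} (exists: {})", path.display(), path.exists()),
            None => println!("{spec}: unsupported"),
        }
    }
}
```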
Back to BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.8-200.fc39.x86_64: my laptop is booting something called “vmlinuz-6.6.8-200.fc39.x86_64” from a disk ((hd0,gpt2): hard drive 0, partition 2 of a “GUID Partition Table” (GPT)). Let’s see if we can find that.
$ find / -name vmlinuz-6.6.8-200.fc39.x86_64
/boot/vmlinuz-6.6.8-200.fc39.x86_64
$ file /boot/vmlinuz-6.6.8-200.fc39.x86_64
/boot/vmlinuz-6.6.8-200.fc39.x86_64: Linux kernel x86 boot executable bzImage, version 6.6.8-200.fc39.x86_64 (mockbuild@f2936e05dca94a129acf79933fec484d) #1 SMP PREEMPT_DYNAMIC Thu Dec 21 04:01:49 UTC 2023, RO-rootFS, swap_dev 0XD, Normal VGA
A “boot executable”? Nice. That sounds like something we can boot! And, ah! That “6.6.8-200.fc39” indicates the Linux kernel version that I’m running (Kernel 6.6.8, Fedora package release 200, built for Fedora 39). Now… how do we boot it?
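As an aside, the version string that file printed can be read straight out of the image. Per the kernel’s x86 boot protocol documentation, a bzImage has a setup header with the magic “HdrS” at offset 0x202, the boot protocol version at 0x206, and a kernel_version field at 0x20E. That field points, minus 0x200, at a NUL-terminated version string. Here’s a small Rust sketch that reads them, assuming a bzImage whose kernel_version field is non-zero. The function names are my own.

```rust
use std::fs;

/// Read a little-endian u16 at `offset`, if the buffer is long enough.
fn read_u16_le(buf: &[u8], offset: usize) -> Option<u16> {
    let bytes = buf.get(offset..offset + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

/// Returns (boot protocol version, kernel version string) from a bzImage.
fn bzimage_info(image: &[u8]) -> Option<(String, String)> {
    // The setup header is identified by the magic "HdrS" at 0x202.
    if image.get(0x202..0x206)? != b"HdrS" {
        return None;
    }
    // Protocol version: major in the high byte, minor in the low byte.
    let protocol = read_u16_le(image, 0x206)?;
    // kernel_version holds the string's offset minus 0x200; 0 means "not set".
    let ptr = read_u16_le(image, 0x20E)? as usize;
    if ptr == 0 {
        return None;
    }
    let rest = image.get(ptr + 0x200..)?;
    let end = rest.iter().position(|&b| b == 0)?;
    let version = String::from_utf8_lossy(&rest[..end]).into_owned();
    Some((format!("{}.{:02}", protocol >> 8, protocol & 0xff), version))
}

fn main() {
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/boot/vmlinuz-6.6.8-200.fc39.x86_64".to_string());
    match fs::read(&path) {
        Ok(image) => match bzimage_info(&image) {
            Some((protocol, version)) => println!("boot protocol {protocol}, version {version}"),
            None => println!("{path} doesn't look like a bzImage"),
        },
        Err(e) => eprintln!("could not read {path}: {e}"),
    }
}
```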
If we look in /boot for other “6.6.8-200.fc39.x86_64” related files, we get a few. Let’s take a look at them.
🐇 Rabbithole: Other boot files
/boot/config-6.6.8-200.fc39.x86_64
$ file /boot/config-6.6.8-200.fc39.x86_64
/boot/config-6.6.8-200.fc39.x86_64: Linux make config build file, ASCII text
$ head /boot/config-6.6.8-200.fc39.x86_64
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 6.6.8-200.fc39.x86_64 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6)"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=130201
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=24000
This contains the Kernel Configuration that my Kernel was built with.
/boot/initramfs-6.6.8-200.fc39.x86_64.img
$ sudo file /boot/initramfs-6.6.8-200.fc39.x86_64.img
/boot/initramfs-6.6.8-200.fc39.x86_64.img: ASCII cpio archive (SVR4 with no CRC)
A CPIO archive? Never heard of that one before. I do know what an initramfs is, though. This contains the “Initial Ram File system” (initramfs) that my system booted with. That makes sense! We found in our other rabbithole that our Kernel parameters contained a few flags that the documentation said were interpreted by an initramfs - this must be that! From https://wiki.gentoo.org/wiki/Initramfs:
An initramfs (initial ram file system) is used to prepare Linux systems during boot before the init process starts.
That sounds like something to do with the boot process though, so let’s return to it once we’ve got something booting.
/boot/symvers-6.6.8-200.fc39.x86_64.xz
From https://www.kernel.org/doc/html/latest/kbuild/modules.html:
Module.symvers contains a list of all exported symbols from a kernel build.
$ file /boot/symvers-6.6.8-200.fc39.x86_64.xz
/boot/symvers-6.6.8-200.fc39.x86_64.xz: symbolic link to /lib/modules/6.6.8-200.fc39.x86_64/symvers.xz
$ unxz ./symvers-6.6.8-200.fc39.x86_64.xz
$ file symvers-6.6.8-200.fc39.x86_64
symvers-6.6.8-200.fc39.x86_64: ASCII text
$ head symvers-6.6.8-200.fc39.x86_64
0x00000000 system_state vmlinux EXPORT_SYMBOL
0x00000000 static_key_initialized vmlinux EXPORT_SYMBOL_GPL
0x00000000 reset_devices vmlinux EXPORT_SYMBOL
0x00000000 loops_per_jiffy vmlinux EXPORT_SYMBOL
0x00000000 init_uts_ns vmlinux EXPORT_SYMBOL_GPL
0x00000000 wait_for_initramfs vmlinux EXPORT_SYMBOL_GPL
0x00000000 init_task vmlinux EXPORT_SYMBOL
0x00000000 cc_platform_has vmlinux EXPORT_SYMBOL_GPL
0x00000000 cc_mkdec vmlinux EXPORT_SYMBOL_GPL
0x00000000 tdx_kvm_hypercall vmlinux EXPORT_SYMBOL_GPL
Module.symvers contains all exported symbols from the kernel and compiled modules. For each symbol, the corresponding CRC value is also stored.
According to the above link, the structure is:
<CRC> <Symbol> <Module> <Export Type>
Let’s take one:
0x00000000 cc_platform_has vmlinux EXPORT_SYMBOL_GPL
That line means a CRC of 0x00000000, a symbol name of cc_platform_has, a module of vmlinux (the core kernel image, rather than a loadable module), and an export type of EXPORT_SYMBOL_GPL, so it can only be used from GPL-licensed modules. We can even find where that comes from! https://github.com/torvalds/linux/blob/fbafc3e621c3f4ded43720fdb1d6ce1728ec664e/arch/x86/coco/core.c#L111
Could have fooled me. My guess at the first field was a memory address. Weird. My CRCs are all 0x00000000? Ah:
For a kernel build without CONFIG_MODVERSIONS enabled, the CRC would read 0x00000000.
And indeed:
$ grep CONFIG_MODVERSIONS /boot/config-6.6.8-200.fc39.x86_64
# CONFIG_MODVERSIONS is not set
/boot/System.map-6.6.8-200.fc39.x86_64
$ sudo file /boot/System.map-6.6.8-200.fc39.x86_64
/boot/System.map-6.6.8-200.fc39.x86_64: ASCII text
$ sudo head /boot/System.map-6.6.8-200.fc39.x86_64
0000000000000000 D __per_cpu_start
0000000000000000 D fixed_percpu_data
0000000000001000 D cpu_debug_store
0000000000002000 D irq_stack_backing_store
0000000000006000 D cpu_tss_rw
000000000000b000 D gdt_page
000000000000c000 d exception_stacks
0000000000018000 d entry_stack_storage
0000000000019000 D espfix_waddr
0000000000019008 D espfix_stack
Of all the files here, this one’s the one with a Wikipedia page. It must be important. In fact, this does what I thought at first glance the symvers file did - it provides a mapping from symbols to addresses in kernel space. The middle column there is the “type” of the symbol. This allows me to say, if I wanted to call the soft_restart_cpu function, that I should jump to address 0xffffffff81000400 (at least as linked; kernel address space layout randomization (KASLR) can shift the kernel at boot). Useful!
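Pulling symbols out of a System.map is simple enough to sketch in Rust. This assumes the three-column address/type/name layout shown above. The struct and function names are my own. File-local symbols can share a name, and this sketch glosses over that by keeping the last one it sees. The addresses are as linked, so KASLR is ignored here too.

```rust
use std::collections::HashMap;

/// One System.map entry: a link-time address and nm-style type letter.
#[derive(Debug)]
struct Symbol {
    address: u64,
    kind: char,
}

/// Parse "<hex address> <type> <name>" lines into a name -> symbol map.
fn parse_system_map(contents: &str) -> HashMap<String, Symbol> {
    let mut symbols = HashMap::new();
    for line in contents.lines() {
        let mut fields = line.split_whitespace();
        let (Some(addr), Some(kind), Some(name)) = (fields.next(), fields.next(), fields.next()) else {
            continue; // skip blank or malformed lines
        };
        let (Ok(address), Some(kind)) = (u64::from_str_radix(addr, 16), kind.chars().next()) else {
            continue;
        };
        // Duplicate (file-local) names overwrite earlier ones in this sketch.
        symbols.insert(name.to_string(), Symbol { address, kind });
    }
    symbols
}

fn main() {
    let path = "/boot/System.map-6.6.8-200.fc39.x86_64";
    let contents = match std::fs::read_to_string(path) {
        Ok(c) => c,
        Err(e) => {
            eprintln!("could not read {path}: {e} (it may need root)");
            return;
        }
    };
    let symbols = parse_system_map(&contents);
    if let Some(sym) = symbols.get("soft_restart_cpu") {
        // Uppercase types are global, lowercase local; T/t is code (text), D/d is data.
        println!("soft_restart_cpu: {:#018x} ({})", sym.address, sym.kind);
    }
}
```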
While my laptop has already booted that image, and I could start poking around in those files to build something, having to restart my laptop every time I wanted to test a change sounds like a real bear. To avoid that, I can boot things in a Virtual Machine - much nicer. Virtual Machines are managed by a hypervisor, so I’ll need to pick one. But there are so many! The one I’m most familiar with is qemu, so, more for familiarity’s sake than anything else, let’s use that. It’s a decent choice - it has a nice Command Line Interface, and it supports KVM, which allows it to be 🏃fast🏃.
Let’s try this:
$ qemu-system-x86_64 -kernel ./vmlinuz-6.6.8-200.fc39.x86_64
...
[ 1.101571] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 1.105553] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.8-200.fc39.x86_64 #1
[ 1.107942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
[ 1.111619] Call Trace:
...
Fantastic! We just booted a kernel. It immediately crashed (or “panicked”, because we didn’t give it a drive, or an initramfs, or… anything really), but by golly it actually booted.
While the above command works well enough, there are a few more flags we can chuck on the end to make it a bit nicer to work with.
🐇 Rabbithole: Useful qemu flags
-display none stops qemu from opening its extra window.
-serial stdio -append "console=ttyAMA0 console=ttyS0" redirects the output of our virtual machine back to the terminal we ran qemu from. -serial stdio connects the VM’s serial port to our terminal, and the console= kernel parameters passed with -append tell the kernel to use that serial port as its console. ttyS0 is the x86 serial port, and ttyAMA0 is its ARM counterpart, so that one is harmless here.
--enable-kvm enables KVM, which gives us a notable speed increase.
-m 2G gives our VM 2G of memory, rather than the default 128MB. We probably won’t need all that memory to start with, but we might as well have it to prepare for the future.
Those give us an actual final command of:
qemu-system-x86_64 -kernel ./vmlinuz-6.6.8-200.fc39.x86_64 -display none -serial stdio -append "console=ttyAMA0 console=ttyS0" --enable-kvm -m 2G
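I’ll be typing that command a lot, so one option is to wrap it in a little Rust helper using std::process::Command. This is just a sketch: the qemu_command name and its kernel/initrd parameters are hypothetical, and main only prints the resulting command line rather than launching qemu.

```rust
use std::process::Command;

/// Build the qemu invocation from this post. `initrd` is optional so the
/// same helper covers the kernel-only boot and the initramfs boot.
fn qemu_command(kernel: &str, initrd: Option<&str>) -> Command {
    let mut cmd = Command::new("qemu-system-x86_64");
    cmd.args(["-kernel", kernel]);
    if let Some(initrd) = initrd {
        cmd.args(["-initrd", initrd]);
    }
    cmd.args([
        "-display", "none",
        "-serial", "stdio",
        // Passed as a single argument, just like the quoted shell version.
        "-append", "console=ttyAMA0 console=ttyS0",
        "--enable-kvm",
        "-m", "2G",
    ]);
    cmd
}

fn main() {
    let cmd = qemu_command("./vmlinuz-6.6.8-200.fc39.x86_64", None);
    // Only print the command line; calling `.status()` on `cmd` would launch the VM.
    let args: Vec<_> = cmd.get_args().map(|a| a.to_string_lossy()).collect();
    println!("{} {}", cmd.get_program().to_string_lossy(), args.join(" "));
}
```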
Now what?
An initramfs
Let’s go back to those files in /boot. Another one stood out there, /boot/initramfs-6.6.8-200.fc39.x86_64.img. Now, I happen to know that after the kernel starts it needs something to mount, and an initial ram file system (“initramfs”) is a perfect candidate for that. Let’s give it a try:
qemu-system-x86_64 -kernel ./vmlinuz-6.6.8-200.fc39.x86_64 -initrd ./initramfs-6.6.8-200.fc39.x86_64.img -display none -serial stdio -append "console=ttyAMA0 console=ttyS0" --enable-kvm -m 2G
Note here the new -initrd flag pointing to our initramfs. Running that, we’re presented with a bunch of systemd start logs. My gosh, it works! We just booted my laptop, on my laptop. Meta.
But I don’t want to just run someone else’s initramfs, I want to make my own. So what does one look like?
$ file initramfs-6.6.8-200.fc39.x86_64.img
initramfs-6.6.8-200.fc39.x86_64.img: ASCII cpio archive (SVR4 with no CRC)
A “CPIO archive”, huh? Never heard of them.
🐇 Rabbithole: The structure of an initramfs
They seem very similar to tar files - a collection of files bundled into one. I even seem to have a tool installed for extracting them!
$ cpio -i < initramfs-6.6.8-200.fc39.x86_64.img
416 blocks
$ tree
.
├── early_cpio
└── kernel
    └── x86
        └── microcode
            └── GenuineIntel.bin
4 directories, 2 files
Wait, what? Where’s our file system? And also, my initramfs file is 39 megabytes - that bin file is only 207 kilobytes. Is CPIO really so wasteful that it has a 180x overhead? Something’s not quite right here. Let’s at least look at what we do have. We’ve got two files here:
$ file early_cpio kernel/x86/microcode/GenuineIntel.bin
early_cpio: ASCII text
kernel/x86/microcode/GenuineIntel.bin: data
$ cat early_cpio
1
So, our “early_cpio” file contains a single “1”, and our “GenuineIntel.bin” contains some random-looking junk (microcode, judging by the folder name and the fact that it says “Intel”). What does that early_cpio file do? There’s no reference to it in the Linux source, but we can find earlycpio.c, which seems to be called from the microcode loader. We can even find where it comes from, but as far as I can tell this file is purely informational - please correct me! So this cpio is loaded before the real initramfs in order to load the microcode onto my CPU. So where’s the real one? Let’s take a closer look:
$ binwalk initramfs-6.6.8-200.fc39.x86_64.img
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 ASCII cpio archive (SVR4 with no CRC), file name: ".", file name length: "0x00000002", file size: "0x00000000"
112 0x70 ASCII cpio archive (SVR4 with no CRC), file name: "early_cpio", file name length: "0x0000000B", file size: "0x00000002"
240 0xF0 ASCII cpio archive (SVR4 with no CRC), file name: "kernel", file name length: "0x00000007", file size: "0x00000000"
360 0x168 ASCII cpio archive (SVR4 with no CRC), file name: "kernel/x86", file name length: "0x0000000B", file size: "0x00000000"
484 0x1E4 ASCII cpio archive (SVR4 with no CRC), file name: "kernel/x86/microcode", file name length: "0x00000015", file size: "0x00000000"
616 0x268 ASCII cpio archive (SVR4 with no CRC), file name: "kernel/x86/microcode/GenuineIntel.bin", file name length: "0x00000026", file size: "0x00033C00"
212732 0x33EFC ASCII cpio archive (SVR4 with no CRC), file name: "TRAILER!!!", file name length: "0x0000000B", file size: "0x00000000"
212992 0x34000 gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
8109851 0x7BBF1B gzip compressed data, from Unix, last modified: 1970-01-01 00:00:00 (null date)
11000563 0xA7DAF3 xz compressed data
13377833 0xCC2129 xz compressed data
13884598 0xD3DCB6 xz compressed data
13907055 0xD4346F xz compressed data
13912976 0xD44B90 xz compressed data
13966468 0xD51C84 xz compressed data
24097481 0x16FB2C9 Certificate in DER format (x509 v3), header length: 4, sequence length: 17280
25197943 0x1807D77 CRC32 polynomial table, little endian
Ah hah! Our initramfs is being sneaky 😏. It’s not just a CPIO archive but a CPIO archive plus a gzipped file bolted on the end! What’s that doing there?
$ (cpio -i; cat > second.gz) < initramfs-6.6.8-200.fc39.x86_64.img
416 blocks
$ file second.gz
second.gz: gzip compressed data, max compression, from Unix, original size modulo 2^32 82102784
$ gunzip second.gz
$ file second
second: ASCII cpio archive (SVR4 with no CRC)
It’s a second CPIO file, this time compressed!
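To convince myself of the layout, here’s a Rust sketch that walks the uncompressed “newc” cpio (what file calls “SVR4 with no CRC”) at the front of the image.

Each entry starts with a 110-byte ASCII header: the magic 070701 followed by thirteen 8-digit hex fields, including the file size and the name length. After the header come the NUL-terminated name and then the file data, each padded to a 4-byte boundary. Entries repeat until one named TRAILER!!!. After the trailer, the sketch skips zero padding and checks for a gzip or xz magic number.

That padding also explains the binwalk offsets. cpio writes in 512-byte blocks, and 416 × 512 = 212992 = 0x34000, which is exactly where the gzip data starts. The function names are my own, and only gzip and xz are recognised.

```rust
use std::fs;

/// A newc header: 6-byte magic plus 13 fields of 8 hex digits.
const HEADER_LEN: usize = 110;

/// Round up to the next multiple of 4.
fn align4(n: usize) -> usize {
    (n + 3) & !3
}

/// Parse the `index`-th 8-hex-digit field of a header.
fn hex_field(header: &[u8], index: usize) -> Option<usize> {
    let start = 6 + index * 8;
    let text = std::str::from_utf8(header.get(start..start + 8)?).ok()?;
    usize::from_str_radix(text, 16).ok()
}

/// Walk the newc archive at the start of `data`, returning the entry names
/// and the offset just past the TRAILER!!! entry.
fn list_newc(data: &[u8]) -> Option<(Vec<String>, usize)> {
    let mut names = Vec::new();
    let mut offset = 0;
    loop {
        let header = data.get(offset..offset + HEADER_LEN)?;
        if !header.starts_with(b"070701") {
            return None;
        }
        let file_size = hex_field(header, 6)?;
        let name_size = hex_field(header, 11)?; // includes the trailing NUL
        let name_start = offset + HEADER_LEN;
        let name = data.get(name_start..name_start + name_size.checked_sub(1)?)?;
        let name = String::from_utf8_lossy(name).into_owned();
        // Header + name are padded to 4 bytes, then the file data is too.
        offset += align4(HEADER_LEN + name_size) + align4(file_size);
        if name == "TRAILER!!!" {
            return Some((names, offset));
        }
        names.push(name);
    }
}

/// Skip zero padding after the trailer and identify what comes next.
fn next_payload(data: &[u8], mut offset: usize) -> Option<(usize, &'static str)> {
    while data.get(offset) == Some(&0) {
        offset += 1;
    }
    let rest = data.get(offset..).filter(|r| !r.is_empty())?;
    let kind = if rest.starts_with(&[0x1f, 0x8b]) {
        "gzip"
    } else if rest.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        "xz"
    } else {
        "unknown"
    };
    Some((offset, kind))
}

fn main() {
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/boot/initramfs-6.6.8-200.fc39.x86_64.img".to_string());
    let data = match fs::read(&path) {
        Ok(d) => d,
        Err(e) => {
            eprintln!("could not read {path}: {e}");
            return;
        }
    };
    match list_newc(&data) {
        Some((names, end)) => {
            for name in &names {
                println!("{name}");
            }
            match next_payload(&data, end) {
                Some((offset, kind)) => println!("next payload at {offset:#x}: {kind}"),
                None => println!("nothing after the cpio trailer"),
            }
        }
        None => println!("{path} doesn't start with a newc cpio archive"),
    }
}
```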
Let’s extract them:
$ (cpio -i; gunzip | cpio -i) < ../initramfs-6.6.8-200.fc39.x86_64.img
416 blocks
cpio: dev/console: Cannot mknod: Operation not permitted
cpio: dev/kmsg: Cannot mknod: Operation not permitted
cpio: dev/null: Cannot mknod: Operation not permitted
cpio: dev/random: Cannot mknod: Operation not permitted
cpio: dev/urandom: Cannot mknod: Operation not permitted
160357 blocks
$ ls
bin dev early_cpio etc init kernel lib lib64 proc root run sbin shutdown sys sysroot tmp usr var
A bunch of “Operation not permitted” errors because we can’t mknod (creating device nodes needs root), but wahey! We have an honest to god file system in here! Hey, what happens once the file system is mounted? Booting it above started systemd - how did that happen? Let’s look at the code (kernel_init() in the kernel’s init/main.c).
The first line there seems promising - if (ramdisk_execute_command) { - we’re in a ram disk! Where does that come from? Well, apparently it comes from two places: the default of "/init", or dynamically from the rdinit kernel parameter. We learnt before that we don’t have an rdinit in our kernel parameters, so we must be using the default! What’s that?
$ file init
init: symbolic link to usr/lib/systemd/systemd
Well shucks, it’s systemd! Exactly what we saw when we booted it with qemu.
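As far as I can tell from reading init/main.c, the kernel’s choice of init boils down to the order below. It also lines up with the fallback attempts in the panic log further down. This Rust model is a simplification of my own:

- can_exec stands in for actually exec’ing the file.
- It ignores CONFIG_DEFAULT_INIT.
- It skips what happens when there’s no /init at all. As I read the code, the kernel then goes on to mount the root= device instead.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum InitOutcome {
    Runs(String),
    Panics(String),
}

/// Simplified model of the order in which the kernel tries init programs.
/// `can_exec` is a stand-in for "exec of this path succeeded".
fn choose_init(rdinit: Option<&str>, init: Option<&str>, can_exec: impl Fn(&str) -> bool) -> InitOutcome {
    // rdinit= overrides the default ramdisk init of /init.
    let ramdisk_init = rdinit.unwrap_or("/init");
    if can_exec(ramdisk_init) {
        return InitOutcome::Runs(ramdisk_init.to_string());
    }
    // An explicit init= is tried next, and failing that is fatal.
    if let Some(requested) = init {
        return if can_exec(requested) {
            InitOutcome::Runs(requested.to_string())
        } else {
            InitOutcome::Panics(format!("Requested init {requested} failed"))
        };
    }
    // Otherwise, the hard-coded fallbacks seen in the boot log.
    for candidate in ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"] {
        if can_exec(candidate) {
            return InitOutcome::Runs(candidate.to_string());
        }
    }
    InitOutcome::Panics("No working init found.".to_string())
}

fn main() {
    // Approximates "can exec" with "exists under ./tree", which is only a rough guess.
    let root = Path::new("tree");
    let outcome = choose_init(None, None, |p| root.join(p.trim_start_matches('/')).exists());
    println!("{outcome:?}");
}
```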
So, our initramfs is a CPIO archived file system, with /init being the executable that gets run. Shall we make one? Let’s start simple - booting into a shell. We can assemble a tree, and turn that into a CPIO archive:
$ mkdir tree
mkdir: created directory 'tree'
$ cp /bin/sh tree/init
tree$ find . | cpio -c -o > initramfs
cpio: File ./initramfs grew, 1439744 new bytes not copied
5625 blocks
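That “grew” warning is because we wrote the archive into the very directory we were archiving, so cpio tried to include the half-written initramfs in itself (writing it to ../initramfs instead avoids that). Let’s try to boot our new initramfs anyway: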
$ qemu-system-x86_64 -kernel ./vmlinuz-6.6.8-200.fc39.x86_64 -initrd ./tree/initramfs -display none -serial stdio -append "console=ttyAMA0 console=ttyS0" --enable-kvm
...
[ 1.150096] Run /init as init process
[ 1.150782] Failed to execute /init (error -2)
[ 1.151436] Run /sbin/init as init process
[ 1.152056] Run /etc/init as init process
[ 1.152663] Run /bin/init as init process
[ 1.153301] Run /bin/sh as init process
[ 1.153947] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
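Oh 🙁 errno says a -2 is ENOENT. Our init process can’t be found? But it found it to run it? Ah! Shared libraries. /bin/sh is dynamically linked, and when the kernel can’t find the dynamic loader it asks for (/lib64/ld-linux-x86-64.so.2), exec fails with ENOENT. We need a few of those libraries - let’s add them.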
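Copying libraries by hand gets old fast, so here’s a minimal Rust sketch that pulls the library paths out of ldd’s output so they can be mirrored into tree/. It assumes ldd’s usual line shapes:

- name => path (address)
- a bare path with an address
- the kernel-provided linux-vdso.so.1, which has no file on disk

It just prints cp commands rather than copying anything, and parse_ldd is my own name.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Extract on-disk library paths from `ldd` output. Lines look like:
///   libc.so.6 => /lib64/libc.so.6 (0x...)
///   /lib64/ld-linux-x86-64.so.2 (0x...)
///   linux-vdso.so.1 (0x...)            <- no file on disk, skipped
fn parse_ldd(output: &str) -> Vec<PathBuf> {
    output
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            // Prefer the resolved path after "=>", otherwise use the first field.
            let candidate = match line.split_once("=>") {
                Some((_, rest)) => rest.trim(),
                None => line,
            };
            let path = candidate.split_whitespace().next()?;
            // Skips vdso entries and "not found" results, which aren't absolute paths.
            path.starts_with('/').then(|| PathBuf::from(path))
        })
        .collect()
}

fn main() {
    let binary = std::env::args().nth(1).unwrap_or_else(|| "/bin/sh".to_string());
    let output = match Command::new("ldd").arg(&binary).output() {
        Ok(out) => String::from_utf8_lossy(&out.stdout).into_owned(),
        Err(e) => {
            eprintln!("could not run ldd: {e}");
            return;
        }
    };
    for lib in parse_ldd(&output) {
        // e.g. /lib64/libc.so.6 should land at tree/lib64/libc.so.6
        println!("cp {} tree{}", lib.display(), lib.display());
    }
}
```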
$ ldd tree/init
linux-vdso.so.1 (0x00007ffcb718e000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007fed6c3a9000)
libc.so.6 => /lib64/libc.so.6 (0x00007fed6c1c7000)
/lib64/ld-linux-x86-64.so.2 (0x00007fed6c563000)
$ tree
.
├── init
└── lib64
    ├── ld-linux-x86-64.so.2
    ├── libc.so.6
    └── libtinfo.so.6
$ find . | cpio -c -o > initramfs
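Rather than find | cpio, the archive could also be written directly. Here’s a Rust sketch of a newc writer, built on a few assumptions of mine:

- Every entry is owned by root (uid/gid 0) with an mtime of 0.
- Each entry gets its own inode number with a link count of 1.
- Only directories, regular files, and symlinks are handled.
- Names are stored without a leading ./.

It writes initramfs.cpio to the current directory, so running it from the tree’s parent also sidesteps the “grew” warning from earlier.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Pad the archive with NUL bytes up to the next 4-byte boundary.
fn pad4(out: &mut Vec<u8>) {
    while out.len() % 4 != 0 {
        out.push(0);
    }
}

/// Append one newc entry: 110-byte ASCII header, NUL-terminated name, data.
fn write_entry(out: &mut Vec<u8>, ino: u32, name: &str, mode: u32, data: &[u8]) {
    let fields: [u32; 13] = [
        ino, mode, 0, 0, 1, 0, // ino, mode, uid (root), gid (root), nlink, mtime
        data.len() as u32,     // filesize
        0, 0, 0, 0,            // devmajor, devminor, rdevmajor, rdevminor
        name.len() as u32 + 1, // namesize, counting the trailing NUL
        0,                     // check (always 0 in newc)
    ];
    out.extend_from_slice(b"070701");
    for field in fields {
        out.extend_from_slice(format!("{field:08X}").as_bytes());
    }
    out.extend_from_slice(name.as_bytes());
    out.push(0);
    pad4(out);
    out.extend_from_slice(data);
    pad4(out);
}

/// The archive ends with an empty entry named TRAILER!!!.
fn write_trailer(out: &mut Vec<u8>) {
    write_entry(out, 0, "TRAILER!!!", 0, &[]);
}

/// Recursively add the contents of `dir` under `prefix`, parents before children.
fn add_tree(out: &mut Vec<u8>, ino: &mut u32, dir: &Path, prefix: &str) -> io::Result<()> {
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    entries.sort_by_key(|e| e.file_name());
    for entry in entries {
        let name = format!("{prefix}{}", entry.file_name().to_string_lossy());
        let meta = fs::symlink_metadata(entry.path())?; // don't follow symlinks
        *ino += 1;
        if meta.is_dir() {
            write_entry(out, *ino, &name, meta.mode(), &[]);
            add_tree(out, ino, &entry.path(), &format!("{name}/"))?;
        } else if meta.file_type().is_symlink() {
            // A symlink's data is its target path.
            let target = fs::read_link(entry.path())?;
            write_entry(out, *ino, &name, meta.mode(), target.to_string_lossy().as_bytes());
        } else if meta.is_file() {
            write_entry(out, *ino, &name, meta.mode(), &fs::read(entry.path())?);
        }
        // Device nodes, sockets, etc. are skipped in this sketch.
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::args().nth(1).unwrap_or_else(|| "tree".to_string());
    if !Path::new(&root).is_dir() {
        eprintln!("{root} is not a directory");
        return Ok(());
    }
    let mut archive = Vec::new();
    let mut ino = 0;
    add_tree(&mut archive, &mut ino, Path::new(&root), "")?;
    write_trailer(&mut archive);
    // Written to the current directory: run from the tree's parent so it isn't archived.
    fs::write("initramfs.cpio", &archive)?;
    println!("wrote {} bytes to initramfs.cpio", archive.len());
    Ok(())
}
```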
And run it:
$ qemu-system-x86_64 -kernel ./vmlinuz-6.6.8-200.fc39.x86_64 -initrd ./tree/initramfs -display none -serial stdio -append "console=ttyAMA0 console=ttyS0" --enable-kvm
...
init: cannot set terminal process group (-1): Inappropriate ioctl for device
init: no job control in this shell
init-5.2#
😮 We have a shell! That means we’ve successfully made an initramfs from scratch! If we Ctrl-D, the kernel panics (“Attempted to kill init”), but still! Progress!
And that’s where I’ll leave off here. It’s worthwhile looking back at what we’ve learnt:
- The structure of the /boot directory
- Using qemu to boot a kernel
- The structure of an initramfs
- And making our own one!
Where do we go from here? Well, the world’s our oyster. I don’t like packaging /bin/sh so maybe we’ll start with our own shell?
Let me know what you think of these - I’m planning to run this as a series a bit, meandering my way around constructing what I feel like and sharing along the way. Let’s see how far we can get!
I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!