Today in bite-sized Linux pieces, I want to talk about how Linux boots.
When a Linux kernel first boots, there is a bunch of behind-the-scenes work that gets you to the point where you can type in your login credentials. This is the first of a few posts in which I want to cover what happens from when the kernel first hands over to user space up to a login prompt.
There are already a number of articles about the kernel boot process (how it initialises its data structures, starts the idle process, etc.), but I don’t want to cover that. I want to talk about user land - the things that we mere mortals can control, without having to delve (too much) into the kernel source.
In this entry, we’ll cover the first step of the process - how the kernel goes into user space by executing an init process, and then into some systemd processes (because systemd is the main init system these days, for better or for worse).
In the beginning, there was init
Let’s start at the beginning. Not the real beginning - I don’t want to get too much into Kernel internals (I’ll save that for another post), but the beginning of user space - the init process.
The init process, commonly referred to as “PID 1” (Process ID 1), or just “systemd” since most modern Linux distributions use systemd as their init process, is responsible for setting up the rest of user space. So how does this process get started? Well, there are two main flows.
initramfs
All (read: most, there seems to be a small trend against using one) modern Linux systems boot with a CPIO archive called the “initramfs”. The kernel extracts the initramfs into memory and then attempts to execute /init [1]:
    if (ramdisk_execute_command) {
        ret = run_init_process(ramdisk_execute_command);
        if (!ret)
            return 0;
        pr_err("Failed to execute %s (error %d)\n",
               ramdisk_execute_command, ret);
    }
Here, ramdisk_execute_command is “/init” by default (the rdinit= kernel parameter can override it).
run_init_process [2] sets the first argument of our init process to the path of the executable, debug-prints the arguments and environment variables, and then finally executes our program (with kernel_execve).
    static int run_init_process(const char *init_filename)
    {
        const char *const *p;
        argv_init[0] = init_filename;
        pr_info("Run %s as init process\n", init_filename);
        pr_debug(" with arguments:\n");
        for (p = argv_init; *p; p++)
            pr_debug(" %s\n", *p);
        pr_debug(" with environment:\n");
        for (p = envp_init; *p; p++)
            pr_debug(" %s\n", *p);
        return kernel_execve(init_filename, argv_init, envp_init);
    }
And just like that, we have a PID 1 - our first program!
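To make that concrete, here’s a rough user-space analogue in Python. It is only a sketch: the kernel does this in C with kernel_execve, and the default argument and environment values below are what I believe argv_init and envp_init start out as in init/main.c, so treat them as assumptions. The name build_init_invocation is made up for this example.

```python
import os

# Assumed starting values of the kernel's argv_init / envp_init arrays.
DEFAULT_ARGV = ["init"]
DEFAULT_ENV = {"HOME": "/", "TERM": "linux"}


def build_init_invocation(init_filename, extra_args=(), extra_env=None):
    """Return (path, argv, env) roughly the way run_init_process prepares them."""
    argv = list(DEFAULT_ARGV) + list(extra_args)
    argv[0] = init_filename            # mirrors: argv_init[0] = init_filename
    env = dict(DEFAULT_ENV)
    env.update(extra_env or {})
    return init_filename, argv, env


def run_init_process(init_filename, extra_args=(), extra_env=None):
    """Replace the current process with init; only returns if the exec fails."""
    path, argv, env = build_init_invocation(init_filename, extra_args, extra_env)
    print(f"Run {path} as init process")
    try:
        os.execve(path, argv, env)     # never returns on success
    except OSError as err:
        return -err.errno              # kernel-style negative error code
```

Just like the kernel version, a successful call never comes back: the calling process has become init. Only the failure path returns, which is what lets the caller move on to its next guess.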
The role of /init in our initramfs is very simple: load the real file system. How it does that is very system specific, but most modern Linux distributions will hand this over to systemd.
The “Legacy” flow
For completeness, we can talk about the “legacy” flow that the kernel follows. I say legacy, but there seems to be a growing trend of people who don’t want to run an initramfs in order to improve boot times, so this flow might make a resurgence in the future.
If the execution of /init fails, the kernel falls through to another attempt to load an init process. Right at the end of kernel_init, the kernel attempts to hand over to user space by going through a number of secondary guesses at where an init executable might be [3]:
    if (CONFIG_DEFAULT_INIT[0] != '\0') {
        ret = run_init_process(CONFIG_DEFAULT_INIT);
        if (ret)
            pr_err("Default init %s failed (error %d)\n",
                   CONFIG_DEFAULT_INIT, ret);
        else
            return 0;
    }
    if (!try_to_run_init_process("/sbin/init") ||
        !try_to_run_init_process("/etc/init") ||
        !try_to_run_init_process("/bin/init") ||
        !try_to_run_init_process("/bin/sh"))
        return 0;
Here’s the gist of what that’s doing:
- If the DEFAULT_INIT kernel config variable is set, then attempt to execute that as the init process
- If it is unset, or executing it fails, run through some sensible defaults (/sbin/init, /etc/init, /bin/init) and try to execute each of those in turn
- If all those fail, then attempt to dump the user into a shell (/bin/sh) so they can debug this very broken system
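The same fallback chain can be sketched in a few lines of Python. This is an illustration rather than what the kernel literally does: the kernel actually tries to exec each path, whereas this sketch only asks whether a path looks executable (os.access is my stand-in), and pick_init is a hypothetical name.

```python
import os

FALLBACK_INITS = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


def pick_init(default_init="", candidates=FALLBACK_INITS,
              can_run=lambda path: os.access(path, os.X_OK)):
    """Return the first init path that looks runnable, or None if none does."""
    # A configured default goes first. If it fails, the kernel logs an error
    # and still carries on to the hard-coded guesses.
    if default_init and can_run(default_init):
        return default_init
    for path in candidates:
        if can_run(path):
            return path
    return None  # at this point the real kernel gives up and panics


# Example: which init would be picked on the machine running this script?
# print(pick_init())
```

Passing can_run in as a parameter keeps the sketch easy to poke at: you can feed it a fake predicate to see which path wins in a given scenario without touching a real file system.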
As above, most Linux systems these days will have systemd as their init process. On my system, /sbin/init is symlinked:
09/01/2022 14:31:39 AEDT❯ ls -l /sbin/init
lrwxrwxrwx. 1 root root 22 Nov 16 00:21 /sbin/init -> ../lib/systemd/systemd
How systemd executes systemd
If we follow the initramfs flow from above, we have an extra step to go from that to a fully fledged system. How this works is, again, system dependent, but let’s go through how systemd in particular works.
systemd operates with “targets”. These targets in and of themselves do nothing, but other services can mark themselves in relation to them. This means that, for example, a service to initialise network devices can hook into network.target to get started when we enter that level. You might know these targets as “run levels”, the concept that systemd has replaced, but they serve the same function.
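If the idea of targets as empty hooks feels abstract, here is a toy model in Python. It is a deliberate simplification: real systemd tracks requirement and ordering dependencies separately and starts units in parallel, and every unit name in WANTS apart from network.target and default.target is invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical units: each target maps to the units that hook into it.
WANTS = {
    "default.target": ["network.target", "getty.service"],
    "network.target": ["setup-network.service"],
}


def start_order(target, wants=WANTS):
    """Return every unit needed for `target`, with dependencies listed first."""
    graph = {}
    stack = [target]
    while stack:
        unit = stack.pop()
        if unit in graph:
            continue
        # The target itself does nothing; it only names what must come first.
        graph[unit] = set(wants.get(unit, []))
        stack.extend(graph[unit])
    return list(TopologicalSorter(graph).static_order())


# Example: print(start_order("default.target"))
```

The target shows up at the end of its own list: everything that hooked into it gets started first, and "reaching" the target just means all of that is done.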
The difficulty in working out what happens in our initramfs is that systemd’s targets in the initramfs are actually different from those on our main system! That means we’re going to have to look into our initramfs in order to work out what’s going on.
Thankfully, Linux distributions store their initramfs images in /boot, so we can just grab one out and extract it:
~/tmp
09/01/2022 17:20:43 AEDT❯ sudo cp /boot/initramfs-5.14.18-200.fc34.x86_64.img .
~/tmp
09/01/2022 17:20:45 AEDT❯ sudo chown colin:colin ./initramfs-5.14.18-200.fc34.x86_64.img
09/01/2022 17:22:09 AEDT❯ cpio -i < initramfs-5.14.18-200.fc34.x86_64.img
410 blocks
~/tmp
09/01/2022 17:22:13 AEDT❯ ls
early_cpio initramfs-5.14.18-200.fc34.x86_64.img kernel
But wait!? Where’s our /init? Where’s our whole file system? This is the first interesting part about the initramfs: it’s actually two cpio archives back to back, one uncompressed and one gzipped. We can properly extract it like so:
~/tmp
09/01/2022 17:27:17 AEDT❯ (cpio -i; zcat | cpio -i) < initramfs-5.14.18-200.fc34.x86_64.img
410 blocks
139482 blocks
~/tmp
09/01/2022 17:27:23 AEDT❯ ls
bin dev early_cpio etc init initramfs-5.14.18-200.fc34.x86_64.img kernel lib lib64 proc root run sbin shutdown sys sysroot tmp usr var
Much better! Now we have something to work with.
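If you want to see where one archive ends and the next begins, the following Python sketch walks the first archive’s headers and then sniffs the magic bytes of whatever follows. It assumes the “newc” cpio format (magic 070701) and the layout described above. The xz and zstd magics are my addition for images built with a different compressor, and the function names are made up for this example.

```python
HEADER_LEN = 110  # 6-byte magic followed by 13 fields of 8 hex digits

MAGICS = [
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"070701", "cpio"),
]


def _align4(n):
    return (n + 3) & ~3


def end_of_cpio(data, start=0):
    """Return the offset just past the TRAILER!!! entry of a newc archive."""
    pos = start
    while True:
        header = data[pos:pos + HEADER_LEN]
        if header[:6] != b"070701":
            raise ValueError(f"no newc cpio header at offset {pos}")
        filesize = int(header[54:62], 16)
        namesize = int(header[94:102], 16)   # includes the trailing NUL
        name_end = pos + HEADER_LEN + namesize
        name = data[pos + HEADER_LEN:name_end - 1]
        # Header+name and file data are each padded to a 4-byte boundary.
        pos = start + _align4(name_end - start)
        pos = start + _align4(pos + filesize - start)
        if name == b"TRAILER!!!":
            return pos


def describe_segments(data):
    """List (offset, kind) for each back-to-back segment of an image."""
    segments, pos = [], 0
    while pos < len(data):
        if data[pos] == 0:                   # NUL padding between archives
            pos += 1
            continue
        kind = next((k for magic, k in MAGICS if data.startswith(magic, pos)),
                    "unknown")
        segments.append((pos, kind))
        if kind != "cpio":
            break                            # compressed data runs to the end
        pos = end_of_cpio(data, pos)
    return segments


# Example (path is a placeholder):
# with open("initramfs.img", "rb") as f:
#     print(describe_segments(f.read()))
```

With the offsets in hand, you could slice the image at the second offset and decompress only that part, instead of relying on the two commands in the subshell picking up exactly where the previous one stopped reading.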
systemd stores its targets in /usr/lib/systemd/system. In particular, the default.target symlinks to the target that will be started by default when systemd starts:
09/01/2022 17:28:04 AEDT❯ ls usr/lib/systemd/system/default.target -l
lrwxrwxrwx. 1 colin colin 13 Jan 9 17:27 usr/lib/systemd/system/default.target -> initrd.target
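The same lookup is easy to script if you have a few extracted images to compare. A minimal sketch, assuming the tree was extracted into some directory: the default_target helper is hypothetical, and it checks etc/systemd/system first because that directory takes precedence over usr/lib/systemd/system on a normal systemd installation.

```python
import os

UNIT_DIRS = ("etc/systemd/system", "usr/lib/systemd/system")


def default_target(root="/"):
    """Return the unit that default.target points at under `root`, or None."""
    for unit_dir in UNIT_DIRS:
        link = os.path.join(root, unit_dir, "default.target")
        if os.path.islink(link):
            # Only the unit name matters, not the directory part of the link.
            return os.path.basename(os.readlink(link))
    return None


# Example: run from the directory the initramfs was extracted into.
# print(default_target("."))
```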
So initrd.target is where we need to look. Rather than going into all the files and dependencies, we can use the systemd-analyze tool, which spits out helpful dependency graphs that we can look at. Here’s the one for initrd.target:
[Figure: dependency graph for initrd.target]
The green arrows there are “depends on” relationships, so we can see that to start initrd.target, we need to start, in particular, initrd-root-fs.target - that sounds like it does what we want! The name implies that it loads the root file system, which will be used to replace the current initramfs. Let’s take a look.
initrd-root-fs.target has a few different services that hook into it, all different methods for loading the file system. For example, on blank systems, we have systemd-repart.service that can repartition the root device, or systemd-volatile-root.service that converts the root device into a volatile (tmpfs) system. My current system uses ostree-prepare-root.service to load the operating system from the ostree system. All of these methods have the same outcome though - one way or another, they mount the root file system to the /sysroot directory.
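Whichever service does the work, the result can be checked the same way: is something mounted on /sysroot? Here’s a small Python sketch that answers that from text in the /proc/mounts format (device, mount point, file system type, and so on, separated by whitespace). The mount_source name is made up for this example.

```python
def mount_source(mounts_text, mount_point="/sysroot"):
    """Return (device, fstype) of the mount on `mount_point`, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # Keep going: a later mount on the same path shadows earlier ones.
            found = (fields[0], fields[2])
    return found


# Example, e.g. from an emergency shell inside the initramfs:
# with open("/proc/self/mounts") as f:
#     print(mount_source(f.read()))
```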
Once the root file system has been loaded, there’s only one last step before we are in our system (at least at a file system level): initrd-switch-root.service. This does only one thing - it asks systemd to switch root, using the pivot_root syscall to change the root to the /sysroot directory.
Once that’s done, we’re officially in our main file system - we can start loading the user’s settings! But what lies beyond here is for next time; this is already getting a bit long!
What have we learned?
Over the course of this post, we’ve covered quite a bit! We’ve looked at the two methods to start up the system (the initramfs and the legacy flow), we’ve looked at what exactly is in our initramfs, and we’ve looked at what systemd in particular does to ready the system to boot. In future posts, I want to go further into ttys, how ttys are readied for use, and what a login actually looks like on the backend. Stay tuned!