Let's build an OS: Mounting a disk

Let’s go on an adventure. I’ve learnt a lot more Rust over the last year, and I want to get back into writing properly, so my plan is to write a Linux operating system. While writing it, I’ll be taking notes in my repo (https://github.com/sinkingpoint/qos/tree/main/notes), and every now and then I’ll formalise them into more structured blog posts over here, once I’ve learnt enough to make something interesting.

I had intended this entry to be a simple one. I really did. We were going to use the nix binding of the mount function to create a tiny binary that takes a device and a mount point and mounts it. Literally three lines of code.

Annoyingly, as is seemingly normal these days, nothing is ever simple. My plans were irreparably ruined by one parameter, const char *filesystemtype. What is that, you may ask? Why, that is what tells the kernel what type of filesystem you’re about to mount. You thought that it might be able to work it out on its own? Foolish mortal. We’re gonna have to work it out ourselves.

Filesystems

According to the mount(2) man page, we can find our allowed values for filesystemtype in /proc/filesystems. Let’s take a look:

$ cat /proc/filesystems
nodev	sysfs
nodev	tmpfs
nodev	bdev
nodev	proc
nodev	cgroup
nodev	cgroup2
nodev	cpuset
nodev	devtmpfs
nodev	configfs
nodev	debugfs
nodev	tracefs
nodev	securityfs
nodev	sockfs
nodev	bpf
nodev	pipefs
nodev	ramfs
nodev	hugetlbfs
nodev	devpts
	ext3
	ext2
	ext4
nodev	autofs
nodev	efivarfs
nodev	mqueue
nodev	selinuxfs
nodev	binder
	btrfs
nodev	pstore
	fuseblk
nodev	fuse
nodev	fusectl
	squashfs
	vfat
nodev	binfmt_misc
nodev	rpc_pipefs
nodev	overlay

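That’s quite a few! What does nodev mean? It’s tempting to read it as the MS_NODEV mount flag, which the mount docs describe as “Do not allow access to devices (special files) on this filesystem”, but that’s a different thing. In /proc/filesystems, nodev marks a filesystem that doesn’t need a block device to be mounted: virtual filesystems like proc, sysfs and tmpfs. We want to mount an actual disk, so let’s toss those ones out. We’re left with: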

	ext3
	ext2
	ext4
	btrfs
	fuseblk
	squashfs
	vfat
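
The filtering above is mechanical enough to sketch in Rust. This is only a minimal sketch: block_filesystems is a hypothetical helper name, and it assumes the two-column, tab-separated layout shown in the listing.

```rust
/// Returns the filesystem names from /proc/filesystems-style text that are
/// not marked `nodev`.
/// `block_filesystems` is a hypothetical helper name used for illustration.
fn block_filesystems(listing: &str) -> Vec<String> {
    listing
        .lines()
        .filter_map(|line| {
            // Each line is "<flag>\t<name>"; the flag column is empty for
            // filesystems that are not marked nodev.
            let (flag, name) = line.split_once('\t')?;
            if flag.trim().is_empty() && !name.trim().is_empty() {
                Some(name.trim().to_string())
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Fall back to a tiny sample if /proc/filesystems cannot be read.
    let listing = std::fs::read_to_string("/proc/filesystems")
        .unwrap_or_else(|_| "nodev\tsysfs\n\text4\n".to_string());
    for name in block_filesystems(&listing) {
        println!("{name}");
    }
}
```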

That’s a much smaller list! Of those, ext4 seems like the easiest one to play around with. We can make one with the mkfs.ext4 command:

$ truncate -s1000M ./test.ext4
$ mkfs.ext4 test.ext4 
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 256000 4k blocks and 64000 inodes
Filesystem UUID: 9d54eda7-067c-44a7-a95c-7abc0cc76aad
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

Now, we know that this is an ext4 filesystem, so we could just mount it and call it a day:

mount("./test.ext4", "./mnt", "ext4", 0, NULL)

But that’s boring. If we didn’t know that it was an ext4 filesystem, how could we figure it out?

Probing

When I don’t know the type of a file, I use the file command. It’s pretty handy, and it seems to work in our case:

$ file ./test.ext4 
./test.ext4: Linux rev 1.0 ext4 filesystem data, UUID=9d54eda7-067c-44a7-a95c-7abc0cc76aad (extents) (64bit) (large files) (huge files)

Paging through the file source code, we can even find where that comes from: https://github.com/file/file/blob/master/magic/Magdir/filesystems#L1702 . It turns out that file’s detections are really hard to read! We can get a few pointers though.

The general structure of these files is this:

<address> <type> <value to match against> <Text to add to the output>

The first entry in the ext* block is:

0x438   leshort         0xEF53          Linux

Which says that for an ext family file system, the short (16 bits, 2 bytes) at 0x438 in the file should be 0xEF53. Let’s take a look at our filesystem:

$ hexdump -s 0x438 -n 2 test.ext4
0000438 ef53
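
To make the rule format concrete, here is a rough Rust sketch of what evaluating that one leshort rule involves. MagicRule and its matches method are hypothetical names made up for illustration, not how file itself is implemented, and the sketch only handles the little-endian 16-bit case.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// One `<address> leshort <value>` rule. Hypothetical type for illustration;
/// this is not how `file` represents its rules internally.
struct MagicRule {
    offset: u64,
    expected: u16,
}

impl MagicRule {
    /// Reads two bytes at `offset` and compares them, interpreted as a
    /// little-endian 16-bit integer, against `expected`.
    fn matches<R: Read + Seek>(&self, reader: &mut R) -> io::Result<bool> {
        reader.seek(SeekFrom::Start(self.offset))?;
        let mut buf = [0u8; 2];
        reader.read_exact(&mut buf)?;
        Ok(u16::from_le_bytes(buf) == self.expected)
    }
}

fn main() -> io::Result<()> {
    let ext_magic = MagicRule { offset: 0x438, expected: 0xEF53 };

    // A fake 2 KiB image with the magic bytes in place. The value is stored
    // little-endian, so 0xEF53 sits on disk as 0x53 followed by 0xEF.
    let mut image = vec![0u8; 2048];
    image[0x438] = 0x53;
    image[0x439] = 0xEF;

    println!("ext family: {}", ext_magic.matches(&mut Cursor::new(image))?);
    Ok(())
}
```

The byte order matters: the two bytes on disk are 0x53 then 0xEF, and it is only when they are read as a little-endian short that they come out as 0xEF53.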

Looks right! What else can we get from that file? Well, the first thing to note is that every ext* family filesystem is in the same block - they all have that same 0xEF53 magic number. What makes the difference is simply the capabilities enabled on the filesystem. Even from the comments in that file, we can see that if the filesystem doesn’t have journaling enabled, then it’s an ext2 filesystem. The comments for disambiguating ext3 and ext4 are a bit harder to parse:

#  and small INCOMPAT?
>>0x460 lelong          <0x0000040
#   and small RO_COMPAT?
>>>0x464 lelong         <0x0000008      ext3 filesystem data
#   else large RO_COMPAT?
>>>0x464 lelong         >0x0000007      ext4 filesystem data
#  else large INCOMPAT?
>>0x460	lelong          >0x000003f      ext4 filesystem data

What are INCOMPAT and RO_COMPAT? For those we’ll need to look at the Super Block.

Ext Super Block

You might have seen “blocks” in the past. Let’s take a look at my drive:

$ sudo ls -l /dev/nvme0n1
brw-rw----. 1 root disk 259, 0 Feb  2 16:11 /dev/nvme0n1

That b at the front? That means it’s a block device. Block devices are special files - they read data not as bytes but as blocks. Blocks are a fixed number of bytes, depending on the device. If your device has a block size of 1 kilobyte, and you need to read 100 bytes? Too bad, you’re reading 1 kilobyte. You want to read 1040 bytes? Too bad, you’re reading 2 kilobytes.
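
The rounding in those two examples is just "round up to a whole number of blocks". As a small sketch (bytes_transferred is a hypothetical helper, and it assumes the read starts on a block boundary):

```rust
/// Number of bytes a device with `block_size`-byte blocks has to transfer to
/// satisfy a read of `requested` bytes that starts on a block boundary.
/// `bytes_transferred` is a hypothetical helper for illustration.
fn bytes_transferred(requested: u64, block_size: u64) -> u64 {
    // Round the request up to a whole number of blocks.
    let blocks = (requested + block_size - 1) / block_size;
    blocks * block_size
}

fn main() {
    // The two examples from the text, with a 1 kilobyte block size.
    println!("{}", bytes_transferred(100, 1024));
    println!("{}", bytes_transferred(1040, 1024));
}
```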

The super block of the file system is a special block that details all the important metadata about the filesystem. The ext super block structure is well documented. In it we can find all sorts of interesting information.

For starters, we can find our magic number, 0xEF53, at offset 0x38 of the superblock. The superblock itself doesn’t sit at the very start of the filesystem: the first 1024 bytes are padding, which puts our magic number at 0x400 (1024) + 0x38 (56) = 0x438 (1080) bytes in. That 0x438 position is what we found in the file configs!

More interestingly, we find our INCOMPAT and RO_COMPAT values:

Offset	Size	Name	Description
0x60	__le32	s_feature_incompat	Incompatible feature set. If the kernel or fsck doesn’t understand one of these bits, it should stop. See the super_incompat table for more info.
0x64	__le32	s_feature_ro_compat	Readonly-compatible feature set. If the kernel doesn’t understand one of these bits, it can still mount read-only. See the super_rocompat table for more info.
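
The offset arithmetic is the same for every superblock field: add the 0x400 bytes that precede the superblock to the field's offset in the table. A small sketch, where the constant names are my own and simply mirror the field names in the ext documentation:

```rust
/// The superblock starts 1024 bytes into the filesystem.
const SUPERBLOCK_OFFSET: u64 = 0x400;

// Field offsets relative to the start of the superblock.
// The constant names are illustrative and mirror the documented field names.
const S_MAGIC: u64 = 0x38;
const S_FEATURE_INCOMPAT: u64 = 0x60;
const S_FEATURE_RO_COMPAT: u64 = 0x64;

/// Absolute position of a superblock field within the device or image.
const fn absolute(field_offset: u64) -> u64 {
    SUPERBLOCK_OFFSET + field_offset
}

fn main() {
    println!("s_magic:             {:#x}", absolute(S_MAGIC));
    println!("s_feature_incompat:  {:#x}", absolute(S_FEATURE_INCOMPAT));
    println!("s_feature_ro_compat: {:#x}", absolute(S_FEATURE_RO_COMPAT));
}
```

These work out to 0x438, 0x460 and 0x464, which are exactly the addresses that show up in the file rules.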

So INCOMPAT and RO_COMPAT are bitsets that indicate features of the file system. Let’s go back to our file outputs:

#  and small INCOMPAT?
>>0x460 lelong          <0x0000040
#   and small RO_COMPAT?
>>>0x464 lelong         <0x0000008      ext3 filesystem data
#   else large RO_COMPAT?
>>>0x464 lelong         >0x0000007      ext4 filesystem data
#  else large INCOMPAT?
>>0x460	lelong          >0x000003f      ext4 filesystem data

0x0000040 is 0b1000000 in binary, and 0x0000008 is 0b1000. ext3 must have an INCOMPAT less than 0b1000000, so only the lowest 6 bits of the incompat bitset (bits 0 to 5) are features that ext3 understands (i.e. if it has any bits above the first 6 bits set, it has a feature not in ext3). Similarly, only the lowest 3 bits in ro_compat are ext3 features.
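
Since the comparisons are < 0x40 and < 0x8, "small" just means "no bit set above bit 5" and "no bit set above bit 2" respectively. Here is a sketch of the whole decision as a pure function over the flag values. The function and constant names are hypothetical, and the ext2 branch rests on an assumption that goes beyond the rules quoted above: that the journal flag is bit 0x4 (COMPAT_HAS_JOURNAL) of s_feature_compat, which the ext4 documentation places at superblock offset 0x5C.

```rust
/// Assumption: COMPAT_HAS_JOURNAL bit of s_feature_compat, as listed in the
/// ext4 documentation. It is not part of the rules quoted in the text.
const COMPAT_HAS_JOURNAL: u32 = 0x4;

/// Feature bits that still count as ext3 according to the `file` rules.
const EXT3_INCOMPAT_MASK: u32 = 0x3f; // bits 0..=5, i.e. values < 0x40
const EXT3_RO_COMPAT_MASK: u32 = 0x07; // bits 0..=2, i.e. values < 0x8

/// Hypothetical helper: classifies an ext-family filesystem from its three
/// feature bitsets.
fn classify_ext(compat: u32, incompat: u32, ro_compat: u32) -> &'static str {
    // No journal means ext2.
    if (compat & COMPAT_HAS_JOURNAL) == 0 {
        return "ext2";
    }
    // `x < 0x40` is the same test as "no bit above bit 5 is set".
    let only_ext3_incompat = (incompat & !EXT3_INCOMPAT_MASK) == 0;
    let only_ext3_ro_compat = (ro_compat & !EXT3_RO_COMPAT_MASK) == 0;
    if only_ext3_incompat && only_ext3_ro_compat {
        "ext3"
    } else {
        "ext4"
    }
}

fn main() {
    // Made-up flag values for illustration, not read from a real image.
    println!("{}", classify_ext(0x0, 0x2, 0x1));
    println!("{}", classify_ext(0x4, 0x2, 0x1));
    println!("{}", classify_ext(0x4, 0x40, 0x1));
}
```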

Putting it all together

Now that we understand how to check whether a file is an ext filesystem, we can finally put it together:

use std::io::{self, Read, Seek, SeekFrom};

// `File` is std::fs::File; it is imported alongside `main` further down.
fn identify_fs(fs: &mut File) -> io::Result<Option<String>> {
    // Check the magic number: 0xEF53, stored little-endian at 0x438.
    fs.seek(SeekFrom::Start(0x438))?;
    let mut magic = [0; 2];
    fs.read_exact(&mut magic)?;
    if magic != [0x53, 0xEF] {
        return Ok(None);
    }

    // Check the feature flags to disambiguate ext3/4.
    // s_feature_incompat lives at 0x400 + 0x60.
    fs.seek(SeekFrom::Start(0x460))?;
    let mut incompat = [0; 4];
    fs.read_exact(&mut incompat)?;
    let incompat = u32::from_le_bytes(incompat);

    // s_feature_ro_compat lives at 0x400 + 0x64.
    fs.seek(SeekFrom::Start(0x464))?;
    let mut ro_compat = [0; 4];
    fs.read_exact(&mut ro_compat)?;
    let ro_compat = u32::from_le_bytes(ro_compat);

    // Any feature bit beyond the ones ext3 understands means ext4.
    if incompat < 0x0000040 && ro_compat < 0x0000008 {
        Ok(Some("ext3".to_string()))
    } else {
        Ok(Some("ext4".to_string()))
    }
}

And finally we’re able to implement our mount command:

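use std::{env, fs::File, process};

use nix::mount::{mount, MsFlags};

fn main() {
    // Usage: mount <device> <mount point>
    let filename = env::args().nth(1).expect("missing device argument");
    let mount_path = env::args().nth(2).expect("missing mount point argument");
    let mut fs = File::open(&filename).unwrap();

    // Work out the filesystem type, exiting non-zero if we can't.
    let filesystem_type = match identify_fs(&mut fs) {
        Ok(Some(fs)) => fs,
        Ok(None) => {
            eprintln!("Unsupported filesystem");
            process::exit(1);
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            process::exit(1);
        }
    };

    // No mount flags and no filesystem-specific data. The turbofish pins the
    // type of the `None` data argument, which can't be inferred otherwise.
    mount::<_, _, _, str>(
        Some(filename.as_str()),
        mount_path.as_str(),
        Some(filesystem_type.as_str()),
        MsFlags::empty(),
        None,
    )
    .unwrap();
}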

And try it out:

$ mkdir mnt
$ sudo ./target/debug/mount ./filesystem.ext4 ./mnt 
$ ls mnt
lost+found

It works! We can mount ext filesystems now! We obviously don’t support all the flags that the full mount command supports, but we’ve got enough for our purposes.

We’ve got everything in place now to be able to mount a real filesystem from our initramfs, so next we’ll be doing that, and actually booting into an init system. Exciting!

I'm on BlueSky: @colindou.ch. Come yell at me!
