Let’s go on an adventure. I’ve learnt a lot more Rust over the last year, and I want to get back into writing properly, so my plan is to write a Linux Operating System. While writing it, I’ll be taking notes in my repo (https://github.com/sinkingpoint/qos/tree/main/notes), and every now and then formalising them into more structured blog posts over here, once I’ve learnt enough to make something interesting.
I had intended this entry to be a simple one. I really did. We were going to use the nix binding of the mount function to create a tiny binary that takes a device and a mount point and mounts it. Literally three lines of code.
Annoyingly, as is seemingly normal these days, nothing is ever simple. My plans were irreparably ruined by one parameter, const char *filesystemtype. What is that, you may ask? Why, that is what tells the kernel what type of file system you’re about to mount. You thought that it might be able to work it out on its own? Foolish mortal. We’re gonna have to work it out ourselves.
Filesystems
According to the mount command documentation, we can find our allowed values for filesystemtype in /proc/filesystems. Let’s take a look:
$ cat /proc/filesystems
nodev sysfs
nodev tmpfs
nodev bdev
nodev proc
nodev cgroup
nodev cgroup2
nodev cpuset
nodev devtmpfs
nodev configfs
nodev debugfs
nodev tracefs
nodev securityfs
nodev sockfs
nodev bpf
nodev pipefs
nodev ramfs
nodev hugetlbfs
nodev devpts
ext3
ext2
ext4
nodev autofs
nodev efivarfs
nodev mqueue
nodev selinuxfs
nodev binder
btrfs
nodev pstore
fuseblk
nodev fuse
nodev fusectl
squashfs
vfat
nodev binfmt_misc
nodev rpc_pipefs
nodev overlay
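That’s quite a few! What does nodev mean? It’s tempting to reach for the mount docs, which say:
MS_NODEV Do not allow access to devices (special files) on this filesystem.
But that’s a mount flag, not what this column means. In /proc/filesystems, nodev marks filesystems that aren’t backed by a block device (things like proc and sysfs). We’re trying to mount a device, so let’s toss those ones out. We’re left with: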
ext3
ext2
ext4
btrfs
fuseblk
squashfs
vfat
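That filtering step can be sketched in Rust. This is only an illustration, not code from the project: the function name device_backed_filesystems is made up, and it assumes each line is either a bare filesystem name or nodev followed by a name.

```rust
/// Returns the filesystem types from `/proc/filesystems`-style text that are
/// not marked `nodev`. (Hypothetical helper, for illustration only.)
fn device_backed_filesystems(contents: &str) -> Vec<String> {
    contents
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                // A single field: just the filesystem name, no `nodev` marker.
                (Some(name), None) => Some(name.to_string()),
                // Two fields (`nodev <name>`) or an empty line: skip it.
                _ => None,
            }
        })
        .collect()
}
```

Feeding it the contents of /proc/filesystems (for example via std::fs::read_to_string) should give back the short list above.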
That’s a much smaller list! Of those, ext4 seems like the easiest one to play around with. We can make one with the mkfs.ext4 command:
$ truncate -s1000M ./test.ext4
$ mkfs.ext4 test.ext4
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 256000 4k blocks and 64000 inodes
Filesystem UUID: 9d54eda7-067c-44a7-a95c-7abc0cc76aad
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
Now, we know that this is an ext4 filesystem, so we could just mount it and call it a day:
mount("./test.ext4", "./mnt", "ext4", 0, NULL)
But that’s boring. If we didn’t know that it was an ext4 filesystem, how could we figure it out?
Probing
When I don’t know the type of a file, I use the file command. It’s pretty handy, and it seems to work in our case:
$ file ./test.ext4
./test.ext4: Linux rev 1.0 ext4 filesystem data, UUID=9d54eda7-067c-44a7-a95c-7abc0cc76aad (extents) (64bit) (large files) (huge files)
Paging through the file source code, we can even find where that comes from: https://github.com/file/file/blob/master/magic/Magdir/filesystems#L1702 . It turns out that file’s detections are really hard to read! We can get a few pointers though.
The general structure of these files is this:
<address> <type> <value to match against> <Text to add to the output>
The first entry in the ext* block is:
0x438 leshort 0xEF53 Linux
Which says that for an ext family file system, the short (16 bits, 2 bytes) at 0x438 in the file should be 0xEF53. Let’s take a look at our filesystem:
$ hexdump -s 0x438 -n 2 test.ext4
0000438 ef53
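To make the rule format concrete, here is a toy evaluator for that one line. It is a sketch under heavy assumptions: it only understands the leshort type, ShortRule, parse_rule and rule_matches are invented names, and the real file tool handles far more than this. Note that the two bytes on disk are 0x53 then 0xEF; hexdump shows them as ef53 because by default it prints 16-bit words, which on a little-endian machine swaps them.

```rust
/// A hypothetical, heavily simplified magic rule: only `leshort` is handled.
struct ShortRule {
    offset: usize,
    expected: u16,
    text: String,
}

/// Parses a line of the form `<address> leshort <value> <text>`.
fn parse_rule(line: &str) -> Option<ShortRule> {
    let mut fields = line.split_whitespace();
    let offset = usize::from_str_radix(fields.next()?.strip_prefix("0x")?, 16).ok()?;
    if fields.next()? != "leshort" {
        return None;
    }
    let expected = u16::from_str_radix(fields.next()?.strip_prefix("0x")?, 16).ok()?;
    // Whatever is left over is the text to add to the output.
    let text = fields.collect::<Vec<_>>().join(" ");
    Some(ShortRule { offset, expected, text })
}

/// Checks whether the little-endian 16-bit value at the rule's offset matches.
fn rule_matches(rule: &ShortRule, data: &[u8]) -> bool {
    match data.get(rule.offset..rule.offset + 2) {
        Some(bytes) => u16::from_le_bytes([bytes[0], bytes[1]]) == rule.expected,
        // The data is too short to contain the field at all.
        None => false,
    }
}
```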
Looks right! What else can we get from that file? Well, the first thing to note is that every ext* family file system is in the same block: they all have that same 0xEF53 magic number. What makes the difference is simply the capabilities enabled on the filesystem. Even from the comments in that file, we can see that if the filesystem doesn’t have journaling enabled, then it’s an ext2 filesystem. The comments for disambiguating ext3 and ext4 are a bit harder to parse:
# and small INCOMPAT?
>>0x460 lelong <0x0000040
# and small RO_COMPAT?
>>>0x464 lelong <0x0000008 ext3 filesystem data
# else large RO_COMPAT?
>>>0x464 lelong >0x0000007 ext4 filesystem data
# else large INCOMPAT?
>>0x460 lelong >0x000003f ext4 filesystem data
What are INCOMPAT and RO_COMPAT? For those we’ll need to look at the Super Block.
Ext Super Block
You might have seen “blocks” in the past. Let’s take a look at my drive:
$ sudo ls -l /dev/nvme0n1
brw-rw----. 1 root disk 259, 0 Feb 2 16:11 /dev/nvme0n1
That b at the front? That means it’s a block device. Block devices are special files: data is read from them not as individual bytes but as blocks. Blocks are a fixed number of bytes, depending on the device. If your device has a block size of 1 kilobyte, and you need to read 100 bytes? Too bad, you’re reading 1 kilobyte. You want to read 1040 bytes? Too bad, you’re reading 2 kilobytes.
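The rounding in those two examples is just a round-up to the next multiple of the block size. A minimal sketch, assuming the read starts on a block boundary and taking 1 kilobyte to mean 1024 bytes (bytes_transferred is a made-up name):

```rust
/// Bytes actually transferred when `len` bytes are requested from the start of
/// a device with the given block size. Assumes `block_size` is non-zero and the
/// read begins on a block boundary.
fn bytes_transferred(len: u64, block_size: u64) -> u64 {
    // Number of whole blocks needed to cover `len` bytes, rounded up.
    let blocks = (len + block_size - 1) / block_size;
    blocks * block_size
}
```

With a 1024 byte block size, asking for 100 bytes comes out as 1024 and asking for 1040 bytes comes out as 2048.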
The super block of the file system is a special block that details all the important metadata about the filesystem. The ext super block structure is well documented. In it we can find all sorts of interesting information.
For starters, we can find our magic number, 0xEF53, at offset 0x38. The ext superblock has 1024 bytes of padding at the beginning, which puts our magic number at 0x400 (1024) + 0x38 (56) = 0x438 (1080) bytes in. That 0x438 position is what we found in the file configs!
More interestingly, we find our INCOMPAT and RO_COMPAT values:
| Offset | Size | Name | Description |
|---|---|---|---|
| 0x60 | __le32 | s_feature_incompat | Incompatible feature set. If the kernel or fsck doesn’t understand one of these bits, it should stop. See the super_incompat table for more info. |
| 0x64 | __le32 | s_feature_ro_compat | Readonly-compatible feature set. If the kernel doesn’t understand one of these bits, it can still mount read-only. See the super_rocompat table for more info. |
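The same 1024 byte shift applies to every field in the superblock, which is where the 0x460 and 0x464 in the file rules come from. As a small sketch (the constant and function names here are my own, loosely following the field names in the table):

```rust
/// The superblock starts after 1024 bytes of padding.
const SUPERBLOCK_START: u64 = 0x400;

/// Field offsets within the superblock, as given in the ext docs.
const S_MAGIC: u64 = 0x38;
const S_FEATURE_INCOMPAT: u64 = 0x60;
const S_FEATURE_RO_COMPAT: u64 = 0x64;

/// Converts an offset within the superblock into an offset within the file.
const fn file_offset(field_offset: u64) -> u64 {
    SUPERBLOCK_START + field_offset
}
```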
So INCOMPAT and RO_COMPAT are bitsets that indicate features of the file system. Let’s go back to our file outputs:
# and small INCOMPAT?
>>0x460 lelong <0x0000040
# and small RO_COMPAT?
>>>0x464 lelong <0x0000008 ext3 filesystem data
# else large RO_COMPAT?
>>>0x464 lelong >0x0000007 ext4 filesystem data
# else large INCOMPAT?
>>0x460 lelong >0x000003f ext4 filesystem data
0x0000040 is 0b1000000 in binary, and 0x0000008 is 0b1000. ext3 must have an INCOMPAT less than 0b1000000, so only the first 6 bits of the incompat bitset can be ext3 features (i.e. if it has any bits above the first 6 bits set, it has a feature not in ext3). Similarly, only the first 3 bits in ro_compat can be ext3 features.
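Since these are bitsets, the less-than comparisons are really mask checks in disguise: a value is below 0x40 exactly when nothing above its low six bits is set, and below 0x8 exactly when nothing above its low three bits is set. A sketch of that reading, with made-up names for the masks and the function:

```rust
/// Bits that an ext3 filesystem may have set, going by the `file` rules.
const EXT3_INCOMPAT_MASK: u32 = 0x3f; // low six bits
const EXT3_RO_COMPAT_MASK: u32 = 0x7; // low three bits

/// True when neither bitset has a bit set outside the ext3 masks.
fn looks_like_ext3(incompat: u32, ro_compat: u32) -> bool {
    (incompat & !EXT3_INCOMPAT_MASK) == 0 && (ro_compat & !EXT3_RO_COMPAT_MASK) == 0
}
```

This should agree with incompat < 0x40 && ro_compat < 0x8 for every input; it just makes the "any unknown bit set" intent more explicit.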
Putting it all together
Now that we understand how to check whether a file is an ext filesystem, we can finally put it together:
// File itself is imported alongside main.
use std::io::{self, Read, Seek, SeekFrom};

fn identify_fs(fs: &mut File) -> io::Result<Option<String>> {
    // Check the magic number: 0xEF53, stored little-endian at 0x438.
    fs.seek(SeekFrom::Start(0x438))?;
    let mut magic = [0; 2];
    fs.read_exact(&mut magic)?;
    if magic != [0x53, 0xEF] {
        return Ok(None);
    }

    // Check the feature flags to disambiguate ext3/4.
    fs.seek(SeekFrom::Start(0x460))?;
    let mut incompat = [0; 4];
    fs.read_exact(&mut incompat)?;
    let incompat = u32::from_le_bytes(incompat);

    fs.seek(SeekFrom::Start(0x464))?;
    let mut ro_compat = [0; 4];
    fs.read_exact(&mut ro_compat)?;
    let ro_compat = u32::from_le_bytes(ro_compat);

    // Only "small" values in both bitsets mean ext3; anything else is ext4.
    if incompat < 0x0000040 && ro_compat < 0x0000008 {
        Ok(Some("ext3".to_string()))
    } else {
        Ok(Some("ext4".to_string()))
    }
}
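The function above never reports ext2, even though the file rules say a filesystem without a journal is one. Here is a hedged sketch of a variant that adds that check and is generic over anything seekable, so it can be tried against an in-memory image without root. The names identify_ext, read_le_u32 and fake_image are invented for this example. The journal check assumes the compatible feature set lives at offset 0x5C of the superblock (0x45C in the file) with the journal flag at bit 0x4, which is what the file rules and the ext docs appear to describe. ext4 can also be created without a journal, so treat the ext2 answer as a guess.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Reads a little-endian u32 at an absolute offset.
fn read_le_u32<R: Read + Seek>(src: &mut R, offset: u64) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    src.seek(SeekFrom::Start(offset))?;
    src.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Hypothetical variant of the identification logic that also reports ext2.
fn identify_ext<R: Read + Seek>(src: &mut R) -> io::Result<Option<&'static str>> {
    let mut magic = [0u8; 2];
    src.seek(SeekFrom::Start(0x438))?;
    src.read_exact(&mut magic)?;
    if magic != [0x53, 0xEF] {
        return Ok(None);
    }

    let compat = read_le_u32(src, 0x45C)?; // assumed: s_feature_compat
    let incompat = read_le_u32(src, 0x460)?;
    let ro_compat = read_le_u32(src, 0x464)?;

    let kind = if (compat & 0x4) == 0 {
        "ext2" // no journal flag set
    } else if incompat < 0x40 && ro_compat < 0x8 {
        "ext3"
    } else {
        "ext4"
    };
    Ok(Some(kind))
}

/// Builds a 2 KiB in-memory image with only the fields we inspect filled in.
fn fake_image(compat: u32, incompat: u32, ro_compat: u32) -> Cursor<Vec<u8>> {
    let mut image = vec![0u8; 2048];
    image[0x438..0x43A].copy_from_slice(&0xEF53u16.to_le_bytes());
    image[0x45C..0x460].copy_from_slice(&compat.to_le_bytes());
    image[0x460..0x464].copy_from_slice(&incompat.to_le_bytes());
    image[0x464..0x468].copy_from_slice(&ro_compat.to_le_bytes());
    Cursor::new(image)
}
```

Because std::fs::File implements Read and Seek too, the same function could be pointed at a real image file.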
And finally we’re able to implement our mount command:
use std::{env, fs::File};
use nix::mount::{mount, MsFlags};

fn main() {
    let filename = env::args().nth(1).unwrap();
    let mount_path = env::args().nth(2).unwrap();
    let mut fs = File::open(&filename).unwrap();
    let filesystem_type = match identify_fs(&mut fs) {
        Ok(Some(fs)) => fs,
        Ok(None) => {
            eprintln!("Unsupported filesystem");
            return;
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            return;
        }
    };

    mount::<_, _, _, str>(
        Some(filename.as_str()),
        mount_path.as_str(),
        Some(filesystem_type.as_str()),
        MsFlags::empty(),
        None,
    )
    .unwrap();
}
And try it out:
$ mkdir mnt
$ sudo ./target/debug/mount ./filesystem.ext4 ./mnt
$ ls mnt
lost+found
It works! We can mount ext filesystems now! We obviously don’t support all the flags that the full mount command supports, but we’ve got enough for our purposes.
We’ve got everything in place now to be able to mount a real filesystem from our initramfs, so next we’ll be doing that, and actually booting into an init system. Exciting!
I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!