We learnt previously that an initramfs in Linux is a “CPIO archive”, so I set out to write something that can read and write them in order to learn more about the format. What I found was an interesting discrepancy between how GNU cpio and the Linux kernel parse CPIO files. It seems pretty innocuous, but I thought I’d document it for posterity.

The CPIO header

CPIO is an exceedingly simple format. At its heart, CPIO archives are a list of entries where each entry comprises a header, a file name, and the file data. The header itself is basically a stat output:

+-----------------------+
|         magic         |
|          ino          |
|          mode         |
|          uid          |
|          gid          |
|         nlink         |
|         mtime         |
|          size         |
|       devmajor        |
|       devminor        |
|       rdevmajor       |
|       rdevminor       |
|       namesize        |
|       checksum        |
+-----------------------+
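
To make the layout concrete, here is a minimal Python sketch of a header parser. It assumes the “newc” variant of the format (magic 070701), which is the one the kernel documents for initramfs: every field after the magic is 8 ASCII hex digits, and an ino field sits between magic and mode. The names NEWC_MAGIC, FIELDS and parse_header are made up for this post, not taken from any library.

```python
# Sketch of a "newc" CPIO header parser (assumed layout: 6-byte magic
# followed by 13 fields of 8 ASCII hex digits each).
NEWC_MAGIC = b"070701"
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "size",
          "devmajor", "devminor", "rdevmajor", "rdevminor",
          "namesize", "checksum")
HEADER_LEN = len(NEWC_MAGIC) + 8 * len(FIELDS)  # 110 bytes


def parse_header(buf, offset=0):
    """Return the header fields found at `offset` as a dict of ints."""
    raw = buf[offset:offset + HEADER_LEN]
    if len(raw) < HEADER_LEN or raw[:6] != NEWC_MAGIC:
        raise ValueError("no cpio magic")
    # Each field is a fixed-width hexadecimal number stored as text.
    values = (int(raw[6 + 8 * i:14 + 8 * i], 16) for i in range(len(FIELDS)))
    return dict(zip(FIELDS, values))
```

The file name (namesize bytes, including a trailing NUL) follows immediately after these 110 bytes, and the file data follows the name.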

One of CPIO’s quirks is that it doesn’t create directories for you. If your archive contains a file such as ./bin/sh, it must contain an entry for ./bin earlier in the list (or the ./bin directory must already exist at the destination). This ensures that the directory exists by the time the file is written.
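
A writer therefore has to emit parent directories before the files inside them. One way to get that ordering is sketched below, assuming "/"-separated relative paths like the ones above. The helper name with_parents is hypothetical, and this only decides the order of entries; it does not write any archive bytes.

```python
def with_parents(paths):
    """Expand paths so every parent directory appears before its children."""
    seen = set()
    ordered = []
    for path in paths:
        parts = path.split("/")
        # Walk the prefixes "." -> "./bin" -> "./bin/sh", adding new ones.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            if prefix not in seen:
                seen.add(prefix)
                ordered.append(prefix)
    return ordered
```

For example, with_parents(["./bin/sh", "./init"]) gives [".", "./bin", "./bin/sh", "./init"], so the entry for ./bin is written before ./bin/sh.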

It’s in these directory entries that the discrepancy shows up. In particular: what should the size field (which is used to convey the length of the data in the file) be for directories? In my first implementation, I used the number from the stat system call:

$ stat .
  Size: 182             Blocks: 8          IO Block: 4096   directory
...

Note that stat says the directory is 182 bytes. However, I did think ahead and only wrote data for regular files. This results in a CPIO file containing a directory entry with a nonzero size but no data following it. One might think that this is invalid: the parser should try to read 182 bytes, which actually belong to the header of the following entry, and then balk. That isn’t what happens in GNU cpio, though. In fact, it parses the archive just fine:

$ cpio -i --verbose < bad.cpio
.
init
bin
lib64
2 blocks

If you try to boot from it, though, you get an error as expected:

$ qemu-system-x86_64 -kernel /boot/vmlinuz-6.6.8-200.fc39.x86_64 -initrd ./bad.cpio
...
[    0.567173] Initramfs unpacking failed: no cpio magic
...
[    1.043349] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

It appears that GNU cpio simply ignores the size field of directories, while the kernel honours it. Which one is more correct? Who’s to say. If I could track down a formal definition of the CPIO format, I would love to know whether this is undefined behaviour. If you know anything, please share!
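
The two behaviours can be contrasted with a toy reader. This is a sketch under the same “newc” assumptions (110-byte header of hex fields, name and data each padded to a multiple of 4 bytes, a final entry named TRAILER!!!), not the actual code of GNU cpio or the kernel. The function list_names and its trust_dir_size flag are invented for illustration.

```python
import stat

HEADER_LEN = 110  # 6-byte magic + 13 hex fields of 8 digits


def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def hex_field(archive, pos, index):
    """Read the index-th 8-digit hex field of the header starting at pos."""
    start = pos + 6 + 8 * index
    return int(archive[start:start + 8], 16)


def list_names(archive, trust_dir_size):
    """List entry names, optionally ignoring the size field of directories."""
    names, pos = [], 0
    while pos < len(archive):
        if archive[pos:pos + 6] != b"070701":
            raise ValueError(f"no cpio magic at offset {pos}")
        mode = hex_field(archive, pos, 1)
        size = hex_field(archive, pos, 6)
        namesize = hex_field(archive, pos, 11)
        name_start = pos + HEADER_LEN
        name = archive[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            break
        names.append(name)
        if stat.S_ISDIR(mode) and not trust_dir_size:
            size = 0  # lenient: a directory has no data to skip
        # Skip the padded header+name, then the padded data.
        pos = align4(pos + HEADER_LEN + namesize) + align4(size)
    return names
```

With trust_dir_size=False, an archive whose directory entry claims 182 bytes of data is listed without complaint. With trust_dir_size=True, the reader skips 182 bytes that were never written, lands in the middle of a later header, and fails on the magic check.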

Here’s the bad CPIO file if you’re interested: bad.cpio. As an aside: CPIO files can contain absolute paths. It’s entirely possible to extract a CPIO archive that overwrites /bin/sh with something malicious, and you wouldn’t know. Have fun!
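
If you would rather not have that kind of fun, a reader can inspect entry names before extracting anything. The check below is a minimal sketch: is_unsafe_name is a hypothetical helper, and flagging ".." components goes beyond the absolute-path case described above, on the assumption that they are just as unwelcome.

```python
def is_unsafe_name(name):
    """Flag entry names that could escape the extraction directory."""
    if name.startswith("/"):
        return True  # absolute path, e.g. /bin/sh
    # Assumption: any ".." component is treated as an escape attempt.
    return ".." in name.split("/")
```

An extractor could refuse, or at least warn, whenever this returns True for an entry.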

I'm on Bluesky: @colindou.ch. Come yell at me!