We learnt previously that an initramfs in Linux is a “CPIO archive”, so I set out to write something that can read and write them, in order to learn more about the format. What I found was an interesting discrepancy between how GNU cpio and the Linux kernel parse CPIO files. It seems pretty innocuous, but I thought I’d document it for posterity.
The CPIO header
CPIO is an exceedingly simple format. At its heart, a CPIO archive is a list of entries, each comprising a header, a file name, and the file data. The header itself is essentially the output of stat:
+-----------------------+
| magic                 |
| ino                   |
| mode                  |
| uid                   |
| gid                   |
| nlink                 |
| mtime                 |
| size                  |
| devmajor              |
| devminor              |
| rdevmajor             |
| rdevminor             |
| namesize              |
| checksum              |
+-----------------------+
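As a sketch of how those fields sit on disk, assuming the "newc" variant (magic 070701) that the kernel's initramfs unpacker expects: every field after the 6-byte magic is 8 ASCII hex digits, which makes the header 110 bytes. The names FIELDS, pack_header and unpack_header are my own, for illustration only, and checksums are not computed.

```python
# Field order of the newc header. Every field after the 6-byte magic is
# 8 ASCII hex digits, so the header is 6 + 13 * 8 = 110 bytes long.
NEWC_MAGIC = b"070701"
FIELDS = (
    "ino", "mode", "uid", "gid", "nlink", "mtime", "size",
    "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "checksum",
)
HEADER_LEN = len(NEWC_MAGIC) + 8 * len(FIELDS)


def pack_header(**values: int) -> bytes:
    """Encode a newc header; any field not given defaults to 0."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = b"".join(b"%08X" % values.get(name, 0) for name in FIELDS)
    return NEWC_MAGIC + body


def unpack_header(data: bytes) -> dict[str, int]:
    """Decode the first HEADER_LEN bytes of data as a newc header."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated header")
    if data[:6] != NEWC_MAGIC:
        raise ValueError("no cpio magic")
    return {
        name: int(data[6 + 8 * i : 14 + 8 * i], 16)
        for i, name in enumerate(FIELDS)
    }
```

For example, pack_header(mode=0o40755, nlink=2, namesize=6) yields a 110-byte header whose mode field decodes back to 0o40755.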
One of CPIO’s quirks is that it doesn’t create directories for you: if your archive contains a file such as ./bin/sh, it must contain an entry for ./bin before it in the list (or the ./bin directory must already exist). This ensures that the directory exists when the file is written into it.
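The ordering rule is easy to check mechanically. Here is a minimal sketch of a hypothetical find_orphans helper that walks the entry names in archive order and reports any entry whose parent directory hasn't appeared yet. It only looks at names, so it assumes they are spelled consistently (./bin versus bin), and it can't know whether a missing parent already exists on disk; treat a hit as a warning rather than an error.

```python
def find_orphans(names: list[str]) -> list[tuple[str, str]]:
    """Return (entry, parent) pairs for entries whose parent directory
    does not appear earlier in the archive's entry list."""
    seen: set[str] = set()
    orphans: list[tuple[str, str]] = []
    for name in names:
        name = name.rstrip("/")
        parent = name.rpartition("/")[0]
        # Top-level entries ("init", "./init", "/init") need no parent entry.
        if parent not in ("", ".") and parent not in seen:
            orphans.append((name, parent))
        seen.add(name)
    return orphans
```

For instance, find_orphans([".", "./bin/sh", "./bin"]) flags ("./bin/sh", "./bin"), because ./bin is listed only after the file that lives in it.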
It’s in these directory entries that the discrepancy shows up. In particular: what should the size field (which conveys the length of the entry’s file data) be for a directory? In my first implementation, I used the number from the stat system call:
$ stat .
Size: 182 Blocks: 8 IO Block: 4096 directory
...
Note that stat says the directory is 182 bytes. However, I did think ahead and only wrote data for regular files. The result is a CPIO file containing a directory that claims a non-zero length but has no data. One might think that this is invalid: the parser should try to read 182 bytes, which actually belong to the following entry, and then balk. That isn’t what happens with GNU cpio, though; in fact, it parses the archive just fine:
$ cpio -i --verbose < bad.cpio
.
init
bin
lib64
2 blocks
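One way to avoid writing such an archive in the first place is to derive the size field from the bytes actually written rather than from stat. Below is a minimal sketch of that idea, again assuming the newc format; newc_entry and pad4 are hypothetical names, uid, gid and mtime are simply zeroed, and the archive is assumed to start at a 4-byte-aligned offset. Header plus name, and then the data, are each padded to a multiple of 4 bytes, which is the alignment newc uses. A complete archive would also end with an entry named TRAILER!!!.

```python
def pad4(n: int) -> int:
    """Round n up to the next multiple of 4 (newc alignment)."""
    return (n + 3) & ~3


def newc_entry(name: str, mode: int, data: bytes = b"") -> bytes:
    """Build one newc entry. The size field comes from the data actually
    written, so a directory (no data) always gets size 0."""
    name_bytes = name.encode() + b"\0"  # namesize counts the trailing NUL
    # ino, mode, uid, gid, nlink, mtime, size,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, checksum
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (pad4(len(out)) - len(out))  # header + name padded to 4
    out += data
    out += b"\0" * (pad4(len(data)) - len(data))  # data padded to 4
    return out
```

With this, newc_entry("bin", 0o40755) is a 116-byte entry with a size field of 0, while newc_entry("init", 0o100755, b"#!/bin/sh\n") is 128 bytes: 110 + 5 for header and name, padded to 116, plus 10 bytes of data padded to 12.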
If you try to boot from bad.cpio, though, you get an error, as expected:
$ qemu-system-x86_64 -kernel /boot/vmlinuz-6.6.8-200.fc39.x86_64 -initrd ./bad.cpio
...
[ 0.567173] Initramfs unpacking failed: no cpio magic
...
[ 1.043349] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
It appears that GNU cpio simply ignores the size field of directories, whereas the kernel evidently trusts it, skips that many bytes, and lands somewhere that doesn’t start with the CPIO magic. Which one is more correct? Who’s to say? If I could track down a formal definition of the CPIO format, I would love to know whether this is undefined behaviour. If you know anything, please share!
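To make the two behaviours concrete, here is a minimal sketch of a reader with a switch between them, assuming the newc format. With ignore_dir_size=False it trusts the size field of every entry, which matches the kernel's "no cpio magic" failure; with ignore_dir_size=True it reads no data for directories, which is my guess at GNU cpio's behaviour based on the output above, not a reading of its source. list_names and field are hypothetical names, and the sketch skips checksum verification.

```python
import stat

HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits


def pad4(n: int) -> int:
    """Round n up to the next multiple of 4 (newc alignment)."""
    return (n + 3) & ~3


def field(header: bytes, index: int) -> int:
    """Decode the index-th 8-hex-digit field after the magic (ino is 0)."""
    start = 6 + 8 * index
    return int(header[start : start + 8], 16)


def list_names(archive: bytes, ignore_dir_size: bool) -> list[str]:
    """Return entry names in order, stopping at the TRAILER!!! entry."""
    names: list[str] = []
    pos = 0
    while True:
        header = archive[pos : pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:6] not in (b"070701", b"070702"):
            raise ValueError(f"no cpio magic at offset {pos}")
        mode, size, namesize = field(header, 1), field(header, 6), field(header, 11)
        start = pos + HEADER_LEN
        name = archive[start : start + namesize - 1].decode()  # drop trailing NUL
        if name == "TRAILER!!!":
            return names
        names.append(name)
        if ignore_dir_size and stat.S_ISDIR(mode):
            size = 0  # GNU-cpio-like: assume directories carry no data
        pos = pad4(start + namesize)  # skip header + name, padded to 4
        pos = pad4(pos + size)  # skip file data, padded to 4
```

On an archive whose directory entry claims 182 bytes but carries none, the ignore_dir_size=True walk lists every entry, while the ignore_dir_size=False walk jumps past the next entry's start and raises.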
Here’s the bad CPIO file if you’re interested: bad.cpio. As an aside: CPIO files can contain absolute paths. It’s entirely possible to extract a CPIO archive that overwrites /bin/sh with something malicious, and you wouldn’t know. Have fun!
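If you do write an extractor, a cheap defence is to refuse entry names that are absolute or that climb out of the destination with "..". GNU cpio offers a --no-absolute-filenames option for the first case. The safe_target helper below is a hypothetical sketch; note that it does not consider symlinks, so an archive could still plant a symlink and then write through it.

```python
import posixpath


def safe_target(root: str, name: str) -> str:
    """Map an archive entry name to a path under root.

    Rejects absolute names and names that escape root via "..".
    Symlinks are NOT considered: a previously extracted symlink could
    still redirect a later write elsewhere.
    """
    normalized = posixpath.normpath(name)
    if posixpath.isabs(normalized):
        raise ValueError(f"absolute path in archive: {name!r}")
    if normalized == ".." or normalized.startswith("../"):
        raise ValueError(f"path escapes {root!r}: {name!r}")
    return posixpath.join(root, normalized)
```

Here safe_target("/tmp/out", "./bin/sh") returns "/tmp/out/bin/sh", while "/bin/sh" and "bin/../../x" are both rejected.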