Let's build an OS: A basic shell

Let’s go on an adventure. I’ve learnt a lot more Rust over the last year, and I want to get back into writing properly, so my plan is to write a Linux Operating System. While writing it, I’ll be taking notes in my repo - https://github.com/sinkingpoint/qos/tree/main/notes . And every now and then formalising them into more structured blog posts over here, once I’ve learnt enough to make something interesting.

I love a good shell. There’s really nothing better than the raw control of your computer that a shell unlocks. Running programs that do things is all well and good, but man, being able to get a good REPL going? Nothing better.

But how do they work? It feels like they should be simple - read user input, pass to exec, loop. Is that all there is to it? I dunno. But in this entry, I’m gonna find out.

Our north star this time is to get something that I can type commands like ls -l into, and see the output in my terminal. Vaguely, I see that coming in three parts:

A “buffer” (for lack of a better word), that takes the raw key strokes from the user and saves them to build up a string.
A parser, that takes raw strings and chunks them up into something we can use (in order to handle things like strings, and unicode escapes)
An executor, that takes a parser output and execs it

Lets see what that looks like.

A Buffer

We can start with a really basic Rust program, just as something to boot into:

fn main() {
    loop {}
}

This doesn’t actually do anything except prevent the kernel from panicing immediately because the init process died. Using the tool for creating initramfs’s that I made after our last excursion into the wonders of the CPIO format, we can build that into an initramfs and boot it. And we see this:

[    1.379771] Run /init as init process
█

So far so good; no panics yet. We can even type things!

[    1.379771] Run /init as init process
abcdefg█

And backspace them:

[    1.379771] Run /init as init process
abcd█

This shell stuff is pretty easy (Maybe I can raise $23M)! We can even move our cursor arou… Oh.

[    1.379771] Run /init as init process
abcdefg^[[D█

What’s that ^[[D? I pressed my left arrow key, and that doesn’t look like a left arrow. The ‘^’ in front kind of looks like when I press Ctrl-C (^C). Maybe these are “Escape Characters”?

🐇 Rabbithole: ANSI Escape Codes

To learn what that is, we’re going to have to learn a teeny-tiny amount of history about ✨the terminal✨. Here’s a bit of a crash course:

You see, way back in old timey land, we didn’t have the super-fast computers that can comfortably fit on your lap that we have these days. We had not-as-fast computers that could comfortably fit in your living room (if you had a big living room). Being both big, and ridiculously expensive, it was pretty impractical to give a computer to everyone who needed one like we do today, so everyone had to share. Sharing started out as people literally taking turns, but everyone knows how addicted people get to their computers, so having a single user at a time was kind of a drag. That all changed with two creations: multi-user operating systems, and networking. Multi-user operating systems allow us to have multiple programs running at the same time, but that’s not particularly useful when we still have the one seat. Then we starting plugging computers into networks (this has made a lot of people very angry and been widely regarded as a bad move). With that, if we had something that could take input from the user, and output from the computer over the network, then we can have a bunch of people working on the same computer at the same time! Enter: 🌈the terminal✨.

Terminals aren’t the complicated personal computers we all know and love. They started out as typewriters, physically printing the computers output to a sheet of paper. Because we hooked them up to computers over the telephone network, they became “tele-typewriters”, or “teletypes”. Typewriters are pretty limited. For the most part, look at your keyboard today, and that’s it. We need some standard way of translating those key strokes into magical network ether though, so along came ASCII. ASCII went along and assigned 127 different numbers to 88 printable characters and 37 non printable “control” characters, allowing a computer, and its associated software, to completely control the typewriter, remotely!

And then we invented screens, and all hell broke loose. You see the problem with typewriters, and physically printing things to paper is that it’s quite final. What if I write a command, and then want to add another argument in the middle? Time to start a new command on a new line. With screens, we’re not limited by the bounds of finality of our physical world. We’re free to go back and edit the middle of commands as much as we want! At least we would be, if we could teach computers what “go back” meant. We could publish some new ASCII, adding new codes, but that’s a losing battle - having to update everything to speak the new ASCII of screens would be a nightmare. What we really want is something that can interoperate with ASCII.

And that brings us all the way around to our ^[[D. We needed some way for a computer to indicate to a terminal “Move the cursor back a space”, without introducing any more characters to ASCII. ASCII already introduced the idea of “Escape Sequences” (characters like ‘^C’, Ctrl-C), so we simply extended that idea. In ^[[D the ^[ at the front actually indicates ASCII 0x1B, or the ESC(ape) character (Thus, “Escape Sequence”). “[” is called the Control Sequence Introducer (CSI), and “D” is the “function defining character”. All formed, they make up the “Cursor Left” Control Sequence. Hey! That’s what I pressed!

How do I know that that makes up “Cursor Left”? Well, I read the spec! ANSI X3.64 is only one spec (ISO, the International Standard Organisation also produced a Control Characters spec), but the ANSI one is much more readable in my opinion. Shall we look at some interesting nuggets?

Apparently VIM contributed to it?

Looking at the contributers, we see contributions from “VIM”.

VIM: S. W. White M. R. Speers (alternate) J. A. Baker (alternate)

This seems highly unlikely considering this is a specification from 1979, more than a decade before VIM was born out of VI. I assume that this was instead a company, but I can’t seem to find anything about them or any of the listed contributors. Who were you S.W White?!

Cursor Movement

Commands to move the cursor around

We have commands for moving the cursor around - Cursor Forward and Cursor Backward in particular seem like they’ll be useful for backspacing logic in the future.

Reset Initial State

RIS: Reset Initial State

Apparently there’s a control sequence for “resetting” the terminal, which I would guess is what we’ll have to do for a reset command implementation. I always assumed that reset was some combination of tcsetattr and println.

Terminal Settings

So it turns out, by default, terminals give you a pretty ok typing environment. So long as you type like you’re on a typewriter - no movement through a buffer (except backspace) for example. But in my shell, I want to be able to move backwards and forwards through a buffer. I’m a terrible typist. So our first task here: How do we stop those ^[[Ds from appearing? Let’s talk about terminal settings.

The Linux terminal driver has a number of settings that can control how the output is displayed on the terminal. We can see an entire list with man stty (or we can google it like a normal person).

🐇 Rabbithole: Terminal Settings

Running stty shows us the current terminal settings, and I get:

$ stty
speed 38400 baud; line = 0;
-brkint -imaxbel iutf8

Let’s break those down:

speed 38400 baud

Because our pseudo-terminal has to act like a real, network connected terminal, it has to set a baud rate, that is, the bit rate between our shell and our terminal. This says that our bit rate is 38400 bits per second, plenty enough! For fun, we can set it down to a lower setting (110 bps in this case):

$ stty ospeed 110 ispeed 110                

$ stty                      
speed 110 baud; line = 0;
-brkint -imaxbel iutf8

This is still quite a bit - ~14 8 bit characters per second, but it does cause a noticible stutter when keyboard maching. Neat!

line = 0

This sets the line dicipline to the default, 0. The line dicipline of the terminal is what processes the data “on the line”, after its been received, but before it gets passed to the terminal itself. We can view the supported line diciplines with man 8 ldattach, which indicates that line dicipline 0, is “TTY” - that sounds like what we want!

       TTY(0)
           The default line discipline, providing transparent operation (raw mode) as well as the habitual terminal line editing capabilities (cooked mode).

Line dicipline 3 intrigued me though:

       PPP(3)
           Point to Point Protocol (PPP) processor for transmitting network packets over serial lines.

Maybe that’ll be a side project one of these days.

-brkint

The “-” here indicates a negation from the default, turning off brkint. stty says this makes “breaks cause an interrupt signal”. But what’s a break? A line break? Nope. This goes back to our low level terminal business. Consider our network connected terminal, and its copper network cable - there’s a signal that flucuates between “high”, 5 volts (for example), and “low”, 0 volts. A terminal break is when that line goes low for a certain period of time. We can simluate one with the tcsendbreak command:

tcsendbreak()  transmits  a  continuous stream of zero-valued bits for a specific duration, if the terminal is using asynchronous serial data transmission.  If duration is zero, it transmits zero-valued bits for at least 0.25
       seconds, and not more than 0.5 seconds.  If duration is not zero, it sends zero-valued bits for some implementation-defined length of time.

So -brkint is telling our terminal to not sent interrupts when it receives a terminal break - probably wise, considering we will presuambly never have a real one in our terminal environment.

-imaxbel

Similarly, -imaxbel turns off the imaxbel setting. This is again rather cryptic - “beep and do not flush a full input buffer on a character”. The FreeBSD docs are a lot more helpful:

The system imposes a	limit  of  MAX_INPUT  (currently  255)
    characters  in  the input queue.  If	imaxbel	is set and the
    input queue limit has been reached, subsequent input	causes
    the system to send an ASCII BEL  character  to  the	output
    queue  (the	terminal beeps at you).	 Otherwise, if imaxbel
    is unset and	the input queue	is full, the next input	 char-
    acter  causes the entire input and output queues to be dis-
    carded.

So imaxbel makes the terminal beep, and automatically flush the buffer when we hit the MAX_INPUT. In our world of modern terminals with large input buffers, this also makes sense to turn off.

iutf8

This turns on UTF-8 support, which allows me to have things like emojis in my terminal:

$ 😼😼😼😼
zsh: 😼😼😼😼: command not found...

But also allows support for non ASCII characters, like other languages, which is probably useful.

If we search for “control characters”, we find the setting echoctl:

       * [-]echoctl
              echo control characters in hat notation ('^c')

Let’s update our code to disable that. We can use tcgetattr to get the current terminal settings, zero out the echoctl bit with a dose of bitwise magic, and then set the settings with tcsetattr.

fn main() {
    let reader = io::stdin();
    let mut attrs = match tcgetattr(&reader) {
        Ok(attrs) => attrs,
        Err(e) => {
            println!("Error getting terminal attributes: {}", e);
            return;
        }
    };

    attrs.local_flags &= !(LocalFlags::ECHOCTL);

    if let Err(e) = tcsetattr(&reader, SetArg::TCSANOW, &attrs) {
        println!("Error setting terminal attributes: {}", e);
        return;
    }

    loop {}
}

When that boots, it looks like we can arrow key around now!

[    1.371934] Run /init as init process
cats█recool

Backspacing becomes a bit janky though:

[    1.371934] Run /init as init process
catsa█ cool

As we’re typing characters, our terminal is dutifully outputting them, but there’s no memory allowing it to do cool things like shift characters back as we backspace. It’s becoming pretty clear that we’ll need to build some memory into this and handle the display of characters ourselves. So let’s disable echoing completely, and then read and output characters ourselves. That’ll allow us to keep track of what’s being typed so we can read it later.

fn main() {
    ...
    // Disable echoing entirely.
    attrs.local_flags &= !(LocalFlags::ECHO);

    ...

    // Read a character, and then output it.
    loop {
        let mut chars: [u8; 1] = [0; 1];
        reader.read_exact(&mut chars).unwrap();
        print!("{}", chars[0] as char);
    }
}

Hrmm. For some reason, we now only get output when we press the Enter key. Maybe we’re hitting line buffering in our print!? The default stdout hook does include a LineWriter, so let’s get a raw handle to stdout that doesn’t have any of that nonsense.

fn main() {
    ...
    let mut stdout = unsafe { File::from_raw_fd(STDOUT_FILENO) };
    loop {
        let mut chars: [u8; 1] = [0; 1];
        reader.read_exact(&mut chars).unwrap();
        write!(stdout, "{}", chars[0] as char);
    }
}

Still nothing though - we’re only getting output when we press the enter key. Maybe another terminal setting will help us? Back to the docs, we find “Canonical Mode”:

Canonical and noncanonical mode
The setting of the ICANON canon flag in c_lflag determines whether the terminal is operating in canonical mode (ICANON set) or noncanonical mode (ICANON unset). By default, ICANON is set.
In canonical mode:
Input is made available line by line. An input line is available when one of the line delimiters is typed (NL, EOL, EOL2; or EOF at the start of line). Except in the case of EOF, the line delimiter is included in the buffer returned by read(2).

So it turns out that this “Canonical Mode” was the source of all our nice text editing abilities, but it also precludes us from doing anything more complicated. Let’s disable it and see if we can get that unbuffered input:

fn main() {
    ...
    attrs.local_flags &= !(LocalFlags::ECHO | LocalFlags::ICANON);
    ...

    let mut stdout = unsafe { File::from_raw_fd(1) };
    // Read a character, and then output it.
    loop {
        let mut chars: [u8; 1] = [0; 1];
        reader.read_exact(&mut chars).unwrap();
        write!(stdout, "{}", chars[0] as char);
    }
}

And that seems like it works! We can read a character at a time, and then write that character out to the terminal.

Backspacing

Now that we can read individual characters, we’ll need to handle things like Backspace, which will delete things underneath the cursor, and move everything to the right of the cursor back one. To start with, we’re going to need some sort of memory so that we know exactly what characters we have to move back, so let’s create a struct:

struct Buffer<T: Write> {
    /// The currently buffered input.
    buffer: String,

    /// The position of the cursor in the buffer.
    position: usize,

    writer: T
}

impl<T: Write> Buffer<T> {
    fn new(writer: T) -> Buffer {
        return Buffer {
            buffer: String::new(),
            position: 0,
            writer,
        }
    }

    /// Add a character to the buffer at the current position.
    fn push_char(&mut self, c: char) {
        if self.position == self.buffer.len() {
            self.buffer.push(c);
        } else {
            self.buffer.insert(self.position, c);
        }

        self.position += 1;
    }
}

And then update our loop to use it:

let mut stdout = unsafe { File::from_raw_fd(1) };
let mut buffer = Buffer::new(stdout);
loop {
    let mut chars: [u8; 1] = [0; 1];
    reader.read_exact(&mut chars).unwrap();
    if c == '\u{7f}' { // '\u{7f}' is the DEL character in ASCII.
        buffer.backspace();
    } else {
        buffer.push_char(chars[0]);
    }
}

Before we implement that backspace function, let’s lay out the steps of what exactly backspace does. If we start with ab█de as an example, we have four steps:

Move the cursor back one space (i.e. a█cde)
Delete everything after the cursor (i.e. a█)
Redraw everything after the cursor (i.e. acde█)
Because the redraw of characters will move the cursor forward, move the cursor back to the starting position (i.e. a█de)

Let’s make that concrete:

fn backspace(&mut self) {
    // Don't backspace if the cursor is at the beginning of the buffer.
    if self.position == 0 {
        return;
    }
    
    // Remove the character from the buffer.
    if self.position == self.buffer.len() {
        self.buffer.pop();
    } else {
        self.buffer.remove(self.position - 1);
    }

    // Move the cursor back one space, using the "Cursor Back" ANSI Escape, ^[[D.
    write!(self.writer, "\x1b[D").expect("Failed to write to stdout");
    self.position -= 1;
    
    // Clear from the cursor to the end of the line, with the "Erase In Line" ANSI Escape, ^[[K
    let start = if self.position == 0 { 0 } else { self.position - 1 };
    write!(self.writer, "\x1b[K{}", &self.buffer[start..]).expect("Failed to write to stdout");
    
    // After rewriting a line, we are at the end of it. If we were in the middle of the string, we need to move the cursor back.
    if self.buffer.len() > self.position {
        write!(self.writer, "\x1b[{}D", self.buffer.len() - self.position).expect("Failed to write to stdout");
    }
}

That seems to work! We can backspace now!

What about arrow keys? Let’s dive back into ANSI escape codes for a second.

Parsing ANSI Escape Codes

When handling backspace, we were able to output our own ANSI escape codes. In order to handle arrow keys, we’ll need to be able to read them.Luckily, our arrow keys are pretty simple. They’re three characters, an ESC (0x1b), followed by the “Control Sequence Indicator”, ‘[’, followed by a letter. In our case, we only care about ‘C’ (move backwards), and ‘D’ (move forwards), for the left and right arrow respectively. Let’s update our loop again:

loop {
    let mut buffer = Buffer::new();
    let mut chars: [u8; 1] = [0; 1];
    reader.read_exact(&mut chars).unwrap();
    if c == '\u{7f}' { // '\u{7f}' is the DEL character in ASCII.
        buffer.backspace();
    } else if c == '\u{1b}'{ // \u{1b} is ESC in ASCII.
        let escape[u8; 2] = [0; 2];
        reader.read_exact(&mut escape).unwrap(); // Read the [, <char>.
        buffer.handle_control_character(escape[1]);
    } else {
        buffer.push_char(chars[0]);
    }
}

And then define our handler on our buffer:

fn handle_control_character(&mut self, c char) {
    match c {
        'C' => self.move_cursor(1)
        'D' => self.move_cursor(-1),
        _ => panic!("UNSUPPORTED")
    }
}

// Move the cursor left or right by the given number of spaces.
fn move_cursor(&mut self, amt: isize) {
    // Calculate the new position, clamping it to the bounds of the buffer.
    let mut new_position = (self.position as isize + amt).clamp(0, self.buffer.len() as isize) as usize;

    // Output the required control characters to move the cursor.
    match new_position.cmp(&self.position) {
        Ordering::Greater => write!(self.writer, "\x1b{}C", new_position - self.position).expect("Failed to write to stdout"),
        Ordering::Less => write!(self.writer, "\x1b[{}D", self.position - new_position).expect("Failed to write to stdout"),
        Ordering::Equal => (),
    }
}

That seems to work as expected. It’s only scratching the surface of how complicated these codes can get though. In move_cursor we even use one of the more complicated parts of escape sequences - they can have arguments! When we’re moving the cursor, outputting 5 of the same escape sequence to move 5 times is laborious. Instead, we emit ^[[5D - Cursor back 5 spaces. This saves a decent amount of complexity, although our handler doesn’t support that yet. Something to work on!

But that’s about it for an initial Buffer. We can type things, arrow around, and delete things. All the basics for a terminal line editor. Now how can we actually split it up into arguments?

A Parser

Now that we can read raw input, we need to be able to chunk it up and understand it as a command. The simplest parser we could do would be to just split the input on whitespace, but that’s not particularly interesting. What I want is the ability to have strings - things in quotes, like ‘foo’, or ‘i haven\’t written a parser in a decade’. For that we’re gonna need something a bit more heavy duty.

The cheating way to do this would be to define a Parsing Expression Grammar - a definition of exactly what a string looks like, and then use a library like pest in order to generate a parser from that definition. But where we’re going, we don’t use things like libraries, so let’s make our own. We can start out with a trait. This trait is going to represent something that can consume an number of characters from a slice, and return a “Token”, where a Token contains some metadata about the characters that were consumed. Something like this:

#[derive(Debug, PartialEq)]
pub struct Token<T> {
    pub literal: String, // The literal characters that this token derives from.
    pub start: usize,    // The absolute start position (in chars) of this token.
    pub length: usize,   // The length in characters of this token. 
    pub token: T,        // The underlying type of token.
}

/// A Result type returned by a Consumer.
/// Ok(Some()) if there was a Token that was sucessfully parsed
/// Ok(None) if there was no valid token found
/// Err() if the token _looked_ like a match, but wasn't, e.g. unterminated strings.
pub type ParserResult<T> = Result<Option<Token<T>>, String>;

pub trait Consumer {
    /// Attempt to consume a number of characters from `input`, starting at `start`.
    fn try_consume(input: &[char], start: usize) -> ParserResult<Self> where Self: Sized;
}

We can then define some implementations of that trait. For example, we can define a quoted string as any set of characters surrounded by quotes.

// Consumes a string surrounded by the given quotes, with escapes. e.g. "hello world", 'hello world', etc.
#[derive(Debug)]
pub struct QuotedString<const QUOTE:char>;

impl<const QUOTE: char> Consumer for QuotedString<QUOTE> {
    fn try_consume(input: &[char], start: usize) -> ParserResult<Self> {
        if input.len() - start < 2 || input[start] != QUOTE {
            return Ok(None);
        }

        let mut literal = String::from(QUOTE);
        let mut length = 1;

        while start + length < input.len() && input[start + length] != QUOTE {
            literal.push(input[start]);
            length += 1;
        }

        if input.len() - (start + length) > 0 && input[start + length] == QUOTE {
            literal.push(QUOTE);
            length += 1;
        } else {
            return Err(format!("Expected closing quote: {}", QUOTE), start);
        }

        Ok(Some(Token {
            literal,
            start,
            length,
            token: QuotedString
        }))
    }
}

Const generics allow us to define this generically across quotes. That allows us to handle all the different quotes we want, using the same struct. For example, we can define a single quoted string as QuotedString<'\''>, or a double quoted string as QuotedString<'"'>.

We can define an unquoted string, in a similar way:

// Consumes a string surrounded by the given quotes, with escapes. e.g. "hello world", 'hello world', etc.
#[derive(Debug)]
pub struct UnquotedString;

impl Consumer for UnquotedString<QUOTE> {
    fn try_consume(input: &[char], start: usize) -> ParserResult<Self> {
        let mut literal = String::new();
        let mut length = 0;

        while start + length < input.len() && input[start + length].is_whitespace() {
            literal.push(input[start]);
            length += 1;
        }

        Ok(Some(Token {
            literal,
            start,
            length,
            token: UnquotedString
        }))
    }
}

We can then build up an enum, representing any of the strings that we might want to collect:

// Consumes a single quoted string, with escapes. e.g. 'hello world', 'foo\\', etc.
pub type SingleQuotedString = QuotedString<'\''>;

// Consumes a double quoted string, with escapes. e.g. "hello world", "foo\\", etc.
pub type DoubleQuotedString = QuotedString<'"'>;

// Consumes a string that is either quoted or unquoted.
#[derive(Debug, PartialEq)]
pub enum QuotedOrUnquotedString {
    SingleQuoted,
    DoubleQuoted,
    Unquoted
}

impl Consumer for QuotedOrUnquotedString {
    fn try_consume(input: &[char], start: usize) -> ParserResult<Self> {
        if let Some(token) = SingleQuotedString::try_consume(input, start)?  {
            return Ok(Some(Token {
                literal: token.literal,
                start,
                length: token.length,
                token: QuotedOrUnquotedString::SingleQuoted
            }));
        } else if let Some(token) = DoubleQuotedString::try_consume(input, start)? {
            return Ok(Some(Token {
                literal: token.literal,
                start,
                length: token.length,
                token: QuotedOrUnquotedString::DoubleQuoted
            }));
        } else if let Some(token) = UnquotedString::try_consume(input, start)? {
            return Ok(Some(Token {
                literal: token.literal,
                start,
                length: token.length,
                token: QuotedOrUnquotedString::Unquoted
            }));
        }

        Ok(None)
    }
}

And then finally, an Expression that is a collection of QuotedorUnquotedStrings, seperated by some amount of whitespace.

// Consumes a string that is made up of component strings. e.g. "/bin/sh -c 'echo hello world'" would be parsed into 3 parts: "/bin/sh", "-c", and "'echo hello world'".
#[derive(Debug, PartialEq)]
pub struct Expression {
    pub parts: Vec<Token<QuotedOrUnquotedString>>
}

impl Consumer for Expression {
    /// Try and consumer an Expression, which is a number of strings, seperated by arbitrary whitespace.
    fn try_consume(input: &[char], start: usize) -> ParserResult<Self> {
        let mut literal = String::new();
        let mut parts = Vec::new();
        let mut length = 0;

        while start + length < input.len() {
            if let Some(c) = Whitespace::try_consume(input, start + length)? {
                // Try consume any amount of whitespace between tokens.
                literal.push_str(&c.literal);
                length += c.length;
            } else if let Some(token) = QuotedOrUnquotedString::try_consume(input, start + length)? {
                // Try consume a string token.
                literal += &token.literal;
                length += token.length;
                parts.push(token);
            } else {
                break;
            }
        }

        if parts.is_empty() {
            return Ok(None);
        }

        Ok(Some(Token {
            literal,
            start,
            length,
            token: Expression {
                parts
            }
        }))
    }
}

That should be enough for a basic implementation! There’s definitely more room for improvement here, but it gives us some basic string support. In particular, we will want to extend UnquotedStrings and DoubleQuotedStrings to support variable interpolation, we will want to support escaping (e.g. \n, or \u08), and we’ll want to handle more complicated structures (e.g. for loops) in the future, but this is enough to get something running.

An Executor

The final component of our shell will be something that takes the output of our parser and actually runs a program. But what does actually running a program entail? We can turn to strace to tell us exactly what happens. Let’s try strace ls:

$ strace ls
execve("/usr/bin/ls", ["ls"], 0x7ffcb7692f88 /* 60 vars */) = 0
...

So the very first thing we do is call the execve system-call. According to the docs:

execve() executes the program referred to by pathname. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.

That sounds like what we want!

🐇 Rabbithole: execve Frontends

While the actual system-call that gets executed is execve, libc actually provides a number of “frontends” to execve; functions that do not exist as system-calls themselves, but instead perform some logic before eventually calling execve. We can see them all with man 3 exec. Each version has an l version (e.g. execl), and a v version (e.g. execv). I’m gonna lump them together because they’re functionally the same, the only material difference is that the l series takes a varargs set of arguments, whereas the the v version takes an array.

execl and execv

These are the simplest pair. They simply take an executable, and a set of arguments, and forward them into execve, leaving the environment variables blank.

execle

execle is the same as execl, but also lets you add in an array of environment variables. This provides a varargs interface over execve.

execlp and execvp

These look the same as execl and execv, except they do something special with the first argument. In particular, while the non-p versions of the functions expect a full path in the first argument, the p versions attempt to resolve the name to a full path by looking at the PATH environment variable, if they don’t already contain a /.

execvpe

Building on execvp, execvpe does the same path searching magic, but also lets you add in an array of environment variables, basically exactly execve, but with $PATH magic.

execlpe

Notably missing is execlpe. This is because the l variants take a varargs set, and you can’t have any arguments after the varargs, so we can’t get on the environment variables at the end.

execve is only one part of the puzzle though. The docs say “causes the program that is currently being run by the calling process to be replaced with a new program”. That’s a bit of a problem - I want my shell to keep running once the program we execute finishes; We can’t do that if it’s replaced!

The other part of the puzzle is fork. fork allows us to duplicate our running process - one to keep our shell running, and one to run our process. So our process is to take our parser output, evaluate it as seperate arguments, fork, and then execve in the child process. What does that look like?

Let’s define a Process struct that will encapsulate this logic:

pub struct Process {
    pub argv: Vec<String>,
}

And then the fork -> exec logic, where we call fork and then if we’re the child side of the fork call execve. Eventually we’ll want to do some accounting on the parent side to keep track of the running processes, but we can ignore that for now in favor of just running in the background.

impl Process {
    /// Start the process in a new child process.
    pub fn start(&mut self) -> nix::Result<()> {
        unsafe {
            match fork() {
                Ok(ForkResult::Parent { child }) => {},
                Ok(ForkResult::Child) => {
                    self.exec();
                },
                Err(e) => {
                    // Error
                }
            }
        }

        Ok(())
    }

    /// `exec` the process, replacing the current process with the new process.
    /// Because this function is always called in a child process, any persistent state set here will be lost.
    fn exec(&self) {
        // Convert the String args into CString's which are NULL-terminated. Our first dip into FFI code!
        let path = CString::new(self.argv[0].as_str()).unwrap();
        let args: Vec<CString> = self.argv.iter().map(|arg| CString::new(arg.as_str()).unwrap()).collect();

        if let Err(e) = execve(&path, &args) {
            std::process::exit(e as i32);
        }
    }

Interestingly, while the nix crate provides a good rust interface over libc for most functions, fork is still unsafe. This is because there’s no way go guarantee that we can memory safetly when we’re cloning the entire memory space.

Tying it all together

So we have our three components: Our Buffer, our Parser, and our Executor. Let’s tie them together into a REPL.

First, we set the terminal settings to disable Canonical mode, and echoing.

fn main() {
    let reader = std::io::stdin();

    let mut attrs = match tcgetattr(&reader) {
        Ok(attrs) => attrs,
        Err(e) => {
            return;
        }
    };

    // Disable "Canonical mode" and "Echo".
    // Canonical mode means that the terminal will buffer input until a newline is received, this allows us to read input one char at a time.
    // Echo means that the terminal will print input back to the user, this allows us to read input without the user seeing it.
    attrs.local_flags &= !(LocalFlags::ICANON | LocalFlags::ECHO);

    if let Err(e) = tcsetattr(&reader, SetArg::TCSANOW, &attrs) {
        return;
    }

Then we create our buffer from an unbuffered stdin input, and read a line:

    let stdout = unsafe { File::from_raw_fd(1) };
    let mut buffer = Buffer::new(reader, stdout);
    loop {
        let line = match buffer.read() {
            Ok(line) => line,
            Err(e) => {
                return;
            }
        };

Then we parse that line as an expression:

        let expression = match parser::try_parse::<Expression>(input) {
            Ok(Some(expr)) => expr,
            Ok(None) => return Ok(()),
            Err(e) => {
                println!("Error: {}", e);
                continue;
            }
        };

And finally, turn that expression into an array of arguments and execute it:

        let args = concrete_arguments(expression);
        let mut process = Process::new(args);
        process.start()?;
    }
}

And?

$ cargo run -p qsh     
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/qsh`
/bin/ls -l
total 15284
drwxr-xr-x. 1 colin colin       66 Dec 29 16:19 assemble-initramfs
-rw-r--r--. 1 colin colin    14633 Jan  3 08:24 Cargo.lock
-rw-------. 1 colin colin      401 Jan  1 12:22 Cargo.toml
drwxr-xr-x. 1 colin colin       26 Dec 29 16:15 cpio
drwxr-xr-x. 1 colin colin       72 Jan  1 14:46 escapes
-rw-r--r--. 1 colin colin 15620696 Jan  6 17:49 initramfs.cpio
-rw-r--r--. 1 colin colin      395 Dec 29 18:24 Makefile
drwxr-xr-x. 1 colin colin      184 Jan 13 18:46 notes
drwxr-xr-x. 1 colin colin       26 Dec 31 17:41 qsh
-rw-r--r--. 1 colin colin      727 Dec 26 10:29 README.md
drwxr-xr-x. 1 colin colin      116 Dec 31 18:29 target

It works! We’ve sucessfully run a command! There’s a lot to improve here. In no particular order:

Adding support for variable interpolation
$PATH handling, so we don’t have to give a fully qualified path to an executable
waiting for a child process to exit before continuing the new shell loop
Control structures like if statements and for loops

But hey, we’ve got the basic functionality down and that’s pretty neat. I’ll continue to work on the shell in the background as we need more functionality but now that we have something we can boot into, what’s next? The main thing we need to do in our initramfs is to mount a root file system, so let’s work towards that. We’ll need some control structures in our shell, and a mount command to run that calls the mount syscall. That sounds like a good next step!

I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!

A Buffer#

Apparently VIM contributed to it?#

Cursor Movement#

Reset Initial State#

Terminal Settings#

speed 38400 baud#

line = 0#

-brkint#

-imaxbel#

iutf8#

Backspacing#

Parsing ANSI Escape Codes#

A Parser#

An Executor#

execl and execv#

execle#

execlp and execvp#

execvpe#

execlpe#

Tying it all together#