Logging (i.e. exporting text data from your application) is one of the very first things that any budding programmer learns to do. Who among us hasn’t started learning a programming language with a:

10 PRINT "Hello World!"
20 END

That’s a log! Not a good log, but a log nonetheless. And that is why the importance of logging in a systems observability is so often overlooked - everyone can log, it’s hard to log well. Here I want to talk about logging, in particular structured logging and justify my opinion: If you’re not using structured logging, your logs are not actually useful.

What is structured logging?

Let’s consider a Go program that outputs a log:

import "log"

func main() {
    // Do something
    // ...
    // Oh no! An error!
    log.Println("Failed to plug the foo widget into the bar socket")
}

and say we deploy it across a bunch of EC2 instances. Let’s say one of them failed, well, which one is it? This is where structure for your logs comes in. At a basic level, you could simply prepend the instance name to your log:

name, _ := os.Hostname()
// Shhhh... ignoring errors here for the sake of simplicity
log.Printf("%s: Failed to plug the foo widget into the bar socket", name)

Our logs now have some structure! If we want to know on which of our instances the error occured, we can just look at the beginning of the log.

But you can imagine that we might have a bunch of different pieces of metadata about the log that we want to include in our log (like the time, or the part number of our widget, or the shape of our socket), so very quickly the adhoc structure of our log gets kind of convoluted:

name, _ := os.Hostname()
// Shhhh... ignoring errors here for the sake of simplicity
utctime , _ := time.Now().UTC().MarshalText()
log.Printf("%s: %s: Failed to plug the foo widget (ID: %s) into the bar socket (Shape: %s)", utctime, name, foo.id, bar.shape)

Our format is now a bit of a nightmare of implied structure. Even worse: What if we wanted to aggregate these logs? Say, show a graph of errors by widget ID (to identify a bad batch or producer) - well then, bust out the grep and awk because we’re in for a bumpy ride.

This is where structured logging libraries like zerolog come in - there exists many different ones for various languages, but the key feature is that they allow you to tag logs with fields and export that data in a structured way (generally JSON) such that we can easily parse out fields we want. Using zerolog, our log would become something like this:

name, _ := os.Hostname()
log.Debug().
    Str("foo_id", foo.id).
    Str("bar_shape", bar.shape).
    Str("instance", name).
    Msg("Failed to plug the foo widget into the bar socket")
// Produces:
// {"foo_id": "1234", "bar_shape": "Heptagram", "instance": "i-1234", "message": "Failed to plug the foo widget into the bar socket"}

Much cleaner (although admittedly at the cost of human readability in this form). This allows us to search on specific fields (jq "select(.bar_shape == "Heptagram")), or in your favourite log storage), aggregate on the same, and search for specific error messages (our message is now static across all these particular errors, so we can search for it without dealing with IDs in the way).

Why should I use it?

In large deployments, with hundreds if not thousands of instances, dealing with the log flow out of all the systems can be a struggle. What you need is some way to easily slice and dice data to trim out all the chaff and find the logs actually relevant to your problem. This is where structured logging (particularly coupled with a good log ingestion tool like Loki) really shines through. The tooling around structured logging allows you to very quickly filter out logs, produce graphs from fields and generally parse your data a lot quicker than an unstructured log.

Withour structured logging, parsing your logs in any reasonable way becomes too much of a burden in even medium sized deloyments. Considering how little the orchestration overhead is, there’s really no excute to not use structured logging.

I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!