In many modern system architectures, there exist some common building blocks shared between systems. We generally call these the “Platform” (see Facebook’s “Platform Engineer” role), which makes a nice analogy, as these components are supposed to form a solid base for you to build a system on top of. But how should we manage these common pieces? Therein lies an interesting question, and one I offer some thoughts on here.
There are three main organisational models I have seen in the wild, each with its benefits and drawbacks.
In Economics, there is the concept of the Tragedy of the Commons. The gist of this is that in a world where everyone acts in their own self-interest (i.e. this world; humans are inherently selfish), unregulated common lands get neglected as everyone overexploits them. If you’ve ever worked in a startup environment, this situation may sound familiar - common platforms are often spun up on an ad hoc, as-needed basis. If we need a database, we start a Postgres server; if we need metrics, we spin up a Prometheus instance. This works fine, until it suddenly doesn’t. These shared services eventually collapse under the weight of so many unregulated consumers. So what can we do? Well, the Tragedy of the Commons gives us two options: Regulation, or Privatisation.
So let’s consider how we can regulate our commons. To start, we need to acknowledge that our Platform services are not infinite resources - they need to be managed by someone. This doesn’t have to be a full-time job, but we need to put some effort in. We need a capacity plan - an idea of how many requests/puts/whatevers our service can handle - and then we need to allocate that capacity (minus some headroom, say 20%) amongst the users of that platform. We can call an individual allocation a “Budget” (similar to Google’s Error Budget concept), and enforce it via rate limits at a per-second/minute/hour level.
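To make that a bit more concrete, here is a minimal sketch of the idea in Python. It is an illustration under my own assumptions rather than a production implementation: the names `allocate_budgets` and `TokenBucket`, the weight-based split, and the example team names and numbers are all hypothetical, and a token bucket is just one common way to enforce a per-second budget.

```python
import time


def allocate_budgets(capacity_per_sec, weights, headroom=0.2):
    """Split usable capacity between consumers in proportion to their weights.

    `capacity_per_sec` is what the capacity plan says the service can handle;
    `headroom` is the fraction held back (20% by default) and never handed out.
    """
    usable = capacity_per_sec * (1 - headroom)
    total_weight = sum(weights.values())
    return {name: usable * w / total_weight for name, w in weights.items()}


class TokenBucket:
    """Token bucket rate limiter enforcing one consumer's budget.

    Tokens refill at `rate` per second, capped at `burst`. Each request
    spends `cost` tokens; if there are not enough, the request is rejected.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = burst
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Top up with the tokens earned since the last call, up to the cap.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical example: a database planned for 1000 requests/second,
# shared 3:1 between two made-up teams.
budgets = allocate_budgets(1000, {"checkout": 3, "search": 1})
limiters = {team: TokenBucket(rate=b, burst=b) for team, b in budgets.items()}
```

With 20% held back, the 800 requests/second that remain are split 600 to `checkout` and 200 to `search`, and a request from a team is only let through when `limiters[team].allow()` returns true. The per-minute or per-hour variants would look the same with a different `rate` and `burst`.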
Our other option is to get rid of the commons altogether. In this model, instead of one monolithic Platform for everyone, each Team or Service (depending on how you want to structure it) operates its own Platform services. If a team needs a Database, they can spin it up themselves. In practice, though, this model requires a lot more effort than Regulation - to prevent teams from being bogged down in running their platform services, there needs to be appropriate tooling built around each service that allows teams to manage it (spin it up, monitor it, expand it as necessary), and this generally requires dedicated headcount to build and maintain.
So which one is right for me?
The startups I’ve seen have progressed through all three in their paths to success. Most start out with a commons model, but I hope this article has convinced you that this model should be abandoned as soon as possible. In my experience, Privatisation of the commons seems to be the most stable model of the three, as long as devs are receptive to the idea of running their own infrastructure. It reduces the blast radius of failures (a team can only affect its own platforms) and removes the Noisy Neighbor Problem, but requires a big investment in tooling and support to work properly. Regulation is a good middle ground for most mid-sized companies, however, and seems to be reliable with a minimal level of effort, so for most I would recommend it.
Like this post, or just want to yell at me? Follow me on Twitter: @sinkingpoint