The Package Management of Cake 🧁

Building more deterministic, reproducible and secure build systems

Featured on Hashnode

Writing good code is important. Making sure users can run your code is just as important, if not more so.

Think of your source code as a cake recipe. It gives you all the instructions to make a tasty dessert. Just as you compile your source code into a binary, you follow the recipe to make a delicious Victoria Sponge. Piece of cake. 🍰

Although all of your users have the same recipe, someone might be using substitute ingredients, while another person might not have the right equipment. In a similar sense, different compiler versions, missing header files, and so much more can lead to users filing bug reports and to headaches for maintainers.

Developers should be spending their time adding features and fixing bugs, while users just want programs to run. Maybe there are some lessons we can learn from the age-old tradition of making a cake!

There are two aspects in particular that I want to focus on:

  • What you can do now as a developer for your project.
  • What current build systems can improve on for the future.

Fetching Dependencies 📦

Imagine you want to bake a cake. You've found a good recipe online and you're ready to start baking. After scanning through the list, you make a note of all the ingredients you need. You jump in your car, and start driving to the shop.

However, there's a problem. Hundreds of other people are also trying to buy ingredients from the same shop. Looks like you'll be waiting here a while...

You decide to come back another day, only to find that the shop has closed! Where will you get your eggs from now?

Many build systems today like to fetch dependencies during the build. This can be problematic for a few reasons. Your project should ideally build whether or not there's internet access, but having no internet is equivalent to your car breaking down. You can't go to the shop to buy ingredients, so you can't make the cake.

Having hundreds of people downloading a dependency from the same website can also put additional strain on that website's infrastructure. Or, similar to the closed shop, what happens if the website goes down? Then the build will fail.

What's the solution? 🤔

If one shop isn't available, there should be other places where you can fetch your ingredients. In terms of software deployment, backups of dependencies should be kept on other servers so that if one goes down, users can still build the project. This process is known as mirroring.
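As a rough illustration of mirroring from the build's side, here is a minimal Python sketch that tries a list of mirrors in order and returns the first download that succeeds. The function names and the example URLs are hypothetical, and a real build system would likely add retries and checksum verification on top.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Fetch a URL over the network (the default fetcher)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_from_mirrors(
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes] = download,
) -> bytes:
    """Return the first successful download, trying each mirror in order."""
    errors = []
    for url in mirrors:
        try:
            return fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            errors.append(f"{url}: {exc}")
    # Every shop was closed: report what went wrong at each one.
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Passing the fetcher in as a parameter is a design choice for this sketch: it lets the fallback logic be tested without any network access.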

Having the same dependency available via multiple servers also helps to reduce the amount of strain put on each server. We've killed two birds with one stone! 🕊

I also mentioned that the build should ideally succeed without internet access. A user might be suspicious if a project starts fetching lots of unknown dependencies from the internet, or they might have a dodgy internet connection. This can be solved by allowing users to "use their own ingredients".

If I already have enough eggs at home, I shouldn't have to go to the shop to buy even more unnecessarily. In the same sense, if a user already has the right dependency on their system, they should be allowed to use it rather than a build system redownloading it.

This process is traditionally done using a configure script, which can check whether certain dependencies are already provided or whether they need to be fetched.
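The same idea can be sketched in Python. This is not how any particular configure script works; it is a simplified stand-in, with hypothetical function names, that looks for a dependency on the user's PATH and in a local directory of pre-provided files before deciding that a download is needed.

```python
import shutil
from pathlib import Path
from typing import Optional


def locate_dependency(name: str, provided_dir: Path) -> Optional[Path]:
    """Find a usable copy of a dependency without touching the network.

    Checks the user's PATH first, then a directory of pre-provided
    dependencies. Returns None if neither has it.
    """
    system_copy = shutil.which(name)
    if system_copy is not None:
        return Path(system_copy)
    provided_copy = provided_dir / name
    if provided_copy.is_file():
        return provided_copy
    return None


def resolve(name: str, provided_dir: Path, offline: bool) -> str:
    """Decide what to do about one dependency."""
    path = locate_dependency(name, provided_dir)
    if path is not None:
        return f"using existing {name} at {path}"
    if offline:
        # In offline mode, fail clearly rather than silently downloading.
        raise RuntimeError(f"{name} not found and offline mode is enabled")
    return f"{name} must be fetched"
```

The `offline` flag here is an assumption about how a project might expose an offline build mode; the important part is that a missing dependency fails loudly instead of triggering a surprise download.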

Many Go projects take the approach of vendoring their dependencies. This involves packaging the dependencies with the source code. It's a bit like a meal kit, where the ingredients are already provided. It has its pros and cons, but it's an alternative approach.

Reproducible Builds 🏗

You've finally finished making the cake. Hooray! 🎉 You enjoy eating it so much that you want to make it again tomorrow. After baking another cake, something seems different. It doesn't taste right. To your dismay, you realise that the milk had soured overnight.

You would still like to eat some cake, so you ask your friend if they can make it for you. You provide them with the recipe, and they kindly bake you a cake. Even though they followed the same steps, it looks slightly different.

When building a project, it shouldn't matter whether a user builds it today or tomorrow, or if they're in one timezone or another. The result should always be the same. This concept is known as reproducible builds.

This also helps to deal with the classic "It works on my machine" problem. If a build is truly reproducible, given the same source code, two different machines should output identical binaries.

How do you achieve this? 💭

You can never be 100% sure that a build is truly reproducible, because non-determinism might only show up once in a while. To give a simple Python example:

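from random import randint

# Only 1 run in 100 takes the first branch, so a handful of test
# runs will most likely never reveal the non-determinism.
random_integer = randint(1, 100)
if random_integer == 2:
    print("Don't believe everything you see")
else:
    print("I'm totally reproducible")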

From the outside, this might seem like it always gives the same output, regardless of the user's operating system, timezone etc. However, one day a user might run it and be surprised to see something different.

The reproducible builds documentation has a lot of useful information on how to make your project builds more reproducible. Simple things like not including a timestamp, which modifies the binaries depending on when they were built, can go a long way.
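To make the timestamp point concrete, here is a minimal sketch that assumes the `SOURCE_DATE_EPOCH` environment variable convention from the Reproducible Builds project. The function name is hypothetical; the idea is simply that an embedded date comes from a fixed, user-supplied value when one is available, rather than from the clock.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> str:
    """Return the timestamp to embed in build output, as UTC ISO 8601.

    If SOURCE_DATE_EPOCH is set, use it so that rebuilding the same
    source gives the same value; otherwise fall back to the current time.
    """
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    # Always format in UTC so the builder's timezone cannot leak in.
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
```

With the variable set, building today or tomorrow, in one timezone or another, gives the same string.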

As of the time of writing, Arch Linux claims that around 79% of their packages are built reproducibly. They test this by building the same project twice with a few differences in the build environment, and then comparing the resulting binaries. Using a similar process, NixOS claims that almost 100% of builds are reproducible.
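The "build twice and compare" idea can be sketched as follows. This is not the tooling Arch Linux or NixOS actually use; it is a simplified illustration with hypothetical function names that hashes every file in two build output directories and lists the ones that differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(first_build: Path, second_build: Path) -> List[str]:
    """List files that are missing from one build or differ between them."""
    first = tree_digests(first_build)
    second = tree_digests(second_build)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )
```

An empty list means the two builds are byte-for-byte identical; anything else points at the files worth investigating.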

Security 🔒

Source Code → Binary 👷

Baking cakes is hard work. You find someone online who delivers pre-made cakes, no baking required. It seems too good to be true. How can you be sure it's exactly the same cake as described in the recipe?

A recipe is very easy to inspect. If there's poison in the recipe, you would notice. If you had the final cake, however, you wouldn't be certain whether there's poison in it or not.

In a similar sense, developers can read through source code to check whether there's any poison. It's more difficult to check a compiled binary.

Many projects offer pre-compiled binaries of the source code that are ready to download. We all trust that they haven't been tampered with, and that they are just the compiled source code and nothing more. How can we be sure that a backdoor hasn't been added, or that they don't contain some form of malware?

What's the solution? 🙋

A lot of automated build systems (e.g. GitHub Actions) allow users to view the logs that show the source code being compiled. This is a good first step.

If the builds are reproducible, independent third parties can build the source code themselves and check that the provided binary is identical to their own build output.

Downloading Resources 🧑‍💻

You've ordered the cake, and it's out for delivery. When the package arrives at your door, you notice that the box has been damaged and it looks like it's been opened. Maybe we shouldn't eat this cake...

The internet is a dangerous place. A file could be maliciously tampered with while it's being downloaded, and users might be none the wiser.

What's the solution? 🧐

Checksums. A good checksum algorithm will give different results if the contents of a file have changed. From a developer's perspective, publishing the checksums with the files to download allows users to check whether they have been tampered with during transmission.

The same is true while fetching dependencies when building from source. These are also downloaded from the internet, and so build systems should check that they have not been modified.
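As a small sketch of what that check might look like, the function below compares downloaded bytes against a published SHA-256 digest. SHA-256 is an assumption here (the choice of algorithm is up to the project), and the function name is hypothetical.

```python
import hashlib


def verify_checksum(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded bytes against a published SHA-256 hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Published digests are sometimes upper-case or padded with whitespace,
    # so normalise before comparing.
    return actual == expected_sha256.strip().lower()
```

A build system could call this on each fetched dependency and refuse to continue if it returns False.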

What can you do now? 💻

As a package maintainer at MacPorts, I see these sorts of situations quite often. For you as a developer, here are some recommendations for your project:

  • Publish checksums of your download files.
  • If you can, provide an offline build mode that allows using pre-provided dependencies. Package maintainers will be grateful :)
  • Avoid embedding timestamps or timezone-dependent values in your builds, and check out the Reproducible Builds documentation.

Thanks for reading!
