Joseph Naberhaus


Building Faster by Building Less

May 2022

For this website, I wrote my own static site generator (a program that compiles content into a website). At first, my generator was fast because it was simple. That changed recently after adding a couple of handy but slow features. Now it takes several minutes to create the entire website, and it's going to get worse as I add more content. This was an unacceptable delay to my iteration times and needed to be fixed.

It isn't possible for me to significantly improve the speed of the generator. Instead, I need to build less content each time I run it. I accomplished this by recycling as much of the previous build as possible and only compiling the parts that have changed. Most programming languages use this same trick and call it incremental compilation. The rest of this article describes how it works.

Change Detection

It all starts by determining if a file has changed. I could do this by keeping a copy of the old version of every file. Then, after I've made some edits and started the generator, I could compare each byte of the old and new versions of the file to determine if it has changed.

Byte by Byte Comparison

This method is effective, but it requires keeping a duplicate copy of every file. This is rather expensive for big files, such as images and videos. Instead, we use a hash function to condense each file into a fixed-length sequence of bytes. Then, we can compare those hashes instead of the whole files.

Hash Comparison

If the hash didn't change, then the file didn't change either. There are exceptions where two different files can produce the same hash. However, with a good hash function, they are so rare that they're nearly impossible.

Dependencies

Knowing if a file has changed isn't enough to determine if it needs to be recompiled. My website builder allows files to form dependencies on other files. If a dependency of a file changes, then we also need to rebuild that file. Click around below to see how dependencies affect other files.

Dependencies
= Same = Different Click on a file to change it. File A File B File C Image A Image B Component A Component B Image C
Files to Rebuild:

You can see that even one file change can have a large impact on the total number of files that need to be recompiled. Still, unless every file is different, we probably won't have to recompile them all. The algorithm for performing the change detection demonstrated above can be written in a few lines of pseudocode:

shouldRecompile(file):
    // The file is brand new.
    if file.oldHash == null:
        return true

    // The file has changed.
    if file.oldHash != file.newHash:
        return true

    for each dependency of the file:
        // The dependency changed.
        if shouldRecompile(dependency):
            return true

    // Nothing relevant changed. Reuse the old compilation of the file.
    return false

This code snippet takes in three pieces of information:

Keeping track of the new and old hashes is trivial, but how do we know the current dependencies without recompiling the file? The key is that we first check if the file's content has changed. If it did, then we have to rebuild the file anyway. If it didn't change, then the list of dependencies for that file also didn't change, and we can reuse the list of dependencies from the previous build.

Artifacts

There is one more problem to be solved with dependencies. When one file depends on another, it usually needs some information about it. For example, when an HTML page depends on an image, it needs to know what sizes of it are available in the output. We don't want to have to compile the file each time to determine this information, so instead, I save this information within a field called the Artifact of a file. Here's an example of what the artifact looks like for images:

Image Build Artifact
image.png
An image with the specified width.

            

All Artifact objects are placed in a file called build-cache.json (which is incidentally also where I store the old hashes and dependency lists). When I recompile the website, I can reference either the old artifact for the file or, if necessary, rebuild the file and reference the new artifact.

My Implementation

I've been using incremental compilation while writing this post, and it has been working great! If you want to see the actual implementation I'm using it can be found in my GitHub repository for this website. Someday, I plan to pull the incremental compilation portion of that code out into a separate, reusable library so that other Golang projects can easily take advantage of it.