I’ve recently submitted an application to the Royal Society Industry Fellowship scheme, aiming to continue my journey into the world of build systems. Below is the technical section, which I think is worth sharing regardless of the outcome of my application. I’d like to thank Neil Mitchell, Simon Peyton Jones, Simon Marlow and Nick Benton for many enlightening discussions on build systems that helped me understand both the problem and possible approaches to solving it.
A build system is a critical component of most software projects, responsible for compiling the source code written in various programming languages and producing executable programs — end software products. Build systems sound simple, but they are not; in large software projects the build system grows from simple beginnings into a huge, complex engineering artefact. Why? Because it evolves every day as new features are continuously being added by software developers; because it builds programs for a huge variety of target hardware configurations (from mobile to cloud); and because it operates under strict correctness and performance requirements, standing on the critical path between the development of a new feature and its deployment into production.
It is known that build systems can take up to 27% of software development effort, and that improvements to build systems rapidly pay off . Despite its importance, this subject is severely under-researched, which prompts major companies, such as Facebook and Google, to invest significant internal resources to make their own bespoke build system frameworks.
Challenges in large-scale build systems
The following two requirements are important for complex large-scale build systems, however, to the best of my knowledge, no existing build system can support both:
- Distributed cloud builds: codebase of large software projects comprises millions of lines of code spread across thousands of files, each built independently by thousands of developers on thousands of machines. A distributed cloud build system speeds up builds dramatically (and saves energy) by transparently sharing build products among developers.
- Dynamic dependency graphs: many existing build systems need to know all the dependencies between software components, i.e. the dependency graph, at the start of the build process. This makes it possible to analyse the graph ahead of time, but is fundamentally limited: parts of the dependency graph can often be discovered only during the build process, i.e. the dependency graph is dynamic, not static.
There exist build systems that support cloud builds, e.g. Facebook’s Buck and Google’s Bazel, as well as build systems that support dynamic dependencies, e.g. Neil Mitchell’s Shake (originally developed at Standard Chartered) and Jane Street’s Jenga, but not both.
Proposed approach and research objectives
I will combine two known techniques when developing the new build system: 1) storing build results in a cloud build cache, where a file is identified by a unique hash that combines hashes of the file, its build dependencies and the environment, so that users of the build system can fetch an already built file from the cache, or build it themselves and share it with others by uploading it into the build cache; and 2) storing the last known version of the dependency graph in a persistent graph storage, and updating it whenever a file needs to be rebuilt and the newly discovered set of dependencies differs from the previous record, as implemented in the Shake build system .
My first research objective is to formalise the semantics of cloud build systems with dynamic dependency graphs in the presence of build non-determinism (real compilers are not always deterministic) and partial builds (not all intermediate build results are necessary to store locally). I will then integrate cloud build caches with persistent graph storage and develop a domain-specific language for the new build system using scalable functional programming abstractions, such as polymorphic and monadic dependencies, high-level concurrency primitives, and compositional build configurations . In 2012-2014 I developed an algebra for dependency graphs  that I will use for proving the correctness of new build algorithms. The final research objective is to evaluate the developed build system and explore opportunities for its integration into a real large-scale build infrastructure.
 S. McIntosh et al. “An empirical study of build maintenance effort”, In Proceedings of the International Conference on Software Engineering (ICSE), ACM, 2011.
 N. Mitchell. “Shake before building – replacing Make with Haskell”, In Proceedings of the International Conference on Functional Programming (ICFP), ACM, 2012.
 A. Mokhov, N. Mitchell, S. Peyton Jones, S. Marlow. “Non-recursive Make Considered Harmful: Build Systems at Scale”, In Proceedings of the International Haskell Symposium, ACM, 2016.
 A. Mokhov, V. Khomenko. “Algebra of Parameterised Graphs”, ACM Transactions on Embedded Computing Systems, 13(4s), p.143, 2014.