I’ve recently submitted an application to the Royal Society Industry Fellowship scheme, aiming to continue my journey into the world of build systems. Below is the technical section, which I think is worth sharing regardless of the outcome of my application. I’d like to thank Neil Mitchell, Simon Peyton Jones, Simon Marlow and Nick Benton for many enlightening discussions on build systems that helped me understand both the problem and possible approaches to solving it.
Build systems
A build system is a critical component of most software projects, responsible for compiling source code written in various programming languages into executable programs, the end software products. Build systems sound simple, but they are not: in large software projects the build system grows from simple beginnings into a huge, complex engineering artefact. Why? Because it evolves every day as new features are continuously added by software developers; because it builds programs for a huge variety of target hardware configurations (from mobile to cloud); and because it operates under strict correctness and performance requirements, standing on the critical path between the development of a new feature and its deployment into production.
It is known that build systems can take up to 27% of software development effort, and that improvements to build systems rapidly pay off [1]. Despite its importance, this subject is severely under-researched, which prompts major companies, such as Facebook and Google, to invest significant internal resources in building their own bespoke build-system frameworks.
Challenges in large-scale build systems
The following two requirements are important for complex large-scale build systems; however, to the best of my knowledge, no existing build system supports both:
- Distributed cloud builds: the codebase of a large software project comprises millions of lines of code spread across thousands of files, which are built independently by thousands of developers on thousands of machines. A distributed cloud build system speeds up builds dramatically (and saves energy) by transparently sharing build products among developers.
- Dynamic dependency graphs: many existing build systems need to know all the dependencies between software components, i.e. the dependency graph, at the start of the build process. This makes it possible to analyse the graph ahead of time, but it is fundamentally limiting: parts of the dependency graph can often be discovered only during the build process, i.e. the dependency graph is dynamic, not static (a concrete sketch follows this list).
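To make the second requirement concrete, below is a minimal sketch using Shake's actual API (the file names are invented for illustration): the full set of inputs of `result.txt` becomes known only after `deps.txt` has been read during the build, so no static dependency graph can describe this rule ahead of time.

```haskell
import Development.Shake

-- A rule whose dependencies are discovered at build time: "deps.txt"
-- lists the inputs of "result.txt", so the complete dependency graph
-- cannot be known before the build starts.
main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["result.txt"]
  "result.txt" %> \out -> do
    inputs <- readFileLines "deps.txt"  -- also records deps.txt as a dependency
    need inputs                         -- dynamic dependencies
    contents <- mapM readFile' inputs
    writeFile' out (concat contents)
```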
There exist build systems that support cloud builds, e.g. Facebook’s Buck and Google’s Bazel, as well as build systems that support dynamic dependencies, e.g. Neil Mitchell’s Shake (originally developed at Standard Chartered) and Jane Street’s Jenga, but none that supports both.
Proposed approach and research objectives
I will combine two known techniques when developing the new build system:
- Storing build results in a cloud build cache, where each build product is identified by a unique key combining the hashes of the source file, its build dependencies and the build environment. Users of the build system can then fetch an already-built file from the cache, or build it themselves and share it with others by uploading it to the cache.
- Storing the last known version of the dependency graph in persistent graph storage, and updating it whenever a file needs to be rebuilt and the newly discovered set of dependencies differs from the previous record, as implemented in the Shake build system [2].
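To illustrate the first technique, here is a minimal sketch of the caching protocol; the `Cache` interface with its `lookupCache` and `uploadCache` operations is a hypothetical placeholder, not part of any existing system.

```haskell
import qualified Data.ByteString as BS
import Crypto.Hash.SHA256 (hash)  -- from the cryptohash-sha256 package

newtype Hash = Hash BS.ByteString

-- A hypothetical interface to a shared build cache (illustration only).
data Cache = Cache
  { lookupCache :: Hash -> IO (Maybe FilePath)
  , uploadCache :: Hash -> FilePath -> IO ()
  }

-- The key of a build product combines the hashes of its source file,
-- the keys of its build dependencies, and the build environment.
cacheKey :: Hash -> [Hash] -> Hash -> Hash
cacheKey (Hash src) deps (Hash env) =
  Hash (hash (BS.concat (src : env : [d | Hash d <- deps])))

-- Fetch an already-built file from the cache if possible; otherwise
-- build it locally and upload the result so that others can reuse it.
fetchOrBuild :: Cache -> Hash -> IO FilePath -> IO FilePath
fetchOrBuild cache key buildLocally = do
  cached <- lookupCache cache key
  case cached of
    Just path -> return path
    Nothing   -> do
      path <- buildLocally
      uploadCache cache key path
      return path
```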
My first research objective is to formalise the semantics of cloud build systems with dynamic dependency graphs in the presence of build non-determinism (real compilers are not always deterministic) and partial builds (not all intermediate build results need to be stored locally). I will then integrate cloud build caches with persistent graph storage, and develop a domain-specific language for the new build system using scalable functional programming abstractions, such as polymorphic and monadic dependencies, high-level concurrency primitives, and compositional build configurations [3]. In 2012-2014 I developed an algebra for dependency graphs [4], which I will use to prove the correctness of the new build algorithms. The final research objective is to evaluate the developed build system and explore opportunities for integrating it into a real large-scale build infrastructure.
References
[1] S. McIntosh et al. “An empirical study of build maintenance effort”, In Proceedings of the International Conference on Software Engineering (ICSE), ACM, 2011.
[2] N. Mitchell. “Shake before building – replacing Make with Haskell”, In Proceedings of the International Conference on Functional Programming (ICFP), ACM, 2012.
[3] A. Mokhov, N. Mitchell, S. Peyton Jones, S. Marlow. “Non-recursive Make Considered Harmful: Build Systems at Scale”, In Proceedings of the International Haskell Symposium, ACM, 2016.
[4] A. Mokhov, V. Khomenko. “Algebra of Parameterised Graphs”, ACM Transactions on Embedded Computing Systems, 13(4s), p.143, 2014.
Thanks for sharing, Andrey!! Great ideas and thinking; indeed there is not much work on combining those two directions, if any at all.
Maybe Facebook or Google would also be very interested in moving forward from their current ‘cloud builds’ 😉?
Cheers from Bru.
MR
Thanks Max! Conquering Facebook, Google and the rest of the world is one of the secret goals of the project 😉
Good luck! https://github.com/Ericson2314/haskell-vcache/blob/master/generic-store/hsrc_lib/Database/Generic.hs is an experiment I never finished which might help come implementation time.
Great, I’ll look into it when we start implementing a prototype. Thank you!
Also, I believe Nix with “import-from-derivation” is both distributed and dynamic. But certainly there are many ways to improve on it.
https://github.com/NixOS/nix/issues/954: “Figure out how to make IFD work properly” lists some issues, though they aren’t very theoretically interesting ones.
Hmm… I should finally learn how Nix works; thank you for pointing this out. Do you know if there is any formal definition of the semantics of Nix builds?
https://nixos.org/docs/papers.html might help? Eelco Dolstra’s thesis is probably the most descriptive (and the only one I’ve looked at). I didn’t see anything about import-from-derivation, but I have only looked at little bits.
Awesome, thanks again!
In case nothing talks about import-from-derivation, here’s a brief summary.
Historically Nix had two phases: “eval”, evaluating the Nix expression to a (Merkle) dependency DAG of specially-tagged objects (derivations), and then “build”, building that dependency DAG.
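Roughly, in types (placeholders only, not Nix’s actual internals), the two phases look like this:

```haskell
data Expr        -- a Nix expression
data Derivation  -- a specially-tagged build description
data StorePath   -- a hash-containing path in the store

-- Phase 1: pure evaluation produces the dependency DAG of derivations
-- (represented here, crudely, as a list in dependency order).
eval :: Expr -> [Derivation]
eval = undefined  -- sketch only

-- Phase 2: building realises each derivation at its store path.
build :: [Derivation] -> IO [StorePath]
build = undefined  -- sketch only
```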
The “Nix Expression Language” is mostly pure, except for the built-in functions `readFile <expr evaluating to path>`, which reads a file into a string, and `import <expr evaluating to path>`, which parses the file as a Nix expression and evaluates it.
Normally, any path ending up in a derivation becomes a dependency. A derivation’s built-in conversion to a string yields the path where it *will* be built, which includes its hash. The “Merkleness” of the dependency DAG simply comes from these hash-containing paths being treated like any other string when the derivation that includes them is hashed.
From what I said above alone, since `import` eliminates the path, it would never make it into the final dependency DAG. And even if it did, it would be of no use, because when the interpreter goes to load the path at eval time, nothing will have built it and the import will fail.
What “import-from-derivation” means is that `import <expr evaluating to derivation>` works: the derivation is built, and then evaluation continues. This adds dynamism and removes the restriction to two phases. It seems to correspond to ndm’s applicative vs monadic build systems, though I haven’t heard of an applicative functor or monad formalized for Nix, nor come up with one myself. I’d hope that for any two languages (here the Nix expression language and the derivation DAG language), where each evaluates to syntax of the other, one can construct an “Eval monad” corresponding to import-from-derivation; a sketch of what that could look like is below.
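Here is one shape such an “Eval monad” could take (my own sketch, not anything formalized for Nix; the placeholder types are as in the earlier sketch): evaluation either finishes with a value, or suspends on a derivation that must be built before evaluation can resume.

```haskell
data Derivation  -- placeholder, as before
data StorePath   -- placeholder, as before

-- Evaluation either completes, or suspends until a derivation has been
-- built and its store path can be consumed: a free monad over one effect.
data Eval a
  = Done a
  | NeedBuild Derivation (StorePath -> Eval a)

instance Functor Eval where
  fmap f (Done x)        = Done (f x)
  fmap f (NeedBuild d k) = NeedBuild d (fmap f . k)

instance Applicative Eval where
  pure = Done
  Done f        <*> x = fmap f x
  NeedBuild d k <*> x = NeedBuild d ((<*> x) . k)

instance Monad Eval where
  Done x        >>= f = f x
  NeedBuild d k >>= f = NeedBuild d ((>>= f) . k)

-- Interleaving building with evaluation is exactly the extra power a
-- monadic build system has over an applicative (static) one.
runEval :: (Derivation -> IO StorePath) -> Eval a -> IO a
runEval _       (Done x)        = return x
runEval builder (NeedBuild d k) = builder d >>= runEval builder . k
```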
(I’ve never heard of “readFile-from-derivation”, but the principle would be the same. I know it works in practice.)
A brief diversion. What the thesis does go into well is an interesting contrast between the “extensional store” and the “intensional store”. The extensional store is the only one implemented (at the time of the thesis and today), but the intensional one is wholly better. I highly recommend that section.
I bring this up because I believe there is some fundamental commonality between the intensional store and import-from-derivation. Specifically, both allow one to “forget” what happened before and “rehash”; the tradeoff is that some extra work (hopefully cheap) is done in the hope that more cached builds (hopefully expensive ones) become usable.
Finally, check out https://github.com/cleverca22/nix-tests/blob/master/make.nix for a tiny demo of trying to build C code. Plain old Nix is alright as a “package manager” (i.e. for inter-package dependencies alone), but I think the dynamism is needed to make a compelling “build system” (for intra-package dependencies).
Hope this makes sense and is useful. I’m really excited about you tackling this topic which is indeed so under-researched!
> Hope this makes sense and is useful. I’m really excited about you tackling this topic which is indeed so under-researched!
This is very, very useful! I will definitely start the project by reading the thesis; it looks like essential reading.