{"id":47,"date":"2014-10-11T22:07:29","date_gmt":"2014-10-11T21:07:29","guid":{"rendered":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/?p=47"},"modified":"2018-07-07T11:58:51","modified_gmt":"2018-07-07T10:58:51","slug":"shaking-up-ghc","status":"publish","type":"post","link":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/shaking-up-ghc\/","title":{"rendered":"Shaking up GHC"},"content":{"rendered":"<p>As part of my 6-month research secondment to Microsoft Research in Cambridge I am taking up the challenge of migrating the current <a title=\"Glasgow Haskell Compiler\" href=\"https:\/\/en.wikipedia.org\/wiki\/Glasgow_Haskell_Compiler\">GHC<\/a> build system based on standard <code>make<\/code> into a new and (hopefully) better one based on <a title=\"Find out more about Shake\" href=\"https:\/\/github.com\/ndmitchell\/shake\/blob\/master\/README.md\"><code>Shake<\/code><\/a>. If you are curious about the project you can find more details on the <a title=\"Shaking up GHC\" href=\"https:\/\/ghc.haskell.org\/trac\/ghc\/wiki\/Building\/Shake\">wiki page<\/a>.<\/p>\n<p>During this week I&#8217;ve been trying to wrap my head around the current build system and so far I&#8217;ve felt rather intimidated by the scale of this undertaking. Yet I am optimistic because I&#8217;m helped by several much smarter people \ud83d\ude42 If you are a Haskell fan and would like to help make GHC better, please join this team effort!<\/p>\n<p><!--more--><\/p>\n<p>To give you an idea about typical dependency chains, below is a representative one:<\/p>\n<ul>\n<li>File <code>compiler\/prelude\/primops.txt.pp<\/code> describes GHC&#8217;s primitive operations and types. It uses C preprocessor directives like <code>#include<\/code> and <code>#if<\/code> and therefore needs to be preprocessed. The result goes into <code>compiler\/stage1\/build\/primops.txt<\/code>. 
If one of the included files changes (e.g., <code>ghc_boot_platform.h<\/code>), the result must be rebuilt.<\/li>\n<li>File <code>primops.txt<\/code> is processed by <code>genprimopcode --out-of-line<\/code> to generate <code>primop-out-of-line.hs-incl<\/code>.<\/li>\n<li><code>genprimopcode<\/code> itself needs to be built. It lives in <code>utils\/genprimopcode<\/code> and is a Haskell program, so it is first built with the bootstrap (stage 0) compiler. Read more about compiler stages <a title=\"GHC stages\" href=\"https:\/\/ghc.haskell.org\/trac\/ghc\/wiki\/Building\/Architecture\/Idiom\/Stages\">here<\/a>.<\/li>\n<li>Finally, a bunch of <code>.hs-incl<\/code> files are included (by a C preprocessor) into <code>compiler\/prelude\/PrimOps.lhs<\/code>, which then participates in the rest of the build in a more or less standard way.<\/li>\n<\/ul>\n<h3>A detour into graph families and parallelism<\/h3>\n<p>Since my recent research has been focused on graphs, I find build dependency &#8216;graphs&#8217; quite interesting. I put graphs in quotes because they are actually <i>graph families<\/i>, where each constituent graph corresponds to a partial order on the set of build actions: this is what gets executed when you run a build tool once (the partial order is what gives a run its parallelism and guarantees its termination). There is some additional <i>conditional<\/i> structure, which enables\/disables parts of the family depending on file freshness criteria, the contents of generated files, and the final build target; conditions select a particular graph in the family, so each build run is unambiguous. I think that one can use the <i>algebra of parameterised graphs<\/i> to capture and reason about build systems, for example, to transform them in a provably correct way. 
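Coming back to the concrete chain above: in Shake, that dependency chain can be expressed directly as rules. Below is a rough sketch of what the primops rules could look like; the file locations and exact command invocations are my assumptions for illustration, not GHC's actual build code:

```haskell
-- A sketch of the primops dependency chain as Shake rules.
-- Paths and command lines are illustrative assumptions only.
import Development.Shake

primopsRules :: Rules ()
primopsRules = do
  -- Preprocess primops.txt.pp; Shake rebuilds the output whenever
  -- any 'need'ed input (including ghc_boot_platform.h) changes.
  "compiler/stage1/build/primops.txt" %> \out -> do
    let src = "compiler/prelude/primops.txt.pp"
    need [src, "compiler/stage1/ghc_boot_platform.h"]
    cmd_ "cpp" ["-P", "-Icompiler/stage1", src, "-o", out]

  -- Generate the .hs-incl file; the rule also depends on the
  -- genprimopcode tool itself, built by a sibling rule with the
  -- stage 0 compiler.
  "compiler/stage1/build/primop-out-of-line.hs-incl" %> \out -> do
    let txt = "compiler/stage1/build/primops.txt"
    need [txt, "inplace/bin/genprimopcode"]
    cmd_ (FileStdin txt) (FileStdout out)
         "inplace/bin/genprimopcode" ["--out-of-line"]

main :: IO ()
main = shakeArgs shakeOptions primopsRules
```

Because every input is declared with <code>need<\/code>, a change to <code>ghc_boot_platform.h<\/code> triggers exactly the rebuild cascade described above, and nothing more.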
The theory of parameterised graphs has been developed specifically to reason about graph families; you can have a look at my recent paper about this <a title=\"A paper about the algebra of parameterised graphs\" href=\"http:\/\/goo.gl\/55mvNB\">here<\/a>.<\/p>\n<p>And a final comment about parallelism and randomness. One often wants to control the degree of parallelism during a build run, e.g., by passing an argument like <code>-j8<\/code> to <code>make<\/code> or <code>Shake<\/code> (I believe such parallelism control is called a <i>policy<\/i> in concurrency theory). When parallelism is limited, there may be several possible executions of a partial order, and this is where randomness comes in handy, as nicely explained in the <a title=\"Shake paper\" href=\"http:\/\/community.haskell.org\/~ndm\/downloads\/paper-shake_before_building-10_sep_2012.pdf\">Shake paper<\/a> (section 4.3.2). Briefly, different build actions may require different resources (CPU, disk, memory, etc.), and scheduling them deterministically may lead to resource contention (imagine always running all compilers before all linkers). Randomised scheduling of build actions avoids this worst-case scenario and is reported to give up to a 20% speed-up in real build systems. I am curious to see what speed-up randomness can achieve in the GHC case. It may also be worth researching a clever scheduling strategy that minimises resource contention by design (rather than by randomness), putting build actions into parallel bundles according to their resource requirements. 
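To make the contention point concrete, here is a tiny toy model of my own (not Shake's actual scheduler): each time step can run at most one CPU-bound and one IO-bound action, and the queue is served strictly in order. Grouping all compile-like actions before all link-like ones serialises the run, while interleaving them doubles throughput:

```haskell
-- Toy model of resource contention: each time step admits the longest
-- queue prefix that touches each resource at most once (capacity 1 per
-- resource). This is an illustration, not Shake's real scheduler.
data Res = CPU | IO deriving (Eq, Show)

steps :: [Res] -> Int
steps [] = 0
steps q  = 1 + steps rest
  where
    (_, rest) = admit [] q
    admit used (r:rs) | r `notElem` used = admit (r : used) rs
    admit used rs                        = (used, rs)

grouped, interleaved :: [Res]
grouped     = replicate 4 CPU ++ replicate 4 IO  -- all compilers first
interleaved = concat (replicate 4 [CPU, IO])     -- mixed workload

main :: IO ()
main = do
  putStrLn $ "grouped:     " ++ show (steps grouped)     -- 8 steps
  putStrLn $ "interleaved: " ++ show (steps interleaved) -- 4 steps
```

In this model the grouped queue takes 8 steps while the interleaved one takes 4: exactly the worst case that randomised (or resource-aware) scheduling is meant to avoid.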
One will then need to annotate build actions with their resource requirements, of course, but this seems not very difficult to do as there are just a few different types of build actions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As part of my 6-month research secondment to Microsoft Research in Cambridge I am taking up the challenge of migrating the current GHC build system based on standard make into a new and (hopefully) better one based on Shake. If you are curious about the project you can find more details on the wiki page. &hellip; <a href=\"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/shaking-up-ghc\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Shaking up GHC<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1174,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[14,4],"class_list":["post-47","post","type-post","status-publish","format-standard","hentry","category-coding","tag-build","tag-haskell"],"_links":{"self":[{"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/posts\/47","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/users\/1174"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":17,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/posts\/47\/revisions"}],"predecessor-version":[{"id":816,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/posts\/47\/revisions\/816"}],"wp:attachment":[{"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/media?parent=47"}],"wp:
term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/categories?post=47"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/andreymokhov\/wp-json\/wp\/v2\/tags?post=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}