Graphs in disguise: from todo lists to build systems

In this blog post we will look at an example of using the algebra of graphs for manipulating sequences of items, which at first sight might not look like graphs, but are actually dependency graphs in disguise. We will develop a tiny DSL for composing todo lists on top of the alga library and will show how it can be used for planning a holiday and, on a more serious note, for writing software build systems. This blog post will partially answer the question about possible applications of the algebra of graphs that was asked in this reddit discussion.

Update: This series of blog posts was published as a functional pearl at the Haskell Symposium 2017.

Todo lists and the algebra of graphs

Todo lists are sequences of items that, as one would expect, need to be done. The order of items in the sequence matters, because some items may depend on others. The simplest todo list is the empty one. Then we have todo lists containing a single item, from which we can build up longer lists using the same operators we introduced to construct graphs.

An item will correspond to a graph vertex. We’ll use the OverloadedStrings GHC extension, so we can create todo items without explicitly wrapping them into a vertex. This will also allow everyone to choose their favourite representation for strings; plain old String is fine for our examples:

{-# LANGUAGE OverloadedStrings #-}
import Data.String

import Algebra.Graph
import Algebra.Graph.Util

instance (IsString a, Ord a) => IsString (Todo a) where
    fromString = vertex . fromString

shopping :: Todo String
shopping = "presents"

Here Todo is a Graph instance whose implementation will be revealed later. One can combine several items into a single todo list using the overlay operator + of the algebra:

shopping :: Todo String
shopping = "presents" + "coat" + "scarf"

The semantics of a todo list is just a list of items in the order they can be completed, or Nothing if the there is no possible completion order that satisfies all dependency constraints between different items. We can extract the semantics using todo function with the following signature:

todo :: Ord a => Todo a -> Maybe [a]

The overlay operator is commutative, therefore reordering items in the shopping list does not change the semantics:

λ> todo shopping
Just ["coat","presents","scarf"]
λ> todo $ "coat" + "scarf" + "presents"
Just ["coat","presents","scarf"]
λ> shopping == "coat" + "scarf" + "presents"
True

As you can see, the items are simply ordered alphabetically as there are no dependencies between them. Let’s add some! To do that we’ll use the connect operator → from the algebra. When two todo lists are combined with →, the meaning is that all items in the first list must be completed before we can start completing items from the second todo list. I’m currently planning a holiday trip to visit friends and therefore will need to pack all stuff that I buy before travelling:

holiday :: Todo String
holiday = shopping * "pack" * "travel"

λ> todo holiday
Just ["coat","presents","scarf","pack","travel"]
λ> shopping == holiday
False
λ> shopping `isSubgraphOf` holiday
True

Items "pack" and "travel" have been appended to the end of the list even though "pack" comes before "presents" alphabetically, and rightly so: we can’t pack presents before we buy them!

Now let’s add a new dependency constraint to an existing todo list. For example, I might want to buy a new scarf before a coat, because I would like to make sure the coat looks good with the new scarf:

λ> todo $ holiday + "scarf" * "coat"
Just ["presents","scarf","coat","pack","travel"]

Look how the resulting list changed: "coat" has been moved after "scarf" to meet the new constraint! Of course, it’s not too difficult to add contradictory constraints, making the todo list impossible to schedule:

λ> todo $ holiday + "travel" * "presents"
Nothing

There is nothing we can do to complete all items if there is a circular dependency in our todo list: "presents" → "pack" → "travel" → "presents".

It may sometimes be useful to have some notion of item priorities to schedule some items as soon or as late as possible. Let me illustrate this with an example, by modifying our todo lists as follows:

shopping :: Todo String
shopping = "presents" + "coat" + "phone wife" * "scarf"

holiday :: Todo String
holiday = shopping * "pack" * "travel" + "scarf" * "coat"

As you see, I now would like to phone my wife before buying the scarf to make sure it also matches the colour of one of her scarves (she has a dozen of them and I can’t possibly remember all the colours). Let’s see how this changes the resulting order:

λ> todo holiday
Just ["phone wife","presents","scarf","coat","pack","travel"]

This works but is a little unsatisfactory: ideally I’d like to phone my wife right before buying the scarf. To achieve that I can amend the shopping list by changing the priority of the item "phone wife":

-- Lower the priority of items in a given todo list
low :: Todo a -> Todo a

shopping :: Todo String
shopping = "presents" + "coat" + low "phone wife" * "scarf"

λ> todo holiday
Just ["presents","phone wife","scarf","coat","pack","travel"]

Aha, this is better: "phone wife" got scheduled as late as possible, and is now right next to "scarf", as desired. But wait — if my wife finds out that I gave a low priority to my phone calls to her, I’ll get into trouble! I need to find a better way to achieve the same effect. In essence, we would like to have a variant of the connect operator that pulls the arguments together as close as possible during scheduling (and, alternatively, we may also want to repel arguments as far from each other as possible).

-- Pull the arguments together as close as possible
(~*~) :: Ord a => Todo a -> Todo a -> Todo a

-- Repel the arguments as far as possible
(>*<) :: Ord a => Todo a -> Todo a -> Todo a

shopping :: Todo String
shopping = "presents" + "coat" + "phone wife" ~*~ "scarf"

This looks better and leads to the same result as the code above.

The final holiday expression can be visualised as follows:
Graph expression

Here the overlay operator + is shown simply by placing its arguments next to each other, the connect operators are shown by arrows, and the arrow with a small triangle stands for the tightly connect operator ~*~. By following the laws of the algebra, we can flatten the graph expression into a dependency graph shown below:
Dependency graph

The graph is then linearised into a list of items by the todo function.

So, here you go: you can plan your holiday (or anything else) in Haskell using the alga library!

Constructing command lines in build systems

The above reminds me of build systems that construct command lines for executing various external programs, such as compilers, linkers, etc. A command line is just a list of strings, that typically include the path to the program that is being executed, paths to source files, and various configuration flags. Some of these strings may have order constraints between them, quite similar to todo lists. Let’s see if we can use our tiny DSL for todo lists for describing command lines.

Here is a simple command line to compile "src.c" with GCC compiler:

cmdLine1 :: Todo String
cmdLine1 = "gcc" * ("-c" ~*~ "src.c" + "-o" ~*~ "src.o")

λ> todo cmdLine1
Just ["gcc","-c","src.c","-o","src.o"]

Build systems are regularly refactored, and it is useful to track changes in a build system to automatically rebuild affected files if need be (for example, in the new GHC build system Hadrian we track changes in command lines and this helps a lot in its development). Some changes do not change the semantics of a build system and can therefore be safely ignored. As an example, one can rewrite cmdLine1 defined above by swapping the source and object file parts of the command line:

cmdLine2 :: Todo String
cmdLine2 = "gcc" * ("-o" ~*~ "src.o" + "-c" ~*~ "src.c")

λ> cmdLine1 == cmdLine2
True
λ> todo cmdLine2
Just ["gcc","-c","src.c","-o","src.o"]

As you can see, the above change has no effect, as we would expect from the commutativity of +. Replacing ~*~ with the usual connect operator on the other hand sometimes leads to changes in the semantics:

cmdLine3 :: Todo String
cmdLine3 = "gcc" * ("-c" * "src.c" + "-o" * "src.o")

λ> cmdLine1 == cmdLine3
False
λ> todo cmdLine3
Just ["gcc","-o","-c","src.c","src.o"]

The resulting sequence is correct from the point of view of a dependency graph, but is not a valid command line: the flag pairs got pushed apart. The change in semantics is recognised by the algebra and a rerun of the build system should reveal the error.

As a final exercise, let’s write a function that transforms command lines:

optimise :: Int -> Todo String -> Todo String
optimise level = (* flag)
  where
    flag = vertex $ "-O" ++ show level

λ> todo $ optimise 2 cmdLine1
Just ["gcc","-c","src.c","-o","src.o","-O2"]

As you can see, optimise 2 appends the optimisation flag "-O2" at the end of the command line, i.e. optimise 2 == (* "-O2").

Command lines in real build systems contain many conditional flags that are included only when compiling certain files on certain platforms, etc. You can read about how we deal with conditional flags in Hadrian here.

Under the hood

Scheduling a list of items subject to dependency constraints is a well-known problem, which is solved by topological sort of the underlying dependency graph. GHC’s containers library has an implementation of topological sort in Data.Graph module. It operates on adjacency lists and to reuse it we can define the following Graph instance:

newtype AdjacencyMap a = AM { adjacencyMap :: Map a (Set a) }
    deriving (Eq, Show)

instance Ord a => Graph (AdjacencyMap a) where
    type Vertex (AdjacencyMap a) = a
    empty       = AM $ Map.empty
    vertex  x   = AM $ Map.singleton x Set.empty
    overlay x y = AM $ Map.unionWith Set.union
        (adjacencyMap x) (adjacencyMap y)
    connect x y = AM $ Map.unionsWith Set.union
        [ adjacencyMap x, adjacencyMap y
        , fromSet (const . keysSet $ adjacencyMap y)
                  (keysSet $ adjacencyMap x) ]

adjacencyList :: AdjacencyMap a -> [(a, [a])]
adjacencyList = map (fmap Set.toAscList) . Map.toAscList . adjacencyMap

λ> adjacencyList $ clique [1..4]
[(1,[2,3,4]),(2,[3,4]),(3,[4]),(4,[])]

Todo is built on top of the TopSort graph instance, which is just a newtype wrapper around AdjacencyMap based representation of graphs:

newtype TopSort a = TS { fromTopSort :: AdjacencyMap a }
    deriving (Show, Num)

instance Ord a => Eq (TopSort a) where
    x == y = topSort x == topSort y

The custom Eq instance makes sure that graphs are considered equal if their topological sorts coincide. In particular all cyclic graphs fall into the same equivalence class corresponding to topSort g == Nothing:

λ> path [1..4] == (clique [1..4] :: TopSort Int)
True
λ> topSort $ clique [1..4]
Just [1,2,3,4]
λ> topSort $ path [1..4]
Just [1,2,3,4]
λ> topSort $ transpose $ clique [1..4]
Just [4,3,2,1]
λ> topSort $ circuit [1..4]
Nothing

Function topSort simply calls Data.Graph.topSort performing the necessary plumbing, which is not particularly interesting.

The current implementation has two issues: the topological sort is not always lexicographically first, as evidenced by cmdLine3 above, where "-o" precedes "-c" in the final ordering. The second issue is that topSort does not satisfy the closure axiom defined in the previous blog post. One possible approach to fix this is to compute the transitive reduction of the underlying dependency graph before the topological sort.

Have a great holiday everyone!

Leave a Reply

Your email address will not be published. Required fields are marked *