Graph theory is my favourite topic in mathematics and computing science and in this blog post I’ll introduce an algebra of graphs that I’ve been working on for a while. The algebra has become my go-to tool for manipulating graphs and I hope you will find it useful too.
The roots of this work can be traced back to my CONCUR’09 conference submission that was rightly rejected. I subsequently published a few application-specific papers gradually improving my understanding of the algebra. The most comprehensive description can be found in ACM TECS (a preprint is available here). Here I’ll give a general introduction to the simplest version of the algebra of graphs and show how it can be implemented in Haskell.
Update: This series of blog posts was published as a functional pearl at the Haskell Symposium 2017.
Constructing graphs
Let G be a set of graphs whose vertices come from a fixed universe. As an example, we can think of graphs whose vertices are positive integers. A graph g ∈ G can be represented by a pair (V, E) where V is the set of its vertices and E ⊆ V × V is the set of its edges.
The simplest possible graph is the empty graph. I will be denoting it by ε in formulas and by empty
in Haskell code. Hence, ε = (∅, ∅) and ε ∈ G.
A graph with a single vertex v will be denoted simply by v. For example, 1 ∈ G is a graph with a single vertex 1, that is ({1}, ∅). In Haskell I’ll use vertex
to lift a given vertex to the type of graphs.
To construct bigger graphs from the above primitives I’ll use two binary operators overlay and connect, denoted by + and →, respectively. The overlay + of two graphs is defined as:
(V1, E1) + (V2, E2) = (V1 ∪ V2, E1 ∪ E2)
In words, the overlay of two graphs is simply the union of their vertices and edges. The definition of connect → is similar:
(V1, E1) → (V2, E2) = (V1 ∪ V2, E1 ∪ E2 ∪ V1 × V2)
The difference is that when we connect two graphs, we add an edge from each vertex in the left argument to each vertex in the right argument. Here are a few examples:
- 1 + 2 is the graph with two isolated vertices 1 and 2.
- 1 → 2 is the graph with a directed edge between vertices 1 and 2.
- 1 → (2 + 3) is the graph with three vertices {1, 2, 3} and two directed edges (1, 2) and (1, 3). In Haskell we can write
connect 1 (overlay 2 3)
. - 1 → 1 is the graph with vertex 1 and a self-loop (an edge going from a vertex to itself).
The following type class expresses the above in Haskell:
class Graph g where type Vertex g empty :: g vertex :: Vertex g -> g overlay :: g -> g -> g connect :: g -> g -> g
Let’s construct some graphs! A graph that contains a given list of unconnected vertices can be constructed as follows:
vertices :: Graph g => [Vertex g] -> g vertices = foldr overlay empty . map vertex
And here is a clique (a fully connected graph) on a given list of vertices:
clique :: Graph g => [Vertex g] -> g clique = foldr connect empty . map vertex
For example, clique [1..]
is the infinite clique on all positive integers; we will call such cliques covering the whole universe complete graphs. We can also construct any graph given its edgelist:
fromEdgeList :: Graph g => [(Vertex g, Vertex g)] -> g fromEdgeList = foldr overlay empty . map edge where edge (x, y) = vertex x `connect` vertex y
As we will see in the next section, graphs satisfy a few laws and form an algebraic structure that is very similar to a semiring.
Algebraic structure
The structure (G, +, →, ε) introduced above satisfies many usual laws:
- (G, +, ε) is an idempotent commutative monoid
- (G, →, ε) is a monoid
- → distributes over +, e.g. 1 → (2 + 3) = 1 → 2 + 1 → 3
The following decomposition axiom, is the only law that makes the algebra of graphs different from a semiring:
x → y → z = x → y + x → z + y → z
Indeed, in a semiring the two operators have different identity elements, let’s denote them ε+ and ε→, respectively. By using the decomposition axiom we can prove that they coincide:
ε+ | = | ε+ → ε→ → ε→ | (identity of →) |
= | ε+ → ε→ + ε+ → ε→ + ε→ → ε→ | (decomposition) | |
= | ε+ + ε+ + ε→ | (identity of →) | |
= | ε→ | (identity of +) |
The idempotence of + also follows from the decomposition axiom.
The following is a minimal set of axioms that describes the graph algebra:
- + is commutative and associative
- (G, →, ε) is a monoid, i.e. → is associative and ε is the identity element
- → distributes over +
- → can be decomposed: x → y → z = x → y + x → z + y → z
An exercise for the reader: prove that ε is the identity of + from the minimal set of axioms above. This is not entirely trivial! Also prove that + is idempotent.
Note, to switch from directed to undirected graphs it is sufficient to add the axiom of commutativity of →. We will explore this in a future blog post.
Examples
Let’s look at two basic instances of the Graph type class that satisfy the laws from the previous section. The first one, called Relation, adopts our set-based definitions for the overlay and connect operators and is therefore a free instance (i.e. it doesn’t satisfy any other laws):
data Relation a = Relation { domain :: Set a, relation :: Set (a, a) } deriving (Eq, Show) instance Ord a => Graph (Relation a) where type Vertex (Relation a) = a empty = Relation Set.empty Set.empty vertex x = Relation (Set.singleton x) Set.empty overlay x y = Relation (domain x `Set.union` domain y) (relation x `Set.union` relation y) connect x y = Relation (domain x `Set.union` domain y) (relation x `Set.union` relation y `Set.union` Set.fromDistinctAscList [ (a, b) | a <- Set.elems (domain x) , b <- Set.elems (domain y) ])
Let’s also make Relation an instance of Num type class so we can use + and * operators for convenience.
instance (Ord a, Num a) => Num (Relation a) where fromInteger = vertex . fromInteger (+) = overlay (*) = connect signum = const empty abs = id negate = id
Note: the Num law abs x * signum x == x
is satisfied since x → ε = x. In fact, any Graph instance can be made a Num instance if need be. We can now play with graphs using interactive GHC:
λ> 1 * (2 + 3) :: Relation Int Relation {domain = fromList [1,2,3], relation = fromList [(1,2),(1,3)]} λ> 1 * (2 + 3) + 2 * 3 == (clique [1..3] :: Relation Int) True
Another simple instance can be obtained by embedding all graph constructors into a basic algebraic datatype:
data Basic a = Empty | Vertex a | Overlay (Basic a) (Basic a) | Connect (Basic a) (Basic a) deriving Show instance Graph (Basic a) where type Vertex (Basic a) = a empty = Empty vertex = Vertex overlay = Overlay connect = Connect
We cannot use the derived Eq instance here, because it would clearly violate the laws of the algebra, e.g. Overlay Empty Empty
is structurally different from Empty
. However, we can implement a custom Eq instance as follows:
instance Ord a => Eq (Basic a) where x == y = toRelation x == toRelation y where toRelation :: Ord a => Basic a -> Relation a toRelation = foldBasic foldBasic :: (Vertex g ~ a, Graph g) => Basic a -> g foldBasic Empty = empty foldBasic (Vertex x ) = vertex x foldBasic (Overlay x y) = overlay (foldBasic x) (foldBasic y) foldBasic (Connect x y) = connect (foldBasic x) (foldBasic y)
The Basic instance is useful because it allows to represent densely connected graphs more compactly. For example, clique [1..n] :: Basic Int
has linear-size representation in memory, while clique [1..n] :: Relation Int
stores each edge separately and therefore takes O(n2) memory. As I will demonstrate in future blog posts, we can exploit compact graph representations for deriving algorithms that are asymptotically faster on dense graphs compared to existing graph algorithms operating on edgelists.
Summary
I’ve been using the algebra of graphs presented above for several years in a number of different projects and found it very useful. There are a few flavours of the algebra that I will introduce in follow-up blog posts that allow to work with undirected graphs, transitively closed graphs (also known as partial orders or dependency graphs), graph families, and their various combinations. All these flavours of the algebra can be obtained by extending the set of axioms.
I am working on a Haskell library alga implementing the algebra of graphs and intend to release it soon. Let me know if you have any suggestions on how to improve the above code snippets.
Instroduced
Fixed, thank you!
Very nice, can’t wait for the follow up post.
Thank you! I’ll write a follow up post during the weekend.
I don’t get the decomposition axiom tbh. I don’t see how that can old for any plain old graph.
Do you agree that we can represent any plain old graph with a pair (V, E) of vertices and edges?
Then the decomposition axiom simply states that the operators + and -> on such pairs satisfy the equality x -> y -> z = x -> y + x -> z + y -> z. I haven’t shown the proof, but it is not very complex: simply substitute + and -> in the axiom with the definitions.
The way I defined + and -> is not the only way to make the decomposition axiom work, there may be other suitable definitions.
Hope this clarifies things!
You say that connect yields directed edges, but clique [1,2] only gives one of the two directed edges in the clique.
You are right. The implementation I gave creates a ‘directed clique’, which is usually called a ‘tournament’ in graph theory: https://en.wikipedia.org/wiki/Tournament_(graph_theory).
If you would like to have edges going in both directions then we can introduce ‘biconnect’:
biconnect x y = connect x y `overlay` connect y x
and then we can have
clique’ = foldr biconnect empty . map vertex
This clique’ is probably what you would like to have.
Instead of “connect” and “biconnect”, one could use “direct” and “connect” respectively (consistency would suggest “undirect” instead of “connect”, but that is not much of a verb).
I prefer to use “connect” in the definition of class Graph because it is relatively neutral w.r.t. to edge directedness and can therefore be used both by directed graphs and undirected graph instances. But I see your point!
I quite like the treatment you’ve made of graphs in this post. However, in my opinion you’ve made the common over-simplification of edges by stating that they are ordered pairs (thus leading to a directed graph treatment). My preference is for one by Tutte [1]:
> A graph G consists of a set E(G) of edges and a (disjoint) set V(G) of vertices, together with a relation of incidence which associates with each edge two vertices, not necessarily distinct, called its ends.
Note that this definition lends itself much nicer to different types of graphs (undirected, planar, easily extendable to hypergraphs, etc.).
I’m interested to see how your library progresses and can help to try have support for fgl graphs in there; when it comes time for testing then fgl-arbitrary might be able to help [2].
[1]: http://www.sciencedirect.com/science/article/pii/S1385725861500455
[2]: https://hackage.haskell.org/package/fgl-arbitrary-0.2.0.2/docs/Data-Graph-Inductive-Arbitrary.html#g:5
Many thanks Ivan! Your (and Martin Erwig’s) fgl library is cool and I will definitely investigate how this algebra relates to fgl.
Regarding edges: note that the algebra itself doesn’t define edges in any particular way, and I therefore believe it can also describe graphs expressed using Tutte’s edge definition, which is indeed more general. As I mentioned, if you add an axiom of commutativity of ->, turning it into something like –, then the edges become undirected, i.e. unordered pairs 1 — 2 = 2 — 1.
Do you know any other instances similar to this whereby the identity element is shared by the two operations of what would otherwise be a semiring?
Have you explored any alternative natural operations for graph constructions, particularly those which would give one a semiring structure?
Not sure what your grand views are for this, but it would also be interesting to explore this from a categorical perspective in the next post… What are the functors between the category of MONOIDS and GRAPHMONOIDS, RINGS and GRAPHALGEBRAS, and are these appropriate questions to ask?
It certainly has me thinking.. thanks for the post!
You are asking very interesting questions!
I didn’t find any semiring-like algebras that have the same identity for the two operations (apart from some trivial ones). I’m still looking — please let me know if you come across anything like this.
There are many graph-like semirings out there, and I looked at many of them, e.g. the Kleene Algebras. So far I haven’t found any algebra with the decomposition axiom.
Why do I need the decomposition axiom? I haven’t actually provided a motivation for it (or why I defined overlay and connect the way I did). So, here it is: I wanted to represent *arbitrary* graphs and *only* graphs using expressions, so expression x -> y -> z should be just a graph, and therefore I should be able to construct it from smaller pieces. But what should these pieces be? We can’t make it equal to x -> y + y -> z because then our definition of -> would be partial, since it’s not always possible to find ‘the last node’ (in this case y) that we need to connect to z (e.g. the left-hand side may be a cycle). The only choice that seems to work is to connect all vertices on the left-hand side to all vertices on the right hand side, which leads directly to the decomposition axiom.
As for your category theory question: I know very little about category theory, so I don’t think I can do anything sensible in this direction. In fact, I still need some time to even understand your question 🙂 But I’d be very interested in hearing from you if you come up with anything — feel free to get in touch with me by email and we can continue this discussion.
Very nice algebra!
I defined the algebra in Coq and proved the properties you hinted at (idempotence, one is a unit for plus) in Coq in the following gist: https://gist.github.com/olivierverdier/b4e2bfba103246842eceea711351bdf7
Awesome, many thanks! We also have a formalisation in Agda, which can be found in Section 4.6 here in case you are interested:
http://async.org.uk/tech-reports/NCL-EEE-MICRO-TR-2014-191.pdf
(Note, this is a slightly more complicated version of the algebra with an additional operator.)
Since there may be many representations of the same graph through the -> and + constructions, have you guys thought about what a good “canonical” representation could be? The simplest representation seems to be the disjunctive form (overlays of connections of singular vertices, such as that computed by fromEdgeList), but it seems like the axioms can be treated as reduction rules for certain definitions of canonicity. For example, a “sum of maximal cliques” (but not maximum) representation can be computed by inverting the distributivity and the decomposition laws. In general, since we don’t really know how to compute graph isomorphisms easily, it’s probably a tall order to define some uniquely canonical representation that is also efficient to compute/reduce to.
A compact canonical representation is one of the most interesting questions about this algebra, at least I’ve spent quite a lot of time thinking about this.
So far I’ve only got the fromEdgeList-like canonical form: it’s easy to rewrite any expression into an (ordered) overlay of vertices, and pairs of vertices, just like you mentioned.
The problem with the approach based on maximal cliques is that a graph can have an exponential number of them: O(3^(n/3)), I believe, so without some further tricks this is even less compact than the simple canonical form above.
One promising approach is graph ‘modular decomposition’:
https://en.wikipedia.org/wiki/Modular_decomposition
The underlying algorithms are not straightforward, and I haven’t yet had time to evaluate how compact such a representation would be in practice — that may be a good & fun project!
By the way, I don’t think that the graph isomorphism problem is an obstacle on the way to a compact canonical form: we don’t need to match vertices of two given graphs! In other words, graph 1 -> 2 is different from 2 -> 3 even though they are isomorphic.
Hi Andrey, very glad to see you writing about this! One small comment, I think it actually takes a bit more work to show that Relation is the free model of this algebra. Formally, saying Relation is the free model means that for any other model M there is a unique homomorphism from Relation to M. To show this you would have to prove e.g. that every expression built up using the algebra can be simplified to some canonical form using the laws, and then show that those canonical forms can be represented using relations. And indeed, that is true, since every expression can be reduced to a sum of individual vertices and edges, which I know you have proved in some of your papers on the subject.
Hi Brent, glad that you found the blog post! 🙂 Our brief chat at ICFP gave me an impulse to come back to the algebra and write about it. Thank you!
Yes, I agree that proving that Relation is a free model of this algebra actually requires some extra work that I skipped here for the sake of brevity.
I am subscribed to your blog so I saw the post as soon as it was published, but only now got around to reading it. =) Looking forward to more posts on the subject.
Cool =) The next post is up. Will do my best to keep up the momentum!
This page was a revelation! In August 2015, I got introduced to functional programming during the course of my PhD work in network information theory at the Indian Institute of Technology Bombay, and have wanted to explore the subject in greater depth for a long time. At the time, I came across of Erwig’s inductive graph approach. I will be submitting my thesis coming Monday (Mar 5, 2018) and maybe after a short break, hope to turn my attention full time to functional programming.
Thank you! If your thesis is about graphs and functional programming, I’d be interested to have a look.