Constructing Hard Examples for Graph Isomorphism

We describe a method for generating graphs that provide difficult examples for practical Graph Isomorphism testers. We first give the theoretical construction, showing that we can have a family of graphs without any non-trivial automorphisms which also have high Weisfeiler-Leman dimension. The construction is based on properties of random 3XOR-formulas. We describe how to convert such a formula into a graph which has the desired properties with high probability. We validate the method by an experimental implementation. We construct random formulas and validate them with a SAT solver to filter through suitable ones, and then convert them into graphs. Experimental results demonstrate that the resulting graphs do provide hard examples that match the hardest known benchmarks for graph isomorphism.


Introduction
Graph Isomorphism (GI) is the problem of deciding, given two graphs G and H whether there is a bijection between their sets of vertices V (G) and V (H) respectively, that takes edges of G to edges of H and non-edges of G to non-edges in H. In short, it asks if G and H are the same, up to a renaming of vertices. The problem is of great interest in the field of complexity theory as it is among the few natural problems in NP that are not known to be in P nor known to be NP-complete. Babai's recent result [9] places the problem in quasi-polynomial time, further cementing its status as a candidate NP-intermediate problem.
While the complexity of GI is interesting from the theoretical standpoint, in practice the problem is largely solved. That is, there exist programs which efficiently deal with instances of graph isomorphism that arise in practice, for instance in searching through chemical databases or in image processing. Significant among these effective graph isomorphism testers are the programs Traces and nauty, available in a common distribution (see [20]). It remains a challenge for the theoretician to examine the algorithms behind these programs and determine their worst-case behaviour. Indeed, in the long version of his paper, Babai asks the question [8,Sec. 13.5]: The question is, does there exist an infinite family of pairs of graphs on which these heuristic algorithms fail to perform efficiently? The search for such pairs might turn up interesting families of graphs.
We address this question and provide a means of constructing just such a family of graphs, including an implementation and experimental results.
Computationally, the graph isomorphism problem is equivalent to the problem of determining the orbits of the automorphism group of a graph G. That is, given a graph G, we wish to partition V (G) into the minimum number of classes so that for any pair of vertices u, v in the same class there is an automorphism of a graph G that takes u to v. We call this the orbit partition of the graph G. The fundamental algorithm underlying nauty as well as Traces, like many practical approaches to the graph isomorphism problem, relies on steadily refining a partition to arrive at the partition into orbits. It does this through a process of (i) vertex refinement combined with (ii) individualization and (iii) factoring of automorphisms of the graph. The process of vertex refinement, also known as colour refinement, is a highly efficient method that is known to achieve the orbit partition on almost all graphs [10] but fails miserably on regular graphs, for example. Where vertex refinement fails, the programs use individualization, which is the process of selecting (i.e. individualizing) a particular vertex and placing it in its own class and then repeating vertex refinement until evenutually a partition into singleton sets is obtained. With backtracking, the structure of the parition into orbits is revealed. The main differences between the various graph isomorphism solvers (not only nauty and Traces but also bliss and conauto) are precisely in how the vertices are selected. This process is inherently exponential in the number of selections that need to be made. However, the search space is radically cut down if we can identify nontrivial automorphisms of the graph and factor the graph suitably, which Traces, in particular, does effectively. More details are given in Section 2 below.
In theory, the vertex refinement algorithm is subsumed by the k-dimensional Weisfeiler-Leman (k-WL) method, which works by refining a partition of the k-tuples of the vertices of a graph G (see [16,Sec. 2] for a good account of the history of this method). Taking k to be large enough (of the order of the number of vertices in G), the k-WL method gives exactly the orbit partition, but this is (for k ≥ 2) not a practical method and rarely used in solvers. However, what is interesting from the point of view of worst-case analysis is that the k-WL method serves as a way of bounding the number of individualizations we need to determine the orbit partition in a graph G. To be precise, suppose G has k vertices v 1 , . . . , v k such that when each of them is individualized, the vertex refinement procedure converges to the discrete partition of G, then we can also determine the orbit partition by the (k + 2)-WL method. Since it is known, through the construction of Cai, Fürer and Immerman [16] (called CFI graphs below), that there are, for every k, graphs on which the k-WL method fails to give the orbit partition, it follows that there is no constant bound on the number of individualizations needed, in combination with vertex refinement, to obtain the orbit partition of a graph. Hence, there is no polynomial bound on the running time of a graph isomorphism algorithm based solely on vertex refinement and individualization.
However, Traces has another element in its armoury, and that is that it detects automorphisms while constructing the orbit partition, using these to factor the graph and therefore cut down the search space. This means that the running time is not exponential in the number of individualizations but is potentially divided by the size of the automorphism group of the graph. Indeed, the CFI graphs are among the standard benchmarks considered in [20] and they prove to be not too difficult for the program to handle as they have many automorphisms. This led the first author of the present paper to suggest (at the December 2015 Dagstuhl seminar on Graph Isomorphism) that the way to construct hard families of graphs and answer Babai's quetsion, is to find graphs whose Weisfeiler-Leman dimension is large but which have no non-trivial automorphisms. A construction of structures satisfying this property, known as multipedes is given in the work of Gurevich and Shelah [17]. These structures can be easily turned into graphs to yield the desired family. However, while the theoretical construction guarantees the existence of such graphs, it turns out that constructing actual instances, even for small values of k, leads to very large graphs. Thus, in order to construct families that can be used for practical benchmarking of GI solvers, a refined analysis is required. One such approach was taken by Neuen and Schweitzer in [22] where the multipede construction was combined with size reduction techniques.
In the present paper, we give an alternative construction of such graphs which proves very effective. Instead of the multipedes of Gurevich and Shelah, we start with random systems of equations over the 2-element field. This is based on the insight from [4] that the construction of CFI graphs really codes such systems in the graph construction. We use a way of lifting these systems to graphs which have the property that as long as the original system has no non-trivial solutions, the resulting graph has no non-trivial automorphisms. Moreover, as long as the original system is k-locally satisfiable (a precise definition will follow), the orbit partition of the graph cannot be obtained by the l-WL method for some large l. It turns out that a random system has both properties: of having no non-trivial solutions and being k-locally satisfiable for sublinear values of k. The theoretical basis of this construction is given in Section 3. The main conclusion is Theorem 1 which shows that we can construct families of graphs which are asymmetric, i.e. have no non-trivial automorphisms, but have linear Weisfeiler-Leman dimension. Moreover, we define a randomized construction that produces such graphs with high probability.
It is instructive to compare our theoretical result with that of Neuen and Schweitzer [21,22]. They also give a construction of families of graphs which provably require exponential time on a solver based only on individualization and refinement. Here, they take refinement to be any class of procedures that respect k-WL equivalences for some fixed k. Their construction is based on the multipede construction of Gurevich and Shelah, converted into graphs. This construction guarantees that the graphs are asymmetric, and have unbounded Weisfeiler-Leman dimension. While the dimension is unbounded, the multipede construction does not yield linear dimension. Indeed, as noted above, the graphs obtained for small values of k are rather large. Therefore, in order to obtain exponential lower bounds Neuen and Schweitzer employ size reduction techniques and explicitly consider the shape of the individualization search tree. By contrast, we directly establish a linear lower bound on the WL-dimension of the graphs and this immediately leads to an exponential lower bound on the search tree size under any target cell selection strategy. The experimental results reported in [21] are comparable with the ones we obtain.
To implement our theoretical construction, we leverage a highly developed SAT solver. This enables us to search for systems of equations which have no non-trivial solutions by coding them as 3SAT instances. While we do not directly check for k-local satisfiability, we use a proxy which is checking the speed improvement in the SAT solver that is obtained by the use of Gaussian elimination methods. This filter allows us to select those systems which are most likely to be locally satisfiable. We present details of the method in Section 4. Finally, we present some experimental results in Section 5 which show that the method does, indeed, yield instances which are hard, especially for Traces, but also for other isomorphism solvers. We have created some benchmark sets of such graphs, and one of these is now available through the webpage for nauty and Traces: http://pallini.di.uniroma1.it/Graphs.html. However, we consider the main contribution of the present work to be the method, using a SAT solver, which gives the ability to generate such large and hard example graphs at will.
The experimental work reported here was carried out in three stages. The first set of experiments were performed in April-June 2017 as part of the second author's Master's project. The full code of the implementation, all data generated in the experiments, as well as a project write-up can be found here: https://github.com/kkcam/graph-ismorphism. In particular, a number of graphs in dreadnaut format (.dre) can be downloaded from the site to run directly with nauty/Traces. A second set of experiments on the same graphs, involving a wider range of solvers, was carried out in January-February 2019, in response to a request from a referee. The results are available from the same website. Finally, another set of graphs using this protocol was created by Yui Chi Richie Yeung. Those graphs and the results are available at https://github.com/y-richie-y/sat_cfi/. A selection of results from all three sets of experiments is given in Section 5.
The theoretical work detailed in Section 3 is based on the first author's project suggestion, given as an appendix in the Master's thesis available at https://github.com/kkcam/graph-ismorphism.

Background
Partitions and Isomorphisms. Given a set S and two partitions P = {P 1 , . . . , P s } and Q = {Q 1 , . . . , Q t } of S, we say that P is a refinement of Q (or equivalently, that Q is coarser than P) if for every P ∈ P there is a Q ∈ Q such that P ⊆ Q. It is a proper refinement if s > t. We say that a partition P is discrete if every part is a singleton.
We always consider undirected, loopless, simple graphs. That is, a graph G is a set of vertices V (G) along with a set of edges E(G) where each edge e ∈ E(G) is a two-element set e = {u, v} ⊆ V (G), with u = v. Where G is clear from context, we simply write V and E for the vertex and edge set respectively. For a set C, a C-coloured graph is a a graph G together with a function χ : V → C. For a C-coloured graph (G, χ), we refer to the partition of V given by {{v | χ(v) = c} | c ∈ C} as the χ-partition of V .
Given two C-coloured graphs (G, χ) and (H, δ), an isomorphism from the first to the second is a bijection ι : V (G) → V (H) such that for each u, v ∈ E(G), {u, v} ∈ E(G) if, and only if, {ι(u), ι(v)} ∈ E(H) and χ(v) = δ(ι(v)). An automorphism of (G, χ) is an isomorphism from (G, χ) to itself. The orbit partition of (G, χ) is the coarsest partition of V (G) such that if u and v are in the same part of the partition, there is an automorphism ι of (G, χ) such that ι(u) = v. Consider the three computational problems: (1) given a pair of graphs G and H, decide if there is an isomorphism from G to H; (2) given a pair of coloured graphs (G, χ) and (H, δ) decide if there is an isomorphism between them; and (3) given a coloured graph (G, χ), output its orbit partition. It is known that these three problems are polynomially-equivalent, which is to say that there are polynomial-time reductions between any pair of them (see, for instance [24]). As such, we treat them as equivalent and mostly concentrate on the third. A graph G, or a coloured graph (G, χ), is called asymmetric if its only automorphism is the identity map. Some authors call such graphs rigid, but we employ the terminology of Babai [7] who reserves the latter term for graphs without non-trivial endomorphisms.
Refinement and Individualization. The vertex refinement procedure is an algorithm that produces, given a C-coloured graph (G, χ), the coarsest partition P of V refining the χ-partition such that if u, v ∈ P ∈ P, then for every Q ∈ P, u and v have the same number of neighbours in Q. Note that the partition of V produced by the vertex refinement procedure is coarser than the orbit partition. For many graphs G it is, in fact, the orbit partition but, for example for regular graphs with no colouring, it can be properly coarser (see [10]).
Given a C-coloured graph (G, χ) and a vertex v ∈ V , let c be a new colour that does not appear in C and let χ : V → C ∪ {c} be defined as the colouring with χ (v) = c and χ (u) = χ(u) for u = v. Then, the partition of V obtained by vertex refinement from (G, χ ) is called the vertex refinement of (G, χ) with individualization of v. More generally, given a sequence I = (v 1 , . . . , v k ) of vertices in V , the vertex refinement with individualizations of I is the algorithm that produces the vertex refinement of (G, χ ) where χ is a C ∪ {c 1 , . . . , c k }-colouring of V (G) with χ (v i ) = c i and χ (v) = χ(v) for v ∈ I; and c 1 , . . . , c k ∈ C. This partition is not in general coarser than the orbit partition. The aim of the individualization-refinement procedure is to find the smallest (in a precise sense) I such that the refinement with individualizations of I yields a partition into singleton sets. From this, it is possible to produce the orbit partition of (G, χ). For details, we refer the reader to [20,Sec. 2].
High-dimensional Weisfeiler-Leman. For each integer k ≥ 2, the k-dimensional Weisfeiler-Leman algorithm gives a partition of V k that is the coarsest partition P satisfying the following stability condition: if u, v ∈ V k are tuples in the same part of P and (P 1 , . . . , P k ) is a k-tuple of parts of P, then the order-preserving map from u to v is an isomorphism of the induced subgraphs of G and Here, u[x/i] denotes the tuple obtained by substituting x for the ith element of u and |S| denotes the cardinality of a set S. We write ≡ k to denote the equivalence relation corresponding to this partition. Again, this partition is necessarily coarser than the partition of V k into orbits under the action of the automorphism group of G, since the orbit partition clearly satisfies the stability condition. Also, for sufficiently large values of k, in particular certainly for k ≥ n − 1, the partition given by ≡ k is the orbit partition. So, for any graph G, we define the Weisfeiler-Leman dimension of G, denoted WL(G), to be the least k such that the partition induced by ≡ k on V k coincides with the orbit partition.
Cai, Fürer and Immerman [16] showed that there is no fixed k such that WL(G) < k for all graphs G. Indeed, they show a linear lower bound on WL(G). To be precise, they construct for each k a pair G and H of non-isomorphic graphs with O(k) vertices such that G and H cannot be distinguished by the isomorphism test based on k-dimensional Weisfeiler-Leman equivalence. This implies that WL(G H) > k, where G H is the disjoint union of G and H. Through this operation of disjoint union, the problem of testing graph isomorphism reduces to that of constructing the orbits of the automorphism group and, as stated in the introduction, here we adopt the latter perspective.
The k-dimensional Weisfeiler-Leman isomorphism test has been extensively analyzed in theoretical studies of the graph isomorphism problem. It has many equivalent and strikingly different characterizations, arising in algebra, combinatorics, logic and optimization. In particular, it is known that in a graph G, u ≡ k v if, and only if, there is no formula of C k+1 (first-order logic with counting quantifiers using at most k + 1 distinct variables) that distinguishes u from v. This relation was given a useful characterization in terms of a k-pebble bijective game by Hella [18]. The game is played by two players called Spoiler and Duplicator on a graph G. The position of the game at any moment in time consists of two k-tuples of vertices u and v. In each move, Spoiler chooses a value of i ∈ [k] and Duplicator responds with a bijection π : V → V . Spoiler then chooses a vertex x ∈ V and the new position is u[x/i] and v[π(x)/i]. Spoiler wins in any position if the ordered subgraph of G induced by the tuple u is not isomorphic to the ordered subgraph induced by v. The result of Hella [18] essentially tells us that Duplicator has a strategy to play forever without Spoiler winning in the (k + 1)-pebble bijective game starting at position u, v if, and only if, u ≡ k v.
The connection between the Weisfeiler-Leman dimension of a graph and the refinement and individualization procedure is the following. If a graph G contains k vertices v 1 , . . . , v k such that individualizing them results in the vertex refinement procedure on G producing the discrete partition, then WL(G) ≤ k + 2. This is most easily seen through the characterization of the Weisfeiler-Leman dimension in terms of counting logic. That is, when such a set of k vertices exist, we can show that any pair x and y of vertices that are not distinguished by any formula of C k+3 are in the same orbit of the automorphism group of G. The argument is as follows. We know that the vertex refinement procedure yields a partition into C 2 -equivalence classes. By the assumption that individualizing v 1 , . . . , v k results in the discrete partition, we have that for each vertex x, there is a formula φ(x) of C 2 using constants for v 1 , . . . , v k that is only true of x and of no other vertex in G.
Write φ x for the formula of C k+2 where the constants in φ have been replaced with new variables (which we will also denote v 1 , . . . , v k for convenience). Now consider the formula Note, here, while we have a different formula φ x for each x ∈ V (G), we assume that we re-use the variable x. However, where we write φ y , we replace it with the new variable y. This ensures that the total number of variables used is at most k + 3. It can then be verified that the formula ∃v 1 , . . . , v k θ describes the graph G uniquely, up to isomorphism. Moreover, ∃v 1 , . . . , v k (θ ∧ φ x ) is a formula of C k+3 that is only satisfied in G by vertices in the orbit of x.
Multipedes. Gurevich and Shelah [17] show how to construct a class of finite structures which is (1) axiomatized by a sentence of first-order logic; (2) contains only structures with no non-trivial automorphisms; and (3) such that no formula of fixed-point logic defines a linear order on all structures in this class. From our point of view, the relevant aspect of this construction is that it gives, for each value of k, an asymmetric structure M k , that is one which has no non-trivial automorphisms, but on which the partition into ≡ k -classes is non-trivial. That is to say, even though every element of the orbit partition of M k is a singleton, there are pairs of distinct elements a, b in M k such that a ≡ k b. The structures in question are called 3-multipedes in [17].
It seems at first sight that this provides suitable hard examples for graph isomorphism testers such as Traces. Strictly speaking, the 3-multipedes are not graphs, but they can be translated to graphs by standard methods (see [19,Theorem 5.5.1]), preserving the relevant properties: unbounded Weisfeiler-Leman dimension and no non-trivial automorphisms. The absence of non-trivial automorphisms means that factoring by automorphisms cannot be used to speed up search by trimming the tree, while the unbounded dimension guarantees that there is no upper bound on the number of individualization steps needed to make the vertex refinement procedure yield the discrete partition. However, actually constructing instances of such multipedes turns out to be difficult. The proof in [17] does not actually show how to construct the structures M k . Rather, it shows that for all large enough values of n, a random structure on n elements, under a suitable skew probability distribution µ n , has the right properties. However, the probability grows rather slowly with n. Indeed, the smallest value of n at which the probability is non-zero is possibly exponential in k. An experimental attempt to sample from the distribution µ n failed to produce interesting examples at values of n up to a few thousand [2].
In the present paper, we consider an alternative construction, based on similar principles, of graphs whose Weisfeiler-Leman dimension is linear in the number of vertices (as with the CFI graphs), and which have no non-trivial automorphisms. The construction is again randomized, but based on a simple and well-understood probability distribution. Furthermore, a use of a SAT solver enables the quick generation of examples by filtering graphs that are sampled from the distribution.

The Construction
In this section, we describe a construction that yields, for each k, a graph G k with O(k) vertices with the property that G k is asymmetric and has Weisfeiler-Leman dimension at least k. The proof that such graphs exist is derived from known results in the literature, and here we show how to derive it, giving the necessary definitions to understand the background. The starting point of the construction is the observation that we can define instances of 3-xor that are k-locally consistent but unsatisfiable.
XOR formulas. Fix a countable set X of Boolean variables. We use capital letters X, Y, . . . to range over this set. A 3-xor-formula is a finite set of clauses, where each clause contains exactly 3 literals, each of which is either a variable X or a negated variableX.
We say that a 3-xor-formula φ is satisfiable if there is an assignment T : X → {0, 1} of truth values to the variables X such that in each clause of φ, an even number of literals is made true.
Given a 3-xor-formula φ, we can construct a system of linear equations over the two-element field F 2 . That is, for each clause C of φ we construct the equation x + y + z = c where x, y, z are the variables occurring in the literals of C and c is 1 if an odd number of them appear negated and 0 otherwise. It is easily verified that this system of equations has a solution if, and only if, φ is satisfiable. Note that two distinct clauses may give rise to the same equation. Say that two clauses are equivalent if they give rise to the same equation. In the sequel, we will use the terminology of 3-xor formulas and of systems of linear equations interchangeably, according to which is convenient.
So, we can think of a 3-xor formula with n variables and m clauses as a system of equations Hx = b where H is an m×n matrix, x the tuple of n variables and b ∈ F m 2 the m-tuple of right-hand sides of the equation. We say the system is homogeneous if the right-hand side of every equation is 0. This corresponds to a 3-xor formula in which no variable appears negated. A homogeneous system is always satisfiable, as it is satisfied by the assignment of 0 to every variable. Note that a homogeneous system is completely specified by a collection of 3-element sets of variables, with one set for each equation, containing the three variables that appear in it.
Random XOR formula. For fixed positive integers m, n we write F (m, n) for the set of all 3-xor-formulas over the variables X 1 , . . . , X n containing exactly m inequivalent clauses. We also write F(m, n) for the uniform probability distribution over F (m, n). It is known that, for large enough values of m and n, with m > n, a random formula drawn from this distribution is unsatisfiable (see [23]). That is to say that as n increases, for all m > n, the probability that a formula drawn from the distribution F(m, n) is satisfiable tends to 0.
We are interested in 3-xor-formulas that are unsatisfiable but k-locally consistent, for suitable integer k. For our purposes, we define k-local consistency by means of the following pebble game, played by two players called Spoiler and Verifier. The game is played on a 3-xor-formula φ with k pebbles p 1 , . . . , p k . At each move Spoiler chooses a pebble p i (either one that is already in play, or a fresh one) and places it on a variable X appearing in φ. In response, Verfier has to choose a value from {0, 1} for the variable X. If, as a result, there is a clause C such that all literals in C have pebbles on them and the assignment of values to them given by Verifier results in C being unsatisfied, then Spoiler has won the game. Otherwise the game can continue. If Verifier has a strategy to play the game forever without losing, we say that φ is k-locally consistent.
It is also known that for all k, the probability that a random formula drawn from F(m, n) (with m > n) is k-locally consistent tends to 1 as n increases. This was proved for formulas of 3sat rather than 3-xor in [3], but a similar analysis shows the result also for 3-xor. Such an analysis can be found in [5,Lemma 4].
Unique Satisfiability. As noted above, a homogeneous system of equations is always satisfiable, as it is satisfied by the assignment of 0 to every variable. We say that the system is uniquely satisfiable if this is the only solution to the system. It is easy to see that the set of solutions to Hx = 0 is exactly the null space of the matrix H, as a subset of the vector space F n 2 . In particular, the system is uniquely satisfiable if, and only if, H has rank n.
Define H(m, n) to be the set of all homogeneous systems of equations with m clauses and n variables, and H(m, n) for the uniform probability distribution over this set. We use the following fact about this distribution, established in [11] Lemma 1. There is a threshold t > 1 such that, for any α > t, the probability that a random system drawn from H(αn, n) is uniquely satisfiable tends to 1 as n increases.
A homogeneous system is necessarily satisfiable, and so also k-locally consistent for all k. If it is uniquely satisfiable, it has no solutions when we require some fixed variable X i to take value 1. However, for a randomly chosen such system, adding the condition X i = 1 leaves it k-locally consistent for small values of k. To be precise, there is a constant γ such that if φ is a system Hx = 0 chosen at random from H(αn, n), then with high probability there is some i such that if φ i is the system obtained from φ by adding the equation X i = 1 then φ i is γn-locally consistent. This is a much weaker statement than proved in [5,Lemma 4] where it is shown that changing the right-hand sides of a random subset of the vertices to 1 still leaves the system γn-locally consistent.
CFI construction The construction of Cai, Fürer and Immerman [16] provides us with examples of pairs of non-isomorphic graphs which are not distinguished by the k-dimensional Weisfeiler-Leman test. Inspired by this, a construction described in [1,Prop. 32] shows how to convert any k-locally consistent system of equations in H(m, n) to one that cannot be distinguished by the k-dimensional Weisfeiler-Leman test from its homogeneous companion. Here, the homogeneous companion of a system Hx = b is Hx = 0 (see also [5,Lemma 3] for a similar argument). Here we adapt the construction, to turn an arbitrary system φ into a graph G φ with the property that the local consistency of φ translates into a lower bound on the Weisfeiler-Leman dimension of G φ . Moreover, the unique satisfiability of φ guarantees that G φ is asymmetric.
For any 3-xor-formula φ, we define the graph G φ by the following construction. If φ has m inequivalent clauses and n variables, G φ has a total of 4m + 2n + 3(n − 1) vertices.
Let X 1 , . . . , X n be the n variables in some fixed order. For each clause C of φ, we define the four clauses C 000 , C 011 , C 110 , and C 101 by letting C 000 = C and C 011 , C 110 , C 101 be the three clauses equivalent to C obtained by negating exactly two of the literals of C. In particular, C 011 is obtained from C 000 by negating the second and third variables in the clause, C 110 by negating the first and second and C 101 by negating the first and third. Here, the terms "first", "second" and "third" refer to the numberical order of the variables chosen above.
We then have a vertex in G φ for each of these clauses. Also, for each variable X in φ, we have two vertices X 0 and X 1 . In addition, for each i with 1 ≤ i < n we have three vertices i l , i r , i s . Figure 1: Clause gadget in G φ corresponding to the clause X ⊕ Y ⊕ Z The edges are described as follows. For each clause C, if the literal X occurs in C, we have an edge from C to X 1 and if the literalX occurs in C, we have an edge from C to X 0 . There is an edge between X 0 and X 1 . These are depicted in Figure 1. These capture the essence of the CFI-like construction. In addition, for each i we also have the edges: (i l , i r ), (i r , i s ) and (i l , X 0 i ), (i l , X 1 i ), (i r , X 0 i+1 ) and (i r , X 1 i+1 ). These additional edges are depicted in Figure 2. The purpose of the additional gadget involving the vertices i l , i r and i s is to remove some symmetries on the graph by imposing the chosen order on the set of variables. Figure 2: Asymmetry gadget in G φ Now, fix a homogeneous system of equations φ, and let G φ be the graph obtained by the above construction. Also, let φ i be the system obtained by adding the equation X i = 1 to φ.
Proof. The proof follows along the lines of [5,Lemma 3] by showing a winning strategy for Duplicator in the bijective (k + 1)-pebble game played on G φ starting in the position u, v where u is the tuple consisting of the vertex X 0 i repeated k + 1 times and v is the tuple consisting of the vertex X 1 i repeated k + 1 times. We give a brief outline. Duplicator's strategy will always be to play a bijection that is the identity on the vertices i l , i r , i s . For each variable X it maps the set {X 0 , X 1 } to itself (though it may swap these two vertices) and for each clause C it maps the set {C 000 , C 011 , C 110 , C 101 } to itself (though it may permute these elements). Moreover the permutation induced on {C 000 , C 011 , C 110 , C 101 } must be either the identity or a permutation induced by swapping X 0 and X 1 for exactly two variables X appearing in the clause C. Call a bijection meeting these requirements well-defined.
Given a position u, v in the bijective game, we say that it is consistent if there is a well-defined bijection β taking u to v and such that for any C i ∈ u if β(C i ) = C j where C j is obtained from C i by swapping X 0 , X 1 and Y 0 , Y 1 then β(X 0 ) = X 1 if either of X 0 or X 1 is in u and similarly β( Consider now a consistent position u, v and let β be a well-defined bijection witnessing this. Let U be the set of variables of φ containing all variables X such that either X 0 or X 1 appears in u or X appears in some clause C such that one of C 000 , C 011 , C 110 , C 101 appears in u. Note that U has at most 3(k + 1) elements. Now, we define the map T : U → {0, 1} given by T (X) = 0 if β(X 0 ) = X 0 and T (X) = 1 if β(X 0 ) = X 1 . Duplicator's strategy is to ensure that T is a winning position for Verifier in the 3(k + 1)-local consistency game on the formula φ. That is to say, starting in the position where pebbles are placed on the variables in U , and Verifier's responses are given by T , Verifier can continue and play forever. This condition is satisfied by the initial position, as u is just the element X 0 i repeated k + 1 times, so U = {X i }, and T is the map taking X i to 1. But, the fact that φ i is k +1-locally consistent implies that this is a winning position for Verifier. Now, to see that the Duplicator can maintain the position, suppose at some stage in the bijective game, Spoiler moves pebble j. Duplicator needs to construct a well-defined bijection such that anywhere Spoiler places the pebble will result in a consistent position. Spoiler's move can be translated into a move in the local consistency game from the current position T . Duplicator's possible responses in that game define a function from the variables X to {0, 1} and this can be turned into a well-defined bijection.
Asymmetry Finally, we want to argue that if the homogeneous system φ is uniquely satisfiable, then G φ is asymmetric. Before giving the proof, we give some intuition. The graph G φ has two vertices X 0 and X 1 for each variable X of φ. Consider first a permutation π of these vertices which fixes each set {X 0 , X 1 }. This gives rise to a map T from the variables of φ to {0, 1} such that T (X) = 1 if, and only if, π exchanges X 0 and X 1 . To extend π to an automorphism of G φ would require us to permute the vertices corresponding to clauses in such way that fixed each set {C 000 , C 011 , C 110 , C 101 }. This can only happen if for exactly two of the variables X appearing in the clause C do we have T (X) = 1. In other words, this requires T to be an assignment satisfying φ. Are there other automorphisms of G φ that do not fix the sets {X 0 , X 1 }? The presence of the vertices i l , i r , i s ensures that these sets are effectively linearly ordered and no other automorphisms are possible. This is formally proved below.
Lemma 3. If φ is homogeneous, then it is uniquely satisfiable if, and only if, G φ is asymmetric.
Proof. Let α be any automorphism of G φ . Note that every clause vertex C has degree 3. Every variable vertex X 0 or X 1 has degree at least 4. Every vertex i s has degree 1. Thus, each of the following sets is fixed (set-wise) by α: • the set S = {i s | 1 ≤ i < n}: this is the set of vertices of degree 1; • the set R = {i r | 1 ≤ i < n}: this is the set of vertices adjacent to a vertex in S; • the set of clause vertices C : this is the set of vertices of degree 3 that are not within distance 2 of a vertex in S; • the set of variable vertices X : this is the set of neighbours of C; and • the set L = {i l | 1 ≤ i < n}: this is everything else.
Indeed, we can say more. Each of the sets S, L and R is fixed pointwise by α. If this were not so, there would be some i, j with i < j such that α(i s ) = j s (since the set S is fixed). Then, α(i r ) = j r (since these are the sole neighbours), α({X 0 i , X 1 i }) = {X 0 j , X 1 j } (since these are the only neighbours in X of i r and j r respectively), and so α((i + 1) l ) = (j + 1) l and α((i + 1) r ) = (j + 1) r . Proceeding by induction, we have for all k α((i + k) r ) = (j + k) r . Taking k large enough so that j + k > n ≥ i + k, we get a contradiction.
It also follows that, for each variable X, α({X 0 , X 1 }) = {X 0 , X 1 }. That is, α either fixes each of the two vertices or it interchanges them. Note that if α fixes all the variable vertices, then it is the identity everywhere, since no two vertices in C have the same neighbours in X . Let T be the assignment that maps X to 0 if α is the identity on {X 0 , X 1 } and 1 otherwise. We can check that T satisfies φ.
In the other direction, suppose there is a truth assignment T that satisfies φ. Now consider the map on X that exchanges the vertices X 0 and X 1 just in case T (X) = 1 and is the identity everywhere else. We extend this to a map on C as follows. For any clause C of φ, there are either 0 or 2 variables X in C for which T (X) = 1, since T is a satisfying assignment. In the first case, we let our map be the identity on {C 000 , C 011 , C 110 , C 101 } and in the latter case it is the unique permutation of this set induced by exchanging X 0 and X 1 for the two variables such that T (X) = 1. Finally, we also define the map to be the identity on the set L ∪ R ∪ S. It is now easy to see that this map is an automorphism.
We can conclude the description of the construction with the statement of a theorem. Theorem 1. There is a family of asymmetric graphs G k with O(k) vertices and Weisfeiler-Leman dimension k.
Proof. By Lemma 1, there is an α such that for sufficiently large n, a randomly chosen homogeneous system of equations φ from H(αn, n) is uniquely satisfiable with positive probability. By Lemma 3, G φ is then asymmetric. On the other hand, by [5,Lemma 5], φ i is γn-locally consistent for any i with high probability, which implies by Lemma 2 that G φ has WL-dimension at least 1 3 γn − 1.
Note that we really have proved, not only the existence of such a family, we have described a random process that produces such graphs with high probability.
Size Bounds While Theorem 1 tells us that the graphs G k have size linear in k, the actual size bounds are somewhat less clear. Specifically, there is a probabilistic element to the construction that relies on constructing a uniquely satisfiable formula φ such that for some i, φ i is k-locally consistent. What we know is that for any k, and any large enough n, a randomly constructed formula with n variables and m = αn clauses (α > 1) will have these properties with high probability. How big does n have to be before the probability becomes significant? A direct calculation does not give much cause for optimism.
Our argument for why a random formula is k-locally consistent with high probability is based on [5, Lemma 3], which in turn relies on the argument for expansion of a random 3-uniform hypergraph given in [13]. The key combinatorial bound in on width is [13][Lemma 6.6], attributed to [12]. A simple calculation shows that we need m to be around 10 7 in order to be guaranteed a width lower bound of 2 (i.e. that a formula will be 2-locally consistent with probability greater than 1/2). With m around 10 9 , we get reasonably high bounds on width, but these would be much larger graphs than we wish to consider. What we show in the rest of the paper is experimental results which show that even for much smaller values of n and m, a random sample produces graphs that are difficult for isomorphism testers. We combine random generation of 3-xor formulas with a filtering process which is aimed at improving the likelihood of getting locally-consistent instances. We describe this in the next section.

Experimental Setup
Section 3 established a theoretical result that shows that a random graph constructed in a particular way has the properties of being asymmetric and having high Weisfeiler-Leman dimension. In outline, we want to start with a random homogeneous 3-xor formula on n variables with αn clauses, i.e. a random 3-uniform hypergraph on n vertices and convert it into the graph G φ . This graph is asymmetric if φ is uniquely satisfiable (which occurs with high probability). Moreover G φ has Weisfeiler-Leman dimension at least k if φ is 3k-locally satisfiable, an event that also occurs with high probability. We now describe an experimental setup for producing such graphs (with up to a few thousand nodes) by starting with a random formula and applying a succession of filters. In the process we apply a number of heuristics in addition to the theoretical approach outlined above. To motivate and justify these heuristics, we now break up the construction in a slightly different way.
Asymmetry Consider φ, a homogeneous 3-xor formula with n variables X = {X 1 , . . . , X n } and m clauses C = {C 1 , . . . , C m }. We identify this system with a bipartite graph Φ with vertices C on the left and X on the right and an edge between X and C if X appears in the clause C. Note that because φ is homogeneous, the graph determines φ completely. The construction of the graph G φ described above can now be broken up into two steps. In the first step, we produce a graph G 1 φ by replacing each X ∈ X by two vertices X 0 and X 1 and each C ∈ C with four vertices {C 000 , C 011 , C 110 , C 101 } and connecting them as described above and illustrated in Figure 1. In the second step, we augment the graph G 1 φ with additional vertices i l , i r , i s for i ∈ {1, . . . , n}. Note that the reason for the second step is effectively to impose a linear order on the set of variables X and thereby ensure that the only automorphisms of G φ are the ones generated by satisfying assignments to φ. Thus, if the graph Φ is itself asymmetric, the second step is unnecessary as it is easily seen that in this case the only automorphisms of G 1 φ are generated from a satisfying truth assignment to φ by swapping X 0 and X 1 for all variables that are set to true. How likely is it that a random Φ (i.e. a random left-3-regular bipartite graph with m nodes on the left and n on the right) is asymmetric?
We can think of Φ as a 3-uniform hypergraph on n nodes, with m edges. It is easy to show (using the same methods that show that a random graph is asymmetric (see [15,Chap. 9])) that if m is roughly n log n, then a random 3-uniform hypergraph is, indeed, asymmetric with probability going to 1 as n increases. This is not the case when m = αn for constant α. However, our experiments show that in the range of values of n we worked with (n up to about 3000, and m between 1 and 5), there was a reasonably high probability of coming up with an asymmetric hypergraph. Moreover, if Φ is asymmetric, this is reasonably quick to check with a tester such as nauty/Traces as it is also highly probable that vertex refinement gives the orbit partition. It is only when Φ is converted to G 1 φ that we get high Weisfeiler-Leman dimension. Thus, for the purpose of the experiments, instead of generating the graphs G φ from Φ, we run a test on Φ to check if it has any non-trivial automorphisms. If it does, we discard it. Otherwise, we construct the graph G 1 φ and use that.
Unique Satisfiability Having generated a random homogeneous formula φ, we wish to check that it is uniquely satisfiable. For this, we use a highly developed SAT solver (CryptoMiniSat 5). This SAT solver is specifically optimized for cryptographic applications where the input often contains clauses that are formed by taking the XOR, rather than the disjunction, of a set of literals.
CryptoMiniSat combines standard SAT solving methods (based on DPLL) with the selective use of Gaussian elimination to attack such problems quickly. In our filter, we express φ as a conjunction of clauses where each clause is the XOR of three variables. We then test the satisfiability of φ ≡ φ ∧ X∈X X. That is, we add a clause that is the disjunction of all variables in X . Of course, φ is satisfiable if, and only if, φ has a satisfying assignment other than the all zeroes solution. In other words, φ is satisfiable if, and only if, φ is not uniquely satisfiable.

Local Consistency
We also want to ensure that the φ we select is k-locally consistent for a sufficiently large value of k. This is difficult to check directly. The problem of checking k-local consistency is known to be hard, requiring time exponential in k (see [14]) and we do not know of any good implementations. Instead, we used a simple heuristic that leverages the specific capabilities of CryptoMiniSat. Specifically, this package allows us to turn the use of Gaussian elimination on and off with an option. We check the satisfiability of the formula φ by running CryptoMiniSat twice, once with Gaussian elimination on and once with it off. If the former is significantly faster than the latter, we expect that the φ we have is a good candidate. Note, however, that this does not give us any bounds on the value of k for which φ might be k-locally consistent.
To justify this heuristic, note that the DPLL methods (with clause-learning and restarts) as employed in modern SAT solvers are subsumed by bounded-width resolution (see [6]). And formulas that are highly locally consistent but not globally consistent are exactly the ones that are difficult for bounded-width resolution [3]. On the other hand, Gaussian elimination is a method that specifically is fast for solving systems of linear equations which may well be locally consistent [4]. Thus, a formula on which Gaussian elimination is quick to determine satisfiability but DPLL-based methods are slow is a prime candidate for us.

Summary of Methodology
In summary, our methodology for generating hard examples for isomorphism testers is the following.
1. For a fixed value of n and m (roughly about 2n), take a set X of n variables.
2. Randomly select m 3-element subsets of X to form the left-3-regular bipartite graph Φ.
3. Check (using Traces) to see if Φ has any non-trivial automorphisms. If so, discard it. 4. If Φ has no non-trivial automorphisms, form the formula φ by taking the conjunction of the clauses X∼C X for each left-node C of Φ along with the clause X∈X X.

5.
Check if the formula φ is satisfiable using the SAT solver CryptoMiniSat with Gaussian elimination option on. If it is satisfiable, discard Φ.
6. Run CryptoMiniSat on φ a second time, with the Gaussian elimination option turned off. If this takes significantly longer to determine φ is unsatisfiable, then Φ is a prime candidate for the construction.
7. From Φ, obtain the graph G 1 φ by replacing each node C on the left-hand side with four nodes, and each node X in the right-hand side with two nodes and connecting them as described earlier.

Experimental Results
We can report on three sets of experimental results, using the construction described in the previous section. It should be noted that the main parameter that can be varied in the construction is the ratio m/n where m is the number of clauses and n the number of variables in the 3-xor formula. The ratio needs to be at least 1 to guarantee that the constructed formula is uniquely satisfiable. The closer it is to 1, the less likely it is to be uniquely satisfiable. Indeed, experimental runs show that at smaller values we had to sample from the distribution H(m, n) many times over to find a uniquely satisfiable instance. On the other hand, the larger the value of m/n, the harder it is to find locally satisfiable instances. While the theoretical results guarantee that the probability of finding such instances increases with n, it clearly does so more slowly for large values of m/n. Hence, in practice, one needs to fine tune the right value of the ratio to get good results.
It should also be noted that our construction does not determine the actual WL-dimension of the graphs. This seems to be a much harder computational problem than testing isomorphism itself (see [14] for bounds on the related problem of determining k-local consistency). Thus, while the heuristic filters we use are likely to produce graphs of large WL-dimension, we are unable to actually state the dimension of the graphs produced.
First Set. The procedure for constructing graphs described in the previous section was run on a cloud server, with the specification given in Table 1, during April-June 2017. The results show that graphs of a few hundred nodes produced using this procedure are very difficult for Traces in the sense that in most cases, at this size, the system timed out (with a timeout set at 3 hours) and failed to identify the automorphism orbits.
Some results of test runs on graphs produced by our construction are shown in Figure 3. These plot the time taken to run Traces on graphs with n nodes (n being the horizontal axis). The plot on the left of the figure is for graphs produced from 3-xor formulas with n variables and m = n clauses. The plot on the top right gives similar times for graphs produced from 3-xor formulas with m = 2n. Here, virtually all graphs we were able to produce with over 5000 nodes timed out on Traces. The complete data, including the graphs constructued, from this set of experiments is available at https://github.com/kkcam/graph-ismorphism. Some explanation of the nomenclature might be helpful. The graphs are classified according to the ratio m/n used in their construction. For instance con 2n is the collection of graphs with m = 2n, and con n is the collection of graphs with m = n. There is also a package con sml which contains for each n the graph with the smallest ratio m/n for which we were able to obtain a uniquely satisfiable 3-xor formula, which also gives an asymmetric bipartite graph Φ.
Second Set In February 2019, we ran a second set of experiments. These used the same database of graphs produced by the construction for the first set. This time the virtual machine setup was as described in Table 2. Again, with Traces, most of the larger graphs timed out. However, we also ran the same set of graphs through nauty, bliss and conauto, and all of these showed much better performance than Traces on the large graphs in this collection.
As a sample, we produce in Figure 4 the timing results on the graphs in the package con 2n for each of the four isomorphism solvers. The timeout is set at 120 m inutes and which can be seen to  occur frequently for Traces. It should be noted that most of the timeouts occurred due to memory limitations. It seems Traces requires large amounts of memory to process these graphs and the swapping required is what leads to the process timing out. While the other solvers (in particular bliss) were able to resolve the graphs quickly, they do show fast growth in run times as the graphs get larger. This is explored further in the third set of experiments. This generated a new collection of graphs using the same protocol as described in Section 4. These were run again through Traces, nauty, bliss and conauto. Graphs with up to 8000 nodes were generated with values of m/n in the range of 1.5-2. Once again, Traces frequently (almost invariably) times out on the larger graphs. The performance of the other three solvers is better, but exhibits fast growth in running time. For example we exhibit the results for nauty and bliss in Figure 5, with a logarithmic scale on the y-axis for running time. This is highly indicative of exponential growth in running time. Once again, bliss proved to be the fastest of the solvers. However, the performance depends heavily on which target cell selection heuristic is used. As bliss allows the use of different heuristics by setting parameters, results three different heuristics are displayed in Figure 5, in different colours. The best performance is for fl, which is "first largest non-singleton cell".
Discussion There are some important points one should highlight from the experimental results. The first is that Traces performs significantly worse on the graphs we construct than any of the other solvers. One possible explanation for this is the fact that the fundamental algorithm in Traces is a breadth-first search procedure of the individualization tree. Such a procedure may require shallowe trees but may, in principle, be more memory-intensive than a depth-first search. An important way that Traces avoids this drawback is the early identification of symmetries in the graph and using this to prune the search space. It is possible that the lack of symmetries in our graphs makes this pruning impossible leading to the solver running out of memory and timing out as a result. The construction described by Neuen and Schweitzer [21] is also aimed at constructing graphs which are asymmetric and have high WL-dimension. They also, similarly, report that Traces is rather slower on their graphs than other solvers. In contrast, Yeung reports that his implementation of the Neuen-Schweitzer construction yields graphs on which Traces performs faster than nauty. This warrants further investigation. Apart from Traces, an important distinction between the other solvers tested is their cell selection strategy. One of them, bliss, explicitly allows the user to choose the strategy in a call to the system. As we have seen, the choice of strategy can have a significant effect on the performance of the solvers. The results of the third set of experiments, especially on bliss, demonstrate the effect starkly. It would be instructive to understand how these cell selection strategies interact with the construction we have presented.
Our theoretical construction shows that there exists a family of graphs on which any solver based on individualization and refinement, along with factoring by symmetries, will take exponential time, no matter what cell selection strategy is used. Furthermore, it shows that sampling graphs at random from the distribution we describe will produce such graphs with high probability. We cannot verify that the graphs we select do indeed have high WL-dimension, which is why we need experimental validation, and the results do strongly suggest that the growth rate, on any solver, is exponential. In the first set of experiments, we constructed graphs with parameter m/n ≤ 2 only up to about 5000 nodes. For larger graphs, larger ratios were used. The third set of experimental results extended the construction of graphs with small ratio up to about 8000 nodes (e.g. n = 800, m = 1600), and the increase in running time is striking. The main reason for using larger ratios to generate the larger graphs in the first case was that at small ratios, finding large 3-xor formulas that are uniquely satisfiable becomes difficult, requiring large numbers of re-trials with fresh sampling. When this is combining with two runs of a SAT solver for each sampled formula, the time becomes prohibitive. Also, as we are not using the asymmetry gadgets described in Section 3, we are relying on checking that the random 3-left-regular hypergraph we select is itself asymmetric, and this may also involve repeated trials. Here the probability of success decreases with n for a constant ratio. The protocol was improved in the third set by checking unique satisfiability by a direct rank computation. Then, the SAT solver check was only performed for those formulas already known to be uniquely satisfiable, merely to record the difference caused by the use of Gaussian elimination.

Conclusion
We have described a theoretical construction of graphs that are provably difficult for a isomorphism solvers such as nauty and Traces. We have examined the construction experimentally and the results indicate that the graphs produced do indeed show exponential growth rates in running time on these solvers.
The main theoretical result combines known lower bounds for local consistency of 3-xor formulas with a construction inspired by the graphs of Cai-Fürer-Immerman and the related multipede construction to give a family of graphs which are provably asymmetric and of linear Weisfeiler-Leman dimension. This ensures that the running time grows exponentially with the size of the graphs. Our result also shows that a randomly constructed 3-xor instance is likely to yield a difficult graph with high probability. That is, the probability tends towards 1 as the graphs get larger.
The experimental setup uses SAT solver technology to create a series of filters which, combined with the random generation of 3-xor formulas produces graphs to follow the theoretical procedure. For the experimental set-up, we dropped some of the theoretical guarantees on asymmetry and local consistency and replaced them with heuristic tests. This is because we are unable to verify directly the WL-dimension of the graphs constructed.
The results show that our method quickly and consistently produces graphs that are difficult for Traces. Experiments with other solvers also support the conclusion that the growth rate of the time taken is exponential. This is comparable with the construction of hard graphs in [21].