A Symbolic Decision Procedure for Symbolic Alternating Finite Automata

We introduce Symbolic Alternating Finite Automata (s-AFA) as an expressive, succinct, and decidable model for describing sets of finite sequences over arbitrary alphabets. Boolean operations over s-AFAs have linear complexity, which is in sharp contrast with the quadratic cost of intersection and union for non-alternating symbolic automata. Due to this succinctness, emptiness and equivalence checking are PSpace-hard. We introduce an algorithm for checking the equivalence of two s-AFAs based on bisimulation up to congruence. This algorithm allows us to exploit the power of SAT and SMT solvers to efficiently search the state space of the s-AFAs. We evaluate our decision procedure on two verification and security applications: 1) checking satisfiability of linear temporal logic formulas over finite traces, and 2) checking equivalence of Boolean combinations of regular expressions. Our experiments show that our technique often outperforms existing techniques and it can be beneficial in both such applications.


Introduction
Programs that operate over sequences are ubiquitous and used for many different tasks, such as text processing [2], program monitoring [23], and deep packet inspection in networking [29].Being able to efficiently reason about these programs is a crucial task and many techniques have been proposed to do so.However, most of these techniques share a common link: they use finite automata and the related decision procedures.
Due to this reason, recently there has been renewed interest in automata theory, especially in the fields of security and programming languages [4,5,11,13,26,29].Despite these improvements, existing automata-based decision procedures are not as advanced as other decision procedures such as SMT solvers yet.For example, if one wants to use classic automata techniques to check whether there exists a string s that is accepted by all regular expressions in a set {r1, . . ., rn}, she will incur in two problems.
• Regular expressions operate over large alphabets with thousands of characters and most existing automata formulations do not cope well with large alphabets.• Intersecting the automata corresponding to the given regular expressions produces an automaton with number of states exponential in the number of regular expressions.Indeed, this problem is PSPACE-complete, but (as it happened in the world of SMT solvers) one might hope to find better solutions that rarely hit the worst-case complexity.
In this paper, we present a novel automata model together with a decision procedure to address both such limitations.Symbolic finite automata The problem of handling large alphabets is not a new one.Recently a new model, called symbolic fi-nite automata, has been proposed to address this limitation.Symbolic Finite Automata (S-FAS) [13,33] are finite state automata in which the alphabet is given as a Boolean algebra that may operate over an infinite domain, and transitions are labeled with first-order predicates over the algebra.Although strictly more expressive than finite-state automata, S-FA are closed under Boolean operations and admit decidable equivalence, as long as it is decidable to check satisfiability of predicates in the alphabet algebra.Symbolic alternating finite automata While S-FAS provide an elegant framework for abstracting away the alphabet structure, they have the same state complexities of classic finite automata.In particular, repeated S-FA intersections can result in S-FAS with exponentially many states.To solve this problem, we propose Symbolic Alternating Finite Automata (S-AFAs).S-AFAs add alternation to S-FAS by allowing transitions to contain Boolean formulas that describe the set of target states.For example, when an S-AFA is in state p and is reading a string s = a1 . . .an, the transition specifies that s is accepted from state p, if a1 is a lower-case alphabetic character and either the string a2 . . .an is accepted from state q1 or it is accepted from both q2 and q3.By adding alternation to S-FAS, S-AFAs obtain Boolean operations with linear complexity, which is in sharp contrast with the quadratic intersection and exponential complementation of S-FAS.Equivalence using bisimulation The succinctness of S-AFAs comes at a cost: equivalence and emptiness are PSPACE-complete problems.In the case of S-FAS, emptiness has linear complexity while equivalence is also PSPACE-complete.
In this paper, we propose a symbolic decision procedure for checking equivalence (and emptiness) of two S-AFAs and show that the procedure is effective in practice.The algorithm extends to S-AFAs the bisimulation up to congruence technique for solving the language equivalence problem for nondeterministic finite automata recently proposed by Bonchi and Pous [5].The algorithm belongs to a family of techniques based on the principle that two configurations of an automaton recognize the same language if and only if there is a bisimulation relation that relates them.Hopcroft and Karp's classical algorithm for checking equivalence of deterministic finite automata is a member of the family that employs a bisimulation up to technique [5,21].Rather than compute a bisimulation relation (which may be quadratic in the number of configurations of the DFA), Hopcroft and Karp's algorithm computes a relation R that is a bisimulation up to equivalence, in the sense that the equivalence relation generated by R is a bisimulation.Bisimulation up to congruence improves upon this technique by exploiting additional structure on the configurations of a nondeterministic automaton (the configurations of the NFA are finite disjunctions of states and if (say) if a1 R b1 and a2 R b2, then we may derive (a1 ∨ a2) R (b1 ∨ b2).
We extend this technique in two ways.First, we show how the framework can be applied to alternating automata by exploiting the lattice structure on S-AFA configurations to compute a small relation that generates a bisimulation, and by using a propositional satisfiability solver to compute the congruence closure.Second, we give a technique for extending the algorithm to symbolic alphabets by showing how to efficiently enumerate a set of represenative characters, in a style reminiscent of the way that SAT solvers enumerate all satisfying assignments to a propositional formula.
We implemented our algorithm and evaluated it on a comprehensive set of verification and security benchmarks.First, we used S-AFAs to check satisfiability of more than 10,000 LTL formulas appearing in [17] using the semantics of LTL over finite traces from [15] and compared our implementation against the tool Mona [19].Second, we used S-AFAs to check equivalence of Boolean combinations of complex regular expressions appearing in [reg] and compare against existing solutions based on nonalternating finite automata.Our experiments show that S-AFAs and our bisimulation technique often outperforms existing techniques and can be beneficial in both such applications.Contributions.In summary our contributions are: • Symbolic Alternating Finite Automata, S-AFAs, an automata model that can describe languages of strings operating over large and potentially infinite alphabets and for which Boolean operations have linear time complexity (Section 3. • An algorithm for checking equivalence of two S-AFAs, which integrates bisimulation up to congruence with propositional SAT solving (Section 4).• A modular, open-source implementation of S-AFAs, which allows the users to provide custom definitions of the alphabet Boolean algebra, and an efficient implementation of our equivalence algorithm (Section 5) • A comprehensive evaluation of our model and its decision procedures on more than 40,000 benchmarks from real world applications (Section 6).

Motivating example
In this section we present an application of S-AFAs and their equivalence algorithm in the context of spam detection.Spam detection is a notoriously hard task and spam filters are continuously modified to either take into account novel malicious behaviours or to relax existing assumption to handle overly restricting behaviours.While machine learning is the typical choice for spam detection, certain companies prefer using custom filters created using regular-expression-based black-and white-listing.The number of such filters can be very large and redundant expressions that cover already considered behaviours are often mistakenly added to the set of filters.Efficiently processing all such expressions can become a complicated task and it is therefore undesirable to avoid adding redundant filters to the list of processed ones.
For the sake of this example, we assume that a spam filter is given as a set of regular expressions R = {r1, . . ., rn} with the following string: a string is an instance of the spam filter R if it belongs to the language of each regular expression ri ∈ R. A simple example of a spam scenario is given in Fig. 1. 1When a new spam filter R = {r 1 , . . ., r n } is added to the set of all spam filters S, we might want to see whether there exists a spam filter R that is already in S and that subsumes R .Similarly, if we have a regular expression W describing known good inputs, we might want to check that none of the strings  in W is classified as spam.As is well known, we can perform these checks by building the finite automaton corresponding to each regular expression and by using the appropriate automata operations.However, as mentioned in the introduction we face some problems.
• Regular expressions operate over very large alphabets with up to 2 16 characters that can make classic automata operations highly impractical.
• Repeated Boolean automata operations can cause an exponential blow-up in the number of states of the resulting automaton.
• Checking equivalence of non-deterministic automata is a PSPACEcomplete problem and in general requires automata determinization.
S-AFAs, the model proposed in this paper explicitly addresses the first two issues and provides a practical algorithm for solving the third one.S-AFAs combine two existing automata models.
Symbolic finite automata extend classic automata to large and potentially infinite alphabets by allowing transitions to carry predicates over a given alphabet theory.For example, the third regular expression in Figure 1 can be succinctly represented using the symbolic finite automaton with transitions where q0 is an initial state and q1 is the only final state.Transitions carry a predicates and can be traversed when the processed symbol is a model of the predicate.For example a true-labeled transition can be traversed when reading any input character in the alphabet.This model enables succinct representation of large input alphabets.
Alternating finite automata extend classic automata by allowing Boolean operations to be performed at the transition level.For example, assume we are given three finite automata A1, A2, and A3 for the regular expressions in Figure 1 with initial states p1, p2, and p3.One can build an alternating finite automaton A accepting the intersection of the languages accepted by the three automata by simply taking the disjoint union of the three automata A1, A2, and A3 and setting p1 ∧ p2 ∧ p3 as the initial state.Informally, the initial state says that the alternating finite automaton A accepts a string s iff s is in the language of the state p1, of the state p2, and of the state p3.We defer the formalization of this concept to Section 3. Thanks to this transition structure, Boolean operations over alternating automata can be performed in linear time and space.These complexities are in sharp contrast with those of classic automata where intersection has quadratic complexity and complementation has exponential complexity.
Our model combines these two models and can handle large alphabets and perform Boolean operations in linear time and space.While S-AFA inherit the efficient Boolean operations from alternating finite automata, they also inherit the complexity of deciding equivalence and emptiness: both are PSPACE-complete.(In fact these two problems are reducible to each other in linear time.)However in this paper we propose an equivalence algorithm that can efficiently solve problems like the regular expression analysis described in this section on many practical instances that existing models cannot handle.
We considered real regular expressions taken from [reg] and we were able to prove equivalences of the form r1 ∩ r2 ∩ r3 = r4 ∩ r5 in milliseconds for instances for which the classic decision procedure based on non-alternating automata could not terminate in 20 seconds.We expand on this evaluation in Section 6.

Symbolic alternating finite automata
This section gives a formal description of symbolic alternating finite automata (S-AFA).The two key features of S-AFAs are that (1) the alphabet is symbolic (as in a symbolic finite automaton), and (2) the automaton may make use of both existential and universal nondeterminism (as in an alternating finite automaton).
As in an S-FA, the symbolic alphabet of an S-AFA is manipulated algorithmically via an effective Boolean algebra.An effective Boolean algebra A has components (D, Ψ, [[ ]], ⊥, , ∨, ∧, ¬).D is a set of domain elements.Ψ is a set of predicates closed under the Boolean connectives and ⊥, ∈ Ψ.The denotation function For ϕ ∈ Ψ, we write IsSat(ϕ) when [[ϕ]] = ∅ and say that ϕ is satisfiable.In the following we will assume that IsSat is a computable function and that, for every domain element a ∈ D and predicate ϕ, it is decidable to check whether a ∈ The intuition is that such an algebra is represented programmatically as an API with corresponding methods implementing the Boolean operations and the denotation function.The following are examples of decidable effective Boolean algebras.
2 BVk is the powerset algebra whose domain is the finite set BVk, for some k > 0, consisting of all nonnegative integers smaller than 2 k , or equivalently, all k-bit bit-vectors.A predicate is represented by a BDD of depth k.The Boolean operations correspond to the BDD operations and ⊥ is the BDD representing the empty set.
The denotation [[β]] of a BDD β is the set of all integers n such that a binary representation of n corresponds to a solution of β.
SMT σ is the decision procedure for a theory over some sort σ, say integers, such as the theory of integer linear arithmetic.This algebra can be implemented through an interface to an SMT solver.Ψ contains in this case the set of all formulas ϕ(x) in that theory with one fixed free integer variable x.For example, a formula (x mod k) = 0, say div k , denotes the set of all numbers divisible by k.Then div 2 ∧ div 3 denotes the set of numbers divisible by six.
Nondeterministic automata generalize deterministic automata by allowing a state to have multiple outgoing transitions labelled with the same character.A word is accepted by the nondeterministic automaton when some run leads to an accepting state (i.e., choice is interpreted existentially).One may naturally consider the dual interpretation of choice, wherein a word is accepted when all runs lead to an accepting state (i.e., choice is interpreted universally).An alternating finite automaton supports both types of nondeterminism.Nested combinations of existential and universal choices can naturally be represented by positive Boolean formulas.Formally, for any set X, we use B + (X) to denote the set of positive Boolean formulas over X (that is, Boolean formulas built from true, false, and the members of X using the binary connectives ∧ and ∨).Definition 3.1 (Symbolic alternating finite automaton).A symbolic alternating finite automaton (S-AFA) is a tuple M = A, Q, p0, F, ∆ where A is a decidable effective Boolean algebra, Q is a finite set of states, p0 ∈ B + (Q) is a positive Boolean formula over Q, F ⊆ Q is a set of accepting states, and An S-AFA over an effective Boolean algebra A recognizes a language of words over the set of characters DA, which we will define presently.Let M = A, Q, p0, F, ∆ be an S-AFA.We define a function LM (•) : A mapping each positive Boolean formula to the language accepted by that formula to be the least function (in pointwise inclusion ordering) that satisfies: Finally, we define the language L(M ) recognized by M as L (M ) LM (p0).Unsurprisingly, the relationship between S-AFAs and S-FAs is analogous to the relationship between AFAs and NFAs: S-AFAs and S-FAS recognize the same family of languages, but converting an S-AFA with n states to an S-FA can require up to 2 n states.Interestingly, the symbolic alphabet is another source of complexity in the conversion from S-AFA to S-FA.Consider an S-AFA with two states Q = {x, y} and two outgoing transitions per state ∆ = { x, ϕ1, x , x, ϕ2, y , y, ψ1, y , y, ψ2, true } The states of the equivalent S-FA can be identified with the positive Boolean formula over Q.The state p ∧ q must have four outgoing transitions, one for each combination of the guards of x and y (ϕ1 ∧ ψ1, ϕ1 ∧ ψ2, ϕ2 ∧ ψ1, and ϕ2 ∧ ψ2).In general, for an S-AFA with n states each with m outgoing transitions, the equivalent S-FA can have states with up to m n outgoing transitions.

Boolean operations on S-AFAs
One of the critical features of S-AFAs is that Boolean operations have linear complexity in the number of states.The constructions for S-AFA union, intersection, and complementation follow the standard ones for AFA, with the exception that S-AFA complementation (like S-FA complementation) requires a preprocessing step.For the sake of completeness, we will recall these constructions below.
Suppose that M = A, Q, p0, F, ∆ and M = A, Q , p 0 , F , ∆ are S-AFAs over the same effective Boolean algebra and (without loss of generality) disjoint state spaces.Their union and intersection are defined simply by: , the set of states, set of final states, and transitions of the union/intersection S-AFAs are just the union of the component S-AFAs; they differ only in the initial state, which is either the disjunction (union) or conjunction (intersection) of the initial states of the components.
The complement construction for an S-AFA M = A, Q, p0, F, ∆ relies on M satisfying the property that for all states x ∈ Q, the set { ϕ : ∃p.x, ϕ, p ∈ ∆} forms a partition of DA.An S-AFA that satisfies this condition is called normal.Any S-AFA can be converted into an equivalent normal S-AFA using the normalization procedure pictured in Algorithm 1. (The algorithm is similar to the one in [27] for computing all satisfiable assignments to a propositional formula, and the one in [32] for computing satisfiable Boolean combinations of a set of predicates, and also the representative character enumeration algorithm that we will present in the next section -there it will be explained in greater detail).Normalization may (in the worst case) cause an exponential blow-up while IsSat(chars) do in the number of outgoing transitions of any one state in an S-AFA (note, however, that the exponential factor does not depend on the number of states).
Assuming that M is normal, the complement can be constructed by "De Morganization."We define the complement to be where • denotes the positive Boolean formula tranformation that replaces every ∧ with ∨ (and vice versa).

An algebraic view of S-AFA
This section describes S-AFA in a more algebraic style, which will be useful in the next section.
A bounded lattice L = L, , , , ⊥, is a partially ordered set L, such that every finite set of elements has a least upper bound and greatest lower bound.For any pair of elements x, y ∈ L, we use x y to denote their least upper bound and x y to denote the greatest lower bound.The least element of the lattice (the least upper bound of the empty set) is denoted by ⊥ and the greatest (the greatest lower bound of the empty set) by .We say that L is distributive if for all a, b, c ∈ L, we have a (b c) = (a b) (a c).
Our first example of a bounded lattice is the (distributive) bounded lattice of positive Boolean formulas.Operating (as we do) under the assumption that we do not distinguish between logically equivalent positive Boolean formulas, for any set X, B + (X) is a bounded lattice where the order is logical entailment, the least upper bound is disjunction, the greatest lower bound is conjunction, ⊥ is false, and is true.A second important example is the Boolean lattice 2 {0, 1}, ≤, ∨, ∧, 0, 1 (which is also bounded and distributive).
Let X be a set.A model over X is a function m : X → 2 that assigns each x ∈ X a Boolean value.The model m can be extended to evaluate any positive Boolean formula by defining: Thus, we say that the model m : X → 2 extends uniquely to a lattice homomorphism B + (X) → 2. In fact, there is nothing special about the bounded lattice 2 in this regard: if L = L, , , ⊥, is a bounded lattice, then any function f : X → L extends uniquely to a lattice homomorphism B + (X) → L. 2 In the following, our notation will not distinguish between a function f : X → L and its extension.
Let M = A, Q, p0, F, ∆ be an S-AFA.The set F ⊆ Q of final states defines a model F : Q → 2 over Q as follows: Note that for any p ∈ B + (Q), we have F (p) = 1 if and only if LM (p) contains the empty word.Any character a ∈ DA can be associated with a transition function ∆a : Recall that since ∆a is a function into a bounded lattice (namely Similarly, any word w = a1...an ∈ D * A can be associated with a transition function ∆a : Finally observe that we can characterize the language recognized by an S-AFA succinctly using the algebraic machinery described in this section: for any p ∈ B + (Q), we have

Equivalence checking for S-AFAs
This section describes an algorithm for checking whether two symbolic alternating finite automata recognize the same language.Recently, Bonchi and Pous introduced the bisimulation up to congruence technique for solving the language equivalence problem for non-deterministic finite automata [5].We extend this technique in two ways: (1) we show how the framework can be applied to alternating automata, using a propositional satisfiability solver to compute congruence closure; (2) we give a technique for extending the technique to symbolic alphabets.

Bisimulation up to congruence
We will begin by recalling some of the details of bisimulation up to congruence, adpated to our setting of symbolic alternating finite automata.
Definition 4.1 (Bisimulation).Let M = A, Q, p0, F, ∆ be an S-AFA, and let R ⊆ B + (Q) × B + (Q) be a binary relation on positive Boolean formulas over M 's states.We say that R is a bisimulation if for all p, q such that p R q, we have Consistency and compatibility are useful notions outside of the context of bisimulations, so we will provide more general definitions.For any relation R ⊆ X × X and any function f : Clearly, compatible functions are closed under composition, and the composition of a compatible function with a consistent function is consistent.
The following proposition states the soundness and completeness of the principle of bisimulations for language equivalence checking.

For any bisimulation
is a bisimulation.
We will now define bisimulation up to congruence for S-AFAs.
Definition 4.3 (Congruence closure).Let Q be a finite set and let R ⊆ B + (Q) × B + (Q) be a binary relation.The congruence closure of R, denoted ≡R, is the smallest congruence relation that contains R.That is, ≡R is the smallest reflexive, transitive, and symmetric relation that contains R and such that for all p1, p2, q1, q2 ∈ B + (Q) such that p1 ≡R q1 and p2 ≡R q2, we have p1 ∧ p2 ≡R q1 ∧ q2 and p1 ∨ p2 ≡R q1 ∨ q2.
Definition 4.4 (Bisimulation up to congruence).Let M = A, Q, p0, F, ∆ be an S-AFA, and let R ⊆ B + (Q) × B + (Q) be a binary relation.We say that R is a bisimulation up to congruence if for all p, q such that p R q, we have • Compatibility: For all a ∈ DA, ∆a(p) ≡R ∆a(q).
Bisimulation up to congruence allows us to solve language equivalence queries as follows: if R is a bisimulation up to congruence such that p R q, then p and q recognize the same language.This follows from Proposition 4.2 and the following: We delay the proof of this proposition to the next section, after we have developed some technical machinery for checking whether a given relation is a bisimulation up to congruence.

Congruence checking
First, we will address an algorithmic challenge: how may one check whether a given relation is a bisimulation up to congruence?Generating the congruence closure ≡R explicitly is intractable, since the cardinality of ≡R may be double-exponentially larger than that of R.However, for the purpose of checking whether a relation R is a bisimulation up to congruence, we need only to be able to check membership within the congruence closure.Thus, we are interested in the CONGRUENCE problem, which is stated as follows: given a finite set Q, a finite relation R ⊆ B + (Q) × B + (Q), and two positive Boolean formulas p, q ∈ B + (Q), determine whether p ≡R q.
In the following, we will show that the CONGRUENCE problem is NP-complete, but it can be solved in practice by exploiting propositional satisfiability solvers.Towards this end, we define the logical closure of a relation as follows: Definition 4.6.Let Q be a finite set and let R ⊆ B + (Q) × B + (Q) be a binary relation.Define cl(R) as the logical closure of R as follows: Observe that for any R, p, and q, we have p cl(R) q if and only if Φ(R) ∧ ¬(p ⇐⇒ q) is unsatisfiable.Thus, membership in cl(R) reduces to a propositional satisfiability problem.
Note that every propositional model m : Q → 2 extends uniquely to a bounded lattice homomorphism B + (Q) → 2 (where for all p ∈ B + (Q), m(p) = 1 ⇐⇒ m |= p).In light of this, m is a model of Φ(R) if and only if m is consistent with R, and p cl(R) q if and only if h(p) = h(q) for every bounded lattice homomorphism h : B + (Q) → 2 that is consistent with R. That is, cl(R) is the largest relation such that that every bounded lattice homomorphism that is consistent with R is consistent with cl(R).
The question now is what is the relationship between the congruence closure and the logical closure.We will show that they are identical.First, a lemma: Proof.Clearly cl(R) is a congruence relation containing R, so ≡R is a subset of cl(R).It remains to show that cl(R) is a subset of ≡R, or equivalently that if p and q are not related by ≡R, then they are not related by cl(R).
Let p and q be such that p ≡R r.Then the equivalence classes [p] and [q] are distinct in the quotient lattice B + (Q)/≡ R .By Lemma 4.7, there is a bounded lattice homomorphism f : )), f |= p ⇐⇒ q.Therefore, Φ(R) |= p ⇐⇒ q, and p and q are not related by cl(R).Proposition 4.8 yields a simple candidate algorithm for the CONGRUENCE problem: simply check whether the pair of positive Boolean formulas belongs to the logical closure of a relation using a SAT solver.The following proposition states that we cannot hope for an asymptotically superior algorithm.Finally, let ϕ be the formula obtained by replacing every negative literal ¬p with p. Then ϕ is satisfiable if and only if ϕ ≡R false.
Finally, using the technical machinery we have developed in this section, we will re-state and prove Proposition 4.5.Proof.Consistency: Suppose that p ≡R q.Since cl(R) coincides with ≡R, we have p cl(R) q, and thus Φ(R) |= p ⇐⇒ q.Since R is a bisimulation up to congruence, F is consistent with R, and thus Compatibility: Towards the converse, suppose that there exists positive Boolean formulas p, q ∈ B + (Q) and a character a ∈ DA such that ∆a(p) ≡R ∆a(q), and show that p ≡R q.
Since ∆a(p) ≡R ∆a(q), and cl(R) coincides with ≡R, there exists a model m : Q → 2 such that m |= Φ(R) but m |= ∆a(p) ⇐⇒ ∆a(q).Since m(∆a(p)) = m(∆a(q)), it is sufficient to show that m • ∆a is consistent with R (since if p and q can be distinguished by a homomorphism consistent with R, then it cannot be the case that p cl(R) q, and thus p ≡R q).Towards proving that m • ∆a is consistent with R, let r, s be such that r R s, and prove that m(∆a(r)) = m(∆a(s)).Since R is a bisimulation up to congruence, we have ∆a(r) ≡R ∆a(s).Since m is consistent with R, we have m |= ∆a(r) ⇐⇒ ∆a(s) and thus m(∆a(r)) = m(∆a(s)).

Equivalence algorithm
We will now show how the theory of bisimulation up to congruence can be leveraged in a decision procedure for S-AFA language equivalence.The algorithm addresses two challenges raised by bringing Bonchi and Pous' NFA equivalence algorithm to bear on symbolic alternating finite automata: (1) how to efficiently check membership in the congruence closure of a relation, and (2) how to efficiently enumerate a sufficient finite set of characters on which to verify the bisimuation conditions.
The equivalence decision procedure is pictured in Algorithm 2. The idea is simple: given an S-AFA M = A, Q, p0, F, ∆ and two positive Boolean formulas p0 and q0, the algorithm attempts to synthesize a bisimulation up to congruence R such that p0 R q0 or show that no such R exists.Checking congruence closure membership The algorithm implicitly maintains a relation R such that any bisimulation that contains p0, q0 must contain R. The congruence closure of R is represented using an incremental SAT solver.An incremental SAT solver s internally maintains a context formula (initially true) and supports two operations: s.push(ϕ) conjoins the constraint ϕ to s's context, and s.isSat(ϕ) checks whether the conjunction of ϕ and s's context is satisfiable.The congruence closure of R = { p1, q1 , ..., pn, qn } is represented by a SAT solver with context Φ(R) = (p1 ⇐⇒ q1) ∧• • • ∧ (pn ⇐⇒ qn) (cf.Definition 4.6 and Proposition 4.8).We may add a pair p, q to the relation R by calling s.push(p ⇐⇒ q) and we may check whether a given pair p, q belongs to ≡R by issuing the query s.isSat(p ⇐⇒ q).
The relation R is initialized to the singleton set containing the pair of p0, q0 of formulas for which we wish to decide equivalence (line 4).Recall that R is a bisimulation up to congruence if and only if it satisfies the consistency and compatibility conditions given in Definition 4.4.Thus, if at any point the relation R contains a pair p, q such that F (p) = F (q) (i.e., R fails the consistency condition), the algorithm returns false (lines [22][23], having proved that LM (p0) = LM (q0).Towards the consistency condition, the algorithm maintains a worklist such that every pair p, q ∈ R such that ∆a(p) ≡R ∆a(q) for some character a ∈ DA belongs to worklist.The algorithm returns true when worklist is empty (equivalently, when the consistency condition holds).Enumeration of representative characters Each iteration of the main loop (lines 5-26) removes a pair p, q from the worklist and adds pairs to R that are implied by the membership of p, q in R and the compatibility condition.A naive way to do this is to iterate over the alphabet DA, and for each character a ∈ DA add ∆a(p), ∆a(q) to R and worklist if ∆a(p) ≡R ∆a(q).However, iterating over the alphabet is not effective because the set of characters may be infinite.
The algorithm overcomes this problem by iterating over a finite set of representative characters, such that if ∆a(p) ≡R ∆a(q) holds for all representative characters a then it holds for all characters.One natural candidate for the set of representative characters is to choose one member of each equivalence class of the relation defined by (That is, a b if the transition functions ∆a and ∆ b are equal).Observe that any set B containing one member of each equivalence class is a valid choice for a representative set of characters: 1. B is finite: recalling that for any character a, ∆a is defined by and that ∆ is a finite set, there are only finitely many equivalence classes of , and so B is finite.
2. B is representative: Suppose ∆ b (p) ≡R ∆ b (q) for all b ∈ B, and let a ∈ DA.There is some b ∈ B such that a b, so we have ∆ b (p) = ∆a(p) ≡R ∆a(q) = ∆ b (q) and thus ∆a(p) ≡R ∆a(q).
A practical refinement of this idea is to employ an equivalence relation with fewer equivalence classes.Towards this end, for any set of states S ⊆ Q, define an equivalence relation S on the set of characters DA as follows: Note that the relation Q coincides with .The argument that any set of equivalence class representatives of is a valid set of representative characters applies also to equivalence class representatives of S for any set S that contains every state appearing in p or q.Since S is typically smaller than Q, this yields a smaller representative set.
Algorithm 2 iterates over the set of representative characters (lines 7-26) by manipulating sets of characters symbolically via the effective Boolean algebra A. The enumeration is reminiscent of the way that AllSat solvers enumerate satisfying assignments to propositional formula [27].The variable chars, initially , holds a predicate representing the set of characters that remain to be processed.At each iteration of the loop, we select a character a ∈ chars that has not yet been processed (line 10), and compute a predicate class representing its equivalence class in the relation S : class {ϕ : ∃x, q. x, ϕ, q ∈ ∆ ∧ a ∈ ϕ } ∧ {¬ϕ : ∃x, q. x, ϕ, q ∈ ∆ ∧ a / ∈ ϕ } .
We then remove every character in a's equivalence class from the set chars by conjoining chars with the negation of class (line 21).This ensures that on the next iteration of the loop, we choose a character that is not equivalent to any character seen so far (in the context of AllSat, ¬class is sometimes called a blocking clause).
Illustrative example Let M = A, Q, p0, F, ∆ be an S-AFA over the theory of linear integer arithmetic where there are five states Q = {v, w, x, y, z}, all states are final, and the transitions are as follows: −−→ w z true − − → v Assume that we want to prove that the state v is equivalent to the state w.The algorithm initializes the relation R to { v, w }, the worklist to [ v, w ], and enters the main loop: We pick v, w off the worklist, and enter the inner loop: We compute a witness to the satisfiability of chars -this may be any integer, but let's suppose that we choose 0. We compute the equivalence class of 0 to be class = c ≤ 0 and set chars ← chars ∧ ¬(c ≤ 0).We add ∆0(v), ∆0(w) = x ∨ y, z to R and the worklist.
We generate 5 as a witness to satisfiability of chars.We compute the equivalence class of 5 to be class = c > 0 and set chars ← chars ∧ ¬(c > 0).We find that and therefore, do not add ∆5(v), ∆5(w) to R or the worklist.
We find that chars is unsatisfiable and exit the inner loop.we compute 1 as a witness of satisfiability of chars.We compute the equivalence class of 1 to be class = (c = 1) and set chars ← chars ∧ ¬(c = 1).We find that and therefore do not add ∆1(x ∨ y), ∆1(z) to R or the worklist.
)) We generate 2 as a witness to satisfiability of chars.We compute the equivalence class of 2 to be class = c = 1 and set chars ← chars ∧ ¬(c = 1).We find that and therefore do not add ∆2(x ∨ y), ∆2(z) to R or the worklist.
).We find that chars is unsatisfiable and exit the inner loop.

Implementation
We implemented S-AFAs and their decision procedure in an existing Java automata library. 3   3 Name and link omitted for double blind.The implementation provides an interface for specifying custom Boolean algebras for both the alphabet theory and the positive Boolean formulas over the automaton states and it can be easily integrated with externally specified alphabet theories.To represent the positive Boolean formulas over the automaton states we implemented two algebras: one which simply maintains the explicit Boolean representations of formulas (referred to as DAG in the experiments) and one which instead maintains a BDD corresponding to each formula.We use the JDDFactory implementation in JavaBDD as our BDD library (http://javabdd.sourceforge.net/).To check membership of formulas to the congruence closure we use the SAT solver JSAT (https://j-sat.com/tag/jsat/).
Our implementation is open source.Optimizations We implemented a few simple optimizations for improving the performance of our decision procedure.
First, whenever we construct an S-AFA, we remove all the states that are trivially non reachable from the initial state or that cannot reach a final state.Given a positive Boolean formula f ∈ B + (Q), let st(f ) ⊆ Q be the set of states appearing in f .Given a S-AFA M = A, Q, p0, F, ∆ , we construct a graph GM = (V, E) with set of vertices V = Q, and transitions E = {(s, s ) | (s, ϕ, q) ∈ ∆ ∧ s ∈ st(q)}.We then remove from M all the states that are not reachable from one of the states s0 ∈ st(p0) in GM and all the states that do not have a path to some state s ∈ F in GM .In each positive Boolean formulas of M we replace every removed state with false.The resulting S-AFA is equivalent to M .
Second, note that Algorithm 2 does not specify what data structure to use for the worklist.Natural choices for such a data structure are a stack or a queue, but none of these data structures leverages the fact that smaller formulas are more likely to generate better congruences.Instead, we implement the worklist using a priority queue which always extracts the pair of smallest size, where the size of a pair (p, q) is given by the formula |p| + |q|.

Evaluation
We evaluated the performance of our algorithms on the following benchmarks.
1. We check satisfiability of the LTL formulas appearing in [17] using the semantics of LTL over finite traces from [15].

We check equivalence of Boolean combinations of regular expressions appearing in [reg].
All experiments were run on an Intel Core i7 2.60 GHz CPU with 16 GB of RAM.

Satisfiability checking for LTL over finite traces
Linear temporal logic (LTL) plays a prominent role in program verification and its properties have been studied extensively.While the semantics of LTL is typically defined over infinite strings, recently there has been a lot of interest in the interpretation of LTL over finite traces [15] because this variant can be used in applications such as program monitoring.We use LTL-F to refer to the interpretation of LTL over finite traces.LTL-F is as expressive as first-order logic over strings, while checking satisfiability of LTL-F formulas is a PSPACE-complete problem, there exists a linear time translation from LTL-F formulas to alternating automata [15].Similarly to what happens with regular LTL, this translation results in an alphabet of size exponential in the number of atomic proposition appearing in the formula.Due to this reason, S-AFAs are a promising model for designing decision procedures for of LTL-F.
In this section, we evaluate the performance of our algorithm on the tasks of checking satisfiability and equivalence of LTL-F formulas using the linear time translation from LTL-F formulas to alternating automata proposed in [15] We first describe the set of considered formulas and then present one experiment for checking satisfiability and one for checking equivalence.Benchmark formulas We consider three sets of LTL formulas.The first set (lift and lift b in the figures) contains 16 parametric formulas describing a lift system of increasing complexity [18].The second set (counter and counter l in the figures) contains 30 formulas describing counters for which satisfiability under the infinite string semantics is notoriously hard [28].Interestingly, these formulas have exactly one model under the infinite string semantics, while they are unsatisfiable under the finite string one.The third set contains more than 10,000 random formulas that were created as part of the experimental evaluation in [12].The formulas have size varying between 10 and 100, and number of atomic propositions varying between 2 and 4. For the non-random formulas we set the timeout at 60 seconds, while for the random ones, given how many there are, we set the timeout at 5 seconds.

Satisfiability checking
We evaluate the performance of our algorithm for checking satisfiability of the S-AFAs corresponding to the given LTL-F formulas.We use the linear time translation from LTL-F to monadic second order logic (MSO) proposed in [15] to generate MSO formu- las equivalent to the LTL-F ones and compare our implementation against Mona [19], a solver for the monadic second order logic of one successor. 4In LTL-F the alphabet is the set of bitvectors of size n, where n is the number of atomic propositions appearing in the formula and each bit indicates whether one of the atomic propositions is true or false.The constructed S-AFAs will therefore be over the theory of bit-vectors and we use BDDs to describe predicates in such a theory.
For each formula we measure: 1) the runtime of Mona on the equivalent MSO formula (mona); 2) the runtime of computing the deterministic automaton accepting the reverse language of the S-AFA and checking its emptiness (reverse-SFA) 5 ; 3) the runtime of the bisimulation algorithm using a directed acyclic graph (i.e., hash-consed) representation of positive Boolean expressions (bisim-DAG); 4) the runtime of the bisimulation algorithm using BDDs to represent positive Boolean expressions (bisim-BDD).The results for the non-random LTL formulas are depicted in Figure 2, while Figure 3 illustrates the difference in runtime between our bisimulation algorithm, reverse-S-FA, and Mona.Since the bisimulation algorithm that uses an explicit representation of positive Boolean expressions is consistently faster than the one using BDDs, Figure 3 only shows results for the former.In these graphs a point above 0 means that the bisimulation solver is faster than the other solver.Finally, in the top part of Figure 5 we show the number of times each solver timed out when checking satisfiability of the 10,000 randomly generated formulas.Results Mona outperforms the bisimulation solver on non-random formulas.However, Mona runs out of memory for relatively small instances for which our solver does not.Mona is slower than our algorithm and times out more often on randomly generated LTL formulas.In particular the bisimulation algorithm outperforms Mona on 87% of the instances.The reverse-SFA algorithm also times out often and is in general slower than the bisimulation algorithm.Explicitly representing positive Boolean expressions is generally much faster than using BDDs.Even though our algorithm is mostly suited for equivalence and inclusion checking, this experiments illustrates that S-AFAs and our bisimulation technique are a viable solution for checking satisfiability of LTL-F formulas.

Equivalence checking
We evaluate the performance of our algorithm for checking the equivalence of the S-AFAs corresponding to the given LTL-F formulas against slight modifications of such automata.Given an LTL-F formula ϕ, let Aϕ be the corresponding S-AFA.For each formula ϕ in the benchmark, we compute a variation A ϕ of the automaton Aϕ by randomly flipping the acceptance condition of one of the states in Aϕ.We then measure the cost of checking the equivalence of Aϕ with A ϕ .In this experiment, we do not consider Mona as there is no natural way to generate a formula corresponding to the modified S-AFAs.Given the random nature of our experiment we only consider the 10,000 randomly generated LTL formulas from [12].The results are depicted in Figure 4.The graph shows the time difference between the reverse-SFA algorithm and the bisimulation algorithm.Again we only plot the case in which positive Boolean expressions are represented explicitly rather than with BDDs.Ṫhe bottom graph in the Figure 5 shows the number times each solver timed out.

Results
The bisimulation algorithm is again faster than the reverse-SFA algorithm and times out in only one instance.In this experiment representing positive Boolean expressions using BDDs incurs in more timeouts than those observed using the reverse-SFA algorithm.We believe that the slow performance of BDDs is due to the many substitution operations-a slow operation for BDDsneeded by the equivalence algorithm.

Boolean combinations of regular expressions
Regular expressions are ubiquitous and their analysis is fundamental in many domains, from deep-packet inspection in networking [29] to static analysis of string-manipulating programs [2].Classic automata techniques for analyzing regular expressions are often limited by two factors.First, regular expressions operate over large alphabets, making most existing automata representations impractical.Second, Boolean combinations of regular expressions produce automata with large number of states.Since our model promises to attenuate both these problems, in this experiment we ask the following question.Is our technique more efficient than classic automata techniques when analyzing properties involving Boolean combinations of regular expressions?In this experiment we use unions of intervals to represent predicates in the alphabet theory.This representation naturally models character classes which often appear in regular expressions-e.g., [a-z0-9].We stress how the ability to easily change the representation of the underlying alphabet illustrates the versatility of S-AFAs.We first describe the set of considered regular expression and then describe what experiments we evaluate our techniques on.Benchmark formulas We consider regular expressions from [reg].This site contains more than 3,000 crowd-sourced regular expressions for tasks such as email filtering, phone number detection, and URL detection.From these expressions, we isolate 75 regular expressions for email filtering and consider Boolean combinations of them.For each experiment we set the timeout at 20 seconds.

Equivalence checking
We evaluate the performance of our algorithm on the task of checking equivalence of intersected regular expressions.This experiment is inspired by the application described in Section 2, where we were interested in detecting whether a newly added spam filter is already present in a set of existing spam filters.We identify sets of regular expressions {r1, . . ., rn} such that L(r1) ∩ . . .∩ L(rn) = ∅ and n ∈ {3, 4, 5}.Here L(r) denotes the set of strings accepted by r.For each set of expressions we then measure the time required to check whether L(r1) ∩ . . .∩ L(rn−1) = L(r1) ∩ . . .∩ L(rn).We only illustrate instances on which at least one solver doesn't timeout and, for each value of n, we stop the generation at 6,000 sets.In this experiment we compare our algorithm against the classic decision procedure based on finite automata intersection, determinization, and equivalence.Concretely, for each set of regu- lar expressions we build the corresponding (non-alternating) symbolic finite automata (S-FAS), then perform automata intersection, determinize the two automata corresponding to the left-hand and right-hand sides of the equality, and finally use Hopcroft-Karp algorithm [21] to check the equivalence of the resulting automata. 6 When measuring the running time of this procedure we consider the cumulative cost of all operations.
For the bisimulation experiment, we take advantage of the fact that our algorithm can also check the equivalence of two configurations of the same S-AFA.In particular, instead of building two S-AFAs and check whether they accept the same language, we only build one S-AFA and check the equivalence of the two state configurations corresponding to L(r1) ∩ . . .∩ L(rn) and L(r1) ∩ . . .∩ L(rn) ∩ L(rn+1).Given a set of regular expressions {r1, . . ., rn}, let {A1, . . ., An} be the corresponding nondeterministic S-FAS (notice that an S-FA is also an S-AFA) with corresponding initial states {q 1 0 , . . ., q n 0 }.After building the intersected S-AFA A = A1 ∩ . . .∩ An with initial state q 1 0 ∧ . . .∧ q n 0 , we check whether the state q 1 0 ∧ . . .∧ q n 0 is equivalent to the state q 1 0 ∧ . . .∧ q n−1 0 .Figure 6 illustrates the runtime of the classic algorithm based on S-FA equivalence (SFAEq) and our bisimulation based equivalence procedure (bisim-DAG) for sets of regular expressions of different sizes.Given the slowdown observed in Section 6.1, in this experiment we do not measure the performance of the bisimulation equivalence that uses BDD to represent positive Boolean expressions.A point above the diagonal indicates an instance in which the bisimulation algorithm is faster than performing S-FA equivalence.The first three entries of Figure 7 show the number of instances on which each algorithm timed out.Results Remarkably, the two algorithms have orthogonal performances and none of the two strictly outperforms the other one.The bisimulation algorithm is faster than the S-FA algorithm on approximately 45% of the instances, but it times out 803 fewer times.The results are truly encouraging and show that each algorithm has its benefits.

Forced equivalence checking
In the previous experiment almost all tuples return inequivalent as the result.To better appreciate the cost of checking equivalence 6 For a fair comparison we consider symbolic finite automata instead of classical automata, as the latter would suffer from the large alphabet size.  in the case in which both sides of the equality describe the same language we perform the following experiment.For every tuple {r1, . . ., rn} (n ∈ {2, 3, 4}) such that L(r1) ∩ . . .∩ L(rn) = ∅, we measure the time required to check whether L(r1) ∩ . . .∩ L(rn) = L(r1) ∩ . . .∩ L(rn) ∩ L(rn), where one of the regular expression is added twice. 7We only illustrate instances on which at least one solver doesn't timeout and, for each value of n, we stop the generation at 6,000 sets.The results are showed in Figure 8. Figure 8 illustrates the running times of this experiment, while the last three entries of Figure 7 show the number of instances on which each algorithm timed out.
To better asses the effectiveness of the congruence in pruning the space of explored states we also measured how many states each algorithm explored during the equivalence check.The results are shown in Figure 9. Results Again, the two algorithms have orthogonal performances and none of the two strictly outperforms the other one.However, this time, the bisimulation algorithm is faster than the S-FA algorithm on approximately 20% of the instances, but it times out 492 fewer times.We find quite remarkable how the two algorithms have complementary performances.When we combine more than 30,000 equivalence checks performed in this section, we have 2,292 instances in which the bisimulation algorithm terminated while the S-FA equivalence timed out and 997 instances in which the S-FA equivalence terminated but the bisimulation algorithm timed out.This experiment shows that our algorithm is useful for analyzing regular expressions and it will be a great addition to practical regular expression engines.Moreover, we could also appreciate how, in the forced equivalence experiment, the bisimulation procedure explored fewer states than the algorithm for checking S-FA equivalence approximately 46% of the times.These are typically the instances for which the bisimulation algorithm is the faster one.

Applications and future directions
The development of the theory of S-AFAs is motivated by several concrete practical problems.Here we discuss six such applications.
In each case we illustrate what kind of alphabet theory the automaton operates with, the role of alternation, and how our decision procedure could be adopted.

Regular expression analysis
We discussed how regular expressions are used in many different contexts.In particular, the problems of checking equivalence and inclusion arise in many practical applications, such as text processing and analysis of string-manipulating programs [2,33].In Section 6.2 we empirically showed that S-AFAs are an effective model for checking equivalence of complex combinations of regular expressions.Moreover, unlike classic models like DFAs and NFAs, S-AFAs can use interval arithmetic or BDDs to succinctly represent the complex alphabet structure of real-world regular expressions.

Model-checking LTL over finite traces
In Section 6.1 we showed how S-AFAs can effectively check satisfiability of LTL formulas interpreted over finite sequences.Part of this success was enabled by the BDD representation of the input alphabet.While we mostly focused on checking emptiness of the S-AFAs generated from the LTL-F formulas, another promising option is that of using our algorithm to model-check transitions systems against LTL-F formulas [10].Intuitively, one could model a transition system as an S-AFA T and the LTL-F property as another S-AFA P .The model-checking problem then amounts to whether the language of the transitions system T is included in the language of the property P .We can reduce this check to an equivalence query of the form T ∨ P = P .Exploring whether our algorithm could improve on the state of the art model checkers is an intriguing research direction.

SMT solving with sequences
SMT solvers such as [16] have drastically changed the world of programming languages and turned previously unsolvable problems into feasible ones.The recent interest in verifying programs operating over sequences has created a need for extending existing SMT solving techniques to handle sequences over complex theories [32].Solvers that are able to handle strings, typically do so by building automata and then performing complex operations over such automata [24].As we advocated in this paper, this approach is doomed to incur into a state blow-up as the SMT formulas typically contain many Boolean operations.Moreover, existing solvers only handle strings over finite and small alphabets [24].S-AFAs and the techniques presented in this paper have the potential to impact the way in which such solvers for SMT are built as they can support sequences over arbitrary alphabet theory and do not incur into state explosion.Investigating ways to integrate S-AFAs into SMT solvers for sequences is an exciting and challenging research direction we plan to pursue in the future.

Automata learning
The first algorithm for efficiently learning deterministic finite automata was introduced by Dana Angluin [3].Since then, automata learning has found many applications in program verification and program synthesis [1,36].Recently, algorithms have been proposed to learn alternating finite automata [4].Similarly to Angluin's original algorithm, the one proposed in [4] uses a learning model in which the learner, who is trying to learn an automaton for describing an unknown target language L, is allowed to query the teachers and ask two types of queries.
Equivalence queries Given an automaton A, does A accepts exactly the language L.
Membership queries Given a string s, is the string s in the language L.
In a model in which the language L is specified as an automaton itself (this is for example the case in [6]) being able to efficiently perform such queries is crucial for efficient learning.The equivalence algorithm proposed in this paper has the potential to make the problem of learning alternating automata more efficient.

S-AFA minimization
Automata minimization is a crucial operation for mitigating state space explosion.For example, one of the "secrets" that made the monadic second-order logic solver Mona [19] practical was to eagerly perform automata minimization when building intermediate automata corresponding to sub-formulas of the input formula.Recently, minimization and state reduction algorithms have been proposed for S-FAS [13] and nondeterministic automata [26].However, the problem of reducing the state space of alternating automata has not been explored yet.Interestingly, when performing the LTL-F to S-AFA reduction during our evaluation in Section 6.1 we observed that many states of the produced S-AFAs were redundant because the same sub-formulas (e.g., Xa) appeared multiple times in the input LTL-F formula.While we solved this problem by hashing sub-formulas, it would have been better to automatically detect such redundancies using a state reduction algorithm for S-AFAs.Automata minimization and bisimulations are tightly related concepts and we are confident that the notions presented in this paper can be of aid in designing efficient algorithms for reducing the number of states in S-AFAs and in general in alternating automata.

List manipulating programs
Since S-AFAs support efficient Boolean operations (Section 3.1), they offer an intruiging possibility of acting as the predicates of an effective Boolean algebra.That is, we can consider S-AFAs over an effective Boolean algebra where characters are strings and the predicates are themselves S-AFAs.Naturally, this process can be iterated indefinitely (strings of strings of strings of strings ...).One compelling use-case for such technology is in expressing and verifying correctness properties of list-manipulating programs.For example, a sparse representation of a 10×10 matrix of integers might be represented by type of the form: One could imagine that the decision procedure presented in Section 4 might be useful for type checking in a language with such a rich type system (e.g., consider the problem of checking that matrix addition preserves the sparse matrix invariants).

Related work
Automata with predicates The concept of automata with predicates instead of concrete symbols was first mentioned in [35] and was first discussed in [30] in the context of natural language processing.S-FAS were then formally introduced in [33] and then adopted in [34] with a focus on security analysis of sanitizers.D'Antoni et al. studied alternation for symbolic tree automata, but all their techniques are based on classic algorithms for eliminating alternation and reductions to non-alternating automata [14].Our approach is different from the one in [14] as our equivalence procedure does not need to build the S-FA corresponding to a S-AFA.The Mona implementation [22] provides decision procedures for monadic second-order logic, and it relies on a highly-optimized BDD-based representation for automata which has seen extensive engineering effort [22].Therefore, the use of BDDs in the context of automata is not new, but is used here as an example of a Boolean algebra that seems particularly well suited for working with the alternating automata generated by LTL formulas.
Alternating automata Alternation is an old concept in computer science and and the notion of alternating automata dates back to the 80s [7,9].Vardi recognized the potential of such a model in program verification, in spite of their high theoretical complexities [31].Alaska was the one of the first practical implementation of alternating automata [17].In Alaska, the alphabet and the set of states are both represented using bit-vectors and this allows to model the search space using BDD.While this representation is somewhat similar to ours, S-AFAs are much more modular because they support arbitrary alphabets and alphabet representation (not just bit-vectors and BDDs) and arbitrary state representations (again not just BDDs).Alaska performs state-space reduction using antichains while checking AFA emptiness.As observed by Bonchi and Pous [5], bisimulation up to congruence strictly subsumes antichain reduction.
Equivalence using bisimulation up to Notions of bisimulations similar to the one presented in this paper have been studied in the past [8,20,25] in different context that did not related to language equivalence.The paper that most relates to ours is by Bonchi and Pous [5], where the idea of bisimulation up to congruence is used to check equivalence and inclusion of non-deterministic finite state automata (NFAs).There are two main differences with the ideas we presented here.First, generalizing bisimulation up to congruence to alternating finite automata is absolutely non-trivial.While in the case of NFAs, the congruence closure can be computed in polynomial time with a simple saturation algorithm, this is not the case for AFAs.In fact, for AFAs the problem becomes NP-complete.Second, the techniques in [5] are described for NFAs operating over finite alphabets and most of the presented examples operate over alphabets of size two.We demonstrated a technique for extending the bisimulation up to congruence technique to symbolic and potentially infinite alphabets using a representative enumeration algorithm reminiscent of AllSat solving.In fact, our technique can also be used to extend the techniques proposed in [5] to arbitrary domains.

Conclusion
We presented symbolic alternating finite automata, S-AFAs, a succinct and efficient automaton model for describing sets of finite sequences over large and potentially infinite alphabets.We also introduced an algorithm for checking the equivalence of two S-AFAs based on bisimulation up to congruence and showed how this algorithm often outperforms other automaton models in two concrete applications.First, our algorithm can be used to efficiently check satisfiability of linear temporal logic formulas over finite traces.Second, our algorithm can efficiently checking whether different Boolean combinations of regular expressions describe the same language.

Lemma 4 . 7 .
Let L = L, , , ⊥, be a finite bounded lattice.For any two distinct elements a, b in L, there is a homomorphismf : L → 2 such that f (a) = f (b).Proof.Without loss of generality, suppose that a b.Let c be a join-irreducible element c of L such that c a and c b.The existence of such a c can be proved by contradiction: suppose there exists some a b such that there is no join-irreducible element c of L such that c a. Then there exists a least such element a.By assumption, a is join-reducible so there exists d and d such that d a, d a, and d d = a.It cannot be the case that both d b and d b (if so, then d d b, but by construction we have a = d d b).Suppose without loss of generality that d b.By minimality of a, there is some join-irreducible element c d such that c b. Since d a, we have c a and we are done.Construct a function f : L → 2 by defining f (x) 1 if c ≤ x 0 otherwise One may check that f is a bounded lattice homomorphism with f (a) = 1 and f (b) = 0. Proposition 4.8.Let Q be a finite set and let R ⊆ B + (Q)×B + (Q) be a binary relation.Then cl(R) coincides with ≡R.

Proposition 4 . 9 .
CONGRUENCE is NP-complete.Proof.Membership in NP follows immediately from Proposition 4.8.We prove NP-hardness of CONGRUENCE by giving polytime reduction from SAT.The key insight is that the relation R can be used to axiomatize negation, so that arbitrary Boolean formulas can be encoded into positive Boolean formulas.Let ϕ be a Boolean formula in conjunctive normal form over a set of propositional variables P .Form a new set of propositional variables Q P ∪ {p : p ∈ P } consisting of the original propositional variables P plus a disjoint set of "barred" copies, intended to represent negative literals.Define a relation R ⊆ B + (Q) × B + (Q) as follows: R { p ∧ p, false : p ∈ P } ∪ { p ∨ p, true : p ∈ P }

Proposition 4 . 5 .
Let M = A, Q, p0, F, ∆ be an S-AFA.Let R ⊆ B + (Q) × B + (Q) bea binary relation on positive Boolean formulas over M 's states.R is a bisimulation up to congruence if and only if ≡R is a bisimulation.

Figure 2 .
Figure 2. Satisfiability checking for non-random LTL formulas.The missing points are instances for which Mona runs out of memory.

Figure 5 .
Figure 5. Number of timeouts for different solver for checking emptiness and equivalence of all the random LTL formulas.

Figure 7 .
Figure 7. Timeout distribution for regular expression equivalence checks.
type pos = {p : int | 0 ≤ p < 10} type row = {r : (pos × int) list | length(r) ≤ 10} type matrix = {m : (pos × row) list | length(m) ≤ 10} a binary relation on positive Boolean formulas over M 's states.R is a bisimulation up to congruence if and only if ≡R is a bisimulation.