1 Introduction

Recently, many tools for solving string constraints have been developed, motivated mainly by techniques for finding security vulnerabilities such as SQL injection or cross-site scripting (XSS) in web applications [34, 35, 36]. String solving has also found applications in, e.g., the analysis of user access policies in Amazon Web Services [8, 26, 39] or smart contracts [7]. Solvers for string constraints are usually implemented as string theory solvers inside SMT solvers, such as cvc5 [9] or Z3 [31], allowing combination with other theories, most commonly the theory of integers for string lengths. Other well-known string solvers include Z3str3RE [12, 13], Z3-Trau [1], Z3str4 [30], and OSTRICH [19].

In this paper, we present Z3-Noodler  1.0.0 [47], a fork of Z3  4.12.2 where the string theory solver is replaced with the stabilization-based procedure for solving string (dis)equations with regular and length constraints [14, 20]. The procedure makes heavy use of nondeterministic finite automata (NFAs) and operations over them, for which we use the efficient Mata library for NFAs [23, 29].

The presented version implements multiple improvements over a previous Z3-Noodler prototype from [20]. Firstly, it extends the support for string predicates from the SMT-LIB string theory standard [11] by (1) applying smarter and more specific axiom saturation and (2) adding support for their solving inside the decision procedure (e.g., for the \(\lnot \texttt {contains} \) predicate). It also implements various optimizations (e.g., for regular constraints handling) and other decision procedures, e.g., the Nielsen transformation [32] for quadratic equations and a procedure for regular language (dis)equations; moreover, we added heuristics for choosing the best decision procedure to use.

We compared Z3-Noodler with other string solvers on standard SMT-LIB benchmarks [10, 42, 43]. The results indicate that Z3-Noodler is competitive, superior especially on benchmarks containing mostly regular constraints and word (dis)equations, and that the improvements since [20] had a large impact on the number of solved instances as well as its overall performance.

2 Architecture

Z3-Noodler replaces the string theory solver in the DPLL(T)-based SMT solver Z3  [31] (version 4.12.2) with our string solver Noodler  [14], which is based on the stabilization algorithm (cf. Section 3). DPLL(T)-based solvers in general combine a SAT solver providing satisfying assignments to the Boolean skeleton of a formula with multiple theory solvers for checking conjunctions of theory literals.

Z3-Noodler still uses the infrastructure of Z3, most importantly the parser, string theory rewriter and the linear integer arithmetic (LIA) solver. The Z3 parser takes formulae in the SMT-LIB format [10], where Z3-Noodler can handle nearly all predicates/functions (such as substr, len, at, replace, regular membership, word equations, etc.) in the string theory as defined by SMT-LIB [11].

Even though we do use the string theory rewriter of Z3, we disabled those rewritings that do not benefit our core string solver. For instance, we removed rules that rewrite regular membership constraints to other types of constraints since solving regular constraints and word equations using our stabilization-based approach is efficient.

Fig. 1. Architecture of Z3-Noodler

The interaction of the Noodler solver with Z3 is shown in Fig. 1 and works as follows. Upon receiving a satisfying Boolean assignment from the SAT solver, we first remove irrelevant assignments (using Z3's relevancy propagation), which allows us to work with smaller instances and return more general theory lemmas. A theory assignment obtained from the Boolean assignment consists of string (dis)equations, regular constraints, and, possibly, predicates that were not axiom-saturated before (cf. Section 3).

The core Noodler string decision procedure then reduces the conjunction of string literals to a LIA constraint over string lengths and returns it to Z3 as a theory lemma, to be solved together with the rest of the input arithmetic constraints by Z3's internal LIA solver. Noodler implements several decision procedures (discussed in Section 3), which heavily employ the Mata automata library (version 0.109.0) [29]. As an optimization of the theory lemma generation, when the string constraint reduces to a disjunction of LIA length constraints, we check the satisfiability of the individual disjuncts (generated lazily on demand) separately in order to get a positive answer as soon as possible. For testing the disjuncts, the current solver context is cloned and queried about the satisfiability of the LIA constraint conjoined with the disjunct.
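The lazy, disjunct-by-disjunct check can be illustrated with a small Python sketch. It is not taken from the Z3-Noodler code base: the names `toy_lia_sat`, `first_sat_disjunct`, and `lazy_disjuncts` are hypothetical, a "LIA constraint" is modelled as a Python predicate over a single length value, and the brute-force search merely stands in for a query to a cloned solver context.

```python
from typing import Callable, Iterator, List, Optional

# Toy model (assumption): a LIA constraint is a predicate over one length variable.
Constraint = Callable[[int], bool]


def toy_lia_sat(constraint: Constraint, bound: int = 64) -> bool:
    """Brute-force stand-in for a LIA satisfiability query (not a real solver)."""
    return any(constraint(n) for n in range(bound))


def first_sat_disjunct(context: Constraint,
                       disjuncts: Iterator[Constraint]) -> Optional[int]:
    """Return the index of the first disjunct satisfiable together with the context.

    The iterator is consumed only as far as needed, so later disjuncts
    are never generated once a satisfiable one has been found.
    """
    for index, disjunct in enumerate(disjuncts):
        if toy_lia_sat(lambda n, d=disjunct: context(n) and d(n)):
            return index
    return None


generated: List[int] = []  # records which disjuncts were actually produced


def lazy_disjuncts() -> Iterator[Constraint]:
    """Hypothetical length disjuncts len(x) = 3k for k = 0, 1, 2, ..."""
    for k in range(1000):
        generated.append(k)
        yield lambda n, k=k: n == 3 * k


# Stand-in for the remaining arithmetic constraints of the input formula.
context: Constraint = lambda n: n > 10 and n % 2 == 0

index = first_sat_disjunct(context, lazy_disjuncts())
```

In this toy setting the search stops at the disjunct for k = 4 (length 12), and the remaining 995 disjuncts are never constructed, which is the effect the optimization aims for.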

3 String Theory Core

In this section, we provide details about Z3-Noodler's string theory implementation, including initial axiom saturation, preprocessing, the core procedure, and limitations.

Axiom Saturation. In order to best utilize the power of Z3's internal LIA solver during the generation of a satisfiable assignment, we saturate the input formula with length-aware theory axioms and axioms for string predicates (this happens during Z3's processing of the input formula, before the main SAT solver starts generating assignments). We can then avoid checking SAT assignments that trivially violate length conditions. Most importantly, we add length axioms \(\texttt {len} (t_1) \ge 0\), \(\texttt {len} (t_1.t_2) = \texttt {len} (t_1) + \texttt {len} (t_2)\) where \(t_1, t_2\) are arbitrary string terms, and \(\texttt {len} (t_1) = \texttt {len} (t_2)\) for the word equation \(t_1 = t_2\).
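As a rough illustration of the three length axioms, the following Python sketch generates them in SMT-LIB syntax for a word equation. The term encoding (tuples tagged `"var"`, `"lit"`, `"concat"`) and the function names are assumptions made for this example only and do not reflect how Z3-Noodler represents terms internally.

```python
from typing import List, Tuple

# Assumed encoding: ("var", name), ("lit", text), or ("concat", left, right).
Term = Tuple


def show(t: Term) -> str:
    """Print a string term in SMT-LIB syntax."""
    if t[0] == "var":
        return t[1]
    if t[0] == "lit":
        return '"' + t[1] + '"'
    return "(str.++ " + show(t[1]) + " " + show(t[2]) + ")"


def length_axioms(t: Term) -> List[str]:
    """len(t) >= 0 for every subterm, and len(t1.t2) = len(t1) + len(t2)."""
    axioms = ["(>= (str.len " + show(t) + ") 0)"]
    if t[0] == "concat":
        left, right = t[1], t[2]
        axioms.append(
            "(= (str.len " + show(t) + ") (+ (str.len " + show(left)
            + ") (str.len " + show(right) + ")))"
        )
        axioms += length_axioms(left) + length_axioms(right)
    return axioms


def equation_axioms(lhs: Term, rhs: Term) -> List[str]:
    """Axioms for the word equation lhs = rhs, including len(lhs) = len(rhs)."""
    same_length = "(= (str.len " + show(lhs) + ") (str.len " + show(rhs) + "))"
    return length_axioms(lhs) + length_axioms(rhs) + [same_length]


# Example: the word equation x.y = z
xy = ("concat", ("var", "x"), ("var", "y"))
axioms = equation_axioms(xy, ("var", "z"))
```

For x.y = z, the generated list contains the non-negativity axioms for x.y, x, y, and z, the concatenation axiom for x.y, and the equality of the lengths of both sides.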

Moreover, for string functions/predicates, Noodler saturates the original formula with an equivalent formula composed of word (dis)equations and length/regular constraints, which are more suitable for our core procedure (e.g., for \(\lnot \texttt {contains} (s, \texttt {"abc"})\) in the input formula, we add the regular constraint \(s \notin \Sigma ^* \texttt {abc} \Sigma ^*\)). We use different saturation rules for instances of predicates with concrete values. For instance, for \(\texttt {substr} (s,\texttt {4},\texttt {1})\), we add just the term \(\texttt {at} (s, \texttt {4})\). On the other hand, for \(\texttt {substr} (s,t_i,t_j)\), where s is a string term and \(t_i, t_j\) are general integer terms (possibly containing variables), we need to add a more general formula talking about the prefix and suffix of s of given lengths. The original predicate occurrence is then removed from received assignments by Noodler (Z3 does not allow removing parts of the original formula).
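The two concrete-value examples can be checked on sample strings with a short Python sketch. It only illustrates the equivalences themselves and is not the saturation code of Noodler; the helper names are hypothetical, Python's `re` module stands in for the regular constraint, and `substr`/`at` follow our reading of the SMT-LIB semantics of `str.substr` and `str.at`.

```python
import re


def not_contains(s: str, needle: str) -> bool:
    """Direct evaluation of the predicate not contains(s, needle)."""
    return needle not in s


def not_in_regular_language(s: str, needle: str) -> bool:
    """Evaluate the regular constraint: s is not in Sigma* needle Sigma*.

    re.DOTALL makes "." match every character, so ".*" plays the role of Sigma*.
    """
    pattern = re.compile(".*" + re.escape(needle) + ".*", re.DOTALL)
    return pattern.fullmatch(s) is None


def substr(s: str, i: int, n: int) -> str:
    """substr(s, i, n): empty unless 0 <= i < len(s) and n > 0."""
    if 0 <= i < len(s) and n > 0:
        return s[i:i + n]
    return ""


def at(s: str, i: int) -> str:
    """at(s, i): the one-character string at position i, or empty if out of range."""
    return s[i] if 0 <= i < len(s) else ""


SAMPLES = ["", "abc", "xxabcxx", "ab\nc", "a\nabc", "abxc", "hello world"]

contains_agrees = all(
    not_contains(s, "abc") == not_in_regular_language(s, "abc") for s in SAMPLES
)
substr_agrees = all(substr(s, 4, 1) == at(s, 4) for s in SAMPLES)
```

Both flags evaluate to true on the samples, reflecting that the regular constraint and the `at` term are equivalent replacements for the original predicate occurrences.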

Decision Procedures. Z3-Noodler's string theory core contains several complementary decision procedures. The main one is the stabilization-based algorithm for solving word equations with regular constraints introduced in [14] and later extended with efficient handling of length constraints and disequations [20]. The stabilization-based algorithm starts, for every string variable, with an NFA encoding the regular constraints on the variable and iteratively refines the NFA according to the word equations until the stability condition is achieved. The stability condition holds when, for every word equation, the language of the left-hand side (obtained as the language of the concatenation of the NFAs for variables and string literals) equals the language of the right-hand side. When stability is achieved, length constraints of the solutions are generated and passed to the LIA solver. The algorithm is complete for chain-free [5] combinations of equations, regular constraints, and length constraints, together with unrestricted disequations, which is the largest known decidable fragment of these types of constraints.
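To make the stability condition concrete, the following Python sketch checks it in a deliberately simplified setting where every variable is assigned a finite set of words instead of an NFA. This is an assumption made purely for illustration: the sketch covers only the stability check as stated above, not the refinement loop, and string literals would be modelled as variables with a singleton language.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Simplifying assumption: languages are finite sets of words, not NFAs.
Languages = Dict[str, Set[str]]
Equation = Tuple[List[str], List[str]]  # each side is a list of variable names


def side_language(side: List[str], langs: Languages) -> FrozenSet[str]:
    """Language of a concatenation of variables (word-wise concatenation)."""
    words = {""}
    for var in side:
        words = {u + v for u in words for v in langs[var]}
    return frozenset(words)


def is_stable(equations: List[Equation], langs: Languages) -> bool:
    """Stability: both sides of every equation denote the same language."""
    return all(
        side_language(lhs, langs) == side_language(rhs, langs)
        for lhs, rhs in equations
    )


equations = [(["x", "y"], ["z"])]  # the word equation x.y = z

stable_langs = {"x": {"a"}, "y": {"b", "bb"}, "z": {"ab", "abb"}}
unstable_langs = {"x": {"a"}, "y": {"b", "bb"}, "z": {"ab", "abb", "c"}}

stable = is_stable(equations, stable_langs)
unstable = is_stable(equations, unstable_langs)
```

In the second assignment, the word c in the language of z has no counterpart on the left-hand side, so the assignment is not stable and a refinement step would be needed.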

The stabilization-based decision procedure starts by inductively converting the initial regular constraints into NFAs. During the construction, we utilize eager simulation-based reduction [16, 17] with on-demand determinization and minimization.

For an efficient handling of quadratic equations (systems of equations with at most two occurrences of each variable) with lengths, Noodler implements a decision procedure based on the Nielsen transformation [32]. The algorithm constructs a graph corresponding to the system and reasons about it to determine if the input formula is satisfiable or not [22, 38]. If the system contains length variables, we also create a counter automaton corresponding to the Nielsen graph (in a similar way as in [28]). In the subsequent step, we contract edges, saturating the set of self-loops and, finally, we iteratively generate flat counter sub-automata (a flat counter automaton only allows cycles that are self-loops), which are later transformed into LIA formulae describing lengths of all possible solutions.
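The graph exploration underlying the Nielsen transformation can be sketched in Python for a single quadratic equation without length constraints. This is a textbook-style simplification under our own assumptions, not Noodler's implementation: upper-case characters denote variables, lower-case characters denote letters, variables may be assigned the empty word, and the counter-automaton construction for lengths is omitted entirely.

```python
from collections import Counter, deque
from typing import List, Tuple

Side = Tuple[str, ...]
Equation = Tuple[Side, Side]


def is_var(sym: str) -> bool:
    """Assumed convention: upper-case symbols are variables."""
    return sym.isupper()


def normalize(lhs: Side, rhs: Side) -> Equation:
    """Strip the longest common prefix of both sides."""
    i = 0
    while i < min(len(lhs), len(rhs)) and lhs[i] == rhs[i]:
        i += 1
    return lhs[i:], rhs[i:]


def substitute(side: Side, var: str, repl: Side) -> Side:
    """Replace every occurrence of var in side by the sequence repl."""
    out: List[str] = []
    for sym in side:
        out.extend(repl if sym == var else (sym,))
    return tuple(out)


def successors(eq: Equation) -> List[Equation]:
    """Nielsen rules X -> empty and X -> aX, where a is the opposite head symbol."""
    lhs, rhs = eq
    a = lhs[0] if lhs else None
    b = rhs[0] if rhs else None
    rules = []
    if a is not None and is_var(a):
        rules.append((a, ()))
        if b is not None:
            rules.append((a, (b, a)))
    if b is not None and is_var(b):
        rules.append((b, ()))
        if a is not None:
            rules.append((b, (a, b)))
    return [normalize(substitute(lhs, v, r), substitute(rhs, v, r))
            for v, r in rules]


def nielsen_sat(lhs: str, rhs: str) -> bool:
    """Explore the Nielsen graph; satisfiable iff the trivial equation is reachable."""
    counts = Counter(sym for sym in lhs + rhs if is_var(sym))
    if any(c > 2 for c in counts.values()):
        raise ValueError("equation is not quadratic")
    start = normalize(tuple(lhs), tuple(rhs))
    seen = {start}
    queue = deque([start])
    while queue:
        eq = queue.popleft()
        if eq == ((), ()):
            return True
        for nxt in successors(eq):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For quadratic equations, the rules never increase the length of the equation, so only finitely many nodes are reachable and the exploration terminates. For example, `nielsen_sat("Xa", "aX")` finds the solution with X empty, while `nielsen_sat("Xa", "bX")` exhausts the graph without reaching the trivial equation.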

In order to solve (dis)equations of regular expressions, we reduce the problem to reasoning about the corresponding NFAs (similarly to the handling of regular constraints). In particular, we use efficient NFA equivalence and universality checking from Mata, which implements advanced antichain-based algorithms [6, 46].
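For intuition, language equivalence of two NFAs can be decided by exploring pairs of macro-states of an on-the-fly subset construction, as in the Python sketch below. This is the naive baseline, not the antichain-based algorithm and not Mata's API: the `NFA` class, its fields, and the function names are hypothetical and chosen only for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class NFA:
    """Hypothetical minimal NFA representation (no epsilon transitions)."""
    initial: FrozenSet[int]
    final: FrozenSet[int]
    delta: Dict[Tuple[int, str], FrozenSet[int]]

    def step(self, states: FrozenSet[int], symbol: str) -> FrozenSet[int]:
        """Macro-state reached from `states` by reading `symbol`."""
        out = set()
        for q in states:
            out |= self.delta.get((q, symbol), frozenset())
        return frozenset(out)

    def accepting(self, states: FrozenSet[int]) -> bool:
        return bool(states & self.final)


def equivalent(a: NFA, b: NFA, alphabet: Iterable[str]) -> bool:
    """Naive check: explore reachable pairs of macro-states of both automata."""
    alphabet = list(alphabet)
    start = (a.initial, b.initial)
    seen = {start}
    queue = deque([start])
    while queue:
        sa, sb = queue.popleft()
        if a.accepting(sa) != b.accepting(sb):
            return False  # some word is accepted by exactly one automaton
        for symbol in alphabet:
            nxt = (a.step(sa, symbol), b.step(sb, symbol))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


def universal(a: NFA, alphabet: Iterable[str]) -> bool:
    """Universality as equivalence with a one-state automaton accepting everything."""
    alphabet = list(alphabet)
    everything = NFA(
        initial=frozenset({0}),
        final=frozenset({0}),
        delta={(0, s): frozenset({0}) for s in alphabet},
    )
    return equivalent(a, everything, alphabet)


# a* with one state, a* with two states, and a+ (all over the alphabet {a})
a_star = NFA(frozenset({0}), frozenset({0}), {(0, "a"): frozenset({0})})
a_star_2 = NFA(frozenset({0}), frozenset({0, 1}),
               {(0, "a"): frozenset({1}), (1, "a"): frozenset({1})})
a_plus = NFA(frozenset({0}), frozenset({1}),
             {(0, "a"): frozenset({1}), (1, "a"): frozenset({1})})
```

The naive exploration may visit exponentially many macro-states; antichain-based algorithms reduce this cost by pruning macro-states that are subsumed by ones already explored.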

Preprocessing. Each decision procedure employs a sequence of preprocessing rules transforming the string constraint to a more suitable form. Our portfolio of rules includes transformations reducing the number of equations by a conversion to regular constraints, propagating epsilons and variables over equations, underapproximation rules, and rules reducing the number of disequations (cf. [20]). On top of that, Z3-Noodler employs information about length-equivalent variables, which allows it to infer simpler constraints (e.g., for \(xy=zw\) with \(\texttt {len} (x) = \texttt {len} (z)\), we can infer \(y = w\)). Z3-Noodler also checks for simple unsatisfiable patterns for early termination. A sequence of preprocessing rules is composed for each of the decision procedures differently, maximizing their strengths.

Supported String Predicates and Limitations. Z3-Noodler currently supports the basic string predicates replace, substr, at, indexof, prefix, suffix, and contains, and has limited support for \(\lnot \texttt {contains} \). From the set of extended constraints, the core solver currently does not support the replace_all function (and variants of replacement based on regular expressions) and to/from_int conversions. The decision procedures used in Z3-Noodler make it complete for the chain-free fragment with unbounded disequations and regular constraints [20], and quadratic equations. Outside this fragment, our theory core is sound but incomplete.

4 Experiments

Tools and environment. We compared Z3-Noodler with the following state-of-the-art tools: cvc5 [9] (version 1.0.8), Z3 [31] (version 4.12.2), Z3str3RE [12, 13], Z3str4 [30], OSTRICH [19], and Z3-Noodler \(^{ pr }\) (version 0.1.0 used in [20]). We did not compare with Z3-Trau [2] as it is no longer under active development and gives incorrect results on newer benchmarks. The experiments were executed on a workstation with an Intel Xeon Silver 4314 CPU @ 2.4 GHz and 128 GiB of RAM running Debian GNU/Linux. The timeout was set to 120 s and the memory limit to 8 GiB.

Benchmarks. The benchmarks come from the SMT-LIB [10] repository, specifically categories QF_S  [42] and QF_SLIA  [43]. These benchmarks were also used in SMT-COMP’23 [41], in which Z3-Noodler participated (version 0.2.0). As Z3-Noodler does not support to/from_int conversions and replace_all-like predicates, we excluded formulae whose satisfiability checking needs their support. Based on the occurrences of different kinds of constraints, we divide the benchmarks into three groups:

  • Regex. This category contains formulae with dominating regular membership and length constraints. It consists of the AutomatArk [13], Denghang, StringFuzz [15], and Sygus-qgen benchmark sets. We excluded 1,568 formulae from StringFuzz that require support of the to_int predicate.

  • Equations. The formulae in this category consist mostly of word equations with length constraints and a small number of other predicates. It contains the Kaluza [27, 40], Kepler [25], Norn [3, 4], Slent [44], Slog [45], Webapp, and Woorpje [24] benchmark sets. We excluded 414 formulae from Webapp that require support of the replace_all, replace_re, and replace_re_all predicates.

  • Predicates-small. Although Z3-Noodler focuses mainly on word equations with length and regular constraints, the evaluation also includes a group consisting of smaller formulae that use string predicates such as substr, at, contains, etc. It is formed from the FullStrInt, LeetCode, and StrSmallRw [33] benchmark sets. We removed 5,509 formulae containing the to/from_int predicates from FullStrInt and StrSmallRw.

We also consider the PyEx  [37] benchmark, which we do not put into any of these groups, as it contains large formulae with complex predicates (substr, contains, etc.). We note that we omit the small Transducer+  [18] benchmark because it contains exclusively formulae with replace_all.

Table 1. Results of experiments on all benchmark sets. For each tool and benchmark set (as well as whole groups under \(\Sigma \)), we give the number of unsolved instances. Results for tools with the highest number of solved instances are in bold. Numbers with \({}^*\) contain also incorrect results.
Table 2. Average run times (in seconds) of solved instances and their standard deviations.

Results. We show the number of unsolved instances for each benchmark and tool (as well as whole groups) in Table 1. Some tools gave incorrect results (determined by comparing to the output of cvc5 and Z3) for some benchmarks. Usually, this concerned fewer than 10 instances, except for Z3str3RE on StringFuzz and StrSmallRw (50 and 12 incorrect results, respectively) and Z3-Noodler \(^{ pr }\) on StrSmallRw (218 incorrect results). Table 2 then shows the average run times and their standard deviations for solved instances for each category and tool.

The results show that Z3-Noodler outperforms the other tools on the Regex group (in particular on Denghang, StringFuzz, and Sygus-qgen) both in the number of solved instances and in the average run time. AutomatArk is the only benchmark set in this group on which it does not solve the most formulae (it solves only 7 fewer than the winner, OSTRICH, while being much faster).

On the Equations group, Z3-Noodler also outperforms the other tools on most of the benchmarks, in particular on Kepler, Norn, Slent, Slog, and Webapp. On Kaluza, it is outperformed by other tools, but it still solves the vast majority of formulae. Z3-Noodler has worse performance on Woorpje, which seems to be a synthetic benchmark generated to showcase the strength of a specialized algorithm [24] (this benchmark is the reason for Z3-Noodler taking second place in the whole group). With 0.11 s, Z3-Noodler and cvc5 have the lowest average run time.

The winner of Predicates-small is cvc5. On FullStrInt and LeetCode, the difference from Z3-Noodler is 4 instances in each case, and on StrSmallRw the difference is 51 instances. The average run time of Z3-Noodler is also a bit higher: 0.11 s compared to 0.03 s for cvc5. Similarly, Z3-Noodler is outperformed by cvc5, Z3, and Z3str4 on PyEx. Indeed, we have not optimized Z3-Noodler for formulae with large numbers of predicates yet. The results of Z3-Noodler could, however, be further improved by proper axiom saturation for predicates or lazy predicate evaluation.

Fig. 2. Comparison of Z3-Noodler with cvc5, Z3, and the virtual best solver (VBS). Times are in seconds, axes are logarithmic. Dashed lines represent timeouts (120 s). Colours distinguish the groups Regex, Equations, and Predicates-small.

Table 3. Evaluating solver contribution to a portfolio. Times are in seconds.

In Fig. 2, we show scatter plots comparing the running time of Z3-Noodler with cvc5, Z3, and the virtual best solver (VBS; a solver that takes the best result from all tools other than Z3-Noodler) on all three benchmark groups. The plots show that Z3-Noodler outperforms the competitors on a vast number of instances, in many cases being complementary to them. To validate this claim, we also checked how different solvers contribute to a portfolio. That is, we took the VBS including Z3-Noodler (VBS \(^+\)) and then checked how well the portfolio works without each of the solvers. Table 3 shows the results on the Regex and Equations groups (we omit Predicates-small, where Z3-Noodler does not help the portfolio). The results show that on the two groups, Z3-Noodler is the most valuable solver in the portfolio. We also include results on the small portfolio of Z3 and cvc5 (with and without Z3-Noodler) showing that, on the two groups, using just these three solvers is almost as good as using the whole portfolio of all solvers.
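The portfolio evaluation described above can be reproduced from per-instance run times along the lines of the following Python sketch. The function names are our own, and the solver names `A`, `B`, `C` and all run times are made-up toy data used only to exercise the code; they are not results from our experiments.

```python
from typing import Dict, Iterable, Optional

# solver -> instance -> run time in seconds (None means unsolved/timeout)
Results = Dict[str, Dict[str, Optional[float]]]


def virtual_best(results: Results, solvers: Iterable[str],
                 instances: Iterable[str]) -> Dict[str, Optional[float]]:
    """Per instance, take the best (lowest) run time among the given solvers."""
    solvers = list(solvers)
    best: Dict[str, Optional[float]] = {}
    for inst in instances:
        times = [results[s].get(inst) for s in solvers]
        solved = [t for t in times if t is not None]
        best[inst] = min(solved) if solved else None
    return best


def unsolved(vbs: Dict[str, Optional[float]]) -> int:
    """Number of instances that no solver in the portfolio solved."""
    return sum(1 for t in vbs.values() if t is None)


def contribution(results: Results, instances: Iterable[str]) -> Dict[str, int]:
    """How many more instances stay unsolved when one solver is left out."""
    instances = list(instances)
    full = unsolved(virtual_best(results, results.keys(), instances))
    return {
        left_out: unsolved(virtual_best(
            results, [s for s in results if s != left_out], instances)) - full
        for left_out in results
    }


# Made-up toy data (NOT measurements from the paper).
toy_results: Results = {
    "A": {"i1": 0.5, "i2": None, "i3": 3.0},
    "B": {"i1": 0.2, "i2": None, "i3": None},
    "C": {"i1": None, "i2": 1.0, "i3": None},
}
toy_instances = ["i1", "i2", "i3"]

toy_vbs = virtual_best(toy_results, toy_results.keys(), toy_instances)
toy_contribution = contribution(toy_results, toy_instances)
```

In the toy data, leaving out solver A or C each costs the portfolio one instance, while leaving out B costs none; B still improves the portfolio's run time on i1, which is why both the number of unsolved instances and the run times are worth reporting.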

Comparing with the older version Z3-Noodler \(^{ pr }\) from [20], we can see that there is a significant improvement in most benchmarks, most significantly in AutomatArk, StringFuzz, Kepler, StrSmallRw, and Kaluza. We note that adding more complicated algorithm selection strategies significantly improved the overall performance of Z3-Noodler, but, on the other hand, decreased the performance on Kaluza (cf. [20]). Better results in AutomatArk and StringFuzz stem from the improvements in Mata and from heuristics tailored for regular expressions handling. Including Nielsen’s algorithm [32] has the largest impact on the Kepler benchmark. The improvement on predicate-intensive benchmarks is caused by optimizations in axiom saturation for predicates. The older version also had multiple bugs that have been fixed in the current version.