Symbolic tree automata
Introduction
Finite word automata and finite tree automata provide a foundation for a wide range of applications in software engineering, from regular expressions to compiler technology and specification languages. Despite their immense practical use, explicit representations are not feasible for large finite alphabets, because they require each transition to encode only a single element of the alphabet. For example, string characters in standard programming languages (such as the char type in C#) use 16-bit bit-vectors, so an explicit representation would require an alphabet of size 2^16. Moreover, most common forms of finite automata do not support infinite alphabets.
A practical solution to the representation problem is symbolic tree automata. They extend classical tree automata by allowing transitions to be labeled with arbitrary formulas in a specified label theory. While the idea of allowing formulas is straightforward, typical extensions of finite tree automata often lead either to undecidability of the emptiness problem, as for tree automata with equality and disequality constraints [1], or to nonclosure under complement, as for the generalized tree set automata class [1], finite-memory tree automata [2], which generalize finite-memory automata [3] to trees, and unranked data tree automata [4]. We show that this is not the case for symbolic tree automata. The key distinction is that the extension here is with respect to characters, rather than adding symbolic states or constraints over whole subtrees.
The symbolic extension is practically useful for exploiting efficient symbolic constraint solvers when performing basic automata-theoretic transformations: it enables a separation of concerns. The solver is used as a black box with a clearly defined interface that exposes the label theory as an effective Boolean algebra. The chosen label theory can be specific to a particular problem instance. For example, even when the alphabet is finite, e.g., 16-bit bit-vectors, it may be useful for efficiency reasons to use integer-linear arithmetic rather than bit-vector arithmetic when the solver is more efficient over integers and when only standard arithmetic operations (and no bit-level operations) are being used. Recent work [5], [6] on symbolic string recognizers and transducers takes advantage of this observation.
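To make the black-box interface concrete, here is a minimal sketch of an effective Boolean algebra, assuming the label universe is the set of 16-bit characters and that predicates are unions of character intervals. The names `Pred`, `rng`, `TRUE`, and `FALSE` are hypothetical and chosen only for illustration; a real label theory would typically delegate these operations to a constraint solver.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_CHAR = 0xFFFF  # assumed universe: 16-bit characters 0..65535


@dataclass(frozen=True)
class Pred:
    """Predicate denoting a union of sorted, disjoint, inclusive intervals."""
    ranges: Tuple[Tuple[int, int], ...] = ()

    def is_sat(self) -> bool:
        # Satisfiable iff the denoted set is non-empty.
        return bool(self.ranges)

    def contains(self, c: int) -> bool:
        return any(lo <= c <= hi for lo, hi in self.ranges)

    def neg(self) -> "Pred":
        # Complement: collect the gaps between consecutive intervals.
        out, start = [], 0
        for lo, hi in self.ranges:
            if start < lo:
                out.append((start, lo - 1))
            start = hi + 1
        if start <= MAX_CHAR:
            out.append((start, MAX_CHAR))
        return Pred(tuple(out))

    def conj(self, other: "Pred") -> "Pred":
        # Intersection: pairwise overlap of intervals, kept sorted.
        out = []
        for a_lo, a_hi in self.ranges:
            for b_lo, b_hi in other.ranges:
                lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
                if lo <= hi:
                    out.append((lo, hi))
        return Pred(tuple(sorted(out)))

    def disj(self, other: "Pred") -> "Pred":
        # Union via De Morgan, so only neg and conj need direct definitions.
        return self.neg().conj(other.neg()).neg()


def rng(lo: int, hi: int) -> Pred:
    """Hypothetical helper: the predicate lo <= c <= hi."""
    return Pred(((lo, hi),))


TRUE = rng(0, MAX_CHAR)
FALSE = Pred()

digit = rng(ord("0"), ord("9"))
lower = rng(ord("a"), ord("z"))
alnum = digit.disj(lower)
```

Automata algorithms written against `conj`, `neg`, and `is_sat` alone do not need to know how predicates are represented, which is the separation of concerns described above.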
Here we investigate the more expressive class of symbolic tree automata. Even though a symbolic tree automaton is a finite object, a key point is that the number of interpretations for symbolic labels does not need to be finite. For example, as a consequence of our main result (Theorem 2), a label theory may itself be the theory of symbolic tree automata (over some basic label theory).
In order to use classical tree automata algorithms, it is possible to reduce a symbolic tree automaton A to a classical finite tree automaton whose alphabet is given by all of the satisfiable Boolean combinations of guards that occur in A. However, such a transformation is in general not practical because it introduces an exponential increase in the size of the automaton before the actual algorithm is applied. Moreover, when more than one automaton is involved, this has to be done up front for all predicates that occur in all the automata in order to define the common alphabet. A concrete example of such a blowup is given in [7, Example 2].
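The blowup can be illustrated with a small sketch that enumerates the satisfiable Boolean combinations of a set of guards. It assumes, purely for illustration, that guards are Python frozensets over a tiny finite universe standing in for solver predicates; the function name `minterms` and the example guards are hypothetical and are not taken from [7].

```python
from typing import FrozenSet, List


def minterms(guards: List[FrozenSet[int]],
             universe: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Return the non-empty regions g1' & ... & gn', where each gi' is
    either the guard gi or its complement within the universe."""
    parts = [universe] if universe else []
    for g in guards:
        # Split every region into the part inside g and the part outside g,
        # dropping empty (unsatisfiable) combinations.
        parts = [piece for p in parts for piece in (p & g, p - g) if piece]
    return parts


# Assumed example: n independent guards ("bit i of the label is set").
universe = frozenset(range(16))
bit_guards = [frozenset(x for x in universe if (x >> i) & 1) for i in range(4)]
regions = minterms(bit_guards, universe)
```

With these four independent guards every one of the 2^4 combinations is satisfiable, so the derived classical alphabet has 16 letters. In general n guards can yield up to 2^n letters, which is the exponential increase mentioned above.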
Definition of symbolic tree automata
We introduce an extension of tree automata with an effective encoding of labels by predicates that denote sets of labels, rather than individual labels. We assume a countable background universe . A predicate φ over is a finite representation of a subset of ; we write when is clear from the context. We assume given an effectively enumerable set of predicates Σ such that, for each element there is such that , such that and , and Σ is effectively
Determinization of symbolic tree automata
Similar to the case of deterministic frontier-to-root tree recognizers, DSTAs have the same expressive power as general STAs. We lift the classical powerset construction of nondeterministic Rabin-Scott recognizers to STAs. Let denote the powerset of a set X. We write for the rule . Definition 5 Let . The powerset STA of A is: where
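The following is a rough sketch of a powerset-style determinization, not a transcription of Definition 5. It assumes that a rule is a tuple (symbol, guard, child states, target state), that guards are frozensets over a small finite universe standing in for solver predicates, and that a tree is a triple (symbol, label, subtrees). All names here are hypothetical.

```python
from itertools import product as cartesian


def minterms(guards, universe):
    """Non-empty Boolean combinations of the guards within the universe."""
    parts = [universe] if universe else []
    for g in guards:
        parts = [piece for p in parts for piece in (p & g, p - g) if piece]
    return parts


def determinize(rules, finals, universe):
    """Build a deterministic bottom-up automaton over sets of states.
    Returns (delta, det_finals), where delta maps (symbol, child sets)
    to a list of (guard, target set) branches with disjoint guards."""
    arities = {(sym, len(kids)) for sym, _, kids, _ in rules}
    states, delta = set(), {}
    changed = True
    while changed:  # repeat until no new subset state is discovered
        changed = False
        for sym, k in arities:
            for kid_sets in cartesian(list(states), repeat=k):
                if (sym, kid_sets) in delta:
                    continue
                # Rules whose child states are all available in kid_sets.
                live = [(g, t) for s, g, kids, t in rules
                        if s == sym and len(kids) == k
                        and all(q in qs for q, qs in zip(kids, kid_sets))]
                branches = []
                for m in minterms([g for g, _ in live], universe):
                    # A minterm lies entirely inside or outside each guard.
                    target = frozenset(t for g, t in live if m <= g)
                    branches.append((m, target))
                    if target not in states:
                        states.add(target)
                        changed = True
                delta[(sym, kid_sets)] = branches
    det_finals = {s for s in states if s & finals}
    return delta, det_finals


def run(delta, tree):
    """Evaluate a tree bottom-up; exactly one branch matches each label."""
    sym, label, kids = tree
    kid_sets = tuple(run(delta, k) for k in kids)
    for guard, target in delta[(sym, kid_sets)]:
        if label in guard:
            return target
    raise ValueError("label outside the assumed universe")


# Assumed example: state "z" means "some label in the subtree is 0".
U = frozenset(range(4))
ZERO = frozenset({0})
rules = [("leaf", U, (), "a"), ("leaf", ZERO, (), "z"),
         ("node", U, ("a", "a"), "a"), ("node", ZERO, ("a", "a"), "z"),
         ("node", U, ("z", "a"), "z"), ("node", U, ("a", "z"), "z")]
delta, det_finals = determinize(rules, {"z"}, U)
```

Splitting the guards of the applicable rules into minterms is what keeps the result deterministic: the branches for a given symbol and tuple of child sets partition the universe, so each label selects exactly one target set.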
Boolean closure of symbolic tree automata
For complete closure under Boolean operations we use the following product construction that is a lifting of the standard product of finite tree automata to STAs.
Definition 6 Let , for , be STAs. The product of and is the STA where, for and ,
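A minimal sketch of such a product, specialized to intersection, is given below. It is not a transcription of Definition 6. It assumes that a rule carries a symbol, a guard, a tuple of child states, and a target state, that guards are frozensets standing in for solver predicates (so conjunction is set intersection and satisfiability is non-emptiness), and that a tree is a triple (symbol, label, subtrees). All names are hypothetical.

```python
from itertools import product as cartesian
from typing import FrozenSet, NamedTuple, Tuple


class Rule(NamedTuple):
    symbol: str
    guard: FrozenSet[int]   # stand-in for a predicate of the label theory
    children: Tuple         # states required of the subtrees
    target: object          # state assigned to the node


class STA(NamedTuple):
    rules: Tuple[Rule, ...]
    finals: FrozenSet


def product_sta(a: STA, b: STA) -> STA:
    """Pair up rules with the same symbol and rank, conjoin their guards,
    and keep only the rules whose conjoined guard is satisfiable."""
    rules = []
    for r, s in cartesian(a.rules, b.rules):
        if r.symbol != s.symbol or len(r.children) != len(s.children):
            continue
        guard = r.guard & s.guard
        if not guard:  # unsatisfiable conjunction: prune the rule
            continue
        rules.append(Rule(r.symbol, guard,
                          tuple(zip(r.children, s.children)),
                          (r.target, s.target)))
    # Intersection: a pair is final when both components are final.
    return STA(tuple(rules), frozenset(cartesian(a.finals, b.finals)))


def states_of(sta: STA, tree) -> set:
    """All states the automaton can assign to the root of the tree."""
    symbol, label, kids = tree
    kid_states = [states_of(sta, k) for k in kids]
    return {r.target for r in sta.rules
            if r.symbol == symbol and label in r.guard
            and len(r.children) == len(kids)
            and all(q in ks for q, ks in zip(r.children, kid_states))}


def accepts(sta: STA, tree) -> bool:
    return bool(states_of(sta, tree) & sta.finals)


# Assumed example: A accepts trees with only even labels, B trees with
# only labels below 10.
EVEN = frozenset(x for x in range(100) if x % 2 == 0)
SMALL = frozenset(range(10))
A = STA((Rule("leaf", EVEN, (), "e"), Rule("node", EVEN, ("e", "e"), "e")),
        frozenset({"e"}))
B = STA((Rule("leaf", SMALL, (), "s"), Rule("node", SMALL, ("s", "s"), "s")),
        frozenset({"s"}))
AB = product_sta(A, B)
```

Pruning rules with unsatisfiable guards is the only point where the label theory is consulted, so the construction never enumerates individual labels.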
Related work
Our interest in automata and transducers with symbolic alphabets originally surfaced in the context of security analysis of string sanitization routines [6]. Sanitizers transform untrusted data into trusted data as a first line of defense against cross-site scripting (XSS) attacks in web browsers. Symbolic transducers were generalized to symbolic tree transducers (STTs) in [12]. Boolean closure operations of STAs were initially studied in [13] where preliminary results corresponding to Theorem 1
References (25)
- et al. Tree automata techniques and applications
- et al. Tree automata over infinite alphabets
- et al. Finite-memory automata
- et al. Efficient reasoning about data trees via integer linear programming
- et al. Symbolic automata constraint solving
- et al. Symbolic finite state transducers: algorithms and applications
- et al. Minimization of symbolic automata
- et al. Tree Automata (1984)
- et al. An evaluation of automata algorithms for string analysis
- et al. Fast: a transducer-based language for tree manipulation
- Symbolic tree transducers