1 Introduction

Over the years, research on concurrent verification has been chiefly conducted under the premise that the threads run according to the classical Sequential Consistency (SC) semantics. Under SC, the threads operate on a set of shared variables through which they communicate atomically, i.e., read and write operations take effect immediately. In particular, a write operation is visible to all the threads as soon as the writer thread carries out its operation. Therefore, the threads always maintain a uniform view of the shared memory: they all see the latest value written on any given variable and we can interpret program runs as interleavings of sequential thread executions. Although SC has been immensely popular as an intuitive way of understanding the behaviours of concurrent threads, it is no longer realistic to assume that computation platforms guarantee SC. The reason is that, due to hardware and compiler optimizations, most modern platforms allow more relaxed program behaviours than those permitted under SC, leading to so-called weak memory models. Weakly consistent platforms are found at all levels of system design such as multiprocessor architectures (e.g., [32, 33]), cache protocols (e.g., [19, 31]), language level concurrency (e.g., [24]), and distributed data stores (e.g., [17]). Program behaviours change dramatically when moving from the SC semantics to weaker semantics. Therefore, in recent years, research on the verification of concurrent programs under weak memory models has started to become popular. A classical example of weak memory models is the Total Store Ordering (TSO) semantics which is a formalization of the Intel x86 processor architecture [29]. The TSO semantics inserts an unbounded FIFO buffer, called the store buffer, between each thread and the main memory. When a thread performs a write instruction, the corresponding operation is appended to the end of the buffer, and hence it is not immediately visible to other threads.
The write messages are non-deterministically propagated from the store buffer of a given thread to the shared memory. Verification of programs that contain data races needs to take the underlying memory model into account. This is crucial in hardware-close programming, especially in concurrent libraries or kernels. Such applications are inherently racy; exploiting racy WMM operations for efficiency is standard practice. Our work serves as a foundation for ensuring the correctness of such systems, which often rely on these intricate memory models to achieve optimal performance.

In a parallel development, significant research has been done on extending model checking frameworks to programs with infinite state spaces. There are two main reasons why a program might have an infinite state space. The first is that the program has unbounded control structures, which means it can have an unbounded number of threads. Examples include parameterized systems, in which correctness of the system is checked regardless of the number of threads, and programs that allow dynamic thread creation through spawning [11]. Secondly, the program may operate on unbounded data structures, such as clocks [12], stacks [16], and queues ([1, 10]). These works, including their extensions, have been done under the SC assumption. Although recent works have started to explore parameterized verification for weak memory models [4, 6, 22], the verification of programs that operate on a shared unbounded data structure with weak memory semantics has remained unexplored until now.

In this paper, we combine infinite-state programs with weak memory models: we study the decidability and complexity of the reachability problem for programs operating on unbounded data structures under the TSO semantics. While the TSO semantics has been extensively studied (e.g., [5, 15]), it has been assumed that the data domain is finite. This means that the possible values of a shared variable or a register are bounded. In contrast, our model allows for an infinite domain such as natural numbers \(\mathbb {N}\) or real numbers \(\mathbb {R}\). It contains register assignments, an operator that may assign an arbitrary value to a register, and a set of relations that act as guards. We focus on the relations equality and “greater than” on totally ordered sets, together with their combinations, negations, and inversions. Our model finds practical utility in continuously running concurrent protocols. A prime example is the bakery ticket protocol, which is presented in Section 4. Here, an unbounded number of requests occur, each request is assigned an increasing number, and the lowest-numbered request is serviced. This presents a scenario with inherent races that requires an infinite domain which our model can effectively capture. Note that our model is infinite in multiple dimensions: the threads are infinite-state as they operate on unbounded data domains, the store buffers are unbounded, and they carry write-messages over an unbounded domain.

In order to perform safety verification, we need to decide whether there is an execution that can reach some undesirable control state. We study the control state reachability problem and show that for many domains and relations, it is undecidable. Therefore, we propose an alternative approach by introducing an under-approximation schema using context-bounding [14, 23, 25, 28, 30]. Context-bounding has been proposed in [30] as a suitable approach for efficient bug detection in multithreaded programs. Indeed, for concurrent programs, a bounding concept that provides both good coverage and scalability must be based on aspects related to the interactions between concurrent components. It has been shown experimentally that concurrency bugs usually show up after a small number of context switches [28]. In this work, we study a context bounded analysis where only the active thread may perform an operation and update the memory. We show that in this case, the state reachability problem is not only decidable, but even PSPACE complete. To this end, we perform a two-step abstraction that employs insights about context bounded runs of TSO semantics as well as the structure of reachable configurations.

In the first step of our abstraction process, we refine the methods introduced by [14]. Their construction introduces a code-to-code translation that abstracts the buffer, simplifying the problem to state reachability under SC. Our approach leverages the fact that this abstraction does not explicitly depend on variable values. In our case, the abstraction step yields a register machine where the register values are integers or real numbers, and the transitions are conditioned by “gap-constraints” [9, 18, 27]. Gap constraints serve to identify, within each system configuration, (i) the variables with identical values and (ii) the gaps (differences) between variable values. Notably, these gaps can be arbitrarily large. The papers [9, 18, 27] analyze programs with gap constraints within the framework of well-structured systems [8, 20]. As a result, they do not provide upper bounds on the complexity.

As another key contribution of this paper, we propose a method to achieve PSPACE completeness. The fundamental idea behind our algorithm is that for any system execution, there is an alternative execution with larger gaps among the variables. This implies that we do not need to explicitly track the gaps between variables, as is the case in [9, 18, 27]. Instead, we implement a second (precise) abstraction step, focusing solely on the order of variables. For any pair of variables x and y, we record whether \(x=y\), \(x<y\), or \(x>y\).
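The order abstraction described above can be illustrated with a minimal Python sketch. The function names `order_abstraction` and `stretch` are hypothetical, and multiplying all values by a positive factor is only one simple way to obtain a valuation with larger gaps; it is not the construction used in the proofs.

```python
from itertools import combinations


def order_abstraction(valuation):
    """Map each unordered pair of variables to '=', '<' or '>'."""
    abstraction = {}
    for x, y in combinations(sorted(valuation), 2):
        vx, vy = valuation[x], valuation[y]
        if vx == vy:
            abstraction[(x, y)] = "="
        elif vx < vy:
            abstraction[(x, y)] = "<"
        else:
            abstraction[(x, y)] = ">"
    return abstraction


def stretch(valuation, factor):
    """Scale every value by a positive factor, which widens all non-zero gaps."""
    return {var: value * factor for var, value in valuation.items()}


valuation = {"x": 3, "y": 3, "z": 7}
abstract = order_abstraction(valuation)
stretched_abstract = order_abstraction(stretch(valuation, 10))
```

Both valuations yield the same abstract configuration, which is the sense in which the exact gaps do not need to be tracked.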

2 Related Work

Not much current work considers the complexity and decidability of infinite-state programs on weak memory models. Furthermore, most existing works consider parameterized verification rather than programs with infinite data domains. The paper [6] considers parameterized verification of programs running under TSO, and shows that the reachability problem is PSPACE complete. However, the work assumes that the threads are finite-state and, in particular, the threads do not manipulate unbounded data domains. The paper [22] shows PSPACE completeness when the underlying semantics is the Release-Acquire fragment of C11. The latter semantics differs from TSO. The paper also considers finite-state threads.

In [2], parameterized verification of programs running under TSO is considered. However, the paper applies the framework of well-structured systems where the buffers of the threads are modelled as lossy channels, and hence the complexity of the algorithm is non-primitive recursive. In particular, the paper does not give any complexity bounds for the reachability problem (or any other verification problems). The paper [15] considers checking the robustness property against SC for parameterized systems running under the TSO semantics. However, the robustness problem is entirely different from reachability and the techniques and results developed in this work cannot be applied in our setting.

The paper [4] considers parameterized verification under the TSO semantics when the individual threads are infinite-state. However, the authors study a restricted model in which (i) all threads are identical and (ii) the threads do not use atomic operations. Generally, parameterized verification for the restricted model is easier than non-parameterized verification. For instance, in the case of TSO where the threads are finite-state, the restricted parameterized verification problem is in PSPACE [6] while the non-parameterized problem has a non-primitive recursive complexity [13].

There are many works on extending infinite-state systems with unbounded data domains. Well-studied examples are Petri nets with data tokens [27], stacks with unbounded stack alphabets [7], and lossy channel systems with unbounded message alphabets [1]. All these works assume the SC semantics and are hence orthogonal to this work.

3 Total Store Order (TSO)

Let \(\mathbb {B}=\{true,false\}\). Given a function \(f: A\rightarrow B\) with \(a\in A,b\in B\), \(f[a\leftarrow b]\) is defined as follows: \(f[a\leftarrow b](a)\mathrel {:=}b\), \(f[a\leftarrow b](a')\mathrel {:=}f(a')\) for any \(a'\in A\) with \(a'\ne a\). We write \(x\in w\) for letter \(x\in \varSigma \) occurring in word \(w\in \varSigma ^*\) and \(w'\le w\) for \(w'\in \varSigma ^*\) being a subsequence of w.

Let x and y be two natural (real) numbers. For \(n \in \mathbb {N}\), we use \(x<_n y\) (resp. \(x \le _n y\)) to denote that \(x+n<y\) (resp. \(x+n \le y\)). A data theory is defined by a pair \((\texttt{D}, \textsf{Rl})\) where \(\texttt{D}\) is an infinite data domain and \( \textsf{Rl}\subseteq \texttt{D}\times \texttt{D}\rightarrow \mathbb {B}\) is a finite set of relations over \(\texttt{D}\). In this paper, we restrict ourselves to the set of natural/real numbers as data domain, and the set of relations \(\textsf{Rl}\) to be a subset of \(\textsf{Rl}_{\le n}=\{ =, \ne , <, \le , <_n, \le _n \mid n\in \mathbb {N}\}\). We assume w.l.o.g. that \(0 \in \texttt{D}\).
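As a small illustration of the gap relations defined above, the following Python sketch evaluates \(<_n\) and \(\le _n\) on concrete numbers. The function names `lt_n` and `le_n` are hypothetical and chosen only for this example.

```python
def lt_n(x, y, n):
    """x <_n y holds when y exceeds x by more than n, i.e. x + n < y."""
    return x + n < y


def le_n(x, y, n):
    """x <=_n y holds when y exceeds x by at least n, i.e. x + n <= y."""
    return x + n <= y


# With n = 0 the gap relations coincide with the ordinary orders < and <=.
plain_less = lt_n(2, 5, 0)
# 2 + 3 = 5, so the strict gap relation fails while the non-strict one holds.
strict_gap = lt_n(2, 5, 3)
weak_gap = le_n(2, 5, 3)
```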

Transition Systems A labelled transition system is a tuple \(\mathcal{T}\mathcal{S}=(\varGamma , \mathcal {L}, \mathcal {T},\gamma _{\textsf{init}})\) that consists of a set of configurations \(\varGamma \), a finite set of labels \(\mathcal {L}\), a labelled transition relation \(\mathcal {T}\subseteq \varGamma \times \mathcal {L}\times \varGamma \), and an initial configuration \(\gamma _{\textsf{init}}\in \varGamma \). We write \(\gamma \xrightarrow {\ell } \gamma ' \) for \(\langle \gamma ,\ell ,\gamma ' \rangle \in \mathcal {T}\). We say that \(\mathbb {\pi }= t_1\ldots t_n \in \mathcal {T}^*\) is a run of \(\mathcal{T}\mathcal{S}\) if there is a sequence of configurations \(\gamma _1, \gamma _2, \ldots , \gamma _{n+1}\) such that \(t_i=\gamma _i\xrightarrow {\ell _i} \gamma _{i+1}\) for \(i\le n\) and \(\gamma _1=\gamma _{\textsf{init}}\). The run \(\mathbb {\pi }\) ends in configuration \(\gamma _{n+1}\). We say that \(\gamma \) is reachable if there is a run \(\mathbb {\pi }\) of \(\mathcal{T}\mathcal{S}\) that ends in \(\gamma \).

Programs A concurrent program \(\textsf{Prog}\) consists of a finite set of threads \(\mathcal {T}\). Each thread \(t\in \mathcal {T}\) is a finite state machine that works on its own set of local registers \(\mathcal {R}_t\). The local registers of different threads are disjoint. Let \(\mathcal {R}=\cup _{t \in \mathcal {T}}\mathcal {R}_t\). The threads communicate over a finite set of shared variables \(\mathcal {X}\). The registers and the shared variables take their values from a data theory \((\texttt{D},\textsf{Rl})\). Formally, a thread is a tuple \(t=\langle \mathcal {Q}_t, \mathcal {R}_t, \varDelta _t, q_{\textsf{init}}^t\rangle \) where \(\mathcal {Q}_t\) is a finite set of states of thread t, \(q_{\textsf{init}}^t \in \mathcal {Q}_t\) is the initial state of t, and \(\varDelta _t\subseteq \mathcal {Q}_t \times \textsf{Op}\times \mathcal {Q}_t\) is a finite set of transitions that change the state and execute an operation \(\textsf{op}\in \textsf{Op}\). Let \(x\in \mathcal {X}, r_1,r_2 \in \mathcal {R}_t\). A transition \(\delta \in \varDelta _t\) is a tuple \(\delta =\langle q,\textsf{op},q'\rangle \) where the operation \(\textsf{op}\in \textsf{Op}\) has one of the following forms: (1) \(r_1 \mathrel {:=}r_2\) assigns the value of register \(r_2\) to register \(r_1\), (2) \(r_1 \mathrel {:=}\circledast \) non-deterministically assigns a value to register \(r_1\), (3) \(\textsf{rl}( r_1 , r_2 )\) checks if the values of the two registers \(r_1\) and \(r_2\) satisfy the relation \(\textsf{rl} \in \textsf{Rl}\), (4) \(\textsf{rd}( x , r_1 )\) reads the value of shared variable x and stores it in register \(r_1\), (5) \( \textsf{wt}( x , r_1 )\) writes the value of register \(r_1\) to shared variable x, and (6) \(\textsf{arw}(x, r_1, r_2)\) is the atomic read write operation which atomically executes a read followed by a write operation.

Fig. 1. The transition relation of TSO. We assume that \(\textsf{St}(t)=q\).

TSO Semantics The TSO memory model [33] is used by the x86 processor architecture. Each thread has its own FIFO write buffer. Write operations \(\textsf{wt}( x , r )\) in a thread t do not update the memory immediately; if \(d\in \texttt{D}\) is the value of r, then \((x,d)\) is appended to the buffer of t. The buffer contents are propagated to the shared memory non-deterministically. A read operation \(\textsf{rd}( x , r )\) in t accesses the latest write to x in the buffer of t. In case there is no such write, it accesses the shared memory. For the atomic read write operation \(\textsf{arw}(x, r_1, r_2)\) in thread t, the buffer of t must be empty (\(\epsilon \)), and the value of x in the memory must be the same as the value of \(r_1\). Then x is set to the value of \(r_2\).

Formally, the TSO memory model is a labelled transition system. A configuration \(\gamma \) is defined as a tuple \(\gamma =\langle \textsf{St},\textsf{RVal},\textsf{Buf},\textsf{Mem}\rangle \) where \(\textsf{St}: \mathcal {T}\rightarrow \bigcup _{t\in \mathcal {T}} \mathcal {Q}_t\) maps each thread to its current state, \(\textsf{RVal}: \mathcal {R}\rightarrow \texttt{D}\) maps each register in a thread to its current value, \(\textsf{Buf}: \mathcal {T}\rightarrow (\mathcal {X}\times \texttt{D})^*\) maps each thread buffer to its content, which is a sequence of writes. Finally, \(\textsf{Mem}: \mathcal {X}\rightarrow \texttt{D}\) maps each shared variable to its current value in the memory. The initial configuration of \(\textsf{Prog}\) is defined by a tuple \(\gamma _{\textsf{init}}=\langle \textsf{St}_{\textsf{init}},\textsf{RVal}_{\textsf{init}},\textsf{Buf}_{\textsf{init}},\textsf{Mem}_{\textsf{init}} \rangle \) where \(\textsf{St}_{\textsf{init}}\) maps each thread t to its initial state \(q_{\textsf{init}}^t\), \(\textsf{RVal}_{\textsf{init}}\) and \( \textsf{Mem}_{\textsf{init}}\) assign all registers and shared variables the value 0, and \(\textsf{Buf}_{\textsf{init}}\) initializes all thread buffers to the empty word \(\epsilon \). We formally define the labelled transition relation \(\xrightarrow {\ell } \) on configurations in Figure 1 where the label \(\ell \) is either of the form \(t,\textsf{op}\) (to denote a thread operation) or \(t,u\) (to denote an update operation), where \( t \in \mathcal {T}\) is a thread and \(\textsf{op}\in \textsf{Op}\) is an operation.
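The buffer behaviour described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a transcription of Figure 1: the class name `TSOMachine` and its method names are hypothetical, control states and registers are omitted, and values are passed directly as arguments.

```python
from collections import deque


class TSOMachine:
    """Shared memory plus one FIFO write buffer per thread."""

    def __init__(self, threads, variables):
        self.mem = {x: 0 for x in variables}        # Mem: every variable starts at 0
        self.buf = {t: deque() for t in threads}    # Buf: every buffer starts empty

    def write(self, t, x, d):
        """wt: append (x, d) to the buffer of t; memory is not changed yet."""
        self.buf[t].append((x, d))

    def read(self, t, x):
        """rd: return the latest buffered write of t to x, else the memory value."""
        for var, d in reversed(self.buf[t]):
            if var == x:
                return d
        return self.mem[x]

    def update(self, t):
        """u: move the oldest buffered write of t to memory, if there is one."""
        if not self.buf[t]:
            return False
        x, d = self.buf[t].popleft()
        self.mem[x] = d
        return True

    def arw(self, t, x, expected, new):
        """arw: enabled only if the buffer of t is empty and Mem(x) == expected."""
        if self.buf[t] or self.mem[x] != expected:
            return False
        self.mem[x] = new
        return True


# Store-buffering pattern: both threads write, then read the other variable.
m = TSOMachine(["t1", "t2"], ["x", "y"])
m.write("t1", "x", 1)
m.write("t2", "y", 1)
r1 = m.read("t1", "y")   # the write of t2 is still buffered
r2 = m.read("t2", "x")   # the write of t1 is still buffered
```

Here both reads return the initial value 0, an outcome that no SC interleaving of the same four operations allows.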

The Reachability Problem Reach Given a concurrent program \(\textsf{Prog}\) and a state \(q_{ final }\in \mathcal {Q}_t\) of thread t, Reach asks whether a configuration \(\gamma =\langle \textsf{St},\textsf{RVal},\textsf{Buf},\textsf{Mem}\rangle \) with \(\textsf{St}(t)=q_{ final }\) is reachable in the transition system given by the TSO semantics of \(\textsf{Prog}\). In this case, we say that the state \(q_{ final }\) is reachable by \(\textsf{Prog}\). We use Reach \([ \texttt{D},\textsf{Rl} ]\) to denote the reachability problem for a concurrent program with the data theory (\(\texttt{D}, \textsf{Rl})\).

4 Lamport’s Bakery Algorithm

To demonstrate the practical application of our model, we use it to model Lamport’s Bakery Algorithm [26]. Created by Leslie Lamport in 1974, it is a cornerstone solution for achieving mutual exclusion in concurrent systems. Picture threads as patrons entering a bakery, each of whom is handed a unique ticket upon arrival. These tickets, representing the order of entry, dictate the sequence for accessing critical sections. They ensure an orderly execution flow and prevent race conditions in a critical section.

Each thread is assigned a unique number that is larger than the numbers currently assigned to other threads. The thread possessing the lowest number is granted entry to the critical section. This thread may access the critical section an unbounded number of times. This means the assigned tickets keep increasing and thus an infinite domain is required. Note that the algorithm does not rely on precise ticket values; we only need to compare the tickets to each other. This makes the protocol well suited to our program model.

The protocol contains n threads where each thread \(i\le n\) is associated with two variables: The ticket number \(ticket_i\) and the flag \(chosen_i\) which signals whether the thread has chosen a ticket number. We assume \(r_{TRUE}\) and \(r_{FALSE}\) are initialized with different values that represent the boolean values of a flag and that \(ticket_i\) is initially the same as \(r_{FALSE}\) for all \(i\le n\).

The algorithm for thread i is given in Algorithm 1. For the sake of simplicity and compactness we present the transition system as pseudocode. This is equivalent to a program definition since the code only accesses variables and registers using operations \(\textsf{Op}\) with relations \(\textsf{Rl}_{<}\). The remaining instructions only affect the finite control flow and can be expressed using transitions.

Algorithm 1. Lamport Bakery Protocol
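To make the ticket discipline concrete, the following Python sketch simulates it sequentially. It follows the textbook form of the bakery algorithm and is not a transcription of Algorithm 1: the function names are hypothetical, each step is assumed to be atomic (so the \(chosen_i\) flag and the TSO store buffers are not modelled), the value 0 is assumed to mean "no ticket", and ties are broken by thread index.

```python
def take_ticket(tickets, i):
    """Give thread i a ticket larger than every ticket currently held."""
    tickets[i] = max(tickets) + 1


def may_enter(tickets, i):
    """Thread i may enter if it holds a ticket and no other ticket holder
    has a smaller (ticket, index) pair. Only comparisons are used."""
    return tickets[i] != 0 and all(
        tickets[j] == 0 or (tickets[i], i) < (tickets[j], j)
        for j in range(len(tickets))
        if j != i
    )


def leave(tickets, i):
    """Thread i leaves the critical section and gives up its ticket."""
    tickets[i] = 0


# Two threads that keep queueing behind each other: tickets never reset.
tickets = [0, 0]
take_ticket(tickets, 0)
for _ in range(5):
    take_ticket(tickets, 1)   # thread 1 queues behind thread 0
    leave(tickets, 0)
    take_ticket(tickets, 0)   # thread 0 queues behind thread 1
    leave(tickets, 1)
highest = max(tickets)
```

In this schedule the highest ticket grows with every round, which illustrates why a bounded data domain would not suffice for a continuously running instance.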

5 State Reachability for TSO with (Dis)-Equality Relation

We show that the reachability problem for concurrent programs under TSO is undecidable when \(\{=, \ne \} \subseteq \textsf{Rl}\). The proof is achieved through a reduction from the state reachability problem of Lossy Channel Systems with Data (DLCS) [1], which is already known to be undecidable. To simulate the lossy channel, we employ write buffers, as both are implemented as first-in-first-out queues. However, there are three main distinctions that must be considered: (i) write buffers do not contain letters, (ii) write buffers are not lossy, and (iii) the semantics of reads differ from receives.

We address these distinctions as follows: (i) We encode the letters as variables. (ii) We model the loss of writes by never reading them. (iii) To prevent buffer reads, we transfer the writes into a write buffer of a second thread with a different variable. We ensure that every write is accessed only once by overwriting it immediately with a different value.

Theorem 1

Reach \([ \texttt{D},\textsf{Rl} ]\) is undecidable for \(\{=,\ne \} \subseteq \textsf{Rl}\).

The rest of this section is devoted to the proof of the above theorem. We first recall the definition of Lossy Channel Systems with Data (DLCS) [1]. Then, we present the reduction from state reachability problem of DLCS to Reach \([ \texttt{D},\textsf{Rl} ]\).

Fig. 2. The transition relation of DLCS

Lossy Channel Systems with Data A DLCS \(\mathcal {L}= \langle \mathcal {Q}_\mathcal {L}, \mathcal {X}_\mathcal {L}, \varSigma _\mathcal {L}, \varDelta _\mathcal {L}, q_{\textsf{init}}\rangle \) consists of a finite set of states \(\mathcal {Q}_\mathcal {L}\), a finite set of variables \(\mathcal {X}_\mathcal {L}\) ranging over an infinite domain \(\texttt{D}\), a finite channel alphabet \(\varSigma _\mathcal {L}\), a finite set of transitions \(\varDelta _\mathcal {L}\), and an initial state \(q_{\textsf{init}} \in \mathcal {Q}_\mathcal {L}\). The set \(\varDelta _\mathcal {L}\) of transitions is a subset of \( \mathcal {Q}_\mathcal {L}\times \textsf{Op}_\mathcal {L}\times \mathcal {Q}_\mathcal {L}\). Let \(x, y \in \mathcal {X}_\mathcal {L}\). The set \(\textsf{Op}_\mathcal {L}\) consists of the following operations: (1) \(x\mathrel {:=}y\), which assigns the value of y to x, (2) \(x\mathrel {:=}\circledast \), which assigns a fresh value from \(\texttt{D}\) that is different from the existing values of all variables, (3) \({x}={y}\) (\(x\ne y\)), which compares the values of variables x and y, (4) \(!\langle a,x\rangle \), which appends letter \(a \in \varSigma _\mathcal {L}\) together with the value of x to the channel, (5) \(?\langle a,x\rangle \), which deletes the head of the channel \(\langle a,d\rangle \) and stores the value d in x, and (6) loss, which removes elements in the channel.

A configuration \(\gamma \) of DLCS is defined by the tuple \(\langle q, \textsf{XVal}, w\rangle \) where \(q \in \mathcal {Q}_\mathcal {L}\) is the current state, \(\textsf{XVal}: \mathcal {X}_\mathcal {L}\rightarrow \texttt{D}\) is the current valuation of the variables, and \(w \in (\varSigma \times \texttt{D})^*\) is the content of the lossy channel. The system is lossy, which means any element in the channel may disappear anytime. The initial configuration \(\gamma _{\textsf{init}}\) of \(\mathcal {L}\) is defined by \((q_{\textsf{init}},\textsf{XVal}_{\textsf{init}},\epsilon )\) where \(\textsf{XVal}_{\textsf{init}}(x)=0\) for all \(x \in \mathcal {X}_{\mathcal {L}}\). The transition relation of DLCS is given in Figure 2.

The state reachability problem for \(\mathcal {L}\) asks whether, for a given final state \(q_{ final }\in \mathcal {Q}\), there is a reachable configuration \(\gamma \) of the form \(\gamma =\langle q_{ final }, \textsf{XVal}, w \rangle \). In this case, we say that the state \(q_{ final }\) is reachable by \(\mathcal {L}\).

Theorem 2

([1]). The state reachability problem for DLCS is undecidable.

Fig. 3. \(\textsf{Prog}(\mathcal {L})\) with threads t (pink states) and \(t_{\textsf{ch}}\) (yellow states).

Reduction from DLCS reachability Given a DLCS \(\mathcal {L}= \langle \mathcal {Q}_\mathcal {L}, \mathcal {X}_\mathcal {L}, \varSigma _\mathcal {L}, \varDelta _\mathcal {L}, q_{\textsf{init}}\rangle \) over data domain \(\texttt{D}\) with \(\mathcal {X}_\mathcal {L}=\{ x_1\ldots x_n \}\), we reduce the state reachability of \(\mathcal {L}\) to the reachability problem Reach \([ \texttt{D}, \{=,\ne \} ]\) of a concurrent program \(\textsf{Prog}(\mathcal {L})\), with two threads \(t, t_{\textsf{ch}}\). The thread t simulates the operations of \(\mathcal {L}\), while thread \(t_{\textsf{ch}}\) simulates the lossy channel of \(\mathcal {L}\) using its write buffer. Let \(\mathcal {R}_t=\{r_{\$},r_{tmp}\} \cup \{r_x \mid x \in \mathcal {X}_\mathcal {L}\}\), \(\mathcal {R}_{t_{\textsf{ch}}}=\{r_{\$}^{\textsf{ch}}, r_{tmp}^{\textsf{ch}}\}\) be the local registers of threads t and \(t_{\textsf{ch}}\). Corresponding to each \(x \in \mathcal {X}_\mathcal {L}\), we have the register \(r_x\) in thread t, which stores the current values of x. Registers \(r_{tmp}\) and \(r_{tmp}^{\textsf{ch}}\) are used to temporarily store certain values. The shared variables of \(\textsf{Prog}(\mathcal {L})\) are \(\mathcal {X}=\{x_a, y_a \mid a \in \varSigma _\mathcal {L}\}\), they help in simulating the behavior of the lossy channel of \(\mathcal {L}\).

Simulating the DLCS. The transitions of \(\textsf{Prog}(\mathcal {L})\) are sketched in Figure 3. The initialization of the program is omitted in the figure and goes as follows. The thread \(t_{\textsf{ch}}\) starts by assigning a non-deterministic value (say \(\$\)) to the register \(r_{\$}^{\textsf{ch}}\) (i.e., \(r_{\$}^{\textsf{ch}}\mathrel {:=}\circledast \)), then checks that the new value \(\$\) is different from 0 (i.e., by checking that \(r_{\$}^{\textsf{ch}}\ne r_{tmp}^{\textsf{ch}}\)), and finally performs an atomic read write operation \( \textsf{arw}(x, r_{tmp}^{\textsf{ch}}, r_{\$}^{\textsf{ch}})\) on each variable \(x \in \mathcal {X}\). The thread t starts by reading the value of each shared variable \(x \in \mathcal {X}\) (i.e., performing \(\textsf{rd}( x , r_{\$} )\)) and checks if its value is different from 0 (i.e., \(r_{\$}\ne r_{tmp}\)). At the end of this initialization phase, all the shared variables have the new value \(\$\), the registers \(r_{tmp}\) and \(r_{tmp}^{\textsf{ch}}\) have the value 0 and the registers \(r_{\$}\) and \(r_{\$}^{\textsf{ch}}\) have the value \(\$\). The current state of thread t is the initial state \(q_{\textsf{init}}\) of \(\mathcal {L}\) while the thread \(t_{\textsf{ch}}\) is in a state \(q_{\textsf{ch}}\).

Every transition \(\langle q, x \mathrel {:=}y, q' \rangle \in \varDelta _\mathcal {L}\) is simulated in \(\textsf{Prog}(\mathcal {L})\) by thread t with a gadget, that is, a sequence of transitions that starts in q and ends in \(q'\). The transitions \((q, x\mathrel {:=}y, q')\), \((q, x=y, q')\) and \((q, x \ne y, q')\) in the DLCS are simulated by the thread t as gadgets with single transitions \((q,r_x\mathrel {:=}r_y,q'), (q,r_x=r_y,q')\) and \((q,r_x \ne r_y,q')\), respectively. We omit their description in Figure 3.

To simulate \(x\mathrel {:=}\circledast \), we load the new value in register \(r_{tmp}\) and ensure that it is different from the values in registers \(r_{\$}\) and \(r_{x_1} \ldots r_{x_n}\). This is depicted by the gadget \(Gad^t_{x\mathrel {:=}\circledast }\) in thread t. The send operation \(!\langle a, x\rangle \) in the DLCS is simulated by the gadget \(Gad^t_{!\langle a, x\rangle }\). In the DLCS, the send appends the letter a and the value of x to the channel. This is simulated by the write \(\textsf{wt}( x_a , r_x )\), thereby appending \((x_a, val(r_x))\) to the buffer of t. To simulate reads of the DLCS, we first make note of a crucial difference in the way reads happen in DLCS and TSO. In DLCS, a read happens from the head of the channel, and the head is deleted immediately after the read. In TSO however, we can read from the latest write in the shared memory multiple times. In order to simulate the “read once” policy of the DLCS, we follow each \(\textsf{wt}( x_a , r_x )\) with another write \(\textsf{wt}( x_a , r_{\$} )\).

Thread \(t_{\textsf{ch}}\) is a loop from the state \(q_{\textsf{ch}}\) which continuously reads from \(x_a\) a value from a simulated send followed by the separator \(\$ \). It copies these values to \(y_a\) using local register \(r_{tmp}^{\textsf{ch}}\). The first time it reads from \(x_a\), it reads the value d of x from a simulated send \(!\langle a, x\rangle \). It ensures that this is not the \(\$\) symbol (\(r_{tmp}^{\textsf{ch}}\ne r_{\$}^{\textsf{ch}}\)), and writes this value from \(r_{tmp}^{\textsf{ch}}\) into variable \(y_a\), thus appending \((y_a, d)\) in the buffer of \(t_{\textsf{ch}}\). It then reads again the value of \(x_a\) into \(r_{tmp}^{\textsf{ch}}\). This time, it makes sure to read \(\$ \) with the check \(r_{tmp}^{\textsf{ch}}=r_{\$}^{\textsf{ch}}\). The receive \(?\langle a, x\rangle \) of the DLCS is simulated by \(Gad^t_{?\langle a, x\rangle }\). First, we read from \(y_a\) and store it in \(r_x\), ensuring this value d is not \(\$\). Then, we read \(\$\) from \(y_a\). This ensures that the earlier value d is overwritten in the memory and is not read twice.

A loss in the channel of the DLCS results in losing some messages \(\langle a, d\rangle \). This is accounted for in \(\textsf{Prog}(\mathcal {L})\) in two ways. Thread \(t_{\textsf{ch}}\) may not pass on a value written from \(x_a\) to \(y_a\) since the loop may not execute for every value. Thread t may not read a value written by \(t_{\textsf{ch}}\) in \(y_a\) since it was already overwritten by some later writes.

Lemma 1

The state \(q_{final}\) is reachable by \(\mathcal {L}\) if and only if \(q_{final}\) is reachable by \(\textsf{Prog}(\mathcal {L})\).

The formal proof is given in Appendix A of the full version [3]. Theorem 1 extends to any set of relations that can be used to simulate equality and disequality, for instance when \(\le ,\nleq \in \textsf{Rl}\).

6 Context Bounded Analysis

In the light of this undecidability, we turn our attention to a variant of the reachability problem which is tractable. We study context bounded runs, an under-approximation of the program behavior that limits the possible interactions between processes. A run consists of a number of contexts. A context is a sequence of steps where only a certain fixed thread t is active. We say that \(\mathbb {\pi }\in \textsf {CB}(k)\) if and only if there is a partitioning \(\mathbb {\pi }=\mathbb {\pi }_1 \ldots \mathbb {\pi }_k\) such that for all contexts \(i\le k\) there is an active thread \(t_i\in \mathcal {T}\) such that only the active thread updates the memory and performs operations: If \(\gamma \xrightarrow {\ell } \gamma '\in \mathbb {\pi }_i\), then \(\ell \in \{t_i\} \times (\textsf{Op}\cup \{ u\} )\).
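The definition of \(\textsf {CB}(k)\) can be checked on a concrete label sequence with a short Python sketch. The names `count_contexts` and `in_cb` are hypothetical, labels are represented as pairs of a thread name and an operation name (with `"u"` standing for an update), and the sketch assumes that some of the k contexts in the partitioning may be empty, so that a run with fewer than k thread switches also qualifies.

```python
def count_contexts(labels):
    """Return the minimal number of contexts: the number of maximal blocks
    of consecutive labels that belong to the same thread."""
    contexts = 0
    active = None
    for thread, _operation in labels:
        if thread != active:
            contexts += 1
            active = thread
    return contexts


def in_cb(labels, k):
    """Check membership in CB(k), assuming empty contexts are allowed."""
    return count_contexts(labels) <= k


# t1 writes and updates, t2 reads, then t1 updates again: three contexts.
run = [("t1", "wt"), ("t1", "u"), ("t2", "rd"), ("t1", "u")]
needed = count_contexts(run)
```

Note that an update of a thread's buffer counts towards that thread's context, so in a context bounded run a buffered write can only reach the memory while its writer is active.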

In the following, we show PSPACE completeness of \(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\) for relations such as (dis)equality, “greater than” or even “greater by at least n” for \(n \in \mathbb {N}\) (see Theorem 4). Our approach begins with a proof of PSPACE hardness through a reduction from the non-emptiness problem of the intersection of regular languages [21].

Next, we demonstrate PSPACE membership by reducing the problem to state reachability of a finite transition system which we solve in polynomial space. This reduction faces challenges from two main sources, namely, (i) the unbounded size of the write buffers, and (ii) the infinite data domain \(\texttt{D}\). In this section, we show how to construct a finite transition system while preserving state reachability in two key steps.

Following [14], we first perform a buffer abstraction. An in-depth analysis of the TSO semantics within context bounded runs reveals a critical insight: Even though the buffer may contain an unbounded number of writes, only a bounded number of these writes can be read later on. This allows us to non-deterministically identify and store the necessary writes using variables.

Finally, we implement a domain abstraction. A popular approach is to abstract the values into equivalence classes based on the supported relations. This reveals our next challenge: (iii) the set of relations \(\textsf{Rl}_{\le n}\) is infinite. We conduct an analysis of the reachable configurations and discover the following: If a configuration is reachable, then any configuration that is the same except with greater distances between differing values is reachable as well. It follows that, for control state reachability, the abstraction does not require the precise distances between variables; their relative order is sufficient.

6.1 Lower-bound

We establish PSPACE hardness by polynomially reducing the problem of checking non-emptiness of the intersection of regular languages to \(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\). Given a set of finite automata \(\mathcal {A}_1 \ldots \mathcal {A}_n\) with \(\mathcal {A}_i= \langle \mathcal {Q}_i, \varDelta _i, q^\textsf{init}_i, \mathcal {Q}^F_{i} \rangle \), where \(\varDelta _i\subseteq \mathcal {Q}_i \times \varSigma \times \mathcal {Q}_i,\; q_i^\textsf{init}\in \mathcal {Q}_i\), and \(\mathcal {Q}^F_{i}\subseteq \mathcal {Q}_i\) for \(i\le n\), the problem asks whether there is a word \(w\in \varSigma ^*\) that is accepted by each automaton \(\mathcal {A}_i\) with \(i\le n\). This is known to be PSPACE hard [21].

We construct a program \(\textsf{Prog}( \mathcal {A}_1 \ldots \mathcal {A}_n )\) that consists of a single thread and reaches a state \(q_{ final }\) if and only if there is such a word. The idea of the construction is that we assign each state \(q_i\in \mathcal {Q}_i\) a unique value stored in a register \(r_{q_i}\) and we store the value of the current state of each automaton \(\mathcal {A}_i\) in a register \(r_i\). To begin, we ensure that the current states are the initial ones. This means \(r_i=r_{q^\textsf{init}_i}\) holds for each \(i\le n\). Then, we choose a letter \(a\in \varSigma \) and simulate some transition \(q_i\xrightarrow {a} q'_i \in \varDelta _i\) for each automaton. This is done by ensuring that the current state is \(q_i\) with \(r_i=r_{q_i}\) and then updating the current state with \(r_i\mathrel {:=}r_{q'_i}\). We repeat this step until each current state is a final state. At this point, we know we have simulated runs for each automaton that accept the same word and we reach \(q_{ final }\).
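The behaviour that the constructed program simulates can be sketched in Python as follows. This is only an illustration, not the reduction itself: instead of guessing letters and transitions non-deterministically, the sketch explores all guesses explicitly, which may take exponential time. The automaton encoding and the name `intersection_nonempty` are assumptions made for this example.

```python
from collections import deque
from itertools import product


def intersection_nonempty(automata, alphabet):
    """automata: list of (delta, init, finals); delta is a set of (q, a, q2) triples."""
    # The tuple of current states plays the role of the registers r_1 ... r_n.
    start = tuple(init for _, init, _ in automata)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        # Every automaton is in a final state: the program would reach q_final.
        if all(q in finals for q, (_, _, finals) in zip(current, automata)):
            return True
        for a in alphabet:
            # For the chosen letter, collect the possible successors per automaton.
            successors = [
                [q2 for (q1, b, q2) in delta if q1 == q and b == a]
                for q, (delta, _, _) in zip(current, automata)
            ]
            for nxt in product(*successors):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

Each explored tuple corresponds to one valuation of the registers \(r_1 \ldots r_n\); the program itself stores only a single such tuple at a time.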

The formal definition of the construction as well as the proof of correctness is given in Appendix B of [3]. This is a polynomial reduction of non-emptiness of the intersection of regular languages to \(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\). Observe that we only need tests for equality and disequality. The disequality checks are necessary to ensure that each register \(r_{q_i}\) has been assigned a different value.

Theorem 3

\(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\) is PSPACE hard.

6.2 PSPACE Upper-bound

Assume that we are given a program \(\textsf{Prog}\) and a context bound k. As an intermediary step towards a finite state space, we construct a finite state machine \(\texttt{AB}(\textsf{Prog},k )\) with variables over the infinite data domain \(\texttt{D}\). The name \(\texttt{AB}\) stands for abstract buffer, as it abstracts from the unbounded write buffers using a finite number of variables. We show that \(\texttt{AB}(\textsf{Prog},k )\) is state reachability equivalent to the TSO semantics of \(\textsf{Prog}\) bounded by \(\textsf {CB}(k)\).

While abstracting away the buffers, the main challenge is to simulate read operations. Recall from Section 3 that each read operation in a thread accesses either a write from its own buffer or from the shared memory. A buffer read always reads from the thread's latest write on the same variable. Since only the active thread may interact with the memory during the context, we can assume w.l.o.g. that all memory updates occur at the end of a context. This means a memory read accesses the last write on the same variable that updated the memory in an earlier context, and hence we do not need to store the whole buffer content. For memory reads, we need the latest writes leaving the buffer at the end of each context for each variable. For buffer reads, we only require the latest writes on each variable that are issued by each thread.

Construction of the abstract machine The abstract machine \(\texttt{AB}(\textsf{Prog},k )\) is defined by the tuple \(\langle \mathcal {Q}_{\texttt{AB}},\mathcal {X}_{\texttt{AB}},\varDelta _{\texttt{AB}}, q_{\textsf{init}}^\texttt{AB}\rangle \) where \( \mathcal {Q}_{\texttt{AB}}\) is the finite set of states, \(\mathcal {X}_{\texttt{AB}}\) is the finite set of variables, \(\varDelta _{\texttt{AB}}\) is the transition relation, and \(q_{\textsf{init}}^\texttt{AB}\) is the initial state. A control state \(q_\texttt{AB}\in \mathcal {Q}_{\texttt{AB}}\) is a tuple \((\textsf{St},act,j,c,u)\) where: (i) the current state of every thread is stored using function \(\textsf{St}: \mathcal {T}\rightarrow \mathcal {Q}\); (ii) function \(act: \{1\ldots k \} \rightarrow \mathcal {T}\) assigns to each context an active thread; (iii) the current context is stored in variable \(j\in \{1\ldots k \}\); (iv) the function \(c: \mathcal {X}\times \mathcal {T}\rightarrow \{0, 1\ldots k \}\) assigns to each variable \(x\in \mathcal {X}\) and thread \(t\in \mathcal {T}\), the (future) context \(j'\) in which the latest write on x will leave the write buffer of t. This determines when t can access the shared memory on that variable again; and (v) function \(u: \{1\ldots k \} \rightarrow 2^\mathcal {X}\) assigns each context j the set of variables that are updated during j. Additionally, we will introduce some helper states with the transition relation. We omit them from the definition of \(\mathcal {Q}_{\texttt{AB}}\). The initial state \(q_{\textsf{init}}^\texttt{AB}\) is such a helper state.

The set of variables \(\mathcal {X}_{\texttt{AB}}\) contains: (i) the set of variables \(\mathcal {X}\) in \(\textsf{Prog}\), (ii) the set of registers \(\mathcal {R}\), (iii) for each context \(j\le k\) and each variable \(x\in \mathcal {X}\), we introduce a variable \(x_j\), which stores the value of the last write on x that leaves the write buffer in context j, (iv) for each thread t and each variable \(x\in \mathcal {X}\), we introduce a variable \(x_t\) which stores the value of the newest write of t on x that is still in the buffer of t. Notice that this is the write that t accesses when reading x (if such a write exists).

We define the transition relation \(\varDelta _{\texttt{AB}}\) in Figure 4. Let \(c_{\textsf{init}}(x,t)=0\) for all \(x \in \mathcal {X}\) and \(t \in \mathcal {T}\), and \(u_{\textsf{init}}(i)=\emptyset \) for all \(i \in \{1,\ldots ,k\}\). The outgoing transitions of state \(q^{\texttt{AB}}_{\textsf{init}}\) are the outgoing transitions of \((\textsf{St}_{\textsf{init}},act,0,c_{\textsf{init}},u_{\textsf{init}})\) for every possible function act. This means the construction guesses a function act and behaves as if the other elements in the tuple have the initial values. Local transitions are adapted in a straightforward manner. A read on x from the buffer occurs if there is a write on x in the buffer. This means the latest write on x leaves the buffer in a context \(c(x,t)\) after (or in) the current context j. In such a case, we access \(x_t\) which holds the latest write on x in the buffer of t. If there is no such write on x in the buffer, i.e. \(c(x,t)<j\) holds, then the read fetches the value of x from the shared memory.
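The case distinction for reads can be sketched as follows. This is a minimal illustration of the two read rules as described in the text, not a transcription of Figure 4. The dictionary encoding is an assumption: `mem[x]` stands for the shared variable x, `mem[("buf", x, t)]` for \(x_t\), and `c[(x, t)]` for \(c(x,t)\); we also assume the current context satisfies \(j\ge 1\).

```python
def read(mem, c, x, t, j):
    """Value seen by thread t when it reads x in context j (assumes j >= 1)."""
    if c[(x, t)] >= j:
        # The latest write of t on x is still buffered: read x_t.
        return mem[("buf", x, t)]
    # No write of t on x remains in the buffer: read the shared memory.
    return mem[x]
```

For example, with `c[("x", "t")] == 2` a read in context 1 or 2 returns the buffered value, while a read in context 3 returns the value in the shared memory.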

Fig. 4. The transition relation \(\varDelta _{\texttt{AB}}\) of \(\texttt{AB}(\textsf{Prog},k )\). Let \(\delta =\langle q_a, op, q_b \rangle \in \varDelta _{t}\) and \(q_\texttt{AB}=(\textsf{St},act,j,c,u)\) with \(\textsf{St}(t)=q_a\) and \(act(j)=t\).

A write operation on x overwrites the latest entry in the write buffer on that variable \(x_t\) and determines a future (or current) context \(j'\) with \(j'\ge j\) in which it leaves the buffer. This is recorded in the variable \(x_{j'}\) and x is added to the set \(u(j')\) which holds the variables that are updated in context \(j'\). Note that \(j'\) cannot be smaller than any other context in which a write on a variable y leaves the buffer of t. This information is obtained from the function c. Also, \(j'\) must be a context in which t is active.
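The constraints on \(j'\) and the bookkeeping of a write can be sketched as follows. The sketch reflects our reading of the paragraph above; Figure 4 remains the authoritative definition. The encoding is an assumption: `act` maps contexts 1..k to threads, `mem[("buf", x, t)]` stands for \(x_t\), `mem[("ctx", x, jp)]` for \(x_{j'}\), and updating `c[(x, t)]` to \(j'\) is inferred from the role of the function c.

```python
def admissible_contexts(act, c, variables, t, j, k):
    """Contexts j' in which a new write of t may leave the buffer (assumes j >= 1)."""
    # FIFO order: j' may not precede the context of any earlier write of t.
    latest = max(c[(y, t)] for y in variables)
    return [jp for jp in range(max(j, latest), k + 1) if act[jp] == t]


def write(mem, c, u, x, t, jp, value):
    """Record a write of t on x that is guessed to leave the buffer in context jp."""
    mem[("buf", x, t)] = value   # newest buffered write of t on x
    mem[("ctx", x, jp)] = value  # last write on x leaving the buffer in jp
    u[jp].add(x)                 # x is updated during context jp
    c[(x, t)] = jp               # assumed update of c
```

The abstract machine picks one element of `admissible_contexts` non-deterministically; if the list is empty, the write is not enabled.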

At any time, the run can switch from a context j with \(j<k\) to \(j+1\). Let \(u(j)=\{ x^1\ldots x^n \}\). These are the variables that are updated during context j. The values of the last updates on these variables in the context, stored in \(x^1_j\ldots x^n_j\), are written to the corresponding variables in the shared memory. Since \(\texttt{AB}(\textsf{Prog},k )\) only performs memory updates at the end of a context, an atomic read write \(\textsf{arw}(x, r_1, r_2)\) requires that the current buffer content leaves the buffer in the current context. This is ensured by using the condition \( j\ge \max \{ c(y,t) \mid y\in \mathcal {X}\}\). If there is a write on x in the buffer of t, then \(j=c(x,t)\). This is covered by the buffer arw rule in Figure 4. Here, the current value of x is stored in \(x_t\), so we first check that it equals \(r_1\) and update \(x_t\) as well as \(x_j\) with \(r_2\). If \(j>c(x,t)\) holds, then there is no write on x in the buffer of t (memory arw rule) and we compare the value of x in the shared memory with \(r_1\) and update it to \(r_2\).
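The context switch and the two arw cases can be sketched as follows. This is a hedged illustration of the prose above, not a transcription of Figure 4. The dictionary encoding (`mem[x]` for shared variables and registers, `mem[("buf", x, t)]` for \(x_t\), `mem[("ctx", x, j)]` for \(x_j\)) and the convention of returning `False` for a disabled operation are assumptions.

```python
def context_switch(mem, u, j):
    """Flush the last updates of context j to the shared memory and move to j + 1."""
    for x in u[j]:
        mem[x] = mem[("ctx", x, j)]
    return j + 1


def arw(mem, c, variables, t, j, x, r1, r2):
    """Atomic read-write arw(x, r1, r2) of thread t in context j; False if disabled."""
    # All buffered writes of t must leave the buffer in the current context.
    if j < max(c[(y, t)] for y in variables):
        return False
    if c[(x, t)] == j:
        # Buffer arw: the current value of x is held in x_t.
        if mem[("buf", x, t)] != mem[r1]:
            return False
        mem[("buf", x, t)] = mem[r2]
        mem[("ctx", x, j)] = mem[r2]
    else:
        # Memory arw: no write of t on x is buffered.
        if mem[x] != mem[r1]:
            return False
        mem[x] = mem[r2]
    return True
```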

A configuration \(\gamma =(q_\texttt{AB}, \textsf{Mem})\) in the induced LTS of \(\texttt{AB}(\textsf{Prog},k )\) consists of a state \(q_\texttt{AB}\in \mathcal {Q}_\texttt{AB}\) along with a variable assignment \(\textsf{Mem}\). Let \(\gamma _{\textsf{init}}=(q^\texttt{AB}_{\textsf{init}}, \textsf{Mem}_{\textsf{init}})\) be the initial configuration of \(\texttt{AB}(\textsf{Prog},k )\). Given the transitions \(\varDelta _{\texttt{AB}}\), we can define the transitions in the induced LTS in a straightforward manner. A state \(q_{ final }\in \mathcal {Q}_t\) of thread t is said to be reachable by \(\texttt{AB}(\textsf{Prog},k )\) if and only if there is a reachable configuration of the form \(((\textsf{St},act,j,c,u), \textsf{Mem})\) such that \(\textsf{St}(t)=q_{ final }\) holds.

Lemma 2

A state of \(\textsf{Prog}\) is reachable under TSO by a run \(\mathbb {\pi }\in \textsf {CB}(k)\) if and only if it is reachable by \(\texttt{AB}(\textsf{Prog},k )\).

The proof of Lemma 2 is given in Appendix C of [3]. Next, we abstract away the infinite data domain from \(\texttt{AB}(\textsf{Prog},k )\). We remove this last source of infinity by constructing a finite state machine \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) from \(\texttt{AB}(\textsf{Prog},k )\).

Domain Abstraction We use domain abstraction to solve \(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\) by reducing state reachability of \(\texttt{AB}(\textsf{Prog},k )\) to reachability of a finite state machine. We introduce the set of relations \(\textsf{Rl}_{<}=\{ =, \ne , < \}\). To abstract away the infinite data domain, we abstract from the exact values of the variables. Instead of storing actual values, we store which relations from \(\textsf{Rl}_{<}\) hold between which pairs of variables, which is finite information. This way, we reduce the infinite domain \(\texttt{D}\) to the finite Boolean domain \(\mathbb {B}\). For example, \(( q_\texttt{AB}, x=y )\) is an abstraction of a configuration \((q_\texttt{AB}, \textsf{Mem}(x)=1,\textsf{Mem}(y)=1 )\). Given a variable assignment \(\textsf{Mem}\) and a relation \(\textsf{rl}\), we define \(\textsf{rl}_\textsf{Mem}(x,y) \mathrel {:=}\textsf{rl}(\textsf{Mem}(x),\textsf{Mem}(y))\). Any variable assignment \(\textsf{Mem}\) induces a set of relations \(\textsf{Rl}_\textsf{Mem}=\{ \textsf{rl}_\textsf{Mem}\mid \textsf{rl}\in \textsf{Rl}_{<}\}\) over the variables \(\mathcal {X}_{\texttt{AB}}\). When considering multiple sets of relations, we denote a relation \(\textsf{rl}\in \textsf{Rl}\) as \(\textsf{rl}_\textsf{Rl}\). For a variable assignment \(\textsf{Mem}\), we say that a set of relations \(\textsf{Rl}\) over variables is consistent with \(\textsf{Mem}\) if \(\textsf{Rl}=\textsf{Rl}_{\textsf{Mem}}\).
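The induced set of relations \(\textsf{Rl}_\textsf{Mem}\) can be computed directly from a variable assignment. The following sketch assumes integer values and represents each relation as the set of variable pairs for which it holds; the name `induced_relations` is hypothetical.

```python
def induced_relations(mem):
    """Relations =, != and < induced on variables by the assignment mem."""
    pairs = [(x, y) for x in mem for y in mem]
    return {
        "=": {(x, y) for x, y in pairs if mem[x] == mem[y]},
        "!=": {(x, y) for x, y in pairs if mem[x] != mem[y]},
        "<": {(x, y) for x, y in pairs if mem[x] < mem[y]},
    }


# Two assignments with different values but the same abstraction.
a = induced_relations({"x": 1, "y": 1, "z": 5})
b = induced_relations({"x": 10, "y": 10, "z": 11})
print(a == b)  # True
```

Both assignments satisfy \(x=y<z\), so they are mapped to the same abstract state even though the distance between y and z differs.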

Fig. 5. The transition relation of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\). Sets \(\textsf{Rl}\) and \(\textsf{Rl}'\) satisfy (i) equality is an equivalence relation; (ii) disequality holds iff equality does not hold; (iii) \("<"\) is a total order on variables that are not equal.

Given \(\texttt{AB}(\textsf{Prog},k )=\langle \mathcal {Q}_{\texttt{AB}},\mathcal {X}_{\texttt{AB}},\varDelta _{\texttt{AB}}, q_{\textsf{init}}^{\texttt{AB}}\rangle \), we now construct the finite state machine \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )=\langle \mathcal {Q},\varDelta ,q_{\textsf{init}}\rangle \) as follows: \(\mathcal {Q}\mathrel {:=}\mathcal {Q}_{\texttt{AB}} \times \{ \textsf{rl}_{\mathcal {X}_{\texttt{AB}}} : \mathcal {X}_{\texttt{AB}} \times \mathcal {X}_{\texttt{AB}} \rightarrow \mathbb {B}\mid \textsf{rl}\in \textsf{Rl}_{<}\}\). We abstract from a variable assignment by storing in the states which relations are satisfied. The initial state is \(q_\textsf{init}=(q^\texttt{AB}_\textsf{init}, \textsf{Rl}_{\textsf{Mem}_{\textsf{init}}})\). We define the transitions of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) in Figure 5. We construct the transitions such that they abstract from the transitions of the LTS induced by the semantics of \(\texttt{AB}(\textsf{Prog},k )\). Where the semantics on transitions of \(\texttt{AB}(\textsf{Prog},k )\) require that certain values in the configurations before and after the operation are the same, the transitions of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) only require that the relations between variables before and after the operation are the same. For instance, the assign rule for operation \(x\mathrel {:=}x'\) requires that \(\textsf{Rl}\) and \(\textsf{Rl}'\) are the same for all variables except x and \(x=_{\textsf{Rl}'} x'\) must hold after the operation. Conditions (i)-(iii) in Figure 5 reflect the properties of \(\textsf{Rl}_{<}\) on values. They ensure that \(\textsf{Rl}\) and \(\textsf{Rl}'\) have consistent variable assignments. Note that for any operation \(<_n\) (or \(\le _n\)), we soften the condition to \(x<_\textsf{Rl}y\). We will show that this still results in an abstraction precise enough to be state reachability equivalent.
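The abstract assign rule can be illustrated as follows. For brevity, the sketch uses a compact encoding that is our assumption: each pair of variables is mapped to one of the symbols "<", "=", ">", which carries the same information as the three Boolean relations under Conditions (i)-(iii). The names `abstract` and `abstract_assign` are hypothetical.

```python
def abstract(mem):
    """Order abstraction of a concrete assignment (compact encoding)."""
    def sign(a, b):
        return "=" if a == b else ("<" if a < b else ">")
    return {(x, y): sign(mem[x], mem[y]) for x in mem for y in mem}


def abstract_assign(rel, x, x_src):
    """Abstract effect of x := x_src: x inherits every relation of x_src."""
    def source(v):
        return x_src if v == x else v
    # Pairs not involving x are unchanged; pairs involving x copy those of x_src.
    return {(a, b): rel[(source(a), source(b))] for (a, b) in rel}
```

On any concrete assignment, applying `abstract_assign` to the abstraction agrees with abstracting after the concrete assignment, which is the property Lemma 3 establishes for each operation.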

Since \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) is a finite state machine, it induces the obvious LTS where a configuration consists of a state. The following lemma shows that the construction is indeed an abstraction of \(\texttt{AB}(\textsf{Prog},k )\). We assume \(\textsf{Prog}\) uses \(\textsf{Rl}_{\le n}\).

Lemma 3

If \(q_\texttt{AB}\) is reachable by \(\texttt{AB}(\textsf{Prog},k )\), then a state \((q_\texttt{AB},\textsf{Rl})\) is reachable by \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\).

Proof

Assume \(\langle (q_\texttt{AB}, \textsf{Mem})\xrightarrow {op} (q'_\texttt{AB}, \textsf{Mem}' ) \rangle \). We argue that \(\langle (q_\texttt{AB}, \textsf{Rl}_{\textsf{Mem}}), op, (q'_\texttt{AB}, \textsf{Rl}_{\textsf{Mem}'} ) \rangle \in \varDelta \) holds as well. The lemma follows immediately. We show this for operation \(x\mathrel {:=}\circledast \). For all other operations, the proof is analogous and we omit it.

It follows from the semantics of \(x\mathrel {:=}\circledast \), that \(\textsf{Mem}(y)=\textsf{Mem}'(y)\) for any \(y \in \mathcal {X}_\texttt{AB}\setminus \{x\}\) holds. This means \(\textsf{Rl}_{\textsf{Mem}}\) and \(\textsf{Rl}_{\textsf{Mem}'}\) satisfy the new value rule. The equality relations in \(\textsf{Rl}_{\textsf{Mem}}\) and \(\textsf{Rl}_{\textsf{Mem}'}\) are consistent with the equality relations on values of \(\textsf{Mem}\) and \(\textsf{Mem}'\). The equality relation given by the values is an equivalence relation and thus Condition (i) is satisfied. Similarly, Condition (ii) is satisfied since values are obviously not equal if and only if they are not related by equality. Condition (iii) is satisfied since relation < on values forms a total order. All conditions are satisfied. This means \(\langle (q_\texttt{AB}, \textsf{Rl}_{\textsf{Mem}}), x\mathrel {:=}\circledast , (q'_\texttt{AB}, \textsf{Rl}_{\textsf{Mem}'} ) \rangle \in \varDelta \).

Lemma 4

If a state \((q_\texttt{AB},\textsf{Rl})\) is reachable by \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\), then \(q_\texttt{AB}\) is reachable by \(\texttt{AB}(\textsf{Prog},k )\).

We prove this by performing an induction over runs of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) and constructing equivalent runs of \(\texttt{AB}(\textsf{Prog},k )\). In order to do this, we construct configurations with consistent variable assignments. The main challenge is that these variable assignments may not have large enough distances between the values. Take the operation \(x <_n y\), for instance. Here, \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) only requires \(x < y\). Note that any value other than 0 was created by an \(x\mathrel {:=}\circledast \) operation. We can modify a run so that some of these operations assign larger values. This way, we can increase the distances of variable assignments of reachable configurations without changing their consistency with respect to relations. The formal proof of this is given in Appendix E of [3].
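The underlying observation, that distances can grow without changing the abstraction, can be illustrated with a simple scaling. Multiplying values by a factor is our illustrative choice and not the construction from Appendix E of [3]; the sketch assumes integer values, the initial value 0, and a factor \(n\ge 1\).

```python
def stretch(mem, n):
    """Scale all values by n >= 1; the value 0 stays 0, other distances grow."""
    return {x: v * n for x, v in mem.items()}


def order_type(mem):
    """-1, 0 or 1 for each pair of variables, depending on their relative order."""
    return {(x, y): (mem[x] > mem[y]) - (mem[x] < mem[y]) for x in mem for y in mem}


mem = {"x": 0, "y": 3, "z": 4}
wide = stretch(mem, 5)
print(wide)                                  # {'x': 0, 'y': 15, 'z': 20}
print(order_type(mem) == order_type(wide))   # True
```

After stretching, any two distinct values differ by at least n, while all relations in \(\textsf{Rl}_{<}\) between variables are preserved.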

Theorem 4

\(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\) is PSPACE complete.

Proof

While \(\textsf{Rl}_{\le n}\) is an infinite set, \(\textsf{Rl}_{<}\) has only 3 relations. This means \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) is a finite transition system where state reachability is decidable. According to Lemma 2, Lemma 3 and Lemma 4, deciding state reachability of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) is equivalent to solving \(\textsf {CB}(k)\)-Reach \([ \texttt{D},\textsf{Rl}_{\le n} ]\).

We non-deterministically solve the state reachability of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) by guessing a run that is length-bounded by the size of the state space and checking whether it reaches \(q_{ final }\). We store the current state \(((\textsf{St},act,j,c,u), \textsf{Rl})\) together with a binary encoding of the current length of the run. Note that the state only requires polynomial space. The number of states of \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) is exponential in the program size as well as k, which means the binary encoding also requires polynomial space.
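The space bound can be made explicit by counting states. The following estimate is our own back-of-the-envelope calculation under the definitions above; it ignores the helper states and counts every Boolean relation over \(\mathcal {X}_{\texttt{AB}}\times \mathcal {X}_{\texttt{AB}}\), including inconsistent ones.

```latex
% Size of the variable set of AB(Prog, k):
|\mathcal{X}_{\texttt{AB}}| = |\mathcal{X}| + |\mathcal{R}| + k\,|\mathcal{X}| + |\mathcal{T}|\,|\mathcal{X}|

% Upper bound on the number of states ((St, act, j, c, u), Rl):
|\mathcal{Q}_{\textsf{Rl}_{<}{-}\texttt{AB}}| \le
  \underbrace{|\mathcal{Q}|^{|\mathcal{T}|}}_{\textsf{St}} \cdot
  \underbrace{|\mathcal{T}|^{k}}_{act} \cdot
  \underbrace{k}_{j} \cdot
  \underbrace{(k+1)^{|\mathcal{X}|\,|\mathcal{T}|}}_{c} \cdot
  \underbrace{2^{k\,|\mathcal{X}|}}_{u} \cdot
  \underbrace{2^{3\,|\mathcal{X}_{\texttt{AB}}|^{2}}}_{\textsf{Rl}}

% Bits needed for the binary step counter:
\log_2 |\mathcal{Q}_{\textsf{Rl}_{<}{-}\texttt{AB}}| \le
  |\mathcal{T}| \log_2 |\mathcal{Q}| + k \log_2 |\mathcal{T}| + \log_2 k
  + |\mathcal{X}|\,|\mathcal{T}| \log_2 (k+1) + k\,|\mathcal{X}| + 3\,|\mathcal{X}_{\texttt{AB}}|^{2}
```

Every summand in the last line is polynomial in the size of \(\textsf{Prog}\) and in k, which matches the claim that both the current state and the counter fit into polynomial space.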

We extend the run by choosing to either perform a context switch or an operation. We begin with the initial state \(q^\texttt{AB}_\textsf{init}\), which is a special case since we first need to guess a function act according to the init rule in Figure 4. To perform an operation, we look at the current state of the active thread \(\textsf{St}(act(j))\), pick an outgoing transition from the program, and update the state according to the corresponding rules given in Figure 4 and Figure 5.

We illustrate this on the new-value operation. Assume we pick the outgoing transition \(\langle q_a, x\mathrel {:=}\circledast , q_b \rangle \in \varDelta _{act(j)}\). In this case, we update the state according to the local rule in Figure 4. Then we update the set \(\textsf{Rl}\) according to the new-value rule in Figure 5. We leave all relations that do not include x unchanged, and we non-deterministically choose x to be either equal to some variable, or to be between two other adjacent variables, or to be the largest or smallest variable. We update the relations to x accordingly. For any other operation, the changes to \(\textsf{Rl}\) are uniquely determined. For writes, we additionally need to non-deterministically pick some future context \(j'\) of the update according to the write rule in Figure 4. In the case of a context switch, we perform a series of variable assignments according to the context switch rule.
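The non-deterministic choices of the new-value rule can be enumerated explicitly. In the sketch below, a set of relations is represented as an ordered partition, i.e. a list of equivalence classes in ascending order; this encoding and the name `new_value_choices` are assumptions made for the illustration.

```python
def new_value_choices(classes, x):
    """All ordered partitions that can result from x := fresh value.

    classes: list of frozensets of variables, sorted by ascending value.
    """
    # Remove x from its old class; relations among other variables are unchanged.
    rest = [cls - {x} for cls in classes]
    rest = [cls for cls in rest if cls]
    choices = []
    # x becomes equal to the variables of some existing class.
    for i in range(len(rest)):
        choices.append(rest[:i] + [rest[i] | {x}] + rest[i + 1:])
    # x is smallest, lies between two adjacent classes, or is largest.
    for i in range(len(rest) + 1):
        choices.append(rest[:i] + [frozenset({x})] + rest[i:])
    return choices
```

With m remaining equivalence classes there are \(2m+1\) choices, so a single guess needs only logarithmically many bits.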

Note that we do not explicitly construct the entire \(\textsf{Rl}_{<}{-}\texttt{AB}(\textsf{Prog},k )\) transition system; the program and the rules given in Figure 4 and Figure 5 are sufficient to guess a run. Each step can be performed in polynomial space. Once \(\textsf{St}(act(j))=q_{ final }\) holds, we know \(q_{ final }\) is reachable. The complexity of this process is in PSPACE. According to Theorem 3, the problem is PSPACE hard as well.

7 Conclusion

We examined safety verification of concurrent programs running under TSO that operate on variables ranging over an infinite domain. We have shown that this is undecidable even if the program can only check the variables for equality and non-equality. We studied a context bounded variant of the problem as well. Here, we solved the problem for programs using relations in \(\textsf{Rl}_{\le n}\) and showed that it is PSPACE complete.

As future work, we plan to examine more expressive under-approximations of the program behaviour than the presented context bounded analysis and how these under-approximations affect decidability and complexity of the problem. We also intend to explore the problem for additional relations and/or operations a program may perform.