
1 Introduction

For performance reasons, modern multi-processors may reorder memory access operations. This is due to complex buffering and caching mechanisms that make the response to memory queries (load operations) faster, and that speed up computations by parallelizing independent operations and computation flows. Therefore, operations may not be visible to all processors at the same time, and they are not necessarily seen in the same order by different processors (when they concern different addresses/variables). The only model where all operations are immediately visible to all processors is the Sequential Consistency (SC) model [28], which corresponds to the standard interleaving semantics where the program order between operations of the same processor is preserved. Modern architectures adopt weaker models (in the sense that they allow more behaviours) due to the relaxation, in various ways, of the program order. Examples of such weak models are TSO, adopted for instance in Intel x86 machines, POWER, adopted in PowerPC machines, and the model adopted in ARM machines.

Apprehending the effects of all the relaxations allowed in such models is extremely hard. For instance, while TSO allows reordering stores past loads (of different addresses/variables), reflecting the use of store buffers, a model such as POWER allows reordering of all kinds of store and load operations under quite subtle conditions. A lot of work has been devoted to the definition of formal models that accurately capture the program semantics corresponding to models such as TSO and POWER [11, 30, 32, 34, 35]. Still, programming against weak memory models is a hard and error-prone task. Therefore, developing formal verification approaches under weak memory models is of paramount importance. In particular, it is crucial in this context to have efficient algorithms for automatic bug detection. This paper addresses precisely this issue and presents an algorithmic approach for checking state reachability in concurrent programs running under the POWER semantics as defined in [21] (which is essentially the POWER model presented in [34] with small changes that have been introduced in order to increase the accuracy and the precision of the model).
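As an illustration (our sketch, not part of the paper's formalization), the classic store-buffering litmus test separates SC from the weak models: enumerating all SC interleavings shows that the outcome r1 = r2 = 0 never arises under SC, whereas TSO and POWER admit it because a store may be delayed past a subsequent load.

```python
# Sketch of the store-buffering (SB) litmus test under SC (not from the paper):
#   P0: x <- 1; $r1 <- y          P1: y <- 1; $r2 <- x
# Under SC the outcome r1 = r2 = 0 is impossible; TSO and POWER allow it,
# since a store may sit in a buffer while a later load executes.

def run_sc(schedule):
    """Execute one SC interleaving: writes hit memory immediately."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op in schedule:
        if op[0] == "write":
            _, var, val = op
            mem[var] = val
        else:
            _, reg, var = op
            regs[reg] = mem[var]
    return regs["r1"], regs["r2"]

P0 = [("write", "x", 1), ("read", "r1", "y")]
P1 = [("write", "y", 1), ("read", "r2", "x")]

def interleavings(a, b):
    """All shuffles of a and b that preserve each process's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

sc_outcomes = {run_sc(s) for s in interleavings(P0, P1)}
# (0, 0) never appears among the SC outcomes
```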

The verification of concurrent programs under weak memory models is known to be complex. Indeed, encoding the buffering and storage mechanisms used in these models leads in general to complex, infinite-state formal operational models involving unbounded data structures like FIFO queues (or more generally unbounded partial order constraints). For the case of TSO, efficient, yet precise encodings of the effects of its storage mechanism have been designed recently [3, 5]. It is not clear how to define such precise and practical encodings for POWER.

In this paper, we consider an alternative approach. We investigate the issue of defining approximate analyses. Our approach consists in introducing a parametric under-approximation schema in the spirit of context-bounding [12, 25, 27, 31, 33]. Context-bounding has been proposed in [33] as a suitable approach for efficient bug detection in multithreaded programs. Indeed, for concurrent programs, a bounding concept that provides both good coverage and scalability must be based on aspects related to the interactions between concurrent components. It has been shown experimentally that concurrency bugs usually show up after a small number of context switches [31].

In the context of weak memory models, context-bounded analysis has been extended in [12] to the case of programs running on TSO. The work we present here aims at extending this approach to the case of POWER. This extension is actually very challenging due to the complexity of POWER and requires developing new techniques that are different from, and much more involved than, the ones used for the case of TSO. First, we introduce a new concept of bounding that is suitable for POWER. Intuitively, the architecture of POWER is similar to a distributed system with a replicated memory, where each processor has its own replica, and where operations are propagated between replicas according to some specific protocol. Our bounding concept is based on this architecture. We consider that a computation is divided into a sequence of “contexts”, where a context is a computation segment for which there is precisely one active processor. All actions within a context are either operations issued by the active processor, or propagation actions performed by its storage subsystem. Then, in our analysis, we consider only computations that have a number of contexts that is less than or equal to some given bound. Notice that while we bound the number of contexts in a computation, we do not put any bound on the lengths of the contexts, nor on the size of the storage system.

We prove that for every bound \({\mathbbm {K}}\), and for every concurrent program \({ Prog}\), it is possible to construct, using a code-to-code translation, another concurrent program \({ Prog}^\bullet \) such that for every \({\mathbbm {K}}\)-bounded computation \(\uppi \) in \({ Prog}\) under the POWER semantics there is a corresponding \({\mathbbm {K}}\)-bounded computation \(\uppi ^\bullet \) of \({ Prog}^\bullet \) under the SC semantics that reaches the same set of states, and vice versa. Thus, the context-bounded state reachability problem for \({ Prog}\) can be reduced to the context-bounded state reachability problem for \({ Prog}^\bullet \) under SC. We show that the program \({ Prog}^\bullet \) has the same number of processes as \({ Prog}\), and only \(O(|{\mathcal P}||{{\mathcal X}}|{\mathbbm {K}}+|{{\mathcal R}}|)\) additional shared variables and local registers compared to \({ Prog}\), where \(|{\mathcal P}|\) is the number of processes, \(|{{\mathcal X}}|\) is the number of shared variables, and \(|{{\mathcal R}}|\) is the number of local registers in \({ Prog}\). Furthermore, the obtained program has the same type of data structures and variables as the original one. As a consequence, we obtain for instance that for finite-data programs, the context-bounded analysis of programs under POWER is decidable. Moreover, our code-to-code translation allows us to leverage existing verification tools for concurrent programs to carry out verification of safety properties under POWER.

To show the applicability of our approach, we have implemented our reduction, and we have used cbmc version 5.1 [17] as the backend tool for solving SC reachability queries. We have carried out several experiments showing the efficiency of our approach. Our experimental results confirm the assumption that concurrency bugs manifest themselves within small bounds of context switches. They also confirm that our approach based on context-bounding is more efficient and scalable than approaches based on bounding sizes of computations and/or of storage systems.

Related work. There has been a lot of work on automatic program verification under weak memory models, based on precise, under-approximate, and abstract analyses, e.g., [2, 5, 8, 10, 12–16, 18–20, 23, 24, 26, 29, 36–40]. While most of these works concern TSO, only a few of them address the safety verification problem under POWER (e.g., [6, 9–11, 36]). The paper [21] addresses the different issue of checking robustness against POWER, i.e., whether a program has the same (trace) semantics for both POWER and SC.

The work in [9] extends the cbmc framework by taking into account weak memory models including TSO and POWER. While this approach uses reductions to SC analysis, it is conceptually and technically different from ours. The work in [10] develops a verification technique combining partial orders with bounded model checking, which is applicable to various weak memory models including TSO and POWER. However, these techniques are no longer supported by the latest version of cbmc. The work in [6] develops stateless model-checking techniques under POWER. In Sect. 4, we compare the performances of our approach with those of [6, 9]. The tool herd [11] operates on small litmus tests under various memory models. Our tool can handle such litmus tests in an efficient and precise way.

Recently, Tomasco et al. [36] presented a new verification approach, based on code-to-code translations, for programs running under TSO and PSO. They also discuss the extension of their approach to programs running under POWER (however the detailed formalization and the implementation of this extension are kept for future work). Our approach and the one proposed in [36] are orthogonal since we are using different bounding parameters: In this paper, we are bounding the number of contexts while Tomasco et al. [36] are bounding the number of write operations.

2 Concurrent Programs

In this section, we first introduce some notations and definitions. Then, we present the syntax we use for concurrent programs and its semantics under POWER as in [21, 34].

Preliminaries. Consider sets \(A\) and \(B\). We use \(\left[ {A}\mapsto {B}\right] \) to denote the set of functions from \(A\) to \(B\), and write \(f:A\mapsto B\) to indicate that \(f\in \left[ {A}\mapsto {B}\right] \). We write \(f(a)=\bot \) to denote that \(f\) is undefined for \(a\). We use \(f[a\leftarrow b]\) to denote the function \(g\) such that \(g(a)=b\) and \(g(x)=f(x)\) if \(x\ne a\). We will use a function \(\mathtt {gen}\) which, for a given set \(A\), returns an arbitrary element \(\mathtt {gen}\left( A\right) \in A\). For integers \(i,j\), we use \([i..j]\) to denote the set \(\left\{ i,i+1,\ldots ,j\right\} \). We use \({A}^*\) to denote the set of finite words over \(A\). For words \(w_1,w_2\in {A}^*\), we use \(w_1\cdot w_2\) to denote the concatenation of \(w_1\) and \(w_2\).
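Modeling a function with finite support as a dictionary, the update and choice notation can be rendered in a few lines of Python (the helper names `upd` and `gen` are ours, introduced only for illustration):

```python
# Minimal rendering of the preliminaries' notation (hypothetical helper names):
# functions are modeled as dicts, and a missing key plays the role of ⊥.

def upd(f, a, b):
    """f[a <- b]: the function g with g(a) = b and g(x) = f(x) for x != a."""
    g = dict(f)
    g[a] = b
    return g

def gen(A):
    """gen(A): an arbitrary element of the (non-empty) set A."""
    return next(iter(A))

f = {"x": 0, "y": 0}
g = upd(f, "x", 7)   # f itself is left unchanged
```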

Syntax. Figure 1 gives the grammar for a small but general assembly-like language that we use for defining concurrent programs. A program \({ Prog}\) first declares a set \({\mathcal X}\) of (shared) variables followed by the code of a set \(\mathcal P\) of processes. Each process \(p\) has a finite set \({\mathcal R}\left( p\right) \) of (local) registers. We assume w.l.o.g. that the sets of registers of the different processes are disjoint, and define \({\mathcal R}:=\cup _p{\mathcal R}\left( p\right) \). The code of each process \(p\in \mathcal P\) starts by declaring a set of registers followed by a sequence of instructions.

Fig. 1. Syntax of concurrent programs.

For the sake of simplicity, we assume that the data domain of both the shared variables and registers is a single set \({\mathcal D}\). We assume a special element \(0\in {\mathcal D}\) which is the initial value of each shared variable or register. Each instruction \(\mathfrak {i}\) is of the form \(\uplambda \!:\!\mathfrak {s}\) where \(\uplambda \) is a unique label (across all processes) and \(\mathfrak {s}\) is a statement. We define \(\mathtt{lbl}\left( \mathfrak {i}\right) :=\uplambda \) and \(\mathtt{stmt}\left( \mathfrak {i}\right) :=\mathfrak {s}\). We define \(\mathfrak {I}_{p}\) to be the set of instructions occurring in \(p\), and define \(\mathfrak {I}:=\cup _{p\in \mathcal P}\mathfrak {I}_{p}\). We assume that \(\mathfrak {I}_{p}\) contains a designated initial instruction \(\mathfrak {i}^{ init}_{p}\) from which \(p\) starts its execution. A read instruction in a process \(p\in \mathcal P\) has a statement of the form \(\$r\leftarrow x\), where \(\$r\) is a register in \(p\) and \(x\in {\mathcal X}\) is a variable. A write instruction has a statement of the form \(x\leftarrow { exp}\) where \(x\in {\mathcal X}\) is a variable and \({ exp}\) is an expression. We will assume a set of expressions containing a set of operators applied to constants and registers, but not referring to the content of memory (i.e., the set of variables). Assume, conditional, and iterative instructions (collectively called aci instructions) can be explained in a similar manner. The statement \(\mathtt{term}\) will cause the process to terminate its execution. We assume that \(\mathtt{term}\) occurs only once in the code of a process \(p\) and that it has the label \(\uplambda ^{\mathtt{term}}_{p}\). For an expression \({ exp}\), we use \({\mathcal R}\left( { exp}\right) \) to denote the set of registers that occur in \({ exp}\). 
For a write or an aci instruction \(\mathfrak {i}\), we define \({\mathcal R}\left( \mathfrak {i}\right) :={\mathcal R}\left( { exp}\right) \) where \({ exp}\) is the expression that occurs in \(\mathtt{stmt}\left( \mathfrak {i}\right) \).

For an instruction \(\mathfrak {i}\in \mathfrak {I}_{p}\), we define \(\mathtt{next}\left( \mathfrak {i}\right) \) to be the set of instructions that may follow \(\mathfrak {i}\) in a run of a process. Notice that this set contains two elements if \(\mathfrak {i}\) is an aci instruction (in the case of an assume instruction, we assume that if the condition evaluates to \({ false}\), then the process moves to \(\uplambda ^{\mathtt{term}}_{p}: \mathtt{term}\)), no element if \(\mathfrak {i}\) is a terminating instruction, and a single element otherwise. We define \(\mathtt{Tnext}\left( \mathfrak {i}\right) \) (resp. \(\mathtt{Fnext}\left( \mathfrak {i}\right) \)) to be the (unique) instruction to which the process execution moves in case the condition in the statement of \(\mathfrak {i}\) evaluates to \({ true}\) (resp. \({ false}\)).

Configurations. We will assume an infinite set \({\mathcal E}\) of events, and will use an event to represent a single execution of an instruction in a process. A given instruction may be executed several times during a run of the program (for instance, when it is in the body of a loop). In such a case, the different executions are represented by different events. An event \(\mathbbm {e}\) is executed in several steps, namely it is fetched, initialized, and then committed. Furthermore, a write event may be propagated to the other processes. A configuration \({\mathbbm {c}}\) is a tuple \(\left\langle {\mathbbm {E},\prec ,\mathtt{ins},\mathtt{status},\mathtt{rf},\mathtt{Prop},\prec _\mathtt{co}}\right\rangle \), defined as follows.

Events. \(\mathbbm {E}\subseteq {\mathcal E}\) is a finite set of events, namely the events that have been created up to the current point in the execution of the program. \(\mathtt{ins}:\mathbbm {E}\mapsto \mathfrak {I}\) is a function that maps an event \(\mathbbm {e}\) to the instruction \(\mathtt{ins}\left( \mathbbm {e}\right) \) that \(\mathbbm {e}\) is executing. We partition the set \(\mathbbm {E}\) into disjoint sets \(\mathbbm {E}_{p}\), for \(p\in \mathcal P\), where \(\mathbbm {E}_{p}:= \left\{ \mathbbm {e}\in \mathbbm {E}\ | \ \mathtt{ins}\left( \mathbbm {e}\right) \in \mathfrak {I}_{p}\right\} \), i.e., for a process \(p\in \mathcal P\), the set \(\mathbbm {E}_{p}\) contains the events whose instructions belong to \(p\). For an event \(\mathbbm {e}\in \mathbbm {E}_{p}\), we define \(\mathtt{proc}\left( \mathbbm {e}\right) :=p\). We say that \(\mathbbm {e}\) is a write event if \(\mathtt{ins}\left( \mathbbm {e}\right) \) is a write instruction. We use \(\mathbbm {E}^\mathtt{W}\) to denote the set of write events. Similarly, we define the set \(\mathbbm {E}^\mathtt{R}\) of read events, and the set \(\mathbbm {E}^\mathtt{ACI}\) of aci events whose instructions are either assume, conditional, or iterative. We define \(\mathbbm {E}^\mathtt{W}_{p}\), \(\mathbbm {E}^\mathtt{R}_{p}\), and \(\mathbbm {E}^\mathtt{ACI}_{p}\), to be the restrictions of the above sets to \(\mathbbm {E}_{p}\). For an event \(\mathbbm {e}\) where \(\mathtt{stmt}\left( \mathtt{ins}\left( \mathbbm {e}\right) \right) \) is of the form \(x\leftarrow { exp}\) or \(\$r\leftarrow x\), we define \(\mathtt{var}\left( \mathbbm {e}\right) :=x\). If \(\mathbbm {e}\) is neither a read nor a write event, then \(\mathtt{var}\left( \mathbbm {e}\right) :=\bot \).

Program Order. The program-order relation \(\prec \subseteq \mathbbm {E}\times \mathbbm {E}\) is an irreflexive partial order that describes, for a process \(p\in \mathcal P\), the order in which events are fetched from the code of \(p\). We require that (i) \(\mathbbm {e}_1\not \prec \mathbbm {e}_2\) if \(\mathtt{proc}\left( \mathbbm {e}_1\right) \ne \mathtt{proc}\left( \mathbbm {e}_2\right) \), i.e., \(\prec \) only relates events belonging to the same process, and that (ii) \(\prec \) is a total order on \(\mathbbm {E}_{p}\).

Status. The function \(\mathtt{status}:\mathbbm {E}\mapsto \left\{ \mathtt{fetch},\mathtt{init},\mathtt{com}\right\} \) defines, for an event \(\mathbbm {e}\), the current status of \(\mathbbm {e}\), i.e., whether it has been fetched, initialized, or committed.

Propagation. The function \(\mathtt{Prop}:\mathcal P\times {\mathcal X}\mapsto \mathbbm {E}^\mathtt{W}\cup {\mathcal E}^\mathtt{init}\) defines, for a process \(p\in \mathcal P\) and variable \(x\in {\mathcal X}\), the latest write event on \(x\) that has been propagated to \(p\). Here \({\mathcal E}^\mathtt{init}:=\left\{ \mathbbm {e}^\mathtt{init}_{x}\ | \ x\in {\mathcal X}\right\} \) is a set disjoint from the set of events \({\mathcal E}\), and will be used to define the initial values of the variables.

Read-From. The function \(\mathtt{rf}:\mathbbm {E}^\mathtt{R}\mapsto \mathbbm {E}^\mathtt{W}\cup {\mathcal E}^\mathtt{init}\) defines, for a read event \(\mathbbm {e}\in \mathbbm {E}^\mathtt{R}\), the write event \(\mathtt{rf}\left( \mathbbm {e}\right) \) from which \(\mathbbm {e}\) gets its value.

Coherence Order. All processes share a global view about the order in which write events are propagated. This is done through the coherence order \(\prec _\mathtt{co}\) that is a partial order on \(\mathbbm {E}^\mathtt{W}\) s.t. \(\mathbbm {e}_1\prec _\mathtt{co}\mathbbm {e}_2\) only if \(\mathtt{var}\left( \mathbbm {e}_1\right) =\mathtt{var}\left( \mathbbm {e}_2\right) \), i.e., it relates only events that write on identical variables. If a write event \(\mathbbm {e}_1\) is propagated to a process before another write event \(\mathbbm {e}_2\) and both events write on the same variable, then \(\mathbbm {e}_1\prec _\mathtt{co}\mathbbm {e}_2\) holds. Furthermore, the events cannot be propagated to any other process in the reverse order. However, it might be the case that a write event is never propagated to a given process.
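The constraint on propagation orders can be sketched as a small check (our assumed encoding, not the paper's): record, per variable, the relative order in which write events reach each process, and reject a state in which two processes have seen two writes on the same variable in opposite orders.

```python
from itertools import combinations

# Sketch (assumed encoding): prop_seqs maps each process to the sequence of
# write events (event_id, variable) propagated to it, in propagation order.
# Coherence forbids two writes on the same variable from reaching two
# processes in opposite orders; a write may simply never reach a process.

def coherent(prop_seqs):
    before = set()   # (e1, e2): e1 propagated before e2 somewhere, same variable
    for seq in prop_seqs.values():
        for (e1, v1), (e2, v2) in combinations(seq, 2):
            if v1 == v2:
                if (e2, e1) in before:
                    return False   # the reverse order was seen at another process
                before.add((e1, e2))
    return True

ok = coherent({"p": [("w1", "x"), ("w2", "x")],
               "q": [("w1", "x")]})                  # w2 never reaches q: fine
bad = coherent({"p": [("w1", "x"), ("w2", "x")],
                "q": [("w2", "x"), ("w1", "x")]})    # opposite orders: violation
```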

Dependencies. We introduce a number of dependency orders on events that we will use in the definition of the semantics. We define the per-location program-order \(\prec _\mathtt{poloc}\subseteq \mathbbm {E}\times \mathbbm {E}\) such that \(\mathbbm {e}_1\prec _\mathtt{poloc}\mathbbm {e}_2\) if \(\mathbbm {e}_1\prec \mathbbm {e}_2\) and \(\mathtt{var}\left( \mathbbm {e}_1\right) =\mathtt{var}\left( \mathbbm {e}_2\right) \), i.e., it is the restriction of \(\prec \) to events with identical variables. We define the data dependency order \(\prec _\mathtt{data}\) s.t. \(\mathbbm {e}_1\prec _\mathtt{data}\mathbbm {e}_2\) if (i) \(\mathbbm {e}_1\in \mathbbm {E}^\mathtt{R}\), i.e., \(\mathbbm {e}_1\) is a read event; (ii) \(\mathbbm {e}_2\in \mathbbm {E}^\mathtt{W}\cup \mathbbm {E}^\mathtt{ACI}\), i.e., \(\mathbbm {e}_2\) is either a write or an aci event; (iii) \(\mathbbm {e}_1\prec \mathbbm {e}_2\); (iv) \(\mathtt{stmt}\left( \mathtt{ins}\left( \mathbbm {e}_1\right) \right) \) is of the form \(\$r\leftarrow x\); (v) \(\$r\in {\mathcal R}\left( \mathtt{ins}\left( \mathbbm {e}_2\right) \right) \); and (vi) there is no event \(\mathbbm {e}_3\in \mathbbm {E}^\mathtt{R}\) such that \(\mathbbm {e}_1\prec \mathbbm {e}_3\prec \mathbbm {e}_2\) and \(\mathtt{stmt}\left( \mathtt{ins}\left( \mathbbm {e}_3\right) \right) \) is of the form \(\$r\leftarrow y\). Intuitively, the value loaded by \(\mathbbm {e}_1\) is used to compute the value of the expression in the statement of the instruction of \(\mathbbm {e}_2\). We define the control dependency order \(\prec _\mathtt{ctrl}\) such that \(\mathbbm {e}_1\prec _\mathtt{ctrl}\mathbbm {e}_2\) if \(\mathbbm {e}_1\in \mathbbm {E}^\mathtt{ACI}\) and \(\mathbbm {e}_1\prec \mathbbm {e}_2\).
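Conditions (i)–(vi) for \(\prec _\mathtt{data}\) can be sketched for a single process as follows (our assumed event encoding, not the paper's); condition (vi) corresponds to keeping only the latest read into each register:

```python
# Sketch of ≺data for a single process (assumed event encoding): events appear
# in program order; a read is ("read", reg) and a write/aci event carries the
# set of registers R(i) that its expression mentions.

def data_deps(events):
    deps = set()
    last_read = {}   # register -> index of the latest read into it so far
    for j, ev in enumerate(events):
        if ev[0] == "read":
            last_read[ev[1]] = j   # a later read into $r masks earlier ones (vi)
        else:
            for r in ev[1]:        # ("write" or "aci", registers used)
                if r in last_read:
                    deps.add((last_read[r], j))
    return deps

evs = [("read", "r1"),        # event 0
       ("read", "r1"),        # event 1: masks event 0 for $r1
       ("write", {"r1"})]     # event 2: ≺data-depends on event 1 only
deps = data_deps(evs)
```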

We say that \({\mathbb {c}}\) is committed if \(\mathtt{status}\left( {\mathbbm {e}}\right) =\mathtt{com}\) for all events \(\mathbbm {e}\in \mathbbm {E}\). The initial configuration \({\mathbb {c}}_{ init}\) is defined by \(\left\langle {\emptyset ,\emptyset ,\lambda \mathbbm {e}.\bot ,\lambda \mathbbm {e}.\bot ,\lambda \mathbbm {e}.\bot ,\lambda p.\lambda x. \mathbbm {e}^\mathtt{init}_{x},\emptyset }\right\rangle \). We use \({\mathbb {C}}\) to denote the set of all configurations.

Transition Relation. We define the transition relation as a relation \(\xrightarrow {}{}\subseteq {\mathbb {C}}\times \mathcal P\times {\mathbb {C}}\). For configurations \({\mathbb {c}}_1,{\mathbb {c}}_2\in {\mathbb {C}}\) and a process \(p\in \mathcal P\), we write \({\mathbb {c}}_1\xrightarrow {p}{}{\mathbb {c}}_2\) to denote that \(\left\langle {{\mathbb {c}}_1,p,{\mathbb {c}}_2}\right\rangle \in \xrightarrow {}{}\!\). Intuitively, this means that \(p\) moves from the current configuration \({\mathbb {c}}_1\) to \({\mathbb {c}}_2\). The relation \(\xrightarrow {}{}\) is defined through the set of inference rules shown in Fig. 2.

Fig. 2. Inference rules defining the relation \(\xrightarrow {p}{}{}\) where \(p\in \mathcal P\).

The rule \(\mathtt{Fetch}\) chooses the next instruction to be executed in the code of a process \(p\in \mathcal P\). This instruction should be a possible successor of the instruction that was last executed by \(p\). To satisfy this condition, we define \(\mathtt{MaxI}\left( {\mathbb {c}},p\right) \) to be the set of instructions as follows: (i) If \(\mathbbm {E}_{p}=\emptyset \) then define \(\mathtt{MaxI}\left( {\mathbb {c}},p\right) :=\left\{ \mathfrak {i}^{ init}_{p}\right\} \), i.e., the first instruction fetched by \(p\) is \(\mathfrak {i}^{ init}_{p}\). (ii) If \(\mathbbm {E}_{p}\ne \emptyset \), let \(\mathbbm {e}'\) be the maximal event of \(p\) (w.r.t. \(\prec \)) in the configuration \({\mathbb {c}}\) and then define \(\mathtt{MaxI}\left( {\mathbb {c}},p\right) :=\mathtt{next}\left( \mathtt{ins}\left( \mathbbm {e}'\right) \right) \). In other words, we consider the instruction \(\mathfrak {i}'=\mathtt{ins}\left( \mathbbm {e}'\right) \in \mathfrak {I}_{p}\), and take its possible successors. The possibility of choosing any of the (syntactically) possible successors corresponds to speculatively fetching statements. As seen below, whenever we commit an aci event, we check whether the speculations that have been made are correct. We create a new event \(\mathbbm {e}\), label it by \(\mathfrak {i}\in \mathtt{MaxI}\left( {\mathbb {c}},p\right) \), and make it larger than all the other events of \(p\) w.r.t. \(\prec \). In this way, we maintain the property that the order on the events of \(p\) reflects the order in which they are fetched in the current run of the program.
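The two cases of \(\mathtt{MaxI}\) can be sketched directly (our assumed encoding: instruction labels stand for instructions, and a list stands for the fetch order on \(\mathbbm {E}_{p}\)):

```python
# Sketch of MaxI (assumed encoding): fetched[p] lists the instruction labels of
# p's events in fetch order (the total order on E_p), and nxt gives, for each
# label, the set of labels of possible successor instructions.

def max_i(fetched, p, init_label, nxt):
    if not fetched[p]:
        return {init_label}        # case (i): nothing fetched yet, start at i_init
    return nxt[fetched[p][-1]]     # case (ii): successors of the maximal event

nxt = {"l0": {"l1", "l2"},         # l0 is a branch: both arms may be fetched
       "l1": {"l3"}, "l2": {"l3"}}
succ = max_i({"p": ["l0"]}, "p", "l0", nxt)
```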

There are two ways in which read events get their values, namely either from local write events that are performed by the process itself, or from write events that are propagated to the process. The first case is covered by the rule \(\mathtt{Local\text {-}Read}\) in which the process \(p\) initializes a read event \(\mathbbm {e}\in \mathbbm {E}^\mathtt{R}\) on a variable (say \(x\)), where \(\mathbbm {e}\) has already been fetched. Here, the event \(\mathbbm {e}\) is made to read its value from a local write event \(\mathbbm {e}'\in \mathbbm {E}^\mathtt{W}_{p}\) on \(x\) such that (i) \(\mathbbm {e}'\) has been initialized but not yet committed, and such that (ii) \(\mathbbm {e}'\) is the closest write event that precedes \(\mathbbm {e}\) in the order \(\prec _\mathtt{poloc}\). Notice that, by condition (ii), \(\mathbbm {e}'\) is unique if it exists. To formalize this, we define the Closest Write function \(\mathtt{CW}\left( {\mathbb {c}},\mathbbm {e}\right) :=\mathbbm {e}'\) where \(\mathbbm {e}'\) is the unique event such that (i) \(\mathbbm {e}'\in \mathbbm {E}^\mathtt{W}\), (ii) \(\mathbbm {e}'\prec _\mathtt{poloc}\mathbbm {e}\), and (iii) there is no event \(\mathbbm {e}''\) such that \(\mathbbm {e}''\in \mathbbm {E}^\mathtt{W}\) and \(\mathbbm {e}'\prec _\mathtt{poloc}\mathbbm {e}''\prec _\mathtt{poloc}\mathbbm {e}\). Notice that \(\mathbbm {e}'\) may not exist, i.e., it may be the case that \(\mathtt{CW}\left( {\mathbb {c}},\mathbbm {e}\right) =\bot \). If \(\mathbbm {e}'\) exists and has been initialized but not committed, we initialize \(\mathbbm {e}\) and update the read-from relation appropriately.
On the other hand, if such an event does not exist, i.e., if there is no write event on \(x\) before \(\mathbbm {e}\) by \(p\), or if the closest write event on \(x\) before \(\mathbbm {e}\) by \(p\) has already been committed, then we use the rule \(\mathtt{Prop\text {-}Read}\) to let \(\mathbbm {e}\) read its value from the latest write event on \(x\) that has been propagated to \(p\). Notice that this event is given by \(\mathtt{Prop}\left( p,x\right) \).
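The Closest Write function amounts to a backwards scan through the process's events (our assumed encoding; `None` plays the role of \(\bot \)):

```python
# Sketch of CW (assumed encoding): the events of one process in program order,
# where a write on x is ("write", "x") and a read on x is ("read", "x").
# Returns the index of the closest preceding write on the same variable,
# or None in the role of ⊥.

def closest_write(events, i):
    var = events[i][1]
    for j in range(i - 1, -1, -1):   # walk backwards from the read
        if events[j] == ("write", var):
            return j
    return None

evs = [("write", "x"), ("write", "y"), ("write", "x"), ("read", "x")]
cw = closest_write(evs, 3)   # the write at index 2, not the one at index 0
```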

To commit an initialized read event \(\mathbbm {e}\in \mathbbm {E}^\mathtt{R}_{p}\), we use the rule \(\mathtt{Com\text {-}Read}\). The rule can be performed if \(\mathbbm {e}\) satisfies two conditions in \({\mathbb {c}}\). The first condition is defined as \(\mathtt{RdCnd}\left( {\mathbb {c}},\mathbbm {e}\right) := \forall \mathbbm {e}'\in \mathbbm {E}^\mathtt{R}: (\mathbbm {e}'\prec _\mathtt{poloc}\mathbbm {e})\implies (\mathtt{rf}\left( \mathbbm {e}'\right) \preceq _\mathtt{co}\mathtt{rf}\left( \mathbbm {e}\right) )\). It states that for any read event \(\mathbbm {e}'\) such that \(\mathbbm {e}'\) precedes \(\mathbbm {e}\) in the order \(\prec _\mathtt{poloc}\), the write event from which \(\mathbbm {e}'\) reads its value is equal to or precedes the write event for \(\mathbbm {e}\) in the coherence order \(\prec _\mathtt{co}\). The second condition is defined by \(\mathtt{ComCnd}\left( {\mathbb {c}},\mathbbm {e}\right) := \forall \mathbbm {e}'\in \mathbbm {E}: (\mathbbm {e}'\prec _\mathtt{data}\mathbbm {e})\vee (\mathbbm {e}'\prec _\mathtt{ctrl}\mathbbm {e})\vee (\mathbbm {e}'\prec _\mathtt{poloc}\mathbbm {e}) \implies (\mathtt{status}\left( {\mathbbm {e}'}\right) =\mathtt{com})\). It states that all events \(\mathbbm {e}'\in \mathbbm {E}\) that precede \(\mathbbm {e}\) in one of the orders \(\prec _\mathtt{data}\), \(\prec _\mathtt{ctrl}\), or \(\prec _\mathtt{poloc}\) should have already been committed.
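The two conditions can be sketched as predicates over explicit relations (our assumed encodings: `co` is the set of ordered pairs of the coherence order, `rf` maps reads to their source writes, `status` maps events to their stage):

```python
# Sketch of the commit conditions for a read event e (assumed encodings).

def co_leq(co, w1, w2):
    """w1 is equal to or ≺co-precedes w2."""
    return w1 == w2 or (w1, w2) in co

def rd_cnd(rf, co, e, poloc_reads_before_e):
    # each ≺poloc-earlier read must read from a co-smaller-or-equal write
    return all(co_leq(co, rf[e2], rf[e]) for e2 in poloc_reads_before_e)

def com_cnd(status, preds):
    # all ≺data, ≺ctrl and ≺poloc predecessors must already be committed
    return all(status[e2] == "com" for e2 in preds)

co = {("w1", "w2")}
ok = rd_cnd({"rA": "w1", "rB": "w2"}, co, "rB", ["rA"])    # w1 ≺co w2: fine
bad = rd_cnd({"rA": "w2", "rB": "w1"}, co, "rB", ["rA"])   # reversed: rejected
```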

To initialize a fetched write event \(\mathbbm {e}\in \mathbbm {E}^\mathtt{W}_{p}\), we use the rule \(\mathtt{Init\text {-}Write}\), which requires that all events that precede \(\mathbbm {e}\) in the order \(\prec _\mathtt{data}\) have already been initialized. This condition is formulated as \(\mathtt{WrInitCnd}\left( {\mathbb {c}},\mathbbm {e}\right) := \forall \mathbbm {e}'\in \mathbbm {E}^\mathtt{R}: (\mathbbm {e}'\prec _\mathtt{data}\mathbbm {e}) \implies (\mathtt{status}\left( {\mathbbm {e}'}\right) =\mathtt{init}\vee \mathtt{status}\left( {\mathbbm {e}'}\right) =\mathtt{com})\). When a write event in a process \(p\in \mathcal P\) is committed, it is also immediately propagated to \(p\) itself. To maintain the coherence order, the semantics keeps the invariant that the latest write event on a variable \(x\in {\mathcal X}\) that has been propagated to a process \(p\in \mathcal P\) is the largest one in the coherence order among all write events on \(x\) that have been propagated to \(p\) up to now in the run. This invariant is maintained in \(\mathtt{Com\text {-}Write}\) by requiring that the event \(\mathbbm {e}\) (that is being propagated) is strictly larger in the coherence order than the latest write event on the same variable as \(\mathbbm {e}\) that has been propagated to \(p\).

Write events are propagated to other processes through the rule \(\mathtt{Prop}\). A write event \(\mathbbm {e}\) on a variable \(x\) is allowed to be propagated to a process \(q\) only if it is strictly larger in the coherence order than any write event on \(x\) that has been propagated to \(q\) up to now. Notice that this is captured by \(\mathtt{Prop}\left( q,x\right) \), which is the latest write event on \(x\) that has been propagated to \(q\).

When committing an aci event through the rule \(\mathtt{Com\text {-}ACI}\), we also verify any speculations that have been made when fetching the subsequent events. We assume that we are given a function \(\mathtt{Val}\left( {\mathbb {c}},\mathbbm {e}\right) \) that takes as input an aci event \(\mathbbm {e}\) and returns the value of the expression of the conditional statement in the instruction of \(\mathbbm {e}\) when evaluated in the configuration \({\mathbb {c}}\). The function \(\mathtt{Val}\left( {\mathbb {c}},\mathbbm {e}\right) \) is only defined when all events that precede \(\mathbbm {e}\) in the order \(\prec _\mathtt{data}\) have already been initialized.

To that end, we define the predicate \(\mathtt{ValidCnd}\left( {\mathbb {c}},\mathbbm {e}\right) := (\exists \mathbbm {e}'\in \mathbbm {E}:\; \mathbbm {e}\prec \mathbbm {e}' \wedge \not \exists \mathbbm {e}''\in \mathbbm {E}:\; \mathbbm {e}\prec \mathbbm {e}''\prec \mathbbm {e}') \implies ((\mathtt{Val}\left( {\mathbb {c}},\mathbbm {e}\right) ={ true}\wedge \mathtt{ins}\left( \mathbbm {e}'\right) =\mathtt{Tnext}\left( \mathtt{ins}\left( \mathbbm {e}\right) \right) ) \vee (\mathtt{Val}\left( {\mathbb {c}},\mathbbm {e}\right) ={ false}\wedge \mathtt{ins}\left( \mathbbm {e}'\right) =\mathtt{Fnext}\left( \mathtt{ins}\left( \mathbbm {e}\right) \right) )) \). The predicate intuitively identifies the event \(\mathbbm {e}'\) that was fetched immediately after \(\mathbbm {e}\). Notice that such an event may not exist, and that it is unique if it exists. The predicate requires that the choice of \(\mathbbm {e}'\) be consistent with the value \(\mathtt{Val}\left( {\mathbb {c}},\mathbbm {e}\right) \) of the expression in the statement of the instruction of \(\mathbbm {e}\).
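The logic of the predicate can be sketched in a few lines (our assumed encoding: `None` stands for the absence of an event fetched after \(\mathbbm {e}\)):

```python
# Sketch of ValidCnd (assumed encoding): given the value of the aci event's
# condition and the instruction fetched immediately after it (None if no event
# was fetched after it), the speculation is valid iff the fetched successor
# matches the branch actually taken.

def valid_cnd(cond_value, next_ins, tnext, fnext):
    if next_ins is None:
        return True                # nothing was speculated: nothing to check
    return next_ins == (tnext if cond_value else fnext)

ok = valid_cnd(True, "l_then", "l_then", "l_else")
mis = valid_cnd(False, "l_then", "l_then", "l_else")   # the wrong arm was fetched
```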

Bounded Reachability. A run \(\uppi \) is a sequence of transitions \({\mathbb {c}}_0\xrightarrow {p_1}{}{\mathbb {c}}_1\xrightarrow {p_2}{}{\mathbb {c}}_2\cdots {\mathbb {c}}_{n-1}\xrightarrow {p_n}{}{\mathbb {c}}_n\). In such a case, we write \({\mathbb {c}}_0\xrightarrow {\uppi }{}{\mathbb {c}}_n\). We define \(\mathtt{last}\left( \uppi \right) :={\mathbb {c}}_n\). We define \({\uppi }\!\uparrow :=p_1p_2\cdots p_n\), i.e., it is the sequence of processes performing the transitions in \(\uppi \). For a sequence \(\upsigma =p_1p_2\cdots p_n\in {\mathcal P}^*\), we say that \(\upsigma \) is a context if there is a process \(p\in \mathcal P\) such that \(p_i=p\) for all \(i:1\le i\le n\). We say that \(\uppi \) is committed (resp. \(k\)-bounded) if \(\mathtt{last}\left( \uppi \right) \) is committed (resp. if \(\uppi \uparrow =\upsigma _1\cdot \upsigma _2\cdots \upsigma _k\) where \(\upsigma _i\) is a context for all \(i:1\le i\le k\)).
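Checking \(k\)-boundedness of a run thus reduces to counting the maximal one-process blocks of \(\uppi \!\uparrow \), which can be sketched as:

```python
from itertools import groupby

# Sketch: pi_up is the sequence of processes performing the transitions of a
# run. A run is k-bounded iff pi_up decomposes into at most k contexts, i.e.
# at most k maximal blocks in which a single process is active.

def num_contexts(pi_up):
    return sum(1 for _ in groupby(pi_up))

def k_bounded(pi_up, k):
    return num_contexts(pi_up) <= k

n = num_contexts(["p", "p", "q", "p"])   # contexts: pp | q | p
```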

For \({\mathbb {c}}\in {\mathbb {C}}\) and \(p\in \mathcal P\), we define the set of reachable labels of the configuration \({\mathbb {c}}\) as follows. (i) If \({\mathbb {c}}={\mathbb {c}}_{ init}\) then \(\mathtt{lbl}\left( {\mathbb {c}}\right) := \left\{ \bot \right\} \), i.e., process \(p\) has not reached any label in the initial configuration. (ii) If \({\mathbb {c}}\ne {\mathbb {c}}_{ init}\), let \(\mathbbm {e}\) be the maximal event of \(p\) (w.r.t. \(\prec \)) in \({\mathbb {c}}\). We define \(\mathtt{lbl}\left( {\mathbb {c}}\right) := \left\{ \mathtt{lbl}\left( \mathtt{ins}\left( \mathbbm {e}\right) \right) \right\} \), i.e., process \(p\) reaches the label of the maximal event \(\mathbbm {e}\) of \(p\) (w.r.t. \(\prec \)) in the configuration \({\mathbb {c}}\). In the reachability problem, we are given a label \(\uplambda \) and asked whether there is a committed run \(\uppi \) and a configuration \({\mathbb {c}}\) such that \({\mathbb {c}}_{ init}\xrightarrow {\uppi }{}{\mathbb {c}}\) where \(\uplambda \in \mathtt{lbl}\left( {\mathbb {c}}\right) \). For a natural number \({\mathbbm {K}}\), the \({\mathbbm {K}}\)-bounded reachability problem is defined by requiring that the run \(\uppi \) in the above definition is \({\mathbbm {K}}\)-bounded.

3 Translation

In this section, we introduce an algorithm that reduces, for a given number \({\mathbbm {K}}\), the \({\mathbbm {K}}\)-bounded reachability problem for POWER to the corresponding problem for SC. Given an input concurrent program \({ Prog}\), the algorithm constructs an output concurrent program \({ Prog}^\bullet \) whose size is polynomial in \({ Prog}\) and \({\mathbbm {K}}\), such that for each \({\mathbbm {K}}\)-bounded run \(\uppi \) in \({ Prog}\) under the POWER semantics there is a corresponding \({\mathbbm {K}}\)-bounded run \(\uppi ^\bullet \) of \({ Prog}^\bullet \) under the SC semantics that reaches the same set of process labels. Below, we first present a scheme for the translation of \({ Prog}\), and mention some of the challenges that arise due to the POWER semantics. Then, we give a detailed description of the data structures we use in \({ Prog}^\bullet \). Finally, we describe the code of the processes in \({ Prog}^\bullet \).

Fig. 3.

Translation map \([\![{.}]\!]_{{\mathbbm {K}}}\). We omit the label of an intermediary instruction when it is not relevant.

Scheme. Our construction is based on a code-to-code translation scheme that transforms the program \({ Prog}\) into the program \({ Prog}^\bullet \) following the map function \([\![{.}]\!]_{{\mathbbm {K}}}\) given in Fig. 3. Let \(\mathcal P\) and \({\mathcal X}\) be the sets of processes and (shared) variables in \({ Prog}\). The map \([\![{.}]\!]_{{\mathbbm {K}}}\) replaces the variables of \({ Prog}\) by \((|{\mathcal P}|\cdot (2{\mathbbm {K}}+1))\) copies of the set \({\mathcal X}\), in addition to a finite set of finite-data structures (which will be formally defined in the Data Structures paragraph). The map function then declares two additional processes \(\mathtt{iniProc}\) and \(\mathtt{verProc}\) that will be used to initialize the data structures and to check the reachability problem at the end of the run of \({ Prog}^\bullet \). The formal definition of \(\mathtt{iniProc}\) (resp. \(\mathtt{verProc}\)) will be given in the Initializing Process (resp. Verifier Process) paragraph. Furthermore, the map function \([\![{.}]\!]_{{\mathbbm {K}}}\) transforms the code of each process \(p\in \mathcal P\) to a corresponding process \(p^\bullet \) that will simulate the moves of \(p\). The processes \(p\) and \(p^\bullet \) will have the same set of registers. For each instruction \(\mathfrak {i}\) appearing in the code of the process \(p\), the map \([\![{\mathfrak {i}}]\!]_{{\mathbbm {K}}}^p\) transforms it into a sequence of instructions as follows: First, it adds the code defined by \({\mathtt{activeCnt}}\) to check if the process \(p\) is active during the current context, then it transforms the statement \(\mathfrak {s}\) of the instruction \(\mathfrak {i}\) into a sequence of instructions following the map \([\![{\mathfrak {s}}]\!]_{{\mathbbm {K}}}^p\), and finally it adds the sequence of instructions defined by \({\mathtt{closeCnt}}\) to guess the occurrence of a context switch.
The translation of an aci statement keeps the same statement and adds \(\mathtt{control}\) to guess the contexts in which the corresponding event will be committed. The terminating statement is left unchanged by the map \([\![{\mathtt{term}}]\!]_{{\mathbbm {K}}}^p\). The translations of write and read statements will be described in the Write Instructions and Read Instructions paragraphs respectively.

Challenges. There are two aspects of the POWER semantics (cf. Sect. 2) that make it difficult to simulate the run \(\uppi \) under the SC semantics, namely non-atomicity and asynchrony. First, events are not executed atomically. In fact, an event is first fetched and initialized before it is committed. In particular, an event may be fetched in one context and be initialized and committed only in later contexts. Since there is no bound on the number of events that may be fetched in a given context, our simulation should be able to handle unbounded numbers of pending events. Second, write events of one process are propagated in an asynchronous manner to the other processes. This implies that we may have unbounded numbers of “traveling” events that are committed in one context and propagated to other processes only in subsequent contexts. This creates two challenges in the simulation. On the one hand, we need to keep track of the coherence order among the different write events. On the other hand, since write events are not distributed to different processes at the same time, the processes may have different views of the values of a given variable at a given point of time.

Since it is not feasible to record the initializing, committing, and propagating contexts of an unbounded number of events in an SC run, our algorithm will instead predict the summary of effects of arbitrarily long sequences of events that may occur in a given context. This is implemented using an intricate scheme that first guesses and then checks these summaries. Concretely, each event \(\mathbbm {e}\) in the run \(\uppi \) is simulated by a sequence of instructions in \(\uppi ^\bullet \). This sequence of instructions will be executed atomically (without interruption from other processes and events). More precisely, if \(\mathbbm {e}\) is fetched in a context \(k:1\le k\le {\mathbbm {K}}\), then the corresponding sequence of instructions will be executed in the same context \(k\) in \(\uppi ^\bullet \). Furthermore, we let \(\uppi ^\bullet \) guess (speculate) (i) the contexts in which \(\mathbbm {e}\) will be initialized, committed, and propagated to other processes, and (ii) the values of variables that are seen by read operations. Then, we check whether the guesses made by \(\uppi ^\bullet \) are valid w.r.t. the POWER semantics. As we will see below, these checks are done both on-the-fly during \(\uppi ^\bullet \), as well as at the end of \(\uppi ^\bullet \). To implement the guess-and-check scheme, we use a number of data structures, described below.

Data Structures. We now introduce the data structures used in our simulation to deal with the asynchrony and non-atomicity challenges described above.

Asynchrony. In order to keep track of the coherence order, we associate a time stamp with each write event. A time stamp \(\uptau \) is a mapping \(\mathcal P\mapsto {\mathbbm {K}}^\otimes \) where \({\mathbbm {K}}^\otimes :={\mathbbm {K}}\cup \left\{ \otimes \right\} \). For a process \(p\in \mathcal P\), the value of \(\uptau \left( p\right) \) represents the context in which the given event is propagated to \(p\). In particular, if \(\uptau \left( p\right) =\otimes \) then the event is never propagated to \(p\). We use \({\mathbb T}\) to denote the set of time stamps. We define an order \(\sqsubseteq \) on \({\mathbb T}\) such that \(\uptau _1\sqsubseteq \uptau _2\) if, for all processes \(p\in \mathcal P\), either \(\uptau _1(p)=\otimes \), or \(\uptau _2(p)=\otimes \), or \(\uptau _1(p)\le \uptau _2(p)\). Notice that if \(\uptau _1\sqsubseteq \uptau _2\) and there is a process \(p\in \mathcal P\) such that \(\uptau _1(p)\ne \otimes \), \(\uptau _2(p)\ne \otimes \), and \(\uptau _1(p)<\uptau _2(p)\) then \(\uptau _1(q)\le \uptau _2(q)\) whenever \(\uptau _1(q)\ne \otimes \) and \(\uptau _2(q)\ne \otimes \). In such a case, \(\uptau _1\sqsubset \uptau _2\). On the other hand, if either \(\uptau _1(p)=\otimes \) or \(\uptau _2(p)=\otimes \) for all \(p\in \mathcal P\), then both \(\uptau _1\sqsubseteq \uptau _2\) and \(\uptau _2\sqsubseteq \uptau _1\). The coherence order \(\prec _\mathtt{co}\) on write events will be reflected in the order \(\sqsubseteq \) on their time stamps. In particular, for events \(\mathbbm {e}_1\) and \(\mathbbm {e}_2\) with time stamps \(\uptau _1\) and \(\uptau _2\) respectively, if \(\uptau _1\sqsubset \uptau _2\) then \(\mathbbm {e}_1\) precedes \(\mathbbm {e}_2\) in coherence order. The reason is that there is at least one process \(p\) to which both \(\mathbbm {e}_1\) and \(\mathbbm {e}_2\) are propagated, and \(\mathbbm {e}_1\) is propagated to \(p\) before \(\mathbbm {e}_2\).
However, if both \(\uptau _1\sqsubseteq \uptau _2\) and \(\uptau _2\sqsubseteq \uptau _1\) then the events are never propagated to the same process, and hence they need not be related by the coherence order.
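The order \(\sqsubseteq \) and its strict variant \(\sqsubset \) can be sketched directly from the definition. In the following Python sketch (our own encoding), a time stamp is a dict from process identifiers to context numbers, with \(\otimes \) represented by None:

```python
OMEGA = None  # represents ⊗, "never propagated to that process"

def leq(t1, t2):
    """t1 ⊑ t2: for every process p, t1[p] = ⊗, or t2[p] = ⊗, or t1[p] <= t2[p]."""
    return all(t1[p] is OMEGA or t2[p] is OMEGA or t1[p] <= t2[p]
               for p in t1)

def strictly_less(t1, t2):
    """t1 ⊏ t2: t1 ⊑ t2 and the stamps strictly differ on some process
    to which both events are propagated."""
    return leq(t1, t2) and any(t1[p] is not OMEGA and t2[p] is not OMEGA
                               and t1[p] < t2[p] for p in t1)
```

Note that two stamps with no common defined process satisfy `leq` in both directions, matching the remark that such events need not be ordered by coherence.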

If \(\uptau _1\sqsubseteq \uptau _2\) then we define the summary of \(\uptau _1\) and \(\uptau _2\), denoted by \(\uptau _1\oplus \uptau _2\), to be the time stamp \(\uptau \) such that \(\uptau (p)=\uptau _1(p)\) if \(\uptau _2(p)=\otimes \), and \(\uptau (p)=\uptau _2(p)\) otherwise. For a sequence \(\upsigma =\uptau _0\sqsubseteq \uptau _1\sqsubseteq \cdots \sqsubseteq \uptau _n\) of time stamps, we define the summary \(\oplus \upsigma :=\uptau '_n\) where \(\uptau '_i\) is defined inductively by \(\uptau '_0:=\uptau _0\), and \(\uptau '_i:=\uptau '_{i-1}\oplus \uptau _i\) for \(i:1\le i\le n\). Notice that, for \(p\in \mathcal P\), we have \(\oplus \upsigma (p)=\uptau _j(p)\) where \(j\) is the largest index \(j:0\le j\le n\) s.t. \(\uptau _j(p)\ne \otimes \) (and \(\oplus \upsigma (p)=\otimes \) if no such index exists).
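The summary operation \(\oplus \) and its left fold over a sequence can be sketched as follows (Python; \(\otimes \) represented by None, names are our own; the sequence is assumed nonempty and \(\sqsubseteq \)-increasing, as in the definition):

```python
def summary(t1, t2):
    """τ1 ⊕ τ2: keep τ2's entry where it is defined, else fall back to τ1's."""
    return {p: (t1[p] if t2[p] is None else t2[p]) for p in t1}

def summarize(seq):
    """⊕σ for a nonempty ⊑-increasing sequence of time stamps (left fold of ⊕)."""
    acc = seq[0]
    for t in seq[1:]:
        acc = summary(acc, t)
    return acc
```

As the definition notes, the result at each process is the entry of the last stamp in the sequence that is defined at that process.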

Our simulation observes the sequence of write events received by a process in each context. In fact, the simulation will initially guess and later verify the summaries of the time stamps of such a sequence. This is done using data structures \(\upalpha ^{ init}\) and \(\upalpha \). The mapping \(\upalpha ^{ init}:\mathcal P\times {\mathcal X}\times {\mathbbm {K}}\mapsto \left[ {\mathcal P}\mapsto {{\mathbbm {K}}^\otimes }\right] \) stores, for a process \(p\in \mathcal P\), a variable \(x\in {\mathcal X}\), and a context \(k:1\le k\le {\mathbbm {K}}\), an initial guess \(\upalpha ^{ init}\left( p,x,k\right) \) of the summary of the time stamps of the sequence of write events on \(x\) propagated to \(p\) up to the start of context \(k\). Starting from a given initial guess for a given context \(k\), the time stamp is updated successively using the sequence of write events on \(x\) propagated to \(p\) in \(k\). The result is stored using the mapping \(\upalpha :\mathcal P\times {\mathcal X}\times {\mathbbm {K}}\mapsto \left[ {\mathcal P}\mapsto {{\mathbbm {K}}^\otimes }\right] \). More precisely, we initially set the value of \(\upalpha \) to \(\upalpha ^{ init}\). Each time a new write event \(\mathbbm {e}\) on \(x\) is created by \(p\) in the context \(k\), we guess the time stamp \(\upbeta \) of \(\mathbbm {e}\), and then update \(\upalpha \left( p,x,k\right) \) by computing its summary with \(\upbeta \). Thus, at any given point in a context \(k\), \(\upalpha \left( p,x,k\right) \) contains the summary of the time stamps of the whole sequence of write events on \(x\) that have been propagated to \(p\) up to that point. At the end of the simulation, we verify, for each context \(k:1\le k<{\mathbbm {K}}\), that the value of \(\upalpha \) for a context \(k\) is equal to the value of \(\upalpha ^{ init}\) for the next context \(k+1\).
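The bookkeeping for \(\upalpha \) can be sketched as follows (Python; keys are (process, variable, context) triples, \(\otimes \) is None, and all names are our own):

```python
def make_alpha(alpha_init):
    """Seed alpha with (copies of) the guessed per-context summaries alpha_init."""
    return {key: dict(stamp) for key, stamp in alpha_init.items()}

def record_write(alpha, p, x, k, beta):
    """Fold the guessed stamp beta of a new write on x by p into alpha[p, x, k],
    i.e. alpha(p, x, k) := alpha(p, x, k) ⊕ beta."""
    old = alpha[(p, x, k)]
    alpha[(p, x, k)] = {q: (old[q] if beta[q] is None else beta[q]) for q in old}
```

Copying in `make_alpha` keeps \(\upalpha ^{ init}\) intact, so it can still be compared against the accumulated \(\upalpha \) at the end of the simulation.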

Furthermore, we use three data structures for storing the values of variables. The mapping \(\mu ^{ init}:\mathcal P\times {\mathcal X}\times {\mathbbm {K}}\mapsto {\mathcal D}\) stores, for a process \(p\in \mathcal P\), a variable \(x\in {\mathcal X}\), and a context \(k:1\le k\le {\mathbbm {K}}\), an initial guess \(\mu ^{ init}\left( p,x,k\right) \) of the value of the latest write event on \(x\) propagated to \(p\) up to the start of the context \(k\). The mapping \(\mu :\mathcal P\times {\mathcal X}\times {\mathbbm {K}}\mapsto {\mathcal D}\) stores, for a process \(p\in \mathcal P\), a variable \(x\in {\mathcal X}\), and a point in a context \(k:1\le k\le {\mathbbm {K}}\), the value \(\mu \left( p,x,k\right) \) of the latest write event on \(x\) that has been propagated to \(p\) up to that point. Moreover, the mapping \(\upnu :\mathcal P\times {\mathcal X}\mapsto {\mathcal D}\) stores, for a process \(p\in \mathcal P\) and a variable \(x\in {\mathcal X}\), the latest value \(\upnu \left( p,x\right) \) that has been written on \(x\) by \(p\).

Non-atomicity. In order to satisfy the different dependencies between events, we need to keep track of the contexts in which they are initialized and committed. One aspect of our translation is that it only needs to keep track of the context in which the latest read or write event on a given variable in a given process is initialized or committed. The mapping \(\mathtt{iW}:\mathcal P\times {\mathcal X}\mapsto {\mathbbm {K}}\) defines, for \(p\in \mathcal P\) and \(x\in {\mathcal X}\), the context \(\mathtt{iW}\left( p,x\right) \) in which the latest write event on \(x\) by \(p\) is initialized. The mapping \(\mathtt{cW}:\mathcal P\times {\mathcal X}\mapsto {\mathbbm {K}}\) is defined in a similar manner for committing (rather than initializing) write events. Furthermore, we define similar mappings \(\mathtt{iR}\) and \(\mathtt{cR}\) for read events. The mapping \(\mathtt{iReg}:{\mathcal R}\mapsto {\mathbbm {K}}\) gives, for a register \(\$r\in {\mathcal R}\), the initializing context \(\mathtt{iReg}\left( \$r\right) \) of the latest read event loading a value to \(\$r\). For an expression \({ exp}\), we define \(\mathtt{iReg}\left( { exp}\right) :=\max \left\{ \mathtt{iReg}\left( \$r\right) \ | \ \$r\in {\mathcal R}\left( { exp}\right) \right\} \). The mapping \(\mathtt{cReg}:{\mathcal R}\mapsto {\mathbbm {K}}\) gives the contexts for committing (rather than initializing) of the read events. We extend \(\mathtt{cReg}\) from registers to expressions in a similar manner to \(\mathtt{iReg}\). Finally, the mapping \(\mathtt{ctrl}:\mathcal P\mapsto {\mathbbm {K}}\) gives, for a process \(p\in \mathcal P\), the committing context \(\mathtt{ctrl}\left( p\right) \) of the latest aci event in \(p\).
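For example, extending \(\mathtt{iReg}\) (and likewise \(\mathtt{cReg}\)) from registers to expressions only requires taking a maximum over the registers that occur in the expression. A small Python sketch (our own encoding; returning 0 for a register-free expression is our own "no constraint" convention):

```python
def ireg_of_exp(ireg, regs_of_exp):
    """iReg(exp) = max over the initializing contexts of the registers in exp.
    `ireg` maps register names to contexts; `regs_of_exp` stands for R(exp)."""
    return max((ireg[r] for r in regs_of_exp), default=0)
```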


Initializing Process. Algorithm 1 shows the initialization process. The for-loops of lines 1, 3 and 5 define the values of the initializing and committing data structures for the variables and registers together with \(\upnu \left( p,x\right) \), \(\mu \left( p,x,1\right) \), \(\upalpha \left( p,x,1\right) \) and \(\mathtt{ctrl}\left( p\right) \) for all \(p\in \mathcal P\) and \(x\in {\mathcal X}\). The for-loop of line 7 defines the initial values of \(\upalpha \) and \(\mu \) at the start of each context \(k\ge 2\) (as described above). The for-loop of line 10 chooses an active process to execute in each context. The current context variable \(\mathtt{cntxt}\) is initialized to 1.
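A rough Python sketch of the state the initializing process sets up (our own encoding, not Algorithm 1 itself: the guessed \(\upalpha ^{ init}\), \(\mu ^{ init}\) and the active-process assignment are taken as inputs here, whereas \(\mathtt{iniProc}\) would choose them nondeterministically; using 0 for "no context yet" and for the initial data value are our own conventions):

```python
def ini_proc(procs, variables, K, alpha_init, mu_init, active):
    """Build the initial simulation state. `active` maps each context 1..K
    to the process guessed to run in it."""
    return {
        # alpha is seeded from the guessed per-context summaries
        'alpha': {(p, x, k): dict(alpha_init[(p, x, k)])
                  for p in procs for x in variables for k in range(1, K + 1)},
        'mu': {(p, x, k): mu_init[(p, x, k)]
               for p in procs for x in variables for k in range(1, K + 1)},
        'nu': {(p, x): 0 for p in procs for x in variables},
        # initializing / committing contexts of the latest events (0 = none yet)
        'iW': {(p, x): 0 for p in procs for x in variables},
        'cW': {(p, x): 0 for p in procs for x in variables},
        'iR': {(p, x): 0 for p in procs for x in variables},
        'cR': {(p, x): 0 for p in procs for x in variables},
        'ctrl': {p: 0 for p in procs},
        'active': dict(active),
        'cntxt': 1,
    }
```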

Write Instructions. Consider a write instruction \(\mathfrak {i}\) in a process \(p\in \mathcal P\) whose statement is of the form \(x\leftarrow { exp}\). The translation of \(\mathfrak {i}\) is shown in Algorithm 3. The code simulates an event \(\mathbbm {e}\) executing \(\mathfrak {i}\), by encoding the effects of the inference rules \(\mathtt{Init\text {-}Write}\), \(\mathtt{Com\text {-}Write}\) and \(\mathtt{Prop}\) that initialize, commit, and propagate a write event respectively. The translation consists of three parts, namely guessing, checking and update.

Guessing. We guess the initializing and committing contexts for the event \(\mathbbm {e}\), together with its time stamp. In line 1, we guess the context in which the event \(\mathbbm {e}\) will be initialized, and store the guess in \(\mathtt{iW}\left( p,x\right) \). Similarly, in line 3, we guess the context in which the event \(\mathbbm {e}\) will be committed, and store the guess in \(\mathtt{cW}\left( p,x\right) \) (having stored its old value in the previous line). In the for-loop of line 4, we guess a time stamp for \(\mathbbm {e}\) and store it in \(\upbeta \). This means that, for each process \(q\in \mathcal P\), we guess the context in which the event \(\mathbbm {e}\) will be propagated to \(q\) and we store this guess in \(\upbeta \left( q\right) \).

Checking. We perform sanity checks on the guessed values in order to verify that they are consistent with the POWER semantics. Lines 6–8 perform the sanity checks for \(\mathtt{iW}\left( p,x\right) \). In lines 6–7, we verify that the initializing context of the event \(\mathbbm {e}\) is not smaller than the current context. This captures the fact that initialization happens after fetching of \(\mathbbm {e}\). It also verifies that initialization happens in a context in which \(p\) is active. In line 8, we check whether \(\mathtt{WrInitCnd}\) in the rule \(\mathtt{Init\text {-}Write}\) is satisfied. To do that, we verify that the data dependency order \(\prec _\mathtt{data}\) holds. More precisely, we find, for each register \(\$r\) that occurs in \({ exp}\), the initializing context of the latest read event loading to \(\$r\). We make sure that the initializing context of \(\mathbbm {e}\) is later than the initializing contexts of all these read events. By definition, the largest of all these contexts is stored in \(\mathtt{iReg}\left( { exp}\right) \).

Lines 9–10 perform the sanity checks for \(\mathtt{cW}\left( p,x\right) \). In line 9, we check that the committing context of the event \(\mathbbm {e}\) is at least as large as its initializing context. In line 10, we check that \(\mathtt{ComCnd}\) in the rule \(\mathtt{Com\text {-}Write}\) is satisfied. To do that, we check that the committing context is larger than (i) the committing contexts of all the read events from which the registers in the expression \({ exp}\) fetch their values (to satisfy the data dependency order \(\prec _\mathtt{data}\), in a similar manner to that described for initialization above), (ii) the committing contexts of the latest read and write events on \(x\) in \(p\), i.e., \(\mathtt{cR}\left( p,x\right) \) and \(\mathtt{cW}\left( p,x\right) \) (to satisfy the per-location program order \(\prec _\mathtt{poloc}\)), and (iii) the committing context of the latest aci event in \(p\), i.e., \(\mathtt{ctrl}\left( p\right) \) (to satisfy the control order \(\prec _\mathtt{ctrl}\)).

The for-loop of line 11 performs three sanity checks on the time stamp \(\upbeta \). In line 12, we verify that the event \(\mathbbm {e}\) is propagated to \(p\) in the same context as the one in which it is committed. This is consistent with the rule \(\mathtt{Com\text {-}Write}\) which requires that when a write event is committed then it is immediately propagated to the committing process. In line 14, we verify that if the event \(\mathbbm {e}\) is propagated to a process \(q\) (different from \(p\)), then the propagation takes place in a context later than or equal to the one in which \(\mathbbm {e}\) is committed. This is to be consistent with the fact that a write event is propagated to other processes only after it has been committed. In line 17, we check that the guessed time stamp of the event \(\mathbbm {e}\) does not cause a violation of the coherence order \(\prec _\mathtt{co}\). To do that, we consider each process \(q\in \mathcal P\) to which \(\mathbbm {e}\) will be propagated (i.e., \(\upbeta \left( q\right) \ne \otimes \)). The time stamp of \(\mathbbm {e}\) should be larger than the time stamp of any other write event \(\mathbbm {e}'\) on \(x\) that has been propagated to \(q\) up to the current point (since \(\mathbbm {e}\) should be larger in the coherence order than \(\mathbbm {e}'\)). Notice that by construction the time stamp of the largest such event \(\mathbbm {e}'\) is currently stored in \(\upalpha \left( q,x,\upbeta \left( q\right) \right) \). Moreover, in line 18, we check that the event is propagated to \(q\) in a context in which \(p\) is active.

Updating. The for-loop of line 19 uses the values guessed above for updating the global data structure \(\upalpha \). More precisely, if the event \(\mathbbm {e}\) is propagated to a process \(q\), i.e., \(\upbeta \left( q\right) \ne \otimes \), then we add \(\upbeta \) to the summary of the time stamps of the sequence of write operations on \(x\) propagated to \(q\) up to the current point in the context \(\upbeta \left( q\right) \). Lines 22–23 assign the value of \({ exp}\) to \(\mu \left( q,x,\upbeta \left( q\right) \right) \) and \(\upnu \left( p,x\right) \) respectively. Recall that the former stores the value defined by the latest write event on \(x\) propagated to \(q\) up to the current point in the context \(\upbeta \left( q\right) \), and the latter stores the value defined by the latest write on \(x\) by \(p\).
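The guess-check-update structure of the write translation can be summarized in one Python sketch (our own encoding, not the actual Algorithm 3: guesses are passed in rather than chosen nondeterministically, a failed check abandons the branch by returning False, `active` maps each context to the process running in it, and `ireg_exp`/`creg_exp` stand for \(\mathtt{iReg}\left( { exp}\right) \)/\(\mathtt{cReg}\left( { exp}\right) \)):

```python
OMEGA = None  # ⊗: the write is never propagated to that process

def _leq(t1, t2):
    """t1 ⊑ t2 on time stamps."""
    return all(t1[q] is OMEGA or t2[q] is OMEGA or t1[q] <= t2[q] for q in t1)

def _merge(t1, t2):
    """t1 ⊕ t2 on time stamps."""
    return {q: (t1[q] if t2[q] is OMEGA else t2[q]) for q in t1}

def simulate_write(st, p, x, value, ireg_exp, creg_exp, guess):
    """One branch of the write translation; `guess` supplies iW, cW and beta."""
    iW, cW, beta = guess['iW'], guess['cW'], guess['beta']
    k = st['cntxt']
    # checks on the initializing context: after fetch, while p is active, ≺_data
    if iW < k or st['active'][iW] != p or iW < ireg_exp:
        return False
    # checks on the committing context: after init, ≺_data, ≺_poloc, ≺_ctrl
    if cW < max(iW, creg_exp, st['cR'][(p, x)], st['cW'][(p, x)], st['ctrl'][p]):
        return False
    # checks on the time stamp: propagated to p at commit, to others no earlier,
    # while p is active, and above all stamps already propagated (coherence)
    if beta[p] != cW:
        return False
    for q, bq in beta.items():
        if bq is OMEGA:
            continue
        if bq < cW or st['active'][bq] != p:
            return False
        if not _leq(st['alpha'][(q, x, bq)], beta):
            return False
    # updates: fold beta into the summaries, record the written value
    st['iW'][(p, x)], st['cW'][(p, x)] = iW, cW
    for q, bq in beta.items():
        if bq is not OMEGA:
            st['alpha'][(q, x, bq)] = _merge(st['alpha'][(q, x, bq)], beta)
            st['mu'][(q, x, bq)] = value
    st['nu'][(p, x)] = value
    return True
```

In the actual translation the guesses are drawn nondeterministically and a failed check simply blocks that branch of the SC program; the sketch makes the same flow explicit with an early return.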

Read Instructions. Consider a read instruction \(\mathfrak {i}\) in a process \(p\in \mathcal P\) whose statement is of the form \(\$r\leftarrow x\). The translation of \(\mathfrak {i}\) is shown in Algorithm 2. The code simulates an event \(\mathbbm {e}\) running \(\mathfrak {i}\) by encoding the three inference rules \(\mathtt{Local\text {-}Read}\), \(\mathtt{Prop\text {-}Read}\), and \(\mathtt{Com\text {-}Read}\). In a similar manner to a write instruction, the translation scheme for a read instruction consists of guessing, checking and update parts. Notice however that the initialization of the read event is carried out through two different inference rules.

Guessing. In line 1, we store the old value of \(\mathtt{iR}\left( p,x\right) \). In line 2, we guess the context in which the event \(\mathbbm {e}\) will be initialized, and store the guessed context both in \(\mathtt{iR}\left( p,x\right) \) and \(\mathtt{iReg}\left( \$r\right) \). Recall that the latter records the initializing context of the latest read event loading a value to \(\$r\). In lines 3–4, we execute similar instructions for committing (rather than initializing).

Checking. Lines 5–8 perform the sanity checks for \(\mathtt{iR}\left( p,x\right) \). Lines 5–6 check that the initializing context for the event \(\mathbbm {e}\) is not smaller than the current context and that the initialization happens in a context in which \(p\) is active. Line 7 makes sure that at least one of the two inference rules \(\mathtt{Local\text {-}Read}\) and \(\mathtt{Prop\text {-}Read}\) is satisfied, by checking that the closest write event \(\mathtt{CW}\left( {\mathbb {c}},\mathbbm {e}\right) \) (if it exists) has already been initialized. In line 8, we check that \(\mathtt{RdCnd}\) in the rule \(\mathtt{Com\text {-}Read}\) is satisfied. Lines 9–11 perform the sanity checks for \(\mathtt{cR}\left( p,x\right) \) in a similar manner to the corresponding instructions for write events (see above).

Updating. The purpose of the update part (the if-statement of line 12) is to ensure that the correct read-from relation is defined as described by the inference rules \(\mathtt{Local\text {-}Read}\) and \(\mathtt{Prop\text {-}Read}\). If \(\mathtt{iR}\left( p,x\right) <\mathtt{cW}\left( p,x\right) \), then this means that the latest write event \(\mathbbm {e}'\) on \(x\) by \(p\) is not committed and hence, according to \(\mathtt{Local\text {-}Read}\), the event \(\mathbbm {e}\) reads its value from that event. Recall that this value is stored in \(\upnu \left( p,x\right) \). On the other hand, if \(\mathtt{iR}\left( p,x\right) \ge \mathtt{cW}\left( p,x\right) \) then the event \(\mathbbm {e}'\) has been committed and hence, according to \(\mathtt{Prop\text {-}Read}\), the event \(\mathbbm {e}\) reads its value from the latest write event on \(x\) propagated to \(p\) in the context where \(\mathbbm {e}\) is initialized. We notice that this value is stored in \(\mu \left( p,x,\mathtt{iR}\left( p,x\right) \right) \).
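The value selection described by the two rules can be sketched as follows (Python; our own encoding of the state, with `iR` the guessed initializing context of the read):

```python
def read_value(st, p, x, iR):
    """Value observed by a read of x by p initialized in context iR:
    the local uncommitted write (ν) if one is pending (Local-Read),
    otherwise the latest write propagated to p by context iR (Prop-Read)."""
    if iR < st['cW'][(p, x)]:   # latest local write on x not yet committed
        return st['nu'][(p, x)]
    return st['mu'][(p, x, iR)]
```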

Verifier Process. The verifier process makes sure that the updated value \(\upalpha \) of the time stamp at the end of a given context \(k: 1\le k \le {\mathbbm {K}}-1\) is equal to the corresponding guessed value \(\upalpha ^{ init}\) at the start of the next context. It also performs the corresponding check for the values written to the variables (by comparing \(\mu \) and \(\mu ^{ init}\)). Finally, it checks whether we reach an error label \(\uplambda \) or not.
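A sketch of the verifier's final consistency check (Python; our own encoding, with `error_reached` standing in for the label check):

```python
def verify(st, procs, variables, K, alpha_init, mu_init, error_reached):
    """The summaries and values accumulated in context k must match the
    guesses made for the start of context k+1; then report the label check."""
    for k in range(1, K):
        for p in procs:
            for x in variables:
                if st['alpha'][(p, x, k)] != alpha_init[(p, x, k + 1)]:
                    return False
                if st['mu'][(p, x, k)] != mu_init[(p, x, k + 1)]:
                    return False
    return error_reached
```

Only runs whose guesses survive this final check correspond to actual POWER runs, so a reported error label is never spurious.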

4 Experimental Results

In order to evaluate the efficiency of our approach, we have implemented a context-bounded model checker for programs under POWER, called power2sc Footnote 1. We use cbmc version 5.1 [17] as the backend tool. However, observe that our code-to-code translation can be implemented on top of any backend tool that provides safety verification of concurrent programs running under the SC semantics. In the following, we present the evaluation of power2sc on 28 C/pthreads benchmarks collected from goto-instrument [9], nidhugg [6], memorax [5], and the SV-COMP17 benchmark suite [1]. These are widespread medium-sized benchmarks that are used by many tools for analyzing concurrent programs running under weak memory models (e.g. [2,3,4, 7, 8, 10, 12,13,14,15, 22, 24, 37, 40]). We divide our results in two sets. The first set concerns unsafe programs while the second set concerns safe ones. In both parts, we compare results obtained from power2sc to the ones obtained from goto-instrument and nidhugg, which are, to the best of our knowledge, the only two tools supporting C/pthreads programs under POWERFootnote 2. All experiments were run on a machine equipped with a 2.4 GHz Intel x86-32 Core2 processor and 4 GB RAM.

Table 1. Comparing power2sc with goto-instrument and nidhugg on two sets of benchmarks: (a) unsafe and (b) safe (with manually inserted synchronizations). The LB column indicates whether the tools were instructed to unroll loops up to a certain bound. The CB column gives the context bound for power2sc. The program size is the number of code lines. A t/o entry means that the tool failed to complete within 1800 s. The best running time (in seconds) for each benchmark is given in bold font.

Table 1a shows that power2sc performs well in detecting bugs compared to the other tools for most of the unsafe examples. We observe that power2sc manages to find all the errors using at most 6 contexts, while nidhugg and goto-instrument time out before finding the errors for several examples. This also confirms that few context switches are sufficient to find bugs. Table 1b demonstrates that our approach is also effective on safe programs. power2sc manages to run most of the examples (except Dijkstra and Lamport) using the same context bounds as in the case of their respective unsafe examples. Although nidhugg and goto-instrument time out for several examples, it should be noted that they do not impose any bound on the number of context switches, while power2sc does.

We have also tested the performance of power2sc on small litmus tests. power2sc manages to successfully run all 913 litmus tests published in [34]. Furthermore, the results returned by power2sc match those returned by the herd tool [11] on all the litmus tests.