figure a
figure b

1 Modelling and Validating Correctness in FDR4

FDR4 ย [4] is a model checker for CSPย [5] that has recently been used to verify linearisabilityย [7], as well as opacity and durable opacityย [3]. We similarly provide an FDR4 development, which allows proofs of refinement to be automatically checked up to certain bounds. This is in contrast to manual methods of proving correctness of concurrent objectsย [1, 2], which require a significant amount of manual human input (though such manual proofs are unbounded). FDR4 uses a variety of underlying model checking paradigms and partial-order reduction techniquesย [4], depending on the structure of the files to be verified. FDR4 builds on FDR3, but the exact implementation details of FDR4 are not publicly available since it is a commercial product (available for free academic use).

The CSP files corresponding to this report may be downloaded fromย [8].

1.1 Modelling Details

One of the most challenging aspects of the FDR4 development is the modelling work itself. Our algorithms execute over a shared memory, but the CSP formalism is based on communicating processes with no notion of shared states. Thus, for each location we must explicitly define handler processes that communicate through channels to update and return the values of components (e.g. the addresses, read/write sets) of each model. Moreover, the implementations (txPMDK, PMDK-NORec and PMDK-TML), the specification (ddTMS) and underlying memory models (PSC and \(\textsc {PTSO}_{\textrm{syn}}\)) we consider are non-trivial, significantly increasing the challenge of the modelling effort.

Although constructing the models is challenging, once the models have been developed, they can be combined in a modular fashion. We have taken advantage of this feature to combine our implementations with different memory models during development. The combination of PMDK-TML and TML/NOrec also takes advantage of this modularity.

This modularity also means that our models are reusable. One could use our models to check other developments, e.g. those that use txPMDK to implement other failure-atomic data structures, or verify redesigns of txPMDK over different memory models. Specifically, we use a top-level CSP process (which may comprise an interleaved composition of processes for each transaction) to model the most general client. Each transaction process begins a transaction, and then calls an unbounded number of reads, writes and allocations at non-deterministically chosen locations and with non-deterministically chosen values. An in-flight transaction process may also non-deterministically choose to terminate by calling commit instead of calling a read, write or allocation. Each operation call produces an externally visible invocation event, and when the operation terminates, an externally visible response is generated. Some operations may respond with an abort, in which case the transaction process itself terminates.

Additionally, there is an externally visible crash event that synchronises with all processes. At the level of the abstraction (i.e. ddTMS), this simply terminates all in-flight transactions, and resets the memory sequence (as detailed by the rule (X)). At the level of the implementation, all in-flight transactions are terminated and additionally, the store and persistency buffers are cleared. This means that when execution resumes, the value of each location is taken from NVM. Immediately after a crash (and before any other processes are started), the recovery process corresponding to the algorithm is executed. Note that transaction identifiers are never reused.

We eschew further details of our FDR4 models since they are provided as supplementary materialย [8] and also refer the interested reader to other prior works ย [3, 7].

Fig. 1.
figure 1

Overview of FDR4 checks

Fig. 2.
figure 2

Summary of upper bounds checks (total time in seconds: compilation + model exploration). The time out (TO) is set to 1000 seconds of compilation time.

1.2 Overview of Development

An overview of our FDR4 development is given in Fig. 1. We derive two specifications from ddTMS. The first is an FDR4 model of ddTMS itself, based on prior workย [3, 7], but contains the extensions required for ddTMS. The second is ddTMS-Seq, which restricts ddTMS to a sequential crash-free specification. We use ddTMS-Seq to obtain (lower-bound) liveness-like guarantees, which strengthens traditional deadlock or divergence proofs of refinement. These lower-bound checks ensure our models contain at least the traces of ddTMS-Seq.

CSP files. Our development comprises the following files.

figure c

Description of Tests. The file comprises six tests as detailed in Figs. 9 and 10 of the paper. There are three upper-bound checks, which show that PMDK, PMDK-TML and PMDK-NORec are refinements of ddTMS, validating soundness:

  • , checking that PMDK refines ddTMS.

  • , checking that PMDK-TML refines ddTMS.

  • , checking that PMDK-NORec refines ddTMS.

Each of these tests can be run against the memory models: NVM (which contains no crashes), PSC and \(\textsc {PTSO}_{\textrm{syn}}\) by commenting/uncommenting the relevant lines at the end of the file .

Additionally, there are three lower-bound checks, which show ddTMS-Seq are refinements of PMDK, PMDK-TML and PMDK-NORec.

Each of these tests can be run against the memory models: NVM and PSC as defined in the file . Note that the test against \(\textsc {PTSO}_{\textrm{syn}}\) times out. However, the tests above are sufficient since \(\textsc {PTSO}_{\textrm{syn}}\) reduces to PSC in the absence of data races (e.g., sequential executions).

Each check in FDR4 is split into two phases: (1)a compilation phase that builds the models; and (2) a model exploration phase. The characteristics of the upper and lower bounds checks are distinct. When naively checking the upper bound, compilation is almost instantaneous but model exploration times can be significant; these times are swapped for the lower bounds checks.

In general, lower-bounds take much longer to verify than the upper-bounds since FDR4 is optimised to verify abstract (low-detail) specifications are refined by concrete (high-detail) implementations. The lower bounds checks use the more complex models as the specification, leading to the creation of very large space-inefficient models, putting pressure on the available system memory. However, the lower-bound checks for PSC and \(\textsc {PTSO}_{\textrm{syn}}\) are superceded by the corresponding checks over NVM, since the memory models PSC and \(\textsc {PTSO}_{\textrm{syn}}\) are both supersets of NVM. That is, any trace over NVM must also be a trace PSC and \(\textsc {PTSO}_{\textrm{syn}}\). For two transactions, two locations and two values, the checks for PMDK, PMDK-TML and PMDK-NORec take 12.16, 17.36, and 56.02 seconds, respectively.

1.3 Summary of Results

Fig.ย 2 summarises our experiments on the upper bound checks, where the times shown combine the compilation and model exploration times. Each row represents an experiment that bounds the number of transactions (#txns), locations (#locs), values (#val) and the size of the persistency and store buffers (#buff). The times reported are for an Apple M1 device with 16GB of memory. The first row depicts a set of experiments where the implementations execute directly on NVM, without any buffers. As we discuss below, these tests are sufficient for checking lower bounds. The baseline for our checks sets the value of each parameter to two, and Fig. 2 allows us to see the cost of increasing each parameter. Note that all models time out when increasing the number of transactions to three, thus these times are not shown. Also note that for txPMDK (which is single-threaded), the checks for PSC also cover \(\textsc {PTSO}_{\textrm{syn}}\), since \(\textsc {PTSO}_{\textrm{syn}}\) is equivalent to PSC in the absence of racesย [6]. Nevertheless, it is interesting to run the single-threaded experiments on the \(\textsc {PTSO}_{\textrm{syn}}\) model to understand the impact of the memory model on the checks.

In our experiments we use FDR4 โ€™s built-in partial order reduction features to make the upper bound checks feasible. This has a huge impact on the model checking speed; for instance, the check for PMDK-TML with two transactions, two locations, two values and buffer size of two reduces from over 6000 seconds (1 hour and 40 minutes) to under 7 seconds, which is almost a 1000-fold improvement! This speed-up makes it feasible to use FDR4 for rapid prototyping when developing programs that use txPMDK, even for the relatively complex \(\textsc {PTSO}_{\textrm{syn}}\) memory model.