1 Introduction

Distributed systems replicate their state over different nodes in order to satisfy several non-functional requirements, such as performance, availability, and reliability. It then becomes crucial to keep a consistent view of the replicated data. However, this is a challenging task because consistency is in conflict with two common requirements of distributed applications: availability (every request is eventually executed) and tolerance to network partitions (the system operates even in the presence of failures that prevent communication among components). In fact, it is impossible for a system to simultaneously achieve strong Consistency, Availability and Partition tolerance [6]. Since many domains cannot give up availability and partition tolerance, developers need to cope with weaker notions of consistency by allowing, e.g., replicas to (temporarily) exhibit some discrepancies, as long as they eventually converge to the same state.

This setting challenges the way in which data are specified: states, state transitions and return values should account for the different views that a data item may simultaneously have. Consider a data type Register corresponding to a memory cell that is read and updated by using, respectively, operations \(\mathtt {rd}\) and \(\mathtt {wr}\). In a replicated scenario, the value obtained when reading a register after two concurrent updates \(\mathtt {wr(0)}\) and \(\mathtt {wr(1)}\) (i.e., updates taking place over different replicas) is affected by the way in which updates propagate among the different replicas: it is perfectly possible that the result of the read is (i) undefined (when the read is performed over a third replica that has not received any of the updates), (ii) 0 or (iii) 1. Basically, the return value depends on the updates that are seen by that read operation. Choosing the return value is straightforward when a read sees just one update. This is less so if a read is performed over a replica that knows both updates, because all replicas must then (consistently) pick one of the available values. A common strategy for registers is last-write-wins, i.e., the last update is chosen when several concurrent updates are observed. This strategy implicitly assumes that all events in a system can be arranged in a total order.

Several recent approaches focus on the operational specification of replicated data types [2, 3, 4, 5, 7, 8, 12, 14]. Usually, the specification describes the meaning of an operation in terms of two different relations among events: visibility, which explains the causes for each result, and arbitration, which totally orders events. Consider the visibility relation V in Fig. 1a and the arbitrations \(A_1\) and \(A_2\) in Fig. 1b and c, respectively. The meaning of \(\mathtt {rd}\) is defined such that \(\mathtt {rd} (V,A_1) = 1\) and \(\mathtt {rd} (V,A_2) = 0\). We remark that operational approaches require specifications to be functional, i.e., for every operation, visibility and arbitration relation, there exists exactly one return value. In this way operational specifications commit to concrete policies for resolving conflicts.
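The dependence of a read on visibility and arbitration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `read_register`, the encoding of visibility as a dictionary from visible write events to their values, and the encoding of arbitration as a list of event names (earliest first) are all hypothetical choices made here, and last-write-wins is assumed as the conflict resolution policy.

```python
from typing import Dict, List, Optional


def read_register(visible_writes: Dict[str, int], arbitration: List[str]) -> Optional[int]:
    """Value returned by a read under last-write-wins (None stands for 'undefined').

    visible_writes: the write events seen by the read, mapped to the written value.
    arbitration: a total order on all events, earliest first.
    """
    # Keep only the writes the read actually sees, in arbitration order.
    seen_in_order = [e for e in arbitration if e in visible_writes]
    if not seen_in_order:
        return None  # the read sees no write, so the register is undefined
    return visible_writes[seen_in_order[-1]]  # the latest visible write wins


# Two concurrent writes wr(0) and wr(1), both visible to the read "r".
writes = {"w0": 0, "w1": 1}
print(read_register(writes, ["w0", "w1", "r"]))  # w1 is arbitrated after w0
print(read_register(writes, ["w1", "w0", "r"]))  # w0 is arbitrated after w1
print(read_register({}, ["w0", "w1", "r"]))      # the read sees neither write
```

The same visibility with two different arbitrations yields two different results, which is why a functional (operational) specification needs both relations as inputs.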

Fig. 1. A scenario for the replicated data type Register.

This work aims at putting the operational approaches to replicated data types (rdts) on firm ground by giving them a purely functional description and, eventually, a categorical one. In our view, rdts are functions that map visibility graphs (i.e., configurations) into sets of admissible arbitrations, i.e., all executions that generate a particular configuration. In this setting, a configuration mapped to an empty set of admissible arbitrations stands for an unreachable configuration. We rely on such an abstract view of rdts to highlight some of the implicit assumptions shared by most of the operational approaches. In particular, we characterise operational approaches, such as [4, 12], as those specifications that satisfy three properties: besides the evident requirement of being functional (i.e., deterministic and total), they must be coherent (i.e., larger states are explained as the composition of smaller ones), and saturated (e.g., an unobserved operation can be arbitrated in any position, even before the events that it sees). We show this inclusion to be strict and discuss some interesting cases that do not fall in this class. Moreover, we show that our functional characterisation elegantly accounts for underspecification and refinement, which are standard notions in data type specification.

Then, we develop a categorical presentation for specifications. We focus on coherent specifications and show that there is a one-to-one correspondence between coherent specifications and a particular class of functors from the category \(\mathcal {I}(\mathcal {L})\) of labelled directed acyclic graphs and injective past-reflecting morphisms (which are the dual notion of tp-morphisms [9]) to the category \(\mathcal {P}(\mathcal {L})\) of sets of paths and path-set morphisms preserving the initial object. As is standard in algebraic specification theory, pullbacks and (a weak form of) pushouts in \(\mathcal {I}(\mathcal {L})\) provide basic operators for composing specifications, and thus our functorial presentation is the first step towards a denotational semantics of rdts (see e.g. [1] and the references therein).

The paper has the following structure. Section 2 introduces the basic definitions concerning labelled directed acyclic graphs. Section 3 discusses our functional mechanism for the presentation of Replicated Data Types. Section 4 compares our proposal with the classical operational one [2]. Section 5 illustrates a categorical characterisation for our proposal. Finally, in the closing section we draw some conclusions and highlight further developments.

2 Labelled Directed Acyclic Graphs

In this section we recall the basics of labelled directed acyclic graphs, which are used for our description of replicated data types. We rely on countable sets \(\mathcal {E}\) of events and \(\mathcal {L}\) of labels.

Definition 1

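(Labelled Directed Acyclic Graph). A Labelled Directed Acyclic Graph (ldag) over a set of labels \(\mathcal {L}\) is a triple \(\mathtt {G} = \langle \mathcal {E}_{\mathtt {G}}, \prec _{\mathtt {G}}, \lambda _{\mathtt {G}} \rangle \) such that \(\mathcal {E}_{\mathtt {G}}\) is a set of events, \(\prec _{\mathtt {G}} \subseteq \mathcal {E}_{\mathtt {G}} \times \mathcal {E}_{\mathtt {G}}\) is a binary relation whose transitive closure is a strict partial order, and \(\lambda _{\mathtt {G}} : \mathcal {E}_{\mathtt {G}} \rightarrow \mathcal {L}\) is a labeling function. An ldag \(\mathtt {G}\) is a path if \(\prec _{\mathtt {G}}\) is a strict total order.

We write \(\mathbb {G}(\mathcal {L})\) and \(\mathbb {P}(\mathcal {L})\) to respectively denote the sets of all ldags and paths over \(\mathcal {L}\). We use \(\mathtt {G}\) to range over \(\mathbb {G}(\mathcal {L})\) and \(\mathtt {P}\) to range over \(\mathbb {P}(\mathcal {L})\). Moreover, we write \(\le _{\mathtt {P}}\) instead of \(\prec _{\mathtt {P}}\) to make evident that paths are total orders. We say that \(\mathtt {P}\) is a path over \(\mathcal {E}\) if \(\mathcal {E}_{\mathtt {P}} = \mathcal {E}\) and write \(\mathbb {P}(\mathcal {E}, \lambda )\) for the set of paths over \(\mathcal {E}\) with labelling \(\lambda \). We usually omit the subscript \(\mathtt {G}\) (or \(\mathtt {P}\)) when referring to the elements of \(\mathtt {G}\) (of \(\mathtt {P}\), respectively) when no confusion arises. We write \(\epsilon \) for the empty ldag, i.e., such that \(\mathcal {E}_{\epsilon } = \emptyset \).

Definition 2

(Morphism). An ldag morphism \(\mathtt {f}\) from \(\mathtt {G}\) to \(\mathtt {G}'\), written \(\mathtt {f} : \mathtt {G} \rightarrow \mathtt {G}'\), is a mapping \(\mathtt {f} : \mathcal {E}_{\mathtt {G}} \rightarrow \mathcal {E}_{\mathtt {G}'}\) such that \(\lambda _{\mathtt {G}'}(\mathtt {f}(\mathtt {e})) = \lambda _{\mathtt {G}}(\mathtt {e})\) and \(\mathtt {e} \prec _{\mathtt {G}} \mathtt {e}'\) implies \(\mathtt {f}(\mathtt {e}) \prec _{\mathtt {G}'} \mathtt {f}(\mathtt {e}')\).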

Hereafter we implicitly consider ldags up-to isomorphism, i.e., related by a bijective function that preserves and reflects the underlying relation.
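To make the notions of ldag and path concrete, the following Python sketch checks both conditions on small finite examples. The encoding is an assumption made for illustration: events are strings, the relation is a set of pairs, labels are a dictionary, and the helper names (`transitive_closure`, `is_ldag`, `is_path`) are hypothetical. Totality is checked on the transitive closure, in line with the convention of not drawing edges obtained by transitivity.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (e, e2) means that e precedes e2


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Smallest transitive relation containing `edges`."""
    closure = set(edges)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def is_ldag(events: Set[str], edges: Set[Edge], labels: Dict[str, str]) -> bool:
    """The closure must be irreflexive (hence a strict partial order) and every event labelled."""
    closure = transitive_closure(edges)
    endpoints_ok = all(a in events and b in events for (a, b) in edges)
    acyclic = all(a != b for (a, b) in closure)
    return endpoints_ok and acyclic and set(labels) == set(events)


def is_path(events: Set[str], edges: Set[Edge], labels: Dict[str, str]) -> bool:
    """A path is an ldag in which any two distinct events are ordered."""
    if not is_ldag(events, edges, labels):
        return False
    closure = transitive_closure(edges)
    return all((a, b) in closure or (b, a) in closure
               for a in events for b in events if a != b)


events = {"w0", "w1", "r"}
labels = {"w0": "wr(0)", "w1": "wr(1)", "r": "rd"}
fork = {("w0", "r"), ("w1", "r")}    # two concurrent writes seen by a read
chain = {("w0", "w1"), ("w1", "r")}  # all three events totally ordered
print(is_ldag(events, fork, labels), is_path(events, fork, labels))
print(is_path(events, chain, labels))
```

The fork is an ldag but not a path because the two writes are unrelated, whereas the chain is both.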

Example 1

Consider the set of labels describing the operations of a 1-bit register. Each label is a tuple where \({ op}\) denotes an operation and \({ rv}\) its return value. For homogeneity, we associate the return value ok to every write operation. Now, take the ldag over \(\mathcal {L}\) defined as where and \(\mathtt {\lambda }\) is such that , , . A graphical representation of \(\mathtt {G}_{\mathtt {1}}\) is in Fig. 2a. Since we consider ldags up-to isomorphism, we do not depict events and write instead the corresponding labels when no confusion arises. \(\mathtt {G}_{\mathtt {2}}\) is an ldag where is empty. Neither \(\mathtt {G}_{\mathtt {1}}\) nor \(\mathtt {G}_{\mathtt {2}}\) is a path, because they are not total orders. \(\mathtt {P}_{\mathtt {1}}\) in Fig. 2c is an ldag that is also a path. Hereinafter we use undirected arrows when depicting paths and avoid drawing transitions that are obtained by transitivity, as shown in Fig. 2d. All ldags in Fig. 2 belong to \(\mathbb {G}(\mathcal {L})\), but only is in .

Fig. 2. Two simple ldags and two paths.

2.1 ldag Operations

We now present a few operations on ldags, which will be used in the following sections. We start by introducing some notation for binary relations. We write Id for the identity relation over events and for . We write (and similarly ) for the preimage of , i.e., . We use for the restriction of to elements in \(\mathcal {E}\), i.e. . Analogously, is the domain restriction of \(\mathtt {\lambda }\) to the elements in \(\mathcal {E}\). We write for the extension of the set \(\mathcal {E}\) with a fresh element, i.e., such that .

Definition 3

(Restriction and Extension). Let and . We define

  • as the restriction of to ;

  • as the extension of \(\mathtt {G}\) over \(\mathcal {E}'\) with .

Restriction obviously lifts to sets \(\mathcal {X}\) of ldags, i.e., . We omit the subscript \(\mathcal {E}'\) in when \(\mathcal {E}' = \mathcal {E}\).

Example 2

Consider the ldags \(\mathtt {G}_{\mathtt {1}}\) and \(\mathtt {G}_{\mathtt {2}}\) depicted in Fig. 2a and b, respectively. Then, and .

The following operator allows for the combination of several paths and plays a central role in our characterisation of replicated data types.

Definition 4

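(Product). Let \(\mathcal {X} = \{\mathtt {P}_i\}_i\) be a set of paths, with each \(\mathtt {P}_i\) a path over \(\mathcal {E}_i\). The product of \(\mathcal {X}\) is the set \(\bigotimes \mathcal {X}\) of all paths \(\mathtt {Q}\) over \(\bigcup _i \mathcal {E}_i\) whose restriction to each \(\mathcal {E}_i\) is \(\mathtt {P}_i\).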

Intuitively, the product of paths is analogous to the synchronous product of transition systems, in which common elements are identified and the remaining ones can be freely interleaved, as long as the original orders are respected.

Example 3

Consider the paths \(\mathtt {P}_{\mathtt {1}}\) and \(\mathtt {P}_{\mathtt {2}}\) in Fig. 3, and assume that they share the event labelled \(\langle \mathtt {wr(2)}, ok\rangle \). Their product has two paths \(\mathtt {P}_{\mathtt {3}}\) and \(\mathtt {P}_{\mathtt {4}}\), each of them contains the elements of \(\mathtt {P}_{\mathtt {1}}\) and \(\mathtt {P}_{\mathtt {2}}\) and preserves the relative order of the elements in the original paths. We remark that the product is empty when the paths have incompatible orders. For instance, .

Fig. 3. Product between two paths.

It is straightforward to show that \(\otimes \) is associative and commutative. Hence, we freely use \(\otimes \) over sets of sets of paths.
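The intuition behind the product (shared events are identified, the others are freely interleaved, and the original orders are respected) can be sketched by brute force in Python. The encoding of a path as a tuple of event names and the function names `restrict` and `path_product` are assumptions made for this illustration; the enumeration over all permutations is exponential and only meant for tiny examples. The sample paths below are made up and are not the ones of Fig. 3.

```python
from itertools import permutations
from typing import Iterable, Sequence, Set, Tuple

Path = Tuple[str, ...]  # events listed from earliest to latest


def restrict(path: Sequence[str], events: Set[str]) -> Path:
    """Keep only the given events, preserving their relative order."""
    return tuple(e for e in path if e in events)


def path_product(paths: Iterable[Sequence[str]]) -> Set[Path]:
    """All total orders over the union of events that agree with every input path."""
    paths = [tuple(p) for p in paths]
    all_events = sorted(set().union(*map(set, paths)))
    return {q for q in permutations(all_events)
            if all(restrict(q, set(p)) == p for p in paths)}


# Two paths sharing the event "s": the unshared events can be interleaved freely.
print(path_product([("a", "s"), ("b", "s")]))
# Incompatible orders on the shared events: the product is empty.
print(path_product([("a", "b"), ("b", "a")]))
```

Since the result depends only on the set of constraints, the order in which the input paths are listed is irrelevant, which matches the commutativity and associativity remark above.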

3 Specifications

We introduce our notion of specification and apply it to some well-known data types.

Definition 5

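(Specification). A specification \(\mathcal {S}\) is a function \(\mathcal {S} : \mathbb {G}(\mathcal {L}) \rightarrow 2^{\mathbb {P}(\mathcal {L})}\) such that \(\mathcal {S}(\epsilon ) = \{\epsilon \}\) and, for every \(\mathtt {G}\), each path in \(\mathcal {S}(\mathtt {G})\) is a path over \(\mathcal {E}_{\mathtt {G}}\) with the labelling of \(\mathtt {G}\).

A specification \(\mathcal {S}\) maps an ldag (i.e., a visibility relation) to a set of paths (i.e., its admissible arbitrations). Note that every \(\mathtt {P} \in \mathcal {S}(\mathtt {G})\) is a path over \(\mathcal {E}_{\mathtt {G}}\), and hence a total order of the events in \(\mathtt {G}\). However, we do not require \(\mathtt {P}\) to be a topological ordering of \(\mathtt {G}\), i.e., the order of \(\mathtt {P}\) need not contain the visibility relation of \(\mathtt {G}\). Although some specification approaches consider only arbitrations that include visibility [5, 7], our definition accommodates also presentations, such as [2, 4], in which arbitrations may not preserve visibility. We focus later on a few subclasses, such as coherent specifications, in order to establish a precise correspondence with replicated data types. We also remark that it could be the case that \(\mathcal {S}(\mathtt {G}) = \emptyset \), which means that \(\mathcal {S}\) forbids the configuration \(\mathtt {G}\) (more details in Example 4 below). For technical convenience, we impose \(\mathcal {S}(\epsilon ) = \{\epsilon \}\) and thus disallow \(\mathcal {S}(\epsilon ) = \emptyset \): \(\mathcal {S}\) cannot forbid the empty configuration, which denotes the initial state of a data type.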

We now illustrate the specification of some well-known replicated data types.

Example 4

(Counter). The data type Counter provides operations for incrementing and reading an integer register with initial value 0. A read operation returns the number of increments seen by that read. An increment is always successful and returns the value ok. Formally, we consider the set of labels . Then, a Counter is specified by \(\mathcal {S}_{ Ctr}\) defined such that

A visibility graph \(\mathtt {G}\) has admissible arbitrations (i.e., \(\mathcal {S}_{ Ctr}(\mathtt {G}) \ne \emptyset \)) only when each event \(\mathtt {e}\) in \(\mathtt {G}\) labelled by \(\mathtt {rd}\) has a return value k that matches the number of increments anteceding \(\mathtt {e}\) in \(\mathtt {G}\). We illustrate two cases for the definition of \(\mathcal {S}_{ Ctr}\) in Fig. 4. While the configuration in Fig. 4a has admissible arbitrations, the one in Fig. 4b does not, because the unique event labelled by \(\mathtt {rd}\) returns 0 when it is actually preceded by an observed increment. In other words, an execution is not allowed to generate such a visibility graph. We remark that \(\mathcal {S}_{ Ctr}\) does not impose any constraint on the ordering of the events in an admissible path: a path does not need to be a topological ordering of \(\mathtt {G}\) as, for instance, the rightmost path in the set of Fig. 4a.

Fig. 4. Counter specification.
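The behaviour described for the Counter can be sketched as a Python function from a visibility graph to its set of admissible arbitrations. This is an illustration under assumptions: the function name `counter_spec`, the operation name `inc`, the encoding of labels as `(operation, return value)` pairs, and the reading of "increments anteceding a read" as its direct predecessors in the given relation are all choices made here. Following the remark that the Counter does not constrain the ordering, every total order of the events is returned whenever all reads are consistent.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


def counter_spec(events: Set[str],
                 sees: Set[Tuple[str, str]],
                 labels: Dict[str, Tuple[str, object]]) -> Set[Path]:
    """Admissible arbitrations of a Counter configuration.

    sees: pairs (e2, e) meaning that event e2 is visible to event e.
    labels: event -> (operation, return value).
    """
    for e in events:
        op, rv = labels[e]
        if op == "rd":
            # Count the increments that this read sees.
            seen_incs = sum(1 for (a, b) in sees if b == e and labels[a][0] == "inc")
            if rv != seen_incs:
                return set()  # the configuration is forbidden
    # No constraint on arbitration: every total order is admissible.
    return set(permutations(sorted(events)))


events = {"i", "r"}
sees = {("i", "r")}  # the read sees the increment
ok_labels = {"i": ("inc", "ok"), "r": ("rd", 1)}
bad_labels = {"i": ("inc", "ok"), "r": ("rd", 0)}
print(counter_spec(events, sees, ok_labels))   # both orders, including r before i
print(counter_spec(events, sees, bad_labels))  # empty: rd returns 0 but sees one inc
```

Note that the admissible set for the consistent configuration contains an arbitration that places the read before the increment it sees, i.e., one that is not a topological ordering of the visibility graph.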

Example 5

(Last-write-wins Register). A Register stores a value that can be read and updated. We assume that the initial value of a register is undefined. We take \(\mathcal {L} = \{\langle \mathtt {wr}(k),{ok}\rangle \ |\ k\in \mathbb {N}\}\cup (\{\mathtt {rd}\}\times (\mathbb {N}\cup \{\bot \}))\) as the set of labels. The specification \(\mathcal {S}_{ lwwR}\) gives the semantics of a register that adopts the last-write-wins strategy.

An ldag \(\mathtt {G}\) has admissible arbitrations only when each event associated with a read operation returns a previously written value. As per the first condition above, a read operation returns the undefined value \(\bot \) when it does not see any write. By the second condition, a read \(\mathtt {e}\) returns a natural number k when it sees an operation \(\mathtt {e}'\) that writes that value k. In such a case, any admissible arbitration \(\mathtt {P}\) must order \(\mathtt {e}'\) as the greatest (according to the order of \(\mathtt {P}\)) of all write operations seen by \(\mathtt {e}\).

Example 6

(Generic Register). We now define a Generic Register that does not commit to a particular strategy for resolving conflicts. We specify this type as follows

As in Example 5, the return value of a read corresponds to a written value seen by that read, but the specification does not determine which value should be chosen. We require instead that all read operations with the same causes (i.e., that see the same events) have the same result. Since this condition is satisfied by any admissible configuration \(\mathtt {G}\), it ensures convergence. The fact that convergence is explicitly required contrasts with approaches like [2, 4], where on the contrary convergence is ensured automatically by considering only deterministic specifications. We remark that for the deterministic cases, e.g., Examples 4 and 5, we do not need to explicitly require convergence.

3.1 Refinement

Refinement is a standard approach in data type specification, which allows for a hierarchical organisation that goes from abstract descriptions to concrete implementations. The main benefit of refinement lies in the fact that applications can be developed and reasoned about in terms of abstract types, which hide implementation details and leave some freedom for the implementation. Consider the specification \(\mathcal {S}_{ gR}\) of the Generic Register introduced in Example 6, which only requires a policy for conflict resolution that ensures convergence. On the contrary, the specification \(\mathcal {S}_{ lwwR}\) in Example 5 explicitly states that concurrent updates must be resolved by adopting the last-write-wins policy. Since the latter policy ensures convergence, we would like to think about \(\mathcal {S}_{ lwwR}\) as a refinement of \(\mathcal {S}_{ gR}\). We characterise refinement in our setting as follows.

Definition 6

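(Refinement). Let \(\mathcal {S}_1, \mathcal {S}_2\) be specifications. We say that \(\mathcal {S}_1\) refines \(\mathcal {S}_2\) and we write \(\mathcal {S}_1 \sqsubseteq \mathcal {S}_2\) if \(\mathcal {S}_1(\mathtt {G}) \subseteq \mathcal {S}_2(\mathtt {G})\) for every ldag \(\mathtt {G}\).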

Example 7

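It can be easily checked that \(\mathtt {P} \in \mathcal {S}_{ lwwR}(\mathtt {G})\) implies \(\mathtt {P} \in \mathcal {S}_{ gR}(\mathtt {G})\) for any \(\mathtt {G}\). Consequently, \(\mathcal {S}_{ lwwR}\) is a refinement of \(\mathcal {S}_{ gR}\).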

Example 8

Consider the data type Set, which provides (among others) the operations \(\mathtt {add}\), \(\mathtt {rem}\) and \(\mathtt {lookup}\) for respectively adding, removing and examining the elements within the set. Different alternatives have been proposed in the literature for resolving conflicts in the presence of concurrent additions and removals of elements (see [13] for a detailed discussion). We illustrate two possible alternatives by considering the execution scenario depicted in Fig. 5. A reasonable semantics for \(\mathtt {lookup}\) over \(\mathtt {G}\) and \(\mathtt {P}\) would fix the result \(\mathtt{V}\) as either \(\emptyset \) or \(\mathtt{\{1\}}\). In fact, under the last-write-wins policy, the specification prescribes that \(\mathtt {lookup}\) returns \(\mathtt{\{1\}}\) in this scenario. By contrast, the 2P-Sets strategy establishes that the result is \(\emptyset \).

The following definition provides a specification for an abstract data type Set that allows (among others) any of the above policies.

where

The set contains the elements added to (and possibly removed from) the set seen by \(\mathtt {e}\) while contains those elements for which \(\mathtt {e}\) sees no removal. Thus, the condition states that \(\mathtt {lookup}\) returns a set that contains at least all the elements added but not removed (i.e., in ). However, the return value V may contain elements that have been added and removed (the choice is left unspecified). Condition \(\mathtt{Conv}\) ensures convergence, similarly to the specification of \(\mathcal {S}_{ gR}\) in Example 6.

Then, a concrete resolution policy such as 2P-Sets can be specified as follows

Clearly, is a refinement of \(\mathcal {S}_{ Sets}\). Other policies can be specified analogously.

Fig. 5. A scenario for the replicated data type Set.

3.2 Classes of Specifications

We now discuss two properties of specifications. Firstly, we look at specifications for which the behaviour of larger computations matches that of their shorter prefixes.

Definition 7

(Past-Coherent Specification). Let \(\mathcal {S}\) be a specification. We say that \(\mathcal {S}\) is past-coherent (briefly, coherent) if

Note that coherence implies that . Intuitively, sub-paths are obtained from the interleaving of the paths belonging to the associated sub-specifications.

Fig. 6. A non-coherent specification.

Example 9

The specifications in Examples 4, 5 and 6 are all coherent, because their definitions are in terms of restrictions of the ldags. Now consider the specification \(\mathcal {S}\) defined such that the equalities in Fig. 6 hold. \(\mathcal {S}\) is not coherent because the arbitrations for the ldag in Fig. 6b should contain all the interleavings for the paths associated with its sub-configurations, as depicted in Fig. 6a. Instead, note that the arbitration of \(\langle \mathtt {o_2},v_2\rangle \) before \(\langle \mathtt {o_1},v_1\rangle \) in the leftmost path on Fig. 6c would not hinder coherence by itself, even if it is not allowed by the sub-configuration in Fig. 6b.

A second class of specifications is concerned with saturation. Intuitively, a saturated specification allows every top element on the visibility to be arbitrated in any position. We first introduce the notion of saturation for a path.

Definition 8

(Path Saturation). Let \(\mathtt {P}\) be a path and a label. We write for the set of paths obtained by saturating \(\mathtt {P}\) with respect to , defined as follows

A path \(\mathtt {P}\) saturated with a label generates the set of all paths obtained by placing a new event carrying that label in any position within \(\mathtt {P}\). A saturated specification thus extends a computation by adding a new operation that can be arbitrated in any position.
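Saturating a path amounts to inserting one new event at every possible position. The sketch below shows this on a path encoded as a tuple of event names; the function name `saturate` and the encoding are assumptions made for illustration, and labels are left implicit (the new event is taken to carry the given label).

```python
from typing import List, Tuple

Path = Tuple[str, ...]  # events listed from earliest to latest


def saturate(path: Path, new_event: str) -> List[Path]:
    """All paths obtained by placing `new_event` at any position of `path`."""
    return [path[:i] + (new_event,) + path[i:] for i in range(len(path) + 1)]


# A path with two events gives rise to three saturated paths.
for p in saturate(("a", "b"), "x"):
    print(p)
```

A path with n events always yields n + 1 saturated paths, including the one in which the new event is arbitrated before everything else.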

Definition 9

(Saturated Specification). Let \(\mathcal {S}\) be a specification. We say that \(\mathcal {S}\) is saturated if

Example 10

The specifications in Examples 4, 5 and 6 are all saturated because a new event \(\mathtt {e}\) can be arbitrated in any position. In fact, the specifications in Examples 4 and 6 do not use any information about arbitration, while the specification in Example 5 constrains arbitrations only for events that are not maximal. Figure 7 shows a specification that is not saturated because it does not allow the top event (the one labelled \(\langle \mathtt {rd},{1}\rangle \)) to be arbitrated as the first operation in the path. In a saturated specification, the equality in Fig. 4a should hold. We remark that the specification is coherent although it is not saturated.

Fig. 7. A non-saturated specification.

4 Replicated Data Type

In this section we show that our proposal can be considered as (and it is actually more general than) a model for the operational description of rdts as given in [2, 4]. We start by recasting the original definition of rdt (as given in [2, Definition 4.5]) in terms of ldags. As hinted in the introduction, the meaning of each operation of an rdt is specified in terms of a context, written \(\mathtt {C}\), which is a pair \(\langle \mathtt {G}, \mathtt {P} \rangle \) such that . We write for the set of contexts over \(\mathcal {L}\), and fix a set \(\mathcal {O}\) of operations and a set \(\mathcal {V}\) of values. Then, the operational description of rdts in [2, 4] can be formulated as follows.

Definition 10

(Replicated Data Type). A Replicated Data Type (rdt) is a function .

In words, for any visibility graph \(\mathtt {G}\) and arbitration \(\mathtt {P}\), the specification \(\mathcal {F}\) indicates the result of executing the operation \(\mathtt {op}\) over \(\mathtt {G}\) and \(\mathtt {P}\), which is .

Example 11

The data type Counter introduced in Example 4 is formally specified in [2, 4] as follows

Given a context in , we may check whether the value associated with each operation matches the definition of a particular rdt. This notion is known as return value consistency [2, Definition 4.8]. In order to relate contexts with and without return values, we use the following notation: given , by we denote the ldag obtained by projecting the labels of \(\mathtt {G}\) in the obvious way.

Definition 11

(Return Value Consistent). Let \(\mathcal {F}\) be an rdt and a context. We say that \(\mathcal {F}\) is Return Value Consistent (rval) over \(\mathtt {G}\) and \(\mathtt {P}\) and we write if . Moreover, we define

Example 12

Consider the rdt \(\mathcal {F}_{ ctr}\) introduced in Example 11. The context in Fig. 8a is rval while the one in Fig. 8b is not, because \(\mathcal {F}_{ ctr}\) requires \(\mathtt {rd}\) to return the number of operations seen by that read, which in this case should be 2.

Fig. 8. rval consistency for \(\mathcal {F}_{ ctr}\).

The following result states that return value consistent paths are all coherent, in the sense that they match the behaviour allowed for any shorter configuration.

Lemma 1

Let \(\mathcal {F}\) be an rdt and \(\mathtt {G}\) an ldag. Then

As for coherent specifications, the property also holds for return value consistent paths.

4.1 Deterministic Specifications

We now focus on the relation between our notion of specification, as introduced in Definition 5, and the operational description of rdts, as introduced in [2, 4] and formalised in Definition 10 in terms of ldags. Specifically, we characterise a proper subclass of specifications that precisely correspond to rdts.

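For this section we restrict our attention to specifications over the set of labels \(\mathcal {L} = \mathcal {O} \times \mathcal {V}\), i.e., labels that pair an operation with its return value.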

Definition 12

(Total Specification). Let \(\mathcal {S}\) be a specification. We say that \(\mathcal {S}\) is total if

Intuitively, a specification is total when every projection over \(\mathcal {O}\) of a context in , as represented by , can be extended with the execution of any operation of the data type. This is formalised by stating that for any operation and any admissible arbitration (sequence of operations) of a configuration (once more, labelled only with operations), then can be extended into an admissible arbitration of the configuration , where is just one of the possible configurations (the one labelled with the correct return values) whose projection corresponds to .

We remark that a total specification does not prevent the definition of an operation that admits more than one return value in certain configurations, i.e., in Definition 12 does not need to be unique. For instance, consider the Generic Register in Example 6, in which operation \(\mathtt {rd}\) may return any of the causally-independent, previously written values. Albeit being total, the specification for \(\mathtt {rd}\) is not deterministic. On the contrary, a specification is deterministic if an operation executed over a configuration admits at most one return value, as formally stated below.

Definition 13

(Deterministic Specification). Let \(\mathcal {S}\) be a specification. We say that \(\mathcal {S}\) is deterministic if

A weaker notion for determinism could allow the result for an added operation to depend also on the given admissible path. We say that a specification \(\mathcal {S}\) is value-deterministic if

Finally, we say that a specification is functional if it is both deterministic and total.

Example 13

Figure 9 shows a value-deterministic specification. Although a read operation that follows an increment may return two different values, such difference is explained by the previous computation: in one case the increment succeeds while in the other fails. The specification is however not deterministic because it admits a sequence of operations to be decorated with different return values.

Fig. 9. A value-deterministic and coherent specification.

Example 14

It is straightforward to check that the specifications in Examples 4 and 5 are deterministic. On the contrary, the specification of the Generic Register in Example 6 is not even value-deterministic. It suffices to consider a configuration in which a read operation sees two different written values. Similarly, Set in Example 8 is not deterministic.

The lemma below states a simple criterion for determinism.

Lemma 2

Let \(\mathcal {S}\) be a coherent and deterministic specification. Then

So, if two configurations are annotated with the same operations yet with different values, then their admissible paths are already all different if we disregard return values.

4.2 Correspondence Between rdts and Specifications

This section establishes the connection between rdts and specifications. We first introduce a mapping from rdts to specifications.

Definition 14

Let \(\mathcal {F}\) be an rdt. We write for the specification associated with \(\mathcal {F}\), defined as follows

The next result shows that rdts correspond to specifications that are coherent, functional and saturated.

Lemma 3

For every rdt \(\mathcal {F}\), is coherent, functional, and saturated.

The inverse mapping from specifications to rdts is defined below.

Definition 15

Let \(\mathcal {S}\) be a specification. We write for the rdt associated with \(\mathcal {S}\), defined as follows

Note that may not be well-defined for some \(\mathcal {S}\), e.g. when \(\mathcal {S}\) is not deterministic. The following lemma states the conditions under which is well-defined.

Lemma 4

For every coherent and functional specification \(\mathcal {S}\), is well-defined.

The following two results show that rdts are a particular class of specifications, and hence, provide a fully abstract characterisation of operational rdts.

Theorem 1

For every coherent, functional, and saturated specification \(\mathcal {S}\), .

Theorem 2

For every rdt \(\mathcal {F}\), .

The above characterisation implies that there are data types that cannot be specified as operational rdts. Consider e.g. Generic Register and Set, as introduced respectively in Examples 6 and 8. As noted in Example 14, they are not deterministic. Hence, they cannot be translated as rdts. We remark that a non-deterministic specification does not imply a non-deterministic conflict resolution, but it allows for underspecification.

5 A Categorical Account of Specifications

In the previous sections we provided a functional characterisation of rdts. We now proceed to a denotational account of our formalism by providing a categorical foundation that is amenable to building a family of operators on specifications.

5.1 Composing ldags

We start by considering a sub-class of morphisms between ldags, which account for the evolution of visibility relation by reflecting the information about observed events.

Definition 16

(Past-Reflecting Morphism). Let \(\mathtt {G}_{\mathtt{1}}\) and \(\mathtt {G}_{\mathtt{2}}\) be ldags and an ldag morphism. We say that \(\mathtt {f}\) is past-reflecting if

We can concisely write and spell out the definition as

It is noteworthy that this requirement boils down to (the dual of) what are called tp-morphisms in the literature on algebraic specification theory, which are an instance of open maps [9]. As we will see, this property is going to be fundamental in obtaining a categorical characterisation of coherent specifications.

Now, let \(\mathcal {G}(\mathcal {L})\) be the category whose objects are ldags and arrows are past-reflecting morphisms, and \(\mathcal {I}(\mathcal {L})\) the sub-category whose arrows are injective morphisms.

Proposition 1

(ldag Pullbacks/Pushouts). The category \(\mathcal {G}(\mathcal {L})\) of ldags and past-reflecting morphisms has (strict) initial object, pullbacks and pushouts along monos.

Note that pushout squares along monos are also pullback ones. As is often the case, the property concerning pushouts does not hold in \(\mathcal {I}(\mathcal {L})\), even if a weak form does, since monos are stable under pushouts in \(\mathcal {G}(\mathcal {L})\). For the time being, we just remark that these properties guarantee a degree of modularity for our formalism.

We need a last definition before giving a categorical presentation.

Definition 17

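(Downward closure). Let \(\mathtt {G} = \langle \mathcal {E}, \prec , \lambda \rangle \) be an ldag and \(\mathcal {E}' \subseteq \mathcal {E}\). We say that \(\mathcal {E}'\) is downward closed if, for all \(\mathtt {e} \in \mathcal {E}\) and \(\mathtt {e}' \in \mathcal {E}'\), \(\mathtt {e} \prec \mathtt {e}'\) implies \(\mathtt {e} \in \mathcal {E}'\).

It is easy to show that for any past-reflecting morphism \(\mathtt {f} : \mathtt {G}_{\mathtt {1}} \rightarrow \mathtt {G}_{\mathtt {2}}\) the image of \(\mathcal {E}_1\) along \(\mathtt {f}\) is downward closed. Should \(\mathtt {f}\) be injective, we strengthen the relationship.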
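Downward closure is easy to check on finite examples. The following Python sketch assumes the usual reading of the notion (every predecessor of a member of the subset is itself a member); the function name `is_downward_closed` and the encoding of the relation as a set of pairs are hypothetical choices made for illustration.

```python
from typing import Set, Tuple


def is_downward_closed(subset: Set[str], edges: Set[Tuple[str, str]]) -> bool:
    """True if every direct predecessor of an event in `subset` is also in `subset`.

    edges: pairs (e, e2) meaning that e precedes e2.
    """
    return all(a in subset for (a, b) in edges if b in subset)


edges = {("a", "b"), ("b", "c")}
print(is_downward_closed({"a", "b"}, edges))  # contains the whole past of b
print(is_downward_closed({"b", "c"}, edges))  # misses a, which precedes b
```

Checking direct predecessors suffices: if a subset contains the direct predecessors of all its members, it also contains their predecessors in the transitive closure.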

Lemma 5

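An injective morphism \(\mathtt {f} : \mathtt {G}_{\mathtt {1}} \rightarrow \mathtt {G}_{\mathtt {2}}\) is past-reflecting if and only if

  1. \(\mathtt {f}(\mathtt {e}) \prec _2 \mathtt {f}(\mathtt {e}')\) implies \(\mathtt {e} \prec _1 \mathtt {e}'\);

  2. the image of \(\mathcal {E}_1\) along \(\mathtt {f}\) is downward closed.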

This result tells us that past-reflecting injective morphisms are uniquely characterised as such by the properties of the image of \(\mathcal {E}_1\) with respect to \(\mathtt {G}_{\mathtt {2}}\).

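Now, while the initial object of both \(\mathcal {G}(\mathcal {L})\) and \(\mathcal {I}(\mathcal {L})\) is the empty graph \(\epsilon \), the pullback in the latter has an easy characterisation, thanks to the previous lemma. Indeed, let \(\mathtt {f}_i : \mathtt {G}_{\mathtt {i}} \rightarrow \mathtt {G}\) (for \(i = 1, 2\)) be past-reflecting injective morphisms, assuming the functions on elements to be identities for the sake of simplicity, and let \(\mathcal {E} = \mathcal {E}_1 \cap \mathcal {E}_2\). Then, the restrictions of \(\mathtt {G}_{\mathtt {1}}\) and \(\mathtt {G}_{\mathtt {2}}\) to \(\mathcal {E}\) coincide and they correspond (with the obvious morphisms) to the pullback of \(\mathtt {f}_1\) and \(\mathtt {f}_2\).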

5.2 The Model Category

We now move to define the model category.

Definition 18

(Morphism Saturation). Let and be sets of paths and an injective function such that . The saturation function is defined as follows

That is, each \(\mathtt {Q}\) is the image of \(\mathtt {P}\) via a morphism with underlying function \(\mathtt {f}\). We can exploit saturation in order to get a simple definition of our model category.

Definition 19

(Path-Set Morphism). Let and be sets of paths. A path-set morphism is an injective function such that and

The property can be stated as

thus each path in \(\mathtt {P}_2\) is related to a (unique) path in \(\mathtt {P}_1\) via a morphism induced by \(\mathtt {f}\). Let \(\mathcal {P}(\mathcal {L})\) be the category whose objects are sets of paths over the same elements and labelling (i.e., subsets of for some \(\mathcal {E}\) and \(\mathtt {\lambda }\)), and arrows are path-set morphisms.

Proposition 2

(Path Pullbacks/Pushouts). The category \(\mathcal {P}(\mathcal {L})\) of sets of paths and path-set morphisms has a (strict) initial object and pullbacks.

As with \(\mathcal {I}(\mathcal {L})\), the category \(\mathcal {P}(\mathcal {L})\) also admits a weak form of pushouts along monos.

Remark 1

The initial object is the set in including only the empty path \(\epsilon \). As for pullbacks, let be path-set morphisms, assuming the functions on elements to be identities for the sake of simplicity, and let . Then, the pullback is the set in with . As for pushouts, let be injective path-set morphisms, assuming the functions on elements to be identities for the sake of simplicity, and . Then, the “weak” pushout is the set in with \(\mathtt {\lambda }\) the extension of and .

5.3 A Categorical Correspondence

It is now time to move towards our categorical characterisation of specifications.

First, let us restrict our attention to functors \(F: \mathcal {I}(\mathcal {L}) \rightarrow \mathcal {P}(\mathcal {L})\) that preserve the underlying set of objects, i.e., such that the underlying function on objects \(Ob_F\) maps an ldag \(\mathtt {G}\) into a subset of (and preserves the underlying function on path-set morphisms). We also say that F is coherent if for all ldags \(\mathtt {G}\). Thus, any such functor F that preserves the initial object (i.e., \(F(\epsilon ) = \{\epsilon \}\)) gives rise to a specification: it suffices to consider the object function \(Ob_F: Ob_{\mathcal {I}(\mathcal {L})} \rightarrow Ob_{\mathcal {P}(\mathcal {L})}\).

Proposition 3

Let \(F: \mathcal {I}(\mathcal {L}) \rightarrow \mathcal {P}(\mathcal {L})\) be a (coherent) functor preserving the initial object. Then \(Ob_F\) is a (coherent) specification.
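To make the shape of such an object function concrete, the sketch below gives a toy example. It maps a visibility graph to the set of total orders of its events that respect visibility, and checks that the empty graph is sent to the set containing only the empty path. The encoding (events as strings, paths as tuples, labels omitted) and the name `topological_paths` are assumptions made for the example. Only preservation of the initial object is checked, not coherence; indeed, the conclusions observe that specifications of this topological kind are not saturated.

```python
from itertools import permutations


def topological_paths(events, edges):
    """Return all total orders of `events` that place u before v for each edge (u, v)."""
    events = tuple(events)
    paths = set()
    for perm in permutations(events):
        position = {e: i for i, e in enumerate(perm)}
        if all(position[u] < position[v] for (u, v) in edges):
            paths.add(perm)
    return paths


# The empty graph is mapped to the set containing only the empty path.
initial = topological_paths((), ())

# Two concurrent writes (hypothetical event names): both arbitrations are admissible.
two_concurrent = topological_paths(("w0", "w1"), ())

# A read that sees both writes must come after them in every admissible path.
with_read = topological_paths(("w0", "w1", "r"), [("w0", "r"), ("w1", "r")])
```

Enumerating permutations is exponential and serves only to keep the example short; it is not meant as an efficient procedure.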

For the converse direction we need an additional lemma.

Lemma 6

Let \(\mathcal {S}\) be a coherent specification and downward closed. Then .

The lemma above immediately implies the following result.

Proposition 4

A coherent specification \(\mathcal {S}\) induces a coherent functor preserving the initial object such that .

By using Propositions 3 and 4 we can state the main result of this section.

Theorem 3

There is a bijection between coherent specifications and coherent functors \(\mathcal {I}(\mathcal {L}) \rightarrow \mathcal {P}(\mathcal {L})\) preserving the initial object.

6 Conclusions and Future Works

Our contribution proposes a denotational view of replicated data types. While most traditional approaches are operational in flavour [4, 7, 8], we strove for a specification formalism that could exploit the classical tools of algebraic specification theory. More precisely, we associate with each configuration (i.e., visibility) a set of admissible arbitrations. Unlike those previous approaches, our presentation naturally accommodates non-deterministic specifications and enables abstract definitions that allow for different conflict-resolution strategies. Our formulation brings to light some properties held by mainstream specification formalisms: besides the obvious property of functionality, they also satisfy coherence and saturation. A coherent specification can neither prescribe an arbitration order between events that are unrelated by visibility nor allow for additional arbitrations over past events when a configuration is extended (i.e., a new top element is added to visibility). A saturated specification, instead, cannot impose any constraint on the arbitration of top elements. Note that saturation does not hold when admissible arbitrations are also required to be topological orderings of visibility. Hence, the approaches in [2, 4] generate specifications that are not saturated. We remark that this relation between visibility and arbitration translates into a quite different property in our setting, which suggests that consistency models defined as relations between visibility and arbitration (e.g., monotonic and causal consistency) could have alternative characterisations. We plan to explore these connections in future work.

Another question concerns coherence, which prevents a specification from choosing an arbitration order on events that are unrelated by visibility and forbids, e.g., the definition of strategies that arbitrate first the events coming from a particular replica. Consequently, it becomes natural to look for those rdts and consistency models that are the counterpart of non-coherent specifications, while still preserving some suitable notion of causality between events. We believe that the weaker property (that is, no additional arbitration over past events when a configuration is extended) is a worthwhile alternative, accommodating many examples that impose fewer restrictions on the set of admissible paths (and hence may allow more freedom in the arbitration).

These issues might be further clarified by our categorical presentation. Our proposal is inspired by current work on the semantics of nominal calculi [11], and it shares similarities with [10], since our category \(\mathcal {G}\) is the sub-category of their \(\mathbf {FinSet^{\rightrightarrows }}\) with past-reflecting morphisms. The results in Sect. 5 focus on a functorial characterisation of specifications. We chose an easy way out for establishing the bijection between functors and specifications by restricting the possible object functions and by defining coherence “on the nose” (i.e., by considering functors F such that and ), since requiring the specification to be coherent is needed in order to obtain the functor in Proposition 4. A proper characterisation should depend on the properties of F over the arrows of \(\mathcal {G}\) (such as pullback/pushout preservation), instead of the properties of the objects in its image on \(\mathcal {P}\).

The same categorical presentation may shed light on suitable operators on specifications. Indeed, this is the usual situation when providing a functorial semantics for a language (see e.g. [1], and the references therein, among many others), and intuitively we already have a freshness operator , along the lines of edge allocation in [10]. We plan to extend these remarks into a full-fledged algebra for specifications.