Confluence in Probabilistic Rewriting

Driven by the interest of reasoning about probabilistic programming languages, we set out to study a notion of unicity of normal forms for them. To provide a tractable proof method for it, we define a property of distribution confluence which is shown to imply the desired unicity (even for infinite sequences of reduction) and further properties. We then carry over several criteria from the classical case, such as Newman's lemma, to simplify proving confluence in concrete languages. Using these criteria, we obtain simple proofs of confluence for $\lambda_1$, an affine probabilistic lambda-calculus, and for Q$^*$, a quantum programming language for which a related property has already been proven in the literature.


Introduction
In the formal study of programming languages, modelling execution via a small-step operational semantics is a popular choice. Such a semantics is given by an abstract rewriting system (ARS) which, mathematically, is no more than a binary relation on abstract terms specifying whether one term can rewrite to another. This relation is not required to be a function, and can thus allow for a program to rewrite in two different ways. In such a case, it is important that the different execution paths for a given program reach the same final value (if any); thus guaranteeing that any two determinizations of the semantics (e.g. execution machines) assign the same meaning to every program.
This correctness property is expressible at the level of relations, and is known as unicity of normal forms (UN): any two irreducible terms reachable from a common starting point must be equal. For non-trivial languages, such as the λ-calculus, it can be hard to prove this property directly. Fortunately, the property of confluence can serve as a proof method for it since UN is a trivial corollary of it while its proof tends to be more tractable. This was the approach followed by Church and Rosser in [6,Corollary 2], where UN and confluence were first proven for the λ-calculus in 1936. Nowadays, confluence is widely used to show the adequacy of operational semantics in many kinds of programming languages.
In the past decades there has been a growing interest in programming languages with probabilistic behaviours (e.g. [1,2,4,9,10,13,21,22]), which cannot be modelled as a mere relation between terms. Example features include a probabilistic choice operator [10] and quantum measurement [9]. In these settings, the same need of showing the correctness of the semantics is present.
At first glance, it may seem as if neither UN nor confluence could ever hold in these cases, since the possible results of, say, rolling a die or measuring a qubit are irreconcilably distinct. The key observation is that, in a probabilistic language, a notion of equivalence of programs should be about distributions of values, and not about punctual values [16]. Indeed, a single program might evaluate to different values if it is run twice, but equality should certainly be reflexive. Thus, the expectation should be that the different reduction paths do not impact the final distribution of results. This is precisely the property we set out to study in this paper, developing an associated notion of confluence for it.
For a concrete example, take a hypothetical language for representing dice rolls, where ⚀ represents an unrolled die which can reduce to any element in {1, …, 6} with equal probability. For a pair of dice, represented by (⚀, ⚀), we should be allowed to choose which die to roll first. Rolling the first die can result in any term in the set {(i, ⚀)} for i = 1, …, 6 with equal probability; and similarly for the second. When continuing the rolls, both branches end up in the same uniform distribution. However, this need not always be so: consider the term (λx.(x, x)) ⚀. If the die is rolled before the abstraction is applied, only pairs with equal components can be obtained, which is not the case if we apply the abstraction first, obtaining (⚀, ⚀), which can reduce to any pair of results. We shall later come back to a similar language and provide conditions to avoid this divergence.
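The divergence can be verified by exhaustive enumeration. Below is a small sketch (assuming a fair six-sided die, and with encodings of our own choosing) comparing the two evaluation orders of (λx.(x, x)) applied to an unrolled die:

```python
from fractions import Fraction
from collections import Counter

DIE = range(1, 7)          # faces of the die, each with probability 1/6
P = Fraction(1, 6)

# Roll first, then apply (λx.(x, x)): the rolled value is duplicated.
roll_then_apply = Counter()
for i in DIE:
    roll_then_apply[(i, i)] += P

# Apply the abstraction first: this yields a pair of two independent dice.
apply_then_roll = Counter()
for i in DIE:
    for j in DIE:
        apply_then_roll[(i, j)] += P * P

# Both are probability distributions, but they are different distributions.
assert sum(roll_then_apply.values()) == 1
assert sum(apply_then_roll.values()) == 1
assert roll_then_apply != apply_then_roll
assert roll_then_apply[(3, 3)] == Fraction(1, 6)
assert apply_then_roll[(3, 3)] == Fraction(1, 36)
```

The first strategy only ever produces the six equal pairs, while the second produces all thirty-six pairs, which is exactly the failure of confluence described above.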
In Section 2 we introduce the problem of unicity of distributions for probabilistic rewriting. In Section 3 we define a rewriting system over distributions giving rise to our notion of distribution confluence and prove its adequacy. In Section 4 we give some criteria for proving distribution confluence, simplifying the burden of proof for concrete languages. In Section 5 we extend our unicity of distributions result to terms which are only asymptotically terminating. In Section 6 we prove confluence for two concrete programming languages: a simple, affine, probabilistic calculus, dubbed λ₁, and the quantum lambda calculus Q* [9]. Finally, in Section 7 we conclude, give some insights on future directions, and analyse some related work.

Preliminaries
We assume familiarity with the study of abstract rewriting. We adopt the terminology from [6] and call a sequence of expansions followed by a sequence of reductions a peak [15,20]. We denote by L(X) the type of finite lists with elements in X, where '[ ]', ':' and '++' denote the empty list, the list constructor, and list concatenation, respectively. We also use the notation [a, b, c] for a : b : c : [ ].
We define D(A) = L(ℝ⁺ × A), used as an explicit representation of (finitely-supported) distributions. There is no further restriction on D(A). In particular, any given element might appear more than once, as in [(1/3, a), (1/2, b), (1/6, a)]. On a node (p, a) of the distribution, p is called the weight and a the element. The weight of a distribution is defined to be the sum of all weights of its nodes, and can be any positive real number. When we require a normalised distribution, we use the type D₁(A), defined as the set of those d ∈ D(A) with unit weight. We abbreviate the distribution [(p₁, a₁), (p₂, a₂), …, (pₙ, aₙ)] by [(pᵢ, aᵢ)]ᵢ, where n should be clear from the context. We also write αD for the distribution obtained by scaling every weight in D by α; that is, α[(pᵢ, aᵢ)]ᵢ = [(αpᵢ, aᵢ)]ᵢ.

We often reason about equivalence of distributions. For that purpose, we define a relation '∼' as the congruence closure of the rules in Fig. 1 (i.e. the smallest relation satisfying the rules and such that D ∼ D′ implies E₁ ++ D ++ E₂ ∼ E₁ ++ D′ ++ E₂). We distinguish some subsets of '∼' by limiting the rules that may be used. We note by S the congruence closure of Split, by (FJ) the congruence closure of both Flip and Join, and similarly for other subsets.
We call two distributions equivalent if they are related by the reflexive-transitive closure of ∼, noted ≈. Two distributions are equivalent, then, precisely when they assign the same total weight to every a ∈ A, regardless of order and duplication.
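The characterisation above (two list distributions are equivalent precisely when they assign the same total weight to every element) is straightforward to implement. A minimal sketch using exact rationals; the function names are our own:

```python
from fractions import Fraction
from collections import defaultdict

def scale(alpha, d):
    """alpha * D: scale every weight of the list distribution."""
    return [(alpha * p, a) for (p, a) in d]

def weight(d):
    """Total weight: the sum of all node weights."""
    return sum(p for (p, _) in d)

def equivalent(d, e):
    """D ≈ E iff both assign the same total weight to every element."""
    totals = defaultdict(Fraction)
    for p, a in d:
        totals[a] += p
    for p, a in e:
        totals[a] -= p
    return all(t == 0 for t in totals.values())

third, half, sixth = (Fraction(1, n) for n in (3, 2, 6))
D = [(third, 'a'), (half, 'b'), (sixth, 'a')]   # element 'a' appears twice
assert weight(D) == 1                            # D is normalised
assert equivalent(D, [(half, 'a'), (half, 'b')])
assert not equivalent(D, [(1, 'a')])
assert weight(scale(half, D)) == half
```

Note that the check ignores order and duplication of nodes, exactly as ≈ does.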
Arguably, using such a definition of distributions in terms of lists is cumbersome and less clear than using a more semantic definition. However, we feel this is outweighed by the degree of rigour attained in later proofs (especially as we found some of them to be quite error-prone). As a secondary benefit, most of our development should be straightforwardly mechanizable.
A downside of the choice of lists is that we can only represent finitely-supported distributions. This restriction is present in other works as well (e.g. [11,23]) and it seems to not be severe for modelling programming languages.

Probabilistic Abstract Rewriting Systems (PARS)
To model probabilistic rewriting, we need to move away from the simple relation between terms used in ARSes. We shall instead relate elements to distributions of elements, which introduces probability. In favour of expressivity, we allow a single element to be related to multiple distributions (or none).

Definition 2.1 A probabilistic abstract rewriting system (PARS) is a pair (A, →) where A is a set (called the "carrier") and → a relation of type P(A × D₁(A)) (called the "pointwise evolution relation").
It should be clear that every ARS is also a PARS by taking Dirac distributions (i.e. normalised distributions with a single element). We can provide a simple example of a PARS by extensionally listing →, as is commonly done for ARSes.

Example 2.2 Let A be the PARS given by
Here, a is the only non-deterministic element. We call d a terminal element since it has no successor distributions.
As more significant examples, in Section 6 we describe two probabilistic λ-calculi with operational semantics modellable by a PARS.
Execution in a PARS is a mixture of non-deterministic and probabilistic choices. The first kind, corresponding to the P operator, occurs when the machine chooses a successor distribution for the current element. The second kind, corresponding to the D₁ operator, is a random choice between the elements of the chosen successor distribution. To model such execution, we introduce the notion of computation tree.

Definition 2.3
Given a PARS (A, →), we define the set of its (finite) "computation trees" with root a (noted T(a)) inductively by the following rules: first, a ∈ T(a); second, if a → [(p₁, a₁), …, (pₙ, aₙ)] and tᵢ ∈ T(aᵢ) for each i, then [a; (p₁, t₁); …; (pₙ, tₙ)] ∈ T(a).

A graphical representation of an example tree for the PARS in Example 2.2 is given on the right. We also sometimes consider infinite computation trees, by taking the coinductively defined set instead.
So, any tree T ∈ T(a) represents one possible (uncertain) evolution of the system after starting on a. There is no further assumption about trees: in particular, if an element a is expanded many times in a tree, different successor distributions may be used at each node.
When all the leaves of a tree are terminal elements, we call the tree maximal (as there is no proper supertree of it). A computation tree naturally assigns a probability to each of its leaves, taken as the product of the pᵢ in the path from the root to it. It is clear that collecting the leaves of a tree (along with their assigned probabilities) gives rise to a normalised list distribution. We call such a distribution the support of T and note it supp(T).
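Computing the support of a finite computation tree amounts to multiplying the probabilities along each root-to-leaf path. A sketch, encoding a tree (a convention of our own) as either a bare element for a leaf, or a list [a, (p1, t1), ..., (pn, tn)] for an internal node, as in Definition 2.3:

```python
from fractions import Fraction

def supp(tree, weight=Fraction(1)):
    """Collect the leaves of a computation tree together with their
    probabilities, yielding a normalised list distribution."""
    if not isinstance(tree, list):          # a leaf: a bare element
        return [(weight, tree)]
    _root, *branches = tree                 # an internal node
    out = []
    for p, sub in branches:
        out.extend(supp(sub, weight * p))   # probabilities multiply on paths
    return out

half = Fraction(1, 2)
# A tree rooted at 'a' using a -> [(1/2, b), (1/2, c)], then c -> [(1, d)].
t = ['a', (half, 'b'), (half, ['c', (Fraction(1), 'd')])]
assert supp(t) == [(half, 'b'), (half, 'd')]
assert sum(p for p, _ in supp(t)) == 1      # supports are normalised
```

If all of b and d are terminal in the underlying PARS, the tree above is maximal and its support is one of the terminal distributions of a.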
We can now state our property of interest.

Definition 2.4 (UTD)
A PARS A has "unique terminal distributions" when, for every a and maximal trees T₁, T₂ ∈ T(a), we have supp(T₁) ≈ supp(T₂).

We said before that proving UN (for ARSes) directly is usually hard. Since PARSes subsume ARSes, the same difficulties arise for proving UTD directly. Therefore, we seek a property akin to confluence, providing a more tractable proof method.
One idea is to set up a rewriting over the supports of computation trees (expanding leaves as a reduction) and study its confluence. However, this is bound to be too rigid, as the behaviours of computation trees are not identified by equivalence of their support distributions, forbidding reasoning modulo equivalence. For example, a tree with two leaves (1/2, a) can exhibit more terminal distributions than a tree with a single (1, a) leaf, and so we would be forced to deem some morally confluent systems non-confluent. In order not to reject cases like this, we could change the previous notion to allow for an equivalence when closing the diagram; but then it becomes hard to reason compositionally about the property.

Rewriting Distributions and Confluence
To arrive at a notion of confluence that avoids the previously mentioned drawbacks, we shall define a rewriting over distributions that is more liberal than that of computation trees (Definition 3.4).

Definition 3.1 Given a PARS A = (A, →), we define the relation ։_P (of type P(D(A) × D(A))), called "parallel evolution", by the following rules: if a → [(qⱼ, bⱼ)]ⱼ then [(p, a)] ։_P [(pqⱼ, bⱼ)]ⱼ; every singleton evolves to itself, [(p, a)] ։_P [(p, a)]; and if D ։_P D′ and E ։_P E′ then D ++ E ։_P D′ ++ E′.

Note that without using the first rule, this is just the identity relation on distributions. We note the subset of this relation where the first rule must be used at least once in a step as ։¹_P, and call it proper evolution. Note that ։_P is enough to simulate computation trees in this system, since it can be used to rewrite between their supports in the following sense.
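Under this reading of parallel evolution, each node of a distribution either stays put or evolves through one of its successor distributions, so all one-step ։_P-successors of a finite distribution can be enumerated by brute force. A sketch, with a PARS encoded (a convention of our own) as a mapping from elements to their lists of successor distributions:

```python
from fractions import Fraction
from itertools import product

def parallel_steps(pars, dist):
    """All D' with D ->>_P D': each node (p, a) is either kept as-is, or
    replaced by p * E for some successor distribution E of a."""
    choices = []
    for p, a in dist:
        opts = [[(p, a)]]                       # keep the node
        for succ in pars.get(a, []):            # or evolve it via a -> succ
            opts.append([(p * q, b) for (q, b) in succ])
        choices.append(opts)
    # One independent choice per node; concatenate each combination.
    return [sum(combo, []) for combo in product(*choices)]

half, quarter = Fraction(1, 2), Fraction(1, 4)
pars = {'a': [[(half, 'b'), (half, 'c')]]}      # a -> [(1/2, b), (1/2, c)]
succs = parallel_steps(pars, [(half, 'a'), (half, 'a')])
# Each of the two 'a'-nodes independently stays or evolves: 4 successors.
assert len(succs) == 4
assert [(half, 'a'), (half, 'a')] in succs      # the identity step
assert [(quarter, 'b'), (quarter, 'c'), (half, 'a')] in succs  # partial step
```

The combinations where at least one node evolves are exactly the proper evolutions ։¹_P.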

Definition 3.2 We call a relation R over distributions "compositional" when D R D′ and E R E′ imply (D ++ E) R (D′ ++ E′).

Lemma 3.3 If T ∈ T(a), then [(1, a)] ։*_P supp(T).

Proof. First, note that ։_P is compositional. The result then follows by induction on T, using compositionality. ✷

We now define an ARS over distributions, combining both parallel evolution and equivalence steps. Our definition of confluence for a PARS A is then simply the usual confluence of that relation.

Definition 3.4 Given a PARS A, we note by Det(A) the ARS (D(A), ։), where ։ = ։_P ∪ ≈.

Definition 3.5 (Distribution confluence)
We say a PARS A is "distribution confluent" (or simply "confluent") when Det(A) is confluent in the classical sense.
Note that reduction in Det(A) is more liberal than the expansion of trees, since it allows for "partial" evolutions, in which only some nodes of a distribution take a step. Nevertheless, its confluence is adequate for proving UTD, as Lemma 3.7 shows.
Lemma 3.6 If D is terminal and D ։* E, then D ≈ E.

Proof. The result follows by induction on the number of steps, and the transitivity and reflexivity of ≈. ✷

Lemma 3.7 If A is confluent, then A has UTD.

Proof. Let T₁, T₂ ∈ T(a) be maximal. By Lemma 3.3 and confluence, their supports have a common reduct C. Since T₁, T₂ are maximal, their supports are terminal. Then, from two applications of Lemma 3.6, we get that supp(T₂) ≈ C ≈ supp(T₁), as needed. ✷

Furthermore, beyond UTD, distribution confluence implies that diverging computations (with no terminal distribution) can also be joined. As a consequence of that, confluence gives a neat method of proving the consistency of the equational theory induced by ։, as long as two distinct terminal elements exist.
Lemma 3.8 In a confluent PARS, two terminal distributions are ։-convertible iff they are equivalent.

Proof. The way back is trivial, so we detail the way forward. From confluence (repeatedly), D₁ and D₂ must have a common reduct. The result then follows from Lemma 3.6. ✷

So, if a and b are distinct terminal elements, we know that ։-convertibility is a consistent theory, as [(1, a)] and [(1, b)] are not convertible. Summarizing, in a confluent PARS, reasoning about equivalence of programs is simplified and there is a strong consistency guarantee about convertibility, much like in the classical case.

Criteria for Proving Confluence
In the previous section, we have introduced our definition of confluence and argued that it is correct and sufficient for studying programming languages. For the property to be useful in practice, it should also be amenable to proof. In this section we provide several simplified criteria for this task, obtaining analogues of many of the usual methods for classical confluence.
Since distribution confluence is no more than the classical confluence of ։, every existing classical criterion (such as the diamond property or Newman's lemma) is valid in this setting. However, very few of those are useful. Indeed, Det(A) is never strongly (or even weakly) normalising, regardless of A, and therefore Newman's lemma does not apply. With respect to the diamond property, consider a system with a → D; then the reductions [(1, a)] ։ D and [(1, a)] ≈ [(1/2, a), (1/2, a)] ։ (1/2)D ++ [(1/2, a)] are both possible, and these two distributions cannot be joined in a single step (unless we make further assumptions on A). Also, as evidenced by this example, we need to prove confluence for every distribution, not just for Dirac ones, and take equivalence steps into account as well.
Thus, a priori, it would seem as if distribution confluence is hard to prove. To remedy this, we shall prove various syntactic lemmas about the relation ։, allowing us to decompose it into more manageable forms. We then show how we can limit our reasoning to Dirac distributions, ignoring equivalence steps in the peaks while using them freely in the valleys. Lastly, we carry over classical criteria for confluence into this setting, such as the aforementioned diamond property and Newman's lemma.

Syntactic lemmas about the relation ։
Since both ։_P and ≈ are reflexive, we have (։_P ∪ ≈)* = (։_P/≈)*. Thus, since confluence is a property of the reflexive-transitive closure of a relation, it suffices to study the confluence of ։_P/≈, where equivalence steps do not have a cost, but are pervasive.
Given the precise syntactic definition of both relations, we can prove by analysis on the reductions that any step of ։_P/≈ can be made by splitting first, then evolving, and then joining back elements, as Lemma 4.3 states. We first introduce the following notion of commutation.

Definition 4.1 (Sequential commutation)
We say that a relation R "commutes over" S when S · R ⊆ R · S, and note it as R ⊣ S. The property can be expressed by the diagram on the right.

A key property of sequential commutation is that if R ⊣ S, then (R ∪ S)* = R* · S*. It is also preserved when taking the n-fold composition (i.e. "n steps") or reflexive-transitive closures on each side. We now prove some commutations relating evolution and equivalence steps (the last one needs some "administrative" steps).

Lemma 4.2 The following commutations hold: S ⊣ ։_P, S ⊣ (FJ), and ։_P ⊣ (FJ), the last one up to some administrative steps.

Proof. By induction on the shape of the reductions. ✷
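The factorisation (R ∪ S)* = R* · S* promised by R ⊣ S can be checked exhaustively on small finite relations. A toy sketch (the carrier and relations are illustrative choices of ours, not taken from the paper), with composition read left to right as in the definition of commutation:

```python
def compose(P, Q):
    """P ; Q -- first a P step, then a Q step."""
    return {(x, z) for (x, y1) in P for (y2, z) in Q if y1 == y2}

def star(R, carrier):
    """Reflexive-transitive closure of R over a finite carrier."""
    closure = {(x, x) for x in carrier} | set(R)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

carrier = range(10)
R = {(x, x + 2) for x in carrier if x + 2 < 10}   # "+2" steps
S = {(x, x + 1) for x in carrier if x + 1 < 10}   # "+1" steps

# R commutes over S: an S step followed by an R step can be reordered.
assert compose(S, R) <= compose(R, S)
# Hence (R ∪ S)* = R* ; S*: R steps can always be performed first.
assert star(R | S, carrier) == compose(star(R, carrier), star(S, carrier))
```

This is the shape of reasoning used below: ≈-steps of each kind are pushed to a canonical position around the evolution step.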
Lemma 4.3 ≈ · ։_P · ≈ = S* · ։_P · (FJ)*.

Proof. The backwards inclusion is trivial, so we detail only the forward direction. By making use of the second commutation in Lemma 4.2 we get that ≈ = S* · (FJ)*. Thus, we need to show S* · (FJ)* · ։_P · S* · (FJ)* ⊆ S* · ։_P · (FJ)*. The proof then proceeds by using the other two commutations to reorder the relations. ✷

Furthermore, this equivalence extends to n-fold compositions.
Lemma 4.4 (≈ · ։_P · ≈)ⁿ = S* · ։ⁿ_P · (FJ)*.

Proof. By induction on n, using the previous lemma and the commutations. ✷

Simplifying diagrams
With the previous decompositions, we can now prove a very generic result about diagram simplification with a specific root D, which then easily generalizes to the whole system.

Definition 4.5
We say a pair of relations (γ, δ) "closes" another pair (α, β) "at D" when, for all E, F with D α E and D β F, there exists C with E γ C and F δ C. The diagram for the property can be seen on the right. When this occurs for all D, we simply say "(γ, δ) closes (α, β)". Note that → is confluent precisely when (→*, →*) closes (→*, →*).

Theorem 4.7 Let α, β be local relations and γ, δ compositional ones. If (γ, δ) closes (α, β) at [(1, a)] for every element a of D, then (≈ · γ · ≈, ≈ · δ · ≈) closes (≈ · α · ≈, ≈ · β · ≈) at D.

Proof. We give a sketch of the proof; for a more explanatory development please refer to [17]. By Lemma 4.3, we need to close (S* · α · (FJ)*, S* · β · (FJ)*). First, note that closing (S* · α, S* · β) is enough, since we can revert the (FJ)* steps with (FS)* steps. Now, since α and β are local, S* · α and S* · β are as well. Thus, we can limit ourselves to closing the Dirac distributions of D, and combine the reductions since γ, δ are compositional. We now need to close (S* · α, S* · β) when starting from some [(1, a)]. Note that the left (right) branch is then of the form p₁D₁ ++ ⋯ ++ pₙDₙ (q₁E₁ ++ ⋯ ++ qₘEₘ), where a reduces via α (β) to each Dᵢ (Eⱼ). We can apply our hypothesis to get a Cᵢ,ⱼ closing each Dᵢ, Eⱼ. By first splitting each branch appropriately, we can close them in p₁q₁C₁,₁ ++ ⋯ ++ p₁qₘC₁,ₘ ++ ⋯ ++ pₙqₘCₙ,ₘ, and thus we conclude. ✷

From this theorem, we get as corollaries several simplified criteria for confluence, applicable at the level of a particular distribution or to the whole system.

Corollary 4.8 (Semi-confluence) If for every element a of D and distributions E, F such that E և_P a ։*_P F there is a C such that E ։*_{P/≈} C և*_{P/≈} F, then D is confluent for ։_P/≈.

Proof. A corollary of Theorem 4.7, taking α = ։_P and β = γ = δ = ։*_P. ✷

Criterion 4.10 (Diamond property) If for every element a of D and distributions E, F such that E ← a → F there is a C such that E ։_{P/≈} C և_{P/≈} F, then D has the diamond property for ։_P/≈.

Proof. A corollary of Theorem 4.7, taking α = β = γ = δ = ։_P. ✷
Note that in all these criteria, we need not consider any equivalence in the peak, and can use equivalences freely in the valley, both before and after evolving. Also, proving any of these criteria for every element a entails the confluence of the whole system.
In the classical case, a common tool for proving confluence is switching to another rewriting relation with equal reflexive-transitive closure (and thus an equivalent confluence) but which might be easier to analyse. For distribution confluence, a similar switch is allowed, slightly simplified by Lemma 4.12.

Lemma 4.12 Let A and B be two PARSes over the same carrier such that →_A ⊆ ։*_B and →_B ⊆ ։*_A. Then ։*_A = ։*_B, and hence A is confluent iff B is.

Proof. The first part follows by case analysis on a single ։ step. The second part is then trivial. ✷

Newman's lemma
Newman's lemma [18] states that, for a strongly normalising system, local confluence and confluence are equivalent properties; yet we have remarked previously that ։_P is never a strongly normalising relation. To get an analogue of Newman's lemma, we thus provide a specialised notion of strong normalisation.

Definition 4.13 An infinite sequence D₀, D₁, D₂, … such that Dᵢ ։¹_P Dᵢ₊₁ for every i is called an "infinite ։¹_P-chain" of root D₀.
Definition 4.14 We call a distribution D "strongly normalising" when there is no infinite ։¹_P-chain of root D. We call a PARS strongly normalising when every distribution is strongly normalising.
There are indeed systems which do satisfy this requirement, and it is intuitively what one would expect. (Note that infinite (։¹_P/≈)-chains always exist because of partial evolution.) Now a probabilistic analogue of Newman's lemma can be obtained, following a proof style very similar to that of [14].

Definition 4.15
We say that a distribution D is "locally confluent" when E և¹_P D ։¹_P F implies that there exists C such that E ։* C և* F. (Note that strong normalisation over Dirac distributions implies it for all distributions, and likewise for local confluence.)

Lemma 4.16 (Newman's) If a PARS is locally confluent and strongly normalising, then it is confluent.
Proof.
We shall prove, by well-founded induction over ։¹_P, that every distribution is confluent. For a particular distribution, it suffices to show that any peak of proper evolutions can be closed by ։*. Then, by Corollary 4.8 (and since ։*_P = (։¹_P)*), confluence follows. We want to close a diagram of shape E ¹*և_P D ։¹*_P F. If either of the branches is zero steps long, then we trivially conclude. If not, we can form the diagram on the right, completing the proof by local confluence and the induction hypotheses for E′ and F′. ✷

Limit distributions
In classical abstract rewriting, an element can be non-normalising, weakly normalising or strongly normalising (corresponding to the situations where it will not, may, or will normalise, respectively). In probabilistic rewriting, the story is not as simple. Consider a PARS where b is a terminal element and a → [(1/2, a), (1/2, b)].
Is a normalising? One could say "no" since, indeed, it does not have a finite maximal computation tree, as there is always some probability of the system being in the non-terminal state a. However, that probability can be made arbitrarily small by taking sufficiently many steps, and the distribution [(1, b)] is reached in the limit. In this case a is called almost surely terminating [3]. Certainly, a desirable fact is that almost-surely-terminating elements have a unique final distribution. We will prove that distribution confluence guarantees such unicity.
We first introduce a notion of distance between mathematical distributions, i.e. normalised functions of type A → [0, 1]. We note with D̄ the mathematical distribution obtained from the list distribution D (with the expected definition), and extend definitions over mathematical distributions to list distributions by applying the overline where appropriate. The distance between two mathematical distributions is d(D, E) = Σ_{a∈A} |D(a) − E(a)|. Note that this distance is the L₁ distance and the definition of limit is the usual one for metric spaces. It is then well known that there is at most one limit for a given sequence. We are interested in limits composed of terminal elements, representing a distribution of values. For that, the following definition is useful.

Definition 5.3
For a mathematical distribution D, we define its "liveness" as the sum of the weights of non-terminal elements; that is, Liv(D) = Σ_{a∈dom(→)} D(a). Note that the liveness of a list distribution cannot increase by evolution, and that Liv(D) = 0 iff D is terminal. Moreover, since the terminal part of a distribution cannot evolve, liveness provides an upper bound on the distance attainable by evolution, as the following lemma states.
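To see these quantities concretely, take a system where a steps to [(1/2, a), (1/2, b)] and b is terminal, a minimal almost-surely-terminating instance. The sketch below (with an encoding of our own, using mathematical distributions as dictionaries) checks that liveness halves at every full evolution step and that the L₁ distance to the limit distribution [(1, b)] stays within the bound 2 · Liv(D):

```python
from fractions import Fraction
from collections import defaultdict

half = Fraction(1, 2)

def step(dist):
    """One full parallel-evolution step for a -> [(1/2, a), (1/2, b)],
    with b terminal. Weights are collected per element."""
    out = defaultdict(Fraction)
    for elem, p in dist.items():
        if elem == 'a':
            out['a'] += p * half
            out['b'] += p * half
        else:                        # b is terminal: it cannot evolve
            out[elem] += p
    return dict(out)

def liveness(dist):
    return dist.get('a', Fraction(0))    # 'a' is the only non-terminal

def l1(d, e):
    keys = set(d) | set(e)
    return sum(abs(d.get(k, Fraction(0)) - e.get(k, Fraction(0))) for k in keys)

limit = {'b': Fraction(1)}
d = {'a': Fraction(1)}
for n in range(1, 11):
    d = step(d)
    assert liveness(d) == half ** n           # liveness halves each step
    assert l1(d, limit) <= 2 * liveness(d)    # distance bounded by liveness
```

After n steps the distribution is {a: 2⁻ⁿ, b: 1 − 2⁻ⁿ}, whose distance to [(1, b)] is exactly 2 · 2⁻ⁿ, so the bound above is tight for this system.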

Lemma 5.4 If D ։* E, then d(D, E) ≤ 2 · Liv(D).

Proof. By Lemma 4.4 there exist D′ and E′ such that D ≈ D′ ։*_P E′ ≈ E. Because of the equivalences, it suffices to show the result for D′ and E′. Assume, without loss of generality, that D′ = D_l ++ D_t, where all elements of D_l are non-terminal, and all those of D_t are terminal. Since parallel evolution is local and terminal elements cannot evolve, we know that E′ = E″ ++ D_t for some E″. Then, d(D′, E′) is simply d(D_l, E″). Note that Liv(D′) is the weight of D_l and also of E″. Since distance is bounded by total weight, it follows that it is at most 2 · Liv(D′) = 2 · Liv(D). ✷

Now, we can extend our notion of unicity of terminal distributions to limit distributions of terminal elements, accounting for an infinite sequence of reduction steps. We call this property unicity of limit distributions (ULD) and show it to be a consequence of distribution confluence.

Lemma 5.5 If A is confluent, and a distribution D is the root of two infinite ։-chains (Eᵢ)ᵢ and (Fⱼ)ⱼ whose respective limits E∞ and F∞ are terminal distributions, then E∞ = F∞.
Proof. Take ε > 0. By the definition of limit, we know there are i, j such that d(Eᵢ, E∞) < ε/6 and d(Fⱼ, F∞) < ε/6. Since E∞ and F∞ are terminal, Liv(Eᵢ) and Liv(Fⱼ) must be less than ε/6. The distributions Eᵢ and Fⱼ are reachable by a finite number of ։ steps, so by confluence there exists a distribution C such that Eᵢ ։* C և* Fⱼ. From Lemma 5.4, we get that d(Eᵢ, C) < ε/3, and likewise for Fⱼ. From these four bounds and the triangle inequality we get that d(E∞, F∞) < ε. Since this is the case for any positive ε, the distance must be exactly 0, and therefore E∞ = F∞. ✷

λ₁

In our introductory example, we used the term (λx.(x, x)) ⚀ as an example of a non-confluent computation. The tension apparently arises between "binding the result" (CBV) and "binding the computation" (CBN), which makes a difference in the probabilistic case when the binding is duplicated (as already pointed out in [10]). There seem to be three ingredients needed for this failure of confluence of a term (λx.M)N: (1) x appears free more than once in M; (2) N has a non-Dirac terminal distribution; (3) both call-by-name and call-by-value reductions are possible.
In this section we define a probabilistic λ-calculus, dubbed λ₁, that prevents the combination of these three features by providing two kinds of abstractions, one restricting duplication and one restricting evaluation order. We show λ₁ to be confluent (by a diamond property), giving evidence that little more than linearity of probabilistic arguments is required to achieve a confluent probabilistic programming language.
The calculus is heavily based on the one defined in [24]. The set of pre-terms is given by the grammar M, N ::= x | λx.M | λ!x.M | M N | !M | M ⊕_p N, where the main novelty is the probabilistic choice operator ⊕_p, for any real number p in the open interval (0, 1). Abstractions (λ) are affine; that is, in the scope of λx, there can be at most one free occurrence of x. Non-linear abstractions (λ!) have no such restriction. Affinity is enforced by a well-formedness judgment, whose definition is straightforward and which we thus omit. We work only with well-formed pre-terms, which form the set of terms.
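The affinity condition is easy to check mechanically. Below is a sketch of such a well-formedness check over an assumed AST encoding (the encoding is ours, not the paper's; in particular, counting occurrences additively across ⊕_p branches is an assumption of this sketch):

```python
def free_occurrences(term, x):
    """Count free occurrences of variable x. Terms (our encoding):
    ('var', x) | ('lam', x, M) | ('lam!', x, M)
    | ('app', M, N) | ('bang', M) | ('choice', p, M, N)."""
    tag = term[0]
    if tag == 'var':
        return 1 if term[1] == x else 0
    if tag in ('lam', 'lam!'):
        return 0 if term[1] == x else free_occurrences(term[2], x)
    if tag == 'app':
        return free_occurrences(term[1], x) + free_occurrences(term[2], x)
    if tag == 'bang':
        return free_occurrences(term[1], x)
    if tag == 'choice':
        return free_occurrences(term[2], x) + free_occurrences(term[3], x)
    raise ValueError(tag)

def well_formed(term):
    """Affine abstractions bind at most one free occurrence;
    non-linear ones ('lam!') are unrestricted."""
    tag = term[0]
    if tag == 'var':
        return True
    if tag == 'lam':
        return free_occurrences(term[2], term[1]) <= 1 and well_formed(term[2])
    if tag == 'lam!':
        return well_formed(term[2])
    if tag in ('app',):
        return well_formed(term[1]) and well_formed(term[2])
    if tag == 'bang':
        return well_formed(term[1])
    if tag == 'choice':
        return well_formed(term[2]) and well_formed(term[3])
    raise ValueError(tag)

dup = ('app', ('var', 'x'), ('var', 'x'))
assert not well_formed(('lam', 'x', dup))    # λx.(x x) duplicates x
assert well_formed(('lam!', 'x', dup))       # λ!x.(x x) is allowed
assert well_formed(('lam', 'x', ('var', 'x')))
```

The check rules out exactly ingredient (1) above for affine abstractions, which is what blocks the die-duplication counterexample.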
The operational semantics is provided as a PARS in Fig. 2. A non-linear abstraction can only β-reduce with an argument of the form !N. Such terms cannot reduce, and are called thunks. This effectively implies that non-linear abstractions follow a fixed strategy (which is, morally, CBV until the argument is reduced to a thunk, and CBN afterwards).
To prove the diamond property for λ₁, we first need two substitution lemmas.

Proof. By induction on the well-formedness of N. ✷

Lemmas 6.1 and 6.2 are analogous to both statements of [24, Lemma 3.1]. Armed with both, we can prove the following theorem, which implies the diamond property.
Proof. By induction on the shapes of M → D and M → E. ✷

By Criterion 4.10 we conclude that the calculus λ₁ is confluent, and thus enjoys both UTD and ULD.

Q*
The Q* calculus [9] is a quantum programming language with quantum measurement, an inherently probabilistic operation. Reduction occurs between configurations, which are terms coupled with a quantum state, and which we will not detail further. Its semantics does not fix a strategy and, like λ₁, is also based on [24]. Reduction steps are paired with a label indicating which type of reduction occurred (e.g. which qubit was measured). In Q*, terms are linear and not affine: variables representing quantum data can be neither duplicated nor discarded, as per the no-cloning [26] and no-erasure [19] properties of quantum physics.
The authors prove a property called strong confluence which asserts that any two maximal (possibly infinite) computation trees with a common root have an equivalent support when restricted to normal forms, and, further, that any normal form appears in an equal amount of leaves on both trees.
To prove that property, the authors establish a crucial lemma called quasi-one-step confluence, which is morally a diamond property but with slightly different behaviours according to the reductions taken. The reductions are distinguished between two sets N, K and those of the form meas_r (measurements). We will not describe these sets nor Q*'s semantics (its full description can be found in [9]), and will only state the lemma about its reductions. The notation C →ᵖ_α D means "C reduces to D with probability p via the label α"; and C →ᵖ_N D means C →ᵖ_α D for some α ∈ N (idem K).

Lemma 6.4 (Quasi-one-step Confluence for Q* [8, Proposition 4])
Let C, D, E be configurations with C →ᵖ_α D and C →ˢ_β E. Then, in particular:

(i) If α ∈ K and β ∈ K, then either D = E or there is an F with D →¹_K F and E →¹_K F.

(ii) If α = meas_r and β = meas_q (with r ≠ q), then there are t, u ∈ [0, 1] and an F such that pt = su, D →ᵗ_{meas_q} F and E →ᵘ_{meas_r} F.

From this lemma, the fact that there are no infinite K-sequences, and a "probabilistic strip lemma", the authors prove strong confluence [9, Theorem 5.4].
For distribution confluence, a simpler proof can be obtained. After modelling Q* as a PARS (without labelled reductions, but with sets of successor distributions instead) we can readily reinterpret Lemma 6.4 to prove the diamond property for it (by Criterion 4.10). From this result, distribution confluence follows, and therefore also unicity of both terminal and limit distributions. Notably, neither the normalisation requirement for K nor the "probabilistic strip lemma" is needed for this fact.
Our obtained distribution confluence is similar to, but neither weaker nor stronger than, strong confluence. It is not weaker, as distribution confluence guarantees that divergences of computations without any normal form can be joined, which strong confluence does not. It is also not stronger, as it implies nothing about limit distributions that are not terminal, while it follows from strong confluence that they must coincide in their normalised part. It also does not imply the equality between the number of leaves on each tree.

Conclusions
We have studied the problem of showing that an operational semantics for a probabilistic language is not affected by the choice of strategy. For this purpose, we provided a definition of confluence for probabilistic systems by defining a classical relation over distributions. We showed our property of distribution confluence to be appropriate as, in particular, it implies a unicity of terminal distributions, both for finite and infinite reductions, and gives an equational consistency guarantee.
We believe this development demonstrates that distribution confluence provides a reasonable "sweet spot" for proving the correctness of probabilistic semantics, as it provides the expected guarantees about execution while allowing tractable proofs. Concretely, the provided proofs for λ₁ and Q* are in line with what one would expect for linear calculi.
The proof about Q * also partially answers the conjecture posed in [9, Section 8] ("any rewriting system enjoying properties like Proposition 4 [our Lemma 6.4] enjoys confluence in the same sense as the one used here") positively. The answer is partial since distribution confluence is not strictly equivalent.
Looking ahead, there are several interesting directions to explore. First, a study of confluence dealing with terms (and not just abstract elements) should provide more insights applicable to concrete languages, and we expect concepts such as orthogonality to have probabilistic analogues. As a generalization, it seems possible to take distribution weights from any mathematical field and not only the positive reals: even if interpreting such systems is not obvious, it seems most of our results would hold. Finally, a quantitative notion of confluence could also be explored, where a distribution is considered confluent if any divergence of it can be joined "up to ε"; in particular, obtaining useful simplified criteria for said property seems difficult.

Related work
In [7], similar definitions of rewriting of distributions and confluence are introduced.
A key difference is that, in that work, equivalent distributions are identified and there is no partial evolution. In particular, this implies that the relation is not compositional, introducing a very subtle error in Lemma 10 (which, basically, states that a diamond property on Dirac distributions implies that of the whole system). The error is not severe for their development, but highlights the non-triviality of the matter.
In [12], a notion of confluence is defined and proven for an extension of λ q [25] (a quantum λ-calculus) with measurements (thus introducing probabilistic behaviour). The proposed confluence is basically a confluence on computation trees, and the one we study in this paper is strictly weaker, yet sufficient for UTD and consistency.
In [9], already amply discussed, the notion of confluence introduced is a strong confluence over maximal trees (either finite or infinite) which is related, but neither weaker nor stronger than distribution confluence. Finally, in [5], a property of confluence is defined and studied over probabilistic rewriting systems which do not contain any non-determinism (i.e. where → is a partial function). This is a very different notion of confluence, speaking about punctual final results instead of distributions (indeed, distribution confluence trivially holds as there is no non-determinism).