Monads, partial evaluations, and rewriting

Monads can be interpreted as encoding formal expressions, or formal operations in the sense of universal algebra. We give a construction which formalizes the idea of "evaluating an expression partially": for example, "2+3" can be obtained as a partial evaluation of "2+2+1". This construction can be given for any monad, and it is linked to the famous bar construction, of which it gives an operational interpretation: the bar construction induces a simplicial set, and its 1-cells are partial evaluations. We study the properties of partial evaluations for general monads. We prove that whenever the monad is weakly cartesian, partial evaluations can be composed via the usual Kan filler property of simplicial sets, of which we give an interpretation in terms of substitution of terms. In terms of rewritings, partial evaluations give an abstract reduction system which is reflexive, confluent, and transitive whenever the monad is weakly cartesian. For the case of probability monads, partial evaluations correspond to what probabilists call conditional expectation of random variables. This manuscript is part of a work in progress on a general rewriting interpretation of the bar construction.

Monads can be interpreted as encoding formal expressions, or formal operations in the sense of universal algebra. We give a construction which formalizes the idea of "evaluating an expression partially": for example, "2+3" can be obtained as a partial evaluation of "2+2+1". This construction can be given for any monad, and it is linked to the famous bar construction [ML00, VII.6], of which it gives an operational interpretation: the bar construction induces a simplicial set, and its 1-cells are partial evaluations.
We study the properties of partial evaluations for general monads. We prove that whenever the monad is weakly cartesian, partial evaluations can be composed via the usual Kan filler property of simplicial sets, of which we give an interpretation in terms of substitution of terms.
In terms of rewritings, partial evaluations give an abstract reduction system which is reflexive, confluent, and transitive whenever the monad is weakly cartesian.
For the case of probability monads, partial evaluations correspond to what probabilists call conditional expectation of random variables.
This manuscript is part of a work in progress on a general rewriting interpretation of the bar construction.

Background: monads and formal expressions
An interpretation of the theory of monads, in terms of universal algebra [HP07], is that a monad is like a consistent choice of spaces of formal expressions in a signature. This interpretation is most accurate for monads on the category of sets, but the categorical constructions work in general.
We can then interpret the definition of a monad in the following way. First of all, we have a functor T : C → C, which consists of the following assignments:
(a) To each space X, we assign a new space T X, which we think of as containing formal expressions of elements of X in a certain signature, modulo the equations specified by the theory. For example, the elements of T X may exactly be the formal sums of elements of X.
(b) Given two spaces X and Y and a function f : X → Y, we get a function T f : T X → T Y, which we think of as elementwise substitution. This assignment should preserve identity and composition. In the case of formal sums, given a function f : X → Y, we automatically get a function from formal sums of elements of X to formal sums of elements of Y by just "extending linearly". For example: (1.1)
In the case of formal sums, any element x can be considered a (trivial) formal sum. For general monads, this is encoded in the unit natural transformation η : id_C ⇒ T. Moreover, formal sums of formal sums can be reduced to just formal sums, such as can be reduced to 2a + 2b + c + d. This in general is encoded in the monad multiplication µ : T T ⇒ T.
In general, the objects of C may not be sets. Therefore, instead of elements, we look at generalized elements: in general, what we can interpret as a "formal expression" should be a morphism S → T X for some object S of C. Let's have the following convenient definition:
Definition 1.1. Let (T, η, µ) be a monad on a category C, and X be an object. A generalized formal expression on X is a morphism p : S → T X, where S is an object of C.
In the case of sets, as S we can take the terminal set 1, recovering the usual formal expressions. The same can be done for many concrete categories. When this does not cause confusion, we will drop the word "generalized".
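The three pieces of structure described above (elementwise substitution T f, the unit η, and the multiplication µ) can be sketched for formal sums on Set as follows. This is only an illustration under stated assumptions: a formal sum is stored as a sorted tuple of its terms, and the names `formal_sum`, `unit`, `fmap` and `mu`, as well as the sample expressions, are hypothetical and not taken from the text.

```python
def formal_sum(*terms):
    """A formal sum, stored as a sorted tuple so that the order of terms is irrelevant."""
    return tuple(sorted(terms))

def unit(x):
    """eta: regard a single element as a (trivial) formal sum."""
    return (x,)

def fmap(f, s):
    """T f: apply f to every term of a formal sum ("extending linearly")."""
    return formal_sum(*(f(x) for x in s))

def mu(nested):
    """mu: remove one level of brackets from a formal sum of formal sums."""
    return formal_sum(*(x for inner in nested for x in inner))

# Elementwise substitution along f = str.upper: a + a + b is sent to A + A + B.
print(fmap(str.upper, formal_sum("a", "a", "b")))

# The nested sum (a + b) + (a + b) + (c + d) reduces to 2a + 2b + c + d.
nested = formal_sum(formal_sum("a", "b"), formal_sum("a", "b"), formal_sum("c", "d"))
print(mu(nested))
```

In this representation the coefficient 2 in 2a is simply a repeated term, which keeps the multiplication a plain flattening of brackets.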
In mathematics, formal expressions can often be evaluated to a result. For example, the expression 3 + 2 can be evaluated to 5. An algebra of a monad is a space in which (generalized) formal expressions can be evaluated to actual (generalized) elements. So for the formal sum monad on Set, the category of algebras is precisely the category of commutative monoids, since in commutative monoids formal sums can be evaluated to actual elements in a way which respects usual rewriting rules for working with sums, namely associativity and commutativity.
Formally, an algebra of the monad T consists of an object A together with an evaluation map e : T A → A, suitably compatible with η and µ.
Definition 1.2. Let A be a T-algebra. Given a (generalized) formal expression p : S → T A, we call its result the (generalized) element of A given by e ∘ p : S → A.
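For reference, the compatibility of e with η and µ is usually spelled out as the two standard axioms of an Eilenberg-Moore algebra. The subscripts on η and µ, which indicate the object at which each natural transformation is taken, are notation added here.

```latex
\begin{align*}
  e \circ \eta_A &= \mathrm{id}_A
    && \text{(evaluating a trivial expression returns the element itself)} \\
  e \circ \mu_A &= e \circ Te
    && \text{(removing brackets first, or evaluating them first, gives the same result)}
\end{align*}
```

For formal sums of numbers, the second axiom says for instance that (3 + 4) + (5) has result 12 whether one first flattens it to 3 + 4 + 5 or first evaluates the brackets to 7 + 5.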

Partial evaluations and partial decompositions
Consider the sums
3 + 4 + 5 (2.1)
and
7 + 5. (2.2)
Not only do they have the same result, but in addition, we can say that the sum (2.2) can be obtained from (2.1) by partially evaluating the expression. Just as well, we would like to say that the sum (2.1) can be obtained from (2.2) by partially decomposing the terms in the expression. Let's try to make this precise. The idea is that there is a formal sum of formal sums, i.e. a formal sum with one level of brackets, such that removing the brackets yields the term on the left, and such that performing the operations in the brackets (and then removing the brackets) yields the term on the right. That is:
3 + 4 + 5 ← [remove brackets] ← (3 + 4) + (5) → [evaluate brackets] → 7 + 5
As we have seen in Section 1, the "formal sums of formal sums" live in T T A. The map which can be seen as "removing the brackets" is the multiplication map µ : T T A → T A, and the map that evaluates the expressions within the brackets is the image of the evaluation map e under the functor T, i.e. T e : T T A → T A. We can then give a general definition of partial evaluations for all monads.
Definition 2.1. Let (T, η, µ) be a monad on a category C, let (A, e) be a T-algebra, and consider the formal expressions p, q : S → T A. A partial evaluation of p into q, or a partial decomposition of q into p, is a map k : S → T T A, or "nested formal expression", which makes the following diagram commute, i.e. such that µ ∘ k = p and (T e) ∘ k = q.
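For formal sums of numbers, the two conditions µ ∘ k = p and (T e) ∘ k = q can be checked mechanically. The following is a minimal sketch, assuming that formal sums are stored as tuples compared up to reordering and that the algebra is the integers under addition; the helper names `canon`, `mu`, `T_e` and `is_partial_evaluation` are hypothetical.

```python
def canon(terms):
    """Canonical form of a formal sum: a sorted tuple, so that order is irrelevant."""
    return tuple(sorted(terms))

def mu(k):
    """Remove the brackets of a nested formal sum."""
    return canon(x for bracket in k for x in bracket)

def T_e(k):
    """Evaluate each bracket of a nested formal sum, keeping the outer sum formal."""
    return canon(sum(bracket) for bracket in k)

def is_partial_evaluation(k, p, q):
    """Check that k witnesses a partial evaluation of p into q."""
    return mu(k) == canon(p) and T_e(k) == canon(q)

k = [(3, 4), (5,)]                                       # the nested sum (3 + 4) + (5)
print(is_partial_evaluation(k, p=(3, 4, 5), q=(7, 5)))   # True
print(is_partial_evaluation(k, p=(3, 4, 5), q=(8, 4)))   # False
```

The same nested sum k cannot witness a partial evaluation into 8 + 4; that would need the different bracketing (3 + 5) + (4).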

Basic properties
From the definition and the triangle identities we have immediately the following result, which is a sort of consistency check: any expression has two trivial partial evaluations, to itself, and to its result (viewed as a formal expression).
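Proposition 2.2. Let A be a T-algebra as above, and p : S → T A. Then:
(a) p admits a partial evaluation to itself;
(b) p admits a partial evaluation to η ∘ e ∘ p, which we call its total evaluation.
Proof. (a) Take k := (T η) ∘ p, where T η : T A → T T A is the image under T of the unit η : A → T A. The unit law of the monad gives µ ∘ (T η) = id, so µ ∘ k = p; the unit law of the algebra gives e ∘ η = id, so (T e) ∘ (T η) = id and (T e) ∘ k = p. (b) Take k := η ∘ p, where now η : T A → T T A is the unit at the object T A. The other unit law of the monad gives µ ∘ η = id, so µ ∘ k = p; naturality of η gives (T e) ∘ η = η ∘ e, so (T e) ∘ k = η ∘ e ∘ p.
Here is another consistency check: if p admits a partial evaluation into q, then p and q necessarily must have the same result.
Proposition 2.4 (Law of total evaluation). Consider the formal expressions p, q : S → T A, and suppose that there exists a partial evaluation from p into q. Then p and q necessarily have the same result, i.e. e ∘ q = e ∘ p.
Proof. The multiplication square of the T-algebra (A, e) is the commutative diagram expressing the equality e ∘ µ = e ∘ (T e) of maps T T A → A. Now suppose that k : S → T T A gives a partial evaluation of p into q, i.e. µ ∘ k = p, and (T e) ∘ k = q. Then, since the square above commutes, e ∘ p = e ∘ µ ∘ k = e ∘ (T e) ∘ k = e ∘ q, as was to be shown.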

Composing partial evaluations
There is another appealing property to expect from partial evaluations, namely that if we can partially evaluate p to q and q to r, then we expect that we should be able to partially evaluate p to r.
Definition 2.7. Let (T, η, µ) be a monad on a category C. The monad (T, η, µ) is called cartesian if:
• The functor T preserves pullbacks;
• The naturality squares of η and µ are pullbacks.
As we will see, partial evaluations for cartesian monads are particularly well-behaved. But cartesianness is also a very restrictive condition, and we thus consider a variant of it, based on the standard concept of weak pullback: a commutative square satisfying the existence part of the universal property of a pullback, so that every compatible pair of maps into its two legs factors through the square via some map a. Note that we do not require the map a to be unique.
We can generalize the notion of weakly cartesian monad [Web04,CHJ14] from Set to all categories.
Definition 2.9. Let (T, η, µ) be a monad on a category C. We say that (T, η, µ) is weakly cartesian if:
• The functor T preserves weak pullbacks;
• The naturality squares of η and µ are weak pullbacks.
We have the following result:
Proposition 2.11. Let T be a monad on a category C. Let A be a T-algebra, and suppose that the following diagram, the naturality square of µ at the map e, is a weak pullback:
(T e) ∘ µ_{T A} = µ ∘ (T T e), with µ_{T A}, T T e : T T T A → T T A and T e, µ : T T A → T A. (2.3)
Then the partial evaluation relation on every set of formal expressions C(S, T A) is transitive.
Proof. We have to prove the following: given p, q, r : S → T A and k, h : S → T T A such that µ ∘ k = p, (T e) ∘ k = q, µ ∘ h = q and (T e) ∘ h = r, there exists ρ : S → T T A such that µ ∘ ρ = p and (T e) ∘ ρ = r. Consider now the commutative diagram (2.4) formed by the three maps µ_{T A}, T µ, T T e : T T T A → T T A together with the maps µ, T e : T T A → T A (which commutes by the composition, associativity, and naturality squares). Then we have that p sits in the bottom left corner, q in the top corner, and r in the bottom right corner, while k sits in the top left corner, and h in the top right. Since the top diamond is exactly diagram (2.3), which by hypothesis is a weak pullback diagram, and since (T e) ∘ k = q = µ ∘ h, there exists an a : S → T T T A such that (2.4) still commutes, i.e. µ_{T A} ∘ a = k and (T T e) ∘ a = h. Therefore ρ := (T µ) ∘ a : S → T T A is a partial evaluation of p into r.
Since the square (2.3) is necessarily a weak pullback for any weakly cartesian monad, we have
Corollary 2.12. Let T be a weakly cartesian monad. Then for every T-algebra A and every object S, the partial evaluation relation on C(S, T A) is transitive.
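The last step of the proof of Proposition 2.11 can be spelled out as a short computation. The sketch below assumes that the filler a : S → T T T A obtained from the weak pullback satisfies µ_{T A} ∘ a = k and (T T e) ∘ a = h, and that the composite is defined as ρ := (T µ_A) ∘ a; the subscripts on µ are notation added here.

```latex
\begin{align*}
  \mu_A \circ \rho
    &= \mu_A \circ T\mu_A \circ a
     = \mu_A \circ \mu_{TA} \circ a   && \text{(associativity of } \mu\text{)} \\
    &= \mu_A \circ k = p, \\[4pt]
  Te \circ \rho
    &= T(e \circ \mu_A) \circ a
     = T(e \circ Te) \circ a          && \text{(multiplication square of } (A, e)\text{)} \\
    &= Te \circ TTe \circ a
     = Te \circ h = r.
\end{align*}
```

So ρ has the source p and the target r required of a partial evaluation of p into r.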
Since the free commutative monoid monad is weakly cartesian, this construction reproduces Example 2.6.
In the same way, if T is cartesian, or even if only µ is cartesian, then the diagram (2.3) is a pullback. This makes the composition of partial evaluations into an algebraic operation, if we keep track of the element of T T A which witnesses each partial evaluation relation.
Notably, every monad on Set which arises from a (non-symmetric) operad is cartesian [CHJ14].

As a simplicial object
The diagram (2.4) is given by the first three levels of the bar construction [ML00, Section VII.6]. In particular, the composition condition is exactly a Kan filler condition, as for nerves of categories. In general, the bar construction has the flavor of a higher-categorical extension of the partial evaluation relation. The study of the higher-order compositional properties of this construction is work in progress; here we describe what we know so far.
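For concreteness, the bar construction of a T-algebra (A, e) can be written out levelwise as follows. The indexing of faces and degeneracies is one common convention and is an assumption here, chosen so that in the lowest degrees the two face maps T T A → T A are µ and T e and the degeneracy T A → T T A is T η.

```latex
\begin{align*}
  A_n &:= T^{n+1}A, \\
  d_i &:= T^{i}\mu_{T^{n-i-1}A} \colon T^{n+1}A \to T^{n}A
        && (0 \le i < n), \\
  d_n &:= T^{n}e \colon T^{n+1}A \to T^{n}A, \\
  s_i &:= T^{i+1}\eta_{T^{n-i}A} \colon T^{n+1}A \to T^{n+2}A
        && (0 \le i \le n).
\end{align*}
```

With this convention the three faces T T T A → T T A are µ_{T A}, T µ_A and T T e, which are exactly the maps appearing in the composition of partial evaluations.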
Definition 3.1. Let T be a monad on C and (A, e) a T-algebra. The bar construction of A is the simplicial object A_• in the category of T-algebras given by the following assignments for all i ≥ 0: A_i := T^{i+1} A, with face maps A_{i+1} → A_i induced by µ and e, and degeneracy maps A_i → A_{i+1} induced by η.
The simplicial identities are guaranteed to hold by the monad and algebra structure, and by naturality of the structure maps. This implies, in particular, that given an object S of C, hom(S, A_•) is a simplicial set, with the following interpretation:
• The vertices of the simplicial set are given by the (generalized) elements of T A, i.e. (generalized) formal expressions;
• The 1-simplices are given by witnesses of partial evaluations: since the source and target maps d_0, d_1 : A_1 → A_0 are exactly µ and T e, we can view the 1-simplices as arrows pointing in the direction of partial evaluation;
• For every vertex, or equivalently formal expression, the map s_0 : A_0 → A_1 given by T η gives an "identity" 1-simplex, which has the right source and target, as proven in Proposition 2.2.
• The composition of 1-simplices, when defined, is given by a 2-simplex which is exactly a Kan filler of an inner horn. When T is weakly cartesian, this filler always exists. When T is cartesian, this filler is moreover unique. And in the category of sets, the same arguments show that the resulting simplicial set is even the nerve of a category [Seg68]!
Since partial evaluations (or equivalently, partial decompositions) are in most cases intrinsically directed, these simplicial objects (and the simplicial sets that we obtain by probing them with objects S) give "spaces" which are intrinsically directed. As spaces, it thus seems most natural to study them with the methods and tools of directed homotopy theory (see for example [Gra09]).

In terms of rewriting systems
Writing s → t whenever there exists a partial evaluation of s into t, we obtain an abstract reduction system on C(S, T A) with the following properties:
• The relation → is reflexive, by Proposition 2.2.
• The relation → is confluent: if s → t and s → u, then there exists z such that t → z and u → z. In particular, z can always be given by the "total evaluation" η ∘ e ∘ t = η ∘ e ∘ u of Proposition 2.4;
• The irreducible elements are the total evaluations. In other words, given s ∈ C(S, T A), the following are equivalent:
  - If s → t for some t ∈ C(S, T A), then necessarily t = s;
  - There exists a ∈ C(S, A) such that s = η ∘ a.
• If T is weakly cartesian, then → is transitive.
The composition of partial evaluations can be thought of as a "rewriting of rewritings", which points to the theory of higher rewritings (see for example [Bur93,Mim14]). At least in this framework, however, higher rewrite rules are defined in a simplicial flavour, rather than globular.
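The reduction relation can be explored by brute force in a small case. The sketch below is an illustration under stated assumptions, not a general implementation: it works with formal sums of positive integers, uses only non-empty brackets (so that the enumeration is finite), and the names `bracketings` and `reducts` are hypothetical.

```python
def bracketings(terms):
    """Yield every way of grouping a list of terms into non-empty brackets."""
    if not terms:
        yield []
        return
    first, rest = terms[0], terms[1:]
    for grouping in bracketings(rest):
        yield [[first]] + grouping                  # `first` in a bracket of its own
        for i, bracket in enumerate(grouping):      # or added to an existing bracket
            yield grouping[:i] + [[first] + bracket] + grouping[i + 1:]

def reducts(p):
    """All q with p -> q, obtained by evaluating the brackets of each grouping."""
    return {tuple(sorted(sum(b) for b in g)) for g in bracketings(list(p))}

start = (1, 1, 2)
states = reducts(start)                   # everything reachable from 1 + 1 + 2
step = {s: reducts(s) for s in states}    # the relation -> restricted to these states

reflexive = all(s in step[s] for s in states)
transitive = all(u in step[s] for s in states for t in step[s] for u in step[t])
confluent = all(step[t] & step[u] for s in states for t in step[s] for u in step[s])
irreducible = sorted(s for s in states if step[s] == {s})

print(sorted(states))
print(reflexive, transitive, confluent, irreducible)
```

In this example the only irreducible state is the one-term sum, which is the total evaluation of the starting expression, and every pair of reducts can be joined there.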

Examples
Here are some examples of monads and of the partial evaluations that they induce.
Monoid and group action monads. Let G be a monoid (or group) in a cartesian monoidal category. Then X ↦ G × X is a functor equipped with a monad structure, with unit and multiplication induced by those of G, and the algebras e : G × A → A are the objects equipped with G-actions. Let now (g, x) and (h, y) be elements of G × A. We have that (h, y) is a partial evaluation of (g, x) if and only if there is an element (h, ℓ, x) ∈ G × G × A such that hℓ = g and ℓx = y. In other words, (h, y) is a partial evaluation of (g, x) if and only if we can write g as a composite hℓ, such that "applying only the part ℓ to x gives y". So (h, y) is "further along in the orbit" than (g, x).
If G is a group, then hℓ = g forces ℓ = h⁻¹g, so the decomposition above exists exactly when h⁻¹gx = y, that is, when (g, x) and (h, y) have the same result gx = hy; in particular, x and y are then on the same orbit. The partial evaluation relation is therefore symmetric: it is the equivalence relation of having the same result. If G is only a monoid, instead, then the partial evaluation relation is generally stronger than this and need not be symmetric.
As this monad (on Set) is associated to a non-symmetric operad, it is a cartesian monad. Thus witnesses of partial evaluations can be uniquely composed (and their composition is given by the composition of the monoid or group). In this case, the category whose nerve is the simplicial set Set(1, A_•) arising from the bar construction has pairs (g, x) as above as objects, and triples (h, ℓ, x) as above as morphisms, with domain (hℓ, x) and codomain (h, ℓx).
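The difference between the group and the monoid case can be seen in a small finite example. The sketch below assumes two hypothetical actions on the set {0, 1, 2, 3}: the cyclic group Z_4 acting on itself by addition modulo 4, and the monoid {0, 1, 2, 3} under saturating addition acting on itself in the same way. The function names are illustrative only.

```python
def relation(G, A, mul, act):
    """All pairs ((g, x), (h, y)) such that (h, y) is a partial evaluation of (g, x).

    Each witness (h, l, x) has domain (h*l, x) and codomain (h, l.x).
    """
    return {((mul(h, l), x), (h, act(l, x))) for h in G for l in G for x in A}

def is_symmetric(rel):
    return all((b, a) in rel for (a, b) in rel)

def compose(second, first, mul):
    """Compose witnesses (h, l, x): codomain of `first` must be domain of `second`."""
    h2, l2, _ = second
    _, l1, x1 = first
    return (h2, mul(l2, l1), x1)

n = 4
elements = range(n)
add_mod = lambda a, b: (a + b) % n            # group Z_4
add_sat = lambda a, b: min(a + b, n - 1)      # monoid with saturating addition

group_rel = relation(elements, elements, add_mod, add_mod)
monoid_rel = relation(elements, elements, add_sat, add_sat)

print(is_symmetric(group_rel))     # True
print(is_symmetric(monoid_rel))    # False

# (1, 1, 0): (2, 0) -> (1, 1), followed by (0, 1, 1): (1, 1) -> (0, 2).
print(compose((0, 1, 1), (1, 1, 0), add_sat))   # (0, 2, 0): (2, 0) -> (0, 2)
```

In the monoid example, (0, 1) is a partial evaluation of (1, 0), but not the other way around, since no ℓ satisfies 1 + ℓ = 0 under saturating addition.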
Idempotent monads. For idempotent monads, it is easy to check all partial evaluations are trivial. In other words, all partial evaluations are identities.
Free monoid monad. This monad is also associated to an operad. Therefore, also here, partial evaluations can be uniquely composed, and form a category. We are currently not aware of a more explicit description of this category.
Free commutative monoid monad. This monad is weakly cartesian [CHJ14]. Therefore, partial evaluation witnesses in T T A can still be composed, but the composition is not unique.

Partial evaluations in probability
Partial evaluations for probability monads allow us to compare probability distributions in terms of how spread out, or how random, they are.
Common ways of measuring the "randomness" of a probability measure are functionals, like variance and entropy. However, there is important information that a single real number cannot encode. Intuitively, a single number can measure only "how much" randomness there is, but not "where", or "in which way".
Example 6.1. Consider for example the probability distributions on R whose densities are represented in the following picture: One can say that p is "more random" or "more spread" than q around the same center of mass. Instead, while r looks more "peaked" than q, it is so "somewhere else": it has indeed less randomness quantitatively, but over different regions. In a partial order, we would say that q and r are incomparable. In higher dimensions, the same would be true if the two distributions were spread around the same center of mass, but along different directions. This is what we mean by "where the randomness is".
In the rest of this section we will show how partial evaluations can be employed in cases like the example above.

Probability monads and the Kantorovich monad
We have seen that monads can be interpreted in terms of formal expressions encoding possible "operations". In the case of probability monads, the operations in question are formal convex combinations, or mixtures.
Consider a coin flip, where "heads" and "tails" both have probability 1/2. Then in some sense, this is a convex combination of "heads" and "tails". Formally, the set {"heads", "tails"} is not a convex space, so one can't really take actual mixtures of its elements. However, one can embed {"heads", "tails"} into the space { λ "heads" + (1 − λ) "tails" | λ ∈ [0, 1] }, using the map which sends "heads" ↦ 1 "heads" + 0 "tails" and "tails" ↦ 0 "heads" + 1 "tails". In this new space, one can actually take convex combinations: for example, 1/2 "heads" + 1/2 "tails" is now a convex combination of the extremal points "heads" and "tails". In general one does not only take finite convex combinations, but rather integrals with respect to normalized measures, so we are talking about generalized mixtures, in the sense of Choquet theory [Win85]. The interpretation is nevertheless the same.
• Given an object X, which we can think of as a set of possible (deterministic) states, we can form an object P X, which contains "formal mixtures" of elements of X;
• Every function f : X → Y gives a function P f : P X → P Y by convex-linear extension;
• X is embedded into P X via a map δ : X → P X which maps an element x ∈ X to the trivial formal convex combination x;
• Formal mixtures of formal mixtures can be evaluated using the map E : P P X → P X, as the following example illustrates.
Example 6.2. Suppose that you have two coins in your pocket. Suppose that one coin is fair, with "heads" on one face and "tails" on the other; suppose the second coin has "heads" on both sides. Suppose now that you draw a coin randomly, and flip it.
We can sketch the probabilities in the following way:
? → coin 1 (probability 1/2) → heads (1/2), tails (1/2)
? → coin 2 (probability 1/2) → heads (1), tails (0)
Let X be the set {"heads", "tails"}. A coin gives a law according to which we will obtain "heads" or "tails", so it determines an element of P X. Since the choice of coin is also random (we also have a law on the coins), the law on the coins determines an element of P P X. By averaging, the resulting overall probabilities are
? → heads (3/4), tails (1/4)
In other words, the "average" or "composition" can be thought of as an assignment E : P P X → P X, from laws of "random random variables" to laws of ordinary random variables.
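The averaging step of this example can be reproduced with exact arithmetic. The sketch below assumes a finitely supported setting in which a distribution is a dictionary from outcomes to probabilities and an element of P P X is a list of weighted distributions; the function name `E` mirrors the map in the text, while the variable names are hypothetical.

```python
from fractions import Fraction

def E(law_on_laws):
    """Average a weighted list of distributions into a single distribution."""
    out = {}
    for weight, dist in law_on_laws:
        for outcome, prob in dist.items():
            out[outcome] = out.get(outcome, Fraction(0)) + weight * prob
    return out

half = Fraction(1, 2)
fair_coin = {"heads": half, "tails": half}
two_headed_coin = {"heads": Fraction(1), "tails": Fraction(0)}

# Draw one of the two coins with equal probability, then flip it.
overall = E([(half, fair_coin), (half, two_headed_coin)])
print(overall)   # heads with probability 3/4, tails with probability 1/4
```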
There are spaces, like for example R, where one can take actual mixtures. These correspond exactly to the algebras of P. In other words, a P-algebra is a convex space of some sort, a space which is closed under mixture operations (usually, a convex subset of some vector space). Taking expectation values is one of the most important operations in probability theory: the spaces where this can be done are precisely the algebras of a probability monad.
The details of how this is carried out in practice vary, depending on the choice of category, of monad, and so on. So in particular, one may get different sorts of "convex spaces". The probability monad that we use in this section, the Kantorovich monad, has as algebras precisely the closed convex subsets of Banach spaces (see [FP17]). Another example in the literature is the Radon monad on the category of compact Hausdorff spaces: its algebras are precisely the compact convex subsets of locally convex topological vector spaces [Św74,Kei08].
Let's define the Kantorovich monad. It is a monad on the category CMet of complete metric spaces and short maps, i.e. functions f : X → Y such that for every x, x′ ∈ X,
d_Y(f(x), f(x′)) ≤ d_X(x, x′).
Definition 6.3. Let X be a complete metric space. The Kantorovich-Wasserstein space P X is the space whose elements are Radon probability measures on X with finite first moment, and whose distance is given by:
d_{P X}(p, q) = sup_f | ∫ f dp − ∫ f dq |,
where the supremum is taken over all the short maps f : X → R.
The assignment X ↦ P X is part of a functor: we can assign to each morphism f : X → Y a morphism P f : P X → P Y given by the push-forward of probability measures. In other words, if p ∈ P X and A is a measurable subset of Y, then:
(P f)(p)(A) = p(f⁻¹(A)).
The unit of the monad is given by the Dirac delta map δ : X → P X, which assigns to each x ∈ X the Dirac mass δ_x concentrated at x. The composition E : P P X → P X is given by integration, as in Example 6.2: if µ ∈ P P X and A is a measurable subset of X, then
(E µ)(A) = ∫_{P X} p(A) dµ(p).
The algebras of the Kantorovich monad must be first of all objects of our category, i.e. complete metric spaces. Moreover, as we have seen, they should be closed with respect to convex combinations in some sense, as for example convex regions of a vector space. Closed convex subsets of Banach spaces are then an ideal candidate: they are complete metric spaces, and they are convex. It can be proven [FP17, Section 5.3] that the P-algebras in CMet are exactly closed convex subsets of Banach spaces, with the structure map given by the (Bochner) integral.
For more details, we refer the reader to [FP17] and [Per18].
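To get a feeling for the Kantorovich-Wasserstein distance, one can compute it for finitely supported distributions on the real line. The sketch below does not evaluate the supremum over short maps directly. It swaps in the cumulative distribution formula, which on R is a standard equivalent expression for the same distance (the integral of the absolute difference of the two distribution functions). The function name `wasserstein1` and the sample distributions are hypothetical.

```python
from fractions import Fraction

def wasserstein1(p, q):
    """Distance between two finitely supported distributions on the real line.

    Uses the CDF formula: integrate |F_p(t) - F_q(t)| over t.
    """
    points = sorted(set(p) | set(q))
    total, cdf_gap = Fraction(0), Fraction(0)
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0) - q.get(left, 0)   # F_p - F_q on [left, right)
        total += abs(cdf_gap) * (right - left)
    return total

half = Fraction(1, 2)
delta_0, delta_1, delta_3 = {0: Fraction(1)}, {1: Fraction(1)}, {3: Fraction(1)}
mixture = {0: half, 2: half}

print(wasserstein1(delta_0, delta_3))   # 3: the Dirac delta embedding preserves distances
print(wasserstein1(mixture, delta_1))   # 1: each half of the mass moves by distance 1
```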

Partial expectations and conditional expectations
Let's now study partial evaluations for algebras of the Kantorovich monad. In this section, we will consider the Kantorovich monad on unordered spaces, i.e. on CMet. The intuition is that p is "more concentrated" than q, or "closer to a delta at its center of mass". From the statistical point of view, p is better approximated by just looking at its expectation than q, since q is "more spread out".
We have the following result [Per18, Theorem 2.6.9]: Theorem 6.4. For every naturality square of the multiplication transformation E : P P ⇒ P , the weak universality property required of a weak pullback holds for maps out of the singleton space 1.
This is enough to see that the partial evaluation relation for algebras of the Kantorovich monad is transitive. But more generally, we do not know: Problem 6.5. Is the Kantorovich monad weakly cartesian?
In probability theory there exists already a concept that intuitively is a "partial expectation", namely, conditional expectation of random variables. It turns out that the two concepts are in some sense equivalent.
The material in this subsection can be found in more detail in [Per18, Section 4.2.1]; in particular, all the proofs can be found there (mind however the difference in the terminology). The result is closely related to previous work of Winkler and Weizsäcker (see [Win85] and the discussion therein).
Definition 6.6. Consider a probability space (X, F, µ), a sub-σ-algebra G of F, and measurable mappings f, g : X → A such that f_*µ and g_*µ have finite first moment. We say that g is a conditional expectation of f given G if:
• The function g is also G-measurable;
• For every set B in the σ-algebra G, we have ∫_B f dµ = ∫_B g dµ.
For brevity, we extend the terminology to the image measures themselves:
Definition 6.7. Let p, q ∈ P A. We call a conditional expectation of p into q in distribution a probability space (X, F, µ) together with a sub-σ-algebra G of F, and mappings f, g : X → A, with f F-measurable and g G-measurable, such that p = f_*µ, q = g_*µ, and g is a conditional expectation of f given G.
Here is now the main result [Per18, Theorem 4.2.14]: Theorem 6.8. Let A be a P -algebra, and let p, q ∈ P A. The following conditions are equivalent: (a) There exists a partial evaluation of p into q; (b) There exists a conditional expectation of p into q.
So, in particular, the law of total evaluation of Proposition 2.4 corresponds to the well-known law of total expectation of random variables.
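On a finite probability space, the passage from a conditional expectation to a partial evaluation can be made explicit. The sketch below assumes a four-point space with uniform probabilities, a sub-σ-algebra generated by a partition into two blocks, and real-valued f; all names and numbers are hypothetical. The nested distribution k records, for each block, its mass and the conditional law of f on it.

```python
from collections import defaultdict
from fractions import Fraction

def law(values, prob):
    """Image measure: the distribution of `values` under the probabilities `prob`."""
    out = defaultdict(Fraction)
    for x, w in prob.items():
        out[values[x]] += w
    return dict(out)

def mean(dist):
    """Evaluation map e for finitely supported distributions on the real line."""
    return sum(a * w for a, w in dist.items())

def condition(f, prob, blocks):
    """Return the conditional expectation g of f and the nested distribution k."""
    g, k = {}, []
    for block in blocks:
        mass = sum(prob[x] for x in block)
        cond = law(f, {x: prob[x] / mass for x in block})   # law of f given the block
        k.append((mass, cond))
        for x in block:
            g[x] = mean(cond)
    return g, k

def E(k):
    """Flatten a distribution of distributions (the monad multiplication)."""
    out = defaultdict(Fraction)
    for mass, cond in k:
        for a, w in cond.items():
            out[a] += mass * w
    return dict(out)

def P_e(k):
    """Replace each inner distribution by its expectation (the map P e)."""
    out = defaultdict(Fraction)
    for mass, cond in k:
        out[mean(cond)] += mass
    return dict(out)

prob = {x: Fraction(1, 4) for x in range(4)}
f = {0: 0, 1: 2, 2: 4, 3: 10}
g, k = condition(f, prob, blocks=[[0, 1], [2, 3]])
p, q = law(f, prob), law(g, prob)

print(E(k) == p, P_e(k) == q)   # k witnesses a partial evaluation of p into q
print(mean(p), mean(q))         # both laws have the same expectation
```

The equality of the two means is the law of total expectation in this example.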
This does not mean, however, that whenever there is a partial evaluation of p into q, their associated random variables are in relationship of conditional expectation: we are only looking at the distributions, and not at the correlations between the random variables. In other words, the theorem does not give an equivalence of structures (partial evaluations and conditional expectations), but merely an equivalence of properties of admitting those structures. The question of whether the equivalence can be strengthened to an equivalence of structures, up to suitable isomorphism, is currently still open.
Just as well, the inverse process, partial decomposition, is also known in probability theory, and it goes under the name of a dilation: a random map which intuitively "only spreads, but does not translate" (think of diffusion without drift, or the kernel of a martingale). In statistics, this corresponds to "adding unbiased noise", or "random, not systematic errors".
Definition 6.9. Let A be a P-algebra. A dilation is a map k : A → P A, which we write a ↦ k_a, such that for all a ∈ A, e(k_a) = a. Let now p ∈ P A. A p-dilation is a map k : A → P A such that for p-almost all a ∈ A, e(k_a) = a.
The most trivial dilation is the delta. Clearly, every dilation is a p-dilation. We have then again the following result [Per18, Lemma 4.2.17].
Theorem 6.10. Let A be a P-algebra, and let p, q ∈ P A. The following conditions are equivalent:
(a) There exists a partial decomposition of p into q;
(b) There exists a p-dilation k such that E(k_*p) = q.
We have gained an extra interpretation: in the context of probability, a partial decomposition of p into q is a process of "adding noise", or "letting diffusion take place". Conversely, we can then also interpret partial evaluations as "removing noise".
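The "adding noise" picture can be illustrated with a simple dilation on the real line. The sketch below assumes finitely supported distributions stored as dictionaries and a hypothetical dilation `noise` that replaces each point a by a fair mixture of a − 1 and a + 1; the function `spread` computes E(k_*p) in this finite setting.

```python
from fractions import Fraction

def mean(dist):
    return sum(a * w for a, w in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(w * (a - m) ** 2 for a, w in dist.items())

def noise(a):
    """Hypothetical dilation k_a = 1/2 delta_{a-1} + 1/2 delta_{a+1}, so e(k_a) = a."""
    half = Fraction(1, 2)
    return {a - 1: half, a + 1: half}

def spread(k, p):
    """E(k_* p): replace each point a of p by the distribution k(a), then average."""
    out = {}
    for a, w in p.items():
        for b, v in k(a).items():
            out[b] = out.get(b, 0) + w * v
    return out

p = {0: Fraction(1, 2), 4: Fraction(1, 2)}
q = spread(noise, p)

print(q)                          # mass 1/4 at each of -1, 1, 3, 5
print(mean(p), mean(q))           # the expectation is unchanged
print(variance(p), variance(q))   # the variance increases
```

Here q is a partial decomposition of p: it has the same center of mass but is more spread out.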