Bimonoidal Structure of Probability Monads

We give a conceptual treatment of the notion of joints, marginals, and independence in the setting of categorical probability. This is achieved by endowing the usual probability monads (like the Giry monad) with a monoidal and an opmonoidal structure, mutually compatible (i.e. a bimonoidal structure). If the underlying monoidal category is cartesian monoidal, a bimonoidal structure is given uniquely by a commutative strength. However, if the underlying monoidal category is not cartesian monoidal, a strength is not enough to guarantee all the desired properties of joints and marginals. A bimonoidal structure is then the correct requirement for the more general case. We explain the theory and the operational interpretation, with the help of the graphical calculus for monoidal categories. We give a definition of stochastic independence based on the bimonoidal structure, compatible with the intuition and with other approaches in the literature for cartesian monoidal categories. We then show as an example that the Kantorovich monad on the category of complete metric spaces is a bimonoidal monad for a non-cartesian monoidal structure.


August 2018

Introduction
The standard way to treat randomness categorically is via a probability monad, of which classic examples are the Giry monad [Gir82] and the probabilistic powerdomain [JP89]. The interpretation is the following: let C be a category whose objects we think of as spaces of possible values that a variable may assume. A probability monad P on C makes it possible to talk about random variables on objects X ∈ C, or equivalently random elements of X: an element p ∈ P X specifies the law of a random variable on X.
A central theme of probability theory is that random variables can form joints and marginals. For this to make sense in C, we need C to be a monoidal category, and we need P to interact well with the monoidal structure. We argue that this interaction is best modelled in terms of a bimonoidal structure.
A first structure which links a monad with the tensor product in a category is that of a strength. A strength for a probability monad is a natural map X ⊗ P Y → P(X ⊗ Y), whose interpretation is the following: an element of X and a random element of Y uniquely determine a random element of X ⊗ Y which has the correct marginals, and whose randomness is all in the Y component. In the language of probability theory, (x, q) ∈ X ⊗ P Y defines the product distribution of δ_x and q on X ⊗ Y. The operational meaning of a strength for a monad, including its use in probability, is well explained in [PP02], and in [JP89] for the case of the probabilistic powerdomain. A compendium of probability monads appearing in the literature, with information about their strength, can be found in [Jac17].
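For finitely supported distributions, the strength can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: distributions are dictionaries from outcomes to exact fractions, and the names dirac, strength and marginal are hypothetical.

```python
from fractions import Fraction

def dirac(x):
    """Monad unit: the point mass delta_x."""
    return {x: Fraction(1)}

def strength(x, q):
    """Sketch of X ⊗ PY -> P(X ⊗ Y): pair the fixed x with every outcome of q."""
    return {(x, y): weight for y, weight in q.items()}

def marginal(r, index):
    """Marginal of a joint law r on pairs, on the component given by index."""
    out = {}
    for pair, weight in r.items():
        out[pair[index]] = out.get(pair[index], Fraction(0)) + weight
    return out

q = {"heads": Fraction(1, 3), "tails": Fraction(2, 3)}
joint = strength("a", q)  # all the randomness sits in the second component
```

In this sketch the first marginal of the joint is the point mass at "a" and the second marginal is q, matching the interpretation given above.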
The monoidal structure can be thought of as a refinement of the idea of strength. The basic idea is that given two probability measures p ∈ P X and q ∈ P Y, one can canonically define a probability measure p ⊗_∇ q ∈ P(X ⊗ Y), the "product distribution".¹ This is not the only possible joint distribution that p and q have, but it can be obtained without additional knowledge (of their correlation). When a strength satisfies suitable symmetry conditions (commutative strength), it automatically defines a monoidal structure [Koc72,GLLN08].
An opmonoidal structure formalizes the dual intuition, namely that given a joint probability distribution r ∈ P(X ⊗ Y) we canonically have the marginals on P X and P Y as well. A bimonoidal structure is a compatible way of combining the two structures, in a way consistent with the usual properties of products and marginals in probability. When the underlying category is cartesian monoidal, then P is automatically opmonoidal. In this case, we show that if P carries a monoidal structure, then it is automatically bimonoidal. Therefore a commutative strong monad on a cartesian monoidal category is canonically bimonoidal. This is for example the case of the probabilistic powerdomain [JP89]. We argue that the bimonoidal structure is the structure of relevance for probability theory: if the underlying category is not cartesian monoidal, or the strength is not commutative, then one cannot talk about joints and marginals in the usual way just by having a strong monad.
¹ Our reason for denoting it by p ⊗_∇ q rather than by p ⊗ q is that we want to interpret p : 1 → P X and q : 1 → P Y as morphisms, so that p ⊗ q : 1 ⊗ 1 → P X ⊗ P Y is not yet the product distribution. Rather, one needs to compose p ⊗ q with the monoidal structure ∇ : P X ⊗ P Y → P(X ⊗ Y), which is the subject of the present paper, see Section 3.2.
However, not every probability monad in the literature is bimonoidal, not even strong; a famous counterexample is in [Sat18]. While a non-bimonoidal probability monad could be of use in measure theory to talk about spaces of measures, it would be far from applications to probability, since it would not permit talking about concepts like stochastic independence and correlation, which in probability theory play a central role. We thus want to argue that in order for a monad to really count as a probability monad, it should be a bimonoidal monad.
In Section 2 we describe the setting of semicartesian monoidal categories and affine monads, which we argue is the one of relevance for classical probability theory. In such a setting, we will represent the concepts using a graphical calculus analogous to that of [Mel06], presented in 2.1. In Section 3 we will sketch the basic theory and interpretation of a bimonoidal structure for probability monads, using the graphical calculus. The same definitions in terms of commutative diagrams can be found in Appendix A. In 3.1, we will show how this allows us to talk about functions between products of random variables. In 3.2, we show how to define a category of probability spaces from a probability monad, in such a way that the monoidal structure is inherited. This allows us to connect with other treatments of stochastic independence in the literature. In 3.3 we will see in more detail why this formalism generalizes the strength of probability monads on cartesian monoidal categories. In Section 4, we give a notion of stochastic independence based on the bimonoidal structure of the monad, and show that it satisfies some of the intuitively expected properties. In 4.1 we show that, if the base category is cartesian monoidal, our definition agrees with the one given by Franz [Fra01], and it is compatible with the definition of independence structure given by Simpson [Sim18]. Finally, in Section 5 we will give a nontrivial example of a bimonoidal monad, the Kantorovich monad on complete metric spaces [vB05,FP19]. The precise proofs and calculations of the statements of Section 5 can be found in Appendix B.

Semicartesian monoidal categories and affine monads
By definition, a semicartesian monoidal category is a monoidal category in which the monoidal unit 1 is a terminal object. For probability theory, this is a very appealing feature of a category, because such an object can be interpreted as a trivial space, having only one possible state. In other words, the object 1 has the property that for every object X, X ⊗ 1 ≅ X (monoidal unit), so that tensoring with 1 does not increase the number of possible states, and moreover there is a unique map ! : X → 1 (terminal object), which we can think of as "forgetting the state of X". Cartesian monoidal categories are in particular semicartesian. Not every monoidal category of interest in probability theory is cartesian, but most of them are semicartesian (in particular, all the ones listed in [Jac17]).
Semicartesian monoidal categories have another appealing feature for probability: every tensor product space comes equipped with natural projections onto its factors,
X ⊗ Y → X ⊗ 1 ≅ X and X ⊗ Y → 1 ⊗ Y ≅ Y,
obtained by applying the unique map ! to the other factor. These satisfy the universal property of the product projections if and only if the category is cartesian monoidal. These maps are important in probability theory, because they give the marginals. Since these projections are automatically natural in X and Y, a semicartesian monoidal category is always a tensor category with projections in the sense of [Fra01, Definition 3.3]; see [Lei16] for more background.²
Suppose now that P is a probability monad³ on a semicartesian monoidal category C. Since we can interpret the unit 1 as having only one possible (deterministic) state, it is tempting to say that just as well there should be only one possible random state: if there is only one possible outcome, then there is no real randomness. In other words, it is appealing to require that P(1) ≅ 1. A monad with this condition is called affine. Most monads of interest for probability are indeed affine (in particular, again, all the ones listed in [Jac17]).
Unless otherwise stated, we will always work in a symmetric semicartesian monoidal category with an affine probability monad. These conditions simplify the treatment a lot, while keeping most other conceptual aspects interesting. By the remarks above, they seem to be the right framework for classical probability theory. The definition of monoidal, opmonoidal, and bimonoidal monads can however be given for general braided monoidal categories: the interested reader can find them in Appendix A.
² Conversely, a tensor category equipped with natural projections is semicartesian whenever the projection maps X ⊗ 1 → X and 1 ⊗ X → X coincide with the unitors for all objects X. See for example (the dual statement to) [GLS16, Theorem 3.5].
³ In this work, "probability monad" is not a technical term: any monad could be in principle considered a probability monad. We merely use this term in order to indicate our intended interpretation in terms of randomness, as in the case of the Giry monad or the probabilistic powerdomain.

Graphical calculus
Here we introduce a form of graphical calculus specializing that of Melliès [Mel06] to our setting. Let C be a strict symmetric semicartesian monoidal category, and P an affine monad. We can represent objects X as vertical lines, and morphisms f : X → Y as boxes:
[Diagrams: a vertical wire labelled X; a box f with incoming wire X and outgoing wire Y]
which we read from top to bottom. Functor applications are represented by shadings. For example, the image P X of X under a functor P and the image P f : P X → P Y of f are:
[Diagrams: a shaded wire X; a shaded box f from X to Y]
We can represent monoidal products by horizontal juxtaposition. For example, the map f ⊗ g : X ⊗ A → Y ⊗ B can be represented as:
[Diagram: the boxes f and g side by side]
The monoidal unit 1 is best represented by nothing, so that expressions like X ⊗ 1 ≅ 1 ⊗ X ≅ X all have the same representation. However, sometimes it is helpful to keep track of it, and in those cases we will draw it as a dotted line.
For every object X there is a unique map ! : X → 1, which we can interpret as "forgetting the state of X". We will represent such a map as a "ground wire", following the literature on quantum systems:
[Diagram: a wire X ending in a ground symbol, drawn with or without the dotted unit wire 1 below it]
The condition that P is affine, in pictures, says that the shaded unit wire equals the plain unit wire:
[Diagram: the shaded dotted wire 1 equals the dotted wire 1; omitting the unit, an empty shading equals the empty diagram]
Since we are in a symmetric monoidal category, there is a canonical braiding isomorphism X ⊗ Y → Y ⊗ X. We represent it as:
[Diagram: the wires X and Y crossing]
which one can think of as "swapping" X and Y. In a symmetric monoidal category, applying it twice gives the identity.
We turn now to the monad structure of P. The monad unit δ : X → P X is a natural transformation which "puts X into a shading", while the multiplication E : P P X → P X goes from a double shading to a single shading:
[Diagrams: a wire X entering a shading (δ); a doubly shaded wire X becoming singly shaded (E)]
We do not draw a box for these "structure maps": we consider them the canonical maps from their source to their target. The diagrams above will always denote δ and E, never other morphisms.

Monoidal structure of probability monads
Let P be an affine probability monad on a strict symmetric semicartesian monoidal category C. In this setting, a monoidal structure for the functor P amounts to a natural map ∇ : P X ⊗ P Y → P(X ⊗ Y) with associativity and unitality conditions. In terms of graphical calculus, ∇ is a way to pass from P X ⊗ P Y, i.e. two separately shaded wires X and Y, to P(X ⊗ Y), i.e. the wires X and Y inside a common shading, so we can represent it as:
[Diagram: two shaded wires X and Y merging into a single shading containing both]
We again do not put any box, as we consider it the canonical map of the form given by the diagram above. The probabilistic interpretation is the following: given p ∈ P X and q ∈ P Y, there is a canonical (albeit not unique) way of obtaining a joint in P(X ⊗ Y), namely the product probability. Technically we also need a map 1 → P(1) ≅ 1, but due to our affineness assumption, such a map can only be the identity.
The associativity condition now says that it does not matter in which way we multiply first:
[Diagram: associativity of ∇]
so that there is really just one way of forming a product of three probability distributions. The left and right unitality conditions say that:
[Diagrams: left and right unitality of ∇]
which means that the product distribution of some p ∈ P X with the unique measure on 1 is the same as just p.
An opmonoidal structure for the functor P amounts to a natural map ∆ : P(X ⊗ Y) → P X ⊗ P Y, which we represent as:
[Diagram: a single shading containing the wires X and Y splitting into two separately shaded wires]
and again a map P(1) → 1, which in this setting can only be the identity. We have, dually, a coassociativity condition:
[Diagram: coassociativity of ∆]
The probabilistic interpretation is that given a joint probability distribution r ∈ P(X ⊗ Y), we can canonically obtain marginal distributions on P X and P Y, and again, if we have many factors, it does not matter in which order we take the marginals. Analogously, we have left and right counitality conditions:
[Diagrams: left and right counitality of ∆]
which say that the marginal distribution of some p ∈ P(X ⊗ 1) on the first factor (or of some p ∈ P(1 ⊗ X) on the second factor) is just p again.
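For finitely supported distributions on sets, the maps ∇ and ∆ can be made concrete. The following Python sketch is an illustration under assumptions not made in the text: distributions are dictionaries from outcomes to exact fractions, tensor products are modelled by tuples, and the names nabla, delta, flatten and strictify are hypothetical. It checks associativity and unitality of ∇ on one example.

```python
from fractions import Fraction

def nabla(p, q):
    """Product distribution: (p, q) in PX ⊗ PY  ->  p ⊗ q in P(X ⊗ Y)."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def delta(r):
    """Marginals: r in P(X ⊗ Y)  ->  (r_X, r_Y) in PX ⊗ PY."""
    r_x, r_y = {}, {}
    for (x, y), weight in r.items():
        r_x[x] = r_x.get(x, Fraction(0)) + weight
        r_y[y] = r_y.get(y, Fraction(0)) + weight
    return r_x, r_y

def flatten(outcome):
    """Identify ((x, y), z) and (x, (y, z)) with (x, y, z), mimicking strictness."""
    if not isinstance(outcome, tuple):
        return (outcome,)
    return tuple(atom for part in outcome for atom in flatten(part))

def strictify(r):
    """Rewrite all outcomes of r as flat tuples."""
    return {flatten(outcome): weight for outcome, weight in r.items()}

p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {0: Fraction(1, 3), 1: Fraction(2, 3)}
s = {"u": Fraction(1, 4), "v": Fraction(3, 4)}
unit = {(): Fraction(1)}  # the unique distribution on the one-point space 1

left = strictify(nabla(nabla(p, q), s))   # multiply p and q first
right = strictify(nabla(p, nabla(q, s)))  # multiply q and s first
```

Here left and right agree, and multiplying p with the unique distribution on the one-point space gives back p once outcomes are flattened.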
The monoidal and opmonoidal structures should interact to form a bimonoidal structure [AM10] for the functor P. To have that, we have first of all some unit-counit conditions, which in our setting are trivially satisfied, since they only involve maps to 1. But more importantly, the following bimonoidality (or distributivity) condition needs to hold:
[Diagram (3.1): the bimonoidality condition]
where the center of the diagram on the right is a swap of P X and P Y. The probabilistic interpretation is a bit involved, and it has to do with stochastic independence. We will analyze it separately in Section 4.
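In equational form, the bimonoidality condition can be sketched as follows. This is the standard braiding axiom for bimonoidal functors as in [AM10], written for a strict symmetric monoidal category; the object names W, X, Y, Z and the symbol σ for the braiding are a choice of labelling made here, not read off from the diagram.

```latex
\[
\Delta_{W\otimes Y,\,X\otimes Z}\circ
P(\mathrm{id}_W\otimes\sigma_{X,Y}\otimes\mathrm{id}_Z)\circ
\nabla_{W\otimes X,\,Y\otimes Z}
\;=\;
(\nabla_{W,Y}\otimes\nabla_{X,Z})\circ
(\mathrm{id}_{PW}\otimes\sigma_{PX,PY}\otimes\mathrm{id}_{PZ})\circ
(\Delta_{W,X}\otimes\Delta_{Y,Z})
\]
```

Both sides are maps P(W ⊗ X) ⊗ P(Y ⊗ Z) → P(W ⊗ Y) ⊗ P(X ⊗ Z): on the left one first forms the product of the two joints and then regroups and takes marginals, on the right one first takes marginals, swaps the two middle factors, and then forms products.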
Finally, the monad structure should respect the bimonoidal structure: the unit δ and the multiplication E should commute with ∇ and ∆. For the multiplication E, this means that the product of the average is the average of the product, and that the marginals of an average are the averages of the marginals. These last conditions may seem a bit obscure, but they come up naturally in probability: see as an example the case of the Kantorovich monad (Section 5 and its proofs in Appendix B). We are in other words requiring that P is a bimonoidal monad.
Definition 3.1. A bimonoidal monad (P, δ, E) is a monad whose functor is a bimonoidal functor, and whose unit and multiplication are bimonoidal natural transformations.
The definition above works in general. However, the particular conditions for the monoidal and opmonoidal structure which have been given here suffice only in the specific context of a semicartesian monoidal category with an affine monad. In Appendix A there is a more general definition, for generic symmetric monoidal categories. The definition given there specializes to the one given above in this context.
As far as we know, this kind of structure has not been considered before in this exact form. Monads in a general bicategory are a standard concept. However, to the best of our knowledge, the bicategory of monoidal categories, bimonoidal functors, and bimonoidal natural transformations has not been used explicitly. In particular, it has not been used in categorical probability. To avoid possible confusion, let us also point out that the notion of a bimonoidal monad is distinct from that of a bimonad [Wil08].
Most probability monads in the literature have an additional symmetry: the multiplication and comultiplication commute with the braiding, i.e. they are equivariant with respect to permutations of random variables. This means in diagrams that:
[Diagrams: ∇ and ∆ commute with the braiding]
Such a functor (and such a monad) is called braided or symmetric. A definition in terms of traditional commutative diagrams can again be found in Appendix A.

Algebra and coalgebra of random variables
The so-called "law of the unconscious statistician" says that given a function f : X → Y and a random variable on X with law p ∈ P X, the law of the image random variable under f will be the push-forward of p along f. In categorical terms, this simply means that P is a functor, and that the image random variable has law (P f)(p), where P f : P X → P Y is given by the push-forward.
The bimonoidal structure of P comes into play whenever we have functions to and from product spaces. Consider a morphism f : X ⊗ Y → Z. Given random variables on X and Y, we can form an image random variable on Z in the following way: first we form the joint on X ⊗ Y using the monoidal structure, and then we form the image under f. In other words, in terms of laws we perform the composition P X ⊗ P Y → P(X ⊗ Y) → P Z, i.e. P f ∘ ∇. For maps of the form g : X → Y ⊗ Z we can proceed analogously by forming the marginals, using the opmonoidal structure: P X → P(Y ⊗ Z) → P Y ⊗ P Z, i.e. ∆ ∘ P g. This way, together with associativity and coassociativity, one can form functions to and from arbitrary products of random variables.
Whenever we have an internal structure, like an internal monoid or group, we can in this way extend the operations to the random elements, via convolution. For example, if X is a monoid, then P X also becomes a monoid, using P X ⊗ P X → P(X ⊗ X) → P X for the multiplication. The analogous statements apply for coalgebraic structures. In other words, the bimonoidal structure allows one to have an algebra and coalgebra of random variables whenever the deterministic variables form an internal algebraic structure. For a concrete example, if as monoid we take the real line with addition, as convolution algebra we get the usual convolution of probability measures. We notice that such a convolution algebra is a monoid (with the neutral element given by the Dirac delta at zero), but not a group: only the monoid structure is inherited, in general.
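The convolution construction can be sketched for finitely supported distributions on the integers with addition. The sketch below assumes a dictionary representation with exact fractions; the names nabla, pushforward and convolve are hypothetical. It composes the product distribution with the push-forward along addition, following P X ⊗ P X → P(X ⊗ X) → P X.

```python
from fractions import Fraction

def nabla(p, q):
    """Product distribution of p and q."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def pushforward(f, p):
    """Law of f applied to a random element with law p, i.e. (P f)(p)."""
    out = {}
    for x, weight in p.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + weight
    return out

def convolve(p, q):
    """Multiplication on P X induced by addition on X: P(+) after nabla."""
    return pushforward(lambda pair: pair[0] + pair[1], nabla(p, q))

die = {face: Fraction(1, 6) for face in range(1, 7)}
dirac_zero = {0: Fraction(1)}     # neutral element of the convolution monoid
two_dice = convolve(die, die)     # law of the sum of two independent dice
```

In this sketch the point mass at zero acts as a neutral element, while a non-degenerate law such as die has no inverse under convolve, in line with the remark that only the monoid structure is inherited.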

The category of random elements
In the literature, many categorical treatments of statistical dependence work in categories whose objects are probability spaces, or fixed probability measures on a space, rather than categories with a probability monad [Fra01,Sim18]. One can form probability spaces from a probability monad in a canonical way:
Definition 3.2. Let C be a category with terminal object 1 and P a probability monad on C. Then the category Prob(C) is defined to be the co-slice category 1/P. In other words:
• Objects of Prob(C) are objects X of C together with arrows 1 → P X of C;
• Morphisms of Prob(C) from p : 1 → P X to q : 1 → P Y are arrows f : X → Y of C such that P f ∘ p = q.
In analogy with the category of elements, we can interpret Prob(C) as a category of random elements, or of probability spaces. The objects can be interpreted as elements of P X, i.e. probability measures on X, and the morphisms can be interpreted as maps preserving the selected element in the space of measures, i.e. measure-preserving maps.
Under some mild assumptions, if C has a semicartesian monoidal structure we can transfer that structure to the category of random elements, with a construction analogous to that of Section 3.1.
Definition 3.3. Let C be a semicartesian monoidal category and P an affine probability monad on C with monoidal structure ∇. We define the following monoidal structure on Prob(C): given p : 1 → P X and q : 1 → P Y, we define p ⊗_∇ q : 1 → P(X ⊗ Y) to be the composition 1 ≅ 1 ⊗ 1 → P X ⊗ P Y → P(X ⊗ Y), i.e. ∇ ∘ (p ⊗ q), and for morphisms we proceed analogously.
This way (Prob(C), ⊗_∇) is a semicartesian monoidal category, with the unit 1 → 1 isomorphic to the terminal object. In particular, it is always a tensor category with projections in the sense of [Fra01], generalizing the construction given in Section 3.1 therein (in which the base category Meas is cartesian monoidal). In general (and in all interesting cases in the literature), Prob(C) equipped with this monoidal structure is not cartesian monoidal, not even if C is: the product probability does not satisfy the universal property of a categorical product (see for example [Fra01] for a discussion on this). Some of the upcoming results will refer to Prob(C), whose objects we also call laws, as they generalize laws of random variables. In particular we will use the notation p ⊗_∇ q for the product probability.

Bimonoidal monads on a cartesian monoidal category
Suppose now that the monoidal structure of C is cartesian monoidal, i.e. that the monoidal product is given by the categorical product (so, in particular, C is semicartesian). The projection maps π_1 : X × Y → X and π_2 : X × Y → Y now satisfy a universal property. Let us now apply P, so that we get maps P π_1 : P(X × Y) → P X and P π_2 : P(X × Y) → P Y. By the universal property of the product, there is then a unique map P(X × Y) → P X × P Y compatible with the projections, i.e. a unique map ∆ such that π_1 ∘ ∆ = P π_1 and π_2 ∘ ∆ = P π_2. This gives a natural map ∆ : P(X × Y) → P X × P Y. Such a map exists and is unique for any (finite) number of factors, so it is automatically associative. Therefore P has a canonical opmonoidal structure. This is true for all functors P between cartesian monoidal categories. Moreover, this opmonoidal structure is unique, due to naturality.
Suppose now that P in addition has a (given) monoidal structure ∇. By the universal property of the product, it is straightforward to see that the bimonoid diagram (3.1) commutes automatically. Therefore, whenever C is cartesian monoidal, it suffices to have a monoidal structure to obtain a bimonoidal structure:
Proposition 3.4. In a cartesian monoidal category, a bimonoidal monad is the same structure as a monoidal monad.
In particular, since a monoidal structure is equivalent to a commutative strength (see [Koc72] for the closed monoidal case, and [GLLN08, Appendix A4] for the general case), a commutative strong monad on a cartesian monoidal category is automatically bimonoidal in a unique way. This is what happens, for example, for the probabilistic powerdomain on the category of domains. However, not all bimonoidal probability monads arise in this way. In Section 5, we will give an example of a bimonoidal probability monad on a non-cartesian monoidal category, the Kantorovich monad on complete metric spaces.

Stochastic independence
Our framework allows us to give a formal definition of stochastic dependence and independence in categorical terms, closely related to other notions appearing in the literature [Fra01,Sim18].
First of all, we look at an important consequence of the bimonoidality condition (3.1): stochastic dependence can only be forgotten, not created. Consider two spaces X and Y. Then given a joint distribution r ∈ P(X ⊗ Y), we can form the marginals r_X ∈ P X and r_Y ∈ P Y. If we try to form a joint again, via the product, the correlation is lost. Vice versa, if we have two marginals, form their joint, and then divide it again into marginals, we expect to get our initial random variables back. Graphically:
[Diagram: merging two shaded wires X and Y and splitting them again equals the identity]
This is indeed the case under the assumptions that we have made so far:
Proposition 4.1. Let X, Y be objects of a symmetric semicartesian monoidal category C. Let P : C → C be a bimonoidal endofunctor, with P(1) ≅ 1. Then ∆ ∘ ∇ = id_{P X ⊗ P Y}.
In particular, P X ⊗ P Y is a retract of P (X ⊗ Y ).
The proposition above is proved graphically in Appendix B.1. It is a special case of a standard result about the so-called normal bimonoidal functors, which can be found for example in [AM10, Section 3.5].
In general we do not get any condition ∇ ∘ ∆ = id_{P(X ⊗ Y)}, i.e. in general
∇ ∘ ∆ ≠ id_{P(X ⊗ Y)}. (4.2)
An example is given by X = Y = {0, 1}, with a perfectly correlated and uniform distribution. So correlation can be forgotten, but not created, by the bimonoidal structure maps.
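The example with X = Y = {0, 1} can be checked directly for finitely supported distributions. The sketch below assumes a dictionary representation with exact fractions, and the names nabla and delta are hypothetical stand-ins for ∇ and ∆.

```python
from fractions import Fraction

def nabla(p, q):
    """Product distribution of p and q."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def delta(r):
    """Pair of marginals of a joint law r on pairs."""
    r_x, r_y = {}, {}
    for (x, y), weight in r.items():
        r_x[x] = r_x.get(x, Fraction(0)) + weight
        r_y[y] = r_y.get(y, Fraction(0)) + weight
    return r_x, r_y

half = Fraction(1, 2)
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}
q = {0: half, 1: half}

# Perfectly correlated, uniform law on {0, 1} x {0, 1}.
r = {(0, 0): half, (1, 1): half}
r_x, r_y = delta(r)
rebuilt = nabla(r_x, r_y)  # product of the marginals: the correlation is gone
```

Taking marginals of a product returns the original pair, whereas rebuilt is the uniform law on all four pairs and so differs from r.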
Going further, we can use these structures in order to talk about probabilistic independence:
Definition 4.2. X and Y are independent for the law r : 1 → P(X ⊗ Y) if and only if ∇ ∘ ∆ ∘ r = r. That is, applying the left-hand side of (4.2) to r gives the same as applying the right-hand side if and only if we have independence.
We are now ready for the probabilistic interpretation of the bimonoidality condition (3.1), which gives its main motivation: consider any joints of W, X and of Y, Z, and form their product. In the resulting distribution, W will be independent of Y, and X will be independent of Z. More rigorously:
Proposition 4.3. Let W, X, Y, Z be objects of a symmetric semicartesian monoidal category C. Let P : C → C be a bimonoidal functor, with P(1) ≅ 1. Let r : 1 → P(W ⊗ X) and s : 1 → P(Y ⊗ Z), and consider the law r ⊗_∇ s : 1 → P(W ⊗ X ⊗ Y ⊗ Z). Then after forgetting X and Z, for the resulting law W and Y are independent. Just as well, after forgetting W and Y, for the resulting law X and Z are independent.
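The statement of Proposition 4.3 can be illustrated on finitely supported distributions. The sketch below is not the graphical proof; it assumes a dictionary representation with exact fractions, reads independence as "the law equals the product of its marginals", and uses hypothetical names throughout.

```python
from fractions import Fraction

def nabla(p, q):
    """Product distribution of p and q."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def pushforward(f, p):
    """Law of f applied to a random element with law p."""
    out = {}
    for x, weight in p.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + weight
    return out

def delta(r):
    """Pair of marginals, obtained by pushing forward along the projections."""
    return pushforward(lambda o: o[0], r), pushforward(lambda o: o[1], r)

def is_independent(r):
    """True when r equals the product of its own marginals."""
    r_x, r_y = delta(r)
    return nabla(r_x, r_y) == r

r = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}  # law on W ⊗ X, correlated
s = {(0, 1): Fraction(1, 4), (1, 0): Fraction(3, 4)}  # law on Y ⊗ Z, anticorrelated
product = nabla(r, s)                                 # law on (W ⊗ X) ⊗ (Y ⊗ Z)

# Forget X and Z, respectively W and Y.
law_wy = pushforward(lambda o: (o[0][0], o[1][0]), product)
law_xz = pushforward(lambda o: (o[0][1], o[1][1]), product)
```

Neither r nor s is independent, yet both law_wy and law_xz pass the independence check.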
A graphical proof in terms of Definition 4.2 is given as well in Appendix B.1. This result forms part of the semi-graphoid axioms [PP85] which axiomatize properties of conditional independence, namely in the case where the conditioning is trivial.
Concretely, Proposition 4.3 corresponds to the axiom of decomposition, stating that if X is independent from (Y, Z), then X is also independent from Y . The semi-graphoid axiom of symmetry (if X is independent of Y , then Y is independent of X) is also satisfied whenever we have a symmetric bimonoidal monad.

Comparison with other notions of independence
Franz [Fra01] defines stochastic independence in a semicartesian monoidal category in the following way: given objects A, B_1, B_2 (which one can think of as probability spaces), and arrows f_1 : A → B_1 and f_2 : A → B_2 (which one can think of as measure-preserving maps), f_1 and f_2 are independent if and only if there exists h : A → B_1 ⊗ B_2 such that π_1 ∘ h = f_1 and π_2 ∘ h = f_2, where π_1, π_2 are the projections of the tensor product. He then proves [Fra01, Proposition 3.5] that in the category Prob of (traditional) probability spaces, this notion of independence is equivalent to the standard one of probability theory. We propose a generalization of that result, which holds for categories of random elements obtained from generic cartesian monoidal categories. So in the case of cartesian monoidal base categories, the two approaches agree. The proof can be found in Appendix B.2, and goes along the lines of the proof of [Fra01, Proposition 3.5].
Simpson [Sim18] defines an independence structure as a certain collection of multispans that contains the singleton families. Given again a cartesian monoidal category C and an affine monad P on C, and given a finite multispan {f_i : A → B_i}_{i∈I} in C, we can form a multispan in the category Prob(C) by precomposing with a law r : 1 → P A. We can call such a resulting multispan independent, in analogy with Definition 4.2, iff
∇_I ∘ ∆_I ∘ P((f_i)_{i∈I}) ∘ r = P((f_i)_{i∈I}) ∘ r,
where (f_i)_{i∈I} : A → ∏_{i∈I} B_i is the tupling of the f_i given by the cartesian monoidal structure, and ∇_I and ∆_I are the maps ∏_i (P B_i) → P(∏_i B_i) and P(∏_i B_i) → ∏_i (P B_i), respectively, obtained by iterating ∇ and ∆ (by associativity and coassociativity, the resulting maps are unique). Independent multispans defined in this way then form an independence structure in the sense of [Sim18, Definition 2.1], in a way analogous to Examples 2.1 and 2.2 therein: they are closed with respect to multispan composition, and to forming subfamilies. Therefore, again in the case of a cartesian monoidal base category, our definition is compatible with Simpson's approach.

Bimonoidal structure of the Kantorovich monad
The Kantorovich monad is a probability monad on complete metric spaces. It was first defined by van Breugel for compact and for complete 1-bounded metric spaces [vB05]. We will use here the definitions and results of [FP19], which work for all complete metric spaces.
Consider the category CMet whose:
• Objects are complete metric spaces;
• Morphisms are short maps, i.e. functions f : X → Y such that d(f(x), f(x′)) ≤ d(x, x′) for all x, x′ ∈ X;
• As monoidal structure, we define X ⊗ Y to be the set X × Y, with the metric d((x, y), (x′, y′)) := d(x, x′) + d(y, y′).
This category can be thought of as a category of enriched categories and functors [Law73, Section 2], and the monoidal structure is closed but not cartesian. Further motivation for the choice of this category is given in [FP19]. In particular, by choosing as morphisms the short maps, one can obtain P X as a colimit of spaces of empirical distributions of finite sequences [FP19, Section 3], which would not be possible if one allowed for more general morphisms (like continuous or Lipschitz functions).
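The difference between the additive metric on X ⊗ Y and a cartesian product metric can be seen on a small example. The sketch below makes assumptions of its own: the cartesian product is taken with the max metric, shortness is only tested on a finite grid of integer points (a necessary condition, not a proof), and all names are hypothetical.

```python
def tensor_metric(d_x, d_y):
    """Metric on X ⊗ Y: the distances of the components are added."""
    return lambda a, b: d_x(a[0], b[0]) + d_y(a[1], b[1])

def max_metric(d_x, d_y):
    """Metric assumed here for the cartesian product: maximum of the components."""
    return lambda a, b: max(d_x(a[0], b[0]), d_y(a[1], b[1]))

def is_short(f, d_source, d_target, points):
    """Check d(f(a), f(b)) <= d(a, b) for all pairs in a finite sample."""
    return all(d_target(f(a), f(b)) <= d_source(a, b)
               for a in points for b in points)

def d_real(s, t):
    """Usual distance on the real line."""
    return abs(s - t)

def add(pair):
    """Addition, viewed as a map from pairs of reals to the reals."""
    return pair[0] + pair[1]

grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
short_for_tensor = is_short(add, tensor_metric(d_real, d_real), d_real, grid)
short_for_max = is_short(add, max_metric(d_real, d_real), d_real, grid)
```

On this grid, addition passes the shortness check for the additive metric but fails it for the max metric (compare the points (0, 0) and (1, 1)).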
We recall the basic definitions of [FP19].
Definition 5.1. Let X be a complete metric space.
• A Radon probability measure p on X is said to have finite first moment if for every short map f : X → R, ∫_X f dp < ∞.
Every such probability measure can be specified uniquely by its integration against short maps to R: the set of such measures can be identified with the set of positive, Scott-continuous linear functionals on the space of Lipschitz functions on X. Hence, in the following, we explicitly construct such measures by specifying their action on short maps.
• The Kantorovich-Wasserstein space P X is the space of all Radon probability measures on X with finite first moment, equipped with the metric
d_{P X}(p, q) := sup_f ∫_X f d(p − q),
where the supremum ranges over all short maps f : X → R. With this metric, P X is itself a complete metric space.
• Given f : X → Y, we define P f : P X → P Y as the map assigning to p ∈ P X its push-forward measure (P f)(p) := f_* p ∈ P Y. The latter is defined by saying that for all short maps g : Y → R, ∫_Y g d(f_* p) := ∫_X (g ∘ f) dp. The measure f_* p also has finite first moment, and this assignment makes P into a functor.
A concise treatment of Wasserstein spaces can be found in [Bas15] and a more comprehensive one in [Vil09]. For the basic measure-theoretic setting, we refer the reader to [Bog00,Edg98].
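For finitely supported measures on the real line, the Kantorovich metric can be computed without searching over short maps. The sketch below swaps in the known closed form on R, the integral of |F_p − F_q| over the line (with F the cumulative distribution function), instead of the supremum in the definition; the function names and the dictionary representation are assumptions of ours.

```python
from fractions import Fraction

def kantorovich_on_line(p, q):
    """Distance between finitely supported measures on R via the CDF formula."""
    points = sorted(set(p) | set(q))
    total, cdf_gap = Fraction(0), Fraction(0)
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0) - q.get(left, 0)  # F_p - F_q on [left, right)
        total += abs(cdf_gap) * (right - left)
    return total

def integrate(f, p):
    """Integral of f against the finitely supported measure p."""
    return sum(f(x) * weight for x, weight in p.items())

def witness(x):
    """A short map R -> R, used as a candidate in the supremum."""
    return abs(x - 1)

p = {0: Fraction(1, 2), 2: Fraction(1, 2)}
q = {1: Fraction(1)}
distance = kantorovich_on_line(p, q)
lower_bound = integrate(witness, p) - integrate(witness, q)
```

In this example the short map witness already attains the distance, and two point masses at 0 and 3 are at distance 3, as expected if the unit δ preserves distances.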
The functor P admits a monad structure, with the unit δ : X → P X given by the Dirac distributions,
∫_X f(y) d(δ(x))(y) := f(x),
and the multiplication E : P P X → P X given by forming the expected or average distribution,
∫_X f(x) d(Eμ)(x) := ∫_{P X} ( ∫_X f(x) dp(x) ) dμ(p) for μ ∈ P P X.
We can now define product joints and marginals, which will equip P with a bimonoidal structure.
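Definition 5.2. Let p ∈ P X, q ∈ P Y. We denote by p ⊗_∇ q the joint probability measure on X ⊗ Y defined by:
∫_{X ⊗ Y} f(x, y) d(p ⊗_∇ q)(x, y) := ∫_{X × Y} f(x, y) dp(x) dq(y).
Let now r ∈ P(X ⊗ Y). We denote by r_X the marginal probability on X defined by:
∫_X f(x) d(r_X)(x) := ∫_{X ⊗ Y} f(x) dr(x, y).
The marginal on Y is defined analogously.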
It is straightforward to check that the functionals defined in Definition 5.2 are positive, linear, and Scott-continuous, therefore they specify uniquely Radon probability measures of finite first moment.
In the rest of this section we will show that the joints and marginals in Definition 5.2 equip the Kantorovich monad on CMet with a bimonoidal monad structure (Theorem 5.15). The proofs with the actual calculations are in Appendix B.
We will prove now that the product joint construction equips P with a monoidal structure.
Definition 5.3. Let X, Y ∈ CMet. We define the map ∇ : P X ⊗ P Y → P (X ⊗ Y ) as mapping (p, q) ∈ P X ⊗ P Y to the joint p ⊗ ∇ q ∈ P (X ⊗ Y ).
By Proposition 5.4 (proved in Appendix B.3), ∇ is short. Therefore, ∇ is a morphism of CMet. This would not be the case if we took as monoidal structure for CMet the cartesian product: for the product metric, ∇ is Lipschitz, but in general not short. The fact that ∇ equips P with a monoidal structure now follows directly from the naturality and associativity of the product probability construction (as sketched in Section 3). In other words, the proofs of the next three statements (see Appendix B.3) can be adapted to most other categorical contexts in which the map ∇ is of a similar form.
Proposition 5.5. ∇ : P X ⊗ P Y → P (X ⊗ Y ) is natural in X and Y .
We know that a monoidal monad is the same as a commutative monad, and therefore P is a commutative monad.

We now turn to the analogous statements for the marginals, and show that they equip P with an opmonoidal structure.
Definition 5.9. Let X, Y ∈ CMet. We define the map ∆ : P (X ⊗ Y ) → P X ⊗ P Y as mapping r ∈ P (X ⊗ Y ) to the pair of marginals (r X , r Y ) ∈ P X ⊗ P Y .
By Proposition 5.10 (proved in Appendix B.4), ∆ is short. Therefore ∆ is a morphism of CMet. Again, the following statements follow just from the properties of marginals, and their proofs (see Appendix B.4) can be adapted to most other categorical contexts provided that ∆ is of a similar form.
Proposition 5.11. ∆ : P (X ⊗ Y ) → P X ⊗ P Y is natural in X, Y .
The lax and oplax monoidal structures interact to give a bimonoidal structure. The following statements also follow just from the properties of joints and marginals.
The main result then follows as a corollary:

Theorem 5.15. The Kantorovich monad is a symmetric bimonoidal monad, with monoidal structure given by the product joint, and opmonoidal structure given by the marginals.

By Proposition 4.1, we therefore have:

Corollary 5.16. ∆_{X,Y} ∘ ∇_{X,Y} = id_{PX ⊗ PY}. Therefore, the inclusion ∇ of product measures into general joints is an isometric embedding for the Kantorovich metric, and its image is a retract of the space of all joints.
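The retraction property can be observed directly on finitely supported measures. The sketch below is an illustration under the assumption that measures are stored as dictionaries of exact weights; `nabla` and `marginals` are names chosen for this example. It checks that taking marginals after forming a product joint is the identity, while the opposite composite is only idempotent and forgets correlations.

```python
from fractions import Fraction
from collections import defaultdict

def nabla(p, q):
    """Product joint of p and q."""
    return {(x, y): wx * wy for x, wx in p.items() for y, wy in q.items()}

def marginals(r):
    """Pair of marginals of a joint r on pairs (x, y)."""
    rx, ry = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), w in r.items():
        rx[x] += w
        ry[y] += w
    return dict(rx), dict(ry)

p = {0: Fraction(1, 4), 1: Fraction(3, 4)}
q = {"h": Fraction(1, 3), "t": Fraction(2, 3)}

# Marginals of a product joint recover the two factors.
round_trip = marginals(nabla(p, q))

# A perfectly correlated joint is not recovered from its marginals.
r = {(0, "h"): Fraction(1, 2), (1, "t"): Fraction(1, 2)}
flattened = nabla(*marginals(r))

# The composite "product of the marginals" is idempotent.
flattened_twice = nabla(*marginals(flattened))
```

Here `flattened` is the uniform measure on four points, which differs from `r`: the image of ∇ is a proper subset of the space of joints.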

A. Monoidal, opmonoidal and bimonoidal monads
We recall the definition of the different monoidal structures for a functor, for the case of braided (including symmetric) monoidal categories. For more results and more general definitions, we refer to [AM10].
Let (C, ⊗) and (D, ⊗) be braided monoidal categories.

Definition A.1. A lax monoidal functor (C, ⊗) → (D, ⊗) is a triple (F, η, ∇), such that:
(a) F : C → D is a functor;
(b) The "unit" η : 1_D → F(1_C) is a morphism of D;
(c) The "multiplication" ∇ : F(X) ⊗ F(Y) → F(X ⊗ Y) is a morphism of D, natural in X and Y;
(d) The following "associativity" diagram commutes for every X, Y, Z in C: [diagram]
(e) The following "unitality" diagrams commute for every X in C: [diagrams]
We say that (F, η, ∇) is also braided, or symmetric if C is symmetric, if in addition the multiplication commutes with the braiding: [diagram]

Definition A.2. Let (F, η_F, ∇_F) and (G, η_G, ∇_G) be lax monoidal functors (C, ⊗) → (D, ⊗). A lax monoidal natural transformation, or just monoidal natural transformation when it's clear from the context, is a natural transformation α : F ⇒ G which is compatible with the unit and multiplication map. In particular, the following diagrams must commute (for all X, Y ∈ C): [diagrams]

Definition A.3. An oplax monoidal functor (C, ⊗) → (D, ⊗) is a triple (F, ε, ∆), such that:
(a) F : C → D is a functor;
(b) The "counit" ε : F(1_C) → 1_D is a morphism of D;
(c) The "comultiplication" ∆ : F(X ⊗ Y) → F(X) ⊗ F(Y) is a morphism of D, natural in X and Y;
(d) The following "coassociativity" diagram commutes for every X, Y, Z in C: [diagram]
(e) The following "counitality" diagrams commute for every X in C: [diagrams]
We say that (F, ε, ∆) is also braided, or symmetric if C is symmetric, if in addition the comultiplication commutes with the braiding: [diagram]

Definition A.4. Let (F, ε_F, ∆_F) and (G, ε_G, ∆_G) be oplax monoidal functors (C, ⊗) → (D, ⊗). An oplax monoidal natural transformation, or just monoidal natural transformation when it's clear from the context, is a natural transformation α : F ⇒ G which is compatible with the counit and comultiplication map. In particular, the following diagrams must commute (for all X, Y ∈ C): [diagrams]

Definition A.5. A bilax monoidal functor (C, ⊗) → (D, ⊗) is a functor F : C → D equipped with:
(a) A lax monoidal structure (η, ∇);
(b) An oplax monoidal structure (ε, ∆);
such that:
(c) The following "bimonoidality" diagram commutes: [diagram]
(d) The following three "unit/counit" diagrams commute: [diagrams]

Definition A.6. Let (F, η_F, ∇_F, ε_F, ∆_F) and (G, η_G, ∇_G, ε_G, ∆_G) be bilax monoidal functors (C, ⊗) → (D, ⊗). A bilax monoidal natural transformation, or just monoidal natural transformation when it's clear from the context, is a natural transformation α : F ⇒ G which is both a lax and an oplax monoidal natural transformation.
Definition A.7. Now, we define:
• A monoidal monad is a monad in the bicategory of monoidal categories, lax monoidal functors, and monoidal natural transformations;
• An opmonoidal monad is a monad in the bicategory of monoidal categories, oplax monoidal functors, and monoidal natural transformations;
• A bimonoidal monad is a monad in the bicategory of braided monoidal categories, bilax monoidal functors, and monoidal natural transformations.
In the third definition, we need the symmetry (or at least a braiding) in order to express the bimonoid equation that is part of the definition of bilax monoidal functor [AM10], even if the functor itself is not braided. If the functor is braided, we can define in addition:
• A braided (resp. symmetric) monoidal monad is a monad in the bicategory of braided (resp. symmetric) monoidal categories, braided lax monoidal functors, and monoidal natural transformations;
• A braided (resp. symmetric) opmonoidal monad is a monad in the bicategory of braided (resp. symmetric) monoidal categories, braided oplax monoidal functors, and monoidal natural transformations;

Now since the braiding at 1 ⊗ 1 is just the identity, we can even simplify the condition to: [diagram] or more concisely: [diagram] We now notice that: [diagram] = [diagram] since both maps are just the identities at 1. This is the crucial step. We are left with: [diagram] which, because of all the unit and counit conditions, is equivalent to [diagram], i.e. equation (4.1).
Proof of Proposition 4.3. Consider the left side of (3.1) and forget X and Z using the unique maps to 1, and compose at the remaining W ⊗ Y with the left-hand side of (4.2). We get: [diagram] Applying (3.1) on the left and affinity of P on the right we get: [diagram] and applying (4.1) on the left we now get: [diagram] which, before applying the ground wire maps, is the right-hand side of (3.1), which is therefore equal to its left-hand side. Hence by Definition 4.2, W is independent of Y for any law in the form given in the hypothesis. For X and Z we can proceed analogously.

B.2. Proof of equivalence of the notions of independence
Proof of Proposition 4.4. In Prob(C), f_1 and f_2 are independent in the sense of Franz with respect to the law s : 1 → P A if and only if there exists h : A → B_1 × B_2 such that the following diagram commutes: [diagram] where π_1 and π_2 are the projections of C, where the dotted arrows from 1, with a slight abuse of notation, denote Kleisli morphisms (s : 1 → P A, etcetera), and where r_1 and r_2 denote the resulting laws on B_1 and B_2. Now suppose that such an h exists. By the universal property of the product, it must necessarily be equal to (f_1, f_2). Therefore P(f_1, f_2) ∘ s = r_1 ⊗_∇ r_2 = ∇ ∘ (r_1 ⊗ r_2). Now using Proposition 4.1, [equation] so B_1 and B_2 are independent in the sense of Definition 4.2.

B.3. Monoidal structure of the Kantorovich monad
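In order to prove Proposition 5.4, we first prove a useful result.

Proposition B.1. Let f : X ⊗ Y → R be short, and let p ∈ P X. Then the map Y → R given by
\[ y \mapsto \int_X f(x, y) \, dp(x) \]
is short as well.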
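Proof of Proposition B.1. First of all, f : X ⊗ Y → R being short means that for every x, x' ∈ X, y, y' ∈ Y:
\[ |f(x, y) - f(x', y')| \le d_X(x, x') + d_Y(y, y'). \]
Now:
\[ \left| \int_X f(x, y) \, dp(x) - \int_X f(x, y') \, dp(x) \right| \le \int_X |f(x, y) - f(x, y')| \, dp(x) \le \int_X d_Y(y, y') \, dp(x) = d_Y(y, y'). \]

Proof of Proposition 5.4. To prove that ∇ is short, let p, p' ∈ P X, q, q' ∈ P Y. Then
\[
\begin{aligned}
d(\nabla(p, q), \nabla(p', q'))
&= \sup_f \left| \int f \, dp \, dq - \int f \, dp' \, dq' \right| \\
&\le \sup_f \left| \int f \, dp \, dq - \int f \, dp' \, dq \right| + \sup_f \left| \int f \, dp' \, dq - \int f \, dp' \, dq' \right| \\
&\le \sup_g \left| \int_X g \, dp - \int_X g \, dp' \right| + \sup_h \left| \int_Y h \, dq - \int_Y h \, dq' \right| \\
&= d(p, p') + d(q, q') = d\big((p, q), (p', q')\big),
\end{aligned}
\]
where f ranges over short maps X ⊗ Y → R, g over short maps X → R and h over short maps Y → R, and where by replacing the partial integrals of f by g and h we have used Proposition B.1.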
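Proof of Proposition 5.5. By symmetry, it suffices to show naturality in X. Let f : X → Z. We need to show that this diagram commutes: [diagram] Now let p ∈ P X, q ∈ P Y, and let g : Z ⊗ Y → R be short. Then
\[ \int_{Z \otimes Y} g \, d\big((f \otimes \mathrm{id})_* (p \otimes_\nabla q)\big) = \int_{X \otimes Y} g(f(x), y) \, dp(x) \, dq(y) = \int_{Z \otimes Y} g(z, y) \, d(f_* p)(z) \, dq(y) = \int_{Z \otimes Y} g \, d\big((f_* p) \otimes_\nabla q\big). \]

Proof of Proposition 5.6. Since both maps are natural, we only need to check the coherence diagrams. Since the unitor is just the identity at the terminal object, the unit diagrams commute. The associativity diagram at each X, Y, Z gives for (p, q, r) ∈ P X ⊗ P Y ⊗ P Z on one path
\[ \nabla_{X \otimes Y, Z} \circ (\nabla_{X,Y} \otimes \mathrm{id})(p, q, r) = (p \otimes_\nabla q) \otimes_\nabla r, \]
and on the other path
\[ \nabla_{X, Y \otimes Z} \circ (\mathrm{id} \otimes \nabla_{Y,Z})(p, q, r) = p \otimes_\nabla (q \otimes_\nabla r). \]
Now the product of probability distributions is associative, as a simple calculation shows. The symmetry condition is straightforward.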
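Proof of Proposition 5.7. We know that (P, id_1, ∇) is a lax monoidal functor. We need to check now that δ and E are monoidal natural transformations. Again we only need to show the commutativity with the multiplication, since the unitor is trivial. For δ : id_CMet ⇒ P we need to check that this diagram commutes for each X, Y: [diagram] which means that for each x ∈ X, y ∈ Y, δ_x ⊗_∇ δ_y = δ_{(x,y)}, which is easy to check (the delta over the product is the product of the deltas).

For E : P P ⇒ P we first need to find the multiplication map ∇^2_{X,Y} : P P X ⊗ P P Y → P P(X ⊗ Y) (the unit is just twice the deltas, and the unit diagram again trivially commutes). This map is given by
\[ PPX \otimes PPY \xrightarrow{\ \nabla_{PX,PY}\ } P(PX \otimes PY) \xrightarrow{\ (\nabla_{X,Y})_*\ } PP(X \otimes Y), \]
and more explicitly, if µ ∈ P P X, ν ∈ P P Y, and f : P(X ⊗ Y) → R,
\[ \int_{P(X \otimes Y)} f \, d\big(\nabla^2(\mu, \nu)\big) = \int_{PX \otimes PY} f(p \otimes_\nabla q) \, d\mu(p) \, d\nu(q). \]
Now we have to check that this map makes this multiplication diagram commute: [diagram] Now let µ ∈ P P X, ν ∈ P P Y, and g : X ⊗ Y → R. We have, using the formula for ∇^2 found above,
\[ \int_{X \otimes Y} g \, d\big(E \nabla^2(\mu, \nu)\big) = \int_{PX \otimes PY} \left( \int_{X \otimes Y} g \, d(p \otimes_\nabla q) \right) d\mu(p) \, d\nu(q) = \int g(x, y) \, dp(x) \, dq(y) \, d\mu(p) \, d\nu(q), \]
and
\[ \int_{X \otimes Y} g \, d\big(E\mu \otimes_\nabla E\nu\big) = \int_{X \otimes Y} g(x, y) \, d(E\mu)(x) \, d(E\nu)(y) = \int g(x, y) \, dp(x) \, dq(y) \, d\mu(p) \, d\nu(q). \]
Therefore the diagram commutes, and (P, δ, E) is a monoidal monad.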
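The compatibility of E with the product joint can be tested on small examples. The sketch below is a hedged illustration, assuming finitely supported measures stored as dictionaries and measures on P X stored as lists of (distribution, weight) pairs; the names `average`, `product_joint` and `nabla2` are hypothetical and chosen for this example.

```python
from fractions import Fraction
from collections import defaultdict

def product_joint(p, q):
    """Product joint of two finitely supported distributions."""
    return {(x, y): wx * wy for x, wx in p.items() for y, wy in q.items()}

def average(mu):
    """Multiplication E applied to a list of (distribution, weight) pairs."""
    out = defaultdict(Fraction)
    for p, w in mu:
        for x, v in p.items():
            out[x] += w * v
    return dict(out)

def nabla2(mu, nu):
    """Product of two random distributions: pair them up independently
    and form the product joint of each pair."""
    return [(product_joint(p, q), wp * wq) for p, wp in mu for q, wq in nu]

mu = [({0: Fraction(1)}, Fraction(1, 2)),
      ({0: Fraction(1, 2), 1: Fraction(1, 2)}, Fraction(1, 2))]
nu = [({"h": Fraction(1, 3), "t": Fraction(2, 3)}, Fraction(1, 4)),
      ({"t": Fraction(1)}, Fraction(3, 4))]

# Average of the products versus product of the averages.
average_of_products = average(nabla2(mu, nu))
product_of_averages = product_joint(average(mu), average(nu))
```

Both paths give the same measure on pairs, which is the finite counterpart of the commuting multiplication diagram for E.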

B.4. Opmonoidal structure of the Kantorovich monad
Just as in the case of joints, to prove Proposition 5.10 we first prove the following useful result.
Proposition B.2. Let f : X → R and g : Y → R be short. Then (f + g) : X ⊗ Y → R given by (x, y) ↦ f(x) + g(y) is short.
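Proof of Proposition B.2. Let x, x' ∈ X and y, y' ∈ Y. Then
\[ |(f + g)(x, y) - (f + g)(x', y')| = |f(x) - f(x') + g(y) - g(y')| \le |f(x) - f(x')| + |g(y) - g(y')| \le d_X(x, x') + d_Y(y, y'). \]

Proof of Proposition 5.10. To prove that ∆ is short, let p, q ∈ P(X ⊗ Y), and denote p_X, p_Y, q_X, q_Y their marginals. Then:
\[
\begin{aligned}
d(\Delta(p), \Delta(q))
&= d(p_X, q_X) + d(p_Y, q_Y) \\
&= \sup_f \left( \int_X f \, dp_X - \int_X f \, dq_X \right) + \sup_g \left( \int_Y g \, dp_Y - \int_Y g \, dq_Y \right) \\
&= \sup_{f, g} \left( \int_{X \otimes Y} (f(x) + g(y)) \, dp(x, y) - \int_{X \otimes Y} (f(x) + g(y)) \, dq(x, y) \right) \\
&\le \sup_h \left( \int_{X \otimes Y} h \, dp - \int_{X \otimes Y} h \, dq \right) = d(p, q),
\end{aligned}
\]
where the suprema range over short maps (the absolute values can be dropped because −f is short whenever f is), and where by replacing f + g with h we have used Proposition B.2.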
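The role of the metric on X ⊗ Y in Proposition B.2 can be seen on a small grid. The sketch below assumes, as the proof above suggests, that the distance on X ⊗ Y is the sum of the distances on the two factors, and compares it with the maximum metric of the cartesian product; the grid, the test functions and the helper names are choices made for this illustration.

```python
from itertools import product

def d_sum(a, b):
    """Sum metric on pairs of real numbers (assumed metric of X (x) Y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def d_max(a, b):
    """Maximum metric on pairs of real numbers (cartesian product)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def is_short(h, dist, points):
    """Check |h(a) - h(b)| <= dist(a, b) on all pairs of the given points."""
    return all(abs(h(a) - h(b)) <= dist(a, b) for a in points for b in points)

f = lambda x: x          # short map R -> R
g = abs                  # short map R -> R
h = lambda z: f(z[0]) + g(z[1])

grid = list(product(range(-2, 3), repeat=2))

short_for_sum = is_short(h, d_sum, grid)
short_for_max = is_short(h, d_max, grid)

# A witness for the failure with the maximum metric.
a, b = (0, 0), (1, 1)
witness = (abs(h(a) - h(b)), d_max(a, b), d_sum(a, b))
```

On this grid `f + g` passes the test for the sum metric and fails it for the maximum metric: between (0, 0) and (1, 1) its values differ by 2, while the maximum distance is only 1.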
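Proof of Proposition 5.11. By symmetry, it suffices to show naturality in X. Let f : X → Z. We need to show that this diagram commutes: [diagram] Let now p ∈ P(X ⊗ Y). We have to prove that:
\[ \Delta_{Z,Y}\big((f \otimes \mathrm{id})_* p\big) = (f_* \otimes \mathrm{id})\big(\Delta_{X,Y}(p)\big). \]
On one hand:
\[ (f_* \otimes \mathrm{id})\big(\Delta_{X,Y}(p)\big) = (f_* \otimes \mathrm{id})(p_X, p_Y) = (f_* p_X, p_Y). \]
On the other hand, let h : Z → R and g : Y → R be short. Then:
\[ \int_Z h(z) \, d\big(((f \otimes \mathrm{id})_* p)_Z\big)(z) = \int_{Z \otimes Y} h(z) \, d\big((f \otimes \mathrm{id})_* p\big)(z, y) = \int_{X \otimes Y} h(f(x)) \, dp(x, y) = \int_X h(f(x)) \, dp_X(x) = \int_Z h(z) \, d(f_* p_X)(z), \]
and:
\[ \int_Y g(y) \, d\big(((f \otimes \mathrm{id})_* p)_Y\big)(y) = \int_{Z \otimes Y} g(y) \, d\big((f \otimes \mathrm{id})_* p\big)(z, y) = \int_{X \otimes Y} g(y) \, dp(x, y) = \int_Y g(y) \, dp_Y(y), \]
so the two components are again (f_* p_X, p_Y).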
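Proof of Proposition 5.12. We already have naturality of the maps, and the counitor is trivial, so we just have to check coassociativity, namely that the following diagram commutes for each X, Y, Z: [diagram] Now given p ∈ P(X ⊗ Y ⊗ Z), we get:
\[ (\Delta_{X,Y} \otimes \mathrm{id}) \circ \Delta_{X \otimes Y, Z}(p) = (\Delta_{X,Y} \otimes \mathrm{id})(p_{XY}, p_Z) = (p_X, p_Y, p_Z), \]
and:
\[ (\mathrm{id} \otimes \Delta_{Y,Z}) \circ \Delta_{X, Y \otimes Z}(p) = (\mathrm{id} \otimes \Delta_{Y,Z})(p_X, p_{YZ}) = (p_X, p_Y, p_Z), \]
since there is only one way of forming marginals. The symmetry condition is again straightforward.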
Proof of Proposition 5.13. We know that (P, id_1, ∆) is an oplax monoidal functor. We need to check now that δ and E are comonoidal natural transformations. Again we only need to show the commutativity with the comultiplication, since the counitor is trivial. For δ : id_CMet ⇒ P we need to check that this diagram commutes for each X, Y: [diagram] which means that for each x ∈ X, y ∈ Y, (δ_{(x,y)})_X = δ_x and (δ_{(x,y)})_Y = δ_y, which is again easy to check (the marginals of a delta are the deltas at the projections).

For E : P P ⇒ P we first need to find the comultiplication map ∆^2_{X,Y} : P P(X ⊗ Y) → P P X ⊗ P P Y (the unit is just twice the deltas, and the unit diagram again trivially commutes). This map is given by:
\[ P(P(X \otimes Y)) \xrightarrow{\ (\Delta_{X,Y})_*\ } P(PX \otimes PY) \xrightarrow{\ \Delta_{PX,PY}\ } P(PX) \otimes P(PY), \]
and more explicitly, if µ ∈ P(P(X ⊗ Y)), and f : P X → R and g : P Y → R are short:
\[ \int_{PX} f(p) \, d\big((\Delta^2 \mu)_{PX}\big)(p) = \int_{P(X \otimes Y)} f(r_X) \, d\mu(r), \qquad \int_{PY} g(q) \, d\big((\Delta^2 \mu)_{PY}\big)(q) = \int_{P(X \otimes Y)} g(r_Y) \, d\mu(r). \]
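We have to check that this map makes this multiplication diagram commute: [diagram] Now let µ ∈ P(P(X ⊗ Y)), and f : X → R and g : Y → R short. We have, using the formula for ∆^2 found above:
\[ \int_X f \, d\big((E\mu)_X\big) = \int_{X \otimes Y} f(x) \, d(E\mu)(x, y) = \int_{P(X \otimes Y)} \left( \int_{X \otimes Y} f(x) \, dr(x, y) \right) d\mu(r) = \int_{P(X \otimes Y)} \left( \int_X f \, dr_X \right) d\mu(r) = \int_X f \, d\big(E((\Delta^2 \mu)_{PX})\big), \]
and analogously:
\[ \int_Y g \, d\big((E\mu)_Y\big) = \int_Y g \, d\big(E((\Delta^2 \mu)_{PY})\big), \]
which means:
\[ \Delta_{X,Y}(E\mu) = (E \otimes E)\big(\Delta^2_{X,Y}(\mu)\big). \]
Therefore the diagram commutes, and (P, δ, E) is an opmonoidal monad.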

B.5. Bimonoidal structure of the Kantorovich monad
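Proof of Proposition 5.14. We already know that P is lax and oplax. We only need to check the compatibility diagrams between the two structures. The unit diagrams are trivial, because the unitors are trivial. The bimonoidality diagram: [diagram] says that given p ∈ P(W ⊗ X), q ∈ P(Y ⊗ Z):
\[ \Delta_{W \otimes Y, X \otimes Z}(p \otimes_\nabla q) = (p_W \otimes_\nabla q_Y, \; p_X \otimes_\nabla q_Z), \]
where p ⊗_∇ q is regarded as a measure on W ⊗ Y ⊗ X ⊗ Z via the braiding. Now on one hand:
\[ (\nabla_{W,Y} \otimes \nabla_{X,Z}) \circ (\Delta_{W,X} \otimes \Delta_{Y,Z})(p, q) = (\nabla_{W,Y} \otimes \nabla_{X,Z})(p_W, q_Y, p_X, q_Z) = (p_W \otimes_\nabla q_Y, \; p_X \otimes_\nabla q_Z). \]
On the other hand: ∆_{W⊗Y,X⊗Z} ∘ ∇_{W⊗X,Y⊗Z}(p, q) = ∆_{W⊗Y,X⊗Z}(p ⊗_∇ q).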
Let f : W ⊗ Y → R be short. By Fubini's theorem, the marginal of p ⊗_∇ q on W ⊗ Y satisfies:
\[ \int_{W \otimes Y} f(w, y) \, d\big((p \otimes_\nabla q)_{WY}\big)(w, y) = \int_{W \otimes X \otimes Y \otimes Z} f(w, y) \, d(p \otimes_\nabla q)(w, x, y, z) = \int_{W \otimes X \otimes Y \otimes Z} f(w, y) \, dp(w, x) \, dq(y, z) = \int_{W \otimes Y} f(w, y) \, dp_W(w) \, dq_Y(y), \]
so it is given by p_W ⊗_∇ q_Y, and similarly the marginal on X ⊗ Z is given by p_X ⊗_∇ q_Z. In other words, if the pairs are independent, the components from different pairs are also independent. It follows that P is bilax monoidal.
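The bimonoidality computation can be replayed on finitely supported measures. The following sketch is an illustration only: points are stored as tuples, `product_joint` concatenates tuples, and `marginal` keeps the listed coordinates; these conventions and names are assumptions of the example.

```python
from fractions import Fraction
from collections import defaultdict

def product_joint(p, q):
    """Product measure on concatenated tuples: weights multiply."""
    return {a + b: wa * wb for a, wa in p.items() for b, wb in q.items()}

def marginal(r, keep):
    """Marginal of r on the coordinates listed in keep."""
    out = defaultdict(Fraction)
    for point, w in r.items():
        out[tuple(point[i] for i in keep)] += w
    return dict(out)

# p lives on W (x) X and is perfectly correlated; q lives on Y (x) Z.
p = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
q = {(0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

joint = product_joint(p, q)          # measure on tuples (w, x, y, z)

# Marginals across the two pairs are product measures.
wy_marginal = marginal(joint, (0, 2))
wy_product = product_joint(marginal(p, (0,)), marginal(q, (0,)))
xz_marginal = marginal(joint, (1, 3))
xz_product = product_joint(marginal(p, (1,)), marginal(q, (1,)))

# The correlation inside the first pair is untouched.
wx_marginal = marginal(joint, (0, 1))
wx_product = product_joint(marginal(p, (0,)), marginal(p, (1,)))
```

In this example the marginals on (w, y) and on (x, z) coincide with the corresponding products, while the marginal on (w, x) is `p` itself and is not a product measure.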