Back to the format

In probabilistic process algebras the classic qualitative description of process behaviour is enriched with quantitative information on it, usually modelled in terms of probabilistic weights and/or distributions over the qualitative behaviour. In this setting, we use behavioural equivalences to check whether two processes show exactly the same behaviour and, if this is not the case, we can use behavioural metrics to measure the distance between them. Compositional reasoning requires that equivalence, or closeness, of the behaviour of two processes is not destroyed when language operators are applied on top of them in order to build larger processes. Formally, the equivalence must be a congruence, and the metric must be uniformly continuous, with respect to the language operators. Instead of verifying these compositional properties by hand, operator-by-operator, it is much more convenient to prove them once and for all for a class of operators, and then to check that the operators one is dealing with belong to that class. This is achieved by means of SOS specification formats: sets of syntactical constraints, on the patterns of the SOS rules defining the operational semantics of languages, that characterise a class of operators. With this survey, we aim to collect and describe the specification formats that have been proposed in the literature to guarantee the compositional properties of (variants of) bisimulation equivalences and bisimulation metrics in the probabilistic setting.


Introduction
The Structural Operational Semantics (SOS) framework was introduced by Plotkin [100] to provide a model for process algebras [20] and, thus, to equip processes with an operational semantics, i.e., a way to describe their computation steps and the (possible) state-changes in a step-by-step fashion. Briefly, the operational semantics of processes is defined as a Labelled Transition System (LTS) [83], i.e., the set of transitions (each modelling a computation step) that is generated from a collection of operational rules. Those rules allow us to infer transitions from other transitions, according to the syntactic structure of processes.
Later on, following the findings of de Simone in his PhD thesis [105], the SOS framework has been successfully applied as a formal tool to devise a meta-theory of process algebras, i.e., to develop results that hold for classes of (process description, specification) languages. The most prominent example in this direction is the use of the SOS framework to favour compositional reasoning. Informally, we expect two processes that show the same behaviour to be inter-replaceable: if a system is specified as f(p, q), i.e., the composition of processes p, q via the language operator f, and q shows the same behaviour as process r, then the behaviour of f(p, r), i.e., the system in which we replace q with r, should be equal to that of f(p, q). In the process algebraic setting, to formalise whether two processes show the same behaviour we make use of behavioural equivalences. These are equivalence relations defined over LTSs that allow us to establish whether the behaviours of two processes are indistinguishable to their observers. In the literature, several notions of behavioural equivalence have been proposed, each based on how much we abstract from the information carried by the LTS: for instance, a classic behavioural equivalence is (strong) bisimilarity [95,100], which identifies two processes if they can mimic each other's transitions and reach processes that are, in turn, equivalent. The interested reader is referred to the taxonomies in [69,71] for an exhaustive presentation of behavioural equivalences. A behavioural equivalence that is compatible with all language operators, i.e., for which the property of inter-replaceability outlined above holds with respect to any operator f of the language, is called a congruence.
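To make the bisimulation idea concrete, the following Python sketch checks strong bisimilarity by naive partition refinement over a small LTS. The LTS (the classic pair a.(b + c) versus a.b + a.c), the state names, and the function are our own illustration, not material from the survey.

```python
# Naive partition refinement for strong bisimilarity (illustrative sketch).
# An LTS is a set of (source, label, target) triples over a set of states.

def bisimilarity_classes(states, transitions):
    """Partition `states` into strong-bisimilarity equivalence classes."""
    partition = [set(states)]            # coarsest partition: one block
    changed = True
    while changed:
        changed = False

        def signature(s):
            # For each move s --a--> t, record (a, index of t's block).
            return frozenset(
                (a, next(i for i, blk in enumerate(partition) if t in blk))
                for (src, a, t) in transitions if src == s
            )

        refined = []
        for block in partition:
            groups = {}
            for s in block:
                groups.setdefault(signature(s), set()).add(s)
            refined.extend(groups.values())
        if len(refined) != len(partition):
            changed = True
        partition = refined
    return partition

# p = a.(b.0 + c.0) and q = a.b.0 + a.c.0: trace-equivalent, not bisimilar.
T = {("p", "a", "p1"), ("p1", "b", "0"), ("p1", "c", "0"),
     ("q", "a", "q1"), ("q1", "b", "0"), ("q", "a", "q2"), ("q2", "c", "0")}
S = {"p", "p1", "q", "q1", "q2", "0"}
classes = bisimilarity_classes(S, T)
bisimilar = any({"p", "q"} <= block for block in classes)
```

Here `bisimilar` comes out false: after the a-step, p still offers both b and c, while q has already discarded one option, which is exactly the recursive mimicking requirement of bisimilarity.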
Given a behavioural equivalence and a language, one could prove the desired compositional property, like that of congruence, in a direct way, operator-by-operator. However, this approach is time consuming, and typically ends up in the development of long technical results that hold specifically for the considered language and, thus, cannot be reused to verify the same property on a different language. Since compositional results can be established from general properties of operators, it is much more convenient to identify the class of operators for which the desired property holds, and then check whether the operators of the considered language belong to that class. Rule formats were introduced with this exact purpose: they are a set of syntactical constraints over the operational rules ensuring that, if all the rules for the language satisfy them, then the desired property holds over the generated LTS. Rule formats become specification formats if some constraints are also imposed on the set of rules. In particular, rule and specification formats designed to guarantee the congruence property are also called congruence formats.
In the last three decades, a variety of studies on the compositional properties of processes have been conducted. We refer the interested reader to the survey papers [1,98] for a presentation of the rule formats proposed in the classic process algebra literature. However, the advances made in the last decade on the modelling and behavioural analysis of probabilistic processes have called for new results on compositional reasoning. This survey is then a natural follow-up to [1,98] that focuses on probabilistic process algebras.

This survey: formats and probability
In probabilistic process algebras, the classic qualitative description of process behaviour outlined above is enriched with quantitative, probabilistic, information on it. In particular, the transitions in an LTS have to be modified in order to take the probabilistic behaviour into account as well. However, following various interpretations of the interplay of nondeterminism and probability, as well as different perspectives on the probabilistic behaviour, in the literature we can find a wealth of proposals of probabilistic extensions of LTSs. To the best of our knowledge, there are only three probabilistic models for which specification formats have been proposed: the generative model (GPTS) [12,15,29], the reactive model (RPTS) [72,88], and the nondeterministic and probabilistic model (NPTS) [102]. Briefly, in GPTSs nondeterminism is replaced by probabilistic choices; in RPTSs the possible reactions of a process to an input from the external environment are determined probabilistically; in NPTSs each nondeterministic choice by a process induces a probability distribution over the next-state behaviour. Clearly, each model comes with its own syntactic presentation and, thus, with its own formats. Moreover, the same wealth of approaches can be found, even enriched, in the definition of behavioural equivalences over each model, where the choice of how probabilities are taken into account and compared plays a fundamental role. We refer the interested reader to [22,24,77] for presentations of the different notions of probabilistic equivalences. A common feature of behavioural equivalences in the classic (i.e., non-probabilistic) and in the probabilistic settings is that they relate processes that behave exactly the same. However, the information on probabilistic behaviour usually derives from statistical samplings, or measures on physical systems, and it is thus inevitably subject to errors and approximations. Consequently, one can be interested in knowing whether the behaviour of two processes is similar up-to some tolerance, or in evaluating how far apart the behaviour of two processes is. This led to the introduction of approximated equivalences (see, e.g., [3,52,53,67]) and behavioural metrics (see, e.g., [30,35,43,50,86]). The former quantify the differences in the probabilities to perform a single computation step. Conversely, the latter measure the differences arising along the entire computation.
Regarding compositional reasoning, what it means to be compatible with language operators now depends also on whether we are considering (approximated) equivalences or metrics. While for probabilistic equivalences this compatibility is still expressed in terms of congruence properties, in the case of approximated relations and metrics we need to ensure that the induced distances satisfy some uniform continuity properties with respect to the language operators. Informally, a small variation in the behaviour of processes should cause only a bounded variation in the behaviour of their composition: if process q differs by at most ε from process r, then the difference between f(p, q) and f(p, r) should be bounded by z(ε), for a proper function z, called modulus of continuity, which can correspond to non-extensiveness [9], non-expansiveness [51], or Lipschitz continuity [62].
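The three moduli of continuity mentioned above admit standard concrete shapes; the short Python sketch below (our own encoding, with the usual choices of z) just makes their ordering visible.

```python
# Standard moduli of continuity for an n-ary operator, written as functions
# of the list of distances eps = [eps_1, ..., eps_n] between arguments.

def non_extensive(eps):   # z(eps_1, ..., eps_n) = max_i eps_i
    return max(eps)

def non_expansive(eps):   # z(eps_1, ..., eps_n) = sum_i eps_i
    return sum(eps)

def lipschitz(L):         # z(eps_1, ..., eps_n) = L * sum_i eps_i
    return lambda eps: L * sum(eps)

# The bounds are increasingly permissive: max <= sum <= L * sum for L >= 1,
# so a non-extensive operator is also non-expansive, and a non-expansive
# operator is also Lipschitz continuous.
eps = [0.1, 0.05]
assert non_extensive(eps) <= non_expansive(eps) <= lipschitz(2)(eps)
```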
In this survey we collect and discuss the formats for probabilistic behavioural equivalences, approximated equivalences, and behavioural metrics, over GPTSs, RPTSs, and/or NPTSs, that have been proposed in the literature. Specifically, we present:
• Over GPTSs:
- Congruence format for probabilistic bisimilarity (Section 7.1).
- Non-expansiveness format for ε-bisimulations (Section 10).
• Over RPTSs:
- Congruence format for probabilistic bisimilarity (Section 7).
- Non-expansiveness format for ε-bisimulations (Section 10).
• Over NPTSs:
- Congruence formats for (bi)simulations and (rooted) branching bisimilarity (Section 8).
- Non-expansiveness format for ε-bisimulations (Section 11).
- Continuity formats for the bisimilarity metric (Sections 13 and 14).
Table 1 Summary of all the formats provided in this survey.
In Table 1 we present a recap of all the formats that will be discussed in the upcoming sections.

Organisation of contents
For the sole purpose of highlighting the various contents of this survey, we have divided it into five parts. Part I, which consists of Sections 2-6, includes all the background notions that are necessary for the presentation of the formats covered in the survey. In detail, Section 2 reviews all the basic notions of the SOS framework given for classic process algebras. Those notions are then extended to the probabilistic setting in Sections 3-6. In Section 3 we present the three probabilistic extensions of LTSs for which formats have been proposed: GPTSs, RPTSs, and NPTSs. The description of how these models are generated is then given in Section 4. Section 5 introduces all the equivalences, approximated equivalences, and metrics whose compositional properties, presented in Section 6, have been characterised by means of the formats discussed in the remainder of the survey.
Part II includes Sections 7-9, and covers the formats proposed to guarantee the congruence property of various (bi)simulation relations. Specifically, in Section 7 we present the congruence formats for bisimilarity over the GPTS and RPTS models. In Section 8, we consider the NPTS model and we review the congruence formats for (bi)simulation relations, and the congruence formats for (rooted) branching bisimilarity. Section 9 then recaps and compares the formats presented in this part.
Part III deals with approximated equivalences: Section 10 covers the non-expansiveness formats for ε-bisimulations over GPTSs and RPTSs, while Section 11 covers those for NPTSs. The part is concluded by Section 12, which highlights the principal features of the three formats.
Part IV consists of Sections 13 and 14, in which we present and compare all the continuity formats studied for the bisimilarity metric over NPTSs: the non-extensiveness, non-expansiveness, and Lipschitz continuity formats.
Part V concludes the paper with a brief discussion, in Section 15, on other probabilistic models existing in the literature, but for which no format has been proposed, and an informal presentation, in Section 16, of formats proposed outside of the process algebraic setting.

Process algebras and Structural Operational Semantics
In order to make our survey broadly accessible, and to keep it self-contained, we devote this section to a recap of standard notions of classic process algebras and of the structural operational semantics (SOS) framework.
Process algebras [20] are a classic tool for reasoning about the behaviour of concurrent, distributed systems. In this setting, systems are specified as processes, which are syntactically built as closed terms over an algebraic structure, called term algebra, which uses a set of operators to construct complex terms by combining simpler ones. The SOS framework [100] was introduced to provide a model for process algebras, by equipping processes with an operational semantics that describes their computation steps and the state-changes in a step-by-step fashion. This is achieved through the generation of a labelled transition system (LTS) [83] whose states are processes, and whose transitions between states are labelled with meta-variables, called actions, that abstract the computational steps inducing that state transformation.
Our order of business for the remainder of this section is to formally introduce these concepts.

Term algebras
A signature Σ is a set of operators (or function symbols), each characterised by an arity (or rank), i.e., a number of arguments. Operators of arity 0 are called constants, while operators of arity one and two are called unary and binary, respectively. We let f, g, h, … range over Σ.
Given a signature Σ and a countably infinite set V of process variables (ranged over by x, y, …), disjoint from Σ, the set T(Σ, V) of (open) process terms is the least set such that every variable x ∈ V is a term, and f(t₁, …, tₙ) is a term whenever f ∈ Σ has arity n and t₁, …, tₙ are terms. A process term is closed if it does not contain any variable. We let T(Σ) denote the subset of the closed process terms in T(Σ, V). Closed process terms are also called processes.
It is common in the literature to write f instead of f() when f ∈ Σ is a constant, and to use infix notation for binary operators, e.g., one writes t₁ + t₂ instead of +(t₁, t₂). Process terms will henceforth simply be called terms. The reason why V is called the set of process variables will become clear in Section 4.
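For concreteness, a term algebra can be sketched in a few lines of Python, with operator applications as nested tuples and bare strings standing for process variables; the toy CCS-like signature and the encoding are our own assumptions, not notation from the survey.

```python
# A signature maps operator names to arities; term() enforces the arity.
SIGNATURE = {"0": 0, "a.": 1, "+": 2}   # constant, unary prefix, binary choice

def term(op, *args):
    assert op in SIGNATURE and SIGNATURE[op] == len(args), "arity mismatch"
    return (op, *args)

def is_closed(t):
    """A term is closed iff it contains no variables (bare strings here)."""
    if isinstance(t, str):
        return False
    _, *args = t
    return all(is_closed(a) for a in args)

p = term("+", term("a.", term("0")), term("0"))   # a.0 + 0: a process
q = term("+", "x", term("0"))                     # x + 0: an open term
```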
Example 1. Assume a set of action names Act. These actions are meta-variables that describe, in an abstract fashion, what happens in one computation step, including the possible interactions of the process with the environment. Let Āct denote the set of action co-names, i.e., Āct = {ā | a ∈ Act}. As usual, we postulate that the co-name of ā is a itself, and that ā ≠ a, for all a ∈ Act. Then, we let A = Act ∪ Āct ∪ {τ}, for a special action name τ ∉ Act, known as the silent, or internal, action. Let α, β, … range over A. We remark that the co-name of τ is undefined.
The syntax of (recursion-free) Milner's Calculus of Communicating Systems (CCS) [96] over A is generated by the following grammar:

t ::= x | 0 | α.t | t[f] | t∖L | t + t | t ∥ t

where x ∈ V, α ∈ A, L ⊆ Act, and f : A → A is a relabelling function such that f(τ) = τ and f(ā) is the co-name of f(a), for all a ∈ Act. Hence, the signature of CCS consists of: (i) the constant 0; (ii) the unary operators action prefixing (α.·), relabelling (·[f]), and restriction (·∖L); (iii) and the binary operators nondeterministic alternative composition (· + ·) and parallel composition (· ∥ ·). These operators allow us to model the following process behaviours. 0 is the idling process, i.e., the process that cannot perform any action. Process α.t performs action α and then proceeds its computation as process t. Process t₁ + t₂ behaves either as t₁ or as t₂, discarding the process that is not chosen. The choice is nondeterministic in the sense that it is not possible to establish a priori which process will be selected (as the choice might depend on an interaction with the environment). The parallel components in t₁ ∥ t₂ can either interleave their behaviour, or they can synchronise (communicate) over actions a and ā. The result of a synchronisation is the execution of a τ action by t₁ ∥ t₂. We remark that, since the co-name of τ is undefined, communication cannot occur over silent actions. Relabelling assumes a function f : A → A (which behaves like the identity over τ and commutes with taking co-names), which is applied to the actions performed by the argument process. The restriction operator is used to inhibit actions of a process: t∖L behaves as t but the actions in L and their co-names are forbidden. Remark 1. When writing CCS terms, sometimes trailing occurrences of 0 will be omitted, i.e., we may simply write, e.g., α + t instead of α.0 + t, for any α ∈ A, t ∈ T(Σ, V).

Labelled transition systems
The semantics of processes is defined in terms of (labelled) transitions modelling the computation steps. A transition of the form t --α--> t′ expresses that process t performs a computation step that is described by the action-label α, and then proceeds its computation as specified by process t′. The collection of all the transitions describing the behaviour of processes is called a labelled transition system (LTS) [83].
Definition 2 (LTS, [83]). A labelled transition system (LTS) is a triple (T(Σ), A, -->), where T(Σ) is a set of processes (or states), A is a set of action labels, and --> ⊆ T(Σ) × A × T(Σ) is a transition relation. Following the notational conventions from the literature, we write t --α--> t′ for (t, α, t′) ∈ -->. Process t is called the source of the transition, whereas t′ is called the target. Moreover, for each t ∈ T(Σ) and α ∈ A, we write t --α--> if t --α--> t′ holds for some t′, and t -/α-> otherwise. To improve readability, and as common practice, in the examples we will present the behaviour of a process as the process graph, having processes as vertices and transitions as edges, that can be constructed from the LTS.

Transition system specifications
In order to show how an LTS is generated from the specification of processes, we need to introduce the notions of transition system specification and substitution.
Assume a signature Σ, a set of process variables V, and a set of action labels A.
A positive literal is an expression of the form t --α--> t′, and a negative literal is an expression of the form t -/α->, with t, t′ ∈ T(Σ, V) and α ∈ A.

Definition 3 (TSS, [75]). A SOS rule has the form H / c, where H is a set of literals, called the premises, and c is a positive literal, called the conclusion. A transition system specification (TSS) has the form (Σ, A, R), with R a set of rules.
Example 3. The SOS rules for the CCS operators introduced in Example 1 are reported in Fig. 2.
A substitution over V and Σ is a mapping σ : V → T(Σ, V) from variables to terms, and it is called a closed substitution if all variables are mapped to closed terms, i.e., σ(x) ∈ T(Σ) for all x ∈ V. Substitutions are extended from variables to terms, literals, and rules by element-wise application. In particular, a closed substitution instance of a literal (respectively, rule) is called a closed literal (respectively, closed rule).
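Encoding terms as nested tuples with bare strings for variables (an assumption of ours, not the survey's notation), element-wise application of a substitution is a one-line recursion:

```python
# Apply a substitution sigma (a dict from variables to terms) to a term.
def apply_subst(sigma, t):
    if isinstance(t, str):                  # a variable: look it up
        return sigma.get(t, t)
    op, *args = t
    return (op, *(apply_subst(sigma, a) for a in args))

sigma = {"x": ("a.", ("0",)), "y": ("0",)}   # a closed substitution
open_term = ("+", "x", "y")                  # the open term x + y
closed_term = apply_subst(sigma, open_term)  # a.0 + 0, now closed
```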
The intuitive meaning of SOS rules is that if the conditions expressed by the literals in the premises are met by the processes substituted for the terms occurring in them, then the closed substitution instance of the conclusion can be derived. Formally, an LTS is derived from a TSS through the notion of proof.

Definition 4 (Proof, [70]). A proof from a TSS T = (Σ, A, R) of a closed literal ℓ is a well-founded, upwardly branching tree, with nodes labelled by closed literals, such that the root is labelled ℓ and, if ℓ′ is the label of a node n and H is the set of labels of the nodes directly above n (i.e., the labels of the children of the node labelled ℓ′), then:
• either ℓ′ is positive and H / ℓ′ is a closed substitution instance of a rule in R,
• or ℓ′ is negative and for each closed substitution instance of a rule in R whose conclusion denies ℓ′, a literal in H denies one of its premises.
A closed literal ℓ is provable from T, notation T ⊢ ℓ, if there exists a proof from T of ℓ. The LTS induced by T has as transitions the closed literals that are provable from T.
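As a toy illustration of deriving transitions from rules, the Python sketch below hand-codes just two CCS rules, action prefixing (no premises) and nondeterministic choice (one premise per side); it is not a general TSS engine, it does not handle negative premises, and the term encoding is our own.

```python
# Derive all transitions of a closed term using two hard-wired SOS rules:
#   (prefix)  a.x --a--> x
#   (choice)  from x --a--> x' infer x + y --a--> x'  (and symmetrically)
def transitions(t):
    op = t[0]
    if op == "0":                       # nil: no rules, no transitions
        return set()
    if op == "prefix":
        _, a, x = t
        return {(a, x)}
    if op == "+":
        _, x, y = t
        return transitions(x) | transitions(y)
    raise ValueError(f"no rules for operator {op!r}")

p = ("+", ("prefix", "a", ("0",)), ("prefix", "b", ("0",)))   # a.0 + b.0
moves = sorted(transitions(p))   # [('a', ('0',)), ('b', ('0',))]
```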
V. Castiglioni, R. Lanotte and S. Tini

Remark 2. As noticed in [74], it might happen that, if some rules in the TSS have negative premises, it is not straightforward to associate an LTS to it, since, for instance, the TSS may allow us to derive transitions that deny each other. We refer the interested reader to [70] for a thorough discussion of this issue and of the possible solutions to it. We limit ourselves to remarking that all the TSSs that we consider in this survey are meaningful, and each one of them generates a unique model corresponding to the LTS whose transition relation contains exactly the closed positive literals that are provable from it.
The notion of disjoint extension for a TSS allows us to introduce new operators without affecting the behaviour of those already specified.

Definition 5 (Disjoint extension). A TSS T′ = (Σ′, A, R′) is a disjoint extension of a TSS T = (Σ, A, R) if Σ ⊆ Σ′, R ⊆ R′, and T′ introduces no new rule for any operator in Σ.

Behavioural equivalences
So far, we have seen that systems are specified as processes whose behaviour is modelled as an LTS, and how such an LTS is generated. A natural question is then whether the behaviour of a process equals the intended behaviour of the system. Hence, we need to establish a criterion to compare the behaviour of processes. Behavioural relations have been proposed with this exact purpose: abstract from unwanted details of process behaviour and identify those processes that interact "similarly" with the environment. They consist of behavioural equivalences and behavioural preorders, where a preorder is a relation that is reflexive and transitive, while an equivalence is a preorder that is also symmetric. Equivalences are usually used to establish whether two processes are behaviourally indistinguishable, whereas preorders are mostly used to establish process refinements with respect to behaviour.
Due to the large number of properties that may be relevant in the analysis of systems, many different theories of equivalences have been proposed. Essentially, two processes are considered equivalent if: (i) either they perform the same sequences of actions (trace approach); (ii) or they perform the same sequences of actions, and after each sequence they are ready to accept the same sets of actions (testing approach); (iii) or they perform the same sequences of actions, and after each sequence they exhibit, recursively, the same behaviour (bisimulation approach). Moreover, in an orthogonal way, the strong and weak approaches to equivalences differ in that only the former discriminates processes also on the basis of their ability to perform internal computation steps, such as communications between subsystems, that are not visible to external observers. For an exhaustive presentation of behavioural equivalences, preorders, and their relations, we refer the interested reader to van Glabbeek's linear time - branching time spectra [69,71].

Reasoning compositionally: congruences and formats
Equipping processes with a semantics is not the only application of the SOS framework. One of the main concerns in the development of a meta-theory of process languages is to guarantee their compositionality, namely to prove the compatibility of the language operators with the behavioural relation chosen for the application context. In algebraic terms, this compatibility is known as the congruence (resp. precongruence) property of the considered behavioural equivalence (resp. preorder). This property, formally introduced later in Definition 27, guarantees that the substitution of a subcomponent of the system with an equivalent one does not affect the behaviour of the system.
The SOS framework plays a crucial role in supporting compositional reasoning and verification: a rule format is a set of syntactical constraints over SOS rules ensuring the desired semantic properties of the transition system derived from them. A rule format becomes a specification format if some constraints are imposed also on the set of rules. Thus, one can prove useful results, such as the (pre)congruence property, for a whole class of languages at the same time. These formats are also referred to as congruence formats, and we refer the interested reader to the survey papers [1,98] for a presentation of existing formats for classic behavioural relations. For instance, the De Simone format [106] ensures that trace equivalence is a congruence; the GSOS format [27] works for bisimilarity; and in [26] the ntyft/ntyxt format [74] is restricted to the ready trace format and its variants to guarantee that decorated trace preorders are precongruences.
Example 4. As an example of a rule format, we present the Grand SOS (or simply GSOS) format of Bloom, Istrail, and Meyer [27]. Our choice fell on it because all the formats that we will discuss in this survey can be considered as extensions of this format to the probabilistic setting.
We say that an SOS rule is in GSOS format if it has the form

{ x_i --a_ij--> y_ij | 1 ≤ i ≤ n, 1 ≤ j ≤ m_i } ∪ { x_i -/b_ik-> | 1 ≤ i ≤ n, 1 ≤ k ≤ l_i }
――――――――――――――――――――――――――――――――――――――――
f(x_1, …, x_n) --a--> t

where: (i) n is the arity of f ∈ Σ; (ii) t ∈ T(Σ, V); (iii) m_i, l_i ∈ ℕ for each i = 1, …, n; (iv) the variables x_i and y_ij appearing in the premises are all distinct and are the only variables that occur in the rule. Notice that all the rules for CCS operators, presented in Fig. 2, are in GSOS format.
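One of the GSOS constraints, distinctness of the source variables x_i and the premise targets y_ij, is purely syntactic and easy to mechanise. The sketch below uses our own simplified rule encoding (positive premises only) to check it:

```python
# Check the GSOS variable constraint on a rule with source variables
# source_vars = [x_1, ..., x_n] and positive premises (x, label, y).
def gsos_variables_ok(source_vars, premises):
    targets = [y for (_x, _a, y) in premises]
    all_vars = list(source_vars) + targets
    # Premise sources must be source variables; all variables distinct.
    sources_ok = all(x in source_vars for (x, _a, _y) in premises)
    return sources_ok and len(all_vars) == len(set(all_vars))

# Rule for choice:  x1 --a--> y1  /  x1 + x2 --a--> y1  : well formed.
ok = gsos_variables_ok(["x1", "x2"], [("x1", "a", "y1")])
# A premise target clashing with a source variable violates the format.
bad = gsos_variables_ok(["x1", "x2"], [("x1", "a", "x2")])
```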

Models of probabilistic processes
Probabilistic process algebras [7,45,80] extend classic process algebras, by adding suitable operators allowing us to express probability distributions over sets of possible events or behaviours.
Example 5. We recall some operators from [6,7,17,42,47,49] that we will use throughout the survey to exemplify various definitions and results. The non-probabilistic operators are inherited from CCS (given in Example 1), Communicating Sequential Processes (CSP) [79], and the Algebra of Communicating Processes (ACP) [18,19]. Here we introduce these operators only informally; the SOS rules formalising their semantics will be given in the next sections. We remark that our aim is not to consider a single signature including all these operators, but rather to showcase the most common (and best known) operators from the literature, and use them in the upcoming examples.
Besides CCS's idling process 0, examples of constants are the ACP-like process ϵ, which performs action √ and terminates with success, and the ACP-like process a, which performs the action a ∈ A and will proceed as ϵ. This description assumes that √ is a special element in A denoting successful termination.
Probabilistic CCS-like/CSP-like action prefixing allows for composing an action with a probabilistic choice of processes. In detail, process α.⊕_{i=1..n} [p_i] t_i performs the action α and continues as process t_i with probability p_i. We may write α.(t₁ ⊕_p t₂) if n = 2, p₁ = p and p₂ = 1 − p. Then, we may write α.t₁ if n = 1 and p₁ = 1. In order to describe sequences of behaviours, in CCS and CSP we use action prefixing, whereas in ACP sequential composition is used: t₁ ; t₂ starts running as t₁ and will proceed as t₂ after the successful termination of t₁. Finite iteration t^n is used as syntactic sugar for t; … ; t, with n occurrences of t.
Infinite iteration t^ω lets an infinite number of copies of t run one after the other. However, if t can terminate with success, then t^n and t^ω can as well. The unary Kleene star process t* from [13] differs from t^ω in that it can terminate with success regardless of whether t can do the same or not.
The ACP-like merge t₁ ∥ t₂ is a form of parallel composition where the two processes t₁ and t₂ can interleave their actions and can also synchronise. In order to achieve synchronisation, we need a partial function × : A × A → A mapping some pairs of actions describing the steps of t₁ and t₂ to the action describing their synchronisation. If α₁ × α₂ is undefined, then the steps described by α₁ and α₂ cannot synchronise. In the CSP-like parallel composition t₁ ∥_B t₂ the processes t₁ and t₂ are forced to synchronise on actions in B and interleave actions not in B, where B is a subset of A. We remark that CSP-like parallel composition allows for multiparty communications, whereas in CCS communication is a two-party handshaking (cf. Fig. 2). The probabilistic interleaving t₁ ∥_ρ t₂ interleaves the actions of the process arguments t₁ and t₂ with respect to a probability weight ρ. In detail, whenever a given action α can be performed by both t₁ and t₂, then t₁ performs α with probability ρ and t₂ with probability 1 − ρ. If α can be performed only by t₁ (resp. t₂), then t₁ ∥_ρ t₂ performs the α-move of t₁ (resp. t₂). The probabilistic alternative composition t₁ +_ρ t₂ differs from t₁ ∥_ρ t₂ in that the process that does not move is discarded.
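The weighted combination performed by probabilistic interleaving can be sketched on distributions represented as Python dicts; the encoding and the helper below are our own illustration of the informal description, not the survey's SOS rules.

```python
# Next-state distribution of p1 ||_rho p2 after an action that both
# arguments can perform: p1 moves with weight rho, p2 with weight 1 - rho.
def interleave(mu1, mu2, p1, p2, rho):
    """mu1, mu2: next-state distributions of p1 and p2 for the action."""
    result = {}
    for s, w in mu1.items():                       # p1 moves, p2 stays
        result[(s, p2)] = result.get((s, p2), 0.0) + rho * w
    for s, w in mu2.items():                       # p2 moves, p1 stays
        result[(p1, s)] = result.get((p1, s), 0.0) + (1 - rho) * w
    return result

mu = interleave({"p'": 1.0}, {"q'": 1.0}, "p", "q", rho=0.3)
# mu assigns 0.3 to ("p'", "q") and 0.7 to ("p", "q'"), summing to 1.
```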
The replication ! t creates an arbitrary number of copies of argument t that will run in parallel. More precisely, the copies are combined by using the operator ∥. In its probabilistic version from [97], parameterised by a probability ρ, replication is realised with probability 1 − ρ. Clearly, replication and probabilistic replication may exploit a different version of parallel composition, without any conceptual difference.
The priority operator θ(·) allows the argument process to perform a given action α only if it cannot perform a higher-priority action β, whereas all other actions that can be performed by the argument process t can also be performed by θ(t).
As transitions are used to model the behaviour of processes, it is clear that, in the probabilistic setting, we need to modify the relations of the form t --α--> t′ in order to capture also the probabilistic behaviour. However, several approaches exist, each describing the probabilistic behaviour from a different perspective. Hence, in the literature we can find a wealth of proposals of probabilistic extensions of LTSs. In this section, we recall the three models of probabilistic processes for which rule formats (which are the crux of our survey) have been studied, namely the generative (Section 3.1), the reactive (Section 3.2), and the nondeterministic and probabilistic (Section 3.3) models. For completeness, the interested reader can find in Section 15 an informal description of the probabilistic models for which no rule format has been given in the literature.

The generative model of probabilistic processes
In the generative, or full, model of probabilistic processes [12,15,29,72,108], a single probability distribution is ascribed to all the moves of each process. In [72] the mechanistic view of the generative model is explained in terms of button pushing experiments. Intuitively, the environment, or observer, experiments on the process by attempting to depress the buttons that constitute its interface. Buttons correspond to action labels. Briefly, the observer tries to push all the buttons at the same time. The experiment succeeds if there are buttons that are unlocked and, in that case, the process decides, according to a prescribed probability distribution, which unlocked button goes down. Then, the process is ready for the next experiment.
In the generative model, computation steps are described by transitions of the form t --α,p,i--> t′, expressing that process t has probability p to perform an action described by α and to reach the process t′. As argued in [72], i is an index allowing us to distinguish t --α,p,i--> t′ from another transition from t to t′ labelled with α and having the same probability weight p. We use "{|" and "|}" as brackets for multisets.

Definition 6 (GPTS, [12,15,29]). A generative probabilistic transition system, or GPTS for short, is of the form (T(Σ), A, I, -->), where: (i) Σ is a signature, (ii) A is a set of action labels, (iii) I is a set of indexes, and (iv) --> ⊆ T(Σ) × A × (0, 1] × I × T(Σ) is a transition relation such that, for each process t ∈ T(Σ), the probabilities of the transitions leaving from t, if there are any, sum up to 1.

Definition 6 requires that each process t ∈ T(Σ) be semistochastic, namely that the probabilities of its outgoing transitions, if there are any, sum up to 1. Other definitions exist in the literature, such as those in [72,108], which admit that the sum of the probabilities of the transitions leaving from a process is a value 0 ≤ p < 1, the interpretation being that in that process deadlock has probability 1 − p.
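The semistochasticity requirement of Definition 6 is easy to check on a concrete transition relation; in the Python sketch below, transitions are (source, label, probability, index, target) tuples, an encoding of our own choosing.

```python
from math import isclose

def semistochastic(transitions):
    """True iff, for every source, outgoing probabilities sum to 1."""
    totals = {}
    for (src, _a, prob, _i, _tgt) in transitions:
        totals[src] = totals.get(src, 0.0) + prob
    return all(isclose(total, 1.0) for total in totals.values())

good = [("p", "a", 1/3, 0, "q"), ("p", "b", 2/3, 1, "r")]   # sums to 1
bad  = [("p", "a", 1/3, 0, "q")]                            # sums to 1/3
```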

We will write t --α,p--> t′ when the index i is irrelevant. As a further notation, for a set of action labels B ⊆ A and a value 0 ≤ p ≤ 1, we write t --B--> p if the probabilities of the transitions leaving from t with label in B sum up to p, whereas t -/B-> denotes that there is no transition with label in B leaving from t. Example 6. In Fig. 3a we depict the GPTS of process t = (a.b +_1/3 a.c) +_1/2 (a +_1/2 b).

The reactive model of probabilistic processes
In the reactive model of probabilistic processes [72,88], the kind of action of a process is chosen nondeterministically and, then, a probability distribution is ascribed to all the moves of the process labelled with that action. If we reason in terms of button pushing experiments, in the reactive case the observer selects one of the buttons and tries to depress it. The experiment succeeds if the selected button is unlocked. The choice of the button is completely under the control of the observer, and it is therefore nondeterministic from the point of view of the process. More precisely, for the process this is an external nondeterminism. If the selected button goes down, the process makes an internal state transition according to a probability distribution that is associated with the process and that button, and it is then ready for the next experiment.

Definition 7 (RPTS, [72,88]). A reactive probabilistic transition system, or RPTS for short, is of the form (T(Σ), A, I, -->), where: (i) Σ is a signature, (ii) A is a set of action labels, (iii) I is a set of indexes, and (iv) --> ⊆ T(Σ) × A × (0, 1] × I × T(Σ) is a transition relation such that, for all processes t ∈ T(Σ) and action labels α ∈ A, the probabilities of the α-labelled transitions leaving from t, if there are any, sum up to 1.

Definition 7 requires that, for each process, the probabilities of its outgoing transitions labelled with some action α ∈ A, if there are any, sum up to 1. This is known as the property of reactivity.
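Reactivity differs from semistochasticity only in the grouping: probabilities must sum to 1 per (process, action) pair rather than per process. A sketch, with transitions encoded, by our own choice, as (source, label, probability, index, target) tuples:

```python
from math import isclose

def reactive(transitions):
    """True iff outgoing probabilities sum to 1 per (source, action) pair."""
    totals = {}
    for (src, a, prob, _i, _tgt) in transitions:
        totals[(src, a)] = totals.get((src, a), 0.0) + prob
    return all(isclose(total, 1.0) for total in totals.values())

# p reacts to 'a' with two equiprobable moves and to 'b' deterministically;
# note that p's probabilities sum to 2 overall, so this is not a GPTS.
rpts = [("p", "a", 0.5, 0, "q"), ("p", "a", 0.5, 1, "r"),
        ("p", "b", 1.0, 0, "s")]
```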
Negative transition notations such as t ↛_B are defined for RPTSs as for GPTSs.

The nondeterministic and probabilistic model of processes
The nondeterministic and probabilistic model [40,102] can be viewed as a generalisation of the reactive model. Besides the nondeterministic choice of the kind of action performed by the process, there is also a nondeterministic choice among several probability distributions over the moves of the process that are associated with that action. Reasoning in terms of button pushing experiments, as in the reactive case the observer selects one of the buttons in the process interface. Again, for the process this is an external nondeterministic choice. If the button is unlocked and goes down, then the process selects one of the available reactions to that button pressing, namely there is an internal nondeterministic choice, and makes an internal state transition according to a probability distribution that is associated with the selected reaction.
The introduction of the nondeterministic and probabilistic model requires us to deal explicitly with distributions over processes, namely mappings π : T(Σ) → [0, 1] with Σ_{t ∈ T(Σ)} π(t) = 1. We use Δ(T(Σ)) to denote the set of all probability distributions over T(Σ), and we let π, π′, … range over Δ(T(Σ)).

Definition 8 (NPTS, [40]). A nondeterministic probabilistic transition system, or NPTS for short, is a triple of the form (T(Σ), A, →), where: (i) Σ is a signature, (ii) A is a set of action labels, and (iii) → ⊆ T(Σ) × A × Δ(T(Σ)) is a transition relation. We write t →_a π for (t, a, π) ∈ →, and t ↛_a if t →_a π holds for no distribution π.

Summarising, NPTSs allow us to model, at the same time, reactive behaviour, nondeterminism and probability: a process can react to several events coming from the environment, each represented by a label in A; then, for each event, there may be a nondeterministic choice between several reactions, namely a set of equally labelled transitions; finally each transition takes to a probabilistic choice over processes.
The NPTS model is known also as the Segala's model, since NPTSs were introduced in Segala's thesis [102] (and some previous works [101,103]).However, in this survey we adopt the equivalent process algebraic formulation proposed in [40], since it allows for a simpler, albeit general, formalisation.
In the following, we will consider particular distributions over processes:
• for a process t, δ_t is the Dirac (or point) distribution defined by δ_t(t) = 1 and δ_t(t′) = 0 for t′ ≠ t;
• for an operator f ∈ Σ of rank n and distributions π_1, …, π_n, f(π_1, …, π_n) is the distribution defined by f(π_1, …, π_n)(f(t_1, …, t_n)) = Π_{i=1}^{n} π_i(t_i), and assigning probability 0 to all other terms.
The support of a distribution π, i.e., the set of processes t such that π(t) > 0, will be denoted by supp(π).
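These constructions can be sketched directly on finite-support distributions represented as dictionaries. The encoding and names (dirac, lift, supp) are our own:

```python
from fractions import Fraction
from itertools import product

def dirac(t):
    """Dirac (point) distribution delta_t."""
    return {t: Fraction(1)}

def lift(f, *dists):
    """Lift an operator name f to distributions: the resulting distribution
    assigns to each term f(t1, ..., tn) the product of the Di(ti)."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        term = f"{f}({', '.join(t for t, _ in combo)})"
        p = Fraction(1)
        for _, q in combo:
            p *= q
        out[term] = out.get(term, Fraction(0)) + p
    return out

def supp(dist):
    """Support: the processes with positive probability."""
    return {t for t, p in dist.items() if p > 0}

d1 = {"s": Fraction(1, 2), "t": Fraction(1, 2)}
d2 = dirac("u")
print(lift("par", d1, d2))  # {'par(s, u)': 1/2, 'par(t, u)': 1/2}
print(supp(d2))             # {'u'}
```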

Remark 3. In this survey, we consider only distributions with finite support.
Example 8. In Fig. 3c we report the NPTS of a process. The distributions that are reached via the action-labelled transitions are represented by means of dotted arrows, labelled with the probability weight assigned to the target process. For brevity, we use • to denote a Dirac distribution.

Probabilistic transition system specifications
The theory of TSSs (see Section 2.3) has been extended to the probabilistic case. Clearly, the differences between the models presented in Section 3 are reflected in the corresponding specifications. We discuss the technical subtleties characterising them in Sections 4.1, 4.2 and 4.3.
Before entering into the details of those specifications, we remark that the basic syntactic notions introduced in Section 2, like those of process term, substitution, and rule, remain unchanged in the probabilistic setting.

Transition system specifications for the NPTS model
We start with the NPTS model, since the SOS rules for it are simpler than those for the other models. What makes these rules special is that they use specific terms allowing us to express distributions over terms. Indeed, we have seen above that the peculiarity of the NPTS model is that transitions take processes to distributions over processes. Hence, the literals constituting the premises and the conclusions of the SOS rules for this model should include a syntactic representation of distributions. To this end, distribution terms (or open distributions) have been introduced as expressions containing distribution variables that can be instantiated to distributions. The set of distribution variables will be denoted by V_d, disjoint from V_s and Σ, and ranged over by μ, ν, …. We will use ζ, ζ′, … to range over V_s ∪ V_d.

Definition 9 (Distribution term, [39,40,42]). For sets of process variables V_s and distribution variables V_d, the set DT(Σ, V_s, V_d) of the distribution terms over Σ, V_s and V_d, ranged over by Θ, Θ′, …, is the least set such that: (i) each distribution variable μ ∈ V_d is a distribution term; (ii) δ(t) is a distribution term for each process term t, denoting the Dirac distribution of the instance of t; (iii) Σ_{i ∈ I} p_i Θ_i is a distribution term for distribution terms Θ_i and weights p_i ∈ (0, 1] with Σ_{i ∈ I} p_i = 1; (iv) f(Θ_1, …, Θ_n) is a distribution term for each operator f ∈ Σ of rank n and distribution terms Θ_1, …, Θ_n.

Definition 10 (PSOS rule, [39,40,42,90]). A Probabilistic SOS rule, or PSOS rule, has the form:

  {x_i →_{a_i} μ_i | i ∈ I}   {x_j ↛_{b_j} | j ∈ J}
  ───────────────────────────────────────────────
              f(x_1, …, x_n) →_a Θ

The literals x_i →_{a_i} μ_i, x_j ↛_{b_j} and f(x_1, …, x_n) →_a Θ are called, respectively, positive premises (pprem(r)), negative premises (nprem(r)), and conclusion (conc(r)). The set of all premises is denoted by prem(r). Then, the term f(x_1, …, x_n) is the source (src(r)), the variables x_1, …, x_n are the source variables (x_i ∈ src(r)), and the distribution term Θ is the target (trgt(r)).
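Distribution terms and their instantiation can be sketched concretely. The tagged-tuple encoding below is our own illustration of Definition 9: a distribution term is a distribution variable, a Dirac δ(t), a convex combination, or an operator applied to distribution terms, and a substitution for its variables maps it to a concrete distribution.

```python
from fractions import Fraction

# Distribution terms encoded as tagged tuples (our own encoding):
#   ("var", mu)                  a distribution variable
#   ("dirac", t)                 the Dirac distribution of process term t
#   ("sum", [(p1, th1), ...])    a convex combination of distribution terms
#   ("op", f, [th1, ..., thn])   an operator applied to distribution terms

def instantiate(theta, subst):
    """Map a distribution term to a concrete distribution (a dict from
    closed terms to probabilities) under a substitution for its variables."""
    tag = theta[0]
    if tag == "var":
        return subst[theta[1]]
    if tag == "dirac":
        return {theta[1]: Fraction(1)}
    if tag == "sum":
        out = {}
        for p, sub in theta[1]:
            for t, q in instantiate(sub, subst).items():
                out[t] = out.get(t, Fraction(0)) + p * q
        return out
    if tag == "op":  # product distribution over the operator's arguments
        f, args = theta[1], theta[2]
        dists = [instantiate(a, subst) for a in args]
        out = {}
        def go(prefix, p, rest):
            if not rest:
                term = f"{f}({', '.join(prefix)})"
                out[term] = out.get(term, Fraction(0)) + p
            else:
                for t, q in rest[0].items():
                    go(prefix + [t], p * q, rest[1:])
        go([], Fraction(1), dists)
        return out

theta = ("sum", [(Fraction(1, 2), ("var", "mu")),
                 (Fraction(1, 2), ("dirac", "nil"))])
subst = {"mu": {"s": Fraction(1)}}
print(instantiate(theta, subst))  # {'s': 1/2, 'nil': 1/2}
```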
Example 11. All rules in Table 2 are PSOS rules. These rules specify the operators whose intuitive meaning was introduced in Example 5. We need to add some technical comments. The first rule for t^{n+1} formalises that n − m copies of t terminate with success, one copy performs a and m copies remain active. The second and third rules for the considered operator are similar and consider the cases m = 0 and m = n, respectively. Notice that the case m = n differs from the case m < n, since the ability of t to terminate with success is not among the premises of the rule. The fourth rule for t^{n+1} captures the case where all n + 1 copies of t terminate. The function × used in the ACP-like merge requires that the actions performed by the two arguments can be synchronised.

In order to derive an NPTS from a TSS, one can exploit the notion of supported model, which was given in the probabilistic setting in [39,40,42,90] and derived from the classic notion from [27,28,70]. In [39,40,42,90] the considered rules are more general than the PSOS rules in Definition 10. However, if we restrict to PSOS rules, the NPTS that is derived from a TSS contains precisely the closed literals that are provable according to Definition 4.

Transition rules for the generative model
The rules for the generative model are more complicated than those for NPTSs, since we have to specify how the probability weight and the index of the conclusion are derived from those of the positive premises.
Given sets K, H, L, J of indexes of the arguments of f, generative rules have the following shape.

Definition 11 (Generative rule, [87]). A generative rule r for an operator f ∈ Σ has the form:

  {x_k →_{a_k, p_k}^{i_k} y_k | k ∈ K}   {x_h →_{a_h} | h ∈ H}   {x_l →_{C_l, p′_l} | l ∈ L}   {x_j ↛_{B_j} | j ∈ J}
  ─────────────────────────────────────────────────────────────────────
                            f(x_1, …, x_n) →_{a, q}^{i} t

where the probability q and the index i of the conclusion are determined as explained below, and each rule r carries a weight w_r ∈ [0, 1]. The expressions x_k →_{a_k, p_k}^{i_k} y_k are called active premises. The expressions x_l →_{C_l, p′_l} are called unneeded premises, where "unneeded" indicates that their probability p′_l could be 0. The expressions x_h →_{a_h} are called unquantified premises, where "unquantified" indicates that their probability is not relevant. The expressions x_j ↛_{B_j} are called negative premises. The notions of conclusion, source, source variable and target are as for the rules in Definition 10.
Remark 4 (Double testing). Definition 11 does not admit double testing, namely, two (or more) active premises cannot test moves of the same process argument. In [87], the authors explain how this feature can be added to generative rules, but given the heavy amount of technical subtleties necessary to do that, we decided not to report it here. Notice that double testing is admitted by Definition 10.
In Definition 11 we assume that the set of indexes I contains an index i(r) for each rule r having a constant as source. Moreover, given any rule r with active variables {x_k | k ∈ K} and a set of indexes {i_k | k ∈ K} ⊆ I, we assume that (r, [i_k]_{k ∈ K}) is also an index in I.
Example 12. All rules in Table 3 and Table 4 are generative rules. They appear in two separate tables since the rules in Table 3 also work for the reactive case (presented in Definition 12 below). These rules specify operators from [4,12,29,41,67,72]. Even the simpler rules look heavier than those in Table 2, due to the probability weight and the index in the transitions. We note that in the generative case, prefixing does not introduce any probabilistic behaviour; operator +_p is used for this purpose. The rules for t_1; t_2 in Table 3 are similar to those in Table 2. The rules for finite iteration t^{n+1} in Table 4 are partitioned into two sets: one set contains the rules with a negative premise, which are applied if t cannot terminate with success; the other set contains the rules with an unquantified premise, which are applied when t can terminate with success. The first rule in Table 4 (actually, we have a rule for each action label a ∈ A ⧵ {√}) belongs to the first set and describes the case where t cannot terminate. If t can terminate, one applies the other two rules, which belong to the second set. If t terminates with probability p_1, then for any m ≥ 0 it holds that p_1^{n−m} is the probability that n − m copies of t terminate with success one after the other. If this event is realised for m = 0, namely n copies of t terminate, then the remaining copy of t can either perform a non-√ action or terminate. This is the situation formalised in the second rule for t^{n+1}. If that event is realised for m ≥ 1, then, if the first of the remaining copies of t performs a non-√ action, m copies of t remain active. This is the situation formalised in the third rule for t^{n+1}. The rules for t^ω and !t in Table 3 are similar to those in Table 2. In the generative case, the Kleene star, which is specified in Table 4, must be parametric with respect to a probability weight p, the idea from [17] being that t^{*,p} behaves as t with probability 1 − p (first two rules) and terminates with success with probability p (third rule). The rules for probabilistic alternative composition t_1 +_p t_2 and probabilistic interleaving t_1 ∥_p t_2 in Table 4 are partitioned into three sets, capturing the cases where both t_1 and t_2 can move, only t_1 can move, and only t_2 can move. Rules in the first set have positive premises for both arguments; rules in the second and third sets use negative premises. Clearly, given a particular instance of process arguments t_1 and t_2, only the rules in one of those three sets will be applicable. Notice that in the generative case t_1 +_p t_2 and t_1 ∥_p t_2 work differently with respect to the NPTS case formalised in Table 2. Essentially, in Table 2 the probabilistic choice applies only when both t_1 and t_2 intend to perform the same action, whereas in Table 4 the generative nature of probability imposes that, when both t_1 and t_2 can move, t_1 moves with probability p and t_2 with probability 1 − p, independently of the actions they intend to perform. Also the merge operator ∥ (Table 4) differs from its nondeterministic and probabilistic version. In detail, in the generative case we assume that a_1 × a_2 is always defined whenever the two actions can be performed by the arguments. Without such an assumption, the probability of all pairs of actions performed by t_1 and t_2 that cannot be combined by × should be redistributed to the allowed synchronisations, thus making calculations really tedious. However, the rules are anyway quite complicated, the reason being that a √ action by one process cannot synchronise with a non-√ action by the other, thus implying that the overall probability that the two processes attempt to synchronise a √ with a non-√ action should be redistributed to the probability of the allowed synchronisations. Such a normalisation of probability is done by exploiting unneeded premises, which allow us to quantify the overall probability of performing √ by the process arguments. The rules are partitioned into four sets, capturing the cases where both t_1 and t_2 can terminate with success, only t_1 can, only t_2 can, and neither can. The rules for relabelling and restriction are similar to those in Table 2, the only difference being that restriction t ⧵ B requires unneeded premises allowing for normalising the probability of transitions, by taking into account that transitions from t with a label in B are inhibited. Priority θ(t) requires both unquantified and unneeded premises: if the process argument t can perform the high-priority action a, then θ(t) cannot perform the low-priority action b, therefore the overall probability weight of the a-moves by t must be used for normalisation.

Table 4. Some operators of generative process algebras.

We notice that the alternative composition t_1 + t_2 cannot be a generative operator. Moreover, operator t_1 ∥_B t_2 could be defined, but whenever only one of the two processes performs an action in B, its probability should be redistributed to the allowed actions, thus requiring some complicated normalisation. Also CCS-like parallel composition t_1 ∥ t_2 may be defined, but that would require dealing with the following issue: since an a-transition can be performed autonomously or together with a complementary transition of the partner process, if synchronisation is possible then the probability of the a-transition should be partially assigned to the autonomous move and partially assigned to the handshaking.
We notice that unneeded premises are not required to derive the conclusion, meaning that p′_l could be 0 for some l ∈ L. Unneeded premises are used only to compute the probability weight of the conclusion. More precisely, they allow for normalising probability, which is necessary in several operators of generative process algebras, such as restriction and priority. Indeed, the probability of the conclusion is the product of w_r, which is the weight assigned to the rule r, by the conditional probability that each process argument with index k ∈ K performs a_k under the assumption that each process argument with index l ∈ L is not allowed to perform actions in C_l (note that, coherently, clause (iii) in Definition 11 ensures that a_k ∉ C_l when k = l). The choice of using sets of actions C_l instead of single actions in unneeded premises is mandatory, since there are cases where an unneeded premise mentioning a single action would not be sufficient for normalisation. For instance, if the process argument with index l has two different moves with labels in C_l, each with probability 1/6, then an unneeded premise mentioning only one of them would erroneously give rise to a normalisation factor 1 − 1/6, whereas an unneeded premise mentioning the whole set C_l takes into account the probability of all those moves and gives rise to the correct normalisation factor 1 − 2/6.
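The effect of normalisation can be illustrated on the restriction operator. The following is a minimal sketch under our own encoding: the probability blocked by the restricted labels is computed (the role played by unneeded premises) and the surviving transitions are renormalised.

```python
from fractions import Fraction

def restrict(transitions, B):
    """Transitions of t \\ B in the generative model: moves with a label in B
    are inhibited and the probabilities of the surviving moves are
    renormalised by the factor 1 - (probability of the inhibited moves)."""
    blocked = sum((p for (a, p, _) in transitions if a in B), Fraction(0))
    if blocked == 1:  # every move was inhibited: the process deadlocks
        return []
    return [(a, p / (1 - blocked), tgt)
            for (a, p, tgt) in transitions if a not in B]

t = [("a", Fraction(1, 2), "t1"),
     ("b", Fraction(1, 3), "t2"),
     ("c", Fraction(1, 6), "t3")]
print(restrict(t, {"c"}))  # a with prob 3/5, b with prob 2/5
```

Restricting {c} inhibits probability 1/6, so the normalisation factor is 5/6 and the result is again semistochastic.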
Unquantified premises do not contribute to computing the probability of the conclusion. They are necessary conditions for the application of rule r. We could replace unquantified premises by active premises: in fact, an unquantified premise x_h →_{a_h} behaves as an active premise x_h →_{a_h, p_h}^{i_h} y_h where neither p_h, nor i_h, nor y_h are mentioned in the conclusion. According to [87], distinguishing unquantified and active premises improves readability.

Also negative premises x_j ↛_{B_j}, with B_j a set of actions, abbreviate sets of negative premises, each mentioning only one action. The more compact notation was chosen for readability.

Transition rules for the reactive model
The notions of rule and TSS for the reactive case are similar to those for the generative case.
Definition 12 (Reactive rule, [87]). A reactive rule r for an operator f has the same shape as a generative rule but without unneeded premises:

  {x_k →_{a_k, p_k}^{i_k} y_k | k ∈ K}   {x_h →_{a_h} | h ∈ H}   {x_j ↛_{B_j} | j ∈ J}
  ─────────────────────────────────────────────────────────
                    f(x_1, …, x_n) →_{a, q}^{i} t

where: (i) K, H, J are sets of indexes for arguments of f; (ii) a_k, a_h, a ∈ A are action labels and B_j ⊆ A are sets of action labels; (iii) p_k is a variable with range (0, 1] and i_k is a variable with range I; (iv) t ∈ T(Σ) is a term; (v) w_r ∈ [0, 1] is called the weight of r.

Remark 5 (Double testing II).
As in the case of generative rules, Definition 12 does not admit double testing.Again, in [87] the authors explain how this feature can be added to reactive rules.
Since normalisation of probability is not required in the reactive case, Definition 12 does not need unneeded premises.Also in this case, the notions of supported transition and supported model for RTSSs are analogous to those presented for PSOS-TSSs.
Example 13. All rules in Table 3 and Table 5 are reactive rules. The rules for sequencing t_1; t_2, iterations t^ω, t^{n+1} and t^{*,p}, replication !t, restriction t ⧵ B and priority θ(t) are similar to those in Table 2, modulo the different shape of transitions. The rules for t^{n+1} differ from those in Table 2 and Table 4. For any action label a, the rules for t^{n+1} are partitioned into two sets, which are discriminated by the ability, or not, of t to perform √. In particular, if t can terminate, an arbitrary number m ∈ [0, n] of copies of t terminate before the subsequent copy of t makes the a-move. In order to guarantee the reactivity property, we assume that the probability that m copies of t terminate is 1/(n + 1), for each m ∈ [0, n]. In this way, the probability weight p of a transition t →_{a,p} t′ is partitioned into n + 1 probability weights p/(n + 1), assigned to transitions taking t^{n+1} to processes t′, t′; t, …, t′; t^n. For each action label a, the rules for probabilistic alternative composition t_1 +_p t_2 are partitioned into three sets: rules applied when both t_1 and t_2 can perform a, rules applied if only t_1 can perform a, and rules applied if only t_2 can perform a. Probabilistic interleaving is similar. Relabelling requires care in order to avoid that mapping two or more action labels to the same action label breaks the reactivity property: if the process argument t can perform precisely k actions that are mapped by f to the same b, then the probability weight p of each such transition is divided by k, so that the b-labelled transitions of the relabelled process sum up to 1. Notice that t_1 ∥ t_2 cannot be defined in any trivial way, since synchronisation over different actions would give rise to τ-transitions, thus implying that guaranteeing that the overall probability of τ-transitions is 1 is really complicated. Also t_1 ∥_B t_2 is not immediate to define, since actions not in B may be performed by t_1 with overall probability 1, and the same for t_2, thus implying that the probabilities of the B-transitions of t_1 and t_2 should be weighted in some way.

Behavioural equivalences and metrics
Now that we have seen how the behaviour of probabilistic processes is modelled, we can discuss the tools that have been proposed for its analysis.In this section we introduce some notions of probabilistic behavioural equivalence, approximate equivalence and metric over probabilistic processes.
As previously outlined, when working with classic process algebras, we can use behavioural equivalences to identify those processes that interact "similarly" with the environment. Probabilistic behavioural equivalences [21,25,31,48,88,102,103,117] are their extensions to probabilistic processes: they formalise when two processes are able to engage in the same interactions with the environment, with the same probability. Essentially, two processes are equivalent if they are indistinguishable by any external observer from both points of view on behaviour: qualitative and quantitative. However, it may happen that processes exhibit the same qualitative behaviours, but there are differences in the probability of their realisations, i.e., processes may have different probabilities to interact similarly with the environment. When using behavioural equivalences, even the tiniest difference in the probabilities causes the two processes to be considered totally unrelated. Approximate equivalences [3,4,52,53,67,112] and behavioural metrics [30,35,43,46,50,57,67,107] are less demanding than equivalences, since they relate two processes if they are approximately indistinguishable by any external observer. Technically, equivalences answer the question of whether two processes behave exactly in the same way or not, whereas approximate equivalences and behavioural metrics quantify the differences in their behaviours. Indeed, the approaches of approximate equivalences and metrics differ from each other. As we will see, in the former approach the distance between two processes is determined by the maximal difference that can be evaluated on a single computation step. Conversely, in the latter approach the behavioural distances arising along a computation path accumulate and are weighted by the probability of the realisation of that path.

Remark 6. The different approaches to behavioural relations discussed in Section 2.4 have been applied also to the probabilistic case, and spectra for
equivalences [21,25] and metrics [35] have been proposed.In the following, we will restrict our attention to the bisimulation approach, since it is the only one for which rule formats have been developed.

Strong (bi)simulation
We introduce the notion of probabilistic (strong) bisimulation [72,88,102]. The intuition is that this equivalence relates two processes if they are able to mimic each other's transitions and evolve to distributions that are, in turn, equivalent. This intuition is usually called the probabilistic bisimulation game.
The bisimulation game is formalised in different ways, depending on the probabilistic model that we consider. We start with the generative and the reactive models. To this end, we need to introduce the aggregate probability distribution function: μ(t, a, S) computes the total probability to reach a set of processes S, through transitions labelled with action a, from a process t. Adopting the convention that summation over the empty set is 0, this function can be defined as follows.

Definition 13 (Aggregate probability distribution function). Assume a GPTS, or RPTS, (T(Σ), A, I, →). Then, the aggregate probability distribution function μ : T(Σ) × A × 2^{T(Σ)} → [0, 1] is the function given by:

  μ(t, a, S) = Σ {| p | t →_{a,p}^i t′ for some i ∈ I and t′ ∈ S |}

Notice that the definition of μ works for both the generative and the reactive models of processes. We will write μ_G if we want to stress that we are dealing with a GPTS, and μ_R for an RPTS.

Remark 7. In the case of NPTSs, we implicitly have an aggregate distribution function for each transition.
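The multiset summation in Definition 13 can be sketched directly; the indexes ensure that equal weights are counted as many times as they occur. The encoding below is our own:

```python
from fractions import Fraction

def mu(transitions, t, a, S):
    """Aggregate probability distribution function: the total probability
    for process t to reach the set S via a-labelled transitions.
    Transitions are tuples (source, action, prob, index, target); distinct
    indexes make repeated weights count as in a multiset sum."""
    return sum((p for (src, b, p, _, tgt) in transitions
                if src == t and b == a and tgt in S), Fraction(0))

transitions = [
    ("s", "a", Fraction(1, 4), 1, "s1"),
    ("s", "a", Fraction(1, 4), 2, "s1"),  # same weight, different index
    ("s", "a", Fraction(1, 2), 3, "s2"),
]
print(mu(transitions, "s", "a", {"s1"}))        # 1/2
print(mu(transitions, "s", "a", {"s1", "s2"}))  # 1
```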

Definition 14 (Generative and reactive bisimulation). Assume a GPTS (T(Σ), A, I, →). An equivalence relation R ⊆ T(Σ) × T(Σ) is a generative bisimulation if, whenever (t_1, t_2) ∈ R, then μ_G(t_1, a, C) = μ_G(t_2, a, C) for all action labels a ∈ A and equivalence classes C of R. Reactive bisimulations over an RPTS are defined analogously, with μ_R in place of μ_G.

Generative bisimilarity, the union of all generative bisimulations, is, in turn, a generative bisimulation. We denote it by ∼_G, and write t_1 ∼_G t_2 for (t_1, t_2) ∈ ∼_G. Reactive bisimilarity, the union of all reactive bisimulations, is, in turn, a reactive bisimulation. We denote it by ∼_R, and write t_1 ∼_R t_2 for (t_1, t_2) ∈ ∼_R.
Notice that the only difference between the two notions of bisimulation presented above is in the probabilistic model over which they are defined: a GPTS for generative bisimulations, and an RPTS for the reactive ones.
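Bisimilarity classes in both models can be computed by partition refinement: a block is split whenever two of its processes disagree on the aggregate probability of reaching some block via some action. The following is a naive sketch under our own encoding, not the efficient algorithms of the literature:

```python
from fractions import Fraction

def bisimilarity_partition(states, transitions):
    """Partition states into probabilistic bisimilarity classes by naive
    refinement, for a system given as tuples (source, action, prob, index,
    target); the check uses the aggregate probability function mu."""
    def mu(t, a, block):
        return sum((p for (s, b, p, _, tgt) in transitions
                    if s == t and b == a and tgt in block), Fraction(0))
    actions = {a for (_, a, _, _, _) in transitions}
    partition = [set(states)]
    changed = True
    while changed:
        changed = False
        for block in list(partition):
            for a in actions:
                for splitter in list(partition):
                    groups = {}
                    for t in block:
                        groups.setdefault(mu(t, a, splitter), set()).add(t)
                    if len(groups) > 1:  # block disagrees: split it
                        partition.remove(block)
                        partition.extend(groups.values())
                        changed = True
                        break
                if changed:
                    break
            if changed:
                break
    return partition

trans = [("s", "a", Fraction(1, 2), 1, "u1"),
         ("s", "a", Fraction(1, 2), 2, "u2"),
         ("t", "a", Fraction(1, 1), 3, "u3")]
blocks = bisimilarity_partition(["s", "t", "u1", "u2", "u3"], trans)
print(blocks)  # s and t end up in the same class, the deadlocks in another
```

Here s splits its a-move over two deadlocked processes while t reaches one, yet both reach the class of deadlocks with probability 1, so they are bisimilar.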
If we consider the model of NPTSs, the notion of bisimulation stems from [102]. In order to present it formally, we need to introduce the lifting of a relation over processes to a relation over distributions over processes. To this end, we rely on the notion of matching, also known as measure coupling [116].
The notion of bisimulation for the NPTS model can be given together with those of two preorders, known as simulation and ready simulation.We opted for this choice since rule formats have been developed also for these preorders, and will be presented in Section 8.

Branching bisimulation
The special silent action label τ was introduced by Milner [96] to describe those computation steps that are not observable by the external environment, like internal communications between subsystems. The weak approach to equivalences abstracts from τ-labelled transitions, meaning that processes are not discriminated by their ability to perform them or not. We will introduce weak equivalences only for NPTSs, since the only rule formats provided in the literature for weak equivalences over probabilistic processes consider that model.
The weak approach is technically based on the notion of weak transition. Following [48], in order to introduce weak transitions in NPTSs one needs to lift the transition relation to a relation on distributions, called a hyper-transition in [91,92]. We assume that τ ∉ A and let A_τ = A ∪ {τ}. Moreover, we continue to use a, b, … to range over A, and we will use α, β, … to range over A_τ.
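The lifting can be sketched as follows (the encoding is our own): a distribution performs a hyper-transition when every process in its support performs an α-transition, and the target combines the chosen per-process target distributions, weighted by the source distribution.

```python
from fractions import Fraction

def hyper_step(delta, chosen):
    """Given a source distribution delta and, for each t in its support,
    the target distribution chosen[t] of some alpha-labelled transition of
    t, build the target of the hyper-transition:
    theta(t') = sum_t delta(t) * chosen[t](t')."""
    theta = {}
    for t, p in delta.items():
        for tgt, q in chosen[t].items():
            theta[tgt] = theta.get(tgt, Fraction(0)) + p * q
    return theta

delta = {"s": Fraction(1, 2), "t": Fraction(1, 2)}
chosen = {"s": {"u": Fraction(1)},
          "t": {"u": Fraction(1, 2), "v": Fraction(1, 2)}}
print(hyper_step(delta, chosen))  # {'u': 3/4, 'v': 1/4}
```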
Branching bisimulation [73] is well known to be the weak equivalence that best preserves the branching properties of processes. We recall here the notion of probabilistic branching bisimulation as presented in [36,37]. The proposed formulation is equivalent to the original, scheduler-free, version introduced in [8,91].
Definition 19 (Branching bisimulation, [36,37]). Given an NPTS (T(Σ), A_τ, →), a branching bisimulation is, roughly, a relation in which each transition of one process is mimicked by the other, possibly after some internal τ-steps traversing only related states.

As is well known from classic process algebras, to ensure compositionality with respect to nondeterministic choice operators, we need to impose the root condition, which requires that an initial τ-transition is mimicked by a τ-transition, as if it were an observable action.

Definition 20 (Rooted branching bisimulation). Assume an NPTS (T(Σ), A_τ, →).
The union of all (rooted) branching bisimulations is the greatest (rooted) branching bisimulation; it is denoted ≈_b (resp. ≈_rb) and called (rooted) branching bisimilarity. In NPTSs with no divergence, namely without any infinite sequence of τ-transitions, ≈_b and ≈_rb are equivalence relations [37].

Approximate bisimulation
Approximate bisimulations [2,4,5,52-55,112] are parametric with respect to a threshold ε ∈ [0, 1], representing the maximal "acceptable" difference between processes, and are thus also called ε-bisimulations. They relax the notion of bisimulation in that they relate two processes if their probabilities to reach a set of processes, that are in turn ε-bisimilar, differ by at most ε. Approximate bisimulations have been characterised in operational terms [67], by a modal logic [52,112], and in terms of games [52].
V. Castiglioni, R. Lanotte and S. Tini

Before giving formal definitions, we observe that two processes with quite different behaviours could be linked by a sequence of processes which, pairwise, have only a small behavioural difference. For instance, it may happen that both the behavioural distance between t_1 and t_2, and that between t_2 and t_3, are ε, whereas the distance between t_1 and t_3 is 2ε. Therefore, if one defines ε-bisimulations as equivalence relations, then there may be ε-bisimulations containing (t_1, t_2), and other ε-bisimulations containing (t_2, t_3), but none can contain both these pairs, thus implying that ε-bisimulations cannot be closed with respect to union, and no notion of "the greatest" ε-bisimulation can be given. Conversely, if we drop the transitivity requirement from the definition of bisimulations, then there may be ε-bisimulations containing both (t_1, t_2) and (t_2, t_3). Summarising, either we renounce the property of closure with respect to union for ε-bisimulations, or we renounce defining them as equivalence relations. The former is the approach in [3,4,53], where GPTSs and RPTSs are considered; the latter is the approach in [52,55,112], which deal with NPTSs.
We start by introducing approximate bisimulations for the generative and reactive models.
Analogously, assuming an RPTS (T(Σ), A, I, →), an equivalence relation R is a reactive ε-bisimulation if, whenever (t_1, t_2) ∈ R, then |μ_R(t_1, a, C) − μ_R(t_2, a, C)| ≤ ε for all action labels a ∈ A and equivalence classes C of R.

The idea behind Definition 21 is that if two processes are equated by some generative, or reactive, ε-bisimulation, then their behavioural distance is ≤ ε. We remark that, as in the case of Definition 14, while the formulations of the two notions of approximate bisimulation are the same, the aggregate distribution functions are evaluated over different probabilistic models: a GPTS in the case of generative ε-bisimulations, and an RPTS for the reactive ones. Since Definition 21 and Definition 14 coincide when ε = 0, the distance between processes equated by some generative or reactive bisimulation is 0. Technically, as anticipated above, ε-bisimulations defined as equivalences cannot be closed with respect to union, which implies that we cannot give any notion of "the greatest" generative, or reactive, ε-bisimulation.
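The ε-bisimulation condition on a given equivalence relation (represented as a partition) can be checked directly; with ε = 0 the check collapses to the exact condition of Definition 14. The encoding is our own sketch:

```python
def is_eps_bisimulation(partition, transitions, eps):
    """Check the approximate bisimulation condition: for all related t1, t2,
    all actions a and all equivalence classes C of the partition,
    |mu(t1, a, C) - mu(t2, a, C)| <= eps."""
    def mu(t, a, C):
        return sum(p for (s, b, p, _, tgt) in transitions
                   if s == t and b == a and tgt in C)
    actions = {a for (_, a, _, _, _) in transitions}
    for block in partition:
        for t1 in block:
            for t2 in block:
                for a in actions:
                    for C in partition:
                        if abs(mu(t1, a, C) - mu(t2, a, C)) > eps:
                            return False
    return True

# s reaches the class {u} with probability 0.9, t with probability 1.0.
transitions = [("s", "a", 0.9, 1, "u"), ("s", "a", 0.1, 2, "w"),
               ("t", "a", 1.0, 3, "u")]
partition = [{"s", "t"}, {"u"}, {"w"}]
print(is_eps_bisimulation(partition, transitions, 0.1))   # True
print(is_eps_bisimulation(partition, transitions, 0.05))  # False
```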

Bisimulation metric
The notion of bisimulation metric [30,46,50] is based on the idea that two processes can be at a distance ε < 1 only if they can mimic each other's transitions and evolve to distributions that are, in turn, at a distance of at most ε. These distances are defined as pseudometrics. We start by recalling the notion of pseudometric and its lifting to pseudometrics over distributions. In order to lift a pseudometric over processes to a pseudometric over distributions over processes, we rely on the notions of matching (Definition 15 above) and of Kantorovich lifting.
The Kantorovich distance K(d)(π, π′) can be understood as the minimum expected value of the ground distance d, over the supports of π and π′, with respect to the matchings ω of π and π′:

  K(d)(π, π′) = min_{ω ∈ Ω(π, π′)} Σ_{t, t′} ω(t, t′) · d(t, t′)
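In general, computing the minimum over matchings requires solving a small linear program. For the special case of the discrete (0/1) ground metric, however, the optimal matching keeps as much mass as possible on the diagonal, and the Kantorovich distance coincides with the total variation distance, which is straightforward to compute. The sketch below (our own encoding) covers exactly that special case:

```python
def kantorovich_discrete(mu, nu):
    """Kantorovich lifting of the discrete ground metric d(s, t) = 0 if
    s == t else 1. The optimal matching puts mass min(mu(s), nu(s)) on each
    diagonal pair (s, s), so the distance is the total variation distance
    1 - sum_s min(mu(s), nu(s)). The general case with an arbitrary ground
    metric requires a linear program over all matchings."""
    support = set(mu) | set(nu)
    return 1.0 - sum(min(mu.get(s, 0.0), nu.get(s, 0.0)) for s in support)

mu = {"s1": 0.5, "s2": 0.5}
nu = {"s1": 0.4, "s3": 0.6}
print(kantorovich_discrete(mu, nu))  # 1 - 0.4 = 0.6
```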
Bisimulation metrics are normally parametric with respect to a discount factor, allowing us to mitigate the distances of future transitions [44,50]. Informally, any difference that can be observed only after a long sequence of computation steps has less impact than the differences that can be witnessed at the beginning of the computation. In our context, the discount factor is a value λ ∈ (0, 1] and the distance arising at step n is mitigated by λ^n (λ = 1 means no discount).
Note that bisimulation metrics are pseudometrics by definition. In [46] it is proved that for each discount λ the smallest bisimulation metric exists; it is called the λ-bisimilarity metric and denoted by d_λ. The kernel of d_λ induces an equivalence relation that coincides with bisimilarity, i.e., processes are at distance 0 if and only if they are bisimilar.
Proposition 1 ([30,46]). For any t, t′ ∈ T(Σ), we have that d_λ(t, t′) = 0 if and only if t ∼ t′.

Remark 10. As elsewhere in the literature, we shall use the term bisimulation metric in place of bisimulation pseudometric. We recall that in a metric the requirement d(t, t) = 0 in Definition 23 is strengthened by requiring that d(t_1, t_2) = 0 if and only if t_1 = t_2. This explains why one needs pseudometrics and not metrics over processes: with a pseudometric, two processes that behave in the same way can be at distance 0; with a metric this is possible only if the two processes are syntactically the same.

Definition 26 (Bisimulation metric functional).
Let B : [0, 1]^{T(Σ)×T(Σ)} → [0, 1]^{T(Σ)×T(Σ)} be the functional defined, for all d : T(Σ) × T(Σ) → [0, 1] and t, t′ ∈ T(Σ), by:

  B(d)(t, t′) = max { sup_{t →_a π} inf_{t′ →_a π′} λ · K(d)(π, π′) , sup_{t′ →_a π′} inf_{t →_a π} λ · K(d)(π, π′) }

adopting the conventions inf ∅ = 1 and sup ∅ = 0. The bisimulation metric functional B can be interpreted as the quantitative analogue of the classic bisimulation game: the goal of the attacker is to maximise the distance between the two players, whereas the defender tries to minimise it. Consider processes t and t′, and choose a challenge t →_a π from the attacker t. The defender t′ is now allowed to look for the best possible answer t′ →_a π′, where by best answer we mean the one that minimises the (discounted) Kantorovich distance from π; this is formalised by the infimum over the a-labelled transitions from t′. The attacker is then allowed to look for the best possible challenge t →_a π, namely the one that maximises the Kantorovich distance between π and the closest distribution reached by the defender. This is formalised by means of the supremum over the transitions from t. To guarantee symmetry, the roles of the two processes are then swapped, and the worst case is considered, as given by the maximum between the obtained distances.
The bisimilarity metric d_λ, i.e. the smallest bisimulation metric, can then be characterised as the least fixed point of the bisimulation metric functional.

Proposition 3 ([46]). The bisimilarity metric d_λ is the least fixed point of B, i.e., B(d_λ) = d_λ.
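The fixed point can be approximated by Kleene iteration from the constant-0 pseudometric. The following is a minimal sketch on an example NPTS of our own; to keep the Kantorovich lifting trivial, every transition targets a Dirac distribution, for which K(d)(δ_s, δ_t) = d(s, t).

```python
LAMBDA = 0.5  # discount factor

# NPTS with Dirac targets: state -> list of (action, target state).
steps = {
    "s": [("a", "u")], "t": [("a", "v")],
    "u": [("b", "0")], "v": [],          # v cannot answer the b-challenge
    "0": [],
}

def B(d):
    """One application of the bisimulation metric functional."""
    def half(x, y):  # attacker plays from x, defender answers from y
        best = 0.0
        for (a, xt) in steps[x]:
            answers = [LAMBDA * d[(xt, yt)]
                       for (b, yt) in steps[y] if b == a]
            best = max(best, min(answers) if answers else 1.0)  # inf(empty) = 1
        return best
    return {(x, y): max(half(x, y), half(y, x)) for (x, y) in d}

states = list(steps)
d = {(x, y): 0.0 for x in states for y in states}
for _ in range(10):  # iterate towards the least fixed point
    d = B(d)
print(d[("u", "v")])  # 1.0: v has no b-move at all
print(d[("s", "t")])  # 0.5 = LAMBDA * d(u, v)
```

The iteration stabilises after two steps here: the difference witnessed one step in the future (between u and v) is discounted by λ when reflected back to s and t.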
Henceforth, in the examples, we will always tacitly assume the fixed-point characterisation of the bisimilarity metric, as it allows for a more intuitive evaluation of the distances.

Comparing approximate bisimulations and metrics
The following example shows the incomparability of ε-bisimulations with the bisimilarity metric. This is mainly due to the fact that ε-bisimulations are only as good as the worst approximation, while bisimulation metrics take into account all the differences encountered along the execution path.
Example 14. Consider processes s and t given by the first two NPTSs in Fig. 4. Some pairs of their derivatives cannot be related by any ε-bisimulation for small ε, while others can be related by all ε-bisimulations for all ε ≤ 1. We infer that s_1 and t_1, and thus s and t, are 0.5-bisimilar and cannot be ε-bisimilar for any ε < 0.5. However, even in the no-discount case λ = 1, processes s and t are at a bisimilarity distance that is smaller than their approximate bisimulation distance. In this case, the disparity is given by the fact that in the evaluation of d_λ the distance between the distributions π_1 = 0.5 δ_{s_3} + 0.5 δ_{s_4} and π_2 = 0.5 δ_{t_3} + 0.5 δ_{t_4} is weighted by the probability assigned to the pair (s_1, t_1) by the matching. Hence, the evaluation of the metric allows us to take into account the fact that these two distributions have a small probability to be reached, and thus their difference has a lesser impact on the overall distance between s and t. Consider now processes u and v given by the third NPTS in Fig. 4. A 0.1-bisimulation containing all pairs (u_i, v_i), with i = 1 … 4, and (u, v) can be given. On the other hand, computing the optimal matchings shows that, for λ = 1, the bisimilarity distance between u and v is greater than their approximate bisimulation distance. In this case, the disparity is given by the fact that the bisimilarity metric sums up the (weighted) differences between the distributions reached in the first step and the difference between the distributions 0.1 · δ_{u_1} + 0.9 · δ_{u_2} and 0.2 · δ_{v_1} + 0.8 · δ_{v_2}. Conversely, the approximate bisimulation distance considers these differences independently of one another.

Compositionality
In this section we introduce the properties that favour compositional reasoning in the probabilistic setting.
For the equivalence approach to semantics, we follow the same approach as for classic process algebras: compositionality consists in requiring that equivalent systems are inter-replaceable. Hence, if a system contains a system P1 as a sub-component, then the behaviour of the whole system should not change if P1 is replaced by an equivalent system P2. If we move to behavioural distances, then the requirement is that systems are approximately inter-replaceable: the behaviour of the whole system should undergo only a smooth and limited change if P1 is replaced by a system P2 whose behaviour is close to that of P1.
V. Castiglioni, R. Lanotte and S. Tini

In the case of behavioural equivalences, the intuition of inter-replaceability for processes is formalised by requiring that an equivalence relation satisfies the property of congruence.

Definition 27 (Congruence). We say that an equivalence relation ℛ over processes is a congruence with respect to an n-ary operator f if, whenever s_i ℛ t_i for all i ∈ {1, …, n}, then f(s_1, …, s_n) ℛ f(t_1, …, t_n).
In order to deal with behavioural distances, we need a quantitative analogue of Definition 27, expressing that a small variation in the behaviour of the components leads to a bounded, small variation in the behaviour of the composed processes. More precisely, the distances between the components and the distances between the composed processes should be related by some real-valued function enjoying a suitable continuity property. Technically, this allows us to establish that, given any operator f, if we fix a non-zero distance ε, understood as the admissible tolerance from the operational behaviour of a given composed process f(s_1, …, s_n), then there are non-zero distances ε_i such that the distance between the composed processes f(s_1, …, s_n) and f(s′_1, …, s′_n) is at most ε whenever each component s′_i is at distance at most ε_i from s_i. These intuitions are formalised by the notion of modulus of continuity for f and the considered distance.
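The requirement can be rendered in the standard form found in the literature; the display below is our hedged reconstruction (the symbols z, f, d, s_i are ours, chosen to match the surrounding discussion):

```latex
% Modulus of continuity (hedged reconstruction): z : [0,1]^n -> [0,1] is a
% modulus of continuity for the n-ary operator f w.r.t. the pseudometric d
% if z is continuous at (0,...,0), z(0,...,0) = 0, and
\[
  d\bigl(f(s_1,\dots,s_n),\, f(s'_1,\dots,s'_n)\bigr)
  \;\le\; z\bigl(d(s_1,s'_1),\dots,d(s_n,s'_n)\bigr)
  \quad\text{for all processes } s_i, s'_i .
\]
```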

Definition 29 (Uniformly continuous operator). An n-ary operator f is uniformly continuous with respect to a pseudometric d if it admits a modulus of continuity, i.e., a function z : [0, 1]^n → [0, 1] that is continuous at (0, …, 0), maps (0, …, 0) to 0, and bounds the distance between f(s_1, …, s_n) and f(s′_1, …, s′_n) by z(d(s_1, s′_1), …, d(s_n, s′_n)).
The reason why Definition 29 uses the uniform version of continuity, instead of the more general notion of continuity (technically, each ε_i depends only on the distance between s_i and s′_i, and is independent of the concrete processes s_i and s′_i), is that it allows for universal compositionality guarantees.
We notice that whenever an operator f is uniformly continuous with respect to a 1-bounded pseudometric d, the fact that the modulus of continuity maps (0, …, 0) to 0 ensures that the kernel of d is a congruence with respect to f. In general, the opposite implication does not hold.

The formats
The crux of this survey consists in presenting the rule formats that have been proposed in the literature for a systematic verification of the compositional properties introduced above.
In Section 10, we recall the formats from [109,110] ensuring that approximate generative and reactive bisimulations are non-expansive. The formats for the continuity properties of ε-bisimulations, introduced in [63], are then presented in Section 11.
Finally, in Section 13, we discuss the continuity formats proposed in [34] for the bisimilarity metric.

Congruence formats for generative and reactive bisimulations
In this section, we recall the formats ensuring the congruence properties for generative and reactive bisimulations.The results of the section stem from [17,87], and are listed in Table 6.

Table 6
Summary of the formats presented in Section 7.

Format definition Compositionality property
Generative bisimulation Definition 14

Generative bisimulation as a congruence
We start by showing that the pattern of the generative rules (Definition 11) cannot be extended in any trivial way without compromising the congruence property of generative bisimilarity.
Firstly, we argue that one cannot replace the variable appearing in the target of a positive premise with a more structured term, nor allow a source variable to appear more than once in the source of the rule, nor allow more than one operator to appear in the source of the rule. These requirements are not related to the use of probability, and were already known from the theory of bisimilarity for classic process algebras. We discuss them in the following example.
Example 15. Let us define constants c and c′, unary operators f, h1 and h2, and a binary operator g by means of the following rules: Only the rules for c and c′ are generative rules; the others violate Definition 11: in the rule for f the target of the premise is not a variable, in the rule for g the source variable appears more than once in the source of the rule, and in the last rule the source contains more than one operator. We can easily show that generative bisimulation is not a congruence with respect to the three operators f, g and h1. In detail, we have c ∼ c′ but f(c) ≁ f(c′); moreover, there is no rule for h2, thus implying that h2(p) has no transitions for any process p.

Then, the following example shows that, in generative rules, the probability weights appearing in premises must be variables that can be instantiated with arbitrary probability values, and cannot be constants. The motivation is that, by using constants, one could discriminate two processes that perform the same action with the same overall probability, but by means of transitions having different probability weights.
Example 16. Let us define an operator f by means of the following rule: The premise of the rule, which is not allowed in Definition 11, can be instantiated only with a transition having probability weight 1. Consider the processes a.0 and a.0 +δ a.0, with δ an arbitrary value in (0, 1). We have a.0 ∼ a.0 +δ a.0 but f(a.0) ≁ f(a.0 +δ a.0).

In order to ensure semistochasticity of processes, we need some restrictions on the set of the generative rules forming a TSS, namely we need a specification format. Specifically, one cannot consider an arbitrary set of generative rules respecting Definition 11: for instance, specifying a constant c by means of the following two rules would not be legal, since the overall probability mass of the transitions of the process c would be 2.
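The aggregation at play in Example 16 can be sketched as follows; this is a minimal illustration in our own notation (the transition representation and the names are assumptions, not the paper's): generative bisimilarity compares the overall probability of performing each action, not the weights of individual transitions.

```python
# Minimal sketch (our encoding): a process is a list of transitions
# (label, weight, target). Generative bisimilarity looks at the total
# weight per action label, so splitting one weight-1 transition into two
# transitions of weights delta and 1 - delta is indistinguishable.
from collections import defaultdict

def action_mass(transitions):
    """Total probability weight per action label."""
    mass = defaultdict(float)
    for label, weight, _target in transitions:
        mass[label] += weight
    return dict(mass)

delta = 0.3                                       # arbitrary value in (0, 1)
p = [("a", 1.0, "0")]                             # one weight-1 a-transition
q = [("a", delta, "0"), ("a", 1.0 - delta, "0")]  # same behaviour, split
```

Both processes assign total mass 1 to action a, so a rule premise that can only be instantiated with weight exactly 1 (as in the example) distinguishes processes that generative bisimilarity equates.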

For a set of generative rules R, let R_f denote the subset of the rules in R having f(x_1, …, x_n) as source.
Definition 30 (GTSS, [87]). A generative transition system specification, or GTSS for short, is formed by a set R of generative rules as in Definition 11 such that, for each operator f ∈ Σ, the set R_f of the rules in R for f is partitioned into (possibly infinite, but countable) sets R_f^1, …, R_f^i, …, such that:

1. Given two sets R_f^i ≠ R_f^j, there is an argument i of operator f on which their premises are mutually exclusive.

Example 17. The set of the rules in Table 3 and Table 4 gives a GTSS. As already commented in Example 12, the rules for finite iteration are partitioned into two sets, one characterised by the unquantified premise for √, the other characterised by the negative premise for the same label. The rules for the probabilistic choice and probabilistic parallel operators are partitioned into three subsets each: one with active premises for both arguments, one with the negative premise for the first argument, and one with the negative premise for the second argument. The rules for parallel composition are partitioned into four subsets, discriminated by the unquantified or negative premise for √ for the two arguments. The remaining rule set is partitioned into two sets, one containing rules with the unquantified premise for a given label, the other containing rules with the negative premise for the same label.
The following example shows that extending generative rules by allowing terms more general than variables in the sources of premises, in the style of the so-called ntyft rules of [75], would not be trivial at all.

Example 18.
Let us define the unary operators f and g by means of the following rules, where we have a rule r_{a,1} and a rule r_{a,2} for each action a ∈ A, and a rule r_{b,3} for each b ≠ a. Notice that π1, π2 ∈ (0, 1] are the weights for r_{a,1} and r_{a,2}. Assume two processes p1 and p2, both having transitions with overall probability 1. If g(p1) has no transitions, then the transitions of f(p1) can be derived by applying the rules r_{a,1} but not the rules r_{a,2}; therefore, f(p1) is semistochastic if and only if π1 = 1. If instead g(p2) has transitions, then the transitions of f(p2) can be derived by applying both the rules r_{a,1} and the rules r_{a,2}, thus implying that f(p2) is semistochastic only if π1 + π2 = 1. Summarising, in order for both f(p1) and f(p2) to be semistochastic, we would need π1 = 1 and π1 + π2 = 1 with π1, π2 ∈ (0, 1], which is impossible. In order to fix this issue, we should opt for rules with negative premises formalising that g(p) cannot move, and unquantified premises formalising that g(p) moves. This is doable, but it would entail a complex rewriting of clause 1 in Definition 30.
From the results in [17,87] one can infer that generative bisimilarity over processes specified by generative rules is a congruence. It is also shown in [87] that the constraints imposed by GTSSs ensure that processes are semistochastic. Those results were later extended to generative rules admitting double testing.

Theorem 1 ([87]). Assume the GPTS induced by a GTSS. Then, (i) all processes are semistochastic, and (ii) generative bisimilarity over the processes is a congruence.

Reactive bisimulation as a congruence
The GPTS derived from the rules in Examples 15 and 16 is also an RPTS. Therefore, the constraints on the pattern of the rules discussed in those examples are mandatory also in the case of reactive rules.
In order to ensure that Definition 7 is respected, namely that all processes enjoy the property of reactivity, we need some restrictions on the sets of rules forming a TSS. For instance, defining a constant c as in Equation (1), with the label of the second rule replaced by that of the first rule, would not be legal.

For a set of reactive rules R, let R_{f,a} denote the subset of the rules in R having f(x_1, …, x_n) as source and a as action.
Definition 31 (RTSS, [87]). A reactive transition system specification, or RTSS for short, is formed by a set R of reactive rules such that, for each operator f ∈ Σ and action a ∈ A, the set R_{f,a} is partitioned into (possibly infinite, but countable) sets R_{f,a}^1, …, R_{f,a}^i, … such that:

1. Given two sets R_{f,a}^i ≠ R_{f,a}^j, there is an argument i of operator f such that: (a) all rules in one of the two sets have a negative premise for which the rules in the other set have an active premise.

Clause 1 in Definition 31 ensures that the transitions of a process f(p_1, …, p_n) labelled a cannot be derived by rules in two different sets R_{f,a}^i and R_{f,a}^j. Then, clause 2 ensures that the probabilities of the a-labelled moves of f(p_1, …, p_n) that are derivable from any set R_{f,a}^i (if there are any) sum up to 1.

Example 19.
The set of the rules in Table 3 and Table 5 gives an RTSS.
Notice that clause 1 in Definition 30 guarantees mutual exclusion between any pair of sets of rules for f, whereas clause 1 in Definition 31 guarantees mutual exclusion between R_{f,a}^i and R_{f,b}^j only if a = b. The reason is that the probability of the moves of a process labelled a and the probability of the moves of the same process labelled b, with a ≠ b, are related only in the generative case.
The following result was proved in [87] also for reactive rules admitting double testing.

Theorem 2 ([87]). Assume the RPTS induced by an RTSS. Then, (i) for each process, the probabilities of its outgoing transitions with a given label, if any, sum up to 1, and (ii) reactive bisimilarity over the processes is a congruence.
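The reactivity property in item (i) of Theorem 2 is easy to check on a concrete transition relation. The sketch below uses our own representation (lists of (label, weight, target) triples), not the paper's:

```python
# Sketch of the reactivity check of Theorem 2(i): for every label carried
# by some outgoing transition, the weights of the transitions with that
# label must sum to 1. Representation and names are ours.
from collections import defaultdict

def is_reactive(transitions, tol=1e-9):
    per_label = defaultdict(float)
    for label, weight, _target in transitions:
        per_label[label] += weight
    return all(abs(total - 1.0) <= tol for total in per_label.values())
```

For instance, a process with two a-transitions of weights 0.5 and 0.5 and one b-transition of weight 1 is reactive, while one with a single a-transition of weight 0.5 is not.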

Congruence formats over NPTSs
In this section, we consider the model of nondeterministic and probabilistic processes and recall the formats ensuring the congruence property for strong (bi)similarities and for branching bisimilarity, developed in [17,33,40,42] and in [36,91], respectively; they are summarised in Table 7.

Definition 32 (PGSOS TSS). A PGSOS rule is a rule in which:

1. all process variables x_1, …, x_n are pairwise different, and
2. all distribution variables μ_{i,a}, for a ∈ A and i ∈ I_a, are pairwise different.

A PGSOS rule is called positive if it contains only positive premises. A PSOS TSS (Σ, A, R) is a (positive) PGSOS TSS if all the rules in R are (positive) PGSOS rules.

Table 7
Summary of the formats presented in Section 8.

Format definition
Compositionality property Bisimulation Definition 17
We now provide a simple example showing that, by relaxing the syntactical constraints in Definition 32, the congruence property can no longer be guaranteed. It also shows that the precongruence property of similarity is broken by negative premises. The example is obtained as a straightforward adaptation to the PSOS setting of a classic counterexample from the literature on nondeterministic processes (see, e.g., [75, Section 5]).
Example 20. Let us consider the operators f, g, h defined by the following PSOS rules: Notice that the rule for f violates item 1 in Definition 32 and the rule for g violates item 2. We proceed to show that bisimilarity is not a congruence for f and g. Consider two processes s and s′ that should be equated by all reasonable notions of equivalence. As expected, we have s ∼ s′. However, f(s, s) ≁ f(s, s′) and g(s, s) ≁ g(s, s′): indeed, the rule for f requires the distribution terms in the premises to be syntactically indistinguishable. Moreover, we show that the negative premise in the rule for h breaks similarity: for a process p we have p ⊑ p +0.5 q but h(p) ⋢ h(p +0.5 q).

Actually, in [40,42] the congruence result was proved for rules with a pattern more general than the PGSOS of Definition 32, namely the ntμfθ/ntμxθ format, which can be thought of as the probabilistic counterpart of the ntyft/ntyxt format from [74]. In detail, the symbols in ntμfθ/ntμxθ have the following meaning: (i) "n": negative premises are allowed; (ii) "tμ": premises have the form t -a→ μ for a generic term t; (iii) "f": the conclusion has the form f(x_1, …, x_n) -a→ θ; (iv) "x": the conclusion has the form x -a→ θ. We recall here a restricted version of those rules, called simple ntμfθ/ntμxθ rules in [63].

Definition 33 (Simple ntμfθ/ntμxθ-TSS, [40,42,63]). The simple ntμfθ rules and the simple ntμxθ rules are, respectively, of the following forms.

The rules in [40,42] also admit premises expressing that, intuitively, some variables appearing in a term t_h can be in the support of a distribution variable μ_i, which allows for inspecting several subsequent transitions of a process argument.

Branching bisimulation
Two proposals of congruence formats for (rooted) branching bisimilarity can be found in [36,91]. Although the sets of syntactical constraints characterising the two formats are similar, the approaches differ in the construction of the PSOS rules. Roughly, in [91] terms are built over a two-sorted signature, mixing state and distribution terms, whereas in [36] the two kinds of terms are kept distinct. In agreement with the rest of the paper, we use the notation of [36].
The rule formats for (rooted) branching bisimilarity rely on a labelling of the arguments of the operators as liquid or frozen with respect to two predicates, denoted Λ and ℵ, which are inherited from the formats for weak equivalences proposed in the non-probabilistic setting [58,59]. The predicate Λ labels as liquid the running process arguments, namely those that may have already started their execution, and as frozen the others. Then, ℵ marks as liquid the process arguments that can start their execution immediately, and as frozen those that cannot. We now show how this intuition applies to the operators in Table 2.
Example 21. Both arguments of p ; p′ should be marked as ℵ-liquid, since in a given process p ; p′ both p and p′ are able to start running immediately. Then, the first argument is Λ-liquid and the second Λ-frozen, since only p may have already performed some transitions. The argument of the iteration and replication operators is Λ-frozen, as it is not currently running, and ℵ-liquid, since it can start running right away. Both arguments of the operator p + p′ are Λ-frozen, since in a given process p + p′ neither p nor p′ can have already performed any transition, and ℵ-liquid, since both p and p′ are able to start running immediately. The labelling of the arguments of p +π p′ is similar. Then, the arguments p_i of a. ⨁_{i=1..n} [π_i] p_i have neither performed any action yet, nor can they start their execution immediately (since the action a has to be performed first). Hence, they are marked Λ-frozen and ℵ-frozen. Conversely, both arguments of the parallel composition operators are Λ-liquid and ℵ-liquid: in fact, they may have already performed some computation steps, and they can also proceed with their execution at once. Finally, the arguments of the relabelling, restriction and remaining unary operators are Λ-liquid and ℵ-liquid.
Informally, in the construction of the format, we can use ℵ to distinguish the arguments of an operator  that cannot be tested in the premises of the rules for  (i.e., those labelled as ℵ-frozen), from the ones that may be tested (i.e., those labelled as ℵ-liquid).Then, we can use the marking Λ over the ℵ-liquid arguments to distinguish those for which the root condition must hold (i.e., the Λ-frozen ones), from those that are not required to always satisfy it (i.e., the Λ-liquid ones).
From this characterisation, albeit intuitive, we can infer that the testing for τ-moves of arguments that are marked as both Λ-liquid and ℵ-liquid requires care. In fact, since the root condition is not required to hold for these arguments, we need to ensure that the presence (or absence) of a τ-transition by them does not entail a particular behaviour of the process in which they occur.
We have that f(p) ≉_b f(p′) (and, thus, f(a.p) ≉_rb f(a.p′)). For this reason, we require τ-moves by (Λ ∩ ℵ)-liquid arguments to be tested only in patience rules, namely rules allowing a process to mimic the τ-moves of its arguments without modifying its structure.

Definition 34 (Patience rule). A patience rule for argument i of f is a PGSOS rule whose only premise is the τ-premise x_i -τ→ μ, and whose conclusion lets f(x_1, …, x_n) perform τ and evolve by replacing x_i with μ.

In Definition 35 below, we formalise the "liquid"/"frozen" marking of the arguments of operators, and we show how it can be extended to the variables occurring in (distribution) terms.

Definition 35 (Liquid and frozen arguments of operators, [26,36,59]). Let Γ ∈ {ℵ, Λ} be a unary predicate on {(f, i) | f ∈ Σ, 1 ≤ i ≤ n}. If Γ(f, i) then argument i of f is labelled as Γ-liquid, otherwise it is Γ-frozen. The labelling extends to the occurrences of variables x and μ in a term t and a distribution term θ; the occurrences of x and μ that are not Γ-liquid in t or θ are Γ-frozen.

We say that a predicate Γ is universal if it holds for all arguments of all operators in the signature.
We now have all the ingredients necessary to introduce the specification formats for branching bisimilarity and its rooted version, i.e., the PBB-safe format and the PRBB-safe format, respectively. These can be thought of either as extensions of the PGSOS format capturing the considered weak semantics, or as probabilistic versions of the BB-safe and the RBB-safe formats, respectively, proposed for those equivalences in the non-probabilistic setting [58,59].
Definition 36 (PRBB-safe TSS, [36]). A PGSOS rule r is probabilistic rooted branching bisimulation safe (PRBB-safe) with respect to predicates ℵ and Λ if:

1. Right-hand sides of positive premises occur only Λ-liquid in the target.
2. If x occurs only Λ-liquid in the source, then x occurs only Λ-liquid in the target.
3. If x occurs only ℵ-frozen in the source, then x does not occur in any premise.
4. If x has exactly one ℵ-liquid occurrence in the source, which is also Λ-liquid, then: (a) x has at most one occurrence in the positive premises, (b) x does not occur in any negative premise, and (c) if x occurs in a premise labelled τ, then r must be a patience rule for some argument of the source.
A TSS is PRBB-safe for some predicates ℵ and Λ, if it has the patience rules for all (ℵ ∩ Λ)-liquid arguments and it only contains PRBB-safe rules.We say that it is PBB-safe if, moreover, predicate Λ is universal.
In item 1, since a premise t -a→ μ implies that the processes in the support of any closed instance of μ are running, it is required that μ occurs in the target only at Λ-liquid positions, which are those hosting running processes. Item 2 guarantees that running processes, namely those marked as liquid by Λ, maintain their mark and thus the possibility to run. Item 3 prevents the testing of processes that cannot execute, namely those at ℵ-frozen positions in the source. Item 4 regulates the testing of processes at (ℵ ∩ Λ)-liquid positions, namely those for which ≈_b holds but the root condition of ≈_rb is not guaranteed. The following example explains clauses 4a and 4b of item 4 in detail; clause 4c was already discussed in Example 22.
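Clause 3 of Definition 36, for instance, is a purely syntactic check. The following toy sketch (our encoding of rules, not the paper's) verifies that a variable occurring only at ℵ-frozen positions of the source is never tested in a premise:

```python
# Toy check of clause 3 of Definition 36 (our encoding): variables that
# occur only aleph-frozen in the source must not be tested by any premise.

def respects_clause3(frozen_only_vars, tested_vars):
    """frozen_only_vars: set of source variables occurring only at
    aleph-frozen argument positions; tested_vars: variables tested by the
    rule's premises."""
    return frozen_only_vars.isdisjoint(set(tested_vars))
```

E.g. a rule testing only x1 respects the clause when x2 is the only ℵ-frozen-only variable, while a rule also testing x2 violates it.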
If we exclude from Table 2 the rules for the two operators considered above, then we obtain a PRBB-safe TSS by using the labelling in Example 21. Notably, these two operators break the equivalence. This has already been shown for the first in Example 23. We can show that this holds also for the second: for instance, for suitable CCS processes p and p′ we get

Formats highlights
In this part we have discussed the formats for probabilistic bisimulation-based relations over the three models.
The PGSOS format (Definition 32) for strong (bi)similarity over NPTSs is indeed the most intuitive and simplest of the ones provided.In fact, it is obtained as a straightforward translation of the classic GSOS format onto NPTSs.Similarly, the PRBB-safe format (Definition 36) for rooted branching bisimilarity can be seen as the probabilistic version of the RBB-safe format that ensures the congruence property of the considered equivalence over nondeterministic languages.Specifically, the PRBB-safe format is based on two predicates and a liquid/frozen marking of arguments of operators that, as in the classic case, allows us to distinguish the transitions that should be treated as visible from the silent ones.Moreover, the PBB-safe format for branching bisimilarity is obtained from the PRBB-safe format by requiring the predicate Λ to be universal, exactly as the BB-safe format is obtained from the RBB-safe format.
The similarities between the classic and the probabilistic formats in these cases are due to the features of NPTSs: in this model, the probabilistic behaviour is expressed entirely by the distribution terms, i.e., the targets of transitions. Indeed, the cases of GPTSs and RPTSs are different, as each transition is assigned a probability weight. Hence, the main technical difficulty in the definition of the GTSS (Definition 30) and RTSS (Definition 31) formats for strong bisimilarity consists in guaranteeing the semistochasticity, or reactivity, of processes. For this reason, in both formats the rules for an operator are partitioned into mutually exclusive derivation sets such that the weights of the transitions derivable from each partition sum to 1. However, in an RTSS the rules are partitioned not only according to the operator in the source, but also according to the label of the conclusion. In fact, in the reactive case the probabilities of the a-labelled moves of a process sum either to 0 or to 1.

Table 8
Summary of the formats presented in Section 10.

Format definition | Compositionality property
Generative ε-bisimulation (Definition 21) |
𝗮𝗿𝗯-safe RTSS (Definition 40) | Non-expansiveness, Theorem 6 [110]

Part 3. Formats for approximated equivalences

10. Non-expansiveness formats for approximate generative and reactive bisimulations
In this section, we recall the formats from [110] ensuring that both approximate generative bisimulations and approximate reactive bisimulations are non-expansive. See also Table 8 for a summary. Other forms of continuity, such as Lipschitz continuity, were not considered in [110]. We observe that in both cases, the generative and the reactive one, any format guaranteeing non-expansiveness of ε-bisimulations also guarantees congruence of bisimulation, since the kernel of an ε-bisimulation is a bisimulation. Therefore, the constraints imposed by formats for approximate bisimulations should be at least as strong as those for bisimulations.

Non-expansiveness of approximate generative bisimulation
We start by recalling the notion of approximate generative bisimulation safe (ε-safe) rule. Essentially, ε-safe rules are generative rules that: (i) admit only a limited form of unquantified and negative premises, (ii) do not admit duplication of variables, i.e., the arguments of operators and their derivatives cannot be duplicated in the target of the rule, and (iii) force the derivatives of source variables to appear in the target.
Definition 37 (ε-safe rule, [110]). A generative rule as in Definition 11 is ε-safe if:

1. unquantified and negative premises can be labelled only by the full action set,
2. for each index i ∈ {1, …, n} ⧵ I, at most one occurrence of the source variable x_i appears in the target t,
3. for each index i ∈ I, no occurrence of the source variable x_i appears in t,
4. for each index i ∈ I, exactly one occurrence of the derivative y_i of the source variable x_i appears in t.
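The multiplicity constraints of clauses 2-4 can be mechanised over a toy term representation; the encoding below (terms as nested tuples, with our names for the tested index set and the derivative map) is an illustrative assumption, not the paper's formalisation:

```python
# Sketch of the multiplicity side-conditions of Definition 37, clauses 2-4
# (our encoding): a term is a variable (str) or a tuple (operator, subterms).

def occurrences(term, var):
    """Number of occurrences of variable var in a term."""
    if isinstance(term, str):
        return 1 if term == var else 0
    return sum(occurrences(sub, var) for sub in term[1:])

def eps_safe_target(target, tested, untested, derivative):
    """tested: source variables with an active premise (clauses 3-4: the
    variable must not occur in the target, its derivative exactly once);
    untested: remaining source variables (clause 2: at most once)."""
    return (all(occurrences(target, x) == 0 and
                occurrences(target, derivative[x]) == 1 for x in tested)
            and all(occurrences(target, x) <= 1 for x in untested))
```

E.g. a target f(y1, x2) with x1 tested (derivative y1) satisfies the clauses, while g(y1, y1) duplicates the derivative and violates them.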
We now illustrate that the constraints in Definition 37 are necessary by means of several examples.Negative, or unquantified, premises using a set of actions  ⊂  are forbidden (clause 1), since they could distinguish a process argument that does not perform actions in , from an approximately bisimilar process argument that performs actions in  with a small probability.The example below deals with negative premises, an example dealing with unquantified premises could be given in an analogous way.
Example 24 (Negative premises). Assume an action label b ∈ A and let f be the unary operator having the following rule r_a for all a ≠ b: For ε ∈ (0, 1), the processes p_1 = p and p_2 = p +_{1−ε} q are related by a generative ε-bisimulation, whereas f(p_1) and f(p_2) cannot be related by any ε′-bisimulation, since f(p_1) can perform an action that f(p_2) cannot.
Variable duplication is forbidden: since, in rules, the source variables x_i are instantiated by process arguments, and the variables y_i by the derivatives of process arguments, their duplication may cause a duplication of the distance between different instances of the process arguments, thus breaking non-expansiveness. The following example deals with the duplication of derivatives of source variables. Similar arguments can be given for the duplication of source variables, or to show that a source variable and its derivative cannot both appear in the target.
The point is that in computing the distance between  ( 1 ) and  ( 2 ) we sum up the distance between  1 and  2 caused by the different probability to reach . and that caused by the different probability to reach ..
As observed in Remark 4, double testing can be added to generative rules without compromising the congruence property of generative bisimulation (Theorem 1). However, we can show that double testing would break non-expansiveness of generative ε-bisimulation and, therefore, cannot be admitted in Definition 37. Technically, double testing consists in having rule premises with the same variable x_i in the left side. In order to ensure that the probabilities of both the mutually exclusive events "x_i performs the first action" and "x_i performs the second action" contribute to the probability of the transition derived for f(p_1, …, p_n), in [87] their probability weights were summed when computing the probability of the move of f(p_1, …, p_n). However, since this is not the reason why double testing cannot be admitted, in the example below we let the two weights be composed by an arbitrary function h.

Example 27 (Double testing). Consider two actions a, b ∈ A and define f as the unary operator having the following rules (with double testing) for the actions a and b.

Fig. 6. Two generative ε-bisimilar processes.
Next, we show why Definition 37 does not admit look-ahead, i.e., active premises inspecting two consecutive moves of an argument of f. Intuitively, approximate bisimulations consider the maximal difference that can be evaluated in a single computation step, whereas look-ahead would allow for accumulating the distances arising in different computation steps, thus breaking non-expansiveness.
Definition 38 (ε-safe GTSS, [110]). A GTSS as in Definition 30 is ε-safe if all its rules are ε-safe, and
• for each action a ∈ A, letting R_{f,a}^i denote the subset of the rules in R_f^i having a as action, the sum of the weights of the rules in any set R_{f,a}^i is less than or equal to 1.
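The additional clause of Definition 38 is again a simple bookkeeping condition; the sketch below checks it on our own representation of a derivation set as (label, weight) pairs:

```python
# Sketch of the weight clause of Definition 38 (our representation): within
# a derivation set, the weights of the rules sharing a conclusion label
# must sum to at most 1.
from collections import defaultdict

def weights_bounded(rules, tol=1e-9):
    """rules: iterable of (label, weight) pairs from one derivation set."""
    per_label = defaultdict(float)
    for label, weight in rules:
        per_label[label] += weight
    return all(total <= 1.0 + tol for total in per_label.values())
```

For instance, two a-rules of weights 0.5 and 0.5 plus one b-rule of weight 1 respect the clause, while two a-rules of weights 0.7 and 0.5 do not.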
We can show that the clause in Definition 38 cannot be relaxed.
Example 29. Assume A = {a, b, c} and let f be the unary operator having the following rules: The rules r_a and r_b violate Definition 38. Consider again the processes p_1 and p_2 in Fig. 5. They are ε-bisimilar, but f(p_1) and f(p_2) cannot be related by any ε-bisimulation, since they reach the same target through c with probability 1∕2 − 2ε and 1∕2, respectively, thus implying that f(p_1) and f(p_2) can be related by an ε′-bisimulation only if ε′ ≥ 2ε. The point is that f maps both actions a and b of its process argument to action c, so that the distance between f(p_1) and f(p_2) caused by c is the sum of the distance between p_1 and p_2 caused by a and that caused by b.

Table 9
Summary of the formats presented in Section 11.

Format definition | Compositionality property
𝝐-PGSOS (Definition 44) | Non-expansiveness, Theorem 8 [63]
Theorem 5 ([110]). Let R be an ε-safe GTSS. Then all generative ε-bisimulations are non-expansive with respect to all operators defined by R.

Non-expansiveness of approximate reactive bisimulations
We start with the notion of approximate reactive bisimulation safe (ε-safe) rule. Essentially, ε-safe rules are reactive rules as in Definition 12 that do not admit duplication of variables.
Definition 39 (ε-safe rule, [110]). A reactive rule as in Definition 12 is ε-safe if:

1. for each index i ∈ {1, …, n} ⧵ I, at most one occurrence of the source variable x_i appears in the target t,
2. for each index i ∈ I, at most one occurrence of the derivative y_i of the source variable x_i appears in t, and, if no occurrence of y_i appears in t, then at most one occurrence of x_i appears in t.
Duplication of variables is forbidden for the same reason discussed in the previous section for the generative case. The constraints in Definition 37 on unquantified and negative premises, and the constraint imposing that the derivatives of source variables must appear in the target, are not necessary in the reactive case: they were imposed to deal with situations in which there is a probability in (0, 1) to perform a given action (cf. Example 24 and Example 26), which cannot happen in the reactive case. Double testing and look-ahead cannot be admitted, as in the generative case. Now we recall the notion of ε-safe RTSS, i.e., an RTSS as in Definition 31 where all rules are ε-safe.
We note that the second item in Definition 38 is not needed in Definition 40, since it already appears in Definition 31 (see item 2b).

Theorem 6 ([110]). Let R be an ε-safe RTSS. Then all reactive ε-bisimulations are non-expansive with respect to all operators defined by R.

Non-expansiveness format for approximate bisimulations
In this section, we recall the format from [63] ensuring that ε-bisimulations for the nondeterministic and probabilistic model are non-expansive. See also Table 9.
The approach in [63] can be summarised as follows. The first task is to assign an expansiveness bound to each operator. This is an upper bound on the distance between the composed processes, given the pairwise distances between their components. Specifically, the expansiveness bound of an n-ary operator f is defined as a mapping exp_f : ℝ^n → ℝ such that exp_f(ε_1, …, ε_n) = ε if, whenever s_i ∼_{ε_i} s′_i for all i ∈ {1, …, n}, then f(s_1, …, s_n) ∼_{ε′} f(s′_1, …, s′_n) for some ε′ ≤ ε. Expansiveness bounds are computed on the rules of a TSS. Then, the expansiveness bounds of the operators in a TSS are exploited to deduce a specification format ensuring non-expansiveness of approximate bisimulations. Notice that expansiveness bounds may also be exploited to deduce specification formats for other continuity properties.
The first observation in [63] is that the expansiveness of an operator f defined by a rule r depends on three factors: (i) the multiplicity (i.e., the number of occurrences) of source variables and their derivatives in the target of r, (ii) the expansive power of the operators that build a context in the rule target around the source variables or their derivatives (i.e., how much each operator multiplies the distance of its arguments), and (iii) the discriminating power of the premises of r with respect to the reactive behaviour.
The induced NPTS of (Σ, 𝐴, 𝑅₂) contains the following transitions: The power of 2 in the distance reflects directly the multiplicity 2 of the derivative 𝜇 in the rule target. The same effect can be observed for multiple occurrences of source variables in the rule target, e.g., consider for the defining rule a target built from two occurrences of the source variable instead of two occurrences of its derivative.
Furthermore, the expansive power of the operators used in the rule target determines the expansiveness of the operator defined by that rule. A simple example is the axiom: While the variable 𝑥 occurs only once in the rule target, the composed processes are still at distance 𝜖′ = 1 − (1 − 𝜖)², because the operator occurring in the target has an expansive power of 2 with respect to its single argument. This indicates that the expansive power of (the arguments of) operators needs to be defined recursively.
The multiplicity of source variables and their derivatives in the targets of the rules and the expansive power of the operators applied to those variables multiply. Consider the rules: These rules together with 𝑅₂ define 𝑅₃. Now the composed processes are at distance 𝜖′ = 1 − (1 − 𝜖)⁴. As explained above for 𝑅₂, in the rule defining the new operator the derivative 𝜇 appears twice in the rule target. Additionally, the operator that is applied to 𝜇 in the target of that rule has, for both of its arguments, an expansive power of 2, because in its defining rule in 𝑅₃ the derivatives 𝜇₁, 𝜇₂ of both arguments 𝑥₁, 𝑥₂ appear twice in the target.
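The amplification pattern described above can be checked with a small computation. This is only an illustrative sketch of the arithmetic: the names and numbers are ours, not taken from the survey's examples.

```python
def combined_distance(eps, copies):
    # Distance between the composed processes when the rule target creates
    # `copies` instances of a derivative at distance `eps`: all copies must
    # simultaneously behave alike, so the agreement probabilities (1 - eps)
    # multiply, giving 1 - (1 - eps)^copies.
    return 1 - (1 - eps) ** copies

eps = 0.1
# Derivative occurring twice, each occurrence under an operator with
# expansive power 2 per argument: 2 * 2 = 4 copies overall.
assert abs(combined_distance(eps, 4) - (1 - 0.9 ** 4)) < 1e-12
# Multiplicity 2 alone gives the smaller distance 1 - (1 - eps)^2.
assert combined_distance(eps, 2) < combined_distance(eps, 4)
```

The exponent is exactly the product of the multiplicity in the target and the expansive power of the enclosing operator, which is the point of the example above.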
The expansive power of an operator may be unbounded. Consider the operator defined by the rule: This rule together with 𝑅₂ defines 𝑅₄. In the rule that defines this operator the derivative 𝜇 occurs twice in the target. Moreover, each occurrence of 𝜇 is put in the context of the operator itself (a recursive call). Additionally, both occurrences are put in a binary context, which enforces that the distances of the two copies of 𝜇 multiply. Recursive multiplication of the distances leads to an approximate bisimulation distance of 1 between the composed processes. The expansive power of this operator will in this case be denoted by ∞.
On the other hand, an operator may absorb the approximate bisimulation distance. Consider the rules: These rules together with 𝑅₄ define 𝑅₅. The first of these operators has no rule, the second allows for deriving an unconditional move to a fixed process. As a consequence, both absorb the distance between their arguments, namely their expansive power is 0. Then, since in the rules for the remaining two operators the derivative 𝜇 of the source variable 𝑥 appears in the context of an operator with expansive power 0, also these operators absorb the distances of their arguments and their expansive power is 0: the processes they compose are at distance 0.
If the reactive behaviour of the process associated to a source variable is tested by some premise, then the operator defined by this rule may discriminate states with different reactive behaviour. Consider the rules: These rules together with 𝑅₅ define 𝑅₆. The first operator has expansive power 0 whenever it is applied to pairs of process arguments (𝑠₁, 𝑠₂) and (𝑡₁, 𝑡₂) such that, for 𝑖 = 1, 2, 𝑠ᵢ and 𝑡ᵢ have the same reactive behaviour, namely they have the same initial steps. Essentially, if both 𝑠ᵢ and 𝑡ᵢ perform 𝑎, or they are both unable to perform 𝑎, the composed processes are at distance 0. Otherwise, they are at distance 1. We will say that this operator has reactive behaviour discriminating power, which has the same effect as an expansive power of 1 whenever the process arguments have different reactive behaviour, and the same effect as an expansive power of 0 otherwise. Then, since in the rule for the second operator we apply the first one on top of two occurrences of the derivative 𝜇, the expansive power of the second operator is 2: the composed processes are at distance 𝜖′ = 1 − (1 − 𝜖)². Note that also the two absorbing operators defined above in 𝑅₅ have expansive power 0 and have reactive behaviour discriminating power.
Definition 41 (Reactive behaviour discriminating power). For a set of rules 𝑅, an operator 𝑓 and an argument 𝑖, the reactive behaviour discriminating power of argument 𝑖 of 𝑓 is 1 if the source variable 𝑥ᵢ appears in a premise of some rule for 𝑓, i.e., if for some such rule there is a positive premise testing 𝑥ᵢ.

In [63] the expansive power of the operators defined in a PTSS (Σ, 𝐴, 𝑅) is computed by relying on the least fixed point of a monotone function. Let ℕ∞ denote ℕ ∪ {∞}, with the natural ordering extended by 𝑛 < ∞ for each 𝑛 ∈ ℕ, and the usual arithmetic extended for summation by 𝑛 + ∞ = ∞ + 𝑛 = ∞. We define a poset (𝐷, ⊑).
We now define the multiplicity functor, which allows us to compute iteratively the expansive power of operators and the frequency of variables in terms.

Definition 42 (Multiplicity functor). Functor
, where: We can see that the multiplicity functor computes the expansive power of an operator 𝑓 by inspecting the rules for 𝑓 and taking into account: (i) the multiplicity of the variables in the target, (ii) the expansive power of the operators in the target, and (iii) the reactive behaviour discriminating power of the operators in the target.
The multiplicity functor is order-preserving. This ensures the existence and uniqueness of its least fixed point by the Knaster-Tarski fixed point theorem.
We denote the least fixed point by (m_Σ, m_T). We call m_Σ(𝑓, 𝑖) the expansive power of argument 𝑖 of operator 𝑓, and m_T(𝑡)(𝑥) the weighted multiplicity of variable 𝑥 in term 𝑡. The expansive power of 𝑓 allows us to derive an upper bound on the approximate bisimulation distance between terms 𝑓(𝑝₁, … , 𝑝ₙ) and 𝑓(𝑝′₁, … , 𝑝′ₙ), expressed in relation to the approximate bisimulation distances 𝜖ᵢ between their arguments.
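The fixed-point computation can be sketched concretely. The encoding below is only illustrative: the operator names and the occurrence lists are hypothetical stand-ins for the examples above, and the clause `max(1, e)` is our simplifying assumption that each occurrence counts at least as one copy; the actual functor of [63] also tracks weighted multiplicities and reactive behaviour discriminating power.

```python
INF = float("inf")

def expansive_powers(spec, rounds=64, cap=10**6):
    # spec[f] = list of occurrences of the relevant derivative in the rule
    # target of f; each occurrence lists the operators enclosing it.
    e = {f: 0 for f in spec}                  # bottom of (N ∪ {∞}, ≤)
    for _ in range(rounds):                   # Kleene iteration to the lfp
        new = {}
        for f, occs in spec.items():
            total = 0
            for ctx in occs:
                contrib = 1
                for g in ctx:
                    contrib *= max(1, e[g])   # assumed: a copy counts >= 1
                total += contrib
            new[f] = INF if total > cap else total
        if new == e:                          # fixed point reached
            break
        e = new
    return e

spec = {
    "f2": [[], []],            # derivative twice, bare: power 2
    "g":  [["f2"], ["f2"]],    # twice, each under f2: power 2 * 2 = 4
    "h3": [["h3"], ["h3"]],    # twice, under itself: diverges to ∞
}
powers = expansive_powers(spec)
assert powers["f2"] == 2 and powers["g"] == 4 and powers["h3"] == INF
```

The iteration mirrors the Knaster-Tarski argument: starting from the bottom element, the monotone update is applied until the values stabilise, with divergent values collapsed to ∞.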

Definition 43 (Expansiveness bound).
The expansiveness bound exp_𝑓 of operator 𝑓 ∈ Σ with respect to the approximate bisimulation distances 𝜖ᵢ of its arguments 𝑖 = 1, … , 𝑛 is defined by: The pattern 1 − ∏ᵢ₌₁ⁿ (1 − 𝜖ᵢ)^{m_Σ(𝑓,𝑖)} can be explained as follows. In the derivatives of 𝑓(𝑝₁, … , 𝑝ₙ), m_Σ(𝑓, 𝑖) copies of the derivatives of process argument 𝑝ᵢ are created. Therefore, m_Σ(𝑓, 𝑖) instances of 𝑝ᵢ and 𝑝′ᵢ contribute with their distance 𝜖ᵢ to the distance between 𝑓(𝑝₁, … , 𝑝ₙ) and 𝑓(𝑝′₁, … , 𝑝′ₙ). These m_Σ(𝑓, 𝑖) contributions are not simply summed: in the pattern 1 − (1 − 𝜖ᵢ)^{m_Σ(𝑓,𝑖)} the distance 𝜖ᵢ between one occurrence of 𝑝ᵢ and one occurrence of 𝑝′ᵢ is weighted by the probability that all other occurrences of 𝑝ⱼ and 𝑝′ⱼ, for all 𝑗 ∈ [1, 𝑛], behave in the same way.
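The bound itself is a small closed-form computation; a sketch, with function and variable names of our own choosing:

```python
def expansiveness_bound(eps, mult):
    # exp_f(eps_1, ..., eps_n) = 1 - prod_i (1 - eps_i)^{m_i}, where m_i is
    # the expansive power of argument i: one minus the probability that all
    # copies of all arguments behave alike.
    prod = 1.0
    for e, m in zip(eps, mult):
        prod *= (1 - e) ** m
    return 1 - prod

# One argument, one copy: the bound is just the argument's distance.
assert abs(expansiveness_bound([0.3], [1]) - 0.3) < 1e-12
# Two copies amplify: 1 - (1 - 0.3)^2 = 0.51.
assert abs(expansiveness_bound([0.3], [2]) - 0.51) < 1e-12
# An absorbing argument (expansive power 0) contributes nothing.
assert expansiveness_bound([0.3, 0.8], [1, 0]) == expansiveness_bound([0.3], [1])
```

Note how an argument with expansive power 0 drops out of the product, matching the absorbing operators discussed earlier.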
The following result states that the expansiveness bound of operators can be exploited to obtain an upper bound on the approximate bisimulation distance between processes. The result was given in [63] for the simple 𝗇𝗍𝜇𝖿𝜃∕𝗇𝗍𝜇𝗑𝜃-TSSs [40,42,63] recalled in Definition 33.
Clearly, Theorem 7 allows one to infer the uniform continuity properties of operators from the rules, but computing exp_𝑓 requires recursive reasoning over those rules. In [63] a rule format for non-expansiveness was proposed, which exploits Theorem 7 and whose constraints on the rules are easy to verify, since it suffices to count the occurrences of source variables and derivatives in the rule target, without any need for recursive reasoning over other rules.

Definition 44 (𝜖-PGSOS rule format).
A PGSOS-rule 𝑟 is an 𝜖-PGSOS rule if for each 𝑥ᵢ ∈ src(𝑟) we have:

Operators 𝑓 and 𝑔 can create an arbitrary number of copies of their process argument. We can show that 𝑓 and 𝑔 are not uniformly continuous. Process 𝑓(𝑡ₙ) gives the remaining probability 1 − (1 − 𝜖)ⁿ to processes that cannot move. We get 𝐝_𝜆(𝑓(𝑠ₙ), 𝑓(𝑡ₙ)) = 𝜆(1 − (1 − 𝜖)ⁿ). Hence, sup_{𝑛∈ℕ} 𝐝_𝜆(𝑓(𝑠ₙ), 𝑓(𝑡ₙ)) = sup_{𝑛∈ℕ} 𝜆(1 − (1 − 𝜖)ⁿ) = 𝜆, thus giving that the distance between the composed processes is bounded by 𝜔(𝜖) = 𝜆 if 𝜖 > 0 and 𝜔(0) = 0, which is not a modulus of continuity, since it is not continuous at 0. Hence, 𝑓 is not uniformly continuous according to Definition 29.
Consider now operator 𝑔. We have: As for operator 𝑓, we conclude that no modulus of continuity for 𝑔 can be given and, therefore, 𝑔 is not uniformly continuous.
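The failure of uniform continuity can be checked numerically: for any fixed 𝜖 > 0 the quantity 1 − (1 − 𝜖)ⁿ approaches 1 as the number of copies 𝑛 grows, so (even after multiplying by a discount factor) the supremum over 𝑛 does not vanish as 𝜖 → 0. A minimal sketch, with names of our own choosing:

```python
def distance_after_copies(eps, n):
    # Distance between the composed processes when n copies of an argument
    # at distance eps are created: 1 - (1 - eps)^n.
    return 1 - (1 - eps) ** n

eps = 0.01                                        # a small argument distance
assert abs(distance_after_copies(eps, 1) - eps) < 1e-12
assert distance_after_copies(eps, 1000) > 0.99    # already close to 1
assert distance_after_copies(0.0, 10**6) == 0.0   # zero distance is preserved
```

The jump from 0 (at 𝜖 = 0) to values arbitrarily close to 1 (for any 𝜖 > 0) is exactly the discontinuity at 0 that rules out a modulus of continuity.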
Our second observation is that, intuitively, Lipschitz continuity requires that the number of copies of the process arguments that can be created along the computation be bounded by a constant.

Example 35 (copy operator).
Consider the copy operator [27] defined by the following rules: The second rule allows us to create an exponential number of copies of the process argument. We can show that this breaks Lipschitz continuity for any candidate Lipschitz factor 𝐿.
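Assuming, as in the example above, that the number of copies doubles at each step, the distance after 𝑛 steps behaves like 1 − (1 − 𝜖)^{2ⁿ} (up to discounting), and no Lipschitz factor 𝐿 can dominate it; a sketch:

```python
def copy_distance(eps, steps):
    # After `steps` applications of the copy operator there are 2^steps
    # copies of the argument, so the distance is 1 - (1 - eps)^(2^steps).
    return 1 - (1 - eps) ** (2 ** steps)

# For a candidate factor L, choose eps = 2^(-n): then L * eps shrinks
# geometrically while the actual distance stays near 1 - 1/e ≈ 0.63.
n = 20
eps = 2.0 ** -n
for L in (1, 10, 1000):
    assert copy_distance(eps, n) > L * eps
```

The same choice of 𝜖 works against any fixed 𝐿 once 𝑛 is large enough, which is why exponential copying is incompatible with Lipschitz continuity.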
Since processes are at bisimulation distance zero if and only if they are bisimilar (Proposition 1), a format guaranteeing uniform continuity for the bisimilarity metric must also guarantee the congruence property for bisimilarity. In Section 8 we argued that the congruence property for bisimilarity is guaranteed by the 𝗇𝗍𝜇𝖿𝜃∕𝗇𝗍𝜇𝗑𝜃 format of [40,42]. However, the following example shows that already the simple version of the 𝗇𝗍𝜇𝖿𝜃∕𝗇𝗍𝜇𝗑𝜃 format (Definition 33) breaks Lipschitz continuity, thus implying that, when dealing with metrics, we cannot go beyond PGSOS. Intuitively, by using operators in the sources of the premises of the rules, if those operators are not non-expansive, then we can create an exponential number of copies of the derivatives of source variables, which breaks Lipschitz continuity, as already observed above.

Example 36 (𝗇𝗍𝜇𝖿𝜃∕𝗇𝗍𝜇𝗑𝜃 rules). Let us extend Table 2 with a new operator 𝑔 defined as follows:
The rule for 𝑔 is simply-𝗇𝗍𝜇𝖿𝜃, but falls outside the PGSOS format since a term different from a single variable occurs as the left-hand side of the premise. We can show that 𝑔 cannot be 𝐿-Lipschitz continuous for any fixed constant 𝐿. Consider processes 𝑠₁ and 𝑡₁ with 𝑡₁ moving with probability 1 − 𝜖 as 𝑠₁ does and with probability 𝜖 to a stopped process. Then, for all 𝑛 > 1 define 𝑠ₙ = 𝑎.𝑠ₙ₋₁ and 𝑡ₙ = 𝑎.𝑡ₙ₋₁. Clearly, for all 𝑛 ≥ 1 we have 𝐝_𝜆(𝑠ₙ, 𝑡ₙ) = 𝜆ⁿ𝜖. We now aim at evaluating 𝐝_𝜆(𝑔(𝑠ₙ), 𝑔(𝑡ₙ)), for all 𝑛 ≥ 1. Since the target of the rule for 𝑔 is of the form 𝑔(𝜇), where 𝜇 is derived from the synchronous parallel composition of the source variable with itself, we can infer that 𝑔(𝑠ₙ), via a sequence of 𝑛 𝑎-labelled transitions, reaches a distribution that assigns probability 1 to the process obtained by applying 𝑔 to the parallel composition of 2ⁿ copies of a single-step process. Similarly, after 𝑛 𝑎-labelled transitions 𝑔(𝑡ₙ) reaches a distribution assigning probability (1 − 𝜖)^{2ⁿ} to the same process and the remaining probability 1 − (1 − 𝜖)^{2ⁿ} to processes that cannot execute any action (being obtained by applying 𝑔 to a parallel composition among which there is at least one stopped process). Therefore we get 𝐝_𝜆(𝑔(𝑠ₙ), 𝑔(𝑡ₙ)) = 𝜆ⁿ(1 − (1 − 𝜖)^{2ⁿ}), which, for 𝜖 = 1∕2ⁿ, exceeds 𝐿 ⋅ 𝐝_𝜆(𝑠ₙ, 𝑡ₙ) = 𝐿𝜆ⁿ𝜖 for any fixed 𝐿 when 𝑛 is large enough.

Next, we need to understand how we can obtain the upper bounds on the distance, corresponding to the moduli of continuity from Definition 29, from the syntactic inspection of the rules. To this end, consider the fixed point characterisation of the bisimilarity metric (Proposition 3). By monotonicity of the suprema and infima used in the definition of the bisimulation metric functional (Definition 26), we can transfer any upper bound on the Kantorovich lifting to an upper bound on 𝐝_𝜆. In particular, we can exploit the following general properties of the Kantorovich distance, which hold for any 1-bounded pseudometric 𝑑 [66]: 𝖪(𝑑)(𝛿_𝑠, 𝛿_𝑡) = 𝑑(𝑠, 𝑡), for all 𝑠, 𝑡 ∈ 𝖳(Σ), and 𝖪(𝑑)(∑ᵢ 𝑝ᵢ𝜇ᵢ, ∑ᵢ 𝑝ᵢ𝜈ᵢ) ≤ ∑ᵢ 𝑝ᵢ𝖪(𝑑)(𝜇ᵢ, 𝜈ᵢ). Since the distributions over which the Kantorovich distance is evaluated are obtained as closed instances of the targets of the PGSOS rules, the uniform continuity formats will
be characterised by various constraints on the form of those distribution terms.
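The two Kantorovich properties can be illustrated in the special case of the discrete metric, where the Kantorovich lifting coincides with the total variation distance; the helper names below are ours:

```python
def tv(mu, nu):
    # Total variation distance between finite distributions given as dicts;
    # with the discrete metric d(x, y) = 1 for x != y, the Kantorovich
    # lifting K(d) coincides with this quantity.
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def dirac(x):
    return {x: 1.0}

def mix(lam, a, b):
    # Convex combination lam * a + (1 - lam) * b of two distributions.
    keys = set(a) | set(b)
    return {k: lam * a.get(k, 0.0) + (1 - lam) * b.get(k, 0.0) for k in keys}

# The lifting maps Dirac distributions back onto the ground distance.
assert tv(dirac("p"), dirac("q")) == 1.0
assert tv(dirac("p"), dirac("p")) == 0.0
# Convexity: K(d)(sum p_i mu_i, sum p_i nu_i) <= sum p_i K(d)(mu_i, nu_i).
m = mix(0.4, dirac("p"), dirac("r"))
n = mix(0.4, dirac("q"), dirac("r"))
assert tv(m, n) <= 0.4 * tv(dirac("p"), dirac("q")) + 0.6 * tv(dirac("r"), dirac("r")) + 1e-12
```

For a general ground pseudometric the lifting requires solving a transport problem, but the two properties used in the text hold in the same way.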

Non-extensiveness format
We start with the PGSOS rules for non-extensiveness, which admit only convex combinations of variables and constants in the rule target. Recall that Σₙ denotes the subset of the 𝑛-ary operators in Σ, and consider the set of distribution terms built from the constants in Σ₀.
Notice that a PTSS with only the prefixing, alternative composition, probabilistic alternative composition and probabilistic prefixing operators is non-extensive. In [62] it is shown that all the remaining binary operators in Table 2, which fall outside Definition 45, are not non-extensive, thus witnessing that the constraints in Definition 45 are not too demanding. Consider the processes in Fig. 7. We get 𝐝_𝜆(𝑠₁ ‖ 𝑡₁, 𝑠₂ ‖ 𝑡₂) = 𝜆 ⋅ 0.19, with 0.19 the probability that 𝑠₁ ‖ 𝑡₁ cannot perform 𝑏 at the second step.
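The arithmetic behind non-extensiveness: a convex combination of component distances never exceeds their maximum; a sketch with made-up weights and distances:

```python
def convex_bound(weights, dists):
    # Upper bound on the distance of a convex combination of distribution
    # terms: sum_i w_i * d_i, which is at most max_i d_i since the w_i
    # sum to 1.
    return sum(w * d for w, d in zip(weights, dists))

weights = [0.5, 0.3, 0.2]       # must sum to 1
dists = [0.4, 0.1, 0.9]
assert abs(sum(weights) - 1.0) < 1e-12
assert convex_bound(weights, dists) <= max(dists)
```

This is why restricting rule targets to convex combinations of variables and constants keeps the composed distance within the maximum of the component distances.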

Non-expansiveness format
We obtain a format for non-expansiveness by relaxing Definition 45 as follows. We can allow a rule target to be a convex combination of distribution terms in which also non-constant operators occur and are applied to source variables and their derivatives, provided that, for each source variable, at most one occurrence of that variable, or of its derivatives, occurs.

Definition 46 (Non-expansive TSS, [34]). Let 𝑃₁ = (Σ₁, 𝐴, 𝑅₁) be a non-extensive TSS. A PGSOS TSS 𝑃₂ = (Σ₂, 𝐴, 𝑅₂) with 𝑃₁ ⊑ 𝑃₂ is non-expansive if: The subsequent Proposition 6 states that a solution exists, which assigns to each operator 𝑓 a value 𝐿_𝑓 < ∞, provided that the discount factor 𝜆 is strictly less than 1.
Theorem 11 (Lipschitz continuity, [34]). Let (Σ, 𝐴, 𝑅) be a Lipschitz-TSS and let 𝜆 ∈ (0, 1). The examples of this section that were shown to be non-Lipschitz continuous do not respect Definition 47. Operator 𝑓 in Example 34 violates clause 3, since no 𝐿_𝑓 can be given. Operator 𝑔 violates clause 2, since no 𝐿_𝑔 can be given. In Example 35, the second rule for the copy operator violates clause 3, since the derivative of the source variable appears in the body of an operator which is not non-expansive.

Formats highlights
As in the case of approximated bisimulations, given that the kernel of the bisimilarity metric coincides with probabilistic bisimilarity, the non-extensiveness, non-expansiveness, and Lipschitz formats discussed in this part are all obtained as refinements of the PGSOS format. Specifically, since the quantitative behaviour of processes is entirely encoded in the distribution terms, the three formats impose constraints on the form of the targets of PGSOS rules.
In the non-extensive format (Definition 45) only convex combinations of variables and constants are admitted as targets. Essentially, trgt(𝑟) = ∑ᵢ∈𝐼 𝑝ᵢΘᵢ and in each distribution term Θᵢ we have at most one occurrence of one source variable, or of one of its derivatives, and this occurrence cannot be in the context of any operator.
The non-expansive format (Definition 46) allows trgt(𝑟) to be a convex combination of distribution terms in which also non-constant operators occur and are applied to source variables and their derivatives, provided that, for each source variable, at most one occurrence of that variable, or of its derivatives, occurs. This means that, in each Θᵢ, source variables, or their derivatives, can occur only in the scope of non-extensive operators that do not amplify their distance. However, all the variables occurring in Θᵢ contribute to the overall distance.
The Lipschitz format (Definition 47) is more permissive, but it requires inspecting the PGSOS rules operator-by-operator. Given an operator 𝑓, we use two natural numbers as upper bounds on the number of occurrences, in the targets of the rules for 𝑓, of, respectively, source variables and their derivatives, and we keep track of the maximum Lipschitz factor of the nested operators in which a source variable can occur; moreover, the occurrences of derivatives of source variables can be only in the scope of non-expansive operators. These constraints ensure that the distance between the instances of the arguments of 𝑓 is expanded by at most a bounded factor.

Part 5. Concluding remarks
We have collected and discussed the formats for probabilistic behavioural equivalences, approximated equivalences, and behavioural metrics, over generative, reactive, and nondeterministic probabilistic transition systems, that have been proposed in the literature. Summarising, we have presented:
• The GTSS format ensuring the congruence property of generative bisimilarity over GPTSs (Section 7.1).
• The RTSS format ensuring the congruence property of reactive bisimilarity over RPTSs (Section 7.2).
• The PGSOS format ensuring the congruence property of probabilistic bisimilarity, the precongruence property of probabilistic ready similarity, and, in its positive version, the precongruence property of probabilistic similarity over NPTSs (Section 8.1).
• The PRBB-safe format ensuring the congruence property of rooted branching bisimilarity, and the PBB-safe format ensuring that of branching bisimilarity over NPTSs (Section 8.2).
• The 𝜖-safe format ensuring the non-expansiveness property of generative 𝜖-bisimulations over GPTSs (Section 10.1).
To show that none of the formats we discussed is too restrictive, in Tables 11 and 12 we report the formats that are satisfied by the principal operators defined in probabilistic process algebras. In Table 11 we consider the operators defined in Tables 3-5, and the formats for generative and reactive (approximated) bisimulations. Likewise, in Table 12 we list the formats for (bi)simulations, 𝜖-bisimulations and bisimulation metrics over NPTSs satisfied by the operators presented in Table 2. We remark that the unary operators of relabelling, restriction, and priority violate the non-extensiveness format for bisimilarity metrics, as indicated in Table 12. However, since they satisfy the non-expansiveness format and they are unary, they show a non-extensive behaviour (as, in their case, the sum of the distances over the arguments always coincides with the maximum of those distances).
We now conclude our survey with a brief review of other probabilistic models existing in the literature, but for which no format has been proposed, followed by an informal presentation of formats proposed outside of the process algebraic setting.

Other models
In this survey, we focused our attention on three specific extensions of LTSs to the probabilistic setting: the generative, the reactive, and the nondeterministic and probabilistic models. The rationale behind this choice is quite straightforward: to the best of our knowledge, these are the only models for which specification formats have been given. Moreover, we wanted to support compositional reasoning over systems in which nondeterministic aspects of the behaviour are combined with probabilistic ones. Each interpretation of this combination led to a different model and, consequently, to a different format (as perfectly exemplified by the studies on strong probabilistic bisimilarity in Sections 7 and 8).
However, in the literature we can find other proposals of semantic models for probabilistic processes, for which no rule format has been given. For the sake of completeness, below we give an informal presentation of those models. We remark that all the models discussed below are semistochastic.
Stratified model. The stratified model [72] captures the branching structure of purely probabilistic choices and follows a bistructured approach to operational semantics. It is characterised by two types of transitions: the action transitions and the probability transitions. The former are of the form 𝑠 ⟶ 𝑡, labelled by an action, as in classic LTSs. The latter allow for expressing probabilistic choices explicitly, and are of the form 𝑠 ⟶ 𝑡, labelled by a probability 𝑝, meaning that process 𝑠 has probability 𝑝 to continue its execution as 𝑡. (The role of the probability label is similar to that of the indexes in the generative and reactive models.) The operational rules are defined in such a way that processes can perform either only action transitions, or only probability transitions, and are therefore divided into action processes and probability processes, respectively. We remark that action processes have (at most) one outgoing action transition, the choice in this model is purely probabilistic, and there is no construct allowing for nondeterministic behaviour.

Alternating model. The alternating model stems from [115] and extends the stratified model with nondeterministic choices, allowing for both internal and external nondeterminism. In this setting, processes are partitioned into nondeterministic processes and probabilistic processes. The former can perform only (nondeterministic) action transitions; the latter only probability transitions. In particular, each probability transition is defined between a probabilistic process and a nondeterministic one. (Notice that this is another difference with respect to the stratified model, since there probability transitions may end in probability processes.) Conversely, no constraint is imposed, in general, on the target of nondeterministic action transitions. In case these transitions are required to end in a probabilistic process, the model is called strictly alternating (also known as Labelled Concurrent Markov Chains [76]).
Although the strictly alternating model may resemble the NPTS model, we remark that: • in NPTSs there is no partition over the set of processes, and • the probabilistic transitions that may be induced from the probability distributions do not have processes as their targets, but a discrete measure over them (the Dirac distribution).
Given the absence of a partition over processes, the NPTS model is also referred to as the non-alternating model, to mark the difference with the alternating model [104].

Markov models. Markov models are used to capture the semantics of those processes (and systems) whose probabilistic behaviour can be expressed in terms of a Markovian process (in the purely mathematical sense of the term). In the literature, we can find a wealth of proposals of such models, which differ in the initial knowledge on the state space, the topology of the latter, the possibility to interact with the environment (and, thus, the presence of nondeterminism), whether timed aspects of computation are taken into account and how, etc. To give a few examples, the most common Markov models used in the literature are Markov Chains (and their labelled, continuous/discrete-time, continuous/discrete-space variations), Labelled Markov Processes, Markov Decision Processes, and Hidden Markov Models. We spare the reader a detailed presentation of these models, their variations, and all the other models not mentioned above, as this is outside the scope of our work. We limit ourselves to pointing out the principal difference between Markov models and the probabilistic models discussed so far: the relation between processes and probability. Markov models are characterised by either a transition probability function that maps a process (or a pair process-label) onto a distribution over processes, or a transition matrix that maps a pair of processes (or a triple process-label-process) onto a probability weight. These transition functions play the role of the Markov kernel of the Markovian process that represents the behaviour of the considered system. We remark that, as in Markovian processes, this means that the probability of reaching a particular process (or set of processes) depends only on the current process. In the various probabilistic extensions of LTSs given above, the labelled transitions are defined as relations either between two processes, or between a process and a distribution. In particular, the probability weight that is used as the label of a transition is not obtained, at least in principle, from the process performing that transition. This holds true also for the NPTS model, in which, technically speaking, the distribution reached via a transition does not depend on the process performing that transition, but on the resolution of nondeterministic choices.

The meta-model generalisation: a format for ULTraS
The realm of quantitative aspects of systems behaviour is far more varied and complex than the distinction among the models considered above. This is due not only to the possibly different interpretations of the nondeterminism-probability interplay, but also (and mainly) to the use of quantitative information other than probability. For instance, we might want to consider the execution rates of computation steps [14,85,99], the concentration of substances in biological/chemical systems [38,111,113], the load on the nodes of a network [68], or any other information on the performance of a system [60,78]. In the literature, we can find a wealth of semantic models and process calculi designed to capture a specific quantitative feature of systems behaviour. Despite the intuitive advantages of this richness, the drawback of this specialisation is that the tools and formal results developed for one calculus cannot be used to reason on the others.
This inconvenience led to the search for meta-models, namely semantic frameworks that are general enough to offer uniform definitions and results that can then be instantiated and applied to a particular calculus or model. The most prominent examples of meta-models in the quantitative setting are Functional Transition Systems [89], Weighted Labelled Transition Systems [56,84], and Uniform Labelled Transition Systems (ULTraS) [23]. As our aim with this survey is to cover specification formats for probabilistic systems, we do not discuss these meta-models and the results obtained on them in detail. Nonetheless, given the generality of the ULTraS model, and for the sake of completeness, we give a bird's-eye view of the Weight Function GSOS (WF-GSOS) format proposed in [94] for the syntactic presentation of ULTraS.
Transitions in the ULTraS model are a generalisation of the transition relation presented in Section 3.3 for the NPTS model (Definition 8): each state can select nondeterministically an action-labelled transition leading to a state reachability weight function, i.e., any function quantifying the likelihood of a state to be reached in that step. Following [94], transitions are then of the form 𝑠 ⟶ 𝜌, labelled by an action 𝑎, where 𝑠 is a state and 𝜌 is a weight function assigning to each state a value from a given commutative monoid 𝑀. The intuition behind the notion of bisimulation over 𝑀-ULTraS states is also a generalisation of that over NPTSs: two states are bisimilar if by performing the same actions they reach weight functions that assign the same weights to bisimulation classes of states. Given a set of states 𝐶, the total weight assigned to 𝐶 by 𝜌 is defined by 𝜌|𝐶 = ∑_{𝑠∈𝐶} 𝜌(𝑠). Then, given any relation 𝑅 between two sets 𝑆 and 𝑇, and its subset closure 𝑅*, the lifting of 𝑅 over weight functions relates 𝜌₁ and 𝜌₂ iff for all (𝐶, 𝐷) ∈ 𝑅* it holds that 𝜌₁|𝐶 = 𝜌₂|𝐷.

Definition 50 (Bisimulation, [94, Definition 2.6]). Let (𝑆, 𝐴, ⟶₁) and (𝑇, 𝐴, ⟶₂) be two image-finite 𝑀-ULTraS. A relation 𝑅 ⊆ 𝑆 × 𝑇 is a bisimulation iff (𝑠, 𝑡) ∈ 𝑅 implies that, for each 𝑎 ∈ 𝐴: States 𝑠, 𝑡 are said to be bisimilar if there is a bisimulation relation 𝑅 such that (𝑠, 𝑡) ∈ 𝑅.
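The lifting condition 𝜌₁|𝐶 = 𝜌₂|𝐷 is easy to operationalise; below is an illustrative check over the monoid (ℝ, +), with hypothetical state names and the relation restricted to equivalence classes:

```python
def total(w, C):
    # Total weight w|_C assigned by the weight function w to the set C,
    # over the commutative monoid (R, +).
    return sum(w.get(s, 0.0) for s in C)

def lifted(w1, w2, classes):
    # w1 and w2 are related by the lifting iff they assign the same total
    # weight to every class of the relation.
    return all(total(w1, C) == total(w2, C) for C in classes)

classes = [{"s1", "s2"}, {"s3"}]
w1 = {"s1": 0.7, "s3": 0.3}
w2 = {"s2": 0.7, "s3": 0.3}   # weight moved only inside a class
w3 = {"s1": 0.5, "s3": 0.5}   # class totals changed
assert lifted(w1, w2, classes)
assert not lifted(w1, w3, classes)
```

With 𝑀 the probability monoid this specialises to the usual probabilistic bisimulation condition of matching class probabilities.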
The WF-GSOS format is then obtained as an extension of the classic GSOS format.A new signature, called the weight signature, is introduced for a syntactic representation of weight functions, in a similar fashion as distribution terms are used as the syntactic counterpart of probability distributions (see Section 4.1).

Table 3
Some operators of generative and reactive process algebras.

Table 5
Some operators of reactive process algebras.
R. Lanotte and S. Tini

NPTSs showing the incomparability of approximate bisimulations with bisimulation metrics.

Table 11
Formats satisfied per operator in the generative and reactive models.