Asymptotic elimination of partially continuous aggregation functions in directed graphical models

For a finite and relational signature σ and finite domain D we consider the set $W_D$ of all σ-structures with domain D. On $W_D$ a probability distribution is determined by a so-called parametrized probabilistic graphical model, a concept studied in statistical relational artificial intelligence. We also consider a many-valued logic, denoted PLA, with truth values in the unit interval for expressing queries. PLA uses aggregation functions, for example the arithmetic mean, geometric mean, maximum and minimum, instead of quantifiers. In this setting we prove that every formula of PLA with only admissible aggregation functions is asymptotically equivalent to a formula without aggregation functions, as the domain size tends to infinity. A corollary of this is a probabilistic convergence law for PLA-formulas with only admissible aggregation functions. © 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

1.1. Aggregation functions. Aggregation functions (also called aggregate functions or combination functions) are an important tool in the analysis of data. Such functions take a sequence of numbers (or more generally, some number of sequences of numbers) and return a number. Here we will only consider aggregation functions whose value does not depend on the order of the numbers in the sequence. Typical examples include the arithmetic mean, the geometric mean, and the maximum of the numbers in the sequence. Moreover, we consider only sequences of numbers in the unit interval [0, 1] and aggregation functions with values in [0, 1]. In our context this is natural because the numbers we consider can be viewed as probabilities (or relative frequencies) and the logic that we will consider has truth values in [0, 1]. But one can also think of numbers in the unit interval as being the "normalized" versions of numbers in [0, a] for some positive a ∈ R. As usual $[0,1]^n$ denotes the Cartesian product of n intervals [0, 1] and we let $[0,1]^{<\omega} = \bigcup_{n \in \mathbb{N}^+} [0,1]^n$. Let $k \in \mathbb{N}^+$ and $F : ([0,1]^{<\omega})^k \to [0,1]$, so F takes k sequences (not necessarily of the same length) as input. We call F an aggregation function if F is symmetric in the sense that if $\bar r_1, \ldots, \bar r_k \in [0,1]^{<\omega}$ and, for each i = 1, . . . , k, $\bar\rho_i$ is an arbitrary reordering of the entries of $\bar r_i$, then $F(\bar\rho_1, \ldots, \bar\rho_k) = F(\bar r_1, \ldots, \bar r_k)$.
(5) $\text{noisy-or}(\bar r) = 1 - \prod_{i=1}^{n} (1 - r_i)$.

1.2. Logic. We will study a probability logic with aggregation functions, abbreviated PLA. Since the output of an aggregation function may be any number in the unit interval it follows that PLA will be a many-valued logic with truth values in the unit interval. Since the aggregation functions max and min can be used to express existential and universal quantification, respectively, it follows that the expressive power of PLA exceeds that of first-order logic. Examples of the expressivity of PLA are given in Section 5. For example we show that every stage of SimRank [14] can be expressed by a PLA-formula.
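The aggregation functions named in Section 1.1 can be sketched in a few lines of Python. This is only an illustration for a single input sequence (k = 1); the helper names `am`, `gm`, `noisy_or` and `is_symmetric_on` are ours and not notation from the article. The check at the end tests the symmetry requirement on one given sequence by brute force over all reorderings.

```python
import math
from itertools import permutations


def am(r):
    """Arithmetic mean of a nonempty sequence of numbers in [0, 1]."""
    return sum(r) / len(r)


def gm(r):
    """Geometric mean of a nonempty sequence of numbers in [0, 1]."""
    return math.prod(r) ** (1 / len(r))


def noisy_or(r):
    """noisy-or(r) = 1 - prod_i (1 - r_i)."""
    return 1 - math.prod(1 - x for x in r)


def is_symmetric_on(F, r, tol=1e-12):
    """Check, for this particular r, that F gives the same value on every
    reordering of r (up to a floating-point tolerance)."""
    base = F(list(r))
    return all(abs(F(list(p)) - base) <= tol for p in permutations(r))


# On 0/1-valued input, the built-in max and min behave like "exists" and
# "for all": max([0, 1, 0]) is 1 and min([0, 1, 0]) is 0.
```

A function such as `lambda r: r[0]` fails the symmetry check on any sequence with two different entries, so it is not an aggregation function in the sense used here.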
The syntax of PLA (Definition 3.3) is similar to the probability logic studied by Jaeger [13], but we use the semantics (Definition 3.5) of Łukasiewicz logic for the propositional connectives ¬, →, ∨ and ∧. We make this choice because we want the truth value of, for example, ϕ → ψ to vary continuously as the truth values of ϕ and ψ vary. When formulas take the truth values 0 or 1 the semantics of Łukasiewicz logic coincides with the common semantics of the mentioned connectives. For a concise introduction to Łukasiewicz logic see e.g. [2, Section 11.2], or see the original source [19].

1.3. Probability distributions and parametrized probabilistic graphical models. Formulas of PLA are evaluated in finite structures, which can be thought of as "possible worlds". In particular, given a formula we are interested in the probability that this formula takes a particular (truth) value, or a value in a given interval, when interpreted in a random possible world with a fixed domain. The problem formulation assumes that a probability distribution is given on the set of possible worlds with a fixed domain. We fix an arbitrary finite first-order signature σ with only relation symbols. In practice the signature is determined by the context. We assume that the domain is [n] = {1, 2, . . . , n} for some positive integer n. Let $W_n$ denote the set of all σ-structures (in the usual sense of first-order logic) with domain [n]. There are many ways to define a probability distribution on $W_n$. Since our aim is to obtain results that are useful within the context of statistical relational AI, a subfield of AI and machine learning, we consider a probability distribution on $W_n$ which is determined by a so-called parametrized (or lifted) probabilistic graphical model (PPGM). (For background on statistical relational learning and probabilistic graphical models see e.g. [4,9,10,15].)
A PPGM is determined partly by a (directed or undirected) graph, the vertices of which are so-called parametrized random variables, which can also be seen as (usually) atomic first-order formulas. The (conditional) dependencies between the random variables are expressed by the edges of the graph. For every parametrized random variable of a directed PPGM the conditional probability of it taking a given value can be computed from the values of its parents. The formalization of a PPGM used in this article, first considered in [16], is a lifted Bayesian network in the sense of Definition 4.6 below which uses conditional probability logic (CPL) (Definition 4.4) to express "threshold conditions". Informally speaking, a lifted Bayesian network assigns a probability to an atomic formula R(x̄) by saying that it has probability $\alpha_i$ if a condition $\chi_i(\bar x)$ holds, where $\chi_i(\bar x)$ may not use R but may use the usual syntactic constructs of first-order logic and constructs with the following meaning: The relative frequency of ȳ satisfying $\varphi_1(\bar x, \bar y)$ conditioned on ȳ satisfying $\varphi_2(\bar x, \bar y)$ is at least as large as the relative frequency of ȳ satisfying $\varphi_3(\bar x, \bar y)$ conditioned on ȳ satisfying $\varphi_4(\bar x, \bar y)$. Thus lifted Bayesian networks are well suited for expressing probabilities that change if a threshold (in terms of a relative frequency) is passed but which stay fixed between the thresholds. The discussion in Section 5, including Example 5.3, hints at what kind of distributions can be expressed and their relevance.

1.4. Problems and results. With a lifted Bayesian network and a description of how it induces a probability distribution $P_n$ on $W_n$ (Definition 4.7) we have the framework for asking, for a PLA-formula ϕ(x̄) and $\bar a \in [n]^k$ (where k is the length of the sequence of variables x̄): What is the probability that ϕ(ā) has a given value (or that its value belongs to a given interval)?
The "brute force" method to compute this probability is to compute the value of ϕ(ā) in all structures in $W_n$ and then add the probabilities of those structures where ϕ(ā) has the given value(s). Needless to say, this approach is in general extremely inefficient for large n. Ideally, we would like to have a way of computing, or at least approximating, the probability that ϕ(ā) takes a certain value which is independent of the size n of the domain. If ϕ(x̄) has no aggregation functions then the value of ϕ(ā) depends only on which atomic formulas the sequence ā satisfies; it follows that, for any interval I ⊆ [0, 1], the probability that ϕ(ā) belongs to I can be computed by using only the lifted Bayesian network and ϕ.
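The cost of the brute-force method can be seen in a toy case. The sketch below assumes a hypothetical signature with a single unary relation symbol R and a distribution under which each R(i) holds independently with probability `alpha`; neither the model nor the function names are taken from the article. It computes the probability that the arithmetic mean of the values R(1), ..., R(n) lies in [lo, hi] by enumerating all 2^n worlds, and compares this with a closed form that only needs n + 1 terms.

```python
from itertools import product
from math import comb


def brute_force_prob(n, alpha, lo, hi):
    """Sum the probabilities of all worlds (0/1 assignments to R(1..n))
    in which the arithmetic mean of the R-values lies in [lo, hi]."""
    total = 0.0
    for world in product([0, 1], repeat=n):  # 2**n possible worlds
        weight = 1.0
        for bit in world:
            weight *= alpha if bit else 1 - alpha
        if lo <= sum(world) / n <= hi:
            total += weight
    return total


def binomial_prob(n, alpha, lo, hi):
    """Same probability, grouping worlds by the number k of true atoms."""
    return sum(
        comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
        for k in range(n + 1)
        if lo <= k / n <= hi
    )
```

Already for n around 25 the enumeration visits tens of millions of worlds, which is why a method whose cost does not grow with n is desirable.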
Thus, from a computational perspective, we may wish that for every formula ϕ(x̄) of PLA we could find a formula ψ(x̄) without aggregation functions such that, with a probability approaching 1 as n → ∞, ϕ(ā) and ψ(ā) have the same value. (In the context of a 2-valued logic with quantifiers instead of aggregation functions such a result would be called "almost sure elimination of quantifiers".) However, it is not difficult to construct a formula ϕ(x̄) such that (even under the assumptions of the main theorems of this article) there is no ψ(x̄) without aggregation functions such that, almost surely, ϕ(ā) and ψ(ā) have the same value for all ā. Therefore we consider the notion of asymptotic equivalence. Two formulas ϕ(x̄) and ψ(x̄) are asymptotically equivalent (Definition 4.2) if for every ε > 0, the probability that there is ā such that the difference between the values of ϕ(ā) and ψ(ā) is larger than ε tends to 0 as n → ∞. Note that in a two-valued logic asymptotic equivalence and almost sure equivalence coincide.
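To make the notion of asymptotic equivalence concrete, consider again a hypothetical model with one unary relation R whose atoms R(1), ..., R(n) hold independently with probability `alpha` (an assumption made for this illustration only). The sentence "arithmetic mean of R(y) over all y" then never equals the constant `alpha` with probability close to 1, but the probability that it differs from `alpha` by more than ε tends to 0. The sketch computes that probability exactly from the binomial distribution, working in log-space to avoid overflow.

```python
from math import exp, lgamma, log


def deviation_prob(n, alpha, eps):
    """P(|k/n - alpha| > eps) where k ~ Binomial(n, alpha), 0 < alpha < 1.

    k/n is the value of the arithmetic mean of R(1..n) in a random world."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - alpha) > eps:
            log_pmf = (
                lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(alpha) + (n - k) * log(1 - alpha)
            )
            total += exp(log_pmf)
    return total


# Deviation probabilities for growing domain sizes.
probs = [deviation_prob(n, 0.3, 0.05) for n in (10, 100, 1000)]
```

In this toy model the entries of `probs` shrink as n grows, which is what asymptotic equivalence between the aggregation formula and the constant formula `alpha` requires for this particular ε.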
Due to the wide variety of aggregation functions and probability distributions we do not expect to be able to "asymptotically eliminate" all possible aggregation functions for all possible probability distributions. Our main result, Theorem 6.8, says that if a lifted Bayesian network has the property that all of its aggregation formulas are noncritical (in a sense to be made precise), then every formula of PLA with only admissible aggregation functions is asymptotically equivalent to a formula without aggregation functions, with respect to probability distributions induced by the lifted Bayesian network. The condition that an aggregation function is admissible means, very roughly, that it behaves in a uniformly continuous way within certain restricted contexts. As explained by Remark 7.21, the asymptotically equivalent formula without aggregation functions can be computed from the original formula by using only the lifted Bayesian network that induces the probability distribution. The result about asymptotic elimination of admissible aggregation functions implies that for every PLA-formula with only admissible aggregation functions (which include max, min, and the arithmetic and geometric means) the probability that it is satisfied (by a random tuple of parameters) converges as the domain size tends to infinity. To the best of our knowledge this is the first convergence law for a logic with truth values in the unit interval [0, 1] which can express all properties that are expressible in first-order logic.
1.5. Related work. Already in 1998 Jaeger [13] proved a convergence result for first-order formulas in a context where the probability distribution was determined by a relational Bayesian network that uses only exponentially convergent aggregation functions. The logic PLA that we will define is much like Jaeger's probability logic [13], but we will use PLA for defining queries, while Jaeger's use of probability logic in [13] is to define probability distributions (via relational Bayesian networks).
More recently, Koponen [16] proved "almost sure elimination of conditional probability quantifiers" and (as a by-product) a zero-one law for conditional probability logic, which is a two-valued logic that extends first-order logic, in a context where the distribution was determined by a lifted Bayesian network. Quite recently, Grädel et al. [11] proved convergence laws for first-order logic with semiring semantics.
Besides the above results there has recently been a growing interest in the AI community in investigating the effect of increasing domain sizes on probabilistic inference in various contexts. Very limited convergence results with respect to logical expressibility, covering only Boolean combinations of atomic formulas, have been obtained for (domain-aware) Markov logic networks and relational logistic regression networks by Poole et al. [21] and Mittal et al. [20]. Weitkämper [23,25] showed that domain-aware relational logistic regression networks and, more generally, functional lifted Bayesian networks are asymptotically equivalent to aggregation-free networks. However, they only allow for non-nested dependencies on relative frequencies rather than allowing for a choice of aggregation function. Weitkämper [24] shows asymptotic quantifier elimination for probabilistic logic programming, which only supports the noisy-or combination function.
Jaeger's work in [13] considers exponentially convergent aggregation functions in the probability formulas used to define probabilities in relational Bayesian networks. Although his notion of exponentially convergent aggregation function is similar in spirit to our Definition 6.2 of admissible aggregation function, neither of the notions implies the other. Indeed, noisy-or is exponentially convergent but not admissible, while the arithmetic mean function is admissible but not exponentially convergent.
1.6. Organization. Section 2 clarifies some basic terminology and notation. Section 3 defines the syntax and semantics of PLA and derives a couple of basic properties of PLA. Section 4 introduces the reader to asymptotic equivalence of formulas, conditional probability logic and lifted Bayesian networks, and the way they induce a probability distribution. In Section 5 we discuss the expressivity of lifted Bayesian networks and of PLA, including concrete examples. In Section 6 the notion of admissible aggregation function is defined. It is proved that arithmetic mean, geometric mean, maximum, minimum and conditional arithmetic mean are admissible aggregation functions, and the main result, Theorem 6.8, and its corollary about convergence are stated. Section 7 contains the proof of Theorem 6.8.

2. Preliminaries
We use more or less standard notation and terminology within the field of finite model theory; see e.g. [18]. The letter σ (or σ′) will always denote a finite relational signature (vocabulary). By saying that σ is finite and relational we mean that σ is finite and contains only relation symbols. We use the expression σ-structure in the sense of first-order logic and such structures are denoted by calligraphic letters $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$, possibly with super- or subscripts. If $\mathcal{A}$ is a σ-structure and σ′ ⊂ σ then $\mathcal{A}{\upharpoonright}\sigma'$ denotes the reduct of $\mathcal{A}$ to the (sub)signature σ′. The domain (universe) of a structure $\mathcal{A}$ will be denoted by the corresponding noncalligraphic letter A. Often the domain A will be the set [n] = {1, . . . , n} for some integer $n \in \mathbb{N}^+$, where $\mathbb{N}^+$ denotes the set of all positive integers and $\mathbb{N}$ denotes the set of all nonnegative integers. The cardinality of a set A is denoted by |A|. Finite sequences (tuples) of elements are denoted by ā for some noncapital letter a. The length of a sequence ā is denoted |ā|. For two sequences ā and b̄ their concatenation is denoted āb̄. The set of all elements that occur in a sequence ā is called its range and is denoted rng(ā). For a set A and integer k > 0, $A^k$ denotes the set of all k-tuples (sequences of length k) of elements from A, and $A^{<\omega} = \bigcup_{k \in \mathbb{N}^+} A^k$.
The letters x, y, z (possibly with indices) will almost always denote formal logical variables. The expressions x̄, ȳ, z̄ will denote finite sequences of distinct variables, although this assumption may sometimes be repeated. However, if ā denotes a sequence of some other kind, a sequence of reals for example, then we allow repetitions of the same element in the sequence. Formulas of a formal logic are usually denoted ϕ, ψ, χ or θ. As usual, if ϕ is a formula and all of its free variables occur in the sequence x̄ then this formula may be denoted by ϕ(x̄). If $\mathcal{A}$ is a σ-structure, ϕ(x̄) is a first-order formula over σ and $\bar a \in A^{|\bar x|}$, then the notation $\mathcal{A} \models \varphi(\bar a)$ has the same meaning as in first-order logic.
A directed acyclic graph (DAG) is a pair (V, E) where V (the set of its vertices) is any set and E ⊆ V × V (the set of its edges) has the property that

3. Probability logic with aggregation functions
Let σ be a finite relational signature.

Definition 3.1. (i) Constructions of the form 'x = y' and '$R(x_1, \ldots, x_r)$', where $x, y, x_1, \ldots, x_r$ are variables and R ∈ σ has arity r, are called atomic first-order formulas (over σ). By a first-order literal (over σ) we mean a first-order atomic formula (over σ) or the negation of such a formula. (ii) If $\mathcal{A}$ is a σ-structure with domain A and $a, b, a_1, \ldots, a_r \in A$, then the notations '$\mathcal{A} \models a = b$' and '$\mathcal{A} \models R(a_1, \ldots, a_r)$', where R ∈ σ, have the same meaning as in first-order logic.
Definition 3.2. (Atomic σ-types) A consistent set p of first-order literals over σ is called an atomic σ-type. If an atomic σ-type is denoted by p(x̄) it is understood that every variable that occurs in a formula in p(x̄) occurs in the sequence x̄. An atomic σ-type p(x̄) is called complete if for every first-order literal ϕ(x̄) ∈ PLA(σ), either ϕ(x̄) or ¬ϕ(x̄) belongs to p(x̄). If p(x̄) is an atomic σ-type and rng(ȳ) ⊆ rng(x̄), then $p(\bar x){\upharpoonright}\bar y$ (or $p{\upharpoonright}\bar y$) denotes the set of all formulas ϕ ∈ p(x̄) such that every variable of ϕ occurs in ȳ.
When convenient we will identify an atomic σ-type p(x̄) with the formula obtained by taking the conjunction of all formulas in p(x̄). With this convention, if $\mathcal{A}$ is a σ-structure and $\bar a \in A^{|\bar x|}$ the notation $\mathcal{A} \models p(\bar a)$ makes sense and means, in model theoretic language, that ā realizes p(x̄) (in the structure $\mathcal{A}$). Note that if σ = ∅, then an atomic σ-type p(x̄) will only contain literals of the form z = y or z ≠ y where z, y ∈ rng(x̄).

Definition 3.3. (Syntax of PLA(σ)) By the probability logic with aggregation functions over σ, denoted PLA(σ), we mean the set of objects called formulas which are constructed as described below. We assume that we have an infinite set of symbols called variables, usually denoted x, y, z, u, v, possibly with indices. For each ϕ ∈ PLA(σ) the notation Fv(ϕ) denotes the set of free variables of ϕ. If we denote a formula by ϕ(x̄), where x̄ is a sequence of variables, it is understood that all free variables of ϕ(x̄) occur in x̄.
We may also let ⊥ and ⊤ denote 0 and 1, respectively.
(2) For all variables x and y, 'x = y' belongs to PLA(σ). The free variables of 'x = y' are x and y.
(3) For every R ∈ σ, say of arity r, and any choice of variables $x_1, \ldots, x_r$, $R(x_1, \ldots, x_r)$ belongs to PLA(σ). The free variables of $R(x_1, \ldots, x_r)$ are $x_1, \ldots, x_r$.
(4) If ϕ, ψ, χ ∈ PLA(σ) then the following also belong to PLA(σ): (¬ϕ), (ϕ ∧ ψ), (ϕ ∨ ψ), (ϕ → ψ), and (the ϕ-weighted mean of ψ and χ), but we may skip some parentheses if there is no ambiguity. In each case the set of free variables of the new formula is the union of the sets of free variables of the formulas from which it is constructed. We consider ϕ ↔ ψ as an abbreviation of (ϕ → ψ) ∧ (ψ → ϕ).
belongs to PLA(σ). If this new formula is denoted ψ then thus this construction binds the variables in ȳ.
Definition 3.4. (i) A formula that does not contain any aggregation function is called aggregation-free.
With the above definitions the semantics of the propositional constructions in part (4) coincide with the common semantics for ¬, ∧, ∨ and → when the values are 0 or 1. Also, each propositional construction corresponds to a uniformly continuous function and this is essential in the proofs of the main results.
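The behaviour described above can be sketched as follows. Definition 3.5 is not reproduced in this text, so the sketch rests on assumptions: negation as 1 − x and implication as min(1, 1 − x + y) are the standard Łukasiewicz truth functions, conjunction and disjunction are taken to be min and max (the weak Łukasiewicz connectives), and the weighted mean is the function x · y + (1 − x) · z that appears in the proof of Lemma 4.3. The function names are ours.

```python
def neg(x):
    """Lukasiewicz negation."""
    return 1 - x


def conj(x, y):
    """Conjunction, assumed here to be the weak (min) conjunction."""
    return min(x, y)


def disj(x, y):
    """Disjunction, assumed here to be the weak (max) disjunction."""
    return max(x, y)


def impl(x, y):
    """Lukasiewicz implication."""
    return min(1.0, 1 - x + y)


def wm(x, y, z):
    """The x-weighted mean of y and z."""
    return x * y + (1 - x) * z
```

Restricted to the values 0 and 1 these functions reproduce the classical truth tables, and each of them is continuous on the unit cube, so small changes in the truth values of subformulas cause only small changes in the value of the compound formula.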
Remark 3.6. (On aggregations without identity constraints) The reader may ask why we did not, for a k-ary aggregation function F, add to PLA(σ) formulas of the form $F(\varphi_1(\bar x, \bar y), \ldots, \varphi_k(\bar x, \bar y) : \bar y)$ with the same semantics as in part (5) of Definition 3.5 except for omitting the condition that $p^=(\bar a, \bar b)$ holds (as $p^=$ no longer appears in the new formula). The reason is that for formulas of this form our proof that an admissible aggregation function F can be asymptotically eliminated does not work out (where the notion 'admissible' is defined in Definition 6.2), because the "degrees of freedom" which are determined by an identity type (usually denoted $p^=$ here) matter in this context. More detailed explanations of why the proof does not work out are found in Remark 7.17. However, as we show in [17], every strongly admissible aggregation function F (including the arithmetic and geometric means, but not max and min) can be asymptotically eliminated from a formula like $F(\varphi_1(\bar x, \bar y), \ldots, \varphi_k(\bar x, \bar y) : \bar y)$ and for more general kinds of probability distributions than considered here.

Definition 3.7. We say that ϕ(x̄) ∈ PLA(σ) and ψ(x̄) ∈ PLA(σ) are equivalent if, for every finite σ-structure $\mathcal{A}$ and every $\bar a \in A^{|\bar x|}$, $\mathcal{A}(\varphi(\bar a)) = \mathcal{A}(\psi(\bar a))$.
Remark 3.8. A basic probability formula which is also a sentence, that is, a formula without free variables, has the form $\bigwedge_{i=1}^{n} (\top \to c_i)$ where $c_i \in [0, 1]$ (and recall that ⊤ = 1). The formula $\bigwedge_{i=1}^{n} (\top \to c_i)$ is equivalent to c where $c = \min\{c_1, \ldots, c_n\}$, so every basic probability sentence is equivalent to a sentence of the form c for some c ∈ [0, 1].

Definition 3.9. The aggregation rank of a formula ϕ ∈ PLA(σ), denoted agr(ϕ), is defined as follows: (1) If ϕ is aggregation-free then agr(ϕ) = 0.
Lemma 3.10. If ϕ(x̄) ∈ PLA(σ) is aggregation-free then ϕ(x̄) is equivalent to a basic probability formula.
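Proof. Let $p_1(\bar x), \ldots, p_m(\bar x)$ enumerate, without repetition, all complete atomic σ-types in the variables x̄. As ϕ(x̄) is aggregation-free it is clear that for all i = 1, . . . , m, every σ-structure $\mathcal{A}$ and all $\bar a, \bar b \in A^{|\bar x|}$, if $\mathcal{A} \models p_i(\bar a)$ and $\mathcal{A} \models p_i(\bar b)$ then $\mathcal{A}(\varphi(\bar a)) = \mathcal{A}(\varphi(\bar b))$. Therefore there are (not necessarily distinct) $c_1, \ldots, c_m \in [0, 1]$ such that whenever $\mathcal{A}$ is a σ-structure and $\bar a \in A^{|\bar x|}$, then, for every i = 1, . . . , m, $\mathcal{A} \models p_i(\bar a)$ implies $\mathcal{A}(\varphi(\bar a)) = c_i$. Thus, if $\mathcal{A}$ is a σ-structure and $\bar a \in A^{|\bar x|}$, then there is a unique j such that $\mathcal{A} \models p_j(\bar a)$ and we get $\mathcal{A}\big(\bigwedge_{i=1}^{m} (p_i(\bar a) \to c_i)\big) = c_j = \mathcal{A}(\varphi(\bar a))$.

Below we note that PLA(σ) respects isomorphism.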

4. Directed parametrized probabilistic graphical models and induced sequences of probability distributions

4.1. Sequences of probability distributions and asymptotic equivalence. Throughout this section (as in the rest of the article) we assume that σ is a finite relational signature and that $W_n$ denotes the set of all σ-structures with domain [n].
Definition 4.1. By a sequence of probability distributions (on (W n : n ∈ N + )) we mean a sequence (P n : n ∈ N + ) such that for every n, P n is a probability distribution on W n .
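Definition 4.2. Let ϕ(x̄), ψ(x̄) ∈ PLA(σ) where x̄ is a tuple of distinct variables. We say that ϕ(x̄) and ψ(x̄) are asymptotically equivalent (with respect to $(P_n : n \in \mathbb{N}^+)$) if for all ε > 0

$$\lim_{n \to \infty} P_n\big(\{\mathcal{A} \in W_n : \text{there is } \bar a \in [n]^{|\bar x|} \text{ such that } |\mathcal{A}(\varphi(\bar a)) - \mathcal{A}(\psi(\bar a))| > \varepsilon\}\big) = 0.$$

The following lemma is essential for the proof of the main results.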

Lemma 4.3. (Preservation of asymptotic equivalence under connectives) Suppose that ϕ(x̄) is asymptotically equivalent to ϕ′(x̄), ψ(x̄) is asymptotically equivalent to ψ′(x̄), and χ(x̄) is asymptotically equivalent to χ′(x̄). Let θ(x̄) be a formula constructed from ϕ(x̄), ψ(x̄) and/or χ(x̄) by any one of the constructions in part (4) of Definition 3.3 and let θ′(x̄) be constructed in the same way from ϕ′(x̄), ψ′(x̄) and/or χ′(x̄). Then θ(x̄) and θ′(x̄) are asymptotically equivalent with respect to $(P_n : n \in \mathbb{N}^+)$.

Proof. The functions that interpret the connectives ¬, ∧, ∨ and →, as well as the function x · y + (1 − x) · z, are uniformly continuous. Therefore the conclusion follows from the semantics (Definition 3.5), the assumptions about asymptotic equivalence and from the assumption that θ and θ′ are constructed in the same way from ϕ(x̄), ψ(x̄) and/or χ(x̄) and from ϕ′(x̄), ψ′(x̄) and/or χ′(x̄), respectively.

4.2. Conditional probability logic and lifted Bayesian networks. The parametrized probabilistic graphical model that we will use was introduced in [16] and is called a lifted Bayesian network. Lifted Bayesian networks use, in their definition, a logic (also introduced in [16]) called conditional probability logic (CPL), so we introduce this logic first. (There are previously considered logics, such as the ones in [1] and [12], the expressivity of which is at least as strong as the expressivity of CPL, but for such logics we have not found any "convergence" results of the kind proved in [16] which will be used later.) The set of conditional probability formulas over σ, denoted CPL(σ), is defined as follows: (1) Every atomic σ-formula belongs to CPL(σ) (where 'atomic' has the same meaning as in first-order logic with equality).
In both these new formulas all variables of ϕ, ψ, θ and τ that appear in the sequenceȳ become bound. That this construct is a form of quantification becomes apparent from its semantics below.
A formula ϕ ∈ CPL(σ) is called quantifier-free if it is constructed from atomic formulas by using only connectives ¬, ∧, ∨, → and ↔.
(2) Suppose that A is a finite σ-structure and let ϕ( and in this case we say that then we write

Definition 4.6. (Lifted Bayesian network) Let σ be a finite relational signature. A lifted Bayesian network for σ is determined by the following components: (a) An acyclic directed graph (DAG) G with vertex set σ.
We use the convention to denote a lifted Bayesian network by the same symbol (e.g. G) as its underlying DAG. Observe that Definition 4.6 makes sense if σ is empty. In this case the underlying DAG has empty vertex set (and edge set) and no numbers or formulas as in parts (b) and (c) of the definition need to be specified.
Definition 4.7. (The probability distribution induced by a lifted Bayesian network) Let σ be a finite nonempty relational signature and let G denote a lifted Bayesian network over σ. In this definition we denote the arity of R ∈ σ by $k_R$. Suppose that the underlying DAG of G has maximal path rank ρ. Let $\sigma_{-1} = \emptyset$ and, for 0 ≤ r ≤ ρ, let $\sigma_r = \{R \in \sigma : \mathrm{mp}(R) \le r\}$. For r = −1, 0, 1, . . . , ρ, let $G_r$ be the subnetwork of G which is induced by $\sigma_r$ and let $W^r_n$ be the set of all $\sigma_r$-structures with domain [n]. Note that $G_\rho = G$ and $W^\rho_n = W_n$. Let $P^{-1}_n$ be the unique probability distribution on the singleton set $W^{-1}_n$. By induction on r we define, for every r = 0, 1, . . . , ρ, a probability distribution $P^r_n$ on the set $W^r_n$ as follows: For every $\mathcal{A} \in W^r_n$, otherwise.
Finally we let P n = P ρ n so (P n : n ∈ N + ) is a sequence of probability distributions on (W n : n ∈ N + ) which we call the sequence of probability distributions induced by G.
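The level-by-level construction can be illustrated by a sampling sketch. It does not implement Definition 4.7, whose defining formula is not reproduced in this text; it only follows the informal description from Section 1.3, and it assumes that the atoms of a relation are decided independently once the relations of lower rank are fixed. The network is hypothetical: a unary relation P of rank 0 that holds with probability `alpha_p`, and a binary relation E of rank 1 that holds for (a, b) with probability `alpha_e_both` if both P(a) and P(b) hold and with probability `alpha_e_other` otherwise.

```python
import random


def sample_world(n, alpha_p, alpha_e_both, alpha_e_other, rng):
    """Sample one structure with domain {1, ..., n} for the signature {P, E}."""
    domain = range(1, n + 1)

    # Rank 0: P has no parents, so each atom P(a) is an independent coin flip.
    P = {a for a in domain if rng.random() < alpha_p}

    # Rank 1: the probability of E(a, b) is selected by a quantifier-free
    # condition on the already sampled relation P.
    E = set()
    for a in domain:
        for b in domain:
            alpha = alpha_e_both if (a in P and b in P) else alpha_e_other
            if rng.random() < alpha:
                E.add((a, b))
    return P, E


# Example use with a seeded generator so that the run is reproducible.
P, E = sample_world(5, 0.3, 0.8, 0.1, random.Random(0))
```

In a genuine lifted Bayesian network the conditions selecting the probability may be arbitrary CPL-formulas over the parents, including relative-frequency thresholds, which this quantifier-free toy condition does not show.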

5. Expressivity
The scope of the results shown here depends both on the expressivity of the query language PLA and on the expressivity of the lifted Bayesian networks which induce the probability distributions for which we show our results.

5.1. Scope of the underlying families of probability distributions. Conditional probability logic allows the expression of discrete conditions based on relative frequencies, in addition to the full power of first-order logic. Such conditions are often used as triggers in policy or engineering applications. For instance, consider modelling infectious disease dynamics on networks. Then CPL can express a variety of trigger conditions, such as the occurrence of a single positive case (using existential quantification) or a certain percentage of people being infected (using relative frequency quantification). Lifted Bayesian networks then allow the modelling of actions that may be taken when those conditions are met. This suffices for modelling a variety of real-world policy decisions (such as those summarised in [3, Table II]). For further examples of the expressivity of CPL see Example 3.5 and Remarks 3.4 and 3.6 in [16].

A clearly important fragment of CPL for which our results hold is first-order logic itself. Lifted Bayesian networks whose formulas are first-order already suffice to model the relational Bayesian network specifications of Cozman and Mauá's probabilistic finite model theory [6,7].
Beyond CPL and lifted Bayesian networks, our results generalize immediately to every other sequence of probability distributions that is asymptotically equivalent to a sequence of distributions induced by a lifted Bayesian network in the following sense:

Definition 5.1. Two sequences of distributions $P = (P_n : n \in \mathbb{N}^+)$ and $P' = (P'_n : n \in \mathbb{N}^+)$ are asymptotically equivalent if $\lim_{n \to \infty} \max_{X \subseteq W_n} |P_n(X) - P'_n(X)| = 0$.

Remark 5.2. In measure theoretic terms, the sequences of distributions P and P′ are asymptotically equivalent if and only if the limit of the total variation difference between them is 0.
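The total variation difference mentioned in Remark 5.2 is easy to compute for distributions on a small finite set. The sketch below represents a distribution as a dictionary from worlds to probabilities (a representation chosen for this illustration, as are the function names) and also checks, by brute force over all events, the standard fact that the total variation distance equals the largest difference in probability that the two distributions assign to any single event.

```python
from itertools import combinations


def total_variation(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    worlds = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in worlds)


def max_event_gap(p, q):
    """max over all events X of |p(X) - q(X)|, by exhaustive search.

    Exponential in the number of worlds; only meant for tiny examples."""
    worlds = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(worlds) + 1):
        for event in combinations(worlds, size):
            gap = abs(
                sum(p.get(w, 0.0) for w in event)
                - sum(q.get(w, 0.0) for w in event)
            )
            best = max(best, gap)
    return best
```

For two sequences of distributions one would evaluate this quantity for each n and ask whether the values tend to 0.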
For two formalisms that are very different from lifted Bayesian networks, namely probabilistic logic programming under the distribution semantics and functional lifted Bayesian networks, it has recently been demonstrated [24,25] that every sequence of distributions induced by such a formalism is asymptotically equivalent to a sequence of distributions that is induced by a lifted Bayesian network in which all aggregation formulas are Boolean combinations of atomic formulas.
Probabilistic logic programming is one of the most-studied formalisms for statistical relational artificial intelligence that is unique in supporting recursion in the context of negation-by-failure, a feature inherited from classical logic programming. It has found significant practical application in bioinformatics [8].
Functional lifted Bayesian networks are much closer to PLA itself, as they are designed to support continuous dependencies on relative frequency. However, unlike in PLA, the aggregation functions used in defining a functional lifted Bayesian network must not be nested, which limits their expressivity but ensures the asymptotic equivalence to a quantifier-free lifted Bayesian network. They can model both linear and logistic regression functions, which suffices to express domain-size aware relational logistic regression [23].
To give some feeling of what kind of distributions can be described with lifted Bayesian networks, we consider the following example which we describe informally. Suppose we have properties $P_1, \ldots, P_s$ which also correspond to unary relation symbols. Each $P_i$ may be (conditionally) dependent on some $P_j$ and (conditionally) independent of other $P_k$. These (conditional) dependencies and independencies can be described by a directed acyclic graph with vertex set $\{P_1, \ldots, P_s\}$. To each $P_i$ we associate some CPL-formulas that use only $P_j$ among the parents of $P_i$ and which define cases such that within each case $P_i(x)$ holds with a fixed probability. Let E be a binary relation symbol (corresponding to some relation) and let the probability that E(x, y) holds depend (only) on which $P_i$ are satisfied by x and y, respectively. More formally the directed acyclic graph is enlarged with the vertex E and arrows from $P_i$ to E for all $P_i$ which have influence on the probability of E. Let R be a binary relation symbol, let $0 < r_1 < \ldots < r_t = 1$ and $c_1, \ldots, c_t \in [0, 1]$. Let the probability that R(x, y) holds be $c_i$ if , among z such that E(u, z) for at least 1/10 of the u in the domain, and $d_y$ is the proportion of z with E(z, y), among z such that E(u, z) for at least 1/10 of the u in the domain. More formally, the directed acyclic graph is enlarged with a vertex R and an arrow from E to R, and for every i = 1, . . . , t − 1, a CPL-formula which expresses that

5.2. Expressivity of PLA. CPL is fundamentally distinct from PLA by working in a 0-1-valued rather than a continuous-valued logic. CPL therefore supports nesting conditional probability quantifiers, but does not allow for continuous dependencies on those conditional probabilities nor for other aggregation functions than conditional probabilities.
The expressiveness of PLA arises precisely from allowing nested combinations of different aggregation functions. The support for arithmetic mean, and variations of it, among them opens up new possibilities not covered by any of the aforementioned formalisms.
"The similarity to x of the most similar other element" is given by max(ψ(x, y) : y : x ≠ y).
"The average similarity of x to other elements" is given by am(ψ(x, y) : y : x ≠ y).
"The lowest similarity score of any two elements" is expressed by
In Example 5.7 we show that all the stages of SimRank [14] are expressible in PLA.
Example 5.5. (Conditional arithmetic mean) There are situations when we are interested in the mean over elements that satisfy some condition. In the present context we can express this situation by considering P LA-formulas ϕ(x,ȳ) and ψ(x,ȳ) where ψ is 0-1 valued. Let p = (x,ȳ) be a complete atomic ∅-type, so it expresses all identity relations among the variablesxȳ. For a finite structure A andā ∈ A |x| , the arithmetic mean of A(ϕ(ā,b)) asb ranges over all tuples in A |ȳ| that satisfy p = (ā,ȳ) and ψ(ā,ȳ) can, letting X = {b ∈ A |ȳ| : p = (ā,b) holds}, be written as We wish to find a P LA-formula θ(x) such that A(θ(ā)) equals (5.1) whenever the denominator is positive. For this we use the aggregation function 'cam' defined for all cam(p,q) = 0 ifp contains only zeros, and otherwise Note that ifp contains at least one nonzero entry, then 0 < am(p) ≤ am( √p ) and hence . So the division with max(am( √p ), am(q)) instead of just am(q) makes sure that cam(p,q) always belongs to [0, 1], but also, by Proposition 6.5 below, it follows that cam is admissible (that is, it has some "continuity properties") so that the main results apply to formulas using it. Let θ(x) be the P LA-formula and suppose thatq is not constantly zero, so it contains at least one 1.
which equals (5.1) under the stated assumptions.
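The behaviour of cam described in Example 5.5 can be checked on small inputs. The following is a minimal sketch, assuming that cam(p, q) is am(p) divided by max(am(√p), am(q)), as suggested by the remark about the division, and that conjunction with a 0-1 valued formula is evaluated as the minimum; the function names and the sample values are illustrative only.

```python
from math import sqrt, isclose
from statistics import fmean

def cam(p, q):
    """Conditional arithmetic mean (assumed form): 0 if p is all zeros,
    otherwise am(p) / max(am(sqrt(p)), am(q))."""
    if all(v == 0 for v in p):
        return 0.0
    return fmean(p) / max(fmean(sqrt(v) for v in p), fmean(q))

# Hypothetical values of phi and of the 0-1 valued condition psi on four tuples.
phi = [0.25, 0.8, 0.75, 0.9]
psi = [1, 0, 1, 0]

# Entries of p are the values of (phi AND psi); entries of q are the values of psi.
p = [min(f, c) for f, c in zip(phi, psi)]
q = [float(c) for c in psi]

# Plain conditional mean of phi over the tuples where psi holds.
conditional_mean = fmean(f for f, c in zip(phi, psi) if c == 1)

print(cam(p, q), conditional_mean)
```

Because every entry of p is bounded by the corresponding 0-1 entry of q, am(√p) ≤ am(q) here, so the maximum in the denominator is am(q) and cam agrees with the ordinary conditional mean. The maximum only matters for inputs that do not come from such a pair of formulas, where it keeps the value in [0, 1].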
Example 5.6. (Conditional arithmetic mean with relaxed identity constraints) Let $\varphi(\bar x, y, z), \psi(\bar x, y, z) \in PLA(\sigma)$ and assume that ψ is 0-1 valued. Let $A \in W_n$ and $\bar a \in [n]^{|\bar x|}$. Suppose that we want to express the average of $A(\varphi(\bar a, b, c))$ as (b, c) ranges over all ordered pairs of elements in $[n] \setminus \mathrm{rng}(\bar a)$ such that $A(\psi(\bar a, b, c)) = 1$. In other words we allow that b = c and that b ≠ c, so we have not fixed a complete identity constraint on y and z. Then we cannot directly apply the methods of Example 5.5 since those methods require that we consider the conditional arithmetic mean only for (b, c) such that b = c or only for (b, c) such that b ≠ c. However, we can use the idea of Example 5.5 together with some additional "tricks" which we now explain. If we let $X = [n] \setminus \mathrm{rng}(\bar a)$, $n' = n - |\mathrm{rng}(\bar a)|$ and $\varphi'(\bar x, y, z)$ denotes $\varphi(\bar x, y, z) \wedge \psi(\bar x, y, z)$ then the described conditional average can be written as
We wish to express the above by a PLA-formula that uses only admissible aggregation functions, but a problem is that the expressions above are undefined if
$\mathrm{cam}^*(\bar p, \bar q, \bar r, \bar s) = 0$ if $\bar p$ and $\bar q$ contain only zeros,
Note that cam* is defined for all possible $\bar p, \bar q, \bar r, \bar s \in [0, 1]^{<\omega}$ and that its output is always in [0, 1]. By Proposition 6.5 below, cam* is admissible and hence the main results apply to formulas using it.
Let $\theta(\bar x)$ be the formula $\mathrm{cam}^*(\varphi'(\bar x, y, z), \varphi'(\bar x, y, y), \psi(\bar x, y, z), \psi(\bar x, y, y) : y, z : p^=(\bar x, y, z))$ where $p^=(\bar x, y, z)$ expresses the identity relations among the elements in $\bar a$, that y and z are different from all variables in $\bar x$, and that y ≠ z. We claim that $A(\theta(\bar a))$ equals (5.2) whenever (5.2) is well defined. Let
and note that the length of each of the above sequences is n′(n′ − 1). Then $A(\theta(\bar a)) = \mathrm{cam}^*(\bar p, \bar q, \bar r, \bar s)$. Suppose that at least one of $\bar p$ and $\bar q$ contains at least one non-zero entry.
Observe that $A(\varphi'(\bar a, b, c)) \le A(\psi(\bar a, b, c))$ for all b and c and recall that $A(\psi(\bar a, b, c))$ is either 0 or 1. It follows that if some entry of $\bar r$ is 1 then $\max(\bar p \bar r) = 1$ and $\lambda(\bar p, \bar r) = 1/m$, so
\[ \mathrm{cam}^*(\bar p, \bar q, \bar r, \bar s) = \frac{\mathrm{am}(\bar p) + \frac{1}{m}\mathrm{am}(\bar q)}{\mathrm{am}(\bar r) + \frac{1}{m}\mathrm{am}(\bar s)} \]
which is equal to (5.2). Now suppose that all entries of $\bar r$ are zero (so all entries of $\bar p$ are zero as well) but some entry of $\bar s$ is one. Then $\max(\bar p \bar r) = 0$ and $\lambda(\bar p, \bar r) = 1$ and hence
In the next example we show that cam* can be used to define the "stages" of the so-called SimRank.
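Example 5.7. (SimRank) Consider a signature σ with a binary relation symbol E. For a finite σ-structure A, a measure of the similarity of two elements a, b ∈ A is given by the so-called SimRank [14] defined recursively as
\[ s(a, b) = \begin{cases} 1 & \text{if } a = b, \\ \dfrac{C}{|I(a)|\,|I(b)|} \displaystyle\sum_{c \in I(a)} \sum_{d \in I(b)} s(c, d) & \text{if } a \ne b,\ I(a) \ne \emptyset \text{ and } I(b) \ne \emptyset, \\ 0 & \text{otherwise,} \end{cases} \]
where C ∈ (0, 1] is a constant and I(a) denotes the set of in-neighbours of a, that is, $I(a) = \{c \in A : A \models E(c, a)\}$. The SimRank s(a, b) can be estimated in stages by k'th stage SimRanks $s_k(a, b)$ defined as follows: $s_0(a, b) = 1$ if a = b and $s_0(a, b) = 0$ otherwise, and
\[ s_{k+1}(a, b) = \begin{cases} 1 & \text{if } a = b, \\ \dfrac{C}{|I(a)|\,|I(b)|} \displaystyle\sum_{c \in I(a)} \sum_{d \in I(b)} s_k(c, d) & \text{if } a \ne b,\ I(a) \ne \emptyset \text{ and } I(b) \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases} \]
Then $\lim_{k \to \infty} s_k(a, b) = s(a, b)$ [14]. We now construct, for any k ∈ N, a PLA(σ)-formula $\varphi_k(x, y)$ such that for every finite σ-structure A and all a, b ∈ A, $A(\varphi_k(a, b)) = s_k(a, b)$. For simplicity we let C = 1, because if we have defined $\varphi_k$ so that the above holds for C = 1 then we can use the weighted mean (from Definition 3.3 of the syntax of PLA) to get a similar formula for any C ∈ (0, 1).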
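The stage-wise approximation of SimRank can be computed directly on a small graph. The sketch below assumes the standard recursion from [14]: stage 0 is the identity relation, and stage k + 1 averages the stage-k values over all pairs of in-neighbours, scaled by C, with value 0 when either in-neighbour set is empty. The function name `simrank_stage` and the example graph are illustrative and not taken from the paper.

```python
def simrank_stage(n, edges, k, C=1.0):
    """Return the matrix of k'th stage SimRank values on the domain {0, ..., n-1}.

    edges is a collection of pairs (u, v) meaning that E(u, v) holds,
    so u is an in-neighbour of v.
    """
    # I(a): the set of in-neighbours of a.
    in_nb = [sorted({u for (u, v) in edges if v == a}) for a in range(n)]
    # Stage 0: similarity 1 on the diagonal, 0 elsewhere.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(k):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    nxt[a][b] = 1.0
                elif in_nb[a] and in_nb[b]:
                    total = sum(s[c][d] for c in in_nb[a] for d in in_nb[b])
                    nxt[a][b] = C * total / (len(in_nb[a]) * len(in_nb[b]))
                # otherwise the value stays 0
        s = nxt
    return s

# Elements 1 and 2 share the single in-neighbour 0.
stage1 = simrank_stage(3, [(0, 1), (0, 2)], k=1, C=0.8)
print(stage1[1][2], stage1[0][1])
```

In this graph the only pair of in-neighbours of (1, 2) is (0, 0), so the first-stage value is C times the stage-0 similarity of 0 with itself, while the pair (0, 1) gets value 0 because 0 has no in-neighbours.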
We simply let ϕ 0 (x, y) be the formula x = y. Suppose that ϕ k (x, y) ∈ P LA(σ) is such that, for all a, b ∈ A, if a = b or if I(a) = ∅ and I(b) = ∅, then A(ϕ k (a, b)) = s k (a, b). Then let ψ(x, y, u, v) be the formula E(u, x) ∧ E(v, y) and let ϕ ′ k (x, y, u, v) be ϕ k (u, v) ∧ ψ(x, y, u, v). Define ϕ k+1 (x, y) to be the formula

6. Admissibility and the main result
We begin by considering the condition on aggregation functions, admissibility, that will allow us to asymptotically eliminate them in the context of distributions induced by lifted Bayesian networks.
6.1. Admissibility. Our main result says that 'admissible' aggregation functions can be asymptotically eliminated from PLA-formulas. Admissibility is a kind of continuity condition and to define it we will use the notion of convergence testing sequence. Informally speaking, an infinite sequence $\bar r_n \in [0, 1]^{<\omega}$, n ∈ N, is convergence testing if $|\bar r_n| < |\bar r_{n+1}|$ for all n and if there are $k \in \mathbb{N}^+$ and $c_1, \ldots, c_k, \alpha_1, \ldots, \alpha_k \in [0, 1]$ such that, as n → ∞, every entry of $\bar r_n$ is ever closer to one of $c_1, \ldots, c_k$ and, for i = 1, ..., k, the proportion of entries in $\bar r_n$ that are close to $c_i$ is ever closer to $\alpha_i$. Our definition of convergence testing sequence is similar in spirit to a stronger notion with the same name used by Jaeger [13]. The corresponding notion in [13] is stronger than ours because it adds a requirement that "accumulation" around certain points happens with exponential speed.
Definition 6.1. A sequence $\bar r_n \in [0, 1]^{<\omega}$, n ∈ N, is called convergence testing for parameters $c_1, \ldots, c_k \in [0, 1]$ and $\alpha_1, \ldots, \alpha_k \in [0, 1]$ if the following hold, where $r_{n,i}$ denotes the ith entry of $\bar r_n$: (1) $|\bar r_n| < |\bar r_{n+1}|$ for all n ∈ N.
Observe that in the definition of admissibility we require that all $\alpha_{i,j}$ are nonzero. It is straightforward to verify that Noisy-or is not admissible, but we have:
Proposition 6.3. The functions am (arithmetic mean), gm (geometric mean), max and min are admissible.
Proof. All functions are clearly continuous on $[0, 1]^n$, for every n, as can be seen directly from their definition. So we proceed to show that they are compatible with convergence testing sequences as demanded by Condition (2) of admissibility. So let $\bar r_n$, n ∈ N, be convergence testing with parameters $c_1, \ldots, c_k \in [0, 1]$ and $\alpha_1, \ldots, \alpha_k \in (0, 1]$. Let F be the arithmetic mean. Then $\lim_{n \to \infty} F(\bar r_n) = \alpha_1 c_1 + \cdots + \alpha_k c_k$. Indeed, for any sufficiently small δ > 0 there is an N ∈ N such that for all n > N,
and
\[ \frac{|\{ l \le |\bar r_n| : r_{n,l} \in I_i \}|}{|\bar r_n|} \in (\alpha_i - \delta, \alpha_i + \delta). \]
For F the geometric mean we obtain $\lim_{n \to \infty} F(\bar r_n) = \prod_{i=1}^{k} c_i^{\alpha_i}$. Indeed, for any δ > 0 choose N such that (6.1) and (6.2) hold for all n > N. Then for all n > N,
It can be used when some quantity is influenced by the imbalance of (the means of) two other quantities. Since 'am' is admissible and |x − y| is uniformly continuous on $\mathbb{R}^2$ it follows that F is admissible.
G is used in the context of Domain-size-Aware Relational Logistic Regression models [23], and it is admissible because 'am' is admissible and S is uniformly continuous.
As a third example, which is not an "arithmetic combination" of unary aggregation functions (such as am or gm) we have the pseudometric µ u 1 on [0, 1] <ω , described in Definition 7.2 below, which is a binary aggregation function.
In Examples 5.5 and 5.6 we considered the "conditional arithmetic means" cam and cam * . The next proposition tells that they are indeed admissible.
Proposition 6.5. The aggregation functions cam and cam * are admissible.
First suppose that all a i are zero. Then lim n→∞ max(p n ) = 0 and it follows from (6.3) that lim n→∞ cam(p n ,r n ) = 0.
Next, suppose that at least one $a_i$ is positive. In this case
(1) and (2) in the definition of admissibility, we observe that if it is not the case that both $\bar p$ and $\bar q$ are constantly zero, then the following holds, which is straightforward to verify:
We first show that, for all $n_1, n_2, n_3, n_4 \in \mathbb{N}^+$, cam* is continuous at every point $(\bar p_0, \bar q_0, \bar r_0, \bar s_0) \in [0, 1]^{n_1} \times [0, 1]^{n_2} \times [0, 1]^{n_3} \times [0, 1]^{n_4}$. Note that the number m in the expression $\lambda(\bar p_0, \bar r_0)$ depends only on $|\bar p_0| = n_1$. As cam* is constructed by composing arithmetic operations, max, the function am and the square root, its continuity at $(\bar p_0, \bar q_0, \bar r_0, \bar s_0)$ is clear whenever $\bar p_0$ and $\bar q_0$ do not consist only of zeros. So now suppose that both $\bar p_0$ and $\bar q_0$ consist only of zeros, so $\mathrm{cam}^*(\bar p_0, \bar q_0, \bar r_0, \bar s_0) = 0$. If $(\bar p, \bar q, \bar r, \bar s)$ approaches $(\bar p_0, \bar q_0, \bar r_0, \bar s_0)$, then $\bar p$ and $\bar q$ approach sequences which are constantly zero, so the right hand side of (6.5) approaches 0 and hence $\mathrm{cam}^*(\bar p, \bar q, \bar r, \bar s)$ approaches 0.
Suppose that at least one $a_i$ is positive. Then $\max(\bar p_n \bar r_n)$ converges to a positive number and $\mathrm{am}(\sqrt{\bar p_n})$ converges to a positive number (namely $\sum_i \alpha_i \sqrt{a_i}$), so the denominator in (6.4) converges to a positive number. If some $c_i$ is positive then $\max(\bar p_n \bar r_n)\,\mathrm{am}(\bar r_n)$ converges to a positive number and hence the denominator in (6.4) converges to a positive number.
Suppose that all $a_i$ and all $c_i$ are zero and some $d_i$ is positive. Then $\max(\bar p_n \bar r_n)$ converges to 0 and hence $\lambda(\bar p_n, \bar r_n)$ converges to a positive number. Also, $\mathrm{am}(\bar s_n)$ converges to $\sum_i \delta_i d_i$ which is positive since all $\delta_i$ are positive. Hence $\lambda(\bar p_n, \bar r_n)\,\mathrm{am}(\bar s_n)$ converges to a positive number and thus the same holds for the denominator in (6.4).
Suppose that all a i , all c i and all d i are zero, but at least one b i is positive. Then lim n→∞ cam * (p n ,q n ,r n ,s n ) = lim Finally, suppose that all a i , all b i , all c i and all d i are zero. Then it follows from (6.5) that lim n→∞ cam * (p n ,q n ,r n ,s n ) = 0.

6.2. Noncriticality. Let σ be a finite relational signature. The main result uses the assumption that every aggregation formula of the lifted Bayesian network G (for σ) used to define probability distributions is noncritical with respect to the network. The notion of noncritical CPL(σ)-formula defined in [16] uses the notion of m-critical number, where m ∈ N. The notion of m-critical number in [16, Definition 4.29] is quite technical and is embedded in the proof of the main results of [16]. But it follows from Lemma 4.12 and Definitions 4.18, 4.22, and 4.29 in [16], that if α ∈ R is m-critical (with respect to a lifted Bayesian network G for σ) in the sense of [16, Definition 4.29], then it can be generated, using the operations addition, multiplication and division from the set of numbers
where $\mu(R \mid \chi_{R,i})$ is the number associated to G in Definition 4.6. In fact, if α is m-critical in the sense of [16, Definition 4.29], then α can be generated from S(G) with at most 16 l l ·|σ| applications of the operations addition, multiplication and division, where l is the sum of m and the maximal arity of the relation symbols in σ. The number 16 l l ·|σ| is a very crude upper bound based on considering the possible atomic σ-types in m variables and the definitions and result from [16] mentioned above. To describe noncritical formulas more easily we will use the following definition.
Definition 6.6. Let ϕ ∈ CPL(σ). A real number r is a quantifier parameter of ϕ if ϕ has a subformula of the form
If a formula ϕ ∈ CPL(σ) is critical with respect to a lifted Bayesian network G, in the sense of [16, Definition 4.30], then ϕ has a quantifier parameter r such that r = α − β and both α and β are m-critical in the sense of [16, Definition 4.29] where m is the sum of the number of free variables in ϕ and the quantifier-rank of ϕ (in the sense of [16, Definition 3.7]).
It follows from the discussion above that if ϕ is critical with respect to G in the sense of [16,Definition 4.30], then ϕ has a quantifier parameter r such that r = α − β and both α and β can be generated from S(G), as defined above, by at most 16 l l ·|σ| applications of the operations addition, multiplication and division, where l is the sum of the length of ϕ (as a string of symbols) and the maximal arity of the relation symbols in σ.
In order to avoid the technicalities involved in [16,Definitions 4.29 and 4.30] we will define a notion of noncritical formula which is somewhat stronger than the corresponding notion in [16,Definition 4.30] but still interesting, we think, since for any k, the set of numbers which can be generated from S(G) with at most k applications of addition, multiplication and division is finite, so this set can be avoided by "moving" a tiny bit up or down in R. Definition 6.7. (Noncritical formula with respect to G) Let G be a lifted Bayesian network for σ. We call a formula ϕ ∈ CP L(σ) potentially critical with respect to G if it has a quantifier parameter r such that r = α − β and both α and β can be generated from S(G) (as defined above) by at most 16 l l ·|σ| applications of the operations addition, multiplication and division, where l is the sum of the length of ϕ (as a string of symbols) and the maximal arity of the relation symbols in σ. Otherwise we call ϕ noncritical with respect to G.
It follows that every first-order formula is noncritical with respect to any lifted Bayesian network.
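The finiteness argument behind Definition 6.7 can be made concrete: from a finite set of numbers, only finitely many values are obtainable with at most k applications of addition, multiplication and division, so a quantifier parameter can always be moved slightly to avoid them. The sketch below enumerates that set exactly with rational arithmetic; the starting set is a hypothetical stand-in for S(G), and the function name `generated` is illustrative.

```python
from fractions import Fraction
from itertools import product

def generated(start, k):
    """All values of expressions over `start` using at most k binary operations
    (addition, multiplication, division; division by zero is skipped)."""
    # exact[j] holds the values of expressions with exactly j operations.
    exact = [set(start)]
    for j in range(1, k + 1):
        vals = set()
        for left_ops in range(j):
            right_ops = j - 1 - left_ops
            for x, y in product(exact[left_ops], exact[right_ops]):
                vals.add(x + y)
                vals.add(x * y)
                if y != 0:
                    vals.add(x / y)
        exact.append(vals)
    return set().union(*exact)

# Hypothetical stand-in for S(G): a single conditional probability 1/2.
S = {Fraction(1, 2)}
for k in range(4):
    print(k, len(generated(S, k)))
```

The set grows quickly with k but stays finite for each k, which is all that the argument in the text needs.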
6.3. The main result. Theorem 6.8. (Asymptotic elimination of admissible aggregation functions) Let σ be a finite relational signature and let G be a lifted Bayesian network for σ such that every aggregation formula of G is noncritical with respect to G. If ϕ(x) ∈ P LA(σ) and all aggregation functions in ϕ are admissible, then ϕ(x) is asymptotically equivalent to a basic probability formula with respect to the sequence of probability distributions induced by G.
The proof of Theorem 6.8 is carried out in Section 7. First we derive a corollary.
Corollary 6.9. (Convergence of probability) Let G be a lifted Bayesian network over a finite relational signature σ such that every aggregation formula of G is noncritical with respect to G. Let $(P_n : n \in \mathbb{N}^+)$ be the sequence of probability distributions induced by G. If $\varphi(\bar x) \in PLA(\sigma)$ has only admissible aggregation functions then there are $c_1, \ldots, c_k \in [0, 1]$, depending only on ϕ and G, such that for every $m \in \mathbb{N}^+$, every $\bar a \in [m]^{|\bar x|}$ and every ε > 0, and for all i = 1, ..., k, $P_n(\{A \in W_n : |A(\varphi(\bar a)) - c_i| < \varepsilon\})$ converges as n → ∞.
Proof. Let G, P = (P n : n ∈ N + ) and ϕ(x) be as assumed. By Theorem 6.8, there is a basic probability formula ψ(x) which is asymptotically equivalent to ϕ(x) with respect to P.
is a conjunction of first-order literals. Without loss of generality we can assume that each $\psi_i$ is the conjunction of all formulas in a complete atomic σ-type. Note that for every $A \in W_n$ and every $\bar a \in [n]^{|\bar x|}$ we have $A(\psi(\bar a)) \in \{c_1, \ldots, c_k\}$ and $A(\psi(\bar a)) = c_i$ if $A \models \psi_i(\bar a)$. Let $c \in \{c_1, \ldots, c_k\}$ and suppose that $i_1, \ldots, i_t$ enumerates all i such that $c_i = c$. Then $P_n(\{A \in W_n : A(\psi(\bar a)) = c\}) = P_n\big(\bigvee_{j=1}^{t} \psi_{i_j}(\bar a)\big)$. By Proposition 7.8 below, it follows that the above probability converges as n → ∞. (Moreover, the number to which it converges depends only on ψ and G, according to the same proposition.) Since $\varphi(\bar x)$ and $\psi(\bar x)$ are asymptotically equivalent with respect to P the conclusions of the corollary follow.

7. Asymptotic elimination of aggregation functions
In this section we prove Theorem 6.8. Its proof is concluded by Corollary 7.20. The definition of admissible aggregation function given above (Definition 6.2) is relatively intuitive and was convenient for proving Proposition 6.3. But in the proofs of this section another characterization of admissibility is needed. The next subsection shows that admissibility is equivalent to a condition which we call "admissibility sensu novo". 7.1. An alternative characterization of admissibility. In order to formulate the other characterization of admissibility we need to relate eachr ∈ [0, 1] <ω to a specific function from [0, 1] to [0, 1], and we need to define a couple of pseudometrics on [0, 1] <ω . Definition 7.1. (Functional representations of sequences) Let n ∈ N + and let r = (r 1 , . . . , r n ) ∈ [0, 1] n . We will associate a function from [0, 1] to [0, 1] withr in two different ways, one way where the order of the entries inr matters and one in which the order does not influence the associated function.
Since we are only considering the limit, we can assume without loss of generality that N = 1. Consider the sequencesr ′ n andρ ′ n obtained by setting r ′ n,l = c j if r n,l ∈ I j (recall that different I j are disjoint), and likewise ρ ′ n,l = c j if ρ n,l ∈ I j . By Condition (1) of admissibility sensu novo, for every ε > 0 there are δ > 0 and n 0 depending only on ε such that if n > n 0 and µ u 1 (r ′ n ,ρ ′ n ) < δ, then |F (r ′ n ) − F (ρ ′ n )| < ε. This together with (7.1) implies that It now suffices to show that We claim that this is a consequence of Condition (2) of admissibility sensu novo. We only show that the first limit equals 0, since the second limit is treated in the same way. So let ε > 0 and choose an appropriate δ > 0. We need to show that, for all sufficiently large n, clauses (a)-(d) of Condition (2) hold, withr ′ n in the role ofr i in condition (2) of Definition 7.4 andr n in the role ofρ i in the same definition. Clause (a) is obvious. Clause (b) is true for sufficiently large n sincer n is convergence testing, because we can just choose I j with diameter less than δ. Clause (c) applied tor ′ n is again clear by the definition ofr ′ n . Clause (d) applied tor ′ n is clear for sufficiently large n because of (7.1). This concludes the proof. 7.2. Asymptotic elimination of admissible aggregation functions. Throughout this section we assume that σ is a finite and relational signature and we let W n be the set of all σ-structures with domain [n]. Let G be a lifted Bayesian network over σ such that every aggregation formula of G is noncritical with respect to G. Also let P = (P n : n ∈ N + ) be the sequence of probability distributions which is induced by G.
Since the sequence of probability distributions P is fixed throughout the section we will simply say that two formulas are asymptotically equivalent when we mean that they are asymptotically equivalent with respect to P.
In this section we prove that if ϕ(x) ∈ P LA(σ) and all aggregation functions in ϕ are admissible then there is a basic probability formula ψ(x) ∈ P LA(σ) such that ϕ and ψ are asymptotically equivalent; this is concluded by Corollary 7.20 below and proves Theorem 6.8.
The proof of this result proceeds by induction on the complexity of PLA(σ)-formulas and we now outline the proof. The main inductive step is to show that if $\varphi(\bar x)$ denotes the formula $F(\psi(\bar x, \bar y) : \bar y : p^=(\bar x, \bar y))$ where $F : [0, 1]^{<\omega} \to [0, 1]$ is an admissible aggregation function and ψ is asymptotically equivalent to a basic probability formula, then $\varphi(\bar x)$ is asymptotically equivalent to a basic probability formula. (The case for F of higher arity than 1 is analogous.) In fact, to begin with we will assume that $\psi(\bar x, \bar y)$ is a basic probability formula and we will see that we can assume that it has the form $\bigwedge_{i,j} (p_{i,j}(\bar x, \bar y) \to c_{i,j})$ where each $p_{i,j}$ is (the conjunction of) a complete atomic σ-type, $p_{i,j}{\upharpoonright}\bar x = p_{i,k}{\upharpoonright}\bar x$ for all i, j and k, and each $p_{i,j}$ implies $p^=$.
The crucial step of the proof is to analyse, for each i, each structure $A \in W_n$, and $\bar a \in [n]^{|\bar x|}$ such that $\bar a$ realizes the restriction of $p_{i,j}$ to $\bar x$, the sequence $\bar r = \big(A(\psi(\bar a, \bar b)) : \bar b \in [n]^{|\bar y|} \text{ and } p^=(\bar a, \bar b) \text{ holds}\big)$. From Proposition 7.8 below it follows that, for every i, with high probability, as n → ∞, the proportion of $\bar b \in [n]^{|\bar y|}$, among those satisfying $p^=(\bar a, \bar b)$, such that $A \models p_{i,j}(\bar a, \bar b)$ is close to some $\alpha_{i,j}$ which depends only on $p_{i,j}$ and G. Therefore the proportion of $c_{i,j}$ in $\bar r$ is, with high probability, close to the sum of all $\alpha_{i',j'}$ such that $c_{i',j'} = c_{i,j}$. As F is admissible, hence admissible sensu novo, it follows from condition (1) in the definition of admissibility sensu novo that $F(\bar r)$ is, with high probability, close to a number $d_i$ which depends only on F, $p_{i,j}$ (as j ranges over its possible values) and G. Consequently, $\varphi(\bar x)$ is asymptotically equivalent to a formula of the form $\bigwedge_{i} (q_i(\bar x) \to d_i)$, where $q_i$ denotes the restriction of $p_{i,j}$ to $\bar x$. This step is completed by Corollary 7.18. Then we use this result and condition (2) in the definition of admissibility sensu novo to show (in Proposition 7.19) that if $\psi(\bar x, \bar y)$ is asymptotically equivalent to a basic probability formula (but is not necessarily itself a basic probability formula), then $\varphi(\bar x)$ is asymptotically equivalent to a basic probability formula.
We first define the $\bar y$-dimension of an atomic σ-type $p(\bar x, \bar y)$ which, informally speaking, is the number of degrees of freedom for the variables $\bar y$ once the variables $\bar x$ have been instantiated by parameters from a structure.
Definition 7.6. Let $p(\bar x, \bar y)$ be an atomic σ-type. The $\bar y$-dimension of $p(\bar x, \bar y)$, denoted $\dim_{\bar y}(p)$, is the maximal d ∈ N such that there are a σ-structure A, $\bar a \in A^{|\bar x|}$ and $\bar b \in A^{|\bar y|}$ such that $A \models p(\bar a, \bar b)$ and $|\mathrm{rng}(\bar b) \setminus \mathrm{rng}(\bar a)| \ge d$.
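For a type whose identity part is complete, the dimension of Definition 7.6 can be read off from the equality classes of the variables: each class that contains no variable from the first tuple contributes one element outside the range of the parameters. The sketch below computes this count; representing the identity relations as a list of classes of variable names is a choice made here for illustration.

```python
def y_dimension(classes, x_vars):
    """Dimension of a complete identity type with respect to the non-x variables.

    classes: partition of all variables into sets of variables that the type
             forces to be equal (distinct classes are forced to be different).
    x_vars:  the variables that are instantiated by parameters.
    """
    xs = set(x_vars)
    # A class disjoint from the x-variables names an element outside rng(a).
    return sum(1 for cls in classes if xs.isdisjoint(cls))

# x = (x1, x2), y = (y1, y2, y3); the type says x1 = y1, y2 = y3,
# and all other pairs of variables are different.
classes = [{"x1", "y1"}, {"x2"}, {"y2", "y3"}]
print(y_dimension(classes, ["x1", "x2"]))
```

Here only the class {y2, y3} is free of parameters, so the tuple for y has one degree of freedom once x1 and x2 are fixed.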
Let p(x,ȳ) be an atomic σ-type and d itsȳ-dimension. We will, for large n, A ∈ W n andā ∈ [n] |x| that realizes p↾x be interested in the proportion b ∈ [n] |ȳ| : With the terminology of the next definition, the subsequent proposition tells that with high probability the above proportion is close to a number which depends only on p, p↾x and G. The same proposition also tells that for quantifier free formulas ϕ(x), the probability that a tuple of parameters satisfies it converges, as n → ∞, to a number that depends only on ϕ and G.
Note that if p(x,ȳ) and q(x) are as in the above definition and the σ-structure A is (p, q, 0)-unsaturated, then p(x,ȳ) is not realized in A. From [16] we can extract the following (with explanations that follow): Proposition 7.8. (i) For every quantifier-free first-order formula ϕ(x) over σ, m ∈ N + and everyā ∈ [m] |x| , lim n→∞ P n (ϕ(ā)) exists and depends only on ϕ(x) and G. Moreover, the rate of convergence does not depend onā, but only on ϕ(x) and G.
(iii) The numbers β and γ from part (ii) are products of numbers of the form µ(R | χ R,i ) or (1 − µ(R | χ R,i )) associated to G as in part (c) of Definition 4.6.
Proof. Part (i) is a direct consequence of Theorem 3.15 in [16], but we point out that there is an unfortunate typo in the cited theorem where '$|P_n(\varphi(\bar a)) - d| \le 1 - e^{-cn}$' should read '$|P_n(\varphi(\bar a)) - d| \le e^{-cn}$'. Part (ii) follows from Lemma 4.13 and Proposition 4.41 in [16] [16] and induction on the maximal path rank of the underlying DAG of G′.
The next lemma states the expected fact that if p i (x,ȳ), i = 1, . . . , t, is an enumeration without repetition of all complete atomic σ-types that extend a given complete atomic σ-type q(x), then the sum, as i = 1, . . . , t, of the numbers to which the probability of p i (x,ȳ) converges, conditioned on q(x) being true, is 1.
It will be convenient to argue in a context where, for some arbitrary δ > 0 and m ∈ N + , we assume that if p(x,ȳ) is a complete σ-type, |x| + |ȳ| ≤ m and q(x) = p↾x, then all structures that we consider are (p, q, α/(1+δ))-saturated and (p, q, α(1+δ))-unsaturated for some α that depends only on p, q and G. This is justified by the next definition and subsequent lemma.
Proof. Immediate from Proposition 7.8 (ii), since there are only finitely many complete atomic σ-types with at most m variables.  1 (x,ȳ), . . . , ψ k (x,ȳ) :ȳ : p = (x,ȳ)) where ψ 1 , . . . , ψ k are basic probability formulas. The proofs in the general case work out in essentially the same way but the notation becomes messier, for example since the assumptions and notation introduced in Assumption 7.15 for ψ(x,ȳ) need to be considered for all ψ i (x,ȳ).
We begin with a lemma which takes care of an odd case, which however is syntactically possible.
Proof Now it is straightforward to check, using the semantics of P LA(σ) (Definition 3.5), The next lemma justifies the making of some simplifying assumptions in the arguments that follow later (see Assumption 7.15).
Proof. Let A be a finite σ-structure and letā ∈ A |x| . Ifā does not satisfy p = ↾x, then both formulas, withx interpreted asā have the value 0. Ifā satisfies p = ↾x, then for everyb ∈ A |ȳ| such that p = (ā,b) holds. Hence F is applied to the same sequence in both cases and therefore both formulas in the statement of the lemma get the same value.
The previous two lemmas justify the addition of the following assumptions in the main part of the proof of the asymptotic elimination of aggregation functions.
Assumption 7.15. In Lemma 7.16 and Corollary 7.18 we make the following assumptions: Let $\kappa \in \mathbb{N}^+$ and let $\bar x$ and $\bar y$ be sequences of distinct variables such that $\mathrm{rng}(\bar x) \cap \mathrm{rng}(\bar y) = \emptyset$ and $|\bar x| + |\bar y| \le \kappa$. Let $p^=(\bar x, \bar y)$ be a complete atomic ∅-type, let $l = \dim_{\bar y}(p^=)$ and let $\psi(\bar x, \bar y)$ denote the basic probability formula
\[ \bigwedge_{i=1}^{s} \bigwedge_{j=1}^{t_i} \big(p_{i,j}(\bar x, \bar y) \to c_{i,j}\big) \]
where we may, without loss of generality, assume that each $p_{i,j}(\bar x, \bar y)$ is a complete atomic σ-type and $p^= \subseteq p_{i,j}$. Furthermore, we assume (by reordering if necessary) that for all i = 1, ..., s and all $1 \le j, j' \le t_i$, $p_{i,j}{\upharpoonright}\bar x = p_{i,j'}{\upharpoonright}\bar x$. Let $q_i(\bar x) = p_{i,1}{\upharpoonright}\bar x$ for each i. Without loss of generality we may also assume that the $p_{i,j}(\bar x, \bar y)$, i = 1, ..., s, j = 1, ..., $t_i$, enumerate all complete atomic σ-types with free variables $\bar x, \bar y$ which extend $p^=(\bar x, \bar y)$ (because $c_{i,j}$ is allowed to be zero). Note that $l = \dim_{\bar y}(p_{i,j})$ for all i and j.
We are now ready for the main technical lemma, the proof of which uses condition (1)
Proof. Let ε > 0. The conclusion of the lemma will follow if we can show that there is δ > 0 such that for all sufficiently large $n_1$ and $n_2$, all $A_1 \in Y^{\kappa,\delta}_{n_1}$, all $A_2 \in Y^{\kappa,\delta}_{n_2}$, all $\bar a_1 \in [n_1]^{|\bar x|}$ and all $\bar a_2 \in [n_2]^{|\bar x|}$, if $A_1 \models q_i(\bar a_1)$ and $A_2 \models q_i(\bar a_2)$, then
Towards the end of the argument we will see that the assumption that F is admissible implies that such δ exists. Let δ > 0 and suppose that
For k = 1, 2 let $\bar r_k = \big(A_k(\psi(\bar a_k, \bar b)) : \bar b \in [n_k]^{|\bar y|} \text{ and } p^=(\bar a_k, \bar b) \text{ holds}\big)$ and observe that, for every $\bar b \in [n_k]^{|\bar y|}$,
It follows that for each k = 1, 2 and every j = 1, ..., $t_i$, every $\bar b \in [n_k]^{|\bar y|}$ such that $A_k \models p_{i,j}(\bar a_k, \bar b)$ contributes a coordinate $c_{i,j}$ to the sequence $\bar r_k$. If l = 0 then $p^=$ has a unique extension to a complete atomic σ-type with variables $\bar x, \bar y$ which includes $q_i(\bar x)$, so $t_i = 1$ and, for k = 1, 2, $p_{i,1}(\bar a_k, \bar y)$ is realized by the unique tuple which realizes $p^=(\bar a_k, \bar y)$. Hence $|\bar r_1| = |\bar r_2| = 1$ and for k = 1, 2 the unique entry of $\bar r_k$ is $c_{i,1}$, so $\bar r_1 = \bar r_2$ and therefore $\mu^u_1(\bar r_1, \bar r_2) = 0$. Now suppose that l > 0, so Proposition 7.8 (ii) is applicable. Let $\alpha_j$ be the number associated to $p_{i,j}$ by Proposition 7.8 (ii). Since $A_k \in Y^{\kappa,\delta}_{n_k}$ for k = 1, 2 it follows that
Suppose that c ∈ [0, 1] and that there are exactly m indices $j = j_1, \ldots, j_m$ such that $c_{i,j} = c$. It follows from (7.3) that, for each k = 1, 2 and sufficiently large $n_k$ the number c will occur between $(\alpha_{j_1} + \ldots + \alpha_{j_m})(n_k)^l/(1 + \delta)$ and $(\alpha_{j_1} + \ldots + \alpha_{j_m})(n_k)^l(1 + \delta)$ times in $\bar r_k$. In particular, if all $\alpha_{j_1}, \ldots, \alpha_{j_m}$ are 0, then $c_{i,j}$ does not occur in $\bar r_k$. From Lemma 7.9 we get $\alpha_1 + \ldots + \alpha_{t_i} = 1$. It now follows from Definitions 7.1 and 7.2 that $\mu^u_1(\bar r_1, \bar r_2) \le \delta g(t_i)$ where $g(t_i)$ depends only on $t_i$.
Remark 7.17. Suppose for a moment that we would allow formulas of the form F (ψ(x,ȳ) : y), where F is an aggregation function, with the semantic interpretation A F (ψ(ā,ȳ) : Then condition (1) of Definition 7.4 of admissibility sensu novo which was used in the proof of Lemma 7.16 is no longer, in general, applicable in the same proof.
To exemplify this, suppose that σ is the empty signature, that $\bar x$ is the empty sequence of variables and that $\bar y = (y_1, y_2)$. Consider the formula '$(y_1 = y_2) \to 1/2$' which we denote by $\psi(\bar x, \bar y)$ to use the same notation as in the proof of Lemma 7.16. Let $A_1$, $A_2$, $\bar a_1$ and $\bar a_2$ be as in the proof of Lemma 7.16 (so $\bar a_1$ and $\bar a_2$ are empty in this example) and for k = 1, 2 let $\bar r_k$
Then, for k = 1, 2, exactly $n_k$ entries of $\bar r_k$ will be 1/2 and exactly $n_k(n_k - 1)$ entries of $\bar r_k$ will be 1, so the proportion of '1/2' is $1/n_k$ which is not zero but tends to zero as $n_k$ tends to infinity. It follows that $\mu^u_1(\bar r_1, \bar r_2) \to 0$ as $n_1, n_2 \to \infty$. Since the parameters denoted $\alpha_{i,j}$ in condition (1) of the definition of admissibility sensu novo are required to be nonzero we cannot use condition (1) to conclude that $|F(\bar r_1) - F(\bar r_2)|$ is as small as we like if $\mu^u_1(\bar r_1, \bar r_2)$ is sufficiently small.
Then there is a basic probability formula $\theta(\bar x)$ such that for every ε > 0 there is δ > 0 such that for all sufficiently large n, all $A \in Y^{\kappa,\delta}_n$, and all $\bar a \in [n]^{|\bar x|}$, we have $\big|A\big(F(\psi(\bar a, \bar y) : \bar y : p^=(\bar a, \bar y))\big) - A(\theta(\bar a))\big| < \varepsilon$.
Proof. Recall that $q_i(\bar x)$ are assumed to be as in Assumption 7.15, so each $q_i(\bar x)$ is consistent with the restriction of $p^=$ to the variables $\bar x$. For every i = 1, ..., s, let $d_i \in [0, 1]$ be as in Lemma 7.16. Let $q'_1(\bar x), \ldots, q'_m(\bar x)$ enumerate all complete atomic ∅-types in the variables $\bar x$ which are different from $p^={\upharpoonright}\bar x$. We show that we can let $\theta(\bar x)$ be the formula
If $A \models q'_j(\bar a)$ for some j, then (no matter what δ is)
Now suppose that $\bar a$ satisfies $p^={\upharpoonright}\bar x$ and hence it satisfies $q_i(\bar x)$ for some i. Then
It follows from Lemma 7.16 that if δ > 0 is small enough, then for every i = 1, ..., s, all sufficiently large n, all $A \in Y^{\kappa,\delta}_n$, and all $\bar a \in [n]^{|\bar x|}$, if $A \models q_i(\bar a)$, then $\big|A\big(F(\psi(\bar a, \bar y) : \bar y : p^=(\bar a, \bar y))\big) - d_i\big| < \varepsilon$.

Consequently
A F ψ(ā,ȳ) :ȳ :
The next proposition states that one admissible aggregation function can be asymptotically eliminated and this is the main step in the inductive proof of Corollary 7.20. The proof of the proposition uses condition (2) of Definition 7.4 of admissibility sensu novo.
Corollary 7.20. Let ϕ(x) ∈ P LA(σ) and suppose that all aggregation functions in ϕ are admissible. Then ϕ(x) is asymptotically equivalent to a basic probability formula.
Proof. We use induction on the complexity of formulas. If the aggregation rank is 0, that is, if the formula is aggregation-free, then the conclusion follows from Lemma 3.10, since equivalence implies asymptotic equivalence.
Suppose that the aggregation rank of ϕ(x) is larger than 0. We have one case for each way in which ϕ can be constructed from simpler formulas, as in parts (4) and (5) of Definition 3.3. We start with part (4), the "propositional constructions", and consider only one of the subcases, since the other are treated in the same way. Suppose that ϕ(x) is the formula ψ(x) ∧ χ(x). By the induction hypothesis, there are basic probability formulas ψ ′ (x) and χ ′ (x) such that ψ(x) and ψ ′ (x) are asymptotically equivalent and χ(x) and χ ′ (x) are asymptotically equivalent. By Lemma 4.3, ψ(x) ∧ χ(x) is asymptotically equivalent to ψ ′ (x) ∧ χ ′ (x). The formula ψ ′ (x) ∧ χ ′ (x) is aggregation-free, hence (by Lemma 3.10) it is equivalent to a basic probability formula ϕ ′ (x). Then ϕ(x) and ϕ ′ (x) are asymptotically equivalent. Now we turn to part (5) of Definition 3.3 and suppose that ϕ(x) has the form F (ψ 1 (x,ȳ), . . . , ψ k (x,ȳ) :ȳ : p = (x,ȳ)) where F denotes an admissible aggregation function. Then each ψ i is simpler than ϕ so each ψ i (x,ȳ) is, by the induction hypothesis, asymptotically equivalent to a basic probability formula. Then Proposition 7.19 combined with Remark 7.12 implies that ϕ(x) is asymptotically equivalent to a basic probability formula.
Remark 7.21. (Computing an asymptotically equivalent formula without aggregation functions) Corollary 7.20 guarantees that for every $\varphi(\bar x) \in PLA(\sigma)$ with only admissible aggregation functions there is a basic probability formula $\psi(\bar x)$ which is asymptotically equivalent to $\varphi(\bar x)$. If ϕ is aggregation-free then Lemma 3.10 guarantees the existence of such ψ and in practice such $\psi(\bar x)$ can be constructed by, for every complete atomic σ-type $p(\bar x)$, computing the value $c_p$ that $\varphi(\bar x)$ takes if $p(\bar x)$ is satisfied. Then $\psi(\bar x)$ will be (up to equivalence) the conjunction of formulas of the form $p(\bar x) \to c_p$. (If ϕ is aggregation-free and without free variables, then ϕ has the same value in all structures and we compute this value, call it $c_p$, and then ϕ is equivalent to the basic probability formula $\top \to c_p$.) If ϕ is not aggregation-free we first reduce the problem to finding, for each subformula of $\varphi(\bar x)$, say $\varphi'(\bar x)$, a basic probability formula which is asymptotically equivalent to $\varphi'(\bar x)$. Assuming this has been done and (which is the nontrivial case) that $\varphi(\bar x)$ has the form $F(\psi_1(\bar x, \bar y), \ldots, \psi_k(\bar x, \bar y) : \bar y : p^=(\bar x, \bar y))$, where F is admissible, we proceed as follows, where to simplify notation we assume that k = 1. Thus let $\varphi(\bar x)$ be $F(\psi(\bar x, \bar y) : \bar y : p^=(\bar x, \bar y))$. So by assumption we have computed a basic probability formula $\psi'(\bar x, \bar y)$ which is asymptotically equivalent to $\psi(\bar x, \bar y)$. By modifying ψ′ if necessary we can assume that it has the form $\bigwedge_{i=1}^{s} \bigwedge_{j=1}^{t_i} \big(p_{i,j}(\bar x, \bar y) \to c_{i,j}\big)$ where each $p_{i,j}$ is a complete atomic σ-type and for all i, j, k, $p_{i,j}{\upharpoonright}\bar x = p_{i,k}{\upharpoonright}\bar x$.
If every $p_{i,j}$ is inconsistent with $p^{=}$ then the proof of Lemma 7.13 shows how to form a basic probability formula which is equivalent to $\varphi(\bar{x})$. Otherwise, we may (justified by Lemma 7.14) remove all $p_{i,j}$ which are inconsistent with $p^{=}$ and assume that all conditions in Assumption 7.15 hold.
According to Proposition 7.8, for each $p_{i,j}$, all $\bar{a} \in [m]^{|\bar{x}|}$ and $\bar{b} \in [m]^{|\bar{y}|}$ such that $p^{=}(\bar{a},\bar{b})$ holds, the limits $\lim_{n\to\infty} \mathbb{P}_n(p_{i,j}(\bar{a},\bar{b}))$ and $\lim_{n\to\infty} \mathbb{P}_n(q_i(\bar{a}))$ exist, where $q_i = p_{i,j}{\upharpoonright}\bar{x}$, and are products of numbers associated to $G$ (that is, numbers denoted $\mu(R \mid \chi_{R,i})$ in Definition 4.6). If these limits are denoted $\beta_{i,j}$ and $\gamma_i$, respectively, then let $\alpha_{i,j} = \beta_{i,j}/\gamma_i$. The next task is, for each $i$, to find the limit of $F(\bar{r})$, as the length of $\bar{r}$ tends to infinity and $\bar{r}$ has the properties of $\bar{r}_1$ (or $\bar{r}_2$) in the proof of Lemma 7.16, with $\alpha_{i,j}$ abbreviated as $\alpha_j$. More precisely, given some small $\delta > 0$, large $n$ and assuming that $l = \dim_{\bar{y}}(p^{=}) > 0$, we construct $\bar{r}$ of length $n$ as follows: if $c \in [0, 1]$ and there are exactly $m$ indices $j = j_1, \ldots, j_m$ such that $c_{i,j} = c$, then we let $\bar{r}$ have between $(\alpha_{j_1} + \ldots + \alpha_{j_m})n^l/(1 + \delta)$ and $(\alpha_{j_1} + \ldots + \alpha_{j_m})n^l(1 + \delta)$ occurrences of $c$ (and if $c \neq c_{i,j}$ for all $j$, then $\bar{r}$ has no occurrence of $c$). Since $F$ is assumed to be admissible, hence admissible sensu novo, the limit of $F(\bar{r})$ for such $\bar{r}$ as its length tends to infinity exists; let us suppose that the limit is $d_i$ (for each index $i$). Then $F(\psi'(\bar{x},\bar{y}) : \bar{y} : p^{=}(\bar{x},\bar{y}))$ is asymptotically equivalent to $\bigwedge_{i=1}^{s} (q_i(\bar{x}) \to d_i)$, where $q_i = p_{i,j}{\upharpoonright}\bar{x}$, as implied by Corollary 7.18 (with $\psi'$ in place of $\psi$). Proposition 7.19 implies that $F(\psi(\bar{x},\bar{y}) : \bar{y} : p^{=}(\bar{x},\bar{y}))$ is asymptotically equivalent to $\bigwedge_{i=1}^{s} (q_i(\bar{x}) \to d_i)$.
Although we know that the limit of $F(\bar{r})$, for $\bar{r}$ as described above, exists as the length of $\bar{r}$ tends to infinity, it may not be clear how to compute it. In this case we can still estimate the limit (assuming that $F(\bar{r})$ can be estimated with arbitrarily high precision for every relevant $\bar{r}$), by choosing large $n$, constructing $\bar{r}$ as above and computing (or estimating) $F(\bar{r})$. Since $F$ is admissible we know that for any $\varepsilon > 0$, if $n$ is large enough and $\delta$ small enough, then $F(\bar{r})$ is within distance $\varepsilon$ of the limit, by Condition (1) of Definition 7.4.
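The estimation step at the end of the remark can be sketched numerically as follows. This is only an illustration under stated assumptions: the values $c_{i,j}$ and the ratios $\alpha_j$ are taken as given inputs (the numbers below are made up), each value $c_{i,j}$ is given roughly $\alpha_j n^l$ occurrences by rounding, which lies within the factor $(1+\delta)$ tolerance for large $n$, and the function names `build_sequence` and `estimate_limit` are hypothetical.

```python
import math

def build_sequence(values, alphas, n, l=1):
    """Build a sequence with about alpha_j * n**l occurrences of values[j].

    Equal values listed under several indices j simply accumulate,
    which matches summing the corresponding alpha_j.
    """
    seq = []
    for c, alpha in zip(values, alphas):
        seq.extend([c] * round(alpha * n ** l))
    return seq

def arithmetic_mean(r):
    return sum(r) / len(r)

def geometric_mean(r):
    # The geometric mean is 0 as soon as one entry is 0.
    if any(x == 0 for x in r):
        return 0.0
    return math.exp(sum(math.log(x) for x in r) / len(r))

def estimate_limit(F, values, alphas, n, l=1):
    """Estimate the limit of F on sequences of the prescribed shape by evaluating F for large n."""
    return F(build_sequence(values, alphas, n, l))

# Made-up inputs: two values with limiting proportions 0.25 and 0.75.
values, alphas = [0.2, 0.8], [0.25, 0.75]
for name, F in [("am", arithmetic_mean), ("gm", geometric_mean), ("max", max)]:
    print(name, estimate_limit(F, values, alphas, n=1000))
```

For the arithmetic mean the estimate agrees with the closed form $\sum_j \alpha_j c_j / \sum_j \alpha_j$, so the sketch is mainly of interest for aggregation functions whose limit has no obvious closed form.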

Conclusion
We have considered what we call probability logic with aggregation functions (PLA) for expressing queries. PLA uses aggregation functions instead of quantifiers, but can express all queries that are expressible in first-order logic by using the aggregation functions max and min. The motivation comes from data mining, machine learning and statistical relational artificial intelligence where aggregation over a domain is often done with aggregation functions, for example the arithmetic mean of a sequence of reals. Since the mean of a sequence need not be 0 or 1, even if all entries in the sequence are 0 or 1, PLA is a many valued logic with values in the unit interval [0, 1]. A typical query in this context is to ask "Is the value of (a sentence) ϕ in the interval I?".
Our aim was then to study the asymptotic behaviour, as the domain size tends to infinity, of the probability of a query expressible with PLA with respect to certain probability distributions of relevance within statistical relational AI. As there are so many different kinds of aggregation functions, we do not expect to find a single result that covers the asymptotic behaviour of PLA-formulas with arbitrary aggregation functions.
Hence we identified what we call admissible (or intuitively "partially uniformly continuous") aggregation functions for which we could prove asymptotic results. The arithmetic and geometric means and max and min are admissible, but we also gave examples of several other admissible aggregation functions. We demonstrated the expressive power of PLA restricted to admissible aggregation functions by, for example, showing that every stage in the approximation of SimRank can be expressed by a PLA-formula with only admissible aggregation functions (and by similar but simpler arguments one can show that every approximation stage of PageRank [5] can be expressed by a PLA-formula with only admissible aggregation functions).
We have used the formalism of lifted Bayesian networks for inducing, for any finite relational signature σ, a probability distribution on the set of σ-structures with a given finite domain. Roughly speaking, a lifted Bayesian network for σ is a directed acyclic graph with vertex set σ which assigns (conditional) probabilities to each R ∈ σ by case distinctions expressed by formulas of conditional probability logic (CPL) that use only the parents of R in the directed acyclic graph. CPL is a 2-valued logic that extends first-order logic and with which one can express that a relative frequency (of events expressed by CPL-formulas) belongs to a given interval, or that the difference between two relative frequencies belongs to a certain interval. This type of construction in CPL can be iterated as many times as one likes, just as quantifiers can be nested in first-order logic.
With this setup our main result was that every PLA-formula $\varphi(\bar{x})$ with only admissible aggregation functions is asymptotically equivalent to a PLA-formula $\psi(\bar{x})$ without aggregation functions, which in rough terms means that the values of the two formulas will with high probability be almost the same (and $\psi(\bar{x})$ can only take finitely many different values). From the proof one can extract a procedure for finding such $\psi$, and the procedure needs only $\varphi$ and the lifted Bayesian network as input.
From the main result we derive a convergence law for PLA-formulas with only admissible aggregation functions. It states that for any such formula $\varphi(\bar{x})$ there are $\alpha_1, \ldots, \alpha_k, c_1, \ldots, c_k \in [0, 1]$ (for some $k$) such that the sum of the $\alpha_i$ is 1 and, for any sequence of parameters $\bar{a}$ from the domain, every $\varepsilon > 0$ and $i$, with probability tending to $\alpha_i$ the value of $\varphi(\bar{a})$ will belong to $[c_i - \varepsilon, c_i + \varepsilon]$.
The studies begun here have continued in [17] where we, among other things, prove similar results in a context allowing more probability distributions, including ones where, with high probability, some or all relations are "sparse", but at the cost of only allowing what we call strongly admissible aggregation functions in PLA-formulas. The arithmetic and geometric means are strongly admissible but max and min are not. Due to results about random graphs [22] it is impossible, in general, to asymptotically eliminate max and min from PLA-formulas in the context of sparse graphs.