Conditional probability logic, lifted bayesian networks and almost sure quantifier elimination

We introduce a formal logical language, called conditional probability logic (CPL), which extends first-order logic and which can express probabilities, conditional probabilities and comparisons between conditional probabilities. Intuitively speaking, although the formal details differ, CPL can express the same kind of statements as some languages which have been considered in the artificial intelligence community. We also consider a way of making precise the notion of a lifted Bayesian network, where this notion is a type of (lifted) probabilistic graphical model used in machine learning, data mining and artificial intelligence. A lifted Bayesian network (in the sense defined here) determines, in a natural way, a probability distribution on the set of all structures (in the sense of first-order logic) with a common finite domain $D$. Our main result is that for every "noncritical" CPL-formula $\varphi(\bar{x})$ there is a quantifier-free formula $\varphi^*(\bar{x})$ which is "almost surely" equivalent to $\varphi(\bar{x})$ as the cardinality of $D$ tends towards infinity. This is relevant for the problem of making probabilistic inferences on large domains $D$, because (a) the problem of evaluating, by "brute force", the probability of $\varphi(\bar{x})$ being true for some sequence $\bar{d}$ of elements from $D$ has, in general, (highly) exponential time complexity in the cardinality of $D$, and (b) the corresponding probability for the quantifier-free $\varphi^*(\bar{x})$ depends only on the lifted Bayesian network and not on $D$. The main result has two corollaries, one of which is a convergence law (and zero-one law) for noncritical CPL-formulas.


Introduction
We consider an extension of first-order logic which we call conditional probability logic (Definition 3.1), abbreviated CPL, in which it is possible to express statements about probabilities and conditional probabilities, and to compare conditional probabilities; this makes it possible to express statements about the (conditional) independence (or dependence) of events or random variables. Remarks 3.4, 3.6 and Example 3.5 below illustrate this. The semantics of CPL deals only with finite structures and assumes that all elements in a structure are equally likely, so (conditional) probabilities correspond to proportions. Quite similar formal languages, which aim at expressing the same sort of statements, have been studied within the field of artificial intelligence by Halpern [11, Section 2] and Bacchus et al. [2, Definition 4.1]. CPL is more expressive than the probability logic $L_{\omega P}$ considered by Keisler and Lotfallah in [16] (which cannot express conditional probabilities) and our first theorem (Theorem 3.14) is a generalization of their main result [16, Theorem 4.9], both in the sense that the language considered here is more expressive and in the sense that we consider a wider range of probability distributions.
Date: 18 August 2021.

A graphical model for a probability distribution and a set of random variables is a "graphical" way of describing the conditional dependencies and independencies between the random variables. In such a probabilistic model the random variables are also viewed as the vertices of a directed or undirected graph where edges indicate conditional dependencies and independencies [3, 23]. The notion of a Bayesian network is one of the most well-known graphical models. A Bayesian network G for a probability space (S, µ) and binary random variables X_1, ..., X_n is determined by the following data where, for any distinct i_1, ..., i_k ∈ {1, ..., n} and values x_1, ..., x_k that X_{i_1}, ..., X_{i_k} can take, the tuple (x_1, ..., x_k) denotes the event that X_{i_1} = x_1, ..., X_{i_k} = x_k:
(1) A (not necessarily connected) directed acyclic graph (DAG), also denoted G, with vertex set V = {X_1, ..., X_n} such that if there is an arrow (directed arc) from X_i to X_j then i < j.
(2) To each vertex X_i ∈ V, if X_{j_1}, ..., X_{j_k} are the parents of X_i (in the DAG) and x_i, x_{j_1}, ..., x_{j_k} are values that X_i, X_{j_1}, ..., X_{j_k}, respectively, can take, then the conditional probability P(x_i | x_{j_1}, ..., x_{j_k}) is specified in such a way that the following holds:
(a) For each j, the set of parents of X_j, denoted par(X_j), is minimal (with respect to set inclusion) with the property that for every i < j, X_i and X_j are conditionally independent over par(X_j). The conditional independence means that if par(X_j) = {X_{l_1}, ..., X_{l_k}}, then for all possible values x_i, x_j, x_1, ..., x_k of X_i, X_j, X_{l_1}, ..., X_{l_k}, respectively, we have
$$P(x_i, x_j \mid x_1, \ldots, x_k) = P(x_i \mid x_1, \ldots, x_k) \cdot P(x_j \mid x_1, \ldots, x_k)$$
whenever both sides are defined.
(b) The joint probability distribution on X_1, ..., X_n is determined by the conditional probabilities associated with the vertices of G. More precisely: the probability of the event (x_1, ..., x_n) can be computed recursively by repeatedly using the following identity, which holds for any choice of distinct i_1, ..., i_k ∈ {1, ..., n}:
$$P(x_{i_1}, \ldots, x_{i_k}) = P(x_{i_k} \mid x_{j_1}, \ldots, x_{j_m}) \cdot P(x_{i_1}, \ldots, x_{i_{k-1}})$$
whenever j_1, ..., j_m ∈ {i_1, ..., i_{k-1}} and par(X_{i_k}) = {X_{j_1}, ..., X_{j_m}}.
If G is a Bayesian network as defined above, then it follows (from e.g. [23, Definition 1.2.1 and Theorems 1.2.6, 1.2.7]) that:
(i) For every X_j ∈ V, X_j and the set of all predecessors of X_j are conditionally independent over par(X_j).
(ii) For every X_j ∈ V, X_j and the set of all nondescendants of X_j (except X_j itself) are conditionally independent over par(X_j).
Moreover: if condition (i) or condition (ii) holds, then {X_1, ..., X_n} can be ordered so that conditions (a) and (b) above hold without changing the arrows of the DAG.
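The chain-rule factorization in condition (b) can be illustrated by a small computation. The following sketch uses a hypothetical three-variable network with DAG X1 → X2, X1 → X3 and invented conditional probability tables; it is only meant to show how the joint probability of an assignment is the product of the conditional probabilities read off along the DAG.

```python
from itertools import product

# Toy Bayesian network over binary variables X1, X2, X3 with DAG X1 -> X2, X1 -> X3.
# The parent sets and conditional probability tables below are hypothetical examples.
# cpt[i] maps a tuple of parent values to P(X_i = 1 | those parent values).
parents = {1: (), 2: (1,), 3: (1,)}
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.2, (1,): 0.9},
    3: {(0,): 0.5, (1,): 0.3},
}

def joint(assignment):
    """P(X1=x1, X2=x2, X3=x3) via the chain-rule factorization of condition (b):
    the joint probability is the product over i of P(x_i | values of par(X_i))."""
    p = 1.0
    for i in (1, 2, 3):
        pa = tuple(assignment[j] for j in parents[i])
        p1 = cpt[i][pa]  # P(X_i = 1 | parent values)
        p *= p1 if assignment[i] == 1 else 1.0 - p1
    return p

# Sanity check: the joint distribution sums to 1 over all 2^3 assignments.
total = sum(joint({1: a, 2: b, 3: c}) for a, b, c in product((0, 1), repeat=3))
```

Since the joint distribution is fully determined by the tables, any event probability can be obtained by summing `joint` over the assignments in the event.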
Graphical models are used in machine learning, data mining and artificial intelligence in (probability based) learning and inference making. To illustrate this by a very simple example, suppose that we have a finite set A of some kind of objects and properties P, Q and R which objects in A may, or may not, have. We can view A as a "training set". The training set can be formalized as a σ-structure with domain A where σ = {P, Q, R} and P, Q and R are also viewed as unary relation symbols. Let µ be a probability distribution on A and let binary random variables X, Y, Z : A → {0, 1} be defined by X(a) = 1 if a has the property P and X(a) = 0 otherwise (for every a ∈ A); Y(a) = 1 if a has the property Q and Y(a) = 0 otherwise; and analogously for Z and R. Suppose that, after some "learning", we have found a Bayesian network G for (A, µ) and X, Y, Z such that its DAG has the arrows X → Y and X → Z (so X is the only parent of both Y and Z) and the (conditional) probabilities µ(X = 1), µ(Y = 1 | X = 1), µ(Y = 1 | X = 0), µ(Z = 1 | X = 1) and µ(Z = 1 | X = 0) are specified. (In real applications, it is unlikely that a relatively simple probabilistic model, which is desirable for computational efficiency, fits the training data completely, and usually this is not even the goal because one wants to avoid so-called "overfitting"; so one can view the Bayesian network as a reasonable approximation of the training data.) An application of the Bayesian network G is to make predictions about probabilities on some other finite domain B. Let us now make the following assumptions, partly based on G but where the independence assumptions between different objects are imposed: Every b ∈ B has the probability µ(X = 1) of having property P, independently of what is the case for other elements of B.
For every b ∈ B, if b has the property P then the probability that b also has the property Q (respectively R) is µ(Y = 1 | X = 1) (respectively µ(Z = 1 | X = 1)), independently of what is the case for other elements of B, and if b does not have the property P then the probability that b has Q (respectively R) is µ(Y = 1 | X = 0) (respectively µ(Z = 1 | X = 0)), independently of what is the case for other elements. Based on this we can define a probability distribution (as in Definition 3.11) on the set W_B of all σ-structures with domain B, where each member of W_B represents a "possible scenario" or "possible world". For every formula ϕ(x_1, ..., x_k) of conditional probability logic and any choice of b_1, ..., b_k ∈ B we can now ask what the probability is that ϕ(x_1, ..., x_k) is satisfied by b_1, ..., b_k. When using a Bayesian network G for prediction as in the example we have "lifted" it from its original context (the set A) and used it on a new domain of objects. Also, when moving from the fixed domain A to an arbitrary domain B we have, in a sense, "lifted" our reasoning from propositional logic to first-order logic, or some extension of it. Perhaps this is the reason why the term "lifted graphical model" is used by some authors when a graphical model is used to describe or predict (conditional) probabilities of events on an arbitrary or unknown domain; see [18] for a survey of lifted graphical models. In the subfield of machine learning, data mining and artificial intelligence called statistical relational learning (or sometimes probabilistic logic learning) the "lifted" perspective is central, as one here considers general domains of objects and properties and relations that may, or may not, hold for, or between, the objects. (See for example [6, 9].) There is no consensus regarding what, exactly, a lifted Bayesian network (let alone a lifted graphical model) is, or how it determines a probability distribution on a set of "possible worlds". Different approaches have been considered.
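The way the lifted network induces a distribution on possible worlds can be sketched by sampling: decide P independently for each element, then Q and R independently given P. The numerical values of µ below are hypothetical, chosen only for illustration.

```python
import random

# Minimal sketch (hypothetical parameters) of how the lifted network X -> Y, X -> Z
# from the example induces a probability distribution on the set W_B of {P,Q,R}-structures
# with domain B: each element gets P independently, then Q and R independently given P.
mu = {"X1": 0.5, "Y1|X1": 0.8, "Y1|X0": 0.1, "Z1|X1": 0.4, "Z1|X0": 0.2}

def sample_world(B, rng):
    """Sample one 'possible world': for each b in B decide P(b), then Q(b), R(b) given P(b)."""
    world = {}
    for b in B:
        p = rng.random() < mu["X1"]
        q = rng.random() < (mu["Y1|X1"] if p else mu["Y1|X0"])
        r = rng.random() < (mu["Z1|X1"] if p else mu["Z1|X0"])
        world[b] = (p, q, r)
    return world

rng = random.Random(0)
B = range(10_000)
world = sample_world(B, rng)
# Empirical proportion of elements with property Q; it should be close to
# mu["X1"] * mu["Y1|X1"] + (1 - mu["X1"]) * mu["Y1|X0"] = 0.45.
prop_q = sum(1 for b in B if world[b][1]) / len(B)
```

Repeating `sample_world` many times amounts to sampling from the distribution on W_B, so probabilities of CPL-style events can be estimated by Monte Carlo counting.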
A key question is how the probability that a random variable takes a particular value is influenced by its parents in the DAG of the Bayesian network. The above example uses the simplest form of aggregation/combination rules. Another approach is to use aggregation/combination functions. Some explanation of these notions is found in e.g. [6, p. 31, 54], [18, p. 18], [13]. I have not seen any formal definition of the notions of aggregation rule and aggregation function, but aggregation rules tend to be mainly linguistic descriptions (think of formal logic) of how the value of a random variable depends on the values of other random variables in the network, while aggregation functions specify the dependence mainly in terms of functions. From a practical point of view it probably makes sense to have the freedom to adapt one's lifted graphical model to the application at hand, so uniformity may not be a primary concern for practitioners. But to prove mathematical theorems about lifted graphical models, and the probability distributions that they induce, we need (of course) to make precise what we mean, which is done in Section 3.
In this article we use aggregation rules expressed by formulas of conditional probability logic (CPL). The idea is that for any relation symbol R, of arity k say, there are an integer ν_R, numbers α_{R,i} ∈ [0, 1], and CPL-formulas χ_{R,i}(x_1, ..., x_k) for i = 1, ..., ν_R such that if χ_{R,i}(x_1, ..., x_k) holds, then the probability that R(x_1, ..., x_k) holds is α_{R,i}. This formalism is strong enough to express, for example, aggregation rules of the following kind for arbitrary m, any CPL-formula ψ(x_1, ..., x_k) and any α_i ∈ [0, 1], i = 0, ..., m: For all i = 0, ..., m, if the proportion of k-tuples that satisfy ψ(x_1, ..., x_k) is in the interval [i/m, (i + 1)/m], then the probability that R(x_1, ..., x_k) holds is α_i.
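The interval-style aggregation rule just described can be sketched as a simple lookup: the proportion of tuples satisfying ψ selects the interval [i/m, (i + 1)/m], which in turn determines the probability α_i for R. The values α_0, ..., α_m below are hypothetical.

```python
# Sketch (with hypothetical numbers) of the interval-style aggregation rule above:
# if the proportion of tuples satisfying psi lies in [i/m, (i+1)/m], then R holds
# with probability alpha_i.
def aggregation_probability(proportion, alphas):
    """Map the proportion of tuples satisfying psi to a probability for R.
    alphas = [alpha_0, ..., alpha_m], so m = len(alphas) - 1; a proportion in
    [i/m, (i+1)/m) selects alpha_i, and proportion 1 selects alpha_m."""
    m = len(alphas) - 1
    i = min(int(proportion * m), m)  # index of the interval containing the proportion
    return alphas[i]

alphas = [0.0, 0.1, 0.5, 0.9]  # m = 3: hypothetical values alpha_0, ..., alpha_3
```

Note that at the shared endpoints i/m either of the two adjacent values could be used; the sketch arbitrarily picks the right-hand interval, which is one way of resolving the overlap in the informal rule.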
Once we have made precise (as in Definition 3.8) what we mean by a lifted Bayesian network G for a finite relational signature σ (i.e. a finite set of relation symbols, possibly of different arities) and also made precise (as in Definition 3.11) how G determines a probability distribution P_D on the set of all σ-structures with domain D (for some finite set D), then we can ask questions like this: Given a CPL-formula ϕ(x_1, ..., x_k) and d_1, ..., d_k ∈ D, what is the probability that ϕ(x_1, ..., x_k) is satisfied by the sequence d_1, ..., d_k? Or more formally, what is P_D{D ∈ W_D : D |= ϕ(d_1, ..., d_k)}? It is computationally very expensive to answer the question by analyzing all members of W_D, since, in general, the cardinality of W_D is of the order $2^{|D|^r}$ where r is the maximal arity of relation symbols in σ and |D| is the cardinality of D. However, our first theorem (Theorem 3.14) says that if ϕ is "noncritical" in the sense that its conditional probability quantifiers (if any) avoid "talking about" certain finitely many critical numbers, then there is a quantifier-free formula ϕ*(x_1, ..., x_k) such that, with probability approaching 1 as |D| → ∞, ϕ and ϕ* are equivalent. If we are given such a ϕ* then we can easily compute the probability α* = P_D{D ∈ W_D : D |= ϕ*(d_1, ..., d_k)} by using only the lifted Bayesian network G, so in particular this computation is independent of the cardinality of D. Moreover, α* only depends on the quantifier-free formula ϕ* and not on the choice of elements d_1, ..., d_k. We also get that, as |D| → ∞, P_D{D ∈ W_D : D |= ϕ(d_1, ..., d_k)} tends to α*. But of course, given a noncritical ϕ, we first have to find a quantifier-free ϕ* which is "almost surely" equivalent to ϕ. The proof of Theorem 3.14 produces an algorithm for doing this. At one step in the algorithm one may need to transform a quantifier-free formula into an equivalent disjunctive normal form, and this computational task is, in general, NP-hard.
But if one assumes that all quantifier-free subformulas of ϕ are in disjunctive normal form, then the algorithm that produces ϕ* works in quadratic time in the length of ϕ, if we assume that an arithmetic operation, a comparison of two numbers and a comparison of two literals are each completed in one time step (more details in Remark 3.17).
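The point of claim (b) above, that the probability of a quantifier-free formula can be read off from the network alone, independently of the domain, can be illustrated with a toy computation. The network below is hypothetical: two unary relation symbols with no arrows between them, so the corresponding atomic events for a single element are independent with probabilities p1 and p2.

```python
# Sketch of computing alpha* for a quantifier-free formula from the network
# parameters only, independently of the domain size. Toy network (hypothetical):
# two unary relations R1, R2 with no arrows between them, so R1(a) and R2(a)
# hold independently with probabilities p1 and p2.
p1, p2 = 0.7, 0.4

def prob_quantifier_free(phi):
    """Probability that a single element satisfies a conjunction of literals
    over R1, R2. phi is a dict like {'R1': True, 'R2': False}, read as
    R1(x) and not R2(x)."""
    p = 1.0
    for rel, positive in phi.items():
        q = {"R1": p1, "R2": p2}[rel]
        p *= q if positive else 1.0 - q
    return p

alpha_star = prob_quantifier_free({"R1": True, "R2": False})  # P(R1(x) & not R2(x))
```

Nothing in the computation depends on the cardinality of the domain, in contrast to the brute-force enumeration of all members of W_D.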
The proof of Theorem 3.14 gives some by-products, such as a "logical limit/convergence law" (Theorem 3.15) and a result (Theorem 3.16) saying that for every lifted Bayesian network as in Definition 3.8 there is an "almost surely equivalent" lifted Bayesian network in which all aggregation formulas (as in Definition 3.8) are quantifier-free. The original zero-one law for first-order logic, proved independently by Glebskii et al. [10] and Fagin [8], becomes a special case of Theorem 3.15 when we restrict attention to first-order sentences and the DAG of the lifted Bayesian network has no edges and all the probabilities associated to the vertices are 1/2.
A couple of earlier results exist which are similar to the results of this article. Jaeger [13] has considered another sort of lifted Bayesian network which he calls a relational Bayesian network. Instead of using aggregation/combination rules (as we do in this article), relational Bayesian networks use aggregation/combination functions. Theorem 3.9 in [13] is an analogue of Theorem 3.15 below for first-order formulas in the setting of relational Bayesian networks which use only "exponentially convergent" combination functions. Theorem 4.7 in [14] has a similar flavour to Theorem 3.16 below, but [14] considers "admissible" relational Bayesian networks and a probability measure defined by such a network on the set of structures with a common countably infinite domain.
The results of this article are mainly motivated by concepts and methods in machine learning, data mining and artificial intelligence, but if the results are seen from the perspective of finite model theory and random discrete structures, then they join a long tradition of results concerning logical limit laws and almost sure elimination of quantifiers. For a very small and eclectic selection of work in this field, ranging from the earliest to some of the most recent, see for example [8,10,12,15,19,21,22,24,25].
The organization of this article is as follows. Section 2 introduces the basic conventions used in this article as well as some basic definitions. Section 3 defines the main notions of the article and states the main results. Section 4 gives the proofs of these results. The last section is a brief discussion about further research in the topics of formal logic, probabilistic graphical models, almost sure elimination of quantifiers and convergence laws.

Preliminaries
Basic knowledge of first-order logic and first-order structures is expected and there are many sources in which the reader can find this background, for example [20]. In this section we clarify and define some basic notation and terminology concerning logic and graph theory. Formulas of a formal logic will usually be denoted by ϕ, ψ, θ or χ, possibly with sub- or superscripts. Logical variables will be denoted x, y, z, u, v, w, possibly with sub- or superscripts. Finite sequences/tuples of variables are similarly denoted x̄, ȳ, z̄, etc. If a formula is denoted by ϕ(x̄) then it is, as usual, assumed that all free variables of ϕ occur in the sequence x̄ (but we do not insist that every variable in x̄ occurs in the formula denoted by ϕ(x̄)); moreover, in this context we will assume that all variables in x̄ are different, although this is occasionally restated. In general, finite sequences/tuples of elements are denoted by ā, b̄, c̄, etc. For a sequence ā, rng(ā) denotes the set of elements occurring in ā. For a sequence ā, |ā| denotes its length. For a set A, |A| denotes its cardinality. In particular, if ϕ is a formula of some formal logic (so ϕ is a sequence of symbols), then |ϕ| denotes its length. Sometimes we abuse notation by writing 'ā ∈ A' when we actually mean that rng(ā) ⊆ A. By a signature (or vocabulary) we mean a set of relation symbols, function symbols and constant symbols. A signature σ is called finite relational if it is finite as a set and all symbols in it are relation symbols. We use the terminology 'σ-structure', or just structure if we omit mentioning the signature, in the sense of first-order logic. Structures in this sense will be denoted by calligraphic letters A, B, C, etc. The domain (or universe) of a structure A will often be denoted by the corresponding non-calligraphic letter A. A structure is called finite if its domain is finite. If σ′ ⊂ σ are signatures and A is a σ-structure, then A↾σ′ denotes the reduct of A to the signature σ′. We let [n] denote the set {1, ..., n}. We use the terminology atomic (σ-)formula in the sense of first-order logic with equality, so in particular, the expression 'x = y' is an atomic σ-formula for every signature σ, including the empty signature σ = ∅. It will also be convenient to have a special symbol which is viewed as an atomic σ-formula for every signature σ; this formula is interpreted as being true in every structure.
(i) If ϕ(x̄) is an atomic σ-formula, then ϕ(x̄) and ¬ϕ(x̄) are called σ-literals.
(ii) A consistent set of σ-literals is called an atomic σ-type. When denoting an atomic σ-type by p(x̄) it is assumed (as for formulas) that if a variable occurs in a formula in p(x̄), then it belongs to the sequence x̄.
(iii) If p(x̄) is an atomic σ-type, then the identity fragment of p(x̄) is the set of formulas of the form x_i = x_j or x_i ≠ x_j that belong to p(x̄).
Remark 2.2. Note that if p(x̄) is a complete atomic σ-type where x̄ = (x_1, ..., x_m), then for all 1 ≤ i, j ≤ m, either x_i = x_j or x_i ≠ x_j belongs to p(x̄). (Also observe that if p(x̄, ȳ) is a complete atomic σ-type and dim_ȳ(p(x̄, ȳ)) = d, then for every σ-structure A and all ā, b̄ such that A |= p(ā, b̄), we have |rng(b̄) \ rng(ā)| = d.)
(i) If p(x̄) is an atomic σ-type, then the notation 'A |= p(ā)' means that A |= ϕ(ā) for every formula ϕ(x̄) ∈ p(x̄), or in other words that ā satisfies every formula in p(x̄) with respect to the structure A, or (to use model theoretic language) that ā realizes p(x̄) with respect to the structure A.
(ii) If ȳ is a sequence of different variables (such that no variable occurs in both x̄ and ȳ) and q(x̄, ȳ) is an atomic σ-type, then q(ā, ȳ) denotes the set of formulas obtained from q(x̄, ȳ) by substituting ā for x̄.
By a (directed) path we mean a sequence of vertices a_0, a_1, ..., a_k such that there is an edge from a_i to a_{i+1} for all i = 0, ..., k − 1; the length of this path is the number of edges in it, in other words, the length is k.
Definition 2.4. (About directed acyclic graphs) Suppose that G is a DAG with nonempty and finite vertex set V . Let a ∈ V .
We let par(a) denote the set of parents of a. Observe that if G is a DAG with vertex set V and mpr(G) = r, and G′ is the induced subgraph of G with vertex set V′ = {a ∈ V : mpr(a) < r}, then, for every a ∈ V′, the mp-rank of a is the same no matter whether we compute it with respect to G′ or with respect to G; it follows that mpr(G′) = r − 1.
We call a random variable binary if it can only take the value 0 or 1. The following is a direct consequence of [1, Corollary A.1.14], which in turn follows from the Chernoff bound [4]:
Lemma 2.5. Let Z be the sum of n independent binary random variables, each one with probability p of having the value 1. For every ε > 0 there is c_ε > 0, depending only on ε, such that the probability that |Z − pn| > εpn is less than 2e^{−c_ε pn}.
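The concentration behaviour in Lemma 2.5 can be checked empirically. The following sketch (not a proof, and with arbitrarily chosen parameters) estimates how often a sum of n independent Bernoulli(p) variables deviates from its mean pn by more than εpn, for a small and a larger n.

```python
import random

# Empirical check (a sketch, not a proof) of the concentration in Lemma 2.5:
# for Z a sum of n independent Bernoulli(p) variables, deviations |Z - pn| > eps*pn
# become exponentially unlikely as n grows.
def deviation_frequency(n, p, eps, trials, rng):
    """Fraction of trials in which |Z - pn| > eps * pn."""
    bad = 0
    for _ in range(trials):
        z = sum(1 for _ in range(n) if rng.random() < p)
        if abs(z - p * n) > eps * p * n:
            bad += 1
    return bad / trials

rng = random.Random(1)
f_small = deviation_frequency(100, 0.5, 0.2, 200, rng)    # n = 100
f_large = deviation_frequency(2000, 0.5, 0.2, 200, rng)   # n = 2000
```

For n = 2000 the deviation event is roughly a nine-standard-deviation event, so its empirical frequency should essentially vanish, in line with the bound 2e^{−c_ε pn}.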

Conditional probability logic and lifted Bayesian networks
In this section we define the main concepts of this article and state the main results.
Definition 3.1. (Conditional probability logic) Suppose that σ is a signature. Then the set of conditional probability formulas over σ, denoted CPL(σ), is defined inductively as follows: (1) Every atomic σ-formula belongs to CPL(σ) (where 'atomic' has the same meaning as in first-order logic with equality).
(2) If ϕ, ψ ∈ CPL(σ), then ¬ϕ, (ϕ ∧ ψ), (ϕ ∨ ψ), (ϕ → ψ), (ϕ ↔ ψ) and ∃xϕ belong to CPL(σ), where x is a variable. (As usual, in practice we do not necessarily write out all parentheses.) We consider ∀xϕ to be an abbreviation of ¬∃x¬ϕ.
In both of the new formulas introduced in part (3), all variables of ϕ, ψ, θ and τ that appear in the sequence ȳ become bound. So this construction can be seen as a sort of quantification, which may become clearer from the semantics provided below.
A formula ϕ ∈ CPL(σ) is called quantifier-free if it contains no quantifier, that is, if it is constructed from atomic formulas by using only the connectives ¬, ∧, ∨, →, ↔.
(2) Suppose that A is a finite σ-structure and let ϕ(x̄, ȳ) ∈ CPL(σ). For ā ∈ A^{|x̄|}, let ϕ(ā, A) denote the set of all b̄ ∈ A^{|ȳ|} such that A |= ϕ(ā, b̄). The satisfaction of the formulas from part (3) is then defined by comparing the proportion |ϕ(ā, A) ∩ ψ(ā, A)| / |ψ(ā, A)| with r plus the corresponding proportion for θ and τ, and in this case we say that the formula is true for ā only if both proportions are defined; the strict comparison is defined similarly.

Remark 3.3. (A warning) Observe that, with the given semantics, the two comparison formulas from part (3) of Definition 3.1 are not each other's negations, because the first formula may fail to be true for ā when ψ(ā, A) = ∅ or τ(ā, A) = ∅, in which case the corresponding fraction is undefined, and then the other formula is also false for ā.
Remark 3.4. (Expressing conditional probabilities, or just probabilities) Let x̄ = (x_1, ..., x_k) and ȳ = (y_1, ..., y_l). If τ(x̄, ȳ) denotes the formula y_1 = y_1 and θ(x̄, ȳ) denotes the formula y_1 ≠ y_1, then the resulting formula (3.1) expresses that the proportion of tuples ȳ that satisfy ϕ(x̄, ȳ) among those ȳ that satisfy ψ(x̄, ȳ) is at least r. Thus the formula expresses a conditional probability if we assume that all l-tuples have the same probability. Under the stated assumptions, let us abbreviate (3.1) by (3.2). If we assume, in addition, that ψ(x̄, ȳ) is the formula y_1 = y_1, then each of (3.1) and (3.2) expresses that the proportion of l-tuples ȳ that satisfy ϕ(x̄, ȳ) is at least r.
Example 3.5. Suppose that M is a unary relation symbol and F a binary relation symbol. Consider the statement "For at least half of all persons x, if at least one third of the friends of x are mathematicians, then x is a mathematician". If M(x) expresses that "x is a mathematician" and F(x, y) expresses that "x and y are friends", then this statement can be formulated in CPL using the abbreviation (3.2).

If A represents a database from the real world, then it is unlikely that events of interest are (conditionally) independent according to the precise mathematical definition. Instead one may look for "approximate (conditional) independencies". If r is changed to a small positive number and the corresponding comparisons hold, then the dependency between X and Y is weak, or one could say that they are "approximately independent up to an error of r". The reason for the more complicated formula is to make "r-approximate independence" symmetric.
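The CPL semantics of Example 3.5 can be evaluated by brute-force proportion counting on a small finite structure. The random structure below is a hypothetical test instance; conditional probabilities are computed as proportions, exactly as in the semantics of CPL, and elements without friends are treated as in the warning of Remark 3.3 (an undefined proportion makes the comparison false).

```python
import random

# Brute-force evaluation (sketch) of the statement from Example 3.5 on a small
# finite {M, F}-structure: "for at least half of all x, if at least one third of
# the friends of x are mathematicians, then x is a mathematician".
rng = random.Random(2)
n = 50
M = {a for a in range(n) if rng.random() < 0.5}                  # mathematicians
F = {(a, b) for a in range(n) for b in range(n)
     if a != b and rng.random() < 0.2}
F |= {(b, a) for (a, b) in F}                                    # friendship is symmetric

def inner(x):
    """True if the proportion of mathematicians among x's friends is >= 1/3.
    With no friends the conditional proportion is undefined and, as in
    Remark 3.3, the comparison counts as false."""
    friends = [b for b in range(n) if (x, b) in F]
    if not friends:
        return False
    return sum(1 for b in friends if b in M) / len(friends) >= 1 / 3

# Outer quantifier: the proportion of x satisfying "inner(x) implies M(x)" is >= 1/2.
holds = sum(1 for x in range(n) if (not inner(x)) or x in M) / n >= 1 / 2
```

This is exactly the kind of exhaustive evaluation that Theorem 3.14 lets one avoid on large domains.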
Definition 3.8. (Lifted Bayesian network) Let σ be a finite relational signature. In this article we define a lifted Bayesian network for σ to consist of the following components:
(a) A directed acyclic graph (DAG) G with vertex set σ.
We will use the same symbol (for example G) to denote a lifted Bayesian network and its underlying DAG. The intuitive meaning of µ(R | χ_{R,i}) in part (c) is that if ā is a sequence of elements from a structure and ā satisfies χ_{R,i}(x̄), then the probability that R(ā) holds is µ(R | χ_{R,i}).

Remark 3.9. (Subnetworks) Let G denote a lifted Bayesian network for σ. Suppose that σ′ ⊂ σ is such that if R ∈ σ′ then par(R) ⊆ σ′. Then it is easy to see that σ′ determines a lifted Bayesian network G′ for σ′ such that
• the vertex set of the underlying DAG of G′ is σ′,
• for every R ∈ σ′, the number ν_R and the formulas χ_{R,i}, i = 1, ..., ν_R, are the same as those for G,
• for every R ∈ σ′ and every 1 ≤ i ≤ ν_R, the numbers µ(R | χ_{R,i}) are the same as those for G.
We call the so defined lifted Bayesian network G′ for σ′ the subnetwork (of G) induced by σ′.

Definition 3.10. (The case of an empty signature) (i) As a technical convenience we will also consider a lifted Bayesian network, denoted G_∅, for the empty signature ∅. According to Definition 3.8 the vertex set of the underlying DAG of G_∅ is ∅, the empty set. It follows that no formulas or numbers as in parts (b) and (c) of Definition 3.8 need to be specified for G_∅.
(ii) For every n ∈ N+, let W^∅_n denote the set of all ∅-structures with domain [n] and note that every W^∅_n has only one member, which is just the set [n].
(iii) For every n ∈ N+, let P^∅_n be the unique probability distribution on W^∅_n.

Definition 3.11. (The probability distribution in the general case) Let σ be a finite nonempty relational signature and let G denote a lifted Bayesian network for σ. Suppose that the underlying DAG of G has mp-rank ρ. For each 0 ≤ r ≤ ρ let G_r be the subnetwork (in the sense of Remark 3.9) induced by σ_r = {R ∈ σ : mpr(R) ≤ r} and note that G_ρ = G. Also let σ_{−1} = ∅, G_{−1} = G_∅ and let P^{−1}_n be the unique probability distribution on W^{−1}_n = W^∅_n. By induction on r we define, for every r = 0, 1, ..., ρ, a probability distribution P^r_n on the set W^r_n of all σ_r-structures with domain [n] as follows: for every A ∈ W^r_n, P^r_n(A) is the product of P^{r−1}_n(A↾σ_{r−1}) and the factors λ(A, R, i, ā), for R ∈ σ_r \ σ_{r−1} and tuples ā satisfying χ_{R,i}, where λ(A, R, i, ā) equals µ(R | χ_{R,i}) if A |= R(ā), and 1 − µ(R | χ_{R,i}) otherwise.
Finally we let W_n = W^ρ_n and P_n = P^ρ_n, so P_n is a probability distribution on the set of all σ-structures with domain [n].
Remark 3.12. ((Ir)reflexive and/or symmetric relations) Let A be a set and let R ⊆ A^k be a k-ary relation on A. We call R reflexive if for all a ∈ A the k-tuple containing a in each coordinate belongs to R. We call R irreflexive if for every (a_1, ..., a_k) ∈ R we have a_i ≠ a_j whenever i ≠ j. We call R symmetric if for every (a_1, ..., a_k) ∈ R, every permutation of (a_1, ..., a_k) also belongs to R. Consider Definition 3.11 and let R ∈ σ. We can make sure that P_n(A) > 0 only if the interpretation of R in A is reflexive (respectively irreflexive) by choosing the formulas χ_{R,i} and associated (conditional) probabilities in an appropriate way. To achieve that P_n(A) > 0 only if the interpretation of R in A is symmetric we can proceed as follows: in the definition of λ(A, R, i, ā) (in Definition 3.11) we interpret R(ā) as meaning that R is satisfied by every permutation of ā and we interpret ¬R(ā) as meaning that R is not satisfied by any permutation of ā. We also need to assume that for every k-tuple ā, either every permutation of ā satisfies χ_{R,i}(x̄) or no permutation of ā satisfies χ_{R,i}(x̄). Then the proofs of Theorems 3.14–3.16 still work out with very small modifications.
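The device of Remark 3.12 for obtaining symmetric relations, deciding each set of permutations of a tuple only once, can be sketched for the binary case. The probability value below is hypothetical.

```python
import random

# Sketch (hypothetical probability) of the device in Remark 3.12 for making a binary
# relation E symmetric (and irreflexive): decide each unordered pair {a, b} once, and
# put both orderings of the pair into E, so that only symmetric interpretations get
# positive probability.
def sample_symmetric_relation(n, p, rng):
    """Sample a symmetric, irreflexive binary relation on [n] with edge probability p."""
    E = set()
    for a in range(1, n + 1):
        for b in range(a + 1, n + 1):
            if rng.random() < p:      # one decision per orbit {(a, b), (b, a)}
                E.add((a, b))
                E.add((b, a))
    return E

E = sample_symmetric_relation(30, 0.3, random.Random(3))
```

For higher arities the same idea applies with one decision per orbit of the permutation group on the coordinates of the tuple.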
Definition 3.13. Let σ, W_n and P_n be as in Definition 3.11.
(i) If ϕ(x̄) ∈ CPL(σ) and ā ∈ [n]^{|x̄|}, then we define P_n(ϕ(ā)) = P_n({A ∈ W_n : A |= ϕ(ā)}).
(ii) If ϕ ∈ CPL(σ) has no free variables (i.e. is a sentence), then we define P_n(ϕ) = P_n({A ∈ W_n : A |= ϕ}).

Now we can state the main results. They use the notion of a noncritical formula, which depends on the lifted Bayesian network under consideration. Since this notion is quite technical and relies on some technical results (concerning the convergence of the probability that an atomic type is realized) which will be proved later, we give the precise definition later, in Definition 4.30; in that context it will be more evident why the definition of noncritical formula looks the way it does. For now I only say this: For every m ∈ N+ there are finitely many numbers (depending only on G) which are called m-critical (according to Definition 4.29). Roughly speaking, a formula ϕ(x̄) ∈ CPL(σ) is noncritical (details in Definition 4.30) if for every subformula of ϕ(x̄) of the form constructed in part (3) of Definition 3.1, the number r is not the difference of two m-critical numbers, where m = |x̄| + qr(ϕ). It follows that every first-order formula is noncritical. For a longer discussion on the topics of critical formulas and (non)convergence see Remark 3.18.
Theorem 3.14. (Almost sure elimination of quantifiers for noncritical formulas) Let σ be a finite relational signature, let G be a lifted Bayesian network and, for each n ∈ N+, let P_n be the probability distribution induced by G (according to Definition 3.11) on the set W_n of all σ-structures with domain [n]. Suppose that every aggregation formula χ_{R,i} of G is noncritical. If ϕ(x̄) ∈ CPL(σ) is noncritical, then there are a quantifier-free formula ϕ*(x̄) ∈ CPL(σ) and c > 0, which depend only on ϕ(x̄) and G, such that for all sufficiently large n,
P_n({A ∈ W_n : A |= ∀x̄(ϕ(x̄) ↔ ϕ*(x̄))}) ≥ 1 − e^{−cn}.

Theorem 3.15. (Convergence for noncritical formulas) Let σ, G, W_n and P_n be as in Theorem 3.14. For every noncritical ϕ(x̄) ∈ CPL(σ) there are c > 0 and 0 ≤ d ≤ 1, depending only on ϕ(x̄) and G, such that for every m ∈ N+ and every ā ∈ [m]^{|x̄|},
|P_n(ϕ(ā)) − d| ≤ e^{−cn}
for all sufficiently large n ≥ m. The number d is always critical (i.e. l-critical for some l). Moreover, if ϕ has no free variable (i.e. is a sentence), then P_n(ϕ) converges to either 0 or 1.
Theorem 3.16. (An asymptotically equivalent "quantifier-free" network) Let σ, G, W_n and P_n be as in Theorem 3.14. Then for every aggregation formula of G there is a quantifier-free formula such that, if G* denotes the lifted Bayesian network obtained from G by replacing each aggregation formula with its quantifier-free counterpart, then there is c > 0 such that for every noncritical ϕ(x̄) ∈ CPL(σ), every m ∈ N+ and every ā ∈ [m]^{|x̄|},
|P_n(ϕ(ā)) − P*_n(ϕ(ā))| ≤ e^{−cn}
for all sufficiently large n ≥ m, where P*_n is the probability distribution on W_n according to Definition 3.11 if G is replaced by G* and P_n is replaced by P*_n.

Remark 3.17. (Computational complexity) The proof of Theorem 3.14 indicates an algorithm for finding the quantifier-free ϕ* from ϕ. Suppose that we fix the lifted Bayesian network (so σ is also fixed) and try to understand how efficient the algorithm is with respect to the length of ϕ. The crucial step is Definition 4.35 and Lemma 4.37, which together show how to eliminate a quantifier of the form constructed in part (3) of Definition 3.1 in a satisfiable formula. However, at this step in the proof we assume that the formulas inside the latest quantification are written as disjunctions of complete atomic types. The problem of transforming an arbitrary quantifier-free formula into an equivalent disjunctive normal form is NP-hard, so the algorithm is not necessarily efficient in general (given the current state of affairs in computational complexity theory). But if we assume that every quantifier-free subformula of ϕ is in disjunctive normal form, then the number of "steps" that the indicated algorithm needs to find ϕ* is O(|ϕ|^2), where |ϕ| denotes the length of ϕ and a "step" means an arithmetic operation, a comparison of two numbers or a comparison of two literals. This essentially follows from Remark 4.36 because the number of times that a quantifier needs to be eliminated is bounded by |ϕ|.
Remark 3.18. (Necessity of noncriticality) It follows from Remark 3.4 that for every sentence ψ of the language $L_{\omega P}$ considered in [16] there is a sentence of CPL which has exactly the same finite models as ψ. Therefore it follows from [16, Proposition 3.1] that the assumption that ϕ is noncritical in Theorems 3.14 and 3.15 is necessary. More precisely, let σ contain one binary relation symbol R and no other symbols, and let G be a lifted Bayesian network for σ where µ(R(x, y) | x = x) = 1/2 and 'x = x' is the only aggregation formula associated to R. Then, according to [16, Proposition 3.1] interpreted in the present context, there is a ("critical") sentence ψ ∈ CPL(σ) such that P_n(ψ) does not converge.
We now generalize the idea of [16, Proposition 3.1] to show that nonconvergence for at least some "critical" formulas occurs for many (if not all) lifted Bayesian networks. Let σ be a finite relational signature and let G be a lifted Bayesian network for σ such that every aggregation formula of G is noncritical. Let ϕ(x̄) ∈ CPL(σ) be a noncritical formula where x̄ = (x_1, ..., x_k) and k ≥ 2. By Theorem 3.15 there is 0 ≤ d ≤ 1 such that for every n_0 ∈ N^+ and every ā ∈ [n_0]^k, lim_{n→∞} P_n(ϕ(ā)) = d. By the same theorem, d is a "critical" number, that is, l-critical for some l according to Definition 4.29. Suppose that 0 < d < 1 (which would typically be the case if ϕ(x̄) is atomic). Furthermore, suppose that all numbers of the form µ(R | χ_{R,i}) associated with G (as in Definition 3.8) are rational. It then follows (from Definition 4.29 and the results and definitions preceding it) that d is rational. Now suppose that for all n and any choice of distinct ā_1, ..., ā_m ∈ [n]^k, the binary random variables X_1, ..., X_m with domain W_n are independent, where X_i(A) = 1 if A |= ϕ(ā_i) and X_i(A) = 0 otherwise. From item (b) of Remark 4.7 one can derive that this independence assumption holds if ϕ(x̄) is atomic.
Now we consider a formula, denoted ψ(x_1, ..., x_{k−1}), which in structures in W_n expresses that "there are exactly dn elements y such that ϕ(x̄′, y) is satisfied", where x̄′ = (x_1, ..., x_{k−1}). We will show that P_n(∃x̄′ ψ(x̄′)) does not converge as n → ∞.
If dn is not an integer and A ∈ W_n, then A ⊭ ∃x̄′ ψ(x̄′), so P_n(∃x̄′ ψ(x̄′)) = 0. Note that, as d is rational, there are infinitely many n such that dn is an integer and infinitely many n such that dn is not an integer. Hence it suffices to show that P_n(∃x̄′ ψ(x̄′)) gets arbitrarily close to 1 for sufficiently large n such that dn is an integer. Fix a large n such that dn is an integer and fix ā ∈ [n]^{k−1}. Then, for every b ∈ [n], P_n(ϕ(ā, b)) is very close to d. By the assumption about independence above, it follows that the probability that "there are exactly dn elements b such that ϕ(ā, b) holds" is close to $\binom{n}{dn} d^{dn}(1-d)^{(1-d)n}$, which by Stirling's formula is of order $1/\sqrt{n}$ and hence tends to 0. Thus the probability of the negation of this statement is close to $1 - \binom{n}{dn} d^{dn}(1-d)^{(1-d)n}$ and, using the assumption about independence again, it follows that P_n(¬∃x̄′ ψ(x̄′)) is close to $\big(1 - \binom{n}{dn} d^{dn}(1-d)^{(1-d)n}\big)^{n^{k-1}}$, which tends to 0 as n → ∞. Finally, let us take a broader look at a formula of the form (3.3). Suppose that ϕ_i is noncritical for every i. By Theorem 3.15, there are numbers d_1, d_2, d_3, d_4 such that for any n_0, every ā ∈ [n_0]^{|ȳ|} and every i, lim_{n→∞} P_n(ϕ_i(ā)) = d_i. Suppose that r is chosen so that r + d_1/d_2 = d_3/d_4. Then the formula (3.3) is critical (see Definition 4.30). The intuition is now that for all large enough n and almost all A ∈ W_n, the numbers r + |ϕ_1(A) ∩ ϕ_2(A)|/|ϕ_2(A)| and |ϕ_3(A) ∩ ϕ_4(A)|/|ϕ_4(A)| are very close to each other, but this does not exclude the possibility that for infinitely many n the first number is at least as large as the second and for infinitely many n the second number is larger. In this case the truth value of the formula (3.3) will alternate between true and false infinitely many times as n tends to infinity in typical (or "almost all") members of W_n. One may also ask if it is necessary in the above theorems that all aggregation formulas χ_{R,i} are noncritical. I do not currently know, but I assume that the answer is yes.
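The arithmetic behind this nonconvergence argument can be checked numerically. The sketch below (illustrative only; the helper name `pmf_at_mean` is ours) computes the probability that a Binomial(n, d) variable hits its mean exactly, confirms the Stirling-type decay of order $1/\sqrt{n}$, and evaluates the resulting product bound for k = 2:

```python
from math import comb, sqrt, pi

def pmf_at_mean(n, d):
    """P(Bin(n, d) = d*n), assuming d*n is an integer."""
    k = round(d * n)
    return comb(n, k) * d**k * (1 - d)**(n - k)

# By Stirling's formula this probability is about 1/sqrt(2*pi*d*(1-d)*n),
# so it tends to 0, but only polynomially fast.
p = pmf_at_mean(1000, 0.5)
approx = 1 / sqrt(2 * pi * 0.5 * 0.5 * 1000)
assert abs(p - approx) / approx < 0.01

# With roughly n**(k-1) independent "trials" (one per tuple a in [n]**(k-1)),
# the probability that NO tuple has exactly d*n witnesses is roughly
# (1 - p)**(n**(k-1)), which tends to 0 because p * n**(k-1) -> infinity.
k = 2
none_has = (1 - p) ** (1000 ** (k - 1))
assert none_has < 1e-10
```

So along the subsequence where dn is an integer the existential sentence is true with probability tending to 1, while it has probability 0 whenever dn is not an integer, which is the claimed nonconvergence.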
Proof of Theorems 3.14, 3.15 and 3.16. Let σ be a finite relational signature and G a lifted Bayesian network for σ. The proof proceeds by induction on the mp-rank of the underlying DAG of G. The base case will not be when the mp-rank of G is 0. Instead, the base case will be the "empty" lifted Bayesian network for the empty signature ∅, as described in Definition 3.10. In the case of an empty signature (and consequently an empty lifted Bayesian network), Theorems 3.14–3.16 are a direct consequence of Lemma 4.13 below.
The rest of the proof concerns the induction step. The induction step is proved by Proposition 4.41 and Corollary 4.42, which rely (only) on Assumption 4.1 below, which states the general assumptions related to the lifted Bayesian network, and Assumption 4.10 below, which states the induction hypothesis. Theorems 3.14–3.16 then follow. Assumption 4.1. • σ is a finite relational signature and σ′ is a proper subset of σ.
• For every R ∈ σ \ σ′ and every 1 ≤ i ≤ ν_R, µ(R | χ_{R,i}) denotes a real number in the interval [0, 1]. (Sometimes we write µ(R(x̄) | χ_{R,i}(x̄)), where x̄ is a sequence of variables whose length equals the arity of R.) • For every σ-structure A, every R ∈ σ \ σ′, every 1 ≤ i ≤ ν_R and every ā ∈ A^r, where r is the arity of R, let … otherwise.
• For every n ∈ N^+, W′_n is the set of all σ′-structures with domain [n] = {1, ..., n} and P′_n is a probability distribution on W′_n. • For every n ∈ N^+, W_n is the set of all σ-structures with domain [n].
Definition 4.2. For every n ∈ N and every A′ ∈ W′_n we define … Then P_n is a probability distribution on W_n which we may call the P′_n-conditional probability distribution on W_n. Notation 4.3. The notation in this section will follow the following pattern: σ′-structures, in particular members of W′_n, will be denoted A′, B′, etcetera; subsets of W′_n will be denoted X′ (or X′_n), Y′ (or Y′_n), etcetera; σ-structures and subsets of W_n will be denoted similarly but without the (symbol for) "prime".
In the proofs that follow we will consider "restrictions" of P_n to some subsets of W_n according to the next definition.
(ii) If A′ ∈ W′_n, then we let … Then P^{Y′} and P^{A′} are probability distributions on W^{Y′} and W^{A′}, respectively; if this is not clear, see Remark 4.7 below. Note also that if Y′ ⊆ W′_n, A′ ∈ Y′ and A ∈ W^{A′}, then … and in particular, taking Y′ = W′_n, we have, for every A ∈ W_n, (4.2) P_n(A) = P′_n(A↾σ′) P^{A↾σ′}(A). We now state a few basic lemmas which will be useful. … Proof. By using (4.2) in the first line below we get … Lemma 4.6. For every n, (i) if X ⊆ W_n and A′ ∈ W′_n, then P_n(X | W^{A′}) = P^{A′}(X ∩ W^{A′}), and (ii) if X ⊆ W_n and Y′ ⊆ W′_n, then P_n(X | W^{Y′}) = P^{Y′}(X ∩ W^{Y′}).
Proof. Let X ⊆ W_n.
(i) Let A′ ∈ W′_n. Using Lemma 4.5 in the first line below and (4.2) in the second line below, we get … (ii) Let Y′ ⊆ W′_n. Using that X ∩ W^{Y′} is the disjoint union of all X ∩ W^{A′} such that A′ ∈ Y′, Lemma 4.5, part (i) of this lemma and (4.1), we get … Remark 4.7. (About P^{A′}) Fix any n and any A′ ∈ W′_n. For every R ∈ σ \ σ′, every 1 ≤ i ≤ ν_R and every ā ∈ χ_{R,i}(A′), let Ω(R, i, ā) = {0, 1} and let P_{R,i,ā} be the probability distribution on Ω(R, i, ā) with P_{R,i,ā}(1) = µ(R | χ_{R,i}). Then let P_Ω be the product measure on … Consider the map which sends A ∈ W^{A′} to the finite sequence … where κ(R, i, ā) = 1 if A |= R(ā) and κ(R, i, ā) = 0 otherwise. This map is clearly a bijection from W^{A′} to Ω and, for every A ∈ W^{A′}, P^{A′}(A) = P_Ω(κ_A).
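Remark 4.7 says that P^{A′} is a product of independent Bernoulli measures, one coin flip per tuple in χ_{R,i}(A′). A minimal illustrative sketch (the signature, the aggregation conditions and the numbers µ below are invented for the example, not taken from the paper):

```python
import random

# Toy instance of the construction in Remark 4.7: sigma' has one unary
# symbol P, sigma adds a binary symbol R, and the (hypothetical) lifted
# network specifies mu(R | chi_1) = 0.7 when both arguments satisfy P
# and mu(R | chi_2) = 0.1 otherwise.
def sample_expansion(P_set, n, mu1=0.7, mu2=0.1, rng=random):
    """Draw A in W^{A'} by one independent coin flip per pair (a, b),
    i.e. sample from the product measure P_Omega."""
    R = set()
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            mu = mu1 if (a in P_set and b in P_set) else mu2
            if rng.random() < mu:
                R.add((a, b))
    return R

random.seed(0)
n, P_set = 30, set(range(1, 16))        # A': P holds on 1..15
trials = 2000
hits = sum((2, 3) in sample_expansion(P_set, n) for _ in range(trials))
# (2, 3) satisfies the first condition, so the frequency should be near 0.7.
assert abs(hits / trials - 0.7) < 0.05
```

Because distinct tuples get independent flips, events about disjoint sets of tuples are independent under P^{A′}, which is exactly what Lemma 4.8 exploits.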
The next lemma is a direct consequence of (b) of Remark 4.7.
Lemma 4.8. Suppose that p(x_1, ..., x_m) and q(x_1, ..., x_m) are (possibly partial) atomic (σ \ σ′)-types. Also assume that if ϕ is an atomic σ-formula which does not have the form x = x or the form … and ϕ ∈ p or ¬ϕ ∈ p, then neither ϕ nor ¬ϕ belongs to q. Then, for every n, every A′ ∈ W′_n and all distinct a_1, ..., a_m ∈ [n], the event {A ∈ W^{A′} : A |= p(a_1, ..., a_m)} is independent from the event {A ∈ W^{A′} : A |= q(a_1, ..., a_m)} in the probability space (W^{A′}, P^{A′}).
If p′(x̄, ȳ) and q′(x̄) are atomic σ′-types and q′ ⊆ p′, then the notions of (p′, q′, α)-saturated and (p′, q′, α)-unsaturated are defined in the same way, but considering finite σ′-structures instead.
(3) For every complete atomic σ′-type p′(x̄) with |x̄| ≤ k there is a number, which we denote P′(p′(x̄)) or just P′(p′), such that for all sufficiently large n and all ā ∈ [n] which realize the identity fragment of p′, |P′_n{A′ ∈ W′_n : A′ |= p′(ā)} − P′(p′(x̄))| ≤ δ′(n). (4) For every complete atomic σ′-type p′(x̄, ȳ) with |x̄ȳ| ≤ k and 0 < dim_ȳ(p′(x̄, ȳ)) = |ȳ|, if q′(x̄) = p′↾x̄ and P′(q′) > 0, then for all sufficiently large n, every A′ ∈ Y′_n is (p′, q′, α/(1 + ε′))-saturated and (p′, q′, α(1 + ε′))-unsaturated, where α = P′(p′(x̄, ȳ))/P′(q′(x̄)). (5) For every χ_{R,i}(x̄) as in Assumption 4.1 there is a quantifier-free σ′-formula χ*_{R,i}(x̄) such that for all sufficiently large n and all A′ ∈ Y′_n, A′ |= ∀x̄(χ_{R,i}(x̄) ↔ χ*_{R,i}(x̄)). Remark 4.11. (Some special cases) (i) As a technical convenience we allow empty types (and this does not contradict our definition of an atomic type). For example, in Definition 4.9 we allow the possibility that x̄ is an empty sequence, and consequently q(x̄) = ∅ and p(x̄, ȳ) is really just p(ȳ).
(ii) For an empty atomic σ′-type p′ we let P′(p′) = 1, and in this case we also interpret the set {A′ ∈ W′_n : A′ |= p′(ā)} as being equal to W′_n. Then part (3) of Assumption 4.10 makes sense also for an empty type p′. (iii) If p′(ȳ) is a complete atomic σ′-type and P′(p′) = 0, then for all sufficiently large n and all A′ ∈ Y′_n, p′ is not realized in A′ (i.e. p′(A′) = ∅). The reason is this: Let x̄ denote an empty sequence and let q′(x̄) be the empty atomic σ′-type, so q′ ⊆ p′. For large enough n, every A′ ∈ Y′_n is (p′, q′, P′(p′)(1 + ε′))-unsaturated by part (4) of Assumption 4.10. If P′(p′) = 0 this implies that p′ has no realization in A′. Lemma 4.12. Suppose that p′(x̄) is a complete atomic σ′-type and that p(x̄) ⊇ p′(x̄) is a (possibly partial) atomic σ-type. There is a number, which we denote P(p(x̄) | p′(x̄)) or just P(p | p′), such that for all sufficiently large n, all ā ∈ [n] and all A′ ∈ Y′_n such that A′ |= p′(ā), … Proof. Suppose that ā, b̄ ∈ [n] and A′, B′ ∈ Y′_n are such that A′ |= p′(ā) and B′ |= p′(b̄). Let R ∈ σ \ σ′. By part (5) of Assumption 4.10, for each 1 ≤ i ≤ ν_R, there is a quantifier-free formula χ*_{R,i} such that (if n is large enough) χ_{R,i} is equivalent to χ*_{R,i} in every structure in Y′_n. It follows that if c̄ and d̄ are subsequences of ā and b̄, respectively, of length equal to the arity of R, then either A′ |= χ_{R,i}(c̄) and B′ |= χ_{R,i}(d̄), or A′ ⊭ χ_{R,i}(c̄) and B′ ⊭ χ_{R,i}(d̄). The conclusion of the lemma now follows from (a) and (b) of Remark 4.7.
Lemma 4.13. (The base case) For every k ∈ N^+ and every ε′ > 0, if σ′ = ∅, P′_n is the uniform probability distribution on W′_n for all n (in fact the uniform probability distribution is the only probability distribution on W′_n, since W′_n is a singleton if σ′ = ∅, which we assume in this lemma) and δ′ : N^+ → R^{≥0} is any function such that lim_{n→∞} δ′(n) = 0, then …
Proof. Suppose that σ′ = ∅ and let k ∈ N^+ and ε′ > 0 be given. Then, for every n, W′_n contains a unique structure, which is just the set [n], and this structure has probability 1. Let δ′ : N^+ → R^{≥0} be any function such that lim_{n→∞} δ′(n) = 0. For every complete atomic σ′-type p′(x̄) let P′(p′(x̄)) = 1. Observe that, for every n, if ā ∈ [n] and ā realizes the identity fragment of p′(x̄), then ā realizes p′(x̄) in the unique A′ of W′_n. Hence, (3) holds for trivial reasons. For every n let Y′_n be the set of all A′ ∈ W′_n such that for every complete atomic σ′-type p′(x̄, ȳ) with |x̄ȳ| ≤ k and 0 < dim_ȳ(p′(x̄, ȳ)) = |ȳ|, if q′(x̄) = p′↾x̄, then A′ is (p′, q′, 1/(1 + ε′))-saturated and (p′, q′, (1 + ε′))-unsaturated. Suppose that p′(x̄, ȳ) is a complete atomic σ′-type with |x̄ȳ| ≤ k and 0 < dim_ȳ(p′(x̄, ȳ)) = |ȳ|. Let q′(x̄) = p′↾x̄ and suppose that A′ |= q′(ā), where A′ ∈ W′_n. Then A′ |= p′(ā, b̄) for every b̄ ∈ [n]^{|ȳ|} consisting of distinct elements, none of which occurs in ā. There are n^{|ȳ|} − Cn^{|ȳ|−1} such b̄ for some constant C. So if n^{|ȳ|} − Cn^{|ȳ|−1} ≥ n^{|ȳ|}/(1 + ε′), then A′ is (p′, q′, 1/(1 + ε′))-saturated. For trivial reasons, A′ is also (p′, q′, (1 + ε′))-unsaturated. Hence, we have proved (4). The last claim of the lemma follows from Proposition 4.32, the proof of which works out in exactly the same way if σ and Y_n (in that proof) are replaced by σ′ and Y′_n, respectively, and we assume (4). In other words, the almost everywhere elimination of quantifiers follows from the saturation and unsaturation properties stated in (4).
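The counting step in the proof above — that there are n^{|ȳ|} − Cn^{|ȳ|−1} tuples b̄ of pairwise distinct elements avoiding ā — can be verified by brute force on small instances. A sketch (the helper name is ours):

```python
from itertools import permutations

def count_fresh_tuples(n, avoid, length):
    """Number of `length`-tuples of pairwise distinct elements of
    {1, ..., n}, none of which lies in `avoid` (brute force)."""
    universe = [a for a in range(1, n + 1) if a not in avoid]
    return sum(1 for _ in permutations(universe, length))

n, avoid, m = 12, {1, 2, 3}, 2
exact = count_fresh_tuples(n, avoid, m)
# Closed form: the falling factorial (n - |avoid|)(n - |avoid| - 1)...,
# which is n**m minus an O(n**(m-1)) error term, as used in the proof.
falling = 1
for i in range(m):
    falling *= n - len(avoid) - i
assert exact == falling == 72
# A crude constant C = |avoid|*m + m*m suffices for the error term here.
assert n**m - exact <= (len(avoid) * m + m * m) * n**(m - 1)
```

Since n^{|ȳ|}/(1 + ε′) eventually falls below this falling factorial, the saturation condition holds for all large n.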
Lemma 4.14. Suppose that X_n ⊆ W_n. Then for all sufficiently large n, P_n(X_n) ≤ P_n(X_n ∩ W^{Y′_n}) + δ′(n).
Proof. We have P_n(X_n) = P_n(X_n ∩ W^{Y′_n}) + P_n(X_n \ W^{Y′_n}) and, using Lemma 4.5, we have … Hence P_n(X_n) ≤ P_n(X_n ∩ W^{Y′_n}) + δ′(n).
Lemma 4.15. Suppose that p′(x̄) is a complete atomic σ′-type and that p(x̄) ⊇ p′(x̄) is a (possibly partial) atomic σ-type. For all sufficiently large n, for all ā ∈ [n] and with Z′_n the set of all A′ ∈ Y′_n such that A′ |= p′(ā), we have … where P(p(x̄) | p′(x̄)) is as in Lemma 4.12.
Proof. For every A ∈ W_n we have A |= p′(ā) if and only if A↾σ′ |= p′(ā). Therefore W^{Y′_n} ∩ {A ∈ W_n : A |= p′(ā)} = W^{Z′_n}. By Lemma 4.6 we have P_n({A ∈ W_n : A |= p(ā)} | W^{Y′_n} ∩ {A ∈ W_n : A |= p′(ā)}) = P^{Z′_n}({A ∈ W^{Z′_n} : A |= p(ā)}).
Then, using (4.1) and Lemma 4.12, we get … Lemma 4.16. Suppose that p′(x̄) is a complete atomic σ′-type and that p(x̄) ⊇ p′(x̄) is a (possibly partial) atomic σ-type. Then for all sufficiently large n and all ā ∈ [n] which realize the identity fragment of p′(x̄) (and hence of p) we have … Proof. Let ā ∈ [n] realize the identity fragment of p′(x̄). Furthermore, let X_n be the set of all A ∈ W_n such that A |= p(ā), let X′_n be the set of all A′ ∈ W′_n such that A′ |= p′(ā), and let Z′_n be the set of all A′ ∈ Y′_n such that A′ |= p′(ā).
From parts (2) and (3) of Assumption 4.10 it easily follows that (for large enough n) P′_n(Z′_n)/P′_n(Y′_n) differs from P′_n(Z′_n) by at most δ′(n), P′_n(Z′_n) differs from P′_n(X′_n) by at most δ′(n), and P′_n(X′_n) differs from P′(p′(x̄)) by at most δ′(n). By Lemma 4.6, P_n(X_n | W^{Y′_n}) = P^{Y′_n}(X_n ∩ W^{Y′_n}). Then, using (4.1) and Lemma 4.12, we have … ≤ P′(p′(x̄)) + 3δ′(n).
Lemma 4.17. Suppose that p′(x̄) is a complete atomic σ′-type and that p(x̄) ⊇ p′(x̄) is a (possibly partial) atomic σ-type. Then for all sufficiently large n and all ā ∈ [n] which realize the identity fragment of p′(x̄) we have |P_n{A ∈ W_n : A |= p(ā)} − P(p(x̄) | p′(x̄)) · P′(p′(x̄))| < 5δ′(n).
Proof. Let ā ∈ [n] realize the identity fragment of p′(x̄). Let X_n be the set of all A ∈ W_n such that A |= p(ā). We have P_n(X_n) = P_n(X_n | W^{Y′_n}) P_n(W^{Y′_n}) + P_n(X_n | W_n \ W^{Y′_n}) P_n(W_n \ W^{Y′_n}). By the use of Lemma 4.5 and by part (2) of Assumption 4.10, we also have … It follows that P_n(X_n | W_n \ W^{Y′_n}) P_n(W_n \ W^{Y′_n}) ≤ δ′(n). By Lemma 4.5 and part (2) of Assumption 4.10, P_n(W^{Y′_n}) = P′_n(Y′_n) ≥ 1 − δ′(n). It now follows from Lemma 4.16 that P_n(X_n) differs from P(p(x̄) | p′(x̄)) · P′(p′(x̄)) by at most 5δ′(n) (for sufficiently large n).
With this definition we can reformulate Lemma 4.17 as follows: if p(x̄) is a (possibly partial) atomic σ-type such that p↾σ′ is a complete atomic σ′-type, then, for all sufficiently large n and all ā ∈ [n] which realize the identity fragment of p(x̄), we have … In Lemma 4.12 we defined the notation P(p(x̄) | p′(x̄)) when the atomic σ-type p has no more variables than the complete atomic σ′-type p′. From Definition 4.18 of P(p(x̄)) it follows that P(p(x̄) | p′(x̄)) = P(p(x̄))/P′(p′(x̄)). Now we extend this notation to pairs (p(x̄, ȳ), q(x̄)) where p(x̄, ȳ) is a complete atomic σ-type and q(x̄) = p↾x̄. Definition 4.22. Suppose that p(x̄, ȳ) is a complete atomic σ-type and let q(x̄) = p↾x̄. We define P(p(x̄, ȳ) | q(x̄)) = P(p(x̄, ȳ))/P(q(x̄)).
Proof. Using Definition 4.18 and Lemmas 4.20 and 4.21 we get … Lemma 4.24. Suppose that n is large enough that part (4) of Assumption 4.10 holds. Suppose that p(x̄, y) and q(x̄) are complete atomic σ-types such that |x̄y| ≤ k, dim_y(p) = 1 and q ⊆ p. Let γ = P(p(x̄, y) | q(x̄)) and A′ ∈ Y′_n. Then … is at least 1 − 2n^{|x̄|} e^{−c_{ε′} γ n}, where the constant c_{ε′} > 0 depends only on ε′.
The next lemma generalizes the previous one to types p(x̄, ȳ) where the length of ȳ is greater than one. Lemma 4.25. Suppose that n is large enough that part (4) of Assumption 4.10 holds. Suppose that p(x̄, ȳ) and q(x̄) are complete atomic σ-types such that |x̄ȳ| ≤ k, dim_ȳ(p) = |ȳ| and q ⊆ p. Let γ = P(p(x̄, ȳ) | q(x̄)) and A′ ∈ Y′_n. Then … is at least 1 − 2^{|ȳ|} n^{|x̄|+|ȳ|−1} e^{−c_{ε′} γ n}, where the constant c_{ε′} > 0 depends only on ε′.
The following corollary follows directly from the definition of Y_n and Lemma 4.25.
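The shape of the bounds in Lemmas 4.24 and 4.25 — a factor of at most n^{|x̄|} (or n^{|x̄|+|ȳ|−1}) from a union bound over tuples, times an exponentially small failure probability per tuple — is the standard Chernoff-plus-union-bound pattern for counts of realizations. The sketch below checks the standard multiplicative Chernoff bound (a known inequality, quoted here for illustration, not taken from the paper) against an exact binomial tail:

```python
from math import comb, exp

def binom_tail_outside(n, p, lo, hi):
    """P(X < lo or X > hi) for X ~ Bin(n, p), computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k < lo or k > hi)

# Standard multiplicative Chernoff bound:
#   P(|X - p*n| >= eps * p * n) <= 2 * exp(-eps**2 * p * n / 3)  (0 < eps <= 1)
n, p, eps = 500, 0.3, 0.2
mean = p * n
tail = binom_tail_outside(n, p, (1 - eps) * mean, (1 + eps) * mean)
bound = 2 * exp(-eps**2 * p * n / 3)
assert tail <= bound
# A union bound over the at most n**|x| choices of the tuple then gives
# the polynomial factor in front of the exponential term.
```

The exponent is linear in n with a constant depending only on eps (and the probability p plays the role of γ), matching the form e^{−c_{ε′}γn}.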
Lemma 4.28. There is a constant c > 0 such that for all sufficiently large n, P_n(Y_n) ≥ (1 − e^{−cn})(1 − δ′(n)).
Proof. There are, up to changing variables, only finitely many atomic σ-types p(x̄) such that |x̄| ≤ k. It follows from Lemma 4.25 that there is a constant c > 0 such that for all large enough n and all A′ ∈ Y′_n, … Then, reasoning similarly as in the proof of Lemma 4.16 (using (4.1)), we get … Using part (2) of Assumption 4.10 we now get … Definition 4.29. A real number is called critical if it is m-critical for some positive integer m. We say that a real number α is m-critical if at least one of the following holds: (a) There are a complete atomic σ-type q(x̄), distinct complete atomic σ-types p_1(x̄, ȳ), ..., p_l(x̄, ȳ) and a number 1 ≤ l′ ≤ l such that |x̄ȳ| ≤ m, q ⊆ p_i for all 1 ≤ i ≤ l, and … .
(b) α = l′/l, where 0 ≤ l′ ≤ l are integers and l is, for any choice of distinct variables x_1, ..., x_m, less than or equal to the number of pairs (p(x_1, ..., x_{m′}), q(x_1, ..., x_d)) where d < m′ ≤ m, p and q are complete atomic σ-types such that q ⊆ p and dim_{(x_{d+1}, ..., x_{m′})}(p) = 0.
From the definition it follows that (for every m ∈ N) there are only finitely many m-critical numbers. It also follows (from part (b)) that, for every m, 0 and 1 are m-critical.
In part (a) above we allow x̄ to be empty, in which case the type q(x̄) is omitted and P(p_i | q) is replaced by P(p_i).
(i) We call ϕ(x̄) noncritical if the following holds: If … is a subformula of ϕ(x̄) (where ψ, θ, ψ* and θ* denote formulas in CPL(σ), and z̄ and ȳ may have variables in common with x̄), then, for all l-critical numbers α and β, r ≠ α − β.
Since, for every l ∈ N, there are only finitely many l-critical numbers, it follows that for every noncritical ϕ(x̄) ∈ CPL(σ), if one chooses ε > 0 sufficiently small, then ϕ(x̄) is ε-noncritical. Definition 4.26 and Lemma 4.28 motivate the next definition.
It follows from Definition 4.31 and Lemma 4.27 that if p(x̄, ȳ) and q(x̄) are complete atomic σ-types such that |x̄ȳ| ≤ k, d = dim_ȳ(p) > 0, q ⊆ p, P(q) > 0, and γ = P(p | q), then for every n, every A ∈ Y_n is (p, q, γ/(1 + ε))-saturated and (p, q, γ(1 + ε))-unsaturated. By an analogous argument as in Remark 4.11 (iii), it now follows that if p(x̄) is a complete atomic σ-type such that |x̄| ≤ k and P(p) = 0, then for all sufficiently large n, p is not realized in any member of Y_n. In the proof of the proposition below we will sometimes abuse notation by treating an atomic type p(x̄) as the formula obtained by taking the conjunction of all formulas in p(x̄). So when writing, for example, '⋁_{i=1}^{m} ⋁_{j=1}^{m_i} p_{i,j}(x̄, y)' in the proof below, we view p_{i,j}(x̄, y) in this expression as the conjunction of all formulas in the complete atomic type p_{i,j}(x̄, y). Proposition 4.32. (Elimination of quantifiers) Suppose that ϕ(x̄) ∈ CPL(σ) is ε-noncritical and |x̄| + qr(ϕ) ≤ k. Then there is a quantifier-free formula ϕ*(x̄) such that for all sufficiently large n and every A ∈ Y_n, A |= ∀x̄(ϕ(x̄) ↔ ϕ*(x̄)).
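To see what Proposition 4.32 asserts in a familiar special case: for the network of Remark 3.18 with a single binary relation E and constant probability 1/2 (i.e. the Erdős–Rényi distribution G(n, 1/2)), the formula ∃y E(x, y) holds for every x with probability tending to 1, so it is almost surely equivalent to the quantifier-free tautology x = x. The following toy check is ours and is not an implementation of the paper's algorithm:

```python
import random

# Sample G(n, 1/2) and check whether  exists y E(x, y)  holds for EVERY x.
# P(some x has no neighbor) <= n * 2**-(n-1), astronomically small for n = 60,
# which is the "almost sure" part of the quantifier elimination.
def exists_neighbor_everywhere(n, rng):
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = rng.random() < 0.5
    return all(any(row) for row in adj)

rng = random.Random(1)
trials = 200
successes = sum(exists_neighbor_everywhere(60, rng) for _ in range(trials))
assert successes == trials
```

Evaluating the quantifier-free ϕ* on a tuple takes constant time, whereas evaluating ∃y E(x, y) by brute force scans the whole domain; this is the efficiency gain discussed in the abstract.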
Proof. Let an ε-noncritical ϕ(x̄) ∈ CPL(σ) be given with |x̄| + qr(ϕ) ≤ k. We will assume that x̄ is nonempty (i.e. that ϕ has free variables). In Remark 4.39 it is indicated which changes we need to make in the simpler case when ϕ has no free variable. The proof proceeds by induction on quantifier rank. Suppose that qr(ϕ) > 0, since otherwise we can just let ϕ* be ϕ and then we are done. If for all sufficiently large n, for all A ∈ Y_n and for all ā ∈ [n]^{|x̄|} we have A ⊭ ϕ(ā), then we can let ϕ*(x̄) be the formula x_1 ≠ x_1, and then A |= ∀x̄(ϕ(x̄) ↔ ϕ*(x̄)) for all sufficiently large n and all A ∈ Y_n. So from now on we assume that, for arbitrarily large n, there are A ∈ Y_n and ā such that A |= ϕ(ā).
Suppose that ϕ(x̄) is ∃y ψ(x̄, y) for some ψ(x̄, y). Then we have |x̄y| + qr(ψ) ≤ k and qr(ψ) < qr(ϕ), so, by the induction hypothesis, we may assume that ψ(x̄, y) is quantifier-free. By assumption there are n, A ∈ Y_n, ā and b such that A |= ψ(ā, b). Then there are m ≥ 1, different complete atomic σ-types q_i(x̄), i = 1, ..., m, and, for each i, m_i ≥ 1 and different complete atomic σ-types p_{i,j}(x̄, y), j = 1, ..., m_i, such that q_i ⊆ p_{i,j} for all j and ψ(x̄, y) is equivalent to ⋁_{i=1}^{m} ⋁_{j=1}^{m_i} p_{i,j}(x̄, y). If, for some i, P(q_i(x̄)) = 0, then q_i is not realized in any A ∈ Y_n (for large enough n) and can be removed. So we may assume that P(q_i) > 0 for all i. If, for some i and j, P(p_{i,j} | q_i) = 0, then P(p_{i,j}) = 0, so p_{i,j} is not realized in any A ∈ Y_n for large enough n. So we may also assume that P(p_{i,j} | q_i) > 0 for all i and j. If dim_y(p_{i,j}) = 1 then, by the definitions of Y_n and ε, it follows that for all sufficiently large n and all A ∈ Y_n, if A |= q_i(ā) then A |= ∃y p_{i,j}(ā, y). If dim_y(p_{i,j}) = 0 then, for all n and all A ∈ W_n, if A |= q_i(ā) then A |= p_{i,j}(ā, b) for some b ∈ rng(ā). It follows that for all sufficiently large n and all A ∈ Y_n, A |= ∀x̄(∃y ψ(x̄, y) ↔ ⋁_{i=1}^{m} q_i(x̄)). Now we consider the case when ϕ(x̄) has the form (4.3) or (4.4). Since the second case (4.4) is treated by straightforward variations of the arguments for taking care of the first case (4.3), we only consider the first case (4.3). Observe that |x̄ȳ| + qr(ψ) ≤ k (because qr(ϕ) = |ȳ| + max{qr(ψ), qr(θ), qr(ψ*), qr(θ*)}) and similarly for θ, ψ* and θ*. Since all the formulas ψ, θ, ψ* and θ* have smaller quantifier rank than ϕ, we may, by the induction hypothesis, assume that ψ(x̄, ȳ), θ(x̄, ȳ), ψ*(x̄, ȳ) and θ*(x̄, ȳ) are quantifier-free formulas.
If θ(x̄, ȳ) or θ*(x̄, ȳ) is unsatisfiable, then, by the given semantics, we have A ⊭ ϕ(ā) for every σ-structure A and every sequence of elements ā from the domain of A. In this case ϕ(x̄) is equivalent to any contradictory quantifier-free formula with free variables among x̄, for example the formula x_1 ≠ x_1. So from now on we assume that θ(x̄, ȳ) and θ*(x̄, ȳ) are satisfiable.
Until further notice, assume also that ψ(x̄, ȳ) ∧ θ(x̄, ȳ) and ψ*(x̄, ȳ) ∧ θ*(x̄, ȳ) are satisfiable. Then there are distinct complete atomic σ-types q_i(x̄), p_{i,j}(x̄, ȳ), for i = 1, ..., m and j = 1, ..., m_i, and distinct complete atomic σ-types t_i(x̄), s_{i,j}(x̄, ȳ), for i = 1, ..., l and j = 1, ..., l_i, such that the following conditions hold: … for all i = 1, ..., l and all j = 1, ..., l_i. From … it follows that m ≤ l and m_i ≤ l_i for all i ≤ m. Moreover, for every i ≤ m there is i′ such that q_i = t_{i′}, and for all i ≤ m and all j ≤ m_i there are i′, j′ such that p_{i,j} = s_{i′,j′}. Therefore we may assume in addition (by reordering if necessary) that … For the same reasons as in the previous case we may assume that all of P(q_i), P(p_{i,j}), P(t_i) and P(s_{i,j}) are positive for all i and j. Next we define … Now we can reason in exactly the same way with regard to the formulas ψ*(x̄, ȳ) and θ*(x̄, ȳ). So there are numbers m*, l*, m*_i and l*_i and complete atomic σ-types q*_i(x̄) for i = 1, ..., m*, p*_{i,j}(x̄, ȳ) for i ≤ m* and j = 1, ..., m*_i, t*_i(x̄) for i = 1, ..., l* and s*_{i,j}(x̄, ȳ) for i ≤ l* and j = 1, ..., l*_i, such that all that has been said about ψ, θ, q_i, p_{i,j}, t_i and s_{i,j} holds if these formulas and types are replaced by ψ*, θ*, q*_i, p*_{i,j} etcetera, and the numbers m, l, m_i, l_i are replaced by m*, l*, m*_i and l*_i. Moreover, we define numbers d*_{i,j}, e*_{i,j}, d*_i, e*_i, α*_{i,j}, α*_i, β*_{i,j}, β*_i and γ*_i in the same way as above, using the types q*_i, p*_{i,j}, t*_i and s*_{i,j} instead of q_i, p_{i,j}, t_i and s_{i,j}. So far we have assumed that ψ(x̄, ȳ) ∧ θ(x̄, ȳ) and ψ*(x̄, ȳ) ∧ θ*(x̄, ȳ) are satisfiable. If ψ(x̄, ȳ) ∧ θ(x̄, ȳ) is not satisfiable, then we let m = 0 and we view the disjunction ⋁_{i=1}^{m} ⋁_{j=1}^{m_i} p_{i,j}(x̄, ȳ) as "empty" and hence always false. In this case we always have i > m, so it follows that γ_i = 0 for all i = 1, ..., l.
Similar conventions apply if ψ*(x̄, ȳ) ∧ θ*(x̄, ȳ) is not satisfiable. With these conventions, the case when any one of the mentioned formulas is unsatisfiable is taken care of by the rest of the proof.
… p*_{i,j} and s*_{i,j}, respectively. Proof. We split the argument into cases corresponding to the first three cases of Definition 4.33. Let A ∈ Y_n.
First suppose that d_i = e_i > 0, and hence … If d_{i,j} = 0 then |p_{i,j}(ā, A)| = 1 and each member of the unique tuple realizing p_{i,j}(ā, ȳ) belongs to ā. It follows that … By similar reasoning (and since we assume d_i = e_i) we get … From (4.6), (4.7) and (4.8) we get (4.9) … Now suppose that d_i = e_i = 0. Then γ_i = m_i/l_i. Also, each p_{i,j}(ā, ȳ) and each s_{i,j}(ā, ȳ) has a unique realization in A. Since we assume that p_{i,j} ≠ p_{i,j′} if j ≠ j′ and s_{i,j} ≠ s_{i,j′} if j ≠ j′, we get …, and now the inequalities of (a) and (b) follow trivially. Next, suppose that d_i < e_i. Then γ_i = 0. By similar reasoning as before, … for sufficiently large n.
It follows that … .
Since e_i > 0 we can argue as we did to get (4.8), so we have … Depending on whether d_i > 0 or d_i = 0 we get, by arguing as in previous cases, … Since d_i < e_i we get, in either case, … where C > 0 is a constant that depends only on the types p_{i,j} and s_{i,j}. The proof of part (d) is, of course, exactly the same (besides the relevant replacements of symbols). Lemmas 4.37 and 4.38 below show that ϕ(x̄) is equivalent, in every A ∈ W_n for all large enough n, to a quantifier-free formula which depends only on ϕ(x̄) and the lifted Bayesian network G. As noted after Definition 4.29, 0 is a ζ-critical number for every ζ, so r > 0 (since ϕ is noncritical). Observe that it follows from Definitions 4.29 and 4.33 that γ_i and γ*_i are (|x̄| + qr(ϕ))-critical numbers for all i. Lemma 4.37. Suppose that I ≠ ∅. Then for all sufficiently large n, all A ∈ Y_n and all ā ∈ [n]^{|x̄|}, A |= r + ‖ψ(ā, ȳ) | θ(ā, ȳ)‖_ȳ ≥ ‖ψ*(ā, ȳ) | θ*(ā, ȳ)‖_ȳ if and only if A |= ⋀_{i∈I} t_i(ā).
Then we argue just as we did in the beginning of the proof of Lemma 4.37 to get (4.10) and find 1 ≤ i ≤ l and 1 ≤ i′ ≤ l* such that t_i = t*_{i′}. Since I = ∅ we must have i ∉ I and therefore r + γ_i < γ*_{i′}. Now we can continue to argue exactly as in the proof of Lemma 4.37 to get a contradiction in each one of the cases 1–4 in that proof. … r + ‖ψ(ȳ) | θ(ȳ)‖_ȳ ≥ ‖ψ*(ȳ) | θ*(ȳ)‖_ȳ, where we can assume that ψ, θ, ψ* and θ* are quantifier-free. Then there are distinct types p_i(ȳ), i = 1, ..., m, and distinct types s_i(ȳ), i = 1, ..., l. We can now define numbers γ and γ* similarly to how each γ_i (and γ*_i) was defined above. We now get an analogue of Lemma 4.34 which gives the same kind of upper and lower bounds of |⋃_{i=1}^{m} p_i(A)| / |⋃_{i=1}^{l} s_i(A)| in terms of γ. If r + γ ≥ γ*, then, by the noncriticality of (4.23), we get r + γ > γ*, and by the ε-noncriticality of the same formula we get r + γ/(1 + 2ε)^2 > γ*(1 + 2ε)^2. Now we can argue similarly as in the "converse direction" in the proof of Lemma 4.37 and conclude that (4.23) is true in all A ∈ Y_n for all sufficiently large n; hence (4.23) is equivalent to a tautology in all such A. Now suppose that r + γ < γ* and suppose, towards a contradiction, that there are arbitrarily large n and A ∈ Y_n in which (4.23) holds. Then we can argue as in the first part of the proof of Lemma 4.37 and get a contradiction. Hence, for all sufficiently large n, (4.23) is false in all A ∈ Y_n; consequently, (4.23) is equivalent to a contradiction in all such A. (The case when ϕ has the form ∃y ψ(ȳ) is easier and analogous to the argument in the beginning of the proof of Proposition 4.32, so this part is left to the reader.) Now the proof of Proposition 4.32 is completed. … (1) lim_{n→∞} δ(n) = 0.
Proof. Parts (1) and (2) … For which combinations of formal logical language and lifted graphical model do we get "almost sure elimination of quantifiers" and/or "logical limit laws"? Do we get more expressive formalisms by using aggregation functions than if we use aggregation rules, or vice versa? How do different combinations of formal language and graphical model relate to each other? In what sense is a combination (formal language 1, graphical model 1) "better" than a combination (formal language 2, graphical model 2)? What are reasonable candidates for the relation "A is better/stronger than B"? Some thoughts in this direction appear in the last part of [5]. One can consider conditional probabilities which are not constant, but depend on the size of the set of elements (or tuples) satisfying the condition in question. As a special case we have probabilities that depend on the size of the whole domain, as in previous work on logical zero-one laws in random graphs [24, 25]. What if the probability of a tuple ā satisfying a relation depends on whether another tuple b̄ satisfies the same relation (as in [19, 21], for example)?
A situation that seems natural in the context of artificial intelligence is to have an underlying fixed structure and, on top of it, relations that are "governed" by some probabilistic graphical model. The underlying fixed structure could be represented by a τ-structure A for some signature τ. For another signature σ (disjoint from τ) we could consider the set of expansions of A to (τ ∪ σ)-structures, where the probabilities of these expansions are governed by some probabilistic model and the underlying structure A. To formalize this using the setup of this article, one can modify W^∅_n in Definition 3.10 to contain exactly one τ-structure with domain [n], and W_n will be the set of all (τ ∪ σ)-structures that expand the unique structure in W^∅_n. The definition of the probability distribution P_n on W_n can now depend not only on the lifted Bayesian network G but also on the unique structure in W^∅_n. It seems obvious that, in order to obtain results similar to those in this article, one needs to assume some sort of uniformity regarding the unique structure in W^∅_n for cofinitely many n.