Towards the entropy-limit conjecture



Introduction
Inductive logic seeks to determine how much certainty to attach to a conclusion proposition ψ, given premiss propositions ϕ 1 , . . ., ϕ k to which attach measures of certainty X 1 , . . ., X k respectively. That is, the main task is to find Y such that

ϕ 1 X 1 , . . ., ϕ k X k |≈ ψ Y ,

where |≈ signifies the inductive entailment relation. Often, X 1 , . . ., X k , Y are probabilities or sets of probabilities. There are many possible semantics for inductive logic [21]. One key approach stems from the work of Carnap, who provided a continuum of inductive entailment relations [9-11, 37]. An alternative approach, which is the focus of this paper, is to apply the maximum entropy principle of Jaynes [24, 25]. According to this approach one should consider, from all the probability functions that satisfy the premisses, those with maximum entropy, and let Y be the set of probability values that these functions give to the conclusion ψ.
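On a finite propositional language the proposal can be computed directly. As a minimal illustrative sketch (the atoms, the premiss and the numbers here are our own, not taken from the text): with two atomic propositions a1, a2 and the single premiss a1 with measure 0.8, the maximum-entropy function spreads the mass 0.8 uniformly over the a1-states and 0.2 over the ¬a1-states, so the conclusion a2 receives Y = 0.5.

```python
import itertools
import math

def entropy(p):
    """Shannon entropy of a distribution over states, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)

# States of a propositional language with two atoms a1, a2.
states = list(itertools.product([True, False], repeat=2))

# Premiss: P(a1) = 0.8.  The maximum-entropy function spreads the mass
# uniformly within the a1-states and within the not-a1-states.
maxent = {s: (0.8 / 2 if s[0] else 0.2 / 2) for s in states}

# Another function satisfying the premiss, for comparison: lower entropy.
other = {(True, True): 0.5, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.1}

# The conclusion psi = a2 is assigned Y = P(a2).
Y = sum(p for s, p in maxent.items() if s[1])
```

Any other function satisfying the premiss, such as `other` above, has strictly lower entropy than `maxent`.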
If the underlying logical language is a finite propositional language, then this latter proposal is rather straightforward to implement and has many nice properties [38].¹ However, if the language is a first-order predicate language L with infinitely many constant symbols, certain intriguing questions arise. In particular, there are two main ways to implement the proposal in the predicate-language case, and it is not entirely clear whether the resulting inductive logics agree.
One approach, due to Barnett and Paris [7], proceeds as follows: (i) reinterpret the premisses as constraints on the probabilities of sentences of a finite predicate language L n that has n constant symbols; (ii) determine the function P n that maximises entropy on this finite language, subject to constraints imposed by the reinterpreted premisses; (iii) draw inductive inferences using the function P ∞ defined by P ∞ (θ) := lim n→∞ P n (θ) for any sentence θ of L. (The technical details will be explained below.) A second approach, explored by Williamson [47, 48], proceeds as follows: (i) consider probability functions defined on the language L as a whole; (ii) deem one probability function P to have greater entropy than another function Q if H n (P ), where H n is the entropy function on the finite sublanguage L n , dominates H n (Q) for sufficiently large n; (iii) draw inductive inferences using those functions P † , from all the probability functions on L that satisfy the premisses, that have maximal entropy (i.e., no other function satisfying the premisses has greater entropy). Again, see below for details.
The first approach, which we shall call the entropy-limit approach, has the advantage that it is more constructive, so it can be easier to calculate the probabilities required for inductive inference. The second approach, which we shall call the maximal-entropy approach, has the advantage that it yields determinate results in certain cases where the entropy-limit approach does not. This is because the entropy-limit approach faces what is known as the finite model problem: contingent premisses can become inconsistent when reinterpreted as applying to a finite domain.
These approaches to inductive logic would be strengthened if it could be shown that they give the same results where they are both applicable. Then one could use the maximal-entropy approach to provide a general semantics for inductive logic, but use the entropy-limit approach where a more constructive approach is helpful.
Following some results of Rafiee Rad [41], discussed in the next section, Williamson [48, p. 191] articulated the following conjecture:

Entropy-limit Conjecture. Where P ∞ exists and satisfies the constraints imposed by the premisses, it is the unique function with maximal entropy from all those that satisfy the premisses, i.e., P † = P ∞ .
If the entropy-limit conjecture is true, this would lend support to the claim that maximising entropy leads to a canonical inductive logic, a goal that has hitherto proved very elusive [48]. In this paper, we provide new evidence for the entropy-limit conjecture. We show that the entropy-limit conjecture is true for a single premiss that takes the form of a categorical Π 1 sentence ∀ xθ( x) where θ( x) is quantifier-free (Section 3); for various scenarios in which there are multiple non-categorical premisses, ϕ 1 X 1 , . . ., ϕ k X k , where X i is a probability or set of probabilities that attaches to the sentence ϕ i (Section 4); and for certain general cases in which convergence of the n-entropy maximiser P n to the entropy limit P ∞ is sufficiently fast (Section 5).
While the general status of the entropy-limit conjecture remains open, these new results verify important consequences of the conjecture. Thus, when taken together with previous results (outlined in Section 2), these new results provide inductive support for the general entropy-limit conjecture.
Normal models

While the quest for a viable inductive logic provides key motivation for this research, the results of this paper are also relevant to a rather different problem: the characterisation of the most normal model of a first-order theory. Consider a first-order language L. Let T be a finite consistent set of first-order axioms in L. There have been different approaches to defining the default or most normal model for T , depending on how one interprets the default model. One can, for example, consider the prime models (the smallest canonical models) as default models (see Chang and Keisler [14, p. 96] and Hodges [22, p. 336] for instance). Another approach would be to interpret normality in terms of closure properties and require default models to be, for example, existentially closed. Other approaches have considered the default model as the 'average' model and tried to characterise this in terms of the distribution of models (see for example [5, 6, 18-20]). Another way that this question can be interpreted was posed in [39], and studied further in [40, 42, 43], as: given a finite (consistent) set T of first-order axioms from a language L and a structure M with domain {t 1 , t 2 , . . .} over L, which we only know to be a model of T , what probability should we assign to a sentence θ(t 1 , . . ., t n ) being true in M?
Then any set of first-order axioms can be seen as imposing a probability function over the sentences of the language, in which the probability assigned to a sentence θ is interpreted as the probability that it will hold in a random model of T . The question is, how can a set of first-order axioms determine a probability function in the most natural way [44]?
The constraint that M is a model of T requires the probability assignment to give probability 1 to all sentences in T and consequently to all sentences logically implied by them. There is, however, a large set of probability functions that satisfy this constraint but differ on the probabilities that they assign to other sentences of L. One can further trim this set by imposing extra conditions on the way that these probabilities are to be assigned and, by doing so, specify what it means for M to be the default model.
One example is to make this assignment of probabilities in such a way as to capture the notion of averageness or typicality. In the literature, this is referred to as the limiting centre of mass assignment (see for example [34, 38, 39]). Another approach, followed in [40, 42, 43], and with which we will be concerned here, characterises a default model as being maximally uninformative with respect to the sentences of the language not implied by T . These maximally uninformative probability assignments are taken to be maximum entropy probability functions.
If the entropy-limit conjecture is true, this would lend support to the claim that maximising entropy leads to a canonical model characterisation for first-order theories. Thus, the results of this paper are relevant to the characterisation of normal models.

The formal framework
In this section we set out the rudiments of the formal framework and some notational conventions, and we survey previous work relevant to the entropy-limit conjecture.
The predicate languages

Throughout this paper we consider a first-order predicate language L, with countably many constant symbols t 1 , t 2 , . . . and finitely many relation symbols, U 1 , . . ., U n . The atomic sentences, i.e., sentences of the form U i (t j 1 , . . ., t j k ), where k is the arity of the relation U i , will be denoted by a 1 , a 2 , . . ., ordered in such a way that atomic sentences involving only constants among t 1 , . . ., t n occur before those atomic sentences that also involve t n+1 . We denote the sentences of L by SL and the set of quantifier-free sentences by QF SL.
We will also consider the finite sublanguages L n of L, where L n has only the first n constant symbols t 1 , . . ., t n but the same relation symbols as L. L n has finitely many atomic sentences a 1 , . . ., a r n . We call the state descriptions of L n (i.e., the sentences of the form ±a 1 ∧ · · · ∧ ±a r n ) n-states. We let Ω n be the set of n-states for each n. Note that |Ω n | = 2 r n , and every n-state ω n ∈ Ω n has |Ω n+1 |/|Ω n | = 2 r n+1 −r n many (n + 1)-states ω n+1 which extend it (i.e., ω n+1 ⊨ ω n ). We denote the sentences of L n by SL n .
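These counting facts can be checked mechanically. A small sketch (the arities chosen here are an arbitrary illustration, not an example from the text): r n is the number of atomic sentences of L n , so a language with one unary and one binary relation symbol has r 3 = 3 + 3² = 12, |Ω 3 | = 2¹², and each 3-state has 2^(r 4 − r 3) = 2⁸ extensions to 4-states.

```python
def r(n, arities):
    """Number of atomic sentences of L_n: one atom U_i(t_{j1}, ..., t_{jk})
    for each relation symbol U_i of arity k and each k-tuple of n constants."""
    return sum(n ** k for k in arities)

arities = [1, 2]                 # one unary and one binary relation symbol
num_states = 2 ** r(3, arities)  # |Omega_3| = 2^{r_3}
extensions = 2 ** (r(4, arities) - r(3, arities))  # 4-states extending a 3-state
```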
We use N ϕ (or, when ϕ is clear from the context, simply N ) to refer to the largest number n such that t n appears in ϕ ∈ SL.
For a sentence ϕ ∈ SL and fixed n ≥ N ϕ , we can reinterpret ϕ as a sentence of L n , by interpreting ∃xθ(x) as θ(t 1 ) ∨ · · · ∨ θ(t n ) and ∀xθ(x) as θ(t 1 ) ∧ · · · ∧ θ(t n ). We use the notation (ϕ) n , or if there is no ambiguity, simply ϕ n , to denote this reinterpretation of ϕ in L n . For any sentence ϕ we denote by [ϕ] n the set of n-states that satisfy ϕ. We denote the number of n-states in [ϕ] n by |[ϕ] n |.

Probability

A probability function P on L is a function P : SL −→ R ≥0 such that:

P1: If τ is a tautology, i.e., |= τ , then P (τ ) = 1.
P2: If θ and ϕ are mutually exclusive, i.e., |= ¬(θ ∧ ϕ), then P (θ ∨ ϕ) = P (θ) + P (ϕ).
P3: P (∃xθ(x)) = lim n→∞ P (θ(t 1 ) ∨ · · · ∨ θ(t n )).

A probability function on L n is defined similarly. We shall use the notation P and P n to denote the set of all probability functions on L and L n respectively. Conditional probability is defined here in terms of unconditional probabilities: P (θ|ϕ) := P (θ ∧ ϕ)/P (ϕ) if P (ϕ) > 0. The following result is central to probability as defined on a predicate language:

Theorem 3 (Gaifman's Theorem [17]). Every probability function is determined by the values it gives to the quantifier-free sentences.
Since the probability of a quantifier-free sentence ϕ is determined by the probabilities of the n-states, for any n ≥ N ϕ , every probability function is determined by the values it gives to the n-states.
Example 4 (Equivocator function). The equivocator function P = is defined by: P = (ω n ) := 1/|Ω n | = 2 −r n for each n and each ω n ∈ Ω n . The restriction of P = to L n is a probability function on L n , for any n. To simplify notation, we will use P = to refer to these restrictions, as well as to the function on L itself. In addition, we will say that a sentence θ has measure x if x is the probability that the equivocator function attaches to θ, P = (θ) = x.

Entropy
The n-entropy of a probability function P (which is defined on either L or L n ) is defined as:

H n (P ) := − ∑ ω n ∈Ω n P (ω n ) log P (ω n ).

We follow the usual conventions in taking 0 log 0 = 0 and the logarithm to be the natural logarithm.
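For concreteness, a small sketch of the n-entropy computation (the value r n = 4 is an arbitrary illustration of our own): the equivocator, being uniform over the 2^(r n) n-states, attains the maximal n-entropy r n log 2.

```python
import math

def n_entropy(p):
    """H_n(P) = -sum over n-states of P(omega_n) log P(omega_n), with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

r_n = 4                                   # number of atomic sentences of L_n
equivocator = [2 ** -r_n] * (2 ** r_n)    # uniform over the 2^{r_n} n-states
skewed = [0.5, 0.25, 0.25] + [0.0] * (2 ** r_n - 3)  # another distribution

H_max = n_entropy(equivocator)            # equals r_n * log 2
```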
We now turn to entailment relationships in inductive logic of the form

ϕ 1 X 1 , . . ., ϕ k X k |≈ ψ Y ,

where ϕ 1 , . . ., ϕ k , ψ ∈ SL and each X i is a member or subset of the unit interval. In the case in which X i = 1, the premiss ϕ i is certain, the superscript X i may be omitted, and ϕ i is called categorical.
We next introduce the two key approaches to making sense of such a relationship, the entropy-limit approach and the maximal-entropy approach.
The entropy-limit approach

Suppose X 1 , . . ., X k are probabilities or closed intervals of probabilities. Let N = max{N ϕ 1 , . . ., N ϕ k }, so that t N is the constant symbol, of all those occurring in ϕ 1 , . . ., ϕ k , with the largest index. For fixed n ≥ N , reinterpret ϕ 1 , . . ., ϕ k as statements of L n . Let E n be the set of probability functions on L n that satisfy (ϕ 1 ) n X 1 , . . ., (ϕ k ) n X k . If E n ≠ ∅, consider the n-entropy maximiser:

P n := arg max P ∈E n H n (P ).

Since X 1 , . . ., X k are probabilities or closed intervals of probabilities, E n is closed and convex and P n is uniquely determined. Several considerations point to P n as the most appropriate probability function for drawing inferences from premisses on L n [38]. When characterising normal models of a set of first-order axioms, i.e., when X 1 = · · · = X k = 1, P n can be regarded as the normal probabilistic characterisation of a random model of {ϕ 1 , . . ., ϕ k } with respect to L n , where normality is understood in terms of being minimally constrained [44]. However, the premisses are intended as statements on L, not L n , and the question arises as to what would be the most appropriate probability function for drawing inferences from these premisses when they are interpreted as statements about an infinite domain, or what the default characterisation of a random model of the premisses would be with respect to the full language L. If it exists, one can consider the function P ∞ defined on L as a pointwise limit of maximum entropy functions [7]:

P ∞ (θ) := lim n→∞ P n (θ).

The entropy-limit approach takes P ∞ for inference, attaching probability Y = P ∞ (ψ) to sentence ψ.
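The limiting behaviour of the P n can be observed on a toy premiss of our own choosing (a Σ 1 case of the kind already settled in the literature, used here purely for illustration): for the single categorical premiss ∃xUx on a language with one unary relation symbol, (∃xUx) n = Ut 1 ∨ · · · ∨ Ut n , so P n equivocates over the 2ⁿ − 1 n-states with at least one positive U-atom, giving P n (Ut 1 ) = 2^(n−1)/(2ⁿ − 1) → 1/2 = P ∞ (Ut 1 ).

```python
import itertools
from fractions import Fraction

def P_n_Ut1(n):
    """n-entropy maximiser for the premiss (Ex Ux)_n = Ut1 v ... v Utn:
    it equivocates over the 2^n - 1 n-states with at least one true U-atom."""
    sat = [s for s in itertools.product([True, False], repeat=n) if any(s)]
    return Fraction(sum(1 for s in sat if s[0]), len(sat))

values = [P_n_Ut1(n) for n in range(1, 12)]
# P_n(Ut1) = 2^{n-1} / (2^n - 1), which converges to 1/2 = P_infty(Ut1).
```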
There is one complication about the definition of P ∞ which we need to address. While Barnett and Paris [7] define P ∞ in terms of a pointwise limit where the limit is taken independently for each sentence of L, Paris and Rafiee Rad [40, 41, 43] define P ∞ in a slightly different way: take the pointwise limit on quantifier-free sentences and extend this to the (unique) probability function on L as a whole which agrees with the values obtained on the quantifier-free sentences, assuming that the pointwise limit exists and satisfies the axioms of probability on quantifier-free sentences of L [17]. The Rad-Paris definition circumvents a problem that can arise with the Barnett-Paris definition, namely that the pointwise limit on L as a whole may exist but may fail to be a probability function (see Appendix A.1 for a discussion of this point). Since the entropy-limit conjecture with respect to the Rad-Paris definition implies the entropy-limit conjecture with respect to the Barnett-Paris definition, we consider the Rad-Paris definition of P ∞ in this paper, with the aim of proving stronger results.
Note that P n and E n are defined on L n , not L. To simplify notation, when P is defined on L, we will say P ∈ E n to mean that the restriction of P to L n is in E n .

The maximal-entropy approach

This alternative approach avoids appealing to the finite sublanguages L n . Instead, consider E, the set of probability functions on L that satisfy the premisses ϕ 1 X 1 , . . ., ϕ k X k . For probability functions P and Q defined on L, P is deemed to have greater entropy than Q if it has greater n-entropy for sufficiently large n, i.e., if there is some natural number N such that for all n ≥ N , H n (P ) > H n (Q). Then we can consider the set of probability functions in E with maximal entropy:

maxent E := {P ∈ E : there is no Q ∈ E that has greater entropy than P }.
If maxent E ≠ ∅, one can draw inferences using the maximal entropy functions P † . Thus, the maximal-entropy approach attaches the set of probabilities Y = {P † (ψ) : P † ∈ maxent E} to ψ. Alternatively, if the premisses are categorical, one can take P † as the default probabilistic description of a random model of the premisses.
See [35] and [48,Chapter 9] for one kind of justification of this approach.
What is known so far

The entropy-limit conjecture says that if P ∞ exists and is in E, then maxent E = {P ∞ }. The majority of work in the literature concerning the conjecture deals with the special case of categorical premisses and concerns the probabilistic characterisation of models of a set of first-order axioms.
Barnett and Paris study monadic first-order languages and show that the entropy-limit approach is well defined, i.e., P ∞ exists, for a generalised set of linear constraints (i.e., categorical and non-categorical premisses) on such languages [7]. Rafiee Rad considers the special case of a set of first-order axioms on monadic languages, derives the exact form of P ∞ and shows that the entropy-limit conjecture holds for these languages; see [41, Theorem 29], and [44] for a more general case. Similarly, he derives the exact form of P ∞ and shows that the conjecture is true in the categorical Σ 1 case, i.e., the case in which the premiss propositions ϕ 1 , . . ., ϕ k are all Σ 1 statements [43, Corollary 1].
He also shows that there exist cases in which maxent E = ∅: for any probability function satisfying the premiss ∃x∀yRxy there is another probability function with greater entropy that also satisfies that premiss [42]. This involves considering sets of axioms with quantifier complexity Σ 2 . Landes shows that the entropy limit also fails to be well-defined for this and other premisses in Σ 2 [33]. Similarly, [41, Section 3.2] provides cases, with premisses of Π 2 quantifier complexity, in which P ∞ is not well defined. (It is not yet known whether or not the maximal-entropy approach can yield an answer for those cases.) See also [44] for a more general case. On the other hand, [41, §4.1] shows that there are cases in which P ∞ does not exist but P † does.
This leaves open the case concerning sets of axioms with quantifier complexity Π 1 , as well as non-categorical premisses for polyadic languages. Paris and Rafiee Rad investigate the existence of P ∞ for sets of Π 1 sentences and show that for a special case, which they call the slow Π 1 sentences, the entropy-limit approach is well defined [40].

Plan of the paper

In Section 3 we show that the entropy-limit conjecture holds in cases involving categorical premisses (i.e., premisses that take the form of sentences of the predicate language L without probabilities attached) of Π 1 quantifier complexity. In Section 4 we extend these cases to ones in which the premiss sentences do have probabilities attached. In Section 5 we provide a general result which shows that the entropy-limit conjecture holds in certain general cases in which the P n converge fast enough to P ∞ . We sum up in Section 6.

Summary of key notation
Key notation is summarised in Table 2. Note that we use χ to denote the set of constraints (premisses) currently under investigation. In cases where the constraints vary, we subscript P ∞ or P † with the constraint currently operating.

The categorical Π 1 constraint
In this section we show that the entropy-limit conjecture holds in the case in which there is a single categorical constraint ϕ which takes the form of a satisfiable Π 1 sentence ∀ xθ( x). As we shall now explain, this situation splits naturally into two cases: that in which ϕ has non-zero measure, P = (∀ xθ( x)) > 0, explored in §3.1, and that in which ϕ has zero measure, P = (∀ xθ( x)) = 0, explored in §3.2.
For all satisfiable ∃ xθ( x) ∈ Σ 1 it is known that, for N = N ∃ xθ( x) ([43, Theorem 4] and [43, Corollary 1]):

P ∞ ∃ xθ( x) = P = ( • | ⋁ {ω N ∈ Ω N : ω N ⊨ ∃ xθ( x)}).

Intuitively, since Σ 1 and Π 1 are natural duals, one suspects that something similar is true for all satisfiable ∀ xθ( x) ∈ Π 1 and N = N ∀ xθ( x) :

P ∞ ∀ xθ( x) = P = ( • | ⋁ {ω N ∈ Ω N : ω N ⊨ ∀ xθ( x)}).

Unfortunately, this intuition only gets us so far, because there are no ω N to include in the disjunction if and only if ∀ xθ( x) has measure zero (Proposition 9). In this case, the intuition breaks down because conditioning on a zero-probability sentence is not defined. On the other hand, the intuition is correct as long as at least one such ω N exists (Corollary 10). The set of such Π 1 -sentences is characterised in Proposition 9.
The case of satisfiable ∀ xθ( x) ∈ Π 1 with measure zero is much harder, since P = (•|∀ xθ( x)) is simply not defined. Nevertheless, in Theorem 15 we prove that the entropy-limit conjecture does also hold for all measure-zero ∀ xθ( x) ∈ Π 1 . The technical difficulty is precisely that of defining a 'conditional probability' conditional on a sentence that has measure zero.
The following proposition plays a key part in many of our later proofs.
Proposition 5. For all ∅ ≠ S ⊆ Ω n and all x > 0 it holds that

arg sup {−∑ ω∈S g(ω) log g(ω) : g : S → [0, x], ∑ ω∈S g(ω) = x} = x • arg sup {−∑ ω∈S P (ω) log P (ω) : P : S → [0, 1], ∑ ω∈S P (ω) = 1}.

This proposition implies that entropy maximisation over a subset S of n-states under a linear constraint can be achieved by maximising the n-entropy of probability functions assigning S joint probability one and pointwise re-scaling the maximal n-entropy function.
Proof. It suffices to note that the n-entropy of g over S is an affine-linear transformation of the n-entropy of the probability function g/x over S:

−∑ ω∈S g(ω) log g(ω) = x (−∑ ω∈S (g(ω)/x) log(g(ω)/x)) − x log x.

We use this proposition to show that distributing probability mass more uniformly increases n-entropy, in the following sense: if two probability functions P, Q ∈ P agree outside a non-empty subset S of n-states, and P distributes the mass assigned to S more uniformly over S than Q does, then P has greater n-entropy than Q.

The following lemma will be important in the remainder of this section. Recall that P = is used to refer to the equivocator function both on L and its finite sublanguages L n .

Lemma 6. Let γ ∈ SL and n ≥ N γ with [γ n ] n ≠ ∅. Then P n γ = P = (•|γ n ), i.e., P n (ψ) = P = (ψ|γ n ) for all ψ ∈ SL n .

Proof. By definition of P n , P n can only assign non-zero probability to those n-states which are in [γ n ] n . Since entropy is maximal if the probabilities are uniform (Proposition 5), it follows that P n has to assign equal probability to all n-states in [γ n ] n . By assumption we have that [γ n ] n ≠ ∅, and thus all n-states in [γ n ] n are assigned the same probability by P n and these probabilities sum to one. Hence, P n (ψ) = P = (ψ|γ n ) for all n ∈ N and all ψ ∈ SL n .
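The content of the lemma can be spot-checked numerically. A sketch with a toy sentence of our own (γ = Ut 1 ∨ Ut 2 on a language with one unary relation symbol): conditioning the equivocator on γ n yields the uniform distribution over [γ n ] n , whose n-entropy exceeds that of any other assignment concentrated on [γ n ] n .

```python
import itertools
import math
import random

def entropy(p):
    """Shannon entropy over a list of probabilities, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

n = 3
omega_n = list(itertools.product([True, False], repeat=n))  # 3-states, one unary U
gamma_n = [s for s in omega_n if s[0] or s[1]]              # [gamma_3]_3 for gamma = Ut1 v Ut2

# Equivocator conditioned on gamma_n: uniform over the satisfying n-states.
conditional = [1 / len(gamma_n)] * len(gamma_n)

# A non-uniform alternative concentrated on [gamma_n]_n has lower n-entropy.
random.seed(0)
weights = [random.random() + 0.01 for _ in gamma_n]
alternative = [w / sum(weights) for w in weights]
```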
First we consider the case in which ∀ xθ( x) has positive measure.

Non-zero measure, P = (∀ xθ( x)) > 0
Using Lemma 6 it is easy to see that for all sentences γ ∈ SL with positive measure, if lim n→∞ P = (χ|γ n ) exists for all χ ∈ QF SL, then

P ∞ γ (χ) = lim n→∞ P = (χ|γ n ).    (2)

In other words, P ∞ is obtained by considering the limit of equivocators conditionalised on the premiss reinterpreted on L n . The above lemma tells us what these probability functions, the P n , look like.
Assume that P = (γ) > 0, and thus for large enough n, P = (γ n ) > 0 and P = (•|γ n ) is well-defined. By Remark 7, lim n→∞ P = (χ|γ n ) = P ∞ γ (χ). Taking the limit of (3) we obtain the required identity, where the last equality follows from Remark 7, if the limit exists. But notice that lim n→∞ P (γ n ) and P (γ) can come apart in general, as the following example shows. Let δ be the sentence expressing that U 1 is irreflexive and transitive and that every element is below some other element, ∀x∃y U 1 xy. It is easy to check that open orders satisfy δ. On the other hand, δ does not have a finite model: for every element a in the support of a model of δ there has to exist some other element b which is greater than a, U 1 ab. Note that a ≠ b (U 1 is irreflexive). If U 1 were cyclic, then transitivity would entail U 1 dd for some d, which contradicts δ. Hence, for every element in a finite model there must be some other element which is greater: contradiction. So, δ only has infinite models. It holds for all probability functions P ∈ P that P (δ n ) = 0, since there are no finite models of δ. Since δ is satisfiable, there exist probability functions P ∈ P such that P (δ) > 0 ([37, p. 189]). And thus P (δ n ) = 0 < P (δ) for such a probability function P .
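The claim that δ has only infinite models can be checked by brute force on small domains. A sketch, assuming (as the surrounding argument indicates) that δ consists of irreflexivity, transitivity and seriality (∀x∃y U 1 xy) for the binary relation U 1 :

```python
import itertools

def models_delta(n, R):
    """Does the binary relation R on {0, ..., n-1} satisfy delta:
    irreflexivity, transitivity and seriality (Ax Ey U1xy)?"""
    irreflexive = all((a, a) not in R for a in range(n))
    transitive = all((a, d) in R
                     for (a, b) in R for (c, d) in R if b == c)
    serial = all(any((a, b) in R for b in range(n)) for a in range(n))
    return irreflexive and transitive and serial

def has_model_of_size(n):
    """Search every binary relation on a domain of size n for a model of delta."""
    pairs = list(itertools.product(range(n), repeat=2))
    return any(models_delta(n, set(R))
               for k in range(len(pairs) + 1)
               for R in itertools.combinations(pairs, k))

# No domain of size 1, 2 or 3 carries a model of delta.
finite_models_found = [has_model_of_size(n) for n in range(1, 4)]
```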
Thus, the limit exists, as the right-hand side is well defined. Then (2) holds for all quantifier-free sentences and thus, by Gaifman's Theorem, it holds for all sentences in SL.
Next assume that P = (γ) = 0. Taking the limit of (3) we obtain that the limit on the right-hand side exists and is equal to P = . Now conclude as above that P = = P ∞ ¬γ .
While Theorem 8 is informative about the entropy limit, it leaves open certain questions. When is it the case that ∀ xθ( x) ∈ Π 1 has positive measure? What exactly does the entropy limit look like? And does the entropy-limit conjecture hold in that case? We shall address these questions in turn.
Proof.To simplify notation we let N := N ϕ .
2 ⇒ 1: If ϕ ∈ Π 1 is of this form, then the claim follows directly. 1 ⇒ 2: We show that the negation of 2 entails the negation of 1, P = (ϕ) = 0. We now assume that ϕ ∈ Π 1 is not of this form, and denote by ∅ ⊆ I* ⊆ I those indices for which every λ ij i contains a variable. We consider two cases. First suppose that I* is not empty and let i ∈ I*. Note that P = (ϕ) ≤ P = (∀ x ⋁ j∈J i λ ij ( x, t)), so it suffices to show that the latter is zero. Let us suppose for the moment that ϕ contains only a single variable, say x. Let n > N and let ω n be an n-state which satisfies (∀x ⋁ j∈J i λ ij (x, t)) n . We now count the number of (n + 1)-states which extend ω n and satisfy (∀x ⋁ j∈J i λ ij (x, t)) n+1 . Notice that ω n has 2 r n+1 −r n extensions to L n+1 ; those that satisfy (∀x ⋁ j∈J i λ ij (x, t)) n+1 are precisely those (n + 1)-states which also satisfy ⋁ j∈J i λ ij (t n+1 , t). The proportion of such extensions tends to zero. If ϕ contains two or more variables, then the () n -operation leads to more conjunctions than in the single-variable case. Hence, when counting the (n + 1)-states which satisfy (∀ x ⋁ j∈J i λ ij ( x, t)) n+1 there is an even greater number which we subtract from 2 |J i | . The limit is hence equal to zero, too, and so P = (ϕ) = 0. For the second case, suppose that I* is empty. Then for every i ∈ I there exists at least one λ ij i which does not contain a variable. Furthermore, since we are assuming that ϕ is not of the form given in 2, ω n cannot satisfy ϕ n by only satisfying a variable-free literal from each conjunct (since, as mentioned above, they are jointly inconsistent). Then there has to exist an i 0 ∈ I such that ω n is inconsistent with all variable-free literals in ⋁ j i 0 λ i 0 j i 0 ( t). Thus, ω n and all its extensions ω m which satisfy ϕ must satisfy literals in ⋁ j i 0 λ i 0 j i 0 which contain a variable. For the purposes of counting extensions, we might as well ignore the variable-free literals of ⋁ j i 0 ∈J i 0 λ i 0 j i 0 . We may now proceed as if I* were not empty.
3 ⇒ 1: Since ⋀ i∈I λ ij* i ( t) is consistent, this easily follows. One direction of the remaining equivalence is trivially true. For the direction from right to left it suffices to notice that, by 2, the conjunction ⋀ i∈I λ ij* i ( t) is in QF SL N , since each conjunct only involves constants and N is the largest constant appearing in θ( x), and that it is consistent by assumption, since we have shown that 3 implies 1 and 1 implies 2. Hence, the conjunction is entailed by some N-state ω N ∈ Ω N . Then ω N entails the logically weaker ⋀ i∈I λ ij* i ( t) ∨ ϕ. 4 ⇒ 1: P = (ϕ) ≥ P = (ω N ) > 0, where the strict inequality follows from the definition of P = .
Hence, P = ∈ E n for all n ≥ N , and so P n ϕ = P = for all n > N , and thus P ∞ ϕ = P = . If P = (ϕ) < 1 then, first, by [43, Theorem 4] and [37, Lemma 3.8], the limit is well-defined and in [0, 1). Thus we can use Theorem 8, and next apply [43, Lemma 5]. Inserting this back into (4), applying the Theorem of Total Probability to P = (•) on the right-hand side and simplifying the equation, we obtain the required form of P ∞ ϕ .

Lemma 11. For all ϕ = ∀ xθ( x) ∈ Π 1 and all ω N ∈ [ϕ N ] N such that ω N ⊭ ϕ, the proportion of extensions ω n of ω N which satisfy ϕ n converges exponentially (or faster) to zero in n > N .
This lemma says that N-states compatible with the constraint on L N which do not entail ϕ have only very few extensions which satisfy the constraints on more expressive languages.
Proof. As above, write ϕ in conjunctive normal form, say ⋀ i∈I ⋁ j i ∈J i λ ij i ( x, t), with the standard convention that λ ij i ( x, t) mentions at most the variables in x and at most the constant symbols in t, but not necessarily all of them.
First notice that for ω N ∈ [ϕ N ] N such that ω N ⊭ ϕ, there has to exist at least one disjunction, say ⋁ j i ∈J i λ ij i ( x, t), such that ω N fails to entail all literals in it that do not mention a variable. To see this, notice that if ω N entailed one such literal in every disjunction then ω N would entail the whole disjunction and thus ϕ, contradicting our assumption. Notice that if a literal does mention a variable then ω N cannot entail it, since ω N only mentions the constants t 1 , . . ., t N but no variable. We now let Δ( x, t) denote such a disjunction. Next notice that this also holds for any ω n extending ω N ; that is, every ω n which extends ω N fails to satisfy all literals in Δ( x, t) which do not mention a variable. To see this, notice that all literals in Δ( x, t) only mention constants t 1 , . . ., t N , and since ω n agrees with ω N on L N , if it satisfied any such literal, that literal would be satisfied by ω N , which cannot be the case, as just discussed. Now consider an n-state ω n ∈ [ϕ n ] n , where ϕ n is logically equivalent to (⋀ i∈I ⋁ j i ∈J i λ ij i ( x, t)) n . Since ω n does not satisfy a single variable-free literal of Δ( x, t), ω n must satisfy the interpretation in L n of at least one of its literals, say λ ij i ( x, t), mentioning a variable. In λ ij i ( x, t) a variable has been replaced by a constant by the () n -operation.
Let us consider ω n+1 that extends ω n with ω n+1 ⊨ (∀ xΔ( x, t)) n+1 . Since ω n mentions all constants in Δ (n ≥ N ), its extension ω n+1 cannot satisfy a literal in (Δ( x, t)) n+1 in which no variable has been replaced by a constant. This is so because, by the discussion above, ω n does not satisfy any such literal in Δ( x, t). But since ω n+1 ⊨ (∀ xΔ( x, t)) n+1 , it has to satisfy, in L n+1 , the interpretation of one literal in Δ( x, t) with variables. That is, it has to satisfy some literal, with variables, of Δ( x, t) in which the variable is replaced by a constant.
Since ϕ ∈ Π 1 , for all variables there is at least one ∀-quantifier in front of Δ( x, t) which binds them. Since ω n+1 ⊨ (∀ xΔ( x, t)) n+1 , we have ω n+1 ⊨ Δ(t n+1 , . . ., t n+1 , t) (we have instantiated all the universally quantified variables with t n+1 ). Let d be the maximal number of literals in any disjunction in the CNF of ϕ. In particular, ⋁ j i ¬λ ij i (t n+1 , . . ., t n+1 , t) has no more than d literals, which is independent of n. It follows that the proportion of extensions satisfying the constraint shrinks, at every step, by a factor bounded away from one, and thus decays exponentially for all n > N .

Theorem 12. For all ϕ = ∀ xθ( x) ∈ Π 1 with positive measure,

P † ϕ = P ∞ ϕ = P = ( • | ⋁ {ω N ∈ Ω N : ω N ⊨ ϕ}),

where N = N ϕ , i.e., the maximum n such that t n appears in ϕ.
Proof. If ϕ is a tautology, then the claim is immediate. Otherwise, let ⋀ i∈I ⋁ j i ∈J i λ ij i ( x, t) be the conjunctive normal form of ϕ. Then from Proposition 9, every ω N ∈ Ω N such that ω N ⊨ ϕ entails at least one λ ij* i ( t) (i.e., a literal with no variable) for each i. Hence, every extension ω n of ω N also entails ϕ and thus ϕ n . To see this, notice that such literals only mention constants and are thus quantifier-free sentences of L N , and every extension of ω N agrees with ω N on L N . Also note that if ω N ∉ [ϕ N ] N then for all n > N and all its extensions ω n it holds that ω n ∉ [ϕ n ] n , and so we find for all n ≥ N that P n ϕ (ω N ) = 0 = P n ϕ (ω n ). Now consider an N-state ω N ∈ [ϕ N ] N which does not entail ϕ. By Lemma 11, the ratio of its extensions that satisfy ϕ n decreases at least exponentially quickly in n. Since P n equivocates on those n-states which are models of ϕ n (Lemma 6), it follows that the probability which P n ϕ assigns to such an ω N vanishes in the limit. To complete the proof we need to show that P ∞ ϕ = P † ϕ . To show this we show that P ∞ ϕ defined above has greater entropy than every probability function P ∈ E, in the sense required by the maximal-entropy approach.
Use S to denote the set of N-states which entail ϕ. First notice that P ∞ ϕ defined above has greater entropy than any probability function P which assigns probability one jointly to the N-states in S. To see this, notice that for each n > N , P assigns non-zero probability only to extensions of state descriptions in S, but so does P ∞ ϕ , and P ∞ ϕ does so in a completely equivocal way, dividing the probability equally between them. So P ∞ ϕ has strictly greater n-entropy than P for all n such that P and P ∞ ϕ disagree on L n .
Next we calculate the n-entropy of P ∞ ϕ , where ω n N denotes the restriction of ω n to the first N constants.
If P is a probability function in E which assigns joint probability 1 − k < 1 to the N-states in S, then it must assign joint probability k > 0 to the N-states not in S. To maximise n-entropy, P equivocates, as much as possible, on the n-states extending those N-states not in S. To calculate this we first notice a couple of things. First, each ω N ∈ Ω N has the same number of extensions to L n for n > N , and the probability mass 1 − k is divided equally between them to maximise n-entropy, assigning each such ω n an equal measure. Second, let M be the number of n-states ω n with ω n ⊨ ϕ n which extend N-states not in S; these are jointly assigned probability mass k, so the entropy on this set is maximal if this probability is divided equally between them. By Lemma 11, for large n, M is vanishingly small relative to the number of n-states extending S. Thus, for large n, P ∞ ϕ has greater entropy than P . And so, P ∞ ϕ has greater entropy than all other P ∈ E. Hence, we have P † ϕ = P ∞ ϕ .
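The exponential decay invoked from Lemma 11 can be observed concretely. A sketch with a hypothetical premiss of our own choosing, ϕ = ∀x(¬Ut 1 ∨ Ux), which has positive measure (the 1-state ¬Ut 1 entails ϕ): the 1-state Ut 1 lies in [ϕ 1 ] 1 , since ϕ 1 is a tautology, yet does not entail ϕ, and among its extensions to L n only the one making every Ut i true satisfies ϕ n , a proportion of 2^−(n−1).

```python
import itertools

def ratio(n):
    """Fraction of the extensions of the 1-state Ut1 to L_n that satisfy
    phi_n = conjunction over x in {t1,...,tn} of (-Ut1 v Ux)."""
    exts = [(True,) + s for s in itertools.product([True, False], repeat=n - 1)]
    sat = [s for s in exts if (not s[0]) or all(s)]
    return len(sat) / len(exts)

ratios = [ratio(n) for n in range(2, 8)]
# 1/2, 1/4, 1/8, ...: the proportion halves with each additional constant.
```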

Zero measure, P = (∀ xθ( x)) = 0
If ϕ = ∀ xθ( x) has measure zero, then another strategy is required, as explained at the start of the section. We can, however, solve one case easily (Proposition 13). We then show why this solution strategy does not work in the general measure-zero case.

Proposition 13. For all conjunctions of literals θ( x, t) it holds for consistent
where all λ i are literals. Note that for all ω n ∈ Ω n it holds that ω n ∈ [ϕ n ] n if and only if ϕ n is a sub-formula of ω n . Hence, all ω n ∈ [ϕ n ] n have equally many (n + k)-states extending them which are in [ϕ n+k ] n+k . Since the entropy maximisers on finite languages L m assign probability zero to all those states which do not satisfy ϕ m , for large enough m all probability mass is assigned to those states in [ϕ n ] n .
First, note that P = (•|ϕ n ) has maximum n-entropy among all probability functions with P (ϕ n ) = 1. Second, observe that P = (•|ϕ n+1 ) agrees with P = (•|ϕ n ) on Ω n , since θ( x) is a contingent conjunction of literals. To see this, notice that P = (•|ϕ n+1 ) divides the probability mass equally between the (n + 1)-states that satisfy ϕ n+1 ; but these are all extensions of n-states that satisfy ϕ n , and all of these have an equal number of extensions to (n + 1)-states that satisfy ϕ n+1 .
We hence find for all m ≥ N and all m-states ω m that P ∞ ϕ (ω m ) = lim n→∞ P n ϕ (ω m ) = lim n→∞ P = (ω m |ϕ n ) = P = (ω m |ϕ m ). This shows that P ∞ ϕ is well-defined on all m-states for all m > N , and that it satisfies P1 and P2 since it is a limit of probability functions. Thus, by Gaifman's Theorem, it can be uniquely extended to L.
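The behaviour established here can be checked by brute force in a small case. The sketch below (Python; the two-predicate language and the premiss θ(x) = U 1 x are our own illustrative choices, not taken from the text) conditions the equivocator on ϕ n and confirms that the conditional probability of a fixed m-state does not depend on n, as the proof requires:

```python
from itertools import product
from fractions import Fraction

def n_states(n):
    # an n-state assigns truth values to U1(t_i), U2(t_i) for each of n constants
    return list(product(product([0, 1], repeat=2), repeat=n))

def sat_phi(state):
    # phi_n = U1(t_1) & ... & U1(t_n): theta is the single literal U1(x)
    return all(u1 for (u1, _) in state)

def cond_prob(omega_m, n):
    # P=( omega_m | phi_n ): equivocator conditioned on phi_n, marginalised to L_m
    models = [s for s in n_states(n) if sat_phi(s)]
    m = len(omega_m)
    extending = [s for s in models if s[:m] == omega_m]
    return Fraction(len(extending), len(models))

omega_2 = ((1, 0), (1, 1))          # a 2-state satisfying phi_2
# the conditional probability is 1/4 no matter how far we extend the language:
assert all(cond_prob(omega_2, n) == Fraction(1, 4) for n in range(2, 6))
```

Since conditioning on ϕ n+1 and restricting to Ω n gives the same values as conditioning on ϕ n , the pointwise limit defining P ∞ ϕ stabilises immediately in this case.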
As soon as there are disjunctions in ∀ xθ( x) ∈ Π 1 , matters are more involved, because different disjuncts in θ( x) may have different consequences: Example 14. For ϕ = ∀xy((U 2 t 1 ∧ U 1 x) ∨ ¬U 2 y) ∈ Π 1 and the language containing only the two unary relation symbols U 1 , U 2 , there are two sorts of n-states which entail ϕ n : those which entail U 2 t 1 ∧ ⋀ n i=1 U 1 t i and those which entail ⋀ n i=1 ¬U 2 t i . No state can entail both sentences. At every level n ≥ 2, exactly half of the extensions of an n-state satisfying ϕ n satisfy ϕ n+1 . Furthermore, at every level there are twice as many n-states which entail the second sentence as the first. The two disjuncts (U 2 t 1 ∧ U 1 x) and ¬U 2 y are thus treated differently in the entropy-maximising process.
Proof. Let ϕ be ∀ xθ( x), N = N ϕ and t > N . First observe that, using Lemma 6, for all t ≥ N and all t-states ω t ∈ Ω t , P n ϕ (ω t ) is given by the ratio of the number of extensions of ω t in [ϕ n ] n to |[ϕ n ] n |. Since P ∞ exists by assumption, the limit of this ratio is well-defined, i.e., it takes a definite value for all ω t ∈ Ω t . This defines P ∞ on all t-states, for all t ≥ N , and so P ∞ is uniquely determined. Since P ∞ is a unique probability function, P2 must hold, in particular, for all t-states ω t ∈ Ω t . Next, we show that P ∞ = P † . To do so, we show that P ∞ has greater entropy than every other probability function in E. So let Q ∈ E \ {P ∞ }. If t ≥ N , then every t-state ω t that is inconsistent with ϕ t is inconsistent with ϕ; hence Q(ω t ) = 0. We consider two cases: first we look at those Q that assign non-zero probability to some t-state (t > N ) with vanishingly few extensions that satisfy ϕ n for large n, and then those Q that assign zero probability to all such t-states.
Case 1. Suppose that there is some t-state ω t that satisfies ϕ t with Q(ω t ) > 0, such that there exists some other t-state ν t satisfying ϕ t with many more extensions compatible with ϕ than ω t , in the sense that the limiting ratio of their numbers of extensions is zero. Suppose that t ≥ N is minimal with this property. Now define a probability function P ∈ E which agrees with Q everywhere except on ω t , ν t and (at least some of) their extensions. Let P (ω t ) := 0 and P (ν t ) := Q(ν t ) + Q(ω t ) > Q(ν t ). Note that this forces P (ω n ) = 0 for all extensions ω n of ω t . For the extensions ν n of ν t we define a real number α > 0 as the unique solution of α•P ∞ (ν t ) = Q(ω t ) + Q(ν t ), and simply put P (ν n ) := α•P ∞ (ν n ) for all extensions ν n of ν t . We need to show that this is a probability function; for this it is enough to observe that the probability mass assigned to ν n is preserved, for all k ≥ 0, by its extensions ν n+k that satisfy ϕ n+k . Next, note that, by Proposition 5, since Q and P only disagree on ω t , ν t and (at least some of) their extensions, it follows that H n (Q) < H n (P ) for all large enough n ≥ t. This entails that Q ∉ maxent E.

Case 2. Consider a Q ∈ E which assigns zero probability to all ω t which have vanishingly few extensions satisfying ϕ n for large n. Suppose furthermore that Q does not always assign probabilities according to the asymptotic ratios of the numbers of extensions, i.e., there exist a minimal t ≥ N and two t-states ω t , ν t ∈ [ϕ t ] t with P ∞ (ω t ), P ∞ (ν t ) > 0 whose probabilities under Q deviate from these ratios. Define a function P ∈ E which agrees with Q except on ω t , ν t and (at least some of) their extensions, assigning the same joint probability mass to ω t , ν t and their extensions as Q does, but adhering to the same ratios as P ∞ . By Proposition 5, since Q and P only disagree on ω t , ν t and (at least some of) their extensions, it follows that H n (Q) < H n (P ) for all large enough n ≥ t. This entails that Q ∉ maxent E.
This means that we can always improve in the entropy ordering by assigning probability zero to t-states with vanishingly few extensions compatible with ϕ (Case 1), and that we can also improve in the entropy ordering by assigning probabilities according to the same ratios as P ∞ (Case 2). There is, however, only one probability function that satisfies both these conditions, namely P ∞ . Hence, P ∞ has greater entropy than every other function Q ∈ E \ {P ∞ }, and so P † = P ∞ .

This proof leaves two open questions. (1) What is the concrete form of P ∞ , assuming it exists for ϕ ∈ Π 1 ? (2) Does the existence of a unique maximal entropy function, maxent E = {P † }, entail that the entropy limit exists and that they are equal, P † = P ∞ , for all ϕ ∈ Π 1 ?
While we do not know the answers to these questions, we do know that there are premiss sentences ϕ = ∀ xθ( x) ∈ Π 1 for which Theorem 15 holds nontrivially.
Furthermore, for all consistent premiss sentences ϕ = ∀ xθ( x) ∈ Π 1 in which θ( x) is a conjunction of literals, the entropy limit is well-defined (Proposition 13) and thus the entropy-limit conjecture holds nontrivially for all these sentences too.

Non-categorical premisses and Jeffrey updating
In Section 3, we saw that entropy maximisation on predicate languages for categorical Π 1 (and also Σ 1 ) premisses amounts to updating the equivocator (the prior probability function representing a state of maximal uncertainty, in which no evidence at all is available) by conditionalisation. This mirrors the finite case, in which entropy maximisation agrees with conditionalisation for categorical evidence [46]. We now turn to non-categorical premisses of the form ϕ X , X ∈ (0, 1), and show that, for Π 1 and for Σ 1 premiss propositions, the entropy-limit conjecture holds and entropy maximisation amounts to Jeffrey updating (Theorem 22). Again, this mirrors the finite case, in which entropy maximisation agrees with Jeffrey updating for non-categorical premisses of the form ϕ X [46]. Our result is also in line with the literature showing that MaxEnt updating agrees with Jeffrey updating on infinite domains [13].
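In the finite setting that this mirrors, Jeffrey updating on the partition {ϕ, ¬ϕ} has a simple closed form: P ′ (ω) = X • P (ω|ϕ) + (1 − X) • P (ω|¬ϕ). A minimal sketch (Python; the four-world toy language and the choice ϕ = Ut 1 are illustrative assumptions, not from the text):

```python
from fractions import Fraction

def jeffrey_update(P, phi_worlds, X, worlds):
    # Jeffrey updating on the partition {phi, not-phi}:
    # P'(w) = X * P(w|phi) + (1 - X) * P(w|not-phi)
    p_phi = sum(P[w] for w in phi_worlds)
    P_new = {}
    for w in worlds:
        if w in phi_worlds:
            P_new[w] = X * P[w] / p_phi
        else:
            P_new[w] = (1 - X) * P[w] / (1 - p_phi)
    return P_new

# the 2-states of a language with one unary predicate U and constants t1, t2
worlds = [(a, b) for a in (0, 1) for b in (0, 1)]
P = {w: Fraction(1, 4) for w in worlds}          # the equivocator
phi_worlds = {w for w in worlds if w[0] == 1}    # models of U(t_1)
X = Fraction(3, 5)
Q = jeffrey_update(P, phi_worlds, X, worlds)
assert sum(Q[w] for w in phi_worlds) == X        # the premiss phi^X is satisfied
assert sum(Q.values()) == 1
```

Note that updating with X equal to the prior probability of ϕ leaves the function unchanged, as one would expect of a conservative updating rule.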

Point probabilities
While Π k and Σ k categorical constraints require different approaches to maximise entropy, this is no longer so for non-categorical constraints: every non-categorical Π k constraint is equivalent to a non-categorical Σ k constraint, since χ = {ϕ X } is equivalent to {¬ϕ 1−X }.

Lemma 19. For all contingent ϕ ∈ SL such that P ∞ ϕ and P ∞ ¬ϕ both exist, and all X ∈ (0, 1), it holds that

So, if the entropy limit exists for both categorical premisses ϕ, ¬ϕ, then the entropy limit for the non-categorical premiss(es) ϕ X (and ¬ϕ 1−X ) exists and is obtained by a weighted average inspired by Jeffrey updating. If P ∞ ¬ϕ = P = (•|¬ϕ) and P ∞ ϕ = P = (•|ϕ), then the entropy limit for the non-categorical premiss(es) ϕ X (and ¬ϕ 1−X ) is indeed given by Jeffrey updating of the equivocator. In many cases where the premisses only involve one type of quantifier (either ∀ or ∃ but not both), this is indeed the case, as we showed in Section 3.
Proof. First, note that E = {P ∈ P : P (ϕ) = X and P (¬ϕ) = 1 − X}. Next, we observe that, for all n ∈ N, X • P n ϕ maximises entropy over the set of functions assigning mass X to [ϕ n ] n , and (1 − X) • P n ¬ϕ maximises entropy over the corresponding set for [¬ϕ n ] n . We see this by recalling Proposition 5, which applies to all functions f : {1, . . ., N } → R ≥0 . Finally, observe that the objective function is additive with respect to n-states in the following sense: if there exist two sets of n-states (here: [ϕ n ] n and [¬ϕ n ] n ) such that every constraint applies to exactly one of these sets, then the maximum entropy function can be found by maximising entropy separately over these two sets. This shows that P n ϕ X = X • P n ϕ + (1 − X) • P n ¬ϕ . By the assumption that P ∞ ϕ , P ∞ ¬ϕ are well-defined, we get that P ∞ ϕ X satisfies P1 and P2 on QF SL. By Gaifman's Theorem [17], P ∞ ϕ X is (uniquely extendible to) a probability function on SL.

Lemma 20. Under the assumption of Lemma 19, if maxent
Proof. By the above, P ∞ ϕ X ∈ E ϕ X . Denote by H n S (P ) the n-entropy of P evaluated on all n-states in S ⊆ Ω n . By assumption, P † ϕ exists and is unique; it must hence be in E ϕ . Since maxent E ϕ = {P † ϕ }, and since P † ϕ has greater entropy than every other probability function R with R(ϕ) = 1, X • P † ϕ will dominate any probability function Q with Q(ϕ) = X in entropy. In the same way, (1 − X) • P † ¬ϕ will dominate any probability function Q with Q(¬ϕ) = 1 − X in entropy. Then, by Proposition 5 and the discussion immediately after it, X • P † ϕ + (1 − X) • P † ¬ϕ will dominate every probability function in E ϕ X .

It is worth noting two points here. First, an application of Lemma 20 requires that P ∞ and P † are defined for both ϕ and ¬ϕ. (It is not sufficient that P ∞ ϕ and P † ϕ are well-defined.) For example, if ϕ is a slow Π 1 sentence, then ¬ϕ ∈ Σ 1 and we know that P ∞ and P † are well-defined for both ϕ and ¬ϕ. Lemma 20 also applies non-trivially to consistent ϕ = ∀ xθ( x) ∈ Π 1 in which θ( x) is a conjunction of literals (Remark 18). Second, we note that nothing in the proof of Lemma 20 hinges on working with a single non-categorical premiss. Indeed, all that was needed for that result was that, for any satisfiable sentence ϕ, the pair {ϕ n , ¬ϕ n } gives a partition of n-states for all n. The result can thus be generalised in a straightforward way to any set of premisses that satisfies this condition.

Definition 21. A non-empty set of sentences ϕ 1 , . .
., ϕ k ∈ SL is called a partition on the large L n if and only if there exists a J ∈ N such that for all n ≥ J the sentences ϕ n 1 , . . ., ϕ n k partition the n-states of L n .

Notice that a set of sentences will trivially fail to satisfy this condition if at least one sentence does not have finite models of size n for all sufficiently large n, even if it does have an infinite model. Consider for example the sentence ϕ = ∀xzw∃y(U ) which only has infinite models. One may think of {ϕ, ¬ϕ} as partitioning the full language by partitioning the class of models of the language L. Since P (ϕ n ) = 0 = 1 − P (¬ϕ n ) holds for all n ∈ N and all probability functions P ∈ P , the constraint P (ϕ n ) = X for X ∈ (0, 1) is unsatisfiable and hence E n = ∅ for all n. We hence require partitions on finite sublanguages.
Vice versa, not every partition on the large L n partitions the class of models of L: ψ 1 := ϕ ∨ U 2 t 1 and ψ 2 := ¬U 2 t 1 form a partition on all finite sublanguages, but ψ 1 ∧ ψ 2 has infinite models, characterised by ϕ ∧ ¬U 2 t 1 . So {ψ 1 , ψ 2 } does not partition the class of models of L.

Theorem 22 (Entropy Maximisation and Jeffrey Updating). Suppose ϕ 1 , . . ., ϕ k is a partition on the large L n and X 1 , . . ., X k ≥ 0 are such that k i=1 X i = 1, and suppose that for all i the relevant entropy limits exist.

Proof. The proof follows immediately by applying the argument in the proof of Lemma 20 a finite number of times.

Generalisation to probability intervals
We now show how to use the above results to prove that the entropy-limit conjecture holds for certain non-categorical premisses ϕ X where X is a set of probabilities.
If the entropy limits exist for the categorical constraint χ 1 := {∀ xθ( x) 1 }, then

Proof. First note that it follows from Lemma 19 that the entropy limits exist for the non-categorical constraint χ x := {∀ xθ( x) x } for all x ∈ [0, 1]. We use P n x to denote the unique probability function on L n with maximal n-entropy subject to the constraint χ x = {ϕ x }.
It is easy to check that for all x ≥ ε > 0 there exists some M ∈ N such that for all n ≥ M , H n (P n x ) < H n (P n x−ε ). It follows that P n ϕ X = P n ϕ inf X and hence P ∞ ϕ X = P ∞ ϕ inf X . Applying Theorem 15 we note that P † ϕ 1 = P ∞ ϕ 1 ; in particular, P † ϕ 1 exists, is unique and satisfies the constraint χ 1 . Applying Theorem 22 we obtain P † ϕ x = P ∞ ϕ x for all x ∈ X. Using the proof technique from Lemma 20 we see that for all y ∈ X \ {inf X} there exists some M ∈ N such that for all n ≥ M it holds that H n (P † ϕ y ) < H n (P † ϕ inf X ). Hence, P † ϕ inf X has greater entropy than every other probability function in E.

Proof. The proof is an easy adaptation of the proof of the previous proposition, replacing inf X by λ.
It is easy to check that for all x ∈ X \ {λ} there exists some M ∈ N such that for all n ≥ M it holds that H n (P n x ) < H n (P n λ ). It follows that, for all large enough k, P n ϕ X and P n ϕ λ are arbitrarily close on all k-states. Hence, P ∞ ϕ X = P ∞ ϕ λ . Note first that by Theorem 12 the entropy limit exists and is equal to the maximum entropy function for the categorical constraint χ 1 , P † ϕ 1 = P ∞ ϕ 1 ; in particular, P † ϕ 1 exists, is unique and satisfies the constraint χ 1 . Applying Theorem 22 we obtain P † ϕ x = P ∞ ϕ x for all x ∈ X. Using the proof technique from Lemma 20 we see that for all y ∈ X \ {λ} there exists some M ∈ N such that for all n ≥ M , H n (P † ϕ y ) < H n (P † ϕ λ ). Hence, P † ϕ λ has greater entropy than every other probability function in E.

• If λ := P = (ϕ) = 0 and the entropy limits exist for the categorical constraint χ 1 := {∀ xθ( x) 1 }, then

Proof. The only thing left to prove is the last equality, which follows from Theorems 12, 15 and 22.
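The monotonicity claim used in the two proofs above, namely that the n-entropy of the maximiser decreases as the constrained probability x moves away from its entropy-maximising value, can be probed numerically in a toy case. In this sketch (Python; the premiss ϕ = ∀x Ux on a single-unary-predicate language is our illustrative assumption), the maximiser puts mass x on one n-state and equivocates over the remaining 2 n − 1 states:

```python
import math

def H_n(x, n):
    # n-entropy of the maximiser satisfying P(phi_n) = x for phi = "for all x, Ux":
    # mass x on the single state Ut1 & ... & Utn, the rest equivocated
    # over the remaining 2^n - 1 states
    h = 0.0
    if x > 0:
        h -= x * math.log2(x)
    if x < 1:
        rest = (1 - x) / (2**n - 1)
        h -= (1 - x) * math.log2(rest)
    return h

# H_n is maximised at x = 1/2^n (the equivocator) and strictly decreases beyond it,
# so under an interval constraint X the maximiser sits at the point of X closest
# to the equivocator's value
n = 10
xs = [0.3, 0.4, 0.5, 0.7, 0.9]
values = [H_n(x, n) for x in xs]
assert all(values[i] > values[i + 1] for i in range(len(values) - 1))
```

This is only a finite-language illustration of why inf X (or λ = P = (ϕ), when it lies in X) is selected; it is not a substitute for the proofs above.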

Convergence in entropy
In this section we show that there are some rather general conditions under which the entropy-limit conjecture is true. We suppose only that E, E 1 , E 2 , . . . are convex sets of probability functions generated by some consistent set χ of constraints on probabilities of sentences of L, and that the P n df = arg max P ∈E n H n (P ) exist for sufficiently large n. For example, if χ = {ϕ X 1 1 , . . ., ϕ X k k } and the X 1 , . . ., X k are probabilities or closed intervals of probabilities, then the E n are closed, and this guarantees the existence of the P n for non-empty E n .
The main condition required for the general result is that P n converges to P ∞ in entropy. Thus, we first introduce this kind of convergence and compare it to L 1 convergence, which also plays a role in what follows.

Definition 26 (Convergence in Entropy). Suppose P and Q n , for n = 1, 2, . . ., are probability functions on L.

We define L 1 distance as follows, where the latter equality follows as per [16, Equation 11.137].
Definition 27 (Convergence in L 1 ). Suppose P and Q n , for n = 1, 2, . . ., are probability functions on L.

The entropy function H n is not 1-1. Therefore, the fact that the Q n converge to P in entropy implies neither that they converge in L 1 to P , nor that, if they do additionally converge in L 1 to P , P is the unique function to which they converge in entropy.
Example 28. Suppose L is a language with a single unary predicate U . One can define P , R and Q n , for all n, such that the Q n converge in entropy to both P and R but converge in L 1 to neither function.
Example 29. Proceed as in the previous example, except let Q n = P for all n. Now the Q n converge in L 1 to P , but converge in entropy to both P and R, among other functions.

However, it turns out that, under certain conditions, if the n-entropy maximisers P n converge in entropy to P ∈ E then they converge in L 1 to P . To show this we need two lemmas.
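The point behind Examples 28 and 29 is that n-entropy is permutation-invariant, so distinct functions can be indistinguishable in entropy while far apart in L 1 . A minimal sketch (Python; the point-mass functions below are illustrative stand-ins for the P and R of the examples):

```python
import math

def entropy(dist):
    # Shannon entropy in bits of a distribution given as {world: probability}
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def l1(P, Q):
    # L1 distance between two distributions over a common set of worlds
    keys = set(P) | set(Q)
    return sum(abs(P.get(k, 0) - Q.get(k, 0)) for k in keys)

# two distinct point-mass functions on the 2-states of a unary language:
P = {(1, 1): 1.0}          # U(t_1), U(t_2) both true
R = {(0, 0): 1.0}          # both false
Q = {(1, 1): 1.0}          # Q_n = P here, as in Example 29

# entropy cannot tell P and R apart...
assert entropy(P) == entropy(R) == 0.0
# ...so Q converges in entropy to both, but in L1 only to P:
assert l1(Q, P) == 0.0 and l1(Q, R) == 2.0
```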
First, a Pythagorean theorem holds for what we call the n-divergence d n [16, Theorem 11.6.1]:

Definition 30 (n-divergence). The n-divergence of two probability functions P and Q is defined as the Kullback-Leibler divergence of P from Q on L n :

Lemma 31 (Pythagorean theorem). For any convex F ⊆ P , if P ∈ F and Q ∉ F , then

Corollary 32. For any convex F ⊆ P , if P ∈ F and R n = arg sup S∈F H n (S), then

Proof. If the equivocator function P = ∉ F , then we can apply the Pythagorean theorem to Q = P = and simplify.
Otherwise, R n = P = on L n and the inequality holds with equality.

The second lemma connects the L 1 distance to n-divergence [see, e.g., 16, Lemma 11.6.1]. Apart from convergence in entropy, the other key condition invoked by our general entropy-limit theorem is regularity.

Proposition 36. Suppose χ is regular. If the P n converge in entropy to P ∈ E, then they converge in L 1 to P .

Proof. By regularity, P n = arg max Q∈F n H n (Q) for sufficiently large n and convex F n . So, by Corollary 32 and Pinsker's inequality, for sufficiently large n, H n (P n ) − H n (P ) is bounded below by a positive multiple of ‖P − P n ‖ 2 n . Hence, that the P n converge in entropy to P implies that ‖P − P n ‖ 2 n converges to zero, which in turn implies that the P n converge in L 1 to P .
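Pinsker's inequality itself is easy to test numerically. The following sketch (Python, working in nats, where the inequality reads ‖P − Q‖ 1 2 ≤ 2 D(P‖Q)) checks it on random pairs of distributions:

```python
import math
import random

def kl(P, Q):
    # Kullback-Leibler divergence D(P || Q) in nats
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def l1(P, Q):
    return sum(abs(p - q) for p, q in zip(P, Q))

random.seed(0)
for _ in range(1000):
    xs = [random.random() for _ in range(8)]
    ys = [random.random() for _ in range(8)]
    P = [x / sum(xs) for x in xs]
    Q = [y / sum(ys) for y in ys]
    # Pinsker: |P - Q|_1^2 <= 2 * D(P || Q)
    assert l1(P, Q) ** 2 <= 2 * kl(P, Q) + 1e-12
```

This is the direction exploited in the proof above: if the divergence (and hence the entropy gap) goes to zero, so does the L 1 distance.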
Note that the regularity condition can be dropped if P is the equivocator function P = :

Proposition 37. If the P n converge in entropy to P = , then they converge in L 1 to P = .
Proof. As we saw in the proof of Corollary 32, the relevant inequality holds with equality in this case, so by Pinsker's inequality ‖P = − P n ‖ 2 n is bounded above by a multiple of H n (P = ) − H n (P n ). Hence, that the P n converge in entropy to P = implies that ‖P = − P n ‖ 2 n converges to zero, which in turn implies that the P n converge in L 1 to P = .
Importantly for our purposes, convergence in entropy guarantees the existence of the pointwise entropy limit P ∞ in these cases:

Proposition 38. Suppose χ is regular or P is the equivocator function. If the P n converge in entropy to P ∈ E, then P ∞ exists and P = P ∞ .
Proof. Applying Proposition 36 if χ is regular, or Proposition 37 if P is the equivocator, together with Equation (6), we see that if the P n converge in entropy to P , then P (ψ) = lim n→∞ P n (ψ) for every quantifier-free sentence ψ. P is the unique such limit of the P n because it is in E and so a probability function, and hence determined by its values on the quantifier-free sentences of L [17]. Now, P ∞ is defined as the unique extension to L of the pointwise limit of the P n on quantifier-free sentences, assuming that this pointwise limit exists and satisfies the axioms of probability on quantifier-free sentences of L. This latter assumption holds because, as we have seen, lim n→∞ P n (ψ) = P (ψ) for quantifier-free ψ, where P is a probability function. Since P ∞ is the unique extension to L, it must agree with P on L as a whole. Therefore P ∞ exists and P = P ∞ .
We can now progress to the main result of this section:

Theorem 39 (Entropy-Limit Theorem under convergence in entropy). Suppose χ is regular or P is the equivocator function. If the P n converge in entropy to P ∈ E, then P ∞ exists and maxent E = {P ∞ }.

Proof. The existence of P ∞ and the fact that P = P ∞ is an application of Proposition 38. So it remains to show that maxent E = {P ∞ }.
If P is the equivocator function, this follows straightforwardly: P = P ∞ = P = ∈ E, and any other function Q ∈ E must differ from P on some n-states for all large enough n. P = has greater n-entropy than Q for all such n; since this holds for every other Q ∈ E, P = is the unique member of maxent E.
We turn next to the case in which P is not the equivocator function, which is the case in which χ is regular. First we shall show that P ∞ ∈ maxent E; then we shall see that there is no other member of maxent E.
First, then, assume for contradiction that P ∞ ∉ maxent E. Then there is some Q ∈ E such that Q has greater entropy than P ∞ , i.e., for sufficiently large n, H n (P n ) ≥ H n (Q) > H n (P ∞ ). Note that Q ≠ P ∞ . Hence, for sufficiently large n, a chain of inequalities holds, the latter two by Corollary 32 (given regularity) and Pinsker's inequality. Hence, since the P n converge in entropy to P ∞ , they converge pointwise to Q. By the uniqueness of pointwise limits, Q = P ∞ : a contradiction. Hence, P ∞ ∈ maxent E, as required.
Next we shall see that P ∞ is the unique member of maxent E. Suppose for contradiction that there is some P † ∈ maxent E such that P † ≠ P ∞ . Then P ∞ cannot eventually dominate P † in n-entropy, i.e., there is some infinite set J ⊆ N such that for n ∈ J, H n (P † ) ≥ H n (P ∞ ).
Let R df = λP † + (1 − λ)P ∞ for some λ ∈ (0, 1). Now the log-sum inequality [16, Theorem 2.7.1] applies for all large enough n ∈ J; hence, for large enough n ∈ J, a lower bound on H n (P n ) − H n (P ∞ ) follows by Corollary 32 and regularity, and then by Pinsker's inequality (Lemma 33) and the definition of R. Let us consider the behaviour of the term λ(P † (ρ n ) − P ∞ (ρ n )): it tends to 0 as n −→ ∞, as we shall now see. P † ≠ P ∞ by assumption, so they must differ on some quantifier-free sentence ψ, a sentence of L m , say. Suppose without loss of generality that P † (ψ) > P ∞ (ψ) (otherwise take ¬ψ instead) and let δ := P † (ψ) − P ∞ (ψ) > 0. Since P n converges in L 1 to P ∞ , we can consider n > m large enough that (see Equation (6)) max ϕ∈SL n (P n (ϕ) − P ∞ (ϕ)) < λδ/2. In particular, since ψ is quantifier-free, P n (ψ) − P ∞ (ψ) ≤ max ϕ∈SL n (P n (ϕ) − P ∞ (ϕ)) < λδ/2. Putting the above parts together, we have that for sufficiently large n ∈ J, the differences H n (P n ) − H n (P ∞ ) are bounded away from zero. But this contradicts the assumption that the P n converge in entropy to P ∞ . Hence, P ∞ is the unique member of maxent E, as required.
One can use this result to test whether some hypothesised function P ∈ E is both the entropy limit P ∞ and the maximal entropy function P † , via the following procedure:
1. Determine P n as a function of n.
2. Determine whether P n converges in entropy to P .
3. Determine whether χ is regular or P is the equivocator function.
4. If these last two conditions hold, then P = P † = P ∞ .
With regard to step 2, a rapid form of convergence in L 1 is sufficient (but not necessary) for convergence in entropy. Recall that r n is the number of atomic sentences in L n .

Proof. By [16, Theorem 17.3.3], for sufficiently large n we have a bound whose latter two terms both tend to zero with n, by the fact that r n ‖Q n − P n ‖ tends to zero, together with (in the case of the second term) the fact that x log x −→ 0 as x −→ 0.
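The entropy-continuity bound from [16, Theorem 17.3.3] that drives this lemma, namely |H(P ) − H(Q)| ≤ ‖P − Q‖ 1 log 2 (N/‖P − Q‖ 1 ) whenever ‖P − Q‖ 1 ≤ 1/2 on an N -element space, can likewise be checked numerically (Python sketch):

```python
import math
import random

def entropy(P):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in P if p > 0)

def l1(P, Q):
    return sum(abs(p - q) for p, q in zip(P, Q))

random.seed(1)
N = 16
for _ in range(500):
    xs = [random.random() for _ in range(N)]
    P = [x / sum(xs) for x in xs]
    # perturb P slightly to get a nearby Q with small L1 distance
    ys = [x + 0.001 * random.random() for x in P]
    Q = [y / sum(ys) for y in ys]
    d = l1(P, Q)
    if 0 < d <= 0.5:
        # |H(P) - H(Q)| <= d * log2(N / d)
        assert abs(entropy(P) - entropy(Q)) <= d * math.log2(N / d) + 1e-9
```

Since the bound scales like d log(1/d), L 1 convergence fast enough to beat the growth of r n yields convergence in entropy, which is the content of the lemma.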
In the remainder of this section we provide a range of examples to illustrate the usage of the above algorithm.
Example 41. Suppose χ = {∃xU x}, where L has a single unary predicate U . Letting ϕ be ∃xU x, ϕ n is defined as Ut 1 ∨ • • • ∨ Ut n . We have that E = {P ∈ P : P (ϕ) = 1} and E n = {P ∈ P n : P (ϕ n ) = 1}. The n-entropy maximiser gives probability 0 to the n-state ¬Ut 1 ∧ • • • ∧ ¬Ut n and divides probability 1 equally amongst the 2 n − 1 other n-states. We shall use Lemma 40 to show that the P n converge in entropy to the equivocator function P = . Note that P = ∈ E. Now χ is not regular: this is because E n = P n , so F n = P n , and arg max Q∈F n H n (Q) is P = rather than P n . However, because the P n converge in entropy to the equivocator function, the Entropy-Limit Theorem (Theorem 39) nevertheless implies that maxent E = {P ∞ } = {P = }.
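The convergence in this example can be made quantitative: on L n the L 1 distance between P n and the equivocator is exactly 2/2 n , so r n ‖P n − P = ‖ = 2n/2 n → 0 and Lemma 40 applies. A sketch using exact rational arithmetic (Python):

```python
from fractions import Fraction

def l1_to_equivocator(n):
    # P^n gives 0 to the one state falsifying "exists x, Ux" and 1/(2^n - 1)
    # to each of the rest; the equivocator gives 1/2^n to every n-state
    N = 2**n
    d = Fraction(1, N)                                   # the excluded state
    d += (N - 1) * abs(Fraction(1, N - 1) - Fraction(1, N))
    return d                                             # works out to 2/2^n

# the L1 distance is exactly 2/2^n and shrinks geometrically:
assert l1_to_equivocator(5) < l1_to_equivocator(4) < l1_to_equivocator(3)
# r_n = n atomic sentences, and n * 2/2^n tends to zero:
assert all(n * float(l1_to_equivocator(n)) < 0.01 for n in range(15, 25))
```

This is the "rapid L 1 convergence" that Lemma 40 turns into convergence in entropy.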
Example 42. Suppose χ = {∀xU x X }, where L has a single unary predicate U and X ∈ [0, 1]. Letting ϕ be ∀xU x, ϕ n is Ut 1 ∧ . . . ∧ Ut n . We have that E = {P ∈ P : P (ϕ) = X} and E n = {P ∈ P n : P (ϕ n ) = X}. The n-entropy maximiser gives probability X to the n-state ϕ n and divides probability 1 − X equally amongst all other n-states. Let us consider whether the P n might converge in entropy to the following function P : indeed, the P n do converge in entropy to P . Moreover, P ∈ P because ω n ∈Ω n P (ω n ) = 1, both for ω n = ϕ n and for ω n ≠ ϕ n ; and P ∈ E. In addition, if X > 0 then χ is regular. To see this, observe that for any Q ∈ E, when n is large enough that X > 1/2 n , we have that P = (ϕ n ) = 1/2 n < X = P n (ϕ n ) ≤ Q(ϕ n ), and so, since P n spreads the remaining probability 1 − X evenly across the remaining n-states, H n (P n ) ≥ H n (Q). On the other hand, if X = 0 then P is the equivocator function. Either way, we can apply Theorem 39 to conclude that P = P ∞ = P † in this example.
Example 43. Suppose χ = {∀x(Ux → Ut 3 ) X }, where L is a language with a single unary predicate U . The n-entropy maximiser will give the relevant n-states the same probability. Let us consider P and ask whether the P n converge to P ∈ E in entropy: they do, because each of the three terms tends to zero with n. χ is regular as long as X > 1/2, for otherwise P has greater n-entropy than P n for sufficiently large n. Hence, we can invoke Theorem 39.

In cases where the condition of Lemma 40 does not hold, the following lemma can come in useful:

Lemma 44. Probability functions (Q n ) n≥1 converge in entropy to P if and only if

Note that if Q n (ω) is zero whenever P (ω) is zero, then the second term, − ω∈Ω n :P (ω)=0 Q n (ω) log Q n (ω), vanishes.
Proof. Let x = x ω = Q n (ω) and a = a ω = P (ω) > 0 and consider the Taylor series expansion of x log x at a. Each of the component terms individually tends to zero with n; hence we do have convergence in entropy for X < 1. If X = 1, the third line in the above sum disappears and we have to rewrite the second line, which corresponds to the n-states n i=1 ¬Ut i on which P n is positive but P is zero. The first line of the sum tends to zero as before; the second line is approximately −(n − 1) log 2/2 n−1 , which also tends to zero. Hence, we also have convergence in entropy when X = 1. Recall that χ is regular as long as X > 1/2. Hence, we can again invoke Theorem 39 to conclude that P coincides with the maximal entropy function P † when X > 1/2.
Finally, here is an example involving a Π 2 constraint, which shows that the entropy-limit conjecture holds in cases other than those covered by previous sections of the paper. Here P n converges in entropy to the equivocator P = ; moreover, the equivocator is in E. Theorem 39 then implies that maxent E = {P = } = {P ∞ }.

Conclusions
We have shown that the entropy-limit conjecture holds in the following scenarios:

Non-categorical partition. χ = {ϕ X 1 1 , . . ., ϕ X k k } where ϕ 1 , . . ., ϕ k is a partition on the large L n and X 1 , . . ., X k ≥ 0 such that k i=1 X i = 1 (Theorem 22).

Convergence in entropy. The P n converge in entropy to P ∈ E and either χ is regular or P is the equivocator function (Theorem 39).
Taking into account previous work (see Section 2), the entropy-limit conjecture has now been verified in quite a broad range of scenarios. Future work might proceed in one of two directions. The first is to further extend the range of scenarios in which the conjecture is tested, e.g., to categorical constraints of greater quantifier complexity or to a broader range of non-categorical constraints. The second is to consider inference processes other than the maximum entropy principle, which might be relevant to questions other than the search for a canonical inductive logic or a canonical characterisation of normal models. Several such inference processes have been proposed and studied in the literature, for example Centre of Mass, Minimum Distance and the spectrum of inference processes based on generalised Rényi entropies [45]. These inference processes differ in the structural properties that they impose on the probability function that they pick for inference. There are, however, several such properties that they have in common, which allow for a generalisation of some of our results; see [38] for a detailed analysis of these structural properties for different inference processes. Of particular interest is a symmetry property called the Renaming Principle.
The Renaming Principle (RP) is a symmetry axiom that ensures that the choice of the probability function is invariant under a uniform renaming of the set of state descriptions of finite sublanguages.
An inference process ι, defined on the finite languages L n , satisfies the Renaming Principle if, for two sets of linear constraints χ and χ′ in which the n-states ω′ 1 , . . ., ω′ r n appearing in χ′ are a permutation of the n-states ω 1 , . . ., ω r n of L n appearing in χ, the functions ι(χ) and ι(χ′) assign corresponding probabilities under that permutation. What is special about RP in our context is that many of the results we have provided (as well as those given in [40][41][42]) hold for any inference process that satisfies RP. This is a rather large class of inference processes that includes not only Maximum Entropy but also the examples given above (Centre of Mass, Minimum Distance and those based on generalised Rényi entropies). For a detailed discussion of this point see [44].
We give another symmetry result that follows from RP in Appendix A.2.An immediate question, which we hope to study further in future work, is whether or not the conjecture and the results thereof can be generalised if we take an approach analogous to the maximal-entropy approach for defining these other inference processes on first order languages.
Another promising avenue for further research is the introduction of functions to the underlying language, as recently studied by Howarth and Paris [23].
Soroush Rafiee Rad's research is also supported by the Deutsche Forschungsgemeinschaft (DFG), grant number RO 4548/8-1. Jon Williamson is grateful for funding from the UK Arts and Humanities Research Council (grant AH/I022957/1) and the Leverhulme Trust (grant RPG-2019-059). We are grateful to David Corfield for help with Example 46.

A.1. Defining the entropy limit
As pointed out in Section 2, there are two ways to define the entropy-limit approach on first-order languages. One, the Barnett-Paris definition, is to define the entropy-limit function as the limit of local entropy maximisers directly on all sentences of L: that is, to take P ∞ (ψ) = lim r→∞ P r (ψ r ) if the limit exists for all ψ ∈ SL, and to take P ∞ as undefined otherwise. The second approach, the Rad-Paris definition, is to define P ∞ on quantifier-free sentences as the limit of local entropy maximisers and then take its unique extension (by Gaifman's Theorem) to the whole of SL.
If the pointwise limit given by the first approach exists and is a probability function, then it agrees with the one obtained from the second approach. To see this, let the probability function W be given by the pointwise limit and let P ∞ be the function obtained from the Rad-Paris definition. Then for all n and n-states ω n , W (ω n ) = lim r→∞ P r ((ω n ) r ) = lim r→∞ P r (ω n ) = P ∞ (ω n ). Thus W agrees with P ∞ on all n-states and so on all quantifier-free sentences; hence, by the uniqueness criterion in Gaifman's Theorem, they agree on all of SL.
The main issue with the Barnett-Paris approach is that the pointwise limit on the whole of SL might exist but not be a probability function. This is obviously circumvented by the second approach: defining P ∞ on quantifier-free sentences as the above limit ensures that axioms P1 and P2 are satisfied, and Gaifman's Theorem guarantees a unique extension of P ∞ to a probability function over all of SL. To see how the first approach can fail in this respect, consider the following example. Let L be a language with equality and a single binary relation U , and consider the following set of sentences: ϕ 1 = ∀x¬Uxx, ϕ 2 = ∀x, y, z((Uxy ∧ Uyz) → Uxz), ϕ 3 = ∀x, y(¬(x = y) → (Uxy ∨ Uyx)) and ϕ 4 = ∀x∃yU xy. Note that ϕ 1 , ϕ 2 and ϕ 3 are the axioms for a strict linear order, and adding ϕ 4 ensures that there are no end points. As noted above, these sentences together have no finite model. Let ϕ = ϕ 1 ∧ ϕ 2 ∧ ϕ 3 ∧ ϕ 4 . As we have observed already, P r assigns the full probability mass equally among those r-states that are consistent with ϕ, i.e., those r-states that characterise a strict linear order over t 1 , . . ., t r . There are r! many such r-states. If ω n is inconsistent with ϕ, then all its r-state extensions are inconsistent with it and we have P r ϕ (ω n ) = 0 for all r > n. If, on the other hand, ω n does characterise a strict linear order over t 1 , . . ., t n , then it can be extended to a strict linear order over t 1 , . . ., t r in ∏ r−n i=1 (n + i) = r!/n! many ways, each receiving the same probability of 1/r! under P r ϕ . Thus the local entropy maximiser P r ϕ assigns probability zero to those n-states that do not correspond to a strict linear ordering of t 1 , . . ., t n and assigns probability 1/n! to each of the rest. Of these, only one does not appear among {ω 1 , . .
., ω s }, namely the one which puts t i as the final element in the ranking. Hence P ∞ ϕ assigns probability 1 to each conjunct in (A.1) and hence gives probability 1 to the whole conjunction, P ∞ ϕ ( m i=1 ∃yU t i y) = 1. Now for all ψ ∈ SL let W (ψ) = lim r→∞ P r ϕ (ψ r ), as given by the Barnett-Paris definition, and assume the limit is well-defined for all ψ and that W is a probability function on SL. Then by the discussion above W agrees with P ∞ ϕ on all of SL; but then 1 = P ∞ ϕ (ϕ 4 ) = W (ϕ 4 ) = lim r→∞ P r ϕ ((ϕ 4 ) r ) = lim r→∞ 0 = 0, a contradiction. Notice that the penultimate equality follows from the fact that ϕ 1 ∧ ϕ 2 ∧ ϕ 3 ∧ ϕ 4 has no finite models. Thus, if W is well-defined on all of SL, it cannot be a probability function.
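The counting step in this appendix example, that an n-state describing a strict linear order on t 1 , . . ., t n extends to exactly r!/n! r-states describing strict linear orders on t 1 , . . ., t r , can be verified by brute force for small n and r (Python sketch):

```python
from itertools import permutations
from math import factorial

def extensions(n, r):
    # count strict linear orders on t_1..t_r whose restriction to t_1..t_n
    # is a fixed order (here: the natural order t_1 < t_2 < ... < t_n)
    base = list(range(n))
    count = 0
    for perm in permutations(range(r)):
        # perm lists the r elements from smallest to largest
        restricted = [x for x in perm if x < n]
        if restricted == base:
            count += 1
    return count

for n, r in [(2, 4), (3, 5), (2, 6)]:
    assert extensions(n, r) == factorial(r) // factorial(n)
```

Since each of the r! orders on t 1 , . . ., t r receives mass 1/r! under P r ϕ , each consistent n-state receives (r!/n!) • (1/r!) = 1/n!, independently of r, which is what the limit argument uses.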
We have shown two things: (i) the Barnett-Paris approach might fail to produce a probability function (violating P3); (ii) the Rad-Paris approach does produce a probability function in all cases in which the Barnett-Paris entropy limit is well-defined on QF SL. Note finally that the Rad-Paris entropy-limit function may fail to satisfy the constraints χ. The maximal entropy function, if unique, always satisfies the constraints, since it is by definition a member of E, the set of probability functions satisfying χ.

Table 1
Summary of what is known so far with respect to entropy maximisers for categorical premisses.

P ∞ ϕ (∃yU t i y) = lim n→∞ lim r→∞ P r ϕ ( n k=1 Ut i t k ). Let n > i. Then, since n k=1 Ut i t k ∈ SL n , there are n-states ω 1 , . . ., ω s such that n k=1 Ut i t k ↔ s i=1 ω i . Thus P ∞ ϕ (∃yU t i y) = lim n→∞ lim r→∞ s i=1 P r ϕ (ω i ).