On the Logic of a Prior Based Statistical Mechanics of Polydisperse Systems: The Case of Binary Mixtures

Most undergraduate students who have followed a thermodynamics course would have been asked to evaluate the volume occupied by one mole of air under standard conditions of pressure and temperature. However, what is this task exactly referring to? If air is to be regarded as a mixture, under what circumstances can this mixture be considered as comprising only one component called “air” in classical statistical mechanics? Furthermore, following the paradigmatic Gibbs’ mixing thought experiment, if one mixes air from a container with air from another container, all other things being equal, should there be a change in entropy? The present paper addresses these questions by developing a prior-based statistical mechanics framework to characterise binary mixtures’ composition realisations and their effect on thermodynamic free energies and entropies. It is found that (a) there exist circumstances for which an ideal binary mixture is thermodynamically equivalent to a single component ideal gas and (b) even when mixing two substances identical in their underlying composition, entropy increase does occur for finite size systems. The nature of the contributions to this increase is then discussed.


Introduction
Whether one thinks of ubiquitous substances such as air or drinking water or more specific fluids such as petroleum or milk, they all count as multicomponent systems, i.e., they all comprise more than one identifiable type of constituent. Furthermore, following early commentaries (see [1] and references therein) on Gibbs' original work [2,3] on mixtures, the latter are thought to play a key role in the-quantum [4][5][6][7][8][9][10][11][12][13] or classical [14][15][16][17][18][19][20][21][22][23][24][25]-foundations of classical statistical mechanics; through the (in)famous Gibbs paradox. Somewhat surprisingly then, the vast majority of statistical mechanics textbooks covers principally single component systems. Mixtures are usually treated but as an exception to the identical particle paradigm. The reasons for this lack of visibility of multicomponent systems, despite their omnipresence in natural and artificial settings, are often rather elusive so that we could be left with-wrongly-attributing motives to their authors ranging from "mixtures do not matter as much as we think" to "mixtures are too complicated to treat in all their details anyways". To be sure, if phase behaviour is to be considered in models of mixtures with interacting constituents, then characterising these phases and their occurrences is indeed a much more complicated problem than when looking at single component systems, as illustrated in relatively recent works [26,27]. Gibbs originally developed the notions of grand (and petit) canonical ensembles to elucidate the statistical thermodynamics of systems with varying particle numbers, including mixing problems [3]. More recently, the entropy of mixtures and Gibbs' paradoxes were revisited within a more contemporary framework involving probabilities and particle exchanges protocols equivalent to the grand canonical ensemble for non-interacting systems [28]. However, for finite size systems, the grand canonical ensemble may not always give the same result as the canonical ensemble and, in practice, many mixing scenarios do not involve any external reservoir with which to exchange particles. In addition, we will see that, even in absence of interaction between constituents, it is arguable that the very nature of a polydisperse system in the canonical ensemble requires additional care in its statistical mechanics treatment which impacts the finite size expected canonical free energy and entropy measures. The purpose of this paper is therefore two-fold: (1) to develop a mathematically and conceptually consistent statistical mechanics treatment of mixtures in the canonical ensemble and (2) to derive general finite size expressions for the canonical measures of free energy and entropy. From these expressions, we will then show that there are circumstances for which a mixture, in the canonical ensemble, becomes equivalent to a single component system thus partially justifying the apparent lack of coverage in the literature. In what follows, we will focus on the statistical mechanics of binary mixtures to illustrate the kind of difficulties that can already emerge at this rather simple level of polydispersity. The article is organised as follows: Section 2 reminds the reader of the classical textbook treatment of binary mixtures in the canonical ensemble. Section 3 introduces a heuristic generalisation of the textbook treatment developed in Section 2 and discusses some of its caveats. Section 4 proposes a general, prior based, statistical mechanics framework for binary mixtures and discusses its consequences. Section 5 looks at the problem of mixing between two mixtures with different compositions and identifies the various contributions to the entropy change so as provide a specific definition of mixing entropy. Finally, Section 6 discusses further perspectives and Section 7 presents conclusions.

Textbook Treatment of Binary Mixtures
The textbook treatment of binary mixtures of non-interacting particles with Hamiltonian H usually considers a system with a total of N particles with N 1 particles of type 1 and mass m 1 and N 2 = N − N 1 particles of type 2 and mass m 2 , different from type 1, confined in a box of volume V and maintained at a temperature T = (k B β) −1 . Following the 1/n! prescription for any n identical particles in the system, the classical canonical partition function Q(N, N 1 , β, V) of the system then reads (e.g., in Refs. [8,9,11]): where h-usually taken as the Planck constant-is a quantity with the dimension of an action and the Hamiltonian H reads where U box ({ r i } i=1,...,N ) is infinite if any particle goes outside the bounding box and is zero otherwise. Upon choosing, as is often done, N 1 = N/2, it follows that: where is the thermal wavelength of species i. Finally, applying the Stirling approximation ln N! ≈ N ln N − N, one finds for sufficiently large N the free energy where ρ ≡ N/V. In Equation (4), the second and third terms can be interpreted as the free energy that a gas of N/2 identical particles of respectively type 1 and 2 would have, had they been separated in equal sized compartments of volume V/2. Given that F = U − TS and that T does not vary, it is then rather timely to introduce the mixing entropy ∆S mix as being: The entropy gain of k B ln 2 per particle found in Equation (5) upon mixing is the well known entropy of mixing attributable to the difference in identity between the two species 1 and 2. Technically speaking, this difference in identity is betrayed by the factor (N/2!) −2 in Equation (3), instead of the usual 1/N! expected for identical particles. However, it has been argued that interpreting the Nk B ln 2 entropy change as stemming from this combinatorial factors was in fact misleading [11]. We will come back to this issue in Section 5.

Heuristic Generalisation
One of the issues with what was presented above as the "textbook" derivation is that it may appear somewhat artificial in a general context. For example, in the case where N 1 = N/2, it follows that N ought to be an even number as there cannot be a fractional number of particles. In addition, even in absence of fractional particle numbers, how are we to choose the species upon adding a single additional particle to the overall system? These questions become more and more inevitable as the kind of mixtures one wants to look at becomes more complex with, e.g., N 1 = N/ √ 2 or when considering more than two species or even infinitely many.
One possible way of tackling these issues is to interpret the composition of a mixture through the probability p ∈ [0, 1] for a particle in the box to be of type 1 and then recast N 1 as N 1 ≡ N p. Since p is a probability, the product N p need not be an integer and each new particle added to the mixture will be of type 1 with probability p and type 2 with probability 1 − p. With this new prescription, Equation (3) can be rewritten as follows: where dy y x e −y is the Euler gamma function which generalises the factorial function to the reals. For large values of x, one can use a saddle point approximation and finds that Γ(x + 1) satisfies the Stirling approximation Γ(x + 1) x ln x − x. Thus, in the large N limit, one finds for the free energy βF(N, p, β, V) N p ln ρΛ 3 1 where s(p) ≡ −p ln p − (1 − p) ln(1 − p). Although some authors [19,26] refer to s(p) as the mixing entropy, we follow [29] and consider that the mixing entropy denomination should be left for unambiguously prepared mixing scenarios and their corresponding entropy variations and therefore refer to s(p), appearing in the equilibrium free energy of a mixture, as its composition entropy. For the reader who is more mathematically inclined, in the case of a Bernouilli random variable with probability p, s(p) is also called the binary entropy [30]. It can be noted that Equation (7) can be rewritten, without any explicit reference to different species, as follows: is the weighted geometric mean of the masses of the species 1 and 2. One important consequence of Equation (8) is that, at fixed composition, bothΛ and s(p) have fixed values and do not contribute to the thermodynamic properties of the system and therefore the non-interacting binary mixture is thermodynamically equivalent to a system of N effective identical particles [22].
Suppose now that we have two binary mixtures each comprising species 1 and 2 but with different compositions: mixture A with probability p A for a particle of the mixture to be of type 1 and mixture B with probability p B for a particle of the mixture to be of type 2. Each mixture contains N/2 particles and is initially confined in a box of volume V/2 (cf. Figure 1). Figure 1. Schematic representation of a mixing scenario between two different substances. The top left panel uses a particle representation with type 1 particles as dark orange squares and type 2 particles as blue disks. We see that the compositions of substances A and B are different. This is further illustrated in representing their underlying probability distribution on the top right panel. Upon mixing the bottom panels, they form a new composition C that is a priori different from A and B.
They are then mixed together in a box of volume V: what is the entropy change in the mixture? To address this question, we need first to determine the composition of the new mixture C obtained after mixing. In the absence of chemical reactions, the conservation of particle number compels us to ascribe the probability p C = (p A + p B )/2. The entropy change upon mixing reads then where is the Jensen-Shannon divergence. Two features of D JS worth mentioning is that it is positive definite and bounded from above by ln 2 and its square root is a distance between probability distributions [31]. One can verify for example that if p A = 0 and p B = 1, then D JS = ln 2 thus retrieving Equation (5).
In spite of the generalisation of the textbook derivation to any binary mixture and the physical insights gained from adopting a composition as probability paradigm, the approach developed above suffers from a few mathematical and conceptual problems which need to be addressed: • The heuristic approach used in this section starts off by directly generalising Equation (3) into Equation (6). Mathematically, it cannot start from Equation (1) as, in general, it could involve a fractional number of phase space integrals over particles of type 1 or 2, respectively. Thus, the current approach cannot be used as a basis to devise a mathematically rigorous composition-probability-based statistical mechanics of mixtures. • Equation (6) made use of the Euler Gamma function to replace the more traditionally accepted n! terms at the denominator so as to account for N p not being an integer. While mathematically this may be fine, it is not justified within statistical mechanics itself and, indeed, as the first point was being raised, it is hardly so, as one could have a fractional number of phase space integrals.

•
Problems arise with N p as well. What does it mean to switch from N 1 (an integer) to N p (not necessarily an integer) in the canonical ensemble? If p is a probability for a particle to be of type 1, then the particle type is a Bernouilli random variable t = {1, 2} such that p(t = 1) = p and p(t = 2) = 1 − p. Suppose we now model a mixture as a collection {t i } i=1,...,N of N independent such random variables. This leads us to define the random variable corresponding to the number of particles of type 1 in the system. If we denote · , from the composition average, N 1 = N p comes. Thus, by substituting N 1 with N p in Equation (3), one effectively replaces an integral number of particles by a real positive expectation value. Conceptually, this poses a problem, at least in principle, as any given individual mixture will only ever have an integer number of particles usually close to, but different from, Np.
It is the aim of the following sections to devise a rigorous statistical mechanics framework of binary mixtures for which the composition is interpreted as an a priori probability distribution with probability p for any given particle to be of type 1 (and probability (1 − p) for a particle to be of type 2). Only once such a framework has been laid out can we hope to delineate the range of validity, if any, of the results summarised in Equations (7)-(9) and the approximate reasoning used to derive them.

Prior Based Statistical Mechanics of a Binary Mixture
We now consider the simple case of the composition of a binary mixture modelled as a collection Denoting N 1 as the random variable for the number of particles of type 1 among N, the probability P(N 1 = N 1 |N, p) for it to be equal to a set integer value N 1 follows a Binomial distribution B N,p (N 1 ) defined by: Given the status of random variable of N 1 , it means that the set of values it can take corresponds to a set of different possible realisations of the same mixture. As a side note, in models with interacting components, the fact that there can be multiple realisations of the same mixture (as characterised by an a priori probability distribution) translates into using random matrices to set the interaction strengths between constituents [27].
From Equation (11), we see that Now, for any given mixture realisation, both N and N 1 are fixed and the framework of the canonical ensemble applies, including Equation (3). We can then substitute 1/(N 1 !(N − N 1 )!) by its expression in Equation (12) and get: We seek a definition of the free energy of a mixture that would be realisation-independent. This is because repeating an experiment with a given mixture-characterised by prior composition probability-likely involves different realisations of its composition. To this end, we denote F (N, p, V, β) ≡ −k B T ln Q(N, N 1 , β, V) the canonical free energy averaged over realisations of N 1 . After a bit of algebra and using the fact that N 1 = N p, we get where s(p) andΛ are as introduced in Equations (7) and (8), respectively, and H( f , g) ≡ − ∑ N N 1 =0 f (N 1 ) ln g(N 1 ) making then H(B N,p , B N,p ) interpretable as the realisation entropy of the mixture. There does not exist any exact closed form expression for H(B N,p |B N,p ) [32] but for N sufficiently large H(B N,p , B N,p ) ∼ (1/2) ln(2πeN p(1 − p)) (see Appendix A) so that, in the large N limit, Equation (14) is equivalent to Equation (8), thus justifying more rigorously its validity. A few remarks are in order regarding Equation (14): • The large N limit leading to Equation (8) in the heuristic derivation (Section 3) implies that both N p and N(1 − p) should be sufficiently large for the Stirling approximation to hold, which can require a very large N if p is either very small or very close to unity. On the contrary, Equation (14) converges rather quickly with N to the asymptotic form Equation (8) especially when p is either very small or very close to unity. This can be explained by remarking that, if 1 − p is very small and N finite, any realisation will most likely have very few, if any, particles of type 2. As a consequence, the realisation entropy will be closer to zero in magnitude as there is less uncertainty in the composition realisations. Therefore, although the heuristic derivation leads to the asymptotic form Equation (8), it incorrectly-given that Equation (14) has stronger mathematical and conceptual foundations-predicts the composition for which the asymptotic regime is reached the fastest.

•
For finite N, the realisation entropy term in Equation (14) actually decreases the entropy estimate given by the first four terms in the r.h.s of Equation (14) as it contributes negatively to the entropy of the system. This can be understood by adopting a "surprise" interpretation of entropy. The last two terms of Equation (14) can then be interpreted as the average surprise to have N 1 type 1 particles in the mixture. On the one hand, the Ns(p) contribution to entropy stems from an estimation of the surprise for a given realisation N 1 as being N 1 ln p + (N − N 1 ) ln(1 − p). This would be exact if one were to either assign a particular order to the particles or if one were to repeat single-particles experiments N times, where the identity of the particle for each try is obtained from the underlying probability distribution of the Bernouilli variable t, and add-up the observed individual surprises [22]. On another hand, the probability to have N 1 particles of type 1 among N is B N,p (N 1 ) and the corresponding surprise is − ln B N,p (N 1 ). The difference between the two gives us the relative surprise − ln p N 1 (1−p) N−N 1 which captures the contribution to entropy owing to composition.
Finally, we note that Sollich et al. [26,33] have proposed a strategy based on a moment-description of thermodynamic quantities for sufficiently large N to tackle the fact that a mixture composition of a single finite system is but a realisation of some underlying prior composition. This strategy differs in spirit from the one developed in the present paper in that it aims at performing a dimensional reduction of the free energy landscape by using a dependence of the free energy in the moments (ideally a small number of them) of the feature used to characterise the composition (e.g., size. mass, etc...) to obtain insights on the phase behaviour of mixtures. This objective is currently beyond the scope of the present paper.

Identifying the Gibbs Mixing Entropy
In this section, we consider a situation analogous to the one described in Figure 1 whereby N/2 particles (N being even) of a mixture A initially confined in a volume V/2 mix with N/2 particles of a mixture B initially confined in a volume V/2. Mixtures A and B only comprise type 1 and 2 particles but each particle identity is modelled with different random variables depending on the mixture it belongs to: t A with p(t A = 1) = p A and t B with p(t B = 1) = p B . Following the theory developed in Section 4, we model the composition of mixture A (resp. B) as a collection of N/2 independent Bernouilli random variables {t A i } i=1,...,N/2 (resp. {t B i } i=1,...,N/2 ) and denote N A 1 (resp. N B 1 ) the random variable giving the number of type 1 particles in mixture A (resp. mixture B). N A 1 and N B 1 both follow a Binomial distribution and therefore, prior to mixing, the free energy of each mixture-for a given realisation-follows from Equation (12): Upon mixing, the A and B mixtures will form a new mixture C with a composition emerging from the constraints during the diffusion process [29]. In the absence of chemical reactions, the number N C 1 of particles of type 1, once mixing has occurred, is a random variable satisfying: and We note that Equation (17) justifies the approach used for mixture C within the heuristic approach described in Section 3. If we want to know what is the probability for having N C 1 = N C 1 , then we have: where δ i,j = 1 if i = j, and zero otherwise, is the Kronecker delta function. If p A = p B , then P N,p A ,p B (N C 1 ) = B N,p A (N C 1 ). However, there is no known closed form expression for P N,p A ,p B (N C 1 ) when p A = p B so if one wants to estimate its values it has to be done either numerically by carrying out the whole summation or by using an approximate expression [30].
The canonical free energy of the final state of the system for a given realisation N C 1 can be written from the form of the canonical partition function in Equation (13): The difference with Equation (13), however, is that, in Equation (20), the probability distribution B N,p C (N C 1 ) is in general not equal to the actual probability distribution (given in Equation (19)) for obtaining N C 1 type 1 particles after the mixing of substances A and B. We will see in what follows that, as long as B N,p C (N C 1 ) has the same support as P(N C 1 = N C 1 |N, p A , p B ), this problem can be overcome. For a given realisation of mixtures A and B composition, we can now obtain the entropy variation ∆S mix upon mixing as being of the two gases from . For each realisation, a different entropy variation can be be found in principle. Like in Section 4, a realisation-free entropy change ∆S mix ≡ ∆S mix can be sought by averaging over composition realisations of substances A and B. We get (see Appendix B): In Equation (21), we have identified three different contributions to the entropy change upon mixing substances A and B that are worth commenting on:

•
Partitioning: The first term on the r.h.s of Equation (21) corresponds to the entropy change owing to having removed a partition between the initial compartments or, said differently, results from the more numerous partitioning possibilities offered when the whole volume V can be explored instead of V/2. Indeed, when each particle can freely roam in the boxe of volume V, imagining a virtual division splitting the box into two equal half will lead each particle to be either on one side or the other of the virtual dividing wall. There are then 2 N possibilities. When the dividing wall switches from virtual to real, and N/2 particles have to be on each side, the number of ways of realising this partitioning is N!/((N/2)!) 2 . The corresponding entropy change is therefore larger than zero for any finite N. • Composition distance: The second term on the r.h.s of Equation (21) has already been discussed in Section 4 and corresponds to a contribution to the entropy variation stemming from how different are the compositions of mixtures A and B when characterised by underlying probability distributions for particle identities. Since D JS (p A |p B ) is a square metric, it becomes strictly zero when the a priori compositions are identical. • Composition realisation: The third and last term identified on the r.h.s of Equation (21) corresponds to the entropy variation owing to realisation considerations. It is worth noting that when the compositions are identical i.e., ) because of the submodular character of Shannon's entropy function [34]. As a result, the entropy variation due to composition realisations is larger than or equal to zero upon mixing even when substances have the same underlying composition.
When split into the three contributions identified in Equation (21), the reasons for an entropy increase upon mixing can be seen in a new light.
First, as noted in [11], conflating the presence of combinatorial terms in an entropy expression with an explicit role of particle identity can be misleading. Indeed, the partitioning entropy term has nothing to do with whether substances A and B are considered identical or not and is therefore not exclusive to the mixing of different substances. It will therefore appear in all mixing scenarios no matter what. This positive increase in entropy owing to partitioning can, at least in principle, be used to do meaningful work as exemplified with Szilard's engine [35]. This positive increase in entropy also further supports the-usually dismissed-intuitive claim that, even when substances are identical, some form of mixing does indeed occur.
The realisation entropy contribution that is positive even when substances have an identical underlying composition can be understood by the fact that upon mixing information on the initial realisations of mixtures A and B has disappeared. Therefore, if one were to insert a dividing wall, even if it happens that there are N/2 particles on each side, the realised compositions on either sides of the wall will be differing from the initial ones, even if selective membrane is used. This difference in realisation can in principle be used to do work by adapting Szilard's engine to incorporate a semi-permeable membrane. This conclusion would not follow, however, if the substances are identical in their underlying composition and p A is exactly equal to 0 or 1 since the consideration on realisations would then become meaningless.
It remains then the square metric contribution to entropy which is the only one to exactly vanish if substances are identical in their underlying composition and to measure in what sense substances A and B differ from each other in principle. For reasons detailed elsewhere [22], we argue that this contribution reflects best Gibbs' original insights on the mixing of substances by diffusion [2] and shall therefore denote ∆S Gibbs mix ≡ k B ND JS (p A |p B ). All of the above considerations with regard to Equation (21) hold for finite size systems. From a standard use of Stirling's approximation, the first term on the r.h.s of Equation (21) vanishes in the thermodynamic limit. At first glance, it is nevertheless unclear how the H(P N,p A ,p B , B N,p C ) behaves in the large N limit. It can be shown (Appendix A) that the term H(P N,p A ,p B , B N,p C ) is at most of order ∼ O(ln N) so that ∆S mix ∆S Gibbs mix = k B ND JS (p A |p B ) (22) for large N.

Discussion
After having briefly presented what we called the textbook treatment of binary mixtures, we looked at a heuristic generalisation which is grounded in the idea that a size-independent definition of a substance necessitates the existence of a prior underlying probability distribution from which are drawn the particles identity. The heuristic formulation works by substituting N 1 the number of particles of type 1 by N p, p being the probability for an individual particle to be of type 1. Following some postulated extensions of the canonical ensemble framework, this led us to Equation (13) which enabled us to discuss how the thermodynamics of a binary mixture can then be equivalent to that of a system of identical particles, provided the composition is kept fixed. Acknowledging various caveats of the heuristic approach, we then developed a more rigorous approach fully compatible with traditional statistical mechanics. This new approach starts by taking seriously the idea of a prior probability distribution underlying the composition of a substance. If that is the case, then two different realisations of the same underlying distribution need not have the exact same composition. Having this in mind, it then conjectured that to obtain a realisation-independent thermodynamics of a binary mixture, one needs to consider the free energies averaged over realisations. This procedure enabled us to retrieve Equation (13) in the large N limit, thus validating the more intuitive approach used in Section 3. Incidentally, this further supports the idea that for dilute gases or substances at fixed composition, the actual polydisperse character of the substance bears no consequences on the thermodynamics of the system and it is enough to consider the system as comprising only-effective-identical particles. It is unclear whether this is purposeful or not, but this equivalence between a binary mixture and a single component system can partially justify-at least a posteriori-the predominance of single component systems in the literature available at undergraduate level.
Aside from the average free energy of a given mixture, the entropy variation upon mixing two different mixtures has also been studied within the new statistical mechanics framework proposed in this paper. It was found that the entropy of mixing comprised three physically different contributions: one owing to partitioning that is not exclusive to the mixing of different substances, one owing to a square distance between the underlying probability distributions for particles' identity and a last one owing to realisation entropy differences. This last term is new and does not vanish when the underlying compositions of the two mixed substances are the same. Upon inspection, the second-metric based-term in Equation (21) is therefore the closest to the entropy of mixing, as discussed by Gibbs in [2]. Finally, we discussed the thermodynamic limit behaviour of Equation (21) and showed that it reduced to the Gibbs entropy (22).
It is worth noting that the proposed theory of binary mixtures relies on the assumption that the substances can be modelled as collections of independent random-identity-variables. In practice, any protocol which behaves the same way with any particle regardless of how many of a specific kind it has already processed, would give rise to a Binomial distribution for particle number of type 1 (resp. 2). However, there can also be non-independent collections of Bernouilli variables with P N,p (N 1 ) = B N,p (N 1 ) (e.g., kinetic proof reading or active sorting) [36]. In such cases, the mathematical framework developed in these pages is still valid, but the asymptotic behaviours of average free energies need to be elucidated for each individual case. In addition, it is important to stress that, if the particles were to interact with each other, that would not necessarily mean that the particles' identity random variables are not independent. Instead, particle interactions could simply be expressed through random matrices [27]. Therefore, one avenue of exploration would be to adapt it to interacting mixtures. Moreover, it is possible to see a parallel with the composition-realisation-based statistical mechanics developed in the present paper and averages over disorder in systems with quenched disorder (see e.g., [37]). This offers possibilities of cross-fertilisation between different branches of statistical mechanics. Now, if the realisation-based approach corresponds to averages over quenched disorder in other fields, it is tempting to draw a parallel between grand canonical ensemble-treatments of mixtures [28] and annealed disorder. The author is not aware of an equation equivalent to Equation (22) in the context of the grand canonical ensemble and this is another route worth exploring.
Finally, in Refs. [22,29], the heuristic approach was used to suggest that both discrete and continuous polydisperse systems were equivalent to a system of identical particles, provided the composition remains the same and a generalised version of Equation (9) was derived for mixtures of arbitrary number of components and composition. Whether these results can remain valid in a version of the present framework extended to more than two components remains to be determined.

Conclusions
In this article we proposed a theory to address the possibility of different realisations of a given composition and obtain realisation-independent free energies in the canonical ensemble. This lead us to find that (a) in addition to the composition entropy discussed in [22,29] a realisation entropy-vanishing in the thermodynamic limit-emerges in the expression of the free energy of a binary mixture, (b) for a fixed composition the thermodynamics of a binary mixture is equivalent to that of a system of identical particles, (c) the entropy change upon mixing two finite size identical substances has two positive contributions owing to firstly an increase in partitioning entropy and secondly a loss of information on substances' realisations and (d) in the thermodynamic limit the mixing entropy is a square norm between the compositions of the substances being mixed. Further work remains to be done to extent this work to more complex mixtures.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Limiting Behaviour of H(P N,p A ,p B , B N,p C )
We wish to determine the asymptotic behaviour of H(P N,p A ,p B , B N,p C ) as N tends to infinity. To this end, we first note that H(P N,p A ,p B , B N,p C ) ≥ 0 as it is the cross entropy over a discrete probability distribution. Next, we recall Jensen's inequality [38], which, for a given random variable X, states that E[ϕ(X)] ≥ ϕ(E[X]), where E[·] stands for the expectation value. If we choose B N,p C (N C 1 )/P N,p A ,p B (N C 1 ) for X and ϕ(x) = ln x, we get , respectively. In the large N limit, the central limit theorem applies and we have Given that N C 1 = N A 1 + N B 1 , we can use Equation (A3) and consider N C 1 as a sum of two normally distributed random variables. Since the sum of two normally distributed random variables with mean µ A (resp. µ B ) and variance σ 2 A (resp. σ 2 B ) is also normally distributed but with mean µ A + µ B and variance σ 2 A + σ 2 B , P N,p A ,p B (N C 1 ) ∼ N N p C , N 2 (p A (1−p A )+p B (1−p B )) (N C 1 ).
Thus, denoting σ 2 C = N(p A (1 − p A ) + p B (1 − p B ))/2, we get that H(P N,p A ,p B , P N,p A ,p B ) ∼ H(N N p C ,σ 2 C , N N p C ,σ 2 which, when combined with Equation (A2), shows that H(P N,p A ,p B , B N,p C ) is at most of order ∼ O(ln N) in the large N limit.

Appendix B. Deriving the Entropy Change Averaged over Compositions
In this section, we want to find an expression for the entropy variation upon mixing two substances averages over all possible realisations of their composition. We start by discussing more formally which average is taken. In our particular case of mixing, the average of any function f (N A 1 , N B 1 ) is defined as From the general definition above, two specific cases are worth looking at: We are now equipped to look at the average of ∆S mix /k B = βF A (N/2, N A 1 , β, V) + βF B (N/2, N B 1 , β, V) − βF C (N, N C 1 , β, V). To this end, we note that the first two free energies are functions of only one of the particle number and will therefore follow Equation (A10). This case is rather straightforward and gives an expression of the form (14): (A13) Next, the average of the free energy once mixing has occurred follows the case described in Equation (A12) and we get: Substituting N C 1 by N p C and subtracting Equation (A14) from (A13) then gives Equation (21).