Universal bounds for imaging in scattering media

In this work we establish universal, ensemble-independent bounds on the mean and variance of the mutual information and channel capacity for imaging through a complex medium. Both upper and lower bounds are derived and depend solely on the mean transmittance of the medium and the number of degrees of freedom $N$. In the asymptotic limit of large $N$, upper bounds on the channel capacity are shown to be well approximated by that of a bimodal channel with independent identically distributed Bernoulli transmission eigenvalues. Reflection-based imaging modalities are also considered and permitted regions in the transmission-reflection information plane are defined. Numerical examples drawn from the circular and DMPK random matrix ensembles are used to illustrate the validity of the derived bounds. Finally, although the mutual information and channel capacity are shown to be non-linear statistics of the transmission eigenvalues, the existence of central limit theorems is demonstrated and discussed.

I. INTRODUCTION
The need to image through a complex scattering medium occurs frequently in, for example, biomedical optics, aerial reconnaissance, remote sensing and astronomy [1][2][3]. Such efforts are, however, frequently impeded since, upon transmission through complex media, the structure of the incident image field is strongly modified, resulting in a randomly varying output speckle pattern bearing little or no resemblance to the original image. To overcome this problem numerous techniques have been developed in recent years. Measurement of the transmission matrix of a scattering medium, for example, allows retrieval of the original image by application of the associated inverse operation [4][5][6]. Short range correlations in the output speckle pattern can also be leveraged to enable numerical image reconstruction by means of either iterative phase retrieval or cross-correlation based algorithms [7][8][9]. Alternatively, single pixel imaging techniques extract information through sequential variation of either the illumination or measurement basis in combination with spatial integration of the output speckle, allowing the initial image to be rebuilt in terms of its constituent spatial modes [10,11].
As a consequence of the inherent randomisation of an image caused by transmission through a scattering medium, a computational step is always required to obtain a final image. Although good imaging results have been reported in a variety of experimental setups, the quality of such computational images can frequently be algorithm dependent, making comparison and benchmarking more difficult, an issue also encountered with other imaging modalities [12][13][14]. Traditionally reported metrics of imaging performance, such as the spatial resolution or fidelity of the output image [15], critically do not distinguish between the detrimental effects of scattering in the medium and data post-processing. Whilst the former is fundamental, the latter can in principle be improved through better algorithm design. It is therefore natural to ask what fundamental limitations are imposed by transmission through a scattering medium, a problem which we consider in this work. We note that this question is also related to the degree of control in wavefront shaping [16]. Considering that the input and transmitted images are highly dissimilar, conventional imaging quality metrics are notably unsuitable to address this question. Instead we adopt an information based perspective, whereby the output speckle pattern is treated as a message from which we extract information about the scene of interest.

* matthew.foreman@imperial.ac.uk
The greater suitability of information based metrics to quantify transmission through disordered media has motivated a number of related studies [17][18][19][20]. For instance, it has been shown that interference effects, which are prevalent in scattering environments, affect the rate of information transmission between antenna arrays [21]. Information-theoretic metrics, such as the channel capacity of an information channel, have furthermore been directly related to, and shown to decrease as a result of, mesoscopic correlations between scattered waves [22]. The mutual information between reflected and transmitted speckle images has also been recently investigated [23]. These studies, however, are typically either limited to the diffusive regime or require a priori statistical knowledge of the scattering properties of the medium. Moreover, focus has generally been restricted to either wireless communications or transmission through wires, and previous results are thus less applicable to an imaging context. In this work we therefore ask whether there exist fundamental system and algorithm independent bounds on how information encoded into images can be transmitted through a complex scattering medium. Making no assumptions on the nature of the scattering medium, other than to restrict to statistical ensembles with a given mean transmittance, we show that the answer to this question is in the affirmative. We derive and discuss these bounds. The structure of this article is therefore as follows. We begin in Section II by formalising the information theoretic treatment of imaging through scattering media, before performing a Monte-Carlo based study of the statistical properties of some common statistical ensembles describing scattering in complex media in Section III.
Derivation and discussion of universal ensemble independent bounds on the mean and variance of our information-theoretic metrics is given in Section IV where comparison to numerical results is also given. Finally, our conclusions are given in Section V.

II. INFORMATION IN IMAGING THROUGH SCATTERING MEDIA
Before it is possible to quantify the effect of transmission through a scattering medium, it is first necessary to formalise the information content of the starting image. To do so we begin by noting that the number of degrees of freedom $N$ of an optical image is, in general, limited. For a digital image these degrees of freedom naturally correspond to the number of pixels present, whilst for analog images the limit can derive from the finite bandwidth with which the image is generated or recorded (i.e. the spatial resolution) [24]. Although finite, the degrees of freedom of an image can nevertheless be used to encode information, such that an image can be considered as a single symbol from an information source with a source alphabet $S = \{S_1, S_2, \ldots, S_N\}$ [25]. As a concrete example, consider a digital image comprising $N$ pixels with a fixed total power corresponding to a single photon. The $N$ symbols can be encoded onto the position of the photon, whereby $S_j$ then corresponds to the photon being registered on the $j$th pixel of the image. The information encoded in the photon's position can then be quantified using the Shannon entropy, $H(S) = -\sum_{j=1}^{N} p(S_j) \log p(S_j)$, where $p(S_j)$ is the probability that the photon is observed in the $j$th image pixel. The total information encoded in an image composed of $n$ photons is thus $nH(S)$. Note that, for ease of notation, we shall use $p(\cdots)$ throughout this work to denote different probability distributions, where the associated relevant random variable will be apparent from the argument.
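As a sketch of this bookkeeping (our own illustrative code, not from the text; the pixel intensities are hypothetical), the source entropy of a photon-position alphabet can be computed directly from normalised pixel intensities:

```python
import math

def shannon_entropy(p, base=2):
    """Shannon entropy H(S) = -sum_j p_j log p_j, with 0 log 0 := 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

# Hypothetical 4-pixel image: normalised intensities give the
# photon-position probabilities p(S_j); a uniform image maximises
# the entropy at log N.
intensities = [1.0, 1.0, 1.0, 1.0]
total = sum(intensities)
p = [i / total for i in intensities]
print(shannon_entropy(p))  # 2.0 bits for a uniform 4-pixel image
```

A non-uniform image yields strictly less than $\log N$, consistent with the maximum-entropy property invoked later in the text.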
Although it is natural to consider each pixel of an image as an individual degree of freedom, it is equally legitimate to instead consider utilising different extended spatial modes, drawn from a complete orthonormal set of basis functions, to encode information. One possible choice of such basis functions is, for example, the Hadamard functions as are frequently used in single pixel imaging [11]. With this interpretation, an arbitrary input image field can be represented as a superposition of spatial modes with associated mode coefficients $\mathbf{a} = [a_1, \ldots, a_N]$. In turn, the probability that a photon, chosen at random from all photons which make up the input image, is in the $j$th mode is given by $p(S_j) \equiv p_j = |a_j|^2 / \sum_{k=1}^{N} |a_k|^2$, where we define the shorthand notation $p_j$ for convenience. Naturally, each pixel can also be considered a spatial mode, whereby $|a_j|^2$ is the intensity of each image pixel.
An image field incident upon a scattering medium generates both a reflected and a transmitted field, both of which can also be represented as superpositions of spatial modes, with mode coefficients $\mathbf{b}$ and $\mathbf{c}$ respectively. Accordingly, the effect of a medium on the incident image can be described using the scattering matrix $\mathsf{S}$ of the medium, viz.
$$\begin{bmatrix} \mathbf{b} \\ \mathbf{c} \end{bmatrix} = \mathsf{S} \begin{bmatrix} \mathbf{a} \\ \mathbf{0} \end{bmatrix}. \quad (1)$$
Note that we assume the number of input and output modes are equal for simplicity. For a lossless system we can express the scattering matrix using the polar decomposition [26]
$$\mathsf{S} = \begin{bmatrix} \mathsf{V} & \mathsf{O} \\ \mathsf{O} & \mathsf{V}' \end{bmatrix} \begin{bmatrix} -\sqrt{\mathsf{I}-\tau} & \sqrt{\tau} \\ \sqrt{\tau} & \sqrt{\mathsf{I}-\tau} \end{bmatrix} \begin{bmatrix} \mathsf{U} & \mathsf{O} \\ \mathsf{O} & \mathsf{U}' \end{bmatrix}, \quad (2)$$
where $\mathsf{O}$ is the null matrix, $\mathsf{U}$, $\mathsf{U}'$, $\mathsf{V}$ and $\mathsf{V}'$ are unitary matrices of singular vectors, and $\tau = \mathrm{diag}[\tau_1, \ldots, \tau_N]$ is the diagonal matrix of transmission eigenvalues. To simplify the analysis we henceforth assume the input and output modes correspond to the singular basis of the medium such that $\mathsf{U}$, $\mathsf{U}'$, $\mathsf{V}$ and $\mathsf{V}'$ in Eq.
(2) can be replaced by the identity matrix. Since this change of basis is performed using a unitary transformation, no information is lost. Within this framework, we consider three scenarios in this work, namely measuring in i) transmission, ii) reflection or iii) both. Specifically, when measuring in transmission we input modes $\mathbf{a}$ and measure the transmitted intensities of each mode, i.e. $|c_j|^2$. The output alphabet contains symbols denoted $T_1, T_2, \ldots, T_N$, corresponding to the $N$ transmission modes. In measuring all $|c_j|^2$ $(j = 1, \ldots, N)$, however, we also learn how much energy is in the aggregate of the reflected modes ($\sum_{j=1}^{N} |b_j|^2$), since by conservation of energy $\sum_{j=1}^{N} |a_j|^2 = \sum_{j=1}^{N} |b_j|^2 + \sum_{j=1}^{N} |c_j|^2$. We denote this additional possible output symbol by $T_{N+1}$, such that the complete output alphabet is $\mathcal{T} = \{T_1, \ldots, T_N, T_{N+1}\}$. Reflection based measurements are similar, albeit we now measure the reflected mode intensities $|b_j|^2$. Again, through this measurement we also learn the total transmitted intensity $\sum_{j=1}^{N} |c_j|^2$, such that the output alphabet is $\mathcal{R} = \{R_1, \ldots, R_N, R_{N+1}\}$. When measurements are made in both reflection and transmission the corresponding output alphabet is $\mathcal{U} = \{R_1, \ldots, R_N, T_1, \ldots, T_N\}$. Note that in this latter case the aggregate output symbols (i.e. the $(N+1)$th outputs) are omitted.
Thus far it has been implicitly assumed that the basis of spatial modes used to express the input and output images is complete. Taking an angular spectrum basis for concreteness, this implies that all spatial frequencies of the original image are incident onto the scattering medium, and similarly that all output spatial frequencies are collected. The former is easily achieved by matching the numerical aperture (NA) of the illumination optics to the spatial bandwidth of the initial image. Use of finite NA collection optics, however, means that light that is scattered out of the medium at large angles is not measured and thus the detection basis is not complete as assumed. This scenario can be approached using filtered scattering matrices, as is detailed further in Ref. [27]. Alternatively, the input and output bases can be expanded so as to include all possible angular modes and the undetected output modes instead incorporated into the aggregate channels described above. For example, when measuring in transmission with a finite NA lens, the aggregate channel T N +1 would include all of the reflected modes in addition to the angular modes lying outside the NA of the collection optics. The latter is preferable from an informatic standpoint since it is more apparent where information is lost in the system.
The quality of information transmission through a scattering medium can be quantified using the mutual information per photon between the measured output mode intensities (including the aggregate modes for Cases i and ii) and the original image, defined as
$$I_\mathcal{N} = H(\mathcal{N}) - H(\mathcal{N}|S), \quad (3)$$
where $\mathcal{N} = \mathcal{T}$, $\mathcal{R}$ or $\mathcal{U}$ for Cases i-iii respectively and
$$H(\mathcal{N}|S) = -\sum_{j} \sum_{k=1}^{N} p(N_j, S_k) \log p(N_j | S_k) \quad (4)$$
is the conditional entropy, or equivocation, of $\mathcal{N}$ given $S$. Noting $p(N_j, S_k) = p(N_j|S_k) p_k$ and $p(N_j) = \sum_{k=1}^{N} p(N_j, S_k)$, the mutual information can be calculated from the source probabilities $p_k$ and the set of conditional probabilities $p(N_j|S_k)$. The latter can be found from the scattering matrix $\mathsf{S}$. From Eq. (1) and Eq. (2),
$$|b_j|^2 = \sum_{k=1}^{N} \rho_k \delta_{jk} |a_k|^2 = \rho_j |a_j|^2, \quad (5)$$
where $\rho_j = 1 - \tau_j$ is the reflectance of the $j$th eigenmode of the scattering medium and $\delta_{jk}$ is the Kronecker delta. Upon normalising by the total incident intensity and comparing to the law of total probability we can make the association $p(R_j|S_k) = \rho_k \delta_{jk}$ for $j = 1, \ldots, N$. Similarly it follows that $p(T_j|S_k) = \tau_k \delta_{jk}$ for $j = 1, \ldots, N$. These probabilities are sufficient when considering Case iii ($\mathcal{N} = \mathcal{U}$), however Cases i and ii require the further probabilities $p(T_{N+1}|S_k) = 1 - \tau_k$ and $p(R_{N+1}|S_k) = \tau_k$ respectively, which follow by summing Eq. (5) (and the equivalent expression for $|c_j|^2$) over $j$. Physically this embodies the fact that the transmittance (reflectance) of a medium describes the fraction of photons that are transmitted (reflected). Substituting these probabilities into Eq. (3) we find
$$I_\mathcal{U} = H(S), \qquad I_\mathcal{N} = H(S) + \Lambda_\mathcal{N} \sum_{j=1}^{N} P_j^\mathcal{N} \log P_j^\mathcal{N} \quad (\mathcal{N} = \mathcal{T}, \mathcal{R}), \quad (6)$$
where we have defined
$$\Lambda_\mathcal{N} = \sum_{k=1}^{N} \eta_k^\mathcal{N} p_k, \qquad P_j^\mathcal{N} = \frac{\eta_j^\mathcal{N} p_j}{\Lambda_\mathcal{N}}, \quad (7)$$
with $\eta_k^\mathcal{T} = \rho_k$ and $\eta_k^\mathcal{R} = \tau_k$. The first thing to note from Eq. (6) is that by measuring the intensity in all possible output modes ($\mathcal{N} = \mathcal{U}$) we are able to extract all the information contained in the original image, as would be intuitively expected given the system is assumed to be lossless. If, however, we do not measure the reflected (transmitted) modes, as in Case i (Case ii), we lose an amount of information equal to $-\Lambda_\mathcal{N} \sum_{j=1}^{N} P_j^\mathcal{N} \log P_j^\mathcal{N}$.
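The decomposition into a source term and an aggregate-channel penalty can be verified numerically. The sketch below (our own code; the eigenvalues and source probabilities are arbitrary illustrative choices) computes the transmission-measurement mutual information both directly from the joint distribution over the alphabet $\{T_1, \ldots, T_{N+1}\}$ and from the closed form $H(S) + \Lambda \sum_j P_j \log P_j$ with $\Lambda = \sum_k \rho_k p_k$ and $P_j = \rho_j p_j / \Lambda$:

```python
import numpy as np

def mutual_info_direct(p, tau):
    """I(S;T) for the channel p(T_j|S_k) = tau_k d_jk, p(T_{N+1}|S_k) = 1 - tau_k."""
    p, tau = np.asarray(p, float), np.asarray(tau, float)
    N = len(p)
    W = np.zeros((N + 1, N))             # W[j, k] = p(T_j | S_k)
    W[:N, :N] = np.diag(tau)
    W[N, :] = 1.0 - tau
    joint = W * p                        # p(T_j, S_k)
    pT = joint.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(np.where(joint > 0,
                                        joint / (pT[:, None] * p[None, :]), 1.0))
    return terms.sum()

def mutual_info_formula(p, tau):
    """I_T = H(S) + Lam * sum_j P_j log P_j with eta_k = rho_k = 1 - tau_k."""
    p, rho = np.asarray(p, float), 1.0 - np.asarray(tau, float)
    H = -np.sum(p[p > 0] * np.log(p[p > 0]))
    lam = np.sum(rho * p)
    P = rho * p / lam
    return H + lam * np.sum(P[P > 0] * np.log(P[P > 0]))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))
tau = rng.uniform(0.05, 0.95, 6)
print(mutual_info_direct(p, tau), mutual_info_formula(p, tau))  # the two agree
```

Natural logarithms (nats) are used here; any consistent base works equally well.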
Physically, $P_j^\mathcal{N}$ represents the probability that a photon, taken from only those modes that are not directly measured, is in the $j$th spatial mode. Accordingly, $-\sum_{j=1}^{N} P_j^\mathcal{N} \log P_j^\mathcal{N}$ can be interpreted as the Shannon entropy contained in only the reflected (transmitted) modes; when considering all possible output modes, however, this information must be weighted by the relative fraction of energy carried by the reflected (transmitted) waves, i.e. by $\Lambda_\mathcal{N}$.
It is well known that Shannon entropy is maximised when the probability of each underlying state is equally likely [25]. For a system with a given transmittance, it is therefore evident from Eq. (6) that the information lost is greatest when the expected energy in each of the lossy modes (i.e. the reflected or transmitted modes for Cases i and ii respectively) is equal, specifically $P_j^\mathcal{N} = 1/N$. If the source entropy is maximised ($p_j = 1/N$), this corresponds to a uniform eigenvalue spectrum. More generally, however, the information loss from transmission through a scattering medium is dependent on both the input image (due to the $p_j$ dependence) and the spectrum of transmission eigenvalues. The channel capacity of the scattering medium, which describes the maximum information that can be transmitted through the medium over the space of all input images, offers a more general, source independent method to quantify information loss. Specifically, for a fixed scattering medium, the channel capacity can be found by maximising the mutual information $I_\mathcal{N}$ with respect to the input probabilities $p_j$, i.e. $C_\mathcal{N} = \sup_{\{p_j\}} I_\mathcal{N}$. Using the standard result for maximum Shannon entropy mentioned above, it follows immediately that $C_\mathcal{U} = \log N$ [25]. To determine $C_\mathcal{T}$ and $C_\mathcal{R}$, however, we must explicitly maximise the mutual information, subject to the constraint $\sum_{j=1}^{N} p_j = 1$, using the method of Lagrange multipliers. So doing yields the result that $C_\mathcal{N} = -\log \tilde{p}_k + \eta_k^\mathcal{N} \log \tilde{P}_k^\mathcal{N}$, where we use the tilde notation to denote optimal quantities, which must hold for all $k = 1, \ldots, N$. Using the definition of $P_j^\mathcal{N}$ to replace $\tilde{p}_k$ in this relation yields, upon rearrangement,
$$\tilde{P}_k^\mathcal{N} = \left( \frac{\eta_k^\mathcal{N}}{\tilde{\Lambda}_\mathcal{N} \exp[C_\mathcal{N}]} \right)^{1/(1-\eta_k^\mathcal{N})}. \quad (8)$$
Summing over $k$ gives the transcendental equation
$$\sum_{k=1}^{N} \left( \frac{\eta_k^\mathcal{N}}{\tilde{\Lambda}_\mathcal{N} \exp[C_\mathcal{N}]} \right)^{1/(1-\eta_k^\mathcal{N})} = 1, \quad (9)$$
which, with knowledge of all transmission eigenvalues, can be solved numerically to find $\tilde{\Lambda}_\mathcal{N} \exp[C_\mathcal{N}]$. The optimal $\tilde{P}_k^\mathcal{N}$ then follow from Eq. (8).
In turn, the input probabilities that maximise the mutual information and achieve the channel capacity follow according to
$$\tilde{p}_k = \frac{\tilde{\Lambda}_\mathcal{N} \tilde{P}_k^\mathcal{N}}{\eta_k^\mathcal{N}}, \quad (10)$$
which further allows $\tilde{\Lambda}_\mathcal{N}$ and $C_\mathcal{N}$ to be calculated individually. Eq. (9) is formally equivalent to the condition found for the channel capacity of a reduced information channel [25]. Although still dependent on the precise details of the spectrum of transmission (or reflection) eigenvalues, i.e. $\eta_k^\mathcal{N}$, in the Supplementary Material we show that both $C_\mathcal{N}$ and $I_\mathcal{N}$ are decreasing functions of $\eta_k^\mathcal{N}$. Accordingly, modes with larger $\eta_k^\mathcal{N}$ individually detract more significantly from the total information content of the measured signal. It should also be noted that in the derivation above it has been assumed that $0 < \eta_k^\mathcal{N} < 1$. If $\eta_k^\mathcal{N} = 0$ or $1$ for some $k$, additional care must be taken in the maximisation. Illustration of how such cases can be approached is given in the Supplementary Material, where we derive the channel capacity for the extreme case of a bimodal information channel, i.e. one for which the $\eta_k^\mathcal{N}$ are Bernoulli random variables. Specifically, we show that when $K$ elements of the vector $\boldsymbol{\eta} = [\eta_1^\mathcal{N}, \ldots, \eta_N^\mathcal{N}]$ are zero, corresponding to so-called open channels [28] ($N - K$ elements are hence unity), the channel capacity is given by
$$C_\mathcal{N} = \log[K + 1 - \delta_{KN}].$$
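A numerical sketch of this recipe (our own implementation; natural logarithms are used, and all $\eta_k$ are assumed to lie strictly between 0 and 1) solves the transcendental condition $\sum_k (\eta_k/x)^{1/(1-\eta_k)} = 1$ for $x = \tilde{\Lambda}\exp[C]$ by bisection and then recovers the capacity:

```python
import numpy as np

def channel_capacity(eta, tol=1e-12):
    """Capacity (nats) from sum_k (eta_k / x)^(1/(1-eta_k)) = 1, x = Lam*exp(C).
    Assumes 0 < eta_k < 1 for all k (no fully open or closed channels)."""
    eta = np.asarray(eta, float)
    f = lambda x: np.sum((eta / x) ** (1.0 / (1.0 - eta))) - 1.0
    lo, hi = np.max(eta), np.max(eta) + len(eta)   # f(lo) > 0 and f is decreasing
    while f(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    x = 0.5 * (lo + hi)
    P = (eta / x) ** (1.0 / (1.0 - eta))   # optimal lossy-mode probabilities
    lam = 1.0 / np.sum(P / eta)            # from normalisation of the p_k
    return np.log(x / lam)                 # C = log(x) - log(Lam)

# Symmetric check: eta_k = 1 - tau for all k gives C_T = tau * log N exactly.
print(channel_capacity([0.5] * 4))  # ~0.6931 = 0.5 * log 4
```

For equal eigenvalues the aggregate channel carries no extra information and the capacity reduces to $\tau \log N$, which the code reproduces.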

III. STATISTICAL PROPERTIES OF INFORMATIC METRICS
The transmission properties of scattering media are in general random and thus the mutual information upon transmission of a known image and the channel capacity of the medium differ case by case. If the scattering matrix of a given medium is known, individual results can nevertheless be calculated. The stochastic variability of these information-theoretic quantities is however intrinsically linked to the underlying probability distribution function (PDF) of the transmission eigenvalues, $p(\tau)$, and thus differs depending on the physics at hand. Chaotic behaviour, for instance, can arise in systems where waves are scattered from structures whose typical dimensions are large relative to the wavelength of the scattered waves, whereby ray dynamics is dominant [29]. Random matrix theory is known to provide a good statistical description of the scattering matrices of such systems [28,30] through use of Dyson's circular ensembles [31]. Random disordered systems, in which incident flux is on average equally distributed among all possible outgoing channels, are however better described using the theory of Dorokhov [32], and Mello, Pereyra and Kumar [33] (DMPK), which describes the evolution of $p(\tau)$ with medium thickness using a Fokker-Planck equation. In each of these cases, and indeed more generally, the precise form of the PDF governing the transmission eigenvalues (and hence $I_\mathcal{N}$ and $C_\mathcal{N}$) is dependent on the number of transmission modes $N$ [30]. To illustrate this point, in Fig. 1 we plot PDFs of the channel capacities $C_\mathcal{T}$ and $C_\mathcal{R}$, calculated using $2 \times 10^4$ realisations of scattering matrices sampled from the circular unitary (CUE) and orthogonal (COE) ensembles, for different $N$. Scattering matrices drawn from the CUE are only constrained so far as to ensure $\mathsf{S}$ is unitary, such that the PDFs found for transmission (blue bars) and reflection (purple) are identical (within statistical fluctuation).
The further constraint of time reversal symmetry is however imposed on scattering matrices sampled from the COE. By virtue of the resulting coherent back scattering, in which time reversed scattering trajectories constructively interfere [30], it is seen that the channel capacity for measurements made in reflection (yellow bars) is on average larger than that for transmission measurements (green). Differences in $p(C_\mathcal{R})$ and $p(C_\mathcal{T})$ are particularly marked for low $N$; for larger numbers of modes, however, both PDFs converge to a normal distribution, suggestive of a central limit theorem (CLT). Similar asymptotic behaviour is also seen for the CUE and DMPK ensembles and for PDFs of the mutual information (not shown).
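The CUE statistics referred to above can be reproduced with a few lines of Monte-Carlo code (our own sketch, using the standard QR-based recipe for Haar-random unitaries): sample $\mathsf{S}$, read off the transmission block $t$, and diagonalise $t^\dagger t$. For the CUE, transmission and reflection are statistically equivalent, so the mean transmittance is $1/2$:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    """Haar-distributed n x n unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases for Haar measure

def transmission_eigs(N):
    """Transmission eigenvalues tau of one CUE scattering matrix realisation."""
    S = haar_unitary(2 * N)
    t = S[N:, :N]                        # transmission block of S = [[r, t'], [t, r']]
    return np.linalg.eigvalsh(t.conj().T @ t).real

N, M = 4, 2000
taus = np.concatenate([transmission_eigs(N) for _ in range(M)])
print(taus.mean())  # ~0.5: CUE transmission and reflection are balanced
```

Feeding these eigenvalue samples into a capacity routine yields histograms of $C_\mathcal{T}$ of the kind shown in Fig. 1.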
Each term in the summation of Eq. (6) (and the equivalent for $C_\mathcal{N}$) can be considered a different random variable; the existence of CLTs (for $\mathcal{N} \neq \mathcal{U}$) for large $N$ is however non-trivial, since non-zero correlations between the transmission eigenvalues violate the usual independence assumptions required for conventional CLTs to hold [34]. It has been shown that any linear statistic of the transmission eigenvalues [35] does still obey a CLT even in the presence of eigenvalue correlations; it is important to note, however, that $I_\mathcal{N}$ and $C_\mathcal{N}$ are not linear statistics of the $\tau_j$. This observation is in contrast to previous works, e.g. [19], and is a result of the aggregate symbols $R_{N+1}$ and $T_{N+1}$ which, by definition, depend on all eigenvalues. This non-linear dependence also implies that the mean properties of $I_\mathcal{N}$ and $C_\mathcal{N}$ are not wholly determined by the average eigenvalue spectrum, since higher order statistical moments can play an important role. CLTs for a restricted class of multilinear statistics have been demonstrated [36]; in the current context, however, the CLT can be most easily justified through extension of the rationale of Ref. [37] for linear statistics. This is detailed in the Supplemental Material. Existence of such CLTs is restricted to the same cases as described in Ref. [35], and they do not exist for all possible ensembles.
Full statistical parametrisation of the channel capacity and mutual information can give deep insights; from a practical point of view, however, the values expected on average, and the degree to which they vary between media of the same statistical class, are more convenient. Accordingly, we here consider $\bar{C}_\mathcal{N} = \mathbb{E}_{\boldsymbol\eta}[C_\mathcal{N}]$ and $\sigma^2_{C_\mathcal{N}} = \mathbb{E}_{\boldsymbol\eta}[C^2_\mathcal{N}] - \bar{C}^2_\mathcal{N}$ (and analogous quantities for $I_\mathcal{N}$), where $\mathbb{E}_{\boldsymbol\eta}[\cdots]$ denotes the statistical expectation over the ensemble of possible $\boldsymbol\eta$ (or equivalently $\tau$). Given the CLTs discussed above, for large $N$ these parameters can be sufficient to uniquely describe the full PDF. Figure 2 shows the dependence of the average channel capacity for transmission measurements ($\bar{C}_\mathcal{T}$) as a function of $N$ (dark gray line with square markers) when the scattering matrix is drawn from the COE, as calculated using Monte-Carlo simulations. The shaded gray area, moreover, depicts the corresponding band defined by $\bar{C}_\mathcal{T} \pm \sigma_{C_\mathcal{T}}$. The average channel capacity is seen to increase sub-linearly as the mode number increases, in contrast to other geometries in which a linear increase has been found [17,21]. Furthermore, Fig. 3 shows $\bar{C}_\mathcal{T}$ for scattering matrices drawn from a DMPK ensemble for disordered media of varying thicknesses $L$ (measured in mean free paths $l$). Scattering matrices in this case were generated using the technique detailed in Ref. [38]. It is evident that the channel capacity in transmission decreases as the mean transmittance of each eigenchannel (which are equal for all eigenchannels within the DMPK model) decreases, since $\bar\tau \sim (1 + L/l)^{-1}$ [39].

IV. UNIVERSAL BOUNDS ON MEAN INFORMATION
Although numerical results, as shown in Figs. 1-3, are insightful, exact analytic results are preferable. The complexity of the PDFs governing $\tau$ however precludes determination of exact analytic results. Moreover, ensemble specific results are somewhat restrictive and not applicable to different classes of scattering media. As such, we instead now consider derivation of ensemble independent informatic bounds. For measurements made in both reflection and transmission (Case iii) our analysis is particularly simple, since all information is retrieved such that $C_\mathcal{U} = \log N$, $\bar{I}_\mathcal{U} = H(S)$ and $\sigma^2_{C_\mathcal{U}} = \sigma^2_{I_\mathcal{U}} = 0$. For Cases i and ii, whilst it is immediately obvious that, regardless of the underlying statistics, $C_\mathcal{N} \le \log N$ and $I_\mathcal{N} \le H(S)$, tighter upper bounds, parametrised only by the mean reflectance or transmittance respectively, can be derived. To do so relies upon the observation that both $C_\mathcal{N}$ and $I_\mathcal{N}$ are convex functions with respect to $\eta_k^\mathcal{N}$, which we prove in the Supplementary Material. Since the analysis is identical for $\mathcal{N} = \mathcal{R}$ and $\mathcal{T}$, we temporarily drop the $\mathcal{N}$ superscripts on $\eta_j$ for clarity.
Convexity of both the channel capacity and mutual information means that lower bounds for their expectations immediately follow from Jensen's inequality [40], namely $C_\mathcal{N}(\bar{\boldsymbol\eta}) \le \bar{C}_\mathcal{N}$ and $I_\mathcal{N}(\bar{\boldsymbol\eta}) \le \bar{I}_\mathcal{N}$, where $\bar{\boldsymbol\eta} = \mathbb{E}_{\boldsymbol\eta}[\boldsymbol\eta]$. For a balanced channel for which all mean eigenvalues are equal ($\bar\eta_j = \bar\eta$ for all $j$) these lower bounds take the simple forms $(1-\bar\eta) \log N \le \bar{C}_\mathcal{N}$ and $(1-\bar\eta) H(S) \le \bar{I}_\mathcal{N}$. Equality is achieved for a deterministic medium with fixed transmittance. We thus note that the channel capacity of a random scattering medium is on average larger than that of a deterministic channel.
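This Jensen-type lower bound is easily illustrated numerically (our own sketch; the Beta-distributed eigenvalues and uniform source are arbitrary illustrative choices). Since the mutual information is convex in the channel parameters for a fixed input distribution, its ensemble average cannot fall below its value at the mean eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)

def mutual_info(eta, p):
    """I = H(S) + Lam * sum_j P_j log P_j for lossy-mode eigenvalues eta."""
    eta, p = np.asarray(eta, float), np.asarray(p, float)
    H = -np.sum(p * np.log(p))
    lam = np.sum(eta * p)
    P = eta * p / lam
    mask = P > 0
    return H + lam * np.sum(P[mask] * np.log(P[mask]))

N, M = 8, 5000
p = np.full(N, 1.0 / N)                    # uniform source
etas = rng.beta(2.0, 5.0, size=(M, N))     # hypothetical random ensemble
mean_I = np.mean([mutual_info(e, p) for e in etas])
I_at_mean = mutual_info(etas.mean(axis=0), p)
print(mean_I >= I_at_mean)  # True: convexity gives E[I] >= I(E[eta])
```

The gap between the two values is the ensemble's "convexity bonus" over a deterministic medium of the same mean transmittance.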
Derivation of an upper bound for $\bar{C}_\mathcal{N}$ again invokes convexity of $C_\mathcal{N}$. Specifically, convexity with respect to $\eta_1$ implies
$$C_\mathcal{N}(\eta_1, \eta_2, \ldots) \le (1 - \eta_1)\, C_\mathcal{N}(0, \eta_2, \ldots) + \eta_1\, C_\mathcal{N}(1, \eta_2, \ldots).$$
Since $C_\mathcal{N}$ is convex with respect to all $\eta_j$, similar inequalities can be sequentially applied, yielding
$$\bar{C}_\mathcal{N} \le \sum_{\mathbf{u} \in \{0,1\}^N} \bar{w}(\mathbf{u})\, C_\mathcal{N}(\mathbf{u}), \quad (11)$$
where $\bar{w}(\mathbf{u}) = \mathbb{E}_{\boldsymbol\eta}\left[\prod_{i=1}^{N} \eta_i^{u_i} (1-\eta_i)^{1-u_i}\right]$ and
$$\sum_{\mathbf{u} \in \{0,1\}^N} \bar{w}(\mathbf{u}) = 1. \quad (12)$$
Accordingly, the upper bound in Eq. (11) represents the weighted average of the channel capacity that can be sent through bimodal information channels (i.e. information channels for which $\eta_i = 0$ or $1$ for all $i$). The weightings in the mixture, however, depend on the statistics of the transmission eigenvalues as parametrised by the set of $\bar{w}(\mathbf{u})$. A universal, i.e. ensemble independent, upper bound on the mean channel capacity can thus be found by maximising the right-hand side of Eq. (11) with respect to the expectations $\bar{w}(\mathbf{u})$. This maximisation is however performed subject to a number of constraints on $\bar{w}(\mathbf{u})$ beyond that given by Eq. (12). Firstly, we note that because $0 \le \eta_i \le 1$ for all $i$, the weights are themselves bounded such that $0 \le \bar{w}(\mathbf{u}) \le 1$. Furthermore, for Case i (ii) we can consider the total reflectance (transmittance) given by $g_\mathcal{N}(\boldsymbol\eta) = \sum_{i=1}^{N} \eta_i$. Noting that $g_\mathcal{N}(\ldots, \eta_j, \ldots) = (1-\eta_j)\, g_\mathcal{N}(\ldots, 0, \ldots) + \eta_j\, g_\mathcal{N}(\ldots, 1, \ldots)$, it follows (similarly to above) that
$$\sum_{\mathbf{u} \in \{0,1\}^N} \bar{w}(\mathbf{u})\, g_\mathcal{N}(\mathbf{u}) = \sum_{i=1}^{N} \bar{\eta}_i, \quad (13)$$
where $\bar{\eta}_i = \mathbb{E}_{\boldsymbol\eta}[\eta_i]$ is the mean of the $i$th transmission eigenvalue. The right-hand side of Eq. (13) physically corresponds to the mean total reflectance (transmittance) of the scattering medium, denoted $\bar{g}_\mathcal{N}$. When considering the universal upper bound on the mean mutual information $\bar{I}_\mathcal{N}$, the maximisation must in general be performed numerically since the bound is strongly dependent on the source image through $p_j$. Analytic results can however be found for the universal upper bound on the channel capacity by noting that $C_\mathcal{N}(\mathbf{u})$ in Eq. (11) represents the channel capacity for a bimodal channel with $K$ open channels, i.e. $C_\mathcal{N}(\mathbf{u}) = \log[K + 1 - \delta_{KN}]$ (see Supplemental Material). Since $C_\mathcal{N}(\mathbf{u})$ depends only on the number of zero elements in $\mathbf{u}$ and not on the ordering of the elements, Eq.
(11) can be written as
$$\bar{C}_\mathcal{N} \le \sum_{K=0}^{N} w_K \log[K + 1 - \delta_{KN}], \quad (14)$$
where $w_K = \sum_{\mathbf{u} \in U_K} \bar{w}(\mathbf{u})$ is the sum of the weights over the set $U_K$ of bimodal channels with $|\mathbf{u}|^2 = N - K$. For a system with total mean reflectance (transmittance) of $\bar{g}_\mathcal{N}$ the constraints can be similarly written
$$\sum_{K=0}^{N} w_K = 1, \qquad \sum_{K=0}^{N} (N - K)\, w_K = \bar{g}_\mathcal{N}. \quad (15)$$
Maximising the right-hand side of Eq. (14) subject to these constraints places all weight on the two values of $K$ adjacent to $N - \bar{g}_\mathcal{N}$, giving
$$\bar{C}^{\max}_\mathcal{N} = (\bar{g}_\mathcal{N} - k) \log[N - k] + (1 + k - \bar{g}_\mathcal{N}) \log[N - k + 1 - \delta_{k0}], \quad (16)$$
where $k = \mathrm{floor}[\bar{g}_\mathcal{N}]$. The corresponding universal upper bound on the channel capacity is hence $\bar{C}_\mathcal{N} \le \bar{C}^{\max}_\mathcal{N}$. Both the upper and lower bounds on the mean channel capacity are shown in Figs. 2 and 3 (dashed and dotted curves respectively), in addition to the weaker upper bound of $\log N$ (dot-dashed curve). Neither the upper nor the lower bound can be improved without further restricting the properties of the statistical ensembles under consideration. We have seen above that when measurements are made in both reflection and transmission (Case iii) all information encoded in the original image can be extracted. It may hence be intuitively expected that if the mutual information (or channel capacity) for transmission measurements is larger, then the corresponding value for reflection measurements is smaller. Each pair of metrics, e.g. $(C_\mathcal{T}, C_\mathcal{R})$, defines an information plane, as illustrated in Fig. 4, on which such relations can be visualised. Through simple manipulation of Eq. (6) and application of Gibbs' inequality [25] it can be shown that $H(S) \le I_\mathcal{R} + I_\mathcal{T} \le 2H(S)$ and similarly $\log N \le C_\mathcal{R} + C_\mathcal{T} \le 2 \log N$. In combination with our earlier bounds, these inequalities imply that a single scattering medium drawn from any ensemble with fixed mean transmittance is described by a single point lying in a triangular region of the associated information plane. Figure 4 illustrates this permissible region (combination of the blue and gray shaded areas) when considering the $(C_\mathcal{T}, C_\mathcal{R})$ plane. Note that for the case of a balanced ensemble, the $C_\mathcal{T} + C_\mathcal{R} = \log N$ boundary (dotted blue line) corresponds to the lower bound given by Jensen's inequality found above.
Channel capacities for individual realisations of scattering media drawn from the DMPK ensemble are also shown in Fig. 4, assuming $N = 25$. Distinct clusters of points are evident, corresponding to differing mean transmittances ($N\bar\tau$), and lie along the parametric curve (solid gray curve with markers) defined by $(\bar{C}_\mathcal{T}, \bar{C}_\mathcal{R})$. Individual realisations are shown for equally spaced values of $\bar\tau$ ranging from 0.95 to 0.15. As discussed above, decreasing $\bar\tau$ corresponds to thicker samples. Average channel capacities for thicker samples are hence again seen to be greater in a reflection modality. Noting that $\log N \le \bar{C}_\mathcal{R} + \bar{C}_\mathcal{T}$ also holds, such parametric curves for the mean channel capacities of other statistical ensembles must also lie within the triangular region shown in Fig. 4. Bounds on $\bar{C}_\mathcal{N}$ derived above, however, further restrict the allowed region to that depicted by the gray shading. $(\bar{C}_\mathcal{T}, \bar{C}_\mathcal{R})$ points corresponding to the COE (purple circle) and CUE (blue square) are also shown and clearly lie within this admissible region.
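The sum rules $H(S) \le I_\mathcal{R} + I_\mathcal{T} \le 2H(S)$ underpinning the information plane can be spot-checked directly (our own sketch; the source probabilities and eigenvalues are arbitrary illustrative draws):

```python
import numpy as np

def agg_info(eta, p):
    """Mutual information with one aggregate output channel; eta are the
    eigenvalues of the *unmeasured* modes (rho for transmission
    measurements, tau for reflection measurements)."""
    eta, p = np.asarray(eta, float), np.asarray(p, float)
    H = -np.sum(p * np.log(p))
    lam = np.sum(eta * p)
    if lam == 0:
        return H
    P = eta * p / lam
    mask = P > 0
    return H + lam * np.sum(P[mask] * np.log(P[mask]))

rng = np.random.default_rng(3)
N = 6
p = rng.dirichlet(np.ones(N))
tau = rng.uniform(0, 1, N)
H = -np.sum(p * np.log(p))
I_T = agg_info(1 - tau, p)   # transmission measurement loses the reflected modes
I_R = agg_info(tau, p)       # reflection measurement loses the transmitted modes
print(H <= I_T + I_R <= 2 * H)  # True
```

The lower limit is approached when all eigenvalues are equal, and the upper limit when every eigenvalue is 0 or 1, mirroring the boundaries of the triangular region in Fig. 4.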
Interestingly, we also find that in the asymptotic limit of large $N$, the upper bound defined by $(\bar{C}^{\max}_\mathcal{T}, \bar{C}^{\max}_\mathcal{R})$ can be well approximated by the mean channel capacity found when the transmission eigenvalues are independent identically distributed Bernoulli random variables with mean $\bar\eta = \bar{g}_\mathcal{N}/N$, namely
$$\bar{C}_\mathcal{N} \approx \sum_{K=0}^{N} \binom{N}{K} (1-\bar\eta)^K \bar\eta^{\,N-K} \log[K + 1 - \delta_{KN}]. \quad (17)$$
The quality of this approximation is shown in Fig. 5. At this point we also note an interesting connection with the results of Ref. [16], in which it was demonstrated that in the diffusive regime there are at best $g_\mathcal{T}$ degrees of freedom when attempting to focus light through a disordered medium. In this scattering regime the eigenvalue spectrum corresponds to a bimodal distribution which is highly concentrated at both $\tau_k \approx 0$ and $\tau_k \approx 1$ [33]. The degrees of freedom available to engineer the light field in a scattering medium are thus those that preserve the information about the source field. Our results show that this intuitive rule also represents a rigorous limit beyond the diffusive scattering regime. Moreover, within an imaging context, we note the aggregate channel provides additional information.
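Under the Bernoulli-eigenvalue picture above, the approximating mean capacity is simply a binomial average of the bimodal-channel capacity $\log[K + 1 - \delta_{KN}]$, with $K$ the number of open channels. A sketch (our own code):

```python
import math

def bernoulli_mean_capacity(N, eta_bar):
    """Mean capacity (nats) when the N lossy-mode eigenvalues are i.i.d.
    Bernoulli with mean eta_bar: K open channels (eta = 0) occur with
    Binomial(N, 1 - eta_bar) probability, each giving log[K + 1 - d_KN]."""
    q = 1.0 - eta_bar                   # P(eta_k = 0), i.e. an open channel
    total = 0.0
    for K in range(N + 1):
        w = math.comb(N, K) * q ** K * (1 - q) ** (N - K)
        total += w * math.log(K + 1 - (1 if K == N else 0))
    return total

print(bernoulli_mean_capacity(25, 0.5))  # lies between 0 and log 25
```

The limiting cases behave as expected: $\bar\eta = 0$ (all channels open) returns $\log N$, and $\bar\eta = 1$ (all channels closed) returns zero.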
Finally, we consider what universal bounds exist for the variance of the channel capacity (our discussion will be solely in terms of the channel capacity; analogous results, however, hold for the mutual information $I_\mathcal{N}$). As with the mean channel capacity, a deterministic scattering medium provides the trivial lower bound on the variance, in which case $\sigma^2_{C_\mathcal{N}} = 0$. For any bounded random variable $X$ ($0 \le X \le 1$) the Bhatia-Davis inequality states that the variance of $X$ has a maximum value of $\bar{x}(1-\bar{x})$ when $X$ is Bernoulli distributed, where $\bar{x}$ is the mean of $X$ (or equivalently the probability that $X = 1$) [41]. Transforming this result onto the problem of determining the maximal value of $\sigma^2_{C_\mathcal{N}}$, we have $\sigma^2_{C_\mathcal{N}} \le \bar{C}_\mathcal{N}(\log N - \bar{C}_\mathcal{N})$. This expression is however not ensemble independent due to the dependence on the mean channel capacity. Instead, the variance must be maximised subject to the inequality constraints derived above. Maximum variance is again achieved when $C_\mathcal{N}$ is Bernoulli distributed, whereby Eq. (18) follows upon maximising $\bar{C}_\mathcal{N}(\log N - \bar{C}_\mathcal{N})$ subject to the bounds on $\bar{C}_\mathcal{N}$. We note that when the CLT discussed above holds, Eq. (18) only gives a loose bound on the variance due to the differing nature of the Gaussian and Bernoulli distributions applicable in each case. The significant difference between the calculated variances and the limiting values is evident in Figs. 2 and 3. An alternative upper bound on the variance of the channel capacity can however also be derived, which can sometimes give slightly improved constraints in comparison to Eq. (18) (this accounts for the slight kink in the blue band plotted in Fig. 3). To do so we first note that $C_\mathcal{N}$ is a non-negative convex function with respect to $\boldsymbol\eta$, such that taking the square preserves convexity [40]. Following a maximisation procedure similar to that given above, albeit for $\mathbb{E}_{\boldsymbol\eta}[C^2_\mathcal{N}]$, gives Eq. (19), where $w_\gamma = 1 - w_\beta$ and we have used the lower limit on $\bar{C}_\mathcal{N}$ to express the bound in an ensemble-independent manner. The explicit expressions for $\beta$, $\gamma$ and $w_\beta$ are given in the Supplementary Material.
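The Bhatia-Davis step can be checked with a toy calculation (our own sketch): for any random capacity confined to $[0, \log N]$, a two-point (Bernoulli) distribution saturates the bound $\bar{C}(\log N - \bar{C})$, while other bounded distributions sit below it:

```python
import math
import random

random.seed(4)
logN = math.log(8)                      # capacity bounded in [0, log N], N = 8

# Bernoulli-type capacity: C = log N with probability q, else 0.
q = 0.3
mean_b = q * logN
var_b = q * (1 - q) * logN ** 2         # population variance
bd_bound = mean_b * (logN - mean_b)     # Bhatia-Davis maximum
print(abs(var_b - bd_bound) < 1e-9)     # True: Bernoulli saturates the bound

# A uniformly distributed capacity sits strictly below the bound.
samples = [random.uniform(0, logN) for _ in range(100000)]
mean_u = sum(samples) / len(samples)
var_u = sum((x - mean_u) ** 2 for x in samples) / len(samples)
print(var_u <= mean_u * (logN - mean_u))  # True
```

The gap in the second case echoes the looseness of the bound observed in Figs. 2 and 3 for near-Gaussian (CLT) capacity distributions.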

V. CONCLUSIONS
In conclusion, in this work we have considered the informational limits on image transmission through complex media using a random scattering matrix based formalism. Information-theoretic quantities, namely the mutual information and channel capacity, were considered in preference to more conventional imaging metrics due to the inherent randomisation, and resulting poor image fidelity, caused by scattering in such media. Through Monte Carlo simulations of media described by the COE, CUE and DMPK matrix ensembles, we have numerically studied the full statistical distribution of these metrics and demonstrated the existence of CLTs in the asymptotic limit of large mode numbers. Formal existence conditions for such CLTs were also highlighted.
Whilst such numerical and ensemble specific results are both interesting and useful, they are nevertheless limited in scope. In this work, we have therefore established universal upper and lower bounds on the mean mutual information and channel capacity of image transmission through a complex medium. Specifically, the lower bound was found to match that of a fixed transmittance deterministic channel, whereas the upper bound corresponds to a mixture of bimodal channels. For systems with a large number of degrees of freedom, the upper bound on channel capacity was found to be well approximated by that of a bimodal channel with independent identically Bernoulli distributed transmission eigenvalues. Bounds on the variance of the channel capacity were also derived, albeit found to provide only loose bounds for the numerical cases considered, since limiting values of the variance are achieved when the channel capacity is Bernoulli distributed. Notably, the limits found here do not require any a priori statistical knowledge of the medium other than the mean transmittance, and are applicable beyond the more usual diffusive regimes considered in the literature. Given their ensemble independent nature, these bounds hence act as fundamental limits in imaging through scattering media and provide a benchmark to evaluate the many emerging techniques for imaging through complex media.

S1. MEAN CHANNEL CAPACITY FOR BERNOULLI DISTRIBUTED TRANSMISSION EIGENVALUES

In this section we determine the average channel capacity for scattering media for which the eigenvalues are independent identically distributed Bernoulli random variables. As part of our derivation we will also find the channel capacity for an information channel with $K$ open sub-channels and $N - K$ closed channels. Our derivation here also serves as an illustration of how to determine the channel capacity in the case when some $\eta_i^N$ are exactly equal to unity and/or zero.
We begin by considering the mutual information of an information channel as given by Eq. (6) of the main text. To maximise $I_N$ with respect to the source probabilities $p_j$, subject to the constraint $\sum_{j=1}^N p_j = 1$, we construct a Lagrangian in which $\alpha$ is a Lagrange multiplier, and evaluate its derivative with respect to $p_k$. We now assume that $\eta_k^N = 1$ or $0$ for all $k$ and that we have ordered the sub-channels as in Eq. (S4), i.e. such that there are $m$ closed sub-channels and $K = N - m$ open sub-channels. As in the main text, in this case we define $u = [\eta_1^N, \ldots, \eta_N^N]$, from which the required derivatives follow accordingly. Upon substitution of these derivatives into Eq. (S3) and equating the derivative of the Lagrangian to zero, we find Eq. (S7). Summing Eq. (S7) over $k$ and enforcing the constraint $\sum_{k=1}^N p_k = 1$ gives the optimal source probabilities, where we have introduced the tilde notation to denote optimal source probabilities, which are also dependent on which alphabet $N$ we measure. To determine the optimal probabilities we use Eq.
Although we have assumed the specific ordering of $\eta_k^N$ given by Eq. (S4), the derivation is unaffected upon permutation of the elements of $u$. We note that thus far in our derivation we have assumed a fixed $u$, and thus have not allowed for any randomness in our bimodal information channel. For the case that the transmission eigenvalues are independent identically distributed Bernoulli random variables, $m$ is a binomial random variable with a corresponding PDF expressed in terms of the Dirac delta function $\delta(\cdots)$. Note that since the transmission eigenvalues are identically distributed, $\bar{\eta}_j = \bar{\eta}$ for all $j$. Determination of the mean capacity of a bimodal channel then follows simply, as given by Eq. (S14).
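The binomial averaging over the number of closed channels $m$ can be sketched numerically. In the snippet below (an illustrative sketch, not the derivation itself), `f` is a placeholder for any per-realisation statistic of the $K = N - m$ open channels, standing in for the fixed-$u$ capacity averaged in Eq. (S14); `N` and `eta_bar` are arbitrary assumed values:

```python
from math import comb

N, eta_bar = 8, 0.6  # illustrative channel count and Bernoulli mean

def p_closed(m):
    # With iid Bernoulli(eta_bar) eigenvalues, the number of closed
    # channels m is binomial: P(m) = C(N, m) (1 - eta_bar)^m eta_bar^(N - m).
    return comb(N, m) * (1 - eta_bar) ** m * eta_bar ** (N - m)

# The binomial weights form a probability distribution over m = 0, ..., N.
assert abs(sum(p_closed(m) for m in range(N + 1)) - 1.0) < 1e-12

def mean_statistic(f):
    # Average any per-realisation statistic f(K) of the K = N - m open
    # channels over the binomial weights (the averaging used in Eq. (S14)).
    return sum(p_closed(m) * f(N - m) for m in range(N + 1))

# Sanity check: the mean number of open channels recovers N * eta_bar.
assert abs(mean_statistic(lambda K: K) - N * eta_bar) < 1e-12
```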

S2. CENTRAL LIMIT THEOREM FOR NON-LINEAR STATISTICS OF $\eta_j$
In Ref. [1] Politzer presents a formal proof that the asymptotic probability distribution function of any linear statistic $A = \sum_i \mu(\eta_i)$ of the eigenvalues, here denoted $\eta_i$ ($i = 1, \ldots, N$), is Gaussian. In his proof Politzer describes how correlations between eigenvalues can be interpreted as $N$-body forces. In particular, a random ensemble of matrices with $N$-eigenvalue forces can be expressed such that the probability of a set of eigenvalues $\{\eta_j\}$ is proportional to the form of Eq. (S15), where $V(\eta_j)$ are effective one-body external potentials chosen such that the ensemble has the same eigenvalue density $\rho(\eta)$ and two-point correlation function $K(\eta, \eta')$ as the ensemble with the original $N$-body forces. Politzer then proceeds to consider perturbation of the eigenvalue probability distribution by an additional factor of $\exp[\sum_i \mu(\eta_i)]$, where $A = \sum_i \mu(\eta_i)$, such that the eigenvalue density is modified to $\rho(\eta) + \delta\rho(\eta)$. The final step of Politzer's proof is to show that the perturbation in the eigenvalue density $\delta\rho$ is linear in $\mu$, such that the central limit theorem applies. In our case, we can follow analogous steps; however, we now perturb the eigenvalue probability distribution of Eq. (S15) by a non-linear statistic of the form $A = \sum_i h_i(\eta)$, which again perturbs the eigenvalue density to $\rho'(\eta) = \rho(\eta) + \delta\rho(\eta)$. This non-linear perturbation corresponds to the introduction of a perturbing potential with complicated $N$-body forces. Following the arguments of Politzer used to justify the form of Eq. (S15), it is, however, possible to replace the perturbed ensemble (when certain smoothness criteria are met [2]) with an ensemble whose probability distribution takes an analogous one-body form, whilst maintaining the form of $\rho'(\eta)$ and the perturbed correlation function up to order $1/N$. With this linearised form, the proof of the CLT proceeds identically to that given in Ref. [1].
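A minimal Monte Carlo sketch of such a CLT for a linear statistic is given below, using the CUE with $A = \sum_i \cos\theta_i = \mathrm{Re}\,\mathrm{Tr}\,U$ for the eigenphases $\theta_i$ (an illustrative choice of statistic and ensemble, not the one analysed in the main text). Eigenvalue repulsion keeps the variance $O(1)$ as $N$ grows (here $1/2$ for $\mathrm{Re}\,\mathrm{Tr}\,U$), and the sampled distribution is close to Gaussian:

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_unitary(n):
    # Sample from the CUE (Haar measure on U(n)) via QR of a complex
    # Ginibre matrix, with the standard phase correction on R's diagonal.
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

n, trials = 8, 4000
# Linear statistic A = sum_i cos(theta_i) of the eigenphases theta_i.
A = np.array([np.cos(np.angle(np.linalg.eigvals(haar_unitary(n)))).sum()
              for _ in range(trials)])

# Mean ~ 0, variance ~ 1/2, and roughly 68% of samples within one
# standard deviation, consistent with an approximately Gaussian law.
print(A.mean(), A.var())
```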

S3. CONVEXITY OF MUTUAL INFORMATION AND CHANNEL CAPACITY WITH RESPECT TO TRANSMISSION EIGENVALUES
To prove convexity of the mutual information $I_N$ with respect to the parameters $\eta_j$, we show that its Hessian matrix $H_\eta$ is positive semi-definite. We must thus evaluate the derivatives $\partial^2 I_N/\partial\eta_j\partial\eta_k$. Note that we drop the $N$ notation throughout this section for clarity. Consider then the first-order derivative $\partial I_N/\partial\eta_k$, in which we have used the derivatives $\partial\Lambda_l/\partial p_k = \eta_k$ and $\partial P_j/\partial p_k = \eta_j[\delta_{jk}/\Lambda_l - p_j\eta_k/\Lambda_l^2]$, and where the last step follows since $\sum_{j=1}^N \partial P_j/\partial\eta_k = \partial[\sum_{j=1}^N P_j]/\partial\eta_k$, where $\sum_{j=1}^N P_j = 1$ is a constant. Since $p_j$ and $P_j$ lie in the range $[0, 1]$, it follows that the first derivative is always negative, i.e. the mutual information is a decreasing function with respect to all $\eta_j$. The second-order derivative then follows. Consider then the quadratic form $\sum_{k,l} x_k x_l\, \partial^2 I_N/\partial\eta_k\partial\eta_l$ (S23). From the Cauchy-Schwarz inequality it however follows that $x^T H_\eta x \ge 0$, i.e. the Hessian matrix is positive semi-definite. Since the Hessian matrix is positive semi-definite, the mutual information $I_N$ is a convex function of $\eta$ for any fixed set of probabilities $\{p_j\}$. At this point we can, however, use the result that if $f_1(x), f_2(x), \ldots, f_N(x)$ are convex functions of $x$, then their point-wise maximum (i.e. $\sup_j f_j(x)$) is also convex in $x$ [3]. Accordingly, it follows that $C_N$ is a convex function of $\eta$, since it is given by the supremum of $I_N$ with respect to the source probabilities.
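The point-wise-maximum lemma invoked in the final step can be checked numerically. The sketch below (illustrative only; the family of convex quadratics is an arbitrary assumption) verifies midpoint convexity of the maximum over the family:

```python
import numpy as np

rng = np.random.default_rng(1)

# A family of convex functions f_j(x) = a_j x^2 + b_j x + c_j with a_j >= 0.
a = rng.random(5)
b = rng.standard_normal(5)
c = rng.standard_normal(5)

def g(x):
    # Point-wise maximum over the family; convex since every f_j is convex.
    return np.max(a * x**2 + b * x + c)

# Midpoint convexity on random pairs: g((x+y)/2) <= (g(x) + g(y))/2.
for _ in range(1000):
    x, y = rng.uniform(-5, 5, 2)
    assert g(0.5 * (x + y)) <= 0.5 * (g(x) + g(y)) + 1e-12
```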

S4. UPPER INFORMATIONAL BOUNDS
In this section we seek to prove that inequality (11) of the main text gives an upper bound no worse than $\log N$, and similarly that the analogous expression for the mutual information $I_N$ is no worse than the source entropy $H(S)$. We recall Eq. (11). We first note that for any $u$ the inequality $C_N(u) \le \log N$ holds, which in turn allows us to factor this term out of the summation. Exchanging the order of summation and expectation yields an expression for $\bar{C}_N$. Study of the combinatorics of the summation and product terms quickly reveals that
$$\sum_{u \in \mathcal{P}} \prod_{i \in A_u} (1 - \eta_i) \prod_{j \in B_u} \eta_j = 1 \qquad \text{(S29)}$$
such that $\bar{C}_N \le \log N$. The derivation for $I_N$ is formally equivalent, except that the initial step requires the inequality $I_N(u) \le H(S)$.
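The combinatorial identity of Eq. (S29) can be verified directly by brute-force enumeration over all $2^N$ bimodal channels $u$. In this illustrative sketch, `eta` is an arbitrary assumed vector of transmission probabilities; each $u$ closes the channels in $A_u$ and opens those in $B_u$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N = 6
eta = rng.random(N)  # arbitrary transmission probabilities in [0, 1]

# Sum the Bernoulli weight prod_{i in A_u}(1 - eta_i) prod_{j in B_u} eta_j
# over all 2^N open/closed patterns u.
total = 0.0
for u in itertools.product([0, 1], repeat=N):
    w = 1.0
    for eta_i, open_i in zip(eta, u):
        w *= eta_i if open_i else (1.0 - eta_i)
    total += w

# Eq. (S29): the weights form a probability distribution, so they sum to 1.
assert abs(total - 1.0) < 1e-12
```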

S5. MAXIMUM CHANNEL CAPACITY PROOF
In the main text we derived the ensemble specific upper bound on the mean channel capacity of a scattering medium, as described by Eq. (14); specifically, we demonstrated that $\bar{C}_N \le C$. In this section we show that among all possible ensembles with a fixed mean total transmittance, the universal upper bound is given by $\sup_{\{w_K\}} C = \bar{C}_N^{\max}$, where $\bar{C}_N^{\max}$ is given by Eq. (16) of the main text. Specifically, we note that the maximisation of $C$ is subject to the constraints of Eqs. (S31)-(S33). Although the constraint given in Eq. (S33) was not explicitly given in the main text, it follows by observing that $0 \le w(u) \le 1$, that $w_K = \sum_{u \in U_K} w(u)$, and that the set $U_K$ has $\#U_K = \binom{N}{K}$ distinct elements. Eq. (S31) and Eq. (S32) can be used to eliminate two of the $w_K$ from the complete set of $N$. Specifically, we consider expressing $w_i$ and $w_{i-1}$ in terms of the remaining $w_K$, which follows from Eq. (S31) and Eq. (S32). We now consider the explicit difference $\Delta = \bar{C}_N^{\max} - C$. From Eq. (S39) it is first observed that when $w_K = 0$ for $\{K;\ 0 \le K \le N,\ K \ne N-k, N-k-1\}$, the difference between $C$ and $\bar{C}_N^{\max}$ is identically zero. Noting further that the bracketed factor in Eq. (S39) is negative for all $0 \le K \le N$ and $0 \le k \le N-1$ (as can easily be seen by inspection of the function), it follows that for any $w_K \ge 0$ ($\{K;\ 0 \le K \le N,\ K \ne N-k, N-k-1\}$) the difference $\Delta$ is positive, i.e. $C \le \bar{C}_N^{\max}$. Positivity of $w_K$ therefore ensures that $\bar{C}_N^{\max}$ represents the universal upper bound on the mean channel capacity.