Mode-Dependent Loss and Gain: Statistics and Effect on Mode-Division Multiplexing

In multimode fiber transmission systems, mode-dependent loss and gain (collectively referred to as MDL) pose fundamental performance limitations. In the regime of strong mode coupling, the statistics of MDL (expressed in decibels or log power gain units) can be described by the eigenvalue distribution of zero-trace Gaussian unitary ensemble in the small-MDL region that is expected to be of interest for practical long-haul transmission. Information-theoretic channel capacities of mode-division-multiplexed systems in the presence of MDL are studied, including average and outage capacities, with and without channel state information.


Introduction
Multimode fiber (MMF) has been employed traditionally in short-distance systems, including local-area networks and data-center interconnects [1][2][3]. In such applications, MMF is often favored over single-mode fiber (SMF) because of relaxed connector alignment tolerances and reduced transceiver component costs. MMF supports propagation of multiple spatial modes having different group delays and potentially different losses. Manufacturing variations, bends, mechanical stresses, thermal gradients and other effects cause coupling between different modes. Even if a signal is launched into a single spatial mode, it tends to couple into multiple spatial modes with different group delays. This modal dispersion [4][5][6] causes intersymbol interference (ISI), and is the primary factor limiting bandwidth-distance products in short-distance MMF systems using direct detection. Mode-dependent loss and gain (collectively referred to as MDL), are typically less important than modal dispersion in shortdistance systems. The characteristics of a MMF, in particular, the modal group delay profile and loss profile, vary along the length of a fiber, and can be considered constant only over a characteristic correlation length. When a fiber is much longer than the correlation length, it can be modeled as a concatenation of numerous sections with independent characteristics. We refer to this as the strong-coupling regime. In a recent paper [7], we studied modal dispersion in MMF in the strong-coupling regime, showing that it is statistically equivalent to a zerotrace random Gaussian Hermitian matrix, or a Gaussian unitary ensemble [8], with eigenvalues corresponding to modal group delays. In the strong-coupling limit, the modal group delays follow a limiting distribution that depends only on a single parameter and the number of modes [7].
In recent works, data rates have been increased by spatially multiplexing several parallel data streams in one fiber. This can be achieved using multi-core fiber [9][10], or by modedivision multiplexing (MDM) in MMF [11][12][13][14][15][16][17][18][19]. Long-haul systems using MDM in MMF are expected to use coherent detection. When using coherent detection, large modal dispersion may necessitate complex signal processing, but it does not fundamentally limit performance or channel capacity. For example, systems using orthogonal frequency-division multiplexing (OFDM) [20][21] can avoid ISI given sufficiently long cyclic prefix and narrow subcarrier spacing, but at the price of a reduced tolerance to fiber nonlinearity and phase noise, unless complex signal processing is used for inter-carrier interference cancellation [22][23]. Likewise, systems using single-carrier modulation can mitigate ISI given a sufficient number of equalizer taps, but at the price of equalization-enhanced phase noise [24][25][26].
Long-haul MDM systems will employ many inline optical components, including amplifiers and switches, which can introduce MDL. Unlike modal dispersion, MDL is fundamentally a performance-limiting factor. In the extreme case, MDL is equivalent to a reduction in the number of propagating modes, leading to a proportional decrease in data rate or channel capacity.
In this paper, we study the statistics of MDL in the strong-coupling regime. In this regime, a MMF can be modeled as a cascade of random matrices [7,27]. By using the central limit theorem for the product of random matrices, we show that MDL follows a limiting distribution that depends only on the number of modes and a single additional parameter. Using known properties of 2 2  and very large matrices, we state two propositions, which are verified through extensive numerical simulation. These two important results are the following: Proposition I: In the strong-coupling regime, when the overall MDL is small, the distribution of the overall MDL (measured in units of the logarithm of power gain or decibels) is identical to the eigenvalue distribution of zero-trace Gaussian unitary ensemble.
Proposition II: In the strong-coupling regime, when the overall MDL is small, the standard deviation (STD) of the overall MDL m dl  depends solely on the square-root of the accumulated MDL variance  via: 2 12 1 mdl (1) If the MMF comprises K independent, statistically identical sections, each with MDL variance 2 g  , we have . In Eq. (1), m dl  and  are measured in units of the logarithm of power gain. 1 Proposition I describes the shape of the MDL distribution, whose spread depends on the variance of the Gaussian unitary ensemble. Proposition II quantifies the spread of the distribution by specifying its STD. Using these propositions, the distribution of MDL can be completely specified by the number of modes and the square-root of the accumulated MDL variance g K   ξ . Propositions I and II have not been proven rigorously. Using numerical simulation, we demonstrate that in the small-MDL region, the shape of the overall MDL distribution essentially depends only on the dimension of the zero-trace Gaussian unitary ensemble, which is the number of modes. For systems with overall MDL in the range of practical interest, 10 dB, the overall MDL (measured in decibels or log power gain) has a distribution very close to the eigenvalue distribution of zero-trace Gaussian unitary ensemble.
In Proposition II, the STD of the overall MDL depends solely on . For modal dispersion in MMF, or for polarization-mode dispersion (PMD) in SMF [28-30], the overall modal group delay spread depends linearly on   K , where   is the STD of the group delay spread per section [7,[28][29][30][31]. For MDL in MMF, the overall MDL depends nonlinearly on g K   ξ , as described by Eq. (1). The nonlinear relationship of Proposition II cannot be proven rigorously in the general case, but related results can be derived analytically in the limit of many modes. Proposition II has been tested by comparing Eq. (1) to numerical simulations. The approximation Eq. (1) is highly accurate for two-mode fibers with g K   ξ up to 10 dB, and the region of its validity increases with an increasing number of modes.
A statistical characterization of MDL can be used to compute information-theoretic channel capacities of MMF with MDL. Instead of using brute-force simulation of a cascade of many independent fiber sections described by random matrices, a zero-trace Gaussian unitary ensemble can be directly generated numerically, and its eigenvalues computed. Alternatively, the known joint probability density of a zero-trace Gaussian unitary ensemble, as given in [7], can be employed. We demonstrate that computations of both average capacities and outage capacities by these methods match those by brute force simulation in the region of small MDL.
The remainder of this paper is organized as follows. Sec. 2 presents the matrix model of MMF with strong mode coupling and provides arguments for Propositions I and II. Sec. 3 presents numerical simulations verifying Propositions I and II. Sec. 4 computes channel capacities of MMF with MDL and compares them to results computed for the zero-trace Gaussian unitary ensemble. Secs. 5 and 6 are discussion and conclusions, respectively.

Matrix Model for Multimode Fiber with Strong Mode Coupling
The model presented here is valid in the strong-coupling regime, where the overall fiber length is much greater than the correlation length over which the MDL profile can be considered constant, such that the fiber can be modeled as a concatenation of many independent sections. The model presented here is valid regardless of the actual correlation length or the statistics of MDL in each individual section.

Random Matrix Model
In the strong-coupling regime, a MMF is divided into K sections, with each section modeled as a random matrix. The length of each section should be at least the correlation length, such that each section can be considered independent of the others [7,27].
Assume that the MMF supports D propagating modes. 2 At an angular frequency , the transfer characteristic of the kth section may be modeled by a matrix ) ω ( where * denotes Hermitian transpose, ) (k U and ) (k V are frequency-independent random unitary matrices representing modal coupling at the input and output, respectively, and ) ω ( is a diagonal matrix representing modal propagation in the kth section. Including both MDL and modal dispersion, can be expressed as: where, in the kth section, the vector describes the uncoupled MDL, and describes the uncoupled modal groups delays. Neither the mean MDL, nor the mean group delay, affect the statistical properties of interest, so without loss of generality, we assume that 0 The overall transfer matrix of a MMF having K sections is: Similar to multi-input multi-output (MIMO) wireless systems [32,33], at any single frequency, using singular value decomposition, the overall matrix ) (t M can be decomposed into D spatial channels: where ) (t U and ) (t V are the input and output unitary beam-forming matrices, and we have defined Here, is a vector of the logarithms of the eigenvalues of , which quantifies the overall MDL of the MIMO system. Our goal here is to study the statistics of the overall MDL described by In an individual section, say the kth section, the uncoupled MDL ) (k g and group delay ) (k τ are modeled as independent of frequency within the band of a signal of interest. In Eq. (5), multiplication of many matrices of the form Eq. (2) causes the overall MDL and overall group delay to become frequency-dependent. In the decomposition given by Eq. (6), from the receiver to the transmitter. In the bit-loading process, both the power and information bits in each spatial channel may be allocated according to the gain vector ) (t g . In the general case that the channel matrix ) (t M is frequency-dependent, frequency-dependent channel decomposition [given by Eq. (6)] and bit loading may be performed. When using OFDM, the bit loading may be performed for each subcarrier or for a group of adjacent subcarriers. Using single-carrier modulation, frequency-dependent channel decomposition and bit loading is possible, but becomes complex to implement in the regime of strong mode coupling [34].
In a long-haul system, the feedback process described above may become impractical if the MMF changes on a time scale shorter than or comparable to the round-trip propagation delay. When feedback becomes impossible, space-time codes [35][36] or error-correction codes across the spatial channels can provide diversity. In these cases, all spatial channels are allocated equal powers.

Measures of Mode-Dependent Loss and Gain
We are now prepared to define properly two terms used in this paper to quantify MDL: accumulated MDL and overall MDL.
Accumulated MDL refers to the sum of the uncoupled MDL values in all K sections comprising a fiber. The variance of the accumulated MDL is: where 2 Overall MDL refers to the end-to-end coupled MDL of a fiber comprising K sections. The overall MDL is computed from the gain vector ) (t g that appears in Eq. (6). The gains in ) (t g are assumed to be ordered as  , and are assumed to sum to zero: When taken together, the D elements of is an element chosen randomly from ) (t g . The STD of overall MDL m dl  has the same units as the gains in ) The mean of the maximum MDL difference,   , quantifies the gain difference between the strongest and weakest modes. In a two-mode fiber,   is commonly referred to as the mean polarization-dependent loss (PDL).
The accumulated MDL variance 2 ξ , given by Eq. (7), does not equal the variance of overall MDL because of the nonlinearity inherent in Proposition II, given by Eq. (1). Much of the complexity of PDL and MDL arises from the nonlinearity of Eq. (1).

Properties of the Product of Random Matrices
Characterizing the statistics of the singular values of , is the key to understanding the performance of MDM systems, as in MIMO wireless systems [32-33]. The analysis here is complicated by the fact that ) (t M is the product of random matrices [see Eq. (5)], rather than the sum of random matrices, as in [7]. Since the early works [37] and [38], there have been many studies on the statistics of the products of random matrices, but most addressed the Lyapunov exponent of the products [39, 40]. We are interested here in the statistics of the eigenvalues of a product of matrices, not the Lyapunov exponent.
Because matrix multiplication is not commutative, i.e., AB is not generally equal to BA , even for square matrices, the logarithm of the product of two matrices, AB log , is not equal to the sum of A log and B log . Unlike the product of positive random variables (which do commute), which has its central limit as the log-normal distribution, the product of positive random matrices (those with positive eigenvalues) does not generally have its central limit as the exponent of a Gaussian unitary ensemble. The central limit theorem for the summation of random matrices is not necessary helpful for the understanding of the products of random matrices.
For any matrix X and a very small number δ , we have Of course, the approximation Eq. (1) is linear when ξ is far less than unity. Numerical simulation shows that Proposition I remains valid for ξ up to 10 dB (about 2.3 in log power gain units), a regime in which, equivalently speaking, A log and B log are not very small. For random matrices of the form Eq. (2), at a single frequency, we are interested in the statistics of the singular values of the product Eq. (5) in the decomposition Eq. (6). Approximate results were obtained for the case that Eq. (2) is 2 2  in the study of PDL in SMF [42][43], and for the case that Eq. (2) has very large size in the study of free random variables [44][45]. In both cases, for positive matrices A and B , the product AB may be interpreted as , similar to free probability theory. The repetition of and [46] showed that for low PDL, the PDL (measured in decibels or log power gain units) has a Maxwellian distribution, the same as the distribution of the group delay in SMF with PMD [28-30]. In a recent study [7], we showed that the Maxwellian is the eigenvalue distribution for a zero-trace 2 2  unitary Gaussian ensemble. The zero-trace 2 2  unitary Gaussian ensemble may be obtained by the summation of many zero-trace 2 2  Hermitian matrices [7]. To compare the results of [42] and [46] for the products of 2 2  random matrices and the results of [7] for the summation of 2 2  random matrices, in terms of its eigenvalues, the central limit for the product of random matrices has the same distribution shape as the central limit for the summation of random matrices. This statement is only valid in the small-PDL limit, but the Maxwellian distribution has been found to be accurate even for large PDL [43]. The model used here is largely the same as that in [46], except that the 2 2  matrices . In free probability theory, the semicircle distribution serves a function analogous to the normal distribution, which is the central limit for the summation of random variables in traditional probability theory [49, Sec. .
In traditional probability theory, the product of independent positive random variables has a central limit as the log-normal distribution. The log-normal distribution models shadowing in wireless systems [50, Sec. 3.2.9] and distortion by stimulated Raman scattering in optical fiber systems [51]. As shown in [52], the log-semicircle distribution is found to be the central limit of the product of positive free random variables if the free random variables have small variance; in the notation used here, this corresponds to defined in Eq. (7), having a small value. Although there have been many studies on the properties of the products of free random variables [53-55], to our knowledge, none have studied the distribution explicitly.
From the results of [52], equivalently speaking, when the MDL is small and in the strongcoupling regime, the overall MDL has a log-semicircle distribution. In [7], it is shown that the modal group delays follow a semicircle distribution in the limit of a large number of modes. MDL (measured in decibels or log power gain) has the same distribution as the modal group delays in the limit of a large number of modes. Expression Eq. (1) is basically equivalent to the results in [52], which were derived from the free probability theory.
Based on the discussion above, Proposition I is correct both for 2 2  and for large matrices, i.e., for SMF with PDL and for a many-mode fiber with MDL. Proposition II is derived from the theory of free random variables. Free random variables are large random matrices that are used to model MMF with a large number of modes. These arguments do not represent rigorous proofs in the general case for Propositions I and II, however. In the next section, numerical simulation is used to justify extending Propositions I and II to the general case, in which the number of modes D ranges from 2 to infinity.

Numerical Simulations of Mode-Dependent Loss and Gain
In this section, we use numerical simulation to verify Propositions I and II. We start by addressing PDL in a two-mode fiber, as analytical approximations are well-known. We then address the many-mode case, comparing to analytical results from free-probability theory. Finally, we address fibers with four and eight modes, to further test our analytical approximations.

Two-Mode Fiber
The simplest possible case, a two-mode fiber, describes PDL in SMF, where approximate analytical results were derived for the small-PDL regime [42] and extended to the large-PDL regime [43]. To facilitate comparison of our results with the previous PDL literature, we will quantify MDL in two different ways in the two-mode case. In order to include fibers with more than two modes, we have defined the STD of the overall MDL m dl , which corresponds to the mean PDL as defined in [42][43].
In [42], instead of using Eq. (1), the STD of overall MDL was approximated by where both m dl σ and ξ were defined as in Eq. (1). The STD of overall MDL given by Eq. (8) From [42], the probability density of PDL (in decibels or log power gain) is the wellknown Maxwellian distribution, which is the same as the distribution of differential group delay in SMF with PMD [28-30]. Fig. 1(a) shows the simulated probability density of the MDL of a two-mode fiber compared with the two-sided Maxwellian distribution given in Table I. The probability densities in Fig. 1(a) are shown with the x-axis normalized by the simulated STD of the overall MDL  mdl . The fiber has K = 256 sections. Each independent fiber section has a gain of Fig. 1(a).
are generated by the method described in the Appendix. The eigenvalues of the cascaded matrix are found numerically. The probability densities in Fig. 1(a) are each obtained from 200,000 eigenvalues. In Fig. 1(a), the simulated probability density matches the two-sided Maxwellian distribution very well for 10 ξ  dB. . Also shown in Fig. 1 times, to be exact) the STD of the overall MDL  mdl .
In Fig. 1(b), we observe that the approximation of  mdl given by Eq. (8) [42] is always larger than the simulated results, but is valid within 0.5 and 1 dB up to ξ = 8 and 9 dB, respectively. The approximation of  mdl given by Eq. (1) is always smaller than the simulated results, but is valid within 0.5 and 1 dB up to ξ = 9 and 12 dB, respectively. While the approximation of  mdl given by Eq. (8) [42] deviates from the simulated results beyond ξ = 10 dB, that given by Eq. (1) lies within 3 dB up to ξ = 20 dB.  Table I. The x-axis for each curve is normalized by the simulated STD of the overall MDL mdl. (b) Comparing the simulated STD of overall MDL with approximations (1) and (8) Fig. 1(b). Our purpose for including large values of MDL in Fig. 1 is to test the validity of Eqs. (1) and (8) in predicting the STD of overall MDL  mdl . The more exact model of [43], which is applicable beyond the low-MDL regime, is not shown in Fig. 1(b). In [43], the STD of overall MDL is given as 9 / ξ 1 ξ σ 2 mdl   , which is very close the approximation given by Eq. (1).
We conclude, similar to [42], that in two-mode fibers having mean MDL differences less than 15 dB, the MDL (measured in decibels or log power gain) follows a Maxwellian distribution, and the approximation Eq. (1) for the STD of the overall MDL is valid. In twomode fibers, Proposition I is valid for  up to 10 dB, corresponding to the STD of overall MDL  mdl up to 13.8 dB and mean MDL difference up to 23.4 dB. The STD of overall MDL, Eq. (1) from Proposition II, is valid for  up to about 15 dB.

Fibers with Large Numbers of Modes
For fibers with a large number of modes, the modal group delays have the same statistical properties as the eigenvalues of a large zero-trace Gaussian unitary ensemble [7]. The eigenvalues of large random matrices have a semicircle distribution, as shown by Wigner [56], and shown to be valid for a large class of large random matrices [48]. The summation of free random variables also gives a semicircle distribution, as known from free probability theory [47].   Fig. 2(a) shows the simulated eigenvalue distribution of a fiber with 64 modes. Similar to the simulation in Fig. 1(a), the fiber has K = 256 sections and all unitary matrices are generated by the method described in the Appendix. Each curve of Fig. 2(a) is constructed using 64,000 eigenvalues. Each MMF section has a modal gain profile α ) ( , where g σ α  . The first 32 modes have gains of + and the last 32 modes have gains of , such that the gains sum to zero. The x-axis of each curve in Fig. 2(a) is normalized by the simulated STD of the overall MDL  mdl . In Fig. 2(a), the simulated eigenvalue distribution is very close to the semicircle distribution given by Table I up to ξ = 15 dB. Comparing Figs. 1(a) and 2(a), the simulated distribution in Fig. 2(a) is closer to the semicircle distribution than the simulated distribution in Fig. 1(a) is to the Maxwellian distribution over a larger range of MDL values. Fig. 2(b) compares the simulated STD of the overall MDL  mdl as a function of g K σ ξ  to the approximation for  mdl given by Eq. (1) for a fiber with D = 64 modes. The approximationEq.
(1) agrees with simulated results within 0.01 dB for ξ up to 10 dB. For ξ from 10 to 20 dB, the overall MDL approximation Eq. (1) is always smaller than the simulated results and the discrepancy between the simulations and the approximation Eq. (1) increases to 15 . 0 dB. The discrepancy may arise from our simulating matrices of size D = 64 instead of infinitely large matrices. The discrepancy may also be caused by numerical uncertainty. Nevertheless, the approximation Eq. (1) can be considered highly accurate.
and zero otherwise.  Table I. The x-axis for each curve is normalized by the simulated STD of the overall MDL mdl. (b) Comparing the simulated STD of overall MDL with approximation (1). The simulated mean maximum MDL difference is also shown for comparison. Fig. 2(b) also shows the simulated maximum MDL difference, which is the mean of the maximum gain difference In the range of  up to 15 dB, where the simulated distribution is close to the semicircle distribution [as seen in Fig. 2(a)], the STD of overall MDL  mdl is as high as 33.4 dB and the maximum MDL difference is as high as 81 dB. Practical MDM system should have MDL well below those values.
We conclude that for fibers with 64 modes, Proposition I is valid for  up to 15 dB, corresponding to  mdl up to 21.2 dB. The overall MDL approximation Eq. (1) from Proposition II is valid for values of  up to 20 dB.

Other Few-Mode Fibers
Figs. 1 and 2 confirm numerically that Propositions I and II are valid for fibers with two and 64 modes. Using [7], the eigenvalues of small size zero-trace Gaussian unitary ensembles can be derived analytically by direct integration. Table I shows those distributions with normalized variances of unity.
Figs. 1(a) and 2(a) show that the simulated distribution is close to the theoretical distribution in the region of small  and small overall MDL. The approximate STD of overall MDL given by Eq. (1) and the simulation results become closer with an increase in the number of modes. The distributions in Table I Table I Table I. The x-axis for each curve is normalized by the simulated STD of the overall MDL mdl. (b) Comparing the simulated STD of overall MDL with approximation (1). The simulated mean maximum MDL difference is also shown for comparison.
In Fig. 3(a), for a fiber with D = 4 modes, we observe that the simulated distribution matches that of Table I until  = 13 dB, a value slightly higher than for D = 2 modes [ Fig.  1(a)], but slightly smaller than for D = 64 modes [ Fig. 2(a)]. As explained in [7], the number of peaks is the same as the number of modes, and each peak corresponds to values where the gain is concentrated. The simulated central two peaks in Fig. 3(a) actually match the theory until  = 17 dB, but discrepancies appear in the two side peaks.
In Fig. 3(b), for a fiber with D = 4 modes, we observe that the approximation for  mdl Eq. (1) is always smaller than the simulated results. The discrepancy is up to 11 . 0 dB for ξ up to 10 dB but increases to 55 . 0 dB for ξ up to 20 dB. In Fig. 4(a), for a fiber with D = 8 modes, we observe that the simulated distributions match those in Table I until  = 14 dB, slightly higher than for D = 2 modes [ Fig. 1(a)], but slightly smaller than for D = 64 modes [ Fig. 2(a)]. Similar to Fig. 3(a), the simulated central six peaks of Fig. 4(a) match the theory until  = 17 dB, but discrepancies appear in the two side peaks.
In Fig. 4(b), for a fiber with D = 8 modes, we observe that the approximation of  mdl Eq. (1) agrees with simulations within 0.02 dB for ξ up to 10 dB. For ξ ranging from 10 to 20 dB, the overall MDL approximation Eq. (1) is always smaller than the simulated results, and the discrepancy increases to a maximum of 04 . 0 dB. We conclude that in fibers with D = 4 and 8 modes, in the range of ξ up to 20 dB, corresponding to a STD of overall MDL up to  mdl = 33 dB, the overall MDL approximation Eq. (1) can be considered accurate for practical purposes. Proposition I can be considered accurate in the small-MDL region, in which is less than 10 to 15 dB, corresponding to the STD of overall MDL  mdl less than 12 to 21 dB, with the accuracy increasing as the number of modes increases. The approximation for  mdl , given by Eq. (1) in Proposition II, is accurate over a range larger than that in which Proposition I is accurate. Specifically, for fibers with more than 8 modes, the approximation

Channel Capacity with Mode-Dependent Loss and Gain
An MDM system in MMF is assumed to use coherent detection and inline optical amplification. The dominant noise at the receiver is assumed to arise from amplified spontaneous emission, and the received noise power spectral density is assumed to be the same in each mode, similar to the assumptions in [12][13][14][15]. The signal-to-noise ratio (SNR) t ρ is defined as the received signal power (total over all D modes) divided by the received noise power (per mode). 3 This SNR definition is similar to that in wireless MIMO systems [33].
For a MMF without MDL, all modes have the same gain and the same received power, and the channel capacity is equal to Fig. 5 shows the channel capacity as a function of SNR t ρ for fibers with different numbers of modes. When the SNR is much greater than the number of modes, the capacity 3 This definition of SNR is compatible with the conservative assumption that the total signal power in all D modes is subject to a constraint independent of D, e.g., by fiber nonlinearity or component costs, while the noise power per mode is independent of D. Nevertheless, our results, if interpreted correctly, do not depend on this assumption in any way. increases almost linearly with the number of modes. When the SNR is much less than the number of modes, the capacity is almost independent of the number of modes. In the limit of an infinite number of modes, the capacity is given asymptotically by e C t 2 log    , which is independent of the number of modes. Fig. 5 shows clearly that multiple modes increase capacity, especially in systems with sufficiently high SNR. The availability of CSI at the transmitter is an essential factor governing channel capacity for MMF with MDL. In the absence of CSI, an increase in MDL always leads to a decrease of capacity. In the extreme limit of MDL, the fiber supports propagation in only one mode. If CSI is available to the transmitter, only the surviving mode is used for transmission, and the channel capacity is ) , which is Eq. (9) with D = 1. If CSI is not available and the surviving mode is not known to the transmitter, the transmitter must allocate equal power to all modes, and the channel capacity is ) As demonstrated below, channel capacity is improved greatly by the availability of CSI. In some situations, when CSI is available, the channel capacity with MDL may exceed Eq. (9).

Average Channel Capacity without Channel State Information
When CSI is not available, the transmitter allocates equal power to each mode. Given the number of modes D and the STD of overall MDL  mdl , the average channel capacity is: where p D (x) is the probability density with unit variance given in Table I, and the constant  is determined by the constraint: If the noise power per mode is normalized to unity, the constant is the total transmitted power and /D in Eq. (10) is the transmitted power per mode. From Sec. 3, the MDL (measured in units of decibels or log power gain) has zero mean. Measured on a linear scale, the overall power loss/gain of a K-section MMF fiber, summed over all D modes, is equal to . The factor given by Eq. (11), normalizes the overall gain to unity, as measured on a linear scale. For a system with MDL, the channel gains are not constant, but are random variables. The SNR t  is the mean (statistical average) SNR, and the factor  given by can be interpreted asthe ratio of the mean SNR to the mean gain of the channel. The product   x mdl exp χ  in Eq. (10) can be interpreted as the SNR of a channel realization with normalized gain x.
Analytical expressions are available for the constant , given by Eq. (11), for fibers with the numbers of modes given in Table I, but are too complicated for practical calculations. There is no known analytical expression for Eq. (10) except for the limit of very large D. Numerical integration of Eq. (10) may be employed to find the channel capacity. For fibers with a large number of modes, MDL is log-semicircle distributed, and the constant becomes is the modified Bessel function of the first kind.
If D is far larger than ) 2 exp( χ mdl   , where 2 mdl is the upper limit of the semicircle distribution, the channel capacity approaches  C . In this limit, each mode is allocated a very small power, and the overall capacity is proportional to the total power. Fig. 6 shows the average capacity with MDL but without CSI, as a function of the squareroot of accumulated MDL variance g K    , for fibers with various numbers of modes D. As CSI is unavailable, equal power is allocated to each mode. The mean SNR is  t = 10 dB. In Fig. 6, the theoretical channel capacity is calculated by using Eq. (1) to find the STD of overall MDL  mdl , and the probability distributions in Table I   In Fig. 6, in the absence of CSI, average channel capacity always degrades with increasing , particularly in fibers with smaller numbers of modes D. For values of  up to 10 dB, the average channel capacities from theory and simulation match very well. For a twomode fiber, theoretical and simulated capacities match within 5% up to  = 11 dB. For a 512-mode fiber, theoretical and simulated capacities match within 5% up to  = 19 dB. The discrepancy between theory and simulation decreases with an increase in the number of modes.

Average Channel Capacity with Channel State Information
When CSI is available at the transmitter, transmit power may be allocated to each mode in an optimal way. If the noise power per mode is normalized to unity and the overall gain vector ) (t g is known, the optimal transmit powers are given by , where    denotes limiting to nonnegative values, i.e.,   The constant  is chosen to satisfy the total power constraint: with  given by Eq. (11). The average channel capacity is which is an expectation over random realizations of the MDL. A brute-force method to compute Eq. (13) involves generating many random realizations of the overall matrix Eq. (5) and computing their singular values to evaluate the average capacity Eq. (13). Each realization of Eq. (5) is the product of 3K matrices. 4 Assuming values of MDL sufficiently small to be of practical interest, Propositions I and II are valid (see Sec. 3). In this regime, a more efficient method of computing Eq. (13) is based on Proposition I, namely, that the joint probability density for the vector ) (t g is given by the eigenvalue distribution of a zero-trace Gaussian unitary ensemble [7]. Hence, a zerotrace random Gaussian Hermitian matrix may be generated using the method described in the Appendix, and the eigenvalues of the random matrix can be used to compute Eq. (13). Also, instead of generating new matrices for different values of , Proposition II can be exploited. A single matrix may be generated, and the eigenvalues can be scaled using the approximation for  mdl , given by Eq. (1). Fig. 7 shows the average capacity for a system with MDL and with CSI, as a function of the square-root of accumulated MDL variance g K    , for fibers with various numbers of modes D. The SNR is  t = 10 dB, representing a statistical average. The theoretical curves are obtained using 100,000 zero-trace random Gaussian Hermitian matrices generated as described in the Appendix. Simulated results are obtained using brute-force generation of channel matrices by the same methods used for Figs. 1-4. In Fig. 7, we see that for fibers with D = 2, 4 or 8 modes, even with CSI, the channel capacity decreases with increasing MDL. With D = 16 modes, at small MDL, the capacity increases with increasing MDL, although it eventually decreases with a further increase in MDL. More generally, MDL can increase capacity when the SNR is small relative to the number of modes D. This may seem counter-intuitive, but can be made plausible by the following example for D = 2 modes. Assume that the total transmitted power is unity, the noise power per mode is unity and the mean SNR is  t = 1. Hence, the mean channel gain is 0 dB, corresponding to unity in linear units. Without MDL, using Eq. (9), the channel capacity is 2log 2 (1+0.5) = 1.17 bit/s/Hz. Now suppose that MDL is present, and that in a particular channel realization, the modal gains are -3.01 dB and +1.76 dB, which correspond to 0.5 and 1.5 in linear units (the mean of these two values is unity, as in the absence of MDL). When CSI is not available, the transmitter allocates powers of 0.5 to each mode.  In Fig. 7, for a two-mode fiber, theoretical and simulated average capacities agree within 5% up to  = 11 dB. For a 16-mode fiber, theoretical and simulated average capacities agree within 5% up to  = 17 dB. The discrepancy decreases with an increasing number of modes.

Outage Capacity
Outage capacity is often considered a better performance measure than average capacity for channels with random gains, such as in an MDM system with MDL [27]. The outage capacity may be computed using randomly generated Gaussian Hermitian matrices, similar to the method used to compute average capacity with CSI, as described in Sec. 4.2. Such a method is much more efficient than the brute-force multiplication of 3K matrices, as given by Eq. (5). With CSI, the outage capacity is computed from a gain vector ) (t g similar to Eq. (13), and is given by: where Pr{ } denotes probability, C out and p out are the outage capacity and outage probability, respectively. The constant  in Eq. (14) is determined by the total power constraint Eq. (12). Without the CSI, the outage probability is given by  , for fibers with various numbers of modes D. The outage probability is out p = 10 -3 , and the SNR is  t = 10 dB. Systems with or without CSI are considered. The theoretical curves are obtained using 100,000 zero-trace random Gaussian Hermitian matrices generated, as described in the Appendix. Simulated results are obtained using brute-force generation of channel matrices, as used for Figs. 1-4. In Fig. 8, we observe that the outage capacity decreases rapidly with increasing MDL for all numbers of modes, with or without CSI. 5 This rapid decrease is caused by a large fraction (or even all) of modes having small gains, and is a consequence of the difference between the average gains measured on linear and decibel scales. The average of the linear gains always exceeds the average of the decibel gains because the arithmetic mean is always larger than the geometric mean. The rapid decrease of outage capacity with increasing MDL can be explained most easily in the two-mode case, although a similar explanation can be applied to any number of modes. Referring to Fig. 1(a), the origin of the x-axis is the mean decibel-scale gain of 0 dB (or the geometric mean linear-scale gain of unity), which is a constant, independent of the value of  mdl . Recall that the stronger and weaker modes always have gains larger and smaller than the origin of Fig. 1(a), respectively. The arithmetic mean of the linear-scale gain is always on the positive side of the origin, and increases with increasing MDL. Now consider a specific numerical example, using only linear gain units. Suppose the MDL is high, and the arithmetic mean gain is 10. Because the geometric mean gain is always unity, the stronger mode has a gain in the range (1,  ) and the weaker mode has a gain in the range (0, 1). Suppose that in a particular realization, the stronger and weaker modes have gains of 10 and 0.1, respectively, an average gain close to the arithmetic mean. Assuming the noise level per mode is unity and CSI is not available, the capacity is log 2 (1+10/2) + log 2 (1+0.1/2) = 2.66 bit/s/Hz. This is far above the outage capacity. Now suppose that in another realization, the stronger and weaker modes both have gains approaching unity, an average gain close to the geometric mean. Again assuming a noise level of unity and no CSI, the capacity approaches 2log 2 (1+1/2) = 1.17 bit/s/Hz. This value is very close to the outage capacity.
In Fig. 8, theoretical and simulated outage capacities match well up to  = 10 dB. A difference up to 5% is observed at higher values of  as the number of modes increases. In all cases, the theoretical outage capacity lies below simulated values, so the theoretically derived estimates are conservative, particularly at larger values of 

Discussion
Based on [43,52], the STD of overall MDL is known to be given exactly by Optical amplifiers may be subject to saturation, which can cause variations in the signal power and noise level along the propagation path. Any power variations that are uniform across modes will not affect the statistics of MDL. Nevertheless, saturation effects may impact the channel capacities discussed in Sec. 4. The capacities computed in Sec. 4 assume that the total launched power levels, equivalent to given by , do not change with the random realizations in MDL. This assumption of constant launched power requires that there be no amplifier saturation caused by MDL. For example, consider a two-mode case with unit total input power, unit noise level (in the absence of saturation), unit SNR, and gains of 0.5 and 1.5 (measured in linear units). In the absence of saturation, the output powers may vary from 0.5 to 1.5, depending on the alignment of the input signal to the principal modes of MDL (i.e., the modes of minimum and maximum gain). When saturation occurs, however, the maximum output powers may be limited to values less than the nominal value of 1.5, while noise levels decrease proportionally to maintain the mean SNR of unity. The model given here does not take account of these effects.
For an input signal vector 0 x and the overall transfer matrix of a K-section fiber given by , but this correlation is not well-known except for very special cases.
In Sec. 4, the noises for the different principal modes are assumed to be independent and identically distributed (i.i.d.). In the strong-coupling regime and with many amplifiers serving as independent sources of amplified spontaneous emission, this assumption should be substantially correct. This can be justified as follows.
For simplicity, suppose the Kth (last) fiber section contains a noise source, and that in the local eigenmodes of the Kth section, the noise powers are 2 , i K  , i = 1, …, D. At the fiber output, the electric fields from this noise source are described by a vector  n where n is a D-dimensional Gaussian noise vector with unit variance in each element. At the fiber output, the noise correlation matrix is: For a given realization of the random unitary matrix ) (K V , the output noises may be neither independent nor identically distributed. Taking an expectation over all random matrices ) (K V yields:  are random unitary matrices, except in special cases. We have studied MDL statistics only for a single frequency. MDL is frequencydependent with a certain coherence bandwidth, which is expected to depend on the modal dispersion in Eq. (3). MDL should be statistically independent for frequencies whose separation is much larger than a certain coherence bandwidth, which should be of the same order as gd 1/σ , where gd σ is the STD of the group delay [7]. If the overall bandwidth of a signal is much larger than the coherence bandwidth, due to frequency diversity, the overall outage capacity should approach the average capacity (shown in Figs. 6 and 7 for cases without or with CSI), instead of the single-frequency outage capacities shown in Fig. 8. As explained earlier, modal dispersion does not fundamentally degrade performance, but does affect receiver complexity.

Conclusions
Two main propositions have been verified here numerically. The MDL in MMF is statistically the same as the eigenvalue distribution of a zero-trace Gaussian unitary ensemble in the small-MDL regime. The STD of the overall MDL may be approximated by Eq. (1) over a wide range of MDL values. Depending on the square-root of the accumulated MDL variance g K  ξ , both propositions are valid up to  = 10 dB for two-mode fibers and up to 6 The law of large numbers is concerned with averages. The overall noise correlation matrix is a summation over the correlation matrices corresponding to all the independent noise sources. The normalized cross-correlation of the noise is characterized by ratios between its off-diagonal elements and its diagonal elements, which are variances proportional to the number of noise sources. Thus, the normalized cross-correlation is implicitly an "average", to which the law of large numbers is applicable.  = 15 dB for fibers with many modes. This range of validity corresponds to a maximum MDL difference up to 23.4 dB in two-mode fibers and nearly 80 dB in fibers with many modes, far larger than values likely to be acceptable for practical long-haul systems.
The channel capacity of MMF has been calculated based on the proposition that the MDL in MMF has the same statistical properties as a zero-trace Gaussian unitary ensemble. Both average and outage capacities, with and without CSI, match results of brute-force simulation in the small-MDL regime.