Method for computing the optimal signal distribution and channel capacity

An iterative method for computing the channel capacity of both discrete and continuous input, continuous output channels is proposed. The efficiency of new method is demonstrated in comparison with the classical Blahut – Arimoto algorithm for several known channels. Moreover, we also present a hybrid method combining advantages of both the Blahut – Arimoto algorithm and our iterative approach. The new method is especially efficient for the channels with a priory unknown discrete input alphabet. © 2015 Optical Society of America OCIS codes: (060.2330) Fiber optics communications; (060.4510) Optical communications. References and links 1. C. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal 27, 379–423 (1948). 2. S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Trans. Inf. Theory 18, 14–20 (1972). 3. R. E. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inf. Theory 18, 460– 473 (1972). 4. C. Chang and L. D. Davisson, “On calculating the capacity of an infinite-input finite (infinite)-output channel,” IEEE Trans. Inf. Theory 34, 1004–1010 (1988). 5. J. Huang and S. P. Meyn, “Characterization and computation of optimal distributions for channel coding,” IEEE Trans. Inf. Theory 51, 2336–2351 (2005). 6. P. O. Vontobel, A. Kavcic, D. M. Arnold, and H. A. Loeliger, “A generalization of the Blahut-Arimoto algorithm to finite-state channels,” IEEE Trans. Inf. Theory 54, 1887–1918 (2008). 7. C.-I. Chang and S. C. Fan, “The capacity of discrete-time memoryless Rayleigh-fading channels,” Information and Computation 86, 1–13 (1990). 8. G. Matz and P. Duhamel, “Information geometric formulation and interpretation of accelerated Blahut-Arimototype algorithms,” in “Information Theory Workshop, 2004. IEEE,” (2004), pp. 66–70. 9. A. Mecozzi and M. Shtaif, “On the capacity of intensity modulated systems using optical amplifiers,” IEEE Photon. Technol. Lett. 13, 1029–1031 (2001). 10. K.-P. Ho, “Exact evaluation of the capacity for intensity-modulated direct-detection channels with optical amplifier noises,” IEEE Photon. Technol. Lett. 17, 858–860 (2005). 11. K. S. Turitsyn and S. K. Turitsyn, “Nonlinear communication channels with capacity above the linear Shannon limit,” Opt. Lett. 37, 3600–3602 (2012). 12. M. A. Sorokina and S. K. Turitsyn, “Regeneration limit of classical Shannon capacity,” Nature Communications 5, 3861 (2014). 13. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in Fortran (Cambridge University Press, Cambridge — New York, 1992). 14. G. Golub and C. F. Van Loan, Matrix Computations, 3rd edition (Johns Hopkins University Press, Baltimore, MD, 1996). 15. C. Shannon, “Probability of error for optimal codes in a Gaussian channel,” The Bell System Technical Journal 38, 611–656 (1959). 16. F. W. J. Ovler, D. W. Lozier, R. F. Boisvert, and C. W. Clark, NIST Handbook of Mathematical Functions (Cambridge University Press, New York, 2010). #236792 $15.00 USD Received 24 Mar 2015; revised 22 May 2015; accepted 22 May 2015; published 1 Jun 2015 © 2015 OSA 15 Jun 2015 | Vol. 23, No. 12 | DOI:10.1364/OE.23.015119 | OPTICS EXPRESS 15119 17. I. C. Abou-Faycal, M. D. Trott, and S. Shamai (Shitz), “The capacity of discrete-time memoryless Rayleighfading channels,” IEEE Trans. Inf. Theory 47, 1290–1301 (2001). 18. P. P. Mitra and J. B. Stark, “Recent advances in coherent optical communication,” Nature 411, 1027–1030 (2001). 19. R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, “Capacity limits of information transport in fiberoptic networks,” Phys. Rev. Lett. 101, 163901 (2008). 20. R. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightwave Technol. 28, 662–701 (2010). 21. M. A. Sorokina, S. Sygletos, and S. K. Turitsyn, “Optimization of cascaded regenerative links based on phase sensitive amplifiers,” Opt. Lett. 38, 4378–4381 (2013). 22. A. D. Ellis, J. Zhao, and D. Cotter, “Approaching the non-linear shannon limit,” J. Lightwave Technol. 28, 423– 433 (2010). 23. D. J. Richardson, “Filling the light pipe,” Science 330, 327–328 (2010). 24. A. S. Holevo and V. Giovannetti, “Quantum channels and their entropic characteristics,” Rep. Prog. Phys. 75, 046001 (2012). 25. V. Giovannetti, S. Lloyd, L. Maccone, and J. H. Shapiro, “Electromagnetic channel capacity for practical purposes,” Nat. Photon. 7, 834–838 (2013). 26. M. Katz and S. Shamai (Shitz), “On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels,” IEEE Trans. Inf. Theory 50, 2257–2270 (2004). 27. C. Hager, A. Graell i Amat, A. Alvarado, and E. Agrell, “Constellation optimization for coherent optical channels distorted by nonlinear phase noise,” in “Global Communications Conference (GLOBECOM), 2012 IEEE,” (2012), pp. 2870–2875. 28. E. Agrell, A. Alvarado, G. Durisi, and M. Karlsson, “Capacity of a nonlinear optical channel with finite memory,” J. Lightwave Technol. 16, 2862–2876 (2014).


Introduction
Calculation of Shannon capacity of communication channels is a challenging mathematical problem that is important for design of practical systems.In optical communications complexity of this problem is increased due to nonlinear properties of fibre channels and memory introduced by fibre dispersion.Numerical methods developed in the past for computation of channel capacity face various new challenges and have to be modified and further improved.In this work we propose two new methods for numerical calculation of the Shannon capacity.We focus here on the explaining how to constructively use the new approaches rather than on fine mathematical details and proofs.
We consider a problem of numerical computation of the maximum of the mutual information functional introduced by Shannon [1]: dxdy P(y|x)P(x) log 2 P(y|x) dx P(y|x )P(x ) , here P(y|x) is the given conditional probability of detecting output signal y for an input signal x; both variables x and y may be discrete or continuous; here we consider only memoryless channels for simplicity of presentation and comparison with known methods, however, we would like to stress that our method can be also extended to channels with finite memory; P(x) is the probability density function of the input signal x.Consider P(y|x) to be a twice differentiable function of x.When variable x is discrete dx . . . is changed by the summation over the discrete support.The maximal value of functional (1) should be found under the condition of a bounded from above average power S: The Shannon functional optimization problem has analytical solutions only for a very limited class of communication channels and, as a rule, requires numerical calculations.
The most known and widely used numerical method in this field is the Blahut -Arimoto algorithm (BAA) [2,3] that was originally developed for finite discrete alphabets of both x and y.The BAA has a classical status and plays a fundamental role in numerical computations of capacity for various communication channels (see e.g.[4][5][6] and references therein).In this work we consider a general case of continuous-input, continuous output channel.In general, distribution P(x) that maximizes the mutual information (the functional presented in ( 1)) for such channels can be continuous or discrete or mixed (including both discrete and continuous parts).The discrete distribution formally can be presented using the continuous description as a set of Dirac delta functions with different coefficients.For channels with continuous input, continuous output the original BAA was modified [7,8].Our aim here is to introduce a new numerical method of channel capacity evaluation and demonstrate its advantage compared to the BAA for several channels considered below.The new point used in our method in application to continuous input, continuous output channels is an adaptive iterative tuning of the discrete approximations of continuous signal x.
The proposed method is a technical tool for optimizing the mutual information with respect to variable x.As distinct from BAA we do not consider the input alphabet as a fixed set of values, but something to be determined through numerical optimization.Generally, a wide class of distributions can provide the value of functional (1) close to its limit [5].The proposed method makes it possible to narrow down this class.
To illustrate the proposed relaxation method and to make a comparison with the BAA we consider three channels: (i) the classical Gaussian linear channel, just to set a reference point, (ii) the Rician channel that is known in optical communications as intensity modulation direct detection channel [9,10], and (iii) recently introduced (in the context of optical communications) nonlinear continuous input, continuous output regenerative channel.The latter includes cascaded interleaving of additive noise with nonlinear signal transformation x → x + α sin β x, where α, β are given parameters.Such nonlinear transformation (that effectively restricts the signal corruption by noise) can lead to channel capacity exceeding the capacity for the corresponding linear additive white Gaussian noise channel [11,12].However, this is possible only for an optimal input signal alphabet making this an interesting and challenging optimization problem that we use for testing the new method and its comparison to BAA.The choice of channels for comparison is not critical and we demonstrate that both the classical BAA and our method lead to exactly the same results in all three cases, however, with different computational complexity.
The paper is organized as follows.Section 2 describes the proposed method of computing the maximum of mutual information functional.Section 3 presents the numerical results for three channels chosen for comparison.In Section 3.1 the relaxation and the BAA methods are briefly compared with the analytical formula for the linear Gaussian channel.In Section 3.2 we compare the new method with the BAA for the Rice distribution channel whose capacity was first numerically found in [9,10].Section 3.3 is focused on the nonlinear noise squeezing channel comparative study by both methods.In Section 4 the difference between the two methods is explained and the hybrid approach, combining BAA and the relaxation method is tested, showing excellent performance.Finally, Section 5 presents our conclusions.We take out the more rigorous presentment of the numerical procedure into the Appendix.

Relaxation method
Definite integrals over a finite segment [a, b] can be approximated by the Rieman sums where f (x), g(x) are the continuous functions, L is the number of intervals.Arbitrary accuracy of the numerical approximation is controlled by the smallness of ∆ i .Denote p i = g(x i )∆ i , then where δ (x) is the Dirac delta function.The numbers p i can be chosen rational, i.e. there exist integer positive numbers n 1 , n 2 , . . ., for arbitrary ε while M being sufficiently large.Let the values of x i are equidistant: Then one can present the function g(x) as follows: We replace the set of intermediate points x 1 , . . ., x L−1 by new primed set (x 1 , . . ., x M ), where each point x i enters the new set several times with multiplicity n i : Then the factor 1/M in the sum becomes independent of index j.Underline that the approximation of function g(x) as a comb of delta-functions is valid only under integral (3).After this approximation the integral turns into the sum.
Returning to integral (1) we denote the probability distribution g(x) = P(x) and replace by the comb.In this way the problem reduces to the search of extremum for mutual information functional The normalization of the probability density function is included automatically when p k = 1/M.The bounded power condition is taken into account by the Lagrange multipliers method.Hereafter we suppose the average power S to be a fixed constant.Consider an auxiliary functional where λ is the Lagrange multiplier.Instead of a functional (6) it is possible to maximize any functional Φ = Φ + γS, where γ is an auxiliary constant.Since S is constant problems of extremum for both functional Φ and Φ are equivalent.At the same time with Φ we acquire an additional parameter γ and exploit it to provide the diagonal dominance in the matrix, and then simplify the numerical procedures.The point of maximum of functional Φ is determined from equations ∂ Φ/∂ x i = 0.As follows from (6), the set reduces to It is the set of transcendental algebraic equations.The role of γ-term is shifting of all the eigenvalues.It is especially important if some eigenvalues are close to zero and then the matrix inversion procedure is ill-conditioned.
Let us describe the relaxation procedure.Denote x k and introduce the additional parameter t ("time").The variables will be considered to be functions of parameter t: x k = x k (t), k = 1 . . ., M; λ = λ (t).In place of solving the algebraic set of transcendental Eqs.(7) we solve a set of linear differential equations When we multiply ( 8) by Φ k and sum up all the equations over index k, we see that the norm Then its solution converges to the solution of the set (7) with increasing "time" t.The relaxation is exponential Φ k 2 ∼ e −2t and the solution becomes negligible at t 1. Set ( 8) can be solved faster by iterations.In relaxation method [13] we replace ordinary differential Eqs. ( 9) by approximate finite-difference equations.We search for the steady-state solution, the details of relaxation process itself are not interesting for our purpose.Then calculating the stationary solution we may take a long step τ.
Denote the "velocities" dx k /dt = w k .From ( 7) we get The solution of differential Eqs. ( 8) tends to zero, then the solution of Eq. ( 9) tends to the solution of transcendental Eqs.(7).Further stages of iteration procedure are: , where τ is the step. 3 • Repeat 1 • and 2 • until the solution gets the target accuracy.
Solving set (9) we use Gauss-Seidel method of successive iterations.We vary parameter γ in order to provide the diagonal dominance in the corresponding matrix.The matrix with diagonal dominance is well-conditioned then the iteration procedure is stable and the convergence is faster [14].The number of required operations is O(M 2 ) per an iteration.

Numerical results
For the purpose of numerical modeling the continuous functions are usually replaced by stepwise functions defined at a finite number of intervals.Thus for approximation of continuous function in a finite domain the set of its values is sufficient in finite number of points.Calculations for continuous channels then can be executed by the same methods as for discrete channels.

The Gaussian channel
To test the new method and compare it to the BAA we first apply it to the well-known linear Gaussian channel.For linear additive white Gaussian noise channel with noise dispersion N the conditional probability is given by: The optimal distribution for this case is continuous and also Gaussian [15].For a given value of S it is Both the computed capacity and the probability distribution P(x) are in a good agreement with the well-known analytical results.For the relaxation method, the calculations require 5 − 10 iterations for each S. The noise dispersion is N = 0.36, the step is τ = 25, the number of steps for x is M = 800.The difference between numerical result and analytical formula is observed in 4th digit only.We would like to point out that for this example the BAA requires only 1 − 2 iterations then both the methods give the result for the probability distribution almost immediately.We include the example of the linear Gaussian channel for a fair comparison, just to show that the BAA is optimal for this particular classical channel compared to the relaxation method.However, as we will demonstrate below for other types of channels with discrete optimal distribution of x the new method is more efficient compared to the BAA.The Rice channel is defined by the transformation y = |x + η| between the transmitted x and received signal y, where η is the complex Gaussian noise.For instance, under assumption of negligible fibre nonlinearity, this formula describes the intensity modulation direct detection (IMDD) channel in optical communications when information coding (of the complex envelope field) is applied only over the signal power (intensity) without using optical phase.In this case η is the complex Gaussian term generated by the amplified spontaneous emission.The statistical properties of the Rice channel are given by the following distribution [9,10]:

Rice channel
here σ 2 is the dispersion, I 0 is the modified Bessel function of zero order [16].It is known [10,17] that the optimal input signal distribution for the continuous Rice channel has at least a single discrete point at x = 0.For relatively small S the optimal alphabet is the binary format [10].Starting from a binary signal format at small S, the number of discrete points is increased with S. We compare here the BAA and new relaxation method for σ = 1 considering only discrete variant of the Rice channel.Figure 1(a) shows the optimal probability distribution function calculated by the relaxation method at M = 401, S = 1.4.The probability density in this case is P(x) = p 1 δ (x − x 1 ) + p 2 δ (x − x 2 ), x 1 = 0, p 1 = 0.875, x 2 = 3.35, p 2 = 0.125.Figure 1(a) is plotted after 500 iterations.Figure 1(b) shows the same probability distribution function calculated by the BAA after 20000 iterations (solid curve) and after 1000 iterations (dashed).From these curves it is seen that the new method requires less number of iterations especially for piece-wise continuous functions.After relaxation method the resultant distribution function consists immediately of "stairs".The iterations are necessary only to refine their heights and positions.The BAA blurs the fronts and the stair shape becomes sharper only with increasing the number of iterations.This example shows that the relaxation method has certain advantages over the BAA when applied to non-Gaussian channels.

Nonlinear transformation channel
Next we consider optimization of the mutual information functional over input signal distributions for a recently introduced nonlinear channel [11,12].The following analysis, in general, can be applied to the n-dimensional vectors y and x representing output and input signals, correspondingly.However, without loss of generality, we examine here one-dimensional channel, and will use in what follows the notations x and y for one-dimensional (real) continuous input, continuous output, respectively.The transmission system presents a channel with R cascaded identical nonlinear elements interleaved with the standard linear additive white Gaussian noise spans (in-between nonlinear elements).In the absence of the nonlinear elements, this channel corresponds to the classical linear white Gaussian noise channel with the corresponding accumulated dispersion.As explained in detail in [11,12], the signal evolution in such system can be described in terms of the stochastic map -a discrete version of the Langevin equation for stochastic processes: Here k is the discrete index and T is the transfer function of the noise squeezing nonlinear element.The term η k models the Gaussian noise with zero mean and the dispersion given by N k added at k-th node Here we consider a simple example with single nonlinear element placed in-between two linear Gaussian channels.The first output y 0 = x + n 0 is the input signal x plus the first additive Gaussian noise n 0 .The average value of the noise is zero n 0 = 0, its dispersion is n 2 0 = N 0 .Here the bar denotes the averaging.Next, the y 0 acts as an input for the nonlinear element.The nonlinear element transforms the signal according to the formula [12]: Then the Gaussian noise is added to the signal The conditional probability distribution for such a channel is At α = 0 the channel is Gaussian.The average value of the noise is zero, its overall dispersion is N = N 0 + N 1 .The optimal distribution in this case is given by Eq. ( 11).Below we assume For linear Gaussian channel the optimal distribution of input signal is a continuous function.There exists also a number of discrete realizations of input alphabet providing the Shannon functional being arbitrary close to the maximum.Therefore, there is no problem of choosing the optimal alphabet.When the optimal input distribution P(x) is discrete, the situation is changed dramatically.The choice of input alphabet in this case is more difficult.
The numerical simulation shows that even for relatively small parameter α the capacityachieving input distribution for this channel is discrete with corresponding discrete optimal input alphabet.In the new method we do need to exploit the discrete input distribution.We would like also to emphasize that the (optimal) input alphabet is not known a priori.In this sense, we cannot even a priori claim the considered channel to be discrete or continuous.For comparison purpose, the numerical computations are realized by the BAA and by the relaxation method.The values x 1 , . . ., x M obtained at α = 0 are used as initial first-step estimates in the calculations.Figure 3(a) shows the optimal function P(x) for higher parameter α = 0.1 calculated by the relaxation method after 1000 iterations.Figure 3(b) shows the same distribution density calculated by the BAA.It is seen that using BAA even 20000 iterations are insufficient to include values x = 0, x = ±2.2into the desired alphabet.The relaxation method finds out these values using much less number of iterations.
Further increase in nonlinearity α improves the convergence of both the methods since the optimal alphabet consists of less number of symbols at higher α.The plots for α = 0.15 calculated by the new method after 500 iterations are shown in Fig. 4(a).Figure 4(b) shows the density calculated by BAA after 5000 and 20000 iterations.The optimal distribution density P(x) for α = 0.25 is not shown since it coincides with that at α = 0.15.Capacity C increases and then decreases with parameter α, therefore the optimal value exists.Figure 5 demonstrates that the noise squeezing leading to capacity increase over the capacity of the linear white Gaussian noise channel can be achieved with α varied over wide interval α = 0.3 − 0.7.

Hybrid method
Let us compare the methods using the number of iterations as the optimization parameter.At S = 5, α = 0.15 the maximal probability p = 0.27 corresponds to x = ±0.95(these values are displayed in Fig. 4).Consider the vicinity of the points x = ±0.95.The probability to find x within interval (x − ∆/2, x + ∆/2) is given by formula The greater is interval ∆ the better it is the accuracy of determining the probability p.At the same time, the accuracy of determining position x decreases with increasing ∆. Figure 6 presents the optimal probability P as a function of iteration number (more precisely the decimal logarithm of the iteration number).One can see that the number of iteration is much less when the relaxation method is employed.Let us point out the main difference of the relaxation method from that by Blahut -Arimoto algorithm [2].The BAA fixes the values x 1 < x 2 < . . .x L and determines the optimal probabilities p 1 , p 2 , . . ., p L of these values.In the relaxation method all the values x i are "moving" (more precisely "running") to increase the mutual information at fixed probabilities p i .The initial values p 1 , . . ., p L may be chosen equal to 1/L or some other initial set may be used.For example, it is possible to use a combination of both methods -a hybrid capacity computation method -to find the discrete distribution P(x) = ∑ K i=1 p i δ (x − x i ) faster.Let the optimal points x 1 , . . ., x K and corresponding probabilities p 1 , . . ., p K are unknown.We assume the trial values for a dense grid of fixed points x1 , . . ., xK and initial probabilities p1 , . . ., pK .The BAA reveals the optimal probabilities including some zero probabilities.The optimal points x i are accurate within the step of initial grid.To accelerate the search, first one makes n Blahut -Arimoto iterations, and then exclude the zero probabilities and corresponding points from the set x1 , . . ., xL .The result can be used as initial values in relaxation method.
The optimization problem of mutual information for the channel transmitting a finite set of symbols is reduced to the determination of the optimal alphabet sequence {x 1 , . . ., x L } and calculation of the best probabilities p i > 0 for each x i .The best alphabet found by BAA is chosen from a domain are assumed with some step ∆.The less is step ∆ the better it is the accuracy of calculated optimal values.The probabilities p i of values x i occur to be nonzero for closest values to given x i .It is demonstrated by Figs.1(b) and 2(b).As the number of iterations increases, the function P(x) becomes more similar to the sum ∑ k p k δ (x − x k ).
However, the number of iterations required to get the comb of delta-functions is by 1 − 2 orders greater than in the suggested relaxation method.The advantage of new method is the initial form of function P(x) which already consists of delta-function "lumps".During the calculations the positions x i move to provide the optimal value of functional (1).The BAA calculations broaden the lumps.That is why the greater number of iterations is needed for a good approximation.The BAA computing gives a smooth distribution in principle, but the channel input probabilities after running BAA could become a good initial condition for the relaxation method.Let us use the combined technique to nonlinear regenerative channel.An example calculated by the hybrid method is shown in Fig. 7.The initial positions are chosen in the nodes of uniform grid.Initial 200 iterations are executed by BAA.The step is ∆ = 0.02503, x ∈ [−10, 10], the number of points is M = 800.Then 40 iterations are carried out by the relaxation method.The results of BAA iterations are exploited as initial values of probabilities.The main ridge is placed at x ≈ 1.Its position x and height P (shown by steams) are slightly variable with S. Additional ridges burn with increasing S. Their heights are growing with S but remain sufficiently small, especially at large x.That means the higher number of symbols in optimal input alphabet with increasing energy.The negative part is not shown in Fig. 7, since it repeats the positive part P(−x, S) = P(x, S).
Another example of the hybrid calculation is the optimal Rice probability distribution at S = 5.In this case the binary format is not the better from the capacity point of view.The distribution P(x) is obtained by the BAA method after 20000 iterations.They are shown in Fig. 8 by the solid line and do not change after the additional 20000 BAA iterations.The maximum points are x ≈ 0, 3.4, 5.75, 7.9.The peaks are broadened with increasing x.The new relaxation method is applied to complete the calculation.The equidistant grid from BAA method x i = H(i − 1)/n is used as the sequence of initial values.Here H = 20 is the length of segment, n = 801, ∆x = 0.025.The output of BAA calculations are exploited as an initial values of probability q i = P(x i )∆x.The variation of probability distribution is shown in Fig. 8 by dashed line and circles.It is evident that the distribution converges to the discrete distribution.The peaks at x = 0, 3.44, 5.77 are clearly distinguishable.To present the result it is convenient to plot the histogram.The segment of x variation [0, H] is divided by K half-intervals: [0, δ 1 ) ∪ • • • ∪ [δ K−1 , H).Then we determine how many points hit the interval and sum up the probabilities of these points p i .We obtain the probabilities that are not negligible.The accuracy of position is determined by corresponding the half-interval length.The histogram is shown in Fig. 9.
Note that the relaxation method enables one to get a local maximum of the mutual information functional.When the alphabet is fixed the maximum is unique.When we the optimal input alphabet is not known there are several local maxima.The search of global maximum is a separate intricate problem.For this problem the combination of new method with BAA into novel hybrid method seems to be a promising approach.For each initial condition BAA would help to refine the points, and then the probabilities could be computed by the relaxation method.To make sure that we get the global maximum at fist we calculate with small accuracy, choose the appropriate initial condition, and then recalculate the maximum with better accuracy.
The modern accelerated methods for calculating the capacity [5,8] involve two limitations: (i) of signal peak power x ∈ [−W,W ] and (ii) of average power (2).The limitation of average power in the memoryless channels is necessary, since without this limitation the capacity could become unrestrictedly large with growing S. Like in paper [10], we consider the problem with limitation (5).The strict limitation enables one to avoid the additional limitation of the signal peak power.

Conclusions
Analysis of capacity of fiber-optic communication channels has attracted a great deal of interest in recent years (see e.g.[18][19][20][21][22][23] and references therein).This interest is directly linked to the major challenge faced by the modern optical communication systems -necessity to cope with the fast growing information traffic that requires corresponding increase of the networks capacity.The problem of computation of the Shannon (channel) capacity is a classical problem of the information theory.Only a short time ago it was recognized that the Shannon capacity of the nonlinear fiber channels has not yet been fully understood [18][19][20][21].The famous Shannon result for capacity [1] is derived for a linear channel with the additive Gaussian noise.However, fiber channels are nonlinear, their properties are changed with increase of signal power.Methods for calculation of the Shannon capacity of complex nonlinear channels with memory, such as, fiber channels are not yet well developed.The key characteristic of the channel, conditional probability of receiving certain output y provided that signal x was sent, is not known for nonlinear fiber channels.Current estimates of the lower bound of capacity are based on direct computation of mutual information for different modulation formats for varying input signal power [19,20].This approach allows one to predict and describe certain many important features of the information transmission, but, of course, this does not answer the question about the maximum possible error-free transmission rate.More efforts are needed in development of novel and enhancing the existing methods of capacity computation for nonlinear fiber channels.
There are interesting related research directions: capacity analysis in specially designed nonlinear channels (e.g. with signal regeneration or noise squeezing elements) [11,12] that, in principle, would allow capacity to be higher than in the linear AWGN channel and studies of capacity of quantum information channels [24,25].The numerical methods presented here are relevant to these research directions and we anticipate that application of our approach will lead to new results in these areas too.In this paper we applied new method for computation of the Shannon capacity of nonlinear noise squeezing channel and confirmed that its capacity is greater than the AWGN channel capacity (see Fig. 5).We have found that the optimal distribution becomes discrete even at small parameter α.
In conclusion, the relaxation method is proposed for maximization of the Shannon mutual information functional (1) both for discrete and continuous input, continuous output channels.The method is compared with the well-known Blahut -Arimoto algorithm for Gaussian and Rician channels and a nonlinear channel with a cascade of noise squeezing elements.The advantage of new method is demonstrated, especially for the problems with the initially unknown discrete optimal alphabet or if the optimal input alphabet is continuous, but we approximate capacity using discrete inputs.We demonstrated also the advantage of using the new hybrid method combining BAA and the relaxation approach.The presented new methods can be applied to more complex channels, including recently introduced models (see e.g.[21,[26][27][28] and references therein).

Fig. 5 .
Fig. 5.The channel capacity C/B as a function of nonlinearity α and signal-to-noise ratio S/N.The edge curve α = 0 corresponds to the Shannon limit for linear real channel.

Fig. 6 .
Fig.6.The comparison of methods.Probability P of symbol x = 0.95 as a function of the number of iterations (log 10 scale) calculated by the BAA at ∆ = 0.3 (solid line), 0.2 (dashed), 0.1 (dotted).The upper curve depicts the result of relaxation method at ∆ = 0.001.

Fig. 9 .
Fig. 9.The histogram of probabilities for the Rice channel calculated by 20000 BAA and 400 relaxation method iterations.