Probability-density function for energy perturbations of isolated optical pulses

The mathematical methods required to model simple stochastic processes are reviewed briefly. These methods are used to determine the probability-density function (PDF) for noise-induced energy perturbations of isolated (solitary) optical pulses in fiber communication systems. The analytical formula is consistent with the numerical solution of the energymoment equation. System failures are caused by large energy perturbations. For such perturbations the actual PDF differs significantly from the (idealized) Gauss PDF that is often used to predict system performance. © 2003 Optical Society of America OCIS codes:(060.2230) fiber optics communications; (190.5530) pulse propagation and solitons; (030.6600) statistical optics References and links 1. L. F. Mollenauer, J. P. Gordon and P. V. Mamyshev, “Solitons in high bit-rate, long-distance transmission,” in Optical Fiber Telecommunications IIIA , edited by I. P. Kaminow and T. L. Koch (Academic, San Diego, 1997), pp. 373–460. 2. E. Iannone, F. Matera, A. Mecozzi and M. Settembre, Nonlinear Optical Communications Networks (Wiley, New York, 1998). 3. K. Ishida, T. Kobayashi, J. Abe, K. Kinjo, S. Kuroda and T. Misuochi, “A comparative study of 10 Gb/s RZ-DPSK and RZ-ASK WDM transmission over transoceanic distances,” OFC 2003, paper ThE2. 4. J. X. Cai, D. G. Foursa, C. R. Davidson, Y. Cai, G. Domagala, H. Li, L. Liu, W. W. Patterson, A. N. Pilipetskii, M. Nissov and N. S. Bergano, “A DWDM demonstration of 3.73 Tb/s over 11,000 km using 373 RZ-DPSK channels at 10 Gb/s,” OFC 2003, paper PD22. 5. J. P. Gordon and H. A. Haus, “Random walk of coherently amplified solitons in optical fiber transmission,” Opt. Lett. 11, 665–667 (1986). 6. C. J. McKinstrie, “Gordon–Haus timing jitter in dispersion-managed systems with distributed amplification,” Opt. Commun.200, 165–177 (2001) and references therein. 7. J. P. Gordon and L. F. Mollenauer, “Phase noise in photonic communication systems using linear amplifiers,” Opt. Lett.23, 1351–1353 (1990). 8. C. J. McKinstrie, C. Xie and T. I. Lakoba, “Efficient modeling of phase jitter in dispersion-managed soliton systems,” Opt. Lett. 27, 1887–1889 (2002) and references therein. 9. S. N. Vlasov, V. A. Petrishchev and V. I. Talanov, “Averaged dersciption of wave beams in linear and nonlinear media,” Radiophys. Quantum Electron. 14, 1062–1070 (1971). 10. W. J. Firth, “Propagation of laser beams through inhomogeneous media,” Opt. Commun. 22, 226–230 (1977). 11. D. Anderson, “Variational approach to pulse propagation in optical fibers,” Phys. Rev. A 27, 3135–3145 (1983). 12. D. J. Kaup, “Perturbation theory for solitons in optical fibers,” Phys. Rev. A 42, 5689–5694 (1990). 13. H. A. Haus and Y. Lai, “Quantum theory of soliton squeezing: a linearized approach,” J. Opt. Soc. Am. B 7, 386–392 (1990). 14. D. Marcuse, “Derivation of analytical expressions for the bit-error probability in lightwave systems with optical amplifiers,” J. Lightwave Technol. 8, 1816–1823 (1990). 15. J. P. Gordon and L. F. Mollenauer, “Effects of fiber nonlinearities and amplifier spacing on ultra-long distance transmission,” J. Lightwave Technol. 9, 170–173 (1991). 16. P. A. Humblet and M. Azizõ glu, “On the bit error rate of lightwave systems with optical amplifiers,” J. Lightwave Technol.9, 1576–1582 (1991). 17. J. S. Lee and C. S. Shim, “Bit-error-rate analysis of optically preamplified receivers using an eigenfunction expansion method in optical frequency domain,” J. Lightwave Technol. 12, 1224–1229 (1994). 18. T. Yoshino and G. P. Agrawal, “Photoelectron statistics of solitons corrupted by amplified spontaneous emission,” Phys. Rev. A51, 1662–1668 (1995). 19. B. A. Malomed and N. Flytzanis, “Fluctuational distribution function of solitons in the nonlinear Schr ödinger system,” Phys. Rev. E48, R5–R8 (1993). 20. G. E. Falkovich, I. Kolokolov, V. Lebedev and S. K. Turitsyn, “Statistics of soliton-bearing systems with additive noise,” Phys. Rev. E63, 25601R (2001). 21. R. Holzlöhner, V. S. Grigoryan, C. R. Menyuk and W. L. Kath, “Accurate calculation of eye diagrams and bit error rates in optical transmission systems using linearization,” J. Lightwave Technol. 20, 389–400 (2002). 22. R. O. Moore, G. Biondini and W. L. Kath, “Importance sampling for noise-induced amplitude and timing jitter in soliton transmission systems,” Opt. Lett. 28, 105–107 (2003). 23. C. J. McKinstrie and P. J. Winzer, “How to apply importance-sampling techniques to simulations of optical systems,” http://arxiv.org/physics/0309002. 24. H. Kim and A. H. Gnauck, “Experimental investigation of the performance limitation of DPSK systems due to nonlinear phase noise,” IEEE Photon. Technol. Lett. 15, 320–322 (2003). 25. C. W. Gardiner,Handbook of Stochastic Methods, 2nd Ed. (Springer, Berlin, 2002). 26. J. W. Goodman, Statistical Optics(Wiley, New York, 1985). 27. J. G. Proakis, Digital Communications, 3rd Ed. (McGraw-Hill, New York, 1995). 28. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions (Dover, New York, 1965), result 29.3.81. 29. W. Horsthemke and R. Lefever, Noise-Induced Transitions (Springer, Berlin, 1983), Sections 5.5 and 6.7. 30. R. Graham, “Hopf bifurcation with fluctuating control parameter,” Phys. Rev. A 25, 3234–3258 (1982). 31. R. Graham and A. Schenzle, “Carleman imbedding of multiplicative stochastic processes,” Phys. Rev. A 25, 1731–1754 (1982). 32. I. S. Gradsteyn and I. M. Rhyzhik, Table of Integrals, Series and Products, 5th Ed. (Academic, San Diego, 1994), result 6.633.2. 33. P. J. Winzer, S. Chandrasekhar and H. Kim, “Impact of filtering on RZ-DPSK reception,” IEEE Photon. Technol. Lett. 15, 840–842 (2003) and references therein. 34. J. D. Hoffman,Numerical Methods for Scientists and Engineers (McGraw-Hill, New York, 1992).


Introduction
In the last two decades there has been considerable interest in optical communication systems based on return-to-zero (RZ) pulses.In amplitude-shift-keyed (ASK) systems [1,2] information is transmitted by the presence (or absence) of a pulse.In differential-phase-shift-keyed (DPSK) systems [3,4] information is transmitted by the phase difference between neighboring pulses.Long-haul systems need amplifiers to compensate for fiber loss.Unfortunately, amplifier noise causes the amplitude (energy), phase, frequency and arrival-time of the pulses to fluctuate (jitter).These pulse-parameter fluctuations (perturbations) impair system performance.For ASK systems the dominant impairment is arrival-time jitter [5,6], in which noise-induced frequency shifts are converted into arrival-time shifts by dispersion.For DPSK systems the dominant impairment is phase jitter [7,8], in which energy (power) shifts are converted into phase shifts by nonlinearity.Although energy jitter is not the dominant impairment for either type of system, it should be considered first because it affects both types of system and is the simplest jitter phenomenon to model.Furthermore, the mathematical methods and physical insights developed in the study of energy jitter can be applied to studies of the other jitter phenomena.
Light-wave propagation in a fiber is governed by the nonlinear Schrödinger equation (NSE).The standard way to model pulse-parameter perturbations is to use the moment [9], variation [10,11] or soliton-perturbation [12,13] method to convert the NSE into a set of stochastic differential equations (SDEs) for the pulse parameters.If one assumes that the parameter perturbations are small, one can model their growth using linearized SDEs, which produce Gauss probability-density functions (PDFs).This approach produces (approximate) formulas that de-scribe how the means and variances of the PDFs depend on the system parameters (such as length): One can use these formulas to distinguish reasonable system designs from bad ones.However, system failures are caused by large parameter perturbations that occur infrequently.For such parameter perturbations the linearized SDEs (and the associated Gauss PDFs) are not valid: One cannot rely on the aforementioned formulas to make quantitative predictions about system performance.
Non-Gauss PDFs are produced by two independent mechanisms: nonlinear propagation in fibers and nonlinear (quadratic) detection at receivers.The effects of the latter mechanism on the bit-error-ratios (BERs) of ASK systems with (idealized) integrate-and-dump receivers were studied in detail [14,15,16].Propagation was not analyzed, but characteristic functions were used to calculate the BERs associated with specified signals and specified amounts of amplitude noise.Subsequently, Karhunen-Loève expansions were used to extend these analyses to include more-realistic receivers [17,18].Recent studies included the effects of propagation.Malomed [19] used a probability-diffusion, or Fokker-Planck, equation (FPE) to study the statistics of damped soliton systems driven by noise.(These systems are not like communication systems, in which fiber loss is compensated by amplification.)The statistics of undamped soliton systems perturbed by noise (which are like communication systems) were studied recently by Falkovich [20], who used a path-integral formalism to predict a non-Gauss energy PDF.Non-Gauss energy PDFs were observed in recent numerical simulations [21,22,23], and a non-Gauss phase PDF (which is a manifestation of a non-Gauss energy PDF) was observed in a recent experiment [24].
In this paper energy jitter is studied in detail.The paper is organized as follows: In Section 2 the mathematical methods required to model nonlinear stochastic processes are reviewed briefly.These methods include stochastic calculus, which facilitates the solution of SDEs, and the conversion of SDEs to FPEs, for which many solution methods exist.In Section 3 the NSE is used to derive two (equivalent) SDEs for the energy in a bit period.Because the rate at which amplifier noise changes the bit energy depends on the current energy, these SDEs are nonlinear.Energy jitter is analyzed in Section 4, for (idealized) systems with uniformlydistributed amplification (UDA).Formulas are obtained for the PDF of the output energies.In Section 5 the analytical results are validated by numerical solutions of the energy SDEs and their (common) associated FPE.The effects of nonuniformly-distributed amplification (NDA) are studied in Section 6.Finally, in Section 7 the main results of this paper are summarized.

Simple stochastic process
The mathematical methods required to model stochastic processes were described in detail by Gardiner [25].Readers who are familiar with these methods should proceed to Section 3 directly.Others should consider a simple stochastic process that involves a single measurable quantity X. Suppose that this process is modeled by the SDE where Ẋ = dX/dz, a and b are deterministic functions of X and z, and r is a random function of z.
A common (idealized) model for r is white noise, which is characterized by the equations where denotes an ensemble average and δ denotes a Dirac delta function.(Boldface was used to distinguish the symbol for the delta function from the symbol for a small change in value.) According to Eq. ( 3), the correlation distance of white noise is zero.The related function defines a Wiener process [25].It follows from Eqs. ( 2)-(4) that Let δ z be a short, but finite, distance interval and let δW = W (δ z).Then it follows from Eqs.
(5) and ( 6) that a typical noise increment δW ∼ δ z 1/2 .Although the white-noise model is common (and convenient), white noise cannot exist in a real physical system.To understand why not, recall that the space-autocorrelation function and the ensemble-autocorrelation function If r(z) represents a stationary stochastic process, the ensemble-autocorrelation is a function of ζ = z−z (rather than z and z separately) and the space average (7) equals the ensemble average (8): G e = G z = G [26].Define the Fourier transform and the spectral density [For cases in which |r| 2 is (noise) power, S is the power per unit inverse-wavelength.]Then the Wiener-Khinchin (statistical-autocorrelation) theorem [26] states that For white noise the autocorrelation function G(ζ ) = δ (ζ ), so the associated spectral density S(k) = 1: The spectrum has infinite bandwidth and contains infinite power.Such a spectrum cannot exist.[In transmission systems the frequency spectra are limited by the amplifier (or filter) bandwidths, which are finite.They are converted to wavenumber spectra by propagation.]A more-realistic model for r is weakly-colored noise, the simplest example of which is the rectangular spectral density This spectrum has a bandwidth of K (which we assume is much broader than any other relevant bandwidth) and contains finite power.The associated autocorrelation function has a correlation distance of order 1/K (which is much shorter than any other relevant distance scale).One can still use Eqs.( 2) and (3) to characterize weakly-colored noise, provided that one interprets δ as the sinc function in Eq. ( 14).For weakly-colored noise Eq. ( 4) does not define a Wiener process.However, Eq. ( 5) is still satisfied exactly and Eq. ( 6) is satisfied approximately.In particular, the increment δ z 0 r(z )dz ∼ δ z 1/2 , provided that δ z is much longer than the correlation distance.
Because Eq. ( 1) has a random driving term, each X(z) is one member of an ensemble of solutions.This description of the stochastic process is based on an SDE for the dependent variable X.An alternative description is based on the PDF for the independent variable x.Let P(x, z)dx be the probability that X(z) is in the range (x, x + dx).Then P(x, z) satisfies a probability-diffusion equation, which is called the Fokker-Planck equation (FPE).(In the mathematics literature the probability-diffusion equation is called the Kolmogorov equation.) The rules of stochastic calculus and the relation between an SDE and its associated FPE both depend on an approximate formula for the small, but finite, increment where, for simplicity of notation, the explicit z-dependence of a and b was suppressed and the initial position was denoted by 0 (rather than z).The first term on the right side of Eq. ( 15) is of order δ z, whereas the second term is of order δ z 1/2 .By omitting terms of order δ z 3/2 and higher, one can make the simplification where a 0 = a(X 0 ) = a[X(0)]: The first contribution only depends on the value of X at the beginning of the interval.In contrast, because the second term is larger than the first, one must account for the cumulative change in X.By doing so, one finds that where b = ∂ b/∂ X.It follows from Eqs. ( 16) and ( 17) that The first term on the right side of Eq. ( 18) is deterministic and of order δ z.The second is random (with zero mean) and of order δ z 1/2 .The third is also random, but of order δ z.Consequently, its contribution to the random part of δ X is insignificant.However, its mean value, if nonzero, is of order δ z.It follows from Eq. ( 18) that According to Eq. ( 3), the double integral in Eq. ( 19) contains a delta function, which implies that the noise correlation length l c = 0.As we explained above, no real physical process can have l c = 0.The requirement that l c be much shorter than the next-shortest length δ z introduces an ambiguity into the right side of Eq. ( 15), as viewed from the perspective of deterministic calculus.Indeed, if r were a deterministic function of z Eq. ( 15) could be rewritten as Equation ( 20) would be consistent with Eq. ( 15), provided only that l δ z.The ambiguity, which exists for stochastic processes, is whether one treats l l c or l l c .The choice one makes determines whether one obtains Ito or Stratonovich calculus, respectively.
In the Ito formalism the correlation in Eq. ( 19) vanishes: r(z+l)r(z ) = 0 because z ≤ z and l l c .One obtains the same result by making the intuitive argument that white noise emitted at position z is not correlated with noise emitted at any previous position, no matter how close, and interpreting the upper limit of the z -integral as z − .It follows from either approach that δ X ≈ a 0 δ z. ( It also follows (unambiguously) from Eqs. ( 3) and ( 18) that In most physical models the functions r and b are meant to be evaluated at the same position z, so the parameter l, defined in Eq. ( 20), should satisfy l l c .In the Stratonovich formalism l = 0. Let I and J denote the double integrals that appear in Eqs.(19) and (22), respectively.Then, by using the sinc function in Eq. ( 14) to evaluate these integrals, one finds that J = δ z + O(1/K) and I = J/2.Because the correlation distance is much shorter than the interval under consideration (Kδ z 1) In general, the short, but finite correlation distance associated with weakly-colored noise modifies the mean value of δ X (which is called the drift), but does not affect its variance.These conclusions do not depend sensitively on the choice of autocorrelation function.For the special case in which the noise is additive (b = 0), the drift formulas (21) and (24) are identical.
The simplest SDEs can be solved directly.(For example, linear SDEs with additive noise.)However, most SDEs cannot.Just as changes of variables enable the solution of some deterministic differential equations, so also do they enable the solution of some SDEs.In stochastic calculus the change-of-variable rule depends on the relation between the SDE and the associated finite-difference equation (FDE).
In the Ito formulation Eq. ( 1) is equivalent to the FDE Consistent with Eqs. ( 5) and ( 6), the noise increment has mean δW = 0 and variance δW 2 = δ z.Furthermore, noise increments associated with different positions are uncorrelated.Let Y = f (X), where f is an arbitrary differentiable (and invertible) function.Then where f X = d f /dX.It follows from the discussion of Eq. ( 18) that the larger deterministic contribution to δ X 2 must be retained [Eq.( 23)], whereas the other (smaller deterministic and random) contributions need not.Consequently, The change-of-variables rule in Ito calculus differs from the rule in deterministic calculus.
In the Stratonovich formulation Eq. ( 1) is equivalent to the FDE where b X = ∂ b/∂ X.It follows from Eqs. ( 27) and ( 29) that To complete the change of variables, one must replace the X-derivatives on the right side of Eq. ( 30) by Y -derivatives.Let g be the inverse of f , so that f X = 1/g Y and f XX = −g YY /g 3 Y .Then, by using these facts, one can rewrite Eq. ( 30) in the canonical form This FDE is equivalent to the SDE The change-of-variables rule in Stratonovich calculus is the same as the deterministic rule.Many SDEs resist solution by the change-of-variable method.However, it is sometimes possible to solve the FPEs associated with such SDEs.The derivation of these FPEs is based on the Chapman-Kolmogorov equation [25] where T is the transition-probability function and the initial position was denoted by 0 (rather than z).Suppose that X(0) has the value x.Then T (δ x, δ z|x, 0)d(δ x) is the probability that X(δ z) has a value between x + δ x and x + δ x + d(δ x).Because δ x is a small increment (the transition probability is only significant for small values of δ x), one can expand the integrand T P in a Taylor series about x.The result is where the explicit dependence of P and T on x and z = 0 was suppressed for simplicity of notation.For white noise the random contribution to δ X is bδW [Eqs.( 4) and ( 18)].The increment of a Wiener process (δW ) has Gaussian statistics [25].It follows from this fact and Eq. ( 18) that the Ito transition probability By substituting this formula in Eq. ( 35) one finds that the first integral is 1, the second is aδ z and the third is b 2 δ z.By cancelling like terms one obtains the Ito FPE For reference, notice that this derivation does not require the shape of T to be known exactly: The cumulative transition probability is always 1, and the mean and variance of δ X follow directly from Eqs. ( 19) and (22), respectively.If one repeats the preceding derivation, using Eqs.( 24) and ( 25) instead of Eqs. ( 21) and ( 23), one obtains the Stratonovich FPE which one can rewrite in the canonical form It is clear from Eqs. ( 37) and ( 38) that the Ito and Stratonovich FPEs associated with the SDE (1) are formally distinct.However, for systems with additive noise the associated FPEs are identical.[See the discussion after Eq. ( 25).]For such systems, let Y be an arbitrary function of X.Then the FPEs for P(y) are also identical.The proof of this statement is similar to the derivations of the Ito and Stratonovich change-of-variables rules.
We end this section by noting another difference between the Ito and Stratonovich formulations.Consider the ensemble average f (X)r , where f is an arbitrary function.At any position z the quantity X depends on the noise emitted at all previous positions.The Ito formulation is based on the assumption that noise emitted at position z is not correlated with noise emitted at any previous position, no matter how close.Consequently, in the Ito formulation by assumption.In contrast, the Stratonovich formulation is based on the assumption that the correlation distance of the noise is short, but finite.At any reference position z = 0 from which it follows that Terms proportional to a, b and f were omitted from Eqs. ( 41) and (42) because they are of higher order in δ z.By assumption, the correlation distance of the noise is shorter than δ z, so f [X(−δ z)]r(0) = 0.It follows from Eq. ( 14) that 0 Equations ( 40) and (43) are closely related to the finite-increment formulas ( 21) and (24).Suppose that the SDE (1) is written in the form Ẋ = r x (X, z), where r x is the total rate of change.It follows from the preceding results that in the Ito formulation whereas in the Stratonovich formulation Let δ r x denote the deviatoric rate of change r x − r x .Then In the Ito formulation Eq. ( 46) is exact, whereas in the Stratonovich formulation it is approximate (non-delta-like terms of order 1 were omitted).

Energy equation
Light-wave propagation in a fiber is governed by the NSE [1, 2] where ∂ z = ∂ /∂ z, A is the (complex) electric-field amplitude, g is the amplifier gain rate, α, β and γ are the fiber loss, dispersion and nonlinearity coefficients, respectively, and R is a random driving (source) term that models the effects of amplifier noise.Because R is independent of A, the noise is said to be additive.At each position the rate at which noise changes the amplitude is a random function of time.Furthermore the rates of change at different positions are independent.These properties are quantified by the equations where denotes an ensemble average and δ denotes a delta function.(Boldface was used to distinguish the symbol for a delta function from the symbol for a small change in value.)The source strength S = n sp hωg, where n sp is the spontaneous noise factor (which has a typical value in the range 1.1-1.3)and hω is the photon energy.
It is instructive to consider the noise energy that accumulates between the initial point z = 0 and the neighboring point z = δ z, in the absence of a signal pulse.It follows from Eq. (47) that Let T be the bit period.Then the noise energy in a bit period is By combining formulas (51) and (52) one finds that the expected noise energy By using Eq.(50) to evaluate the integrals in Eq. ( 53), one finds that δ E = Sδ zT δ (0).If one tries to interpret δ (t) as the Dirac delta function δ D (t), one obtains the unphysical result that the frequency bandwidth and energy of the noise are infinite.In real systems the noise bandwidth and energy are limited by the (common) bandwidth of the amplifiers (or filters).Consequently, one should interpret δ (t) as the delta-like function Fsinc(πFt), where F is the amplifier (or filter) bandwidth.By doing so, one obtains the physical result where Sδ z is the energy per noise mode and FT is the number of modes [14,15,16].
In real systems the wavenumber bandwidth of the noise is also finite.However, when one models the spatial evolution of the bit energy one can interpret δ (z) as the Dirac delta function δ D (z), and use Ito calculus, or one can interpret δ (z) as the delta-like function Ksinc(πKz), where K is the noise bandwidth, and use Stratonovich calculus.The former assumption implies that the correlation distance of the noise is zero, whereas the latter assumption implies that the correlation distance is short, but finite.We will illustrate both approaches in this section.The required elements of stochastic calulus were reviewed in Section 2.
The total (signal and noise) energy in a bit period is The wave power depends quadratically on A. By applying the Ito change-of-variable rule [Eq.(28)] to the real and imaginary parts of Eq. ( 47), combining the results and integrating the transformed equation over a bit period, one can show that The term SFT is present in Eq. ( 56) because the Ito rule differs from the deterministic rule.No energy-flux terms are present.On average, the noise-energy flux at the time boundaries (−T /2 and T /2) is zero.The signal-energy flux can be neglected if the signal pulse is isolated (does not interact with other pulses in the same channel or other channels).
Our goal is to rewrite Eq. ( 56) in the canonical form where One can deduce formulas for a and b by calculating the requisite moments of the right side of Eq. (56).In the Ito formulation the correlation distance of the noise is zero and A(z,t) depends only on the noise emitted at previous positions (z < z).It follows from these facts that By applying this result to the right side of Eq. ( 56), one finds that Thus, Eq. ( 56) is equivalent to the Ito SDE where g and S depend on z.Notice that the rate at which noise changes the bit energy depends on the current energy.Equation ( 63) is valid for any combination of distributed and lumped amplification.
The preceding description of the energy evolution is based on the Ito SDE (63) for the independent variable E. There is an alternative description, which is based on the PDF for the independent variable e.Let P(e, z)de be the probability that E(z) is in the range (e, e + de).Then [according to Eq. (37)] the energy PDF satisfies the Ito FPE Now consider the Stratonovich analysis.It follows from Eqs. ( 47) and (55), and the Stratonovich change-of-variable rule [Eq.(33)], that In contrast to Eq. ( 56), the term SFT is absent because the Stratonovich rule is the same as the deterministic rule.Our goal is to rewrite Eq. ( 65) in the canonical form (57).For a Stratonovich SDE r e has the canonical properties [Eqs.( 45) and ( 46)] where b = ∂ b/∂ E and non-delta-like terms of order 1 were omitted.In the Stratonovich formulation the correlation distance of the noise is short, but finite, and A(z,t) depends on the noise emitted at all previous positions and the current position (z ≤ z).It follows from these facts [and a short calculation similar to that which produced Eq. ( 43)] that By applying this result to the right side of Eq. ( 65), one finds that δ r e (z)δ r e (z ) = 2SEδ (z − z ).
(70) Thus, Eq. ( 65) is equivalent to the Stratonovich SDE where g and S depend on z.It follows from Eq. (71) [and Eq. ( 38)] that the energy PDF satisfies the Stratonovich FPE Although the Ito (63) and Stratonovich (71) SDEs differ slightly, the Ito (64) and Stratonovich (72) FPEs are identical.This result was obtained because the noise term in the NSE (47) is additive [see the discussion after Eq. ( 39)].
In this section the NSE was used to derive an (Ito or Stratonovich) SDE for the bit (pulse) energy.Solitons in constant-dispersion systems are characterized by four shape parameters: the energy, phase, frequency shift and time delay.Return-to-zero pulses in dispersion-managed systems are characterized by the aforementioned shape parameters, together with the chirp and width.By following a procedure similar to that described above (moment method), one can derive a set of six coupled (Ito or Stratonovich) SDEs, one for each shape parameter.Not only is energy jitter an important phenomenon in its own right, it is also a paradigm for a class of jitter phenomena: Whatever mathematical tools and physical insights are developed in the study of energy jitter can be applied to these other jitter phenomena.

Analysis of energy jitter
In the absence of noise the energy is constant.In the presence of noise it undergoes a random walk.For the purposes of illustration, it is enough to consider systems with UDA (g = α).The effects of NDA (g = α) will be discussed in Section 6.Let E 0 denote the initial pulse energy, and let X = E/E 0 , µ = FT and ζ = Sz/E 0 denote the normalized bit energy, mode number and normalized distance, respectively.Then, for systems with UDA, the normalized energy is governed by the Ito SDE where r is a random driving term with unit strength.(In the Stratonovich SDE the parameter µ is replaced by µ − 1/2.)By definition, the initial condition is X(0) = 1.Equation ( 73) is difficult to solve exactly.
For systems with high signal-to-noise ratios, the probability that X differs significantly from 1 is small.By making the (standard) approximation X 1/2 ≈ 1 in Eq. ( 73), one obtains the linearized equation Equation ( 74) defines X (approximately) as a Gauss (normal) random variable with mean m n = 1 + µζ and variance v n = 2ζ .The associated PDF is (From a logical standpoint the PDF of the non-negative quantity X cannot be exactly Gaussian, because, if it were, the probability of X < 0 would be finite for all ζ > 0. From a practical standpoint this inconsistency is tolerable if the probability of X < 0 is exponentially small for system lengths of interest.)For example, consider a 10-Gb/s system with UDA, α = 0.21 dB/Km, β = −0.30ps 2 /Km (D = 0.38 ps/Km-nm) and γ = 1.7/Km-W.Then a soliton (sech pulse) with a full-width at half-maximum of 30 ps has an input energy of 21 fJ (time-averaged power of 0.21 mW).If the system length l = 10 Mm, the output noise power in both polarizations, in a frequency bandwidth of 12 GHz (wavelength bandwidth of 0.1 nm), is 1.7 µW: The (optical) signalto-noise ratio (SNR) is 21 dB.(Systems with NDA produce the same noise power in shorter distances.)For this system the normalized output-energy variance 2Sl/E 0 is 6.6 × 10 −3 and the output-energy deviation is 8.1 × 10 −2 .
Although the standard approximation facilitates the solution of Eq. ( 73), it constrains the associated PDF (75) to be a symmetric function of x (relative to the mean value).Equation (73) defines X as the sum of independent random increments.Because the size of each increment depends on the current value of X, the actual PDF cannot be a symmetric function of x.Thus, the standard approximation is inadequate.One can remove the coefficient X 1/2 from the random term in Eq. ( 73) by making the change of variables Y = X 1/2 , in which case where µ = (2µ − 1)/4.By making the approximation Y ≈ 1 in the drift term, one obtains the linearized equation  [27].For the process under consideration m x ≈ 1 + µζ and v x ≈ 2ζ .The associated PDF can be written approximately as Formula (78) exhibits clearly the asymmetric and non-Gaussian nature of the energy PDF.
However, its accuracy remains to be determined.Equation (73) and the initial condition X(0) = 1 are equivalent to the FPE and the initial condition P(x, 0) = δ (x − 1).The moments of x are defined by the formula By multiplying Eq. ( 75) by x and x 2 , and integrating by parts, one can show that from which it follows that the mean is 1 + µζ and the variance is 2ζ + µζ 2 ≈ 2ζ .For typical systems ζ 2 ∼ 10 −5 , so the approximate PDF (78) predicts the energy mean and variance accurately.
Because the diffusion term in Eq. ( 79) involves the first power of x, one can solve Eq. ( 79) by Laplace transforming in x, making the a priori assumption that P(0, ζ ) = 0, and using the method of characteristics to solve the transformed equation.The result is Solution ( 83) is also called the moment generating function [27], because .By using the shift theorem and the relation [28] exp where I n (z) is the modified Bessel function of order n, one finds that Notice that P(0, ζ ) = 0, as assumed.For short distances, one can use the relation I n (z) ∼ exp(z)/(2πz) 1/2 as z → ∞ to rewrite solution (86) as The similarities between formulas (78) and (87) are evident.[The main effect of the term x µ/2 in formula (87) is to shift the mean from 1 to 1 + µζ .]One can also use a Sturm-Liouville expansion to solve the FPE (79).Consider the evolution of the conditional probability P(x, ζ |x 0 , 0) of observing the value x at the position ζ , provided that x 0 was observed at ζ = 0. (In this context one should consider the normalization energy E 0 as a reference energy rather than the initial energy.)The conditional PDF satisfies the forward FPE (79) with respect to the variable x.Moreover, it can be shown [25], similarly to Section 2, that the backward FPE (or backward Kolmogorov equation), which P(x, ζ |x 0 , 0) satisfies with respect to x 0 , has the form of the equation adjoint to (79).Specifically, The initial condition for Eqs. ( 79) and ( 88) is The boundary conditions belong to the natural type in the Gikhman-Skorokhod classification [29], which means that one should not and cannot specify such conditions.However, one can impose the normalization conditions Further discussion of the natural boundary conditions can be found in [30].We will point out issues relevant to our analysis as we proceed.
Following [29], we seek the solution to Eqs. ( 79) and (88) in the form where p(x, λ ) and p(x 0 , λ ) are the original and adjoint eigenfunctions, which satisfy the related eigenvalue equations According to the standard Sturm-Liouville theory of second-order differential operators, these eigenfunctions form a bi-orthogonal set satisfying the orthogonality relation It is the existence of the integral in Eq. ( 94) that plays the role of the boundary conditions for Eqs. ( 92) and (93) [30].One can also show that p and p are related by the simple equation where q(x) is the particular (but not necessarily normalizable) solution of the FPE (79) satisfying In order that a function satisfying condition (90) be representable in the form of expansion (91), the sets of eigenfunctions p(x, λ ) and p(x, λ ) must be complete.The completeness relation has the form Indeed, if Eq. ( 97) is satisfied and f (x) is any function for which the following integrals exist, then Equations ( 98)-(100) are consistent with Eqs. ( 91) and (94).For a standard Sturm-Liouville problem, in which the operator has real coefficients, the completeness relation ( 97) is known to hold in a certain functional space [29,31].In more general problems, which might occur for multi-variable FPEs, the completeness relations of the eigenfunctions must be established on a case-by-case basis [30].As we will show shortly, in the case of Eqs. ( 92) and ( 93), together with the boundary conditions stated after Eq. ( 94), the completeness relation ( 97) is equivalent to the orthogonality relation (94).We reiterate that, in general, the completeness relation has to be established independently in order to justify the eigenfunction expansion (91).We now return to the derivation of the conditional PDF.The eigenvalue problem (92)-(94) has the well-known solution where J n (z) is the Bessel function of order n.The completeness relation for the eigenfunctions ( 101) and (102) follows from the orthogonality relation (94), because the arguments of the Bessel functions depend symmetrically on λ and x.Notice that the functions where N n (z) is the Neumann function of order n, also solve (92) and are bounded at x = 0.However, they are not included in the set of eigenfunctions p(x, λ ) because they are not required for completeness.Notice also that for µ > 3/2 the individual eigenfunctions p(x, λ ) are not normalizable due to their divergence at x = ∞.This behavior does not invalidate expansion (91), but does make its convergence non-uniform in ζ : One cannot interchange the limits x → ∞ and ζ → ∞ in the resulting solution [31].By substituting Eqs. ( 101) and (102) into Eq.(91), and using the relation [32] ∞ one finds that Even though we did not impose boundary conditions on Eqs. ( 79) and (88), the eigenfunctions (101) and the solution (104) satisfy the condition of vanishing probability flux, at the boundaries x = 0 and x = ∞.Not only does the Sturm-Liouville analysis validate the Laplace-transform analysis (x 0 = 1), it also obviates the need for a priori boundary conditions.Solution (104) is identical to the solution obtained in previous studies [14,15,16] by a different approach.We expect that the approach presented herein, which is based on a FPE, can be generalized in a straightforward way to obtain the PDFs of other pulse parameters.
Energy PDFs are plotted in Fig. 1 for µ = 5 and ζ = 3.3 × 10 −3 , which correspond to the physical parameters described in Section 3. The solid, dot-dashed and dashed curves represent the exact PDF (86), the approximate PDF (78) and the Gauss PDF (75), respectively.All three PDFs have (approximately) the same mean and variance.When plotted on a linear scale (Fig. 1a), the differences between the PDFs are barely perceptible.When plotted on a logarithmic scale (Fig. 1b), it is clear that the exact and approximate PDFs have enhanced high-energy tails and diminished low-energy tails relative to the Gauss PDF.In particular, at high energies these (logarithmic) PDFs decrease linearly with energy, rather than quadratically [20].To compensate for this distortion, their peaks of are shifted to slightly lower energies (Fig. 1a).For the stated  86), the approximate PDF (78) and Gauss PDF (75), respectively.The differences between the exact and approximate PDFs are barely perceptible.Because the noise-induced energy perturbations depend on the pulse energy, the exact and approximate PDFs (which account for this dependence) have enhanced high-energy tails and diminished low-energy tails relative to the Gauss PDF (which does not).
parameters, the differences between the exact and approximate PDFs are insignificant.The probability of observing the low (normalized) energy X = 0.5 is about three orders of magnitude smaller than that predicted by the Gauss PDF and the probability of observing the high energy X = 1.5 is about two orders of magnitude larger.As the energy deviation |E − E 0 | increases, so also do the differences between the PDFs.These results show the danger of using Gaussian statistics to predict system performance.
In the past, the Gauss approximation was made often to facilitate predictions of bit-errorratios (BERs).For ASK systems, or DPSK systems with unbalanced detectors, the Gauss approximation predicts BERs that are correct to within an order of magnitude [14,15,16,33].However, this agreement is fortuitous: The Gauss approximation overestimates the probability of a 1 by several orders of magnitude (Fig. 1b), but underestimates the probability of a 0 by a similar amount, and the two errors cancel.For DPSK systems with balanced detectors the Gauss approximation overestimates the BER by several orders of magnitude [33].

Simulation of energy jitter
One can verify the results of Section 4 by solving the FPE (79) numerically.We did so by using the Crank-Nicholson (CN) scheme [34], which is the standard finite-difference scheme for diffusion equations.
One can also simulate energy jitter by solving the SDE (73) numerically, and using a randomnumber generator to specify the noise increments.We will discuss finite-difference schemes for the general SDE (1).Schemes for the energy SDE are special cases of the general schemes.
If one applies the Euler scheme to Eq. ( 1), one obtains the FDE where the subscript n denotes the distance nδ z.The noise increments have zero mean and satisfy the correlation equation where δ mn is the Kronecker delta function.It follows from Eqs. ( 106) and (107) that δW n+1 is not correlated with X n .Equation ( 106) is the Ito FDE (26).
In contrast, if one applies the Heun (predictor-corrector) scheme to Eq. ( 1), one obtains the coupled FDEs By using Eq. ( 108) to approximate the predicted terms in Eq. ( 109), one finds that Let δ X n+1 = X c n+1 − X n .Then it follows from Eqs. ( 107) and (110) that Thus, the Heun scheme is equivalent to the FDE Equation ( 113) is the Stratonovich FDE (29).
Although the Heun scheme provides the means to simulate energy jitter in a manner that is consistent with the Stratonovich formulation, it was motivated by mathematical considerations (numerical accuracy), rather than physical considerations.One can also derive a Stratonovich scheme from Eq. ( 106), by modifying the properties of the noise increments in accordance with the discussion of Section 3 [Eq.( 14)].However, to do so accurately requires that each increment be correlated with many of its neighboring increments: The scheme that results is physically motivated, but cumbersome.
For some stochastic processes the Stratonovich SDE occurs naturally, whereas for others the Ito SDE occurs naturally.The Heun scheme for the former is more accurate than the Euler scheme for the latter [34].If higher accuracy is required for an Ito SDE, one can convert it to the equivalent Stratonovich SDE (which has a different drift term) and apply the Heun scheme to the latter.Energy PDFs are plotted in Fig. 2 for µ = 5 and ζ = 3.3 × 10 −3 , which correspond to the physical parameters described in Section 3. The solid curve denotes the analytical solution (86), whereas the dashed curve denotes the numerical solution of the FPE (79) and the dots represent the results of numerical simulations based on the Stratonovich FDE (113), with a = µ −1/2 and b = (2X) 1/2 , and the correlation equation (107).The simulations were made with an ensemble of 3 × 10 6 pulses.Importance sampling [22,23] was used to improve the simulation statistics (measure the probabilities of rare energy perturbations accurately).There is excellent agreement between the analytical predictions and the numerical results.Simulations of energy jitter in systems with lumped amplification were made recently by Moore [22].There was excellent agreement between the numerical solutions of the energy SDE and the NSE, on which the energy SDE is based.Thus, there is no reason to doubt the accuracy of formula (86) for (model) systems with UDA.

Effects of nonuniformly-distributed amplification
For systems with NDA, the normalized energy X = E/E(0) is governed by the Ito SDE where r is a random driving term with unit strength.The excess gain rate ν(z) = g(z) − α and the source strength both depend on z.
In the absence of noise the energy varies periodically with distance (oscillates).The spatial period l is the distance between the Raman pumps (or the Erbium amplifiers).In the presence of noise the energy undergoes an oscillatory random walk.Energy decision thresholds (for the detection of 1s) are defined as fractions of the mean energy.In the context of such thresholds, what is important is not the absolute energy variance, but the relative energy variance (which is measured relative to the mean energy).Thus, one can simplify the mathematical analysis of energy jitter and clarify the physical significance of the results by writing the energy variable as a product of oscillatory and non-oscillatory factors.Let X = Y exp z 0 ν(z )dz .Then the modified energy Y is governed by the equation where the modified source strength Equation ( 116) is similar to Eq. (73).However, for systems with NDA, the source strength (117) oscillates.One can account for these oscillations by defining the dimensionless distance variable By using the fact that δ Equation ( 119) is identical to Eq. (73): The modified energy in systems with NDA evolves in the same way as the absolute energy in systems with UDA.Because the modified and absolute energies only differ by a scale factor, their PDFs are similar: One can apply the results of Section 4 to systems with NDA, provided that one modifies the definition of ζ and rescales the energy variable.
Comparisons of different nonlinear systems are usually based on the equality of pathaveraged energies.For systems with UDA, the path-averaged energy E a equals the input energy E(0) and the path-averaged source strength σ a = (n sp hωα/E a ).For systems with NDA, the path-averaged energy (121) For typical system parameters, energy jitter grows on a (long) distance scale of order 1 Mm, whereas the (short) distance between pumps (amplifier spacing) is of order 0.1 Mm.In the context of energy jitter, the slow (but steady) accumulation of the distance increments is more important than the (small) rapid oscillations in ζ , so one can make the approximation where the (dimensionless) ratio ρ is the product of the integrals in Eq. ( 121) divided by l 2 .It follows from this definition that ρ ≥ 1, which reflects the fact that NDA produces more noise than UDA.For example, consider a 10-Gb/s system with α = 0.21 dB/Km, β = −0.30ps 2 /Km (D = 0.38 ps/Km-nm), γ = 1.7/Km-W and backward Raman amplification.The gain rate where α p = 0.25 dB/Km is the fiber loss rate at the pump wavelength.A soliton with a fullwidth at half-maximum of 30 ps has a path-averaged energy of 21 fJ and an input energy of 75 fJ (time-averaged input power of 0.75 mW).If the system length l = 5 Mm the output noise power in both polarizations, in a frequency bandwidth of 12 GHz (wavelength bandwidth of 0.1 nm), is 5.4 µW: The (optical) SNR is 21 dB.(The SNR of this 5-Mm system with NDA is comparable to the SNR of the 10-Mm system with UDA that was discussed in Section 4.)  The mean and variance of the normalized energy X are plotted as functions of distance in Fig. 3 for the aforementioned physical parameters and µ = 5.One can determine the mean and variance by using the formulas of Section 4, and evaluating the scale factor exp z 0 ν(z )dz and the distance variable ζ numerically.One can also determine them by solving numerically the moment equations d z m x = ν(z)m x + µσ x (z), (124) which follow from Eq. ( 114) and the associated FPE.We used the latter approach.It is clear from the figure that the oscillations in excess-gain cause the mean energy to oscillate and the energy variance to oscillate as it grows.These large oscillations obscure the underlying simplicity of the energy-jitter phenomenon.The variance of the modified energy Y is plotted as a function of distance in Fig. 4. The solid curve was obtained by solving Eqs.(124) and (125) numerically, whereas the dashed line was obtained from the formula v y = 2ζ + µζ 2 .The distance variable ζ was evaluated by using Eq.(122), with ρ = 1.8.As stated above, the oscillations in ζ and (hence) v y are small: From a practical standpoint v y grows monotonically.

Summary
There are two ways to model stochastic (noisy) processes.The first involves a stochastic differential equation (SDE) for the physical quantity X, in which X is the dependent variable.The second involves a probability-diffusion, or Fokker-Planck, equation (FPE) for the probabilitydensity function (PDF) P(x), in which x is an independent variable.This paper contained brief descriptions of stochastic calculus, which facilitates the solution of SDEs, and the relation between SDEs and their associated FPEs, for which many solution methods exist.The Ito formulation is based on the assumption that the correlation-time of the noise is zero, whereas the Stratonovich formulation is based on the complementary assumption that the correlation-time is nonzero, but short.Both formulations were described.
A detailed study was made of energy jitter in optical communication systems.The nonlinear Schrödinger equation (NSE), which governs light-wave propagation in a fiber, was used to derive exact (Ito and Stratonovich) SDEs for the energy of an isolated bit (pulse and noise).Because the rate at which amplifier noise changes the bit energy depends on the current energy, these SDEs are nonlinear (multiplicative noise) and their (common) associated FPE has variable coefficients.The linearized SDE (additive noise) predicts that the energy PDF is Gaussian.Accurate formulas for the energy PDF were obtained from the nonlinear SDEs and their associated FPE.For typical system parameters (which were listed in Secs. 4 and 6) the actual PDF differs significantly from the Gauss PDF.Let E 0 be the (unperturbed) output energy in a noiseless system and let E be the (perturbed) output energy in a noisy system.Then the probability of observing the low energy E = 0.5E 0 is about three orders of magnitude smaller than that predicted by the Gauss PDF and the probability of observing the high energy E = 1.5E 0 is about two orders of magnitude larger: One cannot predict system performance accurately by linearizing the energy SDE (assuming Gaussian statistics).Reduced low-energy probabilities bode well for ASK systems (because the receiver eyes are closed by low energies, which are less likely), whereas increased high-energy probabilities bode ill for DPSK systems [because energy (power) jitter drives phase jitter and phase shifts of either sign close the receiver eyes.] In (idealized) systems with uniformly-distributed amplification (UDA) the energy variance increases monotonically with distance.In systems with nonuniformly-distributed amplification (NDA) the absolute energy variance oscillates as it grows.However, the relative energy variance (which is measured relative to the mean energy) grows monotonically.The (relative) growth rate in systems with NDA is higher than the (absolute) growth rate in systems with UDA because more noise is emitted in the former.
Finally, the mathematical methods and physical insights developed in this study of energy jitter can be applied to studies of other jitter phenomena (such as arrival-time and phase jitter).
a and b are deterministic functions and r is a random driving term with unit strength [ r(z) = 0 and r(z)r(z ) = δ (z − z )].Henceforth, the explicit dependence of a and b on z will be suppressed.Let r e denote the total rate of change of the bit energy [right side of Eq. (56) or (57)] and let δ r e denote the deviatoric rate of change r e − r e .Then, for an Ito SDE, r e has the canonical properties [Eqs.(44) and (46)] r e (z) = a(E), (58) δ r e (z)δ r e (z ) = b 2 (E)δ (z − z ).

Fig. 1 .
Fig.1.PDFs for the normalized energy E/E 0 .The solid, dot-dashed and dashed curves represent the exact PDF (86), the approximate PDF (78) and Gauss PDF (75), respectively.The differences between the exact and approximate PDFs are barely perceptible.Because the noise-induced energy perturbations depend on the pulse energy, the exact and approximate PDFs (which account for this dependence) have enhanced high-energy tails and diminished low-energy tails relative to the Gauss PDF (which does not).

Fig. 2 .
Fig. 2. PDFs for the normalized energy E/E 0 .The solid curve represents the analytical solution (86), whereas the dashed curve represents the numerical solution of the FPE (79) and the dots represent the results of importance-sampled simulations based on the FDE (113).
z )dz dz l.

Fig. 3 .
Fig. 3. Mean and variance of the normalized energy E/E(0) plotted as functions of distance.The curves were obtained by solving Eqs.(124) and (125) numerically.

Fig. 4 .
Fig. 4. Relative energy variance plotted as a function of distance.The solid curve was obtained by solving Eqs.(124) and (125) numerically, whereas the dashed line was obtained from an approximate analytical formula.