Conditional Recurrent Neural Networks for broad applications in nonlinear optics

We present a novel implementation of conditional Long Short-Term Memory Recurrent Neural Networks that successfully predict the spectral evolution of a pulse in nonlinear periodically-poled waveguides. The developed networks offer large flexibility by allowing the propagation of optical pulses with ranges of energies and temporal widths in waveguides with different poling periods. The results show very high agreement with the traditional numerical models. Moreover, we are able to use a single network to calculate both the real and imaginary parts of the pulse complex envelope, allowing for successfully retrieving the pulse temporal and spectral evolution using the same network.


Introduction -Machine Learning
Machine learning is a branch of artificial intelligence and computer science that involves the implementation of numerical algorithms, which autonomously learn to successfully make predictions in various fields [1,2]. It has been extensively applied in image recognition and classification [3,4], natural language processing [5], time-series prediction [6], cybersecurity [7], healthcare [8], autonomous vehicle control [9] and neuroscience research [10]. In recent years, machine learning techniques have also been developed for different applications in optics [11][12][13], such as inverse design of photonic structures [14] and optical microscopy [15].
Nonlinear optical pulse propagation is another field well suited to harnessing the capabilities of neural networks and machine learning. A number of different architectures and implementations have been proposed in recent years, ranging from physics-informed solutions that rely on the network's knowledge of the governing equations [16], to fully model-free, data-driven methods that can, for instance, predict the outcomes of the numerical solution of the nonlinear Schrödinger equation [17][18][19], or improve numerical simulations of optical rogue waves [20]. Optical pulse propagation in a nonlinear medium depends on the parameters of the input pump as well as the geometry of the medium. The latter can drastically affect the pulse dynamics by modifying the dispersion profile and the nonlinearity strength. Distinguishing between the nonlinear pulse propagation in a range of different structures would often require training multiple neural networks, one for each structure.
In this work, we develop a fully data-driven approach using conditional recurrent neural networks (RNN) [21] to predict the outcomes of nonlinear pulse propagation for a range of input-pulse parameters in different waveguide structures that simultaneously exhibit second- and third-order nonlinearities. The developed approach also allows for the training of a unified network to compute both the spectral and temporal evolution of a pulse by assigning the real and imaginary parts of the pulse complex envelope as the network condition. We present in this paper the architecture of the conditional RNN, followed by different examples that demonstrate its potential in successfully simulating the dynamics of ultrashort pulses in nano-photonic periodically poled lithium-niobate waveguides.
The paper is organised as follows: In Sec. 2, we present the numerical model, traditional RNNs and the conditioning of sequential data. Two trained networks based on a single conditional recurrent architecture are discussed in Sec. 3. A trained network based on a dual conditional architecture, with two separate conditioning parameters, is explained in Sec. 4. Finally, our conclusion, together with a comparison between the performances of the different conditional RNNs and the numerical model, is presented in Sec. 5.
arXiv:2312.13326v1 [physics.optics] 20 Dec 2023

Numerical Model
The complex dynamics of the pulse electric field in periodically poled waveguides with simultaneous second- and third-order nonlinearities can be simulated with high accuracy using the unidirectional pulse propagation equation (UPPE) [22,23]. In this equation, z is the propagation axis, ω is the angular frequency, Ẽ(z, ω) = F{E(z, t)} is the spectral electric field, t is the time in a reference frame moving with the pulse group velocity, F is the Fourier transform, β(ω) is the full dispersion, β_1 is the first-order dispersion coefficient, c is the speed of light in vacuum, and χ^(2) and χ^(3) are the second- and third-order nonlinear coefficients, respectively. The pulse complex envelope can be extracted using A(z, t) = ℰ(z, t) exp{i[ω_0 t − β(ω_0) z]}, with ω_0 the pulse central frequency, ℰ the analytical signal given by ℰ(z, t) = E(z, t) − i H[E(z, t)], and H the Hilbert transform [24]. The UPPE model is solved by implementing the split-step Fourier method [25] with a longitudinal step size in the range Δz ≤ 0.2 µm. For certain poling periods, we found that a very small step size is required to prevent the model from diverging. Hence, to optimise the performance of the model, an adaptive algorithm that decreases Δz only when required is applied. Although the solution is highly accurate, it is very time consuming and computationally demanding due to the extremely fine step size needed, together with the fine spectral resolution necessary for modelling the electric field.
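The envelope extraction described above can be illustrated with a minimal NumPy sketch. It assumes a uniformly sampled real field at a fixed position and an FFT-based Hilbert transform. The function names, the default arguments and the test pulse are hypothetical and are not taken from the UPPE code used in this work.

```python
import numpy as np

def hilbert_transform(field):
    """FFT-based Hilbert transform of a real, uniformly sampled signal."""
    spectrum = np.fft.fft(field)
    freqs = np.fft.fftfreq(field.size)
    # The Hilbert transform multiplies the spectrum by -i * sign(frequency).
    return np.fft.ifft(-1j * np.sign(freqs) * spectrum).real

def complex_envelope(field, t, omega0, beta0=0.0, z=0.0):
    """Envelope A = (E - i H[E]) * exp(i (omega0 t - beta0 z)).

    beta0 stands for beta(omega0); it only adds a constant phase at fixed z.
    """
    analytic = field - 1j * hilbert_transform(field)
    return analytic * np.exp(1j * (omega0 * t - beta0 * z))

# Illustrative input: a Gaussian pulse on a carrier (arbitrary units).
t = np.arange(2048) - 1024.0
gaussian = np.exp(-t**2 / (2 * 60.0**2))
omega0 = 2 * np.pi * 0.1
envelope = complex_envelope(gaussian * np.cos(omega0 * t), t, omega0)
```

For this test pulse the recovered envelope matches the Gaussian used to build the field, which gives a quick consistency check on the sign convention.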

RNNs with LSTM layers
Using recurrent neural networks can drastically reduce the simulation time of numerical solutions. RNNs are ideal for learning sequential data, due to the feedback loops, or connections within their layers [21,26], that provide an internal memory of past observations. In particular, RNNs with long short-term memory (LSTM) [27] layers can deal with longer sequences, where the sequential dependence of the data spans a large number of observations, since these networks avoid the vanishing-gradient problem [28]. The standard implementation of an LSTM recurrent network consists of feeding a fixed number of sequential inputs, taken at positions equally spaced by Δz, to the network in order to predict the output at the next step. For instance, the pulse spectral profiles at these positions are fed to a neural network to calculate the output at the next position, as shown in the simplified diagram of an LSTM network in Fig. 1. In practice, the network consists of more LSTM and Dense layers with varying numbers of nodes. A limitation of this architecture is that it is structure-specific, since the sequential data learned by the network lacks any information about the structure, which can dramatically affect the output.
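The windowing of a propagation into training samples can be sketched as follows. This is a minimal illustration that assumes the evolution is stored as a 2D array with one spectral profile per position. The function name and the array layout are hypothetical.

```python
import numpy as np

def make_windows(evolution, n_steps=10):
    """Split one propagation into (input window, next-step target) pairs.

    evolution: array of shape (n_z, n_freq), one spectral profile per position.
    Returns X with shape (n_z - n_steps, n_steps, n_freq)
    and y with shape (n_z - n_steps, n_freq).
    """
    evolution = np.asarray(evolution)
    n_z = evolution.shape[0]
    if n_z <= n_steps:
        raise ValueError("need more positions than the window length")
    # Window k holds positions k .. k + n_steps - 1; its target is position k + n_steps.
    X = np.stack([evolution[k:k + n_steps] for k in range(n_z - n_steps)])
    y = evolution[n_steps:]
    return X, y
```

At prediction time the same window slides forward, with each predicted profile appended to the sequence and the oldest one dropped.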

Conditional RNN
Feeding additional data is therefore crucial for predicting different dynamics that are affected by non-sequential data, such as the waveguide parameters. Such data can be fed to an RNN by appending it to the sequence, as an additional value or vector within the input matrix. This approach relies on the network learning autonomously that the additional information is not part of the sequential data, but rather a conditioning parameter, which may lead to very long and possibly unsuccessful training. Alternatively, the conditioning of the prediction can be performed a posteriori, by concatenating the RNN with another form of neural network. However, this method is ineffective in capturing a wide range of dynamics. An ideal approach is instead to condition the hidden state [27] of the recurrent or LSTM cell as it is fed to the cell itself. This solution has been integrated in the form of a Keras wrapper for recurrent layers, the ConditionalRecurrent [29], shown in Fig. 2. The wrapper acts on the input hidden state of every cell, allowing the architecture to be implemented within a model similarly to a standard LSTM cell.
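The idea of conditioning the hidden state, rather than the input sequence, can be illustrated with a small NumPy sketch. It is not the implementation of the ConditionalRecurrent wrapper [29]. It assumes, for illustration only, that the condition is mapped to the hidden state by a single linear projection, and all names and weight scales are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step; gates are stacked as [input, forget, cell, output]."""
    gates = W @ x + U @ h + b
    i, f, g, o = np.split(gates, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_params(n_in, n_cond, n_hidden, seed=0):
    """Random weights for the sketch (untrained, illustrative scales)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((4 * n_hidden, n_in))
    U = 0.1 * rng.standard_normal((4 * n_hidden, n_hidden))
    b = np.zeros(4 * n_hidden)
    Wc = rng.standard_normal((n_hidden, n_cond))  # assumed condition projection
    bc = np.zeros(n_hidden)
    return W, U, b, Wc, bc

def conditioned_lstm(sequence, cond, params):
    """Run the LSTM with a hidden state initialised from the condition."""
    W, U, b, Wc, bc = params
    h = Wc @ cond + bc      # conditioned hidden state
    c = np.zeros_like(h)    # the cell state is not touched by the condition
    for x in sequence:
        h, c = lstm_step(x, h, c, W, U, b)
    return h
```

With identical sequential inputs, two different condition values (for example two poling periods) lead to different outputs, which is the behaviour the conditioning is meant to provide.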

Single conditional RNN
The architecture of the conditional RNN with a single condition is shown in Fig. 3. The model has been designed with the Keras functional API [30], which allows for a non-sequential model, on account of the multiple inputs, i.e. the sequential and conditional data. To correctly initialise the hidden states, the conditional data is fed twice to the network, initially with the sequential data, then with the conditioned sequential data output from the first LSTM layer. This proved to be a good compromise between the quality of the prediction and the associated speed.

Uniformly and linearly chirped poled waveguides
We have trained the RNN to predict the spectral pulse evolution in different uniformly poled thin-film lithium niobate waveguides with a trapezoidal core, 950 nm top width, 340 nm etching height, 800 nm thickness, 60 degree side angle, 4.56 mm length, and silica as a bottom cladding. The pulse central wavelength is fixed at 1550 nm, whereas the pulse full-width-half-maximum (FWHM) and energy are varied over the ranges 25–30 fs and 3–5 pJ, respectively. This is equivalent to dispersion and nonlinearity lengths in the ranges 5.27–7.59 mm and 0.82–1.63 mm, respectively. The uniform poling period of the waveguide is varied over the range 3.2–4.4 µm. The second- and third-order coefficients are assumed to be χ^(2) = 26 pm/V and χ^(3) = 3417 pm^2/V^2, respectively.
The network is trained with the simulations generated by the UPPE model, with the poling period as the conditional parameter. Ten sequential steps of the spectral profile are fed together with the poling period to obtain the next spectral intensity. The training and testing were performed on 390 and 168 propagations, respectively. The data were split so that the testing set is a representative sample of the overall dataset. The predictions of the RNN are compared with the values calculated by the UPPE model via the normalised root mean square (RMS) error R, which is equivalent to the relative error and is evaluated over both the spectral and spatial coordinates. The simulation data can be fed to the network with arbitrary sampling along both the spectral and spatial axes, because the network usually does not require the extremely fine grid used by traditional numerical simulations. The sampling has therefore been determined heuristically, since a higher resolution would lead to higher accuracy at the expense of slowing down the trained network. For this example, we have opted for a 500×150-point grid, which results in a 15 nm spectral resolution and a 30 µm spatial resolution. The Dense output layer has been assigned a sigmoid activation function, since the training data has been normalised within the range (0,1). Figures 4 and 5 compare the spectral evolution predicted by the UPPE model and the conditional RNN for two different poling periods, Λ = 3.9 µm and 4.2 µm, respectively, as well as different sets of pulse parameters that have not been used in the training process. A comparison between the results of the two models at four specific locations along the waveguide is also depicted. As portrayed, changing the poling period dramatically modifies the pulse dynamics inside the waveguide [23]. Nevertheless, the outcomes of the UPPE and the conditional RNN are in excellent agreement in both cases. This is reflected in the normalised RMS errors being R = 0.015 and 0.019 for the former and latter cases, respectively. The predictions are also very good across all the testing data, with very small fluctuations in accuracy.
The same architecture displayed in Fig. 3 is also trained to predict pulse propagation through linearly chirped poled waveguides, which can provide quasi-phase-matching for multiple second-order nonlinear interactions. The period at any point z along the propagation axis is determined by a chirp parameter that depends on the initial poling period Λ_0 and the final poling period. The shape of the continuous chirp function can be discretised as a stair-like function with multiple segments, each of constant poling period [23]. The training is performed over a 10-mm long lithium niobate waveguide, where the poling period at each step is set as the conditional parameter.
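The stair-like discretisation of a chirped poling profile can be sketched as below. As an assumption made only for this illustration, the period is taken to vary linearly with position between its initial and final values, which may differ from the chirp function used in this work. The function names and the number of segments are hypothetical.

```python
import numpy as np

def staircase_poling(period_start, period_end, length, n_segments):
    """Return the segment edges and the constant period of each segment.

    The period of a segment is the assumed linear chirp evaluated at its centre.
    """
    edges = np.linspace(0.0, length, n_segments + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    periods = period_start + (period_end - period_start) * centres / length
    return edges, periods

def period_at(z, edges, periods):
    """Look up the poling period of the segment that contains position z."""
    idx = np.searchsorted(edges, z, side="right") - 1
    idx = np.clip(idx, 0, len(periods) - 1)  # keep z = length in the last segment
    return periods[idx]

# Example: periods in µm, length in mm, five segments.
edges, periods = staircase_poling(3.85, 4.1, 10.0, 5)
```

The value returned by `period_at` at each propagation step is the kind of quantity that would be passed to the network as the condition.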
The pulse energy and temporal duration ranges are 8–12 pJ and 30–40 fs, corresponding to dispersion and nonlinearity lengths in the ranges 7.59–13.5 mm and 0.41–0.82 mm, respectively. The linearly chirped poling period varies within the domain 3.8–4.2 µm. A lower spatial resolution of 50 µm has been implemented, while maintaining the same spectral resolution, resulting in a 500×200-point grid. The model was trained on 60 propagations and tested on 30, with both subsets being a representative sample of the entire dataset. The rest of the simulation and training parameters are the same as in the uniform poling case.
The spectral evolution of an optical pulse with parameters within the aforementioned ranges, in a waveguide with a poling period chirped over the range 3.85–4.1 µm, is portrayed in Fig. 6 for both the UPPE and the conditional RNN. The simulations shown are for a set of parameters belonging to the testing data. As depicted, very good agreement between the two models has been obtained, with a normalised RMS error of R = 0.08. Similar results have also been obtained for other linearly chirped waveguides with different poling-period ranges. Although the training and testing datasets are relatively small in comparison to the previous application, because of the training time needed in this case, very good agreement is still obtained across the whole testing dataset.
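The normalised RMS errors quoted in this section can be evaluated with a short helper such as the one below. It assumes the commonly used form R = sqrt(Σ (x − x̂)² / Σ x²), where x is the reference value, x̂ is the prediction, and the sums run over all spectral and spatial grid points. The function name is hypothetical and the exact expression used in this work may differ.

```python
import numpy as np

def normalised_rms_error(reference, prediction):
    """Relative RMS error between a reference map and a predicted map."""
    reference = np.asarray(reference, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if reference.shape != prediction.shape:
        raise ValueError("reference and prediction must share the same grid")
    residual = np.sum((reference - prediction) ** 2)
    return float(np.sqrt(residual / np.sum(reference ** 2)))
```

With this definition, a prediction that deviates from the reference by a uniform 2% gives R = 0.02.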

Double conditional RNN
The network can also be trained to predict the real and imaginary parts of the complex envelope A, by using a boolean switch as the condition to inform the network whether the prediction being performed corresponds to the real or the imaginary part. Hence, both the magnitude and phase of the complex envelope A can be determined using the same network. Subsequently, predicting the pulse spectral evolution allows us to calculate its temporal evolution by applying the inverse Fourier transform, provided the data is on a Fourier grid. The ability of the network to retain the phase of the complex envelope widens the applications of the proposed technique, for example to calculating the spectral coherence and the cross-frequency resolved optical gating (XFROG). Moreover, this condition can be combined with the poling period in a double-condition recurrent network, as shown in Fig. 7. Similarly to the single-conditional network, each condition needs to be fed twice for the best initialisation of the hidden states. The conditional RNN is optimised and designed for a single condition. Therefore, in the case of multiple conditions, each layer should be initialised for each condition individually. We found that either feeding the two conditions simultaneously, or feeding the first condition twice followed by the second condition twice and so on, would result in a less accurate prediction.
Using the simulations generated by the UPPE model, the architecture displayed in Fig. 7 has been trained on the spectral evolution of the real and imaginary parts of the pulse complex envelope in uniformly poled waveguides, where the period is in the range 3.4–3.8 µm. The simulation parameters are the same as in the previous examples except for the pulse energy and temporal duration, which are varied over the ranges 5–7 pJ and 25–40 fs, respectively. The model was trained and tested on 88 and 44 propagations, respectively, and both subsets were selected to be a representative sample of the entire dataset. The sequence length is set to 10, and the two conditions, the poling period and the complex envelope, are fed at each step. The data is sampled with a spectral resolution of ≃ 1.5 m to allow the application of the inverse Fourier transform to correctly restore the pulse temporal evolution. The number of points along the frequency axis is still reduced to ∼ 5k by removing the points outside the transparency window of the waveguide, which are added back as zeros after the prediction is performed by the network. We found that this has a negligible effect on the complex envelope. The spatial resolution is set at 30 µm. The Dense output layer has been set to a tanh activation function, since the data in this case is normalised in the range (-1,1).
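The post-processing step described above, in which the predicted real and imaginary parts are recombined, the removed spectral points are restored as zeros, and the temporal profile is obtained by an inverse Fourier transform, can be sketched as follows. The array layout, the function names and the assumption that the full frequency grid is stored in ascending (centred) order are hypothetical, and the sign convention of the transform may differ from the one used in this work.

```python
import numpy as np

def restore_full_spectrum(real_part, imag_part, window, n_full):
    """Rebuild the complex spectrum on the full grid.

    real_part, imag_part: arrays of shape (n_z, n_kept) predicted by the network.
    window: integer indices of the kept points on the full frequency grid.
    Points outside the window are filled with zeros.
    """
    spectrum = np.zeros((real_part.shape[0], n_full), dtype=complex)
    spectrum[:, window] = real_part + 1j * imag_part
    return spectrum

def temporal_evolution(spectrum):
    """Inverse FFT along the frequency axis of a centred spectrum."""
    return np.fft.ifft(np.fft.ifftshift(spectrum, axes=1), axis=1)
```

The temporal intensity at each position then follows from the squared modulus of the returned field.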
The spectral and temporal evolution of an optical pulse with an input energy of 6 pJ and a temporal width of 30 fs in a 4.56 mm-long lithium niobate waveguide with a uniform poling period Λ = 3.4 µm is displayed in Fig. 8. These pulse parameters were not used in the training process. As shown, the results of the RNN and UPPE models are in very good agreement, with normalised RMS errors of R = 0.009 and R = 0.03 for the spectral and temporal evolution, respectively.

Conclusions
To summarise and compare the performance of the presented conditional RNNs and the UPPE model, Table 1 shows the simulation parameters and speed of the different approaches described in this work. Since the waveguide lengths can vary, we have compared the average simulation speed rather than the time. The simulation speed for the UPPE is an estimate, since the adaptive algorithm can significantly slow down the simulations to avoid divergence. As illustrated, the conditional RNN can be up to 60 times faster than the UPPE model for the same dynamics, demonstrating its potential in predicting the outputs of nonlinear pulse propagation in waveguides. The average normalised RMS errors for each application are also shown in the table. They have been calculated over the entire testing set for each application, and they show good agreement between the RNN predictions and the numerical solutions. Moreover, the obtained RMS errors are lower than those of other related works [17,18], which achieve values within the range 0.09–0.19.
Temporal evolution predicted indirectly from the spectral envelope using the inverse Fourier transform.
Step size usually becomes much shorter than this value via the implemented adaptive algorithm, particularly in the pulse compression regime.
Average time to train with 200 propagations for 500 epochs.
In conclusion, we have exploited single and dual conditional architectures within LSTM models to predict nonlinear pulse propagation in optical periodically poled nano-waveguides with second- and third-order nonlinearities. These innovative approaches have demonstrated their capability to significantly outpace traditional numerical models while maintaining remarkable accuracy. The successful integration of conditional data into sequential data is a critical breakthrough in pulse-propagation modelling. Notably, this work marks the first success in predicting both the spectral and temporal evolution of a pulse using the same neural network, allowing for the calculation of XFROG spectrograms or the spectral coherence of the optical source. This research will offer new opportunities for the development of faster and more efficient approaches to modelling nonlinear phenomena. In this work, we set the poling period and the real/imaginary parts of the complex envelope as conditions for the RNNs. However, other conditional parameters can be included and explored as well, such as the waveguide geometry or different core materials. Since the proposed approach is fully data-driven, it can be easily applied to other nonlinear physical systems, for instance, Bose-Einstein condensates that can be described by the Gross-Pitaevskii equation [31]. Since the proposed networks rely on supervised learning, they are unable to predict phenomena not captured by the model used in training. However, the RNNs perform the predictions at a constant rate, regardless of the complexity of the dynamics, which can dramatically reduce the computational speed of traditional models. For instance, when the cascaded second-order nonlinearity [32] is dominant, we found that the computational time of traditional models can reach an hour or more for a propagation of just a few millimetres, whereas the RNN models always predict the outcome in a few seconds. This approach can hence be utilised in optimising a wide range of input parameters to tailor a certain output. Finally, we envisage that this work will offer fruitful opportunities for expanding the applications of machine learning in linear and nonlinear optics.

Fig. 1. Standard implementation of an LSTM network for predicting the spectral evolution of a pulse. A fixed number of propagation steps, shown in a colour plot (left), are fed to the network (middle) to predict the next step (right).

Fig. 2. Diagram of ConditionalRecurrent (in orange) around an LSTM cell (green). The wrapper embeds the condition, cond, within the hidden state of the LSTM cell, h_(z−Δz), feeding the conditioned hidden state h_cond to the cell. The cell state c_(z−Δz) and the input at z − Δz are not affected.

Fig. 3. Architecture of a single-condition RNN, where the conditional data is fed twice to the network, while the sequential data is only fed once.

Fig. 4. (a,b) Spectral evolution of an optical pulse centred at 1.55 µm with input energy 3.75 pJ and temporal FWHM 28.5 fs, in a 4.56 mm-long LiNbO3 waveguide with a uniform poling period Λ = 3.9 µm, predicted by the numerical UPPE model (a) and the conditional neural network (b). (c-f) Spectral intensity profiles at four different positions along the waveguide.

Fig. 5. (a,b) Spectral evolution of an optical pulse centred at 1.55 µm with input energy 4.25 pJ and temporal FWHM 26.5 fs, in a 4.56 mm-long LiNbO3 waveguide with a uniform poling period Λ = 4.2 µm, predicted by the numerical UPPE model (a) and the conditional neural network (b). (c-f) Spectral intensity profiles at four different positions along the waveguide.

Fig. 6. (a,b) Spectral evolution of a pulse with an input energy of 12 pJ and a temporal width of 30 fs in a 10 mm-long LiNbO3 waveguide with a poling period chirped from Λ_0 = 3.85 µm to a final period of 4.1 µm, predicted by the UPPE model (left) and the conditional neural network (right). (c-f) Spectral intensity at different distances throughout the propagation.

Fig. 7. Architecture of a dual conditional model, where each conditional data is fed twice to the network and the sequential data is fed only once.

Fig. 8. (a,b) Spectral and (c,d) temporal evolution of a pulse with an energy of 6 pJ and a temporal width of 30 fs in a 4.56 mm-long LiNbO3 waveguide with a uniform poling period Λ = 3.4 µm, using the UPPE model (a,c) and the double conditional neural network (b,d).

Table 1. Comparison between the UPPE and the trained networks used in Figs. 4 and 5 (Uniform NN), Fig. 6 (Chirp NN), and Fig. 8 (Complex NN) in terms of the simulation parameters and speed.