Physics-constrained machine learning for electrodynamics without gauge ambiguity based on Fourier transformed Maxwell’s equations

We utilize a Fourier transformation-based representation of Maxwell's equations to develop physics-constrained neural networks for electrodynamics without gauge ambiguity, which we label the Fourier–Helmholtz–Maxwell neural operator method. In this approach, both of Gauss's laws and Faraday's law are built in as hard constraints, as is the longitudinal component of the Ampère–Maxwell law in Fourier space, assuming the continuity equation. An encoder–decoder network acts as a solution operator for the transverse components of the Fourier transformed vector potential, Â⊥(k, t), whose two degrees of freedom are used to predict the electromagnetic fields. This method was tested on two electron beam simulations. Among the models investigated, a U-Net architecture exhibited the best performance: it trained quicker, was more accurate, and generalized better than the other architectures examined. We demonstrate that our approach is useful for solving Maxwell's equations for the electromagnetic fields generated by intense relativistic charged particle beams and that it generalizes well to unseen test data, while being orders of magnitude quicker than conventional simulations. We show that the model can be re-trained to make highly accurate predictions in as few as 20 epochs on a previously unseen data set.


Formalism

Maxwell's equations in Fourier space
The formalism employed here is also used in the canonical quantization of the electromagnetic field 41, where distinguishing between the physical degrees of freedom and the gauge redundancies is important. In the context of our study, we use it to minimize the number of fields that need to be modeled, while also ensuring that a substantial number of Maxwell's equations are automatically satisfied as hard constraints. We start with Maxwell's equations in position space: where E(r, t), B(r, t), J(r, t), and ρ(r, t) are the time-varying electric field, magnetic field, current density, and charge density, respectively 42. We perform a 3D spatial Fourier transform on Maxwell's equations, which is defined for a vector field F(r, t) as and rewrite Eq. (1) (assuming k ≠ 0; see "Appendix A" for a discussion of the zero case): For each wave vector k, a generic vector can be separated into components longitudinal and transverse to k: An inverse Fourier transformation of Eq. (4) leads to a Helmholtz decomposition of F(r, t). Gauss's laws for the electric and magnetic fields (Eq. (3)) can then be expressed as: In Fourier space, the dependence of the electric and magnetic fields on the Fourier transformed scalar and vector potentials, φ̂(k, t) and Â(k, t), is given by Ê(k, t) = −ik φ̂(k, t) − ∂Â(k, t)/∂t, B̂(k, t) = ik × Â(k, t). The transverse field components are: Using Eq. (3), Faraday's law trivially follows: The transverse components of the Ampère–Maxwell equation become: while using Eq. (5) it can be shown that the longitudinal component of the Ampère–Maxwell equation is equivalent to the continuity equation. The continuity equation also relates Ĵ∥ to ρ̂, thus only ρ̂ and Ĵ⊥ need be used.
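The longitudinal/transverse split of Eq. (4) is straightforward to carry out numerically. The sketch below (an illustration, not the paper's code) projects an FFT'd vector field onto its components parallel and perpendicular to each wave vector k:

```python
import numpy as np

def transverse_project(F_hat, kx, ky, kz):
    """Split a Fourier-space vector field into longitudinal and
    transverse parts: F_hat = F_par + F_perp, with F_par parallel to k."""
    k = np.stack(np.meshgrid(kx, ky, kz, indexing="ij"))  # (3, Nx, Ny, Nz)
    k2 = np.sum(k * k, axis=0)
    k2[k2 == 0] = 1.0  # avoid dividing by zero at k = 0 (the zero mode)
    F_par = k * np.sum(k * F_hat, axis=0) / k2  # (k . F) k / |k|^2
    F_perp = F_hat - F_par
    return F_par, F_perp

# Sanity check on a random field: F_perp is orthogonal to k everywhere.
rng = np.random.default_rng(0)
N = 16
F = rng.standard_normal((3, N, N, N))
F_hat = np.fft.fftn(F, axes=(1, 2, 3))
kf = 2 * np.pi * np.fft.fftfreq(N)
F_par, F_perp = transverse_project(F_hat, kf, kf, kf)
k = np.stack(np.meshgrid(kf, kf, kf, indexing="ij"))
assert np.allclose(np.sum(k * F_perp, axis=0), 0, atol=1e-8)
```

An inverse FFT of `F_par` and `F_perp` then yields the Helmholtz decomposition in position space mentioned above.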

Fourier-Helmholtz-Maxwell neural operator
Given the charge and current density distributions, the EM fields can be determined from Â⊥(k, t), as per Eq. (9), and ρ̂(k, t). Â⊥(k, t) can be found from Ĵ⊥(k, t) via Eq. (8). From this, we propose finding the generated EM fields using the approach depicted in the diagram of Fig. 1, where the hard constraints (the ones automatically obeyed) are listed. Since this approach incorporates Fourier transformations, Helmholtz decompositions, and Maxwell's equations within a neural operator framework, we refer to it as the Fourier–Helmholtz–Maxwell neural operator (FoHM-NO) method.
In this work, we attempt to estimate the EM fields from ρ and J using the FoHM-NO method, specifically with convolutional neural networks. When applying neural networks to physics problems, prior theoretical knowledge can be leveraged to model useful quantities already identified by physicists (i.e., Â⊥(k, t)), rather than the direct observables (i.e., E(r, t) and B(r, t)). This has been demonstrated previously through the use of neural networks to model the Hamiltonian of classical systems 43, the scalar and vector potentials to estimate EM fields 40, and the stream function of an incompressible fluid's velocity flow field 44.
Ultimately, the FoHM-NO method exploits the fact that the EM fields have only 2 independent degrees of freedom, as can be seen from Eq. (9) in the source-less case. On a fundamental level, this stems from the photon being a massless spin-1 particle. One benefit of the method is that it requires fewer fields than the 6 that would be needed for directly estimating both the electric and magnetic fields, or the 4 necessary when using the scalar potential and the entire vector potential. With more fields to fit, there is increasing tension between fitting the data and satisfying the physical conditions. In contrast, with this procedure three of the four Maxwell's equations (Eqs. (5) and (7)) are built in as hard constraints by construction. This is also the case for the longitudinal component of the Ampère–Maxwell law, which follows from the continuity equation and Gauss's law for the electric field (see "Appendix B"). Working in Fourier space, meanwhile, has the advantage that spatial derivatives are traded for multiplication by wave vectors. Thus, constraints like Eq. (4) can be easier to satisfy than their spatial derivative counterparts. For a fast Fourier transform (FFT), if N is the total number of spatial data points, then the FFT has a time complexity of O(N log N). This, along with modern GPU computing, means that moving to and from Fourier space can be done extremely quickly, even for very large data sets. We did not perform a temporal Fourier transform as the time length of the simulations varied. This variation would pose challenges for the networks to learn in the frequency domain. Additionally, the adoption of a 4D CNN would present computational challenges and significantly increase the number of parameters in the models.
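The hard-constraint property is easy to verify numerically: any field constructed as B̂ = ik × Â satisfies k · B̂ = 0 identically, so Gauss's law for magnetism holds no matter what the network outputs for Â. An illustrative check (not the paper's code):

```python
import numpy as np

# Build B_hat = i k x A_hat from an arbitrary (here random) vector
# potential; k . B_hat vanishes identically, i.e. Gauss's law for
# magnetism is satisfied by construction.
rng = np.random.default_rng(1)
N = 8
kf = 2 * np.pi * np.fft.fftfreq(N)
k = np.stack(np.meshgrid(kf, kf, kf, indexing="ij"))  # (3, N, N, N)
A_hat = (rng.standard_normal((3, N, N, N))
         + 1j * rng.standard_normal((3, N, N, N)))
B_hat = 1j * np.cross(k, A_hat, axis=0)
div_B_hat = np.sum(k * B_hat, axis=0)  # k . B_hat
assert np.allclose(div_B_hat, 0, atol=1e-12)
```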
Computational EM methods carried out in configuration space have a notable advantage: an easy, straightforward incorporation of spatial boundary conditions, which can be useful in solving many real-world EM problems. Here, we do not consider any special spatial boundary conditions. For one, the simulation was simply started with the initial conditions of the particles. Secondly, neural networks have been found to be useful in solving inverse PDE problems, where the problem may be ill-posed, for example due to insufficient boundary conditions, a case where classical methods struggle or cannot be used [45–47]. We wish to apply our approach to inverse problems in future works.
An additional benefit of FoHM-NO is that by focusing on just Â⊥(k, t) rather than all the potential fields, the issue of gauge redundancy is bypassed. To see this, note that under a gauge transformation, Â(k, t) → Â(k, t) + ik f(k, t) for some scalar function f(k, t). Since this change in the vector potential occurs along the k direction, Â⊥(k, t) is left unchanged. Hence, by using just Â⊥(k, t) there is no need to choose a particular gauge. The gauge invariance of Â⊥(k, t) is well known in works involving the quantization of the EM field 41,48,49 and the spin-orbit decomposition of gauge fields 50.
Care must be taken, though, if one were to change inertial reference frames, as Â⊥(k, t), unlike the gauge-dependent A µ = (φ/c, A), is not a Lorentz 4-vector. For our purposes, namely finding efficient and accurate computational methods for accelerator physics, this is not a serious issue.
The rest of this article is a proof-of-concept application of FoHM-NO for predicting the EM fields generated by relativistic charged particle beams. However, the method is applicable to any EM setting where ρ and J are given, and it can also be extended to include the matter density fields in the predictions if the equations of state are used. While FoHM-NO may generally not perform as accurately as mature, state-of-the-art EM codes, the use of neural networks gives it the advantage of a very fast runtime. This can be useful in cases where speed is essential, such as accelerator control and diagnostics settings, where beam time can be quite limited. Alternatively, it can be used for exploratory analyses to find approximate solutions before bringing in more accurate, but slower, methods for refinement.

Data
Two simulations of electron beams entering a solenoid magnetic field were generated (see Fig. 2) to develop the FoHM-NO method. The first, the complex beam, was generated to train the neural networks. This complex beam was initially composed of several Gaussian bunches with spreads differing both between bunches and along the x, y, and z axes (see Fig. 3). The second, the Gaussian beam, was generated to test the networks on a completely new distribution which was not used for model training (see Fig. 4). This test beam starts as a single Gaussian bunch before entering the magnetic field. This enabled us to test the method under very different scenarios, with the more intricate complex beam having multiple length scales in order to train the networks over a wider range of inputs. The Gaussian beam, which is much simpler, was used as a test as it more closely resembles physical beams in accelerators. All the data was generated using General Particle Tracer (GPT) 51,52.
Considering the energy scales involved, the GPT simulations only considered the space-charge forces. For computational efficiency, GPT only registers EM fields where particles are present, creating artifacts where the EM fields suddenly drop to 0 where particles are absent. To address this issue, a large neutral group of particles co-moving with the beam was added to enable the registration of EM fields on the whole domain of interest without altering the motion of the electrons. Although the simulation took place on a 128 × 128 × 128 spatial lattice, there were artifacts where the EM fields abruptly dropped to zero at the edges (though not the matter fields). Therefore, the EM fields were cropped to 126 × 126 × 126 for both beams.
Each simulation ran a 2 nC bunch of electrons with average energy 5.6 MeV moving through a 0.5 T solenoid magnetic field, first being compressed and then expanding (see Fig. 3). The space is a 128 × 128 × 128 spatial grid. There were 201 time steps for the complex beam and 171 for the simple beam. The beam's dynamics were simulated over a total time of 1.2 ns, during which it traveled over a 0.35 m distance. The beam's charge density ρ(r, t) and current density J(r, t) were binned as 3D histograms over cubes with 1.3 mm side lengths, as shown in Figs. 3 and 4. The physical spacings of the grid were Δx = Δy = Δz = 10.2 µm and Δt = 6 ps. To improve convergence, we first scaled the data according to its standard deviation. The data had a heavy tail, so in addition a log-like transform Φ(x) = sgn(x) ln(1 + |x|) was performed, which has the property that larger values get greatly reduced, but smaller values stay about the same, since Φ(x) ≈ x for |x| ≪ 1.
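The signed log-like transform and its exact inverse can be sketched as follows (an illustration assuming NumPy, not the paper's preprocessing code); note how it compresses the heavy tail while leaving small values nearly untouched:

```python
import numpy as np

def log_transform(x):
    """Phi(x) = sgn(x) ln(1 + |x|): compresses large magnitudes,
    approximately the identity for |x| << 1."""
    return np.sign(x) * np.log1p(np.abs(x))

def log_transform_inv(y):
    """Exact inverse: x = sgn(y) (exp(|y|) - 1)."""
    return np.sign(y) * np.expm1(np.abs(y))

x = np.array([-1e4, -0.01, 0.0, 0.01, 1e4])
y = log_transform(x)
assert np.allclose(log_transform_inv(y), x)          # round-trips exactly
assert np.allclose(y[[1, 3]], x[[1, 3]], rtol=1e-2)  # Phi(x) ~ x for small x
```

Using `log1p`/`expm1` rather than `log(1 + x)`/`exp(y) - 1` keeps the transform numerically accurate for the small values that dominate the histograms.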
In this work there were no special boundary condition considerations; an external magnetic field generated by a cylindrically symmetric solenoid was provided as input. As charged particles moved through this field, their dynamics were affected by the external solenoid field and by their own self-generated electric and magnetic fields, which were calculated according to Maxwell's equations based on the charge density and current density of the charged particle beam. In this setup the accelerator beam pipe was assumed to be of large enough radius that image charges and currents induced in the accelerator walls did not need to be considered. In future work, boundary conditions including image charges and Maxwell's equations on the surfaces of conductors will be incorporated into the ML model as additional hard physics constraints.

Physics-constrained neural networks
Neural networks (NNs) are powerful machine learning tools that, by the universal approximation theorem, can be used to approximate a function to arbitrary precision 53,54. NNs' approximation abilities go even further: they are universal approximators for operators 55. This has been harnessed to produce approximate solutions to partial differential equations 56,57, 3D electrodynamics 40, and two-spatial-dimension magnetohydrodynamics simulations 58,59.
The inclusion of physics information, constraints, or symmetries into NNs can be used to help guide training 60. In some cases, these physics-incorporated NNs can produce accurate predictions that are less computationally demanding than numerical simulations, while also being easier to develop and implement 61. Here, we make use of a convolutional encoder-decoder architecture to make an NN that has Ĵ⊥ as input and Â⊥ as output.

Models and fit
For a relativistic beam propagating along the z direction, we generally have |J z | ≫ |J x |, |J y |, and consequently it is expected that the magnetic field is predominantly in the x and y directions. It was verified that this was indeed the case for both beam simulations used here. In fact, B z was both small and noisy, likely heavily corrupted by numerical errors. Thus, to make things more computationally efficient and increase training speed, the model was trained only using the z component of Ĵ⊥ to predict the z component of Â⊥. From this, the transverse components of the magnetic field, B x and B y, are obtained via B̂ = ik × (Â⊥,z ẑ). The models were trained on the first 85% of the complex beam run, which is the training set. The remainder, the validation set, was used for hyperparameter tuning. The entire Gaussian beam data was used as test data.
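The reconstruction B̂ = ik × (Â⊥,z ẑ) reduces to two pointwise multiplications in Fourier space, since k × (A_z ẑ) = (k_y A_z, −k_x A_z, 0). A minimal NumPy sketch (illustrative names, not the paper's code):

```python
import numpy as np

def b_transverse_from_Az(Az_hat, kx, ky, kz):
    """Given the z component of the transverse vector potential in
    Fourier space, return B_hat = i k x (Az_hat * z_hat):
        Bx_hat =  i * ky * Az_hat
        By_hat = -i * kx * Az_hat
    (the z component vanishes for a purely z-directed A)."""
    KX, KY, KZ = np.meshgrid(kx, ky, kz, indexing="ij")
    Bx_hat = 1j * KY * Az_hat
    By_hat = -1j * KX * Az_hat
    return Bx_hat, By_hat

# Toy input: a single Fourier mode; inverse FFTs recover B in position space.
N = 32
kf = 2 * np.pi * np.fft.fftfreq(N)
Az_hat = np.zeros((N, N, N), dtype=complex)
Az_hat[1, 2, 0] = 1.0
Bx_hat, By_hat = b_transverse_from_Az(Az_hat, kf, kf, kf)
Bx = np.fft.ifftn(Bx_hat)
By = np.fft.ifftn(By_hat)
```

By construction k · B̂ = k_x (i k_y A_z) + k_y (−i k_x A_z) = 0, so the predicted field is divergence-free for any network output.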

Architectures
Different architectures for the neural network were explored to optimize performance. The first models trained had a CNN encoder-decoder architecture, which is depicted in Fig. 5 (once the skip connections are removed). This was chosen as the reduction and reconstruction through the small latent space acts as an effective regularizer. In addition, it was successful in a previous beam simulation task 40. The encoder compresses the input Ĵ⊥,z to the latent space via various functional compositions, represented as layers. From there, two decoder arms are applied to the latent space, one producing ℜ(Â⊥,z) and the other ℑ(Â⊥,z). Empirically, this bifurcated structure produced better results than having a single decoder that gave both the real and imaginary components of Â⊥,z as output. We made two models following Fig. 5 differing by the number of filters in the latent space, 4 and 6 filters, which we label CNN-small-4 and CNN-small-6, respectively. We also used an architecture similar to CNN-small-4, but with more filters in the hidden layers and thus more parameters, dubbed here CNN-large. Finally, we investigated the adaptation of a U-Net encoder-decoder architecture 62, visually depicted in Fig. 5, which we call U-Net. U-Nets have found applications in scientific computation 63,64 and computer vision. One limitation of the CNN encoder-decoder architecture in Fig. 5 is that information about the input gets 'forgotten' as it is compressed to the latent space. The U-Net addresses this by adding 'skip connections' between hidden layers of the encoder and decoder that share the same dimensions. These connections concatenate the feature maps, allowing the decoder to pull information from the encoder and generally reproduce finer details more accurately than CNN architectures without skip connections. The skip connections are also useful during the training process by allowing for more gradient flow; thus they can act as an effective tool against vanishing gradients and can speed up training.

Fits
An L2 penalty for the weights and dropout with p = 0.05 were used for regularization. All the NNs were created, trained, and tested using TensorFlow 65, with training done using the Adam optimizer 66 with early stopping based on validation loss. All the machine learning training was done on a Dell Precision 7920 with two Nvidia RTX A6000 GPUs with 48 GB VRAM each. A summary of the four architectures with a description can be found in Table 1.
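The encoder-decoder with skip connections can be sketched in Keras as below. This is a minimal illustration, not the paper's exact configuration: the layer counts, filter numbers, and the small input shape are assumptions, and the two channels stand in for the real and imaginary parts of the Fourier-space fields.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet3d(shape=(32, 32, 32, 2), filters=16):
    """Minimal 3D U-Net sketch: conv encoder, small latent space, and a
    decoder whose layers concatenate the matching encoder feature maps."""
    inputs = tf.keras.Input(shape=shape)

    # Encoder: conv + downsample, keeping feature maps for the skips.
    c1 = layers.Conv3D(filters, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPool3D(2)(c1)
    c2 = layers.Conv3D(2 * filters, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPool3D(2)(c2)

    # Latent space.
    b = layers.Conv3D(4 * filters, 3, padding="same", activation="relu")(p2)

    # Decoder: upsample, then concatenate the encoder map of equal size.
    u2 = layers.UpSampling3D(2)(b)
    u2 = layers.Concatenate()([u2, c2])  # skip connection
    c3 = layers.Conv3D(2 * filters, 3, padding="same", activation="relu")(u2)
    u1 = layers.UpSampling3D(2)(c3)
    u1 = layers.Concatenate()([u1, c1])  # skip connection
    c4 = layers.Conv3D(filters, 3, padding="same", activation="relu")(u1)

    # Two output channels: real and imaginary parts of A_perp_z_hat.
    outputs = layers.Conv3D(2, 1, padding="same")(c4)
    return tf.keras.Model(inputs, outputs)

model = build_unet3d()
```

Removing the two `Concatenate` lines (and widening the conv layers) recovers the plain CNN encoder-decoder variants discussed above.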

Cost function and metric
NNs were trained to minimize the mean squared error (MSE) in Fourier space, summed over each time step in the training data, where B̂_NN is the NN prediction for the Fourier transformed magnetic field. By the Plancherel theorem, this quantity is equivalent to the MSE in position space over the training data. To evaluate the model performance at time step t we used the relative error of the magnetic field in position space, ε(t) = ‖B_NN − B‖ / ‖B‖. Here, ‖·‖ denotes a norm given by a spatial integral over the domain. The time-averaged ε(t) was used to evaluate a model over a data set.
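Both the relative-error metric and the Plancherel equivalence are easy to verify numerically. An illustrative sketch (not the paper's code; the normalization factor depends on the FFT convention used):

```python
import numpy as np

def relative_error(B_true, B_pred):
    """eps = ||B_pred - B_true|| / ||B_true||, with ||.|| the discrete
    L2 norm over the spatial domain."""
    return np.linalg.norm(B_pred - B_true) / np.linalg.norm(B_true)

# Plancherel: Fourier-space MSE equals position-space MSE
# (up to the FFT normalization, here NumPy's unnormalized forward FFT).
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 16, 16, 16))
B_pred = B + 0.1 * rng.standard_normal(B.shape)
mse_pos = np.mean((B_pred - B) ** 2)
N = 16 ** 3  # number of spatial points
err_hat = np.fft.fftn(B_pred - B, axes=(1, 2, 3))
mse_fourier = np.mean(np.abs(err_hat) ** 2) / N
assert np.isclose(mse_pos, mse_fourier)
```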

Results
The U-Net performed best across all data sets: the training, validation, and test data (see Table 2). In addition, as seen in Fig. 6, at each time step for each data set it either had the lowest error or was virtually tied. Remarkably, the U-Net was able to accomplish this with approximately half the training epochs of the smaller CNNs and a quarter that of CNN-large. Figure 7 shows predictions for B x at z = 0 for the two models which performed best on the test data sets and how they compare to the true values. The results for B y were similar, and thus are not shown. The small CNN networks struggled to capture the complexity of the training data. As seen in Fig. 7, while they had a qualitatively correct picture, they struggled to get many of the finer details correct. The large CNN model was able to capture the training data better, but its predictions quickly deteriorated on the validation set as it got further away from the training data, eventually surpassing the error of CNN-small-4, likely due to overfitting of the larger network. The U-Net captured the training data well, and although its error also increases on the validation set, as seen in Fig. 6, it does so at a much smaller rate than any of the CNN models, indicating it learned to generalize better. All the models struggled the most at the beginning of the Gaussian beam, where the initial beam was most different from what they were trained on. Of the CNN models, CNN-small-4 did the best job generalizing, though it still lagged behind the U-Net. These results reinforce that the U-Net generalizes best. The U-Net model took 380 training epochs. We took that trained model and retrained it on the first 85% of the Gaussian beam. In only 20 epochs, the model's relative error on the Gaussian beam dropped dramatically, and its performance was comparable to what it was on the complex beam (see Fig. 8). This demonstrates that our general approach can be useful for a wide range of new beam configurations which the model has not previously seen, so that it can be applied to the study of new setups.
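The retraining step is standard transfer learning: reuse the trained weights and fit briefly on the new beam. A toy Keras sketch (the small dense model and random data are illustrative stand-ins for the U-Net and beam data, which are not reproduced here):

```python
import numpy as np
import tensorflow as tf

# Stand-in for a model already trained on the complex beam; in practice
# one would load the saved U-Net weights instead.
pretrained = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2),
])
pretrained.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

# "New beam" data; in the paper this role is played by the first 85%
# of the Gaussian beam run.
rng = np.random.default_rng(3)
x_new = rng.standard_normal((256, 8)).astype("float32")
y_new = (x_new[:, :2] + 0.1 * rng.standard_normal((256, 2))).astype("float32")

# A short fine-tune: reused weights mean few epochs are needed.
history = pretrained.fit(x_new, y_new, epochs=20, verbose=0)
final_loss = history.history["loss"][-1]
```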

Improving generalization
To demonstrate the robustness and generalization capability of our approach when trained with more data, two new data sets were created, complex beam 2 and Gaussian beam 2. Like the previous simulations, each consisted of an electron bunch entering a solenoid field; however, the initial distribution now dropped off gradually in the z direction, rather than suddenly (see Figs. 3 and 4).
Using the pre-trained U-Net from earlier, the network was trained on both the complex beam and complex beam 2. The network was then tested on both the Gaussian beam and Gaussian beam 2, and its performance compared to that of the network trained on just the complex beam. The results can be seen in Fig. 9. On the Gaussian beam, the average relative error of the new network was 7.3%, while that of the network trained exclusively on the complex beam was 8.6%. Thus, the new network improved on predictions for the Gaussian beam with the inclusion of more data in the training set. For Gaussian beam 2, the network trained on both complex beams consistently performed better, with an average relative error of 9.1% compared to 24.6%. Thus, we see the inclusion of more data led to an improvement in the network's ability to generalize.

Discussion
One reason for the great success of the U-Net could simply be that, as mentioned, the skip connections help with the optimization. Another possibility is that, given that the local J z will correlate strongly with the local B, the network can start with a good first approximation by not altering the input that much. Indeed, one can compare, for example, J z at the z = 0 mm slice for time step 150 in the complex beam shown in Fig. 3 with the corresponding magnetic field in Fig. 7. The time for inputting Ĵ⊥,z and outputting Â⊥,z for the entire test set was of the order of a few seconds. This is in contrast to the GPT simulation, which ran for ∼ a day. Of course, the GPT simulator is computing both the matter and electromagnetic fields, so this is not exactly a direct comparison. Nonetheless, it does illustrate some of the potential of the FoHM-NO method to greatly speed up EM simulations.
One way to improve our results would be to expose the model to many distinct simulations during the training phase. This would increase the duration of training, but our results in the "Improving generalization" section suggest that this would make the model more accurate and robust and help it generalize better. There are still some of Maxwell's equations that are not built in as hard constraints, namely Eq. (8). One possible way to add this as an inductive bias would be to include in the cost function a penalty term that enforces Eq. (8), as is done with physics-informed neural networks 61. Such terms have also been used to enforce the Lorenz gauge for PCNNs predicting electromagnetic potential fields 40 and to enforce the magnetohydrodynamics equations with physics-informed neural operators 58. Although this is not a hard constraint, during training it would guide the model to predict more physical solutions. As mentioned, the approach here can be expanded to include ρ and J in the predictions. Alternatively, it can be incorporated into a conventional simulator to speed it up.

Conclusions
Using Maxwell's equations in Fourier space and modeling the EM fields via Â⊥,z, we have developed a novel approach to applying machine learning to EM problems. This method has the advantage of incorporating several of Maxwell's equations as hard constraints.

Figure 1.
Figure 1. The FoHM-NO method and its built-in hard physics constraints. F denotes the Fourier transformation, P⊥ the projection of the vector field onto the components transverse to k, NN a neural network, and F −1 the inverse Fourier transformation.

Figure 2 .
Figure 2. The (x, y) trajectories of 1000 random electrons within the beam are shown along with the z-component of the 0.5 Tesla magnetic field of the solenoid which over-compresses the beam.

Figure 3 .
Figure 3. Cross-sections showing the spatial distribution of J z for the complex beam. The top row of plots shows x-y slices at z = 0 mm for different time steps. The bottom row shows x-z slices at y = 0 at the same time steps.

Figure 4 .
Figure 4. Cross-sections showing the spatial distribution of J z for the Gaussian beam. The top row of plots shows x-y slices at z = 0 mm for different time steps. The bottom row shows x-z slices at y = 0 at the same time steps.

Figure 5 .
Figure 5. The adapted U-Net architecture is shown. For the number of starting filters we took F = 16. The skip connection concatenates two feature maps of the same dimensions. The CNN architectures were very similar, with the skip connections removed and the number of filters in each layer changed, with CNN-small-4 having filter numbers 16, 32, 64, 128, 64, 4, 64, 64, 32, 16 between the inputs and outputs. Having a double convolution layer at the input and output improved results. The output here is also cropped to 126 × 126 × 126 (see text).

Figure 6 .
Figure 6. Relative error of the models on both the complex beam and the Gaussian beam.

Figure 7 .
Figure 7. The top 5 rows show model predictions and absolute errors for the complex beam, and the bottom 5 rows for the test Gaussian beam, for B x at the z = 0 mm slice at various time points. Only the 2 best-performing models are shown. The scale of the absolute errors is zoomed in to 1/5 that of the true and model predictions.

Table 1 .
Investigated architectures are arranged in descending order based on total number of parameters.

Table 2 .
Summary of results. Error over a data set is ε(t), defined by Eq. (16), averaged over time. Bold indicates the best-performing model according to the column metric. Transfer learning is the process whereby a model trained on one set of data is retrained on another set. Since much of what was learned earlier can be helpful for the new data set, this cuts down on the training time and is quicker than starting from scratch. It has been used in cases like physics-informed neural networks to learn beyond a single instance of a PDE 67.