Analysis of Rutherford backscattering spectra with CNN-GRU mixture density network

Ion Beam Analysis (IBA) utilizing MeV ion beams provides valuable insights into surface elemental composition across the entire periodic table. While ion beam measurements have advanced towards high throughput for mapping applications, data analysis has lagged behind due to the challenges posed by large volumes of data and multiple detectors providing diverse analytical information. Traditional physics-based fitting algorithms for these spectra are time-consuming and prone to becoming trapped in local minima, with analyses often taking days or weeks to complete. This study presents an approach employing a Mixture Density Network (MDN) to model the posterior distribution of Elemental Depth Profiles (EDP) given input spectra. Our MDN architecture comprises an encoder module (EM), leveraging a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU), and a Mixture Density Head (MDH) employing a Multi-Layer Perceptron (MLP). Validation across three datasets of varying complexity demonstrates that for simple and intermediate cases, the MDN performs comparably to the conventional automatic fitting method (Autofit). However, for more complex datasets, Autofit still outperforms the MDN. Additionally, our integrated approach, combining the MDN with the automatic fitting method, significantly enhances accuracy while still reducing computational time, offering a promising avenue for improved analysis in IBA.


Supplementary Material B: MDN Training
Training a mixture density network is notoriously prone to numerical instability. In this section, we delve into our strategies for mitigating this issue.
As usual, training is done by minimizing a loss function; in our case, the loss function is the negative log posterior. Given $N_\mathrm{train}$ training data $\{(x^{(k)}, y^{(k)}),\ k = 1, 2, \ldots, N_\mathrm{train}\}$, the joint posterior distribution is given by $p(y^{(1)}, \ldots, y^{(N_\mathrm{train})} \mid x^{(1)}, \ldots, x^{(N_\mathrm{train})}) = \prod_k p(y^{(k)} \mid x^{(k)})$. Given Eq. (7) in the main article, the loss function $\mathcal{L}$ is therefore given by

$$\mathcal{L}(\theta) = -\frac{2}{N_\mathrm{train}} \sum_{k=1}^{N_\mathrm{train}} \log \sum_{i=1}^{M} \pi_i\big(x^{(k)}\big)\, \varphi_i\big(x^{(k)}, y^{(k)}\big), \tag{a}$$

where $\pi_i$ are the mixture weights and $\varphi_i$ the Gaussian component densities. Note that the prefactor $2/N_\mathrm{train}$ is added in order for the loss function to reduce to the mean-squared-error (MSE) loss when $M = 1$ and the Gaussian posterior has a constant (homoscedastic) variance.
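As a concrete illustration, the following is a minimal PyTorch sketch of the loss (a), computed naively. The tensor names and shapes are our own assumptions, not the notation of the actual implementation: pi of shape (N, M) holds the mixture weights, mu and sigma of shape (N, M, D) the component means and standard deviations (diagonal covariances assumed), and y of shape (N, D) the targets.

```python
import math
import torch

def mdn_loss_naive(pi, mu, sigma, y):
    """Negative log posterior, Eq. (a), computed naively (numerically unstable)."""
    d = y.shape[-1]
    # Log-density of each diagonal Gaussian component phi_i(x, y) at the target.
    diff = (y.unsqueeze(1) - mu) / sigma                  # (N, M, D)
    log_phi = (-0.5 * diff.pow(2).sum(-1)
               - sigma.log().sum(-1)
               - 0.5 * d * math.log(2 * math.pi))         # (N, M)
    phi = log_phi.exp()                                   # exp() may under/overflow
    posterior = (pi * phi).sum(dim=1)                     # mixture sum over M components
    return -2.0 * posterior.log().mean()                  # prefactor 2 / N_train
```

The explicit exp() followed by the log of a sum is precisely where the instabilities discussed next originate.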
Given the loss function (a), the sources of the numerical problems often encountered during the training of Mixture Density Networks (MDN), particularly with large training datasets (> 20,000 training samples), can be identified as follows: 1. Underflow/overflow from the exponentials in (a). To mitigate this problem, we use the logsumexp trick,

$$\log \sum_k e^{\lambda_k} = \lambda_{k_0} + \log \sum_k e^{\lambda_k - \lambda_{k_0}}, \tag{b}$$

where $k_0 = \arg\max_k \lambda_k$ and, for the inner sum of (a), $\lambda_i = \log\big(\pi_i(x)\,\varphi_i(x, y)\big)$. We note here that even after using this trick, overflow/underflow can still occur when $\lambda_k - \lambda_{k_0}$ is large in magnitude. This typically happens in the initial stage of training, when the optimizer explores the parameter space wildly.
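A sketch of the same loss evaluated via the logsumexp trick follows; torch.logsumexp subtracts the per-row maximum internally, which is exactly the identity (b). The small clamp on the mixture weights before taking their log is our own guard against log(0), not from the original.

```python
def mdn_loss_logsumexp(pi, mu, sigma, y):
    """Loss L of Eq. (a), evaluated stably via the logsumexp trick (b)."""
    d = y.shape[-1]
    diff = (y.unsqueeze(1) - mu) / sigma
    log_phi = (-0.5 * diff.pow(2).sum(-1)
               - sigma.log().sum(-1)
               - 0.5 * d * math.log(2 * math.pi))
    # lambda_i = log(pi_i * phi_i); the exponential is never formed explicitly.
    lam = pi.clamp_min(1e-12).log() + log_phi             # (N, M)
    return -2.0 * torch.logsumexp(lam, dim=1).mean()
```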
As mentioned before, the logsumexp trick does not entirely solve the numerical instability. Therefore, we propose a new loss function $\mathcal{L}'$ which is semi-equivalent, in the following sense, to the original loss $\mathcal{L}$. The new loss is defined as

$$\mathcal{L}'(\theta) = -\frac{2}{N_\mathrm{train}} \sum_{k=1}^{N_\mathrm{train}} \sum_{i=1}^{M} \pi_i\big(x^{(k)}\big)\, \log \varphi_i\big(x^{(k)}, y^{(k)}\big). \tag{d}$$

Compared to the previous loss function (a), the logarithm of a sum of exponentials no longer appears (for Gaussian components, $\log \varphi_i$ is evaluated analytically), so the expression is more numerically stable. As it contains fewer non-linearities, it is also easier to minimize. We now give the following proposition:

Proposition 1. For any given network parameters $\theta$,

$$\mathcal{L}(\theta) \le \mathcal{L}'(\theta), \tag{e}$$

and equality occurs when $\varphi_1 = \varphi_2 = \cdots = \varphi_M$.
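The surrogate loss (d) is straightforward to implement; a sketch under the same assumed tensor shapes as above:

```python
def mdn_loss_upper_bound(pi, mu, sigma, y):
    """Surrogate loss L' of Eq. (d): the log is moved inside the mixture sum."""
    d = y.shape[-1]
    diff = (y.unsqueeze(1) - mu) / sigma
    log_phi = (-0.5 * diff.pow(2).sum(-1)
               - sigma.log().sum(-1)
               - 0.5 * d * math.log(2 * math.pi))
    # Weighted sum of per-component log-densities; no exp or log-of-sum appears.
    return -2.0 * (pi * log_phi).sum(dim=1).mean()
```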
Proof. The inequality (e) is a direct consequence of Jensen's inequality: for any concave function $\phi$ (such as $\log$) and weights $w_i \ge 0$ with $\sum_i w_i = 1$,

$$\phi\Big(\sum_i w_i a_i\Big) \ge \sum_i w_i\, \phi(a_i). \tag{f}$$

For conciseness, let us denote $\varphi_i^{(k)} = \varphi_i\big(x^{(k)}, y^{(k)}\big)$ and $\pi_i^{(k)} = \pi_i\big(x^{(k)}\big)$. Then, from (a) and (f),

$$\mathcal{L}(\theta) = -\frac{2}{N_\mathrm{train}} \sum_k \log \sum_i \pi_i^{(k)} \varphi_i^{(k)} \le -\frac{2}{N_\mathrm{train}} \sum_k \sum_i \pi_i^{(k)} \log \varphi_i^{(k)} = \mathcal{L}'(\theta). \qquad \square$$

Given that $\mathcal{L} \le \mathcal{L}'$, minimizing $\mathcal{L}'$ also drives down the loss $\mathcal{L}$. However, as $\mathcal{L}'$ only serves as an upper bound on $\mathcal{L}$, the minimum of $\mathcal{L}'$ is not necessarily the minimum of $\mathcal{L}$; thus, further refinement is necessary. In practice, our training strategy for the Mixture Density Network (MDN) involves initially training with the $\mathcal{L}'$ loss function and subsequently refining the optimization using $\mathcal{L}$. Specifically, we use the $\mathcal{L}'$ loss for the first 40% of the total epochs and then transition to the $\mathcal{L}$ loss for the remaining 60% of the total epochs.
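The two-phase schedule can be expressed compactly. The sketch below reuses the loss functions defined above, and the 40%/60% split follows the text; the function and argument names (train_mdn, model, loader, optimizer) are placeholders rather than the original code.

```python
def train_mdn(model, loader, optimizer, n_epochs=100, switch_frac=0.4):
    """Train with the surrogate loss L' first, then refine with the exact loss L."""
    switch_epoch = int(switch_frac * n_epochs)
    for epoch in range(n_epochs):
        # Surrogate loss for the first 40% of epochs, exact loss afterwards.
        loss_fn = (mdn_loss_upper_bound if epoch < switch_epoch
                   else mdn_loss_logsumexp)
        for x, y in loader:
            pi, mu, sigma = model(x)          # mixture weights and Gaussian params
            loss = loss_fn(pi, mu, sigma, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```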
In Figure S1, we show the learning curve of our MDN model, trained on dataset C with 50 000 training instances over 100 epochs. The MDN uses 10 Gaussian components. Despite the large number of Gaussian components and output parameters, no numerical instabilities are encountered during training. Observing Figure S1, it is evident that $\mathcal{L}'$ consistently remains larger than $\mathcal{L}$, validating the proposition. We can also observe that from epoch 40 onwards, $\mathcal{L}$ diverges from $\mathcal{L}'$. This divergence occurs because training with the $\mathcal{L}$ loss begins at epoch 40, further reducing the true loss $\mathcal{L}$ while still maintaining $\mathcal{L} \le \mathcal{L}'$.

Figure S1. Learning curve of MDN training on dataset C with 50 000 training data.