Predicting Colloidal Interaction Parameters from Small Angle X-Ray Scattering

Small Angle X-Ray Scattering (SAXS) is a characterization technique that allows colloidal interactions to be studied by fitting the structure factor of the SAXS profile with a selected model and closure relation. However, the applicability of this approach is constrained by the limited number of existing models that can be fitted analytically, as well as the narrow operating range over which these models are valid. In this work, we demonstrate a proof of concept for using an artificial neural network (ANN) trained on SAXS curves obtained from Monte Carlo (MC) simulations to predict the effective macroion valency (Z_eff) and the Debye length (κ⁻¹) for a given SAXS profile. This ANN, which was trained on 200,000 simulated SAXS curves, predicted Z_eff and κ⁻¹ for a test set of 25,000 simulated SAXS curves with errors within ±20% of the ground-truth values. Subsequently, an ANN was used as a surrogate model in a Markov […]

SAXS is a technique that has been used to study colloidal interactions because information about the spatial configuration of colloidal particles in solution is captured in the structure factor, S(q), of the SAXS profile.¹² However, S(q) is typically largely featureless, and the lack of distinct features such as peaks complicates direct parameter extraction.¹³ Thus, SAXS curves are typically analysed by fitting them using either Ornstein-Zernike (OZ) integral equation theory or Monte Carlo (MC) simulations. OZ theory is thermodynamically consistent and can provide an analytical solution for a chosen model, given an appropriately selected closure relation, at minimal computational cost. This approach, which is implemented in open-source software packages such as SasView¹⁴ and SASFit,¹⁵ has two key limitations: the limited number of models for which fitting can be performed, and the approximations introduced by the inherent properties of the selected models and closure relations.¹⁶ On the other hand, MC simulations, although more computationally expensive, have become increasingly common for the analysis of SAXS curves, as they allow the actual equilibrium distribution of the system to be obtained for any desired interaction potential. Machine learning (ML) algorithms have also been combined with MC simulations, which generate SAXS curves as training data, to predict various sample properties from SAXS curves. For instance, the Computational Reverse-Engineering Analysis for Scattering Experiments (CREASE) tool, which uses a genetic algorithm optimizer with an artificial neural network (ANN) surrogate model, can determine information such as micelle dimensions and chain configurations of soft matter systems from an input small angle scattering curve.¹⁷
ANNs trained on noise-augmented SAXS profiles of nucleic acids and of folded and unfolded proteins, generated from experimentally determined models, have also been successfully employed to estimate the molecular weights and radii of gyration of these macromolecules from their SAXS profiles.¹⁸ These existing studies have primarily focused on using ML algorithms to predict sample properties from the form factor, P(q), of a SAXS profile, which is related to the shape and morphology of a scattering entity. To the best of our knowledge, there are no existing studies on using ML algorithms to predict colloidal interaction parameters from the S(q) of a SAXS profile.
https://doi.org/10.26434/chemrxiv-2024-swz2p-v2 ORCID: https://orcid.org/0009-0003-4036-2727 Content not peer-reviewed by ChemRxiv. License: CC BY-NC-ND 4.0
In this work, we demonstrate a proof of concept for using an ANN trained on SAXS curves obtained from MC simulations of gold nanoparticles (AuNPs) to estimate the colloidal interaction parameters described by DLVO theory, namely the effective macroion valency (Z_eff) and the Debye length (κ⁻¹). Optimization was also carried out to determine the range of q values that provides the best predictive performance for Z_eff and κ⁻¹. Subsequently, Markov Chain Monte Carlo (MCMC) sampling augmented with an ANN surrogate model was used to sample a defined parameter space and find the maximum a posteriori (MAP) estimates of Z_eff and κ⁻¹ for an experimentally obtained SAXS profile of AuNPs, while also providing principled estimates of the confidence in the fit and information about any correlation between the parameters.

Materials and Methods:
Gold Nanoparticle (AuNP) Synthesis and Ligand Exchange Procedure. The AuNPs used in this study were synthesized using the protocol reported by Yang et al.¹¹ Briefly, 0.5 mmol of hydrogen tetrachloroaurate(III) hydrate (HAuCl4•3H2O, 95%, Sigma-Aldrich) was dissolved in 40 mL of a 1:1 (v/v) mixture of oleylamine (C18 content: 80% to 90%, Acros Organics) and n-octane (97%) in a 100 mL jacketed round-bottom flask. The mixture was sonicated under an inert argon atmosphere for 10 minutes to ensure complete dissolution of the HAuCl4. The flask was then connected to a temperature-controlled circulating bath (Grant Instruments, GR150-R2), and the contents of the flask were allowed to equilibrate to 20 ℃.
The reducing solution was prepared by dissolving 0.5 mmol of t-butylamine-borane complex (97%) in 1 mL of oleylamine and 1 mL of n-octane; it was then rapidly injected into the precursor solution under vigorous stirring. The reaction was left to run for 2 hours at 20 ℃ under argon before being quenched by the addition of 30 mL of acetone. The AuNPs were washed with ethanol, collected via centrifugation at 10,000 rpm for 10 minutes, and redispersed in dichloromethane. This washing protocol was repeated three times, and the AuNPs obtained were dried overnight in a vacuum desiccator prior to the ligand exchange procedure. For ligand exchange, 0.2 mmol of 11-mercapto-1-undecanesulfonate (MUS) was dissolved in 10 mL of dichloromethane by vigorous mixing at room temperature for 10 minutes. Next, 30 mg of the oleylamine-capped AuNPs were dissolved in 5 mL of dichloromethane, injected into the thiol mixture, and allowed to react for 6 hours at room temperature. The ligand exchange reaction was terminated by first evaporating the solvent mixture in a rotary evaporator. The AuNPs were subsequently redissolved in approximately 10 mL of ultrapure water and purified using an Amicon Ultra-15 centrifugal filter unit (Merck Millipore, 10 kDa MWCO). The AuNPs were redispersed and passed through the filter unit three times to remove any excess free ligands before being dried overnight in a vacuum desiccator prior to subsequent experiments.
AuNP Small Angle X-Ray Scattering (SAXS). SAXS experiments were carried out on a Ganesha 300XL (SAXSLAB) at 20 ℃ under vacuum with a high-brilliance microfocus Cu source (λ = 1.54 Å), with exposure durations of 1 hour (q range: 0.015 Å⁻¹ to 0.65 Å⁻¹). The beam centre and the sample-to-detector distance were calibrated before the scattering experiments using the positions of the diffraction peaks of a silver behenate powder standard.
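As a quick check on what the stated q range probes, the sketch below converts q into the full scattering angle 2θ via the standard relation q = (4π/λ) sin θ with λ = 1.54 Å, and into the characteristic real-space length d = 2π/q. The function names are illustrative and are not part of SAXSGUI or any instrument software.

```python
import math

WAVELENGTH_A = 1.54  # Cu source wavelength in Å, as stated in the text


def scattering_angle_deg(q, wavelength=WAVELENGTH_A):
    """Full scattering angle 2θ in degrees for q in Å⁻¹, from q = (4π/λ) sin θ."""
    theta = math.asin(q * wavelength / (4.0 * math.pi))
    return math.degrees(2.0 * theta)


def real_space_length(q):
    """Characteristic real-space length d = 2π/q (Å) probed at a given q."""
    return 2.0 * math.pi / q


# Ends of the measured q range reported for the Ganesha 300XL experiments
for q in (0.015, 0.65):
    print(f"q = {q:.3f} 1/Å -> 2θ = {scattering_angle_deg(q):.3f} deg, "
          f"d = {real_space_length(q):.1f} Å")
```

The low-q end therefore corresponds to length scales of a few tens of nanometres, which is the regime where interparticle correlations between ~4 nm particles appear.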
Samples were loaded into borosilicate glass capillaries and sealed with hot glue before being placed in the measurement chamber. 2D SAXS patterns were collected using a Pilatus 300K solid-state photon-counting detector with a 2 mm beam stop. The 2D SAXS patterns were then radially averaged around the direct beam position using the SAXSGUI software to obtain 1D SAXS patterns. SAXS patterns were obtained for two samples, namely aqueous solutions of MUS-AuNPs at 5 mg/mL and 40 mg/mL. The 5 mg/mL MUS-AuNP sample was sufficiently dilute that it was assumed to have contributions only from the form factor (i.e. I(q) ≈ P(q)). The obtained SAXS curves were fitted using SASFit, and the best-fit diameter was found to be 3.97 nm, which was the value used in the subsequent Monte Carlo simulations.
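The text does not name the form-factor model used in the SASFit fit. Assuming a homogeneous sphere, a common choice for nanoparticle cores, the normalized form factor is P(q) = [3(sin qR − qR cos qR)/(qR)³]², which can be evaluated for the reported 3.97 nm diameter as follows. This is an illustrative sketch, not the SASFit implementation.

```python
import math


def sphere_form_factor(q, diameter_nm=3.97):
    """Normalized form factor P(q) of a homogeneous sphere, with P(0) = 1.

    q is in Å⁻¹ and the diameter in nm (converted to a radius in Å).
    The homogeneous-sphere model is an assumption made for illustration.
    """
    radius_a = diameter_nm * 10.0 / 2.0  # nm -> Å, diameter -> radius
    x = q * radius_a
    if x < 1e-3:
        # Series expansion avoids cancellation: amplitude ≈ 1 - x²/10 for small qR
        return (1.0 - x * x / 10.0) ** 2
    amplitude = 3.0 * (math.sin(x) - x * math.cos(x)) / x**3
    return amplitude**2


# The first minimum of a sphere's form factor lies at qR ≈ 4.493.
first_min_q = 4.493 / (3.97 * 10.0 / 2.0)
print(f"first form-factor minimum near q = {first_min_q:.4f} 1/Å")
```

For D = 3.97 nm this places the first minimum near q ≈ 0.226 Å⁻¹, inside the measured q range.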
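Monte Carlo (MC) Simulations. The MC simulations were initialized by randomly assigning coordinates to 5,000 spherical particles within a defined space, and interparticle interactions were modelled using DLVO theory. More specifically, the interparticle interaction was taken as the pairwise sum of van der Waals attraction and electrostatic repulsion, and was calculated for sampled values of Z_eff and κ⁻¹ using Equations (1), (2) and (3):¹⁵

$$U(r) = U_{\mathrm{vdW}}(r) + U_{\mathrm{el}}(r) \qquad (1)$$

$$U_{\mathrm{vdW}}(r) = -\frac{A}{6}\left[\frac{2a^{2}}{r^{2}-4a^{2}} + \frac{2a^{2}}{r^{2}} + \ln\!\left(\frac{r^{2}-4a^{2}}{r^{2}}\right)\right] \qquad (2)$$

$$U_{\mathrm{el}}(r) = k_{\mathrm{B}}T\, Z_{\mathrm{eff}}^{2}\, \lambda_{\mathrm{B}} \left(\frac{e^{\kappa a}}{1+\kappa a}\right)^{2} \frac{e^{-\kappa r}}{r} \qquad (3)$$

where U(r) is the DLVO interaction potential, U_vdW is the van der Waals attraction potential, U_el is the electrostatic repulsion potential, a is the radius of the spheres, r is the centre-to-centre distance between two spheres, A is the Hamaker constant, k_B is the Boltzmann constant, T is the temperature, λ_B is the Bjerrum length and κ is the inverse Debye length.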
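A minimal sketch of the DLVO pair potential in units of k_BT, assuming the standard Hamaker sphere-sphere expression for the van der Waals term and a screened Coulomb (Yukawa) electrostatic term with prefactor Z_eff² λ_B [exp(κa)/(1 + κa)]². The Hamaker constant and Bjerrum length are not reported in the text, so they are left as inputs; λ_B ≈ 0.71 nm is only an approximate default for water near room temperature. For the 3.97 nm particles, a = 1.985 nm.

```python
import math


def dlvo_potential_kT(r, radius, z_eff, debye_length, hamaker_kT, bjerrum=0.71):
    """DLVO pair potential U(r)/k_BT for two identical spheres.

    r, radius, debye_length and bjerrum share one length unit (nm here).
    hamaker_kT is the Hamaker constant expressed in units of k_BT (assumed input).
    """
    if r <= 2.0 * radius:
        raise ValueError("centre-to-centre distance must exceed the particle diameter")
    kappa = 1.0 / debye_length
    r2, s2 = r * r, 4.0 * radius * radius
    # Van der Waals attraction between two spheres of radius a (Hamaker form)
    u_vdw = -(hamaker_kT / 6.0) * (
        2.0 * radius**2 / (r2 - s2) + 2.0 * radius**2 / r2 + math.log((r2 - s2) / r2)
    )
    # Screened Coulomb repulsion with effective valency Z_eff
    prefactor = (math.exp(kappa * radius) / (1.0 + kappa * radius)) ** 2
    u_el = z_eff**2 * bjerrum * prefactor * math.exp(-kappa * r) / r
    # Pairwise sum of attraction and repulsion
    return u_vdw + u_el
```

Both terms are returned in k_BT, so the result can be compared directly against thermal energy when judging whether a configuration is strongly repulsive.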
The MC simulation was then iterated by moving a randomly selected particle and calculating the total free energy of the system. The move was accepted if it lowered the total free energy of the system and rejected if it increased it. To determine the equilibrium particle distribution, the total free energy was compared every 100,000 MC steps, and the simulation was considered to have reached equilibrium when the total free energy changed by less than 1% over 100,000 MC steps. After equilibrium was reached, the simulation was run for another 50,000 iterations, from which 1,000 sets of particle coordinates were randomly selected to calculate a radial distribution function (RDF).¹⁹ The structure factor, S(q), was then calculated using Equation (4):²⁰

$$S(q) = 1 + 4\pi\rho_{\mathrm{p}} \int_{0}^{\infty} \left[g(r) - 1\right] r^{2}\, \frac{\sin(qr)}{qr}\, \mathrm{d}r \qquad (4)$$

where S(q) is the structure factor, ρ_p is the particle number density, g(r) is the radial distribution function, r is the distance from a reference particle and q is the scattering vector.
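The single-particle move with the acceptance rule described above can be sketched as follows. The periodic cubic box, the minimum-image convention and the maximum displacement are assumptions (the text only refers to "a defined space"), and the pair potential is passed in as a callable so that a DLVO form or any other U(r) could be used. All function names are hypothetical.

```python
import math
import random


def min_image_distance(p, q, box):
    """Distance between two points in a periodic cubic box (minimum-image convention)."""
    d2 = 0.0
    for a, b in zip(p, q):
        dx = a - b
        dx -= box * round(dx / box)
        d2 += dx * dx
    return math.sqrt(d2)


def particle_energy(i, coords, box, pair_potential):
    """Interaction energy of particle i with every other particle."""
    return sum(
        pair_potential(min_image_distance(coords[i], other, box))
        for j, other in enumerate(coords)
        if j != i
    )


def total_energy(coords, box, pair_potential):
    """Total pairwise energy of the configuration (each pair counted once)."""
    return 0.5 * sum(particle_energy(i, coords, box, pair_potential) for i in range(len(coords)))


def mc_step(coords, box, pair_potential, energy, max_disp, rng):
    """Trial-move one random particle and keep the move only if the energy decreases.

    This follows the acceptance rule described in the text; standard Metropolis
    sampling would also accept uphill moves with probability exp(-ΔU / k_BT).
    """
    i = rng.randrange(len(coords))
    old_pos = coords[i]
    e_old = particle_energy(i, coords, box, pair_potential)
    coords[i] = [(x + rng.uniform(-max_disp, max_disp)) % box for x in old_pos]
    delta = particle_energy(i, coords, box, pair_potential) - e_old
    if delta < 0.0:
        return energy + delta, True
    coords[i] = old_pos  # reject: restore the previous position
    return energy, False


def is_equilibrated(previous_energy, current_energy, tolerance=0.01):
    """Equilibrium check from the text: relative energy change below 1%."""
    return abs(current_energy - previous_energy) <= tolerance * abs(previous_energy)
```

Only the moved particle's interactions are recomputed, so the energy change costs O(N) per step rather than O(N²) for a full recalculation.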
The form factor, P(q), was approximated experimentally using the SAXS measurement of a dilute (5 mg/mL) aqueous solution of MUS-AuNPs, allowing simulated SAXS curves to be calculated using Equation (5):

$$I(q) = P(q)\, S(q) \qquad (5)$$
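The structure factor S(q) = 1 + 4πρ_p ∫[g(r) − 1] r² sin(qr)/(qr) dr and the simulated intensity I(q) = P(q)S(q) can be evaluated numerically along the following lines. This sketch assumes g(r) is tabulated on an increasing r grid that extends far enough for g(r) ≈ 1, and that the measured P(q) has already been interpolated onto the same q grid as S(q). The trapezoidal rule is our choice for illustration; the text does not state how the integral was evaluated.

```python
import math


def structure_factor(q, r, g, number_density):
    """S(q) from a tabulated g(r) using the trapezoidal rule.

    r must be increasing; g(r) is assumed to be 1 beyond the last r value,
    so the truncated integral is taken as the full one.
    """
    def integrand(ri, gi):
        qr = q * ri
        sinc = 1.0 if qr == 0.0 else math.sin(qr) / qr
        return (gi - 1.0) * ri * ri * sinc

    total = 0.0
    for k in range(len(r) - 1):
        h = r[k + 1] - r[k]
        total += 0.5 * h * (integrand(r[k], g[k]) + integrand(r[k + 1], g[k + 1]))
    return 1.0 + 4.0 * math.pi * number_density * total


def simulated_intensity(form_factor, structure):
    """I(q) = P(q) S(q), evaluated point by point on a shared q grid."""
    return [p * s for p, s in zip(form_factor, structure)]
```

A quick sanity check: for an ideal gas (g(r) = 1 everywhere) the integrand vanishes and S(q) = 1, so I(q) reduces to P(q), which is exactly the dilute-limit assumption made for the 5 mg/mL sample.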
A total of 250,000 simulated SAXS curves were obtained from the Monte Carlo simulations over a q range of 0.012 Å⁻¹ to 0.501 Å⁻¹. […] The likelihood of each parameter combination (i.e. how well the generated curve fits the data) was estimated using the χ² likelihood function shown in Equation (6):

$$\ln \mathcal{L} = -\frac{1}{2} \sum_{k} \frac{\left(y_{k} - \hat{y}_{k}\right)^{2}}{\sigma_{k}^{2}} \qquad (6)$$

where ŷ_k is the predicted scattering value, y_k is the measured scattering value and σ_k is the uncertainty in the measurement of the k-th bin.
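The χ² likelihood translates directly into a log-likelihood function of the kind an MCMC sampler evaluates at every step. This is a minimal sketch; `chi2_log_likelihood` is a hypothetical name, and the Gaussian normalization constant is omitted because it does not depend on Z_eff or κ⁻¹.

```python
def chi2_log_likelihood(predicted, measured, sigma):
    """Gaussian (χ²) log-likelihood: ln L = -0.5 * Σ_k ((y_k - ŷ_k) / σ_k)².

    predicted, measured and sigma are sequences on the same q grid.
    The constant normalization term is dropped; it does not change the posterior.
    """
    if not (len(predicted) == len(measured) == len(sigma)):
        raise ValueError("all inputs must share the same q grid")
    chi2 = 0.0
    for y_hat, y, s in zip(predicted, measured, sigma):
        chi2 += ((y - y_hat) / s) ** 2
    return -0.5 * chi2
```

Weighting each bin by 1/σ_k² is what lets noisier bins contribute less to the fit, which is how the measurement uncertainty propagates into the parameter uncertainty.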
The MCMC sampling was initialized with uniform priors for Z_eff and κ⁻¹ over the ranges of 10.0 to 70.0 and 3.0 nm to 7.0 nm, respectively. The sampling outputs a maximum a posteriori (MAP) estimate together with a probability density function that maps parameter combinations against the goodness of fit; this provides insight into the confidence in the fitted parameters, reveals any correlations between parameters and accounts for the possible presence of multiple local minima. All MCMC sampling was performed using the emcee package.²¹

Results and Discussion:
Comparison between SAXS curves obtained from Monte Carlo Simulations and Analytical Methods. SASFit and SasView are existing software packages that implement analytical methods for fitting an input SAXS curve to obtain colloidal interaction parameters. More specifically, these packages compute SAXS curves for a selected pair of form factor and structure factor models and perform least-squares curve fitting to obtain the parameters corresponding to the SAXS curve with the smallest χ² difference from the input SAXS curve. However, the success of this workflow depends heavily on the assumptions made in the selected models.
For instance, the Hayter-Penfold rescaled mean spherical approximation (RMSA) is a structure factor model suitable for modelling interactions between charged particles experiencing screened Coulomb repulsion, but it is only valid for colloidal systems with interaction parameters within a certain operating envelope.²² The colloidal interaction parameters of systems of interest may not always fall within this envelope, which limits the general applicability of the model. Specifically, the Hayter-Penfold RMSA model is established to be valid only for colloidal systems where the product of the Debye length and the particle diameter, k, is ≤ 6.
Nonetheless, the SAXS curves generated from the software packages are in good agreement with the curves obtained from Monte Carlo simulations, suggesting that the Monte Carlo-based procedure of obtaining SAXS curves provides a comparable alternative to the existing analytical approach.
However, the software packages were generally unable to output SAXS curves for κ⁻¹ ≥ 4.5 nm, with the exception of SASFit for κ⁻¹ = 5.14 nm. This result is consistent with the operating envelope of the Hayter-Penfold RMSA model discussed above. In contrast, the Monte Carlo approach was able to simulate SAXS curves for κ⁻¹ ≥ 4.5 nm over a range of Z_eff values. An increasingly prominent downward curvature in the low-q region, consistent with stronger interparticle repulsion,²³ is also observed with increasing Z_eff at similar values of κ⁻¹.

Training and Validation of an Artificial Neural Network (ANN) for the Prediction of Z_eff and κ⁻¹ from SAXS curves. The SAXS curves obtained from the Monte Carlo simulations were used to train an ANN to predict Z_eff and κ⁻¹ from a given SAXS curve. The performance of the ANN was evaluated on a test set containing 25,000 curves and is summarized in Figure 2. However, the effects of Z_eff and κ⁻¹ on the SAXS curve are not independent of each other, as illustrated in Figure 1, where an increase in either parameter results in stronger interparticle repulsion and thus a more pronounced downward curvature in the low-q region of the curve. This highlights a key limitation of extracting colloidal interaction parameters from SAXS curves: a given SAXS curve does not necessarily correspond to a unique combination of Z_eff and κ⁻¹. To overcome this limitation, an inverse model going from parameters to curves, which considers sets of possible solutions and their likelihoods rather than single solutions, was developed and is discussed later in this paper. Nonetheless, the ANN is able to predict Z_eff and κ⁻¹ from a given SAXS curve with prediction errors within ±20%.
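The ±20% criterion, together with the r² and RMSE values reported for Figure 2, can be computed along these lines. This sketch computes r² as the coefficient of determination against the identity line (predicted = true); the values in the text are reported for a fitted linear regression of the scatter plots, which can differ slightly. The function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r², RMSE), with r² = 1 - SS_res / SS_tot measured against predicted = true."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def fraction_within(y_true, y_pred, tolerance=0.20):
    """Fraction of predictions whose relative error |pred - true| / true is within ±tolerance."""
    errors = [(p - t) / t for t, p in zip(y_true, y_pred)]
    return sum(abs(e) <= tolerance for e in errors) / len(errors)
```

Because RMSE is in the units of the target, it is reported separately for Z_eff (dimensionless) and κ⁻¹ (nm), whereas the relative-error fraction is directly comparable across the two parameters.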

Optimization of q cutoff used for ANN training and comparison between values of Z_eff and κ⁻¹ obtained from Analytical Methods and ANN Prediction. The SAXS curves were observed to overlap in the high-q region (q > 0.1 Å⁻¹), while significant variation in I(q) was observed only in the low-q region, as illustrated by the 50 randomly selected SAXS curves from the MC simulations shown in Figure 3(a). This suggests that the entire range of I(q) values for each generated curve may not be needed to train the ANN, as fitting to very small differences at high q could result in significant overfitting. Thus, this section aimed to identify a cutoff value for q (qmax) that reduces the computational cost incurred during training while delivering the same predictive performance.

Figure 4 depicts the workflow for the MCMC sampling procedure, which was used to determine MAP estimates of Z_eff and κ⁻¹ for an input SAXS curve. The sampling procedure was initialized by defining a uniform prior for both Z_eff and κ⁻¹ and a "walker", which represents a combination of parameters within the defined parameter space. The walker makes random moves within the parameter space, and each move is accepted or rejected depending on the log-likelihood evaluated for the walker. Conventionally, the SAXS profile used to compute the log-likelihood would be obtained from MC simulations, which are slow and computationally expensive owing to the complex and iterative nature of the atomistic simulations involved in this work. Thus, a trained ANN was used as a surrogate model to predict a SAXS curve for a given pair of Z_eff and κ⁻¹ values. […] two parameters, which is consistent with the discussion in an earlier section. The MAP estimates of Z_eff and κ⁻¹ from the MCMC sampling were found to be 24.98 and 6.81 nm, respectively. These MAP estimates were used to generate a corresponding SAXS curve using the MC simulation, which is plotted in Figure 5(b).
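Restricting the training curves to the low-q region and comparing candidate cutoffs could be organized as follows. This is a sketch: `score_fn` is a hypothetical callable standing in for training the ANN on the truncated curves and returning a validation loss, and the candidate qmax values are user-supplied rather than taken from the paper.

```python
def truncate_to_qmax(q, intensity, q_max):
    """Keep only the (q, I(q)) points with q <= q_max, i.e. the low-q region."""
    kept = [(qi, ii) for qi, ii in zip(q, intensity) if qi <= q_max]
    return [qi for qi, _ in kept], [ii for _, ii in kept]


def best_q_max(candidates, q, curves, score_fn):
    """Score each candidate cutoff and return (best q_max, {q_max: score}).

    score_fn(truncated_curves) is assumed to train a model on the truncated
    curves and return a validation loss (lower is better).
    """
    scores = {}
    for q_max in candidates:
        truncated = [truncate_to_qmax(q, curve, q_max)[1] for curve in curves]
        scores[q_max] = score_fn(truncated)
    return min(scores, key=scores.get), scores
```

The number of points kept by the cutoff also fixes the width of the ANN input layer, which is why reducing qmax lowers the training cost.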

Conclusion:
In this work, we demonstrated a proof of concept for using ANNs to estimate the colloidal interaction parameters Z_eff and κ⁻¹ from SAXS profiles obtained from MC simulations of nanoparticles subjected to different interaction strengths. Compared with the analytical approach currently used to obtain colloidal interaction parameters, the ANN approach is not limited by the assumptions made in the models or closure relations required to obtain analytical estimates of Z_eff and κ⁻¹. In addition, the trained ANN provides good estimates of Z_eff and κ⁻¹ for a test set of 25,000 SAXS curves, with most estimates having prediction errors within ±20%. An inverse ANN was then used as a surrogate model to perform MCMC sampling and was successfully used to interpret an experimentally obtained SAXS profile. We believe that the approach described in this study, coupled with appropriate interaction potentials, can be used to study a wide range of colloidal interaction phenomena.

[…] 654000. The authors also thank the developers of SasView and SASFit for providing guidance on using their respective software packages.
The curves were generated for q values up to 0.501 Å⁻¹ by sampling combinations of Z_eff and κ⁻¹ ranging from 10.0 to 70.0 and 3.0 nm to 7.0 nm, respectively.

Artificial Neural Network (ANN) Model Architecture and qmax Optimization. The simulated SAXS curves were split into training, test and validation sets in the ratio 8:1:1. The training set was used to train a dense ANN that takes a SAXS curve as input and outputs predicted values of Z_eff and κ⁻¹ for that curve. The ANN architecture consisted of 8 hidden layers, with the first hidden layer containing 512 nodes and the number of nodes halving in each subsequent hidden layer. The input layer contained one node per input q value, and the number of input q values was varied to optimize the q range used for training, while the output layer contained two nodes corresponding to the two predicted quantities, Z_eff and κ⁻¹. The ANN was trained with a rectified linear unit (ReLU) activation function, a learning rate of 0.001 and a mean squared error (MSE) loss function, and training was stopped when the validation loss did not decrease for 10 consecutive epochs. The performance of the trained ANN was then evaluated on a test set containing 25,000 simulated SAXS curves.

Training of the ANN Surrogate Model and Inverse Markov Chain Monte Carlo (MCMC) Sampling with the Surrogate ANN Model. The simulated SAXS curves were split in the same ratios as above to train a dense ANN that takes Z_eff and κ⁻¹ as inputs and outputs the SAXS curve associated with the input values of the two parameters. The optimized ANN architecture consisted of 4 hidden layers, with the first layer containing 128 nodes and each of the remaining hidden layers containing 512 nodes. The output layer contained 225 nodes, corresponding to the predicted points on a SAXS profile for the input parameters. The ANN was trained with a ReLU activation function, a learning rate of 0.001 and an MSE loss function, and training was stopped when the validation loss did not decrease for 10 consecutive epochs. Because the ANN runs approximately 2000 times faster than the MC simulations, the trained ANN was then used as a surrogate for the MC simulations of the SAXS curves in an MCMC procedure to estimate the generating parameters of a given SAXS curve. The MCMC sampling procedure explores sets of possible parameters (Z_eff, κ⁻¹) to ascertain how well each set explains the observed SAXS curve, using the ANN to generate the SAXS curve before estimating the likelihood of that parameter combination.
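The two architectures and the stopping rule described above can be summarized in a framework-agnostic way. This sketch only computes layer widths, parameter counts and the early-stopping decision; it does not reproduce the authors' training code, the optimizer is not stated in the text, and the 225-point input used in the usage example is illustrative only. All function names are hypothetical.

```python
def regression_net_widths(n_inputs, first_hidden=512, n_hidden=8, n_outputs=2):
    """Widths of the curve -> (Z_eff, κ⁻¹) network: 8 hidden layers halving from 512."""
    hidden = [first_hidden // (2 ** i) for i in range(n_hidden)]
    return [n_inputs] + hidden + [n_outputs]


def surrogate_net_widths(n_outputs=225):
    """Widths of the (Z_eff, κ⁻¹) -> curve surrogate: 128 then three 512-node hidden layers."""
    return [2, 128, 512, 512, 512, n_outputs]


def count_parameters(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


def should_stop(val_losses, patience=10):
    """Early stopping: True once `patience` epochs have passed without a new best loss."""
    if not val_losses:
        return False
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience


print(regression_net_widths(225))
print(count_parameters(surrogate_net_widths()))
```

Ties with the best loss count as "no decrease", so `min` picking the first occurrence of the best value matches the stopping rule as stated.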

Figure 1: SAXS curves obtained from the Monte Carlo simulations, SASFit and SasView over the sampled range of Z_eff and κ⁻¹ values.

Figure 1 shows the SAXS curves obtained from the Monte Carlo simulations over the sampled range of Z_eff and κ⁻¹ values, as well as the SAXS curves obtained from the software packages using identical Z_eff and κ⁻¹ values. As shown in Figure 1, both analytical software packages were able to output SAXS curves for κ⁻¹ ~ 3 nm, which was surprising given the operating envelope of the Hayter-Penfold RMSA model.

Figures 2(a) and (b) show scatter plots of the ANN-predicted values against the ground-truth values for Z_eff and κ⁻¹, respectively. The r² and RMSE values of the linear regression fitted to the scatter plots were 0.984 and 3.703 for Z_eff, and 0.982 and 0.2544 nm for κ⁻¹. Taken together, these values suggest that the ANN offers accurate predictions of both Z_eff and κ⁻¹ for unseen SAXS curves within the bounds of the training space. The prediction errors were also calculated as percentages and are shown in Figure 2(c) as a scatter plot, with the error distributions plotted as marginal […]

Figure 2(a): Scatter plot of predicted Z_eff values against actual Z_eff values obtained from the ANN. (b): Scatter plot of predicted κ⁻¹ values against actual κ⁻¹ values obtained from the ANN.

Figure 3(a): Plot of 50 randomly selected SAXS curves obtained from the Monte Carlo simulations.

Figure 4: Schematic depicting the workflow for the MCMC sampling procedure, which obtains the posterior distributions of Z_eff and κ⁻¹ for an input SAXS curve.
Using the ANN surrogate in place of the MC simulations to generate SAXS curves for given Z_eff and κ⁻¹ values effectively reduces the computational resources required to run the MCMC. After the Markov chain has converged, the posterior distribution can be visualized as a histogram of the walker samples. This histogram provides insight into the distribution of Z_eff and κ⁻¹ values that best explain the input SAXS curve, facilitating the identification of regions of the parameter space that correspond to solutions.
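The sampling in this work was performed with emcee.²¹ To keep the sketch dependency-free, the block below swaps in a single-chain random-walk Metropolis sampler (not emcee's affine-invariant ensemble sampler) with the uniform priors stated in the Methods. `log_likelihood` is a hypothetical callable that would combine the ANN surrogate with the χ² likelihood; the MAP estimate is approximated by the highest-posterior sample visited.

```python
import math
import random

# Uniform prior bounds from the text: Z_eff in [10, 70], κ⁻¹ in [3, 7] nm.
BOUNDS = ((10.0, 70.0), (3.0, 7.0))


def log_prior(theta):
    """Flat prior: 0 inside the bounds, -inf outside."""
    for value, (lo, hi) in zip(theta, BOUNDS):
        if not lo <= value <= hi:
            return -math.inf
    return 0.0


def log_posterior(theta, log_likelihood):
    """Log prior plus log-likelihood; the likelihood is skipped outside the prior."""
    lp = log_prior(theta)
    return lp if lp == -math.inf else lp + log_likelihood(theta)


def metropolis(log_likelihood, start, step_sizes, n_steps, seed=0):
    """Random-walk Metropolis sampler (a simple stand-in for emcee).

    Returns the chain of visited states and the highest-posterior state (approximate MAP).
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_posterior(current, log_likelihood)
    chain, best, best_lp = [], list(current), current_lp
    for _ in range(n_steps):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        proposal_lp = log_posterior(proposal, log_likelihood)
        # Accept with probability min(1, exp(Δ log posterior))
        if proposal_lp > current_lp or rng.random() < math.exp(proposal_lp - current_lp):
            current, current_lp = proposal, proposal_lp
        chain.append(list(current))
        if current_lp > best_lp:
            best, best_lp = list(current), current_lp
    return chain, best
```

With emcee, a log-posterior of this form would instead be passed as the `log_prob_fn` argument of `emcee.EnsembleSampler(nwalkers, ndim, log_prob_fn, args=(log_likelihood,))`, and the flattened chain from `sampler.get_chain(flat=True)` could be histogrammed to visualize the posterior.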

Figure 5: (a) Corner plot showing the posterior probability distribution sampled by the MCMC method for the experimentally obtained SAXS curve, with the diagonal panels showing the posterior probability distributions of Z_eff (top left) and κ⁻¹ (bottom right). The star denotes the MAP estimate of Z_eff = 24.98 and κ⁻¹ = 6.81 nm. (b) Experimentally obtained SAXS curve for MUS-AuNPs with the SAXS curves obtained from the ANN and MCMC predictions.

Figure 5(a) shows the results of applying MCMC sampling with the ANN surrogate model to an experimentally obtained SAXS curve. The diagonal plots in Figure 5(a) show the sampled posterior probability distributions of Z_eff and κ⁻¹, while the off-diagonal plot shows a 2D projection of the probability distributions, which maps the entire solution […] (b), together with the experimentally obtained SAXS curve, as well as a SAXS curve obtained for the values of Z_eff and κ⁻¹ predicted by the forward ANN model. The curves obtained from MCMC sampling and from the ANN were both in good agreement with the experimentally obtained SAXS curve, with the MCMC outperforming the ANN, as illustrated in the inset of Figure 5(b).

Table 1 shows the absolute errors associated with the values of Z_eff and κ⁻¹ obtained analytically and from the ANN models trained with qmax values of 0.0677 Å⁻¹ (ML 225) for the SAXS profiles shown in Figure 1. As discussed earlier, the analytical approach to obtaining Z_eff and κ⁻¹ is only valid within a certain operating envelope (k ≤ 6), which explains the absence of predicted Z_eff and κ⁻¹ values for κ⁻¹ ≥ 4.5 nm in Table 1. The calculated errors in Table 1 show that the ANN-predicted values have significantly smaller absolute errors than the analytically obtained values.

Table 1: Absolute errors associated with the analytical and predicted values of Z_eff and κ⁻¹.

Applying an inverse model using Markov Chain Monte Carlo (MCMC) sampling with an ANN Surrogate Model to an experimentally obtained SAXS curve. As discussed earlier,²⁴ different combinations of Z_eff and κ⁻¹ values do not necessarily result in unique SAXS profiles, which means that multiple pairs of Z_eff and κ⁻¹ values could result in identical SAXS curves. This motivates the formulation of an inverse model capable of analysing a […] methods. One key benefit of using MCMC sampling in this work is the ability to incorporate the variance associated with the SAXS measurement into the uncertainty of the estimated parameters in a weighted manner, by propagating the uncertainty of the SAXS measurement.²⁴