A general approach to maximise information density in neutron reflectometry analysis

Neutron and X-ray reflectometry are powerful techniques facilitating the study of the structure of interfacial materials. The analysis of these techniques is ill-posed in nature requiring the application of a model-dependent methods. This can lead to the over- and under- analysis of experimental data, when too many or too few parameters are allowed to vary in the model. In this work, we outline a robust and generic framework for the determination of the set of free parameters that is capable of maximising the in-formation density of the model. This framework involves the determination of the Bayesian evidence for each permutation of free parameters; and is applied to a simple phospholipid monolayer. We believe this framework should become an important component in reflectometry data analysis, and hope others more regularly consider the relative evidence for their analytical models.


INTRODUCTION
Neutron and x-ray reflectometry methods offer an important insight into the structure of interfacial materials; such as surfactant systems [1], biological membranes [2], and polymeric photovoltaics [3]. However, the analysis of reflectometry data is a non-trivial task, due in part to the phase problem. This is where a direct inversion of the reflectometry profile, to recover the scattering length density, ρ(z) profile is not possible since the phase of the reflected radiation is lost [4]. While there has been much work to retrieve the phase information from a reflectometry profile [5]; from using reference layers [4,[6][7][8], polarized neutrons [9], a dwell time approach [10], and from the investigation of multiple contrasts [7,[11][12][13], these methods are non-trivial and often require a specific experimental setup.
Where it is not possible to retrieve the phases, a modeldependent analysis method can be used to determine the structure of the experimental system [14][15][16][17][18][19][20]. This is where a model, typically a series of contiguous layers, is created and the model reflectometry is determined using the Abelès matrix formalism for stratified media [21] or Parratt recursive method [22]. However, the use of model-dependent methods is ill-posed in nature, where the solution may not necessarily be unique [23,24]. This necessitates the integration of a priori information about the underlying physics and chemistry of the experimental system, such as assumptions about molecular volume [1,25,26]. This is typically achieved through the reparameterisation of the model, to describe the system in terms of physiochemical descriptors from which a set of layers can be determined [27][28][29]. The inclusion of this * andrew.mccluskey@diamond.ac.uk; a.r.mccluskey@bath.ac.uk prior information acts to reduce the dimensionality of the optimisation space and therefore reduces the number of possible structures that would offer satisfactory agreement with the data.
The inclusion of prior information is often achieved by taking literature values from other techniques and applying them inflexibly to constrain values in the reparameterised model, such as molecular volumes from computational simulation or solution scattering [30,31]. However, as was shown in the work of Campbell et al. [26] and McCluskey et al. [1], these values are not always strictly representative when the physical or chemical environment is varied. This has lead to the use of Markovchain Monte Carlo (MCMC) methods for the analysis of reflectometry data, resulting in the inclusion of these methods in many common reflectometry analysis packages [20,32,33]. The inclusion of these methods enables the prior probabilities for a series of variables to be outlined, which are considered in a Bayesian fashion during the sampling of the parameter space [1,34]. This allows for more rational inclusion of the prior information about the experimental system in the analysis, in addition to, the ability to obtain information about the sampled posterior probabilities that are allowed from the experimental data.
The ability to include ranges of values, in the form of uninformative uniform prior probabilities, or normally distributed priors centred on a literature value in the analysis of an experimental dataset is an important development. However, "with great power, there must also come -great responsibility!" [35] and the ability to sample or fit all of the parameters in an experimental analysis can lead to a high risk of over-fitting [36]. As a result, it is necessary to investigate methods to quantify the information that is available from a given experimental dataset and to enable the comparison of different analyti-cal models; including the number of free parameters that may be considered. A specific approach to this problem is outlined in early works from Sivia and co-workers [37][38][39], where a two-parameter model was compared with a three-parameter model; with the additional parameter being related to the scaling of the model reflectometry. Sivia and co-workers assumed that the likelihood function for both of these model would have "only one significant maximum", allowing numerical simplifications to be performed and therefore the problem to be solved analytically. This may not be possible for all models investigated by reflectometry, due again to the ill-posed nature of the analysis problem. Furthermore, the work of Hughes and co-workers [40] more recently applied the use of Bayesian model comparison to determine the mixing of lipids in a model bacterial membrane.
In this work, we outline a general Bayesian framework for the comparison of different analytical models for the rationalisation of reflectometry data. This method can allow the maximum information to be obtained from the analysis of a given dataset, without risking over-fitting. In the supplementary information for this paper, we provide detailed code showing how this framework may be applied to a problem leveraging exclusively open-source software packages.

A. Bayesian model selection framework
Bayesian model selection involves the comparison of the posterior probabilities p(H|D) [41], for two models, H x and H y , given some dataset, D. The posterior probability for a given model, H, can be determined from Bayes theorem [34], where, p(D|H) is the evidence for the model given the data, p(H) is the prior probability for the model and p(D) is the probability associated with the measured data, which is equal to the sum of the evidences for every potential model (which is potentially large if not infinite), Therefore, in order to describe the evidence as a truely conditional probability, it is necessary to evaluate very possible model that maybe applied to the experimental data. This is not feasible, therefore in this work we will use the notation, p(D|H x ) for the evidence of a given model, however this is not a strict conditional probability. When comparing the models, concerning the same data, the following relationship can be applied [42], where, the probabilities associated with the data only cancel out, and the relative posterior probabilities depend only on the evidence and prior probabilities for each model. Throughout this work, we have used the uninformative assumption that the prior probability for all models is 1. Therefore, it is only the evidence for each model, for the given data, that influences the posterior probability. Nested sampling is a method for Bayesian evidence determination that was developed by Skilling [43] and is implemented in the Python package dynesty [44]. This sampling method allows for the evidence determination through the estimation of the, possibly multidimensional, integral under the likelihood, L(X|H), and parameter specific prior p(X|H) [45], where, X is a vector of length M of the varying parameters in the model, R is a 2 × M matrix that describes the range over which the integral should be evaluated and p(X|H) is the parameter specific prior. Nested sampling involves "intelligently" sampling this integral until some stopping criterion is achieved. This stopping criterion is when the following is less than a given value, where,p(D|H) i is the current estimated evidence and ∆p(D|H) i can be estimated from the prior volume of the previous point in the sampling. In this work, a stopping criteria of ∆ ln{p(D|H)} = 0.5 was used. The likelihood is the difference between the data and model, and is defined in this work as it is implemented in the refnx reflectometry package [20,46], where, N is the number of measured q-vectors, q i is the ith q-vector, R(q i ) is the experimental reflected intensity at the ith q-vector, δR(q i ) is the uncertainty in the experimental reflected intensity at the ith q-vector, and R(q i ) m is the model reflected intensity at the ith q-vector determined from the implementation of the Abelès matrix formalism for stratified media [21,22] in refnx. The evidence allows for the comparison of two models, where the data are the same. It is only possible to compare a given pair of models for the same data, as in Equation 3 the data-only dependent probabilities were cancelled out as they were equal. The Bayes factor, B xy , is the name given to the ratio between the evidence for TABLE I. Interpretation of the Bayes factor, Bxy between models x and y.

ln Bxy
Evidence against Hy [0, 2) Not worth more than a bare mention [2,6) Positive [6,10) Strong [10, ∞) Very strong two models, The interpretation of the Bayes factor was outlined by Kass and Raftery [47], which is given in Table I. This method is easily generalised to any model-dependent experiment, and has been applied in areas such as systems biology, astronomy, computational chemistry, and indeed, in a less general fashion than the current work, to neutron reflectometry [39,42,48,49]. The Bayesian approach allows for the comparison of models where there are different numbers of free parameters, without the risk of overfitting and "fitting an elephant" [36]. This is due to the Bayesian evidence being derived from an integral in parameter space, and therefore scaling with the number of parameters. This means that the addition of an additional free parameter to the model must offer a significant improvement to the likelihood determined. However, it is important also to note that the accuracy of the determined evidence depends on the prior probabilities chosen for each of the free parameters and therefore care should be taken to ensure that these are meaningful.

B. Neutron reflectometry measurements
The neutron reflectometry measurements investigated in this work were previously published by Hollinshead et al. [50] and subsequently re-analysed by Mc-Cluskey et al. [51]. The measurements concern the study of a monolayer of 1,2-distearoyl-sn-glycero-3phosphocholine (DSPC) at the air/water interface and were conducted on seven different isotopic contrasts of the phospholipid and water. These contrasts were made up from four phospholipid types; fully-hydrogenated (h-DSPC), head-deuterated (d 13 -DSPC), tail-deuterated (d 70 -DSPC), and fully deuterated (d 83 -DSPC). These were paired with two water contrasts; D 2 O and aircontrast matched water (ACMW, where D 2 O and H 2 O are mixed such that the resulting scattering length density is zero). The pairing of the fully-hydrogenated phospholipid with ACMW was not performed, most likely due to the lack of the scattering available from such a system. Table II gives the shorthands that are used in this work to refer to the different contrast pairings investigated. The data analysed in this work was taken at a surface pressure of 30 mNm −1 . Additional details of the data collection may be found in the original publication [50].

C. Analytical model
In the model-dependent analysis of neutron reflectometry data, it is necessary to have a description of the experimental system to which the Abelès matrix formalism may be applied. Campbell et al. investigated different models for the study of phospholipid monolayers and found a two-layer description that differentiates between the phospholipid heads, h, and tails, t, to be the best at reproducing the reflected intensity [26]. The head layer is adjacent to the solvent, while the tail layer is adjacent to the air. The thickness of the tail layer is defined as d t and the head layer is d h . At each of the three interfaces, there is some interfacial roughness, σ, which is modelled with an error function [52]. In this work, the roughness was taken to be conformal, which is to say that it does not vary between interfaces, in agreement with the work of Campbell et al. [26]. This final assumption is reasonable given the fact that this system consists of a monolayer of a single lipid type, however, this may not be reasonable in more complex systems.
The scattering length density, ρ i , for a layer i, from which the model reflectometry profile is obtained, is calculated from the volume fraction of the chemical component, φ i , the scattering length of the component, b i (found in Table III), and its molecular volume, V i , where, ρ H2O is the scattering length density of water, which is 0Å −2 for ACMW and 6.35 × 10 −6Å −2 for D 2 O.
To preserve the chemical bonding between the heads and tails of the phospholipid, the following constraint is included in the model,  for the phosphatidylcholine head layer based on "molecular dimensions" [26]. b From the work of Nučerka et al. [53] and Balgavý [54]. c The maximum tail length for DSPC is found as 24.3Å from the Tanford equation [55], the value of 21Å comes from assuming a chain tilt of 30 • . d Based on the hydrophobic interaction of the tail groups, it is unlikely that water would intercalate. e A maximum of 1 was set as a volume fraction greater than 1 would be unphysical. f From the previous analysis of this dataset [51], which agrees with the expected compression of ∼12 % discussed by Campbell et al. [26]. g From the work of Campbell et al. [26]. h The minimum of 2.9Å is based on that of water [56,57], and the fact that surfactants have been shown to increase surface roughness [58].
This ensures that the number of head groups is equal to the number of pairs of tail groups. The model outlined above has up to six possible free parameters; the two-layer thicknesses, d t and d h , the two molecular volumes, V t and V h , the volume fraction of the tail layer, φ t , and the conformal interfacial roughness, σ. In this work, the evidence as a function of these parameters is of interest as it is possible to use the evidence and Bayes factor to determine the best set of free parameters for a given set of data. Therefore, in Table IV the uniform prior probability range and value when constrained is given for each of the parameters. These six parameters can give rise to 63 permutations where there is at least one free parameter. These permutations constitute every possible combination of the six parameters, either held constant or being allowed to varies within the prior probability. This allows for a complete understanding of which combinations of constrained and free parameters give rise to the highest evidence, and therefore represents the maximum information density that may be obtained from the given data. We note again, that both the chosen prior and the constrained values are not infallible and will influence the result. All seven isotopic neutron contrasts were analysed with a single structural model. In doing so we make the common assumption that there is no effect on the structure as a result of the contrast variation method [16]. Additionally, the reflected intensity from the model was scaled and a uniform background was added based on values obtained in the previous analysis of this dataset [51]. We accept that these parameters may influence the resulting evidence, as was shown in the work of Sivia and Webster [39].

III. RESULTS & DISCUSSION
A. All parameters Figure 1 shows the variation of the estimated evidence as a function of iterations, for the model where all six parameters were unconstrained. This plot shows clearly the ability for the nested sampling method to determine a good estimate of the evidence, with the plot plateauing at 6859.9 ± 0.3 after more than 10 000 samples. This was the slowest model to converge, as it was sampling the high-dimensionality parameter space, on a workstation computer this nested sampling process took around 20 min to converge leading to the results provided here.
In addition to the ability for the nested sampling method to determine the evidence for a given model, it also allows for the posterior probabilities of the analysis to be evaluated. Figure 2 gives the posterior probabilities and cross-correlations between each of the six parameters. It is clear that the volume fraction of the tail layer, φ t , and the interfacial roughness, σ, both are not complete represented by the defined prior probabilities. Hence, there lacks a full Gaussian distribution and the extremely skewed probability distribution. However, maximum for φ t and minimum for σ both come from physical limitations on the system; the former being that the volume fraction cannot be greater than 1, and the latter being that a surfactant monolayer will increase the surface roughness compared to water [58], which is ∼2.9Å [56,57]. The skew-ness of p(φ t ) is to be expected given the hydrophobic interaction between the phospholipid tails which create the situation where it is highly unlikely for water to be intercalated into the tail structure. The nature of p(σ) is similar in that the expected value of 2.9Å is the minimum value available to the distribution. However, it may be the case that by fitting all possible parameters, the data is being overfitted. Luckily, the Bayesian model comparison method allows for the evidence comparison to fit the model which offers the maximum information to be obtained, without overfitting to the data.

B. Greatest evidence
The evidence for all of the 63 possible permutations of the free parameters was determined. This allows for the comparison of these to determine which model (set of varying parameters) offers the most information from the dataset, without the over-fitting to the data that may be occurring when all six parameters are varied. This comparison plotted in Figure 3 [59]. The dotted black line indicates the set of free parameters that offer the greatest evidence for the co-refinement of the seven contrasts of neutron reflectometry data, which in this case is d h /V h /d t /V t , which has an evidence of 6872.1 ± 0.2. The set of parameters with the next greatest evidence is d h /d t /V t , however, the evidence for this set of parameters is 6866.8 ± 0.2. The interpretation of the Bayes factor between these two sets of parameters This indicates that there strong or very strong evidence for the more complex model, and that the maximum information density that can be obtained from the analysis at hand, without over-fitting, is the fitting of Figure 4 shows the posterior probability distributions that were obtained from the d h /V h /d t /V t set of parameters. The median values for each of the probability distributions and their 95 % confidence intervals are given in Table V. The value found here for the phosphatidylcholine is slightly larger than the 10Å used by Campbell et al. [26] and the 10.5Å quoted by Li et al. [60]. The thickness of the tail layer that is observed is less than that suggested previously [26]. However, it agrees well with the results from the original analysis of this data [50], which itself was consistent with previous x-ray reflectometry measurements [61][62][63]. The tail molecular volume agrees well with the compression of ∼12 %, when compared to the values from other experimental techniques [30,31], suggested by Campbell et al..
The reflectometry profiles for the median values is compared with the experimental data in Figure 5. There is clear agreement between the experiment and the model across all of the contrasts. This indicates that it is likely that there is little variation due to the isotopic contrast variation that was used, and the assumption that is discussed in II C is valid.

C. Comparison with previous analysis
As mentioned above, this particular experimental dataset was previously analysed by McCluskey et al. [51]. The analytical model used in this previous work was similar to that used herein, however, the tail volume of the phospholipid was constrained based on the assumption that the measured area per molecule was accurate and the volume fraction of the tail layer is 1. That model had three free parameters, namely d h , d t , and σ.
This previous analysis made no effort to evaluate different models of analysis or numbers of free parameters. Therefore, it is not clear if the three parameters chosen offered the maximum information density available.
While not completely comparable, as the constrained values in this work were different to those used previously, the suggestion from this work that having  the free parameters do not match with those probed previously. In future, we hope that by outlining this general methodology, others will consider the evidence of their particular model and the free parameter being used during their analysis.

D. Best per number of parameters
Finally, we will briefly present the best set of parameters for each number of parameters. These are shown in Figure 6, which has the characteristic shape found in the previous work by Sivia and others [37,39,45,64]. This plot can be seen to peak when there are four free parameters, before dropping off slowly. This drop off indicates that when more parameters are included there is an indication that over-fitting is occurring. This plot shows clearly where the maximum information is available in the analysis for this particular reflectometry data.

IV. CONCLUSIONS
This work outlines a general application of a Bayesian framework to the problem of model selection in neutron reflectometry. Building on the work of Sivia and co-workers in the 1990s [39], we have utilised a nested sampling method to generalise the methodology beyond the assumption of only one significant maximum [37]. This offers the ability to go beyond the three-parameter space previously investigated to consider every possible parameter in an analysis problem. In doing so, it is possible to compare the evidence of different models, where different numbers of parameters are allowed to vary, to determine that which offers the maximum information, without over-fitting.
To show this in action, we have investigated the analysis of seven isotopic contrasts of neutron reflectometry data from a phospholipid monolayer at the air/water interface. This particular analysis had a maximum of six variable parameters, and in this work, the evidence of all 63 possible permutations of these parameters was determined. This allowed for the permutation that offered the maximum evidence to be found, which under the circumstances of the chosen priors was when d h /V h /d t /V t were varied. This particular combination of parameters had an evidence of 6872.1 ± 0.2 , while the next best, which had more free parameters had a value of 6866.8 ± 0.2 . This was interpreted in terms of the Bayes factor, and it was shown that there was strong evidence for the d h /V h /d t /V t model. Finally, we have compared these free parameters with those used previously to analyse the same dataset [51]. This comparison indicates that this previous analysis did not necessarily maximise the information density available to the data.
We accept that the relative simplicity of this model system allows for the use of nested sampling on a relatively short timescale, with all of the computation capable of being performed in a single day on a workstation machine. However, the applicability of the methods discussed herein is relevant to any model-dependent reflectometry analysis, but we note that as the number of free parameters increases so does the dimensionality of the parameter space being sampled, leading to longer computational timescales. Therefore, we propose that for analyses with a large number of parameters the analyst starts by reducing the number of free parameters as much as is possible to ensure that the nested sampling will converge in a reasonable timescale and gradually increasing the complexity. We hope that by developing and sharing of this methodology, in future those analysing reflectometry data will consider which of the free parameters in their models may be varied and will try to determine the evidence for their particular model for other options before completing their analysis.

REPRODUCIBILITY STATEMENT
Electronic Supplementary Information (ESI) available: All analysis/plotting scripts and data files allowing for a fully reproducible, and automated, analysis workflow for the work presented is available at https:// github.com/arm61/model_select (DOI: 10.5281/zenodo.3820690) under a CC BY-SA 4.0 license.

AUTHOR CONTRIBUTIONS
A.R.M. proposed, implemented, and applied the Bayesian model comparison framework, with input from T.A., J.F.K.C. and T.S.; J.F.K.C. and T.S. sourced funding; A.R.M. wrote the manuscript, with input from all authors.