Using Bayesian Model Selection to Advise Neutron Reflectometry Analysis from Langmuir-Blodgett Monolayers

The analysis of neutron and X-ray reﬂectometry data is important for the study of interfacial soft matter structures. However, there is still substantial discussion regarding the analytical models that should be used to rationalise reﬂectometry data. In this work, we outline a robust and generic framework for the determination of the evidence for a particular model given experimental data, by applying Bayesian logic. We apply this framework to the study of Langmuir-Blodgett monolayers by considering three possible analytical models from a recently published investigation [Campbell et al., J. Colloid Interface Sci , 2018, 531 , 98]. From this, we can determine which model has the most evidence given the experimental data, and show the eﬀect that diﬀerent isotopic contrasts of neutron reﬂectometry will have on this. We believe that this general framework could become an important component of neutron and X-ray reﬂectometry data analysis, and hope others more regularly consider the relative evidence for their analytical models.


I. INTRODUCTION
The structure of insoluble monolayers of surfactant material at the air-water interface, known as Langmuir-Blodgett monolayers, are of importance both technologically and biologically [1,2]. However, it is only in the past approximately 30 years, that a non-destructive, quantitative probe of these systems has been developed [3]. Where previously surface pressure-surface area isotherms and indirect methods, such as surface potential measurements, are applied, neutron and X-ray reflectometry and Brewster angle microscopy can offer a more direct, structural, description. The use of reflectometry methods to study the structure of Langmuir-Blodgett surfactant films was first published by Grundy et al. in 1988 [3] and was then popularised through subsequent work [4][5][6][7][8][9][10][11][12].
There have been substantial developments in the instrumentation available at central facilities for neutron and X-ray reflectometry measurements [13,14]. However, there are still significant differences in the models used in the analysis of reflectometry measurements from insoluble surfactant monolayers. Whilst this problem is not limited to insoluble monolayers, this work will focus on this particular application, however, the methods are easily generalised. The differences in analytical models are covered well in the recent work of Campbell et al. [15], which, in particular, highlights the inconsistency in the use of interfacial roughness in the analytical models [16,17], the variation in the number of layers used to describe the system [18,19], and the lack of consideration * andrew.mccluskey@diamond.ac.uk; a.r.mccluskey@bath.ac.uk given to the compaction of acyl chains at high surface pressures [16,20].
There is a growing interest in the application of data science and information theory methods in reflectometry measurements; in particular for the improvement of experimental design [21] and analysis of collected data [22,23]. However, these recent works, in particular when Bayesian methods are applied, represent a resurgence of methods previously applied to reflectometry by Sivia and others in the 1990s [24][25][26][27]. This early work involved the application of maximum entropy (MaxEnt) methods to develop free-form solutions for reflectometry data, the use of Bayesian model comparison to assess relative evidence for different analytical models, and utilisation of Bayesian parameter estimation. A component of this recent growth in popularity can be attributed to the inclusion of Bayesian inverse uncertainty estimation and parameter correlation quantification, through the use of Markov chain Monte Carlo methods, in popular reflectometry packages such as refnx, Refl1D, and RasCAL [22,28,29]. However, to the authors' knowledge, there appears to be no explicit application of Bayesian model comparison methods to the determination of the best structural model for reflectometry analysis, beyond the example referenced above of Geoghagan et al.. Rather, the comprehensive consideration of model comparison is replaced with the application of a "rule-of-thumb", which states the most simple model that enables a description of the data should be used.
There is clear applicability for Bayesian model comparison methods to neutron and X-ray reflectometry analysis, given that the analysis typically takes a modeldependent approach. In this work, we build on the work of Campbell et al. by comparing the analytical mod-els outlined therein, using a Bayesian model comparison method. In doing so, we outline a general, robust framework for the comparison of analytical models that can be applied by experimental users as a component of their data analysis workflow or included in analysis software packages. Finally, we discuss the effect that different combinations of isotopic contrasts can have on the application of particular analytical models. This is achieved through the investigation of seven isotopic contrasts of neutron reflectometry data from a 1,2-distearoyl-snglycero-3-phosphatidylcholine (DSPC) monolayer at an air-water interface, previously published by Hollinshead et al. [17]. We plan to apply the methodology outlined herein in future work focusing on the information content available in a given experimental dataset, building on recent work from Treece et al. [21].

A. Bayesian model comparison framework
The application of a Bayesian framework for model comparison involves the comparison of the posterior probabilities p(H|D)[30], for two hypotheses, H x and H y , given some dataset, D. A specific posterior probability can be determined from Bayes theorem [31], where p(D|H) is the evidence for the hypothesis given the data, p(H) is the prior probability of the hypothesis, and p(D) is a probability associated with the measured data. When comparing two hypotheses concerning the same data, we apply the following relationship [32], where, the probabilities associated with the measured data cancel out, and the relative posterior probabilities depends only on the evidence and prior probabilities for each hypothesis. Throughout this work, we have taken the ratio of the prior probabilities to be unity; suggesting that all models are equally likely. Therefore, it is only necessary to determine the evidence for a model concerning the given data. Dynamic nested sampling was developed by Higson et al. [33], building on the work of Skilling [34], and is implemented in the dynesty Python package [35]. This method enables the efficient estimation of the integral over the likelihood, L(X|H) and parameter specific prior p(X|H), which is gives the evidence [36], where, X is a vector of length M of the varying parameters in the model, R is a 2 × M matrix that describes the range over which the integral should be evaluated and p(X|H) is the parameter specific prior. The likelihood is the difference between the data and model and is defined in this work as it is implemented in the refnx reflectometry analysis package [22,37], where, N is the number of measured q-vectors, q i is the ith q-vector, R(q i ) is the experimental reflected intensity at the ith q-vector, δR(q i ) is the uncertainty in the experimental reflected intensity at the ith q-vector, and R(q i ) m is the model reflected intensity at the ith q-vector. Following the use of the dynamic nested sampling method to evaluate the evidence, it is then possible to compare two models, where the data are the same, the value of which may be interpreted as outlined by Sivia et al. [25], This allows for the easy comparison of the different models outlined by Campbell et al., within a Bayesian framework. This method is easily generalisable to any reflectometry model comparison investigation [38]. Furthermore, it also allows for the comparison of models with different numbers of variable parameters, enabling the quantification of the information density available to a given reflectometry dataset. However, this final point is beyond the scope of the current work and will be the subject of future work as this enables the quantification of information contained within a given experimental dataset.

B. Neutron reflectometry measurements
The neutron reflectometry measurements investigated herein were previously published by Hollinshead et al. [17] and subsequently re-analysed by McCluskey et al. [39]. These measurements concern the study of a monolayer of DSPC at the air/water interface and were conducted on seven different isotopic contrasts of the phospholipid and water. These contrasts were made up from four phospholipid types; fully-hydrogenated (h-DSPC), head-deuterated (d 13 -DSPC), tail-deuterated (d 70 -DSPC), and fully deuterated (d 83 -DSPC). These were paired with two water contrasts; D 2 O and aircontrast matched water (ACMW, where D 2 O and H 2 O are mixed such that the resulting scattering length density is zero). The pairing of the fully-hydrogenated phospholipid with ACMW was not performed, most likely due to the lack of the scattering available from such a system. Table I gives the shorthands that are used in this work to refer to the different contrast pairings investigated. This data analysed in this work was taken at a surface pressure of 30 mN m −1 . Additional details of the data collection may be found in the original publication [17].

C. Analytical models
The analytical models used in this work replicate those outlined in the work of Campbell et al. [15], pictorial descriptions of which can be found in Figure 1. These particular models were chosen for there relevance to available data and recent publication, however, we expect that the specifics of the analytical models would have little impact on the results of this work. These were implemented in the refnx reflectometry analysis package [22,37] and the models can be found in the electronic supplementary information for this publication [40].
Model 1 consists of a single layer describing the whole phospholipid molecule, with a fixed thickness that is determined from the all-trans length of the DSPC molecule with a tail chain tilt of 30 • . The maximum tail group length is found as 24.3Å from the Tanford equation [41], which results in a thickness of 21.0Å when the tilt is considered and the head group is 10Å from the work of Campbell et al., giving a total thickness of 31Å. No interfacial roughness was considered in this model, and a single variable parameter was used, which is the volume fraction of the layer, ϕ. This parameter was given a uniform prior probability from [0.5, 1) , where the remainder of the volume fraction is made up of air and therefore offer a negligible scattering. The scattering length density, ρ was therefore calculated as, where, b l is the scattering length of the DSPC, the value of which is available in Table I, and V l is the volume of DSPC; taken in this work as 1189.9Å 3 , based on a PC head group volume of 339.5Å 3 [42,43] and tail group volume of 850.4Å 3 [39]. This value of the tail group volume agrees well with the expected compression of ∼12 %, as discussed in detail by Campbell et al.. Model 2 is essentially the same as model 1, however, an interfacial roughness is now present at the airphospholipid and phospholipid-water interfaces, which is conformal in nature, such that it has the same thickness at all interfaces. This roughness is modelled with an error function [44], the width of which is taken as 2.9Å, in agreement with the work of Campbell et al.. In this model, the volume fraction of the layer was kept constant at a value of 1 while the layer thickness, d was the single variable parameter with a uniform prior probability from [15,35)Å.
Model 3, which was found to be optimal by Campbell et al., is made up of two layers, describing the phospholipid head groups, h, (adjacent to the solvent) and tail groups, t (adjacent to the air). The tail group layer has a fixed volume fraction of 1, while the head group layer is immersed in the solvent with a volume fraction that is constrained such that the following relationship holds, where, d t and d h are the tail group layer and head group layer thicknesses respectively, and V t and V h are the tail group and head group volume respectively. The purpose of this constraint is to ensure that the molecular structure of the phospholipid molecule is conserved, such that there is one head group for each tail group. The scattering length density for each of the head and tail groups are then evaluated as in Equation ??, where the scattering length values for the head (h) and tail (t) groups are taken from Table II. The values of V t and V h were taken as 850.4Å 3 and 339.5Å 3 respectively, as discussed for model 1. The thickness of the head group layer is defined based on the molecular dimensions, in agreement with Campbell et al., with a value of 10Å and again the roughness is kept constant at 2.9Å. The variable parameter in this model is the thickness of the tail group layer, d t , which was given a uniform prior probability from [10,26)Å. For all of the neutron reflectometry analyses, the model was co-refined against up to seven experimental datasets, as is discussed in Section III the number of datasets was varied to assess the information content of the experimental technique. This co-refinement involved all of the structural variables (all variables except the scattering lengths b h/t/l ) being constrained to be the same values across all of the different contrasts measured. In all of the  [15], the colour transparency represents the volume fraction (φ) of a given component, d is the phospholipid layer thickness, and dt is the phospholipid tail layer thickness. This figure file is available under the MIT license as part the electronic supplementary information [40]. models, the resultant reflectometry was scaled and a uniform background was added on a per contrast basis. The values used for the scaling factors were taken from the previous analyses of these datasets, namely the work of McCluskey et al. [39], while the background level was set as the lowest reflected intensity measured in the reflectometry profile. We accept that this assumption could impact the results of a given analysis, however, the same assumption was applied for all models. Figure 2 shows the reflectometry and scattering length density profiles that will maximise the likelihood of each of the models, when a dataset containing all seven isotopic contrasts is evaluated. From the investigation of the agreement between the measured and modelled reflectometry, it may be observed that model 3 gives the best agreement to the data. Furthermore, on the investigation of Figure 3, the evidence for model 3 is substantially more than for models 2 or 1, as determined from Equation 5. This result agrees well with the outcome of the Campbell et al. [15], where model 3 was found to offer the minimum χ 2 value. Additionally, it agrees with the use of a two-layer, roughness-containing model, such as that which was used previously to analyse this data [17,39] However, the above example, where the dataset contains all seven isotopic contrasts is extremely idealised. It is very rare for a neutron reflectometry dataset to contain this number of isotopic contrasts; in part due to the high cost and difficulty in preparation of deuterated chemicals. For example, in the work of Campbell et al. the contrasts that were investigated were tail-deuterated phospholipid on D 2 O and ACMW and fully-hydrogenated phospholipid on D 2 O and ACMW. Furthermore, the recent works of both Pabios et al. [45] and Ortiz-Collazos et al. [46] investigate tail-deuterated phospholipid on D 2 O and ACMW and fully-hydrogenated phospholipid on only D 2 O and these particular contrasts are commonly utilised in the literature when a two layer model is ap-plied [6,18,19,47].

III. RESULTS & DISCUSSION
It is clear from Equation 1 that the particular dataset investigated will influence the evidence for a given model. Therefore we will now investigate the effect of different isotopic contrasts within a dataset on the resulting evidence for each of the three models discussed in Campbell et al.. While in the main text only specific combinations of isotopic contrasts are discussed, the evidence for every possible combination was evaluated for each of the three models (384 combinations in total) and is available in Tables S1-6 of the electronic supplementary information.
As mentioned above, typically it is not possible to measure as many as seven isotopic contrasts in a given neutron reflectometry dataset. The most common combination of isotopic contrasts that is measured in a dataset is tail-deuterated phospholipid in D 2 O and ACMW and fully hydrogenated phospholipid in D 2 O. Figure 4 compares the relative evidence for each of the models when particular subsets of the available isotopic contrasts were analysis. The three datasets shown may be considered relatively common sets of experimental data containing one, three, and five isotopic contrasts.
It is clear that in all three datasets shown in Figure 4, the maximum evidence is for model 3. However, as the number of contrasts investigated is reduced, there is a change in the relative evidence between the models. This is particularly notable when a dataset containing a single isotopic contrast is compared with those with three or five datasets. The reduction may be rationalised as when less experimental data is present, in particular when contrast variation is not used to pick out specific components of the system, a model is less able to pick out specific elements of the structure. This agrees well with perceived wisdom regarding neutron reflectometry measurements [48].
All of the given examples show clear evidence for model 3 over the others. However, as shown in Figure 5, model 3 is not always the model that offers the maximum evidence for a given dataset. In this example, the particular contrasts within the dataset are those where there is no deuteration difference between the phospholipid head and tail groups. This indicates that it may be possible when partial deuteration is not possible for the model 1 to be that for which there is the greatest evidence.
There is a clear dependence on the available experimental data for the evidence of particular models. Furthermore, the final example shows the chemical information about a particular system may not be completely reflected in the experimental data. In this final case, the chemically distinct nature of the phospholipid heads and tails is lost. While this current work has focused on the application of this Bayesian framework to the problem of Langmuir-Blodgett monolayers, the methods used a completely generalisable to any reflectometry modelling. Therefore, we hope that greater consideration will be given to the information content of neutron and X-ray reflectometry measurements in the future.

IV. CONCLUSIONS
In this work, we have outlined and applied a rigorous and easy to implement framework for the Bayesian model comparison of analytical models used in neutron   [40]. and X-ray reflectometry analysis. Our particular implementation is freely available as Python code in the electronic supplementary information of this work [40]. Similar methods were previously applied to model comparison problems in neutron reflectometry by Sivia and others [24][25][26][27]. However, it is not commonly applied to modern reflectometry analysis; despite the clear benefits.
We have built on the recent work of Campbell et al. [15], which suggests an optimal analytical model for the study of Langmuir-Blodgett monolayers with neutron reflectometry. The Bayesian model comparison method is used to investigate the relative evidence for three models described by Campbell  However, as the number of isotopic contrasts considered in an experimental dataset was reduced, there was an impact on the relative evidence for "more complex" models. This is in agreement with the perceived wisdom of reflectometry, that a more complex analytical model may be applied when there is a greater number of isotopic contrasts present in the dataset. Furthermore, it matches well with early work in the field where simple models were applied to datasets with few isotopic measurements [5]. It was also shown that if partial deuteration was not possible, and therefore only the d 83 -ACMW, d 83 -D 2 O, and h-D 2 O may be measured, the greatest evidence was for model 1, which can be attributed to the lack of definition that is provided by isotopic contrast labelling.
This indicates the substantial effect that the collected data may have on the model that can be used in the experimental analysis, suggesting that one-size-fits-all analytical models may not be possible for the analysis of reflectometry data from Langmuir-Blodgett monolayers. We hope that this will inspire those analysing any reflectometry data to more readily consider the relative evidence of particular models. Finally, we accept that if a full model comparison is not performed, or if the results are overlooked, then this may be done based on a qualitative understanding of the experimental system, despite not being demonstrated fully by the analysed data.

REPRODUCIBILITY STATEMENT
Electronic Supplementary Information (ESI) available: All analysis/plotting scripts and data files allowing for a fully reproducible, and automated, analysis workflow for the work presented is available at https://github.com/arm61/bayes_mod (DOI: 10.5281/zenodo.xxxxxxx) under a CC BY-SA 4.0 license.

AUTHOR CONTRIBUTIONS
A.R.M. proposed, implemented, and applied the Bayesian model comparison framework, with input from T.A., J.F.K.C. and T.S.; J.F.K.C. and T.S. sourced funding; A.R.M. wrote the manuscript, with input from all authors.