Luminescence age calculation through Bayesian convolution of equivalent dose and dose-rate distributions: the D e _D r model

. In nature, each mineral grain (quartz or feldspar) receives a dose rate ( D r ) specific to its environment. The dose-rate distribution, therefore, reflects the micro-dosimetric context of grains of similar size. If all the grains were well bleached at deposition, this distribution is assumed to correspond, within uncertainties, to the distribution of equivalent doses ( D e ). The 15 combination of the D e and D r distributions in the D e _D r model proposed here would then allow calculation of the true depositional age. If grains whose D e values are not representative of this age (hereafter called “outliers”) are present in the D e distribution, this model allows them to be identified before the age is calculated, enabling their exclusion. As the D e _D r approach relies only on the D r distribution to describe the D e distribution, the model avoids any assumption about the shape of the D e distribution, which can be difficult to justify. Herein, we outline the mathematical concepts of the D e _D r approach (more 20 details are given in Galharret et al., 2021) and the exploitation of this Bayesian modelling based on an R code available in the R package ‘Luminescence’. We also present a series of tests using simulated D r and D e distributions with and without outliers and show that the D e _D r approach


Introduction
For luminescence dating of sediments, the development of equipment to perform optically stimulated luminescence (OSL) analyses at the single-grain (SG) level (Duller et al., 1999a(Duller et al., , 1999b) ) has been a significant technological breakthrough, offering the possibility to produce a distribution of individual equivalent doses (De) for a given sample.This advance has also fostered the development of statistical approaches to analyse these De distributions (e.g., Galbraith et al., 1999;Roberts et al., 2000;Fuchs and Lang, 2001;Lepper and McKeever, 2002;Thomsen et al., 2007;Woda and Fuchs, 2008;Cunningham and Wallinga, 2012;Cunningham et al., 2015;Guibert et al., 2017;Guérin et al., 2017).Most of these statistical models target the component comprising the grains whose deposition is relevant for the event to be dated (i.e., the target population) and calculate a (believed) representative De value from this identified sub-population.The latest proposed model (Li et al., 2021) follows the same strategy but allows identifying outliers not representative of the depositional event for several different reasons.
Therefore, all these approaches focus only on the De distribution and require assumptions on how the individual De values are distributed.It is also worth recalling here that the mean environmental dose rate (Dr) representative for the grains constituting the selected sub-population has to be determined with confidence.
In parallel to these developments, a series of investigations approached the dose rate as a cause of dispersion of the individual De values.These investigations were either experimental (Kalchgruber et al., 2003;Cunningham et al., 2012) and/or numerical (Nathan et al., 2003;Mayya et al., 2006;Guérin et al., 2015).They all demonstrated that the spatial distribution of radionuclide bearing minerals such as K-feldspars, but also micas or zircons, might become driving agents dominating the De distribution.In the literature, these micro-dosimetric effects are usually grouped and considered a significant source of unexplained variance (overdispersion, ext_OD).Another important source of external overdispersion is the presence of outlier grains (due for instance, to sediment mixing or incomplete bleaching); this second source is in addition to the overdispersion caused by the Dr distribution inherent to the sample.To a lesser extent, the measurement process of the De values causes an additional dispersion.This component includes a purely experimental and a more theoretical part: the first refers mainly to the reproducibility of the measurement equipment, whereas the second relates to the fact that the protocol applied to determine individual De values is not best tailored to individual grains but represents a compromise of settings deemed optimal.The dispersion induced by these phenomena constitutes the internal overdispersion (int_OD) which combines quadratically with the ext_OD.
Different experimental approaches (Rufer and Preusser, 2010;Romanyukha et al., 2017) have been proposed for quantifying the micro-dosimetric effects, whereas Martin et al. (2015aMartin et al. ( , 2015bMartin et al. ( , 2018) ) and Fang et al. (2018) developed numerical sediment models to calculate the Dr distribution for a given granulometric fraction.Even though such experiments and applications remain rare to date, in this contribution, we want to put forward two questions: Does the information characterizing the Dr distribution provide valuable data to calculate a luminescence age?Furthermore, if so, What would be the way to do it?Moreover, assuming that our contribution convincingly outlines an approach: How does such an approach help identify intrusive or poorly bleached grains potentially present in a De distribution?

Basics
Let us start with a thought experiment assuming the following setting: (1) one considers a series of grains of similar shape and size behaving similarly in terms of luminescence/dose-response, (2) these grains are perfectly bleached and have no residual dose, (3) they are then mixed in a matrix rich in diverse radionuclide bearing mineral phases generating a heterogeneous flux of alpha and beta particles.One also assumes (4) that the equipment used for their future analysis is perfectly reproducible.With these conditions, we expose each grain to a specific dose rate, Dr, which is the sum of a common gamma-and cosmic-dose contribution and heterogeneous alpha and beta-dose rate components.If we wait for 50 ka and measure a massive number of De values from these grains, we would expect to obtain a De distribution with the same shape as the Dr distribution but offset by a factor of 50,000.
If the depositional setting was further complicated by supplementing the matrix of well-bleached grains of interest with a series of grains having non-zero residual doses, then superimposition of the De and Dr distributions could potentially identify these outliers.Consequently, our thought experiments show that thanks to the combination of the De and Dr distributions and without any assumption about the shape of these distributions, the depositional age can be determined even if outliers are present.The mathematical details are somewhat more cumbersome than thought experiments, and hence we will outline them in the following section.

Mathematical model
The main idea behind the De_Dr model is to combine the information from the De and Dr distributions in a Bayesian framework to detect outliers (i.e.grains not representative of the target population) automatically (if there are present) before discarding them and computing the depositional age.

General considerations
In real life, the number of De values measured for a sample is not extremely large.Even in cases where thousands of grains are analysed, the low percentage of grains emitting light combined with applying a series of rejection criteria may lead to a final De distribution comprising at best a few hundred values.In contrast, when the Dr values are obtained by a numerical simulation of the sediment sample, for instance, their number is only limited by the lab resources in terms of computation power.Another key difference between the De and Dr distributions concerns individual uncertainties: current numerical models do not report uncertainties for individual beta-dose rate values.This contrasts with the De values since each one has an error term related to the uncertainties associated with the luminescence signal and the process of its determination (fitting and interpolation).Nevertheless, the Dr distribution is not free of uncertainties: at least three terms (gamma-, cosmic-and beta dose-rates) must be considered, and at least two of them (gamma and cosmic dose rates) are characterized by a mean value and an associated error.
As the De_Dr model relies on the shape of the Dr distribution to describe the expected shape of the De distribution and identify outliers, the int_OD of the De distribution (such as measured with a DRT) needs to be incorporated into the Dr distribution.To do this, individual Dr values (written  #! in Eq. 1 below) are transformed into internally overdispersed Dr values (Dr) using the following equation: where Dr is a value comparable to any De value, int_OD the standard deviation characterizing the DRT distribution, and e a Gaussian variable with uninformative mean and standard deviation (also denoted (0,1)).

Mathematics underpinning the model
In this section, we reiterate the method used for detecting outliers in the frame of the hierarchical model introduced by Galharret et al. (2021).This Bayesian method can estimate an OSL age for a sample with both singlegrain equivalent dose values and simulated (or measured) dose rate distributions.
We assume that the classical relation between the equivalent dose De, the corrected dose rate Dr (according to Eq. 1) and the OSL age A: is satisfied but applies to the probability distributions.More precisely, we assume that the probability distribution of De is equal to the probability distribution of  ×  ! .
To determine A, the first step of the process is to estimate the sample's Dr distribution when the internal overdispersion of the De distribution is incorporated, as described in Eq. 1.Because of the wide variety of possible distributions, we chose a Gaussian finite mixture with an unknown number of components.This is a very flexible class of distributions, allowing to catch symmetric, asymmetric, and multimodal distributions.Note that a Gaussian finite mixture model is a weighted sum of 7  $,  ( ̇$: .All the model parameters (,  ', . . .,  %,  ', . . .,  %,  ', . . .,  %) can be easily estimated using an expectation-maximization (EM) algorithm (Dempster et al., 1977) and the optimal value of the number of components K selected according to the Bayesian Information Criterion (BIC).This method is implemented in the R package 'mclust' (see Scrucca et al. 2016 for details on statistical and numerical aspects).After fitting the mixture parameters on the Dr distribution, we fix their values for the rest of the modelling.According to (Eq. 2), the distribution of the De values is also approximated by a Gaussian finite mixture model with the following parameters ) The second step is to estimate A considering any outliers present and the measurement errors on the De values, which are assumed to be Gaussian with zero mean and known variance.Here, the main idea of the modelling is to associate each measured De with an individual age.We denote a1,…,an these individual ages which are related to age A as follows: where  ) , . . .,  * are independent Gaussian distributions with a zero mean.In the absence of outliers, we can assume that these errors have a common variance.The density of the prior variance is then (cf.Galharret et al., 2021).This probability distribution is named a Shrinkage distribution with parameter  0 ( .This is a usual choice of prior on variance parameter for meta-analysis models (see Spiegelhalter et al. 2004).The parameter  0 ( allows controlling the dispersion of the individual ages a1,…,an around A. Note that a preliminary estimate of individual ages is necessary to get an order of magnitude of the age errors.To do that, we consider the shrinkage parameter as the harmonic mean of the variance of the individual ages.This choice ensures that errors on individual ages are not favoured over the dispersion of the individual ages a1,…,an , and vice versa.In other words, neither is assumed to be negligible relative to the other, both having the same weight under the prior information. At this step, we may refer to this model as a Bayesian Central Age Model (BCAM) because it can be viewed as a Bayesian version of the seminal Central Age Model (Galbraith et al., 1999) even though differences exist, the most important being the absence of any pre-defined function representing the De distribution.However, this model is not robust to the presence of outliers.Hence, before estimating A, we add an additional step to detect and remove the outliers if they are present in the De distribution.
In this additional step, we adapt the BCAM in including individual random effects.This is the same principle as applied in the event model introduced by Lanos andPhilippe (2017, 2018).It amounts to the assumption that the errors  ) , . . .,  * have individual variances  ' ( , . . .,  * ( independently and identically distributed from the same shrinkage distribution as previously chosen for the BCAM.While the event model can be used to estimate A, it suffers from a lack of precision due to the summation of individual variances.Thus, in our approach, we use the posterior distribution of  In summary (Fig. 2), the mathematical model to combine the De and Dr distributions consists of four steps: 1.Each  #! value is transformed according to Eq. ( 1), considering the int_OD value, 2. the Dr distribution is fitted with a weighted sum of normal (Gaussian) densities.The number of functions and their height and width are automatically adjusted to maximize the likelihood function (Fig. 3).Dr

after a rough
Equation ( 1

The implementation of BCAM in R
The mathematical model was implemented in R and is available in the package 'Luminescence' (Kreutzer et al., 2012) version >= 0.9.16 (Kreutzer et al., 2021)

Model tests
Our tests rely on simulated numerical data (Supplement 2).Complex Dr distributions were built with a series of values (at least 1,000 per series) randomly sampled from normal and/or log-normal distributions.In cases where outliers were added to the initial De distribution, their values were randomly determined from a normal or log-normal distribution, and uncertainties were defined as mentioned in the previous paragraph.

Tests without outliers
Table 1 lists the results of tests performed using four different Dr distributions: (1) a single normal distribution, (2) a sum of two normal distributions, (3) a single log-normal distribution and (4) a sum of two log-normal distributions.
For each Dr distribution, five runs were computed, and De_Dr Ages were calculated using the combine_De_Dr() function.Fig. 6 shows an example Abanico plot (Dietze et al., 2016) of a De distribution for each type of simulated Dr distribution.
For the four distribution types considered in these tests, the De_Dr Age is very close to the given age, i.e., 50 ka, demonstrating the efficiency of the De_Dr model.It is also worth mentioning that although we did not add outliers to the initial De distribution, a few values have been identified by the model as outliers and were then discarded before the final age was calculated.However, this is not surprising and can be explained by the stochastic nature of the sampling process of the De values, which each had an associated random uncertainty.
The De_Dr Age is slightly higher than 50 ka in all cases because a few outlier values overlap randomly with the 300 initial De distribution and were therefore not identified as outliers by the De_Dr approach.However, this over-estimation remains low (<5% of the true age), whereas outliers represent almost 17 % (20/120) of the De values.Examples are illustrated in Fig. 7 as Abanico plots (Dietze et al., 2016).To test the model's performance to identify outliers when their values are close to the initial De values, we simulated normal and log-normal distributions with outliers that followed our setting from above: The results are given in Table 3 and displayed in Fig. 8.The De_Dr Ages increase with the percentage of outliers, but the over-estimation remains below 10% of the true age in all cases.This result is particularly interesting because these simulations represent cases where a series of poorly bleached grains (i.e., the outliers) whose De values are not significantly different from the mean De have been measured in addition to well-bleached grains (initial De values).
Deleted: we simulated normal and log-normal distributions with outliers …

340
Deleted: De_Dr Ages   5) is therefore indispensable, as is a visual examination of the outliers identified within the distribution of individual ages (e.g.Fig. 4).
On the plus side, it is also important to recall that the De_Dr model does not require a predefined function to 365 represent the De distribution.For the tests carried out previously, the type of the distribution (normal, log-normal or a mixture of those distributions) was fixed to randomly draw the simulated values only.However, the type of the chosen distribution and the parameters characterizing it (mean and variance) were not supplied to the model.In other words, the De_Dr model did not know about those parameters.
Nevertheless, the De_Dr model does require a precise determination of the Dr distribution.To date, this 370 distribution can be obtained either from numerical sediment models considering bulk density, grain size composition, mineralogy, as well as the spatial distribution of radioelements (for possible 2D approach, cf.Dietze et al., in review) or obtained experimentally using nuclear detectors (e.g., Romanyukha et al., 2017;Fu et al., 2022).Unfortunately, at present, such experiments are scarce and remain relatively difficult to implement.Suppose they become more common, systematic comparisons between the De_Dr model, which provides the most probable age, and other models leading to 375 the De value most representative of the event to be dated, will become possible in a future contribution.Moreover, perhaps cases will be observed where the Dr distributions do not follow a simple distribution (typically log-normal) as already suggested by Martin et al. (2015b).
One output of the model is the posterior distribution of the A defined through a simulated Markov chain.The highest posterior density interval (HPDI), a region of the density curve encompassing a particular credible interval (e.g., 380 68% or 95%), can be calculated from this distribution.The HPD, the HPDI, as well as the mean, ̅ , and the standard  To date, the De_Dr model is thus the first model that allows considering the information from the equivalent doses and dose rates simultaneously, thus offering a substantial paradigm change compared to existing approaches.

Conclusion
The De_Dr model is an alternative to statistical models to determine the target population from a De distribution.
Combining the information associated with the equivalent doses and dose rates experienced by the grains during burial,

Deleted:
these grains are perfectly bleached and have no Deleted: If we wait for 50 ka and 90 Deleted: If the depositional setting was Deleted: comprising

Deleted:
current numerical models do not report uncertainties for individual beta-dose rate values 125 Deleted: the De_Dr model relies on the shape of the Dr distribution to describe the expected shape of the De distribution and identify outliers, the int_OD of the De distribution (such as measured with a DRT) needs to be incorporated into the Dr distribution Deleted: 2021 130 Deleted: with both single-grain equivalent dose values and simulated (or measured) dose rate distributions Deleted: the first step of the process is to estimate the sample's Dr distribution when the internal overdispersion of the De distribution is incorporated 135 individual variances  ' ( , . . .,  * ( for constructing a decision rule to detect outliers.Indeed, these parameters measure the dispersion of individual ages around the central age.Therefore, if an equivalent dose is detected as an outlier, its corresponding individual age will take large values with respect to the prior information on  ' ( , . . .,  * ( .Thus, a De value is identified as an outlier if the posterior distribution of its individual variance is stochastically greater than its prior distribution.To do that, we use quantiles and compare the prior and posterior distributions.More precisely, we fix a probability 1-a close to 1 (for instance 1-a=0.95):if the posterior (1-a)-quantile is greater than the prior (1-a)-quantile, the associated De is tagged as an outlier and removed from the De distribution (Fig.1).

Figure 1 :
Figure 1: Comparison of prior and posterior cumulative distribution functions of individual variance and their 95% credible interval (bottom horizontal lines): the corresponding equivalent dose is detected as an outlier [left] or not [right].When this selection is completed, the age A is estimated with BCAM from the De distribution with the outliers removed, while the posterior distributions are approximated from Markov Chain-Monte-Carlo (MCMC) samples.In practice, we use the Gibbs sampler JAGS(Plummer 2003) through the associated R (R Core Team, 2021) package 'rjags' (Plummer 2019).2.2.3 Original data and structure of the modelInput data for the model are values from the  #! and De distributions.The De distribution is a series of central values with associated errors, whereas the  #! distribution represents the probability of each dose-rate value.Additionally, the internal over-dispersion (int_OD) obtained from the DRT experiment is required.This parameter is used to modify the Dr distribution to be the same shape as the expected De distribution.
estimation of the individual ages (corresponding to the De values divided by the mean dose-rate), the "distance" of each De value and its uncertainty with the model is computed using an MCMC process and compared to a fixed threshold set to 5%.De values scoring lower than 95% are considered outliers (Fig. 4), 4. finally, De values corresponding to the identified outliers are removed from the De distribution, and the age is computed by the Bayesian Central Age Model from this new De distribution.The cumulative probability distribution of the resulting model is then compared with this new De distribution and the original data (Fig.5).

Figure 2 :
Figure 2: Diagram representing the different steps of the estimation method.

Figure 3 :
Figure 3: Approximation of the Dr distribution with a mixture of normal (Gaussian) functions.

Figure 4 :
Figure 4: Characterisation of the De values: the values in red are identified as outliers.

Figure 5 :
Figure 5: Comparison of the cumulative distribution functions: A x Dr (red line), De (dotted blue line) and reduced -after removal of the outliers detected values-De (dashed green line).
under the function name: combine_De_Dr().The De and Dr distributions can be imported directly from an Excel TM spreadsheet or CSV file or simply passed as a data.frame(a data object in R, comparable to a spreadsheet) imported through other formats.The other values are directly passed to the function as parameters.The function combine_De_Dr() returns four plots (Supplement 1, Figs. 2-3 therein): the first two figures are related to detecting outliers and illustrate the variation of the individual standard deviation of the posterior age distributions.The last two figures show a kernel density plot of the posterior ages and the empirical cumulative distribution function plot.This last figure compares the cumulative De distributions (with or without the identified outliers) with the modelled De distribution (A × Dr).We provide a simple example with R code as supplementary information (Supplement 1).
From each obtained Dr distribution, 100 values were randomly drawn and multiplied by 50 to represent individual De values (the Dr values vary around 1 Gy ka -1 , and the De values are then around 50 and expressed in Gy).Each De value was then associated with an uncertainty randomly sampled from a normal distribution of relative uncertainties (0.1,0.05)

Deleted:Figure 6 :
Figure 6: Abanico plots of the De distributions (100 values) without outliers.a) single normal distribution, b) sum of two normal

Figure 8 :
Figure 8: De_Dr Age as a function of the number of outliers added to the initial De distribution (the expected age is 50 ka, indicated by the red line).Norm and log-Norm represent the functions from which the initial De distributions (comprising 100 values) were built.Error bars represent 95% credible intervals.Dotted lines are ±10 %.
415precision, the standard deviation  !being replaced by +  !" + ̅ "  " where  denotes the relative error (in %) associated with the systematic error.(2) To preserve the HPD region that considers the possible asymmetry of the posterior distribution, the systematic error can be modelled by  .= (1 +  ) where  is a 420 standard Gaussian variable independent of the age.One can then easily sample from the corrected age and update the HPD region.If the probability distribution of  . is a Gaussian distribution, both approaches are equivalentMoved (insertion) [4]Luminescence::plot_OSLAgeSummary() (see Supplement 1).If the posterior distribution of the age is a Gaussian distribution, the HPDI coincides with the interval [ ̅ ±  2] at 68% credible level (resp.[ ̅ ± 2  2] at 95%).The lower and the upper end of the value of the HPDI can be supplemented to facilitate systematic errors associated with the average total dose rate and the source dose rate of the equipment used for the De measurements.Two approaches are feasible.(1) The typical approach consists of modifying the precision, the standard deviation  2 being replaced by X  2 ( + ̅ (  ( where  denotes the relative error (in %) associated with the systematic error.(2)To preserve the HPD region that considers the possible asymmetry of the posterior distribution, the systematic error can be modelled by  Y = (1 +  ) where  is a standard Gaussian variable independent of the age.One can then easily sample from the corrected age and update the HPD region.If the probability distribution of  Y is a Gaussian distribution, both approaches are equivalent.
the model offers the possibility to determine the age of the target population without any predefined function representing the De distribution.Future work should focus on tests carried out on well-dated samples (typically cross-checked with 14 C dating) to validate the De_Dr model experimentally.This would, however, first necessitate access to accurately and precisely determined Dr distributions.

Table 2 :
Results of tests with 20 outliers added to the original De distribution.Their values were determined following the function indicated in the second column.Notice that the (m) parameter of these functions varied from 1.3 to 1.6, leading to outlier values which, on average, increased as is observable on the Abanico plots (Fig.7).

Table 3 :
For each type of Dr distribution (normal or log-normal), outliers values were added following either the function : 50 x Norm(n, 1.3, 0.05), or the function : 50 x log-Norm(n, 1.3, 0.05).The number of outliers varied from 0 to 50 (then representing between 0 and 33% of the initial De distribution).The age error is the 95% credible interval.
For the tests carried out previously, the type of the distribution (normal, log-normal or a mixture of those distributions) was fixed to randomly draw the simulated values only.̅ , and the standard deviation,  !, of the posterior distribution can be calculated with the function Luminescence::plot_OSLAgeSummary() (see Supplement 1).If the posterior distribution of the age is a Gaussian distribution, the HPDI coincides with the interval [ ̅ ±  !] at 68% credible 410 level (resp.[ ̅ ± 2  !] at 95%).The lower and the upper end of the value of the HPDI can be supplemented to facilitate systematic errors associated with the average total dose rate and the source dose rate of the equipment used for the De measurements.Two approaches are feasible.(1) The typical approach consists of modifying the 390Moved down [3]: considering bulk density, grain size composition, mineralogy, as well as the spatial distribution of radioelements (for possible 2D approach, cf.Dietze et al., in review)Moved (insertion) [3]Deleted: and remain relatively difficult to implement Deleted: Suppose they become more common, systematic 395 comparisons between the De_Dr model, which provides the most probable age, and other models leading to the De value most representative of the event to be dated, will become possible in a future contribution Deleted: as already suggested byMartin et al.,