Moving radiation protection on from the limitations of empirical concentration ratios

Radionuclide activity concentrations in food crops and wildlife are most often predicted using empirical concentration ratios (CRs). The CR approach is simple to apply and some data exist with which to parameterise models. However, the parameter is highly variable leading to considerable uncertainty in predictions. Furthermore, for both crops and wildlife we have no, or few, data for many radionuclides and realistically, we are never going to have specific data for every radionuclide - wildlife/crop combination. In this paper, we present an alternative approach using residual maximum likelihood (REML) fitting of a linear mixed effects model; the model output is an estimate of the rank-order of relative values. This methodology gives a less uncertain approach than the CR approach, as it takes into account the effect of site; it also gives a scientifically based extrapolation approach. We demonstrate the approach using the examples of Cs for plants and Pb for terrestrial wildlife. This is the first published application of the REML approach to terrestrial wildlife (previous applications being limited to the consideration of plants). The model presented gives reasonable predictions for a blind test dataset.


Introduction
The models used by regulatory authorities for protecting both humans and wildlife from the effect of ionising radiation are all essentially simplified compartmental models. These models have 'constants of proportionality', which are used to model the distribution of radionuclides between environmental compartments. In radioecology, concentration ratios (CRs) serve this function (e.g. Beresford et al., 2008a;Müller and Pröhl, 1993). For terrestrial ecosystems concentration ratios are most often defined as the radionuclide activity concentration in a food crop (dry mass) or wildlife species (whole-organism, fresh mass) divided by the radionuclide activity concentration in soil (dry mass) (see IAEA, 2010IAEA, , 2014. There is a long history of measuring CR values in radioecology, international compilations of recommended values having been published for the human food chain (IAEA, 2010) and wildlife (ICRP, 2009;IAEA, 2014). The CR approach is simple and easy to apply, and there are some data available to parameterise models. However, whilst the CR approach underpins wildlife and human food chain models, they do have some significant shortcomings. Recommended CR values for application in models tend to be a single value for a given 'element -organism/crop combination'. At best, CR values for transfer to plant species may be collated based upon simplified soil groupings (e.g. peat, clay etc.) (IAEA, 2010). Consequently, CR values for an element -organism (or crop) combination vary over orders of magnitude, adding significantly to uncertainty of radiological assessments. For instance, the estimation of transfer of radionuclides to wildlife (via the use of CR values) is considered the most uncertain element of environmental assessment models (e.g. Johansen et al., 2012).
Furthermore, for both human foodstuffs and wildlife we have no, or very few, data for many radionuclides (e.g. see IAEA, 2010 andBrown et al., 2016). Raskob et al. (2018) report this lack of radionuclide transfer parameters for specific foodstuffs as an issue raised by Japanese scientists involved in assessments following the 2011 Fukushima accident. However, regulating a combination of contaminated nuclear legacy sites and releases from various facilities including the more than 440 operating nuclear power reactors world-wide (WNA, 2018) and medical uses of radioisotopes, whilst also providing the environmental safety cases for the construction of nuclear waste repositories and new nuclear power plants, requires the capacity to predict the transfer of many radionuclides into a wide range of wildlife and human foodstuffs. Realistically, we are never going to have specific data for every radionuclide -wildlife/human foodstuff combination. Therefore, we need robust approaches which can be used to make predictions for species for which there are no data but which require assessment.
We have previously developed novel analyses for the transfer of radionuclides into organisms based on taxonomy using residual maximum likelihood (REML) fitting of a linear mixed effects model. The resultant models take account of inter-site variation and explained a useful proportion of variation in radionuclide transfer (Willey, 2014;Beresford et al., 2013Beresford et al., , 2016. This approach has been suggested for a number of years for plants (Broadley et al., 1999(Broadley et al., , 2001Willey and Fawcett, 2006;Willey and Wilkins, 2008;Willey, 2010) and more recently, as a demonstration, we have applied it to freshwater fish and caesium  and also marine organisms and caesium (Brown et al., 2019).
In this paper, we further investigate the application of the approach, originally proposed for plants by Broadley et al. (1999), as an alternative to the CR approach for predicting activity concentrations in wildlife as we initially suggested in Beresford et al. (2013). Whilst in Beresford et al. (2013Beresford et al. ( , 2016 we demonstrated this approach was successful for freshwater fish and caesium, here we extend our consideration to terrestrial wildlife. We then demonstrate how the resultant model can be used as an alternative to the CR approach to predict activity concentrations in wildlife at an assessment site. The approach relies on data being available on radionuclide activity concentrations for at least one species at the assessment site. For plants (including crop species), we refit the model for Cs previously reported in Willey (2010) but have included additional data. We then demonstrate another application of the approach, developing the concept of a benchmark-taxa CR and showing how it can be used to provide missing CR values for species for which no data are available for modelling purposes (accepting that for some predictive purposes models will have to continue to employ CRs in some form).

Methods
We used residual (or restricted) maximum likelihood (REML) fitting of a linear mixed effects model to data for both wildlife (Pb) and plants (Cs). Within these analyses, taxon and site/study were considered to be the fixed and random factors respectively. This provides a method for statistically accounting for as much of the effect of site/study as possible within the collated data. The model output is not a CR or concentration value, but is an estimate of relative means for taxa that can be anchored to relative concentrations (or concentration ratios) for the different taxa. A discussion of the REML procedure for a linear mixed model as used in this paper has been published elsewhere (Willey, 2010). REML fitting of the random parameter is an iterative process based on a maximum likelihood function. We generally use a maximum of 100 iterations, with taxon as the fixed factor, to provide a best estimate of relative values for taxa across studies by allowing for adjustment of the effects of the random ('site' or 'study') factor. The effects of any particular site or study are different, likely unknown and vary significantly -as evidenced by the significant variation in reported concentration ratios for the transfer of radionuclides (IAEA, 2010(IAEA, , 2014. We use residual maximum likelihood estimates, which are based not on absolute values of the random factor but probability distributions of them, because it is generally thought to provide better estimates of relative means for the fixed factor (taxon) when the random parameter (i.e. that being adjusted for) has, for example, high variance (which concentrations ratios or activity concentrations often do) in fitting of linear mixed effects models. For a data set to be suitable for the REML fitting of the linear mixed effects models, concentrations or concentration ratios had to be available in two or more taxa (i.e. species, genus depending upon what taxonomic level the model is being run) from a given site or for plants under a single set of experimental conditions. Furthermore, datasets were only included if they had at least one taxa in common with other datasets. The input data for the REML fitting do not need to be concentration data, CR values can also be used with the proviso that for any given site the data must all be input as either CR values or concentrations. Both applications presented below use a mix of concentration and CR data as inputs; either can be used, as at a given site the comparative differences will be the same for both given that CR is the concentration in organisms/crops divided by a site specific constant (i.e. soil activity concentration).

Pb in wildlife
For this paper, we use Pb as an example of the derivation and application of the approach proposed here for terrestrial wildlife species. Here we use the term 'wildlife' to mean 'any living species (plant or animal) that is not domesticated' as defined in IAEA (2014). The 'Wildlife Transfer Database' (WTD) as described by Copplestone et al. (2013) is a large (now > 100,000 CR values ) compilation of whole-organism, fresh mass (FM), CR values for wildlife in terrestrial, marine and freshwater ecosystems; the database was initially compiled to underpin IAEA (2014) and ICRP (2009) recommendations. All terrestrial data in the WTD are from field studies and we have used it as our primary source of data for the analyses described in this section. We have used CR values from the WTD as the database represents by far the largest compilation of relevant data and, as already stated, at a given site, variation between taxa should be the same for CRs as for concentrations. Models were fitted at three taxonomic levels: order, family and genus.
The requirement for a given site to have data for more than one taxon and for at least one of those taxons to be present at another site can mean that a considerable proportion of the data in the WTD for a given element are unusable. The WTD contained 928 useable entries for Pb and terrestrial species, representing 1195 CR values (some entries are for sample sizes of greater than one and generally have an associated standard deviation value) from 15 different sites. Additional data were obtained from: (i) recently reported studies at sites in Spain (Guillén et al., 2018) and the Chernobyl Exclusion Zone (Beresford et al. in-press, on-line) at which different species were sampled focusing on the ICRP's Reference Animals and Plants (RAPS; ICRP, 2008); (ii) a similar unpublished study conducted at sites in north-east England (see Søvik et al., 2017); (iii) data for 30 wild plant species from a single site in Spain ; (iv) new Pb data for samples described in Wood et al. (2008) (available as FM concentration values) and additional taxa caught during the study reported by Barnett et al. (2014). The resultant dataset contained 1422 data values from 24 sites and is available from Beresford and Barnett (2018). The dataset contained both stable lead and 210 Pb values and these were treated similarly (as is common practice, e.g. IAEA (2014)); sites contaminated by heavy metals were excluded from the WTD (see Beresford et al., 2008b). Because the dataset was a mix of single data entries and also arithmetic means of multiple samples which did not always have associated standard deviation values, the approach described by Wood et al. (2013) was used to reconstruct the dataset (i.e. if a given entry into the dataset was for 10 samples, then 10 individual CR values were generated). As > 900 of the data values could not be attributed to specific species no attempt was made to fit a species level model. REML fitting was performed using the IBM SPSS statistical package (version 24) using 'Linear Mixed Model' analysis within which REML fitting is a standard option; Brown et al. (2019) have previously used this package to fit REML models to data for Cs and marine organisms. Radionuclide and element concentrations (and CR values) in environmental samples are generally log-normally distributed so the natural log of data values were used. Because of the data requirements of the REML analyses, as described above, differing amounts of data were available for the runs at genus, family and order levels. For instance, a few data were only classified at the order level and whilst a site may contain data for multiple genus these may all be from one family or order etc.; Table 1 provides a summary of the data available for each taxonomic level. The Akaike information criterion (AIC) was used as an estimator of the relative quality of statistical models (the lower the AIC the better the model). To check that including site as a random factor resulted in a better model, all runs were repeated with site removed (ensuring that an intercept was included for the fixed parameter (i.e. taxon)).

Cs in plants
Datasets of concentrations or concentrations ratios for species were identified in the literature and compiled into a database. If published studies had, for example, datasets from two different treatment conditions or from two different sites each was used as a separate dataset. A major source of data was the compilation of caesium data in plants previously published in Willey et al. (2005), but added to this were subsequently published data for 29 plants taxa (see Supplementary  Information). In this analysis, dry matter data for green shoots only were included.
REML fitting was carried out in the IBM SPSS statistical package (version 24) using 'Linear Mixed Model' analysis as described above in section 2.1. As for wildlife the natural log of data values were used to fit each of the models. There were 972 data values representing 301 taxa from 35 studies. The 'taxon number' (see Supplementary Information) was the number assigned to each taxon in alphabetical order, and was used as the 'fixed factor'. Studies represented datasets published under a particular set of conditions and study/site was used as the 'random' factor in REML fitting. The exponent of REML adjusted values for taxa then provided a log-normally distributed set of values adjusted for the effect of study.

Results and discussion
The output of the REML fitting of a linear mixed effects model consists of a mean value for each taxa on a common scale after REML adjustment (hereafter referred to as the 'REML mean') taking account of the random factor (i.e. site or study). The REML mean value represents a relative scaling value. The databases of both Cs and plant, and Pb and wildlife, provided significant model fits (p < 0.001); there was, as expected, an effect of site/study and/or taxonomic grouping in the databases.

Wildlife
For all three taxonomic levels the inclusion of site as a random factor resulted in a better model than if site was excluded from the analyses (see AIC summary in Table 2). Based on AIC the genus based model was better than the order or family models (Table 2). On inspecting the resultant REML mean values it was noted that values for taxa containing ducks (i.e. the genus Anas, family Anatidae and order Anseriformes) were especially high. In one instance, it was known that lead shot had been used to kill the bird and it was likely the same for at least one other dataset. Therefore, all model fits were repeated after removing data for duck species. The effect of removing the duck data on the resultant REML values for the majority of taxa was minimal; Fig. 1 compares REML means at the order level with and without the inclusion of data for duck species. At all taxonomic levels the REML values derived with and without the inclusion of duck data were generally within 5% of each other. The greatest deviations, by up to c. 50-60%, were for taxa containing species (Anguis fragilis, Festuca rubra, Triturus cristatus and Vipera beru) which only occurred at a site for which duck data were removed. At the family and order levels, differences of about 20% were also observed for Cervidae and Cetartiodactyla respectively. These taxa, containing deer species, were impacted as one site only had data for white-tailed deer (Odocoileus virginianus) and Canada goose (Branta canadensis) and was consequently removed from the re-analyses. The REML means for all runs can be found in full in the dataset accompanying this paper . As an example the REML means predicted at family level without data for Anatidae (duck species) can be found in Table 3; the table also attributes the families to the ICRP RAP categories (ICRP, 2008) and the IAEA's wildlife groups (IAEA, 2014).
The similarity in REML mean values for the majority of species with and without data for duck species suggests that the resultant models are robust. However, on inspection of the input data it was evident that values for Ericaceae (i.e. heathers and Vaccinium species etc.) comprised a significant proportion of the available data, for instance, at the family level Ericaceae comprised 681 of the available 1420 CR values. To look at the influence of so much data for one taxon, the model was refitted at the family level without the data for Ericaceae species. For 94% of families the REML mean value after the removal of the Ericaceae species were within approximately 30% of their value from the model fitted to all of the data (i.e. as presented in Table 3) (Fig. 2). Hence, the effect of removing near 50% of the original data and refitting the model was relatively minor, further demonstrating the robustness of the approach. Of those families with greatest differences to the original model fitting (e.g. Fabaceae and Rosaceae) much of the available data were from sites which had Ericaceae species. In the case of Cladoniaceae, 95% of the available data could not be used in the refitting as these were for sites which only had values for Ericaceae and Cladoniaceae species.
The AIC values (Table 2) suggested that the best model is that fitted to genus level data likely because taxa at family and especially order levels contain species/genus with considerably different CR values. However, applying the approach at the genus level requires data for a large range of genera and the resultant model is restricted to those genera for which data are available. The family and order level models represent a scientifically based extrapolation approach to make predictions for species for which no data exist; currently used extrapolation approaches often have no such supporting justification .
How is the approach suggested here actually used to make predictions at an assessment site? Firstly, there needs to be measurements in at least one taxon from the site and this can then be used to estimate activity concentrations in other taxa. For instance, if using the family level model the activity concentration in family Z for which no data are available is estimated as:

×
REML mean for family Z REML mean for measured family Activity concentration for measured family Thørring et al. (2016) describe a study sampling species representative of the ICRP RAPs from a site in Norway reporting CR values for a range of elements including Pb; the ICRP RAPs are defined at the taxonomic level of family (ICRP, 2008). We have used these data as a blind test of the predictive ability of the REML means derived above; Fig. 3 presents a comparison of predictions of Pb CR values made using the REML mean values from Table 3 for each of Poaceae, Muridae and Cervidae as the 'measured family' in the above equation. CR values were estimated as these (and not concentration data) were what was available to us for this site, however, as already stated the comparative differences between families will be the same for CR as those for   (2014) which have a tendency to over-predict for most families (Fig. 3).
The comparative predictions using Poaceae as the measured family versus predictions using Muridae or Cervidae demonstrate a potential issue with the methodology proposed here. It is possible that the measured taxon data used for a given site will represent an outlier. The use of more than one known at a site would alleviate the possibility of this impacting on predictions. The probability of outliers affecting results is probably greatest when predictions are made at higher taxonomic levels (i.e. an order contains more species than a genus and hence it is likely that the order will encompass more variability in transfer).

Cs in plants
Acknowledging that for some predictive purposes CR values will continue to be employed within models, here we use the Cs and plant results to demonstrate another application of the proposed approach developing the concept of a 'benchmark-taxon' which could be used to provide CR values for taxa for which no data are available. A value of 0.51 (a recommended fresh weight concentration ratio value for 'grass and herbaceous species' (IAEA, 2014)) was used to benchmark the values, i.e. the exponents of all outputs from REML fitting were transformed such that the geometric mean of all the values was 0.51. This value was used, rather than that for any a particular crop, not only because grasses and herbaceous species made up the compiled database but also because a wide range of forage and food plants are species of these types. The database for Cs uptake by plants included 301 taxa and REML fitting of the CRs suggested that the range of values is from 0.0003 to 71 (Supplementary Information). The range in CR values for 'grass and herbaceous' species in IAEA (2014) is 0.0019-37. Consequently, our analysis suggests that CRs for plants might be more variable than previously suggested; especially given the range within IAEA (2014) will include the effect of site/study whereas our values are adjusted for the effect of study. This suggestion is based on the assumption that the relative mean concentrations of taxa using dry weights (as in the database) will be the same relative mean concentrations using fresh weights. Given that the analysis here focuses on grasses and herbs this assumption is likely to be true but further analysis of it may further reduce uncertainty. Overall, these results show that, after the effect of all aspects of study conditions has been accounted for in the model, there is a significant influence of plant factors on radiocaesium transfer. For example, when values from which the influence of study conditions has been removed were divided into those from Monocots and those from Eudicots (as defined by the APGII classification (APG, 2009)) there were significant differences between them (P < 0.005 for two-sample T-test on log-transformed values) with a geometric mean for Monocots of 0.15 (n = 65) and for Eudicots of 0.69 (n = 236) (Fig. 4). There were also significant differences in the geometric means between Monocots (0.15), Core Eudicots (0.89), Rosids (0.28) and Asterids (0.49) (F = 244, P < 0.005 for OneWay ANOVA on log-transformed values) (Fig. 5).
REML-fitting of linear mixed models for radiocaesium uptake by plants has previously suggested that taxonomic categories above the species level might be useful for general predictions of CRs (Willey and Wilkins, 2008). In fact, there is no reason a priori why the species is an appropriate unit for categorising CRs; the species is essentially a reproductive unit. It has been known for some time that the traits that effect nutrient and element uptake (White et al., 2012) will determine the uptake of radionuclides. On the basis of the Monocots and Eudicots included in the modelling performed here, we suggest that there is a significant difference between these two groups of plants in their Cs uptake. There are some primitive clades of flowering plants not included in the analysis reported here but for all the others we suggest that their taxonomic position in either Monocot or Eudicot clades could be used to refine predictions of Cs CRs. In general, Monocots can have different elemental ratios to Eudicots, including for K (Conn and Gilliham, 2010), which accords with the data presented here; Cs and K often behave as lose chemical analogues in biological systems. The existence of statistically significant differences between lower taxonomic groups might be useful for even further refinement of predictions. Our results also indicate that there might be taxonomic effects at a variety of levels of the taxonomic hierarchy. If there are, then full phylogenetic analyses might be possible for plant CRs. The taxonomic categories do not take account of the genetic distances between the Fig. 1. A comparison of REML means for Pb at the order level showing the ratio of the REML mean value without the inclusion of Anseriformes (ducks) data to the REML mean when Anseriformes data were included in the data used to fit the model. A value of '1' denotes that the two models have the same REML mean for a given order. N.A. Beresford and N. Willey Journal of Environmental Radioactivity 208-209 (2019) 106020 groups but a phylogenetic analysis would do so and might identify the existence of a 'phylogenetic signal' which would be useful for predictions across the flowering plant phylogeny.

General discussion
Even the largest published databases of CRs include values measured for only a limited range of species for many radionuclides (e.g. IAEA, 2010IAEA, , 2014. There are, therefore, many 'data gaps', which, in the absence of mechanisms to predict CRs, means that for many taxon/ radionuclide combinations only generic CR values can be used. For a given set of environmental parameters, e.g. one site or ecosystem, the availability and applicability of the available CR values can be questionable. This has often limited the utility of predictions of CRs in radioecology and, for example, after the deposition of radiocaseisum from the Chernobyl accident, for many species Cs transfer, especially on organic soils, has been greater and has persisted longer (e.g. Smith et al., 2000;Skuterud et al., 2005) than expected. Similarly, for both human foodstuffs and wildlife there is a dominance of data from northern temperate ecosystems with few data for, e.g., Mediterranean ecosystems or African environments.
When they have been measured in different species and under different conditions (IAEA, 2010(IAEA, , 2014 CRs vary in value over orders of magnitude, which suggests that significantly better predictions than those based on generic values could be made if systematic differences between species (or other taxonomic units) could be identified. The REML-fitted linear mixed models of wildlife and plant databases reported here show that by taking account of the effects of site or study (in effect the sum of the environmental parameters that might affect radionuclide transfer), databases of 'relative transfer' can be generated. These are useful in predicting activity concentrations for taxa at a given site in the absence of actual measurements at the site.
For wildlife, applying the derived REML means for Pb gave reasonable predictions for blind test data and performed better than the CR approach; a similar observation was made for Cs and freshwater fish . The REML fitted linear mixed model approach Table 3 REML mean values by family (fitted without Anatidae data). The families are also categorised to the appropriate broader wildlife group (as used in e.g. IAEA (2014) and Brown et al. (2016)) and, where appropriate, ICRP Reference Animal and Plant (RAP) (ICRP, 2008) category. n/a -not applicable.
could be relatively easily incorporated into wildlife assessment models such as the ERICA Tool  potentially reducing one of the largest sources of uncertainty within assessments. The approach may have applicability for heavy metal pollutants as well as radionuclides for wildlife (as demonstrated by the example used here, Pb). A factor that may limit the number of elements the approach can be applied to is data availability; the fitting of the REML model is 'data hungry', requiring measurements for at least two taxa at a given site and that at least one of these taxa is present in the dataset from at least one other site. An advantage of the REML methodology is that it gives an extrapolation approach that is scientifically founded (rather than being simply an assumption as a number of the commonly used extrapolation approaches are (IAEA, 2014;Brown et al., 2013)). Therefore, the approach will be useful in modelling wildlife taxa for which we have no data. It will be especially useful for making predictions for protected species which are the object of assessment in some countries (Copplestone et al., 2005) and for which lethal sampling to provide data is undesirable.
For food crops, international compilations of CR values are highly biased to European temperate climates and production systems (IAEA, 2010). There are few data for other climatic regions of the world, yet nuclear power is existent and planned to increase in Asia, Africa and South America. The REML approach could provide a method to make estimates for regional crops given the lack of data and/or identify crops with potentially high uptakes of radionuclides for targeted study.
The approach proposed in this paper is best suited to application in existing and planned situations, when equilibrium is generally assumed. Applicability of the approach to emergency situations requires further consideration.

Fig. 2.
A comparison of REML means for Pb at the family level showing the ratio of the REML mean value without the inclusion of Ericaceae data to the REML mean when Ericaceae data were included in the data used to fit the model. A value of '1' denotes that the two models have the same REML mean for a given family; the ratio value for Cladoniaceae is 10.1.

Fig. 3.
Predicted to reported measured Pb CR values for different families for samples from a site in Norway (from Thørring et al., 2016) using the REML means from Table 3 and measured data for different families (Poaceae, Muridae and Cervidae) as the 'known' (or measured value). For comparison, predictions using the most appropriate CR values from IAEA (2014) are also presented. A value of '1' would denote that the predicted and measured values were the same.

Fig. 4.
A prediction of geometric mean concentration ratios (CRs) for radiocaesium by Monocot and Eudicot flowering plants. Data for 301 different taxa of flowering plant were compiled from 35 studies using a REML-fitted linear mixed model in order to take account of the effects of study and to predict CRs for taxa. For Monocots n = 65, for Eudicots n = 236. The 95% Confidence Interval is shown for each geometric mean.

Fig. 5.
Predicted geometric mean concentration ratios for major clades of flowering plants. Data for 301 different taxa of flowering plant were compiled from 35 studies using a REML-fitted linear mixed model in order to take account of the effects of study and predict CRs for species. For Monocots n = 65, for Rosids n = 73, for Asterids n = 74, for Core Eudicots n = 85 (four basal taxa were in none of these categories). The 95% confidence interval is shown for each geometric mean.