Bayesian hierarchical modelling of size spectra

A fundamental pattern in ecology is that smaller organisms are more abundant than larger organisms. This pattern is known as the individual size distribution (ISD), which is the frequency distribution of all individual body sizes in an ecosystem. The ISD is described by a power law and a major goal of size spectra analyses is to estimate the exponent of the power law, λ. However, while numerous methods have been developed to do this, they have focused almost exclusively on estimating λ from single samples. Here, we develop an extension of the truncated Pareto distribution within the probabilistic modelling language Stan. We use it to estimate multiple λs simultaneously in a hierarchical modelling approach. The most important result is the ability to examine hypotheses related to size spectra, including the assessment of fixed and random effects, within a single Bayesian generalized mixed model. While the example here uses size spectra, the technique can also be generalized to any data that follow a power law distribution.


| INTRODUC TI ON
In any ecosystem, large individuals are typically more rare than small individuals.This fundamental feature of ecosystems leads to a remarkably common pattern in which relative abundance declines with individual body size, generating the individual size distribution (ISD), also called the community size spectrum (Platt & Denman, 1977;Sprules et al., 1983;White et al., 2008).Understanding how body sizes are distributed has been a focus in ecology for over half a century (Kerr, 1974;Peters & Wassenberg, 1983;Sheldon & Parsons, 1967), in part because body size distributions reflect fundamental measures of ecosystem structure and function, such as trophic transfer efficiency (Kerr & Dickie, 2001;Perkins et al., 2019;White et al., 2007).Individual size distributions are also predicted as a result of physiological limits associated with body size, thereby emerging from predictions of metabolic theory and energetic equivalence (Brown et al., 2004).
More formally, the ISD can be modelled as a probability density function with a single free parameter : (1) f(x) = Cx , x min ≤ x ≤ x max where x is the body size (e.g.mass or volume) of an individual, x min is the smallest possible individual and x max is the largest possible individual.C is a constant equal to: This model is also known as the bounded power law or truncated Pareto distribution.The term 'bounded' or 'truncated' refers to the minimum x min and maximum x max possible body sizes (Edwards et al., 2017;White et al., 2008).
A compelling feature of size spectra is that λ may vary little across ecosystems as a result of physiological constraints that lead to size-abundance patterns more broadly.Metabolic scaling theory predicts + 1 = log 10 log 10 − 3 ∕ 4, where is trophic transfer efficiency in the food web and is the mean predator-prey mass ratio (Reuman et al., 2008).The value of − 3 ∕ 4 is the scaling coefficient of metabolic rate and mass (0.75) (Brown et al., 2004), and as a result, values of λ have been used to estimate metabolic scaling across ecosystems (Perkins et al., 2018(Perkins et al., , 2019;;Reuman et al., 2008).Values of ~−2 represent a reasonable first guess of expected ISD exponents, with values of ranging from −1.2 to −2 appearing in the literature (Andersen & Beyer, 2006;Blanchard et al., 2009;Pomeranz et al., 2022).
Whether λ represents a fixed or variable value is debated, but it varies among samples and ecosystems (Blanchard et al., 2009;Perkins et al., 2018;Pomeranz et al., 2022).It is often described by its connection with the steepness of log-log plots of size spectra, with more negative values (i.e.'steeper') indicating lower abundance of large relative to small individuals, and vice versa.These patterns of size frequency are an emergent property of demographic processes (e.g.age-dependent mortality), ecological interactions (e.g.size-structured predation, trophic transfer efficiency) and physiological constraints (e.g.size-dependent metabolic rates; Andersen & Beyer, 2006;Muller-Landau et al., 2006;White et al., 2008).As a result, variation in λ across ecosystems or across time can indicate fundamental shifts in community structure or ecosystem functioning.
For example, overfishing in marine communities has been detected using the size spectrum in which λ was steeper than expected, indicating fewer large fish than expected (Jennings & Blanchard, 2004).Shifts in λ have also been used to document responses to acid mine drainage in streams (Pomeranz et al., 2019), land use (Martínez et al., 2016), resource subsidies (Perkins et al., 2018) and temperature (O'Gorman et al., 2017).
Given the ecological information it conveys, the data required to estimate size spectra-a vector of individual body sizes-are deceptively simple.As long as the body sizes are collected systematically and without bias towards certain sizes, there is no need to know any more ecological information about the data points (e.g.trophic position, age, abundance).However, the statistical models used to estimate λ are diverse.Edwards et al. (2017) documented eight methods.Six involved binning, in which the body sizes are grouped into size bins (e.g.2-50 mg, 50-150 mg, etc.) and then counted, generating values for abundance within each size bin.Binning and log transformation allows λ to be estimated using simple linear regression.Unfortunately, the binning process also removes most of the variation in the data, collapsing information from 1000s of individuals into just six or so bins.Doing so can lead to inaccurate values of λ, sometimes drastically so (Goldstein et al., 2004;Pomeranz et al., 2024;White et al., 2008).
An improved alternative to binning and linear regression is to fit the body size data to a power law probability distribution (Edwards et al., 2017(Edwards et al., , 2020;;White et al., 2008).This method uses all raw data observations directly to estimate λ using the maximum likelihood estimation method (Edwards et al., 2017).In addition to estimating size spectra of single samples, ecologists have used this method to examine how λ varies across environmental gradients (Perkins et al., 2019;Pomeranz et al., 2022).However, these analyses typically proceed in two steps.First, λ is estimated individually from each collection (e.g. each site or year, etc.).Second, the estimates are used as response variables in a linear model to examine how they relate to corresponding predictor variables (Edwards et al., 2020).We refer to this as the 'two-step' approach.A downside to the two-step approach is that it treats body sizes (and subsequent λs) as independent samples, even if they come from the same site or time.It also removes information on sample size (number of individuals) used to derive λ.As a result, the approach not only separates the data generation model from the predictor variables but also it is unable to take advantage of partial pooling, in which group-level estimates exhibit shrinkage towards each other (Gelman et al., 2012).
Here, we develop a Bayesian modelling framework that uses the truncated Pareto distribution to estimate λ in response to both fixed and random predictor variables.The primary benefit of this approach is that it combines the data generation process and the linear (or non-linear) model into a single generalized linear (or non-linear) mixed model.The model extends the maximum likelihood approach developed by Edwards et al. (2020) to allow for a flexible hierarchical structure, including partial pooling, within the modelling language Stan (Stan Development Team, 2022).

| Translating to Stan
Stan is a probabilistic modelling language that estimates Bayesian posteriors using Hamiltonian Monte Carlo (Stan Development Team, 2022).It does not contain the truncated Pareto described in Equation 2, so we added it as a user-defined log probability density function (lpdf) in rstan (Annis et al., 2017 Team, 2022) (Supporting Information S1).The lpdf was slightly modified as described in S.1.4 of Edwards et al. (2020) to contain a term for the count of each body size.In data sets with individual body sizes, counts will be a simple constant with a value of 1.However, if a sample of body sizes is x = {1.2,1.2, 1.5, 2.8}, then these can be re-formatted to include a vector for all unique x values {1.2, 1.5, 2.8} and a vector for their counts = {2,1,1}, where counts can also be non-integers.In large data sets (e.g.>500 individuals or so), adding a vector for counts can greatly improve the model fitting time (warmup + sampling, Supporting Information S2).
Additionally, adding counts (whether integer or non-integer) is useful for combining body size data sets that are collected with different methods (Supporting Information S2).For example, in freshwater streams, macroinvertebrates are often collected using samplers that cover 0.09 m 2 .Fish in the same streams are often collected over a much larger area, such as electrofishing a 5-m wide stream for 100 m in length, yielding a sample area of 5 × 100 = 500 m 2 .An individual macroinvertebrate of, say, 0.01 mg and an individual fish of, say, 1000 mg would each get a count of 1 in their respective data sets.But that would not reflect their density in the food web.On a m 2 basis, the macroinvertebrate has a density of 1 ∕ 0.09 = 11m 2 , but the fish's count (or density) should be 1 500 = 0.002m 2 .The choice of units for the counts is not trivial.For example, counts per m 2 will have different confidence intervals than counts per mm 2 or per km 2 for λ (see S.1.5 in Edwards et al., 2020).We explore this in more depth in Supporting Information S2.
When body sizes are collected from the same sampling method and are not tallied or binned, all counts equal 1.If data are binned, an alternative approach, such as the MLEbin method, is appropriate (Edwards et al., 2020).
Converting Equation 2 into Stan allows for Bayesian estimation of λs using generalized (non)-linear mixed models.For example, an intercept-only model would look like this: where x i is the ith individual body size, f x; , x min , x max , counts is the truncated Pareto distribution, λ is the size spectrum parameter (also referred to as the exponent), x min and x max are as described above and counts i are the tally or density of the ith body size x in a data set.The parameter is the intercept with a prior probability distribution.In this case, we specified a normal prior since λ is continuous and can be positive or negative.
For cases where the goal is to estimate changes in λ across space or time, the simple model above can be expanded to include predictors and/or varying intercepts and slopes: where x ij is the ith body size from group j.The groups might represent j sites, j experimental units or j times.The x ij body sizes are distributed as a truncated Pareto with an unknown j , corresponding to the size spectrum parameter for each j group, along with group specific x min j and x max j .The linear model for j contains an intercept , a slope , a continuous predictor z j and a varying intercept for each group j .In this example, prior distributions are Normal for , and j .Parameters and require priors for their respective means and and standard deviations and .The varying intercept j has a mean = 0, and a standard deviation j with its own Exponential hyperprior with parameter .
The literature on prior choice is broad and active (Banner et al., 2020;Wesner & Pomeranz, 2021), particularly for priors on hyperparameters like j (Aguilar & Bürkner, 2023;Gelman, 2006).We specify prior distributions here for clarity, but users should choose prior distributions that reflect prior knowledge.An example of checking priors with the prior predictive distribution and prior sensitivity is in the Supporting Information S4.

| Parameter recovery from simulated data
To ensure the models could recover known parameter values, we set j = 1, 2, …, 7 equally spaced λ values from −2.4 to −1.2.We then simulated K = 1000 data sets for each of the seven λs from a bounded power law using the inverse cumulative density function: where x ijk is the ith individual body size from the jth value of j for the kth model run (including new data simulation for each run).The variable u ijk is a unique draw from a Uniform(0, 1) distribution, and all other variables are the same as defined above.We set x min = 1, x max = 1000 and simulated 300 body sizes (i = 1,2,3, …, 300) for each j and k.To generate counts, we rounded each simulated value to the nearest 0.001 and tallied them.This generated only a small number of counts >1 (10 out of 300 body sizes).This approach was chosen to demonstrate the use of counts.For a more detailed discussion of counts, see Supporting Information S2. (3)

Individual lambdas
After simulating the data, we estimated the λ values in three ways.
First, we fit separate intercept-only models (Equation 3) to each simulated data set.This represents the common procedure of estimating λs independently before using them in later analyses (e.g.Arranz where jk are j deviations from the grand intercept for each k simulation, with priors and hyperpriors as described in Equations 10 and 11. The procedures above resulted in 9000 total model runs (7000 for the separate λ estimates plus 1000 each for the fixed and varying intercept models).Each model run includes newly simulated data from Equation 12.To assess how well the models captured the known λs, we estimated coverage and bias for each λ.For coverage, we generated 95% credible intervals (CrI) across each of the model runs and calculated the proportion of those CrIs that contained the known λ value.For bias, we calculated the difference between the posterior median of λ and the known λ across each model run.

Sample size and size range
We examined sensitivity to sample size (number of individual body sizes) across two λ values (−2, −1.6).For each λ, we simulated 30, 100, 300 or 1000 individuals.Each data set was fit using separate intercept-only models.We then repeated this process (data simulation and model fitting) K = 1000 times to estimate bias and coverage as described above.
In addition to sample size, we examined sensitivity to the size range, which can affect interpretations of λ (Sprules & Barth, 2016).
To do this, we again set λ to −2 or −1.6 and then simulated n = 300 individuals, varying x min and x max so that they contained 1, 2, 3, 4 or 5 orders of magnitude in range (i.e.x min = 1 & x max = 10 ∕ 100 ∕ 1000, etc.).For each size range, we repeated the data simulation and model fitting 1000 times to estimate bias and coverage.We also estimated precison as the range between the lowest and highest λ estimates across each model run.

Linear variation in λ across samples
We simulated a linear regression model with a single continuous predictor z such that This contains a known intercept = -1.2 and a slope = -0.05 .
The predictor z ranged from −1 to 1, with 10 equally spaced intervals.Values for and were chosen to keep λ within typical ranges of −1 to −2 across the predictors.After obtaining the 10 λ values (one for each value of z), we simulated 300 individuals from each λ using Equation 12 and setting x min = 1 and x max = 1000.The regression was fit using Equations 6-10, but without the varying intercept j .
We repeated this procedure (data simulation and model fitting) 1000 times (see Section 2.3) and checked for parameter recovery, bias and coverage as described above.

Benefit of partial pooling and priors
Using hierarchical Bayesian models has the benefit of improving λ and regression parameter estimates with partial pooling and informative priors.These can be especially important when data from different times or places have different sample sizes.
To demonstrate this, we modified the linear regression described above to include 12 values of z, one of which was an 'outlier' in which = -1.1 when z = 2.5.According to the regression equation, should actually equal −2.5 when z = 2.5.After estimating the λs, we again simulated n = 300 individuals from each lambda with x min = 1 and x max = 1000 .However, for the outlier, we limited the number of individuals to n = 50.This mimics a situation in which an outlier is potentially due to a low sample size, a scenario for which partial pooling can be particularly effective (McElreath, 2020, p. 413).The purpose of this exercise is not to reflect any particular sampling scheme, but to demonstrate the importance of partial pooling and priors.
We used four techniques to estimate the relationship between z and λ.First, we used the two-step process to (1) individually estimate each lambda and (2) fit a Gaussian Bayesian linear regression between the between z and the separately estimated λs.This is akin to a no-pooling regression, in which no information about sample size or uncertainty in λ is accounted for in the λ estimates.Second, we fit the same regression but added measurement error for λ.This allows for weighting the response by the standard deviation of λs, such that the linear model has the form: where each j has mean true,j and standard deviation sd j and true,j is modelled as a linear function of z with an intercept and slope .

| Model fitting
Because the truncated Pareto pdf as described here is not available in rstan, we built an R package, isdbayes (Wesner & Pomeranz, 2023), to integrate it into rstan using brms in R (Bürkner, 2018;R Core Team, 2020).The main benefit of brms is that it fits Bayesian models in rstan using common R mod-

| Individual lambdas
The three methods (separate models, fixed effects and varying intercepts) recaptured the true λ values (Figure 1) with no apparent evidence of bias.For example, mean bias ranged from −0.01 to 0.008, but all standard deviations included zero (Table 1).Similarly, coverage ranged from 0.93 to 0.96 with a grand mean of 0.95, indicating similarity to the nominal coverage of 0.95 (Table 1).

| Sample size and size range
Coverage ranged from 0.93 to 0.96 across sample sizes (Figure 2a), indicating good statistical coverage even at low sample sizes.
However, precision increased with sample size.At n = 1000 and = -1.6, the range of mean λ estimates (largest minus smallest λ) was 0.13.By comparison, it was 0.9 at n = 30 (Figure 2a).In addition, at n = 30, there was a slight negative bias of 0.03 units compared to the true λ (though the standard deviations all covered the true λ).
Coverage was also consistent across size ranges, achieving nominal coverage even at size ranges of 1 order of magnitude (Figure 2b).
There was also no indication of bias, with mean λ estimates ranging from −0.002 to −0.007 units away from the true λ and standard deviations including 0. However, precision was lower when body sizes ranged 1 order of magnitude (range of estimates = 0.7 and 0.6 units for = -2 and −1.6, respectively).Precision declined to ~0.4 and ~0.3 at body size ranges of two or more orders of magnitude and remained relatively stable (Figure 2b).

| Benefit of partial pooling and priors
Without partial pooling or informative priors, the two-step method was heavily influenced by the outlier, yielding a slope of −0.03 (95% CrI: −0.1 to 0.04; Figure 4a).While the credible interval contains the true slope (−0.1), there is high uncertainty in both the slope value and its sign.For example, there was only a 0.77 probability of a negative slope (and a 0.23 probability of a positive slope).Incorporating measurement error for λ made little difference, with nearly identical values of the regression parameters (Figure 4b).
Fitting the same data with a single truncated Pareto linear mixed effects model reduced the influence of the outlier, yielding a slope of −0.06 (−0.1 to 0), 50% closer to the true slope of −0.1 than the two-step model (Figure 4c).In addition, this model more reliably captured the correct sign, with a 0.96 probability of a negative slope.
Adding strong priors on the slope and intercept parameters further improved the estimate (Figure 4d), with a 0.99 probability of a negative slope.
In addition to improving parameter estimates, the λ estimates themselves are improved in the partially pooled models (Figure 4c,d).
For example, in the two-step method, λ in the outlier is estimated −1.1 (Figure 4a), but it is reduced to ~−1.34 with partial pooling (Figure 4c,d).Partial pooling has a minimal effect on the other λs due to their larger underlying sample size.TA B L E 1 Parameter recovery using three modelling approaches with the same data.First, separate models individually recapture known lambda values.Second, lambdas are estimated using a single fixed effects model.Third, lambdas are estimated hierarchically using a single varying intercept models.Each model and data simulation procedure is repeated 1000 times.Coverage is estimated for 95% credible intervals.Bias represents the mean and standard deviation of bias across the 1000 replicates.et al., 2017;White et al., 2007).However, if more informative priors are required, such as our example in Figure 4d, then caution should be used when comparing prior expectations to previously estimated λs.For example, in an analysis of marine fish trawl data, Edwards  3.15 3.17 3.17 3.17 3.17 3.13 3.16 3.17 3.18 3.17 Similar to priors, partial pooling from varying intercepts provides additional benefits, allowing for the incorporation of hierarchical structure and pulling λ estimates towards the global mean (Gelman, 2005;Qian et al., 2010).In the example shown here, pooling was able to downweight the influence of an outlier that had a relatively small sample size (n = 50 individuals compared to n = 300).
By contrast, in the two step-method, the same outlier had a large influence on the regression outcome, because the model had no information on the number of individuals used to generate each λ.
This becomes especially important when models are used to forecast future ecosystem conditions.Forecasts are becoming more common in ecology (Dietze et al., 2018) and are likely to be easier to test with modern long-term data sets like NEON (National Ecological Observatory Network) in which body size samples will be collected at the continental scale over at least the next 20 years (Kuhlman et al., 2016).In addition, because the effects of priors and pooling increase with smaller sample sizes, varying intercepts are likely to be particularly helpful for small samples.In other words, priors and partial pooling contain built-in scepticism of extreme values, ensuring the maxim that 'extraordinary claims require extraordinary evidence'.
One major drawback to the Bayesian modelling framework here is time.Bayesian models of even minimal complexity must be estimated with Markov Chain Monte Carlo techniques.In this study, we used the No U-Turn sampling (NUTS) algorithm (Hoffman & Gelman, 2014) via rstan (Stan Development Team, 2022).Stan can be substantially faster than other commonly used programs such as JAGS and WinBUGS, which rely on Gibbs sampling.For example, Stan is 10-1000 times more efficient than JAGS or WinBUGS, with the differences becoming greater as model complexity increases (Monnahan et al., 2017).In the current study, intercept-only models for individual samples with ~300 to 1500 individuals could be fit quickly (<2 s total run time (warm-up + sampling on a Lenovo T490 with 16GB RAM)).However, the hierarchical regression models took >1 h to run.These times include the fact that our models included several modifications to improve efficiency, such as weakly informative priors, standardized predictors and non-centred parameterization, each of which are known to improve convergence and reduce sampling time (McElreath, 2020).But if Bayesian inference is desired, these run times may be worth the wait.In addition, they are certain to become faster with the refinement of existing algorithms and the introduction of newer ones like Microcanonical HMC (Robnik et al., 2022).
Body size distributions in ecosystems have been studied for decades, yet comprehensive analytical approaches to testing hypotheses about them are lacking.We present a single analytical approach that takes advantage of the underlying data structures of individual body sizes (truncated Pareto distributions) while placing them in a generalized (non)-linear hierarchical modelling framework.In addition to fitting regression models, the results suggest that sample sizes >100 individuals, but optimally >1000, are sufficient to accurately estimate λ.We also found good performance at size ranges from two to five orders of magnitude, though it is important to note that this result is based on simulated data in which x max is known (i.e.we 'know' it is 10, 100 or 1000 because we are using simulated data).This is more difficult in a field setting.For example, in a community with = -1.5, x min = 1g and x max = 1000g, there is only a 0.00016 probability of sampling an individual >999 g.In other words, if the choice of x max is based only on the sample data, it is likely to underestimate the true x max in the community.One approach is to set x max from the largest individual caught under repeated sampling (Gjoni et al., 2024).In a field sample that ranges, say, three orders of magnitude in body size, researchers should ensure that this range reflects the likely range of true sizes in the data set.We hope that ecologists will adopt and improve on the models here to critically examine hypotheses of size spectra or other power law distributed data.Moreover, while  Simulation the examples here are for ecological size spectra, the statistical approach is not limited to ecological data, but can be applied to analysis of power law distributions that are common in a wide variety of disciplines (Aban et al., 2006;Clauset et al., 2009).
et al. (2019); Pomeranz et al. (2022)).Second, we fit a single fixed effects model of the form:where is the intercept representing the reference value of λ (in this case it is −2.4), m is the coefficient for the m = 1, 2, … 6 contrasts between the reference λ and m , z 1−6ijk are the covariates representing the six groups containing = { -2.2, -2, -1.8, -1.6, -1.4,-1.2}.All other parameters and data are as described above.Third, we fit a varying intercepts model of the form: = + z which the individual lambda estimates exhibit shrinkage towards the mean, particularly for the sample with 50 individuals.Additionally, the regression parameters ( , ) should be less influenced by the outlier compared to the first model.Finally, we fit a model with both varying intercepts and strong priors ( ∼ N( − 1.5,0.1),∼ N( − 1,0.02), j ∼ Exponential(1)).
elling syntax.For example, this linear regression in R, lm(y ~ x, data = data) becomes Bayesian using brm(y ~ x, data = data, …), where brm will translate the model to rstan for MCMC sampling.The dots '…' indicate additional model specifications for the likelihood, priors, iterations, chains, etc.A short tutorial on using isdbayes is available at https:// github.com/ jswes ner/ isdbayes.We specified each of the above models in brms, with the truncated Pareto added from the isdbayes package.Posteriors were explored in rstan (Stan Development Team, 2022) using four chains each with 2000 iterations for each model run.Two exceptions were the fixed and varying intercept models in Figure 1.For those, we specified two chains each with 2000 iterations.These values are lower than the default four chains with 2000 iterations rstan and brms, but were chosen for computational efficiency.In a separate experiment (Supporting Information S3), we re-ran a subset of those models with four chains and 2000 iterations and found no differences in the outcome.All models converged with Rhats < 1.01.Assessments of prior influence and model checking are demonstrated in Supporting Information S4.In particular, for model checking, we use simulations from the posterior predictive distributions.These simulations can check how well the model resembles the raw data.Strong deviations from raw data may indicate poor model specification or may indicate deviation from the assumption of the power law.
result of this work is the ability to analyse individual size distributions (ISDs) using fixed and random predictors in a hierarchical model.Our approach allows ecologists to test hypotheses about size spectra while avoiding the pitfalls of a two-step process in which is estimated individually for each sample and the results are then used as response variables in linear or non-linear models.The generalized mixed model with a bounded truncated Pareto merges these steps, linking the data generation process (e.g.individual body sizes) with the model predictors.This permits the use of prior probabilities and hierarchical structure on regressions of ISDs in a single analytical framework.The ability to incorporate prior information using Bayesian updating has two practical advantages.First, adding informative prior distributions can improve model fit by limiting the MCMC sampler to reasonable sampling space.In other words, it would not be sensible to estimate the probability that is −1234 or − 9. Without informative priors, those values (and more extreme values) are considered equally likely and hence waste much of the algorithm's sampling effort on unlikely values(Wesner & Pomeranz, 2021).Second, and most importantly, ecologists have much prior information on the values that can take.For example, global analysis of phytoplankton reveals values of −1.75, consistent with predictions based on sublinear scaling of metabolic rate with mass of −3/4(Perkins et al., 2019).Alternatively, Sheldon's conjecture suggests that λ is −2.05(Andersen & Beyer, 2006), a value reflecting isometric scaling of metabolic rate and mass, with support in pelagic marine food webs(Andersen & Beyer, 2006).However, benthic marine systems typically have shallower exponents (e.g.~ −1.4;Blanchard et al. (2009)), similar to those in some freshwater stream ecosystems (~ −1.25,Pomeranz et al. (2022)).While the causes of these deviations from theoretical predictions are debated, it is clear that values of λ are restricted to a relatively narrow range between about −2.05 and −1.2.But this restriction is not known to the truncated Pareto, which has no natural lower or upper bounds on λ(White et al., 2008).As a result, a prior that places most of its probability mass on these values (e.g.Normal( − 1.75,0.2))seems appropriate.Such a continuous prior does not prevent findings of larger or smaller λ, but instead places properly weighted scepticism on such values.An important assumption when setting priors is that we have a good understanding of the values that λ can reasonably take.For most of the examples here, our priors are weakly informative in the sense that they rule out clearly unreasonable values (e.g.= -25, etc.), but have weak effects on values within reasonable ranges (e.g.F I G U R E 1 Modelled 95% credible intervals (CrI; K = 1000) of seven λs using (a) separate intercept-only models for each lambda, (b) a fixed linear predictor with the λ value as a group and (c) varying intercepts.Vertical black lines show the true λ with corresponding values to the right of each row.Intervals either include the true λ (yellow) or not (black).For plotting, model runs are arranged from lowest to highest minimum value of each interval.
λ ~ -3 to 0).Most published values of λ fall into this range regardless of the method used by those studies to estimate λ (Edwards et al. (2020) found that binning methods produced λ estimates of ~−2.2 across 30 years of data.Yet reanalysis of the same data using F I G U R E 2 (a) Changes in parameter estimation and coverage (numbers next to densities) as a function of sample size.Sample size is the number of individual body sizes used to estimate λ.Estimates of λ were repeated K = 1000 times for each sample size and known λ combination.(b) The effect of size range on λ estimates.Modelled estimates (K = 1000) of two λs using separate intercept-only models with x min and x max ranging one to five orders of magnitude.Horizontal black lines show the true λs (−1.6 or −2).Dots in (a and b) are the posterior median λ estimates.95% credible intervals of those estimates (not shown for clarity) either include the true λ (yellow) or not (black).
Bias and 95% coverage probabilities for the intercept and slope parametres of a linear regression.Values are estimated across 1000 model runs, each of which includes simulation of body sizes and a model fit using the isdbayes package.found λ estimates closer to −1.6.If we were to use these values to guide prior selection, then the choice of reasonable prior would clearly depend on the method used to estimate λ.The simplest approach would be to assume a fixed correction between the binning methods and the truncated Pareto when setting priors based on binning methods.Unfortunately, such a fixed correction does not appear to exist(Pomeranz et al., 2024), making it difficult or impossible to use λs from binning methods to guide informative prior selection.

F
Modelled 95% credible intervals of (a) the intercept ( ) and (b) the slope ( ) of a generalized linear regression estimating the change in λ across a predictor.Vertical black lines show the true λ.Intervals either include the true λ (yellow) or not (black).For plotting, model runs are arranged from lowest to highest minimum value of each interval.

F
AUTH O R CO NTR I B UTI O N SJeff S. Wesner conceived the ideas and led the writing of the manuscript; Justin P. F. Pomeranz, James R. Junker and Vosjava Gjoni contributed critically to the drafts and gave final approval for publication.ACK N OWLED G EM ENTSThis material is based upon work supported by the National Science Foundation under Grant Nos.2106067 to JSW and 2106068 to JRJ.We thank Dr. Yuhlong Lio for statistical and mathematical advice.Regression results from (a) a two-step process where λs are first estimated with separate models and then used as the response variable in a Gaussian regression.(b) Same as (a), but with measurement error (posterior standard deviation of λ) included on the response variable.(c) A generalized linear mixed model with a truncated Pareto likelihood and weak priors.(d) A generalized linear mixed model with a truncated Pareto likelihood and strong priors.The solid black line shows the true regression slope.Dark shading shows the 50% CrI and light shading shows the 95% CrI.All models have the same underlying individual body size data.Points and error bars show the median and 95% CrI.For (a), only the median is shown, since the model does not include measurement error.