Statistical inference methods for n‐dimensional hypervolumes: Applications to niches and functional diversity

The size and shape of niche spaces or trait spaces are often analysed using hypervolumes estimated from data. The hypervolume R package has previously supported such analyses via descriptive but not inferential statistics. This gap has limited the use of hypothesis testing and confidence intervals when comparing or analysing hypervolumes. We introduce a new version of this R package that provides nonparametric methods for resampling, building confidence intervals and testing hypotheses. These new methods can be used to reduce the bias and variance of analyses, and well as provide statistical significance for hypervolume analysis. We illustrate usage on real datasets for the climate niche of tree species and the functional diversity of penguin species. We analyse the size and overlap of the respective niche or trait spaces. These statistical inference methods improve the interpretability and robustness of hypervolume analyses.


| INTRODUC TI ON
introduced the hypervolume concept, which describes the requirements of species along multiple axes, that is their niche, or the functional diversity of an assemblage.The concept, reviewed in Blonder (2018), has seen wide use.An n-dimensional hypervolume is a shape defined within multiple continuously valued dimensions, for which a distance metric exists.Hypervolumes allow geometric interpretations of high-dimensional datasets and can be used to produce statistics for dataset shapes, volumes and overlaps.
Most of these approaches are unified by their production of descriptive statistics without considering sampling processes.This has often precluded the calculation of inferential statistics.Moving to statistical inference could increase the reliability of conclusions derived from these methods and expand the scope of hypotheses tested.Some existing algorithms do consider inference in limited ways.
In the case of Gaussian kernel density estimates, the assumptions are that the data are samples from random variables, and are independent and identically distributed (Blonder et al., 2018;Carmona et al., 2019).Other approaches have developed statistics that are robust to outliers (Brown et al., 2020;Junker et al., 2016).
Resampling has been used to determine if hypervolumes differ from null expectations (Díaz et al., 2016;Lamanna et al., 2014).However, hypothesis testing (Zhang, 2020) has been rarer.However, many of these studies were implemented using non-public code, limiting standardization and re-use by other investigators.
Here we develop statistical inference methods for the hypervolume R package (Blonder et al., 2014(Blonder et al., , 2018)).We (1) provide a set of R functions that allow calculation of confidence intervals for descriptive statistics under different sampling processes and sample sizes, estimation of bias, and hypothesis tests.We (2) show empirical analyses to illustrate usage on trait and niche datasets.We (3) describe other updates (Box 1) and provide validation simulations in the Supporting Information.

| Bootstrapping
To bootstrap hypervolumes, new hypervolumes are constructed from resampled data (Figure 1).New hypervolumes preserve all parameters used to construct the original hypervolume.The data for the new hypervolumes are sampled with replacement from the original data.
The quantiles of the point estimates obtained through the resampled hypervolumes can be used to create a confidence interval.
Bootstrapped point estimates are more accurate with larger sample sizes even if they are biased.However, at low sample sizes, the bias is not always negligible.When comparing two hypervolumes, bias due to sample size can be mitigated by downsampling one of the hypervolumes to make the sample size of two hypervolumes match.
We assume that the bias terms due to sample size are similar and cancel each other out.
Another way to resample is to introduce weights when bootstrapping a hypervolume.A weighted bootstrap applies weights to each data point.The probability of a particular point being resampled is proportional to the weight at that point.Weighted bootstraps produce even fewer unique points than regular bootstrapped samples, so weighted bootstrapping should only be performed on larger datasets.Figure S1 demonstrates the effect of applying large weights.
The function hypervolume_resample implements the resampling methods described above.Users can select a method parameter of bootstrap, weighted bootstrap, or bootstrap seq.
The bootstrap seq option performs multiple bootstraps at different resample sizes.This is used to investigate the effect of sample size on the variance and expected value of hypervolume descriptive statistics.More details about hypervolume bootstrapping can be found in Appendix S1.

| Permutation
Permuting hypervolumes is useful when comparing two different datasets.A permutation is performed by combining two datasets and shuffling the label of the data.Equivalently, the combined data are sampled without replacement, and the first N samples are given one label, and the remaining M observations are given another label.
When permuting hypervolumes, new hypervolumes are constructed for each partition of data points as depicted in Figure 2. If we assume all observations from both datasets are independent and identically distributed, then each set of permuted hypervolumes has an equal probability of being observed as the originals.
The function hypervolume_permute, takes two hypervolumes and permutes the labels of the data before splitting the data and constructing two new hypervolumes using the shuffled data.Like hypervolume_resample, this function preserves the size of data, hypervolume construction method, and parameters for each hypervolume.More details about hypervolume permutation can be found in Appendix S2.

| Overlap hypothesis test
The overlap test is a distribution test where the test statistic is an overlap statistic calculated from hypervolumes.The null hypothesis for the overlap test is that the distributions of two samples (with sizes N and M) are identical.
The first step in the test is to combine the samples.Sets of data are resampled repeatedly from the combined data by bootstrapping or permutation.Pairs of hypervolumes created from the resampled data are used to generate overlap statistics.The distribution of sample overlap statistics forms the nonparametric null distribution.The p-value of an overlap statistic is the percent of sample null statistics that are more extreme than the observed value.For example, the Jaccard similarity index between two sets is zero for

BOX 1 Additional updates to the R package
In estimate_bandwidth, the definition of the Silverman bandwidth estimator for box and Gaussian kernel density estimation has changed from 1.06 (Silverman, 1986).This change aligns with the best multivariate definition of this estimator.The original univariate estimator is also still available.Additionally, for all bandwidth estimates, an attribute method is now set so that bandwidth vectors can be recalculated using the same methods when resampling.
The package examples now use the penguins morphology data from Antarctica (Horst et al., 2022).We no longer use the iris dataset, because it was first published in the Annals of Eugenics (Fisher, 1936), and its primary proponent, the statistician R.A. Fisher, was a eugenicist (though it was collected for unrelated reasons by Edgar Anderson).

| Effects of sample size on niche volume estimates
We consider a case where we wish to determine whether one species has a larger niche volume than another.We use georeferenced occurrences of Quercus alba and Q. rubra that are transformed into a realized niche space in n = 4 dimensions (annual mean temperature, temperature seasonality, annual precipitation, precipitation seasonality) in Worldclim (Hijmans et al., 2005).There are more occurrences of Q. rubra (M = 2110) than Q. alba (M = 1669).We use

| Climate niche occupancy
We assess the climatic niche occupancy of two speciose plant genera, Acacia and Pinus, counting how many species' niches occupy each point within climate space.For each species within these genera (60 for Acacia and 63 for Pinus) we download occurrence data from the BIEN database (Maitner et al., 2018).Climatic data are obtained from Worldclim (Hijmans et al., 2005).We select three climate variables: mean annual temperature (bio1), mean annual precipitation (bio12), and precipitation of warmest quarter (bio18)/ (bio18 + precipitation of coldest quarter (bio19)).Climate layers are z-transformed prior to hypervolume construction.For each species, niches are built using the function hypervolume_gaussian with default parameters.Occupancy at genus level is obtained with the function hypervolume_n_occupancy using the box method and a mean summary statistics.The functions hypervolume_n_ occupancy_permute and hypervolume_n_occupancy_test are used to determine which part of the niche space is significantly higher in one genus than the other using a two-tailed test (999 permutations).

| Applying weights
We consider a case where climate niches are estimated using geographically biased sampling of species occurrences.In this example with the quercus dataset, observations of Q. alba to the east of 75° W longitude are sampled at half the frequency of other observations.We ask whether such biased sampling would significantly change the climate niche.We give every data point with longitude greater than 75° W a weight of 2 and every other point a weight of 1, then construct a hypervolume using the climate data at each location, following the quercus analysis above.Using the weights and the hypervolume_resample with the weighted bootstrap method, we compared the hypervolume constructed from the original data with the hypervolume constructed from the weighted bootstrap by using hypervolume_overlap_test.

| Overlap test on non-identically distributed samples
Using the above quercus data, we tested whether the climate niches of the two species share a common distribution using hy-pervolume_overlap_test and the sorensen metric.We perform a bootstrap overlap test and bootstrap 100 hypervolumes of each size from the combined Quercus data and obtain a null distribution consisting of 10,000 observations of resampled Sørensen similarity indexes.We also perform a permutation test and generate a second null distribution consisting of 1300 Sørensen similarity indexes.

| Empirical analysis-Quercus niche volume
Figure 3 shows the volume of hypervolumes constructed from the two species at different resampling sizes.Resampled volumes asymptotically approach the true volume as resample size increases, so the plots provide a method to approximate the level of convergence.However, inferences remain limited to the range of sample sizes available.Predictions about larger sample sizes can be made by extrapolation from these curves.Under such an assumption, it appears that neither the Q. rubra nor Q. alba volume estimator has converged to its true value.
We then compared the volume of hypervolumes constructed from a resample of size M = 1669 points (number of observations of Q. alba) from each dataset.We observed a 95% confidence interval of [0.147, 0.186] for Q. alba and [0.177, 0.251] for Q. rubra.Checking the overlap of 95% confidence intervals corresponds to a conservative test and yields the inference that the species' niche volumes are not significantly different.

| Empirical analysis-Climate niche occupancy
The hypervolumes of Pinus and Acacia obtained from climatic data with the function hypervolume_n_occupancy were highly overlapped into the multidimensional space (Figure 4).The low mean occupancy for both genera (Acacia = 0.06; Pinus = 0.08) shows that most of the multidimensional space within each genus is occupied by one or few species.However, the maximum occupancy value was higher in Pinus (0.54) compared to Acacia (0.37).Thus, more than  Areas of the climatic niche space with the most overlap among species-level hypervolumes differ between the two genera.
According to the permutation test, the climate niche of Pinus has a volume of 8.08 (21% of the Pinus niche) in which occupancy is significantly higher than in Acacia.Additionally, the climate niche of Acacia has a volume of 1.98 (6% of Acacia) in which occupancy is significantly higher than in Pinus.

| Weighted sampling
The result of the overlap test using the bootstrap method is shown in Figure 5.The p-value is 0.124, so we do not reject the null hypothesis that the biased sampling does not affect the niche distribution.

| Overlap statistics
In the Quercus datasets, the bootstrap overlap test resulted in a much lower p-value than the permutation overlap test (Figure 6).However; because both p-values are less than 0.05, we can reject the null hypothesis that the two species have the exact same climate niche.The upper triangular matrix shows the result of a two-tailed permutation test to detect differences between the occupancy of the two genera.Only significant differences are reported, with shading indicating the magnitude and colour indicating the direction of the differences (e.g.orange means that Pinus has a higher occupancy than Acacia).
determining whether differences in volume between hypervolumes valid when sample sizes differ.

| Limitations
Current methods in the R package are intended to compare an unknown sample distribution to another unknown distribution, that is when there is no known or suitable parametric distribution.Because this approach is nonparametric, the package does not report likelihoods or any statistics based on known distributions, for example likelihood ratio tests or AIC values.

| Comparisons with other methods
We did not make comparisons with other extant methods.The nicheRover package (Swanson et al., 2015) does provide probabilities for overlap estimates, but is limited to parametric assumptions (multivariate normal).The dynRB package provides some inferential tools (see Parkinson et al. (2018) for a theoretical treatment), but not for occupancy or multiple overlaps.Other packages, for example hyperoverlap (Brown et al., 2020) provide only point estimates, so comparison of performance for inference is not yet possible.Some overlap testing is provided by Di Cola et al. ( 2017) but only for two-dimensional datasets.Independent comparative studies of these or future methods using standardized datasets will be valuable.

| CON CLUS ION
The non-overlapping sets and one for fully overlapping sets, so under the null hypothesis, two samples from the same distribution should have a Jaccard similarity index closer to one.The one-tailed p-value of an observed Jaccard similarity index is defined as the fraction of null Jaccard similarity index values less than or equal to the observed value.F I G U R E 1 (a) The bootstrap procedure starts with an observed hypervolume.Here, data are for the functional diversity of Gentoo penguins in n = 2 dimensions.(b) The observed data are extracted from the hypervolume and sampled with replacement (resampled points shown by the black outline).(c) A new hypervolume of the same type is constructed from the resampled data.a) The permutation procedure starts with two hypervolumes.Data are for the functional diversity of Gentoo and Adelie penguins in n = 2 dimensions.(b) The observed data for m = 187 Gentoo penguins and m = 146 Adelie penguins are pooled and randomly split into two groups of size 187 and 146 corresponding to the original group sizes.(c) The permuted hypervolumes are constructed from the randomly split observations.
Resampling from the combined sample can be done with or without replacement.Creating two datasets by resampling without replacement is equivalent to performing a permutation test.When sampling with replacement, the combined data are treated as an approximation of the original distribution, and samples are bootstrapped from the combined data.Because there is an equal probability of obtaining any pair of bootstrapped hypervolumes of size N and size M, null overlap statistics are generated by overlapping every resampled hypervolume of size N with every resampled hypervolume of size M.This process is more efficient for generating null statistics, but it results in a different distribution than a permutation test.More information about when to use the permutation method versus the bootstrap method can be found in Appendix S3.The function hypervolume_overlap_test supports both methods of performing the overlap test, and returns p-values and simulated null distributions.The overlap statistics supported are Jaccard similarity index, Sørensen similarity index, fraction of points unique to the first hypervolume and fraction of points unique to the second hypervolume.Simulation results demonstrating the statistical power of the overlap test can be found in Appendix S4.2.1.4| Intersection and merging of multiple hypervolumesThe package characterizes the interior of hypervolumes by accepting or rejecting random points, and the overlap between two hypervolumes is determined by the distance between these random points as described inBlonder et al. (2018), and the occupancy (number of hypervolumes overlapping a single point) inLaini et al. (2023).Two functions, occupancy_to_intersection and hypervolume_n_occupancy, allow for reducing error rates when comparing overlap and occupancy of multiple hypervolumes.These functions avoid sequential downsampling issues that are required by repeated application of hypervolume_set to multiple pairs of hypervolumes, and which therefore substantially increase both false positive and false negative error rates for multiple comparisons.The function occupancy_to_intersection retains only the random points shared by all the hypervolumes involved in the calculation.A new hypervolume, corresponding to the intersection of all input hypervolumes, is generated from this set of points.Conversely, the function hypervolume_n_occupancy retains all the random points; for each random point, it calculates a summary statistic (absolute number, mean number) for all the hypervolumes or for groups of hypervolumes.A permutation test is provided to test the difference between pairwise combinations of hypervolumes generated with hypervolume_n_occupancy.The function hypervolume_n_occupancy_permute randomly assigns the original hypervolumes to one of the two groups under comparison for a user-specified number of times.The test is then performed with the function hypervolume_n_occupancy_test by counting the number of times the observed differences (e.g.differences in the average number of hypervolumes containing a given random point) are smaller or greater than those expected by chance.Both one-and two-tailed tests are available.Results for simulation analysis of the performance of hypervolume_n_oc-cupancy_test can be found in Appendix S5.

hypervolume_resample
with the bootstrap_seq method to investigate how the estimated volume of the two climate niches varies as a function of sample size.The observations of Q. alba and Q. rubra are downsampled to evaluate the volume of the hypervolumes, starting at sample size M = 100 in increments of 400.To create confidence intervals, we resample the data 20 times for each sample size.
Bootstrap volume measurements of (a) the Quercus alba climate niche and (b) the Q. rubra climate niche in n = 4 dimensions.Blue points indicate mean estimates and black points indicate a 95% confidence interval.Plots terminate at the observed sample size of each dataset.The plot is generated using the utility function hypervolume_funnel.
-level hypervolumes within the genus Pinus share the same climatic niche space.

|
Areas of applicationStudies that have used point estimates of hypervolume properties to draw conclusions may need revisiting.One case would include determining whether there is sufficient evidence for claims made about overlaps among hypervolumes.Another case would involve F I G U R E 4 Data are for climate niches of Acacia and Pinus in n = 3 dimensions.Each observation represents a randomly sampled point in the interior of 123 species level hypervolumes.The lower triangular matrix shows the occupancy values obtained with the hypervolume_n_occupancy function.

F
overlap and occupancy tests provide statistical significance for comparative analyses.Different methods of resampling hypervolumes support a wide range of hypothesis tests, and give estimates of asymptotic behaviour with respect to sample size.The bootstrap resampling methods allow for construction of confidence intervals and help quantify the variance introduced by the hypervolume construction process as well as from the sampling process.Furthermore the bootstrap methods provide efficient downsampling, enabling robust comparisons between datasets with different sample sizes.Weighted resampling methods also can be used to perturb observed data and test the robustness of results.These methods provide an integrated and standardized interface for statistical inference on hypervolumes.The empirical null distribution of the Sørensen similarity index between the climate niche of Quercus alba and the weighted resample of its climate niche after all observations east of longitude 75° W receive double weight.The red line indicates the observed overlap statistic (with p = 0.124).Empirical null distribution of Sørensen similarity index between two hypervolumes for (a) the bootstrap overlap test and (b) the permutation overlap test.Red lines represent the observed overlap statistic, black bars, the null distribution.The bootstrap overlap test gives a p-value of 0.0004 and permutation overlap test gives a p-value of 0.047.Here, data are for climate niches constructed for Quercus alba and Q. rubra in n = 4 dimensions.
Additional supporting information can be found online the Supporting Information section at the end of this article.

Figure S2 :
Figure S2: The combined samples from the same distribution approximate the original distribution better than either individual sample.

Figure S3 :
Figure S3: Illustration of the 'squeeze' transformations tested in the hypervolume_overlap_test simulation study to determine the power of the hypervolume overlap test under shape change.

Figure S4 :
Figure S4: Statistical power of the overlap test using the bootstrap method when 20 bootstraps of a hypervolume constructed from M = 20 samples from a unit multivariate (n = 5 dimensions) normal distribution hypervolume, and 20 bootstraps of a hypervolume constructed from M = 40 samples (shifted by various distances along one axis) are compared.

Figure S5 :
Figure S5: Statistical power of the overlap test using the bootstrap method when 20 bootstraps of a hypervolume constructed from

Figure S6 :
Figure S6: An example of an occupancy analysis for simulated data, showing (a) the expected results and (b) the hypervolume representation.

Figure S7 :
Figure S7: Results of the simulations to test the accuracy of the permutation test for occupancy.