The α-minimum convex polygon as a relevant tool for isotopic niche statistics

The ecological (isotopic) niche refers to a surface in a two-dimensional space whose axes correspond to environmental variables reflecting the values of stable isotopes incorporated in an animal's tissues. Carbon and nitrogen stable isotope ratios (δ¹³C-δ¹⁵N) notably provide valuable information about trophic ecology, resource and habitat use, and population dynamics. Various metrics allow for the assessment of isotopic niche size and overlap. In this paper, we advocate α-minimum convex polygons (α-MCPs), which have long been used for home range estimation, as a relevant tool for assessing isotopic niche size, overlap, and characteristics. The method allows for outlier rejection while being suited to data that are not Gaussian in the bivariate isotopic (δ¹³C-δ¹⁵N) space. The proposed indicators are compared to other existing approaches and are shown to be complementary. Notably, an indicator of divergence within the niche is introduced, which allows for comparisons at low (n > 6) and differing sample sizes. The R code is made publicly available and will enable ecologists to perform isotopic niche comparison, assessment of contraction and expansion, and overlap analysis, based on various methods.


Introduction
Stable isotope analysis provides deep insight into a great variety of trophic and ecological processes. Since diet studies do not allow for a comprehensive view of food webs, complementary methods are necessary. Isotope signatures in organisms yield information on time-integrated assimilated food (Fry, 1988; Vander Zanden and Rasmussen, 1999). Notably, carbon and nitrogen stable isotope ratios in the δ-space have been successfully used to understand trophic dynamics in marine systems, to trace the pathway of organic matter of different origins through aquatic food webs (Fry and Scherr, 1984; Kaehler et al., 2000; Pinnegar and Polunin, 2000; Briand et al., 2016), to measure the impact of invasive species (O'Farrell et al., 2014) or of predators on prey (Gallagher et al., 2017), and to support the resource breadth hypothesis (Rader et al., 2017), to cite a few applications.
Regarding trophic food webs, the carbon isotope composition of living animals usually indicates the origin of the ingested organic matter, since δ¹³C increases only slightly per trophic level, by about 1-1.4 ‰ on average (DeNiro and Epstein, 1978; Wada et al., 1991; Sweeting et al., 2007a). The nitrogen isotope ratio can be used as a proxy for the trophic level of organisms, as δ¹⁵N usually increases by about 3.0-3.4 ‰ from food to consumer (Wada, 1984; Sweeting et al., 2007b). Thus, combined measurements of both isotopes can provide information on source material and trophic level, allowing for the construction of trophic relationships within the food web structure (Letourneur et al., 2013; Briand et al., 2016). Within a species, a genus, or a family, the space occupied by the individuals in the δ¹³C versus δ¹⁵N biplot is called the isotopic niche, and can be used as an indicator of the trophic diversity of the species (or genus or family) (Newsome et al., 2007). Layman et al. (2007) pioneered metrics to quantify isotopic niche structure. Ecologists routinely use their total area metric (TA), which is the area of the minimum convex polygon (MCP), or convex hull, containing all the organisms. Indeed, TA provides a useful ecological indication of the actual isotopic space occupied by a species or a community. Jackson et al. (2011) proposed Gaussian ellipses as an alternative to convex hulls, and introduced the SEA indicator, which is the area of the theoretical confidence ellipse containing 40% of a bivariate Gaussian having the same covariance matrix as the data. Both indicators have been compared empirically (Syväranta et al., 2013). An extension of the ellipse method to n-dimensional niches has been proposed by Swanson et al. (2015). More recently, Eckrich et al. (2020) introduced an alternative method based on a kernel estimate of the probability density function of the niche.
Cucherousset and Villéger (2015) have also provided complementary isotopic niche indicators (isotopic divergence, dispersion, evenness and uniqueness). These indices are multidimensional, abundance-weighted and mathematically independent of the number of organisms analyzed. Niche metrics allow for comparisons between populations (see, e.g., Andrades et al., 2019), notably through the study of overlap between occupied niches (see Botta et al., 2018), or for comparing population isotopic niches between different habitats (Letourneur et al., 2017).
In behavioral ecology, the minimum convex polygon (MCP) is one of the most widely used methods for home range estimation (i.e., estimation of the habitat extent). It may be traced back to Mohr (1947). Ecologists routinely use the 95% MCP rather than the full MCP, to rule out occasional sallies (see, e.g., Van Beest et al., 2010). Drawing on these works, we advocate the use of MCPs for the study of isotopic niches, as a basis for the complementary and relevant indicators we introduce herein. We stress, though, that the proposed indicators are devoted to isotopic niche assessment, not home range estimation. Their usefulness is shown hereafter using both simulated and experimental data from marine biology.
The functions in the statistical environment R are made publicly available at http://www.silvere-bonnabel.com/TA_alpha. They are simple to use even for researchers who are not familiar with R, and allow for reproduction of the indicators and figures of the paper.

Definition of the TA α index
Consider N points in the bivariate isotopic space, N being possibly small. The minimum convex polygon (MCP) of a set of points is obtained by connecting the outermost points so that the polygon encloses all the data; in other words, the MCP is their convex hull. Likewise, the α-MCP can be defined as the polygon of smallest area that encloses a proportion α (hence between 0 and 100%) of the data. In turn, we define the index TA α as the area of the α-MCP. TA α is thus a value representing an area, not to be confused with the α-MCP itself. Suppose α = M/N. Then the α-MCP contains exactly M points, the most exterior points having been eliminated. The proposed index TA α is computed as follows.
• Find the convex hull of the N points. Among the points that support the MCP, remove the one farthest from the barycenter of the polygon, and recompute the MCP enclosing the remaining N-1 points.
• Repeat this step until M points remain; TA α is the area of the convex hull of these M points.
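The peeling steps listed above can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not the authors' R code, and the function names (convex_hull, peel, ta_alpha) are illustrative. It makes three assumptions. First, the "barycenter of the polygon" is taken to be the area centroid of the current hull. Second, ties are broken by hull order. Third, when αN is not an integer, the area is linearly interpolated between the values obtained for ⌊αN⌋ and ⌈αN⌉ points; the interpolation scheme is not detailed in the text.

```python
import math


def convex_hull(pts):
    """Indices of the convex hull vertices, counter-clockwise (Andrew's monotone chain)."""
    idx = sorted(range(len(pts)), key=lambda i: pts[i])
    if len(idx) < 3:
        return idx

    def cross(o, a, b):
        return ((pts[a][0] - pts[o][0]) * (pts[b][1] - pts[o][1])
                - (pts[a][1] - pts[o][1]) * (pts[b][0] - pts[o][0]))

    lower, upper = [], []
    for i in idx:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) <= 0:
            lower.pop()
        lower.append(i)
    for i in reversed(idx):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) <= 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]


def area_and_centroid(poly):
    """Shoelace area and area centroid of a polygon given as a list of (x, y)."""
    n = len(poly)
    a = cx = cy = 0.0
    for k in range(n):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
        c = x0 * y1 - x1 * y0
        a += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    a *= 0.5
    if abs(a) < 1e-12:  # degenerate hull (point or segment): use the vertex mean
        return 0.0, (sum(p[0] for p in poly) / n, sum(p[1] for p in poly) / n)
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def hull_area(pts):
    """Area of the convex hull (TA when applied to all points)."""
    hull = convex_hull(pts)
    return area_and_centroid([pts[i] for i in hull])[0] if len(hull) >= 3 else 0.0


def peel(pts, m):
    """Drop the hull vertex farthest from the hull centroid until m points remain."""
    pts = list(pts)
    while len(pts) > m:
        hull = convex_hull(pts)
        _, (gx, gy) = area_and_centroid([pts[i] for i in hull])
        far = max(hull, key=lambda i: math.hypot(pts[i][0] - gx, pts[i][1] - gy))
        pts.pop(far)
    return pts


def ta_alpha(pts, alpha):
    """Area of the alpha-MCP; linear interpolation when alpha*N is not an integer (assumption)."""
    target = alpha * len(pts)
    if abs(target - round(target)) < 1e-9:
        target = round(target)
    lo, hi = math.floor(target), math.ceil(target)
    a_lo = hull_area(peel(pts, lo))
    if lo == hi:
        return a_lo
    return a_lo + (target - lo) * (hull_area(peel(pts, hi)) - a_lo)


if __name__ == "__main__":
    # 3 x 3 grid of individuals plus one outlying individual at (10, 10)
    data = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
    print(ta_alpha(data, 1.0))  # 20.0: TA, hull stretched by the outlier
    print(ta_alpha(data, 0.9))  # 4.0: the outlier is peeled off first
```

With α = 1 the sketch reduces to TA. With α = 0.9 the single outlier is the first point removed, which illustrates the outlier robustness discussed next.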

Desirable features of the TA α index
The TA α index inherits the properties of the α-MCP and generalizes the well-established TA indicator. In particular, it has the following key properties. First, the α-MCP contains the actual proportion α of the data, regardless of the underlying distribution (no assumption is made that the distribution is Gaussian). At low sample size there may be a small discrepancy, which we accommodate using interpolation. Second, the indicator is robust to outliers as long as α is not close to 1, a feature that actually prompted the introduction of the α-MCP in behavioral ecology (e.g., Van Beest et al., 2010). Third, the indicator respects the bounded support of actual ecological data, and coincides with the well-known TA indicator (Layman et al., 2007) in the extreme case α = 1. This confirms its ecological relevance, in the sense that it reflects the size of an actually occupied isotopic niche space.

The TA α /TA index
The TA index gives no indication of the distribution of the points within the convex hull. As complementary indicators, we introduce a family of indices TA α /TA that indicate the variability within the convex hull, that is, divergence (see Cucherousset and Villéger, 2015). The TA α /TA index is close to 1 if many points lie close to the border of the convex hull, whereas it is low if most points are concentrated within a core area. The value of α may be freely set by the user. The value 2/3 is illustrative because it corresponds to a simple proportion, but any other value is possible, as long as it is not too close to 0 or 1, in which case the index may become meaningless. For example, TA 2/3 /TA = 0.3 means that a core group containing two thirds of the population occupies a region as small as 30% of the total area. This may prove useful, for instance, to study the effect of the introduction of non-native species with extreme trophic positions (Cucherousset et al., 2012).
Fig. 1. Illustration of the 100% MCP (light grey) and the 66% MCP (dark grey). Data randomly generated with a Gaussian distribution (the right plot is obtained by contracting and dilating the left plot). The index TA 1 is the area of the total convex hull, whereas TA 0.66 is the area of the minimum convex polygon containing the 66% most central points, i.e., 33 out of 50 points.
The proposed TA α /TA index has two desirable features. First, it is mathematically independent of the convex hull area, a feature it shares with the isotopic divergence ratio of Cucherousset and Villéger (2015), but not with previous metrics such as the mean nearest neighbor distance (Layman et al., 2007). This property is illustrated in Fig. 1, where the two plots display the same points, up to a (different) scaling factor along each axis. As scalings affect areas in the same proportion, the TA α /TA ratio is identical in both plots. In this sense, the indicator is independent of the total occupied area and only depends on the dispersion within the convex hull. Second, the proposed indicator is only weakly sensitive to the sample size N, which allows for comparisons between samples of different sizes. This point is further developed in the next subsection.
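The ratio TA α /TA discussed above can be computed by reusing the peeling procedure. The self-contained sketch below is hypothetical; its function names are illustrative and are not the authors' R functions. For simplicity it keeps round(αN) points instead of interpolating, and it assumes the barycenter is the area centroid of the hull. It checks that doubling both coordinates leaves the ratio unchanged: under a uniform rescaling every distance is multiplied by the same factor, so the peeling order is preserved. Under the Euclidean farthest-point rule assumed in this sketch, unequal scalings of the two axes may change which points are peeled, so that case is not asserted here.

```python
import math
import random


def convex_hull(pts):
    """Indices of the convex hull vertices, counter-clockwise (monotone chain)."""
    idx = sorted(range(len(pts)), key=lambda i: pts[i])
    if len(idx) < 3:
        return idx

    def cross(o, a, b):
        return ((pts[a][0] - pts[o][0]) * (pts[b][1] - pts[o][1])
                - (pts[a][1] - pts[o][1]) * (pts[b][0] - pts[o][0]))

    lower, upper = [], []
    for i in idx:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) <= 0:
            lower.pop()
        lower.append(i)
    for i in reversed(idx):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) <= 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]


def area_and_centroid(poly):
    """Shoelace area and area centroid of a polygon."""
    n = len(poly)
    a = cx = cy = 0.0
    for k in range(n):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
        c = x0 * y1 - x1 * y0
        a += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    a *= 0.5
    if abs(a) < 1e-12:  # degenerate hull: fall back to the vertex mean
        return 0.0, (sum(p[0] for p in poly) / n, sum(p[1] for p in poly) / n)
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def hull_area(pts):
    hull = convex_hull(pts)
    return area_and_centroid([pts[i] for i in hull])[0] if len(hull) >= 3 else 0.0


def peel(pts, m):
    """Drop the hull vertex farthest from the hull centroid until m points remain."""
    pts = list(pts)
    while len(pts) > m:
        hull = convex_hull(pts)
        _, (gx, gy) = area_and_centroid([pts[i] for i in hull])
        pts.pop(max(hull, key=lambda i: math.hypot(pts[i][0] - gx, pts[i][1] - gy)))
    return pts


def ta_ratio(pts, alpha=2 / 3):
    """TA_alpha / TA, keeping round(alpha * N) points (simplifying assumption)."""
    return hull_area(peel(pts, round(alpha * len(pts)))) / hull_area(pts)


if __name__ == "__main__":
    rng = random.Random(1)
    # synthetic delta13C / delta15N values, for illustration only
    sample = [(rng.gauss(-17.0, 0.6), rng.gauss(7.0, 0.4)) for _ in range(30)]
    print(ta_ratio(sample), ta_ratio([(2 * x, 2 * y) for x, y in sample]))
```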

The corrected (TA α /TA) c index
Albeit weakly sensitive to sample size, the ratio TA α /TA drops at very low sample size, typically N ≤ 10. To allow for isotopic niche comparison, we draw inspiration from the SEA c methodology (Jackson et al., 2011) and correct the ratio at low sample size. To do so, we use a Monte-Carlo simulation to estimate the average value of the ratio for a bivariate Gaussian distribution with sample size N,
$$\beta = \mathbb{E}\left(\frac{\mathrm{TA}_\alpha}{\mathrm{TA}}\right),$$
and correct TA α /TA using β to make it unbiased with respect to sample size at low N. The corrected ratio (TA α /TA) c is displayed in Fig. 2. By contrast, SEA/TA represents the area of the 66% ellipse divided by TA, as used in Letourneur et al. (2017). Both indicators converge to the same value asymptotically, but SEA/TA is biased at sample sizes that may be of interest, and thus may not serve to compare populations of different sizes.
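The Monte-Carlo estimate of β can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. The text describes the correction only qualitatively, so the rule used here is an assumption. The observed ratio is multiplied by β(N_ref)/β(N) for a large reference size N_ref, which brings small samples to the level expected at large N. The other assumptions are the names beta and corrected_ratio, the defaults N_ref = 100 and 200 trials, and the use of an isotropic standard Gaussian. As before, round(αN) points are kept and the barycenter is the area centroid of the hull.

```python
import math
import random


def convex_hull(pts):
    """Indices of the convex hull vertices, counter-clockwise (monotone chain)."""
    idx = sorted(range(len(pts)), key=lambda i: pts[i])
    if len(idx) < 3:
        return idx

    def cross(o, a, b):
        return ((pts[a][0] - pts[o][0]) * (pts[b][1] - pts[o][1])
                - (pts[a][1] - pts[o][1]) * (pts[b][0] - pts[o][0]))

    lower, upper = [], []
    for i in idx:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) <= 0:
            lower.pop()
        lower.append(i)
    for i in reversed(idx):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) <= 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]


def area_and_centroid(poly):
    """Shoelace area and area centroid of a polygon."""
    n = len(poly)
    a = cx = cy = 0.0
    for k in range(n):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
        c = x0 * y1 - x1 * y0
        a += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    a *= 0.5
    if abs(a) < 1e-12:  # degenerate hull: fall back to the vertex mean
        return 0.0, (sum(p[0] for p in poly) / n, sum(p[1] for p in poly) / n)
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def hull_area(pts):
    hull = convex_hull(pts)
    return area_and_centroid([pts[i] for i in hull])[0] if len(hull) >= 3 else 0.0


def peel(pts, m):
    """Drop the hull vertex farthest from the hull centroid until m points remain."""
    pts = list(pts)
    while len(pts) > m:
        hull = convex_hull(pts)
        _, (gx, gy) = area_and_centroid([pts[i] for i in hull])
        pts.pop(max(hull, key=lambda i: math.hypot(pts[i][0] - gx, pts[i][1] - gy)))
    return pts


def ta_ratio(pts, alpha=2 / 3):
    """TA_alpha / TA, keeping round(alpha * N) points (simplification)."""
    return hull_area(peel(pts, round(alpha * len(pts)))) / hull_area(pts)


def beta(n, alpha=2 / 3, trials=200, seed=0):
    """Monte-Carlo estimate of E(TA_alpha / TA) for n standard bivariate Gaussian points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        total += ta_ratio(sample, alpha)
    return total / trials


def corrected_ratio(pts, alpha=2 / 3, n_ref=100, trials=200):
    """Hypothetical correction: rescale the observed ratio by beta(n_ref) / beta(N)."""
    factor = beta(n_ref, alpha, trials) / beta(len(pts), alpha, trials)
    return ta_ratio(pts, alpha) * factor


if __name__ == "__main__":
    for n in (8, 15, 50):
        print(n, round(beta(n, trials=100), 3))
```

Because beta uses a fixed seed, the correction factor equals 1 when N = N_ref, so corrected and raw ratios coincide for large samples by construction.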

The α-MCP overlap
When considering two groups, it may be interesting to compute two quantities. The first is the region actually shared by a proportion α of the members of each group, that is, the area of the intersection of the two α-MCPs. The second is the convex hull of both α-MCPs. The former indicates overlap, whereas the latter represents a potential isotopic space. For instance, in some ecological contexts, the overlap area might represent either potential competition for, or sharing of, food resources between groups. The whole α-MCP area might represent the entire isotopic space potentially available to the group of interest (Fig. 3). Using a value of α less than 1 allows for outlier rejection, even for α = 0.95 (Fig. 3).
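Both overlap quantities reduce to standard polygon operations once the two α-MCPs are known. The minimal sketch below takes as input the points retained in each α-MCP, for instance after peeling. It intersects the two convex hulls with Sutherland-Hodgman clipping, which is exact for convex polygons, and measures the convex hull of their union. The technique choice and the function names (mcp_overlap, clip) are assumptions made for illustration, not the paper's R code.

```python
def convex_hull(pts):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    p = sorted(set(pts))
    if len(p) < 3:
        return p

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def area(poly):
    """Shoelace area of a simple polygon (0 for fewer than 3 vertices)."""
    n = len(poly)
    return abs(sum(poly[k][0] * poly[(k + 1) % n][1] - poly[(k + 1) % n][0] * poly[k][1]
                   for k in range(n))) / 2.0


def clip(subject, clipper):
    """Sutherland-Hodgman: part of polygon `subject` inside convex CCW polygon `clipper`."""
    def side(a, b, p):  # >= 0 when p lies left of (or on) the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    out = list(subject)
    for k in range(len(clipper)):
        a, b = clipper[k], clipper[(k + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            sp, sq = side(a, b, p), side(a, b, q)
            if (sp >= 0) != (sq >= 0):  # edge p -> q crosses the clipping line
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if sq >= 0:
                out.append(q)
    return out


def mcp_overlap(group_a, group_b):
    """(intersection area of both MCPs, area of the convex hull of their union)."""
    hull_a, hull_b = convex_hull(group_a), convex_hull(group_b)
    inter = area(clip(hull_a, hull_b)) if len(hull_a) >= 3 and len(hull_b) >= 3 else 0.0
    joint = area(convex_hull(hull_a + hull_b))
    return inter, joint


if __name__ == "__main__":
    a = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
    b = [(1, 1), (3, 1), (3, 3), (1, 3)]
    print(mcp_overlap(a, b))  # (1.0, 8.0)
```

Dividing the intersection area by each group's α-MCP area, or by the joint area, gives relative overlap figures if a normalized index is preferred.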

Comparisons and discussion
In this section, we first briefly review the state-of-the-art ellipse methods (Jackson et al., 2011) as well as the recently introduced kernel-based approach (Eckrich et al., 2020). In the subsections hereafter, we then illustrate the relevance of the tools introduced above using ecological data collected on Pacific coral reef fish. We also show how these tools compare to, and complement, the ellipse and kernel indicators.

The standard ellipse
The standard ellipse method consists of assuming the distribution of the points to be Gaussian, computing the empirical covariance matrix Σ, and applying a correction factor for unbiasedness with respect to sample size for bivariate Gaussians, namely $\Sigma \leftarrow \frac{N}{N-2}\,\Sigma$. Finally, the corresponding 40% confidence ellipse and its area are computed. This yields the SEA c indicator; omitting the correction factor N/(N-2) recovers the SEA indicator.
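The standard ellipse area has a closed form. The ellipse whose semi-axes are the square roots of the eigenvalues of Σ has area π√det Σ, and for a bivariate Gaussian it contains 1 - e^(-1/2) ≈ 39% of the mass, hence the "40%" ellipse. Multiplying Σ by k = N/(N-2) multiplies det Σ by k², so the area is multiplied by k. The minimal sketch below assumes that the empirical covariance uses denominator N, which is the convention for which the factor above applies as stated. The function names are illustrative and do not reproduce any R package's API.

```python
import math


def covariance(pts):
    """Empirical 2x2 covariance (sxx, syy, sxy) with denominator N (assumed convention)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    return sxx, syy, sxy


def sea(pts, corrected=False):
    """Standard ellipse area pi*sqrt(det Sigma); corrected=True applies Sigma <- N/(N-2) Sigma."""
    n = len(pts)
    sxx, syy, sxy = covariance(pts)
    if corrected:
        k = n / (n - 2)
        sxx, syy, sxy = k * sxx, k * syy, k * sxy
    return math.pi * math.sqrt(sxx * syy - sxy ** 2)


if __name__ == "__main__":
    square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    print(sea(square), sea(square, corrected=True))  # pi, then 2*pi (N/(N-2) = 2)
```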
Ellipse methods rely on an underlying Gaussian assumption, whereas indicators based on MCPs are non-parametric. Moreover, the confidence ellipse that theoretically contains a proportion α of the distribution is a pure dilation or contraction of the 40% ellipse, by a factor independent of the actual data. From an ecological viewpoint this may be unsatisfactory, especially for non-Gaussian underlying distributions. It also leads to erroneous values for large α, since the size of the ellipse tends to infinity as α approaches 1.
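The dilation argument can be made explicit with a standard result for the bivariate Gaussian, given here as a worked derivation added for illustration, with μ the mean and Σ the covariance matrix. The squared Mahalanobis distance follows a χ² distribution with 2 degrees of freedom. The ellipse containing a proportion α is therefore the standard ellipse dilated by a factor that depends on α alone:

```latex
\begin{aligned}
D^2 &= (x-\mu)^\top \Sigma^{-1} (x-\mu) \sim \chi^2_2,
\qquad \Pr\left(D^2 \le q\right) = 1 - e^{-q/2},\\
q_\alpha &= -2\ln(1-\alpha),
\qquad A_\alpha = \pi\, q_\alpha \sqrt{\det \Sigma},\\
\mathrm{SEA} &= A_{1-e^{-1/2}} = \pi \sqrt{\det \Sigma},
\qquad 1 - e^{-1/2} \approx 0.393,\\
\frac{A_\alpha}{\mathrm{SEA}} &= -2\ln(1-\alpha) \approx 2.16 \ (\alpha = 0.66),
\quad 5.99 \ (\alpha = 0.95),
\quad \to \infty \ (\alpha \to 1).
\end{aligned}
```

The ratio A_α/SEA involves no data at all, which is why the α-ellipse cannot adapt to the actual shape of the sample.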
The SEA method was shown to suffer from uncertainty at sample sizes below 30 individuals, and caution is advised when populations are skewed (Syväranta et al., 2013).

Kernel-based methods
The idea of kernel-based estimation is to use a kernel to transform a sample into a continuous probability density function. In turn, this allows for the calculation of the area enclosed by the contour (isopleth) containing a proportion α of the data, e.g., 95%. Kernel-based methods relax the convexity assumption for the contour and thus reflect the actual area occupied by the data more closely (see Eckrich et al., 2020). This offers versatility with respect to underlying distributions, which is particularly desirable for home range studies (Getz et al., 2007). However, kernel-based methods are not available at low sample size, as the rKIN package does not apply to samples containing fewer than 10 individuals. Fig. 4 displays the 0.40, 0.66 and 0.95 MCPs and confidence ellipses for two populations of coral reef fishes; these values of α were set arbitrarily, and other values may be chosen. The two populations are Pomacentrus adelus, sampled in the SW lagoon of New Caledonia (Briand et al., 2016), and P. coelestis, sampled in the Marquesas Islands, French Polynesia (unpublished data from Fey and Letourneur, in press). The figure shows how the underlying true statistical distribution may differ from the bivariate Gaussian, as both distributions look skewed. This matters because the ellipse of Jackson et al. (2011) strongly relies on a Gaussian assumption. In both cases, the 95% ellipses clearly appear too large. By contrast, the α-MCPs …
Fig. 4. … Caledonia (Briand et al., 2016) and P. coelestis (N = 40) from Marquesas Islands (Fey et al. in prep). The large dot in the center is the barycenter of the ellipses. The 40% ellipse captures only 33% of the population on the left plot, and as much as 50% on the right plot, versus 40% and 42%, respectively, for the TA α.
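Coverage percentages such as those quoted in the Fig. 4 caption can be checked empirically by counting the sampled individuals that fall inside an ellipse. The sketch below is a hypothetical helper, not the code used for Fig. 4. It counts the points whose squared Mahalanobis distance to the mean is at most `scale`; scale = 1 gives the standard ("40%") ellipse. It assumes that the covariance is computed with denominator N; the convention used for the figure is not stated. Passing scale = N/(N-2) gives the coverage of the SEA c ellipse under the same convention, because multiplying Σ by N/(N-2) divides every squared Mahalanobis distance by that factor.

```python
def ellipse_coverage(pts, scale=1.0):
    """Fraction of points with squared Mahalanobis distance <= scale.

    The covariance uses denominator N (assumed convention); scale=1 is the
    standard ellipse.
    """
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    det = sxx * syy - sxy ** 2
    inside = 0
    for x, y in pts:
        dx, dy = x - mx, y - my
        # closed-form inverse of the 2x2 covariance matrix
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 <= scale:
            inside += 1
    return inside / n


if __name__ == "__main__":
    pts = [(-1, -1), (1, -1), (1, 1), (-1, 1), (0, 0)]
    print(ellipse_coverage(pts))  # 0.2: only the central point is inside
```

An α-MCP needs no such check, since by construction it encloses the αN retained points (boundary points included), whatever the shape of the distribution.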

The TA α /TA as a way to measure discrepancies
Since TA α /TA allows for comparing populations of different sizes and areas (Fig. 2), we use this novel indicator as follows. In a recent article, Letourneur et al. (2017) investigated the isotopic niche sizes of four coral reef fish species living in two different habitats (degraded versus healthy coral reefs) in New Caledonia (see their Methods section for details on sample collection). The indicator SEA c /TA was used therein to measure discrepancies between communities living on both types of reef. Among all indicators, TA α /TA, or its corrected version, reveals the largest discrepancies between degraded and healthy reefs (Table 1), and as such better discriminates the statistical distributions. For instance, according to the SEA c /TA indicator, the variability for Chaetodon lunulatus is identical on healthy and degraded reefs. According to the proposed TA α /TA or its corrected version, with α = 0.66, it is 36% lower on the degraded reef (Table 1). This suggests that TA α /TA might be more accurate, as the result is more consistent with the ecological requirements of that species, an obligate corallivore (feeding on live coral). It is thus not surprising, and even expected, to obtain a lower TA α /TA ratio (or its corrected version) on the degraded reef, as the space occupied by most of the individuals tends to shrink. However, this expected result is not obtained with a hybrid ratio built instead on a kernel-based α-isopleth (Table 1). This is logical, as the kernel isopleth is less sensitive to sample size than TA (and TA α), making the ratio Kernel/TA more sensitive to sample size than TA α /TA (see Fig. 2).
Results for Zebrasoma velifer are somewhat counterintuitive. Indeed, the area occupied shrinks on the degraded reef, although this herbivorous species may be favored by reef degradation. The fact that SEA c /TA is much higher than TA 0.40 /TA (Table 1) is, however, partly an artefact due to SEA c being corrected at low sample size while TA is not. The corrected TA α /TA index reflects a more concentrated niche on the degraded reef, as the space occupied by the most central members is proportionally smaller. Since this index is unbiased at low sample size, and hence more reliable, this result suggests that more complex ecological processes might be involved, or that the small sample collected may simply not be representative.

Overlap indicators
Let us compare the proposed overlap indicators based on α-MCPs with the kernel-based approach (Eckrich et al., 2020). Fig. 5 displays three coral reef fish species with different sample sizes, collected in the SW lagoon of New Caledonia: Gymnothorax chilospilus, Cirrhilabrus bathyphilus and Pomacentrus adelus (Briand et al., 2016). The MCP has the advantage of leaving some potential outliers out of the estimated niche (see the two rightmost points), which kernel-based methods do not necessarily achieve. Indeed, the latter do not enforce convexity, which allows them to accommodate a large range of statistical distributions. However, in the context of marine ecosystems, we believe convexity makes sense, as all values between the observed extremes are most likely biologically relevant. Each point in the biplot results from a months-long, time-integrated diet whose isotope quantities evolve continuously. It is thus not relevant to exclude values lying in the biplot between two members, so the whole convex hull should be part of the area computation. For instance, in the case of C. bathyphilus, isotopic values ranged from 6.64 to 7.71 ‰ for δ¹⁵N and from −18.06 to −16.48 ‰ for δ¹³C (Fig. 5, green dots). It seems highly plausible that other unsampled individuals might have C and/or N isotopic values within these ranges, whereas there is no indication that values beyond these ranges are achievable by this group.
Table 1. Comparison of indicators for four coral reef fish species over healthy and degraded reefs in the SW New Caledonian lagoon (see Letourneur et al., 2017). N represents the number of individuals. TA α /TA, as an indicator of the shape of the distribution, reveals larger discrepancies between both types of habitat than the other indicators. Moreover, its independence with respect to sample size makes it suited for comparisons.
… that the kernel-based approach cannot take into account the latter species because n < 10.
To conclude, two major points should be noted. First, we have shown that the proposed TA α /TA index, or its corrected version, does not suffer from some caveats of the other indicators, notably at low sample size.
Moreover, it may be used across the entire range of α values, in contrast with the fixed choice of α = 0.40 for classical ellipses (Jackson et al., 2011). Second, from an ecological point of view, several examples of ecological interest showed that the proposed TA α /TA index, albeit simple and easily understood, yields results that are more interpretable in light of the known biological requirements of the studied species.