A checklist for using Beals’ index with incomplete floristic monitoring data

Christensen et al. criticized the application of Beals’ index of sociological favourability to adjust for incomplete species lists when comparing repeated surveys. Their main argument was that using Beals’ conditional occurrence probabilities would systematically underestimate biodiversity change compared to using observed frequencies. Although this might be the case for rare species, as we explicitly stated in our original publication, we here use a worked-out example to show that this criticism is unjustified for species that are sufficiently represented in the reference data set. In our opinion, the misconception derives from ignoring one of the key requirements for applying Beals’ index, namely the use of a sufficiently large reference data set to derive a reliable co-occurrence matrix. We show how the predicted probability for the occurrence of a species depends on the size of the reference data set and give recommendations on the premises for applying Beals’ approach for monitoring purposes.


| THE CRITICISM OF CHRISTENSEN ET AL. (2021)
Christensen et al. (2021) argue that using occurrence probabilities obtained from Beals' index (Beals, 1984) to account for incomplete survey information is inappropriate for monitoring purposes, because replacing observed frequencies with Beals' co-occurrence probabilities would only flatten trends and thus lead to a systematic underestimation of vegetation change. They provide three worked-out examples to show how this might affect potentially overlooked species, newly colonizing species and local extinction of species. To compare the constructed presences/absences with Beals' probabilities, they base the M_ij co-occurrence matrix in the formula of Beals' index (see Bruelheide, Jansen, et al., 2020) on (a) two constructed records from polygons in the survey at time 1, (b) two constructed records from polygons in the survey at time 2 or (c) all four of their constructed records. They refer to the first two cases as "normal Beals' index" and to the latter as "integral Beals' index".
The latter approach uses the co-occurrence information across both surveys that are to be compared (referred to in the following as "joint co-occurrence matrix"), which was the approach used in Bruelheide, Jansen, et al. (2020). Notably, they criticize the use of a joint co-occurrence matrix across both surveys for being static in time and state that "if the co-occurrence matrix is static in time, the relative frequencies of the species must be static in time, too, and therefore there are no trends of species."

| FIVE RECOMMENDATIONS AND A WORKED-OUT EXAMPLE
We argue that these constructed cases are not relevant to the question at hand, which is how to reliably assess incomplete, yet highly informative, species surveys. We start by noting the three main requirements and limitations for applying Beals' index, as suggested by De Cáceres and Legendre (2008) based on simulations:

1. Existence of an "ecological structure" in the data, which means that the species to be analysed need to have a relation to the species co-occurrence matrix. The general assumption of Beals' index is that species interact with each other and/or respond similarly to their environment, which is reflected in their co-occurrence pattern. In particular, occurrences cannot be predicted for species whose occurrence does not covary with that of any other species. This might be the case for rare species that do not occur frequently enough in the co-occurrence matrix to derive a reliable occurrence probability. In consequence, Beals' approach will usually assign a very low occurrence probability to rare species, even in polygons in which they actually occur. Another case might be species that co-occur with many different species in many different habitats. In such cases, the occurrence probability of these species will also be very low.

2. The reliability of Beals' probabilities increases with the species richness of the target record. The more species co-occur in a polygon, the more information can be drawn from the co-occurrence matrix on the ecological conditions of that polygon. In the extreme case, the occurrence probability of the single species present in a monospecific polygon will be 1, assuming that Beals' formula includes the target species in the summation, which was the original formulation of Beals (1984) and which was also used in Bruelheide, Jansen, et al. (2020). In consequence, Beals' index is not suitable for species-poor vegetation types.
3. The reliability of the conditional probability estimates depends on the size of the reference data set used to calculate the M_ij species co-occurrence matrix. This reference data set can be obtained from a different source than the records for which predictions are to be made, as long as it encompasses the same ecological structure (see point 1). In this regard, De Cáceres and Legendre (2008) indicated that, in the absence of noise, a minimum of 40 sampling units (in our case polygons) is necessary as a reference data set to obtain accurate estimates of probabilities. In this respect, Bruelheide, Jansen, et al. (2020) were on the safe side, as their M_ij co-occurrence matrix was based on all available polygon records of both surveys across the whole federal state (53,696 polygons).
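The three requirements above can be made concrete with a small numerical sketch. The Python code below is a minimal illustration, not the implementation used in Bruelheide, Jansen, et al. (2020). It assumes that Beals' probability of species j in a record is the mean, over all species k present in that record (including j itself), of the conditional frequency M_jk / N_k, where M_jk counts joint occurrences in the reference data and N_k counts the occurrences of species k. The function name `beals`, the array layout and the toy reference data are hypothetical.

```python
import numpy as np

def beals(reference, target):
    """Beals' occurrence probabilities for a single target record.

    reference: (n_records, n_species) 0/1 array used for the co-occurrence counts.
    target:    (n_species,) 0/1 vector of the record to be evaluated.
    Assumption: the target species is included in the summation.
    """
    reference = np.asarray(reference, dtype=float)
    present = np.asarray(target) > 0
    M = reference.T @ reference            # M[j, k]: joint occurrences of j and k
    N = np.diag(M)                         # N[k]: occurrences of k in the reference
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = M / N[np.newaxis, :]        # cond[j, k]: share of k's records that contain j
    # Average the conditional frequencies over the species present in the target
    return cond[:, present].sum(axis=1) / present.sum()

# Hypothetical toy reference data: 4 records x 3 species (A, B, C)
reference = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [1, 0, 1],
                      [0, 1, 1]])

# Monospecific record containing only species A
p_mono = beals(reference, [1, 0, 0])
# Richer record containing species A and B
p_rich = beals(reference, [1, 1, 0])
print(p_mono, p_rich)
```

With the target species included in the summation, the monospecific record returns a probability of 1 for species A, which is the behaviour described under point 2.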
We here provide a worked-out example to show that these con- [...] Tables 1 and 2 and replaced the dummy species notations with species that were actually recorded in our data set (see explanations in Table 1). As a target species i, we chose Anemone nemorosa in our example. [...] Christensen et al. (2021) is that the reference data set from which species co-occurrences are calculated does not have to be the same data set for which the predictions are made.
Thus, as correctly pointed out by Christensen et al. (2021), these probabilities are uninformative for predicting either whether A. nemorosa was actually missing in Survey 1 or whether its observed presence in Survey 2 reflects a colonization event.
TABLE 1 Example data set with species lists for two polygons surveyed at time 1 and 2 (S1 and S2, respectively): (a) presence/absence (p/a) list of species, (b) Beals' probabilities based on the four records of (a) and (c) Beals' probabilities based on all records with Anemone nemorosa.

Columns: Species | (a) p/a of species: S1, S2 | (b) Beals' probabilities (4 records as reference data): S1, S2 | (c) Beals' probabilities (2657 records as reference data): S1, S2

[...] its occurrence probability would simply increase when increasing the size of the reference data set, as shown in Figure 1. However, if the newly arriving species were an accidental record without sufficient representation in the reference data set, the occurrence probability of that species would approach zero with increasing size of the reference data set. This confirms the limitation listed under point 1 above that rare or accidental species might not be predicted accurately. However, predicting the occurrence of rare species is not the aim of using Beals' index. If this were the aim, better methods exist, such as occupancy models, which also account for population dynamics, the species' observability and observer bias (Isaac et al., 2014). The main strength of using Beals' probabilistic approach is to account for the suitability of a habitat to host a target species, as measured through the occurrence of coexisting species. If this suitability is changed through, for example, human disturbance, nitrogen deposition or abandonment of traditional management practices, it will be reflected in the predictions and also in an increased probability of encountering a newly colonizing species.
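The dependence of the predicted probability on the size of the reference data set can be explored with a small simulation. The following Python sketch uses purely synthetic data and is not a reproduction of Figure 1: the two habitats, their species occurrence rates, the random seed and the helper name `beals_target` are all assumptions made for illustration. Unlike in Figure 1, the added records are drawn at random from all habitats rather than only from records containing the target species. The target records are always kept in the reference data set, so every species of the target record has at least one reference occurrence.

```python
import numpy as np

def beals_target(reference, record, j):
    """Beals' probability of species j in `record`, target species included."""
    reference = np.asarray(reference, dtype=float)
    present = np.asarray(record) > 0
    # Joint occurrences of species j with each species present in the record
    joint = reference[:, present].T @ reference[:, j]
    # Occurrences of each present species in the reference data
    counts = reference[:, present].sum(axis=0)
    return float(np.mean(joint / counts))

rng = np.random.default_rng(42)

# Synthetic pool: each record is drawn from one of two hypothetical habitats
p_forest = np.array([0.8, 0.7, 0.7, 0.6, 0.6, 0.1, 0.1, 0.1])
p_grass = np.array([0.05, 0.1, 0.1, 0.1, 0.1, 0.7, 0.7, 0.6])
is_forest = rng.random(2000) < 0.5
rates = np.where(is_forest[:, None], p_forest, p_grass)
pool = (rng.random((2000, 8)) < rates).astype(int)

# Target records: same companions; target species (index 0) missing in S1, present in S2
s1 = np.array([0, 1, 1, 1, 1, 0, 0, 0])
s2 = np.array([1, 1, 1, 1, 1, 0, 0, 0])

curve = []
for n in [0, 10, 40, 200, 2000]:
    # The two target records are always part of the reference data set
    reference = np.vstack([s1, s2, pool[:n]])
    curve.append((n, beals_target(reference, s1, 0), beals_target(reference, s2, 0)))

for n, p1, p2 in curve:
    print(f"{n:5d} extra records: S1 = {p1:.3f}, S2 = {p2:.3f}")
```

With only the two target records as reference, the estimates are determined entirely by those two records; one would expect them to fluctuate for small reference data sets and to settle as more records are added.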
Notes to Table 1: The example is taken from Bruelheide, Jansen, et al. (2020), from where we retrieved an arbitrary data record with n = 20 species, thus matching the example of Christensen et al. (2021) ([...] -1930- -066, no. 100193645, recorded 1993.780°, Lat 54.070°). We found a corresponding data set with n = 20 species, of which 10 were in common (survey S2, polygon 2). We chose one common species as target species i (i.e. Anemone nemorosa) and deleted that species from both polygons to produce the two polygon records of survey 1. Beals' probabilities in (b) were calculated based only on a co-occurrence matrix from (a), while Beals' probabilities in (c) were calculated using a total of 2657 polygon records in which A. nemorosa was present in the whole habitat mapping data set.

The final issue raised by Christensen et al. (2021) was that the extinction of a species might not be correctly reflected in Beals' occurrence probabilities. We should note that in a probabilistic approach occurrence probabilities are always higher than zero. Thus, extinctions can never be ascertained with complete certainty. These probabilities depend on the species composition in the target polygon. If all species of the first survey are present in the second one, except one target species, this species will be predicted as overlooked. This is seen in the convergence of the occurrence probabilities of A. nemorosa in survey S1 and survey S2 in Figure 1, irrespective of whether the species was present (S2) or not (S1). In contrast, if the species composition in the target polygon has changed in such a way that the structure no longer matches the co-occurrence pattern for that species, this species would be predicted to have a very low probability of occurring. Ecologically, such a low occurrence probability might represent an extinction debt (Kuussaari et al., 2009). Thus, the overall coenotic context provided by Beals' approach might give a realistic picture for the long-term perspective of that species.
Given the limited ability of Beals' index to estimate the occurrence probability of rare species, we would like to add two recommendations. [...] surveys, and thus, a common co-occurrence matrix, is a prerequisite to avoid wrong predictions due to pseudo-turnover.
This clearly goes against Christensen et al.'s criticism that using a joint co-occurrence matrix, that is, a co-occurrence matrix based on all records from both points in time, implies that communities are static and that no change is possible. Rather, the species occurrence probabilities in a polygon change only if there is a change in species composition in that polygon, which is clearly independent of the co-occurrence matrix used.
Including the target species in the summation of Beals' formula ensures that the presence or absence of the target species alone is also considered a change in the species composition of that plot and, in consequence, results in a change in occurrence probability for that species. Thus, Beals' approach ensures that changes will be detected even if the composition of the other species does not change. However, these changes in probability become smaller with increasing frequency of the target species in the reference data set, as shown by the convergence of the occurrence probabilities of A. nemorosa in S1 and S2 in each of the two polygons (Figure 1).
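The argument that a joint co-occurrence matrix does not freeze the predictions can be checked with a few lines of arithmetic. The sketch below is a hypothetical toy case, not data from either publication: the eight reference records, the species roles and the helper name `beals_from_matrix` are assumptions. The co-occurrence matrix is computed once from the pooled records and then applied unchanged to three different target records.

```python
import numpy as np

# Hypothetical joint reference data (records of both surveys pooled): 8 records x 5 species.
# Species 0 is the target; 1 and 2 are its typical companions; 3 and 4 belong to another habitat.
reference = np.array([[1, 1, 1, 0, 0],
                      [1, 1, 1, 0, 0],
                      [1, 1, 0, 0, 0],
                      [0, 1, 1, 0, 0],
                      [0, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1],
                      [0, 0, 1, 1, 1],
                      [1, 0, 0, 1, 0]], dtype=float)

M = reference.T @ reference        # joint co-occurrence counts, computed once
cond = M / np.diag(M)              # cond[j, k]: share of k's records that contain j

def beals_from_matrix(cond, record):
    """Mean conditional frequency over the species present in `record`
    (target species included in the summation)."""
    present = np.asarray(record) > 0
    return cond[:, present].mean(axis=1)

# Survey 1: companions present, target species not recorded
p_s1 = beals_from_matrix(cond, [0, 1, 1, 0, 0])[0]
# Survey 2: same companions, target species recorded
p_s2 = beals_from_matrix(cond, [1, 1, 1, 0, 0])[0]
# Composition shifted to the species of the other habitat
p_changed = beals_from_matrix(cond, [0, 0, 0, 1, 1])[0]

print(p_s1, p_s2, p_changed)   # 0.625 0.75 0.125
```

Although the same matrix is used throughout, the probability for the target species differs between the three records, because it depends on the composition of the target record and, with the target species included in the summation, on the presence of the target species itself.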

FIGURE 1 Dependence of Beals' occurrence probabilities for A. nemorosa in the four records of Table 1 on the number of records in the reference data set used to calculate the M_ij co-occurrence matrix in Beals' index. S1 and S2 refer to survey 1 and survey 2, respectively. The size of the reference data set was increased by stepwise inclusion of randomly selected polygon records from Bruelheide, Jansen, et al. (2020) in which A. nemorosa occurred. Rather than choosing plots randomly from the whole data set, we only added records in which A. nemorosa occurred (in total 2657), as these are the only records actually carrying information on the ecological structure that is relevant to A. nemorosa. Note that a reference data set without absences for a target species was chosen here to demonstrate the role of the size of the reference data set, but would of course not be suitable to reflect the ecological structure of the full data. However, the increase in occurrence probability with increasing size of the data set would be qualitatively similar when adding completely random samples from all polygon records, but the steepness of the curves would strongly depend on the frequency of the target species in the reference data set.

We would also like to point out that in most cases it is not possible to use survey-specific reference databases for calculating survey-specific co-occurrence matrices, as suggested with the "normal Beals" by Christensen et al. (2021). If a species is missing in one of the surveys and the survey-specific data are used to calculate the co-occurrence matrix, no probabilities can be obtained for this species in that survey, and thus, a comparison between the two surveys is not possible. Excluding those species that newly arrived or went extinct between two surveys would be pointless for monitoring questions.
In consequence and in contrast to a joint co-occurrence matrix, co-occurrence matrices that are derived from survey-specific reference databases cannot be static in time, as assumed by Christensen et al. (2021).

5. The target records should be part of the reference data set, and their species should be sufficiently covered in it. It is not possible to predict occurrence probabilities for species that are not included in the reference data set. This mistake was made by Christensen et al. (2021) when they calculated the "normal Beals' value" for the target species in Table 1 in survey 1 and in Table 2 in survey 2, using a reference data set that did not include the species in question. The special case of a species that is not represented in the reference data set is not defined in Beals' formula, because M_ij would then be divided by M_j = 0. Thus, the occurrence probability of 0 reported for the "normal Beals" by Christensen et al. (2021) is wrong. In contrast, it is also conceivable to choose a reference data set with a much larger spatial and ecological extent. For example, [...] used a vegetation database covering the whole of Germany with 170,039 records to predict the occurrence probabilities for 6319 dry grassland records. A large reference data set, however, will result in predicted probabilities for the target record that include all species in the reference data set, including those that have a geographic distribution range or environmental niche outside the target polygon. For this reason, the geographic and ecological extent of the reference data set should match that of the target records.
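One way to make the undefined case explicit in an implementation is to flag species without any reference occurrence instead of reporting a number for them. The Python sketch below is an illustration under assumptions, not code from either publication: the function name `beals_checked`, the toy survey-specific reference data and the choice of NaN as the flag are hypothetical, and the form of the formula is the one assumed in this sketch (mean of M_jk / N_k over the species k present in the target record).

```python
import numpy as np

def beals_checked(reference, target):
    """Beals' probabilities; NaN for species that never occur in the reference."""
    reference = np.asarray(reference, dtype=float)
    present = np.asarray(target) > 0
    M = reference.T @ reference            # joint occurrence counts
    N = np.diag(M)                         # occurrences per species in the reference
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = M / N[np.newaxis, :]
    probs = cond[:, present].sum(axis=1) / present.sum()
    # Assumption following the text: a species without any reference occurrence
    # has no defined probability, so it is flagged as NaN instead of reported as 0.
    return np.where(N > 0, probs, np.nan)

# Hypothetical survey-specific reference in which species 0 was never recorded
reference_s1 = np.array([[0, 1, 1],
                         [0, 1, 0],
                         [0, 1, 1]])
p = beals_checked(reference_s1, [0, 1, 1])
print(p)
```

Here the first entry is NaN, which signals that no comparison with another survey is possible for that species on the basis of this reference data set, while the remaining species still receive ordinary probabilities.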
We are convinced that Beals' index is a powerful tool to account for incompleteness in monitoring records if these five recommendations are carefully considered. Violating them, however, will produce spurious results such as those reported by Christensen et al. (2021).

BIOSKETCH
The authors are all members of sMon (Trend analysis of biodiversity data in Germany), a project of the German Centre for Integrated Biodiversity Research (iDiv) Halle-Jena-Leipzig. sMon aims to explore the possibilities and limitations of using heterogeneous data sets from different points in time to derive trends in the state of biodiversity in Germany. Another objective is to develop models that can make use of incomplete data.