Global and regional ecological boundaries explain abrupt spatial discontinuities in avian frugivory interactions

Species interactions can propagate disturbances across space via direct and indirect effects, potentially connecting species at a global scale. However, ecological and biogeographic boundaries may mitigate this spread by demarcating the limits of ecological networks. We tested whether large-scale ecological boundaries (ecoregions and biomes) and human disturbance gradients increase dissimilarity among plant-frugivore networks, while accounting for background spatial and elevational gradients and differences in network sampling. We assessed network dissimilarity patterns over a broad spatial scale, using 196 quantitative avian frugivory networks (encompassing 1496 plant and 1004 bird species) distributed across 67 ecoregions, 11 biomes, and 6 continents. We show that dissimilarities in species and interaction composition, but not network structure, are greater across ecoregion and biome boundaries and along different levels of human disturbance. Our findings indicate that biogeographic boundaries delineate the world’s biodiversity of interactions and likely contribute to mitigating the propagation of disturbances at large spatial scales.


Supplementary Methods
Standardizing the taxonomy Considering the variety of authors and studies in our dataset, which identified plants and birds with differing resolution, it was necessary to reduce the taxonomic uncertainty in a uniform way. For this, we extracted the frugivore and plant species lists from all networks and performed a series of filters in order to remove non-existent species names (e.g., morphospecies labels) and standardize synonymous names according to reference databases.

Frugivore species
To account for spelling errors, we checked the matching of frugivore species names in our database to those from several taxonomic sources using the Global Names Resolver (GNR) 1 . We accessed this database using the function gnr_resolve from the R package taxize 2 (Supplementary Fig. 3; step 1). This function provides a matching score and the name from any of GNR's sources that most closely matches each name in our species list. Matching is determined by a combination of checking for exact matches against the names in the data sources and fuzzy matching (of canonical forms or parts of the names) using the TaxaMatch algorithm 3 . Because we were only interested in birds, we used the function classification from the same package to retrieve the taxonomic hierarchy and remove non-avian species, using the National Center for Biotechnology Information (NCBI) 4 as the reference database (Supplementary Fig. 3; step 2). For those species classified as birds, we used the function gnr_resolve one more time using BirdLife International 5 as the reference database (Supplementary Fig. 3; step 3). We used data from the Integrated Taxonomic Information System (ITIS) 6 and the synonyms function from the taxize package 2 to obtain the synonyms of the species cross-checked with BirdLife International, as well as of those that were not found in the BirdLife database but were previously classified as birds (Supplementary Fig. 3; step 4). We did this because, while obsolete bird species names usually did not have a match in BirdLife, one of its synonyms could: e.g., the black-fronted piping-guan was not found in the BirdLife database when its former scientific name, Penelope jacutinga, was entered; however, its currently accepted scientific name, Pipile jacutinga, was found as being one of the synonyms of Penelope jacutinga, and this synonym was revealed during step 4.
We also downloaded the Handbook of the Birds of the World (HBW) and BirdLife International (version 4.0) 7 and automatically checked for matches of species names in our frugivore list with the names from the columns 'scientific name' and 'synonym' of the HBW-BirdLife spreadsheet (Supplementary Fig. 3; step 5). By doing this, we were able to retrieve all the scientific names associated with the matched name in HBW-Birdlife. We used a fuzzy matching algorithm based on the Levenshtein distance between two strings to search for other possible names on the HBW-BirdLife spreadsheet for the species without good matches in any of the GNR's sources or BirdLife International, as well as for those species that were not found in the ITIS database (Supplementary Fig. 3; step 6). On some occasions, even this fuzzy matching algorithm could not find matches for a species name, which usually occurred when the genus name was incorrect or obsolete (note that in the vast majority of cases obsolete scientific names were fixed during steps 4 and 5, but some obsolete names were not present in either the ITIS or HBW-BirdLife databases). For those species, we automatically searched for their epithet names in the columns 'scientific name' and 'synonym' of HBW-BirdLife and retrieved only those that had one single match in each column (Supplementary Fig. 3; step 7). The reason for restraining our search for those with one single match is because some epithet names are common and do not necessarily represent the same species. However, even this restriction is not a guarantee that the species with a given epithet in our list is the same species with the epithet in HBW-Birdlife, since a misspelled epithet name may coincidentally match the epithet of other species. Thus, we checked manually the taxonomy of all species corrected using this method (N = 17 species). We did this by searching for both the original species name (before the data cleaning process) and the matched name in the Avibase 8 and BirdLife 5 databases. By applying this series of filters, we were able to correct and validate the names and synonyms of 1,019 bird species. For the remaining species, we checked the taxonomy manually by inspecting the same databases as in the previous step.
Finally, we generated a list object in R 9 in which element names correspond to scientific names accepted by either BirdLife International -obtained using the gnr_resolve function from the taxize package 2 in 28/07/2020 -or HBW-BirdLife 7 , while strings within elements correspond to all their synonyms and original species names. We used this list to standardize the taxonomy of the bird species in our local networks, so that synonyms would not be treated as different species (i.e., if two species were synonyms, they were attributed the same name in the local networks). All species that were removed during the cleaning process (non-bird species and those without genus and/or species names, such as Undefined sp. and Turdus sp.) were removed from our local networks and further analyses (N = 82 species). Around 86% of frugivore species remained per network after the data cleaning. Supplementary Figure 3 shows a summary of the steps of the frugivore data cleaning.

Plant species
We checked the matching of plant species names with several taxonomic sources from the Global Names Resolver (GNR) 1 using the function gnr_resolve from the taxize package 2 (Supplementary Fig. 4; step 1), as with birds above. For those species without matches in any of GNR's sources, we applied a fuzzy matching algorithm based on the Levenshtein distance between two strings to compare these species' names with the matched names from GNR (Supplementary Fig. 4; step 2). We did this because some of the species' names without matches in our step 1 were misspelled names of plant species already included in our dataset but not found by the gnr_resolve function. After this process, we relied on the gnr_resolve function one more time to compare the list of matched names from GNR with the list from the International Plant Names Index (IPNI) 10 (Supplementary Fig. 4; step 3). The reason for using gnr_resolve twice is because we first wanted to make sure that the species had a match with at least one of the taxonomic sources from GNR (i.e., confirm that it is a scientific name) and then check whether the matched name represents a scientific name accepted by IPNI. By doing this, we were able to evaluate which species had high matching scores during our first step but not during the third, indicating that they are not internationally accepted scientific names.
We used data from the Tropicos database 11 to obtain the synonyms of the plant species that had been cross-checked with IPNI. We also relied on the iPlant Taxonomic Name Resolution Service 12 to complement the synonyms list and retrieve the most recent accepted names of the species (Supplementary Fig. 4; step 4). Using this series of filters, we were able to correct and validate the names and synonyms of 1,562 plant species. Finally, we generated a list object in R 9 in which element names correspond to accepted scientific names of species (crosschecked with the IPNI database on 15/09/2020) and strings within elements correspond to all their synonyms and original species names (before the data cleaning process). We used this list to standardize the taxonomy of the plant species in our local networks, as we did for birds.
Because our plant list contained several cases in which two (or more) accepted species shared a synonym within their elements (N = 121), we had to deal with the standardization of these names. We did this by attributing the same name for all the occurrences of the species sharing a synonym only if the shared name was already present in our dataset. For example, Cecropia digitata is one of the synonyms of C. angustifolia, C. obtusifolia and C. pachystachya (and is therefore within the elements of these three species), but C. digitata was not present in any of the networks in our dataset, such that we could maintain the names C. angustifolia, C. obtusifolia and C. pachystachya in our local networks. We did this because shared synonyms that were not present in our dataset usually represented obsolete species that are no longer accepted. Alternatively, for the cases in which the shared synonym was present in our dataset (N = 37), we attributed the same name in the local networks for all the species that shared that given name. We adopted this conservative approach because, in this case, shared synonyms were usually species that were described multiple times by different authors, or species with several subspecies and varieties (note, however, that authors rarely include this level of taxonomic information on networks). Therefore, the shared name could potentially be any of the species that possess it as one of its synonyms.
Considering the high number of species (N = 184) with a valid genus name but without a valid epithet name (as indicated by the absence of matches in our steps 1 and 3, or by the low matching scores to any of the GNR's sources), as well as unresolved species names without good matches in the IPNI database (hereafter, problematic species) in our plant species list, we added two steps to evaluate whether such problematic species could be considered as a separate species from the other species in our dataset. For example, a species without an epithet (e.g., labelled in a study as 'Miconia sp.') could still be treated as a distinct species in the analysis, provided we could be certain that it was not the same as another congeneric (Miconia) species, with or without epithet, in our dataset. Similarly, an unresolved species name that is not internationally accepted could only be considered as a distinct species in our analysis if we could disentangle it from its congeneric species in the dataset. Importantly, we did not perform these additional steps for birds because there were very few cases of birds with valid genus but invalid epithet names.
To determine whether problematic species could be treated as a distinct species for analysis, we evaluated whether the distribution of any of the congeners of problematic species in our dataset overlapped with the location of the problematic species, such that we cannot be confident that the problematic species is not simply another occurrence of one or more of its congeners already in the dataset. For this process, we used the coordinates of the networks in which each problematic species occurred and generated buffer zones (diameter = 500 km) around these network locations. Considering that the size of the buffer zones could potentially affect our results, we also conducted the analysis using buffer zone sizes of 100 km and 1000 km (note, however, that our results still hold independently of the buffer zone size used; see Supplementary Tables 9-32). We collected occurrence data for all other species in the same genus in our dataset to evaluate whether the occurrence points of any of these congeneric species overlapped with the buffer zone of the problematic species (Supplementary Fig. 4; step 5). For collecting occurrence points, we used data from the Global Biodiversity Information Facility (GBIF) 13 and applied a series of filters (for details, see the Occurrence data section below). If the occurrence points of at least one of the congeneric species overlapped with the buffer zone of a given problematic species, we assumed that this problematic species could not be considered, with confidence, as a unique species in our dataset. Conversely, if none of the occurrence points of congeneric species overlapped with the buffer zone of the problematic species, we treated this problematic species as a separate species ( Supplementary Fig. 5), provided that there were no other problematic species (without valid epithets) in the same genus from other studies in the dataset.
Alternatively, if a genus contained more than one problematic species in the same study (e.g., Miconia sp.1, Miconia sp.2), we assumed that the authors distinguished the congeners within the study. For the cases in which a problematic species occurred in a single study and was the only species belonging to that genus in our dataset, the original name of the species was maintained in the local network. However, if there were problematic species from the same genus in different studies, we needed to ascertain whether they could potentially be the same species. Our approach for dealing with this issue was to determine all the possible species that a problematic species could be in each location, and then compare the lists of possible species in each location to identify any overlap. To do this, we first generated buffer zones (as in step 5) for each network location in which these problematic species occurred and obtained occurrence data from GBIF for all known species belonging to that genus (see the Occurrence data section). We then checked whether there were congeneric species with occurrence points within the buffer zones of two (or more) problematic species belonging to the same genus (Supplementary Fig. 4; step 6). If yes, we could not consider that these problematic species were different from each other. Rather, in this case there was a chance that the problematic species were the same species whose occurrence points overlapped the buffer zones of both network locations ( Supplementary  Fig. 6). On the other hand, if there were no species whose distribution overlapped the buffer zones of both network locations, these problematic species could be considered as being distinct species in the dataset.
All species that were removed during the data cleaning process (i.e., the problematic species without a valid genus name, such as Rubiaceae sp. or Undefined sp.) were also removed from our local networks and further analyses (N = 166 species). Problematic species that could not be disentangled from resolved species or other problematic species in the dataset were named according to three distinct scenarios (for details, see the Alternative scenarios section). Around 89% percent of plant species remained per network after the data cleaning (note, however, that this percentage varies slightly depending on the scenario employed). Supplementary Figure 4 shows a summary of the steps of the plant data cleaning.

Occurrence data
We retrieved occurrence data from the Global Biodiversity Information Facility (GBIF) 13 using the function occ_search from the R package rgbif 14 . For each species, we only requested occurrence data for observations for which coordinate points were available and no geospatial issues were detected, as determined by GBIF's record interpretation. We also followed a previous study 15 and removed occurrence points with: (i) a coordinate uncertainty larger than 100 km (the size of our smallest buffer zone); (ii) those for which the collection date was before 1945, as older occurrence points are usually not properly geo-referenced 16 ; (iii) those in which the number of counts associated with the occurrence point was zero; and (iv) those in which the 'basis of record' was not an observation or a preserved specimen.
In addition, we used the function clean_coordinates from the R package CoordinateCleaner 17 and land mass and country data (with a 1:10m scale) from Natural Earth 18 to remove occurrence points for which the coordinates: (v) fell within the ocean or outside the borders of the country where they were recorded, both of which indicate data-entry errors, (vi) were located around the country capital or the centroid of the country, indicating imprecise georeferencing based on inadequate sampling site descriptions, (vii) both latitude and longitude were zero or had equal values, indicating failed geo-referencing, and (viii) were located around a biodiversity institution, suggesting that records might represent specimens that were erroneously geo-referenced to museums, herbaria or universities instead of their sampling localities 17 . After applying this series of filters, 456,582 occurrence points were retrieved for 610 plant species in our dataset. These occurrence points were used for disentangling 'problematic' species during step 5 of the plant species cleaning process ( Supplementary Fig. 5).
Because the next step required us to retrieve occurrence data for all known species belonging to a given genus, we used the function name_lookup from the R package rgbif 14 to search for all accepted species names associated with the genus name. We used the same set of filters previously described to obtain the occurrence points for each species during the step 6 of the plant species cleaning process ( Supplementary Fig. 6). In the end, 994,270 occurrence points were retrieved for 4,793 plant species.

Alternative scenarios
We used three distinct scenarios for attributing names for problematic plant species that could not be considered as unique species in our dataset. In the first scenario, we removed from the local network any problematic species whose buffer zone was overlapped by the distribution of 'resolved' congeneric species in the dataset (step 5 of the plant species cleaning process). For example, if the buffer zone of the problematic species 'Miconia sp.' was overlapped by other resolved Miconia species in the dataset, we removed the species Miconia sp. (and all of its interactions) from its local network. We adopted this strategy rather than considering that the problematic species and the resolved species that overlap its buffer zone are the same because such problematic species could potentially be any of the resolved species that overlap its buffer zone. This, in turn, made it impractical to attribute the name of the resolved species to the problematic species in cases where the buffer zone of the problematic species was overlapped by several resolved species. In addition, our first scenario considers all problematic species that could not be disentangled from each other (step 6 of the plant species cleaning process) as being the same species. For example, if two problematic species labelled as 'Coussapoa sp.' in two separate local networks could not be disentangled because there are congeneric species simultaneously overlapping the buffer zones of both network locations ( Supplementary Fig. 6), we attributed the same name to these two problematic species.
Alternatively, our second scenario treats problematic species as being unique. Therefore, a unique name was given for the problematic species whose buffer zone was overlapped by 'resolved' congeneric species in the dataset. For instance, the problematic species 'Miconia sp.' from the example above would receive a unique name in the second scenario instead of being removed from its local network. In this scenario, we also attributed unique names for problematic species that could not be disentangled from each other. For example, each of the two problematic Coussapoa species mentioned above would receive a unique name instead of sharing the same name.
Finally, the third scenario removes from the local networks all plant species that could not be considered as being unique species in the dataset and is therefore our most conservative scenario (which was used for obtaining the results presented in the main text). Because these three different scenarios could affect our response variables, we repeated the analyses using the sets of networks from all scenarios. Notably, results remained qualitatively the same independently of the scenario used in the analyses (Supplementary Tables 9-32).

Sensitivity analysis
Considering that the threshold of minimum network size used for analyses is arbitrary and has the potential to affect the reported patterns, we performed a sensitivity analysis to evaluate how sequentially removing local networks based on their size would affect the estimates (t and F values) and significance (obtained using a combination of Generalized Additive Models and Multiple Regression on distance Matrices) of our predictor variables. We did this by sequentially removing all networks below a specified threshold of size (i.e., class of network size) in our dataset, from smallest to largest ( Supplementary Fig. 22). Note, however, that although we had 60 different classes of network sizes (minimum network size = 8 species; maximum network size = 238 species), we were only able to perform the analysis up to the removal of networks with 71 (or fewer) species (which represented 183 out of the 196 local networks in our dataset). We could not remove the local networks with larger sizes (N = 13 networks) because removing any of these remaining networks would lead to our Generalized Additive Models (GAMs) having more coefficients than data.
We found that removing small networks from our dataset did not strongly affect our results; for instance, removing networks with up to 10 species (which represent 17 out of 196 local networks in our dataset) would not affect any of the reported patterns (Supplementary Figs. 23 and 24). We also highlight that even though all estimates tend to approach zero with the removal of larger networks, this is partially because beyond a certain number of network removals there are too few data points and insufficient range of the predictor value for the model to be able to detect an effect.
Importantly, of the significant effects in the full model using interaction dissimilarity as response, only biome boundaries seem to be sensitive to the sequential removal of small networks from the dataset ( Supplementary Fig. 23). However, biome boundaries explained a relatively low unique proportion of the variation in interaction dissimilarity in our full model. In fact, most of the deviance explained by biomes was shared with ecoregions ( Supplementary Fig.  12), which is likely because biomes share boundaries with ecoregions and the latter explain finer-resolution environmental differences 19 . This strong effect of ecoregion borders on interaction dissimilarity is corroborated in our sensitivity analysis: the effect of ecoregions remained significant even after the removal of networks with up to 57 species (which represented around 88% of the local networks in our dataset; Supplementary Fig. 22).
In addition, the two significant effects in our full model using network dissimilarity as response variable were very robust to the removal of small networks from the dataset ( Supplementary Fig. 24). More specifically, the effect of spatial distance remained significant even after the removal of networks with up to 22 species (which represented around 50% of our local networks), while the effect of sampling intensity was still significant after the removal of networks with up to 32 species (~ 68% of the local networks in the dataset).  Fig. 3. An overview of the steps for cleaning and standardizing the frugivore species data. Red boxes represent species that were removed from the analyses (nonavian species and species without epithet or genus names). The dashed box comprises the steps performed for the species without good matches in any of the Global Names Resolver (GNR) sources and in the BirdLife International database, or that were not found in the Integrated Taxonomic Information System (ITIS). The final list comprises elements whose names represent scientific names accepted either by the BirdLife International or by the Handbook of the Birds of the World and BirdLife International, and strings within elements comprise their synonymous and original names (before the cleaning process). For example, Pipile jacutinga (Cracidae) is the current accepted name of the black-fronted piping-guan, while its synonymous names include Penelope jacutinga and Aburria jacutinga (green box). All names (strings of synonyms) within elements (accepted names) were replaced by the element name in the local networks, such that a given species had the same name for all its occurrences in the entire database. Numbers inside boxes correspond to the steps of the frugivore data cleaning process. Silhouettes were obtained from http://phylopic.org under a Public Domain license.
Supplementary Fig. 4. Overview of the steps for cleaning and standardizing the plant species data. The red box represents species that were removed from the analyses (species without valid genus names). The dashed box comprises the steps performed for the species without good matches in any of the Global Names Resolver (GNR) sources and in the International Plant Names Index (IPNI) database (i.e., 'problematic species'). We performed two steps to determine if problematic species could be considered as being unique species in our dataset (see steps 5 and 6 of the plant species data cleaning process described in the text and visualized in Supplementary Figs. 5 and 6). The final list comprises elements whose names represent scientific names cross-checked with the IPNI database and strings within elements comprise their synonymous and original names (before the cleaning process), or elements whose names represent new names given for problematic species that can be considered as unique species in our dataset, and the strings within elements comprise their original name. For example (yellow box), Ardisia sieboldii (Primulaceae) is a scientific name accepted by IPNI, while A. formosana, Bladhia sieboldii and Tinus sieboldii represent some of its synonymous names. Meanwhile, Ardisia_GBIFresolved_net_184 is the new name given for the problematic (but unique) species Ardisia sp., as revealed by the step 5 of the plant species data cleaning process. Note that, in the former case, all names (strings) within elements were replaced by the element name in the local networks, while in the latter case strings within elements were replaced by the element name only in the network where the problematic species was observed (in this example, network 184). Numbers inside boxes correspond to the steps of the plant data cleaning process. See the Alternative scenarios section for details on how we attributed names for plant species that could not be considered as unique species in our dataset.

Supplementary Fig. 5. Graphical example (for an unresolved Beilschmiedia species) of step 5 of the plant species cleaning process. Coloured points indicate the distribution of
Beilschmiedia species already contained within our dataset, and these are compared with the occurrence location of a 'problematic species' (a species with genus name only). The distributions of both Beilschmiedia tawa (green dots) and Beilschmiedia tovarensis (blue dots) do not overlap with the buffer zone of the problematic species Beilschmiedia sp., such that Beilschmiedia sp. can be considered as a separate species in our dataset.
Supplementary Fig. 6. Graphical example of step 6 of the plant species cleaning process. In this example, there are two occurrences of species labelled 'Coussapoa sp.' in separate studies (locations 1 and 2). The distribution of Coussapoa ovalifolia (red dots) simultaneously overlaps the buffer zones of two 'problematic species' (i.e., species with genus name only) belonging to the same genus, such that these problematic species could not confidently be considered as being separate species. A distribution map like this was created for all congeneric species with occurrence data in either buffer zone. Note that C. ovalifolia is present in the potential list of Coussapoa species in both network sites (other Coussapoa species were omitted in the species lists for clarity).

Supplementary Fig. 7. Results from Pearson's correlation tests between sampling metrics.
For this analysis, we used the subset of networks sampled in Aotearoa New Zealand (N = 14). Numbers inside circles indicate P values obtained using a two-tailed Pearson correlation test. Sizes of the circles and colors are proportional to the correlation coefficient. Supplementary Fig. 8. Relationships between network metrics and sampling hours and months. We used Generalized Linear Models (with Poisson errors, fitted with quasi-likelihood to deal with overdispersion) to obtain significance values. Points represent the 196 local frugivory networks in our dataset. Solid lines represent significant relationships. Supplementary Fig. 9. Scatterplots of the relationships between our predictor variables of interest (not those used for controlling sampling effects) and species turnover (βS). a The relationship between the quantitative version (environmental dissimilarity) of ecoregion distance and species turnover; point colors indicate whether the pair of local networks belong to the same (blue) or distinct (red) biomes. b The relationship between the quantitative version (environmental dissimilarity) of biome distance and species turnover. c The relationship between local human disturbance distance and species turnover. d The relationship between spatial distance and species turnover. Note that, contrary to species interactions ( Fig. 6 in the main text and Supplementary Fig. 14c), several networks still shared species beyond the threshold distance of 2,500 km (dotted red line). e The relationship between elevation difference and species turnover.
Supplementary Fig. 10. Venn diagram showing the relative contributions (%) of our main predictor variables to explaining the variation in species turnover (βS) across networks, calculated using deviance partitioning. Spatial distance alone explained the greatest proportion (12.9%) of the variation in species turnover, followed by the shared effect of spatial distance and ecoregion boundaries. Note that, to aid visualization, we only included our predictor variables of interest (i.e., not those used for controlling sampling effects). Terms that reduce explanatory power are not shown.
Supplementary Fig. 11. The effect of large-scale ecological boundaries on the proportion of pairs of local networks sharing interactions. Avian frugivory networks located within the same ecoregion/biome were more likely to share interactions than those located across distinct ecoregions/biomes. Note that over 50% of the pairs of networks located within the same ecoregion shared interactions.
Supplementary Fig. 12. Venn diagram showing the relative contributions (%) of our main predictor variables to explaining the variation in plant-frugivore interaction dissimilarity (βWN), calculated using deviance partitioning. The shared effect of ecoregions and spatial distance explained the greatest proportion (6.41%) of the variation in interaction dissimilarity, followed by the unique contributions of these two variables. Note that, to aid visualization, we only included our predictor variables of interest (i.e., not those used for controlling sampling effects). Terms that reduce explanatory power are not shown.
Supplementary Fig. 13. Partial effects plot of the relationship between human disturbance distance and interaction dissimilarity (βWN). The smoothed line was fitted using a Generalized Additive Model (GAM) with interaction dissimilarity as response variable and all of our predictor variables included (see Table 1 in the main text). The lighter green area represents 2 standard errors above and below the estimate of the smooth being plotted.

Supplementary Fig. 14. Scatterplots of the relationships between our predictor variables of interest (except human disturbance distance, which is presented in the main text) and interaction dissimilarity (βWN). a
The relationship between the quantitative version (environmental dissimilarity) of ecoregion distance and interaction dissimilarity; point colors indicate whether the pair of networks belong to the same (blue) or distinct (red) biomes. b The relationship between the quantitative version (environmental dissimilarity) of biome distance and interaction dissimilarity. c The relationship between spatial distance and interaction dissimilarity. Note that interaction dissimilarity increases sharply until a threshold distance of 2,500 km (dotted red line), beyond which few networks shared interactions (a similar pattern can be seen in Fig. 6 in the main text). d The relationship between elevation difference and interaction dissimilarity.

Supplementary Fig. 15. Effect of individual studies on estimates of t (for ecoregion and biome) and F values (for the remaining predictor variables) of Generalized Additive Models with interaction dissimilarity (βWN) as response variable.
Points represent estimate values after removing one study from the data, while asterisks indicate the estimates when the study with the greatest number of networks (N = 35) in our dataset (study ID 76) 20 is removed from the data. The estimates of the full model (with all studies included) are represented by the vertical lines. Red lines indicate a significant effect (P < 0.05), while gray lines indicate a nonsignificant effect. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). The range of the x-axis was defined as ± 3 times the standard deviation of the estimates. Arrows indicate outliers beyond this range (black: when study 76 is removed; red: when other studies are removed).

Supplementary Fig. 16. Effect of individual studies on estimates of t (for ecoregion and biome) and F values (for the remaining predictor variables) of Generalized Additive
Models with network structural dissimilarity as response variable. Points represent estimate values after removing one study from the data, while asterisks indicate the estimates when the study with the greatest number of networks (N = 35) in our dataset (study ID 76) 20 , is removed from the data. The estimates of the full model (with all studies included) are represented by the vertical lines. Red lines indicate a significant effect (P < 0.05), while gray lines indicate a nonsignificant effect. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). The range of the x-axis was defined as ± 3 times the standard deviation of the estimates. Arrows indicate outliers beyond this range (black: when study 76 is removed; red: when other studies are removed). Supplementary Fig. 17. Plant and bird species connecting local networks, ecoregions and biomes. World map with points representing the 196 local avian frugivory networks in our dataset. As in Fig. 2 in the main text, colors of shaded areas represent the 67 ecoregions where networks were located, with similar colors indicating ecoregions that belong to the same biome. Lines represent the connections (shared species) plotted along the great circle distance between networks. Blue lines represent connections within biomes, while red lines represent connections across biomes. Stronger color tones of lines indicate higher similarity of species (1-βS) between networks. a Lines represent connections between networks sharing bird species. Pie charts depict the proportion of pairs of local networks sharing bird species across vs. within ecoregions. b Lines represent connections between networks sharing plant species. Pie charts depict the proportion of pairs of local networks sharing plant species across vs. within ecoregions. c Lines represent connections between networks sharing both plant and bird species. Pie charts depict the proportion of pairs of local networks sharing both plant and bird species across vs. within ecoregions (see Fig. 2 for the world map of shared plant-frugivore interactions). Ecoregions and biomes were defined based on the map developed by Dinerstein et al. 19 (available at https://ecoregions.appspot.com/ under a CC-BY 4.0 license). Silhouettes were obtained from http://phylopic.org under a Public Domain license. Supplementary Fig. 18. Percentage of long-distance network comparisons and connections (shared interactions) across ('distinct') and within ('same') biomes. Around 67% of the pairs of networks located >10,000 km of distance from each other (i.e., long-distance network comparisons) involved networks from distinct biomes. On the other hand, 70% of the longdistance connections (i.e., 70% of the pairs of networks that are located > 10,000 km from each other and share interactions) involved networks from the same biome. Supplementary Fig. 19. Correlations between the predictor variables used in our models. a Correlations between predictors used in our interaction rewiring analysis. b Correlations between predictors used in our model with interaction dissimilarity as the response variable (see Table 1 in the main text). Sizes of the circles and colors are proportional to the correlation coefficient. Red points indicate a significant effect (P < 0.05), while black points indicate a non-significant effect. P values were calculated using a combination of Generalized Additive Models and Multiple Regression on distance Matrices (see Methods). The left y-axis represents the threshold network size class below which networks were removed, while the right axis represents the number of networks removed at this threshold. For reference, horizontal black lines indicate the point where networks with up to 10 species (which represented 17 networks) were removed from the dataset. The estimates of the full model (with all networks included) are represented by the vertical lines, with red lines indicating a significant effect and gray lines indicating a non-significant effect. The range of the x-axis was defined as ± 4 times the standard deviation of the estimates (to allow visualization of all estimates). Note that the significant effects in the full model are robust to the removal of small networks (up to 10 species) from the dataset, even though their estimates progressively tend towards zero as larger networks are removed (similarly, P values tend to increase and fluctuate around the significance threshold as estimates approach zero). Notably, the effect that seems to be most sensitive to the removal of small networks (i.e., biome distance) explained a low unique proportion of the variation in interaction dissimilarity in our full model, as most of the deviance explained by biome boundaries was shared with ecoregions ( Supplementary Fig. 12). Red points indicate a significant effect (P < 0.05), while black points indicate a non-significant effect. P values were calculated using a combination of Generalized Additive Models and Multiple Regression on distance Matrices (see Methods). The left y-axis represents the threshold network size class below which networks were removed, while the right axis represents the number of networks removed at this threshold. For reference, horizontal black lines indicate the point where networks with up to 10 species (which represented 17 networks) were removed from the dataset. The estimates of the full model (with all networks included) are represented by the vertical lines, with red lines indicating a significant effect and gray lines indicating a non-significant effect. The range of the x-axis was defined as ± 4 times the standard deviation of the estimates (to allow visualization of all estimates). Note that the significant effects in the full model (spatial and sampling intensity distances) are very robust to the removal of small networks from the dataset, even though their estimates progressively tend towards zero as larger networks are removed.

Sampling metric Rationale Sampling intensity
Sampling intensity was calculated as the square-root of the number of interaction events divided by the square-root of the product of the number of plant and animal species in the local network 112 . Sampling intensity was included in our models because it presented a strong and positive relationship with the ratio between the number of interactions sampled in the local network and the number of known possible interactions (among that same set of species) in the region (for the subset of networks within the Aotearoa New Zealand metanetwork) (Supplementary Fig. 7).

Sampling completeness
Sampling completeness was calculated as the observed richness of links divided by the estimated richness of links in the local network 113 . We used the Chao 1 richness estimator 114 to obtain the estimated number of links in our networks. Sampling completeness was not included in our models because it did not present a significant relationship with the ratio between the number of interactions in the local network and the number of known possible interactions (among that same set of species) in the region (Supplementary Fig. 7). Thus, we considered that this metric did not provide a good representation of how complete network sampling was in terms of species interactions. Sampling hours Number of sampling hours was included in our statistical models because it presented strong and positive relationships with bird richness, plant richness and number of links in the local networks ( Supplementary Fig. 8). Sampling months Number of sampling months was included in our statistical models because it presented a strong and positive relationship with the ratio between the number of interactions in the local network and the number of known possible interactions (among that same set of species) in the region ( Supplementary Fig. 7), as well as with plant richness and number of links in the local networks (for the entire dataset) (Supplementary Fig. 8).

Sampling focus
Whether the focal organisms were birds, plants, or both. As such, this variable determines if authors used a zoocentric or a phytocentric sampling method (or a combination of the two).

Sampling coverage
Whether there were focal species ('partial coverage') or not ('total coverage').

Interaction frequency type
Whether interaction frequency was estimated by counting the number of bird visits, number of fruits consumed by the bird, number of seeds in bird droppings, or number of bird droppings with seeds.

Supplementary Table 4. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used the binary version of ecoregion and biome distance matrices. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the nonindependence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 5. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used the quantitative version (environmental dissimilarity) of ecoregion and biome distance matrices. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 6. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used the quantitative version (environmental dissimilarity) of ecoregion and biome distance matrices. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. (P < 0.05). N pairs of networks = 19,110. Here, we used the binary version of ecoregion and biome distance matrices. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the nonindependence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 9. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 500 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 10. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 500 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 11. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 100 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 12. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 100 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 13. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 100 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 14. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 1000 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 15. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 1000 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 16. Multiple predictors of species turnover (βS) on plant-frugivore networks.
Here, we used a buffer zone of 1000 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 17. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 500 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 18. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 500 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 19. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 100 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 20. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 100 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 21. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 100 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 22. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 1000 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 23. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 1000 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 24. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used a buffer zone of 1000 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 500 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 500 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 100 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 100 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.  Table 29. Multiple predictors of plant-frugivore network structural dissimilarity. Here, we used a buffer zone of 100 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 1000 km and the alternative scenario 1 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 1000 km and the alternative scenario 2 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110. Here, we used a buffer zone of 1000 km and the alternative scenario 3 (see Alternative scenarios section) during the data cleaning process. The binary versions of ecoregion and biome distance matrices were used for estimating the effects of ecoregion and biome borders on the response variable. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 19,110.

Supplementary Table 33. Multiple predictors of plant-frugivore interaction dissimilarity (βWN).
Here, we used the binary versions of ecoregion and biome distance matrices and removed the study with the greatest number of networks in our dataset (study ID 76) 20 from the data. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the nonindependence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 12,880.  Table 34. Multiple predictors of plant-frugivore network structural dissimilarity. Here, we used the binary versions of ecoregion and biome distance matrices and removed the study with the greatest number of networks in our dataset (study ID 76) 20 from the data. P values were calculated using a two-tailed statistical test that combines Generalized Additive Models (GAM) and Multiple Regression on distance Matrices (MRM). In this approach, the non-independence of distances from each local network is accounted for in the hypothesis testing by performing 1,000 permutations of the response matrix (see Methods). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 12,880.

Supplementary Table 35. Multiple predictors of interaction rewiring (βOS) on plantfrugivore networks.
Here, we show the results from a Generalized Additive Mixed-effects Model (GAMM) using network IDs as random effects (one random factor for each of the pairs across which distance is compared) to account for the non-independence of distances (see Rewiring analysis section). P values of smooth terms are associated with Wald-type tests of smooth components' equality to zero. Linear terms are used for the categorical variables (ecoregions and biomes). EDF represents the estimated degrees of freedom for each smooth term in the model. N pairs of networks = 1,314.

Supplementary Table 36. The effect of large-scale ecological boundaries on interaction rewiring (βOS).
Here, we show the results from a Generalized Additive Mixed-effects Model (GAMM) using ecoregion and biome distance metrics as predictors and network IDs as random effects (one random factor for each of the pairs across which distance is compared) to account for the non-independence of distances (see Rewiring analysis section). Note that, contrary to the full model (Supplementary Table 35), only categorical variables are included in this model (with a fixed effect for each level of the category). The effect of ecoregion boundaries is significant, likely because of their collinearity with our other predictor variables. N pairs of networks = 1,314.