Pervasive gaps in Amazonian ecological research

RESULTS
Our assessment of ecological research in the Brazilian Amazon examined how logistics and human influence on forests explained research probability across 7,694 community ecology sites surveyed from 2010 to 2020. Across nine organism groups (benthic invertebrates, heteropterans, odonates, fishes, macrophytes, birds, woody vegetation, ants, and dung beetles), ecological research was unevenly distributed in all three major habitat types investigated, with easily identifiable research gaps (research probability < 0.1) covering 54.1% of unflooded areas locally known as terra firme (uplands, hereafter), 27.3% of aquatic habitats, and 17.3% of wetlands (Figure 1). While ecological research effort differs across organism groups, our findings highlight very consistent spatial patterns of research probability across habitat types, even among groups assessed in multiple habitat types (plants and birds; Figure 2).

Drivers of research biases
Overall, logistics and human influence factors explained 64% of the variation in research probability. Among the logistics-related factors, accessibility and distance to research facilities consistently emerged as important predictors of research probability (Figure 3), highlighting the role of logistical constraints and ease of access. Research probability increased with proximity to transportation and research facilities for all upland organisms and most representatives of wetlands and aquatic habitats. In addition, dry season length mattered for ecological research on wetland birds but contributed little for other organisms (Figure 3C). Dry season length also had a contrasting effect across habitat types, increasing research probability in uplands and aquatic habitats but decreasing it in wetlands. Although logistics influenced ecological research the most, forest degradation and land tenure also showed a modest but consistent importance across all organism groups. Both predictors affected ecological research in the same direction across organisms, with research probability slightly declining in more degraded areas and indigenous lands but increasing in protected areas (Figures 3D and 3E).

Research biases and projected environmental changes
Unfortunately, about half of the Brazilian Amazon is either already deforested (23.50%) or projected (27.29%) to be by 2050, 20 with these regions showing contrasting chances of ecological research (Figures 4E and S2). For instance, research probability was higher among areas currently deforested than in areas projected to be deforested within the next three decades (Figure 4E), corroborating the finding that research has occurred mostly in human-modified landscapes. Our findings also indicate that 15%-18% of the most scientifically neglected areas (Figure S2), herein defined as those in the first quartile of research probability, show high susceptibility to climate change by 2050 (Figures 4A and 4C) and habitat destruction (Figures 4B and 4D).

Figure caption: The central map represents the average research probability across all organism groups and habitat types. The inset maps at the bottom illustrate the research probability for different organisms in aquatic (bluish maps), wetland (greenish maps), and upland (orangish maps) habitats. In all maps, the black crosses indicate the sampling points for the period between 2010 and 2020. The donut shows the percentage of samples belonging to each biological group across different habitats (beetles and ants from wetlands were not modeled due to low sample size). Research probability was accurately predicted across all organism groups (mean Sørensen 0.89, range = 0.84-0.93). See Table S3.

DISCUSSION
We elucidate how logistics and human influence have affected ecological research coverage in Brazilian Amazonia. Our comprehensive regional synthesis improves the understanding of Amazonian ecological research and opens new avenues to redress the underrepresentation of tropical rainforests in biodiversity databases. Besides offering further support for the role of accessibility and research facilities in ecological studies, 21 our findings also highlight some of the challenges involved in expanding research to areas that have not been sampled before. 22 While increasing overall research effort is valuable for many reasons, our models show that this would mostly lead to more surveys in areas with a high probability of research, as the lack of accessibility and research infrastructure 15 promotes close distances between new and existing sampling sites. 19 Additional measures will be required to overcome these barriers and reach regions where the research probability is low. Here we explore two key challenges related to resolving spatial research gaps in the Brazilian Amazon.

Remote regions
Accessibility was a key factor in our results. One strategy to reach undersampled regions would be to fund expedition-like programs such as the Expedição Serra da Mocidade, where Amazon-based researchers from different institutions surveyed remote mountains in northern Amazonia, where no previous studies had been conducted. 23 While the Expedição Serra da Mocidade focussed mainly on finding new species and understanding species distributions, an ecological expedition would require longer periods in the field to enable the use of the standardized sampling protocols required to assess biotic changes across space and time. For instance, regions with low research probability partially overlap with those projected to experience either high or low climate change, as well as with regions facing high risk of future deforestation and degradation (Figure 4). Hence, enhancing ecological research in these remote regions could be the sole chance to unveil pieces of the Amazon's biodiversity puzzle before they succumb to human-induced modifications, while also seizing one of our prime opportunities to comprehend climate change effects without the potential anthropogenic influence. 24 Distance to research centers was also an important factor, and an alternative approach would be to fund new centers in cities that are within or close to the areas with the lowest probability of research. While this could be a more logistically demanding approach, it has four major longer-term benefits. First, it would encourage the training of local researchers and help science to endure beyond individual assessments. Second, it could build the base for more detailed scientific research, including the logistical support necessary to identify and prepare specimens and develop local collections. Third, it could enhance the capacity required to conduct the longer-term research required to understand global change. Finally, it may be a more sustainable strategy than investing in expeditions, as these are rarely repeated and risk being canceled under new governments. Although research centers have suffered under recent governments, being subjected to long-lasting neglect affecting funding, structural maintenance, and the loss of key scientific and technical staff, 25 they have also shown themselves to be resilient, and continue to lead and support long-term research across the Amazon. Whatever approach is taken, it is key that it does not undermine the current network of research and education facilities in Brazilian Amazonia or remove funding from long-term monitoring programs on biodiversity baselines and ecological changes, 24 which often take place near existing research facilities.

Indigenous lands
Our results highlighted the limited research effort in indigenous lands when compared to strictly protected and sustainable use reserves. This reflects a major ecological knowledge gap, as indigenous lands cover about 23% of the Brazilian Amazon and play a fundamental role in preserving Amazonia's biocultural diversity. 26,27 In recent years, Brazil's indigenous lands have come under increasing threat from illegal activities, such as logging, invasion by squatters, and gold mining, 28 with the latter also strongly impacting the health of riverside peoples. 29 The lack of government support has forced traditional and indigenous peoples to defend their territories on their own. 30 Indigenous lands such as the Kaiapó and Araribóia already represent some of the last areas of extensive forest in the south and eastern Amazon, and the role of indigenous lands is expected to become more pronounced under business-as-usual deforestation practices (Figure 4E).

While bureaucratic requirements and limitations in local communicability may reduce the research propensity in such areas, 19 coordinated actions among the environment ministry (a specific organization within the Brazilian government), research centers, and indigenous peoples have high potential to minimize knowledge gaps within indigenous lands. Any such knowledge co-production needs to be equitable and decolonial, and to recognize and respect the diverse knowledge systems in place. 31,32 For instance, traditional practices of indigenous people should be valued and incorporated into research methodologies, ensuring their active participation and ownership in the process. The co-creation of research with local communities will likely require additional support and training for both the scientific community and for local communities and their organizations. 31 Such a program could have many additional benefits, as enhancing the involvement of local communities could support more inclusive science as well as better resource management and livelihoods. 33

Differences between organisms and habitats
We find remarkably consistent drivers of research effort across different organisms and habitats. One noticeable exception is the disproportionate importance that a small number of local experts have in the spatial sampling of underrepresented taxa. For instance, most sampling (95%) of aquatic invertebrates (heteropterans and odonates) is distributed in eastern Amazonia (Pará state) and comes from a single research group composed of local experts based in the Amazonian city of Belém. Other aquatic invertebrate specialists have collected in different regions of Amazonia, but their focus on taxonomy means they rarely use the standardized sampling required for ecological data. Considering the distinct objectives of taxonomic research and ecological sampling, prior planning and greater collaboration will be required to achieve the potential benefits that could be accrued from integrating Amazonian ecology and taxonomy. We also demonstrate that research gaps are larger in uplands than in wetlands and aquatic habitats, which likely reflects the role that the broad network of navigable waterways has in facilitating access to wetland and aquatic areas.

Can regional curation help reduce global biases?
The metadata we based this research on describe a large number of datasets with substantial coverage across multiple organisms in the Brazilian Amazon (see the Synergize project 34 ). To date, few of these datasets are integrated into global databases. For example, the BioTIME, 35 BIOFRAG, 6 FragSAD, 7 and Predicts 5 databases collectively include only 222 datasets for the Brazilian Amazon, representing less than 3% of the Synergize effort. In contrast, more than 40% of Synergize datasets (n = 3,281) can potentially contribute to global assessments, adding 1,103 time series datasets for BioTIME, 506 datasets under the requirements of BIOFRAG/FragSAD, and 1,672 datasets on land use comparisons that meet the objectives of Predicts. Although these are upper bound estimates that will fall due to the additional requirements of specific global networks (e.g., pre-defined taxa and habitats), the differences highlight the value of carefully produced regional datasets to mitigate data biases in collaborative networks. 36 To secure the engagement of tropical research communities, it is crucial to implement an inclusive code of conduct that values data ownership in resulting products. 36–38

Conclusion
Our large-scale assessment of ecological research across the Brazilian Amazon not only highlights the extent of research gaps and biases in tropical systems, but also provides valuable insights into potential solutions to improve conservation planning for the world's most diverse rainforest. We show the importance of going beyond areas that are accessible and close to research bases, and expanding research into regions that will likely be affected by climate change or deforestation. Doing so will not be easy, and ecology alone will not help resolve the environmental crises facing the world. But understanding the responses of biodiversity and ecosystems forms a key part of keeping society informed about its impacts and supporting the implementation of evidence-based policies and practices that can help mitigate the worst outcomes.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

90 data owners to collaborate with their metadata (Table S1). Overall, 47 authors returned our contact and shared metadata on 2,597 inventories meeting our criteria.

Terrestrial animals
We focused on published studies on ants, birds, and dung beetles in Amazonia using the Web of Science platform (see Table S1 for the search strings used and search dates). Searches were conducted in English and with no restriction on year. We did not restrict the search to the period between 2010 and 2020 for these groups, as the Synergize project is developing a database that goes beyond the period considered here (TAOCA 44 ). From the initial total of 2,244 published manuscripts obtained through the platform, only 225 were ecological studies. Among these 225 studies, we contacted 99 authors, many of whom were corresponding authors of more than one article. Ninety-one percent of these researchers returned our contact and shared metadata of their studies (Table S1).

Aquatic groups
We focused on published studies of Amazon fishes using the Web of Science platform (see Table S4 for the search strings used and search dates). Since fishes are also research subjects in many areas of knowledge (such as social food sciences), we selected only articles related to biodiversity and conservation areas. For other aquatic groups (benthic invertebrates, heteropterans, odonates, and macrophytes), we focused on the period between 2010 and 2020. We divided aquatic invertebrates into benthic invertebrates, heteropterans, and odonates because most datasets on heteropterans and odonates referred to sampling of adults near streams, while for benthic invertebrates (Ephemeroptera, Plecoptera, Trichoptera, Diptera, and Coleoptera), the datasets corresponded to the sampling of larvae in streams. From the initial total of 1,073 published manuscripts obtained through the platform, only 523 were ecological studies. Among these 523 studies, we contacted 69 authors, many of whom were corresponding authors of more than one article. Eighty-one percent of these researchers returned our contact and shared metadata of their studies (Table S1).
All sampling sites reported in the ecological studies considered here had their geographic coordinates verified and cross-checked against databases of the political and environmental limits of the Brazilian Amazon. Studies were included if data collection used repeatable methods, which varied among biological groups (see details in Table S1). Organism groups were sampled along transects, grids, or plots, which varied considerably in size.
Metadata used in this study, including habitat, year of study, organism groups, and geographical coordinates (longitude and latitude), can be found in the STAR Methods (see data and code availability). Other metadata and community data can be found in the original sources, which varied between groups. For woody vegetation, data are available in three consolidated databases: ForestPlots, 36,40 Amazon Tree Diversity Network (Rainfor, 41 ATDN 42 ), and Secondary Forests Research Network (2ndFOR 43 ). For terrestrial fauna, the data are deposited at the TAOCA platform. 44 Fish data are available in AmazonFish. 45 Data on aquatic invertebrates and macrophytes are available upon request from Leandro Juen and Thaise Michelan, respectively, both from the Laboratory of Ecology and Conservation (UFPA, Brazil 46 ).

METHOD DETAILS

Predictors of research probability
To model research probability for each organism group, we used predictors related to logistics (accessibility, research facilities, and dry season length) and human influence (land tenure and degradation). We define human influence as those variables used to assess the ecological condition of a particular habitat or ecosystem. 47 We used logistics-related variables as certain areas are easier to work in due to accessibility, proximity to research centres, or location within public protected areas. 19 As the dry season length affords extended accessibility to many of the drier regions in the Brazilian Amazon each year, we deemed it a crucial logistical factor. 48 Although predictor variables change over time, we assumed that these changes in a short time window are negligible relative to the variation across space and adopted a static layer to represent Brazilian Amazonia over the ten years from which data were collected.
To evaluate model performance, we used the Sørensen similarity index to measure the similarity between predictions and observations. This index is independent of the prevalence of sampled sites (ratio of observed presences to all sites) and less sensitive to under/overprediction issues than metrics based on specificity, such as the True Skill Statistic (TSS). 72 We measured variable importance using the proportional increase in MSE, which measures the relative decrease in model accuracy when the values of a variable are shuffled. We also built partial dependence plots to represent the marginal effects of predictor values on research probability across taxa and habitats.
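The two evaluation measures described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were carried out in R): the function names, the 0.5 probability threshold, the toy data, and the stand-in `toy_model` are all assumptions made for illustration, and the shuffling routine is a simplified version of the MSE-increase importance reported by random forest implementations.

```python
import numpy as np

def sorensen_index(observed, predicted):
    """Sørensen similarity between binary vectors: 2TP / (2TP + FP + FN)."""
    obs = np.asarray(observed, dtype=bool)
    pred = np.asarray(predicted, dtype=bool)
    tp = np.sum(obs & pred)    # sites observed and predicted as sampled
    fp = np.sum(~obs & pred)   # overprediction
    fn = np.sum(obs & ~pred)   # underprediction
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else float("nan")

def proportional_mse_increase(predict, X, y, n_repeats=20, seed=0):
    """Mean proportional increase in MSE after shuffling each predictor column."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base_mse = np.mean((y - predict(X)) ** 2)
    increase = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            shuffled = X.copy()
            shuffled[:, j] = rng.permutation(shuffled[:, j])
            mse = np.mean((y - predict(shuffled)) ** 2)
            increase[j] += (mse - base_mse) / base_mse
    return increase / n_repeats

# Hypothetical observations (1 = sampled site) and modelled probabilities.
observed = np.array([1, 1, 1, 0, 0, 0, 0, 0])
probability = np.array([0.9, 0.8, 0.3, 0.6, 0.1, 0.2, 0.05, 0.0])
predicted = probability >= 0.5  # assumed threshold, for illustration only
score = sorensen_index(observed, predicted)

# Toy predictors: column 0 drives the response, column 1 is ignored by the model.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
toy_model = lambda data: 2.0 * data[:, 0]  # hypothetical stand-in for a fitted model
importance = proportional_mse_increase(toy_model, X, y)
```

True negatives never enter the Sørensen formula, so adding more unsampled sites that are correctly predicted as unsampled leaves the score unchanged, which is the sense in which the index does not depend on prevalence.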

Overlap between research and anthropogenic disturbances
We intersected research probability with susceptibility to current and future anthropogenic disturbances to identify areas with ecological knowledge most at risk. We used three indicators of susceptibility to anthropogenic disturbances that reflect continuous trends of long-term demographic growth and economic development: climate change, 73 deforestation, and degradation. 20 To indicate climate change, we computed the difference between current and future projections of climate represented through 13 bioclimatic variables (e.g., ΔBio1 = Bio1_future − Bio1_current) obtained from the Intergovernmental Panel on Climate Change (IPCC) Interactive Atlas 73 (Table S4). For each bioclimatic variable, the Δclimatic values were measured between projections for 2041–2060 and 1981–2010 under the SSP585 scenario. To provide a consensus metric of future climate change across different generalised circulation models (GCMs), we averaged Δclimatic values across all GCMs available at the IPCC Interactive Atlas (Table S4). We rescaled the Δclimatic values to the interval between −1 and 1 and passed their absolute values (|Δclimate|) through a Principal Component Analysis (PCA) to remove multicollinearity. The magnitude of climate change for each pixel was calculated as the Euclidean distance between its position in the space of the first two PCA axes and the origin, which represents a reference point of no climate change (where |Δclimate| = 0 along all axes).
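The climate change magnitude calculation can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the inputs are assumed to be already averaged across GCMs, each variable is rescaled by dividing by its maximum absolute change (the text does not specify the rescaling method), and the "no change" reference is obtained by projecting an all-zero vector into the same PCA space as the pixels. The function name and the random example data are hypothetical.

```python
import numpy as np

def climate_change_magnitude(current, future, n_axes=2):
    """Distance of each pixel from the 'no change' point on the first PCA axes.

    current, future: arrays of shape (n_pixels, n_variables).
    """
    current = np.asarray(current, dtype=float)
    future = np.asarray(future, dtype=float)
    delta = future - current                 # e.g. delta Bio1 = future - current

    # Assumed rescaling: divide by the largest absolute change per variable,
    # which maps values into [-1, 1] and keeps zero as "no change".
    max_abs = np.abs(delta).max(axis=0)
    max_abs[max_abs == 0] = 1.0              # avoid division by zero
    abs_delta = np.abs(delta / max_abs)

    # PCA through SVD of the column-centred matrix.
    mean = abs_delta.mean(axis=0)
    centered = abs_delta - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_axes].T                     # shape (n_variables, n_axes)

    pixel_scores = centered @ axes
    # Project the all-zero vector (no change in any variable) the same way.
    no_change_score = (np.zeros(abs_delta.shape[1]) - mean) @ axes
    return np.linalg.norm(pixel_scores - no_change_score, axis=1)

# Hypothetical data: 50 pixels, 13 bioclimatic variables.
rng = np.random.default_rng(1)
current = rng.normal(size=(50, 13))
future = current + rng.normal(loc=1.0, size=(50, 13))
future[0] = current[0]                       # pixel 0 experiences no change
magnitude = climate_change_magnitude(current, future)
```

Because the pixels and the reference point are centred with the same column means, the centring cancels in the difference, so a pixel with no projected change sits at distance zero (up to floating-point error) regardless of how the other pixels are distributed.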
To identify areas most threatened by deforestation and degradation, we used recently available projections indicating trends under a business-as-usual scenario for 2050. 20 The term deforestation refers to the complete removal of canopy cover, whereas degradation is the term used to describe a natural or human-induced disturbance that does not alter the land cover category assigned to a pixel. 56,57 Currently deforested areas are also shown since they may be reforested in case of land abandonment or political incentives. We prepared the spatial layers using Google Earth Engine (GEE 74 ) and carried out all statistical analyses in R 4.0.5. 75 Next, both measures of anthropogenic disturbances, (i) magnitude of climate change and (ii) combined deforestation and degradation, as well as research probability, were split into equal-sized quartiles holding 0–25%, 25–50%, 50–75%, and 75–100% of samples (pixels). We used the first quartile of research probability (0–25%) to identify the most neglected areas in ecological research, and the last quartile (75–100%) of climate change and deforestation-degradation to identify areas most susceptible to anthropogenic disturbances.
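The quartile-based overlap can be illustrated with a short Python sketch. The arrays below are synthetic and the function name is hypothetical; the sketch only shows one way to flag pixels that fall in the first quartile of research probability and the last quartile of a disturbance measure, and it is not the authors' R code.

```python
import numpy as np

def quartile_class(values):
    """Assign each pixel to quartile 1-4 using the 25th, 50th and 75th percentiles."""
    values = np.asarray(values, dtype=float)
    cuts = np.percentile(values, [25, 50, 75])
    # right=True puts values equal to a cut point in the lower class.
    return np.digitize(values, cuts, right=True) + 1

# Synthetic example: 100 pixels with research probability 0.01, 0.02, ..., 1.00
# and a disturbance layer that is a shifted, reversed copy of it.
research_probability = np.arange(1, 101) / 100
disturbance = np.roll(research_probability[::-1], 10)

neglected = quartile_class(research_probability) == 1   # lowest 25% of research probability
susceptible = quartile_class(disturbance) == 4          # highest 25% of disturbance
overlap = neglected & susceptible

# Share of the most neglected pixels that are also highly susceptible.
share_neglected_at_risk = overlap.sum() / neglected.sum()
```

In this synthetic case 15 of the 25 most neglected pixels also fall in the top disturbance quartile, a share of 0.6; with real raster layers the same masks would be computed over the pixels of each habitat map.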

ADDITIONAL RESOURCES
For terrestrial fauna, the metadata used in this study were stored in the TAOCA database. This database emerged in response to the demand for organizing, standardizing, and securely storing a large amount of data received by the Synergize project (https://www.taoca.net/).