Editorial: Floristic and vegetation studies in the era of big data: challenges, trends and applications

COPYRIGHT © 2023 Fois, Marcenò and Franklin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Monitoring spatial and temporal dynamics in plant species and community diversity is one of the major challenges in ecology and conservation. Long-term and accurate data from systematic survey programmes, with standardized sampling designs and rigorous protocols, are unfortunately scarce and concentrated in a few biomes, as in the case of GLORIA (alpine; Pauli et al., 2001) or Grassplot (Palaearctic grasslands; Dengler et al., 2018). Recent trends show that the implementation of traditional field and natural history museum collections is not keeping pace; that is, the number of species occurrences and the amount of habitat data seem inadequate to examine the changes occurring (Beck et al., 2014). On the other hand, the increasing availability of big datasets, often derived from modern digital technology, is a promising source of supplementary information for monitoring changes.
In plant science, species occurrence data (Robertson et al., 2014) and vegetation-plot databases (Chytrý et al., 2016; Bruelheide et al., 2019) are two of the most common and powerful tools to supplement existing research and provide new perspectives on more complex and geographically broader questions. Floristic and vegetation data were historically retrieved almost exclusively by experts, while, more recently, an increasing contribution from citizen science programmes (Nugent, 2018) and naturalist community platforms (Marcenò et al., 2021) offers an interesting opportunity to retrieve data. Online platforms can amass large and up-to-date datasets with relatively little effort, while monitoring programmes based on expert knowledge are still crucial to avoid biases in species identification and unbalanced effort on particular species and habitats. The scientific and economic value of expert data providers might, for instance, be recognized in order to re-evaluate the important role of the field botanist (Crisci et al., 2020) and to maintain high-quality and reliable data. Moreover, the formulation of more inclusive data-sharing agreements might allow the growth of stable cross-political and cross-biome collaborations, and enhance the interest and reliability of the research products (Bruelheide et al., 2018).
In addition, merging big datasets to tackle research questions can be fruitful. Utilizing trait datasets (Kattge et al., 2020) in combination with vegetation plots, for example, may help interpret trends in geographic vegetation shifts, especially in relation to climate and disturbance datasets. The addition of several satellites with increased spatial and temporal coverage and resolution, whose imagery can be linked to plot data and to individual tree and shrub species, extends both the geographic coverage and the range of ecological questions that can be addressed (e.g., Lake et al., 2022; Rocchini et al., 2022).
With the rapid development of methods for implementing, managing, and processing big data in plant science, this collection of articles aims to show the opportunities offered by the use of big data in different fields of plant science and to discuss the main gaps and future challenges for their better use.
Catarino et al. used the species occurrence data amassed in GBIF and in different online herbaria to study the species diversity of the Leguminosae family in Angola. They identified 953 taxa, of which 165 are endemic, and provided information on their biogeographical and conservation status, life form, and main traditional use. Having harmonized data on online platforms is fundamental for conducting further floristic studies in little-explored areas. Nevertheless, the authors concluded that one of the main issues of this study was the absence of recent data. Although in Angola the abandonment of field research for several years was caused by the war, the same trend occurs worldwide. Big data also offer the opportunity to predict changes in the distribution ranges of endemic flora. The high value of worldwide repositories was confirmed by Lannuzel et al., who collected data on 87,733 plant occurrences from a hundred different original datasets, which made it possible to double the number of known narrow endemic taxa and to identify 68 putative new species in New Caledonia. Despite the promising reliability of automated data filtering, a vast amount of taxonomic work by local and international taxonomists was recognized as the most powerful way to reduce data biases and loss.
Some important sources to support floristic and vegetation studies are CHELSA and WorldClim, which provide different models of past and future climate to be correlated with species occurrence data. Peyre performed species distribution models for 664 species in the Andean Páramo. The models were able to predict the extinction rate of the species pool analyzed and the areas of future gain and loss in the Páramo. Despite the interesting results obtained, the author confirms the necessity of complementing these results with fine-scale studies. Knowing the past to understand the future was the approach used by Almeida et al. to reconstruct the spatial distribution trends of the rockrose (Cistus ladanifer L.). In this case, data from 10 different available sources were used for a single, relatively abundant and widely distributed species. Such a large amount of data required a considerable effort to reduce bias, redundancy and spatial autocorrelation. The final set of 2,833 revised records was satisfactorily used to elucidate historical, present and future population retractions and expansions through species distribution modeling.
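As an illustration of what reducing redundancy and spatial clustering in compiled occurrence records can involve, the following is a minimal sketch of grid-based thinning, which keeps one record per species per grid cell. It is not the procedure used by Almeida et al.; the `Occurrence` type, the `thin_by_grid` function and the 0.5° cell size are hypothetical choices made only for this example.

```python
import math
from typing import Iterable, List, NamedTuple


class Occurrence(NamedTuple):
    """A single occurrence record (hypothetical minimal schema)."""
    species: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def thin_by_grid(records: Iterable[Occurrence], cell_deg: float = 0.5) -> List[Occurrence]:
    """Keep the first record of each species in each grid cell.

    Exact duplicates and spatially clustered records fall into the same
    cell and are collapsed to one. The cell size is an assumption and
    should be matched to the resolution of the analysis.
    """
    seen = set()
    kept = []
    for rec in records:
        # Integer cell indices; floor handles negative coordinates consistently.
        key = (
            rec.species,
            math.floor(rec.lat / cell_deg),
            math.floor(rec.lon / cell_deg),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Grid-based thinning is only one of several options; distance-based thinning or environmental filtering may be preferable depending on the sampling bias in the data, and none of them replaces expert revision of the records.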
The increasing amount of freely available data offers valuable opportunities for studies in plant ecology, conservation and environmental restoration. This Research Topic shows implications for different fields, with the main general aim of filling gaps and strengthening links between the Linnean (i.e., lacking knowledge of the total number of species), Darwinian (i.e., lacking knowledge of species evolutionary relationships), and Wallacean (i.e., lacking knowledge of species distributions) shortfalls (Diniz-Filho et al., 2013, 2023). Overall, this work supports the need to improve the quality of such a large quantity of resources, despite their current usefulness. Automated data checking can facilitate this process, although the continuous training of experts and the strengthening of collaborations between them remain crucial. We hope that this Research Topic collection will provide some background in this interesting and burgeoning field.

Author contributions
MF had the original idea for this topic and provided an outline and first draft of this introductory article. MF, CM, and SBF collaborated on the description and organization of this Research Topic. CM and SBF provided additional text. All authors revised and approved the manuscript for submission.