Data associated with ecological niche models and post-ENM statistical analyses for Trillium species distributions

Miller, Chelsea, University of Georgia, https://orcid.org/0000-0002-8214-1565

cnm51003@uga.edu

Published May 25, 2021 on Dryad. https://doi.org/10.5061/dryad.6m905qg03

Cite this dataset

Miller, Chelsea (2021). Data associated with ecological niche models and post-ENM statistical analyses for Trillium species distributions [Dataset]. Dryad. https://doi.org/10.5061/dryad.6m905qg03

Abstract

This dataset consists of 1) occurrence data for 21 species of Trillium native to eastern North America, collected between 1900 and 2018 ("Trillium_Occurrences"); and 2) ecological niche model values and species reproductive life history traits used in post-ENM analyses ("Trillium_LifeHistoryTraits"). Occurrence datasets were compiled by searching several publicly available online databases, including the Global Biodiversity Information Facility, the SouthEast Regional Network of Expertise and Collections, Tropicos, and online regional herbaria databases, such as the University of Tennessee Herbarium. Approximately half of all records consisted of descriptive localities without coordinates; these were assigned latitude/longitude coordinates using GEOLocate software (Rios et al., 2010). A centroid of uncertainty with an area of 3 km² was automatically assigned to each locality by GEOLocate, and minimum uncertainties were adjusted manually based on specificity of record descriptions. Descriptive localities were georeferenced by one of three researchers, and all final coordinates and uncertainties were checked and confirmed by the first author. Occurrences are provided as latitude and longitude in decimal degrees, and were used as training and testing data for ecological niche models, implemented using Maxent 3.4.1 (Phillips et al., 2006). For the second dataset, all final ecological niche model values were obtained or calculated based on the final ENMs produced for our study (see manuscript Methods for more details). The NatureServe conservation status of each species was obtained from NatureServe Explorer (2020). Reproductive life history traits were obtained from Ohara (1989). These data were used to assess, using beta regression models, whether reproductive life history traits were significant predictors of the proportional occupancy of the predicted distribution (PO).

Methods

Occurrence datasets were generated using the following protocol (excerpt from manuscript): we obtained every publicly available presence record for each Trillium species in eastern North America (ENA), with records dating back to 1900. The databases searched included the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/, accessed July 28 – Aug. 1, 2018), the SouthEast Regional Network of Expertise and Collections (SERNEC; http://sernecportal.org/portal/, accessed Aug. 1, 26, 30, and Sept. 1 – 4, 2018; July 12, 2019), Tropicos (https://www.tropicos.org/, accessed March 28, 2019), and online regional herbaria databases, such as the University of Tennessee Herbarium (TENN; https://herbarium.utk.edu/, accessed Aug. 10 – 15, 2018) and the Arnold Arboretum of Harvard University (https://www.arboretum.harvard.edu/, accessed Aug. 17, 2018). Approximately half of all records we obtained consisted of descriptive localities without latitude/longitude coordinates. To assign geographic coordinates to these localities, we used the GEOLocate software (Rios et al., 2010; https://www.geo-locate.org/, accessed August 2018 – September 2019). A centroid of uncertainty with an area of 3 km² was automatically assigned to each locality by GEOLocate. Minimum uncertainty was adjusted manually based on specificity of record descriptions. Descriptive localities were georeferenced by one of three researchers, and all final coordinates and uncertainties were checked and confirmed by the first author. Occurrences are provided as latitude and longitude in decimal degrees.

Additional information about the collection and processing of occurrence records, as it appears in Appendix 3 of the manuscript: Occurrence data originated from a variety of sources, but the majority of occurrences were opportunistic. Approximately 2.5% of the occurrences originated from iNaturalist (https://www.inaturalist.org/; both GBIF and SERNEC provided occurrences derived originally from iNaturalist), representing citizen scientist collections. Approximately 25% of the occurrences originated from herbaria/museum records, and are therefore more likely to be reliable scientific specimens with accurate locality data, which may or may not have originated from standardized surveys. The remaining data have unknown sources. The majority of occurrences provided no information about sampling method or date; as such, it was not possible to accurately estimate sampling bias or the temporal range represented by the data set. However, because the data were derived from a variety of sources, we expect that the likelihood that they exhibit a consistent form of sampling bias is low, and the risk that sampling bias would drive distribution models is probably negligible. We estimate that approximately 75% of the occurrences originated within the past 100 years.

For the second dataset, all final ecological niche model values (e.g., area of the EOO (km²), area of the PSA (km²), area of the intersection between the EOO and PSA (km²), and proportional occupancy (PO; a unitless proportion calculated by dividing the area of the intersection between the EOO and PSA by the area of the PSA)) were obtained or calculated based on the final ENMs produced for our study (see manuscript Methods for more details). The total number of occurrences is the final number of occurrence records used in ENMs for each species. The NatureServe conservation status of each species was obtained from NatureServe Explorer, 2020 (http://explorer.natureserve.org/). Reproductive life history traits (e.g., biomass (g), number of ovules, number of seeds produced per individual, seed setting rate, and flower type) were obtained from Ohara (1989)*. These data were used to assess, using beta regression models, whether reproductive life history traits were significant predictors of the proportional occupancy of the predicted distribution (PO).
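
The PO calculation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, argument names, and example areas are hypothetical. Because PO divides one area by another, the result is a proportion between 0 and 1, which is the kind of response variable beta regression is designed for.

```python
def proportional_occupancy(intersection_km2: float, psa_km2: float) -> float:
    """Return PO = area(EOO intersect PSA) / area(PSA), a unitless proportion."""
    if psa_km2 <= 0:
        raise ValueError("PSA area must be positive")
    if not 0 <= intersection_km2 <= psa_km2:
        raise ValueError("intersection area must lie between 0 and the PSA area")
    return intersection_km2 / psa_km2


# Hypothetical areas for illustration only; these are not values from the dataset.
po = proportional_occupancy(intersection_km2=12_500.0, psa_km2=50_000.0)
print(po)  # 0.25
```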

*Ohara, M. (1989). Life history evolution in the genus Trillium. Plant Species Biology, 4(1), 1-28.

Usage notes

The occurrence datasets are comma-separated values (.csv) files, organized in the three-column format required by the Maxent program (v. 3.4.1; Phillips et al., 2006): specific epithet, longitude, latitude. Note that longitude precedes latitude. Coordinates are in decimal degrees.
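
As a rough sketch of how a file in this layout might be read, the following Python example parses three-column records and checks that each coordinate falls within the valid range for decimal degrees. It assumes that the first row is a header, which should be verified against the actual files; the function name, sample epithet, and coordinates are hypothetical and not taken from the dataset.

```python
import csv
import io


def read_occurrences(handle, has_header=True):
    """Parse rows of (specific epithet, longitude, latitude) in decimal degrees."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first row holds column names
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        species = row[0].strip()
        lon, lat = float(row[1]), float(row[2])
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"coordinate out of range: {row}")
        records.append((species, lon, lat))
    return records


# Hypothetical records for illustration only; not taken from the dataset.
sample = io.StringIO(
    "species,longitude,latitude\n"
    "erectum,-83.92,35.96\n"
    "erectum,-84.10,35.61\n"
)
records = read_occurrences(sample)
print(len(records))  # 2
```

To read a file on disk, the same function can be given an open file handle, e.g. `open(path, newline="")`.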

A note on training occurrences: if the file is named “[species name]_training.csv,” that is the training dataset that was uploaded to Maxent. The data were split prior to uploading to Maxent, based on minimum uncertainty of georeferences. If the file is named “[species name]_both.csv,” that is the full set of occurrences (i.e., both training and testing) that was uploaded to Maxent. Maxent was allowed to split the data either 50/50 or 70/30, depending on the species. Method of data splitting per species is included in Table 4 in the manuscript.
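
The naming convention above lends itself to a small helper that sorts files into the two cases. The sketch below is illustrative only: the function name and the example file names are hypothetical, and the exact form of "[species name]" in the real file names should be checked against the dataset.

```python
from pathlib import Path

# Suffixes described in the usage notes, mapped to a short label.
SUFFIXES = {"_training": "training", "_both": "both"}


def classify_occurrence_file(filename):
    """Return (species name, kind), where kind is 'training' or 'both'."""
    stem = Path(filename).stem  # drop the .csv extension
    for suffix, kind in SUFFIXES.items():
        if stem.endswith(suffix):
            return stem[: -len(suffix)], kind
    raise ValueError(f"unrecognized occurrence file name: {filename}")


# Hypothetical file names for illustration only.
print(classify_occurrence_file("Trillium_erectum_training.csv"))
print(classify_occurrence_file("Trillium_grandiflorum_both.csv"))
```

Files labeled "training" were split before upload, so the corresponding testing records are not in that file; files labeled "both" contain the full set, which Maxent split internally.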

The life history dataset has some missing values for T. cernuum and T. sulcatum. Values of all reproductive life history predictor variables were obtained from Ohara (1989; Table 1 in the manuscript). Because the Ohara (1989) dataset only contained information for 19 of our 21 study species, excluding T. cernuum and T. sulcatum, we did not include either of these species in beta regression models.
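
Anyone refitting the regressions will need to drop the two species with missing trait values first. A minimal sketch of that filtering step follows; the column names, the missing-value markers, and the numbers are all hypothetical, so the actual headers and the way missing values are encoded in the life history file should be checked before use.

```python
import csv
import io

# Assumption: missing values appear as empty cells or "NA".
MISSING = {"", "NA"}


def complete_cases(rows, required):
    """Keep only rows with a non-missing value in every required column."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() not in MISSING for col in required)
    ]


# Hypothetical table for illustration only; not values from the dataset.
sample = io.StringIO(
    "species,ovules,seeds\n"
    "erectum,40,20\n"
    "cernuum,,\n"
    "sulcatum,NA,NA\n"
)
rows = list(csv.DictReader(sample))
kept = complete_cases(rows, required=["ovules", "seeds"])
print([row["species"] for row in kept])  # ['erectum']
```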

Funding

L. R. Hesler Herbarium Support Fund at the University of Tennessee, Knoxville