Aureoboletus projectellus (Fungi, Boletales) – Occurrence data, environmental layers and habitat suitability models for North America and Europe

Aureoboletus projectellus is a bolete species native to eastern North America that has been introduced to Central Europe. Here we present summarized data on the occurrence of the fungus in both disjunct ranges, based on (1) de novo georeferencing of herbarium specimens and occurrence reports; (2) information from peer-reviewed articles, mycological forums and websites; (3) personal observations; and (4) queries sent to Forest Districts and National Parks in Poland. Corresponding background data were acquired from public databases and include the range of the genus Pinus, the obligatory mycorrhizal partner of A. projectellus, and WorldClim bioclimatic data. Both datasets were made fit for the purpose of range modelling, i.e. they are represented as spatially compatible equal-area raster grids encompassing the temperate forest biome in eastern North America and Europe. Additionally, maps of habitat suitability, reflecting the association between occurrence and background data, were obtained using the maximum entropy approach implemented in MaxEnt.

© 2019 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data
Raw occurrence data are geographic coordinates of fruiting bodies of Aureoboletus projectellus provided in the World Geodetic System 1984 (WGS 84) standard. Supplementary Table 1 contains data for the native range (eastern North America) and Supplementary Table 2 for Europe, where the fungus was introduced. Corresponding background data (range of Pinus and bioclimatic variables) and range models are spatially compatible raster data in the ESRI Grid format, projected onto a plane using the Albers equal-area projection with 25 sq. km cells. Details of the projections are provided in Table 1.

Value of the data
Maps of habitat suitability for A. projectellus, together with raw occurrence data, can be used for efficient monitoring of the dispersal of the fungus in Europe and for studying the biology of this potentially invasive species.
Environmental background data for eastern North America and Europe constitute a standardized framework for comparative studies of other ectomycorrhizal fungi, including those endangered by the spread of A. projectellus.
As A. projectellus is a potentially invasive ectomycorrhizal fungus in Europe, it constitutes an excellent model for studying the dynamics of the dispersal process and for analysing its influence on native ectomycorrhizal communities.
synonyms of the fungus and searching among both preserved specimens and observations of fruiting bodies. All the specimens were georeferenced according to the same standardized protocol using a point radius method [1]. The procedure was based on digitized locality descriptions included in the downloaded database. We used Georeferencing Calculator ver. 20160929 [2] and the GeoLocate batch client [3] as auxiliary tools to make calculations and visualize localities on topographic maps and satellite images. Sites where A. projectellus was reported in Europe (Supplementary Table 2) were taken directly from a recent paper [4] describing the species' spread, based on information from reviewed articles, mycological forums and websites, personal observations and from answers to queries sent to Forest Districts and National Parks in Poland. Additionally, recent records from Sweden were downloaded from Artportalen (artportalen.se).

Environmental data
We downloaded information about the natural distribution of pine species in America from the Geosciences and Environmental Change Science Centre (gec.cr.usgs.gov), where digital representations of tree species range maps from other publications [5,6] are archived. Data for Europe were collected from the website of the European Forest Genetic Resources Programme (euforgen.org). Information about the ranges of pine species was combined and represented as a presence/absence grid projected using the Albers equal-area projection with 25 sq. km cells (Table 1 and Supplementary material). For a given data cell, we reported presence if at least one pine species was reported from that particular location (Supplementary Grids 1 & 2).
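The presence rule stated above (a cell is marked present if at least one pine species occurs there) can be illustrated with a minimal sketch. It assumes that each species range has already been rasterized onto the same grid as nested lists of 0/1 values; the function name `combine_presence` and the toy grids are hypothetical and are not part of the published workflow.

```python
def combine_presence(species_grids):
    """Combine per-species 0/1 grids into one presence/absence grid.

    Assumption: all grids share the same shape and cell alignment.
    A cell is 1 if at least one species is present there, otherwise 0.
    """
    n_rows = len(species_grids[0])
    n_cols = len(species_grids[0][0])
    return [
        [1 if any(grid[r][c] for grid in species_grids) else 0
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Toy example with two hypothetical species on a 2 x 2 grid.
species_a = [[1, 0],
             [0, 0]]
species_b = [[0, 0],
             [1, 0]]
pinus_presence = combine_presence([species_a, species_b])
```
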
Nineteen bioclimatic variables were downloaded from WorldClim ver. 1.4 [7] at a resolution of 2.5 arc minutes. First, in order to obtain grids spatially compatible with the pine presence/absence data, the angular data grid was projected onto a plane using the Albers projection with the same parameters. Second, to obtain a virtually independent set of variables, PCA was performed using a custom script written in R ver. 3.3.1 [8], and the data were limited to the first five principal components, which together explain more than 95% of the observed variance (Table 2). During the PCA calculations, the American and European datasets were treated as a single artificial continuous region and then separated.
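The PCA step can be sketched as follows. This is not the authors' R script, which is not reproduced here; it is a minimal Python illustration under several assumptions: the variables are standardized before PCA (the text does not say whether they were), the data are random stand-ins rather than WorldClim values, and the names `pca_scores`, `cells` and `n_america` are hypothetical. The sketch pools the cells of both regions, keeps the fewest leading components whose cumulative explained variance exceeds 95%, and then splits the scores back by region.

```python
import numpy as np


def pca_scores(values, min_explained=0.95):
    """Return (scores, explained) for the fewest leading components whose
    cumulative share of variance exceeds min_explained."""
    values = np.asarray(values, dtype=float)
    # Assumption: variables are converted to z-scores before PCA.
    z = (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the principal axes.
    _, singular, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular**2 / np.sum(singular**2)
    cumulative = np.cumsum(explained)
    # First component count at which the cumulative share exceeds the threshold.
    n_keep = int(np.searchsorted(cumulative, min_explained, side="right")) + 1
    n_keep = min(n_keep, len(explained))
    return z @ vt[:n_keep].T, explained[:n_keep]


# Hypothetical stand-in data: 300 cells x 19 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 19))
cells = latent @ loadings + 0.05 * rng.normal(size=(300, 19))

# Both regions are pooled for the PCA, then separated again by row index.
n_america = 180  # hypothetical number of American cells
scores, explained = pca_scores(cells)
america_scores, europe_scores = scores[:n_america], scores[n_america:]
```

Pooling the two regions before the decomposition gives both continents the same component axes, so a score in Europe is directly comparable with a score in North America.
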
The available bioclimatic variables cover major inland bodies of water, which are obviously not suitable habitats for a terrestrial fungus. We erased data for a set of major lakes from the bioclimatic layers, including the Great Lakes in the native range of A. projectellus and Lake Onega, Lake Ladoga and Lake Vänern in northern Europe (Supplementary Grids 3 & 4).
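Erasing lake cells amounts to setting them to a no-data value in every layer. The sketch below assumes a layer stored as nested lists and a boolean water mask on the same grid; the function name `mask_water`, the use of NaN as the no-data value and the toy values are all hypothetical choices made for illustration.

```python
import math


def mask_water(layer, water_mask, nodata=math.nan):
    """Return a copy of layer with water cells replaced by nodata.

    Assumption: layer and water_mask have identical shape and alignment.
    """
    return [
        [nodata if is_water else value
         for value, is_water in zip(layer_row, mask_row)]
        for layer_row, mask_row in zip(layer, water_mask)
    ]


# Toy 2 x 2 layer in which the top-right cell lies in a lake.
layer = [[1.5, 2.5],
         [3.5, 4.5]]
water = [[False, True],
         [False, False]]
masked = mask_water(layer, water)
```
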

Models of habitat suitability
Based on the occurrence data of A. projectellus from North America, and on environmental layers including PCA-transformed bioclimatic variables and the distribution of Pinus in the region, we used the maximum entropy approach implemented in MaxEnt [9] to construct a model of relative habitat suitability in the native range. A preliminary analysis of georeferenced occurrences revealed that they were spatially correlated, i.e. the distribution pattern of the obtained points was strikingly uneven, with clusters consisting of occurrences reported from a single survey with a spatial extent limited to a few kilometres, or from consecutive visits to exactly the same location. Each cluster was therefore represented by a single artificial point, calculated as follows. First, duplicated records having the same coordinates were excluded from the analysis, keeping only a single point for a given location. Next, we performed cluster analysis using the 'Find Identical' tool implemented in ArcGIS 10.2 Data Management Tools. Two points were assumed to be spatially coincident if the distance between them was equal to or less than 5,000 m, and the analysis resulted in formally identified clusters of points. Finally, for each cluster, a geographic mean was calculated using the 'Mean Center' tool from the Spatial Statistics extension for ArcGIS 10.2 (Supplementary Table 3).
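The three thinning steps (removing duplicates, grouping points within 5,000 m, averaging each group) can be approximated outside ArcGIS. The sketch below does not call or reproduce the ArcGIS tools. It assumes coordinates already projected to metres, uses plain Euclidean distance, and forms clusters by chaining any pair of points within the tolerance (single linkage via union-find), which may differ from how 'Find Identical' groups records. The function name `thin_occurrences` and the sample points are hypothetical.

```python
from math import hypot


def thin_occurrences(points, tolerance_m=5000.0):
    """Collapse clustered occurrences to one mean point per cluster.

    Assumption: points are (x, y) tuples in projected metres.
    """
    # Step 1: keep a single record per exact coordinate pair.
    unique = sorted(set(points))

    # Step 2: link points whose distance is <= tolerance (union-find).
    parent = list(range(len(unique)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            (x1, y1), (x2, y2) = unique[i], unique[j]
            if hypot(x1 - x2, y1 - y2) <= tolerance_m:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(unique):
        clusters.setdefault(find(i), []).append(point)

    # Step 3: represent each cluster by the mean of its coordinates.
    centres = [
        (sum(x for x, _ in members) / len(members),
         sum(y for _, y in members) / len(members))
        for members in clusters.values()
    ]
    return sorted(centres)


# Hypothetical records: one duplicate, a chain of three nearby points,
# and one isolated point.
records = [(0, 0), (0, 0), (3000, 0), (6000, 0), (50000, 50000)]
thinned = thin_occurrences(records)
```

Note that chaining lets a cluster span more than 5,000 m in total, as in the toy example where the first and third points are 6,000 m apart but are joined through the middle one.
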
Bioclimatic data were treated as continuous variables and pine presence/absence records as a categorical (binary) predictor. During each of the 100 replications of the MaxEnt analysis, a different subsample of 75% randomly chosen occurrence records was used to train the model, which was finally averaged over the replicates. For each replicate, the remaining 25% of occurrences were used to test the model, and the variability of model parameters between replicates was assessed. We chose the 'raw' output format and left all the remaining settings at the defaults.
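The replicate scheme described above (100 random 75%/25% train/test splits, followed by averaging over replicates) can be sketched independently of MaxEnt. The code below does not run MaxEnt or emulate its internals; it only shows how such splits and a cell-wise average might be produced. The function names `replicate_splits` and `average_grids`, the seed and the record count are hypothetical.

```python
import random


def replicate_splits(n_records, n_replicates=100, train_fraction=0.75, seed=1):
    """Return a list of (train_indices, test_indices) random subsamples."""
    rng = random.Random(seed)
    n_train = round(n_records * train_fraction)
    splits = []
    for _ in range(n_replicates):
        # A fresh random permutation of record indices for each replicate.
        order = rng.sample(range(n_records), n_records)
        splits.append((sorted(order[:n_train]), sorted(order[n_train:])))
    return splits


def average_grids(grids):
    """Cell-wise mean of equally shaped grids (one grid per replicate)."""
    n = len(grids)
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(grid[r][c] for grid in grids) / n for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical example: 40 thinned occurrence records.
splits = replicate_splits(40)
train_idx, test_idx = splits[0]

# Averaging two toy replicate outputs on a 1 x 2 grid.
mean_grid = average_grids([[[1.0, 2.0]], [[3.0, 4.0]]])
```
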

Transparency document
The transparency document associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.103779.