Towards a Holarctic synthesis of peatland testate amoeba ecology: Development of a new continental-scale palaeohydrological transfer function for North America and comparison to European data

training sets. The new model can therefore be used as an effective tool to reconstruct peatland palae- ohydrology throughout the North American continent. Some eco-regions exhibited lower taxonomic diversity and some key indicator taxa had restricted ranges. However, these patterns occurred against a background of general cosmopolitanism, at the moderate taxonomic resolution used. Likely biogeographical patterns at higher taxonomic resolution therefore do not affect transfer function performance. Output from the combined North American and European model suggested that any geographical limit of scale beyond which further compilation of peatland testate amoeba data would not be valid has not yet been reached, therefore advocating the potential for a Holarctic synthesis of peatland testate amoeba data. Extending data synthesis to the tropics and the Southern Hemisphere would be more challenging due to higher regional endemism in those areas.


Keywords:
North America Testate amoebae Peatland Water table Transfer function Ecology a b s t r a c t Fossil testate amoeba assemblages have been used to reconstruct peatland palaeohydrology for more than two decades. While transfer function training sets are typically of local-to regional-scale in extent, combining those data to cover broad ecohydrological gradients, from the regional-to continental-and hemispheric-scales, is useful to assess if ecological optima of species vary geographically and therefore may have also varied over time. Continental-scale transfer functions can also maximise modern analogue quality without losing reconstructive skill, providing the opportunity to contextualise understanding of purely statistical outputs with greater insight into the biogeography of organisms. Here, we compiled, at moderate taxonomic resolution, a dataset of nearly 2000 modern surface peatland testate amoeba samples from 137 peatlands throughout North America. We developed transfer functions using four model types, tested them statistically and applied them to independent palaeoenvironmental data. By subdividing the dataset into eco-regions, we examined biogeographical patterns of hydrological optima and species distribution across North America. We combined our new dataset with data from Europe to create a combined transfer function. The performance of our North-American transfer function was equivalent to published models and reconstructions were comparable to those developed using regional

Introduction
Testate amoebae are a polyphyletic group of microscopic, unicellular protists that occur globally in a range of soils, rivers, lakes and wetlands, including ombrotrophic peatlands e environments which are frequently used to reconstruct Holocene palaeohydrological change . These reconstructions are based on a transfer function approach, inherent in which is a robust understanding of peatland testate amoeba ecology, particularly as it relates to target reconstruction variables, principally water table depth (WTD).
Local-or regional-scale transfer function models around the globe have provided robust palaeohydrological reconstructions based on the strong empirical relationship between independent (i.e. WTD) and response (i.e. testate amoeba community change) variables (e.g. Swindles et al., 2009Swindles et al., , 2014van Bellen et al., 2014;Li et al., 2015). However, reconstruction magnitude can vary when local-and regional-scale models are applied outside of their training set range , providing a strong motivation to develop broader-scale, compiled models that provide standard reconstruction tools with directly comparable results.
Modern analogues from sites close to a core location may potentially better represent the contemporary environment than those from more distant sites in the present day. However, as we go back in time through different climate and ecological boundary conditions, as any transfer function reconstruction inevitably aims to do, local modern analogues may be less appropriate and it becomes more likely that better analogues will be found in different regions. Therefore, whilst local-and regional-scale models perform well statistically in cross-validation, the same performance cannot be assumed for downcore reconstructions and large-scale datasets/ models should yield more reliable reconstructions for any given location given their greater range of modern analogues. However, as the geographical extent of training sets is increased to the continental-, hemispheric-or even global-scales, it may become more problematic to effectively model independent and response variable relationships due to the introduction of confounding factors such as latitude and temperature effects, secondary environmental gradients or non-linear processes such as permafrost and fire. Therefore, there may be a geographical range limit beyond which the compilation of modern testate amoeba and associated environmental data is no longer beneficial for model development.
Recently, it has been shown that the development of continentalscale transfer functions may have advantages over the use of local-or regional-scale models (Amesbury et al., 2016). In particular, large training sets (i.e. > 1000 samples) capture a greater range of variability in target environmental variables and include more modern analogues for taxa that may be abundant in fossil records but that can be rare or absent in modern datasets (Charman et al., 2007;Swindles et al., 2015a), potentially improving palaeohydrological reconstructions.
In addition, although individual transfer functions statistically model the relationship between modern testate amoeba assemblages and WTD, a broader contextual understanding of peatland testate amoeba ecology and biogeography is important. For example, recent research has shown that certain taxa, or groups of taxa, may react to environmental change at different timescales (Sullivan and Booth, 2011;Marcisz et al., 2014) or have geographically restricted or skewed ranges (Booth and Zygmunt, 2005;Amesbury et al., 2016;Beyens and Bobrov, 2016), even extending to regional endemism, particularly in the tropics and Southern Hemisphere (Smith and Wilkinson, 2007;van Bellen et al., 2014;Reczuga et al., 2015). However, these ecological and biogeographical intricacies are not explicit in strictly statistical reconstructions. Indeed, research has shown that continental-scale geographical patterns of both genotypic and phenotypic diversity exist within seemingly cosmopolitan taxa (Heger et al., 2013;Roland et al., 2017;Singer et al., 2018). In the case of Hyalosphenia papilio (Heger et al., 2013), this is a finding potentially supported by a regionally skewed distribution of the same taxon across Europe (Amesbury et al., 2016).
Greater insight into wide-scale biogeographical patterns of soil protists is a fundamental research domain (Geisen et al., 2017). An increasing number of studies have used protists (Foissner, 2006(Foissner, , 2008, and specifically testate amoebae (e.g. Smith and Wilkinson, 2007;Fournier et al., 2015;Fern andez et al., 2016;Lara et al., 2016), to argue against the 'everything is everywhere: but the environment selects' paradigm (Baas Becking, 1934;see also De Wit and Bouvier, 2006;O'Malley, 2007O'Malley, , 2008. Research communities can move towards a broader understanding of these patterns by working collaboratively to make use of the ever-increasing amount of data from individual studies that can be compiled and used to both develop new, robust reconstruction tools (Amesbury et al., 2016), as well as begin to provide insight into important biogeographical debates (Smith et al., 2008). In addition, an understanding of wide-scale peatland testate amoeba ecology and biogeography is important in order to provide robust regional-to continental-scale palaeoclimatic interpretation of 1) palaeo-data compilations (e.g. Swindles et al., 2013), 2) studies from individual sites or groups of sites aiming to study the Holocene evolution of regional-scale climate systems (e.g. Magnan and Garneau, 2014;, 3) studies investigating anthropogenic drainage (e.g. Laggoun-D efarge et al., 2008;Payne et al., 2010;Talbot et al., 2010) and 4) studies of ecohydrological changes from specific climatic disturbances such as permafrost thaw (Lamarre et al., 2012;Swindles et al., 2015b;Pelletier et al., 2017;Zhang et al., 2018).
Building on the development of a recent pan-European transfer function (Amesbury et al., 2016), here we apply a similar approach to the North American continent. Through a wide collaborative network, we have compiled almost 2000 published and unpublished testate amoeba surface samples from 137 sites covering 34 of latitude and almost 100 of longitude. Our objectives are to (1) describe the ecology of peatland testate amoebae in North America  in terms of the key environmental variable, water table depth; (2) develop a new, rigorously tested continental-scale transfer function for reconstructing ombrotrophic peatland palaeohydrology; and (3) as the first step towards a Holarctic synthesis, assess the ecology and biogeography of testate amoeba across two continents by combining our new North-American dataset with data from a pan-European synthesis (Amesbury et al., 2016) to create a combined dataset with >3700 samples in total.

Data compilation and taxonomy
We collated a total dataset of 1956 modern surface samples from 137 peatlands located throughout the Canada and the USA ( Fig. 1; Table 1). Peatlands included were predominately Sphagnumdominated and ombrotrophic, but data from other site types was also included (e.g. Booth and Zygmunt, 2005). Data were compiled from 15 published sources in addition to four unpublished datasets (Table 1). All original datasets, including previously unpublished datasets, in percentage data form and a majority also as raw count data, have been archived and are freely available on the Neotoma Paleoecology Database (www.neotomadb.org). Raw count data was available for 1844 of 1956 samples (excluding the datasets of Warner, 1987 andPayne et al., 2006). For those samples, the mean count was 148 individuals. Only 65 samples had a count <100, with a minimum count in any one sample of 52 individuals. All testate amoeba surface samples had an associated water table depth (WTD) value, with a smaller number (n ¼ 1859) also having an associated pH value. Although some datasets contained other potential environmental variables (e.g. electrical conductivity, vegetation type), these data were not frequent enough across the entire dataset to permit inclusion in the data analysis.
Since the multiple analysts involved in the original data collection (publication dates ranging over 30 years from 1987 to 2017) applied slightly varying taxonomic schemes, it was necessary to merge data into a coherent taxonomy that was valid across all datasets. To do this, we followed the precedent set by Amesbury et al. (2016) in a compilation of European peatland testate amoeba data, taking a low-resolution approach and merging like taxa into groups wherever necessary. However, due to the generally greater consistency between original taxonomies than was the case for Amesbury et al. (2016), the relative extent of grouping required here was less, with a total of 72 taxa in the complete merged dataset, of which 25 were groupings containing multiple taxa (Table 2). Despite the improvement in taxonomic resolution from Amesbury et al. (2016), individual analysts should not count samples to the taxonomic scheme used here, but rather define individual taxa to as great an extent as possible (e.g. particularly within Corythion e Trinema type), grouping into this scheme for statistical analysis only. Exceptions to the taxonomy of Amesbury et al. (2016), permitted by the relative taxonomic consistency of the original datasets, were as follows: 1. Arcella artocrea was consistently identified in addition to Arcella arenaria type and Arcella catinus type, so this division remained. However; 2. Arcella catinus type was identified in most datasets, whereas Arcella arenaria type was only identified in two datasets, that did not include Arcella catinus type. Therefore, Arcella arenaria type was merged into Arcella catinus type; 3. Centropyxis cassis type and Centropyxis platystoma type were consistently identified across all datasets, so these divisions remained. Centropyxis aerophila type was only identified in two datasets so was merged into Centropyxis cassis type; 4. Cyclopyxis arcelloides type was consistently split into Cyclopyxis arcelloides type, Difflugia globulosa type and Phryganella acropodia type so these divisions remained; 5. Difflugia pulex type and Hyalosphenia minuta were consistently identified in addition to Cryptodifflugia sacculus, so all taxa remained and were not grouped into Cryptodifflugia sacculus type; 6. Difflugia pristis type, Pseudodifflugia fulva type and Pseudodifflugia fasicularis type were consistently identified in addition to Difflugia lucida type, so all taxa remained and were not grouped into Difflugia lucida type; 7. Difflugia bacillifera type and Difflugia rubescens type were consistently identified in addition to Difflugia oblonga type, so all taxa remained and were not grouped into Difflugia oblonga type;  Table 1 for full details). Sites are colour-coded by United States Environmental Protection Agency (EPA) ecoregion (see text for details). Locations of two independent palaeo records at Lac Le Caron, Qu ebec and Framboise, Nova Scotia are also shown (black spots). For references to colour, readers are referred to the online version of the article. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) Table 1 Site details and meta-data. Data ordered by year of publication/surname and listed alphabetically by site name. For Warner and Charman (1994), sample codes could not be confidently linked to specific sites, so only the total number of samples in the dataset is shown.  and Warner (1992) Ontario, Canada Wally Creek 49.076 À80.537 108 Warner and Charman (1994) Minnesota, USA Beckman Lake 45.421 À93.186 49 Warner and Charman (1994) Ontario, Canada Buriss 48.666 À93.632 Warner and Charman (1994) Minnesota, USA Cedar Lake 45.412 À93.198 Warner and Charman (1994) Ontario, Canada Emo 48.702 À93.825 Warner and Charman (1994) Ontario, Canada Fort Frances Airport 48.545 À93.398 Warner and Charman (1994) Ontario, Canada Kehl Lake 48.579 À93.576 Warner and Charman (1994) Ontario, Canada Pinewood 48.754 À94.290 Warner and Charman (1994) Ontario, Canada Rainy River 48.758 À94.533 Warner and Charman (1994) Ontario  10. Heleopera petricola was consistently identified in addition to Heleopera sphagni, so this division remained; 11. Padaungiella (Nebela) wailesi type was consistently identified in addition to Padaungiella (Nebela) lageniformis, so this division remained; 12. Planocarina (Nebela) marginata type was consistently identified in addition to Planocarina (Nebela) carinata type, so this division remained.
Following a recent redefinition, we update the nomenclature of Nebela militaris to Alabasta militaris (Duckert et al., 2018, in press). The complete dataset was screened for data quality and rare taxa. Eight taxa were removed as they occurred in <1% of the total number of samples (i.e. <19 samples; Table S1). Eleven samples were removed where taxonomic merging or deletion of taxa only identified to genera (e.g. Difflugia sp., Euglypha sp. etc.) resulted in a percentage total <90% (Table S2). A further 18 samples with extreme WTDs below the 0.5th (À15 cm) or above the 99.5th (67 cm) percentiles of all WTD measurements were removed to avoid undue influence of these samples in subsequent transfer function development (Table S3). This data management resulted in a useable dataset of 64 taxa and 1927 samples (Fig. 2). Hereafter, this will be referred to as the full dataset. Of these 1927 samples, all, with the exception of those from Warner and Charman (1994), had a known site location. It was not possible to assign samples to exact sites for the Warner and Charman (1994) data due to unspecific sample codes. Therefore, in all analyses requiring geographical information, reduced datasets were used that excluded those samples.
To investigate testate amoeba ecological and biogeographical patterns across North America and Europe (Amesbury et al., 2016), we compiled the two continental-scale training sets to form a combined dataset. We used the original merged datasets before the removal of rare taxa (n ¼ 1799 for Europe, n ¼ 1956 for North America, total n ¼ 3755), since combining the two may have resulted in additional samples to characterise previously rare taxa. Given the greater degree of taxonomic grouping in the European data, it was necessary to harmonise the North American data into the European taxonomic scheme (Amesbury et al., 2016). Taxonomic changes made to facilitate this followed the reverse of the 12 numbered points above. In addition, three taxa not identified in Europe (Conicocassis pontigulasiformis, Nebela barbata and Waileseila eboracensis) were added to the European taxonomy, resulting in an initial 63 defined taxa/taxonomic groups. Once combined, we applied the same data quality and rare taxa criteria as above and in Amesbury et al. (2016). We removed 11 rare taxa present in <1% of the total number of samples (<37 samples), 35 samples with extreme WTDs below the 0.5th (À15 cm) or above the 99.5th (82 cm) percentiles of all WTD measurements and a further 45 samples with percentage totals <90% (see Tables S4 e S6 for details of removed samples/taxa). This resulted in a workable combined dataset of 52 taxa and 3675 samples. All biogeographical inferences resulting from this work should be set against the taxonomic resolution applied. Light microscopic studies necessitate a practical, but relatively low-resolution approach when set against known levels of cryptic and pseudocryptic diversity in testate amoebae (e.g. Heger et al., 2013;Oliverio et al., 2014;Kosakyan et al., 2016;Singer et al., 2018). Therefore, where biogeographic patterns are identified at the resolution applied, we can assume that these would still be valid at a higher taxonomic resolution. However, if biogeographic patterns are absent for particular taxa, this does not exclude possible evidence when true genetic diversity is examined.

Statistical analyses
Considering the key aim of this study to produce a transfer function for palaeohydrological reconstruction, non-metric multidimensional scaling (NMDS) was used sensu Amesbury et al. (2016) to objectively examine the dataset and potentially exclude outlying sites and/or samples that were non-typical for the ombrotrophic peatlands commonly used in palaeoclimate research. NMDS was carried out using the 'metaMDS' function in the R package vegan (Oksanen et al., 2015), with water table depth and pH included as potential environmental drivers. Default settings and automatic data transformation were applied and the analysis was conducted based on Bray-Curtis dissimilarity. Environmental variables were passively fitted to the NMDS, and their significance tested, using the 'envfit' function in vegan. Sample-specific values for the Shannon-Wiener diversity index (SWDI) were calculated using the 'diversity' function in vegan. Transfer function development was carried out using the R package rioja (Juggins, 2015). Four common model types were applied; weighted averaging with and without tolerance downweighting (WA/WA-Tol; both with potential for inverse (inv) or classical (cla) deshrinking), weighted averaging with partial least squares (WAPLS), maximum likelihood (ML) and the modern analogue technique (MAT; with the potential for a weighted-mean version, WMAT). For each model, only the best performing version (judged by r 2 and leave-one-out root mean square error of prediction; RMSEP LOO ) was selected for further analysis. Additional statistical testing followed convention in the recent literature after Amesbury et al. (2013) and included calculation of leave-one-siteout RMSEP (Payne et al., 2012; RMSEP LOSO ), segment-wise RMSEP (Telford and Birks, 2011a; RMSEP SW ) and a spatial autocorrelation test (Telford andBirks, 2005, 2009) using the 'rne' (random, neighbour, environment) function in the R package palaeoSig (Telford, 2015). Also, following convention, transfer function development followed a two-step process. We firstly developed models using the full dataset and then, for each model type, performed a second, 'pruned', model run which excluded samples with high residual values greater than 20% of the total range of measured water table depths.
As part of the testing process, models were applied to independent (i.e. no modern samples from the sites were included in the transfer function) palaeo-data from two Sphagnum-dominated ombrotrophic peatlands: Framboise bog, Nova Scotia (45.7208 N,60.5521 W;H. Mackay,unpublished) and Lac Le Caron peatland, Qu ebec (52.2884 N, 75.8393 W;van Bellen et al., 2011). Original taxonomies of the independent palaeo datasets were merged and updated (i.e. in terms of nomenclature) as necessary in order to match the taxonomic scheme of the new models produced here. Sample-specific errors for all transfer function reconstructions were based on 1000 bootstrapping cycles. We compared our reconstructions with output from the most geographically extensive published transfer function in North America  and tested the significance of all new reconstructions (Telford and Birks, 2011b) using the 'randomTF' function in the R package palaeoSig (Telford, 2015;Payne et al., 2016).
To test for regional variability within the continental dataset, we developed regional models based on United States Environmental Protection Agency (EPA) Level 1 eco-region sub-divisions (www. epa.gov/eco-research/ecoregions), using the pruned WA-Tol (inv) model (excluding samples from Warner and Charman, 1994; see Section 2.1) as the basis for regional model development (n ¼ 1696). Our data covered eight different eco-regions but to ensure regional models were representative, we only developed models for those regions containing >50 samples (Table 3). This resulted in five regional models: taiga (samples n ¼ 68, sites n ¼ 7), northern forests (samples n ¼ 978, sites n ¼ 56), northwestern forested mountains (samples n ¼ 136, sites n ¼ 16), marine west coast forest (samples n ¼ 266, sites n ¼ 23) and eastern temperate forests (samples n ¼ 377, sites n ¼ 18). Models were not developed for the tundra (samples n ¼ 7, sites n ¼ 1), Hudson Plain (samples n ¼ 32, sites n ¼ 7) and North American deserts (samples n ¼ 11, sites n ¼ 1) eco-regions. The independent palaeo records were located in the northern forest (Framboise) and taiga (Lac Le Caron) eco-regions. The programme PAST (v3.20;Hammer et al., 2001) was used to run one-way PERMANOVA tests (9999 iterations, To test the validity of a combined North American and European transfer function, we applied the North American (this study), European (Amesbury et al., 2016) and a pruned combined model (n ¼ 3425; all WA-Tol (inv)) to independent palaeo data from both North America (Framboise) and Europe (Tor Royal Bog, UK sensu. Amesbury et al., 2016). We also qualitatively compared the modelled taxa water table depth optima and relative optima ranking of the three models.

Ordination
Summary NMDS results are shown in Fig. 3. For the NMDS of the full dataset ( Fig. 3A and B), square root transformation was automatically applied. WTD was significantly correlated to NMDS axis 1 (p < 0.001), although the WTD vector was at 45 to the horizontal, with taxa favouring drier conditions in the top left of the ordination space and taxa favouring wetter conditions in the bottom right (Fig. 3B). In general, all samples fell in a large, fairly coherent single swarm of data, with no significant outliers, or groups of outliers, though samples tended to cluster more in the uppermost part of the swarm (Fig. 3A). When sub-divided by eco-region (Fig. 3C), there was a high degree of overlap, with no regions outlying all others. Only samples from the northwestern forested mountain eco-region tended to cluster in a specific part of the sample swarm (towards the left; Fig. 3C), whereas samples from all other regions were located throughout the ordination space. This high degree of overlap, coupled with the significant correlation of WTD to NMDS axis 1 provided a strong justification to continue with transfer function development using the full dataset.

Transfer function development
Performance statistics for the best-performing version of each model type are shown in Table 4. For the full dataset, the range of WTD values was 82 cm (minimum WTD -15 cm; maximum WTD 67 cm), therefore a value of 16.4 cm (20% of range) was used to remove outlier samples during data pruning. The number of pruned samples varied for each individual model type (Table 4). Following data pruning, the WMAT K5 (weighted-mean modern analogue technique with five nearest neighbours) model was the best performing when judged by RMSEP LOO and R 2 . ML was the worst performing model (based on RMSEP LOO ) when all samples were included but improved after data pruning (e.g. second best pruned model for r 2 only), largely down to the removal of 410  Table 2. For references to colour, readers are referred to the online version of the article. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) Table 4 Performance statistics for all transfer function models for water-table depth based on leave-one-out (RMSEP LOO ), leave-one-site-out (RMSEP LOSO ) and segment-wise (RMSEP SW ) cross validation methods. Results are in order of performance as assessed by RMSEP LOO of the pruned model run. Results for RMSEP LOO are given for both preand post-removal of samples with high residual values (see text for details; figures in parentheses show performance statistics after the removal of outlier samples). Results for RMSEP LOSO and RMSEP SW are only given for pruned models. Relative decreases in model performance are between RMSEP LOO of the pruned models to RMSEP LOSO and RMSEP SW . SD is the standard deviation of all water-table measurements included in each model after the removal of outliers. Results are only shown for the best performing version of each model type. WA-Tol (inv) ¼ weighted averaging with tolerance downweighting and inverse deshrinking; WAPLS C2 ¼ second component of weighted averaging partial least squares; ML ¼ maximum likelihood; WMAT K5 ¼ weighted averaging modern analogue technique with five nearest neighbours. samples with high residual values, more than double the number of pruned samples than for the next highest model (WA-Tol (inv); n ¼ 197). Pruned samples for all models tended to be reasonably evenly distributed between wet and dry extremes, however ML produced far more extreme negative residuals (i.e. samples where WTD was substantially overestimated; Fig. 4). Partly as a result, both ML and WMAT K5 suffered from comparatively high average bias values when compared to the two WA-based models, although the WA-based models had higher maximum bias (Table 4). WAPLS C2 and WA-Tol (inv) performed very similarly on all metrics, with WA-Tol (inv) marginally outperforming WAPLS C2 after data pruning.
All models showed a decrease in relative performance for RMSEP LOSO and RMSEP SW compared to RMSEP LOO , though for ML, both reductions were marginal (Table 4). Relative performance ratings also changed when different versions of RMSEP were applied. For example, when using RMSEP LOSO , WA-based models were the best performing, but using RMSEP SW , both showed a higher relative decrease in performance than WMAT K5 and ML (though RMSEP SW for the WA-based models was still lower than for ML; Figure S1). Both WA-based models performed worse in 'dry' segments (i.e. WTDs greater than 40 cm) with comparatively less samples, whereas WMAT K5 and ML displayed more consistent RMSEP values across all segments, suggesting that these models were not as affected by WTD and/or segment sampling frequency ( Figure S1). WMAT K5 performed relatively poorly for RMSEP LOSO , suggesting that this model type does not deal well with clustered training sets (Amesbury et al., 2016).
All models displayed evidence of spatial autocorrelation, given that r 2 declined much more rapidly when geographically close compared to random samples were removed; this decline was sharpest for WMAT K5 ( Figure S2). R 2 values for WMAT K5 and ML declined steadily with the continued removal of geographically close samples, whereas, interestingly, r 2 increased from previous values for both WA-based models when all samples within 1000 km of each other were removed, such that the 1000 km values were higher than for 100 km. Even values at 2500 km (i.e. the models would only contain samples that are >2500 km apart) were comparable to 100 km. These results suggest that, despite the initial decline in r 2 , the WA-based models are more robust to spatial autocorrelation than WMAT K5 or ML.
Taxon-specific values for WTD optima and tolerance showed taxa ordered along a WTD gradient both comparable to the diagonal WTD vector along axis 1 of the NMDS (Fig. 3) and also to other studies. In contrast to a European compilation (Amesbury et al., 2016), where taxonomic grouping resulted in more similar tolerance ranges along the WTD gradient, tolerances in this dataset (Fig. 5) showed a more typical pattern of being wider for drier taxa and vice versa (Mariusz Lamentowicz and Mitchell, 2005). For example, the mean tolerance for the driest half of taxa, ranked by optima, was 12.5 cm, whereas the same value for the wettest half was 8.4 cm (significantly different at p < 0.05 for a two-sample, two-tailed t-test, df ¼ 60).

Testing model application
All four models were applied to independent palaeo records from Framboise and Lac Le Caron peatlands (Fig. 1) and compared to reconstructions using the transfer function of , the most geographically extensive published transfer function from North America ( Fig. 6; Figure S3). All model types, as well as the  reconstructions, showed broadly similar patterns of change ( Fig. 6 and S3). WMAT K5 and ML reconstructions tended to reconstruct somewhat more abrupt (i.e. change over less samples) and extreme (i.e. higher and lower water table depth values at either end of the scale) change, when compared to the WA-based models, a trend particularly evident in the Lac Le Caron reconstructions. In general, all four model types tended to co-vary within a range of 5e10 cm, however there were periods when particular reconstructions were more divergent. For example, at 10 cm depth at Framboise, the WMAT K5 model reconstructed a significant dry shift not seen in other models (Fig. 6A) and at Lac Le Caron, the ML model produced wet shifts at 36 cm and around 215 cm that were not replicated by other models, as well as reconstructing generally wetter conditions for the lower~150 cm of the profile (Fig. 6B). However, when viewed as z-scores ( Fig. 6C and D; Swindles et al., 2015a), the majority of shifts were aligned, suggesting that all models are reliably reconstructing the main patterns of palaeohydrological change in the profiles.
The significance of all reconstructions was tested against models trained on random data (Telford and Birks, 2011b,   often to extremes. For example, ML had the lowest p value at 0.287 (despite having the highest value at Framboise), with both WAbased models having p values > 0.98, meaning their reconstructive ability was worse than almost all models trained on randomly generated data (see also Payne et al., 2016).

Eco-region variability and models
One-way PERMANOVA tests showed that there were significant differences in testate amoeba assemblages between factor ecoregions (p < 0.0001, F ¼ 15.16). Species richness (i.e. number of taxa present) differed among eco-regions whereas eco-region mean species diversity values were typically more consistent (in the range 1.9e2, overall range for all samples 0.19e3.07), with the exception of the northwestern forested mountains and North American deserts regions (Table 3). Those eco-regions characterised by only a single site (tundra, North American deserts) had expectedly lower species richness, but, in the case of the tundra region, still displayed species diversity equivalent to the total dataset and to other regions (e.g. Swindles et al., 2009Swindles et al., , 2014, despite being based on only seven samples. There were notable differences in species richness among regions with more comparable numbers of sites and samples. For example, eastern temperate forests (18 sites, 346 samples) contained 62 different taxa, whereas northwestern forested mountains (16 sites, 131 samples) contained only 42 taxa. In this case, the 20 fewer taxa in the northwestern forested mountains were often relatively rare taxa that were absent from multiple regions, however also included the absence of other relatively abundant taxa present in all other regions (e.g. Amphitrema wrightianum, Bullinularia indica, Difflugia oblonga, Difflugia pulex). Inter-regional assemblage differences were investigated by plotting taxa distributions across the eight eco-regions ( Figure S5). Only eight of 64 taxa occurred in all eight eco-regions: Assulina muscorum, Nebela tincta type, Euglypha tuberculata, Hyalosphenia papilio, Corythion-Trinema type, Centropyxis cassis type, Euglypha rotunda type and Difflugia globulosa type. However, allowing for the reduced species richness in the less-sampled tundra and North American desert eco-regions (Table 3; regions 1 and 8 on Figure S5), 42 of 64 taxa occurred in six or more eco-regions. Of the 22 taxa that were present in five or less eco-regions, 14 were absent from the less-sampled tundra and North American desert regions, as well as the relatively taxa-poor northwestern forested mountains region. Of those 14 taxa, the most frequent were Physochila griseola (619 (out of 1884) occurrences), Bullinularia indica (519 occurrences) and Centropyxis ecornis type (376 occurrences). Taxa present in only two or three eco-regions were largely driven by low occurrence, with only Nebela minor of that group occurring in >100 samples ( Figure S5).
Taxa tended to occur on a continuum where their presence in samples and eco-regions, and local abundance, were broadly aligned ( Figure S6). For example, common, locally abundant key hydrological indicator taxa such as Assulina muscorum, Trigonopyxis arcula type and Archerella flavum tended to occur in multiple (i.e. 6e8) eco-regions, whereas rare (in both occurrence and abundance) taxa such as Pseudodifflugia fasicularis type and Padaungiella lageniformis type tended to have more restricted ranges, occurring in only a few (i.e. 2e3) eco-regions. However, this overall trend also encompassed considerable variability. Relatively rare taxa could still be locally abundant and occur across multiple eco-regions. For example, Arcella hemispherica occurred in only 3% of samples, but had a maximum abundance of 63% and was present in six ecoregions. Taxa that were rare in both occurrence and abundance could still occur across multiple eco-regions. For example, Quadrulella symmetrica was present in only 3.6% of samples, never above 9% abundance, but was only absent from one eco-region (North American desert). The absence of Q. symmetrica from the North American desert eco-region is noteworthy given the recent discovery of a Quadrulella taxon in a Mexican desert (P erez-Ju arez et al., 2017). Finally, locally abundant taxa could also have more restricted ranges. For example, Difflugia oviformis type occurred in only two eco-regions (NF and ETF), but formed more than half of the assemblage in some samples in those regions.
To further test overall model performance and to further investigate internal biogeographical variability, we developed models for five eco-regions, using only the WA-Tol (inv) model type (cf. Amesbury et al., 2016; see Section 2.2). Eco-region model reconstructions varied notably ( Fig. 7 and S4), in terms of value, pattern and, less often, direction of change. Most notably, the  . A and C for Framboise bog, B and D for Lac Le Caron peatland. For all plots, higher y-axis water table depth or z-score values indicate drier conditions. Note varying y-axis scales.
northwestern forested mountains (NFM) model produced reconstructions, for both palaeo datasets, that had notably drier reconstructed water table depth values when compared to all other models (including the pruned WA-Tol (inv) model based on the full dataset; Fig. 7A and B). Mean WTD for individual samples and the overall range of WTDs sampled for the NFM model were equivalent to other models (Table 3), suggesting these differences were not simply driven by poor characterisation of 'wet' conditions in the model. Even when viewed as z-scores, the NFM model appeared anomalous, with often conflicting patterns and direction of change, despite having lower RMSEP LOO and higher R 2 values than all other regional models (Table 3). The eastern temperate forest (ETF) model most closely resembled the WA-Tol (inv) reconstruction at both sites; this model had the highest number of samples and joint highest number of taxa of the eco-region models (Table 3). Despite having a much lower R 2 , and higher maximum and average bias values (Table 3), the taiga reconstructions were broadly in line with the WA-Tol (inv) reconstruction at both sites. The northern forest (NF) model reconstructions, despite being based on a high number of species and taxa compared to most other eco-region models (Table 3), were variable in the extent to which they aligned with other models. At Framboise, a site within the NF eco-region, the NF reconstruction was closely aligned to the ETF and WA-Tol (inv) reconstructions, whereas at Lac Le Caron, it often exhibited differing patterns and direction of change, suggesting potential problems with supra-regional application of eco-region models . All eco-region model reconstructions at both sites were tested for significance against models trained on random data (Telford and Birks, 2011b, Table 5). P values for only two of ten reconstructions fell under the suggested significance cut-off value of 0.05 (NF and ETF models at Framboise, 0.015 and 0.007 respectively). Other p values ranged from 0.471 to 0.997 and no reconstructions at Lac Le Caron were significant. Taxa optima for the five eco-region models were compared to those of the full dataset ( Figure S7). Taxa optima rank, cf. Fig. 8, was not compared due to variable n for taxa in each eco-region model. Optima for individual taxa varied from 0.8 cm (A. wrightianum) to 42.5 cm (Difflugia rubescens), with a mean value of 11.6 cm and standard deviation of 7.3 cm.  (Amesbury et al., 2016) and combined models for independent palaeo records in both Europe and North America were very similar when viewed as both reconstructed water table depth values and z-scores (Fig. 8). Individual taxa water table depth optima, whether in cm or rank order, were not significantly different across the three models (comparing the European and North American data as the only completely independent variables; for optima r ¼ 0.80, p < 0.05, for rank r s ¼ 0.82, p < 0.05), confirming WTD as an ecologically important environmental variable (cf. Juggins, 2013). Despite the general consistency in WTD optima, some variability was evident. Most notably, Table 5 PalaeoSig p values for the significance of reconstructions against models based on randomly generated data for the pruned versions of four model types and five ecoregion models, against two independent test sets. See Tables 3 and 4 for  Difflugia labiosa type was the wettest ranked taxa in the North American and combined models, with negative values indicative of standing water, but ranked 45 of 46 taxa in the European model (Amesbury et al., 2016) with a dry water table depth optima of 25 cm. However, in the European dataset, D. labiosa was only found in one sample in one eco-region (Scandinavia) at very low abundance (0.7%; Amesbury et al., 2016) and was therefore poorly characterised. Its more extensive presence in the North American dataset (as D. oviformis type) in 60 samples spanning two ecoregions (northern forests and eastern temperate forests) with a maximum abundance of >50% provides better quality ecological data for this taxon and highlights the value of large, ecologically diverse datasets in improving modern analogue quality, as well as the persistent issue of taxonomic consistency . Other differences in optima and rank were less extreme, but varied by up to 15 ranking positions or approximately 10 cm in terms of water table optima (e.g. Nebela penardiana type, Assulina seminulum type). On average, taxa optima varied by 4.3 cm and 5.2 ranking positions across all three models (equivalent values of 3.8 cm and 4.4 ranking positions excluding the anomalous D. labiosa type).

A new continental-scale transfer function for North America
Performance statistics (i.e. r 2 and RMSEP LOO ) for the North American model were equivalent to a recent European compilation (Amesbury et al., 2016) and a range of other regional-scale models within North America and Europe (Table S7, i.e. within mean ± 1s of 20 reviewed models). Further statistical tests (sensu. Amesbury et al., 2013) showed that potential issues such as clustered sampling design and spatial autocorrelation had only minimal effect on model performance and output. Reconstructions on independent palaeo-data were comparable to a less sample-dense and more geographically restricted North American model . Taken together, these multiple strands of evidence suggest that our new model can be used as an effective tool to reconstruct peatland palaeohydrology throughout the North American continent. More ecologically bespoke regional models may still provide reliable reconstructions within and potentially even outside of their geographical limits Willis et al., 2015) and have an important role to play in ongoing research where they suit specific research questions. However, the volume of data in the continental-scale compiled dataset provides improved modern analogues for many taxa. All four models tested performed similarly in statistical terms, however due to potential issues with spatial autocorrelation and the somewhat abrupt nature of some reconstructed shifts in WTD with WMAT K5 and ML (Fig. 6), as well as the differing magnitude of ML reconstructions ( Fig. 6A and B), we favour WA-based models. A pruned WA-Tol (inv) model marginally outperformed WAPLS C2 and produced reconstructions slightly more aligned to the model of , so WA-Tol (inv) is our preferred model choice.
The variability of p values for significance testing of transfer function reconstructions (Telford and Birks, 2011b) between sites and models trained on the same data brings the use of the 'ran-domTF' test into further question (Amesbury et al., 2016;Payne et al., 2016). Across all model types and eco-regions, no reconstructions from Lac Le Caron had p values < 0.05. Eco-region model reconstructions for Framboise varied widely between 0.007 and 0.782. At Framboise, NF and ETF model reconstructions were both significant (0.015 and 0.007 respectively), whereas MWCF and taiga model reconstructions were non-significant (0.471 and 0.485 respectively), despite showing extremely similar patterns of change ( Fig. 7; Table 5). Given that the efficacy of the 'randomTF' method has been recently reviewed and questioned (Amesbury et al., 2016;Payne et al., 2016), these results further call into question the usefulness of this test.
To make transfer functions more openly accessible to the research community and to facilitate users' own application of the model to fossil data, we provide the full compiled North American dataset along with R code to apply the WA-Tol (inv) model as supplementary information to this manuscript (our dataset is archived on Mendeley data and is searchable at https://data. mendeley.com/datasets). In addition, site-based datasets preserving the original taxonomy of the individual studies are available through the Neotoma Paleoecological Database (www.neotomadb. org).

North-American regional variability
Biogeographical patterns that are evident in the dataset appear to be largely driven either by sampling frequency or restricted to particular eco-regions (e.g. NFM). Outside of these limitations, biogeographical differences occurred against a background of relative cosmopolitanism across the dataset. For example, the finding that H. papilio occurred across all eco-regions is particularly interesting given recent evidence suggesting that it displays geographical patterns of both genotypic and phenotypic diversity in other regions (Heger et al., 2013;Amesbury et al., 2016). This observation is reassuring for the broader applicability of transfer functions across large geographical areas. Although we used only a moderate taxonomic resolution (higher than for a European compilation; Amesbury et al., 2016) that did not allow us to identify the full complexity of biogeographic patterns, and hence may have caused us to miss some discrepancies in taxa hydrological optima, this potential loss of information did not result in poor model performance. Even though a higher taxonomic resolution may permit the development of more precise transfer function models it appears that the taxonomic scheme currently used by ecologists and palaeoecologists is robust and the resulting model performance satisfactory.
Taxa optima were broadly consistent across eco-region models ( Figure S7), with mean variability between eco-region models of 11.6 cm. Highly variable water table depth optima for individual taxa tended to be driven by poor characterisation. For example, Difflugia rubescens and Arcella gibbosa both had tightly clustered optima across a majority of models, with outlying values for the MCWF model only driving the two highest values for overall optima range; in that model, the optima for both taxa were driven by occurrence in a single sample (from 191 samples in the model; Table 3), both at <2%. Likewise, the outlying optima for Padaungiella lageniformis in the NF model was driven by only 6 occurrences (from 915 samples in the model; Table 3), never at >1% abundance. Thus, whilst the variability evident between taxa optima across eco-regions may be partly driven by genuine biogeographical differences, the overall consistency provides support for continentalscale compilation. The variability in eco-region model reconstructions illustrates a potential advantage of sample dense Fig. 9. Comparison of water table depth optima (cm) for 46 taxa from European, North American and combined models, ordered by the combined model. Top panel: rank order of taxa (coloured spots) with range between highest and lowest rank (grey bars). Bottom panel: modelled optima (coloured spots) with range between highest and lowest optima (grey bars). For references to colour, readers are referred to the online version of the article. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) continental-scale compilations. In particular, the NFM model produced anomalous reconstructions that were significantly drier (i.e. higher WTD) than other eco-region models and also anomalous when viewed as z-scores. Although not clearly outlying and therefore included in the data analysis, this was the only region to be partly clustered in the NMDS analysis (section 3.1) and was relatively taxa poor. All sites in this eco-region were from the Rocky Mountains and, although Sphagnum-dominated, were typically fens and included sites that were fed by highly acidic and iron-rich groundwater (Booth and Zygmunt, 2005), which appears to have driven a distinct local flora. Although this produced anomalous results for a regional-scale model, when these data were included in a continental-scale model, potentially confounding factors did not overtly effect the magnitude or direction of change of the reconstruction (Fig. 7), whilst valuable ecological information was maintained.

Spatial scales for transfer function development
Continental-scale models from North America and Europe, as well as a combined model, reconstructed independent test data from both continents identically, in terms of palaeohydrological interpretation. Despite some variability, water table depth optima were also broadly aligned between continents (Fig. 9). Modelled water table depth optima and taxa distribution may vary on local or regional scales due to differences in peatland type (e.g. forested vs. non-forested; Beaulne et al., 2018), climatic context (Booth and Zygmunt, 2005) or the influence of non-linear processes such as permafrost (e.g. at the unpublished Scotty Creek and Toolik sites (Table 1); see also Lamarre et al., 2013). However, the relative position and ecological niches of taxa in respect to surface wetness is often more consistent (Booth and Zygmunt, 2005;Beaulne et al., 2018). Our comparison of water table depth optima and rank between European and North American compilations showed that, with the exception of a single anomalous taxon, differences were relatively small (Fig. 9) and importantly, did not have a noticeable effect on reconstructions of test data from both within and outside of each continent, especially when viewed as residual values ( Fig. 8; Swindles et al., 2015a). Therefore, it appears that potentially confounding climatic or peatland type differences that may exist within continental-scale compiled datasets are compensated by more in-depth characterisation of the average ecological niche of each taxon.
These findings also support the recent assertion that transfer functions trained on very large datasets perform well due to the vastly increased number of analogues contained within them, which provide a better representation of testate amoeba ecology than less populated local-or regional-scale models where modern analogues may be either lacking or based on low numbers of samples for some taxa (Amesbury et al., 2016). In a test of regional and supra-regional application of transfer functions, the only difference between reconstructions was in the absolute magnitude of change (i.e. in reconstructed water table depth values), with direction of change, the most reliable variable for palaeoclimatic interpretation, consistent Swindles et al., 2015a). Given that the recent suggestion to display transfer function output as standardised z-scores (Swindles et al., 2015a;Amesbury et al., 2016) tends to mitigate magnitude differences, we therefore advocate the further development and use of continentalscale models that maximise modern analogue quality without losing reconstructive skill and provide standard reconstruction tools with directly comparable results.
Continental-scale models from North America (this study) and Europe (Amesbury et al., 2016) have similar performance statistics than regional-or local-scale models, but potentially offer additional scientific benefit. The compilation and taxonomic harmonisation of such datasets, alongside their archiving in freely available repositories (e.g. Neotoma), permits further community-driven effort towards a better understanding of wide-scale testate amoeba ecology and biogeography that can contextualise and enhance the interpretation of transfer function results. In addition, the lack of significant reduction in performance statistics and similarity of reconstructions produced by a combined North American and European model, when compared to the two continental-scale models suggests that any geographical limit of scale beyond which further compilation of peatland testate amoebae data would not be beneficial, has not yet been reached. We therefore call for the continued synthesis of modern testate amoeba data, particularly for Eurasia. Our results show that harmonisation of this data alongside existing datasets may permit a future Holarctic synthesis of peatland testate amoeba ecology and biogeography. A potential challenge when extending beyond the Holarctic realm is that additional Southern Hemisphere specific taxa (e.g. Alocodera, Apodera, Certesella) will need to be considered, while many characteristic taxa of northern peatlands will be missing (e.g. Planocarina, Hyalosphenia papilio).

Conclusions
We compiled a dataset of 1956 modern surface testate amoeba samples from 137 peatlands located throughout the United States of America and Canada. Following data management procedures, a reduced dataset of 1927 samples was used, following exploratory NMDS analysis, to develop continental-scale transfer function models. These models were tested using a range of statistical approaches and application to independent palaeo-data. Though all pruned models tested were valid, WA-Tol (inv) was selected as our preferred model choice. Our new model provides an effective, standardised tool for reconstructing peatland palaeohydrology throughout the North American continent. Using the compiled dataset and models based on EPA Level 1 eco-regions, we examined regional variability within the dataset. Some regions and taxa demonstrated differences in taxonomic diversity and/or biogeography, but these were set against a broad pattern of cosmopolitanism across the dataset. To examine the potential for a future Holarctic synthesis of peatland testate amoeba data, we combined our new North American dataset with an existing recent compilation from Europe to form a combined dataset of >3700 samples. Model outputs from this compilation were equivalent to the North American and European models independently, suggesting that there is scope for further merging of data to develop a future Holarctic dataset and transfer function. Such a dataset would be an extremely valuable community resource, allowing collaborative examination of peatland testate amoeba ecological and biogeographical patterns to an extent not currently possible.

Author contributions
MJA and RKB conceived the work and compiled/archived the data. MJA conducted data analysis and wrote the manuscript. All authors contributed data, actively discussed the direction of the research and/or contributed to manuscript editing.
submission to the Neotoma Paleoecology Database was supported by EAR-1550716 under the guidance of RKB. RKB was also funded by the National Science Foundation (EAR-0902441, ATM-0625298) and the Untied State Geological Survey (USGS-DOI cooperative agreement 06ERAG0019). MJA, DJC, PDMH and HM were funded by the UK Natural Environment Research Council (grants NE/G019851/ 1, NE/G020272/1, NE/GO19673/1 and NE/GO2006X/1). HM was also funded by a UK Quaternary Research Association New Research Workers' Award. RJP was funded by the Russian Science Foundation (grant 14-14-00891). EADM was funded by the University of Alaska Anchorage and the Swiss National Science Foundation (grants 205321-109709/1 and 205321-109709/2). JT and NP were funded by the National Sciences and Engineering Research Council of Canada (grant No: 2014e05878).