How do microbiota associated with an invasive seaweed vary across scales?

Communities are shaped by scale dependent processes. To study the diversity and variation of microbial communities across scales, the invasive and widespread seaweed Agarophyton vermiculophyllum presents a unique opportunity. We characterized pro‐ and eukaryotic communities associated with this holobiont across its known distribution range, which stretches over the northern hemisphere. Our data reveal that community composition and diversity in the holobiont vary at local but also larger geographic scales. While processes acting at the local scale (i.e., within population) are the main structuring drivers of associated microbial communities, changes in community composition also depend on processes acting at larger geographic scales. Interestingly, the largest analysed scale (i.e., native and non‐native ranges) explained variation in the prevalence of predicted functional groups, which could suggest a functional shift in microbiota occurred over the course of the invasion process. While high variability in microbiota at the local scale supports A. vermiculophyllum to be a generalist host, we also identified a number of core taxa. These geographically independent holobiont members imply that cointroduction of specific microbiota may have additionally promoted the invasion process.

within one single algal individual may harbour different communities (e.g., Morrissey, Çavaş, Willems, & De Clerck, 2019) which at the same time depend on environmental parameters acting at local (e.g., Campbell, Marzinelli, Gelber, & Steinberg, 2015) and regional (e.g., Lindström & Langenheder, 2012) scales. Microbial availability and connectivity may still be important at the coastal or continental scale (e.g., Martiny et al., 2011;Sunagawa et al., 2015) and effects related to the invasion process, such as genetic diversity of the host or potential adaptations, are likely more relevant at continental and global scales (e.g., Arnaud-Haond et al., 2017). Since the non-native range is spatially far larger and the invasion process has been facilitated by multiple introductions (Kim, Weinberger & Boo, 2010;Krueger-Hadfield et al., 2017) it is probable that one or multiple selection processes have occurred, both in the native and non-native ranges.
Although such adaptions have not been identified genetically, several lines of evidence corroborate that populations from the non-native range tend to be more tolerant to several stressors, including grazing (Hammann, Rempt, et al., 2016), temperature and salinity extremes (Hammann, Wang, Boo, Aguilar-Rosas, & Weinberger, 2016;Sotka et al., 2018) and epiphytic overgrowth .
Experimental work has indicated that A. vermiculophyllum populations from native and non-native ranges are chemically better defended against epiphytic bacterial settlers from their own range (Saha, Wiese, Weinberger, & Wahl, 2016). This does not only suggest that A. vermiculophyllum is (chemically) well equipped to manipulate epibiota, but also implies that the defence is fine-tuned to the environment at some spatial scale. The synthesis of molecules that interact with the microbial community is plastic, as the production or increase in production may also depend on external conditions. These traits might importantly aid a seaweed host in maintaining an epiphytic community that is overall beneficial. Whether such plasticity, increased plasticity through adaptation, or more specific adaptations have contributed to its success as an invader remains unclear. However, the abovementioned studies strongly suggest that interactions with the associated microbial community have been important in the invasion process. Whether changes in the interaction between host and microbes (such as shown in Saha et al., 2016;, reflect a single adaptation of the host that occurred during the invasion, or rather adaptation or acclimation acting on more local scales is an important question to which the answer may importantly contribute to our understanding of the invasion success of A. vermiculophyllum and other invasive species. While specific host-microbe interactions between A. vermiculophyllum and epibiota have been studied (Saha & Weinberger, 2019;Saha et al., 2016;, the composition of the epiphytic and endophytic microbial communities and variation therein across the native and non-native ranges and smaller scales has not been characterized. It is unknown what commonly the most abundant microbial endo-and epiphytic taxa are and whether A. vermiculophyllum hosts core microbiota (i.e., a subset of taxa that is persistently associated, independent of the geographic origin; Shade & Handelsman, 2011). Characterizing the putative core microbiota and how associated microbial communities vary across scales is crucial to identifying the ecological processes and symbionts that are relevant to the invasion process.
To this end, we collected individuals from 14 populations and characterized microbiota using high-throughput amplicon sequencing of the prokaryotic 16S-V4 and the eukaryotic 18S-V7 ribosomal DNA regions aiming to answer the following two primary questions: (a) Do microbial communities differ between the algal surface (epibiota) and the internal tissue (endobiota) microenvironments? and (b) how do epiphytic communities vary across geographic scales?
Furthermore, we expanded subsets of the sampling with a second year and the collection of haploid individuals, to address two additional questions: (a) How do microbial communities vary between years within populations? and (b) do microbial communities differ among host ploidies and sexes (sensu Hughes & Otto, 1999)?

| Sampling design
Since Agarophyton vermiculophyllum spores require hard substratum to settle and germinate and adults lack the ability to produce a new holdfast (Krueger-Hadfield et al., 2016), we specifically collected individuals that were fixed to hard substratum indicating spore recruitment to avoid sampling fragments recently separated from the same individual or clonally reproduced drifting individuals. To avoid confounding possible effects of life cycle stage (i.e., ploidy and/or sex) with other effects related to the geographic scales, we identified life cycle stages and compared microenvironments and geographic scales using only diploid individuals, processing six diploids per collection site. To sample groups that can be considered populations, only sites where individuals that were presumably sexually reproductive were visited (i.e., sites known to include diploids and haploid females and males fixed to hard substrata; Krueger-Hadfield et al., 2016). We selected the collection sites such that the overall design was organized into five nested scales, which in ascending hierarchy were: (a) Individuals; (b) populations; (c) ecoregions (as defined in Spalding et al., 2007); (d) continental coasts; and (e) ranges (native and non-native; Figure 1). To distinguish local from regional effects, all populations were sampled in pairs, separated by approximately 100 km. Three ecoregions from the Asian coast (Yellow Sea, Oyashio Current and Northeastern Honshu), two from the European coast (North Sea and Celtic Seas) and one from the Atlantic and Pacific coasts of the North American continent (Virginian and Northern California, respectively) were visited. The highest hierarchical scale (i.e., range) included one native and three non-native continental coasts ( Figure 1).

| Collection and sample processing
Agarophyton vermiculophyllum specimens were collected from 14 September to 21 September 2016 and 24 August to 23 September 2017. At each site at least 10 individuals and three water samples of 50 ml were taken. Algae were sampled with gloves, directly stored in new and separate freezer storage bags in a cooling box until arrival F I G U R E 1 Geographic scales included in the sampling of this study: ranges (native and non-native), coasts, ecoregions and populations. Sampled ecoregions are in blue (native) and orange (non-native) and labelled with numbers corresponding to the biogeographic framework from Spalding et al. (2007). The insets are ordered by coast and show the sampling sites (labelled as in Table S1) inside the corresponding ecoregions. Sites from where only epibiotic samples were taken are labelled in green and those from which also endobiotic samples were acquired are green-red. Populations collected in both 2016 and 2017 are indicated by squares. Symbols in the lower right corner indicate which ploidies and sexes (⊕, ♀, ♂) were collected [Colour figure can be viewed at wileyonlinelibrary.com] at the laboratory where they were immediately stored at 4°C until processing. All processing occurred within a maximum of 12 hr after collection.
Then, 1 g of each alga was taken from the terminal part of the thallus (including branches with apices) and transferred to a 50 ml tube containing 15 sterile glass beads of 4 mm in 7.5 ml of artificial seawater. The artificial seawater was prepared from sterile distilled water and 24 g/L NaCl. To separate the epiphytic community, we followed the same method as Saha et al. (2016) with slight modifications. Tubes were placed on a vortexer for 3 min after which the water was filtered through a 0.2 µl PCTE filter with a vacuum pump.
Subsequently, a new volume of 7.5 ml sterile distilled water was added, vortexed for 3 min and filtered through the same filter. Water samples were filtered directly. Filters were stored in absolute ethanol until DNA extraction. From the remaining algal tissue, a small fragment (2-4 cm) was preserved in DESS solution (25% DMSO, 2.5M EDTA and NaCl saturated) for DNA extraction of endobiota.
One additional fragment was preserved in silica gel for ploidy identification. Diploids were identified morphologically with a dissecting microscope before sample processing or by post hoc microsatellite genotyping, following the amplification and genotyping methods described in Krueger-Hadfield et al. (2016). In brief, DNA was extracted from the thalli preserved in silica and ploidy was determined using 10 polymorphic microsatellite loci (Kollars et al., 2015). We considered an individual diploid if at least one locus was heterozygous (Krueger-Hadfield et al., 2016). Based on either morphological or microsatellite identification of life cycle stage, six diploids were selected for amplicon sequencing. In order to test differences in associated microbiota among life cycle stages, we collected at least 30 individuals from the four North American populations and identified diploids, haploid females and haploid males based on the presence of tetrasporangial sori (diploid tetrasporophytes), cystocarps (female gametophytes) or spermatangial sori (male gametophytes) using a dissecting microscope (see Krueger-Hadfield, Stephens, Ryan, & Heiser, 2018). From these sites, six diploids and five of each haploid stage (i.e., sexes) were processed for characterizing microbial communities. Finally, from the four populations collected in 2016, ploidies and sexes were not identified (morphologically nor molecularly) and the microsatellite method was unsuccessful for one of the  (Table S1).

| DNA extraction and sequencing to characterize microbial communities
In the laboratory, ethanol was evaporated from preserved filters with a vacuum centrifuge at 30°C for 1-2 hr. The tissue fragments were taken from the DESS solution and rinsed two times with DNA free water. We note that it is possible that certain firmly attached epibiota could have remained associated with the surface throughout this treatment and that differences between endo-and epiphytic communities may be underestimated. DNA was extracted with the ZYMO faecal/soil microbe kit (D6102; ZYMO Research) following the manufacturer's protocol. The 16S-V4 region was amplified with the primers 515F (S-*-Univ-0515-a-S-19) and 806R (S-D-Arch-0786-a-A-20; see Klindworth et al., 2013) and 18S-V7 region with F-1183mod and R-1443mod (Ray et al., 2016). To prepare amplicon libraries, we applied the two-step PCR strategy from Gohl et al. (2016), using the KAPA HIFI HotStart polymerase (Roche) and the same indexing primers. The procedure for the 18S-V7 fragment was identical, except for the target primer sequences. Amplicon concentrations from the second PCR were estimated from gel pictures and amplicons were pooled accordingly into 16S and 18S libraries which were purified by a gel extraction step, combined in a 5:1 ratio and sequenced at the Max-Plank-Institute for Evolutionary Biology on the Illumina 2 × 300 MiSeq platform. Libraries included three negative controls from the DNA extractions, three negative PCR controls and two mock communities as positive control (HM-782D, Bei Resources). The fastq files were demultiplexed (0 mismatches) after which sequence assemblage and quality filtering was performed using mothur software (version 1.40.5; Schloss et al., 2009) and the silva alignment (Quast et al., 2013 release 132) to denoise and classify the full set of reads. Unique sequences were clustered into OTUs with the opticlust algorithm based on the traditional 3% dissimilarity criterion. Mitochondrial, chloroplast, eukaryotic and unclassified sequences were removed from the 16S data set. We also removed sequences from the 18S data set that were unclassified, classified as Bacteria, Archaea, higher plants and the genus of the host (Gracilaria in the silva 132 release). Since we were primarily interested in epiphytes (that is, sessile taxa growing on the surface of the alga) and sessile animals are not known as prevalent epifauna of A. vermiculophyllum (Nyberg, Thomsen, & Wallentinus, 2009), sequences classified to Animalia were also discarded from the 18S data set. While in the present work referred to as the eukaryotic microbiota, this constituent of the community also contains many multicellular epiphytes. Finally, OTUs that were singletons in the full data set and samples with less than 1,000 16S sequence counts and samples with less than 100 18S sequence counts were also excluded from downstream analyses. Raw demultiplexed amplicon reads and metadata were deposited in the sra database (accession: PRJNA564581).
Since the package was not yet compatible with the silva 132 release at the time of analysis, OTUs were reclassified in mothur using the silva 123 release. Aiming to characterize general functional groups within the communities we selected genes essential to autotrophy (ribulose-1,5-biphosphate carboxylase oxygenase; K01601), aerobic heterotrophy (cytochrome c oxidase subunit III; K02276) and nitrogen fixation (nitrogenase iron protein; K02586). We also included anaerobic heterotrophs as a functional group by combining the mean gene count values for sulfate reduction (adenylylsulphate reductase; K00394), denitrification (methane/ammonia monooxygenase; K10944), acetogenesis (formate-tetrahydrofolate ligase; K01938) and fumarate respiration (fumarate reductase; K00244), which we presumed constitute the majority of anaerobic respiration metabolisms.

| Data analysis
For a total of 273 samples the 16S-V4 and 18S-V7 regions were sequenced and the models explained below were applied to both data sets. Importantly, to adjust for the effect of differences in sequencing depth, inherent to amplicon sequencing based community data, we included the natural logarithm of the sequencing depth (LSD) as a continuous variable in all multi-and univariate models. To reduce complexity and computation time, data sets were trimmed for all multivariate analyses to the 95% most abundant OTUs. Sequence counts were expressed in proportion to the total sample count (relative counts) and sorted in descending order after which cumulative percentages were computed. To compare community abundancebased compositions, multivariate generalized linear models (mGLMs) were fitted with the manyglm() function from the r package mvabund (Wang, Naumann, Wright, & Warton, 2012;. The mGLMs assumed a negative binomial distribution with a natural logarithm as link function. In addition to providing a deviance and p-value at the multivariate level (the "community response"), the mGLM computes the same statistics at the univariate level (each OTU) which we used to count OTUs that were differentially abundant across factors of interest (p unadjusted < .05).
Furthermore, by obtaining both uni-and multivariate outputs simultaneously, we could use the mGLMs to compare communities and characterize core microbiota coherently using a single model. To confirm whether mGLMs satisfied the distribution and mean-variance assumptions, QQ-plots and residuals versus fitted-plots were visually inspected.
To address the first question-whether microbial communities are different among microenvironments-we tested the alternative hypothesis that microbial abundance-based community composition varies among microenvironments. Since mixed models are currently not available in the mvabund package for mGLMs, we used a two-step modelling approach. To first adjust the data for geographic effects we included the factor "population identity" in the first model along with the LSD. On the residuals of this first model we ran a second model with microenvironment (including water, surface and tissue) as response variable and a Gaussian distribution (the residuals from this first model were in the link function and therefore normally distributed). The p-value of the overall effect of microenvironment was obtained using the ANOVA.manylm() function, using 500 bootstrap iterations. We obtained the between-group comparisons and corresponding p-values of the levels within microenvironment by running summaries with different reference levels using the summary. manylm() function and also 500 bootstrap iterations (see Table S2a for the full output). Multivariate dissimilarities were visualized with nonmetric dimensional scaling (nMDS, with the r package vegan; Oksanen et al., 2013), using the raw data and rescaled residuals from the mGLMs.
OTU richness, evenness and predicted functionality were compared with mixed linear models using the lme4 r package (Bates, Mächler, Bolker, & Walker, 2015). Since richness and evenness are the variables of which diversity is essentially a function, we used both parameters as indicators of structural differences in communities. We calculated asymptotic richness (S chao , based on the chao1 estimator, Chao & Bunge, 2002) and rarified richness (S n ). For S n all samples were rarified to the read count of the sample with the lowest sequencing depth (1,008 for prokaryotes and 110 for eukaryotes). From a strictly theoretical point of view, both S chao and S n are expected to be independent of the sequencing depth (but the accuracy of the asymptotic estimate should increase with more reads) and are not necessarily correlated. We used the probability of interspecific encounter (PIE; Hurlbert, 1971) as a measure of evenness (McGlinn et al., 2018). To compare functional group abundances between microenvironments and variation in group abundances across geographic scales the same model structures were used. LSD and microenvironment were included as fixed variables and population and individual identity as random variables. To meet the assumption of normality PIE was logit transformed and the gene counts representing the predicted functional group abundances were log transformed. Assumptions on the distribution were verified and p-values for differences among water, epi-and endobiota were obtained by post hoc Tukey HSD tests (see Table S2b for the full statistical output). Marginal and conditional R 2 values (variation explained by fixed effects and fixed plus random effects, respectively) were computed with the r.squaredGLMM function from the lme4 r package (Nakagawa & Schielzeth, 2013).
To characterize how variation in epibiotic communities is partitioned across geographic scales, we used the subset of data representing diploid algal surface samples, comprising all fourteen populations. We fitted mGLMs including the LSD and the nested structure of spatial scales (range/coast/region/population) as explanatory variables. OTU richness, PIE and predicted functional group abundances were analysed using mixed linear models ( Table S2c).
The LSD was included as fixed effect and the nested structure of spatial scales as random effect, resulting in the following model structure: ~LSD + (1|range/coast/region/population), based on the approach used by Messier, McGill, and Lechowicz (2010). We extracted variance estimates and calculated the corresponding 95% confidence intervals by bootstrapping the models with 1,000 iterations. Multivariate GLMs with the same model structure were also fitted on subsets of the data containing only the water samples to verify the assumption that environmental microbiota vary across geographic scales (see Table S2a).
The four populations collected in both 2016 and 2017 were first analysed with an mGLM. The lowest spatial scale (population identity) was included to represent spatial effects, under which the year was nested. Multivariate distances were visualized in dendrograms using a hierarchical clustering approach for which we used the residuals from an mGLM with LSD as single response variable to adjust for the effect of sequencing depth. The clustering was conducted with the r package pvclust, resampling with multiscale bootstrapping and 1,000 iterations (Suzuki & Shimodaira, 2006). The nested structure was specified as random in univariate models to analyse how much variation was explained by the year after explaining the geographic variation (see Table S2d).
To answer the question whether communities vary among host ploidies and sexes we tested the alternative hypothesis that different lifecycle stages host different microbial communities, using the four North American populations, for which all lifecycle stages had been collected. The mGLMs included the variables LSD, microenvironment (from which the water level was removed), life cycle stage and population identity. Based on the lowest AICsum we did not consider the interaction between microenvironment and lifecycle stage in the mGLMs for both pro-and eukaryotic data sets. In the univariate models "population" and "individual identity" were specified as random variables while "LSD", "microenvironment" and "lifecycle stage" were fixed. As the interaction between microenvironment and lifecycle stage yielded a lower AICc for richness estimates, we retained the interaction in all univariate models (see Table S2e for statistical output and Table S2f for model comparisons).
Since the concept of cores is somewhat flexible (Shade & Handelsman, 2011), we characterized core microbiota based on two approaches; applying an occurrence criterion (occurrence core) and a model-based composition criterion (composition core). Both approaches were performed on the subset of our data containing the six populations for which tissue and surface samples were available. We considered OTUs detected in strictly 100% of the water, algal surface or tissue samples as members of the water, epi-and endophytic occurrence cores. Those present in 100% of both tissue and surface samples were considered as part of the algal core. The composition core was determined based on the univariate results of the mGLMs applied on the same data set to compare microbial communities between microenvironments. As these mGLMs were run on data that had been adjusted for spatial effects (by using the residuals from mGLMs including population identity as explanatory variable) we considered OTUs that were significantly more abundant (p unadjusted < .05) to be microenvironment specific and geographically independent. OTUs fulfilling this criterion were identified using the coefficients and p-values from the mGLM summaries and classified accordingly to either water, surface, tissue or alga (no difference between surface and tissue but in both more abundant than in the water).
We explored correlations between OTUs within the holobiont following the joint modelling approach (Warton et al., 2015). Under the assumption that the residuals have been adjusted for the effects of microenvironments (water, surface and tissue) and population identity (and all physicochemical variables confounded in them), these correlations would represent the variation driven by biological variables and could thus hint at patterns of co-occurrence or mutual exclusion among microbes. The untransformed residuals of pro-and eukaryotic mGLM models (normally distributed in the scale of the link function) were concatenated and all Pearson correlation coefficients and corresponding p-values were computed between the 25 most abundant pro-and eukaryotic OTUs. Correlations with p-values ≥ .05 were discarded.  Figure S1). The removal of low read count samples reduced the number of individuals (initially six) in a few populations (see Table S3).

| Are microbial communities different among microenvironments?
For both pro-and eukaryotic communities, the null-hypothesis that there are no differences in abundance-based community composition among microenvironments was rejected (p pro and p euk < .01). Post hoc comparisons using summaries with different reference levels resolved significant p-values for all pairwise combinations (p pro and p euk < .01). These differences are also apparent in the nMDS plots (Figure 2a,b). Overall, 28.9% and 29.6% of 3,054 and 226 prokaryotic and eukaryotic OTUs were differentially abundant, respectively (Figure 3a). Also for the diversity metrics the null-hypothesis was rejected (p pro and p euk < .01). Asymptotic and rarefied OTU richness of epibiota were higher than of endobiota in both pro-and eukaryotic communities. Post hoc pairwise comparisons showed that in prokaryotic communities, richness of the water and epibiotic communities was similar, while richness of eukaryotic communities was higher in the water than both endoand epiphytic communities (Figure 2c,e, only asymptotic richness shown, see Table S2b for the statistical output). The sequencing depth had a significant effect on S chao in both pro-and eukaryotic communities. Despite S chao being an estimate of the asymptotic richness (species count of the overall theoretical community), this is not unexpected for sequencing based community data (e.g., Smith & Peay, 2014). The sequencing depth had no effect on the rarified richness (p pro = .30, p euk = .73) or the evenness parameter PIE (which is mainly depending on the shape of the rarefaction F I G U R E 2 Differences among microenvironments. (a-b) Nonmetric dimensional scaling (nMDS) plots and effects for water, epi-and endophytic microbial communities, adjusted for the effects of sequencing depth and population identity. Upper insets on the right display nMDS plots based on the raw data and lower right insets on the mGLM residuals also adjusted for effects of microenvironment (ME). (a) Prokaryotic communities. (b) Eukaryotic communities. Effects from univariate models are displayed with 95% confidence intervals for prokaryotic S chao (c), prokaryotic logit PIE (d), eukaryotic asymptotic S chao (e), eukaryotic logit PIE (f) and predicted abundance of functional groups (g-j). Effects are labelled with letters to indicate significant differences (p < .

| How do epiphytic communities vary across geographic scales?
Differences in epiphytic community composition were found across all geographic scales (p range = .03, p coast = .03, p region < .01, p population < .01 for the prokaryotic communities and p range < .01, OTUs at lower spatial scales (Figure 3b). A similar pattern can be observed from nMDS plots ( Figure S2) where between-level differences are more pronounced at lower scales. Also across scales the sequencing depth was only significant for S chao (p LSD < .01 in both pro-and eukaryotic models) and explained considerable variation (47.0% in prokaryotic and 27.9% in eukaryotic communities).
In line with the multivariate results, the smallest geographic scale (population) was the most important, explaining variation ranging from 27.5% prokaryotic S chao to 61.7% in prokaryotic S n (Figure 4).
The highest scale (range) was estimated to explain a minor fraction of variation in prokaryotic S chao and S n (4.59% and 4.16%, respectively). For eukaryotic S n , coast was estimated to explain 18.1% and range to explain 2.94% in eukaryotic PIE. Most variation in predicted functional group abundance was explained at the population level with percentages ranging from 25.2% (phototrophs) to 60.7% (anaerobic heterotrophs; without variation explained by LSD). Predicted autotroph and diazotroph abundances also varied at the highest hierarchical level (range) and explained 42.1% of the variation in autotrophs and 33.0% in diazotrophs (Figure 4). Within the water, pro-and eukaryotic microbiota varied compositionally at all scales (p < .01) and this was most pronounced at the population level where a number of OTUs (106 pro-and 11 eukaryotic) differed significantly (Table S2c, Figure S2i).

| How do microbial communities vary between years?
The mGLMs resolved significant p-values for both population (representing the geographic effects) and year (in all cases p < .01).
However, in prokaryotic communities geographic effects were relatively more pronounced with 64.9% of 2,849 OTUs differentially abundant among populations compared to 14.0% between years. In explained by year within population was little to none (S chao = 0%, S n = 4.2% and PIE = 5.7%). In eukaryotic communities, however, more variation was explained by the interannual changes ( Figure S4; Table S2e).

| Are microbial communities different among host ploidies and sexes?
In the mGLMs for prokaryotes and eukaryotes life cycle stage explained the least deviance of all included variables and was only significant for eukaryotic communities (p eukaryote = .02). However, while in this case the alternative hypothesis was not rejected, post hoc comparisons did not resolve any significant pairwise differences (see Table S2a). The hypothesis that S chao , S n and PIE differ among life cycle stages was in all cases rejected. The interaction between life cycle stage and microenvironment was not significant either in any of the models ( Figure S5).

| Core microbiota and correlations
For each of the microenvironments (i.e., water, surface and tissue) a number of prokaryotic OTUs was present in 100% of the samples from the six populations from which both endo-and epiphytic samples were sequenced (Table S4). Based on the occurrence approach we counted two endophytic, five epiphytic, seven algal (present in 100% endo-and epiphytic samples) and two water core OTUs, which were all prokaryotes. The composition core counted 141 en-  Figure S6).

| Endo-and epiphytic microbial communities
The tissue and surface of Agarophyton vermiculophyllum harbour specific communities that differ in terms of composition, diversity and predicted functionality from each other and from the surrounding water. This is perhaps not surprising as distinct endo-and epibiota were also recognized in green algae (Aires, Moalic, Serrão, & Arnaud-Haond, 2015; Morrissey et al., 2019). Overall, epiphytic communities are characterized by higher richness (two-fold or more) and evenness compared to endophytic communities. The decreased richness and evenness in the tissue compared to the surface may reflect that the surface is more exposed to the environment and more prone to community coalescence (Rillig et al., 2015). Further, chemical control by the host might be higher within the tissue and the biofilm on the algal surface could act as a biotic filter that prevents potential symbionts from reaching the endophytic microbial niche. Although richness and evenness are merely diversity parameters, both have been associated with functional community properties, including productivity (Bell, Newman, Silverman, Turner, & Lilley, 2005) and community stability (Coyte, Schluter, & Foster, 2015;Wittebolle et al., 2009). Differences in the predicted functional groups between tissue and surface suggest that endophytic communities are more autotrophic and diazotrophic, while epiphytic communities are more heterotophic (anaerobically). It is noteworthy that one of the most abundant OTUs, which was also a member of the algal core in both the occurrence and compositional approaches, was classified to the Pleurocapsa (Cyanobacteria; Figure S1a). Speculatively, such cyanobacteria might be the main group of potentially autotrophic endophytes and may even be able to fix inorganic nitrogen.

| Core microbiota
Within the high variability of microbial communities, core microbiota (or core microbiomes) represent a signature of stability. Different concepts are used to define microbial community cores based on occurrence, composition, persistence or connectivity (see Shade & Handelsman, 2011). For occurrence cores, a range of percentage thresholds has been applied (e.g., 12% to 100%, Astudillo-García et al., 2017), and OTUs may be collapsed into higher taxonomic levels. We applied a highly conservative 100% occurrence threshold without collapsing OTUs. Since the caveat of an occurrence core is that the probability of detecting core taxa is directly correlated with abundance we also defined a composition core with a model based approach, using the univariate results of the mGLMs comparing composition between microenvironments. With this approach we detected substantially more core taxa (303 compared to 14 associated to tissue, surface or alga). While this may be less strict compared to the occurrence criterion, the model controls for effects of sampling location and then identifies the core based on increased abundances of these OTUs endophytically, epiphytically or in the alga compared to the water, thus representing a geographically stable signature within the high amount of variation.
Despite the considerable geographic distribution, with both occurrence and composition based criteria a number of core taxa was detected. This contrasts somewhat with recent studies on seaweed associated microbiota in the green algal genera Caulerpa and Ulva (Burke, Steinberg, Rusch, Kjelleberg, & Thomas, 2011;Roth-Schulze et al., 2018; but see Arnaud-Haond et al., 2017). In these algae functional-but not taxonomic-cores were detected and it was suggested that the assemblage of microbial communities in seaweeds might be shaped by stochasticity and abiotic environmental filtering, selecting for functions rather than specific taxa (Burke et al., 2011;Roth-Schulze et al., 2018). In contrast, a recent study comparing microbiota associated with the kelp Ecklonia radiata showed that the microbial community composition was more dependent on the host-condition rather than the local and regional environments . For red algae, taxonomic and functional cores have, to our knowledge, not been characterized, nor have associated microbiota been compared across multiple geographic scales. Our results may reflect that the environment (i.e., a composite of the host and the local environment) filters taxa more strongly in A. vermiculophyllum than in green algae that have been studied previously. However, function and taxonomy are correlated and the filtering of specific functions might indirectly also filter for certain taxa and vice versa.
The detected core microbes are either globally available taxa or represent associations that predate the invasion process and originate from the native range. The cointroduction of microbes with specific functions in the holobiont is thought to be an important facilitator of successful invasion (Arnaud-Haond et al., 2017;Rodríguez-Echeverría, Le Roux, Crisóstomo, & Ndlovu, 2011). Only few of the detected core microbiota were found in the water column and A. vermiculophyllum's core may thus largely constitute co-introduced microbes. The ability of an invasive seaweed to maintain specific taxa with beneficial functions might provide an advantage over competitors and perhaps even protect against disadvantageous microbiota from the environment that are unable to settle in the already populated niche. As shown in Saha and Weinberger (2019), A. vermiculophyllum is associated with cultivatable microorganisms that provide it with protection from microbial pathogens. At the same time the host produces metabolites that selectively promote the settlement of these protective bacteria. Such mechanisms may be general and directed at taxonomic or functional groups or be specific and targeted at microbes that are part of the core microbiota.
However, to test such hypotheses it would be necessary to disentangle the filtering by different environmental components from one another (i.e., that from the local abiotic and biotic environments and that of the host itself).

| Variation in epibiota across scales
Given that previous studies indicated that interactions between host and epibiota might have been involved in the invasion process of A. vermiculophyllum (Saha et al., 2016;, we were specifically interested in epiphytic communities from a geographic perspective. While microbial communities have been studied across local and regional scales in other seaweeds (e.g., Campbell et al., 2015;Lindström & Langenheder, 2012;Roth-Schulze et al., 2018), in this study we also compared variation in microbiota across more global geographic scales, reaching to that of a hemisphere. To a certain extent our results are in line with previous studies Roth-Schulze et al., 2018), which suggest that seaweed microbiota strongly depend on the local scale as we found for both pro-and eukaryotic data sets that most variation in OTU richness and evenness is explained among populations. In addition, abundances of all considered predicted functional groups varied mostly at the local scale. However, some functional groups (autotrophs and diazotrophs) varied also substantially between ranges and the overall community composition differed among levels of all geographic scales. This indicates that epibiotic communities, in terms of composition, diversity and function, are mostly defined at the local scale, but also shows that processes acting at larger scales contribute to shaping the microbial community. What these processes are and from where they originate (the environment, the host or an interaction between environment and host) is important to identify as they may represent general mechanisms of relevance Therefore, seasonal stability in the structure of the associated community might be considerably lower for eukaryotic epibiota and the presence of these taxa may be less specific and more sensitive to stochasticity and local environmental conditions. The results from Lachnit et al. (2011) and this study emphasize the need to also consider temporal scales. Important environmental variables such as temperature that can strongly impact microbial communities (Sunagawa et al., 2015) vary both at geographic (e.g., with latitude) and temporal scales (e.g., with season). However, the effect of temperature on microbial communities is scale-dependent and may be unambiguous (Nottingham et al., 2018) or not (Hendershot, Read, Henning, Sanders, & Classen, 2017). In the case of holobionts it is also important to identify processes originating from the host that affect diversity and composition of associated microbiota. To identify, disentangle and classify the importance of these abiotic and biotic processes, more scales must be considered and microbial communities should be studied along natural gradients in for example temperature, salinity and latitudinal space.
Agarophyton vermiculophyllum life cycle stages are characterized by morphological and ecological differences (Krueger-Hadfield et al., 2016). For the haplodiplontic life cycle to be stabilized over evolutionary timescales, ecological differentiation is essential (Hughes & Otto, 1999). However, we found no differences in asso-

| Holobiont and invasion biology
Communities are structured by processes that are scale dependent (McGill, 2010 and references therein). In the case of communities associated with a holobiont these processes may originate from the host, the local environment in which the host resides or the combined environment of host and local conditions. Evidently, a microbial community that is benign, or even beneficial, contributes to the health and success of the host. An ability to exert a certain degree of control over the associated community could therefore be a valuable strategy. Adaptations to associate specific beneficial symbionts could have occurred over time and if such (core) microbes accompany the host to non-native ranges, they may provide an advantage and promote invasiveness (i.e., accompanying mutualist hypothesis; Rodríguez-Echeverría, 2010). Given the tremendous environmental variability of microbes it might be advantageous to adapt simultaneously a more flexible strategy in associating beneficial microbial taxa and invest energy in mechanisms targeting more general groups (generalist host hypothesis; Rodríguez-Echeverría, 2010). While maintaining essential associations by hosting certain microbes as core members, a high degree of flexibility towards microbial associations may thus be of additional benefit for an invasive species. Arnaud-Haond et al. (2017) identified core microbiota from the invasive green alga Caulerpa taxifolia which they showed were associated with (or had accompanied) the host across in native and non-native ranges. Our data show high variability at the local scale and suggest that A. vermiculophyllum is a generalist host. However, the substantial number of core microbiota we found implies that this alga was also accompanied over the course of the invasion. Agarophyton vermiculophyllum may therefore qualify for both a generalist and accompanied host.
Although originally based on legumes and rhizobial symbionts, Rodríguez-Echeverría et al. (2011) classified such accompanied generalist hosts as species with a high probability to invade, and with a typically fast invasion process. Given its current wide latitudinal distribution ranges along the non-native coasts and in some cases rapid regional expansion (e.g., Byers et al., 2012;Nyberg et al., 2009), the invasion process of A. vermiculophyllum may well have been accelerated by accompanying symbionts. However, to characterize the functions of these core microbiota and to assess whether they have promoted the invasion, the A. vermiculophyllum holobiont needs to be studied in an experimental context.
In conclusion, this is the first study which addresses seaweed microbiota with a sampling design at the scale of the known distribution range. The results presented here show that A. vermiculophyllum associated microbial communities differ within the individual between surface and tissue in composition, diversity and predicted functionality. Our data also reveal that A. vermiculophyllum hosts microbial core taxa, some of which are specific to the tissue or the surface, while others are associated to both niches. These taxa are conserved across the distribution range and may represent functionally important symbionts. Variation in epibiota is compositionally partitioned across all local and global scales, but mostly at the local scale (i.e., among populations), at which OTU richness and evenness varied most substantially. At the scale of the invasion (i.e., native and non-native ranges), however, differences in community composition and predicted functional groups could suggest that a shift in epibiota occurred over the course of the invasion process. Finally, with microbiota that are locally highly variable and a geographically conserved core A. vermiculophyllum matches criteria of both the generalist host and the accompanying mutualist hypotheses, which may in part explain its remarkable success as an invasive species.

ACK N OWLED G EM ENTS
We thank Caitlin Cox and Mike Crowley at the Heflin Center for Genomic Sciences (University of Alabama at Birmingham) for use of the capillary sequencer for fragment analysis. We are also grateful to Nadja Stärck and Anna Neu for support during the fieldwork. This study was funded by grants from the Deutsche