Host genomic variation shapes gut microbiome diversity in threespine stickleback fish

ABSTRACT Variation among host-associated microbiomes is well documented across species, populations, and individuals. While numerous factors can contribute to this variation, understanding the influence of host genetic differences on microbial variation is particularly important for predicting co-evolutionary dynamics between hosts and their microbiota. Functional understanding of host genetic and microbial covariation is also of biomedical relevance, for example, providing insights into why some humans are more susceptible to chronic disorders like inflammatory bowel diseases. Unfortunately, disentangling the relative contribution to microbiome variation of host genetics from the environment has been difficult, particularly in humans where confounding environmental effects cannot be completely controlled experimentally. While isogenic laboratory models can be used in controlled environments, the effects on microbiomes of induced large-effect mutations may not recapitulate those of genetic variation observed in nature. Few studies have tested for the direct influence of natural host genetic variation on microbiome differences within a highly controlled environment in which hosts interact freely. To fill this gap, we performed a common garden experiment using families of genetically divergent populations of threespine stickleback fish—an outbred model organism commonly used for determining the genetic basis of complex traits in the context of natural genetic variation. Using germ-free derivation of divergent lines and hybrids between them in this experimental framework, we detected a clear, positive association between stickleback genetic dissimilarity and microbiome dissimilarity. With RAD-seq data, we identified regions of the genome that contributed most significantly to this relationship, providing insight into the genomic architecture of gut microbiome variation. IMPORTANCE A major focus of host-microbe research is to understand how genetic differences, of various magnitudes, among hosts translate to differences in their microbiomes. This has been challenging for animal hosts, including humans, because it is difficult to control environmental variables tightly enough to isolate direct genetic effects on the microbiome. Our work in stickleback fish is a significant contribution because our experimental approach allowed strict control over environmental factors, including standardization of the microbiome from the earliest stage of development and unrestricted co-housing of fish in a truly common environment. Furthermore, we measured host genetic variation over 2,000 regions of the stickleback genome, comparing this information and microbiome composition data among fish from very similar and very different genetic backgrounds. Our findings highlight how differences in the host genome influence microbiome diversity and make a case for future manipulative microbiome experiments that use host systems with naturally occurring genetic variation.

M ulticellular organisms harbor and interact with a diverse array of microbes on and in their bodies (1)(2)(3).Commonly, these relationships are mutualistic or commen sal, with microbiome constituents helping to maintain host health as well as prevent pathogens from successfully colonizing (4,5).Still, relationships between hosts and their microbiota are complicated because whether particular community members are pathogenic can be context dependent.The context under which microbial communi ties reach dysbiosis can be environmental or be dependent on the host's genotype.For example, chronic inflammatory responses are possible when the host's immune system interprets one or more common symbionts as pathogenic, potentially leading to common conditions such as inflammatory bowel diseases (6, 7) in humans.Understand ing how the underlying structure of host genomic variation affects mechanisms like the host immune response, and in turn shapes the microbiome, is a major research priority for host-microbe biology.
Substantial variation in microbiome composition has been documented among many different host species, populations, and individuals (8)(9)(10)(11).Still, the genetic and molecular mechanisms underpinning these differences have been difficult to study.One particular challenge is isolating the influence of host genetic variation on the microbiome from other factors such as shared environmental influences.Elucidating the relationship between host genetics and microbiomes is essential for understanding the causes of variation, as well as population-specific disease states and co-evolutionary dynamics between host populations and those of their resident microbes (12)(13)(14).A specific question, for example, is whether greater genomic dissimilarity among individual hosts of the same species leads to greater dissimilarity in their microbiomes (15,16).Similarly, are there specific genomic regions that contribute to microbiome variation, and what is the distribution of their effects on the structure of the microbiome?
Disentangling confounding factors from host genetic influence on the microbiome has proven challenging.Microbiome variation across human populations has been documented previously, pointing to a possible role for host genetic differences (12,17).However, in most studies of human populations, and depending in part on the study design [e.g., genome-wide association (GWA), twin studies, etc.], covariance between host genetic and environmental factors is variably difficult to measure and account for (16)(17)(18)(19)(20). Furthermore, human GWA studies have also been biased by limited sampling of global populations that fails to completely sample host genetic diversity (21,22).
Work in model organisms has helped address some challenges as environmen tal factors can often be more easily controlled.Common garden designs, in which replicated studies are performed in a common controlled environment, provide a particularly powerful approach for isolating and quantifying the influence of host genetic vs environmental factors (23,24).These common garden designs have been frequently employed for microbiome studies in plants (25,26), yielding insights into both genetic influences and genotype-by-environment interactions (27,28).Using similar common garden designs in free-living animal species is more difficult, however, because unrestricted co-housing requires the tracking of individuals.Most previous studies in animals have kept individuals with different genotypes physically separated but generally exposed to the same conditions or have periods of isolation followed by contact between groups (29)(30)(31).Unfortunately, such designs can obscure the genetic effects by confounding them with rearing environments or microenvironments (29,32).
Other factors beyond environmental influence can also be difficult to control in animal models.Parental transmission (vertical transmission) of microbes has often not been controlled because of a lack of gnotobiotic protocols that allow a common set of microbes to be re-introduced (31,33).Genetic variation is also often missing from traditional inbred models which cannot capture the true effects of genetic variation in natural populations on microbiome variation.Sometimes genetic variation has been introduced in such studies through induced mutations, but these have often been of large effect in genes such as those involved in innate or adaptive immunity (34,35).The significant disruption of biological systems caused by these large-effect, induced mutations will more likely create dysbiotic states of microbiomes rather than reveal the influences of natural genetic variation on microbiome differences (36,37).
What is required to better understand the contribution of genetic variation to microbiome differences are studies in animal models that (i) have natural genetic variation that has been previously linked to microbiome differences in the wild, (ii) are amenable to controlled gnotobiotic studies in the laboratory, (iii) can be analyzed using a true unrestricted co-housed common garden design, and (iv) can take advant age of genomic tools to precisely study host genetic effects.Threespine stickleback fish (Gasterosteus aculeatus) have all these characteristics and have proven to be an exceptional model for studying natural genetic variation contributing to other complex phenotypes.This small fish has been widely studied in its natural marine and freshwa ter habitats, can be reared in the laboratory, and has extensive genomic resources.More recently, stickleback has been developed as a model for host-microbe interactions through the development of tools, such as gnotobiosis, for microbial experiments (38)(39)(40)(41)(42)(43)(44).Previous studies have focused either on correlating variation in the microbiome of wild stickleback populations with differences in natural environments or have directly manipulated microbiota in the laboratory (38)(39)(40)(41)43).A gap in work using stickleback is directly assessing the relative contributions of the environment and host genomic variation to microbiome variation.
Here, we fill this gap by using genetically divergent, laboratory-raised populations of threespine stickleback fish-and hybrids between these populations-in a controlledand-replicated common garden experiment (Fig. 1; see Fig. S1 at https://figshare.com/s/ e40c984d26187d4d5fe7).We find evidence of a causal relationship between host genetic and microbiome variation.We also find that the strength of association between host genetic variation and microbiome attributes varies regionally across the genome, providing insight into the genomic architecture of the gut microbiome as a complex trait.Importantly, we characterize this relationship in a broader phenotypic context and document that at least some of the genetic variation associated with microbiome variation is also related to body size, highlighting the need to account for such inter mediate morphological traits in future studies of host-associated microbiomes.

Fish husbandry
We generated eight genetically distinct families of threespine stickleback (Gasterosteus aculeatus) fish derived from wild-caught Alaskan populations previously maintained in the laboratory for at least 10 generations with periodic within-population outcrossing.These included three families originating from the freshwater population Boot Lake (N 61.7167, W 149.1167), three families from the anadromous population Rabbit Slough (N 61.5595, W149.2583), and two F1 hybrid families generated from Rabbit Slough females and Boot Lake males (Fig. 1; see Fig. S1 at https://figshare.com/s/ e40c984d26187d4d5fe7).These families each have their own genetic variants segregat ing in the offspring and vary in their relatedness (Fig. S1).All experimental families were generated on the same day in a 2-h period via in vitro fertilization.Embryos were incubated overnight in 0.1 µM filter-sterilized antibiotic medium containing 1 µL/mL Ampicillin, 0.1 µL/mL Kanamycin, 0.312 µL/mL Amphotericin, and four parts per thou sand (ppt) artificial seawater (Instant Ocean) (Spectrum Brands, Blacksburg, VA, USA) (see Fig. S2 at https://figshare.com/s/11c61208acba7777b117).Fin clips from the parents were saved and flash frozen in liquid nitrogen (LN2) to be used later for DNA-based parentage assignments.All procedures involving animals were in accordance with University of Oregon institutional animal care and use committee (IACUC) protocols.

Common garden experimental design
Embryos (n = 40) from each family (n = 8) were randomly assigned to one of two groups (n = 20 each) and pooled to comprise two replicate groups of 160 embryos (Fig. 1; see Fig. S2 at https://figshare.com/s/11c61208acba7777b117).To remove microbes from the fertilization process and prevent vertical transfer, we followed germ-free derivation protocols for threespine stickleback that have been validated using 16S PCR and fluorescent in situ hybridization (FISH) (38).Specifically, the two embryo pools were surface sterilized using vacuum-filtered (0.1 µM) 0.2% polyvinylpyrrolidone-iodine and 0.003% bleach diluted in sterile embryo, maintained in large petri dishes in an incubator (20°C) for 10 days, then transferred to two 75.71L (20-gallon) tanks on a recirculating water system.Fish were fed a diet combining live brine shrimp nauplii and larval flake food once daily throughout the experiment.At 60 days post fertilization (dpf ), fish were removed from the tanks for dissection following procedures to standardize the dissect ing process (see Supplementary Methods Text S1 at https://figshare.com/s/ ff4e981a6adacaf06ada).An empty Eppendorf tube accompanied the samples through out processing and was later used as a negative control to capture environmental microbial contamination.Some mortality is common in clutches of stickleback embryos, and at the end of rearing, tank replicate 1 contained 69 surviving fish while tank replicate 2 contained 80 surviving fish (see Fig. S2 at https://figshare.com/s/ 11c61208acba7777b117).

Dissections and sample preparation
After euthanasia, the offspring were imaged using confocal microscopy to capture phenotypic variation including standard length and tail clipped for DNA-based parentage assignment (see Fig. S3 at https://figshare.com/s/351b3938a2d47c5acffa).Tail clips were flash frozen in LN2 and stored at −80°C.Guts were dissected following sterile procedures and tracked to account for introduced microbes (see Supplementary FIG 1 A replicated common garden experimental design enables the controlled measurement of host genetic and environmental influences on the stickleback gut microbiome.We performed eight total stickleback crosses from freshwater (Boot Lake) and oceanic (Rabbit Slough) laboratory populations, including two between-population ("hybrid") crosses.We randomly assigned 40, initially germ-free progeny from each cross to two replicate tanks (20 progeny per family, per tank), and we raised the fish to 60 days post fertilization after a 10-day rearing period in two large petri dishes containing non-sterile embryo media.For the 149 surviving fish, we determined family membership by parentage analysis using RAD-seq genotyping, and we profiled their gut microbiomes using 16S rRNA amplicon sequencing.The two tank-assigned samples are represented here by solid and dashed lines, and the three cross types (Bt, RS, and Hy) are represented by a respective green, blue, and turquoise color scheme, all of which remain consistent in figures throughout this article.
Methods Text S1 at https://figshare.com/s/ff4e981a6adacaf06ada).Guts were then flash frozen in 1.5 mL Eppendorf tubes previously sterilized via UV irradiation in a Stratalinker (Stratagene, San Diego, CA, USA) at 1,000 J/cm 2 and stored at −80°C (see Fig. S3 at https://figshare.com/s/351b3938a2d47c5acffa).The order of processing was random with respect to experimental factors of interest for all samples, throughout dissection, library preparation, and sequencing.

DNA isolation
DNA was extracted from tail clips from offspring and parents using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA) and quantified using the Qubit Fluorometer Broad Range Kit (Thermofisher, Waltham, MA, USA).Sex was determined using PCR amplification of the male-specific GA1 region (45).Gut samples were homogenized for DNA isolation following a two-step homogenization pipeline (see Supplementary Methods Text S1 at https://figshare.com/s/ff4e981a6adacaf06ada), and DNA was then extracted following the Qiagen DNeasy spin column protocol described by Small et al. (42).

Library preparation
Genomic DNA from each tail clip was standardized to 10 ng/µL and digested with the restriction endonuclease SbfI-HF (New England Biolabs, Ipswitch, MA, USA) and was used to generate RAD-seq libraries (46)(47)(48).The uniquely barcoded samples were multiplexed and run in one lane of sequencing on an Illumina HiSeq 4000 to obtain single-end 150 nucleotide (nt) reads.Twelve samples failed during the first round of sequencing and were re-sequenced later in a separate Illumina run following identical methods.
Gut DNA isolations were standardized to 20 ng/µL and submitted to the Univer sity of Oregon Genomics Core Facility (GC3F) for 16S rRNA (V4 region) amplicon library construction and sequencing (paired-end 150 nt reads) in a single lane on an Illumina HiSeq 4000 (see Supplementary Methods Text S1 at https://figshare.com/s/ ff4e981a6adacaf06ada) (42).Also sequenced were three negative control libraries, generated from a no-tissue extraction sample and two no-PCR-template reactions.

Genotyping and parentage assignment
Tail clip sequencing produced approximately 1.65 million reads per fish from the HiSeq 4000.Raw sequence data were demultiplexed by barcode and filtered using the process_radtags program in the Stacks suite (49,50).Retained reads were then aligned using GSNAP (51) to the stickleback reference genome (version BROAD S1) obtained from Ensembl.Genotypes were called using the ref_map pipeline of the Stacks suite (49,50).Filtering was then performed using the Populations package in Stacks with a minimum minor allele frequency of 0.05, a minimum stack depth of 10, and requiring that data be available in 92% of individuals.The resulting 2,400 SNPs were then used for parentage assignment by the maximum likelihood program COLONY version 2.0.6.2 (52).COLONY was run using default settings with allele frequencies set to calculate from data and an error rate of 0.001.

Amplicon sequence variant calling, filtering, and enumeration
We demultiplexed 16S reads and performed quality filtering, merging, denoising, and taxonomy classification using tools from QIIME 2 v2018.8.0 (53), generally according to the methods described by Small et al. (42).Briefly, we used demux (emp-paired), vsearch (join-pairs), quality-filter (q-score-joined), and deblur (denoise-16S) to define amplicon sequence variants (ASVs).To assign taxonomy to the ASVs, we used feature-classifier (extract reads and fit-classifier-naive-bayes) to train a classifier based on the GreenGenes 13_8 99%-clustered OTU database, followed by application of the classifier using feature-classifier (classify-sklearn).We filtered out any remaining ASVs of mitochondrial or chloroplast origin using taxa (filter-table).The ASV count tables and ASV sequences were then exported for further analysis using version 3.6.1 of the R statistical language (54).
Prior to further analysis, we filtered any individual 16S libraries that had particularly low-sequencing throughput as well as ASVs with particularly low representation among samples or those suspected to be from contaminating (non-fish) sources.Specifically, we excluded one ASV (5a743f17db20a09671800518094538a7, classified to family Coma monadaceae) that was in high abundance in the no-tissue negative control library and a likely reagent contaminant, and we excluded ASVs that were present in only one fish library.We conducted a post hoc analysis of other putative contaminants based on ASV frequencies and library DNA mass (see Supplementary Methods Text S1 at https://figshare.com/s/ff4e981a6adacaf06ada) but found no strong candidates that would warrant removal or affect the outcome of any downstream analysis.We also excluded from downstream analysis individual libraries with fewer than 40,000 total ASV counts.In the case of subsequent multivariate and random forest analyses, ASV count data were normalized to account for depth differences among libraries using the cpm function (log = TRUE, prior.count= 0.5) from the edgeR R package (55), whereas the raw counts were used for differential ASV abundance analysis (see below).

Multivariate community analyses
We visualized stickleback gut microbiomes in community space (defined by 16S ASV abundances) using principal coordinates analysis (PCoA) based on Bray-Curtis dissimilar ity, as implemented via the R package phyloseq (56).We also performed PCoA based on a phylogenetic metric of community dissimilarity, weighted UniFrac (57).To obtain the weighted UniFrac dissimilarity matrix, we aligned ASV sequences using AlignSeqs from the R package DECIPHER (58), optimized parameters for a phylogenetic tree via maximum likelihood using pml and optim.pmlfunctions from the R package phangorn (59), and generated the weighted UniFrac matrix with phyloseq's distance function.We performed all visualizations and multivariate statistical tests in parallel, using both Bray-Curtis and weighted UniFrac dissimilarity matrices.
To test hypotheses about contributions of experimental design factors to differences in gut microbiome composition among fish, we first evaluated a series of permutational multivariate analysis of variance (PERMANOVA) models (60) using the adonis2 function from the R package vegan (61).Both visualization and full, strictly additive models including host population, tank, standard length, sex, and dissector identity as explan atory variables suggested negligible influence from sex and dissector, so we tested fully factorial models excluding these two terms for our primary inferences.Due to the family structure (within populations) inherent in our experimental design, we also performed nested PERMANOVA using the nested.npmanovafunction from the Biodiversi tyR package (62) to explicitly test population-level effects on microbiome dissimilarity.Last, we also performed tests to evaluate whether the different host genotypic classes (at both population and family levels) in the experiment showed different degrees of community dispersion (i.e., beta diversity) using vegan's betadisper function.

Random forest classification of host genotype from microbiome data
To test whether gut microbiome structure as quantified by ASV relative abundances provides any predictive potential with respect to host population of origin, we built three random forest (RF) classifiers.The first was trained based on ASV data from tank 1 individuals and applied to tank 2 individuals, the second was based on tank 2 training and tank 1 evaluation, and the third was based on a training set of 80 individuals randomly selected regardless of tank, and application to the remaining 54.We implemented RF models using the randomForest R package and function (63), with the strata and sampsize arguments set to minimize bias arising from class imbalance during training.
To evaluate whether average classification accuracy of the RF models was higher than chance expectations, we performed 999 permutations for each model, across which population labels for the training set were randomly shuffled.The accuracies obtained from the original models with unshuffled training data were compared to the null distributions of accuracies from the permutations to infer statistical significance.

Differential ASV abundance analysis using zero-inflated negative binomial mixed models
We tested for differential abundance of individual ASVs with respect to host population, host size (standard length: SL), tank, and their interactions by fitting zero-inflated and standard negative binomial mixed models, implemented via the R package NBZIMM (64).We deemed ASVs suitable for zero-inflated negative binomial (zinb) models if they had zero counts in at most 80% of the individuals, but in at least 5%.For those ASVs with zeroes comprising less than 5% of the counts, we used standard (nb) models.To fit the models, we also included host family as a random effect, and we summarized the fits from the NBZIMM functions glmm.zinb or glmm.nb,including tests for model terms, using type-II analysis of variance as implemented by the R package car (65).P-values across ASVs for each hypothesis test were adjusted to control the false discovery rate (66).

Genome-wide association of host genetic and microbiome dissimilarity
To understand the relationship between host genetic and gut microbiome dissimilarity in a genome-wide context, we conducted standard and partial Mantel tests (67) with 1,000 permutations and based on individual genetic dissimilarity: (i) calculated from genotypes at all 2,408 RAD-seq loci, and (ii) calculated from maximally overlapping sliding windows of five RAD-seq loci, along each of the 21 stickleback chromosomes.We conducted standard and partial Mantel tests using the mantel and mantel.partialfunctions from the vegan R package (61).
To quantify individual genetic dissimilarity among fish, we calculated the fractional quantity of allelic differences between individuals using the diss.distfunction (with percent = TRUE) from the R package poppr (68).We used community dissimilarity matrices for these tests based on Bray-Curtis and weighted UniFrac metrics, in parallel analyses.To account for potential non-causal associations between genomic region dissimilarity and community dissimilarity arising from linkage disequilibrium, we used partial Mantel tests with total genetic dissimilarities (estimated from all 2,408 loci) as the "control matrix." To account for the contribution of host size (standard length: SL) differences to the relationship between host genetic and gut microbiome dissimilarity, we used partial Mantel tests with size dissimilarity as the control matrix.Size dissimilarity was calculated as the absolute value of the difference in standard length (SL) between two individuals, effectively "Manhattan distance."
Using RAD-seq data generated for the 149 progeny and all 16 of the potential parents, we assigned parentage with 100% confidence to all progeny (see Data Set S1 at https://figshare.com/s/1fe12c74e01708e7cc7d).One Boot Lake family did not contain any surviving progeny in either tank, so we inferred that this family sustained complete mortality at an early stage of development.This was consistent with our monitoring of extra embryos (siblings of the 40 embryos per clutch included in the experiment), which also sustained 100% mortality early in development.Remaining for analysis were approximately uniformly distributed family sizes for two Boot Lake families, three Rabbit Slough families, and two F 1 Boot Lake-Rabbit Slough hybrid families (see Fig. S1 at https://figshare.com/s/e40c984d26187d4d5fe7).

Host genotype, host size, and housing environment influence the stickleback gut microbiome
We found that three factors in our experiment reliably influenced overall microbiome dissimilarity among individual fish: population of origin ("genetic background"), host size, and rearing tank ("environment") (Fig. 2).With respect to Bray-Curtis dissimilarity calculated from 2,306 ASV relative abundances, population explained the most variation (PERMANOVA: R 2 = 0.056; F 2,129 = 4.235; P ≤ 0.001), followed by tank (PERMANOVA: R 2 = 0.046; F 1,129 = 8.219; P ≤ 0.001), and then by standard length (PERMANOVA: R 2 = 0.037; F 1,129 = 5.564; P ≤ 0.001).It should also be noted that in a model includ ing first-order interactions (Fig. 2), we detected a significant effect of the interaction between population and standard length (PERMANOVA: R 2 = 0.028; F 2,124 = 2.148; P = 0.007), suggesting that fish size influences the gut microbiome (or vice versa) but to different degrees depending on the host's genetic background (see Fig. S4 at https:// figshare.com/s/0d139faa133de0a2afde).Beta diversity of the gut microbiome (i.e., dispersion) may differ subtly among the three genetic backgrounds (permutation test: F 2,131 = 3.438; P = 0.034), with Boot Lake individuals being less dispersed, on average, than Rabbit Slough or hybrid individuals (Fig. 2).In contrast, other factors such as sex did not have a significant influence on overall microbiome dissimilarity.Additionally, the identity of the researcher performing the dissections (see Supplementary Methods Text S1 at https://figshare.com/s/ff4e981a6adacaf06ada) had a minor effect on overall microbiome dissimilarity (see Fig. S5 at https://figshare.com/s/422bc5f6f3737c682c51).
To evaluate whether the effect of host genotype in the PERMANOVA models was driven by overall population differences, as opposed to idiosyncratic family effects within populations, we also performed nested analysis based on the Bray-Curtis dissimilarity matrix.We found that population-level variation significantly explained microbiome variation (nested PERMANOVA: F 2,127 = 2.934; P = 0.033), relative to family effects (nested PERMANOVA: F 4,127 = 1.194;P = 0.160).

Genomically dissimilar stickleback hosts exhibit more dissimilar gut micro biota
We calculated pairwise genomic dissimilarity among individual fish, based on genotypes from 2,408 RAD-seq loci.Closely related host pairs, on average, tended to have more similar gut microbiomes as measured by Bray-Curtis community dissimilarity (Fig. 3).We tested for a statistical association between genomic distance and microbial community distance, while accounting for standard length differences, and found evidence for a subtle but statistically significant positive relationship (partial Mantel test: Mantel r = 0.0432; P = 0.011).We found no evidence for such a relationship between host genomic distance and community dissimilarity as measured by weighted UniFrac (partial Mantel test: Mantel r = −0.004;P = 0.580).

Random forest classifiers trained using gut microbiome data predict popula tion membership with less accuracy for fish from hybrid crosses
We applied three random forest models using ASV relative abundances, first by training on fish from Tank 1 to classify population of origin for fish from Tank 2, second by training on Tank 2 fish to classify Tank 1 fish, and third by randomly selecting (regardless of tank) 80 fish for training and 54 fish for testing.For two of these models, total accuracy was significantly better than a null distribution generated by permutation (permutation test; P = 0.04, P = 0.073, and P = 0.004, respectively; see Fig. S6 at https://figshare.com/s/ bc99ee67fb3d46dcfb55).In all cases, the class-wise accuracy was consistently low for F1 (Hy) individuals (0.381, 0.167, 0.333) but higher for Bt (0.429, 0.778, 0.471) and RS individuals (0.517, 0.481, 0.682).
Each model included at least 10 ASVs with relatively high feature importance (see Data Set S2 at https://figshare.com/s/fe7ff5a0127e7fdd3244), which belonged primarily to Phyla Proteobacteria and Actinobacteria, with representation from diverse classes, including Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacte ria, and Bacilli.One ASV (from Genus Agrobacterium) was consistently important for all three models, and several others assigned at least genus-level taxonomy, including Luteibacter rhizovicinus, Pseudonocardia halophobica, Sphingomonas spp., Bacillus spp., and Edaphobacter spp., showed high saliency for at least two of the classifiers (see Data Set S2 at https://figshare.com/s/fe7ff5a0127e7fdd3244).

Individual bacterial lineages are associated with stickleback genetic back ground, rearing environment, and genetic-by-environment interaction
We fit negative binomial or zero-inflated negative binomial mixed models to test whether relative abundances of individual ASVs (315 in total) differed among stick leback populations of origin.Among-individual variation was high at the ASV (see Fig. S7 at https://figshare.com/s/a80fd5df2e3f8c38541f), class (Fig. 4), and phylum (see Fig. S7 at https://figshare.com/s/a80fd5df2e3f8c38541f) levels, but we found statistical evidence of association with population of origin for many ASVs (see Data Set S3 at https://figshare.com/s/1eadd006fa1063417ef2).Among the 20 ASVs with the largest test statistics (for the effect of population), Class Alphaproteobacteria (Phylum Proteobac teria) dominated, with substantial representation also from Class Chlamydia (Phylum Chlamydiae).Many (48%) of the ASVs subject to a significant population effect also showed evidence for an effect of interaction between population and tank (see Data Set S3 at https://figshare.com/s/1eadd006fa1063417ef2).For example, the ASV with the fifth ranking population effect test statistic (assigned to Genus Bradyrhizobium) was in low relative abundance among Boot Lake families, but this effect was more pronounced in Tank 2 (see Fig. S8 at https://figshare.com/s/bcd2033ef549013801a2).Another ASV, assigned to Genus Agrobacterium and mentioned above as demonstrating consistent random forest feature importance, was also subject to population and population-bytank interaction effects but with consistent low abundance in Boot Lake individuals (see Fig. S8 at https://figshare.com/s/bcd2033ef549013801a2).Other ASVs, for instance one assigned to Genus Perlucidibaca, showed strong tank differences without popu lation or population-by-tank interaction effects (see Fig. S8 at https://figshare.com/s/ bcd2033ef549013801a2).

Gut microbiome and host genetic associations vary in strength and mode along the stickleback genome
Based on a sliding window approach using partial Mantel tests to account for overall relatedness and size differences among fish, we identified at least nine regions of the stickleback genome at which genetic dissimilarity is positively associated with gut microbiome (Bray-Curtis) dissimilarity (Fig. 5; see Data Set S4 at https://figshare.com/s/ 4a6a3e22f857861fb981).In a parallel analysis using Weighted UniFrac to quantify community dissimilarity, we identified at least five genomic regions (see Fig. S9 at https://figshare.com/s/540694b8f9a180402fae), and all except one overlapped with the nine strongest candidate regions from the Bray-Curtis analysis.These four intersecting candidate regions lie on chromosomes 1, 14, 16, and 20, in marker position intervals of 1.923, 0.406, 3.077, and 3.372 Mb, respectively.The strength of the relationship is clear between gut community (Bray-Curtis) and host genetic dissimilarity for these regions (Fig. 5; see Fig. S10 at https://figshare.com/s/6f8e5af4c698dd347851) relative to all markers (Fig. 5) and to four randomly sampled regions of the same approximate size (see Fig. S10 at https://figshare.com/s/6f8e5af4c698dd347851).

Controlled common garden experiments are essential for understanding determinants of animal microbiomes
Several important features of our common garden experimental design are useful in addressing previously challenging questions.We subjected genetically variable individu als to identical rearing conditions, with a controlled and consistent diet, truly co-housed and with replication, allowing the disentanglement of host genetic from environmental effects on the gut microbiome.This degree of experimental control has been implemen ted more commonly for plant microbiome studies (69)(70)(71)(72), but it is logistically difficult for studies of free-living animal species.Because animal experiments with unrestricted cohousing require the tracking of individuals via marking or genotyping, a more conven ient alternative has been to keep subjects with different genotypes physically separated but generally exposed to similar conditions (33,41,42,(73)(74)(75).In contrast, we incorpora ted unrestricted co-housing in our current study, an approach taken in only a few other studies (31,34,76).Importantly, external fertilization and early surface sterilization of embryos via gnotobiotic techniques developed for stickleback (38) allowed us to control for vertical transfer of microbes and to create a common conventional starting point for microbial colonization among individuals.Germ-free derivation has also been leveraged to understand host genetic contributions to microbiome attributes in Drosophila (77), dung beetles (78), mice (79), and zebrafish (34), although some of these studies relied on inbred lines or induced mutations, which place limitations on their ability to recapitulate and interrogate the landscapes of genetic variation often observed in natural popula tions.The design of our current study included both crucial elements: genetic variation sampled recently from natural stickleback populations coupled with a highly controlled, standardized environment.Additionally, by using genetically distinct populations of threespine stickleback and their F1 progeny, we broadened the continuum of genetic variation along which to assess patterns of inheritance in both regional genomic and quantitative genetic contexts.

Interactions among host genetic background, rearing environment, and host morphology suggest complex, interdependent drivers of gut microbiome diversity
The primary objective of our study was to test whether host genetic variation deter mines the composition of the stickleback gut microbiome.To this end, we assayed ASV abundance in each fish to compare taxonomic patterns with previously published studies in wild and lab-reared stickleback.Our observation of Gammaproteobacteria as the major class is consistent with previous findings of our laboratory in adult stickleback (38,42), and the laboratories of other stickleback researchers (41,80), and is consistent across wild and lab-reared populations (38).Our primary hypothesis, however, was that host genetic variation would impact variation in the microbiome overall.Here, we provide three lines of evidence that support this hypothesis.First, we document clear partitioning by host population of variation in community space.Second, ASV abundance data predict host population with above-noise accuracy using random forest classifiers.Third, among-individual genetic differences along the host genome accompany microbiome differences.The general strength of this pattern was greater for analyses based on Bray-Curtis dissimilarity relative to those based on weighted UniFrac, suggesting that groups of bacterial lineages with recent shared ancestry may be collectively influenced by host genetics.Interestingly, we also found that relationships between host genetic variation and gut microbiome variation differed depending on other study variables, namely rearing environment (tank) and fish size (standard length).These statistical interactions between host genotype and other factors are not surprising, as other studies of host-microbiome associations have revealed similar patterns.For example, a tightly controlled, recent study of influences on the stickleback gut microbiome revealed that host genotype effects were more pronounced during infection with the cestode parasite Schistoce phalus solidus, relative to controls (76).Additionally, authors of a study in which two Japanese quail genotypes received different cholesterol diets found that diet interacts with host genotype to influence the intestinal microbiome (81).
Our observed connections among stickleback genotype, standard length, and gut microbiome composition highlight an increasing appreciation for the idea that a multitude of heritable host traits likely unrelated to immunobiology sensu stricto make significant contributions to microbiota.Indeed, traits such as body size in killifish and cod (82) and mice (83), and leaf attributes and physiological traits in Picea spruces (84), have been shown to co-vary with host-associated microbiome attributes but without necessarily explaining the totality of host genetic contributions to microbiome variation.In the case of our study, we noted a relationship of varying strength between SL and community composition across the three genotypes and that host genetic variation still explained microbiome variation after accounting for the size differences.markers, a partial Mantel test (accounting for overall genetic dissimilarity) was used to assess the relationship between genetic and microbiome (Bray-Curtis) dissimilarity.The left (black) y-axis and black points indicate the test statistic, and the right (red) axis and solid red lines represent the lowess-smoothed P-value distribution from the statistical tests.Note that the red y-axis is on a log 10 scale, ascending from high to low, to show that especially low P-values commonly accompany windows with large, positive test statistics.Points (marker windows) above the top, dashed line represent regions of the genome especially strongly (i.e., beyond null expectations) associated with microbiome differences.The red, dashed line marks a P-value of 0.05.Blue bands show four "peaks, " window blocks that are consistently significant across frequency-based (Bray-Curtis) and phylogenetic (Weighted UniFrac) metrics of community dissimilarity.These peaks likely correspond to regions of the stickleback genome that influence gut microbiome differences.Alternating white and gray vertical bands demarcate stickleback chromosomes.Plots of host genetic and gut microbiome dissimilarity for a randomly selected region of 15 markers on chromosome 14 (B and C) show a relatively weak association, whereas the 15 markers in peak 4 (D and E, and marked with an asterisk) show a strong, positive relationship.Each point in B and D is colored (see legend in D) according to the population combination reflected by the fish pair.In Plots B and D, points are "jittered" about discrete degrees of genetic dissimilarity (x-axis) to reduce obfuscation.Plots C and E show means and means ± standard errors as crossing lines representing family-wise stratification of the pairwise dissimilarities, with each family pair represented by a text label.
One outstanding question, however, concerns the directionality of potential causal relationships among these variables.Body size or growth function parameters, which show high heritability in stickleback fishes (85,86), may determine which microbes can colonize and/or persist in the gut.Standard length can vary substantially between adults in threespine stickleback populations and is also often used as a metric for determining developmental stage of juveniles (87).As our fish were 60 dpf-still juvenile-it is possible that differences in developmental rate among genotypes could explain our results.
Alternatively, it is conceivable that a fish's gut microbiome, determined in part by its genotype, affects metabolism and ultimately growth to dictate size.In any case, consideration of a host's composite phenotype, including morphological, behavioral, metabolic, and immunological traits, is important when conceptualizing host genetic contributions to the microbiome.Traits like body size within stickleback and other host systems, for example, should be explicitly evaluated and accounted for whenever possible, when the primary interest of studies (e.g., GWAS) is the identification of host genetic microbiome determinants.Importantly, our ability to document the interactions among these variables further highlights the strength of the common garden experi mental design.

Gut microbiome dissimilarity increases with greater host genetic dissimilarity
One important prediction from our conceptual understanding of host-microbe coevolutionary dynamics is that continuous genetic differentiation among individual hosts in a population, particularly owing to differing immunogenetic repertoires, will translate to compositional dissimilarity of the microbiome (16,88).Tests of this prediction for hosts of the same species have been rare, are seldom performed using controlled experiments, and have provided mixed results.For example, an environment-controlled study of Daphnia galeata hatched from different sediment layers (73) tested for an association between genome-wide genetic dissimilarity and microbiome dissimilarity, and it yielded no such relationship.On the other hand, authors of an exome sequenc ing-based study compared wild-caught house mice from five natural populations and did find evidence for a positive relationship between host genetic and gut microbiome dissimilarity (33), as did authors of a microsatellite-based study of wild-caught threespine stickleback from six lake populations (39).
We found a positive relationship between genetic dissimilarity calculated from 2,408 RAD-seq loci across the stickleback genome and Bray-Curtis gut microbiome dissimilar ity, accounting for differences in fish size.This pattern appears to be driven largely by smaller genomic differences and more similar microbiomes, on average, between Boot Lake individuals, and more generally by smaller genetic and microbiome dissimilarities between individuals within, relative to between, families (Fig. 3).Interestingly, commun ity dissimilarity was not greatest for individual pairs with a Boot Lake and Rabbit Sough individual (the most genetically dissimilar pair type in the experiment), suggesting that allelic dominance at host loci may be important for inheritance with respect to microbiome composition.
Not surprisingly, this structural relationship between genetic variation and beta diversity was also borne out by multivariate analyses treating population of origin as a discrete factor.We observed that the gut microbiota of Boot Lake individuals was less dispersed, on average, relative to Rabbit Slough or hybrid individuals.Boot Lake individuals, considered as a group in this experiment, have experienced more inbreeding and exhibit on average lower genetic variation than Rabbit Slough individuals.
Our abilities to predict population membership of fish based on their gut microbiota also varied by genetic diversity within the groups.Random forest classifiers trained using ASV relative abundances were most accurate for fish belonging to pure freshwater (Boot Lake) or pure anadromous (Rabbit Slough) populations.Accuracy declined when predicting membership for F1 individuals from the hybrid crosses.Deeper investigation, including greater sampling of progeny from a more diverse panel of crosses, is needed to better understand what aspects of genetic variation and forms of inheritance (i.e., non-additive) are driving the patterns observed here.

Host genomic regions contribute differentially to variation in the gut microbiome
Our sliding window analyses of the relationship between host genetic and microbiome dissimilarity suggest that several regions of the stickleback genome contribute large effects to gut microbiome variation among individual fish, relative to the genome average.The distribution of effect sizes for the analysis based on Bray-Curtis community dissimilarity is similar to that for an analysis in which we tested for correspondence between genetic dissimilarity and standard length (SL) differences (see Fig. S9 at https://figshare.com/s/540694b8f9a180402fae), suggesting for this metric of community dissimilarity anyway, a comparable genomic architecture in the broad sense.
As was the case for several other analyses; however, measurement of microbiome dissimilarity using weighted UniFrac resulted in fewer large-effect regions of association than analysis based on Bray-Curtis dissimilarity.Nevertheless, four genomic regions from the two analyses overlap, lending support for at least several potentially important genetic effects, each on a different stickleback chromosome.In addition, we controlled for standard length differences via partial Mantel tests, and none of the four regions overlapped with large-effect regions from the SL analysis, so host genetic effects other than those on body size explain the differences in community structure.
The genomic regions of association we identified are large, covering hundreds of kilobases in some cases, so narrowing intervals to putative causal variants would likely require GWAS or fine-mapping analyses, with considerably larger sample sizes.At least one other study has included an analysis (QTL mapping) of association between genomic variants and gut microbiome composition in threespine stickleback from two British Columbia lake populations (89).Although mapping intervals were also wide in that study, there appears to be minimal overlap between QTL for the latent community variables mapped therein and association peaks from our analyses, with the possible exception of overlapping regions on chromosomes 2, 14, and 20.
Given our finding of an overall association between genetic dissimilarity and microbiome dissimilarity, as well as genomically localized peaks of extreme test statistic values, we conclude that there is an overall polygenic host contribution to microbiome beta diversity.However, our permutation-based association tests carried a lower bound P-value of 0.001, and the contribution of any particular genomic region should be considered preliminary in light of the possibility of false positives inherent in genomic multiple testing frameworks.As in other genomic association studies, the effects of candidate regions here should be confirmed in the future via additional approaches such as independent replication, fine-scale mapping, and genome editing.

Conclusions and perspective
We conclude from our carefully controlled common garden experiment that host genomic diversity is a definitive facilitator of beta diversity for gut microbiomes among individual stickleback fish and that inheritance of structural microbiome traits is not strictly additive.Future studies that both leverage metagenomic interrogations of the microbiome and precise manipulation of host genomes will be required to understand mechanistic bases of the associations reported here.The replicated tank element of our common garden design also demonstrated that rearing environment is an important factor and that the interaction between genetic and environmental variation (G-by-E) influences the microbiome as well.Our results indicate that future studies which harness standing genetic variation measured across the host genome, as compared to gross population or inbred line membership, permit much more nuanced understanding of genomic architecture and possible mechanistic connections between host genes and microbes.In addition, for those studies with inherent reliance on inferences from host genomic data (e.g., GWAS, QTL mapping), efforts should be made to limit the impact of variables that confound genetic effects, through tools like gnotobiotic protocols and unrestricted co-rearing.These tools can be implemented now for powerful outbred fish models such as threespine stickleback and can be developed for many other organisms in future studies to better understand the general principles and mechanisms by which host genetic variation influences microbiome diversity.

FIG 2
FIG 2 Host genotype (population of origin), host size (standard length: SL), and rearing environment (tank) influence the structure of the stickleback gut microbiome.An ordination of individual stickleback guts in microbial community space using PCoA, and based on Bray-Curtis dissimilarity, shows evidence for separation by host family (A) and by tank (B).A scatterplot of values for the first PCoA axis (which explained 28.52% of the total variation in composition) vs. fish standard length (SL) shows differing relationships between community structure and fish length among the three different host populations (C).A similar scatterplot, but including PCoA 2 (11.12% of variation explained), suggests a weak relationship between community structure and fish size.In all plots, colors represent fish populations (Bt = green, RS = blue, Hy = turquoise).In B-D, point style represents rearing tank (tank 1 = closed, tank 2 = open), and in B ellipses, line style represents rearing tank (tank 1 = solid, tank 2 = dashed).Ellipses in A-B reflect 95% confidence regions about respective group centroids, and dashed lines in C-D show population-specific slopes from general linear models.

FIG 3
FIG3 Stickleback that are similar genetically have similar gut microbiomes.A scatterplot (A) of all pairwise Bray-Curtis and genetic (based on 2,408 markers) dissimilarity values.Each point is colored (see legend) according to the population combination reflected by the fish pair.A companion plot (B) shows means and means ± standard errors as crossing lines for family-wise stratification of the pairwise dissimilarities, and each family pair is represented by a text label.Note that the x-axis limits are the same for both A and B, but the y-axis range is smaller for B to better illustrate the distributions of the means.

FIG 4 FIG 5
FIG 4 Relative abundance of bacterial classes based on 16S profiling demonstrates substantial variation in community composition among individuals, populations and families, and rearing tanks.Each vertical bar represents the gut microbiome of an individual fish, and bars are ordered within each family-tank combination by increasing abundance of Class Gammaproteobacteria, the most abundant class overall on average.Black horizontal bars below the plot represent tank (Tank 1 = solid, Tank 2 = dashed), and colored horizontal bars represent stickleback families.The legend to the right of the plot provides a key to the 16 most abundant (on average, from bottom to top) classes.