Strain-level analysis of Bifidobacterium spp. from gut microbiomes of adults of differing lactase persistence genotypes

One of the strongest associations between human genetics and the gut microbiome is a greater relative abundance of Bifidobacterium in adults with lactase gene (LCT) SNPs associated with lactase-non persistence (GG genotypes), versus lactase persistence (AA/AG genotypes). To gain a finer grained phylogenetic resolution of this association, we interrogated 1,680 16S rRNA libraries and 245 metagenomes from gut microbiomes of adults with varying lactase persistence genotypes. We further employed a novel genome-capture based enrichment of Bifidobacterium DNA from a subset of these metagenomes, including monozygotic (MZ) twin pairs, each sampled 2 or 3 times. B. adolescentis and B. longum were the most abundant Bifidobacterium species regardless of host LCT-genotype. LCT- genotypes could not be discriminated based on relative abundances of Bifidobacterium species or Bifidobacterium community structure. Metagenomic analysis of Bifidobacterium-enriched DNA revealed intra-individual temporal stability of B. longum, B. adolescentis, and B. bifidum strains against the background of a changeable microbiome. We also observed greater strain sharing within MZ twin pairs compared to unrelated individuals, and within GG compared to AA/AG individuals, but no effect of host LCTgenotype on Bifidobacterium strain composition. Our results support a “rising tide lift all boats” model for the dominant Bifidobacteria in the adult gut: their higher abundance in lactase-non persistent compared to lactase-persistent individuals results from an expansion at the genus level. Bifidobacterium species are known to be transmitted from mother to child and stable within individuals in infancy and childhood: our results extend this stability into adulthood. IMPORTANCE When human populations domesticated animals to drink their milk they adapted genetically with the ability to digest milk into adulthood (lactase persistence). The gut microbiomes of lactase non-persistent people (LNP) differ from those of lactase-persistent people (LP) by containing more bacteria belonging to the Bifidobacteria. These beneficial gut bacteria, which fall into many species, are known to degrade milk in the baby gut. Here, we asked if adult LP and LNP microbiomes differ in the species of Bifidobacteria present. We studied the gut microbiomes of LP and LNP adults, including twins, sampled at several times. In particular, we used a technique to selectively pull out the DNA belonging to the Bifidobacteria: analysis of these DNA segments allowed us to compare Bifidobacteria at the strain level. Our results show that the LNP enhance the abundance of Bifidobacteria regardless of species. We also noted that a person’s specific strains are recoverable several years later, and twins tend to share the same ones. Given that Bifidobacteria are inherited from mother to child, strain stability over time in adulthood suggests long term, multi-generational inheritance.


INTRODUCTION 64
Lactose tolerance arose in European, African and Middle Eastern human populations with animal domestication (Tishkoff et al., 2007;Itan et al., 2010;Ranciaro et al., 2014). The genetic 66 underpinnings of lactose tolerance represent one of the strongest signals of recent selection in the human genome. The enzyme lactase metabolizes lactose, the primary carbohydrate in mammalian milk. 68 The gene regulatory region of the lactase gene (LCT) controls the downregulation of lactase after weaning (Sabeti et al., 2007). Allelic variants that inhibit downregulation, resulting in lactase persistence, 70 occur in an estimated 35% of humans (Itan et al., 2010). Lactase persistence allows hydrolyzation of lactose and uptake of the resulting glucose and galactose directly in the small intestine of adults and is 72 linked to lactose tolerance.
In a striking parallel, one of the strongest signals for human genotype effects on the gut 74 microbiome also relates to lactase persistence. In Western populations, individuals with a lactase persistent genotype harbor significantly lower relative abundance of Bifidobacterium in their gut 76 microbiomes compared to non-persistent individuals (Blekhman et al., 2015;Goodrich et al., 2016a;Turpin et al., 2016;Rothschild et al., 2018). The association was found to be stronger when dairy 78 consumption in the lactase non-persistent individuals was considered (Bonder et al., 2016). Together these observations suggest that for lactase non-persistent individuals, Bifidobacterium may benefit from 80 the availability of lactose in the gut. Bifidobacterium is a large genus whose members partition the nutrient niche space (O'Callaghan et al., 2016), which suggest that not all Bifidobacterium species may 82 respond equally to lactose availability. However, beyond an overall enrichment of the Bifidobacterium genus, the effects of the lactase persistence genotype on the Bifidobacteria community in the gut 84 remain unclear.
Here, we aimed to interrogate the LCT -Bifidobacterium link at a finer phylogenetic 86 resolution. We re-examined public metagenomic (Xie et al., 2016) and 16S rRNA gene data (Goodrich et al., 2016a) from an UK twin cohort with both lactase persistent (AA, AG) and non-persistent (GG) 88 individuals. We then performed genome-capture enrichment of Bifidobacterium from 11 twin pairs of each genotype across two or three time points per individual, and sequenced each metagenome before 90 and after genome capture. With these data, we asked if lactase persistence genotype influenced strain composition, longitudinal stability within an individual, or similarity within families for three species (B. 92 longum, B. adolescentis, B. bifidum). Our results suggest a proportional increase of the predominant Bifidobacterium species in the gut microbiomes of the lactase-persistent compared to non-persistent 94 individuals. We observed strain sharing for twins, and stability within individuals, regardless of LCT genotype. 96

RESULTS 98
Our analysis of 16S rRNA gene SVs confirmed our previous OTU-based report (Goodrich et al., 2016a) of significantly greater mean relative abundance of Bifidobacterium in lactase non-persistent 100 (GG) versus persistent (AA/AG) individuals (mean AA/AG = 0.96%±0.05 and mean GG = 3.22%±0.4SE, linear mixed-model P < 4.5e -16 )( Figure 1A, Table S1). This analysis revealed 13 Bifidobacterium SVs which 102 occurred in at least two of 1,680 samples. Five of the 13 SVs were unambiguously identified to the species level, while two fell within a broader B. longum/B. breve clade. Both host genotype groups (GG 104 vs. AA/AG) were dominated, in order of abundance, by B. adolescentis, B. longum/breve, B. pseudocantenulatum, B. animalis and B. bifidum. Together these taxa represented over 98.6% of all 106 sequences assigned to the Bifidobacterium genus, although only 1.1% of all sequences across all taxa (Table S1). These five species had similar proportions of the total Bifidobacterium community in both 108 genotype groups and almost identical rank orders (GG: 93%, AA/AG: 91%)( Figure 1B, Table S1). Based on SVs, B. animalis and B. dentium were the only taxa with species level designations that did not have 110 significantly greater relative abundance in GG versus AA/AG genotypes.  After normalization of Bifidobacterium within each genotype (thereby removing the effect of an overall enrichment of the genus), discriminant analysis using LEfSe (Segata et al., 2011) revealed no 130 discriminant Bifidobacterium SVs between GG and AA /AG individuals (data not shown). This result indicates similar proportional abundances of each Bifidobacterium species within each host genotype 132 group, despite an overall greater relative abundance of all taxa in GG versus AA/AG genotypes. ANOSIM permutation tests on the Bray-Curtis Dissimilarity (BCD) matrix of within-genotype normalized 134 Bifidobacterium SV matrixes also revealed no significant community clustering by host genotype group (ANOSIM P > 0.05), further supporting a proportional increase across most taxa rather than genotypic 136 selection of specific species or strains within the genus.
We also assessed the association between frequency of dairy consumption and Bifidobacterium 138 SVs for the 783 samples for which both datasets were accessible. Linear mixed-effect models, with genotype as a random variable, revealed no overall association between dairy consumption and the 140 relative abundance of the genus (linear mixed model P = 0.113). When SVs were interrogated independently, B. animalis and an unclassifiable Bifidobacteria were the only two SVs which showed 142 significant associations (B. animalis: P = 0.0023, B. unknown1: P = 0.014). Interestingly, when each genotype was considered independently, significant associations with levels of dairy consumption were 144 observed for B. animalis within GG individuals, but not in AA/AG individuals (generalized linear model, GG: P = 0.03, AA/AG: P = 0.13). 146 Our taxonomic annotations of metagenomic reads from existing TwinsUK metagenomes (Xie et al., 2016) revealed an overall enrichment of the Bifidobacterium genus in lactase-non-persistent 148 individuals (mean AA/AG = 1.1%±0.15 and mean GG = 2.8%±0.9, linear mixed model P = 0.0025)( Figure   1C, Table S1) and a proportional increase across most Bifidobacterium species in GG versus AA/AG 150 individuals ( Figure 1D). Largely concordant with the results of the 16S rRNA SV analysis, Bifidobacteria metagenome annotations from both host genotype groups were dominated by B. adolescentis, B. 152 longum, B bifidum, and B. animalis, which together represented more than 80% of all Bifidobacterium sequences, though only 1% of the overall community ( Figure 1D, Table S1). Taxonomic annotations from 154 metagenomes showed a greater diversity of Bifidobacterium taxa across the full dataset compared to the 16S rRNA gene SV based analysis, despite a lower sample count (245 metagenomes versus 1,680 16S 156 rRNA samples), with 65 species and subdivisions identified (versus 13 SVs).
Functional annotations of the 245 metagenomes revealed no MetaCyc Bifidobacterium 158 metabolic pathways with significantly different relative abundances between the two genotype groups (pairwise Kruskal-Wallis H-tests for mean abundance of each Bifidobacterium pathway in GG versus 160 AA/AG individuals, Bonferroni multiple comparison correction, implemented in HUMAnN2; data not shown). However, low or zero read coverage for most Bifidobacterium pathways in the metagenomes 162 limited the power to assess differences between host genotype groups at the functional level.
Genome capture enriches Bifidobacteria DNA -We used genome capture (Carpi et al., 2015;Metsky et 164 al., 2019) to enrich for Bifidobacterium DNA in metagenome libraries generated from 22 individuals each with 2 or 3 samples obtained at different time points (46 samples total representing 11 GG and 11 AA 166 individuals). Each genotype group also included 4 sets of adult monozygotic twin siblings (Table S2). Our custom genome capture array included a total capture space of >94Mb and 89k capture targets built 168 from 47 reference Bifidobacterium genomes spanning the entire diversity of the genus (Table S3). The capture reaction was largely genus-specific, with low levels of enrichment of non-Bifidobacterium 170 Bifidobacteriaceae and other non-Bifidobacterium Actinobacteria (Figure 2A). The capture reaction increased the mean relative abundance of all Bifidobacterium across our metagenomic sample subset 172 from 2.1% (±0.27%SE) to 60.2%(±3.7%SE), representing a nearly 30 fold increase from pre-to postcapture libraries. The mean relative abundance of sequences annotated as a specific Bifidobacterium 174 taxa across all pre-captured libraries was proportional to that taxon's initial relative abundance in the pre-captured libraries (rho = 0.99, P < 2.2e -16 , Figure 2B), and the mean relative abundance ranking order 176 of the top 5 taxa was the same in pre-and post-captured libraries (data not shown). shows least-squares regression with 95% confidence intervals, spearman's rho is shown along with the significance of the correlation. 194 Strain level analysis of Bifidobacterium spp. in post-capture metagenomes -We compared 196 strains recovered from post-capture metagenomes using three methods: (i) we generated genome-wide strain phylogenies using multi-locus sequence alignments (MLSAs) via StrainPhlAn 198 (ii) a marker-SNP strain tracking analysis implemented in the Metagenomic Intra-Species Diversity Analysis System (MIDAS) (Nayfach et al., 2016); and, (ii) we used a custom synteny-based approach. We 200 compared strains in all possible pairwise sample comparisons from our genome capture subset (N = 46).
We then grouped each comparison into one of three categories; 'Intra-Individual (comparison between 202 samples from the same person over time), 'Same Family' (between monozygotic twins within a twinship), and 'Unrelated' (between samples from unrelated individuals). These categories allowed us to 204 assess strain stability through time within an individual (intra-individual comparisons) and the influence of monozygotic twins (same family comparisons). Our post-capture libraries yielded sufficient reads to 206 conduct these analyses in four species, B. longum, B. adolescentis, B. animalis, and B. bifidum, with sample numbers shown in Table 1. 208

MLSAs showed greater similarities of strain sequences in intra-individual versus unrelated
comparisons -Using StrainPhlAn, we created MLSAs across nearly 200 strain-resolving marker genes 210 derived from a species' core genome (genes shared by all strains in the species). Of the post-capture shotgun metagenomes, B. adolescentis, B. longum, B. bifidum and B. animalis had sufficient coverage for 212 43, 39, 19, and 11 samples respectively (minimum of 2x coverage across entire alignment) ( Table 1). The marker gene MLSAs ranged from 36 and 83kb in length depending on species (Table S4). The mean 214 number of polymorphic sites within a sample across an alignment ranged from 0.39% to 0.92%, while the dominant allele at each site ranged upwards from 76%, suggesting low strain diversity within 216 samples (Table S4, Figure S1).
Mean patristic distance (i.e., branch length) was significantly less in intra-individual versus 218 unrelated comparisons in phylogenies of B. adolescentis, B. longum, and B. bifidum strains (bootstrapped Wilcoxon rank sum P < 0.05 in all three cases, Figure 3). This result indicates strain 220 stability: an individual's strains are more similar for the same person sampled over time than compared to strains from a different person. We did observe some instances of large patristic distance between 222 samples of the same individual (Figure 4), and in both cases where three samples per individual were included, one of the three samples had a large patristic distance more typical of unrelated comparisons 224 ( Figure 4). These large inter-individual patristic distances could result from detection errors (i.e., a strain was present in both samples but failed to be detected in one), or from differences in the relative 226 abundance of strains differed between time points (this analysis picks the most abundant), or from loss and regain of strains. 228 after 999 bootstraps to smallest group size, ns: p > 0.05 *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001. 238 Mean patristic distance of same-family comparisons was not significantly different than 240 unrelated comparisons for any of the four taxa examined (Wilcoxon rank sum of mean distances in intertwin versus unrelated, P > 0.05 in all cases, Figure 3). Thus, the MLSA-based RaxML phylogenies did not 242 show that co-twins harbored similar strains of these 4 species. Furthermore, this analysis did not reveal any significant grouping of strains according to lactase genotype (ANOSIM permutations of phylogenetic 244 distance matrix P > 0.05, Figure 4)   intra-individual sample). Twin IDs marked with 'X' imply only a single twin had representation on the tree, or that individual had no twin in the dataset. Scales show patristic distances. 258

Marker-SNPs are shared at higher percentages in intra-individual versus unrelated comparisons -To 260
further interrogate Bifidobacterium strain stability within individuals over time, we employed a 'strain tracking' feature, implemented in the MIDAS software, to identify rare marker-SNPs highly discriminant 262 for a specific strain in an individual at a single time point. We had sufficient coverage to employ this approach in >10 individuals for each of B. longum, B. adolescentis, B. animalis, and B. bifidum species 264 (Table 1). Our results show that microbe marker-SNPs of an individual are far more likely to be found in the same individual at a different time point than in an unrelated individual in three of the four species 266 examined (bootstrapped Wilcoxon rank sum of mean %shared SNPs, intra-individual versus unrelated, P < 0.0001 in all cases, Figure 5). B. animalis was the one exception, where SNP sharing was not 268 significantly higher in intra-individual versus unrelated comparisons (bootstrapped Wilcoxon rank sum P = 0.32, Figure 5; Tables 2B & S5). 270 to smallest group size, ns: p > 0.05 *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001. 280 We observed greater mean intra-individual sharing of B. adolescentis and B. longum in GG versus AA individuals ( Figure S2), when all species were considered in aggregate (Wilcoxon rank sum of 282 mean intra-individual %shared SNPs, GG versus AA individuals, P = 3.3e -9 ; Figure S3). This pattern was driven by B. longum and B. adolescentis, the two most abundant species ( Figure S2). 284 The SNP-marker analysis revealed that high stability in either B. longum, B. bifdum, or B.
adolescentis was a good predictor of stability in the other two species within that same individual 286 ( Figure 6). This suggests some individuals carry stable strains, while others witness strain replacement, across all three species together. We detected higher marker-SNP sharing within families (i.e., a twin 288 effect) for B. adolescentis (bootstrapped Wilcoxon rank sum P = 0.021) but not the other three species examined, possibly due to a lower power to detect this pattern. The percent of shared marker-SNPs for 290 a given intra-individual comparison was not correlated with time between sample collection, nor where other metrics of overall microbiome similarity ( Figure S5), meaning the percent of shared marker-SNPs 292 between samples does not decrease with time, and the patterns observed here are therefore not a function of time-induced SNP accumulation. 294 Notably, Bifidobacterium strains exhibited intra-individual longitudinal stability despite high variability in overall microbiome communities around them. Only a very weak, non-significant 296 association was observed between an individual's overall microbiome stability ( assessed using Brey-Curtis Dissimilarity (BCD) of pre-captured metagenomes) and the number of shared marker SNPs for a 298 given Bifidobacterium strain (Figure 6, Figure S6). This pattern held when microbiome community dissimilarity was assessed using either Simka BCD values, a k-mer based annotation-independent 300 method, or BCD values for annotations of metabolic pathways of pre-captured metagenomes ( Figure  S6). These results suggest Bifidobacterium stability may be independent from the dynamics of 302 microbiome communities in which they exist. gene-content comparisons. Gene synteny is defined as the conservation of gene order between chromosomes, along an evolutionary gradient (Engström et al., 2007) and was previously used as a 320 measure of the distance between genomes (Nadeau and Taylor, 1984;Sankoff et al., 1992). Here, we identify syntenic blocks, regions of conserved DNA sequence occurring in the same order in two 322 different chromosomes, to determine the relatedness of Bifidobacterium strains. We created a pipeline to compare the synteny of different genomic regions in three Bifidobacterium species for which we had 324 sufficient coverage from captured metagenomes: B. longum, B. adolescentis and B. bifidum (see methods). Briefly, we performed a de-novo assembly of post-captured metagenomes in each sample, 326 and used a subset of randomly selected genes to perform a BLAST search against the metagenomic assemblies. Next, we performed a pairwise synteny comparison between all genomic regions, which 328 included the gene used as BLAST query and their flanking regions (3.5 kbp upstream and downstream to the gene). Finally, we defined a synteny-score and performed the same pairwise comparisons as 330 outlined above for regions which were identified in >15 single-individual assemblies. In summary, we Boxplots show median and quartiles, while red diamond shows mean. For each species all regions were combined into a single analysis. Significance levels are Wilcoxon rank sum tests, ns: p > 0.05 *: p <= 0.05, 342 **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001. 344 For all three species, intra-individual comparisons yielded significantly higher synteny scores than comparisons of unrelated individuals (Figure 7), although this pattern varied somewhat by region; 346 6/7 genomic regions for B. adolescentis, 2/4 regions for B. bifidum, and 4/10 regions for B. longum (results are summarized in Table 2C and Pvalues are shown in Table S6). These results indicate that the 348 same strains persist in the gut over time. Comparisons of synteny between individuals within a twin pair yielded significantly higher scores (relative to comparisons of unrelated individuals) in 4 regions in B. 350 adolescentis and in one region in B. bifidum. In B. longum, although not statistically significant, 3 regions showed lower synteny scores for individuals within a pair (P-values 0.055-0.08, Table S6). Finally, LCT-352 genotype did not influence longitudinal strain stability as measured by synteny. No differences were seen in mean intra-individual synteny scores between lactase persistence genotypes (Wilcoxon rank sum 354 test of mean intra-individual synteny, AA versus GG, P > 0.05), and each genotype independently showed significant differences between intra-individual and unrelated comparisons (bootstrapped 356 Wilcoxon rank sum test, P = 0.0005 for AAs and P = 6.1e -5 for GGs, Figure S3). Together, these results support strain sharing events within the families and maintenance of strains over time within 358 individuals.

DISCUSSION
One of the strongest signals of host genetic effects on microbiome communities is the 362 association between Bifidobacterium and a SNP in the regulatory region of the lactase gene LCT. Several independent studies have noted a greater relative abundance of the genus Bifidobacterium in the gut 364 microbiomes of lactase non-persistent compared to lactase-persistent individuals of European origin (Blekhman et al., 2015;Bonder et al., 2016;Goodrich et al., 2016a;Rothschild et al., 2018). We did not 366 detect an interaction between dairy consumption, LCT genotype and levels of Bifidobacterium here, as reported by Bonder (2016). However, our results show that compared to the lactase-persistent 368 genotypes (AA/AG), the lactase non-persistence genotype (GG) enhances the proportion of prevalent and abundant Bifidobacterium species, without bias towards particular species, strains, or genome 370 content. Our results also indicate that strains of certain Bifidobacterium species are shared within twin pairs and persist over time within individuals against a background of a dynamic microbiome 372 community.
Corroborating results of previous reports (Turroni et al., 2009;Arboleya et al., 2016;Kato et al., 374 2017), we observed that B. adolescentis and B. longum dominated the Bifidobacterium communities of the adult gut microbiome. Strain comparisons indicated that B. adolescentis, B. longum and B. bifidum 376 were stable within adults over multi-year timescales and shared within twin pairs. It is interesting to note that B. animalis, the one species for which stability was not shown, was also among the few species 378 not enriched in GG versus AA/AG genotype groups from our broader 16S rRNA and metagenomic surveys. These observations may reflect a different ecology of B. animalis compared to the others. B. 380 animalis is commonly isolated from diary and other sources outside the human microbiome , and has a reduced genetic repertoire for carbohydrate metabolism versus other taxa in the genus 382 (Milani et al., 2015b). B. animalis may therefore be a more transient autochthonous member of the adult gut microbiome. 384 A leading hypothesis for the greater relative abundance of Bifidobacterium in lactase nonpersistent individuals is a greater availability of undigested lactose in the large intestine, which may be 386 preferentially used by Bifidobacteria (Bonder et al., 2016;Goodrich et al., 2016b). The availability of undigested lactose in the guts of lactase non-persistent individuals could in principle result in niche 388 partitioning between the different species. Indeed, many taxa within the Bifidobacterium genus are considered specialists (Turroni et al., 2009;Milani et al., 2015), and some devote significant proportions 390 of their genomes to particular host derived resources including human milk oligosaccharides in B. longum subs. infantis (Sela et al., 2008), and mucin glycans in B. bifidum (Duranti et al., 2015). B. longum 392 subs. Infantis, for example, is known to specialize in human milk oligosaccharide metabolism, while B. longum subs. longum is specialized in metabolism of plant-derived carbohydrates (Sela et al., 2010;394 O'Callaghan and van Sinderen, 2016). Furthermore, gut Bifidobacterium taxa are known to follow successional patterns with a gradual shift from B. longum subs. infantis, B. breve, and B. bifidum in infant 396 guts to B. catenulatum, B. adolescentis and B. longum subs. longum in adults (Arboleya et al., 2016;Kato et al., 2017). These shifts are thought to result in part from shifts in diet, and are associated with 398 breastfeeding versus formula, weaning, and intake of solid food (Stewart et al., 2018;Vatanen et al., 2019). However, our results instead point to lactose utilization as enhancing all the common 400 Bifidobacterium species, without altering their population structure. This implies that in the adult gut microbiome, common Bifidobacterium species use lactose without competing for it and it does not 402 become a limiting resource. The non-differentiating effect of host LCT-genotype on the Bifidobacterium community results suggest a 'rising tide raises all boats' scenario, where a lactose utilization advantage 404 is equally distributed among common Bifidobacterium species.
The strain comparisons in this study relied on enrichment of Bifidobacterium DNA by genome 406 capture. This worked well for the most abundant species, B. adolescentis and B. longum, and allowed a reduced set of comparisons possible for B. bifidum and B. animalis due to lower coverage. To compare 408 strains of the other Bifidobacteria present would require deeper sequencing or a more tailored set of probes in the genome capture. To compare strains we used two published methods based on sequence 410 composition; one using an alignment of marker-genes (StrainPhlAn), and another using individually discriminant marker-SNPs (MIDAS). The novel synteny-based method that we developed allows for the 412 identification of differences in the organization of genomic regions even when the sequence similarity is high. Generally, the two published methods agreed with the synteny approach, with some discrepancies 414 (Table 2). All three methods detected intra-individual longitudinal stability for B. adolescentis , B.bifidum and B. longum, while no method detected stability in B. animalis. The three methods were divided 416 however in their detection of within-family sharing of strains (a twin effect), and on the influence of lactase persistence genotype on strain stability. Both the marker-SNP and synteny approaches detected 418 greater similarity of B. adolescentis strains between twins versus unrelated individuals, while our marker gene alignments did not. Only the synteny approach detected twin sharing in B. longum and B. bifidum. 420 The marker-SNP approach was the only method to reveal any influence of lactase persistence on longitudinal stability, and did so only in B. adolescentis and B. longum. Why each method varied in its 422 ability to detect a particular pattern is not clear, however their agreement with regards to longitudinal stability of B. adolescentis, B. longum and B. bifidum gives strong support to the conclusion that these 424 species are stable within adults over multi-year timescales.
Temporal stability, along with vertical transmission within families, are expected to be 426 associated with heritability. In addition to temporal stability and within-family sharing of strains show here, we had previously noted that the genus Bifidobacterium was both heritable and stable over time, 428 based on 16S rRNA gene analysis (Goodrich et al., 2014). For the relative abundances of microbial taxa or genes to be stably heritable over time, they must be present every generation and associated with 430 host genetic variation. If the acquisition of microbes from the environment or other individuals is less reliable than acquisition from parents, the heritability of horizontally acquired microbes should be less 432 stable over time than the heritability of vertically acquired microbes. These results, together with previous observations, continue to reveal the Bifidobacteria as highly human-adapted members of the 434 microbiome.
B. adolescentis, B. longum and B. bifidum have also previously been shown to be temporally 436 stable in the gut microbiomes of infants, where they are far more abundant than in the adult gut microbiome. Strains of these species were stable within children for up to three years of age across 438 diverse geographic cohorts (Vatanen et al., 2019), and B. longum supbsp. longum was stable from infancy until six years of age (Oki et al., 2018). Furthermore B. bifidum and and B. longum are 440 transmitted from mother to infant (Asnicar et al., 2017) (Yassour et al., 2018). By extending existing evidence of Bifidobacterium strain stability to an adult cohort and demonstrating strain sharing within 442 families, our results contribute to an overall picture of this genus, one which suggests long term maintenance of specific strains within an individual which can be transmitted across generations. 444

Sample inclusion and access
All existing samples from the TwinsUK cohort were included assuming they had both 1) metagenomic or 448 16S rRNA sequence data and 2) genotype data at the 13910 G/A allele. All work involving the use of these previously collected samples was approved by the Cornell University IRB (Protocol ID 450 1108002388).

16S rRNA gene based community analyses
16S rRNA reads were extracted from existing datasets (N = 1,680) and analyzed via the Qiime2 pipeline 454 (Bolyen et al., 2019) with minor deviations. Briefly, PCR amplicons for the V4 region of the 16S rRNA gene were generated with primers 515F-806R and were sequenced with the Illumina MiSeq 2 × 250 v2 456 Kit at the Cornell University Institute for Biotechnology as previously described (Goodrich et al., 2016a).
DADA2 (Callahan et al., 2016) was used to call 100% sequence identity Sequence Variants (SVs, a.k.a. 458 100% OTUs or ASVs). Taxonomy was assigned to SVs with the QIIME2 q2-feature-classifier (Bokulich et al., 2018) using the SILVA database (v119) (Quast et al., 2013). Taxonomic annotations of 16S rRNA 460 Sequence Variants (SVs) were improved from the standard DADA2 output by extracting each SV representative sequence and querying it against the NCBI type strain database. In one case all NCBI 462 annotations fell within the B. longum/B. breve clade, and was thus assigned as 'B. longum/breve' in subsequent analyses; in all other cases the original DADA2 assignment was kept. 464 Statistical testing for the influence of genotype was done using a non-parametric Wilcoxon ranksum test of the total relative abundance of all Bifidobacterium SVs between genotypes. Only SVs that 466 occurred in at least 2 individuals across the dataset of 1,680 samples were included. Individuals with either AA or AG at 13910*A (rs4988235) were considered lactase persistent, while only GG was 468 considered lactase non-persistent. Independent Wilcoxon rank sum tests were run for each Bifidobacterium SV independently across the two genotype groups. Because our dataset included 470 replicate samples from the same individual (therefore not independent), we determined the overall influence of genotype on Bifidobacterium SV relative abundance using a linear mixed-effect model, with 472 participant ID as a random effect, in the R package "lmr4": BifidobacteriumRA ~ Genotype + (1 | ParticipantID). 474 To assess the community structure of the genus without the influence of an overall enrichment in GG individuals, all SVs annotated as 'bifidobacterium' were extracted into new SV tables and again 476 normalized by 1 within an individual, then input into LEfSE (Segata et al., 2011)  Dairy intake was assessed as the portion size frequency for dairy for a week, adjusted for energy intake, using food frequency questionnaires developed and described elsewhere (Spector et al., 2006). 484 Associations with Bifidobacterium SVs were conducted using linear mixed-effect models, with genotype as a random effect, in the R package "lmr4": BifidobacteriumRA ~ DairyFrequency + (1 | ParticipantID). 486 Statistical significance was assigned using Satterthwaite's method in the R package "lmerTest" (Kuznetsova et al., 2017). 488

Metagenomic community analyses 490
Metagenome sequences were used from existing datasets (N = 245) which were extracted as above with library preparation previously described (Karasov et al., 2018). Taxonomic assignments were made using 492 Kraken2 (Wood and Salzberg, 2014). We assessed the influence of genotype on the relative abundance of Bifidobacterium annotations using a mixed effect model as described above. Metabolic pathway 494 annotations were generated using the HUMAnN2 pipeline against the MetaCyc database (Franzosa et al., 2018), and significantly discriminant gene pathways were revealed using HUMAnN2's built-in 496 'humann2_associate' script with default parameters at the gene pathway level (Franzosa et al., 2018).

Subset for genome capture, strain level analyses, and discriminant functional pathways
An additional subset of the TwinsUK cohort was selected for genome capture based on the i) availability 500 of genotype data at the lactase persistence gene (rs4988235) and ii) at least 2 longitudinal samples between 8 months and 4 years apart with a BMI change of less than 3. We note that variation in 502 temporal distance between sampling events had no impact on stability, community composition, or synteny values ( Figure S4, Figure S5). In total 20 individuals with 2 time points and 2 individuals with 3 504 time points were selected (a total of 46 samples across 11 GG individuals/ 11 AA individuals). Each genotype also contained 4 sets of twin siblings (Table S2). DNA samples were brought through 506 metagenome creation, capture reaction, and sequencing according to NimbleGen (Madison, WI, USA) SeqCap EZ HyperCap Workflow v.1.0. Briefly, gDNA underwent enzymatic fragmentation and adapter 508 mediated PCR using KAPA HyperPlus Library Preparation, followed by a 16h hybridization with a custom set of biotinylated long oligonucleotide probes (the 'probe array'), followed by a final re-amplification. 510 The probe array was designed and manufactured by NimbleGen and included overlapping coverage of a total capture space of >94Mb and 89k capture targets which covered 47 type strain Bifidobacterium 512 genomes (Table S3). Pre-captured and post-captured metagenomes were then sequenced across two Illumina paired-end 300 cycle (HiSeq 3000). Sequences of pre and post captured metagenomes are 514 available on the European Nucleotide Archive accession number PRJEB38000.
Analyses on each pre-and post-captured library from our 46 sample longitudinal subset (92 516 metagenomes total) were conducted with Simka, an annotation independent k-mer based metric (Benoit et al., 2016), and annotation based HUMAnN2 pipeline against the MetaCyc database (Franzosa 518 et al., 2018). Simka Bray Curtis Dissimilarity (BCD) matrixes were calculated directly within the software.
HUMAnN2 BCD matrixes were generated from gene pathway relative abundance tables, after removal 520 of collapsed pathway stratifications, using vegdist in the Vegan R package (V. 1.0.136). Each pairwise comparison within the BCD matrixes was then classified into one of three categories: Intra-individual 522 (same person over time), Same Family (comparisons between twin siblings), and Unrelated (comparisons between unrelated people). Wilcoxon rank sum tests were used to determine differences 524 between comparison categories. To overcome issues of non-independence related to multiple samples from the same individual within the dataset, we ran 9,999 permutations of Wilcoxon Rank Sum tests 526 using subsets of individuals equal to the smallest number of samples from either category, and reported the mean value. Finally, ANOSIM tests of BCD hierarchical clustering was conducted using the 'anosim' 528 script part of the 'vegan' R package with 9,999 permutations. ANOSIM permutation tests were run on intra-individual clusters, unrelated clusters, and same family clusters. 530

MIDAS Marker-SNPs and StrainPhlAn MLSAs 532
Strain level assessments were made using the Metagenomic Intra-Species Diversity Analysis System (MIDAS) (Nayfach et al., 2016) and StrainPhlAn . MIDAS's 'strain_straintracker.py' 534 program was used with default settings to identify marker-SNPs as previously described (Nayfach et al., 2016). Briefly, MIDAS initially identifies abundant species by mapping unassembled metagenomic reads 536 to a database of 30,000 reference genomes and then identifies per-species SNPs for each abundant species within a sample. To identify the SNPs used in our analyses, we had MIDAS identify rare, sample-538 discriminatory SNPs from a pool of 1 sample per individual (i.e. SNPs that were unique to that individual). We term these sample-discriminatory SNPs as 'marker-SNPs'. Each individual's additional 540 sample(s) was then added back into the pool, and marker-SNP overlap was assessed between sets of two samples in all possible pairwise comparisons across the dataset. The percent of shared marker-SNPs 542 was the number of marker-SNPs shared between two samples, divided by the total number across both individuals. Bootstrapped Wilcoxon rank sum tests were performed on the mean percent of marker 544 SNPs shared in each comparisons category described above (i.e. Intra-individual, Same Family, and Unrelated). Bootstrapping was done down to the lowest number in a single category for any given 546 comparison and run for 9,999 permutations, and the mean p-value was reported.
Resulting phylogenies were uploaded into the ITOL program (Letunic and Bork, 2011) for graphical 550 display of family and twinID annotations, and into Geneious (v 6.1.8) to retrieve patristic distances (branch lengths). Mean patristic distances were compared across comparison categories using 552 bootstrapped Wilcoxon rank sum tests as described above.
All software was run parallelized on a high performance computing cluster at the Max Planck 554 Institute for Developmental Biology via Snakemake (Köster and Rahmann, 2012).

Synteny analyses
First, a de-novo assembly of post-captured metagenomes was performed for each sample by using the 558 metacompass software (Cepeda et al., 2017). Next, a set of genes representing different genomic regions was selected randomly (using a custom script) from each of the reference genomes of B. 560 longum, B. adolescentis, B. animalis and B.bifidum. Each gene was used as a query for a BLASTn (Altschul et al., 1990) search against the assembled metagenomes, with minimal identity of 97% and minimal 562 coverage of 90%. For each of the blast hits in the assembled metagenomes, the gene and flanking sequences (3.5kb upstream and downstream to the blast hit) were retrieved and used for synteny 564 comparison. Further analysis was only carried out for regions found in > 15 samples. If for a given species the final number of suitable regions was lower than 4, the process iterated with a new set of 566 genes, keeping a minimal gap of 5 genes between each two selected genes, to avoid overlap between regions. 568 Pairwise synteny comparison between each two DNA sequences was done using the DECIPHER R package (Wright, 2016). Breaks in the synteny were defined as non-homologous regions longer than 15 570 base pairs. To compare the synteny between different types of pairwise comparisons (i.e. intraindividual, twins, non-related individuals) we defined the synteny score (equation 1): 572 Where n is the number of synteny blocks identified in each pairwise comparison, Lseq is the length of 574 the shorter sequence in each pair of compared sequences and Lsb(n) represents the length of the n th synteny block. 576

DECLERATIONS: 578
Ethics approval and consent to participate: All work involving the use of these previously collected samples from human subjects was approved by the Cornell University IRB (Protocol ID 1108002388). 580

Consent for publication: Not applicable.
Availability of data and materials: Sequences of pre and post captured metagenomes (the only new data 582 generated for this manuscript) are available on the European Nucleotide Archive accession number PRJEB38000. 584 Competing interests: The authors declare they have no competing interests.
Funding: This work was supported by the Max Planck Society and NIH grant# 2R01DK093595-05A1 586 category. In panel C (synteny) the number of significant regions out of the total number of regions with 774 sufficient comparisons are shown. P-values are shown in Table S5