A key genetic factor for fucosyllactose utilization affects infant gut microbiota development

Recent studies have demonstrated that gut microbiota development influences infants' health and subsequent host physiology. However, the factors shaping the development of the microbiota remain poorly understood, and the mechanisms through which these factors affect gut metabolite profiles have not been extensively investigated. Here we analyse gut microbiota development of 27 infants during the first month of life. We find three distinct clusters that transition towards Bifidobacteriaceae-dominant microbiota. We observe considerable differences in human milk oligosaccharide utilization among infant bifidobacteria. Colonization of fucosyllactose (FL)-utilizing bifidobacteria is associated with altered metabolite profiles and microbiota compositions, which have been previously shown to affect infant health. Genome analysis of infants' bifidobacteria reveals an ABC transporter as a key genetic factor for FL utilization. Thus, the ability of bifidobacteria to utilize FL and the presence of FL in breast milk may affect the development of the gut microbiota in infants, and might ultimately have therapeutic implications.

I t is becoming increasingly apparent that the bacterial ecosystem in our gut has a profound influence on human health and disease. The gut microbiota contributes to immune system maturation, energy harvesting and sympathetic nervous system development. In particular, the composition and metabolite profiles of gut microbiota have been associated with pathogen resistance [1][2][3] , inflammatory responses 4 and adiposity 5,6 .
Initial gut microbe colonization begins immediately after birth, and bacterial ecosystems develop within the first few days. Previous studies have reported that the composition of the infant gut microbiota differs from that of adults [7][8][9] , that substantial variation occurs between individuals 6,10,11 and that bifidobacteria predominate in most infants [11][12][13] . Recent studies also demonstrated that environmental factors including the mode of delivery and feeding affect the gut microbiota assemblage and that the process is not random 6,13,14 . Furthermore, it has been indicated that the gut microbiota development during infancy can have long-lasting effects on the individual's future health [15][16][17][18] . However, little is known about their pattern of progression, factors that drive the assembly of infant gut microbiota and how these factors affect metabolite profiles.
Here we investigated gut microbiota compositions and metabolic profiles for 217 stool samples obtained from 27 infants during their first month of life (202 samples from 12 infants were analysed longitudinally and 15 samples from 15 infants were studied in follow-up). The dynamics and equilibria of the developing microbiota were investigated, and their associations with metabolites were evaluated. We subsequently analysed phenotypes and genotypes of isolated bifidobacteria, and found a key genetic factor affecting infant gut microbiota composition and metabolite profile.

Results
Early development of gut microbiota. To investigate the dynamics of gut microbiota immediately after birth, we analysed the sequences of the V1-V2 region of the 16S rRNA genes obtained from 12 infants born by normal delivery (Supplementary Table 1) using the 454 GS Junior platform. We obtained stool samples every day during the first week after birth and every other day thereafter until 1 month of age (B17 stool samples per infant; 202 samples in total). A total of 588,293 pyrosequencing reads (average 2,912±1,397 reads per sample; Supplementary Table 2 and Supplementary Fig. 1) were analysed using an open-source Quantitative Insights Into Microbial Ecology (QIIME) software pipeline 19 (Supplementary Table 3). Figure 1a shows an age-dependent, gut microbiota composition heatmap for each subject at the bacterial family level. The analysis demonstrated that there are major variations in both the composition and dynamic progression among individuals. Overall, the composition of the infant's microbiota was relatively simple, being composed of only a few dominant bacterial families. The displacement of predominant bacteria occurred within only a few days. We observed an increased average abundance of Bifidobacteriaceae, a-diversities and total bacterial cell counts, as well as decreased average abundances of Enterobacteriaceae and Staphylococcaceae ( Supplementary Fig. 2).
Characteristics of the taxonomic composition observed among the samples were clearly distinguished by principal coordinate analysis (PCoA) and partitioning around medoids (PAM) 20 on the basis of bacterial family composition data ( Fig. 1b and Supplementary Data 1). Values of the Calinski-Harabasz (CH) index with PAM clustering suggest that the infant microbiota could be divided into three clusters ( Supplementary Fig. 3), which were characterized by the predominance of Bifidobacteriaceae, Enterobacteriaceae or Staphylococcaceae (Fig. 1c).
We subsequently observed sequential transitions occurring from Staphylococcaceae-to Enterobacteriaceae-and/or Enterobacteriaceae-to Bifidobacteriaceae-dominated microbiota, with considerable individual variation in the day of the transition (Fig. 1d). Transitions in the opposite direction were rarely observed. The results suggest that the best-adapted bacterial family in the infant gut may be Bifidobacteriaceae, followed by Enterobacteriaceae and Staphylococcaceae.
Microbiota in 1-month-old infants. Subsequently, we performed 16S rRNA gene-library analysis using additional faecal samples; in addition to the faecal samples from 12 infants (subjects A-L, shown in Fig. 1 on day 29), we obtained faecal samples from their parents (n ¼ 22) and another 15 breast-fed infants at approximately 1 month after birth (Supplementary Tables 4 and 5). A total of 147,010 high-quality reads (average 3,058±1,232 reads per sample; Supplementary Data 2) were analysed, and the resulting family composition heatmap is shown in Fig. 2a.
The results of PCoA and PAM clustering based on bacterial family compositions suggest that the microbiota of 1-month-old infants stratified into two clusters that were distinct from the adult cluster ( Fig. 2b and Supplementary Fig. 4). The majority of infants (n ¼ 18, designated as cluster B) were characterized as having a significantly high abundance of Bifidobacteriaceae, and the minor cluster (n ¼ 9, designated as cluster E) had a significantly high abundance of facultative anaerobes such as Enterobacteriaceae, Enterococcaceae and Staphylococcaceae ( Fig. 2c and Supplementary Figs 5 and 6). The 22 adult samples formed a single cluster (designated cluster AD), showing significantly higher abundances of Lachnospiraceae, 'Clostridiales incertae sedis XIV', Bacteroidaceae, Ruminococcaceae and Peptostreptococcaceae, as well as higher a-diversity compared with the two infant microbiota clusters ( Fig. 2c and Supplementary Figs 5-7).
The correlation coefficients observed between bacterial family abundances in infant and adult microbiota are illustrated in network diagrams ( Fig. 2d and Supplementary Figs 8 and 9). Analysis of the infant stool samples indicated that the abundance of predominant Bifidobacteriaceae negatively correlated with those of Enterobacteriaceae, Enterococcaceae, Clostridiaceae and Staphylococcaceae. Furthermore, the network diagram for adults indicated that bacterial family compositions and their associations were different from those of infants.
Bacterial lineages and gut environments. To understand how gut microbiota affect the host's physiology, it is important to understand the entire gut ecosystem, not only in terms of microbiota compositions but also in terms of the metabolites produced by the bacteria. Therefore, we investigated the pH and organic acid concentrations of each infant's stool (Supplementary Data 3) and assessed the correlation between these parameters and bacterial family abundances ( Fig. 3a and Supplementary  Fig. 10). We found that increased Bifidobacteriaceae abundance positively correlated with organic acid concentrations and total bacterial counts but negatively correlated with pH. In contrast, the abundance of facultative anaerobes such as Enterobacteriaceae and Staphylococcaceae correlated with decreased organic acid concentrations, decreased total bacterial counts and increased pH (Fig. 3a), as well as decreased Bifidobacteriaceae abundance (Fig. 2d).
Previous studies have reported that bifidobacteria produce acetate and lactate as their metabolites 1,21 , and that some of these strains (for example, Bifidobacterium longum ss. infantis ATCC 15697) can efficiently utilize some human milk oligosaccharide (HMO) components [22][23][24][25] . Therefore, we hypothesized that bifidobacteria consume the remaining oligosaccharides in the infant gut, causing elevated acetate and lactate concentrations and decreased pH. Furthermore, we examined the concentrations of major residual HMO components in infant stools by highperformance liquid chromatography (HPLC; Supplementary Data 3 and Supplementary Figs 11 and 12) 26 . As expected, HMO consumption in the gut was associated with increased abundances of Bifidobacteriaceae (Fig. 3a), increased organic acid concentrations and decreased pH values (Fig. 3b). However, some infants showed high faecal oligosaccharide concentrations despite the presence of Bifidobacteriaceae (Fig. 3c). Therefore, we examined the species composition of bifidobacteria in these infants (Fig. 3c). We also analysed the oligosaccharide profiles of breast milk provided by their mothers, and found that three infants (subjects I29, TB16 and TB19) received breast milk from non-secretor mothers (lacking 2'-fucosyllactose (2'-FL) and 2', 3-difucosyllactose (DFL), Supplementary Table 6). However, we found it difficult to explain these discordant results.
Bifidobacterial genomes and FL utilization. To gain insight into how the strains showed differences in FL utilization, we determined the draft genomes of all 29 strains (Table 1 and  Supplementary Table 7) and subsequently performed OrthoMCL clustering analysis using bi-directional BLAST alignments ( Supplementary Fig. 16a). Initially, we investigated the presence of fucosidase genes (glycoside hydrolase family classifications: GH95 and GH29) and confirmed that all strains exhibiting robust growth in an HMO-containing medium possessed at least one fucosidase gene (Table 1 and Supplementary Fig. 16b). However, we found that 6 out of 15 strains showing limited growth in the HMO-containing medium possessed a fucosidase gene. We analysed the phylogeny and subcellular localization of the fucosidase genes. We found that most fucosidases, except for that of B. bifidum BI-14, are intracellular enzymes, with only minor sequence differences occurring between FL-utilizing and non-FLutilizing strains ( Supplementary Fig. 17). Subsequently, we searched for other genes responsible for FL utilization and discovered that the presence of a homologous group (HG_2571) corresponded with the FL-utilization phenotype in all strains, except for B. bifidum BI-14 (Table 1 and Supplementary Fig. 16c). BLAST searches against the KEGG database indicated that the homologous sequences were highly similar to substrate-binding protein (SBP), which participates in the multiple-sugar ABC transporter system (KEGG entry K02027; Supplementary Fig. 16c). Furthermore, we found that two permease genes of the multiple-sugar ABC transporter system (K02025 and K02026) and the fucosidase gene (GH95) were adjacent to the SBP gene in most strains (Fig. 4a). On the basis of these findings, we hypothesized that the ABC transporter mediates FL transportation into bacterial cells. This hypothesis could explain the absence of the transporter in the B. bifidum BI-14 strain, which utilizes FL because that strain expresses extracellular membrane-bound fucosidases (Supplementary Fig. 17).
To determine whether the putative ABC transporter SBP for FL (denoted FL-SBP) mediates FL utilization, the gene was knocked out in B. breve BR-A29 using homologous recombination ( Supplementary Fig. 18). After confirming that the FL-SBP gene was knocked out, growth of the knockout strain was investigated in the HMO medium. In contrast with the original BR-A29 strain, the FL-SBP gene-knockout strain showed limited growth in the HMO medium (Fig. 4b). Furthermore, we confirmed that FL was not utilized by the FL-SBP gene-knockout strain (Fig. 4c), demonstrating that the FL-SBP is responsible for FL utilization.
HMO-utilizing bifidobacteria affect gut ecosystems. Having identified the differences in FL utilization among bifidobacteria, we subdivided the Bifidobacteriaceae-dominated microbiota (cluster B) into those colonized by FL-utilizing bifidobacteria (designated cluster B1; n ¼ 11) and those dominated with non-FL-utilizing bifidobacteria (cluster B2; n ¼ 7; Supplementary  Fig. 19). Furthermore, we compared the faecal organic acids, pH, HMO and microbiota compositions of these two subgroups with those of Enterobacteriaceae-dominated microbiota (cluster E; n ¼ 9). Compared with clusters B2 and E, cluster B1 showed significantly higher acetate concentrations and lower pH and residual oligosaccharide concentrations (Po0.05, Mann-Whitney U-test with Bonferroni's correction; Fig. 5 and Supplementary Fig. 20). In addition, cluster B1 had significantly higher Bifidobacteriaceae and lower Enterobacteriaceae abundances. In contrast, there were no significant differences in faecal acetate concentrations, pH or oligosaccharide concentrations between clusters B2 and E.

Discussion
Gut microbiota development in healthy, full-term infants has been investigated using culture-based enumeration 29 , molecular analysis with 16S rRNA gene-targeting primers or probes [10][11][12]30 , or, more recently, using metagenomic approaches 8,9,13,31 . These intensive observational studies have demonstrated that the modes of delivery 10,13 , feeding 10,13,29 and use of antibiotics 12 can influence the development of the infants' microbiota. In addition, recent investigations have suggested that maternal HMO secretion type 31 , environmental exposures 10,14 and bacterial transmission and propagation 30 can also alter the assemblage of infants' microbiota. However, the overall understanding of gut microbiota development is still limited because of the small number of subjects studied 9 , the low frequency of sample analysis 8,13,31 and/or the limitations in the enumeration methods used [10][11][12] . In addition, metabolite profiles, bacterial strain isolation and related phenotype and genotype investigations were not performed in these studies. Thus, the presence of infant microbiota equilibria, key bacterial lineages and genetic factors regulating gut metabolite profiles and the molecular mechanisms that influence the gut microbiota in early life remain poorly understood. Among the considerable number of metabolic pathways, we demonstrated the importance of the FL-utilization pathway, which is associated with high concentrations of acetate in the gut and alters the microbiota composition (Fig. 6). To our knowledge, this is the first study suggesting that a single bacterial gene may affect the gut metabolic profile and microbiota composition in a human cohort. Previous metagenomics and metabolomics analyses in infants have suggested that gut microbial communities consistently serve the same major functions, regardless of variations in their compositions 18 . In contrast, we focused on the utilization of HMOs, the most abundant carbohydrates in the infant gut, and found that the important gut microbial metabolic activity was associated with a specific group of infant gut microbes in this study.
Genomic analysis of bifidobacteria and subsequent experiments with target gene knockout strain identified an ABC transporter that plays an essential role in FL utilization. Previous studies have demonstrated that some bifidobacteria can efficiently utilize HMO components, such as LNT 22,23 . Differences in FL utilization have also been observed between different species and strains 32,33 . However, the mechanism and importance of FL transport into bacterial cells have not been demonstrated.
Recently, Lewis et al. 31 investigated the relationship between maternal secretor status and the development of infant microbiota, and reported that infants fed by non-secretor mothers are delayed in the establishment of microbiota dominated by bifidobacteria. However, the FL-utilizing properties of bifidobacteria and the molecular mechanisms of FL utilization characterized in this study were not taken account.  Figure 6 | Infant microbiota development and molecular mechanisms of FL utilization by bifidobacteria. Infant gut microbiota showed 3 distinct clusters, which underwent directional transition to Bifidobacteriaceae-dominant microbiota, and displayed individual variations in the pace of progression. Isolated bifidobacterial strain showed differences in FL utilization. Colonization of FL-utilizing bifidobacteria are associated with altered gut acetate concentrations, pH, and Enterobacteriaceae and Bifidobacteriaceae abundances in the cohort, which have been previously shown to affect infant health. We subsequently identified a SBP of the multiple-sugar ABC transporter system is a key genetic factor of FL utilization.
In this study, we found that 3 out of 27 infants received breast milk from non-secretor mothers, and one infant (subject I, Supplementary Fig. 19) harboured FL-utilizing bifidobacteria. The infant exhibited considerably lower Bifidobacteriaceae abundance, higher Enterobacteriaceae abundance, lower acetate concentrations and higher pH (45%, 20%, 16.2 mM and 6.2, respectively; Supplementary Data 2 and 3) compared with the other cluster B1 infants (Fig. 5). These results support our notion that the presence of FL in breast milk and colonization of FL-utilizing bifidobacteria induce altered gut microbiota composition and metabolite profiles, consistent with the observation by Lewis et al. 31 Larger infant cohorts are needed to identify the relationship between maternal secretor status and colonization of FL-utilizing and non-utilizing bifidobacteria in order to draw more robust conclusion. A recent study reported that the 'commensal colonization factors' encoding a unique class of polysaccharide utilization genes play important roles in stable colonization of Bacteroides, one of the most predominant genera in the human adult microbiota 34 . In the current study, we identified the FL transporter as a key stable colonization factor of gut microbes in infants. More importantly, the FL transporter was shown to cause changes in the gut microbiome and the production of metabolites, which may confer benefits to infants. Thus, these data indicate that the FL transporter is a symbiotic colonization factor of infant bifidobacteria, which is conceptually distinct from the commensal colonization factor of Bacteroides colonizing the adult gut.
Changes in metabolite profiles and microbiota compositions caused by FL-utilizing bifidobacteria have been suggested to have a variety of beneficial effects on the host 1,4,[35][36][37][38] . It has been demonstrated that the acetate produced by bifidobacteria protects against Escherichia coli O157 infection by improving the gut barrier 1 or inhibiting Shiga-toxin production 35 in animal models. Other studies have reported that the short-chain fatty acids produced by gut microbes are associated with host energy balance 5,36 , inflammation resistance 4 and sympathetic nervous system development 37 . In addition, correlations between decreased Enterobacteriaceae abundance and lower susceptibility to infection have been demonstrated in animal models 2 and human cohorts 3 . Thus, the ABC transporter that is present in HMO-utilizing bifidobacteria may benefit the host by facilitating acetate production and limiting the abundance of Enterobacteriaceae.
Given the importance of infant gut microbiota for the longterm health of the individual, the FL-utilizing properties of bifidobacteria and related molecular mechanisms may ultimately have implications in therapeutic approaches or preventive medicine. Our findings suggest that the presence of FL in breast milk and its utilization by bifidobacteria help fostering the mutually beneficial relationship between humans and the main constituents of the infant gut microbiota. Since the ABC transporter is present in only a subset of bifidobacteria, future studies should investigate the effects of FL-utilizing bifidobacteria on the health of individuals using human interventions or animal models. The phenotypic and genotypic properties of bifidobacteria should be considered during the development of probiotics targeting the gut microbiota of infants.

Methods
Subject recruitment and sample collection. The study was approved by the ethical committees of Yakult Central Institute (subjects A to L; Supplementary Table 1) and Teikyo University (subjects TB-XX; Supplementary Table 4). Written informed consent was obtained from the parents before enrolment. No infants received antibiotics or probiotics before or during the study period. Infant stool samples were collected by the parents, B17 times each from 12 infants (subjects A to L) and once from 15 infants (Supplementary Table 4). Additional stool samples were collected from most parents of infants A to L (Supplementary Table 5). The samples were frozen at À 20°C, transferred to the laboratory and stored at À 80°C. In total, 239 samples were collected for gut microbiota analysis, including 217 stool samples from infants and 22 stool samples from parents. Parents were instructed to record changes in diet, medications, hospitalizations, birth weight and gestation age at the time of delivery (Supplementary Tables 1 and 4). HMOs were purified from breast milk provided by nine healthy volunteers (32.6 ± 3.9 years of age) at approximately 1 year following delivery.
DNA extraction. DNA was extracted as described in ref. 39. Briefly, 20 mg of each faecal sample (equivalent to 100 ml of fivefold homogenate) was suspended in 500 ml of extraction buffer (100 mM Tris-HCl, pH 9.0, 40 mM EDTA and 1% SDS; final concentrations) in a 2-ml screw-cap tube. Then, 300 mg of 0.1-mm-diameter glass beads and 500 ml of TE buffer-saturated phenol were added to the suspension. Microbial cells were lysed by mechanical disruption using a FastPrep FP 120 (BIO 101, Vista, CA, USA) at a power level of 5.0 for 30 s. The mixtures were centrifuged at 4,500g for 5 min, and the upper layers were subjected to phenol/chloroform/ isoamyl alcohol extraction, followed by isopropanol and ethanol precipitation. Finally, the dried DNA samples were suspended in 1 ml TE buffer and stored at À 35°C until they were pyrosequenced.
16S rRNA gene amplification and pyrosequencing. The V1-V2 regions of 16S rRNA gene were selected for pyrosequencing analysis because that region offers enough sequence variation for species-level discrimination of bifidobacteria, which are reported as the main constituent of infant gut microbiota. The 16S rRNA genes of each sample were amplified using a forward 63Fm-TAG-linker A primer (5 0 -CCATCTCATCCCTGCGTGTCTCCGAC-TCAG-NNNNNNNNNN-GCYT AAYACATGCAAGTMGA-3 0 ) and a reverse 338R-linker B primer (5 0 -CCTA TCCCCTGTGTGCCTTGGCAGTCTCAG-GCTGCCWCCCGTAGG-WGT-3 0 ). The italicized sequence represents 454 Life Sciences linkers A and B, respectively; TCAG in forward primer was inserted as a key sequence as recommended by the manufacturer; NNNNNNNNNN in forward primers represents the unique 10-base barcode for tagging each PCR product; and the underlined sequence is a broadly conserved sequence in bacterial 16S rRNA genes. We used universal primer 63F (ref. 40) with modification since bifidobacteria have a two-base-pair mismatch with respect to the universal primer 27F (ref. 41). The taxonomic coverage of the 63Fm and 338Rm primer sequences was evaluated with the Probe Match programme in the Ribosomal Database Project (RDP; Release 11, Update 2) 42 . A total of 1,354,916 16S rRNA gene sequences in the RDP were retrieved using the following parameters: strain ¼ both, source ¼ both, sizeZ1,200, quality ¼ good. The sequences retrieved were used as reference sequences for the Probe Match programme to evaluate the taxonomic coverage of the primer pair. The taxonomic coverage of the primer pair for each phylum was calculated as the percentage of genera for which more than half of the member sequences in each genus were found to match the candidate sequence with no more than one mismatch 43 . The PCR mixture (50 ml total volume) contained 1 Â SYBR Premix Ex Taq (Takara Bio, Osaka, Japan), 100 nM of each primer and 1 ml of template DNA. The thermocycling conditions used were as follows: 95°C for 5 min, followed by 25-40 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 1 min. Amplification was performed using an ABI PRISM 7500 Real-Time PCR System (Applied Biosystems, Framingham, MA, USA), and thermal cycling was stopped before the amplification curves reached a plateau to avoid introducing biases to the microbiota composition and generation of chimera. Amplicons were purified with the AMPure XP Kit (Beckman Coulter Genomics GmbH, Bernried, Germany) and quantified using the Quant-iT PicoGreen dsDNA Kit (Invitrogen, Leek, the Netherlands). Equal amounts of the amplicons were pooled and sequenced using a Roche 454 GS Junior pyrosequencer with a GS FLX Titanium emPCR Kit (Lib-L) according to the manufacturer's protocols (Roche Applied Science, Bavaria, Germany).
16S rRNA gene sequence-based microbiota analysis. The sequences generated from the 454 GS Junior platform were analysed using the open-source software package QIIME 19 . The resultant 16S rRNA gene sequences were assigned to operational taxonomic units (OTUs) using the USEARCH algorithm 44 , with a 97% identity threshold. Simultaneously, UCHIME 45 was employed to remove potential chimeric sequences from a representative set of OTUs, using the reference mode (against 'Gold' database 46 ). The taxonomy of each representative OTU sequence was assigned using the RDP-naive Bayesian Classifier with a minimum bootstrap threshold of 50% (ref. 47). A single representative from each OTU was aligned using the MUSCLE alignment tool 48 , and a phylogenetic tree was constructed using FastTree 49 . a-diversities (the number of OTU observed, Shannon index and phylogenetic diversity) were estimated for 1,000 randomly selected sequences to account for differences in sampling effort between the samples. Detailed QIIME commands and options are summarized in Supplementary Table 3.
PCoA and between-class analysis were performed as described in ref. 20. Data generated by QIIME at taxonomic level 5 (family) were used to calculate the Jensen-Shannon distances between samples. The PAM clustering algorithm 50 was applied to cluster the profiles. Estimation of the number of clusters in infant gut microbiota by PAM clustering according to the method in ref. 20 with modifications. We examined the number of clusters exhibiting the highest CH index 51 in the randomly subsampled data set (see Supplementary Figs 3 and 4 for more details). This trial was repeated 1,000 times. A cluster number most frequently exhibiting the highest CH index in the 1,000 trials was defined as the optimal number of microbiota cluster.
Total bacterial counts. Approximately 1 g of faecal samples were diluted fivefold with PBS, homogenized, and a portion of the homogenates (100 ml) was fixed with 300 ml of 4% paraformaldehyde at 4°C for 16 h. The paraformaldehyde-treated samples were smeared on a MAS-coated slide glass (Matsunami, Osaka, Japan) and stained with Vectashield Mounting Medium with 4,6-diamidino-2-phenylindole (Vector Laboratories, Burlingame, CA, USA). The fluorescent cells were counted in an automated format using a Leica MM AF microscope (Leica, Wetzlar, Germany) and Image-Pro Plus Image Analysis Software (version 5.1; Media-Cybernetics, Silver Spring, MD, USA). Bacterial counts were expressed as the mean values of 10 fields for each sample.
pH values and organic acid concentrations. The pH values of the faecal samples were measured directly with a handheld pH meter (model IQ150) equipped with a PH17SS electrode (IQ Scientific Instruments, San Diego, CA, USA). To determine the concentration of organic acids, 180ml of faecal homogenates (fivefold, prepared as described above) were mixed with 20 ml 10% HClO 4 and incubated at 4°C overnight. Subsequently, the samples were centrifuged for 5 min at 14,000g and the supernatants were filtered through Centricut W-MO membrane filters (0.45 mm; Kurabo, Tokyo, Japan). The concentrations of organic acids were determined with a HPLC system equipped with 432 electroconductivity detectors (Waters, Milford, MA, USA) and a Rspak KC-811 column (Showa Denko KK, Tokyo, Japan) 52 .
Statistics. Statistical analyses were performed using the R software, version 3.1.0 (http://www.r-project.org/) and Excel 2013 (Microsoft). Differences in bacterial abundance, a-diversities and other host metadata between the clusters were evaluated by the Mann-Whitney U-test, with Bonferroni's correction. Differences between clusters were represented with box plots. The middle bar of each plot indicates the median, while the top and bottom of each box indicate the third and first quartiles, respectively. Whiskers denote the lowest and highest values within 1.5 times the interquartile range from the first and third quartiles, respectively. Circles denote outliers beyond the whiskers. Correlations between host metadata, microbiota (bacterial family abundances, a-diversities and total bacterial counts) and gut environmental factors (pH, organic acid and oligosaccharide concentrations) were analysed by calculating Spearman's rank correlation coefficients. To construct the network diagram, Spearman's correlations were computed between the bacterial families representing more than 1% (on average) of the microbiota ( Supplementary Figs 8 and 9). Only relationships having an absolute Spearman's correlation above 0.3 with a P-value less than 0.05 were selected. The yEd Graph Editor (https://www.yworks.com/products/yed) was then used to construct the network figures.
Preparation of HMOs. Breast milk samples (B1 l total) were combined, mixed with 4 volumes of chloroform:methanol (2:1) and stirred vigorously for 1 min. After centrifugation at 9,000g, the upper layer was collected, evaporated and dissolved in H 2 O. Two volumes of ethanol were added to it and incubated overnight at 4°C, and the precipitated lactose was removed by passaging samples through filter paper. The ethanol was removed by rotary evaporation, after which the residue was dissolved in H 2 O and applied to an XK column (5 cm in diameter and 100 cm in length; GE Healthcare, Buckinghamshire, UK). Elution was performed overnight using water as the mobile phase, with a flow rate of 1.5 ml min À 1 . Fractions were collected and analysed with HPLC, and those fractions containing HMOs were combined.
Isolation of bifidobacteria from infant faeces. Faecal homogenates were diluted 10 5 -10 8 -fold with sterile saline (0.85% NaCl) and spread on bifidobacteria-specific TOS propionate agar plates (Yakult Honsha Co., Tokyo, Japan) supplemented with 50 mg l À 1 mupirocin. The plates were incubated at 37°C for 3 days in an anaerobic chamber (Coy Laboratory Products, Grass Lake, MI, USA) with 88% N 2 , 5% CO 2 and 7% H 2 . All resulting colonies appearing on the plates inoculated with the highest and second-highest dilutions of faecal homogenate (at least 10 colonies per plate) were then streaked on GAM agar plates (Nissui Seiyaku, Tokyo, Japan). These isolates were further classified by random amplified polymorphic DNA profiling, as described in ref. 53. Isolates with identical random amplified polymorphic DNA patterns using two primers (p1254, 5 0 -CCGCAGCCAA-3 0 ; p1281, 5 0 -AACGCGCAAC-3 0 ) were regarded as identical strain. Each strain was grown overnight in GAM broth (Nissui Seiyaku), centrifuged, suspended in GAM broth with 20% glycerol and stored at À 80°C. For further analysis, the strains were passaged twice in GAM broth and grown in an anaerobic chamber.
In vitro growth of bifidobacterial strains in the presence of HMO as the sole carbon source. The bacterial strains listed in Fig. 4a were cultured in PY medium (200 mM PIPES, pH 6.7, 2 g l À 1 Peptone, 2 g l À 1 BBL Trypticase Peptone, 2 g l À 1 Bacto Yeast Extract, 8 mg l À 1 CaCl 2 , 19.2 mg l À 1 MgSO 4 .7H 2 O, 80 mg l À 1 NaCl, 4.9 mg l À 1 hemin, 0.5 g l À 1 L-cysteine hydrochloride and 100 ng l À 1 vitamin K 1 ), which was supplemented with lactose or HMOs. The PY medium (200 ml) was inoculated with bifidobacterial strains growing at an exponential phase (equivalent to OD 600 ¼ 5 Â 10 À 4 ) and covered with 50-ml sterile mineral oil to prevent evaporation. Growth was monitored by measuring the OD 600 using a PowerWave 340 plate reader (BioTek, Winooski, VT, USA) every 30 min in the anaerobic chamber. Two technical replicates were performed for each strain. The supernatants of the bacterial cultures were collected at the end of the exponential growth phase (40 h) and stored at À 80°C for subsequent glycoprofiling.
Glycoprofiling of HMOs. Oligosaccharides in bacterial supernatants or faecal suspensions were labelled with an ultraviolet light-absorbing compound, p-aminobenzoic acid ethyl ester (ABEE), and quantified using HPLC, as described in ref. 26. Briefly, 200 ml of bacterial culture or faecal homogenate was used as the specimen, and arabinose was added as an internal standard (1 mM final concentration). The cultured media were mixed with 800 ml of chloroform:methanol (2:1) and stirred vigorously for 1 min. The upper layer was collected after centrifugation at 9,000g for 10 min and concentrated by evaporation at 70°C under a nitrogen gas flow. The crude oligosaccharides were dissolved in 200 ml of distilled water. A 10-ml portion of this solution was added to 40 ml of freshly prepared reagent mixture (35 mg ABEE, 3.5 mg NaBH 3 CN, 41 ml acetic acid and 350 ml methanol). The mixtures were heated at 80°C for 45 min in screw-capped vials, after which 500 ml of water was added. The mixture was extracted three times with 500 ml of diethyl ether to remove excess ABEE. The aqueous layer was evaporated, and the residue was dissolved in 300 ml of distilled water. Ten-microlitre samples of labelled HMOs were then separated on a Shimadzu Prominence HPLC System (Kyoto, Japan) with an L-column 2 ODS (Chemicals Evaluation and Research Institute, Japan). Fractions were eluted with an 87:13 (vol:vol) mixture of 100 mM ammonium acetate buffer (pH 4.5) and acetonitrile, using flow rates of 1.0 ml min À 1 (0-33 min) and 2.0 ml min À 1 (33-55 min) at 40°C. Labelled oligosaccharides were detected with an SPD-20 A ultraviolet detector (Shimadzu) at 304 nm. Standard curves for the major HMO components were generated using solutions containing both the internal standard (1 mM arabinose) and oligosaccharides at varying concentrations. The peak areas of the saccharides detected were standardized and quantified by comparison with the internal standard. 3-fucosyllactose, lacto-N-fucopentaose (LNFP) I, lacto-N-difucohexaose (LNDFH) I and LNDFH II were purchased from Dextra Laboratories (Reading, UK). 2'-FL, DFL, LNT, lacto-N-neotetraose, LNFP II and LNFP III were purchased from IsoSep AB (Tullinge, Sweden). N-acetyl glucosamine was purchased from Nacalai Tesque (Kyoto, Japan). These structures are illustrated in Supplementary Fig. 12.
Genome sequencing. The draft genomes of 29 bifidobacterial strains were sequenced with the MiSeq sequencing platform (Illumina, San Diego, USA). Multiplexed shotgun libraries were constructed using the Nextera XT DNA Sample Prep Kit v2 (Illumina). The resulting paired-end sequence reads (250 bp Â 2) were assembled into contigs using the assembly software ABySS 1.3.5 (ref. 54). The optimal k-mer size was predicted using VelvetK (http://bioinformatics.net.au/ software.velvetk.shtml; maximum value ¼ 128). To identify adjacent gene of fucosidase (GH29 and 95) and FL-SBP, draft contigs were aligned to the published genome of B. breve UCC2003 (Accession Number CP000303) 55 or B. longum ss. infantis ATCC 15697 (CP001095) 22 using BLAST to predict the order and orientation. Oligonucleotide primers were designed to anneal to each end of the neighbouring contigs, and PCR amplification was performed to confirm the contig ordering. The resulting amplicons were subjected to Sanger sequencing. Draft genome sequences of 29 bifidobacterial strains were deposited in the DDBJ wholegenome shotgun database under BioProject Accession No. PRJDB4041.
Annotation and comparative genome analysis. Protein-coding sequences (CDSs) on 4500-bp contigs were predicted using the gene prediction software Glimmer 3.02 (ref. 56). Functional annotations of CDSs were performed by BLASTP searches (version 2.2.24) 57 against the KEGG database 58 , with an e-value cutoff of 10 À 6 . All-against-all BLASTP searches for all proteins from the 29 strains were performed with an e-value cutoff of 10 À 5 and a minimum identity of 55%, and the results were clustered using the OrthoMCL method 59 . rRNA genes were identified by BLASTN analysis using known bifidobacterial rRNA sequences as queries. The genes annotated as fucosidase (both GH29 and GH95) by BLASTP searching were aligned using Clustal X (ref. 60) and were used as inputs for calculating phylogenetic relationships. Phylogenetic trees were computed with the neighbour-joining method using Clustal X and were visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree; Supplementary Fig. 17). Localization of the fucosidase proteins was predicted based on their amino-acid sequences using the PSORT software (http://psort.hgc.jp/form.html).
Multilocus sequence analysis. Identification of isolated strains by multilocus sequence analysis was performed by targeting seven reference genes (16S rRNA, clpC, fusA, groEL, gyrB1, purF and xfp). The genome sequences of B. breve UCC2003 (Accession Number CP000303) were used as queries for sequence identity searches, as described in ref. 61. The homologue sequences of the target genes were extracted from the 29 bifidobacterial strains by BLAST, concatenated and aligned with Clustal X (ref. 60). Phylogenetic trees were computed and visualized as described above.
Data availability. Infant gut 16S rRNA gene microbiome data have been deposited in the DDBJ Sequence Read Archive (DRA) under BioProject accession code PRJDB4038. The draft genome sequences of the 29 bifidobacterial strains have been deposited in the DDBJ whole-genome shotgun database under BioProject accession code PRJDB4597. The FL-SBP gene sequence of B. breve BR-A29 has been deposited in the DDBJ database under accession code LC068768. The authors declare that all other data supporting the findings of this study are available within the article and its Supplementary Information Files, or from the corresponding author upon request.