The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young


IMPORTANCE
We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations.
KEYWORDS 16S rRNA gene sequencing, DNA sequencing, compositional data, cross-sectional study, gut microbiota, healthy aging, microbiota

Nobel Laureate Elie Metchnikoff is credited not only with providing insight into phagocytosis, but also with linking gut microbes and intake of fermented food to health and longevity (1). The search for healthy aging has been resurrected by the ability to identify a plethora of microbes at various body sites, particularly the gut, and to show that they influence and are influenced by health there and at distant sites (2,3). This has led to investigations into the development of the gut microbiota throughout life, with some studies reporting a gradual change over time (4) based upon relatively small sample sizes (5-7). In a study of 728 female twins, 637 operational taxonomic units (OTUs) were associated with frailty, including Eubacterium dolichum and Eggerthella lenta, with Faecalibacterium prausnitzii less abundant (8).
Since frailty is strongly associated with increasing risk of earlier mortality (9), it is important to characterize the gut microbiota along the continuum of age. The recent consensus is that, in aggregate, the diversity of the gut microbiota declines with age (10), although whether this is associated with healthy aging is controversial and the delineation of what is normal in different cohorts is still not clear. Thus, continued study of the gut microbiota in large and distinct cohorts is needed to identify and separate potential microbial biomarkers for age and frailty.
A study of 314 healthy Chinese subjects showed that ethnicity and lifestyle could be discriminated at the bacterial species level (11). The latter study led us to hypothesize that the gut microbiota is imprinted in early life and associated with longevity. We gained access to a large number of fecal samples from a variety of communities across the age continuum in China and used a compositionally coherent approach to examine the variance of the OTUs across all samples with age as a continuous variable for exploratory data analysis and compositional association and as a discrete variable for differential abundance analysis.
Our results show that the microbial composition of the healthy aged population is remarkably similar to that of younger adult cohorts and that the major differences between cohorts in microbial composition occur prior to age 30 years. While our cross-sectional cohort precludes the assignment of cause and effect, our results suggest that diet and lifestyle choices consistent with healthy aging even into the 10th decade of life include a healthy and diverse microbiota.

RESULTS
The primary cohort of samples was collected from extremely healthy volunteers in three cities from Jiangsu Province in China. These cities, Zhenjiang, Suzhou, and Nantong, were located west of Shanghai on the Yangtze River. Table 1 contains the detailed geographic locations, age and sex distributions, and number of samples included in each group. Subjects under the age of 30 were included if their parents and grandparents lived to at least 80 years of age without major health problems that required surgery or long-term medication. Subjects aged 30 years old or older were included if the subject self-reported as being in extremely good health at the time of collection (see Materials and Methods). A secondary cohort was collected from young people undergoing military and police training in Lanzhou, a city in the north-central part of China. All metadata, tables of derived data, and code required to reconstitute all figures and analyses are publicly available at https://doi.org/10.6084/m9.figshare.4535660. The first analysis was performed on the data set with 883 samples collected from the general population. This data set included 577 operational taxonomic units (OTUs) that occurred at an abundance of greater than 0.1% in any sample and that occurred in at least 20% of the samples.
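The OTU filtering rule stated above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code (which is available at the figshare link). The function name `filter_otus`, the samples-in-rows layout of the count table, and the reading of "occurred in" as a nonzero count are all assumptions made for the example.

```python
import numpy as np

def filter_otus(counts, min_rel_abund=0.001, min_prevalence=0.20):
    """Return a boolean mask of OTUs (columns) to keep.

    counts: 2-D array with samples in rows and OTUs in columns (assumed layout).
    An OTU is kept if its relative abundance exceeds min_rel_abund in at
    least one sample AND it has a nonzero count in at least min_prevalence
    of the samples (assumed interpretation of "occurred in").
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)    # reads per sample
    rel = counts / totals                         # per-sample relative abundance
    abundant_somewhere = (rel > min_rel_abund).any(axis=0)
    prevalent = (counts > 0).mean(axis=0) >= min_prevalence
    return abundant_somewhere & prevalent
```

Applying the returned mask to the columns of the count table (`counts[:, mask]`) gives the filtered table; the two thresholds default to the 0.1% and 20% values quoted in the text.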
Initial exploration of the general population cohort by principal-component analysis (PCA) of centered log ratio-transformed data (12) shows that the samples segregate with an observable but weak separation by age group (see "Samples" in Fig. S1 in the supplemental material). Figure 1 shows a compositional PCA plot of the samples and the associated loadings (13,14) for the 62 OTUs that were judged to be the most explanatory of this data set, as outlined in the Materials and Methods; these OTUs are identified in the "OTUs" panel of Fig. S1. While there is significant overlap in the location of the samples from each group, the most extreme groups were the youngest (ages 3 to 6 and 8 to 12 years), the oldest (age >94 years), and, surprisingly, the 19- to 24-year-old group. The loading plot in Fig. 1B shows the OTUs responsible for the difference in location of the samples. The finding that the 19- to 24-year-old group was divergent from the general population was reexamined by rerunning the analysis with the inclusion of a second cohort of 212 samples collected from military and police training groups (here the "young soldier" group): the entire data set in this case included 1,095 samples and 562 OTUs when filtered as for the general population cohort. The young soldier group included people who originated from various regions of China, were largely male, and again were between the ages of 19 and 24, inclusive. As shown in Fig. S2A in the supplemental material, the young soldier group clustered entirely and tightly within the extreme end of the 19- to 24-year-old group. An unsupervised clustering of the entire data set supported the general conclusion that the 19- to 24-year-old and young soldier groups and the >94-year-old and the 3- to 6-year-old groups were the most distinctive (Fig. S2B).
The tight clustering of the young soldiers within the 19- to 24-year-old group was likely because the young soldier group members were selected to be very healthy and very active, and all were housed together in two distinct common environments.
The young soldier group comprises people from a military training center in Lanzhou and from a separate police training facility in Lanzhou. The different groups were enrolled in distinct training regimens and did not eat together. Inclusion of only the 108 OTUs with the greatest effect or that were compositionally associated in this entire data set provided a similar ordination (Fig. S2C and D) to that seen when the young soldier group was excluded (Fig. 1A).
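The compositional PCA used for the ordinations above can be sketched as a centered log ratio (clr) transform followed by a singular value decomposition. The following is a minimal Python illustration, not the authors' pipeline: the function names are hypothetical, and the fixed pseudocount used to handle zero counts is an assumption chosen for simplicity.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Row-wise centered log ratio transform in log2 units.

    counts: samples in rows, OTUs in columns (assumed layout).
    pseudocount: simple zero replacement (an assumption of this sketch).
    """
    logx = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log abundance, so 0 means "equal to the
    # mean log2 relative abundance of all OTUs in that sample".
    return logx - logx.mean(axis=1, keepdims=True)

def compositional_pca(counts, n_components=2):
    """PCA of clr-transformed data via SVD.

    Returns sample scores, OTU loadings, and the fraction of variance
    explained by each retained component.
    """
    z = clr(counts)
    z = z - z.mean(axis=0, keepdims=True)       # center each OTU across samples
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Plotting the first two columns of `scores`, colored by age group, gives a sample ordination of the kind described for Fig. 1A; the rows of `loadings` correspond to the OTU loading plot.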
All pairwise group comparisons were distinct when examined by permutational multivariate analysis of variance (PERMANOVA [P < 0.001]) using the Aitchison distance (15). Examination of the PERMANOVA results showed that the 19- to 24-year-old and young soldier groups were more similar to each other than to any other group and that the group differences were otherwise small. Thus, the low P values likely reflect the large number of samples and not necessarily a large difference between all groups (see Table S1 in the supplemental material). Furthermore, we observed that the young soldier group had the smallest within-group dispersion and the >94-year-old group the greatest (Table S1).
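To make the PERMANOVA comparison concrete, the sketch below computes Aitchison distances (Euclidean distances between clr-transformed samples) and a one-way permutation test on the pseudo-F statistic. It is a minimal Python illustration under stated assumptions: the function names, the fixed pseudocount, and the number of permutations are choices made for the example and are not taken from the study.

```python
import numpy as np

def aitchison_distances(counts, pseudocount=0.5):
    """Pairwise Aitchison distances between samples (rows).

    Builds an n x n x p intermediate array, so it suits small examples only.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    z = logx - logx.mean(axis=1, keepdims=True)      # clr transform
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a distance matrix and group labels."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation P value)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = pseudo_f(dist, labels)
    hits = 0
    for _ in range(n_perm):
        if pseudo_f(dist, rng.permutation(labels)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable P value is 0.001, so with many samples per group even modest separations reach that floor. This is consistent with the caution above that low P values need not imply large differences between groups.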
The exploratory data analysis shown in Fig. 1 and Fig. S2 revealed several interesting patterns in the microbiota composition for this large age-segregated data set. First, OTUs assigned to the Bifidobacterium genus appeared to be relatively enriched in the youngest groups and relatively depleted in the oldest groups. Conversely, OTUs assigned to the genera Dorea, Clostridium incertae sedis (IS) and sensu stricto 1 (SS1), Marvinbryantia, and, to a lesser extent, members of the Prevotella genus appeared to be relatively enriched in the samples collected from older subjects.
The age gradients and segregation of the 19- to 24-year-old and young soldier groups were very robust to data manipulations (see Fig. S3 in the supplemental material). In particular, the same separation between groups was observed when rare OTUs were excluded (minimum relative abundance in any sample of 1%) or common OTUs were excluded (maximum relative abundance in any sample of 2%). We thus concluded that the structure of the data was a function of the entire microbial ecosystem and was not driven by either rare or abundant taxa. This is in contrast to other studies where the separation of particular groups was driven by only a few relatively abundant taxa (16,17).
Examination of confounding factors suggested that the age groups were distinct from each other and that subsetting the age groups by metadata failed to reveal any substantial variance within groups that could be attributed to the metadata (see Fig. S4 in the supplemental material). We conclude that no other metadata better explained the data.
However, while the overall pattern of group separation was similar, the percentage of variance explained was different for the ordination when only the 562 male-derived samples or the 533 female-derived samples were examined. Including samples from females only, principal component 1 (PC1) explained 11.3% of the variance, while PC1 explained 19.4% in the male-only samples. The percentages of variance explained on the second PCA axis were essentially the same at 7.4% and 6.4%.
The data were next examined by plotting age as a continuous variable versus Shannon's diversity, and the results are shown in the α diversity panel in Fig. 2. The range of values is rather narrow but exhibited a modest peak diversity near age 10 and a slight minimum near age 20. The pattern of Shannon's diversity is not an artifact of read depth since examination of samples from all ages with a narrow read depth band exhibited the same pattern, and α diversity and read depth were not found to be correlated (see Fig. S5 in the supplemental material).
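For reference, Shannon's diversity of a single sample can be computed from its OTU counts as sketched below. This is a minimal Python illustration; the use of the natural logarithm is an assumption, since the log base used in the study is not stated in this text.

```python
import numpy as np

def shannon_diversity(counts):
    """Shannon's diversity H = -sum(p_i * ln p_i) over OTUs with nonzero counts.

    counts: 1-D array of OTU counts for one sample.
    The natural log is an assumed choice of base.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()     # proportions of observed OTUs
    return float(-(p * np.log(p)).sum())
```

A sample dominated by one or two taxa scores lower than a sample with the same number of taxa at more even proportions, which is why relative dominance by a few genera would appear as a dip in the α diversity curve.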
Identification of correlated features in compositional data is notoriously difficult (18-22), and in particular Pearson's or Spearman's correlations give many false-positive associations (18). Thus, we used the expected value of a compositional association metric (22) to identify clusters of coassociated OTUs in the data set. This strength of association approach identifies OTUs where both the direction and magnitude of variance are similar in the multidimensional data set. In a multivariate compositional sense, the metric is measuring both the slope of the correlation of the centered log ratio (clr)-transformed values and the correlation itself. A slope of 1 and a correlation of 1 are preferred (18,22,23). The metric coupled with a Bayesian estimation of OTU relative abundance (22,24) has the advantage of being agnostic to the level of sparsity in the data, unlike other recent approaches that depend on this sparsity (19,20). Panels A to G in Fig. 2 show the relative abundance of OTUs assigned to one of 9 genera (plus unknown genera) that had an expected value of >0.65, and where the cluster size was larger than 2, plotted against age as a continuous variable. For reference, the last panel in Fig. 2 shows the correlation and slope of one pair of OTUs. The correlation and slope for all pairs of OTUs identified are given in Table S2 in the supplemental material.
FIG 2 Exploration of the data set with age as a continuous variable. The α diversity panel plots Shannon's diversity (shdiv) across the age range (x axis). Each point is an individual sample, and the black line is the Loess line of best fit. Panels A through G represent clusters of concordant OTUs with an expected value cutoff of >0.65; this metric provides a measure of the constancy of the ratio between OTUs and is a replacement for correlation (18,22,23). Each line in a cluster plot is the Loess line of best fit for the clr relative abundance (rAB on the y axis) of an individual OTU across age (x axis). A 0 value indicates that the relative abundance of an OTU is equal to the mean log2 relative abundance of all OTUs, while a positive or negative value indicates relative abundances greater or less than the mean log2 relative abundance, respectively. OTU lines of best fit are colored according to the genus that the OTU is classified into according to the key. Note that most of the clusters contain OTUs related by the same genus (A, C, D, F, and G). The lines of best fit suggest approximately equal ratios between the cluster members across the age range; however, this must be investigated further (22,23), as shown in the last panel. For demonstration, the relative abundances between one pair of concordant OTUs from cluster C is plotted in the bottom right panel (OTU 6 versus OTU 3340). These two OTUs are the two relatively most abundant OTUs in cluster C (top two lines), and they have an expected value of 0.8. The slope of association shown in the red line is 0.82. The blue line shows the ideal slope of 1 (22,23). The Pearson correlation coefficient is 0.83. Table S2 contains the slope and correlation information for all pairwise correlated OTUs.
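The association metric is not named explicitly in the passage above. To illustrate how a statistic of this kind can reward both a slope near 1 and a high correlation of clr values, the sketch below uses the ρ proportionality statistic, ρ = 1 - var(x - y) / (var(x) + var(y)). Treating ρ as the metric used in the study is an assumption, the function name is hypothetical, and the averaging over Bayesian (Monte Carlo) estimates of relative abundance that gives an "expected value" is omitted.

```python
import numpy as np

def rho_proportionality(x, y):
    """Proportionality statistic rho for two clr-transformed OTU vectors.

    rho = 1 - var(x - y) / (var(x) + var(y)).
    It equals 1 only when x and y differ by a constant, that is, when the
    ratio of the two OTUs is the same in every sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.var(x - y, ddof=1) / (np.var(x, ddof=1) + np.var(y, ddof=1))
```

As a worked case, if y = 2x the Pearson correlation is exactly 1, but ρ = 1 - var(x) / (5 var(x)) = 0.8, because the slope departs from 1. This is the sense in which such a metric measures the slope as well as the correlation.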
This approach identified seven distinct clusters, and most clusters contained OTUs classified into the same genus. It is likely that the most strongly associated groups include predominantly members of the same genus because different members of the same genus have similar growth requirements, limitations, and interactions.
Clusters F and G were relatively constant across all samples and were composed of OTUs identified as being members of the Faecalibacterium genus, suggesting that they are members of the core microbiota in this population. This result contrasts with the findings of Biagi et al. (25), who observed that the OTUs assigned to the genus Faecalibacterium as a whole were relatively less abundant in centenarians than in younger persons.
Clusters A to E showed a large change in relative abundance in the 19- to 24-year-old group compared to all other age groups. In particular, the relative abundances of OTUs assigned to the genera Prevotella and Bacteroides (clusters A and D) were greatest in the 19- to 24-year-old group, and OTUs classified in both these genera were relatively rarer in people younger than 20. Members of the Bacteroides and of the Bifidobacterium genera were, in general, relatively least abundant in the samples from the oldest subjects. The Prevotella spp. in cluster A tended to remain relatively constant after age 30, but decreased somewhat in the oldest subjects.
Clusters B, C, and E showed a local minimum in relative abundance near age 20. Clusters B and E exhibited an otherwise somewhat constant relative abundance, likely reflecting a set of taxa that are part of the core microbiota displaced by the OTUs assigned to the Prevotella and Bacteroides genera in the 19- to 24-year-old and young soldier groups. In contrast, the members of cluster C, composed entirely of OTUs assigned to the Bifidobacterium genus, were relatively less abundant in the 19- to 24-year-old groups than in the age groups that immediately surround the 19- to 24-year-old and young soldier groups. Members of cluster C exhibited a continuous reduction in relative abundance from age 30 onwards compared to the relative abundance in the younger subjects.
In the Shannon's diversity plot, the local minimum in α diversity near age 20 may be caused by members of the Prevotella and Bacteroides genera achieving relative dominance in this age group, thus displacing (or appearing to displace) the remainder of the members of the ecosystem and reducing the overall diversity of the system.
Figure 3 shows the standardized effect size differences calculated by the ALDEx2 R package (see Materials and Methods) between each successive group with the OTUs binned by genus. It is striking that the majority of the OTUs in the majority of genera exhibit concordant differences in relative abundance, even if those differences do not reach an effect size greater than 1. This amplifies the results shown in Fig. 1 and 2, where only a subset of OTUs were displayed. A multitude of small genus-wide differences in relative abundance will have cumulatively large effects on the composition of the microbiota. To our knowledge, no other study has examined the association between OTU-level changes and genus- or other taxonomic-level changes in such a granular manner.
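The idea behind a standardized effect size, a between-group difference scaled by within-group dispersion, can be sketched as below. This is a simplified Python illustration in the spirit of the ALDEx2 effect measure and is not the ALDEx2 algorithm itself: ALDEx2 works on Monte Carlo Dirichlet instances of the data, which are omitted here, and the function name is hypothetical.

```python
import numpy as np

def standardized_effect(a, b):
    """Simplified effect size for one OTU's clr values in two groups.

    between:    median of all pairwise differences a_i - b_j
    dispersion: the larger of the two groups' median absolute pairwise
                within-group differences
    Each group needs at least two samples that are not all identical,
    otherwise the dispersion is zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    between = (a[:, None] - b[None, :]).ravel()
    within_a = np.abs(a[:, None] - a[None, :])[np.triu_indices(len(a), k=1)]
    within_b = np.abs(b[:, None] - b[None, :])[np.triu_indices(len(b), k=1)]
    dispersion = max(np.median(within_a), np.median(within_b))
    return float(np.median(between) / dispersion)
```

Under this simplified definition, an absolute value above 1 means the typical difference between groups exceeds the typical spread within either group, which matches the interpretation of the effect size threshold used in the text.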
Not surprisingly, the association between OTU-level and taxonomic rank-level changes becomes less obvious with higher taxonomic rank, although large-scale trends can still be observed (see Fig. S6 in the supplemental material). At the phylum level, OTUs classified in the Tenericutes and Proteobacteria were relatively similar across all age groups. In contrast, the bulk of the Firmicutes exhibited large-scale differences, being relatively less abundant in the 19- to 24-year-old and 60- to 79-year-old groups, although there were many OTUs that exhibited trends away from the majority. More clearly, individual OTUs in the Actinobacteria and the Bacteroidetes phyla tended to have altered relative abundances that were more closely allied with the difference observed for all the OTUs in the phylum. Thus, alterations in the relative abundance of particular members of the Actinobacteria or Firmicutes phylum appeared to be compensated for by differences in relative abundances of OTUs in the Bacteroidetes or Firmicutes phylum or both phyla.
Second, we found that entire genera, and other taxonomic levels, are relatively increased or decreased in different age groups, indicating that the behavior observed for a small number of OTUs in Fig. 2 is often genus-wide. For example, OTUs in the genus Blautia displayed little difference in the 3- to 14-year-old age range, but are relatively depleted in the 19- to 24-year-old group, become relatively more abundant in subjects older than 30, and then become relatively depleted again in subjects older than age 60. A similar pattern is observed for Bifidobacterium, except that the earliest evidence of relative depletion occurs in the transition from primary to middle school (8- to 12-year-olds versus 13- to 14-year-olds). In contrast, the bulk of the OTUs in the genera Prevotella and Bacteroides become relatively more abundant in all successive age groups until age 20, then relatively rarer in the 30- to 50-year-old group, relatively more abundant in the 60- to 79-year-old group, and relatively rarer again in those >94 years old.

DISCUSSION
The present study identified a distinct gut microbiota in a 19- to 24-year-old cohort from the general population that has not been observed in large-scale analyses of other populations (25-27) and may be unique to this healthy cohort in China. This observation may result from an altered diet, altered energy requirements, or an unknown cohort effect, although if the latter, it must have occurred countrywide as the same effect was observed in a population of university-age students from Jiangsu Province and from police and military recruits originating from all provinces in China.
There are three possibilities that can explain the data in this cross-sectional study. First, it is tempting to speculate that the patterns of occurrence can be interpreted as a trajectory that is established early on and reversed or disrupted in the 19- to 24-year-old cohort. Second, it is possible that the patterns observed represent a cohort effect, whereby each age group exhibits a pattern set up from a shared diet or environment. Third, it is possible that the differences between age groups can be accounted for by a survivor bias. This final explanation may be particularly important in the oldest age groups, which differed little from younger subjects. A lifelong longitudinal study would be required to distinguish between these three conjectures.
FIG 3 Each comparison plot shows a point for each OTU binned by genus with the log2 standardized difference, the "effect" measure determined by ALDEx2 (24,39), between the two groups on the x axis. Points are colored as red or blue if they have an effect size of ≥1 for the comparison. An effect size greater than 1 indicates that the OTU will be reliably found to have a greater difference between groups than dispersion within either group (24). Equivalent plots for comparisons at different taxonomic levels are shown in Fig. S6.
While the vast majority of samples were collected in Jiangsu Province, we believe that many of the observations will translate to other parts of China, in particular those regions with similar demographics, history, and cuisine. This is illustrated by the striking similarity between the 19- to 24-year-old group collected largely from Jiangsu Province and the young soldier cohort that was collected from two distinct groups of subjects of diverse origins in China. Importantly, the samples from the young soldier cohort were collected in a different way than for the remainder of the samples, yet the 19- to 24-year-old college students and young soldier samples cluster together on PCA plots, by unsupervised clustering, and had a small variance in the PERMANOVA analysis. Thus, given that the data from the 19- to 24-year-old group from Jiangsu Province are potentially generalizable to other people of similar ages from across China, it is possible that the data for other age groups may be similarly generalizable, at least in broad outline.
These data were analyzed using a compositionally coherent approach which examined the variance of the OTUs across all samples with age as a continuous variable for exploratory data analysis and for compositional association and as a discrete variable for differential abundance analysis. This unified approach gave consistent results across all three methods, which is in contrast to more standard approaches such as weighted UniFrac or Bray-Curtis dissimilarity, where the ordination or clustering is driven by the most abundant taxa, but the most differentially abundant taxa are often those that are rarest in the data set (17). We noted that an exploration of the data by nonmetric multidimensional scaling ordination identified the component 1 but not the component 2 separation, suggesting that the separation of the 19- to 24-year-old subjects was the most robust signal and that the compositional approach is more sensitive than a popular nonparametric ordination approach (see Fig. S7 in the supplemental material).
The diversity of gut microbial ecosystems across the healthy life span is somewhat controversial, with some reporting a decline in diversity with age in the elderly, especially the frail elderly (8,10,27), and others reporting that diversity either does not change or increases in the healthy elderly (8,41). Our analysis, with a very large cohort containing all age groups from 3 to >100, where the participants are all either very healthy or from a family with a very healthy family history, suggests that the answer depends on the group with which the healthy elderly are compared: diversity increases dramatically relative to healthy 20-year-olds, but declines relative to those aged 13 to 14, and appears to increase slightly compared to those between 30 and 79 years old. In addition, the >94-year-old group had a larger β diversity than did younger groups; thus, the small sample size of healthy aged people could result in spurious observations. Nevertheless, despite the relatively constant microbial composition in all age groups, there are reproducible differences in the microbiota composition between the age groups.
The largest differences in OTU abundance were found between groups early in life and around age 20, as well as in the extremely healthy elderly. There are large differences in relative abundance of the OTUs of many genera between subjects aged 19 to 24 and younger subjects. If we take the view that this is a cohort effect, we could conclude that members of multiple genera form a minimum or maximum relative abundance near age 20 (Fig. 3). This suggests that a change in lifestyle (e.g., leaving home for university or jobs) or physiology (e.g., levels of sex steroid hormones) in the postteen years is an important determinant of the observed gut microbiota. Indeed, the observed difference in microbiota in the 19- to 24-year-old groups does seem contemporaneous with the rapid rise in plasma levels of sex steroid hormones in males and females experienced at this stage in life, with testosterone in males and estradiol in females peaking around this age (28). These were not investigated in this cohort, but in animal models, the ability of microbes to regulate hormones and for them to change microbial diversity has been demonstrated (29). These differences were found to be more profound in the extremely healthy and vigorously active young soldier cohort. Thus, we cannot rule out that either a difference in caloric intake or some other environmental or countrywide historical factor is the cause of the difference between the 19- to 24-year-old subjects and the others.
Enterotypes proposed by Arumugam (16) were not identified in this data set: in fact, the relative abundances of members of the Prevotella and Bacteroides genera are largely concordant across the age range. This is not surprising given that the occurrence of enterotypes relies on the dominance of one or more taxa in the data set (17). The compositional analysis that does not rely on abundance coupled with the relatively large number of samples and of taxa and the richness of the Chinese diet may account for the lack of enterotypes.
The OTUs in several genera were relatively constant across all age ranges, with members of the Faecalibacterium genus being most constant and some members of the Blautia, Clostridium I, Anaerostipes, Dorea, and Turicibacter genera being more variable (Fig. 2). These may form a core microbiota for this cohort and perhaps for residents of China in general, as future studies will determine.
The microbiota composition of the centenarian cohort was remarkably similar to that of all members of the cohort over the age of 30, bar a few small changes in the 30- to 50-year-old cohort that were carried through to elder years. This may reflect a selection bias in our sample cohort for the very healthy elderly. Nevertheless, it is interesting that the very healthy >94-year-old cohort has been able to maintain a microbiota similar to that of healthy younger people, perhaps by staying in one place and consuming the same type of food. However, other host and environmental factors were not investigated.
Several methodological limitations of the study must be acknowledged. First, only limited participant metadata were collected. Second, we did not conduct negative-control DNA extractions or amplifications, although it is unusual for fecal samples to be of low biomass. Third, block randomization of samples was not conducted: although samples from every age group were in each processed batch, they were not randomized or blocked. Finally, the set of participants was self-selected and all personal information was self-reported.
Conclusions. The results suggest that if you live to be 100 and in perfect health in China, your microbiota will likely appear to be relatively similar to that from a person in their mid-30s. Whether this is cause or effect is unknown, but it suggests that resetting an elderly microbiota to that of a 30-year-old might help promote health, if the microbiota is outside the norm. This study showed the practicality and power of a compositional data analysis paradigm, where ordination, differential abundance, and correlation can be analyzed in a unified and robust framework.

MATERIALS AND METHODS
Enrollment and exclusion criteria for the study. Volunteers were asked to fill out a self-reporting health information questionnaire that included information on the inclusion and exclusion criteria described below. The age and detailed geographic locations of all persons who contributed samples are included in Table 1. These were divided into eight groups by age and are referred to by their age range in years (Table 1): kindergarten students, 3 to 6; primary school students, 8 to 12; middle school students, 13 to 14; college students, 19 to 24; soldiers and police recruits, 19 to 24; middle-aged, 30 to 50; elderly, 60 to 79; and centenarians, at least 94.
Only subjects who self-reported as having a personal and family history of extreme health (based on the self-reported questionnaires) were included in this study. Inclusion criteria were as follows: nonsmoker; teetotaler; stable mood (self-assessed/reported); absence of any diseases; no prescription medication or antibiotics for the past 3 months (including birth control pills); no personal or family disease history (such as cardiovascular, gastrointestinal, metabolic, neurological/mental, and respiratory diseases, as well as cancers); and parents and grandparents all alive or deceased after 80 years of age. This last criterion was not applied to subjects recruited older than 31 years of age. These stringent criteria excluded between 97% and 99% of potential volunteers depending on age. The volunteers in the young soldier category were chosen with all the criteria listed above and the following two additions: first, they had passed the standard military entrance medical examination, and second, their grandparents lived to be at least 85 years.
Description of informed consent and ability to use data for publication. Before sampling, all volunteers were informed about the purpose of this study and signed an informed consent form, which included provision of data acquired by examination of the samples they provided. The study was approved by the University of Jiangsu Affiliated Hospital Ethics Committee for Biomedical Research (Zhenjiang City, Jiangsu Province, China).
Collection methods for each cohort. Whole fecal samples were collected without preservatives following a standard operating procedure (SOP) described below. A sampling package was sent to subjects who met the inclusion criteria, and a collecting site was established nearby, with an icebox that was checked hourly. All samples were transferred to Tianyi Health Sciences Institute (Zhenjiang) Co., Ltd., in the icebox <3 h after collection, mixed, aliquoted, and stored at -80°C. We used the same SOP with minor variations for the young soldier and >94-year-old groups. For the young soldiers' samples, each subject collected a fecal sample following the instructions provided and immediately placed the sample outdoors where the temperature was lower than 0°C. Samples were aliquoted and frozen at 8 a.m. the morning after collection. For the subjects in the >94-year-old group, the sampling package was taken to their home and retrieved within 2 h of collection, and the samples were then aliquoted at WenCi Hospital. Samples were frozen at -80°C after aliquoting.
DNA isolation methods. DNA was extracted from all samples using the PowerSoil DNA Isolation kit (Mo Bio Laboratories, Inc.) following the manufacturer's protocol, with modifications as outlined in the Earth Microbiome Project (version 4_13). DNA was quantified using a PicoGreen double-stranded DNA (dsDNA) reagent kit (Invitrogen; Paisley, United Kingdom) with a Molecular Devices SpectraMax microplate reader (Molecular Devices, Sunnyvale, CA). DNA samples were stored at −20°C until further processing.
Amplicons were purified using the Qiagen QIAquick PCR purification kit (Qiagen; Düsseldorf, Germany) according to the manufacturer's instructions and quantified using the PicoGreen dsDNA reagent kit (Invitrogen; Paisley, United Kingdom). Purified amplicons were pooled in equimolar amounts, and the amplicon size was determined with an Agilent 2200 bioanalyzer.
DNA sequencing methods. The pooled product was paired-end sequenced with a 600 cycle kit on the Illumina MiSeq platform (Illumina, Inc., San Diego, CA) according to standard protocols.
Postsequencing processing. A brief description of the pipeline has been published previously (31), and the SOP and all required software are available at http://github.com/ggloor/miseq_bin. Sequences were processed using a standardized pipeline (see Text S1 in the supplemental material). Reads were overlapped using PandaSeq v2.5 (32), and any reads that contained ambiguous positions (an N in either strand) were removed. Reads in each run were demultiplexed and assigned a name that uniquely reflected the sample identifier and barcode. Demultiplexed reads from all samples were pooled into one file and collapsed into individual sequence units (ISUs). Note that the ISU clustering step is extremely memory intensive and was conducted on a server with 256 GB of RAM. ISUs were ordered by abundance and were used for open reference OTU picking by USEARCH with a de novo chimera filtering step (33,34). The OTUs occurring in each sample were tabulated, with singleton OTUs and those rarer than 0.1% in any sample excluded, resulting in an initial data set containing 1,514 OTUs apportioned across 1,095 samples. This table is available on the FigShare site as tab-separated plain text in SupplementaryCountTable.txt and forms the basis of the analyses described below.
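The collapsing and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: the function names `collapse_to_isus` and `filter_otus` and the parameter `min_fraction` are hypothetical, OTU clustering and chimera filtering by USEARCH are not reproduced, and the sketch assumes that "rarer than 0.1% in any sample" means an OTU is dropped when it never reaches 0.1% of the reads in any sample.

```python
from collections import Counter


def collapse_to_isus(reads):
    """Collapse identical reads into unique sequences, most abundant first."""
    counts = Counter(reads)
    # Sort by descending abundance, then by sequence for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def filter_otus(table, min_fraction=0.001):
    """Drop singleton OTUs and OTUs below min_fraction in every sample.

    table maps an OTU id to a list of per-sample counts (same sample order).
    Assumption: an OTU is kept if it reaches min_fraction in at least one sample.
    """
    n_samples = len(next(iter(table.values())))
    # Total reads per sample, computed before any OTU is removed.
    totals = [sum(counts[j] for counts in table.values()) for j in range(n_samples)]
    kept = {}
    for otu, counts in table.items():
        if sum(counts) <= 1:
            continue  # singleton OTU
        if any(t > 0 and c / t >= min_fraction for c, t in zip(counts, totals)):
            kept[otu] = counts
    return kept


# Toy data (invented for illustration only).
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
isus = collapse_to_isus(reads)

toy_table = {
    "OTU_1": [9000, 15000],
    "OTU_2": [995, 4994],
    "OTU_3": [1, 0],   # singleton
    "OTU_4": [4, 6],   # below 0.1% in both samples
}
filtered = filter_otus(toy_table)
print(isus)
print(sorted(filtered))
```

In this toy table, only the two abundant OTUs survive the filter; the singleton and the uniformly rare OTU are removed.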
Methods of analysis. The data from high-throughput sequencing are relative abundance data and thus contain only information regarding the relationships or ratios between taxa (13–15,18). Thus, we adopted a compositional data (CoDa) analysis approach (35) that examines the ratios between OTUs. The full workflow is contained on the FigShare site, but in brief, we did the following. For exploratory analysis, zero count OTUs were replaced by an imputed value using the count zero multiplicative method from the zCompositions R package (36). The centered log ratio (clr) transform was applied to the zero replaced data set, and the data were subsequently used as input for a singular value decomposition (SVD). This approach returns data where the samples are separated by the variance in the OTUs rather than by differences in abundant OTUs (14,15). Initial exploration of the data was conducted by using PCA plots to explore the SVD output (12).
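To make the zero replacement and clr steps concrete, the following Python sketch applies them to a single toy sample. It is only an approximation of the workflow: `replace_zeros` is a hypothetical helper that uses a simple multiplicative replacement with an assumed pseudocount of 0.5 reads, which stands in for, and is not equivalent to, the count zero multiplicative method in zCompositions. The SVD step is omitted.

```python
import math


def replace_zeros(counts, delta=0.5):
    """Simple multiplicative zero replacement for one sample.

    Each zero becomes the proportion delta / total, and nonzero proportions
    are shrunk so the sample still sums to 1. The value of delta is an
    assumption made for this sketch.
    """
    total = sum(counts)
    pseudo = delta / total
    n_zero = sum(1 for c in counts if c == 0)
    shrink = 1.0 - n_zero * pseudo
    return [pseudo if c == 0 else (c / total) * shrink for c in counts]


def clr(props):
    """Centered log ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Toy sample with one zero-count OTU (invented for illustration only).
sample = [120, 0, 30, 850]
props = replace_zeros(sample)
clr_values = clr(props)
print(clr_values)

# The clr depends only on ratios, so sequencing depth does not matter.
shallow = clr([c / 100 for c in (10, 20, 70)])
deep = clr([c / 1000 for c in (100, 200, 700)])
```

The clr values of a sample always sum to zero, and two samples with the same ratios between OTUs give the same clr vector regardless of read depth, which is why the transform suits relative abundance data.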
When conducting quantitative analyses, we used the clr-transformed posterior distribution of the data generated by 128 Monte Carlo replicates drawn from a Dirichlet distribution (13,24,37). All analyses report the expected value of the test statistic (22). Differential abundance tests were conducted with the ALDEx2 v1.6.0 Bioconductor package (24,37), and we report those taxa that have an expected effect size difference of ≥1, since effect size measures are more reproducible than are P values (38). Correlation analyses were done using a symmetric modification of the metric (22), which measures the variance in the ratios between OTUs. OTUs with a low ratio variance are said to be compositionally associated. The metric (23) was calculated as an expected value across clr-transformed Monte Carlo Dirichlet replicates (22, 24) generated by the aldex.clr function, and values of >0.65 were taken as indicating association between pairs of OTUs.
The data were further subsetted to include only OTUs that had an expected effect size of greater than 1 in any pairwise age group comparison or had an expected compositional association value of E() > 0.65.
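The idea of reporting an expected value across Monte Carlo Dirichlet replicates can be illustrated with a small Python sketch. Several details here are assumptions rather than the authors' implementation: the prior of 0.5 added to each count, the helper names, and in particular the association statistic, for which the sketch assumes the form 1 − var(x − y) / (var(x) + var(y)) computed on clr values. That statistic approaches 1 when the ratio between two OTUs is nearly constant across samples. The sketch does not call ALDEx2 or its aldex.clr function.

```python
import math
import random
import statistics


def dirichlet_clr_instance(counts, rng, prior=0.5):
    """Draw proportions from Dirichlet(counts + prior) and clr-transform them."""
    # Normalized independent gamma draws are Dirichlet distributed.
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    logs = [math.log(d / total) for d in draws]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def association(x, y):
    """Assumed statistic: 1 - var(x - y) / (var(x) + var(y)) on clr values."""
    diff = [a - b for a, b in zip(x, y)]
    return 1.0 - statistics.variance(diff) / (
        statistics.variance(x) + statistics.variance(y)
    )


def expected_association(table, i, j, n_mc=128, seed=1):
    """Average the statistic for OTUs i and j over n_mc Monte Carlo instances.

    table is a list of samples, each a list of OTU counts in the same order.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_mc):
        instance = [dirichlet_clr_instance(sample, rng) for sample in table]
        x = [s[i] for s in instance]
        y = [s[j] for s in instance]
        values.append(association(x, y))
    return sum(values) / n_mc


# Toy counts (invented): OTU 0 and OTU 1 keep a 1:2 ratio in every sample,
# whereas OTU 2 varies independently of both.
toy = [
    [1000, 2000, 500, 3000],
    [2000, 4000, 3000, 1000],
    [500, 1000, 4000, 2500],
    [3000, 6000, 200, 800],
    [1500, 3000, 1000, 4500],
]
e_01 = expected_association(toy, 0, 1)
e_02 = expected_association(toy, 0, 2)
print(e_01 > 0.65, e_02 > 0.65)
```

Averaging over replicates means that an OTU pair passes the 0.65 cutoff only if the association holds up under the sampling uncertainty of the counts, not just in the point estimate.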
Ethics approval and consent to participate. This study was approved by the Affiliated Hospital of Jiangsu University Biomedical Research Ethics Committee (Zhenjiang, Jiangsu, China).
Consent for publication. Sample identities are anonymous, and no identifying information is available for the participants.
Accession number(s). Demultiplexed raw reads are being made available through the SRA database under accession no. SRP107602.
Availability of data. In the interests of reproducibility, all metadata, R scripts, and tables of derived data are publicly available at https://doi.org/10.6084/m9.figshare.4535660, as is the code required to reconstitute all figures.

SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/mSphere.00327-17.

ACKNOWLEDGMENTS
G.B.G. thanks Jean Macklaim for invaluable discussions and suggestions on data presentation and interpretation.
Collection, sequencing, and initial analysis were supported by the following grants: J.L., Zhenjiang Tianyi Biotech Co., Ltd., Human Microbiome Project grant TY2015001; K.Y., International Science and Technology Cooperation Program, Zhenjiang GJ2015005 and Jinshan Excellence Projects and Zhenjiang ZJX2016170; W.Z., Social Development of Science and Technology Support Projects, Zhenjiang SH2013036, and Science and Technology Assistance Projects, Xinjiang 2014AB045. Development of the compositional data analysis approach was supported by NSERC Discovery grant RGPIN-03878-2015 to G.B.G.
The Tianyi Health Sciences Institute (THSI) is a private, not-for-profit entity. G.B., A.G., C.J., Y.X., and J.L. are employed by the THSI. G.B.G., G.R., J.P.B., and K.Y. are members of the scientific advisory board of the THSI. The work described here is not currently protected by patent nor is it the subject of a patent application. The remaining authors declare that they have no competing interests. Members of the THSI and advisory board played significant roles in design, collection, and analysis, as outlined in the authors' contributions listed above.