A normative microbiome is not restored following kidney transplantation

Abstract Dialysis and kidney transplantation (Ktx) mitigate some of the physiological deficits in chronic kidney disease (CKD), but it remains to be determined if these mitigate microbial dysbiosis and the production of inflammatory microbial metabolites, which contribute significantly to the uraemic phenotype. We have investigated bacterial DNA signatures present in the circulation of CKD patients and those receiving a KTx. Our data are consistent with increasing dysbiosis as CKD progresses, with an accompanying increase in trimethylamine (TMA) producing pathobionts Pseudomonas and Bacillus. Notably, KTx patients displayed a significantly different microbiota compared with CKD5 patients, which surprisingly included further increase in TMA producing Bacillus and loss of salutogenic Lactobacilli. Only two genera (Viellonella and Saccharimonidales) showed significant differences in abundance following KTx that may reflect a reciprocal relationship between TMA producers and utilisers, which supersedes restoration of a normative microbiome. Our metadata analysis confirmed that TMA N-oxide (TMAO) along with one carbon metabolism had significant impact upon both inflammatory burden and the composition of the microbiome. This indicates that these metabolites are key to shaping the uraemic microbiome and might be exploited in the development of dietary intervention strategies to both mitigate the physiological deficits in CKD and enable the restoration of a more salutogenic microbiome.

Variable Selection       Samples were then amplified using a 5 min 95 °C hotstart followed by 26 cycles of 95 °C for 30 s and 60 °C for 1 min with a final elongation step of 60 °C for 5 min.The resulting amplicons were purified using bead extraction (SPRI select beads, Beckman Coulter, Brea, CA, USA), using 0.9× beads followed by 80% ethanol washes and resuspension in 10 mM Tris-EDTA buffer.The amplicons were quantified using the High Sensitivity DNA Qubit system and profiles were obtained from an Agilent 2100 Bioanalyser using High Sensitivity DNA reagents (Agilent, Santa Clara, CA, USA).Samples were then standardized to 10 ng per reaction and amplified in the presence of Nextera XT v2 indexes using Kapa HiFi Hotstart readymix (2×) for 8 cycles.The resulting indexed libraries were then purified and quality controlled as before.
The libraries were combined in equimolar ratios and sequenced on a MiSeq (Illumina, San Diego, CA, USA) instrument using a paired end, 2 × 300 bp, sequencing run.Samples were sequenced with an average of 50 000 reads per sample.Possible contamination of reagents was controlled for by running a negative control sample (Nuclease-Free water (Ambion™, AM9932, Thermo Fisher Scientific)), instead of a DNA sample through the whole analysis, in conjunction with the true test samples.Water only samples were treated identically to test samples.

Bioinformatics Sequence Quality trimming and ASV Generation
Amplicon sequence variants (ASVs) were constructed, and abundance tables generated using the Qiime2 workflow.A total of 1348 ASVs were identified.Briefly, Paired-end reads were trimmed where the sequence quality dropped using the Deblur algorithm.Qiime phylogeny was used to generate the rooted phylogenetic tree for the ASVs, which were assigned taxonomy by classifying against the SILVA132 reference database.The abundance tables of ASVs were then combined with their taxonomy to generate the Biom file used for downstream statistical analysis in R.

Statistical Analyses
Statistical analyses were performed in R as by Ijaz, U. Z. et al, 2018Z. et al, (Ijaz et al. 2018)).The frequency based method from the decontam package was used to remove contaminating DNA which may have been introduced to samples in the library preparation process.

Alpha Diversity and Beta Diversity
The vegan package was used for alpha and beta diversity analyses.For alpha diversity measures we have used: Shannon entropy -a commonly used index to measure balance within a community, rarefied richness (exponential of Shannon entropy) -the estimated number of species, and Pielou's Evenness-A measure of abundance of each species present in a community ranging from 0 (not even, communities are dominated by particular species) to 1(completely even, abundance counts are evenly distributed amongst all species in the community).Ordination of ASV table in reduced space (beta diversity) was done using Principal Coordinate Analysis (PCoA) plots of ASVs using three different distance measures: Bray-Curtis (shown), Unweighted Unifrac and Weighted Unifrac.Unifrac distances were calculated using the phyloseq package (McMurdie and Holmes 2013).Analysis of variance for explanatory variables (or sources of variation) was performed using Vegan's adonis() against distance matrices (Bray-Curtis/UnweightedUniFrac/Weighted UniFrac/hierarchical MetaStorm).This function, referred to as PERMANOVA, fits linear models to distance matrices and used a permutation test with pseudo-F ratios to explain variability in the microbial community structure, contingent upon the variation in the given extrinsic parameter of interest.

Hierarchical MetaStorm (Functional dissimilarity measure)
As opposed to taxonomic beta diversity calculation which, assumes each feature (microbe) to be an independent entity, and traditionally uses Bray-Curtis distance (or any other count measure), calculation of functional beta diversity is challenging because of the presence of dependencies as well as redundancies in features (KOs).Therefore, applying traditional measures such as Bray-Curtis distance is not recommended as it leads to erroneous results.
To capture redundancies as well as hierarchical nature of KOs which at higher levels form pathways, we used Hierarchical Meta-Storms (HMS),(Y.Zhang et al., 2021) which collapses the metabolic pathways at the observed KOs level by considering BRITE pathways as a set of reference pathways, and then propagating the abundances upward for these pathways in a multi-level pathway hierarchy to give a weighted dissimilarity measure.This then provides higher sensitivity for detecting variations in upper-level metabolic pathways between samples.

Subset Analysis
The sinkr package was used to implement the BVSTEP algorithm to search for the highest correlation (Mantel test) between dissimilarities in a fixed and variable multivariable dataset.
Essentially, the method calculates original distances by determining the Bray-Curtis distance between samples using all the ASVs.It then permutes through the subset of ASVs, calculating the Bray-Curtis distances between the samples again for each permutation.These are then correlated against the original recorded distances until subsets are obtained that explain roughly the same beta diversity as the full set of ASVs.The algorithm permutes through 2n-1 possible combinations of ASVs features in the variable dataset.The 2000 most abundant ASVs were used for the correlation to identify the ASVs that were causing the most significant shifts in beta diversity, as being the most abundant we can assume that they have a significant role to play and will best correlate with the overall dissimilarities given if the total number of ASVs were used.

Null Modelling
A null modelling approach was used to determine the ecological processes driving community assembly, as described fully in (Trego et al. 2021).Briefly, Nearest Taxon Index (NTI) was used to determine to what extent the communities were influenced by environmental pressures.
This was done using the picante package (Kembel et al. 2010).Values >+2 indicate strong environmental pressure influencing the community, whereas values <-2 suggest that the driver is natural competition amongst species.The Jaccard (incidence-based) metric was used to calculate Normalised stochasticity ratio (NST) and modified stochasticity ratio (MST).
Proportional-proportional(P-P) and proportional-fixed (P-F) taxa-richness constraints were applied for each metric used (Ning et al. 2019).Quantitative process estimates (QPE) method was also used as part of this approach, providing a breakdown of the assembly processes used to establish community structre.These can be broken down by selection processes (variable or homogenous), dispersal processes (dispersal limitation, or homogenising dispersal), or In this context, variable selection leads to a more varied community as a result of a multitude of environmental conditions being present, whereas homogenous selection gives rise to a more consistent community due an unchanging environment.Conversely, dispersal processes refer to the migration of organisms form one space to another.For example, high rates of dispersal within microbial communities is known as homogenising dispersal, and results in similar communities, whereas minimal movement of organisms (dispersal limitation) results in dissimilar communities.This type of dispersal can lead to ecological drift, a stochastic assembly processes (Trego et al. 2021).

Core Microbiome Analysis
To identify core microbiome, we have used the approach discussed in (Shade, A., & Stopnisek, N. ( 2019).Abundance-occupancy distributions to prioritize plant core microbiome membership.Current opinion in microbiology, 49, 50-58).The approach first ranks the ASVs by occupancy, and then calculates the minimal occupancy threshold dynamically by learning from the data.After ranking the ASVs, the subset of core taxa is constructed incrementally by adding one ASV at a time to the core set of ASVs, from highly prevalent to lowly prevalent ones.The contribution of the core subset to beta diversity is then calculated every time a new ASV becomes member of the core set using the Bray-Curtis distance in the equation,  = 1 − . As per original author's recommendation, we have used an approach where the occupancy threshold is decided by stopping when addition of an ASV does not cause more than 2% increase in the explanatory power by Bray-Curtis distance.Independently, a neutral model (Shade, A., & Stopnisek, N. (2019).Abundance-occupancy distributions to prioritize plant core microbiome membership.Current opinion in microbiology, 49, 50-58).is fitted to the "S" shaped abundance-occupancy distributions informing about the OTUs that are likely selected by the environment.These are obtained as those that fall outside the 95% confidence interval of the fitted model, and are inferred to be deterministically assembled, rather than neutrally selected, with those that are above the model selected by the host environment (represented by red colour), and those points below the model are dispersal limited (represented by blue colour).We have used site-specific occupancy, where occupancy is viewed as a detection within a particular cohort (CKD 3-4, CKD 5, TX Baseline, TX Year Followup), such that as long as the ASVs are represented in each cohort (not necessarily in all replicates within that cohort), it is counted as occurring there.We then used the neutral modelling approach to partition these core ASVs to those that are neutral, and those that are above/below the model fit (deterministically/assembled).To draw these ASVs, we have used the R's metacoder package (Foster, Z. S., Sharpton, T. J., & Grünwald, N. J. (2017).Metacoder: An R package for visualization and manipulation of community taxonomic diversity data.PLoS computational biology, 13(2), e1005404).

Differential Analysis
To find ASVs that were significantly different between multiple categories considered in this study, we used the DESeq2 package (Love, Huber, and Anders 2014) with the adjusted p-value significance cut-off of 0.05 and log fold change cut-off of 2.0.This function uses negative binomial GLM fitting to obtain maximum likelihood estimates for the ASVs log fold change between the two conditions.Bayesian shrinkage was then applied to obtain shrunken log fold changes, subsequently employing the Wald test for obtaining significances.

CODA-LASSO
As opposed to GLLVM, which regresses individual feature abundance against all sources of variation, in CODA-LASSO we do the opposite.We take a single clinical covariate (a variable with continuous outcome) and try to find a minimal subset of features, a composition of these, have a relationship with the covariate of interest.For this purpose, we employ logcontrast functions and composition balance approach (Susin et al., 2020) where the goal is to identify two disjoint subsets of microbes, those that are positively associated, and those that are negatively associated with the covariate of interest.This is done through the CODA-LASSO approach (Lu et al., 2018)  which forces some of the -coefficients to go to zero, particularly those that do not have a relationship with the covariates, and thus the non-zero -coefficients (which are associated with a subset of microbes) serves as a mean to enable variable selection.The non-zero -coefficients are then divided into two groups, those that are positively associated with the clinical covariate, and those that are negatively associated with the clinical covariate, respectively.The procedure is implemented as coda_glmnet() function in R's coda4microbiome package (Calle & Susin, 2022).Similar to GLLVM, we have used the top 100 most abundant genera, however, after filtering out for microbes with an occupancy threshold of 10 (i.e., genera that are non-zero in at least 10 samples).