Gut Microbiota Has a Widespread and Modifiable Effect on Host Gene Regulation

The composition of the gut microbiome has been associated with various aspects of human health, but the mechanism of this interaction is still unclear. We utilized a cellular system to characterize the effect of the microbiome on human gene expression. We showed that some of these changes in expression may be mediated by changes in chromatin accessibility. Furthermore, we validate the role of a specific microbe and show that changes in its abundance can modify the host gene expression response. These results show an important role of gut microbiota in regulating host gene expression and suggest that manipulation of microbiome composition could be useful in future therapies.


Collinsella Spike-in Experiment
Collinsella aerofaciens was purchased from ATCC (cat#: 25986) and grown in Reinforced Clostridial Medium (BD Biosciences, cat#: 218081) following manufacturer's protocol, in anaerobic conditions. The culture was then centrifuged at 6,000 xg for 15 minutes, the media was removed and the pellet was resuspended in a 12.5% glycerol solution (in normal saline buffer). We verified that we were utilizing Collinsella aerofaciens by extracting the DNA with the PowerSoil kit as described below. We then PCR amplified the 16S region using primer specific to Collinsella aerofaciens with annealing temperature of 63 • C, foward: 5'-CTTTCAGCAGGGAAGAGTCAA-3', reverse: 5'-AGCCATGCACCACCTGTATGG-3' [2]. The PCR product was run on agarose gel with a 100bp ladder (NEB, cat#: N3231S). We observed the expected band of 590bp ( Figure S3F) in the Collinsella aerofaciens sample and we confirmed the specificity of the PCR product by assaying also a DNA sample extracted from Odoribacter splanchnicus (ATCC cat# 29572). We also verified that both samples contained DNA from the 16S region by performing an additional PCR on both samples with the prokaryotic 16S rDNA universal primers at an annealing temperature of 55 • C, forward (27F): 5'-AGAGTTTGATCCTGGCTCAG-3', reverse (1492R): 5'-CGGTTACCTTGTTACGACTT-3' [2].
Cell culturing conditions for this experiment were the same as described above. On the day of treatment, the microbiota sample derived from individual 4 (chosen because this individual did not have a detectable amount of Collinsella at baseline or at any treatment time point) and the Collinsella aerofaciens was assessed by spectrophotometer (OD600) (Bio-Rad Smartspec 3000). Using these values, solutions were made with the microbiota sample at 10:1 to the number of colonocytes and the Collinsella aerofaciens was spiked into the microbiota sample at 4 dilutions: 10%, 1%, 0.1% and 0.01%. Additional wells containing only colonocytes and colonocytes with the microbiota sample (0% Collinsella aerofaciens) were cultured as controls on the same 12-well plate. Each treatment was performed in duplicate.
Following 2 hours of culturing, the wells were scraped on ice, pelleted and washed with cold PBS, resuspended in lysis buffer (Dynabeads mRNA Direct Kit) and stored at -80 • C until extraction of colonocyte RNA (as described above).

RNA-library preparation from colonocytes
Poly-adenylated mRNAs were isolated from thawed cell lysates using the Dynabeads mRNA Direct Kit (Ambion) and following the manufacturer's instructions. RNA-seq libraries were prepared using a protocol modified from the NEBNext Ultradirectional (NEB) library preparation protocol to use Barcodes from BIOOScientific added by ligation, as described in [3]. The individual libraries were quantified using the KAPA real-time PCR system, following the manufacturer's instructions and using a custom-made series of standards obtained from serial dilutions of the phi-X DNA (Illumina). The libraries were then pooled and sequenced on two lanes of the Illumina Next-seq 500 in the Luca/Pique laboratory using the high output kits for 75 cycles to obtain paired-end reads for an average of over 40 million total reads per sample.
Gene annotations from Ensembl version 75 were used and transcripts with fewer than 20 reads total were discarded. coverageBed was utilized to count reads with -s to account for strandedness and -split for BED12 input. The counts were then utilized in DESeq2 with several models to determine changes in gene expression under various conditions. A gene was considered DE if at least one of its transcripts was DE.
In order to identify genes that changed at each time point following co-culturing, we used each microbiota treatment as a replicate with the following model: where Y jn represents the internal DEseq mean gene-expression parameter for gene j and experiment n, T tn = 1 is the time-point indicator {1, 2, 4}, M n is the treatment indicator (control or microbiome), and τ jt and β M jt parameters are time-point specific baseline and microbiome effect respectively. With this model, we identified 1,835 genes that change after 1 hour (70% of genes increase in expression), 4,099 genes after 2 hours (53% of genes increase in expression) and 1,197 genes after 4 hours (56% increase) with BH FDR < 10%, |logFC| > 0.25 ( Figure S1H and I).
In order to identify genes that were differentially expressed at a given time point after co-culturing with a specific microbiota sample, we used the following model: Compared to the first model, here we investigate the effect of each microbiome individual m ∈ {1, 2, 3, 4, 5} and time-point. Note that this model allows for a different effect to each microbiome compared to the previous model. With this model 1,131 genes changed after 1 hour with any of the 5 samples, 3,240 after 2 hours and 1,060 after 4 hours with BH FDR < 10%, |logFC| > 0.25 ( Figure 1B).

5
We next used the likelihood ratio test that is a part of DESeq2 to compare the 2 models above in order to identify genes whose expression changes over time are determined by the individual from which the microbiota sample was taken. In this way, we identified 409 genes at BH FDR < 10%.
We considered a model utilizing baseline alpha diversity to determine if diversity itself, and not necessarily any component of the microbiota, influenced host cell gene expression: Compared to the first model, we investigate the role of alpha diversity in host gene expression where α j is the alpha diversity-specific parameter and A m represents the alpha diversity for each microbiota m sample at baseline. Baseline alpha diversity was measured with three metrics using QIIME: Chao1, Simpson index, and Shannon index. We identified 14, 53, and 7 genes that were associated with Chao1, Simpson index, and Shannon index, respectively. However, there were no genes that were differentially expressed in all three.
In order to identify components of the microbiota samples that affect gene expression we used the following model for each gene j and taxon g: Compared to the first model, we investigate the role of the baseline abundance of a given taxon in host gene expression where γ jg is the taxon-specific parameter and G gm represents the baseline abundance each microbiota sample m and each taxon g at baseline. Baseline abundance is the number of reads (after all samples have been rarified to the sample with the lowest read count of 141,000) for a given taxon in each of the uncultured samples. Each of the time points had the same baseline abundance. This model was 6 run for 62 taxa that had at least 141 reads (0.1% of the total reads in a sample) in at least one of the five uncultured samples. Comparing each taxon to all genes expressed in the colonocytes, we had 9,125,927 tests. We identified 588 significant comparisons (BH FDR < 10%) comprising of 46 taxa and 121 genes (Table S1).
Finally, we analyzed the validation experiment with the spike-in of Collinsella aerofaciens. In order to identify genes that were differentially expressed because of the Collinsella aerofaciens, we used the following model: Gene expression ∼ treatment with microbiome+ relative abundance of Collinsella aerofaciens We investigate the role of the changing Collinsella aerofaciens concentration within the microbiota sample in host gene expression where µ j is the baseline mean, β M j is the overall microbiome effect, and γ C j is the Collinsella-specific parameter and G C n represents the spiked-in abundance Collinsella. Where treatment with microbiome is a factor with values yes and no denoting treatment with the overall microbiome sample and relative abundance of Collinsella aerofaciens is a number with the values 0, 0.01, 0.1, 1 and 10 depending on the spike-in amount of each sample. We identified 1,570 genes that change expression (BH FDR = 10%, Table S1, Figure S2F) depending on the abundance of Collinsella aerofaciens. Note that this analysis has more power than the one across the five microbiomes.

Gene ontology analysis
We utilized GeneTrail [6] to find enrichment of gene ontology terms. We compiled a list of unique genes that changed gene expression at any of the three time points (1hr, 2hr and 4hr) and determined which GO categories were under/over-represented as compared to a list of all genes expressed in colonocytes (28,107 genes). We considered a category over/under-represented if the Benjamini-Hochberg adjusted p-value < 0.05 ( Figure S1J).

Enrichment of DE genes among genome-wide association studies
We downloaded the GWAS catalog [7] (version 1.0.1) on January 5th, 2016. To identify the overlap between DE genes in our dataset and those associated with a GWAS trait, we intersected genes that contain transcripts that change significantly under a given condition with the reported genes from the GWAS catalog. We used a Fisher's exact test on a 2x2 contingency table using 2 groups: genes that contain transcripts that are DE ("DEG") and other genes that are expressed in each sample ("NOT").
We then split these groups into 2: genes that are associated with any GWAS trait ("TRAIT") and genes that are not associated with any trait in the GWAS catalog ("NOT TRAIT").
When considering specific trait enrichment among the 1,570 genes differentially expressed in the spikein validation experiment, 35 traits were tested. These included traits that had at least 5 differentially expressed genes and 1 non-differentially expressed gene associated with it.  The pooled, size-selected product was diluted to 8pM, spiked with 15% PhiX and loaded onto an Illumina MiSeq instrument to generate the 16S rRNA gene sequences. Samples were heat denatured at 96 • C for 2 minutes prior to loading, and a MiSeq 600 cycle v3 kit was used to sequence the sample, resulting in 2.2 million raw reads per sample, on average. Barcodes were removed from the sample reads by UMGC and the Nextera adaptors were trimmed using CutAdapt 1.8.1.

Heatmap and Network of Gene Expression and Microbial Abundance
The trimmed 16S rRNA gene sequence pairs were quality filtered (q-score > 20, using QIIME 1.8.0) resulting in 1.41, 1.06, and 1.53 million high quality reads for sample replicates 1, 2, and 3, respectively [9,10]. OTUs were picked using the closed reference algorithm against the Greengenes database (August, 2013 release) [8,9,10,11]. The resulting OTU table was analyzed to determine microbial community diversity using QIIME scripts and rarefying to 141,000 reads. Rarefaction plot was generated using QIIME on the five uncultured samples ( Figure S1B).
We verified that the fecal samples we utilized were similar to other healthy samples by comparing the OTUs detected to the Human Microbiome Project data [12,13]. 16S V4 OTU and HMP V1V3 OTU tables (https://www.hmpdacc.org/HMQCP/, final OTU table) were run through QIIME's summarize taxa.py and consolidated at the L3 class level. Tables were merged by most prevalent classes (13) with the remaining taxa merged into other representing < 1% of the total count on average. A Bray Curtis distance matrix and PCoA were performed in the R vegan package and plots were created with ggplot2, grid and gridextra libraries ( Figure S1E). HMP fecal samples colored red, study samples colored yellow and all other HMP samples colored grey.

Determining Effect of Colonocytes on Microbiota Composition
OTUs were collapsed to the genera level using scripts in QIIME 1.9.1. In total, 292 taxa were detected across all samples and treatments. To filter the number of taxa, the relative abundances were summed for each taxon, and taxa with summed relative abundance greater than 0.003 were kept in the linear model.
This yields 85 taxa of interest. To account for taxa that may be abundant at only at a particular time point, relative abundances of taxa were summed at each time point and kept if they had a total relative abundance greater than 0.0005. For time point 0, this threshold was 0.00025. The additional step yields 27 taxa; these were added onto the list of 85 for a total of 112 taxa.
To identify taxa that change significantly over time we used a linear model. Relative abundances of the 112 taxa of interest were normalized using an arcsine-square root transformation. In the linear model, "treatment" is binary and refers to microbiome samples grown with colonocytes or without colonocytes.
In the model, this is treated as a factor. "Individual" refers to the individual microbiome being studied and is treated as a factor. Time is treated as a factor variable, with possible values of 1, 2 and 4 hours, in order to allow for changes that are not necessarily linear over time.
To assess how each taxon changed in response to culturing with colonocytes, we ran two linear models and compared the goodness of fit using a likelihood ratio test. P -values from the likelihood ratio test were extracted and adjusted using the Benjamini-Hochberg procedure.
Null Model Transformed Relative Abundance of Taxa ∼ Individual + Time Alternative Model Transformed Relative Abundance of Taxa ∼ Individual+ Treatment + Time where G gn represents the transformed relative abundance of taxa parameter for taxon g and experiment

Characteristics of the Gut Microbiome
Thirty-five microbial samples were assessed with 16S rRNA gene sequencing, including the five uncultured microbiota samples, each of these five cultured alone (without colonocytes) for 1, 2, and 4 hours, and co-cultured with colonocytes for 1, 2 and 4 hours. In total, there were 7 samples from each original microbiome, and 35 samples overall. QIIME 1.8.0 was used to analyze the resulting sequences [9]. We found that the relative abundances of each taxa were most strongly affected by the individual from which the sample was derived and did not change dramatically over time ( Figure S1A and C). We also found that the samples clustered by individual using weighted UniFrac, although more weakly ( Figure S1D).
The microbiome maintains its composition during culture and our sequencing depth is able to capture this variation ( Figure S1B). These results demonstrate that the gut microbiota samples are typical and unaffected by culturing for up to 4 hours.

Response of Human Colonocytes to Inoculation with Microbiome
When we consider how the microbiota influences the host cell response, we identify 3,904 genes that are differentially expressed in at least one sample at 1, 2 or 4 hours compared to control colonocytes. In order to gain power to identify shared differentially expressed genes, we next utilized a model where treatments at a given time point are considered replicates and compared to control colonocyte samples. In this model, we identified 4,099 genes (BH FDR < 10%, | log 2 FC| > 0.25) that change expression consistently across the five microbiota treatments at 2 hours, while over 1,000 change consistently at 1 and 4 hours (examples in Figure 1D, Figure S1H and I, Table S1). We were unable to identify any genes that were differentially expressed based on diversity of the microbiota sample at baseline, independent of diversity measure. However, we identified 121 genes that were associated with microbial taxa at baseline.

Effect of Human Colonocytes on Microbiome Composition
We assessed the microbial composition of the individual microbiome samples grown with and without colonocytes. Microbiome samples grown with colonocytes were collected concurrently and from the same wells that RNA-seq data was generated. We identified 13 taxa that showed varying abundance dependent on the presence of host cells ( Figure S2A, B and C). These data demonstrate that interaction with the host affects microbial communities. We co-culture the colonocytes and microbiota for up to four hours, but it is likely that in vivo, the various species will eventually come to an equilibrium which can then be maintained in the host organism and will reflect both host genetic variation and diet. When this equilibrium is disrupted, previous studies have shown that disease and infection may occur [14,15,16,17,18].