High Sensitivity of Shotgun Metagenomic Sequencing in Colon Tissue Biopsy by Host DNA Depletion

The high host genetic background of tissue biopsies hinders the application of shotgun metagenomic sequencing in characterizing the tissue microbiota. We proposed an optimized method that removed host DNA from colon biopsies and examined the effect on metagenomic analysis. Human or mouse colon biopsies were divided into two groups, with one group undergoing host DNA depletion and the other serving as the control. Host DNA was removed through differential lysis of mammalian and bacterial cells before sequencing. The impact of host DNA depletion on microbiota was compared based on phylogenetic diversity analyses and regression analyses. Removing host DNA enhanced bacterial sequencing depth and improved species discovery, increasing bacterial reads by 2.46 ± 0.20 folds while reducing host reads by 6.80% ± 1.06%. Moreover, 2.40 times more of bacterial species were detected after host DNA depletion. This was confirmed from mouse colon tissues, increasing bacterial reads by 5.46 ± 0.42 folds while decreasing host reads by 10.2% ± 0.83%. Similarly, significantly more bacterial species were detected in the mouse colon tissue upon host DNA depletion (P < 0.001). Furthermore, an increased microbial richness was evident in the host DNA-depleted samples compared with non-depleted controls in human colon biopsies and mouse colon tissues (P < 0.001). Our optimized method of host DNA depletion improves the sensitivity of shotgun metagenomic sequencing in bacteria detection in the biopsy, which may yield a more accurate taxonomic profile of the tissue microbiota and identify bacteria that are important for disease initiation or progression.


Introduction
Culture-independent assessment of the bacterial taxonomic profiles by next-generation sequencing has become a common approach to examining the relationship between dysbiosis and various diseases.Characterization of the microbial profile associated with different anatomical locations has been reported by studies that analyzed human oral, gut mucosal, and fecal samples using 16S rRNA gene amplicon or shotgun metagenomic sequencing [1].The 16S rRNA gene sequencing is limited for its resolution of taxonomic identification, whereas metagenomic sequencing can identify bacteria at species or even strain levels [2][3][4].Moreover, metagenomic sequencing can provide multi-kingdom profiling, with the implication of inferring the interplay across multiple domains [5].
The comprehensive analysis of microbial and host genetic material in samples from patients has important clinical implications, including diagnosis and treatment of disease.We previously identified the microbial marker associating colorectal carcinogenesis [6] or gastric carcinogenesis [7] using 16S rRNA gene sequencing of tissue biopsies.Metagenomic sequencing provides not only the taxonomic composition of the microbiota in the tissue, but also their potential biological functions [8].However, the massive difference in the amount of human and bacterial DNA poses a constraint on the analysis of tissue samples by metagenomic sequencing.Low-abundance bacteria remain undetectable due to the overwhelming amount of human DNA in the tissue.Typically, greater than 99% sequencing reads from tissue samples are of human origin with less than 1% reads of non-human origin.The increased host background in tissues reduces the number and proportion of microbial reads, decreasing the metagenomic sequencing sensitivity in detecting microorganisms [9,10].To mitigate the concern that the majority of the sequencing reads are derived from host DNA dominant in tissue samples, measures such as targeted sequencing [11], increased sequencing depth [12,13], or removal of host DNA prior to sequencing [14][15][16][17] have been proposed.Indeed, various protocols have been reported to deplete host DNA in sputum [8,18], saliva [19], and diabetic foot infection [20] samples, thereby increasing microbial sequencing depth and the number of detected taxa.The effects of host DNA depletion on the metagenomic analysis and microbial profiles in tissue samples remain to be elucidated.In this study, we evaluated the efficacy of metagenomic sequencing on microbiome at the species level of tissue biopsies by host DNA depletion.

Results
Removal of host DNA from tissue sample increased the number of microbial reads Human and mouse colon tissues were homogenized and divided into two portions equally, with one portion undergo-ing host DNA depletion and the other portion serving as the control without host DNA depletion.For the host depletion group, mammalian cells were lysed first to release host DNA which was then degraded by benzonase.Bacterial cells were lysed subsequently for the extraction of bacterial DNA and metagenomic sequencing.For the control group, total DNA was extracted after the lysis of host and bacterial cells.
Bacterial DNA was extracted from colon biopsies after the removal of host DNA and an average of 55.37 ± 7.23 million reads were generated from 8 human (Table 1) and 19 mouse (Table S1) colon biopsy samples.The metagenomic data of human colon biopsies were aligned against a reference database of the human genome (GRcH38), whereas those of mouse tissues were aligned against a reference database of the mouse genome (mm10).After quality control and trimming of signals that were too low to have any biological significance, 95.02% ± 1.02% of the reads were shown to be derived from the human/mouse genome.With the removal of the host reads, the remaining reads were mapped to the National Center for Biotechnology Information (NCBI) standard set of microbial reference genomes for bacterial classification.The sequencing reads were aligned against the reference genome set using the Kraken taxonomic assignment software (Figure 1).
Removal of the host DNA increased the number of bacterial reads by 2.46 ± 0.20 folds while reducing host reads by 6.80% ± 1.06% (Figure 2A).With the cutoff value (read count > 15) (Figure S1), 2998 ± 401 (2.40 times more) bacterial species per sample were detected after host DNA depletion, whereas only 891 ± 98 species per sample were detected in the group without host DNA depletion (Figure 2B).Thus, significantly more bacterial species were detected upon host DNA depletion (P < 0.001).Most (93.45% ± 0.89%) of the bacterial species detected in the non-depleted group were also present in the depleted group, suggesting that our method increases the sensitivity for species detection without undermining the microbial composition of the tissue sample.Additionally, the abundances of bacteria detected only in the host DNA-depleted group were significantly lower compared with the abundances of shared bacteria between the two groups (P < 0.05) (Tables S2 and  S3; Figure S2).The enhanced detection of bacterial species was confirmed by the metagenomic sequencing data from mouse colon tissues.Host DNA depletion protocol increased bacterial reads by 5.46 ± 0.42 folds while decreasing host reads by 10.2% ± 0.83% in the mouse colon tissues (Figure 2C).A total of 3707 ± 1465 bacterial species per sample were detected after host DNA depletion, whereas only 1555 ± 314 species per sample were detected in the non-depleted group (Figure 2D).In line with the human data, significantly more bacterial species were detected in the mouse colon tissue upon host DNA depletion (P < 0.001).83.34% ± 7.00% of the bacteria detected in the non-depleted group were also present in the depleted group.Furthermore, correlation analysis of human and mouse samples suggested that the magnitude of host DNA depletion was strongly correlated with species detection by metagenomics (P < 0.05) (Figure 2E).

Host DNA depletion increased the diversity of bacteria in colon tissues
In concordance with improved species detection, an increased bacterial richness (measured by Chao1 index) was evident in the host DNA-depleted samples compared with non-depleted controls in human biopsies (P < 0.001) and mouse colon samples (P < 0.001) (Figure 3A and B).On the other hand, the beta diversity [measured by bray distance followed by principal coordinates analysis (PCoA)] revealed that the bacterial communities changed significantly upon host DNA removal, mainly because more bacteria were detected (Figure S3).Moreover, most of the bacterial species identified in the mouse colon tissue (98.39% for the depleted group and 97.05% for the non-depleted group) were also found in the stool sample (Figure 3C).The tissue-specific species [79 species in the depleted group (Table S4) and 56 species in the non-depleted group (Table S5)] belong to phyla Actinobacteria (36.71% in the depleted group and 46.43% in the non-depleted group) and Proteobacteria (37.97% in the depleted group and 32.14% in the non-depleted group).The detection of shared species in the colon and stool metagenomes suggested that no contamination was introduced into the samples during the host DNA depletion process.The stool microbiota exhibited the highest bacterial richness diversity, followed by the gut microbiota that had undergone host DNA depletion and the non-depleted gut microbiota (Figure 3D).The difference in bacterial richness among the metagenomes of stool, colon with host DNA depletion, and colon without host DNA depletion further supported that the amount of host DNA in the sample influenced the output of shotgun metagenomic sequencing and subsequent detection of bacterial species.

Host DNA depletion increased the coverage of bacterial genes
To further validate the impact of removing host DNA prior to metagenomic sequencing on microbial analysis, we examined the bacterial genes detected in individual samples.Gene accumulation analysis demonstrated that the detection of bacterial genes was increased by 33.89% upon host DNA depletion in  Correlation analysis suggested that the magnitude of host DNA depletion was strongly correlated with species detection by metagenomics.An increased number of species became detectable as more host DNA was removed from the tissue prior to metagenomic sequencing.
human colon biopsies (Figure 4A).This observation was confirmed by the data collected from mouse colon tissues.Detection of bacterial genes was increased by 95.75% in the depleted group compared with the non-depleted group (Figure 4B).
Host DNA depletion maintained gut microbial relative abundance while increasing the sequencing depth of bacterial DNA The most abundant phyla in human colon biopsies from both groups included Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria, and Cyanobacteria (Figure 5A and B).There was no significant difference in the dominant phyla between the depleted group and the non-depleted group (P > 0.05, Fisher's exact test), suggesting that the host DNA depletion did not alter the overall structure of microbial communities.
The dominant phyla also remained unchanged after host DNA depletion in the mouse colon tissues (Figure 5C and  D).At the species level, we observed that most of the bacteria (99.97% in human biopsies and 99.89% in mouse tissues) did not show significant differences (Figure 6).However, abundance shifts were observed in several species (0.03%) after removing host DNA.An enrichment of several Gramnegative bacteria, including Acinetobacter baumannii, Escherichia coli, and Pseudomonas sp.286, was observed in human samples after removing host DNA using our protocol (Figure 6A).Likewise, the abundances of some bacterial species (< 0.11%) were altered in mouse samples undergoing host DNA depletion.Several species, such as Lactobacillus murinus, Akkermansia muciniphila, and Bifidobacterium pseudolongum, were enriched whereas other species, such as Salmonella enterica and E. coli, were depleted after removing host DNA from the mouse colon tissues (Figure 6B).Given that the bacteria showing abundance shifts upon host DNA depletion constituted only 0.03% of the entire microbial communities in human samples and 0.11% in mouse samples, the induced abundance shifts had a minimal impact on the structure of the microbial communities.Furthermore, the quantitative polymerase chain reaction (qPCR) result showed that our method depleted host DNA and extracted bacterial DNA as effectively as other methods using osmotic lysis or saponin treatment (Figure S4).We calculated the reduced ratio, referring to the difference between the host DNA ratio in the depleted group and in the non-depleted group.We observed the reduced ratio from the qPCR data (9.46%) was similar to that from metagenomic sequencing data (10%).These findings suggested that our method improved the sequencing depth of bacterial DNA, uncovering low-abundance bacterial species that may have significant biological functions in health maintenance or disease development.

Discussion
Overwhelming quantities of host DNA relative to microbial DNA hinder the detection of specific bacteria in tissue samples by shotgun metagenomic sequencing.As such, we explored the various steps of DNA extraction and described an optimized protocol for reducing host DNA in the colon tissue and enhancing the sequencing output of bacterial DNA.Previous studies have used AHL (QIAGEN) lysis buffer to differentially lyse host cells and degraded host DNA by benzonase prior to extracting bacterial DNA from saliva or sputum samples [8,18,19].Unlike these studies using samples with high micro-  ) colon tissues before and after host DNA depletion.Increased alpha diversity was evident after host DNA depletion in human colon biopsies and in mouse colon tissues (P < 0.001).B. Species accumulation in the host DNA-depleted group reached the plateau sooner than the non-depleted group.Fitted curves were generated using a non-linear model y ¼ a Á e bx þ h, where x is the number of samples and y is the number of detected species.
The coefficients of the model were shown under the plot.The red arrow pointed to the saturated level.C. Majority of bacterial species identified in the mouse colon tissue (98.39% for the depleted group and 97.05% for the non-depleted group) were also found in the stool sample.D. The mouse stool microbiota exhibited the highest alpha diversity, followed by the host DNA-depleted gut microbiota and the non-depleted gut microbiota.
bial biomass (proportion of microbial DNA in total DNA), we extracted DNA from samples with much lower biomass, such as the limited colon biopsies from human.We included an additional step to pre-treat the biopsy tissue with collagenase D and built our own DNA libraries for metagenomic sequencing using only 1 ng of input DNA.Our metagenomic data demonstrate that our method increases the sequencing depth of bacterial DNA, leading to enhanced detection of bacterial species in tissue samples.We improved an existing protocol for the extraction of bacterial DNA from the biopsy tissue.Our method will be a useful tool to process clinical samples, especially biopsy tissues which tend to be in limited amounts, to effectively deplete host DNA and enhance bacteria detection in the samples.
The novelty of our current study was that only a small quantity of the tissue was required for host DNA depletion and consequent shotgun metagenomic sequencing.In most cases, acquiring substantial tissue samples for microbiome research is challenging, particularly when using human tissue biopsies.In this regard, we performed the host DNA depletion on $ 2.5 mg of colon biopsy whereas Nelson et al. and Charalampous et al. performed host DNA depletion on 200 mg and 400 ll of sputum samples, respectively [8,18].We calculated the ratio of bacterial reads in total reads and found that it was comparable but somewhat improved than the ratio reported by Nelson et al., in which they found a 5-fold increase in bacterial reads and a 28% reduction in host reads upon host DNA depletion [8].Likewise, our protocol increased bacterial reads by 5.5-fold and decreased host reads by 10% in the mouse colon tissue.We also reported a 2.5-fold increase in bacterial reads and a 6.8% decrease in host reads in human colon biopsy undergoing host DNA depletion.Our data also showed a small amount of microorganism DNA depletion along with the host DNA depletion, and this could be due to the disrupted microorganism upon freezing and thawing, or extracellular DNA from dead bacteria.However, the level of depleting microorganism DNA was far less than the host DNA depletion.
Another benefit of our method was that more tissue can be reserved for other analyses, including transcriptomics, proteomics, and metabolomics, to gain insights into the microbial function and activity [21][22][23].Our method increased alpha diversity and detected new species absent from the nondepleted controls, while maintaining the overall structure of the microbial communities.There were no significant changes in the dominant phyla of the human and mouse colon undergoing our depletion protocol.Hence, our protocol maintained gut microbial composition while increasing the sequencing depth of bacterial DNA.
Recent studies have developed an alternative approach to differential lysis of host and bacterial cells via separating host DNA from bacterial DNA by methyl-CpG binding domain (MBD-Fc) paramagnetic beads [24].Removing host DNA based on the abundance of CpG methylation was reported by Feehery et al. to evaluate the enriched bacterial DNA in human saliva and blood samples [24].A later study has highlighted the limitation of the methylation-based approach, including bias toward the recovery of CpG-rich and long DNA templates [25].In addition, Feehery et al. used a DNA extraction protocol devoid of mechanistic disruption of the cell  wall.The absence of the bead-beating process in their protocol hinders the detection of Gram-positive bacteria with the thick cell wall.As such, our method comprised enzymatic and mechanistic lysis of bacterial cells and extracted bacterial DNA more efficiently.
A bacterial mock community has been recently constructed to examine the changes of specific taxa upon host DNA depletion [8,18].In contrast to these studies that focused on the impact of host DNA depletion on selective pathogens, our work provided an unbiased examination and comparison of the microbial profile using limited tissue biopsies.Moreover, our depletion method reduced host DNA and enhanced microbial composition of colon tissues, supporting its utility in future relevant research with more accurate characterization of the colon tissue metagenome.
In conclusion, we described an improved method that removes host DNA from colon tissue and enhances the sensitivity of shotgun metagenomic sequencing in bacteria detection in the biopsy tissue.Using this method to remove host DNA from clinical samples may yield a more accurate taxonomic profile of the gut microbiota, leading to the identification of bacteria that are important for disease initiation or progression.

Sample collection
Colon biopsies (2-3 biopsies per case) were collected from eight human participants during colonoscopy at the Prince of Wales Hospital of the Chinese University of Hong Kong and stored at À80 °C until DNA extraction.Colon tissues were also collected in 19 C57BL/6 mice at the age of 6-8 weeks and stored at À80 °C until DNA extraction.

Host DNA depletion and bacterial DNA extraction
Tissue biopsies were homogenized and divided into two portions equally, with one portion undergoing host DNA depletion and the other portion serving as the control without host DNA depletion.For the group with host DNA depletion, host DNA was removed through differential lysis of mammalian and bacterial cells prior to sequencing.The mammalian cells were lysed in the initial lysis steps, whereas the bacterial cells were protected from the initial lysis by their thick cell wall.Host DNA was then degraded by benzonase, and bacterial DNA was isolated through a combination of enzymatic lysis, bead-beating procedure, and column-based purification.In brief, $ 2.5-mg colon tissues were pre-treated with collagenase D (2 mg/ml) at 37 °C for 1 h followed by DNA extraction using the QIAamp DNA microbiome kit (Catalog No. 51704, QIAGEN).500 ll of AHL buffer was added to the sample and incubated for 30 min at room temperature with end-over-end rotation.The sample was centrifuged at 10,000 g for 10 min and the supernatant was removed.190 ll of RDD (QIAGEN) buffer and 2.5 ll of benzonase were added to the sample and incubated at 37 °C for 30 min.The sample was then incubated with 20 ll proteinase K at 56 °C for 30 min.200 ll ATL (QIAGEN) buffer (containing reagent DX) was added to the sample, and the whole mixture was transferred to pathogen lysis tube L. Sample was lysed mechanically by Bioprep-24 homogenizer (Hangzhou Allsheng Instruments, Hangzhou, China) with a velocity of 6.5 m/s three times for 45 s with a 5-min intermission, while the sample was stored on ice.The supernatant was transferred to a fresh tube, and 40 ll proteinase K was added and incubated at 56 °C for 30 min.The sample was incubated with 200 ll of APL2 (QIAGEN) buffer at 70 °C for 10 min.200 ll ethanol was added to the sample, and the whole mixture was transferred to the QIAamp UCP mini spin column.The column was centrifuged at 6000 g for 1 min and then washed with AW1 and AW2 (QIAGEN) buffer.DNA was eluted in 30 ll water.
For the control group without host DNA depletion, total DNA was extracted from $ 2.5 mg of colon tissue by a combination of enzymatic and mechanical lysis of host and bacterial cells.Tissues were treated with lysozyme (10 mg/ml) at 37 °C for 1 h, followed by mechanical lysis using 100-mg glass beads (Catalog No. G4649, Sigma-Aldrich) and Bioprep-24 homogenizer (Hangzhou Allsheng Instruments) at a velocity of 6.5 m/s three times for 45 s with a 5-min intermission.Total DNA was then isolated using the QIAamp DNA mini Kit (Catalog No. 51304, QIAGEN).In brief, 100 ll ATL buffer and 20 ll proteinase K were added to the sample and incubated at 56 °C for 2 h. 4 ll RNaseA was added to the sample and incubated at room temperature for 2 min.200 ll AL buffer was added to the sample and incubated at 70 °C for 10 min.After the addition of 200 ll ethanol, the whole mixture was transferred to the QIAamp mini spin column and centrifuged at 6000 g for 1 min.The sample was then washed with AW1 and AW2 buffer, and eluted in 100 ll water.After DNA extraction, DNA concentration was measured by Quantus fluorometer (Catalog No. E6150, Promega).The step-by-step host DNA depletion protocol is provided in File S1.

Library preparation and metagenomic sequencing
DNA libraries were prepared and subjected to shotgun metagenomic sequencing for taxonomic profiling using the NEBNext Ultra II FS DNA library prep kit for Illumina (Catalog No. E6177, New England Biolabs). 1 ng of DNA was fragmented with fragmentation enzyme for 10 min at 37 °C followed by adaptor ligation.Adaptor was diluted 25fold, and 2.5 ll adaptor was incubated with 35 ll fragmented DNA, 30 ll NEBNext Ultra II ligation master mix, and 1 ll NEBNext ligation enhancer at 20 °C for 15 min.3 ll of USER enzyme was added to the ligation mixture and incubated at 37 °C for 15 min.Purification with beads was carried out according to the manufacturer's instructions prior to PCR enrichment of adaptor-ligated DNA under the following conditions: an initial temperature of 98 °C for 30 s followed by 12 cycles of 98 °C for 10 s and 65 °C for 75 s, then 65 °C for 5 min.Shotgun metagenomic sequencing was performed at Novogene Technology Beijing by Illumina HiSeq 2000 platform (Illumina) with paired-end 150 bp (PE150).

Metagenomic sequence analysis and taxonomic assignment
Metagenomic reads were quality-filtered using Trimmomatic, version 0.36 [26].Any leading or trailing N-bases and other bases that had Phred quality scores of 3 or below, sequence reads with a 4-base sliding window trimmer that required minimum average quality scores of 15, sequence reads that had 36

Figure 1
Figure 1 Workflow of metagenomic analyses

Figure 4
Figure 4 Host DNA depletion increased the coverage of bacterial genes A. Gene accumulation plot for the depleted group and the non-depleted group from human colon biopsies over different sample sizes.Upon host DNA depletion, detection of bacterial genes was increased by 33.89%.B. Gene accumulation plot for the depleted group and the non-depleted group from mouse colon tissues over different sample sizes.Detection of bacterial genes was increased by 95.75% after host DNA depletion.

Figure 3
Figure 3  Host DNA depletion increased the diversity of bacteria detected in colon tissue A. Chao1 index was calculated to examine the bacterial diversity of human (N = 8) and mouse (N = 19) colon tissues before and after host DNA depletion.Increased alpha diversity was evident after host DNA depletion in human colon biopsies and in mouse colon tissues (P < 0.001).B. Species accumulation in the host DNA-depleted group reached the plateau sooner than the non-depleted group.Fitted curves were generated using a non-linear model y ¼ a Á e bx þ h, where x is the number of samples and y is the number of detected species.The coefficients of the model were shown under the plot.The red arrow pointed to the saturated level.C. Majority of bacterial species identified in the mouse colon tissue (98.39% for the depleted group and 97.05% for the non-depleted group) were also found in the stool sample.D. The mouse stool microbiota exhibited the highest alpha diversity, followed by the host DNA-depleted gut microbiota and the non-depleted gut microbiota.

Figure 5
Figure 5 The microbial community structure was maintained after host DNA depletion A. Dominant phyla in the human colon biopsy.There was no significant difference in the dominant phyla between the depleted group and the non-depleted group.The P value was calculated by Fisher's exact test with Monte Carlo simulation.B. Dominant phyla in individual human colon samples before and after host DNA depletion.C. Dominant phyla in the mouse colon tissue.The dominant phyla remained the same for the depleted group and the non-depleted group, suggesting that the host DNA depletion protocol does not alter the overall structure of microbial communities.The P value was calculated by Fisher's exact test with Monte Carlo simulation.D. Dominant phyla in individual mouse colon samples before and after host DNA depletion.

Figure 6
Figure 6 Abundance shifts upon host DNA depletion A. Abundance shifts were observed in several species after host DNA depletion in human colon biopsy.However, the bacteria showing abundance shifts upon host DNA depletion constituted only 0.03% of the entire microbial community, suggesting that the induced shifts have minimal impact on the relative abundance of the microbes.B. Abundance shifts were evident after removing host DNA from mouse colon tissue.The bacteria showing abundance shifts represented only 0.11% of the entire microbial community.FDR, false discovery rate.

Table 1
Alignment of metagenomic data collected from human colon biopsies to reference genome database