Differences in Gut Virome Related to Barrett Esophagus and Esophageal Adenocarcinoma

The relationship between viruses (dominated by bacteriophages or phages) and lower gastrointestinal (GI) tract diseases has been investigated, whereas the relationship between gut bacteriophages and upper GI tract diseases, such as esophageal diseases, which mainly include Barrett’s esophagus (BE) and esophageal adenocarcinoma (EAC), remains poorly described. This study aimed to reveal the gut bacteriophage community and their behavior in the progression of esophageal diseases. In total, we analyzed the gut phage community of sixteen samples from patients with esophageal diseases (six BE patients and four EAC patients) as well as six healthy controls. Differences were found in the community composition of abundant and rare bacteriophages among three groups. In addition, the auxiliary metabolic genes (AMGs) related to bacterial exotoxin and virulence factors such as lipopolysaccharides (LPS) biosynthesis proteins were found to be more abundant in the genome of rare phages from BE and EAC samples compared to the controls. These results suggest that the community composition of gut phages and functional traits encoded by them were different in two stages of esophageal diseases. However, the findings from this study need to be validated with larger sample sizes in the future.


Introduction
Barrett's esophagus (BE) is the only known precursor for the development of esophageal adenocarcinoma (EAC) with a five-year survival rate of less than 20%. The incidence of these diseases is on the rise globally [1,2]. Early diagnosis of patients at risk could prevent the progression of BE to EAC, and effectively reduce the development of EAC. However, as only 0.3-0.5% of BE patients develop EAC, endoscopic biopsy surveillance, while linked to higher survival rates, is only recommended for at-risk patients [3]. In addition, endoscopies are often discomforting, and sometimes lead to inconclusive results [4]. Thus, noninvasive diagnostics with higher accuracy are sought after. The human gut is home to trillions of microorganisms, including bacteria, viruses, fungi, and protozoa. These microorganisms and their human host maintain a symbiotic relationship, in which the host provides a nutrient-rich habitat, and the microbiota supplies key metabolic capabilities, protects against pathogen invasion, and trains the immune system [5]. In addition, an imbalance in gut microbiota, termed dysbiosis, is associated with several human diseases or conditions, This study aimed to investigate the alteration of gut phages in different stages of esophageal diseases. For this purpose, we (1) determined the composition of the isolated bacteriophage community in BE patients, EAC patients, and healthy controls (CT); (2) predicted the bacterial host ranges of the gut phages in all three groups; (3) identified the metabolic pathways encoded by these phages.

Sample Collection
Sixteen samples were selected from the German BarrettNET registry including six BE patients, four EAC patients, and six CT for virome analysis. The clinical data are shown in Table S1, and additional information can be found in a previous study [42]. Stool samples were collected using Stool Collection Tubes with Stool DNA Stabilizer (STRATEC Molecular GmbH, Berlin, Germany). The sampling procedure was conducted mostly at home or in the clinic if the patients were on outpatient visits. Samples were shipped to the clinic human sample biobank and stored at −80 • C until further virome DNA extraction.

Virome DNA Extraction
The stool samples were vortexed vigorously for 4 h at 4 • C, then centrifuged at 4000 g for 30 min to collect supernatant. The supernatant was passed through 0.22 µm filters (PES Membrane, Lot No. ROCB29300, Merck Millipore, Co., Cork, Ireland) to remove bacterial-associated particles, and the volume was subsequently concentrated to less than 50 µL by Amicon ® Ultra Centrifugal Filters (10 kDA, Lot No. R9EA18187, Merck Millipore, Co., Cork, Ireland). Then 1/5 volume of chloroform was mixed with the samples and centrifuged at 14,000 g for 3 min, retaining the upper phase followed by a DNAse I (1 U/µL, Lot No. 1158858, Invitrogen, Carlsbad, CA, USA) treatment for 1 h at 37 • C to remove non-phage DNA. DNase I was inactivated by adding EDTA (0.1 M). Subsequently, lysis buffer (700 µL KOH stock (0.43 g/10 mL), 430 µL DDT stock (0.8 g/10 mL), and 370 µL H 2 O, pH = 12) was added to the reaction and incubated at room temperature for 10 min followed by 2 h incubation at −80 • C, and 5 min at 55 • C. Lysed VLPs were then treated for 30 min at 55 • C with Proteinase K (20 mg/mL, Lot No. 1112907, Invitrogen, Carlsbad, CA, USA) to digest remaining viral capsid and extract the virome DNA. AMPure beads (Agencourt, Beckman Coulter, Brea, CA, USA) were added to the extracted DNA and incubated for 15 min at room temperatureF. DNA was eluted from beads by 35 µL Tris buffer (10 mM, pH = 9.8) and stored at −80 • C until it was sent for sequencing. Sequencing was performed on an Illumina HiSeq-PE150 platform.

Bioinformatic Analysis
On average, 9,358,935 ± 169,389 reads per samples were generated. Raw reads were processed with fastp (v0.20.1) [43] to remove adaptors and low-quality bases. Remaining reads were deduplicated using dedupe.sh from bbmap suite (v38.76) (https: //sourceforge.net/projects/bbmap/; accessed on 29 January 2020). Then the obtained reads were assembled into contigs using metaSPAdes (v3.14.0) [44] with default parameters retaining only contigs longer than 1 kb. Redundant contigs were removed by dedupe.sh. Remaining contigs were used to predict viral sequences by the combination of VirSorter (v1.0.6) [45], CAT (v5.0.4) [46] and DeepVirFinder (v1.0) [47]. Contigs predicted as category 1 and 2 by Virsorter, or predicted as viruses by CAT, were classified as viruses. Contigs also were classified as viruses if they were predicted as category 3 by VirSorter or could not be classified to taxonomy by CAT but were predicted as a virus by DeepVirFinder with q value < 0.01. Predicted viral contigs were clustered using CD-HIT [48] if they shared >95% identity over 80% of the contig length, the longest contigs in each cluster were retained as a representative for downstream analysis.
For each representative viral contig, ORFs were predicted using Prodigal (v2.6.3) [49] and provided to vConTACT (v2.0) [50] for taxonomy annotation. For contigs that could not be assigned a taxonomy by vConTACT, CAT annotations were used. Otherwise, Order and Family level taxonomic annotations were predicted using Demovir script (https://github.com/feargalr/Demovir; accessed on 27 July 2019) with default parameters and database. To calculate the relative abundances of viruses in each sample, clean reads from each sample were mapped to viral contigs using bbmap.sh from bbmap suite (v38.76). CoverM (v0.4.0) (https://github.com/wwood/CoverM; accessed on 20 February 2020) was used to estimate contig coverage. Feature Counts (v2.0.0) [51] was then used to estimate the number of reads that mapped to each gene. Viral proteins predicted in the previous step were fed into VIBRANT (v1.2.1) [52] to identify lytic and lysogenic phages and the function was annotated using protein mode with default parameters. VI-BRANT annotates viral proteins by searching viral proteins against KEGG [53], VOGDB and PFAM databases, which include function annotation of protein sequences and AMGs. The virus (phage)-bacteria (host) interactions were predicted by VirHostMatcher-Net, which is a method based on the combination of features: virus-virus similarity, virus-host alignment-free similarity, virus-host shared CRISPR spacers and virus-host alignmentbased matches [54]. Bacterial hosts were predicted for contigs with a length greater than 10 kb and score higher than 95% according to VirHostMatcher-Net.

Statistics Analysis
Alpha diversity of phage community was measured using qiime2 (https://qiime2.org; accessed on 29 January 2020). Principal Coordinates Analysis (PCoA) based on "Bray-Curtis" similarities was performed using R (v3.2, package vegan, The R Foundation, Vienna, Austria, 2016). Permutational Multivariate Analysis of Variance (PERMANOVA) was used to test the significant difference. All data performed statistical analyses, which were conducted in

Gut Bacteriophage Community Structure Differed for BE and EAC Compared to Their Healthy Counterparts
On average, 43 ± 2% of all reads generated through sequencing were from viruses. In total, 854 ± 50, 1136 ± 19, 920 ± 33 viral contigs were obtained from sequences identified as viruses for CT, BE, and EAC, respectively. On average, from these contigs, over 95% of sequences were assigned to phages. The order of Caudovirales, which included Herelleviridae, Myoviridae, Podoviridae, Siphoviridae, and Unclassified Caudovirales, were the most abundant phages, accounting for more than 50% of total sequences in all three groups (Figure 1a, Figure S1). Among those phage families, the relative abundance of Herelleviridae was lower than 1% in three groups, the relative abundance of Myoviridae (1. 12 in EAC) showed great variation within each group (p > 0.05). Some viral contigs were assigned to other phage or viral families including Inoviridae, Microviridae, Tectiviridae, Herpesvirales, Marseilleviridae, and Pithoviridae with a relative abundance of less than 1%. Meanwhile, the large difference in specific viral taxa between individuals was observed in the same group, which may be attributable to multiple factors such as age, gender, diet, or drug usage (Table S1). We next determined the dominant phage replication cycle (lytic versus lysogenic cycle). On average, EAC samples had more temperate phages (lysogenic cycle) than BE and CT (p > 0.05), 11.97% ± 2.43% in CT, 13.47% ± 1.15% in BE, 19.13% ± 4.90% in EAC ( Figure S2). The percentage of predicted bacterial hosts in CT, BE, and EAC. The inner cycle represents bacterial hosts at the phylum level, the outer cycle represents bacterial hosts at the class level. The low quality represents bacterial hosts predicted by contigs with a length lower than 10 kb and the score was lower than 95%; (c) Viral alpha diversity including richness (Ace) and diversity (Shannon) in samples from CT, BE, and EAC; (d) PCoA plot of the viral community composition based on the Bray-Curtis distances in CT, BE, and EAC samples. CT represents stool samples from healthy controls; BE represents stool samples from Barrett Esophagus patients; EAC represents stool samples from Esophageal Adenocarcinoma patients. Error bars indicate the average ± SE. Statistical significance was determined by Kruskal-Wallis, Dunn's post hoc test, asterisk indicates p < 0.05. We next predicted the bacterial host range of the viral contigs from different groups in the study (Figure 1b). We observed that the bacterial hosts mainly spanned the phyla Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria, which were common across all three groups. In addition, we found that less than 0.1% of the phages were predicted to infect Fusobacteria, Spirochaetes, and Synergistetes. When the predicted bacterial host in class level was further compared, their relative abundance showed more obvious variation among the different groups, but these results were not statistically significant. For Actinobacteria, the relative abundance in CT (1.33% ± 0.28%) and BE (1.77% ± 0.21%) was higher than EAC (0.37% ± 0.11%) (p > 0.05). For Flavobacteriia, the relative abundance in CT (5.02% ± 1.45%) and EAC (5.38% ± 1.45%) was higher than BE (1.14% ± 0.16%) (p > 0.05). Notably, the classes Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were more abundant in CT compared with BE and EAC. Moreover, the relative abundance of Bacteroidia (13.08% ± 2.34% in CT, 4.38% ± 0.45% in BE, 1.25% ± 0.22% in EAC), Bacilli (5.97% ± 1.51% in CT,1.33% ± 0.14% in BE, <0.1% in EAC), and Erysipelotrichia (1.86% ± 0.54% in CT, 0.65% ± 0.093% in BE, <0.1% in EAC) were lower in BE and EAC compared to CT, while the relative abundance of Clostridia (15.06% ± 0.52% in CT, 18.04% ± 0.90% in BE, 29.20% ± 5.60% in EAC) was higher in BE and EAC com-pared to CT. However, there was no significant difference (Jonckheere trend test, p > 0.05). Furthermore, the remaining classes had a lower relative abundance (0.0001%-0.31%) across the three groups.
We further examined how the changes in phages community composition affected the overall diversity. For the alpha diversity, a significant difference in phage diversity (Shannon) was found among the three groups (p = 0.036), while no significant difference was observed in phage richness (Ace) (p > 0.05) (Figure 1c). Furthermore, the alpha diversity showed differences among BE and EAC compared to CT samples (p > 0.05). Specifically, in both BE (1136.17 ± 19.48) and EAC (920.50 ± 33.87), the richness (Ace) was higher compared with that in CT (854.00 ± 50.73). However, only in BE (6.50 ± 0.11), the diversity (Shannon) was higher compared with that in CT (4.53 ± 0.15). Furthermore, BE had a higher level of richness (Ace) and diversity (Shannon) than EAC. In addition, no significant difference was detected (p > 0.05) in beta diversity (PCoA) among the three groups (Figure 1d).

Abundant and Rare Phage Communities in the Gut May Contribute to the Progress of Esophageal Carcinogesis
We used a sorting approach commonly applied in ecological study that classifies microbes into three groups based on their abundance [55,56], aiming to explore the role of less abundant microbes in different ecosystems. Using this approach, the contribution of rare, less abundant, bacterial Operational Taxonomic Units (OTUs) to some of the key ecological functions was revealed in the environment [57], which was previously overlooked. We believe this approach can be beneficial for studying phages in the gut. To this end, we divided phage contigs into abundant phages (relative abundance was more than 1% in total viral contigs), moderate phages (relative abundance was more than 0.1% and less than 1% in total viral contigs), and rare phages (relative abundance was less than 0.1% in total viral contigs). At these three relative abundance levels, members of the order Caudovirales (Myoviridae, Siphoviridae, and Podoviridae) showed the highest relative abundance in all three groups (Figure 2a). Subsequently, we observed that abundant phages presented significantly higher relative abundance (79.54% ± 2.28% in CT, 54.28% ± 2.19% in BE, and 72.25% ± 4.06% in EAC) when compared with moderate (14.79% ± 1.83% in CT, 34.38% ± 1.68% in BE, and 21.19% ± 3.57% in EAC) and rare phages (4.51% ± 0.52% in CT, 11.34% ± 0.85% in BE, and 6.56% ± 0.52% in EAC) in all three groups (abundant vs. moderate p < 0.001, abundant vs rare p < 0.001) (Figure 2b,c), while the highest number of contigs was from rare phages (788 ± 48 in CT, 994 ± 18 in BE, and 836 ± 28 in EAC), exceeding abundant (13 ± 1 in CT, 17 ± 1 in BE, and 11 ± 1 in EAC) and moderate (54 ± 8 in CT, 126 ± 7 in BE, and 74 ± 15 in EAC) phages in all three groups (Figure 2b,c). The highest relative abundance of abundant phages and the highest number of contigs of rare phages may suggest their different behaviors in relation to the gut bacterial community and esophageal diseases. Moreover, a significant difference was observed in beta-diversity on abundant (p = 0.004) and rare phages (p = 0.003) ( Figure S3), which may imply that these two groups of phages showed higher sensitivity to the changes in the upper GI tract through esophageal disease progression. In addition, we found that the abundance of temperate phages that displayed a lysogenic replication cycle increased with the development of esophageal diseases. This may suggest a higher occurrence of HGT in these samples.
To further evaluate the importance of rare phages in HGT, we compared these three groups of phages to the number of bacterial hosts they infect. On the class level, we observed small differences between phage groups from different health conditions, rare phages infected 18 different bacterial classes whereas abundant phages infected 14 ( Figure 1c). However, when bacterial hosts were compared on the genus level, both diversity and abundance showed large differences, 84 for rare versus 46 for abundant phages (Table S2). In particular, contigs belonging to rare phages showed similar characteristics regarding the number of hosts they infect over three groups, showing a broader bacterial host range compared to moderate and abundant phages. For example, the contigs from rare phages were able to infect 6 or 7 different bacterial hosts at the genus level (Table S3), which was relatively higher than the bacterial hosts predicted for the contigs from abundant and moderate phages. The broader bacterial host range and higher number of contigs (Figure 2b, Tables S2 and S3) of rare phages could potentially lead to storing more AMGs in their genomes and, in turn, expand the frequency of HGT between gut bacteria. Figure 2. Composition of the rare, moderate, and abundant gut viruses in CT, BE and EAC samples. Rare, moderate, and abundant viruses were categorized based on the viral contig level. Abundant viruses represent viral contigs whose relative abundance was more than 1% in total contigs, moderate viruses represent viral contigs whose relative abundance was more than 0.1% and less than 1% in total contigs, and rare viruses represent viral contigs whose relative abundance was less than 0.1% in total contigs. (a) The relative abundance of viral families; (b) Number of contigs generated each viral contig category, rare, moderate, and abundant, on left and relative abundance of them on right. (c) Negative correlation between number of contigs, from rare, moderate, and abundant phages, and their relative abundance. CT represents stool samples from healthy controls; BE represents stool samples from Barrett Esophagus patients; EAC represents stool samples from Esophageal Adenocarcinoma patients. Statistical significance was determined by two-way analysis of variance [ANOVA], Tukey's post hoc test, asterisk indicates p < 0.05.

AMGs Found in Rare Bacteriophages Showed Increment in Esophageal Diseases
After annotation of the viral contigs, viruses were found to be involved in most of the microbial functions related to metabolism, cellular processes, genetic information processing, environment information processing, organismal system, and human disease (Figures 3a and S4). Significant differences were found for genes related to metabolism of cofactors and vitamins (p = 0.0083) and genes related to the prokaryotic defense system among the three groups (p = 0.0202) (Figure 3a). Genes involved in metabolism of cofactors and vitamins were found to be most abundant in CT phages, whereas genes related to the prokaryotic defense system were more abundant in EAC phages, suggesting a stronger arms race between phages and bacteria in this disease (Figure 3a). Notably, AMGs encoding bacterial toxins were found to be more abundant in the genome of rare bacteriophages including the spyA gene, tccC gene, entB gene and entD gene, which are involved in microbial cellular processes. The spyA gene, which encodes a C3 family ADP-ribosyltransferase (bacterial exotoxin) [58], showed a slightly higher level of relative abundance in BE and EAC (p > 0.05) compared with the other three AMGs (Figure 3b). Moreover, the spyA gene level was relatively higher in BE (0.00040 ± 0.00011) and EAC (0.0027 ± 0.0012) compared with CT (0.00031 ± 0.000012) (p > 0.05). Other AMGs that relate to LPS biosynthesis pro-teins were also found in the genome of rare phages including the lpxD gene, kdsC gene and gmnB gene, which are involved in microbial metabolism (Figure 3b). The lpxD gene only presented in BE with a relative abundance of 0.00031 ± 0.000113. The kdsC gene presented in BE (0.000089 ± 0.000036) and CT (0.0000024 ± 0.00000097). For the gmnB gene, it was relatively higher in EAC (0.00064 ± 0.00029) and BE (0.00024 ± 0.000094) compared with CT (0.00015 ± 0.000044) (p > 0.05). The higher abundance of these genes in phages from BE and EAC compared to CT may have resulted from the increase of pathogenic bacteria, mainly Gram-negatives, in the esophageal diseases, leading to a higher chance of obtaining AMGs, which are related to LPS biosynthesis proteins encoded by phages. We next explored the appearance of these genes in the Gut Phages Database (GPD) containing 142,809 non-redundant globally distributed phage genomes. We found many phages encoding these genes in GPD with one exception, tccC, showing these AMGs are ubiquitous in the human gut ( Figure S5). Toxin complex (Tc) is a multisubunit toxin consisting of three components (TcA, B, and C) encoded by pathogenic bacteria infecting both insects and humans. TcAs that make functional pores combine with TcB-TcC subunits to create active chimeric holotoxins. Tc toxins are encoded by human pathogens like Yersinia pestis, Y. pseudotuberculosis, and Morganella morganii and are believed to significantly contribute to these bacteria's pathogenicity. Yet, their role in EAC remains to be revealed [59]. The increase of these genes in phages from BE and EAC may contribute to the severity of these diseases through exchanging genes that are involved in bacterial exotoxin production and LPS biosynthesis in esophageal carcinogenesis. This warrants further investigation.

Discussion
Barrett's esophagus (BE) is a condition caused by the metaplastic replacement of the normal squamous epithelium by columnar epithelium. BE is closely associated with the development of esophageal adenocarcinoma (EAC), a disease in which cancerous cells develop in the tissues of the esophagus with a high mortality rate [42]. It has been recently shown that gut dysbiosis can activate oncogenic signaling pathways, leading to the production of tumor-promoting metabolites, and further influence the esophageal mucosal inflammation and tumorigenesis [60]. For example, gut bacteria regulate bile acid (BA) metabolism. Under stimulation such as a high-fat diet, the gut bacteria changed, and the level of BA increased accordingly [61]. The reflux of BA to the esophagus caused esophageal damage, leading to BE and subsequent EAC. In an animal experiment simulating BA reflux, overexpression of the inflammatory cells, IL-6 and TNF-α, was found [62]. This indicated that gut bacterial alterations could indirectly induce the esophageal mucosal inflammation and carcinogenesis [62][63][64]. Despite a wealth of data on the role of gut bacteria in GI tract disease, we have only recently recognized the association of gut viruses with some GI tract diseases, including CRC in which the diversity of the gut viruses is significantly increased in stool samples from CRC patients, suggesting a disease-specific signature that can be used to differentiate CRC samples from controls [37]. The CRC-associated virome includes primarily temperate bacteriophages belonging to Siphoviridae and Myoviridae families [65]. The impact of phages on gut homeostasis is not restricted to their interactions with gut bacteria as phages can directly interact with the human host. In vitro studies have demonstrated that phages can cross the epithelial cell layer through transcytosis, thereby stimulating the underlying immune cells [22,[66][67][68][69]. For example, the interaction between E.coli phages and the immune system has been associated with Type I Diabetes autoimmunity [36]. It has been reported that phages can activate IFN-γ produced by CD4+ T cells via the nucleotide-sensing receptor TLR9, which accelerates intestinal inflammation and colitis, leading to a systemic inflammation response [70]. The consistent disease-specific signature of gut viruses [27,37], suggests a potential association between gut viruses and human disease.
Studies that investigated the esophageal virome, using metagenomic data of whole microbial communities rather than profiling the isolated viral communities, have identified a range of phages, including Streptococcus, Campylobacter, Lactococcus, and γ-Proteobacteria phages [71]. The aforementioned and those that only explored the bacterial community of the esophagus have mainly used biopsy samples for virome and bacterium analysis [10,72,73]. Although, biopsies could directly reflect the disease-associated microbial signature at the lesion, the sampling procedure is invasive, time-consuming, costly, and may induce potential complications [74]. Moreover, biopsy samples often have limited microbial materials, with a lower probability of successful sequencing and downstream analysis [75]. Thus, an amplification step (e.g., whole genome amplification) is necessary, which might introduce biases to study results. On the contrary, stool samples collected by non-invasive methods often supply sufficient materials for research purposes [76].
Here we explored stool samples from BE, EAC, and CT phages community composition in esophageal diseases. Our in-depth gut virome analysis during esophageal carcinogenesis provided some evidence of gut phage community changes between different stages of esophageal diseases. Consistent with previous studies that have explored the gut viruses, mainly in the lower GI tract diseases such as IBD and CRC [27,65], phages from the order Caudovirales were the most dominant phages in the samples from esophageal diseases. Compared with CT, the alpha diversity has changed with the esophageal diseases progress, and a relatively higher alpha diversity was observed in BE samples compared to CT and EAC. This was not reflected in the beta diversity as no significant differences were observed among three groups. Using a common sorting approach in microbial ecology, we identified disease-associated differences in diversity and abundance of rare phages, suggesting a potential link between these phages and esophageal diseases. In addition, consistent with previous studies on diseases like IBD [77] and CRC [65], we observed changes in the proportion of lytic/lysogenic replication cycles of phages, and more temperate phages were observed in esophageal carcinogenesis. These results further support earlier studies that reported the dominance of virulent phages (lytic cycle) in the healthy human gut replaced by temperate phages in Crohn's disease and ulcerative colitis [23,24]. Furthermore, the relatively higher percentage of temperate phages in samples from esophageal diseases may imply more influence on the bacterial physiology through phage mediated HGT in those groups. However, we did not study the bacterial community of these samples, the community structure of the predicted bacterial hosts for the phages identified in the study may suggest a complex relationship between bacteria and bacteriophage community in esophageal diseases. Earlier studies on lower GI tract diseases such as CRC have observed that the effect of phages resulted from their interactions with the whole bacterial community, rather than the bacterial taxa directly contributing to the disease severity [65]. However, there was no direct correlation between bacterial diversity and phage diversity [27,37].
In addition, we found several AMGs in the genome of the rare phages, further emphasizing the potential role of phages in regulating bacterial physiology by supplying their host with beneficial genes. Specifically, a slightly higher abundance of spyA (p > 0.05) was observed in BE and EAC, potentially contributing to the production of bacterial exotoxins, which disrupt cytoskeletal structures and promote colonization of pathogenic bacteria [58]. The relatively higher abundance of AMGs related to LPS biosynthesis proteins were also found in BE and EAC, which may indicate the dominance of Gram-negative bacteria and the potential inflammatory effects of phage-bacteria interactions. Phages that carry these AMGs can introduce these genes to the genome of gut bacteria via integration, which may contribute to the severity of the esophageal diseases through lysogenic conversion. This could further induce gut inflammation through expression of the phage-derived virulence genes and deteriorate esophageal disease. Intestinal permeability caused by phage-mediated changes of gut microbiota could also lead to systemic inflammatory responses [78]. Given the high variability of the microbiome between individuals and the limited number of samples analyzed, it is difficult to identify significant differences in viral community structure between different groups in the current study. Thus, our findings should be further pursued with a larger sample size.

Conclusions
In summary, this study provides further evidence of potential relationship between gut phages and esophageal diseases. Interestingly, the distinct gut phage community structure was identified in two different stages of esophageal diseases, and these differences were mainly found in abundant and rare bacteriophages. Notably, rare phages and HGT mediated by them have been found to be more related to esophageal diseases. Specially, the rare phages contributed to enriching AMGs related to bacterial exotoxin and LPS biosynthesis proteins, and the possible upregulated level of these genes. These, in turn, may contribute to changes in the gut bacterial composition and inflammation, which lead to the development of esophageal diseases, as previously suggested [6]. However, given the small sample size in our study, the potential diagnostic importance of AMGs and disease-specific viral signature identified should be experimentally validated in further studies.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/microorganisms9081701/s1, Figure S1: Gene-sharing taxonomic network of viral sequences in this study, including viral RefSeq viruses v85. Figure S2: The proportion of lytic/lysogenic replication cycles predicted for the viral contigs from three groups. Figure S3: PCoA plot of the viral community composition based on the Bray-Curtis distances in CT, BE, and EAC samples. Figure S4: The relative abundance of different functional traits in viral sequences. Figure S5: The number of phages that contained the identified AMGs of this study in the Gut Phage Database (GPD). Table S1: Clinical information of individuals from three groups. Table S2: The relative abundance of bacterial host at genus level for abundant, moderate, and rare bacteriophages. Table S3: The percentage of contig relative abundance in different number of predicted bacterial genus types for abundant, moderate, and rare bacteriophages.