The human microbiota is a beneficial reservoir for SARS-CoV-2 mutations

ABSTRACT Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mutations are rapidly emerging. In particular, beneficial mutations in the spike (S) protein, which can either make the virus more infectious or enable immunological escape, pose a significant obstacle to the prevention and treatment of the pandemic. However, how the virus acquires a high number of beneficial mutations in a short time remains a mystery. We demonstrate here that variants of concern may have mutated due in part to the influence of the human microbiome. For each mutation, we took the mutated residue and its six neighboring amino acids as a viral mutation fragment and searched the National Center for Biotechnology Information database for homologous fragments (HFs). Among the approximately 8,000 HFs obtained for 61 mutations in S and other outer membrane proteins, bacteria accounted for 62% of all mutation sources, which is 12-fold higher than the natural variable proportion. A significant proportion of these bacterial species (roughly 70%) come from the human microbiota, are mainly found in the lung or gut, and share a composition pattern with that of COVID-19 patients. Importantly, SARS-CoV-2 RNA-dependent RNA polymerase replicates corresponding bacterial mRNAs harboring mutations, producing chimeric RNAs. Collectively, SARS-CoV-2 may pick up mutations from the human microbiota that change the original virus’s binding sites or antigenic determinants. Our study clarifies the evolving mutational mechanisms of SARS-CoV-2.
IMPORTANCE Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mutations are rapidly emerging, in particular advantageous mutations in the spike (S) protein, which either increase transmissibility or lead to immune escape and are posing a major challenge to pandemic prevention and treatment. However, how the virus acquires a high number of advantageous mutations in a short time remains a mystery. Here, we provide evidence that the human microbiota is a reservoir of advantageous mutations and aids mutational evolution and host adaptation of SARS-CoV-2. Our findings represent a conceptual breakthrough on the mutational evolution mechanisms of SARS-CoV-2 for human adaptation. SARS-CoV-2 may grab advantageous mutations from the widely existing microorganisms in the host, which would be an “efficient” route. Our study might open a new perspective for understanding the evolution of virus mutation, which has enormous implications for comprehending the trajectory of the COVID-19 pandemic.

mutation: widely distributed in hosts, contributing to easy viral access to hosts, and being immunologically tolerant to hosts. The potential factors that facilitate beneficial mutation are explored. Here, we designed a fragment-based BLAST approach and identified the homologous fragments (HFs) containing advantageous mutations in the outer membrane proteins of SARS-CoV-2. It was found that human microbiota might be the major host contributing to the viral mutation acquisition.

A viral mutation fragment-based BLAST was developed to identify homologous fragments harboring beneficial mutations
We first examined the frequency of S protein mutations in all variants of concern (VOCs). We found that among the identified 3,116 S protein mutations, all 55 beneficial mutations had a mutation frequency higher than 90% (1,2). The S protein comprises two domains: the S1 domain, which contains the receptor-binding domain (RBD), and the S2 domain, which is critical for membrane fusion (5). The RBD mediates attachment to human cells and is the primary target of neutralizing antibodies, and it carries 90% of S mutations, including the intensively studied N501Y, E484K, L452Q, and F486V (Fig. 1a) (12,35). In addition, some mutations outside the S protein may markedly affect the direction of the pandemic; these include T9I, a mutation in the envelope (E) protein that leads to reduced virulence in the Omicron variant (36). This finding inspired us to include the high-frequency mutations in the E and membrane (M) proteins, the two other structural membrane proteins that, together with the S protein, constitute the outer protein envelope. Ultimately, a total of 61 mutations were identified: 55 S protein mutations, 2 E protein mutations, and 4 M protein mutations (Fig. 1b; Fig. S1a). Notably, the Omicron variants, including the BA.4 and BA.5 subvariants, carry 38 mutations, a much higher number than that of any other VOC (Fig. S1; Table S1).
We concentrated on the relevant viral mutation fragment (VMF), a unit consisting of a mutant amino acid and its neighboring amino acids, rather than on a single amino acid mutation alone. Previous studies showed that PRRA, a VMF containing four amino acids (aa), functioned as a furin cleavage site (37-39). The length of a VMF was tentatively set to five aa, and a VMF-based BLAST search, aiming to identify homologous fragments of these VMFs, was carried out against the NCBI Protein database, the largest protein database. We found that when the length of VMFs was five aa, numerous HFs were obtained for many VMFs. When the length of VMFs was extended to seven aa, approximately 97% of queries led to identical subjects. When the length was further extended to 9 or 11 aa, less than 7% of the queries led to matches. Therefore, VMFs containing seven aa (XXXMXXX) were selected to create the original analysis database, with "M" indicating the mutation and "X" indicating an adjacent aa (Table S2). Among the subjects obtained, only those fulfilling all of the following criteria were collected: (i) 100% VMF identity, that is, all seven aa were fully recognized in the database; (ii) 100% homologous sequences; and (iii) no sequence origin from SARS proteins. At this stage, we obtained approximately 8,000 subjects in total. To prevent duplicate counting and overcome the limitation of the lower error, we defined two filter conditions: (i) for multiple accessions for one protein, only one accession was retained, and (ii) for multiple isoforms of the same protein, only one isoform was retained. After filtering, 5,600 subjects were collected and used to establish the final database for analysis, which included the name, ID number, species, and submission number for each HF (Fig. 1c; Tables S3-1, S3-2, S3-3, and S4).
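The XXXMXXX construction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `build_vmf`, the `flank` parameter, and the toy reference sequence are assumptions introduced only for illustration, and the toy sequence is not a SARS-CoV-2 protein.

```python
import re


def build_vmf(reference: str, mutation: str, flank: int = 3) -> str:
    """Return `flank` residues + the mutant residue + `flank` residues.

    `mutation` uses the usual label format, e.g. "G6A" = G at position 6
    (1-based) replaced by A. With flank=3 the result has the XXXMXXX shape.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, mutant = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to a 0-based index

    # The fragment needs `flank` residues on both sides of the mutation.
    if index - flank < 0 or index + flank >= len(reference):
        raise ValueError("not enough flanking residues around the mutation")
    if reference[index] != wild_type:
        raise ValueError("reference residue does not match the mutation label")

    return reference[index - flank:index] + mutant + reference[index + 1:index + flank + 1]


# Toy reference (hypothetical, for illustration only).
TOY_REFERENCE = "ACDEFGHIKLMNPQRSTVWY"
print(build_vmf(TOY_REFERENCE, "G6A"))  # DEFAHIK
```

Changing `flank` to 2, 4, or 5 yields the 5-, 9-, and 11-aa fragments that were compared when the VMF length was chosen.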

Bacteria contributed more than 60% of the homologous fragments
The species proportion in the final database was analyzed. We were astonished to find that bacteria contributed more than 60% of the HFs. At the same time, Animalia, Fungi, Plantae, and Protista accounted for 19%, 11%, 4%, and 2%, respectively (Fig. 1d). Notably, of the 742,923 subjects in the NCBI protein database, more than 93.5% were found in eukaryotes, and fewer than 5% were found in bacteria (Fig. 1e; Fig. S2). The markedly inverse proportion inspired us to pay close attention to the bacteria. The species proportions were further analyzed based on individual mutation location, and no significant difference was found among S, E, and M protein mutations (Fig. 1d). Then, the database was analyzed from the perspective of individual mutations. We found that the number of HFs ranged from 0 to 2,075, averaging 92 for each of the 61 evaluated mutations. No HFs were found for three mutations (D80A, G142D, and G339D), accounting for only about 5% of the total evaluated mutations. Interestingly, although bacterial HFs were identified for 93% of the mutations, for each mutation, the bacterial proportion was not correlated with the number of HFs. For certain mutations, both the number of identified HFs and the bacterial contribution were very high; these mutations included S477N (209), T20N (119), T76I (264), and P71L (1,523). In contrast, no more than five HFs were found for some mutations, such as N501Y (5), L452R (3), and T716I (1), but all of the HFs were from bacteria. Notably, the bacterial proportion for N679K was 0%, and for the other six mutations, although the corresponding HF number was high, it was 13, arguing against the idea that the bacterial contribution that was revealed was a coincidence (Fig. 1f).
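The comparison between the bacterial share of HFs and the bacterial share of the background database comes down to simple arithmetic, sketched below in Python. The function names and the toy kingdom labels are assumptions for illustration; only the 62% and 5% figures are taken from the text (the abstract and the paragraph above), and 5% is used as the stated upper bound on the background.

```python
from collections import Counter


def kingdom_proportions(kingdoms):
    """Fraction of records assigned to each kingdom label."""
    counts = Counter(kingdoms)
    total = sum(counts.values())
    return {kingdom: n / total for kingdom, n in counts.items()}


def fold_enrichment(observed: float, background: float) -> float:
    """Ratio of an observed proportion to its background proportion."""
    if background <= 0:
        raise ValueError("background proportion must be positive")
    return observed / background


# Toy labels (hypothetical): one kingdom label per retained HF.
toy_labels = ["Bacteria"] * 6 + ["Animalia"] * 2 + ["Fungi", "Plantae"]
print(kingdom_proportions(toy_labels)["Bacteria"])  # 0.6

# Figures quoted in the text: 62% of HFs vs. fewer than 5% of database entries.
print(round(fold_enrichment(0.62, 0.05), 1))  # 12.4
```

Because the background value is an upper bound, the ratio computed this way is a lower bound on the enrichment.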

The human microbiota is a reservoir of homologous fragments
Next, through categorical and quantitative analysis, we attempted to characterize the composition of the microorganisms. Based on phylogeny and taxonomy, the 3,271 obtained bacterial species belonged to 73 different phyla. Despite the diversity of these phyla, most of the bacterial species obtained were classified into four major bacterial phyla, namely, Proteobacteria, 41%; Actinobacteria, 29%; Bacteroidetes, 8%; and Firmicutes, 8% (Fig. 2a). Notably, the four major phyla containing HFs were consistent with the four major phyla of human microbiota, although the proportions were different (40,41). The composition of the bacterial species with S, M, and E protein mutations was then analyzed individually, and a composition pattern similar to that of the mutations in total was obtained. The correlations between the mutations and bacterial phyla with proportions greater than 1% were plotted. We found that most mutations matched multiple bacterial phyla (Fig. 2b). For example, HFs carrying I82T were found in descending order in Proteobacteria, Bacteroidetes, Firmicutes, and Actinobacteria, while HFs containing T9I were found in descending order in Bacteroidetes, Actinobacteria, and Firmicutes. In addition, some bacterial phyla carried multiple HFs, including Proteobacteria, which contributed more than 51 different types of HFs (Fig. 2b). To deepen our understanding of the composition of the contributing bacteria, core species were further categorized at the family and genus levels. The four major bacterial phyla contained 331 families, including 132 in Proteobacteria, 89 in Actinobacteria, 30 in Firmicutes, and 28 in Bacteroidetes (Fig. S3a). At the genus level, Streptomyces and genera in the Pseudomonadaceae, Enterobacteriaceae, Micromonospora, Xanthomonadaceae, Rhizobiaceae, Burkholderiaceae, Flavobacteriaceae, and Comamonadaceae families showed average HF abundances of ≥1%, and therefore, they were considered to be the top 10 HF-contributing genera (Fig. S3b).
The human body contains trillions of microorganisms that inhabit the skin, oral cavity, intestines, and even the lungs, making it the biggest host of SARS-CoV-2 (42-45).
and gut microbiota (cyan) (below right) at the phylum level. In detail, we collected information on the types and respective counts of genera within the top nine bacterial phyla from our study samples. In parallel, we obtained the types and quantity of human microbiota, gut microbiota, and lung microbiota from the database. The collected data were processed to calculate the proportions of different bacterial phyla and genera. Through Venn comparison, the common bacteria were identified. Then, we calculated the proportion of these common bacteria in each phylum.
The four main phyla between the human microbiota and the bacteria were identical, enabling deep correlation analysis. As shown in the radar chart, almost all bacteria belonged to the human microbiota at the phylum level. In COVID-19 patients, the lung is a major organ affected by viral infection or viral-bacterial coinfection. Approximately 56% of the bacteria we obtained are found in the lungs of healthy individuals; these include Streptomycetaceae and Staphylococcaceae (Fig. 2c) (42,46). Although bacteria in the lungs are abundant, microbial cell populations reach their highest density in the intestines, where they form the gut microbiota and are believed to be important to human life (43,44). Interestingly, 83% of the obtained bacterial species were also found in the human gut microbiota database (Fig. 2c), implying the importance of the gut microbiota.
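The overlap percentages reported above are set intersections expressed as a fraction of the HF-carrying taxa. The following Python sketch shows that calculation under stated assumptions: the function name `shared_fraction` and the placeholder taxon names are hypothetical and do not reproduce the study's data or its exact procedure.

```python
def shared_fraction(hf_taxa, reference_taxa) -> float:
    """Fraction of HF-carrying taxa that also occur in a reference microbiota list."""
    hf, reference = set(hf_taxa), set(reference_taxa)
    if not hf:
        raise ValueError("hf_taxa must not be empty")
    return len(hf & reference) / len(hf)


# Placeholder taxon names (hypothetical), standing in for genus or family names.
hf_carrying = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
gut_reference = ["taxon_a", "taxon_b", "taxon_c", "taxon_x"]
lung_reference = ["taxon_a", "taxon_d", "taxon_y"]

print(shared_fraction(hf_carrying, gut_reference))   # 0.75
print(shared_fraction(hf_carrying, lung_reference))  # 0.5
```

Repeating the calculation within each phylum gives per-phylum values of the kind that a radar chart can display.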

The species composition of the homologous fragments is similar to that of the human gut microbiota in COVID-19 patients
Furthermore, 10% of COVID-19 patients also had diarrhea and other gastrointestinal problems (47,48). It has been reported that in approximately 50% of COVID-19 patients, the virus was found in feces, leading to the hypothesis that the virus not only replicates, and is therefore active, in the intestine but also resides there for prolonged periods (49-51). Increasing evidence suggests that gut bacteria are altered in COVID-19 patients, and these alterations are associated with infection severity, treatment effectiveness, and prognosis (52,53). The gut microbial ecological network is significantly weakened and becomes diffuse in patients with COVID-19; at the same time, the number of beneficial bacteria decreases, and harmful bacteria multiply (52,54-56). Two gut microbe databases of COVID-19 patients (gutMEGA) were analyzed. Compared with the composition of non-COVID-19 gut microbes, two abnormal changes were detected in the COVID-19 entries. The proportion of beneficial bacteria, particularly Firmicutes, significantly declined to 47% from 72%, while that of harmful bacteria, such as Proteobacteria, increased to 9% from 1%. Notably, a similar pattern was observed for the obtained bacteria carrying HFs, with harmful Proteobacteria at 41% and beneficial Firmicutes at 8% (Fig. 3a) (52,53,56). The influences of the gut bacteria changes identified in the two databases were perhaps best exemplified by an intersection analysis with the bacteria containing S protein mutations (Fig. 3b). In the two COVID-19 patient databases, at the genus level, 17 and 11 shared bacteria were identified; at the species level, three shared bacteria were identified. These correlations at the genus and species levels were further analyzed (Fig. 3c). At the genus level, we found that among the 55 S mutations, 15 were detected in 19 differential bacterial genera in COVID-19 patients. At the species level, three types of bacteria in COVID-19 patients were found to carry mutational HFs (Fig. 3c). Notably, one genus of bacteria, Escherichia, carried three different mutations (S371P, S373P, and S477N). These strong associations supported the hypothesis that SARS-CoV-2 might obtain HFs from the human microbiota, especially in light of the small size of the present gut bacteria database of COVID-19 patient data.
After the emergence of BA.5, various new Omicron subvariants appeared, including BA.5.2 and BF.7, two major pandemic variants, and BQ.1.1 and XBB, the most resistant SARS-CoV-2 variants discovered to date. These new recombinant subvariants carried five new beneficial mutations, as determined on the basis of the original Omicron, with mutation frequencies higher than 90%, including R346T, K444T, N460K, and F486S in the S protein and T11A in the E protein (Fig. S4a). We applied our VMF-based BLAST model to search for potential HFs carrying these five mutations. As expected, bacteria contributed more than 80% of the HFs with these five mutations, while Animalia, Fungi, Plantae, and Protista accounted for 4%, 4%, 5%, and 1%, respectively (Fig. S4b; Table S5). Moreover, the identified bacterial species were also classified into four major bacterial phyla, namely, Firmicutes, 49%; Proteobacteria, 26%; Actinobacteria, 11%; and Bacteroidetes, 4% (Fig. S4c), which were the same as those carrying mutations previously identified.

SARS-CoV-2 RdRp replicates bacterial mRNAs to form chimeric viral-bacterial RNAs harboring mutations
In contrast to SARS-CoV-2, bacterial HFs are encoded by mRNAs. Nucleotide sequences of bacterial mRNAs encoding the same amino acid sequences as viral RNAs were aligned, revealing a percent nucleotide identity (PNI) from 63% to 90% in the seven examined fragments (Fig. S5). A PNI less than 100% is possibly due to the degeneracy of the genetic code and particularly the presence of synonymous mutations in SARS-CoV-2 (GenBank accession: NC_045512.2; Wuhan-Hu-1) (57,58). SARS-CoV-2 replication is mediated by a multisubunit replication-and-transcription complex of viral nonstructural proteins (nsp), of which nsp12 is the core component of the RNA-dependent RNA polymerase (RdRp) (3). If these bacterial mRNAs are to introduce mutations into a viral genome, they must serve as templates for the viral RdRp complex, a requirement that was examined experimentally (Fig. 4a). The bacterial mRNAs (up to 27 nt long) carrying mutations such as N501Y, E484K, and L452R, which were fused with a previously validated viral duplex RNA, were successfully replicated by an RdRp complex containing nsp12, nsp7, and nsp8 (Fig. 4b; Table S6). This experiment demonstrated that exogenous bacterial mRNAs are effective templates for SARS-CoV-2 RdRp.
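The point that two RNAs can encode identical amino acids while sharing less than 100% of their nucleotides can be shown with a small Python sketch. The sequences are toy examples, not fragments from this study; the function names are assumptions, and the codon table is deliberately limited to the standard-code codons used in the example.

```python
# Partial standard genetic code: only the codons needed for this toy example.
CODON_TABLE = {
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAU": "N", "AAC": "N",                          # asparagine
    "UAU": "Y", "UAC": "Y",                          # tyrosine
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon using the partial table above."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


toy_viral = "GGUAAUUAU"      # encodes G-N-Y
toy_bacterial = "GGCAACUAC"  # also encodes G-N-Y, with a different third base per codon

print(translate(toy_viral), translate(toy_bacterial))       # GNY GNY
print(round(percent_identity(toy_viral, toy_bacterial), 1))  # 66.7
```

In this toy case every codon differs only at its third position, so the peptides are identical while nucleotide identity falls to two-thirds.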
During SARS-CoV-2 replication, base pairing over 6-12 consecutive nucleotides can serve as a "junction" site for template switching, i.e., RdRp switches from copying one genome to copying another, resulting in a subgenome (59-62). Notably, under certain circumstances, such as when lesions are included in a template, replication-impeding secondary structures are formed, and the tightly bound nucleotide pool is imbalanced, template switching may also be evident among RNAs of different species (61,63,64). Whether the complementary base pairs between bacterial mRNA and viral RNA can act as "junction" sites was therefore examined. As shown in Fig. 4c, the nascent primer strand from SARS-CoV-2 contained 11, 12, or 17 nucleotides; three bacterial template mRNAs harboring the N501Y mutation carried 11 nucleotides complementary to the viral primer but zero, one, or six mismatches at the 5′ end. Among the three groups, only the last group failed to extend. Then, two more types of bacterial template mRNAs harboring G496S (12 complementary nt) or T9I (12 complementary nt) were individually examined under identical conditions. Extension was observed for all G496S mRNAs regardless of the number of mismatches at the 5′ end, but extension failed for all the T9I mRNAs (Fig. 4b through d; Table S6). The difference in replication efficiency was perhaps due to the insufficient efficacy of RdRp in the in vitro system. Although these results demonstrated that nucleotide mutations in bacterial mRNAs can be introduced into viral RNAs during replication via the RdRp, clear evidence showing that mutations can be integrated into the SARS-CoV-2 genome was not found. Because one cannot rule out the possibility that artificial coinfection of live viruses and bacteria in laboratories may promote the emergence of new mutations with unpredictable capabilities, experiments with viral RNA and live bacteria are currently prohibited at our institution.
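As a rough computational counterpart to the "junction" idea, the Python sketch below measures the longest stretch of consecutive Watson-Crick complementarity between a primer and a template and compares it with the lower end of the 6-12 nt range quoted above. This is an illustration under stated assumptions, not the assay used here: the function names and toy sequences are hypothetical, both strands are written 5′ to 3′, and G-U wobble pairs are ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest substring common to both strings (dynamic programming)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def longest_complementary_run(primer: str, template: str) -> int:
    """Longest run of consecutive base pairs the primer could form with the template."""
    return longest_shared_run(reverse_complement(primer), template)


def is_junction_candidate(primer: str, template: str, min_run: int = 6) -> bool:
    """True if the complementary run reaches `min_run` (6 nt, the lower bound quoted)."""
    return longest_complementary_run(primer, template) >= min_run


# Toy sequences (hypothetical).
print(longest_complementary_run("AUGGC", "UUGCCAUAA"))     # 5
print(is_junction_candidate("AUGGCAU", "UUAUGCCAUAA"))     # True
```

A sequence-only check of this kind says nothing about whether RdRp will actually extend a given primer, which is what the in vitro assay addresses.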
Globally, there were 700 million infections as of December 2022. The ultrahigh number of infections and the wide geographic spread of infection may have provided a wealth of bacterial carriers for the Omicron variant that was not available to other variants. Although data errors cannot be ruled out owing to differences in infection density data and in the persistence of gene surveillance across variants, the evidence provided above supports the idea that SARS-CoV-2 may acquire HFs from the human microbiota, as summarized in a schematic graph (Fig. 5). SARS-CoV-2 VOCs may thus have changed from the original strain to chimeras that incorporate elements of the human microbiota.

DISCUSSION
There is evidence to support the theory that SARS-CoV-2 has a high potential for acquiring advantageous mutations (6,18). Here, a VMF-based BLAST analysis was performed to explore sources of high-frequency beneficial mutations identified in VOCs. The human microbiome accounted for about 70% of the species with HFs that were detected. In particular, the changes in the pattern of gut bacteria in COVID-19 patients are quite similar to the pattern of the bacteria we identified harboring HFs. In addition, nucleotide mutations in bacterial mRNAs can be introduced into viral RNAs during replication via RdRp, suggesting that the human microbiome may facilitate SARS-CoV-2 mutation accumulation as a possible template reservoir.
There are at least three possibilities that could account for viral access to bacterial mRNA if SARS-CoV-2 acquires mutations from human bacteria. In Scenario 1, bacteria and viruses are both exposed to phagocytes, including macrophages and neutrophils. Bacterial coinfection is a common complication of SARS-CoV-2 infections. In particular, severely affected patients suffer a significantly higher rate of coinfection with bacteria (26%) (65). The coinfecting bacteria include mainly Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa, which were also identified in our study. In light of this, during coinfection, macrophages are likely to serve as hosts of both bacterial and viral genomes (66-68). In Scenario 2, bacterial mRNA is exposed to nonphagocytic cells, such as infected cells or impaired cells. During SARS-CoV-2 infection, the immune barrier is impaired, which may allow invasive pathogens such as Streptococcus to break through the immune barrier and invade nonphagocytic cells (69,70). The invaded host cells can target intracellular bacteria through autophagic machinery to lyse the bacteria and block bacterial proliferation. Nevertheless, when autophagy fails to stop the growth and multiplication of bacteria, phagocytosis by phagocytes may follow. At this point, the bacterial genome and the viral genome may coexist in host cells or phagocytes. In addition, the genetic information from the lysed bacteria may be engulfed by infected cells in some cases (71-73). In Scenario 3, some bacteria might be potential hosts for the virus. The mechanism underlying the invasion of bacteria by phages (a group of viruses that infect bacteria) has been elucidated and has become the basis of a reliable gene-editing method (74). Furthermore, some bacterial strains can raise the frequency of viral coinfection in mammalian cells, allowing for genetic recombination between two distinct viruses, eliminating harmful mutations, and regaining viral fitness (75). Recently, studies have shown that some ciliates, a type of single-cell organism, consumed virus particles, which fostered their population growth, revealing an unexpected role for viruses in ecosystems (76). Although bacteria and viruses are the two most abundant microbial entities, their coexistence remains puzzling. These settings offer countless opportunities for interactions between the virus and bacteria throughout the mutational evolution of SARS-CoV-2.
Numerous abrupt changes in the viral sequence during evolution have been found in SARS-CoV-2 reference genome analysis, indicating possible recombination events (6,18,21). The present study supports the idea that the complementary base pairs between bacterial and viral RNAs promote the recombination and production of chimeric RNAs harboring mutations. For viruses, homologous recombination is the most common type of recombination; it includes recombination among viruses of the same genus and cross-species recombination between viruses of different genera with high relatedness (77-82). RNA-RNA interactions are the determinants of template switching efficacy among SARS-CoV-2 RNAs, with 6-12 consecutive complementary nucleotides serving as "junction" sites. The longest bacterial HF that we found was nine aa, which is noteworthy since it shows that bacterial and viral RNAs can exchange up to 27 consecutive complementary nucleotides. In the less-than-ideal in vitro experimental system used in the present study, we showed that 11 consecutive complementary nucleotides were sufficient to generate chimeric RNAs resulting from template switching. During homologous recombination, nucleotide mutations were simultaneously introduced into the viral chimeric RNA. Notably, not all mutations were matched to an HF (Fig. 1f). Consequently, the findings are consistent with the replication-associated mutation and evolutionary pressure selection theories, even though our investigation highlighted the significance of RdRp-mediated homologous recombination between bacterial and viral RNAs (57,83-85). Our study speculates that bacteria may serve as potential hosts, possessing advantages such as richness, diversity, evolutionary adaptability, and immunological tolerance, which facilitate the rapid acquisition of mutational genetic material by viruses. The relationship between our study and other mutation theories requires further investigation, and we do not exclude the possibility of coexistence under different conditions.
Our work highlights the pivotal role of bacteria in facilitating viruses to acquire advantageous mutations. Bacteria offer several benefits, including their abundance, diversity, evolutionary adaptability, and tolerance to host immunity. The interaction between viral infections, especially SARS-CoV-2, and the human microbiome has garnered significant attention in recent scientific research, revealing intricate dynamics in host-pathogen interactions. Viral infection disrupts the human microbiome equilibrium in the gut, lungs, and oral cavity, influencing disease progression and viral evolution (86,87). For instance, a shift toward microbiota with reduced short-chain fatty acid production compromises gut integrity and immune regulation (86,88). Moreover, virus-induced changes in microbiota composition can modulate host immune responses, potentially exacerbating symptoms and complications by weakening immune defenses or creating favorable environments for viral replication (86,88,89). Additionally, microbiota-derived natural products exert environmental selection pressures, favoring specific viral mutations and influencing viral adaptability, such as bacteriocins affecting bacterial and viral community dynamics (90). Understanding this complex relationship not only advances infectious disease research but also underscores the importance of leveraging these interactions for innovative disease prevention and management strategies.
COVID-19 is rapidly spreading worldwide. In addition to tracking the evolution of SARS-CoV-2 from its first introduction into human populations to the present, scientists are also investigating the mechanisms underlying the acquisition of SARS-CoV-2 variations. They are tracking recombination and transmission events in the SARS-CoV-2 population in real time. Our research broadens the perspective to include virus and microbiota studies but leaves many questions unanswered, such as how and where SARS-CoV-2 accesses bacteria in vivo and how chimeric RNAs harboring mutations are integrated into the SARS-CoV-2 genome. Whether bacteria can be potential hosts in the process of virus transmission and how viruses, bacteria, and hosts relate to one another deserve further exploration. Our conclusions rest on the examination of mutations acquired by the virus, which is not enough to account for the bidirectional causative relationship between viruses and bacteria. In addition, due to multiple factors and experimental restrictions, we were unable to provide more experimental evidence, but we hope that more researchers in this field can perform in-depth investigations. For example, bacterial-viral coinfection models could help elucidate genetic recombination mechanisms between viruses and the microbiome. Assessing long-term viral effects on the host microbiome may link these impacts to disease development and treatment response. Our research has significant ramifications for understanding the COVID-19 pandemic's future course and creating preventive and therapeutic measures.

Identification of HFs
Identification of homologous fragments was carried out in the NCBI Protein database. VMFs (seven aa) were submitted to the NCBI protein BLAST search website. The alignment parameters were set as Database Non-redundant protein sequences (nr), Max target sequences 5,000, Expect threshold 0.05, and Word size 7. Among the obtained subjects, only those that fulfilled all the following criteria were collected: (i) 100% VMF identity, that is, the query seven aa could be fully recognized by the database; (ii) 100% homologous sequences; and (iii) no sequence origin from SARS proteins. In order to avoid duplicate statistics and overcome the limitation of the lower error, we further defined two filter conditions: (i) for multiple accessions for one protein, only one accession was retained; (ii) for multiple isoforms of the same protein, only one isoform was retained. After filtering, the name, ID number, species, and submission number were collected.
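The collection criteria and the two filter conditions above can be expressed as a short Python sketch. It is an illustration only, not the procedure used in this study: the record fields (`accession`, `organism`, `protein`, `matched`), the isoform-suffix pattern, the organism-name test for SARS-derived sequences, and the toy hits are all assumptions.

```python
import re


def filter_hits(hits, vmf):
    """Keep full-identity, non-SARS hits, one record per organism and protein."""
    kept = {}
    for hit in hits:
        # Criteria (i) and (ii): the matched segment must equal the whole VMF.
        if hit["matched"] != vmf:
            continue
        # Criterion (iii): ignore sequences that originate from SARS viruses.
        organism = hit["organism"].lower()
        if "sars" in organism or "severe acute respiratory syndrome" in organism:
            continue
        # Filters: collapse isoforms and repeated accessions of the same protein.
        protein = re.sub(r"\s+isoform\s+\S+$", "", hit["protein"], flags=re.IGNORECASE)
        kept.setdefault((hit["organism"], protein), hit)  # the first record is retained
    return list(kept.values())


# Toy hits (hypothetical accessions and organisms).
toy_hits = [
    {"accession": "ACC1", "organism": "Toy bacterium", "protein": "kinase", "matched": "DEFAHIK"},
    {"accession": "ACC2", "organism": "Toy bacterium", "protein": "kinase", "matched": "DEFAHIK"},
    {"accession": "ACC3", "organism": "Toy animal", "protein": "receptor isoform X1", "matched": "DEFAHIK"},
    {"accession": "ACC4", "organism": "Toy animal", "protein": "receptor isoform X2", "matched": "DEFAHIK"},
    {"accession": "ACC5", "organism": "Severe acute respiratory syndrome coronavirus 2",
     "protein": "surface glycoprotein", "matched": "DEFAHIK"},
    {"accession": "ACC6", "organism": "Toy fungus", "protein": "transporter", "matched": "DEFAHIR"},
]

print([hit["accession"] for hit in filter_hits(toy_hits, "DEFAHIK")])  # ['ACC1', 'ACC3']
```

Which accession or isoform is retained is arbitrary in this sketch (the first one encountered); the text does not specify a selection rule.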

Sequence data processing, inferring gut microbiota composition, and statistical analysis
The composition of the bacteria carrying the obtained HFs was determined through categorization and quantitative analysis using the NCBI database and the UniProt database. Following this, microbiota composition profiles were inferred from quality-filtered forward reads using MetaPhlAn2 V.2.7.7 with the V.20 database. Associations of the HFs' microbial species with human microbiota parameters were identified using gutMEGA, GMrepo, China National GeneBank Microbiome, Human Oral Microbiome Database V3, and NIH Human Microbiome Project.

RNA extension assays
All RNA oligonucleotides (templates and primer) were chemically synthesized by Genscript; the primer carried a FAM (5(6)-carboxyfluorescein) label at the 5′ end. Template and primer oligonucleotides were annealed at a 1:1 molar ratio by heating at 85°C for 5 min followed by gradual cooling to room temperature in the annealing buffer (10 mM Tris-HCl, pH 8.0, 2.5 mM EDTA, and 25 mM NaCl). The primer sequences are listed in Table S6.

FIG 2 Homologous fragment composition analysis of the obtained bacteria. (a) Pie chart representations of the obtained bacteria, all items (up) and items of spike, envelope, and membrane proteins (bottom) at the phylum level. (b) Chord diagram displaying the network of mutated HFs and their bacterial phyla. (c) Radar maps showing the correlation of the identified bacteria sources of HFs (red), human microbiota (purple) (top), lung microbiota (green) (below left),

FIG 3 Correlation of the identified bacteria carrying homologous fragments and those in the human gut microbiota in COVID-19 patients. (a) The abundance of microbial phyla detected in HFs and stools from in-hospital patients with COVID-19 and non-COVID-19 individuals. The average relative abundances of COVID-19 and non-COVID-19 individuals were obtained from two databases (gutMEGA). The baseline proportions of the healthy human microbiome composition are approximately 72% Firmicutes, 24% Bacteroidetes, 2.5% Actinobacteria, and 0.5% Proteobacteria of the total bacterial population. (b) Intersection analysis showing the obtained bacteria containing HFs with the gut microbes of COVID-19 patient databases at the genus (left) and species (right) levels. Two databases were used (gutMEGA). (c) A Sankey diagram showing the relationship of the mutations in the spike protein with changes in bacteria containing HFs in COVID-19 patients at the genus (left side) and species (right side) levels.

FIG 4 SARS-CoV-2 RdRp introduces bacterial mRNA nucleotide mutations into viral RNAs through homologous recombination. (a) Schematic diagram showing the RdRp extension experiment. (b) RNA duplex with a 21- or 27-nt 5′ overhang from bacterial mRNA as the template for primer extension and RdRp-RNA complex assembly (exemplified by N501Y). The primer strand is labeled with a fluorescent molecule at the 5′ end. The gels are spliced for space conservation and to enhance the layout of the figure. The original data are shown in Fig. S6. (c) The primer strand from SARS-CoV-2 contains 11 or 12 nucleotides complementary to the bacterial template mRNAs, with zero, one, or six mismatches at the 5′ end, and forms a duplex with the bacterial template mRNAs harboring the N501Y, G496S, or T9I mutation for RdRp extension (exemplified by N501Y). (d) Schematic diagram showing RdRp introducing a nucleotide mutation from a bacterial mRNA to produce a bacterial-viral chimeric RNA via homologous recombination.

FIG 5 SARS-CoV-2 VOCs are chimeras containing homologous fragments from the human microbiota. Schematic graph showing SARS-CoV-2 acquiring HFs from human microbiota as exemplified by the S protein. The diagrams were created with BioRender.com.