Increased Abundance of Achromobacter xylosoxidans and Bacillus cereus in Upper Airway Transcriptionally Active Microbiome of COVID-19 Mortality Patients Indicates Role of Co-Infections in Disease Severity and Outcome

ABSTRACT The modulators of severe COVID-19 have emerged as the most intriguing features of SARS-CoV-2 pathogenesis. This is especially true as we are encountering variants of concern (VOC) with increased transmissibility and vaccination breakthroughs. Microbial co-infections are being investigated as one of the crucial factors for exacerbation of disease severity and complications of COVID-19. A key question remains whether early transcriptionally active microbial signature/s in COVID-19 patients can provide a window for future disease severity susceptibility and outcome? Using complementary metagenomics sequencing approaches, respiratory virus oligo panel (RVOP) and Holo-seq, our study highlights the possible functional role of nasopharyngeal early resident transcriptionally active microbes in modulating disease severity, within recovered patients with sub-phenotypes (mild, moderate, severe) and mortality. The integrative analysis combines patients’ clinical parameters, SARS-CoV-2 phylogenetic analysis, microbial differential composition, and their functional role. The clinical sub-phenotypes analysis led to the identification of transcriptionally active bacterial species associated with disease severity. We found significant transcript abundance of Achromobacter xylosoxidans and Bacillus cereus in the mortality, Leptotrichia buccalis in the severe, Veillonella parvula in the moderate, and Actinomyces meyeri and Halomonas sp. in the mild COVID-19 patients. Additionally, the metabolic pathways, distinguishing the microbial functional signatures between the clinical sub-phenotypes, were also identified. We report a plausible mechanism wherein the increased transcriptionally active bacterial isolates might contribute to enhanced inflammatory response and co-infections that could modulate the disease severity in these groups. Current study provides an opportunity for potentially using these bacterial species for screening and identifying COVID-19 patient sub-groups with severe disease outcome and priority medical care. IMPORTANCE COVID-19 is invariably a disease of diverse clinical manifestation, with multiple facets involved in modulating the progression and outcome. In this regard, we investigated the role of transcriptionally active microbial co-infections as possible modulators of disease pathology in hospital admitted SARS-CoV-2 infected patients. Specifically, can there be early nasopharyngeal microbial signatures indicative of prospective disease severity? Based on disease severity symptoms, the patients were segregated into clinical sub-phenotypes: mild, moderate, severe (recovered), and mortality. We identified significant presence of transcriptionally active isolates, Achromobacter xylosoxidans and Bacillus cereus in the mortality patients. Importantly, the bacterial species might contribute toward enhancing the inflammatory responses as well as reported to be resistant to common antibiotic therapy, which together hold potential to alter the disease severity and outcome.

microbiome, especially transcriptionally active isolates, in the patient cohort from India which could drive the COVID-19 disease severity sub-phenotypes. Through an integrative genomics approach, combining clinical data, SARS-CoV-2 genome information, and resident microbes, we have highlighted the significance of clinical parameters, SARS-CoV-2 plus respiratory viruses, transcriptionally active microbial diversity, their relative abundance, functional inferences, and enrichment of metabolic pathways of the nasopharyngeal microbiome in COVID-19 patients with different clinical sub-phenotypes and outcomes. Specifically, the identification of significant differential presence of transcriptionally active bacterial species in severe and mortality patients, with the metabolic pathway analysis, allowed us to propose a plausible mechanism that might help to understand additional aspects leading to COVID-19 severity.

RESULTS
Patient clinical characteristics and disease severity in COVID-19 subgroups. A total of 198 COVID-19 patients included in the study were initially segregated into two broad categories based on clinical outcome: recovered (n = 177) and mortality (n = 21). Recovered patients were further divided into three sub-phenotypes: mild (n = 85; 42.9%), moderate (n = 73; 36.8%), and severe (n = 19; 9.6%). The clinical parameters were thoroughly investigated for plausible significant factors that could account for the difference in the disease severity and outcome among the patients Table 1. The statistical correlations and numerical patient distribution of continuous and categorical variables, respectively, across all patient categories has been illustrated in Fig. 1.
Based on the clinical data, a preponderance of male patients (72.72%) was observed in the cohort as a whole, which was true for the mild, moderate, and severe patients of the recovered group (P-value = 0.006) as well as mortality patients, as observed globally (26). The median age of patients within the recovered group (52 years) was significantly different from that in the mortality group (64.5 years) (P-value = 0.021). Classification of recovered patients also shows age as a significant modulator of severity wherein we observed age to be increasing with disease severity (P-value = 0.001). Importantly, the median age of severe patients is comparable to deceased patients (64.5, 63 years) indicating the role of other factors in disease outcome (Fig. 1a). Possibly not unexpected, low peripheral oxygen saturation (SpO 2 ) levels and breathlessness, were significant features of the mortality group compared to the recovered (Fig. 1b). Above factors reflect the association of these parameters with disease severity, and importance of these parameters for severity classification.
The presence of comorbid conditions predisposes patients to an unfavorable clinical course. The comorbid conditions of hypertension (52.38%), diabetes (38.09%), and kidney disorder (23.8%) prevailed more in the mortality group than the recovered in our study, yet the association was statistically significant only for kidney disorder (P-value = 0.011). Stratification within the recovered group revealed significant associations with comorbid conditions of diabetes (P-value , 0.001), hypertension (P-value , 0.001), and heart disease (P-value = 0.007). Overall, only 19.04% of patients from the mortality group did not carry any comorbid condition while 41.24% patients in the recovered group had no comorbidities (P-value = 0.01). Apart from significantly lower hospital stays in the mild group (average 10 days) when compared with moderate group (average 14 days) (P-value ,0.001), both hospital stay and the SARS-CoV-2 RT-PCR based cycle threshold (Ct) values for RdRp and E gene were found to be similar across patient categories (Fig. 1c, d and e).
It was important to ascertain whether clinical factors above had plausible effect on the abundance of transcriptionally active microbial isolates between the disease severity subphenotypes. Thus, we checked for the confounding effects of statistically significant clinical parameters in our study cohort. We performed a correlation analysis (using Pearson's correlation and point-biserial correlation) between bacterial transcript read counts and the statistically significant parameters of age and comorbidities, respectively. We find that the RT-PCR Ct value for RdRp gene in four disease severity sub-phenotypes: mild (green), moderate (yellow), severe (blue), and mortality (red), with statistical significance measured using Mann-Whitney U test. * represents P-value ,0.05, ** represents P-value ,0.001. (e) Categorical clinical features like symptoms and comorbidity information for the patients in the four severity sub-phenotypes: mild (green), moderate (yellow), severe (blue), and mortality (red). r-score between age and cumulative bacterial transcript read count was 20.0077 with P-value 0.94 (using Pearson's correlation), which is not statistically significant. A point-biserial correlation analysis between bacterial transcript read counts and presence and absence of comorbidities, gave r-score of 0.104 and P-value of 0.33 which again is statistically non-significant. Taking together the above observations, it reiterates that in the present study cohort, age and comorbidities are not significant confounders.
Genomic characteristics of SARS-CoV-2 across the clinical sub-phenotypes. To observe the genomic characteristics of SARS-CoV-2, phylogenetic distribution and clades, as well as the mutational spectrum, were analyzed. Phylogenetic and clade distribution revealed that the majority of the samples belonged to the clades 19A (50.62%), and 20A (43.75%) with a relatively minor presence of clade 20B (5.62%) (Fig. 2a). Upon observing the presence of patient severity classes across SARS-CoV-2 clades, we note that 70% of the mortality cases were distributed in clade 20A but, every clade showed a substantial presence of other severity classes (Fig. 2b), thus, diluting the effect of clade specificity for any particular clinical sub-phenotype.
An in-depth analysis of the SARS-CoV-2 genomes at the mutation level allowed us to capture a total of 3,614 mutations across our sample cohort. Following the Fisher exact test, six mutations were found to be significantly linked with the clinical outcomes (P-value ,0.05). A correlation analysis of these six mutations revealed association of two mutations, N:P13L (non-synonymous) and S:789Y (synonymous), with the recovered patients and four mutations, S:Q677H (non-synonymous) and ORF1b:1804L, 5'UTR:C241T, and M:71Y (synonymous), mutations with mortality patients (Fig. 2c; Table S1).
With few mutations identified to be significantly associated with the recovered and the mortality patients, further study is required to understand its functional importance in delineating disease severity and outcome, which merits an independent study. Thus, we hypothesized whether the nasopharyngeal transcriptionally active microbial isolates can provide predictive modulators of disease severity in the background of primary SARS-CoV-2 infection. Nasopharyngeal bacterial profiling and characterization across patient subphenotypes. We used the holo-transcriptome analysis to understand the transcriptionally active microbial populations in a subset of patients with mild (n = 24), moderate (n = 36), severe (n = 14), and mortality (n = 12). Alpha diversity (Shannon index) values across all four sub-phenotypes was observed to be low with no significant difference between the effective number of species (ENS) across sub-phenotypes ( Fig. 3a; Table S2), whereas Beta diversity (Bray-Curtis distance matrix) analysis (Fig. 3b) showed non-distinct clustering by PERMANOVA test. Interestingly, microbial abundance analysis via examining the transcriptionally active isolates revealed significant differences in the transcriptional intensity of bacterial species between the four sub-phenotypes wherein we specifically observed an enhanced presence of bacterial transcripts in severe and mortality patient's nasopharynx compared with mild/moderate groups ( Fig. 3c; Table S3). This finding led us to elucidate further as to which species were causing this differential abundance. Towards this, top 30 bacterial species were identified by mapping the relative bacterial transcriptional active isolates across all the 86 patients (Fig. 3d). Veillonella Parvula, Ralstonia solanacearum, Staphylococcus haemolyticus, and Prevatolla Jejuni were found to be highly abundant across all the four sub-phenotypes. An extensive literature review of top 30 bacterial species along with its reported association with SARS-CoV-2 has been provided as Table S4.
Each of the top 30 bacterial species were analyzed for its significant presence in any of the four clinical sub-phenotypes. Importantly, we identified six bacterial species, Achromobacter xylosoxidans, Bacillus cereus, Leptotrichia buccalis, Veillonella parvula, Actinomyces meyeri, and Halomonas sp. showing significant differential transcriptional intensity across sub-phenotypes ( Fig. 3e to j). Among these species, Achromobacter xylosoxidans and Bacillus cereus were significantly associated with mortality ( Fig. 3e and f) whereas Leptotrichia buccalis with severe group (Fig. 3g). Although Veillonella parvula was found to be highly abundant across the entire cohort, a statistically significant differential enrichment of this species was observed in the moderate patients ( Fig. 3h). Actinomyces meyeri and Halomonas sp. showed association with the mild compared to moderate ( Fig. 3i and j). Identification of severity associated transcriptionally active microbes in COVID-19 have multiple potential applications, with future studies potentially unravelling the mechanistic association of these microbes with COVID-19 severity. To seed these future investigations, we categorized the bacterial metabolic pathways across COVID-19 sub-phenotypes.
Metabolic pathways as a function of microbial diversity between clinical subphenotypes. The extensive non-redundant catalogue of microbial genes (KEGG pathway analysis) identified differentially enriched pathways involved in bacterial functions (Table S5). The top selected pathways showing differential enrichment across the clinical sub-groups is depicted in Fig. 4a. The metabolic potential as revealed by carbohydrate and amino acid metabolism pathways showed depletion in severe and mortality patients as compared with mild and moderate. The metabolism and biosynthesis pathways of nearly all amino acids (particularly phenylalanine, tryptophan, tyrosine, and lysine) were decreased in abundance. Other carbohydrate metabolic pathways (galactose, glyoxylate and decarboxylate, and sucrose metabolic pathways) were consistently under-expressed in both severe and mortality. Thus, the bacterial competence to produce and metabolize nutrients seems to diminish with increase in clinical severity index in COVID-19 cases. Alternatively, glycolysis/gluconeogenesis and oxidative phosphorylation pathways in the mortality group showed expression similar to the mild group. Genetic information processing pathways: replication and repair (homologous recombination and mismatch repair) and folding, sorting and degradation (sulfur relay system) along with ABC transporters (membrane transport for sugars, metals, peptides, amino acids, and other metabolites) were relatively depleted in severe and mortality groups, suggesting altered adaptation to adverse microenvironment. On the other hand, the glycerophospholipid metabolism pathway was observed to be enriched in mild and moderate cases, possibly indicative of concerted efforts for bacterial adaptation to the microenvironment in these groups (27). Of note, the mortality and severe patients showed significant overexpression of ribosomal proteins (Fig. 4b) whereas, bacterial motility proteins and chemotaxis were significantly diminished (Fig. 4c, d).
Ribosomal proteins might be an indicator of changes in growth rate as evident by more transcriptional intensity of rRNA in the microbiome of severe and mortality patients. Several studies highlight the correlation between rRNA abundance with the active proliferation of microbes inclusive of use of cellular rRNA as an indicator of in situ growth rate in various naturally occurring bacterial populations. Studies in Escherichia coli and Salmonella typhimurium have revealed that cellular RNA concentration is closely linked with growth rate (28)(29)(30). However, such correlation is not linear and there are evidences toward the inverse relationship between rRNA and active growth of microbes limiting the association between rRNA and growth rates (31)(32)(33)(34)(35)(36).
The comparative analysis of enriched KEGG orthology identifiers (KO) terms against their corresponding metabolic pathways (Fig. 4a) revealed differential abundance across the sub-clinical phenotypes as shown in Table S6.
Co-presence of respiratory viruses across COVID-19 patients. Using RVOP and holo-transcriptome analysis, both methods identified significant differences in the abundance of viruses in the four sub-phenotypes with relatively higher viral abundance observed in mild and moderate patients (Fig. 5a, b; Table S4). A total of 12 different viruses were captured as illustrated in Fig. 5c. Significant differences in viral abundance was observed only for Choristoneura occidentalis granulovirus and Tobacco mosaic virus, however, we were unable to find association with the disease sub-phenotypes. We also note diminution of Streptococcus Phages and complete absence of Simbu orthobunyavirus from the mortality group (Fig. 5d). This suggests the depletion of viral diversity in the mortality group when considering the entire population of viral species.

DISCUSSION
Disease severity in COVID-19 is orchestrated through a series of variables: viral, host, and resident microbes, leading to diversity of symptoms and an increase in the odds of mortality. Stratification of patients in different classes of severity can be a definitive approach to detangle the concomitant risk factors. This study utilizes an integrative approach involving different perspectives of host clinical characteristics, viral genome variations, and a comprehensive exploration of the nasopharyngeal microbiome threaded together, to elucidate the disease severity observed in the COVID-19 patient sub-phenotypes.
The clinical characteristics of the COVID-19 patients could only partly account for the severe and mortality group cases. The non-modifiable risk factors like age, gender, and comorbidity were in sync with previous studies which have shown detrimental effect during COVID-19 (37), yet, a complete association is difficult to comprehend. It is estimated that roughly 50% of the hospitalized patients had no reported comorbidity (38). Similarly, the clade diversity of the viral isolates revealed a proportional presence of mild, moderate, and severe patients in all the three clades-19A, 20A, and 20B. The dominant presence of mortality cases in clade 20A could be attributable to the fact that clade 20A became the source clade for VOCs (variants of concern) like Delta, Kappa, and Beta strains of SARS-CoV-2 (39). However, presence of mild and moderate patients in clade 20A points toward other plausi- ble factors modulating the clinical outcome. Moreover, N:P13L mutation associated with the recovered group was identified in VUI-NP13L (variant of interest) in Southeast Brazil around June 2020 and was reported with low mortality rates and cases per million (40).
Q677H mutation associated with the mortality group emerged independently across six lineages in the United States and showed evidence of adaptation, due to its effect on the

Microbial Signatures for COVID-19 Sub Phenotypes
Microbiology Spectrum proximal polybasic furin cleavage site in the Spike protein (41). Importantly, the clinical impact of this mutation needs to be fully determined (42). Recently, it was reported to help increase the infectivity and confer neutralizing antibody resistance, particularly in background of other VoCs (43). The presence of these mutations in our cohort highlights the importance of genomics surveillance of the viral population but at the same time in vitro studies toward elucidating the mechanism/s is awaited. The variable host clinical characteristics and the limited viral genetic diversity led us to explore whether alterations in the microbiome composition due to SARS-CoV-2 infection align with the scale of disease severity observed in our patient cohort. If yes, can such signatures be identified during the early phase of the infection with value for disease stratification? The composition of the nasal microbiome has been observed to be altered in several respiratory infections including COVID-19 affecting the course of the disease and clinical outcome (44). Different metagenomic studies portrayed decrease in the nasopharyngeal microbiome diversity in SARS-CoV-2 infected patients, leading to predominance of a specific microbe that correlated with symptom severity (22,23,45).
Although the present study did not reveal significant shifts in diversity and composition of the nasopharyngeal transcriptionally active isolates among the four clinical sub-phenotypes of COVID-19, this could be attributable to the Anna Karenina principle (AKP) of microbiome dysbiosis (46)(47)(48), which implies that a greater observed variability exists among individuals with dysbiotic microbiome. Additionally, multiple studies investigating the bacterial communities and the respiratory microbiome in COVID-19 revealed drastic reduction in diversity and composition with increasing disease severity (23,49). Another plausible explanation for the observed differences relies on the transcriptomic profiling method utilized for the current study, which may capture bioactivity from a diverse community more effectively than the absolute DNA quantification methods commonly used for microbiome analyses (50,51). Moreover, meta-transcriptome sequencing can capture the microbiome profiles at high resolution along with the active functional elements, which has been demonstrated to change (microbial gene expression) without large alterations in overall community structure (52). Notably, a significant increase in the relative bacterial abundance (based on transcriptional activity) in the severe and mortality groups was observed when compared with the mild and moderate. This feature indicates the probable role of dysbiotic microbiome in disease severity. Majorly, dysbiosis is associated with alteration in the abundance of bacterial population causing opportunistic microbes to flourish or making way for pathogens for invasion (53,54). Separate studies have identified enrichment of different species like L. buccalis, V. parvula, C. gingivalis, P. melaninogenica, H. parainfluenzae, R. mucilaginosa, and N. subflava in the oral microbial communities of COVID-19 patients (23,55,56). Moreover, different mechanisms have been suggested by which a dysbiotic nasopharyngeal microbiome, leading to an overgrowth of certain microbial species, can alter or cause progression in disease severity of COVID-19. A proposed mechanism is enrichment and subsequent migration of nasopharyngeal microbial species into the lungs, resulting in pneumonia and emphysema conditions (57). This proposition was also seen in different studies, where oral pathogens propagated a new disease when it migrated to other organs (23,58,59). Alternatively, a shift in the healthy microbial community due to disease conditions might alter the cytokine production and lead to an increase of both inflammatory response and clinical severity in respiratory diseases (60)(61)(62).
Taken together, viral-host resident transcriptionally active microbes' cross talk might initiate a sequence of dynamic events where the SARS-CoV-2 viral infection may modulate differential abundance of particular bacterial species, leading to microbiome dysbiosis. A similar interaction was shown by Susi et al., that competition for limited host resources may result in a "tragedy of the commons" situation, where non-optimal levels of host exploitation may emerge (63). These bacterial species might in turn aggravate the viral/primary infection through different mechanisms based on their inherent functional properties (Fig. 6).
Higher transcriptional intensity of Achromobacter xylosoxidans and Bacillus cereus in the mortality group may have implications in disease severity. A. xylosoxidans is an aerobic, motile, Gram negative bacteria that carries intrinsic as well as acquired mechanisms of resistance, conferring multidrug-resistance (MDR) phenotype (64). A. xylosoxidans, has emerged as an opportunistic pathogen, causing pulmonary infection in the cases of dysfunctional immune response or predisposing conditions like end stage renal disease or cardiac disease (65,66). Interestingly, a study by Jabbar et al. reported A. xylosoxidans as the second most prevalent bacterial species found in severe COVID-19 patients with resistance to several classes of antibiotics (67). This bacteria is reported to enhance inflammation by increasing the production of cytokines such as IL-6, TNFa, and G-CSF as seen in the case of cystic fibrosis (CF) (68). Meanwhile, B. cereus, the facultative anaerobe enriched in mortality cases is known to cause infection via consumption of contaminated food and nosocomial transmission (69). Mainly, intestinal diseases such as diarrhea are reported to be caused by B. cereus yet, rarely though, B. cereus cause lower respiratory tract infections. Shimoyama et al. highlighted a case where the patient succumbed to lung infiltrates associated with B. cereus (70). A recent study shows that during COVID-19, B. cereus co-infection can be observed in a patient with probable immunocompromised state due to inhalation-based steroid use (71). Leptotrichia buccalis is an opportunistic pathogen commonly found in the oral microbiota and reported to cause severe pneumoniae in SARS-CoV-2-infected elderly individuals (72,73). Similarly, L. buccalis was significantly enriched in the severe group of our cohort with a median age above 60 years. The transcriptional abundance of different microbial species between severe and mortality groups with similar age presentation of ;64 years suggests that the observed microbial differences in the study is a function of the disease severity. Interestingly, the other anaerobic opportunistic microbe, Veillonella parvula, has been reported as a marker for COVID-19 when compared with flu and healthy controls. In our patient cohort, V. parvula was found significantly enriched in the moderate group, although its substantial presence was detected across the cohort. Additionally, studies deciphered that both V. parvula and SARS-CoV-2 stimulate production of proinflammatory cytokines, mainly TNF-a, that might aggravate the inflammatory and pro-oxidative responses leading to diverse respiratory infection outcomes (74)(75)(76)(77).
On the other hand, Actinomyces meyeri and Halomonas spp. were reportedly enhanced in mild COVID-19 symptom patients. Disease association with Halomonas species is rarely reported (78) whereas A. meyeri is known to cause pneumonia and has a predilection for dissemination, yet reported cases have mild presentations (79). The significant abundance of specific microbial species across COVID-19 disease sub-phenotypes suggests their modulatory role in disease outcome. We summarized the current findings in Fig. 7.
The causal effect of the presence of different transcriptionally active microbial species in different classes of severity was carefully looked through the treatment regime for each of the subgroups. Based on the symptoms/symptom severity, Table 2 highlights treatment regime given to each of the subgroups. The administration of antibiotics to each sub-phenotype could have been beneficial in alleviating the symptoms and relief to patients, especially for the recovered group of patients. The presence of transcriptionally active isolates of different microbial species might be playing a modulatory role in disease severity, as exemplified by the presence of Achromobacter xylosoxidans and Bacillus cereus in the mortality group, where A. xylosoxidans is reportedly resistant to several classes of antibiotics (67). High transcriptionally active isolate abundance of A. xylosoxidans in the nasal microbiome in our mortality patients might predispose the host to severe respiratory viral infection and nonresponsiveness to antibiotics. Although, it is pertinent to mention that we detected 2,417 genes for A. xylosoxidans with more than 80% similarity in our study, we did not find high confidence MDR A. xylosoxidans genes in the mortality patients. One of the probable reasons might be the time point of the transcriptional profiling, during which the MDR genes may not be actively expressed. Similarly, the resistance of B. cereus to penicillins and cephalosporins as a result of beta lactamase production is also reported to be leading cause of mortality in infected patients (80).
Moreover, not only did the microbiomes demonstrate shifts in bacterial abundance and populations across the sub-phenotypes of COVID-19, but the dysbiosis was also functionally evident. The overall downregulation of metabolic pathways in severe and mortality groups reflected a reduction in microbial functions, pointing toward an unlikely impact on the host. The depletion of bacterial pathways associated with membrane transport (two component system and ABC transporters), bacterial chemotaxis, and cell motility in severe and mortality groups indicated toward lower bacterial potential for sensing and adapting to the environment which is consistent with a previous study of COVID-19 (23,81,82).
Furthermore, the nasopharyngeal transcriptionally active microbiome of the mortality group patients did not show depletion of glycolytic pathway, wherein the micro-environment favors aerobic glycolysis (evident through growth of aerobic bacteria), a condition that also sustains high SARS-CoV-2 replication (84). Reasonably, we infer that nasopharyngeal  microbiome dysbiosis during SARS-CoV-2 infection might enhance pathogen invasion and alter immune responses, contributing toward the observed clinical severity in different groups of COVID-19 patients. In parallel with the transcriptionally active bacterial microbiome profile, our work also characterized the local virome to understand the effect of SARS-CoV-2 infection at the primary site of entry. Studies have reported an increase in pathogenic viral species coinfecting the oral microbiome, accelerating disease severity (13). Yet, a significant difference in abundance and diversity of viral species among the four clinical sub-phenotypes was not observed, pointing toward possible viral interference mechanism due to SARS-CoV-2 infection (85,86). Moreover, a reduction in phage population of the microbiome can be an effect that provides an additional milieu to opportunistic pathogens of the microbiome to grow and cause secondary infections accelerating the clinical course.
Conclusion. The findings in the study offer an opportunity to bring forth the less explored modulatory role of the microbiome alterations and disease severity in a hospitalized cohort of COVID-19 patients from the Indian sub-continent Fig. 8. The differentially active isolates of certain bacterial species associated with clinical groups provide leads for evaluating their probable roles in modulating the disease course in COVID-19. This especially comes into play when it has been recognized that, for co-infecting bacterial species, differences exist between populations at risk, pathogen distribution, and antibiotic susceptibility. Future strategies can include exploring the microbial spectrum in COVID-19 patients from different geographical regions which might be beneficial in health care management.
The study can be strengthened in the future with the inclusion of the longitudinal sampling of the patients included in the study. That may enhance the scope to understand the dynamics of transcriptionally active microbial population change during the course of the disease as well as the functional dynamics. A more balanced gender representation could have been useful. However, because these are hospital-admitted FIG 8 It highlights the study design and significant findings wherein microbial signatures for COVID-19 disease severity has been discovered in our hospital admitted cohort of patients.

Microbial Signatures for COVID-19 Sub Phenotypes
Microbiology Spectrum patients during the COVID-19 disease, we appreciate the practical limitation. If possible, more than one hospital cohort could help expand the understanding and relevance of the findings made in the study.

MATERIALS AND METHODS
Design of the study. The study was conducted with 198 COVID-19 patients enrolled between April 2020 to July 2020, to characterize the abundance of transcriptionally active isolates of different co-infecting bacteria and viruses inhabiting the nasopharyngeal cavity of COVID-19 patients. Detailed clinical presentation and demographic data along with RT-PCR test results and disease outcomes from each patient's electronic medical record were collected and carefully documented for its usage during analysis.
Sample collection and preprocessing. The patients were admitted to the MAX Hospital, Delhi, India with confirmed COVID-19 positive status based on RT-PCR results. The nasopharyngeal and/or throat swabs were collected in viral transport media (VTM) solution by the paramedical staff at the hospital on the day of reporting, by trained medical staff with required safety precautions inclusive of PPE, face mask, and gloves.
Clinical subgrouping of study participants. The patients were categorized into four sub-phenotypes based on disease severity and outcome: mild, moderate, severe, and mortality, as per Indian Council of Medical Research (ICMR) guidelines (Comprehensive Guidelines for Management of COVID-19 patients, Directorate General of Health Services, MoHFW, GOI). Briefly, SpO 2 levels, requirement of respiratory support, and/or breathlessness parameters were taken into consideration. In mild cases, the SpO 2 level was $ 94% with no breathing problem. Moderate patients were defined as showing breathing difficulty with SpO 2 levels ranging between 91% and 93%. Severe patients showed respiratory distress with respiratory support requirement and SpO 2 levels , 90%. Mortality group was defined as patients who succumbed to COVID-19 during hospital stay. Mild, moderate, and severe were clubbed together into one group as "recovered" compared with the mortality cases, for some of the analysis included in the manuscript.
Library preparation and sequencing. Whole genome sequencing of the 198 RT-PCR positive samples, using the capture based Illumina Respiratory Virus Oligo Panel (RVOP), was done to capture SARS-CoV-2 genome as well as additional co-presence of other respiratory viruses. Using a combination of clinical data, disease sub-phenotype, and availability of RNA, a subset of 86 samples of the 198 total samples were studied to explore the presence of transcriptionally active microbes using Holo-Seq (Holo-transcriptome). The library preparation protocols for RVOP and Holo-transcriptome have been previously published from our lab (87). Briefly, double stranded cDNA was prepared from 100 ng RNA using Superscript IV first-strand synthesis system (Thermo Fisher Scientific, Cat. No. 18091050) and DNA polymerase I Large (Klenow) Fragment (New England Biolabs, Cat. No. M0210S). The RVOP library was prepared using Illumina DNA Prep with Enrichment kit (Illumina, Cat. No. 20018705) and the Holo-transcriptome library was prepared using Illumina TruSeq Stranded Total RNA Library Prep Gold (Illumina, Cat. No. 20020598) as per the manufacturer's protocol. Agilent 2100 bioanalyzer was used to check the quality of both libraries. The RVOP library was denatured and diluted to optimal loading concentration for sequencing on MiSeq platform, using v3 reagent kit at 2 Â 75 bp read length. The Holo-transcriptome library was sequenced on the NovaSeq 6000 system, using the NovaSeq SP reagents v1 at 2 Â 101 read length at 400pM loading concentration.
Sequencing data analysis and metatranscriptomic analysis. The sequencing data analysis was performed as previously published from our lab (87). FastQC v0.11.9 was used to check the Phred quality score for all sequences (Babraham Bioinformatics, 2020a -FastQC A Quality Control tool for High Throughput Sequence Data). The quality score threshold was 20 and above. Adapter trimming was performed using the TrimGalore v0.6.6 and alignment of sequences was performed using the HISAT2 v2.2.1 algorithm on human data build hg38 (88,89). SAM tools v1.12 were used to remove aligned human sequences (90). Henceforth, only unaligned sequences were taken into consideration. BCFTools v1.12. generated consensus FASTA and variant calling, which was followed by the alignment of sequences to the 40 respiratory virus panel of Illumina RVOP, to explore the presence of respiratory viruses in addition to SARS-CoV-2 (91). The detected species were counted using the number of reads mapped per species. Kraken2 was used to assign taxonomic labels to microbial species detected from the RVOP and the Holo-Seq analysis (92). The output from the metagenomic classification of the detected species obtained from Kraken was analyzed further using the Pavian software (93).
De novo assembly and pathway enrichment analysis. We used MEGAHIT v1.2.9 (94) to perform the de novo assembly of the samples using the raw sequencing reads, and contigs larger than 150 bp were retained to predict the genes by MetaGeneMark v3.25 (95) using default parameters. Then, CD-HIT v4.8.1 was applied for gene clustering and merging each sample. Finally, redundant sequences with sequence similarity and alignment lengths above 95% of the sequence length were removed. The functional profiles were annotated according to Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (http://www.genome.jp/ kegg/) with the maximum e-value cut-off 1*10 2 5, minimum identity of 95%, and minimum alignment length of 15 amino acids for proteins.
Mutational and phylogenetic study. Of 198 samples, 160 SARS-CoV-2 genomes (with .50% genome coverage) were used for phylogenetic analysis, as previously described by Mehta et al. (7). Clade assignment to all the genomes was done using Nextclade (https://clades.nextstrain.org/). The vcf files were used for mutational analysis. We applied a nonparametric Fisher exact test of significance (for independence between two categorical variables) on our mutation data set which consisted of the total set of mutations identified for the study cohort (independent of its presence in recovered or mortality). P-values were calculated from two-sided tests using 0.05 as the significance level. The direction of the association between the mutation and group (recovered/mortality) was calculated using phi-coefficient correlation (r w ) by measuring the strength of association, henceforth identifying the significant mutations for the mortality as well as recovered patients. Gviz and trackViewer packages from R were used to plot the lollipop plot to visualize the mutations (96,97).
Statistical analysis. The data was described using descriptive statistics, which display continuous variables as medians or interquartile ranges and categorical variables as percentages or proportions. Wherever appropriate, we compared the differences using the ANOVA, Mann-Whitney U test and Chi square testing. To compare the distribution of bacterial presence across our patient categories, we employed the Kruskal Wallis test. The Shannon Diversity index (H) was calculated to characterize the bacterial species diversity in patient samples used for the Holo-transcriptomics study to account for the abundance and evenness of bacterial species in each patient sample. For analysis of beta diversity, we performed principal coordinate analysis (PCoA) in PAST software using Bray-Curtis dissimilarity matrix and PERMANOVA was calculated to determine the statistical significance of beta diversity (https://palaeo-electronica.org/2001_1/past/issue1_01.htm).
Data availability. The data sets presented in this study can be found online at the NCBI-SRA under the accession numbers PRJNA676016 and PRJNA678831 the consensus fasta are available at the GISAID-EpiCoV (https://www.gisaid.org/) under the submission IDs: EPI_ISL_5316892-EPI_ISL_5317001 and EPI_ISL_5317004 -EPI_ISL_5317014.