Bacteriophage populations mirror those of bacterial pathogens at sites of infection

ABSTRACT Bacteriophages, viruses that parasitize bacteria, are known to be abundant at sites of bacterial colonization, but the relationship between phages and bacteria at sites of infection is unclear. Bacteriophages are highly specific to their bacterial host species, and so we hypothesize that phage populations would mirror those of bacterial pathogens within infected tissues. To test this, here we study publicly available cell-free DNA (cfDNA) generated using next-generation sequencing of infected bodily fluids, including urine, joint fluid, peritoneal fluid, bronchoalveolar lavage fluid, cerebrospinal fluid, and abscess fluid, as well as uninfected control samples. These were analyzed using a computational pipeline for identifying bacteriophage sequences in cfDNA. We find that bacteriophage sequences are present in both infected and uninfected bodily fluids and represent a variety of bacteriophage morphologies and bacterial hosts. Additionally, phages from Escherichia coli, Streptococcus, and Staphylococcus aureus are overrepresented both in terms of proportion and diversity in fluids infected with these same pathogens. These data indicate that phages reflect the relative abundance of their bacterial hosts at sites of infection. Bacteriophage sequences may help inform future investigative and diagnostic approaches that utilize cell-free DNA to study the microbiome within infected tissues.
IMPORTANCE Bacteriophages are an active area of investigation in microbiome research, but most studies have focused on phage populations at sites of bacterial colonization. Little is known about bacteriophage ecology at sites of active infection. To address this gap in knowledge, we utilized a publicly available data set to study bacteriophage populations in cell-free DNA collected from sites of infection. We find that phages reflect the relative abundance of their bacterial hosts at sites of infection. These studies may lead to future investigative and diagnostic approaches that incorporate phages as well as bacterial cell-free DNA.

Bacterial infections are a major public health issue, causing morbidity and mortality worldwide. Recent studies indicate that bacterial infections may cause 13.6% of all global deaths (1). In recent decades, antimicrobial resistance in bacterial infections has become increasingly common (2) and poses a growing threat to vulnerable populations. The study of bacterial ecology at sites of infection is critical for understanding infection outcomes and developing improved diagnostics and therapeutics. However, much remains unknown.
Bacteriophage (phage) are viruses which infect bacteria, and they are highly specific to their bacterial hosts (3). Phage are present ubiquitously throughout the human body (4,5), and both reflect and influence bacterial populations. For this reason, phage are an attractive therapeutic target (6,7), but outside of the study of phage therapy, endogenous phages have been largely disregarded. Phage have been shown to be associated with bacterial host characteristics, such as antimicrobial resistance (8) or infection chronicity (9). Little is known about phage ecology in the human body as a whole, but what is known is mostly relevant to sites of bacterial colonization, such as in the gut (5,10). Aside from recent work describing phage populations in chronically infected tissues (11), the relationship between phages and bacteria at sites of infection is mostly unclear.
Here, we investigated phage populations in infected bodily fluids and compared them with uninfected controls. To accomplish this, we utilize next-generation sequencing data of cell-free DNA (cfDNA) from a publicly available study. cfDNA consists of short unencapsulated DNA sequences and can be found in circulation in plasma as well as in bodily fluids. Though largely composed of human sequences, cfDNA also includes microbial and bacteriophage sequences (12). It is, therefore, a strong candidate for analysis of microbial ecology in the context of infection. We utilize here a publicly available data set of 76 infected bodily fluid samples (BioProject PRJNA558701) generated via Illumina sequencing (13). These included samples from infected wounds, joints, urine, serum, and other sites as well as culture-negative controls. Causative infectious agents were identified by culture or 16S rRNA sequencing and included Escherichia coli, Streptococcus spp., and Staphylococcus spp. This data set provides an excellent setting to investigate the relationships between pathogen and bacteriophage at the site of infection.
To identify phages within the metagenomic sequencing data, we apply a previously described bacteriophage annotation pipeline (Fig. 1A) (14). In brief, raw data were quality controlled and trimmed, human host reads were subtracted by mapping to the human reference genome, and a BLAST search against the full NCBI Nucleotide database was used to assess non-human cfDNA proportions. This revealed that, on average, 8.8% of identifiable non-human reads in infected fluids were from bacteriophage, indicating that bacteriophage sequences make up a considerable proportion of free DNA at the average site of infection (Fig. 1B). Of note, an average of 12.8% of non-human reads in the uninfected surgical control fluid samples belonged to bacteriophage, indicating that bacteriophage are abundant in these fluids independent of infection status. A first-pass BLAST search (15,16) was performed, as previously described, against our Curated Phage Database, with stringent removal of sequences with human genome homology (14). To link phage with bacterial host(s), we utilize a Curated Phage Dictionary (14), the naming convention of bacteriophages with clearly identified host genera, and NCBI nucleotide source host fields for bacteriophage sequence entries. This dictionary includes taxonomic classifications for both the phage and the bacterial host, if known. Subsequent annotations reflect a diverse pool of bacteriophage across many families (Fig. 1C). When comparing the overall composition of phageomes by bacterial host, we find notable enrichment for pathogen-specific phage by infection etiology, as well as a non-zero phage background in the negative control samples, which contained Escherichia, Klebsiella, Enterobacter, and other phages (Fig. 2A). When analyzing these local phageomes with respect to infection etiology, we find that the proportion of E. coli phage is higher in E. coli infections than in other infections (Fig. 2A and B).
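As a rough illustration of the host-linking step and the per-host proportions described above, the following Python sketch assigns a host genus from a phage name of the form "Genus phage name" and computes the share of host-assigned hits per genus within one sample. It is not the pipeline from (14): the genus list, the function names, and the example hit names are hypothetical stand-ins, and the NCBI source host fields that the Curated Phage Dictionary also draws on are not modeled here.

```python
from collections import Counter

# Hypothetical subset of host genera, for illustration only; the actual
# Curated Phage Dictionary (14) is not reproduced here.
KNOWN_HOST_GENERA = {
    "Escherichia", "Klebsiella", "Enterobacter",
    "Streptococcus", "Staphylococcus",
}


def host_genus_from_name(phage_name):
    """Infer the host genus from a name such as 'Escherichia phage T4'.

    Returns None when the name does not start with a known genus
    followed by 'phage' or 'virus'.
    """
    words = phage_name.split()
    if (
        len(words) >= 2
        and words[0] in KNOWN_HOST_GENERA
        and words[1].lower() in {"phage", "virus"}
    ):
        return words[0]
    return None


def host_proportions(phage_hits):
    """Fraction of host-assigned phage hits per host genus in one sample.

    `phage_hits` holds one phage name per annotated read; hits without an
    identifiable host are left out of the denominator (an assumption).
    """
    counts = Counter()
    for name in phage_hits:
        genus = host_genus_from_name(name)
        if genus is not None:
            counts[genus] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: n / total for genus, n in counts.items()}


if __name__ == "__main__":
    # Made-up hits for a single sample.
    sample_hits = [
        "Escherichia phage T4",
        "Escherichia phage lambda",
        "Staphylococcus phage K",
        "uncultured virus",
    ]
    print(host_proportions(sample_hits))
```

Excluding hits with no identifiable host from the denominator is one possible choice; counting them in a separate "unknown" category would lower every per-host proportion and may be preferable when many hits lack a host annotation.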
We find similar trends in Streptococcus and Staphylococcus aureus infections (Fig. 2A, C, and D). Ecological diversity of phage, as calculated by the Shannon Diversity Index (SDI), a measure of entropy often applied in ecology to quantify species richness and evenness (17), differs by infection etiology (Fig. 2E through G), with higher diversity of pathogen-specific phage in patients infected by the corresponding host but not in controls or in those with other infection etiologies.
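For reference, the Shannon Diversity Index of a sample is H = -sum(p_i * ln(p_i)), where p_i is the proportion of observations belonging to taxon i. The Python sketch below computes it from a list of counts. The use of the natural logarithm and of per-taxon read counts as input are assumptions, since the text does not state the log base or the counting unit, and the function name is illustrative.

```python
import math


def shannon_diversity(counts):
    """Shannon Diversity Index H = -sum(p_i * ln(p_i)).

    `counts` holds one non-negative count per taxon (for example, reads
    per phage taxon in one sample). Zero counts contribute nothing.
    The natural logarithm is an assumption; other bases rescale H.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Made-up counts: an even community versus a skewed one.
    print(shannon_diversity([10, 10, 10, 10]))
    print(shannon_diversity([37, 1, 1, 1]))
```

With the same number of taxa, an even community gives a higher H than a skewed one, which is why the index captures both richness and evenness.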

Taken together, these findings demonstrate that bacteriophage populations reflect bacterial pathogens in infected bodily fluids.
However, this work has several limitations. It is unclear whether the sequences characterized here are from active bacteriophage particles, free phage DNA, or prophage DNA from bacterial hosts, in part because the relatively low depth of sequencing of non-human cfDNA in these samples prevented generation of phage contigs. Further studies are needed to understand the sources of this DNA as well as how it enters these bodily fluids. Another limitation is that this study utilizes publicly available cfDNA data taken from one patient cohort and may reflect geographically enriched features. Furthermore, the number of uninfected controls is low (10) in comparison to the number of infected fluids (76), and it is possible that this data set is underpowered for the detection of signals distinguishing uninfected from infected fluids beyond infection-specific enrichment of phage proportion and diversity. The number of samples additionally limits comparisons by the type of biosample, and future work should include a larger cohort for comparisons across different bodily fluids. Finally, the phage dictionary utilized here relies on characterized and sequenced phage genomes, which necessarily enriches for phages associated with human infections; it is possible that many environmentally associated phages are not being identified here due to a relative underrepresentation in sequence repositories.
In summary, we find that bacteriophage sequences are present in both infected and uninfected bodily fluids and represent a variety of bacteriophage morphologies and bacterial hosts. Additionally, we demonstrate that infection etiology is reflected in infected bodily fluids through pathogen-associated phage proportion and diversity. Bacteriophage sequences may help inform future investigative and diagnostic approaches that utilize cell-free DNA to study the microbiome within infected tissues.