Temporal Stability and Genetic Diversity of 48-Year-Old T-Series Phages

ABSTRACT T-series phages have been model organisms for molecular biology since the 1940s. Given that these phages have been stocked, distributed, and propagated for decades across the globe, there exists the potential for genetic drift to accumulate between stocks over time. Here, we compared the temporal stability and genetic relatedness of laboratory-maintained phage stocks with a T-series collection from 1972. Only the T-even phages produced viable virions. We obtained complete genomes of these T-even phages, along with two contemporary T4 stocks. Performing comparative genomics, we found 12 and 16 nucleotide variations, respectively, in the genomes of T2 and T6, whereas there were ∼172 nucleotide variations between T4 sublines compared with the NCBI RefSeq genome. To account for the possibility of artifacts in NCBI RefSeq, we used the 1972 T4 stock as a reference and compared genetic and phenotypic variations between T4 sublines. Genomic analysis predicted nucleotide variations in genes associated with DNA metabolism and structural proteins. We did not, however, observe any differences in growth characteristics or host range between the T4 sublines. Our study highlights the potential for genetic drift between individually maintained T-series phage stocks, yet after 48 years, this has not resulted in phenotypic alterations in these important model organisms. IMPORTANCE T-series bacteriophages have been used throughout the world for various molecular biology researches, which were critical for establishing the fundamentals of molecular biology, from the structure of DNA to advanced gene-editing tools. These model bacteriophages help keep research data consistent and comparable between laboratories. However, we observed genetic variability when we compared contemporary sublines of T4 phages to a 48-year-old stock of T4. This may have effects on the comparability of results obtained using T4 phage. Here, we highlight the genomic differences between T4 sublines and examined phenotypic differences in phage replication parameters. We observed limited genomic changes but no phenotypic variations between T4 sublines. Our research highlights the possibility of genetic drift in model bacteriophages.

ampules by standard top-agar assays. To compare the genetic divergence of the phages, we obtained two contemporary stocks of T4 (referred to here as T4 sublines for simplicity) from phage laboratories in Australia and the United States. We then compared the genetic and phenotypic differences of these sublines to Hancock's 1972 stock of T4.

RESULTS
Stability of T-series phages in prolonged storage. To examine the viability of the 48-year-old T-series phage stocks, we plated 1 ml of lysate from T2, T3, T4, T6, and T7 phages with their respective hosts using the soft-agar overlay technique. We did not open or plate T1 phage vials due to concerns about its persistence and history of contaminating laboratory stocks of E. coli. The titers, in PFU per milliliter, were recorded from at least three different vials for each strain (Table 1). We did not observe any plaques from T3 and T7 lysates. To verify whether there were any active phage particles in the T3 or T7 stocks remaining at low titers, we propagated entire vials of the original lysates with E. coli B and E. coli BL21, respectively, overnight in an attempt to recover viable phages. However, no phages could be recovered following overnight amplification. Lysates of T2 and T4 showed 4 to 6 plaques per ml on top agar, while T6 phage had 10 4 to 10 5 PFU/ml (Table 1). Our results revealed that T-even series phages could be revived from the lysates stored approximately 48 years ago, while T-odd series phages (T3 and T7) were not able to be recovered from the 1970s stock.
Genomic characteristics of the T-even series. The complete genomes of 1972 stocks of T2, T4, and T6 phage, referred to here as Hancock phages, were obtained using Illumina HiSeq. Raw reads of each genome were aligned with the corresponding NCBI RefSeq sequences to examine single nucleotide polymorphisms (SNPs) and insertion-deletion variations (indels) using Snippy v4.2.0. We observed 12 and 16 variations (SNPs plus indels) in the T2-Hancock and T6-Hancock phages, respectively, compared with the NCBI RefSeq sequences (T2 nucleotide accession no. LC348380; T6 nucleotide accession no. MH550421) with the nucleotide variations being distributed throughout the genome (Data Set S1, sheets 4 and 5). Comparatively, when we investigated the T4-Hancock phages, we found 172 nucleotide variations compared to the NCBI T4 RefSeq sequence (nucleotide accession no. AF158101). The reason for this high degree of nucleotide variation was not immediately clear. The original T4 genome sequence in the NCBI database was last updated in 2003 and had been completed using traditional sequencing methods involving cloning and PCR (15). To examine whether this difference arose due to artifacts in T4's RefSeq entry, we compared genome sequences of two contemporary T4 sublines from the Barr lab (Monash University, School of Biological Sciences, Australia) (referred to here as T4-Barr) and from the Bacteriophage T4 lab, led by Venigalla B. Rao, The Catholic University of America, Washington, DCs, USA (referred to here as T4-Rao) to the NCBI RefSeq entries. We observed that T4-Barr and T4-Rao had similar numbers of nucleotide variations, 172 and 175, respectively, compared with the T4 RefSeq entry (Data Set S1, sheets 1, 2, and 3). This led us to conclude that there may be artifacts in the NCBI RefSeq genome, and we therefore used T4-Hancock as a historical reference genome in our subsequent analysis. Comparative genomics of T4 sublines. To compare the complete genomes of Teven phages, we assembled the Illumina HiSeq reads using Unicycler v0.4.3 with a minimum contig length of 1,000 bp, which produced a single contig of a complete genome for each strain. The genome size of T-even series of Hancock strains was broadly similar to that of the respective NCBI RefSeq entry ( Table 2). The complete genomes were aligned by pairwise sequence alignment as implemented in Pyani (https://huttonics .github.io/pyani/) to obtain the average nucleotide identity (ANI). Analysis revealed that genomes of T-even phages were closely related to each other, with more than 96% nucleotide sequence similarity between T2, T4, and T6 phages (Fig. 2B). T4-Barr and T4-Rao showed higher similarity to each other than to T4-Hancock ( Fig. 2A).
Next, we assessed the nucleotide variations (SNPs and indels) between T4 sublines. To obtain an unbiased annotation, we manually annotated the complete genome of each subline using information from NCBI T4 RefSeq with an identity threshold of at least 98%. T4-Barr had two insertions (frameshift) and six SNPs (five missense and one synonymous) compared to T4-Hancock. Our analysis showed five additional variations (one insertion and four missense SNPs) in T4-Rao, taking the total number of variations to 13 (Table 3).
Frameshift mutations were observed in an auxiliary gene (arn) and a hypothetical protein (MobD.6). arn encodes an anti-restriction endonuclease that inhibits the E. coli host's endonuclease activity (16). Our manual annotation of T4-Hancock showed that Arn consists of 94 amino acids, which is slightly different from that of NCBI T4 RefSeq (92 amino acids). Interestingly, duplication of nucleotide A at position 258 of arn resulted in a stop codon at position 276, truncating the protein length to 92 amino acids in T4-Barr and T4-Rao compared to 94 amino acids in T4-Hancock. The frameshift mutation in MobD.6 also caused a gain of a premature stop codon at position 180 (of 387 bp), potentially resulting in a loss-of-function mutation in this hypothetical gene.
On the other hand, many missense variations were observed in genes associated with essential structures, such as two units of topoisomerase II (gp39 and gp52), which helps in DNA metabolism, and the baseplate wedge initiator (gp7), tail sheath protein (gp18), and long tail fibers (gp34 and gp37), all associated with phage adsorption to the bacterial host. Of the five additional variations in T4-Rao, three were observed in essential genes: the tail sheath protein (gp18), distal subunit of long tail fiber (gp37), and medium subunit of topoisomerase (gp52).
We also investigated variations in the genome of T4 or T4-like phages available in NCBI databases compared to that of T4-Hancock. At first, we interrogated the NCBI nucleotide database using the keywords "Escherichia phage T4" or "Escherichia virus T4" and "complete genome." We received eight entries with complete genomes (including T4-Hancock of our study and T4-RefSeq) from our search criteria as of 15 December 2020. Furthermore, we included other T4-like genomes from published papers (17)(18)(19)(20), making a total of 10 complete genomes for the comparison. On nucleotide comparison, we obtained identities between 82.9% and 99.9% among 10 genomes (Fig. S1). Next, we selected six genomes that showed .99% similarity for variant analysis. We observed a total of 89 nucleotide variations in various coding sequences (CDS) among those six genomes compared to T4-Hancock (Data Set S1, sheet 6) and this also includes 12 variations (of 13) observed in the genomes of T4-Barr and T4-Rao.
Growth characteristics of T4 sublines in E. coli B and E. coli K-12. We then sought to assess whether the identified mutations had effects on phage replication variables. For these experiments, we used two hosts: E. coli B and the closely related strain E. coli K-12. T4 adsorbs to these two E. coli strains using different receptors. In E. coli K-12, T4 uses lipopolysaccharide (LPS) and outer membrane protein C (OmpC) as receptors. However, in E coli B, which harbors a deletion in ompC, T4 uses LPS as the sole receptor (21,22). First, we performed an efficiency-of-plating (EOP) assay to examine relative titers of T4 sublines on E. coli K-12 compared to the original host of propagation, E. coli B. Our results showed that the relative EOP were not significantly different between any of the three sublines of T4. However, T4-Rao had slightly lower EOP (1.0) than T4-Hancock (1.1) and T4-Barr (1.1) (Fig. 3).   Next, we compared the replication cycle variables of latency period and burst size using one-step growth curves on the same pair of hosts. Our results revealed that all T4 sublines had similar growth features, characterized by a latent period of approximately 20 min, followed by a maturation period of ;10 min, reaching their plateau within ;30 min ( Fig. 4) as described previously (23). Similarly, the average burst sizes of all the T4 sublines were comparable within the two bacterial hosts. However, the burst size in E. coli B was calculated to be approximately 190 phage particles per infected bacterial cell, which was significantly higher than the average burst size of 109 phage particles per infected bacterial cell in E. coli K-12.
Furthermore, to examine the rate of bacterial killing by phages, we performed growth kill assays. The three sublines of T4 were mixed at a multiplicity of infection (MOI) of 0.01 with actively growing (optical density at 600 nm [OD 600 ] ; 0.2) E. coli B and K-12, and growth was measured by determining OD 600 at 5-min intervals for 16 h. All three sublines of T4 were able to suppress the growth of E. coli B and E. coli K-12 within 1 h, with a sharp reduction in OD observed in E. coli K-12 with all three T4 sublines (Fig. 5). The T4 sublines showed different growth patterns between the two strains of E. coli, but there were no significant differences between the growth patterns of the T4 sublines.
Finally, we examined the host range of the T4 sublines on a selection of laboratory and pathogenic strains of E. coli. We used three standard laboratory strains and eight pathogenic strains for this assay. On these spot plate assays, we observed that T4 phages had lytic activity against 6 of 11 strains, but there was no difference in host range between the T4 sublines (Table 4).

DISCUSSION
T-series phages have been used as model systems to understand the fundamental molecular biology of life since the 1940s (3,5,6,(24)(25)(26)(27). Scientists throughout the world have been investigating the genetics and physiology of these bacteriophages, with many of the subsequent discoveries laying the foundation for the entire phage field. T4 phages in particular have been used as model organisms in systems biology to understand important biological phenomena, including the bacteriophage adherence to mucus (BAM) model, which was used to unravel tripartite symbioses between phages, microbes, and the gut mucosa (28), and in vivo studies to understand phagebacterial dynamics in the gut (29). Given that these model phages have been distributed, maintained, and propagated across many different laboratories in a multitude of ways, there exists the possibility for phenotypic and genotypic differences to accumulate between strains from different laboratories, which could ultimately compromise the comparability of these important model organisms.
To address this, we investigated the genetic and phenotypic changes between a 48-year-old T-series phage stock and contemporary laboratory strains. Our analysis of three T4 sublines revealed minor genetic differences and no detectable variation in growth characteristic in their usual hosts, E. coli B and E. coli K-12. We did, however, observe substantial differences between the NCBI RefSeq nucleotide sequence (accession no. AF158101) and those of our three T4 sublines, highlighting the need for an updated reference genome for T4 phage.
The complete genome of T4 was initially established by sequencing small fragments that were obtained following cloning and direct PCR (15). According to the GenBank record, the first information on T4's genome sequence was recorded in 1981 and its latest update was listed in 2003 (AF158101.6). A relatively high genetic variation between the GenBank RefSeq entry and T4 sublines of this study indicates that there may be artifacts in the original GenBank reference sequence. We therefore propose T4-Hancock as an updated reference genome for the field, and we used this genome as the reference in our analysis. The complete genome of T4-Hancock was 5 bp longer than the T4 RefSeq sequence. This difference may have arisen due to a surprisingly high number of nucleotide variations between these two genomes. The difference in the genome length may also be related to the fact that the genome of T4 is terminally redundant and circularly permuted, potentially altering the total length of chromosome between generations (30). Unlike T4, we observed less variability in T2-and T6-Hancock compared to NCBI RefSeq data. The quality of NCBI RefSeq may be the main reason for this. NCBI database record shows that T2 (nucleotide accession no. LC348380) and T6 (nucleotide accession no. MH550421) genomes were completed using Illumina and uploaded in 2018. Furthermore, this also corresponds with our finding of less variability in T4-Barr and T4-Rao genomes compared with T4-Hancock. We further examined the genetic divergence between historical T4 and two contemporary stocks of T4 (T4-Barr and T4-Rao). Our analysis showed the insertion of 13 bp in T4-Barr and 14 bp in T4-Rao compared with T4-Hancock. This finding is in line with the total chromosome length of each subline that we obtained in our study. The average nucleotide identity, although the difference was very small, showed that the two contemporary sublines, T4-Barr and T4-Rao, shared a higher percentage of similarity to each other than they shared with T4-Hancock. Interestingly, both contemporary sublines also shared the nucleotide variations, with T4-Rao having five additional variations. The majority of nucleotide variations were in essential genes, including genes that encode long tail fibers (gp34 and gp37), tail sheath (gp7), and enzymes associated with DNA metabolism. Although we could not track the history of our contemporary T4 phages, it is likely that these phages have gone through indefinite rounds of propagation, purification, and enumeration on various E. coli strains, the results of which may be associated with bottlenecks and mutational drift (17). Furthermore, the difference in mutation rate between T4-Rao and T4-Barr indicates that the mutational drift in T4 phages might have been driven by the different scale of work conducted between the laboratories, highlighting the possibility of genetic diversity of T4 sublines between laboratories across the world. However, we did not observe any noticeable differences in the growth characteristics between T4 sublines in our experimental conditions. Nevertheless, mutations in the essential structural genes, such as the long tail fibers, could potentially affect the phage's adsorption to its host. T4 phage uses the C-terminal region of the distal subunit (gp37) of the long tail fibers to recognize host cellular receptors (31,32), and it has been shown that mutations in this region, which comprises about 140 amino acid residues in a 1,026-residue-long polypeptide, could expand the host range of T4 and T4like phages (33,34). Our analysis showed that T4-Rao had a missense mutation in amino acid residue 417 of gp37, but we could not find any differences in host range  between the T4 sublines on examination against 11 different laboratory and pathogenic strains of E. coli. This study also examined the stability of T-series phages in prolonged storage. All the T-even series lysates had active phages present, which could be propagated in laboratory after 48 years in storage. However, we were not able to recover any viable phages from the lysate of T-odd series (T3 and T7). We do not have complete information on the long-term storage conditions of these phages over the last ;50 years. However, all T-series stocks were stored in the same box and therefore under the same conditions. Furthermore, initial titers of the stock, as labeled on the vials, were also broadly similar between T-odd and T-even phages. Therefore, our results suggest that the T-even series could be more stable than the T-odd series on prolonged storage. This finding is further supported by the close genetic relatedness of T-even phages, compared with the diversity seen across the T-odd phages (11). However, evidence from studies of T-odd phages stored at different time frame is required to verify this finding.
More recently, there has been rising concern regarding the reproducibility of scientific research. The inability to identify or use suitable model organisms is a major problem for reproducibility (35). Given that the variation could be possible in T4 sublines, phage researchers should be vigilant in their use and propagation of these model bacteriophages. For example, a study that investigated T7 and T3 sublines from different sources showed a high variability between sublines across laboratories, some of which were found to be using entirely different strains (36). Our in silico analysis of T4-like genomes further supports this, providing evidence that there were large variations in genome sequences between T4-like phages. However, some of the T4-like strains, although named differently in the NCBI database, closely matched the wild-type T4 strain.
In conclusion, our analyses suggest that individually maintained T4 sublines undergo continuous genetic drift that may cause microevolution in the model bacteriophage stocks. The genetic variation in T4 sublines did not show a difference in our phenotypic analysis, which is a favorable finding for the phage community, and the rationale proposed by Delbruck and coworkers on the use of model bacteriophages to avoid incomparability of results between laboratories proved to be still valid (37). However, the magnitude of genetic changes may vary between laboratories, which highlights a need for a larger-scale comparative study of model bacteriophages sublines.

MATERIALS AND METHODS
Bacteriophage stock. The stock of T-series bacteriophages in this study was obtained from Peter Reeves (University of Sydney, Australia). The stock comprised lysates of T1, T2, T3, T4, T6, and T7 phages, purified and stored in sealed glass ampules with chloroform in 1972. These phages were used in several studies by Robert E. Hancock in the 1970s (referred to as T#-Hancock here) (13,14,38,39). T1 phage was not used in this study due to its potential contamination hazard for our laboratory. These stocks have been stored in a cold room (;4°C) since the 1970s. In addition, we used two contemporary strains of T4; received from the Bacteriophage T4 lab at the Catholic University of America, led by Venigalla B. Rao (called T4-Rao here) and the phage collection of our lab (called T4-Barr here). T4-Rao was originally received from August H. Doermann, University of Washington, before 1990 (V. B. Rao, personal communication). T4-Barr was originally purchased from the ATCC by Forest Rohwer, San Diego State University, where it had been used for more than 5 years before being acquired by the Barr lab in 2016.
Bacteriophage revival and propagation. Lysates from the old stocks were each recovered in 15-ml Falcon tubes. One milliliter of lysate was mixed with susceptible hosts (E. coli B for T2, -3, -4, and -6 and E. coli BL21 for T7) and plated onto lysogeny broth (LB) agar plates using the soft agar overlay method, followed by incubation for 18 to 24 h at 37°C. Each plating was done in duplicate and repeated with lysates from at least three different vials. In instances of no plaque formation, amplification was attempted using the original lysate and its host in broth culture, which was then replated. Lysates with viable phages were propagated following the Phage-on-Tap protocol (41).
Phage DNA extraction and sequencing. High-titer lysate (.10 9 PFU/ml) was used for DNA extraction. To remove host nucleic acid contamination, lysates was treated with DNase (1 mg/ml) and RNase A (12.5 mg/ml) for 2 h at 37°C, followed by an inactivation step of 5 min at 75°C. DNA was extracted using Norgen phage DNA extraction kit (Norgen Biotek, Ontario, Canada), following the manufacturer's instructions. The extracted DNA was vacuum dried into a pellet for transport. Sequencing was performed using the Illumina HiSeq 150-bp paired-end platform at the Genewiz facilities (Suzhou, China).
Bioinformatics analysis. Illumina HiSeq platform generated five to six million raw reads from each sample. Trimmomatic v0.39 (42) was used to remove adaptor sequences and to truncate reads with a quality score less than 15 in a sliding window of 4 bp. Raw reads were assembled using Unicycler v0.43 (43) with a flag for minimum contig length of 1,000 bp, which produced a single and complete genome. We aligned NCBI RefSeq sequences (T2 nucleotide accession no. LC348380; T4 nucleotide accession no. AF158101; T6 nucleotide accession no. MH550421) with the assembled genomes using Mauve genome alignment (44), and the results from the alignment were used to rearrange nucleotides to match the start position of NCBI RefSeq entries. The genomes were then annotated manually; copying the annotation from their respective NCBI RefSeq entries using Geneious v 9.1.8 (45), provided a sequence similarity for the coding region of at least 98%, followed by manual curation where required.
To identify nucleotide variations, the filtered raw reads were mapped to either NCBI RefSeq entries or our assembled genomes using Snippy v4.2 (https://github.com/tseemann/snippy) with the setting of the minimum number of reads covering a site to be considered at 100 and the minimum variant call format (VCF) variant call quality also at 100. To compare nucleotide identity, the complete genomes were aligned by pairwise sequence alignment as implemented in Pyani (https://huttonics.github.io/pyani/). The average nucleotide identity (ANI), as a percentage, obtained from the analysis was converted into a matrix and visualized using pheatmap (46) in R package.
Determination of T4 growth characteristics. The growth characteristics of all three T4 phages were examined using the hosts E. coli B and E. coli K-12. First, relative efficiency of platting between hosts was examined. Each lysate was serially diluted to a titer of ;100 PFU/ml, and 500 ml was used in a soft-agar overlay with each host. After overnight growth, PFU were counted and recorded, and the relative efficiency of plating on E. coli K-12 was calculated as the ratio of PFU in E. coli K-12 to PFU in E. coli B. Each experiment was performed in duplicate and repeated three times.
Next, one-step growth curve experiments were performed on E. coli B and E. coli K-12 separately with each T4 subline. Bacteria from overnight broth cultures were diluted 1:20 in LB and allowed to grow until an optical density 600 nm (OD 600 ) of 0.2 was reached (;2 h), and the culture was then infected with phage at a multiplicity of infection (MOI) of 0.1. Phage was allowed to adsorb for 5 min at 37°C with orbital agitation at 120 rpm. The mixture was then pelleted (4,000 Â g, 2 min, room temperature), resuspended in fresh LB broth, and incubated at 37°C and 120 rpm. Samples (100 ml) were repeatedly taken every 5 or 10 min for a total period of 1 h, transferred into chloroform-saturated phosphatebuffered saline (PBS), serially diluted, and plated to determine numbers of PFU. The number of bacteria in the initial inoculum was determined by plate count. The number of PFU per infected cell was calculated by dividing PFU by initial density of bacteria. The experiment was repeated on at least three different occasions.
Bacterial lysis by phage was determined from sequential OD readings obtained using a microtiter plate reader. Bacterial cultures at exponential growth were obtained as described above. Each culture was added to individual wells of a 96-well microtiter plate and mixed with phage lysate at an MOI of 0.01. The plate was incubated at 37°C in a microplate reader (Epoch Microplate spectrophotometer; BioTek, Winooski, VT, USA) with continuous shaking, and the data were recorded every 5 min.
To gauge the host range of each T4 subline, we used 11 different clinical and laboratory isolates of E. coli and the standard spot plate assay (47). High-titer (.10 10 PFU/ml) phage lysate (10 ml) was spotted on the surface of agar plate seeded with host bacteria. The zone of clearance on the spotted area was recorded as a positive lytic activity.
Graphing, data presentation, and statistical analysis. All data were analyzed and visualized using GraphPad Prism v8. Averages and standard deviations for all the replicates were calculated and compared using a t test to obtain P values. To infer statistical significance, the threshold was set at P values of ,0.05.
Data availability. The nucleotide sequence of T4-Hancock is available in NCBI GenBank under accession number MT984581. Raw sequence data and alignment files are available in NCBI BioProject under ID PRJNA662192.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. DATA SET S1, XLSX file, 0.1 MB. FIG S1, PDF file, 0.4 MB.