Comparative Genomic Analysis of Re-emergent Human Adenovirus Type 55 Pathogens Associated With Adult Severe Community-Acquired Pneumonia Reveals Conserved Genomes and Capsid Proteins.

Human adenovirus type 55 (HAdV-B55) is a recently identified acute respiratory disease (ARD) pathogen in HAdV species B with a recombinant genome between renal HAdV-B11 and respiratory HAdV-B14. Since HAdV-B55 first appeared in China school in 2006, no more ARD cases associated with it had been reported until 2011, when there was an outbreak of adult severe community-acquired pneumonia (CAP) in Beijing, China. Reported here is the bioinformatics analysis of the re-emergent HAdV-B55 responsible for this outbreak. Recombination and protein sequence analysis re-confirmed that this isolate (BJ01) was a recombinant virus with the capsid hexon gene from HAdV-B11. The selection pressures for the three capsid proteins, i.e., hexon, penton base, and fiber genes, were all negative, along with very low non-synonymous (dN) and synonymous (dS) substitutions/site (<0.0007). Phylogenetic analyses of the whole genome and the three major capsid genes of HAdV-B55 revealed the close phylogenetic relationship among all HAdV-B55 strains. Comparative genomic analysis of this re-emergent HAdV-B55 strain (BJ01; 2011) with the first HAdV-B55 strain (QS-DLL; 2006) showed the high genome identity (99.87%), including 10 single-nucleotide non-synonymous substitutions, 11 synonymous substitutions, 3 insertions, and one deletion in non-coding regions. The major non-synonymous substitutions (6 of 10) occurred in the protein pVI in its L3 region, which protein has different functions at various stages of an adenovirus infection, and may be associated with the population distribution of HAdV-B55 infection. No non-synonymous substitutions were found in the three major capsid proteins, which proteins are responsible for type-specific neutralizing antibodies. Comparative genomic analysis of the re-emergent HAdV-B55 strains associated with adult severe CAP revealed conserved genome and capsid proteins, providing the foundation for the development of effective vaccines against this pathogen. This study also facilitates the further investigation of HAdV-B55 epidemiology, molecular evolution, patterns of pathogen emergence and re-emergence, and the predication of genome recombination between adenoviruses.

Human adenoviruses are usually typed according to serum neutralization and hemagglutination-inhibition tests, both of which are associated with the three major capsid proteins, i.e., hexon, fiber, and penton base. Based on partial characterization of its hexon and fiber epitopes, as well as the restriction enzyme patterns , HAdV-B55 was first identified as HAdV-B11a (Hierholzer et al., 1974). This virus may have occurred earlier from an ARD outbreak in a military training base (Spain, 1969), and subsequently re-emerged in several outbreaks of ARD in Turkey (2004), Singapore (2005) discontinuously (Hierholzer et al., 1974;Chmielewicz et al., 2005;Zhu et al., 2009;Kajon et al., 2010;Li et al., 2014;Lu et al., 2014). Complete genome analysis suggested that this new type, HAdV-B55, is a Trojan horse, which contains a recombinant genome from both a renal pathogen, HAdV-B11, which provides the antigenic epitope, and a respiratory pathogen, HAdV-B14, which confers the cell tropism, biological and pathogenicity properties (Walsh et al., 2010). The strain QS-DLL is the first completely characterized HAdV-B55 based on the whole genomic analysis (Walsh et al., 2010). It caused an ARD outbreak in China, from March to April 2006, with 254 patients affected, of which 247 were students in a senior high school and one died during this outbreak .
Since this first isolate QS-DLL of HAdV-B55 was identified in 2006 (Yang et al., 2009;Walsh et al., 2010), no more ARD cases associated with this virus had been reported in China until 2011, when 16 HAdV-B55-like strains were identified based on partial gene sequencing in a CAP outbreak in Beijing, China in February and March (Cao et al., 2014). Of these, strain BJ01 was isolated from a throat swab of a 29-year-old adult who presented with severe CAP (Gu et al., 2012). The whole genome (34,773 bp) of strain BJ01 (Human adenovirus B human/CHN/BJ01/2011/55[P14H11F14]) was sequenced using the Sanger method, following PCR amplification of targeted overlapping regions. (GenBank accession number JX491639). (Zhang et al., 2012a). In this study, comparative genomic analysis of this re-emergent HAdV-B55 strain (BJ01) with the first HAdV-B55 strain (QS-DLL), as well as the bioinformatic analysis of HAdV-B55 strains circulating recently were performed. The findings will contribute to the studies of the evolution and epidemiology of HAdV-B55, as well as provide both the foundation for the development of effective vaccines and public health strategies against this pathogen.

Cells, Virus Stock, and Genomic DNA Extraction
This specimen was originally isolated from a throat swab of an adult diagnosed with severe CAP at Beijing Chao-Yang Hospital of China (March 2011) (Gu et al., 2012;Zhang et al., 2012a). Written informed consent was obtained from the patient. The protocol was approved by the Ethics Committee of Beijing Chao-Yang Hospital, Capital Medical University in accordance with the Declaration of Helsinki. The throat swab was inoculated and cultured in A549 cells (ATCC), and grown in Dulbecco's minimum essential medium supplemented with 100 IU/ml penicillin, 100 µg/ml streptomycin, and 2% (v/v) fetal calf serum, at an atmosphere of 5% (v/v) carbon dioxide. Viral genomic DNA was extracted from infected A549 cells, as described earlier (Le et al., 2001). PCR amplification and molecular typing was performed according to the previous study (Han et al., 2013).

Genome Sequencing and Annotation
The genome of strain BJ01 was sequenced using a Sanger primer-walking method, following PCR amplification of targeted overlapping regions (Zhang et al., 2012a,b). Both 5 -and 3 -ends (including both inverted terminal repeats) were sequenced directly by 5 primer Ad14-LTRS1 (5 -ACAGAAGACTTTCACACGGT-3 ) and 3 primer Ad14-LTRS2 (5 -GGTCCCTCTAAATACACATACA-3 ), respectively, using extracted genomic DNA as sequencing template. The sequence data provided average three-to fivefold coverage of the genome. Ambiguous sequences and gaps were re-amplified and resequenced. All the sequencing ladders were assembled using the SEQMAN (DNAStar; Madison, WI). The genome was further annotated according to the previous report (Walsh et al., 2010).

Genome Recombination Analysis
Genome sequence recombination analysis was done using Simplot 3.5.1 2 , including the options of Simplot and Bootscan analysis (Lole et al., 1999). The software MAFFT was used to align the HAdV-B sequences 3 . The parameters of gap opening penalty (1.53), Offset value (0), and the Scoring matrix for nucleotide sequences (200 PAM/k = 2) were set by default. The LAGAN (Limited Area Global Alignment of Nucleotides) program of mVISTA was used for the pair-wise comparisons of genomes 4 (Brudno et al., 2003).

Phylogenetic Analysis
Phylogenetic analysis was performed by Molecular Evolutionary Genetics Analysis (MEGA) 5.0.5 5 (Tamura et al., 2007). Phylogenetic trees with 1,000 bootstrap replications were constructed using a maximum-composite-likelihood method. All the other parameters were set by default.

Substitution Rate Analysis of HAdV-B55 Hexon, Penton Base, and Fiber Genes
The numbers of non-synonymous (dN) and synonymous (dS) substitutions per site between sequences and dN/dS ratios were calculated. The analysis was performed with MEGA 5.05 (Tamura et al., 2011) using the Nei-Gojobori model (Nei and Gojobori, 1986). The nucleotide sequences from 23 hexon, 43 fiber, and 17 penton base genes available in GenBank were included.
All the genome sequences included for substitution rate analysis were summarized in Table 1. For HAdV-B55, additional genome typing details were also presented. The sequences of the hexon, penton base, and fiber genes were either extracted from the genome files or deposited as single gene entries.

Sequences Used in the Study
The HAdV sequences of the hexon, penton base, fiber genes, and the genomes used for phylogenic analyses were retrieved from GenBank and summarized in Table 1, with additional sequence origin details if available (strain names, countries, years and GenBank accession numbers). The prototype strains were also used.

Nucleotide Sequence Analysis of HAdV-B55 Strain BJ01
The genome data of strain BJ01 was deposited into GenBank (accession number JX491639), and named formally as "Human adenovirus 55 isolate HAdV-B/CHN/BJ01/2011/55[P14H11F14]", further referred to as "BJ01". Figure 1 presented the genomic organization and transcription map of strain BJ01. The genome contained four early, two intermediate, and five late transcription units, and 34,773 bp in length, with a base composition of 26.16% A, 25.08% T, 24.38% G, 24.39% C, and a GC content of 48.76%, which was similar with the other members of subspecies B2 (mean of 49%) (Seto et al., 2010a). A total of 39 coding sequences were identified ( Table 2). The 137-bp ITR of strain BJ01 was identical to that of the first HAdV-B55 strain (QS-DLL; 2006) (Walsh et al., 2010).

Protein Sequence Analysis of HAdV-B55 Strain BJ01
The amino acid sequence percent identities of the proteins of HAdV-B55 strain BJ01 were presented in Table 3. Selected proteins of the representative prototypes in each HAdV species, including all species B adenoviruses, were used for comparison. The amino acid sequence analysis revealed that the proteins of strain BJ01 had the highest amino acid percent identities with that of HAdV-B14 (strain de Wit) except for the hexon protein, which is responsible for virus characterization by serum neutralization. The hexon protein of strain BJ01 had the highest amino acid identity with HAdV-B11 (98.4%), rather than HAdV-B14 (92.2%). The amino acid sequence identities indicated that strain BJ01 is a recombinant between HAdV-B14 and HAdV-B11, with the HAdV-B11 hexon gene inserted chimeric into the HAdV-B14 genome backbone (Table 3). Interestingly, E3 11.7 kDa protein of HAdV-B55 strain BJ01 had the same amino acid sequence identity with both HAdV-B14 and HAdV-B35, indicating that these three types share the conserved and similar E3 11.7 kDa protein. Among all the proteins analyzed, the highest sequence diversity was found in the fiber gene of all the seven species.

Nucleotide Substitution Rates and Selection Pressures for HAdV-B55 Major Capsid Protein Genes
The synonymous and non-synonymous substitutions as well as the selection pressures for the three major capsid protein genes of HAdV-B55 were examined and compared. All the dN/dS ratios of the three genes were less than 1 ( Table 4). This is to be expected because the evolution is dominated by negative selection, which removes mutations harmful to fitness (Zhao et al., 2014). Specifically, the hexon gene had least non-synonymous substitutions per site and the lowest ratio of dN/dS (0.06254). The non-synonymous substitutions and dN/dS ratio of the fiber gene was also low. However,  They are either deposited as single gene entries in GenBank or extracted from the genome files. # The HAdV-B55 genome analyzed in the present study. * They are likely typed as HAdV-B55 according to the genome sequences.
it was relatively higher than that of the penton base and hexon genes. Overall, the major capsid proteins of HAdV-B55, hexon, penton base, and fiber, are highly conserved when compared with other types, such as HAdV-B7 (Zhao et al., 2014).
This indicates that HAdV-B55 BJ01 was a recombinant composed of the HAdV-B14 genome backbone and the chimeric HAdV-B11 hexon.
Recombination analysis of both the genome and the hexon gene of HAdV-B55 strain BJ01 along with other HAdV-B viruses were shown in Figures 2B-D. Simplot analysis strongly indicates that there was a recombination event between HAdV-B14 and HAdV-B11 ( Figure 2B). Apparently, except for the hexon region, BJ01 had the highest similarity with HAdV-B14 instead of HAdV-B11 ( Figure 2B). Bootscan analysis of the genome further confirmed the recombination ( Figure 2C). The fine-resolution Bootscan analysis of the divergent hexon genes showed that the recombination event was within the hexon gene. Loop 1 (HVR1-6; nt 407 to 1,034) and Loop 2 (HVR7; nt 1,363 to 1,484) were derived from HAdV-B11 (the green regions). The other portions were derived from HAdV-B14, so was the rest of the genome ( Figure 2D).
Phylogenetic Analysis of the Whole Genome, the Hexon, Penton Base, and Fiber Genes of the HAdV-B55 The phylogenetic analysis of the whole genomic sequences confirmed that the genomes of HAdV-B members were closely related to each other, forming a clade ( Figure 3A). HAdV-B55 was closer to HAdV-B14 than HAdV-B11 genome (Figure 3A), which was consistent with the global pairwise alignment result in Figure 2A and the genome percent identities between the two types (98.879% vs 97.605%). The phylogenetic analysis of the fiber base and penton base genes drew the same conclusion as the whole genomes: they were closer to HAdV-B14 than HAdV-B11 (Figures 3C,D). However, the phylogenetic tree of the hexon genes showed a different distribution: HAdV-B55 was closer to HAdV-B11 than HAdV-B14 (Figure 3B), which was consistent with the Bootscan analysis in Figures 2C,D as well as with the amino acid percent identities of the hexon protein (the highest identity with HAdV-B11: 98.4%) ( Table 3).

Comparative Genomic Analysis of the First Two HAdV-B55 Strains Emerged in China: QS-DLL and BJ01
Comparative genomics analysis showed that strain BJ01 had a high genome identity with strain QS-DLL (99.87%). However, when compared between the two genomes in high resolution detail, 22 single-nucleotide substitutions, 3 insertions and one deletion were identified between the two genomes. (Table 5).
Twenty-one of twenty-two single-nucleotide substitutions were located in coding sequences from eight regions (E1B, E2B, L1, L2, L3, E2A, L4, E3). Ten single-nucleotide substitutions were non-synonymous substitutions, while 11 were synonymous substitutions ( Table 5). Of the 10 non-synonymous substitutions, six were located in the pVI protein in the L3 region (Figure 4), which is one of the four cement proteins of HAdVs (IIIa, VI, VIII, and IX). Three of the six single-nucleotide nonsynonymous substitutions (A66G, R68K, and V81F) occurred close to the N-terminal domain of protein VI, of which, two (A66G and R68K) were located within one of the two hexon binding domains. The other four non-synonymous substitutions were located in the 20kDa Protein, 55kDa Protein, Terminal Protein Precursor, and 14.9kDa Protein, respectively ( Table 5). The 11 synonymous nucleotide substitutions were present in the 20kDa Protein, 55kDa Protein, DNA polymerase, 43kDa Protein, pV, pVI, Hexon, DNA Binding Protein, 100kDa Hexonassembly Associated Protein, 11.7kDa Protein, and 20.1kDa Protein, respectively ( Table 5). The capsid protein genes were highly conserved between the two strains isolated 5 years apart. There was only one single-nucleotide mutation (T to C) in the hexon gene, which resulted in a synonymous substitution (L to L). Both the penton base and fiber genes of the two strains were identical with each other.
Compared with the genome of strain QS-DLL, strain BJ01 had nucleotide insertions in three regions (TCCATATCCGTGT, AAAAAA, and A, respectively) and one deletion (AA) within L2 pX poly(A) region. All the insertions and deletions were located in non-coding regions (NCR). The insertions of "TCCATATCCGTGT" were located at the upstream of E1A poly(A) region, whereas the insertions of "AAAAAA" were located within IX poly(A) region, and the insertion of "A" were located within L2 pX poly(A) region (Table 5).

DISCUSSION
Human adenoviruses were traditionally classified primarily according to their serological profile and hemagglutination  The mean numbers of non-synonymous (dN) and synonymous(dS) substitutions per site from between sequences were noted and dN/dS were calculated. This selection pressure analysis was conducted using the maximum-composite-likelihood model, and included nucleotide sequences from 23 hexon, 43 fiber, and 17 penton base genes of HAdV-55 available from GenBank. All positions containing gaps were eliminated automatically. Evolutionary analyses were performed with MEGA 5.05 version.
Here presented is the detailed comparative genomic analysis of the re-emergent HAdV-B55 strain BJ01 with the first isolate QS-DLL. The genomic nucleotide identity between the two strains was 99.87%, along with 26 mutations, including 11 synonymous and 10 non-synonymous substitutions, 3 insertions, one deletion and one G to A mutation in L2 NCR (Table 5).  genes, were analyzed for their phylogenetic relationships. For reference, taxon names include the corresponding GenBank accession number, country of isolation, strain name, year of isolation, and genome type. Bootstrapped, maximum likelihood trees with 1,000 replicates were constructed using the MEGA 5.05 software (see Footnote 5). and by applying default parameters, with a maximum-composite-likelihood method. Bootstrap numbers shown at the nodes indicate the percentages of 1,000 replications producing the clade. The scale bar indicates units of nucleotide substitutions per site. The sequences used for phylogenic analyses are retrieved from GenBank and summarized in Table 1. Sequences from strain BJ01 are noted for reference ( ).
Frontiers in Microbiology | www.frontiersin.org Compared with other HAdV types, such as HAdV-B3, -C5, -B7, -B14, -D36, of which the genomes are usually conserved and stable (Mahadevan et al., 2010;Seto et al., 2010b;Nam et al., 2014;Alkhalaf et al., 2015;Yu et al., 2016;Zhang et al., 2017), HAdV-B55 showed similarly conserved genomes when looking at the two strains isolated five years apart (2006 vs 2011). However, the major population distribution of HAdV-B55 associated disease has been changed from juveniles to adults, whereas the disease severity remains similar. The major gene mutations between the two isolates are located in the L3 pVI protein, one of the structural proteins of HAdVs, in which six single-nucleotide non-synonymous substitutions and one synonymous substitution were identified (Figure 4). pVI and its processed form protein VI are special proteins because they have different functions at various stages of HAdV infections. For example, during adenoviral infection, protein VI aids the virion particle to escape from the endosome by inducing a pH-independent disruption of the membrane (Wiethoff et al., 2005). The N-terminal domain in pVI with predicted amphipathic α-helical structure is required for membrane FIGURE 4 | Schematic representation of HAdV-B55 protein VI. The amino acid mutations between HAdV-B55 strains BJ01 and QS-DLL are indicated in yellow. Sites for cleavage by the protease, nuclear export signal, and nuclear localization signal are also indicated in different colors. The hexon binding domains are located between amino acid residues 48 to 74 and 235 to 239. A predicted amphipathic-α-helical domain is also shown. The location of these domains was estimated from the previous report by Wiethoff et al. (2005). lytic activity (Wiethoff et al., 2005). Additionally, it binds the hexon protein and facilitate nuclear import of hexon (Matthews and Russell, 1995). Two regions, between amino acid residues 48-74 and 233-239 of pVI, were required for the interaction with the hexon protein, where two non-synonymous substitutions were identified (Figure 4). Whether the non-synonymous substitutions in pVI can speed up the adenoviral maturation and further change the population distribution of HAdV-B55 infection remains unknown. Further wet-bench confirmation is necessary. Phylogenetic analysis of genome sequences, as well as individual genes, showed a subclade composed of HAdV-B55 isolates (BJ01 included) with high bootstrap values (Figures 3A-D). HAdV-B55 was closely related to HAdV-B11 (Slobitski) and HAdV-B14 (de Wit), forming a clade (Figures 3A-D), which was consistent with the global pairwise alignment data (Figure 2A). Recombination analysis further revealed the mechanism that the two loops of the HAdV-B11 hexon were embedded in the whole genome of HAdV-B14 (Figures 2B-D), giving evidence that the hexon gene, particularly hypervariable regions, acts as an important hotspot for genome recombination. Although the penton base and shaft/knob boundary of the fiber gene were also known to be two recombination sites Liu et al., 2011), no recombination in these regions was identified in HAdV-B55 genomes. The neutralizing and protective antibodies in human are induced primarily by the hexon protein, partially by the fiber and penton base proteins (Gahery-Segard et al., 1998;Hong et al., 2003;Sumida et al., 2005). The lack of non-synonymous substitutions in the three capsid proteins of HAdV-B55 strains make the HAdV-B55 vaccine development easier when compared with other respiratory viruses, such as influenza viruses.
Genome recombination is an important strategy driving HAdV evolution (Robinson et al., , 2011Kajon et al., 2010;Walsh et al., 2010Walsh et al., , 2011Liu et al., 2011Liu et al., , 2012Matsushima et al., 2012), and results in the changes in pathogenicity Singh et al., 2012;Zhou et al., 2012). For HAdVs, to some extent, recombination can be defined as an adaptive change resulting from environmental pressures. In the example of HAdV-B55, the initial HAdV-B14 infection may induce neutralizing antibodies against this type. Meanwhile the coinfection and co-replication of HAdV-B11 and HAdV-B14 in the intestine may lead to the recombination between HAdV-B11 and HAdV-B14 hexon genes, i.e., the resulting HAdV-B55 can escape from pre-existing immunity against HAdV-B14 from the host due to its acquisition of HAdV-B11 loops. The previous study has confirmed that the neutralizing and protective antibodies in human are induced primarily by the two hexon loops. The acquisition both loops of HAdV-B11 is beneficial to the survival and spreading of the recombinant HAdV-B55, because the previous absence of circulating HAdV-B11 in China (Han et al., 2013;Chen et al., 2016;Yu et al., 2016) led to a corresponding low level of herd immunity, which might be associated with the subsequent more and more ARD outbreaks caused by HAdV-B55 (Yang et al., 2009;Zhu et al., 2009;Walsh et al., 2010;Zhang et al., 2012a;Zhang S.Y. et al., 2016;Cao et al., 2014;Li et al., 2014;Chen et al., 2016;Gao et al., 2016;Tan et al., 2016;Yi et al., 2017). This is one of the immune escape capabilities of HAdVs. The detailed recombination mechanism of HAdV-B55 might be illustrated by in vitro co-infection and replication of the two types of adenoviruses, HAdV-B11 and -B14. In addition, we also infected A549, Hela, HEp-2, Hep-G2, and Vero cells with HAdV-B55, -B11, and -B14, respectively (data not shown). The replication efficiency of different types in the cells was compared. The three types of HAdVs replicated most efficiently in A549 cells. Unfortunately, no significant difference of replication efficiency among the three types of HAdVs was identified in all the tested cell clines. The prevalence and outbreak of HAdV-B55 might be not associated with the virus replication efficiency.
In summary, comparative genomic analysis of re-emergent HAdV-B55 pathogens revealed conserved capsid proteins and low non-synonymous substitutions per site of HAdV-B55, which provides the foundation for the development of effective vaccines against this pathogen. It also facilitates the further studies of the adenovirus epidemiology, molecular evolution, patterns of pathogen emergence and re-emergence (Zhang et al., 2017), and the predication of genome recombination between adenoviruses, as well as the development of public health strategies, including surveillance.

AUTHOR CONTRIBUTIONS
QZ, XW, and BC conceived and designed the study. ZC, YY, SJ, W-GL, W-WC, JZ, ML, ShZ, NC, JO, and SuZ performed the computational analyses and made the figures and tables. ZC, YY, and QZ wrote the manuscript. All authors reviewed the manuscript.