Introduction

Influenza A virus is a pathogen that causes thousands of human deaths every year. In April 2009, a new H1N1 influenza A virus strain (pH1N1) emerged in North America and rapidly disseminated around the globe, culminating in the first influenza pandemic of the 21st century [1]. By the end of the pandemic period, the pH1N1 strain was responsible for more than 18,000 deaths worldwide [2]. Comprehensive phylogenetic analysis demonstrated that this virus originated from multiple reassortment events, and although it most likely arose in pigs, it also contained genes from viruses that infect birds and humans [3, 4].

Gradual genetic changes in influenza A viruses culminate in their ability to evade recognition by the immune system and therefore allow their constant circulation among human populations [5]. As demonstrated by Nelson et al. [6], at least seven distinct pH1N1 influenza A virus lineages circulated around the globe in the first months of the pandemic period. Considering its ability to mutate rapidly, the major concern was that the virus would acquire mutations that could lead to greater transmissibility and pathogenicity, and also to drug resistance [7–9]. Genome sequencing is a valuable and accessible approach for monitoring these evolutionary changes. Genomic sequences provide fundamental information about the chronological and geographical distribution of the strains, and these data support appropriate vaccine development.

Rio Grande do Sul (RS) was the Brazilian state most affected during the pandemic period, with over 3,000 confirmed pH1N1 cases and almost 300 reported deaths [10, 11]. A vaccination campaign started in March 2010 as a measure to control influenza virus infections, and no cases of pH1N1 were confirmed in 2010. However, over 100 cases occurred in 2011, including 14 deaths [10, 11].

Studies concerning the molecular evolution of this strain in Brazil are scarce. This gap is even more pronounced in the current post-pandemic period, since no data are available concerning the evolution of the established pH1N1 viruses. Consequently, this lack of information can compromise local public-health vaccination policies.

In this study, six pH1N1 influenza A genomes – four pandemic and two post-pandemic isolates from RS, Brazil – were analyzed. Phylogenetic analysis using the concatenated genome segments was performed to determine the lineages of these isolates. Also, amino acid substitutions in the viral proteins were mapped, and the efficacy of the vaccine against all isolates was predicted. Finally, the type of natural selection that the isolates were subjected to was evaluated.

Materials and methods

Biological samples, virus identification, and genome sequencing

Nasopharyngeal aspirate samples were collected from patients with acute respiratory infection in the state of Rio Grande do Sul, Southern Brazil, during 2009 and 2011 (Online Resource, Table S1). In order to identify the virus, all samples were analyzed by real-time PCR at the State Central Laboratory (LACEN-RS) as described by Veiga et al. [12]. All ethical issues were approved by the Research Ethics Committee of Universidade Federal de Ciências da Saúde de Porto Alegre (UFCSPA).

Twelve pH1N1 samples with high viral load, eight from the pandemic and four from the post-pandemic period, were sent to the J. Craig Venter Institute. Some of them had an excess of mucus and did not homogenize properly with the viral transport media. Therefore, after RNA extraction, only six samples were viable for amplification by multisegment RT-PCR [13]. Subsequently, whole-genome sequencing was performed on these samples, and the sequences were deposited in GenBank (accession numbers are listed in Table 1).

Table 1 Accession numbers of the segment sequences of pH1N1 strains utilized for whole-genome phylogeny

Sequence alignment and concatenation

Sequences of influenza A virus strains were retrieved from the Influenza Research Database (http://www.fludb.org/), and their accession numbers are listed in Table 1. The genomic segments and genes were aligned using MUSCLE [14], embedded in MEGA 5 software [15]. Amino acid substitutions were visually inspected using the sequences of archetype strains California/04/2009 and California/07/2009 as references. Aligned segment sequences were concatenated using Sequence Matrix (http://code.google.com/p/sequencematrix/).
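The concatenation step can be illustrated with a minimal Python sketch. It is not the Sequence Matrix implementation: the function name, the segment order, and the choice to pad a strain that is missing from a segment with gap characters are all assumptions made for illustration, and the toy sequences are invented.

```python
# Segment order used for joining (an assumption; any fixed order works).
SEGMENT_ORDER = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def concatenate_alignments(alignments, segment_order=SEGMENT_ORDER):
    """Join per-segment alignments into one sequence per strain.

    alignments maps segment name -> {strain name -> aligned sequence}.
    A strain absent from a segment is padded with gaps (hypothetical choice).
    """
    # Collect strain names in first-seen order across all segments.
    strains = []
    for seg in segment_order:
        for name in alignments[seg]:
            if name not in strains:
                strains.append(name)

    parts = {name: [] for name in strains}
    for seg in segment_order:
        seqs = alignments[seg]
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"segment {seg} is not aligned (unequal lengths)")
        width = lengths.pop()
        for name in strains:
            parts[name].append(seqs.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy example with two invented segments and two invented strains.
toy = {
    "HA": {"strainA": "ATG-", "strainB": "ATGC"},
    "NA": {"strainA": "GG", "strainB": "GA"},
}
print(concatenate_alignments(toy, segment_order=["HA", "NA"]))
```

Checking that every segment alignment has a single common length before joining guards against concatenating unaligned input, which would silently shift all downstream sites.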

Phylogenetic analysis

For phylogenetic analyses, the aligned sequences were first evaluated using the FindModel software (http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html) in order to identify the evolutionary model that best fit the sequence dataset. Subsequently, phylogenetic analysis was performed on the Phylogeny.fr platform (www.phylogeny.fr) [16]. Phylogenetic trees were reconstructed using the maximum-likelihood method implemented in the PhyML program (v3.0 aLRT). The GTR (general time reversible) substitution model was selected assuming an estimated proportion of invariant sites and four gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data. The reliability of internal branches was assessed using the aLRT test (SH-Like).

Detection of adaptive evolution

A codon-based Z-test for detection of positive and purifying selection within viral sequences using the Nei-Gojobori method was conducted in MEGA 5 software [15]. The numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site of each gene from RS influenza A virus pH1N1 isolates in relation to their counterparts from the archetype strains California/04/2009 and California/07/2009 were assessed. The difference between the dS and dN of each gene was computed, indicating the type of adaptive evolution: dN>dS indicates positive selection, and dS>dN indicates purifying selection. The variance of the difference was computed using the bootstrap method (1000 replicates).
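The logic of the test for purifying selection can be sketched in Python as follows. This is a simplified illustration, not MEGA's code: it assumes that dN, dS, and their bootstrap replicates have already been estimated elsewhere (for example by the Nei-Gojobori method), and it uses a normal approximation for the one-sided P-value. The function name and the toy numbers are hypothetical.

```python
import math
import statistics


def z_test_purifying(d_n, d_s, boot_d_n, boot_d_s):
    """Z statistic and one-sided P-value for the alternative dS > dN.

    d_n, d_s: point estimates of nonsynonymous and synonymous substitutions per site.
    boot_d_n, boot_d_s: paired bootstrap replicates of the same quantities.
    """
    # Variance of the difference (dS - dN), estimated from the bootstrap replicates.
    diffs = [s - n for n, s in zip(boot_d_n, boot_d_s)]
    variance = statistics.variance(diffs)
    if variance == 0:
        raise ValueError("bootstrap variance is zero; Z is undefined")
    z = (d_s - d_n) / math.sqrt(variance)
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Invented values for illustration only (a real run would use 1000 replicates).
z, p = z_test_purifying(
    d_n=0.002,
    d_s=0.010,
    boot_d_n=[0.001, 0.002, 0.003, 0.002],
    boot_d_s=[0.008, 0.011, 0.012, 0.009],
)
print(f"Z = {z:.3f}, P = {p:.3g}")
```

Testing for positive selection under the same assumptions only requires swapping the roles of dN and dS, so that the statistic is based on dN − dS.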

Measure of antigenic distance

In this analysis, all HA sequences from pH1N1 strains isolated in RS, Brazil, from 2009 and 2011 deposited in the Influenza Research Database (http://www.fludb.org/) were utilized. The additional HA sequences included in the analysis are from the following strains (accession numbers are in parentheses): Rio Grande do Sul/4509/2009 (CY052048), Rio Grande do Sul/4772/2009 (CY052049), Rio Grande do Sul/4782/2009 (CY052050), Rio Grande do Sul/5395/2009 (CY052347), Rio Grande do Sul/7019/2009 (CY052348), Rio Grande do Sul/7108/2009 (CY054282), Rio Grande do Sul/277/2011 (CY099996), Rio Grande do Sul/278/2011 (CY099997), Rio Grande do Sul/279/2011 (CY099998), Rio Grande do Sul/359/2011 (CY099999), Rio Grande do Sul/360/2011 (CY100001) and Rio Grande do Sul/361/2011 (CY100002).

The antigenic distances of the five epitopes of hemagglutinin of the samples in relation to those of the vaccine strain California/07/2009 were evaluated as described by Deem and Pan [17]. The p-distance is defined as the proportion of different amino acids for each epitope between two strains, and it was measured using MEGA software. The largest of the p-distance values is defined as p_epitope and can be used to estimate vaccine efficacy (E) by the equation E = 0.47 − 2.47 × p_epitope [17]. The difference between the predicted vaccine efficacies in the pandemic (2009) and the post-pandemic (2011) periods in RS, Brazil, was evaluated by the Mann-Whitney test using PAST software [18]. A P-value less than 0.05 was considered significant.
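The calculation described above can be sketched in Python. The p-distance definition and the equation E = 0.47 − 2.47 × p_epitope are taken from the text; the function names are hypothetical, and the epitope residue strings are invented placeholders rather than the real HA epitope positions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of positions at which two equal-length sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def dominant_epitope(sample, vaccine):
    """Return (epitope name, p_epitope): the epitope with the largest p-distance."""
    distances = {name: p_distance(sample[name], vaccine[name]) for name in vaccine}
    name = max(distances, key=distances.get)
    return name, distances[name]


def vaccine_efficacy(p_epitope):
    """Predicted efficacy from the linear model quoted in the text."""
    return 0.47 - 2.47 * p_epitope


# Invented residue strings for the five epitopes (placeholders, not real HA sites).
vaccine = {"A": "PNHDSNKG", "B": "TSADQQSL", "C": "LSTASSWS",
           "D": "RDQEGRMN", "E": "ELREQLSSVS"}
sample = dict(vaccine, E="ELKEQLSSVS")  # one substitution in epitope E

name, p = dominant_epitope(sample, vaccine)
print(name, round(p, 3), round(vaccine_efficacy(p), 3))
```

Only the epitope with the largest p-distance enters the efficacy estimate, so additional substitutions in the other epitopes do not lower the prediction unless they make a different epitope dominant.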

Results

Identification of amino acid substitutions

Six influenza A virus pH1N1 strains, four from the pandemic period (2009) and two from the post-pandemic period (2011), isolated in RS, Brazil were sequenced. Predicted proteins from the isolates were compared with those from the archetype strains California/07/2009 and California/04/2009, and the amino acid alterations are shown in Tables 2, 3, 4 and 5.

Table 2 Amino acid substitutions in the HA protein from RS influenza A virus pH1N1 isolates
Table 3 Amino acid substitutions in the NA proteins of RS influenza A virus pH1N1 isolates
Table 4 Amino acid substitutions in the PA, PB1 and PB2 proteins from RS influenza A virus pH1N1 isolates
Table 5 Amino acid substitutions in the M1, M2, NEP, NP and NS1 proteins of RS influenza A virus pH1N1 isolates

The 2011 isolates showed a higher number of amino acid substitutions than the 2009 isolates in relation to the archetype strains, notably in the proteins HA and NA (Tables 2 and 3 and Online Resource, Figures S1 and S2). The HA protein from the 2011 isolates had five modifications in antigenic sites, while that from the 2009 isolates had two modifications (Table 2). The amino acid substitution HA Q310H, which is associated with high mortality rates, was found in the RS samples AVS03 and AVS04 (2009) (Online Resource, Figure S2). However, the alteration D239N/G, which is associated with severe illness [19], was not observed in the isolates. The modification E391K, which is associated with changes in antigenic properties [20], is present in the 2011 isolates (Table 2 and Online Resource, Figure S2).

In the NA protein, there were two substitutions in antigenic sites in the post-pandemic isolates, and only one in the pandemic samples analyzed (Table 3). None of the NA proteins contained the modifications H275Y and S247N (Online Resource Figure S3), which are associated with oseltamivir resistance [21].

The 2011 isolates had more alterations in RNA polymerase complex proteins when compared to the 2009 isolates (Table 4).

Phylogenetic analysis

Preliminary phylogenetic analysis of each segment confirmed that all isolates were phylogenetically closer to pH1N1 than to other influenza A virus strains (Online Resource, Figure S3). Subsequently, whole-genome phylogeny was performed in order to determine the lineages of the isolates. The segments were aligned with those from representative strains of each of the seven clades determined by Nelson et al. [6]. All segment alignments were concatenated and subjected to phylogenetic analysis using maximum likelihood under the GTR model, which had the best performance in the test of nucleotide substitution models (Online Resource, Table S2).

As shown in Figure 1, the Brazilian isolates were distributed between clades 6 and 7, which had high aLRT values (96 and 99, respectively). These isolates exhibited the characteristic amino acid changes that define each clade [6]. Isolates from clade 6 had changes in HA (K2E and Q310H), NP (V100I) and NA (V106I and N248D), and those from clade 7 had substitutions in HA (S220T), NP (V100I), NA (V106I and N248D) and NS1 (I123V) (Tables 2, 3 and 5 and Online Resource, Figures S1 and S2).

Fig. 1 Phylogenetic tree of concatenated genomic segments of RS influenza A pH1N1 viruses. The tree was constructed using the maximum-likelihood method. aLRT values greater than 50 % are shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The RS H1N1 isolates are in bold

The 2009 RS isolates from clade 6 were phylogenetically the closest to each other. In clade 7, the strain Brazil/AVS/06/2009 was closer to the strain Omsk/02/2009 than to the strain Brazil/AVS/07/2009.

The post-pandemic strains formed a monophyletic clade and shared a most recent common ancestor with clade 7 strains. These strains also accumulated more mutations than the pandemic strains, as shown by their longer branches in the phylogenetic tree.

Natural selection analysis

Genes from the pH1N1 isolates were tested for purifying and positive selection. No positive selection was detected in any of the genes of the isolates analyzed (Online Resource, Table S3). Results of the purifying selection test are listed in Table 6. Purifying selection was not found in the M2, NS1 and NEP genes of any of the isolates.

Table 6 Purifying selection analysis of the genes from RS influenza A virus pH1N1 isolates

Notably, among the pandemic isolates, the polymerase genes PA, PB1 and PB2 were more frequently affected by purifying selection than other genes. Furthermore, the pattern of purifying selection was not found in any HA genes, while for the NA gene only one isolate (Brazil/AVS03/2009) presented such a pattern of natural selection.

Analysis of the post-pandemic isolates indicated that the HA, NA, M1, NP, PA, PB1 and PB2 genes were under purifying selection.

Vaccine efficacy prediction

The antigenic distances of five epitopes of HA from all isolates in relation to those from the vaccine strain California/07/2009 were computed in order to estimate the efficacy of the vaccine against these isolates using the p_epitope method [17], as shown in Table 7. The p_epitope of all pandemic strains was 0.029 (dominant epitope E), culminating in a vaccine efficacy estimation of 39.735 %. The dominant epitope varied among the post-pandemic HA1 proteins, and the predicted vaccine efficacy values (median = 31.562 %) were lower than those from the pandemic strains (Mann-Whitney test, P < 0.05) (Table 7). The relative vaccine efficacy medians of the pandemic and post-pandemic strains compared with a perfect-match virus (p_epitope = 0) were 84.543 % and 67.154 %, respectively.
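The relative efficacies follow from dividing each predicted efficacy by the value for a perfect match, 0.47, which is E at p_epitope = 0 (p_epitope being the largest epitope p-distance). The short Python sketch below retraces that arithmetic. The unrounded pandemic value of 1/34 for the reported 0.029 is an assumption, chosen because it reproduces the reported efficacy of about 39.735 %; the function name is hypothetical.

```python
PERFECT_MATCH_EFFICACY = 0.47  # E at p_epitope = 0 in E = 0.47 - 2.47 * p_epitope


def relative_efficacy(efficacy):
    """Predicted efficacy as a fraction of the perfect-match efficacy."""
    return efficacy / PERFECT_MATCH_EFFICACY


# Assumption: the reported p_epitope of 0.029 is 1/34 before rounding.
pandemic_efficacy = 0.47 - 2.47 * (1 / 34)
# Median post-pandemic efficacy as reported in the text (already rounded).
post_pandemic_efficacy = 0.31562

for label, value in [("pandemic", pandemic_efficacy),
                     ("post-pandemic", post_pandemic_efficacy)]:
    print(f"{label}: E = {100 * value:.3f} %, "
          f"relative = {100 * relative_efficacy(value):.3f} %")
```

Because the post-pandemic median is entered here in rounded form, its relative efficacy comes out at approximately 67.15 % and can differ from the reported 67.154 % in the last digit.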

Table 7 Predicted vaccine efficacy against RS influenza A virus pH1N1 isolates

Discussion

In this study, genomes of pH1N1 viruses from the pandemic (2009) and post-pandemic periods (2011) isolated in Rio Grande do Sul (RS), Southern Brazil, were investigated.

Whole-genome phylogenetic analysis of the pandemic isolates revealed that clade 6 and clade 7 pH1N1 strains co-circulated in RS in 2009. In a previous study using HA gene sequences, clade 6 strains were isolated in São Paulo (Southeast Brazil), and clade 7 strains in Mato Grosso and Distrito Federal (Center-West region, Brazil), and São Paulo [22]. However, these Brazilian strains were not the most similar to the RS isolates, considering all HA sequences deposited in the Influenza Research Database (http://www.fludb.org/) (data not shown). Therefore, it is likely that multiple pH1N1 phylogenetic sublineages were introduced into Brazil in 2009. This assumption is also supported by the fact that pandemic RS isolates from clade 7 were not the most similar to each other.

In August 2010, the World Health Organization (WHO) declared that the pH1N1 pandemic had ceased and predicted that this virus type would continue to circulate for years to come, while causing only minor problems [2]. In 2010, no cases of pH1N1 infection were detected in RS, Brazil, which was attributed to the intensive vaccination campaign adopted that year, when almost 45 % of the RS population was vaccinated [10]. However, in 2011, 103 cases (14 deaths) of pH1N1 were reported in RS [11]. Phylogenetic analysis of the post-pandemic 2011 RS isolates demonstrated that they belong to a monophyletic group that is nested in clade 7. During the 2009 pandemic, clade 7 strains were the most common in Latin America [22, 23] and in other countries such as the USA [24], India [25] and Canada [26]. The founder effect hypothesis would explain the global dominance of clade 7, since the initial foothold of this lineage occurred in New York State, which could have facilitated its rapid global spread via New York City’s high international interconnectivity [24]. Further studies should be performed to evaluate whether most of the currently circulating pH1N1 strains in Brazil were derived from the clade 7 lineage, since the WHO recommendation for influenza vaccine composition is to utilize the clade 1 strain California/07/2009, whose antigenic properties could differ from those of clade 7 strains.

Evolution at high mutation rates is a genetic variation mechanism that ensures that RNA viruses survive [27].

As demonstrated previously, nonepitopic sites tend to accumulate fewer mutations than epitopic ones [28]. This phenomenon is also observed in internal viral proteins, such as the matrix (M1), polymerase (PA, PB1, PB2), nucleoprotein (NP), and nonstructural (NS) proteins, since they are hidden from antibodies and are thus under less selective pressure to change. Furthermore, these proteins have a constrained structure, and mutations that impair their functionality are therefore promptly removed by purifying selection [29]. For instance, polymerase genes have the lowest dN/dS ratio among the virus genes, indicating that intense purifying selection acted on these genes [30].

Among the RS pH1N1 isolates, purifying selection was detected in most genes of the 2011 post-pandemic viruses. However, it was observed that some genes from 2009 pandemic isolates were already under purifying selection, even with the short time of divergence from the common ancestor shared with the archetype strains. Nevertheless, although purifying selection is an important driving force of pH1N1 virus evolution, it is worth noting that RS post-pandemic and pandemic isolates accumulated amino acid mutations relative to the vaccine strain. This observation is in agreement with the findings of Plotkin et al. [29], who suggested that influenza viruses are subject to an intragenomic conflict over the mutation rate: certain genes, and specific residues within those genes, experience frequency-dependent selection to change, while other genes experience purifying selection to remain fixed.

Studies have attempted to correlate the presence of the amino acid alterations in the major viral antigenic determinants HA and NA proteins with prognosis of illness, pathogenicity, virulence, and resistance to antiviral drugs. In this respect, for instance, the substitution Q310H in the HA protein was detected in the isolates AVS03 and AVS04. Although a previous study pointed out the association of this mutation with fatal cases [8], this finding was recently challenged [9, 31].

Amino acid changes in NA associated with oseltamivir resistance were not found in the RS isolates. Assessments of oseltamivir resistance frequency among RS strains during the pandemic and the post-pandemic period (2010-2011) revealed that the H275Y mutation is extremely rare in RS [10, 32]. However, because there are limited options for antiviral treatment, the emergence of resistant strains is a public-health concern [33].

Some amino acid substitutions occurred in antigenic sites of the HA and NA, which could affect influenza vaccine efficacy. As expected, the post-pandemic strains had more amino acid changes in the antigenic sites of HA, culminating in a decrease in the predicted vaccine efficacy against the post-pandemic strains in relation to the pandemic ones. A study performed recently in Japan demonstrated that, in post-pandemic strains containing multiple mutations in antigenic sites of HA (2 to 4 mutations), these mutations did not contribute to a change in antigenicity [34]. However, this probably depends on the site where the mutation occurred and also the type of amino acid substitution. For instance, viruses with the double mutation E391K and D142N have been associated with several breakthrough infections [35]. The D142N alteration was not observed in the 2011 isolates. Nevertheless, other changes in epitopic sites, such as R222K and I233V, which are both lacking in the archetype strains, must be evaluated.

It is worth noting that in 2011, there were 103 cases with 14 deaths due to influenza A virus pH1N1 in RS. In 2012, up to October, 525 cases with 67 deaths were confirmed, increases of 409.71 % and 378.57 %, respectively, in relation to 2011 [11]. Therefore, future studies are necessary to evaluate the efficacy of the current vaccine against the circulating pH1N1 strains in Brazil.

Conclusion

Genome sequences provide fundamental data for the understanding of influenza virus evolution. In this study, we investigated whole-genome sequences of Brazilian pH1N1 viruses and demonstrated that distinct lineages co-circulated in RS, Brazil. Moreover, we showed that the pH1N1 isolates contained amino acid substitutions that could alter their biological and antigenic properties. It is evident that the pH1N1 viruses have persisted and are constantly evolving. Therefore, monitoring the circulating strains is crucial to ensure proper local prevention and control measures against these pathogens.