Genomic Signature and Mutation Trend Analysis of Pandemic (H1N1) 2009 Influenza A Virus

A novel swine-origin pandemic influenza A(H1N1) virus (H1N1pdm, also referred to as S-OIV) was identified as the causative agent of the 21st century's first influenza pandemic, but the molecular features that confer its ability to transmit from human to human have not been identified. Here we compared the protein sequences of 2009 H1N1pdm strains with those of the viruses that caused other pandemics and of viruses isolated from humans, swine and birds. We then analyzed the mutation trend of the residues at the signature (species-associated) and non-signature (non-species-associated) positions in the proteins of H1N1pdm during the 2009 pandemic. We confirmed that the host-specific genomic signatures of 2009 H1N1pdm, which are mainly swine-like, closely matched those of the 1918 H1N1pdm. During the short period in which the pandemic alert level was raised from phase 4 to phase 6, one signature residue, at position NP-100, mutated from valine to isoleucine. Four non-signature residues, at positions NA-91, NA-233, HA-206, and NS1-123, also changed during the epidemic in 2009. All of these mutant residues, except that at NA-91, are located in viral functional domains, suggesting that they may play roles in the human adaptation and virulence of 2009 H1N1pdm.


Introduction
In April 2009, a new influenza A(H1N1) virus was reported in Mexico and the southwestern United States [1]. The World Health Organization (WHO) raised its pandemic alert level for this virus to phase 4 on 27 April 2009, phase 5 on 29 April 2009, and phase 6 on 11 June 2009, declaring a full-blown influenza pandemic for the first time in 41 years. As of 24 January 2010, laboratory-confirmed cases of pandemic influenza H1N1 2009, including at least 14,711 deaths, had been reported in more than 209 countries and overseas territories or communities worldwide (http://www.who.int/csr/don/2010_01_29/en/index.html).
The causative agent was proven to be a novel swine-origin pandemic influenza A (H1N1) virus (H1N1pdm, also referred to as S-OIV). Its hemagglutinin (HA), nucleoprotein (NP), and nonstructural (NS) protein genes belong to the classical swine lineage, while its neuraminidase (NA) and matrix (M) protein genes derive from a Eurasian swine influenza lineage that entered pigs from avian hosts around 1979, and its polymerase gene segments, PA, PB1 and PB2, descend from the North American triple reassortant swine lineage [2][3][4][5]. This unique genetic combination may contribute to the improved fitness of the H1N1pdm in humans and to its human-to-human transmissibility, although none of the molecular features previously shown to confer increased human-to-human transmissibility has so far been identified in the 2009 H1N1pdm. Since there is serious concern that the virus may mutate further into a more dangerous form (http://www.cbsnews.com/stories/2009/12/29/health/main6034632.shtml), it is critical to monitor the evolutionary trends of the 2009 H1N1pdm virus.
Shih and colleagues previously developed an entropy-based computational scheme to identify host-specific genomic signatures of human and avian influenza viruses [6]. Most recently, they used this method to compare the protein sequences of the 2009 H1N1pdm strains collected before May 28, 2009, with those of avian, swine and human influenza A viruses (IAVs). Among the 47 avian-human signatures, they found that 8 (one in PB1, one in PB2, 2 in PA and 4 in NP) showed human-characteristic residues, which may serve as molecular markers for monitoring adaptive mutations in the influenza viruses [7].
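The idea behind an entropy-based signature search can be illustrated with a minimal sketch. It is not the published scheme of [6]: the cutoff `max_entropy`, the function names and the toy alignments are assumptions made here for illustration only. A column is flagged when it is well conserved within each host group (low Shannon entropy) but the dominant residue differs between the groups.

```python
import math
from collections import Counter

def column_entropy(residues):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def candidate_signatures(avian, human, max_entropy=0.4):
    """Return 0-based columns conserved within each host but differing between hosts.

    `avian` and `human` are lists of equal-length aligned sequences.
    `max_entropy` is a hypothetical cutoff, not a value taken from the source.
    """
    hits = []
    for i in range(len(avian[0])):
        a_col = [seq[i] for seq in avian]
        h_col = [seq[i] for seq in human]
        if column_entropy(a_col) <= max_entropy and column_entropy(h_col) <= max_entropy:
            a_major = Counter(a_col).most_common(1)[0][0]
            h_major = Counter(h_col).most_common(1)[0][0]
            if a_major != h_major:
                hits.append(i)
    return hits

# Toy alignments (made up): only column 2 is conserved in both groups and differs.
avian_toy = ["MKVA", "MKVA", "MKVA", "MRVA"]
human_toy = ["MKIA", "MKIA", "MKIA", "MKIA"]
print(candidate_signatures(avian_toy, human_toy))  # [2]
```

In this toy case column 1 is rejected because the avian group is too variable there, and columns 0 and 3 are rejected because both groups share the same dominant residue.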
In the present study, we compared the protein sequences of 2009 H1N1pdm strains collected from April 1, 2009 to December 31, 2009, with the corresponding protein sequences of the human, avian, and swine IAVs and of the viruses that caused past influenza pandemics. We then conducted an analysis to gain insight into 1) the mutation trend of the residues at the signature and non-signature positions in the proteins of H1N1pdm during the pandemic of 2009 and 2) the potential roles of the mutated residues in human adaptation and virulence of the 2009 H1N1pdm influenza virus.

Results and Discussion
Comparison of Genomic Signatures of 2009 H1N1pdm with Human, Swine and Avian Influenza A Viruses, as Well as Those of Other Pandemic Influenza Viruses
The consensus protein sequences of the 2009 H1N1pdm were aligned with those of human, avian and swine IAVs collected between 2000 and 2008, as well as those causing past pandemics. The residues in the protein sequences of each group located at the avian-human signature positions described by Chen et al. [7] are listed in Table 1. The signature residues in the proteins of the 2009 H1N1pdm strains collected in the pre-epidemic period were 17%, 94% and 75% identical to those of human, swine and avian IAVs, respectively ( H1N1pdm that may play roles in viral transmission and virulence.
Unlike seasonal flu, which usually hits elderly people the hardest, the 2009 H1N1pdm has mostly infected the young, especially school-aged children [9]. Persons born before 1957 had a reduced risk of 2009 H1N1pdm infection [10], suggesting that the immunity induced by the viruses causing influenza pandemics after 1957 is ineffective in protecting people from infection by the 2009 H1N1pdm. After assessing human sera from different age groups, Itoh et al. [3] found that elderly people exposed to the 1918 H1N1pdm had antibodies that cross-neutralized the 2009 H1N1pdm. Hancock et al. [11] also reported that persons under the age of 30 years had little evidence of cross-reactive antibodies to the 2009 H1N1pdm virus, while people born before 1930, who were probably exposed to a 1918 H1N1pdm-like virus, had the highest titers of antibodies against the 2009 H1N1pdm. These findings suggest that the 2009 and 1918 H1N1pdm viruses have high antigenic and immunogenic similarities, raising serious concerns that the 2009 H1N1pdm may follow an evolutionary path similar to that of the 1918 H1N1pdm.
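The percent identities at signature positions quoted above amount to counting matching residues over a shared set of positions. A minimal sketch follows; the function name, the dictionary layout and the toy residues are hypothetical and do not reproduce Table 1.

```python
def signature_identity(query, reference):
    """Percent of shared signature positions at which two residue maps agree.

    Each map goes from a (protein, position) key to a one-letter residue.
    """
    shared = [pos for pos in query if pos in reference]
    if not shared:
        raise ValueError("no shared signature positions")
    matches = sum(query[pos] == reference[pos] for pos in shared)
    return 100.0 * matches / len(shared)

# Toy consensus residues at four made-up signature positions.
toy_query = {("protA", 10): "V", ("protA", 25): "I", ("protB", 7): "A", ("protB", 40): "K"}
toy_reference = {("protA", 10): "V", ("protA", 25): "V", ("protB", 7): "T", ("protB", 40): "K"}
print(signature_identity(toy_query, toy_reference))  # 50.0
```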

Mutation Trend Analysis of the Signature and Non-Signature Residues in Proteins of 2009 H1N1pdm Isolates Collected at the Different Periods in the 2009 Pandemic
We compared the signature and non-signature residues of the proteins in the 2009 H1N1pdm strains collected in the pre-epidemic, early, middle and late periods of the 2009 pandemic. We found that among the 47 avian-human signatures [7], only one signature residue, at position 100 of NP, exhibited a dominant change during the 2009 epidemic. In the pre-epidemic period, only 10% of 2009 H1N1pdm strains had the valine-to-isoleucine change at position NP-100, whereas about 57%, 80% and 93% of the virus isolates collected in the early, middle and late periods, respectively, possessed this change (Table 3 and Fig. 1), suggesting that this V100I mutation may play some role in the increased Table 3).
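Per-period percentages of the kind reported for NP-100 can be tallied from aligned sequences labeled by collection period. The sketch below is an illustration under stated assumptions: the function name, the (period, sequence) input layout and the toy sequences are hypothetical, and the period boundaries used in the study are not encoded here.

```python
from collections import defaultdict

def substitution_frequency(isolates, position, residue):
    """Percent of isolates in each period carrying `residue` at a 1-based `position`.

    `isolates` is an iterable of (period_label, aligned_sequence) pairs.
    """
    carrying = defaultdict(int)
    total = defaultdict(int)
    for period, seq in isolates:
        total[period] += 1
        if seq[position - 1] == residue:
            carrying[period] += 1
    return {period: 100.0 * carrying[period] / total[period] for period in total}

# Toy data (made up): residue I at position 3 rises from 50% to 100%.
toy_isolates = [("early", "MAV"), ("early", "MAI"), ("late", "MAI"), ("late", "MAI")]
print(substitution_frequency(toy_isolates, 3, "I"))  # {'early': 50.0, 'late': 100.0}
```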
The influenza viral NP, which forms a trimer as part of the helical genomic ribonucleoprotein complexes, plays a critical role in viral RNA replication [12]. NP may also play a role in cross-species transmission, since among the ten IAV proteins NP contains the largest number (15 of 47) of genomic signatures (Table 1). Each NP monomer, which contains 17 α-helices and 9 β-strands [13], consists of a head domain and a body domain (Fig. 2A). The body domain comprises three segments (aa 21-149, 273-396 and 453-489) responsible for binding to the PB1 and PB2 subunits of the viral polymerase [14]. The conserved amino acid regions on the surface of the NP body domain that mediate NP-polymerase interactions are crucial for viral RNA replication. For example, an asparagine-to-lysine mutation at position 319, which is located on the surface of the body domain, resulted in an increase in polymerase activity and adaptation of an avian influenza virus to a mammalian host [15]. The residue at NP-100, which is located in the body domain (Fig. 2A-C), is thought to be involved in the NP-PB2 interaction [14]. Given that the majority of the viruses gained the V100I mutation during the short period in which the pandemic alert level was raised from phase 4 to phase 6 (Table 3), this mutation may play a role in the increased transmissibility or infectivity of the 2009 H1N1pdm.
Furthermore, we identified four dominant mutations of non-signature residues in the NA, HA, and NS1 proteins of the 2009 H1N1pdm virus. In the NA protein, the avian-like residue valine at NA-91 mutated to the human-like residue isoleucine, which is present in the 1918 and 1977 H1N1pdm IAVs. The uncharged residue asparagine at NA-233 mutated to a negatively charged residue, aspartic acid, which is present only in the 1977 H1N1pdm IAVs. In the pre-epidemic period, about 11% of the 2009 H1N1pdm strains had the V91I mutation and/or the N233D mutation. In the early, middle, and late periods, 57%, 82%, and 86% of the viruses possessed the V91I mutation, while 51%, 76% and 86% of the viral isolates had the N233D mutation, respectively (Table 3 and Fig. 1), suggesting that many of the 2009 H1N1pdm strains carried both the NA V91I and NA N233D mutations.
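The joint frequency of V91I and N233D is not reported above, but the two marginal percentages constrain it. Assuming both percentages refer to the same set of NA sequences in a given period (an assumption, not a statement from the text), inclusion-exclusion gives a lower bound of max(0, a + b - 100) and an upper bound of min(a, b). The function name below is hypothetical.

```python
def co_occurrence_bounds(pct_a, pct_b):
    """Lower and upper bounds (percent) on isolates carrying both mutations.

    Assumes both percentages were measured on the same set of sequences.
    """
    low = max(0.0, pct_a + pct_b - 100.0)
    high = float(min(pct_a, pct_b))
    return low, high

# Marginal percentages for V91I and N233D as quoted in the text.
periods = [("early", 57, 51), ("middle", 82, 76), ("late", 86, 86)]
for period, v91i, n233d in periods:
    print(period, co_occurrence_bounds(v91i, n233d))
# early (8.0, 51.0)
# middle (58.0, 76.0)
# late (72.0, 86.0)
```

Under that assumption, at least 58% of middle-period and 72% of late-period isolates must carry both mutations, whereas the early-period figures alone guarantee only 8%.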
In the process of virus infection, NA functions as a tetramer (Fig. 2D) that removes sialic acid from cell-surface receptors, allowing newly made virions to be released and to spread to uninfected cells [16]. NA therefore serves as an important target for the development of anti-influenza drugs, such as oseltamivir (Tamiflu) [17] and zanamivir (Relenza) [18]. A single-point mutation of a residue located in the drug target domain (DTD), such as the H260Y mutation (corresponding to the H274Y mutation in H5N1 viruses), may result in viral resistance to oseltamivir [19]. Most recently, a number of reports indicated that several clinically isolated 2009 H1N1pdm strains with the NA H260Y mutation were resistant to the NA inhibitor oseltamivir [20,21]. However, we did not find an increased frequency of the H260Y mutation in the 2009 H1N1pdm NA sequences that we analyzed. Instead, we identified the V91I and N233D mutations in the majority of 2009 H1N1pdm isolates collected in the late period of the 2009 epidemic. Since the residue at the NA-233 position is also located in the DTD region (Fig. 2E) and is in close proximity to H260, it is worthwhile to investigate the potential effect of the N233D mutation on the sensitivity of the virus to NA inhibitors. Since the amino acid at NA-91 is not located in the DTD, the NA V91I mutation may have no direct effect on the drug sensitivity of the virus.
We identified one dominant mutation in HA, S206T, with mutation rates of 0%, 27%, 44% and 83% in the 2009 H1N1pdm strains collected in the pre-epidemic, early, middle and late periods, respectively (Table 3 and Fig. 1). This is a unique mutation because it was found neither in the 1918 and 1977 H1N1pdm viruses nor in the human, swine and avian IAVs. Interestingly, however, we found that the S206T mutation transiently appeared in the HA sequences of human H1N1 viruses collected in 1934 and in swine H1N1 viruses collected in 1976 and 1977. S206 is located in the receptor-binding domain (RBD) of HA (Fig. 2G-I) [22]. The binding of IAV to erythrocytes and host cells is mediated by the interaction of its HA RBD with the cell surface receptor containing sialic acid. The RBD sequence is thus the major determinant of IAV host specificity [23]; therefore, the HA-206 S→T mutation may directly affect the infectivity and transmissibility of 2009 H1N1pdm in humans.
Another unique dominant mutation, NS1-123 I→V, occurred in the NS1 protein during the pandemic in 2009. None of the IAVs collected in the pre-epidemic period carried this mutation, while 29%, 40% and 78% of the 2009 H1N1pdm strains collected in the early, middle and late periods, respectively, possessed the NS1-123 I→V mutation (Table 3 and Fig. 1). This dominant mutation has not been observed in other IAVs that caused past influenza pandemics. NS1, a 26-kDa protein, functions as a dimer (Fig. 2J). Its monomer consists of seven β-strands and three α-helices, which form the two functional domains, the RNA-binding groove (RBG) and the effector domain (ED) (Fig. 2K) [24]. NS1 is responsible for suppressing antiviral interferon (IFN) induction during viral replication by preventing activation of the latent transcription factors IRF-3 [25] and NF-κB [26]. The lethal H5N1 strains with a point mutation, D92E, or a deletion of residues 80-84 in the NS1 protein exhibited increased virulence, cytokine resistance or both [27]. The high efficacy of the 1918 H1N1pdm NS1 protein as an inhibitor of type I IFN production might have contributed to the exceptional virulence of that virus [25]. In the 2009 H1N1pdm virus, we did not find D92E or other mutations that confer the high virulence of H5N1 and 1918 H1N1pdm strains. Similarly, none of the previously identified virulence factors, such as the PB2-627 E→K mutation [28,29], has been identified in the 2009 H1N1pdm. Consequently, the potential role in virulence and host adaptation of the NS1-123 I→V mutation, which is located in the ED of NS1, needs to be clarified.
In summary, our study confirms that the 2009 H1N1pdm virus is more closely linked to the 1918 H1N1pdm than to any other pandemic influenza virus. We identified one dominant mutation at a signature position (NP-100) and four dominant mutations at non-signature positions (NA-91, NA-233, HA-206, and NS1-123). Except for that at NA-91, all of these mutant residues are located in viral functional domains, suggesting that they may play roles in the human adaptation and virulence of 2009 H1N1pdm.

Protein Modeling Analysis
Homology-based structural models of the functional domains with or without mutations were constructed with templates downloaded from the Protein Data Bank, including NP (PDB ID: 2IQH), NA (PDB ID: 2HTY), HA (PDB ID: 1RUZ), and NS1 (PDB ID: 3F5T). Briefly, the model of the corresponding protein (e.g., NP) was downloaded from the Protein Data Bank and opened with the PyMOL program. The residues in the protein model were replaced with those at the corresponding positions in the protein to be analyzed using the "Mutagenesis" function of the PyMOL program [31] (http://www.pymol.org). The main functional domains in the protein were displayed and analyzed.
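The residue replacement described above was carried out on three-dimensional models in PyMOL. As a sequence-level counterpart only, and not a reproduction of that structural procedure, the sketch below applies a substitution written in the notation used in this paper (e.g., V100I) to a protein sequence and checks that the expected wild-type residue is present. The function name and the toy sequence are hypothetical.

```python
import re

def apply_substitution(sequence, mutation):
    """Apply a substitution such as 'V100I' (1-based position) to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {sequence[pos - 1]}")
    # Rebuild the string with the new residue at the 1-based position.
    return sequence[:pos - 1] + new + sequence[pos:]

toy_sequence = "MAVT"  # made-up four-residue sequence
print(apply_substitution(toy_sequence, "V3I"))  # MAIT
```

Checking the wild-type residue first guards against numbering mismatches between a template structure and the sequence being analyzed.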