Predicting Antigenic Variants of Influenza A/H3N2 Viruses

Models based on amino acid changes in influenza hemagglutinin protein were compared to predict antigenic variants of influenza A/H3N2 viruses.

Current inactivated influenza vaccines provide protection when vaccine antigens and circulating viruses share a high degree of similarity in hemagglutinin protein. Five antigenic sites in the hemagglutinin protein have been proposed, and 131 amino acid positions have been identified in the five antigenic sites. In addition, 20, 18, and 32 amino acid positions in the hemagglutinin protein have been identified as mouse monoclonal antibody-binding sites, positively selected codons, and substantially diverse codons, respectively. We investigated these amino acid positions for predicting antigenic variants of influenza A/H3N2 viruses in ferrets. Results indicate that the model based on the number of amino acid changes in the five antigenic sites is best for predicting antigenic variants (agreement = 83%). The methods described in this study could be applied to predict vaccine-induced cross-reactive antibody responses in humans, which may further improve the selection of vaccine strains.
Influenza viruses cause substantial medical and social problems throughout the world, and vaccination is the primary method for preventing influenza and its complications. Of the three types of influenza viruses (A, B, and C), only influenza A and B viruses cause epidemic human disease. Hemagglutinin (HA) and neuraminidase proteins are the two surface antigens that induce protective antibody responses and are the basis for subtyping influenza A viruses. Influenza B viruses are not categorized into subtypes (1). Since 1977, influenza A/H1N1, A/H3N2, and B viruses have been in global circulation, and these three viruses are currently included as vaccine components. Current inactivated vaccines provide essential protection when the vaccine antigens and the circulating viruses share a high degree of similarity in the HA protein. Because new influenza virus antigenic variants emerge frequently from accumulation of point mutations in the HA protein (i.e., antigenic drift), influenza vaccine antigens need to be updated frequently, based on the results of global influenza surveillance (1), which includes clinical, virologic, and immunologic surveillance. In virologic surveillance, influenza viruses are characterized antigenically on the basis of ferret serum antibody cross-reactivity. Antigenic variants selected serologically are then tested for antibody cross-reactivity in human sera to evaluate the potential cross-protection against the antigenic variants provided by the current vaccines and to select vaccine strains for the next season (2,3).
The HA protein of influenza viruses is synthesized as a single polypeptide (HA0) that is subsequently cleaved into two polypeptides (HA1 and HA2) and forms homotrimers. The HA1 polypeptide mutates more frequently than the HA2 polypeptide and plays a major role in natural selection (4,5). The three-dimensional (3-D) structure of the HA protein of A/Aichi/2/68 (H3N2) has been determined, and five antigenic sites on the HA1 polypeptide have been proposed conceptually (4-6). Of the 329 amino acid positions on HA1, 131 lie on or near the five antigenic sites (7,8). Twenty amino acid positions on HA1 have been mapped, based on laboratory variants selected in the presence of mouse monoclonal antibodies (9,10). In addition, 18 amino acid positions have been identified as being under positive selection by comparing 357 viruses isolated from 1984 to 1996 (7). In a recent study, 32 amino acid positions were identified as diverse codons by comparing 525 viruses isolated from 1968 to 2000 (11). However, the importance of these amino acid positions in terms of predicting antibody cross-reactivity is unclear. Therefore, we conducted this study to explore the usefulness of these amino acid positions for predicting antigenic variants of influenza A/H3N2 viruses. The methods described in this study could be used to predict vaccine-induced cross-reactive antibody responses in humans, which may further improve the selection of vaccine strains.

Cross-Reactive Antibody Data
In the current global influenza surveillance system, influenza viruses are characterized antigenically based on ferret serum hemagglutination-inhibition (HAI) antibody cross-reactivity. We first screened publications for influenza H3N2 virus cross-reactive antibody data. We then searched for the HA1 amino acid sequences of the H3N2 viruses that had cross-reactive antibody data (www.flu.lanl.gov) (8). Table 1 shows the full name, abbreviation, identification (ID) by type, and accession code of the H3N2 viruses (12-16). Six sets of ferret serum HAI cross-reactivity data were available for analysis. The first set included 11 viruses (55 pairwise comparisons, virus ID: A to K) isolated from 1971 to 1979 (12). The second set included 8 viruses (28 pairwise comparisons, virus ID: J, L to R) isolated from 1979 to 1987 (17). The third set included 10 viruses (45 pairwise comparisons, virus ID: S to AB) isolated from 1989 to 1994 (13). The fourth set included 8 viruses (28 pairwise comparisons, virus ID: AC to AJ) isolated from 1994 to 1996 (18). The fifth set included 5 viruses (10 pairwise comparisons, virus ID: AE, AK to AN) isolated from 1995 to 1999 (15). The sixth set included 6 viruses (15 pairwise comparisons, virus ID: AN to AT) isolated from 1999 to 2002 (16). A mathematical method has been proposed to calculate "antigenic relatedness" between two viruses (presented as a percentage) as the geometric mean of the two ratios between the heterologous and homologous antibody titers (19,20).
Because our study investigates the relationship between antigenic difference and amino acid changes in the HA1 polypeptide, the mathematical method was modified to calculate "antigenic distance" (i.e., the reciprocal of antigenic relatedness). For example, if the homologous titers of two viruses are 640 and 640 and the two heterologous titers against each other are 320 and 320, the antigenic relatedness between these two viruses is ([320 × 320]/[640 × 640])^(1/2) = 50%, and the antigenic distance between these two viruses is ([640 × 640]/[320 × 320])^(1/2) = 2. Table 2 shows the antigenic distances of the 55 pairwise comparisons among the 11 viruses in the first set. In total, 181 pairwise comparisons among 45 viruses were available for analysis. Among the 181 pairwise comparisons, 56 (31%) have an antigenic distance <4 (i.e., similar antigenicity), and 125 (69%) have an antigenic distance ≥4 (i.e., antigenic variant) (21).
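The calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the geometric-mean definition, not the authors' code; the function and parameter names are hypothetical, and titers are assumed to be given as plain reciprocal dilutions (e.g., 640 for 1:640).

```python
import math


def antigenic_relatedness(homologous_a, homologous_b, hetero_ab, hetero_ba):
    """Geometric mean of the two heterologous/homologous titer ratios.

    Returned as a fraction (0.5 means 50%). Argument names are
    illustrative assumptions, not taken from the cited method.
    """
    return math.sqrt((hetero_ab * hetero_ba) / (homologous_a * homologous_b))


def antigenic_distance(homologous_a, homologous_b, hetero_ab, hetero_ba):
    """Reciprocal of antigenic relatedness, as defined in the text."""
    return 1.0 / antigenic_relatedness(
        homologous_a, homologous_b, hetero_ab, hetero_ba
    )


if __name__ == "__main__":
    # Worked example from the text: homologous titers 640 and 640,
    # heterologous titers 320 and 320.
    print(antigenic_relatedness(640, 640, 320, 320))  # 0.5
    print(antigenic_distance(640, 640, 320, 320))     # 2.0
```

Because the distance is a geometric mean of twofold-dilution titer ratios, it can take non-integer values such as 1.4 or 5.7.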

Sequence Alignment
Amino acid sequences of the HA1 polypeptide were downloaded from the Los Alamos Influenza Sequence Database (8) or entered from the original publications if they were not available from that database. Amino acid sequences of the 45 viruses were harmonized to the same length (329 residues) and were numbered according to the A/Aichi/2/68 HA1 sequence because the 3-D structure of the A/Aichi/2/68 hemagglutinin protein has been determined (4-6). Pairwise alignments among the 45 sequences were conducted by using S-Plus 2000 (Insightful Corporation, Seattle, WA). Pairwise-aligned amino acid sequence data were transformed into 0 (without change) and 1 (with change) and were then linked with the pairwise antigenic distance data for the prediction analyses.
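The 0/1 transformation described above can be illustrated with a short Python sketch. The original analysis was done in S-Plus 2000; the code below is only an assumed equivalent, the function names are hypothetical, and the two 10-residue strings are toy sequences, not real HA1 data. Positions are treated as 1-based so that a set of positions (for example, the positions of an antigenic site) can be passed directly.

```python
def change_vector(seq_a, seq_b):
    """Return a list of 0 (no change) / 1 (change) per aligned position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [0 if a == b else 1 for a, b in zip(seq_a, seq_b)]


def count_changes(seq_a, seq_b, positions=None):
    """Count amino acid changes, optionally restricted to given positions.

    `positions` is an iterable of 1-based positions; None means the
    whole sequence.
    """
    changes = change_vector(seq_a, seq_b)
    if positions is None:
        return sum(changes)
    return sum(changes[p - 1] for p in positions)


if __name__ == "__main__":
    # Toy sequences that differ at positions 3 and 8.
    virus_1 = "QDLPGNDNST"
    virus_2 = "QDFPGNDKST"
    print(change_vector(virus_1, virus_2))
    print(count_changes(virus_1, virus_2))
    print(count_changes(virus_1, virus_2, positions={3, 4, 5}))
```

Restricting the count to a subset of positions is what distinguishes one prediction model from another; the whole-sequence count corresponds to using all positions.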

Predicting Antigenic Variants
The first model was based on amino acid differences in the whole HA1 polypeptide (329 residues). The second model was based on amino acid differences in the five antigenic sites (131 residues) (online Appendix available at www.cdc.gov/ncidod/eid/vol10no8/04-0107.htm#app) (7,8). The third model was based on the 20 positions related to mouse monoclonal antibody binding (online Appendix) (9,10). The fourth model was based on the 18 positions under positive selection (online Appendix) (7). The fifth model was based on the 32 codons of substantial diversity (online Appendix) (11). For evaluating the qualitative performance of the five prediction models, an antigenic variant was defined as antigenic distance ≥4 (21). Positive predictive value (PPV), negative predictive value (NPV), and agreement of the five prediction models were calculated, and different cutoff levels of amino acid differences were compared by using receiver-operating characteristic analysis (22).

Model One
Figure A shows the scatterplot between antigenic distance and number of amino acid changes in the HA1 polypeptide (329 residues). Among the 181 pairwise comparisons, the antigenic distance ranged from 1 to 181, and the number of amino acid changes in the HA1 polypeptide ranged from 1 to 36. Overall, the antigenic distance correlated with the number of amino acid changes in the HA1 polypeptide (R = 0.74, p < 0.001). Different cutoffs of amino acid changes in the HA1 polypeptide were evaluated for predicting antigenic variants. The highest agreement was found with a cutoff of >7 amino acid changes, at which the NPV, PPV, and agreement were 66% (31/47), 81% (109/134), and 77% (140/181), respectively (Figure A).
Table 3 shows some pairwise comparisons with unusual patterns between antigenic distances and amino acid changes. A/Shanghai/11/87 and A/Victoria/7/87 were antigenically different (antigenic distance = 5.7), but they had only one amino acid difference (R247S). Position 247 is located in antigenic site D. In addition to the amino acid change at position 247, A/Shanghai/11/87 had two more amino acid differences from A/Sichuan/2/87 (E156K, S186V) and A/Sydney/1/87 (A138S, N193K), but these three viruses were antigenically similar (antigenic distance <4). A/Victoria/7/87 had only two amino acid differences from A/Sichuan/2/87 (K156E, V186S) and A/Sydney/1/87 (S138A, K193N), but A/Victoria/7/87 was antigenically different from these two viruses (Table 3). Positions 156, 186, and 193 are located in antigenic site B, and position 138 is located in antigenic site A. Moreover, positions 156 and 193 are also located in the mouse monoclonal antibody-binding sites (online Appendix).
The unusual patterns between antigenic distances and amino acid differences may be due to interaction between amino acid changes in the hemagglutinin or to laboratory variability; further experiments are needed to distinguish these explanations. In addition, A/Victoria/3/75 and A/Victoria/112/76 had only two amino acid differences (L3F, R229G), but they were antigenically different (antigenic distance = 5.7) (Table 3), which also requires further experiments to clarify. Position 3 is not located in any antigenic site, and position 229 is located in antigenic site D. We found that 3 of 80 pairwise comparisons with >12 amino acid changes had antigenic distance <4 (Figure A).
A/Sydney/5/97 and A/Panama/2007/99 had 12 amino acid differences, but these two viruses were antigenically similar (antigenic distance = 1.4) based on ferret serum HAI titers (Table 3). These pairwise comparisons may indicate that interaction of multiple amino acid changes could potentially preserve the 3-D structure of HA1. Alternatively, the ferret serum HAI assay system may not be sensitive enough to detect the antigenic difference.

Model Two
Figure B shows the scatterplot between antigenic distance and number of amino acid changes in the five antigenic sites (131 amino acid positions). Among the 181 pairwise comparisons, amino acid changes in the five antigenic sites ranged from 1 to 32. Overall, the antigenic distance correlated with the number of amino acid changes in the five antigenic sites (R = 0.77, p < 0.001). Different cutoffs of amino acid changes in the five antigenic sites were evaluated for predicting antigenic variants. The highest agreement was found by using a cutoff of >7 amino acid changes, at which the NPV was 71% (42/59), PPV was 89% (108/122), and agreement was 83% (150/181) (Figure B).

Model Three
Figure C shows the scatterplot between antigenic distance and number of amino acid changes in the 20 amino acid positions related to mouse monoclonal antibody binding. Overall, the antigenic distance correlated with the number of amino acid changes in the 20 amino acid positions (R = 0.74, p < 0.001). Different cutoffs of amino acid changes in the previously defined 20 amino acid positions were evaluated for predicting antigenic variants. The highest agreement was found by using a cutoff of >2 amino acid changes, at which the NPV was 64% (32/50), PPV was 82% (107/131), and agreement was 77% (139/181) (Figure C).

Model Four
Figure D shows the scatterplot between antigenic distance and number of amino acid changes in the 18 amino acid positions under positive selection. Overall, the antigenic distance correlated moderately with the number of amino acid changes in the 18 amino acid positions (R = 0.43, p < 0.001). Different cutoffs of amino acid changes in the 18 amino acid positions were evaluated for predicting antigenic variants. The highest agreement was found by using a cutoff of >1 amino acid changes, at which the NPV was 55% (6/11), PPV was 71% (120/170), and agreement was 70% (126/181) (Figure D).

Model Five
Figure E shows the scatterplot between antigenic distance and number of amino acid changes in the 32 codons with substantial diversity. Overall, the antigenic distance correlated moderately with the number of amino acid changes in the 32 codons (R = 0.68, p < 0.001). Different cutoffs of amino acid changes in the 32 codons were evaluated for predicting antigenic variants. The highest agreement was found by using a cutoff of >2 amino acid changes, at which the NPV was 72% (13/18), PPV was 74% (120/163), and agreement was 74% (133/181) (Figure E).
Overall, the model based on the number of amino acid changes in the five antigenic sites has the highest correlation with the antigenic distance (R = 0.77) and the best performance for predicting antigenic variants (agreement = 83%).
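The cutoff evaluation reported for each model can be sketched as follows. This is an illustrative Python reconstruction, not the authors' analysis: the function names and the toy data are hypothetical, a pair is predicted to be a variant when its number of changes exceeds the cutoff (following the cutoffs as printed above), and a pair is counted as an observed variant when its antigenic distance is at least `variant_distance` (the inclusive comparison is an assumption of this sketch).

```python
def evaluate_cutoff(pairs, cutoff, variant_distance=4.0):
    """Compute PPV, NPV, and agreement for one cutoff.

    `pairs` is a sequence of (n_changes, antigenic_distance) tuples.
    Predicted variant: n_changes > cutoff (assumed strict).
    Observed variant: antigenic_distance >= variant_distance (assumed).
    """
    tp = fp = tn = fn = 0
    for n_changes, distance in pairs:
        predicted = n_changes > cutoff
        observed = distance >= variant_distance
        if predicted and observed:
            tp += 1
        elif predicted and not observed:
            fp += 1
        elif not predicted and not observed:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        # PPV and NPV are undefined (None) when no pair is predicted
        # positive or negative, respectively.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
        "agreement": (tp + tn) / total,
    }


def best_cutoff(pairs, cutoffs):
    """Return the first cutoff with the highest agreement."""
    return max(cutoffs, key=lambda c: evaluate_cutoff(pairs, c)["agreement"])


if __name__ == "__main__":
    # Toy (n_changes, antigenic_distance) pairs, not data from the study.
    toy_pairs = [(1, 1.4), (3, 2.0), (5, 2.8), (8, 5.7),
                 (9, 8.0), (12, 1.4), (2, 5.7), (15, 16.0)]
    print(evaluate_cutoff(toy_pairs, cutoff=7))
    print(best_cutoff(toy_pairs, range(0, 16)))
```

Under these definitions, agreement is the sum of the NPV and PPV numerators divided by the number of pairwise comparisons. For the five-antigenic-site model, the reported counts give (42 + 108)/181 = 150/181, or 83%.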

Discussion
Wilson and Cox proposed that a drift variant of epidemiologic importance usually contains ≥4 amino acid changes located on ≥2 of the five antigenic sites, but they did not specify the amino acid positions in the five antigenic sites (5). Our study further showed that the model based on the number of amino acid changes in the 131 amino acid positions in the five antigenic sites had the highest correlation with the antigenic distance and the best performance for predicting antigenic variants. Theoretically, not all 131 amino acid positions in the five antigenic sites play a critical role in determining antigenicity, and some immunodominant positions (i.e., major antibody-binding sites) could be identified by using bioinformatics models and reverse genetic techniques (23-25). A model based on the immunodominant positions could potentially perform better than the model based on the five antigenic sites.
The model based on the 20 amino acid positions related to mouse monoclonal antibody binding has only moderate performance for predicting antigenic variants (R = 0.74, agreement = 77%), which indicates that mouse and ferret antibodies may recognize different B-cell epitopes. In addition, the low performance of models four and five for predicting antigenic variants is not surprising, because these two models identified the amino acid positions only on the basis of virus sequence data without incorporating antigenic properties.
Antigenic variants of influenza viruses are currently determined with the ferret serum HAI assay. The ferret serum HAI assay works well to distinguish major drift variants, but moderate differences are difficult to define reliably (26). As shown in Table 3, some unusual patterns between antigenic distance and amino acid changes in the HA1 may be caused by laboratory variability of the ferret serum HAI assay. The prediction models proposed in the present study may perform better if a more reliable assay system is used. Several studies have shown that neutralization assays are more sensitive for detecting influenza virus antibody responses than HAI assays (27,28). However, traditional neutralization assays based on cytopathic effect are labor-intensive and not suitable for a large-scale surveillance system. A simplified EIA-based neutralization assay may be a potential solution (29).
Several studies have documented that one to three amino acid changes in the HA1 of influenza H1N1 and H3N2 viruses could reduce the antigenicity and efficacy of inactivated vaccines in animal models (30-33), which is consistent with our results (Table 3). In animal studies, a single mutation at amino acid position 156 of the HA1 of two H3N2 viruses was linked to reduced antigenicity (32,33). Position 156 is located in antigenic site B and in a mouse monoclonal antibody-binding site (see online Appendix). Overall, this evidence may indicate the existence of immunodominant positions in the HA1 and emphasizes the importance of identifying the immunodominant positions to monitor the selection of vaccine strains and the process of vaccine manufacturing.
The current global surveillance system largely relies on ferret serum HAI data for selection of influenza vaccine strains (2,3). In some cases, human and ferret cross-reactive antibody data were not consistent (34,35). The methods described in this study could be applied to predict vaccine-induced cross-reactive antibody responses in humans, which may further improve the selection of vaccine strains (35).