West Nile Virus Vaccine Design by T Cell Epitope Selection: In Silico Analysis of Conservation, Functional Cross-Reactivity with the Human Genome, and Population Coverage

West Nile Virus (WNV) causes a debilitating and life-threatening neurological disease in humans. Since its emergence in Africa 50 years ago, new strains of WNV and an expanding geographical distribution have increased public health concerns. There are no licensed therapeutics against WNV, limiting effective infection control. Vaccines represent the most efficacious and efficient medical intervention known. Epitope-based vaccines against WNV remain significantly underexploited. Here, we use a selection protocol to identify a set of conserved prevalidated immunogenic T cell epitopes comprising a putative WNV vaccine. Experimentally validated immunogenic WNV epitopes and WNV sequences were retrieved from the IEDB and West Nile Virus Variation Database. Clustering and multiple sequence alignment identified a smaller subset of representative sequences. Protein variability analysis identified evolutionarily conserved sequences, which were used to select a diverse set of immunogenic candidate T cell epitopes. Cross-reactivity and human leukocyte antigen-binding affinities were assessed to eliminate unsuitable epitope candidates. Population protection coverage (PPC) was calculated for individual epitopes and epitope combinations against the world population. Three CD8+ T cell epitopes (ITYTDVLRY, TLARGFPFV, and SYHDRRWCF) and one CD4+ epitope (VTVNPFVSVATANAKVLI) were selected as a putative WNV vaccine, with an estimated PPC of 97.14%.


Introduction
West Nile Virus (WNV) is a mosquito-borne Flavivirus that causes West Nile Fever (WNF) and West Nile neuroinvasive disease (WNND) in birds, humans, and horses [1]. Originating in the West Nile regions of Uganda in 1937, WNV has now become a prevalent human infection. Before mid-1990, the virus was confined to Africa and Europe; it then spread to North America, the Middle East, and West Asia. Two and a half million cases were reported between 1999 and 2010, of which 12,000 were WNND, resulting in over 1300 deaths. Thus, WNV has become a major global public health concern.
WNV infection manifests as one of three disease states: asymptomatic carrier, West Nile Fever (WNF), and West Nile neuroinvasive disease (WNND) [1]. After the initial mosquito bite, 3-14 days elapse before the first symptoms, with rapid progression thereafter. Asymptomatic carriers represent 75% of all cases, with 25% presenting with WNF or WNND [2]. WNF is a mild, self-limiting disease which presents as general fever, malaise, and muscle and gastrointestinal pain [3]. Overall mortality is ~4.2% but rises to 9.6% in WNND.
WNV is part of the Flavivirus genus, which comprises over 70 viruses and many human pathogens, including numerous mosquito-borne viruses [4]. WNV is transmitted from an infected host via a mosquito bite, primarily by the Culex spp. and to a lesser extent the Aedes spp. [5]. Human and equine infections occur outside the natural transmission lifecycle sustained between mosquitos and birds [6]. Horses and humans are "dead-end" hosts, unable to reinfect mosquitos due to insufficient viremia [4].
The WNV genome comprises a single-stranded nonsegmented positive sense RNA of ~11,000 nucleotides [7]. It is translated into a single polyprotein, which is cleaved by viral proteases into ten mature viral proteins: three 5′ structural segments (C, PrM, and E) and seven 3′ nonstructural protein elements (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5). Phylogenetic analysis suggests that there are at least seven distinct WNV lineages [8]. While infection from other lineages is known, major human outbreaks arise solely from lineages 1, 2, and 5 [6].
There are no current FDA-approved WNV treatments. Reducing exposure to mosquitos is the main strategy, via mosquito nets, protective clothing, and insect repellent and by staying indoors [8]. In the absence of viable therapeutic interventions, effective vaccines could provide long-term protection against WNV. No commercial human WNV vaccines exist [9], but successful vaccines for closely related Flaviviruses (Japanese encephalitis virus, tick-borne virus, and Yellow Fever virus) suggest that an effective, well-tolerated WNV vaccine is feasible [10]. Adaptive immune responses promote viral clearance and control WNV infection [11]. Several novel vaccine candidates are currently in phase I and phase II clinical trials [12], yet WNV clinical trials face several challenges including late or sporadic presentation of symptoms, asymptomatic carriers, inconsistency of outbreaks, trial logistics, and comorbidities in the elderly [8].
The only trialed WNV inactivated vaccine comprised a minimally pathogenic Kunjin virus incubated with hydrogen peroxide [9]. PrM and E proteins of lineages 1 and 2 have been exploited as subunit vaccines. Lineages 1 and 2 can provide crossprotection against other lineages suggesting that a single universal vaccine is feasible [13]. Recent approaches have tried to induce neutralising antibodies by targeting highly immunogenic E antigens. Capitalising on the successful equine vaccine PreveNile [12], the most promising current human vaccine, ChimeriVax-WN02, uses the Yellow Fever 17D backbone to incorporate PrM and E genes [10]. Three gene mutations (L107F, A316V, and K440R) attenuated the virus [13].
PrM, E, NS3, and NS4B proteins are commonly targeted by CTLs [11]. Long-lasting immunity has been achieved in phase I subunit vaccine trials using adjuvants and DIII regions of the WNV E protein [4]. A DNA vaccine expressing the NY99 capsid protein generates a strong CD4+ immune response with a significant rise in IL-2 and IFN-γ levels [12]. Vaccines expressing domain II of the E protein have produced WNV-neutralising antibodies in phase I trials [9].
The lack of extant WNV vaccines prompts us to evaluate potential epitope ensemble vaccines as an alternative, exploiting our evolving approach to vaccine design. We have exemplified this by identifying putative vaccines against hepatitis C [14], influenza [15], malaria [16], Epstein-Barr virus [17], TB [18,19], and dengue [20]. By focusing on highly conserved immunogenic epitopes with a broad population coverage, we identified optimal selections of prevalidated epitopes of proven immunogenicity. To avoid undesired immunogenicity in designed vaccines, we extend our prior work here to filter out epitope cross-reactivity with the human genome.

Analysis of Epitope Sequence Variability.
Conserved epitopes were identified by analysing conservation of the MSA, using the Protein Variability Server (PVS) [21] (URL: http://imed.med.ucm.es/PVS/). The first sequence in the alignment was used as a reference. Sequence variability was masked, and only fragments with a length greater than or equal to 9 were selected. The Shannon entropy threshold was 0.5. CD8+ and CD4+ epitopes with at least 50% overlap were retained for subsequent analysis.
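
The variability masking described above can be illustrated with a minimal Python sketch. It is not the PVS implementation: the function names, the base-2 logarithm, the strict "H < 0.5" comparison, and the treatment of gaps as ordinary symbols are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (log base 2, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_fragments(msa, threshold=0.5, min_len=9):
    """Return (start, fragment) pairs taken from the reference sequence.

    The reference is the first sequence of the alignment. A column counts as
    conserved when its entropy is below `threshold`; only runs of at least
    `min_len` consecutive conserved columns are reported.
    """
    reference = msa[0]
    fragments = []
    start = None
    # Iterate one position past the end so a trailing run is closed too.
    for i in range(len(reference) + 1):
        conserved = (
            i < len(reference)
            and shannon_entropy([seq[i] for seq in msa]) < threshold
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                fragments.append((start, reference[start:i]))
            start = None
    return fragments
```

Candidate epitopes could then be compared against the returned fragments to apply the overlap criterion.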

HLA Binding Profile Prediction and Calculation of Population Protection Coverage (PPC).
Binding affinities of conserved CD8+ (http://tools.iedb.org/mhci/) and CD4+ (http://tools.iedb.org/mhcii/) T cell epitopes were predicted using IEDB. An HLA I reference set was used for MHC I epitopes (Weiskopf, Angelo et al. 2013) and an HLA II reference set was used for MHC II epitopes (Greenbaum et al. [22]). Class I binding profiles present in the top one percentile rank were retained. For MHC Class II, epitopes less than 15 amino acids in length were eliminated and binding profiles in the top five percentile rank were obtained.
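
The rank and length cut-offs can be expressed as a simple filter over prediction rows. The sketch below is illustrative only: the row layout (`peptide`, `allele`, `percentile_rank`) is a hypothetical structure rather than the IEDB output format, and treating the cut-offs as inclusive is an assumption.

```python
def filter_predictions(rows, mhc_class):
    """Apply the cut-offs described in the text to predicted binders.

    Class I: percentile rank <= 1.
    Class II: percentile rank <= 5 and peptide length >= 15.
    """
    if mhc_class == "I":
        max_rank, min_len = 1.0, 0
    elif mhc_class == "II":
        max_rank, min_len = 5.0, 15
    else:
        raise ValueError("mhc_class must be 'I' or 'II'")
    return [
        row for row in rows
        if row["percentile_rank"] <= max_rank and len(row["peptide"]) >= min_len
    ]


def binding_profiles(rows):
    """Group retained predictions into {peptide: set of binding alleles}."""
    profiles = {}
    for row in rows:
        profiles.setdefault(row["peptide"], set()).add(row["allele"])
    return profiles
```

The resulting per-epitope allele sets are the binding profiles used in the coverage calculations.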

Journal of Immunology Research

Conserved CD8+ epitopes were also analysed using EPISOPT (URL: http://bio.med.ucm.es/episopt.html) selecting all ethnic groups in the US population (Caucasian, Black, Hispanic, Asian, and native North American) and a PPC above 95%. Global PPC values for highly-conserved epitopes were calculated using IEDB (http://tools.iedb.org/tools/population/iedb_input). MHC I and MHC II epitopes were then ranked by PPC. Epitopes were combined within each class to calculate overall PPC values. To create a potential 'universal' vaccine candidate, CD8+ and CD4+ epitopes were combined and PPC calculated using IEDB as above.
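
To make the idea of population coverage concrete, the following Python sketch estimates the fraction of individuals carrying at least one allele to which an epitope set binds. It is a simplified model and not the IEDB or EPISOPT algorithm: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and all function and argument names are hypothetical.

```python
def locus_coverage(allele_freqs, covered_alleles):
    """Fraction of individuals with at least one covered allele at one locus.

    `allele_freqs` maps allele name -> population frequency. Under
    Hardy-Weinberg proportions, an individual carries no covered allele
    with probability (1 - p)^2, where p is the summed covered frequency.
    """
    p = sum(f for allele, f in allele_freqs.items() if allele in covered_alleles)
    return 1.0 - (1.0 - p) ** 2


def population_coverage(freqs_by_locus, covered_alleles):
    """Combine loci, assuming they are inherited independently."""
    uncovered = 1.0
    for allele_freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(allele_freqs, covered_alleles)
    return 1.0 - uncovered


def ensemble_coverage(freqs_by_locus, profiles, epitopes):
    """Coverage of an epitope combination: pool the binding profiles first."""
    covered = set().union(*(profiles[e] for e in epitopes))
    return population_coverage(freqs_by_locus, covered)
```

Because the binding profiles are pooled before coverage is computed, adding an epitope whose alleles are already covered leaves the estimate unchanged.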

Results
Searching IEDB identified 165 linear CD4+ and CD8+ T cell epitopes presented to T cells during WNV infection: 53 HLA class I and 112 HLA class II epitopes. Epitope length ranged from 8 to 20 amino acids. Genomic sequences representing all strain variants of WNV were also retrieved from the NCBI West Nile Virus Variation Database. 126 unique protein sequences were retrieved. AJR27178, AJR27181, and AJR27181; AJW82677 and AKH144860; and AJW59216 and AJW59220 were found to be identical. Only unique sequences were retained. Sequence clustering using CD-HIT identified nonredundant sequences representative of all WNV sequences. All major human lineages were present; 5 sequences were generated with two representing lineage 2: AJW59217 (USA, 2002), AJR27898 (Italy, 2014), AHB37632 (Italy, 2013/08), AMZ00438 (India, 1988/02/12), and ALK02494 (Australia, 1991). AHB37632 was discarded due to sequence anomalies. A multiple sequence alignment (MSA) was performed on the remaining 4 sequences. All four lineages were highly conserved: 3069 positions had identical amino acids (89.37%), and only 64 positions showed variable amino acids (1.87%). The remainder was either partially conserved (3.17%) or highly conserved (5.59%). Analysis with PVS showed that 122 of the 165 epitopes had ≥50% sequence identity to the masked WNV reference sequence: 32 CD8+ 9mers, three 10mers, and one 11mer, with 19 epitopes showing 100% sequence identity; 86 CD4+ epitopes had ≥50% sequence identity, with 19 of the 86 epitopes showing 100% sequence identity.
As therapeutic peptides can evoke autoimmune responses, conserved epitopes were screened computationally to detect undesirable cross-reactivity against human tissues. Cross-reactivity (CR) was computed using iCrossR for HLA-I 9mer epitopes only. The output for each mutation was averaged to calculate an overall cross-reactivity index (ICR).
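
The averaging step can be sketched as follows. The input layout (a mapping from mutation level to a cross-reactivity score) is a hypothetical stand-in for the iCrossR output, and the default cut-off of 0.01 is taken from the value reported for the ICR in the results.

```python
from statistics import mean


def cross_reactivity_index(scores_by_mutation_level):
    """Average the per-mutation-level scores into a single ICR value."""
    return mean(scores_by_mutation_level.values())


def passes_cr_filter(scores_by_mutation_level, threshold=0.01):
    """An epitope is retained when its averaged ICR does not exceed the cut-off."""
    return cross_reactivity_index(scores_by_mutation_level) <= threshold
```

Note that an epitope can show a non-zero score at one mutation level and still pass once the levels are averaged.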
Our results show that none of the epitopes have an ICR greater than 0.01; thus, it seems unlikely that self-antigen recognition and toxic effects would occur, although many epitopes did exhibit CR when tested at the three mutation levels. HLA-I binding profiles and PPC were calculated using EPISOPT and IEDB. All CTL 9mer epitopes were entered into EPISOPT to estimate the PPC for the five US ethnic groups (see Table 1). ITYTDVLRY had the largest number of binding alleles (10) and the highest PPC. EPISOPT analysis showed that a PPC > 95% could be reached using HLA-I alleles alone.
Using three epitopes (TLARGFPFV, GPIRFVLAL, and ITYTDVLRY), the maximum PPC was 97.65%, with 24 distinct class I restrictions. A total of 190 different combinations of four epitopes achieved a PPC > 95%. TLARGFPFV and ITYTDVLRY were present in most highly scoring epitope sets, with many epitopes absent from all candidate ensembles.
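
Scoring every epitope combination of a given size can be sketched with an exhaustive search. This is an illustration rather than the EPISOPT algorithm: the coverage function is a simplified Hardy-Weinberg estimate, the locus is inferred from the text before "*" in the allele name, and the names used are hypothetical.

```python
from itertools import combinations


def coverage(alleles, freqs):
    """Simplified coverage estimate for a pooled allele set.

    Assumes Hardy-Weinberg proportions per locus and independent loci.
    """
    p_by_locus = {}
    for allele in alleles:
        locus = allele.split("*")[0]
        p_by_locus[locus] = p_by_locus.get(locus, 0.0) + freqs.get(allele, 0.0)
    uncovered = 1.0
    for p in p_by_locus.values():
        uncovered *= (1.0 - p) ** 2
    return 1.0 - uncovered


def rank_combinations(profiles, freqs, size):
    """Score every epitope combination of `size`, highest coverage first."""
    scored = []
    for combo in combinations(sorted(profiles), size):
        pooled = set().union(*(profiles[e] for e in combo))
        scored.append((coverage(pooled, freqs), combo))
    return sorted(scored, reverse=True)


def combinations_above(profiles, freqs, size, target):
    """List the combinations of `size` whose coverage exceeds `target`."""
    return [c for score, c in rank_combinations(profiles, freqs, size) if score > target]
```

Exhaustive enumeration is practical only for small candidate pools, since the number of combinations grows quickly with pool and ensemble size.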
HLA-I binding profiles were also obtained using IEDB. Peptides in the top 1% rank were retained to ensure strong binding and sufficient immunogenicity. More HLA-A binding alleles (14) were seen than HLA-B (9). The IEDB PPC tool predicted that the top EPISOPT ensemble had only an 84.68% PPC. To achieve a PPC of >95%, 6 epitopes are needed: TLARGFPFV, ITYTDVLRY, KSYETEYPK, SYHDRRWCF, MPNGLIAQF, and GPIRFVLAL (see Table 2).
HLA-II binding profiles were estimated using IEDB, retaining epitopes in the top 5% rank. PPC calculation showed that individual epitopes had a relatively high PPC, with 8 having values over 50%. VTVNPFVSVATANAKVLI and GEFLLDLRPATAWSLYAV had the highest value: 70.55%. ILVSLAAVVVNPSVKTVR and VTVNPFVSVATANAKVLI achieved a combined PPC of 81.81%. The addition of further epitopes had no effect. Many CD4+ epitopes had binding profiles that were subsets of other epitopes. These epitopes were removed, leaving a set of HLA-II alleles covering all epitopes (see Table 3). Many of the high-scoring epitopes also had overlapping binding profiles.
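
Removing epitopes whose binding profiles are subsets of others amounts to a set-containment check. The sketch below is one possible reading of that step; the function name is hypothetical, and keeping the alphabetically first epitope when two profiles are identical is an assumed tie-break.

```python
def drop_subsumed(profiles):
    """Remove epitopes whose allele set is contained in another epitope's set.

    `profiles` maps epitope -> set of binding alleles. Identical profiles are
    collapsed by keeping the alphabetically first epitope name.
    """
    kept = {}
    for name, alleles in profiles.items():
        subsumed = any(
            alleles < other or (alleles == other and other_name < name)
            for other_name, other in profiles.items()
            if other_name != name
        )
        if not subsumed:
            kept[name] = alleles
    return kept
```

The union of alleles across the retained epitopes equals the union across the original set, so no coverage is lost by this pruning.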
Both CD4+ and CD8+ T cells are important in viral clearance. CD8+ and CD4+ epitopes were combined to calculate a PPC, using the PPC tool on IEDB. By combining the top two epitopes from each HLA subset (VTVNPFVSVATANAKVLI and ILVSLAAVVVNPSVKTVR for HLA-II, and ITYTDVLRY and TLARGFPFV for HLA-I), a PPC of 96.36% was achieved. A vaccine ensemble comprising ITYTDVLRY, TLARGFPFV, SYHDRRWCF, and VTVNPFVSVATANAKVLI covered 97.14% of the world's population. This increased to 99.52% when using 11 epitopes (see Table 4).

Discussion
West Nile Virus has been recognized as a reemerging global pathogen. Present in Africa for over 50 years, recent geographical transmission has raised its profile as a public health concern. There are no current effective treatments, and the cost-to-benefit ratio of the WNV development pipeline is poor. Vaccination is a key intervention. An efficacious WNV vaccine could significantly benefit the global population. Vaccines are available for closely related Flaviviruses and against equine WNV, resulting in a significant reduction in annual mortalities. Veterinary vaccine Equilis West Nile is an inactivated whole virus vaccine comprising a strain known as Yellow Fever-West Nile. The vaccine is given to horses over six months via 2 intramuscular injections, 3 to 5 weeks apart, with a single booster injection given a year later. In comparative evaluations, correlates of protection were seen in 89-94% of treated animals in different test groups. Most previous candidate WNV vaccines have relied on B cell-mediated immunity. Here, we attempt to identify highly conserved T cell epitopes that might form an epitope ensemble sufficiently immunogenic to protect against geographically diverse WNV strains.
Epitopes in adoptive immunotherapies may exhibit undesired side effects [23], such as CR when foreign peptide sequences resemble those of self-peptides sufficiently to initiate an unwanted autoimmune response. Addition of computational CR prediction to our design-by-selection protocol is a key advance over previous work [14-20] and should accelerate the early selection of safe vaccines. When using iCrossR [23], none of the epitopes were identified to elicit responses cross-reactive with human tissues. McMurtrey et al. [24] and Kaabinejadian et al. [25] identified epitopes presented by HLA-A*02:01 and HLA-A*11:01, and many were also selected by our approach, including RVL9, SVG9, TLA9, KYS9, AVV9, RLD10, ATW9, and SLT9. YTM9, SLF9, and KNM9 were eliminated as they lacked ≥50% sequence identity to the viral reference.
No single epitope provided protection against WNV. A PPC > 95% was only possible when multiple epitopes were combined. A potent immune response needs both CD4+ and CD8+ T cell responses [26]. Combined epitopes generate greater T cell responses than a single epitope, reinforcing the need for multiple conserved epitopes [27]. At least four epitopes were needed for a PPC > 95%. HLA class I 9mers (TLA9, SYH9, and ITY9) combined with one HLA-II epitope (VTV18) gave a cumulative population coverage of 97.14%.
Virus replication introduces mutations which can eventually generate new strains or alter existing ones; yet, we found that 90% of the WNV sequence is conserved (H < 0.5). This lack of variability is an aid to vaccine design as conserved regions can be exploited as therapeutic targets, as newly emerging strains retain conserved amino acids. CD-HIT was used to remove redundant protein sequences. While remaining sequences represented the most common human WNV lineages (1a, 1b, 2, and 5), the less common lineages (3 and 4) were excluded. To generate a truly universal vaccine, crossprotection against all WNV lineages and strains should be investigated. Thus, a second-generation WNV vaccine may need inclusion of lineages 3 and 4.
All T cell epitopes included in our final vaccine combination were either located in the E protein (VTV18), NS2A (ITY9), NS3 (SYH9), or NS4B (TLA9). Finding highly conserved epitopes in the E protein, NS3, and NS4B is expected, since previous work has shown these proteins to be highly immunogenic and common targets for CTLs [11,28]. Variability across the genome is uneven, with the structural proteins being the least variable [29].
The C region has the highest proportion of residue alterations (~23%) [30]. The E protein has been exploited in previous vaccine design: most current DNA vaccine candidates against Flaviviruses express the viral E and PrM proteins [31]. Sarri and coworkers identified HLA alleles responsible for susceptibility to WNV infection. They categorized HLA alleles to be either "protective," increased "susceptibility," or CNS-high risk [32]. None of the protective binding alleles suggested by Sarri et al. were identified here. This "protective" function of HLA alleles could be exploited in WNV vaccine development to provide protection for all ethnicities.
By considering functional cross-reactivity with human proteins, the work described here represents a step forward in our evolving approach to vaccine design. Approaches based on naive sequence similarity have not previously proved useful [33][34][35]. Here, it proved possible to combine 4 epitopes (3 CD8+ and 1 CD4+ T cell epitopes) to achieve a global PPC of 97.14%. Combined CTL and CD4+ responses are required for successful viral clearance. The ensemble identified is a viable starting point for further in vitro characterization or phase 0 trials, since we can assume that this epitope selection is likely both safe and immunogenic in most populations. This paper emphasises the application of computational cross-reactivity prediction to vaccine design, only allowing selection of epitopes without either structural or sequence similarity to the human genome. Overall, our work provides a promising starting point for the exploration of next-generation WNV vaccines.

Data Availability
Data are available from http://www.iedb.com and from the authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Funding
This research was supported by grant BIO2014:54164-R to PAR.