A vaccine would be the ideal solution to the HIV/AIDS problem. An efficient vaccine should ideally stimulate both humoral and cell-mediated immune responses. Such a versatile vaccine has so far not been developed against HIV. The extraordinary genetic diversity of HIV is one of the major roadblocks, along with other factors such as the early establishment of latent infection through integration into the host genome and the capacity of the virus to evade adaptive immune responses (Walker et al. 2011; Barouch 2008). One of the lessons learnt over the last three decades of HIV research is that a global vaccine may not be possible for HIV, and hence efforts have to focus on developing a clade/subtype-specific vaccine.

Multi-epitope vaccines have been tried for various conditions including cancer (Wang et al. 2011), hepatitis B virus (Depla et al. 2008), and H5N1 influenza (Adar et al. 2009). This approach has been advocated for HIV/AIDS, as an epitope-based vaccine could address the issue of viral genetic diversity and elicit an immune response against HIV (Xiao et al. 2001). Jin et al. (2009) tested a multi-epitope vaccine against HIV in a phase I clinical trial in HIV-uninfected adults and demonstrated a CD4+ T cell response with a diverse polyfunctional cytokine profile. Identifying the best combination of epitopes is a prerequisite for the design and development of an effective vaccine against HIV.

A tremendous amount of effort and time has been invested in identifying immunogenic peptides of HIV that can help in the design of a much-needed vaccine for this disease. The role of cytotoxic T lymphocyte (CTL) responses in the control of HIV has been well demonstrated by several investigators (Borrow et al. 1994; Musey et al. 1997; Altfeld et al. 2006). Over a thousand HIV peptides have been demonstrated to elicit CTL responses in in vitro studies (HIV Molecular Immunology Database, available from http://www.hiv.lanl.gov/content/immunology/). We undertook an effort to prioritize these epitopes in order to identify the best candidates for the design of an effective multi-epitope vaccine for HIV.

Host genetic components influence disease outcome in many infectious diseases including HIV/AIDS (Blackwell et al. 2009). Several HLA alleles have been reported to be associated with resistance or slow progression to HIV/AIDS including HLA-A02 (MacDonald et al. 2000; Singh et al. 2008), HLA-A11 (Selvaraj et al. 2006), HLA-B27 (Kaslow et al. 1996; McNeil et al. 1996; Singh et al. 2008; Neumann-Haefelin 2011), particularly HLA-B*2705 (International HIV Controllers Study 2010), HLA-B51 (Kaslow et al. 1996; Tomiyama et al. 1999; Zhang et al. 2011), and HLA-B*5701 (Kaslow et al. 1996; Migueles et al. 2000). On the other hand, other HLA alleles including HLA-A24 (Singh et al. 2008), HLA-B7 (International HIV Controllers Study 2010), HLA-B35 and HLA-B53 (Kaslow et al. 2005), and HLA-B40 (Selvaraj et al. 2006) are believed to be associated with susceptibility or rapid progression to HIV/AIDS. The differential influence of HLA on the outcome of disease can be exploited for the rational design of an HIV vaccine. Roshorm et al. (2009) have previously demonstrated the efficacy of an HLA-B*5101-restricted multi-epitope vaccine in inducing CD8+ T cell responses in BALB/c mice. Further, the stability of the peptide–HLA complex has been reported to enhance immunogenicity (van der Burg et al. 1996; Kirksey et al. 1999; Borbulevych et al. 2005). Molecular modeling can be used to model epitopes onto the binding sites of HLA molecules and to prioritize them by binding affinity. Thus, on the premise that epitopes that bind specifically to resistance-associated HLA alleles would potentiate a protective immune response against HIV, we performed molecular modeling of known epitopes on HLA alleles associated with resistance/slow progression to HIV/AIDS in order to prioritize them.

Among several subtypes, HIV-1C is the most common clade responsible for the burden of AIDS globally (Hemelaar et al. 2011). Due to the high degree of genetic diversity between the subtypes of HIV (Gaschen et al. 2002), discovery of specific vaccines for each of them individually may be more feasible than a common vaccine for all of them. In this study, we restricted our analysis to HIV-1 subtype C-specific epitopes.

A list of known CTL epitopes (1,309) available in the ‘HIV Databases’ (HIV Molecular Immunology Database) was downloaded on September 1, 2011. From the set of 1,309 peptides, we selected only peptides nine amino acids in length for our analysis, since MHC class I molecules preferentially interact with 9-mers, while 8-, 10-, and 11-mer peptides are reported to bind MHC molecules only rarely (Dönnes and Kohlbacher 2006; Lundegaard et al. 2008). This resulted in a set of 645 peptides. Further, we restricted our analysis to HIV-1C peptides, as HIV-1C is the predominant subtype in the global epidemic (Hemelaar et al. 2011) and identifying a vaccine against this clade of virus is a global priority. This further reduced the number to 155.
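
The two filtering steps described above (keep 9-mers, then keep HIV-1C entries) can be sketched as follows. This is a minimal illustration and not the authors' actual script: the `Epitope` record, its `subtype` field, and the three toy records are assumptions made for the example, and the subtype labels attached to the toy records are not taken from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    """Hypothetical record; field names are assumptions, not the database schema."""
    sequence: str
    subtype: str  # e.g. "C" or "B"


def select_candidates(epitopes, length=9, subtype="C"):
    """Keep epitopes of the requested length that belong to the requested subtype."""
    return [e for e in epitopes
            if len(e.sequence) == length and e.subtype == subtype]


# Toy records; the subtype labels here are illustrative only.
records = [
    Epitope("SLFNTVATL", "C"),     # 9-mer, kept
    Epitope("LPPVVAKEI", "B"),     # 9-mer, dropped by the subtype filter
    Epitope("FSPEVIPMFSAL", "C"),  # 12-mer, dropped by the length filter
]

selected = select_candidates(records)
print([e.sequence for e in selected])
```

Applied to the real download, the same two conditions would correspond to the reductions from 1,309 to 645 and then to 155 peptides reported in the text.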

The 155 epitopes were modeled onto the binding groove of three different HLA alleles (HLA-A*0201, HLA-B*2705, and HLA-B*5101) known to be associated with resistance or slow progression to HIV/AIDS, using MODPROPEP (Kumar and Mohanty 2007). MODPROPEP is a bioinformatics tool that can be used to model epitopes onto the substrate binding groove of HLA alleles. As increased stability of the peptide–HLA complex has been widely demonstrated to increase immunogenicity (van der Burg et al. 1996; Kirksey et al. 1999; Borbulevych et al. 2005), epitopes that gave better binding energy scores than the set threshold value upon modeling were considered better epitopes. Among the structures available in MODPROPEP for HLA-A*0201, “1I1Y” (Kirksey et al. 1999), found in complex with an immunodominant epitope from HIV-1 RT 309-317: YLKEPVHGV (mutated from ILKEPVHGV), was selected for modeling epitopes, as this structure had the best resolution among all structures complexed with HIV epitopes. This complex was observed to have a binding energy score of −46.16, and this was used as the threshold value. Seventy of the 155 epitopes modeled with HLA-A*0201 had a better binding affinity than the known immunodominant epitope of HLA-A*0201. Similarly, we selected the HLA-B*5101: 1E27 structure, which is also complexed with an HIV immunodominant epitope (Pol-743-9; LPPVVAKEI) (Maenaka et al. 2000), and the HLA-B*2705: 1OGT structure (Hülsmeyer et al. 2004). Each of the 155 epitopes was modeled onto the epitope-binding sites of HLA-B*5101 (1E27) and HLA-B*2705 (1OGT). The binding energy scores of the peptides present in the respective HLA crystal structures were used as the threshold values to rank the epitopes. Sixty-seven and 85 epitopes were identified as efficient binders to HLA-B*2705 and HLA-B*5101, respectively. Forty-three epitopes bound all three HLA alleles selected in this study more efficiently than the respective control (known) peptides.
The frequency of HLA-A02 alleles worldwide is about 40.0 % (Sette and Sidney 1999; Singh et al. 2008). That of the HLA-B27 allele is 23 % (Sette and Sidney 1999), and it varies widely from 0.9 to 29 % across different regions in the Indian population (Chhaya 2005). HLA-B51 allele frequency varies between 6 and 15 % in the Asian population (Tomiyama et al. 1999; Chhaya et al. 2010).
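
The threshold-based selection against the three reference complexes can be illustrated with a short sketch. It assumes that a more negative binding energy score means stronger binding. Only the HLA-A*0201 threshold (−46.16) is stated in the text; the other two threshold values, the peptide names, and all per-peptide scores below are placeholders invented for the example.

```python
# Reference scores of the peptides found in the crystal structures.
# Only the HLA-A*0201 value comes from the text; the other two are placeholders.
THRESHOLDS = {
    "HLA-A*0201": -46.16,
    "HLA-B*2705": -40.0,  # placeholder, not a reported value
    "HLA-B*5101": -42.0,  # placeholder, not a reported value
}


def beats_threshold(score, threshold):
    """Assumption: a lower (more negative) energy score is a better binder."""
    return score < threshold


def binders_to_all(scores, thresholds):
    """Return the epitopes whose score beats the reference for every allele.

    scores maps epitope name -> {allele: modeled binding energy score}.
    """
    return sorted(
        epitope
        for epitope, per_allele in scores.items()
        if all(beats_threshold(per_allele[allele], ref)
               for allele, ref in thresholds.items())
    )


# Invented scores for two placeholder peptides.
toy_scores = {
    "pep_1": {"HLA-A*0201": -50.2, "HLA-B*2705": -44.0, "HLA-B*5101": -47.5},
    "pep_2": {"HLA-A*0201": -48.0, "HLA-B*2705": -35.0, "HLA-B*5101": -45.0},
}

print(binders_to_all(toy_scores, THRESHOLDS))
```

In this toy input, `pep_2` beats the reference for two alleles but not for HLA-B*2705, so only `pep_1` is retained, mirroring how the 43 common binders were derived from the per-allele lists.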

We evaluated whether any of these epitopes also bind to HLA alleles that are associated with susceptibility to HIV or rapid progression of disease. Among the reported susceptibility-associated HLA alleles, only HLA-B*5301 is available in the MODPROPEP server. The 43 epitopes were modeled onto the binding groove of HLA-B*5301 (1A1M). Only 2 of the 43 epitopes were found to bind efficiently to HLA-B*5301, and these were excluded. Five more epitopes were reported to bind to HLA-A24 or HLA-A*2402 (HIV Molecular Immunology Database). Since HLA-A24 is also known to be associated with susceptibility, these five epitopes were also excluded. This reduced the number of epitopes that interacted exclusively with multiple resistance-associated HLA alleles to 36. The workflow is represented schematically in Supplementary Fig. 1.
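
The exclusion step amounts to a set difference, sketched below with placeholder identifiers. The function name and the `pep_NN` labels are hypothetical; which of the 43 epitopes were actually excluded is not reproduced here, only the counts from the text.

```python
def exclude_susceptibility_binders(shortlist, modeled_binders, reported_binders):
    """Drop epitopes that bind a susceptibility-associated allele.

    modeled_binders: epitopes found by modeling to bind such an allele.
    reported_binders: epitopes reported in a database to bind such an allele.
    """
    excluded = set(modeled_binders) | set(reported_binders)
    return [epitope for epitope in shortlist if epitope not in excluded]


# 43 placeholder identifiers standing in for the common binders.
shortlist = [f"pep_{i:02d}" for i in range(1, 44)]
b5301_binders = {"pep_01", "pep_02"}                              # 2 epitopes
a24_binders = {"pep_03", "pep_04", "pep_05", "pep_06", "pep_07"}  # 5 epitopes

remaining = exclude_susceptibility_binders(shortlist, b5301_binders, a24_binders)
print(len(remaining))  # 43 - 2 - 5 = 36
```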

Intermolecular interactions between one of the short-listed epitopes, the widely reported Gag p17:77-85 SLFNTVATL (Kaul et al. 2001; Lee et al. 2004; Howles et al. 2010), and all three HLA alleles used in this study were analyzed. The interaction profile is shown in Supplementary Figs. 2, 3, 4, and 5 and Supplementary Table 2. Various factors including effective binding of the peptide with the HLA, consequent recognition of the epitope–MHC complex by the TCR, expression of viral proteins, antigen processing, and the presence of a T cell repertoire contribute to the immunogenicity of an epitope (Sette et al. 1994; Deng et al. 1997; Moutaftsi et al. 2006). Among these, binding affinity between the epitope and HLA has been widely studied and found to correlate with immunogenicity (Schueler-Furman et al. 2000; van der Burg et al. 1996; Kirksey et al. 1999; Borbulevych et al. 2005). However, there are also contradictory reports which state that no significant association exists between binding affinity and immune response (Feltkamp et al. 1994; Bihl et al. 2006). In spite of differing opinions about the relationship between binding affinity and immunogenicity of epitopes, non-binders cannot induce an immune response (Sette et al. 1994). Thus, effective binding of epitopes with HLA is a prerequisite for immunogenicity, and selecting epitopes based on their binding efficiency is therefore a reasonable method to identify candidates, although some false positives may be picked up. The stability of epitopes with HLA molecules is influenced chiefly by anchor residues at positions 2 and 9, and at times by residues at other positions (Ruppert et al. 1993; Li and Bouvier 2004). Both kinds of interactions were found to occur between the above epitope and HLA molecules. Epitopes that are polyfunctional in nature, able to stimulate CTL as well as TH cells, would be the most suitable vaccine candidates.
Epitopes specific to HLA class II are reported to be 9–22 amino acids in length. However, only nine residues generally fit into the binding groove of the MHC class II molecule (Lundegaard et al. 2007; Lafuente and Reche 2009). Thus, some of the CTL epitopes may also be presented by MHC class II molecules to CD4+ T cells and induce an immune response. We examined whether any of the short-listed CTL epitopes were also present as a subsequence in any of the reported CD4+ epitopes listed in the HIV Molecular Immunology Database. Twenty of the 36 CTL epitopes were found to be part of CD4+-specific epitopes (Table 1). Thus, these 20 epitopes can be considered the most promising set of candidate epitopes for the formulation of a multi-epitope vaccine. The remaining 16 CTL epitopes are provided in Supplementary Table 1. Of the 20 polyfunctional epitopes, four were from p17, eight from p24, one from Integrase, three from gp160, and four from Nef. This group therefore has representation from different antigenic proteins of HIV-1 and is likely to represent the expressed genome of HIV-1 well. Further, 15 of these 20 epitopes are also present in HIV-1 subtype B, indicating their relevance to vaccine design against clade B of HIV-1.
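
The subsequence check used to flag polyfunctional epitopes can be sketched as a simple substring test. The function name is hypothetical, and the CD4+ epitope in the toy input is an invented string (the CTL 9-mer with made-up flanking residues), not a database entry.

```python
def polyfunctional(ctl_epitopes, cd4_epitopes):
    """Return CTL epitopes that occur as a contiguous subsequence of any CD4+ epitope."""
    return [ctl for ctl in ctl_epitopes
            if any(ctl in helper for helper in cd4_epitopes)]


ctl_epitopes = ["SLFNTVATL", "LPPVVAKEI"]
# Invented helper epitope: the first 9-mer embedded in made-up flanking residues.
cd4_epitopes = ["AAASLFNTVATLAAA"]

print(polyfunctional(ctl_epitopes, cd4_epitopes))
```

Python's `in` operator on strings tests for a contiguous match, which corresponds to the "present as a subsequence" criterion as described in the text.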

Table 1 List of epitopes that selectively bind to HLA alleles associated with resistance or slow progression to HIV/AIDS

We validated our findings using a second method available from IEDB (Immune Epitope Database and Analysis Resource, available from http://tools.immuneepitope.org/analyze/html/mhc_binding.html). This tool is a consensus of four other methods and is reported to be superior to the methods on which it is based (Moutaftsi et al. 2006). In addition to predicting epitopes from given protein sequences, it can rank peptides based on their binding efficiency with HLA-I alleles. We used this tool to rank all 155 epitopes based on their binding efficiency and examined how many of the short-listed 36 epitopes (using MODPROPEP) fell within the top 50th percentile for each of the three HLA alleles. Nine of the 20 polyfunctional epitopes were identified as strong binders to all three HLA alleles by the second method. Another eight epitopes were identified as strong binders to two of the three HLA alleles (HLA-A*0201 and HLA-B*2705, two; HLA-A*0201 and HLA-B*5101, three; and HLA-B*2705 and HLA-B*5101, three). Two epitopes were found to bind strongly to HLA-A*0201 alone, and one epitope was identified as specific for HLA-B*2705 alone. Collectively, 17 epitopes were identified by both methods as strong binders to two or three resistance-associated HLA alleles (Table 1); hence, they could be potential candidate epitopes for a vaccine formulation for HIV-1C.
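
The cross-validation logic (rank all peptides per allele, keep the top half, then count the alleles for which each peptide qualifies) can be sketched as below. This does not call the IEDB tool; it assumes its per-allele output has already been collected into a dictionary and that a lower value means stronger predicted binding. The peptide names and all numbers are invented.

```python
def top_half(scores):
    """Return the peptides ranked in the top 50 % for one allele.

    scores maps peptide -> predicted value; lower is assumed to be stronger.
    """
    ranked = sorted(scores, key=scores.get)
    return set(ranked[: len(ranked) // 2])


def alleles_supported(per_allele_scores, peptide):
    """List the alleles for which the peptide falls in the top half."""
    return [allele for allele, scores in per_allele_scores.items()
            if peptide in top_half(scores)]


# Invented values for four placeholder peptides.
toy = {
    "HLA-A*0201": {"p1": 0.5, "p2": 3.0, "p3": 20.0, "p4": 60.0},
    "HLA-B*2705": {"p1": 1.0, "p2": 45.0, "p3": 2.0, "p4": 70.0},
    "HLA-B*5101": {"p1": 4.0, "p2": 8.0, "p3": 30.0, "p4": 55.0},
}

for peptide in ("p1", "p2", "p3", "p4"):
    print(peptide, alleles_supported(toy, peptide))
```

Under the criterion described in the text, a peptide such as `p1` (top half for all three alleles) or `p2` (two alleles) would be retained, while `p3` (one allele) and `p4` (none) would not.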

Some of these 17 polyfunctional epitopes overlapped or varied slightly from each other. For example, two variants of an epitope were observed in p17:77-85 (SLYNTVATL and SLFNTVATL) that differed by one amino acid. Similarly, two more variant epitopes were found in the region p17:78-86 (LYNTVATLY and LFNTVATLY), contributing to four different entries in the region of p17:77-86. Further, three epitopes were identified in p24:32-43; the last six amino acids of epitope p24:32-40 (FSPEVIPMF) overlapped with the first six of two variants (EVIPMFSAL and EVIPMFTAL). Two more epitopes from p24 overlapped by six amino acids (p24:161-169, FRDYVDRFF, and p24:164-172, YVDRFFKTL). Two epitopes from Nef were also found to overlap by eight amino acids (83-91: GAFDLSFFL and 84-92: AFDLSFFLK). Thus, four epitopes (one from p17, two from p24, and one from Nef) could be used to represent the above 11 epitopes. The remaining six epitopes (p24, three; Integrase, one; gp160, three; Nef, two) were found to be unique, giving a total of 10 high-priority candidate epitopes.
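
The overlaps and single-residue variants quoted above can be checked with two small helpers. This is an illustrative sketch; the function names are hypothetical, and the sequences are the ones given in the text.

```python
def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def hamming(a, b):
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("SLYNTVATL", "SLFNTVATL"))                # 1
print(suffix_prefix_overlap("FSPEVIPMF", "EVIPMFSAL"))  # 6
print(suffix_prefix_overlap("FRDYVDRFF", "YVDRFFKTL"))  # 6
print(suffix_prefix_overlap("GAFDLSFFL", "AFDLSFFLK"))  # 8
```

Grouping epitopes that either differ at a single position or share a long suffix-prefix overlap is one way to arrive at a smaller set of representative peptides, as done manually in the paragraph above.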

In summary, we identified a set of 10 high-priority candidate epitopes that could go into the making of a multi-epitope vaccine for HIV-1C, as these epitopes have qualified in several rounds of selection: (1) they are all reported as CD8+ epitopes by in vitro studies, (2) they bind efficiently with two or three different resistance-associated HLA alleles (demonstrated by two different methods), (3) they bind with greater affinity than two reported immunodominant HIV-1 epitopes found in complex with HLA-A*0201 and HLA-B*5101, (4) they do not bind to susceptibility-associated HLA alleles, and (5) they are polyfunctional in nature. These candidates therefore warrant further evaluation for their potential in vaccine design and development. A limitation of this study is that the HLA-B57 and HLA-A11 alleles, which are also reported to be associated with resistance to HIV, were not included in the analysis, since their crystal structures are not available in MODPROPEP. Otherwise, the present study has employed a logical and systematic approach to obtain meaningful clues from the enormous amount of valuable data available.