Prediction of an Epitope-Based Vaccine Against Human Immunodeficiency Virus (HIV): An In Silico Approach

Human Immunodeficiency Virus (HIV) is the causative agent of the Acquired Immunodeficiency Syndrome (AIDS). The disease is a major global public health threat that has caused over 39 million deaths among 78 million cases since it was discovered. To date, no vaccine is available for prevention, and most efforts have been directed towards conventional vaccine approaches. Thus, in the present study, an alternative, newly emerging in silico approach for designing peptide-based vaccines was pursued. This immunoinformatics approach has several advantages with regard to safety, stability and ease of manufacture. We designed a potential multi-epitope vaccine against HIV-1 intended for the most endemic sub-Saharan African countries. Protein sequences of three essential structural genes, envelope (Env), group-specific antigen (Gag) and polymerase (Pol), were retrieved from the databases, analyzed and verified using a total of 15 in silico tools. This resulted in nine antigenic, non-allergic potential epitopes capable of provoking both humoral and cell-mediated immunity without inducing autoimmunity. These epitopes covered 76.57% of the world population, with high sequence conservancy ranging from 80% to 97%. Furthermore, computational docking techniques were used to confirm the strong binding interactions of the epitopes with their specific HLA molecules. Although the results await validation by in vitro and in vivo experiments, this study provides useful insight for the development of successful vaccines to prevent devastating HIV infections.


Introduction
AIDS is the leading cause of death in Africa and the fourth leading cause worldwide [1]. By the end of 2014, approximately 37 million people were living with HIV, nearly 70% of them in sub-Saharan Africa, and almost 1-2 million die of the disease every year [2]. This devastating disease accounts for approximately 3 million new infections annually, 50% of which occur in young adults (15-24 years of age). Moreover, AIDS has left approximately 14 million orphans worldwide [3]. HIV is a single-stranded ribonucleic acid (RNA) retrovirus that infects cells of the immune system and mainly destroys cluster of differentiation 4 (CD4+) T lymphocytes. The use of antiretroviral therapy (ART), along with behavioral changes, has reduced HIV/AIDS-related deaths globally and the number of new HIV infections, respectively [2,4]. Ever since HIV/AIDS was discovered in 1981, scientists have strived to develop a protective and safe vaccine using different vaccine technologies: (i) a protein subunit, as in the first clinical trials, AIDSVAX® (VAX004 and VAX003), in 1998-2003; (ii) a recombinant adenovirus vector, as in HIV Vaccine Trials Network (HVTN) 503/Phambili; (iii) a canarypox vector prime followed by a protein subunit boost, as in the RV144 Thai trial; or (iv) a DNA prime followed by a recombinant adenovirus vector boost, as in HVTN 505. None of the trials showed efficacy in its target population except RV144, which showed 32.1% efficacy in preventing HIV infection but did not reduce the viral load [5].
In comparison with conventional vaccine development, which is laborious, time-consuming and expensive, alternative computational (in silico) approaches have recently been studied extensively for predicting epitopes at reduced cost and in less time [6]. Epitope-based vaccines (EVs) are one such approach, which has progressed significantly with the improved understanding of the molecular basis of antigen recognition and of binding to human HLA motifs through refined algorithms. These vaccines are easy to develop, chemically stable, highly specific and free from any infectious or oncogenic hazard [7,8].
The rationale and design of this work follow the paradigm of emerging publications by Chinese researchers, who have already taken a step towards this new multidisciplinary strategy. They designed an epitope-based peptide HIV vaccine for Chinese populations using immunoinformatics. The same strategy has been applied to other organisms, such as Shigella spp., as recently described. Moreover, this approach has already proven its effectiveness in the case of multiple sclerosis [9], tuberculosis, Ebola, respiratory syncytial virus and coronavirus [9,10,11]. The strategy of these studies was to make use of the newly emerging field that combines immunology and bioinformatics to simulate the interaction between antigens and the host immune system. Therefore, we adopted this new approach, aiming to design an effective and safe HIV-1 vaccine that will protect humans against HIV/AIDS.

Sequence Retrieval
A total of 45 complete nucleotide sequences of the Env gene were retrieved from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/), targeting the endemic countries of sub-Saharan Africa: Cameroon, South Africa, Kenya, Tanzania and Zambia. These nucleotide sequences were subsequently translated into protein sequences via the Expert Protein Analysis System (ExPASy) (http://www.expasy.org/) [12].
In addition, the globally available protein sequences of the Gag and Pol genes of HIV-1 subgroup M were retrieved from the Universal Protein Knowledgebase (http://www.uniprot.org) [13]. All these sequences were downloaded and stored in FASTA format for further analysis.
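The translation step described above can be illustrated with a minimal Python sketch. It is not the ExPASy implementation; it assumes the standard genetic code, reading frame 1 and termination at the first stop codon. The function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 15-nucleotide example, not taken from the retrieved sequences.
print(translate("ATGAGAGTGAAGGGG"))  # MRVKG
```

Ambiguous bases are reported as "X" rather than raising an error, so that a few uncertain positions in a retrieved sequence do not interrupt a batch run.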

Sequence Alignment and Phylogeny
Multiple sequence alignment was performed using the Biological Sequence Alignment Editor (BioEdit) v7.0.9.0 [14] and , to look for conserved regions. Subsequently, a total of 45 Env protein sequences from different sub-Saharan African countries were phylogenetically analyzed using Molecular Evolutionary Genetics Analysis (MEGA) v.6.06 software [16]. The dendrogram was generated as an unrooted tree by the maximum likelihood method, using the JTT matrix-based model as the best-fitting model and 1000 bootstrap replicates.
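As an illustration of how conserved regions can be read off a multiple sequence alignment, the following Python sketch reports gap-free runs of columns in which every sequence carries the same residue. This is an assumed, strict definition of conservation; the criterion applied in BioEdit is not stated in the text, and the function name `conserved_regions`, the `min_len` parameter and the toy alignment are hypothetical.

```python
def conserved_regions(aligned, min_len=9):
    """Return (start, segment) pairs for gap-free runs of identical columns."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")

    regions, start = [], None
    # Iterate one column past the end so that a run touching the end is closed.
    for col in range(length + 1):
        identical = (
            col < length
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                regions.append((start, aligned[0][start:col]))
            start = None
    return regions


# Toy alignment (hypothetical sequences), with a short minimum length.
toy = ["MKVLAAG", "MKVLSAG", "MKVLTAG"]
print(conserved_regions(toy, min_len=3))  # [(0, 'MKVL')]
```

A tolerance for near-identical columns could be added by replacing the `all(...)` test with a majority threshold, at the cost of a looser definition of "conserved".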

Antigenicity of the Conserved Regions
Antigenicity is a term used to describe the recognition of molecules by antibodies and/or immune system cells. To identify the most antigenic conserved regions of Env, Gag and Pol, the VaxiJen v2.0 server [17] was used with the default prediction parameters and a threshold value of 0.4.

Epitopes Prediction
Prediction of CTL Epitopes: The potential CTL epitopes were predicted using the online server NetCTL v1.2 (http://www.cbs.dtu.dk/services/NetCTL/) [18], which implements a prediction method integrating MHC-I binding, Transporter Associated with Antigen Processing (TAP) transport efficiency and proteasomal cleavage. CTL epitopes restricted to 12 MHC class I supertypes (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58 and B62) were predicted from the antigenic conserved peptides. Proteasomal cleavage is predicted with artificial neural networks (ANN), and TAP transport efficiency with a weight matrix. The threshold was set to 0.5, giving a specificity of 94% and a sensitivity of 89%. Because antigenicity might be lost during this analysis, VaxiJen v2.0 [17] was applied again. The best candidate epitopes, with NetCTL combinatorial (Comb) scores of > 0.9 and VaxiJen scores of ≥ 0.4, were selected for further analysis.
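The two-threshold selection described above can be expressed as a short Python sketch. The class `CtlCandidate`, the function `select_ctl_epitopes` and all score values below are hypothetical illustrations; only the cut-offs (Comb > 0.9, VaxiJen ≥ 0.4) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtlCandidate:
    peptide: str
    comb_score: float     # NetCTL combinatorial score
    vaxijen_score: float  # VaxiJen antigenicity score


def select_ctl_epitopes(candidates, comb_cutoff=0.9, vaxijen_cutoff=0.4):
    """Keep candidates with Comb > cutoff and VaxiJen >= cutoff, best Comb first."""
    kept = [
        c for c in candidates
        if c.comb_score > comb_cutoff and c.vaxijen_score >= vaxijen_cutoff
    ]
    return sorted(kept, key=lambda c: c.comb_score, reverse=True)


# Scores are invented for illustration and are not the values of Table 2.
candidates = [
    CtlCandidate("MIVGGLIGL", 1.20, 0.85),
    CtlCandidate("KLNWASQIY", 1.45, 0.52),
    CtlCandidate("KMIGGIGGF", 0.80, 0.90),  # fails the Comb cutoff
    CtlCandidate("AAAAAAAAA", 1.10, 0.10),  # fails the VaxiJen cutoff
]
for c in select_ctl_epitopes(candidates):
    print(c.peptide, c.comb_score, c.vaxijen_score)
```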

MHC-I Binding Prediction:
The FASTA-formatted sequences of the selected antigenic NetCTL epitopes were analyzed with the Immune Epitope Database Analysis Resource (IEDB-AR) (http://www.iedb.org/). To calculate the half-maximal inhibitory concentration (IC50) values, which predict peptide affinity for MHC-I, the recommended NetMHCpan method [19] was applied in the MHC-I processing prediction tool (Tenzer et al., 2005) (http://tools.iedb.org/processing/) [20], with accuracy ranging from 81% to 97% as reported in [20,21]. In this step, the CTL epitopes were predicted from their reference protein sequences based on their ability to be cleaved by the proteasome and transported by TAP. All MHC-I alleles were selected, with a fixed peptide length of 9 amino acids.
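Fixing the peptide length at 9 amino acids means that every overlapping 9-residue window of a protein is scored as a candidate. A minimal Python sketch of that windowing step follows; the function name `fixed_length_peptides` is hypothetical, and the input is simply one of the peptides named later in this study, used here only as a convenient string.

```python
def fixed_length_peptides(protein: str, length: int = 9):
    """Return every overlapping window of the given length, in sequence order."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


windows = fixed_length_peptides("PEVIPMFSALSEGAT")
print(len(windows))  # 7
print(windows[0], windows[-1])  # PEVIPMFSA FSALSEGAT
```

A protein of N residues yields N - 8 such windows, each of which would then be submitted to the binding predictor.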

MHC-II Binding Prediction:
In contrast to MHC-I, the antigenic conserved peptides were analyzed to predict their binding to MHC-II molecules following the IEDB-recommended method. This method uses the consensus approach (Wang et al., 2008 and 2010), which selects the best available binding predictor among the neural network-based alignment (NN-align) [22], the Stabilization Matrix Alignment Method (SMM-align) [23], the combinatorial library and the Sturniolo method [24] for predictions specific to HLA-DP, DQ and DR. All alleles available in the MHC class II binding prediction tool (http://tools.iedb.org/mhcii/) were selected.

Quality Assessment of the Epitopes
Allergenicity and Epitope Conservancy Assessment: AllerTOP v.2 [27] was used to predict the allergenicity of the epitopes, and the epitope conservancy analysis tool of IEDB (http://tools.iedb.org/tools/conservancy/iedb_input) [28] was used to determine the conservancy level of each potential epitope by searching for its identity in all protein sequences retrieved from the database.
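The idea behind a conservancy calculation can be sketched in Python as the fraction of protein sequences that contain the epitope at or above a chosen identity level. This is a simplified, ungapped stand-in for the IEDB tool, not its implementation; the function names, the `min_identity` parameter and the three toy sequences are assumptions made for illustration.

```python
def best_identity(epitope: str, sequence: str) -> float:
    """Highest fraction of matching positions over all ungapped windows."""
    n = len(epitope)
    best = 0.0
    for i in range(len(sequence) - n + 1):
        matches = sum(a == b for a, b in zip(epitope, sequence[i:i + n]))
        best = max(best, matches / n)
    return best


def conservancy(epitope: str, sequences, min_identity: float = 1.0) -> float:
    """Fraction of sequences containing the epitope at >= min_identity."""
    if not sequences:
        return 0.0
    hits = sum(best_identity(epitope, s) >= min_identity for s in sequences)
    return hits / len(sequences)


# Hypothetical sequences: exact match, one substitution, unrelated.
toy_sequences = ["AAMIVGGLIGLKK", "AAMIVGGLVGLKK", "AAAAAAAAAAAAA"]
print(conservancy("MIVGGLIGL", toy_sequences, min_identity=1.0))
print(conservancy("MIVGGLIGL", toy_sequences, min_identity=0.8))
```

Lowering `min_identity` counts sequences with a few substitutions as conserved, which is why reported conservancy figures should always state the identity level used.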

The Population Coverage Prediction
The human population coverage was predicted with the IEDB population coverage calculation tool (Bui et al., 2006) [30], to which each selected antigenic epitope was submitted (http://tools.iedb.org/tools/population/iedb_input) together with the corresponding MHC class I and II HLA alleles obtained in the previous steps. The default parameters, linked to the latest HLA genotypic allele frequency database (http://www.allelefrequencies.net/) [31], were used. This database contains frequencies of 3245 alleles (both classes) for the world, 16 geographical areas, 21 ethnicities and 115 countries.
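To make the notion of population coverage concrete, the following Python sketch estimates the fraction of individuals carrying at least one HLA allele to which the epitope set binds. It is a deliberately simplified model and not the IEDB algorithm: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and the function name and all allele frequencies are hypothetical.

```python
from math import prod


def population_coverage(covered_freqs_by_locus) -> float:
    """Fraction of individuals with at least one covered allele at any locus.

    For a locus whose covered alleles have total frequency p, the chance that
    both chromosomes carry an uncovered allele is (1 - p) ** 2.
    """
    uncovered = prod(
        (1.0 - min(sum(freqs), 1.0)) ** 2
        for freqs in covered_freqs_by_locus.values()
    )
    return 1.0 - uncovered


# Invented allele frequencies for two loci, for illustration only.
example = {"HLA-A": [0.2, 0.1], "HLA-DRB1": [0.1]}
print(round(population_coverage(example), 4))  # 0.6031
```

Because coverage depends on local allele frequencies, the same epitope set can score very differently from one country to another.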

Molecular Docking and HLA-Epitopes Interaction
Computational Details: Molecular docking was carried out on a workstation with a 2.10 GHz Intel® Core™ i7-4510U CPU, 4 GB of RAM, Intel® HD Graphics and a generic PnP monitor, running Windows 7 Ultimate (x86_64). Structure construction, optimization and visualization were carried out using the SYBYL-X suite of molecular modeling programs [32,33] and the Molecular Operating Environment (MOE). Compounds were minimized using the Tripos force field with Gasteiger-Hückel charges and the Powell method.
Receptor Preparation for Docking: The 3D structures used in this study were retrieved from the Protein Data Bank (http://www.rcsb.org). All water molecules were removed, both polar and non-polar hydrogen atoms were added, side chains were repaired, chain termini were fixed, and protonation types were set to re-orient the hydrogen atoms into positions more favorable for hydrogen bonding. Atomic charges were assigned to the receptor using the AMBER7 FF99 force field [32] and to the ligand using the Gasteiger-Hückel method. The protein complex was minimized using the AMBER7 FF99 force field. Finally, the 3D structure of the prepared protein was saved as a MOL2 file [34].

Sequences Divergence and Phylogenetic Analysis
The aligned Env, Gag and Pol protein sequences revealed 8, 10 and 14 conserved peptide regions, respectively (Table 1). The dendrogram generated from the 45 Env protein sequences from five sub-Saharan African countries revealed a clear phylogeography, in which the sequences from each country grouped separately from those of the others, except for one South African sequence (VIRT16255). Additionally, clade II placed the neighboring countries Tanzania, Zambia and South Africa in one group. However, clade I, which grouped Kenya and Cameroon together, remains to be explained, since these countries are not neighbors (Figure 1).

Antigenicity Determination
All the conserved regions obtained were subjected to antigenicity assessment to ensure their ability to provoke human immunity. This resulted in 4, 6 and 10 antigenic conserved peptides for the Gag, Env and Pol genes, respectively, as indicated in Table 1.

Epitopes Prediction
Prediction of CTL Epitopes: The conserved antigenic peptides mentioned above generated 5, 10 and 14 predicted CTL epitopes for the Gag, Env and Pol genes, respectively. Based on the Comb score, we selected the top 14 epitopes and subjected them to a second round of antigenicity assessment, since antigenicity might be lost during the epitope prediction step. This assessment revealed that all but one Pol epitope remained antigenic (Table 2). The remaining epitopes, which we chose to exclude, interacted with only one or two alleles that were already covered.
Notes to Table 2: The bolded sequences are the selected CTL epitopes. (a) Combinatorial score: the higher the score, the more qualified the epitope.

MHC-I binding prediction:
The standard IC50 threshold used in previous studies was 500 nM. However, to increase the level of confidence, most of the epitopes selected in this study had a binding affinity of < 250 nM. Collectively, 7 epitopes were found to interact with 13 different HLA alleles; of these, MIVGGLIGL and KLNWASQIY each interacted with 4 different alleles and KMIGGIGGF with 3 different alleles (Table 3).
Notes to Table 3: (a) The total score combines proteasomal cleavage, TAP transport and MHC-I binding. (b) The lower the value, the higher the binding affinity.
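The IC50 cut-off and the count of alleles per epitope can be reproduced with a few lines of Python. The function name `strong_binders`, the allele labels and every IC50 value below are hypothetical placeholders rather than the data of Table 3; only the 250 nM cut-off is taken from the text.

```python
from collections import defaultdict


def strong_binders(predictions, ic50_cutoff_nm=250.0):
    """Map each epitope to the set of alleles it binds with IC50 below the cutoff.

    `predictions` is an iterable of (epitope, allele, ic50_nm) tuples.
    """
    alleles_by_epitope = defaultdict(set)
    for epitope, allele, ic50_nm in predictions:
        if ic50_nm < ic50_cutoff_nm:  # lower IC50 means tighter binding
            alleles_by_epitope[epitope].add(allele)
    return dict(alleles_by_epitope)


# Invented predictions for illustration only.
predictions = [
    ("MIVGGLIGL", "allele-1", 12.0),
    ("MIVGGLIGL", "allele-2", 180.0),
    ("MIVGGLIGL", "allele-3", 400.0),  # above the cutoff, dropped
    ("KLNWASQIY", "allele-1", 95.0),
]
binders = strong_binders(predictions)
for epitope, alleles in binders.items():
    print(epitope, len(alleles), sorted(alleles))

# Number of distinct alleles covered by the whole epitope set.
print(len(set().union(*binders.values())))
```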

MHC-II binding prediction:
The 15-mer conserved antigenic regions mentioned above (3.2) revealed 1, 9 and 13 predicted MHC-II binding epitopes for Pol, Gag and Env, respectively (see Appendices 1, 2 and 3). The best candidate epitopes, with a binding affinity of < 250 nM and a percentile rank of ≤ 20%, were subjected to antigenicity assessment. This reduced the number of antigenic epitopes to 1, 3 and 4; these were considered the most promising antigenic MHC-II binding epitopes and were further assessed for allergenicity.

B-Cell epitopes prediction:
The conserved antigenic, externally located peptides (Table 1) yielded 23 potential B-cell epitopes at the default threshold of 0.51, as shown in Table 4.
Notes to Table 4: (a) The higher the ABCpred score, the higher the probability of being a B-cell epitope. (b) Only epitopes with a VaxiJen score ≥ 0.4 were considered for further analysis.

Quality Assessment for the Epitopes
Allergenicity and Epitopes Conservancy Assessment: Although most researchers have used the AllerHunter server (Muh et al., 2009), which has a specificity, sensitivity and accuracy of 96%, 78.2% and 87.1%, respectively, in distinguishing allergens from non-allergens, in this study we preferred AllerTOP v.2 for its higher accuracy of 88.7%. Surprisingly, only 1, 5 and 14 of the 7, 8 and 23 MHC-I, MHC-II and B-cell epitopes, respectively, were found to be non-allergic (Table 5 and    ), whereas the allergic epitopes are shown in Appendix 4. As shown in the same tables, most of the Env gene epitopes that bind to MHC-I, MHC-II and B-cell receptors showed 93.33% sequence similarity to the reference protein, whereas the Gag gene epitopes varied in conservancy from 81% to 84% and the Pol gene epitopes showed 97.92% sequence similarity.
Human Self-Peptides Cross-Reactivity Analysis: Among all the non-allergic epitopes, only a single B-cell epitope was found to contain a putative conserved domain identical to a human peptide. It was therefore eliminated from the epitope pool to avoid triggering an autoimmune response.

Population Coverage
The global population coverage of the six non-allergic MHC-I and MHC-II candidate epitopes was 76.57%. The overall population coverage by continent is presented in Figure 2. In our target sub-Saharan African countries, coverage ranged from 63.48% and 59.49% in Cape Verde and São Tomé and Príncipe, respectively, through 53.57% in Sudan, down to the lowest recorded value of 15.73% in Equatorial Guinea. Interestingly, the highest coverage was detected in Northern Ireland (89.01%), followed by Austria (88.98%) and England (88.39%).

Molecular docking and HLA-epitopes interaction analysis
The binding energies are given in Table 7. The lower the energy, the higher the binding affinity of the ligand for the receptor. Since hydrogen bonds are considered strong bonds, the more hydrogen bonds found between the ligand and the receptor, the higher the binding affinity. Although the single MHC-I epitope (MIVGGLIGL) that binds to HLA*02:01 formed only two hydrogen bonds, with lengths of 1.93 and 2.93 Å (Figures 3 and 4), it had the lowest binding energy of the three selected epitopes, -10 kcal/mol.
The other two epitopes, both MHC-II epitopes that bind to the same HLA-DB1:01:01 (Figures 5 and 6), were compared with each other. The PEVIPMFSALSEGAT epitope bound better, with a lower energy of -7.1 kcal/mol (Table 5), two hydrogen bonds with lengths of 1.74 and 1.98 Å, three electrostatic ionic bonds ranging from 2.85 to 3.06 Å in length, two ion-dipole bonds with lengths of 2.78 and 2.9 Å, a single charge-transfer interaction between the lysine 38 residue of the receptor and the ligand benzene ring, and hydrophobic van der Waals (VdW) interactions with valine 42, glycine 134, valine 136 and phenylalanine 22. The GAASITLTVQARQLL epitope, in contrast, had the highest binding energy, -5.9 kcal/mol, with five hydrogen bonds varying in length from 1.81 to 2.58 Å, three ionic bonds ranging in length from 2.02 to 2.93 Å, three ion-dipole interactions varying in length from 2.78 to 2.95 Å, and VdW interactions with valine 132, valine 42 and leucine 45.

Conclusion
Although epitope identification by conventional experimental techniques, such as recombinant, subunit protein and DNA vaccines, to stimulate both humoral and T-cell immunity is feasible, these techniques are considered time-consuming, expensive and not applicable to large-scale screening. Alternatively, advances in computer modeling, known as in silico studies, provide a systematic, high-throughput way to identify candidate peptides that can hypothetically interact with HLA alleles with high population coverage, thereby reducing the volume of laboratory work and raising the probability of success. A successful HIV vaccine, along with the availability of ART, is required to achieve an HIV-free world by 2030, as envisaged by WHO and UNAIDS (2015) [2,4]. Hence, in this study we identified potential epitopes that are able to provoke the immune system against HIV-1. We predicted a cohort of six epitopes from the conserved regions of the HIV-1 Env, Gag and Pol genes that interact with 13 different MHC-I and MHC-II HLA alleles and provide CTL responses in humans, in addition to seven epitopes that are able to elicit B-cell responses. All the predicted epitopes have a conservancy of > 81% and a global population coverage of 76.57%. However, in vivo and in vitro studies remain to be carried out to determine their effectiveness as a proposed multi-epitope vaccine.