First report of computational protein–ligand docking to evaluate susceptibility to HIV integrase inhibitors in HIV-infected Iranian patients

Background Iran has recently included integrase (INT) inhibitors (INTIs) in the first‐line treatment regimen in human immunodeficiency virus (HIV)-infected patients. However, there is no bioinformatics data to elaborate the impact of resistance-associated mutations (RAMs) and naturally occurring polymorphisms (NOPs) on INTIs treatment outcome in Iranian patients. Method In this cross-sectional survey, 850 HIV-1-infected patients enrolled; of them, 78 samples had successful sequencing results for INT gene. Several analyses were performed including docking screening, genotypic resistance, secondary/tertiary structures, post-translational modification (PTM), immune epitopes, etc. Result The average docking energy (E value) of different samples with elvitegravir (EVG) and raltegravir (RAL) was more than other INTIs. Phylogenetic tree analysis and Stanford HIV Subtyping program revealed HIV-1 CRF35-AD was the predominant subtype (94.9%) in our cases; in any event, online subtyping tools confirmed A1 as the most frequent subtype. For the first time, CRF-01B and BF were identified as new subtypes in Iran. Decreased CD4 count was associated with several factors: poor or unstable adherence, naïve treatment, and drug user status. Conclusion As the first bioinformatic report on HIV-integrase from Iran, this study indicates that EVG and RAL are the optimal INTIs in first-line antiretroviral therapy (ART) in Iranian patients. Some conserved motifs and specific amino acids in INT-protein binding sites have characterized that mutation(s) in them may disrupt INT-drugs interaction and cause a significant loss in susceptibility to INTIs. Good adherence, treatment of naïve patients, and monitoring injection drug users are fundamental factors to control HIV infection in Iran effectively.


Introduction
Deadly disease outbreaks and emerging viral diseases inflict severe public health in the developed and developing countries [1] Among these, AIDS continues to be a significant problem worldwide. HIV-1 has three essential enzyme uses for its replication; integrase (INT) is one of them that catalyzes viral integration. HIV-INT consists of three structural and functional domains: the N-terminal domain (NTD, residues 1-49), the catalytic core domain (CCD, residues , and the C-terminal domain (CTD, residues 213-288). It also contains a conserved DDE motif encompassing amino acids Asp64, Asp116, and Glu152 in the CCD necessary for drug binding and enzyme activity [2]. As a result of the drug resistance development across currently available drugs, WHO has put forth the use of INTIs: raltegravir (RAL) and elvitegravir (EVG) as the first-generation inhibitors, dolutegravir (DTG) and bictegravir (BIC), along with the late-phase clinically trialed cabotegravir (CAB), as the second-generation INTIs [3]. First-generation INTIs have a relatively low genetic barrier to resistance, whereas second-generation INTIs confer to a higher genetic barrier against RAMs [2]. Treatment failure occurs due to HIV mutations, poor adherence, variations in pharmacokinetics [4], etc. Two categories of mutations are related to INTIs drug resistance: RAMs and NOPs as the primary and secondary pathways, respectively. NOPs are subtype-specific polymorphic mutations that affect INT DNA binding affinity in the presence of RAMs [5]. Little is known about the potency of mutations influencing susceptibility to INTIs in CRF35-AD subtype virus treatment.
In this study, for the first time computational methods and molecular analysis were done to assess the influence of RAMs and NOPs on docking energy between INTIs and INT protein complexes in Iranian patients.
Since INTIs are currently advised in Iran when patients do not respond to first-and second-line ART, more information on the drug susceptibility, primary, and secondary drug resistance mutations profile of INTIs is required to guide its implementation across the country.
Moreover, posttranslational modifications analysis was performed on patient's INT sequences.

Study population
This cross-sectional study was conducted from June 2017 to June 2020, before the initiation of Iran's national HIV treatment program and the introduction of INTIs. Plasma samples were collected for viral load assay from 850 HIV-infected patients originating from the south of Iran; they were either antiretroviral therapy-naïve or RTIs and/or PIs treatment-experienced patients with a viral load above 1000 IU/ml enrolling at the Behavioral Diseases Consultation Center (BDCC) affiliated with Shiraz University of Medical Sciences. Medication adherence means a ratio of the number of pills doses taken in patients to the number of doses prescribed over a given period [6]. According to the self-report inventory, the patients' adherence level were classified into good, unstable, or poor adherence groups. Patients with an excellent history of adherence and intermittent phases of non-adherence were placed in good and unstable adherence groups, respectively. The poor adherence category was related to patients who could rarely adhere to ARTs [7]. In this report, poor and unstable are defined as reduced adherence.

CD4 count and biochemical tests
Using "FACSPresto Near-Patient CD4 Counter (BD Biosciences)", CD4 T-cell count was performed. Aspartate aminotransferase (AST) and Alanine aminotransferase (ALT) levels (IU/L) in the serum samples were measured using the commercial enzymatic kits of Biorex-Fars Company (Shiraz, Iran) and DIRUI Automatic Biochemistry Machine.

HIV viral load, RT-nested-PCR, and sequencing
Serum RNA was extracted by the QIAamp Viral RNA Mini Extraction Kit (Qiagen, Germany), and Artus HI Virus-1 RT-PCR kit (Qiagen) was utilized to define viral load in all samples. HIV-INT region was amplified by RT-nested PCR using the primers listed in Table 1. Finally, the positive amplicons were purified by a gel extraction kit (QiagenGmbH, Hilden, Germany) followed by Sanger sequencing of both DNA strands with the limit of detection ~15-20%. (Niagene Noor Company, Iran).

Drug resistance analyses
Based on the RAMs and the high frequent NOPs detected in the INT sequences, 78 patients were clustered into 11 groups (1-11) and 7 models including five INTIs mutated models (RAL, EVG, CAB, DTG, and BIC mutated models) and two mutants: mutant 1 and mutant 2. The high frequent NOPs in the INT protein is shown in Table 2.
In this study, INTs of all 11 groups and 7 models are called mutated INT genes/proteins.
Each group encompasses distinct mutations (Table 3); also, five INTIs mutated models and mutant 1 and 2 were generated by substitution of the exact mutation(s) (

Post-modification changes
PTMs of proteins are involved in the attachment of functional groups or small proteins such as disulfide bridging, phosphorylation, glycosylation, etc. to specific residue in the protein that alters the charge and the structure of the protein. NetPhos [11,12], DISPHOS [13,14], and Net-PhosK [15,16] were performed to assess the kinas specific phosphorylation sites in INT proteins, and SCRATCH [13], DIANNA [14], VADAR [17], DbD2 [18], and PIC server [19] were used to define the disulfide bonds. Disulfide post-modification analyses showed that mutations might impact the number of bonds and bond patterns. Formation of the disulfide bond locks the structure in place and mostly increases the stability and half-life of the proteins [20]. Glycosylation is one of the most widespread and versatile protein modifications required for progeny formation and proper folding of viral proteins. Glycosylation sites were predicted via GPP Prediction Server [21], NetOGlyc [22] and GlycoMine [23]; moreover, SUMOylation sites were characterized by Table 1 List of primers used in this study and thermal-cycling conditions of nested-PCR.  JASSA [24], SUMOgo [25], SUMOplot [26], and GPS-SUMO [27]. Finally, Ubiquitylation site was suggested using RUBI tools [28] ( Table 5b).

SOPMA: secondary structure
Secondary structural features can change the binding pocket properties or affect the stability of the whole protein [29] that influence the accessibility of drugs to a protein's active site. This structure was interpreted using "SOPMA" software [13,30,31] (Table 5c). Four conformational states were suggested in all mutated INT proteins and reference, including Helix, Sheet, turn, and coil with the window width and similarity threshold of 17 and 8, respectively.

Ligand receptor docking and visualization
The higher docking score represented better binding affinity, indicating the strong attachment of integrase inhibitors to integrase proteins to suppress HIV functions that would contribute to the promising treatment outcome. The impact of RAMs and high frequent NOPs on INTIs treatment outcome was evaluated through docking analysis of INTs with BIC, EVG, DTG, RAL, and CAB drugs. For this, the PDB format of five INTIs was obtained from the DrugBank database (https://go.drug bank.com/) [38]. To find the possible interaction between mutated INT proteins and INTIs, "Hex" software [39] was employed for docking, and visualization of the amino acids interacting with INTIs was assessed by Discovery Studio software [40].

HIV-1 subtype classification and phylogenetic analysis
Phylogenetic tree and seven online subtyping software (REGA HIV-1 subtyping [44], Stanford HIV subtyping program [45], NCBI genotyping [46], Geno2pheno INT [47], HIV-GRADE [48], COMET [49], and jpHMM [50]) were applied to declare the subtype of all 78 patients (Table 5f). Phylogenic analysis was performed on the sample sequences along with 89 reference strains from various subtypes sequences (Supplemental Table 2), using MEGA software [51]. The neighbor-joining method constructed the phylogenetic tree based on the Kimura 2-parameter distance matrix listed in the MEGA software. Additionally, the statistical significance of the phylogenetic tree was evaluated by the bootstrap method (1000 replicates).

Statistical analysis
As the data was nonparametric, median (Q1 and 3) and frequency (%) were used for quantitative and quantitative data analysis and Kruskal-Wallis and Mann-Whitney tests were used to analyze the data. To interpret the relationship between quantitative variables, we employed Spearman coefficient correlation and for quantitative variables, chi-squared test was used. Differences with a P-value of 0.05 were regarded as statistically significant. Data management was performed using SPSS version 20.0 [52].

Demographic characteristics, CD4 count, and HIV viral load
Of 850 patients for whom HIV viral load test was performed, only 151 individuals had a viral load exceeding 1000 copies/ml that were  considered for RT-nested-PCR during the study period (June 2017-June 2020). This period was before the availability of INTIs in Iran, so it was chosen to provide baseline antiviral resistance data and to ensure the patients were INTIs-naive. The appropriate result in RT-nested-PCR and sequencing was achieved for 78 patients: 62 treatment patients and 16 naïve-treatment individuals. The demographic and laboratory characteristics of 78 patients are summarized in Table 6. Among treatment-experiment people, only 16% were categorized as good adherence, whereas 84% had reduced adherence. A significant correlation was observed between lower CD4 count and more prolonged HIV infection, older age, later stage of HIV infection, naïve treatment patients, seroconversion to anti-HCV antibody positive status, male gender, the symptom of gastrointestinal diseases, the emergence of RAMs, reduced adherence to previous ART regime(s), and being injection drug users (IDUs) (p < 0.05) (Fig. 1). In addition, compared with patients infected with HIV through sex, IDUs showed a significantly higher level of ALT, mutation rate, RAMs, and NOPs (p < 0.05) (Fig. 2).
The median CD4 + T cell count and viral load in patients with good adherence were 451 cells/mm 3 , 213000 copies/ml; in naïve and reduced adherence patients, they were 264 cells/mm 3 and 991000 copies/ml that were significantly different. The factors associated with CD4 count decline included the presence of any symptoms, naïve treatment status, reduced adherence even to reverse transcriptase and/ or protease inhibitors, IDUs status, in men gender, >45 years old, longer duration of HIV infection, and co-infection with HCV. Patients with CD4 levels of 100 or less, or more than 500 cells/mm 3 showed a viral load of more than one million; furthermore, patients with neurological and gastrointestinal symptoms corresponded to significantly higher viral load (p < 0.05).

Genotypic drug resistance analysis
INT nucleotide sequences and related amino acid sequences were aligned and compared with a reference sequence using CLC Main Workbench software (CLC bio, Boston, MA, USA). The presence of RAMs and NOPs was identified using the Stanford University genotypic resistance interpretation algorithm, HIVdb version 8.3 (http://hivdb.stan ford.edu/).

RAMs and NOPs in patients and INT domains
Sequenced samples were screened for the presence of RAMs and NOPs in HIV-INT region in all patients ( Table 7). The data showed that only one major (R263K) and minor (L74I) resistance mutation against INTIs were present in our samples. Accessory mutations were found in 14/78 (17.94%) patients: L74 M (5 cases, 6.41%), Q95K (3 cases, 3.8%), G163R (1 case, 1.3%), and S230 N (5 cases, 6.4%). Of which, one sequence had two accessory mutations, namely L74 M and S230 N.

Table 5
List of links used in this study.

ExPASy ProtParam analysis
Mutated INT proteins with 227, 281 or 282 amino acids length and the average molecular weight of 31.20 KD had the theoretical pI of around 8.68. The average instability and aliphatic index were 29.62 and 84.15, respectively. Next, in vivo half-lives was estimated to be 1.1 h in mammalian cell and >2 and > 3 min in Escherichia coli and yeast. Grand average hydropathy (GRAVY) signified that all INT proteins had been hydropathy with a negative score of − 0.359 (Table 9).

Post-translational modification
In terms of phosphorylation sites, no significant difference could be observed between the reference gene and mutant HIV-1 INT proteins (Supplemental Table 3). The most frequent residues for phosphorylation were serine 39, 57, 134, 255 and threonine 67, 93, 112, 135, and tyrosine 100, 171, and 139. The outcomes of disulfide bond prediction (Supplemental Table 4) revealed that bonds 40-43 and 56-65 had a higher frequency than that of other cysteines.
In this study, the presence of N-glycosylation sites (117 and 120), Cglycosylation sites (19), and some O-glycosylation sites (Supplemental Table 5) was among the most frequent predicted sites. Moreover, SUMOylation sites were assessed, and 19 new amino acid targets (Supplemental Table 6) were suggested that might affect HIV replication. According to the ubiquitination data (Supplemental Table 7), K186, K240, and K258 were the most plausible target lysines for ubiquitination.

Secondary structure
The results of secondary structure prediction of mutant model 1 (A) and reference gene, AB703606 (B) was analyzed (Fig. 3), and the data displayed the pattern of four secondary structures, alpha helix, extended strand, beta-turn, and random coil, which were similar among different groups and models (Table 10). However, alpha-helix (40.07%) and random coil (36.10%) were the most prominent secondary structures in INT mutated proteins (see Table 10).

Tertiary structure
The best-refined models were regarded for validation analysis; then, the qualified models were employed for docking analysis (Supplemental Table 8). To evaluate the effect of RAMs [R263K (major), L74I (minor), G163R, Q95K, S230 N, M50I, V72I, V249I, and L74 M (accessory) mutations] and that of high frequent NOPs (M203I, G134S, V112I, N39S, and Q216H)] on INTIs treatment outcome, we prepared the list of amino acids involved in protein-drug interaction and the related docking scores of INTIs and mutated proteins interaction, as shown in Supplementary  Table 9 and Table 11, respectively. E Value in our patients was in the range of 276.60 kcal/mol for EVG and RAL, 268.5, 263.2, and 261.7 kcal/mol for BIC, CAB, and DTG, respectively. Comparing the average docking score of mutated INT proteins and reference genes of the most frequent subtypes including A1, B, C, AE and CRF35-AD showed that the most efficient INTIs in Iranian patients were EVG, RAL, BIC, CAB, and DTG, respectively. Docking scores of INT reference strains with RAL, EVG, BIC, CAB, and DTG are shown in Fig. 5.

Molecular docking finding
S230 N and Q95K mutations caused a substantial loss in docking energy of groups 7 and 10 for all five INTIs. In addition, reduction in the docking value related to some of the INTIs is displayed in generated models 1, 2, groups 1 and 2. In brief, major, minor, and some of the accessory (Q95K and S230 N) mutations were attributed to E value declined; nonetheless, high frequent NOPs did not influence the binding affinity. Thus, specific mutations in the mentioned groups and models might have a pivotal role in drug resistance.
Although RAMs and NOPs were distributed in different INT domains, all mutated INT proteins were docked through CCD with all INTIs. More importantly, none of the RAMs and NOPs was involved in interaction Table 6 The demographic data of patients and clinical characteristics.

B cell epitopes
Among suggested B cell epitopes, only those that were not placed in α-helix and β-sheet structure of protein were analyzed for antigen properties. Finally, the probable antigens were chosen as favorable epitopes (Table 12).

Subtyping analysis
Based on the phylogenic Neighbor-Joining tree of HIV-INT gene sequences of HIV infected patients in Iran which was generated with the corresponding INT gene of 89 subtype reference strains (Fig. 6) and Stanford HIV Subtyping program analysis, CRF35-AD subtype was the major subtype in our samples, but the other six tools introduced A1 as the predominant subtype. HIV Type 1 Subtypes based on the integrase Gene are listed in Table 13. The result of the phylogenetic tree and  Stanford HIV Subtyping program was in line with the recombination pattern of CRF-35-AD (accession number AF095). The red color shows subtype A1 and purple indicates subtype D (Fig. 7). The subtyping result of all 78 patients is displayed in supplement Table 10.
Conforming to the phylogenetic tree result, some new subtypes including CRF-01B and BF were introduced in Iranian patients for the first time. Of note, our data displayed RAMs found in our patients including V72I, I201V [53,54], M50I, R263K [3], and L74I/M, R263K, S230 N [54] were not specific for any subtype because they were revealed in different subtypes. Comparison of RAMS and NOPs between CRF 35-AD and other subtypes is shown in the Supplementary Table 11.

Discussion
INTIs regimens are highly efficient antiretroviral agents with longlasting potency and reduced toxicity, which are globally accepted in treating naive and experienced individuals. The presence of mutations in INT genes that can change the structural stability and flexibility of these proteins can impact the treatment outcome [2,5].
In this study, we completed several bioinformatics analyses on integrase genes and proteins of the most prevalent subtype in Iran, CRF-35-AD. INT genes failed to amplify in 73 out of 151 plasma samples via an effective RT-nested-PCR. It can be inferred that the emergence of new HIV strains may influence the efficiency of nested-PCR that highlights the importance of whole-genome sequencing of HIV-1 circulating in Iran periodically.
In this study, we found that good adherence even to previous ART regimes was a significant factor to decline the new mutations. Similar to our data, some studies indicated that reduced adherence was linked to some situations and signs such as digestive symptoms [55,56], development of more ART resistance mutations [57,58], more HIV replication, worse virological responses [59][60][61], age less than 35 or more than 45 [58], declined immunological response [57], shorter time on ART, being IDUs, and advanced HIV stage Thereupon [56].
To control HIV infection and achieve better medical outcome, we need to upgrade the surveillance at three levels: identification and treatment of naïve treatment patients, medication adherence improvement, and regular monitoring of IDUs.
From the present analysis, most of NOPs lied in the catalytic core domain that is target for INTIs [62]; yet, such mutations had limited or no impact on the binding energy. This might be due to the hypothesis that NOPs did not influence the functional structure of INTIs; thus, prescribing such inhibitors can be promising in Iranian patients.
The higher docking score represented better binding affinity, indicating the strong attachment of integrase inhibitors to integrase proteins to suppress HIV functions that would contribute to the promising treatment outcome. Yang Luo et al. reported that R263K in combination with four NOPs (S24R, L101 M, G134 N, and K244E) might confer substantial reductions in susceptibility to a wide range of INTIs [67].
R263K appeared with G134 N [68,69] and K244 mutations, it seems the mutation in K244 position can be regarded as one of the HIV escape mechanisms [67].
The E value of interaction between INTIs with integrase genes in group 5 carrying R263Kdeclined for four INTIs: EVG, DTG, RAL, and BIC. This result is in accordance with some studies that observed R263Ksustained a moderate loss in potency against DTG, EVG, RAL, and BIC about 2-fold and had a detrimental effect on CAB susceptibility [70,71].
In addition, one minor RAM, L74I, was found in group 8 that reduced the energy value of BIC, CAB, and EVG. This result is somehow in line with a study that illustrated L74I would have a slight effect on INTIs susceptibility [72]. The previous reports showed L74I contributed to high-level DTG resistance that lowers the potential effect of the first-generation INTIs when combined with some of the major mutations [73][74][75]. Till now, such minor mutation was reported only in one Iranian patient that was resistant to INTIs [68].
In our samples, accessory mutations including G163R, Q95K, S230 N, M50I, V72I, V249I, and L74 M appeared, which can cause a substantial loss in susceptibility to INTIs alone or in combination. INTI drugs usually select G163R [76]; this mutation was reported in INTI-naive patients at a similar rate in our study [64,77]. On its own, our docking result revealed this mutation not only did not appear to be associated with reduced INTIs susceptibility, but also could even enhance the E value.
Other investigations need to be done to clarify the functional role of G163R, and Q95K in response to INTIs.
In addition, the effect of L74 M was evaluated alone or in combination with other mutations. From our data, L74 M retained the docking energy and similar to M50I, may be responsible for the increase in free energy of binding values.
In this study, M50I and R263K were not present in one patient simultaneously. Hence, M50I and R263K were inserted in INT reference, resulting in generation of 5 models, including mutant 1, 2, BIC, DTG, and CAB mutated models followed by the evaluation of the effect of these mutations on the E value. The reduction in energy value was only found in the interaction of BIC drug with BIC-mutated models and mutant 1; it can be concluded that combination of M50I and R263K possibly hurts the sustainability to BIC. In contrast to our data, some studies declared M50I along with R263K was responsible for remarkable loss in DTG [79,80], BIC [81], and CAB [82] susceptibility, but in subtype B.
The high frequent NOPs (M203I, G134S, V112I, and N39P) were identified in our patients not previously reported in two earlier studies in Iran, except for G134S [68]. In our investigation, docking value even increased in the groups of 1-4 carrying M203I, G134S, V112I, and N39P. Therefore, it is suggested that such frequent NOPs will not confer resistance to any of the currently available INTIs. On the other hand, an in vitro study showed V112I was linked to more moderate decreases in viral replication capacity [83,84]; consequently, V112I may contribute to viral fitness to induce resistance in treated individuals [85]. Also, Ceccherini-Silberstein et al. reported that G134S in conjunction with some other mutations resulted in INT catalytic core destabilization and reduction in INTIs efficiency [69,77]. Clinical studies are needed to define whether such mutations at baseline facilitate INTIs resistance in CRF-35AD subtypes.
One of the NOPs in our patients was L101 M, which has not been described previously as RAMs, but L110 M in groups 5, 7, and 9 decreased the docking energy in some INTIs that may be correlated with drug resistance in vivo. Developing the mutant virus carrying L101 M can clarify the influence of this mutation on viral fitness, integration steps, etc.
According to our data, the presence of RAMs and NOPs may cause slight effect on binding energy and drug efficacy; thus, INTIs are likely to be capable of treating Iranian patients infected with the CRF35-AD subtype. This may be due to the location of the amino acid substitutions that were not in the conserved parts of the INT core domain (Asp64, Asp116, and Glu152) [86].

Table 9
The "Protparam" results of INT mutated proteins and selected reference.  To understand the effect of the NOPs and RAMs on binding affinity, performing molecular dynamics can provide a better explanation; nevertheless, the docking data in this report can also be helpful to declare the efficiency of different INTIs on Iranian patients.
Based on the affinity of the ligand-receptor complex, EVG and RAL were linked to higher free binding energy in our samples that can be considered for the optimal INTIs treatment in Iranian population.
One of the reasons for the high score in our patients with EVG and RAL could be attributed to the presence of H bonds that exhibited a strong type of interaction.
Viral infection use PTMs to enhance protein antigenicity and virulence properties; plus, increase protein solubilization, interferon response inhibition, and viral replication that have a significant role in viral pathogenesis.
Therefore, host machinery cells remove the PMTs from viral proteins to activate immune response pathways, control the virus replication, and inhibit the viral protein synthesis to eliminate the virus.
INT undergoes multiple PMTs that play versatile roles in the functions of INT and HIV-1 viral replication [87]. Phosphorylation prediction suggested some residues appropriate for phosphorylation that may be required for the interaction of INT with cellular factors that either tether or stimulate the integration into the genome. Our finding showed that the most suggested sites for phosphorylation modification were S255 and S57. A previous study indicated that preventing phosphorylation at the S255 position exhibited more viral infectivity correlated with an increased chance of viral DNA integration. Also, INT phosphorylation at position S57 led to INT stability that is ultimately required for efficient viral replication [88]. On that account, interfering with S255 and S57 phosphorylation may cause lower viral replication and HIV pathogenesis. Given the different disulfide bonds that can be formed between cysteines, special linkages are shaped according to the energy and structural constraints [89]. Once formed, the disulfide bond covalently locks the structure in place and primarily increases the stability and half-life of the proteins [20].
Therefore, disulfide bonds degradation may degrade viral proteins and provide a new area for antiviral drugs development.
In this report, none of RAMs and NOPs in our sequences provided the new SUMOylation target site. Hence, other factors may have affected the position of SUMOylation. INT is susceptible to be SUMOylated at three SUMOylation sites (45LKGE, 135IKQE, and 243WKQE) on three Lys residues (K46, K136, and K244). INT SUMOylation impairment correlated with a significant drop in integration events and inhibited replication [87,90,91].
Unlike the second SUMOylation site, the first and third SUMOylation sites were conserved in all our samples that can be a desired target for designing drugs to destabilize HIV-INT.
Stabilization of INT is required for efficient genomic interaction which can be done via blockade of ubiquitination associated with proteasomal degradation [92].
Here, different lysines were suggested as ubiquitin targets using bioinformatics tools. Among them, K186 and K240 were essential residues since they play a significant role in the structure and functions of INT protein [93]. To usurp the host-ubiquitin machinery, these lysines should be marked for proteasomal degradation to suppress HIV integration.
Various agents have been categorized as carbohydrate-binding agents (CBAs) that impede virus infection. To suppress the vast majority of viruses, the development of antivirals components to target special deglycosylation sites in viral proteins may provide an absolute opportunity for clinical therapies.
This report defined N, O, and C glycosylation sites and, to the best of our knowledge, the influence of glycosylation on integrase proteins and HIV pathogenesis is not described clearly; thus, experimental studies are needed to elaborate on this matter [94].
The data of ProtParam determined the notable similarity in INT mutated proteins in all patients with reference. However, the pI in different integrase proteins was slightly different, which can be described by the diversity in the number of basic amino acids, as mutated INT proteins are basic proteins. Accurate prediction of the pI of viruses is beneficial for physical/chemical treatment processes and modeling virus behavior in environment [95]. Moreover, the instability index, an estimation of the stability of a protein in a test tube, confirms they are unstable proteins and a relatively high aliphatic index of INT proteins revealed they are thermostable proteins. Next, the average GRAVY, which calculates a grand average hydropathy of the sequence, illustrates all INT proteins inquired moderately hydrophilic property. The GRAVY value less than zero is an indicator of hydrophilicity, suggesting the hydrophilic nature of INT proteins and the possibility of better interaction with water [96].
Attributing to the short half-lives of proteins assessed by Protparam, all INT proteins in this study were a kind of fast degradation proteins in humans, yeast, and E-coli. The rate of protein degradation is dependent on a few factors such as molecular weight, size, and surface charge [97].
In recent years, HIV-INT recombinant protein were used for different approaches including serological diagnostic methods, therapeutic applications, and vaccine development. There is no general expression system to use optimally for all mentioned purposes;thus, various expression host systems should be applied for each purpose.
The effect of the RAMs and NOPs on the secondary structure was analyzed and our finding revealed that these mutations did not change various properties of the secondary structure. Accordingly, significant changes in the binding pocket of the mutated INT proteins did not   happen and drug potency was retained. This finding is in the same line with docking results in this study. In our study, only epitopes located in β-turn and random coil structures were considered for further analysis because such secondary structures are mostly placed in the surfaces of the protein and are more likely to be favorable for binding to antibodies [98]. However, most of the α-helix and β-sheet structures are located inside proteins, which are difficult to be recognized and bound by antibodies [98]. In comparison to the INT reference, different RAMs and NOPs did not affect the location of secondary structures and B cell epitopes.
Here, the subtyping result of our samples is different according to the kind of applied methods. . From our data, INT gene can be considered as   an appropriate region for HIV subtyping in Iranian patients if only the phylogenetic tree or Stanford HIV subtyping program was applied for subtyping. Subtyping results of our patients showed that CRF-35AD subtype was the most frequent subtype in Iran, which is in agreement with other studies in Iran [31,99,100]. The first molecular study of HIV-1 genotypes in 2006 [101] revealed that Iranian HIV-1 subtype was suggested to be A; but, Sanders-Buell et al. analyzed the mentioned sequences again and found they were indeed AD recombinant subtypes. The later research group declared the Iranian sequences contained a small region of subtype D in the envelope regions [102].
In this report, the phylogenic analysis revealed some new subtypes; CRF-01B and BF; therefore, whole-genome sequencing is essential for such samples and those that were not amplified via RT-nested-PCR to confirm the emergence of the new strains.
Till now, the list and role of RAMs are not identified for the CRF35-AD subtype. By comparing the mutations that developed in our CFR-35 AD subtypes with other subtypes (Supplemental Table 11), it can be inferred RAMs are similar in different subtypes.
No compelling evidence was found that HIV-1 subtype, but effectiveness, toxicities, and tolerability of ART regimens need to be considered in choosing first-line or second-line therapy, in low-income and middle-income countries [103]. However, evaluating the role of different mutations in various HIV-1 subtypes which may cause significant loss in INTIs potency may be beneficial to revise algorithms for resistance tests and optimize the prescription protocol of INTIs [104].
Advanced In-silico researches in drug discovery and vaccine designing have a considerable role in HIV studies that have accelerated the rapid advancement in the production and manufacture of medicine to balance the therapeutic options by clinicians.
In other words, choosing a treatment strategy is attributed to the various factors including fewer side effects, availability, acceptable tolerability, short treatment duration, efficacy, and pan-genotype activity. This study could act as a stepping stone to designing novel experiments: Applying mentioned bioinformatic approaches may be very useful to examine the efficacy of antiretroviral drugs periodically in different epidemiological settings to anticipate the most efficient HIV inhibitors. Plus, such results can be even more beneficial if all predictions will be evaluated in-vitro. Furthermore, such tools can optimize new drugs to suppress HIV infection that may lead to the restoration of strong immune responses that aim to eliminate the virus.
Moreover, the molecular docking methods enable the researchers to reveal the interaction between host factors and the HIV proteins to suggest the evolutionary relationship between them that is helpful to recognize the virus behavior.
INT post-modifications are essential for HIV pathogenesis; therefore, applying bioinformatics tools to unveil PTM sites, pathways, and the underlying mechanisms can propose pharmacological inhibitors. Accordingly, to disclose PMT issue bioinformatic studies are suggested for new therapeutic approaches.

Conclusion
Higher E value of EVG and RAL of mutated INT proteins showed these drugs may help achieve optimal treatment response in Iranian patients. Our bioinformatics analysis showed that RAMs and NOPs led to zero to modest loss in INTIs potency, suggesting that INTIs can be considered in the first-line and salvage therapy in treatment of patients infected with CRF35-AD subtype. Among different NOPs, Q95K, S230 N, and L110 M lowered the strength of INTIs docking energy in interaction with INTs that may consider such mutations as the major or minor ones in the CRF35-AD subtype. Various post-translation modifications and Bcell epitope prediction suggested particular target sites and epitope regions for future antiretroviral drugs and vaccines design, respectively. PTM sites and pathways are possible pharmacological targets for new therapeutic approaches. RT-nested-PCR test failed to amplify INT genes in 50% of the samples that might be due to the emergence of new HIV subtypes in Iran; whole-genome sequencing is strongly recommended to clarify this point. More focus on improving the quality of HIV care, good medication adherence to any types of anti-HIV drugs, and treatment of naïve patients along with better management of HIV-positive IDUs are essential factors to achieve continuous HIV care and better medical outcomes in Iranian patients.

Ethics approval and consent to participate
The ethical permission for this research study was granted by the ethics committee of the Shiraz University of Medical Sciences with the Certificate Reference Number of REC 270710028RA. All participants signed written informed consent before participating in the study, in accordance with the Declaration of Helsinki.

Consent for publication
Not applicable.

Funding
This work was supported by the Shiraz University of Medical Sciences with the Certificate Reference Number of REC 270710028RA.