Adapting the geno2pheno[coreceptor] tool to HIV-1 subtype CRF01_AE by phenotypic validation using clinical isolates from South-East Asia
Journal of Clinical Virology

Objectives: Geno2pheno[coreceptor] is a widely used tool for the prediction of coreceptor usage (viral tropism) of HIV-1 samples. For HIV-1 CRF01_AE, a significant overcalling of X4-tropism is observed when using the standard settings of geno2pheno[coreceptor]. The aim of this study was to provide the experimental backing for adaptations to the geno2pheno[coreceptor] algorithm in order to improve coreceptor usage predictions of clinical HIV-1 CRF01_AE isolates.
Study design: The V3 regions of 20 clinical HIV-1 subtype CRF01_AE samples were sequenced and analyzed by geno2pheno[coreceptor]. In parallel, coreceptor usage was determined for these samples by replicative phenotyping in human cells in the presence of specific X4- or R5-inhibitors.
Results: The sole introduction of the CRF01_AE V3 region into a full-length, otherwise subtype B provirus failed to produce replication-competent viral progeny. A successive genome-replacement strategy revealed that CRF01_AE-derived gag and pol sequences are also necessary to generate HIV genomes with sufficient replication competence. Subsequent phenotypic analysis confirmed overcalling of X4-tropism for CRF01_AE viruses using the current version of geno2pheno[coreceptor] and its standard cut-off at 10% false positive rate (FPR). Lowering the FPR cut-off to 2.5% reduced the X4-overcalling in our sample collection, while still allowing a safe administration of Maraviroc (MVC).
Conclusion: This study demonstrates the successful adjustment of geno2pheno[coreceptor] rules for subtype CRF01_AE. It also supports the unique strength of combining complementary methods, namely phenotyping and genotyping, for validating new bioinformatics tools prior to application in diagnostics.


Introduction
HIV-1 uses either CCR5 (R5-tropic virus) or CXCR4 (X4-tropic virus) as coreceptor for cell entry. Virus variants using the CCR5 coreceptor generally predominate during early stages of infection. This has been attributed to the mechanism of entry, which often involves monocytoid and other cells in mucosal tissue. As the HIV infection progresses in the absence of therapy, viral strains become increasingly variable within the infected host, including with respect to cellular tropism. In the late stages of infection, X4-tropic strains become dominant in more than 50% of patients [1,2]. As far as has been studied, these coreceptor dynamics seem to be similar for the various HIV-1 subtypes.
The determination of coreceptor usage became clinically important in diagnostic settings when a mechanistically new antiretroviral drug, the entry inhibitor Maraviroc (MVC), was licensed for treatment of HIV patients. As MVC specifically blocks the CCR5 coreceptor but not CXCR4, a treatment decision for MVC requires prior determination of coreceptor usage in the blood of the respective patient [3].
In principle, co-receptor usage can be analyzed either functionally in vitro, using cell culture assays (phenotyping) [4], or by sequence analysis of a specific envelope region, env-V3 (genotyping) [5,6]. Phenotyping tests often involve long turnaround times due to the need for sophisticated cell culture formats and can only be performed in a biosafety level 3 laboratory. Also, most tests use DNA recombination into an existing proviral backbone, mostly based on HIV-1 subtype B (e.g. NL4-3 or HXB2). Therefore, the analysis of clinical non-B subtype isolates in vitro may not be straightforward.
One of the most widely used tools to genotypically predict tropism of HIV is the geno2pheno[coreceptor] web service [7]. Pairs of genotypic data and corresponding phenotypic information were used to develop and train the geno2pheno prediction system with machine learning methods. The resulting web tool geno2pheno[coreceptor] [8] has been validated in large subtype B studies, including MOTIVATE [9] and MERIT [10]. It predicts the coreceptor usage from the V3 sequence of a given viral genome. The system uses a support vector machine to classify viruses as R5- or X4-capable based on informative patterns in the V3 sequence. HIV-1 isolates that do not exhibit sequence patterns indicative of R5 viruses are typically classified as X4-capable. Many viruses from divergent non-B strains of HIV have V3 sequences that do not display strong R5-tropic sequence patterns and are therefore predicted as X4-capable. This is particularly true for subtype CRF01_AE viruses.
The HIV-1 subtype CRF01_AE, predominantly circulating in South-East Asia, is among those subtypes diverging the most from European subtype B viruses. In line with this genetic divergence, differences in clinical properties have also been reported. It has been suggested that patients infected with subtype CRF01_AE may have a more rapid decline of CD4+ T cell count compared with patients infected with subtype B virus. Further, a shorter time to needing antiretroviral therapy and a higher virulence during the course of infection have also been documented [11,12].
In a recent study by HIV-GRADE, analyzing the R5/X4 frequency in 2466 clinical HIV-1 isolates in Germany, the overall proportion of X4-tropic virus variants was found to be 15-30% when applying the 10% false positive rate (FPR) cut-off. However, while the X4 frequency was in this range for most subtypes, it was markedly different for samples of subtypes D and CRF01_AE, for which the study predicted an X4 frequency of 50% [13] (Fig. 1).
Potential reasons for the unexpectedly high frequency of X4-capable virus include i) a true higher prevalence of X4-capable viruses in CRF01_AE-infected patients, or, alternatively, ii) a systematic false overcalling in CRF01_AE isolates by the current geno2pheno[coreceptor] algorithm [14-16].
Matsuda et al. [17] recently showed by phenotyping that for HIV-1 CRF01_AE there is indeed a significant X4-overcalling when using the 10% FPR cut-off of the classical version of geno2pheno[coreceptor]. As this algorithm has been used in several recent studies performed in South-East Asia [11,18,19], a thorough examination and, if needed, a correction of the geno2pheno tool for the genotypic prediction of CRF01_AE coreceptor usage is urgently indicated.
The aim of this study was to provide the necessary verification and to provide a basis for adjustments of the geno2pheno tool for CRF01_AE in diagnostic settings.

Study design
Twenty patient-derived env (gp120) HIV-1 CRF01_AE samples from a cohort in Thailand were used for simultaneous phenotyping and genotyping (sequence analysis of the V3 region of the HIV-1 env gene). The samples were randomly chosen from 144 CRF01_AE plasma samples available through Thailand's National HIV Drug Resistance Surveillance Program from a study among female sex workers [20]; informed consent and ethical approval from the responsible IHRP were obtained (approval 3/2557).
The viral env region in patient-derived samples was amplified by RT-PCR and cloned into a pNL4-3 cassette or a newly designed CRF01_AE plasmid cassette where it reconstituted fully functional HIV genomes. The new CRF01_AE cassette has been made available through the portal of the European Horizon2020 project of EVAg.
After DNA transfection of 293T cells in co-culture with the HIV-competent SXR5-reporter cell line, viral replication and syncytium formation were phenotypically determined in the presence of either the R5-antagonist MVC or the X4-antagonist AMD3100.
Fig. 1. Frequency of R5- (green) and X4-tropism (blue) by geno2pheno using the standard FPR cut-off of 10% [13] in 2466 treatment-experienced patients, for whom tropism testing was performed at baseline prior to potential Maraviroc administration. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Phenotypic results were compared and combined with genotypic predictions and used for adapting geno2pheno[coreceptor] to the available CRF01_AE samples in order to reduce the prior X4-overcalling.

Results
We noticed early on that reconstituting HIV-1 genomes by inserting exclusively the gp120 region of the samples from Thailand into the NL4-3 background yielded only a very low viral infection rate in cell culture. As a strategy for improving viral competence, the replication properties of a whole array of recombinant HIV-1 clones carrying various genomic segments of CRF01_AE origin in the backbone of a prototypic subtype B virus (NL4-3) were compared side by side. After each cloning step (initially only the entire env gene, then env plus vpu, then vpu plus nef and eventually the entire region from gag to env), replication of the resulting subtype B/AE recombinants was analyzed. As the final result, the HIV-1 genome from the BssHII site at nt 712 to NgoMIV at nt 8338 (pNL-AE-K7_short) or to XmaI at nt 8888 (pNL-AE-K7) was substituted in frame by patient-derived CRF01_AE sequences, retaining only the LTRs and the 3′ end of nef of NL4-3. Into pNL-AE-K7 we inserted the respective env sequences from the clinical specimens in this study. This allowed us to phenotypically re-assess the tropism of the respective patient-derived viral envelopes. Passage of cell-free recombinant virus onto human lymphocytes verified the infectivity of progeny CRF01_AE viruses carrying patient-derived HIV-1 env sequences. By comparing viral V3 sequences after 30 days of passaging with the corresponding patient V3 sequences, mutations and contamination could be excluded.
Using the new pNL-AE-K7 we were able to determine the phenotype initially in 20 clinical samples (Table 1, column "Phenotype") by judging drug-based inhibition of viral replication and potential syncytia formation in the presence of either the R5-antagonist MVC or the X4-antagonist AMD3100. In this assessment using a virus-replication system, only one sample (Th026) was found to be X4-tropic, while 18 samples were determined to contain R5-tropic virus. For one sample (Th049), no clear tropism determination was possible, since small fusion events of 2-3 HIV-infected cells had formed in the cultures in the presence of either inhibitor. In this case, the presence of a dual-tropic virus could not be excluded.
Notably, for all tested B/AE recombinants the average syncytium sizes and the overall number of viral infection events in the culture dish remained low (approximately 10% of the control) when compared to the control plasmid pNL-NF. In parallel, the most prevalent genotype present in these 20 clinical samples was predicted using the standard version of geno2pheno[coreceptor] (Table 1, columns "g2p"). When linking these results to the phenotypic findings, the suspected systematic overcalling of X4-tropism in subtype CRF01_AE by the current version of the geno2pheno[coreceptor] tool became apparent: the assay specificity reached only 66% when the standard FPR cut-off of 10% was used. Lowering the FPR cut-off to 2.5% increased the specificity to 89%.
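The effect of the FPR cut-off on specificity can be illustrated with a minimal sketch. It assumes that a sequence is called X4-capable when its FPR is at or below the cut-off, and it takes specificity to be the fraction of phenotypically R5-tropic samples that are also predicted as R5. The function names and the sample values are hypothetical; they are invented for illustration and are not taken from Table 1.

```python
from typing import Iterable, Tuple


def predict_x4(fpr_percent: float, cutoff_percent: float) -> bool:
    """Assumed rule: call a sequence X4-capable if its FPR is at or below the cut-off."""
    return fpr_percent <= cutoff_percent


def specificity(samples: Iterable[Tuple[float, str]], cutoff_percent: float) -> float:
    """Fraction of phenotypically R5 samples that are also predicted as R5."""
    r5_fprs = [fpr for fpr, phenotype in samples if phenotype == "R5"]
    if not r5_fprs:
        raise ValueError("at least one phenotypically R5 sample is required")
    true_negatives = sum(1 for fpr in r5_fprs if not predict_x4(fpr, cutoff_percent))
    return true_negatives / len(r5_fprs)


# Hypothetical (FPR in %, phenotype) pairs, invented for illustration only.
toy_samples = [
    (1.2, "X4"),
    (1.7, "R5"),
    (4.0, "R5"),
    (8.5, "R5"),
    (22.0, "R5"),
    (60.0, "R5"),
]

for cutoff in (10.0, 2.5):
    print(f"cut-off {cutoff}%: specificity {specificity(toy_samples, cutoff):.2f}")
```

With these toy values, three of the five R5 samples are overcalled as X4 at the 10% cut-off, but only one at 2.5%, which mirrors the direction of the effect reported above.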
For confirmation beyond the small initial data set from Thailand, the newly suggested CRF01_AE-specific FPR cut-off of 2.5% was re-applied to a large data set of CRF01_AE samples from a German HIV-GRADE cohort (Table 2). When applying this new rule to all available CRF01_AE isolates, the significant discrepancy in the X4/R5 tropism ratio for CRF01_AE isolates, as depicted in Fig. 1, disappeared, rendering this subtype similar to the general, subtype-independent distribution of clinical samples.
When disregarding differences between subtypes, the overall tropism distribution across all isolates (including all subtypes) is 72% R5 and 28% X4. When the 10% FPR cut-off was applied specifically to subtype CRF01_AE isolates, this ratio shifted to 51% R5 and 49% X4, indicating a dramatic deviation with a Chi² p-value of <0.001. When the phenotype-supported new "CRF01_AE-FPR" cut-off of 2.5% was applied, a distribution of 76% R5 and 24% X4 was seen for the CRF01_AE isolates with a Chi² p-value of 0.43, which is no longer significantly different from the calculated global average of isolates irrespective of their subtype.
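A comparison of this kind can be sketched as a Chi² goodness-of-fit test against the global 72%/28% distribution. This is an assumption about the form of the test, which the text does not spell out. The counts below are hypothetical (100 isolates split according to the reported percentages), so the resulting p-values illustrate the direction of the effect and will not reproduce the reported values exactly.

```python
import math
from typing import Sequence


def chi_square_gof_p(observed: Sequence[float], expected_fractions: Sequence[float]) -> float:
    """P-value of a two-category Chi-square goodness-of-fit test (1 degree of freedom)."""
    if len(observed) != 2 or len(expected_fractions) != 2:
        raise ValueError("this sketch handles exactly two categories (R5, X4)")
    total = sum(observed)
    statistic = sum(
        (obs - total * frac) ** 2 / (total * frac)
        for obs, frac in zip(observed, expected_fractions)
    )
    # For 1 degree of freedom, the Chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Global distribution across all subtypes, as stated in the text: 72% R5, 28% X4.
expected = (0.72, 0.28)

# Hypothetical counts for 100 CRF01_AE isolates, chosen to match the reported percentages.
p_cutoff_10 = chi_square_gof_p((51, 49), expected)   # 10% FPR cut-off
p_cutoff_2_5 = chi_square_gof_p((76, 24), expected)  # 2.5% FPR cut-off

print(f"10% cut-off:  p = {p_cutoff_10:.2g}")
print(f"2.5% cut-off: p = {p_cutoff_2_5:.2g}")
```

The closed-form p-value used here only holds for one degree of freedom; for more categories a statistics library would be the more natural choice.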
For verification and potential fine-tuning, our phenotype-matched geno2pheno values were also subjected to the FPR cut-offs of 1% and 5% as well as to the FPR cut-off of 3.75%, which is used for NGS data. No significant difference from the expected distribution was seen for cut-offs at and above 2.5%.

Discussion
In this study, phenotypically determined co-receptor usage was compared to and combined with genotypic data to improve the prediction of geno2pheno[coreceptor] for subtype CRF01_AE isolates of HIV-1.
For the phenotypic determination of the co-receptor usage, major challenges had to be overcome. A recombinant plasmid-based system (pNL-K7), previously developed by our group [21], was used to reconstitute HIV-1 variants. This cassette permits the exchange of env segments by cleavage with unique restriction endonucleases and placing PCR-amplified HIV-1 env derived from patient plasma directly into a complete viral genome. After transfection into a human indicator cell line, viral replication of the recombinant HIV-1 variant in the presence of inhibitors can be quantitatively analyzed [22]. One hurdle in this process was a poor PCR amplification rate of the env fragments. Standard PCR primers were derived from a reliably working subtype B consensus sequence. The observed low amplification rates were a strong indicator of the vast sequence heterogeneity of our HIV-1 isolates in the viral env region, suggesting that the validated recombination protocol at predefined sites in Env may not be optimal for generating replicating viral subtype CRF01_AE genomes. Another technical hurdle was the low replicative fitness of recombinant HIV genomes encountered when using the in-house subtype B-based HIV-1 cassette (pNL-K7). We attributed the poor replicative capacity to previously reported observations that Env may critically depend on interactions with subtype-matched corresponding regions in Gag-Pol [23,24]. It is further possible that Env functions best in a subtype-unique context including its own co-evolved Vpu [25] or other viral proteins [22,26]. To improve the replicative fitness, we therefore designed a new cassette carrying a near full-length subtype CRF01_AE backbone. With this construct we were able to obtain sufficiently replicative virus to phenotypically determine the tropism of CRF01_AE patient samples.
Table 1. Overview of results. The phenotyped CRF01_AE samples with their confirmed sequence (V3 loop) and the respective genotypically predicted tropism. Blue = X4-tropic, green = R5-tropic, ND = not determined.
Based on our comparison between phenotypically and genotypically determined tropism, our study supports implementing a significantly lower FPR cut-off of 2.5% (compared to the standard of 10%) as a critical adjustment for appropriate tropism prediction for CRF01_AE samples by geno2pheno[coreceptor]. The suggestion to lower the FPR cut-off for CRF01_AE virus variants is supported by others: one study showed a specificity of only 50% for CRF01_AE samples at a 10% FPR cut-off, whereas the specificity increased to 77% when the FPR cut-off was lowered to 5% [27]. Another study, comparing different genotypic tools, concluded that for clinical practice a geno2pheno[coreceptor] FPR cut-off of 5% could be used to predict CRF01_AE tropism [28].
Also, for genotyping of clinical samples using deep V3 sequencing (NGS), the interpretation combines the FPR of each sequence with the frequency of the corresponding variant in the sample. Currently, this two-dimensional cut-off predicts a sample in which >2% of the variants have an FPR < 3.5% as not suitable for maraviroc treatment. This recommendation is so far independent of the HIV subtype [29]. We re-adjusted the FPR value specifically for subtype CRF01_AE in order to improve the clinical application of geno2pheno[coreceptor] for consensus (Sanger) sequencing as well.
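The two-dimensional NGS rule described above can be written down as a short sketch. The function name, the parameter names and the input format (one FPR and one relative frequency per variant, both in percent) are assumptions made for illustration; the example values are invented.

```python
from typing import Iterable, Tuple


def ngs_suitable_for_mvc(
    variants: Iterable[Tuple[float, float]],
    fpr_cutoff_percent: float = 3.5,
    max_low_fpr_share_percent: float = 2.0,
) -> bool:
    """Apply the two-dimensional cut-off to one deep-sequenced sample.

    `variants` holds (FPR in %, frequency in %) pairs, one per sequence variant.
    The sample is flagged as not suitable when variants with an FPR below the
    FPR cut-off together make up more than the allowed share of the sample.
    """
    low_fpr_share = sum(freq for fpr, freq in variants if fpr < fpr_cutoff_percent)
    return low_fpr_share <= max_low_fpr_share_percent


# Hypothetical samples, invented for illustration only.
mostly_r5_like = [(1.0, 1.5), (40.0, 98.5)]   # 1.5% of variants have a low FPR
x4_minority = [(1.0, 5.0), (40.0, 95.0)]      # 5% of variants have a low FPR

print(ngs_suitable_for_mvc(mostly_r5_like))
print(ngs_suitable_for_mvc(x4_minority))
```

Keeping both thresholds as parameters makes it easy to see how a subtype-specific FPR value would enter such a rule, although the recommendation cited above is subtype-independent.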
As the R5-antagonist MVC has proved to be a well-tolerated drug, lowering the FPR cut-off would potentially allow more patients to benefit from MVC administration, especially in South-East Asia. Therefore, taking our results into consideration, we suggest setting an FPR cut-off of 2.5% for the tropism prediction of clinical subtype CRF01_AE samples by geno2pheno[coreceptor]. However, further studies on a larger cohort are needed to verify this suggestion.
One proposed reason for the observed X4-overcall using the standard FPR cut-off of 10% is a difference in common sequence motifs. The typical CRF01_AE envelope contains several otherwise uncommon amino acids in the V3 region as a unique and inherent feature. The motif GPGQVF at the tip of the V3 loop occurred very frequently in the HIV-1 CRF01_AE samples of this study (Fig. 2).
Using the standard version of geno2pheno[coreceptor], these samples were often predicted to be X4-tropic HIV-1 isolates, in contrast to samples with the GPGRAF motif, which is frequent in R5 variants of subtype B. This significant deviation raised the speculation that the GPGQVF motif alone might result in incorrect X4 predictions. However, the raw data of the prediction system show that additional minor changes outside this very tip region also contribute to the X4-overcalling of subtype CRF01_AE variants.
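How often a given tip motif occurs in a set of V3 amino-acid sequences can be tallied with a simple substring search. The following is a minimal sketch; the function name is hypothetical and the input strings are artificial placeholders, not real V3 sequences from this study.

```python
from collections import Counter

# Tip motifs named in the text: GPGQVF (frequent in CRF01_AE) and GPGRAF (frequent in subtype B).
TIP_MOTIFS = ("GPGQVF", "GPGRAF")


def tip_motif(v3_sequence: str) -> str:
    """Return the first known tip motif found in the sequence, or 'other'."""
    sequence = v3_sequence.upper()
    for motif in TIP_MOTIFS:
        if motif in sequence:
            return motif
    return "other"


# Artificial placeholder strings, used only to exercise the function.
toy_sequences = [
    "AAAAGPGQVFAAAA",
    "AAAAGPGRAFAAAA",
    "aaaagpgqvfaaaa",
    "AAAAGPGKAFAAAA",
]

motif_counts = Counter(tip_motif(seq) for seq in toy_sequences)
print(dict(motif_counts))
```

Such a tally only describes motif frequency; as noted above, the motif alone does not explain the X4-overcalling.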
For further improvement of geno2pheno[coreceptor], more information about specific V3-loop characteristics of subtype CRF01_AE variants will be incorporated in future versions of the algorithm.

Study limitation
Although the results are clear within the experimental setting, the low number of X4-tropic samples identified in the study poses a limitation. Most samples appeared to be R5-tropic by phenotypic determination, although the genotypic prediction was X4-tropic for several samples in the investigated cohort. However, a relatively high proportion of R5-tropic isolates is common for HIV-1 in ART-treated individuals, consistent with results reported by Cui et al. [31]. When including our findings reported in Matsuda et al. [17], only two samples were phenotypically identified as X4-tropic, which corresponds to an overall percentage of 4.8% (2 out of 42 samples). A validation with more X4-tropic samples is recommended to corroborate our results.
It should be noted that all specimens of this study came from female sex workers in a cohort in Thailand, although gender is currently not known to play a role in the tropism distribution.
Table 2. The frequency of X4 in the patients with subtype CRF01_AE in the HIV-GRADE study was likely to be overstated with an FPR cut-off of 10% in comparison to the frequencies of the other subtypes. When the FPR cut-off was lowered to 2.5%, as phenotypically determined in this analysis, the subtype CRF01_AE-specific polymorphisms were correctly accommodated. The relevant FPR range between 2.5% and 5% (Chi² > 0.2) has been shaded.

Conclusion
By combining genotypic data with phenotypically determined results, this study demonstrates the previously suspected systematic overcalling of X4-tropism in subtype CRF01_AE by the current version of the geno2pheno[coreceptor] tool. A suitable solution was obtained by adjusting the FPR cut-off for CRF01_AE samples to 2.5%. This results in a correct tropism prediction for clinical CRF01_AE samples and may guide the safe and beneficial use of MVC in a broader group of HIV-infected individuals who carry virus of this subtype.

Declaration of Competing Interest
None.