The global burden of HIV-1 drug resistance in the past 20 years

Genotypic drug resistance testing has been an integral part of the clinical management of HIV patients for almost 20 years, not only assisting treatment choices but also informing drug development. Accurate estimations on the worldwide circulation of drug resistance are difficult to obtain, particularly in low/middle-income countries. In this work, we queried two of the largest public HIV sequence repositories in the world—Los Alamos and Stanford HIVdb—to derive global prevalence, time trends and geodemographic predictors of HIV drug resistance. Different genotypic interpretation systems were used to ascertain resistance to reverse transcriptase and protease inhibitors. Continental, subtype-specific (including circulating recombinant forms) stratification as well as analysis on drug-naïve isolates were performed. Geographic information system analysis correlated country-specific drug resistance to sociodemographic and health indicators obtained from the World Bank. By looking at over 33,000 sequences worldwide between 1996 and 2016, increasing drug resistance trends with non-B subtypes and recombinants were found; transmitted drug resistance appeared to remain stable in the last decade. While an increase in drug resistance is expected with antiretroviral therapy rollout in resource-constrained areas, the plateau effect in areas covered by the most modern drug regimens warns against the downgrading of the resistance issue.


INTRODUCTION
Development of drug resistance has been hampering antiretroviral therapy (ART) since the availability of the first anti-HIV drug. Indeed, genotypic drug resistance testing has been an integral part of the clinical management of HIV patients for almost 20 years, not only assisting treatment choices for individual patients but also informing drug development to deliver new drugs and novel drug classes to combat drug-resistant virus selected by previous treatments (Clutter et al., 2016;Iyidogan & Anderson, 2014). HIV drug resistance genotyping is probably the best example of the impact of molecular diagnostics on treatment of an infectious disease. As expected, progress in ART has been paralleled by changes in the prevalence and circulation of different drug-resistant variants along with availability and use of new drugs, drug classes and drug combinations. Notably and reassuringly, most currently recommended drug regimens bring together potency, tolerability, ease of adherence and particularly a medium-to-high genetic barrier to resistance. As a result, the prevalence and incidence of emergent drug resistance have declined in recent years in high-income countries where the most modern treatment regimens have been increasingly used (Frentz et al., 2014;Rhee et al., 2015;Schultze et al., 2015). On the other hand, the ART rollout in low/middle-income countries has progressively expanded the number of patients under therapy but less tolerable and lower genetic barrier regimens have been used widely, actually creating the prerequisites for selection of drug-resistant variants (Bertagnolio et al., 2012). Indeed, the prevalence of drug resistance has been reported to increase significantly in several low/middle-countries in the last few years (Pham et al., 2014).
The relevance of HIV drug resistance would advise for some kind of regularly updated and structured worldwide surveillance to detect new trends and cope with issues timely. However, accurate information on the circulation of drug resistance is difficult to obtain and update, particularly, but not exclusively, in low/middle-income countries and global figures can only be estimated from temporally and geographically sparse data (Godfrey et al., 2017;Minior et al., 2017;Singh et al., 2014). In this work, we queried the Los Alamos HIV data base to derive the worldwide prevalence, time trends and geodemographic predictors of HIV drug resistance, and used the Stanford HIV drug resistance repository for sensitivity analysis. With the aim of estimating the global burden of HIV drug resistance, the primary analysis was done on cumulative resistance, i.e., emergent (on treatment or acquired) and transmitted (pretreatment or primary) resistance. However, given its importance per se, transmitted resistance was also separately investigated.

METHODS
This is a secondary data analysis on public data, collating individual-level de-identified demographic information associated to HIV gene sequences. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
We set up an advanced query on the Los Alamos HIV data base (Los Alamos National Security LLC, 2017), selecting HIV-1 sequences of any subtype with a known sample year between 1996 and 2016, encompassing the whole protease region (1-99 amino acids) and at least the first 250 amino acids of the reverse transcriptase, using HXB2 nucleotide numbering as reference. Problematic sequences-which as per the Los Alamos definition include high content of ambiguous bases, G-A hypermutants, synthetic or contaminants-were excluded. As additional background information, we selected the Genbank accession number, patient identifier, treatment status, age, sex, mode of HIV transmission, geographic region (continental area as per the Los Alamos definition), and country. Patients with unknown or non-uniquely ascertainable patient identifier were excluded, and only one sequence per patient per year was retained, with no restrictions on the infection time, choosing the earliest sequence in case of multiple entries available in a year.
All downloaded sequences were submitted to Stanford's HIVdb web service (Stanford University, 2017) for quality check, alignment/extraction of mutations with respect to subtype B consensus, and calculation of drug resistance to nucleotide/nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), and protease inhibitors (PIs) using the three available algorithms of ANRS v.26 (France) (HIV French Resistance, 2017), Rega v.9.1 (Belgium) (Ku Leuven, 2017), and HIVdb v.8.4 (USA) (Stanford University, 2017). Resistance to one single drug was defined as an intermediate/resistant scoring as per the Stanford's standardized scale, including non-ritonavir boosted PI scoring. Resistance to a drug class (NRTI/NNRTI/PI) was defined as standardized intermediate/resistant scoring for at least one drug belonging to that class. Per-drug and per-drug-class rate of agreement among the three scoring systems was calculated via Cohen's kappa. Transmitted drug resistance was estimated separately using the WHO 2009 list of surveillance drug-resistance mutations (Gifford et al., 2009). Given the large number of sequences involved, fast subtyping was performed with BLAST on the latest HIV subtype and recombinant form reference set available on the Los Alamos data base, composed of 60 subtypes of which 14 pure and 46 circulating recombinant forms (CRFs), plus the chimpanzee simian immunodeficiency virus outgroup; two-three representatives per subtype are available, for a total of 170 reference sequences. For comparison, a subset of sequences was also subtyped using the Rega subtyping tool v.3 (Pineda-Pena et al., 2013).
We analyzed spatiotemporal trends in prevalence of drug resistance to NRTI/NNRTI and PI classes in relation to subtype, geographic area, and sociodemographic characteristics. We used descriptive, univariate and multivariate (logistic regression) analysis, and geographic information system (GIS) analysis. For GIS analysis, prevalence of B subtype, NNRTI, NRTI, PI, and two-class resistance were calculated for countries with at least 50 sequences from 2006 to 2016. Maps were generated using the ArcGIS 10.3 (ESRI, 2017). Country-level sociodemographic and health indicators were obtained from the World Bank's DataBank (The World Bank, 2017). We retrieved the average values of the following 22 indicators between 2006 and 2016: (1) Adjusted net national income per capita (constant 2010 USD), (2) Health expenditure, total (% of GDP), (3) Health expenditure, public (% of GDP), (4) Adequacy of social insurance programs (% of total welfare of beneficiary households), (5) Adults (ages 15+) and children (ages 0-14) newly infected with HIV, (6) Antiretroviral therapy coverage (% of people living with HIV), (7) Condom use, population ages 15-24, female (% of females ages 15-24), (8) Condom use, population ages 15-24, male (% of males ages 15-24), (9) CPIA gender equality rating (1 = low to 6 = high), (10) Improved sanitation facilities (% of population with access), (11) Incidence of tuberculosis (per 100,000 people), (12) Literacy rate, adult total (% of people ages 15 and above), Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age), and (22) Unemployment, total (% of total labor force, modeled ILO estimate). Spearman correlations were calculated between these indicators and the prevalence of B subtype, NNRTI, NRTI, PI, and the two-class resistance. Heatmaps were generated using the ggplot2 package in R (R Development Core Team, 2008).
In a sensitivity analysis, we downloaded HIVdb's Genotype-rx datasets, which include all publicly available HIV-1 isolates in HIVdb and the lists of antiretrovirals received prior to the isolations. These data sets are well curated in terms of sequence quality control and treatment status, but lack patient-level demographic information (e.g., gender), with only year and country available. Therefore we used them to confirm drug resistance trends across classes over time.

RESULTS
We downloaded 107,820 sequences meeting the inclusion criteria from the Los Alamos data base, of which 99.5% passed the HIVdb/BLAST quality check. Non-ambiguous patient identifier and known geographic area was retrieved for 33,057 instances. Table 1 shows population characteristics for the filtered data stratified by geographic region. Half of the sequence isolates came from China, United States, South Africa, Germany and United Kingdom. There was uneven distribution of sequences per country across calendar years (more sequences from North America in earliest years). About 37% of sequences were from antiretroviral-naïve people, evenly distributed across continental areas with the sole exception of North America (6%). The prevalence of B subtypes was 38%, with subtype C being the most prevalent among non-B/CRF ones (23%). Concordance between BLAST and Rega v.3 was high (94% on a subsample of 1,000 sequences), with major discrepancies found on complex recombinant forms (also due to the different reference data sets used).
The worldwide resistance prevalence between 1996 and 2016 was about 12% for PIs, 21% for NRTIs and 22% for NNRTIs, using Stanford's HIVdb algorithm.
The rate of agreement among the three genotypic resistance interpretation algorithms was high overall for each single drug, with the exception of non-B subtypes and PIs, for which ANRS has a completely different score distribution as compared to HIVdb and Rega; the problem is known and relates to tipranavir interpretation. Overall, all algorithms showed high rates of agreement, between 90% and 97%, the highest rates being between HIVdb and Rega (Table S1). Some relevant lower agreement included ANRS and HIVdb/Rega in regards to NNRTIs with subtype B (81%-83%), Rega and HIVdb for PIs with non-B subtypes/CRFs (75%), and as expected ANRS and HIVdb/Rega for PIs with non-B subtypes/CRFs (1%).
From now on, we will use the HIVdb interpretation as proxy for drug resistance among NRTI/NNRTI/PI classes, given the high agreement with Rega, which means that for all drugs at least two algorithms were concordant in resistance interpretation. Figure 1 plots the worldwide trends (1996 to 2016) in HIV drug resistance to NRTIs/NNRTIs/PIs, multi-class and newest NNRTI/PI drugs, overall and stratified per B vs. non-B subtypes/CRFs. There was a sharp decrease in resistance prevalence across all drug classes in subtype B sequences, flattening in the last decade. Instead, resistance to NRTIs, NNRTIs and two-class resistance increased in non-B subtypes/CRFs (remaining stable with respect to PIs). Transmitted drug resistance seemed to remain low and stable Table 1 Characteristics of the HIV-1 isolates retrieved from the Los Alamos sequence data base. Data are stratified by continental area (one sequence per person per year, encompassing 1-99 amino acids of the protease and 1-250 of the reverse transcriptase genes, all with a known year and geographic origin).   2007 (2004-2009) 2006 (2003-2011) 2000 (1999-2008) (below 8%) in all subtypes across years. Resistance to newest NNRTIs and PIs (etravirine, rilpivirine, darunavir) seem to increase over the years in non-B subtypes/CRFs, although more recent years exhibit higher uncertainty. Notably, the worldwide resistance trends were consistent with the estimation made on HIVdb's Genotype-rx datasets ( Figure S1). After filtering, the Genotype-rx data sets had a larger sample size (64746 isolates spanning reverse transcriptase and protease). Compared to the Los Alamos dataset, there was a higher proportion of subtype B isolates (51%), and different continental/national coverages (with the top-frequency country being Brazil).

Data attribute/
When looking at geodemographic factors associated to drug resistance in the Los Alamos dataset ( By continuing on Table 2, HIV-1 transmission by homosexual contact showed lower odds of drug resistance to any of the three inhibition class, and resistance to two or more classes (see also the association of homosexual transmission with B subtype). Injection drug users were at higher risk to carry drug resistance to NNRTIs but lower to other classes. Mother-to-child transmission was instead associated to higher risk of carrying any-class resistance. As expected, antiretroviral-naïve people had lower odds to present with drug resistance to NRTI/NNRTI/PIs as well as to multi-class. People younger than 26 years old had higher risk to carry a resistant virus to any drug class and multi-class as compared to older people. In terms of continental areas, all had lower odds of drug resistance as compared to Europe (Table 2). We also performed a stratified analysis in the Los Alamos dataset looking at sequences coming from the same continental zone (Table S2). Overall, results were consistent with the main analysis (e.g., odds for non-B subtypes/CRFs, treatment-naïve, and risk groups). We found significant differences in the odds of having a resistant virus for patients belonging to different countries in the same continental zone.
Non-B subtypes/CRFs were associated to lower rates of drug resistance (Table 2); however, when breaking down the non-B subtypes/CRFs, some exhibited higher odds of drug resistance, in particular: 04_cpx, O and P for all classes; C and 49_cpx for NNRTIs; and 46_BF for PIs (Table S3). Of note, the kappa agreement between B subtype and antiretroviral-naïve was very poor, i.e., −0.06, with 50% of the sequences being both antiretroviral-naïve and non-B, or the opposite.
Then we passed on to analyzing data from the most recent decade (2006 to 2016). Figure 2 shows drug resistance prevalence to NRTIs, NNRTIs, and PIs for the countries included in the Los Alamos dataset (see Fig. S2 for multi-class resistance). By performing the same multivariable analysis, we found that a more recent calendar year was associated to an increased risk of resistance to any of these three drugs; isolates from Africa and   Central America had also higher odds of presenting with drug resistance to new-generation inhibitors as compared to Europe. We then looked at factors associated with a B vs. non-B subtype (Table 3). Male homosexuals were more likely to carry a B subtype than heterosexuals, whilst intravenous drug users had lower odds. There has been a substantial decrease (about 10%/year) in rate of B subtypes recorded in the data base, with higher odds of carrying non-B subtypes/CRFs for adults 44+ years old. Africa, Asia, the Caribbean, former USSR/Russian Federation, Middle-East, and South America isolates had lower odds of being B subtypes as compared to European isolates, whereas Central America, North America and Oceania were more likely to be B subtypes.
With GIS analysis (Fig. 3), we found that higher rates of drug resistance were associated to unemployment, whilst lower rates were correlated with adequacy of social insurance programs. A higher prevalence of B subtype was correlated to improved sanitation facilities, physicians, higher literacy rates, adjusted net national income, and public health expenditure. Conversely, higher non-B/CRFs prevalence was correlated with high tuberculosis incidence, undernourishment, newly infected population, higher HIV prevalence, and population growth.

DISCUSSION
While development and circulation of HIV drug resistance is recognized as a relevant worldwide issue, estimating its burden and evolution is a challenging task. Information on drug resistance is commonly obtained through cohort studies or national and supranational HIV sequence repositories, sometimes created with different and specific aims and hence introducing sampling biases and failing to represent the whole patient population. In addition, the global view is seriously biased by a far larger number of studies done in high with respect to low/middle income countries with high HIV prevalence, two scenarios where drug availability and prescription policies have widely diverged for a long time. For example, a recent official document from the WHO has been based on 16 nationally representative surveys from only 14 low/middle income countries accounting for around 4,000 untreated and 300 treated patients with very limited representation from Southeast Asia (World Health Organization, 2017). The Los Alamos data base, even though it is not a formal surveillance medium, offers an opportunity to analyze a large number of sequences derived from a comprehensive variety of geographic areas. In addition, structured queries can be run to retrieve important information associated with HIV genotype. The overall worldwide trends of drug resistance obtained from Los Alamos data base in this work were confirmed by a separate analysis on the Stanford HIVdb Genotype-rx datasets, yet Los Alamos and HIVdb differ in terms of country, subtype distribution, and proportion of people on treatment. Nonetheless, this study approach brings a number of limitations, in particular related to sampling bias. First, some countries are strongly over-or under-represented in these databases in relation to their HIV incidence. Second, different types of patients are sampled for different dates and countries. Third, for many sequences no information on prior treatment is available: this concern applies especially to sequences from treatment experienced patients (for which treatment history plays a key role in the observed resistance patterns) but to a lesser extent also for treatment naive sequences. As a general note of caution, since the prevalence of drug resistance in untreated vs. treated patients differs considerably, any global estimate must correct available data for the rate of antiretroviral coverage in specific geographic areas.
Standing such challenges associated with interpreting sequence data for which metadata is not uniformly available, with a careful, stratified study design is still possible to draw estimates that can be corroborated by independent sources. In fact, our study capitalizes on the meta-analysis of Rhee et al. (2015) up on which Stanford HIVdb provides freely a large sequence selection and curation, and moves forward by looking at individual demographics and country-specific social-ecological determinants by GIS analysis. Therefore, large-scale studies like ours can be carried out in parallel with more traditional approaches that may suffer from small sample size, or meta-analyses that increase the sample size but necessarily discard a number of attributes.
A major finding of this work is the divergent temporal trends in drug resistance with B vs. non-B subtypes and CRFs. Although not strictly, these categories tend to represent high-and low/middle-income countries, respectively. While an increase in drug resistance is reasonably expected along with ART rollout in resource constrained areas, the plateau effect in decreasing drug resistance in areas covered by the most modern drug regimens warns against the downgrading of the resistance issue in Western countries, a vision supported by some large size studies published a few years ago. Indeed, although potent and well tolerated, mostly high genetic barrier regimens have been increasingly used, novel drug classes have been not made available since 2008, suggesting that cross-resistance may play a major role in carrying forward a proportion of patients with detectable viremia still fueling drug resistance at population level. Increasing resistance to latest drugs in more recent years, particularly those with high genetic barrier such as darunavir and etravirine, is indeed consistent with expanded use of these drugs but also with cumulative exposure to previously available cross-resistant drugs.
On the other hand, transmitted drug resistance appears to remain stable over time. Notably, stable primary resistance rates have been observed even in areas where emergent resistance has declined considerably, likely due to a process of selection whereby patients failing therapy and harboring drug-resistant virus are favored over patients with undetectable viremia and wild type virus in terms of HIV transmission. However, large clusters of wild type virus transmission are maintained via unsafe practices among individuals unaware of their HIV status or not yet under treatment (Olson et al., 2018).
It must be noted that adherence clearly plays an important role in determining the rate of emergent resistance. A higher prevalence of drug resistance was indeed observed in intravenous drug users with respect to other risk groups but only for NNRTI. The most widely used drugs in this class are well known for their long half-life and low genetic barrier resulting in functional monotherapy and ease of selection of drug resistance upon erratic adherence, a behavior often associated with intravenous drug use. On the other hand, very poor adherence may have also been responsible for the association between lower rates of resistance and healthcare indicators as well as literacy rate. In regard to the correlation of subtype with a number of social-ecological determinants, e.g., sanitation, physicians, public health expenditure, this in part reflects the known fact that subtype B is the predominant subtype in the US and western Europe. However, many of these sociodemographic indices also vary considerably among countries with majority of non-B subtypes, and our findings warrant a deepened analysis that may help public health officials to strategies welfare interventions.
The large sample size of this study allowed for comparisons between B and individual non-B subtypes/CRFs, rather than between B and non-B subtypes/CRFs, a convenience grouping commonly used but which does not reflect any biologically significant category. Apart from a small increase in NNRTI resistance with subtype C, we did not observe relevant differences, suggesting that resistance currently impacts different subtypes/CRFs with comparable rates, a finding also in agreement with the lack of evidence of differential outcome of ART with different HIV clades. The high levels of resistance to some rare recombinants (i.e., CRF04_cpx) may be probably due to the low number of sequences available. Therefore for rare variants the findings may be heavily biased.
An important limitation of this study is that resistance to integrase inhibitors was not analyzed. While this drug class is expected to be dominant in ART, both in high and low/middle income countries, currently scoring algorithms and isolate data in Los Alamos data base do not allow deriving robust estimates; Stanford HIVdb provides a larger collection of sequences, but the individual-level relevant demographic information is not available. However, updates are warranted along with increasing collection of integrase sequences over time. It is expected that the introduction of high genetic barrier integrase inhibitors such as dolutegravir, cabotegravir and bictegravir will limit emergence and transmission of resistance to this drug class.
Given the diverse collection policies and sources of sequence provenience for the Los Alamos and Stanford HIVdb repositories, we are aware that a proper sample selection for minimizing bias is difficult, and some of the findings here reported may have been affected by this issue.

CONCLUSIONS
Data obtained from large and in part curated collections of HIV sequences can effectively complement resistance centered survey such as those reported by the WHO to derive the global burden of drug resistance and inform future surveillance and education programs.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work has been supported by the ''Virogenesis'' project which has received funding from the European Union's Horizon 2020 research and innovation programme (grant agreement No. 634650), and by the United States' National Institutes of Health-National Institute of Allergy and Infectious Diseases grant ''HIV-DYNAMITE'' (No. 1R21AI138815-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.