Phylogenetic transmission clusters among newly diagnosed antiretroviral drug-naïve patients with human immunodeficiency virus-1 in Korea: A study from 1999 to 2012

Population-level phylogenetic patterns reflect both transmission dynamics and genetic changes, which accumulate because of selection or drift. In this study, we determined whether a longitudinally sampled dataset derived from human immunodeficiency virus (HIV)-1-infected individuals over a 14-year period (1999–2012) could shed light on the transmission processes involved in the initiation of the HIV-1 epidemic in Korea. In total, 927 sequences were acquired from 1999 to 2012; each sequence was acquired from an individual patient who had not received treatment. Sequences were used for drug resistance and phylogenetic analyses. Phylogenetic and other analyses were conducted using MEGA version 6.06 based on the GTR G+I parameter model and SAS. Of the 927 samples, 863 (93.1%) were classified as subtype B and 64 were classified as other subtypes. Phylogenetic analysis demonstrated that 104 of 927 patient samples (11.2%) were grouped into 37 clusters. Being part of a transmission cluster was significantly associated with subtype-B viruses, infection via sexual contact, and the infection of young males. Of all clusters, three (~8.1%) that comprised 10 individual samples (22.2% of 45 individuals) included at least one member with total transmitted drug resistance (TDR). In summary, HIV transmission cluster analyses can integrate laboratory data with behavioral data to enable the identification of key transmission patterns to develop tailored interventions aimed at interrupting transmission chains.


Introduction
The extremely high diversity of human immunodeficiency virus (HIV) has been attributed to its high replication capacity and the high frequency of errors introduced by reverse transcriptase during replication. HIV-1 is the most common virus types worldwide and has been PLOS ONE | https://doi.org/10.1371/journal.pone.0217817 June 5, 2019 1 / 13 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 classified into four groups as follows: M (major), N (non-M, non-O), O (outer), and P (pending the identification of further human cases); group P was identified recently in two Cameroonian patients. HIV-1 group M can be further classified into nine subtypes including A-D, F-H, and K [1]. This extensive diversity has led to frequent recombination between strains, resulting in several circulating recombinant forms (CRFs) and a very high number of unique recombinant forms (URFs) [2][3][4][5]. To date, 72 CRFs have been isolated, and this number is expected to increase in the future [6]. The unequal distribution of different HIV-1 genotypes worldwide results from the global transmission and spread of certain variants or the limited spread of local endemic strains [1]. Subtype B is predominant in the Americas, Western Europe, and Australia [7][8][9], whereas subtype B is also the most abundant genetic form in Korea [10][11][12].Further, CRFs and URFs are widely distributed in countries where different subtypes co-circulate [13][14][15][16].
Phylogenetic trees based on viral genes can deliver crucial insights into the ecology and evolution of HIV transmission [17,18]. Population-level phylogenetic patterns reflect both transmission dynamics and genetic changes [19][20][21], which accumulate because of selection or drift. Currently, the best method to identify and establish transmission events related to HIV between individuals or within a community is high-resolution phylogenetics based on HIV sequence data [22][23][24][25][26].
In this study, we aimed to determine whether a longitudinally-sampled dataset derived from HIV-1-infected individuals over a 14-year period (1999-2012) could shed light on the transmission processes involved in the initiation of the HIV epidemic in Korea. The identification of transmission clusters and their characterization may provide valuable insights into factors that contributed to the origin of HIV transmission in Korea. We characterized the composition of phylogenetically reconstructed "clusters", or groups of people in which multiple transmissions likely occurred, and assessed the factors associated with membership to these clusters among patients diagnosed from 1999 to 2012. Here we report our results from applying the phylodynamic profiles of HIV-1 subtype B and other subtypes circulating among the antiretroviral drug-naïve population of Korea.

Study population and RNA extraction
Blood and plasma samples of individuals newly diagnosed with HIV-1 infection, for whom highly active antiretroviral therapy (ART) had not been initiated, were collected on an annual basis for genotypic assays of antiretroviral drug-resistant variants in Korea. Variations in HIV-1 pol (a polymerase gene) were monitored continuously using a subset of approximately 10% of the samples isolated from newly-diagnosed drug-naïve patients every year since 1999 (Table 1). A simple random sampling method was used to select patient groups based on their epidemiological history. The study was approved by Korea Centers for Diseases Control and prevention Research Ethics Committee (No. 2012-05CON-11-P).

Viral RNA extraction, pol amplification, and sequencing
The experimental conditions for reverse transcription-polymerase chain reaction (RT-PCR) and PCR for sequencing the PR and RT parts of the pol region were based on the laboratory protocol specified by Stanford's Center for AIDS Research. In accordance with the manufacturer's instructions, viral RNA was extracted from 140 μL of the plasma sample using a QIAamp Viral RNA Mini kit (Qiagen, Valencia, CA, USA) and was suspended in 50 μL of elution buffer. RT-PCR was used to generate a template with the primer set MAW-26 (2028-2051, 10 pmol/mL, forward) and RT-21 (3539-3509, 10 pmol/mL, reverse), using a SuperScript III One-Step RT-PCR kit and Platinum Taq DNA polymerase (Invitrogen, Carlsbad, CA, USA). After RT-PCR, nested PCR was performed. The PCR product of pol (~1.3 kb), containing the entire PR and RT gene sequences, was subjected to direct sequencing using an ABI Prism Big-Dye Terminator Cycle Sequencing 3.1 Ready Reaction Kit (Applied Biosystems, Foster City, CA, USA) with an automated sequencer (ABI Prism 3730 DNA sequencer; Applied Biosystems), after purification with a Millipore PCR Cleanup kit (Millipore Corp., Madison, WI, USA) and following gel electrophoresis.

Genotypic drug resistance assays
PR and RT drug resistance mutations in antiretroviral drug-naïve patients were investigated to analyze genotypic resistance to protease inhibitors (PIs) and nucleoside reverse transcriptase inhibitors (NRTIs), as well as non-NRTI (NNRTI)-related resistant variants. Resistant mutations were identified based on the consensus statement from the Stanford sequence database (DB) for HIV PR (codons 1-99) and RT (codons 1-300). The defined drug resistance positions were as follows: 22 protease inhibitor-resistance positions at codons 10,20,24,30,32,33,36,46,47

HIV genotyping and phylogenetic analysis
HIV-1 subtype and CRF designations were determined by uploading the sequences into REGA HIV-1 automated Subtyping Tool version 2.0 (http://www.bioafrica.net/regagenotype /html/subtypinghiv.html) and were confirmed by in-house phylogenetic analysis of pol nucleotide sequences, as described previously [3,4,17]. For phylogenetic analysis, reference sequences representing the overall genetic variability of HIV-1 group M, including all subtype, sub-subtype, and CRF references, were obtained from the National Institutes of Health/ National Institute of Allergy and Infectious Diseases (NIH/NIAID)-funded HIV database. Phylogenetic analysis was performed using MEGA 6.06 initially, and an alignment of 1437 nucleotides was created. Genetic subtype and potential transmission clusters from the timestamped sequence dataset were first deduced by neighbor-joining tree reconstruction using MEGA version 6.06 based on the GTR G+I parameter model [27]. The robustness of the transmission clusters was further tested by the more rigorous maximum likelihood inference implemented by MEGA version 6.06 [27] using a gamma distribution with discrete gamma categories. The reliability of the branching orders was assessed by bootstrap analysis of 1000 replicates. The most appropriate nucleotide substitution model was determined using Find-Model, a web implementation of Modeltest available at the HIV DB (http://www.lanl.gov. com). Using the Bayesian Markov Chain Monte Carlo framework, 100 million steps were performed, sampling every 10,000, and removing 10% as burn-in. Convergence was assessed using Tracer (v1.4), and effective sample size values greater than 200 were accepted. A maximum clade credibility tree was summarized with TreeAnnotator (available in the BEAST package) and was visualized with Figtree (v1.4) [28,29].

Identification of transmission clusters and analysis of associations with transmission clustering
Phylogenies were inferred using a general-time reversible model of nucleotide substitution, an estimated proportion of invariant sites, and gamma distributed rates among sites. The best sub-tree pruning, and re-grafting and nearest neighbor interchange heuristic options were selected to search the tree space, and bootstrap values with 1000 replicates were used to assess confidence in topology. The existence of transmission clusters was determined based on the statistical robustness of the maximum likelihood topologies assessed by high bootstrap values (90%) with 1000 re-samplings and short branch lengths (genetic distances, 0.015) of the HIV-1 pol sequence. The phylogenetic tree was displayed with FigTree v.1.42 (tree.bio.ed.ac.uk/ software/figtree/). We considered membership in a phylogenetic transmission cluster as the dependent variable for our analysis. We used a Pearson's chi-squared test, a Fisher's exact test, and Student ttests to compare clustered and non-clustered patient samples. We then entered variables with a P value of 0.2 based on preliminary analysis or variables that we considered conceptually important based on previous reports, into univariable and multivariable regression models to examine associations with transmission clustering [16]. We conducted multivariable analysis using the full data set. The analysis was then repeated using samples with available transmission risk data. Among men with a known risk of transmission, we further stratified the regression analysis to identify differences in associations based on clustering comparing men who had sex with men (MSM) versus men who had not had sex with men (non-MSM). We based the multivariable model building on an iterative approach and assessed model fit using the Hosmer-Lemeshow goodness-of-fit test. We analyzed data using SAS version 9.2 (SAS Institute, Cary, NC, USA).

Patient demographics and molecular characteristics of HIV-1
In total, 927 sequences were acquired from 1999-2012, with each sequence acquired from an individual patient who had not received treatment. Seventy-four percent of the population was greater than 30 years of age (median age, 34.5 years). Men comprised 89.4% (829/917) of the study population, whereas women accounted for 6.6% (61/917) of the total HIV-1-infected population. This ratio was similar to that reported by the Division of HIV and TB Control, KCDC for the entire HIV/acquired immunodeficiency syndrome (AIDS) population in Korea in March 2013 (90.9% men and 9.1% women). The majority of men (324; 39.1%) attributed their infection to heterosexual contact; 285 (34.4%) were classified as MSM, and two (0.2%) infections were perinatal. The women attributed their infections to heterosexual contact. Table 1 shows the demographic characteristics of the 927 patients. The samples were characterized according to sex, age, transmission route, CD4 count, and HIV RNA levels in the plasma.

Identification of the transmission clusters
Based on viral sequences, phylogenetic analysis demonstrated that 104 of 927 patients (11.2%) were grouped into 37 clusters. All clusters were composed of 2-8 members; for seven small clusters, both or all three patients that formed the clusters were diagnosed during the same year. Twenty-two clusters included individuals who first consulted a hospital 2-4 years after the report of infection; the remaining eight were clusters that included patients for whom presentation covered 5 years or more prior to consulting a hospital. The mean number of patients per cluster was 2.81 (range, 2-8). Interestingly, all clusters belonged to subtype B, and four large clusters (comprising six or eight patients) and 33 smaller clusters were identified. Large cluster no. 31 comprised young men (mean age, 22.3 years), who exhibited a slightly lower viral load than patients in the other clusters and accounted for a long time span (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012). Phylogenetic trees based on the viral sequences of the clustered and non-clustered patients are shown in Fig 1. An overview of the characteristics of the clustered and non-clustered patients is given in Table 3. Being part of a transmission cluster was significantly associated with harboring a subtype-B virus, infection through sexual contact, male sex, and being a young male. The characteristics of the patients in part of the transmission clusters are summarized in Table 4.

Factors associated with membership in the phylogenetic cluster
Based on the initial exploratory analysis, the clustered patients were younger (median age, 36.5 versus 39.0 years) and were more likely to be male (98.0%) for all comparisons (Table 1). Clustered patients frequently had CD4 counts greater than 350 cells/mm 3 . Based on multivariable analysis, factors of less than 30 years of age, male gender, and CD4 counts greater than 350 cells/mm 3 were found to be associated with cluster membership (Table 2). Clinical records revealed the HIV transmission risk factor data for 674 of 927 (72.7%) individuals; however, no differences in risk factor data availability were detected between clustered and non-clustered patients. Among the men with available risk factor data, MSM transmission risk was slightly more common among clustered patients (Table 5). Our additional multivariable analysis, which included a subgroup for which HIV transmission risk data was available, revealed an association between clustering and age less than 30 years, male gender, MSM status, and higher CD4 counts ( Table 5).

Characteristics of the MSM population
Information on the transmission route was available for 674 of 927 patients. Of these, 300 (44.1%) reported transmission through homosexual contact with men (MSM), and 367 (54.5%) reported transmission through heterosexual contact. Phylogenetic analysis demonstrated that 42 of 300 MSM (14%) grouped into clusters. The clustered patients were more likely to have CD4 counts greater than 350 cells/mm 3 . Based on multivariable analysis, age less than 30 years, male gender, and CD4 counts greater than 350 cells/mm 3 were associated with cluster membership (Table 5). Samples from all MSM clusters were grouped as subtype B. Interestingly, the other clusters were grouped into the monophyletic subtype B group, which includes the Korean clade B, based on Env sequences [30,31].

Discussion
In this study, we compared HIV pol sequences from 927 newly-diagnosed patients receiving care at the Division of AIDS, KNIH between 1999 and 2012. We found that 11.2% of these patients grouped into 37 molecularly-defined HIV transmission clusters. We analyzed the structures of the transmission clusters in Korea through the integration of molecular, clinical, and demographic data. Analysis of the HIV-1 pol sequences generated from antiretroviral resistance surveillance programs has proven useful and informative to assess and define transmission clusters within a population of interest [4,15,32]. Based on these criteria, substantial clustering (11.2%, 104/927) was observed in the current study, indicating that the majority of HIV-1 subtype B infections in Korea were linked to a cluster that might be associated with local and/or foreign HIV-1 networks. Real-time identification of epidemiological hot spots, pinpointing viral transmission chains and biologically linking the drivers of the epidemic, is necessary to introduce effective prevention strategies and interrupt HIV transmission chains [3,24,33]. Identifying clusters that account for the largest proportion of onward transmissions and characterizing the structural, behavioral, and biological correlates that predict transmissibility will aid in the development of targeted interventions [33,34]. The events listed here represent the first identified transmission events of HIV among the general population in Korea.
Surprisingly, subtype B remained the dominant circulating subtype, contributing to approximately 92.5% of all cases of HIV-1 infection in Korea. This finding is in contrast to the results reported for the neighboring countries of China and Japan [2,25,35,36]. In the present study, a number of recombinants involving subtype B and other types were observed. Interestingly, the subtype-B region of these URFs in the Korean MSM population was found to be of western B origin, distinguishing them from the URFs recently described among people who were infected with subtype B of Chinese or Japanese origin [26,[37][38][39]. The emergence of these variants indicates an ongoing active intersubtype recombination event between the predominant genotypes. Such events might complicate disease management. In this study, we found that the presence of TDR, in addition to young age and male sex, was significantly associated with transmission cluster membership. In particular, in the multivariable model, the association between TDR and clustering remained significant, even after controlling for infection duration. These associations with clustering might simply be markers of very high-risk behaviors and rapid ongoing transmission, as there is no evidence to suggest that TDR mutations make the virus more transmissible [14,22,40]. Furthermore, we found several clusters in which nearly all individuals harbored TDR, indicating that drug-naïve individuals contribute to the onward spread of TDR. Importantly, these clusters might reflect sexual networks that are reservoirs of drug resistance beyond ART-experienced individuals.
The incidence of HIV remains low in Korea compared to that in many other countries. In 2013, there was a cumulative total of 10,404 HIV/AIDS cases in Korea, of which approximately 80% were associated with sexual contact, based on available data (KCDC reports, 2013). The number of newly reported HIV cases has increased by more than 5-fold during the past decade, from 199 in 1999 to 1114 in 2013. Despite extensive HIV prevention and education programs targeting at-risk groups, many MSM individuals in Japan appear to engage in highrisk sexual behavior, predisposing them to HIV-1 infection and transmission. Based on our subgroup analysis examining the risk factors associated with HIV transmission clustering among men, the finding that being young and male was only significantly correlated with clustering for non-MSM individuals (versus MSM) might have resulted from the fact that both the clustered and non-clustered MSM groups included a significantly high number of individuals who were less than 30 years of age.
Interestingly, other studies conducted in Asia (China, Japan, and Malaysia) reported similar findings during the same period, indicating that the emergence of the transmission clusters was likely caused by increased exposure to high-risk behaviors among MSM individuals [23,[41][42][43]. HIV-1 epidemics among MSM groups in major cities in the United States and Europe were first detected in the early 1980s; in these epidemics, subtype B was the founder strain responsible for these events [44]. Subtype B is also commonly found in most developed countries of Western Europe, as well as in South American and Asian countries [6,18,39,45,46]. According to Kim et al., who studied samples collected between February 1998 and March 2005 in Korea, subtype B was virtually the only HIV-1 strain identified among MSM individuals in Korea [31].
From a public health perspective, sexual networks within the MSM population could serve as an important force of continual HIV-1 dissemination, thereby representing key entry points for the delivery of intervention strategies. A previous study revealed a significant association between high-risk sexual behavior and ignorance regarding HIV infection [4].

Conclusion
Analysis of the phylodynamic or evolutionary history of HIV relies significantly on the depth of population-based sampling; a study of this type should be continued and expanded upon to improve the resolution of HIV-1 genomic diversity and transmission dynamics within HIV transmission networks. The continuing transmission of HIV among MSM individuals indicates the need to maximize the use of available bio-epidemiological data. HIV transmission cluster analysis integrates laboratory data with behavioral data to enable the delineation of key transmission patterns to develop tailored interventions aimed at interrupting transmission chains.