HIV-1 molecular epidemiology among newly diagnosed HIV-1 individuals in Hebei, a low HIV prevalence province in China

New human immunodeficiency virus type 1 (HIV-1) diagnoses are increasing rapidly in Hebei. This study presents the most extensive HIV-1 molecular epidemiology investigation in Hebei province, China, thus far. We carried out a systematic cross-sectional study of individuals newly diagnosed as HIV-1 positive in 2013, and characterized the molecular epidemiology of HIV-1 across the whole of Hebei based on full length gag-partial pol gene sequences. Nine HIV-1 genotypes were identified among 610 newly diagnosed treatment-naïve individuals. The four main genotypes were circulating recombinant form (CRF)01_AE (53.4%), CRF07_BC (23.4%), subtype B (15.9%), and unique recombinant forms (URFs; 4.9%). Within 1 year, three genotypes previously unknown in Hebei (subtype A1, CRF55_01B and CRF65_cpx) were found for the first time among men who have sex with men (MSM). All nine genotypes were identified in the sexually contracted HIV-1 population. Among 30 URFs, six recombinant patterns were revealed, including CRF01_AE/BC (40.0%), CRF01_AE/B (23.3%), B/C (16.7%), CRF01_AE/C (13.3%), CRF01_AE/B/A2 (3.3%) and CRF01_AE/BC/A2 (3.3%), plus two potential CRFs. This study elucidated the complicated characteristics of HIV-1 molecular epidemiology in a low HIV-1 prevalence northern province of China and revealed a high level of HIV-1 genetic diversity. All nine HIV-1 genotypes circulating in Hebei have spread out of their initial risk groups into the general population through sexual contact, especially through MSM. This highlights the urgency of HIV prevention and control in China.


Introduction
A recent report revealed that 1920s Kinshasa was the epicentre of the early spread of human immunodeficiency virus type 1 (HIV-1), and that HIV-1 then spread across the world via transport networks [1]. Global reports indicate that there are almost 75 million HIV infections in the world [2]. In China, the first acquired immunodeficiency syndrome (AIDS) patient (an Argentinean) was diagnosed in Beijing in 1985 [3], and the first native HIV-1-infected patient, who had haemophilia, was diagnosed at almost the same time in Hangzhou [4]. For over 30 years, HIV-1 has evolved rapidly, with an increasing number of genotypes and novel recombinant forms. Recombination between gene fragments of different genotypes is one of the main factors responsible for HIV-1 genetic evolution. A series of novel recombinant strains have been identified by scientists from China, including circulating recombinant form (CRF)07_BC and CRF08_BC among intravenous drug users (IDUs) [5], CRF55_01B [6] and CRF59_01B [7] among men who have sex with men (MSM), CRF61_BC [8], CRF57_BC [9], CRF62_BC [10], CRF64_BC [11] and CRF65_cpx [12] among IDUs and/or heterosexuals, and substantially different unique recombinant forms (URFs) among various at-risk populations, enriching the global HIV-1 genetic data. To date, at least 12 HIV-1 genotypes have been found in China [13]. Moreover, the predominant drivers of HIV-1 prevalence in China have clearly shifted, and heterosexual transmission has become the most common route [14]. However, among newly diagnosed HIV-1 individuals in some provinces such as Liaoning, Beijing and Henan, the highest HIV-1 prevalence was found among MSM [15,16].
Hebei is a northern province of China with 11 prefectures; it is the gateway to Beijing and borders Henan to the south. Between 2005 and 2013, the number of newly diagnosed HIV-1 individuals in Hebei increased rapidly, from 244 cases to 978 cases. Our recent study revealed that six HIV-1 genotypes were circulating in Hebei in 2011, and that subtype B, CRF01_AE and CRF07_BC were the most common genotypes [17]. In Hebei, the first recombinant form (CRF07_BC) was found in 2002 [18], followed by CRF01_AE and CRF08_BC in 2008 [19], and URFs in 2011 [17]. Among the newly diagnosed HIV-1 individuals, the proportion of MSM increased from 4.9% in 2005 to 62.2% in 2013 [20], which is markedly higher than the data reported in 61 other cities in China [15]. Thus, MSM play a critical role in the increasing trend of HIV-1 prevalence in Hebei. Recombinant strains arise frequently, especially in populations with multiple circulating HIV-1 genotypes [21].
In this study, we carried out the most extensive systematic cross-sectional study of individuals newly diagnosed as HIV-1 positive in 2013, and characterized the molecular epidemiology of HIV-1 across the whole of Hebei based on full length gag-partial pol gene sequences. Hebei may not be representative of China; however, it can reflect the HIV epidemic trend in low HIV prevalence provinces, especially those where the highest HIV-1 prevalence among newly diagnosed individuals is found in MSM.

Ethics statement
Written informed consent was obtained from all adult subjects and parents/guardians of HIV-1 positive minors/children enrolled in our study before blood collection. Our study was approved by the local Ethics Committee at Hebei Provincial Center for Disease Control and Prevention (CDC). All of the experimental methods were implemented in accordance with the approved regulations and guidelines, and the experimental protocols were approved by the institutional review boards of Hebei CDC.

Study subjects
This study presents the most extensive HIV-1 molecular epidemiology investigation in a province in China thus far. In 2013, a total of 978 individuals were newly diagnosed with HIV-1 infections and did not receive antiretroviral therapy (ART). Of these individuals, 50 recently infected MSM found at MSM sentinel surveillance points have been reported in our previous study [20]. In the present work, a total of 856 blood samples of the remaining 928 newly diagnosed individuals were collected from the 11 Hebei prefectures after obtaining written informed consent, accounting for 87.5% (856/978) of all HIV-1 infections. The 11 prefectures are grouped into three regions (Table 1) according to their proximity (Fig 1). HIV-1 infections were mainly concentrated in central (n = 520+43 [20]), followed by northern (n = 216+7 [20]), and southern (n = 120) in 2013.
Participants' demographic data were obtained by face-to-face interviews using a standard questionnaire when their blood samples were collected. A 500 μl plasma sample, separated from whole blood within 6 hours of collection, was used to obtain HIV-1 nucleotide sequences for subsequent analysis.

HIV-1 sequence analysis
Initially, HIV-1 sequences were analysed and classified as pure subtypes, CRF-like or URFs using the online REGA HIV-1 Subtyping Tool, version 2.0 [24]. Initial genotypes were then confirmed using a neighbour-joining (N-J) phylogenetic tree and recombinant breakpoint analysis. All reference sequences (A-D, F-H, J, K, O, CRF01_AE, and all CRFs associated with B/C, 01/BC and 01/B recombination) were obtained from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov/content/index). The original gene fragments that were sequenced successfully were edited and assembled as described previously [23]. Full length gag and partial pol gene sequences derived from the same subject were assembled together. The N-J phylogenetic tree was constructed using MEGA 5.0 with 1000 bootstrap replicates, based on the Kimura 2-parameter model. Recombinant breakpoints were analysed using the online tools jpHMM-HIV (http://jphmm.gobics.de/submission_hiv.html) and RIP 3.0 (http://www.hiv.lanl.gov/content/sequence/RIP/RIP.html). Ambiguous recombinant forms from the above methods were further confirmed using SimPlot 3.5.1 software with a window size of 300 bp. The maximum-likelihood (ML) tree analysis was performed using MEGA 6.0 with 1,000 replicates. Viral pol gene sequences were submitted to the Stanford HIV Database (http://hivdb.stanford.edu/pages/algs/HIVdb.html) and compared with the gene sequence of a subtype B reference strain (HXB2). Drug resistance mutations and antiretroviral susceptibility were evaluated using the Stanford DR algorithm.
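To illustrate the distance model named above, the following is a minimal sketch of the Kimura 2-parameter distance between two aligned nucleotide sequences, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA implementation; the function name `k2p_distance` and the pairwise handling of gaps and ambiguity codes are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguity
    codes in either sequence are skipped (an assumed convention).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition in ten sites: P = 0.1, Q = 0
d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
```

A pairwise matrix of such distances is the input that a neighbour-joining procedure works from; bootstrap support would come from repeating the calculation on resampled alignment columns.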

Statistical analysis
Statistical analyses for this study were performed using SPSS version 21.0 (SPSS Inc., Chicago, IL, USA). Means or frequencies of demographic data (age, CD4 cell counts) were calculated. Categorical variables were analysed using the chi-square test. When more than 20% of cells had an expected count of less than 5, Fisher's exact test was used. All tests were 2-sided, and a result was considered statistically significant when the P-value was less than 0.05.
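The test-selection rule described above can be sketched as follows. This is a plain-Python illustration, not the SPSS procedure used in the study; the function names and the two contingency tables are hypothetical and do not come from the study data.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic (assumes no empty row or column)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)


def needs_fisher(table, threshold=0.20):
    """True when more than 20% of cells have an expected count below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) > threshold


# Hypothetical counts, for illustration only
large_table = [[30, 10], [20, 40]]
small_table = [[3, 1], [2, 4]]

stat = chi_square_statistic(large_table)   # chi-square is appropriate here
use_exact = needs_fisher(small_table)      # sparse table: switch to Fisher
```

The statistic is then compared against the chi-square distribution with (r - 1)(c - 1) degrees of freedom to obtain the P-value.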
Furthermore, S1 Table shows that the distribution of transmission routes differed significantly between the northern, central and southern regions of Hebei (χ² = 7.598, p = 0.022). The proportion of MSM exceeded 50.0% in all 11 prefectures except Xingtai (46.2%). Indeed, MSM in Tangshan, Baoding and Hengshui accounted for 88.9%, 81.4% and 81.3% of the study population, respectively. All of the subjects from Tangshan, Cangzhou, Chengde, Handan and Hengshui were infected through either MSM or heterosexual sex (S1 Table).

Demographic distribution of HIV-1 genotypes
All nine genotypes were identified in the population who sexually contracted HIV-1, including eight genotypes (all except subtype C) in MSM and seven genotypes (all except subtype A and CRF65_cpx) in heterosexuals. Moreover, HIV-1 genotype distribution differed significantly between heterosexual and MSM transmission (χ² = 16.365, p = 0.017). For example, in the ML tree analysis based on full-length gag-partial pol (1.3kb) sequences (Fig 2), eight clusters, designated clusters 1-8, were observed among CRF01_AE sequences. Each cluster contained more than three of our sequences with high branch support. In clusters 3-8, far more sequences came from MSM than from other risk groups, which suggests that the CRF01_AE genotype is far more prevalent in MSM than in other risk groups. However, in cluster 1 and cluster 2, far more sequences came from other risk groups than from MSM, and our sequences clustered closely with those from Guangxi, Guangdong and Sichuan. Close relationships between our sequences and those from Liaoning (cluster 3), Jiangsu (cluster 4), Yunnan (cluster 8) and neighbouring provinces of Hebei (Shandong, Tianjin, Beijing, etc.; cluster 5) were also observed in the ML tree. Moreover, cluster 6 and cluster 7 provide strong evidence that HIV-1 CRF01_AE strains are circulating in different prefectures of Hebei through sexual transmission, especially among MSM.
Only one genotype was identified in MTCT (CRF01_AE) and in blood transmission (subtype B), respectively. For infection status, eight of nine genotypes were found in previously infected subjects, noticeably more than the five genotypes found in recently infected subjects. One of three subjects with genotypes found for the first time in this study was recently infected with CRF65_cpx. URFs were found in all demographic groups except MTCT and blood transmission, and their distribution differed by infection status (p = 0.048). Of the URFs, 93.3% (28/30) were found in subjects who had sexually contracted HIV-1, including MSM (4.8%, 21/411) and heterosexuals (4.1%, 7/172). The remaining 6.7% (2/30) of URFs were identified in two IDUs, one of whom had not only injected drugs but also had risky heterosexual sexual behaviours. The prevalence of HIV-1 genotypes in different demographic groups is indicated in Table 2.

Geographic distribution of HIV-1 genotypes
As indicated in Fig 1 and Table 2, the three main genotypes (CRF01_AE, CRF07_BC and subtype B) were extensively distributed in all 11 prefectures and showed significant differences (χ² = 45.555, p = 0.001). In Zhangjiakou, CRF07_BC was the most frequent genotype, with a prevalence (50.0%) higher than that of CRF01_AE (38.5%). However, in the remaining 10 prefectures, CRF01_AE was the most frequent, and the prevalence of CRF01_AE exceeded 50% in eight of the 10 prefectures, notably up to 68.8% in Handan. Additionally, the occurrence of URFs was unequal across the northern, central and southern regions of Hebei (χ² = 6.924, p < 0.031). The prevalence of URFs exceeded 10% only in Chengde, and the prevalence of URFs in the remaining 10 prefectures is shown in Table 2.
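As a worked check on the regional comparison of URF occurrence (chi-square statistic 6.924), the sketch below converts the statistic to a P-value. It assumes the comparison is a 2 x 3 table (URF present or absent across three regions) and therefore has 2 degrees of freedom; the text does not state the degrees of freedom, so this is an assumption. For 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2). The function name is hypothetical.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom.

    Valid only for df = 2, where the survival function is exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-x / 2)


# Statistic reported for URF occurrence across the three regions;
# df = 2 is an assumption, not stated in the text
p_urf_regions = chi2_upper_tail_df2(6.924)
```

Under that assumption the result is about 0.031, in line with the reported value.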
In the southern region, there were three risk groups. MSM (68.9%) and heterosexuals (27.9%) accounted for 96.7% of all subjects. Notably, blood transmission (3.3%) was present solely in this region (S1 Table). Only two subjects were infected with HIV-1 through blood transfusion: a female farmer aged 42 years and a male farmer aged 47 years. These two cases, who had a history of blood transfusion in the 1990s [19], were from two unlinked counties of Xingtai. Only four genotypes were found here. CRF01_AE (60.7%) was the most dominant genotype, followed by subtype B (26.2%), CRF07_BC (11.8%) and CRF08_BC (1.6%). No URFs were found. The prevalence of HIV-1 genotypes in this region was significantly different from the northern and central regions (Table 2).
Table 3. Of these recombinant forms, with the exception of two IDUs (one subject with intravenous drug use and unprotected heterosexual contact), all of the remaining subjects with these recombinant strains were infected through sexual contact (eight heterosexuals, 24 MSM). The analyses of recombinant breakpoints and the neighbour-joining (N-J) tree (S1 Fig) revealed that 13CZ095, 13TS1101 and 13CZ090 in cluster 1 (S1 and S2 Figs) and 13ZJ045 (S1 and S3 Figs) were CRF55_01B and CRF65_cpx, respectively. The remaining 30 sequences did not cluster with pure subtype or CRF reference sequences (S1-S3 Figs), and the breakpoint analysis also confirmed that they were URFs (Fig 3).

Identification of URFs
Among 30 URF sequences, the breakpoint analysis (Fig 3) using jpHMM-HIV and RIP 3.0 confirmed six recombinant patterns. CRF01_AE/BC was the most common recombinant pattern, accounting for 40% (12/30). The mosaic structure of sample 13CZ073 based on full length gag-partial pol was the most complicated (Fig 3).
In the CRF01_AE/C pattern, all four sequences harboured a genomic structure with a subtype C fragment within a CRF01_AE backbone, and the recombination positions were located in the overlapping site between the gag and pol regions (one sequence) and in the pol region (three sequences). In the B/C pattern, the recombination positions of subtype B and subtype C were located in the overlapping region of the HIV-1 gag and pol genes (two sequences) and in the pol region (three sequences). Only one sequence (13BD159) was identified as a CRF01_AE/B/A2 recombinant, with five recombination positions accumulated in the gag region. However, the recombination positions of TS130808 within CRF01_AE/BC/A2 were mainly located in the pol region, and were previously described in detail by Lu et al [25]. Of the six recombinant patterns identified, CRF01_AE/B, CRF01_AE/BC and B/C were found among the 72 CRFs obtained from the HIV database (http://www.hiv.lanl.gov/content/sequence/HIV/CRFs/CRFs.html).
Among 30 URFs, three clusters were identified from 13 sequences (S1 Fig). Cluster 2 and cluster 3 were each supported by a 99% bootstrap value and were distantly associated with all reference sequences. Cluster 2 and cluster 3 were potential CRFs (pCRFs) previously described in detail by Lu et al [25]. However, compared with the three sequences in cluster 2 reported by Lu et al [25], four of our sequences were included in cluster 2 in this study, and 13TS203 obtained from Tangshan was a new sequence with an identical mosaic structure (Fig 3). In cluster 4, the recombinant breakpoints of all five sequences obtained from Chengde (13CD09), Shijiazhuang (13SJ211 and 13SJ212), Tangshan (13TS821) and Langfang (13LF080) were located in the pol region. As shown in Fig 3, 13SJ211 and 13SJ212 contained identical genomic mosaic structures; however, the mosaic structures of 13CD09, 13TS821 and 13LF080 were different: 13CD09 and 13LF080 contained only subtype B gene fragments, at approximately positions 2842-3177 and 2462-2572, respectively. The mosaic structure of 13TS821 harboured a recombinant breakpoint at the overlapping site of the HIV-1 gag and pol gene sequences, unlike the other four sequences. According to the criteria for identification of a new CRF [26], these findings demonstrated that cluster 4 should not be considered a pCRF.
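The idea behind the sliding-window breakpoint analysis can be sketched as follows: a query sequence is compared with each reference in successive windows, and a change in the best-matching reference along the alignment suggests a recombination breakpoint. This is a simplified illustration using raw percent identity, not the algorithm implemented in SimPlot, jpHMM-HIV or RIP. The 300 bp window follows the methods; the step size of 20 and all function names are assumptions for this example.

```python
def window_identity(query, reference, window=300, step=20):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window midpoint, percent identity) pairs.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        # Gap-to-gap columns are not counted as matches
        matches = sum(1 for a, b in zip(q, r) if a == b and a != "-")
        results.append((start + window // 2, 100.0 * matches / window))
    return results


def best_reference_per_window(query, references, window=300, step=20):
    """Name the closest reference in each window (ties go to the first)."""
    if not references:
        raise ValueError("at least one reference sequence is required")
    scans = {
        name: window_identity(query, ref, window, step)
        for name, ref in references.items()
    }
    names = list(scans)
    calls = []
    for i, (midpoint, _) in enumerate(scans[names[0]]):
        best = max(names, key=lambda name: scans[name][i][1])
        calls.append((midpoint, best))
    return calls


# Toy alignment: the query matches reference "B" in its first half and
# reference "C" in its second half
toy_refs = {"B": "A" * 40, "C": "C" * 40}
toy_query = "A" * 20 + "C" * 20
toy_calls = best_reference_per_window(toy_query, toy_refs, window=10, step=10)
```

In the toy case the best match switches from B to C between the second and third windows, which is the kind of signal that a mosaic map such as Fig 3 summarises.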

Discussion
By the end of 2013, a total of 4148 individuals had been diagnosed with HIV-1 infection in Hebei, accounting for 0.9% of the HIV-infected individuals nationwide; Hebei thus ranked 22nd among China's provinces in terms of living individuals with HIV or AIDS [20]. In contrast, HIV-infected individuals in Yunnan, Guangxi and Sichuan accounted for 45% of HIV-infected individuals nationwide (http://www.qianzhan.com/news/detail/365/141203-8a6e790b.html). Furthermore, the estimated prevalence of HIV-1 in all populations in Hebei was 0.011%, markedly lower than the 0.059% in China and the 0.8% worldwide (http://yanzhao.yzdsb.com.cn/system/2014/11/27/014005079.shtml), so Hebei remains a low HIV-1 prevalence province. In past years, some areas with high HIV prevalence, such as Yunnan, Sichuan and Guangxi, have received more attention, including policy and funding, than areas with a low HIV prevalence.
This study is the most extensive HIV-1 molecular epidemiological investigation in a province in China. Our present work revealed a high level of HIV-1 genetic diversity and the seriousness of the HIV epidemic in Hebei. A total of nine genotypes were identified among newly diagnosed HIV-1 positive individuals in 2013 in Hebei (Table 2). Compared with our report in 2011 [17], this study revealed a major shift in the HIV-1 genotypes circulating in Hebei: CRF01_AE and CRF07_BC have replaced subtype B to become the most common and second most common genotypes in this area, respectively. Moreover, one subtype (subtype A1), two CRFs (CRF55_01B and CRF65_cpx) and two pCRFs were first identified within this year. These results suggest that Hebei has become one of the provinces of China with the most HIV-1 genotypes [13].
HIV-1 diversity varied greatly across demographic characteristics and geographical regions. All nine genotypes were found in subjects with the following epidemiologic characteristics: male, 25-49 years old, CD4 counts 200 cells/μl, married and Han. The distribution of the three main genotypes differed significantly by gender, transmission route and ethnicity. CRF01_AE predominated among nearly all demographic characteristics (more than 50.0%), with the exceptions that CRF07_BC was the most frequent genotype in IDUs (81.0%) and minorities (56.7%), and subtype B was the most frequent in blood transmission (100.0%). We found that the highest level of viral diversity was in the group with sexually contracted HIV-1: all nine genotype categories were identified in this population, including eight genotypes in MSM and seven genotypes in heterosexuals. However, only one genotype each was found in blood transmission (subtype B) and MTCT (CRF01_AE). Furthermore, the distribution of HIV-1 genotypes in the northern, central and southern regions was significantly different. Eight genotypes and four risk groups were observed in each of the northern and central regions, where CRF01_AE was the most frequent, followed by CRF07_BC and subtype B. However, four genotypes and three risk groups were found in the southern region, where subtype B replaced CRF07_BC as the second most prevalent strain. Compared with the northern and central regions, HIV-1 genetic characteristics in the southern region, which had fewer HIV-1 infections, were simpler. These findings suggest that all nine HIV-1 genotypes circulating in Hebei have spread out of their initial risk groups into the general population through sexual contact, especially through MSM.
In previous reports [12,14,27], CRF65_cpx was not found in northern China. Our study is the first to confirm the emergence of this CRF in Hebei, a northern province of China. The one subject with CRF65_cpx was recently infected through MSM and enjoyed travelling around China, suggesting that homosexual contact during his travels was the main factor behind the introduction of CRF65_cpx into Hebei. Of the three subjects with CRF55_01B, two were infected through MSM, and the other was infected through heterosexual contact.
The overall prevalence of URFs in Hebei increased rapidly within 3 years, from 1.4% in 2011 [13] to 4.9% in 2013. Meanwhile, the recombinant patterns became increasingly complicated, from only one (CRF01_AE/B) in 2011 to six patterns in 2013. Compared with the provinces with high HIV-1 prevalence, Hebei had a higher prevalence of URFs than Henan [28,29] and Guangxi [29,30], but a lower prevalence than Yunnan [29,31,32]. Furthermore, HIV-1 recombination patterns in Hebei were more complicated than those in these three provinces. Additionally, 96.7% of URFs were associated with sexual contact, especially MSM (70.0%). We deduced that the geographic differences in the HIV-1 epidemic will change with time and tend to decrease, especially in areas where sexual transmission is the most frequent route of HIV spread.
Gene fragments of the major genotypes, CRF01_AE, BC and subtype B, were also those most frequently recombined into URFs, accounting for 83.3%, 60.0% and 26.7%, respectively, which provides new evidence for the view that the co-circulation of different HIV-1 genotypes results in the occurrence of novel recombinant forms. In particular, two pCRFs, originating from CRF01_AE and subtype B and from CRF01_AE and CRF07_BC, respectively, were also observed. As reported previously, HIV-1 genotype and viral tropism are the most important factors affecting the development of disease [33]. For example, the CRF01_AE strain has the strongest ability to infect people and cause disease, and is almost two times as efficient as non-CRF01_AE strains [34,35,36]. In this study, in contrast to China as a whole, where CRF07_BC is the most prevalent genotype (35.5%) [13], the prevalence of CRF01_AE (53.4%) was the highest in Hebei, far above that of CRF07_BC (23.4%) and subtype B (15.9%). Moreover, the CRF01_AE strains from Hebei were closely related to those from southern provinces of China and neighbouring provinces of Hebei, and were circulating in all 11 prefectures of Hebei. Among 105 subjects with TDR-related mutations, 69.5% carried CRF01_AE. This hints that the virulence of the HIV-1 CRF01_AE and recombinant strains circulating in Hebei is increasing and may accelerate progression to AIDS.
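The three fragment frequencies quoted above can be reproduced from the six URF patterns. The sketch below is an arithmetic check rather than part of the original analysis: the counts are derived from the reported percentages of 30 URFs, and the grouping rule (a pattern counts towards "BC" if it contains a BC segment or both B and C segments, and towards "subtype B" only if it contains B without C) is an assumption chosen because it matches the reported figures.

```python
# Counts per recombinant pattern, derived from the reported percentages
# of 30 URFs (40.0%, 23.3%, 16.7%, 13.3%, 3.3%, 3.3%)
URF_PATTERNS = {
    "CRF01_AE/BC": 12,
    "CRF01_AE/B": 7,
    "B/C": 5,
    "CRF01_AE/C": 4,
    "CRF01_AE/B/A2": 1,
    "CRF01_AE/BC/A2": 1,
}


def share(patterns, predicate):
    """Percentage of URFs whose pattern components satisfy the predicate."""
    total = sum(patterns.values())
    hits = sum(
        count for name, count in patterns.items() if predicate(name.split("/"))
    )
    return round(100.0 * hits / total, 1)


with_crf01_ae = share(URF_PATTERNS, lambda parts: "CRF01_AE" in parts)
# Assumed grouping: BC segments, or B and C segments together
with_bc = share(
    URF_PATTERNS,
    lambda parts: "BC" in parts or ("B" in parts and "C" in parts),
)
# Assumed grouping: subtype B segments without a C partner
with_b = share(URF_PATTERNS, lambda parts: "B" in parts and "C" not in parts)
```

The shares add up to more than 100% because most patterns contain fragments from more than one genotype.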
In contrast to China as a whole, where heterosexual transmission is the predominant route [13,14], MSM transmission has become the most common transmission route in Hebei [20]. Some reports indicated that the recent HIV-1 infection rate among MSM (7.0-8.0%) [37,38] was significantly higher than that among heterosexuals (0.5-4.4%) [39,40,41]. Hence, since the first HIV-1 MSM transmission was identified in 1997, MSM have contributed tremendously to the HIV-1 epidemic in Hebei. In this study, a high level of HIV-1 diversity and recombinant patterns was characterised in the population who had acquired HIV through sexual contact. In particular, HIV-1 diversity among MSM in Hebei was more complicated than that in most provinces of China [13]. Furthermore, HIV-1 resistant strains were found only in the population infected through sexual contact, especially MSM (64.8%). These results suggest that the HIV-1 epidemic among MSM is the predominant driver of the increased HIV-1 prevalence in Hebei. We must take effective measures to intervene in high risk sexual behaviours to control the increasing trend of HIV-1 prevalence, for example through increased education on sexually transmitted diseases and methods of protection.
There are two limitations in this study. First, we attempted to amplify and sequence HIV-1 gag and pol gene sequences from as many samples as possible. However, sequences were not successfully obtained from all the samples because of limited plasma volume, poor sample quality, low viral load and limited interim storage conditions during sample collection. Second, the full length gag-partial pol sequences used for identification of HIV-1 genotypes may underestimate the prevalence of novel recombinant strains. Further studies based on full length HIV-1 genome sequences will be very important to identify pCRFs among URFs.

Conclusions
Our present study elucidated the complicated characteristics of HIV-1 molecular epidemiology in a low HIV-1 prevalence northern province of China and revealed a high level of HIV-1 genetic diversity. HIV-1 is evolving quickly and spreading through sexual exposure, especially through MSM, in Hebei. Our findings highlight the urgency of HIV prevention and control in China, and suggest that the government should pay greater attention to the development of HIV-1 in low HIV-1 prevalence areas. Great efforts should be made to reduce diverse risky sexual behaviours in order to slow the spread of the epidemic.
Of the B/C recombinant strains, although jpHMM-HIV analysis indicated that 13CZ026 (underlined) was subtype C, the N-J tree (Fig 2, S3), SimPlot 3.5.1 and RIP 3.0 (window size = 300) analysis confirmed that this recombinant strain was composed of subtype B and subtype C, with a small subtype B gene fragment inserted into a subtype C backbone in the pol region.