Publicly Available Published by De Gruyter September 20, 2018

New Challenges in HIV Research: Combining Phylogenetic Cluster Size and Epidemiological Data

  • Nabila Parveen, Erica E. M. Moodie, Joseph Cox, Gilles Lambert, Joanne Otis, Michel Roger and Bluma Brenner
From the journal Epidemiologic Methods

Abstract

An exciting new direction in HIV research is centered on using molecular phylogenetics to understand the social and behavioral drivers of HIV transmission. SPOT was an intervention designed to offer HIV point of care testing to men who have sex with men at a community-based site in Montreal, Canada; at the time of testing, a research questionnaire was also deployed to collect data on socio-demographic and behavioral characteristics of participating men. The men taking part in SPOT could be viewed, from the research perspective, as having been recruited via a convenience sample. Among men who were found to be HIV positive, phylogenetic cluster size was measured using a large cohort of HIV-positive individuals in the province of Quebec. The cluster size is likely subject to under-estimation. In this paper, we use SPOT data to evaluate the association between HIV transmission cluster size and the number of sex partners for MSM, after adjusting for the SPOT sampling scheme and correcting for measurement error in cluster size by leveraging external data sources. The sampling weights for SPOT participants were calculated from another study of men who have sex with men in Montreal by fitting a weight-adjusted model, whereas measurement error was corrected using the simulation-extrapolation conditional on covariates approach.

1 Introduction

The HIV pandemic is composed of complex sub-epidemics, each influenced by many biological, behavioral and cultural factors in susceptible populations. Concentrated epidemics in North America have been localized to specific at-risk populations such as men who have sex with men (MSM) and intravenous drug users. Antiretroviral therapy has increased the quality and length of life of individuals infected with HIV, and decreased transmissions (Granich et al. 2010). Nevertheless, HIV remains a public health concern that disproportionately affects MSM in Canada (Brenner et al. 2007; Cain et al. 2013; Remis et al. 2014; Remis and Palmer 2009): 57% of incident cases of HIV are among MSM (Yang and Ogunnaike-Cooke 2016).

Early stage infection, often defined as within six months of infection, is thought to be a key window for HIV transmission (Cohen et al. 2011; Powers et al. 2011; Brenner et al. 2011; Wainberg and Brenner 2012; Brenner, Wainberg, and Roger 2013), likely due to high concentrations of viremia in bodily fluids. While direct evidence on transmission chains cannot be obtained from phylogenetic analyses, these have been used to provide insights into transmission networks by clustering individuals based on similarities in the RNA of the HIV with which they are infected (Brenner et al. 2011; Leigh Brown et al. 2011; Lewis et al. 2008; Hué et al. 2004; Brenner et al. 2007; Bezemer et al. 2010; Hughes et al. 2009; Yerly et al. 2009; Levy et al. 2011). Coupled with epidemiological data, a phylogenetic strategy may provide a unique window to discern HIV transmission, and to attempt to correlate personal characteristics (demographic, behavioral) with large transmission clusters, which are thought to be indicative of rapid HIV transmission (Brenner et al. 2007; Brenner and Wainberg 2013; Brenner, Wainberg, and Roger 2013). Our analysis focuses on investigating the behavioral correlates of phylogenetic cluster size using data from SPOT, a free and anonymous HIV testing service offered to MSM in Montreal, Canada. Given the complex and dynamic nature of rapid transmission events, our analysis does not aim to be causal in any sense, as we are unable to establish the temporal ordering between infection and some of the epidemiological variables that we consider. Nevertheless, the methods that we propose could be used in settings where temporally ordered data were available to predict which clusters are at risk of growing to a large size to provide real-time responses to outbreaks, as has been done in another Canadian province for both tuberculosis (Cheng et al. 2015) and HIV (Poon et al. 2016).

In this paper, we outline some methodological challenges that have arisen in attempting to combine phylogenetic and epidemiologic data, and demonstrate solutions to address these challenges. The first challenge is one of information. HIV positive tests have been notifiable in Canada since 2004; however, reporting is anonymous in the province of Quebec. Epidemiological data are available through a research questionnaire completed by patients participating in SPOT, and HIV genotyping is performed on blood samples of those found to be positive. The phylogenetic sequences from SPOT participants are clustered with all sequences from the Quebec HIV Genotyping Cohort, a cohort of several thousand, to determine the size of the phylogenetic network with which the individual’s HIV clusters. The HIV Genotyping Cohort is part of a drug resistance programme, operational since 2002, that includes HIV pol sequences of nearly all Quebec residents diagnosed with HIV since the cohort’s inception. Thus, for all SPOT participants found to be living with HIV, we may determine the number of people who share a similar virus, i.e. the size of the phylogenetic cluster to which that individual belongs. However, as detailed below, the resulting clusters are known to be too small, and so measurement error in the cluster size must be taken into account. Further, while SPOT has a significant research component, the study often recruited through social networks, with the aim of providing HIV testing to sexually active MSM, so that the generalizability of findings from the analysis is uncertain. We therefore supplement these data with another study of MSM in Montreal whose sampling design was venue-based and probabilistic.

2 Methods

We use SPOT data to evaluate the association between HIV transmission cluster size and the number of sex partners for MSM, after adjusting for the SPOT sampling scheme and correcting for measurement error in cluster size. Below, we describe in detail our primary data sources, and the methods that we propose to overcome the two major challenges that we encountered in analysing the SPOT data: non-random sampling and measurement error.

2.1 Data sources

The SPOT study. SPOT is an HIV-testing program with a research component targeting MSM in Montreal, Quebec. Beginning in July 2009, SPOT offered free, anonymous, HIV point of care rapid tests to MSM at a community-based testing site close to Montreal’s gay village. Individuals were recruited to the research component of SPOT provided they met the inclusion criteria: self-identification as male; at least 18 years of age; resident of Quebec, speaking and understanding French or English; anal sex with another man in the past 12 months; and unknown HIV status at the time of testing. SPOT promotion was also undertaken through outreach activities organized by the RÉZO community organization. Previously, it has been reported that a large number of participants learned of SPOT from friends (Otis et al. 2016).

In addition to free rapid tests, the SPOT project administered an anonymous questionnaire eliciting information on socio-demographic characteristics, HIV testing behavior, and behavioral/lifestyle information; phylogenetic analyses were undertaken on anyone found to be HIV+. We analyzed the data from 1803 men recruited up until 2013, 36 of whom were found to have HIV.

Two inferential challenges are encountered in SPOT. The first is one of sampling: A sampling frame is of course not available for the target population. SPOT is an HIV testing resource, primarily, and hence attracts individuals in an approach that could be likened to convenience-sampling. However, as detailed in the section that follows, we have access to data that were sampled probabilistically.

The second challenge faced is one of measurement error. The Quebec Genotyping Cohort does not contain phylogenetic information for all individuals with HIV in the province of Quebec. Individuals may not be included for a variety of reasons, including not having been tested (either at all, or only outside of the province), having viral load less than 400 copies per ml (Brenner and Moodie 2012), or having been diagnosed well before the inception of the genotyping program. Consequently, measurement error occurs in defining the cluster size, and this measurement error is characterized by a systematic under-counting of the true cluster size due to the absence of information on the HIV status or phylogenetic cluster of some individuals in the province (Parveen, Moodie, and Brenner 2015). However, the under-counting exists only for those men who are HIV-positive; anyone free of HIV has a cluster size of 0.

Thus, to obtain valid inferences in investigating correlates of large phylogenetic clusters, we proceed with an analysis that accounts for the non-probabilistic recruitment scheme employed by SPOT, and for measurement error in cluster size. We propose doing so through venue-based sampling weighting using an external source of data, in combination with a new method of measurement error correction designed specifically for settings in which validation data are unavailable and measurement error may depend on another covariate, namely HIV status: simulation-extrapolation conditional on covariates (Parveen, Moodie, and Brenner 2017).

The ARGUS study.[1] The data used in this study were collected from the second wave of ARGUS (Lambert et al. 2009), a second generation surveillance study designed to monitor trends in HIV, sexually transmitted infections, and risk behaviors among MSM living in the province of Quebec that occurred in 2008-2009. Participants were recruited using time-location sampling, or venue-based sampling, from venues including saunas, bars, coffee shops, sports and recreational groups where gay men interact, as well as a fixed study site (i.e. a non-mobile research site that is not a social venue). At each venue (except for the fixed site), individuals were randomly sampled from among those present. Information was collected on the frequency with which such a venue was attended so that individuals could be reweighted according to the inverse of the likelihood of having been sampled. From the 42 sampling locations in the province of Quebec (37 of which were located in Montreal), 1873 individuals were recruited in the period around when the individuals in SPOT were recruited. A self-administered questionnaire was given, and a blood sample was also collected to perform screening tests for HIV, syphilis and Hepatitis C virus. The ARGUS questionnaire focused on a participant’s socio-demographic characteristics, structure of his social network, sexual and other lifestyle activities. However, phylogenetic cluster size was not available for ARGUS participants at the time of the analysis. As ARGUS employed venue-based sampling, the data from ARGUS respondents, when appropriately reweighted, may better capture the target population of MSM in an urban Quebec setting.

To use estimated weights in SPOT that are derived from models for the ARGUS venue-based sampling weights, several assumptions are required. First, we must assume that both SPOT and ARGUS are targeting the same population, in this case of MSM. We also require that (i) the covariates that are common to SPOT and ARGUS that were used to predict the sampling weights are the only variables that predict the frequency with which an individual might visit a gay social venue, and (ii) that we can correctly specify the dependence of the sampling weight on those covariates. Under these assumptions, SPOT participants are conditionally exchangeable with ARGUS participants, and may be consistently weighted to estimate population-level parameters from the reweighted sample.

2.2 Statistical methods

Venue-based sampling. Venue-based sampling is a sampling technique targeting hard-to-reach populations (Gustafson et al. 2013), including MSM. Venue-based sampling typically begins with a field team interviewing key service providers and members of the target population to identify a range of venue-day-time (VDT) units in which to locate members of the target population; these units form the sampling frame. The research team then visits the venues, counts the number of individuals present, prepares a list of potentially eligible VDT units, and estimates the population size for each VDT unit. Upon building the sampling frame, the sample is selected in two stages. In the first stage, venues are selected as primary sampling units using simple or stratified sampling with probabilities proportional to the estimated size of the population captured in each venue. In the second stage, a sample of participants from the selected venues is drawn using systematic (random) sampling (Magnani et al. 2005). Venue-based sampling has several advantages: (i) it allows the calculation of a selection probability for each individual in the sample; (ii) unlike convenience sampling, it greatly diminishes the arbitrary selection of venues or individuals, and provides a replicable sampling selection method; and (iii) it does not require a comprehensive enumeration of individual members of the target population, so long as all members of the population can be assumed to be reached at the sampled venues at different times. Venue-based sampling is not without costs: it requires intensive fieldwork to visit and map VDTs. Moreover, bias or low generalizability can occur if key venues are missed, or if members of the target population do not (as assumed) frequent the venues included in the sampling frame.

We calculated sampling weights in ARGUS based on the reported frequency of attending the venue from which a man was sampled. For example, men who were sampled from a café received a weight of 60, 15 or 3.75 if they reported visiting cafés where MSM socialize less than once per month, 1–3 times per month, or at least 1–3 times per week, respectively. These weights are inversely proportional to the frequency with which the venues were attended, as follows: attendance of a particular venue type such as a café

  1. less than once/month gave a weight of 60 (which we may think of as being derived from $[1/(60\ \text{days})]^{-1}$, if less than monthly can be taken to mean every two months);

  2. 1–3 times/month gave a weight of 15;

  3. 1–3 times/week or more gave a weight of 3.75.

Weights were calculated similarly for men recruited from each of the social sites included in the VDT sample (bars, saunas, etc.). For the fixed study site, all participants were recruited, and hence the sampling weight for these individuals was 1. Therefore, each participant in the ARGUS study received a weight of 1, 3.75, 15, or 60.
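The weight values above can be reproduced with a small sketch. Only the "once every two months" reading behind the weight of 60 is stated in the text; the representative visit counts for the other two categories (2 and 8 visits per 30 days) are assumptions chosen because they return the published weights of 15 and 3.75, and the function name is hypothetical.

```python
# Assumed representative number of visits per 30 days for each category.
# Only the first value follows directly from the text; the other two are
# assumptions that happen to reproduce the published weights.
VISITS_PER_30_DAYS = {
    "less than once/month": 0.5,     # about one visit every 60 days
    "1-3 times/month": 2.0,          # assumed midpoint
    "1-3 times/week or more": 8.0,   # assumed 2 visits/week over 4 weeks
}


def venue_weight(category, fixed_site=False):
    """Return the inverse of the attendance frequency (visits per day)."""
    if fixed_site:
        # Everyone at the fixed study site was recruited, so the weight is 1.
        return 1.0
    return 30.0 / VISITS_PER_30_DAYS[category]


weights = {category: venue_weight(category) for category in VISITS_PER_30_DAYS}
fixed_site_weight = venue_weight("1-3 times/month", fixed_site=True)
```

Under these assumptions, a man who attends a venue type less often receives a larger weight, since he had a smaller chance of being present when sampling took place.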

As ARGUS and SPOT are assumed to recruit from the same population, we leveraged the information in the venue-based sampling weights from ARGUS to create venue-based sampling weights for SPOT. Specifically, we built a model to estimate venue-based weights for SPOT using the weight from the ARGUS study as the dependent variable in a regression model which took all common covariates in the SPOT and ARGUS studies as predictors (see Table 1). Then, using the covariates in SPOT, we predicted the most probable weight for each SPOT participant. In our primary analysis, we used a multinomial logistic regression; a sensitivity analysis used linear regression to model the venue-based weights. Note that the number of sex partners in ARGUS was counted for the last 6 months, whereas it was counted in SPOT for the last 3 months. We therefore multiplied the number of sex partners in SPOT by 2.
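The weight-transfer step can be illustrated with a minimal sketch: fit a multinomial (softmax) logistic regression of the weight category on covariates in one sample, then assign each member of a second sample its most probable weight. Everything here is illustrative: the two standardized covariates, the synthetic data, and the gradient-descent fitting routine are assumptions, not the model or software used in the analysis.

```python
import math
import random

WEIGHT_LEVELS = [1.0, 3.75, 15.0, 60.0]  # the four possible ARGUS weights


def softmax(scores):
    """Convert a list of scores to probabilities in a numerically stable way."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(coef, xi):
    """Predicted probability of each weight level for one covariate vector."""
    row = [1.0] + list(xi)  # leading 1.0 is the intercept term
    return softmax([sum(c * v for c, v in zip(ck, row)) for ck in coef])


def fit_multinomial(X, y, n_classes, lr=0.5, epochs=200):
    """Fit softmax regression by batch gradient descent on the log-likelihood."""
    p = len(X[0]) + 1
    coef = [[0.0] * p for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * p for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)
            probs = class_probs(coef, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(p):
                    grad[k][j] += err * row[j]
        for k in range(n_classes):
            for j in range(p):
                coef[k][j] -= lr * grad[k][j] / len(X)
    return coef


def predict_weight(coef, xi):
    """Return the most probable weight level for one participant."""
    probs = class_probs(coef, xi)
    return WEIGHT_LEVELS[probs.index(max(probs))]


# Synthetic "reference" sample: two hypothetical standardized covariates,
# with each weight level concentrated around a different centre.
rng = random.Random(0)
centers = [(-2.0, -2.0), (-2.0, 2.0), (2.0, -2.0), (2.0, 2.0)]
argus_X, argus_y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(50):
        argus_X.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
        argus_y.append(label)

coef = fit_multinomial(argus_X, argus_y, n_classes=4)

# Synthetic "target" sample: predict the most probable weight for each row.
spot_X = [(-2.1, -1.8), (1.9, 2.2)]
spot_weights = [predict_weight(coef, xi) for xi in spot_X]
```

Taking the most probable category keeps the predicted weights on the original four-value scale, which is one reason a multinomial model is a natural choice here; a linear model would instead return values between the observed levels.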

Table 1:

Characteristics of the ARGUS and SPOT participants. Statistics shown are mean (Standard Deviation) for continuous or count variables, and number (Percentage) for categorical variables.

| Variables | ARGUS (n = 1873) | SPOT (n = 1803) |
|---|---|---|
| Age | 40.45 (12.4) | 32.6 (10.5) |
| Ethnic origin | | |
| Ethnic origin: French-Canadian | 1390 (74%) | 883 (49%) |
| Ethnic origin: English-Canadian | 133 (7%) | 216 (12%) |
| Ethnic origin: Other origin | 350 (19%) | 704 (39%) |
| Education | | |
| Education: Completed college or below | 968 (52.1%) | 680 (37.7%) |
| Education: Completed university | 848 (45.6%) | 1070 (59.4%) |
| Education: Other training | 42 (2.3%) | 53 (2.9%) |
| Income in the previous year (before taxes) | | |
| Income: < 30,000 CAD | 722 (38.9%) | 725 (40.2%) |
| Income: ≥ 30,000 CAD | 1107 (59.6%) | 949 (52.6%) |
| Income: Unwilling to report | 29 (1.6%) | 129 (7.2%) |
| HIV status (HIV+ve) | 248 (13%) | 34 (2%) |
| Gay or homosexual | 1633 (87.9%) | 1498 (83.5%) |
| No. of sex partners^a | median = 8 (IQR = 18) | median = 3 (IQR = 4) |
| No. of one night sex partners^a | median = 4 (IQR = 16) | median = 1 (IQR = 4) |
| Used condom at last intercourse | 1332 (71.7%) | 1343 (74.6%) |
| Current no. of gay or homosexual friends | | |
| Friends: None | 50 (2.7%) | 53 (3%) |
| Friends: Less than half | 545 (29.6%) | 667 (37.2%) |
| Friends: Half | 478 (25.9%) | 591 (33%) |
| Friends: Most | 771 (41.8%) | 480 (26.8%) |
| Any previous HIV tests | 1667 (89.7%) | 1596 (89.1%) |
| Drug use^b | | |
| Snorted or smoked cocaine: Never | 1418 (76.3%) | 1552 (93.4%) |
| Snorted or smoked cocaine: Occasionally | 440 (23.7%) | 110 (6.6%) |
| Snorted or smoked heroin: Never | 1839 (99%) | 1662 (99.9%) |
| Snorted or smoked heroin: Occasionally | 19 (1.1%) | 2 (0.1%) |
| Snorted ketamine: Never | 1700 (91.5%) | 1617 (97.2%) |
| Snorted ketamine: Occasionally | 158 (8.5%) | 46 (2.7%) |
| Snorted crystal meth: Never | 1813 (97.6%) | 1637 (98.4%) |
| Snorted crystal meth: Occasionally | 45 (2.5%) | 26 (1.5%) |

a In the last 3 months for SPOT, the last 6 for ARGUS.

b In the last 6 months for SPOT.

Simulation-extrapolation conditional on covariates. Over the last three decades, several measurement error correction methods have been proposed, including regression calibration (Gleser 1990; Carroll, Ruppert, and Stefanski 1995), multiple imputation (Rubin 1987; Cole, Chu, and Greenland 2006), and simulation-extrapolation (Cook and Stefanski 1995; Carroll, Ruppert, and Stefanski 1995). Each of these approaches, with the exception of simulation-extrapolation, requires either a validation sample or replicate data for some fraction of the observed sample. In the SPOT study, phylogenetic cluster size is under-counted due to unobserved (untested) individuals. Obtaining a validation sample is therefore both ethically and practically infeasible, as it would require the testing of all members of the population of interest (in our case, all MSM resident in the province of Quebec). In this circumstance, that is, in the absence of validation or replicate data, simulation-extrapolation is the most promising avenue for correcting the under-counting of the cluster size data.

The simulation-extrapolation method developed by Cook and Stefanski (1995) is a simulation-based technique for estimating and reducing bias due to additive measurement error. It is an estimation procedure consisting of a simulation step and an extrapolation step. Estimates are obtained by adding additional measurement error (in known increments) to the mis-measured data, computing estimates from the contaminated data, establishing a trend between these estimates and the variance of the added measurement errors, and extrapolating this trend back to the case of no measurement error. This method requires knowledge of the distribution of the measurement error (which may be known in cases such as a laboratory assay, or estimated using external or validation data). Let $U_i$, $i = 1, \ldots, n$, be the unobserved true explanatory variable, and $X_i$ an error-prone version. Consider $X_i = U_i + \delta_i$, where $\delta_i \sim N(0, \sigma_\delta^2)$ and $\delta_i$ is independent of $(U_i, Y_i)$. In the simulation step, artificial measurement error is added to $X_i$ and $B$ new covariates $X_{i,b}(\lambda_k)$ are generated through $X_{i,b}(\lambda_k) = X_i + \sqrt{\lambda_k}\,\delta_{i,b}$, where $b = 1, \ldots, B$; $k = 1, \ldots, K$ and $i = 1, \ldots, n$, for values of $\lambda_k$ chosen by the analyst, and $\{\delta_{i,b}\}_{b=1}^{B}$ are independent computer-simulated random numbers from $N(0, \sigma_\delta^2)$. It can be shown that the variance of $X_{i,b}(\lambda_k)$ is $\sigma_U^2 + (1 + \lambda_k)\sigma_\delta^2$, which increases with $\lambda_k$. For each $\lambda_k$, let $\hat{\beta}_b(\lambda_k)$ denote the vector of naïve estimates obtained by regressing $Y$ on $X_{i,b}(\lambda_k)$. Using the $B$ estimates for each $\lambda_k$, an average estimate can be obtained as $\hat{\beta}(\lambda_k) = B^{-1}\sum_{b=1}^{B}\hat{\beta}_b(\lambda_k)$. In the extrapolation step, by regressing $\hat{\beta}(\lambda_k)$ on $\lambda_k$ and extrapolating back to $\lambda_k = -1$, we find the estimate $\hat{\beta}(-1)$ corresponding to the error variance $\sigma_U^2 + (1 + \lambda_k)\sigma_\delta^2 = \sigma_U^2$ (i.e. the error-free setting). Typically, $\hat{\beta}(\lambda_k)$ is regressed on $\lambda_k$ assuming either a quadratic or a non-linear relationship, e.g. a lowess smoother (Cleveland 1979).
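The two steps of classical simulation-extrapolation can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions: a simple linear regression with a single covariate, a known error standard deviation, a quadratic extrapolant, and toy values for the sample size, the grid of $\lambda$ values, and the number of replicates. None of the function names or settings come from the analysis itself.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]


def quadratic_extrapolate(lams, betas, target=-1.0):
    """Fit beta = c0 + c1*lam + c2*lam^2 by least squares, evaluate at target."""
    A = [[sum(l ** (j + k) for l in lams) for k in range(3)] for j in range(3)]
    b = [sum(beta * l ** j for l, beta in zip(lams, betas)) for j in range(3)]
    c0, c1, c2 = solve3(A, b)
    return c0 + c1 * target + c2 * target ** 2


def simex(x, y, sigma_delta, lams=(0.5, 1.0, 1.5, 2.0), B=30, rng=None):
    """Simulation step over a lambda grid, then quadratic extrapolation to -1."""
    rng = rng or random.Random(0)
    grid, means = [0.0], [ols_slope(x, y)]  # lambda = 0 is the naive fit
    for lam in lams:
        estimates = []
        for _ in range(B):
            xb = [xi + math.sqrt(lam) * rng.gauss(0.0, sigma_delta) for xi in x]
            estimates.append(ols_slope(xb, y))
        grid.append(lam)
        means.append(sum(estimates) / B)
    return quadratic_extrapolate(grid, means)


# Toy data: the true slope is 1, and X is U plus additive normal error.
rng = random.Random(1)
n, true_beta, sigma_delta = 2000, 1.0, math.sqrt(0.5)
u = [rng.gauss(0, 1) for _ in range(n)]
x = [ui + rng.gauss(0, sigma_delta) for ui in u]
y = [true_beta * ui + rng.gauss(0, 0.5) for ui in u]

naive = ols_slope(x, y)
corrected = simex(x, y, sigma_delta, rng=rng)
```

In this toy setting the naïve slope is attenuated towards zero, and the extrapolated estimate lies closer to the true slope, although a quadratic extrapolant generally does not remove the bias entirely.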
Parveen, Moodie, and Brenner (2017) extended simulation-extrapolation to accommodate measurement error distributions that (i) depend on other covariates and (ii) need not have mean zero; the resulting method is called simulation-extrapolation conditional on covariates (SIMEX-CC). They expressed the imperfect measurement of $U_i$ as $X_i = U_i - \delta_i$, where the conditional distribution of the error $\delta_i$ given a correctly measured variable $Z_i$ was $P(\delta_i \mid Z_i = z_i) = f_z$ with a finite mean ($\mu_\delta$) and variance ($\sigma_\delta^2$). In SIMEX-CC, the simulated covariate $X_{i,b}(\lambda_k)$ was taken to be

$$X_{i,b}(\lambda_k) = X_i - \sqrt{\lambda_k} \times \delta_{i,b} + (1 + \sqrt{\lambda_k}) \times \mu_\delta, \qquad (1)$$

where $\mu_\delta = E(\delta_i \mid Z_i)$; $b = 1, \ldots, B$; $k = 1, \ldots, K$ and $i = 1, \ldots, n$. Note that this measurement error specification can be made to accommodate settings where $X_i$ is always less than or equal to $U_i$ if, for example, we take $X_i = U_i - \delta_i$ with $\delta_i$ a non-negative random variable.

In the SPOT data setting, we consider Ui to be the (unobserved) true cluster size and Zi to be the HIV status of the participants. All HIV negative participants belong to a cluster size zero, and there is no measurement error in this cluster size. Measurement error was only present in the cluster size of HIV positive participants. Thus measurement error in cluster size depends on another covariate: the HIV status of the participants and, as such, we applied the SIMEX-CC method.

To undertake our analysis of the epidemiological correlates of phylogenetic cluster size in the SPOT data, we considered a weighted regression of Y on Xi,b(λk) in the simulation step of SIMEX-CC method, where weights were the venue-based sampling weights estimated with the external information provided from ARGUS, as described above.
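The construction of the simulated covariate in this setting can be sketched as below. The sketch assumes that Eq. (1) takes the form $X_i - \sqrt{\lambda_k}\,\delta_{i,b} + (1 + \sqrt{\lambda_k})\,\mu_\delta$, that the error for HIV-positive participants is Poisson with mean 3 (the distribution adopted later in the analysis), and that HIV-negative participants are left untouched. The function names and the toy data are hypothetical.

```python
import math
import random

MU_DELTA = 3.0  # mean of the assumed Poisson(3) error for HIV-positive men


def poisson_draw(mean, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simex_cc_covariate(x, z, lam, rng):
    """Simulated covariate under the assumed form of Eq. (1).

    Extra error is added only where z == 1 (HIV positive); participants with
    z == 0 keep their observed value, since a cluster size of 0 is error-free.
    """
    root = math.sqrt(lam)
    simulated = []
    for xi, zi in zip(x, z):
        if zi == 1:
            delta = poisson_draw(MU_DELTA, rng)
            simulated.append(xi - root * delta + (1.0 + root) * MU_DELTA)
        else:
            simulated.append(float(xi))
    return simulated


# Toy data: two HIV-negative men (cluster size 0) and two HIV-positive men.
cluster_size = [0, 0, 4, 7]
hiv_status = [0, 0, 1, 1]
rng = random.Random(0)

at_zero = simex_cc_covariate(cluster_size, hiv_status, lam=0.0, rng=rng)
at_one = simex_cc_covariate(cluster_size, hiv_status, lam=1.0, rng=rng)
```

At $\lambda = 0$ no additional noise is introduced and the observed cluster size of each HIV-positive participant is simply shifted upwards by the assumed mean under-count; a weighted regression of the outcome on each simulated covariate would then supply the estimates used in the extrapolation step.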

3 The SPOT analysis

3.1 Methods

In the present analysis, our main focus was on the association between the phylogenetic cluster size and the number of sex partners in the SPOT data. We adopted log-linear models to assess whether there was any evidence of a relationship between cluster size and number of sex partners, adjusting for age, ethnicity, education, and income as potential confounders. Selected characteristics (unweighted) of the study samples from SPOT and ARGUS are presented in Table 1; for a detailed breakdown of key covariates in the SPOT study by HIV status, please see Supplementary Materials Table 1, where similar distributions of characteristics are observed between the HIV positive and negative participants. As noted in the previous section, to adjust for the SPOT sampling scheme in the analysis stage, we used external information from the ARGUS study. The distribution of covariates varies between the SPOT and ARGUS studies, with ARGUS participants being more commonly of French-Canadian origin, less likely to hold a university degree, and more likely to have snorted or smoked cocaine.
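One way to write such a log-linear model is sketched below. The notation is an assumption introduced for illustration rather than taken from the analysis: $Y_i$ denotes the number of sex partners, $X_i$ the (error-prone) cluster size, and $\mathbf{C}_i$ the vector of age, ethnicity, education, and income, which matches the layout of Table 2, where cluster size appears as a covariate.

```latex
\log \mathbb{E}\left[ Y_i \mid X_i, \mathbf{C}_i \right]
  = \beta_0 + \beta_1 X_i + \boldsymbol{\beta}_2^{\top} \mathbf{C}_i
```

Under this form, $\exp(\beta_1)$ would be read as the multiplicative change in the expected number of partners per additional member of the phylogenetic cluster.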

The sampling weights for ARGUS participants were directly calculated using the frequency of venue attendance. We then used multinomial logistic regression to predict the sampling weights in ARGUS using the variables listed in Table 1, which are the variables common to both the SPOT and ARGUS studies. The fitted model was then used to estimate sampling weights for the SPOT study participants. In the SPOT study, there were some missing data. For the variables age, ethnicity, cluster size, number of sex partners and number of one night sex partners, the numbers of missing data points were 6, 3, 2, 137 and 50, respectively. Since fewer than 8% of data were missing, our primary analysis used complete cases only; we then performed sensitivity analyses following imputation.

To apply SIMEX-CC, we must supply the method with the mean and variance of the measurement error distribution of cluster size of the HIV positive individuals. We estimate this distribution using the following external data: the adult population of Quebec in the year from which the SPOT data were taken was 6,802,700, with an HIV incidence rate of 7/100,000[2], so that the total number of incident cases of HIV in Quebec can be estimated as 6,802,700 × 0.00007 ≈ 476. It has been suggested that in Canada, approximately 25% of people who are living with HIV do not know their infection status.[3] In fact, the proportion of MSM in Montreal who are unaware of their HIV status is likely lower (13% in ARGUS); however, we opt for the higher bound, as previous work has suggested that it is better to overestimate rather than underestimate measurement error variance (Parveen, Moodie, and Brenner 2017). Thus, treating the 476 reported cases as the 75% of infections that are diagnosed, we estimate that there are 476 × 0.25/0.75 ≈ 160 individuals who are “missing” from the genotyping program database, and thus contributing to the under-counting of cluster size. Based on previous studies of cluster size distributions (see, e.g. Lewis et al. 2008; Leigh Brown et al. 2011) and the sizes of clusters in SPOT, we found through trial and error that a Poisson(3) distribution was appropriate to yield a distribution of cluster sizes that was similar to those found in the literature and would account for the approximately 160 individuals who are estimated to be missing from the Quebec genotyping program. While the Poisson distribution almost surely mis-specifies the true error distribution in cluster size, sensitivity analyses will be undertaken to assess the sensitivity of our findings to this rather stringent assumption.
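The arithmetic behind the figure of roughly 160 missing individuals can be checked with a short calculation. One reading that reproduces it, adopted here as an assumption, is that the 476 incident cases are the diagnosed 75% of all infections, so that the undiagnosed number is 476 × 0.25/0.75; multiplying 476 by 0.25 alone would give about 119 instead.

```python
adult_population = 6_802_700
incidence_rate = 7 / 100_000      # reported HIV incidence per person
undiagnosed_fraction = 0.25       # share of people with HIV unaware of status

# Estimated number of incident (diagnosed) cases.
incident_cases = adult_population * incidence_rate

# Assumption: the diagnosed cases make up the remaining 75% of infections,
# so the undiagnosed count is diagnosed * 0.25 / 0.75.
missing_cases = incident_cases * undiagnosed_fraction / (1 - undiagnosed_fraction)

# For comparison, the plain product of the two quoted numbers.
plain_product = incident_cases * undiagnosed_fraction
```

Rounded to the nearest whole person, the first reading gives 159, in line with the "approximately 160" used when choosing the error distribution.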

We compared an unadjusted analysis to one that implements only SIMEX-CC, one that implements only the sampling weighting, and a final model that combined the sampling weighting and SIMEX-CC. When using SIMEX-CC, we used B = 200 replicates in the simulation step, and calculated the variance using a non-parametric bootstrap procedure with 1000 resamples. When combining both weighting and SIMEX-CC, we first applied the weighting then applied the SIMEX-CC to the weighted estimates. The bootstrap procedure ensures any variability due to weighting is also incorporated into the estimated standard errors.
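A non-parametric bootstrap of a weighted estimator can be sketched as follows. This is a simplified stand-in under stated assumptions: the estimator is a weighted least-squares slope rather than the weighted log-linear SIMEX-CC fit, the data are synthetic, and the weights are resampled together with their rows. A fuller version would re-estimate the sampling weights and re-run the measurement error correction inside every resample so that their variability is carried into the standard error.

```python
import random


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def bootstrap_se(x, y, w, estimator, n_boot=1000, seed=0):
    """Standard error from resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(estimator([x[i] for i in idx],
                               [y[i] for i in idx],
                               [w[i] for i in idx]))
    mean = sum(stats) / n_boot
    return (sum((s - mean) ** 2 for s in stats) / (n_boot - 1)) ** 0.5


# Synthetic data with weights drawn from the four venue-based weight levels.
rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
w = [rng.choice([1.0, 3.75, 15.0, 60.0]) for _ in x]

estimate = weighted_slope(x, y, w)
se = bootstrap_se(x, y, w, weighted_slope, n_boot=200)
```

Resampling whole participants, rather than residuals, keeps each person's covariates, outcome, and weight together, which is what allows the weighting to contribute to the estimated variability.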

3.2 Results

In Table 2, we compare four models: (i) a naïve unweighted model (neither adjusted for the sampling scheme nor corrected for measurement error); (ii) a naïve weighted model (not corrected for measurement error, but adjusted for the sampling scheme); (iii) an unweighted SIMEX-CC analysis (not adjusted for the sampling scheme but corrected for measurement error); and (iv) a weighted SIMEX-CC analysis (adjusted for both the sampling scheme and measurement error). SIMEX-CC extrapolant functions are given in Supplementary Figures 1–2. We found no evidence of association between cluster size and the total number of sex partners in any of the models. However, notable differences appear in the analysis of the number of one-night sex partners: cluster size appears to be associated with the number of one-night partners when adjusting for sampling weights, but the association is very weak in magnitude across all models and not significant when measurement error is taken into account. In this analysis, adjusting for the sampling confers greater changes in the estimates than correcting for measurement error, perhaps unsurprisingly, as there are relatively few individuals living with HIV and hence few whose cluster size is subject to measurement error.

To assess the sensitivity of the results to the measurement error distribution, missing data, and the estimation of the sampling weight, we re-analyzed the SPOT data (i) assuming measurement error was distributed with a mean and variance of 5; (ii) imputing all missing values with the modal (most common) value for binary variables, and the median for continuous and count variables; and (iii) modelling the sampling weights via linear regression. We found no meaningful changes in the resulting estimates (see Supplementary Tables 2–4). In settings like SPOT, where the measurement error affects only a small fraction of the total sample, analysts may wish to focus efforts on addressing any potential biases due to sampling design.

Table 2:

Results from unweighted and weighted naïve models, and unweighted and weighted SIMEX-CC where the cluster size measurement error is assumed to follow a Poisson(3) distribution, to assess the correlation between cluster size and number of sex partners. See Table 1 for definitions of categorical variables.

| Variable | Naïve unweighted: Estimate | SE | CI | Naïve weighted: Estimate | SE | CI | SIMEX-CC unweighted: Estimate | SE | CI | SIMEX-CC weighted: Estimate | SE | CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Number of Sex Partners | | | | | | | | | | | | |
| Cluster size | 0.002 | 0.005 | (-0.008, 0.012) | -0.002 | 0.002 | (-0.006, 0.002) | 0.002 | 0.008 | (-0.014, 0.018) | -0.001 | 0.013 | (-0.026, 0.024) |
| Age | 0.016 | 0.001 | (0.014, 0.018) | 0.004 | < 0.001 | (0.002, 0.006) | 0.016 | 0.004 | (0.008, 0.024) | 0.004 | 0.004 | (-0.004, 0.012) |
| Not English-Canadian^a | 0.179 | 0.035 | (0.110, 0.248) | 0.208 | 0.021 | (0.167, 0.249) | 0.179 | 0.096 | (-0.009, 0.367) | 0.209 | 0.099 | (0.015, 0.403) |
| Educ: university | 0.016 | 0.022 | (-0.027, 0.059) | -0.040 | 0.007 | (-0.054, -0.026) | 0.016 | 0.081 | (-0.143, 0.175) | -0.040 | 0.250 | (-0.530, 0.450) |
| Educ: other training | 0.165 | 0.066 | (0.036, 0.294) | 0.009 | 0.020 | (-0.030, 0.048) | 0.164 | 0.352 | (-0.526, 0.854) | 0.008 | 0.367 | (-0.711, 0.727) |
| Income ≥ 30,000 | -0.073 | 0.023 | (-0.118, -0.028) | 0.186 | 0.008 | (0.170, 0.202) | -0.073 | 0.093 | (-0.255, 0.109) | 0.186 | 0.150 | (-0.108, 0.480) |
| Income not reported | 0.034 | 0.043 | (-0.050, 0.118) | 0.113 | 0.014 | (0.086, 0.140) | 0.034 | 0.140 | (-0.240, 0.308) | 0.113 | 0.114 | (-0.110, 0.336) |
| Total Number of One-night Sex Partners | | | | | | | | | | | | |
| Cluster size | 0.005 | 0.005 | (-0.005, 0.015) | 0.006 | 0.002 | (0.002, 0.010) | 0.005 | 0.011 | (-0.017, 0.027) | 0.007 | 0.014 | (-0.020, 0.034) |
| Age | 0.022 | 0.001 | (0.020, 0.024) | 0.007 | < 0.001 | (0.005, 0.009) | 0.022 | 0.004 | (0.014, 0.030) | 0.007 | 0.005 | (-0.003, 0.017) |
| Not English-Canadian^a | 0.228 | 0.044 | (0.142, 0.314) | 0.266 | 0.027 | (0.213, 0.319) | 0.228 | 0.147 | (-0.060, 0.516) | 0.269 | 0.144 | (-0.013, 0.551) |
| Educ: university | 0.017 | 0.027 | (-0.036, 0.070) | 0.046 | 0.010 | (0.026, 0.066) | 0.017 | 0.104 | (-0.187, 0.221) | 0.047 | 0.230 | (-0.404, 0.498) |
| Educ: other training | 0.179 | 0.079 | (0.024, 0.334) | 0.036 | 0.026 | (-0.015, 0.087) | 0.178 | 0.528 | (-0.857, 1.213) | 0.032 | 0.428 | (-0.807, 0.871) |
| Income ≥ 30,000 | -0.135 | 0.028 | (-0.190, -0.080) | 0.125 | 0.010 | (0.105, 0.145) | -0.136 | 0.120 | (-0.371, 0.100) | 0.125 | 0.152 | (-0.173, 0.423) |
| Income not reported | 0.026 | 0.052 | (-0.076, 0.128) | 0.100 | 0.017 | (0.067, 0.133) | 0.026 | 0.171 | (-0.309, 0.361) | 0.095 | 0.144 | (-0.187, 0.377) |

Abbreviations: SIMEX-CC, Simulation-Extrapolation Conditional on Covariates; SE, Standard Error; CI, Confidence Interval.

a Due to uncertainty as to the true underlying ethnic composition of the MSM community in Quebec, we opted to consider English-Canadian versus other to reduce the influence of this variable on the sampling weights.

4 Discussion

In this paper, we have demonstrated how phylogenetic and epidemiological data may be combined in an effort to gain new insights into the correlates of HIV phylogenetic cluster size in MSM using two high quality data sources: a rich dataset offering information on both phylogenetic clustering and epidemiological covariates, and another that offers similar epidemiological data and is one of the few studies in the Canadian MSM population to have used a probabilistic recruitment approach via time-location sampling. To exploit the complementary strengths of these two datasets, we used considerable external information, adjusting SPOT by assuming it could be viewed through the lens of a probabilistic time-location reference sample and addressing measurement error in cluster size using SIMEX-CC, which can accommodate measurement error that is covariate-dependent and may not have mean zero. In this case study, we have thus demonstrated important methodological tools that can be employed in a wide array of settings, most particularly in studies of hard-to-reach populations and studies where measurement error can be well-characterized.

Our analysis is subject to several limitations. First, while we have drawn on two large studies of urban MSM in Canada, power is limited due to the small number of people who were HIV positive and for whom both phylogenetic cluster size and epidemiological data were available. Further, the measurement error in cluster size was not known. Unlike a setting where error is the result of a known, well-calibrated measurement process (e.g. laboratory assays) or where validation data are available, we estimated the distribution of the error using published studies and a trial-and-error approach to testing possible error distributions. In the context of SPOT, where measurement error appears to contribute little bias to the point estimate, this rather uninformed measurement error distribution had little impact on the conclusions of the analysis. Further, in previous work (Parveen, Moodie, and Brenner 2017), we have found that SIMEX can perform well in terms of bias reduction even when the measurement error distribution is mis-specified. Thus, while a more flexible distribution, such as one that decouples the mean and variance, may have provided a better estimate of the error distribution, it is probable that the impact on the SPOT findings would be small.
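The general idea behind simulation-extrapolation can be sketched as follows. This is a minimal illustration of the standard SIMEX approach for a linear model with zero-mean, constant-variance error in a single covariate; it is not the SIMEX-CC procedure used in this paper, which allows covariate-dependent error with non-zero mean. The data are simulated, and the function name `simex_slope`, the error standard deviation `sigma_u`, the grid of λ values, and the quadratic extrapolant are all illustrative assumptions.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=200, seed=0):
    """Return (naive slope, SIMEX-corrected slope) for y regressed on error-prone w."""
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    slopes = [np.polyfit(w, y, 1)[0]]  # naive fit on the observed covariate
    for lam in lambdas:
        reps = []
        for _ in range(n_rep):
            # Simulation step: add extra error with variance lam * sigma_u**2.
            w_lam = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape)
            reps.append(np.polyfit(w_lam, y, 1)[0])
        lam_grid.append(lam)
        slopes.append(np.mean(reps))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at
    # lambda = -1, which corresponds to no measurement error.
    coef = np.polyfit(lam_grid, slopes, 2)
    return slopes[0], float(np.polyval(coef, -1.0))


# Simulated example (hypothetical data): the true slope is 2.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)                   # true, unobserved covariate
w = x + 0.5 * rng.standard_normal(n)         # observed with error, sd 0.5
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)

naive, corrected = simex_slope(w, y, sigma_u=0.5)
print(f"naive slope: {naive:.2f}, SIMEX slope: {corrected:.2f}")
```

In this simulated setting the naive slope is attenuated toward zero and the extrapolated estimate lies closer to the true value, although a quadratic extrapolant typically removes only part of the bias.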

Another important limitation is that “cluster size” is not a static measure, but rather one that evolves with new transmissions, and the dynamics that drive large clusters are highly complex and may vary from cluster to cluster. For instance, cluster size may be driven in one cluster by a large number of individuals engaging with a small number of sexual partners within the period of high infectivity shortly after seroconversion, and driven in another by a small number of individuals with many sexual partners and transmission events. As time progresses, the measurement error in cluster size may decrease: with each new HIV-infected person being tested in the province, cluster size may be updated. However, simply adding precision to some final measure of cluster size may not provide the relevant information: ideally, we want to correlate an individual’s socio-demographic and lifestyle characteristics at the time of infection with the size of the cluster at that same point in time. Phylogenetics does offer some insights into when an individual was infected, but not with enough precision to accurately determine the size of a cluster at the point at which a particular member of that cluster was infected.

Finally, our analysis implicitly assumed that ARGUS recruited from the same target MSM population as SPOT, and that our models for the sampling weights were correctly specified with no omitted covariates. While demographics are broadly similar in the two populations, ethnic origin did vary (with more SPOT participants being of non-Canadian origin) and cocaine use differed, likely due to recruitment from saunas, bars, and sex clubs in ARGUS. ARGUS sought to recruit lifetime MSM, not only those who were currently sexually active. If the venues from which ARGUS sampled did not cover the entire MSM community, then the weighting scheme used will have adjusted SPOT to look more like ARGUS participants, but neither study sample will represent, or generalize to, the entire MSM community in Montreal or the province of Quebec. However, while it is possible that some social venues may not have been identified, the extensive formative research and environmental scan used to map the MSM community is a strength of the ARGUS study that minimizes this as a potential concern. Further, as the number of sexual partners was measured over different time frames in the two studies, we made the strong assumption that behavior was constant over time, so that the number observed over a 3-month period could be doubled to represent the number over a 6-month period. High-risk sexual behavior, such as the number of one-night partners, may vary considerably over time; however, we had no data that would allow us to investigate the plausibility of this assumption in the studies.
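To make the logic of reweighting a convenience sample toward a reference sample concrete, the sketch below uses a simple cell-ratio adjustment on a single categorical covariate. It is an illustration under stated assumptions, not the weight-adjusted model fitted in this paper: the function name `cell_ratio_weights`, the education categories, and all counts are hypothetical. The last two lines illustrate the constant-behavior assumption used to put partner counts on a common 6-month scale.

```python
import numpy as np


def cell_ratio_weights(conv_cells, ref_cells):
    """Weight each convenience-sample member by (reference share / convenience share) of its cell."""
    conv_cells = np.asarray(conv_cells)
    ref_cells = np.asarray(ref_cells)
    weights = np.zeros(len(conv_cells), dtype=float)
    for cell in np.unique(conv_cells):
        share_ref = np.mean(ref_cells == cell)    # share of the cell in the reference sample
        share_conv = np.mean(conv_cells == cell)  # share of the cell in the convenience sample
        weights[conv_cells == cell] = share_ref / share_conv
    return weights


# Hypothetical samples: the convenience sample over-represents university education.
conv_educ = np.array(["university"] * 60 + ["other"] * 40)
ref_educ = np.array(["university"] * 30 + ["other"] * 70)

w = cell_ratio_weights(conv_educ, ref_educ)
weighted_univ_share = w[conv_educ == "university"].sum() / w.sum()
print(round(weighted_univ_share, 2))  # prints: 0.3

# Constant-behavior assumption: double 3-month counts to approximate 6-month counts.
partners_3m = np.array([0, 2, 5])
partners_6m = 2 * partners_3m
```

After weighting, the convenience sample reproduces the reference distribution of the weighting covariate, which also shows why any coverage gap in the reference sample carries over to the weighted analysis.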

As in previous work, no evidence was found to support a correlation between large cluster size and sexual risk behavior. This underscores that other indicators, such as, perhaps, real-time phylogenetic monitoring, are needed to identify early-stage infection and better understand transmission dynamics among MSM. Clearly, our work only scratches the surface of the potential links between rapid transmission, as hinted at by larger cluster size, and individual-level characteristics, but it is an important first step in posing the question and offering solutions to some of the methodological hurdles that may be faced.

Funding statement: This work was supported by Institute of Population and Public Health, Funder Id: 10.13039/501100000036, Grant Number: MOP-130402, and Fonds de Recherche du Québec - Santé, Grant Number: Chercheurs-Boursier senior.

References

Poon, A.F.Y., R. Gustafson, P. Daly, L. Zerr, S.E. Demlow, J. Wong, et al. 2016. “Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study.” The Lancet HIV 3 (5): e231–e238. doi:10.1016/S2352-3018(16)00046-1

Bezemer, D., A. van Sighem, V.V. Lukashov, L. van der Hoek, N. Back, R. Schuurman, et al. 2010. “Transmission networks of HIV-1 among men having sex with men in the Netherlands.” AIDS 24 (2): 271–282. doi:10.1097/QAD.0b013e328333ddee

Brenner, B.G., M. Lowe, D. Moisi, I. Hardy, S. Gagnon, H. Charest, J.G. Baril, M.A. Wainberg, and M. Roger. 2011. “Subtype diversity associated with the development of HIV-1 resistance to integrase inhibitors.” Journal of Medical Virology 83 (5): 751–759. doi:10.1002/jmv.22047

Brenner, B., M.A. Wainberg, and M. Roger. 2013. “Phylogenetic inferences on HIV-1 transmission.” AIDS 27 (7): 1045–1057. doi:10.1097/QAD.0b013e32835cffd9

Brenner, B.G., R.-I. Ibanescu, I. Hardy, D. Stephens, J. Otis, E. Moodie, Z. Grossman, A.-M. Vandamme, M. Roger, and M.A. Wainberg. 2017. “Large cluster outbreaks sustain the HIV epidemic among MSM in Quebec.” AIDS 31 (5): 707–717. doi:10.1097/QAD.0000000000001383

Brenner, B. G., and E. E. M. Moodie. 2012. “HIV sexual networks: the Montreal experience.” Statistical Communications in Infectious Diseases 4 (1): 1948–4690. doi:10.1515/1948-4690.1039

Brenner, B.G., M. Roger, J.P. Routy, D. Moisi, M. Ntemgwa, C. Matte, et al. 2007. “High rates of forward transmission events after acute/early HIV-1 infection.” The Journal of Infectious Diseases 195 (7): 951–959. doi:10.1086/512088

Brenner, B.G., M. Roger, D. Stephens, D. Moisi, I. Hardy, J. Weinberg, et al. 2011. “Transmission clustering drives the onward spread of the HIV epidemic among men who have sex with men in Quebec.” The Journal of Infectious Diseases 204 (7): 1115–1119. doi:10.1093/infdis/jir468

Brenner, B. G., and M. A. Wainberg. 2013. “Future of phylogeny in prevention.” Journal of Acquired Immune Deficiency Syndrome 2: s248–254.

Brenner, B. G., M. A. Wainberg, and M. Roger. 2013. “Phylogenetic inferences on HIV-1 transmission: implications for the design of prevention and treatment interventions.” AIDS 27: 1045–1057. doi:10.1097/QAD.0b013e32835cffd9

Cain, R., E. Collins, T. Bereket, C. George, R. Jackson, A. Li, et al. 2013. “Challenges to the involvement of people living with HIV in community-based HIV/AIDS organizations in Ontario, Canada.” AIDS Care 26 (2): 263–266. doi:10.1080/09540121.2013.803015

Carroll, R., D. Ruppert, and L. Stefanski. 1995. Measurement Error in Nonlinear Models. London, UK: Chapman and Hall. doi:10.1007/978-1-4899-4477-1

Cheng, J. M., L. Hiscoe, S. L. Pollock, P. Hasselback, J. L. Gardy, and R. Parker. 2015. “A clonal outbreak of tuberculosis in a homeless population in the interior of British Columbia, Canada, 2008–2015.” Epidemiology and Infection 143 (15): 3220–3226. doi:10.1017/S0950268815000825

Cleveland, W. S. 1979. “Robust locally weighted regression and smoothing scatterplots.” Journal of the American Statistical Association 74 (368): 829–836. doi:10.1080/01621459.1979.10481038

Cohen, M.S., Y.Q. Chen, M. McCauley, T. Gamble, M.C. Hosseinipour, N. Kumarasamy, et al. 2011. “Prevention of HIV-1 infection with early antiretroviral therapy.” New England Journal of Medicine 365 (6): 493–505. doi:10.1056/NEJMoa1105243

Cole, S. R., H. Chu, and S. Greenland. 2006. “Multiple-imputation for measurement-error correction.” International Journal of Epidemiology 35: 1074–1081. doi:10.1093/ije/dyl097

Cook, J., and L. A. Stefanski. 1995. “A simulation extrapolation method for parametric measurement error models.” Journal of the American Statistical Association 89: 1314–1328. doi:10.1080/01621459.1994.10476871

Gleser, L. J. 1990. “Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models.” Contemporary Mathematics 112: 99–114. doi:10.1090/conm/112/1087101

Goodman, L. A. 1961. “Snowball sampling.” Annals of Mathematical Statistics 32: 148–170. doi:10.1214/aoms/1177705148

Granich, R., S. Crowley, M. Vitoria, Y.-R. Lo, Y. Souteyrand, C. Dye, et al. 2010. “Highly active antiretroviral treatment for the prevention of HIV transmission.” Journal of the International AIDS Society 13 (1): 1. doi:10.1186/1758-2652-13-1

Gustafson, P., M. Gilbert, M. Xia, W. Michelow, W. Robert, T. Trussler, et al. 2013. “Impact of statistical adjustment for frequency of venue attendance in a venue-based survey of men who have sex with men.” American Journal of Epidemiology 177 (10): 1157–1164. doi:10.1093/aje/kws358

http://argusquebec.ca/english/index.html

Hué, S., J.P. Clewley, P.A. Cane, and D. Pillay. 2004. “HIV-1 pol gene variation is sufficient for reconstruction of transmissions in the era of antiretroviral therapy.” AIDS 18 (5): 719–728. doi:10.1097/00002030-200403260-00002

Hughes, G.J., E. Fearnhill, D. Dunn, S.J. Lycett, A. Rambaut, and A.J. Leigh Brown. 2009. “Molecular phylodynamics of the heterosexual HIV epidemic in the United Kingdom.” PLoS Pathogens 5 (9): e1000590. doi:10.1371/journal.ppat.1000590

Lambert, G., J. Cox, T. S. Hottes, C. Tremblay, L. R. Frigault, M. Alary, J. Otis, and R. S. Remis. 2009. “Correlates of unprotected anal sex at last sexual episode: analysis from a surveillance study of men who have sex with men in Montreal.” AIDS and Behavior 15 (3): 584–595. doi:10.1007/s10461-009-9605-3

Leigh Brown, A.J., S.J. Lycett, L. Weinert, G.J. Hughes, E. Fearnhill, and D.T. Dunn. 2011. “Transmission network parameters estimated from HIV sequences for a nationwide epidemic.” The Journal of Infectious Diseases 204 (9): 1463–1469. doi:10.1093/infdis/jir550

Levy, I., Z. Mor, E. Anis, E. Leshem, S. Pollack, M. Chowers, et al. 2011. “Men who have sex with men, risk behavior, and HIV infection: integrative analysis of clinical, epidemiological, and laboratory databases.” Clinical Infectious Diseases 52 (11): 1363–1370. doi:10.1093/cid/cir244

Lewis, F., G.J. Hughes, A. Rambaut, A. Pozniak, and A.J. Leigh Brown. 2008. “Episodic sexual transmission of HIV revealed by molecular phylodynamics.” PLoS Medicine 5 (3): e50. doi:10.1371/journal.pmed.0050050

MacKellar, D., L. Valleroy, J. Karon, G. Lemp, and R. Janssen. 1996. “The Young Men’s Survey: Methods for estimating HIV seroprevalence and risk factors among young men who have sex with men.” Public Health Reports 111 (Suppl 1): 138–144.

Magnani, R., K. Sabin, T. Saidel, and D. Heckathorn. 2005. “Review of sampling hard-to-reach and hidden populations for HIV surveillance.” AIDS 19 (Suppl 2): S67–S72. doi:10.1097/01.aids.0000172879.20628.e1

Muhib, F.B., L. S. Lin, A. Stueve, R.L. Miller, W.L. Ford, W.D. Johnson, and P.J. Smith. 2001. “A venue-based method for sampling hard-to-reach populations.” Public Health Reports 116 (Suppl 1): 216–222. doi:10.1093/phr/116.S1.216

Otis, J., A. McFadyen, T. Haig, M. Blais, J. Cox, B. Brenner, R. Rousseau, G. Émond, M. Roger, M. Wainberg, and The Spot Study Group. 2016. “Beyond condoms: risk reduction strategies among gay, bisexual, and other men who have sex with men receiving rapid HIV testing in Montreal, Canada.” AIDS and Behavior 20 (12): 2812–2826. doi:10.1007/s10461-016-1344-7

Parveen, N., E. E. M. Moodie, and B. G. Brenner. 2015. “The non-zero mean SIMEX: Improving estimation in the face of measurement error.” Observational Studies 1: 91–123. doi:10.1353/obs.2015.0005

Parveen, N., E. Moodie, and B. Brenner. 2017. “Correcting covariate-dependent measurement error with non-zero mean.” Statistics in Medicine 36 (17): 2786–2800. doi:10.1002/sim.7289

Powers, K.A., A.C. Ghani, W.C. Miller, I.F. Hoffman, A.E. Pettifor, G. Kamanga, et al. 2011. “The role of acute and early HIV infection in the spread of HIV-1 in Lilongwe, Malawi: implications for ‘Test and Treat’ and other transmission prevention strategies.” The Lancet 378 (9787): 256–268. doi:10.1016/S0140-6736(11)60842-8

Remis, R.S., M. Alary, J. Liu, R. Kaul, and R.W.H. Palmer. 2014. “HIV transmission among men who have sex with men due to condom failure.” PLoS ONE 9 (9): e107540. doi:10.1371/journal.pone.0107540

Remis, R. S., and R. W. Palmer. 2009. “Testing bias in calculating HIV incidence from the Serologic Testing Algorithm for Recent HIV Seroconversion.” AIDS 23 (4): 493–503. doi:10.1097/QAD.0b013e328323ad5f

Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons, Inc. doi:10.1002/9780470316696

Stefanski, L. A., and J. R. Cook. 1995. “Simulation-extrapolation: the measurement error jackknife.” Journal of the American Statistical Association 90 (432): 1247–1256. doi:10.1080/01621459.1995.10476629

Wainberg, M. A., and B. G. Brenner. 2012. “The impact of HIV genetic polymorphisms and subtype differences on the occurrence of resistance to antiretroviral drugs.” Molecular Biology International, Article ID 256982. doi:10.1155/2012/256982

Watters, J. K., and P. Biernacki. 1989. “Targeted sampling: Options for the study of hidden populations.” Social Problems 36: 416–430. doi:10.2307/800824

Xia, Q., M. Tholandi, D.H. Osmond, L.M. Pollack, W. Zhou, J.D. Ruiz, and J.A. Catania. 2006. “The effect of venue sampling on estimates of HIV prevalence and sexual risk behaviors in men who have sex with men.” Sexually Transmitted Diseases 33 (9): 545–550. doi:10.1097/01.olq.0000219866.84137.82

Yang, Q., S. Ogunnaike-Cooke, et al. 2016. “Estimated national HIV incidence rates among key sub-populations in Canada, 2014.” Presentation at 25th Annual Canadian Conference on HIV/AIDS Research (CAHR), 12–15 May 2016, Winnipeg, Canada. Abstract EPH3.5.

Yerly, S., T. Junier, A. Gayet-Ageron, E.B. El Amari, V. von Wyl, H.F. Günthard, et al. 2009. “The impact of transmission clusters on primary drug resistance in newly diagnosed HIV-1 infection.” AIDS 23 (11): 1415–1423. doi:10.1097/QAD.0b013e32832d40ad


Supplementary Material

The online version of this article offers supplementary material (DOI:https://doi.org/10.1515/em-2017-0017).


Received: 2017-11-10
Revised: 2018-05-20
Accepted: 2018-05-24
Published Online: 2018-09-20

© 2018 Walter de Gruyter GmbH, Berlin/Boston
