Elsevier

Social Networks

Volume 54, July 2018, Pages 118-131

Statistical adjustment of network degree in respondent-driven sampling estimators: Venue attendance as a proxy for network size among young MSM

https://doi.org/10.1016/j.socnet.2018.01.003

Highlights

  • A new venue-based degree measure is introduced to adjust reported network size.

  • Venue-informed RDS estimates of population HIV/syphilis seroprevalence are computed.

  • Venue-informed RDS estimates had smaller variance than estimates based on standard self-reported network size.

  • Venue-informed RDS estimates showed less bias than estimates based on standard self-reported network size.

  • Venue attendance may provide a better-performing degree measure for RDS estimates.

Abstract

We introduce a new venue-informed network degree measure, which we applied to respondent-driven sampling (RDS) estimators. Using data collected from 746 young MSM in 2014–2016 in Chicago, IL, and Houston, TX, we estimated the population seroprevalence of HIV and syphilis and the prevalence of risk/protective behaviors, using RDS estimates with self-reported network size as the standard degree measure as well as with our proposed venue-informed degree measure. The results indicate that the venue-informed degree measure tended to be more efficient (smaller variance) and less biased than the self-reported measure in both cities sampled. Venue attendance-adjusted network size may provide a more reliable and accurate degree measure for RDS estimates of the outcomes of interest.

Introduction

Social network data collection and analysis have been widely employed to identify network factors that increase transmission of infectious diseases, including human immunodeficiency virus (HIV), acquired immune deficiency syndrome (AIDS), and other sexually transmitted infections (STIs) (Auerbach et al., 1984; Fichtenberg et al., 2009; Friedman et al., 1997; Klovdahl, 1985; Klovdahl et al., 1994; Latkin et al., 1995; Schneider et al., 2013; Valente and Fujimoto, 2010). In public health research, egocentric network data that represent the personal network from the perspective of the respondent are often used to measure network influence to predict individuals’ behavior (Valente, 2010). Egocentric data, or the direct contacts of a focal person (or a respondent) with alters and the respondent’s perceptions about relationships between these alters, are conceptually different from sociometric data, or a whole network composed of both direct and indirect relationships among a population.

Egocentric data form the building blocks of the sociometric data, and, thus, sociometric data can be treated as egocentric data (Valente, 2010). Nevertheless, when using social network data and network analysis in epidemiology, the conceptual distinction between egocentric and sociometric data is important to understanding the spread and transmission of infectious disease at the population level or to identifying and interpreting the flow of infectious agents (Boodram and Williams, 2013; Friedman et al., 1997; Klovdahl, 1985).

Existing network studies of HIV infection and prevention use egocentric designs more often than whole-network designs. This may be due to the difficulties of recruiting network members, including alters; delineating network boundaries for recruitment; and enumerating network members; as well as the intensity of resources needed for conducting whole-network studies (Boodram and Williams, 2013). Nonetheless, sociometric data have been successfully collected by employing multiple egocentric strategies and constructing links across identifiable alters to recruit whole networks for HIV research (Boodram and Williams, 2013).

The success of network construction depends heavily on the accuracy of reported characteristics. Researchers often construct a sociometric network by cross-referencing the participants’ reports of the characteristics of their alters, such as their names and ages, with those in the larger social network (Fujimoto, Coghill et al., 2017; Fujimoto et al., 2018; Shah et al., 2014; Young et al., 2016). The process, called “duplicate removal” or “entity resolution” (Young et al., 2016), allows researchers to determine potential matches based on a set of criteria to resolve the duplicates (e.g., the same alter is named by two participants, a participant has named another participant) within the sociometric network. Nonetheless, there are concerns about missing data due to unidentifiable cases, suggesting that the observed data represent only a partial network of the complete transmission network. In particular, for network studies that engage the most-vulnerable populations, such as men who have sex with men (MSM) or injection drug users, it is quite difficult to collect sociometric data due to the difficulties of obtaining a sampling frame for these hidden populations.

A statistical framework for inference from partially observed sampled network information has been developed (Handcock and Gile, 2010). One novel way to address the partial- versus whole-network problem is to take a stochastic simulation approach to model HIV transmission dynamics, using egocentrically sampled network data as aggregated empirical data (model statistics) (Goodreau et al., 2012; Krivitsky and Morris, 2017; Morris et al., 2009). This data-driven network simulation approach, within the modeling framework of exponential random graph models (ERGMs) (Frank and Strauss, 1986), models partnership formation or dissolution (Krivitsky, 2009; Krivitsky and Morris, 2017). The approach is used to understand network features, such as concurrent partnerships and assortative mixing patterns, that influence the spread of disease.

The network sampling methodology of egocentric network data is a relatively independent subject when compared to network modeling or other descriptive network approaches. Increasingly, however, social network characterization that includes vulnerable populations has included multiple network data collection strategies, including the collection of multiple “linked” egocentric networks (Rice et al., 2012; Shah et al., 2014), two-mode network data (Fujimoto et al., 2015; Schneider et al., 2013), and digital network data (Schneider et al., 2015).

In the social network field, network studies in public health, health behavior, and prevention research often have involved the collection of egocentric data through respondent-driven sampling (RDS). RDS (Heckathorn, 1997, 2002; Salganik and Heckathorn, 2004) is a network-based, link-tracing, chain-referral recruitment method that has been widely used to sample hard-to-reach populations, including MSM, and to estimate the prevalence of diseases, risk and protective behaviors, and other population characteristics (Johnston et al., 2008; Malekinejad et al., 2008). Within RDS samples, however, other layers of network characterization, including egocentric data collection and matching through entity resolution, have leveraged the RDS framework by applying traditional name generators and interpreters to individuals who are recruited through RDS.

Obtaining valid estimates of disease prevalence or behaviors, using RDS methodology, however, is a controversial topic among RDS researchers, as it can be very challenging to make inferences about valid population parameter estimates. To address this issue, we introduce a way of improving performance of RDS estimates by employing egocentric venue attendance information among young MSM (YMSM) in the United States within a sample generated through RDS. We assume that our method will lead to improvements in RDS estimates through the addition of information on attendance at venues where MSM are known to congregate and interact with their peers to form affiliation-based sexual networks (Frost, 2007; Fujimoto et al., 2013). Notably, these points of interaction should constrain the resulting network possibilities.

Our motivation in developing a new degree measure stems from the challenges that make RDS estimates based on the standard self-reported degree measure questionable for the population of YMSM (Kuhns et al., 2015). This may be due, in part, to the difficulty of collecting accurate self-reported network size or number of contacts (degree), which RDS estimators use to adjust for the respondents’ likelihood of being sampled. An RDS inference method that provides a reliable degree measure among this population is essential from a public health perspective, as YMSM have a high prevalence and incidence of HIV (Oster et al., 2014) and a rising incidence of primary and secondary syphilis infection (Patton et al., 2014).

Our study takes a statistical approach to incorporating venue attendance information into the standard self-reported degree information to estimate the population degree distribution. The resulting venue-informed predicted degree information can then be used to compute RDS estimates. Empirical research suggests that MSM congregate at social venues where they interact with their peers and, in some cases, form risk networks (Frost, 2007; Fujimoto et al., 2013; Reisner et al., 2010; Xia et al., 2006). Venue attendance data, therefore, provide valuable information that is complementary to self-reported degree and may better reflect the meaning of “peer network,” which is not limited to personal networks among MSM. Our study contributes to the existing science on RDS estimation by demonstrating a new venue-based affiliation approach to RDS-based prevalence estimates of HIV and syphilis as well as sexual risk and protective behaviors among this vulnerable population. These estimates should be more efficient (have smaller variance) and less biased than those based on the standard self-reported degree measure.

Section snippets

RDS methodology

RDS methodology consists of two components: a sampling design and a statistical inference method for computing the estimators for characteristics that are representative of the population of interest (Salganik, 2012; White et al., 2012). With respect to the first component, RDS samples hard-to-reach populations through their social network. Initially, individuals in the network are purposively selected as “seeds,” who then recruit a fixed number of their contacts (or recruits). To mitigate the

RDS studies among young men who have sex with men

Existing RDS studies of MSM report not only network size related to recruitment but also HIV positivity, risky sexual behavior, and factors related to gay community connectedness and engagement (Forrest et al., 2016; Reisner et al., 2010). A few RDS studies have evaluated the efficiency and effectiveness of RDS-based sampling of YMSM (Kuhns et al., 2015; Phillips et al., 2014). Although network size is associated with recruitment success among this population, RDS poses some challenges in terms

Venue attendance as a proxy for social network size

YMSM tend to form their social networks through venue attendance, and gay venue attendees are more likely to be younger, be men of color, and engage in high-risk sexual behaviors (Xia et al., 2006). MSM who meet sex partners at social and public cruising venues have been found to be productive seeds for RDS recruitment (Reisner et al., 2010). Further, MSM with more frequent venue attendance tend to have more social contacts (Clark et al., 2014) and have a higher probability of peer recruitment (

Introducing a new venue-informed degree measure

Our study assumes that the use of self-reported network size as a proxy for sample inclusion probability may be a problem due to potential recall bias and misreporting on the network size (McLaughlin et al., 2015). To address this issue, we propose a model-adjusted degree measure to approximate true self-reported network size. We use self-reported frequency of venue attendance as a proxy for true network size, which is then used as a covariate to predict self-reported network size in a
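The idea of predicting network size from venue attendance can be sketched in a few lines. The snippet above is cut off before the model is named, so everything about the model form here is an assumption made for illustration: a least-squares fit of log network size on a single venue-attendance frequency score. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_linear(venue_freq: Sequence[float],
                   reported_size: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(reported_size) = a + b * venue_freq.

    Assumed model form for illustration only; not the authors' model.
    """
    n = len(venue_freq)
    if n != len(reported_size) or n < 2:
        raise ValueError("need two or more paired observations")
    if any(s <= 0 for s in reported_size):
        raise ValueError("reported sizes must be positive to take logs")
    y = [math.log(s) for s in reported_size]
    x_bar = sum(venue_freq) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in venue_freq)
    if sxx == 0:
        raise ValueError("venue_freq must vary across respondents")
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(venue_freq, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict_degree(a: float, b: float,
                   venue_freq: Sequence[float]) -> List[float]:
    """Model-predicted (venue-informed) degree for each respondent."""
    return [math.exp(a + b * x) for x in venue_freq]


# Hypothetical toy data: venue-attendance score and self-reported network size.
venue_freq = [1, 2, 3, 4, 5, 6]
reported = [5, 300, 20, 40, 2, 90]

a, b = fit_log_linear(venue_freq, reported)
predicted = predict_degree(a, b, venue_freq)
```

The predicted values can then stand in for the self-reported degree wherever an estimator needs a degree for each respondent; because they come from a fitted model, they are strictly positive and less extreme than the raw reports.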

RDS estimators

The venue-informed degree measure is then used in the adjustment of the existing RDS estimators. We compute the Volz-Heckathorn (V-H) estimator (RDS-II) (Volz and Heckathorn, 2008), which approximates each respondent's inclusion probability as proportional to personal network size (degree $d_i$) and therefore weights each observation by the inverse of its degree. It is defined as follows:

$$\hat{p}_{VH} = \frac{\sum_{i=1}^{n} z_i / d_i}{\sum_{i=1}^{n} 1 / d_i}$$

where $z_i$ represents the variable of interest measured, and $d_i$ represents the degree based on self-reported personal network size. As $d_i$ is used as the inverse weight
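The V-H (RDS-II) estimator is an inverse-degree weighted mean of the outcome, which a short sketch makes concrete. The function name and the toy data below are hypothetical; the degrees passed in could be self-reported or model-predicted.

```python
from typing import Sequence


def volz_heckathorn(z: Sequence[float], d: Sequence[float]) -> float:
    """RDS-II estimate: mean of z weighted by the inverse of each degree."""
    if len(z) != len(d) or len(z) == 0:
        raise ValueError("z and d must be non-empty and of equal length")
    if any(di <= 0 for di in d):
        raise ValueError("degrees must be positive")
    weights = [1.0 / di for di in d]
    return sum(w * zi for w, zi in zip(weights, z)) / sum(weights)


# Hypothetical toy data: z is a binary outcome (1 = positive), d is degree.
z = [1, 0, 0, 1]
d = [2, 4, 4, 10]
estimate = volz_heckathorn(z, d)
```

When every respondent has the same degree the weights cancel and the estimate reduces to the plain sample proportion, so any difference from the unweighted proportion comes entirely from variation in degree.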

Study design and setting

Between December 2014 and June 2016, 755 individuals in Chicago (N = 377) and Houston (N = 378) were recruited into the Young Men’s Affiliation Project (YMAP), a prospective cohort study of risk and health venue affiliation networks and HIV risk and prevention among YMSM, using RDS. YMAP is being conducted by the University of Chicago (UC), Ann & Robert H. Lurie Children’s Hospital of Chicago (Lurie), and the University of Texas Health Science Center at Houston School of Public Health

Descriptive statistics

Table 1 shows the descriptive statistics (percentage and means, with standard deviations, in parentheses) of the sociodemographic characteristics of our study sample for Chicago (N = 372) and Houston (N = 374).

Chicago and Houston participants reported similar network sizes: mean = 51.1 for Chicago and 53.1 for Houston. Chicago participants reported attendance at higher numbers of health venues (on average, 2.0) and lower numbers of social venues (on average, 3.4) compared to their Houston

Discussion

Our study introduces a statistical framework to adjust for potentially inaccurate degree estimation caused by under- and/or over-reporting of personal network size toward a venue-informed estimation of population degree among YMSM. This study demonstrated that our venue-predicted degree is more efficient (has smaller variance) than the self-reported degree. It is likely that our venue-predicted degree measure helps by pulling some extremely low degrees in our data toward the center of the
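The point about extremely low degrees can be illustrated numerically: a very small degree produces a very large inverse-degree weight, so moving such values toward the center makes the weights less variable. The sketch below uses simple linear shrinkage toward the sample mean as a stand-in, which is an assumption for illustration and not the authors' venue-based model; the function names, the `alpha` parameter, and the toy degrees are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def weight_cv(degrees: Sequence[float]) -> float:
    """Coefficient of variation of the inverse-degree weights."""
    weights = [1.0 / d for d in degrees]
    return pstdev(weights) / mean(weights)


def shrink_toward_mean(degrees: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Move each degree part of the way toward the sample mean.

    Stand-in for a model-based adjustment; alpha = 1 leaves degrees unchanged.
    """
    m = mean(degrees)
    return [alpha * d + (1.0 - alpha) * m for d in degrees]


# Hypothetical reported degrees with two extremely low values.
reported = [1, 2, 40, 50, 60, 150]
smoothed = shrink_toward_mean(reported)

cv_reported = weight_cv(reported)
cv_smoothed = weight_cv(smoothed)
```

In this toy example the two respondents with degrees 1 and 2 dominate the weights before adjustment; after shrinkage the weights are far more even, which is the mechanism by which a smoothed degree measure can lower estimator variance.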

Acknowledgments

This work was supported by the National Institutes of Health (1R01MH100021, 1R01DA039934, and 1R21GM113694). The co-first-author Ming Cao is supported by UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (Cancer Prevention and Research Institute of Texas grant # RP160015). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH or the Cancer Prevention and Research Institute of Texas. We also

References (72)

  • R.G. White et al. Strengthening the reporting of observational studies in epidemiology for respondent-driven sampling studies: STROBE-RDS statement. J. Clin. Epidemiol. (2015)
  • A.M. Young et al. Accuracy of name and age data provided about network members in a social network study of people who use drugs: implications for constructing sociometric networks. Ann. Epidemiol. (2016)
  • B. Boodram et al. Collecting whole network data for human immunodeficiency virus prevention: a review of current strategies. J. AIDS HIV Res. (2013)
  • J. Clark et al. Sampling methodologies for epidemiologic surveillance of men who have sex with men and transgender women in Latin America: an empiric comparison of convenience sampling, time space sampling, and respondent driven sampling. AIDS Behav. (2014)
  • B. Cornwell et al. Social venue range and referral chain impact: implications for the sampling of hidden communities. PLoS One (2017)
  • G. Csardi et al. The igraph software package for complex network research. Int. J. Complex Syst. (2006)
  • C.M. Fichtenberg et al. Sexual network position and risk of sexually transmitted infections. Sex. Transm. Infect. (2009)
  • J.I. Forrest et al. Factors associated with productive recruiting in a respondent-driven sample of men who have sex with men in Vancouver, Canada. J. Urban Health (2016)
  • O. Frank et al. Markov graphs. J. Am. Stat. Assoc. (1986)
  • S.R. Friedman et al. Sociometric risk networks and risk for HIV infection. Am. J. Public Health (1997)
  • S.D.W. Frost et al. Respondent-driven sampling of injection drug users in two US–Mexico border cities: recruitment dynamics and impact on estimates of HIV and syphilis prevalence. J. Urban Health (2006)
  • S.D.W. Frost. Using sexual affiliation networks to describe the sexual structure of a population. Sex. Transm. Infect. (2007)
  • K. Fujimoto et al. Venue-based affiliation network and HIV risk behavior among male sex workers. Sex. Transm. Dis. (2013)
  • K. Fujimoto et al. Venue-mediated weak ties in multiplex HIV transmission risk networks among drug-using male sex workers and associates. Am. J. Public Health (2015)
  • K. Fujimoto et al. Network centrality and geographical concentration of social and service venues that serve young men who have sex with men. AIDS Behav. (2017)
  • K. Fujimoto et al. Multiplex competition, collaboration, and funding networks among social and health organizations: towards organization-based HIV interventions for young men who have sex with men. Med. Care (2017)
  • K. Fujimoto et al. Short Communication: Lack of support for socially connected HIV-1 transmission among young adult Black MSM. AIDS Res. Human Retroviruses (2017)
  • K. Fujimoto et al. Social networks as drivers of syphilis-HIV infection among young black men who have sex with men. Sex. Transm. Infect. (2018)
  • T.A. Gayles et al. Socioeconomic disconnection as a risk factor for increased HIV infection in young men who have sex with men. LGBT Health (2016)
  • K.J. Gile et al. Respondent-driven sampling: an assessment of current methodology. Sociol. Methodol. (2010)
  • K.J. Gile et al. Network model-assisted inference from respondent-driven sampling data. J. R. Stat. Soc.: Ser. A (Stat. Soc.) (2015)
  • K.J. Gile et al. Diagnostics for respondent-driven sampling. J. R. Stat. Soc.: Ser. A (Stat. Soc.) (2015)
  • K.J. Gile. Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. J. Am. Stat. Assoc. (2011)
  • S. Goel et al. Assessing respondent-driven sampling. Proc. Natl. Acad. Sci. (2010)
  • S.M. Goodreau et al. Concurrent partnerships, acute infection and HIV epidemic dynamics among young adults in Zimbabwe. AIDS Behav. (2012)
  • M.S. Handcock et al. Modeling social networks from sampled data. Ann. Appl. Stat. (2010)
1 These authors share the first co-authorship.
