Statistical adjustment of network degree in respondent-driven sampling estimators: Venue attendance as a proxy for network size among young MSM
Introduction
Social network data collection and analysis have been widely employed to identify network factors that increase transmission of infectious diseases, including human immunodeficiency virus (HIV), acquired immune deficiency syndrome (AIDS), and other sexually transmitted infections (STIs) (Auerbach et al., 1984; Fichtenberg et al., 2009; Friedman et al., 1997; Klovdahl, 1985; Klovdahl et al., 1994; Latkin et al., 1995; Schneider et al., 2013; Valente and Fujimoto, 2010). In public health research, egocentric network data that represent the personal network from the perspective of the respondent are often used to measure network influence to predict individuals’ behavior (Valente, 2010). Egocentric data, or the direct contacts of a focal person (or a respondent) with alters and the respondent’s perceptions about relationships between these alters, are conceptually different from sociometric data, or a whole network composed of both direct and indirect relationships among a population.
Egocentric data form the building blocks of the sociometric data, and, thus, sociometric data can be treated as egocentric data (Valente, 2010). Nevertheless, when using social network data and network analysis in epidemiology, the conceptual distinction between egocentric and sociometric data is important to understanding the spread and transmission of infectious disease at the population level or to identifying and interpreting the flow of infectious agents (Boodram and Williams, 2013; Friedman et al., 1997; Klovdahl, 1985).
Existing network studies of HIV infection and prevention use egocentric designs more often than whole-network designs. This may be due to the difficulties of conducting whole-network studies, including the delineation of network boundaries for recruitment, the enumeration of network members, the recruitment of alters, and the intensity of resources required (Boodram and Williams, 2013). Nonetheless, sociometric data have been successfully collected by employing multiple egocentric strategies and constructing links across identifiable alters to recruit whole networks for HIV research (Boodram and Williams, 2013).
The success of network construction depends heavily on the accuracy of reported characteristics. Researchers often construct a sociometric network by cross-referencing the participants’ reports of the characteristics of their alters, such as their names and ages, with those in the larger social network (Fujimoto, Coghill et al., 2017; Fujimoto et al., 2018; Shah et al., 2014; Young et al., 2016). The process, called “duplicate removal” or “entity resolution” (Young et al., 2016), allows researchers to determine potential matches based on a set of criteria to resolve the duplicates (e.g., the same alter is named by two participants, a participant has named another participant) within the sociometric network. Nonetheless, there are concerns about missing data due to unidentifiable cases, suggesting that the observed data represent only a partial network of the complete transmission network. In particular, for network studies that engage the most-vulnerable populations, such as men who have sex with men (MSM) or injection drug users, it is quite difficult to collect sociometric data due to the difficulties of obtaining a sampling frame for these hidden populations.
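The matching step described above can be illustrated with a minimal sketch. It assumes a deliberately simple rule (identical normalized name and an age difference of at most one year, reported by different participants); the cited studies use their own criteria, and the `AlterReport` type, the `candidate_matches` helper, and all names and ages below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlterReport:
    """One alter as described by one participant (hypothetical record layout)."""
    reporter_id: str
    name: str
    age: int


def normalize(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def candidate_matches(reports: List[AlterReport],
                      max_age_gap: int = 1) -> List[Tuple[int, int]]:
    """Return index pairs of reports that may describe the same person.

    Assumed rule: different reporters, same normalized name, and ages
    within max_age_gap years of each other.
    """
    pairs = []
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            a, b = reports[i], reports[j]
            if (a.reporter_id != b.reporter_id
                    and normalize(a.name) == normalize(b.name)
                    and abs(a.age - b.age) <= max_age_gap):
                pairs.append((i, j))
    return pairs


# Made-up reports for illustration only.
reports = [
    AlterReport("P01", "Jordan  T", 22),
    AlterReport("P02", "jordan t", 23),
    AlterReport("P02", "Alex R", 25),
    AlterReport("P03", "Alex R", 31),
]
matches = candidate_matches(reports)
```

In this toy input only the two "Jordan T" reports are flagged as a candidate duplicate; the two "Alex R" reports differ too much in age. Flagged pairs would still require review before links are merged, which is where unidentifiable cases and missing data arise.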
A statistical framework for inference from partially observed sampled network information has been developed (Handcock and Gile, 2010). One novel way to address the partial- versus whole-network problem is to take a stochastic simulation approach to model HIV transmission dynamics, using egocentrically sampled network data as aggregated empirical data (model statistics) (Goodreau et al., 2012; Krivitsky and Morris, 2017; Morris et al., 2009). This data-driven network simulation approach, within the modeling framework of exponential random graph models (ERGMs) (Frank and Strauss, 1986), models partnership formation or dissolution (Krivitsky, 2009; Krivitsky and Morris, 2017). The approach is used to understand network features, such as concurrent partnerships and assortative mixing patterns, that influence the spread of disease.
The network sampling methodology of egocentric network data is a relatively independent subject when compared to network modeling or other descriptive network approaches. Increasingly, however, social network characterization that includes vulnerable populations has included multiple network data collection strategies, including the collection of multiple “linked” egocentric networks (Rice et al., 2012; Shah et al., 2014), two-mode network data (Fujimoto et al., 2015; Schneider et al., 2013), and digital network data (Schneider et al., 2015).
In the social network field, network studies in public health, health behavior, and prevention research often involve the collection of egocentric data through respondent-driven sampling (RDS). RDS (Heckathorn, 1997, 2002; Salganik and Heckathorn, 2004) is a network-based, link-tracing, chain-referral recruitment method that has been widely used to sample hard-to-reach populations, including MSM, and to estimate the prevalence of diseases, risk and protective behaviors, and other population characteristics (Johnston et al., 2008; Malekinejad et al., 2008). Within RDS studies, however, other layers of network characterization, including egocentric data collection and matching through entity resolution, have leveraged the RDS sampling framework and used traditional name generators and interpreters for individuals recruited through RDS.
Obtaining valid estimates of disease prevalence or behaviors with RDS methodology, however, remains controversial among RDS researchers, as it can be very challenging to make valid inferences about population parameters. To address this issue, we introduce a way of improving the performance of RDS estimates by employing egocentric venue attendance information among young MSM (YMSM) in the United States within a sample generated through RDS. We expect that our method will improve RDS estimates through the addition of information on attendance at venues where MSM are known to congregate and interact with their peers to form affiliation-based sexual networks (Frost, 2007; Fujimoto et al., 2013). Notably, these points of interaction should constrain the resulting network possibilities.
Our motivation in developing a new degree measure stems from the challenges that make RDS estimates based on the standard self-reported degree measure questionable for the population of YMSM (Kuhns et al., 2015). This may be due, in part, to the difficulty of collecting accurate self-reported network size or number of contacts (degree), which RDS estimators use to adjust for respondents’ likelihood of being sampled. An RDS inference method that provides a reliable degree measure among this population is essential from a public health perspective, as YMSM have a high prevalence and incidence of HIV (Oster et al., 2014) and a rising incidence of primary and secondary syphilis infection (Patton et al., 2014).
Our study takes a statistical approach to incorporating venue attendance information into the standard self-reported degree information to estimate the population degree distribution. The resulting venue-informed predicted degree information can then be used to compute RDS estimates. Empirical research suggests that MSM congregate at social venues where they interact with their peers and, in some cases, form risk networks (Frost, 2007; Fujimoto et al., 2013; Reisner et al., 2010; Xia et al., 2006). Venue attendance data, therefore, provide valuable information that is complementary to self-reported degree and may better reflect the meaning of “peer network,” which is not limited to personal networks among MSM. Our study contributes to the existing science on RDS estimation by demonstrating a new venue-based affiliation approach to RDS-based prevalence estimates of HIV and syphilis as well as sexual risk and protective behaviors among this vulnerable population. These estimates should be more efficient (have smaller variance) and less biased than those based on the standard self-reported degree measure.
RDS methodology
RDS methodology consists of two components: a sampling design and a statistical inference method for computing the estimators for characteristics that are representative of the population of interest (Salganik, 2012; White et al., 2012). With respect to the first component, RDS samples hard-to-reach populations through their social networks. Initially, individuals in the network are purposively selected as “seeds,” who then recruit a fixed number of their contacts (or recruits). To mitigate the
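The sampling design described above (seeds recruiting a fixed number of contacts, whose recruits recruit in turn) can be sketched as a small simulation. This is an illustration under stated assumptions, not the study's protocol: the toy network, the coupon limit of two, the target sample size, and the `simulate_rds` helper are all hypothetical, and recruiters are assumed to choose uniformly at random among contacts who are not yet enrolled.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple


def simulate_rds(network: Dict[str, List[str]],
                 seeds: List[str],
                 coupons: int,
                 target_size: int,
                 rng: random.Random) -> List[Tuple[str, Optional[str]]]:
    """Return (participant, recruiter) pairs in order of enrollment.

    Seeds have recruiter None. Each participant may recruit at most
    `coupons` contacts, and nobody is enrolled twice.
    """
    sample: List[Tuple[str, Optional[str]]] = []
    enrolled = set()
    queue = deque()
    for seed in seeds:
        if seed not in enrolled:
            enrolled.add(seed)
            sample.append((seed, None))
            queue.append(seed)
    while queue and len(sample) < target_size:
        recruiter = queue.popleft()
        # Assumption: recruiters pick at random among unenrolled contacts.
        eligible = [c for c in network[recruiter] if c not in enrolled]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sample) >= target_size:
                break
            enrolled.add(recruit)
            sample.append((recruit, recruiter))
            queue.append(recruit)
    return sample


# Hypothetical undirected contact network stored as adjacency lists.
network = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "e"],
    "c": ["a", "b", "f"],
    "d": ["a", "g"],
    "e": ["b"],
    "f": ["c", "g"],
    "g": ["d", "f"],
}
sample = simulate_rds(network, seeds=["a"], coupons=2, target_size=6,
                      rng=random.Random(0))
```

Because recruitment follows network ties, well-connected individuals are more likely to be reached, which is why the inference component weights observations by degree.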
RDS studies among young men who have sex with men
Existing RDS studies of MSM report not only network size related to recruitment but also HIV positivity, risky sexual behavior, and factors related to gay community connectedness and engagement (Forrest et al., 2016; Reisner et al., 2010). A few RDS studies have evaluated the efficiency and effectiveness of RDS-based sampling of YMSM (Kuhns et al., 2015; Phillips et al., 2014). Although network size is associated with recruitment success among this population, RDS poses some challenges in terms
Venue attendance as a proxy for social network size
YMSM tend to form their social networks through venue attendance, and gay venue attendees are more likely to be younger, be men of color, and engage in high-risk sexual behaviors (Xia et al., 2006). MSM who meet sex partners at social and public cruising venues have been found to be productive seeds for RDS recruitment (Reisner et al., 2010). Further, MSM with more frequent venue attendance tend to have more social contacts (Clark et al., 2014) and have a higher probability of peer recruitment (
Introducing a new venue-informed degree measure
Our study assumes that the use of self-reported network size as a proxy for sample inclusion probability may be problematic due to potential recall bias and misreporting of network size (McLaughlin et al., 2015). To address this issue, we propose a model-adjusted degree measure to approximate true self-reported network size. We use self-reported frequency of venue attendance as a proxy for true network size, which is then used as a covariate to predict self-reported network size in a
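The idea of predicting self-reported degree from venue attendance can be sketched as follows. The excerpt above does not state which regression model the authors use, so this sketch assumes, purely for illustration, an ordinary least squares fit of log degree on a single venue-attendance count; the function names and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_linear(venues: Sequence[float],
                   degree: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(degree) = intercept + slope * venues.

    Assumed model for illustration; degrees must be positive.
    """
    n = len(venues)
    if n != len(degree) or n < 2:
        raise ValueError("need two or more paired observations")
    logs = [math.log(d) for d in degree]
    mean_x = sum(venues) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in venues)
    if sxx == 0:
        raise ValueError("venue attendance must vary across respondents")
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(venues, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_degree(venues: Sequence[float],
                   intercept: float, slope: float) -> List[float]:
    """Venue-informed predicted degree on the original (count) scale."""
    return [math.exp(intercept + slope * x) for x in venues]


# Made-up data: number of venues attended and self-reported network size.
venues = [0, 1, 2, 3, 4]
reported = [1, 40, 15, 300, 60]
intercept, slope = fit_log_linear(venues, reported)
predicted = predict_degree(venues, intercept, slope)
```

In this toy example the predicted degrees span a narrower range than the reported ones: the smallest prediction lies above the smallest report and the largest prediction lies below the largest report. That is the kind of smoothing of extreme self-reports that a model-adjusted degree is meant to provide.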
RDS estimators
The venue-informed degree measure is then used in the adjustment of the existing RDS estimators. We compute the Volz-Heckathorn (V-H) estimator (RDS-II) (Volz and Heckathorn, 2008), which approximates the inclusion probability as proportional to the personal network size (degree $d_i$) and therefore weights each observation by the inverse of its degree, defined as follows:

$$\hat{\mu}_{\mathrm{VH}} = \frac{\sum_{i=1}^{n} y_i / d_i}{\sum_{i=1}^{n} 1 / d_i},$$

where the sums run over the $n$ sampled respondents, $y_i$ represents the variable of interest measured, and $d_i$ represents the degree based on self-reported personal network size. As $d_i$ is used as the inverse weight
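The V-H (RDS-II) estimator described above is an inverse-degree-weighted mean, which can be sketched in a few lines. The function name `vh_estimate` and the toy values are hypothetical; the same function could be called with either self-reported or venue-predicted degrees.

```python
from typing import Sequence


def vh_estimate(y: Sequence[float], degree: Sequence[float]) -> float:
    """Volz-Heckathorn (RDS-II) estimate: mean of y weighted by 1/degree."""
    if len(y) != len(degree) or len(y) == 0:
        raise ValueError("y and degree must be non-empty and equally long")
    if any(d <= 0 for d in degree):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degree]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Made-up sample: y = 1 if the trait is present, 0 otherwise.
y = [1, 0, 0, 1]
degree = [2, 10, 5, 20]
estimate = vh_estimate(y, degree)        # weights 0.5, 0.1, 0.2, 0.05
naive_mean = sum(y) / len(y)             # unweighted sample proportion
```

Here the weighted estimate (0.55 / 0.85, about 0.65) is higher than the unweighted proportion of 0.5 because the respondent with the smallest degree, who receives the largest weight, has the trait. A single very small reported degree can dominate the estimate in this way, which is one reason the quality of the degree measure matters.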
Study design and setting
Between December 2014 and June 2016, 755 individuals in Chicago (N = 377) and Houston (N = 378) were recruited into the Young Men’s Affiliation Project (YMAP), a prospective cohort study of risk and health venue affiliation networks and HIV risk and prevention among YMSM, using RDS. YMAP is being conducted by the University of Chicago (UC), Ann & Robert H. Lurie Children’s Hospital of Chicago (Lurie), and the University of Texas Health Science Center at Houston School of Public Health
Descriptive statistics
Table 1 shows the descriptive statistics (percentages and means, with standard deviations in parentheses) of the sociodemographic characteristics of our study sample for Chicago (N = 372) and Houston (N = 374).
Both Chicago and Houston participants reported similar network size: mean = 51.1 for Chicago and 53.1 for Houston. Chicago participants reported attendance at higher numbers of health venues (on average, 2.0) and lower numbers of social venues (on average, 3.4) compared to their Houston
Discussion
Our study introduces a statistical framework to adjust for potentially inaccurate degree estimation caused by under- and/or over-reporting of personal network size toward a venue-informed estimation of population degree among YMSM. This study demonstrated that our venue-predicted degree is more efficient (has smaller variance) than the self-reported degree. It is likely that our venue-predicted degree measure helps by pulling some extremely low degrees in our data toward the center of the
Acknowledgments
This work was supported by the National Institutes of Health (1R01MH100021, 1R01DA039934, and 1R21GM113694). The co-first-author Ming Cao is supported by UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (Cancer Prevention and Research Institute of Texas grant # RP160015). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH or the Cancer Prevention and Research Institute of Texas. We also
References (72)
- et al. Cluster of cases of the acquired immune deficiency syndrome: patients linked by sexual contact. Am. J. Med. (1984)
- et al. Partner naming and forgetting: recall of network members. Soc. Netw. (2007)
- et al. Social networks and infectious disease: the Colorado Springs study. Soc. Sci. Med. (1994)
- Social networks and the spread of infectious diseases: the AIDS example. Soc. Sci. Med. (1985)
- et al. Personal network characteristics as antecedents to needle-sharing and shooting gallery attendance. Soc. Netw. (1995)
- et al. Errors in reported degrees and respondent driven sampling: implications for bias. Drug Alcohol Depend. (2014)
- et al. A new HIV prevention network approach: sociometric peer change agent selection. Soc. Sci. Med. (2015)
- et al. Bridges: locating critical connectors in a network. Soc. Netw. (2010)
- et al. Exponential random graph (p*) models for affiliation networks. Soc. Netw. (2009)
- et al. Exponential random graph models for multilevel networks. Soc. Netw. (2013)
- Strengthening the reporting of observational studies in epidemiology for respondent-driven sampling studies: STROBE-RDS statement. J. Clin. Epidemiol.
- Accuracy of name and age data provided about network members in a social network study of people who use drugs: implications for constructing sociometric networks. Ann. Epidemiol.
- Collecting whole network data for human immunodeficiency virus prevention: a review of current strategies. J. AIDS HIV Res.
- Sampling methodologies for epidemiologic surveillance of men who have sex with men and transgender women in Latin America: an empiric comparison of convenience sampling, time space sampling, and respondent driven sampling. AIDS Behav.
- Social venue range and referral chain impact: implications for the sampling of hidden communities. PLoS One
- The igraph software package for complex network research. Int. J. Complex Syst.
- Sexual network position and risk of sexually transmitted infections. Sex. Transm. Infect.
- Factors associated with productive recruiting in a respondent-driven sample of men who have sex with men in Vancouver, Canada. J. Urban Health
- Markov graphs. J. Am. Stat. Assoc.
- Sociometric risk networks and risk for HIV infection. Am. J. Public Health
- Respondent-driven sampling of injection drug users in two US–Mexico border cities: recruitment dynamics and impact on estimates of HIV and syphilis prevalence. J. Urban Health
- Using sexual affiliation networks to describe the sexual structure of a population. Sex. Transm. Infect.
- Venue-based affiliation network and HIV risk behavior among male sex workers. Sex. Transm. Dis.
- Venue-mediated weak ties in multiplex HIV transmission risk networks among drug-using male sex workers and associates. Am. J. Public Health
- Network centrality and geographical concentration of social and service venues that serve young men who have sex with men. AIDS Behav.
- Multiplex competition, collaboration, and funding networks among social and health organizations: towards organization-based HIV interventions for young men who have sex with men. Med. Care
- Short Communication: Lack of support for socially connected HIV-1 transmission among young adult Black MSM. AIDS Res. Human Retroviruses
- Social networks as drivers of syphilis-HIV infection among young black men who have sex with men. Sexually Transmitted Infections
- Socioeconomic disconnection as a risk factor for increased HIV infection in young men who have sex with men. LGBT Health
- Respondent-driven sampling: an assessment of current methodology. Sociol. Methodol.
- Network model-assisted inference from respondent-driven sampling data. J. R. Stat. Soc.: Ser. A (Stat. Soc.)
- Diagnostics for respondent-driven sampling. J. R. Stat. Soc.: Ser. A (Stat. Soc.)
- Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. J. Am. Stat. Assoc.
- Assessing respondent-driven sampling. Proc. Natl. Acad. Sci.
- Concurrent partnerships, acute infection and HIV epidemic dynamics among young adults in Zimbabwe. AIDS Behav.
- Modeling social networks from sampled data. Ann. Appl. Stat.
1. These authors share first authorship.