The Effect of Differential Recruitment, Non-response and Non-recruitment on Estimators for Respondent-Driven Sampling

Respondent-driven sampling is a widely-used network sampling technique, designed to sample from hard-to-reach populations. Estimation from the resulting samples is an area of active research, with software available to compute at least four estimators of a population proportion. Each estimator is claimed to address deficiencies in previous estimators, however those claims are often unsubstantiated. In this study we provide a simulation-based comparison of five existing estimators, focussing on sampling conditions which a recent estimator is designed to address. We find no estimator consistently out-performs all others, and highlight sampling conditions in which each is to be preferred.


Introduction
Respondent-driven sampling (RDS) (Heckathorn, 1997) is currently a widely used method for sampling from hidden populations. A hidden population is a population for which it is difficult or impossible to compile a sampling frame, and hence traditional sampling and estimation methods can not be used. However, it is often still important to collect information on these populations, for example to estimate the prevalence of HIV among injecting drug users in a city. In this paper, we compare five estimators based on RDS data, under sampling conditions the estimator presented in Heckathorn (2007) is designed to address. In particular, we focus on the problem of estimating a population proportion in a hidden population with two groups, which we refer to as infected and uninfected.
The basic method used to select a respondent-driven sample is as follows: First, a small number of population members are selected as seeds, typically selected from among individuals known to the researcher. Each seed is given a number of coupons, each of which has a unique bar-code, and is asked to pass on the coupons to other people they know within the population. When an individual has received a coupon, they are asked to report to a study centre where information of interest is collected by the researcher (such information is also collected from the seeds). A small monetary reward is often offered at this stage to encourage response. The responders are then themselves given coupons, and are asked to hand them on to others they know within the population, usually only to those who have not yet been recruited. In this manner, after the initial selection of seeds, the sampling is driven by the respondents. Those who report to the study centre are known to those who have already been selected, and recruitee-recruiter relationships can be determined from the bar-codes of the coupons. The information available on which to base an estimate is therefore information collected from the respondents and from the recruitment patterns. Respondents are usually asked how many people they know within the population of interest. This provides an estimate of degree, as described later.
The original RDS paper (Heckathorn, 1997) suggested using the sample proportion as an estimator of population proportion (we refer to this as the "Naïve" estimator), and showed that this estimator is unbiased under very strong assumptions about the sampling process. Subsequent papers (Salganik and Heckathorn, 2004;Volz and Heckathorn, 2008;Heckathorn, 2007;Gile, 2010) have relaxed some of these assumptions and proposed several alternative estimators. We will refer to these estimators as the Salganik-Heckathorn (SH) estimator (Salganik and Heckathorn, 2004), the Volz-Heckathorn (VH) estimator (Volz and Heckathorn, 2008), the Heckathorn (H) estimator (Heckathorn, 2007), and the Successive Sampling (SS) estimator (Gile, 2010). The SH-estimator has been implemented in the freely-available RDSAT software (Volz et al., 2007) for several years, and is widely used by researchers implementing RDS. The H-estimator has been implemented in the latest version of this software, RDSAT 6.0 Beta. There is an urgent need to better understand the properties of the Heckathorn estimator in practice, and in comparison to the VH-and SH-estimators. The VH-, SS-and SH-estimators are implemented in the RDS R package Handcock et al. (2009).
Although the estimators are easy to implement, the nature of their behaviour is not well understood. The main reason for this is that several of the assumptions which underpin the theoretical frameworks in which the estimators are derived and analysed are not met in practice. For example, it is generally assumed that sampling is with replacement or that seeds are selected randomly, whereas in practice these conditions almost never hold. Gile and Handcock (2010) conducted a simulation study to study the VH-and SH-estimators under realistic sampling scenarios which violated these assumptions. Since this work was done the Heckathorn estimator has also been added to the RDSAT software and is already being used in practice. This estimator is designed to address violations of sampling assumptions that were not studied in Gile and Handcock (2010).
In particular, all previous estimators assume that sampled individuals always respond, that they always recruit other individuals when requested, and that they recruit uniformly at random from their acquaintances in the population. In practice, sampled individuals may not respond (non-response), may not always recruit others (imperfect recruitment effectiveness) and they may preferentially recruit neighbours with particular characteristics (differential recruitment). In this paper we investigate the effect these three deviations from the sampling assumptions have on the behaviour of the estimators. Thus, our comparison study is wellpositioned to investigate the contributions of the H-estimator, which claims to adjust for differential recruitment and recruitment effectiveness (Heckathorn, 2007).
In the remainder of this paper we first present the estimators we will compare, with a particular focus on the SH-and H-estimators. We then investigate in turn, via a simulation study, the effect of differential recruitment, imperfect recruitment effectiveness and non-response on the bias and variance of the estimators. In all these simulations the SH-and H-estimates were very similar to each other, so we extend the simulation study further to explore the conditions under which these estimates are likely to differ. Finally, we summarise the results and make some concluding remarks.

Estimators of a Population Proportion
In this section we briefly present the form of the five RDS estimators we consider in the simulation study. We use π i to represent the true, unknown probability of selection for individual i. The set of individuals selected in the sample is denoted by s, and s g denotes the intersection of s and the group g. We assume that the response variable is binary, and refer to the levels as infected and uninfected. A is used to denote the set of all infected individuals, and B to denote the set of uninfected individuals. The population quantity of interest is the proportion of infected individuals, denoted P A . Following standard sampling notation, we denote population totals by upper-case letters and sample quantities by lower-case letters. Hence Π g = i∈g π i , for example. The population size is denoted by N , and the sample size by n. We use d i to denote the degree of individual i. The degree of an individual is considered to be the number of other individuals in the population to which that individual could potentially pass a coupon.
Most of the estimators described in this paper are functions of Hansen-Hurwitz (Hansen and Hurwitz, 1943) estimators, or, in the case of the SS estimator, of the closely related Horvitz-Thompson estimators. The Hansen-Hurwitz estimator of a population total Y is unbiased and given by (Hansen and Hurwitz, 1943) The generalised Hansen-Hurwitz estimator of a population mean,Ŷ /N , is not unbiased but is asymptotically unbiased. We refer to any ratio of Hansen-Hurwitz estimators as a generalised Hansen-Hurwitz estimator.

Naïve Estimator
The Naïve estimate (Heckathorn, 1997) is equal to the sample proportion of infected individuals, i.e.
If the true sampling probabilities are given by π i , i = 1, . . . , N , then P N A is a generalised Hansen-Hurwitz estimator of (an explanation is given in appendix A). Equation (3) shows that if the sampling probabilities of all individuals are equal, then P N A is a generalised Hansen-Hurwitz estimator of N A /N . Thus, under very restrictive assumptions, P N A is asymptotically unbiased for P A 1 . If the true probabilities of selection for infected individuals are greater than those of uninfected individuals, then it can be seen from (3) that P N A will be biased upwards.

Volz-Heckathorn Estimator
The Volz-Heckathorn (VH-) estimator (Volz and Heckathorn, 2008) is given by whered i denotes the reported degree of individual i. Using the same approach as in Section 2.1, if the true selection probability of individual i is π i , i = 1, . . . , N , it can be shown that P VH A is a generalised H-H estimator of The VH-estimator is asymptotically unbiased for P A if π i ∝d i for all i. Similarly to the case for P N A , it can be seen from expression (5) that if the true probabilty of selection for infected individuals is larger than assumed, then P VH A is likely to be biased upwards.

Successive Sampling Estimator
The Successive Sampling (SS-) estimator (Gile, 2010) is very similar in form to the VHestimator. The difference is in the estimation of the sampling weights. Rather than assuming sampling probabilities proportional to degree as in P VH A , this estimator substitutes a function of the degree, based on approximating the sampling process as a successive sampling process. This process is without replacement, whereas the Naïve and VH-estimators are derived based on the assumption of with-replacement sampling (among others). Therefore, the key contribution of the SS-estimator over the other estimators is the relaxing of the assumption of with-replacement sampling. Its form is given by: whereπ i denotes the estimated sampling probability of individual i, estimated according to the successive sampling procedure described in Gile (2010). Again, using the approach in Section 2.1, if the true selection probability of individual i is π i , i = 1, . . . , N , it can be shown that P SS A is a generalised H-T estimator of i∈A π i /π i i∈A π i /π i + i∈B π i /π i . (7) Note that here the Horvitz-Thompson estimator (Horvitz and Thompson, 1952) is used in place of the Hansen-Hurwitz estimator because of the without-replacement sampling assumption 2 . Similarly to the case for P N A and P VH A , it can be seen from (7) that if the true probabilty of selection for infected individuals increases relative to uninfected individuals, P SS A will tend to increase.
This estimator also requires knowledge of the population size, N . The sensitivity of the SS-estimator to the value of N is considered in Gile (2010). For all simulations in this paper we assume that the true value, N = 1000, is known, and compute estimates using the R package RDS (Handcock et al., 2009).

Salganik-Heckathorn Estimator
The Salganik-Heckathorn (SH-) estimator makes use of the fact that it is possible to measure the proportion of within-group and cross-group recruitments in the sample by keeping track of the barcodes of coupons distributed and returned.
The Salganik-Heckathorn estimator is given by Here C AB denotes the proportion of all individuals recruited by members of group A who are members of group B, and can be thought of as an estimate of the probability a randomly selected recruit in group A recruits from group B. D A is an estimate of mean degree given by and is a H-H estimator of i∈g π i / i∈g π i /d i . Note that as the probability of selection of infected individuals increases relative to uninfected individuals, the ratio D B / D A will tend to decrease and hence, from (8), P SH A will tend to increase. Because of its more complicated form, and the stronger assumptions required to derive it, the SH-estimator is relatively difficult to study analytically study compared to the Naïve-or VH-estimators.

Heckathorn Estimator
This estimator is an extension of the SH-estimator, and was motivated by the need to control for biases introduced by differential recruitment and recruitment effectiveness (Heckathorn, 2007). Before introducing the form of the estimator, we need to consider the model used in its derivation: that of a Markov chain on degree groups. Degree groups are formed by partitioning the reported degrees into groups of contiguous degrees. Then assuming there is only one coupon, perfect recruitment effectiveness and no non-response, the sampling process is treated as a Markov chain. Here, time is indexed by the wave of the sample, and the state of the chain in a given wave is the degree group of the sampled individual. The transition probability from group g to g ′ is the probability that a coupon will be passed from an individual in group g to someone in group g ′ . Quite strong assumptions about respondent behaviour are required for this model to be appropriate, notably assumptions which are stronger than those required for the sampling process to be modelled as a Markov chain on nodes.
The H-estimator is constructed by using an "adjusted" degree estimate AD A in place of the estimates of mean degree in (8). The adjusted degree estimate is defined as RCD i is called the "recruitment component of degree" for individual i, and is formed based on degree groups defined by the reported degree of the respondents. RCD i is defined as the ratio of the estimated degree group equilibrium probability and the degree group sample proportion. That is, where n g is the number of respondents in degree group g andÊ g is the estimated equilibrium probability of being in degree group g. For all degree groups g, g ′ , the probability of transition for the Markov chain from g to g ′ is estimated by the proportion of recruitments from g to g ′ , C gg ′ . The estimated equilibrium probability of being in group g,Ê g , is then calculated from the matrix of estimated transition probabilities (Heckathorn, 2007, pg 171). The Heckathorn estimator uses the adjusted degree estimates in place of the unadjusted degree estimates in the SH-estimator, which gives If RCD i = 1 for all i, then it can be seen from (10) that AD g = D g for all g, in which case P H A = P SH A . To define the degree groups we use the recommended method of Heckathorn (2007), to mimic as closely as possible what we believe to be the default method implemented in RDSAT. In this method the user specifies a "mean cell size" n c , which determines the "aggregation level" AL = n/n c . The range of degrees is then split into AL groups of approximately equal size. The default value is n c = 12, which is the value we use for all simulations carried out in later sections.
It should be noted that if all recruits from group g only recruit others from group g, the estimated equilibrium probabilityÊ g will be 1. This means that AD A will be based only on individuals within the "absorbing" degree group, which leads to instability. However, this case did not arise for any of the simulations presented in this paper.

Comments on the Heckathorn Estimator
AD A can be seen as some kind of adjustment for divergence from equilibrium of the Markov chain on degree groups. Suppose all the assumptions that are required for the sampling process to be described as a Markov chain on degree groups hold. Then if the Markov chain started in equilibrium, n g /n is an unbiased estimator of E g for all degree groups g. Therefore, from (11) it can be seen that it is unlikely for the H-and SH-estimates to differ unless the Markov chain on degree groups does not start in equilibrium. This is likely to occur if seeds are chosen disproportionately from individuals of low or high degree, for example, or if only infected individuals are chosen as seeds and the degree distribution of infected and uninfected individuals differ. Further, we would expect to see larger differences between the H-and SHestimates if the chain is out-of-equilibrium for a larger proportion of the sample, for example for smaller sample fractions or if convergence is slowed by homophily on degree-group.
The above reasoning is based on the assumption that the sampling process is a Markov chain on degree groups. This is extremely unlikely to ever occur in practice, but may be a good enough model to inform our investigation. We comment further on this in Section 6.

Summary
All the estimators considered substitute some estimate for the true sampling probabilites into a H-H or H-T estimator: the Naïve estimator assumes all π i are equal, the VH-and SHestimators assume π i ∝d i , and the SS-estimator uses a succesive sampling estimate of the π i . Although all the estimators considered in this paper can be shown to be asymptotically unbiased under certain conditions (see Neely (2009) and Gile (2010) for details), in practice these conditions will not hold. Of interest is how the estimators compare when the assumptions are violated. Differential recruitment, imperfect recruitment effectiveness and non-response are all sampling behaviours which violate the assumptions under which the estimators were derived. How much these types of sampling behaviour affect the estimates is likely to depend on how great the resulting difference is between the true and estimated sampling probabilties (Neely, 2009). We investigate this via a simulation study.

Design of the Study
In this section we present the simulation design used for all simulations presented in this paper. Because it is not possible to consider all configurations of network and sampling parameters, we concentrate on parameterisations which are likely to approximate real-life conditions. In particular, the level of homophily, mean degree and the number of seeds were chosen to match the characteristics of the pilot data from the CDC surveillance program (Abdul-Quader et al., 2006;Gile and Handcock, 2010).
The population is modelled as an undirected network with 1000 nodes, where each node represents one individual in the population 3 . An edge between two nodes i and i ′ means that there is a non-zero probability that i will recruit i ′ when given a coupon, and vice versa. The set of potential recruits of an individual i is therefore given by the neighbours of node i, and the number of neighbours is equivalent to the degree of i, d i .
For all simulations 200 of the nodes are infected, so P A = 0.2. The overall mean degree of the network is equal to seven. There is a moderate amount of homophily in all the networks. Specifically, we fix the relative probabilities of edges between two infected nodes and between and infected and an uninfected node, such that the former is five times as likely. In the case of 20% infected nodes and no differential activity (defined below), this implies the probability of an uninfected-uninfected edge is twice the probability of an uninfected-infected edge. Differential activity (DA) is defined as the ratio of the mean degree of infected nodes to the mean degree of uninfected nodes, i.e. D A /D B . This parameter is varied throughout the simulations and takes values in {0.5, 1, 1.8}. Parameter values given in bold face denote the default values. Networks were simulated using the R package statnet (Handcock et al., 2003(Handcock et al., , 2008.
Given a population connected by the network, we simulate a respondent-driven sampling process. In every simulation we draw ten seeds. Unless otherwise stated, seeds are selected at random with probability proportional to degree from all nodes. Sampling is done without replacement, and all respondents are given two coupons. We consider samples of size 200 and 500, but unless there are qualitative differences we only present the simulation results for samples of size 200. The parameterisation of the sampling process relating to differential recruitment, recruitment effectiveness and non-response will be discussed in more detail in the relevant sections.
For any given value of the population and sampling parameters, the simulations are run as follows: 1. Simulate 1000 networks. To allow for comparison, we used the same networks as were used in Gile and Handcock (2010).
2. For each network, simulate one respondent-driven sample. We therefore have 1000 samples in total.
3. For each sample, calculate the estimates P N A , P SS A , P VH A , P SH A and P H A . The mean of the 1000 estimates from any estimator is a measure of the expected value of that estimator under the given population and sampling parameters. The variance of the estimates is a measure of the sampling variation of that estimator in addition to how sensitive the estimator is to variations in population structure.

The Effect of Differential Recruitment
Whereas the sample design is determined by the survey organiser, differential recruitment is a property of respondent behaviour which is not possible for the survey organiser to observe directly. Therefore, it is important to consider how robust the estimators are to the presence of differential recruitment.
In this section we show that the bias of all the estimators is significantly affected by differential recruitment. The direction of this bias can be explained by considering how differential recruitment affects the probabilities of selection π i .
We first define the types of differential recruitment which we will consider in this paper.
Definition 1 (Differential recruitment) Suppose the proportion of neighbours of node i which belong to group g is equal to p gi , for all g ∈ G. Then differential recruitment exists if and only if the probability i passes a given coupon to a neighbour from group g is not equal to p gi for any g ∈ G and any i.
In other words, differential recruitment exists if a respondent is more (or less) likely to pass on a coupon to neighbours of some group than if he were choosing uniformly at random from his neighbours. We will consider two types of differential recruitment: 1. Within-group differential recruitment, when nodes preferentially recruit neighbours from within their own group, and 2. Between-group differential recruitment, when all nodes preferentially recruit nodes from a particular group.
For each type of differential recruitment we distinguish between infection-group differential recruitment and degree-group differential recruitment. It is useful to note that either type of differential recruitment can induce the other if the degree distributions of infected and uninfected nodes are not the same. We first investigate the influence of within-group differential recruitment in some detail, then consider between-group differential recruitment.

Within-group Differential Recruitment
We model within-group differential recruitment with the following assumption: Assumption 1 Suppose that node i is a member of group g, and is currently recruiting. Then the probability that node i recruits a neighbour in group g is proportional to K g , and the probability that node i recruits a neighbour from some other group g ′ = g is proportional to 1. This is similar to the two-group model of Goel and Salganik (2009), except that we allow the strength of differential recruitment to vary by group and do not assume that every individual has a non-zero probability of recruiting any other individual. Clearly if differential recruitment is greater for individuals in group g than for individuals in group g ′ , i.e. K g > K g ′ , then the sampling probabilities of group-g individuals will increase relative to those of group-g ′ individuals. If differential recruitment exists but is similar for both groups, i.e. K g ≈ K g ′ , then the relative sampling probabilities of nodes in groups g and g ′ should not change dramatically.
The effect of within-degree group differential recruitment on the estimators of P A can therefore be seen to depend on the distribution of degree among infection groups. If it is the same, then within-degree-group differential recruitment will not induce within-infection group differential recruitment, and hence will not cause additional bias when estimating P A . If the distribution of degree is not the same among infection groups, for example when there exists differential activity, then within-degree-group differential recruitment will induce withininfection-group differential recruitment. For example, suppose differential activity is equal to 1.8, so infected nodes have higher mean degree than uninfected nodes. If nodes of highdegree prefer to recruit other nodes of high-degree, this will induce differential recruitment by infection group, because infected nodes will preferentially recruit other high-degree infected nodes.
Note that it is not always possible to distinguish from the sample if differential recruitment exists, because its effect on the resulting sampling chain is similar to that of homophily -the proportion of within-group edges. For example, suppose the proportion of recruitments from group A to group B, C AB = 0.5. One possibility is that 50% of the neighbours of group A nodes are in group B, and recruitments were made uniformly at random. Alternatively, it might be the case that 70% of the neighbours of group A nodes are in group B, but that group A nodes preferentially recruited other group A nodes. Thus, sampling probabilities for a given group will increase with either differential recruitment or homophily.
Based on the properties of the estimators considered in Section 2, we can predict whether the estimates will increase or decrease with increasing within-group differential recruitment. We would expect that as K A increases relative to K B , that all of the estimates will also increase.

Results
Results of the simulations for three levels of differential activity are shown in Figure 1. Note that the counts at the top of the boxplot denote the number of samples for which the estimate was equal to one. Counts at the bottom of the boxplot denote the number of samples for which an estimate could not be calculated 4 .   Figure 1: Simulation results for varying levels of within-infection-group differential recruitment and differential activity. Differential recruitment is coded as (K B , K A ).
For all estimators and for all levels of differential activity, as K A increases relative to K B the estimates increase, as expected. It can also be seen that the VH-, SH-, H-and SSestimates are significantly higher or lower than the sample proportion for differential activity levels of 0.5 and 1.8 respectively. The SS-and VH-estimators are generally less biased than the SH-and H-estimators, and the VH-and SS-estimators also have a lower variance than the SH-and H-estimators. Other patterns, which are expected from previous studies, are that the SS-, VH-and SH-estimators correct for differential activity (in constrast to the Naïve estimator) (Gile and Handcock, 2010;Gile, 2010), and that the SS-estimates lie between those of the Naïve and VH-esitmators (Gile, 2010). However, there is no evidence that any of the estimators are adjusting for differential recruitment, because the estimates increase or decrease by approximately the same amount as the sample proportion as the differential recruitment level is varied for a given level of differential activity.
There are no significant differences between the H-and SH-estimates for any of the parameter combinations 5 . We also ran simulations with K A = K B = 1 (results not shown), and these verified that the estimates are very similar to the case that K A = K B = 1, as expected.
In order to parameterise within-degree-group differential recruitment for our simulations, we assumed that all nodes preferentially recruit other nodes of similar degree to themselves. In this way, no matter how the boundaries of the degree groups are defined by the practitioner, within-degree-group differential recruitment will exist. Figure 2 shows the probability that a node of degree i will recruit a node of degree j, for all j and some i, for the simulations with within-degree-group differential recruitment.
The simulation results are shown in Figure 3. From this figure it can be seen that, d=12 d=20 d=35 Figure 2: Relative probabilities that a node of degree d will recruit a node with degree shown on the x-axis for the simulations with within-degree-group differential recruitment, for several values of d.
for the parameterisation studied in this paper, the effect of within-degree-group differential recruitment on the bias of the estimates is small. In the absence of differential activity, withindegree group differential recruitment has no significant effect on the bias of the estimators. When there exists differential activity, then within-degree group differential recruitment reduces (DA = 0.5) or increases (DA = 1.8) the mean of each estimator. However, the effect of changing levels of differential recruitment is quite small. This can be explained by the fact that the induced levels of within-infection-group differential recruitment are likely to be similar for both infected and uninfected nodes, i.e. K A ≈ K B . Hence, from the results of within-infection-group differential recruitment discussed above, the estimators will not show substantial additional bias.
Similarly to the case for within-infection-group differential recruitment, the variance of the SS-estimator is similar to that for the VH-estimator, and the variance of the SH-and Hestimators is slightly greater. There was only one significant difference between the means of the H-and SH-estimates: when there was both differential recruitment within degree group and differential activity equal to 1.8 (adjusted p-value 0.0018). In this case the mean estimate returned by the H-estimator was 3.7 × 10 −4 greater than the mean for the SH-estimator.

Between-group Differential Recruitment
Suppose infected nodes are preferentially recruited by all nodes. In this case infected nodes will be more likely to be sampled than if there were no differential recruitment, so the true sampling probability of infected nodes will be higher than that assumed by the estimators. Therefore, following the arguments of Section 2, we would expect all the estimates to increase.
For our simulations, we parameterised differential recruitment between infection-group by the ratio of the probability a node will recruit an infected node relative to uninfected node.  Figure 3: Simulation results for varying levels of differential activity, with and without withindegree-group differential recruitment.
Possible values are {2, 1, 0.5}. Results of the simulations for three levels of differential activity are shown in Figure 4. This shows that for any fixed level of differential activity, as the probability an infected node will be recruited increases relative to the probability an uninfected node will be recruited, all of the estimates of P A also increase. There is no evidence that any of the estimators considered correct for differential recruitment in this case, although the SSand VH-estimators tend to be slightly less biased than the SH-and H-estimators. The mean of the estimates changes by about the same amount as for the sample proportion as the level of differential recruitment between infection-group is changed. Now consider between-degree-group differential recruitment. If high-degree nodes are preferentially recruited by all nodes then this is likely to influence the estimates of P A only if the occurrence of infection is greater or less in nodes of high-degree than in the general population. That is, we would expect between-degree-group differential recruitment to affect the estimates of P A only if the degree distribution of infected nodes is different from the degree distribution of uninfected nodes, for example if there exists differential activity. In this case between-degree-group differential recruitment will induce between-infection-group differential recruitment, and we would expect the corresponding bias of the estimators.
For these simulations, we used three levels of differential recruitment: none, low degree nodes are preferentially recruited ("low"), and high-degree nodes are preferentially recruited ("high"). For the latter two cases, the relative probabilities that a node of any degree will recruit a node of degree d are illustrated in Figure 5. Results of the simulations for the three levels of between-degree-group differential recruitment and three levels of differential activity are shown in Figure 6. From this figure we can see that when low-degree nodes are preferentially recruited this results in an increase in the mean of the estimators if differential activity is equal to 0.5 and a decrease in mean of the estimators if differential activity is equal to 1.8. This is as we would expect based on the induced differential recruitment between  Figure 4: Simulation results for varying levels of between-infection-group differential recruitment and differential activity.
infection groups. There is no evidence that any of the estimators correct for this type of differential recruitment.

Recruitment Effectiveness and Non-response
In this section we show that all estimators behave in a similar and predictable way in the presence of non-response. We also explore the behaviour of the estimators when there is imperfect recruitment effectiveness, and find that the SH-and H-estimators behave differently to the other estimators considered. We define recruitment effectiveness and non-response as follows: Definition 2 (Recruitment effectiveness) Denote by R g the probability that a coupon given to an individual in group g is passed on to another candidate individual in the population, given that such an individual exists. Then R g is the "recruitment effectiveness" of group g.
Definition 3 (Non-response) Denote by V g the probability that an individual in group g reports to the study centre after having been given a coupon. Then if V g = 1 we say there exists non-response, and refer to V g as the "response rate" of group g.
Recruitment effectiveness and response rate are closely linked. If a coupon given to an individual is not returned, this could be either because that individual does not use the coupon to recruit anybody (imperfect recruitment effectiveness, R g < 1), or because the person they recruit does not report back to the study centre (non-response, V g < 1). It is not always possible to tell reliably for which reason a coupon was not returned, nor to directly estimate recruitment effectiveness and non-response rates.  Figure 5: Relative probabilities of recruitment for nodes, for different levels of differential recruitment between degree-group. The dashed line corresponds to preferential recruitment of low-degree nodes, and the solid line to preferential recruitment of high degree nodes. As with all simulations, the mean degree of all nodes is equal to 7.
In this section we consider how imperfect recruitment effectiveness and non-response are likely to affect the estimators, and carry out a simulation study to test these arguments.

Changes in the Selection Probabilities
Suppose that node j ∈ g is given a coupon and there is probability R g < 1 that he uses it to recruit another participant. If j does not pass on the coupon, then the children of j miss out on a chance to be recruited. Therefore, if R g < 1 the children of nodes in group g will have a lower probability of selection than if R g were equal to 1. Hence in general one would expect that if R g < R g ′ , nodes who have many parents in group g will have a lower probability of selection that nodes with many parents in group g ′ , everything else being equal. If this effect is not balanced between infection groups, and in the presence of homophily, it is likely to result in biased estimates of P A . Because recruitment effectiveness does not affect the choice of recruit given that a coupon is passed on, changes in recruitment effectiveness would not be expected to affect the cross-group recruitment probabilities used in the SH-and H-estimators.

Additional effect of Non-response
Rather than changing coupon-passing probabilities, non-response affects which coupons are returned. Hence rather than reflecting the population proportion of infected individuals, P A will "estimate" the proportion of infected individuals among the population of respondents. Suppose we denote the set of respondents in the population by R. Then by definition 3,   Figure 6: Simulation results for varying levels of between-degree-group differential recruitment and differential activity. The labels indicate whether low ("low") or high ("high") degree nodes are preferentially recruited.
V g = N g∩R /N g and so the population proportion of infected individuals can be written as (derivation given in appendix B). If V A < V B , N A /N will be greater than N A∩R /N R . Thus if there are no other sources of bias, all the estimators will be biased downward. If V A > V B , all estimators will be biased upward. Similarly, non-response will affect the estimated cross-group recruitment probabilities. This is because the estimates are based only on the recruitments to nodes who actually respond, which is a subset of all recruitments actually made. For example, if infected nodes have a lower response-rate than uninfected nodes, then the estimated probability of an uninfected node recruiting an infected node will be reduced compared to the case of perfect response.

Recruitment Effectiveness
For our simulations, recruitment effectiveness by infection group is parameterised as a vector (R B , R A ), where R B and R A are the probability that a coupon is passed on given that a candidate for recruitment exists, for uninfected and infected nodes respectively. We consider values {(0.6, 0.9), (1, 1), (0.9, 0.6)}.
Recruitment effectiveness by degree group is parameterised in the form (α, β), where nodes of degree 1-5 have relative probability α of recruiting, nodes of degree greater than 10 have relative probability β of recruiting, and the relative probability for nodes of degree 6-10 increases linearly from α to β. Possible values are {(0.5, 1), (1, 1), (1, 0.5)}. Figure 7. Figure 7 shows that, for all levels of differential activity, as the recruitment effectiveness of uninfected nodes increases relative to the recruitment effectiveness of infected nodes, the sample proportion of infected nodes decreases. This is as expected due to the population homophily. The VH-estimates and SS-estimates also decrease as the sample proportion decreases, demonstrating that they are biased by different levels of recruitment effectiveness. However, it is interesting to note that neither the SH-nor the H-estimates appear to be influenced to the same extent by recruitment effectiveness as the VH-, SS-or Naïve estimates. If there is no differential activity then the SH-and H-estimators appear to effectively control for different recruitment effectiveness within infection groups. In the presence of differential activity there is a slight bias in the opposite direction to the Naïve estimator, but this tends to be small relative to the bias of the other estimators. We also ran the simulation for a sample size of 500. In this case, shown in Figure 8, the H-and SHestimators have a substantial bias in the opposite direction to the Naïve and VH-estimators. Thus the simulation results suggest that the SH-and H-estimates correct for bias due to differential recruitment in small sample fractions. Again, there is no significant difference between the means of the SH-and H-estimators.

Results of the simulations for the different levels of recruitment effectiveness by infection group are shown in
In the case of different levels of recruitment effectiveness by degree-group, shown in Figure  9, a similar effect can be seen. In this case, if there is no differential activity then different levels of recruitment effectiveness by degree group does not bias the estimators of P A , because the induced recruitment effectiveness by infection-group is the same for infected and uninfected groups. For differential activity greater than 1, if low-degree nodes have higher recruitment rates than high-degree nodes (parameterisation (1, 0.5)), this means that on average uninfected nodes will have higher recruitment rates than infected nodes. Accordingly the sample proportion of infected individuals decreases as recruitment effectiveness by degree- group changes from (0.5, 1) to (1, 0.5) for differential activity equal to 1.8, and the reverse is true if differential activity is equal to 0.5. Therefore, the patterns of bias are similar to the case of recruitment effectiveness by infection group. The bias of the H-and SH-estimators increases accordingly for samples of size 500 (not shown). The difference in mean between the SH-and H-estimators is statistically significant when recruitment effectiveness by degree-group is equal to (0.5, 1) and differential activity is 0.5 or 1.8. In both of these cases the mean of the H-estimates was closer to the true value than the mean of the SH-estimates. However, from the simulation results of Figure 9, this difference may not be large enough to be of practical significance.

Non-response
Non-response by infection group is parameterised as a vector (V B , V A ), where V B and V A are the probability that a node responds given that they have been recruited, for uninfected and infected nodes respectively. Possible values are {(0.6, 0.9), (1, 1), (0.9, 0.6)}. Non-response by degree group is parameterised in the same way as recruitment effectiveness by degree group. Possible values are {(0.5, 1), (1, 1), (1, 0.5)}.
Results of the simulations for the different levels of non-response by infection group and degree group are shown in Figures 10 and 11 respectively. From Figure 10, as the response-rate decreases for infected nodes relative to uninfected nodes, i.e. the parameterisation changes from (0.6, 0.9) to (0.9, 0.6), the sample proportion of infected nodes decreases, as expected. Again, all estimators are biased by changes in the relative rates of non-response. As the sample proportion of infected individuals decreases, so too do the Naïve, SS-, VH-, SH-and H-estimates. There are no significant differences between the mean SH-and H-estimates.
For the case of non-response by degree-group, similarly to the case of recruitment effec-   Figure 9: Simulation results for varying levels of recruitment effectiveness by degree group and differential activity.
tiveness by degree-group, the results can be interpreted in terms of induced non-response by infection-group. Viewed in this way, the results shown in Figure 11 are consistent with those of Figure 10. There are significant differences between the SH-and H-estimates when there exists both non-response by degree group and differential activity, but again these differences don't appear to be large. None of the estimators seem to control for different rates of non-response between infected and uninfected groups.

Comparison of the Salganik-Heckathorn and Heckathorn Estimators
In Section 2.5 we hypothesised that the SH-and H-estimates would differ if the Markov chain on degree groups was started out of equilibrium. Further, the more slowly the chain reaches equilibrium, the larger the expected difference between the SH-and H-estimates for a fixed sample size.
In the simulations presented in Sections 4 and 5 there were very few statistically significant differences between the mean SH-and H-estimates. Those differences which were significant always occurred when there was differential activity and differences in recruitment or response behaviour between degree groups. These are the cases where the equilibrium probabilities of the Markov chain on degree groups are likely to be furthest from the degree distribution of the seeds. However, in all simulations the differences between the mean (and median) estimates of the H-and SH-estimators were unlikely to be large enough to be of practical concern (all mean differences were less than 0.001), and the variances of the estimates were also nearly identical. Heckathorn (2007, pp. 185) admits that traditional RDS designs are likely to result in only "modest" differences between the two estimators. However, he theorises that ". . . changes in research design that induce an association between degree and opportunities to recruit would produce much larger potential effects." A design that gave respondents with higher reported degree more coupons than respondents with low reported degree would have such an effect. Such a design can be viewed as equivalent to one where all nodes are given the same number of coupons but nodes of lower degree have much lower recruitment effectiveness. This case was considered in Section 5, Figure 9, and again produced only very small differences between the H-and SH-estimates. Similarly, a design where infected nodes are given more coupons than uninfected nodes is equivalent to the case we considered in Figure 7, where uninfected nodes have lower recruitment effectiveness than infected nodes. Again, only very small differences between the H-and SH-estimates were observed. We also re-ran the simulations using four coupons rather than two, but this did not produce any significant changes in the results. To try and induce larger differences between the estimates we looked at the effect of increasing homophily (to slow convergence), preferentially selecting seeds from nodes of low or high degree (to start the chain out of equilibrium), and more extreme differences in recruitment effectiveness. The largest differences between the SH-and H-estimates arose when the seeds were selected from nodes of low degree. Compared to the effect of seed selection, increasing differential activity or homophily had little effect on the difference between the estimates. Simulation results for the case that ten seeds are chosen uniformly at random from the twenty nodes of lowest degree are shown in Figure 12. The observable differences between the H-and SH-estimates when seeds are selected from low-degree nodes are statistically significant. It is also very interesting to note that in these cases the H-estimator appears to correct somewhat for the bias introduced by the seed selection mechanism.
To investigate further, we considered the differences in absolute deviations from 0.2 of the H-and SH-estimates. Figure 13 shows that on average the H-estimate is closer to the   Figure 11: Simulation results for varying levels of non-response by degree group and differential activity.
true value of 0.2 than the SH-estimate when seeds are chosen from nodes of either high or low degree. This observation is robust to changes in sample-size. The simulation results therefore suggest that the H-estimator corrects for bias due to selecting seeds with degrees not representative of the theoretical equilibrium distribution on degree groups. If there are no other sources of bias, the H-estimator is likely to be less biased than the SH-estimator in this case. However, it should be noted that the seed selection scenarios which produced the observed differences in behaviour were quite extreme, and are unlikely to occur unless by design.
We also considered how the SH-and H-estimators behave when seeds are selected from nodes of low degree and, in addition, there exists differential recruitment, unbalanced recruitment effectiveness or non-response. In this case the biases due to seed selection and recruitment effectiveness or non-response behaved in an approximately additive fasion. Hence depending on the components of the additive bias, either the SH-or H-estimates could have the lower total bias.
In practice, a scenario as extreme as that illustrated in Figure 12 is unlikely to arise. More realistically, it could happen that there exists differential activity and that seeds are selected only from infected individuals. If infected nodes have a lower or higher average degree than uninfected nodes, then this will induce the same effect as considered above. We ran the simulations for this case and, although the difference between the mean of the Hand SH-estimators was statistically significant for the cases with differential activity and seeds chosen only from infected nodes (and only those cases), the size of the differences (approximately 0.001) is unlikely to be of practical concern. This indicates that it is unlikely that these conditions will result in large differences between H-and SH-estimates in traditional respondent driven sampling designs.  Figure 12: Box-plots of the estimates for different methods of selecting seeds. Seeds are either selected with probability proportional to degree or uniformly at random from the twenty nodes of lowest degree.

Discussion
In this paper we compare the performance of several respondent driven sampling estimators under conditions of differential recruitment, recruitment effectivness and non-response. Our specific findings are summarized in Table 1. Of particular importance is the performance of the estimator introduced in Heckathorn (2007), which claims to adjust for differential recruitment and recruitment effectiveness. We did not find any evidence that the H-estimator adjusts for differential recruitment or non-response. However, in the case of recruitment effectiveness, the resulting bias of the H-estimator and the SH-estimator is in the opposite direction to that of the Naïve, SS-and VH-estimators, and the size of this bias depends on the sampling fraction (see Figures 7  and 8). For the simulations with sample size 200, this resulted in the SH-and H-estimators effectively controlling for the bias due to imperfect recruitment effectiveness.
When seeds were randomly selected with probability proportional to degree, under all conditions of differential recruitment, recruitment effectiveness and non-response the estimates of the H-estimator were almost identical to those of the SH-estimator. In Sections 4 (differential recruitment) and 5 (recruitment effectiveness and non-response), we found statistically significant differences between these two estimators in only 10 of 69 simulation cases (sample size 200). The magnitudes of the differences in these cases were small enough to be negligible for practical purposes.
This led us to explore, in Section 6, other conditions which might lead to differences between the H-and SH-estimates. Here, we found that we can induce differences between the two estimators when the following two conditions both hold: 1. The seeds are selected with probability far from the theoretical equilibrium probabilities 2. The degree distribution of infected and uninfected nodes is not the same (for example if there is differential activity).
Under these conditions the H-estimator was significantly less biased than the SH-estimator, indicating that the H-estimator may correct for the bias introduced by this type of seed selection. Specifically, we found the largest differences between the estimates of the H-and SH-estimators when seeds were selected from among the nodes of lowest degree. Smaller, but substantial differences were also apparent when selecting seeds from among nodes of the highest degree. It is possible this scenario may arise by design, where study designers try to recruit well-connected population members as seeds.
If there are no sources of bias other than unbalanced selection of seeds by degree, then the H-estimator is likely to have a smaller bias than the other estimators considered in this paper. However, if other sources of bias are present (which will almost always be the case), then these will interact to mean that the absolute bias of the H-estimator may not be the smallest of the estimators considered.
All estimators were subject to bias induced by differential recruitment, recruitment effectiveness and non-repsonse. It is interesting, however, that when the differential recruitment and recruitment effectiveness acted only on degree groups, bias was only present in the estimators when the infection groups had different degree distributions (differential activity = 1). Without this condition, over or under representing degree groups and their sampling patterns in the estimators does not differentially affect the two groups. Across these simulations, for random selection of seeds the H-and SH-estimators tended to perform similarly in each case. The Naïve estimator (sample mean) tended to exhibit the largest bias, whereas the SS-, VH-, SH-and H-estimators showed reduced bias due to adjusting for differential activity (Gile and Handcock, 2010). For all cases the SS-estimates fell between the Naïve and VHestimates. In general, apart from the conditions of seed selection mentioned above, none of the SS-, VH-, SH-or H-estimators were consistently less biased than any other. They also had similar variances, although variance of the VH-and SS-estimators appeared to be slightly less in many cases, and the estimates returned by the SS-and VH-estimators were not subject to the same instability as the SH-and H-estimates.
Overall, our findings, indicate that no estimator consistently out-performs the others under all conditions. We find that with small sample fractions, the SH-and H-estimators correct for differential recruitment effectiveness across groups, which induces bias in the other estimators in the presence of homophily. This is an advantage, and therefore these estimators should be used whenever the sample fraction is small and differential recruitment effectiveness is suspected. However, these estimators also are also subject to more instability, and have larger variance and greater bias than the VH-or SS-estimators under many other sample conditions, both in this paper, and as described in Gile and Handcock (2010). This suggests that absent differential recruitment effectiveness, the VH-or SS-estimators are to be preferred. In the case of large sample fractions, the SS-estimator is clearly to be preferred (Gile, 2010). Our results do not provide evidence that the H-estimator should be preferred over the SHestimator. Although we found differences between the H-and SH-estimators in some extreme cases, we suspect these cases are rare enough so as to not justify the additional complexity of the H-estimator. The lack of clear preference among these estimators highlights the need for both new estimators, and also for new diagnostic tests aimed at identifying circumstances in which each estimator is to be preferred.
It is also important to keep in mind the limitations of this study. It was not possible to consider all possible combinations and levels of the parameterisations of the effects studied in this paper. We have tried to focus on a range of simulation conditions and parameter values most likely to highlight contributions of the H-estimator, and considered variations of many parameters at once to draw out potential interaction effects. Nevertheless, there may well be other population or sampling parameters whose levels will impact the results of our study. While we have tried to include any confounders we felt might affect our qualitative findings, any effect sizes should be understood as specific to the simulation conditions under study.

A Derivation of Hansen-Hurwitz Estimators
Consider the Hansen-Hurwitz estimator of Π A , Thus, taking the ratio of Hansen-Hurwitz estimators for Π A and Π A +Π B shows that n A /(n A + n B ) is a generalised Hansen-Hurwitz estimator of Π A /(Π A + Π B ).

B Non-response
By definition 3, V g = N g∩R /N g and so the population proportion of infected individuals can be written as as claimed.