Demand and Welfare Analysis in Discrete Choice Models with Social Interactions

Many real-life settings of consumer-choice involve social interactions, causing targeted policies to have spillover-effects. This paper develops novel empirical tools for analyzing demand and welfare-effects of policy-interventions in binary choice settings with social interactions. Examples include subsidies for health-product adoption and vouchers for attending a high-achieving school. We establish the connection between econometrics of large games and Brock-Durlauf-type interaction models, under both I.I.D. and spatially correlated unobservables. We develop new convergence results for associated beliefs and estimates of preference-parameters under increasing-domain spatial asymptotics. Next, we show that even with fully parametric specifications and unique equilibrium, choice data, that are sufficient for counterfactual demand-prediction under interactions, are insufficient for welfare-calculations. This is because distinct underlying mechanisms producing the same interaction coefficient can imply different welfare-effects and deadweight-loss from a policy-intervention. Standard index-restrictions imply distribution-free bounds on welfare. We illustrate our results using experimental data on mosquito-net adoption in rural Kenya.


Introduction
Social interaction models, in which an individual's payoff from an action depends on the perceived fraction of her peers choosing the same action, feature prominently in economic and sociological research. In this paper, we address a substantively important issue that has received limited attention within these literatures, viz. how to conduct economic policy evaluation in such settings.
In particular, we focus on welfare analysis of policy interventions in binary choice scenarios with social interactions. Examples include subsidies for adopting a health-product and merit-based vouchers for attending a high-achieving school, where the welfare gain of beneficiaries may be accompanied by spillover-led welfare effects on those unable to adopt or move, respectively. Ex-ante welfare analysis of policies is ubiquitous in economic applications, and informs the practical decision of whether to implement the policy in question. Furthermore, common public interventions such as taxes and subsidies are often motivated by efficiency losses resulting from externalities. Therefore, it is important to develop empirical methods for welfare analysis in the presence of such externalities, which cannot be done using available tools in the literature. Developing such methods and making them practically relevant also requires one to clarify and extend some aspects of existing empirical models of social interaction.
Literature Review and Contributions: Seminal contributions to the econometrics of social interactions include Manski (1993) for continuous outcomes, and Brock and Durlauf (2001a) for binary outcomes. More recently, there has been a surge of research on the related theme of network models, cf. de Paula (2016). On the other hand, the econometric analysis of welfare in standard discrete choice settings, i.e. with heterogeneous consumers but without social spillover, started with Domencich and McFadden (1977), with later contributions by Daly and Zachary (1978), Small and Rosen (1981), and Bhattacharya (2018). The present paper builds on these two separate literatures to examine how social interactions influence welfare effects of policy-interventions and the identifiability of such welfare effects from standard choice data. In the context of binary choice with social interactions, Brock and Durlauf (2001a, Sec 3.3) discussed how to rank different possible equilibria resulting from policy interventions in terms of social utility, as opposed to individual welfare. They used log-sum type formulae, as in Small and Rosen (1981), to calculate the average indirect utility for specific realized values of covariates and average peer choice. Such calculations are not directly useful for our purpose. This is because the aggregate income transfer that restores average social utility to its pre-intervention level does not equal the average of individual compensating variations that restore individual utilities to their pre-intervention level. The latter is related to the concept of average deadweight loss, i.e. the efficiency cost of interventions, and consequently has received the most attention in the recent literature on empirical welfare analysis, cf. Hausman and Newey (2016), Bhattacharya (2015), McFadden and Train (2019), and it is this notion of individual welfare that we are interested in.
However, in settings involving spillover, we cannot use the methods of the above papers, as they do not allow for individual utilities to be affected by aggregate choices, a feature that has fundamental implications for welfare analysis. Therefore, new methods are required for welfare calculations under spillover, which we develop in the present paper.
In order to develop these methods, one must first have a theoretically coherent utility-based framework where many individuals interact with each other, i.e. provide a micro-foundation for Brock-Durlauf type models in terms of an empirical game with many players. This is necessary because welfare effects are defined with respect to utilities, and therefore, one has to specify the structure of individual preferences and beliefs including unobserved heterogeneity, and how they interact to produce the aggregate choice in equilibrium before and after the policy intervention. This requires clarifying the information structure and nature of the corresponding Bayes-Nash equilibria.
A pertinent issue here is modelling the dependence structure of utility-relevant variables unobservable to the analyst but observable to the individual players. In particular, spatial correlation in unobservables (natural in the commonly analyzed setting where peer-groups are physical neighborhoods) makes individual beliefs conditional on one's own privately observed variables, which contain information about neighborhood ones. This complicates identification and inference. The first main contribution of the present paper is to establish conditions under which this feature of beliefs can be ignored 'in the limit', and one can proceed as if one is in an I.I.D. setting. This derivation is much more involved than the well-known result that in linear regression models, the OLS is consistent under correlated unobservables. In particular, our result involves showing that the fixed points of certain functional maps converge, under increasing domain and weak dependence asymptotics for spatial data, to fixed points of a limiting map, implying convergence of conditional beliefs to unconditional ones. This, in turn, is shown to imply convergence of complicated estimators of preference parameters under conditional beliefs to computationally simple ones in the limit. These estimators then yield consistent counterfactual demand-prediction corresponding to a policy-intervention.
The standard setting in the game estimation literature is one where many independent markets are observed, each with a small number of players. Here, we consider estimation of preference parameters from data on a few markets with many players in each, using asymptotic approximations where the number of players tends to infinity but the number of markets remains fixed. In this setting, if the form of equilibrium beliefs is symmetric among players (see footnote 1), the probabilistic laws that they follow have a certain homogeneity across players. Due to this homogeneity, asymptotics on the number of players provides the 'repeated observations' required to identify the players' preference parameters. Menzel (2016) had also analyzed identification and estimation in games with many players. Below, we provide more discussion on the relation and differences between our analysis and Menzel's.
Footnote 1: Symmetry means that (1) if the beliefs are unconditional expectations, as is the case with I.I.D. unobservables, they are identical across players, and (2) if they are conditional expectations, as is the case for spatially correlated unobservables, their functional forms are identical.
Welfare Analysis: The second part of our paper concerns welfare-analysis of policy-interventions, e.g. a price-subsidy, in a setting with social interactions. Here we show that unlike counterfactual demand estimation, welfare effects are generically not identified from choice data under interactions, even when utilities and the distribution of unobserved heterogeneity are parametrically specified, equilibrium is unique, and there are no endogeneity concerns. To understand the heuristics behind under-identification, consider the empirical example of evaluating the welfare effect of subsidizing an anti-malarial mosquito net. Suppose, under suitable restrictions, we can model choice behavior in this setting via a Brock-Durlauf type social interaction model, and the data can identify the coefficient on the social interaction term. However, this coefficient may reflect an aggregate effect of (at least) two distinct mechanisms, viz. (a) a social preference for conforming, and (b) a health-concern led desire to protect oneself from mosquitoes deflected from neighbors who adopt a bednet. These two distinct mechanisms, with different magnitudes in general, would both make the social interaction coefficient positive, and are not separately identifiable from choice data (only their sum is). But they have different implications for welfare if, say, a subsidy is introduced. At one extreme, if all spillover is due to preference for social conforming, then as more neighbours buy, a household that buys would experience an additional rise in utility (over and above the gain due to price reduction), but a non-buyer loses no utility via the health channel. At the other extreme, if spillover is solely due to perceived negative health externality of buyers on non-buyers, then increased purchase by neighbours would lower the utility of a household upon not buying via the health-route, but not affect it upon buying since the household is then protected anyway.
These different aggregate welfare effects are both consistent with the same positive aggregate social interaction coefficient. This conclusion continues to hold even if eligibility for the subsidy is universal and there are no income effects or endogeneity concerns, and regardless of whether unobservables in individual preferences are I.I.D. or spatially correlated.
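The under-identification argument above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical linear-index logit specification in which the belief enters the utility of buying with coefficient `alpha1` and the utility of not buying with coefficient `alpha0`. All function names, parameter names and numerical values are illustrative assumptions.

```python
import math


def choice_prob(price, belief, alpha1, alpha0, delta=0.5, beta=1.0):
    """Logit probability of buying under an assumed linear-index model.

    Assumed utilities (income cancels because there are no income effects):
        U1 = delta - beta * price + alpha1 * belief + eta1
        U0 =                        alpha0 * belief + eta0
    Only the difference alpha1 - alpha0 enters the choice probability.
    """
    index = delta - beta * price + (alpha1 - alpha0) * belief
    return 1.0 / (1.0 + math.exp(-index))


def nonbuyer_utility_change(belief_before, belief_after, alpha0):
    """Change in the deterministic part of U0 when peer adoption changes."""
    return alpha0 * (belief_after - belief_before)


# Two hypothetical mechanisms with the same interaction coefficient (1.0).
conforming = {"alpha1": 1.0, "alpha0": 0.0}    # pure preference for conforming
externality = {"alpha1": 0.0, "alpha0": -1.0}  # pure negative health externality

p_conforming = choice_prob(price=1.0, belief=0.4, **conforming)
p_externality = choice_prob(price=1.0, belief=0.4, **externality)

# Suppose a subsidy raises expected peer adoption from 0.4 to 0.6.
loss_conforming = nonbuyer_utility_change(0.4, 0.6, conforming["alpha0"])
loss_externality = nonbuyer_utility_change(0.4, 0.6, externality["alpha0"])
```

In this sketch both parameterizations produce the same purchase probability, so choice data cannot tell them apart, while the utility of a non-buyer is unchanged in the first case and falls in the second.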
Indeed, this feature is present in many other choice situations that economists routinely study.
For example, consider school-choice in a neighborhood with a free, resource-poor local school and a selective, fee-paying resource-rich school. In this setting, a merit-based voucher scheme for attending the high-achieving school can potentially have a range of possible welfare effects. Aggregate welfare change could be negative if, for example, with high-ability children moving with the voucher the academic quality declines in the resource-poor school more than the improvement in the selective school via peer-effects. In the absence of such negative externalities, aggregate welfare could be positive due to the subsidy-led price decline for voucher users and any positive conforming effects that raise the utility of attending the rich school when more children also do so. These contradictory welfare implications are compatible with the same positive coefficient on the social interaction term in an individual school-choice model.
For standard discrete choice without spillover, Bhattacharya (2015) showed that the choice probability function itself contains all the information required for exact welfare analysis. In [...] ineligibles are zero by definition. The resulting net welfare effect, aggregated over both eligibles and ineligibles, admits a large range of possible values including both positive and negative ones, with associated large variation in the implied deadweight loss estimates, all of which are consistent with the same coefficient on the social interaction term in the choice probability function.
An implication of these results for applied work is that welfare analysis under spillover effects requires knowledge of the different channels of spillover separately, possibly via conducting a 'belief-elicitation' survey; knowledge of only the choice probability functions, inclusive of a social interaction term, is insufficient.
Plan of the Paper: The rest of the paper is organized as follows. Section 2 describes the set-up, and establishes the formal connection between econometric analysis of large games and Brock-Durlauf type social interaction models for discrete choice, first under I.I.D. and then under spatially correlated unobservables. This section contains the key results on convergence of conditional (on unobservables) beliefs in the spatial case to non-stochastic ones under an increasing domain asymptotics. Section 3 shows consistency of our preferred, computationally simple estimator even under spatial dependence. Section 4 develops the tools for empirical welfare analysis of a price intervention, such as a means-tested subsidy, in such models, and associated deadweight loss calculations. In Section 5, we lay out the context of our empirical application, and in Section 6 we describe the empirical results obtained by applying the theory to the data. Finally, Section 7 summarizes and concludes the paper. Technical derivations, formal proofs and additional results are collected in an Appendix.

Set-up and Assumptions
Consider a population of villages indexed by $v \in \{1, \ldots, \bar{v}\}$ and resident households in village $v$ indexed by $(v, h)$, with $h \in \{1, \ldots, N_v\}$. For the purpose of inference discussed later, we will think of these households as a random sample drawn from an infinite superpopulation. The total number of households we observe is $N = \sum_{v=1}^{\bar{v}} N_v$. Each household faces a binary choice between buying one unit of an indivisible good (alternative 1) or not buying it (alternative 0). Its utilities from the two choices are given by $U_1(Y_{vh} - P_{vh}, \Pi_{vh}, \eta_{vh})$ and $U_0(Y_{vh}, \Pi_{vh}, \eta_{vh})$, where the variables $Y_{vh}$, $P_{vh}$, and $\eta_{vh}$ denote respectively the income, price, and heterogeneity of household $(v, h)$, and $\Pi_{vh}$ is household $(v, h)$'s subjective belief of what fraction of households in her village would choose alternative 1. The variable $\eta_{vh}$ is privately observed by household $(v, h)$ but is unobserved by the econometrician and other households. The dependence of utilities on $\Pi_{vh}$ captures social interactions. Below, we will specify how $\Pi_{vh}$ is formed. Household $(v, h)$'s choice is described by
$$A_{vh} = 1\{U_1(Y_{vh} - P_{vh}, \Pi_{vh}, \eta_{vh}) \ge U_0(Y_{vh}, \Pi_{vh}, \eta_{vh})\}, \qquad (1)$$
where $1\{\cdot\}$ denotes the indicator function. In the mosquito-net example of our application, one can interpret $U_1$ and $U_0$ as expected utilities resulting from differential probabilities of contracting malaria from using and not using the net, respectively.
The utilities, $U_1$ and $U_0$, may also depend on other covariates of $(v, h)$. For notational simplicity, we let $W_{vh} = (Y_{vh}, P_{vh})'$, and suppress other covariates for now; covariates are considered in our empirical implementation in Section 6.
For later use, we also introduce a set of location variables $\{L_{vh}\}$, where $L_{vh} \in \mathbb{R}^2$ denotes $(v, h)$'s (GPS) location.
Incomplete-Information Setting: In each village $v$, each of the $N_v$ households is provided the opportunity to buy the product at a researcher-specified price $P_{vh}$ randomly varied across households. These households will be termed players from now on. Players have incomplete information in that each player $(v, h)$ knows her own variables $(A_{vh}, W_{vh}, L_{vh}, \eta_{vh})$. We assume, in line with our application context, that a player does not know the identities of all the players who have been selected in the experiment and thus their variables $(W_{\tilde{v}k}, L_{\tilde{v}k}, \eta_{\tilde{v}k})$ and choice $A_{\tilde{v}k}$ (for any $\tilde{v} \in \{1, \ldots, \bar{v}\}$ and $k \ne h$). Accordingly, we model interactions of households as an incomplete-information Bayesian game, whose probabilistic structure is as follows.
We consider two sources of randomness: one stemming from random drawing of households from a superpopulation, and the other associated with the realization of players' unobserved heterogeneity $\{\eta_{vh}\}$. This will be further elaborated below.
We assume players have 'rational expectations' in accordance with the standard Bayes-Nash setting, i.e., each $(v, h)$'s belief is formed as
$$\Pi_{vh} = \frac{1}{N_v - 1} \sum_{1 \le k \le N_v,\; k \ne h} E[A_{vk} \mid I_{vh}], \qquad (2)$$
where $E[\cdot \mid I_{vh}]$ is the conditional expectation computed through the probability law that governs all the relevant variables given $(v, h)$'s information set $I_{vh}$ that includes $(W_{vh}, \eta_{vh})$. Here, 'rational expectation' simply means that subjective and physical laws of all relevant variables coincide. The explicit form of (2) in equilibrium is investigated in the next subsection after we have specified the probabilistic structure for all the variables.
Each player $(v, h)$ is solely concerned with behavior of other players in the same village. In this sense, the econometrician observes $\bar{v}$ games ($\bar{v}$ is eleven in our empirical study), each with 'many' players. To formalize our model as a Bayesian game in each village, given the form of (2), $U_1$ and $U_0$ would be interpreted as expected utilities. This is possible when the underlying vNM utility index $u_1$ is linear in its second argument, so that $U_1$ coincides with the expectation of $u_1$ given the player's information; $U_0$ and $u_0$ satisfy an analogous relationship. This will hold in particular when utilities have a linear index structure, as in Manski (1993) and Brock and Durlauf (2001a, 2007).
Dependence Structure of Unobserved Heterogeneity: We assume that unobserved heterogeneity $\{\eta_{vh}\}_{h=1}^{N_v}$ ($v = 1, \ldots, \bar{v}$) is composed, as specified in (3), of a village-specific factor $\xi_v$ that is common to all members in the $v$th village and an individual-specific variable $u_{vh}$. Below we will consider two different specifications for the sequence $\{u_{vh}\}_{h=1}^{N_v}$ for each $v$, given $\xi_v$, viz. (1) $u_{vh}$ are conditionally independent and identically distributed, and (2) $u_{vh}$ is spatially dependent. We assume that the value of $\xi_v$ is commonly known to all members in village $v$ but $u_{vh}$ is a purely private variable known only to individual $(v, h)$. Neither $\{\xi_v\}$ nor $\{u_{vh}\}$ is observable to the econometrician. We also assume that this information structure as well as the probabilistic structure of variables imposed below (cf. conditions C1, C2, and C3 with I.I.D. or SD below) is known to all the players in the game.
Given our settings so far, we can specify the form of player $(v, h)$'s information set as
$$I_{vh} = (W_{vh}, L_{vh}, u_{vh}, \xi_v). \qquad (4)$$
In our empirical set-up, the group level unobservables $\{\xi_v\}$ will be identified using the fact that there are many households per village.
Having described the set-up through equations (1), (2), (3), and (4), we now close our model by providing the following conditions on the probabilistic law for the key variables:
C1 $\{(W_{vh}, L_{vh}, \xi_v, u_{vh})\}_{h=1}^{N_v}$, $v = 1, \ldots, \bar{v}$, are independent across $v$.
Assumption C1 says that variables in village $v$ are independent of those in village $\tilde{v}$ ($\ne v$).
C2 For each $v \in \{1, \ldots, \bar{v}\}$, [...]
This conditional I.I.D.-ness of C2 for observables represents randomness associated with sampling of households in our field experiment. Additionally, the household $(v, h)$ is assumed to know [...]
For the distribution of unobservable heterogeneity, we consider two alternative scenarios:
C3-IID (i) For each $v$, given $\xi_v$, the sequence $\{u_{vh}\}_{h=1}^{N_v}$ is conditionally I.I.D., with $u_{vh} \mid \xi_v \sim F_u^v(\cdot \mid \xi_v)$; (ii) $\{u_{vh}\}_{h=1}^{N_v}$ is independent of $\{W_{vh}, L_{vh}\}_{h=1}^{N_v}$ conditionally on $\xi_v$.
C3-SD For each $v$, the sequence $\{u_{vh}\}$ is defined as $u_{vh} = u_v(L_{vh})$ for a stochastic process $\{u_v(l)\}_{l \in \mathcal{R}_v}$ indexed by location. The processes $\{u_v(l)\}_{l \in \mathcal{R}_v}$ are independent of $\{u_{v'}(l)\}_{l \in \mathcal{R}_{v'}}$ for $v \ne v'$, and satisfy the following properties: (i) for each $v$, $\{u_v(l)\}_{l \in \mathcal{R}_v}$ is an alpha-mixing stochastic process conditionally on $\xi_v$, where the definition of an alpha-mixing process is provided in Appendix A.2; (ii) $\{u_v(l)\}_{l \in \mathcal{R}_v}$ is independent of [...]
The conditional I.I.D.-ness imposed in C3-IID (i) leads to equi-dependence within each village, i.e., $\mathrm{Cov}[\eta_{vh}, \eta_{vk}] = \mathrm{Cov}[\eta_{v\tilde{h}}, \eta_{v\tilde{k}}]$ ($\ne 0$) for any $h \ne k$ and $\tilde{h} \ne \tilde{k}$. In contrast, C3-SD (i) allows for non-uniform dependence that may vary depending on the relative locations of the two players, i.e., if two households $(v, h)$ and $(v, k)$ selected in the experiment, with locations $L_{vh}$ and $L_{vk}$ respectively, live close to each other (i.e., $\|L_{vh} - L_{vk}\|$ is small), $u_{vh}$ and $u_{vk}$ (and thus $\eta_{vh}$ and $\eta_{vk}$) are more correlated. For example, in our application on mosquito-net adoption, this can correspond to positive spatial correlation in density of mosquitoes, unobserved by the researcher.
Assumption C3-SD is consistent with the "increasing domain" type asymptotic framework used for spatial data, formally set out in Appendix A.2 of this paper (briefly, the area of $\mathcal{R}_v = \mathcal{R}_{N_v}$ tends to $\infty$ as $N \to \infty$; cf. Lahiri, 2003; Lahiri and Zhu, 2006).
For the purpose of inference, C3-SD may be seen as a generalization of C3-IID, but in our Bayes-Nash framework with many players, they will, in general, imply substantively different forms for beliefs and equilibria. In particular, under C3-IID, each player $(v, h)$'s unobservable $u_{vh}$ is not useful for predicting another player $(v, k)$'s variables and behavior, and therefore her belief $\Pi_{vh}$, defined in (2) as the average of the conditional expectations about all the others' $A_{vk}$, is reduced to the average of the unconditional expectations (as formally shown in Proposition 1 below). On the other hand, under the spatial dependence scheme C3-SD, since $u_{vh}$ and $u_{vk}$ are correlated, knowing one's own realized value of $u_{vh}$ can help predict others' $u_{vk}$; in other words, $(v, h)$'s own information $I_{vh} = (W_{vh}, L_{vh}, u_{vh}, \xi_v)$ is useful for forming beliefs about others.
This allows for identification and consistent estimation of model parameters. In the context of the field experiment in our empirical exercise, this exogeneity condition can be interpreted as saying that realization of unobserved heterogeneity is independent of how researchers have selected the sample. Note that the exogeneity condition is conditional on $L_{vh}$ (and $\xi_v$), and it does not exclude correlation of $u_{vh}$ and $W_{vh}$ (i.e., $(P_{vh}, Y_{vh})$) in the unconditional sense. For instance, if $Y_{vh}$ is well predicted by location $L_{vh}$ (say, there are high-income districts and low-income ones, and no restriction is imposed on the joint distribution of $(W_{vh}, L_{vh})$), we can still capture situations where $u_{vh}$ tends to be higher for [...]
Two Sources of Randomness: The above probabilistic framework with two sources of randomness has parallels in Andrews (2005, Section 7) and Lahiri and Zhu (2006). It is also related to Menzel's (2016) framework with exchangeable variables (below we provide further comparison of our framework with Menzel's). As stated, C2 represents randomness induced by the researchers' experimental process. In contrast, the specification in C3 represents randomness of unobserved heterogeneity conditionally on $\{L_{vh}\}_{h=1}^{N_v}$, the (locations of) households selected in the experiment. Conditions C2 and C3-IID imply that $\{(W_{vh}, L_{vh}, u_{vh})\}_{h=1}^{N_v}$ are I.I.D. conditionally on $\xi_v$, and thus our framework can be interpreted as the standard one with a single source of randomness.
For the spatial case C3-SD, the beliefs depend on $I_{vh}$, and in particular, on the unobservable (to the econometrician) $u_{vh}$, which complicates identification and inference. We get around this complication by showing that under an "increasing domain" type of asymptotics for spatial data, reasonable in our application, the model and estimates of its parameters under C3-SD converge essentially to the simpler model C3-IID, and this justifies the use of Brock-Durlauf type analysis even under spatial dependence.

Equilibrium Beliefs
In this subsection, we investigate the forms of players' beliefs defined in (2), first in the I.I.D. and then in the spatially dependent case. We first consider the case of C3-IID. This case corresponds to Brock and Durlauf's (2001a) binary choice model with social interactions where, additionally, unobserved heterogeneity was modelled through the logistic distribution. Brock and Durlauf (2001a) made an intuitive, but somewhat ad hoc, assumption that beliefs, corresponding to our $\Pi_{vh}$, are constant and symmetric across all players in the same village. We first show that under C3-IID, this assumption can be justified in our incomplete-information game setting via the specification of a Bayes-Nash equilibrium. We next consider the spatially dependent case with C3-SD. As briefly discussed above, beliefs under spatial dependence have to be computed through conditional expectations. However, under an "increasing domain" asymptotic framework for spatial data, conditional-expectation based beliefs converge to the beliefs in the I.I.D. case. The mathematical derivation of this result is somewhat involved; so in the main text we outline the key points, and provide the formal derivation in the Appendix. We investigate the forms of beliefs under C3-IID through the two following propositions:
Proposition 1 Suppose that Conditions C1, C2, and C3-IID are common knowledge in the Bayesian game described in the previous section. Then, for any [...]
The proof of Proposition 1 is provided in Appendix A.1. Note that this proposition does not utilize any equilibrium condition. It simply confirms, formally, the intuitive statement that $(v, h)$'s own variables are not useful to predict other $(v, k)$'s behavior $A_{vk}$.
Footnote 3: In our application, prices $P_{vh}$ are randomly assigned to individuals by researchers and thus $P_{vh}$ and $u_{vh}$ are independent both unconditionally and conditionally on $L_{vh}$.
Given this result, we can write the belief $\Pi_{vh}$ (defined in (2)) as [...], and $\Pi_{vh}$ is a function of $\xi_v$ and independent of $(v, h)$-specific variables, $(W_{vh}, L_{vh}, u_{vh})$, while the functional form of $\Pi_{vh}$ may depend on the index $(v, h)$ in a deterministic way; for notational simplicity, we suppress the dependence of $\Pi_{vh}$ on $\xi_v$ below.
Beliefs in equilibrium solve the system of $N_v$ equations [...] (7), $h = 1, \ldots, N_v$, with $m_v(\cdot)$ defined in (8) [...], where for notational economy, we will often suppress the dependence of $m_v(r)$ on $\xi_v$; but note that $m_v(r)$ is independent of the individual index $h$ under the conditional I.I.D. assumption given $\xi_v$. Now we are ready to provide the following characterization of beliefs:
Proposition 2 Suppose that the same conditions hold as in Proposition 1 and the function $m_v^{\xi_v}(\cdot)$ defined in (8) is a contraction, i.e., for some $\rho \in (0, 1)$,
$$|m_v^{\xi_v}(r) - m_v^{\xi_v}(\tilde{r})| \le \rho\, |r - \tilde{r}| \quad \text{for any } r, \tilde{r} \in [0, 1]. \qquad (9)$$
Then, a solution $(\Pi_{v1}, \ldots, \Pi_{vN_v})$ of the system of $N_v$ equations in (7) uniquely exists and is given by symmetric beliefs, i.e., $\Pi_{vh} = \Pi_{vk}$ for any $h, k \in \{1, \ldots, N_v\}$.
The proof is given in the Appendix. Propositions 1-2 show that, given the (conditional) I.I.D. and contraction conditions, the equilibrium is characterized through $\Pi_{vh} = \pi_v$ for any $h = 1, \ldots, N_v$, within each village (given $\xi_v$). This implies that the beliefs can be consistently estimated by the sample average of $A_{vk}$ over village $v$, which is exploited in our empirical study.
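The sample-average estimate of beliefs mentioned above can be sketched as follows. This is only an illustration under assumed inputs: the data layout (pairs of a village identifier and a binary choice) and the function name `village_adoption_rates` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def village_adoption_rates(records):
    """Estimate each village's common belief by its observed adoption share.

    `records` is an iterable of (village_id, choice) pairs with choice in {0, 1}.
    Returns a dict mapping village_id to the sample average of the choices.
    """
    totals = defaultdict(lambda: [0, 0])  # village_id -> [buyers, households]
    for village_id, choice in records:
        totals[village_id][0] += choice
        totals[village_id][1] += 1
    return {v: buyers / count for v, (buyers, count) in totals.items()}


# Hypothetical toy data: two villages, four households.
sample = [(1, 1), (1, 0), (2, 1), (2, 1)]
rates = village_adoption_rates(sample)
```

The resulting village-level share can then be used as the regressor standing in for the belief in an individual-level binary choice specification.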
The contraction condition (9) can be verified on a case by case basis. In particular, for the linear index model used below, the condition is
$$|\alpha| \sup_{e} f_{\varepsilon}(e) < 1,$$
where $\alpha$ denotes the coefficient on beliefs, i.e. the social interaction term, and $f_{\varepsilon}(\cdot)$ denotes the density of $\varepsilon$, the unobservable determinant of choosing option 1 (defined below through $\eta_{vk}$ or $u_{vh}$).
We verify that these conditions are satisfied in our application.
Note, however, from the proof of Proposition 2, that the contraction condition (9) is not necessarily required for uniqueness. That is, if a solution $(\Pi_{v1}, \ldots, \Pi_{vN_v})$ to the system of equations (7) is unique and $m_v(\cdot)$ defined in (8) has a unique fixed point (i.e., a solution to $r = m_v(r)$ is unique), then the same conclusion still holds. We have imposed (9) since it is a convenient sufficient condition that guarantees uniqueness both in (7) and $r = m_v(r)$; it also appears to be a mild condition, and easy to verify in applications.
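To make the role of the contraction condition concrete, the following is a minimal sketch of computing a symmetric equilibrium belief by fixed-point iteration. It rests on assumptions that are not taken from the paper: a logistic unobservable (whose density is at most 1/4), a linear index with hypothetical parameters `intercept`, `beta` and `alpha`, and an average over a short list of prices standing in for the expectation over covariates.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def belief_map(r, alpha, prices, intercept=0.0, beta=1.0):
    """m(r): average purchase probability when every player holds belief r."""
    probs = [logistic_cdf(intercept - beta * p + alpha * r) for p in prices]
    return sum(probs) / len(probs)


def satisfies_logit_contraction(alpha):
    """Sufficient check |alpha| * sup f < 1, using sup f = 1/4 for the logistic."""
    return abs(alpha) * 0.25 < 1.0


def solve_belief(alpha, prices, start=0.5, tol=1e-12, max_iter=10_000):
    """Iterate r <- m(r) until successive values differ by less than tol."""
    r = start
    for _ in range(max_iter):
        r_next = belief_map(r, alpha, prices)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical inputs: interaction coefficient 2.0 and three price levels.
prices = [0.5, 1.0, 1.5]
belief_from_low = solve_belief(2.0, prices, start=0.0)
belief_from_high = solve_belief(2.0, prices, start=1.0)
```

When the sufficient condition holds, iterations started from different initial beliefs should reach the same fixed point, which is the sense in which the equilibrium belief is unique in this sketch.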

Convergence of Beliefs under Spatial Dependence
In this subsection, we provide a formal characterization of beliefs in equilibrium under the spatial case C3-SD. When the unobserved heterogeneity terms $\{u_{vh}\}$ are dependent, beliefs in equilibrium may not reduce to a constant within each village, unlike in Proposition 1. With correlated $u_{vk}$ and $u_{vh}$, the conditional expectation $E[A_{vk} \mid I_{vh}]$ is in general a function of the privately observed $u_{vh}$, because knowing $u_{vh}$ is useful for predicting $u_{vk}$ and thus $A_{vk}$ (the latter is a function of $u_{vk}$).
While $(v, h)$'s beliefs are given by a constant under C3-IID, they will in general be a function of $(v, h)$'s variables unobserved by the researcher when spatial dependence is allowed, thereby complicating the analysis. In this subsection, we investigate formal conditions under which this feature of beliefs disappears "in the limit".
Asymptotic Framework for Spatial Data: Under spatial dependence, the first key condition enabling consistent estimation of our model parameters is the spatial analog of weak dependence. This amounts to specifying that $u_{vk}$ and $u_{vh}$ are less dependent when the distance between $(v, k)$ and $(v, h)$, $\|L_{vk} - L_{vh}\|_1$, is large. The notion of asymptotics we use is the so-called "increasing domain" type (cf. Lahiri, 1996), where the area from which $\{L_{vk}\}_{k=1}^{N_v}$ is sampled expands to infinity as $N_v \to \infty$. In particular, for each player $h$, the number of other players who are almost uncorrelated with $h$ expands to $\infty$, and the ratio of such players (relative to all $N_v$ players) tends to 1. Given this, and assuming that any bounded region in the support of $L_{vk}$ does not contain too many observations (even when $N_v$ tends to $\infty$), we can (i) ignore the effect of spatial dependence on equilibrium beliefs "in the limit", and (ii) derive limit results for spatial data (e.g., the laws of large numbers and central limit theorems as in Lahiri, 1996, 2003), and use these to develop an asymptotic inference procedure.
In our empirical set-up, the average distance between households within every village is more than 1 kilometer, and is close to 2 kilometers in most villages. This corresponds well with the increasing domain framework above.
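The intuition behind the increasing-domain argument can be checked with a small simulation. The sketch below is purely illustrative and makes its own assumptions: households are placed uniformly at random on a square whose area grows in proportion to the number of players, and "nearby" means within a fixed Euclidean `radius`; the function name and all parameter values are hypothetical.

```python
import math
import random


def share_of_near_neighbours(n_players, radius=2.0, density=1.0,
                             n_focal=50, seed=0):
    """Average share of other players located within `radius` of a focal player.

    The sampling region is a square with area n_players / density, so the
    domain expands as n_players grows while the local density stays fixed.
    """
    rng = random.Random(seed)
    side = math.sqrt(n_players / density)
    points = [(rng.uniform(0.0, side), rng.uniform(0.0, side))
              for _ in range(n_players)]
    shares = []
    for i in range(min(n_focal, n_players)):
        fx, fy = points[i]
        near = sum(1 for j, (x, y) in enumerate(points)
                   if j != i and math.hypot(x - fx, y - fy) <= radius)
        shares.append(near / (n_players - 1))
    return sum(shares) / len(shares)


share_small = share_of_near_neighbours(100)
share_large = share_of_near_neighbours(2500)
```

Because the number of nearby players stays bounded while the total grows, the share of players close enough to be meaningfully correlated with a given household shrinks as the village expands, which is the feature the limiting argument relies on.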
Convergence of Equilibrium Belief : We now characterize the game's equilibrium under the asymptotic scheme outlined above. The formal details of the analysis are laid out in Appendix A.2; here we outline the main substantive features and their implications for the belief structure.
To characterize beliefs in equilibrium, write the belief $\Pi_{vh}$ as a function $\psi_{vh}(\cdot)$ of $(v, h)$'s own variables, as in (10) [...], given each $\xi_v$. $\psi_{vh}(\cdot)$ may depend on the index $(v, h)$ in a deterministic way. Note that this expression (10) follows from the specification of $\Pi_{vh}$ in (2), defined as the average of the conditional expectations. Then, in the equilibrium, for each village $v$, beliefs are given by the set of functions, $\psi_{vh}(\cdot)$, $h = 1, \ldots, N_v$, that solves the following system of $N_v$ equations: [...] (11) for $h = 1, \ldots, N_v$ (almost surely).
Note that the solution $\{\psi_{vh}(\cdot)\}$ to (11) depends on $N_v$, the number of households. We now discuss the limit of the solutions when $N_v \to \infty$. To this end, for expositional ease, consider a symmetric equilibrium such that $\psi_{vh}(\cdot) = \psi_v(\cdot)$ for any $h = 1, \ldots, N_v$; symmetry is imposed here solely for easy exposition, and a formal proof without symmetry is provided in Appendix A.2. Under symmetry, the functional equation in (11) is reduced to [...] (12), where $\Gamma_{v,N_v}$ is a functional operator (mapping) from a $[0, 1]$-valued function $g$ (of random variables) [...] (13), where $u_{vh} = u_v(L_{vh})$ as formulated in C3-SD. Under C3-IID in (7), we have considered the system of equations that can be eventually defined through the unconditional expectations $E_v[\cdot]$. In contrast, here we have to consider conditional expectations of the form as in (11) and (13). Given the correlation in $\{u_{vh}\}$, they do not reduce to the unconditional ones since $u_{vh}$ is useful for predicting others' $u_{vk}$. However, under the increasing domain asymptotics and a weak dependence condition (i.e., $u_v(L_{vk})$ and $u_v(L_{vh})$ are less correlated when $\|L_{vk} - L_{vh}\|_1$ is large), both of which are standard asymptotic assumptions for inference with spatial data, the number of players in the game whose unobservables are almost uncorrelated with any given player $(v, h)$ becomes large as $N_v \to \infty$, and further the ratio of such players (among all $N_v$ players) tends to 1. As a result, the operator $\Gamma_{v,N_v}[g]$ converges to the average of the unconditional expectations: [...] (14) for any $g$, where we call each summand $E_v[\cdot]$ an 'unconditional' expectation in that it is independent of $(W_{vh}, L_{vh}, u_{vh})$, and we also suppress the dependence of $\Gamma_{v,\infty}$ on $\xi_v$ for notational simplicity. The precise meaning of this convergence, together with required conditions, is formally stated in the Appendix (see (81) in the proof of Theorem 5, for the general case without symmetry).
The convergence of the finite-player operator to the limit operator carries over to that of a fixed point of the finite-player operator (i.e., a solution of the functional equation (12)) when the limit operator is a contraction. The above discussion can be summarized as the following result (Theorem 1). Suppose that the limit operator in (14) is a contraction with respect to the metric induced by the norm $\|g\|_{L^1} := E[|g(W_{vh}, L_{vh}, u_v(L_{vh}))|]$ (where $g$ is a $[0, 1]$-valued function on the support of $(W_{vh}, L_{vh}, u_v(L_{vh}))$), 7 and let the limit belief, taking values in $[0, 1]$, be a solution to the functional equation defined by the limit operator (which is unique under the contraction property). Then, for each $v$, any solution to the functional equation defined by the finite-player operator, which may not be unique, converges to the limit belief as in (15). Note that the limit belief, a fixed point of the limit operator, corresponds to the equilibrium (constant and symmetric) beliefs for the C3-IID case (a fixed point of $m_v(\cdot)$ in (8); recall that beliefs are constant within a village in that case). This theorem is restated as Theorem 5 in Appendix A.2, where its proof is also provided.
Theorem 5 derives the convergence of the equilibrium beliefs (without the symmetry assumption), viz. that the limit of the solution to (11) is given precisely by the solution of (7). The theorem also derives the rate of the convergence in (15): the rate is faster if (1) the area of each village expands more quickly as $N_v \to \infty$ under the increasing-domain assumption, and (2) the degree of spatial dependence of $\{u_{vh}\}$ is weaker. Note that the contraction condition on the limit (unconditional) operator implies existence and uniqueness of its fixed point, but we do not need to impose it on the operator defined via the conditional expectations; multiplicity of solutions to the finite-player fixed point equation is allowed for, and any of the solutions then converges to the limit belief. Existence of a solution to the finite-player equation can be checked relatively easily using other, less restrictive fixed point theorems.
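As an illustration of the limit fixed point discussed above, the following is a minimal sketch, not the paper's implementation. It assumes a logistic error distribution, a linear index $c_0 + c_1 p + c_2 y + \alpha\pi$, and simulated prices and incomes; the function names and all parameter values are hypothetical. With a logistic density bounded by 1/4, the map is a contraction whenever $|\alpha| < 4$, so iterations started from either end of $[0, 1]$ should reach the same constant belief.

```python
import numpy as np

def logistic_cdf(x):
    """CDF of the standard logistic distribution (an assumed choice for F)."""
    return 1.0 / (1.0 + np.exp(-x))

def limit_operator(pi, prices, incomes, c0, c1, c2, alpha):
    """Average choice probability when every household holds the constant belief pi."""
    return float(np.mean(logistic_cdf(c0 + c1 * prices + c2 * incomes + alpha * pi)))

def solve_limit_belief(prices, incomes, c0, c1, c2, alpha,
                       start=0.5, tol=1e-12, max_iter=1000):
    """Iterate pi <- limit_operator(pi) until successive values differ by less than tol."""
    pi = start
    for _ in range(max_iter):
        new_pi = limit_operator(pi, prices, incomes, c0, c1, c2, alpha)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical village: simulated prices and incomes, illustrative coefficients.
rng = np.random.default_rng(0)
prices = rng.uniform(0.0, 2.0, size=5_000)
incomes = rng.lognormal(mean=0.0, sigma=0.5, size=5_000)
c0, c1, c2, alpha = 0.5, -1.0, 0.2, 2.0  # alpha / 4 = 0.5 < 1, so the map contracts

pi_low = solve_limit_belief(prices, incomes, c0, c1, c2, alpha, start=0.0)
pi_high = solve_limit_belief(prices, incomes, c0, c1, c2, alpha, start=1.0)
print(pi_low, pi_high)
```

When the contraction bound fails (here, $|\alpha| \ge 4$), simple iteration need not converge and several fixed points may exist, which is why the sketch raises an error rather than returning the last iterate.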
In sum, this convergence result justifies the use of a Brock and Durlauf (2001a) type specification of constant and symmetric beliefs, even when unobserved heterogeneity exhibits spatial dependence.
This enables us to overcome complications in identification and inference posed by the dependence of beliefs on unobservables. In the next section, we present two estimators: one based on the Brock and Durlauf type specification, and another that takes into account the conditional-expectation feature of the beliefs as in (10). Then, we (a) show that the difference between the two estimators is asymptotically negligible, and (b) justify using the observable group average outcome as a regressor in an econometric specification of individual-level binary choice, as in Brock and Durlauf's estimation procedure.

Further Discussions and Comparison with Menzel (2016)
In our discussion of the spatial case, the sequence $\{u_{vh}\} = \{u_v(L_{vh})\}$, defined through two independent components, is called subordinated to the stochastic process $\{u_v(l)\}$ via the index variables $\{L_{vh}\}$. Subordination has been used previously in econometrics and statistics for modelling spatially dependent processes, cf. Andrews (2005, Section 7) and Lahiri and Zhu (2006). One implication of subordination is the so-called exchangeability property (see, e.g., Andrews, 2005), and if a sequence of random variables is exchangeable, it can be I.I.D. conditionally on some sigma algebra (often denoted by $\mathcal{F}_\infty$, the tail sigma algebra), which is known as de Finetti's theorem (see, e.g., Ch. 7 of Hall and Heyde, 1980). In our setting, this corresponds to the conditional I.I.D.-ness of $\{(W_{vh}, L_{vh}, u_v(L_{vh}))\}$, given a realization of the stochastic process $u_v(\cdot)$ (as well as that of the village-fixed unobserved heterogeneity), where $\mathcal{F}_\infty$ is set as the sigma algebra generated by the random function $u_v(\cdot)$.
Menzel (2016) has proposed a conditional inference method for games with many players under the exchangeability assumption. Indeed, Menzel (2016) and the present paper are similar in that both consider estimation of a game with the I.I.D. condition relaxed and under many-player asymptotics. However, there are some substantive differences between Menzel's (2016) framework and ours. Firstly, in his conditional inference scheme, the probability law recognized by players in a game is different from that used by researchers for inference purposes (i.e., the former is the unconditional law and the latter is the conditional law given $\mathcal{F}_\infty$), but they are identical in our setting.
This feature of non-identical laws causes difficulty in constructing a valid, interpretable moment restriction that guarantees consistent estimation. In the context of estimating structural economic models (including game theoretic models), such a restriction is usually presented as some exogeneity or exclusion condition that is derived by taking into account players' optimization behavior, i.e., the restriction is constructed from the players' perspective. This sort of construction may not give a valid moment restriction under the conditional inference scheme, where validity has to be judged from the researcher's perspective with the conditional law. To see this point, consider a simple binary choice example: $Y_i = 1\{X_i'\beta + \varepsilon_i \ge 0\}$, where $\varepsilon_i | X_i \sim N(0, 1)$ and $X_i$ is a covariate. In the standard case, the parameter $\beta$ can be estimated through $E[w(X_i)\{Y_i - \Phi(X_i'\beta)\}] = 0$, where $w(\cdot)$ is a weighting function and $\Phi$ is the distribution function of $N(0, 1)$. In contrast, under an inference scheme that exploits exchangeability or conditional I.I.D.-ness, the moment restriction has to hold conditionally on $\mathcal{F}_\infty$. The $\mathcal{F}_\infty$-conditional moment is in general hard to interpret, is not implied by the unconditional one, and it is not always obvious whether it holds. Indeed, Andrews (2005) discusses failure of consistency in a simple least squares regression case when the conditional law is used.
Another feature of Menzel (2016) that is distinct from ours is his focus on aggregate games. In his setting, players' utilities depend on the 'aggregate state', which is computed through the conditional expectation of others' actions (the function $G_{mn}$ defined in Eq. (2.1) on p. 311 of Menzel, 2016). This object is the counterpart of the beliefs in our setting, in that players' interactions take place only through the aggregate state (the beliefs, in our notation). Our beliefs for the spatially dependent case are defined in (10) and (11) through conditional expectations ($E[A_{vk} | I_{vh}]$) given all information $I_{vh}$ available to player $(v, h)$, i.e., both the individual variables $(W_{vh}, L_{vh}, u_v(L_{vh}))$ and the common village-level variable. On the other hand, a counterpart of Menzel's aggregate state in our context is given by (16), where the conditional expectation is computed given only the common village-level variable (called a public signal on p. 310 in Menzel, 2016, and denoted by $w_m$). The formulation (16)

Econometric Specification and Estimators
In this section, we lay out the econometric specification of our model, and describe estimation of the preference parameters (denoted by $\theta_1$), assuming that the observed sample is generated via the game introduced in the previous section and satisfies assumptions C1, C2, and C3-SD (the C3-IID case is simpler, and is nested within the C3-SD case; see more on this below). In particular, we define the true parameter via a conditional moment restriction that is derived from the specification of utility functions and the structure of the game in each of the $\bar v$ villages. As discussed above, the beliefs in the finite-player game possess a conditional expectation feature, so the conditional expectation used to define $\theta_1$ has a complicated form, and consequently the estimator based on it, denoted by $\hat\theta_1^{SD}$ below, is difficult to implement. Therefore, we construct another, computationally simpler estimator $\hat\theta_1$ based on a conditional expectation restriction derived from the limit model with the limit belief (derived in Theorem 1), and use it in our empirical application. We call $\hat\theta_1$ Brock-Durlauf type as it resembles the estimator used in Brock and Durlauf (2001a, 2007). Since the limit model is not the actual data generating process (DGP), our preferred estimator $\hat\theta_1$ is based on a mis-specified conditional moment restriction. However, we show that the estimator for the finite-player game with spatial dependence, $\hat\theta_1^{SD}$, which takes into account the conditional-expectation feature of the beliefs (as in (10)), shares the same limit as $\hat\theta_1$, which is based on the limit model, as $N \to \infty$, under the asymptotic scheme for spatial data introduced in the previous section and in Appendix A.2.1. In this sense, the two estimators, $\hat\theta_1^{SD}$ and $\hat\theta_1$, are asymptotically equivalent, and this result justifies the use of the simpler, Brock-Durlauf type estimation procedure. This result is formally proved in Theorem 2 below. The key challenge in this proof is showing uniform convergence of the fixed point solutions (beliefs) over the parameter space.
Forms of Beliefs under Spatial Dependence: To develop our estimators, we assume that the players'beliefs in (10) are symmetric: vh = v (W vh ; L vh ; u vh ; v ), i.e., the functional form of v ( ) is common for all the players in the same village v. 8 We note that given the (conditional) independence assumptions in C2 and C3-SD, the forms of the beliefs can be slightly simpli…ed. That is, the beliefs are a …xed point of the conditional expectation operator (13) with (W vh ; L vh ; u vh ; v ) being conditioning variables; however, we can show that and accordingly, the …xed point solution is a function of (L vh ; u vh ; v ) without W vh . 9 Thus, with 8 This can be justi…ed under C1, C2, and C3-SD when the mapping from a [0; 1]-valued function g ( ) to another is a contraction, where I vh = (W vh ; L vh ; u vh ; v ). This contraction condition for the functional mapping is analogous to that for the function m v (r) (de…ned in (8)) in Proposition 2. The proof of symmetric equilibrium beliefs v ( ) is similarly analogous to the proof of Proposition 2, and is omitted for brevity. We provide and discuss a su¢ cient condition for (17) to be a contraction in Appendix A.3. 9 We can prove (18) as follows: The sequence f(W vh ; L vh )g Nv h=1 is conditionally I.I.D. given v (by C2) and thus it is also conditionally independent of the stochastic process fuv (l)g given v (by C3-SD (ii)). Therefore, Since it also holds that (W vh ; L vh ) ? fuv (l)g j v , we apply the conditional independence relation (63) with Q = (W vh ; L vh ), R = (W vk ; L vk ), and S = fuv (l)g, to obtain where the derivations of the second and fourth lines have used the following conditional independence relation: for ran- with C = (L vh ; v ).
slight abuse of notation, we write the beliefs as in (19).
Linear Index Structure: We now specify the forms of the utility functions. With few large peer-groups (e.g. there are eleven large villages in our application dataset), one cannot consistently estimate the impact of the belief on the choice probability function nonparametrically, holding other regressors constant. 10 Accordingly, following Manski (1993) and Brock and Durlauf (2001a, 2007), we assume a linear index structure with $\eta = (\eta_0, \eta_1)'$, viz. that utilities are given by (20), where, corresponding to Assumptions 1-2, we assume that $\beta_0 > 0$, $\beta_1 > 0$, i.e., non-satiation in the numeraire; that $\beta_1$ need not equal $\beta_0$, i.e., income effects can be present; and that $\alpha_1 \ge 0 \ge \alpha_0$, i.e., compliance yields higher utility. These utilities can be viewed as expected utilities corresponding to Bayes-Nash equilibrium play in a game of incomplete information with many players, as outlined in Section 2 above. Below in Section 4, we will provide more details on the interpretation of the individual coefficients in (20) when discussing welfare calculations. These details do not play any role in the rest of this section.
Using (20) and the structure of the game, we obtain (21), which also defines the village-fixed effect. Recall that the probabilistic conditions in C2 and C3-SD are stated conditional on the (realized values of the) village-fixed unobserved heterogeneity, as in the econometric literature on fixed-effects panel data models. In this sense, we can treat the village-fixed effects as non-stochastic. Indeed, given many observations per village, their (realized) values can be estimated and are included in the set of parameters to be estimated. We discuss this point further in Section 4.4 below.
Econometric Specifications: We now present the alternative estimators. To do this, we need some more notation. Let $\theta_1 = (c', \alpha)'$ denote a (preference) parameter vector, where $c = (c_1, c_2)'$ is the coefficient vector corresponding to $W_{vh} = (P_{vh}, Y_{vh})'$. In the rest of this Section 3, we assume that the village-fixed parameters of all $\bar v$ villages are known, which is for notational simplicity; this assumption does not change any substantive arguments on the convergence of the estimators.
10 This is because the belief is constant within a village in the (conditionally) I.I.D. case, and this constancy also holds for the limit model in the spatial case. In particular, the fixed point constraint does not help because of dimensionality problems. Indeed, in the fixed point condition $\pi = \int q_1(p, y, \pi)\, dF_{P,Y}(p, y)$, where $F_{P,Y}(p, y)$, the joint CDF of $(P, Y)$, is identified, the unknown function $q_1(p, y, \pi)$ has higher dimension than the observable $F_{P,Y}(p, y)$.
We discuss identification/estimation schemes for these parameters below and provide a complete proof for the case when the village-fixed parameters are estimated using one of the identification schemes (e.g. the homogeneity assumption) in Appendix A.4. Given (19) and (21), we can write the model as in (22). In order to incorporate the fixed-point feature, we assume a parametric model of spatial dependence for the stochastic process $\{\varepsilon_{vh}\}$, which is required to compute the functional equations defining the beliefs.
Corresponding to the de…nition of given " v (l) = e, parametrized by a …nite dimensional parameter 2 2 2 , and the (pseudo) true value is denoted by 2 . We also write the marginal CDF of " v (l) by H (e) and its probability density h (e). In the sequel, we also write the marginal CDF of " v (l) as F " (e), and thus H (e) = 1 F " ( e). The joint distribution function given the location indicesl and l. 11 To develop estimators that incorporate the …xed point restriction, de…ne the following functional operator based on H: for v = 1; : : : ; v, where F ? v;Nv is a functional operator from a [0; 1]-valued function g = g (l; e; 1 ; 2 ) to another function F ? v;Nv [g], and F v W L (w; l) is the joint CDF of (W vh ; L vh ). We provide su¢ cient conditions for this F ? v;Nv to be a contraction in Appendix A.3. Given the above set-up, de…ne the model to be estimated as: where 1 (= (c 0 ; ) 0 ) and 2 denote the true parameters and ? v (L vh ; " vh ; 1 ; 2 ) is a solution to the functional equation de…ned through the operator (23) (for each ( 1 ; 2 ) given): and C1, C2, C3-SD, and some regularity conditions (provided below) are satis…ed. Henceforth, the model (24) will be assumed to be the DGP of observable variables f(A vh ; W vh ; L vh )g Nv h=1 (v = 1; : : : ; v).

Econometric Estimators
De…nition of the Estimand: Suppose for now that the true parameter 2 for the spatial dependence is given. Then, based on (22), we de…ne the true preference parameter 1 (i.e., our estimand) as the solution to the conditional moment restriction: where C v is the conditional choice probability function 12 : Practical Estimator Based on the Limit Model: Given our parametric set-up, we can in principle compute an empirical analogue of (27) by solving an empirical version of the …xed point equation (25). This estimator, denoted below by^ SD 1 , is di¢ cult to compute in practice. Therefore, we consider an alternative estimator based on the simpler conditional moment condition: This is derived from the limit model with the limit beliefs v , which do not depend on the unobserved heterogeneity and other (v; h) speci…c variables. Indeed, the limit model is not the true DGP, and thus this (28) is mis-speci…ed under C3-SD (it is correctly speci…ed under C3-IID). Nonetheless, we show that the estimator based on (28), which we eventually use in our empirical application, can be justi…ed in an asymptotic sense. This simpler estimator is given by:  (24)) and the simpler one^ 1 have the same limit. Potential Estimator for the Finite-Player Game: We now formally introduce the computationally di¢ cult potential estimator^ SD 1 based on (26). It is de…ned through the following objective function: io whereĈ is an estimate of the conditional choice probability that explicitly incorporate conditionalbelief and …xed-point features: and^ ? v (L vh ; e; 1 ; 2 ) is an estimator of the belief and is de…ned as a solution to the following functional equation for each ( 1 ; 2 ): (23)) in which the true F v W;L is replaced byF v W;L : This^ ? v is an empirical version of a solution to (23). A notable feature of this is that it is a function of the unobserved heterogeneity (represented by the variable e). Due to this dependence on e, computation ofĈ in (30) andF ? 
v;Nv in (32) is di¢ cult, and requires numerical integration of the indicator functions; furthermore, …nding the …xed point^ ? v in the functional equation (31) will also require some numerical procedure.
Here, we do not pursue how to identify and estimate the spatial dependence parameter $\theta_2$ (since our empirical application is in any case not based on $\hat L^{SD}(\theta_1, \theta_2)$), but suppose that some reasonable preliminary estimator $\hat\theta_2$, which converges in probability to the (pseudo) true value of $\theta_2$, is available, and define our estimator as $\hat\theta_1^{SD} = \operatorname{argmax}_{\theta_1 \in \Theta_1} \hat L^{SD}(\theta_1, \hat\theta_2)$. Note that, given this form of $\hat\theta_1^{SD}$, we can again interpret this estimator as a moment estimator that solves a sample moment equation with some appropriate choice of the weight $\omega(W_{vh}; \theta_1, \hat\theta_2)$. This may be viewed as a sample moment condition based on the population one in (26). The corresponding estimation procedure would be similar to the nested fixed-point algorithm, as in Rust (1987).

Convergence of the Estimators
We now show that $\hat\theta_1^{SD}$, based on the correct conditional moment restriction (26), and $\hat\theta_1$, based on the mis-specified one (28), are asymptotically equivalent. That is, if $\hat\theta_1$ is consistent, so is $\hat\theta_1^{SD}$, and vice versa; in the proof, we show that both estimators are consistent for the $\theta_1$ that satisfies (93). This is formally stated in the following theorem:
Theorem 2 Suppose that C1, C2, C3-SD, and Assumptions 4, 5, 6, 7, and 8 hold. Then $\|\hat\theta_1^{SD} - \hat\theta_1\| \xrightarrow{p} 0$.
The formal proof is provided in Appendix A.4; the outline is as follows. We start by introducing another, intermediate estimator that is based on constant beliefs but solves the Fixed Point problem of the Limit model, $\hat\theta_1^{FPL} = \operatorname{argmax}_{\theta_1 \in \Theta_1} \hat L^{FPL}(\theta_1)$, where the belief entering $\hat L^{FPL}$ is, for each fixed $\theta_1$, a solution to the fixed point equation (33); the population version of (33) replaces $\hat F^v_W$ by the true CDF $F^v_W$ of $W_{vh}$. This $\hat\theta_1^{FPL}$ is constructed based on the limit model (with constant beliefs), but it explicitly solves the fixed point restriction (33) (unlike $\hat\theta_1$, which is derived from the Brock-Durlauf type moment restriction (28)). $\hat\theta_1^{FPL}$ may be interpreted as a moment estimator that is derived from a conditional moment restriction, 13 and can also be defined as solving $\hat M^{FPL}(\theta_1) = 0$, given an appropriate choice of the weight. Note that this restriction is also a mis-specified one.
We show the convergence of $\|\hat\theta_1^{SD} - \hat\theta_1\|$ in two steps. In the first step, we show that $\hat\theta_1^{FPL}$ and $\hat\theta_1$ have the same limit, which is the solution to a different conditional moment restriction (see (93) in Appendix A.4). In the second step, we show that $\hat L^{SD}(\theta_1, \hat\theta_2)$ is asymptotically well approximated by $\hat L^{FPL}(\theta_1)$ uniformly over $\theta_1 \in \Theta_1$ for any sequence of $\hat\theta_2$ (as $N \to \infty$).
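To make the Brock-Durlauf type procedure concrete, here is a minimal simulation sketch under strong simplifying assumptions that are not taken from the paper: logistic errors, I.I.D. unobservables within villages, a common intercept in place of village-fixed effects, and hypothetical parameter values and function names. Each village's equilibrium belief is obtained from the fixed point of the limit model, and the preference parameters are then estimated by a logit of individual choices on price, income, and the observed village average outcome.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

def equilibrium_belief(p, y, theta, tol=1e-12, max_iter=1000):
    """Constant belief solving pi = mean F(c0 + c1*p + c2*y + alpha*pi)."""
    c0, c1, c2, alpha = theta
    pi = 0.5
    for _ in range(max_iter):
        new_pi = logistic_cdf(c0 + c1 * p + c2 * y + alpha * pi).mean()
        if abs(new_pi - pi) < tol:
            break
        pi = new_pi
    return float(new_pi)

def fit_logit(X, a, max_iter=100, tol=1e-10):
    """Logit maximum likelihood via Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = logistic_cdf(X @ theta)
        score = X.T @ (a - prob)
        info = (X * (prob * (1.0 - prob))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

rng = np.random.default_rng(1)
true_theta = np.array([0.5, -1.0, 0.2, 1.5])  # hypothetical (c0, c1, c2, alpha)
n_villages, n_households = 100, 2000          # hypothetical sample design

rows, actions = [], []
for _ in range(n_villages):
    mean_price = rng.uniform(0.5, 2.5)        # villages differ in price levels
    p = mean_price + rng.uniform(-0.5, 0.5, size=n_households)
    y = rng.lognormal(0.0, 0.5, size=n_households)
    pi_v = equilibrium_belief(p, y, true_theta)
    index = true_theta[0] + true_theta[1] * p + true_theta[2] * y + true_theta[3] * pi_v
    a = (rng.uniform(size=n_households) < logistic_cdf(index)).astype(float)
    # Brock-Durlauf type regressor: the observed village average outcome.
    group_mean = np.full(n_households, a.mean())
    rows.append(np.column_stack([np.ones(n_households), p, y, group_mean]))
    actions.append(a)

X = np.vstack(rows)
A = np.concatenate(actions)
theta_hat = fit_logit(X, A)
print(np.round(theta_hat, 2))
```

In this design the interaction coefficient is learned from cross-village variation in the average outcome, so with only a handful of villages the corresponding estimate would be expected to be much noisier than in this simulated example.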

Welfare Analysis
We now move on to the second part of the paper, which concerns welfare analysis of policy interventions under spillovers. Since we assume spillovers are restricted to the village where households reside, any welfare effect of a policy intervention can be analyzed village by village. So for economy of notation, we drop the $(v, h)$ subscripts except when we account explicitly for village-fixed effects during estimation. Also, we use the same notation to denote both individual beliefs entering the utilities and the equilibrium adoption rate. In order to conduct welfare analysis, we impose two restrictions on the utilities. The first (Assumption 1) is that utilities are strictly increasing in the numeraire. The second (Assumption 2) is that the utility from buying is continuous and weakly increasing in $\pi$, while the utility from not buying is continuous and weakly decreasing in $\pi$, i.e. conforming yields higher utility than not conforming for each individual.
Define $q_1(p, y, \pi)$ to be the structural probability (i.e. Average Structural Function or ASF) of a household choosing 1 when it faces a price of $p$, and has income $y$ and belief $\pi$:
$$q_1(p, y, \pi) = \int 1\{U_1(y - p, \pi, \eta) \ge U_0(y, \pi, \eta)\}\, dF(\eta), \qquad (35)$$
and let $q_0(p, y, \pi) = 1 - q_1(p, y, \pi)$, where $F$ is the CDF of $\eta$.
Policy Intervention: Start with a situation where the price of alternative 1 is $p_0$ and the value of $\pi$ is $\pi_0$. Then suppose a price subsidy is introduced such that individuals with income less than an income threshold $\tau$ become eligible to buy the product at price $p_1 < p_0$. This policy will alter the equilibrium adoption rate; suppose the new equilibrium adoption rate changes to $\pi_1$. How the counterfactual $\pi_1$ and $\pi_0$ are calculated will be described below. For given values of $\pi_0$ and $\pi_1$, we now derive expressions for welfare resulting from the intervention. By "welfare" we mean the compensating variation (CV), viz. what hypothetical income compensation would restore the post-change indirect utility for an individual to its pre-change level. For a subsidy-eligible individual, for any potential value of $\pi_1$ corresponding to the new equilibrium, the individual compensating variation is the solution $S$ to the equation
$$\max\{U_1(y + S - p_1, \pi_1, \eta),\ U_0(y + S, \pi_1, \eta)\} = \max\{U_1(y - p_0, \pi_0, \eta),\ U_0(y, \pi_0, \eta)\}, \qquad (36)$$
whereas for a subsidy-ineligible individual, it is the solution $S$ to
$$\max\{U_1(y + S - p_0, \pi_1, \eta),\ U_0(y + S, \pi_1, \eta)\} = \max\{U_1(y - p_0, \pi_0, \eta),\ U_0(y, \pi_0, \eta)\}. \qquad (37)$$
Note that we do not take into account peer-effects again in defining the CV because the income compensation underlying the definition of CV is hypothetical. So the impact of actual income compensation on neighboring households is irrelevant. Since the CV depends on the unobservable $\eta$, the same price change will produce a distribution of welfare effects across individuals; we are interested in calculating that distribution and its functionals such as mean welfare.
Intuitively, this condition strengthens Assumption 1 by requiring that utilities can be increased and decreased sufficiently by varying the quantity of the numeraire. Existence then follows via the intermediate value theorem; under an index structure, existence is shown explicitly below. Finally, uniqueness of the solution to (36) and (37) follows from strict monotonicity in the numeraire: since the maximum of two strictly increasing functions is strictly increasing, the LHS of (36) and (37) are strictly increasing in $S$, implying a unique solution.
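The existence and uniqueness argument above suggests a simple way to compute an individual CV numerically: because the left-hand side is strictly increasing in $S$, bisection finds the unique root. The sketch below is illustrative only. It assumes linear-index utilities of the form $\delta_j + \beta_j \times \text{numeraire} + \alpha_j \pi + \eta_j$, and every parameter value, dictionary key, and function name is hypothetical.

```python
def max_utility(income, price, pi, eta1, eta0, par):
    """Indirect utility: the better of buying (1) and not buying (0)."""
    u1 = par["delta1"] + par["beta1"] * (income - price) + par["alpha1"] * pi + eta1
    u0 = par["delta0"] + par["beta0"] * income + par["alpha0"] * pi + eta0
    return max(u1, u0)

def compensating_variation(y, p0, p1, pi0, pi1, eta1, eta0, par,
                           lo=-1e3, hi=1e3, n_iter=200):
    """Solve max-utility(post, income y + S) = max-utility(pre, income y) for S."""
    target = max_utility(y, p0, pi0, eta1, eta0, par)

    def gap(s):
        return max_utility(y + s, p1, pi1, eta1, eta0, par) - target

    if not (gap(lo) < 0.0 < gap(hi)):
        raise ValueError("bracket [lo, hi] does not contain the root")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:   # gap is strictly increasing in s
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameter values for a subsidy-eligible household.
par = {"delta1": 0.5, "delta0": 0.0, "beta1": 1.0, "beta0": 0.8,
       "alpha1": 1.2, "alpha0": -0.3}
y, p0, p1, pi0, pi1 = 5.0, 2.0, 1.0, 0.3, 0.5

cv_buyer = compensating_variation(y, p0, p1, pi0, pi1, eta1=10.0, eta0=0.0, par=par)
cv_switcher = compensating_variation(y, p0, p1, pi0, pi1, eta1=-1.0, eta0=0.0, par=par)
cv_nonbuyer = compensating_variation(y, p0, p1, pi0, pi1, eta1=-10.0, eta0=0.0, par=par)
print(cv_buyer, cv_switcher, cv_nonbuyer)
```

In this example a household that always buys has a negative CV (a welfare gain), one that never buys has a positive CV (a loss from the higher adoption rate when $\alpha_0 < 0$), and a household that switches to buying falls in between.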
Welfare with Index Structure: In accordance with the literature on social interactions (see Section 3 above), from now on we maintain the single-index structure introduced in (20):
$$U_1(y - p, \pi, \eta) = \delta_1 + \beta_1(y - p) + \alpha_1\pi + \eta_1, \qquad U_0(y, \pi, \eta) = \delta_0 + \beta_0 y + \alpha_0\pi + \eta_0,$$
with $\beta_0 > 0$, $\beta_1 > 0$, and $\alpha_1 \ge 0 \ge \alpha_0$. 14 In our empirical setting of anti-malarial bednet adoption, there are multiple potential sources of interactions (i.e. $\alpha_1, \alpha_0 \neq 0$). The first is a pure preference for conforming; the second is increased awareness of the benefits of a bednet when more villagers use it; the third is a perceived negative health externality. The medical literature suggests that the technological health externality is positive, i.e. the more people are protected, the lower is the malaria burden, but the perceived health externality is likely to be negative if households correctly believe that other households' bednet use deflects mosquitoes to unprotected households, but ignore the fact that those deflected mosquitoes are less likely to carry the parasite. Indeed, the implications for adoption are different: under the positive health externality, one would expect free-riding, hence a negative effect of others' adoption on own adoption; under the negative health externality, the correlation would be positive.
In particular, let $\alpha_p > 0$ denote the conforming plus learning effect, and $\alpha_H$ denote the health externality. Then it is reasonable to assume that $\alpha_1 \equiv \alpha_p \ge 0$ and $\alpha_0 = \alpha_H - \alpha_p \le 0$. In other words, the compliance motive and learning effect together are equal in magnitude but opposite in sign between buying and not buying. Further, if a household uses an ITN, then there is no health externality from the neighborhood adoption rate (since the household is protected anyway), but if it does not adopt, then there is a net health externality effect $\alpha_H$ from neighborhood use, which makes the overall effect $\alpha_0 = \alpha_H - \alpha_p$, and $\alpha_1 \neq -\alpha_0$ in general. 15 In the context of ITNs, the technological effects are unlikely to be large enough and/or the villagers are unlikely to be sophisticated enough to understand the potential deterrent effects of ITNs. Therefore, we assume from now on that the perceived health externality is non-positive, and thus $\alpha_1 \ge 0 \ge \alpha_0$. Given the linear index specification, the structural choice probability for alternative 1 at $(p, y, \pi)$ is given by
$$q_1(p, y, \pi) = F(c_0 + c_1 p + c_2 y + \alpha\pi), \qquad c_0 \equiv \delta_1 - \delta_0,\ c_1 \equiv -\beta_1,\ c_2 \equiv \beta_1 - \beta_0,\ \alpha \equiv \alpha_1 - \alpha_0, \qquad (38)$$
where $F(\cdot)$ denotes the marginal distribution function of $-(\eta_1 - \eta_0)$. It is known from Brock and Durlauf (2007) that the structural choice probabilities are identified without knowledge of the probability distribution of $\varepsilon = -(\eta_1 - \eta_0)$. In the application, we will consider various ways to estimate the structural choice probabilities, including standard Logit and Klein and Spady's distribution-free MLE. One can also use other semiparametric methods, e.g. Bhattacharya (2008) or Han (1987), that require neither specification of error distributions nor subjective bandwidth choice.
14 We could also allow for concave income effects through a nonlinear specification of the income terms, but we wish to keep the utility formulation as simple as possible to highlight the complications in welfare calculations even in the simplest linear utility specification.
15 An analogous asymmetry is also likely in the school voucher example mentioned in the introduction if the voucher-led 'brain-drain' leads to utility gains and losses of different amounts, e.g. if better teaching resources in the high-achieving school substitute for, or complement, peer-effects in a way that is not possible in the resource-poor local school.
The condition $\alpha_1 \ge 0 \ge \alpha_0$ makes the model different from standard demand models for binary choice. In the standard case, for the so-called "outside option", i.e. not buying, the utility is normalized to zero. In a social spillover setting, this cannot be done because that utility depends on the aggregate purchase rate $\pi$. As we will see below, in welfare evaluations of a subsidy, $\alpha_1$ and $\alpha_0$ appear separately in the expressions for welfare distributions, but cannot be separately identified from demand data, which can only identify $\alpha \equiv \alpha_1 - \alpha_0$.
Toward obtaining the welfare results, consider a hypothetical price intervention moving from a situation where everyone faces a price of $p_0$ to one where people with income less than an eligibility threshold $\tau$ are given the option to buy at the subsidized price $p_1 < p_0$. This policy will alter the equilibrium take-up rate. Assume that the equilibrium take-up rate changes from $\pi_0$ to $\pi_1$. We will describe calculation of $\pi_0$ and $\pi_1$ later. For given values of $\pi_0$ and $\pi_1$, the welfare effect of the policy change can be calculated as described below. We first lay out the results in detail for the case where $\pi_1 > \pi_0$, which corresponds to our application. In the appendix we present results for a hypothetical case where $\pi_1 < \pi_0$ (which may happen if there are multiple equilibria before and after the intervention). For the rest of this section, we assume that $\pi_1 > \pi_0$.
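The identification problem described above can be illustrated numerically. The following sketch assumes linear-index utilities with logistic errors and uses hypothetical parameter values and function names; it is not the paper's code. It compares two mechanisms with the same interaction coefficient $\alpha = \alpha_1 - \alpha_0$: pure conforming ($\alpha_1 = \alpha$, $\alpha_0 = 0$) and a pure negative externality on non-buyers ($\alpha_1 = 0$, $\alpha_0 = -\alpha$). Choice probabilities coincide, while the CV of an eligible household that buys in both situations, and of one that never buys, differ. The two CV formulas are obtained by equating, respectively, the buying utilities and the not-buying utilities before and after the policy.

```python
import math

def choice_prob(p, y, pi, delta1, delta0, beta1, beta0, alpha1, alpha0):
    """Logit choice probability; depends on alpha1 and alpha0 only through their difference."""
    index = (delta1 - delta0) - beta1 * p + (beta1 - beta0) * y + (alpha1 - alpha0) * pi
    return 1.0 / (1.0 + math.exp(-index))

def cv_always_buyer(p0, p1, pi0, pi1, beta1, alpha1):
    """CV of an eligible household that buys both before and after the subsidy."""
    return p1 - p0 - (alpha1 / beta1) * (pi1 - pi0)

def cv_never_buyer(pi0, pi1, beta0, alpha0):
    """CV of a household that buys neither before nor after the subsidy."""
    return (alpha0 / beta0) * (pi0 - pi1)

# Hypothetical common parameters; only the split of alpha = 1.5 differs.
common = dict(delta1=0.5, delta0=0.0, beta1=1.0, beta0=0.8)
p0, p1, y, pi0, pi1 = 2.0, 1.0, 1.0, 0.3, 0.5

conforming = dict(alpha1=1.5, alpha0=0.0)    # all spillover via conforming
externality = dict(alpha1=0.0, alpha0=-1.5)  # all spillover via perceived health risk

prob_a = choice_prob(p1, y, pi1, **common, **conforming)
prob_b = choice_prob(p1, y, pi1, **common, **externality)

cv_buyer_a = cv_always_buyer(p0, p1, pi0, pi1, common["beta1"], conforming["alpha1"])
cv_buyer_b = cv_always_buyer(p0, p1, pi0, pi1, common["beta1"], externality["alpha1"])
cv_non_a = cv_never_buyer(pi0, pi1, common["beta0"], conforming["alpha0"])
cv_non_b = cv_never_buyer(pi0, pi1, common["beta0"], externality["alpha0"])

print(prob_a == prob_b)        # demand is the same under both mechanisms
print(cv_buyer_a, cv_buyer_b)  # welfare of buyers differs
print(cv_non_a, cv_non_b)      # welfare of non-buyers differs
```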

Welfare for Eligibles
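The compensating variation for a subsidy-eligible household is given by the solution $S$ to
$$\max\{\delta_1 + \beta_1(y + S - p_1) + \alpha_1\pi_1 + \eta_1,\ \delta_0 + \beta_0(y + S) + \alpha_0\pi_1 + \eta_0\} = \max\{\delta_1 + \beta_1(y - p_0) + \alpha_1\pi_0 + \eta_1,\ \delta_0 + \beta_0 y + \alpha_0\pi_0 + \eta_0\}. \qquad (39)$$
Since the LHS is strictly increasing in $S$, the condition $S \le a$ is equivalent to
$$\max\{\delta_1 + \beta_1(y + a - p_1) + \alpha_1\pi_1 + \eta_1,\ \delta_0 + \beta_0(y + a) + \alpha_0\pi_1 + \eta_0\} \ge \max\{\delta_1 + \beta_1(y - p_0) + \alpha_1\pi_0 + \eta_1,\ \delta_0 + \beta_0 y + \alpha_0\pi_0 + \eta_0\}. \qquad (40)$$
If $a < p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0) < 0$, then each term on the LHS of (40) is smaller than the corresponding term on the RHS. If $a \ge \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1) \ge 0$, then each term on the LHS is at least as large as the corresponding term on the RHS. This gives us the support of $S$:
$$S \in \left[p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0),\ \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1)\right].$$
Remark 1 Note that the above reasoning also helps establish existence of a solution to (39). We know from above that for $S < p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0)$, the LHS of (39) is strictly smaller than the RHS, and for $S \ge \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1)$, the LHS of (39) is no smaller than the RHS. By continuity and the intermediate value theorem, it follows that there must be at least one $S$ where (39) holds with equality.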
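Back to calculating the CDF, now consider the intermediate case where
$$p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0) \le a < \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1).$$
In this case, the first term on the LHS of (40) is at least as large as the first term on the RHS for all $\eta_1$, and the second term on the LHS of (40) is smaller than the second term on the RHS for all $\eta_0$, and thus (40) is equivalent to
$$\delta_1 + \beta_1(y + a - p_1) + \alpha_1\pi_1 + \eta_1 \ge \delta_0 + \beta_0 y + \alpha_0\pi_0 + \eta_0. \qquad (41)$$
For any given $\alpha_1$, using $\alpha_0 = \alpha_1 - \alpha$, the probability of (41) reduces to
$$F\big(c_0 + c_1(p_1 - a) + c_2 y + \alpha\pi_0 + \alpha_1(\pi_1 - \pi_0)\big). \qquad (42)$$
The intercept $c_0$, the slopes $c_1$, $c_2$ and $\alpha$ are all identified from conditional choice probabilities; but $\alpha_1$ is not identified, and therefore (42) is not point-identified from the structural choice probabilities. However, since $\alpha_1 \in [0, \alpha]$, for each feasible value of $\alpha_1 \in [0, \alpha]$, we can compute a feasible value of (42), giving us bounds on the welfare distribution.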
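Note also that the thresholds of $a$ at which the CDF expression changes are not point-identified either, for the same reason. However, since $\pi_1 - \pi_0 > 0$ and $\beta_0 > 0$, $\beta_1 > 0$, the interval $\left[p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0),\ \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1)\right]$ will translate to the left as $\alpha_1$ varies from 0 to $\alpha$.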
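Putting all of this together, we get the following result:
Theorem 3 If Assumptions 1, 2, and the linear index structure hold and $\pi_1 > \pi_0$, then given $\alpha_1 \in [0, \alpha]$, the distribution of the compensating variation for eligibles is given by
$$\Pr(S \le a) = \begin{cases} 0, & a < p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0), \\ F\big(c_0 + c_1(p_1 - a) + c_2 y + \alpha\pi_0 + \alpha_1(\pi_1 - \pi_0)\big), & p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0) \le a < \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1), \\ 1, & a \ge \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1), \end{cases} \qquad (43)$$
where $\alpha_0 = \alpha_1 - \alpha$, $\beta_1 = -c_1$ and $\beta_0 = -(c_1 + c_2)$.
Remark 2 Note that the above theorem continues to hold even if the subsidy is universal; we have not used the means-tested nature of the subsidy to derive the result.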
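Mean welfare: From (43), mean welfare loss is given by
$$E[S] = \int_0^{\frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1)} \big[1 - \Pr(S \le a)\big]\, da - \int_{p_1 - p_0 - \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0)}^{0} \Pr(S \le a)\, da, \qquad (44)$$
with $\Pr(S \le a)$ given by the middle branch of (43). Given $\alpha$, the welfare gain in expression (44) is increasing in $\alpha_1$; i.e., the welfare gain is largest in absolute value when $\alpha_1 = \alpha$ and $\alpha_0 = 0$, and smallest when $\alpha_1 = 0$ and $\alpha_0 = -\alpha$. The converse holds for the welfare loss. Intuitively, if there is no negative externality from increased $\pi$ on non-purchasers, then they do not suffer any welfare loss, but purchasers have a welfare gain from both lower price and higher $\pi$. Conversely, if all the spillover is negative, then purchasers still get a welfare gain via price reduction, but non-purchasers suffer welfare loss due to increased $\pi$. Also, note that under quasilinear utilities, where income effects are absent, $y$ drops out of the above expressions, but the same identification problem remains, since $\alpha_1$ does not disappear. Changing variables to $p = p_1 - a$, one may rewrite (44), in terms of the mean welfare gain $-E[S]$, as
$$-E[S] = \int_{p_1}^{p_0 + \frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0)} q_1\Big(p - \tfrac{\alpha_1}{\beta_1}(\pi_1 - \pi_0), y, \pi_0\Big)\, dp - \int_{p_1 + \frac{\alpha_0}{\beta_0}(\pi_1 - \pi_0)}^{p_1} \Big[1 - q_1\Big(p - \tfrac{\alpha_1}{\beta_1}(\pi_1 - \pi_0), y, \pi_0\Big)\Big]\, dp. \qquad (45)$$
Note that if $\alpha_1 = 0$, then the first term is the usual consumer surplus capturing the effect of price reduction on consumer welfare; for a positive $\alpha_1$, the term $\frac{\alpha_1}{\beta_1}(\pi_1 - \pi_0)$ yields the additional effect arising via the conforming channel. Also, if $\alpha_1 = 0$, then the second term, i.e. the welfare loss from not buying, is the largest (given $\alpha$): this corresponds to the case where all of $\alpha$ is due to the negative externality.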
The second term in (45), which represents the welfare change caused solely via spillover and no price change, is still expressed as an integral with respect to price. This is a consequence of the index structure, which enables us to express this welfare loss in terms of foregone utility from an equivalent price change. To see this, recall eq. (39), which is of the form (46). From Bhattacharya (2015), this is exactly the form for the compensating variation $S'$ in a binary choice model without spillover when income is $y'$ and price changes from $p_0'$ to $p_1'$. 16
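The bounds on mean welfare for eligibles can be traced out numerically by varying $\alpha_1$ over $[0, \alpha]$. The sketch below is a rough illustration under assumptions that are not from the paper: a logistic $F$, hypothetical values for $(c_0, c_1, c_2, \alpha)$, $y$, prices and adoption rates, and a CV distribution of the form $\Pr(S \le a) = F(c_0 + c_1(p_1 - a) + c_2 y + \alpha\pi_0 + \alpha_1(\pi_1 - \pi_0))$ on the interior of the support, as implied by the linear index structure. The mean CV is computed as the integral of $1 - \Pr(S \le a)$ over the positive part of the support minus the integral of $\Pr(S \le a)$ over the negative part.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

def trapezoid(values, grid):
    """Trapezoidal rule on a (possibly degenerate) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def mean_cv_eligible(alpha1, c0, c1, c2, alpha, y, p0, p1, pi0, pi1, n_grid=20_001):
    """Mean compensating variation of eligibles for a candidate alpha1 in [0, alpha]."""
    beta1 = -c1                # price coefficient gives beta1
    beta0 = -(c1 + c2)         # since c2 = beta1 - beta0
    alpha0 = alpha1 - alpha    # since alpha = alpha1 - alpha0
    d_pi = pi1 - pi0
    lower = p1 - p0 - (alpha1 / beta1) * d_pi   # CV of households that always buy
    upper = (alpha0 / beta0) * (pi0 - pi1)      # CV of households that never buy

    def cdf(a):
        return logistic_cdf(c0 + c1 * (p1 - a) + c2 * y + alpha * pi0 + alpha1 * d_pi)

    a_neg = np.linspace(lower, 0.0, n_grid)
    a_pos = np.linspace(0.0, upper, n_grid)
    return trapezoid(1.0 - cdf(a_pos), a_pos) - trapezoid(cdf(a_neg), a_neg)

# Hypothetical identified quantities.
c0, c1, c2, alpha = 0.5, -1.0, 0.2, 1.5
y, p0, p1, pi0, pi1 = 1.0, 2.0, 1.0, 0.3, 0.5

alpha1_grid = np.linspace(0.0, alpha, 7)
means = np.array([mean_cv_eligible(a1, c0, c1, c2, alpha, y, p0, p1, pi0, pi1)
                  for a1 in alpha1_grid])
gain_bounds = (-means[0], -means[-1])   # smallest and largest mean welfare gain
print(np.round(means, 3), np.round(gain_bounds, 3))
```

Since the mean CV is monotone in $\alpha_1$ here, the two endpoints $\alpha_1 = 0$ and $\alpha_1 = \alpha$ deliver the bounds; the intermediate grid points are included only to make the monotonicity visible.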

Corollary 1
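In the special case of symmetric interactions, i.e. where $\alpha_1 = -\alpha_0$ in (20) (e.g. if there is no health externality in the health-good example), we get that $\alpha_1 = \alpha/2 \ge 0$ and $\alpha_0 = -\alpha/2$, and from (45) mean welfare equals:
$$\int_{p_1}^{p_0 + \frac{\alpha}{2\beta_1}(\pi_1 - \pi_0)} q_1\Big(p - \tfrac{\alpha}{2\beta_1}(\pi_1 - \pi_0), y, \pi_0\Big)\, dp - \int_{p_1 - \frac{\alpha}{2\beta_0}(\pi_1 - \pi_0)}^{p_1} \Big[1 - q_1\Big(p - \tfrac{\alpha}{2\beta_1}(\pi_1 - \pi_0), y, \pi_0\Big)\Big]\, dp.$$
If $\alpha_0 = 0$ and $\alpha = \alpha_1$, i.e. all spillover is via conforming, average welfare is given by
$$\int_{p_1}^{p_0 + \frac{\alpha}{\beta_1}(\pi_1 - \pi_0)} q_1\Big(p - \tfrac{\alpha}{\beta_1}(\pi_1 - \pi_0), y, \pi_0\Big)\, dp; \qquad (47)$$
if, on the other hand, all spillover is due to perceived health risk, i.e. $\alpha = -\alpha_0$ and $\alpha_1 = 0$, then average welfare is given by
$$\int_{p_1}^{p_0} q_1(p, y, \pi_0)\, dp - \int_{p_1 - \frac{\alpha}{\beta_0}(\pi_1 - \pi_0)}^{p_1} \big[1 - q_1(p, y, \pi_0)\big]\, dp. \qquad (48)$$
Equations (47) and (48) correspond to the upper and lower bounds, respectively, of the overall welfare gain for eligibles. 17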

Welfare for Ineligibles
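Welfare for ineligibles is defined as the solution $S$ to the equation
$$\max\{U_1(y + S - p_0, \pi_1, \eta),\ U_0(y + S, \pi_1, \eta)\} = \max\{U_1(y - p_0, \pi_0, \eta),\ U_0(y, \pi_0, \eta)\}.$$
Using the index structure, $S \le a$ is therefore equivalent to
$$\max\{\delta_1 + \beta_1(y + a - p_0) + \alpha_1\pi_1 + \eta_1,\ \delta_0 + \beta_0(y + a) + \alpha_0\pi_1 + \eta_0\} \ge \max\{\delta_1 + \beta_1(y - p_0) + \alpha_1\pi_0 + \eta_1,\ \delta_0 + \beta_0 y + \alpha_0\pi_0 + \eta_0\}. \qquad (49)$$
If $a < \frac{\alpha_1}{\beta_1}(\pi_0 - \pi_1) \le 0$, then each term on the LHS is smaller than the corresponding term on the RHS for each realization of the $\eta$s. So the probability is 0. Similarly, for $a \ge \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1) \ge 0$, each term on the LHS is at least as large, and thus the probability is 1. In the intermediate range, $a \in \left[\frac{\alpha_1}{\beta_1}(\pi_0 - \pi_1),\ \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1)\right)$, the first term on the LHS is at least as large as the first term on the RHS for each $\eta_1$, and the second term on the LHS is smaller than the second term on the RHS for each $\eta_0$. Therefore, (49) is equivalent to
$$\delta_1 + \beta_1(y + a - p_0) + \alpha_1\pi_1 + \eta_1 \ge \delta_0 + \beta_0 y + \alpha_0\pi_0 + \eta_0.$$
Putting all of this together, we have the following result:
Theorem 4 If Assumptions 1, 2, and the linear index structure hold and $\pi_1 > \pi_0$, then for each $\alpha_1 \in [0, \alpha]$, the distribution of the compensating variation for ineligibles is given by
$$\Pr(S \le a) = \begin{cases} 0, & a < \frac{\alpha_1}{\beta_1}(\pi_0 - \pi_1), \\ F\big(c_0 + c_1(p_0 - a) + c_2 y + \alpha\pi_0 + \alpha_1(\pi_1 - \pi_0)\big), & \frac{\alpha_1}{\beta_1}(\pi_0 - \pi_1) \le a < \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1), \\ 1, & a \ge \frac{\alpha_0}{\beta_0}(\pi_0 - \pi_1). \end{cases} \qquad (50)$$
For ineligibles, all of the welfare effects come from spillovers, since they experience no price change. In particular, for ineligibles who buy, there is a welfare gain from positive spillover due to a higher $\pi$. For ineligibles who do not buy, there is, however, a potential welfare loss due to increased $\pi$. This is why the CV distribution has a support that includes both positive and negative values.
16 Analogously, the choice probabilities have the form $q_1(p, y, \pi) = F(c_0 + c_1 p + c_2 y + \alpha\pi) = F\big(c_0 + c_1\big(p + \tfrac{\alpha\pi}{c_1}\big) + c_2 y\big) \equiv q_1\big(p + \tfrac{\alpha\pi}{c_1}, y\big)$, i.e. the choice probabilities under spillover at price $p$, income $y$ and aggregate use $\pi$ can be expressed as choice probabilities in a binary choice model with no spillover at an adjusted price and the same income.
17 In independent work, Gautam (2018)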
From (50), mean compensating variation is given by Z 0 Using the change of variables, p = p 0 a, the above expression becomes The first term in (52)  (i) Equations (54) and (55) correspond to the upper and lower bounds, respectively, of the overall welfare gain for ineligibles, and therefore, the overall bounds generically contain both positive and negative values, since 6 = 0.

Deadweight Loss
The average deadweight loss (DWL) can be calculated as the expected subsidy spending less the net welfare gain. In particular, if 0 = 0 and = 1 , i.e. there is no negative spillover, then from (45) and (51), the DWL equals DW L(y) = 1 fy g (p 0 p 1 ) q 1 (p 1 ; y; 1 ) | {z } Subsidy spending 1 fy g is large enough, then it is possible for the deadweight loss to be negative, i.e. for the subsidy to increase economic efficiency under positive spillover, as in the standard textbook case. This can happen because there is no subsidy expenditure on ineligibles, and yet those that buy enjoy a subsidy-induced welfare gain due to positive spillover. Similarly, eligibles also receive an additional welfare gain via positive spillover, over and above the welfare gain due to reduced price, and it is only the latter that is financed by the subsidy expenditure. In general, the deadweight loss will be lower (more negative) when (i) the positive spillover ( 1 ) is larger, (ii) the change in equilibrium adoption ( 1 0 ) due to the subsidy is greater, and (iii) the price elasticity of demand ( 1 ) is lower; the last effect lowers deadweight loss simply by reducing the substitution effect, even in the absence of spillover.

Calculation of Predicted Demand and Welfare
In order to calculate our welfare-related quantities, we need to estimate the structural choice probabilities q 1 (p; y; ) and the equilibrium values of the aggregate choice probabilities, 0 and 1 in the pre- and post-intervention situations. To do this we will consider two alternative scenarios. The first is where we assume that the unobservables = vh are independent of realized values of price and income (conditional on other covariates) in the available, experimental data. The second is where we assume that exogeneity holds, conditional on unobserved village-fixed effects. Note that prices in our data are randomly assigned, so the endogeneity concern is solely regarding income. Under income endogeneity, Bhattacharya (2018) discussed the interpretation of welfare distributions as conditional on income. See Appendix A.6 of the present paper for a review of that discussion.
Regardless, calculation of the equilibrium s requires us to either assume exogeneity of observables or to estimate village-fixed effects, conditional on which exogeneity holds, as in our assumptions above.
No Village-Fixed Effects: Under the index-restriction (20) and no village-fixed effects, estimation of q 1 (p; y; ) can be done via standard binary regression, using the variation in price and income across and within villages and of observed across villages to estimate the coefficients constituting the linear index. This implicitly assumes, as is standard in the literature, that even if the game can potentially have multiple equilibrium 's, only a single equilibrium is played in each village, and thus one can use the observed from each village as a regressor to infer the preference parameters. Note that given the index structure, we do not need to impose a specific distribution for the s to calculate the index coefficients. Any existing semiparametric estimation method for index models can be used for calculations, e.g. Klein and Spady (1993), which requires bandwidth choice, and Bhattacharya (2008), which does not.
Finally, the equilibrium values of 0 and 1 can be calculated in each village by solving the fixed point problems Once we obtain the predicted values of 0 and 1 , we can calculate (43) and (50) directly, using previously obtained estimates of the index coefficients.
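As a rough illustration of the fixed-point step, the sketch below iterates on the village take-up rate under a logit specification. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the simulated wealth data, and the coefficient values c0, c1 and c2 are hypothetical, the interaction coefficient of 2.4 is the value quoted in the empirical results, and the integral over the village income distribution is replaced by a sample average.

```python
import numpy as np


def logistic(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + np.exp(-x))


def equilibrium_take_up(price, wealth, c0, c1, c2, alpha, tol=1e-10, max_iter=10_000):
    """Solve take_up = mean_h F(c0 + c1*price_h + c2*wealth_h + alpha*take_up)
    by simple fixed-point iteration (assumes the map is a contraction)."""
    take_up = 0.5  # arbitrary starting value
    for _ in range(max_iter):
        updated = float(np.mean(logistic(c0 + c1 * price + c2 * wealth + alpha * take_up)))
        if abs(updated - take_up) < tol:
            return updated
        take_up = updated
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical village: 180 households with simulated wealth (in KSh).
rng = np.random.default_rng(0)
wealth = rng.lognormal(mean=9.5, sigma=0.8, size=180)

# Hypothetical index coefficients; alpha = 2.4 is the value quoted in the text.
c0, c1, c2, alpha = -0.5, -0.01, 1e-5, 2.4

# Pre-intervention equilibrium: everyone faces the unsubsidized price of 250 KSh.
pi_pre = equilibrium_take_up(250.0, wealth, c0, c1, c2, alpha)
print(f"predicted pre-intervention take-up: {pi_pre:.4f}")
```

Plain iteration is used here only because it is the simplest solver; when the map is not a contraction, a root-finder applied to the difference between the two sides of the equation would be a safer choice.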
With Village-Fixed Effects: Our data for the application come from eleven different villages with approximately 180 households per village. It is plausible that utilities from using and from not using a bednet are affected by village-specific unobservable characteristics, such as the chance of contracting malaria when not using a bednet. Such effects were termed "contextual" by Manski (1993). Brock and Durlauf (2007) discussed some difficulties with estimating social spillover effects in the presence of group-specific unobservables. To capture this situation explicitly, recall the linear utility structure from Section 2, given by where 0 and 1 denote unobservable village-specific characteristics. Therefore, U 1 (y p; ; ) U 0 (y; ; ) Since is village-specific and we have many observations per village, we can use a dummy v for each village, and estimate the regression of take-up on price, income and other characteristics that vary across households h within village v, together with village dummies, i.e.
where F " ( ) refers to the distribution of " = " vh (which may potentially depend on the realized value v for village v). The consistency of these estimates results from exogeneity conditional on village-fixed effects (see Assumptions C3-IID (ii) and C3-SD (ii) above). The identified coefficients v of the village dummies therefore satisfy v = v + c 0 + v . We will need to identify the sum But since we have only eleven villages in our data, we do not consider this avenue.
Welfare Calculation with Village-Fixed Effects: Once we have a plausible way to estimate the structural choice probabilities, we can proceed with welfare calculation in the presence of social spillover and unobserved group effects, as follows. Consider an initial situation where everyone faces the unsubsidized price p 0 , so that the predicted take-up rate 0 = 0v in village v solves where F v Y (y) is the distribution of income Y vh in village v, and c 1 , c 2 , , and v are estimated as above. Now consider a policy-induced price regime p 0 for ineligibles (wealth larger than a) and p 1 for eligibles (wealth less than a). Then the resulting usage 1 = 1v in village v is obtained by solving for the fixed point 1v in the equation Finally, the average welfare effect of this policy change in village v can be calculated using where W Elig v (y) and W Inelig v (y) are average welfare at income y in village v, calculated from (43) for eligibles and (50) for ineligibles, respectively, using 0v and 1v as the predicted take-up probability in village v (analogous to 0 and 1 in (43) and (50)), 1 2 [0; ] as above.

Empirical Context and Data
Our empirical application concerns the provision of anti-malarial bednets. Malaria is a life-threatening parasitic disease transmitted from human to human through mosquitoes. In 2016, an estimated 216 million cases of malaria occurred worldwide, with 90% of the cases in sub-Saharan Africa (WHO, 2017). The main tool for malaria control in sub-Saharan Africa is the use of insecticide-treated bednets. Regular use of a bednet reduces overall child mortality by around 18 percent and reduces morbidity for the entire population (Lengeler, 2004). However, at $6 or more apiece, bednets are unaffordable for many households, and to palliate the very low coverage levels observed in the mid-2000s, public subsidy schemes were introduced in numerous countries in the last 10 years. Our empirical exercise is designed to evaluate such subsidy schemes not just with respect to their effectiveness in promoting bednet adoption, but also their impact on individual welfare and deadweight loss, in line with the classic economic theory of public finance and taxation. Based on our discussion in Section 4, we focus on two main sources of spillover, viz. (a) a preference for conformity, and (b) a concern that mosquitoes will be deflected to oneself when neighbors protect themselves. Both will generate a positive effect of the aggregate adoption rate on one's own adoption decision, but they have different implications for the welfare impact of a price subsidy policy.
Experimental Design: We exploit data from a 2007 randomized bednet subsidy experiment conducted in eleven villages of Western Kenya, where malaria is transmitted year-round. In each village, a list of 150 to 200 households was compiled from school registers, and households on the list were randomly assigned to a subsidy level. After the random assignment had been performed in the office, trained enumerators visited each sampled household to administer a baseline survey. At the end of the interview, the household was given a voucher for a bednet at the randomly assigned subsidy level. The subsidy level varied from 40% to 100% in two villages, and from 40% to 90% in the remaining nine villages; there were 22 corresponding final prices faced by households, ranging from 0 to 300 KSh (US $5.50). Vouchers could be redeemed within three months at participating local retailers.
Data: We use data on bednet adoption as observed from coupon redemption and verified through a follow-up survey. We also use data on household characteristics measured during the baseline survey. The three main baseline characteristics we consider are wealth (the combined value of all durable and animal assets owned by the household); the number of children under 10 years old; and the education level of the female head of household. 18

Empirical Specification and Results
We work with the linear index structure (20), where y = Y vh is taken to be the household wealth, p = P vh is the experimentally set price faced by the household, = vh is the average adoption in the village. The health externality from bednet use is implicitly accounted for via the dependence of utilities from adoption and non-adoption on the average adoption rate (cf. eq. (20)). 19 For the empirical analysis, we also use additional controls, denoted by Z vh below, that can potentially affect preferences (U 1 ( ) and U 0 ( )) and therefore the take-up of bednets, i.e. q 1 ( ). In particular, we include the presence of children under the age of ten and years of education of the oldest female member of the household. A village-specific variable that could affect adoption is the extent of malaria exposure risk in the village. We measure this in our data from the response to the question: "Did anyone in your household have malaria in the past month?". Summary statistics for all relevant variables are reported in Table 1, and their village averages are shown in Table 2, for each of the eleven villages in the data.
Our first set of results corresponds to taking F ( ) to be the standard logit CDF of vh = ( 1 vh 0 vh ) (as in (38), i.e. with no fixed effects), and including average take-up =^ v (= 1 N vh P Nv h=1 A vh ) in the village as a regressor. 20 As shown in Theorem 2 above, even if unobservables are spatially correlated, our increasing-domain asymptotic approximation will lead to consistent estimates of preference parameters. This approximation is reasonable in our empirical setting where the average distance between households within a village typically exceeds 1.5 kilometers. The marginal effects at the mean are presented in Table 3. It is evident that demand is highly price elastic, and that average bednet adoption in the village has a significant positive association with private adoption, conditional on price and other household characteristics, i.e. > 0 in our notation above. The social interaction coefficient is 2.4, which is less than 4, as required for the fixed point map to be a contraction (see discussion following Proposition 2) in the logit case. The effect of children is negative, likely reflecting that households with children had already invested in other anti-malarial steps, e.g. had bought a less effective traditional bednet prior to the experiment.
18 Not all households in a village participated in the game. However, at the time of the experiment, non-selected households did not have the opportunity to buy an ITN, and the outcome variables for such households are always zero. So even if we allow for interactions among all households (including non-selected ones), it is easy to make the necessary adjustments in the empirics. See Appendix A.7 for more on this.
19 There are some households who live in the village but were not part of the formal experiment. Since the ITN was not available from any source other than via the experiment, this only impacts the game via the computed fraction vh . We clarify this point in Appendix A.7.
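The requirement that the interaction coefficient be below 4 in the logit case can be motivated in one line. The following is a sketch in notation chosen here for illustration ($\alpha$ for the interaction coefficient, $\pi$ for village take-up, $F$ for the logistic CDF, and $T$ for the fixed-point map); it is not a reproduction of the paper's Proposition 2.

```latex
T(\pi) = \int F\bigl(c_0 + c_1 p + c_2 y + \alpha \pi\bigr)\, dF_Y(y),
\qquad
\bigl|T'(\pi)\bigr|
  = |\alpha| \int F'\bigl(c_0 + c_1 p + c_2 y + \alpha \pi\bigr)\, dF_Y(y)
  \le \frac{|\alpha|}{4},
\qquad
\text{since } F' = F(1 - F) \le \tfrac{1}{4}.
```

Hence $|\alpha| < 4$ makes $T$ a contraction on $[0,1]$, and with the reported coefficient of 2.4 the contraction modulus is at most $2.4/4 = 0.6$.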
We also computed analogous estimates where we ignore the spillover, i.e., we drop average take-up in the village from the list of regressors. The corresponding marginal effects for the retained regressors are not very different in magnitude from those obtained when including the average village take-up, and so we do not report those here. Instead, we use the two sets of coefficients to calculate and contrast the predicted bednet adoption rate corresponding to different eligibility thresholds. These predicted effects are quite different depending on whether or not we allow for spillover, and so we investigated these further, as follows.
In particular, we consider a hypothetical subsidy rule, where those with wealth below a given threshold are eligible to get the bednet for 50 KSh (90% subsidy), whereas those with wealth above it get it for the price of 250 KSh (50% subsidy). Based on our logit coefficients, we plot the predicted aggregate take-up of bednets corresponding to different income thresholds. In Figure 1, for each threshold, we plot the fraction of households eligible for subsidy on the horizontal axis, and the predicted fraction choosing the bednet on the vertical axis, based on coefficients obtained by including (solid) and excluding (small dash) the spillover effect. The 45-degree line (large dash) showing the fraction eligible for the subsidy is also plotted in the same figure for comparison.
It is evident from Figure 1 that ignoring spillovers leads to over-estimation of adoption at lower thresholds and under-estimation at higher thresholds of eligibility. To get some intuition behind this finding, consider a much simpler set-up where an outcome Y is related to a scalar covariate X via the classical linear regression model Y = 0 + 1 X + where is zero-mean, independent of X and 1 > 0. OLS estimation of this model yields estimators^ 1 ,^ 0 with probability limits (and also expected values , respectively. Corresponding to a value x of X, the predicted outcome has a probability limit of y :
Having obtained these (uncompensated) effects, we now turn to calculating the average demand and the mean compensating variation for a hypothetical subsidy scheme. We consider an initial situation where everyone faces a price of 250 KSh for the bednet, and a final situation where a bednet is offered for 50 KSh to households with wealth below a threshold of 8000 KSh (about the 27th percentile of the wealth distribution), and for the price of 250 KSh to those with wealth above that.
The demand results are reported in Table 4, and the welfare results in Table 5. We perform these calculations village-by-village, and then aggregate across villages. To calculate these numbers, we first predict the bednet adoption when everyone is facing a price of 250 KSh, and then when eligibles face a price of 50 KSh and the rest stay at 250 KSh, giving us the equilibrium values of 0 and 1 , respectively, in our notation above. In all such calculations with our data, we always detected a single solution to the fixed point (i.e. a unique equilibrium), as can be seen from Figure 2. The first row of Table 4 shows the pre-subsidy predicted demand (using a logit CDF F ) by subsidy-eligibility. In the second row, we calculate the predicted effect of the subsidy on demand, and break that up by the own price effect (row 2) and the spillover effect (row 3). The own effect is obtained by changing the price in accordance with the subsidy but keeping the average village demand equal to the pre-subsidy value; the spillover effect is the difference between the overall effect and the own effect. It is clear that spillover effects on both eligibles and ineligibles are large in magnitude. In particular, the spillover effect raises demand for ineligibles by nearly 33% of its pre-subsidy level.
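The own-price versus spillover split described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code behind Table 4: the wealth data are simulated, the coefficients c0, c1 and c2 are hypothetical, the function names are invented here, and only the prices (250 and 50 KSh), the 27th-percentile eligibility threshold and the interaction coefficient of 2.4 are taken from the text.

```python
import numpy as np


def choice_prob(price, wealth, take_up, coef):
    """Logit probability of buying, given price, wealth and village take-up."""
    index = coef["c0"] + coef["c1"] * price + coef["c2"] * wealth + coef["alpha"] * take_up
    return 1.0 / (1.0 + np.exp(-index))


def solve_take_up(price, wealth, coef, tol=1e-12, max_iter=10_000):
    """Fixed point of take_up = mean(choice_prob(price, wealth, take_up))."""
    take_up = 0.5
    for _ in range(max_iter):
        updated = float(np.mean(choice_prob(price, wealth, take_up, coef)))
        if abs(updated - take_up) < tol:
            return updated
        take_up = updated
    raise RuntimeError("fixed-point iteration did not converge")


def decompose_demand_effect(wealth, p0, p1, threshold, coef):
    """Split the subsidy's demand effect into own-price and spillover parts."""
    eligible = wealth < threshold
    price_pre = np.full(wealth.shape, float(p0))
    price_post = np.where(eligible, float(p1), float(p0))
    pi_pre = solve_take_up(price_pre, wealth, coef)
    pi_post = solve_take_up(price_post, wealth, coef)
    q_pre = choice_prob(price_pre, wealth, pi_pre, coef)
    q_own = choice_prob(price_post, wealth, pi_pre, coef)    # new prices, old take-up
    q_post = choice_prob(price_post, wealth, pi_post, coef)  # new prices, new take-up
    result = {}
    for name, mask in (("eligible", eligible), ("ineligible", ~eligible)):
        result[name] = {
            "pre": float(q_pre[mask].mean()),
            "own": float((q_own[mask] - q_pre[mask]).mean()),
            "spillover": float((q_post[mask] - q_own[mask]).mean()),
            "total": float((q_post[mask] - q_pre[mask]).mean()),
        }
    return result


rng = np.random.default_rng(0)
wealth = rng.lognormal(mean=9.5, sigma=0.8, size=180)        # simulated, not the real data
coef = {"c0": -0.5, "c1": -0.01, "c2": 1e-5, "alpha": 2.4}   # c0, c1, c2 are hypothetical
threshold = np.quantile(wealth, 0.27)
effects = decompose_demand_effect(wealth, p0=250, p1=50, threshold=threshold, coef=coef)
for group, parts in effects.items():
    print(group, parts)
```

By construction the own effect is zero for ineligibles, whose price does not change, so any change in their demand in this sketch comes entirely through the higher equilibrium take-up.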
In Table 5, we report welfare calculations. First, in the row titled "Logit", we report the average CV of the subsidy rule for eligibles, corresponding to assuming no spillover. In this case, we simply use the results of Bhattacharya (2015) to calculate the (point-identified) average CV for eligibles as the price changes from 250 KSh to 50 KSh. This yields a welfare gain of 51.9 KSh.
As there is no spillover, the welfare change of ineligibles is zero by definition, and therefore the net welfare gain, denoted by net CV, is simply the fraction eligible (0.27) times the average CV for eligibles. This is reported in the second column of Table 5.
We next turn to the case with spillover. Using the predicted adoption rates 0 and 1 , we compute the lower and upper bounds of the overall average CV using (45), (47) and (48) for eligibles, and using (52), (54) and (55) for ineligibles. These are reported in Columns 3-6 of Table 5. The most conspicuous finding from these numbers is that ineligibles can suffer a large welfare loss on average due to the subsidy. This is because the subsidy facilitates usage solely for the eligibles, raising the equilibrium usage in the village, but the ineligibles keep facing the high price, and thus a lower utility from not buying because is now higher (in the index specification, 0 0).
However, the few ineligibles who buy, despite the high price, get some welfare increase from a rise in the average adoption rate, which explains the small upper bound corresponding to the case 0 = 0.
As for eligibles, the lower and upper bounds on average welfare gain do not contain the estimate that ignores spillovers, suggesting over-estimation of welfare gains in the latter case. This is also consistent with Figure 1, where we see that at 27% eligibility and lower, demand is overestimated
Group-Effects: It is evident from Table 2 that villages 1 and 11 are highly similar in terms of the average values of key regressors, except that the (randomly assigned) average price in village 1 is much higher than in village 11, which explains the much lower average adoption in village 11.
Given this, we assume that villages 1 and 11 are likely to be similar in terms of their unobservables, and as such, we estimate a single v for them. Specifically, we first estimate where Z vh is a vector containing presence of children and female education, the v s are village-specific intercepts (estimated using dummies for the villages), and P vh and Y vh are the price faced by the household in the experiment and its wealth, respectively. In the second step, we solve the linear for and v , for v = 1; :::; 11, where v is obtained in the previous step, and the v s are the average adoption rates in individual villages in the experiment. In solving this system, we set 1 = 11 , which incorporates the homogeneity assumption discussed above. We can do all of this in one step by adding nine dummies for villages 2-10 and one for villages 1 and 11, and then running a regression of individual use on the regressors p; y and x, the average use in each village, as well as the village dummies. In the second row in Table 5, we report the average welfare effects of the same hypothetical policy change as described above, using expression (60).
Next, we use the correlated random effect approach described above, where village averages of observable regressors (price, wealth, female education, number of children) are added as additional controls in a probit (instead of logit) regression. The corresponding welfare results are reported in the third row of Table 5.
Semiparametric Estimates: Finally, in the fourth row of Table 5, we report welfare results from a semiparametric index estimation of the conditional choice probabilities, i.e. retaining the index structure but dropping the logit assumption. This is achieved by using the "sml" routine (de Luca, 2008) in Stata, which implements Klein and Spady's (1993) estimator for single index models, using (i) a default bandwidth of $h_n = n^{-1/6.5}$ to estimate the index, and then (ii) a local cubic polynomial for regressing the binary outcome on the estimated index to produce the predicted probabilities, using a bandwidth of $h_n = c\,n^{-1/5}$, where c is chosen via leave-one-out cross-validation.
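The second-step bandwidth choice can be illustrated with a small leave-one-out cross-validation sketch. To keep it short, the example swaps in a local-constant (Nadaraya-Watson) regression with a Gaussian kernel in place of the local cubic polynomial used in the paper, takes a simulated "estimated index" as given rather than running the Klein-Spady step, and uses hypothetical function names and a hypothetical grid for the constant c.

```python
import numpy as np


def loo_kernel_fit(index, outcome, h):
    """Leave-one-out Nadaraya-Watson fit with a Gaussian kernel and bandwidth h."""
    z = (index[:, None] - index[None, :]) / h
    weights = np.exp(-0.5 * z ** 2)
    np.fill_diagonal(weights, 0.0)  # exclude each observation from its own fit
    return (weights @ outcome) / weights.sum(axis=1)


def choose_bandwidth_constant(index, outcome, grid):
    """Pick c in h = c * n**(-1/5) by minimizing the leave-one-out squared error."""
    n = index.size
    scores = []
    for c in grid:
        h = c * n ** (-1 / 5)
        fitted = loo_kernel_fit(index, outcome, h)
        scores.append(float(np.mean((outcome - fitted) ** 2)))
    best = int(np.argmin(scores))
    return grid[best], scores


# Simulated stand-in for the estimated single index and the binary take-up outcome.
rng = np.random.default_rng(1)
n = 400
index = rng.normal(size=n)
outcome = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * index))).astype(float)

grid = np.linspace(0.5, 3.0, 11)      # hypothetical candidate values for c
c_star, cv_scores = choose_bandwidth_constant(index, outcome, grid)
first_step_bandwidth = n ** (-1 / 6.5)  # default first-step bandwidth quoted in the text
print("chosen c:", c_star, "second-step bandwidth:", c_star * n ** (-1 / 5))
```

A local cubic fit would replace the weighted average in `loo_kernel_fit` with a weighted least-squares regression on powers of the index up to degree three; the cross-validation loop itself would be unchanged.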
The welfare numbers do vary a bit across specifications. But all of these results support the overall conclusion that accounting for spillovers can lead to much lower estimates of net welfare gain from the subsidy program and higher deadweight loss. Some of this difference arises from potential welfare loss suffered by ineligibles that is missed upon assuming no spillover, and some from the impact of including spillover terms on the prediction of counterfactual purchase rates (cf. Figure 1).
In Table 6, we report standard errors for the simple logit case. In principle, one can also derive formulae for standard errors adjusted for spatial correlation, but given that the paper is already quite long, and such standard errors contribute nothing substantive, we do not attempt that here. Table 6 also reports the welfare calculations corresponding to the special case where This would be reasonable when there is no negative externality due to deflection, i.e. H = 0 above, whence average welfare becomes point-identified. Note that this case is different from the results obtained assuming no spillover whatsoever, i.e. the first row, third column of Table 5. We still obtain a negative average effect of the subsidy due to the larger aggregate welfare loss of ineligibles compared with the gains of eligibles.
Comparative Statics: In Table 7, we show how the welfare effects change as we vary the generosity of the subsidy scheme; the wealth threshold for qualification is varied so that either 20%, 40% or 60% of the population is eligible. It is apparent from Table 7 that the upper bound on welfare loss for ineligibles increases as more people become eligible (since equilibrium take-up is higher), and the deadweight loss is larger still due to both a larger extent of subsidy-induced distortion and the higher welfare loss of ineligibles. The lower bound on the welfare gain for eligibles decreases as the share eligible increases; in fact, it becomes negative when 40% are eligible. This is because those among the eligible who are too poor to buy the bednet even at the 50 KSh price are now experiencing a welfare loss since equilibrium take-up is higher. The overall effect is an unambiguous increase in the deadweight loss.
Endogeneity: Price variation is exogenous in our application, since price was varied randomly by the experimenter. It is still possible, however, that wealth Y is correlated with , the unobserved determinants of bednet purchase. Nonetheless, experimental variation in price P also implies that P is independent of , given Y . Consequently, one can invoke the argument presented in Bhattacharya

Summary and Conclusion
In this paper, we develop tools for economic demand and welfare analysis in binary choice models with social interactions. To do this, we first show the connection between Brock-Durlauf type social interaction models and empirical games of incomplete information with many players. We analyze these models under both I.I.D. and spatially correlated unobservables. The latter makes individual beliefs conditional on privately observed variables, complicating identification and inference. We show when and how these complications can be overcome via the use of a limit model to which the finite game model converges under increasing-domain spatial asymptotics, in turn yielding computationally simple estimators of preference parameters. These lead to consistent point-estimates of potential values of counterfactual demand resulting from a policy intervention, which are unique under unique equilibria. However, with interactions, welfare distributions resulting from policy changes such as a price subsidy are generically not point-identified for given values of counterfactual aggregate demand, unlike the case without spillovers. This is true even for fully parametric specifications, and when equilibria are unique. Non-identification results from the inability of standard choice data to distinguish between different underlying latent mechanisms, e.g. conforming motives, consumer learning, negative externalities etc., which produce the same aggregate social interaction coefficient, but have different welfare implications depending on which mechanism dominates. This feature is endemic to many practical settings that economists study, including the health-product adoption case examined here. Another prominent example is school choice, where merit-based vouchers to attend a fee-paying selective school can create negative externalities by lowering the academic quality of the free local school via increased departure of high-achieving students.
The resulting welfare implications cannot be calculated based solely on a Brock-Durlauf style empirical model of individual school choice inclusive of a social interaction term. This is in contrast to models without social interaction, where choice probability functions have been shown to contain all the information required for welfare analysis. Nonetheless, we show that under standard semiparametric linear index restrictions, welfare distributions can be bounded. Under some special and untestable cases, e.g. exactly symmetric spillover effects or absence of negative externalities, these bounds shrink to point-identified values.
We apply our methods to an empirical setting of adoption of anti-malarial bednets, using data from an experiment by Dupas (2014) in rural Kenya. We find that accounting for spillovers provides different predictions for demand and welfare resulting from hypothetical, means-tested subsidy rules. In particular, with positive interaction effects, predicted demand when including spillover is lower for less generous eligibility criteria, compared to demand predicted by ignoring spillovers.
At more generous eligibility thresholds, the conclusion reverses. As for welfare, if negative health externalities are present, then subsidy-ineligibles can suffer welfare loss due to increased use by 13 KSh when all spillovers are ignored. The potential welfare loss of ineligibles and non-buyers translates into larger estimates of potential deadweight loss from price intervention. We perform robustness checks allowing for village-level unobservables and a semiparametric specification.
The implication of these results for applied work is that under social interactions, welfare analysis of potential interventions requires more information regarding individual channels of spillover than knowledge of solely the choice probability functions (inclusive of a social interaction term).
Belief-eliciting surveys provide a potential solution.
We conclude by noting that we have used the basic and most popular specification of interactions, viz. that physical neighbors constitute an individual's peer group. This also seems reasonable in the context of our application, which concerns adoption of a health product in physically separated Kenyan villages. It would be interesting to extend our analysis to other network structures, e.g. those based on ethnicity, caste, socioeconomic distance, etc. We leave that to future work.

Figure 1. Predicted equilibrium adoption of ITN under changing eligibility rule for subsidy, plotted against fraction eligible
Notes: We consider a hypothetical subsidy in which eligible households get a price of 50 KSh for an ITN while the rest face a price of 250 KSh. We plot the predicted aggregate take-up of ITNs corresponding to different eligibility shares, based on coefficients obtained by including (solid) and excluding (small dash) the spillover effect.

Table 5. Notes: The table shows estimated welfare effects. "LB" stands for lower bound, and "UB" for upper bound. CV = compensating variation. In the "group effect" estimation (Table 5, row 2), we group villages 1 and 11.

using some function g vk ( ) which may depend on each index (v; k) but is deterministic (non-random).

Thus, plugging this expression of vk into
we can also write for some deterministic function f vk ( ), where W vk = (Y vk ; P vk ).
By C3-IID, we have two conditional independence restrictions: (u vh ; u vk ) ? (W vh ; L vh )j v and u vh ? u vk j v . These imply that where we have used the following conditional independence relation: for random objects Q, R, and S, which is applied with Q = u vk , R = (W vh ; L vh ), and S = u vh . By the same token, C3-IID implies that which is equivalent to Below, we denote by E v [ ] the conditional expectation operator given v (i.e., E[ j v ]; we also write for any random variable). Given the above, we have where the first equality uses (61), the second and third equalities follow from (62) where henceforth we suppress the dependence of vk on v for notational simplicity. By Proposition 1 and (6), we have Given these, we can write We can easily see that if a symmetric solution to the system of N v equations in (67) exists uniquely, then that of (7) (in terms of f vh g Nv h=1 ) also exists uniquely (and vice versa; note that vh = P Nv k=1 vk (N v 1) vh by (66)). Therefore, we investigate (67).
Corresponding to (67), define an N v -dimensional vector-valued function of r = (r 1 ; r 2 ; : : : ; r Nv ) 2 where we write P 1 k Nv; k6 =h = P k6 =h for notational simplicity, and the metric in the domain and range spaces of M v is defined as jjs sjj 1 := max (68), this r must be a unique solution, which is a set of symmetric beliefs. The proof is completed.

A.2 The Spatially Dependent Case
In this section, we present formal specifications for the spatially dependent process fu vh g and derive the belief convergence result. We prove Theorem 5 below, which is a finer, more general version of Theorem 1 in Section 2 in that it also derives the rate of convergence without the assumption of symmetric beliefs.
Note that given C1 (independence over villages), each village may be analyzed separately. So, for notational simplicity, we drop the village index v.
All of the conditions and statements here should be interpreted as conditional ones given v for each village v, where we note that C2 and C3-SD are stated conditionally on v .
To avoid any notational confusion, we re-write C2 and C3-SD in the following simplified forms (without the village-specific effects v and village index v): C3-SD' fu h g N h=1 is defined through u h = u(L h ), where fu (l)g l2R 2 is a stochastic process on R 2 with the following properties: i) fu (l)g is alpha-mixing satisfying Assumption 3 (provided below); ii) fu (l)g l2R 2 is independent of f(W h ; L h )g N h=1 .

A.2.1 Spatially Mixing Structure
Now, we provide additional specifications of fu h g modelled as a spatially dependent process. To this end, we introduce some more notation. For a set L where jD j j stands for the volume of each square D j . Given these, we define alpha-(strong) mixing coefficients of the stochastic process fu (l)g by where d(L 1 ; L 2 ) is the distance between two sets: d(L 1 ; L 2 ) := inffjjl l jj 1 : l 2 L 1 ;l 2 L 2 g, jjl l jj 1 stands for the l 1 -distance between two points in R 2 : jl 1 l 1 j+jl 2 l 2 j for l = (l 1 ; l 2 ) andl = (l 1 ;l 2 ). 21 We suppose (a; b) is decreasing in a (and increasing in b). In particular, the monotone decrease of in a implies that u(l) and u(l) are less correlated when jjl l jj 1 is large, i.e. the process is weakly dependent when the mixing coefficients (a; b) decay to zero as a tends to infinity.
For location variables fL h g, we consider the following increasing-domain asymptotic scheme, which roughly follows Lahiri (1996). We regard R 0 as a 'prototype' of a sampling region (i.e., village), which is defined as a bounded and connected subset of R 2 , and for each N , we denote by R N a sampling region of the village that is obtained by inflating the set R 0 by a scaling factor N ! 1 maintaining the same shape, such that In particular, if R 0 contains the origin 0 2 R 2 , we can write R N = N R 0 , which may be assumed WLOG. It is also assumed that R 0 is contained in a square whose sides have length 1, WLOG.
Thus, the area of $R_N$ is equal to or less than $\lambda_N^2$. We let $f_0(\cdot)$ be the probability density on $R_0$, and then for $s_h \sim f_0(\cdot)$, where the dependence of $L_h$ on N is suppressed for notational simplicity.22 Given these, we have , and the expected number of households residing in a region $A \subset R_N$ ($\subset \mathbb{R}^2$) is We can also compute the expected distance between two individuals located at $L_k$ and $L_h$:

21 For the verification of Theorem 5 below, this definition of the mixing coefficients using R(b) is slightly more complicated than necessary. We maintain this definition, however: it is the same as the one used in Lahiri and Zhu (2006), who showed validity of a spatial bootstrap under this definition and some mild regularity conditions.

22 Note that when $R_0$ does not contain the origin, we need to consider some location shift: instead of (72), where s is some point in $R_0$ such that the region '$R_0 - s$' (shifted by s) contains the origin.

by the change of variables $\tilde{s} = \tilde{l}/\lambda_N$ and $s = l/\lambda_N$. Since the second term on the last line is a finite integral (independent of N), which exists under $\sup_{s \in R_0} f_0(s) < \infty$, the average distance between any k and h grows at the rate of $\lambda_N$. This growing-average-distance feature is key to establishing limit theory for spatially dependent data under the weak dependence (mixing) condition above. We discuss this point and its implications below, after introducing Assumption 3. Now, we state the following additional conditions on the data generating mechanism:

Assumption 3 (i) The stochastic process $\{u(l)\}_{l \in R_N}$ is alpha-mixing with its mixing coefficients

Condition (i) controls the degree of spatial dependence of $\{u(l)\}$, which is key to establishing limit (LLN/CLT) results. The same condition is used in Lahiri and Zhu (2006), and analogous conditions are imposed in other papers such as Jenish and Prucha (2012). Condition (ii) is the increasing-domain condition, and is important for establishing consistency of estimators (Lahiri, 1996). The uniform boundedness of the density is imposed to simplify proofs, but can be relaxed at the cost of a more involved proof.
Conditions (i) and (ii) have an important implication for identification and estimation of our model. Given the increasing-domain condition (ii), the distance between two individuals, k and h, increases on average at the rate $\lambda_N \to \infty$ as $N \to \infty$, as in (73). This implies that, given the weak dependence condition (i), the correlation between the unobserved variables of any two individuals k and h becomes weaker as N tends to $\infty$. In other words, for each h, the number of other individuals who are almost uncorrelated with h tends to $\infty$ and, furthermore, the ratio of such individuals (among all N players) tends to 1. That is, the conditional law of $u(L_k)$ and that of $A_k$ are less affected by We formally verify this convergence result in Theorem 5.
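The growth of the average distance under the increasing-domain scheme can be checked numerically. The following is a minimal Monte Carlo sketch, assuming (purely for illustration) that $f_0$ is the uniform density on the unit square and that locations are generated as $L = \lambda s$ with $s \sim f_0$; the function name, the seed, and the values of the scaling factor are hypothetical.

```python
import random


def mean_l1_distance(lam, n_pairs=20_000, seed=0):
    """Monte Carlo mean of ||L_k - L_h||_1 when L = lam * s and s is
    uniform on the unit square (an assumed choice of f_0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        s_k = (rng.random(), rng.random())
        s_h = (rng.random(), rng.random())
        total += lam * (abs(s_k[0] - s_h[0]) + abs(s_k[1] - s_h[1]))
    return total / n_pairs


# The simulated mean distance should scale linearly with the factor lam.
for lam in (1.0, 5.0, 25.0):
    print(lam, round(mean_l1_distance(lam), 3))
```

With a uniform density, the expected $l_1$-distance between two independent draws on the unit square is 2/3, so the simulated averages should be close to $2\lambda/3$, in line with the statement that the average distance grows at the rate of the scaling factor.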
Note that such convergence is not specific to our specification of the data-generating mechanism; it occurs generically in settings with spatial data. For example, Jenish and Prucha (2012) derive various limit results for spatial data (or random fields) under the increasing-domain assumption and the so-called minimum distance condition, where the latter means that the distance between any two individuals is larger than some fixed constant d > 0 (independent of N).23 These two assumptions imply that the number of individuals who are 'far away' from each h tends to $\infty$.

Before concluding this subsection, we present the following Assumption 4, under which Theorem 1 in Section 2 is verified. This is a multi-village version of Assumption 3 in which we allow for more than one village and for non-zero village-specific effects (so that the unobservable of household (v, h) is the sum of the village-specific effect and $u_{vh}$):

Assumption 4 (i) For each $v \in \{1, \dots, \bar{v}\}$, given the village-specific effect, the stochastic process $\{u_v(l)\}_{l \in R_{N_v}}$ is alpha-mixing with its mixing coefficients satisfying $\alpha_v(a; b) \le C a^{-\tau_1} b^{\tau_2}$ for some constants $C \in (0, \infty)$, given the village-specific effect, let $\{L_{vh}\}_{h=1}^{N_v}$ be the conditionally I.I.D. sequence introduced in C2. Each $L_{vh}$ is continuously distributed with its support sampling region for each village v and $\lambda_N$ is a scaling constant with $N/\lambda_N^2 \to c$ for some $c \in (0, \infty)$.

A.2.2 Convergence of Equilibrium Beliefs
To formally state our belief convergence result, we introduce the following functional operator $T^{\infty}$ that maps a [0, 1]-valued function g to some constant in [0, 1]: where $T^{\infty}[g]$ is independent of k by the (conditional) I.I.D.-ness of $\{W_k, L_k\}$ ($W_k = (Y_k, P_k)'$) and the independence between $\{W_k, L_k\}$ and $\{u(l)\}$, imposed in C2' and C3-SD'. If $\{(W_k, L_k, u(L_k))\}_{k=1}^{N}$ were I.I.D., the equilibrium beliefs would be characterized as a fixed point of this $T^{\infty}$ (as clarified through Propositions 1 and 2). While beliefs are given as conditional expectations under the spatial dependence of unobserved heterogeneity as modelled in C3-SD', they are still characterized through $T^{\infty}$ in an asymptotic sense stated below.
To show this, we introduce the following mapping to characterize the beliefs under C3-SD' for each N. Let $g^N = (g_1, \dots, g_N)$ be an N-dimensional vector-valued function, each element of which is a [0, 1]-valued function $g_h$ on the support of $(W_h, L_h, u(L_h))$. Then, define $T_N$ as a functional mapping from $g^N$ to an N-dimensional random vector:

$T_N g^N := (T_{N,1} g^N, \dots, T_{N,N} g^N)$,

where each $T_{N,h} g^N$ is a mapping from $g^N$ to a [0, 1]-valued random variable defined as Note that $T_{N,h} g^N$ corresponds to individual h's belief (which carries the additional village index v in Section 2, where multiple villages are considered) when h predicts the behavior of each other individual k using $g_k(W_k, L_k, u(L_k))$.

23 for any d > 0 and $k \ne h$, where the convergence holds as the area of $\{(u, r) : \|u - r\|_1 \le \lambda_N^{-1} d\}$ shrinks to zero and $f_0(\cdot)$ is uniformly bounded; thus, for any d > 0, we have the minimum distance condition with probability approaching 1.
Therefore, in the equilibrium, the system of beliefs, Note that (75) may be equivalently written in the following coordinate-wise form:  (75), which may not be unique, where C 2 (0; 1) is some constant (independent of N , N , and ), whose explicit expression is provided in the proof, and thus An important pre-requisite of Theorem 5 is that the mapping T 1 is a contraction. This condition is easy to verify, e.g., see  (71).
Proof of Theorem 5. De…ne a functional mapping T 1 N;h from an N -dimensional vector valued function g N = (g 1 ; : : : ; g N ) to r 2 [0; 1]: where T 1 is de…ned in (74) (as a mapping on scalar valued functions), and each g N h is a [0; 1]-valued function on the support of (W h ; L h ; u(L h )). Based on this T 1 N;h , we also de…ne an N -dimensional vector mapping: ( 1 (W 1 ; L 1 ; u(L 1 )); : : : ; N (W N ; L N ; u(L N ))) = T N N : where T N maps an N -dimensional vector valued function to an N -dimensional random vector.
Given (78) and (79), we can see that Thus, by the triangle inequality and the contraction property of T 1 , we have By the de…nition of T 1 N;h in (77) as well as that of N = ( ; : : : ; ), the second term on the majorant side is bounded by where the last inequality follows from the contraction condition on T 1 . Thus, this bound and (80) lead to Therefore, if it holds that Proof of (81). For notational simplicity, we write ; for an arbitrary function, g : [0; 1] ! [0; 1]. Then, the inequality (81) follows if where the supremum is taken over any (Borel measurable) functions, g : [0; 1] ! [0; 1].
To show this inequality, observe that by (ii) of C3-SD', Here, we recall the following result on independence: for random objects Q, R, and S, $Q \perp R \mid S$ and $R \perp S$ $\Rightarrow$ $(Q, S) \perp R$. Applying this with $Q = \{u(l)\}$, $R = (W_k, L_k)$, and $S = (W_h, L_h)$, since C2' implies that $(W_h, L_h) \perp (W_k, L_k)$, we can obtain which in turn implies that The relation (84) also leads to by the independence of $\{u(l)\}$ and $(W_k, L_k)$, and the last inequality uses the Fubini theorem.
To bound the RHS of (88), note that for $\|\tilde{l} - l\|_1 > 0$, we can always construct two sets in $\mathbb{R}^2$, $\tilde{L}$ and $L$, satisfying: 1) the former contains $\tilde{l}$ and the latter contains $l$; 2) the distance between the two sets is larger than $\|\tilde{l} - l\|_1 / 2$; 3) each of $\tilde{L}$ and $L$ is a square in $R_N$ with its area less than 1. u(l) and terms of $\alpha(\|\tilde{l} - l\|_1 / 2; 1)$. That is, since $|m_g|$ is uniformly bounded ($\le 1$), we obtain uniformly over any $\tilde{w}$, $\tilde{l}$, and $l$.
To …nd an upper bound of the majorant side of (88), recall that the (marginal) distribution function F L (whose support is given by R N ) has the density f L (l) = (1= 2 N )f 0 (l= N ) for each N , and also that by the de…nition of the mixing coe¢ cients in (69)   Thus, we can see that this upper bound of (88) is independent of h, k, and g, and thus the inequality (82) holds with C := 6 [2 + C2 1 ] f 2 0 , completing the proof.

A.3 Sufficient Conditions for Contraction
Here, we investigate the contraction property of $F^{\star}_{v,N_v}$ (defined in (23)) as well as its limit operator:

The restriction that g be nondecreasing is innocuous when considering fixed points of $F^{\star}_{v,\infty}$ and $F^{\star}_{v,N_v}$. This is because, given the non-negativity of the interaction coefficient and the stochastic dominance of H, the fixed points are also nondecreasing in e (since $F^{\star}_{v,\infty}[g]$ and $F^{\star}_{v,N_v}[g]$ are also nondecreasing in e for such a nondecreasing g). In this proposition, we have defined the limit operator $F^{\star}_{v,\infty}$ on the set of general functions $g(l, e; \theta_1, \theta_2)$, which may depend on (l, e). This general domain space is required to consider the convergence of the operator $F^{\star}_{v,N_v}$ and its fixed point. However, if we define the limit operator $F^{\star}_{v,\infty}$ only on the restricted space of functions $g(\theta_1, \theta_2)$, each of which is independent of (l, e), we can write Note that in the probit specification, in which $\varepsilon_{vh}$ is supposed to follow the standard normal distribution, $\sup_{e \in \mathbb{R}} f_{\varepsilon}(e) = 1/\sqrt{2\pi}$; and in the logit specification, $\sup_{e \in \mathbb{R}} f_{\varepsilon}(e) = 1/4$.
Thus, we can …nd a unique e 0 satisfying w 0 c + v + g(l; e 0 ; 1 ; 2 ) + e 0 = 0; for each (w;l; 1 ; 2 ). For each a 0, let e be a unique number satisfying w 0 c + v + g(l; e; 1 ; 2 ) + a + e = 0: Since a 0 and the slope of the function g(l;ẽ; 1 ; 2 ) +ẽ is greater than or equal to 1, we must have e 0 > e and (e 0 e) 1 a. This upper bound of (e 0 e) holds for any (w;l; 1 ; 2 ). Thus, Therefore, if (92) holds, the so-called discounting condition is satis…ed. Therefore, given I) and II), we have veri…ed F ? v;1 is a contraction. Next, we investigate F ? v;Nv . Note that since g(l; e; 1 ; 2 ) is nondecreasing in e, so is 1fw 0 c + v + g(l;ẽ; 1 ; 2 ) +ẽ 0g, and given (ii) of Assumption 5, the mapped function F ? v;Nv [g] (l; e; 1 ; 2 ) is also nondecreasing. Therefore, the domain and range spaces of F ? v;Nv can be taken to be identical. We can also check the Blackwell su¢ cient conditions for F ? v;Nv exactly in the same way as for F ? v;1 , implying the desired contraction property.
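To make the role of the density bound concrete, here is a minimal sketch of a scalar belief map and its fixed-point iteration in a simplified I.I.D. logit setting. It assumes, as our reading rather than a quotation of condition (92), that the contraction requirement takes the familiar form (absolute interaction coefficient) times (supremum of the density) less than 1. The names `belief_map`, `solve_belief`, `example_indices`, and the numerical values are hypothetical.

```python
import math

# Suprema of the error density: 1/4 for the logit, 1/sqrt(2*pi) for the probit.
SUP_PDF = {"logit": 0.25, "probit": 1.0 / math.sqrt(2.0 * math.pi)}


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def belief_map(pi, indices, alpha):
    """Average choice probability when everyone holds the belief pi.

    `indices` are hypothetical values of the non-interaction part of the index.
    """
    return sum(logistic_cdf(x + alpha * pi) for x in indices) / len(indices)


def solve_belief(indices, alpha, start=0.5, tol=1e-12, max_iter=10_000):
    """Iterate the belief map to its fixed point under the assumed condition."""
    if abs(alpha) * SUP_PDF["logit"] >= 1.0:
        raise ValueError("assumed contraction condition |alpha| * sup f < 1 fails")
    pi = start
    for _ in range(max_iter):
        new_pi = belief_map(pi, indices, alpha)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed-point iteration did not converge")


example_indices = [-1.0, -0.5, 0.0, 0.5, 1.0]  # made-up index values
pi_star = solve_belief(example_indices, alpha=2.0)
```

Because the map has Lipschitz constant at most 0.5 in this example, the iteration converges to the same belief from any starting point in [0, 1]; the probit bound in `SUP_PDF` is listed only for comparison and is not used by the solver.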

A.4 Proof of Theorem 2 (the Estimators' Convergence)
Here, we prove Theorem 2 through several lemmas. In Section 3, for ease of exposition, we assumed that the village-…xed e¤ects 1 ; : : : ; v are known to the econometrician. Here, we explicitly include them in the parameter 1 to be estimated. Note also that identi…cation of preference parameters in presence of 0 s requires identi…cation of the 0 s themselves; hence we need to use one of the methods for doing so, as described in Section 4.4. Here we use the homogeneity assumption 1 = v ; an alternative proof can be given for the correlated random e¤ects case. To sum up, for this section, we re-de…ne the eventual parameter as 1 = (c 0 ; 1 ; : : : ; v 1 ; ) (see e.g. Assumption 7), with all other related quantities interpreted analogously. Consistency of the estimators for the case with 1 ; : : : ; v known is a simpler corollary of Theorem 2.
To analyze^ FPL and^ BR , we de…ne the following conditional moment restriction: where A 1 vh is a hypothetical outcome variable based on the limit model 24 : For each v, let r v = lim Nv N , where this limit ratio value is supposed to be in (0; 1) (note that N = P v v=1 N v ). We also consider the limit versions ofL FPL ( 1 ) andL BR ( 1 ), is de…ned as a solution to (34) for each 1 , and v in L BR ( 1 ) is de…ned as the (probability) limit of^ v = 1 Nv P Nv h=1 A vh (note that the limits of^ v and 1 Nv P Nv h=1 A 1 vh coincide, which follows from arguments analogous to those in the proof of Lemma 3). The …rst order condition of L FPL ( 1 ) may be seen as an unconditional moment restriction based on the conditional one (93).
Note that given the continuity of $F_{\varepsilon}(\cdot)$, $L^{FPL}(\theta_1)$ and $L^{BR}(\theta_1)$ are continuous in $\theta_1$. Lemma 3 shows the uniform convergence of $\hat{L}^{FPL}(\theta_1)$ to $L^{FPL}(\theta_1)$ in probability over $\Theta_1$; we can also show that of $\hat{L}^{BR}(\theta_1)$ to $L^{BR}(\theta_1)$ in probability over $\Theta_1$ (the proof of this result is analogous to that of Lemma 3, and is omitted).
Given the limit objective function, we let Lemma 2 shows identification of $\theta_1^{*}$ (i.e., it is the unique maximizer of $L^{FPL}(\theta_1)$ over $\Theta_1$) and the same result for $\vartheta_1^{*}$. As a result, by Theorem 2.1 of Newey and McFadden (1994), given the compactness of the parameter space $\Theta_1$, we obtain In this subsection, we investigate identification of $\theta_1^{*}$ and $\vartheta_1^{*}$ (defined in (95) and (96), respectively). To this end, we impose the following conditions: and and the (marginal) CDF of $\varepsilon_v(l)$ is $F_{\varepsilon}(\cdot)$ for each $l \in L_v$, whose functional form is supposed to be known, and $F_{\varepsilon}(\cdot)$ is strictly increasing on $\mathbb{R}$ with its continuous PDF $f_{\varepsilon}(\cdot)$ satisfying $\sup_{z \in \mathbb{R}} f_{\varepsilon}(z) < \infty$.
(ii) The random vector $W_{vh}$ includes no constant component. The support of $(W_{vh}', 1)'$ is not included in any proper linear subspace of $\mathbb{R}^{d_W + 1}$, where $d_W$ is the dimension of $W_{vh}$.

Assumption 6 is quite standard. The condition in (i) on the support of $\varepsilon_v(l)$ may be relaxed to allow for some bounded support (instead of $\mathbb{R}$), but it simplifies our subsequent conditions and proofs and is thus maintained.

Conditions
This is an extension of (97) to the model-based probabilities for all ( 1 ; : : : ; v 1 ; ) 0 in the parameter space, where we note that (99) implies (97)   are satis…ed, in which (iv) is satis…ed with c } of this } 1 , then for 1 2 1 ,   (95) and (96), respectively) may di¤er in general, this lemma states that they are identical if we suppose the correct speci…cation, under which we will identify them and always write 1 hereafter.
Given this function of , we consider the set of its values: o : Next, we compute the Jacobian matrix of K with respect to = ( 1 ; : : : ; 10 ; ) 0 : where the upper-left 10-by-10 submatrix is the identity matrix. This matrix (@=@ 0 )K ( ; ) has dominant diagonals for any ( ; ) in the sense of Gale and Nikaido (1965, p. 84), that is, letting , whose dependence on c and v is suppressed for notational simplicity, (@=@ 0 )K ( ; ) is said to have dominant diagonals if we can …nd strictly positive numbers d v and it is possible to …nd some d 1 2 (0; 1) since which is imposed in (99) of Assumption 7. Since (@=@ 0 )K ( ; ) has dominant diagonals for each . Therefore, we can de…ne a function ( ) on V , i.e., the inverse function of ( ) introduced in (108). That is, we have shown that ( ) is one-to-one (injective; ( ) 6 = (~ ) for 6 =~ ), implying the desired result (107). We have now completed Case 5) and thus the whole proof.
Proof of Lemma 2. Given the de…nition of A 1 vh in (94), observe that where the …rst equality follows from the law of iterated expectations and the correct speci…cation assumption and the inequality holds by Jensen's inequality. By the strict concavity of log, this inequality holds with equality if and only if F , which is equivalent to 1 = 1 by (b) of Lemma 1. That is, we have shown that 1 is the unique maximizer of L FPL ( 1 ) over 1 .
To establish the same result for L BR ( 1 ), note that ? v ( 1 ) is the …xed point, and thus the condition (93) (that determines 1 Therefore, meaning that the conditional choice probability model with v (instead of ? v ( 1 )) is also correctly speci…ed at 1 = 1 . By the same arguments as in (110), we can see that 1 is also the unique maximizer of L BR ( 1 ) over 1 . The proof is completed.
Proof of Lemma 3. By boundedness of the support of W vh and boundedness of the parameter is bounded away from 0 and 1 uniformly over 1 , v, and (any realization of) W vh , i.e., we can …nd some (small) constant 2 (0; 1=2) (independent of 1 and v) such that Thus, given the globally Lipschitz continuity of log ( ) on [ ; 1 ], and that of F " ( ) and ? v ( ) (see the global Lipschitz continuity result (120) in the proof of Lemma 5), as well as the uniform boundedness of f " ( ), we can see that are also globally Lipshitz continuous in 1 , implying the global Lipschitz continuity of L FPL ( 1 ) in 1 , we de…ne the following function: Given the uniform convergence of^ ? v ( 1 ) to ? v ( 1 ) (Lemma 5), by arguments analogous to those for the global Lipschitz continuity of L FPL ( 1 ), we can easily see that if the pointwise convergence holds which is to be shown below. And, analogously to the proof of Lemma 7 below, we can obtain as its simpler corollary. Then, using this result and arguments quite analogous to the proof of Lemma 4 below, we also have implying that Then, by (112) and (114), we can obtain the desired conclusion of the lemma. It remains to show the pointwise convergence (113), note that each summand ofL FPL ( 1 ) is a function of 1 , W vh , and u vh (since u vh = (u 0 v (L vh )); u 1 v (L vh )) 0 and " vh = " v (L vh ) = u 1 v (L vh ) u 0 v (L vh )). Thus, letting which is uniformly bounded since (111) holds, we can apply Lemma 6 to obtaiñ where r v 2 (0; 1) is the limit of N v =N . This completes the proof.
Proof of Lemma 4. Let Recall also the de…nition of H (e) = 1 F " ( e) (F " is the CDF of "), these lower and upper bounds can be computed as Since F " is Lipschitz continuous, both the bounds converge to F " W 0 vh c + v + ? v ( 1 ) in probability. Further, the absolute di¤erence of the lower and upper bounds is bounded by sup z2R f " (z) 2 j jk Nv , implying the uniform convergence ofĈ (W vh ; L vh ; 1 ; 2 ) as in (103).

1) To show the pointwise convergence, we compute $E|\hat{\pi}^{\star}_v(\theta_1) - \pi^{\star}_v(\theta_1)|^2$. To this end, define a functional mapping $g$ ($\in (0, 1)$) $\mapsto T^{v}_{\theta_1}(g)$ ($\in (0, 1)$) for each $(v, \theta_1)$: Analogously, we define the following mapping: where the (true) CDF $F^{v}_{W}$ in $T^{v}_{\theta_1}$ is replaced by the empirical one $\hat{F}^{v}_{W}$. Since $T^{v}_{\theta_1}$ and $\hat{T}^{v}_{\theta_1}$ are contractions (by (iii) of Assumption 7; see also the discussion in Appendix A.3), we can find $\pi^{\star}_v(\theta_1)$ and $\hat{\pi}^{\star}_v(\theta_1)$, the unique fixed points of $T^{v}_{\theta_1}$ and $\hat{T}^{v}_{\theta_1}$, respectively, for each $(\theta_1, v)$. By the I.I.D.-ness of $\{W_{vh}\}_{h=1}^{N_v}$ in C2, where the last inequality holds since $F_{\varepsilon}$ is a CDF and $|F_{\varepsilon}(W_{vh}' c + \xi_v + \alpha g(\theta_1)) - E[F_{\varepsilon}(W_{vh}' c + \xi_v + \alpha g(\theta_1))]|^2 \le 4$. Therefore, we have shown that where the supremum is taken over any [0, 1]-valued function on $\Theta_1$.
2) To verify the continuity of ? v ( 1 ), observe that for 1 6 =~ 1 , By the so-called conditional-covariance decomposition formula, we have The second term on the RHS of (123) is zero since (W vh ; L vh ) ? (W vk ; L vk ) and the conditional expectations are reduced to uniformly over any (v; h) and (v; k), where the last equality follows from the same arguments as for (90) (in the proof of Theorem 5). Using these, we can compute which completes the proof of Lemma 6. Note that thisF ? v;1 is a contraction (by Proposition 3) which does not depend on 2 (the dependence ofF ? v;1 [g] ( 1 ; 2 ) on 2 is only through that of g), and its …xed point is also independent of 2 ; thus, we write^ ? v ( 1 ) (instead of^ ? v ( 1 ; 2 )). By the triangle inequality, 1fw 0 c + v + g(l;ẽ; 1 ; 2 ) +ẽ 0g h(ẽje; jl lj 1 ; 2 ) h(ẽ) dẽdF v W;L (w;l) where the second inequality follows from Assumption 8, and this upper bound is independent of g, e, 1 , and 2 . SinceF v L is the empirical distribution function of the I.I.D. variables fL vk g Nv k=1 , we have By the same arguments as those for (90)  o .
Thus we have that: To see why that is the case, recall the binary choice setting discussed above, and de…ne the conditional-on-income structural choice probability at income y 0 as q c 1 p; y 0 ; y = Z 1 U 1 y 0 p 1 ; U 0 y 0 ; dF ( jy) , where F ( jy) denotes the distribution of the unobserved heterogeneity for individuals whose realized income is y, where y may or may not equal y 0 . Now, given a price rise from p 0 to p 1 , for a real number a, satisfying 0 a < p 1 p 0 , the distribution of equivalent variation (analogous to compensating variation for a fall in price as in a subsidy) at a, evaluated at income y, conditional on realized income being y, is given by (see Bhattacharya, 2015) Pr (EV ajY = y) = q c 1 (p 0 + a; y; y) , Now, q c 1 (p 0 + a; y; y), by de…nition, is the fraction of individuals currently at income Y = y who would choose alternative 1 at price p 0 + a, had their income been y. Now if prices are exogenous in the sense that P ? jY , then the observable choice probability conditional on price p and income y is given by q 1 (p; y) = Z 1 fU 1 (y p; ) U 0 (y; )g f ( jp; y) d = Z 1 fU 1 (y p; ) U 0 (y; )g f ( jy) d (by P ? jY ) = q c 1 (p; y; y) .
Therefore, (127) equals $q_1(p_0 + a, y)$, so no corrections are required owing to endogeneity. This implies that if exogeneity of income is suspect and no obvious instrument or control function is available, a researcher can still perform meaningful welfare analysis based on the EV distribution at realized income, provided price is exogenous conditional on income and other observed covariates. For a fall in price, as induced by a subsidy, the same conclusion holds for the compensating variation, which we have calculated in our application. Furthermore, one can calculate aggregate welfare in the population by integrating $q_1(p, y) = q_1^{c}(p, y, y)$ over the marginal distribution of income.
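The aggregation step, integrating the choice probability over the marginal distribution of income, can be approximated by a sample average over observed incomes. The sketch below is illustrative only: the logit form of `q1`, its coefficients, and the income values are hypothetical placeholders rather than estimates from the application.

```python
import math


def q1(price, income, c_price=-0.8, c_income=0.3, intercept=0.5):
    """Hypothetical logit choice probability of alternative 1."""
    index = intercept + c_price * price + c_income * income
    return 1.0 / (1.0 + math.exp(-index))


def aggregate_choice_probability(price, incomes):
    """Average q1(price, y) over the empirical income distribution."""
    return sum(q1(price, y) for y in incomes) / len(incomes)


incomes = [1.0, 2.0, 2.5, 4.0]  # made-up income draws
p0, a = 1.0, 0.25
aggregate = aggregate_choice_probability(p0 + a, incomes)
```

Replacing the empirical average with a weighted sum would accommodate sampling weights; the structure of the calculation is otherwise unchanged.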

A.7 Nonparticipating Households
We note that in our field experiment, conducted over eleven villages in West Kenya, only a subset of households in each village participated in the game, so our sample does not cover all village members. This could cause a problem, since selected households might interact with non-selected ones, about whom we have no data. However, at the time of the experiment, non-selected households did not have any opportunity to buy an ITN, so the outcome variables A for such households are always zero, and so are their conditional expectations.
Thus, in our specification, even if we allow for interactions among all the village members (whether or not they were selected by us), the necessary adjustments in the empirics are easy to make.
To see this point, we interpret the index (v, k) as representing any of the selected and non-selected households, i.e., $k \in \{1, \dots, \bar{N}_v\}$, where $\bar{N}_v$ is the number of all households in village v (thus, which is a scaled version of the belief of (v, h) used so far. Even if (v, h)'s behavior is affected by non-selected households, i.e., it is determined by (1) but with the belief replaced by this scaled version, its difference from the previous case is only the scaling by $(N_v - 1)/(\bar{N}_v - 1)$. In our empirical setting, this ratio is 0.8, and we apply this adjustment throughout the analysis.