2SLS with Multiple Treatments

We study what two-stage least squares (2SLS) identifies in models with multiple treatments under treatment effect heterogeneity. Two conditions are shown to be necessary and sufficient for the 2SLS to identify positively weighted sums of agent-specific effects of each treatment: average conditional monotonicity and no cross effects. Our identification analysis allows for any number of treatments, any number of continuous or discrete instruments, and the inclusion of covariates. We provide testable implications and present characterizations of choice behavior implied by our identification conditions.


Introduction
In many settings (e.g., education, career choices, and migration decisions), estimating the causal effects of a series of different treatments is valuable. For instance, in the context of criminal justice, one might be interested in separately estimating the effects of conviction and incarceration on defendant outcomes (Humphries et al., 2022). Identifying treatment effects in settings with multiple treatments using instruments has, however, proven challenging (Heckman et al., 2008). A common approach in the applied literature is to estimate a "multivariate" two-stage least squares (2SLS) regression with indicators for receiving various treatments as multiple endogenous variables and at least as many instruments. 1 While such an approach is valid under homogeneous treatment effects, it does not generally identify meaningful treatment effects under treatment effect heterogeneity.
To fix ideas about the identification problem, consider a case with three mutually exclusive treatments, T ∈ {0, 1, 2}, and a vector of valid instruments Z. Let β_1 and β_2 be the causal effects of receiving treatments 1 and 2 relative to receiving treatment 0 for agent i on an outcome variable. Then, the 2SLS estimate of the causal effect of receiving treatment 1 is, in general, a weighted sum of β_1 and β_2 across agents, where the weights can be negative. Thus, the estimated effect of treatment 1 can both put negative weight on the effect of treatment 1 for some agents and be contaminated by the effect of treatment 2. In severe cases, the estimated effect of treatment 1 can be negative even though β_1 > 0 for all agents. The existing literature has, however, not clarified which conditions are necessary for the 2SLS estimate of the effect of treatment 1 to assign proper weights: non-negative weight on β_1 and zero weight on β_2. 2 In this paper, we present two necessary and sufficient conditions, besides the standard rank, exclusion, and exogeneity assumptions, for multivariate 2SLS to assign proper weights: average conditional monotonicity and no cross effects. Our results apply in the general case with n treatments and m ≥ n continuous or discrete instruments. We also provide results for settings where exogeneity holds only conditional on covariates, a common feature in applied work that complicates the analysis of 2SLS already in the binary treatment case (Słoczyński, 2020; Blandhol et al., 2022). 3 Finally, our results allow the researcher to specify any set of relative treatment effects, not just effects relative to an excluded treatment (treatment 0).
1 Examples of studies estimating models with multiple treatments using 2SLS, either as the main specification or as an extension of a baseline specification with a binary treatment, include Persson & Tabellini (2004); Acemoglu & Johnson (2005); Angrist et al. (2022).
2 The requirement that 2SLS assign proper weights is equivalent to 2SLS being weakly causal (Blandhol et al., 2022), i.e., providing estimates with the correct sign, under arbitrary heterogeneous effects.
3 The analysis of Blandhol et al. (2022) also extends to the estimation of aggregate effects of multiple ordered treatments.
For expositional ease, however, we continue with the three-treatment example. When does the 2SLS estimate of receiving treatment 1 put non-negative weight on β_1 and zero weight on β_2? To develop intuition about the required conditions, assume we run 2SLS with the following two instruments: the linear projection of an indicator for receiving treatment 1 on the instrument vector Z, which we refer to as "instrument 1", and the linear projection of an indicator for receiving treatment 2 on the instrument vector Z, i.e., "instrument 2". Using instruments 1 and 2 (the predicted treatments from the 2SLS first stage) as instruments is numerically equivalent to using the full vector Z as instruments. The first condition, average conditional monotonicity, requires that, conditional on instrument 2, instrument 1 does not, on average, induce an agent out of treatment 1. 4 The second condition, no cross effects, requires that, conditional on instrument 1, instrument 2 does not, on average, induce an agent into or out of treatment 1. The latter condition is necessary to ensure zero weight on β_2 and is particular to the case with multiple treatments. Notably, both conditions can be tested empirically. For instance, one can regress an indicator for receiving treatment 1 on instruments 1 and 2 on subsamples and test whether the coefficient on instrument 1 is non-negative and whether the coefficient on instrument 2 is zero.
Building upon our general identification results, we consider a prominent special case: the just-identified 2SLS with mutually exclusive treatments and mutually exclusive binary instruments. This case is a natural generalization of the canonical case with a binary treatment and a binary instrument to multiple treatments. A typical application is a randomized controlled trial where each agent is randomly assigned to one of several treatments, but compliance is imperfect. In this setting, our two conditions require that each instrument affects exactly one treatment indicator. In particular, there must be a labeling of the instruments such that instrument k moves agents only from the excluded treatment 0 into treatment k. This result gives rise to a more powerful test in just-identified models: if 2SLS assigns proper weights, each instrument must affect exactly one treatment indicator in a first-stage IV regression. This test can be applied both in the whole population and in subsamples.
The requirement that each instrument affects exactly one choice margin restricts choice behavior in a particular way. In particular, 2SLS assigns proper weights only when choice behavior can be described by a selection model where the excluded treatment is always the preferred alternative or the next-best alternative. It is thus not sufficient that each instrument influences the utility of only one choice alternative. For instance, an instrument that affects only the utility of receiving treatment 1 could still affect the takeup of treatment 2 by inducing agents who would otherwise have selected treatment 2 to select treatment 1. Such cross effects are avoided if the excluded treatment is always at least the next-best alternative. To apply 2SLS in this just-identified case, the researcher must argue why the excluded treatment is always the best or the next-best alternative.
Our results essentially imply that unless researchers can infer next-best alternatives-as in Kirkeboen et al. (2016) and the following literature-2SLS in just-identified models does not identify a meaningful causal effect under arbitrary heterogeneous effects.
Until now, we have considered unordered treatment effects: treatment effects relative to an excluded treatment. Our results, however, also apply to any other relative treatment effects a researcher might seek to estimate through 2SLS. An important case is ordered treatment effects: the effect of treatment k relative to treatment k-1. 5 In the ordered case, the just-identified 2SLS assigns proper weights if and only if there exists a labeling of the instruments such that instrument k moves agents only from treatment k-1 to treatment k. As in the unordered case, this condition can be tested both in the full population and in subsamples. The condition also imposes a particular restriction on agents' choice behavior: we show that 2SLS assigns proper weights in just-identified ordered choice models only when agents' preferences can be described as single-peaked over the treatments. When treatments have a natural ordering, such as years of schooling, the researcher might be able to make a strong theoretical case in favor of such preferences.
We finally present another special case of ordered choice where our conditions are satisfied: a classical threshold-crossing model applicable when treatment assignment depends on a single index crossing multiple thresholds. For instance, treatments can be grades and the latent index the quality of the student's work, or treatments might be years of prison and the latent index the severity of the committed crime. Suppose the researcher has access to exogenous shocks to these thresholds, for instance through random assignment to judges or graders that agree on ranking but use different cutoffs.
Then 2SLS assigns proper weights provided that there is a linear relationship between the predicted treatments, an easily testable condition. 6
Our paper contributes to a growing literature on the use of instruments to identify causal effects in settings with multiple treatments [...]. Angrist & Imbens (1995) showed the conditions under which 2SLS with the multivalued treatment indicator as the endogenous variable identifies a convex combination of the effect of treatment 2 relative to treatment 1 and the effect of treatment 1 relative to treatment 0. In contrast, we seek to determine the conditions under which 2SLS with two binary treatment indicators, D_1 = 1[T ≥ 1] and [...] or discrete instruments, and any definition of treatment indicators. We also allow for covariates. Moreover, we show how the conditions can be tested. By comparison, the existing literature provides only sufficient conditions in the just-identified case with three treatments, three instrument values, and no controls (Behaghel et al., 2013; Kirkeboen et al., 2016).
In the case of just-identified models with unordered treatments, we show that the extended monotonicity condition provided by Behaghel et al. (2013) is not only sufficient but also necessary for 2SLS to assign proper weights, after a possible permutation of the instruments. This non-trivial result gives rise to a new test in just-identified models: for 2SLS to assign proper weights, each instrument can affect only one treatment. Furthermore, we show that knowledge of agents' next-best alternatives, as in Kirkeboen et al. (2016), is implicitly assumed whenever estimates from just-identified 2SLS models with multiple treatments are interpreted as a positively weighted sum of individual treatment effects. We thus show that the assumption that next-best alternatives are observed or can be inferred is not only sufficient but also essentially necessary for 2SLS to identify a meaningful causal parameter. 7
We also provide new identification results for ordered treatments. First, we show when 2SLS with multiple ordered treatments identifies separate treatment effects in a standard threshold-crossing model considered in the ordered choice literature (e.g., Carneiro et al. 2003; Cunha et al. 2007; Heckman & Vytlacil 2007). While Heckman & Vytlacil (2007) show that local IV identifies ordered treatment effects in such a model, we show that 2SLS can also identify the effect of each treatment transition under an easily testable linearity condition. We also show how the result of Behaghel et al. (2013) extends to ordered treatment effects. Finally, we show that for 2SLS to assign proper weights in just-identified models with ordered treatment effects, it must be possible to describe agents' preferences as single-peaked over the treatments.
In contrast to Heckman & Pinto (2018), who provide general identification results in a setting with multiple treatments and discrete instruments, we focus specifically on the properties of 2SLS, a standard and well-known estimator common in the applied literature. Other contributions to the literature on the use of instrumental variables to separately identify multiple treatments (Lee & Salanié, 2018; Galindo, 2020; Lee & Salanié, 2023; Mountjoy, 2022; Pinto, 2021) [...]
7 The only exception is that next-best alternatives need not be observed for always-takers.
In Section 2, we develop the exact conditions for the multivariate 2SLS to assign proper weights to agent-specific causal effects and discuss how these conditions can be tested. In Section 3, we consider two special cases-the just-identified case and a threshold-crossing model. Section 4 concludes. Proofs and additional results are in the Appendix.

Multivariate 2SLS with Heterogeneous Effects
In this section, we develop sufficient and necessary conditions for the multivariate 2SLS to identify a positively weighted sum of individual treatment effects under heterogeneous effects and explain how these conditions can be tested.
Using 2SLS, a researcher can aim to estimate two out of three relative treatment effects. We focus on two special cases: the unordered and ordered case. In the unordered case, the researcher seeks to estimate the effects of treatments 1 and 2 relative to treatment 0 by estimating 2SLS with treatment indicators D^unordered [...]. 10 In that case, the treatment effects of interest are represented by the random vector β^unordered ≡ (Y(1) − Y(0), Y(2) − Y(0)). In the ordered case, the researcher is interested in β^ordered ≡ (Y(1) − Y(0), Y(2) − Y(1)) and uses treatment indicators [...]
For a response type s ∈ S, let v_s be the induced mapping between instruments and treatment indicators: [...] where P is the linear projection of D on Z. We refer to P as the predicted treatments: the best linear prediction of the treatment indicators given the value of the instruments. Similarly, we refer to P_k for k ∈ {1, 2} as predicted treatment k. Since 2SLS using predicted treatments as instruments is numerically equivalent to using the original instruments, we can think of P as our instruments. In particular, P is a linear transformation of the original instruments such that we get one instrument corresponding to each treatment indicator. We will occasionally use language such as "the effect of P_1 on treatment 2".
8 All random variables thus correspond to a randomly drawn agent. We omit subscripts.
9 We focus on mutually exclusive treatments throughout. Treatments that are not mutually exclusive can always be made into mutually exclusive treatments. For instance, two not mutually exclusive treatments and an excluded treatment can be thought of as four mutually exclusive treatments: receiving the excluded treatment, receiving only treatment 1, receiving only treatment 2, and receiving both treatments.
10 The more general case of unordered choice, where the researcher seeks to estimate all relative treatment effects, could be analyzed by varying which treatment is considered to be the excluded treatment. In that case, the researcher should discuss and test the conditions in Section 2 for each choice of excluded treatment. Often, however, the researcher will only be interested in some relative treatment effects.

Identification Results
What does multivariate 2SLS identify under Assumptions 1 and 2 when treatment effects are heterogeneous across agents? The following proposition expresses the 2SLS estimand as a weighted sum of average treatment effects across response types: 11
Proposition 1. Under Assumptions 1 and 2 [...]
In the case of a binary treatment and a binary instrument, 2SLS is a weighted sum of average treatment effects for compliers and defiers. In the canonical case of a binary treatment and a binary instrument, identification is ensured when there are no defiers. The aim of this paper is to generate similar restrictions on the possible response types in the case of multiple treatments. To become familiar with our notation and the implications of Proposition 1, consider the following example.
[...] Response type s thus selects treatment 1 unless Z_2 is turned on. When Z_2 = 1, s selects treatment 2. Assume the best linear predictors of the treatment indicators are P_1 = 0.1 + 0.4 Z_1 and P_2 = 0.2 + 0.5 Z_2 and that all instrument values are equally likely.
11 The literature has documented that 2SLS gives a weighted average of individual treatment effects in the case of two mutually exclusive treatment indicators when using a discrete instrument (Kirkeboen et al., 2016) or two continuous instruments (Mountjoy, 2022). Proposition 1 generalizes these results to more than three treatments, other definitions of treatment indicators, and more instruments than treatments. Moreover, the proposition provides a concise and easily interpretable expression for the weights.
If response type s were present in the population, the weights on the average treatment effects of response type s would be [...]. 12 The average effect of treatment 2 for response type s thus contributes positively to the estimated effect of treatment 2 (β^2SLS_2) and negatively to the estimated effect of treatment 1 (β^2SLS_1). The presence of this response type in the population would be problematic. The response type [...], on the other hand, has weight matrix [...]. The effect of treatment 2 for this response type only contributes to the estimated effect of treatment 2. Two-stage least squares thus assigns proper weights on this response type's average treatment effects.
Under homogeneous effects, the weight 2SLS assigns on the treatment effects of a particular response type is not a cause of concern. By the following corollary of [...] can we interpret the 2SLS estimate of the effect of treatment 1 as a positively weighted average of the effect of treatment 1 under heterogeneous effects. Throughout the rest of the paper, we say that the 2SLS estimate of the effect of treatment 1 assigns proper weights if the following holds:
Definition 1. The 2SLS estimate of the effect of treatment 1 assigns proper weights if for each s ∈ supp(S), the 2SLS estimand β^2SLS_1 places non-negative weights on β^s_1 and zero weights on β^s_2.
We say that 2SLS assigns proper weights if it assigns proper weights for both treatment effect estimates. Requiring 2SLS to assign proper weights is equivalent to requiring 2SLS to be weakly causal (Blandhol et al., 2022) under arbitrary heterogeneous effects, i.e., 2SLS provides estimates with the correct sign. 14 For 2SLS to assign proper weights, the weight matrix must be a non-negative diagonal matrix for all response types s. When is this the case? To gain intuition, note that the formula determining the weight matrix is the same as the coefficients obtained from hypothetical linear regressions of the potential treatments [...]
Intuitively, Assumption 3 requires that, controlling linearly for predicted treatment 2, there is a positive correlation between predicted treatment 1 and potential treatment 1 for each agent across values of the instruments. We refer to the condition as average conditional monotonicity since it generalizes the average monotonicity condition defined by Frandsen et al. (2023) to multiple treatments. 16 In particular, the positive relationship between predicted treatment and potential treatment only needs to hold "on average" across realizations of the instruments. Thus, an agent might be a "defier" for some pairs of instrument values as long as she is a "complier" for sufficiently many other pairs. Informally, we can think of Assumption 3 as requiring the partial effect of P_1 on treatment 1 to be, on average, non-negative for all agents. In order to make sure that all the weights on β_2 in β^2SLS_1 are zero, we need the following much stronger condition.
14 See Section B.1.
15 The partial correlation between P_1 and s_1(Z) given P_2 is the Pearson correlation coefficient between the residuals from regressing P_1 on P_2 and the residuals from regressing s_1(Z) on P_2. In other words, it is the correlation between P_1 and s_1(Z) when linearly controlling for P_2. This partial correlation has the same sign as the coefficient on P_1 in a linear regression of s_1(Z) on P_1 and P_2.
16 With one treatment, Assumption 3 reduces to Cov(P_1, s_1(Z)) ≥ 0, which coincides with the average monotonicity of Frandsen et al. (2023). Under Assumption 4, Assumption 3 is equivalent to the correlation between P_1 and s_1(Z) being non-negative (without having to condition on P_2).
Assumption 4. (No Cross Effects). For all s, the partial correlation between P_1 and s_2(Z) given P_2 is zero. This condition requires that, linearly controlling for predicted treatment 2, there is no correlation between predicted treatment 1 and potential treatment 2 for each agent.
Informally, we can think of Assumption 4 as requiring there to be no partial effect of P_1 on treatment 2 for all agents. Intuitively, if P_1 has a tendency to push certain agents into or out of treatment 2, even after controlling linearly for P_2, the estimated effect of treatment 1 will be contaminated by these agents' treatment effect of treatment 2.
Assumptions 3 and 4 are necessary and sufficient conditions to ensure that 2SLS assigns proper weights. This is our main result: [...] Informally, 2SLS assigns proper weights if and only if for all agents and treatments k, (i) increasing predicted treatment k tends to weakly increase adoption of treatment k and (ii) conditional on predicted treatment k, increases in predicted treatment l ≠ k do not tend to push the agent into or out of treatment k. In Appendix Section B.5, we show how our conditions relate to the Imbens-Angrist (IA) monotonicity condition (Imbens & Angrist, 1994). In particular, we show that if IA monotonicity holds, then 2SLS assigns proper weights under an easily testable linearity condition.
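As an illustration of Assumptions 3 and 4, the following sketch computes, for a single response type, the coefficients from regressing its potential treatment indicators on the predicted treatments. Everything specific here is an assumption made for the example and not taken from the paper: an instrument V ∈ {0, 1, 2} with equally likely values, binary instruments Z_1 = 1[V = 1] and Z_2 = 1[V = 2], the first-stage coefficients, the two response types, and the names `weight_matrix` and `is_proper`.

```python
import numpy as np

# Hypothetical setup (assumed for illustration): V in {0, 1, 2}, equally likely,
# with Z1 = 1[V = 1], Z2 = 1[V = 2] and assumed first-stage predictions.
V = np.array([0, 1, 2])
prob = np.full(3, 1 / 3)
Z = np.column_stack([V == 1, V == 2]).astype(float)
P = np.column_stack([0.1 + 0.4 * Z[:, 0], 0.2 + 0.5 * Z[:, 1]])


def weighted_cov(A, B, w):
    """Population covariance matrix Cov(A, B) under probabilities w."""
    A_c = A - w @ A
    B_c = B - w @ B
    return A_c.T @ (w[:, None] * B_c)


def weight_matrix(treatment_by_v):
    """Var(P)^{-1} Cov(P, potential treatment indicators) for one response type.

    Entry [k, j] is the coefficient on predicted treatment k+1 in a regression
    of the indicator for treatment j+1 on both predicted treatments.
    """
    s = np.asarray(treatment_by_v)
    S = np.column_stack([s == 1, s == 2]).astype(float)
    return np.linalg.solve(weighted_cov(P, P, prob), weighted_cov(P, S, prob))


def is_proper(W, tol=1e-9):
    """True if W is diagonal with non-negative diagonal entries."""
    off_diag = W - np.diag(np.diag(W))
    return bool(np.all(np.abs(off_diag) < tol) and np.all(np.diag(W) > -tol))


# Type that takes treatment 1 only when V = 1 and treatment 0 otherwise.
W_complier = weight_matrix([0, 1, 0])
# Type that takes treatment 2 unless V = 1 moves her into treatment 1.
W_switcher = weight_matrix([2, 1, 2])

print(W_complier, is_proper(W_complier))
print(W_switcher, is_proper(W_switcher))
```

In this hypothetical setup, the second type has a non-zero off-diagonal entry: predicted treatment 1 moves her out of treatment 2, which is the kind of cross effect Assumption 4 rules out.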

Testing the Identification Conditions
Assumptions 3 and 4 cannot be directly assessed since we only observe s(Z) for the observed instrument values. But the assumptions do have testable implications. In particular, let X ∈ {0, 1} be a random variable such that Z ⊥ (S, X). 17 Informally, X is a pre-determined variable not influenced by or correlated with the instrument. We then have the following testable prediction.

If 2SLS assigns proper weights, then Var(P | X = 1)^{-1} Cov(P, D | X = 1) is a diagonal non-negative matrix.

This prediction can be tested by running the following regressions [...] for the sub-sample X = 1 and testing whether the coefficients indexed 11 and 22 are non-negative and whether the coefficients indexed 12 and 21 equal zero. The predicted treatments P can be estimated from a linear regression of the treatments on the instruments on the whole sample, a standard first-stage regression. This test thus assesses the relationship between predicted treatments and selected treatments in subsamples. 18 If 2SLS assigns proper weights, we should see a positive relationship between treatment k and predicted treatment k and no statistically significant relationship between treatment k and predicted treatment l ≠ k in (pre-determined) subsamples. 19 Note that failing to reject that the matrix is non-negative diagonal across all observable pre-determined X does not prove that 2SLS assigns proper weights, even in large samples: there might always be unobserved pre-determined characteristics X such that Var(P | X = 1)^{-1} Cov(P, D | X = 1) is not non-negative diagonal. 20
17 We have already assumed Z ⊥ S (Assumption 1). The condition Z ⊥ (S, X) requires, in addition, that Z is independent of the joint distribution of S and X. If the instrument is truly random, then any variable that pre-dates the randomization satisfies Z ⊥ (S, X). For instance, in a randomized control trial, X can be any pre-determined characteristic of the individuals in the experiment.
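The subsample test described above can be sketched on simulated data. This is only an illustration under assumptions that are not from the paper: a randomized instrument V ∈ {0, 1, 2}, a population made up of never-takers and agents who take treatment k only when V = k, type shares that differ by a pre-determined binary X, and the helper name `ols`. It reports point estimates only; an actual test would also need standard errors.

```python
import numpy as np


def ols(Y, X):
    """OLS of each column of Y on X plus an intercept; returns (1 + k, q) coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200_000
V = rng.integers(0, 3, size=n)            # randomized instrument value
X = rng.integers(0, 2, size=n)            # pre-determined covariate, independent of V
Z = np.column_stack([V == 1, V == 2]).astype(float)

# Assumed response types: 0 = never-taker, 1 = takes treatment 1 iff V = 1,
# 2 = takes treatment 2 iff V = 2. The type shares depend on X.
share_type1 = np.where(X == 1, 0.5, 0.2)
u = rng.random(n)
rtype = np.where(u < 0.3, 0, np.where(u < 0.3 + share_type1, 1, 2))

T = np.where((rtype == 1) & (V == 1), 1, np.where((rtype == 2) & (V == 2), 2, 0))
D = np.column_stack([T == 1, T == 2]).astype(float)

# First stage on the whole sample gives the predicted treatments P.
first_stage = ols(D, Z)
P = np.column_stack([np.ones(n), Z]) @ first_stage

# Regress the treatment indicators on P in the subsample X = 1.
sub = X == 1
coef_matrix = ols(D[sub], P[sub])[1:]     # drop the intercept row
print(np.round(coef_matrix, 3))
```

Here `coef_matrix[k, j]` is the coefficient on predicted treatment k+1 in the regression of the indicator for treatment j+1. Because the simulated population contains no agents with cross effects, the off-diagonal entries should be close to zero and the diagonal entries positive.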

Special Cases
In this section, we first provide general identification results in the just-identified case and derive the implied restrictions on choice behavior under ordered or unordered treatment effects. Then, we provide identification results for a standard threshold-crossing model-an example of an overidentified model with ordered treatment effects.

Identification Results
The standard application of 2SLS involves one binary instrument and one binary treatment. In this section, we apply the results in Section 2 to show how this canonical case generalizes to the case with three possible treatments and three possible values of the instrument. 23 This case is the just-identified case with multiple treatments: the number of distinct values of the instruments equals the number of treatments. In particular, assume we have an instrument, V ∈ {0, 1, 2}, from which we create two mutually exclusive binary instruments [...]. 24 We maintain Assumptions 1 and 2. This setting is common in applications. For instance, Z_k might be an inducement to take up treatment k, as considered by Behaghel et al. (2013). Under which conditions does the multivariate 2SLS assign proper weights in this setting? It turns out that proper identification is achieved only when each instrument affects only one treatment.

Table 1
Response Type      (s(0), s(1), s(2))
Never-taker        (0, 0, 0)
Always-1-taker     (1, 1, 1)
[...]

For simpler exposition, we represent in this section the response types as functions s: {0, 1, 2} → {0, 1, 2}, where s(v) is the treatment selected by response type s when V is set to v. Thus, s(k) is the potential treatment when Z_k = 1. [...] In words, for each agent and treatment k, the agent either never takes up treatment k, always takes up treatment k, or takes up treatment k if and only if f(V) = k. Each instrument is thus associated with exactly one treatment f(v). For ease of exposition, we can thus, without loss of generality, assume that the treatments and instruments are ordered in the same way:
Assumption 5. Assume instrument values are labeled such that f(v) = v for all v ∈ {0, 1, 2}, where f is the unique mapping defined in Proposition 3.
To see the implications of Proposition 3, consider the response types defined in Table 1. It turns out that, for 2SLS to assign proper weights, the population cannot consist of any other response type: [...] if and only if all agents are either never-takers, always-1-takers, always-2-takers, 1-compliers, 2-compliers, or full compliers. Behaghel et al. (2013) show that these assumptions are sufficient to ensure that 2SLS assigns proper weights. See also Kline & Walters (2016) for similar conditions in the case of multiple treatments and one instrument. 25 Proposition 3 shows that these conditions are not only sufficient but also necessary, after a possible permutation of the instruments. These response types are characterized by instrument k not affecting treatment l ≠ k: no cross effects.
21 Flexibly interacting instruments and covariates is very data demanding and is almost never done in applied work (Blandhol et al., 2022).
22 See Section B.1.
23 The results generalize to n possible treatments and n possible values of the instruments. See Section B.2 in the Appendix.
24 There are other ways of creating two instruments from V. For instance, one could define Z_1 = 1[V ≥ 1] and Z_2 = 1[V = 2]. Such a parameterization would give exactly the same 2SLS estimate. We focus on the case of mutually exclusive binary instruments since it allows for an easier interpretation of our results.
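The list of admissible response types can be reproduced by brute force. The sketch below enumerates all 27 mappings from instrument values to treatments and keeps those in which, for each non-excluded treatment k, the agent never takes k, always takes k, or takes k exactly when the instrument value equals k. The function name `affects_only_own_treatment` and the reading of the labels 1-complier, 2-complier, and full complier as (0, 1, 0), (0, 0, 2), and (0, 1, 2) are assumptions made for this illustration.

```python
from itertools import product

TREATMENTS = (0, 1, 2)
INSTRUMENT_VALUES = (0, 1, 2)


def affects_only_own_treatment(s):
    """Check the 'never / always / iff V = k' condition for treatments 1 and 2.

    s is a tuple (s(0), s(1), s(2)) giving the treatment chosen at each
    instrument value, with instrument values labeled as in Assumption 5.
    """
    for k in (1, 2):
        takers = {v for v in INSTRUMENT_VALUES if s[v] == k}
        if takers not in (set(), set(INSTRUMENT_VALUES), {k}):
            return False
    return True


proper_types = [
    s for s in product(TREATMENTS, repeat=3) if affects_only_own_treatment(s)
]
for s in proper_types:
    print(s)
```

Only six of the 27 candidate response types survive the filter, matching the six types named in the text.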
As in Section 2.3, this prediction can be tested by running a first-stage regression 27 [...] on the sub-sample X = 1 and testing whether the coefficient matrix is a non-negative diagonal matrix. In other words, for 2SLS to assign proper weights, there must exist a permutation of the instruments such that there is a positive relationship between treatment k and instrument k and no statistically significant relationship between treatment k and instrument l ≠ k in all (pre-determined) subsamples.
25 Equation 1 in Kline & Walters (2016), generalized to two instruments, also describes the same response types: [...] The requirement that the instruments can cause individuals to switch only from "no treatment" (the excluded treatment) into some treatment is shared by the extensive margin compliers only assumption of Rose & Shem-Tov (2021b).
26 Typically, the researcher would have a clear hypothesis about which instrument is supposed to affect each treatment, avoiding the need to run the test for all possible permutations.
27 Heinesen et al. (2022) show that the coefficients from a first-stage regression on the full sample can be used to partially identify violations of "irrelevance" and "next-best" assumptions invoked in Kirkeboen et al. (2016). Their Proposition 4 implies that is non-negative diagonal if their monotonicity, irrelevance, and next-best assumptions hold. We show that this test is a valid test not only of their invoked assumptions, but also more generally of whether 2SLS assigns proper weights. Moreover, we show that the test can be applied on subsamples.

Choice-Theoretic Characterization
What does the condition in Proposition 3 imply about choice behavior? To analyze this, we use a random utility model. 28 Assume that response type 's indirect utility from choosing treatment when = is ( ) and that selects treatment if The implicit assumptions about choice behavior differ according to which treatment effects we seek to estimate. We here consider the cases of ordered and unordered treatment effects.
Unordered Treatment Effects. What are the implicit assumptions on choice be- The excluded treatment is, thus, always "in between" the selected treatment and all other treatments in this selection model. Since next-best alternatives are not observed, there are selection models consistent with 2SLS assigning proper weights where other alternatives than the excluded treatment are occasionally next-best alternatives. But 28 In a seminal article, Vytlacil (2002) showed that the Imbens & Angrist (1994) monotonicity condition is equivalent to assuming that agents' selection into treatment can be described by a random utility model where agents select into treatment when a latent index crosses a threshold. In this section, we seek to provide similar characterizations of the condition in Proposition 3. 29 Since all agents of the same response type have identical behavior, it is without loss of generality to assume that all agents of a response type have the same indiriect utility function. We assume response types are never indifferent between treatments. 30 Lee & Salanié (2023) consider a similar random utility model. Their Additive Random Utility Model in combination with strict one-to-one targeting and the assumption that all treatments except = 0 are targeted gives ( ) = + 1 [ = ] for constants and > 0. As shown by Lee & Salanié (2023), these assumptions are generally not sufficient to point-identify local average treatment effects. Our result shows that-in the just-identified case-local average treatment effects can be identified if we additionally assume that all agents have the same next-best alternative. 31 The assumption that arg max ( ) is a singleton implies that preferences are strict. School 1 School 0 School 2 Student A Student B Note: Example of setting where the excluded treatment could plausibly be argued to be the best or the next-best alternative for all agents. Here the treatments are schools, with School 0 being the excluded treatment. 
Agents are students choosing which school to attend. The open dots indicate locations of schools, and the closed dots indicate the location of two example students. If students care sufficiently about travel distance, School 0 will be either the best or the next-best alternative for all students.
when indirect utilities are given by 0 = 0 and = + 1 [ = ], this can only happen when always selects the same treatment, in which case the identity of the next-best alternative is irrelevant. 32 In which settings can a researcher plausibly argue that the excluded treatment is always the best or the next-best alternative? A natural type of setting is when there are three treatments and the excluded treatment is "in the middle". For example, consider estimating the causal effect of attending "School 1" and "School 2" compared to attending "School 0" on student outcomes. Here, School 0 is the excluded treatment. Assume students are free to choose their preferred school. If School 0 is geographically located in between School 1 and School 2, as depicted in Figure 1, and students care sufficiently about travel distance, it is plausible that School 0 is the best or the next-best alternative for all students. For instance, student A in Figure 1, who lives between School 1 and School 0, is unlikely to prefer School 2 over School 0. Similarly, student B in Figure 1 is unlikely to have School 1 as her preferred school. If we believe no students have School 0 as their least favorite alternative and we have access to random shocks to the utility of attending Schools 1 and 2 multivariate 2SLS can be safely applied. 33 Kirkeboen et al. (2016) and a literature that followed exploit knowledge of next-best alternatives for identification in just-identified models. In the Appendix Section B.7, we show that the conditions invoked in this literature are not only sufficient for 2SLS to assign proper weights but also essentially necessary. 34 Thus, to apply 2SLS in justidentified models with arbitrary heterogeneous effects, researchers have to either directly observe next-best alternatives or make assumptions about next-best alternatives based on institutional and theoretical arguments.
32 See the proof of Proposition 5.
33 Another example where the excluded treatment is always the best or the next-best alternative is when agents are randomly encouraged to take up one treatment and cannot select into treatments they are not offered. In that case, one might estimate the causal effects of each treatment in separate 2SLS regressions on the subsamples not receiving each of the treatments. But when control variables are included in the regression, estimating a 2SLS model with multiple treatments can improve precision.
Thus, without loss of generality, we can assume that instrument increases the utility of treatments ≥ by the same amount while keeping the utility of treatments < constant. But this assumption is not sufficient to prevent instrument from influencing treatment indicators other than . In addition, we need that preferences are single-peaked
for all ∈ , ∈ {0, 1, 2}, and ∈ {1, 2} and the preferences are single-peaked.

A Threshold-Crossing Model
In several settings, treatments have a clear ordering, and assignment to treatment could We are interested in the ordered treatment effects = ( (1) − (0) , (2) − (1)) ,
Proof. (Proposition 1). We have The third equality uses the law of iterated expectations. The fourth equality uses that is a deterministic function of . The sixth and seventh equalities invoke The seventh equality also relies on the law of iterated expectations. In particular
Proof. (Corollary 2). The partial correlation between 1 and 1 ( ) given 2 is the Pearson correlation coefficient between the residuals from regressing 1 on 2 and the residuals from regressing 1 ( ) on 2 . By the Frisch-Waugh-Lovell theorem, this partial correlation has the same sign as the coefficient on 1 in a regression of 1 ( ) on .
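The Frisch-Waugh-Lovell step in the proof of Corollary 2 can be checked numerically. The following is a minimal sketch on simulated data; the variable names (`y`, `x1`, `x2`), the data-generating process, and the helper functions are assumptions made for illustration and do not come from the paper. It computes the partial correlation from residuals and compares its sign with the coefficient on `x1` in the multiple regression.

```python
import numpy as np


def residualize(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_corr_and_coef(y, x1, x2):
    """Partial correlation of y and x1 given x2, and the OLS coefficient on x1."""
    partial_corr = np.corrcoef(residualize(y, x2), residualize(x1, x2))[0, 1]
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return partial_corr, coef[1]


# Simulated data (illustrative only): x1 and x2 are correlated, and the
# effect of x1 on y is negative once x2 is held fixed.
rng = np.random.default_rng(0)
n = 2000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = -0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

partial_corr, coef_x1 = partial_corr_and_coef(y, x1, x2)
raw_corr = np.corrcoef(y, x1)[0, 1]
```

In this simulated design the raw correlation between `y` and `x1` is positive while the partial correlation and the regression coefficient are both negative, which is why the testable implication is stated in terms of partial rather than raw correlations.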
Proof. (Corollary 3). The proof is analogous to the proof of Corollary 2.
Proof. (Proposition 5). We first show that indirect utilities of the stated form satisfy the assumptions in Proposition 3 required for 2SLS to assign proper weights. We need to show that (i) a response type never selects unless she selects it at = and (ii) a response type that selects when = ≠ always selects . To show (i), assume Informally, there is a "peak" at treatment −1. By single-peakedness, −1, ( ) must then be the largest indirect utility when = .
To prove the other direction of the equivalence, assume that 2SLS assigns proper weights. We want to show that choice behavior can be described by indirect utilities of the form given in the proposition. To do this, define indirect utilities These cases cover all possible response types when 2SLS assigns proper weights: As argued above, under ordered treatment effects, the conditions in Proposition 3 require that each response type either always selects the same treatment or selects treatment if = and treatment − 1 if ≠ for some . It is straightforward to verify that ( ) = arg max ( ) with these values of and . Note that no response type is ever indifferent, so arg max ( ) is indeed a singleton.
We now verify that the indirect utilities given above are always single-peaked. First, consider the case when response type always selects treatment . Then
Proof. (Proposition 7). This result follows directly from Proposition B.8.

Proof. (Proposition B.1). We have
Cov The second equality uses that E [︁˜]︁ = 0. The third equality uses that and, using E [︁˜| ]︁ = 0 and the law of iterated expectations The fourth equality applies the law of iterated expectations. The sixth equality uses where the third equality invokes the conditional independence assumption.
where the third equality uses Assumption B.1.

we have 42
Cov (︁˜, for each response type present in the subsample = 1. Thus is a non-negative diagonal matrix. Also, note that We then have that 43 where the second equation uses ⊥ | , and the law of iterated expectations. E.g., where the second equality uses the law of iterated expectation and

Proof. (Proposition B.3). Under Assumptions B.1-B.4, we have
The result then follows by an application of Proposition 3 on each subsample = .
The permutation must be the The result then follows by applying Proposition 7 on each subsample = .
Proof. (Proposition B.6). To see that IA monotonicity is not sufficient for Assumption 4, consider the following example. In the model of Section 3.1 with unordered treatment effects, assume there are only two response types, and ′ , defined by In a population composed of and ′ , IA monotonicity is trivially satisfied. But the no cross effects condition (Assumption 4) is not satisfied. Instrument 2 affects both treatment 1 and treatment 2, violating the conditions in Proposition 3. In particular, response type ′ is not among the allowed response types in Corollary 5.
To see that IA monotonicity does not imply Assumption 3, assume = { 1 , 2 } is composed of two continuous instruments taking values between 0 and ¯ . For simplicity, assume only 1 affects treatment 1 and 1 ⊥ 2 . Assume the linear relationship between 1 and 1 is increasing but the highest propensity to take up treatment 1 occurs at 1 = 0. In other words, the linear relationship between the instrument and treatment imposed by the first stage is misspecified. Then a response type that selects 1 if and only if 1 = 0 satisfies IA monotonicity but violates average conditional monotonicity. 44
To prove ii), consider the case when Assumption 6 holds and 1 ⊥ 2 . IA monotonicity then requires that the selection of treatment 1 is strictly non-decreasing in 1 for all agents. But Assumption 3 is weaker and only requires the selection of treatment 1 to have a non-negative correlation with 1 for all agents.
this is equivalent to We want to show that this condition is equivalent to for constants and . First, assume Equation A.2. We then have 45
44 When 1 ⊥ 2 , average conditional monotonicity with respect to treatment 1 reduces to Cov ( 1 , 1 ( )) ≥ 0, which is violated in this case.
45 The reverse direction of the equivalence is straightforward to verify.
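The violation of average conditional monotonicity in the proof of Proposition B.6 comes down to a covariance with the wrong sign. The sketch below is a hedged numerical illustration, not the paper's construction: it replaces the continuous instrument with an equally weighted grid on [0, 1] (so that the value zero has positive probability), and the names `z1`, `selects_treatment_1`, and `covariance` are hypothetical. It shows that a response type selecting treatment 1 only when the instrument equals zero has a negative covariance with the instrument.

```python
import numpy as np


def covariance(a, b):
    """Population covariance under equal weights on the support points."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)


# Assumed discrete support for instrument 1; the upper bound 1.0 is arbitrary.
z1 = np.linspace(0.0, 1.0, 11)

# Response type that selects treatment 1 if and only if the instrument is zero.
selects_treatment_1 = (z1 == 0.0).astype(float)

cov_z1_s1 = covariance(z1, selects_treatment_1)
```

Because the indicator is non-zero only where the instrument is zero, the covariance equals minus the mean of the instrument times the probability of the zero value, which is strictly negative whenever both are positive.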

B.2 Generalization to More Than Three Treatments
In this section, we show how our results generalize to the case with more than three treatments. Assume there are + 1 treatments: ∈ ≡ {0, 1, . . . , }. Otherwise, the notation remains the same. In particular, the treatment effect of treatment relative to treatment is given by ( ) − ( ). Using 2SLS, we can seek to estimate of these treatment effects denoted by the random vector ≡ ( 1 , 2 , . . . , ) . For instance, we might be interested in unordered treatment effects-the causal effect of each treatment .
48 Weak causality requires that the 2SLS estimate is non-negative (non-positive) if the effect for all response types is non-negative (non-positive). When the treatment effect is zero (both non-negative and non-positive) for all response types, the 2SLS estimate is then required to be zero (both non-negative and non-positive).
where is a function of { (0) , (1) , . . . , ( )}. For instance, if (e.g., = { , + 1, . . . , }). In Figure B.1, we show these two special cases and a third generic case as directed graphs where an arrow from = to = indicates that we are interested in estimating the treatment effect ( ) − ( ). By using different treatment indicators one can estimate any subset of treatment effects that connects the treatments in an acyclic graph. 49
For a response type ∈ , define by the induced mapping between instruments and treatment indicators: Proposition 1 generalizes directly to this setting with more than three treatments-the only required modification to the proof is to replace ( In words, only instrument can influence treatment . For instance, in Table B.1 we show all allowed response types in the case with four treatments. In the case of unordered treatment effects, = 1 [ = ], treatment 0 must play a special role: Unless response type always selects the same treatment, we must have either ( ) = or ( ) = 0, for all . If instrument is an inducement to take up treatment , an agent can thus never select any other treatment ∉ {0, } when induced to take up treatment , unless the agent always selects treatment .
Response Type     ( (0) , (1) , (2) , (3))
Never-taker       (0, 0, 0, 0)
Always-1-taker    (1, 1, 1, 1)
Always-2-taker    (2, 2, 2, 2)
Always-3-taker    (3, 3, 3, 3)
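The restriction that only instrument k can influence treatment k can be explored by brute force. The sketch below is an illustration under explicit assumptions that are not spelled out in the surviving text: there are four treatments, the instrument takes the four mutually exclusive values 0 to 3, a response type is a tuple giving the treatment selected at each instrument value, and the treatment-k indicator must be constant across all instrument values other than k and weakly higher at value k. The function name `is_allowed` is hypothetical.

```python
from itertools import product

TREATMENTS = (0, 1, 2, 3)


def is_allowed(response_type):
    """Check the assumed no-cross-effects and monotonicity rules.

    response_type[v] is the treatment selected when the instrument equals v.
    For each non-excluded treatment k, the indicator 1[selected == k] must be
    the same at every instrument value v != k, and it may not be lower at
    v == k than elsewhere.
    """
    for k in TREATMENTS[1:]:
        off_values = {int(response_type[v] == k) for v in TREATMENTS if v != k}
        if len(off_values) != 1:
            return False  # some instrument value other than k moves treatment k
        if int(response_type[k] == k) < off_values.pop():
            return False  # instrument k pushes the agent out of treatment k
    return True


allowed = [s for s in product(TREATMENTS, repeat=4) if is_allowed(s)]
```

Under these assumptions the enumeration keeps the four constant response types together with the types that select either treatment 0 or treatment k at each instrument value k and treatment 0 at instrument value 0, which matches the verbal description that treatment 0 must play a special role.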

B.3 Multivariate 2SLS with Covariates
In this section, we show how our results generalize to 2SLS with multiple treatments and control variables. In particular, assume we have access to a vector of control variables with finite support such that the instruments are exogenous only conditional on where, for ∈ and ∈ , ≡ Var .
Thus, 2SLS with controls assigns proper weights if and only if , is a non-negative diagonal matrix for all ∈ and ∈ . This condition is in general hard to interpret.
But the condition has a straightforward interpretation under the following assumption: being a non-negative diagonal matrix for all ∈ and ∈ . Note that predicted treatment is predicted based on and , but not the interaction.
53 In this setting, Var ( ˜ | = ) = Var ( ˜ ) is equivalent to (i) the correlation between incarceration and conviction rates between judges not varying across courts and (ii) the ratio between the standard deviation of incarceration rates and the standard deviation of conviction rates not varying across courts.
54 Using the assigned judge as the instrument, the predicted treatments are the judge's incarceration and conviction rates. In applied work, these rates are typically estimated by leave-one-out estimates to avoid small sample bias.
bias coming from predicted treatments covarying differently across values of . This case, however, seems hard to characterize in an interpretable way. Also note that Assumption B.3 only depends on population moments and can thus be easily tested.
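The leave-one-out rates mentioned in footnote 54 can be computed as follows. This is a minimal sketch of the generic leave-one-out mean, not the estimator of any particular study; the function name `leave_one_out_rate` and the toy data are assumptions for illustration. For each case, the rate is the judge's mean outcome over all of that judge's other cases.

```python
import numpy as np


def leave_one_out_rate(judge_ids, outcomes):
    """Leave-one-out mean of `outcomes` within each judge.

    For case i assigned to judge j, returns the mean outcome over all other
    cases of judge j, i.e. (judge total - own outcome) / (judge count - 1).
    """
    judge_ids = np.asarray(judge_ids)
    outcomes = np.asarray(outcomes, dtype=float)
    _, judge_index = np.unique(judge_ids, return_inverse=True)
    totals = np.bincount(judge_index, weights=outcomes)
    counts = np.bincount(judge_index)
    if np.any(counts < 2):
        raise ValueError("every judge needs at least two cases")
    return (totals[judge_index] - outcomes) / (counts[judge_index] - 1)


# Toy data: three cases for judge 0 and two cases for judge 1.
judges = [0, 0, 0, 1, 1]
incarcerated = [1, 0, 1, 1, 0]
loo_incarceration_rate = leave_one_out_rate(judges, incarcerated)
```

Excluding a defendant's own case removes the mechanical correlation between the defendant's outcome and the estimated judge rate, which is the small-sample concern the footnote refers to.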
It is useful to contrast 2SLS with multiple treatments and covariates to OLS with multiple treatments and covariates. Goldsmith-Pinkham et al. (2022) show that OLS with controls and multiple treatments is generally affected by "contamination bias"-the estimated effect of one treatment being contaminated by the effects of other treatments.

Then Var
(︁˜| = 1 )︁ = 9 Var (︁˜| = 0 )︁ . Had incarceration and conviction rates been forced to be either zero or one-as in OLS-it would not have been possible to construct such an example.
This assumption requires that the predicted treatments estimated on the full sample coincide with predicted treatments estimated separately for each value of .
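One way to inspect this requirement is to compare first-stage fitted values from a pooled regression, additive in the instrument and the covariate, with fitted values from regressions run separately within each covariate cell. The sketch below is a hedged illustration only: it uses a single instrument `z`, a binary covariate `x`, and a noise-free treatment propensity `d` in place of an observed treatment indicator, and the helpers `fitted_values` and `max_gap` are hypothetical names. It is not the paper's test.

```python
import numpy as np


def fitted_values(y, columns):
    """OLS fitted values of y on the given regressor columns."""
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def max_gap(z, x, d):
    """Largest absolute gap between pooled and cell-by-cell predictions."""
    ones = np.ones_like(z)
    pooled = fitted_values(d, [ones, z, x])  # additive in z and x, no interaction
    separate = np.empty_like(d)
    for value in np.unique(x):
        cell = x == value
        separate[cell] = fitted_values(d[cell], [ones[cell], z[cell]])
    return np.max(np.abs(pooled - separate))


# Balanced toy design: instrument values 0..3 within each covariate cell.
z = np.tile(np.array([0.0, 1.0, 2.0, 3.0]), 2)
x = np.repeat(np.array([0.0, 1.0]), 4)

# Same instrument slope in both cells: pooled and separate predictions agree.
gap_same_slope = max_gap(z, x, 0.1 + 0.2 * z + 0.1 * x)

# Slope differs across cells: the pooled additive model cannot reproduce
# the cell-specific predictions.
gap_different_slope = max_gap(z, x, 0.1 + (0.1 + 0.2 * x) * z)
```

In sample data the two sets of predictions will never coincide exactly, so in practice one would compare them up to sampling error rather than check for equality.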

B.4 Assumptions About Heterogeneous Effects
Average conditional monotonicity and no cross effects (Assumptions 3 and 4) make sure that 2SLS is weakly causal for arbitrary heterogeneous effects. 61 But these assumptions can be relaxed if we are willing to make some assumptions about heterogeneous effects.
To analyze this case, define compliers (defiers) as the response types pushed into (out of) treatment 1 by instrument 1. To simplify language, we here refer to 1 as "instrument 1" and 2 as "instrument 2". 62 Define the following parameters: sum of the positive weights on treatment 2 in 2SLS 1
61 See Section B.1.
62 All statements below are conditional on controlling linearly for the other instrument. E.g., "pushed into treatment 1 by instrument 1" = "pushed into treatment 1 by instrument 1 when controlling linearly for instrument 2". We also change the notation of to to improve readability.
We then obtain the following decomposition: 63 The 2SLS  . The same inequality is obtained when ≤ 0 for all agents. 67 Since the weights on compliers sum to one when negative = 0, the magnitude of 2 cross can be interpreted as the number of agents pushed into or out of treatment 2 by instrument 1 as a share of the complier population.

B.5 Relationship to Imbens-Angrist Monotonicity
In this section, we show how the conditions in Section 2.2 relate to the classical Imbens & Angrist (1994) This assumption requires Imbens & Angrist (1994) 69 IA monotonicity requires potential treatment to be strictly non-decreasing in the probability of receiving treatment conditional on the instruments. When there are many instrument values, as in random judge designs with many judges, IA monotonicity is considerably more demanding than Assumption 3. In the just-identified case with treatments and mutually exclusive binary instruments, however, Assumptions 3 and 4 do imply IA monotonicity.
To see why the predicted treatments need to be linearly related, consider the following example. Assume there are three treatments and that the relationship between the predicted treatments P_2 and P_1 is given by E[P_2 | P_1] = P_1 − P_1^2. The probability of receiving treatment 2 is highest when the probability of receiving treatment 1 is 50% and equal to zero when the probability of receiving treatment 1 is either zero or one. Since we only control linearly for P_1 in 2SLS, changes in P_2 will then push agents into or out of treatment 1, violating the no cross effects condition.
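The quadratic example can be verified numerically. The sketch below assumes, for illustration only, that the probability of receiving treatment 1 is spread evenly over a grid on [0, 1] and that the expected probability of treatment 2 is p minus p squared, as described in the example; the names `p1`, `expected_p2`, and `linear_fit` are hypothetical. It confirms where the relationship peaks and shows that a linear control for `p1` captures none of it.

```python
import numpy as np

# Assumed grid of probabilities of receiving treatment 1.
p1 = np.linspace(0.0, 1.0, 101)

# Expected probability of treatment 2 given p1, following the example: p - p^2.
expected_p2 = p1 - p1**2

# Best linear approximation of expected_p2 in p1 (slope first, then intercept).
linear_fit = np.polyfit(p1, expected_p2, 1)
slope = linear_fit[0]
residual = expected_p2 - np.polyval(linear_fit, p1)

peak_location = p1[np.argmax(expected_p2)]
```

On this symmetric grid the fitted slope is zero, so the whole hump-shaped relationship ends up in the residual. Variation in the second predicted treatment that remains after linearly controlling for the first therefore still moves together with the probability of treatment 1, which is the source of the cross effect described in the example.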
The required linearity condition is easy to test and ensures that 2SLS assigns proper weights also when Assumption B.6 does not hold: In Section 3.2, we apply Proposition B.8 to the case when treatment is characterized by a latent index crossing multiple thresholds.

B.6 Relationship to Unordered Monotonicity
In this section, we show how the conditions in Section 2.2 relate to Heckman & Pinto for all ∈ or ( ) ≤ ( ′ ) for all ∈ .
Note that unordered monotonicity is a strictly stronger condition than Assumption 70 For instance, consider the just-identified case with three treatments where all the response types in Table 1 are present. In that case, Assumption B.5 is satisfied but unordered monotonicity is violated: We have 0 (2) > 0 (1) for "1-compliers" and 0 (2) < 0 (1) for "2-compliers".