Bias Formulas for Violations of Proximal Identification Assumptions

Causal inference from observational data often rests on the unverifiable assumption of no unmeasured confounding. Recently, Tchetgen Tchetgen and colleagues have introduced proximal inference to leverage negative control outcomes and exposures as proxies to adjust for bias from unmeasured confounding. However, some of the key assumptions that proximal inference relies on are themselves empirically untestable. Additionally, the impact of violations of proximal inference assumptions on the bias of effect estimates is not well understood. In this paper, we derive bias formulas for proximal inference estimators under a linear structural equation model data generating process. These results are a first step toward sensitivity analysis and quantitative bias analysis of proximal inference estimators. While limited to a particular family of data generating processes, our results may offer some more general insight into the behavior of proximal inference estimators.


1 Introduction
Causal inference using observational data often rests on the assumption of no unmeasured confounding. This assumption is not empirically verifiable, but sensitivity analysis methods (e.g., [2,4,10,12]) are available to assess robustness of results to possible unmeasured confounding. Alternatively, investigators might turn to methods such as instrumental variable analysis or difference-in-differences, which depend on different assumptions. Sensitivity analyses for violations of the assumptions required by these alternative methods are also available ([1,9]).
There has been recent interest in the use of negative control methods to detect and resolve confounder bias. A negative control outcome (NCO) is a variable known not to be causally affected by the treatment of interest, while a negative control exposure (NCE) is a variable known not to causally affect the outcome of interest [15]. Tchetgen Tchetgen and colleagues have developed a proximal inference framework ([3,7,16]) which uses NCE-NCO pairs sharing the same unmeasured confounders as the treatment-outcome relationship of interest as proxies to adjust for unmeasured confounding.
However, some of the assumptions that proximal inference relies on are themselves empirically untestable [16], and bias resulting from violations of proximal identification assumptions is not fully understood. In this paper, we characterize bias from violations of proximal inference assumptions in a linear structural equation model (LSEM) data generating process. Our results build understanding of the sensitivity of proximal inference to assumption violations and serve as a first step toward sensitivity analysis and quantitative bias analysis [6] tools for proximal inference.
The organization of the paper is as follows. In Section 2, we review proximal inference. In Section 3, we describe the forms of bias that we will study. In Sections 4 and 5, we derive bias formulas for the settings described in Section 3. In Section 6, we present numerical experiments based on the bias formulas from Sections 4 and 5 to explore their implications. In Section 7, we conclude by discussing some potential insights into the sensitivity of proximal inference estimators gained from our results.
2 Proximal Identification of the Average Treatment Effect

2.1 Review of Definitions and Assumptions
We use the potential outcome framework [13] to formally define causal effects. Let A denote the binary treatment of interest, Y the observed post-treatment outcome, and Y(a), a = 0, 1, the potential (counterfactual) outcome that would have been observed had treatment A been set to a. We implicitly make the no-interference assumption that the potential outcome of each individual does not depend on the treatments received by other individuals [18]. We aim to estimate the average causal effect (ACE) of A on Y, defined as ψ = E[Y(1) − Y(0)].
Let L denote the set of measured covariates. We make the standard assumptions of Consistency and Positivity, defined below.
Assumption 1. (Consistency) Y = Y(A) almost surely.
In other words, the observed value of Y under treatment A coincides with the counterfactual outcome that would have been observed under the same treatment value. Thus, we only observe the counterfactual outcome corresponding to the treatment value that was actually administered in our data.
Assumption 2. (Positivity) 0 < P(A = a | L) < 1 for a = 0, 1.
Assumption 2 states that both exposure levels are observed at all levels of the observed covariates L.
Many analyses further make the assumption that there is no unobserved confounding, i.e., that observed covariates block all non-direct (or 'backdoor') causal paths between treatment and outcome:
Assumption 3. (Exchangeability) Y(a) ⊥⊥ A | L for a = 0, 1.
Under Assumptions 1-3, the counterfactual mean is identified by the g-formula
E[Y(a)] = E[E[Y | A = a, L]].    (1)
Exchangeability is a strong assumption that is empirically untestable. [8] propose an alternative to Assumption 3 that allows us to identify the counterfactual mean E[Y(a)] despite the presence of unobserved confounding. We review the alternative conditions developed by [8] leading to the proximal g-formula, a counterpart to (1) allowing for some unobserved confounding.
As in [3], we consider a (potentially multidimensional) variable L that can be partitioned into three types of variables (X, Z, W), such that:
1) X includes observed variables that may be common causes of A and Y (observed confounders);
2) Z includes treatment-inducing confounding proxies, i.e., causes of A that share an unmeasured common cause U with Y;
3) W includes outcome-inducing confounding proxies, i.e., causes of Y that share an unmeasured common cause U with A.
Assumption 4. (Treatment-inducing confounding proxy) Y(a, z) = Y(a) for all a, z.
Assumption 5. (Outcome-inducing confounding proxy) W(a, z) = W for all a, z.
Assumption 6. (Latent unconfoundedness) If U denotes the set of unobserved confounders, then (Y(a), W) ⊥⊥ (A, Z) | (U, X).
Assumption 4 states that Z does not have a direct effect on Y upon intervening on A, while Assumption 5 states that neither A nor Z has a causal effect on W. Past works [15] refer to variables Z satisfying Assumption 4 as negative control exposure (NCE) variables, and to variables W satisfying Assumption 5 as negative control outcome (NCO) variables. This terminology comes from negative control methods, which employ variables sharing a confounding mechanism with the treatment-outcome relationship in view to detect bias in epidemiological research. Although there is a subtle distinction between the proxy and negative control nomenclature when discussing the design of observational studies [16], for the theoretical analysis in this paper we use treatment-inducing (outcome-inducing) confounding proxies and NCE (NCO) variables interchangeably.
In addition to Assumptions 1-6, [7] introduce the following completeness conditions for the identification of E[Y(a)]:
Assumption 7. (Completeness) For any a, x and any square-integrable function g:
(a) E[g(U) | Z = z, A = a, X = x] = 0 for almost all z implies g(U) = 0 almost surely;
(b) E[g(Z) | W = w, A = a, X = x] = 0 for almost all w implies g(Z) = 0 almost surely.
Assumption 7(a) can be interpreted as a requirement that the NCE Z has enough variability relative to the variability of U; similarly, Assumption 7(b) requires the variability of W to be large enough relative to the variability of Z. Under conditions 7(a) and (b), we can essentially account for U in our ACE estimate without either measuring or modeling the distribution of U. The role of completeness will be further explored in Section 2.2, where we outline the analytical framework by which the ACE is estimated using the proximal g-formula.
Completeness assumption 7(a) has a simple interpretation in the case where the confounders U and the negative control pair (Z, W) are all categorical. As mentioned in [3], if (U, Z, W) are categorical with respective numbers of categories (d_u, d_z, d_w), then completeness 7(a) requires that d_u ≤ min(d_z, d_w). In other words, proximal inference can account for unmeasured confounding if the number of categories of U is no greater than that of either Z or W. This leads to the practical recommendation to measure a rich set of baseline characteristics (which can be used as negative controls), such that the proximal identification approach has a higher chance of mitigating unmeasured confounder bias [3]. There is no such straightforward method for expressing the completeness condition in the case of continuous U and negative controls (Z, W), though [3] offer some intuition via nonparametric regression. In Section 4, we investigate the behavior of proximal inference in LSEM setups in which completeness assumption 7(a) is violated. Lastly, to be valid proxies the variables (Z, W) must be U-relevant:
Assumption 8. (U-relevance) The unmeasured confounders U are associated with both Z and W.
The U-relevance assumption (also known as U-comparability [15]) requires the unmeasured confounders U of the A-Y relationship to be the same as the unmeasured confounders of the A-W and Z-Y secondary treatment-outcome associations. This ensures that, under the negative control framework, any non-null A-W or Z-Y association can be attributed to U confounding the A-Y relationship (while null associations imply no empirical evidence of unmeasured confounding).
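To see why the categorical case reduces to this counting condition, we sketch the standard argument (ours, not reproduced from the paper). Fix (a, x) and view 7(a) as a linear system:
\[
\sum_{i=1}^{d_u} g(u_i)\, P(U = u_i \mid Z = z_j, A = a, X = x) = 0 \quad (j = 1, \dots, d_z) \;\Longrightarrow\; g \equiv 0.
\]
This implication holds exactly when the $d_z \times d_u$ matrix with entries $P(U = u_i \mid Z = z_j, A = a, X = x)$ has full column rank $d_u$, which is only possible if $d_z \geq d_u$; a parallel argument involving W yields $d_w \geq d_u$, giving $d_u \leq \min(d_z, d_w)$.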
Throughout this paper, we suppress the observed confounders X unless otherwise stated.While we do not include X in the sensitivity analysis discussion of Section 6, the addition of X is a straightforward extension of our bias formulations.

2.2 Estimating the Proximal g-Formula via Moment Restrictions
[8] introduce the notion of an outcome confounding bridge function, which transforms the negative control outcome W to match the confounding effect of U on Y. More precisely, an outcome confounding bridge function h(W, A, X) is a function satisfying
E[Y | U, A = a, X = x] = E[h(W, a, x) | U, A = a, X = x]
for all values of a, x. In other words, if the function h(W, A, X) exists, then the confounding effect of U on the transformed variable h(W, a, X) equals the confounding effect of U on Y at exposure level A = a. Given Assumptions 1, 5, 6, and 8, [8] infer that
E[Y(a)] = E[h(W, a, X)],
which means E[Y(a)] can be estimated following the identification of an outcome bridge function h(W, A, X), if such a function is assumed to exist. [3,7] established the following proximal identification result for the outcome confounding bridge function that leverages the distribution of a NCE Z:
Theorem 1. Suppose there exists an outcome confounding bridge function h(w, a, x) solving the Fredholm integral equation
E[Y | Z, A, X] = E[h(W, A, X) | Z, A, X]    (12)
almost surely. Then, under Assumptions 1, 2, 4-6, and 7(a),
E[Y(a)] = E[h(W, a, X)],
and the ACE is identified by ψ = E[h(W, 1, X) − h(W, 0, X)].
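To see the role completeness plays in Theorem 1, the following derivation sketch (ours, using the conditional independence Z ⊥⊥ (Y, W) | (U, A, X) implied by Assumptions 4-6) connects the observable equation (12) to the latent bridge condition:
\[
0 = E\big[\,Y - h(W, A, X) \mid Z, A = a, X\,\big]
  = E\big[\, g(U) \mid Z, A = a, X \,\big], \qquad g(U) := E\big[\,Y - h(W, a, X) \mid U, A = a, X\,\big],
\]
so completeness 7(a) forces g(U) = 0 almost surely, i.e., h reproduces the confounding effect of U on Y. Without 7(a), a solution to (12) may fail this latent condition, which is the source of the biases studied in Sections 4 and 5.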
Remark 1. [3] establish a similar proximal identification result for the existence and identification of a treatment confounding bridge function q(Z, A, X) that leverages the NCO variable W (and an assumption analogous to completeness Assumption 7) instead. Due to the higher complexity of 1/P(A = a | U, X) relative to E[Y | U, A, X] in our chosen LSEMs, we delegate sensitivity analysis involving the treatment confounding bridge function to future work.
Assuming the outcome confounding bridge function h(W, A, X) exists and is identifiable as a solution to (12), [16,8] provide a practical approach for estimating the proximal g-formula using the generalized method of moments (GMM). Suppose one has access to n i.i.d. samples (Y_i, A_i, X_i, Z_i, W_i), i = 1, ..., n, where Z and W are assumed to be correctly classified as treatment- and outcome-inducing confounding proxies, respectively. Moreover, suppose one has specified a parametric model for the confounding bridge, h(W, A, X) = h(W, A, X; b) (e.g., h(W, A, X; b) is linear in W, A, X with unknown parameter b). The true model for h(W, A, X) is unknown, but one may fit a fairly flexible model (including, for instance, splines or interaction terms) to obtain a reasonable estimate in practice.
We define the target parameter θ = (b, ψ) to encode the parameters b of h(W, A, X; b) and the ACE ψ, along with the moment restrictions
E[(Y − h(W, A, X; b)) q(Z, A, X)] = 0,
E[h(W, 1, X; b) − h(W, 0, X; b) − ψ] = 0,
where q(Z, A, X) is a user-specified vector of functions of (Z, A, X) (e.g., q(Z, A, X) = (1, A, Z, X)). The resulting parameter estimate b is unbiased by (9), while the ACE estimate ψ is unbiased by (10).
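As a concrete illustration, the following Python sketch solves the sample analogue of these moment restrictions for a linear bridge h(W, A; b) = b_0 + b_a A + b_w W with X suppressed and q(Z, A) = (1, A, Z). The function name and the use of scipy's root finder are our own choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import fsolve

def proximal_gmm(y, a, z, w):
    """Solve the sample moment restrictions for a linear outcome bridge
    h(W, A; b) = b0 + ba*A + bw*W (observed confounders X suppressed)
    with instrument vector q(Z, A) = (1, A, Z), plus the ACE moment."""
    def moments(theta):
        b0, ba, bw, psi = theta
        resid = y - (b0 + ba * a + bw * w)   # Y - h(W, A; b)
        return [
            resid.mean(),        # E[(Y - h) * 1] = 0
            (resid * a).mean(),  # E[(Y - h) * A] = 0
            (resid * z).mean(),  # E[(Y - h) * Z] = 0
            ba - psi,            # E[h(W,1;b) - h(W,0;b)] - psi = 0
        ]
    b0, ba, bw, psi = fsolve(moments, x0=np.zeros(4))
    return psi
```

For a bridge that is linear in A without an A-by-W interaction, the ACE moment reduces to ψ = b_a; richer bridge models (e.g., with an AW term, as in the base case of Section 4) add both parameters and instrument functions.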

3 Bias Settings
We have so far collected a series of untestable Assumptions 4-8 that replace exchangeability and account for the effect of unmeasured confounders U without directly modeling or estimating U. The impact of violations of these assumptions on the direction and/or magnitude of bias has not been explored. In this work, we trust the analyst to identify true negative control exposures and outcomes (Assumptions 4 and 5), as subject matter knowledge should often be quite reliable on this point. Latent unconfoundedness (Assumption 6) is not really an assumption, since it presumably holds for some sufficiently rich U. But the richer (or higher dimensional) the U required to satisfy latent unconfoundedness (Assumption 6), the less plausible it is that completeness (Assumption 7) or U-relevance (Assumption 8) holds. If many components of U are common causes of the negative control exposures and outcomes, then completeness (Assumption 7) is difficult to satisfy. And if many components of U are required to block all backdoor paths between A and Y, then they are less likely to all be associated with both Z and W, violating Assumption 8.
In Section 4, we characterize the proximal inference estimator bias in a LSEM under scenarios in which each of Z and W is one-dimensional but U (comprising common causes of any of A, Y, Z, and W) has two independent components. We first consider the case where one component of U is an 'extra' common cause of Z and W not associated with A or Y (which violates completeness (Assumption 7) and is illustrated in Figure 2); we then consider the case where one component of U is a common cause of A and Y but is not associated with either Z or W (which violates U-relevance (Assumption 8) and is illustrated in Figure 3). We would argue that it is difficult to guard against violations of Assumptions 7 and 8 arising in this way using subject matter knowledge, making sensitivity analysis for violations of these types particularly valuable. Additionally, for the settings of Figures 2 and 3, we compare the bias of the proximal estimator due to violations of Assumptions 7 and 8 to the bias of alternative estimators of the ACE which the analyst might implement under an incorrect unconfoundedness assumption. We consider (1) an outcome regression estimator (referred to as "OR") which adjusts for (Z, W), and (2) an unadjusted estimator which regresses Y on A alone.
In Section 5, we characterize the bias of the proximal estimator in a LSEM where Z and W have the same (arbitrary) number of dimensions and U has at least as many components as either Z or W, under the simplifying assumption that the effect of A on Y is not modified by U on the additive scale. This simplifying assumption makes tractable calculations that allow us to develop more general bias formulas for scenarios in which each component of U might have missing arrows into any of A, Y, Z, or W in the causal DAG.
4 Bias Formulas in a Linear Structural Equation Model
Let us consider i.i.d. data generated according to the following LSEM, with U = (U_1, U_2) standard normal, X suppressed, and independent standard normal errors (ε_Z, ε_W, ε_Y):
A | U ~ Bernoulli(expit(α_0 + α_u^T U)),
Z = θ_0 + θ_a A + θ_u^T U + ε_Z,
W = µ_0 + µ_u^T U + ε_W,
Y = γ_0 + γ_a A + γ_u^T U + γ_au^T AU + ε_Y.    (18)
The causal DAG corresponding to this data generating process can be seen in Figure 4. Parameters θ_u and µ_u encode the association between confounder U and the NCE/NCO, respectively. We will explore the sensitivity of the proximal inference bias to particular values of α_u, θ_u, µ_u. The NCE Z is a post-treatment variable in this DGP. We note that DAGs other than Figure 4 might also be compatible with proximal inference assumptions (e.g., Z → A or no arrow between A and Z, in the absence of other changes). More examples of DAGs compatible with proximal inference assumptions can be found in [15].
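For readers who wish to experiment, the following Python sketch simulates draws from (18) with X suppressed. Where Section 6 fixes parameter values, the defaults below mirror them; the remaining defaults (e.g., α_{u1} = 1) and the function name are our own choices.

```python
import numpy as np

def simulate_lsem(n, seed=0,
                  a0=0.0,  au=(1.0, 0.0),                  # A | U (logistic)
                  th0=0.0, tha=1.0, thu=(1.0, 0.0),        # Z
                  mu0=0.0, muu=(0.5, 0.0),                 # W
                  g0=0.0,  ga=0.5, gu=(1.0, 0.0), gau=(1.5, 0.0)):  # Y
    """Draw n i.i.d. samples from LSEM (18) with U = (U1, U2) independent
    standard normal and X suppressed. Second components of the
    U-coefficients are zero by default (the base case of Section 4.1)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, 2))
    logits = a0 + u @ np.asarray(au)
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    z = th0 + tha * a + u @ np.asarray(thu) + rng.standard_normal(n)
    w = mu0 + u @ np.asarray(muu) + rng.standard_normal(n)
    y = (g0 + ga * a + u @ np.asarray(gu)
         + a * (u @ np.asarray(gau)) + rng.standard_normal(n))
    return y, a, z, w
```

Since E[U] = 0, the true ACE in (18) is ψ = γ_a + γ_au^T E[U] = γ_a, which the defaults set to 0.5.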

4.1 Base Case: No Violated Assumptions
As a sample application of the proximal identification method, we identify the confounding outcome bridge function h(W, A, X) corresponding to the baseline case of one-dimensional U (that is, α_{u2} = θ_{u2} = µ_{u2} = γ_{u2} = 0). For simplicity, we drop the index denoting the first component of U, so the DGP is (18) with scalar coefficients (α_u, θ_u, µ_u, γ_u, γ_au). The outcome confounding bridge function solving the bridge equation takes the linear form
h(W, A) = b_0 + b_a A + (b_w + b_{aw} A) W,
with slope parameters b_w = γ_u/µ_u and b_{aw} = γ_au/µ_u (the intercept terms absorb the remaining constants); an analogous linear form solves the treatment bridge function equation. The proof of the correctness and uniqueness of the above bridge functions is provided in Appendix A.1. The proximal g-formula using either bridge function yields an unbiased estimate of the ACE.
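As a check on this form (a worked sketch of ours, not reproduced from the paper), substitute the bridge into its defining equation, using E[W | U, A] = µ_0 + µ_u U (valid since W is unaffected by A given U) and E[Y | U, A = a] = γ_0 + γ_a a + γ_u U + γ_au aU:
\[
E[h(W, a) \mid U, A = a] = b_0 + b_a a + (b_w + b_{aw} a)(\mu_0 + \mu_u U),
\]
which matches E[Y | U, A = a] for every U exactly when
\[
b_w \mu_u = \gamma_u, \quad b_{aw} \mu_u = \gamma_{au}, \quad b_0 + b_w \mu_0 = \gamma_0, \quad b_a + b_{aw} \mu_0 = \gamma_a.
\]
The proximal g-formula then recovers the ACE:
\[
E[h(W, 1)] - E[h(W, 0)] = b_a + b_{aw} E[W] = (\gamma_a - b_{aw}\mu_0) + b_{aw}\mu_0 = \gamma_a = \psi.
\]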

4.2 Violations of Proximal Inference Assumptions
We examine setup (18) for two-dimensional U, which implies that at least one of the vectors (α_{u2}, γ_{u2}) and (θ_{u2}, µ_{u2}) has all nonzero entries.
In the case where θ_u is nonzero (that is, there is a nonzero association between the NCE Z and at least one component of U), the following theorem holds:
Theorem 2. If θ_u is nonzero (i.e., Z is U-relevant), then the LSEM (18) with Gaussian (X, U) violates completeness assumption 7(a).
A proof of Theorem 2, which constructs a counterexample function g(U) for assumption 7(a), is provided in Appendix B.1. By Theorem 1, we know that violating assumption 7(a) leads to a potentially biased ACE estimate, as the outcome confounding bridge function ĥ(W, A, X) resulting from the GMM procedure no longer satisfies (12). In the upcoming sections, we will derive formulas for the resulting bias in the above LSEM when Z is U-relevant, for the particular cases:
• α_{u2} = γ_{u2} = 0 and θ_{u2}, µ_{u2} ≠ 0 (Section 4.2.1);
• θ_{u2} = µ_{u2} = 0 and α_{u2}, γ_{u2} ≠ 0 (Section 4.2.2).
The two cases are treated separately for simplicity, but they may be combined into a general sensitivity analysis in the context where either vector (α_{u2}, γ_{u2}) or (θ_{u2}, µ_{u2}) has all nonzero entries (and the two cases are not mutually exclusive).

4.2.1 Completeness Violation: Association between Negative Controls through U = (U_1, U_2) (as in Figure 2)
For simplicity, we exclude X from these computations. Let us consider i.i.d. data generated according to LSEM (18) with α_{u2} = γ_{u2} = γ_{au2} = 0, i.e.,
A | U ~ Bernoulli(expit(α_0 + α_{u1} U_1)),
Z = θ_0 + θ_a A + θ_{u1} U_1 + θ_{u2} U_2 + ε_Z,
W = µ_0 + µ_{u1} U_1 + µ_{u2} U_2 + ε_W,
Y = γ_0 + γ_a A + γ_{u1} U_1 + γ_{au1} A U_1 + ε_Y,    (24)
where θ_u and µ_u have all non-zero entries.
From Theorem 2, we know that the above setup satisfies all assumptions except 7(a). Thus, proceeding to solve for the parameters b of a linear outcome bridge function (which is the functional form an investigator who was unaware of U_2 would select) will lead to a biased estimate of the average treatment effect, even if the linear bridge function is correctly specified. The following theorem (see Appendix C.2 for a proof) provides a formula for this bias under a linear outcome bridge function:
Theorem 3. Fitting a linear outcome bridge function under LSEM (24) yields a proximal outcome estimator bias δ_POR; the closed-form expression, together with its simplification in the case γ_{au1} = 0, is derived in Appendix C.2.
In the remaining theoretical analysis of this case, we make the additional simplifying assumption γ_{au1} = 0. The general case γ_{au1} ≠ 0 will be considered in the numerical experiments of Section 6, but we restrict ourselves here for clarity.
Under the setup γ_{au1} = 0, we also derive the bias δ_OR of the non-proximal g-computation estimator that regresses Y onto (1, Z, W, A, AZ, AW), as well as the bias δ_unadj of the unadjusted estimator that regresses Y onto A alone; the proofs for the non-proximal g-computation biases can be found in Appendix C.3. It turns out that, under certain configurations of (θ_u, µ_u) denoting the strengths of association between (Z, W) and U, we are guaranteed to obtain less bias from the proximal g-computation estimator than from the unadjusted regression estimator. We formalize these configurations in the following theorem:
Theorem 4. Under the setup γ_{au1} = 0, the proximal g-computation bias δ_POR and the unadjusted estimator bias δ_unadj can be compared as follows:
(i) If θ_{u1}µ_{u1} and θ_{u2}µ_{u2} have the same sign (both positive or both negative), then |δ_POR| < |δ_unadj|.
(ii) If θ_{u1}µ_{u1} and θ_{u2}µ_{u2} have different signs, then the comparison depends on whether the ratio θ_{u1}µ_{u1}/θ_{u2}µ_{u2} falls below a threshold r* (see Appendix C.5 for the precise statement).
The proof of Theorem 4 can be found in Appendix C.5.
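As an illustration of Theorem 4(i), the following Python sketch (with illustrative parameter values of our own choosing, not the paper's) compares the two estimators on data from (24) with same-sign products θ_{u1}µ_{u1}, θ_{u2}µ_{u2} > 0 and γ_{au1} = 0:

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(1)
n = 200_000

# DGP (24) with gamma_au1 = 0; U2 loads only on the proxies Z and W,
# and theta_u1*mu_u1 = theta_u2*mu_u2 = 0.5 > 0 (Theorem 4(i) regime).
u1, u2 = rng.standard_normal(n), rng.standard_normal(n)
a = rng.binomial(1, 1 / (1 + np.exp(-u1)))        # alpha_u1 = 1
z = a + u1 + u2 + rng.standard_normal(n)          # theta_a = theta_u1 = theta_u2 = 1
w = 0.5 * u1 + 0.5 * u2 + rng.standard_normal(n)  # mu_u1 = mu_u2 = 0.5
y = 0.5 * a + u1 + rng.standard_normal(n)         # gamma_a = ACE = 0.5

psi_unadj = y[a == 1].mean() - y[a == 0].mean()   # regress Y on A

def moments(theta):                               # proximal GMM
    b0, ba, bw = theta
    r = y - (b0 + ba * a + bw * w)                # linear bridge residual
    return [r.mean(), (r * a).mean(), (r * z).mean()]  # q = (1, A, Z)

psi_prox = fsolve(moments, np.zeros(3))[1]        # psi = ba for this bridge

print(f"unadjusted bias: {psi_unadj - 0.5:+.4f}")
print(f"proximal bias:   {psi_prox - 0.5:+.4f}")
```

Consistent with Theorem 4(i), the proximal bias should be smaller in magnitude than the unadjusted bias, though nonzero because completeness fails.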

4.2.2 Partial U-relevance for Two-Dimensional Unobserved Confounder U (as in Figure 3)
For simplicity, we exclude X from subsequent computations. Let us consider i.i.d. data generated according to LSEM (18) with θ_{u2} = µ_{u2} = γ_{au2} = 0, i.e.,
A | U ~ Bernoulli(expit(α_0 + α_{u1} U_1 + α_{u2} U_2)),
Z = θ_0 + θ_a A + θ_{u1} U_1 + ε_Z,
W = µ_0 + µ_{u1} U_1 + ε_W,
Y = γ_0 + γ_a A + γ_{u1} U_1 + γ_{u2} U_2 + γ_{au1} A U_1 + ε_Y,    (29)
where α_u and γ_u have all non-zero entries.
From Theorem 2, we know that the above setup violates assumption 7(a). In addition, we do not have a proof identifying the true outcome confounding bridge function, so fitting a linear model might also be misspecified. The resulting bias under a linear bridge function can still be characterized and used in sensitivity analysis; the formula and its derivation are given in Appendix C.1.
By comparison, the bias resulting from the non-proximal g-computation estimator regressing Y onto (1, Z, W, A, AZ, AW) can be obtained as in Section 4.2.1, but we omit the formula here due to space constraints and only include this estimate in numerical experiments.
If we are not using the proximal estimator, then we would typically not adjust for the post-exposure variables Z and W. In this case, the relevant comparator is the unadjusted estimator regressing Y onto A, whose bias is derived in Appendix C.3.
Remark 2. We note that, while the proximal estimator bias formula depends only on γ_{u2}, α_{u2} (through E[AU_2]), and α_{u1} (through E[AU_1]), the unadjusted estimator bias depends additionally on the parameters γ_{u1} and γ_{au1} governing the strength of confounding introduced by U_1 in the non-proximal case.

5 Confounder-Treatment Interaction
We additionally look into a simplified case where γ_au = 0, that is, the confounder is not an effect modifier. Moreover, we assume that the analyst is aware of the lack of interaction between A and U in the true outcome model, so we consider the simplified bridge function model
h(W, A, X; b) = b_0 + b_a A + b_w^T W + b_x^T X.
This assumption allows us to more easily obtain bias formulas in the general case of multidimensional (Z, W, U, X) with (dim(Z), dim(W), dim(U), dim(X)) = (m, n, p, q), for certain relationships between m, n, p, q. For simplicity, we assume that the unobserved and observed confounders (U, X) jointly follow a multivariate normal distribution with mean 0_{p+q}, Var(U_i) = Var(X_j) = 1 for all i = 1, ..., p, j = 1, ..., q (under a potential transformation), and some appropriate PSD covariance matrix such that Cov(U, X) = ρ ∈ (−1, 1)^{p×q}.
Let us consider i.i.d. data generated according to the multivariate analogue of (18) with γ_au = 0 (LSEM (32)). The following theorem provides a formula for the proximal identification bias under a linear bridge function.
Theorem 6. If m = n < p and the matrix B^T µ_u ∈ R^{m×m} (where B is a matrix determined by the LSEM coefficients; see Appendix C.6) has full rank, then fitting a linear outcome bridge function h(W, A, X; b) = b_0 + b_a A + b_w^T W + b_x^T X under LSEM (32) yields a proximal outcome estimator bias δ given by expression (33).
A proof of Theorem 6 can be found in Appendix C.6.
Remark 3. If m = n = p and B^T µ_u has full rank, then δ = 0. If p < m or p < n, then we have a similar discussion as in [14], where we can either consider the Moore-Penrose inverse of B^T µ_u or reduce the dimensions of Z and W until they match the dimension of U.
Remark 4. Theorem 6 enables sensitivity analysis. Note that the terms E[A] and E[AX] in (33) are straightforwardly estimated from data. Thus, to perform a sensitivity analysis using the bias formula (33), it remains for the analyst to specify the parameters E[AU] (which is determined by α_u), µ_u, γ_u, and ρ. An analyst could specify a distribution over these parameters, which, via (33), would imply a distribution over δ, as each realization of the parameters drawn from the distribution would correspond to a different bias δ. The range of magnitudes of the parameters governing the strength of association between U and the other variables might be chosen based on the range of magnitudes of associations between the observed variables. And probabilities of zero components in θ_u and µ_u could determine the proportion of components of U that contribute to bias from U-irrelevance.
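A minimal Python sketch of this scheme is given below, assuming the analyst supplies a `delta_formula` callable implementing (33); the function names, dimensions, and parameter ranges are illustrative choices of ours, not the paper's.

```python
import numpy as np

def sensitivity_distribution(delta_formula, e_a, e_ax,
                             p=2, n_draws=10_000, seed=0):
    """Monte Carlo sensitivity analysis per Remark 4.

    delta_formula: callable implementing the bias formula (33); it maps
        (E[AU], mu_u, gamma_u, E[A], E[AX], rho) to a bias delta.
    e_a, e_ax: E[A] and E[A X], estimated directly from the data.
    Returns quantiles of the implied distribution of the bias delta."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        e_au    = rng.uniform(-0.3, 0.3, size=p)   # E[A U], set by alpha_u
        mu_u    = rng.uniform(-1.0, 1.0, size=p)   # U -> W strengths
        gamma_u = rng.uniform(-1.0, 1.0, size=p)   # U -> Y strengths
        gamma_u *= rng.binomial(1, 0.8, size=p)    # zeros: U-irrelevant parts
        rho     = rng.uniform(-0.5, 0.5, size=p)   # Cov(U, X), with q = 1 here
        draws[i] = delta_formula(e_au, mu_u, gamma_u, e_a, e_ax, rho)
    return np.quantile(draws, [0.05, 0.5, 0.95])
```

The returned quantiles of δ can then be reported alongside the point estimate of the ACE.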

6 Numerical Experiments
In this section, we provide numerical examples to illustrate how the bias of different estimators (proximal and non-proximal) varies with the strength and direction of associations between unobserved U and (A, Y, Z, W). Due to the relatively large number of parameters involved in the bias formulas, we fix the values α_0 = γ_0 = θ_0 = µ_0 = 0, θ_a = θ_{u1} = 1, µ_{u1} = 0.5, γ_{u1} = 1, γ_{au1} = 1.5, similar to the simulation DGP in [8]. We then analyze bias sensitivity to different values of (α_{u2}, θ_{u2}, µ_{u2}, γ_{u2}), which encode how strongly the proximal identification assumptions are violated in the presence of U_2. The numerical results and plots in this discussion were produced in Mathematica.

6.1 Completeness Violation: Association between Negative Controls through U = (U_1, U_2)
6.1.1 Comparison of Estimators for θ_{u1}, θ_{u2}, µ_{u1}, µ_{u2} > 0
This is the case considered in Theorem 4(i), where it is shown that the bias of the unadjusted estimator always exceeds that of the proximal estimator. Figures 5-8 illustrate the change in absolute bias for each of the three estimators. In all figures, we use the following notation: (1) the solid black curve ("PI") corresponds to the (absolute) proximal estimator bias; (2) the dashed curve ("Unadj") corresponds to the (absolute) unadjusted estimator bias from regressing Y on A; (3) the dot-dashed curve ("OR") corresponds to the (absolute) adjusted estimator bias from regressing Y on (1, Z, W, A, AZ, AW). Consistent with Theorem 4(i), the proximal estimator always outperforms the unadjusted estimator. However, we also note that there exist settings (α_{u1}, θ_{u2}, µ_{u2}) for which the non-proximal adjusted estimate is (significantly) less biased than the proximal one. Although standard criteria for variable adjustment do not usually include post-exposure covariates in the adjustment set [17], the strong underlying association between U and the observed proxies (Z, W) might actually help mitigate bias through adjustment.

6.1.2 Comparison of Estimators for Different Directions of the Association Products θ_{u1}µ_{u1}, θ_{u2}µ_{u2}
Same sign of θ_{u1}µ_{u1} and θ_{u2}µ_{u2}: Figure 9 illustrates the change in absolute bias for each of the three estimators relative to the value of θ_{u1}, where it is assumed that µ_{u2} = θ_{u2} in all cases. We observe that the absolute unadjusted bias is always greater than the proximal estimator bias, which is consistent with the result in Theorem 4.
Comparisons with the adjusted estimator are not as straightforward, but there seem to exist threshold values of α_{u1} which determine whether the adjusted estimator bias ever exceeds the unadjusted one (such as in Figures 9(a) and (d)). Moreover, when α_{u1} > 0, there exists a threshold value of |θ_{u2}| which determines whether the proximal estimator bias exceeds the adjusted bias. The parameters used in these plots are: α_0 = γ_0 = θ_0 = µ_0 = 0, θ_a = θ_{u1} = 1, γ_{u1} = 1, γ_{au1} = 1.5, µ_{u1} = 0.5, γ_a = 0.5.
Opposite directions of association γ_{u1}, γ_{u2}: Figure 12 illustrates the change in absolute bias for the proximal and unadjusted estimators relative to the value of α_{u2}. The distributions of bias appear almost shifted by translation. We observe a reversal, relative to the case γ_{u1}, γ_{u2} > 0, in which estimator has less bias.

7 Discussion
By deriving bias formulas for proximal inference estimators under violations of completeness and U-relevance in a LSEM, we begin to gain some insight into the sensitivity of the proximal inference estimator to these sources of bias. For example, under some settings, it is possible for completeness violations alone, without any failure of U-relevance (i.e., too many common causes of the negative control exposure and negative control outcome), to lead the proximal inference estimator to be arbitrarily more biased than an unadjusted estimator completely subject to unobserved confounding (see Figure 10). However, under a LSEM, if the unobserved confounder leads to a positive association between the negative control exposure and negative control outcome, then under a completeness violation with full U-relevance the proximal inference estimator is guaranteed to perform better than an unadjusted one, no matter how strong the completeness violation (as shown in Theorem 4). A tentative rule of thumb for the design of proximal inference studies, pending additional evidence from other data generating processes, is therefore to select negative control exposures and outcomes that are positively associated. Additionally, as discussed in Remark 4, we can use our bias formula results (in particular, (33)) to devise schemes for sensitivity analyses of proximal inference studies. While (33) was derived under the strong assumptions that the data generating process is a LSEM and U is not an effect modifier, an analyst might reasonably conduct a sensitivity analysis using (33) as we described even if they did not believe their data were generated by a LSEM (and did not construct their proximal inference estimators under that assumption), and even if they did not believe that U is not an effect modifier. There is a long history of unrealistic simplifying assumptions in sensitivity analysis. For example, [11] and [18] both assume a one-dimensional binary confounder for tractable sensitivity analysis of no unobserved confounding. Later, [4] developed an approach that made far fewer restrictions. We are in the early stages of proximal inference, so we currently need to settle for sensitivity analyses that make strong simplifying assumptions.
Coefficients of q: We have that P[A | X, U] = 1/(1 + exp{(−1)^A (α_0 + α_x X + α_u U)}), such that (35) implies a system of equations for the coefficients of q. Since Z | U, A, X ~ N(θ_0 + θ_a A + θ_u U + θ_x X, 1), we obtain one such equation for each of A = 0, 1; assigning the values A = 0, 1 and solving the resulting system yields the coefficients of q.
B.1 Proof of Theorem 2: We construct a counterexample function g(U) and prove that E[g(U) | Z = z, A = a, X = x] = 0 for any values z, a, x. Expanding this conditional expectation, the cross terms vanish as integrals of odd functions in u_2, and we obtain E[g(U) | Z = z, A = a, X = x] = 0 for any z, a, x. However, we clearly do not have g(U) ≡ 0 a.s., so completeness assumption 7(a) does not hold.

C Bias computations
C.1 Computing the (asymptotic) bias obtained through the method of moments estimator under setup (29)
We compute the asymptotic bias obtained from the method of moments solver using bridge function h(W, A, 0; b) and the vector function q. We define the moment restrictions and, using E[U_1 U_2] = 0, express the coordinates of E[h(D; θ)] = (m_1, m_2, m_3, m_4, m_5) in terms of the model parameters. We then obtain the estimated bridge function parameters; the integrals involved cannot be computed in closed form but can be obtained numerically using software such as Mathematica or Maple once we provide the values of α_0 and α_u.
C.2 Computing the (asymptotic) bias obtained through the method of moments estimator under setup (24)
We compute the asymptotic bias obtained from the method of moments solver using bridge function h(W, A, 0; b). We define the moment restrictions and, using E[U_1 U_2] = 0, express the coordinates of E[h(D; θ)] = (m_1, m_2, m_3, m_4, m_5) in terms of the model parameters. We obtain the estimated bridge function parameters, and the estimated effect resulting from ĥ(W, A, 0; b), which yields the bias δ_POR; in the particular case γ_{au1} = 0, the expression simplifies. Similarly to Proof C.1, we note that the expectations M^T Y for the OLS estimator can be computed analogously, from which we obtain the linear regression estimator bias δ_OR; again, the expression simplifies for γ_{au1} = 0. The resulting expressions involve quantities S_1 and S_2 multiplying θ_a θ_{u1}; in particular, for γ_{au1} = 0, S_1 > 0 as well.
Taking the ratio of magnitudes of the two biases, we have: (IV) if θ_{u1}µ_{u1}/θ_{u2}µ_{u2} < r*, then 0 ≤ |δ_POR/δ_unadj| < 1. In particular, θ_{u1}µ_{u1}/θ_{u2}µ_{u2} = −∞ implies that the proximal estimator is unbiased (as either θ_{u2} = 0 or µ_{u2} = 0).
C.6 Computing the (asymptotic) bias under LSEM (32): We compute the asymptotic bias obtained from the method of moments solver using bridge function h(W, A, X; b) = b_0 + b_a A + b_w^T W + b_x^T X and vector function q(A, Z, X) = (1, A, Z, X). We assume the general case of multi-dimensional (U, Z, W, X) with Z ∈ R^m, W ∈ R^n, U ∈ R^p, X ∈ R^q. Throughout this section, we use the notation of Section 5.

Figure 1 contains DAGs representing each of the proxy types included in L.

Figure 1: DAGs representing the three types of variables (X, Z, W) partitioning L.

Under Assumption 6, we have E[Y(a) | U, X = x] = E[Y | U, A = a, X = x] for all a, x. The counterfactual mean E[Y(a)] can then be computed as follows:
Corollary 1.1. (Proximal g-formula) If (12) holds almost surely, then the counterfactual mean E[Y(a)], a = 0, 1 is nonparametrically identified by E[Y(a)] = E[h(W, a, X)].

Figure 2: DAG encoding the causal relationships among variables in (24), in which completeness 7(a) is violated.

Figure 3: DAG encoding the causal relationships among variables in (29), in which U-relevance is violated.

Figure 4: DAG encoding the causal relationships among variables in (18).

The integrals of the form ∫ u² / (1 + exp{−α_0 − α_{u1} u}) du cannot be computed in closed form but can be obtained numerically using software such as Mathematica or Maple once we provide the values of α_0 and α_u.
C.3 Computing the estimator bias under setup (29): Let M = (1, Z, W, A, AZ, AW). By the usual OLS formula b = (M^T M)^{−1} M^T Y, we obtain the regression coefficients and hence the estimator bias.