Adjusting for Indirectly Measured Confounding Using Large-Scale Propensity Score

Confounding remains one of the major challenges to causal inference with observational data. This problem is paramount in medicine, where we would like to answer causal questions from large observational datasets like electronic health records (EHRs) and administrative claims. Modern medical data typically contain tens of thousands of covariates. Such a large set carries hope that many of the confounders are directly measured, and further hope that others are indirectly measured through their correlation with measured covariates. How can we exploit these large sets of covariates for causal inference? To help answer this question, this paper examines the performance of the large-scale propensity score (LSPS) approach on causal analysis of medical data. We demonstrate that LSPS may adjust for indirectly measured


Introduction
Causal inference in the setting of unmeasured confounding remains one of the major challenges in observational research. In medicine, electronic health records (EHRs) have become a popular data source for causal inference, where the goal is to estimate the causal effect of a treatment on a health outcome (e.g., the effect of blood-pressure medicine on the probability of a heart attack). EHRs typically contain tens of thousands of variables, including treatments, outcomes, and many other variables, such as patient demographics, diagnoses, and measurements. Causal inference on these data is often carried out using propensity score adjustment [1]. Researchers first select confounders among the many observed variables, either manually (based on medical knowledge) or empirically. Then they estimate a propensity model using those selected variables and employ the model in a standard causal inference method that adjusts for the propensity score (the conditional probability of treatment). While this strategy is theoretically sound, in practice researchers may miss important confounders in the selection process, which leads to confounding bias, or may include variables that induce other types of bias (e.g., a "collider" or a variable that induces "M-bias").
In this paper, we study a closely related, but different, technique, known as large-scale propensity score (LSPS) adjustment [2]. LSPS fits an L1-regularized logistic regression with all pre-treatment covariates to estimate the propensity model. LSPS then uses standard causal inference methods, with the corresponding propensity scores, to estimate the causal effect. For example, LSPS might be used with matching [3,4,5] or subclassification [6].
In contrast to the traditional approach of explicitly selecting confounders, LSPS is a "kitchen-sink" approach that includes all of the covariates in the propensity model. While the L1-regularization might lead to a sparse propensity model, it is not designed to select the confounders in particular. Instead, it attempts to create the most accurate propensity model based on the available data, and LSPS diagnostics (described below) use covariate balance between treatment and control groups (i.e., that covariates are distributed similarly in the two groups) to assess whether all covariates are in fact adjusted for in the analysis, regardless of their L1-regularized coefficients.
The discussion over how many covariates to include in a propensity model is an old one [7,8,9,10,11,12,13] and considers, in the setting of imperfect information about variables, the tradeoff between including all measured confounders versus including variables that may increase bias and variance [14,15]. To address this issue, LSPS uses only pre-treatment covariates to avoid bias from mediators and simple colliders, and it uses diagnostics and domain knowledge to avoid variables highly correlated with the treatment but uncorrelated with outcomes (known as "instruments"). Including such variables can increase the variance of the estimate [14,15,16,17] and can amplify bias [18,19,20,21,22,23].
Further, researchers have studied whether LSPS may also address indirectly measured confounders [34,40,44]. The hope behind these studies is that when we adjust for many covariates, we are likely to be implicitly adjusting for the confounders that are not directly measured but are correlated with existing covariates. Hripcsak et al. [34] and Schuemie et al. [40] used LSPS to estimate the causal effect of anti-hypertension drugs, adjusting for about 60,000 covariates. An important confounder, baseline blood pressure, was not contained in most of the data sources. In the one source that did contain blood pressure, adjusting for all the other covariates but not blood pressure resulted in (nearly) balancing blood pressure between propensity-score-stratified cohorts; the resulting causal inference was identical to the one obtained when including blood pressure in the propensity model.
Based on this observation, Chen et al. [44] studied the effect of dropping large classes of variables from the LSPS analysis, using balance of the covariates between the treatment and control groups as a metric for successful adjustment (i.e., is every covariate in the propensity model balanced between the cohorts). If all the variables of one type were eliminated from the propensity model (e.g., medical diagnoses), then the inclusion of a large number of other variables (e.g., medications, procedures) resulted in the complete balancing of the missing variables. Even more striking, if all variables related to one medical area like cardiology were dropped from the model (e.g., all cardiology-related diagnoses, procedures, medications, etc.), then the rest of the covariates still balanced the dropped cardiology covariates. Yet if too few covariates were included, such as just demographics, then balance was not achieved on the other covariates. Based on these studies, LSPS appears to be adjusting for variables that are not included but are correlated with the included covariates.
In this paper, we explore conditions under which LSPS can adjust for indirectly measured confounders. In particular, we provide some theoretical assumptions under which LSPS is robust to some indirectly measured confounders. They are based on the "pinpointability" assumption used in Wang and Blei [45,46]. A variable is pinpointed by others if it can be expressed as a deterministic function of them, though the function does not need to be known. In the context of causal inference from EHR data, we show that if confounders are indirectly measured but can be pinpointed by the measured covariates, then LSPS implicitly adjusts for them. For example, if high blood pressure could conceivably be derived from the many other covariates (e.g., diagnoses, medicines, other measurements), then LSPS implicitly adjusts for high blood pressure even though it is not directly measured.
From a theoretical perspective, pinpointability is a strong and idealized assumption. But in practice, several empirical observations have shown that important confounders that are not directly measured often appear to undergo adjustment when LSPS is used. Therefore, there might be hope that some of the indirectly measured confounders are capturable by the existing covariates. We do not present LSPS as a magical solution to unmeasured confounding (the assumption is strong); rather, this work is an attempt to better understand the empirical success of LSPS in adjusting for indirectly measured confounders.
To explore this phenomenon, we use synthetic data to empirically study the sensitivity of LSPS to the degree to which pinpointability is violated. We find that under perfect pinpointability, adjusting for measured covariates removes the bias due to indirectly measured confounding. As the data deviate from pinpointability, adjusting for the measured covariates becomes less adequate.
Finally, we study real-world medical data to compare LSPS to a traditional propensity score method based on previously used manually selected confounders. We find that removing a known confounder has a bigger impact on a traditional propensity score method than on LSPS, presumably because it is indirectly measured. This finding suggests that including large-scale covariates with LSPS provides a better chance of correcting for confounders that are not directly measured.
The paper is organized as follows. Section 2 describes the LSPS algorithm, the pinpointability assumption, and the effect of pinpointability on M-structure colliders, instruments, and near-instruments. Section 3 studies the impact of violations of pinpointability on the fidelity of the estimated causal effects. Section 4 presents empirical studies comparing LSPS to classical propensity-score adjustment (with manually selected covariates) and to methods that do not adjust. Section 5 compares LSPS to other approaches to adjusting for indirectly measured confounding and makes connections to other related work. Section 6 concludes the paper.

The Large Scale Propensity Score Algorithm
In this section, we summarize the LSPS algorithm, describe an assumption under which LSPS will adjust for indirectly measured confounding and potentially mitigate the effect of adjusting for unwanted variables, and make some remarks on the assumption.

The LSPS algorithm
We summarize the LSPS algorithm [2], including the heuristics and diagnostics that normally surround it (e.g., Weinstein [25]). Consider a study where a very large number of covariates are available (e.g., over 10,000) and the problem of estimating the causal effect of a treatment. Rather than selecting confounding covariates and adjusting for them, LSPS adjusts for all of the available covariates. It uses only pre-treatment covariates to avoid adjusting for mediators and simple colliders (which induce bias), and it uses diagnostics and domain knowledge to avoid "instruments," variables that are correlated with the treatment but do not affect the outcome. (Such variables increase the variance of the causal estimate.) By design, LSPS includes all measured confounders. The hope is that in real-world data, such as in medicine, adjusting for all the other nonconfounder variables would not impart bias, and empirical comparisons to traditional propensity approaches seem to bear that out [2,24,25,26]. The further hope is that by balancing on a large number of covariates, other indirectly measured factors would also become balanced, and this is what we address in Section 2.2.
The inputs to LSPS are observed pre-treatment covariates X and binary treatment T. The output is the estimated causal effect ν. LSPS works in the following steps.
1. Remove "instruments." Remove covariates that are highly correlated with the treatment and are unlikely to be causally related to the outcome. Univariate correlation to treatment is checked numerically, and domain expertise is used to determine if the highly correlated variables are not causally related to the outcome; if the relationship is unclear, then the variable is not removed. Note these covariates are commonly called "instruments" and are used in instrumental variable analysis [47]. LSPS, however, does not do instrumental variable analysis, and removes these variables to reduce downstream variance.
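The numerical part of this screening step can be sketched as follows. This is a minimal illustration, not the LSPS implementation: the function names, the covariate names, and the correlation threshold of 0.5 are all hypothetical (the text does not give a threshold), and the flagged covariates would still need review by a domain expert before removal.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def flag_candidate_instruments(covariates, treatment, threshold=0.5):
    """Return names of covariates whose |correlation| with treatment exceeds `threshold`.

    `covariates` maps a covariate name to its column of values.
    ASSUMPTION: the threshold is illustrative, not a value from the paper.
    """
    return [name for name, column in covariates.items()
            if abs(pearson(column, treatment)) > threshold]

# Toy data with hypothetical covariate names.
treatment = [1, 1, 1, 0, 0, 0]
covariates = {
    "prior_drug_class_code": [1, 1, 1, 0, 0, 0],  # tracks treatment exactly
    "age_over_65": [1, 0, 1, 0, 1, 0],            # weakly related to treatment
}
flagged = flag_candidate_instruments(covariates, treatment)
```

Only the numerical check is automated here; whether a flagged covariate is causally unrelated to the outcome remains a judgment call, as described in step 1.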

2. Fit the propensity model and calculate propensity scores. Given the remaining covariates, fit an L1-regularized logistic regression [48] to estimate propensity scores p(t | x). The regression is
$$p(t = 1 \mid x) = \frac{1}{1 + \exp(-\theta^\top x)},$$
where θ is the vector of the regression parameters. L1-regularized logistic regression minimizes
$$-\sum_{i} \left[ t_i \log p(t = 1 \mid x_i) + (1 - t_i) \log\{1 - p(t = 1 \mid x_i)\} \right] + \lambda \lVert \theta \rVert_1,$$
where λ is the tuning parameter that controls the strength of the L1 penalty.
LSPS uses cross-validation to select the best regularization parameter λ. It then refits the regression model on the entire dataset with the selected regularization parameter. Finally, it uses the resulting model to extract the propensity scores for each datapoint.
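To make the fitting step concrete, the sketch below fits an L1-regularized logistic regression with proximal gradient descent (soft-thresholding) in plain Python. It is an illustration under stated assumptions, not the solver used by LSPS: the function names, step size, and iteration count are hypothetical, the loss is averaged over datapoints (which only rescales λ), the intercept is left unpenalized, and cross-validation over λ is omitted.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(value, amount):
    # Proximal operator of the L1 penalty: shrink toward zero by `amount`.
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(X, t, lam, step=0.5, n_iter=2000):
    """Minimise mean logistic loss + lam * ||theta||_1 (intercept unpenalised)."""
    n, m = len(X), len(X[0])
    theta, intercept = [0.0] * m, 0.0
    for _ in range(n_iter):
        # Residuals p_i - t_i under the current parameters.
        resid = [sigmoid(intercept + sum(th * x for th, x in zip(theta, row))) - ti
                 for row, ti in zip(X, t)]
        grad_intercept = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(m)]
        intercept -= step * grad_intercept
        # Gradient step followed by soft-thresholding yields exact zeros.
        theta = [soft_threshold(th - step * g, step * lam) for th, g in zip(theta, grad)]
    return intercept, theta

def propensity_scores(X, intercept, theta):
    return [sigmoid(intercept + sum(th * x for th, x in zip(theta, row))) for row in X]

# Toy data: the first covariate predicts treatment, the second does not.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
t = [1, 1, 1, 0, 0, 0, 0, 1]
intercept, theta = fit_l1_logistic(X, t, lam=0.01)
scores = propensity_scores(X, intercept, theta)
```

With a large penalty every coefficient is shrunk to exactly zero, while a small penalty keeps the informative covariate; in practice λ would be chosen by cross-validation as described above.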
3. Check the equipoise of the propensity model. In this step, LSPS assesses whether the conditional distribution of assignment given by the propensity model is too certain, i.e., whether the treatment and control groups are too easily distinguishable. The reason is that a propensity model that gives assignment probabilities close to zero or one leads to high-variance estimates [15], e.g., because it is difficult to match datapoints or create good subclasses.
To assess this property of the propensity model, LSPS performs the diagnostic test of Walker et al. [49]. This diagnostic assesses the overlapping support of the distribution of the preference score, which is a transformation of the propensity score (see footnote 1), on the treatment and control groups. If there is overlapping support (at least half the mass with preference between 0.3 and 0.7), then the study is said to be in equipoise. If a study fails the diagnostic, then the analyst considers whether an instrument has been missed and removes it, or interprets the results with caution.
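The equipoise diagnostic can be sketched as below. The preference score follows the standard definition from Walker et al. (the logit of the preference score is the logit of the propensity score minus the logit of the treated fraction). Applying the "at least half between 0.3 and 0.7" rule separately to each treatment arm is our reading of the diagnostic, and the function names are hypothetical. Propensity scores must lie strictly between 0 and 1.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def preference_score(propensity, treated_fraction):
    """Preference score F, defined by logit(F) = logit(S) - logit(P)."""
    z = logit(propensity) - logit(treated_fraction)
    return 1.0 / (1.0 + math.exp(-z))

def in_equipoise(propensities, treatment, low=0.3, high=0.7, min_share=0.5):
    """True if, in each arm, at least `min_share` of units have preference in [low, high].

    ASSUMPTION: the rule is checked per treatment arm; both arms must be non-empty.
    """
    treated_fraction = sum(treatment) / len(treatment)
    for arm in (0, 1):
        prefs = [preference_score(p, treated_fraction)
                 for p, t in zip(propensities, treatment) if t == arm]
        share = sum(low <= f <= high for f in prefs) / len(prefs)
        if share < min_share:
            return False
    return True

balanced_study = in_equipoise([0.50, 0.45, 0.55, 0.50], [1, 0, 1, 0])
separated_study = in_equipoise([0.99, 0.01, 0.98, 0.02], [1, 0, 1, 0])
```

A unit whose propensity score equals the overall treated fraction has a preference score of 0.5, which is why the preference score is easier to compare across studies with different treatment prevalence.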

4. Match or stratify the dataset based on propensity scores and then check covariate balance. Matching [3,4,5] or subclassification [6] on propensity scores can be used to create groups of individuals who are similar. The details about the two methods are provided in Supplementary S1. The remainder of the algorithm is explained in the context of 1-to-1 matching.
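For orientation, one common way to do 1-to-1 matching is greedy nearest-neighbor matching on the propensity score without replacement. The sketch below is an assumption on our part, since the matching details are deferred to the supplement; the function name and the caliper of 0.05 are hypothetical.

```python
def greedy_match(propensity, treatment, caliper=0.05):
    """Greedy 1-to-1 nearest-neighbour matching on the propensity score.

    Each treated unit is paired with the closest unused control whose score is
    within `caliper`; unmatched units are dropped.
    Returns a list of (treated_index, control_index) pairs.
    ASSUMPTION: greedy order and caliper value are illustrative choices.
    """
    treated = [i for i, t in enumerate(treatment) if t == 1]
    controls = {i for i, t in enumerate(treatment) if t == 0}
    pairs = []
    for i in treated:
        if not controls:
            break
        # Ties in distance are broken by the smaller control index.
        j = min(controls, key=lambda c: (abs(propensity[c] - propensity[i]), c))
        if abs(propensity[j] - propensity[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs

pairs = greedy_match([0.30, 0.31, 0.70, 0.72, 0.10], [1, 0, 1, 0, 0])
```

In this toy example the control with score 0.10 has no treated unit within the caliper and is left out of the matched dataset.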
Footnote 1: Define the preference score Pref as a transformation of the propensity score p(t = 1 | x) that adjusts for the probability of treatment p(t = 1),
$$\ln\frac{\mathrm{Pref}}{1 - \mathrm{Pref}} = \ln\frac{p(t = 1 \mid x)}{1 - p(t = 1 \mid x)} - \ln\frac{p(t = 1)}{1 - p(t = 1)}.$$
Once the matched groups are created, balance is assessed by computing the standardized mean difference (SMD) of each covariate between the treated group and the control group from the matched dataset,
$$\mathrm{SMD} = \frac{\bar{x}_{t=1} - \bar{x}_{t=0}}{\sqrt{(\sigma^2_{t=1} + \sigma^2_{t=0})/2}},$$
where $\bar{x}_{t=1}$ and $\bar{x}_{t=0}$ are the means of the covariate in the treated and the control group respectively, and $\sigma^2_{t=1}$ and $\sigma^2_{t=0}$ are the variances of the covariate in the treated and the control group respectively. Following Austin [50], if any covariate has an SMD over 0.1, then the comparison is said to be out of balance, and the study needs to be discarded (or interpreted with caution).
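The balance check can be written in a few lines. This sketch assumes sample variances (the text does not say whether sample or population variances are used) and hypothetical function and covariate names; the 0.1 threshold is the one stated above.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treated_values, control_values):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    diff = mean(treated_values) - mean(control_values)
    pooled_sd = math.sqrt((variance(treated_values) + variance(control_values)) / 2.0)
    if pooled_sd == 0:
        # Both groups are constant: balanced only if the constants agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

def unbalanced_covariates(treated, control, threshold=0.1):
    """Names of covariates whose |SMD| exceeds `threshold`."""
    return [name for name in treated
            if abs(standardized_mean_difference(treated[name], control[name])) > threshold]

# Toy matched dataset with hypothetical binary covariates.
treated = {"diabetes": [1.0, 0.0, 1.0, 0.0], "male": [1.0, 1.0, 0.0, 0.0]}
control = {"diabetes": [0.0, 0.0, 1.0, 0.0], "male": [1.0, 0.0, 1.0, 0.0]}
out_of_balance = unbalanced_covariates(treated, control)
```

Here "male" has identical means in both groups (SMD of 0), while "diabetes" has an SMD of about 0.46 and would flag the comparison as out of balance.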

5. Estimate the causal effect. The last step is to use the matched data to estimate the causal effect. In the simulations in Section 3, the causal effect of interest is the average treatment effect
$$\mathrm{ATE} = \mathbb{E}\left[ Y_i(1) - Y_i(0) \right],$$
where $Y_i(1)$ and $Y_i(0)$ are the potential outcomes for a subject under treatment and under control.
To estimate the ATE using matched data, a linear regression is fitted on the matched data with a treatment indicator variable. The coefficient of the treatment indicator is the average treatment effect. When subclassification is used to create balanced subclasses, the effect is estimated within each subclass and then aggregated across subclasses. The weight for each subclass is proportional to the total number of individuals in each subclass.
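Both estimators reduce to simple arithmetic. With a single binary treatment indicator and an intercept, the least-squares coefficient equals the difference in group means, so the sketch below uses that identity instead of a regression library. The function names are hypothetical, and every subclass is assumed to contain both treated and control units.

```python
from statistics import mean

def difference_in_means(outcome, treatment):
    """Equals the OLS coefficient on a binary treatment indicator (with intercept)."""
    treated = [y for y, t in zip(outcome, treatment) if t == 1]
    control = [y for y, t in zip(outcome, treatment) if t == 0]
    return mean(treated) - mean(control)

def stratified_ate(outcome, treatment, subclass):
    """Average of within-subclass effects, weighted by subclass size."""
    total = len(outcome)
    estimate = 0.0
    for s in sorted(set(subclass)):
        idx = [i for i, k in enumerate(subclass) if k == s]
        effect = difference_in_means([outcome[i] for i in idx],
                                     [treatment[i] for i in idx])
        estimate += (len(idx) / total) * effect
    return estimate

# Toy data: the within-subclass effects are 2 and 3; the subclasses are equal in size.
outcome = [3.0, 1.0, 1.0, 7.0, 7.0, 4.0]
treatment = [1, 0, 0, 1, 1, 0]
subclass = [0, 0, 0, 1, 1, 1]
pooled = difference_in_means(outcome, treatment)
stratified = stratified_ate(outcome, treatment, subclass)
```

In the toy data the pooled difference in means is inflated because treated units are concentrated in the subclass with higher outcomes, whereas the stratified estimate averages the two within-subclass effects.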
In the empirical studies of Section 4, the causal effect of interest is the hazard ratio (the outcome Y is time to event). When matching is used, we fit a Cox proportional hazards model [51] on the matched dataset to estimate the hazard ratio. When subclassification is used, we fit a Cox model within each subclass and then weight the conditional hazard ratio by the size of the subclass to obtain the marginal hazard ratio. More details are provided in Supplementary S2.

Adjusting for indirectly measured confounders
As noted in the introduction, LSPS has been found to adjust for known but indirectly measured confounders [34,40,44]. We describe here an assumption under which LSPS will adjust for indirectly measured confounding.
Consider the causal graph in Fig. 1 for an individual i (the subscript is omitted in the graph), where T is the treatment, Y is the outcome, X is a vector of observed pre-treatment covariates with length M (which includes observed confounders and other variables), and U ∈ R is the indirectly measured confounder. The goal is to estimate the causal effect of treatment T on the outcome Y. To do so, we need to adjust for both pre-treatment covariates X (including directly measured confounders) and U (indirectly measured confounder). We assume that there are no other unmeasured confounders.
In the following sections, we will demonstrate that LSPS can still produce unbiased causal estimates even in the presence of indirectly measured confounders. We first introduce Assumption 1, which indicates the relationship between measured covariates and indirectly measured confounders.
Assumption 1 (Pinpointability). The indirectly measured confounder U is pinpointed by the measured covariates X if
$$p(u \mid x) = \delta(f(x)),$$
where δ(·) denotes a point mass at f(·).
In other words, the indirectly measured confounder U can be represented by a deterministic function f of the measured covariates X. Theorem 1, building upon Assumption 1, formally states the conditions that LSPS needs in order to obtain unbiased causal estimates by only conditioning on measured covariates.We use the potential outcome framework by Rubin [52].
Let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes under treatment and under control, respectively, for an individual i.
Theorem 1. The treatment and the potential outcomes are independent conditioning on all confounders, both the directly measured (X) and indirectly measured (U),
$$T \perp\!\!\!\perp Y(1), Y(0) \mid X, U.$$
Under the pinpointability assumption, the above conditional independence can be reduced to only conditioning on the measured covariates,
$$T \perp\!\!\!\perp Y(1), Y(0) \mid X.$$
In other words, the causal effect of the treatment on the outcome is identifiable by only adjusting for the measured covariates X. We do not need to know the indirectly measured confounder U or its functional form f(·).
Proof. Theorem 1 relies on the marginalization over U in computing the propensity score using high-dimensional measured covariates,
$$p(t \mid x) = \int p(t \mid u, x)\, p(u \mid x)\, du = \int p(t \mid u, x)\, \delta(f(x))\, du = p(t \mid u^*, x),$$
where u* = f(x).
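A small exact calculation illustrates the theorem. In this hypothetical example (all numbers are ours, not from the paper), two binary covariates are uniformly distributed, the confounder is their XOR, treatment probability depends only on the confounder, and the true treatment effect is 2. Because U = f(X) is constant within each stratum of X, standardizing over X alone recovers the true effect, while the unadjusted contrast does not.

```python
from itertools import product

def p_treat(u):
    # P(T = 1 | U = u); hypothetical values.
    return 0.8 if u == 1 else 0.2

def mean_outcome(t, u):
    # E[Y | T = t, U = u] with a true treatment effect of 2; hypothetical values.
    return 2.0 * t + 3.0 * u

def f(x):
    # Pinpointing function: U is the XOR of the two measured covariates.
    return x[0] ^ x[1]

# X is uniform on {0,1}^2: each cell has probability 0.25.
cells = [(x, 0.25) for x in product((0, 1), repeat=2)]

def naive_difference():
    """E[Y | T=1] - E[Y | T=0], ignoring all covariates."""
    result = 0.0
    for t, sign in ((1, 1.0), (0, -1.0)):
        weights = [(p_x * (p_treat(f(x)) if t == 1 else 1.0 - p_treat(f(x))), x)
                   for x, p_x in cells]
        norm = sum(w for w, _ in weights)
        result += sign * sum(w * mean_outcome(t, f(x)) for w, x in weights) / norm
    return result

def x_adjusted_difference():
    """Standardise over X only; E[Y | T=t, X=x] = E[Y | T=t, U=f(x)] under pinpointing."""
    return sum(p_x * (mean_outcome(1, f(x)) - mean_outcome(0, f(x))) for x, p_x in cells)

naive = naive_difference()
adjusted = x_adjusted_difference()
```

The unadjusted contrast is 3.8, while adjusting for X alone gives 2.0, the true effect, even though U never enters the adjustment set and f is never used by the analyst (it appears in the code only to generate the example).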
When U is weakly pinpointed by X, or in other words, f(X) measures U with error, the average treatment effect is not point identifiable. Assuming identification holds conditional on the unmeasured confounder, Ogburn and VanderWeele [53] show that the average treatment effect adjusting for the noisy measured confounder is between the unadjusted and the true effects, under some monotonicity assumptions. We extend the work by Ogburn and VanderWeele [53] to conditions where additional confounders and covariates exist.
Theorem 2. Let T be a binary treatment, Y be an outcome, $X_C$ be all measured confounders, U be an ordinal unmeasured confounder, and U′ be the noisy measurement of U. Assume that the measurement error of U is tapered and nondifferential with respect to T and Y conditional on $X_C$, and that the required monotonicity conditions hold over the support of $X_C$. Then the average treatment effect adjusting for the measured covariates lies between the true effect and the effect adjusting for only the measured confounders, that is,
$$\mathrm{ATE}_{\mathrm{true}} \le \mathrm{ATE}_{\mathrm{cov}} \le \mathrm{ATE}_{\mathrm{conf}} \quad \text{or} \quad \mathrm{ATE}_{\mathrm{true}} \ge \mathrm{ATE}_{\mathrm{cov}} \ge \mathrm{ATE}_{\mathrm{conf}}.$$
Theorem 2 states that the average treatment effect adjusting for all measured covariates (i.e., by LSPS) is bounded between the true effect and the effect adjusting for only the measured confounders. Nondifferential misclassification error means that
$$p(U' = u' \mid U = u, T = t, Y = y, X_C = x_C) = p(U' = u' \mid U = u, T = t', Y = y', X_C = x_C)$$
for all y, y′ ∈ 𝒴 and t, t′ ∈ {0, 1}. Tapered misclassification error means that the misclassification probabilities $p_{ij} = p(U' = i \mid U = j)$, i, j ∈ {1, ..., K}, are nonincreasing in both directions away from the true unmeasured confounder.
We show that Theorem 1 and Theorem 2 hold in the simulations with various degrees of pinpointability. The proof of Theorem 2 is an extension of the work by Ogburn and VanderWeele [53], and is given in the Supplement.
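The bounding behavior can be illustrated exactly in the simplest setting of Ogburn and VanderWeele: one binary confounder U, no other measured confounders (so the effect adjusting for only the measured confounders coincides with the unadjusted contrast), and a noisy copy U′ that is flipped with a fixed probability. All numbers below are hypothetical and chosen only for illustration; the true effect is 2.

```python
def adjusted_for_noisy_u(error_rate):
    """ATE estimate after standardising over U', a copy of binary U flipped
    with probability `error_rate` (nondifferential with respect to T and Y)."""
    p_u = {0: 0.5, 1: 0.5}       # P(U = u)
    p_t1 = {0: 0.2, 1: 0.8}      # P(T = 1 | U = u)

    def y_mean(t, u):            # E[Y | T = t, U = u], true effect = 2
        return 2.0 * t + 3.0 * u

    estimate = 0.0
    for u_obs in (0, 1):
        # joint[(t, u)] = P(T = t, U = u, U' = u_obs)
        joint = {}
        for t in (0, 1):
            for u in (0, 1):
                p_t = p_t1[u] if t == 1 else 1.0 - p_t1[u]
                p_obs = 1.0 - error_rate if u_obs == u else error_rate
                joint[(t, u)] = p_u[u] * p_t * p_obs
        p_u_obs = sum(joint.values())
        # cond[t] = E[Y | T = t, U' = u_obs]
        cond = {}
        for t in (0, 1):
            norm = joint[(t, 0)] + joint[(t, 1)]
            cond[t] = sum(joint[(t, u)] * y_mean(t, u) for u in (0, 1)) / norm
        estimate += p_u_obs * (cond[1] - cond[0])
    return estimate

exact = adjusted_for_noisy_u(0.0)        # U' = U: perfect measurement
partial = adjusted_for_noisy_u(0.2)      # U' is a noisy copy of U
uninformative = adjusted_for_noisy_u(0.5)  # U' carries no information about U
```

With no measurement error the estimate equals the true effect of 2; with a completely uninformative U′ it equals the unadjusted contrast of 3.8; and for intermediate error rates it lies strictly between the two.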

Effect of pinpointing on instruments and M-bias
Because LSPS uses a large number of covariates, there is a concern that adjusting for these covariates will induce bias due to M-structure colliders, instrumental variables (IVs), and near instrumental variables (near-IVs). As noted above, our goal is not to do instrumental variable analysis but rather to remove their potential effect of increasing variance and amplifying bias.
IVs are addressed in part by domain knowledge and diagnostics, but some IVs may remain. In this section, we discuss how LSPS in the setting of pinpointing may address them.

Effect on IV and near-IV
Instrumental variables [47] may persist despite LSPS's procedures. In the setting of unmeasured confounding, IVs can cause bias amplification, as shown numerically [54,55] and proved theoretically in various scenarios [18,19,20,21,22,23]. Insofar as pinpointing adjusts for indirectly measured confounding (Fig. 2a), even if there are IVs in the propensity score model, they will not produce bias amplification [22].
Near-instrumental variables (near-IVs) [15], which are weakly related to the outcome and strongly related to the treatment, may also lead to bias amplification [15,18,22], and the bias amplification or the confounding may dominate. Just as for IVs, pinpointing (Fig. 2b) may reduce bias amplification by reducing indirectly measured confounding [22], while the confounding is eliminated by adjusting for the near-IV.

Addressing M-bias
Despite the use of pre-treatment variables, bias through colliders is still possible due to causal structures like the one in Fig. 2c, known as an M-structure, causing M-bias. In this case, two unobserved underlying causes create a path from T to Y via a collider that can precede T in time. If the collider is included in the many covariates, then this can induce bias.
LSPS may be able to address M-bias in the following way. If the common cause between the treatment and the collider ($Z_1$) can be pinpointed by the measured covariates, then this will block the back-door path from T to Y.
Similarly, the common cause between the outcome and the collider ($Z_2$) could be pinpointed, also blocking the path. The assertion that one or both of these common causes is pinpointed is similar to the assertion that U is pinpointed.

Simulations
We use simulation to show that, under the assumption of pinpointability, LSPS can adjust for the indirectly measured confounding, and that as the condition deviates from pinpointability, bias in LSPS increases, with the LSPS estimate lying between the true effect and the effect adjusting for only measured confounders. In this simulation, we assume that the large number of covariates X are derived from a smaller number of underlying latent variables V. This data generating process induces dependencies among the measured covariates, mimicking the dependencies observed in EHR data.

Simulation setup
Each simulated data set contains N = 5,000 patients, M = 100 measured covariates (including 10 measured confounders), 1 indirectly measured confounder, 10 latent variables, a treatment, and an outcome. The data are $\{v_i, x_i, u_i, t_i, y_i\}_{i=1}^{N}$, where $v_i$ is a vector of latent variables, $v_i = (v_{i1}, \ldots, v_{i10})$; $x_i$ is a vector of measured covariates, $x_i = (x_{i1}, \ldots, x_{iM})$; and $u_i$, $t_i$, and $y_i$ are all scalar, representing the indirectly measured confounder, treatment, and outcome respectively. The indirectly measured confounder is simulated as a function of the measured covariates. When the function is stochastic, U is weakly pinpointed by X. The degree of pinpointability is varied by adding varying amounts of noise to the function.
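1. Simulate the latent variable $v_i$ as $v_i \sim \mathrm{Bernoulli}(0.5)^K$.
2. Simulate the measured covariates $x_i$ from the latent variables $v_i$.
3. Simulate the indirectly measured confounder $u_i$ as
$$u_i = x_i^\top \beta_u,$$
where $\beta_u \sim \mathcal{N}(0, 1)^M$. Notice that u is a deterministic function of x. To allow only a small subset of the covariates to pinpoint u, we randomly select 90% of the $\beta_u$ and set their value to 0.
4. Simulate the treatment $t_i$ as
$$t_i \sim \mathrm{Bernoulli}\left(\mathrm{sigmoid}(x_i^\top \gamma_x + u_i \gamma_u)\right),$$
where the effect of the indirectly measured confounder on the treatment is $\gamma_u = 1$, and $\gamma_x \sim \mathcal{N}(0.5, 1)$ for the 10% of covariates that serve as measured confounders. The rest of the $\gamma_x$ are set to 0.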

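5. Simulate the outcome $y_i$ with conditional mean
$$\mathbb{E}[y_i \mid t_i, x_i, u_i] = \nu t_i + x_i^\top \eta_x + u_i \eta_u,$$
where the true causal effect is ν = 2, the effect of the indirectly measured confounder on the outcome is $\eta_u = 1$, and $\eta_x \sim \mathcal{N}(0.5, 1)$ for the covariates that serve as measured confounders and 0 otherwise.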
The above steps illustrate the simulation under pinpointability. To increase the deviation from pinpointability, we add an increasing amount of random noise to the indirectly measured confounder. To do so, we modify the simulation of $u_i$ in Step 3 to be
$$u_i = x_i^\top \beta_u + \epsilon_i,$$
where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. To increase deviation from pinpointability, we increase $\sigma^2$ from $10^{-4}$ to $10^{4}$. At each pinpointability level, the Gaussian noise is nondifferential with respect to the treatment, outcome, and the measured covariates, and tapered (decreasing in both directions away from the true unmeasured confounder). We simulate 50 datasets at each pinpointability level.
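The data-generating process can be sketched in plain Python as follows. Several pieces are assumptions on our part because they are not specified in the text above: how the covariates are generated from the latent variables (here, a Bernoulli draw with a sigmoid of a random linear combination), which covariates are the measured confounders (here, the first 10%), the Bernoulli-sigmoid form of the treatment model, and the unit-variance Gaussian noise on the outcome. The function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def simulate(n=5000, m=100, k=10, sigma2=0.0, nu=2.0, seed=0):
    """Return (rows, beta_u); sigma2 = 0 gives exact pinpointability."""
    rng = random.Random(seed)
    # ASSUMPTION: covariates load on the latent variables through random weights.
    loadings = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
    # Keep 10% of beta_u; the other 90% are set to zero.
    kept = set(rng.sample(range(m), m // 10))
    beta_u = [rng.gauss(0.0, 1.0) if j in kept else 0.0 for j in range(m)]
    # ASSUMPTION: the first 10% of covariates are the measured confounders.
    n_conf = m // 10
    gamma_x = [rng.gauss(0.5, 1.0) if j < n_conf else 0.0 for j in range(m)]
    eta_x = [rng.gauss(0.5, 1.0) if j < n_conf else 0.0 for j in range(m)]
    gamma_u = eta_u = 1.0
    rows = []
    for _ in range(n):
        v = [int(rng.random() < 0.5) for _ in range(k)]                   # step 1
        x = [int(rng.random() < sigmoid(dot(w, v))) for w in loadings]    # step 2
        noise = rng.gauss(0.0, math.sqrt(sigma2)) if sigma2 > 0 else 0.0
        u = dot(beta_u, x) + noise                                        # step 3
        t = int(rng.random() < sigmoid(dot(gamma_x, x) + gamma_u * u))    # step 4
        # ASSUMPTION: unit-variance Gaussian outcome noise.
        y = nu * t + dot(eta_x, x) + eta_u * u + rng.gauss(0.0, 1.0)      # step 5
        rows.append({"x": x, "u": u, "t": t, "y": y})
    return rows, beta_u

rows, beta_u = simulate(n=50, sigma2=0.0, seed=1)
```

Calling `simulate` with increasing `sigma2` produces datasets that deviate progressively from pinpointability while leaving the rest of the process unchanged.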

Statistical analysis
We demonstrate LSPS's capacity for adjusting for unmeasured confounding relative to other methods under varied degrees of pinpointability. Specifically, we compared the following five methods:
• unadjusted: no covariate was adjusted for.
• manual without U: adjust for all confounders not including U.
• manual (oracle): adjust for all confounders including U.
• LSPS without U: adjust for all measured covariates not including U.
• LSPS: adjust for all covariates including U.
Notice that manual with U coincides with oracle in this simulation, because manual with U adjusts for nothing but the confounders, both measured and unmeasured. In practice, because the confounding structure is rarely known, it is unlikely that a manual method captures all the confounders.
For the four methods that adjust for confounders, we estimated propensity scores with L1-regularized logistic regression (and selected the regularization parameter with cross-validation). We then used 1:1 matching and subclassification (results not shown) to create a balanced dataset. To estimate the average treatment effect, we fit a linear regression model on the balanced dataset. We then calculated the mean, 95% confidence interval, and root-mean-squared error (RMSE) of the effect estimates.
The RMSE, defined as follows, can be calculated because the true treatment effect is known in the simulation (ν = 2):
$$\mathrm{RMSE} = \sqrt{\frac{1}{S} \sum_{s=1}^{S} (\hat{\nu}_s - \nu)^2},$$
where $\hat{\nu}_s$ is the effect estimate at simulation s, and we simulate a total of S = 50 datasets at each given pinpointability condition. This metric is not applicable to the empirical study because the true effect of medications is unknown in practice.
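As a small worked example of the RMSE, the sketch below evaluates it for a handful of made-up effect estimates against the true effect ν = 2. The estimates are illustrative only and are not simulation results.

```python
import math

def rmse(estimates, true_effect):
    """Root-mean-squared error of effect estimates around the known true effect."""
    return math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / len(estimates))

# Hypothetical estimates from four simulated datasets (not results from the paper).
example_rmse = rmse([2.5, 1.5, 2.0, 2.0], true_effect=2.0)
```

The RMSE combines bias and variance: estimates that are unbiased on average but scattered, and estimates that are tightly clustered around the wrong value, can both produce a large RMSE.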

Results
Fig. 4 shows the results of the simulation. When pinpointability holds reasonably well, the unmeasured confounder has a bigger impact on the manual method than on LSPS. As pinpointability gets weaker, adjusting for a large set of covariates becomes less adequate for accounting for the unmeasured confounder, and the estimated treatment effect approaches the estimate from the method adjusting for only the measured confounders.
Under strong pinpointability, the estimates from the two large-scale approaches (LSPS without U and LSPS) have almost the same bias, variance, and RMSE as the estimate from the oracle. The manual approach (manual without U) does not benefit from pinpointability because it does not include covariates that assist pinpointing. As the condition deviates from strong pinpointability, the large-scale approach (LSPS without U) becomes increasingly biased, approaching and eventually overlapping the estimate from the manual without U approach. This simulation result matches Theorem 2, which states that $\mathrm{ATE}_{\mathrm{cov}}$ is between $\mathrm{ATE}_{\mathrm{conf}}$ and $\mathrm{ATE}_{\mathrm{true}}$ under certain monotonicity assumptions. In the simulation, $\mathrm{ATE}_{\mathrm{conf}}$ is given by manual without U, $\mathrm{ATE}_{\mathrm{cov}}$ is given by LSPS without U, and $\mathrm{ATE}_{\mathrm{true}}$ is given by manual with U.

Empirical studies
We now use real data to compare LSPS to the traditional propensity-score adjustment (with manually selected covariates) in adjusting for confounding.
With an EHR database, we compared the effect of two anti-hypertension drugs, hydrochlorothiazide and lisinopril, on two clinical outcomes, acute myocardial infarction (AMI) and chronic kidney disease (CKD). For both outcomes, type 2 diabetes mellitus (T2DM) is a known confounder. Thus, by including or excluding T2DM in an adjustment model while keeping other covariates the same, we can assess a method's capacity for adjusting for a known confounder that is not directly measured but may be correlated with measured covariates.

Cohort and covariates
We used a retrospective, observational, comparative cohort design [56].
We included all new users of hydrochlorothiazide monotherapy or lisinopril monotherapy and defined the index date as the first observed exposure to either medication. We excluded patients who had less than 365 days of prior observation, had a prior hypertension treatment, initiated another hypertension treatment within 7 days, or had the outcome prior to the index date. We followed patients until the end of their continuous exposure, allowing for maximum gaps of 30 days, or the end of their observation in the database, whichever came first.
For the LSPS-based approach, we used more than … Most covariates (e.g., diagnoses, medications, procedures) were encoded as binary, that is, 1 indicates the code is present in the patient's medical history prior to treatment, and 0 otherwise. For some variables that are often considered as continuous (e.g., lab tests), LSPS does not impute the value because imputation could do more harm than good when the missingness mechanism is not known. Instead, LSPS encodes the lab ordering pattern as binary variables. Residual error appears as measurement error.
The study was run on the Optum© de-identified electronic health record database of aggregated electronic health records.

Statistical analysis
We examined a method's capacity for adjusting for confounding that is not directly measured by comparing the effect estimates from each method with or without access to the confounder. We excluded all variables related to T2DM, including diagnoses and anti-glycemic medications, in models without access to indirectly measured confounders. We included an unadjusted method as a baseline for comparison. Specifically, we studied the following five methods (analogous to the five methods in the simulation):
• unadjusted: no covariate was adjusted for.
• manual: adjust for a list of manually selected confounders.
• manual without T2DM: adjust for a list of manually selected confounders without T2DM-related confounders.
• LSPS: adjust for all pre-treatment covariates in the database.
• LSPS without T2DM: adjust for all pre-treatment covariates in the database without T2DM-related confounders.
For the four methods that adjust for confounders, we estimated propensity scores with L1-regularized logistic regression (and selected the regularization parameter with cross-validation). We used subclassification and stratified the dataset into 10 subclasses. To estimate the treatment effect, we fit a Cox proportional-hazards model [51] to estimate the hazard ratio (HR). We then calculated the mean and 95% confidence interval of the HR.
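Stratifying into 10 subclasses can be done by ranking patients on their propensity scores and cutting the ranking into equal-sized groups. The sketch below assumes rank-based (quantile) boundaries with ties broken by input order; the exact rule used in the study is not stated here, and the function name is hypothetical.

```python
def assign_subclasses(propensity, n_subclasses=10):
    """Assign each unit to a quantile-based subclass of the propensity score.

    Subclass 0 holds the lowest scores. ASSUMPTION: boundaries are rank-based,
    so subclasses are (nearly) equal in size.
    """
    n = len(propensity)
    order = sorted(range(n), key=lambda i: propensity[i])
    subclass = [0] * n
    for rank, i in enumerate(order):
        subclass[i] = rank * n_subclasses // n
    return subclass

# 100 hypothetical propensity scores, listed in decreasing order.
scores = [(99 - i) / 100.0 + 0.005 for i in range(100)]
deciles = assign_subclasses(scores)
```

The within-subclass effect estimates (here, hazard ratios from a Cox model) would then be combined with weights proportional to subclass size, as described in the algorithm section.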

Results
Fig. 5 shows the results of the empirical studies. These results show that T2DM had a bigger impact on the manual methods than on the LSPS methods. The impact was determined by comparing the absolute difference in effect estimates between the two manual models versus the two LSPS-based models. In the CKD study, the absolute difference between the two manual methods was 0.09 (Manual without T2DM: HR 0.77 [95% CI, 0.71-0.83]; Manual: HR 0.86 [95% CI, 0.79-0.93]), higher than the absolute difference between the two LSPS-based methods, which was 0.05 (LSPS without T2DM: HR 0.84 [95% CI, 0.77-0.92]; LSPS: HR 0.89 [95% CI, 0.82-0.97]). In fact, the estimates for manual with T2DM, LSPS with T2DM, and LSPS without T2DM were all closer to each other than to manual without T2DM. Therefore, whether manual with T2DM or LSPS with T2DM is actually closer to ground truth, LSPS without T2DM is closer to either one than is manual without T2DM.
This finding suggests that by including large-scale covariates, one has a better chance of correcting for confounders that are not directly measured.
The pinpointability assumption is more likely to hold when there are many measured covariates.
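The comparisons reported above for the CKD study can be checked with simple arithmetic on the four point estimates. The dictionary keys below are our own labels; the hazard ratios are the ones quoted in the text.

```python
# Hazard ratio point estimates for the CKD study, as quoted in the text.
hr = {
    "manual": 0.86,
    "manual_without_t2dm": 0.77,
    "lsps": 0.89,
    "lsps_without_t2dm": 0.84,
}

# Impact of removing T2DM within each approach.
manual_gap = abs(hr["manual"] - hr["manual_without_t2dm"])
lsps_gap = abs(hr["lsps"] - hr["lsps_without_t2dm"])

# Largest pairwise gap among the three mutually close estimates ...
close_group = ["manual", "lsps", "lsps_without_t2dm"]
max_within = max(abs(hr[a] - hr[b]) for a in close_group for b in close_group)
# ... versus the smallest gap from any of them to manual without T2DM.
min_to_outlier = min(abs(hr[a] - hr["manual_without_t2dm"]) for a in close_group)
```

The manual gap rounds to 0.09 and the LSPS gap to 0.05, and the largest gap within the group of three (0.05) is smaller than the smallest gap to manual without T2DM (0.07), matching the statements in the results.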

Discussion
We have illustrated conditions under which LSPS adjusts for indirectly measured confounding and the impact of violations of such conditions on effect estimation.We have found in previous practice, in our current simulations, and in our current real-world study that indirectly measured (or  unused) confounding can be adjusted for in LSPS, apparently working better than smaller, manually engineered sets of covariates that are also missing the confounder.
Even though pinpointing in the current simulation is achieved by generating the unmeasured confounder from a function of the measured covariates, this does not suggest the causal direction between the unmeasured confounder and the measured covariates.In medicine, it is likely that an unmeasured confounder (e.g., a disease such as T2DM) induces dependencies among large-scale clinical covariates (e.g., medications for treating the disease, laboratory tests for monitoring the disease, and other diseases that often co-occur with the disease can be correlated).In other words, the unmeasured confounder could be a latent variable in a factor model, and the strength of pinpointing depends on the number of measured covariates and the degree of dependency among the covariates.
We describe here methods that are related to LSPS in different ways. Section 5.1 compares and contrasts LSPS with other methods that also address unmeasured confounding in causal effect estimation. Section 5.2 compares LSPS with another propensity score-based method that also uses large-scale covariates. Section 5.3 draws similarities between LSPS and another method for causal effect estimation in the presence of indirectly measured confounding where pinpointability is a required assumption.

Relation to proxy variable, multiple imputation and residual bias detection
Studies such as those by Kuroki and Pearl [62], Miao et al. [63], and Tchetgen Tchetgen et al. [64] have shown that causal effects can be identified by observing proxy variables of confounders that are not directly measured.
In this case, the confounder is known but not measured, there is sufficient knowledge of the structural causal model such that proxies can be selected, and there is the knowledge that there are no other unmeasured confounders.
In contrast, LSPS does not require explicit knowledge of the causal model.
Another approach is to use measured covariates to explicitly model unmeasured confounders using multiple imputation [65,66]. While it differs from our approach, it exploits the same phenomenon, that some covariates contain information about unmeasured confounders. This is in contrast to LSPS, where there is no explicit model of unmeasured confounders; adjusting for measured covariates should be effective for causal inference as long as the measured covariates pinpoint the unmeasured confounders.
Given the need to assume no additional unmeasured confounding (additional in the sense of not being pinpointed or not having proxies), a complementary approach is to estimate the degree of residual bias, potentially including additional unmeasured confounding. Large-scale use of negative and synthetic positive controls [42,41] can detect residual bias and can additionally be used to calibrate estimates to ensure appropriate coverage of confidence intervals.

Relation to high-dimensional propensity score adjustment
LSPS adjusts for all available pre-treatment covariates. In practice, because the sample size is limited, regularized regression selects a subset of variables to represent the information contained in the whole set of covariates, but the goal is to represent all the information nonetheless. Therefore, LSPS diagnostics [42,2] test balance not just on the variables that regularized regression included in the model, but on all the covariates. All covariates are retained because even those that are not direct confounders may still contribute to the pinpointing of the unobserved confounders. Therefore, LSPS is not a confounder selection technique.
LSPS is distinct from techniques that attempt to select confounders empirically [67]. Some of these techniques also start with large numbers of covariates, but they attempt to find the subset that are confounders using information about the treatment and outcome. They then adjust for the selected covariates. As long as all confounders are observed and then selected, adjusting for them should eliminate confounding. Such selection may not, however, benefit from the pinpointing that we identify in this paper. Unlike LSPS, confounder selection techniques are dependent on the outcomes, and the outcome rates are often very low in medical studies, potentially leading to variability in selection. Empirical studies [44] show that adjusting for a small number of confounders does not successfully adjust for unobserved confounders, and an empirical comparison of the methods favored LSPS [2].

Relation to the deconfounder
LSPS and the deconfounder [45,46] are distinct but share several features.
The deconfounder is a causal inference algorithm that estimates unbiased effects of multiple causes in the presence of unmeasured confounding. Under the pinpointability assumption (unmeasured confounders are pinpointable by multiple causes), the deconfounder can infer unmeasured confounders by fitting a probabilistic low-rank model to capture the dependencies among multiple causes. The deconfounder has been applied to EHR data for treatment effect estimation in the presence of unmeasured confounding [68]. Both methods thus can be shown to address unmeasured confounders when there is pinpointing.

Conclusions
In summary, LSPS is a confounding adjustment approach that includes large-scale pre-treatment covariates in estimating propensity scores.

Once the subclasses are defined, LSPS checks that the subclassification achieves covariate balance. Balance is achieved if, when we reweight the data according to the subclasses, each covariate in the treatment and control groups follows the same distribution.
To check covariate balance, we first compute the weight $w_i$ for each datapoint $i$ as the reciprocal of the number of datapoints from its treatment group within its subclass. Mathematically, for a datapoint with treatment $t$ in subclass $s$, $w_i = 1 / \sum_{j} \mathbb{1}(t_j = t, S_j = s)$, where the denominator is the number of datapoints that received treatment $t$ in subclass $s$. Then, the weighted mean of the covariate $x$ for treatment group $t$ is $\bar{x}_t = \sum_{i: t_i = t} w_i x_i / \sum_{i: t_i = t} w_i$. The weighted covariate variance is defined with the same weights. With the weighted covariate mean and variance, LSPS computes the standardized mean difference to assess covariate balance.

S2. Outcome Model for Survival Analysis
In the empirical studies of Section 4, we use a Cox proportional hazards model (Cox, 1972) to estimate an unbiased hazard ratio (the outcome Y is time to event). The Cox model is expressed by the hazard function, denoted by $h(\tau)$. Within a subclass $s$, the subclass-specific hazard function $h_s(\tau)$ is estimated as $h_s(\tau) = h_0^s(\tau) \exp(\zeta_s t)$, where $\tau$ is the survival time, $h_0^s(\tau)$ is the baseline hazard at time $\tau$, $t$ is the treatment, and $\exp(\zeta_s)$ is the subclass-specific hazard ratio of the treatment. This expression gives the hazard function at time $\tau$ for subjects with treatment $t$ in subclass $s$. The hazard ratio can be obtained by reweighting $\exp(\zeta_s)$ by the size of the subclass. However, in practice, due to within-subclass zero counts and finite machine precision, the hazard ratio is estimated by optimizing the Cox partial likelihood conditioned on the subclasses, as in the Cyclops R package [69].
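As a rough illustration of estimating a single hazard ratio across subclasses, the sketch below maximizes a stratified Cox partial log-likelihood with one shared treatment coefficient. It is not the Cyclops implementation: restricting each risk set to the subject's own subclass, the absence of tied event times, the golden-section optimizer, and the toy data are all assumptions made for this example.

```python
from math import exp, log


def stratified_partial_loglik(zeta, time, event, treatment, subclass):
    """Cox partial log-likelihood with a shared coefficient zeta.

    Risk sets are formed within each subclass (assumed stratification).
    """
    total = 0.0
    n = len(time)
    for i in range(n):
        if event[i] != 1:  # censored subjects contribute only via risk sets
            continue
        risk = sum(exp(zeta * treatment[j]) for j in range(n)
                   if subclass[j] == subclass[i] and time[j] >= time[i])
        total += zeta * treatment[i] - log(risk)
    return total


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal one-dimensional function on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


# Toy data: one censored subject plus two subclasses with the same event pattern.
time = [0.5, 1, 2, 3, 4, 10, 20, 30, 40]
event = [0, 1, 1, 1, 1, 1, 1, 1, 1]
treatment = [1, 1, 0, 1, 0, 1, 0, 1, 0]
subclass = [0, 0, 0, 0, 0, 1, 1, 1, 1]

zeta_hat = golden_section_max(
    lambda z: stratified_partial_loglik(z, time, event, treatment, subclass),
    lo=-5.0, hi=5.0)
hazard_ratio = exp(zeta_hat)
print("estimated hazard ratio:", hazard_ratio)
```

For this toy pattern the score equation reduces to r^2 - r - 4 = 0 in r = exp(zeta), so the estimate can be checked against the closed-form root (1 + sqrt(17)) / 2.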

S3. Proof of Theorem 2
We prove Theorem 2, which states that under certain monotonicity assumptions about the effect of the unmeasured confounder on the treatment and on the outcome, the average treatment effect adjusting for all the measured covariates lies between the true effect and the effect adjusting for only the measured confounders.
Let $T$ be a binary treatment, $Y$ be an outcome, $U$ be an ordinal unmeasured confounder, $U \in \{1, \ldots, K\}$, and $X$ be some measured covariates. Furthermore, let $X_C$ be the measured confounders, and $X_C^{\complement}$ be the complement set of the measured confounders, $X_C^{\complement} = \{x : x \in X, x \notin X_C\}$. We assume that the measured covariates $X_C^{\complement}$ form a noisy measurement of $U$. We denote this noisy measurement as $U'$. The relationship between the true unmeasured confounder $U$ and the noisy measurement $U'$ is established by the following misclassification probabilities, $p_{ij} = p(U' = i \mid U = j)$, $i, j \in \{1, \ldots, K\}$. Then, $p(U' = i) = \sum_{j=1}^{K} p_{ij} \, p(U = j)$. We assume that the misclassification probabilities of $U$ are nondifferential with respect to $T$, $Y$, and $X$; that is, $p(U' = u' \mid U = u, Y = y, T = t, X = x) = p(U' = u' \mid U = u, Y = y', T = t', X = x')$ for all $y, y' \in \mathcal{Y}$, $t, t' \in \{0, 1\}$, and $x, x' \in \mathcal{X}$. If $p_{ij} \le p_{ik}$ and $p_{ji} \le p_{ki}$ for $j < k < i$, and $p_{il} \ge p_{im}$ and $p_{li} \ge p_{mi}$ for $i < l < m$, then we say that the misclassification probabilities are tapered.
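The tapering condition and the marginal distribution of the noisy measurement can be checked numerically. The sketch below is illustrative only: the matrices, the prior over the confounder, and the function names are hypothetical, and indices are zero-based so that p[i][j] stands for p(U' = i+1 | U = j+1).

```python
def is_tapered(p):
    """Check the tapering inequalities for a K x K misclassification matrix.

    p[i][j] is the probability of observing category i when the truth is j.
    """
    k = len(p)
    for i in range(k):
        for j in range(k):
            for m in range(j + 1, k):  # j < m throughout
                if m < i:
                    # j < m < i: probabilities do not decrease toward i
                    ok = p[i][j] <= p[i][m] and p[j][i] <= p[m][i]
                elif i < j:
                    # i < j < m: probabilities do not increase away from i
                    ok = p[i][j] >= p[i][m] and p[j][i] >= p[m][i]
                else:
                    continue
                if not ok:
                    return False
    return True


def noisy_marginal(p, prior):
    """p(U' = i) = sum_j p[i][j] * p(U = j)."""
    k = len(p)
    return [sum(p[i][j] * prior[j] for j in range(k)) for i in range(k)]


# Hypothetical 3-category example; each column sums to one.
tapered = [
    [0.80, 0.15, 0.05],
    [0.15, 0.70, 0.15],
    [0.05, 0.15, 0.80],
]
not_tapered = [
    [0.6, 0.1, 0.3],
    [0.1, 0.8, 0.1],
    [0.3, 0.1, 0.6],
]
prior = [0.5, 0.3, 0.2]  # assumed distribution of the true confounder

marginal = noisy_marginal(tapered, prior)
print("tapered example:", is_tapered(tapered))
print("non-tapered example:", is_tapered(not_tapered))
print("marginal of the noisy measurement:", marginal)
```

In the second matrix, misclassifying the lowest category as the highest is more likely than misclassifying the middle category as the highest, which violates tapering.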
The proof of this theorem relies on Lemmas 1-4. We extend Lemmas 1-4 in Ogburn and VanderWeele [53] to conditions where identifiability holds by conditioning on measured and unmeasured confounders.
then the treatment effect adjusting for all measured covariates falls between the true effect and the effect adjusting for the measured confounders alone.
and $E[T \mid X, U]$ is nondecreasing and the other nonincreasing in $U$, then
Lemma 2 establishes the relationship between the true expectations and expectations conditioning on the measured confounders. Lemma 3 establishes the relationship between expectations conditioning on all measured

Assumption 1 (Pinpointability of indirectly measured confounder). An unmeasured confounder U is said to be pinpointed by measured covariates if

Figure 1: Causal graphs to estimate the treatment effect of T on the outcome Y. (a) Causal graph under no pinpointability. The unmeasured confounder U is not pinpointed by X. (b) Causal graph under perfect pinpointability. The unmeasured confounder U is a deterministic function of the measured covariates X. (c) Causal graph under weak pinpointability. The unmeasured confounder is only partially pinpointed by X. Random variables are represented with circles, deterministic variables are represented with squares, measured variables are shaded, indirectly measured or unmeasured variables are not shaded, strong pinpointing is represented with a solid line, and weak pinpointing is represented with a dashed line.

Figure 2: Causal graph of a) instrumental variable, b) near-instrumental variable and c) M-structure collider.The dashed line with a solid dot means that the variable by the solid dot can be pinpointed by the measured covariates.We use different subscripts to distinguish the measured covariates playing different roles in the causal graph.

Fig. 3 shows the data generating process using a causal diagram. Below are the steps to simulate data for patient i.

Figure 3: Causal diagram of the simulation to estimate the effect of the treatment T on the outcome Y. The high-dimensional measured covariates X are induced by a low-dimensional latent variable V. The unmeasured confounder U is simulated as a function of the measured covariates X. When the function is deterministic, U is pinpointed by X.

Figure 4: Sensitivity analysis of pinpointability in Simulation 1. As pinpointability of the indirectly measured confounder decreases, LSPS's ability to adjust for the indirectly measured confounder decreases. (a) The mean and 95% CI of the estimated average treatment effect. (b) The RMSE of the estimated average treatment effect.
(a) Acute myocardial infarction (b) Chronic kidney disease

Figure 5: Comparison of hazard ratios from the unadjusted model and four models adjusting for confounders. The indirectly measured (or unused) confounder T2DM had a bigger impact on the HR estimated by manual models than by LSPS. (a) HR of the two antihypertensive medications on AMI. (b) HR of the two antihypertensive medications on CKD.
The parameters in the Cox model are estimated by optimizing the likelihood $L(\zeta_s) = \prod_{i: C_i = 1 \cap S_i = s} \frac{\exp(\zeta_s t_i)}{\sum_{j: Y_j \ge Y_i} \exp(\zeta_s t_j)}$, where $C_i = 1$ indicates the occurrence of the outcome.

Lemma 2. If $E[Y \mid T, X, U]$ and $E[T \mid X, U]$ are either both nonincreasing or both nondecreasing in $U$, then $E_{X_C}[Y \mid T = 1] \ge E[Y(1)]$ and $E_{X_C}[Y \mid T = 0] \le E[Y(0)]$. If one of $E[Y \mid T, X, U]$ and $E[T \mid X, U]$ is nonincreasing and the other nondecreasing in $U$, then $E_{X_C}[Y \mid T = 1] \le E[Y(1)]$ and $E_{X_C}[Y \mid T = 0] \ge E[Y(0)]$.
Lemma 3. Suppose that $U$ is nondifferentially misclassified with respect to $T$ and $Y$. If $E[Y \mid T, X, U]$ and $E[T \mid X, U]$ are both nondecreasing or both nonincreasing in $U$, then $E_{X_C, U'}[Y \mid T = 1] \ge E[Y(1)]$ and $E_{X_C, U'}[Y \mid T = 0] \ge E[Y(0)]$. If one of $E[Y \mid T, X, U]$ and $E[T \mid X, U]$ is nondecreasing and the other nonincreasing in $U$, then $E_{X_C, U'}[Y \mid T = 1] \le E[Y(1)]$ and $E_{X_C, U'}[Y \mid T = 0] \le E[Y(0)]$.
Lemma 4. Suppose that $U$ is nondifferentially misclassified with respect to $T$ and $Y$ with tapered misclassification probabilities. If $E[Y \mid T, X, U]$ and $E[T \mid X, U]$ are both nondecreasing or both nonincreasing in $U$, then

60,000 covariates in the EHR database, including demographics, all medications in the 365 days prior to index date, all diagnoses in the 365 days prior to index date, and the disease, atrial fibrillation, Charlson index - Romano adaptation, platelet aggregation inhibitors excl. heparin, warfarin, corticosteroids for systemic use, dipyridamole, non-steroidal anti-inflammatory drugs (NSAIDs), proton-pump inhibitors (PPIs), statins, estrogens, progestogens, body mass index (BMI), chronic obstructive pulmonary disease (COPD), liver disease, dyslipidemia, valvular heart disease, drug abuse, cancer, HIV infection, smoking and stroke.