Head to head comparison of the propensity score and the high-dimensional propensity score matching methods

Background Comparative performance of the traditional propensity score (PS) and high-dimensional propensity score (hdPS) methods in the adjustment for confounding by indication remains unclear. We aimed to identify which method provided the best adjustment for confounding by indication within the context of the risk of diabetes among patients exposed to moderate versus high potency statins. Method A cohort of diabetes-free incident statins users was identified from the Quebec’s publicly funded medico-administrative database (Full Cohort). We created two matched sub-cohorts by matching one patient initiated on a lower potency to one patient initiated on a high potency either on patients’ PS or hdPS. Both methods’ performance were compared by means of the absolute standardized differences (ASDD) regarding relevant characteristics and by means of the obtained measures of association. Results Eight out of the 18 examined characteristics were shown to be unbalanced within the Full Cohort. Although matching on either method achieved balance within all examined characteristic, matching on patients’ hdPS created the most balanced sub-cohort. Measures of associations and confidence intervals obtained within the two matched sub-cohorts overlapped. Conclusion Although ASDD suggest better matching with hdPS than with PS, measures of association were almost identical when adjusted for either method. Use of the hdPS method in adjusting for confounding by indication within future studies should be recommended due to its ability to identify confounding variables which may be unknown to the investigators. Electronic supplementary material The online version of this article (doi:10.1186/s12874-016-0119-1) contains supplementary material, which is available to authorized users.


Background
Observational studies provide real world information on drug use and their potential effect on health outcomes but are prone to confounding by indication [1][2][3][4]. The traditional propensity score (PS) method is often used to control for confounding by indication. It represents "the conditional probability of assignment to a particular treatment given a vector of observed covariates" and is generally obtained thanks to a logistic regression model [5].
The high-dimensional propensity score (hdPS) method has recently been proposed and has been rapidly and widely adopted to address confounding by indication [6,7]. Unlike the PS method which is limited to investigator-specified covariates; the hdPS method also uses a computerized algorithm to select a large number of potential confounders contained within the examined database [5,7].
It is of interest to compare the performance of these two methods in controlling for confounding by indication to inform the design of future observational studies. Performance of both methods may be compared using two distinct approaches, 1) by examining the balance achieved on key potential confounders between subcohorts matched on these two scores [4,[8][9][10][11], and 2) by comparing the measures of associations obtained from the matched sub-cohorts to a gold standard comparator [7,[12][13][14].
Recently, results of a meta-analysis of randomized controlled trials (RCT) have found that exposure to higher statin doses might be associated with higher risks of diabetes (Odds ratio [OR] =1. 12 [95 % confidence intervals (CI) 1.04-1.22]) [15]. Although results obtained from observational studies have been conflicting [16], four out of five published studies found a small increased but statistically significant dose-dependent relationship [17][18][19][20]. However, it is possible that in those observational studies, patients at higher risk of diabetes were more likely to be initiated on higher statins doses: a classic example of confounding by indication offering an excellent opportunity to compare the relative performance of these two scores. In this study, we aim to compare the performance of the PS and hdPS methods in adjusting for confounding by indication using the two approaches defined above.

Data sources
This study was performed using medico-administrative databases from the province of Quebec, Canada. Quebec is the second most populated province in Canada, with more than 8 million inhabitants [21]. A unique identification number is assigned to every individual, and all diagnoses and all health services provided are systematically recorded within the Régie de l'assurance maladie du Québec (RAMQ) databases. Pharmaceutical claims are also recorded but only for residents covered by the RAMQ public drug insurance plan. Information was obtained from the Quebec physician's service and claims databases (i.e. RAMQ databases) and the Quebec hospitalisation databases (i.e., Maintenance et Exploitation des Données pour l'Étude de la Clientèle Hospitalière [MED-ECHO] databases), which have previously been validated [22][23][24][25]. For this study we used three RAMQ databases (i.e., the Demographic, Medical Services and Claims and Pharmaceutical databases) and three MED-ECHO databases (i.e. the Hospitalisation Descriptions, Diagnoses and Intervention databases). Patient records were linked across all databases by use of the unique identification number. The identification numbers were encrypted to protect patient confidentiality. Access to data was granted by the Commission d'accès à l'information and the protocol was approved by the Centre hospitalier de l'Université de Montréal ethics' committee.

Full Cohort
RAMQ provided us with a cohort of 800,551 incident statin users; the date of the first statin dispensation was defined as the cohort entry date. Patients were considered to be incident statin users if they did not have a claim for a statin dispensation in the year prior to the cohort entry date. Eligible patients had: 1) to have been newly initiated on either simvastatin, lovastatin, pravastatin, fluvastatin, atorvastatin or rosuvastatin between January 1 st 1998 and December 31 st 2010, 2) to be covered by the RAMQ public drug insurance plan for at least a year prior to the cohort entry date and 3) to be at least 40 years of age at the cohort entry date. We excluded every patient who, in the year prior or on the cohort entry date: 1) received any other cholesterol lowering drug dispensation (including niacin, cerivastatin or a combination statin drug); 2) received a dispensation for drugs used in the treatment of diabetes (WHO ATC A10) [26]; 3) received a diagnosis of diabetes (ICD-9 code: 250.x; ICD-10 codes: E10.x -E14.x); 4) were admitted in a long-term care facility or 5) received >1 statin dispensation on the cohort entry date. Patients who met both inclusion and exclusion criteria were entered within the Full Cohort.

Exposure status
Patients were categorized into two categories based on the dose and relative potency of the first statin dispensation they received [18,27]. Patients whose first statin dispensation was for a daily dose of ≥10 mg of rosuvastatin, ≥20 mg of atorvastatin or ≥40 mg of simvastatin formed the high potency group and remaining patients formed the lower potency group.

Outcome status
Every patient who received either a dispensation of a drug used in the treatment of diabetes (WHO ATC A10) or a diagnosis of diabetes (ICD-9 code: 250.x; ICD-10 codes: E10.x -E14.x) within the 2 years following the cohort entry date was defined as a case, all other patients were considered to be diabetes-free.

Propensity score method
We used the same list of variables that were used by Dormuth and colleagues to create the PS model [18]. This list included the following covariates: patients' sex, age and poverty level status (yes versus no) at the cohort entry date, year of entry within the cohort (as a categorical variable), medical resource utilization variables (≥1 hospitalisation, ≥5 outpatient visits, ≥5 distinct drugs dispensed to the patient, all within the year prior to the cohort entry date), drug dispensation variables (dispensation of loop diuretics, dispensation of acetaminophen, dispensation of calcium blockers, dispensation of betablockers, dispensation of angiotensin receptor blockers and dispensation of angiotensin converting enzyme inhibitors, all in the year prior to the cohort entry date) and comorbidity variables (hypertension, hypercholesterolemia, myocardial infarction (MI), stroke, peripheral vascular disease (PVD), congestive heart failure, coronary artery bypass graft, and percutaneous coronary intervention (PCI), all in the year prior to the cohort entry date).
Following the selection of the PS model, patients' PS were assessed for all patients included within the Full Cohort. Trimming was performed and patients located within non-overlapping regions of the PS distribution were excluded from the analysis, all other patients were eligible for inclusion within the Matched PS Sub-Cohort [28]. Lower potency controls were found for patients in the high potency group using a greedy, nearest neighbor 1:1 matching algorithm. Matching occurred if the difference in the logit of PS between nearest neighbors was within a caliper width equal to 0.2 times the standard deviation (SD) of the logit of the PS [29]. Patients selected by the matching algorithm were included within the Matched PS Sub-Cohort.
High-dimensional propensity score method hdPS were estimated for all patients included in the Full Cohort [7]. Detailed description of the hdPS method can be found elsewhere [7]. Briefly, the construction of the hdPS model involves two processes, 1) investigators select covariates to be forced within the hdPS model (similar to what is done within an investigator-specified PS model) and 2) the hdPS algorithm selects an additional list of covariates measured within the selected data dimensions based on their multiplicative bias assessment which is then also included within the final hdPS model. Within this study, estimation of patients' hdPS were conducted using the default setting of the SAS hdPS macro v.1 (i.e., top 200 most prevalent variables per data dimensions, top 500 binary empirical covariates based on multiplicative bias assessment).
We structured the data collected from the year prior to the cohort entry date from the following 6 data dimensions: 1) drugs dispensed in an outpatient setting, 2) physician claims codes for inpatient and outpatient procedures, 3) physician claims for inpatient and outpatient diagnostic codes, 4) specialty of the physician providing care, 5) hospitalisation discharge data for inpatient procedure codes and 6) hospitalisation discharge data for inpatient diagnostic code.
In addition to the 500 variables selected by the default option of the hdPS algorithm [7], we forced the following covariates within the hdPS model: patients' sex, age and poverty level status (yes versus no) at the cohort entry date, year of entry within the cohort (as a categorical variable), medical resource utilization variables (≥1 hospitalisation, ≥5 outpatient visits, ≥5 distinct drugs dispensed to the patient, all within the year prior to the cohort entry date). These variables were forced in the model since they could not be selected by the SAS hdPS algorithm v.1 we were using. Trimming was performed and patients located within non-overlapping regions of the hdPS distribution were excluded from the analysis [28], all other patients were eligible for inclusion within the Matched hdPS Sub-Cohort. Lower potency controls were found for patients in the high potency group using a greedy, nearest neighbor 1:1 matching algorithm. Matching occurred if the difference in the logit of hdPS between nearest neighbors was within a caliper width equal to 0.2 times the SD of the logit of the hdPS [29]. Patients selected by the matching algorithm were included within the Matched hdPS Sub-Cohort.

Statistical analyses
Absolute standardized differences (ASDD), defined as the absolute between group difference over the pooled SD of the two groups, were used to compare patient characteristics between patients exposed to a high potency versus lower potency statin within the Full Cohort and both sub-cohorts [4,[8][9][10][11]. ASDD < 0.1 are generally assumed to indicate good balance between groups [2,10]. Discrete data are presented in absolute and relative values (n [%]) and continuous data are presented as mean (SD). OR (95%CI) of diabetes occurrence in the high over lower potency statin groups were estimated within the Full Cohort and within both matched sub-cohorts; no adjustment beyond matching was performed.
All statistical analyses were conducted with SAS version 9.3 (SAS Institute, Cary, North Carolina). Baseline characteristics of the Full Cohort are shown in Table 1 (Fig. 1). This sub-cohort was comprised of 119,931 male patients (50.0 %) and the average age was 64.7 years (SD 11.2) ( Table 2). Balance was obtained for all 18 examined patient characteristics (ASDD ranged from 0.002 to 0.031 with an average of 0.015).

Characteristics of patients included within the Matched hdPS Sub-Cohort
Three hundred and one (0.1 %) patients, 54 (0.0 %) lower potency and 247 (0.1 %) high potency, had hdPS located within non-overlapping regions and were excluded from the analysis (kernel density hdPS curves for all patients included within the Full Cohort are provided in Additional file 2). Among the remaining 403,828 patients, we matched 116,014 patients (28.7 %) initiated on a high potency statin to 116,014 patients (28.7 %) initiated on a lower potency statin based on their individual hdPS; selected patients formed the Matched hdPS Sub-Cohort (Fig. 1).
Patients included within the Matched hdPS Sub-Cohort were on average 64.6 years old (SD 11.2) and 116,688 of them were males (50.3 %) ( Table 3). Balance was obtained in all 18 examined patient characteristics, whether or not they were forced within the hdPS model (ASDD ranged from 0.001 to 0.023 with an average of 0.008).

Performance of the PS and hdPS in adjusting for confounding by indication
As mentioned previously, performance of both methods in adjusting for confounding by indication was tested by two distinct approaches, 1) by comparing the ASDD obtained within both sub-cohorts and 2) by comparing the adjusted OR of diabetes occurrence in the high over lower potency statin groups estimated by the logistic Measures of associations obtained within the Full Cohort and the two matched sub-cohorts indicated that patients in the high potency group had higher odds of developing diabetes within 2-years follow-up. Results

Discussion
As expected, overall patient profiles within the Full Cohort showed imbalance on many key baseline characteristics suggesting the presence of confounding by indication. Such results would tend to indicate the presence of bias within measures of associations estimated within the Full Cohort if appropriate adjustment were not used in the analyses.
In their original paper, Schneeweiss et al. [7] assessed the performance of the hdPS method by comparing measures of associations adjusted for patients' hdPS to the results of a RCT. By showing that the adjusted measures of association were closer to the results of the RCT than the crude measure of association, they showed that hdPS method had improved the adjustment for confounding by indication within their study. Performance of the hdPS method has been assessed by others using the same approach and their results also supported its use [12][13][14]. Measures of association obtained within both matched sub-cohorts were closer to the null (OR = 1. Seeing as the hdPS model adjusted for more variables than the PS model, such a result was to be expected but it needed to be verified in a situation where we have prior knowledge on which confounders to adjust for. The results we show in this study support the idea that the hdPS method may be used to adjust for confounding by indication, but the possibility that residual confounding remaining after this adjustment cannot be ruled out. Our study has several strengths. First, we compared the PS and hdPS method in a large cohort of incident statin users showing substantial imbalance suggesting the potential for confounding by indication. As such, this provided an excellent situation in which to compare the performance of both methods. Second, our conclusions favored the hdPS method when our study design should have favored the PS method. Although all of the examined covariates were forced within the PS model, only five investigatorselected covariates were forced within the hdPS model (only demographic, socio-economic and medical resource utilization variables were forced within the hdPS model, all remaining covariates were selected by the hdPS algorithm [n = 500]) [7]. Therefore, the hdPS method performance was mainly achieved through the use of the automated hdPS algorithm and not by investigator choice.
Our study has also several limitations. First, we compared patients on a relatively small number of baseline patient characteristics. It is possible that the performance observed within the 18 prespecified patient characteristics may not be representative of the overall performance regarding all potential patient characteristics. However, these variables were selected because we believed, like others have [18], that they could lead to confounding by indication and our results show that the hdPS method achieved substantial balance within all of these even though most were not forced within the hdPS model.
Second, we defined unbalance as ASDD > 0.1. Although this cut-off is frequently used [2,10,16], other values could have been used as well. Regardless of the cut-off value chosen, our results indicate that the hdPS method outperformed the PS method in achieving the most balanced sub-cohort. Although this added level of balance may not eliminate all biases within the observational study (i.e. information bias, unmeasured confounders, time-varying confounding), it will at least tend to reduce the level of bias caused by these baseline characteristics.
Third, no mechanism of action by which statins could cause diabetes has been identified. Although we compared both methods using a frequently used exposure definition, we cannot claim that this exposure definition reflects the true mechanism of action by which statins could cause diabetes. It is possible that the results obtained, had we used an exposure definition reflecting the true mechanism of action, could have differed from those obtained within this study. This also reflects the fact that we do not know what the true measure of association between the exposure to statins and diabetes is. As mentioned, traditionally the hdPS method has been validated by comparing the crude and hdPS-adjusted measures of association to a gold standard measure but in this case, such a true gold standard is not available. We recognize that this would not have been the case had we conducted this comparison within an ordinary simulation study in which the truth would be defined by the investigators. However, as others have highlighted, [7,30] the hdPS method cannot be evaluated through the use of ordinary simulation studies since its performance depends on the complexity and quantity of data available to the hdPS algorithm, levels of which cannot be reproduced within a fully artificial setting. In order to circumvent this issue, Franklin et al. [30] recently proposed that performance of the hdPS method compared to the performance of the PS method be compared using plasmode simulation studies. Using this framework, Franklin et al. showed that an investigator-independent hdPS method performed nearly as well as a fully specified PS model further supporting the use of the hdPS method in situations where little prior knowledge regarding potential confounding variables is available [30]. Such an approach may be validated in future work aimed at further evaluating the performance of the hdPS method.
Fourth, our results show that hdPS trimming removed slightly more patients than PS trimming (301 vs 55). Although this could impact our conclusion regarding the value of both methods, its impact should be marginal since the total number of patients that were trimmed in both settings remains trivial in comparison to the total sample size of the Full Cohort.
Finally, we only examined the relative performance of the PS and hdPS methods within a single context; the Comorbidity status, drug dispensations and medical utilization rates were assessed in the year prior to the cohort entry date. Absolute standardized differences are defined as the between group difference over the pooled standard deviation of the two groups a At the cohort entry date results obtained within this study may not be generalizable to other studies focusing on other exposure-disease associations. Furthermore, we only compared the traditional PS method estimated using a logistic regression model to hdPS method, while other methods are also available (e.g., classification and regression trees and boosting methods) [31,32]. Future work will be needed to compare the relative performance of all these different methods.

Conclusions
In conclusion, we recommend comparing the PS and hdPS methods by means of their relative ability to select balanced sub-cohorts over their adjustment potential within ethiological studies. Although both methods adequately adjusted for confounding by indication, we cannot rule out the possibility that the hdPS method will be dominant in other contexts since it has the potential to identify confounders which are unknown to the investigators.

Ethics approval and consent to participate
Access to data was granted by the Commission d'accès à l'information and the protocol was approved by the Centre hospitalier de l'Université de Montréal ethics' committee.

Not applicable
Additional files