Statistical Tool in Integrated Oncology: Propensity Score Methods

Introduction
Randomized controlled trials are viewed as the "gold standard" study design for generating evidence on the effectiveness of interventions. Where randomized experiments are not possible, not ethical, or not economical, observational studies play a pivotal role in generating evidence.
Cancer is a good example where randomized experiments may not be sufficient to evaluate the effectiveness of care. On one hand, care is a process in a dynamic system, spanning from primary prevention through long-term survival and end-of-life care, and involving multiple steps and interfaces that need to proceed smoothly. This is in contrast to a reductionist approach, which focuses on improvements in specific technical aspects of care and not on the system. The totality of the diagnostic and treatment advances brought by the reductionist approach is less than the integrated care that is desired [1]. On the other hand, cancer patients turn to complementary and alternative medicine, influenced by cultural beliefs, expectations, and family and social support, in hopes of improving clinical outcomes, controlling symptoms, and enhancing quality of life [2,3]. Such complementary and alternative therapies include acupuncture, yoga, hypnosis, meditation, guided imagery, biofeedback, aromatherapy, herbal remedies, and massage, integrated into conventional care [2,4]. Existing evidence suggests that the use of alternative medicines instead of conventional treatment is associated with worsened survival [5]. To help patients make informed decisions and to improve quality of care, practicing oncologists and health care professionals need evidence on how multiple levels of influence affect quality of care, in addition to evidence on the benefits and risks of the different alternative therapies in an integrated healthcare system.
Observational studies using routinely collected data may provide the evidence needed on the impact of multiple levels of contextual influence and on the comparative effectiveness and safety of alternative medications in real-life settings [6]. However, they are constrained by confounding bias, which arises from imbalances (systematic differences) in patient characteristics between treatment groups. To reduce the effect of confounding bias, propensity score (PS) methods [7,8] are frequently used when estimating the effects of treatments from observational data [9]. The propensity score is the probability of receiving a certain treatment (Z=1) versus a comparator (Z=0), conditional on a set of measured patient characteristics (covariates) [7,8]. It is estimated using, for example, an ordinary logistic regression model in which the dependent variable is the treatment received (Z=1 versus Z=0) and the independent variables are pre-treatment patient characteristics [7,8]. It is important to note that PS methods help to control confounding by measured covariates only; they cannot balance unmeasured covariates, except to the extent that these are correlated with measured ones [8].
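The estimation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the data are simulated, the covariates `age` and `stage` are hypothetical, and the logistic regression is fitted by plain gradient ascent using only the standard library. In practice an established statistical package would normally be used.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_propensity_model(X, z, lr=1.0, n_iter=300):
    """Logistic regression of treatment z (0/1) on covariates X.

    Fitted by gradient ascent on the mean log-likelihood.
    Returns [intercept, b1, ..., bp].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            row = [1.0] + list(xi)
            resid = zi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    # Predicted probability of treatment (Z=1) for each subject.
    return [sigmoid(sum(b * v for b, v in zip(beta, [1.0] + list(xi))))
            for xi in X]

# Simulated pre-treatment covariates (hypothetical): standardized age
# and an indicator for advanced stage. Treatment depends on both.
random.seed(0)
X, z = [], []
for _ in range(300):
    age = random.gauss(0.0, 1.0)
    stage = 1.0 if random.random() < 0.4 else 0.0
    p_treat = sigmoid(-0.5 + 0.8 * age + 1.0 * stage)
    X.append((age, stage))
    z.append(1 if random.random() < p_treat else 0)

beta = fit_propensity_model(X, z)
ps = propensity_scores(X, beta)
```

Each element of `ps` is the estimated probability that the corresponding subject receives treatment given the measured covariates. Unmeasured covariates play no part in the fit, which mirrors the limitation noted above.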
The aim of propensity score methodology is to balance covariates between treatment groups. When fitting the PS model, emphasis should therefore be placed on which covariates are included and on the balance the model achieves [7,9]. Covariates related to the outcome of interest and confounding factors (related to both treatment assignment and outcome) should be included. Three types of variables should not be included in the PS model: 1) intermediates, factors on the causal pathway between treatment and outcome; 2) colliders, which are effects of treatment and outcome or of treatment and confounders; and 3) strong instrumental variables, variables that are related only to treatment and are independent of confounders and outcome. Excluding them avoids adjusting away the effect of treatment in 1, collider stratification bias in 2, and amplification of bias due to strong unmeasured confounding in 3 [9][10][11]. The third bias is unlikely in most studies and arises only when the instrument is strong, is independent of measured and unmeasured confounders as well as the outcome, and strong unmeasured confounding is present, in which case the instrument can be used to conduct an instrumental variable analysis [12]. It is useful to include clinically important interactions and higher-order terms of covariates to improve balance. Since the true PS is not known in observational studies, no single model is guaranteed to give the best balance. As a result, the PS model should be fitted iteratively, checking the covariate balance achieved by each candidate model [7]. A graphical presentation of the overlap in PS distributions between treated and untreated subjects gives insight into the quality of the data for answering the research question at hand and into the extent of generalizability that is possible.
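As a numeric companion to the graphical overlap check, the region of common support can be summarized directly. The sketch below is illustrative only: the function name `common_support` is hypothetical, the PS values are made up, and the minimum/maximum rule is just one simple convention assumed here, not one prescribed by the cited literature.

```python
def common_support(ps, z):
    """Return (low, high, fraction of subjects inside [low, high]).

    low/high bound the PS range observed in BOTH treatment groups.
    """
    treated = [p for p, t in zip(ps, z) if t == 1]
    untreated = [p for p, t in zip(ps, z) if t == 0]
    low = max(min(treated), min(untreated))
    high = min(max(treated), max(untreated))
    inside = sum(1 for p in ps if low <= p <= high)
    return low, high, inside / len(ps)

# Made-up propensity scores for four untreated and four treated subjects.
ps = [0.10, 0.25, 0.40, 0.55, 0.30, 0.60, 0.75, 0.90]
z = [0, 0, 0, 0, 1, 1, 1, 1]

low, high, frac = common_support(ps, z)
print(low, high, frac)
```

A small fraction inside the common range signals limited overlap, which narrows the population to which the results can be generalized. It complements, rather than replaces, a plot of the two PS distributions.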
The absolute standardized mean difference is an optimal measure of balance, with a value below 10% considered acceptable [9,13], and it should be accompanied by graphical methods such as box plots [7][8][9]. Significance tests (p-values), goodness-of-fit tests, and C-statistics should be avoided, as these measures do not adequately indicate the balance achieved, which is the aim of PS methods [9,14,15].
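For a continuous covariate, the absolute standardized mean difference can be computed as in the following sketch. It assumes the commonly used definition (difference in group means divided by the square root of the average of the two group variances); the `age` values and the function name are made up for illustration, and in a weighted or matched analysis the means and variances would be taken in the weighted or matched sample.

```python
import math
import statistics

def abs_std_mean_diff(x, z):
    """Absolute standardized mean difference of covariate x between
    treated (z=1) and untreated (z=0) subjects."""
    treated = [v for v, t in zip(x, z) if t == 1]
    untreated = [v for v, t in zip(x, z) if t == 0]
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(untreated)) / 2.0
    )
    return abs(statistics.mean(treated) - statistics.mean(untreated)) / pooled_sd

# Made-up ages for four treated and four untreated subjects.
age = [61, 58, 70, 66, 73, 55, 64, 69]
z = [1, 1, 1, 1, 0, 0, 0, 0]

smd = abs_std_mean_diff(age, z)
balanced = smd < 0.10  # the 10% threshold mentioned in the text
print(round(smd, 3), balanced)
```

Computing this quantity for every covariate after each candidate PS model gives the balance table that drives the iterative model fitting.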
Once the PS is estimated and adequate covariate balance is achieved, one of four PS methods can be used: propensity score matching (PSM), sub-classification on the PS, covariate adjustment using the PS, and inverse probability of treatment weighting (IPTW) using the PS [7,8,16]. It is important to note that these methods may give different treatment effect estimates, and the choice should be based on the research question. IPTW and covariate adjustment using the PS provide the average treatment effect in the entire population (ATE), which is what would be obtained from a randomized experiment: the effect we would observe if everyone were treated versus if no one were treated. A similar treatment effect can be obtained from PSM if all untreated subjects are retained during matching. More often, however, PSM gives the average treatment effect in the treated population (ATT) and, depending on the number of untreated patients excluded during matching, it may be the ATT only for the treated subjects for whom matched untreated subjects are available. Sub-classification on the PS can provide the ATE or the ATT by using different weights when pooling the stratum-specific treatment effects [7][8][9][17].
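To make the distinction between estimands concrete, the sketch below computes inverse probability of treatment weights. The ATE weights follow the standard 1/PS (treated) and 1/(1-PS) (untreated) form. The ATT variant, which weights untreated subjects by PS/(1-PS), is a commonly used extension included here as an assumption for comparison; it is not discussed in the text above. All values and function names are made up for illustration.

```python
def iptw_weights(ps, z, estimand="ATE"):
    """Inverse probability of treatment weights for each subject."""
    weights = []
    for p, t in zip(ps, z):
        if estimand == "ATE":
            w = 1.0 / p if t == 1 else 1.0 / (1.0 - p)
        elif estimand == "ATT":
            w = 1.0 if t == 1 else p / (1.0 - p)
        else:
            raise ValueError("estimand must be 'ATE' or 'ATT'")
        weights.append(w)
    return weights

def weighted_mean_difference(y, z, w):
    """Difference in weighted outcome means, treated minus untreated."""
    def wmean(group):
        num = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == group)
        den = sum(wi for zi, wi in zip(z, w) if zi == group)
        return num / den
    return wmean(1) - wmean(0)

# Made-up propensity scores, treatment indicators and outcomes.
ps = [0.8, 0.5, 0.2, 0.5]
z = [1, 0, 0, 1]
y = [3.0, 1.0, 2.0, 4.0]

ate_w = iptw_weights(ps, z, "ATE")
att_w = iptw_weights(ps, z, "ATT")
ate_estimate = weighted_mean_difference(y, z, ate_w)
att_estimate = weighted_mean_difference(y, z, att_w)
```

The two weighting schemes generally give different estimates from the same data, which is why the estimand should be chosen from the research question before the analysis.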
In settings where the outcome is influenced by multi-level factors, for example cancer care or surgical procedures (at the patient level and at different system levels such as hospitals or treatment centres), the use of ordinary logistic regression may result in misinterpretation of study findings [18]. We recommend multi-level modelling of the PS to distinguish individual-level effects from contextual effects (hospital, society, etc.); the methods for doing so are well developed [18][19][20].
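As an illustration of what a multi-level PS model can look like, one simple form is a random-intercept logistic model for patient i treated in hospital j. The notation below is an assumption introduced for this sketch and is not taken from the cited references, which describe more general specifications.

```latex
\operatorname{logit}\,\Pr(Z_{ij} = 1 \mid X_{ij}, u_j)
  = \beta_0 + \beta^{\top} X_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here X_{ij} denotes the measured patient-level covariates, u_j is a hospital-specific random intercept capturing contextual influence on treatment assignment, and \sigma_u^2 is the between-hospital variance. Setting \sigma_u^2 = 0 recovers the ordinary logistic regression PS model.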
Reporting of the different aspects of PS analysis is as important as the analysis itself for better appraisal of the studies [9].