
Panacea or poison: Assessing how well basic propensity score modeling can replicate results from randomized controlled trials in criminal justice research

Journal of Experimental Criminology

Abstract

As a substitute for randomized controlled trials (RCTs), researchers increasingly rely on propensity score modeling (PSM) to estimate causal effects. However, some warn about the dangers of placing too much blind faith in the abilities of PSM. This study tests the reliability and validity of seven common PSM methods in their ability to remove an artificial selection bias and replicate the results of several RCTs in criminal justice data. Meta-analyses reveal that the average difference between PSM and RCT estimates was relatively small. Ultimately, our findings suggest that PSM can be an effective means of simulating an RCT while also harboring reason for concern. Researchers and policy-makers should approach the use and interpretation of PSM with cautious optimism, as it appears to provide a reliable and valid estimate of the treatment effect most of the time.


Notes

  1. The NACJD is a subsection of the Inter-university Consortium for Political and Social Research (ICPSR), an online data-sharing repository. ICPSR and NACJD partner with the federal government and public funding institutions to ensure that any data collected under the auspices of such funding are publicly available.

  2. Many of these records were duplicates given the similarities in the keyword search terms.

  3. Although seemingly inherent in the term RCT, many studies identified by this keyword did not involve true random assignment over the course of the project. For instance, some researchers may have been required to stop random assignment in the middle of an evaluation for ethical reasons, such as when interim evidence showed that the rehabilitation program under study was improving behavior. These studies may have introduced bias into the treatment effects, and we therefore opted not to include them in the current investigation.

  4. This number is based on what was needed to prepare the data for PSM. A power analysis indicated that detecting a true difference of a medium effect size (Cohen’s d = .5) with approximately .80 power required at least 65 cases per group (Cohen, 1988). To ensure there were at least twice as many comparison cases available for matching once we introduced the 50% selection bias to the treatment group, our analysis required a minimum of 130 cases per group in the original study.
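
As a rough illustration (not the authors’ code), this power calculation can be reproduced in Python with statsmodels; the tooling choice is ours, as the footnote cites only Cohen’s (1988) tables.

```python
# Sample size for an independent-samples t-test: d = .5, power = .80,
# alpha = .05 (two-sided), matching the footnote's setup.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(n_per_group)  # ~63.8, in line with the at-least-65 cases per group cited above
```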

  5. We ran our analyses with and without this study, and there were no substantive or statistical differences in the results.

  6. We estimated the standardized percent bias using Austin’s (2011) two formulas. For continuous measures, \(d=\frac{\bar{x}_{treatment}-\bar{x}_{control}}{\sqrt{\left(s_{treatment}^{2}+s_{control}^{2}\right)/2}}\), where \(\bar{x}\) denotes the mean and \(s^{2}\) the sample variance of the respective group (treatment or control). For dichotomous measures, \(d=\frac{\hat{P}_{treatment}-\hat{P}_{control}}{\sqrt{\left(\hat{P}_{treatment}\left(1-\hat{P}_{treatment}\right)+\hat{P}_{control}\left(1-\hat{P}_{control}\right)\right)/2}}\), where \(\hat{P}\) denotes the proportion in the respective group.
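
A minimal sketch of these two formulas in Python (illustrative only; the function names are ours):

```python
import numpy as np

def d_continuous(x_treat, x_ctrl):
    # Mean difference over the pooled standard deviation of the two groups.
    num = np.mean(x_treat) - np.mean(x_ctrl)
    den = np.sqrt((np.var(x_treat, ddof=1) + np.var(x_ctrl, ddof=1)) / 2)
    return num / den

def d_dichotomous(p_treat, p_ctrl):
    # Difference in proportions over the pooled binomial standard deviation.
    num = p_treat - p_ctrl
    den = np.sqrt((p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)) / 2)
    return num / den
```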

  7. Although others have argued that acceptable standardized differences can range from 10 to 25% (Stuart et al., 2013), we opted to rely on the original standard set by Rosenbaum and Rubin (1985), as 20% is a slightly more conservative ceiling.

  8. The remaining PSM studies either used a different conditioning approach (covariate balancing propensity score estimation or machine learning) or, in nine studies, did not mention the technique used to condition the score at all.

  9. We refer readers to Guo and Fraser (2014) for a more detailed description of these techniques and their assumptions.

  10. It is worth noting that while Austin (2010) highlighted .20 rather than .25 as a good standard, the .25 we use here is based on the original conceptualization of Rosenbaum and Rubin (1985) and likely captures the wider practice of researchers both in academia and embedded in justice agencies.
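
As an illustration of how such a caliper is typically computed, a sketch assuming the Rosenbaum and Rubin (1985) convention of scaling by the standard deviation of the propensity score logit (the variable names are ours):

```python
import numpy as np

def caliper_width(pscores, k=0.25):
    # k standard deviations of the logit of the estimated propensity score.
    logit = np.log(pscores / (1 - pscores))
    return k * np.std(logit, ddof=1)
```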

  11. The minimum number of matched controls was represented by \(minimum\;n=\frac{\left(1-\frac{t}{t+c}\right)/2}{t/(t+c)}\), where t is the number of treatment cases in the biased sample and c is the number of comparison cases from which to draw matches. Similarly, the maximum number of controls to match to each treatment case was represented by \(maximum\;n=\frac{2\left(1-\frac{t}{t+c}\right)}{t/(t+c)}\).
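
A minimal sketch of these bounds in Python (the names are ours); both expressions reduce to simple ratios of the group counts:

```python
def min_matched_controls(t, c):
    # ((1 - t/(t+c)) / 2) / (t/(t+c)), which simplifies to c / (2 * t).
    p = t / (t + c)
    return ((1 - p) / 2) / p

def max_matched_controls(t, c):
    # 2 * (1 - t/(t+c)) / (t/(t+c)), which simplifies to 2 * c / t.
    p = t / (t + c)
    return 2 * (1 - p) / p
```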

  12. This is an important distinction because some 1-to-many matching schemes force a matched set to achieve a certain number of cases. If the researcher determines that each treatment case should have three matched controls, then each matched set will invariably contain four cases. Any treatment case that cannot achieve three matched controls is either lost (when a caliper is employed) or is forced to match with an otherwise incompatible control (when a caliper is not used; see Ming & Rosenbaum, 2000).

  13. It is still possible that adequate matches are not found for both control and treatment cases.

  14. As its name suggests, the IPTW applies a weight to control cases equal to each case’s odds of being in the treatment group. This weight (\(\omega\)) for each case (\(x\)) is calculated as \(\omega\left(t,x\right)=t+(1-t)\frac{Pr}{1-Pr}\), where t is the treatment indicator (1 for treated cases, 0 for untreated) and Pr is the propensity score (see Guo & Fraser, 2014, citing Hirano & Imbens, 2001; Hirano, Imbens, & Ridder, 2003).
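
A minimal sketch of this weighting scheme in Python (illustrative; not the authors’ code):

```python
import numpy as np

def iptw_weights(treated, pscore):
    # Treated cases (t = 1) receive a weight of 1; control cases (t = 0)
    # receive Pr / (1 - Pr), the odds of treatment given the covariates.
    treated, pscore = np.asarray(treated), np.asarray(pscore)
    return treated + (1 - treated) * pscore / (1 - pscore)
```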

  15. After accounting for some degree of common support, the control weight is calculated as \(\omega_{s}=\frac{n_{z,s}}{n_{z',s}}\), where \(n_{z,s}\) is the number of units assigned to the treatment group within stratum \(s\), and \(n_{z',s}\) is the number of units in the control group within stratum s (see Hong, 2010, p. 519). All treatment units receive a weight of 1 (i.e., they are unweighted).
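
A minimal sketch of this stratum weighting in Python (illustrative; it assumes strata have already been formed on the propensity score and that each stratum contains both groups):

```python
import numpy as np

def stratum_control_weights(treated, stratum):
    # Controls are weighted by the treated-to-control ratio within their
    # stratum; treated units keep a weight of 1.
    treated, stratum = np.asarray(treated), np.asarray(stratum)
    w = np.ones(len(treated), dtype=float)
    for s in np.unique(stratum):
        n_treat = np.sum((stratum == s) & (treated == 1))
        n_ctrl = np.sum((stratum == s) & (treated == 0))
        w[(stratum == s) & (treated == 0)] = n_treat / n_ctrl
    return w
```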

  16. It is reasonable to expect that the original RCT samples possess relatively low AUC values (e.g., < .600), whereas the biased samples should yield much higher AUC values (e.g., > .800). The closer the AUC value of a PSM sample gets to .500, the more it can be said that the propensity score can no longer differentiate between the treatment and control cases (i.e., the two groups are balanced). To calculate an AUC for the unbiased, experimental data, we fit a logistic regression model to the original dataset with the same measures used in the biased samples’ propensity score. All AUC statistics were calculated using the DeLong et al. (1988) approach and compared using Hanley and McNeil’s (1982) test of significance for independent sample curves. While the AUC does not address any misspecification of the propensity-score conditioning logistic regression, we assessed the fit of each logit model individually.
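
A minimal sketch of this balance check in Python with scikit-learn (the tooling choice is ours; the notes do not name software for this step):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def balance_auc(X, treated):
    # Fit treatment status on the covariates and score the fitted
    # probabilities; an AUC near .500 means the two groups are no
    # longer distinguishable on these measures.
    probs = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    return roc_auc_score(treated, probs)
```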

  17. Equally important to the reduction of observed bias is the assessment of hidden bias. Because PSM is a quasi-experimental design, there is always the potential that an unobserved covariate would have altered the findings had it been observed. To test for this, we used the sensitivity analysis posited by Rosenbaum (2002, 2005), which focuses on the difference in the matched and unmatched outcomes. Specifically, we used the user-written Stata commands mhbounds for dichotomous outcomes and rbounds for continuous outcomes. These tests assess how sensitive the findings are to potential hidden bias by simulating the ability of an unobserved covariate to predict assignment to the treatment condition, expressed as gamma. Gamma is essentially a measure of how much an unobserved covariate would have to improve the prediction of treatment assignment relative to the current propensity score model. The larger the gamma at which the findings hold, the more robust they are to hidden bias.
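
For reference, a sketch of the bound that gamma formalizes, as it appears in the sensitivity-analysis literature (Rosenbaum, 2002): two units i and j with identical observed covariates may differ in their odds of treatment assignment by at most a factor of \(\Gamma\), i.e., \(\frac{1}{\Gamma}\leq\frac{\pi_{i}\left(1-\pi_{j}\right)}{\pi_{j}\left(1-\pi_{i}\right)}\leq\Gamma\), where \(\pi\) denotes the probability of receiving treatment; \(\Gamma=1\) corresponds to a perfectly randomized experiment.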

  18. Cohen’s d is the standardized difference between two means, and it is calculated as the difference in means between two groups divided by their pooled standard deviation.
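
A minimal sketch of this calculation in Python (the function name is ours):

```python
import numpy as np

def cohens_d(x1, x2):
    # Difference in means divided by the pooled standard deviation.
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / pooled_sd
```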

  19. If the two 95% CIs did not overlap, we considered them statistically different from one another (Cumming & Calin-Jageman, 2016).

  20. We interpreted r values of .1, .3, and .5 as indicative of small, medium, and large correlations (Cohen, 1988).

  21. For this analysis, we averaged the outcomes within study to produce only one ES per unique sample. This process ensured that our meta-analysis would be weighted by sample size and not by the number of outcomes included.

  22. The random-effects model was selected a priori on conceptual grounds because this method can be used to extend the results of the meta-analysis to a wider population of studies when it cannot be determined with any degree of certainty that the current population of studies is functionally similar (see Borenstein et al., 2009).

  23. Although it is common for meta-analysts to test study heterogeneity with the Q test, the Q statistic only indicates the presence or absence of heterogeneity, not its extent. In contrast, the I2 statistic quantifies the degree of heterogeneity between studies and is presented in easily comparable percentage terms. We interpreted I2 according to Higgins and Thompson’s (2002) guidelines, where values of around 25%, 50%, and 75% indicate low, medium, and high levels of heterogeneity among the ESs, respectively.
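
A minimal sketch of a random-effects pool with Q and I2 in Python (we assume a DerSimonian–Laird estimator for illustration; the notes do not name the estimator used):

```python
import numpy as np

def random_effects_meta(es, var):
    # es: one effect size per study; var: its sampling variance.
    es, var = np.asarray(es, float), np.asarray(var, float)
    w = 1 / var                                    # fixed-effect weights
    fixed_mean = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - fixed_mean) ** 2)         # Cochran's Q
    df = len(es) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = 1 / (var + tau2)                        # random-effects weights
    pooled = np.sum(w_re * es) / np.sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2
```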

  24. We also conducted the same set of analyses by averaging the ESs within each study first and then making comparisons (n = 11). This process yielded similar findings to those presented here.

  25. The .24 difference in d was established in the education literature and is specific to educational performance under various intervening practices. That said, it is relevant to many, if not all, of the outcomes we used, in that educational measures are typically behavioral and/or scaled attitudinal tests, and some of the study samples involved educational settings for crime prevention programs.

References

  • Apel, R. J., & Sweeten, G. (2010). Propensity score matching in Criminology and Criminal Justice. In A. R. Piquero & D. Weisburd (Eds.), Handbook of Quantitative Criminology (pp. 543–562). New York: Springer.


  • Austin, P. C. (2008). A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Statistics in Medicine, 27(12), 2037–2049.


  • Austin, P. C. (2009). Some methods of propensity-score matching had superior performance to others: Results of an empirical investigation and Monte Carlo simulations. Biometrical Journal, 51(1), 171–184.


  • Austin, P. C. (2010). Statistical criteria for selecting the optimal number of untreated subjects matched to each treated subject when using many-to-one matching on the propensity score. American Journal of Epidemiology, 172(9), 1092–1097.


  • Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424.


  • Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679.


  • Braga, A. A., Piehl, A. M., & Hureau, D. (2009). Controlling violent offenders released to the community: An evaluation of the Boston Reentry Initiative. Journal of Research in Crime and Delinquency, 46(4), 411–436.


  • Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons.


  • Campbell, C. M., Labrecque, R. M., Mohler, M. E., & Christmann, M. J. (2022). Gender and community supervision: Examining differences in violations, sanctions, and recidivism outcomes. Crime & Delinquency, 68(2), 284–325.


  • Campbell, C. M., Abboud, M. J., Hamilton, Z. K., vanWormer, J., & Posey, B. (2019). Evidence-based or just promising? Lessons learned in taking inventory of state correctional programming. Justice Evaluation Journal, 1(2), 188–214.


  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.


  • Cole, S. R., Platt, R. W., Schisterman, E. F., Chu, H., Westreich, D., Richardson, D., & Poole, C. (2010). Illustrating bias due to conditioning on a collider. International Journal of Epidemiology, 39(2), 417–420.


  • Cumming, G., & Calin-Jageman, R. (2016). Introduction to the New Statistics: Estimation, Open Science, and Beyond (Reprint edition). Routledge.


  • Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053–1062.


  • Dehejia, R. H., & Wahba, S. (2002). Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics, 84(1), 151–161.


  • DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837. https://doi.org/10.2307/2531595


  • Diamond, A., & Sekhon, J. S. (2012). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. The Review of Economics and Statistics, 95(3), 932–945.


  • Dong, N., & Lipsey, M. W. (2018). Can propensity score analysis approximate randomized experiments using pretest and demographic information in pre-k intervention research? Evaluation Review, 42, 34–70.


  • Freedman, D. A., & Berk, R. A. (2008). Weighting regressions by propensity scores. Evaluation Review, 32(4), 392–409.


  • Gaes, G. G., Bales, W. D., & Scaggs, S. J. A. (2016). The effect of imprisonment on recommitment: An analysis using exact, coarsened exact, and radius matching with the propensity score. Journal of Experimental Criminology, 12, 143–158.


  • Gottfredson, D. C., Cook, T. D., Gardner, F. E., Gorman-Smith, D., Howe, G. W., Sandler, I. N., & Zafft, K. M. (2015). Standards of evidence for efficacy, effectiveness, and scale-up research in prevention science: Next generation. Prevention Science, 16(7), 893–926.


  • Guo, S., & Fraser, M. W. (2014). Propensity score analysis: Statistical methods and applications (2nd ed.). SAGE Publications Inc.


  • Hamilton, Z. K., Campbell, C. M., van Wormer, J., Kigerl, A., & Posey, B. (2016). The impact of swift and certain sanctions: An evaluation of Washington State’s policy for offenders on community supervision. Criminology & Public Policy, 15(4), 1009–1072.


  • Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747


  • Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99(467), 609–618.


  • Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.


  • Hill, J. (2008). Discussion of research using propensity-score matching: Comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by Peter Austin. Statistics in Medicine, 27(12), 2055–2061.


  • Hirano, K., & Imbens, G. W. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology, 2(3–4), 259–278.


  • Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161–1189.


  • Hong, G. (2010). Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data. Journal of Educational and Behavioral Statistics, 35(5), 499–531.


  • Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for evaluating multivalued and multiple treatments with nonexperimental data. Psychological Methods, 17(1), 44.


  • Hong, H., Aaby, D. A., Siddique, J., & Stuart, E. A. (2019). Propensity score-based estimators with multiple error-prone covariates. American Journal of Epidemiology, 188(1), 222–230.


  • Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (statistical Methodology), 76(1), 243–263.


  • Kim, R. H., & Clark, D. (2013). The effect of prison-based college education programs on recidivism: Propensity Score Matching approach. Journal of Criminal Justice, 41(3), 196–204.


  • King, G., & Nielsen, R. (2019). Why propensity scores should not be used for matching. Political Analysis, 27(4), 435–454.


  • Labrecque, R. M., Mears, D., & Smith, P. (2019). Gender and the effect of disciplinary segregation on prison misconduct. Advance online publication.


  • LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76(4), 604–620.


  • Loughran, T. A., Wilson, T., Nagin, D. S., & Piquero, A. R. (2015). Evolutionary regression? Assessing the problem of hidden biases in criminal justice applications using propensity scores. Journal of Experimental Criminology, 11(4), 631–652. https://doi.org/10.1007/s11292-015-9242-y


  • Luellen, J. K., Shadish, W. R., & Clark, M. H. (2005). Propensity scores: An introduction and experimental test. Evaluation Review, 29(6), 530–558.


  • Lunt, M. (2014). Selecting an appropriate caliper can be essential for achieving good balance with propensity score matching. American Journal of Epidemiology, 179(2), 226–235.


  • MacDonald, J., Stokes, R. J., Ridgeway, G., & Riley, K. J. (2007). Race, neighbourhood context and perceptions of injustice by the police in Cincinnati. Urban Studies, 44(13), 2567–2585.


  • McCaffrey, D., Ridgeway, G., & Morral, A. (2004). Propensity score estimation with boosted regression for evaluating adolescent substance abuse treatment. Psychological Methods, 9(4), 403–425.


  • McNiel, D. E., & Binder, R. L. (2007). Effectiveness of a mental health court in reducing criminal recidivism and violence. American Journal of Psychiatry, 164(9), 1395–1403.


  • Ming, K., & Rosenbaum, P. R. (2000). Substantial gains in bias reduction from matching with a variable number of controls. Biometrics, 56(1), 118–124.


  • Nagin, D. S., & Sampson, R. J. (2019). The real gold standard: Measuring counterfactual worlds that matter most to social science and policy. Annual Review of Criminology, 2(1), 123–145.


  • Peikes, D. N., Moreno, L., & Orzol, S. M. (2008). Propensity score matching: A note of caution for evaluators of social programs. The American Statistician, 62(3), 222–231.


  • Ridgeway, G., & McCaffrey, D. F. (2007). Comment: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 540–543.


  • Rosenbaum, P. R. (1984). From association to causation in observational studies: The role of tests of strongly ignorable treatment assignment. Journal of the American Statistical Association, 79(385), 41–48.


  • Rosenbaum, P. R. (2002). Observational studies. Springer.


  • Rosenbaum, P. R. (2005). Heterogeneity and causality. The American Statistician, 59(2), 147–152. https://doi.org/10.1198/000313005X42831


  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.


  • Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33–38.


  • Rubin, D. B. (2006). Matched sampling for causal effects. Cambridge University Press.


  • Shadish, W. R. (2013). Propensity score analysis: Promise, reality and irrational exuberance. Journal of Experimental Criminology, 9(2), 129–144.


  • Shadish, W. R., Clark, M. H., Steiner, P. M., & Hill, J. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. Journal of the American Statistical Association, 103(484), 1334–1350.


  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

  • Smith, J. A., & Todd, P. E. (2005). Does matching overcome LaLonde’s critique of nonexperimental estimators? Journal of Econometrics, 125(1–2), 305–353.


  • Smith, J. A., & Todd, P. E. (2001). Reconciling conflicting evidence on the performance of propensity-score matching methods. American Economic Review, 91(2), 112–118.


  • Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250–267.


  • Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1–21.


  • Stuart, E. A., Lee, B. K., & Leacy, F. P. (2013). Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. Journal of Clinical Epidemiology, 66(8), S84-S90.e1.


  • ten Bensel, T., Gibbs, B., & Lytle, R. (2014). A propensity score approach towards assessing neighborhood risk of parole revocation. American Journal of Criminal Justice, 40(2), 377–398.


  • Ury, H. K. (1975). Efficiency of case-control studies with multiple controls per case: Continuous or dichotomous data. Biometrics, 31(3), 643–649.


  • van Wormer, J. G., & Campbell, C. (2016). Developing an alternative juvenile programming effort to reduce detention overreliance. Journal of Juvenile Justice, 5(2), 12.


  • Vito, G. F., Higgins, G. E., & Tewksbury, R. (2017). The effectiveness of parole supervision: Use of propensity score matching to analyze reincarceration rates in Kentucky. Criminal Justice Policy Review, 28(7), 627–640.


  • Wooldridge, J. M. (2005). Violating ignorability of treatment by controlling for too many factors. Econometric Theory, 21(5), 1026–1028.



Acknowledgements

The authors would like to thank Shenyang Guo, Zachary Hamilton, Stephen Vaisey, and Ozcan Tunalilar for their valuable feedback during this process.

Funding

This project was supported by a grant from the National Institute of Justice (Award #2016-R2-CX-0030). The opinions, findings, and conclusions expressed in this article are those of the authors and do not necessarily reflect those of the Department of Justice.


Corresponding author

Correspondence to Christopher M. Campbell.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(DOCX 15.5 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Campbell, C.M., Labrecque, R.M. Panacea or poison: Assessing how well basic propensity score modeling can replicate results from randomized controlled trials in criminal justice research. J Exp Criminol 20, 229–253 (2024). https://doi.org/10.1007/s11292-022-09532-y

