What do we learn about case studies from follow-up regression analysis?

Most multi-method research (MMR) studies with which I am familiar start with regression analysis (or, in recent years, QCA) and perform the case studies afterward. This is the order recommended by Lieberman in his nested analysis article which, in turn, most probably reflects the widely held view that case studies are only worth doing in MMR when something is going on at the cross-case level. Against this backdrop, it was refreshing to see that Weller and Barnes have a chapter on the opposite sequence in their book on pathway analysis. The goal of their chapter is to explain how a case study can benefit from a regression analysis that is done following the case studies (which is something I also consider in my nested analysis article on which I partially draw here).

The chapter was, however, a bit disappointing because neither of its two arguments is very surprising. First, based on the regression results, we can determine the location of the case and assess how it fits into the analysis. What is not explained is what we should infer from observing that the initially selected case is well-predicted or, more interestingly, badly predicted. (They do not use this terminology, but this is what results-based case classification is about, in my view.) Second, we can use the results to locate additional cases that are suitable for follow-up studies and for comparisons with the original case. This certainly is a benefit of a follow-up regression analysis, but it boils down to an iterative/multi-stage analysis with a regression analysis sandwiched between case studies (see figure 1 in Lieberman’s article).

So what can we learn from a regression run after a case study about the value of the latter? Nothing, I would say. Imagine that we select a case based on the principles developed in the case study literature, starting with Lijphart and Eckstein, in order to test a hypothesis on a mechanism. Once we have estimated the model, we produce an observed-vs-predicted plot and find that our case (the blue marker) stands apart from the other cases (the green markers). If we had first estimated the model and generated the plot, we probably would not have chosen this case because we would have taken it to be deviant and therefore unlikely to be suitable for learning something about the mechanism.

Hypothetical observed-vs-predicted plot

However, we need not be concerned if we first implement process tracing and later learn the case is deviant when put into context. No case is unsuitable for process tracing per se, but some cases are expected to be more useful than others. You expect typical cases to convey relevant insights on a mechanism because, according to your model, they are well-predicted. Conversely, badly predicted cases are not expected to contain information about a causal mechanism that you can generalize.
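To make the classification concrete, here is a minimal sketch of results-based case classification. It assumes a bivariate OLS model and a cutoff of two residual standard errors for calling a case badly predicted. The data, the function name `classify_cases`, and the cutoff are illustrative assumptions and are not taken from Weller and Barnes or from Lieberman.

```python
import numpy as np

def classify_cases(x, y, threshold=2.0):
    """Fit y = a + b*x by OLS and label cases by standardized residual.

    The threshold of 2.0 is an illustrative assumption, not a fixed rule.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + covariate
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    predicted = X @ beta
    residuals = y - predicted
    # Residual standard error with n - k degrees of freedom
    sigma = np.sqrt(np.sum(residuals ** 2) / (len(y) - X.shape[1]))
    standardized = residuals / sigma
    labels = np.where(np.abs(standardized) > threshold, "deviant", "typical")
    return predicted, standardized, labels

# Hypothetical, stylized data: nine cases on the line y = 1 + 2x,
# plus the initially selected case (index 4) lying well above it.
x = np.arange(1, 11)
y = 1.0 + 2.0 * x
y[4] += 8.0

predicted, standardized, labels = classify_cases(x, y)
for i, (obs, pred, lab) in enumerate(zip(y, predicted, labels)):
    print(f"case {i}: observed={obs:.1f} predicted={pred:.1f} -> {lab}")
```

Whatever label the selected case receives, the process-tracing evidence collected beforehand is unchanged; the label only describes how the case sits relative to the fitted line.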

These expectations can be right or wrong. In the worst-case scenario, in which the marginal effect of a covariate is significant but the estimated relationship is spurious, you of course cannot learn anything about a mechanism from any typical case. Your expectation can also be wrong for a deviant case, that is, a badly predicted case might be perfectly suitable for generating valid inferences on a causal mechanism. (The case is nevertheless deviant because of the non-modeled effect of a factor, which is captured by the residual and is unrelated to the operation of the mechanism between your covariate of interest and the outcome.)
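The point about the non-modeled factor can be illustrated with a small, stylized sketch. It assumes hypothetical data in which the outcome depends on the covariate of interest `x` and on a second factor `z` that is present in only one case and is nearly unrelated to `x`. All names and numbers are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: the effect of x on y is 2 in every case,
# and an additional factor z shifts the outcome of case 4 only.
x = np.arange(1, 11, dtype=float)
z = np.zeros_like(x)
z[4] = 1.0
y = 1.0 + 2.0 * x + 8.0 * z

def ols(columns, y):
    """Return OLS coefficients (intercept first) and residuals."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Model without z: the effect of z ends up in the residual of case 4.
beta_short, resid_short = ols([x], y)
# Model with z: case 4 is no longer badly predicted.
beta_long, resid_long = ols([x, z], y)

print("slope of x without z:", round(beta_short[1], 3))
print("slope of x with z:   ", round(beta_long[1], 3))
print("worst-predicted case without z:", int(np.argmax(np.abs(resid_short))))
print("largest residual with z:", round(float(np.max(np.abs(resid_long))), 6))
```

In this setup, case 4 is the worst-predicted case in the model without `z`, yet the estimated slope of `x` stays close to its true value of 2 in both models. The case is deviant because of `z`, not because the link between `x` and the outcome works differently in it.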

If you are concerned that your regression model is not properly specified because your case turns out to be badly predicted, you should run model diagnostics that test whether the underlying assumptions are met (which you should also do when your case is typical according to your regression results). The classification of cases based on the regression model remains a useful feature of multi-method research and is superior to case selection that ignores the large-n results. However, we should not infer too much from the distribution of cases or from the classification of any single case.

About Ingo Rohlfing

I am a political scientist. My teaching and research cover social science methods with an emphasis on case studies, multi-method research, causation, and causal inference. I have also become interested in matters of research transparency and credibility. ORCID: 0000-0001-8715-4771