How often do reviews of a paper contain the sentence “unfortunately, the link between these data is correlative”? As authors, we fear this critique: it is either the death sentence for a manuscript or, with a more forgiving editor, the beginning of a long series of new experiments. So why is a correlative link considered weak? One common answer is that correlated observations can equally represent a cause-effect relation between interacting components (A causes B, which causes C) or a common-cause relation between two components that do not interact with each other (B and C are both caused by A; see Fig. 1). Although this is a valid concern, the root of the problem lies in the unclear definition of what causation actually means, which turns causality into a matter of subjective judgement. Even mathematicians, whose job is to bring formalism to science, are still engaged in a vigorous debate about how to define causation.

Figure 1

Granger causality can distinguish between cause-effect (left panel) and common-cause (right panel) relationships, which are usually indistinguishable by correlation alone.

Cell biologists usually perturb pathways to establish cause-effect relationships. We break the system and then conclude that the perturbed pathway component is responsible for the difference we observe relative to the behaviour of the unperturbed system; for example, tens of thousands of studies have inferred the function of a protein from the phenotype produced by its knockdown. Although this approach has led to immensely valuable models of cellular pathways, it has its limitations. First, the approach is, at best, again correlative: all it does is correlate the intervention with shifts in system behaviour. Second, this correlation is relatively easy to interpret when the perturbed component and the measured system parameter are linearly related, but nonlinearities in the system complicate the analysis of the outcome. Third, the system may adapt to the intervention, and the intervention may have side effects. If there is a correlation between an intervention and the behaviour of the altered system, how can we be sure that it is indeed related to the perturbed component? Strictly, we can only conclude that we observe the system behaviour in the absence of the intact component; it is exceedingly difficult to infer how the targeted component contributed to the behaviour of the unperturbed system. Of course, we perform controls to address this issue. For example, we pair knockdown of a protein with its overexpression, or carry out rescue experiments in mutants. But how often are conclusions drawn from imperfect controls? On the bright side, many powerful tools are emerging that can perturb pathways specifically, acutely and locally. Although these tools will not remove the ambiguity of interpreting interventions in nonlinear systems, they will greatly reduce the risk of system adaptation.

Additional resources for dealing with the causality problem may also come from areas of science that have few or no tools to intervene in the system under study. For instance, economists have been working for decades on tools to infer, without perturbation, what they call “causality” between components. One such straightforward tool is Granger causality. Applied to the simple example of a cause-effect and a common-cause scenario, Granger causality correctly identifies a causal link between pathway components B and C in the linear case (Fig. 1, left panel), but no link when B and C are downstream of a common input A (Fig. 1, right panel), provided that the activity of A is also monitored and taken into account. Granger causality relies on two pieces of information to reconstruct this topology: time and the basal stochastic fluctuations in the activity of each component. Time indicates directionality: what happens first ought to be upstream of what happens next. Basal fluctuations indicate topology: in the linear pathway, fluctuations in the activity of component C depend both on the fluctuations of A transmitted through B and on the intrinsic fluctuations of B, whereas this dependence on the fluctuations of B is absent when A is the common input to B and C. The power of this approach is that we no longer need to worry about possible side effects of interventions, and it offers an explicit framework to extract feedback and feedforward relations from pairwise correlations between variables. To adapt this approach for cell biology, three goals have to be met. First, specific probes are needed that monitor the relevant activity fluctuations of pathway components without perturbing the pathway. Second, methods must be developed to sample in situ activity fluctuations at the time scale of information transfer between linked components. Cell biologists are well on their way to reaching these goals, for example with live-cell imaging. Finally, some of the technicalities of the economists' tools will need to be adjusted to the particular scenario of a cell biological pathway. So, with the right tools, correlative experiments can yield an analysis of causation.
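To make the logic of the Granger argument concrete, here is a minimal Python sketch (NumPy only) that simulates the two topologies of Fig. 1 as hypothetical linear autoregressive models and asks whether the past of B improves the prediction of C. The coefficients, the lag order `p`, the sample size and the helper names (`simulate`, `lagged`, `residual_variance`, `granger_score`) are illustrative assumptions, not part of any established pipeline. The score is the log ratio of residual variances from two least-squares regressions, one without and one with the past of B; when the past of A is supplied as a conditioning series, this is the conditional form of Granger causality, which is what removes the apparent B→C link in the common-cause case.

```python
import numpy as np


def simulate(n, common_cause, rng):
    """Simulate activities of components A, B, C under a hypothetical linear model.

    Each component has its own basal stochastic fluctuation (unit-variance noise).
    common_cause=False: A -> B -> C (cause-effect chain, Fig. 1 left)
    common_cause=True:  A -> B and A -> C (common cause, Fig. 1 right)
    """
    a, b, c = np.zeros(n), np.zeros(n), np.zeros(n)
    noise = rng.standard_normal((3, n))
    for t in range(2, n):
        a[t] = 0.5 * a[t - 1] + noise[0, t]
        b[t] = 0.8 * a[t - 1] + noise[1, t]          # A acts on B with a one-step delay
        if common_cause:
            c[t] = 0.8 * a[t - 2] + noise[2, t]      # A acts on C directly, two-step delay
        else:
            c[t] = 0.8 * b[t - 1] + noise[2, t]      # B relays A's signal to C
    return a, b, c


def lagged(series_list, p):
    """Design matrix of lags 1..p of every series (plus intercept), rows aligned with t = p..n-1."""
    n = len(series_list[0])
    cols = [s[p - k : n - k] for s in series_list for k in range(1, p + 1)]
    cols.append(np.ones(n - p))
    return np.column_stack(cols)


def residual_variance(y, X):
    """Variance of the ordinary least-squares residuals of y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)


def granger_score(source, target, condition=None, p=2):
    """log(restricted / full) residual variance of target.

    Restricted model: past of target (and of condition, if given).
    Full model: the same regressors plus the past of source.
    Values near 0 mean the past of source adds no predictive information.
    """
    base = [target] if condition is None else [target, condition]
    y = target[p:]
    restricted = residual_variance(y, lagged(base, p))
    full = residual_variance(y, lagged(base + [source], p))
    return np.log(restricted / full)


rng = np.random.default_rng(0)
for label, common in [("cause-effect", False), ("common-cause", True)]:
    a, b, c = simulate(5000, common, rng)
    print(f"{label}: B->C given A = {granger_score(b, c, a):.3f}, "
          f"B->C pairwise = {granger_score(b, c):.3f}")
```

With these assumed coefficients, the conditional score for the cause-effect chain should come out close to ln(1.64) ≈ 0.49, reflecting the variance that the intrinsic fluctuations of B add to C, while the conditional score for the common-cause model should sit near zero. The pairwise score, which ignores A, stays clearly positive in the common-cause case, which illustrates why the common input has to be observed for the two topologies to be told apart.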