Identification and Quantification of Causal Relations

The observation of regularities between events is an essential cognitive task for understanding the physical world. In the time domain, these regularities consist of constant sequences of phenomena, which nowadays are investigated by various mathematical disciplines such as statistics and machine learning [1, 2]. In recent years, artificial intelligence has driven impressive progress in the analysis of time series. An incredible number of tasks, from customer advice to investment decisions and medical diagnosis, are performed entirely by, or with crucial support from, statistical and machine learning tools. And the range of human activities that can benefit from information-processing tools is constantly increasing, as confirmed recently by the success of ChatGPT [3].

However, traditional artificial intelligence and machine learning tools are not conceived to distinguish between association and causation [4, 5]. On the other hand, the real objective of scientific and engineering studies often consists of determining the causal relations between phenomena. Causality requires a level of analysis deeper than the assessment of statistical dependence. Indeed, from a mathematical standpoint, the objective of dependence investigations is to obtain the actual probability density function (pdf) of the data. This is not sufficient for causal analysis, which must guide operational behaviour and in particular provide guidance about the effects of interventions. Basically, the difference between statistical correlation and causality is the one between observing and acting.

Even if determining causal relationships is more difficult than simply calculating correlations, modern societies and big physics experiments nowadays tend to produce huge amounts of data. Consequently, a lot of potentially very useful information is becoming available to support deeper-level causality studies. These considerations have motivated the development of techniques aimed at extracting as much knowledge as possible about causal influences from data, a field called Observational Causality Detection (OCD). The theories, tools and software packages developed so far constitute a very significant body of knowledge for cross-sectional data, as reported in various overview papers. On the other hand, techniques for the analysis of the causal interactions between time series have not reached the same level of maturity. Some approaches have been proposed, but an organic, general view has not been consolidated [6,7,8,9,10].

The goal of the present work consists of showing the potential of Time Delay Neural Networks (TDNNs) and their ensembles to quantify the causal relations between time series [11, 12], and of showing how they can be applied to nuclear fusion problems, where time series are crucial for both physics understanding and control. Their potential is extremely high, because they can not only shed light on the mutual influences between signals but can also quantify the strength of their causal interactions. The developed TDNN ensembles are very competitive with all the other techniques available and can outperform them significantly, particularly in the most difficult applications.

The use of TDNN ensembles can play a relevant role in several nuclear fusion problems, since thermonuclear plasmas are a typical unsteady complex physical system. High temperature plasmas are complex systems that have to be kept well out of equilibrium by the injection of matter and energy [13,14,15]. Indeed, to produce the required rate of fusion reactions they have to reach temperatures higher than those of the core of the sun. This energy content cannot be achieved with simple ohmic heating, and therefore sophisticated additional heating schemes have been developed over the years. The calculation of the local effects of this injected power is a difficult task that requires very sophisticated simulation codes [16,17,18,19]. Ensembles of TDNNs could be very useful not only to confirm the results of these simulations but also to provide rapid answers for intershot analysis.

With regard to the organisation of the paper, the next section introduces the technology of Time Delay Neural Networks (TDNNs), their ensembles and how they can be refined to investigate causal relationships between time series. In Sect. "Validation of the Method with Synthetic Cases" the potential of TDNN ensembles is substantiated with the help of various numerical tests. The application to the power deposition profile in fusion is the subject of Sect. "Nuclear Fusion Case Study: Additional Heating Profile Estimation", before the conclusions.

Causality Detection with TDNN Ensembles

Causality detection techniques are based on different assumptions. One of the most used classes of algorithms for causality detection is based on the definition of causality proposed by Granger [6, 9, 10]: a variable X(t) evolving in time “Granger-causes” another time-evolving variable Y(t) if the past values of X(t) (at t-1, t-2, etc.) allow for a better prediction of Y(t) than the prediction obtained without the use of past X(t). So basically Granger causality relies on a simple statistical hypothesis test for determining whether one time series provides useful information to forecast another. The methods reported in the present work, already analysed with various synthetic cases in [11, 12], are based on this concept.
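
For contrast with the non-linear approach developed here, the classical linear Granger test can be run in a few lines. The following Python sketch uses the grangercausalitytests routine of statsmodels on a synthetic pair of autoregressive signals; all coefficients and parameters are illustrative assumptions, not values from this work.

```python
# Minimal illustration of the classical (linear) Granger test that the
# TDNN approach generalises; the coefficients below are illustrative.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 2000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal(0, 0.1)
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t - 1] + rng.normal(0, 0.1)  # x drives y

# Tests whether the 2nd column Granger-causes the 1st one
res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
```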

Adopted Approach to Causality Detection

Consider the following experimental time series: a set of causes \({X}_{1}\left(t\right), {X}_{2}\left(t\right), \dots ,{X}_{M}\left(t\right)\), one new candidate cause \({X}_{C}\left(t\right)\), and one possible effect \(Y\left(t\right)\). Two functions predicting \(Y\left(t\right)\) can then be built: one, \({Y}_{f}(t)\), also using the past values of \({X}_{C}\left(t\right)\), and one, \({Y}_{g}(t)\), not relying on the information contained in \({X}_{C}\left(t\right)\):

$${Y}_{f}\left(t\right)=f\left({X}_{1}\left(t-1,\dots ,t-{N}_{d}\right),\dots ,{X}_{M}\left(t-1,\dots ,t-{N}_{d}\right),Y\left(t-1,\dots ,t-{N}_{d}\right),{X}_{C}\left(t-1,\dots ,t-{N}_{d}\right)\right)$$
(1)
$${Y}_{g}\left(t\right)=g\left({X}_{1}\left(t-1,\dots ,t-{N}_{d}\right),\dots ,{X}_{M}\left(t-1,\dots ,t-{N}_{d}\right),Y\left(t-1,\dots ,t-{N}_{d}\right)\right)$$
(2)

The prediction errors of the two functions can be written as:

$${\rm E}_{f}=\frac{1}{N}\sum_{i=1}^{N}{\left(Y\left({t}_{i}\right)-{Y}_{f}\left({t}_{i}\right)\right)}^{2}$$
(3)
$${\rm E}_{g}=\frac{1}{N}\sum_{i=1}^{N}{\left(Y\left({t}_{i}\right)-{Y}_{g}\left({t}_{i}\right)\right)}^{2}$$
(4)

where N is the number of samples in the time series. One may conclude that \({X}_{C}\left(t\right)\) causes \(Y\left(t\right)\) if \({\rm E}_{g}>{\rm E}_{f}+{\rm E}_{threshold}\), where \({\rm E}_{threshold}\) is an uncertainty threshold determined by the noise level and the nature of the time series. \({\rm E}_{threshold}\) can be determined with statistical techniques such as the method of surrogate data.
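
As an illustration of this step, the sketch below builds a surrogate-based estimate of \({\rm E}_{threshold}\) in Python. A linear least-squares predictor stands in for the TDNN, and the null distribution of the error improvement is obtained from phase-randomised surrogates of the candidate cause; all names and numerical values are illustrative.

```python
# Hedged sketch of a surrogate-data threshold. A linear least-squares
# predictor stands in for the TDNN; the null distribution of the error
# improvement is built with phase-randomised surrogates of X_C.
import numpy as np

def phase_randomise(x, rng):
    """Surrogate with the same power spectrum but random Fourier phases."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    phases[0] = 0.0                                   # preserve the mean
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

def lagged(u, nd):
    """Columns u(t-1), ..., u(t-Nd), aligned with targets at t >= Nd."""
    return np.column_stack([u[nd - k: len(u) - k] for k in range(1, nd + 1)])

def pred_error(y, drivers, nd=2):
    """MSE of a least-squares prediction of y(t) from the lagged inputs."""
    X = np.hstack([lagged(d, nd) for d in [y] + drivers])
    coef, *_ = np.linalg.lstsq(X, y[nd:], rcond=None)
    return np.mean((y[nd:] - X @ coef) ** 2)

rng = np.random.default_rng(1)
n = 2000
xc, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    xc[t] = 0.8 * xc[t - 1] + rng.normal(0, 0.1)
    y[t] = 0.8 * y[t - 1] + 0.3 * xc[t - 1] + rng.normal(0, 0.1)

e_f = pred_error(y, [xc])                             # with the candidate cause
e_g = pred_error(y, [])                               # without it
null = [e_g - pred_error(y, [phase_randomise(xc, rng)]) for _ in range(100)]
e_threshold = np.percentile(null, 95)
print("causality detected:", e_g - e_f > e_threshold)
```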

A possible, more robust alternative to the approach just described is based on fitting several \({f}_{k}\) and \({g}_{k}\) functions, each using slightly different training data. The average and the standard deviation of the prediction errors of the \({f}_{k}\) and \({g}_{k}\) are then calculated:

$$\overline{{\rm E }_{f}}=\frac{1}{K}\sum_{k=1}^{K}{\rm E}_{{f}_{k}}; {\sigma }_{{E}_{f}}^{2}=\frac{1}{K}\sum_{k=1}^{K}{\left({\rm E}_{{f}_{k}}-\overline{{\rm E }_{f}}\right)}^{2}$$
(5)
$$\overline{{\rm E }_{g}}=\frac{1}{K}\sum_{k=1}^{K}{\rm E}_{{g}_{k}}; {\sigma }_{{E}_{g}}^{2}=\frac{1}{K}\sum_{k=1}^{K}{\left({\rm E}_{{g}_{k}}-\overline{{\rm E }_{g}}\right)}^{2}$$
(6)

In such a situation, a statistical hypothesis test can be performed (based on the difference of the means) and one can also provide the confidence interval of the results (see Fig. 1 for an overall pictorial description of the entire methodology).
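
In code, the test on the difference of the means could look like the following sketch, where the two arrays of per-model errors are mock values standing in for the \({\rm E}_{{f}_{k}}\) and \({\rm E}_{{g}_{k}}\) of an actual ensemble.

```python
# Sketch of the hypothesis test on the difference of the mean errors.
# The two arrays mock the K per-model errors of an actual ensemble.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
K = 20
E_f = rng.normal(0.010, 0.001, K)   # errors of the models that use X_C
E_g = rng.normal(0.014, 0.001, K)   # errors of the models that do not

# One-sided Welch test of H0: mean(E_g) <= mean(E_f)
t_stat, p_val = stats.ttest_ind(E_g, E_f, equal_var=False,
                                alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_val:.3g}")   # small p -> X_C causes Y
```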

Fig. 1

Top: overview of the entire procedure to investigate the causal coupling between systems. Bottom-left: the architecture of the ensembles. Bottom-right: topology of an individual time delay neural network (particularized for the trivariate case)

Traditional implementations of Granger causality, the basis of the approach followed in the present work, are affected by a couple of quite significant limitations. The most important is that many algorithms assume a priori the functional form of the dependence between the effect and the potential causes. Typically, some sort of autoregressive model is implemented, which works satisfactorily only for linear dependencies [8]. Secondly, the threshold is usually a parameter tuned to maximise the sensitivity and specificity of the causality detection performances. Unfortunately, such a tuning is not fully general. In this work, the use of TDNNs allows for a fully general training, independent of any a priori assumption, while the ensemble methodology allows for a statistical threshold that does not need any tuning (it just depends on the confidence intervals required by the specific application).

Brief Description of Time Delay Neural Networks (TDNNs)

With a view to deriving information about the time evolution of systems and their mutual influence, Time Delay Neural Networks (TDNNs), which constitute a natural extension of traditional feed-forward neural networks [24], have proved to be very useful tools. Indeed, they are explicitly designed to learn from the past. Consequently, the topology of TDNNs is also well suited to the investigation of causal relations between time series.

The architecture of Time Delay Neural Networks is explicitly devised to predict the future of time series on the basis of their past evolution. This is achieved by providing as inputs to this type of network sequences of subsequent time points, and not single time slices of data. Therefore, the inputs to TDNNs can be conceptualised as windows of length p into the past. In mathematical terms, this type of network implements a non-linear autoregressive model of order p. Given their simple architecture, the training algorithms of traditional feed-forward networks can be easily adapted to the topology of TDNNs.
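
The following minimal sketch illustrates the windowing idea in Python, with scikit-learn's MLPRegressor used as a stand-in for the networks employed in this work; the hyper-parameters are illustrative.

```python
# A feed-forward regressor fed with a sliding window of p past samples:
# a non-linear autoregressive model of order p, i.e. the TDNN idea.
import numpy as np
from sklearn.neural_network import MLPRegressor

def windows(series, p):
    """Inputs [y(t-1), ..., y(t-p)] and targets y(t), for t >= p."""
    X = np.column_stack([series[p - k: len(series) - k]
                         for k in range(1, p + 1)])
    return X, series[p:]

rng = np.random.default_rng(3)
y = np.sin(np.linspace(0.0, 60.0, 1500)) + rng.normal(0, 0.05, 1500)

X, target = windows(y, p=5)
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                   random_state=0).fit(X, target)
print("one-step-ahead MSE:", np.mean((net.predict(X) - target) ** 2))
```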

However, the simple architecture of traditional TDNNs is not completely adequate to learn the temporal structure of the data with the aim of assessing their causal relationships. A slightly modified version of TDNNs, shown in the bottom panel of Fig. 1, has therefore been devised, and it is the one adopted to perform the studies reported in the rest of the paper. Moreover, individual TDNNs, even if they perform very well, are not immune to bias when deployed to address involved cases. Therefore, the final technology implemented consists of ensembles of TDNNs, with the topology depicted in the middle panel of Fig. 1. Other approaches to reduce bias, such as repeating the training with different seeds or dropout learning [25], are of course viable alternatives. However, the proposed ensembles provide additional opportunities, such as the use of bagging noise-based ensembles or adaptations of random forests [26, 27], which can be very useful in many practical applications.

These ensembles of TDNNs, deployed to obtain the results described in the following sections, have been implemented with the MATLAB toolbox. The training technique is backpropagation using the Levenberg–Marquardt algorithm. About 5000 epochs have normally proved sufficient to solve the numerical examples reported (in any case, convergence has always been achieved below the maximum limit, set to 10,000 epochs and 200 validation checks).

As mentioned, the concept of causality adopted in the present work is the one of improved predictivity. Consequently, the TDNN ensembles have been trained and deployed to predict the future evolution of the target signals. The quality of the predictions is then determined by the residuals, the differences between the predictions and the actual values of the target time series. The residuals of the target quantity are calculated first by providing all the variables as inputs to the TDNN ensembles. Then the residuals are computed again after removing a candidate driver from the input set (see top panel of Fig. 1). The variance of the residuals obtained considering all the inputs is then compared with the one obtained after removing the specific driver. If the variance obtained after removing the potential driver is statistically significantly higher than the one calculated when the candidate driver was included, then the removed quantity is considered to have a causal influence on the target.
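
A compact sketch of the whole detection loop is reported below. Bootstrap resampling plays the role of the slightly different training sets used to differentiate the ensemble members; the regressor, its size and the decision rule in the last line are illustrative stand-ins for the actual TDNN ensembles and hypothesis test.

```python
# Sketch of the detection loop: K models trained with and without the
# candidate driver; bootstrap resampling differentiates the members.
import numpy as np
from sklearn.neural_network import MLPRegressor

def lagged_inputs(signals, nd):
    """Lagged columns s(t-1), ..., s(t-Nd) for every signal s."""
    return np.column_stack([s[nd - k: len(s) - k]
                            for s in signals for k in range(1, nd + 1)])

def ensemble_errors(inputs, target, nd=2, K=10, seed=0):
    rng = np.random.default_rng(seed)
    X, t = lagged_inputs(inputs, nd), target[nd:]
    errs = []
    for k in range(K):
        idx = rng.integers(0, len(t), len(t))       # bootstrap resample
        net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000,
                           random_state=k).fit(X[idx], t[idx])
        errs.append(np.mean((net.predict(X) - t) ** 2))
    return np.array(errs)

rng = np.random.default_rng(4)
n = 1500
xc, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    xc[t] = 0.9 * xc[t - 1] + rng.normal(0, 0.1)
    y[t] = 0.9 * y[t - 1] + 0.4 * xc[t - 1] + rng.normal(0, 0.1)

E_f = ensemble_errors([y, xc], y)    # residual errors with the driver
E_g = ensemble_errors([y], y)        # residual errors without it
print("driver detected:", E_g.mean() > E_f.mean() + 2 * E_f.std())
```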

Causality Quantification

In many applications, it is relevant not only to determine whether one variable causes another, but also to quantify the level of influence exerted by one system on another. A very useful indicator to evaluate such information is the error ratio, defined in previous work as \({R}_{\sigma }\left({X}_{c},Y\right)\):

$${R}_{\sigma }\left({X}_{c},Y\right)=\frac{\overline{{\rm E }_{g}}}{\overline{{\rm E }_{f}}}$$
(7)

Its uncertainty can be easily calculated with simple error propagation analysis:

$${\sigma }_{{R}_{\sigma }}\left({X}_{c},Y\right)={R}_{\sigma }\left({X}_{c},Y\right)\sqrt{\frac{{\sigma }_{{E}_{f}}^{2}}{{\overline{{\rm E }_{f}}}^{2}}+\frac{{\sigma }_{{E}_{g}}^{2}}{{\overline{{\rm E }_{g}}}^{2}}}$$
(8)

By using the average value \({R}_{\sigma }\left({X}_{c},Y\right)\) and its uncertainty \({\sigma }_{{R}_{\sigma }}\left({X}_{c},Y\right)\), one may perform a statistical test to determine whether \({R}_{\sigma }\left({X}_{c},Y\right)\) is statistically larger than one (and therefore whether there is causality between \({X}_{c}\) and \(Y\)).
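
The computation of the indicator and of its uncertainty is straightforward; in the sketch below the two arrays mock the per-model errors produced by the ensembles.

```python
# Sketch of the indicator of Eqs. (7)-(8) and of the one-sided test on
# R_sigma > 1; E_f and E_g mock the per-model errors of the ensembles.
import numpy as np

rng = np.random.default_rng(5)
E_f = rng.normal(0.010, 0.001, 20)
E_g = rng.normal(0.025, 0.002, 20)

R = E_g.mean() / E_f.mean()                                  # Eq. (7)
sigma_R = R * np.sqrt(E_f.var() / E_f.mean() ** 2
                      + E_g.var() / E_g.mean() ** 2)         # Eq. (8)
z = (R - 1.0) / sigma_R          # sigmas above 1: causality if large
print(f"R_sigma = {R:.2f} +/- {sigma_R:.2f}, z = {z:.1f}")
```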

The interpretability of such an indicator can be easily understood with the help of some examples.

In the first case, let us suppose that the equation governing \(Y\left(t\right)\) is:

$$Y\left(t\right)=h\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right)\right)+w{X}_{C}\left(t-1\right)+\epsilon$$
(9)

where w is a constant and \(\epsilon\) is a term associated with noise or random fluctuations. A statistical analysis allows writing:

$$\overline{{\rm E }_{f}}={\epsilon }^{2};\, \overline{{\rm E }_{g}}={w}^{2}{\sigma }_{{X}_{c}}^{2}+{\epsilon }^{2};\, {R}_{\sigma }\left({X}_{c},Y\right)=\frac{{w}^{2}{\sigma }_{{X}_{c}}^{2}}{{\epsilon }^{2}}+1$$
(10)

Equation (10) lends itself to a clear interpretation: \({R}_{\sigma }\left({X}_{c},Y\right)-1\) quantifies the amount of influence that \({X}_{c}\) exerts on \(Y\), in terms of the variance that flows from \({X}_{c}\) to \(Y\) (normalised by the variance of the noise affecting \(Y\)).
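
A quick numerical check of Eq. (10) can be performed by assuming that the autoregressive part \(h(\cdot)\) is modelled perfectly, so that the residuals with and without \({X}_{c}\) reduce to \(\epsilon\) and \(w{X}_{C}+\epsilon\) respectively; all numbers below are synthetic.

```python
# Numerical check of Eq. (10), assuming the autoregressive part h(...) is
# modelled perfectly; all values are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(6)
n, w, eps = 100_000, 0.5, 0.1
xc = rng.normal(0.0, 0.2, n)          # candidate cause X_C(t-1)
noise = rng.normal(0.0, eps, n)       # the epsilon term

E_f = noise.var()                     # residual variance keeping X_C
E_g = (w * xc + noise).var()          # residual variance dropping X_C
print(E_g / E_f - 1.0)                # ~ w**2 * var(X_C) / eps**2 = 1.0
print(w ** 2 * xc.var() / eps ** 2)
```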

A slightly more general case may be:

$$Y\left(t\right)=h\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right)\right)+q\left({X}_{C}\left(t-1\right)\right)+\epsilon$$
(11)

In this case, simple statistical analysis leads to:

$${R}_{\sigma }\left({X}_{c},Y\right)-1=\frac{{\sigma }_{q\left({X}_{c}\right)}^{2}}{{\epsilon }^{2}}$$
(12)

The most general case is expressed by the following equation:

$$Y\left(t\right)=h\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right)\right)+q\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right),{X}_{C}\left(t-1\right)\right)+\epsilon$$
(13)

Now the indicator \({R}_{\sigma }\left({X}_{c},Y\right)\) can be written as:

$${R}_{\sigma }\left({X}_{c},Y\right)-1=\frac{{\sigma }_{q\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right),{X}_{C}\left(t-1\right)\right)}^{2}}{{\epsilon }^{2}}$$
(14)

In this case, the causal effect of \({X}_{C}\left(t-1\right)\) depends on other variables (for example, one may have something like \(q\left({X}_{1}\left(t-1\right),\dots ,{X}_{M}\left(t-1\right),Y\left(t-1\right),{X}_{C}\left(t-1\right)\right)=w Y\left(t-1\right) {X}_{C}\left(t-1\right)\)). Since there is no decoupling between the variables (because the causes are coupled), this is a more delicate problem in terms of both analysis and interpretation, and different methodologies, based on modelling or on a priori knowledge, are needed. A possible approach is the use of genetic programming via symbolic regression for time series [5].

Validation of the Method with Synthetic Cases

In this section, some numerical models are introduced to investigate the capabilities of the new methodology for both causality detection and quantification. The training, validation and test sets are in the proportion of 70%, 15% and 15% of the overall database.

Analysed Systems

Three different systems, all based on three coupled autoregressive equations, are investigated as case studies.

$$System\, A{:} \left\{\begin{array}{c}x\left(t\right)=0.95x\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ y\left(t\right)=0.95y\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ z\left(t\right)=0.95z\left(t-1\right)+{w}_{x}x\left(t-1\right)+{w}_{y}y\left(t-1\right)+\sigma \left(0,\,0.1\right)\end{array}\right.$$
(15)

In system A, both \(x\left(t\right)\) and \(y\left(t\right)\) are independent of external causes and both influence \(z\left(t\right)\), with intensities equal to \({w}_{x}\) and \({w}_{y}\) respectively. The causality is linear and independent, and therefore the amount of influence (variance) of x on z is independent of the influence (variance) of y on z. \(\sigma \left(0,\,0.1\right)\) denotes a random number sampled from a normal distribution with mean equal to zero and standard deviation equal to 0.1.

$$System\, B{:} \left\{\begin{array}{c}x\left(t\right)=0.5x\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ y\left(t\right)=0.5y\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ z\left(t\right)=0.5z\left(t-1\right)+wx\left(t-1\right)y\left(t-1\right)+\sigma \left(0,\,0.1\right)\end{array}\right.$$
(16)

In system B, again, both \(x\left(t\right)\) and \(y\left(t\right)\) are independent and both exert an influence on \(z\left(t\right)\). However, in this case their effects are not independent, because their influence depends on their mutual product. Consequently, they have no individual causal influence: they affect \(z\left(t\right)\) only if they are both present.

$$System\, C{:} \left\{\begin{array}{c}x\left(t\right)=0.5x\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ y\left(t\right)=0.5y\left(t-1\right)+0.5z\left(t-1\right)+\sigma \left(0,\,0.1\right)\\ z\left(t\right)=0.5z\left(t-1\right)+{w}_{x}x\left(t-1\right)+{w}_{y}y\left(t-1\right)+\sigma \left(0,\,0.1\right)\end{array}\right.$$
(17)

This last system C replicates case A, with the difference that there is a feedback loop between \(y\left(t\right)\) and \(z\left(t\right)\). Being able to handle causal relationships with feedback is particularly important, because many complex systems in nature present this type of interaction. It will be shown that in this situation, even if the same coefficient (0.5) is applied to \(z\left(t-1\right)\) in the equation of \(y\left(t\right)\), the value of \({R}_{\sigma }(z,y)\) changes with \({w}_{y}\), which through the feedback loop causes a variation of \(z\). The stock and flow diagrams of the three systems are shown in Fig. 2; a minimal sketch of the simulation of the three systems is reported after the figure.

Fig. 2

Pictorial view of the causal relationships for the three numerical cases A, B and C. All systems present inertia, whereas only the last one also contains a feedback loop
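
For reproducibility, a minimal Python sketch of the generators of Eqs. (15)-(17) follows; in system B the single coupling coefficient w is passed through the wx argument.

```python
# Generators for the three synthetic systems of Eqs. (15)-(17);
# sigma(0, 0.1) is a draw from a normal distribution, as in the text.
import numpy as np

def simulate(system, n=5000, wx=0.5, wy=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
    s = lambda: rng.normal(0.0, 0.1)
    for t in range(1, n):
        if system == "A":
            x[t] = 0.95 * x[t-1] + s()
            y[t] = 0.95 * y[t-1] + s()
            z[t] = 0.95 * z[t-1] + wx * x[t-1] + wy * y[t-1] + s()
        elif system == "B":                 # w passed through wx
            x[t] = 0.5 * x[t-1] + s()
            y[t] = 0.5 * y[t-1] + s()
            z[t] = 0.5 * z[t-1] + wx * x[t-1] * y[t-1] + s()
        else:                               # "C": feedback between y and z
            x[t] = 0.5 * x[t-1] + s()
            y[t] = 0.5 * y[t-1] + 0.5 * z[t-1] + s()
            z[t] = 0.5 * z[t-1] + wx * x[t-1] + wy * y[t-1] + s()
    return x, y, z

x, y, z = simulate("C", wx=0.3, wy=0.3)
```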

Results

The coefficients of causality \({R}_{\sigma }\) as a function of both \({W}_{x}\) and \({W}_{y}\) are shown in Fig. 3 for system A.

Fig. 3

Coefficients of causality \({R}_{\sigma }\) as a function of \({W}_{x}\) and \({W}_{y}\) for System A

As for the time series x(t) and y(t), they turn out to be caused only by their past values. In fact, only \({R}_{\sigma }\left(x,x\right)-1\) and \({R}_{\sigma }\left(y,y\right)-1\) are significantly larger than zero (see \({R}_{\sigma }\left(y,x\right)\), \({R}_{\sigma }\left(z,x\right)\), \({R}_{\sigma }\left(x,y\right)\), \({R}_{\sigma }\left(z,y\right)\)). Moreover, their values are independent of both coefficients \({W}_{x}\) and \({W}_{y}\), which model the strength of the causal influences on z(t).

In the case of the time series z(t), the expected behaviour is observed: \({R}_{\sigma }\left(x,z\right)\) increases proportionally with \({W}_{x}\), while \({R}_{\sigma }\left(y,z\right)\) increases with \({W}_{y}\). No dependence between \({R}_{\sigma }\left(x,z\right)\) and \({W}_{y}\), or between \({R}_{\sigma }\left(y,z\right)\) and \({W}_{x}\), is detected. \({R}_{\sigma }\left(z,z\right)\) is large and almost independent of the two causal intensities determined by the coefficients \({W}_{x}\) and \({W}_{y}\).

The results for system B are summarised in Fig. 4. Again, x(t) and y(t) depend just on their own past values. The time series z(t) depends on all the variables. In this case, both \({R}_{\sigma }\left(x,z\right)\) and \({R}_{\sigma }\left(y,z\right)\) increase with W, and the causal effects of x(t) and y(t) on z(t) are not separable.

Fig. 4

Coefficients of causality \({R}_{\sigma }\) as a function of \(W\) for System B

Figure 5 shows the results for system C. The time series x(t) is independent of the other observations. High causality values for y(t) are found for both y(t) and z(t). As for system A, \({R}_{\sigma }\left(x,z\right)\) increases proportionally with \({W}_{x}\), while \({R}_{\sigma }\left(y,z\right)\) increases with \({W}_{y}\). However, since system C is characterised by a feedback loop between y and z, a variation of \({W}_{x}\) and \({W}_{y}\) directly results in a different causal intensity \({R}_{\sigma }\left(z,y\right)\), as shown in Fig. 6. Indeed, even if the past value of z linearly causes y, the feedback loop implies that the overall causal effect is not linear.

Fig. 5

Coefficients of causality \({R}_{\sigma }\) as a function of \({W}_{x}\) and \({W}_{y}\) for System C

Fig. 6

\({R}_{\sigma }(z,y)\) as a function of both \({W}_{x}\) and \({W}_{y}\)

Nuclear Fusion Case Study: Additional Heating Profile Estimation

An important real-life example of a very complex system, in which the maze of causal relationships is very difficult to disentangle, is the tokamak. In the devices implementing this magnetic configuration, high temperature plasmas are confined for research on thermonuclear fusion [11]. The main objective consists of achieving conditions in which the production of energy from the coalescence of light nuclei becomes economically viable [20]. A reactor implementing this concept constitutes one of the most ambitious engineering challenges of present-day societies (see the building of the next generation experiment ITER in the south of France). One particularly delicate aspect is the heating of the plasmas confined by the tokamak field configuration. Indeed, at temperatures of hundreds of millions of degrees, ohmic heating becomes ineffective and other schemes have to be deployed. Research in the field of additional heating techniques for tokamak devices is a very active area [21,22,23]. Among other aspects, the determination of the power deposition profile is a particularly delicate task, which has significant implications, both operational and interpretative. For example, in many studies the properties of the transport are investigated, e.g. using modulated ECRH, while assuming the theoretical deposition profile. This is highly problematic since, typically, the experimental deposition profiles are significantly broader than the theoretical ones [24]. As the transport partial differential equation contains sources, sinks and transport terms, a wrong assumption on the deposition will lead to misinterpretation of the transport. The details of the power deposition inside the plasma are typically investigated with very sophisticated codes, which require human supervision and are computationally quite demanding. There is therefore scope for fast but sufficiently accurate algorithms, capable of determining the effects of the additional heating power on the temperature profile in a reasonably short time. The potential of the TDNN ensembles to contribute to this aspect is discussed in the rest of this section. It should be emphasised that the case reported in the following is not a realistic physics study but just a numerical example, meant to motivate the use of the TDNN technology in more detailed and realistic investigations of the subject.

Model

Let us consider a one-dimensional case where both the temperature (T) and the density (n) are functions of the normalised minor radius coordinate \({r}_{n}=r/a\) (a being the minor radius). Moreover, suppose that the density profile is given by the following equation:

$$n\left({r}_{n}\right)={n}_{0}{\left(1-{{r}_{n}}^{\alpha }\right)}^{\beta }$$
(18)

where \({n}_{0},\) \(\alpha ,\) and \(\beta\) are free parameters defining the shape and the value of the electron density.

One may assume that the temperature changes according to the energy conservation law:

$$n\left({r}_{n}\right)\frac{\partial T\left({r}_{n}\right)}{\partial t}={\Gamma }_{heat}\left({r}_{n}\right)+{\Gamma }_{diffusion}\left({r}_{n}\right)+{\Gamma }_{rad}\left({r}_{n}\right)$$
(19)

where \({\Gamma }_{heat}\left({r}_{n}\right)\), \({\Gamma }_{diffusion}\left({r}_{n}\right)\) and \({\Gamma }_{rad}\left({r}_{n}\right)\) are the heating, diffusive, and radiative terms respectively.

The heating term is assumed to be completely determined by the additional heating systems, with a functional form like:

$${\Gamma }_{heat}\left({r}_{n}\right)={H}_{p}\left({r}_{n}\right)g\left(t\right)$$
(20)

where \(g\left(t\right)\) describes how the total input power varies as a function of time, while \({H}_{p}\left({r}_{n}\right)\) is directly responsible for the local deposition along the minor radius. One of the objectives of the following analysis is the demonstration that the causality score is proportional to the heat deposition function \({H}_{p}\left({r}_{n}\right)\).

The diffusion term is simply defined as:

$${\Gamma }_{diffusion}\left({r}_{n}\right)={D}_{diff}\frac{{\partial }^{2}T\left({r}_{n}\right)}{\partial {r}_{n}^{2}}$$
(21)

where \({D}_{diff}\) is a diffusion coefficient, which is assumed constant for simplicity's sake in the present treatment.

The radiative term is assumed to be dominated by the Bremsstrahlung radiation:

$${\Gamma }_{rad}\left({r}_{n}\right)={C}_{Brem}{Z}^{2}\left({r}_{n}\right){n}^{2}\left({r}_{n}\right)T{\left({r}_{n}\right)}^{1/2}$$
(22)

where \({C}_{Brem}\) is a constant and \(Z\left({r}_{n}\right)\) is the equivalent charge number.

The boundary conditions at the edge simply impose that \(T\left({r}_{n}=-1\right)=T\left({r}_{n}=1\right)=1\, \mathrm{eV}\).
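
A possible explicit finite-difference integration of the model is sketched below; the grid, time step, transport coefficients, the Gaussian deposition \({H}_{p}\) and the modulation \(g(t)\) are all illustrative assumptions (the radiative term enters with a minus sign, being a loss).

```python
# Sketch of an explicit finite-difference integration of the model of
# Eqs. (18)-(22). Grid, time step, coefficients, the Gaussian deposition
# H_p and the modulation g(t) are all illustrative assumptions.
import numpy as np

nr, nt, dt = 101, 20_000, 1e-4
r = np.linspace(-1.0, 1.0, nr)                   # normalised radius r_n
dr = r[1] - r[0]

n0, alpha, beta = 1.0, 2.0, 1.0
n_e = n0 * (1.0 - np.abs(r) ** alpha) ** beta + 1e-3   # Eq. (18), kept > 0
Hp = np.exp(-(((r - 0.3) / 0.1) ** 2))           # assumed deposition profile
D_diff, C_brem, Z = 1e-2, 1e-3, 1.0

T = np.ones(nr)                                  # start from the edge value
for step in range(nt):
    g = 1.0 + 0.5 * np.sin(2.0 * np.pi * 50.0 * step * dt)  # Eq. (20)
    lap = np.zeros(nr)
    lap[1:-1] = (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dr ** 2   # Eq. (21)
    rad = C_brem * Z ** 2 * n_e ** 2 * np.sqrt(T)            # Eq. (22), a loss
    T += dt * (Hp * g + D_diff * lap - rad) / n_e            # Eq. (19)
    T[0] = T[-1] = 1.0                           # edge boundary condition
print("core temperature:", T[nr // 2])
```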

Results

Figure 7 shows in blue the heating profiles for the four cases analysed. In red, the causality intensity indicator \({R}_{\sigma }-1\), quantifying the effect of the input power on the temperature, is reported. The causality intensity is resolved for each position \({r}_{i}/a\) by comparing the two TDNN models:

Fig. 7

Power deposition profiles (in blue) and the indicator \({R}_{\sigma }\left({r}_{i}/a\right)-1\) (in red)

$${T}_{f}\left(\frac{{r}_{i}}{a},t\right)=f\left(T\left(\frac{{r}_{i-1}}{a},t-1\right),T\left(\frac{{r}_{i}}{a},t-1\right),T\left(\frac{{r}_{i+1}}{a},t-1\right),{P}_{heating}\left(t-1\right)\right)$$
(23)
$${T}_{g}\left(\frac{{r}_{i}}{a},t\right)=g\left(T\left(\frac{{r}_{i-1}}{a},t-1\right),T\left(\frac{{r}_{i}}{a},t-1\right),T\left(\frac{{r}_{i+1}}{a},t-1\right)\right)$$
(24)

Thus, \({R}_{\sigma }\left({r}_{i}/a\right)-1\) should quantify how much the input power directly causes a variation of the local temperature. Four different power deposition profiles have been simulated. They are shown as blue curves in Fig. 7. At first glance, it can be observed that there is a clear proportionality between the causality intensity and the heating profile; the two perfectly overlap for case 1 (plot on the top-left). For the other cases, there is a good, even if not perfect, match between the causality intensity coefficients and the respective heating profiles.

In Fig. 8, the correlation between the heating profile and the causality intensity is analysed for the four cases. A regression analysis has been performed:

Fig. 8

\({R}_{\sigma }-1\) vs the true value of the heating profile for the four cases. The red line is the result of a 2nd order polynomial fit

$${\text{log}}\left({R}_{\sigma }-1\right)=0.118{{\text{log}}\left({H}_{p}\right)}^{2}+1.648{\text{log}}\left({H}_{p}\right)+5.928$$
(25)

and the result of the fitting is reported in the figure. The adjusted \({R}^{2}\) of the fit is 98.2% (calculated in logarithmic scale, as in the rest of the figure).
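
The fit itself is a standard polynomial regression in log-log space, as in the following sketch, where the data are mock values generated around Eq. (25).

```python
# Sketch of the regression of Eq. (25): a 2nd-order polynomial fitted to
# log(R_sigma - 1) versus log(H_p); the data here are mock values.
import numpy as np

rng = np.random.default_rng(7)
Hp = 10.0 ** rng.uniform(-3.0, 0.0, 200)          # mock deposition values
logR = (0.118 * np.log(Hp) ** 2 + 1.648 * np.log(Hp) + 5.928
        + rng.normal(0.0, 0.1, Hp.size))          # mock scatter

coeffs = np.polyfit(np.log(Hp), logR, deg=2)      # [a2, a1, a0]
pred = np.polyval(coeffs, np.log(Hp))
r2 = 1.0 - np.sum((logR - pred) ** 2) / np.sum((logR - logR.mean()) ** 2)
r2_adj = 1.0 - (1.0 - r2) * (Hp.size - 1) / (Hp.size - 2 - 1)
print(coeffs, r2_adj)
```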

The differences between causality intensity and heating profiles are mostly due to the non-linearity of the phenomenon described here. In fact, a given local value of the heating profile can have completely different causality values on the temperature, since both diffusion and radiation depend on the temperature itself. This clearly implies that a measurement of the heating profile can be investigated by using classical methodologies (e.g. numerical simulations) or by using the new branch of artificial intelligence that combines physics with data-driven techniques, the so-called Physics-Informed Machine Learning [25], which is gaining increasing attention in most branches of the applied sciences.

Conclusions and Future Work

The investigation of complex systems requires the development of new analysis tools capable of providing information about the causal relationships between time series, and not only about their correlations. Ensembles of TDNNs have proved to perform very well in this respect. Various numerical tests have proved that they can handle quite complex situations, even including feedback loops between the quantities involved. Moreover, they tend to outperform all the main techniques reported in the literature in terms of both flexibility and accuracy. In addition, they are easy to implement and fast to run (computational times are of course dependent on the sample size; for time series of 1000 samples, the computational time is around 30 min on a common laptop). Another competitive advantage is the capability of TDNNs to quantify the level of mutual influence between time series, and not only the directionality of the causal relationships.

The application to arguably the most complex laboratory system in big physics, magnetic confinement thermonuclear fusion, has proven the capability of the approach to handle realistic data without any major issue. The prospects in this field are particularly promising for both the tokamak [15, 26] and the RFP configuration [27]. Indeed, the approach could be used to address various crucial issues, from the control of the total radiation [28, 29] to the prediction of disruptions [30,31,32,33,34,35]. With regard to the specific aspect of investigating the heating systems, the simultaneous analysis of the deposition and of the power transport would be very interesting [36]. From a methodological perspective, further work would be necessary to couple TDNNs with symbolic regression, in order to extract from the data also the mathematical equations governing the interactions between the systems involved [37, 38].