Causal discovery and inference for evaluating fire resistance of structural members through causal learning and domain knowledge

Experiments remain the gold standard to establish an understanding of fire‐related phenomena. A primary goal in designing tests is to uncover the data generating process (i.e., how and why the observations we see come to be), or simply what causes such observations. Uncovering such a process not only advances our knowledge but also provides us with the capability to predict phenomena accurately. This paper presents an approach that leverages causal discovery and causal inference to evaluate the fire resistance of structural members. In this approach, causal discovery algorithms are adopted to uncover the causal structure between key variables pertaining to the fire resistance of reinforced concrete columns. Then, companion inference algorithms are applied to infer (estimate) the influence of each variable on the fire resistance given a specific intervention. Finally, this study ends by contrasting the algorithmic causal discovery with that obtained from domain knowledge and traditional machine learning. Our findings clearly show the potential and merit of adopting causality into our domain.


| INTRODUCTION
In our pursuit of discovering knowledge, we seek to identify, or possibly retrieve, the underlying data generating process (DGP) responsible for producing the observations we hope to understand. 1 As such, we devise experiments. Such experiments are designed to test and explore hypotheses. A hypothesis targets a specific direction to uncover the DGP behind a given phenomenon (i.e., the cause(s) leading to the so-called effect). In other words, our experiments are fueled by hypotheses that examine how a set of events/states may produce other events/states. More generally, we look to identify the causal path (i.e., cause(s) → effect), for, without the cause(s), the event would not have been generated*. 2 If we uncover the true DGP behind fire testing, then it is plausible that this knowledge, once validated, will enable us to focus on other burning questions within our domain. While that is an ambitious goal, uncovering the true DGP is equally important in the short run as it could reduce our reliance on expensive and complex fire tests. For example, instead of testing a series of specimens, one may opt to utilize the identified DGP to estimate the outcome of a particular testing campaign and perhaps test a couple of specimens (vs. all specimens) to cross-check the estimates obtained via the DGP.
Unlike the concepts of the digital twin or finite element (FE) modeling, a causal method is an attractive one-stop approach that may not require continuous simulations (which entail monitoring, collecting and storing data, and, most importantly, building and rebuilding case-specific models). As such, it is the opinion of the authors that pursuing this research direction is of high merit.
Unfortunately, a search using the Dimensions academic database 3 pertaining to the terms "causality," "causal discovery," and "causal inference" when paired with "structural fire engineering" (individually or collectively) returns a minute amount of published work. 4 On a more positive note, the amount of work with a causal theme is much larger when related to social sciences and/or ecological sciences within the fire domain. This observation and the companion informetric analysis restrict our literature review discussion on former work while, at the same time, establishing the need for this current study.
In light of the above, we opt to highlight some of the key factors and recent efforts that tackled the problem of fire resistance of reinforced concrete (RC) columns. From this perspective, exposure to elevated temperatures has been noted to degrade the properties of construction materials via a series of physicochemical reactions. 5,6 This often leads to losses in the strength and elastic modulus properties. 6 The loss of strength is primarily a function of the concrete type, among other factors such as heating rate, mixture proportions, and so on. 7 This loss in strength and stiffness properties implies that fire-exposed RC columns are bound to undergo some level of capacity loss under fire conditions.
Notably, RC columns made from normal strength concrete (NSC) display admirable performance under fire conditions, especially when compared to columns made from higher grade concrete materials. 8 In contrast to NSC, high strength concrete (HSC) and ultra-high-performance concrete (UHPC) have a much denser structure and a low water/cement ratio. Despite the above, surprisingly, the temperature-induced degradation in HSC and UHPC has been shown to occur at a rapid pace. 9,10 Thus, although HSC and UHPC may attain high strength (2-4 times that of NSC), the same strength does not correlate to improved fire resistance. In fact, the correlation between compressive strength and fire resistance of more than 130 RC columns shown in Figure 1 is a weak positive correlation of 0.21. It is interesting to point out that low concrete strength does not, by itself, seem to guarantee high fire resistance.
Other factors that are also linked to the fire resistance of RC columns include the geometric features, the level of applied loading, boundary conditions, and fire scenario (heating rate, heating duration, maximum temperature, cooling duration, etc.). [11][12][13] The interaction of such factors has been heavily examined in the open literature experimentally, 14,15 numerically, 16,17 and theoretically. 18,19 Other notable studies worth mentioning include those that explored the influence of unsymmetric heating/loading, 20-23 unique geometrical features, 11 design/natural fire conditions, 24 reinforcement configuration, 25,26 as well as residual response post fire. [27][28][29]
Practically, the fire resistance of RC columns can be predicted and calculated via a number of approaches. These approaches range from traditional aids (i.e., charts and tables) as documented in fire codes and standards, 30,31 to hand-calculation based methods derived by researchers, 27,[32][33][34] to alternative methods such as finite element simulations, 16,26 and/or machine learning (ML) (blackbox algorithms, [35][36][37][38] or explainable models 39 ). The aforenoted methods deliver predictions on the expected fire resistance of RC columns given a set of variables. Interestingly, these methods do not often agree if applied to a particular column and/or a set of columns, possibly due to differences in derivation, principles, assumptions, and fundamentals. [40][41][42] The same also presents an opportunity to revisit the classical phenomenon of fire resistance of RC columns. Indeed, this is the second motivation behind this work.
This paper presents a causal approach to discovering and inferring the causal mechanism responsible for the data generating process of the fire resistance of RC columns. First, causal discovery learning is carried out to uncover the causal structure responsible for tying the variables involved in the fire resistance of RC columns. Four causal structures were identified herein through algorithmic search and incorporation of domain knowledge. Then, inference algorithms are used to estimate the influence of interventions upon the noted variables in each of the identified four structures. For completeness, a comparison is drawn to examine the newly discovered knowledge against domain knowledge and traditional machine learning.

| DEVELOPMENT OF THE PROPOSED CAUSAL APPROACH
Regression can be comfortably used to predict an outcome of interest, Y, given a set of regressors. A prediction in this manner neither implies nor indicates that the regressors, X's, are causes of Y; thus, such an assignment may arrive from domain knowledge. Assigning such regressors is hardly ever accompanied by a check for confounders or common causes of the X's either. On the other hand, a causal analysis strives to establish whether a set of variables causes Y. Figure 2 provides a visual depiction of how regression differs from causation.
The proposed causal approach comprises three primary steps. In the first step, causal discovery is adopted to uncover the underlying structure pertaining to the causal ties between a select set of variables. Those variables can be identified from domain knowledge, experts' opinions, and/or numerically (i.e., algorithmically). The causal links are established by satisfying causal principles. These principles include the causal Markov assumption, the causal faithfulness assumption, and the causal sufficiency assumption. These assumptions are described herein, and full details can be found elsewhere. [43][44][45] The first assumption states that a given variable is independent of all other variables (except its own effects) conditional on its direct causes. This Markovian assumption is accompanied by the d-separation criterion, 43 which determines whether a variable is independent of another given a third by associating the notion of independence with the separation of variables in a causal graph. The causal faithfulness assumption states that any population produced by a causal graph has exactly the independence relations obtained by applying the d-separation criterion. The causal sufficiency assumption refers to the absence of hidden or latent variables that we do not know of or are not aware of. For completeness, a causal graph is a directed acyclic graph (DAG) that satisfies the causal assumptions; see Figure 3. † In Figure 3, W is assumed to cause X, X is assumed to cause Z and Y, and Z is also assumed to cause Y. Accordingly, Y is conditionally independent of W given X, and Z is conditionally independent of W given X.
In the second step, and once the causal structure (say a DAG) is identified, causal inference algorithms are applied to infer how the output (i.e., fire resistance of RC columns) would change by intervening on (e.g., changing) the magnitude or degree of a governing variable. An intervention equates to setting X = x as opposed to observing X = x. The former relates to causation (what is the fire resistance of an RC column if its width is increased to 300 mm?) and the latter to prediction (what is the fire resistance of an RC column given it has a width of 300 mm?).
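The conditional independences implied by the Markov assumption can be checked numerically on simulated data. The sketch below is illustrative only: it assumes hypothetical linear-Gaussian mechanisms (the coefficients are arbitrary and not taken from this study) for a graph in which W causes X, X causes Z and Y, and Z causes Y, and it uses partial correlation as a stand-in for a conditional independence test, which is appropriate only under that linear-Gaussian assumption.

```python
import numpy as np

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing `given` out of both."""
    G = np.column_stack([np.ones_like(given), given])
    res_a = a - G @ np.linalg.lstsq(G, a, rcond=None)[0]
    res_b = b - G @ np.linalg.lstsq(G, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical mechanisms for the graph W -> X, X -> Z, X -> Y, Z -> Y
W = rng.normal(size=n)
X = 0.8 * W + rng.normal(size=n)
Z = 0.7 * X + rng.normal(size=n)
Y = 0.5 * X + 0.6 * Z + rng.normal(size=n)

marginal_WY = float(np.corrcoef(W, Y)[0, 1])   # W and Y are associated
conditional_WY = partial_corr(W, Y, X)         # ...but not once X is known
conditional_WZ = partial_corr(W, Z, X)         # same for W and Z given X

print(f"corr(W, Y)     = {marginal_WY:.3f}")
print(f"corr(W, Y | X) = {conditional_WY:.3f}")
print(f"corr(W, Z | X) = {conditional_WZ:.3f}")
```

In this simulation, W and Y are clearly correlated, yet the association vanishes (up to sampling noise) once X is conditioned upon, because every path from W to Y passes through X.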
Finally, in the last step, findings from the causal analysis can be compared against those of a companion analysis (say, from traditional machine learning, statistical analysis, domain knowledge, etc.). Comparing the causal analysis to existing theory can help to further verify the correctness of an adopted theory or to check whether the causal analysis mirrors domain knowledge. Figure 4 demonstrates the proposed approach in more detail.

| DESCRIPTION OF CAUSAL ALGORITHMS AND MODELS
This section describes the two main causal algorithms and corresponding models showcased herein (CausalNex and DoWhy), and a full description of these tools can be found in their respective references [46,47]. To maintain coherence and allow the replication of our analysis, we adopted both algorithms in their default settings. In a nutshell, we adopted CausalNex to uncover the DGP and then incorporated such DGP into DoWhy to infer the influence of interventions upon the fire resistance.

| CausalNex
The majority of existing causal methods are often limited in their ability to consider the complex relationships between variables; that is, they assume constant/fixed relationship effects. CausalNex is a new Python library that addresses this challenge by allowing data scientists to develop models that consider how changes in one variable may affect other variables using Bayesian Networks. 48 This not only allows us to find realistic relationships between variables but also documents such relationships via causal models. A causal model that depicts the relationship between variables is displayed via directed acyclic graphs (DAGs).
As discussed in the previous section, the search space is combinatorial and exponentially large, which implies that finding such relationships can be computationally exhaustive. To address this issue, CausalNex employs an optimized structure learning algorithm, NOTEARS, to find the variables that have the most influence on the supplied outcomes. The runtime of NOTEARS scales only cubically with the number of variables, as opposed to the exponential growth witnessed by traditional DAG search methods. As a result, NOTEARS makes the search for DAGs affordable and manageable. This makes CausalNex an ideal tool for scientists who want to uncover hidden cause-and-effect relationships in their data.
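NOTEARS avoids a combinatorial search by replacing the discrete "is this graph acyclic?" question with a smooth function of the weighted adjacency matrix, h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the graph has no directed cycle. The following is a minimal sketch of that function only (not of the full optimization, and not CausalNex's own implementation); the matrix exponential is evaluated through a truncated power series, and the two example matrices are hypothetical.

```python
import numpy as np

def notears_acyclicity(weights, n_terms=None):
    """Evaluate h(W) = tr(exp(W * W)) - d with a truncated power series.

    weights[i, j] is the weight of edge i -> j; W * W is the elementwise square.
    Since tr(exp(A)) - d = sum_{k>=1} tr(A^k) / k!, h is zero iff no directed cycle exists.
    """
    weights = np.asarray(weights, dtype=float)
    d = weights.shape[0]
    if n_terms is None:
        n_terms = max(d, 20)   # at least d terms so that every possible cycle is counted
    A = weights * weights
    term = np.eye(d)
    h = 0.0
    for k in range(1, n_terms + 1):
        term = term @ A / k    # term now holds A^k / k!
        h += np.trace(term)
    return float(h)

# Hypothetical 3-variable examples
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -2.0],
                [0.0, 0.0, 0.0]])   # chain 0 -> 1 -> 2
cyclic = dag.copy()
cyclic[2, 0] = 0.5                  # adds 2 -> 0, which closes a cycle

h_dag = notears_acyclicity(dag)
h_cyclic = notears_acyclicity(cyclic)
print(h_dag, h_cyclic)
```

The chain returns zero, whereas the matrix with the added back-edge returns a strictly positive value; a continuous optimizer can therefore use h(W) = 0 as an equality constraint.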

| DoWhy
The DoWhy 46 causal inference library provides an intuitive solution to estimate the average causal effect of one variable on another or upon the outcome of interest. This package offers a principled end-to-end library that can efficiently conduct causal analysis through an integrated approach that combines already existing causal and statistical methods.
There are four steps to conducting a causal analysis via DoWhy. These include (1) modeling the underlying causes of a problem, (2) identifying trends/relationships between variables, (3) estimating parameters based on previous models or data sets, and (4) refuting the obtained estimates by testing the robustness of the initial model's assumptions (through three refuters: the Random Common Cause, the Data Subset Refuter, and the Placebo Treatment Refuter). ‡ In the final step, the user's assumptions are examined. If a given assumption is valid, then the estimates arrived at from the Random Common Cause and Data Subset refuters should not vary by much from the original estimate. Similarly, if the assumption is correct, the estimate returned by the Placebo Treatment refuter should be near zero. This allows for greater accuracy and efficiency in research. 49
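The logic behind the three refuters can be illustrated without the library itself. The sketch below does not use DoWhy's API; it assumes a hypothetical simulated dataset with one confounder and a true treatment effect of 30, uses ordinary least squares as the estimator, and mimics each refuter by hand (adding a random covariate, re-estimating on a random 80% subset, and replacing the treatment with a permuted copy).

```python
import numpy as np

def linear_effect(treatment, outcome, covariates):
    """Coefficient on `treatment` in an OLS fit of outcome ~ 1 + treatment + covariates."""
    design = np.column_stack([np.ones(len(outcome)), treatment, covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(coef[1])

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: a confounder drives both the treatment and the outcome
confounder = rng.normal(size=n)
treatment = (confounder + rng.normal(size=n) > 0).astype(float)
outcome = 30.0 * treatment + 10.0 * confounder + rng.normal(scale=5.0, size=n)

estimate = linear_effect(treatment, outcome, confounder[:, None])

# Random common cause: an unrelated covariate should barely move the estimate
random_cause = rng.normal(size=n)
est_random = linear_effect(treatment, outcome,
                           np.column_stack([confounder, random_cause]))

# Data subset: re-estimating on a random subset should give a similar value
subset = rng.choice(n, size=int(0.8 * n), replace=False)
est_subset = linear_effect(treatment[subset], outcome[subset],
                           confounder[subset, None])

# Placebo treatment: a shuffled treatment carries no effect, so expect ~0
placebo = rng.permutation(treatment)
est_placebo = linear_effect(placebo, outcome, confounder[:, None])

print(f"estimate={estimate:.2f}, random cause={est_random:.2f}, "
      f"subset={est_subset:.2f}, placebo={est_placebo:.2f}")
```

An estimate that survives the first two checks and collapses toward zero under the placebo is consistent with, though not proof of, a correctly specified causal model.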

| CAUSAL CASE STUDY
This section describes our case study, as well as findings from the presented causal analysis. We start with a description of the adopted database and then dive into discussing our research.
The reader should note that this particular database was heavily used in previous ML-based papers by the authors and hence presents an attractive means to compare the results of the presented causal analysis against those of blackbox ML analysis, 42 as well as explainable ML analysis, 39 and unsupervised learning analysis. § Further statistical details can be seen in Figure 5 and Table 1. A look into the distribution of the variables, as shown in the histograms and provided table, shows that the selected columns are of practical ranges applicable to buildings.

| Causal discovery analysis
As mentioned above, we start our analysis by using CausalNex to identify the possible relationships between all collected variables and their contribution toward improving or reducing a given RC column's resistance to fire. As CausalNex does not accommodate continuous data points (given its Bayesian Network nature), the database was discretized via the Decision Tree Supervised discretization method. 61 This method is favored by CausalNex and is part of its internal package. The thresholds for all input variables are listed in Table 2.
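The idea behind supervised, tree-based discretization can be sketched with a single split. The example below is not CausalNex's discretiser; it is a hand-written depth-one decision stump that picks the threshold minimizing the weighted Gini impurity of the class labels, applied to a small hypothetical set of column widths and fire resistance classes (values invented for illustration).

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(values, labels):
    """Threshold (midpoint of adjacent unique values) with the lowest weighted Gini impurity."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    uniq = np.unique(values)
    candidates = (uniq[:-1] + uniq[1:]) / 2.0
    best_t, best_score = None, np.inf
    for t in candidates:
        left, right = labels[values <= t], labels[values > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = float(t), score
    return best_t

# Hypothetical widths (mm) and fire resistance classes
widths = np.array([200, 250, 300, 305, 350, 400, 450, 500])
fr_class = np.array([1, 1, 1, 1, 2, 2, 2, 2])

threshold = best_split(widths, fr_class)
binned = np.digitize(widths, [threshold])   # 0 below the threshold, 1 above
print(threshold, binned)
```

A deeper tree would yield several thresholds per variable; the bins are "supervised" in the sense that they are chosen to separate the outcome classes rather than to equalize bin widths or counts.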
In addition to discretizing the input variables, the output variable (i.e., fire resistance, FR) was also discretized so that the dataset could be classified using a Bayesian network trained with conditional probability distributions (CPDs). In this case, the FR values were split into three classes; 60, 120, and 180 split points are used. FR values of 0-60 min are discretized as "Class 1," 60-120 min as "Class 2," and >120 min as "Class 3." ¶
Once the data is discretized, the Bayesian network interface within CausalNex is trained to create a model of how each variable affects the other variables. The model is created by learning the joint probability distribution of all the variables from the data. This representation can be seen in Figure 6, where variables are linked together by arrows that indicate which ones affect them most strongly (causal flow).
TABLE 1 Statistics on collected database
Then, this model was examined for its predictive accuracy. Since the data was discretized, a classification analysis was followed. In this analysis, the database was split 70/30 into training and testing sets.** The classification was performed with the Bayesian Network Classifier as recommended by CausalNex (see Table 3). The accuracy score is 0.86/0.86, with precision and recall at 0.90/0.89 and 0.91/0.86 for training/testing, respectively. These metrics are defined as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.
Following the classification procedure, the discretized data is subjected to regression in order to assess its explanatory power, using the same 70% training and 30% testing split. The Random Forest machine learning model achieves an R² score of 0.85 and 0.811 on the training and test data, respectively, which demonstrates its potential for use in future applications where accuracy matters most.
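The three-class split of the fire resistance and the classification metrics (accuracy, precision, and recall in terms of TP, TN, FP, and FN) can be reproduced in a few lines. This is a sketch under stated assumptions: the class boundaries at 60 and 120 min follow the class definitions given above, the handling of a value exactly at 60 min is an assumption, the metrics are computed one-vs-rest for a chosen class, and the measured and predicted values are hypothetical.

```python
import numpy as np

def fr_class(fr_minutes):
    """Class 1: FR <= 60 min, Class 2: 60 < FR <= 120 min, Class 3: FR > 120 min."""
    return np.digitize(fr_minutes, [60, 120], right=True) + 1

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall treating `positive` as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical measured fire resistance (min) and hypothetical predicted classes
measured = np.array([45, 60, 90, 120, 150, 200])
true_classes = fr_class(measured)
predicted_classes = np.array([1, 2, 2, 2, 3, 2])

accuracy, precision, recall = one_vs_rest_metrics(true_classes, predicted_classes, positive=2)
print(true_classes, accuracy, precision, recall)
```

For a multi-class problem, such per-class values are typically averaged; which averaging scheme was used in the reported scores is not stated here.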

| Causal inference analysis
Once the causal structure was obtained from CausalNex, we turned to DoWhy to infer the outcome of possible interventions upon the fire resistance of RC columns. Four subanalyses were conducted herein. These analyses include interventions on: isolated DAGs, the DAG from CausalNex, CausalNex's DAG modified with domain knowledge, and a hypothetical DAG. The results from each subanalysis are presented below.

| Isolated DAGs
In the first subanalysis, we built DAGs that only tie a particular variable to the outcome (FR = Y) via an intervention or treatment (T). Fundamentally, a chain is created to link an input variable to the FR. The goal of this analysis is to explore the maximal influence of a particular variable on the fire resistance of RC columns if an intervention is applied (T = 1) or not (T = 0); see Figure 7. The value of the selected treatment for all of these models is based on the dataset's average value of the influencing variable. For example, in Figure 7, the model assigns T = 1 for all columns with W > 313.2 mm and T = 0 for all columns with W < 313.2 mm (while keeping the other variables as is).
Now, the difference in the estimated FR for columns with T = 1 and T = 0 is the tabulated mean value listed in Table 2. To ensure that this value is significant, the p-value associated with each estimated mean is checked against the traditional significance limit of 5%. Estimates with a p-value of less than 5% imply significance, and p-values of more than 5% imply poor significance. In the latter case, the estimated mean can be ignored. Table 4 provides a complete picture of the analysis carried out herein. For instance, when the influence of W on FR is examined, the causal analysis shows that FR increases by 52.8 min when W is larger than the average. Similarly, when the intervention (T = 1) is applied, FR increases by 34.7, 55.4, 145.8, 94.4, 130.9, and 88.7 min for r, f_c, C, P, L, and K, respectively. The results of the analysis were examined through the three refute methods described in the previous section. The estimates obtained from the Random common cause and Data subset refuter models are close to the original estimated values, whereas the new values obtained from the Placebo treatment refutation model are near zero. Thus, Table 4 shows that predictions from our analysis for each variable are reliable.
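The mean-threshold treatment and the accompanying significance check can be sketched as follows. This is not the DoWhy estimator used in the study: it assumes the isolated chain (no confounding), so that a plain difference in group means serves as the effect estimate, and it swaps in a permutation test for the p-value. The data are hypothetical and generated only for illustration.

```python
import numpy as np

def mean_threshold_effect(x, fr, n_perm=2000, seed=0):
    """Difference in mean FR between columns above (T = 1) and below (T = 0) the mean of x,
    with a two-sided permutation p-value."""
    x, fr = np.asarray(x, dtype=float), np.asarray(fr, dtype=float)
    treated = x > x.mean()                      # T = 1 if above the dataset average
    effect = fr[treated].mean() - fr[~treated].mean()

    rng = np.random.default_rng(seed)
    perm_effects = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(treated)     # break any link between T and FR
        perm_effects[i] = fr[shuffled].mean() - fr[~shuffled].mean()
    p_value = (np.sum(np.abs(perm_effects) >= abs(effect)) + 1) / (n_perm + 1)
    return float(effect), float(p_value)

# Hypothetical data: width (mm) and fire resistance (min)
rng = np.random.default_rng(3)
width = rng.uniform(200, 500, size=140)
fire_resistance = 0.4 * width + rng.normal(scale=30.0, size=140)

effect, p_value = mean_threshold_effect(width, fire_resistance)
print(f"effect = {effect:.1f} min, p = {p_value:.4f}")
```

An estimate is retained only if its p-value falls below the 5% limit; otherwise, it is treated as indistinguishable from no effect.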

| DAG from CausalNex
Second, we combine both DoWhy and CausalNex. We employ the CausalNex-developed DAG causality model shown in Figure 8 in the DoWhy analysis. For this, we build seven distinct models, as shown in Figure 8, to examine the effect of intervention/treatment (T) on each variable and fire resistance. The same figure demonstrates that all of the variables, together with the treatment values, have an effect on the output (FR), and some of the variables influence each other. It is clear that some of the identified relationships do not align with our domain knowledge. Thus, for the sake of discussion, we explore the holistic causal DAG here and then refine our DAG in the next section to incorporate domain knowledge. Table 5 shows the value of fire resistance estimates and refutes for each variable. One can see that the FR reduces by 39.6 min when T = 1 for the steel [...]. The estimate for W is displayed as −252 min; this value is associated with a large p-value and hence can be ignored.

| Modifying CausalNex's DAG with domain knowledge
Given that some of the identified causal links in the DAG by CausalNex do not align with our domain knowledge, an attempt was carried out herein to augment this DAG. Thus, new DAGs were developed, as shown in Figure 9.
In creating these DAGs, we assumed the following:
• Knowing the boundary conditions of a particular column affects its length via K.
• Knowing the applied level of loading, P, influences the design of columns in terms of f_c, W, and r. In turn, f_c also affects r.
• Knowing W influences the size of the concrete cover, C.
• Given the above, links can be tied to the fire resistance (FR = Y).
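The assumptions listed above can be written down as an explicit edge list and checked for acyclicity before being passed to an inference tool. The sketch below encodes one possible reading of these assumptions (K → L; P → f_c, W, r; f_c → r; W → C; and, as an additional assumption, a direct link from every variable to FR); the actual DAGs in Figure 9 may differ. The variable name `fc` stands for f_c.

```python
from collections import deque

# One reading of the domain-knowledge assumptions (hypothetical encoding)
VARIABLES = ["W", "r", "fc", "C", "P", "L", "K"]
EDGES = [("K", "L"), ("P", "fc"), ("P", "W"), ("P", "r"), ("fc", "r"), ("W", "C")]
EDGES += [(v, "FR") for v in VARIABLES]   # assumed: every variable links to FR

def topological_order(edges):
    """Return the nodes in causal order (Kahn's algorithm); raise if a cycle exists."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order

order = topological_order(EDGES)
print(order)
```

A valid ordering confirms that the augmented graph is a DAG, with causes always preceding their effects and FR appearing last.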
As shown in Table 6, when the effects of all variables on each other and on FR are considered together with the effect of T = 1 on FR, then W, L, f_c, and K have a negative effect on FR, whereas the opposite is true for r, C, and P. Likewise, when the refute values are evaluated, it is clear that the estimated values are trustworthy.

| Hypothetical DAG
Finally, we obtain seven different models by disregarding the effects of all variables on each other, assuming that they only influence FR, and setting the treatment value for each variable according to whether it lies above or below its mean, as shown in Figure 10. In this DAG, we assumed that all variables only have a direct causal link with FR (i.e., without any inter-relation to other variables). Our goal is to further examine the validity of the previous DAG. Table 7 shows that when all variables are assessed for their impacts on FR, positive interventions/treatments negatively influence FR for W, L, and K, whereas they positively influence FR for r, f_c, C, and P. The analysis from this DAG also seems to satisfy all refuting models. It is interesting to note that results from this subanalysis match well (with the exception of f_c) with those from the DAG that was augmented with domain knowledge.
FIGURE 12 Comparisons applied due to interventions

| COMPARISON BETWEEN REGRESSION, ACCEPTED METHODS, MACHINE LEARNING, AND CAUSAL ANALYSIS
In this section, a comparison is drawn between predictions obtained from the four causal models, as well as linear regression, accepted methods, and a previously published machine learning model. 39 We start by showcasing validation plots of fire resistance predictions from each method against those measured from fire tests. These validation plots will show the predictive capability of each method. Then, all methods will be compared in terms of interventions as a means to draw attention to the key differences between predictive modeling and causal modeling.

| Predictive validation
The multilinear regression returned the following expression (in minutes):
$$\mathrm{FR} = 0.35 f_c + 0.15 W - 15.8 r - 289.3 K - 25.3 L + 1.94 C - 0.01 P + 396.1$$
It is worth noting that this expression yielded a coefficient of determination of 0.58 and a correlation coefficient of 0.76. A full discussion on the Eurocode 2 (EC2) method, 62 the Kodur and Raut (K&R) method, 63 and machine learning (ML) 39 can be found in their respective resources. These are not repeated herein for brevity. Figure 11 depicts a comparison between all these methods. This figure clearly shows the superior predictivity of the ML model. It is also clear that all other methods, with the exception of the Regression method, seem to perform well for columns with fire resistance of less than 240 min.
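For convenience, the regression expression can be wrapped in a small function. The coefficients below are those reported above; the input values in the example are hypothetical, and their units are assumed to match those of the database used to fit the expression (see Table 1), which are not restated here.

```python
def fr_multilinear(fc, W, r, K, L, C, P):
    """Fire resistance (min) from the reported multilinear regression expression."""
    return (0.35 * fc + 0.15 * W - 15.8 * r - 289.3 * K
            - 25.3 * L + 1.94 * C - 0.01 * P + 396.1)

# Hypothetical column; units assumed consistent with the fitted database
example_fr = fr_multilinear(fc=40, W=300, r=2, K=0.5, L=3.8, C=40, P=1000)
print(f"{example_fr:.2f} min")

# The reported fit statistics are mutually consistent: 0.76 squared is about 0.58
r_squared_from_r = 0.76 ** 2
```

The signs of the coefficients describe associations in the fitted data only; they should not be read as the effect of intervening on any one variable.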

| Interventional estimation
This section draws a comparison as to how predictions from each method would be affected by an act of intervention similar to that carried out in the causal case study (i.e., wherein the average value of a variable replaces the values of this particular variable across all examined columns). Figure 12 shows how intervening was not properly captured across the methods. In fact, this action can be seen to cause a radical shift in each of the method's predictions. Such a shift can be clearly seen in the case of the most precise ML model. † † Such a shift can be explained by the fact that all methods were designed to relate all variables, as observed in fire tests. In such tests, the columns had different variables (vs. a fixed variable across all columns). Thus, each method is primarily driven by association and correlation to minimize the variance of the outcome instead of displaying the actual causal mechanism tying each variable to the fire resistance of RC columns. This further emphasizes the need to pursue causal analysis in our area.
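The mechanics of this comparison can be illustrated on a purely predictive model. The sketch below uses hypothetical data in which the concrete cover C depends on the width W, fits an ordinary least squares model, and then replaces W by its average across all columns while leaving every other input as observed. The data-generating coefficients are invented for illustration and are not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 130

# Hypothetical data in which C depends on W, and FR depends on both
W = rng.uniform(200, 500, size=n)
C = 0.1 * W + rng.normal(scale=5.0, size=n)
FR = 0.2 * W + 2.0 * C + rng.normal(scale=20.0, size=n)

# Fit a purely predictive (associational) linear model FR ~ 1 + W + C
X = np.column_stack([np.ones(n), W, C])
beta, *_ = np.linalg.lstsq(X, FR, rcond=None)
pred_observed = X @ beta

# "Intervene" by replacing W with its average in every row
X_intervened = X.copy()
X_intervened[:, 1] = W.mean()
pred_intervened = X_intervened @ beta

# The model shifts each prediction by beta_W * (mean(W) - W) and nothing else:
# C stays at its observed value even though, in the simulated DGP, C depends on W.
shift = pred_intervened - pred_observed
print(f"largest shift: {np.abs(shift).max():.1f} min")
```

The predictive model treats the substitution as a mechanical change of one input; it has no way of propagating the change in W to C, which is precisely what a causal model built on a DAG is designed to do.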

| CONCLUSIONS
This paper presents a look into causal discovery and causal inference as means to explore the phenomenon of fire resistance of reinforced concrete (RC) columns. Causal discovery is adopted first to uncover the causal structure regarding the examined phenomenon, and then causal inference is applied to infer the influence of interventions on each of the identified parameters of influence. The following list of inferences can also be drawn from the findings of this study:
• Integrating causal principles is expected to further accelerate knowledge discovery in our domain. Further work on this front is warranted and certainly needed.
• Algorithmic causal graphs may, and sometimes may not, agree with domain knowledge, and hence incorporating such knowledge into causal analysis is of merit.
• Interventions upon the discovered DAGs are seen to be highly influential in terms of column width, column length, concrete cover to reinforcement, effective length factor, and compressive strength of concrete. Interventions on the level of applied loading and/or reinforcement ratio did not significantly alter fire resistance.
• Unlike traditional ML analysis, causal analysis provides us with the most realistic predictions as it can accommodate interventions (without needing new tests or experiments) vs. the purely statistical, associational predictions provided by traditional ML.

ACKNOWLEDGMENTS
We would like to thank the Editor and Reviewers for their support of this work and for constructive comments that enhanced the quality of this manuscript.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

ORCID
M. Z. Naser https://orcid.org/0000-0003-1350-3654
Aybike Özyüksel Çiftçioğlu https://orcid.org/0000-0003-4424-7622

ENDNOTES
* The discussion on causality can continue to the intersection of philosophy, epistemology, and ontology. A broader look into causality can be found elsewhere. 43,64,65
† The reader is highly encouraged to visit the following sources on causality, do-calculus, and others. 66,67
‡ Random Common Cause: Adds randomly drawn variables to the database and re-runs the analysis to see if the causal estimate changes or not. The causal estimate should not change by much due to a random variable. Data Subset Refuter: Creates subsets of the data and checks whether the causal estimates vary across subsets. In order to effectively measure causation, there should not be large variances in the estimates. Placebo Treatment Refuter: Randomly assigns a variable as a treatment and re-runs the analysis. If a causal relationship exists, then the causal estimate will move toward zero.
§ This particular paper is currently under review.
¶ We would like to acknowledge that a more traditional approach would be to have five classes (0-60, 60-120, 120-180, 180-240, and larger than 240 min). We attempted these five classes initially and noted the poor performance of the models. As such, we opted to use three classes (since four classes also failed to provide a favorable performance). We will be carrying out a more detailed analysis in a future study to explore the influence of database size and classes.
** Please note that CausalNex does not incorporate cross-validation training at the moment.
†† Please note that Figure 12 shows a shift associated with intervening on one variable. A multi-intervention can yield a more compound shift where each method seems to center out; see subfigures g and e.