Bayesian network analysis reveals the interplay of intracranial aneurysm rupture risk factors

Clinical decision making regarding the treatment of unruptured intracranial aneurysms (IA) benefits from a better understanding of the interplay of IA rupture risk factors. Probabilistic graphical models can capture and graphically display potentially causal relationships in a mechanistic model. In this study, Bayesian networks (BN) were used to estimate IA rupture risk factors influences. From 1248 IA patient records, a retrospective, single-cohort, patient-level data set with 9 phenotypic rupture risk factors (n = 790 complete entries) was extracted. Prior knowledge together with scorebased structure learning algorithms estimated rupture risk factor interactions. Two approaches, discrete and mixed-data additive BN, were implemented and compared. The corresponding graphs were learned using nonparametric bootstrapping and Markov chain Monte Carlo, respectively. The BN models were compared to standard descriptive and regression analysis methods. Correlation and regression analyses showed significant associations between IA rupture status and patient’s sex, familial history of IA, age at IA diagnosis, IA location, IA size and IA multiplicity. BN models confirmed the findings from standard analysis methods. More precisely, they directly associated IA rupture with familial history of IA, IA size and IA location in a discrete framework. Additive model formulation, enabling mixed-data, found that IA rupture was directly influenced by patient age at diagnosis besides additional mutual influences of the risk factors. This study establishes a data-driven methodology for mechanistic disease modelling of IA rupture and shows the potential to direct clinical decision-making in IA treatment, allowing personalised prediction.


A B S T R A C T
Clinical decision making regarding the treatment of unruptured intracranial aneurysms (IA) benefits from a better understanding of the interplay of IA rupture risk factors. Probabilistic graphical models can capture and graphically display potentially causal relationships in a mechanistic model. In this study, Bayesian networks (BN) were used to estimate IA rupture risk factors influences.
From 1248 IA patient records, a retrospective, single-cohort, patient-level data set with 9 phenotypic rupture risk factors ( = 790 complete entries) was extracted. Prior knowledge together with scorebased structure learning algorithms estimated rupture risk factor interactions. Two approaches, discrete and mixed-data additive BN, were implemented and compared. The corresponding graphs were learned using nonparametric bootstrapping and Markov chain Monte Carlo, respectively. The BN models were compared to standard descriptive and regression analysis methods.
Correlation and regression analyses showed significant associations between IA rupture status and patient's sex, familial history of IA, age at IA diagnosis, IA location, IA size and IA multiplicity. BN models confirmed the findings from standard analysis methods. More precisely, they directly associated IA rupture with familial history of IA, IA size and IA location in a discrete framework. Additive model formulation, enabling mixed-data, found that IA rupture was directly influenced by patient age at diagnosis besides additional mutual influences of the risk factors.
This study establishes a data-driven methodology for mechanistic disease modelling of IA rupture and shows the potential to direct clinical decision-making in IA treatment, allowing personalised prediction.

Introduction
Intracranial aneurysms (IA) are protuberant expansions of cerebral arteries. About 3% of the population is affected by an unruptured intracranial aneurysm (UIA) [1] and is hence at potential risk of rupture [2]. Most IAs are asymptomatic, while sudden rupture of IA leads to aneurysmal subarachnoid haemorrhage (aSAH), a type of hemorrhagic stroke with often poor functional outcome [1]. Uncovering asymptomatic UIAs often occurs coincidentally during diagnosis for other reasons using medical imaging. The detection rate of UIAs increases with progress in imaging technologies [1,3]. based on expertise [10], guidelines [2] and risk prediction scores (i.e. PHASES [12], UIATS [10]). The former evaluate known risk factors and their influence on rupture status based on multivariable linear regression (PHASES) or clinical consensus (UIATS).
Despite extensive research efforts to understand the susceptibility, growth and rupture of IAs [8,14,15], there is no advanced holistic disease model, that allows guiding clinical decision-making accurately. It is inherently difficult to understand the decision process of black-box IA rupture prediction models [16]. Additionally, high performing machine learning models are often challenged by the data availability [17] and confounding variables [18]. It is only possible to increase the functional outcome after rupture and thus reduce the financial impact of IAs on the health care system if there is a better understanding of IA rupture risk factors [2,11,15]. In this sense, a mechanistic model allowing for UIAs treatment personalised decision making appears essential.
Regression analysis identifies statistically significant associations between outcome and covariates without directionalities [19] and without disentangling the correlation of covariates. In this publication, we apply Bayesian network (BN) analysis [20,21] to learn the structure of BNs representing interdependencies between major phenotypic risk factors and their influence on UIA rupture in order to improve the understanding of the disease.
BNs are a form of a probabilistic graphical model (PGM) that represent the conditional independence structure of data and prior knowledge via a directed acyclic graph (DAG) [22]. The DAG represents the structure of the BN in a qualitative way, but at the same time it can be used to estimate causal relationships. Moreover, BNs allow for quantitative reasoning by evaluating the probability of clinicallyrelevant events of interest, thus making it possible to generate and evaluate clinical hypotheses in silico [23].
The combination of a readily interpretable qualitative representation and quantitative reasoning make BN models a powerful alternative to standard regression models. In contrast to models that focus solely on the dependent variable and that make the simplifying assumption that independent variables are independent and influence the dependent variable directly, BNs can learn the dependence structure of all variables directly from data. Such complex dependencies between variables are represented as a DAG, a network of nodes connected by edges with direction if they are conditionally dependent [24]. BNs are an established statistical framework [25] and have been used in health science for decision support [26][27][28][29][30][31][32][33], clinical risk factor analysis [34][35][36][37] and psychology disease analysis [38,39]. Even though they are a promising tool for decision support, they are hardly used in the clinical routine so far.
This article intends to introduce PGMs as an explainable data-driven disease modelling technique for IA and to demonstrate their potential by applying them to a single-centre data set of well-known clinical and IA specific risk factors.
In the present paper, we first provide the required methodological background. Cohort description is followed by risk factor correlation analyses and logistic regression (LR) for IA rupture status. We learn the conditional dependence structure of these risk factors and IA rupture status from data, firstly with classic discrete BNs and then with a more recent mixed-data BN approach with reduced information loss. To this end, we use prior knowledge to prevent BN structure learning (SL) from including physically impossible dependence directions in the models. Afterwards, we learn the parameters of the BNs from data to make quantitative statements on the size, the sign and the direction of clinical effects. The results from all methods were finally compared and discussed in relation to existing literature.

Bayesian network models
A graphical model such as a BN is a representation of the joint probability distribution of a set of variables that factorises according to some graphical structure [40]. The graph structure can be learned from data while incorporating prior knowledge, resulting in a graphical representation of the joint probability distribution. In a second step, the parameters of the distribution can be learned from data following the graph structure.
A directed graph = ( , ) is a set of nodes and edges called arcs . In a DAG, we assume that all arcs are directed and the graph is acyclic, that is, that there is no sequence of arcs connecting a node to itself.
A BN model = ( , ) is a generative stochastic model [41] defined over a set of variables = { 1 , … , } that consists of both a graph structure and parameters . Their dependencies are typically represented in the form of a DAG with (missing) arcs representing conditional (in-)dependencies between variables. The joint distribution of all variables ( ) factorises into conditional probabilities ( | ) of each variable conditional on its parent variables : BN model learning uses the posterior distribution ( | ) to find a BN model that is close to the unknown data generating probability distribution * given a set of data [42]: The term ( ) denotes a prior over the model space that we may specify based on expert knowledge, and ( | ) denotes the marginal likelihood of the data. In principle, expectation values under the posterior given in Eq.
(2) can be calculated by integrating over all potential models . However, this model space is too large to explore exhaustively in most real-world applications and heuristic methods are often used to find the best candidate model [43]. The posterior distribution of the BN model given the data, ( | ) = ( , | ), is usually estimated in two inter-related steps [34,44]: Typically, SL estimates a set of candidate DAGs = { 1 , … , } called a Markov equivalence class [45] that best represents the dependence structure of [42]. Subsets of are naturally grouped into classes that correspond to decompositions of ( ) into local probability distributions ( | ) that can be shown to be equivalent by way of Bayes' theorem. Parameter learning then estimates the parameters of the distributions of the learned structure of the DAG.
In this publication we learned two different types of BNs using datadriven algorithms and incorporating prior knowledge: discrete BNs (DBNs) and additive BNs (ABNs). Prior knowledge was incorporated in the process by setting prior probabilities of certain arcs to zero, signifying a physically impossible dependence direction.

Discrete Bayesian networks
All variables of a DBN are multinomial (categorical) random variables each with a finite number of possible states , [46]. The parameters of DBNs correspond to the conditional probabilities in which ( ) factorises in Eq. (1). Model complexity then depends on the overall number of free parameters.
In general, BNs can also be learned from data sets with continuous and mixed data using, for instance, local Gaussian distributions linked by linear relationships between variables [47]. The resulting conditional linear Gaussian BN models have the limitation that discrete nodes are only allowed to have discrete parent nodes. A possible solution to avoid this limitation is to discretise the continuous variables in the data first and then learn a DBN model. Their values are binned into intervals which become the states of the categorical variables in the DBN. However, it is difficult in practice to find the best compromise between information loss and the complexity of DBN. Moreover, the binning intervals must have a meaningful clinical interpretation for the DBN to be useful to practitioners.

Additive Bayesian networks
Another BN model parameterisation that does not restrict parentchild combinations is that of ABNs, which use members of the exponential family as local distributions for the nodes, parameterised by: That means, nodes in an ABN are parameterised as generalised linear models with the parent nodes as covariates and a link function ( ) [22]. For example, a binomial variable 1 is parameterised as local LR model with the logit link function ( 1 ) = logit( 1 ). This allows to mix discrete and continuous variables [34] without any structural restrictions. Learning an ABN again involves a combination of structure and parameter learning and missing arcs in the DAG represent independence between variables.

Structure learning
Structure learning tries to de-confound the set of variables by reconstructing the graphical representation of the joint distribution of the true, data-generating structural (causal) model [48]. The true network structure is usually unknown and difficult to estimate because the number of DAGs in the search space is super-exponential with respect to the number of variables [49]. Hence, structure learning is often performed using heuristic algorithms [20]; with a small number of variables exact optimisation algorithm become feasible [50]. SL can be improved by incorporating domain knowledge [51,52]. Restricting (or retaining) arcs reduces the structural search space by reducing the number of candidate DAGs as well as by ruling out DAGs within the same Markov equivalence class that have the same score [53].
In this study, we use score-based SL algorithms which optimise a network score in order to learn a DAG. Greedy search is a common choice in this class as it has shown to be very efficient [54]. Local minima are avoided using a combination of hill climbing, memory (tabu list [55]) and random restarts [56].
Model selection generally considers both goodness of fit and some measure of model complexity [57]. Universality and efficiency make the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) possible choices for a network score. However, AIC relies on a constant penalty coefficient for model complexity and is known to overfit models for small data sets, whereas the BIC penalty coefficient increases with the logarithm of the sample size . Furthermore, BIC is a first-order approximation of ( | ) [58]. Different network structures will have the same score value for both AIC and BIC if they belong to the same equivalence class, which makes it difficult to rely solely on the network score for structural validation [46,58]. Repeated, independent SL under parametric or non-parametric bootstrap resampling allows to estimate our confidence (or, equivalently, our uncertainty) in i.e. arc presence and direction [59,60].
As an alternative to such a maximum a posteriori (MAP) approach, we can consider the whole posterior distribution of the BN structures by Markov chain Monte Carlo (MCMC) sampling. In this way, we can estimate both the expected BN structure and the uncertainty of both the structure as a whole and individual structural features of interest [61]. In contrast to heuristic approaches, MCMC methods allow asymptotically exact inference under the posterior ( | ) [56] but are computationally more demanding. This is feasible in practice by reducing the structure search space to the space of orders where all DAGs with the same topological ordering are combined [61]. An efficient, exact SL algorithm was presented in Koivisto and Sood [50] resulting in a single DAG estimation but it is usually limited to at most 25 random variables. Restricting the maximum number of parents for each node also reduces the complexity further and improves practical applicability [62].
When we have some prior knowledge about the most probable network structure and about the local distributions of its nodes, we can use parametric bootstrapping to account for overfitting and calculate model validation metrics [61]. Sampling networks structures with MCMC methods according to how well they are supported by the data is computationally demanding [63], but possible with suitable choices of stepping techniques [64]. Single-edge operations such as those used in hill-climbing (e.g. [63]) mix slowly and when sampling over the order of structures [61] it is not possible to specify priors on graph structures. Alternative algorithms perform larger steps to mix and converge faster [65] or escape local maxima by Markov blanket resampling [66].

Bayesian network model validation
A DBN model should be validated at the different stages of the learning process using both model-and domain-specific approaches. The arcs present in the DAG learned from the data can be screened by their confidence, which can be computed from a set of DAGs obtained either by bootstrapping or MCMC sampling. We can measure the confidence in each arc from the number that it appears in the DAGs, and then create a consensus DAG which is the structural average of those DAGs. For example, repeatedly applying a SL algorithm (e.g. tabu search [67]) within a non-parametric bootstrap framework of 1000 bootstrap replicates results in 1000 possibly different DAGs. The arc confidence e.g. of an arc from → is 0.6 if it appeared in 600 out of the 1000 estimated DAGs. The minimum level of arc-strength (arc-strength threshold or arc-strength significance level) can be calculated by minimising the L1-norm of the cumulative density function (CDF) of the observed arc-strength and the CDF of its asymptotic counterpart [68]. Higher values of arc-strength significance threshold correspond to higher degrees of sparsity of the DAG.

Methods
The statistical software R version 4.1.2 [69] was used for all data processing and analysis. Data, code and extended results are publicly accessible and the findings from this study can be reproduced using the provided R-package available on Github [70].

Data
The data for this analysis was obtained from a subset of the Aneurysm Data Bank (ADB) [71] of patients that were diagnosed with an IA by three dimensional digital subtraction angiography, three dimensional magnetic resonance angiography, three dimensional computed tomography angiography or surgical documentation. The ADB includes patients evaluated for IA from the Geneva University hospitals (HUG) from November 1, 2006 up to June 1, 2021.
As part of clinical routine, cerebral vessel images were reviewed by two independent and experienced clinicians; one vascular neurosurgeon and one vascular neuroradiologist. Aneurysm size was measured using orthogonal maximum intensity projection (MIP) as well as 3D volume rendering reconstructions using Osirix V.7.5 software (Osirix, Pixmeo, Geneva, Switzerland [72]).
The study was approved by the Ethical Committee of the Geneva University Hospitals and by Swissethics (@neurIST protocol, ethics authorisation PB_2018-00073, previously CER 07-056). All patients consented their data to be used for research in the field of cerebrovascular diseases according to the information and consent forms. All procedures were in accordance with the World Medical Association's Declaration of Helsinki.
In the ADB, 2008 IAs were recorded in total. From these, the most dangerous IA per patient was selected if multiple IAs were present (aSAH present: ruptured IA, no aSAH present: IA most at risk of rupture by clinical assessment according to its size and location [12]), which Table 1 Detailed description of variables considered for IA rupture risk analysis. Abbreviations: anterior communicating artery (Acom), posterior communicating artery (Pcom), vertebral and basilar arteries (V-B), A2 segment of anterior cerebral artery (A2), posterior cerebral artery (PC), middle cerebral artery (MCA), internal carotid artery (ICA), basilar artery (Basilar), A1 segment of anterior cerebral artery (A1 segment ant), cavernous internal carotid artery (CavICA), ophthalmic internal carotid artery (OphtICA), blood pressure (BP).

Factor
Type Description yielded 1248 patient records. Out of those, = 790 records were complete (Suppl. Fig. A1) and subsequently harmonised to a retrospective, single-cohort data set with 9 phenotypic rupture risk factors (Table 2).
Detailed variable descriptions are provided in Table 1. Briefly, patient's age at diagnosis refers to the closest available date of first IA diagnosis. Age at time of diagnosis was discretised as described in Fig. 1 and explained in Section 2.1. IA location was classified by 12 cranial arteries which were grouped in low, medium and high rupture risk locations [12,15,73]. Grouping IA locations by risk of rupture reduces the model's degrees of freedom (Section 2.1). Transforming IA size by the logarithm allowed the approximation by a Gaussian distribution for ABNs and was followed by removing three outliers with log(IA size) > 4 (≈54 mm).

Correlation analysis
For all possible discrete IA rupture risk factor combinations 2 tests of independence were performed. For an intra-variable level correlation analysis all discrete variables were one-hot encoded and Spearman's was computed for each pair of ordinal feature category combination.
All descriptive analyses were performed using the R stats [69] package.

Regression analysis
IA rupture risk was modelled with standard statistical methods using R stats package [69] as a baseline method as used in previous studies [10,12,74,75]. The data set was split into training (70%) and testing (30%) set for the regression analyses. The model's discrimination ability was reported with the area under the receiver operating curve (AUC). Associations with IA rupture (vs. no rupture) were identified using multivariable LR for all discretised IA rupture risk factors. For all predictor variables odds ratios (ORs) and corresponding 95% confidence intervals (CI) were calculated. A generalised additive model was fit using the R mgcv package [76] for IA rupture depending on all discrete IA rupture risk factors together with age and IA size as continuous covariates.

A priori search space restriction
To prevent clinically implausible relationships from being included as arcs in the BN, we compiled a list of restricted arcs as follows. All variables were assigned to one of four categories: target variable, aneurysm properties, modifiable and non-modifiable risk factors (see Fig. 2). The target variable is restricted to have only incoming arcs. Aneurysm properties neither influence modifiable nor nonmodifiable risk factors. Non-modifiable risk factors cannot be influenced by any other variable. Additionally, patient's biological sex is influenced neither by patient's age at diagnosis nor by their familial disease history.
In order to use the BN to explore the structure of the data, no arcs were forced to appear in the BN a priori.

Discrete Bayesian network
DBN analysis was performed in R using bnlearn [77]. We used greedy search to learn the DBN models (tabu search [55]) which is a common choice of score-based SL algorithm [67]. The final discrete consensus BN is the averaged structural model resulting from repetitively applying tabu search (tabu list size = 18) in a non-parametric bootstrap sample. Model averaging after SL within a bootstrap framework reduces the potential of spurious structural patterns [60] and is frequently performed (e.g. [39,68,78]).
We chose a large number of bootstrap replicates (R = 100 000) with the sample size equal to the data set to preserve any fine structures in the data distribution. We learned three DBNs analogously with different network scores and then estimated  Rupture status is forced as leaf node with no child nodes. IA properties can be parent of rupture but cannot be cause of any modifiable or non-modifiable risk factors. In addition, IA size and location cannot be parent of IA multiplicity. Modifiable risk factors cannot be parent of any non-modifiable risk factor. Non-modifiable risk factors cannot be child of any variable from any other category, with sex enforced as root node.
the parameters of the final consensus DBN by maximum likelihood: AIC = − 2 , BIC = − 2 ( ) and a custom score = − 4 ( ) that differs from BIC only in its penalty coefficient. We denoted the maximum-log likelihood as = ( ( | )), is the network's total number of parameters and the sample size.
Hold-out cross-validation is a common approach to quantify the predictive accuracy of a model [42]. The data set is divided in a training set and a hold-out test set. Especially with small data sets, it is difficult to learn a BN model on an even smaller training set that generalises well on the test set. To keep the training set as large as possible, leaveone-out cross-validation (LOOCV) iterates over the whole data set and trains the model on all but one observation, which becomes the holdout set, and averages the performance of all iterations. The posterior classification error was estimated on a set of arbitrary nodes based on likelihood weighting, from = 50 runs of hold-out cross-validation for each of the three networks.

Additive Bayesian network
The ABN parameterisation allows to use continuous and discrete variables and hence prevents information loss caused by initially discretising continuous variables.
The data of this study is of small dimensionality (i.e. 9 variables) permitting exact SL. An ABN was learned with the exact search algorithm from Koivisto and Sood [50] implemented in the R packages abn [21] and mcmcabn [64] as a starting point for MCMC.
The final consensus ABN model from MCMC was obtained in a two step process. Firstly, we incrementally increased the maximum number of parents (in-degree) until the network score (BIC) did not improve anymore (more details are available in the online repository [70]).
Secondly, we generated a set of posterior candidate structures by MCMC sampling starting from this locally optimal model (see also [70]). We ran four independent chains for 100 000 iterations with burn-in period of, 25 000 samples and a thinning factor of 7 to perform MCMC model choice [61] with large scale edge reversal [65] and Markov blanket resampling [66]. We evaluated mixing and processing of the MCMC chains with standard diagnostic tools from the R package coda [79].
Lastly, we computed the arc-strength and arc-strength significance levels and created a final consensus ABN containing only arcs above the arc significance threshold.
The parameters for the final consensus ABN were estimated by maximum likelihood, taking the form of log-odds ratios for discrete variables and of correlation coefficients and standard errors for continuous nodes.

Correlation analysis
The results of the 2 tests of independence are summarised in Fig. 3. A significant relationship between IA rupture status and patient's sex, familial history of IA, age at IA diagnosis, IA location, IA size and IA multiplicity was detected. IA multiplicity significantly related with IA size and patient's awareness about hypertension. IA size and location showed significant relations with patient's sex and positive familial history. IA location additionally depended on patient's awareness of hypertension and the smoking behaviour. Smoking behaviour significantly related with sex and patient's age at diagnosis. Patient's age at diagnosis and awareness of hypertension significantly related.
Variable level correlation analysis provided a more detailed view (Suppl. Fig. B2). For example, the correlation for IAs classified in lowrisk locations with rupture was found negative, positive for high-risk locations, and no correlation was found for IAs in medium risk locations. Female sex positively correlated with a diagnosis at age over 65, being never smokers, diagnosed more likely than males with multiple UIAs and IAs in low-risk locations. Males positively correlate with a diagnosis at age below 40 years, being smokers (in particular current smokers at the time of diagnosis). In addition, Spearman's showed a positive correlation for IAs in males to be located in a high-risk location, to have a size larger than 12 mm, solitary appearance and ruptured. No direct correlation between the patient's smoking behaviour and IA rupture was detected, although they show a positive correlation to be diagnosed with multiple IAs and in medium risk locations. Smaller IAs (IA size = A) negatively correlated with IA rupture, whereas larger IAs (IA size = C) correlated positively with IA rupture. Patients with a positive familial history are more likely diagnosed with small IAs than sporadic cases. Small IAs are more likely sporadic, and large IAs show a positive correlation to be associated with the diagnoses of other IAs. No significant correlation between the patient's age at IA diagnosis and IA size was found.

Regression analysis
Multivariable LR for IA rupture status dependent on discretised IA rupture risk factors resulted in an AUC of 0.77 (95% CI: 0.72-0.83) (additional metrics in Suppl. Tab. F1) with significantly increased odds for IA rupture of IAs at high or medium-risk locations and for patients with multiple IAs. Odds for IA rupture are significantly smaller if patients were aware of IA cases in 1st degree relatives and with smaller IAs (Suppl. Fig. C3).
With a generalised additive model for IA rupture depending on all discrete IA rupture risk factors together with age and IA size as continuous covariates resulted in an AUC of 0.75 (95% CI: 0.69-0.81) (additional metrics in Suppl. Tab. F1). Again, IAs of small size (p < 0.001) and from patients with a positive familial history (p < 0.01) were significant in reducing the odds of IA rupture and IAs at medium and high-risk locations (p < 0.001), IA multiplicity (p < 0.01) and IAs diagnosed earlier in life (p < 0.001) significantly increased the odds of IA rupture (Suppl. Fig. C4). Additionally, current smokers show decreased odds of IA rupture (p < 0.05).

Controlling the number of free parameters
IA locations were recorded by the IA parent vessel artery resulting in IA location by vessel = 12 possible states and transformed into IA locations grouped by risk of IA rupture (see Section 3.1) resulting in IA location = 3 possible states (Fig. 1) to reduce the total number of free parameters of the model (see also Section 2.1).
The DAG with IA locations by vessels from SL under an ABN framework (Suppl. Fig. D5b) differed by one arc missing from gender to IA location compared to the ABN DAG of IA location grouped by risk of rupture (Fig. 4(b)).
DBN SL with IA locations by vessel results in a sparser structure, where IA location is the only node associated with rupture (Suppl. Fig.  D5a), compared to modelling IA location by risk of rupture (Suppl. Fig. E6b). The structure of the DBN with IA locations by vessel is also sparser compared to the ABN DAG learned from IA location by vessel.
An unbalanced number of possible states across the set of variables, heavily penalises in score-based SL incoming arrows of e.g. IA location by vessel with 12 free parameters. Therefore, and to compare DBNs and ABNs on most similar data structures, IA location was considered as grouped by risk for the remainder of this manuscript.

Discrete Bayesian network
Arc-strength threshold variations change structure sparsity. When BIC was used as the network score, accepting only significant arcs resulted in a sparse structure with sex, smoking status and positive familial history not associated to rupture (Suppl. Figs. E6a-E6b). It is known that these risk factors are contributing to rupture [2,4,11,75]. Lowering the arc-strength threshold to 0.3 resulted in three additional arcs connecting sex and positive familial history to the rest of the structure (Suppl. Fig. E6c).
Using the here estimated arc-strength significance threshold is equivalent to accepting arcs in the majority of DAGs from bootstrap sampling. Accepting arcs with strength smaller than 0.5 includes structural features less likely than expected at random. To support lowering the arc-strength below 0.5, structural stability was qualitatively tested by varying the penalty term for model complexity in SL. With AIC as the network score, the model resulted in an overly dense network likely indicating overfitting (results not shown but available in online repository [70]). Using a custom score with a penalty term of log( )∕4, which is half-way in between the penalty term of AIC (penalty term = 1) and BIC (penalty term = log( )∕2), the model was still dense, but less extreme in terms of arc-strength confidence levels (Suppl. Fig. E6d) from a clinical perspective.
There was a marginal difference in the posterior classification error between the partially connected BN (accuracy = 0.67), the BN with the lowered arc-strength confidence threshold (accuracy = 0.71) and the BN with a custom network score (accuracy = 0.70) (results not shown but available in the online repository [70], additional metrics in Suppl. Tab. F1).
Due to the heuristic nature of score-based SL and non-parametric bootstrapping the conservative structure of the DBN with BIC as the network score and an arc-strength confidence of 0.3 was preferred as the final consensus DBN and referred in the remainder of this paper as DBN.

Additive Bayesian network
The evaluation of the mixing and the processing of the MCMC chains from visual inspection of the trace plot showed no anomalies: the chains did not show any convergence issues (̂< 1.1 [80] calculated based on network score) and passed Heidelberger and Welch test [81] (more details are available in the online repository [70]).
The final consensus ABN shown in Fig. 4(b) includes only arcs above the arc-strength significance threshold (results not shown but available in the online repository [70]).
With a total of 14 arcs, the consensus DAG showed a tendency for underfitting compared to the set of candidate structures (results not shown but available in the online repository [70]) and was denser connected than the DBN (10 arcs). The DAG of the consensus ABN learned with the exact search algorithm and structure MCMC did not require manual modification of the threshold for arc inclusion.
The estimated parameters for the final consensus ABN are listed in Suppl. Tab. G2. Fig. 4. The colour of a node corresponds to its risk factor category and shape to the distribution respectively. Fig. 4(a) Averaged DAG of discrete Bayesian network (DBN ) model after non-parametric bootstrapping. Non-significant arcs above an arc-strength ≥0.3 are displayed with dashed lines and significant arcs with straight lines respectively. Fig. 4(b) Averaged directed acyclic graph (DAG) resulting from structural Markov chain Monte Carlo (MCMC) and formulated as additive Bayesian network (ABN ) model with all arcs being significant.

BN suggest new and reinforce known interdependecies
All associations from the DBN were also found in the ABN, while the ABN found additional ones.
In the DBN, IA rupture status was directly influenced by IA characteristic risk factors IA location and size as well as awareness of an IA in a first degree relative. In the ABN, patient's age at time of diagnosis additionally had direct influence on the rupture status with low probability of IA rupture for diagnosis at a later point in life (results not shown but available in the online repository [70]). While patient's awareness of hypertension and smoking behaviour were indirectly linked to rupture in both BNs, they were both unrelated with rupture according the 2 -test of independence (Fig. 3) and LR (Suppl. Fig. C3). Positive familial history significantly correlated with rupture in the correlation analyses ( Fig. 3 and Suppl. Suppl. Fig. B2) and LR (Suppl. Fig. C3); and was directly linked to rupture in both BN models. Awareness of familial cases of IA decreased the odds of rupture ( = −.82, = .22). BN analysis found increased odds for IA rupture with larger size ( = .42, = .09) and in high-risk locations ( = .18, = .13) compared to medium ( = −.75, = .14) or low-risk locations ( = −2.9, = .35). Both BNs showed multiplicity and positive familial history to be the factors associated with IA size and suggested awareness about hypertension as possible cause for multiple IAs. The probability for having an IA located at high or medium-risk location or having multiple IAs were higher for patients aware of hypertension compared to those who were not, which is supported with standard correlation analyses (results not shown but available in the online repository [70]). A direct relation of IA multiplicity and size was found in both BNs with a tendency towards larger size of individual IAs which was concordant to correlation analyses. Patient's awareness of IA cases in relatives showed in both networks a direct relation to IA size with high odds for detection of small IA in patients with positive familial history compared to those being unaware (results not shown but available in the online repository [70]). Awareness of positive familial cases and IA location were found to be associated with low arc-strength (0.64) in the additive BN and found support as significant correlation in standard correlation analyses. Being aware of a positive familial history of IA showed increased odds for IAs at low-risk locations compared to patients being unaware ( = .34, = .18). IA locations were indirectly associated with IA size and patient's age at rupture in both BNs with no significant relation detected in 2 test of independence and partially significant correlations in Spearman's correlation analysis. In both standard correlation analyses significant influences of IA location and patient's sex were found. This relation was found to be not significant in the DBN, but significant in the ABN showing high odds of IAs at low ( = 1.00, = .25) or medium ( = .51, = .19) risk locations for females compared to males. Sampling from the DBN marginalised joint probability distribution supported this finding (results not shown but available in the online repository [70]). In other words, women showed a tendency for IAs at low-risk locations.
Smoking behaviour, IA size and age at rupture in ABN was detected as direct association between smoking and patients age at diagnosis and indirect between smoking and IA size, but the association was not significant in the DBN. Smoking, sex and age showed a relationship according the 2 test of independence and partially significant correlations in Spearman's correlation analysis. Patient's sex and smoking behaviour were in both BNs directly associated, with a non significant association to former smokers in the ABN ( = −.17, = .22, see Suppl. Table G2)) and with higher odds for females to be non smokers compared to males ( = .47, = .11). In both BNs, smoking status did not directly link to any other risk factor of IA rupture. Patient's sex was additionally found to be associated to patient's awareness of some form of hypertension in the additive BN with increased odds for hypertension in males compared to females (see Suppl. Table G2). It was shown in both BNs that awareness of hypertension leads to a higher age at rupture. The probability for no hypertension and IA diagnosis at young age was higher compared to the awareness of some form of hypertension and an early diagnosis of IA.

Discussion
In this study, the complex interdependencies between phenotypic risk factors for IA rupture via probabilistic graphical models were explored and quantified. BNs are able to learn interfactorial associations from data by incorporating expert knowledge.
To mitigate the information loss from discretising variables for DBN analysis, ABNs with mixed, non-discretised data were used. In ABNs patient age at diagnosis and log-transformed IA size were modelled as continuous variables, approximated with a Gaussian ( age = 2 free parameters) distribution. Additionally, it was shown that grouping variable levels, further reduces the model's degrees of freedom resulting in a denser, more stable network structure. The number of possible states of IA location was reduced by combining IA parent vessels into clinical meaningful rupture risk groups of approximately equal bin-size. An approximately equal number of possible states over all variables controls for a balanced penalty coefficient of the network score across all nodes. Overall, the data transformations benefitted SL for DBNs; especially in situations where ABNs are not feasible (e.g. >25 nodes). Furthermore, they allowed to compare and contrast ABNs and DBNs for the first time in their application in a real-world scenario and thereby highlight important decisions for the data preparation.
The findings from a single-centre data set were set in contrast to non-graphical, standard models and it was shown that BN facilitate knowledge discovery of this disease. For example, while patient's awareness of hypertension and smoking behaviour are indirectly linked to rupture in both BNs, they are both unrelated with rupture according the 2 -test of independence (Fig. 3) and LR (Suppl. Fig. C3). Only positive familial history significantly correlates with rupture in Spearman's correlation analysis (Suppl. Fig. 2). Straightforward interpretability is a major advantage of BNs and can help for clinical acceptance as decision making support system [82], while the lack thereof poses a major hurdle for e.g. deep learning classifiers [18,83]. DBNs and ABNs enable to quantify and evaluate the uncertainty of the relationship between variables.

Clinical implications
Understanding the effect of risk factors on IA rupture is of paramount importance to identify patients at risk, recommend measures on modifiable factors and eventually perform interventions on IAs. Estimating the risk of rupture of patients diagnosed with UIAs has been so far based on the observation of patients diagnosed with IA and selected for observation or refusing intervention comparing differences between cases that never encountered an IA rupture and those that did. The approach suffers from a severe case selection bias. The PHASES model derived from those studies was developed, assuming independence among risk factors. In this study, BNs showed that only a few factors, i.e. IA location, IA size, familial disease history and patient age at diagnosis, are directly associated with IA rupture. All other factors indirectly influence the risk of rupture by interacting with IA location or IA size. IA location is influenced by patient sex, females being more likely affected by IA in low-risk locations, and patients aware of suffering high blood pressure were more likely to be diagnosed with IA in low-risk locations. IA size at diagnosis is influenced by a positive familial history of the disease, increasing the odds of diagnosis at a small size. IA multiplicity also increases the odds of IA diagnosis at a smaller size. Awareness of high blood pressure is associated with higher odds of being diagnosed with multiple IAs and IAs in low-risk locations. Concomitantly, awareness of high blood pressure is influenced by the sex and age of the patient and indirectly by the patients' smoking status. The complexity of risk factor interdependencies may contribute to difficulty and controversies regarding the estimation of the risk of IA rupture.

Limitations
Even though we could present the application of an explainable decision support model, there are still major limitations for generalisation. The DAG which represents the underlying BN proposes directions for causal relationships which must be interpreted carefully. For causal inference, the absence of unmeasured hidden confounding or selection variables must be ensured and the joint distribution must satisfy all conditional independencies which are encoded in the DAG with no further relationship required to satisfy this condition [84]. These are strong assumptions and difficult to satisfy for real-world data [85] which makes it a highly relevant research area [86][87][88]. The strong restrictions we enforced on the arc presence and direction when learning the DAGs (Fig. 2) not only reduced the model search space but also ensured definite arc direction in most cases.
The model so far describes the late stage of the disease as having the largest impact on patients. The role of factors inducing the disease and disease progression requires specific studies to be performed that will extend and improve the current study. The design of this study attempts to reduce this bias in the disease model by comparing a cross-sectional observation at the time of diagnosis of two well-defined populations [71]. Nevertheless and despite consecutive populationbased recruitment, it is expected that the studied samples do not represent the whole population at risk and the entire population that suffered from IA rupture. Firstly, the cohort of patients diagnosed with unruptured IA is biased regarding the diagnosis of unruptured IAs because there is no systematic population screening of IA. For example, some patients with a positive family history of the disease are often but not always screened at a young age; people suffering migraine, headaches, vertigo, or tinnitus are more prone to receive head imaging. Secondly, patients with small and very small aneurysms are likely to be underreported either because of small lesions that are more difficult to detect or because radiologists' balance regarding the impact of diagnosis on the patient quality of life and benefit of managing the disease is considered ethically in favour of omitting diagnosis. Thirdly, cases with high or moderate risk lesions are actively removed from the pool of atrisk patients, therefore artificially reducing the frequency of moderate and high-risk IA rupture. Patients with moderate or high-risk lesions or advised, and most do undergo interventions to exclude the IA, prevent the IA from rupturing. Finally, patients suffering from an extremely severe clinical condition after IA rupture, old or disabled patients or patients with advanced directives may only benefit from palliative care and may be denied investigations allowing the diagnosis of IA and SAH. To overcome the issue of those four biases, applying three corrective factors can help to correct them: (1) accounting for the probability of diagnosis under-reporting to better estimate the population at risk to be eventually assessed with a prospective population-based screening; (2) considering the probability of having the IA excluded by an intervention to assess better the population remaining at risk after diagnosis; (3) assessing the probability of palliative care with a dedicated study to better estimate the population of patients not diagnosed after an IA rupture because care and interventions are considered futile.
The current model is based on a selection of surrogate markers relevant for disease assessment being clinically widely accessible, uniformly and reliably measured. It is obviously missing important factors (i.e. hemodynamic, genetics and morphological features) and we cannot exclude potential hidden confounders (e.g. [89]). Due to the limited availability of data that satisfies methodological requirements so far, only data from a single cohort was used and limits the generalisation of findings. It could be considered that the data set lacks diverse geographical populations like e.g. Japan and Finland with distinct rupture rates [12]. This limitation applies to most clinical studies of IAs including the latest developed and validated IA predictive models [90][91][92]. The difficulty to access clinical data was also an issue for the evaluation of the widely accepted PHASES score developed on a multi-centre data set [12].
Another limitation in common with the PHASES score is the widely accepted analysis of IAs on patient-level. Although it is widely accepted that the risk of rupture is determined by the most at risk lesion, it is expected that there is some inter-operator disagreement regarding the selection of the lesion which is most at risk (e.g. [93,94]). It was conjectured that IA size changes as a result of rupture [95][96][97]. While this cannot be excluded in general, several studies have suggested that for the majority of ruptured cases this does not apply [98][99][100]. Modelling rupture risk assessment on a IA-level will allow a precise targeted decision-support. We therefore highly recommend future studies and models to be designed not only at the patient-level but also at the IA-level.
Although the available data was limited, classic multivariable regression modelling is in accordance to findings from past studies [3,15,73,101] and correlation analyses correspond with cohort knowledge affirming our cohort to be representative overall.

Outlook
This study considered cross-sectional data on patient-level, but for a general statement about the dynamic of IA susceptibility, development and rupture, longitudinal and multi-dimensional data on aneurysmlevel including healthy references would be required [75,102].
Therefore, the call from Etminan et al. [11] for improved data as necessity to increase the precision and robustness of future decisionmaking models is strongly supported.

Conclusion
In this paper, additive and discrete BNs were learned on patientlevel, phenotypic IA rupture risk factor data and compared to results from standard statistical analyses. Currently, clinicians rely on risk prediction scores from classic statistical regression combined with expertise, where it is challenging to consider interdependent multifactorial relationships. Even though BNs are a promising tool for decision support, it is not straightforward to assess their clinical value. This study summarises the methodological background of two different BN formulations and provides insights into how IA rupture risk factors are associated, leading to IA rupture. The potential of BNs to increase the holistic understanding of IA disease was shown by comparing them to standard statistical methodologies. The generalisation of the results needs further research that requires improved data quality and structure of clinical health records.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.