Stochastic Modeling of Inhomogeneities in the Aortic Wall and Uncertainty Quantification using a Bayesian Encoder-Decoder Surrogate

Inhomogeneities in the aortic wall can lead to localized stress accumulations, possibly initiating dissection. In many cases, a dissection results from pathological changes such as fragmentation or loss of elastic fibers. However, it has been shown that even the healthy aortic wall has an inherently heterogeneous microstructure. Some parts of the aorta are particularly susceptible to the development of inhomogeneities due to pathological changes; however, their distribution in the aortic wall and their spatial extent, such as size, shape, and type, are difficult to predict. Motivated by this observation, we describe the heterogeneous distribution of elastic fiber degradation in the dissected aortic wall using a stochastic constitutive model. For this purpose, random field realizations, which model the stochastic distribution of degraded elastic fibers, are generated over a non-equidistant grid. The random field then serves as input for a uniaxial extension test of the pathological aortic wall, solved with the finite-element (FE) method. To include the microstructure of the dissected aortic wall, a constitutive model developed in a previous study is applied, which also includes an approach to model the degradation of interlamellar elastic fibers. Then, to assess the uncertainty in the output stress distribution due to this stochastic constitutive model, a convolutional neural network, specifically a Bayesian encoder-decoder, is used as a surrogate model that maps the random input fields to the output stress distributions obtained from the FE analysis. The results show that the neural network is able to predict the stress distribution of the FE analysis while significantly reducing the computational time. In addition, it provides the probability of exceeding critical stresses within the aortic wall, which could allow for the prediction of delamination or fatal rupture.


Introduction
Material inhomogeneities can have a considerable influence on the mechanical behavior of the aorta. When investigating healthy aortas, spatial and temporal variations in the mechanical behavior are often observed, which can be explained by different factors such as age, lifestyle or gender [1][2][3]. Observations range from sharp strain localization under uniaxial loading [4] to large variations in the stress-stretch response of tissue samples taken from adjacent sites under uniaxial or biaxial loading [3]. In particular, inhomogeneities may be associated with pathological changes in the aortic wall, which are often remodeling processes triggered by hemodynamic changes [5], especially by altered wall shear stresses. Pathological changes are usually preceded by altered stress distributions in the aortic wall or inflammatory processes [6]. This then leads to spatial and temporal variations in the fraction of individual aortic constituents, such as collagen fibers, elastic fibers, smooth muscle cells or the ground substance [7,8].
Material inhomogeneities play a major role in the initiation and progression of an aortic dissection. Defects in the microstructure of the aortic wall can potentially lead to localized stress accumulations that induce aortic wall delamination. In particular, glycosaminoglycans (GAGs) may play an important role in the pathology of aortic dissection, as hypothesized by Humphrey [9]. The accumulation of GAGs between the elastic lamellae may cause a swelling pressure that leads to separation of the elastic lamellae, which can lead to delamination of the aortic wall. However, not only local accumulation of GAGs is associated with aortic dissection, but also apoptosis and smooth muscle cell dysfunction, remodeling of collagen fibers, and fragmentation or loss of elastic fibers [9]. Moreover, these pathological alterations do not occur individually. Their onset is often correlated, which might be due to the highly interconnected microstructure of the aorta. For example, the swelling pressure caused by pooled GAGs results in degradation of interlamellar, radially-oriented elastic fibers. Elastic fibers, in turn, connect smooth muscle cells to the elastic lamellae. As a result of elastic fiber degradation, smooth muscle cells lose their microstructural integrity, leading to apoptosis and dysfunction [10]. However, the exact mechanisms are not yet fully understood.
As presented in Fig. 1, histological studies show that the pathological alterations described above are rather localized. Their shape and size also vary significantly. While pooled GAGs often occur in spherical shapes of different sizes [11,12], as shown in Figs. 1(a) and (b), apoptosis and smooth muscle cell dysfunction are usually found in patches or in a band-like fashion [13]. Similarly, studies have demonstrated a band-like collapse of elastic lamellae and patchy loss of elastic fibers in the aortic wall, as illustrated in Figs. 1(c) and (d) (adapted from [14] and Halushka et al. [13]), respectively. A connection between apoptosis and dysfunction of smooth muscle cells and the degradation of elastic fibers cannot therefore be excluded. The influence of these local phenomena on the stress distribution is still unclear and challenging to model, since experiments do not yet provide reliable data on the global distribution of pathological changes in the aorta. In most cases, histological or immunohistochemical methods only focus on local phenomena. However, as shown in computational studies, these changes have the potential to significantly affect the local stress distribution in the aortic tissue.
In regard to aortic dissection, there are only a few computational studies examining local inhomogeneities in the aortic wall [15]. In two consecutive studies, Roccabianca et al. [16,17] investigated the effect of local inhomogeneities in the dissected aorta by incorporating an inclusion of GAGs into a rectangular slab of the media. The size and shape of the GAG inclusion were based on experiments. In addition, the authors assigned only a low tensile stiffness to the inclusion, and osmotic loading of the media and inclusion was built in. After implementing the model in a finite-element (FE) analysis software, they found that circumferential and axial stresses were locally increased around the periphery of the inclusion. Based on these findings, Ahmadzadeh et al. [18,19] later used a smooth particle approach to study the potential role of pooled GAGs in the initiation and progression of intralamellar delamination by modeling the coalescence and growth of GAGs in the dissected aortic wall. The computational results showed significant intramural stress concentrations and a transition from normal compressive to unusual tensile stress in the radial direction near the tip of a GAG pool, consistent with the results of Roccabianca et al. [16,17]. Both approaches provide evidence that local inhomogeneities, such as the accumulation of GAGs, can lead to local stress concentrations, potentially resulting in intralamellar delamination. However, for reasons of computational efficiency, these model approaches are not appropriate for investigating more general boundary-value problems related to healthy or dissected aortas. Also, the importance of size, shape, and location in the aorta has never been modeled or studied.
There are practically no experimental studies on the spatial and temporal distribution of material inhomogeneities over large sections of the aorta, neither in the healthy nor in the diseased case. Therefore, a stochastic model approach is necessary to investigate the effects of heterogeneities in the aortic wall on the stress distribution. In particular, the application of random fields [20,21] is an obvious possibility.
The lack of experimental data about the inherent physiological variability and additionally the limited knowledge about, e.g., in vivo boundary conditions leads to uncertainties in the computational model that are difficult to quantify. By treating the model input as stochastic, the outcome is automatically uncertain as well. In this work, specific material parameters of a computational model are described by a random field. As a result, the computational model can no longer be considered deterministic and is therefore quite meaningless without propagating the uncertainties through the model. Instead, a probabilistic description of the uncertain model parameters is required, resulting in a stochastic problem that needs to be defined and solved.
Random fields allow a material parameter to be treated as a collection of random variables, one at each point in the domain, meaning that the parameter takes on a stochastic value that differs from point to point. In real aortic tissue, adjacent points are not statistically independent: if the parameter takes a certain value at one point, then it should statistically have similar, albeit not identical, values at adjacent points. This similarity can be described and controlled by the correlation between any two points. Besides that, random fields can also be constructed to obey certain statistics, e.g., a Gaussian or uniform distribution. If the modeled parameter is known to lie within a fixed range over the entire domain, a beta distribution or a uniform distribution is a reasonable choice. Random fields offer a compelling way to model a material parameter with all of these properties: stochastic, spatially inhomogeneous, spatially correlated and possibly bounded. In this work, we construct a random field to model a material parameter that has all these properties.
Apart from random fields, another stochastic approach to modeling heterogeneous materials is based on stochastic volume elements and homogenization. This multiscale approach involves statistical volume elements of the microstructure to obtain the macroscopic and homogenized response of a material by volumetric averaging [22,23], which builds on the general concept of representative volume elements describing the unit cell in a periodic microstructure [24,25].
Random fields offer a convincing alternative if the macroscale is to be modeled heterogeneously and the stochastic properties of the random field can be derived from experimental data [4] or the statistics of microscale simulations, e.g., with statistical volume elements [26][27][28][29]. This is particularly appealing because random fields then allow, in principle, to identify a parametrized form of the spatial correlation structure of a material [30], and so a physical law that can then be used for predictions. This is in contrast to purely statistical volume elements, where a nonparametric spatially correlated structure can be estimated [31,32]. Random fields and statistical volume elements are therefore complementary methods [26]. In this work, we focus on modeling the macroscopic scale with random fields alone. Random fields enable the inclusion of prior information and can seamlessly and consistently be integrated into the uncertainty propagation by applying Bayesian probability theory [33,34]. Since much of the random field theory is based on stochastic processes, especially Gaussian processes, a large amount of advanced mathematics is available.
While stochastic constitutive modeling has recently attracted increasing attention, so has the problem of quantifying uncertainties of model output, particularly in the field of biomedical engineering [35]. The aim of this study is therefore to model inhomogeneities in the dissected aortic wall using a stochastic approach. More specifically, a beta random field with the special case of a uniform distribution is used to describe the stochastic, spatially distributed, and spatially correlated degradation of interlamellar elastic fibers by a degradation parameter based on the previous constraint that the degradation parameter is bounded. Due to the treatment of the constitutive model input as stochastic, the outcome of the FE analysis, more precisely the stress distribution, is also subject to uncertainties. It is therefore important to quantify these uncertainties [36]. This endeavor implies the need to perform FE simulations for a large number of random field realizations. Since this is not computationally feasible for FE models with many degrees of freedom, a surrogate model is used.
Traditional surrogate models for uncertainty quantification (UQ), such as polynomial chaos expansions [37,38] or Gaussian process regression [30,39], are difficult to construct for the dependent variables of random fields. They also suffer from the curse of dimensionality, i.e. they scale unfavorably with the number of stochastic variables. This severely limits the spatial resolution of the stochastic constitutive model. In contrast, neural networks (NN), used as surrogate models, do not suffer from this particular shortcoming and practically increase the possibilities for uncertainty propagation of stochastic models. In addition, deep and convolutional neural networks (CNN) have proven to be able to capture spatial correlation information [40]. Therefore, in this study, a NN is trained on a comparatively small number of FE simulations and used as a surrogate model. To perform UQ of the stress distributions, a Bayesian deep convolutional encoder-decoder, as proposed by Zhu et al. [41], is applied, which is known to learn information about spatial correlation within data. We use the property that both the input random field and the output stress distribution have the same grid-like structure. Once the NN has learned the simulation output as a function of the random field, it can approximately predict the outcome of the FE simulations with drastically reduced computation time. Related literature [42] shows that it is possible to learn an accurate deep learning-based surrogate for the stress distribution on realistic domains of the aorta as a function of geometry, but not for a constitutive model. Liu et al. [43] used a machine learning-based surrogate model to directly compute an aortic wall failure metric, again as a function of geometrical parameters and for homogeneous material parameters. In contrast, He et al. [44] estimated the risk of thoracic aortic aneurysm rupture using different machine learning models trained with experimental in vitro data. The combination of the experimental data-based approach in [44] with the probabilistic modeling in this study, in conjunction with Bayesian model comparison [45], could potentially lead to the identification of a parametrized, stochastic model for the spatial correlation structure of tissue inhomogeneities.
The present study is structured as follows. In Section 2, the stochastic constitutive model framework is introduced. In this context, the constitutive model of the aorta is introduced, which describes the behavior of the aortic constituents including the degradation of interlamellar elastic fibers by introducing a degradation parameter. Next, a beta random field of the spatially distributed degradation parameter is generated, sampled and then applied to a uniaxial extension test of the aortic wall, which is solved with the FE method. To quantify the uncertainties in the computational model, Section 3 then sketches a convolutional network as a surrogate model, in particular a Bayesian encoder-decoder, that learns to predict the stress distribution of the FE analysis using the random field as input. Subsequently, the uncertainties of the NN are assessed, and the results obtained from the UQ of the computational model are presented in Section 4.
Finally, Section 5 summarizes and discusses the study presented, also with a view to future work.

Stochastic constitutive model framework
In this section, the constitutive model framework is presented in a stochastic manner. The constitutive model describes the behavior of the aortic wall by explicitly including its constituents, namely collagen fibers, elastic fibers and the ground substance. In addition, it models the degradation of interlamellar elastic fibers during an aortic dissection. A stochastic description of the degradation parameter is then attempted using a random field [21,46]. To do this, we construct the degradation parameter as a beta random field from an auxiliary Gaussian random field. In fact, we generate samples of Gaussian random fields and then map them to beta random fields. Finally, the entire framework is applied to a boundary-value problem.

Constitutive model framework
In the healthy media, elastic laminae are interconnected by interlamellar elastic fibers that are primarily radially oriented. As previously discussed, there are several pathological findings associated with aortic dissection, one of which is the local accumulation of GAGs in the media which leads to a swelling pressure between the elastic laminae. This swelling pressure is related to the rupture of elastic fibers in the radial direction, which correlates with the observation that interlamellar elastic fibers are often degraded in aortic dissection, as shown in Fig. 1. The approach of Rolf-Pissarczyk et al. [47], as recapitulated below, assumes that elastic laminae and interlamellar elastic fibers can be accounted for by a dispersion of elastic fibers, while interlamellar elastic fibers are assumed to be symmetrically dispersed in the lamellar unit of the media. Diseased or degraded elastic fibers are then excluded.
We first introduce the deformation gradient F relative to a predefined reference configuration. Considering an incompressible material, we require that the determinant of F, known as the Jacobian J, is equal to unity, i.e. det F ≡ 1. For this model we can now decouple F into a volumetric (dilatational) part J^(1/3) I and an isochoric (distortional) part F̄ = J^(−1/3) F, where I is the second-order unit tensor. The right Cauchy-Green tensor C = Fᵀ F is the basic kinematic variable formulated in the reference configuration, together with its modified counterpart C̄ = F̄ᵀ F̄ and the corresponding first invariants I_1 = tr C and Ī_1 = tr C̄.
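As a small illustration of these kinematic quantities (not part of the constitutive implementation in FEAP), the following NumPy sketch computes J, the modified right Cauchy-Green tensor and the first invariants for a given deformation gradient; the uniaxial stretch value is arbitrary.

```python
import numpy as np

def kinematics(F):
    """Volumetric-isochoric split of the deformation gradient F (3x3)."""
    J = np.linalg.det(F)                 # Jacobian, J = det F (= 1 for incompressibility)
    F_bar = J ** (-1.0 / 3.0) * F        # isochoric (distortional) part
    C = F.T @ F                          # right Cauchy-Green tensor
    C_bar = F_bar.T @ F_bar              # modified right Cauchy-Green tensor
    I1, I1_bar = np.trace(C), np.trace(C_bar)  # first invariants
    return J, C_bar, I1, I1_bar

# Example: uniaxial stretch of 1.4 in the radial (E3) direction of an
# incompressible material (transverse stretches 1/sqrt(1.4)).
lam = 1.4
F = np.diag([lam ** -0.5, lam ** -0.5, lam])
print(kinematics(F))
```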
The direction of a fiber in the reference configuration, denoted by the unit vector N, is given by N(Θ, Φ) = sin Θ cos Φ E_1 + sin Θ sin Φ E_2 + cos Θ E_3, where E_i, i = 1, 2, 3, are the Cartesian unit basis vectors, while Θ and Φ are the polar and azimuth angles, respectively. We further define that the unit vector N(Θ, Φ) lies on the unit hemisphere; because of symmetry, only half of the unit sphere needs to be considered. Then we discretize the unit hemisphere into a finite number of elementary areas ∆S_n, n = 1, . . . , m, more precisely spherical triangles.
By assuming a hyperelastic material, we now introduce the strain-energy function Ψ in the decoupled form Ψ = Ψ_vol(J) + Ψ_iso, where Ψ_vol and Ψ_iso represent the purely volumetric and isochoric parts, respectively [48]. The volumetric part is defined as a function of the Jacobian J alone, and the isochoric part can be further decomposed into Ψ_iso = Ψ_g + Ψ_c + Ψ_e, where Ψ_g represents the ground substance modeled by a neo-Hookean model, and Ψ_c and Ψ_e represent the energies stored in the collagen and elastic fibers, respectively.
To formulate the strain-energy function of elastic fibers in terms of the discrete fiber dispersion (DFD) method [49], we can write

Ψ_e = Σ_{n=1}^{m} ρ_en Ψ_en(Ī_4en),  (5)

where ρ_en defines the discrete density of a fiber, Ψ_en(Ī_4en) is the single-fiber strain energy given by a general fiber model [50], and Ī_4en = C̄ : N_n ⊗ N_n. The choice of (5) must ensure the conditions Ψ_en(1) = Ψ′_en(1) = 0. After discretizing the unit hemisphere into m elementary areas, the discrete density ρ_en of elastic fibers is obtained by integrating the fiber density over the elementary area ∆S_n. In addition, we must satisfy the normalization condition, which by definition is satisfied by the choice of the distribution function; for the discrete approach this reads Σ_{n=1}^{m} ρ_en = 1.
As proposed by Rolf-Pissarczyk et al. [47], we now introduce a degradation parameter ξ to describe the degradation of elastic fibers as a result of separated elastic laminae, where ξ = 0 is associated with a healthy tissue and ξ = 1 with a completely diseased (damaged/degraded) tissue, analogous to continuum damage theory [48]. Then, to exclude degraded elastic fibers from the total strain-energy function, a degradation or critical fiber angle is defined as Θ_ξ = πξ/2. We therefore distinguish between fibers whose orientation lies within the critical angle Θ_ξ of the radial direction, which are excluded, and the remaining fibers, whose contribution is given by f_e, where f_e represents the mathematical expression of the strain-energy function of a single elastic fiber, while Ī_4en = C̄ : N_n ⊗ N_n. Since the degradation of elastic fibers initiates from the radial direction, where the highest stretch occurs and the vulnerability to rupture is greatest, radially-oriented elastic fibers are excluded first. This results in a reduced delamination strength.
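The following Python sketch illustrates the DFD sum (5) with the critical angle Θ_ξ = πξ/2; the exponential single-fiber energy, the material parameters and the tension-only switch are illustrative placeholders and not the constitutive functions used in the paper.

```python
import numpy as np

def psi_elastic_dfd(C_bar, directions, rho, xi, k1=1.0, k2=1.0):
    """Discrete fiber dispersion (DFD) sum for elastic fibers with degradation.

    directions : (m, 3) unit fiber vectors N_n on the unit hemisphere
    rho        : (m,) discrete fiber densities, sum(rho) = 1
    xi         : degradation parameter in [0, 1]
    The single-fiber energy f_e and the exclusion rule used here are
    illustrative placeholders, not the constitutive law of the paper.
    """
    theta_xi = np.pi * xi / 2.0                       # critical degradation angle
    e_r = np.array([0.0, 0.0, 1.0])                   # radial direction E_3 = E_R
    psi = 0.0
    for n in range(len(rho)):
        N = directions[n]
        # polar angle of the fiber measured from the radial direction
        theta_n = np.arccos(np.clip(abs(N @ e_r), -1.0, 1.0))
        if theta_n < theta_xi:
            continue                                   # degraded fiber: excluded
        I4 = N @ (C_bar @ N)                           # squared fiber stretch
        if I4 > 1.0:                                   # fibers support tension only
            # placeholder exponential fiber energy with f_e(1) = f_e'(1) = 0
            psi += rho[n] * k1 / (2.0 * k2) * (np.exp(k2 * (I4 - 1.0) ** 2) - 1.0)
    return psi
```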
The isochoric part of the strain-energy function then consists of the contributions of the ground substance, the collagen fibers and the remaining (non-degraded) elastic fibers, where the strain-energy function of collagen fibers is formulated analogously to (5), i.e. within the framework of the DFD method, by using an exponential approach to model the stiffening of collagen fibers [51]. In order to implement the constitutive model framework in an FE analysis software, the Cauchy stress tensor and the elasticity tensor need to be formulated. In this context, reference is made to the study of Rolf-Pissarczyk et al. [47].

Gaussian random fields
A random field F(Ω 0 ) is a function that takes on a random value at every point in the reference domain X ∈ Ω 0 , X = (X 1 , X 2 , X 3 ) T . It is also sometimes thought of as a stochastic process, but the coordinates are usually spatial and continuous rather than temporal and discrete.
It can be understood colloquially as the generalization of a multivariate random variable, which is a finite collection of random variables, to an infinite collection of random variables. A random field is said to be Gaussian if every marginal probability distribution is Gaussian, i.e. every finite subset of the infinite collection of random variables follows a (multivariate) Gaussian distribution. A Gaussian random field can be completely described by its mean function µ(X) and a positive semi-definite covariance function k(X, X′) between any two points X ∈ Ω_0 and X′ ∈ Ω_0.
We now define an auxiliary Gaussian random field F, from which a beta random field for the degradation ξ, as introduced in (8), is later constructed, i.e.

F(Ω_0) ∼ GP(µ(X), k(X, X′)),  (11)

meaning that F is distributed according to a Gaussian process GP. Assuming a smooth Gaussian random field, i.e. that the magnitude of the local fluctuations of the degradation parameter ξ between two 'neighboring' points in the domain should be small, a suitable covariance function is the so-called squared exponential

k(X, X′) = ς² exp( −‖X − X′‖² / (2ι²) ),  (12)

where ς² and ι are model parameters that define the magnitude of the variation and the length scale of the correlation, respectively. If X = X′ then ς² is the variance. The value of ι specifies the neighborhood; in the following we denote ι as the correlation length. Note that the smoothness is defined by the shape of the covariance function, hence the choice of the covariance function significantly affects the spatial correlation structure and behavior of the random field realizations. One may prefer to choose a different covariance function, e.g., from the more general family of Matérn covariance functions with a smoothness parameter [30]. On the other hand, the mean function µ is not of particular importance at this point, since the Gaussian random field is later transformed and re-scaled into a beta random field. Hence, we can set µ ≡ 0 without loss of generality. Note that this model is stationary, i.e. the value of the covariance function only depends on the distance between any two points, X − X′, and does not change upon translation of the points in space. We will later exploit this fact for a fast computation of random field realizations. Non-stationary models usually require more advanced methods [52,53].
Moreover, the presented model can easily be extended to spatially anisotropic correlation structures by defining corresponding anisotropic covariance functions of X = (X_1, X_2, X_3)ᵀ, e.g., with a separate correlation length for each coordinate direction. Let X̂ = {X^(1), . . . , X^(N_x)}, with X^(n_x) = (X_1, X_2, X_3)ᵀ, now be the set of coordinates of the nodes for the discretization of the computational domain Ω_0, i.e. the reference configuration. Then, f(X̂) is a finite subset of the infinite collection of random variables implied by F(Ω_0), X̂ ⊂ Ω_0. Note the difference between F(Ω_0) as an infinite collection of random variables at all locations in the domain X ∈ Ω_0, and f(X̂) as a finite collection of random variables at the nodes of the domain discretization X ∈ X̂, X̂ ⊂ Ω_0. Both the random field F(Ω_0) and the spatial discretization of the random field f(X̂) are simply referred to as the random field subsequently, and we clarify this difference where necessary.
According to the definition of a Gaussian random field, the joint probability density function (PDF) p for the random field values f at all X^(n_x) ∈ X̂ must be a multivariate Gaussian given by

p(f | X̂) = N(f | µ, K),  (13)

where N represents the multivariate normal distribution with the mean µ = 0 and the covariance matrix K of size N_x × N_x, with N_x given by the number of nodes. The components K_uv of K are determined by the chosen covariance function, K_uv = k(X^(u), X^(v)). We now draw samples from this distribution. In other words, we generate realizations of random fields that satisfy the statistics defined by Eq. (13).

Sampling Gaussian random fields
We now draw samples f^(j) from the distribution (13). Note the difference between the random field f and samples f^(j) of the random field, also known as realizations, generated from the random field's PDF (13). In principle, a sample f^(j) can be obtained from (13) by a Cholesky decomposition of the covariance matrix [54], i.e.

f^(j) = µ + L z = L z,  (14)

where L is the Cholesky factor of the covariance matrix K, i.e. K = L Lᵀ, while z is a vector with dimension equal to the number of nodes N_x. The components of z are independent, identically distributed random numbers z_nx drawn from the univariate standard normal distribution, i.e. z_nx ∼ N(0, 1).
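As a minimal illustration of (12)-(14), the following NumPy sketch builds a squared-exponential covariance matrix on a grid of nodes and draws one Gaussian random field realization via the Cholesky factor; the kernel convention (length scale in the denominator 2ι²) and the jitter term are assumptions of this sketch.

```python
import numpy as np

def se_covariance(X, var, length):
    """Squared-exponential covariance matrix for nodes X of shape (N_x, d)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    return var * np.exp(-d2 / (2.0 * length ** 2))

def sample_gaussian_field_cholesky(X, var, length, rng, jitter=1e-10):
    """Draw one zero-mean Gaussian random field sample at the nodes X."""
    K = se_covariance(X, var, length)
    L = np.linalg.cholesky(K + jitter * np.eye(len(X)))  # O(N_x^3)
    z = rng.standard_normal(len(X))                       # i.i.d. standard normal
    return L @ z                                          # f = L z, Eq. (14)

# Example: a 20 x 20 grid on the unit square with the parameters quoted later
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.array([[xi, yi] for xi in x for yi in x])
f = sample_gaussian_field_cholesky(X, var=0.173, length=np.sqrt(2) / 3, rng=rng)
```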
However, the computational effort of the Cholesky decomposition scales with O(N_x³) [55] and can therefore be impractical for large or complex domains that require a fine discretization, which would lead to a large number of nodes N_x and consequently to a large computational effort. In general, there are a number of methods for generating realizations of Gaussian random fields [55,56], e.g., (i) spectral methods such as Fourier or Karhunen-Loève expansions [57][58][59], (ii) methods based on the solution of a corresponding stochastic (partial) differential equation [21,53,60], or (iii) methods based on iterative procedures and polynomials [61,62]. Although the latter two methods are more general and can also be applied to more complex geometries, domain discretizations, or non-stationary covariance functions, they are also usually slower.
Here we exploit the properties of the simple domain, the discretization (regular grid) and the covariance function (stationarity) to apply the comparatively fast spectral representation method as proposed in [57,58]. Furthermore, we neglect spatial fluctuations in the circumferential direction of the aorta in order to adopt the procedure for only two spatial dimensions.
For the spectral representation method, we need the Fourier transform of the covariance function, known as the power spectral density S. Using the stationarity of the covariance function, i.e. k(X, X′) = k(X − X′), S can be calculated using the Wiener-Khinchin theorem [58] as

S(ω) = (2π)⁻² ∫ k(X̃) exp(−i ω · X̃) dX̃,  (16)

where X̃ = X − X′ and ω = (ω_1, ω_2)ᵀ. We may then generate samples f^(j) as follows,

f^(j)(X) = √2 Σ_{n_1=1}^{N_1} Σ_{n_2=1}^{N_2} [ A_{n_1,n_2} cos(n_1∆ω_1 X_1 + n_2∆ω_2 X_2 + Λ^(1)_{n_1 n_2}) + A_{n_1,n_2} cos(n_1∆ω_1 X_1 − n_2∆ω_2 X_2 + Λ^(2)_{n_1 n_2}) ],  (17)

with A_{n_1,n_2} = √(2 S(n_1∆ω_1, n_2∆ω_2) ∆ω_1 ∆ω_2), where X_1 and X_2 denote the coordinates, while ω_max,i is a cut-off frequency above which S is assumed to be approximately zero. The cut-off frequency ω_max,i together with N_i defines the discretization of ω_i into n_i = 1, . . . , N_i pivot points in equidistant steps ∆ω_i. Finally, Λ^(i)_{n_1 n_2} are the random phase angles, independently and uniformly distributed in the interval [0, 2π). Drawing samples from Eq. (11), i.e. generating random field realizations, then amounts to choosing a discretization of the Fourier domain, drawing uniform random phase angles and evaluating (17). Note that to compute a sample that describes a set of values at all locations f^(j)(X̂), one has to evaluate f^(j)(X) in (17) for all coordinates X ∈ X̂ while all other values are fixed, i.e. for fixed random phase angles Λ^(i)_{n_1 n_2} and fixed Fourier domain discretization. In addition, the sampling algorithm only has to calculate the kernel spectrum S once. At this point, we do not restrict ourselves to an equidistant discretization of ω, because Eq. (17) is the approximation of an integral with a discrete sum, more precisely a Riemann sum. If one chooses an equidistant discretization of space and Fourier domain, the evaluation of (17) can be significantly sped up [55,57,63]. According to [64], samples can then be computed via the fast Fourier transform Y applied to the kernel spectrum weighted by a vector z of samples from independent standard normal distributions, analogous to (14), see Eq. (19).
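A possible NumPy implementation of the cosine-series sampling in the spirit of Eq. (17) is sketched below; it assumes the analytical power spectral density of the squared-exponential kernel and a simple equidistant frequency discretization, so normalization conventions and cut-off choices may differ from those used in the paper.

```python
import numpy as np

def spectral_sample_2d(X1, X2, var, length, N=64, omega_max=40.0, rng=None):
    """One 2D Gaussian random field sample via the spectral representation.

    Cosine-series sketch in the spirit of Eq. (17); the analytical power
    spectral density of the squared-exponential kernel is assumed.
    """
    rng = np.random.default_rng() if rng is None else rng
    dw = omega_max / N
    w = (np.arange(N) + 1) * dw                               # pivot frequencies
    W1, W2 = np.meshgrid(w, w, indexing="ij")
    # power spectral density of the squared-exponential covariance (2D)
    S = var * length ** 2 / (2.0 * np.pi) * np.exp(-0.5 * length ** 2 * (W1 ** 2 + W2 ** 2))
    A = np.sqrt(2.0 * S * dw * dw)                            # amplitudes
    P1 = rng.uniform(0.0, 2.0 * np.pi, size=(N, N))           # random phase angles
    P2 = rng.uniform(0.0, 2.0 * np.pi, size=(N, N))
    f = np.zeros_like(X1)
    for i in range(N):                                        # double cosine sum
        for j in range(N):
            f += A[i, j] * (np.cos(W1[i, j] * X1 + W2[i, j] * X2 + P1[i, j])
                            + np.cos(W1[i, j] * X1 - W2[i, j] * X2 + P2[i, j]))
    return np.sqrt(2.0) * f

# Example: evaluate the sample on a 20 x 20 grid of the unit square
g = np.linspace(0.0, 1.0, 20)
X1, X2 = np.meshgrid(g, g, indexing="ij")
field = spectral_sample_2d(X1, X2, var=0.173, length=np.sqrt(2) / 3,
                           rng=np.random.default_rng(1))
```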

Transforming Gaussian random fields to beta random fields
A Gaussian random field F can be transformed into a non-Gaussian random field R. Let C_F be the cumulative distribution function of F and C_R⁻¹ the inverse of the cumulative distribution function of R. Then R is obtained from the transformation R = C_R⁻¹(C_F(F)) [59,65,66]. In many cases this equation is difficult to solve, because C_F can be expensive to compute and C_R⁻¹ is often difficult to find. For the Gaussian random fields and the beta random field in this study, an analytical solution is available [67,68]. Numerical approximations might be required for other types of random field models [59,69,70].
In order to build intuition, we first consider a simple univariate case. If two independent random numbers f_1 and f_2 are drawn from a standard Gaussian distribution, then the sum of the squares of these numbers, g = f_1² + f_2², follows a gamma distribution, more precisely a chi-squared distribution (χ²-distribution) with two degrees of freedom as a special case of the gamma distribution. Given two independent random variables, g_1 and g_2, that each follow a gamma distribution, the combination β = g_1/(g_1 + g_2) follows a beta distribution, see Appendix A. The hyperparameters of the beta distribution can then be chosen in such a way that we obtain a uniform distribution. We now use an analogous result generalized to random fields [63,67,68,71] in order to construct a uniform random field from Gaussian random fields.
However, we choose here that the random fields are identically distributed yet still independent. For the sake of simplicity, we omit the superscripts j in the following, which indicate samples of the random field, as used analogously in Section 2.3. Then, gamma random fields g_s(X) are computed by summing the squares of 2s independent Gaussian random field samples,

g_s(X) = ½ Σ_{i=1}^{2s} f_i(X)².  (20)

Based on this, one is able to generate a gamma field sample from one sample each of at least two independent Gaussian random fields, f_1(X) and f_2(X). The correlation structure k_gs(X̃) of this gamma field follows from the stationary covariance function k(X̃) of the Gaussian field used in Eq. (17). Note that the gamma fields also contain exponential or χ²-distributions as special cases. With a set of samples from two independent gamma fields, g_s(X) and g_s′(X), each characterized by the same covariance function, it is possible to sample a beta random field β_s,s′(X) from

β_s,s′(X) = g_s(X) / ( g_s(X) + g_s′(X) ).  (22)

The univariate marginal PDF of (22) is a beta distribution, see Appendix A, given by p(β) = β^(s−1) (1 − β)^(s′−1) / B(s, s′), where B is the beta function, while β(X) is a univariate random variable at a single particular location X. This means that the random variable β is the value of the random field β(X) at location X and that β follows a beta distribution at all locations X. The correlation structure k_β_{s,s′} of this beta random field again follows from the stationary covariance function k(X̃) of the Gaussian field used in Eq. (17), provided that s + s′ > 1. For the special case s = s′ = 1 we get a uniform distribution in the interval [0, 1]. This means that by this procedure we obtain a non-Gaussian random field for which every marginal PDF is a uniform distribution. The degradation parameter ξ, as defined in (8), is bounded, a property correctly modeled by a beta random field. Due to the limited experimental data available, a uniform random field was chosen. With Eq. (22) we can now define the degradation parameter as ξ(X) := β_{1,1}(X). We have therefore found a way to draw samples from the desired probability distribution p(ξ | X̂) (26). As summarized in Algorithm 1, this is achieved by drawing samples of the auxiliary Gaussian random field (11) using the covariance function defined in (12) via the sampling scheme (17) or (19), respectively. The Gaussian random field samples are then input into (20), yielding samples of a gamma-type random field. Finally, samples of the gamma-type random field are used in (22), resulting in samples of a beta random field. Here, with s = s′ = 1, we need two Gaussian random field samples to calculate one gamma-type random field sample, and two such gamma-type random field samples to compute one beta random field sample ξ^(j) = β^(j)_{s,s′}, where the sample index j, as in Section 2.3, is re-introduced here. In the special case of the uniform field in (26), s = s′ = 1, strictly speaking, a total of four Gaussian random field samples is therefore required to calculate one beta random field sample. Anticipating the quantities introduced in Section 3, we refer to N_j = N_s in the context of the Monte Carlo integration and N_j = N_ξ in regard to the NN training data, see, e.g., Eqs. (32) and (68).

Algorithm 1 (sampling of degradation fields and corresponding stresses). For each sample j = 1, . . . , N_j: (i) generate independent Gaussian random field samples f^(j), see (17) or (19); (ii) compute gamma-type random field samples via (20); (iii) compute the beta random field sample ξ^(j) via (22); (iv) compute the Cauchy stresses Σ^(j) for the given ξ^(j) with the FE method or the NN surrogate.
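The mapping from Gaussian to gamma-type to uniform (beta) random fields for s = s′ = 1 can be sketched as follows; the sampler callable is assumed to return zero-mean, unit-variance Gaussian field samples, e.g. one of the samplers sketched above, and the white-noise stand-in below is only for a self-contained demonstration.

```python
import numpy as np

def uniform_degradation_field(sample_gaussian, rng):
    """Map four independent Gaussian field samples to a uniform field on [0, 1].

    sample_gaussian : callable returning one zero-mean, unit-variance Gaussian
                      random field sample. Sketch of Eqs. (20)-(22) with s = s' = 1.
    """
    f1, f2, f3, f4 = (sample_gaussian(rng) for _ in range(4))
    g1 = 0.5 * (f1 ** 2 + f2 ** 2)      # gamma-type field, shape s = 1
    g2 = 0.5 * (f3 ** 2 + f4 ** 2)      # second, independent gamma-type field
    xi = g1 / (g1 + g2)                  # beta field; uniform marginals for s = s' = 1
    return xi                            # degradation parameter field in [0, 1]

# Example: white-noise Gaussian fields on a 20 x 20 grid (stand-in for a
# spatially correlated sampler such as the ones sketched above)
rng = np.random.default_rng(2)
white = lambda rng: rng.standard_normal((20, 20))
xi = uniform_degradation_field(white, rng)      # values lie in [0, 1]
```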
Note that non-uniform beta fields can be modeled for general bounded parameters by choosing s and s′ accordingly. In addition, the gamma field mapping (20) and the beta field mapping (22) do not allow negative correlations. This limitation is not inherent to the respective fields, but to the particular mappings shown in (20) and (22).

Application to a boundary-value problem
The stochastic constitutive model is applied to a boundary-value problem, which is then solved using the FE method. More precisely, in the FE analysis program FEAP [72], a uniaxial extension test of an incompressible unit cube with dimensions 1×1×1 mm³ is performed, as shown in Fig. 2. The unit cube is aligned with the Cartesian unit basis vectors E_1, E_2 and E_3, and a uniform displacement of 0.4 mm is applied along the top face so that the loading direction coincides with the radial vector, i.e. E_3 = E_R. Here we define E_1 and E_2 as the circumferential and axial directions, respectively. We discretized the unit cube with 100 8-node hexahedral mixed Q1/P0 elements, ten elements in the E_2 and E_3 directions, respectively, and one element in the E_1 direction. The augmented Lagrangian method in FEAP [72] was applied to ensure incompressibility. To model the inherent microstructure of the aortic wall, two families of collagen fibers, an isotropic ground substance and one family of elastic fibers were defined. The mean fiber directions of the two collagen fiber families are defined in the (E_2-E_3) plane with a symmetric in-plane angle around E_2. In contrast, the mean fiber direction of the family of elastic fibers is aligned with the E_3-direction, so that the dispersion of elastic fibers realistically reflects both the elastic lamellae and the interlamellar elastic fibers. The local degradation of elastic fibers is then taken into account by changing the degradation parameter, resulting in a locally reduced delamination strength. The mechanical and structural parameters of the respective constituents agree with the computational study by Rolf-Pissarczyk et al. [47].
The input for the FE analysis are uniform random fields describing the local elastic fiber degradation, generated by the spectral method described in Section 2.3 and the subsequent transformation described in Section 2.4. The correlation length of the degradation parameter was chosen to be ι = √2/3 mm and the variance of the random field was chosen to be ς² = 0.173. To simplify the computational problem and save computational costs, we assume that the degradation parameter varies only in the (E_2-E_3) plane. Therefore, a two-dimensional random field was simulated, then duplicated, and finally the two layers were stacked back-to-back for the three-dimensional unit cube. A three-dimensional FE simulation was then carried out with a two-dimensional random field, as shown in Fig. 2. In addition, the two-dimensional random field was sampled on an equidistant grid of size 2048×2048 px and further downsampled to a size of 20×20 px, so that the evaluated grid points of the low-resolution image coincide with the non-equidistant Gaussian integration points of the FE analysis. More specifically, a Gaussian quadrature rule of second order was applied. The random field values can usually also be generated directly at the integration points, but the grid of the integration points is usually not uniform. Also note that in some cases the FE analysis did not converge, which was usually observed when the gradient between adjacent degradation parameters of the random field was particularly high.
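A possible way to evaluate a fine, equidistant random field sample at the non-equidistant Gauss integration points is sketched below with SciPy; the grid size, element layout and 2×2 Gauss rule are illustrative stand-ins for the setup described above.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# fine, equidistant field (here 256 x 256 for brevity; 2048 x 2048 in the paper)
n_fine = 256
grid = np.linspace(0.0, 1.0, n_fine)
fine_field = np.random.default_rng(3).random((n_fine, n_fine))  # placeholder field

# 2 x 2 Gauss points per element for a 10 x 10 element mesh on the unit square
gp = np.array([-1.0, 1.0]) / np.sqrt(3.0)                   # Gauss points on [-1, 1]
centers = (np.arange(10) + 0.5) / 10.0                       # element centers
coords = np.sort(np.concatenate([c + 0.05 * gp for c in centers]))  # physical coordinates

interp = RegularGridInterpolator((grid, grid), fine_field)
X2, X3 = np.meshgrid(coords, coords, indexing="ij")
xi_at_gauss = interp(np.stack([X2.ravel(), X3.ravel()], axis=-1)).reshape(20, 20)
```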

Uncertainty propagation to stress distributions
The stochastic model entails an uncertainty in the stress distribution resulting from a uniaxial extension test. A comprehensive overview of the methods for quantifying these uncertainties in the stress distribution, i.e. on the propagation of the uncertainties through the model, can be found in [36]. Here, we choose a Bayesian approach [33]. For readers unfamiliar with Bayesian probability theory, a brief introduction to the basic rules of probability theory is given and then applied to formulate the UQ problem. Subsequently, NNs are introduced as surrogate models, specifically the Bayesian encoder-decoder architecture to solve the UQ problem. We also refer to the literature [33,45,73,74].

Probability theory and uncertainty quantification
Let Q, R and P be three propositions. If, e.g., Q is a continuous random variable, such a proposition could be 'Q has a certain value'. We will also need other types of propositions, such as the proposition that a certain model P is true or a certain new data set R was measured.
We can further combine propositions with Boolean algebra into new propositions, e.g., Q AND R is the proposition that 'Q has a certain value AND R has been measured'. In the following, such AND-compositions are denoted with a comma, e.g., (Q, R). Propositions can also be conditioned on each other, e.g., Q | R reads 'Q has a certain value given R has been measured'.
Hence, Q | R, P reads 'Q has a certain value given R has been measured AND model P is true'. The basic rules hold for all these types of propositions. The first rule is Bayes' theorem,

p(Q | R, P) = p(R | Q, P) p(Q | P) / p(R | P),  (28)

where p(Q | R, P) is the conditional probability for Q given that R AND P are known. The probability p(Q | R, P) is usually called the posterior. In this study, Q are the mechanical stresses whose uncertainties we want to quantify, R is a data set, and P are the model assumptions. The probability p(R | Q, P) is called the likelihood, e.g., the probability for measuring a data set R given that the model P is true and Q is the true value. p(Q | P) is the a priori probability for Q, i.e. the probability before the new data measurements were taken into account, and p(R | P) is the evidence, which is a normalization constant. The second basic rule is the marginalization rule; for continuous variables Q it reads

p(R | P) = ∫ p(R, Q | P) dQ,  (29)

which is to be understood as a volume integral over the domain of Q.
According to Section 2, the Cauchy stress tensor is a function of the random field parameter (the collection ξ of degradation parameters) and the location X, i.e. σ(ξ, X). In the following, we consider a particular component σ_νν of the Cauchy stress tensor. To simplify the notation, and because the component indices can be chosen arbitrarily, we now suppress them and write σ ≡ σ_νν. It is possible to extend the approach to consider all stress tensor components together. Note that the quantity of interest, here σ, can in general be any quantity of interest derived from the FE analysis, such as the displacement field. The uncertainty of a quantity of interest σ, here a particular component of the Cauchy stress tensor, at a fixed current location X, given the degradation field ξ(X) at all reference locations X̂, is described by the probability density function p(σ | ξ, X) for the value of σ, where we have used p(σ | ξ, X̂, X) = p(σ | ξ, X), i.e. given the full random field realization ξ at all X ∈ X̂, one only has to specify the measurement location X.
However, according to Section 2, the degradation parameter ξ is a random field and therefore not exactly known. According to the Bayesian paradigm, we have to average over all unknowns, meaning to marginalize the unknowns as in (29). Using the PDF for the random field ξ as described in Section 2, this is achieved via

p(σ | X, X̂) = ∫ p(σ | ξ, X) p(ξ | X̂) dξ  (31)
            ≈ Σ_{n_s=1}^{N_s} W_ξ^(n_s) p(σ | ξ^(n_s), X),  (32)

where (32) is an approximation of (31), obtained by stochastic integration. In other words, a Monte Carlo integration [75] was applied with N_s samples, often denoted as particles, with sample weights, specifically the probability mass W and the Dirac delta function δ. Note that each sample ξ^(n_s) here corresponds to a random field realization as introduced before, and here we choose W_ξ^(n_s) = 1/N_s. Note that W are sample weights in the sense of the relative probability mass and are not to be confused with the weights in the sense of surrogate parameters, which define the NN to be introduced later. An important advantage of stochastic integration over deterministic integration is that its convergence rate is independent of the integral dimension N_x.
After determining (31), one can also compute the mean, standard deviation, variance and covariance, where the expectation value is denoted as ⟨·⟩; therewith, std(σ(X)) = √var(σ(X)). In addition, the probability that the random variable σ exceeds a critical threshold value σ_crit at a fixed location X is

P(σ > σ_crit | X) = ∫_{σ_crit}^{∞} p(σ | X) dσ.  (37)

If the critical threshold value σ_crit is the failure stress of the aortic wall, then Eq. (37) is the local failure probability at a certain location X. The global failure probability, i.e. the probability of exceeding σ_crit at any location, follows analogously. Solving the problem given in (31) requires evaluating the FE model for a large number of random field realizations, which motivates the surrogate model introduced in the next section.
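Given stress samples obtained for N_s random field realizations, the Monte Carlo estimates of (32)-(37) reduce to simple sample statistics, as sketched below; the sample values and the critical stress are placeholders, not simulation results.

```python
import numpy as np

def mc_statistics(sigma_samples, sigma_crit):
    """Monte Carlo estimates with equal weights W = 1/N_s, cf. Eqs. (32)-(37).

    sigma_samples : (N_s, N_x) array of stresses at N_x locations for N_s
                    random field realizations (from the FE model or the surrogate)
    sigma_crit    : critical stress threshold (illustrative value below)
    """
    mean = sigma_samples.mean(axis=0)                        # <sigma(X)>
    std = sigma_samples.std(axis=0, ddof=1)                  # std(sigma(X))
    p_local = (sigma_samples > sigma_crit).mean(axis=0)      # P(sigma > sigma_crit | X)
    p_global = (sigma_samples > sigma_crit).any(axis=1).mean()  # exceedance anywhere
    return mean, std, p_local, p_global

# Example with synthetic samples (placeholder numbers)
rng = np.random.default_rng(4)
samples = 100.0 + 10.0 * rng.standard_normal((1000, 400))
mean, std, p_local, p_global = mc_statistics(samples, sigma_crit=120.0)
```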

Neural networks as a surrogate and the encoder-decoder architecture
CNNs have seen an increase in interest and applications in the fields of computer vision and pattern recognition since the ImageNet challenge [78] and to date most published research still involves the training of a CNN [79,80]. This popularity also brought CNNs growing attention in other fields, including shape modeling [81], computational chemistry [82], and physics-based simulations [83]. NNs generally consist of fully connected graphs where the nodes or neurons are interconnected by weighted arcs or synapses. The actual configuration is sought through an iterative process called training.
Fully connected graphs can be difficult to train on large inputs and may require high computational and memory costs. Convolution can be used to analyze spatial correlations between subsets of inputs, like neighboring cells in a matrix. This resulted in CNNs represented as stacked layers of linear convolution followed by nonlinear activation [84]. The NN M is therefore a nested sequence of functions M^(ℓ), i.e.

M(ξ) = (M^(L) ∘ M^(L−1) ∘ · · · ∘ M^(1))(ξ),  (39)

which yields a recursive relation for the layers ℓ = 1, . . . , L, with neurons n_ℓ = 1, . . . , N_ℓ in each layer, according to

M^(ℓ) = h^(ℓ)( w^(ℓ) M^(ℓ−1) + b^(ℓ) ),  (40)

where h^(ℓ) is the above-mentioned nonlinear activation function of layer ℓ, applied element-wise to its matrix- or vector-valued argument, while w^(ℓ,n_ℓ) are the elements of the set of weight matrices w, and b are additional parameters called biases. Note that M^(ℓ) is generally matrix-valued since ξ is also matrix-valued. The weight matrices are parameters of the NN and should not be confused with the particle weights W_ξ^(n_s) from Section 3.1 or the particle weights W_w^(n_m) introduced in Section 3.3. The training process then boils down to choosing an optimization criterion, e.g., the L1 or L2 loss functions, and minimizing this loss function with respect to all weights w and biases b with a suitable optimization algorithm. The loss function and optimization implications are discussed in more depth in Section 3.3. It can be shown that the gradients for feed-forward networks can be efficiently evaluated. These gradients are then used for error back-propagation and weight updates during training [85]. This procedure makes it possible to automatically extract multi-scale features from high-dimensional input, reducing the need for hand-crafted feature engineering, such as searching for the right set of basis functions or relying on expert knowledge [86]. In our case, these features will primarily be complex spatial correlations of the stress distributions.
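The recursion (39)-(40) amounts to alternating affine maps and element-wise nonlinearities, as in the following NumPy sketch with random placeholder weights (a fully connected toy network rather than the convolutional layers used in the paper).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(xi, weights, biases):
    """Nested-function evaluation of a feed-forward NN, cf. Eqs. (39)-(40)."""
    m = xi
    for l, (w, b) in enumerate(zip(weights, biases)):   # layers l = 1, ..., L
        m = w @ m + b                                    # affine map with weights and biases
        if l < len(weights) - 1:
            m = relu(m)                                  # element-wise nonlinear activation h
    return m

# Three-layer example with random placeholder weights
rng = np.random.default_rng(5)
sizes = [400, 128, 128, 400]                             # input, two hidden layers, output
weights = [rng.standard_normal((sizes[l + 1], sizes[l])) * 0.05 for l in range(3)]
biases = [np.zeros(sizes[l + 1]) for l in range(3)]
prediction = forward(rng.standard_normal(400), weights, biases)
```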
We usually speak of deep CNNs when the network has two or more intermediate layers [87,88]. Although deep CNNs have shown high accuracy on a large number of tasks, a limitation stems from their low robustness and reproducibility [89,90], also given their intrinsic inability to express uncertainty [41]. A solution to this problem is provided by Bayesian deep networks, which also allow the prediction uncertainty to be expressed [91][92][93]. A Bayesian network can quantify the predictive uncertainty by treating the network parameters as random variables and by performing Bayesian inference on those uncertain parameters, even when the training data set is small. In a fully Bayesian treatment, one would rather learn a probability distribution than minimize a loss function, as will be introduced in Section 3.3.3.
Depending on the learning task and data structure, a large number of NN architectures have been proposed [94,95]. A common architecture among them is the encoder-decoder [79,83,84]. An encoder-decoder architecture is typically used when data needs to be compressed and decompressed, as in matrix-to-matrix regression tasks. Image-to-image transformations are common examples [84]. The task of reducing the complexity of the input data, i.e. the selection or extraction of features, is called an encoder, while the reverse process of decompression is called a decoder. The entire process of reducing the number of features to an encoder space or latent space is understood as dimensional reduction. An encoder-decoder architecture is considered good if it retains the maximum information when encoding, while showing minimal error when reconstructing the data in the decoder.
The encoder-decoder architecture has shown a growing number of applications since its introduction [88,96]. In this study we use an architecture similar to [41], which showed state-of-the-art performance in terms of prediction accuracy in comparison to other established approaches, such as Gaussian processes [41]. An important motivation for this choice of surrogate model is that the encoder-decoder structure makes it possible to extract multi-scale features and spatial correlations from the input. These features are then processed by the decoder and finally preserved in the output. For more details on the network architecture, network layers and layer parameters, we refer the reader to Appendix B.
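For orientation, a deliberately small PyTorch encoder-decoder for 20×20 input fields is sketched below; it is only loosely inspired by the dense-block architecture of Zhu et al. [41] (see Appendix B), and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal convolutional encoder-decoder: 1x20x20 field -> 1x20x20 stress map.

    Illustrative only; the paper uses a Bayesian dense-block architecture [41].
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 20 -> 10
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),    # 10 -> 20
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, xi):
        return self.decoder(self.encoder(xi))

model = TinyEncoderDecoder()
xi = torch.randn(8, 1, 20, 20)           # batch of 8 random field samples
sigma_pred = model(xi)                    # predicted stress maps, shape (8, 1, 20, 20)
```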

Bayesian uncertainty propagation with a neural network surrogate model
The FE analysis is now replaced by a surrogate model M, i.e.

σ(ξ, X) ≈ M(ξ, X, w),  (41)

where w is a set of surrogate parameters. Also, we will use the encoder-decoder CNNs introduced above as surrogate models, i.e. M has the form of Eq. (39). In connection with NNs, w are often called weights, which should not be confused with the stochastic integration weights W mentioned above.

(Figure caption: The input random field is convolved to enter the encoder. After entering a dense block with two dense layers, it is transitioned down and passes through a dense block with five dense layers. Subsequently, the representation is transitioned up again, followed by a dense block with two dense layers, the decoder, and a final convolution. The final output on the right is the network's prediction for the spatial distribution of the Cauchy stress component σ_33. To predict a single random field, an ensemble of 20 NNs connected in parallel is used, yielding 20 NN predictions.)

Then the inputs and outputs of the FE analysis are collected in a training data set D according to

D := { (ξ^(n_ξ), Σ^(n_ξ)) }_{n_ξ=1}^{N_ξ},  (42)

which is grouped as follows,

Σ^(n_ξ) := { σ^(n_x,n_ξ) }_{n_x=1}^{N_x},  (43)

where D consists of N_ξ random field samples and their corresponding stresses computed with the FE method at all N_x measurement points or nodes in the domain, σ^(n_x,n_ξ) := σ(ξ^(n_ξ), X^(n_x)).
Note that we distinguish between the number of random field realizations N ξ for which a corresponding stress distribution is computed using the FE method and then used to create a surrogate model later, and the number of Monte Carlo samples N s for the stochastic integration in Eq. (32). Accordingly, we also distinguish the indices n ξ = 1, . . . , N ξ and n s = 1, . . . , N s .
The N ξ random field samples ξ (n ξ ) together with the corresponding stress distributions obtained with the FE method Σ (n ξ ) will shortly be used to learn a surrogate model. For the N s random field samples ξ (ns) in the stochastic integration of (32), we will compute the stress distributions with the surrogate model learned in this way.
Next, an optimality criterion O is defined to determine the 'optimal' surrogate parameters w* based on the given training data D, i.e.

w* = arg opt_w O(w, D).  (44)

This optimization process corresponds to the NN training mentioned in Section 3.2. The optimal surrogate parameters could, inter alia, be found by minimizing the sum of least squares, i.e.

w* = arg min_w Σ_{n_x=1}^{N_x} Σ_{n_ξ=1}^{N_ξ} [ σ^(n_x,n_ξ) − M(ξ^(n_ξ), X^(n_x), w) ]².

Further implications of this substitution are discussed in the literature [34]. From here on we assume that the functional form of M is fixed, and we will suppress M in the conditional complex of the PDFs, when not directly addressed, to simplify the notation, e.g., p(σ | X, D, M) ≡ p(σ | X, D), and analogous to Eq. (31) we can then write

p(σ | X, D) = ∫ p(σ | ξ, X, D) p(ξ | X̂) dξ.  (47)

Other popular functional forms of M are polynomial chaos expansions [37,38], Gaussian process regressors [30,39] or, recently, physics-informed NNs [40,41]. In Section 3.2, NNs were introduced to construct a single surrogate model that learns and predicts the stresses at all nodes in X̂ collectively, rather than a set of N_x surrogates that each predict the stress at one X^(n_x), n_x = 1, . . . , N_x.

Approximation error of the surrogate
In (47), we assumed that we can approximate the uncertainty of the Cauchy stress component σ by replacing the FE analysis with a surrogate learned from a training data set. This approximation introduces additional uncertainties via the surrogate parameters w through the first term under the integral in (47), i.e.

p(σ | ξ, X, D) = ∫ p(σ | ξ, X, w, D) p(w | D) dw,  (48)

where p(w | D) is the posterior PDF for the surrogate parameters. Note that the data D in the first term under the integral is superfluous, since σ is uniquely determined by the given ξ, X and w via Eq. (41).

Substituting Eq. (48) into Eq. (47) yields

p(σ | X, D) = ∫∫ p(σ | ξ, X, w) p(w | D) p(ξ | X̂) dξ dw.  (49)
So far we have implicitly assumed that (31) is approximated by (47), i.e. p(σ | X, D, M) ≈ p(σ | X). This is only valid if the surrogate model is actually a good approximation of the FE simulation, see (41), and if at the same time the posterior for the surrogate parameters shows a sharp peak at the optimal surrogate parameters w*, as in (44), i.e. p(w | D) ≈ δ(w − w*), see [34]. Consequently, we would wrongly assume that there are optimal surrogate parameters w* that have no uncertainty. In the following we keep the assumption (32) but refrain from neglecting the uncertainties in the surrogate parameters w that are introduced by using a surrogate in the first place (41). To put it another way, we aim to solve (49). To do this, we must first define the remaining ingredients, i.e. the likelihood and prior for the posterior of the surrogate parameters p(w | D).

Likelihood and prior
Using Bayes' theorem, conditionally independent data from the FE analysis, and recalling (43), we find that

p(w | D) ∝ p(Σ | Ξ, X̂, w) p(w),  (50)

with

p(Σ | Ξ, X̂, w) = Π_{n_ξ=1}^{N_ξ} p(Σ^(n_ξ) | ξ^(n_ξ), X̂, w)  (51)
               = Π_{n_ξ=1}^{N_ξ} Π_{n_x=1}^{N_x} p(σ^(n_x,n_ξ) | ξ^(n_ξ), X^(n_x), w),  (52)

where Σ and Ξ collect all stress outputs Σ^(n_ξ) and random field samples ξ^(n_ξ), respectively. In (50) it is recognized that the surrogate parameters w are a priori conditionally independent of the locations X̂ and the random fields Ξ, which will be discussed later. In (51) it is assumed that the simulation output Σ^(n_ξ) corresponding to a specific random field sample ξ^(n_ξ) provides no information about the simulation output corresponding to any other random field sample, i.e. conditional independence. Finally, in (52) we assumed that to measure the stress at a particular location for a given random field sample ξ^(n_ξ) and surrogate parameters w, we only have to specify this particular measurement location, namely X^(n_x), and not all of them, X̂ = {X^(n_x)}. This conditional independence in (52) might be counter-intuitive, because one would think that if the random field ξ is spatially correlated through X then the Cauchy stress components σ should also be spatially correlated. This is indeed true. However, as soon as ξ^(n_ξ), X^(n_x), and w are determined in (52), then σ^(n_x,n_ξ) is uniquely determined by M, as in (41). The PDF for σ in (52) then describes only the additional approximation error or uncertainty caused by the surrogate approximation (41). This approximation error is assumed to be location-independent and therefore the same for all locations. Spatially correlated noise could be introduced at this point, for example by another Gaussian process. Also note that the surrogate model predicts the stresses at all locations X^(n_x) ∈ X̂ simultaneously, as mentioned in Section 3.2. The spatial correlations of the stresses σ are then implicit in the surrogate. As shown in Section 3.2, it is the particular advantage of NNs as a surrogate to enable learning and prediction of stresses at a large number of measurement sites together.
Due to the lack of information about the distribution of the simulation data, we assume a general exponential likelihood, such as a Gaussian, as a convenient default choice for the likelihood in (52), more precisely for p(σ^(n_x,n_ξ) | ξ^(n_ξ), X^(n_x), w). Let L^(n_x,n_ξ) be a loss function or mismatch term for a given data point (n_x, n_ξ). Then, given the noise variance ∆² that describes the uncertainty scale of the surrogate, the likelihood for a single datum reads

p(σ^(n_x,n_ξ) | ξ^(n_ξ), X^(n_x), w, ∆²) ∝ exp( −L^(n_x,n_ξ) / (2∆²) ),  (53)

with a proper ∆-dependent normalization. With the same argument as in (50)-(52) and the total loss

L = Σ_{n_x=1}^{N_x} Σ_{n_ξ=1}^{N_ξ} L^(n_x,n_ξ),

the joint likelihood of all data given the surrogate uncertainty ∆² is proportional to exp(−L/(2∆²)), where N_x is the number of pivot points or measurement sites in X̂ and N_ξ is the number of random field samples. It follows that the number of single stress measurements is N_x N_ξ. Although the uncertainty scale of the surrogate ∆² is the same for all measurements, it is not known and must therefore be marginalized. We can reformulate the uncertainty as a precision parameter τ := 1/∆². A suitable and conjugate prior for the precision parameter τ is the gamma distribution p(τ | a_1, b_1) = p_Γ(a_1, b_1), which, after marginalizing τ, leads to a Student-t distribution [97] for the likelihood, where a_1 = 2 and b_1 = 2·10⁻⁶ are chosen in analogy to [41]. In order to fully define the likelihood, we now only have to specify the loss function L^(n_x,n_ξ). In this study, a smooth L1 loss, as in a Huber loss, did not lead to appreciably different results and is also more difficult to handle because it introduces another hyperparameter. We therefore decided to use the L2-norm as the loss function, i.e. a Gaussian likelihood, which led to reasonable results. Formally, this is

L^(n_x,n_ξ) = [ σ^(n_x,n_ξ) − M(ξ^(n_ξ), X^(n_x), w) ]².

For the prior we choose a multivariate Gaussian with a diagonal covariance matrix, i.e. p(w | α) = N(0, α⁻¹ 1), which implies that the components of w are conditionally independent for a given a priori precision hyperparameter α. With a gamma-type hyper-prior p(α) = p_Γ(a_0, b_0), we get a centered Student-t distribution [97] for w, whose dimension is given by the number of surrogate parameters N_w. This choice regularizes against outliers and promotes sparsity in the weights w. For the hyperparameters we choose a_0 = 1 and b_0 = 0.05, see [41]. We have now determined the un-normalized posterior in (50).
The last missing ingredient, i.e. the first term under the integral in (49), is already implicitly defined as the likelihood (see Eq. (53)) for a single new data point, which in turn is integrated with respect to ∆². Explicitly, this yields a Student-t density centered at the surrogate prediction M(ξ, X, w), see Eq. (60). We have now fully specified (49) and will try to solve it below.

Variational inference
For generalized linear models M, i.e. linear in the surrogate parameters w, the integral (48) can often be solved analytically [34]. However, the linearity limits the expressive capacity of the surrogate model. In this study, we want to introduce an NN as a surrogate that has a nonlinear dependence on the surrogate parameters w. The NN has 70 020 parameters, leaving us with a high-dimensional integral defined in (49). This integral is difficult to solve numerically for exact inference, even with the most sophisticated variants of Markov Chain Monte Carlo currently available, and difficult to solve with Nested Sampling [98]. So we approximate (49) by variational inference [99]. In other words, the PDF p(w | D) from (49) is approximated with a suitable PDF q ∈ Q from a family of functions Q, so that the integration with respect to w in (49) becomes manageable. Then, q must be chosen such that the Kullback-Leibler divergence K(q; p) becomes minimal, i.e.
p(w | D) ≈ q*(w) = arg min_{q∈Q} K(q; p),   (61)

with

K(q; p) = ∫ q(w) ln[q(w)/p(w | D)] dw.   (62)

The Kullback-Leibler divergence is a measure of the distance between two PDFs, namely p and q. This optimization problem can be solved by parameterizing q with tuning parameters θ such that q := q_θ. The PDF q_θ ∈ Q could then be taken from the exponential family, e.g., Gaussian distributions with mean and variance θ := (µ, ς²). Analytical expressions for the gradient of the Kullback-Leibler divergence (62) are then often available, but with the disadvantage that one is limited to a family of parameterizable distributions. A more advanced approach is Stein variational gradient descent [41,100-102], which represents q numerically with samples and instead parameterizes small perturbations of q by small, parameterized coordinate transformations T according to

T(w) = w + ε φ(w),   (63)

with small ε and a vector-valued function φ. This in turn defines a perturbed PDF q_T given by

q_T(w) = q(T⁻¹(w)) |det ∇_w T⁻¹(w)|.   (64)

Then the minimization in (61) corresponds to a minimization with respect to φ. The estimate for the expected gradient of the Kullback-Leibler divergence K(q_T; p) was derived in [100-102], from which the iterative procedure

w^(n_m)_{n_t+1} = w^(n_m)_{n_t} + ε φ*(w^(n_m)_{n_t}),   (65)

φ*(w) = (1/N_m) Σ_{n'_m=1}^{N_m} [κ(w^(n'_m)_{n_t}, w) ∇_{w^(n'_m)_{n_t}} ln p(w^(n'_m)_{n_t} | D) + ∇_{w^(n'_m)_{n_t}} κ(w^(n'_m)_{n_t}, w)]   (66)

results, where n_t is the number of previous iterations, Z is a normalization constant that can be neglected because only the gradient of the log-posterior enters, and κ is a kernel function acting on the particles {w^(n_m)_{n_t}}_{n_m=1}^{N_m}, as advocated in [100]. The kernel function κ used here should not be confused with the convolution kernel in the NN.
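For illustration, one Stein variational gradient descent update of the particle set can be sketched as follows; `log_prob` is a placeholder for the un-normalized log-posterior of (50), and the RBF kernel with a median-heuristic bandwidth is one common choice, not necessarily the exact kernel used in the study.

```python
import torch

def svgd_step(particles, log_prob, eps=1e-3):
    """One Stein variational gradient descent update of the particles w^(n_m).

    particles : (N_m, N_w) tensor, each row one sample of the surrogate parameters w.
    log_prob  : callable returning the un-normalized log-posterior log p(w | D) for
                each particle; the normalization constant Z drops out of the gradient.
    """
    particles = particles.detach().requires_grad_(True)
    grad_logp = torch.autograd.grad(log_prob(particles).sum(), particles)[0]

    # RBF kernel with a median-heuristic bandwidth
    sq_dists = torch.cdist(particles, particles) ** 2
    n = particles.shape[0]
    h = torch.clamp(sq_dists.median() / (2.0 * torch.log(torch.tensor(n + 1.0))), min=1e-8)
    k = torch.exp(-sq_dists / (2.0 * h))                              # kappa(w_j, w_i)

    # repulsive term: sum over j of grad_{w_j} kappa(w_j, w_i)
    grad_k = (k.sum(dim=0).unsqueeze(-1) * particles - k.t() @ particles) / h

    phi = (k.t() @ grad_logp + grad_k) / n                            # cf. Eq. (66)
    return (particles + eps * phi).detach()                           # cf. Eq. (65)

# toy usage: particles drifting towards a standard normal "posterior"
w = torch.randn(20, 3) * 5.0
for _ in range(200):
    w = svgd_step(w, lambda p: -0.5 * (p ** 2).sum(dim=1), eps=0.1)
```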
During training of the NN, the step size ε was adapted using the Adam optimization algorithm [103] together with a cosine annealing learning rate schedule, as described in Appendix B.
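A minimal sketch of this optimizer setup (placeholder network and random data) is given below; the learning rate of 0.03 and the 500 epochs correspond to the hyperparameters listed in Appendix B, whereas in the actual Stein variational training the parameter update is driven by φ* from Eq. (66) rather than by a plain mean-squared-error gradient.

```python
import torch

# placeholder single-channel image-to-image network and dummy data
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

x = torch.randn(8, 1, 16, 16)          # dummy input fields
y = torch.randn(8, 1, 16, 16)          # dummy target stress fields

for epoch in range(500):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                    # cosine decay of the step size
```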
The equations (65) and (66) define an iterative procedure of small coordinate transformations of w and subsequent small perturbations of q. This iterative procedure finally leads to a q* in a sample representation that is optimal in the sense of Eq. (62). We can then approximate the integral with respect to w in (49) by the weighted sum over the samples or particles w^(n_m).
These samples represent q*(w) by the relation

q*(w) ≈ Σ_{n_m=1}^{N_m} W_{w^(n_m)} δ(w − w^(n_m)),   (67)

where W_{w^(n_m)} is the normalized weight of the particle w^(n_m). More precisely, in this study it is defined as W_{w^(n_m)} = 1/N_m.
The integral with respect to the random field ξ still remains unsolved. We can approximate it by a discrete sum over the samples of ξ drawn from the distribution p(ξ | X̄), as discussed for Eq. (32). Substituting (32) and (67) into (49) yields

p(σ | X̄, D) ≈ Σ_{n_s=1}^{N_s} Σ_{n_m=1}^{N_m} W_{ξ^(n_s)} W_{w^(n_m)} p(σ | ξ^(n_s), X̄, w^(n_m)),   (68)

where W_{ξ^(n_s)} = 1/N_s according to Sections 2.3 and 2.4. The individual terms in these sums are the likelihood for new data (Eq. (60)), as argued in Section 3.3.2; with the above substitutions and the definitions provided in Section 3.3.2, each term is explicitly the Student-t likelihood (60) evaluated at the corresponding surrogate prediction. The surrogate predicts stresses with significantly reduced computational effort compared to the original FE analysis presented in Section 2.5: once the NN has been trained as a surrogate, the stress distribution for new random field realizations can be evaluated at negligible additional cost. The uncertainties (34) derived from the PDF in (68) also include the uncertainties of the NN itself. Note that this result can also be interpreted as an average over n_m = 1, ..., N_m different NNs with surrogate parameters w^(n_m) and weights W_{w^(n_m)}, each NN with its own predictions for the stress distribution σ for all given random field samples ξ^(n_s). Also note that this approach makes it possible to investigate isolated network uncertainties for fixed parameter fields ξ.
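The resulting ensemble average can be sketched as follows (hypothetical tensor shapes); the spread computed below reflects only the variability over particles and random field samples, while the full predictive uncertainty additionally contains the Student-t width from (60).

```python
import torch

def predictive_moments(models, xi_samples):
    """Monte Carlo approximation of the predictive stress distribution, cf. Eq. (68).

    models     : list of N_m trained networks, one per particle w^(n_m)
                 (equal weights W = 1/N_m).
    xi_samples : (N_s, 1, H, W) tensor of random field realizations xi^(n_s)
                 (equal weights W = 1/N_s).
    Returns the predictive mean and standard deviation of the stress field.
    """
    with torch.no_grad():
        preds = torch.stack([m(xi_samples) for m in models])  # (N_m, N_s, 1, H, W)
    mean = preds.mean(dim=(0, 1))
    std = preds.std(dim=(0, 1))      # spread over particles and field samples only
    return mean, std

# toy usage with untrained placeholder networks
nets = [torch.nn.Conv2d(1, 1, kernel_size=3, padding=1) for _ in range(4)]
mean, std = predictive_moments(nets, torch.randn(10, 1, 16, 16))
```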

Results
In this study, the NN was trained to predict the results of the FE analysis of the boundary-value problem presented in Section 2.5. The input random field realization and the corresponding Cauchy stress distribution obtained from the FE analysis are shown in Fig. 4(a) and (b), respectively. The network prediction is plotted in Fig. 4(c), and the absolute difference between the output of the FE analysis and the NN prediction can be seen in Fig. 4(d). A comparison of the true output and its network prediction shows that the network cannot predict small-scale fluctuations within the data. The network prediction therefore appears rather smooth compared to the Cauchy stress distribution obtained from the FE analysis. These fluctuations are interpreted by the model as noise in the data and are ultimately responsible for the deviation of the NN prediction from the FE reference solution, as can be seen from the absolute difference in Fig. 4(d) and the predicted standard deviation in Fig. 4(e). Even if the network prediction seems to capture the output well at first glance, it is the elusive small-scale fluctuations that end up causing relative errors of up to 20 %.
In order to quantify uncertainties, it is useful to compare surrogate predictions at certain locations with the true solution, here the FE analysis; the reliability of the predicted uncertainties is summarized in the reliability diagram in the appendix (Fig. B.8).

Discussion
We have presented a stochastic approach to model material inhomogeneities in the aortic wall using the example of aortic dissection. To describe pathological changes in the aortic wall, a constitutive framework was introduced that includes a degradation parameter to model degraded elastic fibers. This parameter was then assumed to be spatially distributed within the aortic wall. Based on this assumption, a beta random field of the degradation parameter was developed and sampled. Subsequently, the stochastic constitutive model was implemented in FEAP [72] and applied to a boundary-value problem, more precisely a uniaxial extension test, which provided the stress distribution as the quantity of interest in this study.
However, the results of the stochastic model are meaningless without accounting for the uncertainties introduced. In this study, uncertainty propagation was achieved using a NN as a surrogate model. This approximation introduces an additional uncertainty that can be treated in the context of parametric input uncertainties on the basis of Bayesian probability theory. The additional uncertainty, i.e. the uncertainty of the network itself, was estimated using a variational inference formulation. The limitation of variational inference to parameterized PDFs is overcome with a particle representation of the approximating PDF. This effectively results in an ensemble of NNs whose predictions are averaged. Beyond the uncertainties of the degradation parameter, other model parameter uncertainties were not considered, e.g., uncertainties in the strain-energy function (10). Uncertainties in geometry and boundary conditions could be neglected in this virtual laboratory setting, but can be important in more physiological models, e.g., patient-specific models. Moreover, the numerical accuracy of the FE solver, e.g., the discretization of the domain, the random fields, the fiber distributions or the NN, can be neglected in view of the model uncertainty.
At this point, the discrete character of the applied DFD method must be discussed with regard to the uncertainties of the model. As emphasized in previous studies [47,104], the exclusion of degraded elastic fibers strongly depends on the number of discrete elements on the unit hemisphere; these discrete exclusions contribute to the small-scale fluctuations in the FE solution. Another possibility would be to adapt the network architecture so that the small-scale fluctuations are easier to learn. However, the complex architecture of the network complicates its interpretation, and it is not obvious what this modification must look like. For example, simpler architectures with a physics-informed NN [105,106] are promising and require less data for training, but did not provide useful results in the present study. A possible reason is that the physics is governed by the entire stress tensor and not just by a specific component of the stress tensor, which indicates an inconsistent training objective [107]. Attempts to learn all stress tensor components jointly again suffered from the indistinguishability of small-scale fluctuations and noise, with both the introduced encoder-decoder and the physics-informed NN. Another reason could be that boundary conditions were not taken into account in the network; this issue can be resolved by a soft or hard enforcement of boundary conditions [108]. Another direction would be to equip the architecture with attention mechanisms, as in natural language processing transformers [109] or vision transformers [110]. While our dense blocks pass forward all features of all scales to subsequent layers, a transformer could automatically weight and correlate features on multiple scales. This could reduce the number of network parameters and allow for more data-efficient training while retaining the important feature correlations across multiple scales, at the expense of an additional loop in the backpropagation. This could make it easier to find a set of weights and biases that accurately predicts small-scale fluctuations. Unfortunately, the convergence properties of the discussed NNs are poorly understood [85,88].
Note that Gaussian processes, inter alia, suffer from the curse of dimensionality in the input space, while NNs shift the curse to the weight space. In this context, this is the particular advantage of NNs, since the dependency of the simulation on a parameter field can be learned with them. If the number of degrees of freedom in the FE model scales linearly with the number of nodes N_x, then the inversion of the stiffness matrix scales as O(N_x³). Although several much faster methods exist, the following argument still holds. The total computation time for the full problem scales as O(N_s N_x³) for a brute-force approach. When using a surrogate model, the total computation is composed of (i) the computation time for generating a training data set of FE analyses, (ii) the time for training the NN, and (iii) the comparatively negligible time for evaluating the surrogate for new random field samples, so that the FE cost is amortized once many samples are required.

Adaptation of the network to realistic geometries with irregular, non-uniform meshes is conceptually easy with geodesic convolutions [111], i.e. representing the convolution kernel in local coordinates, or with graph NNs [112]. The scaling of random field generation to large domains was addressed in [76,77]. However, practical limitations could again be the computational budget for data generation, i.e. N_s FE analyses, and training. Liang et al. [42] showed that it is possible to learn an accurate surrogate for the stress distribution on larger domains as a function of geometry using statistical shape models with far less data (< 800), but neglecting surrogate uncertainties. It remains unclear whether the proposed approach is also really useful for random parameter fields on patient-specific geometries. In order to scale to patient-specific models, it would be promising to define the parameter field via a statistical shape model instead of directly via the FE discretization.
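Purely as an illustration of this amortization argument, with made-up placeholder constants rather than measured timings:

```python
# Illustrative only: all constants below are hypothetical placeholders.
N_x = 1_000            # number of FE nodes
N_s = 10_000           # number of random field samples to propagate
N_train = 500          # FE analyses used to generate the training data

fe_cost = N_x ** 3     # cost of one FE solve, O(N_x^3) in this toy model
train_cost = 1e10      # assumed fixed cost for training the surrogate
eval_cost = N_x        # assumed cost of one surrogate evaluation

brute_force = N_s * fe_cost
surrogate = N_train * fe_cost + train_cost + N_s * eval_cost
print(f"brute force: {brute_force:.2e}, surrogate: {surrogate:.2e}")
```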
In principle, other machine learning approaches could have been used instead of a NN, e.g., a warped Gaussian process regressor [113]. While the convergence properties and uncertainties of Gaussian processes are much better understood, they are unable to capture higher-order correlations, their scaling to high-dimensional input data suffers from the curse of dimensionality, and big data sets also require approximations [114]. Other machine learning approaches, such as random forests [115], all too often suffer from the same limitations.
For the beta random field of the degradation parameter, we chose a correlation length of about √2/3 mm, which is not based on experiments and is considered a limitation of this study.
Therefore, future work has to investigate the sensitivity of the stress distribution and possible stress accumulations to the correlation length. In particular, recently published experimental results investigating the regional behavior of arterial samples tested in vitro under physiologically relevant loading can be used [4,116,117].
Although it is known that many vascular diseases involve local alterations in the aortic wall composition [116], most patient-specific computational models assume homogeneous material properties and a constant wall thickness [118,119]. In this study, we therefore presented a stochastic constitutive framework that provides a promising basis to study the role of material inhomogeneities, using the example of spatially distributed degradation of elastic fibers under simplifying boundary conditions. This framework can also be applied and further extended to any other constitutive model. In order to derive reliable conclusions about the stress distribution, future work must, on the one hand, apply the framework to boundary-value problems that are closer to in vivo conditions and, on the other hand, match the correlation length of the local inhomogeneities to experiments [4,116,117]. It is proposed that the presented Bayesian framework allows the identification of a law for stochastic inhomogeneities based on experimental data. We therefore recommend experimentally investigating the role of local alterations in the aortic wall and incorporating these regional changes in material properties in future models. In particular, modeling the correlation between regional pathological changes of different constituents could be of crucial importance.

Data and code availability
Source code in Python (Section 2) and in Python/PyTorch (Section 3) is available at URL: xxx (the file will be uploaded to https://repository.tugraz.at/).

Funding
This work was funded by Graz University of Technology, Austria, through the Lead Project on the 'Mechanics, Modeling, and Simulation of Aortic Dissection' (biomechaorta.tugraz.at) and supported by GCCE: Graz Center of Computational Engineering.
Appendix A. Beta random field

Then, with the variable transformation β := g_1/(g_1 + g_2) and λ := g_1 + g_2, we obtain g_1 = λβ and g_2 = λ(1 − β). Thus, we find that

p(β, λ) = p_Γ(λβ | s_1, γ_1) p_Γ(λ(1 − β) | s_2, γ_2) λ,

where the factor λ is the Jacobian of the transformation. We may now choose γ_1 = γ_2 = γ, which results in p(β, λ) = p(β) p(λ). Consequently, this gives

p(β) = Γ(s_1 + s_2)/(Γ(s_1) Γ(s_2)) β^(s_1−1) (1 − β)^(s_2−1),

so β := g_1/(g_1 + g_2) indeed follows a beta distribution with η_1 = s_1 and η_2 = s_2. Next we have to extend these considerations from random variables β to random fields β(X). Consider two Gaussian random fields f_1(X) and f_2(X) which have the same covariance function, and denote by f_r^(n_x) = f_r(X^(n_x)) the value of random field r = 1, 2 at a particular location X^(n_x). Using the definition of a Gaussian process, we find that

p(f_r^(n_x)) = ∫ N(f_r | 0, K) ∏_{n_x' ≠ n_x} df_r^(n_x') = N(f_r^(n_x) | 0, k(X^(n_x), X^(n_x))),

where the integral is with respect to all remaining variables f, while k and K are defined in (12) and (13)_2. Since this holds ∀r and ∀X, we have reduced the problem to what was shown previously. Note that, by definition of the Gaussian process, f_r(X) and f_r(X′) are not independent; however, by construction of the procedure, f_r(X) and f_{r′≠r}(X) are independent, which is sufficient.
In summary, we have first shown that a random variable β, as defined above, follows a beta distribution. Second, we have shown that this procedure also works for beta random fields by reducing the Gaussian random fields to Gaussian random variables at arbitrary locations. With this we have proven Eq. (23) and thus shown that the random field β constructed in Section 2.4 is in fact a beta random field.
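A generic sketch of this construction is given below; it uses a squared-exponential covariance and a CDF transform to obtain gamma marginals, whereas the covariance and marginal parameters actually used in the study are those of Eqs. (12), (13) and Section 2.4, and all numbers are placeholders.

```python
import numpy as np
from scipy.stats import norm, gamma

def beta_random_field(coords, corr_len, s1, s2, rng):
    """Sample a beta random field from two transformed Gaussian random fields.

    coords   : (N_x, d) array of grid points X^(n_x)
    corr_len : correlation length of the squared-exponential covariance (placeholder)
    s1, s2   : gamma shape parameters, so that pointwise beta ~ Beta(s1, s2)
    """
    # common covariance matrix for both Gaussian fields (small jitter for stability)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    cov = np.exp(-d2 / (2.0 * corr_len ** 2)) + 1e-10 * np.eye(len(coords))
    chol = np.linalg.cholesky(cov)

    # two independent Gaussian random fields with the same covariance
    f1, f2 = (chol @ rng.standard_normal((len(coords), 2))).T

    # pointwise transform: Gaussian -> uniform -> gamma, then the beta ratio
    g1 = gamma.ppf(norm.cdf(f1), a=s1)
    g2 = gamma.ppf(norm.cdf(f2), a=s2)
    return g1 / (g1 + g2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)[:, None]          # 1D grid for illustration
beta = beta_random_field(x, corr_len=0.5, s1=2.0, s2=2.0, rng=rng)
```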

Appendix B. Network architecture and hyperparameters

The main building blocks of the architecture are briefly described below. The hyperparameters of the network and its training are:

Dense blocks: [2, 5, 2]
Dense blocks encoder: 2
Dense blocks decoder: 2
Dense layers: 5
Growth rate: 2
Bottleneck size: 1 × growth rate
Epochs: 500
Learning rate: 0.03 with cosine annealing
Number of single predictions for mean field predictions: 20

Appendix B.1. Convolution kernel

The convolution kernel size h determines the neighborhood over which each pixel is convoluted. For example, if h = 1, then each pixel P_{u,v} is only convoluted with itself. In contrast, with h = 3 each pixel is convoluted with all nearest-neighbor pixels. For higher h, each pixel is convoluted with the nearest neighbors, next-nearest neighbors, and so on. Thus, the convolution kernel size defines the region in which a particular feature can be found. We can therefore think of it as a window matrix centered at P_{u,v}. The features corresponding to the respective convolution kernels are then learned by optimizing the neural weights w.
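This can be illustrated with a short sketch (placeholder tensors): padding of h // 2 keeps the feature map size unchanged, while the kernel size h sets the neighborhood that contributes to each output pixel.

```python
import torch

# h = 1: each pixel is convolved only with itself; h = 3: with its nearest
# neighbors; larger h enlarges the window around P_{u,v}.
image = torch.randn(1, 1, 8, 8)                      # one channel, 8x8 pixels
for h in (1, 3, 5):
    conv = torch.nn.Conv2d(1, 1, kernel_size=h, padding=h // 2)
    print(h, conv(image).shape)                      # spatial size stays 8x8
```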

Appendix B.2. Dense block
A dense block is a basic module that directly connects all layers with one another. This implies that all interconnected layers have feature maps of the same spatial size. This idea was introduced by Huang et al. [121] as 'DenseNet'. In other words, each layer ℓ is connected to all previous layers ℓ−1, ℓ−2, ... in the same dense block. So if an image has C_0 input channels, e.g., C_0 = 3 for an RGB image, the ℓ-th layer has C_0 + (ℓ−1)C_gr input feature maps [M^(ℓ−1)]_uv. Since we only learn the Cauchy stress component σ_33, it follows that C_0 = 1.
However, the input features could correspond to all components of the Cauchy stress tensor, i.e. C_0 = 9. Two design parameters are introduced here, namely the number of layers within a dense block M and the growth rate C_gr. This defines the growth of the input feature maps for each layer, as the number of features increases due to the connection to all previous layers.
Then the total number of feature maps grows linearly with each layer introduced, so that a total of C_out = C_0 + M C_gr feature maps are output. This also means that for a dense layer, (40) has to be modified such that the input to layer ℓ is the concatenation ⊕ of the outputs of all previous layers; the activation h is applied element-wise to this concatenated input, and an additional double index on the weights denotes the connection between a neuron in layer ℓ and a neuron in a layer ℓ′ < ℓ.
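A minimal sketch of such a dense block (a generic implementation, not the exact layer composition of the study) with the growth rate of 2 and the five dense layers listed above could read as follows.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of a dense block: every layer receives the concatenation of the block
    input and all previous layer outputs, so the number of feature maps grows by
    the growth rate C_gr per layer (C_out = C_0 + M * C_gr)."""

    def __init__(self, c0=1, growth_rate=2, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(c0 + l * growth_rate),
                nn.ReLU(),
                nn.Conv2d(c0 + l * growth_rate, growth_rate, kernel_size=3, padding=1),
            )
            for l in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # concatenation ⊕
        return torch.cat(features, dim=1)

out = DenseBlock()(torch.randn(4, 1, 16, 16))
print(out.shape)   # (4, 1 + 5*2, 16, 16) = (4, 11, 16, 16)
```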
Image-to-image regression with encoder-decoder networks requires down-sampling and up-sampling to resize the feature maps, which makes direct concatenation of feature maps of different sizes impossible. Therefore, dense blocks and transition layers are introduced to resolve this issue.
Similar to conventional CNNs, DenseNet includes batch normalization [122], ReLU [123], convolution (Conv), and transposed convolution (ConvT), with padding for down-sampling and up-sampling, respectively, to ensure correct dimensions from the input feature map to a desired output feature map and vice versa.

Appendix B.3. Transition layers
Transition layers are used between dense blocks to reduce the number of feature maps and their size. More specifically, the encoding layer typically halves the size of the feature maps, while the decoding layer doubles the size of the feature maps. Both layers reduce the number of feature maps [41]. In addition, batch normalization layers are used after each convolutional layer, since this can also be seen as an effective regularizer [124].
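A generic sketch of the two transition-layer types could look as follows (the exact layer composition in [41] may differ in detail).

```python
import torch
import torch.nn as nn

def encoding_transition(c_in, c_out):
    """Transition on the down-sampling path: reduce the number of feature maps
    and halve their spatial size (stride-2 convolution instead of max-pooling)."""
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.ReLU(),
        nn.Conv2d(c_in, c_out, kernel_size=1),                          # fewer maps
        nn.Conv2d(c_out, c_out, kernel_size=3, stride=2, padding=1),    # halve size
    )

def decoding_transition(c_in, c_out):
    """Transition on the up-sampling path: reduce the number of feature maps and
    double their spatial size with a transposed convolution."""
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.ReLU(),
        nn.Conv2d(c_in, c_out, kernel_size=1),
        nn.ConvTranspose2d(c_out, c_out, kernel_size=3, stride=2,
                           padding=1, output_padding=1),                # double size
    )

x = torch.randn(2, 11, 16, 16)
print(encoding_transition(11, 6)(x).shape)                              # (2, 6, 8, 8)
print(decoding_transition(6, 3)(encoding_transition(11, 6)(x)).shape)   # (2, 3, 16, 16)
```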
As proposed in [125], fully convolutional networks are the extension of CNNs to pixel-to-pixel predictions, where fully convolutional networks replace the fully connected layers of CNNs with convolutional layers. Furthermore, up-sampling layers are added at the end to restore the input spatial resolution, and skip connections between feature maps of the down-sampling and up-sampling paths are included, see [41].
This work adopts the architecture of [41], which proposed an approach very similar to DenseNet [121] with fully convolutional networks, with the main difference that the concatenation of feature maps between the encoding and decoding paths was omitted. This means that, while in the work of [126] only the last feature map of a convolutional layer is fed into the transition layer, Zhu et al. [41] propose to keep all feature maps and concatenate them before passing them to the transition layer. It also avoids skip connections due to weak correspondence, and no max-pooling was used in the encoding layers; to compensate for this, a stride of two was used.
The overall accuracy of the model was 86 %. The reliability plot is shown in Fig. B.8.