An Automated Workflow for Hemodynamic Computations in Cerebral Aneurysms

In recent years, computational fluid dynamics (CFD) has become a valuable tool for investigating hemodynamics in cerebral aneurysms. CFD provides flow-related quantities, which have been shown to have a potential impact on aneurysm growth and risk of rupture. However, the adoption of CFD tools in clinical settings is currently limited by the high computational cost and the engineering expertise required for employing these tools, e.g., for mesh generation, appropriate choice of spatial and temporal resolution, and of boundary conditions. Herein, we address these challenges by introducing a practical and robust methodology, focusing on computational performance and minimizing user interaction through automated parameter selection. We propose a fully automated pipeline that covers the steps from a patient-specific anatomical model to results, based on a fast, graphics processing unit- (GPU-) accelerated CFD solver and a parameter selection methodology. We use a reduced order model to compute the initial estimates of the spatial and temporal resolutions and an iterative approach that further adjusts the resolution during the simulation without user interaction. The pipeline and the solver are validated based on previously published results, and by comparing the results obtained for 20 cerebral aneurysm cases with those generated by a state-of-the-art commercial solver (Ansys CFX, Canonsburg PA). The automatically selected spatial and temporal resolutions lead to results which closely agree with the state-of-the-art, with an average relative difference of only 2%. Due to the GPU-based parallelization, simulations are computationally efficient, with a median computation time of 40 minutes per simulation.


Introduction
Intracranial aneurysms are pathological disorders which consist of an abnormal dilatation of the vessel wall. In severe cases, the aneurysm may rupture, causing subarachnoid hemorrhage, which can lead to severe disability or death [1]. The incidence of unruptured aneurysms is high as it occurs in about 6% of the population; however, the rupture incidence is very low, 7.7 in 100000 cases annually [2].
Consequently, the treatment of unruptured aneurysms also has a high economic cost [3]. Due to its high incidence, it is critical to accurately identify the subset of patients with high risk of rupture and plan the treatment accordingly.
There are several treatment options available to decrease the likelihood of rupture. One possibility is to surgically clip the aneurysm at its neck, to isolate the aneurysmal dome, and to prevent bleeding [4]. Another solution consists of filling the aneurysm with thin wires that constrict the flow and initiate a thrombotic reaction leading to a complete occlusion [5]. A recently proposed approach is based on placing a flow-diverter device that reduces the flow inside the aneurysm by directing most of the flow through the main artery, and inducing intra-aneurysmal thrombosis [6][7][8]. To establish an accurate treatment plan and to evaluate the risk of rupture, a good understanding of aneurysm hemodynamics is required. These can be used to predict the flow before and after the implantation, i.e., to investigate hemodynamic changes and the potential benefits of different therapies and choices for the patient.
The recent advances made in medical imaging, algorithms for automatically extracting anatomical information from the images, as well as in modern computing architectures (like graphics processing units), have enabled the development of much simpler workflows relying on physics-based computational models for patient-specific hemodynamic assessment [9]. Blood flow computations, when used in conjunction with patient-specific anatomical models extracted from medical images, provide important insights into the structure and function of the cardiovascular system. These techniques have been proposed for diagnosis, risk stratification, and surgical planning [10][11][12].
An increasing number of researchers suggest that there is a strong link between flow-related quantities and aneurysm growth or risk of rupture. This is still a highly debated subject [13,14], and correlations found between hemodynamic quantities and aneurysm progression are not yet conclusive, as researchers proposed different quantities. Boussel et al. [15] suggest that aneurysm growth occurs in regions of low wall shear stress. Takao et al. [16] evaluated energy loss, a pressure loss coefficient, wall shear stress, and oscillatory shear index for the prediction of rupture in a set of 100 patients, suggesting that the pressure loss coefficient may be a potential parameter for predicting the risk of rupture. Furthermore, there is a debate on whether low or high wall shear stress is contributing to aneurysm risk of rupture [17]. To this extent, further efforts on integrating personalized blood flow computations in clinical workflows are crucial for developing a unified theory on aneurysm pathophysiology.
There are important technical challenges with the largescale adoption of CFD-based tools for the clinical hemodynamic analysis of aneurysms. The methods are computationally intensive and lead to large runtimes, which is in contradiction with the high cost pressure and the continuously decreasing time a clinician can dedicate to a patient. CFD computations are commonly performed based on a discretization of the Navier-Stokes equations, using either the finite element method (FEM), finite difference methods (FDM), or finite volume methods (FVM). Models based on implicit integration using FEM have the advantage of unconditional stability, along with the ability to easily adapt to complex anatomical structures but require significant computational resources [18][19][20] for the solution of the resulting set of discrete equations. To further investigate the potential link between hemodynamic quantities and aneurysm outcome, a large number of computations are required; hence, an efficient approach for simulating blood flow in patient-specific anatomical models of aneurysms is of paramount importance. Although CFD-based approaches are nowadays routinely used in medical research activities to compute hemodynamic quantities under patient-specific conditions, the only CFD-based solution currently used in clinical practice, available only as a service, is focusing on computing fractional flow reserve in coronary arteries [21].
Another possible limitation of employing CFD in clinical practice is the requirement of CFD-related expertise for performing such computations, e.g., mesh generation, defining the boundary conditions, and choosing the spatial and temporal resolutions. Typical approaches to this problem consist of employing automated local mesh refinement techniques [22][23][24][25]. This limitation has also been addressed by Seo et al. [26], where a solver implementation based on the immersed boundary method has been proposed [27]. Therein, simulations are performed on a Cartesian grid using a level-set function that is directly extracted from medical images, bypassing the need of mesh generation. Recently, on-site clinical solutions have been proposed for hemodynamic analysis, e.g., for the diagnosis of coronary artery disease [28], which are based on reduced-order modeling or machine learning, and do not require specific CFD-or hemodynamics-related expertise. No on-site solution has been proposed to date relying on three-dimensional hemodynamic modeling.
To achieve a fully automated workflow for patientspecific aneurysm hemodynamics, there are two main steps which need to be considered. The first one is the extraction of anatomical models from images, and the second one is the flow computation itself. Although extracting anatomical models is a difficult problem, there are many existing solutions, both fully and semiautomated [29]. We consider that the flow computation step is still a major challenge which needs to be addressed.
In this paper, we propose an alternative approach for further automating cerebral blood flow simulations, starting from a patient-specific anatomical model reconstructed from medical images. We use a reduced-order blood flow model for computing the initial estimates of the flow distribution in all the vessel branches, which is then used to compute an initial grid resolution and time step. Furthermore, we employ an iterative approach that, if needed, then further refines the grid resolution and time step during the simulation.
To address the computational performance challenge, we employ a GPU-accelerated implementation of the lattice Boltzmann method (LBM). In recent years, LBM has emerged as a strong alternative to traditional finite element method (FEM), finite difference methods (FDM), and finite volume methods (FVM) for modeling fluid flows [30,31]. Unlike FEM-based solvers, LBM does not require the use of complex meshing algorithms and operates on a Cartesian lattice, greatly simplifying the preprocessing step. Further, the highly local structure of the LBM algorithm results in an impressive performance on modern parallel architectures [32]. LBM has also attracted the attention in the context of cerebral flow simulation: Chopard et al. [33] employed an open source LBM implementation [34] for studying thrombus formation in a cerebral aneurysm, Bernsdorf and Wang [35] used LBM to study flow rheology in cerebral aneurysm and Závodszky and Paál [36] performed a validation study leading to good 2 Computational and Mathematical Methods in Medicine results when comparing different LBM implementations with a finite volume solver and experimental data. Excellent results were obtained using the herein proposed solution in a very recently published clinical research paper [37]. Therein, 338 aneurysms were analyzed based on clinical, morphological, and hemodynamic parameters determined using CFD. A lower pressure loss coefficient was identified as a significant risk factor for the rupture of small intracranial aneurysms.

Methods
2.1. The Lattice Boltzmann Method. The lattice Boltzmann method (LBM) emerged from the lattice gas automata and is based on a discrete representation of the linearized Boltzmann equation on a regular Cartesian grid, see equation (1). Unlike the continuum Navier-Stokes-based methods that directly act on the macroscopic quantities of the flow, LBM operates at the mesoscopic scale on the particle distribution function. The discretized Boltzmann equations is where f i ðx, tÞ is the probability of finding a particle at spatial location x and time t, traveling with a velocity c i , which belongs to a set of preselected discrete velocities. The righthand side of equation (1) represents the collision term and models the rate at which the fluid state moves towards the thermodynamic equilibrium. There are different formulations for the collision term, the most commonly used being the Bhatnagar, Groos, and Krook (BGK) model [38]. The BGK model is formulated as a linear relaxation of the distribution functions towards an equilibrium distribution f eq and is controlled by a relaxation factor τ which is directly related to the kinematic viscosity ν through the relation, τ = ðν/c 2 s Þ + 0:5. Here, c s is the speed of sound on the lattice, which can be computed for standard isotropic Cartesian lattices as c 2 s = 1/3. It is the most simple and efficient implementation, but the BGK model lacks numerical stability when τ approaches 0.5. This limitation has been addressed and has led to other collision models, e.g., the multiple relaxation time (MRT) model [39] and the entropic model [40]. Equation (1) is a partial differential equation where the unknown is the density function f i . It is typically solved using an explicit time discretization scheme based on two steps: collision (2) and streaming (3), which are applied at each grid point x: where f eq i is the equilibrium distributions and depends only on the fluid velocity u and density ρ. Ω is the collision matrix which contains parameters controlling the relaxation of the distributions towards the equilibrium. There are different formulations for both f eq i and Ω, depending on the chosen collision model, e.g., for the BGK model Ω = τI. The macroscopic velocity u and pressure P of the fluid are related to the density functions f i as follows: The discrete velocities c i are associated to a lattice structure, such that each vector c i corresponds to a link connecting a node x with a neighboring node x + c i . The most commonly employed lattice structures for 3D fluid computations are based on 15, 19, or 27 velocities. Our implementation is based on the multiple relaxation time (MRT) collision operator and a three-dimensional 19-velocity lattice [39]. The main justification for choosing the MRT collision model is the significant improvement in numerical stability at higher Reynolds number flows, compared to the classic LBGK approach. Although cerebral blood flow typically has low Reynolds number, the presence of an aneurysm can lead to more complex flow patterns which may require significantly finer grid resolutions for LBGK-based simulations. Furthermore, Závodszky and Paál [36] performed LBM simulations on an intracranial aneurysm using different collision models and showed that the MRT model is the most accurate for this flow configuration. In the MRT model, the relaxation matrix takes the form Ω = M −1 SM. The distribution functions f are first transformed using the matrix M into moments m = Mf , and the relaxation towards equilibrium is performed in the moment space using the relaxation matrix S. The relaxation matrix S is a diagonal matrix containing a relaxation parameter for each moment m i . Once the new moments m have been computed, they are transformed back to the f space using M −1 . For a more detailed description of the MRT model and for numerical values of the relaxation parameters, we refer to [39].
For lattice nodes located near a boundary, i.e., for which a neighboring node is located outside the fluid region, there are unknown f i values that are required to perform the streaming step (3). The most commonly used way to compute the unknown distributions is the bounce-back approach [41]: the unknown f i are set to the values corresponding to the opposite lattice direction f i ′ , such as c i = −c i ′ . This is equivalent to reversing the velocity of a particle colliding with the wall. Herein, we employ an interpolated bounce-back scheme [42] that can be used for curved walls, being able to take into account the exact location of the vessel surface between two lattice locations: 3 Computational and Mathematical Methods in Medicine ð6Þ u w is the prescribed velocity and q i is a factor with a value between 0 and 1 that accounts for the exact position of the wall between two lattice nodes (see below for more details). For the no-slip boundaries, i.e., the vessel wall, u w is set to zero, while for the vessel inflow, it is set to match the prescribed flow rate. We emphasize that this interpolated bounce-back formulation is capable of taking into account the exact location of the boundary. Although at a fundamental level the fluid domain is approximated as a staircaseshaped volume, the surface is described as an isosurface of a continuous and smooth scalar field; therefore, the boundary surface is independent of the chosen grid resolution and its exact location is always imposed. For the outflow boundary, the velocity is typically unknown and the pressure is imposed. In this case, the nonequilibrium extrapolation method [43] is employed, which replaces all the f i values at the boundary using information extrapolated from a neighboring location: where f neq = f i − f eq i is the nonequilibrium part of the distribution functions, and x neigh is a neighboring fluid node located along the boundary surface normal.

Grid Generation.
The vessel geometry is initially given as a surface mesh where each polygon is tagged depending on the surface to which it belongs: inlets, outlets, or vessel wall. Since LBM computations are performed on a Cartesian grid, the given mesh is voxelized and a level-set representation of the vessel geometry is obtained: a signed distance field ϕðxÞ such that ϕðxÞ < 0 for the inside (fluid) region of the domain and ϕðxÞ > 0 for the outside (solid) region. The distance field ϕ is computed by mapping each node to the closest polygon on the mesh and computing the signed point-to-triangle distance. The exact distance is only required for nodes located close to the boundary, to perform the interpolations described by equations (5) and (6), i.e., for computing the q i values. Hence, the exact distance is computed only for nodes located in a range of ±2δx on both sides of the boundary. For the rest of the domain, only the sign of ϕ is required, and for these nodes, the distance field is extrapolated from the boundary region range. This is an important aspect of the performance improvement aspect, since computing the exact distance for the entire domain would dramatically increase the execution time [44].
The level-set function alone can be used to determine if a grid node is located at the boundary; however, additional information is required to determine which type of boundary condition to apply at each boundary node, i.e., inlet, outlet, or solid wall. Hence, each grid node needs to be labeled accordingly. The labels are computed during the voxelization step by mapping each grid node to its closest polygon on the mesh. Boundary nodes that are located at the intersection of two surfaces of different types, i.e. an inlet and wall intersection or an outlet and wall intersection, are considered corner nodes, and a special labeling logic is employed. We found the logic of labeling corner nodes to be a highly sensitive aspect, as it has a significant effect on the flow when dealing with complex-shaped boundaries. A node x in the grid is considered to be a boundary node if ϕðxÞ ≤ 0, and there is an i such that x + c i is located on the other side of the surface, i.e., ϕðx + c i Þ > 0. A boundary node is considered to be a corner node if ∃i, j, such that the segment given by x and x + c i intersects an inlet or outlet surface, and the   Computational and Mathematical Methods in Medicine segment given by x and x + c j intersects a wall surface. Figure 1 shows a graphical representation of the node labeling process. In the following, we present the labeling logic for each type: (1) Inlet and outlet nodes: all nodes x for which ϕðxÞ ≤ 0, and there is an i such that ϕðx + c i Þ > 0, and the segment given by x and x + c i intersects an inlet or outlet surface, respectively (2) Wall nodes: all the unlabeled nodes x for which ϕðxÞ ≤ 0, and there is an i such that ϕðx + c i Þ > 0, and the segment given by x and x + c i intersects a vessel wall surface We emphasize that the node labeling steps must be performed sequentially, in the given order, to correctly prioritize labeling for the corner nodes.
The factors q i used in equations (5) and (6) for interpolating the distributions are computed using values of the signed distance field ϕ at the current node x and the neighboring node x + c i as follows: All the operations described above are completely automated: after passing the initial surface mesh, there is no user interaction required for setting up the simulation. Although these operations are computationally expensive, they are only performed once in the preprocessing stage of the simulation; hence, the impact on the overall computation time is small. Under typical simulation configurations, the entire preprocessing step occupies a very small fraction of the whole computation time, for example, using a grid of 23.3 million nodes and a surface mesh of 318000 triangles, it requires around 1.4 minutes of runtime. Furthermore, the preprocessing time was found to change very little with respect to mesh size but increases quadratically when grid size is increased.

Automatic Model-Based Parameter Selection.
Reducing the user interaction and the required CFD-related expertise represents an important aspect for employing a flow solver in a clinical setting. A key feature of our implementation is the automatic tuning of the time step δt and the spatial resolution δx, for optimizing accuracy and performance. To achieve this, we propose a heuristic approach based on the known LBM stability limits and some empirically chosen factors. More specifically, δt and δx are chosen to be as coarse as possible, but at the same time, to be small enough to capture relevant flow features and to satisfy LBM-specific stability constraints: where ν lbm and u lbm are the nondimensional kinematic viscosity and flow velocity, respectively. For the 19-velocity MRT implementation, the critical values are ν min = 2:54 × 1  Figure 3: 2D analogy of the indirect addressing scheme applied for handling a sparse, irregular fluid region on a Cartesian grid.  [39]; however, we used smaller values, u max = 0:15 and ν min = 1:00 × 10 −3 , to avoid coming too close to the stability limit. These critical values are defined in terms of a lattice-based unit system, which is different from the physical viscosity and velocities. The transformation between the lattice and physical unit systems is performed using unit scale factors: δx, δt, and δm for position, time, and, respectively, mass. For example, the following transformations are employed for velocity u = u lbm ðδx/δtÞ and pressure P = P lbm ðδm/δxδt 2 Þ.
Writing equations (9) and (10) for physical quantities, i.e., ν = ν lbm ðδx 2 /δtÞ and u = u lbm ðδx/δtÞ and collecting δx and δt leads to the stability conditions: We emphasize that the upper threshold for the velocity u max is expressed in lattice units, while the velocity magnitude kukðx, tÞ is expressed in physical units. Figure 2 displays  Computational and Mathematical Methods in Medicine a graphical representation of the stability range, i.e., the region between the two curves. Although the lower limit of the time step, for a given spatial resolution (i.e., equation (11)), is not common for typical CFD models, for LBM, the nondimensional viscosity ν lbm must be above the critical value ν min which depends on the chosen collision model. The chosen ðδx, δtÞ values that maximize performance while remaining in the stable region are found at the upper intersection of the two curves. As for δm, it is computed from the physical and nondimensional density as follows: The nondimensional density ρ lbm is typically set to 1; hence, δm = ρδx 3 . To compute the optimal ðδx, δtÞ values using equations (11) and (12), two quantities are required: the physical kinematic viscosity ν and the maximum physical flow velocity u max . Viscosity ν is known, however, since it depends on the vessel geometry (e.g., geometries with multiple outlets), there is no information regarding the maximum velocity u max at the beginning of the simulation. Hence, initially, δt and δx are computed using an estimated value, discussed below. As the flow is developing, the maximum flow velocity is continuously monitored, and if it exceeds the critical threshold value (equation (10)), δt and δx are adapted to satisfy the stability constraints. After a grid and/or time step refinement, the simulation is restarted from t = 0 using the new ðδx, δtÞ values.
The number of required refinements (and simulation restarts) depends on the accuracy of the initial estimate of the maximum flow velocity u max . For example, if the vessel geometry presents narrowing segments, the flow velocity increases locally due to convective acceleration and a high number of refinements would be required, resulting in poor runtime performance. Similarly, if the initial estimate is too high, then the grid resolution and time step may be too fine, also resulting in poor runtime performance. A good estimate of u max would consequently result in a good estimate of the required spatial and temporal resolutions and optimal computational performance. To improve the initial estimate of u max , we utilize a surrogate reduced-order model to estimate the distribution of flow in the vasculature. The reduced-order model is formulated as a local cross-sectionally averaged version of the Navier-Stokes equations, and therefore only solves for the total flow and pressure instead of the detailed velocity field. Such reduced-order models have been successfully used in the past to compute time-varying flow rate and pressure waveforms in full-body arterial models [45] and under pathologic conditions in specific parts of the circulation: coronary atherosclerosis [46], aortic coarctation [47], abdominal aorta aneurysm [48], and femoral bypass [49]. The one-dimensional model used herein has been previously introduced in [48] and was validated in several clinical studies [50,51] in the context of coronary artery flow.
The one-dimensional blood flow model is derived from the three-dimensional Navier-Stokes equations based on a series of simplifying assumptions [52]. The governing equations ensuring mass and momentum conservation are where q = qðx, tÞ is the flow rate at the axial location x and time t, Aðx, tÞ is the cross-sectional area, pðx, tÞ is the local pressure, and ρ is the density. Coefficients α and K R account for the momentum-flux correction and viscous losses due to friction, respectively. A state equation is employed to close the system of equations, defining a relationship between the local cross-sectional area and the local pressure.
where E is the Young modulus, h is the wall thickness, r 0 is the initial radius corresponding to the initial pressure p 0 , and A 0 is the initial cross-sectional area. Since the purpose of using the reduced-order model is to obtain a reliable initial estimate of the flow distribution for the rigid-wall LBM model, a very large wall stiffness is chosen, i.e., the elastic parameter is set to a very large value. At each bifurcation, the continuity of flow and total pressure (sum of dynamic and statis pressure) is imposed. The same inlet and outlet boundary conditions as for the LBM solver are employed, and the time-varying pressures and flow rates along the centerlines of the anatomical model are computed. The cross-sectional velocity profile is assumed to be parabolic; hence, the maximum velocity u max = 2u mean is approximated as where the maximum is taken for both the centerline position x and time t.
The approach of using a reduced-order model, by providing reasonable initial estimates, significantly mitigates the need for grid and time step refinement. However, since the refinement operations are affecting the total runtime performance, we apply a heuristic and reduce the initially estimated 7 Computational and Mathematical Methods in Medicine resolution by 20%, with the purpose of further improving performance. As a result, we found that only 1-2 refinement operations were needed on average for the 20 patient-specific aneurysm anatomies presented in Section 3.1.
We emphasize that the sole purpose of the reduced-order model is to provide a gross initial estimate of the required spatial and temporal resolutions, which are then further refined during the 3D simulation. Furthermore, it is not necessary for the presented 1D model to be employed for this task; any reduced-order flow model can be used, for instance [53][54][55].
Besides the two criteria described above for selecting the initial resolution (equations (11) and (12)), an additional criterion based on the geometry of the vessel was found to be necessary. Specifically, in cases where the vessel has a secondary branch with a very small radius and a corresponding very small flow rate, the estimated resolution may be too coarse for that branch. The additional criterion consists in limiting the δx value such that the minimum vessel diameter is represented by at least 15 nodes.

GPU Implementation.
In the past, most highperformance computations were executed on large clusters of computers, each capable of executing a small number of parallel threads. However, over the last decade, general purpose graphics processing units (GPUs) have shown a tremendous increase in performance. Each GPU is capable of executing thousands of low-overhead threads simultaneously. While this kind of performance was originally developed for supporting video applications, they have become indispensable for scientific computing and for accelerating the performance of machine learning algorithms.
The lattice Boltzmann method is inherently a highly parallel algorithm, owing to the largely local nature of the computations. As discussed earlier, there are two main   [32,[56][57][58][59]). In the following, we discuss the data structures used to optimize the LBM implementation for GPUs. Depending on the geometric complexity of the given vessel, the fluid region of the domain usually occupies a small fraction of the entire volume. Hence, allocating memory for all the nodes would be impractical and would reduce the maximum grid resolution due to memory constraint. Unstructured grids are inherently better at handling such geometric complexity, but they cannot be parallelized as easily. The main advantage of a structured grid is that one is able to access any node directly from its grid coordinates ði, j, kÞ, without performing a search operation. This is an important performance aspect since, during the streaming step (3), neighboring locations x + c i are required for each node x. To address this issue and at the same time maintain the advantages of a structured grid, we employed an indirect addressing scheme consisting of using an additional indexing array.
A two-dimensional analogy of the indirect addressing scheme is displayed in Figure 3. The index array contains integer indices and is used for mapping the grid coordinates ði, j, kÞ to a fluid node index. A location in the index array contains an index in the fluid nodes array or -1 if it corresponds to a solid node. Since the fluid region remains unchanged during the simulation, the content of the index array is computed only once during the preprocessing stage. The fluid node array contains the information necessary for describing the flow state: the f i values, macroscopic pressures, velocities, etc. In this implementation, we have one global Cartesian grid with uniform spatial sampling. The index array is defined at every node that belongs to this Cartesian grid, containing a default value of -1 for all nodes which are outside the fluid domain, and the actual index of the fluid node for valid locations. The macroscopic variables of interest (distribution functions, velocity field, pressure, and forces) are only defined for nodes which belong to the fluid domain, resulting in significant memory savings.
As described earlier, the nodes are tagged in the initialization procedure either as belonging to the bulk of the fluid or requiring appropriate execution of a boundary condition model. Each type of node needs to be handled separately since each one may have a different implementation for the collide and streaming procedures. For instance, in the case of inflow and bounce-back nodes, the streaming step (equation (3)) is replaced by the interpolated bounce-back scheme (equation (5) and (6)), whereas for the outflow nodes the streaming step is omitted because all f values are completely replaced in the collision step. Therefore, an array of global indices is created in the preprocessing stage for each node type; these arrays are used to select all nodes of the same type and apply the corresponding collide and stream procedures during the simulation.

Verification.
The numerical implementation of the lattice Boltzmann method was extensively validated in the past on analytical cases with known results, e.g., Womersley flow and channel flow [41,60]. Herein, we focus on validating the methodology on real patient anatomies. First, we performed experiments on a benchmark aneurysm model previously employed in [61] as part of the "Aneurysm CFD Challenge 2012," where the participants were required to perform CFD simulations and predict the flow. The case consists of a giant cerebral aneurysm with a proximal stenosis, displayed in    10 Computational and Mathematical Methods in Medicine submitted solutions and concluded that the pressure drop caused by the stenosis was reasonably well predicted among the vast majority of the participants. We performed simulations using the same configuration and compared our results against the solutions submitted for the challenge. A timevarying flow rate was imposed at the inlet boundary, and simulations were performed for two different pulsatile flows having the same waveform ( Figure 4), but different mean flow rates (5.13 and 6.41 ml/s). The spatial flow profile was flat for all simulations. At the outlet, we employed the Dirichlet boundary conditions for the pressure, imposing p = 0 everywhere. The initial pressure and velocity inside the fluid domain were set to zero. To reduce the transient effects that may be caused by the initial conditions, the simulations were performed for three cardiac cycles and results were extracted from the third cycle. The fluid was assumed to be Newtonian with a kinematic viscosity of ν = 4 mm 2 /s and a density of ρ = 1000 kg/m 3 . In Figure 5, we display the pressure and velocities along the centerline of the vessel computed by our model and compare them to solutions reported in [61], which correspond to solutions submitted by participants for the aneurysm challenge. The pressures in Figure 5 are computed relative to the inlet, i.e., the entire curve is shifted such that the inlet pressure is zero. Overall the centerline pressures match the published results well. As for the centerline velocities, the variability of the solutions is larger, especially in the regions close to the inlet and outlet. The main reason is that only the time-varying flow curve at the inlet was prescribed, and not the exact boundary condition to be used. This resulted in a mix of different boundary conditions, including flat profile, approximate parabolic profiles, and extensions at the inlet. Furthermore, in the outlet region, the variability is given by the turbulences that are produced by the stenosis and other geometric features of the vessel. Apart from the two regions with large variability, the LBM velocity field solutions match the aneurysm challenge solutions well. We emphasize that these computations were performed automatically, with no user interaction other than providing the surface mesh and the inlet flow rate (the spatial and temporal resolutions were automatically selected based on the approach described in Section 2).
In Table 1, we present the inlet-outlet pressure drop for LBM and the challenge solutions, for both pulsatile configurations. The median and the interquartile ranges are computed based on the challenge solutions. The LBMbased pressure drop values match the median values of the published solutions very well. In Figure 6, we present contour plots of the velocity field for the Pulsatile 2 configuration, for the LBM-based results, and for two solutions from [61], Nektar 1 and Nektar2, which are considered to be the reference solutions as they are based on a spectral element solver with high spatial and temporal resolutions.
To further validate our solver, we have performed simulations on 20 patient-specific aneurysm cases and compared the results against those obtained using a commercially available CFD solver (Ansys CFX, Canonsburg PA, http:// www.ansys.com/). The cases correspond to ten internal carotid artery (ICA) and ten middle cerebral artery (MCA) aneurysms. A more extensive verification of our solver on aneurysm cases was performed in [62]. Simulations were performed under the same configuration as herein, three cardiac cycles, and results were extracted from the last cycle only; a time-variable velocity is specified at the inlet boundary, while the outlet is set to have constant pressure. The grid resolution was automatically estimated using the approach proposed in Section 2.
We first compared two quantities which were previously shown to be important indicators for the risk of rupture in aneurysms: the pressure loss coefficient (PLc) and the average wall shear stress on the aneurysm dome (AvWSS) [16]. The pressure loss coefficient is a nondimensional quantity describing the relative pressure drop and is defined as follows: where P in and P out are the mean pressures measured at the inlet and outlet planes, respectively, while u in and u out are mean velocities measured at the same planes. The inlet and outlet planes used for computing PLc are placed perpendicular to the vessel centerline at approximately 1 mm before, and, respectively, after the aneurysm. WSS is typically computed using spatial velocity gradients; however, using MRT-LBM, the WSS can be extracted directly from the nonequilibrium moments m neq = m − m eq , as described in [63].
In Figure 7, we present the comparison for both PLc and AvWSS. Correlation between Ansys CFX and our proposed implementation appears to be exceptionally good for both quantities: the Pearson correlation was 0.999 and 0.993 for PLc and AvWSS, respectively, while the p value was 0 for both cases.
A quantitative comparison of the centerline pressures and velocity magnitude was performed for all twenty cases, and the results are displayed in Tables 2 and 3. Since most

12
Computational and Mathematical Methods in Medicine of the cases have multiple outlet branches, the comparison was performed for each branch separately. The centerline curves contain between 500 and 1000 points, and the differences were taken at each point and represented relative to the CFX quantity averaged over the entire centerline. More specifically, at each point on the centerline, the absolute difference was divided by the average CFX pressure or velocity: where d is the relative difference, x is the position along the centerline, and f is the quantity of interest, i.e, pressure/velocity. The average relative difference was 2.1% and, respec-tively, 2.21% for peak systole and cycle averaged pressure, and 1.85% and 1.73%, respectively, for the velocity. Furthermore, we computed the total pressure drop for all cases, both for the LBM-based and CFX results. Since the outlet pressure was always set to zero, the total pressure drop corresponds to the inlet pressures. Using these values, the relative difference between LBM and CFX was computed. The maximum difference was found to be 4.5% for the cycle averaged pressure and 5.14% for the peak pressure, corresponding to the MCA9 case, while the minimum differences were 0.05% and 0.36% for the cycle averaged and peak pressures, respectively, corresponding to the ICA6 case. On average, the relative differences were 2.15% and 2.5% for cycle averaged and peak systole quantities.
A visual comparison of the results corresponding to one of the cases is provided in Figures 8 and 9. Figure 8 presents  13 Computational and Mathematical Methods in Medicine the pressure, velocity, and wall shear stress (WSS) fields for both the LBM-based and CFX results, and Figure 9 presents the pressure and velocity plot along the centerline of the main branch for the same case.
Since LBM is based on an explicit discretization scheme, there is an implicit delay in the flow propagation from inlet to outlet. In our experiments, we found the delay to be about 20 ms for all cases and outlets. Although this is an undesired aspect, there is a certain delay in physiological conditions caused by wall elasticity. Unfortunately, an accurate quantification of this physiological delay is currently not possible.
Overall, the LBM results appear to match well the CFX results for both cycle averaged and peak systole quantities, without requiring any human intervention in selecting an appropriate grid resolution.

Convergence Study.
To demonstrate that the automatically selected grid size is sufficiently fine, we have performed a convergence study on two representative cases, i.e., an ICA and a MCA case. We performed computations with five different values of the spatial resolution: δx/2, δx, 2δx, 3δx, and 4δx, where δx is the automatically selected value. Since the computed spatial resolution and time step are typically chosen such that they are close to the stability limit, as described in Section 2.3, it would normally not be possible to perform simulations with coarser resolutions (2δx, 3δx, and 4δx) because of the chosen stability limits. Therefore, to perform these simulations, we set the stability for ν min and u max to the original values from [39]: u max was increased to 0.19, and ν min was decreased to 2.56e-3. Also, the third criterion (D min /δx > 15) was removed. Figure 10 displays the results as plots of the pressure and velocity magnitude along the vessel centerline. The green and blue plots correspond to the automatically selected, and, respectively, a twice as fine spatial resolution; the other plots correspond to coarser resolutions. The convergence trend is well visible for both cases. All results were compared against the reference values corresponding to the finest resolution (δx/2). Table 4 displays for each case the mean absolute errors as a function of the spatial resolution. For the simulations corresponding to the automatically selected spatial resolution (first column of Table 4), the maximum error relative to the mean quantity along the centerline is 2.18%. We emphasize that the purpose of these experiments is to show that running the computations at a finer resolution will not significantly change the results, therefore indicating that the automatically chosen resolution is sufficiently fine.
3.3. Computational Efficiency. All LBM computations were performed on a regular workstation with an NVIDIA GTX 1080ti graphics card. The GPU implementation was based on the NVIDIA CUDA version 8.0, and all computations were performed using double precision arithmetic. In Table 5 we report the LBM computation time for each case along with the corresponding grid size and time step. We emphasize that δx and δt were chosen automatically using the approach described in Section 2. The CFX simulations were performed on a cluster of three nodes of 12 CPU cores each. The mesh sizes ranged from 2.6 to 11.3 million tetrahedral elements and were chosen after a grid convergence study. For the flow configuration presented in this paper, the LBM computation time was found to vary between 10 minutes and 300 minutes, which is significantly faster than the typical runtimes reported using existing methods in literature [18,19]. We have only used a single, commodity GPU on standard workstation for the computations. The grid sizes varied between 50 μm and 120 μm, and the time step between 3 μs and 23 μs.

Discussion
Performing CFD computations is typically a challenging task, especially for complex flows like in the case of cerebral aneurysms. The main challenges are given by computational complexity, which leads to large execution times, but also by the requirement of having an experienced user for choosing solver parameters, mesh resolution, etc. Both of these are lim-iting factors that reduce the potential of employing CFDbased tools in clinical settings, where patient-specific computations need to be performed. Herein, we have addressed these limitations and proposed a novel methodology for performing hemodynamic computations in patient-specific cerebral aneurysms. The computational cost was significantly reduced by employing a GPU-based implementation of the lattice Boltzmann method. A CFD simulation for one  Figure 10: Convergence analysis was performed on an ICA case and a MCA case. The plots display peak systole and cycle averaged pressures and velocities along the centerlines. Bifurcations on the plot correspond to the bifurcation near the outlets. Green lines on each plot correspond to simulations performed using the automatically selected spatial resolution value δx, the blue line corresponds to a grid two times finer (δx/2), while the others correspond to coarser grids. cerebral aneurysm can be performed in a matter of minutes on a regular workstation compared to hours on expensive computing clusters. We performed computations on 21 aneurysm cases and the median execution time was 40 minutes using a single-commodity GPU. We emphasize that the measured time includes the preprocessing step, all three simulated cardiac cycles, and also the simulation restarts required for tuning the spatial resolution and the time step. Although the computation time may still be considered too high for employing such tools in a clinical setting, it can be significantly reduced by further increasing the parallelism, e.g., by using multiple GPUs simultaneously [32]. Furthermore, we found that there is a strong dependence between computation time and vessel geometry complexity, i.e., narrowing segments, curvature, and branching. As the flow develops more complex features, a finer resolution is required, which increases the computation time. Also it could be argued that real-time performance may not be necessary for aneurysmal flow computation in clinics as there is no reason for results to be obtained synchronously. However, performance could become important in a virtual treatment scenario where a clinician may want to explore the possibility of deploying an endovascular device and change different properties. Also computational performance could become important for other cardiovascular diseases where flowrelated quantities are needed immediately, e.g., coronary artery disease.
We showed that even though performance is significantly improved, the accuracy is not affected. We extracted the results from all simulations and compared them with other publicly available solutions and also with a commercial solver. First, we considered the solutions reported for a CFD challenge [61], which contains multiple simulation results obtained using various flow solvers and configurations. The comparison shows an exceptionally good agreement, as our solutions lies very close to median of the others, as indicated in Table 1. Furthermore, we performed a comparison of the velocity contours with two of the solutions reported in [61], considered to be the best resolved ones. Although the flow presents strong turbulences inside the aneurysm dome, the velocity contours appear to match very well. Lastly, we compared results with those obtained with a commercial solver (Ansys CFX) on 20 aneurysm cases. A good agreement between solutions was found, with an average relative difference of 2%. We emphasize that for all validation experiments, our results were automatically obtained using the proposed methodology, with no user interaction other than providing the vessel surface mesh. We further emphasize that this comparison is by no means a definitive process for evaluating the accuracy of our results; however, currently,  Furthermore, since the workflow is completely automated, a clinician may use such a tool without requiring CFD-related experience and expertise. To demonstrate that the automatically chosen grid size and time step are sufficiently fine, we performed a convergence analysis on two randomly chosen cases, by running simulations with both finer and coarser spatial resolutions. We found that a spatial resolution twice as fine led to a maximum of only 2.18% relative change in centerline pressure and velocity.
To formulate the criteria for grid refinement, we start from the stability constraints implicitly present in the LBM method, in combination with the prior information obtained through the reduced-order blood flow model. More rigorous criteria for grid refinement were also proposed in the past, for example, Axner et al. [64] proposed a criterion for cases where the flow is driven by a Womersley profile; similarly, Lagrava et al. [23] proposed a criterion for performing automatic grid refinement with LBM. Although we employed a rather heuristic approach, it is based on the same fundamental aspect: the error is proportional to the nonequilibrium part of LBM distribution functions.
As for the limitations of the proposed workflow, the most important one is the lack of patient-specific information regarding the boundary conditions used for the simulations, especially the time-varying flow rate and its profile imposed at the inlet boundary, the outflow resistances, and the viscosity models. Although these are important parameters, other studies indicate that flow quantities are much more sensitive to geometry than to flow parameters, i.e., changes in the anatomical model of the vessel, obtained as a result of the segmentation and reconstruction process, have the greatest impact on accuracy [65][66][67][68][69]. As further research will shed light on the relevant flow-related quantities having the greatest impact on aneurysm pathophysiology, progress will be made towards a unified modeling technique for aneurysmal flow, e.g., by improving the accuracy of the relevant flow aspects and determining which modeling methodology performs best.
We would like to emphasize that the purpose of including Ansys CFX in the verification process is solely for quantifying the accuracy of our results, by comparing them with a trusted state-of-the-art method which is generally accepted in the community. By no means, do we consider our implementation (or LBM methods in general) to be superior for this class of problems in terms of efficiency or other aspects.

Conclusions
We proposed a workflow aimed at improving the potential of using CFD-based tools in a clinical setting, as a tool for aiding decision making and establishing a personalized treatment plan for cerebral aneurysms. We addressed the main limitations: the requirement of CFD-related experience and the computational performance. To speed up the CFD computations, we employed a GPU-accelerated flow solver based on the lattice Boltzmann method. We showed that for this particular flow configuration, the computation time was reduced to minutes on a regular workstation, whereas typical CFD computation times are much larger even when performed on expensive computing clusters. We introduced a fully automatic pipeline for selecting the mesh resolution and time step using solely vessel geometry information, allowing thus clinicians to perform computations and obtain results with minimal CFD-related expertise. We showed that even though computation time was greatly reduced, there is no significant impact on accuracy. We performed computations on several aneurysm cases and compared results with other publicly available solutions and also with a commercial solver. An average relative difference of 2% between the solutions was found. Furthermore, a convergence study was performed, showing that the automatically chosen spatial resolution is sufficiently fine for the chosen case.
Future work will focus on further adapting the automated parameter selection process to also enable the inclusion of flow diverters in the simulation. We also consider further extending the workflow for a broader range of blood flow computations, e.g., the coronary arteries and aorta.

Data Availability
The patient data used to support the findings of this study were supplied by Department of Innovation for Medical Information Technology, Research Center for Medical Science, Jikei University School of Medicine, Tokyo, Japan, so cannot be made freely available.

Disclosure
The concepts and model presented in this paper are based on research and are not commercially available. Due to regulatory reasons, its future availability cannot be guaranteed.