Overhang control based on front propagation in 3D topology optimization for additive manufacturing

Abstract It is attractive to combine topology optimization (TO) with additive manufacturing (AM), due to the design freedom provided by AM, and the increased performance that can be achieved with TO. One important aspect is to include the design rules associated with the process restrictions of AM to prevent the requirement of relatively large support volumes during printing. This paper presents a TO filter that enforces a minimum overhang angle, resulting in an optimized topology that is printable without the need for support structures. The filter is based on front propagation, which, as it is described by a PDE, allows for a straightforward application on unstructured meshes, to enforce an arbitrary overhang angle. Efficient algorithms developed for front propagation are used in combination with adjoint sensitivities, in order to have a minor influence on the total computational cost. The focus of this work is on the implementation of the filter for high resolution 3D cases, which requires development of the front propagation for tetrahedral elements, and its parallelization.


Introduction
Additive manufacturing (AM) offers tremendously more design freedom compared to conventional manufacturing techniques, and geometric complexity has a much lesser relative impact on production cost. Therefore, it is frequently linked to topology optimization (TO), a computational design method that generates optimized designs for given objective and constraints, but often results in complex, organically shaped designs, which were difficult to manufacture until the emergence of AM [1,2].
However, in practice, AM is not completely free of manufacturing constraints. Many studies have been performed to identify design rules for AM [3][4][5][6][7]. Design for manufacturing practice states that ignoring manufacturing constraints during the design process will lead to extra costs during manufacturing [8]. When a topology optimized design is modified after optimization to incorporate the manufacturing constraints, optimality is mostly compromised. Therefore, the AM constraints should be included into the topology optimization to retain optimality while adhering to the manufacturing constraints.
The AM constraint that is the focus of this study is the overhang limitation. This constraint arises from the fact that each consecutive layer that is printed requires a certain amount of overlap with the previous layer, in order to have sufficient mechanical support and/or heat dissipation [9]. Therefore, the distance with which each layer can "overhang" the previous layer is limited. The degree of overhang is usually measured as the angle between a downward-facing surface and the base plate. The minimum allowable overhang angle α oh depends on the type of process and material, hence a general overhang constraint should be able to handle a range of minimum overhang angles [10]. In the remainder of this paper, surfaces are termed overhanging if they have an overhang angle smaller than the minimum overhang angle.
Incorporation of AM constraints into topology optimization has recently become an active field of study. Various papers have presented 2D approaches [11][12][13][14][15], but relevance to industrial practice requires methods to be highly effective in general 3D settings. Therefore we shall focus our discussion on the studies that show a 3D implementation of the overhang constraint. For a more comprehensive review the reader is referred to [16]. Generally, the overhang constraints can be classified into three categories: local boundary angle control, geometrical AM process modelling, and physics-based AM modelling.
Constraints in the first category attempt to detect the topological boundary and constrain the overhang angle locally. This has been applied in [17,18] and [19].
The geometrical AM process modelling constraints also enforce a given overhang angle, but do so by scanning the topology in the print order to detect overhanging areas -crudely mimicking the printing process [20][21][22][23]. Finally, the physics-based constraints incorporate a more elaborate model of the printing process, by modelling the manufacturing, e.g., as a series of self-weight loads [19,24].
The local boundary angle control methods usually converge to sub-optimal local minima, generating saw-tooth like structures that are not manufacturable, whereas the physics-based constraints, although potentially providing more details, are numerically expensive since they generally involve one or more finite element analyses to model the printing process. Therefore, this study presents an overhang constraint of the geometrical AM process modelling type. Compared to the existing methods, it can handle unstructured meshes and variable overhang angles naturally, as opposed to the work of [20] which would require an additional mapping as detailed in [25]. Furthermore, there is no additional non-linearity introduced by a heaviside projection filter [23], and no additional filtering is required to prevent floating supports [21]. Finally, it can be implemented in any density based topology optimization as opposed to [22] which uses a custom topology description.
This paper presents the extension to 3D of the front propagation based overhang filter presented in [15]. While our earlier 2D implementation validated the concept of detecting and eliminating overhang using front propagation, the 3D implementation allows true 3D high-resolution design for AM. The non-trivial steps required in expanding from 2D to 3D are (i) efficiently propagating the front on the element level, and (ii) parallelizing the front propagation. For these challenges novel solutions are proposed. Furthermore, improvements have been made to control the length scale of the overhang filtered design. The paper is organized as follows. In Section 2 the method of overhang detection and the incorporation of the filter in TO is explained. Section 3 provides the details of the numerical implementation in 3D. Numerical results are presented in Section 4, and Section 5 concludes this paper.

Detecting overhang using front propagation
The idea of using front propagation for overhang detection originates from the printing process itself, which can be seen as a front evolving with each printed layer. Similar to the real printing process, the front propagation is initiated at the base plate, defined by a boundary ∂Ω 0 . The front propagation can be described by an arrival-time field T (x), which represents the pseudo-time at which the front reaches to location x. The core idea of our method is to construct an arrival-time field in a 3D design that distinguishes printable regions from overhanging ones.
Consider the structure given in Fig. 1a on which the overhanging region should be identified, when printed in the build direction indicated by b. The first arrival-time field that is required for overhang detection is termed the layer arrival-time field, T layer (Fig. 1b). It describes the sequence of printing, as each isosurface of this field represents a printing layer. The arrival time of each layer is a measure of distance to the base plate, as can be seen in Fig. 1b. Subsequently, a second arrival-time field, T AM , is constructed that includes information on the overhang limitation of an actual AM process (Fig. 1c). It is constructed by using a front propagation that results in T AM = T layer in printable regions, but delays the propagation when the direction of propagation is lower than a given overhang angle α oh (i.e. violating the overhang limitation). By doing so, the isosurfaces of T AM are not flat as T layer , but bend around corners, as displayed in Fig. 1c.
The final step to detect the overhanging regions is to compare the two arrival-time fields T layer and T AM . The delay field τ of the T AM field is defined as When τ (x) > 0, the AM front is delayed compared to the layer front, which implies that the AM front has propagated in a direction lower than α oh . Consequently, overhanging regions are identified as those regions where τ (x) > 0. This can be seen in Fig. 1d, where the region τ (x) = 0 is manufacturable, and the rest is overhanging and, thus, cannot be manufactured. Furthermore, the value of τ (x) is a measure for the distance to the closest manufacturable region, giving an indication of how much material it would require to support a certain location. This continuous measure of overhang is beneficial for gradient-based optimization used in topology optimization.

Constructing the arrival-time fields
The construction of T layer is straightforward. Since the arrival time is a measure of distance to the base plate, this field is defined as: where b is a unit vector pointing in the build direction, f 0 the propagation speed, which can be interpreted as the printing rate, and assuming that the origin of the coordinate system is on the base-plate. For the second arrival-time field, T AM , the front propagation can be written as a boundary value problem and is governed by the Hamilton-Jacobi-Bellman equation: x ∈ ∂Ω 0 , min where Ω is the domain under consideration, T (x) is the arrival time at location x, fixed at T 0 at the boundary ∂Ω 0 , a is a unit vector indicating the local propagation direction: a ∈ S 3 , S 3 = { a ∈ R 3 | ∥a∥ = 1 } , and f s (x, a, α oh ) is a speed function, giving the propagation speed for a given location, propagation direction and overhang angle. The speed function is decomposed in a part dependent on the location, and a part dependent on direction of propagation: Let us for now ignore the location dependence, which is detailed in Section 2.3, and focus on the speed function f (a, α oh ). In order to be able to detect overhang, the speed function f should be chosen with care. The simplest speed function, used for isotropic front propagation (e.g. wave propagation in isotropic media), is f = c, where c represents a constant propagation speed irrespective of the propagation direction. This is depicted in Fig. 2a, where the distance from the origin (the red dot) to the surface gives the propagation speed in each direction. However, it is difficult to detect overhang with this speed function, as no information on the minimum overhang angle is governed by this speed function.
For a front that mimics the printing process, a conic propagation profile, as depicted in Fig. 2b, is suitable. It can be seen that the front propagation speed reduces to zero below the minimum overhang angle, and the propagation speed increases when the front propagates in directions other then the build direction, to compensate for the larger distance it travels to reach the next layer. Basically, the cone represents the region that can be built when starting from a single point. Propagating a front with this speed function would indicate the printable regions in a topology. Unfortunately, this speed function is numerically difficult to propagate, as it has a large anisotropy and a zero sideways velocity, which would result in infinitely large arrival times.
For numerical convenience, the conic speed function is changed to a cylindrical speed function as displayed in Fig. 2c. Compared to the conic speed function it has the same profile for propagation directions above the minimum overhang angle. For other directions, the speed is not set to zero, but to a finite value governed by the surface of the cylinder. This means the front can still propagate in non-manufacturable regions, albeit at a slower speed, and hence it will be delayed. Next to being numerically more tractable compared to the conic speed profile, the cylindrical speed function also has the benefit of yielding information on the severity of the overhang in terms of the delay, which will be utilized during the optimization. The cylindrical speed function for a given overhang angle α oh and propagation direction a is described by the following equation (see [15]): where P is the projection on the plane orthogonal to b, defined as P = (I − b ⊗ b), with x ⊗ y denoting the outer product between the vectors x and y. Propagating a front with this cylindrical speed function gives the required arrival-time field T AM , as shown in Fig. 1c.

Incorporation in topology optimization
Thus far, the front propagation was carried out on a given part domain. However, with topology optimization the geometry is implicitly defined. In this study, density-based topology optimization is used [26], where the topology is defined by a pseudo-density field ρ(x), which indicates for each location if it is either void (ρ = 0), contains material (ρ = 1), or has an intermediate value (0 < ρ < 1). The topology optimization algorithm with overhang filter is schematically depicted in Fig. 3. In the first step, the pseudo-density field ρ is filtered to control length scale and to prevent checkerboarding [27]. The filtered densities ρ * are thus a weighted average of the densities in their surrounding region up to the filter radius r [28].
The filtered densities ρ * are input to the overhang filter, which results in the printable density field ξ , a density field similar to ρ * but with the overhanging regions removed. The objective and constraint evaluation is then carried out on ξ , instead of the filtered densities ρ * one would normally use.
The overhang filter comprises three steps. First, the filtered densities are processed to serve as a scaling field φ for the speed of the front propagation. The purpose of the speed scaling field φ is to only allow the front to propagate through material regions, and to delay it in void regions. With this step, the geometry given by the density field ρ * is coupled into the front propagation; otherwise, the front propagation would not be influenced by the density field at all. For simplicity a linear relation is chosen, and the speed scaling field φ is defined as where v void (0 < v void < 1) is the scaling of the propagation speed in void regions, which is typically chosen as 0.5. As stated in Eq. (4), the speed function for the front propagation (Eq. (5)) is scaled linearly with φ: Then, in the second step, the front propagation is performed with the scaled speed function f s , to obtain the arrival-time field T AM .
In the last step of the overhang filter, the arrival-time field T AM is post-processed to obtain the printable density field ξ . First, the delay is evaluated (Eq. (1)): which is transformed into the printable density field ξ by a function h: Since manufacturable regions are defined as those regions where the delay τ = 0 (Eq. (1)), h should be such that h(0) = 1. Regions with a delay τ > 0 are not manufacturable or void in the original design, and therefore h should decrease towards zero for increasing values of τ . The choice of h dictates the transition of material to void in the printable density field. In order to retain the original length scale of the density filter with a filter radius of r , the following relation has been chosen: where smax p is a smooth maximum operator, and p determines the smoothness. The relation between ξ and τ is displayed in Fig. 4. In the manufacturable regions where τ = 0, the printable density becomes ξ = 1, and for higher values of p, ξ decreases linearly towards 0 with increasing delay, comparable to a density filter [28]. As can be seen, p should not be chosen too small, as that will result in printable densities significantly larger than one for τ = 0. In this study, p is chosen as p = 10. The relation specified in Eq. (10) implies that when the front is propagating at the void speed v void , the transition from ξ = 1 to ξ = 0 will take place within the original filter radius, as depicted in Fig. 4. However, this is not true for every transition from ξ = 1 to ξ = 0. First of all, due to the original density filter, the propagation speed will not decrease to v void instantaneously, but gradually ramp off, and second, the propagation speed is not equal in the build direction and in directions orthogonal to the build direction if α oh ̸ = 45 • . Therefore, the length scale can be influenced by the application of the overhang filter. If an exact length scale is required, the overhang filter can be applied before the density filter. However, this will reintroduce some overhang due to rounding of the density filter, and therefore the density filter is applied first in this work.
With h specified in Eq. (10), the prescribed values T 0 for the front propagation at the boundary can be determined (Eq. (3)). Since material points at the base plate ∂Ω 0 are printable, it should hold that At the base plate, T layer = 0, and thus ξ = h(τ ) = h(T 0 ). To satisfy Eq. (12), the initial arrival time at the boundary is chosen as where h −1 is the inverse function of h such that h −1 (h(ρ * (x))) = ρ * (x). Note that due to the logarithm, T 0 = ∞ for ρ * = 0. If infinite values cannot be used, T 0 should be set to an arbitrary high value wherever ρ * = 0. In the local update, the arrival time of a node x i is calculated from three nodes with known arrival times, by finding a location x c on the triangle Q spanned by these nodes, that minimizes the arrival time at x i .

Numerical implementation
The two main challenges of the 3D implementation of the overhang filter are (i) parallelizing the front propagation problem to achieve the desired scalability with problem size, and (ii) efficiently propagating the front on element level, i.e. the local arrival time update, which is a key component in the numerical front propagation scheme. This section is intended to aide with the numerical implementation of the method and mainly concerns implementation details. Readers who want to obtain a general overview of the front propagation-based overhang filter can choose to continue with the numerical results in Section 4.

Updating local arrival time in 3D
Most front propagation algorithms contain a function where within an element the arrival time of one node is calculated based on the known arrival time of the other nodes. This local update is used to propagate the front throughout the domain, as with each newly calculated node, arrival times for other unknown nodes can now be computed. This update is performed multiple times for each node from different directions, up to the number of elements that contain the node. Eventually, the update resulting in the lowest arrival time is accepted. Because of the multiple evaluations per node, it is the function that is called the most times in the propagation algorithm and thus it is essential that the update is numerically inexpensive. The local update is the only part of the algorithm that is different from the 2D implementation presented in [15], although a similar approach is taken to evaluate the local update.
In the following, the local update is detailed for a tetrahedron, as any other polygonal element can be constructed from it. It is possible to define the local update for other element types as well. Consider a tetrahedron x i x j x k x l as displayed in Fig. 5, where the arrival time T i at node x i is unknown, and the arrival times T j , T k , and T l on the remaining nodes are known. The triangle Q = x j x k x l spanned by the nodes with known arrival times. If these arrival times are equal, Q is the front of the propagation. A point x c on this triangle can be defined in parametric form: with 0 ≤ ζ 1 ≤ 1, 0 ≤ ζ 2 ≤ 1 and ζ 1 + ζ 2 ≤ 1. The arrival time at x c on the front is a linear interpolation between the known nodes and is defined as Now, the arrival time T i , assuming it is updated from x c , can be calculated as the distance ∥χ ∥ between both points divided by the propagation speed f s , plus the arrival time at x c , T c : where χ = x i − x c , a is the direction of propagation defined as a = χ / ∥χ ∥ (Fig. 5), and f s (φ i , a, α oh ) is the speed function as defined in Eq. (7). Note that the propagation speed is only dependent on the speed scaling value φ i , and thus density ρ i , of the target node. This is done to simplify the front propagation and sensitivity evaluation, as it introduces less dependencies as compared to interpolating the speed scaling values φ of all the nodes in the element.
What remains is to determine from which point x c on the triangle Q the arrival time of x i should be updated. Consequently, the local update is essentially a minimization problem where a point x c on triangle Q is sought that minimizes the arrival time T i at the unknown node. Following [29], the equation for the local update for node x i as updated from triangle Q is given by where T i jkl is the arrival time at node i when evaluated from nodes j, k, and l. The final arrival time of node is the minimum of its local updates. The remainder of this section will detail the solving of the minimization problem posed in Eq. (18). First, a list of potential locations that minimize Eq. (18) is determined, by carefully examining the possible scenarios in the minimization problem posed by the maximum and absolute function present in the speed function f s (Eq. (7)). Then, the arrival times resulting from these locations are evaluated and the minimum arrival time is selected.

A closer look at the local update function
The minimization problem posed by Eq. (18) is first solved on the plane that contains the triangle Q. It is later evaluated if the minimum is actually inside Q (i.e. 0 ≤ ζ 1 ≤ 1, 0 ≤ ζ 2 ≤ 1 and ζ 1 + ζ 2 ≤ 1). When the speed function f s as defined in Eq. (7) is substituted in Eq. (18), it can be rewritten as Here |b · χ | represents the length of χ projected on the build direction b, and ∥Pχ ∥ represents the length of χ projected on the plane orthogonal to the build direction. This can be seen as a right-angled triangle, where χ is the hypotenuse, and |b · χ | and ∥Pχ ∥ are the lengths of the legs, as displayed in Fig. 6. Next, the maximum function in Eq. (19) can be interpreted as follows. Let C be a double cone with vertex x i , its axis parallel to the build direction b and its aperture such that the generatrices of the cone make an angle α oh with the base plate (see Fig. 6). The following relations apply: This implies that inside the cone (the red area in Fig. 6), |b · χ | is dominant, and outside the cone tan(α oh ) ∥Pχ ∥ is dominant. This is also drawn for 2D in Fig. 7, where it can be seen that for any plane, indicated with the blue line, that does not contain x i , |b · χ | is minimized outside the cone, and ∥Pχ ∥ is minimized inside the cone. The maximum function is thus minimized on the cone edge ∂C when tan(α oh ) ∥Pχ ∥ = |b · χ |. Besides the maximum term, Eq. (19) also contains the interpolation of the known arrival times T c , which adds a linear field to the maximum term. Since the linear field T c has no interior optimum (Eq. (16)), the minimum of Eq. (19) will either remain on the cone edge ∂C when the gradient of this linear field is small, or the minimum will be at the bounds of the domain when the gradient of the linear field dominates. Based on these observations, the minimization problem is examined for two possibilities: 1. The maximum term is dominant. The minimum occurs at the cone edge ∂C. 2. The arrival time interpolation term is dominant. The minimum occurs outside of the cone C, on the edge of the triangle Q.
Both scenarios are discussed subsequently.  6. Given a point x c on the triangle Q = x j x k x l , the arrival time at x i is dependent on either the distance |b · χ | inside the given cone, or ∥Pχ∥ outside this cone, due to the maximum operator between the two (Eq. (19)). Minimum on cone edge. On the cone edge, both terms that appear on the maximum function in Eq. (19) are equal, and either one can be chosen to minimize over the cone edge. Since |b · χ | is piecewise linear, it is preferred over the quadratic function ∥Pχ∥. First, this function is minimized on the plane defined by the triangle Q. The update can be cast in the following form (see Appendix for details) where g contains the gradients of the update functions inside the cone, K, l and c 6 are parameters defined by the geometry of the cone-plane intersection, and y = [ ζ 1 ζ 2 ] T . This formulation minimizes a linear function on a cone-plane intersection, and can be solved explicitly using Lagrange multipliers ( Appendix).
If the minimum found by solving Eq. (23) is not inside the triangle Q, the minimum might occur on the intersection of an edge of Q with the cone. In order to find the edge-cone intersections, an algorithm described in [30] is used, which solves the following quadratic equation for a given edge x j x k and unknown x i : Minimum outside cone. If the minimum lies outside of the cone, the following equation is to be minimized on each edge of triangle Q: This equation can be rewritten as The stationary points are found at This again gives two solutions for γ , from which the minimum locations are calculated with Finding the minimum. Each of the minimization problems posed in the previous two paragraphs returns a number of potential minimizers x c for Eq. (18). Since the minimum might not be inside the triangle, and not on the edges, the three corners of Q are also appended to the list of potential minimizers. Finally, the true minimum can be found by evaluating T i with Eq. (17) for each potential minimizer, and accepting the one with the lowest value. Note that it is possible to exclude potential minimizers by looking at the second derivative, however because the evaluation of the minimum is extremely cheap, this did not result in a speed up.

Parallel front propagation
The local update described in the previous section can propagate the front from three nodes with a known arrival time to a fourth node. Another aspect of a front propagation algorithm is the order in which the nodes are updated. This is independent of the spatial dimension of the design domain, and in principle one could use the Ordered Upwind Method (OUM) [29], similar to the 2D overhang filter as presented in [15]. From this work it followed that in 2D, the computational time of the overhang filter scaled roughly linear with number of DOFs, and was about one to two orders of magnitude smaller, compared to the computational time of the finite element analysis (FEA) for a stiffness optimization. In 3D, the computation times continue to scale favourably for the front propagation. However, the FEA associated with the objective evaluation is commonly parallelized, reducing its computation time proportional to the number of processors available. It is therefore paramount to parallelize the front propagation as well, such that the overhang filter does not become the main computational burden of the topology optimization.

Sequential ordered upwind method
For the sake of completeness, we briefly discuss the sequential OUM detailed in [29]. The OUM is an extension of the Fast Marching Method (FMM) [31,32] that can also handle anisotropic speed functions. In the OUM, nodes are labelled either Far, Considered or Accepted. The Accepted nodes have their final arrival times, i.e. are not subject to further change. The Considered nodes have been updated at least once from the Accepted nodes, but their arrival time values might still change. Finally, the Far nodes have not yet been updated and their arrival time are initialized at ∞. Furthermore, Accepted nodes can have the additional label AcceptedFront when they are adjacent to at least one Considered or Far node. The AcceptedFront is the current frontier of the propagation. When a node is updated, its arrival time is calculated from the nodes on the AcceptedFront within a radiush(F 2 /F 1 ), whereh is the maximum element edge length, and F 1 and F 2 the minimum and maximum of the direction dependent part of speed function given in Eq. (5). Thus, F 2 /F 1 represents the anisotropy ratio of the speed function, and for α oh = 45 • this is 2. The domain is initialized with all nodes labelled Far, except for the boundary nodes on ∂Ω 0 , which are labelled Accepted. Next, the Far nodes adjacent to the boundary are updated from the AcceptedFront nodes, using the local update described in the previous section, and become Considered. The algorithm then proceeds as described in Algorithm 1. When the algorithm terminates, all the nodes are in Accepted with the correct arrival times.
Algorithm 1 Front propagation 1: Move the Considered node with the lowest arrival time to Accepted. 2: Compute arrival times for the Far and Considered nodes within the radiush(F 2 /F 1 ) from the latest Accepted node, with adjacent nodes in the AcceptedFront, using the local update (Section 3.1). If the computed arrival time is smaller than the current value, update the arrival time. Label the nodes as Considered if they were previously labelled as Far. 3: If Considered is not empty, move to Step 1.

Parallel ordered upwind method
Several parallel implementations of the FMM have been presented (e.g. [33][34][35][36][37]). However, to the best of our knowledge, no parallel implementation of the OUM has been published. We therefore present the parallel implementation of the OUM by adapting the algorithm presented in [33] to accommodate the anisotropic front propagation.
The parallelization of the FMM presented in [33] is based on a domain decomposition. At the basis of the parallel implementation of the OUM lies the sequential front propagation described in Algorithm 1, but within a subset of the entire domain, i.e. the narrow band. That implies that Algorithm 1 is terminated either if Considered is empty, or if the node in Considered with the lowest arrival time has a value higher than a given band limit. As such, the front is only propagated a limited distance, instead of through the entire domain. Furthermore, for anisotropic speed functions, nodes are not necessarily updated from adjacent nodes. Therefore, there must be an overlapping region between adjacent domains, termed the ghost region, at least as wide as the update radiush(F 2 /F 1 ). Finally, the front propagation must be redone for all the Accepted nodes that have an arrival time higher than the updated ghost nodes, because these Accepted nodes can possibly get a lower arrival time from the ghost nodes updated from adjacent domains. An outline of the parallel OUM algorithm that is executed in every parallel domain, is listed in Algorithm 2. The algorithm mainly consists of a loop which performs a front propagation, updates the domain boundary with parallel domains, rolls back the front propagation, and performs a corrective front propagation with the updated ghost nodes.
In the domain decomposition strategy as proposed by [33], each processor has a fixed part of the domain. This has the benefit that only boundary information needs to be communicated to other processors. However, it also results in a large amount of idle time, as the front will only propagate briefly through each decomposed domain. We therefore implemented a shared memory approach per machine, where every processor can work on every domain. By having many more domains than processors, there is little idle time as each processor can pick a domain in which the front needs to be propagated. The downside is that between each machine, not only the boundary nodes, but all the information of every domain that has been updated needs to be shared, in order to allow every processor to work on every domain. This large amount of data communication could be avoided by assigning a fixed domain to every machine, and then further dividing these domains locally. This is however left for future work.
Algorithm 2 Parallel front propagation 1: Perform a front propagation up to the band limit. 2: For each ghost node determine the domain containing the minimum value, and scatter that value to the other domains. 3: Exit if there are no updated nodes and Considered is empty. 4: Move Accepted nodes that have an arrival time higher than or equal to the updated ghost node with the lowest arrival time to Considered, and add these nodes to the heap. 5: Perform a front propagation up to the band limit. 6: Increase the band limit, move to Step 1.

Sensitivities
The sensitivities of the front propagation problem can be calculated efficiently in an adjoint sense, as shown in [15]. The adjoint is calculated backward, in exactly the opposite order as the front propagation. However, when the front propagation is performed in parallel, the order of operations is no longer clearly defined, as several nodes are accepted simultaneously in different domains. Fortunately, for every node it is known from which nodes its arrival time has been calculated. Assume that for every node i, the array dependence(i) contains the indices of the nodes from which the arrival time of node i is calculated. In order to calculate the adjoint in the correct order we therefore propose the strategy outlined in Algorithm 3.
The sensitivity calculation can also be parallelized, where essentially this algorithm is executed in every domain, until the queue is empty, then the dependency and adjoint values of the ghost nodes are exchanged, and Algorithm 3 is restarted. This is repeated until all the sensitivities have been calculated. for all j ∈ dependence(i) do 7: Propagate the adjoint to node j

Numerical examples
The proposed front-propagation-based overhang filter is demonstrated on three cases, presented in the following, using the optimization scheme presented in Section 2.3. In all three cases compliance is minimized, and the optimization problem reads where f and u represent the load and displacement vectors, respectively. K(ξ ) is the stiffness matrix dependent on the printable densities ξ . V lim is the maximum allowed volume fraction and ρ min = 0.01. For more details on the compliance optimization problem and its sensitivities, the reader is referred to [26]. In all cases, a density filter is used with a radius of roughly 2h, whereh is the average edge length of the elements. The domains are discretized with linear tetrahedral elements using Gmsh [38]. Furthermore, the material properties are chosen as E = 1.0 N/m 2 and ν = 0.3, and the SIMP-model is used to interpolate the Young's modulus with p = 3 [39]. The topology optimization code utilizes the Portable, Extensible Toolkit for Scientific Computing (PETSc), which is used to parallelize the density filter and the FEA, and to handle the unstructured mesh [40][41][42]. The optimization is performed with the Method of Moving Asymptotes (MMA) [43], with default asymptote increase and decrease parameters of 1.2 and 0.7, respectively. For this, the PETSc based MMA class of the code presented in [44] has been used [45]. The topologies are visualized by displaying the printable density field for ξ ≥ 0.5. This gives an isosurface of ξ = 0.5 inside the domain, and ξ ≥ 0.5 at the bounds of the domain, as can be seen in Fig. 8. The topologies are displayed without any smoothing, and are obtained using ParaView [46].
Furthermore, similar to the 2D implementation, it is preferred to initially deactivate and then gradually activate the overhang filter, in order to apply the overhang filter to a better initial design. This is achieved with a linear Fig. 8. The cantilever beam design domain and boundary conditions. The light red surface at x = 0 is fully clamped, while a distributed load is acting on the front rib, and the build direction is equal to the positive z-axis. The displayed topology is optimized without overhang filter. The density field is displayed for ξ ≥ 0.5. This gives an isosurface of ξ = 0.5 inside the domain, and ξ ≥ 0.5 at the bounds of the domain.
interpolation between the filtered densities and the printable densities: where c is the continuation variable. The objective and constraints are then evaluated onξ instead of ξ . The continuation scheme applied is as follows: c = 0 for the first 10 iterations, then c is increased by 0.1 every iteration until c = 1 at iteration 20. Finally, all optimizations are terminated after 300 iterations. It was found that for the larger 3D cases, it can easily take up to 1000 iterations before conventional convergence criteria are met, such as the maximum change in design variables smaller than 1 · 10 −2 , or the relative objective change below 1 · 10 −6 . However, after 300 iterations the change in design is minute, and the largest improvement in objective observed by allowing the optimization to run for 1000 iterations instead of 300 was around 0.5%. Therefore, in order to save computational time, and also to mimic a practical engineering application where computational time will be the limiting factor, the number of iterations has been limited to 300, instead of waiting for convergence.

Case 1: the cantilever beam
The overhang filter is first demonstrated on the well-known cantilever beam problem, for several different overhang angles. The design domain is a block with aspect ratio 2:1:1, which is fully clamped on one side and a distributed load is acting on a rib on the opposite side, as displayed in Fig. 8. The resulting design is to be printed in the positive z-direction, as indicated by the build direction b, and the base plate coincides with the bottom of the domain (z = 0). The filter radius is chosen to be 15 mm, the volume constraint is set at 20%, and the domain is discretized with an unstructured tetrahedral mesh, consisting of 4.3 · 10 6 elements and 756 · 10 3 nodes, resulting in an average edge length of 8 mm.
The resulting designs are shown in Fig. 9. It can be seen that the design without overhang filter contains a large overhanging surface at the top. With an increasing allowable minimum overhang angle, the design changes towards a topology with two separated columns, connected with a plate to the bottom. A similar design can be observed in [20]. In Fig. 9c the topology is coloured by the angle between the surface and the base plate, with α = 0 • implying a surface parallel to the base plate. The view is from underneath the domain, with the bottom part cut off to allow a view on the downward facing surfaces. It can be seen that for the topologies optimized with overhang filter, there are a lot of green areas, which are surfaces that lie exactly at the minimum overhang angle. Overall the overhang constraint is satisfied, except in some small regions when two surfaces meet, or some individual elements due to the mesh roughness. These small localized overhanging regions are not an issue in AM practice, as they can be printed without support.  Fig. 8 has been used. In row (c), the colours represent the angle α between the base plate and the isosurface, chosen such that yellow-red colours indicate overhanging surfaces for the given overhang constraint. It can be seen that all designs are practically free of overhang.

Convergence and objective
An interesting aspect of applying an overhang filter is the effect on the convergence behaviour and the final objective. The convergence graphs for the cantilever beam problem are displayed in Fig. 10. For the first 10 iterations, the overhang filter is inactive and thus all optimizations have the same objective values. During iterations 10-20, the overhang filter is activated gradually as descried in Section 2.3, and there is a significant increase in objective for α oh = 45 • and α oh = 60 • . Then, the designs adapt to the overhang filter, and the objective steadily decreases until the optimization is stopped at 300 iterations.
As expected, the optimization without overhang filter has the lowest objective. However, the performance of the overhang filtered optimization is counter-intuitive: the optimization with the lowest minimum overhang angle has the worst performance, while the largest overhang angle performs best. The reason for this behaviour is the fact that the optimizer can use the overhang filter to make smaller features on overhanging surfaces than the filter radius. This effect increases with increasing minimum overhang angle, as larger minimum overhang angles have a lower sideways propagation speed (see Fig. 2c, for larger overhang angles the height of the cylinder remains the same, but the radius decreases). This means that the delay increases faster over distance, which produces sharper features. Therefore, the α oh = 60 • setting can make the sharpest edges, and achieves a lower objective. This can be prevented by placing the density filter after the overhang filter, however, this will reintroduce some overhang due to rounding. Therefore, we choose to place the overhang filter after the density filter. Furthermore, this behaviour only emerges in cases that are relatively insensitive to the design, such as this cantilever beam.

Parallel speedup
The cantilever beam problem is used to evaluate the parallel performance of the overhang filter. The average wall clock time over the first 10 iterations of the FEA, the overhang filter, and the corresponding sensitivities is plotted in Fig. 11, for various numbers of processors. It can be seen that the overhang filter is roughly a factor 10 faster than the FEA. The speedup is defined as T np /T 1 , where T 1 is the wall clock time for the serial process, and T np is the wall clock time for np processors. The speedup of the overhang filter is close to linear for a low number of processors, and drops for larger number of processors, comparable to the FEA. The speedup is tested up to 20 processors, which was the largest number of cores available for a single computational node. Because the available cluster did not feature a fast interconnect, the speedup of the overhang filter did not increase when an extra node was added to the system. Therefore, a hybrid implementation between a complete domain decomposed approach as presented in [33], and a shared memory approach adopted in this work (see Section 3.2.2) would be faster, as it would require only boundary node communication.

Case 2: tension cylinder
The second case on which the overhang filter is tested is a cylinder under tensional loading. This is a critical test for the overhang filter, as for the lower volume fractions, most of the material that is needed for support is Fig. 12. The tension cylinder design domain with boundary conditions. The tension load acts on the grey ring, while the red surface is clamped in the x-direction, and the yellow surface is clamped in the y-direction. One point on the z-axis is fully clamped, as indicated. The build direction is equal to the positive z-axis, and the build plate is the z = 0 plane. not contributing to the structural stiffness. The design domain is a cylinder, with a distributed load acting on the top part of the outer boundary, as displayed in Fig. 12. The filter radius is 20 mm, and the domain is meshed with 17 · 10 6 elements and 2.9 · 10 6 nodes, with in an average edge length of 12 mm.
Without overhang filter, the resulting design will roughly be a solid disk starting from the top of the cylinder, with a height equal to the allowed volume fraction. The resulting topologies for different volume fractions with α oh = 45 • are displayed in Fig. 13. It can be seen that even for the lowest volume fraction, fully dense supports are generated, although they do not contribute to the stiffness. The supports fan out in a tree-like fashion to span an as large as possible area with the least amount of material. As the volume fraction increases, the topology resembles an inverted dome. For the higher volume fractions, the supports are also used to increase the stiffness, as they become interconnected.

Case 3: crane hook
The final case on which the overhang constraint is demonstrated is the stiffness optimization of a crane hook. Due to the capability to detect overhang on an unstructured mesh, it is straightforward to optimize a geometry generated in a CAD environment, with overhang constraint. The geometry of the crane hook is displayed in Fig. 14. It is a complex domain containing multiple curved surfaces, which would be difficult to accurately represent with at structured mesh. The design domain measures 455 × 491×118 mm (width × height × depth). A density filter with a radius of 3 mm is applied, and the elements have an average edge length of 1.5 mm. The volume constraint is set at 20%, and the build direction is in the vertical direction as displayed in Fig. 14. The build plate is at the bottom of the domain.
The domain is fully clamped at the top, with a vertical distributed load applied on the hook, representing a lifting force. Furthermore, this test case features a fixed solid region at the loaded surface and near the tip of the hook (the red and blue surfaces in Fig. 14). If the densities of the blue region would not be fixed, the material will be removed since it does not contribute to the stiffness. However, this region should be retained in the final topology to prevent the load from slipping off the hook. By simply fixing the design variables, the fixed regions will still be removed by the overhang filter when it is overhanging. We therefore added the following constraint: where Γ f is the fixed density region. This constraint is only satisfied when the printable densities ξ = 1 on the fixed region, and this successfully fixed the printable densities of the fixed region. The crane hook is the largest test case in the set, with the final mesh containing 50 · 10 6 elements and 8.8 · 10 6 nodes. The problem was solved using 160 cores divided over 8 nodes with two 10 core 2.2 Ghz Intel Xeon CPUs (E-2630v4) each. The average wall clock time for the FEA was 193 s, while the average time for the overhang filter was 115 s. The computational time for the sensitivities was negligible compared to the forward solve, 1.9 s for the adjoint sensitivities of the FEA, and 17 s for the overhang filter. Although the computational time required for the overhang filter is considerable at 60% of the FEA, one has to keep in mind that the overhang filter is only running at 1 node, compared to 8 nodes for the FEA.
The optimized design is displayed in Fig. 15. It resembles a common crane hook, with a curved stiffener at the back (the leftmost part in Fig. 15a) to counteract the torque on the hook due to the asymmetric geometry. The stiffener contains some small hollow sections to save material. Also the fixed region at the tip is made as thin as possible, for the most part it is only a shell. The design is completely printable, as can be seen on the bottom in Figs. 15b to 15d, where the members in the middle join under a 45 • angle. The decrease in performance compared to an optimization without overhang constraint is 14%.

Conclusion
In this work, the front propagation based overhang constraint as presented for 2D in [15] has been successfully extended to 3D. The two main challenges of the extension were the efficient propagation in tetrahedral elements, as opposed to triangles in 2D, and the parallelization of the front propagation in order to keep the computational times in the same order as the FEA, which is usually parallelized for 3D cases.
It is shown that also for 3D front propagation, a minimization problem can be set up for each tetrahedral element that can be solved by probing a limited set of cheaply obtainable locations. This enables the fast propagation through an unstructured mesh, for arbitrary overhang angle, contrary to minimum overhang angle filters that work on structured meshes.
The parallelization of the ordered upwind method was achieved by following the same strategy that has been presented for the closely related fast marching method [33]. However, instead of providing each processor with its own fixed domain, a different approach was taken where every processor can work on any domain. Although this reduces processor idle time and provides speedup for relatively small meshes (on the order of 1 · 10 6 nodes), the approach presented in [33] is likely to give better performance on increasing mesh sizes. Nevertheless, the parallelized overhang filter is about a factor 10 faster than the parallel FEA, when executed on a single machine.
The overhang filter is demonstrated on three problems to test the performance of the filter. It is shown that the filter handles arbitrary overhang angles, generates solid supports even when there is no benefit for the objective, and works on large unstructured meshes. With the framework for parallel 3D front propagation, further research will focus on different applications of front propagation for topology optimization, for example for accessibility of supports: the complete removal of overhang is often not in the designer's best interest, and we thus would like to activate the overhang constraint only in areas where supports are difficult to remove.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.