The parametrized superelement approach for lattice joint modelling and simulation

From large-scale structures such as bridges and cranes to small-scale structures such as lattices, mechanical structures today increasingly consist of "tuneable parts": parts with a simple geometry described by a small set of strongly varying parameters. For example, although a bridge is a macroscale structure with a complex geometry, it is made from simple beams, bolts and plates, all of which depend on spatially varying geometrical parameters such as their length-to-width ratio. Accelerating this trend is the concurrent improvement of, on the one hand, Additive Manufacturing techniques that allow increasingly complex parts to be manufactured and, on the other, structural optimization techniques that exploit this expanded design space. However, this trend also poses a challenge to current simulation techniques, as for example the Finite Element Method requires large numbers of elements to represent such structures. We propose to exploit the large conformity between parts inside the mechanical structure through the construction of semi-analytic "Parametrized Superelements", built by meshing with solid elements, reduction to a fixed interface and approximation of the reduced stiffness matrices. These new elements can be employed next to standard elements, enabling the simulation of more complex geometries. The technique we propose is applied to lattice structures and provides a flexible, differentiable, more accurate but still efficient way to simulate their elastic response. A net gain in total computation time occurs after simulating more than twenty lattices. As such, the proposed method enables large-scale parameter exploration and optimization of lattice structures for improved energy absorption, mass and/or stiffness.


Introduction
Lattices are defined by Ashby [2] as structures composed of struts connected at joints. However, in contrast to the macroscale frames encountered in bridges and cranes, the lattices considered in this paper exist on a smaller scale. They can therefore be viewed as materials themselves, and their effective properties, tuneable by changing their topological layout, cover a wide range across a multitude of physical applications, such as acoustics [6], fatigue [21] and energy absorption [35]. In this paper, the mechanical properties of lattices are studied. The geometric complexity of lattices poses a challenge for their construction. Whereas large truss structures can be built with traditional manufacturing techniques such as milling and welding, lattices almost always require Additive Manufacturing (AM) [23]. Even with proper tuning of the AM process parameters, defects can occur that negatively impact the mechanical performance [42].

Nonetheless, the inherent topological complexity is also an opportunity for optimization. Although structural optimization dates back to at least 1904, when Michell [24] published his work on optimal frame structures, and Topology Optimization (TO) was invented by Bendsøe and Sigmund [3,4] in the 1980s, the last decade has seen a renewed interest in this area. The area of multiscale topology optimization in particular, sparked by improved AM capabilities, has received more attention, since optimizing structures at multiple scales can only improve their performance compared to mono-scale structures [38]. Although "multiscale" implies optimization of a structure's layout at multiple scales, most often only two are used: the macroscale and the microscale. Although many different algorithms exist, the resulting designs often look similar to lattices on every scale: sets of beams connected at joints. Some algorithms therefore preemptively constrain their microstructural layout to consist only of lattice structures [18,25,40]. Furthermore, for compliance optimization the theoretically optimal structure, a rank-N laminate, can be viewed as a lattice that exists on multiple scales [12,38]. The combination of multiscale TO, AM and lattices leads to structural designs that exist on multiple scales and contain intricate lattices whose topology changes throughout the design (see Fig. 1). Accurate and efficient simulation of such lattices is thus paramount.
Current state-of-the-art lattice simulation techniques, further elaborated in Sect. 2.1, are based on either solid or beam elements. Although a solid element mesh is accurate when it has converged, an enormous computational effort is required to reach convergence. In contrast, a beam element represents a beam with only twelve degrees of freedom (dof) and thus requires orders of magnitude less computation time. However, beam elements are unable to represent the strut joints, i.e. the regions where struts meet. Although certain modifications exist in the literature (see Sect. 2.1), the underlying problem is not solved, making them at best accurate for only a small set of joints. This paper proposes the use of a so-called "joint element", in combination with the standard beam element, to keep the computational efficiency of beam elements while improving their accuracy.
The main challenge is that joints have too complex a geometry (see Sect. 3) to derive an analytic joint element. The joint is therefore meshed with solid elements and a "dual form" of the standard straight face constraints (see Sect. 2.3) is applied numerically, after which static condensation (see Sect. 2.2) leads to a "joint superelement". Unfortunately, this "direct" approach (see Sect. 4) is too computationally expensive, since static condensation is a costly procedure that must be repeated for every joint inside a lattice (of which there can easily be thousands, see again Fig. 1). We overcome this obstacle by constructing a more efficient but sufficiently accurate approximation based on samples, thus trading the online cost for an offline cost and enabling efficient simulation of a wide variety of lattices. This "approximate" approach (see Sect. 5) reduces a variety of joints to their reduced representation, constructs an approximation of that representation based on a tensor decomposition and then expands the approximated, reduced representation to create an analytical and efficient representation of the joint elements. The results in Sect. 5.4 show that our proposed method decreases the online cost by a few orders of magnitude while retaining the accuracy of the direct method, at the price of a larger offline cost.
Another illustration of the proposed lattice simulation is presented in Sect. 6, where we apply it to a lattice with increasing relative density. The accuracy is shown to be much closer to that of the solid element mesh than to that of a beam element mesh, while the online cost is reduced from almost half an hour to forty seconds, representing a speedup factor of 45. The offline cost is roughly the time needed to simulate twenty lattices with a solid element mesh. Thus, a net gain in computation time is observed when simulating more than twenty lattices, e.g. during optimization, or when lattices become so large that using solid elements requires too much computer memory. Finally, Sect. 7 draws conclusions and specifies directions for future work.

Background
Three research fields related to the developed method are now discussed. First, Sect. 2.1 discusses advantages and disadvantages of the current techniques used for simulating lattices that exist on a single scale. Next, Sect. 2.2 discusses how lattices are simulated when they fit inside a larger-scale structure. Finally, Sect. 2.3 reviews beam theory, since the PSE method developed here is strongly tied to it, in particular to the employed face constraints.

Mono-scale lattice structure simulation
Lattice structure behaviour under mechanical load is usually simulated with the Finite Element Method (FEM), although no standard meshing approach exists specifically for lattices. Two element types are generally used for lattices in literature: three-dimensional solid elements and one-dimensional beam elements [16].
Their difference lies in the trade-off between accuracy and efficiency. Solid element meshes can accurately capture the mechanical response but require the mesh to be sufficiently fine, leading to high computation times. On the other hand, beam element models require less computational resources: a single beam can be modelled with an element based on beam theory using only twelve dof. However, such models suffer from a diminished accuracy for two main reasons. First, the assumption that planes perpendicular to the beam axis remain plane (see beam theory in Sect. 2.3) breaks down for stout struts, i.e. beams with a low length-to-width ratio. Second, and more importantly, beam elements alone cannot capture the joints, i.e. the locations where beams are connected [9,10].
The effects of these joints on the mechanical behaviour of lattices are significant for two reasons. First and foremost, even for slender struts modelled accurately by beam theory, joints may significantly contribute to the overall stiffness. This effect increases sharply with the beam radius and is only amplified in AM. Second, in bending-dominated structures (e.g. the diamond lattices in this paper) high stresses occur in the joints, which greatly influence the fatigue life [9,21].
Since this breakdown of accuracy is inherent to the use of beam elements, research has recently focused on incorporating the joint stiffness into the beam elements. Labeas and Sunaric [20] increase the diameter by 40% over the first and last 10% of every beam. Smith et al. [32] follow a similar approach but use different values. Luxner et al. [22] increase the beam stiffness by a factor of 1000 in a spherical region around the strut joint. Finally, Dong and Zhao [9] use an optimization algorithm to choose the parameters of their proposed "joint stiffening element": another beam element with adaptations at the beam ends.
Although all the above approaches report reasonable accuracy with good efficiency, they lack the flexibility and extensibility to deal with various lattice structures. For example, the joint stiffening element developed by Dong and Zhao is optimized for joints inside Vintiles lattices but is not guaranteed to work for other lattice types. Furthermore, adapting beam elements to incorporate the joint stiffness hides the underlying issue: the lack of a proper element to represent joints, something also mentioned by Helou and Kara [16]. The method presented in this paper proposes a generic way to create such a "joint element" and use it inside lattice simulation frameworks.

Multiscale lattice structure simulation
Although the previous section showed that simulating lattices on a single scale is already computationally expensive, lattices are often fitted inside larger structures (see again Fig.  1). A short overview of two simulation methods for such multiscale structures is now given.
A first, prevalent method for multiscale simulation is numerical homogenization. Through the assumption of a separation of scales, detailed microscale structures can be viewed from the macroscale as materials with effective properties determined by their microstructural layout. For lattices made by repetition of a unit cell, computation of the homogenized properties [13,19] can occur with reasonable accuracy on a single unit cell when Periodic Boundary Conditions are used, provided the scale separation assumption holds. The lattice can then be replaced with a simple iso-, ortho- or anisotropic material, greatly reducing the computational effort at the macroscale [41,45]. However, the simulation of lattices still suffers from the same efficiency-versus-accuracy issues discussed in the previous section. A recent technique, called FE², relies on the same homogenization assumptions but uses an iterative procedure [41]. Instead of extracting homogenized material properties, macroscopic strains are applied to a microscale structure. The resulting microscopic stress is then computed and its average is used to update the macroscopic equilibrium. Although the iterative procedure improves the multiscale accuracy, the numerical efficiency is further reduced.
A second multiscale simulation approach is the use of superelements or substructuring. As a standard technique in FEM, this is included in commercial packages such as Simcenter Nastran [30]. In substructuring, a structure is meshed and the resulting elements are grouped into superelements connected by their exterior dof. These superelements are then reduced to their exterior dof via static condensation, also known as Guyan reduction. In this sense, superelements are Reduced Order Models (ROM). As noted by Romeo and Schulz [29], the repetitive nature of metamaterials made from recurring unit cells can be exploited with this approach, since in practice only a single reduction must be performed. However, when a unit cell changes throughout the structure, many superelements must be constructed and the high computational cost associated with this procedure quickly exceeds the gains. To alleviate this problem, the literature employs Parametric Reduced Order Models (PROM): ROM that are allowed to vary in a select set of parameters [1,26,34]. For example, Amsallem et al. [1] use a PROM approach for aeroelastic modelling of fighter jets with a varying Mach number. These models are usually created by sampling the (possibly high-dimensional) parameter space with a high-fidelity model, reducing these samples to a ROM and fitting a function through the reduced matrices.
Three main challenges faced by the literature on PROM are now discussed. First, in order to allow fitting a function through the space of reduced matrices, they must be compatible, i.e. have the same size. For example, Panzer et al. apply standard techniques for model order reduction and then introduce novel projection methods to ensure the reduced models are compatible [26]. Second, for the specific case of approximating stiffness and mass matrices it is essential that the PROM method conserves key properties of these matrices such as symmetry and positive semi-definiteness. As an example, Speet [34] uses a projection to the manifold of positive definite, symmetric matrices introduced by Pennec et al. [27] to ensure property conservation, whereas Xu et al. [43] use a Cholesky factorization to ensure symmetry and positive-definiteness of their stiffness tensors modelled by neural networks. Third, the curse of dimensionality is a severe hurdle when constructing a PROM. Consider as a first example the Approximation of Reduced Substructure with Penalization (ARSP) method developed by Wu et al. [40]. Here, a two-dimensional cubic microlattice is condensed to its exterior dof and the resulting stiffness matrices are approximated by creating a basis for the reduced stiffness matrix space with the proper orthogonal decomposition (POD) and approximating the coordinates in that basis with polynomial, multi-dimensional basis functions (although in practice only one parameter, the density, is used). As a second example, consider the approach by Imediegwu et al. [18]. Here, a large number of lattice structures with varying beam radii are simulated through meshing with hexahedral elements. Next, these complex models are reduced with homogenization and approximated using least squares with global polynomial basis functions (up to degree six). Due to the high number of parameters (six in the case of Imediegwu et al.), the curse of dimensionality hampers the approximation efforts. For example, Imediegwu's global basis functions exist in a six-dimensional space, but a simple second-order polynomial in such a space already requires optimization of $\binom{6+2}{2} = 28$ coefficients. The approximation techniques employed by Wu et al. [40] and Imediegwu et al. [18] thus become intractable for higher dimensionalities.
The approach described in this paper could be classified as PROM. First, the compatibility problem is solved by using straight face constraints, leading to a fixed and reusable interface. Second, the property conservation issue is solved by three additional reduction steps that ensure symmetry, positive semi-definiteness and the correct rigid body modes of the approximated stiffness matrix. Finally, we circumvent the curse of dimensionality by employing the canonical polyadic decomposition (CPD) from Hitchcock [17].

Basics of beam theory
Beams are widely used in architectural and mechanical designs, leading to a vast literature describing their mechanical properties: beam theory. Discretization of beam theory leads to one-dimensional beam elements which, as discussed in Sect. 2.1, can be used to simulate lattices. A short overview, without going into much detail about different beam theories (e.g. Timoshenko versus Euler beam theory), is given here to highlight the main assumptions and simplifications. For more information, the reader is referred to these recent review papers [8,28].
Assuming linear elasticity holds, mechanical structures are modelled by three general equations: the equilibrium equations, the strain-displacement relations and the constitutive equations. For isotropic materials, these can be combined into the Navier-Cauchy displacement formulation

$$(\lambda + \mu)\,\nabla(\nabla \cdot \mathbf{u}) + \mu\,\nabla^2 \mathbf{u} + \mathbf{F} = \mathbf{0}, \quad (1)$$

where $\lambda$ and $\mu$ are the Lamé parameters, $\mathbf{F}$ and $\mathbf{u}$ are force and displacement vectors in $\mathbb{R}^3$, respectively, $\nabla$ is the nabla operator and $\cdot$ is the dot operator. In combination with proper boundary conditions this is a well-posed problem, often solved with FEM since analytical solutions for general structures are scarce. In contrast, a key property of beams, namely their high aspect ratio (the ratio between their length and width), allows for an approximate differential equation with solutions that are in some cases even analytically computable. For example, when forces are applied as point forces at the end of a beam, Timoshenko beam theory predicts widely known second-order polynomial solutions for the beam displacement.
Consider at a beam boundary face the resultant force $\mathbf{f} \in \mathbb{R}^3$ and moment $\mathbf{m} \in \mathbb{R}^3$ of the applied traction $\mathbf{t} \in \mathbb{R}^3$, defined as averages over the area and implicitly assuming the variation over the small width is limited:

$$\mathbf{f} = \int_A \mathbf{t}\,\mathrm{d}A \quad (2)$$

and

$$\mathbf{m} = \int_A (\mathbf{p} - \mathbf{p}_C) \times \mathbf{t}\,\mathrm{d}A, \quad (3)$$

where $\times$ is the cross product and $A$ and $\mathbf{p}_C$ are the area and the centroid of the boundary face, respectively. They can be thought to stem from a first-order approximation of the traction vector $\mathbf{t}_n$, where $\mathbf{n}$ denotes the beam plane normal, although the traction vector is allowed to differ from this linear approximation (hence the $\approx$):

$$\mathbf{t}_n \approx \mathbf{t}_f + \mathbf{t}_m. \quad (4)$$

Here, $\mathbf{t}_f = \mathbf{f}/A$ is the traction component due to the force and $\mathbf{t}_m = \hat{\mathbf{J}}\mathbf{m}$ is the traction component due to the moment, where $\hat{\mathbf{J}}$ is a matrix containing second moments of area defined in Appendix A.
A kinematic constraint is placed on the boundary displacement $\mathbf{u}$ to ensure planes perpendicular to the axis remain plane after deformation. This constraint, referred to as the straight face constraint, reduces the dof to a translation $\mathbf{u}_C$ and a rotation $\boldsymbol{\theta}_C$:

$$\mathbf{u}(\mathbf{p}) = \mathbf{u}_C + \boldsymbol{\theta}_C \times (\mathbf{p} - \mathbf{p}_C). \quad (5)$$

Through this constraint, the Navier-Cauchy equation can be reduced to a differential equation relating the beam deflection to the forces and moments applied at the beam ends.

Problem statement and methodology
This section lays the groundwork for the introduction of the methods discussed in the next two sections by defining the lattice simulation problem. Two topics are discussed. First, a geometrical description of the considered lattices and joints is given. Second, a general overview of the lattice simulation methodology is given and the idea of a "joint element" is properly introduced.

Lattice geometry
In this paper, lattices are made of beams connected at joints and can therefore be considered truss structures similar to those seen in cranes and bridges. Although any configuration of beams is possible, this paper restricts itself to lattices built by repeating a unit cell topology. Figure 3 shows four common configurations: a cubic (Fig. 3a), a body-centered cubic (BCC, Fig. 3b), a diamond (Fig. 3c) and an octet (Fig. 3d) unit cell. A lattice containing a BCC unit cell repeated three times in every direction is from now on called a 3×3×3 BCC lattice. This paper focuses on diamond lattices containing multiple unit cells. The beam radii are allowed to change throughout the structure, hence the focus on multiple unit cells instead of one.
Geometrically, we represent lattice beams by cylinders and the joints by spheres with one trimmed circular surface for every connecting beam. The radii of these "joint spheres" are chosen such that the beams remain cylindrical, i.e. such that the trim faces belonging to different beams do not overlap. To emphasize this: the goal of the joints is to ensure the beam geometry is well-defined and can be captured by beam theory. An exploded-view CAD representation of a joint with four connecting beams is shown in Fig. 2. Note that, once a unit cell type like the diamond is chosen, the angles between the beams are not allowed to change. Only the beam radii can change during, for example, a parameter exploration or an optimization procedure.
For ease of programming, we restrict lattices in this paper to exist in a hexahedral bounding box. The parts of the joints and beams that lie outside the bounding box are trimmed away. Beams on a boundary face or edge are thus cut in two or four, respectively. Similarly, joints on a boundary face, edge or corner are cut in two, four or eight, respectively. Finally, the radii of boundary joints are slightly larger than those of internal joints to ensure the beams stay cylindrical. Figure 4 shows a 2 × 2 × 2 diamond lattice containing all four types of joints that can occur in a diamond lattice. Note that other types of bounding boxes can be chosen and only affect the exterior of the lattice structure. In practice, lattices are fitted inside more difficult bounding boxes, leading to more difficult geometries for the boundary joints. The hexahedral choice is made here for simplicity and to focus on the internal lattice joints. Combining two lattices together causes the face joints at the connection face to become internal, although numerically the applied straight face assumption will make these joints stiffer than standard internal joints.

(Fig. 4 caption: Different types of lattice joints inside a 2 × 2 × 2 diamond lattice. The joint radii are relatively large with respect to the beam radii to highlight the joint geometries. Visualization with Simcenter 3D [31].)

General lattice simulation methodology
As introduced in Sect. 2.1 and experimentally shown in Sect. 6, lattice structures are simulated either with solid elements, achieving high resolution and accuracy at the cost of low numerical efficiency, or with beam elements, easing the numerical cost but greatly reducing the accuracy. The main issue with the beam element model is not that beam theory is inaccurate, but rather the lack of joint modelling. To incorporate the joint stiffness, which becomes increasingly important when a beam's radius increases relative to its length, some modifications to standard beam elements have been proposed (see again Sect. 2.1). However, these modifications are quite arbitrary and not generally applicable.
In the proposed methodology, the beam element mesh is changed in two ways. First, the beam elements are shortened to prevent them from touching in a single point. Instead, the beam elements now reflect the beam geometry. Second, a "joint element" is created to represent the joints that connect the shortened beams. A sketch of the proposed change is shown in Fig. 5.
A key element of the joint element that sets it apart from previous literature, is its predefined interface to the exterior. Whereas the techniques described by Romeo and Schulz [29] and Wu et al. [40] reduce their substructures to an exterior number of dof that depends on the meshing, here this interface is fixed, leading to more reusable superelements.
In particular, the interface is as follows. Each joint element has one node with six dof for each external face, i.e. for each connected beam face and each boundary face. Internal joints in a diamond lattice (i.e. those that are not trimmed by the bounding box, see Fig. 4) have four beam faces and are thus represented by a joint element with four nodes. The corner joint has one beam face and three boundary faces and thus also has four nodes. The edge and face joints have three external faces in total and their joint elements contain three nodes. All nodes are placed at the centroid of the external face they represent.
In order to generically use a joint element in a Finite Element (FE) program, only the joint stiffness matrix $\mathbf{K}_j \in \mathbb{R}^{6n_f \times 6n_f}$ is required. Here, $n_f$ is the number of nodes of the joint element, representing the $n_f$ external faces of the joint it represents. The six degrees of freedom per node are divided into a translation $\mathbf{u}_f$ and a rotation $\boldsymbol{\theta}_f$. Translational stiffness coefficients relate forces to translations and thus have units N/m. Similarly, rotational coefficients relate moments to rotations and thus have units Nm/rad = Nm. Finally, "mixed" coefficients relate forces to rotations and moments to translations and have units N.
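To illustrate how such a joint element could coexist with standard beam elements in an FE assembly, consider the following minimal Python sketch. The consecutive six-dof-per-node numbering, the `assemble` helper and the placeholder matrices are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

def assemble(K_global, K_elem, node_ids, dof_per_node=6):
    """Scatter an element matrix into the global matrix, assuming six
    consecutive dof (3 translations, 3 rotations) per node."""
    dofs = np.concatenate([dof_per_node * n + np.arange(dof_per_node)
                           for n in node_ids])
    K_global[np.ix_(dofs, dofs)] += K_elem
    return K_global

# A four-node joint element (internal diamond joint) contributes a
# 24 x 24 block; the shortened beam elements attached to the same nodes
# contribute their usual 12 x 12 blocks.
K = np.zeros((6 * 10, 6 * 10))                        # 10 nodes in total
K = assemble(K, np.eye(24), node_ids=[0, 1, 2, 3])    # placeholder K_j
K = assemble(K, np.eye(12), node_ids=[0, 4])          # placeholder beam
```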

Direct construction of a joint element
This section deals with the direct construction method for the joint element introduced in the previous section. Constructing a joint element similarly to beam elements, e.g. through discretization of beam theory, is hampered by the absence of a corresponding "joint theory". Furthermore, constructing such an analytical joint theory from scratch by starting from similar assumptions is impeded by the joint's complex geometry: whereas a beam's one-dimensional nature allows for easy construction of shape functions (see e.g. the Euler-Bernoulli beam theory simplification of $u_x = -z\theta$ if x is the axial direction), finding proper shape functions for three-dimensional lattice joints is remarkably harder. A first alternative is now proposed, from now on referred to as the Direct Parametrized Superelement (D-PSE) approach. First, Sect. 4.1 explains the four-step methodology of the D-PSE method. Section 4.2 then discusses some implementation details that ensure the efficiency and accuracy of the method. The discretization error is then studied in Sect. 4.3 by means of a mesh convergence experiment. This section also discusses in more detail the discretization error on the rigid body modes, which (being the eigenvectors for the zero eigenvalues of the stiffness matrix $\mathbf{K}_j$) have the largest effect on the mechanical response. Finally, Sect. 4.4 discusses the D-PSE method in general, highlighting its main advantages and disadvantages and providing arguments for the method explained in the next section, the Approximate Parametrized Superelement (A-PSE) method.

Methodology
A D-PSE stiffness matrix is constructed in four steps, as visualized in Fig. 6. First, the joint geometry is modelled with a mesh of solid elements, for a total of $N_d$ internal dof. Here, internal denotes the fact that these degrees of freedom are not directly connected to other beams or joints. The interior is represented by the system of equations $\mathbf{K}_{int}\mathbf{u}_{int} = \mathbf{f}_{int}$, where $\mathbf{u}_{int} \in \mathbb{R}^{N_d}$ and $\mathbf{f}_{int} \in \mathbb{R}^{N_d}$ are the displacement and force vector, respectively. The internal force vector $\mathbf{f}_{int}$ is kept for completeness but is zero for this application.
Second, the $n_f$ nodes of the joint element are added at the centroids of the boundary faces for a total of $6n_f$ exterior dof, which are captured in the external displacement vector

$$\mathbf{u}_{ext} = \begin{bmatrix} \mathbf{u}_1^T & \boldsymbol{\theta}_1^T & \cdots & \mathbf{u}_{n_f}^T & \boldsymbol{\theta}_{n_f}^T \end{bmatrix}^T.$$

Since the external dof are (for now) disconnected, the mesh is represented by the extended system of equations

$$\begin{bmatrix} \mathbf{K}_{int} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{u}_{int} \\ \mathbf{u}_{ext} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_{int} \\ \mathbf{f}_{ext} \end{bmatrix} \quad (6)$$

or, more briefly, by $\mathbf{K}_e\mathbf{u}_e = \mathbf{f}_e$. Here, $\mathbf{f}_{ext}$ is the external force vector containing the forces and moments applied by the surrounding structure.
Third, a dual form of the straight face constraints is applied to connect the interior and exterior dof. Whereas standard beam theory (see Sect. 2.3) applies "primal" straight face constraints through Eq. (5), thus fixing every point on the face rigidly to the exterior dof and disallowing any warping, dual straight face constraints only enforce the faces to be straight "on average", resulting in a more relaxed constraint. Furthermore, whereas beam theory finds the resultant forces and moments through averaging of the traction vector via Eqs. 2 and 3, the dual form intrinsically requires the traction vector to be a linear, "straight" function. The relation between the primal and dual form is further visualized in Fig. 7. The primal versus dual terminology is adopted here to highlight the similar yet mirrored assumptions and relations.

Consider a joint with $n_f$ external faces. Each external face has a translation vector $\mathbf{u}_C$ and a rotation vector $\boldsymbol{\theta}_C$, where the subscript C refers to the centroid. The following two dual assumptions are made. First, the traction vector on a face is now rigidly constrained to be a linear function:

$$\mathbf{t} = \mathbf{t}_f + \mathbf{t}_m. \quad (7)$$

Second, the face displacement is constrained to be a linear function, but only approximately:

$$\mathbf{u}(\mathbf{p}) \approx \mathbf{u}_C + \boldsymbol{\theta}_C \times \mathbf{r}_C. \quad (8)$$

Here, $\mathbf{u}_C \in \mathbb{R}^3$ and $\boldsymbol{\theta}_C \in \mathbb{R}^3$ are the face's dual displacement and rotation vectors, respectively. Note the difference with standard beam theory (Eqs. 4 and 5), where the equals and approximation sign are switched. The approximation is now enforced through the zeroth-order average $\frac{1}{A}\int_A \cdot \,\mathrm{d}A$ and the first-order average $\int_A \cdot \times \mathbf{r}_C \,\mathrm{d}A$:

$$\frac{1}{A}\int_A \mathbf{u}\,\mathrm{d}A = \frac{1}{A}\int_A \left(\mathbf{u}_C + \boldsymbol{\theta}_C \times \mathbf{r}_C\right)\mathrm{d}A, \quad (9)$$

$$\int_A \mathbf{u} \times \mathbf{r}_C\,\mathrm{d}A = \int_A \left(\mathbf{u}_C + \boldsymbol{\theta}_C \times \mathbf{r}_C\right) \times \mathbf{r}_C\,\mathrm{d}A. \quad (10)$$

Here, $\mathbf{r}_C$ denotes $\mathbf{p} - \mathbf{p}_C$. Elaborating leads to expressions for the translational and rotational dof of the face:

$$\mathbf{u}_C = \frac{1}{A}\int_A \mathbf{u}\,\mathrm{d}A, \quad (11)$$

$$\boldsymbol{\theta}_C = \hat{\mathbf{J}}^{-1}\int_A \mathbf{u} \times \mathbf{r}_C\,\mathrm{d}A. \quad (12)$$

Here, $\hat{\mathbf{J}}$ is a matrix containing second moments of area and is defined in Appendix A. This appendix also provides an expression for the applied force and moment as a function of the unknown terms in Eq. (7).
In general, the averaged straight face constraints expressed by Eqs. 11 and 12 relate the interior and exterior dof and can be expressed in matrix form by

$$\begin{bmatrix} \mathbf{C}_{int} & \mathbf{C}_{ext} \end{bmatrix} \begin{bmatrix} \mathbf{u}_{int} \\ \mathbf{u}_{ext} \end{bmatrix} = \mathbf{0} \quad (13)$$

or, more succinctly, by $\mathbf{C}\mathbf{u}_e = \mathbf{0}$. Using the method of Lagrange multipliers, Eqs. 6 and 13 can be combined into the following representation of the joint element:

$$\begin{bmatrix} \mathbf{K}_{int} & \mathbf{0} & \mathbf{C}_{int}^T \\ \mathbf{0} & \mathbf{0} & \mathbf{C}_{ext}^T \\ \mathbf{C}_{int} & \mathbf{C}_{ext} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{u}_{int} \\ \mathbf{u}_{ext} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_{int} \\ \mathbf{f}_{ext} \\ \mathbf{0} \end{bmatrix}. \quad (14)$$

Here, $\boldsymbol{\lambda}$ contains the Lagrange multiplier dof.

Fourth and final, the extended system of equations is reduced to the exterior dof $\mathbf{u}_{ext}$ via static condensation. Consider a system of equations containing unknowns that are kept, denoted with subscript "k", and unknowns that are omitted, denoted with subscript "o":

$$\begin{bmatrix} \mathbf{A}_{kk} & \mathbf{A}_{ko} \\ \mathbf{A}_{ok} & \mathbf{A}_{oo} \end{bmatrix} \begin{bmatrix} \mathbf{x}_k \\ \mathbf{x}_o \end{bmatrix} = \begin{bmatrix} \mathbf{b}_k \\ \mathbf{b}_o \end{bmatrix}.$$

This system can be expressed solely in its kept unknowns $\mathbf{x}_k$, assuming $\mathbf{A}_{oo}$ is invertible:

$$\left(\mathbf{A}_{kk} - \mathbf{A}_{ko}\mathbf{A}_{oo}^{-1}\mathbf{A}_{ok}\right)\mathbf{x}_k = \mathbf{b}_k - \mathbf{A}_{ko}\mathbf{A}_{oo}^{-1}\mathbf{b}_o.$$

The reduced matrix, also known as the Schur complement, is thus equal to $\mathbf{A}_{kk} - \mathbf{A}_{ko}\mathbf{A}_{oo}^{-1}\mathbf{A}_{ok}$. Applying this technique to Eq. (14), with $\mathbf{u}_{ext}$ kept and $\mathbf{u}_{int}$ and $\boldsymbol{\lambda}$ omitted, the joint element stiffness matrix $\mathbf{K}_j$, defined as the relation between the exterior dof and forces $\mathbf{K}_j\mathbf{u}_{ext} = \mathbf{f}_{ext}$, is

$$\mathbf{K}_j = -\begin{bmatrix} \mathbf{0} & \mathbf{C}_{ext}^T \end{bmatrix} \begin{bmatrix} \mathbf{K}_{int} & \mathbf{C}_{int}^T \\ \mathbf{C}_{int} & \mathbf{0} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{0} \\ \mathbf{C}_{ext} \end{bmatrix}. \quad (15)$$

Implementation
Implementation details of the four steps of the D-PSE approach are now discussed. In general, the main lattice simulation framework is developed in Python (v3.8). For the first step, the joint geometry mesh is created by Gmsh v4.8.4 [14], producing a set of tetrahedra. These tetrahedra are converted to NX Nastran's second-order tetrahedral elements (CTETRA10) by adding nodes to every edge. The assembled stiffness matrix is generated with Nastran (available in Simcenter 3D [31]) and loaded into Python with the pyNastran module v1.3. For the third step, i.e. the application of the dual straight face constraints of Eqs. 11 and 12, the integrals $\frac{1}{A}\int_A \mathbf{u}\,\mathrm{d}A$ and $\int_A \mathbf{r} \times \mathbf{u}\,\mathrm{d}A$ must be evaluated on the $n_f$ faces. In the FE formulation, the displacement field $\mathbf{u}$ on a face is expressed as $\mathbf{u} = \sum_i \mathbf{u}_i b_i$, where $\mathbf{u}_i$ and $b_i$ are the nodal displacement and the basis function belonging to a node i, and the summation runs over the nodes on the face. Elaboration leads to the evaluation of the integrals $\int_{A_e} b_i\,\mathrm{d}A$ and $\int_{A_e} r_j b_i\,\mathrm{d}A$ over an element surface $A_e$, where $r_j$ is the j-th component of $\mathbf{r}$.
Since the elements are of second order, the first integral is integrated exactly using second-order Gauss integration. The second integral contains the nonlinear factor r j and is integrated with a seventh-order Gauss integration scheme, which is found to be satisfactory in Sect. 4.3.
For the final step, i.e. the reduction of Eq. (14) to the joint element stiffness matrix $\mathbf{K}_j$, the computation of the Schur complement in Eq. (15) is required. Since the omitted block is large but sparse, it is handled with a sparse LU factorization rather than an explicit inverse (this factorization is also the memory bottleneck reported in Sect. 5.4).
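As an illustration of this reduction step, the following minimal Python sketch computes the Schur complement of a sparse system with SciPy's sparse LU factorization. The function name and the use of SciPy are assumptions for illustration, not the exact implementation used here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def condense(A, keep):
    """Static condensation: Schur complement A_kk - A_ko A_oo^{-1} A_ok.

    Applied to Eq. (14), `keep` would hold the indices of u_ext."""
    A = sp.csc_matrix(A)
    keep = np.asarray(keep)
    omit = np.setdiff1d(np.arange(A.shape[0]), keep)
    A_kk = A[keep][:, keep].toarray()
    A_ko = A[keep][:, omit].toarray()
    A_ok = A[omit][:, keep].toarray()
    lu = splu(A[omit][:, omit].tocsc())   # sparse LU, no explicit inverse
    return A_kk - A_ko @ lu.solve(A_ok)
```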

Mesh convergence and rigid body modes
To assess the correct implementation of the D-PSE method, two studies are now presented. First, a mesh convergence study is performed. Second, a more in-depth study of the rigid body modes is conducted.
Mesh convergence As summarized in Fig. 8, the mesh convergence study is performed by repeatedly applying the D-PSE method to a joint with an increasing number of elements $N_{el}$ (ranging from a few hundred to at most 100 000). The number of elements is converted to a representative average "element size" h through $h = \sqrt[3]{1/N_{el}}$. The difference between the successive joint element stiffness matrices $\mathbf{K}_j^i = \mathbf{K}_j(N_{el}^i)$ and the "exact" joint stiffness matrix $\mathbf{K}_j = \mathbf{K}_j(N_{el} = \infty)$ is then studied. Of course, since the exact solution is unknown, the solution found on the finest mesh is used. Figure 9 studies the convergence of the stiffness matrix of an internal lattice joint when its mesh is refined from around five hundred elements to two hundred thousand (the figure shows the relative error convergence compared to the solution found on the finest mesh). The relative error with respect to the finest mesh, $\rho(h) = \|\mathbf{K}(h) - \mathbf{K}\| / \|\mathbf{K}\|$, where $\|\cdot\|$ denotes the 2-norm, is shown as a function of the element size h next to the orders of convergence p = 2 and p = 3 (i.e. $\rho(h) = O(h^p)$). Clearly, the actual order of convergence lies somewhere between two and three. This is slightly lower than what is theoretically expected, since second-order tetrahedral elements permit an order of convergence of p = 3 on the displacements [44].
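For completeness, the observed order of convergence can be estimated from successive refinements as in the short sketch below; the error values are purely illustrative and do not reproduce the study's data.

```python
import numpy as np

# Element counts and (illustrative) relative errors rho(h) against the
# finest mesh; h = (1 / N_el)^(1/3) as in the text.
N_el = np.array([500, 4_000, 32_000, 100_000])
h = (1.0 / N_el) ** (1.0 / 3.0)
rho = np.array([2e-1, 4e-2, 8e-3, 3e-3])      # illustrative values only

# Observed order p between successive refinements, from rho(h) = O(h^p).
p = np.log(rho[:-1] / rho[1:]) / np.log(h[:-1] / h[1:])
print(p)    # values between 2 and 3 would match Fig. 9
```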
Rigid body modes From standard finite element theory it is known that an elemental stiffness matrix must be positive semi-definite, meaning all eigenvalues are nonnegative. The zero eigenvalues correspond to eigenvectors called rigid body modes, i.e. those displacements that do not induce any internal deformation and thus do not require any external work to achieve them. These modes are known a priori: for unconstrained three-dimensional structures there are six (three translations and three rotations), whereas two-dimensional structures have three (two translations and one rotation). More specifically, a three-dimensional rigid body displacement field $\tilde{\mathbf{u}}_{rb}(x, y, z)$ is defined as

$$\tilde{\mathbf{u}}_{rb} = \mathbf{u}_{rb} + \boldsymbol{\theta}_{rb} \times \mathbf{r}_{rb}, \quad (16)$$

where $\mathbf{u}_{rb}$, $\boldsymbol{\theta}_{rb}$ and $\mathbf{r}_{rb}$ are the rigid body translation, rotation and position with respect to the rotation center, respectively. A rigid body mode is thus the sum of a rigid body translation $\mathbf{u}_{rb}$ and a rigid body rotation $\boldsymbol{\theta}_{rb}$ around an arbitrary reference point, since any change in the reference point can be viewed as a rigid body translation. Usually, the centroid of the mechanical structure is used as reference point.
Since the displacement of a mechanical structure is computed by solving the system Ku = f, the mechanical response for force-loaded structures is mainly determined by the lowest eigenvalues of the stiffness matrix. Since the rigid body modes correspond to the lowest possible eigenvalues, it is of utmost importance that these are approximated correctly. This is now shown theoretically.
To verify the theoretical adherence to the rigid body modes, one must ensure that applying such a displacement does not result in internal strain energy. For the standard finite elements used to create the internal stiffness matrix $\mathbf{K}_{int}$, this requirement is already known to hold. Proving the adherence of a joint matrix to the zero-energy rule for rigid body modes therefore only requires showing that the dual straight face constraints captured in $\mathbf{C}_{ext}$ do not destroy the properties of $\mathbf{K}_{int}$. This is done by substituting Eq. (16) into the dual straight face constraints of Eqs. (11) and (12) and choosing $\mathbf{r}_{rb} = \mathbf{r}_C = \mathbf{r}$, leading to

$$\mathbf{u}_C = \mathbf{u}_{rb} + \frac{1}{A}\int_A \boldsymbol{\theta}_{rb} \times \mathbf{r}\,\mathrm{d}A = \mathbf{u}_{rb},$$

$$\hat{\mathbf{J}}\boldsymbol{\theta}_C = \hat{\mathbf{J}}\boldsymbol{\theta}_{rb} + \int_A \mathbf{u}_{rb} \times \mathbf{r}\,\mathrm{d}A = \hat{\mathbf{J}}\boldsymbol{\theta}_{rb},$$

where the terms $\frac{1}{A}\int_A \boldsymbol{\theta}_{rb} \times \mathbf{r}\,\mathrm{d}A$ and $\int_A \mathbf{u}_{rb} \times \mathbf{r}\,\mathrm{d}A$ are dropped due to the definition of a centroid and where one can see that $\boldsymbol{\theta}_C = \boldsymbol{\theta}_{rb}$ because $\hat{\mathbf{J}}$ is invertible. Applying a rigid body displacement field to a face constrained with a dual straightness restriction thus leads to this face translating with $\mathbf{u}_{rb}$ and rotating with $\boldsymbol{\theta}_{rb}$, implying that no extra work is needed. This theoretically proves the adherence to the rigid body modes.
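A numerical counterpart of this proof is to build the rigid body modes explicitly and verify that $\mathbf{K}_j\mathbf{R} \approx \mathbf{0}$. The sketch below assumes six dof per node (three translations, three rotations) ordered consecutively; it is an illustration, not the paper's code.

```python
import numpy as np

def rigid_body_modes(centroids, ref=None):
    """Build R (6n x 6): three unit translations and three unit rotations
    about `ref` for n nodes with six dof each (see Eq. 16)."""
    centroids = np.asarray(centroids, dtype=float)
    ref = centroids.mean(axis=0) if ref is None else np.asarray(ref, dtype=float)
    R = np.zeros((6 * len(centroids), 6))
    for i, p in enumerate(centroids):
        r = p - ref
        skew = np.array([[0.0, -r[2], r[1]],
                         [r[2], 0.0, -r[0]],
                         [-r[1], r[0], 0.0]])   # maps v to r x v
        R[6*i:6*i+3, 0:3] = np.eye(3)           # translations: u = u_rb
        R[6*i:6*i+3, 3:6] = -skew               # rotations: u = theta_rb x r
        R[6*i+3:6*i+6, 3:6] = np.eye(3)         # theta = theta_rb
    return R

# Check on a computed joint matrix K_j with face centroids `p_c`:
# np.linalg.norm(K_j @ rigid_body_modes(p_c)) should be small
# relative to np.linalg.norm(K_j).
```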

Discussion of the D-PSE approach
Although this direct, four-step approach could be seen as a variant of the superelement approach, what centrally distinguishes it is the use of constraints to achieve the model order reduction. Compared to the standard primal formulation, the results obtained with the dual straight face constraints differ in two ways. First, due to the averaging, a looser constraint is applied than the rigid constraint of the primal formulation, leading to a lower overall stiffness of the joint. This replaces the inherent stiffness overestimation that occurs when using the primal straight face constraints of standard beam theory with an underestimation. Second, primal face constraints rigidly connect faces that share even just a single point, thus causing a singularity when computing the stiffness matrix (since technically some stiffness entries would become infinite). Such faces exist in joints belonging to the edge and the corner of a lattice (see Fig. 4). By contrast, the dual formulation produces proper, "finite" stiffness matrices when dealing with connected faces, since the proposed displacement field is only enforced on average (see Eqs. 9 and 10).
Although the D-PSE method converges to the exact solution when the mesh is refined, its accuracy is somewhat lacking: a global relative error on the stiffness matrix of 0.1 to 1% is achieved for moderate element sizes. However, by construction the rigid body modes are respected up to a higher accuracy. The main accuracy bottleneck is the meshing error and the three-dimensional nature of the problem. Although one cannot get around the latter, the former is certainly improvable and is left for future work.
The computational efficiency gain can be quite vast compared to standard FEM solutions. It is most notable when the structure contains N substructures of which only $n \ll N$ are unique (up to translation and rotation), since then the D-PSE method need only be applied n times and the remaining $N - n$ substructures are replaced by a translation/rotation of the first n. Examples of such structures are truss bridges and lattices with constant beam diameters (see Fig. 1). However, in a generic case where a lattice contains a large variety of beams and joints, every joint requires a different joint element. Since creating such a joint element with the direct approach requires the relatively expensive computation of the Schur complement in Eq. (15), a more efficient alternative is required. This is achieved through approximation in the next section, where the expensive direct computation is performed offline and the online lattice simulation only uses an approximate joint stiffness matrix representation.

Joint element construction through reduction, approximation and expansion
Since the D-PSE method becomes computationally inefficient for lattices containing joints with varying configurations, an approximation of the space of joint stiffness matrices is now attempted. The proposed method, from now on referred to as the A-PSE approach, contains three steps also illustrated in Fig. 10.
1. Section 5.1 discusses the reduction step. This step is meant to prevent the approximation of trivial relations such as scaling, rotation and material dependence, and ensures the approximated stiffness matrix retains important intrinsic properties such as symmetry and positive semi-definiteness.
2. Section 5.2 deals with the approximation step, in which the reduced stiffness matrix is first sampled (see Sect. 5.2.1) and then approximated with a tensor-based method (see Sect. 5.2.2).
3. Section 5.3 then discusses the expansion step, where the goal is to recover the stiffness matrix from its reduced, approximated representation.
Afterwards, Sect. 5.4 shows an example and discusses advantages and disadvantages.

(Fig. 10 caption: Overview of the proposed approach. Since the direct joint computation is replaced by an approximation that is computed offline, the online computational cost is much lower.)

Reduction
Consider a joint with $n_f$ external faces, of which $n_b$ are faces connected to beams, classified as a joint of type T. In the first step of the A-PSE method, the joint is reduced to its barest representation: all information that need not be approximated is removed, to obtain a reduced number of both input and output parameters. This is done both to reduce the required number of samples (and thus the computational load) of the subsequent approximation step and to prevent any unnecessary reconstruction errors in the expansion step. Indeed, if intrinsic properties such as the stiffness matrix's symmetry are not handled properly before approximation, then after approximation they will only hold up to the approximation error.
Two types of reduction are used: those on the input or sample space are discussed in Sect. 5.1.1 and those on the output or stiffness matrix space are discussed in Sect. 5.1.2.

Sample space reduction
Two sample space restrictions are employed.
First, the material dependence is removed. Although a lattice joint can be made of any material, only joints made of a "standard" isotropic material with a Young's modulus of 1 N/m² and a Poisson ratio of ν = 0.3 are used as training data for the approximation. The Young's modulus can be changed during the expansion step (see Sect. 5.3); the Poisson ratio should ideally be kept as an input parameter but is for simplicity kept fixed in this paper.
Second, the orientation and the size of the joint are removed as parameters. Instead, for every joint type T a "standard" configuration in space is chosen, since joints with another configuration can be brought to this standard configuration by means of a linear transformation (see again Sect. 5.3). This configuration can be chosen arbitrarily. In this paper, the joints are restricted to have a unit radius and the normal vectors of the beam faces are chosen to point along positive unit directions as much as possible. Since some joint types contain inherent symmetries, symmetry conditions are also applied. As an example, consider the 2D joint with two beams shown in Fig. 11. Approximating all possible combinations of $r_1$ and $r_2$ is unnecessary since any combination where $r_1 \geq r_2$ can be transformed into a situation where $r_1 \leq r_2$ by means of a reflection around $\mathbf{v}_3 = \begin{bmatrix}1 & 0.5\end{bmatrix}^T$. Therefore, only combinations where the symmetry condition $r_1 \geq r_2$ holds are considered. Finally, the radii are also chosen to prevent overlap as illustrated in Fig. 11: beams can touch in at most one point since they would otherwise not remain cylindrical. As a result of the restrictions of the sample space, the only remaining input parameters that describe a joint of type T are the beam radii $r_i$, $i = 1, \ldots, n_b$, henceforth also referred to as the features $f_i$.

Joint stiffness matrix reduction
Three properties of a (joint) stiffness matrix K j are enforced by reducing it to a minimal representation.
First, symmetry is considered. Since applying a force and measuring the displacement is (physically) equal to applying that displacement and measuring the force, stiffness matrices are symmetric. Therefore, only the lower triangular part is considered.
Second, as discussed in Sect. 4.3, stiffness matrices have a null space corresponding to the rigid body modes. Since these modes are known beforehand, they are used to further reduce the joint matrix.
Consider the six rigid body modes of a joint stiffness matrix $\mathbf{K}_j \in \mathbb{R}^{N \times N}$ captured by $\mathbf{R} \in \mathbb{R}^{N \times 6}$, such that every column of R contains the translations and rotations of the faces under a linearly independent rigid body transformation (see Eq. (16)). By definition, $\mathbf{K}_j\mathbf{R} = \mathbf{0}$ and by splitting the degrees of freedom into six dependent degrees of freedom (subscript d) and the remaining independent ones (subscript i), one gets

$$\begin{bmatrix} \mathbf{K}_{ii} & \mathbf{K}_{id} \\ \mathbf{K}_{di} & \mathbf{K}_{dd} \end{bmatrix} \begin{bmatrix} \mathbf{R}_i \\ \mathbf{R}_d \end{bmatrix} = \mathbf{0}. \quad (17)$$

As will be shown in Sect. 5.3.1, reconstruction of $\mathbf{K}_j$ only requires the independent stiffness matrix $\mathbf{K}_{ii}$ and the a priori known rigid body modes R. Therefore, only $\mathbf{K}_{ii}$ must be approximated. Note that $\mathbf{K}_{ii}$ is still symmetric but has six zero eigenvalues fewer than $\mathbf{K}_j$. Remaining zero eigenvalues are caused by one or more of the beam radii being zero, since this means six dof become disconnected. Reconstruction is discussed in Sect. 5.3.1.
A third, key property is positive semi-definiteness (PSD) of $\mathbf{K}_j$. Whereas symmetry requires $\mathbf{K}_j$ to have real eigenvalues and the rigid body modes force $\mathbf{K}_j$ to have six zero eigenvalues, PSD requires all eigenvalues to be nonnegative. Note that PSD also still holds for $\mathbf{K}_{ii}$. Physically, this means a positive force cannot result in a negative displacement. It is enforced with Xu et al.'s approach [43] by noting that there exists a Cholesky decomposition $\mathbf{K}_{ii} = \mathbf{L}\mathbf{L}^T$, where $\mathbf{L} \in \mathbb{R}^{n_r \times n_r}$ is lower triangular and real. Approximating L instead of $\mathbf{K}_{ii}$ then ensures PSD. Although this approach works well for positive definite matrices, in the edge case of positive semi-definiteness the Cholesky decomposition is not unique, introducing unnecessary approximation error to Xu et al.'s approach. This occurs when one of the beam radii is zero, thus causing six zero rows and columns and hence six zero eigenvalues. In this paper, this numerical stability issue is solved by employing Tikhonov regularization: a diagonal term $\epsilon\mathbf{I}$ is added to $\mathbf{K}_{ii}$ before the Cholesky decomposition, thus removing the zero eigenvalues and ensuring its uniqueness. By choosing $\epsilon$ larger than the machine epsilon but smaller than the expected approximation error, accuracy is retained.
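The reduction described above can be sketched as follows, assuming the indices of the six dependent dof are given; `reduce_joint_matrix` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def reduce_joint_matrix(K_j, dep, eps=1e-9):
    """Reduce K_j to the Cholesky factor L of its independent block K_ii.

    dep : indices of the six dependent dof (chosen so R_d is invertible).
    eps : Tikhonov term, larger than machine precision but smaller than
          the expected approximation error."""
    ind = np.setdiff1d(np.arange(K_j.shape[0]), dep)
    K_ii = K_j[np.ix_(ind, ind)]
    # The regularization keeps the factorization unique even when a zero
    # beam radius introduces additional zero eigenvalues.
    return np.linalg.cholesky(K_ii + eps * np.eye(ind.size))
```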
In conclusion, one needs to approximate the map from the feature space $\mathcal{F}_T$, represented by the feature samples $r_i$, $i = 1, \ldots, n_b$, to the space of joint stiffness matrices $\mathcal{J}_T$, represented by the elements of the lower-triangular matrix L, where $\mathbf{K}_{ii} = \mathbf{L}\mathbf{L}^T$ and $\mathbf{K}_{ii}$ is a subblock of the joint stiffness matrix containing the independent degrees of freedom stemming from the rigid body modes.

Approximation
In this section, the method for approximating the reduced joint representation L, a lower-triangular matrix of size $6(n_f - 1) \times 6(n_f - 1)$, is explained. It consists of a method to sample the feature space $\mathcal{F}_T$ using the D-PSE method, discussed in Sect. 5.2.1, and a method to build an approximation based on these samples, discussed in Sect. 5.2.2. Although matrix approximation is a straightforward concept, the multidimensional nature of this application poses practical problems. Both subsections therefore only give a general overview of the method and refer to the appendix for more detailed information.

Sampling the feature space
Reduction is followed by a sampling of the feature space $\mathcal{F}_T$ to generate training and test data for the subsequent approximation step. Here, data must be understood as a set of feature samples, i.e. tuples $(r_1, \ldots, r_{n_b}) = (f_1, \ldots, f_{n_b})$, where the features correspond to external face radii $0 < r_i < 1$ of a joint in standard configuration. The training samples, of which usually a few hundred are constructed, are used as input for the approximation step, whereas the test samples are meant to gauge the approximation error and prevent both over- and underfitting. Due to the use of a tensor-based approximation method, the training data is generated on a multidimensional grid. The details of the sampling algorithm are explained in Appendix B.

(Fig. 12 caption: Illustration of the 2D feature space $(r_1, r_2)$ for the joint of Fig. 11, showing both the overlap and the symmetry constraints.)
As an example of the sampling algorithm's results, Fig. 13 visualizes the distribution of the training and test samples for the internal diamond lattice joint. Each sample is converted to a joint element with the D-PSE method, after which the resulting stiffness matrix is reduced to its representation L.

Approximating the reduced representation L
This section discusses the approximation of the map from the feature space to the reduced joint stiffness matrix space, $(r_1, \ldots, r_{n_b}) \in \mathcal{F}_T \mapsto \mathbf{L} \in \mathcal{J}_T$. To reiterate, the goal is to circumvent the computationally heavy D-PSE method with a more efficient but still accurate routine. Although this is conceptually a straightforward operation, approximating high-dimensional maps is to date still a challenging computational task due to the curse of dimensionality. A high-level overview is given here. For more detailed information, the reader is referred to Appendix C.
Two approximation types are developed: a matrix-level technique that approximates the complete L matrix and an entry-level technique that approximates each entry of L separately. The main idea of both techniques is to construct an approximation that separates the dependence on each feature and then combines them through first multiplication and then summation. As an example, for the matrix-level approach the normalized, reduced stiffness matrix $\mathbf{L}_n = \mathbf{L}/\|\mathbf{L}\|$ is approximated by

$$\mathbf{L}_n \approx \sum_{r=1}^{R_1} \left(\prod_{i=1}^{n_b} u_r^i(f_i)\right) \mathbf{L}_r^b.$$

Here, $u_r^i(f_i)$ is a function depending only on the i-th feature, $\mathbf{L}_r^b$ is the r-th basis matrix and the parameter $R_1$ determines the complexity of the model.
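A minimal sketch of evaluating such a separated model is given below. For simplicity the factor functions are plain polynomials instead of the rational functions used later in Sect. 5.4, and all names and values are illustrative.

```python
import numpy as np

def evaluate_matrix_level(features, factor_coeffs, basis_matrices):
    """Evaluate L_n = sum_r prod_i u_r^i(f_i) * L_r^b.

    factor_coeffs[r][i]: polynomial coefficients (highest degree first)
    of the factor function u_r^i; basis_matrices[r] is L_r^b."""
    L_n = np.zeros_like(basis_matrices[0])
    for r, L_b in enumerate(basis_matrices):
        weight = 1.0
        for i, f in enumerate(features):
            weight *= np.polyval(factor_coeffs[r][i], f)
        L_n += weight * L_b
    return L_n

# Illustration with R_1 = 2 terms and n_b = 4 features (beam radii);
# 18 x 18 corresponds to 6(n_f - 1) for a four-face joint.
rng = np.random.default_rng(0)
basis = [rng.standard_normal((18, 18)) for _ in range(2)]
coeffs = [[rng.standard_normal(3) for _ in range(4)] for _ in range(2)]
L_n = evaluate_matrix_level([0.3, 0.4, 0.2, 0.5], coeffs, basis)
```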

Expansion
Evaluation of the approximation leads to an approximation $\hat{\mathbf{L}}$ of the reduced representation L of the joint matrix $\mathbf{K}_j$. This is followed by expansion of $\hat{\mathbf{L}}$, leading to an approximation $\hat{\mathbf{K}}_j$ of the joint matrix. This expansion routine thus reverses the reduction routine and in the process not only ensures matrix symmetry, positive semi-definiteness and the correct rigid body modes, but also transforms the parametrized superelement from its standard to its actual material and orientation.
Positive semi-definiteness and symmetry are guaranteed since $\mathbf{K}_{ii} = \mathbf{L}\mathbf{L}^T$. What remains are the rigid body modes, the orientation and the material, discussed in Sects. 5.3.1, 5.3.2 and 5.3.3, respectively.

Reapplying rigid body modes
Given Eq. (17) and an approximation of $\mathbf{K}_{ii}$, the missing parts of the complete joint matrix are found through

$$\mathbf{K}_{id} = -\mathbf{K}_{ii}\mathbf{R}_i\mathbf{R}_d^{-1}, \qquad \mathbf{K}_{dd} = \mathbf{R}_d^{-T}\mathbf{R}_i^T\mathbf{K}_{ii}\mathbf{R}_i\mathbf{R}_d^{-1}.$$

Here, $\mathbf{R}_d$ is invertible when the dependent dof are chosen properly and the resulting joint matrix $\mathbf{K}_j$ is assured to have the correct rigid body modes.
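A direct translation of these expressions into Python could look as follows; this is illustrative only, and the index bookkeeping is assumed to match the reduction step.

```python
import numpy as np

def expand_joint_matrix(K_ii, R, ind, dep):
    """Recover K_j from K_ii and the rigid body modes R (see Eq. 17)."""
    X = R[ind] @ np.linalg.inv(R[dep])    # R_i R_d^{-1}; R_d invertible
    K_id = -K_ii @ X
    n = len(ind) + len(dep)
    K_j = np.zeros((n, n))
    K_j[np.ix_(ind, ind)] = K_ii
    K_j[np.ix_(ind, dep)] = K_id
    K_j[np.ix_(dep, ind)] = K_id.T
    K_j[np.ix_(dep, dep)] = X.T @ K_ii @ X    # = R_d^{-T} R_i^T K_ii R_i R_d^{-1}
    return K_j
```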

Scaling and coordinate transformation
After reduction, approximation and expansion of the symmetry and rigid body modes, the resulting stiffness matrix represents a joint with unit material in its standard configuration. This section elaborates on the transformation from the joint stiffness matrix $\mathbf{K}_j^S$ in the standard configuration S to $\mathbf{K}_j^A$, the stiffness matrix in its actual configuration A. In [36], this is described for transformations $\mathbf{p} \mapsto \mathbf{A}\mathbf{p} + \mathbf{b}$ that preserve angles and volume, namely rotation and reflection (the translational component b is added for completeness but can easily be shown to have no effect on the stiffness matrix). A necessary requirement for A is orthogonality (i.e. $\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I}$). A brief overview of [36] is given here.

Through a coordinate change interpretation, it is possible to show that $\mathbf{K}_j^A = \mathbf{M}^T\mathbf{K}_j^S\mathbf{M}$, where M is a block-diagonal matrix with A on its diagonal, repeated once for every set of three dof of the (joint) stiffness matrix. Furthermore, if the transformation is expressed as a sequence of m subtransformations $\mathbf{p} \mapsto \mathbf{A}_i\mathbf{p} + \mathbf{b}_i$, then

$$\mathbf{K}_j^A = \mathbf{M}^T\mathbf{K}_j^S\mathbf{M}, \qquad \mathbf{M} = \prod_{i=1}^{m}\mathbf{M}_i,$$

where $\mathbf{M}_i$ is again a block-diagonal matrix, now with $\mathbf{A}_i$ on its diagonal:

$$\mathbf{M}_i = \mathrm{blockdiag}(\mathbf{A}_i, \ldots, \mathbf{A}_i).$$

These results are now extended to uniform scaling, i.e. $\mathbf{A} = \alpha\mathbf{I}$, by means of the following dimensional analysis.
Consider a 6 × 6 elementary stiffness matrix K describing the stiffness between the translational and rotational dof of a single node such that $\mathbf{K}\mathbf{u} = \mathbf{f}$. It is now stated that each stiffness element is the product of (1) a material parameter (an element of the stiffness tensor $\mathsf{S}$ s.t. $\boldsymbol{\sigma} = \mathsf{S} : \boldsymbol{\varepsilon}$) with units N/m² and (2) a size parameter that depends solely on the geometry of the superelement.
Only the size parameter is affected by the uniform scaling; it depends on the scaling through its units. Consider therefore the following subdivision of K based on which types of dof it connects (translational or rotational):

$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{uu} & \mathbf{K}_{u\theta} \\ \mathbf{K}_{\theta u} & \mathbf{K}_{\theta\theta} \end{bmatrix}.$$

The translational stiffness elements in $\mathbf{K}_{uu}$ relate displacements to forces and therefore have units N/m. The corresponding size parameter thus must have units m, which scales linearly with α. Through a similar derivation, the mixed blocks $\mathbf{K}_{u\theta}$ and $\mathbf{K}_{\theta u}$ and the rotational block $\mathbf{K}_{\theta\theta}$ are found to scale with $\alpha^2$ and $\alpha^3$, respectively. Thus, undoing a uniform scaling can be implemented in a similar way as undoing the rotations and reflections: by means of a block-diagonal matrix $\mathbf{M}_{m+1}$ containing $\sqrt{\alpha}\mathbf{I}$ and $\alpha^{3/2}\mathbf{I}$ on the blocks corresponding to translational and rotational dof, respectively. By multiplying all m + 1 transformation matrices $\mathbf{M}_i$ into $\mathbf{M} = \prod_{i=1}^{m+1}\mathbf{M}_i$, the effect of rotation, reflection and uniform scaling on the stiffness matrix $\mathbf{K}_j^S$ can be summarized as

$$\mathbf{K}_j^A = \mathbf{M}^T\mathbf{K}_j^S\mathbf{M}.$$
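The combined effect of one orthogonal transformation and a uniform scaling can be sketched as below, assuming one 6-dof node per external face; the helper function is illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import block_diag

def transform_joint_matrix(K_S, A, alpha, n_nodes):
    """Map K_j^S to the actual configuration: orthogonal A plus uniform
    scaling alpha. Translational dof carry sqrt(alpha) and rotational
    dof alpha**1.5, so the uu, u-theta and theta-theta blocks scale
    with alpha, alpha**2 and alpha**3, as derived above."""
    S = np.diag([np.sqrt(alpha)] * 3 + [alpha ** 1.5] * 3)
    M_node = block_diag(A, A) @ S          # per-node 6 x 6 transformation
    M = block_diag(*([M_node] * n_nodes))
    return M.T @ K_S @ M
```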

Material transformation
The final expansion step deals with going from the standard unit material to the actual material of the joint. Both the Young's modulus E and the Poisson ratio ν (or any other two Lamé parameters) should be changed. Since the joint mesh only contains solid elements, its stiffness matrix $\mathbf{K}_{int}$ is linearly related to the Young's modulus, and one can then easily show that the joint stiffness matrix, related to $\mathbf{K}_{int}$ via Eq. (15), is also linear in E. Going from the unit Young's modulus of $E_{unit} = 1$ N/m² to the actual Young's modulus $E_A$ thus only requires multiplication by $E_A/E_{unit}$. The Poisson ratio is tougher to incorporate and should in general be included as an extra parameter during sampling and approximation. In this paper, it is kept constant for simplicity.

Numerical example and discussion
A numerical example is now used to discuss three elements of the A-PSE method: its accuracy, its computational efficiency and its relation to the literature, more specifically the ARSP method developed by Wu et al. [40]. The numerical example focuses on the internal diamond lattice joint (see Fig. 4), which connects to four beams and thus requires approximation of a four-dimensional space. Both the matrix-level and the entry-level approaches are applied to a set of 700 training samples and 77 test samples, of which the distribution is shown in Fig. 13. For $T_1$, the matrix-level method uses 34 basis matrices and rational functions with sixth-degree numerators and second-degree denominators for the factor vectors $u_n^k$. For $T_2$, a rank of R = 10 is used and the factor vectors $v_n^k$ are restricted to lie on fifth-degree polynomials. The entry-level method also uses rational functions with sixth-degree numerators and second-degree denominators for its factor vectors $w_n^k$. Solution of the nonlinear least-squares optimization problem is performed with Tensorlab v3 [37].
Accuracy Figure 14 shows the absolute errors (computed with the L2 norm) as a function of the sample norm, achieved by both methods on the same samples. From this figure, three conclusions on accuracy are now drawn. First, from the comparison of Fig. 14a and Fig. 14b, it is clear that the entry-level approach achieves a higher accuracy, albeit at a higher computational cost. Second, since the matrix-level approach first scales the norm of the matrices, the relative error is minimized and the sample error distribution is parallel to the lines of constant relative error. By contrast, the entry-level approach is unable to minimize the relative error and instead optimizes the absolute error. This causes training samples with a large norm to be overfitted and explains the error distribution in Fig. 14b. However, the test set error mostly stays below the expected 1% discretization error (see Sect. 4.3). Third, for very low norms the sample accuracy breaks down for both the entry-level and the matrix-level approach. This is very apparent in Fig. 14b, where at low norms both the relative training and test set errors go up. This is presumed to occur because of the discretization error of the D-PSE method, since samples in this region correspond to joints with a small radius (around 10% of the joint radius) and the mesh is unable to capture these small faces with sufficient accuracy.
Computational efficiency A conclusion on computational efficiency is now drawn based on Table 1. Two important factors must be mentioned before studying the detailed timings. First, since the A-PSE method uses the D-PSE method for sampling, Table 1 reports computation times normalized with respect to the time needed to construct a single diamond lattice joint element with the D-PSE method. The number of elements used in the D-PSE method is chosen such that a discretization error of around 1% is achieved on the stiffness matrix, which results in at least 40k tetrahedral elements and a computation time of around nine minutes (on a computer with 2× Intel(R) Xeon(R) CPU, 14 cores and 126GB RAM).
Second, parallelism greatly influences the computation time but not the computational cost, i.e. the number of floating-point operations. Only the computation time is reported in Table 1. The sample generation is carried out with eleven processes working in parallel and is bottlenecked by the RAM used during the sparse LU factorization. The tensor decompositions and online approximation evaluations are carried out on a laptop with an Intel(R) Core(TM) i7, 6 cores and 64GB RAM. The entry-level A-PSE method uses explicit parallelism for both the tensor decompositions and the evaluation, whereas the matrix-level method does not. This immediately showcases the "embarrassingly parallel" nature of the entry-level method: it lends itself perfectly to aggressive parallelization.
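A schematic sketch of this parallel structure; `entry_models` is hypothetical and the per-entry evaluation is a placeholder, not the paper's actual rational-function models:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def eval_entry_model(entry_model, features):
    """Placeholder for evaluating one entry's independent approximation at
    the given feature point (the paper fits rational functions per entry)."""
    return float(np.polyval(entry_model, features[0]))

def eval_entries_parallel(entry_models, features, n_workers=6):
    """Every entry has its own independent approximation, so all entries
    can be evaluated concurrently across a process pool."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(eval_entry_model, m, features)
                   for m in entry_models]
        return np.array([f.result() for f in futures])
```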
Three conclusions are now drawn from the results of Table 1. First, sampling is undoubtedly the most expensive step, and parallelization greatly helps in reducing its computation time from a few days to under twelve hours. Since this step is offline, it need only be carried out once. Second, the online cost of evaluating the joint approximations is negligible compared to the D-PSE cost. As the next section will show, evaluating thousands of joint approximations during the meshing step of a complete lattice simulation is less expensive than solving the resulting assembled stiffness matrix. This is in line with standard finite element solvers, where stiffness matrix assembly has a lower computational complexity than linear system solving.
Third, the computational cost of the entry-level approach is higher than that of the matrix-level approach due to its vastly larger number of parameters. However, aggressive parallelization alleviates most of this cost: for our setup, only a 15% increase in decomposition time is observed.

Numerical example
This section provides a numerical example to showcase that joint elements can be used to efficiently simulate global properties of large-scale lattices. To this end, the shear and compression behaviour of a 5×5×5 diamond lattice (shown in Fig. 15a) with increasing relative density is simulated. Three meshing techniques are used: the solid mesh (Fig. 15b), a pure beam element mesh without any of the joint compensation techniques discussed in Sect. 2.1, and the combined beam and joint element mesh (referred to as the PSE mesh). For the face, edge and corner joints (see Sect. 5.4), PSE joint elements are built with the parameters of Table 2. All of these approximations use the matrix-level method, since the number of beams $n_b$ is low (two for the face joints, one for the edge and corner joints); $R_1$ and $R_2$ correspond to the complexity of the approximation and are explained in detail in Appendix C.
The following two setups are used to extract global properties; a minimal computation of the resulting moduli is sketched after the list.

- Pure compression: the lattice structure is fixed in the vertical direction at the bottom and a unit vertical displacement $\delta_y$ is applied at the top. An illustration of the exact boundary conditions is shown in Fig. 16a. The vertical reaction force $R_y$ is used to compute the apparent Young's modulus
$$\bar{E} = \frac{R_y L}{A \delta_y},$$
where $L$ and $A$ are the height and cross-sectional area of the lattice structure.
- Pure shear: the lattice structure is again fixed at the bottom and a unit horizontal displacement $\delta_x$ is applied at the top. An illustration of the exact boundary conditions is shown in Fig. 16b. The horizontal reaction force $R_x$ is used to compute the apparent shear modulus
$$\bar{G} = \frac{R_x L}{A \delta_x},$$
where $L$ and $A$ are again the height and cross-sectional area of the lattice structure.
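As referenced above, a minimal sketch of how the apparent moduli follow from the reaction forces, assuming the stress-over-strain formulas written out in the list:

```python
def apparent_young_modulus(R_y, L, A, delta_y=1.0):
    """Apparent Young's modulus: average stress R_y/A over average strain delta_y/L."""
    return (R_y * L) / (A * delta_y)

def apparent_shear_modulus(R_x, L, A, delta_x=1.0):
    """Apparent shear modulus: average shear stress R_x/A over shear strain delta_x/L."""
    return (R_x * L) / (A * delta_x)

# Example with made-up numbers: a 50 mm tall lattice with a 50x50 mm cross
# section and a 1 kN reaction force under a 1 mm compression.
E_bar = apparent_young_modulus(R_y=1e3, L=0.05, A=0.05**2, delta_y=1e-3)
print(f"apparent Young's modulus: {E_bar:.3e} Pa")
```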
Accuracy Figures 17 and 18 show the $\bar{E}(\rho)$ and $\bar{G}(\rho)$ curves obtained with the three mesh types, together with the errors with respect to the solid mesh. Both the beam element mesh and the mesh containing joint elements underestimate the stiffness. For the beam element mesh, this is most likely because the joint stiffness is not incorporated, although an overestimation could also have occurred due to the strict straight face constraints. Adding joint elements reduces the error, although the more relaxed dual form of the straight face constraints still underestimates the apparent Young's modulus found with the solid element mesh. A great advantage of using joint elements, despite this stiffness underestimation, is that no arbitrary beam element modifications are required to incorporate the joints, as discussed in Sect. 2.1. Furthermore, joint elements allow evaluation of stress concentrations near joints, although further validation in this direction is left to future work.
Computation time A great advantage of the PSE method is its favourable trade-off between computational efficiency and accuracy. Table 3 shows rough estimates of the computation times needed for generating and solving the solid, PSE and beam meshes.
The number of elements used for the solid mesh was decided based on a mesh convergence test and ranges between 400 000 and 500 000 second-order tetrahedral elements. Generating these meshes alone is a very expensive task due to the complicated CAD geometry: a single solid mesh creation took around six minutes. Not included in the table is the scripted generation of the CAD geometry in Simcenter 3D [31], which, due to the large number of beams and joints, takes a little more than three minutes and becomes progressively harder as more unit cells are added. However, the most expensive task remains the solution of the assembled stiffness matrix, which (depending on the number of elements) took between five and twenty minutes for a single solid mesh. This is in stark contrast to the PSE mesh, containing 1162 joint elements and 2000 beam elements, and the simple beam element mesh, containing only its 2000 beam elements. With this small number of elements, meshing becomes a much easier task and takes only a fraction of the computation time. Solving a beam element mesh took roughly nine seconds, of which three seconds were used to generate and assemble the elemental beam stiffness matrices and the remaining six were used to solve the assembled stiffness matrix. The PSE mesh took around 40 seconds to solve, of which ten were used to generate and assemble both the beam and the joint element stiffness matrices. Note that in Table 3 the stiffness matrix assembly includes the elemental stiffness matrix construction; it is reported separately to show that matrix solving remains the most expensive computational step (for the solid mesh, this timing is included in the linear system solving time). All times were measured on a laptop with an Intel(R) Core(TM) i7, 6 cores and 64GB RAM.

Fig. 17 Comparison of the $\bar{E}(\rho)$ curve of a diamond lattice structure using solid elements (labelled "Solid"), beam elements (labelled "Beam") and a combination of both joint and beam elements (labelled "PSE")

Fig. 18 Comparison of the $\bar{G}(\rho)$ curve of a diamond lattice structure using solid elements (labelled "Solid"), beam elements (labelled "Beam") and a combination of both joint and beam elements (labelled "PSE")

When the offline cost is taken into account, a net gain in total computation time occurs after simulating more than twenty to thirty lattices, depending among others on lattice size, mesh density and approximation accuracy. The PSE method should thus only be used when larger numbers of lattices need to be simulated, e.g. during optimization. Another option is to use the PSE method when the lattice becomes so large that a solid mesh is unsolvable: indeed, when increasing the number of unit cells from 5×5×5 to 10×10×10 and beyond, multiple millions of solid elements are required and the limits of computational resources are easily exceeded.

Conclusions and future work
An efficient and accurate method for the simulation of structures with large numbers of parts that vary in a select set of parameters is proposed and applied to lattice structures. To this end, the lattice beams are meshed with standard Timoshenko beam elements and the lattice joints are represented by a Parametrized Superelement. In the D-PSE method, a joint element is constructed by meshing the geometry with solid elements, applying dual straight face constraints to create a fixed interface and finally statically condensing the structure to its exterior degrees of freedom. This method is shown experimentally to converge under mesh refinement and to retain the rigid body modes. Furthermore, a significant model order reduction is achieved. To circumvent the expensive static condensation during simulation, approximation is performed by the A-PSE method: the space of joint stiffness matrices is first sampled, then reduced to its most basic form, after which the multidimensional space is approximated with tensor decompositions and expansion is applied. Numerical examples showcase the accuracy and computational efficiency of using the PSE method for lattice simulation. Although the global stiffness is slightly underestimated due to the relaxed dual form of the straight face constraints, the error with respect to an expensive solid mesh stays within forty percent and the formulation allows evaluation of the forces and stresses inside the lattice joints. Due to the offline cost of sampling and fitting the joint element space, the PSE method should only be used when simulating more than twenty lattice structures. Due to its low online cost, the PSE method is ideally suited for lattice structure optimization, parameter exploration and nonlinear fatigue simulations.

Future work
Four directions for future work and development are now outlined. First and foremost, more validation (both numerical and experimental) should be performed to fully assess the capabilities of the proposed lattice simulation methodology. Second, the approach must be extended to lattice unit cells other than the diamond unit cell. Some initial work has already been carried out here but is impeded by the increasing difficulty of fitting higher-dimensional spaces. Third, improvements to the discretization and approximation errors should be studied, such as higher-order meshing, improved sampling and approximation of the compliance matrix instead of the stiffness matrix to better capture the smallest eigenvalues. Fourth, the effect of the Poisson ratio on the joint element matrix and the possibility of modelling this effect a priori must be investigated.
where $p_C$ is the surface centroid and $t_f$ and $t_m$ are unknown but constant vectors that depend on the reaction force and moment via the averaging Eqs. (2) and (3), respectively. Substituting Eq. (18) into Eq. (2), it can readily be seen that $t_f = f/A$, i.e. applying a force to a plane surface leads to a constant traction vector. Substituting Eq. (18) into Eq. (3) leads to: The last equality holds due to the vector triple product.
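The final equality of the derivation is not reproduced above, but the identity it invokes is presumably the standard vector triple product expansion, stated here for reference:

```latex
\mathbf{a} \times (\mathbf{b} \times \mathbf{c})
  = \mathbf{b}\,(\mathbf{a} \cdot \mathbf{c})
  - \mathbf{c}\,(\mathbf{a} \cdot \mathbf{b})
```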

Appendix B: Sampling algorithm details
The sampling algorithm's details are now elaborated. It consists of two main decision steps and an acceptance-rejection step.
The first decision step chooses the required number of samples for the training and the test set. For the training set, a decision is made based on the required number of samples $N_{\text{training}}$ for the subsequent approximation to be unique. That is, there should be at least as many samples as there are parameters (e.g. basis function coefficients) in the approximate model. Of course, the number of approximation parameters is only known after increasing their number until approximation error convergence is achieved. This was done for some joint types starting from a rough estimate of $N_{\text{training}} \approx R \cdot n_b \cdot N_D$, where $R$ is the expected rank and $N_D$ the expected number of basis function coefficients for a single dimension, based on [5, Question 4]. Initial attempts showed $R \approx 30$ and $N_D \approx 6$, yielding between 100 and 800 training samples for diamond lattice joints (for the four-beam internal joint, for example, $30 \cdot 4 \cdot 6 = 720$, consistent with the 700 training samples of Sect. 6). It must be noted that the number of training samples increases (usually by at most 10 samples) during approximation, since samples are added adaptively, close to test samples with a large approximation error. Finally, the number of test samples is chosen such that the test set contains roughly ten percent of the total number of samples.
The second decision step chooses the number and distribution of "sparse" samples, i.e. samples containing one or more zero features. These sparse samples represent joints that lack one or more beams. They are important since they lie at the boundary of the sample domain and are otherwise very hard to sample. In general, training and test sets are created with roughly thirty percent sparse samples. When a sample is chosen to be sparse, a decision is made on its measure of "sparsity", i.e. how many of its features should be zero. This decision is made by uniformly dividing the sparse samples over all possible measures of sparsity, ranging from a single zero feature to all features but one being zero (ensuring, of course, that the joint still has external faces).
Finally, an acceptance-rejection method creates the samples in three substeps. Consider the generation of a sample $(r_1, \ldots, r_{n_b})$ with $n_0$ zero features. First, a random selection is performed: a total of $n_0$ features are chosen and kept at zero. Second, the remaining $n_b - n_0$ features are generated in a randomized fashion. Third, the generated samples are either accepted or rejected based on both the symmetry and the overlap constraints mentioned in Sect. 5.1 and visualized in Fig. 12. The workings of the second substep are now explained further.
Test sample features are generated from two types of normal distributions, $N_1(\mu_1, \sigma_1)$ and $N_2(\mu_2, \sigma_2)$. $N_1$ has a large standard deviation $\sigma_1$ (e.g. 0.25) to ensure the bulk of the feature space is properly sampled, whereas $N_2$ has a smaller deviation $\sigma_2$ (usually 0.1) but a larger mean $\mu_2$ (around 0.8) to better sample the feature space boundaries. For both, features are bounded to the interval $[r_{\min}, r_{\max}]$ by applying $\max(\min(r, r_{\max}), r_{\min})$ to a sampled value $r$. Here, $r_{\min}$ is usually chosen as 0.1, since meshing problems occur for smaller radii.
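A minimal sketch of the test-sample generation under stated assumptions: $\mu_1$ and the mixing ratio between $N_1$ and $N_2$ are not given in the text and are assumed here, and the constraint check is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

# sigma_1, (mu_2, sigma_2) and r_min follow the text; mu_1 and r_max are assumed.
MU_1, SIGMA_1 = 0.5, 0.25   # bulk distribution N1
MU_2, SIGMA_2 = 0.8, 0.10   # boundary distribution N2
R_MIN, R_MAX = 0.1, 1.0     # feature bounds

def satisfies_constraints(sample):
    """Placeholder for the symmetry and overlap checks of Sect. 5.1."""
    return True

def draw_test_sample(n_b, n_zero):
    """Acceptance-rejection generation of one test sample with n_zero zero
    features, i.e. a joint lacking n_zero of its n_b beams."""
    while True:
        zero_idx = set(rng.choice(n_b, size=n_zero, replace=False).tolist())
        sample = np.zeros(n_b)
        for k in range(n_b):
            if k in zero_idx:
                continue  # substep 1: selected features stay at zero
            # Substep 2: draw from the bulk (N1) or boundary (N2) distribution.
            if rng.random() < 0.5:  # assumed 50/50 mixing
                r = rng.normal(MU_1, SIGMA_1)
            else:
                r = rng.normal(MU_2, SIGMA_2)
            sample[k] = max(min(r, R_MAX), R_MIN)
        # Substep 3: accept or reject based on the constraints.
        if satisfies_constraints(sample):
            return sample

# Example: a four-beam joint sample with one missing beam.
print(draw_test_sample(n_b=4, n_zero=1))
```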
Training sample features are generated in a similar way. However, since a tensor decomposition is used during approximation, a discrete grid is employed. Consider an $n_b$-dimensional feature space with $N$ points in every dimension, thus containing a total of $N^{n_b}$ points (see Fig. 19). The $k$-th dimension of this multidimensional grid is indexed by $i_k$, with $k = 1, \ldots, n_b$, such that each index maps to a feature value $f_k(i_k)$; the number of sampled grid values in dimension $k$ is $N_k \le N + 1$. For example, in Fig. 19 there is no sample with $i_1 = 0$ and thus $N_1 = N - 1 = 2$. The reduced joint representations can then be written as $L(f_1(j_1), \ldots, f_{n_b}(j_{n_b}))$, which we abbreviate to $L(j_1, \ldots, j_{n_b})$.

A major challenge with the above sampling method is the difficulty of sampling close to the feature space boundary (see Fig. 12). This boundary is essential since, in practice, only joints lying on it are used. Consider a joint in the bulk region, such as the one shown in Fig. 11. The radius of this joint can be reduced without causing overlap and therefore, as is also clear from Sect. 3, such a joint will be omitted in favor of the closest joint on the feature boundary. During sampling, however, the joint radius is fixed at 1, which complicates sampling the boundary. Although the $N_2$ distribution used for this purpose comes reasonably close to the feature space boundary for test samples, training samples are limited by the hypergrid. An obvious improvement would therefore be to refine the grid spacing to a certain tolerance. On a higher level, though, this challenge stems from the fact that the feature space has one superfluous parameter, namely the joint radius, currently kept at 1. Future work will focus on this challenge.

Fig. 19 Multidimensional feature grid for features $f_1$, $f_2$ and $f_3$ with $N = 3$ and three randomly selected samples shown in red

Appendix C: Approximation details

Matrix-level approximation The first tensor is decomposed as
$$\sum_{r=1}^{R_1} u_r^1(f_1) \cdot \ldots \cdot u_r^{n_b}(f_{n_b}) \cdot L_r^b =: I_1(f_1, \ldots, f_{n_b}),$$
where the factor vectors $u_r^k$ are restricted to be evaluations of polynomial or rational functions $u_r^k(f_k)$ and the $L_r^b$ are the basis matrices.
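As an illustration of the online evaluation of this decomposition, the following sketch (assuming NumPy, plain polynomial factor functions for simplicity rather than the rational functions used in the paper, and hypothetical coefficient arrays) evaluates $I_1$ at a single feature point:

```python
import numpy as np

def eval_factors(coeff_rows, f):
    """Evaluate the R1 factor functions u_r^k at feature value f; coeff_rows
    has shape (R1, degree + 1), highest-order coefficient first."""
    return np.array([np.polyval(c, f) for c in coeff_rows])

def eval_I1(features, U, L_basis):
    """Evaluate I1(f_1, ..., f_nb) = sum_r u_r^1(f_1) ... u_r^nb(f_nb) L_r^b.

    features : sequence of feature values (f_1, ..., f_nb)
    U        : list, per dimension k, of coefficient arrays for the u_r^k
    L_basis  : the R1 basis matrices L_r^b, stacked into shape (R1, m, m)
    """
    weights = np.ones(L_basis.shape[0])
    for U_k, f_k in zip(U, features):
        weights = weights * eval_factors(U_k, f_k)  # elementwise product over r
    # Weighted sum of the basis matrices: contract over the rank index r.
    return np.tensordot(weights, L_basis, axes=1)

# Smoke test with random data: n_b = 2 features, R1 = 3, 4x4 basis matrices.
rng = np.random.default_rng(1)
U = [rng.normal(size=(3, 4)) for _ in range(2)]   # cubic factor polynomials
L_basis = rng.normal(size=(3, 4, 4))
print(eval_I1((0.5, 0.8), U, L_basis).shape)      # -> (4, 4)
```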
The second tensor is decomposed as
$$\sum_{r=1}^{R_2} v_r^1(f_1) \cdot \ldots \cdot v_r^{n_b}(f_{n_b}) =: I_2(f_1, \ldots, f_{n_b}),$$
where, again, the feature factor vectors $v_r^k$, with $k = 1, \ldots, n_b$, are restricted to be evaluations of polynomial or rational functions $v_r^k(f_k)$. Due to the very smooth behaviour of the joint matrix norm, a low degree and a limited number of terms $R_2$ are required to represent it with sufficient accuracy; in this paper, around ten terms and fifth-degree polynomials are used. The functional approximation $I_2(f_1, \ldots, f_{n_b})$ for the reduced joint matrix norm $\|L(j_1, \ldots, j_{n_b})\|_2$ is then obtained.

Entry-level approximation While the matrix-level approximator simultaneously constructs an approximation for all entries of $L$, the entry-level method approximates every entry separately, similarly to how the matrix-level method approximates the matrix norm.
Consider again the reduced representation of the joint stiffness matrices $L(j_1, \ldots, j_{n_b})$. For every lower-diagonal entry