Applying Convolutional Neural Networks to Data on Unstructured Meshes with Space-Filling Curves

This paper presents the first classical Convolutional Neural Network (CNN) that can be applied directly to data from unstructured finite element meshes or control volume grids. CNNs have been hugely influential in the areas of image classification and image compression, both of which typically deal with data on structured grids. Unstructured meshes are frequently used to solve partial differential equations and are particularly suitable for problems that require the mesh to conform to complex geometries or for problems that require variable mesh resolution. Central to the approach are space-filling curves, which traverse the nodes or cells of a mesh tracing out a path that is as short as possible (in terms of numbers of edges) and that visits each node or cell exactly once. The space-filling curves (SFCs) are used to find an ordering of the nodes or cells that can transform multi-dimensional solutions on unstructured meshes into a one-dimensional (1D) representation, to which 1D convolutional layers can then be applied. Although developed in two dimensions, the approach is applicable to higher dimensional problems. To demonstrate the approach, the network we choose is a convolutional autoencoder (CAE) although other types of CNN could be used. The approach is tested by applying CAEs to data sets that have been reordered with an SFC. Sparse layers are used at the input and output of the autoencoder, and the use of multiple SFCs is explored. We compare the accuracy of the SFC-based CAE with that of a classical CAE applied to two idealised problems on structured meshes, and then apply the approach to solutions of flow past a cylinder obtained using the finite-element method and an unstructured mesh.


Introduction
Also known as convnets, Convolutional Neural Networks (CNNs) have provided revolutionary advances in the fields of image compression [1,2] and image classification [3,4]. Although most often encountered in computer vision, technology from CNNs has recently shown promise in helping to solve governing equations for fluid motion [5], as well as in analysing these motions [6]. The accuracy of CNNs for image compression seems to carry over to these applications; however, CNN technology is generally based on structured grids and is therefore not readily applicable to data held on unstructured meshes. Many computational fluid dynamics (CFD) problems rely on unstructured meshes to resolve complex geometries [7] and interfaces [8]. These meshes typically concentrate resolution where it is needed, that is, where finer-scale features occur, such as in the vicinity of a viscous boundary layer, at the interface between fluids, or at a density front or shock wave [9,10].
Without a means of applying CNN techniques directly to data on unstructured meshes, a number of alternative methods have been developed. One may apply a 1D CNN directly to the vectorised nodal values of the solution variables. If the ordering of these variables has been optimised, for example to increase the efficiency of iterative or direct solvers, this might be more successful; however, we have found that this approach either does not work or has limited success. To improve on this, one could interpolate the data from an unstructured mesh to a structured mesh and then apply a CNN, although this introduces interpolation errors and generally requires a much larger number of grid points to capture the features in the original data [11]. Of course, one could also restrict oneself to using structured meshes for the solution data and then apply CNNs directly to this [5,12,13]. However, for modelling certain geometries or physics, this is not feasible.
Graph-based methods can also be used to apply CNNs to arbitrary data that has been arranged in the form of a graph. Graph Neural Networks (GNNs) are networks that can be applied to such data [14], for example, the data associated with an unstructured finite element mesh that has been transformed into a graph. Convolutional GNNs [14] or Graph Convolutional Networks [15] take a weighted average of the values at the vertices (or pixels or cells) adjacent to a given vertex and use this to form a new layer or feature map, similar to classical CNNs.
Recently developed within PyTorch [16], MeshCNN [17] is one example of this type of method.
MeshCNN forms a graph from the discretisation and then applies convolutional filters directly to the data held on the unstructured mesh by using mesh coarsening operations, akin to those used in multi-grid methods [18]. The resulting coarse meshes are then used to form pooling operations similar to those used in classical CNNs. MeshCNN [17] has a higher accuracy than other methods when tested on a variety of classification and segmentation problems. Another type of graph-based method is the point cloud method [19], which forms a graph from the nearest neighbouring points to a given point in the data cloud. Once the graph is constructed, convolutional filters can be applied as described above. Such point cloud methods may be very useful for working with data represented on a series of different meshes, for example, arising from the use of mesh adaptivity [20], as there may be no need to re-train the neural network. Graph networks (a type of GNN), which learn interactions between variables in a compressed space, have been successfully applied to problems in fluid and solid dynamics [21,22]. Both approaches use an encoder and decoder to compress and reconstruct the data, and a time-stepping method for predicting the future solution variables. The former uses particle-based methods to solve the governing equations, and exploits a nearest-neighbour technique to form an underlying graph of interactions. The latter applies the finite-element method to solve the governing equations and simply uses the underlying finite-element discretisation as the graph, combined with a neighbourhood graph for modelling solid-cloth dynamics. In both cases, the graph is used to learn the dynamics of the systems.
By using Space-Filling Curves (SFCs), we demonstrate in this paper that one can apply the traditional CNN approaches (in 1D) directly to data held on unstructured multi-dimensional meshes.
The underlying ideas of space-filling curves are applied to produce a continuous curve through a finite element mesh, in which the curve traverses every node of the mesh. This space-filling curve is used to transform multi-dimensional data into 1D data, to which convolutional layers are then applied. Sparse layers are introduced to the network, which add some smoothing to the results.
The use of multiple SFCs is also explored in this paper.
The first eponymous space-filling curve was discovered by Peano in 1890 [23], when seeking a continuous mapping from the unit interval onto the unit square. One year later, Hilbert introduced a variant of this curve [24] known as a Hilbert curve or Hilbert space-filling curve. Since then, many other SFCs have been developed including those by Moore, Lebesgue, Sierpiński [25,26].
Ideal for transforming and reordering multi-dimensional data to 1D data, space-filling curves: (P1) generate a continuous curve in which neighbours of any node within that curve are nodes that are close to one another; (P2) have in-built hierarchies of features, for example, if one takes every N points along the curve, the result is a coarser mesh across the whole domain; (P3) are Lebesgue-measure preserving (at least the classical SFCs have this property), so that sub-curves of equal length fill multi-dimensional regions of equal area or volume.
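Properties P1 and P2 can be made concrete with the classical Hilbert curve on a structured grid. The sketch below (illustrative Python with our own naming, not code from this work) generates the Hilbert ordering of a 2^k by 2^k grid using the standard bit-manipulation construction, then checks that consecutive points on the curve are adjacent cells (P1) and that taking every 4th point yields a coarser covering of the whole domain (P2).

```python
def hilbert_point(order, d):
    """Return the (x, y) cell of the d-th point on a level-`order` Hilbert
    curve covering a 2**order by 2**order grid (standard construction)."""
    x = y = 0
    side = 1
    while side < (1 << order):
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                     # rotate/flip the current quadrant
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        x += side * rx
        y += side * ry
        d //= 4
        side *= 2
    return x, y

curve = [hilbert_point(3, d) for d in range(64)]   # level-3 curve on an 8x8 grid

# P1: consecutive points on the curve are neighbouring cells of the grid
assert all(abs(x1 - x0) + abs(y1 - y0) == 1
           for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# P2: every 4th point gives one representative per 2x2 block, i.e. a 4x4 coarse grid
coarse = {(x // 2, y // 2) for x, y in curve[::4]}
assert coarse == {(i, j) for i in range(4) for j in range(4)}
```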
The properties of space-filling curves have already been exploited in a number of techniques used in the numerical solution of partial differential equations. For example, property P1 has led to the use of SFC orderings to manage cache memory when calculating matrix-matrix or matrix-vector products [27]. Properties P1 and P2 have led to the use of SFCs to optimise the node numbering for the benefit of iterative solvers [28]. Properties P2 and P3 have resulted in the extensive use of SFCs with unstructured meshes to find a partition suitable for parallel computing [29]. That is, if one slices the SFC into I equal intervals, then the result is a domain decomposition that has I partitions with minimal connectivity (few common edges) between the partitions.
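The partitioning use of P2 and P3 can be sketched in a few lines (illustrative code with our own names and sizes, not from this work): once the SFC ordering is known, slicing the curve into I equal intervals assigns every node a partition number directly from its position along the curve.

```python
def sfc_partition(sfc_order, n_parts):
    """Assign each mesh node to one of `n_parts` partitions by slicing the
    space-filling curve into (near-)equal intervals.
    `sfc_order[k]` is the mesh-node index of the k-th point on the curve."""
    n = len(sfc_order)
    part = [0] * n
    for k, node in enumerate(sfc_order):
        part[node] = k * n_parts // n   # index of the interval containing position k
    return part

# e.g. 16 nodes whose SFC ordering happens to be the identity, sliced into 4 partitions
parts = sfc_partition(list(range(16)), 4)
# -> [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```

Because consecutive curve positions are physically close (P1) and equal intervals cover equal areas (P3), the resulting partitions are compact and balanced.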
Space-filling curves are a natural partner to CNNs because of properties P1, P2 and P3, and have been applied to structured grids to reduce 3D structured grid data to 1D in image compression [30,31]. They have also been applied to DNA sequencing and classification [32,33].
The latter compares different mappings of 2D structured mesh data to 1D, to which a CNN is applied, and finds that mappings based on Hilbert SFCs perform best. The data-preserving abilities of SFCs are investigated in [34]. Applying 1D CNNs to 2D data that has been transformed by an SFC has also displayed accuracy and speed comparable to classical 2D and 3D CNNs [35], again for data on structured grids.
We present here a method of applying CNNs directly to data on unstructured meshes, such as solutions arising from finite element or control volume discretisations. The method is timely, as researchers are currently looking to improve compression or dimensionality reduction methods within reduced-order model frameworks [5,11,12,13,36] by exploiting some of the useful properties of CNNs, such as multiscale resolution and rotational and positional invariance [37,3,5]. Reduced-order models often rely on Singular Value Decomposition (SVD) for compression (or dimension reduction) through Proper Orthogonal Decomposition (POD) [38,39,40,41]. SVD-based methods can be limited in their ability to interpolate solutions, for example, suffering from Gibbs oscillations when there are abruptly changing fields [12] and poor accuracy for convection-dominated problems [41]. Going some way towards tackling this, a combination of SVD and a fully-connected autoencoder is presented as a tool for dimension reduction in [36]; however, some of the issues of the SVD are still present. Thus, being able to apply CNNs to data on unstructured meshes could significantly improve the quality of such reduced-order models. In addition to the timely nature of this work, the elegant SFC-based approach outlined here is expected to be computationally faster than the graph-based approaches (MeshCNN and GNNs), since the latter will be dominated by indirect addressing. In the method outlined here, there is indirect addressing only at the inputs and outputs of the space-filling curve CNN, and then only when multiple space-filling curves are used.
The remainder of this paper is as follows. The following section describes how space-filling curves are formed for an unstructured mesh. Section 3 describes the architectures of the convolutional autoencoders used in this paper. The results are presented in Section 4 which includes the application of SFC-based autoencoders to data on a structured mesh (for advection of a square wave and advection of a Gaussian function) and data on an unstructured mesh (for flow past a cylinder). Future work is then discussed and conclusions are drawn.

Determining Space-Filling Curves on Unstructured Meshes
This section describes how to generate space-filling curves for unstructured meshes. The method involves ordering or numbering a nested sequence of partitions in such a way that, when following the numbering, the path through these partitions is continuous, and neighbouring vertices on the path are close in the original graph. This is desirable because, when applying CNNs to data, the filters search for structurally coherent features, which requires that the node ordering be such that neighbouring nodes on the SFC are close to one another in physical space, so that any structural coherence in the data can be detected. The generation of space-filling curves for structured and unstructured meshes is described in Section 2.1. The method which creates the partitions used to generate the space-filling curve ordering is discussed in Section 2.2. Finally, in Section 2.3, comments are made on the efficacy of multiple space-filling curves and how these are generated.
2.1. Forming the numbering of the space-filling curve
A Hilbert curve can be formed by placing its fundamental U shape on the coarsest level (level 1) of a grid. For a Hilbert curve, the level 1 representation fits on a 2 by 2 grid, see Figure 1(a). Once the starting vertex is chosen (either vertex with a valency of 1), the vertices are numbered as one traverses the path. The numbering is not unique, as the path could also be followed in the opposite direction. To form the level 2 Hilbert curve (on a 4 by 4 grid), the fundamental shape is centred on each vertex of the level 1 curve (Figure 1(b)) and rotated (Figure 1(c)) until the shapes can be linked with one another by connections (Figure 1(d)) that (i) are consistent with the discretisation stencil and (ii) result in a continuous curve. Here, the stencil implied by the Hilbert curve's fundamental shape has connections between horizontal and vertical nearest neighbours, known as a five-point stencil and often used in solving differential equations. This process of locating the fundamental shape at vertices and rotating until a continuous curve is found is repeated until the desired level is reached. Neighbouring vertices on the space-filling curve in (d) will be close in the graph for level 2 shown in (e), but the reverse is not necessarily true. For example, vertices 2 and 15 in Figure 1(d) are not close on the space-filling curve but are close in the original graph in (e). To overcome potential inaccuracies caused by this disconnect, the generation of multiple space-filling curves is explored, the aim being that such vertices would be closer on a subsequent space-filling curve. More details are given about multiple space-filling curves throughout this section.
As an example, two 2D Hilbert curves for a 32 by 32 grid can be seen in Figure 2; both are used to generate the results for the structured grid test cases that are presented later.
Figure 1: Hilbert space-filling curves at levels 1 and 2: (a) shows the fundamental shape of the Hilbert curve, which also corresponds to the level 1 Hilbert curve. For level 2, (b) locates the fundamental shape at the vertices of level 1; these shapes are rotated (c) and linked (d) to form a continuous curve. The graph of the level 2 discretisation is shown in (e) with vertices (green) located at the centres of the control volumes, edges (green) and the grid corresponding to level 2 shown in grey.
The approach we take to form space-filling curves for unstructured meshes is similar to the above. For the fundamental shape, we choose a straight line (two vertices and one edge): the two vertices represent two partitions and the edge between them represents an edge on the original graph which connects them. Generating a space-filling curve consists of three steps. First, the graph is constructed, based on the discretisation stencil associated with the nodes or cells of the mesh. In this graph, vertices represent the nodes or cells, and edges represent the connections between nodes or cells as determined by the stencil. We refer to this as the 'original graph' to distinguish it from graphs of the partitions at intermediate levels of decomposition. Second, a hierarchy of partitions (and graphs) is created from the original graph using a domain decomposition approach based on nested bisection. The graph partitioner is explained in the next section; what is important here is that it generates a series of partitions at different levels of refinement. Third, the partitions are ordered, beginning at the coarsest level and ending at the final level. Once complete, the space-filling curve ordering is obtained either directly from the vertex ordering of the final level, or after an additional step if there is more than one vertex in any of the partitions.
The starting point for generating the space-filling curve ordering or numbering is therefore a hierarchy of nested partitions. As a consequence of the fundamental shape, at level 1, the coarsest level, there will be two partitions; at level 2, there will be 4 partitions; and, in general, at level ℓ there will be 2^ℓ partitions. In addition to the original graph, at every level a graph of the partitions and their connectivities is constructed, from which the numbering is calculated. The method is now illustrated with reference to an example shown in Figure 3. Here, a 2D linear Finite Element (FE) discretisation is assumed which uses three-noded elements. The original graph of the computational domain and discretisation can be seen in Figure 3(a). After two levels of bisection there are 4 partitions, which must be ordered so that the path traced out by the space-filling curve traverses the vertices in as efficient a manner as possible (given the stencil). At this level, there are four possible ways of numbering the vertices, resulting in four different paths through the graph.
All combinations of numbering are tried and, for each, a functional is evaluated in order to find one of the best paths. If a combination gives either the same or an improved functional value, it is accepted. The functional f_{v→w} is defined as the minimum number of edges that must be traversed in going from vertex v to vertex w; here it is evaluated from the first vertex to the fourth. The four possible combinations of numbering are illustrated in Figure 4 and listed in Table 1 along with the corresponding functional values. In the first row of Table 1, the first of the four paths is given, where the first vertex in the path is that in partition p^2_1, the second vertex is that in p^2_2, and so on.
Figure 4: the four candidate orderings of the partitions, labelled p^2_i for partition i at level ℓ = 2, with functional values given in Table 1. The graph is shown in grey and the path corresponding to each ordering is highlighted in green. Once the ordering has been determined (in this case, that shown in (d)), the partitions are renumbered, see Figure 3(f).
At level 3, considering the entire path at once would result in 16 possible orderings. Instead of considering all the partitions simultaneously, groups of 4 partitions are considered in turn, as shown in Table 2. First, the four combinations of ordering the vertices in partitions 1 to 4 are considered by evaluating the functional from vertex 1 to 4, f_{1→4}. Once the optimal ordering of these vertices has been found, vertices 1 and 2 are fixed and the vertices in partitions 3 to 6 are considered. The functional is now evaluated from the second to the sixth vertex, f_{2→6}, as the location of the third vertex relative to the second will affect the optimality of the path. When this ordering has been found, vertices 3 and 4 are fixed and the vertices in partitions 5 to 8 are considered.
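The functional and the search over orderings can be sketched as follows (illustrative Python with our own names; for brevity this sketch enumerates all permutations of a tiny example, whereas the actual algorithm considers only a restricted set of candidate orderings).

```python
from collections import deque
from itertools import permutations

def f_edges(adj, v, w):
    """Functional f_{v->w}: minimum number of edges traversed from v to w (BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if u == w:
            return dist[u]
        for nbr in adj[u]:
            if nbr not in dist:
                dist[nbr] = dist[u] + 1
                queue.append(nbr)
    return float("inf")                 # disconnected

def path_cost(adj, path):
    """Total edges traversed when visiting the partition vertices in the given order."""
    return sum(f_edges(adj, a, b) for a, b in zip(path, path[1:]))

# four level-2 partitions whose graph forms a ring: 1-2, 2-4, 4-3, 3-1
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
best = min(permutations([1, 2, 3, 4]), key=lambda p: path_cost(adj, p))
assert path_cost(adj, best) == 3        # an optimal path uses one edge per step
```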
The algorithm sweeps forwards and backwards through the partitions, considering four vertices at a time and fixing pairs of vertices based on the optimal path calculated between the appropriate number of vertices (4, 5 or 6), depending on whether the vertices on either side of the group have been fixed (see Table 2). For simple cases, only a small number of iterations is sufficient; for complex cases, more iterations are likely to be needed. In any case, it is not known whether the entire path from vertex 1 to vertex N_vert (the total number of vertices or nodes in the graph) produced by this method will be optimal; however, using 10 forwards and backwards 'sweeps' was found to be sufficient for the challenging example of an unstructured mesh with a discontinuous Galerkin stencil. Once the final level is reached, if there are multiple vertices in a partition, a series of vertex-swapping operations is performed until an optimal path is found within that partition: that is, a fully connected path that links to the surrounding partitions and is as short as possible. Hence, the space-filling curve ordering is obtained, one possible solution of which is shown in Figure 3(f).
This method of generating space-filling curves is applied to an 8 × 8 structured grid and the resulting space-filling curve is shown in Figure 5. The graph for the 8 × 8 grid was generated by assuming a 5-point finite-difference stencil: the same stencil that was used to generate the graph seen in Figure 1(e). In Figure 5 it can be seen that a satisfactory space-filling curve has been found for the grid, whilst using a method that can be applied to fully unstructured meshes. The curve found here has four instances where the path is longer than it needs to be (indicated by the diagonal lines). However, this path is close to optimal and deemed good enough for use here.
Table 2: Determining the ordering or numbering of the vertices for the partitions after three levels of bisection. The partitions are labelled p^ℓ_i for partition index i and level ℓ. The first row indicates that the first path to be considered is partitions 1, 2, 3 then 4; the number of edges traversed in the graph when following this path (the functional from vertices 1 to 4) is counted. The second path to be considered is partitions 2, 1, 3 then 4, and the functional for this path is evaluated. A certain number of iterations is carried out, sweeping forwards and backwards through the partitions, at which point the space-filling curve path is obtained.
The discontinuous Galerkin (DG) discretisation has a particularly complex stencil (and graph) associated with it; see Figure 6 for the stencil associated with one node. For a DG node in element k, this node is connected (through the stencil) to all nodes in element k and all nodes in elements which share an edge (2D) or face (3D) with element k. The edges for the node shown in Figure 6 would be constructed by drawing a line from this node to each other green node.
To draw the graph for the entire mesh shown here, this process would need to be repeated for every node in the mesh. The result of applying this method to an unstructured mesh can be seen in Figure 7. The ability of the space-filling curve to group together neighbouring nodes is clearly demonstrated.
By grouping 200 DG nodes together in this way, a coarse-mesh representation of the problem is formed, illustrating one of the key reasons why combining space-filling curves and convolutional networks is so effective. The top right plot in Figure 7 can be interpreted as a coarsening of the finite element mesh. Suppose that one level of coarsening were applied by taking every 4th node along the space-filling curve (reducing the resolution by a factor of 2 in each of the two dimensions), and that this were done 4 times; this is equivalent to taking every 256th node. Therefore, the coarsening shown in this plot, obtained by colouring groups of 200 DG nodes, is similar to applying 4 levels of coarsening (as 200 ≈ 256), each of which reduces the resolution by a factor of 2 in each dimension, which in turn is similar to applying 4 convolutional layers with a stride of 2. The lower plots in Figure 7 show contour plots of the node numbers, with low numbers in black through to the highest node numbers in yellow. The plot on the left corresponds to the default node numbering provided by the meshing software, and the plot on the right corresponds to the node numbering from the space-filling curve.
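The coarsening described above amounts to subsampling or pooling along the 1D SFC ordering. A minimal NumPy sketch (with illustrative sizes and synthetic data of our own, not the paper's) is:

```python
import numpy as np

# nodal values reordered along a (hypothetical) space-filling curve
rng = np.random.default_rng(0)
field_on_sfc = rng.random(1024)          # e.g. the 1024 nodes of a 32x32 grid

# taking every 4th point along the curve is one level of coarsening:
# roughly a factor-of-2 reduction in each of the two spatial dimensions
coarse = field_on_sfc[::4]               # 256 values

# a smoother alternative: average-pool groups of 4 consecutive curve points,
# analogous to a 1D pooling layer with kernel size and stride of 4
pooled = field_on_sfc.reshape(-1, 4).mean(axis=1)

assert coarse.shape == (256,) and pooled.shape == (256,)
```

Because of property P2, the retained points still cover the whole domain, so the subsampled signal behaves like the field on a coarser mesh.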

2.2. Partitioning method
Many problems can be abstracted to the partitioning of a graph, such as the decomposition of computational domains, social network problems and travel network problems. Much effort has therefore been applied to the fundamental challenge of developing a method which simultaneously achieves an equal distribution of vertices (load) between the partitions whilst minimising communication between partitions. A comprehensive survey of methods can be found in [42]; these include spectral partitioning, geometric partitioning [43], multi-grid methods [18], methods based on the mean-field theorem [44], and nested bisection approaches [45,46,47], which have seen success in partitioning large graphs.
A recurrent neural network based on mean-field theory (MFT-RNN) [48] is chosen as the graph partitioner for the current work, as it (i) has a facility to set variable edge weights in the graph and (ii) can balance, as far as possible, the number of vertices in each partition at each level. When generating multiple space-filling curves, there is a need to avoid cutting certain edges in the graph so that different space-filling curves traverse distinct paths through it; this leads to the first requirement. The second requirement arises from the desire to obtain a final set of partitions of which as many as possible contain at most one vertex. Partitions with more than one vertex can be dealt with, as can empty partitions, but the performance of the space-filling curve numbering algorithm will suffer in the former case. This multi-grid partitioning approach forms a hierarchy or nested sequence of partitions akin to that seen in multi-grid finite element methods [48,18]. Other graph partitioning methods could be used as long as properties (i) and (ii) are attainable, such as methods that use vertex agglomeration by repeatedly merging adjacent vertices, in a similar approach to multi-grid agglomeration methods [18].
More details of the MFT-RNN partitioner can be found in [48]; the main points are given here, along with details of the modifications made when generating multiple space-filling curves. The number of partitions created at each level is represented by S, which, for the particular fundamental shape used in this paper, is necessarily S = 2. This recurrent neural network associates S neurons with each vertex in the graph (i.e. one neuron for each desired partition), whose values represent the probability that vertex i is in partition µ. These probabilities are constrained to sum to one over the partitions, which has led to this method being referred to as normalised mean-field theory. For each partition, a vector of probabilities over all N_vert vertices in the graph can be defined, where N_vert is the number of vertices in the graph. For the special case of S = 2, only one neuron need be included for every vertex, as, given the probability of a vertex being in one partition, the probability of it being in the other can be calculated directly from the constraint. However, the code used to generate the examples in this paper was written for the general case. Representing the quality of the partitions, the functional comprises two terms: the first describes the communication cost between vertices in different (proposed) partitions and the second describes how well the vertices are distributed between the partitions (Equation (3)). The value h_ij represents the cost of vertex i communicating with vertex j and, as well as determining the communication cost in the functional, this value is also taken as the ijth edge weight in the graph. When generating one space-filling curve, h_ij = 1 ∀ i, j. When generating more than one space-filling curve, this parameter is set according to Equation (4), in which the heuristically obtained exponent γ is set to 0.2 and s^m_i is the space-filling curve node number of the ith vertex in any one of the pre-existing space-filling curves labelled by m.
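The exact form of Equation (4) is not reproduced above. Purely as a hypothetical sketch consistent with the surrounding description (a weight that grows with the difference in SFC numbers, raised to the exponent γ = 0.2), one might compute something like the following; both the functional form and the aggregation over the pre-existing curves m are our assumptions, not the paper's equation.

```python
def edge_weight(i, j, sfc_numbers, gamma=0.2):
    """HYPOTHETICAL edge weight h_ij for the graph partitioner (not the paper's
    exact Equation (4)): larger when vertices i and j are far apart on every
    pre-existing space-filling curve, so such edges are less likely to be cut.
    `sfc_numbers[m][i]` is the number of vertex i on pre-existing curve m."""
    diff = min(abs(s[i] - s[j]) for s in sfc_numbers)  # aggregation is an assumption
    return diff ** gamma

# one pre-existing curve numbering vertices 0, 1, 2 as 5, 6 and 40
s = [{0: 5, 1: 6, 2: 40}]
# vertices adjacent on the existing curve get a small weight (cheap to cut),
# so the partitioner cuts such edges and the next curve is pushed onto new paths
assert edge_weight(0, 1, s) == 1.0
assert edge_weight(1, 2, s) > edge_weight(0, 1, s)
```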
The idea behind introducing variable edge weights to the graph is to provide a means of discouraging the partitioner from cutting edges that have large differences in space-filling curve numbers. This helps to ensure that a subsequent space-filling curve will avoid edges traversed by pre-existing curves, and encourages subsequent space-filling curves to find previously undiscovered paths through the graph. So, if two points are far apart on the first space-filling curve but close in Cartesian space, a second space-filling curve will attempt to find a path on which these points are closer.
Similar to K, the matrix C consists of sub-matrices C_µν for partition numbers µ and ν, where µ, ν ∈ {1, 2, . . . , S}. To balance the vertices as equally as possible between the partitions, a diffusion matrix is used. The two terms in the functional tend to compete against one another during optimisation, so it is generally advantageous to set α as small as possible, although just large enough that the vertices are equally distributed between the partitions. If the value of α is too large, the partitions will have equal numbers of vertices but the path traced out by the space-filling curve will not be as continuous as it could have been, resulting in a poor space-filling curve. In this paper, an optimised value of α is used, as derived in [48]. Extensive numerical experiments have shown that this value of α results in good load balancing as well as good quality partitioning that minimises the sum of the weights between the different partitions.
In summary, the graph partitioning algorithm takes a graph and decomposes it into partitions using a nested bisection approach. This continues until almost all of the partitions contain one vertex. The number of levels of decomposition, L, is determined by finding the smallest integer L that satisfies 2^L ≥ N_vert, the total number of vertices in the original graph.
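In code, the number of levels is simply (a trivial sketch with our own naming):

```python
def n_levels(n_vert):
    """Smallest L with 2**L >= n_vert, the number of levels of nested bisection."""
    L = 0
    while 2 ** L < n_vert:
        L += 1
    return L

assert n_levels(8) == 3     # 2**3 = 8
assert n_levels(9) == 4     # 2**4 = 16 >= 9
```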
The functional, Equation (3), is minimised using the neural network updating algorithm associated with the MFT neural network, as described in [48], to obtain the desired graph partitioning. No training of this network is required, as the weights are determined by the functional. The space-filling curve can now be constructed from the numbered partitions.

2.3. Multiple space-filling curves
One problem with using the space-filling curve approach is that there can be little connectivity between certain regions of the domain: although points close together on the space-filling curve are also close together in physical space, the reverse is not true. A simple example of this can be seen in Figure 1(d), where vertices 2 and 15 are not close on the space-filling curve but are close in the original graph shown in (e). A possible remedy is to use more than one space-filling curve, where the second curve is discouraged from using the same edges as the first.
One way of achieving this is through the graph partitioning procedure, by weighting the graph edges as described in Equation (4). This effectively discourages partitions from being made across the same edges as in the previous SFCs. Nothing else is needed, as the edge weights determine the communication between areas of the domain in any additional SFCs. An example of two space-filling curves is shown in Figure 8. The first SFC (left) is optimal in that it uses the minimum number of edges to traverse all the cells in the grid. The second SFC (centre) is less optimal, as it jumps to nodes that are not neighbours in the stencil, shown as curved or diagonal lines in (b); to do this, the SFC passes through some nodes and edges more than once, partly due to the small size of the example. However, (c) shows the combination of the two space-filling curves, from which it can be seen that all but one of the edges of the original graph (shown in Figure 1(e)) are covered and only three edges appear in both SFCs. We have experimented with an alternative method, which deletes the edges of the first SFC from the original graph used by the graph partitioner; subsequent SFCs are then forced to find new paths through the mesh. With this method, the quality of the second and subsequent SFCs was not as good as with the previous method, possibly because it excessively reduced the number of graph edges available to subsequent SFCs.
We have also experimented with introducing a constraint on the second SFC so that this space-filling curve starts at the final node of the first SFC. In this way, the second SFC follows on from the first, resulting in one continuous curve, which is advantageous to the performance of the 1D CNN.
This constraint could be introduced into the SFC algorithm simply by always ensuring that the first partition on each level of the nested bisection contains this node (that is, the end node of the first SFC). For three space-filling curves, one adds a similar constraint to the third SFC so that it starts at the node next to the final node of the second SFC. A possible disadvantage of using this start-end constraint is that it can compromise the quality of the second and subsequent SFCs, in terms of producing a continuous curve, as, once again, the options available to the SFC algorithm are restricted. A second disadvantage is that the number of weights is greater when the SFCs are combined into one continuous curve. For a 2D data set with two SFCs, keeping the SFCs separate results in a factor-of-two saving in the number of weights in the 1D convolutional layers, for the same total number of channels; a similar saving is also obtained in terms of arithmetic operations. For 3D problems, this approach saves a factor of three in the number of weights and approximately the same factor in the CPU requirements associated with the convolutional layers. We therefore chose to use separate SFCs in the SFC-based CNNs.

Architectures of the convolutional networks
The architectures of the CNNs that will be used in the results section are described in detail in this section. To demonstrate the application of convolutional networks to data on both structured and unstructured meshes, a convolutional autoencoder is chosen. This type of network attempts to learn a compressed representation of data by means of a bottleneck at its centre. The network consists of an encoder and a decoder: the encoder compresses the data to a predetermined number of latent variables, known as the dimension of the latent space, and the decoder reconstructs the data from the latent variables. For a convolutional autoencoder, the encoder has a number of convolutional layers typically followed by some fully-connected layers, and the decoder is the reverse of this.
In addition to the layers found in a classical convolutional autoencoder, the SFC-based convolutional autoencoders have: (1) a transformation from multi-dimensional data to 1D data and vice versa, based on SFCs; (2) sparse layers at either end of the network; and (3) nearest-neighbour smoothing (included in some of the networks). The transformation from multi-dimensional data to 1D data is determined by one or more space-filling curves and occurs in the first and final layers of the SFC-based autoencoders. Immediately after the first layer and before the final layer are the sparse layers. The function of the sparse layers is to apply some smoothing to the results and to decide the priority of the feature maps when recombining the data. The amount of smoothing provided by these layers can be increased by including a node's nearest neighbours on the space-filling curve within the smoothing.
On transforming the multi-dimensional data to 1D data with the SFCs, there may be some associated CPU speed advantages, as 1D arrays generally map straightforwardly onto the memory hierarchies of modern CPUs and GPUs. This could make the SFC-based CNNs computationally faster than classical multi-dimensional CNNs. However, we have found, through considerable trial and error, that it is necessary to use roughly uniform numbers of channels in the convolutional layers of the SFC-based networks. This is in contrast with the classical CNNs, which seem to work best when the number of channels increases through the encoder and decreases through the decoder. The more channels, the greater the computational cost and memory requirements, especially in layers with more neurons. Thus the classical CNN has a computational advantage over the SFC-based CNN in this regard.
To motivate the sparse layers further, consider a node on an SFC that is close in physical space (or, more precisely, close on the graph associated with the discretisation) to another node that is distant along the curve. The sparse layer effectively takes a weighted average of each node on an SFC with its two neighbouring nodes and forms another neuron from this, as well as combining (in some optimal way) the results from the SFCs to form feature maps that are fed into the rest of the autoencoder.
The notation used to describe the networks is given in Section 3.1. In the subsequent sections, the autoencoders used in this paper are described in detail. For the structured data on a 128 × 128 grid, we construct (1) a classical 2D autoencoder, given in Section 3.2.1, and (2) an autoencoder based on two space-filling curves with nearest-neighbour smoothing, described in Section 3.3.2.

Figure 9: The architecture of the SFC-based autoencoder used within this work. This diagram shows the autoencoder based on two space-filling curves with nearest-neighbour smoothing on the first and final layers. The architecture of the autoencoder based on one space-filling curve is similar, but has only one SFC branch and so is half the size of that shown. Notice that there are convolutional layers associated with the encoder and decoder, and a fully-connected multi-layer perceptron in the centre, which reduces the number of variables down to the required quantity.

Notation used to describe autoencoder architectures
The notation that describes the architecture of the convolutional autoencoders is used in the following equations and in Tables 4 and 5. In the tables, '1 variable' or '3 variables' for the kernel size represents operations identical to the standard filter approach (using kernel sizes of 1 or 3), except that the filter weights applied to the neurons are not shared but vary for each neuron. The values are optimised as part of the training process. These filters are used to form the sparse layers, which smooth the SFC-based autoencoder solutions and increase their accuracy. Each node in the output is determined by one or three nodes in the input ('1 variable' or '3 variables'); connecting one node in the output with three nodes in the input is referred to here as nearest-neighbour smoothing. We call these sparse layers because the number of parameters is much less than if the layers were fully connected. We have found their use to be essential in order to reduce the noise one would otherwise see in the SFC-based autoencoder results.
To identify neighbouring nodes in the SFC ordering, for a given layer and node number i, we use sfc_x⁺ to represent the neighbour (with an increase in SFC node number) of sfc_x, that is, (sfc_x⁺)_i = (sfc_x)_{i+1}, and we use sfc_x⁻ to represent the neighbour (with a decrease in SFC node number) of sfc_x, that is, (sfc_x⁻)_i = (sfc_x)_{i−1}.

Convolutional autoencoders for data on structured meshes
3.2.1. Architecture of a classical 2D convolutional autoencoder

3.2.2. Architecture of the convolutional autoencoder based on one space-filling curve

As shown in Table 4, the autoencoder based on one SFC first transforms the multi-dimensional data to 1D data using the space-filling curve ordering. After this come the sparse layers (layers 1 and 2), followed by a classical 1D convolutional autoencoder. At the output of this are further sparse layers (layers 17 and 18), followed by a transformation of the data from 1D back to the original multi-dimensional form. Here we describe the sparse layers. The filters of the sparse layers are defined using weight vectors, as these filters change at each node, which results in an efficient implementation. These weights effectively smooth the outputs of the SFC-based autoencoders, without which the results would be noisy.

Layers 1 and 2 (sparse layers).
The input, grid_x ∈ R^{128×128}, is transformed using the Hilbert curve mapping to a 1D vector sfc_x_1 ∈ R^{16384} (128 × 128 = 16384) in layer 1. The output of the sparse layer 2 is then

sfc_x_2 = f( sfc_w_2 ⊙ sfc_x_1 + sfc_b_2 ),

where sfc_w_2 ∈ R^{16384} is the weight vector (subject to neural network training), sfc_b_2 ∈ R^{16384} is the bias (again subject to training), f is the ReLU activation function, and ⊙ is the Hadamard product indicating entry-wise multiplication. Note that sfc_x_2 ∈ R^{16384} is the input for layer 3 of the neural network. See Table 4 for a description of the convolutional layers, which take their input from layer 2.
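The Hilbert mapping itself can be computed with the standard bit-manipulation algorithm for a 2^k × 2^k grid. The sketch below (function names are ours, not the paper's) also shows the flatten/restore round trip used in the first and final layers:

```python
import numpy as np

def xy2d(n, x, y):
    """Hilbert index of cell (x, y) on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the base pattern repeats recursively
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_ordering(n):
    """Permutation mapping curve position -> row-major flat grid index."""
    order = np.empty(n * n, dtype=np.int64)
    for x in range(n):
        for y in range(n):
            order[xy2d(n, x, y)] = x * n + y
    return order

# Flatten a 2D field into the 1D SFC representation fed to the 1D conv layers,
# then invert the mapping as in the final layer of the autoencoder.
n = 128
order = hilbert_ordering(n)
grid_x = np.random.rand(n, n)
sfc_x = grid_x.reshape(-1)[order]     # sfc_x_1, shape (16384,)
grid_back = np.empty(n * n)
grid_back[order] = sfc_x              # inverse SFC mapping
assert np.allclose(grid_back.reshape(n, n), grid_x)
```

Because `order` is a permutation, the transform is lossless and costs only a gather/scatter per layer.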

Layers 17 and 18 (sparse layers).
Given the input sfc_x_16 ∈ R^{16384}, the output of the sparse layer 17 is

sfc_x_17 = f( sfc_w_17 ⊙ sfc_x_16 + sfc_b_17 ),

where sfc_w_17 ∈ R^{16384} is the weight vector, sfc_b_17 ∈ R^{16384} is the bias, and f is the ReLU activation function. Using the inverse space-filling curve mapping, the 1D data, sfc_x_17, is transformed back to 2D data on the structured grid, grid_x_18 ∈ R^{128×128}, which is the final output of the SFC-based CNN.
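In this notation, the '1 variable' sparse layer is simply a per-node affine map followed by the activation. A minimal numpy sketch (function and variable names are ours):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sparse_layer(x, w, b):
    """'1 variable' sparse layer: each output node depends on one input node,
    with its own weight and bias (w and b hold one entry per node)."""
    return relu(w * x + b)            # '*' is the Hadamard product

n = 16384                              # 128 x 128 grid in SFC ordering
rng = np.random.default_rng(0)
x = rng.random(n)
w = 1.0 + 0.1 * rng.standard_normal(n) # trainable in practice
b = np.zeros(n)                        # trainable in practice
y = sparse_layer(x, w, b)
assert y.shape == (n,)
```

The layer has only 2n trainable parameters, compared with n² for a fully-connected layer of the same width, which is why these layers are called sparse.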

Architecture of a convolutional autoencoder based on two space-filling curves
The architecture of the autoencoder based on two space-filling curves is given in Table 5. When using two space-filling curves, we keep the neuron values associated with these curves separate in the convolutional layers, but bring this information together within the fully-connected layers at the centre, as well as in the input and output layers. The sparse layers are now described in detail.

Layers 1 and 2 (sparse layers).
The data on the grid, grid_x ∈ R^{128×128}, is transformed into two 1D vectors using the mappings from the two space-filling curves, producing sfc1_x_1, sfc2_x_1 ∈ R^{16384} in layer 1. The output of the sparse layer 2 is

sfcC_x_2 = f( sfcC_w_2 ⊙ sfcC_x_1 + sfcC_b_2 ),  C ∈ {1, 2},

where sfc1_w_2, sfc2_w_2 ∈ R^{16384} are the weight vectors, sfc1_b_2, sfc2_b_2 ∈ R^{16384} are the bias vectors, and f is the ReLU activation function. The input for layer 3 takes the form sfc1_x_2, sfc2_x_2 ∈ R^{16384}.
See Table 5 for a description of the following convolutional layers of the network.

Layers 18 and 19 (sparse layers).
Given the inputs sfc1_x_17 ∈ R^{16384} and sfc2_x_17 ∈ R^{16384}, the output of the sparse layer 18 is

sfcC_x_18 = sfcC_w_18 ⊙ sfcC_x_17,  C ∈ {1, 2},

where sfc1_w_18, sfc2_w_18 ∈ R^{16384} are the weight vectors. Using the inverse mappings from the first and second space-filling curves, the vectors sfc1_x_18 and sfc2_x_18 are transformed to obtain grid1_x_18 ∈ R^{128×128} and grid2_x_18 ∈ R^{128×128}. Added to this is the bias grid_b_19 ∈ R^{128×128}. The output of the network, grid_x_19 ∈ R^{128×128}, is obtained from

grid_x_19 = f( grid1_x_18 + grid2_x_18 + grid_b_19 ),

in which f is the ReLU activation function.

Architecture of a convolutional autoencoder based on two space-filling curves with nearest-neighbour smoothing
The only difference between this SFC-based CNN and the previous one is the introduction of the nearest neighbours in the sparse layers. This effectively reduces the noise in the output of the SFC-based CNN.

Layers 1 and 2 (sparse layers).
The input to the network, grid_x ∈ R^{128×128}, is transformed to sfc1_x_1, sfc2_x_1 ∈ R^{16384} by the two space-filling curve mappings in layer 1. In the SFC ordering, sfcC_x⁺_1 ∈ R^{16384} is the neighbour (with an increase in SFC node number) of sfcC_x_1, and sfcC_x⁻_1 ∈ R^{16384} is the neighbour (with a decrease in SFC node number) of sfcC_x_1, in which C ∈ {1, 2} is the number of the SFC. The output of the sparse layer 2 is given by

sfcC_x_2 = f( sfcC_w_2 ⊙ sfcC_x_1 + sfcC_w⁺_2 ⊙ sfcC_x⁺_1 + sfcC_w⁻_2 ⊙ sfcC_x⁻_1 + sfcC_b_2 ),  C ∈ {1, 2},

where sfc1_w_2, sfc2_w_2, sfc1_w⁺_2, sfc2_w⁺_2, sfc1_w⁻_2, sfc2_w⁻_2 ∈ R^{16384} are the weight vectors, sfc1_b_2, sfc2_b_2 ∈ R^{16384} are the bias vectors, and f is the ReLU activation function. The input to layer 3 is sfc1_x_2 and sfc2_x_2. See Table 5 for a description of the convolutional layers.
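Along the curve, nearest-neighbour smoothing is a trainable three-point weighted average, obtained by shifting the SFC-ordered vector by ±1 node. A numpy sketch for one SFC (names are ours; the treatment of the two end nodes of the curve is not stated in the text, so repeating the end values is our assumption):

```python
import numpy as np

def shift_plus(x):
    """Neighbour with an increase in SFC node number: (x+)_i = x_{i+1}."""
    return np.concatenate([x[1:], x[-1:]])   # repeat end node (assumption)

def shift_minus(x):
    """Neighbour with a decrease in SFC node number: (x-)_i = x_{i-1}."""
    return np.concatenate([x[:1], x[:-1]])   # repeat start node (assumption)

def smoothing_sparse_layer(x, w, w_plus, w_minus, b):
    """'3 variable' sparse layer: trainable weighted average of each node
    and its two neighbours along the space-filling curve, then ReLU."""
    z = w * x + w_plus * shift_plus(x) + w_minus * shift_minus(x) + b
    return np.maximum(z, 0.0)

n = 16384
rng = np.random.default_rng(1)
x = rng.random(n)
y = smoothing_sparse_layer(x, np.full(n, 0.5), np.full(n, 0.25),
                           np.full(n, 0.25), np.zeros(n))
assert y.shape == (n,)
```

With all weights fixed at (0.25, 0.5, 0.25) this reduces to a simple moving average; in the network the weights vary per node and are learned.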

Layers 18 and 19 (sparse layers).
Given the inputs sfcC_x_17 ∈ R^{16384} ∀C ∈ {1, 2}, in the SFC ordering sfcC_x⁺_17 ∈ R^{16384} is the neighbour (with an increase in SFC node number) of sfcC_x_17, and sfcC_x⁻_17 ∈ R^{16384} is the neighbour (with a decrease in SFC node number) of sfcC_x_17. The output of layer 18 is

sfcC_x_18 = sfcC_w_18 ⊙ sfcC_x_17 + sfcC_w⁺_18 ⊙ sfcC_x⁺_17 + sfcC_w⁻_18 ⊙ sfcC_x⁻_17,  C ∈ {1, 2},

where sfc1_w_18, sfc2_w_18, sfc1_w⁺_18, sfc2_w⁺_18, sfc1_w⁻_18, sfc2_w⁻_18 ∈ R^{16384} are the weight vectors. Using the inverse of both space-filling curve mappings, sfc1_x_18 and sfc2_x_18 are transformed to grid1_x_18 ∈ R^{128×128} and grid2_x_18 ∈ R^{128×128} respectively; the bias grid_b_19 ∈ R^{128×128} is then added to obtain

grid_x_19 = f( grid1_x_18 + grid2_x_18 + grid_b_19 ),

in which f is the ReLU activation function and grid_x_19 ∈ R^{128×128} is the output of this SFC-based CNN on the original grid.

Architectures of convolutional autoencoders for data on unstructured meshes
3.3.1. An autoencoder based on one space-filling curve with nearest-neighbour smoothing

Since flow past a cylinder is a more demanding compression problem than the previous two idealised cases, two channels are used for the sparse-layer filters. If only one channel were used, as in the idealised cases, the SFC-based CNN outputs tended to be noisy. In addition, within the sparse layers, the velocity components u and v are kept separate, both to reduce the number of weights required in these layers and to apply the smoothing to the velocity components separately. These modifications are also applied to the SFC-based CNN with two SFCs, likewise used for flow past a cylinder. See Table 6 for a description of this autoencoder.
We now describe the sparse layers in detail.

Layers 1 and 2 (sparse layers).
The input to the neural network is fem_x ∈ R^{20550×2} and contains two channels: one for the first velocity component, u, and one for the other velocity component, v. These are held in the FEM DG ordering of the nodal values of the velocities. The input, fem_x, is mapped to the SFC ordering, sfc1_x_1, using the SFC mapping in layer 1. By splitting the channels of sfc1_x_1, we obtain sfc1u_x_1 and sfc1v_x_1, associated with the two velocity components respectively. In the SFC ordering, sfc1X_x⁺_1 ∈ R^{20550×1} ∀X ∈ {u, v} are the neighbours (with an increase in SFC node number) of sfc1X_x_1, and sfc1X_x⁻_1 ∈ R^{20550×1} are the neighbours with a decrease in SFC node number. The output of the sparse layer 2 is then

sfc1X_x_2 = f( sfc1X_w_2 ⊙ concat2(sfc1X_x_1) + sfc1X_w⁺_2 ⊙ concat2(sfc1X_x⁺_1) + sfc1X_w⁻_2 ⊙ concat2(sfc1X_x⁻_1) + sfc1X_b_2 ),  X ∈ {u, v},

where sfc1u_w_2, sfc1v_w_2, sfc1u_w⁺_2, sfc1v_w⁺_2, sfc1u_w⁻_2, sfc1v_w⁻_2 ∈ R^{20550×2} are the weight vectors, sfc1u_b_2, sfc1v_b_2 ∈ R^{20550×2} are the bias vectors, and f is the tanh activation function. Then

sfc1_x_2 = concat(sfc1u_x_2, sfc1v_x_2),

where sfc1_x_2 ∈ R^{20550×4} is the input for layer 3. The notation concat(a, b) = (aᵀ, bᵀ)ᵀ is used to represent the concatenation of two vectors into one, and the notation concat2(a) = (aᵀ, aᵀ)ᵀ represents the concatenation of one vector with itself.
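The concat/concat2 bookkeeping is just channel stacking. A small numpy sketch of the notation (names are ours; we interpret the stacking as being along the channel dimension, which is the interpretation consistent with the stated shapes R^{20550×1} → R^{20550×2} → R^{20550×4}):

```python
import numpy as np

def concat(a, b):
    """concat(a, b): stack the channel blocks of two node-by-channel arrays."""
    return np.concatenate([a, b], axis=-1)

def concat2(a):
    """concat2(a): duplicate a single channel so the two-channel
    sparse-layer filters can act on it entry-wise."""
    return np.concatenate([a, a], axis=-1)

n = 20550
u = np.random.rand(n, 1)            # one velocity component, one channel
v = np.random.rand(n, 1)
u2, v2 = concat2(u), concat2(v)     # two channels each
assert u2.shape == (n, 2)
x3 = concat(u2, v2)                 # combined input for layer 3
assert x3.shape == (n, 4)
```
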

An autoencoder based on two space-filling curves with nearest-neighbour smoothing
Layers 1 and 2 (sparse layers).
Given the input fem_x ∈ R^{20550×2}, this is converted to the SFC1 and SFC2 orderings, sfc1_x_1 and sfc2_x_1, using the SFC mappings in layer 1. By splitting the channels of sfc1_x_1 and sfc2_x_1, we obtain sfc1u_x_1, sfc2u_x_1, sfc1v_x_1, sfc2v_x_1 for velocity components u and v respectively. The output of the sparse layer 2 is

sfcCX_x_2 = f( sfcCX_w_2 ⊙ concat2(sfcCX_x_1) + sfcCX_w⁺_2 ⊙ concat2(sfcCX_x⁺_1) + sfcCX_w⁻_2 ⊙ concat2(sfcCX_x⁻_1) + sfcCX_b_2 ),  C ∈ {1, 2}, X ∈ {u, v},

where the sfcCX_w_2, sfcCX_w⁺_2, sfcCX_w⁻_2 ∈ R^{20550×2} are the weight vectors, the sfcCX_b_2 ∈ R^{20550×2} are the bias vectors, sfc1u_x_2, sfc1v_x_2, sfc2u_x_2, sfc2v_x_2 ∈ R^{20550×2}, and f is the tanh activation function.
Then sfc1_x_2 = concat(sfc1u_x_2, sfc1v_x_2) is used as the input for layer 3-SFC1, and sfc2_x_2 = concat(sfc2u_x_2, sfc2v_x_2) as the input for layer 3-SFC2. See Table 7 for a description of the rest of the SFC-based CNN.

Results
Three examples are used to demonstrate the potential of the method proposed in this paper, of applying convolutional networks to data held on any mesh (structured grids or unstructured meshes) by using space-filling curves. Two of the test cases consist of structured grid data; the first data set represents advection of a square wave and the second represents advection of a Gaussian function. The third example uses a data set consisting of solutions of 2D flow past a cylinder solved on an unstructured mesh.

Singular Value Decomposition
The results from the convolutional autoencoders developed in this paper are compared with results from singular value decomposition (SVD). For a matrix M, where each column corresponds to a particular solution or example from the data set and each row corresponds to a particular node or cell, the SVD is defined as

M = U Σ V*,

where Σ is a diagonal matrix containing the singular values of M given in descending order, U and V hold the left- and right-singular vectors respectively, and the asterisk denotes the conjugate transpose. The square of each singular value indicates how much information is contained in the corresponding mode or basis function (column of U). A low-rank approximation of M can be formed by retaining only the N_Σ largest singular values, setting the remaining singular values to zero, and recalculating the product:

M̃ = U Σ̃ V*,

where the only non-zero terms in Σ̃ are the N_Σ largest singular values from Σ. Comparison is made between convolutional autoencoders which have a latent space of dimension N_Σ and SVDs which have been truncated to N_Σ singular values.

Measuring the error
To evaluate the error in the various autoencoder and SVD approaches, the mean square error is used:

MSE = (1/N) Σ_{k=1}^{N} (x_k − x̃_k)²,

in which N is the number of input (or output) variables, for instance the number of nodes or cells, x_k is the kth input value and x̃_k is the corresponding output value.

Hyper-parameters of the autoencoders
For each particular autoencoder, some of the hyper-parameters are specified in Section 3, including kernel size, number of channels, stride, padding, activation function, number of layers and number of neurons per layer. Other hyper-parameters are loss function, batch size, optimiser, learning rate and number of epochs, which are given here, except for the number of epochs which is stated in the results section where appropriate. Through tuning these hyper-parameters, we have optimised the neural networks within this work. Unless explicitly stated, it can be assumed that the values given in Table 8 are used for the SFC-based autoencoders.
The kernel size for the 2D classical autoencoder is 5 × 5 = 25 with a stride of 2 × 2. To make the 1D SFC-based autoencoders approximately analogous to this, a kernel size of 32 is used with a stride of 4. A number of activation functions were experimented with, and ReLU and tanh performed the best; here we use ReLU for the structured data and tanh for the unstructured data. All the input data for the autoencoders are normalised to [0, 1] for the structured data and [−1, 1] for the unstructured data. The unstructured data has a larger computational cost associated with it, so the batch size was reduced from 64 to 16. The loss function chosen is the mean square error (MSE) given in Equation (40), and errors calculated using this formula are based on the normalised data values.
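These kernel/stride choices can be sanity-checked with the usual convolution size formula, L_out = ⌊(L_in + 2p − k)/s⌋ + 1. A quick sketch (the padding values are ours, chosen for illustration to give clean downsampling factors; the paper's padding is listed in its tables):

```python
def conv1d_out_len(l_in, kernel, stride, padding=0):
    """Output length of a 1D convolution (per spatial dimension for 2D)."""
    return (l_in + 2 * padding - kernel) // stride + 1

# 2D classical CAE: 128 -> 64 per dimension with kernel 5, stride 2, padding 2
assert conv1d_out_len(128, 5, 2, 2) == 64

# 1D SFC-based CAE: 16384 SFC-ordered nodes, kernel 32, stride 4;
# padding 14 (illustrative) gives an exact 4x downsample to 4096
assert conv1d_out_len(16384, 32, 4, 14) == 4096
```
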

Structured Grid Applications
The generation of the data sets for the square wave and Gaussian function test cases is described.
These data sets represent two extremes, from abruptly changing fields to smoothly changing fields.
The performance of the new autoencoders based on space-filling curve ordering is analysed and comparisons are made with a classical 2D convolutional autoencoder (CAE) and SVD.

Generating square wave data
In order to generate data that represents advection of a square wave, the time-dependent 2D advection equation is solved:

∂a/∂t + u · ∇a = 0,

where a is the advected field and u is the prescribed advection velocity.
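Data of this kind can be produced with any standard advection solver. As a minimal sketch, one first-order upwind step on a periodic grid (the scheme and all names are ours; the paper does not specify its discretisation):

```python
import numpy as np

def upwind_step_x(c, cfl):
    """One first-order upwind step of dc/dt + u dc/dx = 0 for u > 0 on a
    periodic grid, with cfl = u * dt / dx."""
    return c - cfl * (c - np.roll(c, 1, axis=1))

# Square wave initial condition on a 128 x 128 grid
c = np.zeros((128, 128))
c[40:60, 40:60] = 1.0
c = upwind_step_x(c, cfl=1.0)     # cfl = 1: exact shift by one cell
assert np.allclose(c[40:60, 41:61], 1.0)
```

At cfl = 1 the upwind update reduces to an exact one-cell shift, so the square wave stays sharp; smaller cfl values introduce the numerical diffusion typical of first-order schemes.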

Generating the Gaussian data
To represent advection of a Gaussian function, the following profile is simply located at different points in the domain. The form of the Gaussian is

a(x, y) = exp( −((x − x_c)² + (y − y_c)²) / (2σ²) ),

where (x_c, y_c) represents the centre of the Gaussian function, whose values are randomly sampled from the domain, which is discretised with a structured 128 × 128 grid. The parameter σ, which controls the width of the curve, is uniformly randomly sampled from the interval [10, 20]. The data set consists of a total of N_s = 15360 examples.
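This data set can be generated directly. A sketch under the stated sampling ranges (we assume the domain coordinates coincide with the grid indices and a unit-amplitude Gaussian; function names are ours):

```python
import numpy as np

def gaussian_sample(n=128, rng=None):
    """One example: a Gaussian bump with random centre and width."""
    if rng is None:
        rng = np.random.default_rng()
    xc, yc = rng.uniform(0, n, size=2)     # random centre in the domain
    sigma = rng.uniform(10, 20)            # width, sampled from [10, 20]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(3)
data = np.stack([gaussian_sample(rng=rng) for _ in range(4)])  # N_s examples
assert data.shape == (4, 128, 128)
assert 0.0 <= data.min() and data.max() <= 1.0
```
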

Results for advection of a square wave
In order to assess their relative capabilities, we compare three SFC-based convolutional autoencoders, a classical 2D convolutional autoencoder and the SVD, all applied to the square wave data set. The autoencoders, their abbreviations and the sections in which their architectures are described are listed in Table 9. The convolutional autoencoders all compress to 16 latent variables, having been trained over 5000 epochs, and the SVD truncates to 16 variables. The so-called losses, defined here as the mean square error between the inputs and outputs of the autoencoder (see Equation (40)), are shown in Figure 10 and Table 10, the latter including the truncation error of the SVD. Both figure and table show that at 5000 epochs the losses of the classical 2D CAE and CAE-2SFC-NN are smaller than those of the CAE-SFC and CAE-2SFC autoencoders. All the autoencoders have a lower mean square error than the SVD. For one example, Figure 11 shows the pointwise error, which has an absolute value of, at most, 3%. We also used a finer grid, 256 × 256, to compare the classical 2D CAE and the CAE-2SFC-NN autoencoder for the square wave data set; see Table 12.

Results for advection of a Gaussian function

For one solution, Figure 12 shows the solution (a) before and (b) after being passed through the autoencoder (CAE-2SFC-NN). It can be seen that the model performs well and accurately reproduces the smooth Gaussian function for this particular example. Plot (c) shows the pointwise error, which, at most, has an absolute value of 2%. Plot (d) compares the original solution with the output of the autoencoder at a height of 1.5; the profiles shown are in very close agreement. The SFC-based autoencoder with two SFCs and nearest-neighbour smoothing is also compared with the classical CAE and SVD for a data set generated on a finer grid of 256 × 256. The losses are shown in Table 14.
Here we see that, as for the square wave data set, the SFC-based autoencoder performs well on the finer grid. We thus conclude that the new SFC-based autoencoder has comparable (possibly slightly better) accuracy to the classical 2D convolutional autoencoder; it outperforms the SVD (for these low-dimensional spaces); it converges for a wider range of problems; and, most importantly, the SFC-based autoencoders can be applied to data from unstructured meshes.

Unstructured mesh test case
A data set consisting of solutions for 2D flow past a cylinder is created. These results lie on an unstructured mesh, which will reveal the effectiveness of the method for this type of data. The SFC-based approach is compared with the SVD.

Results for 2D flow past a cylinder
Using the conservation laws, the following system of partial differential equations governing the motion of an incompressible fluid is obtained:

∇ · u = 0,
ρ( ∂u/∂t + u · ∇u ) = ∇ · τ + s_u,

where ρ is the density (assumed constant), u is the velocity vector, τ contains the stress and viscous terms, s_u is the momentum source, t is time and the gradient operator is ∇ = (∂/∂x, ∂/∂y)ᵀ. This system of equations is solved as outlined in [8]. For the discretisation, a linear triangular element is adopted with a discontinuous Galerkin discretisation of the velocities and a continuous Galerkin representation of the pressure, often referred to as the P1DG-P1 element. Crank–Nicolson time-stepping is used to discretise in time. Only velocity variables are needed to train the networks, as these fully describe incompressible flow. The Reynolds number for this problem is

Re = ρUL/ν = 3900,   (47)

in which the inlet velocity is constant, U = 0.039 m s⁻¹, and the density has the value ρ = 1000 kg m⁻³. The data set, formed from the solutions of the above problem, consists of N_s = 1000 snapshots.
Each snapshot has N = 20550 nodes, and each node has N uv = 2 features (the two velocity components, u and v, in this two-dimensional problem). The data set is divided randomly (in the N s dimension) into three parts according to the proportion 8:1:1 for training, validation and testing.
First, in order to determine the effectiveness of multiple SFCs, comparison is made between two SFC-based convolutional autoencoders: one using one space-filling curve and the other using two space-filling curves. Both have nearest-neighbour smoothing. Following this, the performance of the CAE-2SFC-NN network is investigated for different dimensions of latent space. The architectures of the autoencoders are described in the sections and tables listed in Table 15.
Table 15 lists the two networks: CAE-SFC-NN (Table 6, Section 3.3.1), a convolutional autoencoder based on the ordering from one SFC, and CAE-2SFC-NN (Table 7, Section 3.3.2), a convolutional autoencoder based on the orderings from two SFCs, both with nearest-neighbour smoothing. Figure 13 and Table 16 show the losses of the two SFC-based autoencoders, one using one space-filling curve and nearest-neighbour smoothing (CAE-SFC-NN), and the other using two space-filling curves and nearest-neighbour smoothing (CAE-2SFC-NN). Judging from the losses, the network that performs slightly better is the autoencoder based on two SFCs, despite the number of weights in its convolutional layers being approximately half that of the autoencoder with one SFC (CAE-SFC-NN). The reason for its effectiveness is that the two space-filling curves attempt to capture features in all directions across the mesh, as the curves sample two different directions (each curve is very roughly orthogonal to the other) at a given node in the mesh. The compression accuracy of the two networks can be seen in Figure 14 and is impressive in both cases. On careful inspection of the results based on one SFC, in Figure 14 (upper plots), one can see that they are noisier than the results based on two SFCs (lower plots), which highlights a possible advantage of two SFCs in leading to more accurate results. However, the network based on one SFC (CAE-SFC-NN) still represents the basic flow features very well. The performance of CAE-2SFC-NN for different dimensions of latent space is shown in Figure 15 and Table 17, which indicate that, even as the number of compressed variables is reduced, the accuracy remains reasonable. In fact, the spatial distribution of the results for compression to 1 variable looks similar to the spatial distribution for 128 compressed variables in Figure 14. Notice that as the number of compressed variables is reduced, the gap between the validation and training loss is reduced.
This could be interpreted as a reduction in the tendency for over-fitting as the number of latent or compressed variables is reduced. Also notice from Figures 15 and 16 that the greater the number of compressed variables the shorter the wavelength of the structures in the error fields. Thus, with smaller compression sizes one seems to be able to capture the basic structures and as more variables are introduced finer scale structures are captured. Note that the imprints of the space-filling curves are not noticeable for the smaller number of compressed variables, therefore, the nearest-neighbour smoothing layers of the autoencoder must be performing well. However, one can see this imprint in the error field for larger numbers of compressed variables.

Comparison between two SFC-based autoencoders
Comparing the CAE-2SFC-NN network with the SVD (see Table 17), we find that for compressed-variable sizes of 128, 64, 32, 16 and 8, the truncation error of the SVD is smaller than the loss of the autoencoders. However, for compressed-variable sizes of 4, 2 and 1, the losses of the autoencoders are smaller than those of the SVD. This is because flow past a cylinder is a smooth problem, periodic in time, for which only a few SVD modes are needed to accurately represent the flow. However, this is still a hard problem for an autoencoder, and the CAE-2SFC-NN network does manage to compress to a small number of variables more effectively than the SVD approach. Finally, a box-and-whisker plot can be seen in Figure 17. Here, the average absolute error in space is calculated for each example in the data sets for each compression ratio. Then, for each compression ratio and for each data set (training, validation, test), the median error is shown; outliers are not shown. The variation over training (left), validation (centre) and test (right) data is similar due to the time-periodic problem studied here. Remarkably, as the number of latent variables is reduced, the range of errors becomes slightly larger, but there is no obvious upward trend. This agrees with the plots shown in Figure 16, where the accuracy seems not to be affected by the number of latent variables.

Future work
The SFC-based approach to applying CNNs to data on unstructured meshes has been established in this paper. Furthermore, this approach has been applied to data on an unstructured mesh with differing areas of resolution. A next big challenge will be to apply the SFC-based CNN approach to solutions on adaptive meshes, that is, meshes that change their topology and resolution as time evolves in order to optimally represent the physics. The SFC-based approach has the potential to deal with even this, as the filters do not necessarily depend on the mesh. One may, for example, be able to take the connectivity into account by also including the coordinates as inputs to the 1D CNNs or to the SFC-based CNN. Once these issues are dealt with, it may be possible to apply (at least as a starting model for training) the weights from one example problem to a different fluids problem. In addition, the approach may be used to interpolate from one mesh to a different mesh while increasing the numerical resolution and sharpness of the features, especially when interpolating onto meshes or mesh parts that have finer resolution than the original mesh. This interpolation approach is very much akin to the deblurring GAN-CNN or DeblurGAN [49].
Further work will also be needed to explore the application of the method to 3D, with, potentially, the use of three space-filling curves.
Another area of further study is the inclusion of the time dimension within the CNN. Rather than treating the time dimension independently [50], a space-filling curve could be applied simultaneously to four-dimensional solutions (three spatial dimensions and one temporal dimension).

Conclusions
The space-filling curve (SFC) approach shows great promise in enabling the application of convolutional networks to data from unstructured meshes. The approach has several features that make it ideal for application to convolutional neural networks (CNNs). This includes the aptitude of SFCs for automatic coarsening of meshes.
We demonstrate the approach by compressing the results of a solution of the Navier-Stokes equations for incompressible flow past a cylinder and show that it is able to compress on the order of 50,000 solution variables, with a complex discontinuous Galerkin stencil, down to between 128 variables and 1 variable while maintaining accuracy. We also show that it can be beneficial to use two space-filling curves to help increase the accuracy of the CNN. We also found it important to reduce the noise in the outputs of the SFC-based autoencoders by introducing sparse smoothing layers near the output (and input) of the autoencoders. On structured mesh data sets, the accuracy of the new SFC-based autoencoders is similar to that of classical autoencoders, and it seems worth exploring their comparative capabilities even for data on structured grids.