Atlas Generative Models and Geodesic Interpolation

Generative neural networks have a well recognized ability to estimate the underlying manifold structure of high dimensional data. However, if a single latent space is used, it is not possible to faithfully represent a manifold with topology different from Euclidean space. In this work we define the general class of Atlas Generative Models (AGMs), models with hybrid discrete-continuous latent space that estimate an atlas on the underlying data manifold together with a partition of unity on the data space. We identify existing examples of models from various popular generative paradigms that fit into this class. Due to the atlas interpretation, ideas from non-linear latent space analysis and statistics, e.g. geodesic interpolation, which have previously only been investigated for models with simply connected latent spaces, may be extended to the entire class of AGMs in a natural way. We exemplify this by generalizing an algorithm for graph based geodesic interpolation to the setting of AGMs, and verify its performance experimentally.


Introduction
The ability of deep generative networks to learn complex features of data in an unsupervised fashion has made them a promising tool for dealing with the problem of increasing amounts of unlabelled data and inchoate labelling. A (probabilistic) generating map G : Z → X transforms latent random seeds into synthetic data, usually with Z = ℝ^d and X = ℝ^D for some d ≪ D. Via G, a low-dimensional manifold structure, which follows high density regions of the data distribution, is learned.
Among other things, this enables continuous interpolations between points in latent space, rendering in X continuous transformations of samples along the underlying manifold structure of the data distribution. While an obvious option is linear interpolation for Euclidean latent spaces, recent research has also investigated using geodesic interpolations for Z = ℝ^d considered as a Riemannian manifold. The geometric structure is here chosen so that curve length in the latent space matches curve length in the data space for curves restricted to the manifold estimated by G [1][2][3]. The approach yields a more accurate notion of distance and shortest paths in Z, as it is based on the distance actually traversed in X along the manifold structure.
For inherently non-linear latent spaces, e.g. a hybrid discrete-continuous latent space Z × Y where Y = {1, …, m} for some m ∈ ℕ, Euclidean distances and linear interpolations are not even well defined. However, in this paper, we suggest that we can still make sense of geodesic interpolations, thus expanding on the geometrical interpretation of deep generative networks. This generalization is important: even for simple manifolds, a single latent space is not sufficient to accurately represent data spread over the entire manifold.
The manifold estimating qualities of generative networks, combined with a hybrid discrete-continuous latent space Z × Y, have already led to an interpretation inspired by the notion of a manifold atlas from differential geometry. For each restriction to some y ∈ Y, the map G_y : Z → X resembles (the inverse of) a coordinate chart on the immersed manifold (see Fig. 1). Most notably, it is formally shown in [4] that multiple charts are actually necessary in order to properly approximate data manifold structure with non-trivial topology. Building on this, we state explicitly some general criteria we expect a generative model to satisfy in order to fully fit the atlas interpretation, and use the terminology Atlas Generative Models (AGMs) to refer to this class of models. Like the Chart Auto-Encoder (CAE) of [4], these are models that, in addition to chart inverse estimates G_y, yield chart estimates F_y : X → Z and a partition of unity ψ : X → Δ^{m−1}, another concept taken from differential geometry. Here Δ^{m−1} denotes the standard (m − 1)-simplex, and ψ may thus, through its coordinate functions ψ_y, be seen as assigning to each point x ∈ X the importance of the individual charts at that point, with these summing to one.
We show how examples of AGMs, in addition to the CAE, are also to be found within the paradigms of Variational Auto-Encoders (VAEs) and Wasserstein Auto-Encoders (WAEs). If one relaxes the requirements by not demanding encoding networks F_y, we also see examples from the realm of Generative Adversarial Networks (GANs). Some of the models have grown out of the desire to capture non-trivial topological features of the underlying manifold structure of high-dimensional data, while others have been studied for the purposes of disentangled representation learning or semi-supervised learning.
With a concise yet general concept to build upon, we move on to describe a graph based procedure for approximating latent space geodesic paths, generalizing the algorithm from [5] to the broader setting of AGMs, thus providing a novel concept of latent space interpolation. Using the partition of unity, we may define areas in which the importance of the individual charts exceeds some threshold. Using the intersections of these, we can identify points in different charts of the latent space in accordance with the manifold structure, and thus make sense of continuously traversing across different charts (see Fig. 1). As this is coherent with the geometric interpretation of generative networks as manifold estimators, we see it as a constructive first step to expanding the theory of non-linear latent space analysis and statistics to the class of atlas estimating models.
Our main contributions consist of:
• Providing a precise characterization of a class of generative networks, AGMs, which may be thought of as estimating an atlas on the underlying manifold structure of a data distribution, and showing that multiple existing generative models fit into this class.
• Introducing a procedure for approximating latent space geodesic paths for any model in the class of AGMs, thus providing a notion of continuous interpolation novel to this class of models. We note that even having a notion of continuous interpolation is new, as the trivial example of linear interpolation is not well defined for AGMs, due to the inherently discontinuous nature of their latent space.
• Demonstrating empirically that the procedure produces interpolations with comparable qualities to those produced in a non-atlas setting, thus making it a viable tool when working with any AGM.
With these contributions, we extend the current ability to interpolate between data points in generative models with single latent spaces to a much wider class of data manifolds: manifolds that have non-trivial topology and thus need more than a single latent space to be accurately represented. We construct the extension in a general setting to avoid restricting to a specific generative model. Instead, the construction encompasses an entire class of models, the class that we denote AGMs.
We begin the paper with a theory section, in which we briefly present the necessary background for our contributions. In the following section, we present our definition of AGMs and mention the examples of generative models within this class. After that, we present the graph based procedure for geodesic interpolation for AGMs. We end with an experiment section, in which we demonstrate the procedure for a specific AGM trained on the MNIST dataset [6].

Theory
In this background section, we will first go through the theory and relevant paradigms of generative networks. After this, we will review their relation to the manifold hypothesis, including non-linear latent space analysis, such as geodesic interpolation.

Generative neural networks
Most generative neural networks use a (probabilistic) transformation G : Z → X with X = ℝ^D and Z = ℝ^d for some d ≪ D. However, we will be more concerned with models which have a hybrid discrete-continuous latent space Z × Y, where Y = {1, …, m} for some m ∈ ℕ, with prior distribution P(Z, Y) = P(Z)P(Y). Learning the generative transformation G is the common aim of generative network models, but there have been quite varying approaches to optimizing the parameters of G, ultimately rooted in different theoretical motivations. Let us consider a few:

Variational Auto-Encoder (VAE)
In [7] the task of training a generative network is approached by implementing, in addition to a probabilistic decoding network z ↦ P(X|z), a probabilistic encoding network x ↦ Q(Z|x), and training these simultaneously to maximize the variational lower bound on the marginal log-likelihood of the data. If the latent space is Z × Y, the encoder is given by x ↦ Q(Z|y, x)Q(Y|x) and the decoder by (z, y) ↦ P(X|z, y). The variational lower bound becomes

log P(x) ≥ Σ_{y∈Y} Q(y|x) L_y(x) − D_KL(Q(Y|x) ‖ P(Y)),   (1)

where L_y(x) = E_{Q(Z|y,x)}[log P(x|z, y)] − D_KL(Q(Z|y, x) ‖ P(Z)), D_KL(⋅‖⋅) denotes the Kullback–Leibler divergence between two probability distributions, and L_y(x) corresponds for a given y to the variational lower bound for a continuous latent space VAE. The significance of the lower bound (1) was proven in [8], where, using it for unlabelled data, it was shown how generative networks can be utilized for semi-supervised classification. It is also the training objective for the InfoCatVAE model of [9], which is used to learn disentangled latent representations in a completely unsupervised fashion, by combining the lower bound with a regularizing term for maximizing mutual information inspired by [10] (see (5)).
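To make the lower bound (1) concrete, it can be assembled from per-chart ELBO estimates and the categorical KL term. The following is a minimal numerical sketch; the function name and the Monte Carlo values are hypothetical, not taken from any of the cited models:

```python
import numpy as np

def atlas_elbo(log_px_zy, kl_z, q_y, p_y):
    """Hybrid-latent variational lower bound, as in (1):
    per-chart ELBO  L_y = E_q[log P(x|z,y)] - KL(Q(Z|y,x) || P(Z)),
    overall bound   sum_y q(y|x) * L_y - KL(Q(Y|x) || P(Y)).
    Inputs are per-chart Monte Carlo estimates (made-up values below)."""
    per_chart = log_px_zy - kl_z                       # L_y for each chart y
    kl_y = np.sum(q_y * (np.log(q_y) - np.log(p_y)))   # categorical KL term
    return float(np.dot(q_y, per_chart) - kl_y)

# toy numbers: 3 charts, uniform categorical prior
q_y = np.array([0.7, 0.2, 0.1])
p_y = np.full(3, 1 / 3)
bound = atlas_elbo(np.array([-80.0, -95.0, -110.0]),  # E_q[log P(x|z,y)]
                   np.array([4.0, 5.0, 6.0]),         # KL(Q(Z|y,x) || P(Z))
                   q_y, p_y)
```

Note that the categorical KL term penalizes the inference network for deviating from the uniform prior over charts, mirroring the continuous KL term within each chart.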

Generative Adversarial Network (GAN)
Introduced in [11], GANs are motivated from a game-theoretical perspective. Together with a generating network G : Z → X, a discriminative neural network D : X → [0, 1] is implemented, acting as an adversary to the generator. The generator and the discriminator play a minimax game, which in its equilibrium state minimizes the Jensen-Shannon (JS) divergence between the true data distribution P_data(X) and the distribution of the synthetic data P_G(X).
To achieve disentangled latent representation the InfoGAN model is introduced in [10]. This model has a latent space Z × Y and adds a discrete variable inference network x ↦ Q(Y|x), which makes it possible to maximize the mutual information (MI) between the discrete latent variable y and samples G(z, y). The minimax game becomes

min_{G,Q} max_D V(D, G) − λ L_I(G, Q),

where λ > 0 is a hyper-parameter and

L_I(G, Q) = E_{y∼P(Y), z∼P(Z)}[log Q(y | G(z, y))] + H[P(Y)].   (5)

Here H[P(Y)] denotes the entropy of the discrete prior P(Y).
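The MI term (5) is straightforward to estimate from samples; with a uniform categorical prior, H[P(Y)] = log m, and the bound is saturated exactly when the inference network recovers the code with probability one. A minimal sketch, with a hypothetical helper name and sample values:

```python
import numpy as np

def mi_lower_bound(q_probs, m):
    """Monte Carlo estimate of L_I(G, Q) from (5): the mean of
    log Q(y_i | G(z_i, y_i)) over generated samples, plus the entropy
    H[P(Y)] = log m of the uniform categorical prior."""
    return float(np.mean(np.log(q_probs)) + np.log(m))

# q_probs[i] is the probability the inference network assigns to the code
# actually used to generate sample i (hypothetical values)
perfect = mi_lower_bound(np.ones(4), m=8)        # code fully recoverable
chance  = mi_lower_bound(np.full(4, 1 / 8), m=8) # code not recoverable at all
```

The bound thus ranges from 0 (inference at chance level) up to H[P(Y)] = log m (code perfectly recoverable), which is the quantity the λ-weighted regularizer pushes towards.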
InfoGAN obtains impressively disentangled latent space representations in a completely unsupervised fashion, and it is worth noting that it easily fits into a semi-supervised setting as well. This is done in [12], which shows how combining the InfoGAN with small amounts of auxiliary label information both increases the quality of synthetic samples and speeds up convergence of the model.

Wasserstein Auto-Encoder (WAE)
The WAE models presented in [13] are a flexible class of auto-encoding models, a particular example being the Adversarial Auto-Encoder (AAE) of [14]. They operate with the objective of minimizing the optimal transportation cost between P_G(X) and P_data(X). If we again assume a latent space Z × Y, the WAE consists of encoding and decoding networks, F : X → Z × Y and G : Z × Y → X respectively. These can be probabilistic, but we shall for simplicity assume deterministic encoding and decoding. The optimal transportation cost is approximated by

E_{P_data(X)}[c(x, G ∘ F(x))] + λ D(Q(Z, Y), P(Z, Y)),

where c : X × X → ℝ_+ is some measurable cost function, λ > 0, and D(⋅, ⋅) is some divergence measure between Q(Z, Y), which is the image distribution in Z × Y of P_data(X) under the encoder F, and the prior distribution P(Z, Y). In [15], the Euclidean distance c(x, G ∘ F(x)) = ∥x − G ∘ F(x)∥_2 is used for the reconstruction, and the JS-divergence is used as D, as this can be minimized by an adversarial approach. The result is a variation of the WAE-GAN [13], or AAE [14], which also has discrete latent variables. The motivation behind this is to emulate an atlas, enabling the capture of non-trivial homotopical structures of the data distribution, something we elaborate on in section 2.2.

Chart Auto-Encoder (CAE)
The auto-encoding model CAE presented in [4] is directly based on the notion of a manifold atlas. It has latent space Z × Y with encoding and decoding maps F_y : X → Z and G_y : Z → X respectively, for y = 1, …, m, and a chart prediction network P : X → Δ^{m−1}. This is presented with a specific model architecture, well suited for multi-chart representation, and a training loss function combining, for x ∈ X, the per-chart reconstruction errors e_y = ∥x − G_y ∘ F_y(x)∥_2 weighted by l_y = softmax(e_y). This model is also topologically and geometrically motivated, i.e. aimed at approximating the underlying manifold structure of the data distribution.

Manifold estimation with generative networks
In differential geometry, a smooth atlas on a d-dimensional manifold M is a collection of charts {φ_α : U_α → ℝ^d}, where U_α ⊂ M are open subsets such that ∪_α U_α = M. The charts must be homeomorphisms onto their images, and the transitions φ_α ∘ φ_β^{−1} : φ_β(U_α ∩ U_β) → ℝ^d must be smooth maps. A partition of unity subordinate to an open cover {U_α} of M is a collection of functions ψ_α : M → [0, 1], which satisfy:
• Σ_α ψ_α(x) = 1 for all x ∈ M.
• The support supp ψ_α is contained in U_α for all α.
• Every point x ∈ M has a neighborhood V, such that supp ψ_α ∩ V ≠ ∅ for only finitely many α.
In particular, a finite partition of unity may be expressed as a map ψ : M → Δ^{m−1}, where Δ^{m−1} is the standard (m − 1)-simplex, for which the coordinate functions satisfy supp ψ_y ⊂ U_y. A generative model G : Z → X may be considered a map of manifolds. In particular, G may parametrize an embedded manifold in X if the dimension of Z is lower than that of X and G is a homeomorphism onto its image. As the widely used manifold hypothesis states that high dimensional data is often distributed in proximity of a manifold structure of dimension far lower, it is natural to expect G to estimate this structure.
Unfortunately, there are limitations connected to the choice of a simply connected latent space. In [4], it is shown formally that a model with a simply connected subset of ℝ^d as latent space cannot faithfully represent any manifold structure of non-trivial homotopy type, but that this is possible if the latent space consists of a collection of coordinate spaces Z × Y. Having a hybrid discrete-continuous latent space may thus in some scenarios be strictly necessary for G to approximate the data manifold within a certain margin.

Riemannian geometry of generative networks
A smooth manifold M is called a Riemannian manifold if it is equipped with a so-called Riemannian metric. This is the assignment of an inner product ⟨⋅, ⋅⟩_p to the tangent space T_p M for each point p ∈ M. The Riemannian metric can be used to define lengths of piecewise smooth curves γ : [0, 1] → M as

L(γ) = ∫_0^1 √(⟨γ′(t), γ′(t)⟩_{γ(t)}) dt,   (8)

and in turn gives a notion of distance between points p_0, p_1 ∈ M as the infimum inf_γ L(γ) taken over all piecewise smooth curves γ going from p_0 to p_1. Geodesic curves on a Riemannian manifold are curves which are locally length minimizing with respect to this distance.
In the case of a map of smooth manifolds, G : ℝ^d → ℝ^D, one can get a Riemannian metric on ℝ^d by taking the pullback of the Euclidean metric in ℝ^D. This is given by assigning to each point z ∈ ℝ^d the inner product

⟨u, v⟩_z = u^⊤ J_G(z)^⊤ J_G(z) v

for all tangent vectors u, v ∈ T_z ℝ^d, where J_G(z) is the Jacobian matrix of G evaluated at z.
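To make the pullback metric concrete, the following sketch computes the matrix J_G(z)^⊤ J_G(z) for a toy decoder via central finite differences. The decoder and helper names are illustrative assumptions; a trained model would typically supply J_G by automatic differentiation:

```python
import numpy as np

def G(z):
    """Toy smooth map G : R^2 -> R^3, a paraboloid patch (an assumption)."""
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def jacobian(G, z, h=1e-6):
    # central finite differences; column j approximates dG/dz_j
    cols = [(G(z + h * e) - G(z - h * e)) / (2 * h) for e in np.eye(len(z))]
    return np.stack(cols, axis=1)

def pullback_metric(G, z):
    """The d x d matrix representing <u, v>_z = u^T J^T J v."""
    J = jacobian(G, z)
    return J.T @ J

M = pullback_metric(G, np.array([1.0, 0.0]))
# at z = (1, 0) the columns of J are (1, 0, 2) and (0, 1, 0),
# so the metric matrix is [[5, 0], [0, 1]]
```

Directions along which G stretches space (here the first coordinate, due to the quadratic term) receive a larger metric coefficient, so curves through such regions are measured as longer.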
The idea of considering the latent space of a generative model G : Z → X with the pullback of the Euclidean metric is presented in [1][2][3], in order to give a more accurate notion of distances and shortest paths in the latent space. As such, linear paths and Euclidean distances in Z are replaced with geodesic paths and Riemannian distances. Different approaches to finding geodesic paths with respect to the latent space Riemannian metric are presented in [1][2][3]. The Riemannian latent space concept is further expanded with other tools for non-linear latent space analysis and statistics in [16].
To improve efficiency, a graph based approach to approximate geodesics with respect to the Riemannian geometry was presented in [5].
First, a graph is formed in latent space, using a k-d tree for efficient neighbor search. To get the nodes of the graph, a set of data points X = {x^(1), …, x^(N)} is mapped to their respective latent encodings Z = {z^(1), …, z^(N)}. Assuming the geodesics are locally close to linear, one proceeds by finding the k nearest neighbors of each node z^(i) with respect to the Euclidean distance in Z, and connecting each of these to z^(i) with an edge. These edges are weighted with the length of the linear interpolation with respect to the Riemannian metric, i.e. using a numerical estimation of (8).
For two points z_0, z_1 ∈ Z, the geodesic between z_0 and z_1 is approximated by the shortest path through the graph. More specifically, z_0 and z_1 are added to the graph and connected with their k nearest neighbors through edges weighted as above. Using A* path search in the graph, the shortest path between z_0 and z_1 is found. The resulting curve is thus the piecewise linear interpolation in Z between nodes along this graph path.
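A minimal self-contained sketch of this graph based procedure, with a toy one-dimensional latent space and a hypothetical decoder embedding it as a circular arc. For brevity we replace the k-d tree of [5] with brute-force neighbor search, and A* with plain Dijkstra (i.e. A* with a zero heuristic):

```python
import heapq
import numpy as np

def curve_length(G_dec, z0, z1, n=10):
    # numerical estimate of the data-space length of the line z0 -> z1, cf. (8)
    ts = np.linspace(0.0, 1.0, n + 1)
    pts = np.array([G_dec((1 - t) * z0 + t * z1) for t in ts])
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def knn_graph(G_dec, Z, k=3):
    # brute-force k nearest neighbors in latent space ([5] uses a k-d tree)
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    graph = {i: {} for i in range(len(Z))}
    for i in range(len(Z)):
        for j in np.argsort(D[i])[1:k + 1]:          # skip the point itself
            w = curve_length(G_dec, Z[i], Z[j])
            graph[i][int(j)] = graph[int(j)][i] = w
    return graph

def shortest_path(graph, s, t):
    # Dijkstra over the weighted graph (A* with a zero heuristic)
    pq, dist, prev = [(0.0, s)], {s: 0.0}, {}
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, np.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], dist[t]

# hypothetical decoder embedding R^1 as a unit-circle arc in R^2
G_dec = lambda z: np.array([np.cos(z[0]), np.sin(z[0])])
Z = np.linspace(0.0, np.pi, 20).reshape(-1, 1)
path, length = shortest_path(knn_graph(G_dec, Z, k=3), 0, 19)
```

Since the decoder maps the latent line onto a half-circle arc, the recovered geodesic length should be close to the arc length π, while the naive Euclidean latent distance would report π as well here only because the toy chart happens to be arc-length parametrized; for a general decoder the two differ.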
Even though all of the aforementioned research on non-linear latent space analysis builds on the manifold interpretation of generative networks, there have, to the best of our knowledge, been no attempts at generalizing the tools presented to a setting where G : Z × Y → X estimates an atlas. After laying out exactly which models this includes, by providing a terminological foundation, we shall take a first step in this direction by generalizing the procedure of [5] to such models.

Atlas generative models
Building on the ideas of [4,15], we will now draw up the essential components that make up an atlas estimating generative network, which we define formally in Definition 1. A generative model which only yields chart inverse estimates {G_y}_{y=1}^m and a partition of unity ψ, we call a semi-AGM.
We stress that the encoding and decoding maps F_y and G_y respectively only estimate charts of an atlas, and thus do not possess all the theoretical properties of a manifold atlas. Most notably, there will typically not be guarantees that F_y are the inverses of G_y on their image, though training objectives of the generative models will often encourage that they are close to that.
As we shall see, the generative models surveyed in section 2.1 fall into the class of AGMs or semi-AGMs:

CAE
Designed solely with the purpose of resembling an atlas, the CAE model naturally fits the AGM characterization. Encoding and decoding networks estimate charts F_y and chart inverses G_y, and the chart prediction network defines the partition of unity ψ.

Atlas VAE
Within the paradigm of VAEs, we can consider the VAE for semi-supervised learning [8] or the InfoCatVAE [9]. The discrete inference network defines a partition of unity by letting ψ_y(x) = q(y|x) for all y ∈ Y. Furthermore, we get chart estimates F_y : X → ℝ^d and chart inverse estimates G_y : ℝ^d → X by letting F_y(x) = E[Q(Z|y, x)] and G_y(z) = E[P(X|z, y)] respectively. These mean values are typically directly available, as encoders and decoders of VAEs usually map a point to the mean value and variance of a multivariate Gaussian distribution.
These AGMs were initially presented with the purposes of semi-supervised classification and unsupervised, disentangled representation learning, respectively.

Atlas WAE
Unsurprisingly, the WAE model of [15], which was directly inspired by the notion of atlases in differential geometry, also falls into the class of AGMs. The encoding and decoding maps directly provide the charts F_y and inverses G_y, and the discrete inference network naturally provides a partition of unity like above. While this is a particular example, we note that another cost function c : X × X → ℝ_+, as well as another divergence term D(⋅, ⋅), could also be used, though we have not seen this explored in other research.
In cases where encoding or decoding transformations are probabilistic, i.e. x ↦ Q(Z, Y| x) and (z, y) ↦ P(X| z,y), chart estimates and inverses are easily obtained by using the mean value of the image distributions, exactly as is the case for Atlas VAEs.
In [15], the Atlas WAE model is used to capture underlying, non-trivial topological structures in the data distribution. We observe in section 4 how it also produces a disentangled latent representation, which most likely could be further improved by adding an MI-regularizer like in the InfoCatVAE. We are not aware of research into utilizing Atlas WAEs for semi-supervised learning, though this could also be implemented as in [8].

Atlas GAN
The adversarial paradigm within generative networks has quickly become one of the most popular. A downside to these models, though, is the lack of inference networks. This is also partly the case for the InfoGAN model; however, the discrete inference network may, as in the previous examples, be used as a partition of unity, which together with the decoding maps G_y makes the InfoGAN a semi-AGM.
While this category is slightly deficient compared to regular AGMs, we think the significance of the adversarial paradigm makes it worthwhile to include. Especially since they share qualities with AGMs, e.g. disentangled representation learning, and since manifold estimation is also considered a noteworthy quality of GANs. We see in the ss-InfoGAN [12] another example of how charts of (semi-)AGMs may easily be paired with auxiliary semantic labels, in that case improving both sample quality and training efficiency.
The above examples display how the notion of AGMs spans models from the most popular paradigms in the field of generative networks. They have already been proved worth studying for their qualities within representation learning, semi-supervised learning and manifold learning. While the connection made in Definition 1 to atlas estimation is, at least for some of the models, not new, assembling them into a class of models makes it possible to develop geometric procedures and non-linear statistics concisely, without being model specific, as we shall see exemplified next.

Geodesics in atlas generative models
We shall now proceed to consider geodesic paths in AGMs. It is here that the significance of having a partition of unity in addition to the atlas charts comes into play. In differential geometry, a partition of unity subordinate to some open cover can be a convenience, and is even sometimes necessary, as it makes explicit the weight of each covering set at any given point. The same convenience is offered by a partition of unity ψ : X → Δ^{m−1} for an AGM. In particular, we may specify areas of X in which a given chart is represented: choosing some ε ≥ 0, we may consider the open sets

U_y = {x ∈ X : ψ_y(x) > ε},   y = 1, …, m.

The utility of the sets {U_y}_{y=1}^m is that we may start to identify points in different coordinate spaces, on the basis of whether their image under G, or pre-image under F, is contained in intersecting areas ∩_{y∈σ} U_y for some σ ⊂ Y, eventually enabling us to deal with the ambiguity posed by having overlapping charts.
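In practice, ψ is typically given by softmax outputs of a network, and membership of the sets U_y reduces to a simple threshold test. A minimal sketch, with a hypothetical function name and made-up values:

```python
import numpy as np

def chart_regions(psi_vals, eps=0.05):
    """Given rows psi_vals[i] on the simplex (partition of unity values for
    point i), return for each point the set of charts y with psi_y(x) > eps,
    i.e. the charts whose region U_y contains the point."""
    return [set(np.flatnonzero(row > eps)) for row in psi_vals]

psi = np.array([[0.97, 0.02, 0.01],   # confidently assigned to chart 0
                [0.50, 0.48, 0.02]])  # in the overlap of charts 0 and 1
regions = chart_regions(psi, eps=0.05)
```

Points whose region set has more than one element lie in a chart overlap and will later serve as the glue connecting the per-chart graphs.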
Let us consider this in practice as we generalize the algorithm for graph based geodesics from [5], which we briefly described in section 2.3. Suppose we have an AGM and a set of data points X = {x^(1), …, x^(N)}. A latent graph is formed by using encodings (z_{i,y}, y) ∈ Z × Y whenever x^(i) ∈ U_y, where z_{i,y} = F_y(x^(i)). In other words, for any chart y for which x^(i) is assigned the label y with probability larger than ε, we encode x^(i) to a latent point in this chart. Within each copy of Z, we may form a graph exactly as described in section 2.3. Each of these disjoint graphs may then be connected through edges between nodes (z_{i,y}, y) and (z_{i,y′}, y′), i.e. connecting nodes that are encodings of the same point x^(i).
For edges within a given chart, we shall, inspired by [3], numerically approximate (8) by

L ≈ Σ_{i=1}^n ∥G_y(z_{t_i}) − G_y(z_{t_{i−1}})∥_2,   with z_t = (1 − t)z + tz′,   (10)

where t_i = i/n for i = 0, …, n for some number of steps n ∈ ℕ. Which weights to assign to edges connecting charts is, however, not obvious. While the geometric interpretation is that (z_{i,y}, y) and (z_{i,y′}, y′) are just different coordinates for the same point on the manifold structure, and thus should have a 0-weighted edge between them, the reality is that the decodings G(z, y) and G(z′, y′) might differ slightly. As the intuition behind the geodesic distance is that it represents the curve length along the immersed manifold structure in the surrounding space X, another natural choice of weight would be the Euclidean distance ∥G_y(z_{i,y}) − G_{y′}(z_{i,y′})∥_2 in X. As such, the weight represents the actual jump made in X in order to change charts. The graph building procedure is summarized in Algorithm 1.
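The graph building step can be sketched as follows. This is an illustrative skeleton under our own naming, not the reference implementation: F, G_dec and psi stand in for the chart encoders, chart decoders and partition of unity of a trained AGM, and the toy example covers a line segment with two overlapping one-dimensional charts:

```python
import numpy as np

def build_atlas_graph(F, G_dec, psi, X, eps=0.05, k=2, n=10):
    # nodes (i, y): data point i encoded in every chart y with psi_y(x_i) > eps
    m = len(F)
    nodes = [(i, y) for i, x in enumerate(X) for y in range(m) if psi(x)[y] > eps]
    Z = {(i, y): F[y](X[i]) for (i, y) in nodes}
    graph = {v: {} for v in nodes}

    def length(y, z0, z1):  # data-space length of the line z0 -> z1, cf. (10)
        ts = np.linspace(0.0, 1.0, n + 1)
        pts = np.array([G_dec[y]((1 - t) * z0 + t * z1) for t in ts])
        return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

    for y in range(m):  # k-NN edges within each chart, weighted by (10)
        chart = [v for v in nodes if v[1] == y]
        P = np.array([Z[v] for v in chart])
        for a, v in enumerate(chart):
            for b in np.argsort(np.linalg.norm(P - P[a], axis=1))[1:k + 1]:
                w = length(y, Z[v], Z[chart[b]])
                graph[v][chart[b]] = graph[chart[b]][v] = w

    for (i, y) in nodes:  # chart-connecting edges: same point, different chart
        for y2 in range(m):
            if y2 != y and (i, y2) in graph:
                w = float(np.linalg.norm(G_dec[y](Z[(i, y)])
                                         - G_dec[y2](Z[(i, y2)])))
                graph[(i, y)][(i, y2)] = w
    return graph

# toy AGM: data on a segment of the x-axis, two charts overlapping in the middle
F = [lambda x: x[:1], lambda x: x[:1]]           # hypothetical chart encoders
G_dec = [lambda z: np.array([z[0], 0.0])] * 2    # hypothetical chart decoders
psi = lambda x: np.array([1.0 - x[0], x[0]])     # partition of unity
X = [np.array([t, 0.0]) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
g = build_atlas_graph(F, G_dec, psi, X)
```

Within-chart edges carry the data-space interpolation length, while the chart-connecting edge between the two encodings of the same overlap point carries the jump made in X (here zero, since the toy decoders agree on the overlap).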
The geodesic interpolation between latent points (z 0 , y 0 ) and (z 1 , y 1 ) may, just like in [5], be found by adding each of the points to the graph, connecting them to their k nearest neighbors in their respective charts, and finding the shortest path between them using the A* search algorithm.
In our approach to graph based geodesic paths, we have stayed as close to the algorithm presented in [5] as possible, including the use of encoded data to build the graph. We note, however, that this is not a canonical choice, and other heuristics may be used to create the latent space graph, e.g. sampling points using the prior (z, y) ∼ P(Z, Y) and connecting them to their encodings in other charts whenever the decodings satisfy G_y(z) ∈ U_{y′} for some y′ ≠ y. It might also be possible to find overlapping points in different coordinate charts entirely without using encoding maps, which would enable this type of geodesic path estimation for semi-AGMs as well.

Experiments
We have suggested how the notion of AGMs makes it possible to make sense of continuous geodesic interpolations in the latent space, even when this is hybrid discrete-continuous. In this section we experimentally verify this idea. To do so, we have implemented an Atlas WAE-GAN (AWAE-GAN) [15], which we use to evaluate the procedure for graph based geodesics described in section 3.1 on the MNIST [6] and FashionMNIST [17] datasets. For comparison, we also implemented the (single chart) graph based geodesic interpolation from [5] on a regular WAE-GAN.
The WAE-GAN is implemented exactly as in [13] with latent space ℝ^8 and multivariate standard Gaussian prior. For the AWAE-GAN, we replace one continuous dimension with the discrete set Y, the size of which we vary (see Table 1). The prior on the continuous part is also a multivariate Gaussian, and we use a uniform categorical distribution for P(Y). The chart encoders and decoders, as well as the discrete inference network, are implemented similarly to [13], though to decrease model size, we use shared layers between the individual encoding charts and the partition of unity, as well as for the decoding charts. Layer sizes of the AWAE-GAN model are decreased depending on the number of charts, so that the overall model size matches that of the WAE-GAN in terms of trainable parameters (see Table 1). A detailed model description can be found in Appendix A.
The performances of the models in terms of test reconstruction errors are similar, though we observe in this experiment better performance by some of the AWAE-GANs compared to the WAE-GAN (see Table 1), despite the discrete latent dimension theoretically enabling less expressiveness than a continuous dimension. This could be due to the topological representation issues described in [4], although drawing that conclusion would require more thorough investigation.
In Fig. 2a we display random samples produced by the different charts of our AWAE-GAN with 8 charts. We observe that the same digits typically appear in only 2-3 charts, indicating that the different charts indeed represent different parts of the data distribution, with some overlaps as expected. We note further that even without regularization for improving disentanglement, such as MI-regularization, the chart assignment confidence of the model, measured as the probability of the most significant chart max_y ψ_y(x) on a test set of 10,000 previously unseen data points, is close to 1 for a large proportion of the data, see Fig. 2b. From an atlas estimation point of view, this is a good quality, as most data is thus only assigned to one chart, while overlaps are less significant. In its own right, it is also an interesting observation that this is indeed the representation the model converges towards, even without adding any regularizing terms to enforce it.
For the graph based interpolation, we sample points for the latent space graph by encoding 2000 data points. These are connected to their 20 nearest neighbors, and the edges are weighted with an approximation of the linear interpolation curve length obtained by (10) using 15 intermediate steps.
To evaluate the performance of the geodesics on the Atlas WAE-GAN model, we pick 100 start and end points X_start, X_end ∼ P_data(X) from a test data set and encode these points to their latent representations. Using [5, Alg. 1] for the WAE-GAN model and Algorithm 1 for the AWAE-GAN model, we find the geodesic interpolations and the lengths of these paths. We see in Fig. 3a that the interpolations on the AWAE-GAN model tend to be longer than on the WAE-GAN model. We expect this is a result of the chart overlaps creating bottlenecks in the graph, which are not present in the WAE-GAN graph, resulting in slightly longer interpolations. This is also coherent with the fact that the AWAE-GAN graphs have a larger diameter than the WAE-GAN graph (see Table 1), and the fact that having more charts, thus increasing the total number of chart connecting edges, seems to counter the effect.
Overall, we observe that the graph based geodesics on the AWAE-GAN produce path lengths and interpolation quality (see Fig. 3b) comparable to those of the graph based geodesics on the WAE-GAN. In particular, we see in Fig. 3b that this is despite intermediate transitions between charts, which we find noteworthy, given the discontinuous nature of the AGM latent space.

Conclusion
We have introduced the notion of AGMs, a class of generative networks which justifiably resemble an estimation of an atlas on the underlying manifold structure of a given data distribution P_data(X). Examples of AGMs have been surveyed, spanning different popular paradigms of generative models.
Though the non-linear nature of AGM latent spaces inherently excludes a notion of linear interpolation and Euclidean distance, we have expanded on the atlas interpretation from differential geometry, and instead made sense of geodesic interpolations and Riemannian distances for this class of models. We have verified this in practice by presenting an algorithm generalizing the graph based approach to geodesic interpolation of [5], and obtained interpolations of comparable quality to those of a non-atlas model. While the geodesic paths produced in the AGM had a tendency to be slightly longer than those of the non-atlas model, we still conclude that the suggested procedure represents a viable and novel concept of interpolation for the class of AGMs.
Future work could include investigating the generalization of other tools from non-linear latent space analysis and statistics to the setting of AGMs. Another possible direction for further research is to improve the geometric features of AGMs, e.g. by developing a regularizing term, used during training, which produces more coherent and smooth chart overlaps in X.

Fig. 1 .
Fig. 1. Geodesic interpolation for AGMs. Using a partition of unity ψ : X → Δ^{m−1} we can detect the areas where charts overlap, and use this to make sense of continuous geodesic interpolation in an otherwise discontinuous latent space.

Definition 1 .
An Atlas Generative Model (AGM) is a generative model with latent space ℝ^d × {1, …, m} for some d, m ∈ ℕ, which post-training yields a family of chart and chart inverse estimates, F_y : X → ℝ^d and G_y : ℝ^d → X respectively, for y = 1, …, m, together with a partition of unity ψ : X → Δ^{m−1}.

Fig. 2 .
Fig. 2. AWAE-GAN with 8 charts. (a) Each box displays 12 random samples from a single chart. The charts approximate different parts of the data distribution with some overlaps. This is indicated by the fact that each digit/item is typically present in only 2-3 charts. (b) Histogram over chart assignment confidence max_y ψ_y(x) for 10,000 test data points in X. Data samples are most typically assigned to a single chart with high confidence, also indicating that each chart is responsible for approximating certain parts of the data distribution.

Table 1
Model information (MNIST/FashionMNIST). Total trainable parameter count for encoding and decoding networks together. Reconstruction error calculated as the mean over 10,000 previously unseen data points. Numbers of nodes and edges, as well as diameters, are from the graphs used to approximate geodesics.