ExoGAN: Retrieving Exoplanetary Atmospheres Using Deep Convolutional Generative Adversarial Networks

Atmospheric retrievals on exoplanets usually involve computationally intensive Bayesian sampling methods. Large parameter spaces and increasingly complex atmospheric models create a computational bottleneck, forcing a trade-off between statistical sampling accuracy and model complexity; this is especially true for upcoming JWST and ARIEL observations. We introduce ExoGAN, the Exoplanet Generative Adversarial Network, a new deep learning algorithm able to recognise molecular features, atmospheric trace-gas abundances and planetary parameters using unsupervised learning. Once trained, ExoGAN is widely applicable to a large number of instruments and planetary types. ExoGAN retrievals constitute a significant speed improvement over traditional retrievals and can be used either as a final atmospheric analysis or to provide prior constraints to a subsequent retrieval.


INTRODUCTION
The modelling of exoplanetary atmospheric spectroscopy through so-called atmospheric retrieval algorithms has become the accepted standard in the interpretation of transmission and emission spectroscopic measurements (e.g. Kreidberg et al. 2018; Tsiaras et al. 2018; Bruno et al. 2018; Mansfield et al. 2018; Spake et al. 2018; Sheppard et al. 2017; Barstow et al. 2017; Rocchetto et al. 2016). These retrieval algorithms are designed to solve the often ill-posed inverse problem of determining atmospheric parameters (such as trace-gas abundances) from the measured spectra and their corresponding measurement uncertainties (e.g. Irwin et al. 2008; Madhusudhan & Seager 2009; Line et al. 2013; Benneke & Seager 2013; Lavie et al. 2017; Gandhi & Madhusudhan 2018; Cubillos et al. 2016). The associated atmospheric forward model to be fitted varies in complexity from retrieval to retrieval but usually encompasses a high-dimensional likelihood space to be sampled. In the era of JWST (Gardner et al. 2006) and ARIEL observations, said model complexity will have to increase significantly. To date, the most commonly adopted statistical sampling methods are Nested Sampling (Skilling 2004; Feroz & Hobson 2008; Feroz et al. 2009) and Markov Chain Monte Carlo (e.g. Gregory 2011). These approaches typically require of the order of 10^5 - 10^6 forward model realisations until convergence. This traditional, Bayesian analysis creates a precarious bottleneck: to achieve convergence within reasonable time frames (hours to days), the atmospheric forward model must be fast and consequently overly simplistic. The inclusion of disequilibrium chemistry, self-consistent cloud models and the move from 1D to 2-3D radiative transfer are largely precluded by this constraint.

Corresponding author: Tiziano Zingales (tiziano.zingales.15@ucl.ac.uk)
In this paper, we present the first deep learning architecture for exoplanetary atmospheric retrievals and discuss a path towards solving the computational bottleneck using atmospheric retrievals assisted by deep learning.
Artificial Intelligence has been used extensively to understand and describe complex structures and behaviours in a wide variety of datasets across a plethora of research fields.
In recent years, the field of exoplanets has seen pioneering deep-learning papers on planet detection (Pearson et al. 2018; Shallue & Vanderburg 2018), exoplanet transit prediction (Kipping & Lam 2017) and atmospheric spectral identification (Waldmann 2016). In Waldmann (2016) we applied a deep belief neural network (DBN) to recognise the atmospheric features of an exoplanetary emission spectrum. This approach provided a qualitative understanding of the atmospheric trace gases likely to be present in a planetary emission spectrum, which could then be included in our atmospheric retrieval framework TauREx (Waldmann et al. 2015b,a). In this paper, we introduce a generative adversarial network (GAN; Goodfellow et al. 2014) to predict the maximum likelihood (ML) of the full retrieval solution given the observed spectrum. As shown in the following sections, this can be used either as a stand-alone retrieval solution or to constrain the prior parameter ranges of a subsequent, more standard atmospheric retrieval.
We design our algorithm following four guiding principles:
• Once trained, the deep- or machine-learning algorithm should apply to the widest possible range of planet types.
• Once trained, the algorithm should apply to a wide range of instruments.
• The algorithm should be robust in the presence of unknown 'un-trained' features and be able to generalise to parameter regimes outside its formal training set.
• The design of the algorithm and data format should be modular and easily modifiable and expandable.
In the following sections, we present the Exoplanet Generative Adversarial Network (ExoGAN) algorithm and demonstrate it on a variety of retrieval scenarios. We provide the ExoGAN algorithm freely to the community (see end of paper).

METHOD
In the following sections, we introduce GANs and deep convolutional generative adversarial networks (DCGANs), followed by a discussion of how we adapt DCGANs to exoplanetary retrievals.

Generative Adversarial Networks
Generative Adversarial Networks, first introduced by Goodfellow et al. (2014), belong to the class of unsupervised deep generative neural networks. Deep generative models can learn the arbitrarily complex probability distribution of a data set, p_data, and can generate new data sets drawn from p_data. Similarly, they can also be used to fill in missing information in an incomplete data set, so-called inpainting. In this work, we use the data inpainting properties of the GAN to perform retrievals of the atmospheric forward model parameters.
The most common analogy for a GAN architecture is that of a counterfeit operation. The neural network is given a training data set, x, in our case combinations of atmospheric spectra with their associated forward model parameters. We refer to the training set as the 'real' data, with probability distribution p_data. Two deep neural networks are then pitted against each other in a minimax game. One network, the generator (G), tries to create a 'fake' data set, with distribution p_g, indistinguishable from the 'real' data. In a second step, a second network, the discriminator (D), tries to correctly classify 'fake' from 'real' data. The training phase of the GAN is completed when a Nash equilibrium is reached and the discriminator can no longer tell real from fake. At this stage the generator network has learned a good representation of the data probability distribution and p_g ≈ p_data. Figure 1 shows a schematic of our GAN implementation. Unlike variational inference methods, such as variational autoencoders (VAE; Kingma & Welling 2013; Jimenez Rezende et al. 2014), the functional form of the data likelihood does not need to be specified but is learned by the generator. Such implicit latent variable models, or likelihood-free networks, allow the learning of arbitrarily complex probability distributions in an unsupervised manner while making minimal prior assumptions on the data distribution.
GANs have been applied to multiple problems, such as semi-supervised learning, stabilising sequence learning methods for speech and language, and 3D modelling (Denton et al. 2015; Radford et al. 2015; Salimans et al. 2016; Lamb et al. 2016; Wu et al. 2016). Notable examples of GANs applied in an astrophysical context are given by Rodriguez et al. (2018), Stark et al. (2018) and Schawinski et al. (2017), who used GANs trained on existing N-body simulations to efficiently generate new, physically realistic realisations of the cosmic web, to learn point spread functions from data, or to de-noise ground-based observations of galaxies.
In the field of exoplanets, the use of GANs or similar deep architectures has not yet been explored. In this work, we base ExoGAN on a Deep Convolutional Generative Adversarial Network (DCGAN, Radford et al. 2015).
DCGANs are an evolution of the classical GAN, replacing the multilayer perceptrons (MLPs; Rumelhart et al. 1986; Bengio 2009) in the Generator and Discriminator networks with all-convolutional layers. These characteristics make DCGANs significantly more robust to discrete-mode and manifold model collapse (Metz et al. 2016; Arjovsky & Bottou 2017), and they are found to be stable in most training scenarios (Radford et al. 2015). The use of batch normalisation (appendix B) further increases training speed and robustness. In addition, we note that convolutional networks are ideally suited to capturing the highly correlated signals of broad roto-vibrational spectral bands at NIR and IR wavelengths.

Adversarial Training
As described in the previous section, the Generator and Discriminator networks are pitted against one another during training. The goal of the training phase is to reach a Nash equilibrium, i.e. the point at which neither player can improve by unilaterally changing their strategy. Figure 1 shows a schematic of the ExoGAN setup.
In order to recover the generator distribution p_g over the data x, we start from a prior distribution of Gaussian distributed latent variables p(z) and define G(z; θ^(G)) as the mapping from latent variable space to generated data, where θ^(G) are the hyperparameters of the Generator network (see table 6). Let D(x) be the probability that x came from the data rather than from p_g. Hence, in the state of convergence, we have p_g = p_data and D(x) = 1/2. In the training phase we need D to maximise the probability of assigning the correct label to both training examples and samples from G. At the same time we want G to minimise log(1 − D(G(z))). We can define the cross-entropy cost function of the Discriminator as

J^(D) = −(1/2) [log D(x) + log(1 − D(G(z)))].    (1)

During training, we employ batch training, with the cost function of a batch of n data samples being

J^(D) = −(1/2n) Σ_{i=1}^{n} [log D(x_i) + log(1 − D(G(z_i)))],    (2)

which can be written as the expectation values over the data and generated samples:

J^(D) = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_{z∼p(z)}[log(1 − D(G(z)))].    (3)

Since the discriminator wants to minimise the cost function and the generator wants to maximise it, we can summarise the training as a zero-sum game where the cost function for the generator is given by J^(G) = −J^(D). Hence, to capture the entire game, we only need to specify the loss function of the Discriminator, since it encompasses both the θ^(D) and θ^(G) hyperparameters. We then optimise the value function

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))].    (4)

As stated earlier, equation 4 constitutes a minimax game, since it involves minimising over G in an outer loop and maximising over D in an inner loop.
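As a sanity check, the batch form of the discriminator's cross-entropy cost can be sketched in a few lines of numpy. The function name and interface below are our own illustration, not part of the ExoGAN code:

```python
import numpy as np

def discriminator_cost(d_real, d_fake):
    """Batch estimate of the discriminator cross-entropy cost,
    J(D) = -1/(2n) * sum_i [log D(x_i) + log(1 - D(G(z_i)))].

    d_real : discriminator outputs D(x_i) on n 'real' training samples.
    d_fake : discriminator outputs D(G(z_i)) on n generated samples.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    n = d_real.size
    # Real samples should score near 1, generated samples near 0.
    return -(np.sum(np.log(d_real)) + np.sum(np.log(1.0 - d_fake))) / (2.0 * n)
```

At the Nash equilibrium, D(x) = 1/2 everywhere and the cost reduces to log 2 ≈ 0.693, a useful reference value when monitoring training.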

Application to exoplanet spectra
Here we explain the data format of the input and training data. In figure 2 we show an example transmission spectrum of a cloud-free hot-Jupiter with water as the only trace gas, at a volume mixing ratio of 3·10^-4 and a constant resolution of λ/∆λ = 100. We train ExoGAN on a wavelength range of 0.3 - 50 µm. For this paper, we restrict our sampling resolution to R = 100 for every spectrum. This choice, however, does not preclude training with higher resolution data in the future.
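A grid at constant resolving power R = λ/∆λ has logarithmically spaced bin centres, λ_{i+1} = λ_i (1 + 1/R). A minimal sketch of such a grid (our own helper, not part of TauREx or ExoGAN):

```python
import numpy as np

def constant_resolution_grid(lam_min, lam_max, R=100):
    """Wavelength grid at constant resolving power R = lambda / d_lambda.

    Successive bin centres satisfy lam[i+1] = lam[i] * (1 + 1/R), so the
    fractional bin width 1/R is the same at every wavelength.
    """
    grid = [lam_min]
    while grid[-1] * (1.0 + 1.0 / R) <= lam_max:
        grid.append(grid[-1] * (1.0 + 1.0 / R))
    return np.array(grid)
```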

Normalisation
For the neural network to learn efficiently, we must normalise the data to lie between zero and unity. We have experimented with various normalisation schemes. The most obvious is a 'global' normalisation, where we normalise the full training set by its global maximum and minimum values. This approach proved problematic, as spectral signatures for planets with low trace-gas abundances and small atmospheric scale heights would be too weak/flat to be recognisable by the neural network within reasonable training times. We have therefore opted to normalise each training spectrum individually to amplify the spectral features. Assuming that the most common broadband absorber in an exoplanetary atmosphere is water, we divide the spectral range along its major water bands in the IR (see dashed red lines in figure 2). Note that this does not mean that water-free atmospheres cannot be detected. Additionally, we divide the spectrum by the pass-bands of the JWST/NIRISS, NIRCam and MIRI instruments and the Hubble/WFC3 instrument passband. In total, we have 14 spectral bands. We then normalise each spectral band between 0 and 1 and record the minimum and maximum normalisation factors for each. This normalisation scheme ensures a maximum amplification of the spectral features while retaining reversibility.

Figure 2. Spectral binning used in this work. The black line is a simulated spectrum of the hot-Jupiter HD 189733b. The red vertical lines represent the bin edges of prominent water bands. The blue and orange areas are the Hubble/WFC3 and JWST band-passes considered in this paper, respectively.
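The per-band normalisation can be sketched as below; recording the (min, max) factors of each band is what keeps the scheme reversible. The helper names and the index-based band-edge convention are illustrative assumptions, not the published implementation:

```python
import numpy as np

def normalise_bands(spectrum, band_edges_idx):
    """Normalise each spectral band to [0, 1] independently, keeping the
    per-band (min, max) factors so the operation stays reversible."""
    norm = np.empty_like(spectrum, dtype=float)
    factors = []
    for lo, hi in zip(band_edges_idx[:-1], band_edges_idx[1:]):
        band = spectrum[lo:hi]
        bmin, bmax = band.min(), band.max()
        span = bmax - bmin if bmax > bmin else 1.0  # guard against flat bands
        norm[lo:hi] = (band - bmin) / span
        factors.append((bmin, bmax))
    return norm, factors

def denormalise_bands(norm, band_edges_idx, factors):
    """Invert normalise_bands using the recorded per-band factors."""
    spec = np.empty_like(norm, dtype=float)
    for (lo, hi), (bmin, bmax) in zip(
            zip(band_edges_idx[:-1], band_edges_idx[1:]), factors):
        span = bmax - bmin if bmax > bmin else 1.0
        spec[lo:hi] = norm[lo:hi] * span + bmin
    return spec
```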

The Atmospheric Spectrum and Parameters Array (ASPA)
To store all aspects of an atmospheric transmission spectrum, we define the Atmospheric Spectrum and Parameters Array (ASPA). It is a 2D array encoding the 1D normalised spectral bands, each band's minimum and maximum normalisation factors, and the associated forward model parameter values. We parametrise each training spectrum with seven forward model parameters, φ, namely: the H2O, CO2, CH4 and CO volume mixing ratios, the mass of the planet Mp, its radius Rp and its isothermal temperature Tp at the terminator. Figure 3 shows a false-colour ASPA. For this paper, the ASPA is a 33×33 pixel array, with the main part (section 1) encoding the spectral information. Sections 2 - 5 encode the normalisation factors and 6 - 12 the atmospheric parameters. By design, the planet's water abundance occupies a significantly larger area of the ASPA, reflecting the relative importance of water in forming the spectral continuum. The ASPA format is adaptable to other configurations in the future.
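A rough sketch of how such an array might be assembled is given below. The exact pixel layout of areas 1-12 in the published ASPA is not reproduced here; the block sizes and positions are purely illustrative assumptions:

```python
import numpy as np

ASPA_SIZE = 33  # 33 x 33 pixel array, as in the text

def build_aspa(norm_spectrum, norm_factors, params):
    """Illustrative ASPA assembly (the area layout here is an assumption,
    not the published pixel map).

    norm_spectrum : 1D normalised spectrum, zero-padded to fill its block.
    norm_factors  : flat array of per-band min/max factors in [0, 1].
    params        : 7 normalised forward-model parameters in [0, 1].
    """
    aspa = np.zeros((ASPA_SIZE, ASPA_SIZE))
    # top block: the normalised spectrum, row-major
    spec = np.zeros(ASPA_SIZE * (ASPA_SIZE - 4))
    spec[:len(norm_spectrum)] = norm_spectrum
    aspa[: ASPA_SIZE - 4, :] = spec.reshape(ASPA_SIZE - 4, ASPA_SIZE)
    # one row of normalisation factors
    fac = np.zeros(ASPA_SIZE)
    fac[:len(norm_factors)] = norm_factors
    aspa[ASPA_SIZE - 4, :] = fac
    # bottom rows: one constant block per forward-model parameter
    for i, p in enumerate(params):
        aspa[ASPA_SIZE - 3:, i * 4:(i + 1) * 4] = p
    return aspa
```

Constant blocks for the parameters (rather than single pixels) mirror the idea that important quantities should occupy larger areas of the array.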

The training
To train ExoGAN on a wide range of possible exoplanetary atmospheres, we generated a comprehensive training set of atmospheric forward models using the TauREx retrieval code (Waldmann et al. 2015a,b). We sampled each of the seven previously mentioned forward model parameters (H2O, CO2, CH4 and CO abundances, the mass of the planet Mp, the radius Rp and the temperature Tp) at 10 values within the parameter ranges denoted in table 1. This configuration yields 10^7 forward models, which are split into a 90% training set and a 10% test set. The test set is used to validate the accuracy of the network on previously unseen data. As discussed later on, we find this training set to be overcomplete and only require a smaller subset of the full training set for convergence.
During training, we perform two training iterations of the discriminator for every training step of the generator. We used an NVIDIA TESLA V100 GPU with minibatch sizes of 64 training ASPAs. Training required ∼9 hours per epoch on the V100 GPU and, comparatively, about three days on 20 CPU cores in parallel.

Figure 3. Each area is dedicated to a particular atmospheric characteristic: area 1 is the spectrum between 1 µm and 50 µm at resolution 100, normalised between 0 and 1 in each spectral bin. Areas 2 to 5 give the normalisation factors used in the different sections of the spectrum; the light and dark areas give the maximum and minimum values, respectively. Areas 6 to 8 encode the atmospheric trace-gas volume mixing ratios of CO2, CO and CH4 respectively. Areas 9 to 11 are, respectively, Mp, Rp and Tp. Area 12 gives the H2O trace-gas volume mixing ratio.
The convergence of the loss functions during the training phase is shown in figure 4. The full model setup can be found in the appendix (table 7). We tested three different sizes of our latent variable space z, with z_dim = 50, 100 and 200. We found z_dim = 50 to yield significantly noisier reconstructions at the end of one epoch of training, whereas no discernible differences between z_dim = 100 and z_dim = 200 could be observed. We hence settled on z_dim = 100. We adopted a training minibatch size of 64 ASPAs and found no significant effect of larger batch sizes on network convergence.
During minibatch training, the algorithm is presented with a subset of the full training data (in this case 64 ASPAs) rather than the full training set (or batch). This eases the memory requirements of large training sets, in particular for memory-limited devices such as GPUs. By only considering a subset of the training data at a time, a gradient descent optimiser such as ADAM is still able to perform well, despite the increased variance of the gradient estimate. To avoid biased estimates and convergence to local minima, minibatches must be selected randomly from the training set at each iteration.
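Random minibatch selection can be sketched as follows (an illustrative helper, not the ExoGAN training loop itself):

```python
import numpy as np

def minibatches(data, batch_size=64, seed=0):
    """Yield randomly selected minibatches covering the training set once.

    Shuffling the indices each epoch avoids biased gradient estimates and
    convergence to local minima, at the cost of a noisier gradient than
    full-batch training would give.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[idx[start:start + batch_size]]
```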

Data reconstruction
Once we have trained ExoGAN, we can now define our 'retrieval' model. As alluded to above, we use the inpainting properties of a GAN to complete the missing data, in this case, the forward model parameters, in our ASPA. In other words, we convert our observed spectrum into the ASPA format and keep unknown values (parameters and missing wavelength ranges) masked. Given the information available, the ExoGAN will then attempt to fill in the missing information to complete the full ASPA. Here we follow the semantic inpainting algorithm by Yeh et al. (2016).
We can define our reconstructed data, x_recon, from the incomplete observed data, y, as

x_recon = M ⊙ y + (1 − M) ⊙ G(ẑ),    (5)

where M is a binary mask set to zero for missing values in y, i.e. forward model parameter values and, possibly, missing wavelength ranges. Here, ⊙ constitutes the Hadamard product and G(ẑ) is the GAN-generated data. We note that after ExoGAN has been trained, z represents an encoding manifold of p_data, and we denote by ẑ the value for which (M ⊙ G(z)) most closely matches (M ⊙ y), where ẑ ⊆ z. The aim is now to obtain the ẑ that accurately completes x_recon. Let us define the following optimisation.
ẑ = argmin_z L(z),    (6)

where L is a loss function of z that attains its minimum at ẑ. Following Yeh et al. (2016), we define the loss function as the sum of two parts, a contextual loss and a perceptual loss:

L(z) = L_cont(z) + λ L_perc(z).    (7)

The contextual loss, L_cont(z), measures the difference between the observed data and the generated data. Here we follow the definition by Amos (2016):

L_cont(z) = ||M ⊙ G(z) − M ⊙ y||_1.    (8)

Empirically, Yeh et al. (2016) find the l1 norm to yield slightly better results, though the l2 norm can equally be used. Whereas the contextual loss compares the generated data with the observed data directly, the perceptual loss, L_perc(z), uses the discriminator network to verify the validity of the generated data given the training set:
L_perc(z) = log (1 − D(G(z))).    (9)

To solve equation 6 we use the ADAM optimiser (Kingma & Ba 2014) with a learning rate of 0.1. For a deeper discussion of the ADAM optimiser, see appendix A.
We investigated the weighting λ of the perceptual loss (Eq 9) relative to the contextual loss (Eq 8) and found λ = 0.1 to be optimal; λ > 0.1 gives too much emphasis to the perceptual loss term and yields less reliable results.
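The combined inpainting loss can be written compactly. The function below is a numpy sketch with our own naming, where D_of_Gz stands for the scalar discriminator output D(G(z)):

```python
import numpy as np

def inpainting_loss(G_z, y, M, D_of_Gz, lam=0.1):
    """Total inpainting loss L(z) = L_cont(z) + lam * L_perc(z).

    L_cont is the l1 norm of the masked residual M * (G(z) - y);
    L_perc = log(1 - D(G(z))) penalises completions the discriminator
    flags as implausible.
    """
    l_cont = np.abs(M * G_z - M * y).sum()
    l_perc = np.log(1.0 - D_of_Gz)
    return l_cont + lam * l_perc
```

With lam = 0.1, a perfect masked match (G(z) equal to y on the observed pixels) leaves only the weighted perceptual term.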
In figures 5 & 6 we show the three phases associated with a prediction: left, the ground truth; middle, the masked spectrum/parameters; right, the reconstructed ASPA. Figure 7 shows a water-dominated atmosphere of a test-set hot-Jupiter (black) and the ExoGAN-reconstructed spectrum based on the Hubble/WFC3 bandpass only (red). We find very good agreement between the reconstructed and ground-truth spectra.

ATMOSPHERIC PARAMETER RETRIEVAL
To retrieve the atmospheric forward model parameters, we assume the observational uncertainties on the spectrum to be Gaussian distributed. We then generate 1000 noisy instances of the observed spectrum, x_i(λ), by sampling from a normal distribution with a mean of x(λ) and standard deviation σ_λ. From these noisy spectrum instances, we generate 1000 corresponding ASPAs with the missing information (be it parameters, spectral ranges or both) masked. We then let ExoGAN predict and inpaint these ASPAs. Finally, we collect all parameter predictions and calculate the mean and standard deviation of the resulting distribution. Hence, the resulting distributions are not posterior distributions derived from a Nested Sampling or MCMC atmospheric retrieval, but are conceptually closer to running an optimal-estimation retrieval multiple times and collecting the distribution of results.

Table 2. ExoGAN prediction accuracies associated with each parameter for the training set. The A(0 σ_φ) column represents the absolute accuracy of the prediction without taking into account the error bar of the retrieval. The 2nd and 3rd columns take into account the 1 σ and 2 σ retrieved errors following equation 10.
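This Monte-Carlo procedure can be sketched as follows, with predict_params standing in for the trained network's inpainting step (an assumption for illustration; the real step completes an ASPA with ExoGAN):

```python
import numpy as np

def retrieve_distribution(spectrum, sigma, predict_params, n_draws=1000, seed=0):
    """Monte-Carlo over the observational noise: draw noisy copies of the
    spectrum, run a prediction on each, and summarise the resulting
    parameter distribution with its mean and standard deviation.

    predict_params : callable mapping a noisy spectrum to a parameter
                     vector (stand-in for the trained network).
    """
    rng = np.random.default_rng(seed)
    preds = np.array([
        predict_params(spectrum + rng.normal(0.0, sigma, size=spectrum.shape))
        for _ in range(n_draws)
    ])
    return preds.mean(axis=0), preds.std(axis=0)
```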

ACCURACY TESTS
We define the accuracy of a retrieved parameter, A, as a function of the ground-truth parameter value, φ, the retrieved value, φ_recon, and its corresponding error, σ_φ (equation 10), where N is the number of reconstructed ASPA instances. We compute the reconstruction accuracies for 1000 randomly selected planets from each of the test and training sets. The accuracies are summarised in tables 2 & 3 for 0 σ (an exact match), 1 σ and 2 σ confidence intervals. Figure 8 shows an example of the parameter distributions retrieved for a test-case planet.

Comparison with a classical retrieval model
In this section, we compare the ExoGAN results with a 'classical' retrieval result obtained with the TauREx retrieval code. For this comparison and the tests in subsequent sections, we used as example the hot-Jupiter HD 189733 b, with the forward model parameters listed in table 4. We retrieve the forward model parameters with both TauREx and ExoGAN for spectra across the Hubble/WFC3-only band and a broad (0.3 - 15 µm) wavelength band. The Hubble/WFC3 spectrum was taken from Tsiaras et al. (2018) and interpolated to the ExoGAN resolution using a quadratic interpolation (figure 9). The large-wavelength-range spectrum is synthetic, based on table 4.
In figure 10 we compare both sets of results. The Hubble/WFC3 and large-wavelength-range retrievals are shown with square and circular markers respectively. In both cases, the ExoGAN predictions are consistent with the TauREx retrievals within the error bars. We note that in the case of CO in the Hubble/WFC3 data, neither TauREx nor ExoGAN features a detection, as expected.
We then generated a second synthetic spectrum of HD 189733 b between 0.3 − 15 µm, using the parameters of Venot et al. (2012). The only significant difference between the retrieved distributions is the CO abundance, for which the ExoGAN abundances are higher. Note that both TauREx and ExoGAN show tails in their CO abundance posteriors, indicating the difficulty of retrieving CO even for classical retrieval algorithms.
The comparison of run-times is remarkable. Using the TauREx retrieval code with seven free parameters, a standard nested-sampling analysis takes ∼10 hours on 24 CPU cores, using absorption cross-sections at a resolution of R = 15,000 and spanning a large (0.3 - 15 µm) wavelength range. The trained ExoGAN requires ∼2 minutes for the same analysis. This constitutes a speed-up of ∼300 times, independent of the number of free parameters and of the resolution of the input spectrum. Similarly, training ExoGAN on higher resolution data does not significantly impact its runtime after training, as both the size and architecture of the underlying network remain unchanged.

ROBUSTNESS TESTS
To test the limits of ExoGAN we simulate three conditions not previously encountered by the network. We use the same example planet as in the previous section (table 4) and simulate the following three scenarios unseen by ExoGAN during the training phase:
• the presence of clouds;
• the addition of a trace gas unknown to the network;
• atmospheric temperatures outside the training range.

Figure 9. Real HD 189733b observation with the Hubble WFC3 camera (Tsiaras et al. 2018). The black points are the observed data and the green line is the spectrum interpolated to the ExoGAN resolution.
Each test is discussed below, and the ExoGAN-predicted abundances versus the ground truth are summarised in table 5. Furthermore, we test ExoGAN's robustness against varying signal-to-noise (S/N) levels of the observed spectrum.

Presence of clouds

Here we test the response of ExoGAN to the presence of clouds in the atmospheric spectrum. We simulate a grey cloud deck at 10 mbar pressure (figure 12) and let ExoGAN reconstruct the atmospheric parameters (figure 13). The lack of information due to the presence of clouds results in a wider distribution of parameters. However, ExoGAN is still able to retrieve all trace-gas abundances within 1 σ confidence. We find that temperature estimates can be overestimated; this is likely a consequence of the normalisation procedure used in the presence of clouds.

Presence of molecules outside of the training set
In this test, we simulate the impact of unknown features on the retrievability of known trace gases. Here we consider a spectrum containing water at the default test value and NH3 at a mixing ratio of 10^-4. Though Venot et al. (2012) estimated an NH3 mixing ratio of 10^-6, we use an unrealistically high value as a worst-case scenario. By removing all other trained trace gases but water, we also test for spurious detections of non-existing trace gases. Figure 14 shows the ExoGAN parameter distributions. We find that the network recognises the absence of trace gases and does not detect 'false positives', while still recovering the exact mixing ratio of H2O.

Parameters outside the training range
In the third robustness test we simulated the default planetary atmosphere but with an effective temperature of 2500 K, i.e. 500 K above the upper bound of the temperature training range. In this test, as shown in figure 15, all parameters converge toward the true solution within 1 σ, except for the planetary temperature. Here the network does not retrieve the correct temperature but assigns a large error bar, suggesting that the temperature is unconstrained when the input value lies outside the domain range of ExoGAN.

Impact of spectral signal-to-noise
We test ExoGAN for varying levels of observational noise. Here we take the default planet (table 4) and add noise in steps of 10 ppm in the range [0, 100] ppm. In figure 16 we show examples of spectra at noise levels σ_λ = 20, 50, 60 and 100 ppm.
For each noise level, we calculated the accuracy of the prediction following equation 10, but setting A(σ_φ = 0) (figure 17). We note that figure 17 only shows the difference between the predicted value and an exact match; prediction accuracies increase when retrieval error bars are taken into account. Here we want to demonstrate the relative degradation of the prediction accuracy as a function of σ_λ. As intuitively expected, the noisier the spectra, the less accurate the model. The radius of the planet is easily recognised by ExoGAN across the entire noise range tested. The most difficult parameters to identify are the CO abundance and the mass of the planet.

DISCUSSION

In this work, we used 10^7 forward models over seven atmospheric forward model parameters. We find this training set to be significantly over-complete: the ExoGAN training can be completed successfully with ∼50% of the existing training set. Optimising training in future iterations will allow for the inclusion of more complex atmospheric forward models.

One of the main difficulties in training neural networks with transmission spectra is the normalisation of the spectra in Rp/R*. A consistent normalisation across a broad range of possible atmospheres is required during the training process, but is difficult to achieve in practice given strongly varying atmospheric scale heights and trace-gas abundances. In this work, we adopted a normalisation based on instrument pass-bands as well as water bands. Though in practice this approach works for most scenarios, it can introduce biases when high-altitude clouds are present. In these cases, we find that the normalisation procedure stretches the observed spectrum too much, leading the network to identify higher atmospheric temperatures than it otherwise would. In future work, we plan to mitigate this effect by including grey clouds in the training set as well as further refining the normalisation scheme.
We note that for emission spectroscopy a consistent normalisation is more readily achieved if the planetary and stellar equilibrium temperatures are assumed to be known (Waldmann 2016).
ExoGAN has been trained on a large set of simulated forward models. By including ExoGAN as an integral part of the TauREx retrieval framework, we will be able to use the forward models created during a standard retrieval run (of the order of 10^5 - 10^6 models per retrieval) to perform online learning and continuously improve the accuracy of ExoGAN over time.

Figure 17. Accuracy as a function of spectral error bars, σ_λ. As discussed in the text, this figure does not take into account the retrieval error bar, i.e. A(σ_φ = 0) following equation 10.

Comparison with other machine learning architectures
In the previous sections, we have explored the use of DCGANs to retrieve atmospheric parameters from observations. GANs belong to the class of semi-supervised and unsupervised generative models and have been the subject of significant research since their inception. In this paper, our use of the DCGAN is unsupervised, as we provide the parameters together with the data to be modelled. Such an approach allows for a high degree of flexibility in using ExoGAN, since we only need to re-define the ASPA array to train on new problem sets.
Most other generative models require the likelihood function of the data to be defined, something we do not intrinsically know for many exoplanet observations. GAN-based models, by contrast, are likelihood-free methods: p_θ(x) does not need to be computed during training. This characteristic has obvious advantages over pure variational autoencoders, which require a parametrised form of the probability space from which they draw their latent variables.
Whilst we have explored the use of GANs in the scope of this paper, we note that other neural network architectures, such as simpler deep belief networks or VAEs, may yield comparable results. In fact, recent work by the 2018 NASA Frontier Development Lab has explored various deep learning architectures in the context of atmospheric retrievals, with promising results. Similarly, other machine learning frameworks may also be successfully used to model exoplanetary spectra; for example, an atmospheric retrieval algorithm based on random forest regression was recently presented and demonstrated on Hubble/WFC3 observations.

Future work
In this work, we have used the 'vanilla' DCGAN as the underlying algorithm. Since its inception, various interesting additions to the classical GAN have been proposed, which we intend to explore in future work. Notable amongst them are VAE-GAN hybrids, random-forest-GAN hybrids and Bayesian-GAN models. VAE-GAN models (e.g. Rosca et al. 2017; Dosovitskiy & Brox 2016; Ulyanov et al. 2017; Makhzani et al. 2015) allow direct inference using GANs, something that is not possible with purely generative models. To further guard against model collapse, a random forest and GAN hybrid algorithm, GAF, has recently been proposed, in which the fully connected layer of the GAN's discriminator is replaced by a random forest classifier. Saatchi & Wilson (2017) proposed a Bayesian-GAN, drawing probability distributions over θ^(D) and θ^(G), allowing for fully Bayesian predictive models and further guarding against model collapse.

CONCLUSION
In the era of JWST and ARIEL observations, next-generation atmospheric retrieval algorithms must reflect the higher information content of the observations with an increase in atmospheric model complexity. Complex models are computationally heavy, creating potential bottlenecks given current state-of-the-art sampling schemes. Artificial intelligence approaches will provide essential tools to mitigate the increase in computational burden while maintaining retrieval accuracy.
In this work, we introduced the first deep learning approach to solving the inverse retrieval of exoplanetary atmospheres. We trained a deep convolutional generative adversarial network on an extensive library of atmospheric forward models and their associated model parameters. The training set spans a broad range of atmospheric chemistries and planet types. Once trained, the ExoGAN algorithm achieves performance comparable to more traditional statistical-sampling-based retrievals, and the ExoGAN results can either be used to constrain the prior ranges of subsequent retrievals (significantly cutting computation times) or stand as final results. We found ExoGAN to be up to 300 times faster than a standard retrieval for large spectral ranges. ExoGAN is designed to be universally applicable to a wide range of instruments and wavelength ranges without additional training.
All codes used in this publication are open-access and their latest versions are hosted at https://github.com/orgs/ucl-exoplanets. Manuals and links to the training sets can be found at http://exoai.eu. Furthermore, the training data and the corresponding ExoGAN software (at the time of paper acceptance) have been assigned the DOI 10.17605/OSF.IO/6DXPS and are permanently archived at https://osf.io/6dxps/.