Neural style transfer of weak lensing mass maps

We propose a new generative model of projected cosmic mass density maps inferred from weak gravitational lensing observations of distant galaxies (weak lensing mass maps). We construct the model based on neural style transfer so that it can transform Gaussian weak lensing mass maps into deeply non-Gaussian counterparts as predicted by ray-tracing lensing simulations. We develop an unpaired image-to-image translation method with Cycle-Consistent Generative Adversarial Networks (Cycle GAN), which learn an efficient mapping from an input domain to a target domain. Our model is designed to enjoy important advantages: it is trainable with no need for paired simulation data, flexible enough to make the input domain visually meaningful, and expandable to rapidly produce a map with a larger sky coverage than the training data without additional learning. Using 10,000 lensing simulations, we find that appropriate labeling of the training data based on field variance allows the model to reproduce the correct scatter in summary statistics for weak lensing mass maps. Compared with a popular log-normal model, our model improves the prediction of the statistical nature of three-point correlations and of local properties of rare high-density regions. We also demonstrate that our model can produce a continuous map with a sky coverage of $\sim166\, \mathrm{deg}^2$, but with non-Gaussian features similar to the training data covering $\sim12\, \mathrm{deg}^2$, in a GPU minute. Hence, our model can be beneficial for the massive production of synthetic weak lensing mass maps, which is of great importance for future precise real-world analyses.

1. INTRODUCTION

General relativity predicts that an apparent image of a single distant astrophysical source should be distorted by gravitational lensing effects. The distortion is governed by the gravitational potential of the intervening mass (lens) and the geometry between the lens and source objects (Bartelmann and Schneider 2001). Extracting coherent distortions over a large number of sources allows us to infer the large-scale gravitating mass distribution of the Universe in an unbiased manner. The lensing effect caused by the large-scale structure is commonly referred to as the weak lensing effect, whereas the mass distributions inferred from the weak lensing effect are called weak lensing mass maps in the literature. Statistical analyses of weak lensing mass maps can provide vital information about the gravitational clumping of dark matter and cosmic expansion (see Kilbinger (2015) and Mandelbaum (2018) for reviews).
The two-point angular correlation function of the shapes of sources, or its Fourier-space counterpart known as the power spectrum, is commonly used to characterize weak lensing mass maps. Although the power spectrum provides a complete statistical description of a random Gaussian field, numerical simulations of weak lensing effects have shown that weak lensing mass maps have non-Gaussian properties. Hence the power spectrum alone cannot fully describe a weak lensing mass map (Jain et al. 2000; White and Hu 2000). Various statistical methods have been proposed to study the non-Gaussian features of weak lensing mass maps (see Ajani et al. (2023) and references therein for a list of the statistics proposed so far). At present, there is no unique method to extract all the information imprinted in weak lensing mass maps.
A generative model of weak lensing mass maps plays an essential role in modern weak lensing analyses. It is highly demanded in practice for the reasons below:

• Valid analytic approaches are yet unknown for deriving the cosmological dependence of most of the non-Gaussian statistics proposed so far. An ensemble average of a given non-Gaussian statistic over a set of generated weak lensing mass maps is commonly used to study its cosmological dependence.
• A precise estimation of statistical errors in a given lensing statistic is crucial in cosmological inference (Dodelson and Schneider 2013). A strong non-Gaussianity in the weak lensing mass maps and several observational effects make it difficult to estimate the statistical errors in an analytic way. The best practice is to run a set of numerical simulations with the physics relevant to lensing analyses and to create realistic synthetic weak lensing data by including various observational effects with appropriate recipes (Shirasaki et al. 2019a).
• Blind analyses of weak lensing mass maps have become a standard procedure to remove any human-based systematic errors, as some cosmological tensions have been claimed (see Abdalla et al. (2022) for a review of the cosmological tensions). The current blinding scheme of weak lensing data is based on changing the observed galaxy shapes with a single number, introducing modulations in the amplitude of the lensing power spectrum (Asgari et al. 2021; Secco et al. 2022; Dalal et al. 2023; Li et al. 2023). More sophisticated blinding is required to test the accuracy of a theoretical template for a given lensing statistic. A blinded challenge, analyzing generated weak lensing mass maps with the input cosmological parameters kept unknown, is among the most effective approaches for this purpose (see Nishimichi et al. (2020) for an example in galaxy clustering data). A fast, precise generative model of weak lensing mass maps enables one to perform the blinded challenge in a user-friendly manner.
• Field-level cosmological inference of weak lensing mass maps receives plenty of attention from the community (Porqueres et al. 2021; Boruah et al. 2022; Dai and Seljak 2022) because it can, in principle, lose no cosmological information. Reconstructions of weak lensing mass maps within a Bayesian framework are also an important subject in modern astronomy even for a fixed cosmological model, whereas the reconstruction from noisy galaxy imaging data often requires a training dataset of noiseless weak lensing mass maps (Shirasaki et al. 2019b; Jeffrey et al. 2020; Fiedorowicz et al. 2022; Remy et al. 2023). A rapid but realistic generator of weak lensing mass maps is fundamental to the field-level inference and Bayesian reconstruction as a matter of course.
Ray-tracing simulations of weak lensing effects are among the most robust generative models thus far, but they need multiple outputs from cosmological simulations of large-scale structures. Because of the computational cost, ray-tracing simulations can typically provide 100-10,000 realizations of weak lensing mass maps on a yearly basis (Sato et al. 2009; Dietrich and Hartlap 2010; Harnois-Déraps et al. 2012; Liu et al. 2018; Takahashi et al. 2017; Harnois-Déraps et al. 2018; Harnois-Deraps et al. 2019; Shirasaki et al. 2021; Osato et al. 2021; Kacprzak et al. 2023; Hadzhiyska et al. 2023; Ferlito et al. 2023). To reduce the computational cost, previous studies have proposed that the time-consuming cosmological N-body simulation can be replaced with some alternatives. The proposals include the use of multivariate log-normal models (Taruya et al. 2002; Xavier et al. 2016; Makiya et al. 2021) or their advanced versions (Yu et al. 2016; Tessore et al. 2023), approximate gravity solvers (Izard et al. 2018; Sgier et al. 2019; Böhm et al. 2021), and halo-based models (Giocoli et al. 2017, 2020). Ray-tracing simulations with the alternative methods drastically shorten computational time within a fixed computing environment, while they have limited predictive power for non-Gaussian observables in general.
Image processing with neural networks provides another interesting approach to the generative modeling of weak lensing effects. Motivated by recent successes in deep learning, there exist several generative models of weak lensing mass maps based on deep neural networks, including Generative Adversarial Networks (GANs) (Mustafa et al. 2019; Perraudin et al. 2021; Tamosiunas et al. 2021), Score-based Diffusion Models (Remy et al. 2023), and Normalizing Flows (Dai and Seljak 2022). Those deep-learning-based generators commonly produce a new weak lensing mass map from a given set of random numbers, and the generators learn the relation between the mass map and latent random numbers from a large set of training data. Because the field of view and the number of pixels in the training data have to be set a priori, the generators are not able to produce weak lensing mass maps at a different sky coverage and angular resolution from the training setup. This is a weak point of the deep-learning-based generative models, as ongoing and future galaxy imaging surveys aim at measuring weak lensing effects over a sky coverage of $1000\,\mathrm{deg}^2$ or larger. In general, it becomes difficult to train a given neural network as the size of output data increases, because the number of trainable parameters in the network increases dramatically.
In this paper, we propose a new generative model of weak lensing mass maps based on a deep-learning method, which is designed to be able to increase the field of view without additional training. We train a neural network translating a latent image into a target image of a non-Gaussian weak lensing mass map. We here set the latent image to a multivariate Gaussian map with the power spectrum being the same as the target counterpart. The network is trained to learn the translation between the Gaussian and non-Gaussian mass maps in a given field of view within the framework of Cycle-Consistent Generative Adversarial Networks (Cycle GAN) (Zhu et al. 2017). After training, we can extend the sky coverage of fake mass maps by (i) producing a large-sky Gaussian map from the given power spectrum, (ii) dividing the Gaussian map into subregions, (iii) performing the learned translation on a patch-by-patch basis, and (iv) combining the non-Gaussian maps on all the subregions. This procedure allows our generator to produce a lensing mass map with an arbitrary sky coverage while not degrading the angular resolution compared to the training datasets. Note that our training data do not require a large sky coverage, enabling the neural network to learn small-scale non-Gaussian features in a weak lensing mass map in a cost-efficient manner. Also, our approach enables us to rapidly increase the number of realistic lensing simulations at the low cost of producing a new set of Gaussian mass maps, as long as we thoroughly train the Cycle GAN.
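The four-step procedure above can be sketched as follows. This is a minimal illustration in which `translate_large_map` and `translate_patch` are hypothetical names, and an identity function stands in for the trained Cycle GAN generator; the sketch simply tiles non-overlapping patches and leaves any treatment of patch boundaries aside.

```python
import numpy as np

def translate_large_map(gauss_map, patch_size, translate_patch):
    """Steps (ii)-(iv): split a large Gaussian map into non-overlapping
    patches, translate each patch, and stitch the outputs back together."""
    n_y, n_x = gauss_map.shape
    assert n_y % patch_size == 0 and n_x % patch_size == 0
    out = np.empty_like(gauss_map)
    for i_y in range(0, n_y, patch_size):
        for i_x in range(0, n_x, patch_size):
            patch = gauss_map[i_y:i_y + patch_size, i_x:i_x + patch_size]
            out[i_y:i_y + patch_size, i_x:i_x + patch_size] = translate_patch(patch)
    return out

# step (i) would draw the large Gaussian map from the power spectrum; here a
# random array and an identity "generator" stand in for both
big_map = np.random.default_rng(0).normal(size=(1024, 1024))
stitched = translate_large_map(big_map, 256, lambda patch: patch)
```

Because the translation is applied patch by patch, the cost grows only linearly with the number of patches, which is why a larger sky coverage needs no retraining.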
Similar approaches based on an image-to-image translation have been explored in Han et al. (2021) and Piras et al. (2022). A distinct difference from the previous studies is that our training does not need a paired or aligned set of images for the image-to-image translation. This is achieved by recent progress in unsupervised learning for style transfer of natural images as in Zhu et al. (2017). Note that paired sets of images in two different domains are commonly unavailable, which is the main difficulty in constructing generative models for the task of image-to-image translation. Our training strategy only requires an unpaired set of images in two different domains, allowing one to easily increase the number of training data as well as examine various translations in an efficient way.
The rest of the present paper is organized as follows. In Section 2, we summarize the basics of weak gravitational lensing effects. In Section 3, we describe the three generative models of weak lensing mass maps adopted in this paper, based on multivariate random Gaussian variables, the log-normal model, and generative adversarial networks. We describe how to produce the datasets for training and testing the networks in Section 4. In Section 5, we show the performance of our trained networks at the same sky coverage as in the training datasets. We then study the applicability of our method to extending the field of view in Section 6. Limitations of our generators are listed in Section 7. Concluding remarks and discussions are given in Section 8.

2. WEAK GRAVITATIONAL LENSING
The image distortion of a source object (galaxy) induced by weak gravitational lensing is commonly characterized by the 2 × 2 matrix below:

$$A_{ij} = \frac{\partial \theta^{i}_{\rm true}}{\partial \theta^{j}_{\rm obs}} = \begin{pmatrix} 1-\kappa-\gamma_{1} & -\gamma_{2}-\omega \\ -\gamma_{2}+\omega & 1-\kappa+\gamma_{1} \end{pmatrix}, \quad (1)$$

where $\kappa$ is the convergence, $\gamma = \gamma_1 + i\gamma_2$ is the shear, and $\omega$ is the rotation, while $\boldsymbol{\theta}_{\rm obs}$ and $\boldsymbol{\theta}_{\rm true}$ represent the observed position of the source object and the true (unlensed) position, respectively. In the limit of $\kappa, \gamma \ll 1$, which we are interested in, we can express the convergence as the integral of the density contrast of the underlying matter density field $\delta_{\rm m}(\boldsymbol{x})$ with a weight function over redshift (Bartelmann and Schneider 2001),

$$\kappa(\boldsymbol{\theta}) = \frac{3}{2}\left(\frac{H_{0}}{c}\right)^{2}\Omega_{\rm m0} \int_{0}^{\infty} \mathrm{d}\chi\, \frac{W(\chi)\, r(\chi)}{a(\chi)}\, \delta_{\rm m}\left(r(\chi)\boldsymbol{\theta}, \chi\right), \quad (2)$$

$$W(\chi) = \int_{\chi}^{\infty} \mathrm{d}\chi'\, p(\chi')\, \frac{r(\chi'-\chi)}{r(\chi')}, \quad (3)$$

where $c$ is the speed of light, $H_0$ is the present-day Hubble constant, $\Omega_{\rm m0}$ is the matter density parameter at present, $a(\chi)$ is the scale factor, $\chi(z)$ is the radial comoving distance to redshift $z$, $r(\chi)$ is the angular diameter distance, and $p(\chi)$ represents the source distribution normalized to $\int \mathrm{d}\chi\, p(\chi) = 1$. Throughout this paper, we assume a single source plane at a redshift of $z_{\rm source} = 1$, i.e. $p(\chi) = \delta_{\rm D}(\chi-\chi_{1})$, where $\delta_{\rm D}(x)$ is the Dirac delta function and $\chi_{1} = \chi(z=1)$, for simplicity¹.

3. MODEL

3.1. Gaussian
The simplest generative model of the lensing convergence field is given by a multivariate random Gaussian distribution. The full statistical information of $\kappa$ can be characterized by the distribution of its Fourier modes $\tilde{\kappa}_{\boldsymbol{\ell}}$. In general, the distribution of a Fourier mode $\tilde{\kappa}_{\boldsymbol{\ell}}$ is expressed as a function of the norm $|\tilde{\kappa}_{\boldsymbol{\ell}}|$ and the phase $\theta_{\kappa,\boldsymbol{\ell}}$. Assuming zero mean, we can write the distribution of the Gaussian $\tilde{\kappa}_{\boldsymbol{\ell}}$ as

$$\mathrm{Prob}\left(|\tilde{\kappa}_{\boldsymbol{\ell}}|, \theta_{\kappa,\boldsymbol{\ell}}\right) = \frac{2|\tilde{\kappa}_{\boldsymbol{\ell}}|}{C(\ell)}\exp\left(-\frac{|\tilde{\kappa}_{\boldsymbol{\ell}}|^{2}}{C(\ell)}\right)\times\frac{1}{2\pi},$$

where $C(\ell)$ is the power spectrum of $\kappa$, defined by

$$\left\langle \tilde{\kappa}_{\boldsymbol{\ell}}\,\tilde{\kappa}^{*}_{\boldsymbol{\ell}'} \right\rangle = (2\pi)^{2}\,\delta^{(2)}_{\rm D}(\boldsymbol{\ell}-\boldsymbol{\ell}')\,C(\ell).$$

The Gaussian generative model has the advantage of being computationally efficient. It enables us to produce a map with an arbitrary angular resolution and sky coverage. In the concordance cosmological model, the initial condition of density perturbations follows a random Gaussian variable (e.g. Planck Collaboration et al. 2020), allowing the Gaussian model to serve as the zeroth-order approximation. Nevertheless, non-linear gravitational growth of matter density fields through cosmic ages causes the coupling among different Fourier modes of the initial density perturbations (e.g. Bernardeau et al. 2002, for a review), leading to the emergence of non-Gaussian structures in lensing convergence fields. The Gaussian generative model should remain valid for large-scale Fourier modes, because the large-scale modes are less affected by the gravitational mode coupling.

¹ We also examined two different source redshifts of $z_{\rm source} = 0.5$ and 1.5 to train our neural-network-based model. We found that the model shows a similar performance as in Section 5 even for $z_{\rm source} = 0.5$ and 1.5, as long as we trained the model with labeling of the training datasets based on field variances of the lensing convergence fields. See Section 4.3 for the labeling.
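A minimal sketch of the Gaussian generative model under the flat-sky convention above: real white noise is colored in Fourier space, with the filter $\sqrt{C(\ell)}$ per pixel size providing the continuum normalization. The function name and the toy power-law spectrum are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kappa_map(n_pix, pix_size_rad, power_spectrum, seed=None):
    """Draw a Gaussian convergence map whose Fourier modes follow a given
    power spectrum C(l), by coloring real white noise in Fourier space.
    The filter sqrt(C) / (pixel size) reproduces the flat-sky convention
    <kappa_l kappa_l'*> = (2 pi)^2 deltaD(l - l') C(l)."""
    rng = np.random.default_rng(seed)
    lx = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix_size_rad)
    ell = np.sqrt(lx[:, None] ** 2 + lx[None, :] ** 2)
    cl = np.zeros_like(ell)
    mask = ell > 0.0                    # leave the mean (l = 0) mode at zero
    cl[mask] = power_spectrum(ell[mask])
    white = rng.normal(size=(n_pix, n_pix))
    return np.fft.ifft2(np.fft.fft2(white) * np.sqrt(cl) / pix_size_rad).real

# usage: a 256-pixel map with 0.8 arcmin pixels and a toy power-law spectrum
pix = 0.8 / 60.0 * np.pi / 180.0
kappa = gaussian_kappa_map(256, pix, lambda l: 1.0e-9 * (l / 1000.0) ** -1.5, seed=1)
```

Because the same routine works for any `n_pix` and pixel size, this is also the cheap ingredient used later to extend the sky coverage of the generated maps.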

3.2. Log-Normal
Non-linear gravitational growth in the matter density contrast $\delta_{\rm m}$ makes the lensing convergence non-Gaussian. A prominent non-Gaussian feature of $\kappa$ in numerical simulations is the presence of a heavy positive tail in the one-point probability distribution function (PDF) of $\kappa$ (e.g. Jain et al. 2000; Takahashi et al. 2011; Castro et al. 2018). An empirical log-normal model can account for the tail in a simple way (Taruya et al. 2002). The model assumes that the real-space convergence field follows the one-point PDF of

$$\mathrm{Prob}(\kappa) = \frac{1}{\sqrt{2\pi}\,\sigma_{\rm ln}\left(\kappa+|\kappa_{\rm min}|\right)} \exp\left[-\frac{\left\{\ln\left(1+\kappa/|\kappa_{\rm min}|\right)+\sigma^{2}_{\rm ln}/2\right\}^{2}}{2\sigma^{2}_{\rm ln}}\right] \quad (8)$$

for $\kappa > \kappa_{\rm min}$, and it holds that $\mathrm{Prob}(\kappa) = 0$ otherwise. In Eq. (8), the variance $\sigma^{2}_{\rm ln}$ is given by

$$\sigma^{2}_{\rm ln} = \ln\left(1+\frac{\langle\kappa^{2}\rangle}{\kappa^{2}_{\rm min}}\right), \quad (9)$$

where $\langle\kappa^{2}\rangle$ is the variance of $\kappa$. Note that the variance is determined by the convergence power spectrum $C$.
One can rewrite the log-normal model in terms of a new Gaussian variable $y$ by considering a local-type translation of

$$y = \frac{1}{\sigma_{\rm ln}}\left[\ln\left(1+\frac{\kappa}{|\kappa_{\rm min}|}\right)+\frac{\sigma^{2}_{\rm ln}}{2}\right]. \quad (10)$$

This new Gaussian field $y$ has zero mean and unit variance.
Using the relation of Eq. (10) and the PDF of Eq. (8), one expects that the two-point correlation of $y$ should be related to the counterpart of $\kappa$ as

$$\xi_{\kappa}(\theta) = \left\langle F(y_{1})\,F(y_{2})\right\rangle = \kappa^{2}_{\rm min}\left[\exp\left(\sigma^{2}_{\rm ln}\,\xi_{y}(\theta)\right)-1\right], \quad (11)$$

where $F$ is the inverse of Eq. (10), i.e. $F(u) = |\kappa_{\rm min}|\exp\left(u\sigma_{\rm ln}-\sigma^{2}_{\rm ln}/2\right)-|\kappa_{\rm min}|$. Note that the two-point correlation $\xi_{\kappa}(\theta)$ can be computed as the Fourier transform of the power spectrum $C$;

$$\xi_{\kappa}(\theta) = \int_{0}^{\infty}\frac{\ell\,\mathrm{d}\ell}{2\pi}\,C(\ell)\,J_{0}(\ell\theta). \quad (12)$$

Therefore, Eqs. (11) and (12) set $\xi_{y}$ for a given convergence power spectrum. In the end, the log-normal generative model is fully specified by two parameters: the minimum convergence $\kappa_{\rm min}$ and the convergence power spectrum $C$.
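The inverse transform $F$ of Eq. (10) can be sketched directly; the function name and the numerical values in the usage example are illustrative. By construction the output has zero mean, the target variance, and a hard lower bound at $\kappa_{\rm min}$.

```python
import numpy as np

def lognormal_from_gaussian(y, kappa_min, kappa_var):
    """Apply the inverse transform F of Eq. (10): map a zero-mean,
    unit-variance Gaussian field y to a log-normal convergence field with
    minimum value kappa_min (< 0) and variance kappa_var."""
    sigma_ln2 = np.log(1.0 + kappa_var / kappa_min ** 2)   # Eq. (9)
    sigma_ln = np.sqrt(sigma_ln2)
    return abs(kappa_min) * (np.exp(y * sigma_ln - 0.5 * sigma_ln2) - 1.0)

# usage: transform white Gaussian noise; kappa_min and kappa_var are toy values
rng = np.random.default_rng(0)
kappa = lognormal_from_gaussian(rng.normal(size=10 ** 6),
                                kappa_min=-0.05, kappa_var=3.0e-4)
```

In a full log-normal map generator, `y` would be a correlated Gaussian field with the two-point correlation $\xi_y$ set by Eqs. (11) and (12) rather than white noise.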

3.3. Cycle-Consistent Generative Adversarial Networks
The log-normal model partly accounts for non-linear gravitational effects on the lensing convergence, while its validity is still limited. For instance, the log-normal model cannot reproduce summary statistics sensitive to rare events, such as the skewness and kurtosis (Taruya et al. 2002). Also, multiple-point correlation functions in the log-normal model can be expressed as a function of the two-point counterpart (Hilbert et al. 2011), but this is not the case in numerical simulations of cosmic large-scale structures (Colavincenzo et al. 2019; Piras et al. 2022).
To improve the log-normal model, we here examine a machine-learning approach for image synthesis. The GAN is a generative model (Goodfellow et al. 2014) that contains two neural networks: a generator G and a discriminator D. Given the real dataset, G aims to create false data that look like the genuine data from the real sample, while D tries to discriminate between the false data and the genuine counterpart. The GAN can be effectively trained with the back-propagation algorithm and an appropriate network architecture even when limited training data are available (e.g. see Huang et al. 2018, for a review).
Among several models of the GAN, we adopt the Cycle-Consistent GAN (Cycle GAN) consisting of a pair of GANs (Zhu et al. 2017). The Cycle GAN is designed to learn a style transfer between two different image domains. A representative example is the translation between a landscape photograph and a Monet painting. Because there are no paired image sets between landscape photographs and Monet paintings in practice, one requires the Cycle GAN to train with unpaired image collections, consisting of a source set $\{x_i\}^{N}_{i=1}$ ($x_i \in$ landscape photos) and a target set $\{y_j\}^{M}_{j=1}$ ($y_j \in$ Monet paintings). In this setup, one is not able to use the information that $x_i$ in the input domain matches $y_j$ in the target domain for training of the Cycle GAN. In this paper, we train the Cycle GAN with the network architecture proposed in Zhu et al. (2017) by using unpaired image sets of the Gaussian convergence and the convergence produced in the lensing simulations of Liu et al. (2018). It is worth noting that the Gaussian convergence model is computationally efficient, while a massive production of lensing simulations requires high computational costs. This situation is similar to the translation between a landscape photograph and a Monet painting, i.e. it is easy to prepare a large set of landscape photographs with low costs, but it is really expensive (impossible, in fact) to increase the number of Monet paintings.
Details of the unpaired sets and the training process are described in the next section, while the network architecture of the Cycle GAN is provided in Appendix A. We use the PyTorch implementation of the Cycle GAN throughout this paper. Note that the network architecture is the same as that developed in Zhu et al. (2017). The architecture has been examined for various image-to-image translations in real-world examples.
We here briefly summarize how to train the Cycle GAN in an unsupervised fashion. Let $X$ and $Y$ be the image sets produced by the Gaussian model as in Section 3.1 and the set of the lensing convergence in the simulations, respectively. We refer to $G_X$ as the generator translating an input image of $x \in X$ to $y \in Y$, whereas $D_X$ judges if an input image is produced by $G_X$ or not. We also define the generator $G_Y$ and the discriminator $D_Y$ for the translation of $y \rightarrow x$ in a similar way. Our objective is then summarized as

$$G^{*}_{X}, G^{*}_{Y} = \arg\min_{G_X, G_Y}\,\max_{D_X, D_Y}\, L(G_X, G_Y, D_X, D_Y), \quad (14)$$

where $L(G_X, G_Y, D_X, D_Y)$ is the loss function for the Cycle GAN containing the four parts below;

$$L = L_{\rm GAN}(G_X, D_X) + L_{\rm GAN}(G_Y, D_Y) + \lambda_{1}\, L_{\rm cyc}(G_X, G_Y) + \lambda_{2}\, L_{\rm id}(G_X, G_Y). \quad (15)$$

The first two functions on the right-hand side of Eq. (15) represent the losses for the two GANs, and those are defined by

$$L_{\rm GAN}(G_X, D_X) = \mathbb{E}_{y}\left[\ln D_X(y)\right] + \mathbb{E}_{x}\left[\ln\left(1-D_X(G_X(x))\right)\right], \quad (16)$$

$$L_{\rm GAN}(G_Y, D_Y) = \mathbb{E}_{x}\left[\ln D_Y(x)\right] + \mathbb{E}_{y}\left[\ln\left(1-D_Y(G_Y(y))\right)\right], \quad (17)$$

where $\mathbb{E}_{x}$ represents the expected value over a sample of $x$ and so on. The term $L_{\rm cyc}$ in Eq. (15) regularizes the two generators $G_X$ and $G_Y$ so that the learned mapping between $X$ and $Y$ should be cycle-consistent. It is given by

$$L_{\rm cyc}(G_X, G_Y) = \mathbb{E}_{x}\left[\left|G_Y(G_X(x))-x\right|_{1}\right] + \mathbb{E}_{y}\left[\left|G_X(G_Y(y))-y\right|_{1}\right], \quad (18)$$

where $|\cdots|_{1}$ is the L1 norm. We further constrain the generators with the term $L_{\rm id}(G_X, G_Y)$ so that they can provide an identity mapping when real samples of the target domain are used as their input, i.e.

$$L_{\rm id}(G_X, G_Y) = \mathbb{E}_{y}\left[\left|G_X(y)-y\right|_{1}\right] + \mathbb{E}_{x}\left[\left|G_Y(x)-x\right|_{1}\right]. \quad (19)$$
We have two hyperparameters $\lambda_1$ and $\lambda_2$ in the loss and set $\lambda_1 = 10$ and $\lambda_2 = 5$ throughout this paper. In our setup, the regularization terms are typically ten times smaller than the other GAN losses. Because the relative differences between the GAN losses and the regularization terms are important, we apply $\lambda_1$ and $\lambda_2$ to the $L_{\rm cyc}$ and $L_{\rm id}$ terms alone.
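The four-part loss can be sketched schematically as follows. This is a NumPy illustration, not the paper's PyTorch training code; the adversarial terms are written in the least-squares form that the paper adopts for training (Section 4.3), evaluated from the discriminators' side, whereas in practice generators and discriminators alternately minimize different pieces of the objective.

```python
import numpy as np

def cycle_gan_loss(G_X, G_Y, D_X, D_Y, x_batch, y_batch, lam1=10.0, lam2=5.0):
    """Evaluate the four-part Cycle GAN objective for one batch.
    G_X: X -> Y and G_Y: Y -> X are generators; D_X and D_Y return scores
    in [0, 1] for "drawn from the target domain of G_X (resp. G_Y)"."""
    # adversarial terms (least-squares form, discriminators' side)
    loss_gan_x = np.mean((D_X(y_batch) - 1.0) ** 2) + np.mean(D_X(G_X(x_batch)) ** 2)
    loss_gan_y = np.mean((D_Y(x_batch) - 1.0) ** 2) + np.mean(D_Y(G_Y(y_batch)) ** 2)
    # cycle consistency: x -> y -> x and y -> x -> y should return the input
    loss_cyc = (np.mean(np.abs(G_Y(G_X(x_batch)) - x_batch))
                + np.mean(np.abs(G_X(G_Y(y_batch)) - y_batch)))
    # identity: each generator should fix samples of its own target domain
    loss_id = (np.mean(np.abs(G_X(y_batch) - y_batch))
               + np.mean(np.abs(G_Y(x_batch) - x_batch)))
    return loss_gan_x + loss_gan_y + lam1 * loss_cyc + lam2 * loss_id

# toy check: identity generators plus discriminators that perfectly separate
# the two domains (zeros = X, ones = Y) give zero total loss
x_batch, y_batch = np.zeros(4), np.ones(4)
d_x = lambda a: (a > 0.5).astype(float)
d_y = lambda a: (a < 0.5).astype(float)
total = cycle_gan_loss(lambda a: a, lambda a: a, d_x, d_y, x_batch, y_batch)
```

The toy check illustrates the fixed point of the objective: generators that already map each domain onto the other's statistics incur no cycle or identity penalty.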

4.1. High-cost Lensing Simulations
To train the Cycle GAN, we use the publicly available numerical simulations of weak gravitational lensing performed in Liu et al. (2018). The simulation suite is referred to as MassiveNuS (Cosmological Massive Neutrino Simulations), aiming at detailed studies of cosmic large-scale structures in the presence of massive neutrinos. MassiveNuS consists of 101 models in flat-ΛCDM cosmology. In this paper, we adopt the simplest flat-ΛCDM model including massless neutrinos with the cosmological parameters below: total mass density $\Omega_{\rm m0} = 0.3$, baryon density $\Omega_{\rm b0} = 0.046$, cosmological constant $\Omega_{\Lambda} = 1-\Omega_{\rm m0} = 0.7$, present-day Hubble parameter $H_0 = 100h = 70.0\,\mathrm{km/s/Mpc}$, primordial scalar spectrum power-law index $n_s = 0.97$, and primordial curvature power spectrum amplitude at the pivot scale $k = 0.05\,\mathrm{Mpc}^{-1}$ of $A_s = 2.1\times10^{-9}$. The authors of Liu et al. (2018) ran N-body simulations with the public tree-Particle Mesh code Gadget-2 (Springel 2005), adopting a box size of $512\,h^{-1}\,\mathrm{Mpc}$ and $1024^3$ particles. The Zel'dovich approximation (Zeldovich 1970) was adopted when they generated the initial conditions at $z = 99$ for the N-body simulations. From the N-body simulation snapshots, they then produced 10,000 convergence maps at five different source redshifts of $z_{\rm source} = 0.5$, 1.0, 1.5, 2.0, and 2.5 by using the public ray-tracing code LensTools (Petri 2016). Each map covers a sky of $3.5^2\,\mathrm{deg}^2$ with 512 pixels on a side. In this paper, we degrade the convergence map at $z_{\rm source} = 1.0$ from the original resolution of $512\times512$ to $256\times256$, as it is used for the input of the generators in our GAN.

4.2. Low-cost Gaussian Simulations
As another training dataset for the Cycle GAN, we produce 10,000 convergence maps assuming the Gaussian model as described in Section 3.1. In the Gaussian model, we set the convergence power spectrum $C$ so that it agrees with the average power spectrum in the lensing simulations. To be specific, we assume a functional form of the power spectrum, Eq. (21), which combines a reference model $C_{\rm ref}(\ell)$ with three free parameters $\ell_0$, $\alpha$, and $\beta$ that account for resolution effects in the lensing simulations. Using Eq. (3) with the Limber approximation (Limber 1954), we compute the reference model $C_{\rm ref}$ as

$$C_{\rm ref}(\ell) = \frac{9}{4}\left(\frac{H_{0}}{c}\right)^{4}\Omega^{2}_{\rm m0}\int_{0}^{\chi_{1}}\mathrm{d}\chi\,\frac{W(\chi)^{2}}{a(\chi)^{2}}\,P_{\delta}\!\left(k=\frac{\ell}{r(\chi)}, z(\chi)\right),$$

where $P_{\delta}(k, z)$ is the three-dimensional matter power spectrum at redshift $z$, and it is computed with the fitting formula developed in Takahashi et al. (2012). We constrain the free parameters $\ell_0$, $\alpha$, and $\beta$ by least-squares fitting to the power spectrum in the lensing simulations. When estimating the convergence power spectrum from the simulations, we first apply the fast Fourier transform to the convergence field and then measure the binned power spectrum by averaging the product of Fourier modes $|\tilde{\kappa}_{\boldsymbol{\ell}}|^{2}$. We employ 15 bins logarithmically spaced in the range of $\ell = 100$ to $\ell = 10^{4}$. In the end, we find that Eq. (21) can explain the average power spectrum over 10,000 simulations at $z_{\rm source} = 1$ within a 5%-level accuracy when setting $\ell_{0} = 1.360\times10^{4}$, $\alpha = 1.643$, and $\beta = 4.088$. This pre-training of the Gaussian model allows the Cycle GAN to learn non-Gaussian information about the lensing simulations in an efficient way. Also, it is important to prepare training sets with reasonable similarities when one trains the GAN for an image-to-image translation (Isola et al. 2017; Zhu et al. 2017; Shirasaki et al. 2019b).
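The binned estimator described above (an FFT of the map, followed by averaging $|\tilde{\kappa}_{\boldsymbol{\ell}}|^2$ in 15 logarithmic bins between $\ell = 100$ and $10^4$) can be sketched as follows, assuming the same flat-sky normalization as in Section 3.1; the function name is illustrative.

```python
import numpy as np

def binned_power_spectrum(kappa, pix_size_rad, n_bins=15, l_min=100.0, l_max=1.0e4):
    """Measure the binned convergence power spectrum of a square map:
    FFT the map, turn |kappa_l|^2 into a per-mode estimate of C(l), and
    average it in logarithmically spaced multipole bins."""
    n_pix = kappa.shape[0]
    area = (n_pix * pix_size_rad) ** 2
    # continuum-normalized Fourier transform: kappa_l ~ (pixel area) * FFT
    power = np.abs(np.fft.fft2(kappa) * pix_size_rad ** 2) ** 2 / area
    lx = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix_size_rad)
    ell = np.sqrt(lx[:, None] ** 2 + lx[None, :] ** 2)
    edges = np.logspace(np.log10(l_min), np.log10(l_max), n_bins + 1)
    cl = np.full(n_bins, np.nan)
    for i in range(n_bins):
        sel = (ell >= edges[i]) & (ell < edges[i + 1])
        if sel.any():
            cl[i] = power[sel].mean()
    return np.sqrt(edges[:-1] * edges[1:]), cl

# usage: white noise of unit pixel variance has a flat spectrum C = (pixel area)
pix = 0.8 / 60.0 * np.pi / 180.0
centers, cl = binned_power_spectrum(np.random.default_rng(2).normal(size=(256, 256)), pix)
```

The white-noise usage case provides a quick sanity check of the normalization, since its flat spectrum is known analytically.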

4.3. Training Cycle GAN
For training the Cycle GAN, we follow the procedure adopted in Zhu et al. (2017). To make the training stable, we replace the negative log-likelihood objectives in Eqs. (16) and (17) with least-squares loss functions (Mao et al. 2017). In practice, we train $G_X$ to minimize $\mathbb{E}_{x}\left[\left(D_X(G_X(x))-1\right)^{2}\right]$ and $D_X$ to minimize $\mathbb{E}_{y}\left[\left(D_X(y)-1\right)^{2}\right]+\mathbb{E}_{x}\left[D_X(G_X(x))^{2}\right]$ instead of Eq. (16) at every batch. A similar replacement is applied to Eq. (17). Furthermore, we update the discriminators using a history of generated images rather than the ones produced by the latest generators. We keep an image buffer that stores the 50 previously created images. This strategy is effective in reducing a gap between synthetic and real image distributions (Shrivastava et al. 2017). We use the Adam optimizer (Kingma and Ba 2015) with a batch size of 1 to update the parameters in the Cycle GAN. The batch size of 1 is known to provide better results when the generator has a U-net architecture as we adopted (Isola et al. 2017). All networks are trained with a learning rate of 0.0002 for the first 100 epochs, while we linearly decay the rate to zero over the next 100 epochs. Weights are initialized from a Gaussian distribution with zero mean and standard deviation of 0.02.
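The 50-image history buffer can be sketched as follows, following the strategy of Shrivastava et al. (2017); the 50% swap probability is a detail of common Cycle GAN implementations and an assumption here, not stated in the text.

```python
import random

class ImageBuffer:
    """History buffer of previously generated images. With probability 1/2
    the discriminator sees a stored past image instead of the newest fake,
    and the newest fake is then swapped into the buffer in its place."""

    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        # fill the buffer first, returning images unchanged
        if len(self.images) < self.size:
            self.images.append(image)
            return image
        # either return a stored image (swapping in the new one) or the new one
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image
```

Feeding the discriminator a mixture of current and past fakes keeps it from over-fitting to the most recent generator state.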
When using the training datasets without any annotations, we find that convergence fields produced by the generator $G_X$ can reproduce some summary statistics observed in the lensing simulations on average, whereas $G_X$ fails to predict the variance as in the lensing simulations. To improve the performance of the generator, we divide the training sets into $N$ groups by using a label that is relevant to variations in the convergence fields. Among several candidates, we decided to use the field variance $\langle\kappa^{2}\rangle$ to measure the variations. This choice is physically motivated, as the scatter in the lensing power spectrum $C$ at large $\ell$ is mostly determined by four-point correlations with squeezed quadrilaterals including a shared infinite-wavelength mode (Takada and Hu 2013). According to a halo-model prescription (Cooray and Sheth 2002), the large-$\ell$ variance in $C$ depends on the variance of matter density in a survey window, sharing some information with the field variance $\langle\kappa^{2}\rangle$. Hence, we expect that the field variance $\langle\kappa^{2}\rangle$ provides a good measure of the variation in lensing convergence fields. An example of labeling the training sets is shown in Figure 1. In the figure, we divide the training sets into $N = 10$ groups by their field variances. We then train the Cycle GAN by using the lensing simulations and the Gaussian data with the same label $i$. In the end, we will have $N$ different generators to produce non-Gaussian convergence fields from given Gaussian fields. When producing a new non-Gaussian field, we select which generator will be used by the field variance of an input Gaussian field. Throughout this paper, we set $N = 10$ for our fiducial choice.
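The variance-based labeling can be sketched as follows; equal-count quantile bins are one possible scheme and an assumption here (the paper's actual grouping is shown in its Figure 1).

```python
import numpy as np

def assign_variance_labels(maps, n_groups=10):
    """Label each map by which of n_groups equal-count bins its field
    variance <kappa^2> falls into (quantile-based bin edges)."""
    variances = np.array([np.var(m) for m in maps])
    edges = np.quantile(variances, np.linspace(0.0, 1.0, n_groups + 1))
    labels = np.searchsorted(edges, variances, side="right") - 1
    return np.clip(labels, 0, n_groups - 1), edges

# usage: 100 toy maps with slowly increasing fluctuation amplitude
rng = np.random.default_rng(3)
maps = [rng.normal(scale=1.0 + 0.1 * i, size=(16, 16)) for i in range(100)]
labels, edges = assign_variance_labels(maps, n_groups=10)
```

At generation time, the same `edges` would route an input Gaussian field, via its own variance, to the generator trained on the matching label.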
The $N$ dependence of our results is examined in Section 5.2. As a by-product, the Cycle GAN provides a generator to translate non-Gaussian convergence fields into Gaussian counterparts. Although this translation is not our primary purpose, it is interesting to see if the Cycle GAN can be used for a reasonable Gaussianization of weak lensing mass maps. Note that appropriate Gaussianization enhances the cosmological constraining power of weak lensing fields (Yu et al. 2011; Seo et al. 2011; Seo et al. 2012; Simpson et al. 2016). Appendix B summarizes the results of the translation from the non-Gaussian to the Gaussian mass maps. A simple training without any labels can Gaussianize the non-Gaussian weak lensing mass maps obtained by the ray-tracing simulation with a reasonable precision: the Gaussianized power spectrum can reproduce the true Gaussian counterpart at the 2% level at $\ell \gtrsim 400$, while the Gaussianized one-point PDF is well described by the expected Gaussian distribution. Also, the average bispectrum over 10,000 Gaussianized lensing mass maps is consistent with a null detection within the statistical uncertainty for a sky coverage of $3.5^2\,\mathrm{deg}^2$. We also find that field-variance-based labels do not help to improve the performance of the translation when the target domain is set to the Gaussian model.

Summary Statistics
Because our neural-network-based model is trained in an unsupervised fashion, it is non-trivial to assess the performance of our model. In fact, the loss for GANs does not provide any measure for model validation. In this paper, we validate the model by studying various summary statistics of generated (fake) convergence fields and comparing them with their counterparts in the ground-truth lensing simulations. Because of the non-Gaussian nature of the lensing convergence, several summary statistics have been studied in the literature. Apart from the standard power spectrum, we here consider five non-Gaussian statistics: the bispectrum, peak counts, one-point probability distribution function (PDF), Minkowski functionals (MFs), and scattering transform coefficients. Characteristics of each non-Gaussian statistic are briefly summarized below. Table 1 provides all the summary statistics of interest in this paper with a short description.

BISPECTRUM
The bispectrum is the three-point correlation in Fourier space and is defined as

$$\left\langle \tilde{\kappa}_{\boldsymbol{\ell}_{1}}\,\tilde{\kappa}_{\boldsymbol{\ell}_{2}}\,\tilde{\kappa}_{\boldsymbol{\ell}_{3}}\right\rangle = (2\pi)^{2}\,\delta^{(2)}_{\rm D}(\boldsymbol{\ell}_{1}+\boldsymbol{\ell}_{2}+\boldsymbol{\ell}_{3})\,B(\ell_{1},\ell_{2},\ell_{3}),$$

where $B$ represents the bispectrum. Because it holds that $B = 0$ at any Fourier-mode configuration for the Gaussian convergence model, the bispectrum provides a powerful means of studying the lowest-order non-Gaussian information in the convergence field (Hui 1999; Cooray and Hu 2001; Benabed and Bernardeau 2001; Takada and Jain 2004; Refregier et al. 2004; Dodelson and Zhang 2005; Coulton et al. 2019). As in the case of power spectra, one can relate the bispectrum to the three-dimensional matter bispectrum $B_{\delta}$;

$$B(\ell_{1},\ell_{2},\ell_{3}) = \frac{27}{8}\left(\frac{H_{0}}{c}\right)^{6}\Omega^{3}_{\rm m0}\int_{0}^{\chi_{1}}\mathrm{d}\chi\,\frac{W(\chi)^{3}}{a(\chi)^{3}\,r(\chi)}\,B_{\delta}\left(k_{1},k_{2},k_{3};z(\chi)\right),$$

where we evaluate the wavenumbers in the three-dimensional space as $k_{i} = \ell_{i}/r(\chi)$. Theoretical and numerical studies have predicted that the bispectrum can add a 20-50 percent-level gain to the signal-to-noise ratio of the power spectrum up to a maximum multipole of a few thousand (Kayo et al. 2013; Sato and Nishimichi 2013).
To estimate the bispectrum from given pixelized data of the convergence, we measure an average of the product of three Fourier modes $\mathrm{Re}[\tilde{\kappa}(\boldsymbol{\ell}_{1})\tilde{\kappa}(\boldsymbol{\ell}_{2})\tilde{\kappa}(\boldsymbol{\ell}_{3})]$ over triangle configurations of $\boldsymbol{\ell}_{1}+\boldsymbol{\ell}_{2}+\boldsymbol{\ell}_{3}=0$, where $\mathrm{Re}[\cdots]$ stands for the real part of a complex number. Throughout this paper, we adopt 15 bins logarithmically spaced in the range of $\ell_{i}\,(i=1,2,3) = 100$ to $10^{4}$ for each of the three multipoles.
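A sketch of an FFT-based estimator for the simplest bin triplets, in which all three multipoles fall in the same bin; the general estimator described above runs over all triangle configurations, which amounts to filtering the map into three different annuli before taking the real-space product. The function name is illustrative.

```python
import numpy as np

def equilateral_bispectrum(kappa, pix_size_rad, bin_edges):
    """Estimate the binned bispectrum for (b, b, b) bin triplets: restrict
    the map to an annulus in ell, cube the filtered field in real space,
    and normalize by the triangle count computed with unit weights."""
    n_pix = kappa.shape[0]
    area = (n_pix * pix_size_rad) ** 2
    lx = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix_size_rad)
    ell = np.sqrt(lx[:, None] ** 2 + lx[None, :] ** 2)
    kappa_l = np.fft.fft2(kappa)
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        band = ((ell >= lo) & (ell < hi)).astype(float)
        f = np.fft.ifft2(kappa_l * band).real   # modes of the annulus only
        g = np.fft.ifft2(band).real             # unit-weight counterpart
        # mean(f^3) sums kappa products over closed triangles; mean(g^3)
        # counts those triangles; the prefactor restores continuum units
        results.append((f ** 3).mean() / (g ** 3).mean()
                       * pix_size_rad ** 6 / area)
    return np.array(results)

# usage on a toy 64-pixel map covering 3.5 deg on a side
pix = 3.5 / 64.0 * np.pi / 180.0
b_zero = equilateral_bispectrum(np.zeros((64, 64)), pix, [300.0, 600.0, 1200.0])
```

For bins too narrow to close any triangle, the unit-weight count vanishes and the estimator is undefined; practical binnings avoid that regime.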

PEAK COUNTS
The number count of local maxima in a convergence map, referred to as peak counts, has become a popular summary statistic to extract non-Gaussian cosmological information beyond the standard power-spectrum-based analysis in practice (Liu et al. 2015; Hamana et al. 2015). Peaks with high significance are mainly caused by single massive dark matter halos along the line of sight (Hamana et al. 2004; Hennawi and Spergel 2005; Maturi et al. 2005; Marian et al. 2012; Hamana et al. 2012), whereas peaks with modest heights can be created by superposition of large-scale structures and multiple dark matter halos (Yang et al. 2011; Liu and Haiman 2016; Sabyr et al. 2022).
In this paper, we define a peak as a pixel that has a higher value than all of its eight neighboring pixels in the discretized convergence field. Note that we do not apply any additional smoothing to count peaks. We consider 41 bins in the range of $-0.05 \le \kappa \le 0.15$. Since a typical field variance of the convergence field in our simulations is found to be $3\times10^{-4}$ (see Figure 1), the largest peak heights in our analysis correspond to a $\sim 9\sigma$ significance level.
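The peak definition above (a pixel strictly exceeding all eight neighbors, with no extra smoothing) can be implemented directly; the usage example with a toy Gaussian map is illustrative.

```python
import numpy as np

def find_peaks(kappa):
    """Return heights of pixels strictly higher than all eight neighbors.
    Border pixels are excluded so that every candidate has eight neighbors."""
    n_y, n_x = kappa.shape
    center = kappa[1:-1, 1:-1]
    is_peak = np.ones(center.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = kappa[1 + dy:n_y - 1 + dy, 1 + dx:n_x - 1 + dx]
            is_peak &= center > neighbor
    return center[is_peak]

# usage: histogram the peak heights in 41 bins over -0.05 <= kappa <= 0.15
kappa = np.random.default_rng(0).normal(scale=0.017, size=(256, 256))
peaks = find_peaks(kappa)
counts, edges = np.histogram(peaks, bins=41, range=(-0.05, 0.15))
```

Because the comparison is strict, flat plateaus produce no peaks, which matches the "higher than all neighbors" definition.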

ONE-POINT PROBABILITY DISTRIBUTION FUNCTION
The one-point PDF of the convergence contains vital information about the non-Gaussian matter density field (Bernardeau and Valageas 2000; Liu and Madhavacheril 2019; Barthelemy et al. 2021; Thiele et al. 2020; Boyle et al. 2021; Giblin et al. 2023). It encodes the information originating from moments of all orders in the convergence, allowing us to study the non-Gaussian nature of the weak lensing field at a given length scale. As a one-point statistic, the PDF can be easily obtained from simulated and real data even in the presence of complex survey boundaries (Friedrich et al. 2018; Gruen et al. 2018). In this paper, we measure the PDF from a histogram using 41 linearly spaced bins in the range of $-0.05 \le \kappa \le 0.15$.

MINKOWSKI FUNCTIONALS
MFs are a useful statistic for studying morphological information encoded in smoothed random fields. For two-dimensional maps such as the lensing convergence, the three MFs V_0, V_1, and V_2 are given by the area in which the convergence lies above a threshold κ_thre, the total boundary length of that region, and the integral of the geodesic curvature along its contours, respectively. In particular, V_2 is equal to the number of connected regions above the threshold minus the number below it. Therefore, as the threshold increases, V_2 provides almost the same information as the peak counts.
MFs can be evaluated by a perturbative approach when the field of interest exhibits a weak non-Gaussianity (Matsubara 2003a; Munshi et al. 2012; Matsubara and Kuriki 2021), but it remains difficult to predict MFs of highly non-Gaussian fields with analytic approaches (Petri et al. 2013). Simulation-based predictions of the convergence MFs have been constructed so far, demonstrating that the MFs can carry cosmological information beyond the standard power spectrum analysis (Kratochvil et al. 2012; Shirasaki and Yoshida 2014; Petri et al. 2013). Note that V_0 provides a cumulative PDF of the lensing convergence, sharing some information with the aforementioned one-point PDF. Hence, we focus on the two MFs V_1 and V_2 in this paper. We estimate the MFs from a pixelized map by following the method in Lim and Simon (2012); Kratochvil et al. (2012). We compute MFs for 41 equally spaced bins of κ_thre between −0.05 and 0.15.
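A grid-based estimate of V_1 and V_2 in the spirit of Lim and Simon (2012) can be sketched as follows. The threshold delta function is approximated by binning pixels by their κ value; the function name, the finite-difference derivatives, and the bin width handling are our own simplifications, not the paper's exact implementation:

```python
import numpy as np

def minkowski_v1_v2(kappa, thresholds, pix=1.0):
    """Estimate V1 (boundary length) and V2 (Euler characteristic) per threshold.

    The delta function over thresholds is replaced by binning pixels by
    their kappa value; derivatives come from finite differences.
    """
    kx, ky = np.gradient(kappa, pix)          # first derivatives
    kxx = np.gradient(kx, pix)[0]             # second derivatives
    kyy = np.gradient(ky, pix)[1]
    kxy = np.gradient(kx, pix)[1]
    grad = np.hypot(kx, ky)
    grad2 = kx**2 + ky**2
    curv = np.where(grad2 > 0,
                    (2 * kx * ky * kxy - kx**2 * kyy - ky**2 * kxx)
                    / np.where(grad2 > 0, grad2, 1.0), 0.0)
    dnu = thresholds[1] - thresholds[0]
    area = kappa.size * pix**2
    v1 = np.empty(len(thresholds))
    v2 = np.empty(len(thresholds))
    for i, nu in enumerate(thresholds):
        in_bin = np.abs(kappa - nu) < dnu / 2  # binned delta function
        v1[i] = np.sum(grad[in_bin]) * pix**2 / (4 * area * dnu)
        v2[i] = np.sum(curv[in_bin]) * pix**2 / (2 * np.pi * area * dnu)
    return v1, v2
```

With the paper's binning, `thresholds` would be 41 equally spaced values between −0.05 and 0.15.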

SCATTERING TRANSFORM COEFFICIENTS
The scattering transform (ST) provides a powerful means of extracting information from high-dimensional data, while having mathematically tractable properties of translational invariance, non-expanding variance, and Lipschitz continuity with respect to spatial deformations (Mallat 2012). The ST generates a set of new fields from a given input field by applying a wavelet convolution and a modulus recursively, and uses the expected values of the new fields as a collection of statistical information encoded in the original input field. Hence, the ST performs a similar operation to what a common convolutional neural network (CNN) does in practice, i.e. either relies on the use of localized convolution kernels and a non-expansive non-linear operator. Cheng et al. (2020) has shown that the ST statistic can provide cosmological constraints as tight as the CNN does (Ribli et al. 2019) for given weak lensing mass maps. It is worth noting that CNN-based cosmological inference requires a costly training process with simulations, whereas the ST statistic simply needs to emulate its expected values as a function of cosmology (just as other summary statistics do). Considering these points, it is interesting to see if fake convergence fields generated by our model can reproduce the ST statistic as in the true lensing simulations.
In this paper, we measure the ST statistic by following the method in Cheng et al. (2020). For a given lensing convergence field κ, we first generate a set of first-order fields as I_1 ≡ |κ ⋆ Ψ_{j1,l1}|, where κ ⋆ Ψ_{j,l} represents a convolution with the wavelet Ψ_{j,l} with size index j and orientation index l, and |·| stands for the modulus of a complex-valued field. The second-order fields are then defined as I_2 ≡ |I_1 ⋆ Ψ_{j2,l2}| = ||κ ⋆ Ψ_{j1,l1}| ⋆ Ψ_{j2,l2}|. In this paper, we consider the ST statistic up to second order, which can be interpreted as a compact set of spatial information about two- and four-point correlation functions (Cheng et al. 2020). See also Cheng and Ménard (2021) for a more intuitive understanding of the ST statistic.
From the set of κ, I_1, and I_2, we define the ST statistic as S_0 ≡ ⟨κ⟩, S_1(j_1, l_1) ≡ ⟨I_1⟩, and S_2(j_1, l_1, j_2, l_2) ≡ ⟨I_2⟩, where ⟨·⟩ represents the spatial average over a survey window. We refer to S_0, S_1, and S_2 as the ST coefficients. In this paper, we work with Morlet wavelets with L = 4 different rotations.
One can construct a family of Morlet wavelets by dilating a prototype wavelet by factors of 2^j and rotating it by l × 45 degrees. The prototype Morlet wavelet is defined as Ψ(x) = exp(−|x|²/(2σ²)) [exp(i k_0 · x) − β], where σ = 0.8 pixel, k_0 = 3π/4 pixel^−1, and β is a constant chosen so that the wavelet has zero mean. As proposed in Cheng et al. (2020), we further reduce the ST coefficients by averaging over the rotation indices l because of the isotropy of the lensing convergence. The relevant ST coefficients up to second order are then given by s_1(j_1) ≡ ⟨S_1(j_1, l_1)⟩_{l_1} and s_2(j_1, j_2) ≡ ⟨S_2(j_1, l_1, j_2, l_2)⟩_{l_1, l_2} with j_1 < j_2. Throughout this paper, we consider Morlet wavelets with J = 7 dyadic scales. In this case, we have 7 and 21 independent elements in s_1 and s_2, respectively. We measure the ST coefficients s_1 and s_2 from a given convergence field with a publicly available code developed by S. Cheng^5 (https://github.com/SihaoCheng/scattering_transform).
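To make the first-order coefficients concrete, a minimal sketch of s_1(j) with Morlet wavelets and FFT convolutions follows. This is our own illustration, not the public code referenced above; the wavelet normalization and grid conventions are simplified assumptions:

```python
import numpy as np

def morlet(n, j, l, sigma=0.8, k0=3 * np.pi / 4):
    """Real-space 2D Morlet wavelet on an n x n grid, dilated by 2**j and
    rotated by l * 45 degrees. beta enforces the zero-mean condition."""
    s, k = sigma * 2**j, k0 / 2**j
    theta = l * np.pi / 4
    y, x = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2, indexing="ij")
    xr = x * np.cos(theta) + y * np.sin(theta)
    beta = np.exp(-(s * k) ** 2 / 2)
    env = np.exp(-(x**2 + y**2) / (2 * s**2))
    return env * (np.exp(1j * k * xr) - beta) / (2 * np.pi * s**2)

def s1_coefficients(kappa, J=7, L=4):
    """First-order scattering coefficients s1(j) = <|kappa * psi_{j,l}|>,
    averaged over the L orientations (isotropy assumption)."""
    n = kappa.shape[0]
    kf = np.fft.fft2(kappa)
    s1 = np.zeros(J)
    for j in range(J):
        for l in range(L):
            psi_f = np.fft.fft2(np.fft.ifftshift(morlet(n, j, l)))
            conv = np.fft.ifft2(kf * psi_f)   # kappa convolved with the wavelet
            s1[j] += np.mean(np.abs(conv)) / L
    return s1
```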

Visual impression
We first see what the convergence fields generated by our GAN-based model look like. Figure 2 shows an example of the fake convergence fields as well as an input Gaussian convergence and a popular log-normal model. For the log-normal model, we assume the power spectrum of Eq. (20) so that the model can reproduce the average power spectrum in the lensing simulations. We also set the minimum convergence to κ_min = −0.0367 for the log-normal model, which is equal to the minimum value over 10,000 lensing simulations.
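For reference, a shifted log-normal map with the quoted minimum κ_min can be generated from a Gaussian map as sketched below. This is a simplified illustration (function name ours): matching the target power spectrum of Eq. (20) would additionally require generating the Gaussian field from the log-transformed correlation function, a step omitted here:

```python
import numpy as np

def lognormal_map(gauss, kappa_min):
    """Transform a zero-mean Gaussian map into a shifted log-normal map.

    The output has mean ~0 and a hard lower bound kappa_min (the 'shift').
    Matching a target power spectrum would additionally require drawing
    `gauss` from the log-transformed correlation xi_G = ln(1 + xi/kappa_min**2),
    which is omitted in this sketch.
    """
    kappa0 = -kappa_min          # shift parameter, > 0
    var = np.var(gauss)
    # E[exp(g - var/2)] = 1 for Gaussian g, so the mean is ~0 by construction.
    return kappa0 * (np.exp(gauss - var / 2.0) - 1.0)
```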
The right panel in the figure displays the convergence field produced by our generators with the input being the Gaussian field shown on the left. Our generator adds prominent features at positive convergence regions, mimicking highly non-linear gravitational effects in dark matter haloes found in N-body simulations. Similar effects can be seen in the log-normal model as shown in the middle, but we find that our generators tend to make convergence values higher than the log-normal counterpart. This is necessary to make a long tail in the one-point PDF of the fake convergence fields, as predicted by previous numerical simulations (Das and Ostriker 2006; Wang et al. 2009; Takahashi et al. 2011; Shirasaki 2017). It is worth noting that our model preserves the large-scale phase information encoded in the input Gaussian model. Hence, the model enables us to produce a set of lensing convergence fields in an intuitive way for cosmological analyses.

Power spectrum
We then investigate the most basic summary statistic, the convergence power spectrum C(ℓ), generated by our neural-network-based model. Figure 3 summarizes the statistical nature of C(ℓ) predicted by our model and compares it with the counterparts of the true lensing simulations. In the figure, we look into three properties: the ensemble average, the variance, and a cumulative signal-to-noise ratio defined as

(S/N)² = Σ_{ij} C(ℓ_i) [Cov^−1](ℓ_i, ℓ_j) C(ℓ_j),  (30)

where C(ℓ_i) is the power spectrum at the i-th multipole bin ℓ_i, Cov(ℓ_i, ℓ_j) is the covariance between C(ℓ_i) and C(ℓ_j), and the sum in Eq. (30) runs over all indices with ℓ_i ≤ ℓ_max and ℓ_j ≤ ℓ_max.
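The cumulative S/N of Eq. (30) is a short inverse-covariance contraction over the bins below ℓ_max; a minimal sketch (function name ours):

```python
import numpy as np

def cumulative_sn(cl, cov, ells, ell_max):
    """Cumulative S/N: sqrt of sum_ij C(l_i) [Cov^{-1}]_ij C(l_j) over bins
    with l_i, l_j <= ell_max, inverting only the selected sub-covariance."""
    sel = ells <= ell_max
    c = cl[sel]
    icov = np.linalg.inv(cov[np.ix_(sel, sel)])
    return np.sqrt(c @ icov @ c)
```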
We find that the convergence power spectrum predicted by our model agrees with the counterparts in the true lensing simulations in many aspects. On the ensemble average, the prediction by our generators provides an accurate fit to the simulation counterpart, as shown in the left panels of Figure 3. Apart from the largest angular scales of ℓ ≲ 200 (corresponding to ≳ 1 degree), our generators can reproduce the information about two-point correlation functions in the lensing simulations within a 1%-level accuracy. At ℓ ≲ 200, we find 10%-level differences between the power spectra of the ground truth and the expected halofit predictions. These differences survive the image-to-image translation by our GAN, because the GAN's outputs retain the same large-scale modes as the input GRFs. In fact, we observe that the cross-correlation coefficients of the power spectra between the input Gaussian fields and the GAN's outputs are consistent with unity at ℓ ≲ 200. We speculate that the differences in the large-scale power may be caused by the finite sky coverage of the true ray-tracing simulations.
The variance in C(ℓ) by our generators is also consistent with the simulation counterpart over a wide range of multipoles. We observe that our generators can enhance the variance at smaller angular scales as predicted by the ground truth. Note that this enhancement arises from non-linear effects in the gravitational growth of cosmic mass density (Takada and Hu 2013), sourcing the non-Gaussianity in the lensing convergence. Similar non-Gaussian contributions are also prominent in off-diagonal elements of the covariance matrix, reducing the cumulative S/N of C(ℓ) as small-scale information is added (Sato et al. 2009). The right panels in Figure 3 demonstrate that the power spectra produced by our generators preserve almost the same information contents as in the true simulation. The degradation in the S/N with respect to the Gaussian expectation can be explained by our model as well.

Fig. 3.— Comparisons of convergence power spectra among several generative models. The left panel shows the average power spectra, while the middle presents the variance at a given multipole ℓ. In the right panel, we compare the cumulative signal-to-noise ratio (S/N) defined by Eq. (30). In each panel, the blue circles represent the results of the true lensing simulations, whereas the black solid and orange dashed lines stand for our GAN-based and the Gaussian models, respectively. At the bottom, we also show the fractional difference between the GAN predictions and the simulation counterparts, highlighting that our generators can reproduce the average, the variance, and the cumulative S/N at 300 ≲ ℓ ≲ 2000 within a 10%-level accuracy. Note that all the results in this figure are based on 10,000 realizations of the simulations, the fake convergence fields generated by our networks, and the Gaussian models at a source redshift of z_source = 1. Each realization covers a sky of 3.5² deg².

Fig. 4.— A close look at covariance matrices of convergence power spectra among several models. Each panel summarizes the covariance at two different scales ℓ and ℓ′, normalized to unity at the diagonal elements. The blue circles in each panel represent the simulation results, while the black solid lines stand for the predictions by our generative model. For comparison, the orange dashed lines in each panel show the covariance matrix estimated by 10,000 realizations of the log-normal model. This figure validates that the covariance matrix by our generators provides a reasonable fit to the simulation results, making our model valuable for standard cosmological analyses based on two-point correlation functions.
It is beneficial to compare the covariance matrix of C(ℓ) from our model with the simulation counterpart in a direct manner, because the covariance plays a central role in determining the constraining power of C(ℓ) for a given survey setup. Figure 4 summarizes the covariance matrix predicted by our generators as well as the simulation counterparts. Each panel represents the covariance Cov(ℓ, ℓ′) as a function of ℓ at a fixed ℓ′. Note that we normalize the covariance so that its diagonal elements are unity. For comparison, the predictions of the popular log-normal model are shown as orange dashed lines in the figure. We find that the log-normal model can account for some of the non-Gaussian covariance, but its validity is limited. The log-normal model introduces a non-zero covariance at the smallest scales of ℓ ∼ 10^4, but it cannot explain the cross-correlations between two ℓ bins when the difference in ℓ becomes large. Our neural-network-based model can account for significant covariance over a wide range of ℓ, as seen in the true simulation data. Note that the significant non-Gaussian covariance in the true lensing simulations can be mostly explained by statistical fluctuations in the number of dark matter halos sampled by a finite survey area, referred to as halo sample variance (HSV) (Sato et al. 2009; Takada and Hu 2013). Our model can efficiently learn the HSV effects.
To get some sense of the learning behavior of our model, we examine to what extent the model performance depends on the choice of the number of labels in the training datasets. It is worth noting that we label the training datasets by the field variance of the lensing convergence fields, as in Figure 1. This labeling ensures that the convergence fields generated by our generators have a similar field variance to the true counterparts. Because the HSV effects in the covariance of C(ℓ) depend on the variance in cosmic mass density in a survey window, we expect that the field-variance-based labeling can make our model efficiently learn the HSV. Figure 5 shows the model performance when we vary the number of labels in the training datasets. We observe that training with no labels cannot appropriately teach the HSV effects. The model trained without the field-variance-based labels cannot fully explain the non-Gaussian covariance at small angular scales, which is a distinct feature in the true lensing simulations. The labeling significantly improves the prediction of the scatter in C(ℓ) over a wide range of ℓ, indicating that the field variance is closely related to the HSV effects in the lensing convergence field, as expected. We also confirm that the model performance is well converged once the number of labels is set to 10. Our results highlight the importance of physically-motivated annotations in unsupervised training of neural networks for cosmological purposes.
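The field-variance-based labeling can be sketched as a simple binning of the per-map variances. The equal-occupancy (quantile) binning below is our own assumption for illustration; the paper's exact bin edges (Figure 1) are not specified here:

```python
import numpy as np

def assign_variance_labels(maps, n_labels=10):
    """Assign each training map a label from its field variance.

    Maps are split into n_labels equal-occupancy bins (quantiles) of the
    per-field variance, so every label sees the same number of examples.
    """
    variances = np.array([np.var(m) for m in maps])
    edges = np.quantile(variances, np.linspace(0, 1, n_labels + 1))
    # digitize against the interior edges -> labels in 0 .. n_labels-1
    labels = np.digitize(variances, edges[1:-1])
    return labels, edges
```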
The failure in predicting the covariance may be interpreted as our GAN suffering from mode collapse. A more sophisticated framework (e.g. Arjovsky et al. 2017) may help solve this issue, but we leave it for future research. It would also be beneficial to develop a single generative model conditioned on the field-variance labels in order to reduce the training time. On our computing resource with NVIDIA A100 tensor cores, we require about 100 hours to finish training generators with 10 different labels. A well-designed architecture accounting for label dependences may be able to reduce the training time by a factor of ∼10.

Bispectrum
We study the lowest-order non-Gaussian statistic, the lensing bispectrum, predicted by our neural-network-based model. The bispectrum provides information about three-point spatial correlations among several length scales. Because our model aims at an image-to-image translation and the input Gaussian convergence has zero bispectrum on average, the bispectrum is a good measure of the non-Gaussianity acquired through the translation by our model.
Figure 6 summarizes the average and variance of the bispectrum over 10,000 realizations of the fake convergence fields from our model. In the figure, we consider three representative configurations of Fourier modes, i.e. equilateral, flattened, and squeezed shapes. We find that our model can reproduce the equilateral- and flattened-shape bispectra in the simulations at a level of < 10% on average, while its variance is also consistent with the true simulation counterparts. It is worth emphasizing that the popular log-normal model cannot provide a reasonable fit to the bispectrum in the true simulations, as seen in Section 5.4. For the squeezed configuration, we observe that the model prediction deviates from the true counterparts at a level of ∼ 30% and ∼ 50% for the average and variance, respectively. Note that the current state-of-the-art fitting formula of the matter bispectrum can explain the lensing bispectrum in simulations with a 10%-level deviation (Takahashi et al. 2020).
Hence, our model learns the image-to-image translation so that it preserves nontrivial information about the three-point correlations in the lensing simulations. This highlights that our model does not modulate convergence values from the input Gaussian field on a pixel-by-pixel basis. Phase correlations in Fourier space are a key ingredient for a non-zero bispectrum (Matsubara 2003b), whereas there are no phase correlations in the input Gaussian model by construction (see Section 3.1). The non-zero bispectra predicted by our model indicate that the image-to-image translation operates in a multi-scale manner.

Comparison with log-normal model
In this section, we check the performance of our model in predicting various summary statistics and compare the model predictions with the log-normal counterparts. Figure 7 shows a set of summary statistics from our neural-network-based model, including the equilateral-shape bispectrum, the peak counts, the one-point PDF, the MFs, and the ST coefficients. We see that the log-normal bispectrum deviates from the simulation counterparts, whereas our model reproduces the equilateral bispectrum in the simulations accurately. For the peak counts, the one-point PDF, and the MFs, we find that our model improves the prediction of statistical properties around rare convergence regions with a ≳ 5σ significance compared to the log-normal model. We also observe that the ST coefficients from our model are in good agreement with the simulation counterparts, implying that our model may be useful for cosmological inferences based on neural networks.
Importantly, we also find that our model can reproduce the variance of each statistic in the true simulations. For instance, our model can predict the variance of the equilateral bispectrum over a wide range of ℓ with a 10%-level accuracy, while it provides the variances of the peak counts, the one-point PDF, and the MFs in the range 0 ≲ κ ≲ 0.15 with a 20-30% level accuracy. We also observe that labeling the training datasets with the field variance is effective in predicting the variance of various summary statistics. When we trained our model without introducing any labels, the model tended to underestimate the variance of the peak counts, the one-point PDF, and the MFs over a wide range of κ by 50% or more.

Fig. 7.— Convergence summary statistics predicted by different generative models. The seven panels summarize the statistics of the equilateral bispectrum, the peak counts, the one-point PDF, the first MF (V_1), the second MF (V_2), and the ST coefficients up to second order (s_1 and s_2). In each panel, the blue circles with error bars show the average statistic and standard deviation obtained from the 10,000 lensing simulations. The black solid lines in the panels are our model predictions, and the gray-shaded regions stand for the standard deviation in our model. For comparison, the orange dashed lines represent the average statistics estimated by 10,000 realizations of the log-normal model.
To compare our model predictions with the log-normal counterparts in a quantitative way, we introduce a chi-square quantity defined as

χ²[S] = Σ_{ij} (S_i^(r) − S̄_i) [Cov^−1]_{ij}[S] (S_j^(r) − S̄_j),  (31)

where S̄_i represents the ensemble average at the i-th bin of a given binned summary statistic S, Cov_{ij}[S] is the covariance matrix for the binned S, and S_i^(r) is the i-th summary statistic at the r-th realization. We evaluate the ensemble averages and covariance matrices for various summary statistics over 10,000 realizations of the true lensing simulations as well as the fake convergence fields generated by our model and the log-normal counterparts. We then compute 10,000 chi-square quantities for the three generative models and compare the histograms of χ²[S] among the different models. The comparisons are important in practice because the chi-square quantity in Eq. (31) commonly provides an estimate of the log-likelihood for the statistic S, which is a central part of modern cosmological analyses.

Fig. 8.— Distributions of chi-square quantities for various summary statistics. The chi-square quantity for a given statistic S is defined in Eq. (31). In each panel, the blue histogram shows the distribution obtained from 10,000 lensing simulations, while the black solid lines are our model predictions based on the neural networks. For comparison, the orange dashed line in each panel represents the distribution for 10,000 log-normal realizations. In this figure, we use all bins of the statistic S measured in a given convergence field with a sky coverage of 3.5² deg². Details of the binning are found in Section 4.4. It is worth noting that the results in this figure do not account for any observational effects.
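The chi-square quantity of Eq. (31) can be computed per realization as a Mahalanobis distance; a minimal sketch (function name ours), with the mean and covariance defaulting to sample estimates over the realizations:

```python
import numpy as np

def chi2_statistic(samples, mean=None, cov=None):
    """Chi-square of Eq. (31): (S - Sbar)^T Cov^{-1} (S - Sbar) per realization.

    `samples` has shape (n_realizations, n_bins); the mean and covariance
    default to the sample estimates over the realizations themselves.
    """
    samples = np.asarray(samples)
    if mean is None:
        mean = samples.mean(axis=0)
    if cov is None:
        cov = np.cov(samples, rowvar=False)
    icov = np.linalg.inv(cov)
    d = samples - mean
    return np.einsum("ri,ij,rj->r", d, icov, d)
```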
Figure 8 compares the histograms of χ²[S] for six different sets of summary statistics S among the three cases of the true lensing simulations, the log-normal model, and our neural-network-based model. We find that our model can reproduce the distribution of χ² in the true simulations for every statistic, while the log-normal counterpart exhibits notable differences from the true one for some statistics. The clearest difference between the true simulations and the log-normal model is found in the distribution of χ² for the bispectrum. Apart from the bispectrum, we observe that the distribution from our model explains the median of the true χ² distribution better than the log-normal counterpart. Note that we do not include any observational effects in this comparison, so the results in Figure 8 do not provide a realistic log-likelihood function for the given statistics. Also, further study is needed to validate whether GAN-based log-likelihood computations can be used for parameter inference in a realistic setup.

EXTENSION OF SKY COVERAGE
As an application of our generative model, we here examine producing a weak lensing mass map covering a continuous sky larger than the field-of-view of individual training data. Note that previous neural-network-based models in the literature (Mustafa et al. 2019; Perraudin et al. 2021; Remy et al. 2023) cannot produce a continuous map if the sky coverage becomes larger than the one assumed in the training datasets. Considering that our model utilizes the Gaussian model as latent information, we propose a four-step procedure to generate a non-Gaussian convergence field with an arbitrary sky coverage: (i) Create a Gaussian realization of the convergence field with a given sky coverage Ω_large, (ii) Chop the Gaussian convergence field into a set of subregions so that each subregion covers the field-of-view of a single training dataset, denoted as Ω_small = 3.5² deg², (iii) Conduct an image-to-image translation in each subregion with our generators, (iv) Concatenate the non-Gaussian fields generated in step (iii).
We then summarize some additional treatments on a step-by-step basis.

STEP (I)
Because our neural-network-based model needs an input image with a size of 256 × 256, we set the size of a large-sky Gaussian convergence field to (256 M_x) × (256 M_y), where M_x and M_y are integers. We consider M_x = M_y = 4 for the demonstration below. Note that the pixel size of 0.82 arcmin on a side is required to be the same as in the training datasets. Hence, a single Gaussian convergence field produced in this step covers a sky of Ω_large = 14² deg².

STEP (II)

When dividing the large-sky Gaussian convergence field into small subregions, we decompose it into two terms,

κ_G,large(θ) = κ^(S)_G,large(θ) + κ^(L)_G,large(θ),

where κ_G,large represents the Gaussian convergence field produced at step (i), and the two fields on the right-hand side are produced by high- and low-pass filtering of κ_G,large such that

κ̃^(S)_G,large(ℓ) = W^(S)(ℓ) κ̃_G,large(ℓ), κ̃^(L)_G,large(ℓ) = W^(L)(ℓ) κ̃_G,large(ℓ).  (34)

In the above, we introduce the high- and low-pass filters W^(S) and W^(L), respectively. In this paper, we define these two filters to be top-hat filters in Fourier space. To be specific, each filter is given by

W^(S)(ℓ) = H(ℓ − ℓ_cut), W^(L)(ℓ) = H(ℓ_cut − ℓ),

where H(x) is the Heaviside step function, which is 1 for x > 0 and 0 otherwise, and ℓ_cut is a cutoff multipole. We calibrate ℓ_cut so that the field variance ⟨[κ^(S)_G,large]²⟩ over a 3.5² deg² sky follows the same distribution as in the Gaussian simulations used for the training. This calibration is important to avoid covariate shifts (Shimodaira 2000), ensuring that input Gaussian images for our generators have the same statistical nature as the training data. We find that ℓ_cut = 0.55 ℓ_f provides a good fit, where ℓ_f = 2π/Ω_small^{1/2} = 102.8 represents the fundamental Fourier mode in the training datasets.
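The top-hat decomposition of step (ii) can be sketched with FFTs as follows (function name ours). By construction the low- and high-pass parts sum back to the original map:

```python
import numpy as np

def split_scales(kappa_large, pix_rad, ell_cut):
    """Split a map into low- and high-pass parts with sharp (top-hat)
    Fourier filters: W^(L) = H(ell_cut - ell), W^(S) = H(ell - ell_cut)."""
    n = kappa_large.shape[0]
    freq = np.fft.fftfreq(n, d=pix_rad) * 2.0 * np.pi
    ell = np.hypot(*np.meshgrid(freq, freq, indexing="ij"))
    kf = np.fft.fft2(kappa_large)
    low = np.fft.ifft2(np.where(ell < ell_cut, kf, 0.0)).real
    high = kappa_large - low   # complementary filter, exact by construction
    return low, high
```

With the paper's calibration one would set `ell_cut = 0.55 * 102.8`.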
We then divide the field κ_G,large into M_x × M_y subregions so that each subregion covers the sky coverage Ω_small. We set the number of pixels to 256 × 256 in each subregion, but we extract each subregion with a 20-pixel overlap region around the edges of every subfield, ensuring that the final map is continuous after combining subfields (Han et al. 2021). Note that the overlap regions are not used to produce the final map. Hence, this procedure removes (M_x × 20) × (M_y × 20) pixels in our map making, leading the final map to cover a sky of 12.9² deg². We hereafter define κ^(S)_G,small as the high-pass filtered Gaussian field with a sky coverage of Ω_small.

STEP (III)

Given the M_x × M_y fields of κ^(S)_G,small, we add non-Gaussian features to the individual fields with our generators. This process is formally written as

κ^(S)_NG,small(θ) = f_NN(κ^(S)_G,small(θ) | σ²_G,small),

where f_NN(· | σ²_G,small) represents our neural-network-based model conditioned on the field variance σ²_G,small. For the field variance, we compute σ²_G,small as ⟨[κ_G,small]²⟩ over each subregion. It is worth noting that we trained 10 generators according to the field variance of the input Gaussian fields. Hence, we have 10 different models of f_NN in this paper.

STEP (IV)
We then combine the fields of κ^(S)_NG,small after removing the 20-pixel overlap regions so that a single combined field covers a 12.9² deg² sky. In the joining process, we also add the low-pass filtered Gaussian field κ^(L)_G,large(θ) in order to preserve the large-scale information inherent in the latent field κ_G,large(θ).
We first examine the effect of margins when combining small-area fields into a single large-area field in Figure 9. The left panel shows the extension of sky coverage by our baseline method with the 20-pixel margins, while the right represents a naive extension without any margins. The figure clearly demonstrates that the overlapping regions around each subregion are important for removing undesired discontinuities.
Figure 10 shows ensemble averages of various lensing statistics for the non-Gaussian convergence field covering a sky of 12.9² deg². In the figure, we summarize the results obtained from 500 realizations of the large-sky non-Gaussian convergence fields produced by following steps (i)-(iv). Note that it takes about 60 seconds to produce a large-sky map with M_x = M_y = 4 on a single GPU. The figure highlights that our model reproduces the lensing summary statistics just as in the true lensing simulations even when extending the sky coverage. The upper left panel shows the average power spectra for two different sky coverages. For comparison, the orange dashed line in the upper left represents the prediction by Eq. (21). On large scales of ℓ ≲ 100, our model does not learn any information about the power spectrum in the training process, but it still provides a good fit to the theoretical prediction. A similar result can be found in the upper middle panel showing the equilateral bispectrum. We also observe that our model is still able to reproduce the ensemble averages of the peak counts, the one-point PDF, and the MFs as in the true lensing simulations when the sky coverage of the model is set larger than that of the training data. Note that the peak counts and MFs can be affected by possible discontinuities across the subregions in our map making, because those statistics are defined with spatial-derivative information of the lensing convergence. The results in Figure 10 demonstrate that the extension of sky coverage based on steps (i)-(iv) works with no practical problems.
7. LIMITATIONS

Before concluding this paper, we summarize the major limitations of our neural-network-based model. The following issues will be addressed in future studies.

Squeezed bispectrum
Although our model can reproduce most of the lensing summary statistics as in the ground truth, there is room for further improvement in the model performance. We find that the squeezed-shape bispectrum is one of the most difficult statistics for our current model to predict accurately. The squeezed bispectrum is commonly defined as the bispectrum B(ℓ_1, ℓ_2, ℓ_3) with one multipole significantly smaller than the other two. Consequently, the number of triangles at squeezed configurations is limited in our training datasets. To improve the model, one could pre-train the networks with a weakly non-Gaussian field constructed so that its average bispectrum equals a theoretical counterpart B_ref (Smith and Zaldarriaga 2011; Hall and Mead 2018),

κ̃_NG,pert(ℓ) = κ̃_G(ℓ) + (1/6) ∫ [d²ℓ′/(2π)²] [B_ref(ℓ, ℓ′, −ℓ−ℓ′) / (C_G(ℓ′) C_G(|ℓ + ℓ′|))] κ̃*_G(ℓ′) κ̃*_G(−ℓ−ℓ′),  (38)

where κ̃_G(ℓ) represents a Gaussian convergence in Fourier space with zero mean and average power spectrum C_G. The bispectrum of κ_NG,pert is equal to the reference B_ref on average by construction. Once the squeezed bispectrum template is obtained by numerical simulations and/or analytic approaches, one can include this information with Eq. (38) in the training datasets. This pre-training would be among the simplest approaches to update our neural-network-based model in a physically intuitive way, but detailed examinations are required.
Curved Sky

Throughout the paper, we assume a flat sky to generate weak lensing mass maps, whereas this assumption can break down as survey areas increase in the near future (Wallis et al. 2021; Kansal 2023). The flat-sky approximation is valid for the standard power-spectrum analysis in most cases (Kitching et al. 2017; Kilbinger et al. 2017; Lemos et al. 2017), but its validity would depend on sky position for other non-Gaussian statistics. For instance, projections of a sphere onto a plane can induce distortions of the surface, affecting precise estimates of peaks and contours in the lensing convergence. A promising approach to account for curved-sky effects is to upgrade the architecture of our neural networks with a graph convolutional neural network as introduced by Perraudin et al. (2019). However, it seems difficult to learn a sphere-to-sphere translation while keeping the angular resolution at sub-arcminutes, because the number of grids on an input sphere can be of order O(10^8) in the common HEALPix pixelization (Górski et al. 2005). A well-organized training strategy on a patch-by-patch basis is needed in practice (Han et al. 2021).

Generative models for lensing tomography
Modern weak lensing analyses commonly employ a tomographic setup consisting of multiple lensing convergence fields with different source redshift information. Although it is easy to increase the number of channels of inputs and outputs for our neural networks, we expect that appropriate labels for multi-channel training datasets would be important to make our generative model applicable to realistic tomographic setups. There are several candidates for the label even when we assume input images to be produced by a multivariate Gaussian model. The candidates include the field covariance matrix ⟨κ_{G,i} κ_{G,j}⟩, where κ_{G,i} represents the Gaussian convergence field at the i-th tomographic bin and the average is computed as the spatial average over the field-of-view of individual training datasets. We assume that the field covariance can be a good measure to quantify the HSV effects in a tomographic lensing analysis, as motivated by our single-channel analysis, but this assumption should be examined with actual simulation data. Note that the labeling by the field covariance cannot be determined uniquely, indicating that some trial and error would be required.

Dependence of cosmological and astrophysical parameters
The ultimate goal of a generative model of weak lensing mass maps is to rapidly produce a map for various cosmological models while accounting for different astrophysical effects, such as source redshift distributions, intrinsic alignments (Troxel and Ishak 2014), and baryonic effects on the cosmic mass density (Chisari et al. 2019). Our study provides an important first step toward such an all-in-one generative model, but it is still challenging to add realistic cosmological and astrophysical dependences to a deep-learning-assisted simulator in general. There are some attempts to develop cosmology-dependent generative models of weak lensing or two-dimensional mass density maps in the literature (Perraudin et al. 2021; Tamosiunas et al. 2021; Dai and Seljak 2022). It is worth noting that previous cosmology-dependent generators limit themselves to a fixed sky coverage, which limits their utility in real-world analyses. Few-shot image generation (e.g. Ojha et al. 2021) is among the most promising approaches to update our model so that it can be adapted to various cosmological and astrophysical scenarios. For instance, a baseline generative model can be trained with a large set of fiducial training data, while few-shot learning allows one to update the baseline generator with a small set of training data for different cosmological and astrophysical models. Few-shot learning has already been examined for a galaxy classification problem (Zhang et al. 2022), and it might be effective for image generation tasks in astronomy.
8. CONCLUSION
In this paper, we have proposed a new generative model of weak lensing mass maps (convergence fields) based on an image-to-image translation by neural networks. The model aims at translating Gaussian convergence fields into fully non-Gaussian counterparts as simulated by cosmological ray-tracing simulations (Liu et al. 2018). We have trained the neural networks within the framework of Cycle-Consistent Generative Adversarial Networks (Cycle GAN) (Zhu et al. 2017), enabling the networks to learn the style transfer between two different convergence models in an unsupervised manner. We have found that annotating the training datasets is key to realizing the desired level of diversity in the convergence fields generated by our model. Labeling individual training images with field variances is effective in learning non-Gaussian covariance structures in the convergence power spectrum for our neural-network-based model.
Compared with a popular log-normal convergence model, our model can provide more accurate predictions of convergence bispectra as well as local structures around rare high-density regions. We have validated our model by studying multiple sets of convergence summary statistics, including the bispectra, number density of convergence peaks, one-point probability distribution function, Minkowski functionals, and scattering transform coefficients. The model predictions of those summary statistics are statistically consistent with the counterparts in the true lensing simulations. Likelihood functions of the model summary statistics can give a reasonable fit to the ground truth, highlighting that our generative model can play a central role in validating pipelines for cosmological inference.
We have also examined the production of a non-Gaussian convergence field with an arbitrary sky coverage by our neural-network-based model. For this purpose, we have laid out a four-step procedure: (i) generate a Gaussian convergence map, (ii) divide the Gaussian map into small patches, (iii) perform an image-to-image translation on a patch-by-patch basis, and (iv) combine the resultant non-Gaussian fields in all the patches. We have demonstrated that this four-step procedure can produce a non-Gaussian convergence field with a sky coverage of ∼ 166 deg² in a GPU minute, whereas our neural networks have been trained with simulations covering a sky of ∼ 12 deg². We have confirmed that the model summary statistics remain consistent with the ground truth even when extending the sky coverage. This method opens a valuable opportunity for massive productions of weak lensing mass maps with realistic non-Gaussianities covering large areas of the sky. Note that our trained generators only require low-cost Gaussian convergence fields to produce a new set of realistic non-Gaussian counterparts, allowing us to rapidly increase the number of independent non-Gaussian convergence fields.
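The four-step procedure above can be sketched in a few lines. In this minimal numpy illustration, `translate_patch` is a hypothetical stand-in for a trained generator, and the simple tiling ignores any boundary blending the real pipeline may apply:

```python
import numpy as np

def translate_patchwise(gaussian_map, translate_patch, patch=256):
    """Sketch of the four-step procedure: take a Gaussian convergence map,
    split it into patch x patch tiles, translate each tile with a generator,
    and stitch the non-Gaussian tiles back into one continuous field.

    `translate_patch` is any callable mapping a tile to a tile of the
    same shape (e.g. a trained image-to-image generator).
    """
    ny, nx = gaussian_map.shape
    assert ny % patch == 0 and nx % patch == 0, "map must tile exactly"
    out = np.empty_like(gaussian_map)
    for iy in range(0, ny, patch):
        for ix in range(0, nx, patch):
            tile = gaussian_map[iy:iy + patch, ix:ix + patch]
            out[iy:iy + patch, ix:ix + patch] = translate_patch(tile)
    return out

# toy usage with an identity "generator" standing in for the trained network
rng = np.random.default_rng(1)
big_map = rng.normal(size=(1024, 1024))
result = translate_patchwise(big_map, lambda t: t)
```

Because the translation is fully local to each patch, the cost scales linearly with sky coverage, which is why a much larger field than the training footprint can be produced without retraining.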
Our generative model accelerates the production of mock weak lensing mass maps in a simple and efficient way, but it has to be revised in various aspects for real-world applications. The model tends to overestimate the squeezed-shaped bispectrum at the 30% level, indicating that the input convergence field for our model should preserve weakly non-Gaussian structures. We constructed our generative model assuming a single source redshift in a flat sky, but this is not valid for future lensing surveys with a sky coverage larger than 1000 deg² and tomographic setups. Our model is not able to produce a weak lensing mass map as a function of various cosmological and astrophysical parameters, which is a common weak point in deep-learning-assisted approaches. We expect that few-shot learning can be an effective breakthrough toward cosmology- and astrophysics-dependent generators as sets of simulation data at various models become available. We leave this to future study.

APPENDIX A. NETWORK ARCHITECTURE IN CYCLE GAN
In this appendix, we provide some details of the architecture of our generative networks. In our Cycle GAN, we adopt the same architecture for the pair of GANs. For the generator, we follow the architecture designed by Johnson et al. (2016). This network contains three convolutions, nine residual blocks (He et al. 2016), two transposed convolutions, and one convolution that generates 256 × 256 images. As in Johnson et al. (2016), we insert instance normalization modules (Ulyanov et al. 2017), improving the quality of image stylization. For the discriminator, we adopt 70 × 70 PatchGANs (Isola et al. 2017; Ledig et al. 2017; Li and Wand 2016), aiming at authenticity determination across 70 × 70 overlapping image patches. Note that a patch-level discriminator architecture allows us to reduce the number of parameters compared to a full-image discriminator (Isola et al. 2017). Also, it can be applied to arbitrarily sized images in a fully convolutional fashion, making the training efficient in GPU environments. Details of the architecture are provided in Tables 2 and 3.
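The "70 × 70" in the PatchGAN name is the discriminator's receptive field, which follows directly from the kernel sizes and strides of its convolutional layers. A small sanity-check script (the layer list below is the standard PatchGAN of Isola et al. 2017; our exact discriminator layers are listed in Table 3):

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride) convolution layers,
    via the standard backward recursion r <- r * s + (k - s)."""
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r

# Standard 70x70 PatchGAN: three 4x4 stride-2 convolutions, one 4x4
# stride-1 convolution, and a final 4x4 stride-1 output convolution.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # -> 70
```

Each output score of the discriminator therefore judges the authenticity of one 70 × 70 patch of the input, which is what makes the architecture applicable to arbitrarily sized images.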

B. TRANSLATION FROM NON-GAUSSIAN CONVERGENCE TO THE GAUSSIAN COUNTERPART
We here summarize the performance of the generator translating non-Gaussian convergence maps into Gaussian counterparts (i.e. Gaussianization), which is an interesting by-product of our Cycle GAN. In this appendix, we show the results of the neural network trained without any labeling.
Figure 11 compares three different summary statistics between the target Gaussian model and our generative model. We chose power spectra, equilateral-shaped bispectra, and one-point PDFs for the comparison because these are primarily important to check the degree of Gaussianity left in a given lensing map. The figure shows that our neural-network-based model can Gaussianize non-Gaussian convergence fields with high precision.
In Figure 12, we study in detail the degree of Gaussianity in power spectra produced by our generator. We compute the cumulative signal-to-noise ratio (S/N), as in Eq. (30), for the target Gaussian, the input non-Gaussian, and the Gaussianized maps. If our generator performed a perfect Gaussianization, the cumulative S/N should match the target Gaussian counterpart. The figure highlights that our neural-network-based Gaussianization is not perfect. We observe that the cumulative S/N of the Gaussianized power spectrum deviates from the true Gaussian counterpart at ℓ ≳ 3000. This means that the maps Gaussianized by our generator retain some level of non-Gaussianity at angular scales of ≲ 0.1 deg. We also caution that our Gaussianization assumes a fiducial cosmology for the image-to-image translation. Hence, it is still uncertain whether the Gaussianization can work for non-Gaussian lensing maps at various cosmological models. Another caveat is that our image-to-image translation does not account for the presence of observational noise, which is the main practical difficulty in the Gaussianization of weak lensing fields. This paper was built using the Open Journal of Astrophysics LaTeX template.
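For reference, a cumulative S/N of a binned power spectrum can be evaluated as follows. Eq. (30) itself is defined in the main text and is not reproduced here; this sketch uses a common definition, (S/N)²(≤ ℓ_max) = C^T Cov⁻¹ C restricted to bins up to ℓ_max, and the toy spectrum and covariance are illustrative:

```python
import numpy as np

def cumulative_sn(cl_mean, cl_cov):
    """Cumulative signal-to-noise of a binned power spectrum:
    (S/N)^2 up to bin i is C^T Cov^{-1} C over the first i bins."""
    n = len(cl_mean)
    sn = np.empty(n)
    for i in range(1, n + 1):
        sub_cov = cl_cov[:i, :i]
        sub_cl = cl_mean[:i]
        sn[i - 1] = np.sqrt(sub_cl @ np.linalg.solve(sub_cov, sub_cl))
    return sn

# toy usage: 8 multipole bins with a diagonal, 10%-error covariance
cl = np.linspace(1.0, 0.2, 8)
cov = np.diag((0.1 * cl) ** 2)
sn = cumulative_sn(cl, cov)
```

Off-diagonal covariance terms, such as those sourced by non-Gaussianity, suppress the growth of the cumulative S/N relative to the diagonal (Gaussian) case, which is the effect probed in Figure 12.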
FIG. 1.-Labeling of training datasets with field variances of lensing convergence fields. Different colored squares in the lower left panel stand for labels of two different datasets. The labels are set by the field variance ⟨κ²⟩ of either simulated or Gaussian fields. The size of each square represents a width of field variances. Each square contains the same number of realizations of training data. In this figure, we consider 10 labels. The horizontal axis in the lower left panel represents the variance of the Gaussian convergence fields, while the vertical axis shows the simulated counterpart. The upper and right panels show the cumulative distributions of the variances over 10,000 realizations of the two different fields. When training the Cycle GAN, we use datasets with identical labels, requiring each of the 10 different generators to be trained with 1,000 realizations of the training sets. It is worth noting that the realizations of Gaussian and simulated fields are not aligned, e.g. the two do not share phase information in a survey window.
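The equal-occupancy variance labeling described above can be reproduced with simple quantile binning. This is a minimal numpy sketch with illustrative variable names and toy white-noise fields standing in for convergence maps:

```python
import numpy as np

def variance_labels(maps, n_labels=10):
    """Assign each convergence realization to one of n_labels
    equal-occupancy bins of its field variance <kappa^2>."""
    variances = maps.reshape(maps.shape[0], -1).var(axis=1)
    # interior quantile edges split the sample into equal-count bins
    edges = np.quantile(variances, np.linspace(0, 1, n_labels + 1)[1:-1])
    return np.digitize(variances, edges)  # labels in 0 .. n_labels - 1

# toy usage: 10,000 white-noise "fields" with varying amplitudes
rng = np.random.default_rng(2)
amps = rng.uniform(0.5, 1.5, size=10000)
fields = amps[:, None, None] * rng.normal(size=(10000, 16, 16))
labels = variance_labels(fields, n_labels=10)
```

Each label then defines one training subset, with one generator trained per subset as in the caption above.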
FIG. 2.-An example of lensing convergence fields generated by our neural network. The left panel shows an input Gaussian field, while the convergence generated by our model is presented on the right. For comparison, the middle panel represents a log-normal model with the same phase information as the Gaussian field. In each panel, the inset figure highlights a zoom-in view around the highest convergence value in a field of view covering 3.5² deg².
FIG. 3.-Comparisons of convergence power spectra among several generative models. The left panel shows the average power spectra, while the middle presents the variance at a given multipole ℓ. In the right panel, we compare the cumulative signal-to-noise ratio (S/N) defined by Eq. (30). In each panel, the blue circles represent the results of the true lensing simulations, whereas the black solid and orange dashed lines stand for those by our GAN-based model and the Gaussian model, respectively. At the bottom, we also show the fractional difference between the GAN predictions and the simulation counterparts, highlighting that our generators can reproduce the average, variance, and cumulative S/N at 300 ≲ ℓ ≲ 2000 within a 10%-level accuracy. Note that all the results in this figure are based on 10,000 realizations of the simulations, fake convergence fields generated by our networks, and Gaussian models at a source redshift of z_source = 1. Each realization covers a sky of 3.5² deg².
FIG. 4.-A close look at covariance matrices of convergence power spectra among several models. Each panel summarizes the covariance at two different multipoles ℓ and ℓ′, normalized to unity at the diagonal elements. The blue circles in each panel represent the simulation results, while the black solid lines stand for the predictions by our generative model. For comparison, the orange dashed lines in each panel show the covariance matrix estimated by 10,000 realizations of the log-normal model. This figure validates that the covariance matrix by our generators can provide a reasonable fit to the simulation results, making our model valuable for standard cosmological analyses based on two-point correlation functions.
FIG. 5.-Dependence of our model performance on the number of labels based on field variances. Similar to Figure 3, but we show the results when varying the number of labels N in the training datasets. The blue circles in each panel are the results obtained from 10,000 lensing simulations, while different colored lines stand for the model predictions with different N.
FIG. 6.-Comparison of convergence bispectra predicted by the lensing simulation and our model. The left, middle, and right panels show the bispectra with three different configurations: equilateral, flattened, and squeezed shapes, respectively. The blue circles with error bars in the upper panels represent the simulation results, while the black solid lines stand for the model prediction. The shaded region in the upper panels highlights the statistical uncertainty of the bispectra by our model. In the second row, we show the ratio of the average bispectra between the two (the black solid lines show the model bispectra divided by the simulation counterparts). The bottom panels summarize the ratio of the variance. The gray lines at the bottom are the model variance divided by the simulation counterparts. All the results are based on 10,000 realizations of the simulations and the neural-network-based model.

FIG. 7.-Convergence summary statistics predicted by different generative models. The seven panels summarize the statistics of the equilateral bispectrum, the peak count, the one-point PDF, the first MF (V₁), the second MF (V₂), and the ST coefficients up to second order (s₁ and s₂). In each panel, the blue circles with error bars show the average statistic and standard deviation obtained from the 10,000 lensing simulations. The black solid lines in the panels are our model predictions, and the gray-shaded regions stand for the standard deviation in our model. For comparison, the orange dashed lines represent the average statistics estimated by 10,000 realizations of the log-normal model.
FIG. 8.-Distributions of chi-square quantities for various summary statistics. The chi-square quantity for a given statistic S is defined as in Eq. (31). In each panel, the blue histogram shows the distribution obtained from 10,000 lensing simulations, while the black solid lines are our model predictions based on the neural networks. For comparison, the orange dashed line in each panel represents the distribution for 10,000 log-normal realizations. In this figure, we use all bins of the statistic S measured in a given convergence field with a sky coverage of 3.5² deg². Details of the binning are found in Section 4.4. It is worth noting that the results in this figure do not account for any observational effects.
FIG. 11.-Comparisons of lensing summary statistics predicted by our neural-network-based "Gaussianizer" and target Gaussian counterparts. The left, middle, and right panels summarize the comparisons of power spectra, equilateral-shaped bispectra, and one-point PDFs, respectively. In each panel, the blue points show the average statistics over 10,000 realizations of the target Gaussian generative model, whereas the black solid lines stand for the prediction by our Gaussianizer. The blue error bars in each panel correspond to the statistical uncertainty for a survey of 3.5 deg². For ease of comparison, we also include the average summary statistics over 10,000 non-Gaussian convergence fields in the MassiveNuS simulation (Liu et al. 2018), shown by the orange dashed lines in each panel. This figure clarifies that our neural network can be used for a precise Gaussianization of non-Gaussian convergence fields.
FIG. 12.-Cumulative signal-to-noise ratios (S/Ns) of lensing power spectra for three different lensing maps. The cumulative S/N is defined in Eq. (30). The blue points represent the results of the target Gaussian maps, while the black solid and orange dashed lines stand for the Gaussianized and input non-Gaussian maps, respectively. A perfect Gaussianization ensures that the blue points and the solid line are consistent with each other.

TABLE 1
SUMMARY STATISTICS OF LENSING CONVERGENCE FIELDS. THOSE STATISTICS ARE USED FOR VALIDATION OF THE PERFORMANCE OF OUR NEURAL-NETWORK-BASED GENERATIVE MODEL. WE ARE INTERESTED IN NOT ONLY THE ENSEMBLE AVERAGE BUT ALSO THE COVARIANCE MATRIX OF EACH STATISTIC.

TABLE 2
GENERATOR NETWORK ARCHITECTURE: LAYER TYPES, ACTIVATIONS, OUTPUT SHAPES (CHANNELS × HEIGHT × WIDTH), AND THE NUMBER OF TRAINABLE PARAMETERS FOR EACH LAYER. CONV(k, s) ARE k × k CONVOLUTION LAYERS WITH STRIDE s, WHILE TCONV(k, s) STANDS FOR THE TRANSPOSED COUNTERPART. RESNETBLOCK DENOTES A RESIDUAL BLOCK THAT CONTAINS TWO 3 × 3 CONVOLUTIONAL LAYERS WITH THE SAME NUMBER OF FILTERS ON BOTH LAYERS.

TABLE 3
SIMILAR TO TABLE 2, THIS TABLE SHOWS THE DISCRIMINATOR NETWORK ARCHITECTURE. IN THE DISCRIMINATOR, WE SET EVERY LEAKYRELU'S LEAKINESS TO 0.2.