Novel and flexible parameter estimation methods for data-consistent inversion in mechanistic modelling

Predictions for physical systems often rely upon knowledge acquired from ensembles of entities, e.g. ensembles of cells in the biological sciences. For qualitative and quantitative analysis, these ensembles are simulated with parametric families of mechanistic models (MMs). Two classes of methodologies, based on Bayesian inference and on populations of models, currently prevail in parameter estimation for physical systems. However, in Bayesian analysis, uninformative priors for MM parameters introduce undesirable bias. Here, we propose how to infer parameters within the framework of stochastic inverse problems (SIPs), also termed data-consistent inversion, wherein the prior targets only uncertainties that arise due to MM non-invertibility. To demonstrate, we introduce new methods to solve SIPs based on rejection sampling, Markov chain Monte Carlo, and generative adversarial networks (GANs). In addition, to overcome limitations of SIPs, we reformulate SIPs based on constrained optimization and present a novel GAN to solve the constrained optimization problem.

A common research scenario for mechanistic modelling, especially in biology, involves recordings of ensembles under different conditions. For example, to find the effect of a candidate drug compound, characteristics of two sets of isolated cells may be recorded, one under control conditions and the other under the effects of the drug. The problem of finding input parameters of a model for multiple conditions distinguished by some factor (e.g., drug action, age, disease state) can be framed as an intervention problem, in which a subset of model parameters (potentially known to be related to the mechanism of action of the intervention) is allowed to vary across conditions, while the other model parameters remain unchanged.
For an example of parameter inference in an intervention scenario, we set the goal of simultaneously inferring MM parameters for two sets of observations: the first under a 'control' condition, and the second under a 'drug' condition. We adapted the r-GAN architecture to solve the SIP in this scenario. Here we denote by x c ∼ Q Xc and x d ∼ Q X d samples of model input parameters for the control and treatment (drug) populations, respectively. Our goal was to evaluate the distributions Q Xc and Q X d given the distributions of observations Q Yc and Q Y d for the control and treatment populations. Note that we consider the situation where we do not have pairwise observations for each object under both control and drug conditions. We address this type of scenario because it is prevalent in healthcare and life sciences domains, such as randomized clinical trials. Inferring model input parameters in the less common situation where pairwise observations are available can be solved more simply.
To proceed, we define a joint probability distribution between X c and X d with marginals Q Xc and Q X d . Interventions rarely affect the whole set of model input parameters. Often, input parameter vectors can be split into components x s that are not affected by the drug (shared parameters) and components x̃ c , x̃ d , forming two vectors of input parameters x c = [x s , x̃ c ] and x d = [x s , x̃ d ] for the control and treatment groups, respectively. The split results in the factorization q X̃c,X̃d|Xs (x̃ c , x̃ d | x s ) = q X̃c|Xs (x̃ c | x s ) q X̃d|Xs (x̃ d | x s ). This problem cannot be solved independently for the two populations. However, the extension of r-GAN to accommodate this problem is straightforward: the graph of the intervention r-GAN with shared parameters is shown in Figure 1A. Through a combination of three generators with both shared (Z1) and unshared (Z2, Z3) base variables, experimental information is incorporated into the structure of the GAN itself. Four discriminators are used to provide the generators' losses in a weighted sum as in (13) below: two ensure that the mechanistic model outputs match both control and drug observations, and two maximise the overlap between sampled parameter sets and the parameter priors.
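A minimal numpy sketch of this generator composition, with simple closed-form functions standing in for the three trained generator networks (the functions G1, G2, G3 below are hypothetical stand-ins, not the paper's architectures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three generator networks.
def G1(z1):          # shared parameters x_s, driven by the shared base Z1
    return np.tanh(z1)

def G2(z1, z2):      # control-specific component x~_c
    return np.tanh(z1 + 0.5 * z2)

def G3(z1, z3):      # drug-specific component x~_d
    return np.tanh(z1 - 0.5 * z3)

def sample_intervention_parameters(n):
    z1 = rng.normal(size=(n, 1))   # shared base variable Z1
    z2 = rng.normal(size=(n, 1))   # base variable for the control branch
    z3 = rng.normal(size=(n, 1))   # base variable for the drug branch
    x_s = G1(z1)
    x_c = np.hstack([x_s, G2(z1, z2)])   # x_c = [x_s, x~_c]
    x_d = np.hstack([x_s, G3(z1, z3)])   # x_d = [x_s, x~_d]
    return x_c, x_d

x_c, x_d = sample_intervention_parameters(1000)
```

Because both branches consume the same shared base variable Z1, the shared component of x_c and x_d is identical by construction, which is exactly the conditional-independence structure of the factorization above.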
To demonstrate the intervention r-GAN configuration, we used a synthetic dataset following the intervention scenario. A 2-dimensional Rosenbrock function was used as the mechanistic model. We generated samples of observations with distribution Q Yc , corresponding to the control condition, from N(250, 50 2 ), shown in Figure 1B (solid black line). The ground-truth distribution of input parameters G Xc coherent with Q Yc is shown in Figure 1D as black contour lines. Next, we sampled from the distribution of ground-truth input parameters G Xc and applied linear scaling to the x 2 parameter according to x 2,d = 0.6 x 2,c to generate ground-truth input parameters for observations under the intervention (drug) condition. Note that the input parameter x 1 is considered to be the shared input parameter, x s , which is known to be unaffected by the intervention. The ground-truth distribution of input parameters after intervention, G X d , is shown in Figure 1E as red contour lines. Equation (2) was run with G X d as inputs to obtain the intervention target output distribution Q Y d , shown in Figure 1C (solid red line). We re-emphasize that we do not have pairwise data for each object under control and after-intervention conditions, i.e., we do not have information on the joint distribution of observations across the two sets, but only the marginal distributions Q Yc and Q Y d of the separate observations. We used the r-GAN with shared variables (1) (Figure 1A) to infer model input parameters coherent with the observations Q Yc and Q Y d . The distributions of inferred input parameters under control and intervention conditions are shown in Figures 1D and 1E by the blue contour maps. The generated distributions of input parameters resulted in the output observation distributions shown by the dotted density lines in Figures 1B and 1C, which closely match both target distributions. In this scenario, there were not sufficient prior constraints for the inferred parameter distribution after intervention (Figure 1E) to precisely match the ground-truth distribution (i.e., the true effect of the intervention on model parameters), as demonstrated by the blue shaded regions outside of the red contour lines. Note, however, that by targeting a uniform distribution in parameter space while being constrained to match the observations in output space, the r-GAN inferred a wider range of possible effects of the intervention than were present in the ground truth, i.e., the intervention r-GAN result contains the actual intervention effect as one of multiple possible coherent solutions.

Fig. 1. r-GAN for parameter inference in an intervention scenario. A. The intervention is applied to a subset of parameters X̃, while a generator G1 samples unaffected, shared parameters Xs. Two additional generators, G2 and G3, respectively sample X̃ under the intervention (d) and non-intervention (c) conditions. Two simulations produce model outputs from parameters generated for each condition. This r-GAN has four discriminators contributing to the generators' losses in weighted sums. B. KDEs of the target distribution under control conditions Q Yc (black solid line) and the generated (inferred) output distribution Q Yc,g (dashed line). C. KDEs of the target distribution after intervention, Q Y d (red solid line), and the inferred output distribution Q Y d,g (dashed line). D. Joint distribution of model input parameters for control observations: ground-truth parameters G Xc used to generate the observations (black contour lines) and parameters inferred via r-GAN (contour map in blue). E. As in D, but after intervention: ground-truth parameters G X d (red contour lines) and parameters inferred via r-GAN (contour map in blue).
Explicitly known deterministic map. Next, we modified r-GAN to demonstrate its flexibility to adapt to a second scenario, i.e., an intervention with a known effect. We simulated the intervention with an explicit deterministic map x d = T(x c ), and used this to infer the ensemble of MM parameters consistent with data observed before and after the intervention. This configuration would be useful for a scenario in which the effect of the perturbation is fully understood. For example, a drug with a known effect on a specific cellular membrane protein may be employed to test the response of a cell in an experiment. A suitable r-GAN to solve this intervention SIP is given by (3) (Figure 2A). We used the r-GAN with explicit map (3) (Figure 2A) to infer model input parameters coherent with the observations Q Yc and Q Y d (Figures 2B and 2C), which were the same as in the intervention with shared variables. The r-GAN with explicit deterministic map produced the distributions of input parameters shown in Figures 2D and 2E by the blue contour maps, which closely match the ground-truth distributions of input parameters. The output distributions corresponding to the generated input parameters are shown by the dash-dotted density lines in Figures 2B and 2C.
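As a concrete sketch, an explicit deterministic map matching the synthetic example above (the x 2,d = 0.6 x 2,c rescaling, with the shared parameter x 1 untouched) can be written as:

```python
import numpy as np

def T(x_c, scale=0.6):
    # Explicit deterministic intervention map x_d = T(x_c): the drug
    # rescales the second input parameter and leaves the shared first
    # parameter unchanged, as in the synthetic ground-truth construction.
    x_d = np.array(x_c, dtype=float, copy=True)
    x_d[:, 1] *= scale
    return x_d

x_c = np.array([[0.5, 1.0],
                [1.5, 2.0]])
x_d = T(x_c)
```

In the r-GAN of Figure 2A, generated control parameters Q Xc,g are passed through such a map before the second mechanistic simulation, so only a single generator needs to be trained.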
These two cases represent two extremes of the knowledge that may be available about the intervention's effect. In the shared-variables case, conditionally independent input parameters imposed the weakest possible constraint on the joint distribution. We emphasize that, despite the factorization, the joint density derived from our methods does not imply that X c and X d are uncorrelated. It is possible to construct an infinite number of joint distributions for the input parameters x c and x d , each of which yields exactly the same marginal distributions Q Xc and Q X d of the input parameters and generates the same distributions of output observations Q Yc and Q Y d . The factorization of the joint density only reflects the degree of uncertainty about the joint distribution, with independent treatment of the two populations. The joint distribution of input parameters in the final solution can then be chosen based on some additional criteria, e.g., by solving the optimal transport problem [1]. To construct the second extreme case, we assumed a strong constraint, i.e., we treated the relationship between input parameters in the two groups as a known deterministic map. In this case, the r-GAN correctly inferred the ground-truth parameter distributions (Figures 2D and 2E). For simplicity of presentation, we provided only these two extreme examples, leaving other configurations for future work. For example, it is also possible to construct joint distributions that lie between the two extremes using an r-GAN configuration that accounts for smooth responses to the intervention. A smooth response might be implemented using different configurations of generator networks with additional regularization, e.g., enforcing Lipschitz continuity in the neural networks [2, 3].

II. TEST FUNCTION EXAMPLES
We tested the Rejection algorithm using Gaussian mixture models (GMMs), with and without Markov chain Monte Carlo (MCMC) for sample initialization, along with c-GAN and r-GAN, on several example test functions in addition to those shown in Figures 2 and 3 of the main manuscript. The examples comprised both two-dimensional and high-dimensional Rosenbrock functions, a two-dimensional piecewise discontinuous function [4], and an ordinary differential equation model with two inputs [5].

A. GMM and MCMC methods
To compare our methods, we performed rejection sampling using Algorithm 2 in the main manuscript. Samples for rejection sampling were initialized using either samples from the prior distribution (denoted 'Rejection' or 'Rej') or a MCMC method implemented using TensorFlow libraries (denoted 'MCMC'). To calculate log-density under target distributions in all methods, we fit a Gaussian mixture model to the target samples from P Y , with the same samples used as the target in all GAN methods. For the MCMC initialization step, we initialized workers by sampling from P X and then used the No-U-Turn Sampler [6] to generate initial proposals.
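The GMM-based rejection step can be sketched as follows. The forward model, target distribution, and GMM component counts here are illustrative rather than those of the paper, and the acceptance rule shown is a common form of data-consistent rejection sampling (accept x with probability proportional to the ratio of the observed density to the predicted pushforward density at M(x)); Algorithm 2's exact details may differ:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def M(x):
    # Toy forward model standing in for the mechanistic model.
    return x[:, :1] ** 2 + x[:, 1:2] ** 2

# Prior samples and their pushforward through the model.
x_prior = rng.uniform(0.0, 2.0, size=(20000, 2))
y_pred = M(x_prior)

# Hypothetical target observations Q_Y, and GMM density estimates for
# both the observed and the predicted (pushforward) output densities.
y_obs = rng.normal(3.0, 0.5, size=(5000, 1))
gmm_obs = GaussianMixture(n_components=3, random_state=0).fit(y_obs)
gmm_pred = GaussianMixture(n_components=5, random_state=0).fit(y_pred)

# Acceptance ratio r(x) = q_obs(M(x)) / q_pred(M(x)), normalized by its max.
log_r = gmm_obs.score_samples(y_pred) - gmm_pred.score_samples(y_pred)
r = np.exp(log_r - log_r.max())
accepted = x_prior[rng.uniform(size=len(r)) < r]
```

Pushing the accepted parameter samples back through M recovers (approximately) the target output distribution, which is the defining property of a data-consistent solution.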

B. Two-dimensional Rosenbrock function
We tested the SIP methods in a scenario in which a nonlinear model is clearly non-invertible, in that multiple disjoint modes in parameter space exist, each capable of producing the full distribution of target model outputs. Using the Rosenbrock function of two input parameters, with a = 1, b = 100 (Figure 3A), over the prior range x 1 , x 2 ∼ U(0, 2), the SIP methods were used to infer the joint distribution of input parameters coherent with target samples from a distribution Q Y of N(250, 50 2 ) (Figure 3B). Calculating outputs according to (4) for the inferred input parameter samples Q Xg generated by each method resulted in the model output distributions Q Yg shown in Figure 3B. The generated output distributions almost perfectly match the desired target distribution. JS-divergence estimates (Figure 3C) show that all four methods perform similarly in sampling parameters coherent with the target samples. Figures 3D-G show histograms of Q Xg for each method.
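The non-invertibility of the two-parameter Rosenbrock model can be checked directly: for a fixed x 1, solving rosenbrock(x 1, x 2) = y for x 2 yields two distinct preimage branches, so distinct parameter sets map to the same output value.

```python
import numpy as np

def rosenbrock(x1, x2, a=1.0, b=100.0):
    # Two-parameter Rosenbrock function used as the mechanistic model.
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2

# Solve rosenbrock(x1, x2) = y_target for x2 at a fixed x1:
# b * (x2 - x1^2)^2 = y_target - (a - x1)^2  =>  x2 = x1^2 +/- delta.
y_target = 250.0
x1 = 0.5
delta = np.sqrt((y_target - (1.0 - x1) ** 2) / 100.0)
x2_upper = x1 ** 2 + delta   # one preimage branch
x2_lower = x1 ** 2 - delta   # a second, disjoint preimage branch
```

Both branches evaluate to exactly y_target, which is why the coherent parameter distribution in Figure 3D-G is multimodal.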

C. High dimensional Rosenbrock function
To mimic the complexity of most biophysical models, we also considered a Rosenbrock function with multidimensional inputs, with a = 1, b = 100, and the dimension N set to 8.
To generate a model M with a vector of outputs y rather than a scalar, we performed 5 randomly chosen permutations of the coordinates {x i } in (5), yielding the 5-dimensional output vector (6), where the x i comprise permutations of the vector x (i.e., the dimensions of X and Y were 8 and 5, respectively). Similar to the Rosenbrock function of two input parameters, we considered a uniformly distributed prior for the high-dimensional model. We applied the SIP methods to infer parameters for this 8-dimensional Rosenbrock function according to (5), with the 5-dimensional output according to (6). The target output distribution Q Y was a multivariate normal distribution with means µ i = 250, i = 1, 2, ..., 5, and a diagonal covariance matrix with a standard deviation of σ Yi = 50, i = 1, 2, ..., 5, for each individual feature. Figure 4C shows estimated JS-divergences between P X and Q X and between Q Y and Q Yg .
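A sketch of this vector-valued construction, with hypothetical fixed random permutations (the paper's actual permutations are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def rosenbrock_nd(x, a=1.0, b=100.0):
    # N-dimensional Rosenbrock: sum over consecutive coordinate pairs.
    return np.sum(b * (x[..., 1:] - x[..., :-1] ** 2) ** 2
                  + (a - x[..., :-1]) ** 2, axis=-1)

N, n_outputs = 8, 5
# Five fixed, randomly chosen coordinate permutations (illustrative).
perms = [rng.permutation(N) for _ in range(n_outputs)]

def M(x):
    # Vector-valued model: one Rosenbrock evaluation per permutation,
    # so dim(X) = 8 and dim(Y) = 5.
    return np.stack([rosenbrock_nd(x[..., p]) for p in perms], axis=-1)

x = rng.uniform(0.0, 2.0, size=(100, N))  # samples from the uniform prior
y = M(x)                                  # shape (100, 5)
```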
Here, unlike the other methods, the c-GAN needs to learn the multidimensional output function over the entire support of the prior in order to capture the inverse surrogate of the model, and then perform amortized inference, resulting in comparatively poor performance of the c-GAN in this scenario. Rejection, MCMC and r-GAN had similar performance when sampling from the coherent distribution.

D. A piecewise smooth function
Another parameter inference test in [4] used a piecewise smooth function, with model parameter inference over a disconnected and compact region of a discontinuous function. The mechanistic model was represented by equation (7). The distribution of observations Q Y was N(−2.0, 0.25 2 ), and P X was given by x 1 , x 2 ∼ U(−1, 1). Figure 5A shows a heat map of y over P X for (7).
In this example, we incorporated a surrogate model, in the form of a feedforward network trained to approximate y = M(x), as the model node in the r-GAN. The surrogate network consisted of 4 dense layers with 400 nodes per layer, ReLU activation, and a dropout rate of 0.1 between layers.
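The surrogate idea can be sketched with scikit-learn. Note this is a deliberately reduced analogue: the toy discontinuous function below stands in for the paper's function (7), and MLPRegressor has no dropout, so a smaller network is used purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def model(x):
    # Toy discontinuous stand-in for the piecewise model (7): the jump
    # across x1 = 0 is what makes a smooth surrogate non-trivial.
    return np.where(x[:, 0] > 0, -2.0 + x[:, 1], 2.0 - x[:, 1])

# Sample training inputs from the prior U(-1, 1)^2 and evaluate the model.
x_train = rng.uniform(-1.0, 1.0, size=(2000, 2))
y_train = model(x_train)

# Smaller illustrative analogue of the paper's 4x400 ReLU surrogate.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), activation='relu',
                         max_iter=500, random_state=0)
surrogate.fit(x_train, y_train)
```

Once trained, the surrogate replaces the mechanistic model node in the r-GAN, making the forward pass cheap and differentiable.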
Calculating outputs according to (7) for the inferred input parameter samples Q Xg generated by each method (shown in Figures 5D-G) resulted in the model output distributions Q Yg via Rejection, MCMC, c-GAN and r-GAN, each of which matched the target observation density (Figure 5B).

E. An ordinary differential equation model

Finally, we tested the Rejection, MCMC, c-GAN and r-GAN methods on a model represented by an ordinary differential equation (ODE) with two input parameters.
The target feature of interest is y(x) = (1/2) ∫₀² f(t; x) dt. Since the model is an ODE, we could have directly incorporated it into the deep learning networks for inference using a differentiable ODE solution [7]. Another option would be to solve the differential equation analytically and use the closed-form solution in the deep learning network. However, as in the piecewise function example, we chose to build a forward model surrogate. To train the surrogate model, we sampled 10,000 points from the prior and obtained the target feature of interest for all training points by solving the differential equation numerically using the Python scipy module. Figure 6A shows the heat map of the feature of interest estimated via the trained surrogate forward model over the prior.
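A sketch of computing the target feature by numerical ODE solution, as done when building the surrogate training set. The right-hand side below is a hypothetical placeholder with the same two-parameter structure; the actual ODE from [5] is not reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

def target_feature(x):
    # Compute y(x) = (1/2) * integral_0^2 f(t; x) dt by solving the ODE
    # numerically.  The right-hand side is a HYPOTHETICAL placeholder.
    x1, x2 = x
    rhs = lambda t, f: -x1 * f * np.sin(t + x2)
    sol = solve_ivp(rhs, (0.0, 2.0), y0=[1.0], dense_output=True, rtol=1e-8)
    ts = np.linspace(0.0, 2.0, 401)
    f = sol.sol(ts)[0]
    return 0.5 * trapezoid(f, ts)

y = target_feature((1.0, 0.5))
```

Looping this over 10,000 prior samples yields the (input, feature) pairs used to fit the surrogate network.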
The distribution of the target observation Q Y was N(1.8, 0.05 2 ), and the input parameter prior distribution P X was given by x 1 ∼ U(0.8, 1.2) and x 2 ∼ U(0.1, π − 0.1), as in [5]. The joint distributions of the parameters x 1 and x 2 obtained using Rejection, MCMC, c-GAN and r-GAN are plotted in Figures 6D, E, F and G, respectively. Simulating the ODE model with the inferred input parameter samples produces distributions of outputs that match the target observation density accurately (Figure 6B).

III. GAN CONFIGURATION

A. GAN stabilization and training
Reliably training GANs can be challenging due to known problems such as mode collapse. Several approaches have aimed to stabilize adversarial networks [8]. However, in practice it remains very difficult to find a robust and reliable method for GAN stabilization that works for a broad spectrum of generative models. One class of relatively simple stabilization algorithms is designed to increase the entropy of samples produced by the generator. This is accomplished by increasing the mutual information between the input and output of the generator network, which in this case is equivalent to maximizing the entropy of the output. The loss function of the generator is therefore augmented by a mutual information term computed using dual representations of KL-divergence [9] or reconstruction networks, as in VEEGAN [10]. Here, we incorporate the reconstruction network approach, training the reconstruction network to reproduce the latent distribution P Z from samples from the generator. We simplify the approach of VEEGAN, including only the ℓ 2 component of the reconstructor loss, excluding the cross-entropy term, and removing the dependence of the discriminator on z. For simplicity, the r-GAN architecture diagram in the main manuscript excluded the reconstruction network used for stabilization. The complete r-GAN, including the reconstruction network used for training all examples (except the SIP extensions in section I), is shown here in Figure 7.
a) r-GAN loss functions: In Figure 7, discriminator D Y distinguishes between samples from the distribution Q Y and samples generated by the generator G forwarded through the mechanistic model, for which the standard GAN loss is maximized. Discriminator D X distinguishes between samples from the prior over mechanistic parameters P X and samples generated by G, for which the standard GAN loss is likewise maximized. The reconstruction network R aims to reproduce the original base distribution Z from samples generated by G, for which a squared loss is calculated. The generator network G generates mechanistic parameter sets from the base variable Z, with losses calculated from both D Y and D X . The total loss for G is then the weighted sum (13), which is minimized, where w Y = 1.0, w X = 0.1, and w R = 1.0 are the default weights. We used the Adam optimizer with a step size of 0.0001 for G and R, and 0.00002 for D X and D Y . The β 1 and β 2 parameters of the Adam optimizer were set to the default values of 0.9 and 0.999, respectively, as suggested in [11], and the minibatch size was 100. Training was performed in two stages. First, G, R and D X were trained together, with w X = 1.0 and the L D Y term removed from (13) (i.e., w Y = 0), for 100 epochs to initialize G by minimizing D(P X || Q Xg ). Second, the full GAN was trained for 200 epochs on a dataset y ∼ Q Y comprising the MNIST training images for the super-resolution imaging problem, or 10,000 samples for the synthetic datasets used in the other experiments (30,000 samples for the high-dimensional Rosenbrock test in section II-C).
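The weighted-sum generator loss and its stage-1 variant can be sketched as follows; the numeric loss values fed in are placeholders, and the keyword defaults are the weights quoted above.

```python
import numpy as np

def reconstruction_loss(z, z_hat):
    # Squared (l2) reconstruction loss L_R for the reconstruction network R.
    z, z_hat = np.asarray(z), np.asarray(z_hat)
    return float(np.mean(np.sum((z - z_hat) ** 2, axis=-1)))

def generator_total_loss(L_DY, L_DX, L_R, w_Y=1.0, w_X=0.1, w_R=1.0):
    # Weighted sum of the generator's loss terms, as in (13).
    return w_Y * L_DY + w_X * L_DX + w_R * L_R

# Stage 1 of training removes the D_Y term (w_Y = 0) and sets w_X = 1.0,
# so only the prior-matching and reconstruction terms drive G:
stage1_loss = generator_total_loss(0.7, 0.4, 0.2, w_Y=0.0, w_X=1.0)
```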

B. GAN configuration for images
The network configurations used for the discriminators, generator, and reconstruction network in the r-GAN for super-resolution imaging are shown in Figure 8. These networks were incorporated into the r-GAN architecture shown in Figure 7. The binary classifier network was used for JS-divergence calculations on samples from either the full MNIST dataset or subsets according to the labels, and samples from trained GAN or r-GAN generators (see section 6 of the main manuscript). The dropout rate was 0.1 in the discriminators, 0.4 in the reconstruction network, and 0.6 in the binary classifiers. Image size was 28 × 28 for high-resolution (HR) images and 22 × 22 for low-resolution (LR) images.

C. GAN configuration for test functions
All test functions except the high-dimensional Rosenbrock had the same structure, with 2 inputs (x) and 1 output (y). We used identical c-GAN and r-GAN networks in all cases, with the generator, discriminator, and reconstruction networks consisting of densely connected feedforward layers, using either spectral normalization [2] (in the D Y discriminators) or dropout (in the D X discriminators) for regularization.

D. GAN configuration for SIP extensions
For the intervention SIP examples described in section I and shown in Figure 1, we used networks with details shown in Table II. In the shared-parameter study, with multiple generators, the three sets of base variables P z1 , P z2 and P z3 were concatenated as the target for the reconstruction network, and the reconstruction network took all generated values Q Xs,g , Q Xc,g and Q X d,g as input.

IV. CLASSIFIERS FOR JS-DIVERGENCE ESTIMATION AND REGULARIZATION OF GAN DISCRIMINATORS

We estimated the JS-divergence between P X and Q Xg , or between Q Y and Q Yg , using the density ratio trick, with a classifier trained to distinguish between samples from the two distributions [12]. For samples x 1 and x 2 from two distributions X 1 and X 2 , the JS-divergence was calculated according to (14), where S is a classifier trained to distinguish samples from X 1 from samples from X 2 (see Figure 8D), and n 1 and n 2 are the numbers of samples in x 1 and x 2 .
In practice, for each test function and for each method (Rejection, MCMC, c-GAN and r-GAN), we generated 10,000 samples from both the target distribution and the inferred distribution, and randomly sampled 1,000 of each as test sets for the JS-divergence calculation using (14). With the remaining 9,000 samples, we trained 5 classifiers, each using a different random subset of 7,200 samples as the training set and 1,800 samples as the validation set. The error bars in all JSD measures in the figures show the standard deviation of JSD values across these 5 trained classifiers. The classifier was a 2-layer dense network with 100 nodes per layer and softplus activation, and a single-node output layer, trained with the binary cross-entropy loss between the two sets of samples for 1,000 epochs with batch size 1,000, using early stopping with a patience of 40 epochs based on the loss calculated on the validation set.
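A minimal version of the classifier-based estimator, using logistic regression in place of the paper's small dense network. The formula used is the standard density-ratio estimator, JSD ≈ log 2 + (1/2)(mean log S(x 1) + mean log(1 − S(x 2))); the paper's (14) may differ in constants or normalization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_jsd(x1, x2, classifier=None):
    # Density-ratio estimate of JS-divergence between two sample sets,
    # where S(x) approximates P(sample came from X1 | x).
    if classifier is None:
        classifier = LogisticRegression()
    X = np.vstack([x1, x2])
    labels = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])
    classifier.fit(X, labels)
    eps = 1e-7  # clip probabilities to keep the logs finite
    s1 = np.clip(classifier.predict_proba(x1)[:, 1], eps, 1 - eps)
    s2 = np.clip(classifier.predict_proba(x2)[:, 1], eps, 1 - eps)
    return np.log(2.0) + 0.5 * (np.mean(np.log(s1)) + np.mean(np.log(1.0 - s2)))
```

The estimate is near 0 when the two sample sets are indistinguishable and approaches log 2 ≈ 0.693 when a classifier can separate them perfectly.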
As mentioned in section 4.1 of the main manuscript, estimating divergence measures in a high-dimensional space such as images is extremely challenging. In such a high-dimensional space, convolutional classifiers become very effective at discriminating between "true" and generated samples. We used a simple network, relative to the discriminator networks in the r-GAN used for super-resolution imaging, for the binary classification of images shown in Figure 8D. This classification network was simple and highly regularized (dropout rate of 0.6) in order to prevent perfect classification of samples; however, the nature of the classifier's regularization played an important role in the JS-divergence values that were estimated. For JS-divergence estimation in the super-resolution imaging example (see Table I in the main manuscript), we trained classifiers 10 times for each comparison and report the mean JS-divergence estimate for each. Standard deviations were below 2 decimal places, so they are not reported.
The choice of discriminator regularization plays an important role in the solutions to the SIP found by r-GAN, especially for high-dimensional problems. Because the generator defines an implicit density model that is constructed during training, the extent to which the discriminator is able to distinguish real observations from generated samples strongly influences the parameter distributions that are generated by the final trained model. An extremely powerful discriminator should lead to generation of samples that precisely match the observed samples, whereas a less powerful, more highly regularized discriminator should lead to generation of samples that are less precisely aligned with the observed samples. In the super-resolution imaging example, this manifests as a larger JS-divergence between '5's in the MNIST dataset and samples generated by the r-GAN (0.22) than between '5's in the MNIST dataset and samples generated by the GAN with the PULSE model (0.07), which finds exactly one output sample per target image.
An important avenue for future work is to establish methods for selecting among reasonable options for discriminator regularization for a particular problem. This could be accomplished, for example, by training classifiers with the configuration of a discriminator, testing different regularization methods on the target dataset using methods analogous to cross-validation.

Fig. 2. r-GAN modified for an explicitly known intervention scenario. A. An explicit map T can be applied to generated parameters under the control condition Q Xc,g to produce parameters under the perturbed condition Q X d,g . Two mechanistic simulations produce model outputs from parameters generated for each condition. This r-GAN has three discriminators contributing to the generators' losses in weighted sums. The r-GAN enforces the equality of both Q Yc with Q Yc,g and Q Y d with Q Y d,g , while maximizing the overlap between P Xc and Q Xc,g . B. KDEs of the target distribution under control conditions Q Yc (black solid line) and the generated (inferred) output distribution Q Yc,g via r-GAN with explicit mapping (dash-dotted line). C. KDEs of the target distribution after intervention, Q Y d (red solid line), and the inferred output distribution Q Y d,g via r-GAN with explicit mapping (dash-dotted line). D. Joint distribution of the model input parameters inferred via r-GAN with explicit mapping (contour map in blue) for the control observations with distribution Q Yc (black contour lines). E. As in D, but for the joint distribution after intervention (contour map in blue) for Q Y d (red contour lines).

Fig. 3. Two-dimensional Rosenbrock function test. A. Heat map of y over the prior. B. Kernel density estimate (KDE) of the desired target output distribution Q Y (black) and the generated (inferred) output distributions Q Yg using Rejection, MCMC, c-GAN and r-GAN (in green, orange, pink and purple, respectively). C. Left: estimated JS-divergence between samples from P X and Q X ; right: estimated JS-divergence between samples from Q Y and Q Yg for all methods. D-G. 2D histograms of Q X for Rejection (D), MCMC (E), c-GAN (F), and r-GAN (G). The dashed rectangle denotes the bounds set by the prior P X .

Fig. 4. High-dimensional Rosenbrock function test. A. Heat map of y 1 over the marginal prior for x 1 and x 2 . Note, only 2 dimensions of x and 1 dimension of y are displayed for visualization. B. Marginal KDEs of the desired target output distribution Q Y (black) and the generated (inferred) output distributions Q Yg using Rejection, MCMC, c-GAN and r-GAN (in green, orange, pink and purple, respectively). Multiple lines show marginals for all 5 dimensions of y. C. Left: estimated JS-divergence between samples from P X and Q X ; right: estimated JS-divergence between samples from Q Y and Q Yg for all methods. D-G. 2D marginal histograms of x 1 and x 2 from Q X for Rejection (D), MCMC (E), c-GAN (F), and r-GAN (G). The dashed rectangle denotes the bounds set by the prior P X . Note, only 2 dimensions of x are displayed for visualization.

Fig. 5. Piecewise smooth test function. A. Heat map of y over the prior. B. KDE of the desired target output distribution Q Y and the generated (inferred) output distributions Q Yg using Rejection, MCMC, c-GAN and r-GAN (in green, orange, pink and purple, respectively). C. Left: estimated JS-divergence between samples from P X and Q X ; right: estimated JS-divergence between samples from Q Y and Q Yg for all methods. D-G. 2D histograms of Q X for Rejection (D), MCMC (E), c-GAN (F), and r-GAN (G) for x 1 and x 2 . The dashed rectangle denotes the bounds set by the prior P X .

Fig. 6. ODE function test. A. Heat map of y over the prior. B. KDE of the desired target output distribution Q Y (black) and the generated output distributions Q Yg using Rejection, MCMC, c-GAN and r-GAN (in green, orange, pink and purple, respectively). C. Left: estimated JS-divergence between samples from P X and Q X ; right: estimated JS-divergence between samples from Q Y and Q Yg for all methods. D-G. 2D histograms of Q X for Rejection (D), MCMC (E), c-GAN (F), and r-GAN (G). The dashed rectangle denotes the bounds set by the prior P X .

Fig. 7. r-GAN architecture, showing the reconstruction network R and reconstruction loss L R .
Table I details the configuration of each network. All test function examples used w X = 0.03, w R = 3.0 in the second stage of training, and Adam optimizer step sizes of 0.0001 for G and R and 0.00001 for D X and D Y .

Fig. 8. Network configurations used in the GAN and r-GAN for super-resolution imaging. Grey boxes outline 'residual blocks', with black arrows showing 'skip connections', which used a 1 × 1 convolution when size changed between a residual block's inputs and outputs. A. Network used for discriminators D X and D Y , with either high-resolution (HR) or low-resolution (LR) images as inputs, respectively. B. Network structure used for the generator G. The input derived from the base distribution P Z , and the output formed an HR image. C. The reconstruction network matches the discriminator network, but with additional final layers to recreate P Z . D. Network used for the binary classifiers trained for JS-divergence calculations.

TABLE I. NEURAL NETWORKS USED IN C-GAN AND R-GAN ARCHITECTURES.

The intervention examples were trained for 200 epochs at both training stages. The shared-parameter study used w X = 0.1, w R = 3.0, and Adam optimizer step sizes of 0.0001 for G and R, and 0.00002 for D X and D Y . The explicit map study used the same settings, except for w X = 0.03 and Adam optimizer step sizes of 0.00001 for D X and D Y .
Custom code for r-GAN was developed in TensorFlow and PyTorch, and is available at https://github.com/IBM/rgan-demopytorch. All experiments were conducted on a single NVIDIA V100 GPU. r-GAN training in the super-resolution imaging example took approximately 5 minutes. The test function examples each took approximately 1.5 minutes for MCMC, 2 minutes for c-GAN training, and 5.5 minutes for both stages of r-GAN training. The intervention examples took approximately 11 minutes for both stages of r-GAN training.