PhaseGAN: A deep-learning phase-retrieval approach for unpaired datasets

Phase-retrieval approaches based on deep learning (DL) provide a framework to obtain phase information from an intensity hologram or diffraction pattern robustly and in real time. However, current DL architectures applied to the phase problem i) rely on paired datasets, i.e., they are only applicable when a satisfactory solution of the phase problem has already been found, and ii) mostly ignore the physics of the imaging process. Here, we present PhaseGAN, a new DL approach based on Generative Adversarial Networks, which allows the use of unpaired datasets and includes the physics of image formation. Including the image-formation physics enhances the performance of our approach, which provides phase reconstructions where conventional phase-retrieval algorithms fail, e.g., in ultra-fast experiments. Thus, PhaseGAN offers the opportunity to address the phase problem when no phase reconstructions are available but good simulations of the object or data from other experiments are, enabling results that were not possible before.


INTRODUCTION
Phase retrieval, i.e., reconstructing phase information from intensity measurements, is a common problem in coherent imaging techniques such as holography [1], coherent diffraction imaging [2], and ptychography [3,4]. As most detectors record only intensity, the phase information is lost, which makes its reconstruction an ill-posed problem [5,6]. The most common quantitative solutions to the phase problem rely either on deterministic approaches or on iterative ones [7]. Examples of deterministic solutions in holography are approaches based on the transport-of-intensity equation (TIE) [8] or on contrast transfer functions (CTFs) [9]. Such deterministic approaches can only be applied if certain constraints are met. For example, TIE is valid only under paraxial and short-propagation-distance conditions. Furthermore, complex objects can only be reconstructed with TIE when a spatially homogeneous material is assumed [10]. Similarly, CTF applies only to weakly scattering and weakly absorbing objects. Iterative approaches are not limited by these constraints [11,12] and can address not only holography but also coherent diffraction imaging and ptychography. These techniques retrieve the object by alternating between the detector and object spaces and iteratively applying constraints in both spaces, as depicted in Fig. 1(a). This process is computationally expensive, requiring several minutes to converge, which precludes real-time analysis. Furthermore, the convergence of such approaches is not guaranteed.
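The alternating scheme of Fig. 1(a) can be illustrated with a minimal error-reduction (Gerchberg-Saxton-type) loop for near-field holography. This is a sketch under stated assumptions, not the specific algorithms of [11,12]: it uses NumPy, a paraxial Fresnel transfer-function propagator with the constant phase factor dropped, a flat initial guess, and a pure-phase-object constraint in object space; the function names are hypothetical.

```python
import numpy as np


def fresnel_propagate(field, wavelength, z, pixel_size):
    """Paraxial Fresnel propagation over distance z (transfer-function method).

    The global phase factor exp(jkz) is dropped. The operator is unitary,
    and propagating by -z exactly inverts propagating by +z.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)  # both of shape (ny, nx)
    kernel = np.exp(-1j * np.pi * wavelength * z * (fxx**2 + fyy**2))
    return np.fft.ifft2(np.fft.fft2(field) * kernel)


def error_reduction(intensity, wavelength, z, pixel_size, n_iter=50):
    """Alternate between object and detector space, applying one constraint in each."""
    measured_amp = np.sqrt(intensity)
    obj = np.ones_like(measured_amp, dtype=complex)  # flat initial guess
    errors = []
    for _ in range(n_iter):
        det = fresnel_propagate(obj, wavelength, z, pixel_size)
        errors.append(np.linalg.norm(np.abs(det) - measured_amp))
        # Detector-space constraint: keep the phase, impose the measured modulus.
        det = measured_amp * np.exp(1j * np.angle(det))
        obj = fresnel_propagate(det, wavelength, -z, pixel_size)
        # Object-space constraint (assumed here): pure phase object, |psi_O| = 1.
        obj = np.exp(1j * np.angle(obj))
    return obj, errors


# Toy example: a weak Gaussian phase bump, lambda = 1 Å, z = 10 cm, 1 µm pixels.
n = 64
y, x = np.mgrid[:n, :n] - n // 2
phase = 0.5 * np.exp(-(x**2 + y**2) / (2 * 6.0**2))
hologram = np.abs(fresnel_propagate(np.exp(1j * phase), 1e-10, 0.1, 1e-6)) ** 2
recovered, errors = error_reduction(hologram, 1e-10, 0.1, 1e-6, n_iter=30)
```

Because both constraints are applied as exact projections and the propagator is unitary, the recorded error cannot increase from one iteration to the next; it can, however, stagnate at a wrong solution, which is the convergence problem mentioned above.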
Recently, DL has demonstrated potential for solving ill-posed imaging problems, such as holography [13,14], magnetic resonance imaging [15], and phase retrieval [16,17]. DL offers an accurate solution to the phase problem that is computationally fast compared to iterative approaches [13,17] and independent of physical approximations. DL methods need to be trained before they are used. In supervised training, data are fed to a differentiable method with adjustable parameters, e.g., a neural network (NN), which is then tuned to produce the desired output. In classic paired supervision applied to phase retrieval, such training needs the precise output (phase) for every input (intensity). This paired supervision has two main difficulties. First, such approaches require recording large datasets of phase and intensity of exactly the same sample. It is easy to think of conditions where this is not possible: i) Some instruments, such as X-ray free-electron lasers (XFELs) [18,19,20,21], have limited accessibility, which makes it difficult to acquire large paired datasets with them. ii) Phase-retrieval algorithms might not provide good reconstructions or might not be applicable at all. Examples of such scenarios are diffraction experiments where only simulations, but no phase reconstructions, are available [22,23], or Bragg coherent diffraction imaging [24] experiments, where obtaining good phase reconstructions has proven challenging [25,26]. iii) Complementary imaging modalities: e.g., one imaging experiment might provide low-noise, high-spatial-resolution phase reconstructions, while another provides high-noise, lower-resolution detector images of similar samples, but not of exactly the same sample. This is of particular importance when imaging radiosensitive samples with directly or indirectly ionizing radiation, such as X-rays. Such a scenario requires minimizing the deposited dose, i.e., the deposited energy per unit mass.
This is also a typical problem in fast imaging experiments that track dynamics with a reduced number of photons per exposure. iv) Sensing might alter or even destroy the sample, e.g., in diffraction-before-destruction imaging with high-intensity sources such as XFELs [27,28]. In this scenario, acquiring paired measurements with a different modality is impossible. We argue that unpaired training, which only requires random samples from the two experimental setups rather than measurements of the same object, will overcome all four limitations (i-iv).
Second, even if paired data were available, the results are often unsatisfying when solving an ill-posed problem, i.e., when one intensity reading does not map to one specific phase solution [29] but to a distribution of possible explanations. Classic paired training is known to average, i.e., spatially blur, all possible solutions if the solution is not unique [30]. Adversarial training [31] can overcome this problem by augmenting the training with a discriminator, i.e., another NN, whose purpose is to classify both the results of the method being trained and true samples of the data distribution, i.e., phase images from the wild, as either "real" or "fake". The training uses the features that allowed the discriminator to flag a result as fake to improve the method itself, while the discriminator uses the true samples of the data distribution to become picky, i.e., good at telling "real" from "fake". For ill-posed problems such as phase reconstruction, this pushes the solution away from the average of all possible phase images that explain an intensity image (which itself would not be a good phase image, as it is blurry) towards a specific solution that also explains the input but is not blurry.
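The averaging effect can be seen in a toy 1D example (an illustration, not taken from the paper): if a measurement is explained equally well by two sharp solutions, the single prediction that minimises the expected pixel-wise L2 error is their mean, which is blurred and matches neither solution.

```python
import numpy as np

# Two equally likely "sharp" solutions that explain the same measurement:
# a step edge at pixel 40 or at pixel 60 (a toy stand-in for an ambiguous phase).
x = np.arange(100)
solution_a = (x >= 40).astype(float)
solution_b = (x >= 60).astype(float)


def expected_mse(prediction):
    """Expected squared error when the truth is a or b with probability 1/2 each."""
    return 0.5 * (np.mean((prediction - solution_a) ** 2)
                  + np.mean((prediction - solution_b) ** 2))


average = 0.5 * (solution_a + solution_b)  # the pixel-wise L2-optimal estimate

print(expected_mse(solution_a))  # 0.1: a sharp answer that is wrong half of the time
print(expected_mse(average))     # 0.05: lower L2 loss, but a blurred edge
print(np.unique(average))        # contains 0.5, a value that neither solution has
```

An L2-trained network is therefore rewarded for the blurred average; an adversarial discriminator that has only seen sharp edges would flag it as "fake".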
Recent adversarial DL schemes have shown the possibility of training on unpaired datasets, that is, a set of images captured with one modality and another set captured with a different modality, not necessarily of the same object. CycleGAN [32] learns a pair of cycle-consistent functions that map from one modality to the other such that their composition is the identity. This consistency constraint is analogous to the constraint applied in iterative phase-reconstruction algorithms [11,5], where cyclic constraints are applied between the sample and detector spaces. Thus, approaches based on CycleGAN offer a framework for phase reconstruction that mimics the structure of iterative approaches without being limited to paired datasets.
In this paper, we demonstrate a DL implementation based on CycleGAN, christened PhaseGAN. PhaseGAN naturally includes the physics of image formation as it cycles between the sample and detector domains. By including the physics of image formation and other learning constraints, PhaseGAN yields better phase reconstructions than CycleGAN, comparable to those of state-of-the-art paired approaches.
The remainder of this paper is structured as follows: First, we describe our approach's architecture and how the physics of the image formation is included. Second, we validate PhaseGAN with synthetic data for in-line holographic (near-field) experiments. In this validation step, we demonstrate the relevance of including the physical model by comparing the results with CycleGAN. Furthermore, we demonstrate that our unpaired approach performs at the level of state-of-the-art paired approaches. Third, we apply PhaseGAN to fast-imaging experimental data where noisy readings of a MHz camera are reconstructed using low-noise phase reconstructions recorded with a different setup and samples. Finally, we discuss the results and future applications of PhaseGAN to experiments where phase reconstructions are not possible today.

THE PHASEGAN APPROACH
This section describes the architecture of PhaseGAN and how it uses physical knowledge to enhance the phase reconstructions. We then describe the training process and our loss function, which includes terms that avoid typical phase-reconstruction artifacts such as missing frequencies or the twin-image problem [1,33].
The architecture of PhaseGAN is based on CycleGAN [32]. CycleGAN uses two Generative Adversarial Networks (GANs), which translate an image from a domain A to a domain B and back from B to A. The cycle consistency between two domains can thus be adapted to the object and detector domains, allowing CycleGAN to perform phase reconstructions by mimicking the structure of iterative phase-retrieval approaches, as shown in Fig. 1(b). The main difference between iterative phase-retrieval approaches and CycleGAN approaches is that the former include the propagator (H), which describes the physics of image formation between the object and detector spaces. PhaseGAN combines the iterative and the CycleGAN approaches by arranging two GANs in a cycle together with the physics of image formation via the propagator. The scheme of PhaseGAN is depicted in Fig. 1(c), where each GAN is decomposed into its generator (G) and discriminator (D). The generators used in PhaseGAN are U-Net [34]-like, end-to-end, fully convolutional neural networks; for details, see Supplement 1. The discriminators are PatchGAN discriminators [30,32]. G_O is the phase-reconstruction generator, which takes the measured intensities (a single-channel input) and produces a two-channel output, where the two channels can be either the real and imaginary parts or the phase and amplitude of the complex object wavefield (ψ_O). D_O is the discriminator of the phase reconstruction. The object wavefield ψ_O is then propagated to the detector plane using the non-learnable but differentiable operator H (ψ_D = Hψ_O), and the intensity in the detector plane, |ψ_D|², is computed. The propagator H is the near-field Fresnel propagator [35]. G_D completes the cycle and works as an auxiliary generator, mapping the propagated intensity |ψ_D|² to the measured detector intensity, with a single channel for both input and output.
Because the propagator H is included explicitly, G_D does not need to learn the well-known physical propagation; it only learns the experimental effects of the intensity measurements, e.g., the point-spread function and flat-field artifacts. Finally, the intensity discriminator D_D classifies intensity measurements as "real" or "fake". For more details about the PhaseGAN architecture, see Supplement 1.
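The fixed part of the cycle (two-channel output to complex ψ_O, propagation ψ_D = Hψ_O, intensity |ψ_D|², then G_D) can be sketched in NumPy as below. The channel ordering, the function names, and the identity function standing in for the trained G_D are hypothetical; the paper's implementation uses PyTorch (Supplement 1), in which the same operations are differentiable.

```python
import numpy as np


def to_complex(two_channel, mode="real_imag"):
    """Combine a two-channel generator output into a complex wavefield.

    Channel order is an assumption: (real, imag) or (phase, amplitude).
    """
    c0, c1 = two_channel
    if mode == "real_imag":
        return c0 + 1j * c1
    if mode == "phase_amp":
        return c1 * np.exp(1j * c0)
    raise ValueError(f"unknown mode: {mode}")


def fresnel_propagate(field, wavelength, z, pixel_size):
    """Paraxial Fresnel transfer-function propagator H (global phase dropped)."""
    ny, nx = field.shape
    fxx, fyy = np.meshgrid(np.fft.fftfreq(nx, d=pixel_size),
                           np.fft.fftfreq(ny, d=pixel_size))
    kernel = np.exp(-1j * np.pi * wavelength * z * (fxx**2 + fyy**2))
    return np.fft.ifft2(np.fft.fft2(field) * kernel)


def object_to_detector(two_channel, g_d, wavelength, z, pixel_size, mode="real_imag"):
    """psi_O -> psi_D = H psi_O -> |psi_D|^2 -> G_D(|psi_D|^2)."""
    psi_o = to_complex(two_channel, mode)
    psi_d = fresnel_propagate(psi_o, wavelength, z, pixel_size)
    return g_d(np.abs(psi_d) ** 2)


def identity(img):
    """Hypothetical stand-in for the trained G_D network."""
    return img


# A plane wave (phase 0, amplitude 2) stays a plane wave: intensity 4 everywhere.
plane_wave = np.stack([np.zeros((32, 32)), np.full((32, 32), 2.0)])
detector_image = object_to_detector(plane_wave, identity, 1e-10, 0.1, 1e-6, mode="phase_amp")
```

In the real network, only G_D is learned; H stays fixed, which is why G_D can restrict itself to detector effects such as blur or flat-field artifacts.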
Our goal is to learn two mappings simultaneously: i) G_O: I_D → ψ_O, from detector images to the complex object wavefield, and ii) G_D: |ψ_D|² → I_D, from propagated diffraction patterns to detector images. This goal is achieved by optimizing

$$G_O^{*}, G_D^{*} = \arg\min_{G_O, G_D}\,\max_{D_O, D_D}\; \mathcal{L}_{GAN}(G_O, D_O, G_D, D_D) + \alpha_{Cyc}\,\mathcal{L}_{Cyc}(G_O, G_D) + \alpha_{FRC}\,\mathcal{L}_{FRC}(G_O, G_D). \tag{1}$$

This objective combines three terms: an adversarial term, a cycle-consistency term, and a Fourier Ring Correlation (FRC) term. The weights of the cycle-consistency and FRC losses relative to the adversarial loss are α_Cyc and α_FRC, respectively. The learning process is depicted schematically in Fig. 2.
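The first term of Eq. (1), L_GAN, is the adversarial loss [31]:

$$\mathcal{L}_{GAN}(G_O, D_O, G_D, D_D) = \mathbb{E}_{\psi\sim\Psi}\left[\log D_O(\psi)\right] + \mathbb{E}_{I\sim\mathcal{I}}\left[\log\left(1 - D_O(G_O(I))\right)\right] + \mathbb{E}_{I\sim\mathcal{I}}\left[\log D_D(I)\right] + \mathbb{E}_{\psi\sim\Psi}\left[\log\left(1 - D_D(G_D(|H\psi|^2))\right)\right]. \tag{2}$$

In Eq. (2), E_{x∼X} denotes the expectation over the distribution X, and Ψ and 𝓘 are the phase and intensity distributions, respectively. The second term of Eq. (1), L_Cyc, enforces cycle consistency, which constrains the generators so that they do not simply produce arbitrary images that follow the distribution of the target dataset. As shown in Fig. 2, regardless of where we start the loop, we should end up at the starting point, i.e., G_D(|H G_O(I)|²) ≈ I and G_O(G_D(|Hψ|²)) ≈ ψ. This cycle-consistency loss can be expressed as:

$$\mathcal{L}_{Cyc}(G_O, G_D) = \mathbb{E}_{I\sim\mathcal{I}}\left[\lVert G_D(|H G_O(I)|^2) - I \rVert_1\right] + \mathbb{E}_{\psi\sim\Psi}\left[\lVert G_O(G_D(|H\psi|^2)) - \psi \rVert_1\right]. \tag{3}$$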
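The two cycle terms can be computed as in the sketch below, assuming a mean absolute (L1) error as in CycleGAN. The generators and the propagator are passed in as plain callables; the phase-blind generator in the demo is a hypothetical toy, not a trained network. With H set to the identity (zero propagation distance), a generator that returns only the amplitude √I passes the detector cycle perfectly but is penalised by the object cycle, which illustrates why both cycles are needed.

```python
import numpy as np


def cycle_consistency_terms(intensity, psi, g_o, g_d, propagate):
    """Return the (detector-cycle, object-cycle) terms of L_Cyc, using an L1 error."""
    # Detector cycle: I -> G_O -> H -> |.|^2 -> G_D should return I.
    i_cycle = g_d(np.abs(propagate(g_o(intensity))) ** 2)
    # Object cycle: psi -> H -> |.|^2 -> G_D -> G_O should return psi.
    psi_cycle = g_o(g_d(np.abs(propagate(psi)) ** 2))
    return np.mean(np.abs(i_cycle - intensity)), np.mean(np.abs(psi_cycle - psi))


def identity(x):
    """Stand-in for H at zero distance and for an ideal detector G_D (hypothetical)."""
    return x


def phase_blind_g_o(intensity):
    """Hypothetical toy G_O that outputs the amplitude sqrt(I) with zero phase."""
    return np.sqrt(intensity) + 0j


rng = np.random.default_rng(0)
intensity = rng.uniform(0.5, 1.5, (16, 16))
psi = np.exp(1j * rng.uniform(0, np.pi, (16, 16)))  # pure phase object
det_term, obj_term = cycle_consistency_terms(intensity, psi, phase_blind_g_o,
                                             identity, identity)
```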

Fig. 2.
Learning process diagram. Our aim is to learn a mapping G_O from the intensity-sensing regime (right) to the phase modality (left). We require this mapping G_O to fulfill two cyclic constraints. First (blue), when its phase result is mapped back to the intensity domain using the non-learned physical operator H and the learned correction G_D, the result should be similar (dotted line) to the input intensity. Second (red), when a phase is mapped to intensity and back, it should remain the same. Further, we train two discriminators, D_D and D_O, to classify real and generated intensity and phase samples as real or fake (green). Finally, we require the Fourier transforms of both intensity and phase (another fixed but differentiable operation) to match those of the input after one cycle.
The last term in Eq. (1), L_FRC, is based on the FRC. The FRC takes two images or complex waves and measures their normalised cross-correlation in Fourier space over rings [36,37]. It can help to avoid common frequency artifacts such as the twin-image problem [1,33] or missing frequencies due to the physical propagation. L_FRC is defined as

$$\mathcal{L}_{FRC}(G_O, G_D) = \mathbb{E}_{I\sim\mathcal{I}}\left[\mathrm{FRC}\left(G_D(|H G_O(I)|^2), I\right)\right] + \mathbb{E}_{\psi\sim\Psi}\left[\mathrm{FRC}\left(G_O(G_D(|H\psi|^2)), \psi\right)\right], \tag{4}$$

where FRC denotes the operator that calculates the Fourier ring correlation over all Fourier-space rings.
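A minimal FRC computation is sketched below (NumPy, one-pixel-wide rings, real part of the ring cross-correlation). The ring binning and the use of the real part are assumptions, and the exact way the FRC enters L_FRC is not spelled out here. The helper `frc_metric` follows the FRCM definition given in the validation section (mean squared deviation of the FRC from unity), so smaller is better.

```python
import numpy as np


def fourier_ring_correlation(a, b, eps=1e-12):
    """FRC between two equally sized 2D arrays (real images or complex waves).

    Rings are one-pixel-wide annuli around zero frequency; rings outside the
    inscribed circle (the corners of the spectrum) are discarded.
    """
    fa = np.fft.fftshift(np.fft.fft2(a))
    fb = np.fft.fftshift(np.fft.fft2(b))
    ny, nx = a.shape
    yy, xx = np.indices((ny, nx))
    ring = np.rint(np.hypot(yy - ny // 2, xx - nx // 2)).astype(int).ravel()
    cross = (fa * np.conj(fb)).ravel()
    num = np.bincount(ring, weights=cross.real)  # ring-summed cross-spectrum (real part)
    pow_a = np.bincount(ring, weights=(np.abs(fa) ** 2).ravel())
    pow_b = np.bincount(ring, weights=(np.abs(fb) ** 2).ravel())
    n_rings = min(ny, nx) // 2
    return (num / np.sqrt(pow_a * pow_b + eps))[:n_rings]


def frc_metric(a, b):
    """FRCM-style score: mean squared deviation of the FRC from unity (0 is best)."""
    return np.mean((fourier_ring_correlation(a, b) - 1.0) ** 2)


rng = np.random.default_rng(0)
image = 1.0 + rng.random((64, 64))
```

Identical inputs give an FRC of 1 in every ring (FRCM of 0), while a contrast-inverted copy gives −1 in every ring (FRCM of 4); missing or corrupted frequency bands show up as rings with reduced correlation.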

VALIDATION RESULTS
In this section, we perform phase-retrieval experiments to validate PhaseGAN. Furthermore, we compare its performance to other state-of-the-art DL methods. This comparison is made with synthetic data in the near-field regime.
To this end, we simulate X-ray imaging experiments in the near-field regime. The synthetic training dataset consists of 10,000 complex objects and 10,000 synthetic detector images. These sets are unpaired; however, the paired solutions for the detector and object simulations are available for validation and for training the state-of-the-art paired approaches. The wavelength in these experiments is λ = 1 Å, and the pixel size in object space is 1 µm. Objects are composed of a random number (between one and N) of rectangles and circles in a 256 × 256 frame. The complex wavefront of such objects is given by their transmissivity, which is determined by their complex index of refraction n = 1 − δ + jβ and a random thickness t, up to a maximum thickness t_max of 10 nm. For our simulations, δ and β are fixed to 10⁻³ and 10⁻⁶, respectively. In the projection approximation, the complex wavefront after the object is given by

$$\psi_O(\mathbf{r}) = \psi_i(\mathbf{r})\,\exp\left[-jk(\delta - j\beta)\,t(\mathbf{r})\right], \tag{5}$$

where ψ_i is the illumination wavefront at the object plane, k = 2π/λ is the wavenumber, r denotes the frame coordinates, and t(r) is the thickness map of the frame. This wavefront is then propagated to the detector (Hψ_O) using the near-field propagator. The near-field detector has an effective pixel size of 1 µm (equal to the simulated sample pixel size) and is assumed to be 10 cm from the sample. We also include flat-field noise, i.e., a variable ψ_i for each frame, simulated with 15 elements of a basis extracted by Principal Component Analysis (PCA) from MHz-imaging data acquired at the European XFEL [38]. Examples of the simulated holograms can be found in Supplement 1. We assume that the detector has photon-counting capabilities; thus, the noise follows Poisson statistics. Approximately 6.6 × 10⁷ photons are simulated per frame. We compare the performance of PhaseGAN to three other methods.
The first is a classic supervised learning approach using paired datasets and an L2 loss, as used by most current phase-retrieval approaches. The second uses the same architecture with additional adversarial terms, as in pix2pix [39]. The global loss function of this pix2pix method is

$$\mathcal{L}_{pix2pix}(G_O, D_O) = \mathbb{E}_{\psi\sim\Psi}\left[\log D_O(\psi)\right] + \mathbb{E}_{I\sim\mathcal{I}}\left[\log\left(1 - D_O(G_O(I))\right)\right] + \alpha_{MSE}\,\mathbb{E}_{(I,\psi)}\left[\lVert G_O(I) - \psi \rVert_2^2\right]. \tag{6}$$

The first two terms of Eq. (6) calculate the adversarial loss in a similar way to L_GAN in Eq. (2). The weight of the L2 loss, α_MSE, was set to 100. The third method is the standard CycleGAN approach presented in Fig. 1(b). We use the same global loss function as in Eq. (1), but without including the physics of image formation (H) in Eqs. (2), (3), and (4). For the training of CycleGAN, we found that optimal performance was obtained with α_Cyc = 20 and α_FRC = 4, with an additional weight of 2.5 on the first terms of Eqs. (3) and (4). For the PhaseGAN training, we set α_Cyc = 20 and α_FRC = 10. For all experiments, we use the same phase-retrieval network G_O and the same training dataset. The dataset was paired for the training of the first two methods, but unpaired for the training of CycleGAN and PhaseGAN. The ADAM optimizer [40] with a batch size of 16 was used throughout the training. The generator learning rates were set to 0.0002 for all four methods. For pix2pix, CycleGAN, and PhaseGAN, the discriminator learning rates were set to 0.0001. We divided all learning rates by 10 every 30 epochs and stopped training after 70 epochs. The phase-retrieved results are quantified using the L2 norm, the Structural Dissimilarity Index Metric (DSSIM) [41], and the Fourier Ring Correlation Metric (FRCM). FRCM calculates the mean square of the difference between the Fourier ring correlation and unity over all spatial frequencies; thus, smaller FRCM values imply a higher similarity between two images. Please note that such metrics are only partially able to capture the ability of a GAN to produce samples of the data distribution [42]. Note also that, although these metrics require the reference solution, for our method and for CycleGAN this solution is used only to compute the metrics, never in training. For a qualitative assessment, the reader is referred to Table 1, which depicts the real and imaginary parts of a zoomed-in area of one of the validation samples (the oracle) and the results retrieved by each method. In Table 1, we also report, for each of the four DL methods, the logarithmic frequency distribution and the average value (µ) of the aforementioned validation metrics over 1000 validation images.
More information about the statistical distribution of the metric values, as well as line profiles through different validation images, can be found in Supplement 1.
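The synthetic-data pipeline described earlier in this section can be sketched as follows. The physical parameters (λ = 1 Å, 1 µm pixels, 10 cm propagation, δ = 10⁻³, β = 10⁻⁶, t_max = 10 nm, about 6.6 × 10⁷ photons per frame) are those quoted in the text. Everything else is an assumption made for illustration: flat illumination ψ_i = 1 (the PCA-based flat-field noise is omitted), at most 25 shapes per frame (the range quoted in Supplement 1), the shape-size ranges, and the rule that overlapping shapes keep the larger thickness.

```python
import numpy as np

WAVELENGTH = 1e-10       # 1 Å
PIXEL = 1e-6             # 1 µm, object and detector
DISTANCE = 0.1           # 10 cm
DELTA, BETA = 1e-3, 1e-6
T_MAX = 10e-9            # 10 nm


def random_thickness(rng, n=256, max_shapes=25):
    """Thickness map built from 1..max_shapes random circles and rectangles."""
    yy, xx = np.mgrid[:n, :n]
    t = np.zeros((n, n))
    for _ in range(rng.integers(1, max_shapes + 1)):
        cy, cx = rng.integers(0, n, size=2)
        if rng.random() < 0.5:  # circle (radius range is an assumption)
            radius = rng.integers(4, n // 8)
            mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        else:  # axis-aligned rectangle (half-size range is an assumption)
            hh, hw = rng.integers(4, n // 8, size=2)
            mask = (np.abs(yy - cy) <= hh) & (np.abs(xx - cx) <= hw)
        # Overlaps keep the thicker value, so t never exceeds T_MAX (assumption).
        t = np.maximum(t, rng.uniform(0, T_MAX) * mask)
    return t


def simulate_hologram(rng, n=256, photons=6.6e7):
    """Return the object wavefield psi_O and a Poisson-noisy near-field hologram."""
    k = 2 * np.pi / WAVELENGTH
    t = random_thickness(rng, n)
    # Projection approximation with flat illumination psi_i = 1.
    psi_o = np.exp(-1j * k * (DELTA - 1j * BETA) * t)
    f = np.fft.fftfreq(n, d=PIXEL)
    fxx, fyy = np.meshgrid(f, f)
    kernel = np.exp(-1j * np.pi * WAVELENGTH * DISTANCE * (fxx**2 + fyy**2))
    intensity = np.abs(np.fft.ifft2(np.fft.fft2(psi_o) * kernel)) ** 2
    # Scale to the photon budget, then draw photon-counting (Poisson) noise.
    counts = rng.poisson(intensity * photons / intensity.sum())
    return psi_o, counts


psi_o, counts = simulate_hologram(np.random.default_rng(1))
```

With these values the maximum phase shift of a single shape is kδt_max ≈ 0.63 rad, and each pixel receives on the order of 10³ photons.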

EXPERIMENTAL RESULTS
In this section, we apply PhaseGAN to experimental data recorded at the Advanced Photon Source (APS), where unpaired data of metallic foams were recorded with two different detectors in independent sensing experiments.
PhaseGAN offers the opportunity to obtain phase information when phase reconstructions are otherwise not possible. To demonstrate this, we performed time-resolved X-ray imaging experiments of the cell-wall rupture of metallic foams at APS [43]. The coalescence of two bubbles caused by cell-wall rupture is a crucial process that determines the final structure of a metallic foam [44]. This process can happen within microseconds; thus, MHz microscopy techniques are required to explore it. For this reason, we performed ultra-fast experiments with an X-ray imaging system based on a Photron FastcamSA-Z camera with 2 µm effective pixel size. The Photron system acquired the cell-wall rupture movies at a frame rate of 210 kHz, with each frame integrating over 31 APS pulses. Although each Photron image used only a few pulses, the images had good contrast, which allowed meaningful phase reconstructions. Images acquired by the Photron system were interpolated to an effective pixel size of 1.6 µm and filtered using 100 iterations of a total-variation denoising algorithm [45] with denoising parameter λ = 1.5. The resulting images were phase-reconstructed using a TIE approach for homogeneous (single-material) objects [10], assuming X-ray photons of 25.7 keV, δ/β = 10³, and a propagation distance z = 5 mm. The phase and attenuation reconstructions of one Photron frame are shown in Fig. 3(a) and (b), respectively. To increase the temporal resolution and use single APS pulses, we used an X-ray MHz acquisition system based on a Shimadzu HPV-X2 camera with an effective pixel size of 3.2 µm. This system was used to record movies of dynamic phenomena in liquid metallic foams using single pulses provided by APS at a repetition frequency of 6.5 MHz. An example of a frame recorded with this system is shown in Fig. 3(c). However, the contrast and noise level were insufficient to perform phase reconstructions with current approaches.
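Assuming that the single-material TIE approach of [10] is the widely used Paganin-type low-pass filter, the reconstruction step for the Photron frames can be sketched as below. Interpolation and total-variation denoising are omitted, the function name is hypothetical, and the sign convention follows the projection approximation used for the synthetic data, so the returned phase is −kδt.

```python
import numpy as np


def paganin_phase(intensity, flat, wavelength, z, pixel_size, delta_over_beta):
    """Single-material (Paganin-type) TIE phase retrieval, phase in radians.

    The attenuation mu*t would be -log(filtered) instead of the returned phase.
    """
    ny, nx = intensity.shape
    fxx, fyy = np.meshgrid(np.fft.fftfreq(nx, d=pixel_size),
                           np.fft.fftfreq(ny, d=pixel_size))
    # Low-pass filter 1 / (1 + pi * lambda * z * (delta/beta) * |f|^2).
    denom = 1.0 + np.pi * wavelength * z * delta_over_beta * (fxx**2 + fyy**2)
    filtered = np.real(np.fft.ifft2(np.fft.fft2(intensity / flat) / denom))
    filtered = np.clip(filtered, 1e-12, None)  # guard the logarithm
    # phi = -k*delta*t = (delta / (2*beta)) * ln(exp(-mu*t)), with mu = 2*k*beta.
    return 0.5 * delta_over_beta * np.log(filtered)


# Photron parameters quoted in the text: 25.7 keV, delta/beta = 1e3, z = 5 mm, 1.6 µm pixels.
wavelength = 1.23984e-6 / 25.7e3  # hc/E in metres, about 0.48 Å
```

For a recorded frame, one would call `paganin_phase(frame, flat, wavelength, 5e-3, 1.6e-6, 1e3)`, where `frame` and `flat` are hypothetical arrays holding the image and its flat field. A uniform transmission of 0.8 returns a uniform phase of (δ/2β)·ln 0.8 ≈ −111.6 rad, because the filter leaves the zero frequency unchanged.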
To overcome the impossibility of reconstructing the phase from the frames recorded by the Shimadzu system, we used PhaseGAN. The training dataset consists of 10,000 Photron frames and 10,000 Shimadzu frames, with frame sizes of 480 × 200 and 128 × 128 pixels, respectively. Because the pixel sizes of the two imaging systems differ, the Photron and Shimadzu images were cropped to 200 × 200 and 100 × 100 pixels, respectively, before being fed into the NN, so that the fields of view in the two imaging domains match. We performed data augmentation by applying random rotations and flips to the randomly cropped training images to take full advantage of PhaseGAN's capabilities. As in supervised learning, data augmentation is indispensable in unsupervised approaches for the neural network to learn the desired robustness properties [46], especially when only limited training examples are available. In our case, the holograms were captured by kHz-to-MHz camera systems, which makes the detector frames very similar to each other. Without data augmentation, PhaseGAN does not learn the desired mappings from one domain to the other but only memorizes the features common to all frames. During training, the cropped Photron and Shimadzu frames were subsequently padded to 256 × 256 and 128 × 128 pixels, respectively. We slightly modified the PhaseGAN architecture for the metallic-foam training: an extra transposed-convolution step was added to the expanding path of G_O to double the size of the output images, because the Photron pixel size is half that of the Shimadzu detector. Conversely, the last transposed convolutional layer of G_D was replaced by a normal convolutional layer to accommodate the double pixel size of the Shimadzu detector with respect to the Photron detector. We set α_Cyc = 150 and α_FRC = 10.
The ADAM optimizer was adopted for the metallic-foam training, with the same learning rates as for the synthetic data and a batch size of 40. The training was stopped after 100 epochs. The PhaseGAN phase and attenuation outputs for the Shimadzu frame depicted in Fig. 3(c) are shown in Fig. 3(d) and (e), respectively. A complete movie of the cell-wall rupture of a metallic foam (FORMGRIP alloy [47]) and its phase and attenuation reconstructions using PhaseGAN are provided in the supplemental Visualization 1, 2, and 3. The movie shows that the coalescence of the two bubbles was completed within 10 µs. Reconstructing the 61 frames of the movie took 24.4 ms in total, i.e., 0.4 ms per frame (about 2,500 frames per second). Thus, PhaseGAN offers an opportunity for real-time analysis.
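The cropping, augmentation and padding steps described above can be sketched as follows. Restricting rotations to multiples of 90°, the reflect padding mode, and the (rows, columns) orientation of the 480 × 200 Photron frames are assumptions; random arrays stand in for real frames. Both crops cover the same 320 µm field of view (200 × 1.6 µm = 100 × 3.2 µm).

```python
import numpy as np


def augment(frame, crop, pad_to, rng):
    """Random square crop, random 90-degree rotation and flips, then padding.

    The rotation set and the reflect padding mode are assumptions; the text
    only states random rotations, flips, cropping and padding.
    """
    h, w = frame.shape
    y0 = rng.integers(0, h - crop + 1)
    x0 = rng.integers(0, w - crop + 1)
    patch = frame[y0:y0 + crop, x0:x0 + crop]
    patch = np.rot90(patch, k=rng.integers(0, 4))
    if rng.random() < 0.5:
        patch = np.fliplr(patch)
    if rng.random() < 0.5:
        patch = np.flipud(patch)
    before = (pad_to - crop) // 2
    after = pad_to - crop - before
    return np.pad(patch, ((before, after), (before, after)), mode="reflect")


rng = np.random.default_rng(0)
photron = augment(rng.random((480, 200)), crop=200, pad_to=256, rng=rng)   # 1.6 µm pixels
shimadzu = augment(rng.random((128, 128)), crop=100, pad_to=128, rng=rng)  # 3.2 µm pixels
```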

DISCUSSION
We have presented PhaseGAN, a novel DL phase-retrieval approach. Unlike other approaches, PhaseGAN provides, for the first time, phase reconstructions from unpaired datasets. The cyclic structure of PhaseGAN makes it possible to include the physics of image formation in the learning loop, which further enhances the capabilities of unpaired DL approaches such as CycleGAN. Although we did not include typical constraints of iterative phase-retrieval approaches, such as support, histogram constraints, and sample symmetries, PhaseGAN performs at the level of state-of-the-art DL phase-reconstruction approaches. PhaseGAN's cyclic approach could, however, be adapted to include such constraints to enhance its capabilities further. Another key ingredient of PhaseGAN is the FRC loss term, which penalizes common phase-reconstruction artifacts that are easy to filter in the Fourier domain, such as missing frequencies and the twin-image problem [1,33].
We have demonstrated PhaseGAN's capabilities with near-field holographic experiments and compared the results to i) state-of-the-art paired approaches, ii) a GAN method following the pix2pix approach, and iii) CycleGAN. The results, obtained with the same training datasets (paired when needed) and the same phase-retrieval generator (G_O), demonstrate the unique capabilities of PhaseGAN. These results are reported in Table 1. From this table, we conclude that both paired approaches retrieve competitive phase reconstructions, quantitatively and qualitatively. CycleGAN, owing to the challenge of training on unpaired datasets, clearly performs worse than the paired approaches. PhaseGAN, although also unpaired, retrieves results at the level of the paired-training approaches.
We have applied PhaseGAN to time-resolved X-ray imaging experiments using single pulses of a storage ring to study the cell-wall rupture of metallic foams. In this imaging modality, the limited number of photons per pulse results in noisy images with low contrast and low resolution, which cannot be phase-reconstructed with current approaches. However, such an acquisition scheme makes it possible to record dynamics at MHz frame rates. In parallel, we acquired a less noisy, higher-contrast dataset that allowed phase reconstructions. This dataset was obtained by integrating over 31 pulses and had about half the pixel size of the time-resolved dataset. By training on these two different sensing experiments, performed on different realizations of metallic foam, we demonstrate the capability of PhaseGAN to produce phase reconstructions that are not possible with any current approach.

CONCLUSIONS
To conclude, we have presented a novel cyclic DL approach for phase reconstruction, called PhaseGAN. This approach includes the physics of image formation and can use unpaired training datasets to enhance the capabilities of current DL-based phase-retrieval approaches. We have demonstrated the unique capabilities of PhaseGAN to address the phase problem when no phase reconstructions are available, but good simulations of the object or data from other experiments are. This will enable phase reconstructions that are not possible today by correlating two independent experiments on similar samples. For example, it will open the possibility of phase reconstructions and denoising with X-ray imaging from low-dose in-vivo measurements by correlating them with higher-dose and lower-noise measurements performed on ex-vivo samples of similar tissues and structures. It has the potential to denoise and reconstruct the phase of time-resolved experiments to track faster phenomena with a limited number of photons per frame.
The PhaseGAN code is available at GitHub.

ACKNOWLEDGMENTS
We are grateful to Z. Matej for his support and access to the GPU-computing cluster at MAX IV. The presented research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Quadro P4000 GPU used for this research.

PhaseGAN: supplemental document
This document provides supplementary information to "PhaseGAN: A deep-learning phase-retrieval approach for unpaired datasets". In this material, we elaborate on the architecture of PhaseGAN. We also report and depict the results obtained by PhaseGAN when applied to the validation and experimental datasets.

PHASEGAN ARCHITECTURE
This section describes the architecture used for PhaseGAN. The generators used in PhaseGAN are U-Net [1]-type, end-to-end, fully convolutional neural networks. As shown in Fig. S1, the generator architecture consists of a contracting and an expansive path. In the contracting path, the spatial resolution is reduced and the feature information is increased. The contracting path in our model contains multiple convolutional layers with kernel size 3 × 3, each followed by a ReLU activation function. Max-pooling operations with kernel size 2 × 2 are applied after 5 of the convolutional layers. Each max pooling halves the image size, which decreases from 256 × 256 to 8 × 8 pixels at the lowest resolution. The number of feature channels is doubled after each pooling operation. The extracted feature information is relocalized in the expansive path by combining upsampled feature maps with the skip-connected high-resolution components from the contracting path. In the expansive path, the image resolution is recovered by repeated application of transposed convolutions. The transposed-convolution outputs are concatenated with the associated feature maps from the contracting path and then sent into the corresponding convolutional layers. The generator weights are initialized with a pre-trained VGG11 encoder to improve model performance and accelerate the training process [2].
The discriminators used in this work are PatchGAN discriminators similar to those used in [3,4]. They contain four convolutional layers with 4 × 4 filters, with the number of filters gradually increasing by a factor of 2 from 64 to 512. Each convolutional layer is followed by a batch-normalization layer and a leaky ReLU activation function with a slope of 0.2. The discriminators are trained to distinguish real images from those faked by the generator. For an image of size 256 × 256, the discriminator outputs a 30 × 30 matrix, where each matrix element corresponds to a 70 × 70 image area and indicates whether this patch comes from the training dataset or not.
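The 30 × 30 output and the 70 × 70 patch size follow from standard convolution arithmetic. The sketch assumes the usual pix2pix layout, which the text does not spell out: the four 4 × 4 feature layers use strides 2, 2, 2, 1 with padding 1, followed by a final 4 × 4, stride-1 convolution to a single output channel.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a 2D convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def patchgan_geometry(input_size=256, kernel=4, strides=(2, 2, 2, 1, 1), padding=1):
    """Output map size and receptive field of a stack of k x k convolutions."""
    size = input_size
    for stride in strides:
        size = conv_output_size(size, kernel, stride, padding)
    receptive_field = 1
    for stride in reversed(strides):  # walk back from one output pixel to the input
        receptive_field = (receptive_field - 1) * stride + kernel
    return size, receptive_field


print(patchgan_geometry())  # (30, 70)
```

The receptive field does not depend on the input size, so the same discriminator judges 70 × 70 patches whether it receives 256 × 256 or 128 × 128 images.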
The PhaseGAN architecture was trained on the MAX IV computing cluster. We used Nvidia Tesla V100 SXM2 GPUs with 16 and 32 GB of memory to train on the synthetic and metallic-foam datasets, respectively. For a given dataset, the training speed depends on various factors, including the network architecture, the batch size, and the device memory. Training on the metallic-foam dataset with 32 GB of memory and a batch size of 40 took less than 10 hours for 100 epochs. The reconstruction process is less time-consuming: reconstructing 50 frames took 20 ms. Each generator contains 22.93 million learnable parameters, and each discriminator 2.76 million. The model sizes of the trained generator and discriminator are 460 MB and 55 MB, respectively.
We provide the PyTorch implementation of PhaseGAN, which is based on the architectures from [4] and [2]. The PhaseGAN implementation is available at GitHub.

PHASEGAN RESULTS SUMMARY
This section presents the training strategy and results obtained for the validation (synthetic) and metallic foam experiments.
PhaseGAN is an unpaired phase-reconstruction approach. To train on unpaired datasets, PhaseGAN needs two cycles that use either detector measurements or phase-reconstructed objects as input. Each of these cycles is required to be consistent, i.e., the input should be recovered after a full cycle. We have performed several tests to understand the capabilities of PhaseGAN compared to state-of-the-art DL approaches. Specifically, we have compared PhaseGAN to: i) a classical supervised learning approach using paired datasets, ii) adversarial supervised learning with paired datasets following pix2pix [5], and iii) standard CycleGAN [4]. For more details about these methods, the reader is referred to the main text. All these approaches use the same G_O to retrieve the phase.
One of the simplest tests of these capabilities was to examine phase profiles across areas that are difficult to reconstruct, i.e., regions where the phase varies strongly over a small area. The results for three line profiles are shown in Fig. S3. All four methods are capable of reconstructing the homogeneous regions seen in the reference (oracle) wavefield; the main discrepancies were observed around the object edges.
Second, we report the statistical distributions of three quality metrics, the L2 norm, DSSIM, and FRCM, for the four DL approaches. For more details about these metrics, the reader is referred to the main text. Smaller values of these three metrics correspond to better reconstructions; larger values indicate worse reconstructions. The distributions over 1000 validation images for the L2 norm, DSSIM, and FRCM are shown in Fig. S4(a), (b), and (c), respectively. Each validation image contains a random number of objects, ranging from 1 to 25. The phase of the images ranges from 0 to π to avoid phase wrapping. For each metric, we also include the best- and worst-performing validation images of each DL method. The left side of the figure depicts the ranked distribution of each metric from smaller to larger values. The ranked distributions are independent for each DL method, e.g., the smallest value of a given metric for one method does not have to be obtained from the same input image as for another method. The image patches on the left (right) side of each ranked distribution show the best (worst) phase-retrieved results for each DL approach. The frame colour follows the legend colour code for each method. As expected, most methods perform better with fewer objects than with many. The overlap between objects also plays a role in the performance. On the right side of Fig. S4, kernel-density estimates are depicted for each method and metric. These distributions are calculated over the logarithm of the values to enhance the differences between the methods. PhaseGAN outperforms CycleGAN and performs at the level of current state-of-the-art paired DL approaches when applied to the phase problem.
Finally, we display five selected frames extracted from a time-resolved X-ray imaging experiment in Fig. S5. This experiment studied the coalescence of metallic-foam bubbles, a crucial process that determines the final structure of the metallic foam [6]. The Intensity row corresponds to measurements performed with a MHz X-ray imaging acquisition system based on a Shimadzu HPV-X2 camera, which was capable of recording single X-ray pulses provided by the Advanced Photon Source (APS). The phase and attenuation rows correspond to the phase-retrieved results from PhaseGAN, which cannot be obtained with current methods. The last row in Fig. S5 shows a schematic illustration of the coalescence process.
PhaseGAN provided a satisfactory solution under these conditions, delivering almost real-time (kHz) phase reconstructions while avoiding experimental artifacts, in the absence of paired image examples. PhaseGAN can also serve as an alternative to traditional iterative phase-reconstruction methods when large volumes of data require rapid reconstruction.