Augmented NETT regularization of inverse problems

We propose aNETT (augmented NETwork Tikhonov) regularization as a novel data-driven reconstruction framework for solving inverse problems. An encoder-decoder type network defines a regularizer consisting of a penalty term that enforces regularity in the encoder domain, augmented by a term that penalizes the distance to the signal manifold. We present a rigorous convergence analysis including stability estimates and convergence rates. For that purpose, we prove the coercivity of the regularizer used without requiring explicit coercivity assumptions for the networks involved. We propose a possible realization together with a network architecture and a modular training strategy. Applications to sparse-view and low-dose CT show that aNETT achieves results comparable to state-of-the-art deep-learning-based reconstruction methods. Unlike learned iterative methods, aNETT does not require repeated application of the forward and adjoint models during training, which enables the use of aNETT for inverse problems with numerically expensive forward models. Furthermore, we show that aNETT trained on coarsely sampled data can leverage an increased sampling rate without the need for retraining.

Classical deep learning approaches may lack data consistency for unknowns very different from the training data. To address this issue, a deep learning approach named NETT (NETwork Tikhonov) regularization has been introduced in [31], which considers minimizers of the NETT functional

T_{α;y}(x) := d(K(x), y) + α ψ(E(x)).   (1.2)

Here d: Y × Y → [0, ∞] is a similarity measure, E: X → Ξ is a trained neural network mapping into a coefficient space Ξ, ψ: Ξ → [0, ∞] is a functional, and α > 0 is a regularization parameter. In [31] it is shown that, under suitable assumptions, NETT yields a convergent regularization method. This in particular includes provable stability guarantees and error estimates. Moreover, a training strategy has been proposed where E is trained such that ψ ∘ E favors artifact-free reconstructions over reconstructions with artifacts.
The augmented NETT

One of the main assumptions in the analysis of [31] is the coercivity of the regularizer ψ ∘ E, which requires special care in network design and training. To overcome this limitation, we propose an augmented form of the regularizer for which we are able to rigorously prove coercivity. More precisely, for fixed c > 0, we consider minimizers x_α^δ of the augmented NETT functional

T_{α,c;y^δ}(x) := d(K(x), y^δ) + α ( ψ(E(x)) + (c/2) ||x − (D∘E)(x)||² ),   (1.3)

where d is a similarity measure and D ∘ E: X → X is an encoder-decoder network trained such that, for any signal x on the signal manifold, we have (D∘E)(x) ≈ x and ψ(E(x)) is small. We term this approach augmented NETT (aNETT) regularization. In this work we provide a mathematical convergence analysis for aNETT, present a novel modular training strategy and investigate its practical performance.
The term ψ(E(x)) implements learned prior knowledge on the encoder coefficients, while smallness of ||x − (D∘E)(x)|| forces x to be close to the signal manifold. The latter term also guarantees the coercivity of (1.3). In the original NETT version (1.2), coercivity of the regularizer requires coercivity conditions on the network involved. Indeed, in their numerical experiments, the authors of [31] observed a semi-convergence behaviour when minimizing (1.2), so early stopping of the iterative minimization scheme was used as additional regularization. We attribute this semi-convergence behavior to a potential non-coercivity of the regularization term. In the present paper we address this issue systematically by augmenting the NETT functional, which guarantees coercivity and allows a more stable minimization. Coercivity is also a main ingredient of the mathematical convergence analysis.
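The structure of the aNETT regularizer can be illustrated with a small numerical sketch. Here the trained encoder and decoder are replaced by a hypothetical linear pair (an orthogonal analysis matrix W and its transpose), so the autoencoder is exact on all signals; all names are illustrative stand-ins, not the networks used later in the paper.

```python
import numpy as np

# Hypothetical stand-ins for the trained networks: E is a linear analysis
# map and D its transpose.  Since W is orthogonal, D(E(x)) = x for every x.
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((8, 8)))[0]  # orthogonal matrix

def E(x):                 # encoder: signal -> coefficients
    return W @ x

def D(xi):                # decoder: coefficients -> signal
    return W.T @ xi

def psi(xi, q=1.0):       # complexity measure, e.g. an lq-norm
    return np.sum(np.abs(xi) ** q)

def anett_regularizer(x, c=1.0):
    """R_c(x) = psi(E(x)) + (c/2) * ||x - D(E(x))||^2."""
    xi = E(x)
    return psi(xi) + 0.5 * c * np.linalg.norm(x - D(xi)) ** 2

x = rng.standard_normal(8)
# For an exact autoencoder the augmentation term vanishes:
assert np.isclose(np.linalg.norm(x - D(E(x))), 0.0)
```

For a trained nonlinear autoencoder, the augmentation term (c/2)||x − D(E(x))||² is small only near the signal manifold, which is precisely what lets it act as a learned penalty.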
An interesting practical instance of aNETT takes ψ to be a weighted ℓq-norm enforcing sparsity of the encoding coefficients [34, 35]. An important example of a similarity measure is the squared norm distance, which from a statistical viewpoint can be motivated by a Gaussian white noise model. General similarity measures allow us to adapt to different noise models, which can be more appropriate for certain problems.
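As a minimal sketch of two such similarity measures, the squared norm distance (matched to Gaussian noise) and the Kullback-Leibler divergence (matched to Poisson noise, as used for low-dose CT later in the paper); the function names are chosen here for illustration only:

```python
import numpy as np

def d_sq(y1, y2):
    """Squared norm distance: motivated by a Gaussian white-noise model."""
    return 0.5 * np.sum((y1 - y2) ** 2)

def d_kl(y1, y2, eps=1e-12):
    """Kullback-Leibler divergence: better matched to Poisson noise.
    Assumes non-negative data (e.g. photon counts); eps avoids log(0)."""
    y1 = np.maximum(y1, eps)
    y2 = np.maximum(y2, eps)
    return np.sum(y1 - y2 + y2 * np.log(y2 / y1))
```

Both vanish exactly when the two arguments agree, as required of a similarity measure; the Kullback-Leibler variant additionally assumes non-negative data.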

Main contributions
The contributions of this paper are threefold. As described in more detail below, we introduce the aNETT framework, mathematically analyze its convergence, and propose a practical implementation that is applied to tomographic limited data problems.
• The first contribution is to introduce the structure of the aNETT regularizer R_c(x) = ψ(E(x)) + (c/2)||x − (D∘E)(x)||². A similar approach has been studied in [36] for a linear encoder E. However, in this paper we do not assume that the image x consists of two components u and v, but rather assume that there is some transformation E under which the signal x has some desired property such as, for example, sparsity. The term ψ(E(x)) enforces regularity of the analysis coefficients, which is an ingredient of most existing variational regularization techniques. For example, this includes sparse regularization in frames or dictionaries, regularization with Sobolev norms, and total variation regularization. The augmented term ||x − (D∘E)(x)||², on the other hand, penalizes the distance to the signal manifold. It is the combination of these two terms that results in a stable reconstruction scheme without the need for strong assumptions on the involved networks.
• The second main contribution is the theoretical analysis of aNETT (1.3) in the context of regularization theory. We investigate the case where the image domain of the encoder is given by ℓ²(Λ) for some countable set Λ, and ψ is a coercive functional measuring the complexity of the encoder coefficients. The presented analysis is in the spirit of the analysis of NETT given in [31]. However, as opposed to NETT, the required coercivity property is derived naturally for the class of considered regularizers. This supports the use of the regularizer also from the theoretical side. Moreover, the convergence rates results presented here use assumptions significantly different from [31]. While we present our analysis for the transform domain ℓ²(Λ), the encoder space could be replaced by a general Hilbert or Banach space.
• As a third main contribution we propose a modular strategy for training D ∘ E together with a possible network architecture. First, independently of the given inverse problem, we train a ψ-penalized autoencoder that learns to represent signals from the training data with low complexity. In the second step, we train a task-specific network which can be adapted to the specific inverse problem at hand. In our numerical experiments, we empirically found this modular training strategy to be superior to directly adapting the autoencoder to the inverse problem. For the ψ-penalized autoencoder, we train the modified version described in [37] of the tight frame U-Net of [38] in such a way that ψ poses additional constraints on the autoencoder during the training process.

Outline
In section 2 we present the mathematical convergence analysis of aNETT. In particular, as an auxiliary result, we establish the coercivity of the regularization term. Moreover, we prove stability and derive convergence rates. Section 3 presents practical aspects of aNETT. We propose a possible architecture and training strategy for the networks, and a possible ADMM-based scheme to obtain minimizers of the aNETT functional. In section 4, we present reconstruction results and compare aNETT with other deep-learning-based reconstruction methods. The paper concludes with a short summary and discussion. Parts of this paper were presented at the ISBI 2020 conference and appeared in the corresponding proceedings [39]. As opposed to the proceedings version, this article treats a general similarity measure d and considers a general complexity measure ψ. Further, all proofs and all numerical results presented in this paper are new.

Mathematical analysis
In this section we prove the stability and convergence of aNETT as a regularization method. Moreover, we derive convergence rates in the form of quantitative error estimates between exact solutions for noise-free data and aNETT regularized solutions for noisy data. To this end, we make the assumption that we can compute global minimizers of the functional (1.3) and analyze the properties of these minimizers. This is a common assumption in variational regularization approaches and is adopted in this work. Extending the analysis to consider only local minima is considerably more difficult and is out of the scope of this paper.

Assumptions and coercivity results
For our convergence analysis we make use of the following assumptions on the underlying spaces and the operators involved.

Condition 2.1.

(A1) X and Y are Hilbert spaces and Λ is a countable set.
(A2) K: X → Y is weakly sequentially continuous.
(A3) E: X → ℓ²(Λ) is weakly sequentially continuous.
(A4) D: ℓ²(Λ) → X is weakly sequentially continuous.
(A5) ψ: ℓ²(Λ) → [0, ∞] is coercive.
(A6) ψ is weakly sequentially lower semi-continuous.

We set N := D ∘ E and, for given c > 0, define

R_c(x) := ψ(E(x)) + (c/2) ||x − N(x)||²,

which we refer to as the aNETT (or augmented NETT) regularizer. According to (A3)-(A6), the aNETT regularizer is weakly sequentially lower semi-continuous. As a main ingredient for our analysis we next prove its coercivity.
Theorem 2.2 (Coercivity of the aNETT regularizer). If Condition 2.1 holds, then the regularizer R_c is coercive.

Proof. Let (x_n) be any sequence for which (R_c(x_n)) is bounded. Then by definition of R_c it follows that (ψ(E(x_n))) is bounded, and by coercivity of ψ we have that (E(x_n)) is also bounded. By assumption, D is weakly sequentially continuous and thus (N(x_n)) must be bounded. Using that c > 0, we obtain from the boundedness of (||x_n − N(x_n)||²) the inequality ||x_n|| ≤ ||x_n − N(x_n)|| + ||N(x_n)||, which shows that (x_n) is bounded and hence that R_c is coercive. □

As a concrete example, take ψ as a weighted ℓq-norm with uniformly positive weights, which is coercive on ℓ²(Λ) for 1 ≤ q ≤ 2. As a sum of weakly sequentially lower semicontinuous functionals it is also weakly sequentially lower semi-continuous [35]. Therefore conditions (A5) and (A6) are satisfied for the weighted ℓq-norm. Together with theorem 2.2, we conclude that the resulting weighted sparse aNETT regularizer R_c is a coercive and weakly sequentially lower semicontinuous functional.
For the further analysis we make the following assumptions regarding the similarity measure d.

Condition 2.4 (Similarity measure).

(B1) ∀ y1, y2 ∈ Y: y1 = y2 ⟺ d(y1, y2) = 0.
(B2) d is sequentially lower semi-continuous with respect to the weak topology in the first and the norm topology in the second argument.

While (B1)-(B4) restrict the choice of the similarity measure, (B5) is a technical assumption involving the forward operator, the regularizer and the similarity measure that is required for the existence of minimizers. For a more detailed discussion of these assumptions we refer to [40]. An example of an admissible similarity measure is given by d(y1, y2) = ||y1 − y2||^p for some p ≥ 1 and, more generally, by d(y1, y2) = φ(||y1 − y2||), where φ: [0, ∞) → [0, ∞) is a continuous and monotonically increasing function that satisfies φ(t) = 0 if and only if t = 0.

Stability
Next we prove the stability of minimizing the aNETT functional T_{α;y} with respect to perturbations of the data y.
Theorem 2.6 (Stability). Suppose Conditions 2.1 and 2.4 hold, let α > 0, let (y_n) converge to y and let x_n be minimizers of T_{α;y_n}. Then (x_n) has a weakly convergent subsequence, and the weak limit of every weakly convergent subsequence is a minimizer of T_{α;y}.

Proof. One first verifies that (T_{α;y_n}(x_n)) is a bounded sequence; by the coercivity of the aNETT regularizer, (x_n) is a bounded sequence and hence it has a weakly convergent subsequence. Let (x_{τ(n)}) be a weakly convergent subsequence of (x_n) and denote its limit by x‡. By the lower semicontinuity of the involved functionals we get T_{α;y}(x‡) ≤ lim inf_n T_{α;y_{τ(n)}}(x_{τ(n)}) ≤ lim inf_n T_{α;y_{τ(n)}}(x) = T_{α;y}(x) for every x, which shows that x‡ is a minimizer of T_{α;y} and concludes the proof. □

In the following we say that the similarity measure d satisfies the quasi triangle-inequality if there is some q ≥ 1 such that

∀ y1, y2, y3 ∈ Y: d(y1, y2) ≤ q ( d(y1, y3) + d(y3, y2) ).   (2.2)

While this property is essential for deriving convergence rate results, we will show below that it is not enough to guarantee stability of minimizing the augmented NETT functional in the sense of theorem 2.6. Note that [31] assumes the quasi triangle-inequality (2.2) instead of Condition (B4). The following example shows that (2.2) is not sufficient for the stability result of theorem 2.6 to hold, and therefore Condition (B4) has to be added to the list of assumptions in [31] required for stability.
Example 2.7 (Instability in the absence of Condition (B4)). Consider the similarity measure d defined in (2.3), let K = Id be the identity operator and suppose the regularizer takes the form ||·||².

• The similarity measure defined in (2.3) satisfies (B1)-(B3): convergence with respect to d is equivalent to convergence in norm, which implies that (B3) is satisfied. Moreover, d(y1, y2) = 0 if and only if y1 = y2, so (B1) holds. Suppose now that (δ_n) is taken as a non-increasing sequence converging to zero. We have y_n → y and hence also d(y, y_n) → 0. In summary, all requirements for theorem 2.6 are satisfied except for the continuity assumption (B4).

• A direct computation shows that the similarity measure satisfies the quasi triangle-inequality (2.2). However, as shown next, this is not sufficient for stable reconstruction in the sense of theorem 2.6. To that end, let δ_n → 0 and let α > 0. The minimizer for the perturbed data y_n is given by x_{α,n} = (1 + y_n)/2, whereas the minimizer for the exact data is x‡. We see that x_{α,n} → (1 + y)/2, which is clearly different from x‡. In particular, minimizing d(·, y) + α ||·||² does not depend stably on the data y. Theorem 2.6 states that stability holds if (B4) is satisfied.
While the above example may seem somewhat contrived, it shows that one has to be careful when choosing the similarity measure in order to obtain a stable reconstruction scheme.
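For the squared norm distance, the quasi triangle-inequality (2.2) does hold with q = 2, since writing a − b = (a − c) + (c − b) and using (s + t)² ≤ 2s² + 2t² gives ||a − b||² ≤ 2||a − c||² + 2||c − b||². A quick numerical sanity check of this particular instance:

```python
import numpy as np

# Verify ||a - b||^2 <= 2 ||a - c||^2 + 2 ||c - b||^2 on random vectors,
# i.e. the quasi triangle-inequality (2.2) with q = 2 for the squared
# norm distance.
rng = np.random.default_rng(1)
for _ in range(1000):
    a, b, c = rng.standard_normal((3, 5))
    lhs = np.sum((a - b) ** 2)
    rhs = 2 * (np.sum((a - c) ** 2) + np.sum((c - b) ** 2))
    assert lhs <= rhs + 1e-9
```

As example 2.7 illustrates, satisfying (2.2) alone does not yield stability; the continuity condition (B4) is needed in addition.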

Convergence
In this subsection we consider the limit as the noise level δ tends to 0. Assuming y ∈ ran(K), we expect the regularized solutions to converge to a solution of the equation K(x) = y. This raises the obvious question of whether this solution has any additional properties. In fact, we prove that the minimizers of the aNETT functional for noisy data converge to a special kind of solution, namely a solution which minimizes R_c among all possible solutions. For that purpose, here and below we call x‡ an R_c-minimizing solution of K(x) = y if K(x‡) = y and R_c(x‡) = inf { R_c(x) : K(x) = y }.
An R_c-minimizing solution always exists provided the data satisfies y ∈ K(dom(R_c)), which means that the equation K(x) = y has at least one solution with a finite value of R_c. To see this, consider a sequence (x_n) of solutions of K(x) = y along which R_c converges to its infimum over all solutions. Since R_c is coercive, there exists a weakly convergent subsequence (x_{τ(n)}) with weak limit x‡. Using the weak sequential lower semi-continuity of R_c and the weak sequential continuity of K, one concludes that x‡ is an R_c-minimizing solution. We first show weak convergence.
Theorem 2.9 (Weak convergence). Let y ∈ ran(K), let (y_n) satisfy d(y, y_n) ≤ δ_n with δ_n → 0, choose parameters α_n > 0 with α_n → 0 and δ_n/α_n → 0, and let x_n be minimizers of T_{α_n;y_n}. Then (x_n) has a weakly convergent subsequence, and the weak limit of every weakly convergent subsequence is an R_c-minimizing solution of K(x) = y.

Proof. One first shows that (R_c(x_n)) is bounded. Due to the coercivity of the aNETT regularizer (see theorem 2.2), this implies that (x_n) has a weakly convergent subsequence. Let (x_{τ(n)}) be a weakly convergent subsequence of (x_n) with limit x̃. From the weak lower semicontinuity we get R_c(x̃) ≤ lim inf_n R_c(x_{τ(n)}) ≤ lim sup_n R_c(x_{τ(n)}) ≤ lim_n ( R_c(x‡) + δ_n/α_n ) = R_c(x‡), where for the second to last inequality we used (2.4) and for the last equality we used that δ_n/α_n → 0. Therefore x̃ is an R_c-minimizing solution of the equation K(x) = y. In a similar manner we derive lim_n R_c(x_{τ(n)}) = R_c(x̃). If K(x) = y has a unique R_c-minimizing solution x‡, then every subsequence of (x_n) has itself a subsequence weakly converging to x‡, which implies that (x_n) weakly converges to the R_c-minimizing solution. □

Next we derive strong convergence of the regularized solutions. To this end we recall the absolute Bregman distance, the modulus of total nonlinearity and the total nonlinearity, as defined in [31].
Here and below, R_c′(x̃) denotes the Gâteaux derivative of R_c at x̃.
Definition 2.11 (Modulus of total nonlinearity and total nonlinearity).
For t > 0 we define the modulus of total nonlinearity of R_c at x̃ as ν(x̃, t) := inf { B_{R_c}(x, x̃) : ||x − x̃|| = t }, where B_{R_c} denotes the absolute Bregman distance, and we call R_c totally nonlinear at x̃ if ν(x̃, t) > 0 for all t > 0. Using these definitions we get the following convergence result in the norm topology (theorem 2.12): under the assumptions of theorem 2.9 and total nonlinearity of R_c at the R_c-minimizing solutions, the regularized solutions converge in norm. Indeed, if the R_c-minimizing solution x‡ is unique, then every subsequence has a subsequence converging to x‡ and hence the claim follows. □

Convergence rates
We now prove convergence rates by deriving quantitative estimates for the absolute Bregman distance between R_c-minimizing solutions for exact data and regularized solutions for noisy data. The convergence rates are derived under the additional assumption that d satisfies the quasi triangle-inequality (2.2).
Proposition 2.13 (Convergence rates for aNETT). Let the assumptions of theorem 2.12 be satisfied and suppose that d satisfies the quasi triangle-inequality (2.2) for some q ≥ 1. Let x‡ be an R_c-minimizing solution of K(x) = y such that R_c is Gâteaux differentiable at x‡, and assume that the variational inequality (2.5) holds with suitable constants. The resulting error estimates in the absolute Bregman distance then follow; in particular, together with the inequality between the arithmetic and geometric means, part (b) is an immediate consequence of part (a). □

The following result is our main convergence rates result. It is similar to [31], theorem 3.1, but uses different assumptions.
Theorem 2.14 (Convergence rates for finite rank operators). Let the assumptions of theorem 2.12 be satisfied and take d(y1, y2) = ||y1 − y2||.

Proof. According to proposition 2.13, it is sufficient to show that (2.5) holds. For that purpose, let P denote the orthogonal projection onto the null-space ker(K) and let L be a Lipschitz constant of the Gâteaux derivative R_c′. Since K restricted to ker(K)⊥ is injective with finite-dimensional range, we can choose a constant a > 0 such that ||K z|| ≥ a ||z|| for all z ∈ ker(K)⊥. On the other hand, using that R_c is Gâteaux differentiable and that K has finite rank, one obtains the required estimate for ⟨R_c′(x‡), x − x‡⟩ in (2.5). □

Note that the theoretical results stated above remain valid if we replace R_c by a general coercive and weakly sequentially lower semi-continuous regularizer.

Practical realization
In this section we investigate practical aspects of aNETT. We present a possible network architecture together with a possible training strategy in the discrete setting. Further, we discuss minimization of aNETT using the ADMM algorithm. For the sake of clarity we restrict our discussion to the finite dimensional case, where X = ℝ^{N×N} and ℓ²(Λ) = ℝ^Λ for a finite index set Λ.

Proposed modular aNETT training
To find a suitable network D ∘ E defining the aNETT regularizer R_c, we propose a modular data-driven approach that comes in two separate steps. In a first step, we train a ψ-regularized denoising autoencoder D̃ ∘ Ẽ independent of the forward operator K, whose purpose is to represent elements of a training data set well by low-complexity encoder coefficients. In a second step, we train a task-specific network that increases the ability of the aNETT regularizer to distinguish between clean images and images containing problem-specific artifacts.

• ψ-REGULARIZED AUTOENCODER: Let x_1, ..., x_m denote the given set of artifact-free training phantoms and let (Ẽ_θ)_{θ∈Θ} and (D̃_θ)_{θ∈Θ} be encoder and decoder networks, respectively. To achieve that unperturbed images are sparsely represented by Ẽ, whereas disrupted images are not, we apply the following training strategy. We randomly generate images x_i + a_i ε_i, where ε_i is additive Gaussian white noise with a standard deviation proportional to the mean value of x_i, and a_i ∈ {0, 1} is a binary random variable that takes each value with probability 0.5. For the numerical results below we use a standard deviation of 0.05 times the mean value of x_i. To select the particular autoencoder based on the training data, we consider the training strategy (3.1), where β, ν > 0 are regularization parameters. Including the perturbed signals x_i + ε_i in (3.1) increases the robustness of the ψ-regularized autoencoder. To enforce regularity of the encoder coefficients only for the noise-free images, the penalty ψ(Ẽ(·)) is only applied to the noise-free inputs, reflected by the pre-factor (1 − a_i). Using autoencoders, regularity for a signal class could also be achieved by means of dimensionality reduction, where ℝ^Λ is used as a bottleneck in the network architecture. However, in order to obtain a regularizer that is able to distinguish between perturbed and unperturbed signals, we take ℝ^Λ to be of sufficiently high dimensionality.
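The structure of the autoencoder training objective (3.1) can be sketched as follows. The linear toy autoencoder and the parameter names are hypothetical stand-ins (the paper trains a modified tight frame U-Net), but the structure, a reconstruction term for clean and perturbed inputs plus a ψ-penalty applied only to the clean ones, follows the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy autoencoder; in the actual method these are trained networks.
W = np.linalg.qr(rng.standard_normal((16, 16)))[0]
E = lambda x: W @ x
D = lambda xi: W.T @ xi
psi = lambda xi: np.sum(np.abs(xi))          # l1 complexity measure

def modular_loss(xs, beta=0.1, noise_rel=0.05):
    """Sketch of the psi-regularized autoencoder objective (3.1):
    reconstruct both clean (a_i = 0) and noisy (a_i = 1) inputs, but
    penalize the encoder coefficients only for the clean ones."""
    total = 0.0
    for x in xs:
        a = rng.integers(0, 2)                     # a_i in {0, 1}, p = 0.5
        eps = noise_rel * np.abs(x).mean() * rng.standard_normal(x.shape)
        x_in = x + a * eps                         # possibly perturbed input
        total += np.sum((D(E(x_in)) - x) ** 2)     # reconstruction term
        total += beta * (1 - a) * psi(E(x_in))     # sparsity on clean inputs
    return total
```

In the actual training the minimization runs over the network parameters θ; here the loss is merely evaluated to show its composition.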
• TASK-SPECIFIC NETWORK: Numerical simulations showed that the ψ-regularized autoencoder alone was not able to distinguish sufficiently well between artifact-free training phantoms and images containing problem-specific artifacts. To address this issue, we compose the operator-independent network with another network U, which is trained to distinguish between images with and without problem-specific artifacts.
For that purpose, we consider randomly generated images z_1, ..., z_m in which artifacts are included with equal probability. Here K‡ is an approximate right inverse of K and the η_i are error terms. We choose a network architecture (U_θ)_{θ∈Θ} and select U = U_{θ*}, where θ* minimizes the corresponding training loss for some regularization parameter γ > 0. In particular, the image residuals x_i − (D̃ ∘ Ẽ)(z_i) now depend on the specific inverse problem, and we can consider them to consist of operator- and training-signal-specific artifacts.
The above training procedure ensures that the network U adapts to the inverse problem at hand as well as to the ψ-regularized autoencoder. We empirically found that training the network U independently of D̃ ∘ Ẽ, or directly training the autoencoder to distinguish between images with and without problem-specific artifacts, performs considerably worse.
The final autoencoder is then given as N = D ∘ E with the modular decoder D := U ∘ D̃. For the numerical results we take (U_θ)_{θ∈Θ} as the tight frame U-Net of [38]. Moreover, we choose (D̃_θ ∘ Ẽ_θ)_{θ∈Θ} as the modified tight frame U-Net proposed in [37] for deep synthesis regularization. In particular, as opposed to the original tight frame U-Net, the modified tight frame U-Net does not involve skip connections.
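A sketch of the task-specific training step under the same toy conventions: artifact-laden inputs z_i are generated with an approximate right inverse K‡ of the forward operator, and U is fit so that U applied to the autoencoder output matches the clean phantom. All operators and names here are illustrative stand-ins, not the actual U-Net architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 16
autoencode = lambda x: x                      # pre-trained D~ o E~ (identity toy)
U = lambda x, theta: theta * x                # task-specific network, scalar "weights"
K = np.tril(np.ones((n, n)))                  # toy forward operator
K_dagger = np.linalg.pinv(K)                  # approximate right inverse K‡

def task_loss(theta, xs, gamma=1e-3, noise=0.05):
    """Sketch of the task-specific objective: U(autoencode(z_i)) should
    match the clean x_i, where z_i = K‡(K x_i + eta_i) contains
    operator-specific artifacts; gamma weights a regularizer on theta."""
    total = 0.0
    for x in xs:
        eta = noise * rng.standard_normal(n)
        z = K_dagger @ (K @ x + eta)          # artifact-laden reconstruction
        total += np.sum((U(autoencode(z), theta) - x) ** 2)
    return total + gamma * theta ** 2
```

Minimizing over the hypothetical parameter theta mirrors the selection of θ* above; the key point is that the residuals depend on K, so U becomes operator-specific.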

Possible aNETT minimization
For minimizing the aNETT functional (1.3) we use the alternating direction method of multipliers (ADMM) with scaled dual variable [41-43]. For that purpose, the aNETT minimization problem is rewritten as the constrained minimization problem

min_{x,ξ} d(K(x), y^δ) + α ψ(ξ) + (αc/2) ||x − (D∘E)(x)||²  subject to  ξ = E(x).

The resulting ADMM update scheme with scaling parameter ρ > 0, initialized by ξ_0 = E(x_0) and η_0 = 0, then reads as follows:

(S1) x_{k+1} ∈ argmin_x d(K(x), y^δ) + (αc/2) ||x − (D∘E)(x)||² + (ρ/2) ||E(x) − ξ_k + η_k||²,
(S2) ξ_{k+1} = prox_{(α/ρ)ψ}(E(x_{k+1}) + η_k),
(S3) η_{k+1} = η_k + E(x_{k+1}) − ξ_{k+1}.

One interesting feature of the above approach is that the signal update (S1) is independent of the possibly non-smooth penalty ψ. Moreover, the encoder update (S2) uses the proximal mapping of ψ, which in important special cases can be evaluated explicitly and is therefore fast and exact. In addition, it guarantees regular encoder coefficients during each iteration. For example, if we choose the penalty ψ as the ℓ1-norm, then (S2) is a soft-thresholding step which results in sparse encoder coefficients.
Step (S1) in typical cases has to be computed iteratively via an inner iteration. To find an approximate solution for (S1) for the results presented below we use gradient descent with at most 10 iterations. We stop the gradient descent updates early if the difference of the functional evaluated at two consecutive iterations is below our predefined tolerance of 10 −5 .
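For the special case of a linear forward operator, a linear encoder-decoder pair, ψ the ℓ1-norm and the squared-norm similarity, the updates (S1)-(S3) can be sketched as follows; matrix stand-ins replace the trained networks, and (S1) is solved approximately by inner gradient-descent steps as described above:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def anett_admm(K, y, E, D, alpha, c, rho, n_iter=20, n_inner=10, step=0.1):
    """Sketch of (S1)-(S3) for the aNETT functional with squared-norm
    similarity and psi = l1-norm.  K, E, D are matrices here; in the
    actual method E, D are networks."""
    n = K.shape[1]
    M = np.eye(n) - D @ E                  # x - D(E(x)) = M x in the linear toy
    x = np.zeros(n)
    xi = E @ x
    eta = np.zeros_like(xi)
    for _ in range(n_iter):
        # (S1): x-update, approximately minimized by gradient descent
        for _ in range(n_inner):
            grad = (K.T @ (K @ x - y)
                    + alpha * c * (M.T @ (M @ x))
                    + rho * E.T @ (E @ x - xi + eta))
            x = x - step * grad
        # (S2): proximal map of (alpha/rho) * l1 = soft-thresholding
        xi = soft_threshold(E @ x + eta, alpha / rho)
        # (S3): scaled dual update
        eta = eta + E @ x - xi
    return x
```

With ψ the ℓ1-norm, step (S2) reduces to component-wise soft-thresholding with threshold α/ρ, so the encoder coefficients stay sparse throughout the iteration.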
The concrete implementation of the aNETT minimization requires specification of the similarity measure, the total number of outer iterations N_iter, the step size γ for the iteration in (S1) and the parameters defining the aNETT functional. These specifications are selected depending on the inverse problem at hand. Table 1 lists the particular choices for the reconstruction scenarios considered in the following section.
To choose the parameters for the numerical simulations, we tested different values and manually selected those which maximized performance among the considered candidates. Alternatively, these parameters could be learned from data using a machine learning approach, or chosen via a bilevel approach similar to [44].
In the simulations we observed that choosing c larger tends to oversmooth the resulting reconstructions. For smaller values of c, we observed that the manifold term ||x − N(x)||² tends to be undervalued, resulting in worse performance. In a similar fashion, we found that a larger α has a smoothing effect on the resulting reconstructions, while lowering α makes the reconstructions less smooth.
The ADMM scheme for aNETT minimization shares similarities with existing iterative neural-network-based reconstruction methods. In particular, ADMM-inspired plug-and-play priors [15-17] may be most closely related. However, as opposed to the plug-and-play approach, we can deduce convergence from existing results for ADMM for non-convex problems [45]. While the convergence of (S1)-(S3) and the relations with plug-and-play priors are interesting and relevant, they are beyond the scope of this work. This also applies to the comparison with other iterative schemes for minimizing aNETT.

Application to sparse view and low dose CT
In this section we apply aNETT regularization to sparse-view and low-dose computed tomography (CT). For the experiments we always choose ψ to be the ℓ1-norm. The parameter specifications for the proposed aNETT functional and its numerical minimization are given in table 1. In both cases, we end up with an inverse problem of the form (1.1), where K: ℝ^{N×N} → ℝ^{N_φ×N_s} is the discretized linear forward operator. Elements x ∈ ℝ^{N×N} will be referred to as CT images and elements y ∈ ℝ^{N_φ×N_s} as sinograms. For all results presented below we work with image size 512×512 and use N_s = 768 detector pixels. The number of angular samples is taken as N_φ = 40 for sparse-view CT and N_φ = 1138 for the low-dose example. In both cases we use the CT images from the Low Dose CT Grand Challenge dataset [47] provided by the Mayo Clinic. The dataset consists of 512×512 grayscale images of 10 different patients, where for each patient multiple CT scanning series are available. We split the patients 7/2/1 into training, validation and testing, which corresponds to 4267, 1143 and 526 CT images in the respective sets. We use the validation set to select the networks which achieve the minimal validation loss, and the test set to evaluate the final performance. Note that by splitting the dataset by patient we avoid validating and testing on images of patients that have already been seen during training. An example image and the corresponding simulated sparse-view and low-dose sinograms are shown in figure 1.

Numerical results
We compare results of aNETT to the learned primal-dual algorithm (LPD) [48], the tight frame U-Net [38] applied as a post-processing network (CNN), and filtered back-projection (FBP). Minimization of the loss function for all methods was done using Adam [49] for 100 epochs with a cosine-decay learning rate; LPD uses N = 7 network iterations and is trained according to [48]. Here, we use only N = 7 network iterations because we observed instabilities during the training phase when this parameter was chosen larger, and we have not performed any parameter tuning. For training of the tight frame U-Net we do not follow the patch approach of [38] but instead use full images obtained with FBP as CNN inputs. Training of all networks was done on a GTX 1080 Ti with an Intel Xeon Bronze 3104 CPU. The reconstructions in figure 2 indicate that aNETT reconstructions are less smooth than CNN reconstructions and less blocky than LPD reconstructions.
• LOW-DOSE CT: for the low-dose problem, we use a fully sampled sinogram with N_φ = 1138 and add Poisson noise corresponding to 10⁴ incident photons per pixel bin. In the case of Poisson noise, the Kullback-Leibler divergence d_KL is a more appropriate discrepancy term than the squared ℓ2-norm distance, and the reported values and reconstructions use the Kullback-Leibler divergence as the similarity measure. Quantitative results are shown in table 2. Again, all learning-based methods give similar results and significantly outperform FBP. Visual comparison of the reconstructions in figure 3 shows that the CNN yields cartoon-like images and the LPD reconstruction again looks blocky. The aNETT reconstruction shows more texture than the CNN reconstruction and at the same time is less blocky than the LPD reconstruction.
• UNIVERSALITY: in practical applications, the sampling pattern may not be fixed. If many different sampling patterns occur, training a network for each of them is infeasible, and hence reconstruction methods should be applicable to different sampling scenarios. Additionally, it is desirable that an increased number of samples indeed increases performance. To test this, we consider the sparse-view CT problem with an increased number of angular samples without retraining the networks. Due to the rigidity of the framework, LPD cannot easily be adapted to this setting, and we therefore compare aNETT only with the post-processing CNN. For the results presented here, no network was retrained. A quantitative evaluation of this scenario is given in table 2. We see that aNETT slightly outperforms the CNN in terms of PSNR. The advantage of aNETT over the CNN, however, is best observed in figure 4. One observes that the CNN yields similar reconstructions for both angular sampling patterns. On the other hand, aNETT is able to synergistically combine the increased sampling rate of the sinogram with the network trained on coarsely sampled data. Despite using the network trained with only 40 angular samples, aNETT reconstructs small details which are not present in the reconstruction from 40 angular samples.
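The quantitative comparisons above are reported in terms of the peak signal-to-noise ratio. For reference, a minimal implementation of PSNR as commonly defined, with the peak taken from the reference image; the exact convention used for the tables is not restated here, so this is an assumption:

```python
import numpy as np

def psnr(x, x_ref):
    """Peak signal-to-noise ratio in dB between a reconstruction x and a
    reference image x_ref, with the peak taken as max(x_ref)."""
    mse = np.mean((x - x_ref) ** 2)
    return 10 * np.log10(x_ref.max() ** 2 / mse)
```

Higher values indicate reconstructions closer to the ground truth; a 0.1 uniform error on a unit-scale image, for instance, corresponds to 20 dB.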

Discussion
The results show that the proposed aNETT regularization is competitive with prominent deep learning methods such as LPD and post-processing CNNs. We found that aNETT does not suffer as much from the oversmoothing that is often observed in other deep learning reconstruction methods. This can, for example, be seen in figure 3, where the CNN yields an over-smoothed reconstruction while the aNETT reconstruction shows more texture. Besides this, aNETT reconstructions are less blocky than LPD reconstructions. Moreover, aNETT is able to leverage higher sampling rates, without retraining the networks, to reconstruct small details, while the other deep learning methods fail to do so. We conjecture that this advantage arises because aNETT can make use of the higher sampling rate through the data-consistency term in (1.3), while the CNN is agnostic to this change in the sampling rate. In some scenarios, it may not be possible to retrain networks; especially for learned iterative schemes, network training is time-consuming. Training aNETT, on the other hand, is straightforward and, as demonstrated, yields a method which is robust to changes of the forward problem at test time.
While a more extensive study with respect to the influence of noise could be done to further analyse the advantages and disadvantages of each method, this is not our main focus here and is thus postponed to a future study.
Finally, we note that aNETT relies on minimizing (1.3) iteratively. With the ADMM minimization scheme presented above, aNETT is therefore slower than the methods used for comparison. Designing faster optimization schemes for (1.3) is beyond the scope of this work, but is an important and interesting aspect.

Conclusion
We have proposed aNETT (augmented NETwork Tikhonov) regularization, for which we derived coercivity of the regularizer under quite mild assumptions on the networks involved. Using this coercivity, we presented a convergence analysis of aNETT with a general similarity measure d. We proposed a modular training strategy in which we first train a ψ-regularized autoencoder independent of the problem at hand, and then a network which is adapted to the problem and to the first autoencoder. Experimentally, we found this training strategy to be superior to directly training the autoencoder on the full task. Lastly, we conducted numerical simulations demonstrating the feasibility of aNETT.
The experiments show that aNETT is able to keep up with classical post-processing CNNs and the learned primal-dual approach for sparse-view and low-dose CT. Typical deep learning methods work well for the fixed sampling pattern on which they have been trained. However, reconstruction methods are expected to perform better if an increased sampling rate is used. We have experimentally shown that aNETT is able to leverage higher sampling rates to reconstruct small details in the images which are not visible in the other reconstructions. This universality can be advantageous in applications where one is not restricted to a single sampling pattern or is not able to train a network for every sampling pattern.