Recovering the CMB Signal with Machine Learning

The cosmic microwave background (CMB), carrying the inhomogeneous information of the very early universe, is of great significance for understanding the origin and evolution of our universe. However, observational CMB maps contain serious foreground contaminations from several sources, such as galactic synchrotron and thermal dust emissions. Here, we build a deep convolutional neural network (CNN) to recover the tiny CMB signal from various huge foreground contaminations. Focusing on the CMB temperature fluctuations, we find that the CNN model can successfully recover the CMB temperature maps with high accuracy, and that the deviation of the recovered power spectrum $C_\ell$ is smaller than the cosmic variance at $\ell>10$. We then apply this method to the current Planck observation, and find that the recovered CMB is quite consistent with that disclosed by the Planck collaboration, which indicates that the CNN method can provide a promising approach to the component separation of CMB observations. Furthermore, we test the CNN method with simulated CMB polarization maps based on the CMB-S4 experiment. The result shows that both the EE and BB power spectra can be recovered with high accuracy. Therefore, this method will be helpful for the detection of primordial gravitational waves in current and future CMB experiments. The CNN is designed to analyze two-dimensional images, thus this method is not only able to process full-sky maps, but also partial-sky maps. Therefore, it can also be used for other similar experiments, such as radio surveys like the Square Kilometer Array.


Introduction
Over the past two decades, the temperature anisotropies of the cosmic microwave background (CMB) have been increasingly measured. These measurements have ushered research into cosmology into the era of precision (Aghanim et al. 2020). Additional information about the history of the universe is contained in the polarization anisotropy of the CMB. Thus, many experiments, such as the Simons Array (Arnold et al. 2014), Atacama Cosmology Telescope (Thornton et al. 2016), SPT-3G (Benson et al. 2014), BICEP3 (Kang et al. 2018), the Simons Observatory (Ade et al. 2019), CMB-S4 (Abazajian et al. 2016), LiteBIRD (Sugai et al. 2020) and POLARBEAR (Errard et al. 2010) , are measuring or planning to measure the polarized CMB with improved sensitivity. For these experiments, the component separation of CMB the raw CMB maps during the data analysis is a key step in obtaining the foreground-cleaned information for the cosmological studies to follow.
The observed CMB maps are contaminated by various foreground components, in our galaxy or beyond. Therefore, the purpose of component separation is to separate the CMB signal from the contaminated data to obtain a foreground-cleaned CMB map for cosmological studies. To achieve this, two kinds of method are usually adopted in the literature: (1) the nonparametric method, which only makes minimal assumptions concerning the foregrounds and seeks only to minimize the variance of the CMB signal-such as the traditional Internal Linear Combination algorithm (ILC, Bennett et al. (2003)), which can quickly provide a foreground-cleaned CMB map, but not detailed information about the various foreground emissions; or (2) the parametric method, which uses parametric modeling of the foregrounds and explores model parameters through Bayesian parameter estimation techniques, by fitting a parametric signal model per pixel-such as the Commander method (Eriksen et al. 2008), which allows us to exploit the features of the foreground-cleaned CMB map in detail, as well as those of the foreground emissions, but is typically a time-and resource-consuming procedure, and only sufficient for low-resolution CMB maps. Apparently, all these algorithms have their pros and cons.
As a method of ML, deep convolutional neural networks (CNNs) have shown excellent capabilities in image recognition and classification tasks. Recent research shows that CNNs are capable of accurately extracting full-sky temperature maps of the CMB from observational data (Petroff et al. 2020). In their analysis, a DeepSphere graph-based convolution (Perraudin et al. 2019) is adopted to the HEALPix sphere to extend traditional CNNs to the sphere, making this method suitable for full-sky CMB maps. As an improvement, we propose that traditional CNNs are also capable of separating various foreground emission sources from CMB observations, to provide a foreground-cleaned CMB map from both full-sky maps and partial-sky maps. For the recovery of the CMB temperature signal, we simulate components of CMB observations to train the CNN model, then apply it to the Planck mission; while for the recovery of the CMB polarization signal, based on the CMB-S4 experiment, we simulate CMB Q and U maps to train the network model and to test its capability of recovering the polarization signal. The main difference between our analysis and that of Petroff et al. (2020) is that we use traditional CNNs to process the spherical sky map and generate foreground components using random spectral parameters, which may be important for the recovery of the CMB signal. In addition, we also use different frequency bands to train the network. The code and ex-amples are available online. 1 This paper is organized as follows. In section 2, we illustrate the methodology of using a neural network to recover the CMB signal, which includes the architecture of the neural network, data simulations, the generation of the training data, data preprocessing, and the training process for the network model. In section 3, we apply the neural network to the CMB experiments-specifically, we first test it using simulated data, then apply it to Planck full mission data sets. In section 4, discussions about using the CNN method for the recovery of the CMB signal are presented. Finally, conclusions are shown in section 5.
2. Methodology 2.1. Network Architecture A feedforward neural network is a mathematical model that consists of multiple layers. Each layer accepts a vector from the former layer as an input, applies a linear transformation to the input, and gives an output to the next layer after a nonlinear activation. Formally, the output of an individual neuron can be written as f (wx + b), where x is the input vector, w and b are the linear weights and biases to be learned, and f is the corresponding nonlinear activation function. The weights w and biases b are optimized through a backpropagation algorithm when training the network on a data set {X train , Y train }, by minimizing the deviation, which is expressed quantitatively by a loss function, of the predicted resultŶ = f w,b (X train ) and the ground truth Y train . A CNN, which is designed for image recognition, is a class of feedforward artificial neural network that consists of at least one convolutional layer. A convolutional layer accepts inputs of images and convolves them with small filters (or kernels), whose values are weights to be learned. A filter is applied across an entire image with a bias, and a feature map is generated after a nonlinear activation function.
Previous work has shown that a deep network based on a CNN can be used for image denoising (Vincent et al. 2008). Therefore, a CNN-based network can be an appropriate choice for the component separation of CMB observations. Here, we adopt the widely used U-Net architecture (Ronneberger et al. 2015) to achieve our analysis. The model consists of two parts: the encoder part (the left part of Figure 1) and the decoder part (the right part of Figure 1). The encoder part aims to learn a representation of the input data, while the decoder part reads the representation to generate an output sequence. In this structure, skip connections (the dashed lines in Figure 1) are added symmetrically from each convolutional layer to its corresponding mirrored deconvolutional layer, to ensure that more information about the convolutional feature maps is passed to the corresponding deconvolutional layers (Mao et al. 2016). The passed convolutional feature maps are copied and concatenated with the de- convolutional feature maps. In practice, we design a deep convolutional neural network with eight convolutional layers and eight deconvolutional layers (see Figure 1) for the component separation of raw CMB maps. Each convolutional or deconvolutional layer is followed by a nonlinear layer. Here, we take the parametric rectified linear unit (He et al. 2015) as the nonlinear activation function. In addition to the last layer, the batch normalization (Ioffe & Szegedy 2015) is implemented before each nonlinear layer. We adopt Adam (Kingma & Ba 2014) as the optimizer, and the least absolute deviation is used as the loss function, averaged over all elements of the target maps. The first two convolutional layers and the last two deconvolutional layers have a kernel size of 5 × 5, while the other parts have a kernel size of 4 × 4. The output channels of each layer are 16,32,64,128,256,512,1024,2048,1024,512,256,128,64,32,16 and 1, respectively. The stride of all convolutional and deconvolutional layers is set to 2, which makes the size of the feature maps decrease in the encoder part and increase in the decoder part. Padding is applied to ensure that the sizes of the feature maps that are output are reduced by half for the convolutional layers and doubled for the deconvolutional layers.

Data Simulations
The network is trained through simulated maps in several frequency bands, which are generated by using the public software Code for Anisotropies in the Microwave Background (CAMB; Lewis et al. (2000)), HEALPix (Górski et al. 2005), and PySM (Thorne et al. 2017). The sources we consider here include the CMB as the signal, and thermal dust, synchrotron, free-free, and anomalous microwave emission (AME) as the contaminations. Besides, the Gaussian beam and instrument noise of the Planck mission are also considered in our analysis.
It should be noted that the network aims to learn a mapping between contaminated CMB maps and foreground-cleaned CMB maps. Therefore, the data space of the contaminated CMB maps fed to the network should be large enough to cover the true observational data (such as the Planck contaminated CMB maps). In order to expand the coverage, we manually add uncertainties at the simulation step to both the CMB signal and the foregrounds. Specifically, for the CMB part, we compute the lensed CMB power spectra with CAMB 2 in the Λ cold dark matter framework (H 0 , Ω b h 2 , Ω c h 2 , τ , A s , n s ). The value of each cosmological parameter is subject to a Gaussian distribution N (P, σ 2 ), where P and σ are the best-fit value and the 1σ standard deviation obtained from the Planck2015 results , respectively. Then we use the function synfast of the public software Healpy 3 (a python wrapper of HEALPix) to compute the CMB maps, with a pixel resolution of N side = 512.
For the foregrounds, the maps are simulated by using PySM, which is a public software that generates full-sky simulations of galactic foregrounds in intensity and polarization for various CMB experiments. We consider four dominant components for the temperature fluctuation maps: thermal dust at high frequencies, synchrotron, free-free, and AME at low frequencies. The model for thermal dust that we use here is the d1 model, referring to the single-component modified blackbody model, with templates for emission at 545 GHz in intensity from the Planck2015 analysis . The model for synchrotron is s1 , referring to a power-law model with the temperature templates at 408 MHz (Remazeilles et al. 2015). The free-free model is f1, referring to an analytic model with the temperature template at 30 GHz, which is derived from the Commander fit to the Planck2015 data (Draine 2011). The model for AME is a2, referring to the sum of two spinning dust populations, based on the Commander code ). The equations for generating the foreground maps are listed below: T a,ν = A ν0,1 (ν, ν 0,1 , ν p,1 , ν p0 ) + A ν0,2 (ν, ν 0,2 , ν p,2 , ν p0 ) , where T d stands for the blackbody temperature of the thermal dust, and A and β for the template maps of amplitude and spectral index, respectively. In order to increase the data space of the foreground maps, we randomize the spectral parameters A and β for each component in each frequency band when generating the training data. For the sake of simplicity, here we only consider randomizing the spectral index β, and we simply randomize all the pixels together. More analysis of the randomness of the spectral parameters will be discussed in section 4.2. Specifically, we generate maps for each component in each frequency band, through randomizing the values of A, according to a Gaussian distribution with its mean value equal to the spectral index of the template and a 10% standard deviation. We note that the method of generating foregrounds through random spectral parameters can also reduce the dependence of the results on the foreground template, which may improve the reliability of the results.
In our analysis, we test the CNN method with four frequency bands: 100 GHz, 143 GHz, 217 GHz, and 353 GHz.
The beam file used here is Bl T R3.01 fullsky X×X.fits, which is taken from the Planck Legacy Archive, 4 where X×X stands for 100×100, 143×143, 217×217, or 353×353. The Planck frequency maps (at 100 GHz, 143 GHz, 217 GHz, and 353 GHz) contain effective Gaussian beams of FWHM=(9 .66,7 .27,5 .01,and 4 .86); thus, these beam effects are considered in our simulation. The instrument noise adopted here is the 300 realizations of noise and systematic residual maps that are publicly available on the Planck Legacy Archive, and these noisy maps are added to the mock data randomly.

Training Set
We generate 1200 mock data sets at four frequency bands with the pixel resolution N side = 512, and split them into three groups: 1000 are taken as the training set, 100 as the validation set, and the others as the test set. We train the networks on the training set, select the optimal on the validation set, and then test it on the test set. As we illustrated in section 2.2, both the CMB and the foreground maps are generated randomly. Therefore, the 1200 mock data are also generated randomly, which means that the cosmological and spectral parameters of each set of maps in the 1200 mock data 4 http://pla.esac.esa.int/pla are different. The inputs of the network are the contaminated CMB maps in four frequency bands, which contain the foregrounds, beam, and noise information, while the output is a foreground-cleaned CMB map with beam FWHM=7 .27.

Data Preprocessing
The simulated CMB and foreground maps are onedimensional arrays in the RING numbering scheme of HEALPix. However, the filter of the convolutional layer used in this work is designed to process two-dimensional image data. Therefore, these one-dimensional arrays should be transformed into two-dimensional arrays before feeding them into the network.
Starting from the one-dimensional array, we first transform it into the NESTED scheme of HEALPix, then arrange it into 12 small pieces of maps, with the size of N side × N side , according to the NESTED scheme. Finally, these small maps are combined into a large plane map (with the size of 4N side × 3N side ) for network training. We note that the combined two-dimensional map is an approximation of the spherical sky map. After the training of the network, we can feed the test set or the observational data to the well-trained network model to obtain a foreground-cleaned CMB map. The recovered CMB map is a two-dimensional map with the size of 4N side ×3N side . Therefore, we can finally obtain a spherical CMB map using the output two-dimensional map via inverse operation.
Besides, in order to facilitate the training of the network, we scale the target to smaller values. Specifically, we divide the foreground-cleaned CMB maps (i.e. the target in the training set) by five to make the network easier to converge. Finally, we can obtain the final estimated CMB signal by inverse operation. Note that the scaling to the target is just a trick to ensure that the network can be better trained, and the number five here is based on experiments.

Training Process
In our pipeline, the batch size is 12, the initial learning rate is set to 0.1 and gradually decays to 10 −6 with the number of iterations, and the network model will be trained through 10,000 iterations by minimizing the loss function. For each iteration, a subsample with the size of the batch size will be loaded from the disk and fed to the network, and then the loss between the predicted maps and the target maps will be calculated, and, finally, the gradient of the loss will be calculated and passed backward to update the network parameters. The network models are trained on four NVIDIA 1080 Ti graphics processing units, and it takes ∼ 6 hr for one network model.

Application to CMB Experiments
The main procedure in using this method is to train a network model using simulated data and then apply the well-trained model to observational data; thus, in this section, we first test our method on simulated CMB maps, and then test it on the Planck CMB maps.
3.1. Recovering the CMB Signal In this section, we train the networks to recover the CMB temperature signal from simulated CMB observations. After selecting the optimal model with the validation set, we feed the maps of the test set to the network model to obtain a series of foreground-cleaned CMB maps. In Figure 2, we show one set of maps of the test set. The upper panel is the simulated pure CMB signal, the middle panel is the CMB map recovered by the neural network, and the lower panel is the corresponding residual map. Based on the residual map, we can see that the CMB information in the high galactic latitudes has been recovered very well, while in the low galactic latitudes, especially close to the galactic plane, the performance of the network seems a little unsatisfactory, due to the strong foreground contaminations. In order to have a deeper look at the recovery of the CMB map, we selected five small patches with 3 × 3 deg 2 from the north pole to the south pole with different latitudes, as shown in Figure 3. The first and last patches are located near the north and south poles, respectively. The second and fourth patches are located between the galactic plane and the north (south) pole, while the third patch is located in the galactic plane. For the first, second, fourth, and fifth patches, the recovered CMB maps are almost the same as the simulated ones, and there is little information left in the residual maps. On the contrary, the recovered CMB map for the third patch looks different to the simulated one, and there is a lot of information in the residual map. As we have shown in the residual map of Figure 2, this should be caused by the foreground contaminations in the galactic plane.
For a quantitative comparison, we calculate the mean squared error (MSE) for the recovered CMB map: MSE = 40.055 µK 2 . Furthermore, we also calculate the root mean square (rms) for 18 zonal bands in the residual map, each being 10 • wide in galactic latitude, shown in the left panel of Figure 4. The rms range of the residual CMB map is from 3.861 µK in the high galactic latitudes to 12.335 µK in the galactic center, with a sudden increase close to the galactic plane b < 20 • . For comparison, we also calculate the rms of the contaminated CMB maps at 100 GHz and 143 GHz, shown in the right panel of Figure 4. We can see that, except for the larger order of magnitude, the shape of this curve of the rms looks similar to that of the residual map. Therefore, by comparing these two panels, one can easily conclude that the sudden increase in rms close to the galactic plane in the residual map is caused by the galactic foregrounds.
We then compute the corresponding power spectrum of the recovered CMB map, shown in the upper panel of Figure 5. The orange line is the power spectrum of the recovered CMB map, while the blue line is that of the simulated CMB map. In the lower panel, we show the difference between the recovered spectrum and that of the simulated CMB map. We can see that the recovered CMB signal is quite consistent with the true spectrum at both small scales and large scales, which is different from the fact in Petroff et al. (2020), where the recovered spectrum diverges from the true spectrum at > 900.
We note that, at > 750, the recovered power spectrum begins to fluctuate, which should be caused by instrument noise and instrument beam effects. To prove this, using the same procedure, we trained another network model with data that have no instrument noise and no instrument beam effects, and test this network using mock data. In Figure 6, we show the power spectrum of one set of maps from the test set. We can see that, at > 750, the difference between the recovered spectrum and that of the simulated CMB map is much better than the results in Figure 5. Therefore, the instrument noise  and instrument beam effects have a great influence on the recovery of the CMB signal.
In addition, we calculate the deviation of the recovered power spectrum, and then compare this deviation with the cosmic variance. First, we calculate the difference between the recovered power spectrum and the simulated ground truth: where C ,recover is the recovered power spectrum and C ,true is the true power spectrum calculated using the simulated CMB map. Then, we calculate the standard deviation of this difference: where N is the number of samples in the test set and ∆C ,CNN is the average of ∆C ,CNN . Finally, we compare this deviation with the cosmic variance:  lated using the CNN model, corresponding to Figures 2 and 5. The result shows that the deviation caused by the CNN is a little larger than the cosmic variance at < 10. However, with the increase of the multiple , the deviation will decreases rapidly, and it will becomes smaller than the cosmic variance by more than two orders of magnitude at > 200. This deviation is quite small when compared with the cosmic variance, and thus it will have less effect on the recovered CMB power spectrum. To further test the reason why the deviation is larger than the cosmic variance at < 10, we apply the Planck mask COM Mask CMB-common-Mask-Int 2048 R3.00.fits 5 when calculating C . The results are shown in Figure 8. We can see that these results are somewhat better than the previous ones, but the deviations caused by the CNN are still larger than the cosmic variance at < 8. This means that the deviation is not primarily caused by the foreground contaminations in the galactic plane; instead, it is more likely that the deviation is caused by the CNN itself. Further research 5 http://pla.esac.esa.int/pla/#maps   is needed to figure out if this issue can be solved fundamentally, which will be shown in our future works.

Effect of Frequency Bands
In the above analysis, four frequency bands are used to train the neural network. However, we note that the number of frequency bands may affect the recovered CMB map. Therefore, in this section, we test the effect of frequency bands on the recovery of the CMB signal. Specifically, we select another three cases to train the network, with one, two, and three frequency bands, respectively; thus, there are four cases in total, as shown in Table 1. Following the same procedure, we generate the training data using the method of section 2.2, train the network models for these four respective cases using   Figure 9. Power spectra of the recovered CMB map (upper panels, using a bin size of 1) and the difference between these spectra and that of the simulated CMB maps (lower panels, using a bin size of 30) when using 1, 2, 3, and 4 frequency bands, respectively.  Figure 10. Power spectra of the recovered CMB maps (upper panels, using a bin size of 1), and the difference between these spectra and those of the simulated CMB maps (lower panels, using a bin size of 30), when using two frequency bands. The left panel is for a 50% noise level and the right panel is for a 30% noise level. the method of section 2.5, and then test them using the corresponding test sets.
In Figure 9, we show the recovered power spectrum of one set of maps from the test set for the four respective cases. From left to right, the training set contains one, two, three, and four frequency bands, respectively. Obviously, we can see that as the number of frequency bands increases, the recovered power spectrum gradually matches the true power spectrum. Therefore, the number of frequency bands significantly affects the recovered CMB; and the more frequency bands, the closer the recovered CMB is to the real one. Quantitatively, for the recovered power spectra, the MSEs are 9240.498, 4761.857, 1068.350, and 532.523 µK 4 , respectively. With the increase in the frequency bands, the MSEs of the power spectra become smaller and smaller, and for the cases that have four frequency bands, the recovered power spectrum can match the true power spectrum very well. This means that for the Planck experiment, the component separation of the CMB can be achieved with four frequency bands.
In addition, we calculate the deviations of power spectrum caused by CNN (see Figure 7) for all the three cases and compare them with that of case 1. We can see that even all the deviations of these three cases are a little larger than that of case 1, they are all much smaller than the cosmic variance at > 10. Therefore, this deviation caused by CNN remains a secondary factor when using fewer frequency bands.

Effect of Noise
The analysis of section 3.2 shows that for the Planck experiment, at least four frequency bands are needed when using the neural network to recover the CMB signal (see Figure 9). However, this may not be the case for other experiments because the results can be affected by the noise level of the instrument. To further test the effect of instrument noise on the recovery of the CMB signal, we consider another two different noise levels of the instrument: 50% and 30% of the Planck noise level. To more clearly illustrate the effect of noise on the recovery of the CMB signal, we train the network model using two frequency bands.
We first consider the case of 50% Planck noise level. The recovered power spectrum of one set of maps in the test set is shown in the left panel of Figure 10. The recovered power spectrum is a little better than that when using Planck noise (see the second panel of Figure 9). Moreover, we can clearly see that the recovered power spectrum is more consistent with the truth than that of section 3.2. Therefore, the noise level will greatly affect the recovery of the CMB signal when using the neural network.
Furthermore, we train the network model using data that have a 30% Planck noise level, and the recovered power spectrum of one set of maps from the test set is shown in the right panel of Figure 10. We can see that the recovered power spectrum is quite consistent with the true spectrum at both small scales and large scales, which is similar to those using four frequency bands (see Figure 5). Furthermore, the recovered power spectrum is much better than that using the Planck noise, and even better than that using a 50% Planck noise level. This indicates that for CMB experiments that have lower noise levels, the neural network is capable of recovering the CMB signal using only two frequency bands. Therefore, this means that the neural network will play an important role in the component separation of future higherprecision CMB experiments. Moreover, following the same procedure, we calculate the deviations of the power spectra using the training set, as shown in Figure 7. The deviations are much smaller than the cosmic variance at > 10, and also smaller than the cases using different frequency bands, in previous sections. This means that the deviations of the power spectra caused by the CNN will be reduced when there are smaller instrument noise levels.

Effect of the Galaxy Foregrounds
Besides the effects of instrument noise and frequency bands, the foregrounds from our Galaxy constitute the other main factor. To prove this, we use two parts of the full-sky map to test the CNN method. Specifically, the areas inside the brown lines in Figure 11 show two blocks of the full-sky map of the CMB observations, and we train the network model using these two blocks, respectively. The first block is at high galactic latitudes, which are far away from the Galaxy plane and are less contaminated by Galaxy foregrounds. The second block is near the Galaxy plane, and is greatly contaminated. Note that the size of the image is 512 × 512, which is smaller than the image using the full-sky map. Therefore, the network model here actually contains five convolutional layers and five deconvolutional layers.
First, we train a network model for the first block. After selecting the optimal model with the validation set, we feed maps of the test set to the network model to obtain a series of foreground-cleaned CMB maps. We show one set of maps from the test set in Figure 12. The upper left panel is the simulated CMB map, and the upper right panel is the CMB map recovered by the network. We can see that these two maps are almost identical, which can also be concluded from the residual map in the lower left panel. To further look at the recovery of the CMB map at a smaller scale, we select three patches with 3 × 3 deg 2 that have different latitudes, as shown in Figure 13. We can see that the recovered CMB maps are almost the same as the simulated ones for all the three patches, and that there is little information left in the residual maps. Furthermore, we calculate the power spectrum of the recovered CMB map, and the difference between this spectrum and that of the simulated CMB map, as shown in the lower right panel of Figure 12. We can see that the recovered CMB signal is quite consistent with the true spectrum at both small scales and large scales, which is similar to the results of section 3.1 (see Figure  5). Therefore, this indicates that the CNN method can also be used for the component separation of partial-sky experiments, thereby increasing the practicability of the method.
Then, using the same procedure, we train another network model for the second block, and test it with the corresponding test set. In Figure 14, we show one set of maps from the test set. It can be seen from the residual map that, compared with the first block (Figure 12), the recovered CMB map is greatly affected by the Galaxy foregrounds. Figure 15 shows three patches with 3 × 3 deg 2 , selected from different latitudes. For the first and third patches, the recovered CMB maps are almost the same as the simulated ones, while for the second patch, the recovered CMB map is slightly different from the simulated one, and there is a lot information in the residual map. The reason is that this patch is located in the galactic plane. From the lower right panel of Figure 14, we can see that the recovered power spectrum of the CMB is consistent with the ground truth at < 1000. However, the recovered power spectrum greatly deviates from the ground truth at > 1000. This means that the Galaxy foregrounds have less influence on the large-scale CMB information, but have greater influence on the small-scale CMB information.

Test with Planck CMB Maps
In view of the success of the CNN in recovering the CMB signal from various huge foreground contaminations, we finally apply this pipeline to the Planck mission. As a preliminary test, we only consider the recovery of CMB temperature fluctuations. Specifically, we apply the CNN model obtained in section 3.1 to the Planck full mission datasets. Therefore, the inputs of the network are four-frequency contaminated CMB maps that contain the beam and noise information, while the target of the network is a foreground-cleaned CMB map with the beam FWHM=7 .27.
The main results are shown in Figure 16. CMB map, which is publicly available on the Planck Legacy Archive and is considered the ground truth; the middle panel is the CMB map recovered by the neural network; and the lower panel is the corresponding residual map. We can see that the CMB map recovered by the neural network is quite similar to the Commanderbased CMB map at high galactic latitudes, but in the low galactic latitudes, especially close to the galactic plane, there is a little difference, which can also be seen from the residual map. Following the same method in section 3.1, we select five small patches with 3 × 3 deg 2 from the north pole to the south pole, shown in Figure 17.
We can see that the recovered CMB maps are consistent with the simulated ones for the first, second, fourth, and fifth patches, while the recovered CMB map is different from the simulated one for the third patch, which should be caused by the foreground contaminations in the galactic plane. For a quantitative comparison, we calculate the power spectra of the recovered CMB and Planck CMB maps, shown in the upper panel of Figure 18. Note that in the galactic plane, the CMB signal is not recovered very well for either the Commander-based CMB or the CNNbased CMB maps; therefore, the Planck mask is applied to the CMB maps in the calculation of the power spectra. Here, the obtained power spectrum D is divided by the Gaussian beam window function B( ), thus the beam information has been removed from the power spectra. Obviously, these two power spectra are almost identical. Furthermore, we calculate the difference between these two power spectra, shown in the lower panel of Figure  18. We can see that the difference is quite small for < 1000, while there is a deviation for > 1000. This may be caused by the application of the Planck mask to the recovered CMB map, because the Planck mask was not used when training the network. A more reasonable approach might be to apply the Planck mask to the training set, and then train the network using the masked data. To do this, we train a new network model using the masked data, and apply it to the Planck datasets to calculate the power spectrum. The recovered power spectrum is shown in Figure 19. Obviously, this result is better than that in Figure 18 for > 1000, and it is also better than that of Petroff et al. (2020). Therefore, this illustrates that the CNN method can successfully recover the CMB signal from the Planck data sets, and it also indicates the feasibility of this method in processing observational data.
However, it should be noted that only four main foreground components are considered in the training set, and we are here only proving the feasibility of this method in processing observational datasets, thus all these analyses are simple and rough. Therefore, when applying this method to observational datasets, a detailed analysis using more comprehensive foreground models should be elaborated in future works.

Frequency Selection for Planck Mission
In sections 3.1 and 3.3, we have shown that both instrument noise and beam effects have an influence on the recovery of the CMB signal. This means that either strong instrument noise or strong instrument beam effects will cause the CMB signal extracted by the CNN to be biased. This is the main reason why only four frequency bands are considered in our analysis.
In order to show the instrument noise and beam effect of the Planck satellite, we plot in Figure 20 the power spectra of a simulated CMB map, considering Planck beam effects and the power spectra of instrument noise at nine observational frequency bands. For 30 GHz and 44 GHz, a lot of the CMB information at > 600 is lost due to the beam effect. Therefore, these two frequency  bands can almost only provide large-scale information. Furthermore, the noise power spectra are larger than the power spectra of the CMB at > 500, which makes it difficult for the CNN to extract the CMB signal at > 500. Therefore, these two frequency bands are not considered in our analysis. For 70 GHz, the CMB information is retained at < 1000, but the noise power spectrum becomes larger than that of the CMB at > 850; thus, this frequency band is also not considered in our analysis. For 545 GHz and 857 GHz, we can see that the noise power spectra are much larger than those of the CMB signal in the order of magnitude. Therefore, these two frequency bands should not be used, even though small-scale information is not affected by the beam effects. Therefore,  Figure 19. The same as Figure 18, but now the Planck mask is applied to the training set.
considering the instrument noise and beam effects, only four frequency bands (100 GHz, 143 GHz, 217 GHz, and 353 GHz) are considered in our analysis.
We note that the effective beam effect of the output CMB map should also be selected reasonably. For the four selected frequency bands, the effective Gaussian beams are 9 .66, 7 .27, 5 .01, and 4 .86, respectively. Obviously, it is not reasonable to select a beam effect with FWHM¡4 .86, since the CNN will not generate additional CMB information. On the contrary, it is reasonable to select a larger value of FWHM among these four frequency bands. Therefore, an effective beam effect with FWHM=7 .27 is considered in our analysis, which is reasonable for recovering the CMB signal. Therefore, we propose that for other current or future CMB experiments, the training data should also be determined ac-  Figure 20. Power spectra of a simulated CMB map, considering Planck beam effects, and power spectra of instrument noise at nine observational frequency bands. Yes Note: 10% refers to the standard deviation of the spectral parameters relative to the template parameter value, and "random pixel independently" means whether a pixel is randomized independently or not.
cording to the specific instrument noise and beam effect when using this method.

Variability of Foregrounds
Since the foregrounds are much larger than the CMB signal in the order of magnitude, foregrounds are very important elements in the component separation of CMB. In addition, the CNN method is a data-driven method that strongly relies on simulated data. Therefore, when generating training data, the foregrounds should be simulated reasonably and correctly. Considering the fact that the spectral parameters of the foregrounds will vary across the sky, the spectral parameters should be generated randomly when simulating the foregrounds. For the simulation of the foregrounds in section 2.2, we only considered the variation of the spectral index β, with a 10% standard deviation. In this section, we will consider more cases of the variation of the spectral parameters to test the impact of this on the recovery of the CMB signal.
In table 2, we show five cases of variations of the spectral parameters. Case 1 refers to the method illustrated in section 2.2, which only considers the variation of the spectral index β, with a 10% standard deviation. For case 2, there are no variations in either the amplitude A or the spectral index β, which means that the template of the foregrounds in PySM is used directly for the training data. For cases 3 and 4, we only consider the variation of the amplitude or the spectral index, respectively. For case 5, we simulate data by varying both the amplitude and the spectral index. It should be noted that for case 1, all pixels are varied together, which means that all the pixels of a map share the same random number. On the contrary, for cases 2, 3, 4, and 5, the pixels are varied independently, which means that each pixel of a map has an independent random number.
Following the same procedure as the analysis above, for case 2, we generate training data and train and test the network. In the upper left panel of Figure 21, we show the recovered CMB power spectrum, and the difference between this power spectrum and that of the simulated one. We can see that the recovered power spectrum is a little lower than the ground truth, especially for > 1000, when compared with case 1 (see Figure 5). We note that only one set of foregrounds (the template) is used in the training data. The results may indicate that, compared with case 1, the variation of the spectral parameters will improve the robustness of the network and increase the accuracy of the recovered power spectrum.
For case 3, we generate new data and train another network model to recover the CMB signal. The results are shown in the upper right panel of Figure 21. We can see that the recovered power spectrum becomes larger than the ground truth at > 900, and gradually deviates with the increase in multipoles, which is unacceptable.
The main difference between case 3 and case 1 is the variation of the spectral parameters. To illustrate this difference, take the thermal dust at 143 GHz as an example-we show the power spectra of ten sets of samples in Figure 22. The black lines refer to the power spectrum of the dust template, while the other colored lines are the power spectra of the simulated maps. For case 1, the simulated power spectra are almost coincident with the template, at both large scale and small scale. However, for case 3, the simulated power spectra are coincident with the template at large scale, but grad-   Figure 21. Power spectra of the recovered CMB maps, and the difference between these spectra and those of the simulated CMB maps, for cases 2, 3, 4, and 5, respectively. ually deviate from the template at > 900. It looks like the simulated thermal dust is obtained by adding a noise map to the template. As we have shown in section 3.3, noise will greatly affect the recovery of the CMB signal, thus the deviation of the recovered power spectrum at > 900 should be caused by the variation of the spectral parameters for the foreground components.
Finally, following the same procedure, we also test cases 4 and 5. Examples of the recovered CMB power spectra are shown in the lower left panel and lower right panel of Figure 21, respectively. For both cases, we can see that the recovered power spectra become lower than the ground truth at > 450, and gradually deviate with the increase in multipoles. Furthermore, we can see that the power spectra of the simulated dust maps (the lower left panel and the lower right panel of Figure 22) deviate from the template at small scale, which is similar to case 3. Therefore, the deviations of the power spectra in Figure 21 are caused by the randomness of the foreground spectral parameters. Comparing the results of case 4 and case 1, we may conclude that the power spectra of the foreground components will be destroyed (especially at small scale) if we randomize the spectral index β for each pixel independently, and more additional noise will be introduced.
The results of these five cases show that the random form of the spectral parameters will have a great impact on the recovery of the CMB signal. Therefore, the spectral parameters of the foregrounds should be randomized reasonably. We propose that the power spectrum of the simulated foregrounds should be statistically coincident with that of the template at both large scale and small scale. We note that the method of varying the spectral parameters used in this paper is very simple, and that further efforts are needed to study how to simulate the foregrounds more reasonably. We will investigate this issue in our future works.

Partial-sky Experiments
The analysis in section 3.4 shows that the CNN method can also be used for partial-sky experiments. However, the block maps used in section 3.4 are regularly shaped sky areas, which may be difficult to achieve in actual observations. For a specific partial-sky experiment, the observed sky area may have an irregular shape. In this section, we provide a method for training the network using irregular maps. For an irregular sky area, we can always transform it and fill it into a rectangle. Taking the simulated data as an example, we apply the Planck mask to the first block of Figure 11 and use it to train a network model. This means that the number of pixels of the unobserved area is set to zero. Using the same   Figure 23. The same as the Figure 12, but now the Planck mask is adopted in the training data. procedure as section 3.4, we train the network on the training set and test it on the test set.
In Figure 23, we show one set of maps from the test set. From the simulated CMB map (the upper left panel of this figure), we can see that, since there is no observation, the pixel value of the area in the lower right corner is set to zero. The recovered CMB map is shown in the upper right panel, and it looks similar to the simulated one. From the lower left panel, we can see that there is very little information is left in the residual map. Furthermore, we calculate the power spectra of both the recovered CMB map and the simulated CMB map. Specifically, we first fill the partial-sky map into a full-sky map, and make sure the other pixels have zero values, then calculate the power spectrum using this full-sky map. The results (see the lower right panel) show that the recovered CMB signal is quite consistent with the true spectrum at both small scales and large scales. Therefore, this indicates that the CNN method can also be used for the component separation of partial-sky experiments with irregular sky areas.
We note that the sky area used here is generated manually, by applying Planck mask, which may be more ideal than the actual sky area. Therefore, further research is needed when applying this method to actual observational data. We will investigate this interesting issue in future works.

CMB Polarization Maps
The analysis above is mainly focused on the recovery of the CMB temperature signal. However, this method is a common method that can also be used for the recovery of the CMB polarization signal. In this section, taking the CMB-S4 experiment as an example, we test the reliability of the CNN in recovering the CMB Q/U maps.
Specifically, we select six frequency bands of the CMB-S4 experiment, from 85 GHz to 270 GHz. The details of the experimental specifications are shown in Table 3. For the sake of simplicity, we only consider a block of sky like the first block in Figure 11. Therefore, the sky fraction f sky = 0.083 for all frequency bands. The network used here is the same as the one used in section 3.4, which contains five convolutional layers and five deconvolutional layers. Following the same method as sections 2.2, 2.3, 2.4, and 2.5, we simulate the CMB, foregrounds, and noise components for the CMB-S4 experiment, generate the training set, preprocess the data, and finally train the network to recover the CMB Q and U maps, respectively.
After training the network, we test it using the test set. In Figure 24, we show one set of maps from the test set. It can be seen from the residual maps that very little information is left, which means that the recovered CMB Q/U maps are very similar to the simulated ones. Furthermore, we calculate the EE and BB power spectra from the recovered CMB Q/U maps, shown in Figure 25. From the residual of the EE power spectrum (the upper right panel of Figure 25), we can see that the recovered power spectrum is quite consistent with the simulated one at < 1250, and that it gradually deviates with the increase in multipoles at > 1250, with the maximum deviation being ∼ 6.5% at = 1500. Unlike the EE power spectrum, the residual of the BB power spectrum, in the lower right panel of Figure 25, shows that the BB power spectrum is consistent with the simulated one at both small scales and large scales, which is quite helpful for the detection of the primordial gravitational waves.

Comparing with Other Methods
In this section, we compare our analysis with that illustrated in Petroff et al. (2020). There are two main differences between our analysis and that of Petroff et al. (2020): the network model and the data used to train the network.
In our analysis, a general two-dimensional CNN with U-Net architecture is adopted, while in the Petroff et al. (2020) analysis, a spherical CNN with U-Net architecture based on DeepSphere is used. The original intention of using a general two-dimensional CNN in our analysis was to enable the method to deal with both the full-sky maps and partial-sky maps. Fortunately, the analysis above shows that the network model based on a general two-dimensional CNN is capable of dealing with both full-sky maps and partial-sky maps. This is useful for experiments that focus on measuring the CMB polarization signal in a part of the sky.
We both tested the CNN method by using the simulated data and by applying the method to the Planck temperature measurement to obtain foreground-cleaned CMB maps. But the difference is that we only considered four frequency bands (100 GHz, 143 GHz, 217 GHz, and 353 GHz) in our analysis, while Petroff et al. (2020) used seven frequency bands, from 70 GHz to 857 GHz, based on the Planck experiment. We used four frequency bands because we found that both the noise level and the beam effect (see sections 3.1 and 3.3) affect the recovery of the CMB signal.
In addition, according to our understanding, Petroff et al. (2020) smoothed the maps in all frequencies, using a Gaussian beam with FWHM=13 .1 (corresponding to the 70 GHz channel), and the target of the training set contained this beam effect. By contrast, we smoothed the maps in different frequencies with different beams, based on the Planck experiment, and the target of the training set contained a Gaussian beam effect with FWHM=7 .27    Figure 25. EE and BB power spectra calculated from the recovered CMB Q/U maps in Figure 24 (left panels, using a bin size of 5), and the difference between these spectra and those of the simulated maps (right panels, using a bin size of 30).
(corresponding to the 143 GHz channel). From Figure  20, we can see that the power spectrum of the CMB at 70 GHz will loses more information than that at 143 GHz due to beam effects. Also, the power spectrum of the noise will be larger than that of the CMB at > 900. Therefore, worse results will be obtained when smoothing the target of the training set using a Gaussian beam with FWHM=13 .1.

The CNN Method
The CNN is capable of extracting the features of CMB signals from images, and passing these features to later layers for the further filtering out of interfering signals. The CNN-based component separation method is a new approach to extracting CMB signals from the perspectives of images. The analysis above shows that the CNN method is capable of recovering the tiny CMB temperature and polarization signals from contaminated CMB observations. Therefore, the CNN method is an independent method, and it can be used as an alternative to various other component separation methods. This should be helpful for the data analysis of current and future CMB polarization experiments.
In the pipeline of recovering the CMB signal using the CNN, the neural network should be trained before it is applied to the observational data sets; thus, the CMB and foreground models used to generate the training set are input knowledge. This means that the CNN method is a model-dependent method that relays on the input information. Therefore, the simulation of the training set is important for this task. The training set should contain all possible components that the detector may receive, such as the CMB signal, all possible foreground components, the instrument noise, and the instrument beam effect.
The analysis in this work has mainly focused on the recovery of the CMB signal. However, similar to the CMB signal, various foreground components can also be regarded as signals, therefore the CNN method can also be used to separate these foreground components from observation data sets, which will allow us to have a better understanding of the physical properties of the foreground components. In addition, the CNN method is a general method for processing two-dimensional images; thus, it can be used for the component separation of other sky survey experiments, such as the radio survey experiments. We will investigate these interesting issues in future works.

Conclusions
In this work, we show that a CNN can be used to recover the tiny CMB signals from various huge foreground contaminations. Focusing on CMB temperature fluctuations, we train the CNN model using simulated data, and find that the CMB temperature maps can be recovered with high accuracy, and that the deviation of the power spectrum C is smaller than the cosmic variance at > 10. Furthermore, we apply this method to the Planck full mission datasets and find that the recovered CMB map is quite consistent with that disclosed by the Planck collaboration. Therefore, this indicates that the CNN method is capable of recovering the CMB signal from observational datasets, making it an alternative to current methods for the component separation of CMB raw maps, which can be used to analyze future CMB observational datasets, such as the CMB-S4 experiment.
Except for the full-sky maps, the CNN method is also capable of dealing with partial-sky maps. Specifically, we train the CNN model using part of the full-sky CMB map, and find that the recovered CMB signal is consistent with the simulated one. This means that this method can also be used in partial-sky experiments, allowing the method to be more widely used in future observational experiments.
Moreover, taking the CMB-S4 experiment as an example, we simulate CMB Q/U maps to test the potential of the CNN in recovering the CMB polarization signal. The results show that, for the CMB-S4 experiment, both Q and U maps can be recovered with high accuracy. Furthermore, we calculate the EE and BB power spectra, and find that both of them can be recovered with high accuracy. Therefore, the CNN method may play an important role in experiments that measure the CMB polarization signal.
More interestingly, the CNN method is aimed at analyzing two-dimensional images; therefore, this means that it can become a common method for other similar tasks, such as the component separation of future radio surveys, like the Square Kilometer Array project (Mellema et al. 2013). We will investigate these interesting issues in future works.  Figure 14. The same as Figure 12, but now using the second block of Figure 11 to train the network.