Baryon density extraction and isotropy analysis of cosmic microwave background using deep learning

The discovery of the cosmic microwave background (CMB) was a paradigm shift in the study and fundamental understanding of the early Universe and the Big Bang. The CMB is one of the richest and most intriguing sources of information available to cosmologists, and one parameter of special interest is the baryon density of the Universe. Baryon density can be estimated primarily by analyzing CMB data or through the study of big bang nucleosynthesis (BBN), so it is necessary that the results of the two methods agree with each other. Although there are well-established statistical methods for estimating baryon density from the CMB, here we explore the use of deep learning for this purpose. We pair the baryon density obtained from the power spectrum of simulated CMB temperature maps with the corresponding map image to form the dataset for training the neural network model. We analyze the accuracy with which the model can predict the results from a relatively abstract dataset, given that the CMB is a Gaussian random field. The CMB is anisotropic due to temperature fluctuations at small scales, but on larger scales it is considered isotropic. We analyze the isotropy of the CMB by training the model with CMB maps centered at different galactic coordinates and comparing the predictions of the resulting neural network models.


Introduction
The CMB is electromagnetic radiation whose wavelength lies in the microwave region of the spectrum. CMB maps display the temperature fluctuations of photons that originate from a very early time, when the Universe was about one-thousandth of its present size. These fluctuations follow an almost Gaussian random distribution (Bucher (2015)), which carries over to the pixel intensities of the digital images of Mollweide projections of these maps. Because the CMB maps are Gaussian random fields with no spatial coherence between them, it is an intriguing task to check how accurately machine learning models can correlate such maps with cosmological parameters, given the abstract nature of the dataset. Our objective is to build a machine learning model, study its ability to predict the baryon density of a given CMB map, and analyse the isotropy of the CMB by comparing the results of models trained with maps centred at different galactic coordinates. Although information is lost when training the model on rendered images of the CMB, we take only a single quantity (baryon density) as our parameter of interest, with the sole purpose of demonstrating the ability of a machine learning model to extract useful information from CMB data.
The variations in temperature of the cosmic microwave background (CMB) are similar to ripples on a cosmic pond and encode a great deal of information about the Universe. To extract this information we look at the scales at which these temperature fluctuations occur. The amplitude of the temperature fluctuations (in microkelvin) is plotted against the multipole moment (l); this is the angular power spectrum of a CMB temperature map, shown in figure 1. Such graphs contain a number of peaks that carry much of this information, which we exploit. The first peak indicates the geometry of the Universe, that is, whether it is flat or curved (Hu and White (2004)). Because CMB radiation reaches us from all directions of the visible Universe, it is distorted by the curvature of the Universe. The fluctuations appear undistorted if the Universe is flat, magnified if it is positively curved, and de-magnified if it is negatively curved.
The second peak reveals information about the amount of baryonic matter present in the Universe. Owing to the initial fluctuations in the Universe, all matter tends to cluster gravitationally towards regions of higher density. However, baryonic matter, which interacts with light, heats up as it clumps, and the resulting pressure pushes back against the clustered matter (Hu and White (2004)). This implies that the second peak is more strongly damped if there is more baryonic matter.
Thus, the ratio of the first and second peaks gives us the baryon density, which we use as the label of each map when training our model.
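As an illustration of the peak-ratio idea, the following minimal sketch locates the first two local maxima of a one-dimensional spectrum and returns the ratio of their heights. The function name `first_two_peak_ratio` and the toy two-bump spectrum are assumptions made for this example; they are not CAMB output, and the sketch does not convert the ratio into a baryon density.

```python
import numpy as np

def first_two_peak_ratio(power):
    """Return height(first peak) / height(second peak) of a 1-D spectrum."""
    power = np.asarray(power, dtype=float)
    # A sample is a local maximum if it is strictly larger than both neighbours.
    interior = power[1:-1]
    is_peak = (interior > power[:-2]) & (interior > power[2:])
    peak_idx = np.flatnonzero(is_peak) + 1  # shift back to indices of `power`
    if peak_idx.size < 2:
        raise ValueError("spectrum must contain at least two peaks")
    return power[peak_idx[0]] / power[peak_idx[1]]

# Toy spectrum (hypothetical, NOT a real CMB spectrum): two smooth bumps
# of heights 5 and 2, standing in for the first and second acoustic peaks.
ell = np.arange(2, 1001)
toy_spectrum = (5.0 * np.exp(-(ell - 220) ** 2 / 5000.0)
                + 2.0 * np.exp(-(ell - 540) ** 2 / 5000.0))
ratio = first_two_peak_ratio(toy_spectrum)
```

A spectrum estimated from a single noisy map would probably need smoothing before a simple neighbour comparison like this one finds the acoustic peaks reliably.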
The anisotropies of the CMB are determined by two factors, namely acoustic oscillations and diffusion damping. The pressure of the photons tends to erase anisotropies, whereas the gravitational attraction of matter makes them collapse to form over-densities. These two effects compete and compensate each other to create acoustic oscillations, which give the CMB its characteristic peak structure. Modes that are at their maximum amplitude when the photons decouple correspond to the peaks in the power spectrum of the CMB. The temperature anisotropy at any point on the sky (θ, ϕ) can be expressed in spherical harmonics as:

$$\frac{\Delta T}{T}(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm} Y_{lm}(\theta, \phi)$$

where Y_{lm} are the spherical harmonics and a_{lm} are the corresponding multipole coefficients.
The temperature anisotropies of the CMB are believed to be a result of the inhomogeneous matter distribution at the time of recombination. Since Compton scattering is an isotropic process, any primordial anisotropies should have been erased before decoupling. This supports the interpretation that the observed anisotropy results from the density perturbations that facilitated the formation of galaxies and clusters. Anisotropy in temperature thus provides evidence that such inhomogeneity in density existed in the early Universe, supposedly arising from quantum fluctuations in the scalar field of inflation or from topological defects resulting from a phase transition.
Neural networks are increasingly used to approach and solve physics problems, especially in cosmology. Models such as variational autoencoders have been used for CMB image inpainting to reduce the uncertainty of parameter estimation (Yi et al (2020)). A deep convolutional neural network, ResUNet, has been used for lensing reconstruction of the CMB (Caldeira et al (2019)). Foreground components have been removed from the CMB using a Bayesian spherical convolutional neural network that captures both spectral and morphological aspects of the foregrounds (Petroff et al (2020)). Continuing this trend, in this paper we use neural networks to estimate the baryon density of a given CMB temperature distribution and to verify its isotropic nature.

Methodology
We use the Code for Anisotropies in the Microwave Background (CAMB) to generate the CMB temperature angular power spectrum data. CAMB is a cosmology library used to calculate CMB, lensing, source count and dark-age 21 cm angular power spectra (Lewis and Challinor (2014)). CAMB takes several parameters as input and generates a FITS file containing the initial angular power spectrum data of the Universe. The curved correlation function is used as the lensing method, and reionization is included. Other physical parameters that are input to CAMB include the Hubble constant, the temperature of the CMB (2.7255 K), the baryon density (0.0226), the cold dark matter density (0.112), the effective mass density of dark energy, the maximum multipole, the redshift (11), the helium fraction (0.24) and so on (LAMBDA-Tools, NASA (2015)). The generated power spectrum file is passed to healpy (Gorski et al (2005)), a standard Python package for cosmological analysis that handles pixelated data on a sphere, to generate random Gaussian CMB temperature maps. We create 2350 such maps. Figure 2 shows one such CMB temperature map created using healpy. The dipole anisotropy due to the motion of the Earth relative to the CMB rest frame, and the galactic contaminants along the equator (corresponding to the galactic plane), are removed while generating the temperature maps (Bucher (2015), Planck et al (2011)). The generated full-sky maps have the galactic center at the center of the Mollweide projection.
Also, baryon density (extracted from the power spectrum) corresponding to each of the generated temperature maps is stored in a separate CSV file.
We then crop 64×64 pixel images (each corresponding to 28×28 deg²) from the whole-sky map using OpenCV and store them separately (figure 3). The images are cropped only along the equator so that these portions can be treated as flat. In addition, we rotate the whole-sky CMB map about 4 different axes, moving the center of the map to different galactic coordinates, and repeat the same process in order to verify isotropy at large scales. These images are the input to the training model. We end up with 141,250 cropped images to train our model.
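The cropping step can be sketched as follows. This is a minimal illustration using plain NumPy slicing rather than OpenCV; the function name `crop_equatorial_patches`, the non-overlapping tiling and the 400×800 stand-in image are assumptions made for this example, not details taken from the pipeline described above.

```python
import numpy as np

def crop_equatorial_patches(sky_image, patch=64):
    """Cut non-overlapping patch x patch tiles from the central row band."""
    height, width = sky_image.shape[:2]
    top = height // 2 - patch // 2      # centre the band on the equator row
    n_tiles = width // patch            # tiles that fit across the image
    return [sky_image[top:top + patch, i * patch:(i + 1) * patch]
            for i in range(n_tiles)]

# Stand-in for a rendered Mollweide image (hypothetical size, random pixels).
rng = np.random.default_rng(0)
fake_map = rng.normal(size=(400, 800))
patches = crop_equatorial_patches(fake_map)
```

Tiles returned this way are views into the original array; calling `.copy()` on each tile would be needed before modifying them independently.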
To remove outliers from our data, we use the box plot approach: we find the upper quartile (Q3), the lower quartile (Q1) and the interquartile range (IQR = Q3 − Q1) of all the baryon density values in our dataset, and then remove all data points that are greater than Q3 + 1.5 × IQR or lower than Q1 − 1.5 × IQR.
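The box plot rule above can be sketched in a few lines of NumPy. The function name `iqr_mask` and the sample density values are hypothetical and serve only to show the rule in action; the quartiles here use NumPy's default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Boolean mask that is True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

# Hypothetical baryon density labels; the last one is a deliberate outlier.
densities = np.array([0.0221, 0.0224, 0.0226, 0.0227, 0.0229, 0.0300])
keep = iqr_mask(densities)
filtered = densities[keep]
```

The same mask can be applied to the array of images so that each removed label takes its image with it.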
Furthermore, to help our model extract features, we plot the pixel intensities of the grayscale training images and fit a Gaussian to the plot, as shown in figure 4. We then replace the pixels whose intensities fall below a probability value of 0.05 with the average value of the remaining pixels, which removes the outliers of an individual CMB distribution. Figure 5 shows a sample CMB distribution alongside a map in which the pixels below the set threshold are set to 0 to highlight the outliers. For additional reference, the data along the galactic coordinate (0
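The pixel-level outlier replacement described in this section can be sketched as below. The text does not specify how the 0.05 probability is computed, so this sketch assumes it is the two-sided tail probability under a Gaussian fitted by the sample mean and standard deviation; the function name `replace_tail_pixels` is hypothetical.

```python
import math
import numpy as np

def replace_tail_pixels(image, p_threshold=0.05):
    """Replace pixels in the Gaussian tails with the mean of the other pixels.

    Assumption: a pixel is an outlier when its two-sided tail probability
    under the fitted Gaussian is below p_threshold.
    """
    image = np.asarray(image, dtype=float)
    mu, sigma = image.mean(), image.std()   # maximum-likelihood Gaussian fit
    if sigma == 0:                          # constant image: nothing to flag
        return image.copy(), np.zeros(image.shape, dtype=bool)
    z = np.abs(image - mu) / sigma
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    p = np.vectorize(math.erfc)(z / math.sqrt(2.0))
    outlier = p < p_threshold
    cleaned = image.copy()
    cleaned[outlier] = image[~outlier].mean()
    return cleaned, outlier

# Stand-in 64x64 patch with one planted extreme pixel (hypothetical data).
rng = np.random.default_rng(0)
patch = rng.normal(size=(64, 64))
patch[0, 0] = 10.0
cleaned_patch, outlier_mask = replace_tail_pixels(patch)
```

Under this assumption roughly 5% of the pixels of a truly Gaussian patch are replaced, so the threshold trades feature sharpness against noise suppression.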

Analysis
To estimate the baryon density of a given CMB distribution, we use a CNN-based and an MLP-based regression model. Convolutional neural networks (CNNs) are among the best-known neural network architectures for classifying images, owing to their ability to learn non-linear hierarchical structures and their lower computational cost. A CNN takes advantage of the local spatial coherence of its input (Rippel et al (2015)); that is, it assumes that spatially close pixels in the training images are correlated. In the CMB dataset, however, the pixels in the images are random noise following a Gaussian distribution. It is therefore an interesting exercise to observe how the CNN model performs in our regression problem. The architecture of the CNN model is shown in figure 6.
A multilayer perceptron is one of the most commonly used architectures of feedforward artificial neural networks. It comprises three classes of layers: an input layer, hidden layers and an output layer. Each node in a layer is connected to the nodes of the next layer, and its output passes through a non-linear activation function. The multilayer perceptron is trained with one of the best-known techniques of supervised learning, backpropagation (Goodfellow et al (2018)). A multilayer perceptron is distinguished from a linear perceptron by its characteristic use of multiple fully connected layers, which makes it suitable for working with data that are not linearly separable (Bullinaria (2015)). Multilayer perceptrons are often informally referred to as vanilla networks (Hastie et al (2017)). A multilayer perceptron can be viewed as a logistic regression classifier whose input is first transformed by a learned non-linear transformation. The intermediate layers that perform this transformation are referred to as hidden layers, and they make the multilayer perceptron a 'universal approximator' (Sifaoui et al (2008)). The architecture of the MLP model is shown in figure 7.
The weights of the fully connected layers are updated once a batch of data has been passed through the network, by measuring the error between the output and the expected result (the predetermined labels). This is the essence of learning in neural networks and is carried out with the help of an iterative algorithm called backpropagation; it is an example of supervised learning. Backpropagation computes the gradients that an iterative optimization algorithm called gradient descent (Goodfellow et al (2018)) uses to update the weights of the network.
The configuration of the MLP network is tabulated in table 1. The learning rate determines how fast the weights, or coefficients, of the network are updated. The number of epochs is the number of times the algorithm sees the entire dataset; hence, one epoch is completed when all the samples of the data have been processed once. The number of iterations is the number of times a batch of data has been passed through the algorithm, which for a multilayer perceptron means one forward pass and one backward pass. Hence, an iteration is completed once a batch of data has passed through the network (Shen (2016), Svozil et al (1997)). We have used the TensorFlow and Keras libraries to implement our CNN and MLP models. We have used the Adam optimization algorithm (Kingma and Ba (2014)) instead of traditional stochastic gradient descent for updating the weights of the network (Michelucci (2018)).
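The relationship between epochs, batches and iterations can be made concrete with a short calculation. The image count and the 80-20 split are taken from the text; the batch size and number of epochs below are hypothetical placeholders (the actual values are those of table 1), and treating all 141,250 images as the dataset that gets split is also an assumption.

```python
import math

n_images = 141_250      # cropped images reported in the text
batch_size = 32         # hypothetical; the actual value is listed in table 1
epochs = 10             # hypothetical; the actual value is listed in table 1

# 80-20 hold-out split, computed with integer arithmetic.
n_train = n_images * 8 // 10

# One iteration = one batch through a forward and a backward pass.
iterations_per_epoch = math.ceil(n_train / batch_size)

# One epoch = every training sample seen once.
total_iterations = iterations_per_epoch * epochs
```

With these placeholder values the final batch of each epoch is smaller than the others, which is why the ceiling is used rather than integer division.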
We have used L2 regularization, also known as ridge regularization, to prevent our model from overfitting. In L2 regularization, a penalty term proportional to the sum of the squared weights is added to the loss function (Goodfellow et al (2018)).
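To illustrate how an L2 penalty enters the loss and the weight update, the sketch below applies plain gradient descent to a small linear regression. This is a stand-in chosen for brevity, not the CNN or MLP described here, and it uses ordinary gradient descent rather than Adam; the names `l2_regularised_mse`, `gradient` and `fit`, the synthetic data and the penalty strength `lam` are all assumptions.

```python
import numpy as np

def l2_regularised_mse(w, X, y, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def gradient(w, X, y, lam):
    """Gradient of l2_regularised_mse with respect to w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y) + 2.0 * lam * w

# Synthetic regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)

def fit(lam, lr=0.1, steps=500):
    """Plain gradient descent on the regularised loss."""
    w = np.zeros(3)
    for _ in range(steps):
        w -= lr * gradient(w, X, y, lam)
    return w

w_plain = fit(lam=0.0)   # no penalty: recovers weights close to true_w
w_ridge = fit(lam=1.0)   # penalty shrinks the weights towards zero
```

The penalised fit ends with a smaller weight norm than the unpenalised one, which is the shrinkage effect that limits overfitting.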
The network is trained on the Google Cloud Platform using a Tesla K80 GPU.

Results
We have used hold-out validation with an 80-20 split to evaluate the performance of our models. As metrics, we use the mean absolute error (MAE), root mean squared error (RMSE), mean magnitude of the relative error (MMRE), R2 score and Pearson correlation. The R2 score and Pearson correlation are commonly used statistical measures of the closeness between the real labels and the regression model's predicted values.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - y_{p,i}|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - y_{p,i})^2}, \qquad \mathrm{MMRE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|y_i - y_{p,i}|}{|y_i|}$$

where n is the number of samples, y is the actual value and y_p is the predicted value.
The CNN and MLP models are trained and tested with five different galactic coordinates at the center of the Mollweide projection, cropping images along the central horizontal line, to test the isotropy of the CMB distribution. The loss convergence of a CNN model trained along the galactic coordinates (−30°, 90°) is shown in figure 8. The results of the CNN model are tabulated in table 2, and the results of the MLP model are tabulated in table 3. We also show a scatter plot of predictions along the galactic coordinates (0°, 0°) in figure 9.
Furthermore, to study the isotropy of our CMB samples, we present the relative error between CNN models trained at different galactic coordinates in table 4.
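The evaluation metrics named in this section can be computed as in the following sketch. It uses the standard textbook definitions, which are assumed to match those used for the tables; the function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y, yp):
    """MAE, RMSE, MMRE, R2 score and Pearson correlation (standard forms)."""
    y = np.asarray(y, dtype=float)
    yp = np.asarray(yp, dtype=float)
    err = y - yp
    return {
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mmre": np.mean(np.abs(err) / np.abs(y)),
        "r2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "pearson": np.corrcoef(y, yp)[0, 1],
    }

# Hypothetical actual and predicted baryon densities.
y_true = np.array([0.0220, 0.0224, 0.0226, 0.0230])
y_pred = np.array([0.0221, 0.0223, 0.0227, 0.0229])
metrics = regression_metrics(y_true, y_pred)
```

MMRE divides by the actual value, so it is only meaningful when no label is zero, which holds for baryon densities.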

Conclusion and future plans
Applying deep learning to CMB data has allowed us to verify two well-established results about the CMB using a completely different approach. Firstly, we were able to predict baryon density with satisfactory accuracy. The loss of accuracy can be attributed to the very high resolution of the CMB temperature maps and to our limited computational power, but most importantly to the relatively random arrangement of the pixels: because the CMB is a Gaussian random field, it is difficult for the neural network model to extract useful features from the maps. Secondly, when training and predicting with models trained along different galactic latitudes and longitudes, we obtained very low error between the predictions, which reaffirms the well-known isotropic nature of the CMB at larger scales. Although the training accuracy is low by the standards of precision physics, the test error values are encouraging given that we were limited by the amount of data and by the computational power available for training our model. We are not trying to compete with traditional, well-established techniques based on the power spectrum; rather, we are proposing a new domain for the study of the CMB, and we have chosen baryon density as a demonstration. Although information is lost when baryon density is extracted directly from the temperature map, there could be some property of the CMB that can be found using deep learning but not from the power spectrum. We hope to develop a deep neural network architecture tailored for CMB maps that is capable of correlating random noise with cosmological parameters.