Nonparametric Representation of Neutron Star Equation of State Using Variational Autoencoder

We introduce a new nonparametric representation of the neutron star (NS) equation of state (EoS) by using the variational autoencoder (VAE). As a deep neural network, the VAE is frequently used for dimensionality reduction since it can compress input data to a low-dimensional latent space using the encoder component and then reconstruct the data using the decoder component. Once a VAE is trained, one can take the decoder of the VAE as a generator. We employ 100,000 EoSs that are generated using the nonparametric representation method based on \citet{2021ApJ...919...11H} as the training set and try different settings of the neural network, then we get an EoS generator (trained VAE's decoder) with four parameters. We use the mass\textendash{}tidal-deformability data of binary neutron star (BNS) merger event GW170817, the mass\textendash{}radius data of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429, and the nuclear constraints to perform the joint Bayesian inference. The overall results of the analysis that includes all the observations are $R_{1.4}=12.59^{+0.36}_{-0.42}\,\rm km$, $\Lambda_{1.4}=489^{+114}_{-110}$, and $M_{\rm max}=2.20^{+0.37}_{-0.19}\,\rm M_\odot$ ($90\%$ credible levels), where $R_{1.4}$/$\Lambda_{1.4}$ are the radius/tidal-deformability of a canonical $1.4\,\rm M_\odot$ NS, and $M_{\rm max}$ is the maximum mass of a non-rotating NS. The results indicate that the implementation of the VAE techniques can obtain the reasonable results, while accelerate calculation by a factor of $\sim$ 3\textendash10 or more, compared with the original method.


INTRODUCTION
A neutron star (NS) has physical conditions that we can hardly achieve in terrestrial experiments. It can allow us to study the behavior of dense matter under extreme conditions (see Refs Lattimer 2012;Lattimer & Prakash 2016;Özel & Freire 2016;Oertel et al. 2017;Lattimer 2021 for reviews). The states of matter in a stable NS can be described using the so-called equation of state (EOS), i.e., the relationship between the pressure and the energy density at zero temperature. In low-and very high-density regions, the EOS is well understood (Baym et al. 2018), while between the two regions, there remain uncertainties.
Up to now, the NICER collaboration has reported two mass-radius (M − R) measurements of the isolated NS PSR J0030+0451 Miller et al. 2019) and the massive NS PSR J0740+6620 (Riley et al. 2021;Miller et al. 2021). These two measurements have been Corresponding author: yzfan@pmo.ac.cn used to constrain the NS EOS in many works (Bogdanov et al. 2019a,b;Raaijmakers et al. 2019;Jiang et al. 2020;Raaijmakers et al. 2021;Bogdanov et al. 2021;Tang et al. 2021). In addition to the M − R measurements, the well-known gravitational wave (GW) event from the binary neutron star (BNS) merger GW170817 (Abbott et al. 2017(Abbott et al. , 2019, which can be used to calculate the tidal deformability (Λ) of the NSs, has also inspired many studies about the NS EOS (Abbott et al. 2018;Annala et al. 2018;Fattoyev et al. 2018;Landry & Kumar 2018;Lim & Holt 2018;Most et al. 2018;Kumar & Landry 2019;Jiang et al. 2019). The phenomenological methods are commonly used for extracting information from various observations, which can be further divided into two categories, i.e., the parametric and nonparametric methods. The parametric method, for instance, the spectral expansion (Lindblom 2010) and the piecewise polytropes (Read et al. 2009;Özel et al. 2016;Raithel et al. 2017), have been proved to be useful in constraining NS EOS. However, the parametric method may significantly rely on the parametric form, resulting in a biased outcome due to misspecification. Therefore, we need a method that does not depend on a specific parametric form, i.e., the nonparametric method. The Gaussian process (GP) has been used as a nonparametric method (Landry & Essick 2019;Essick et al. 2020;Landry et al. 2020), but such a method is not easy to be incorporated by Bayesian inference with the Markov Chain Monte Carlo (MCMC) algorithm due to the nontrivial jump proposals (Titsias et al. 2011). In Han et al. (2021), we developed a nonparametric method via the feed-forward neural network (FFNN), and by using the sampling algorithm MultiNest we obtained the posterior distributions of EOS using the NS observations. To make the model nonparametric, we had to use 31 parameters in the FFNN; thus, the nonparametric method has far more parameters than the parametric method, which may increase the calculation cost and make the sampling algorithm hard to converge.
Deep learning has recently become a powerful method in astrophysical data analysis. Fujimoto et al. (2018Fujimoto et al. ( , 2020Fujimoto et al. ( , 2021 have developed a supervised learning method to constrain the NS EOS, where they used piecewise polytropes to represent the EOS. They took the squared sound speed c 2 s at corresponding pressure as the output of the network, and the mass, radius, and their variances as the input. Therefore, one can use the NS observations to get the parameters of the NS EOS via the trained network. Besides, Soma et al. (2022) have trained two networks, one is for generating the EOS (EOS Network), and the other one is trained to solve Tolman-Oppenheimer-Volkoff (TOV) equations (TOV-Solver Network), i.e., translate the EOS (p(ρ)) to the NS observations (M (R) and M (Λ) curves). Then the authors take the difference between the predicted quantities and the real observations as the loss function to train the network. Once the loss function converges, they can use the EOS Network to generate the desired EOS. However, both the above two methods are deterministic, and they estimate the uncertainties by just repeating the optimization procedure many times. As mentioned in the previous paragraph, the nonparametric method introduced in Han et al. (2021) combines the nonparametric representation of the EOS and the Bayesian inference, which can naturally handle the uncertainties. However, the high dimensionality of the parameters in such a method increases the difficulty of sampling.
In deep learning, the variational autoencoder (VAE) is a generative neural network that is commonly used for dimensionality reduction. The VAE and the other variants based on it have also been widely used in astronomy (Green et al. 2020;Bayley et al. 2022;Whittaker et al. 2022;Gabbard et al. 2022;Martínez-Palomera et al. 2022). In this work, we use the VAE to reduce the dimension of the parameter space in the nonparametric representation and use the Bayesian method to obtain the posterior distributions of the NS EOS parameters given the NS observations. In Sec. 2 we first review the nonparametric representation of the NS EOS in Han et al. (2021) and then introduce the architecture of the VAE and the training process. We summarize the observation data used in this work in Sec. 3 and present the results in Sec. 4. Finally, we give the summary and discussion in Sec. 5. In our previous work ), we introduced a nonparametric representation of NS EOS. Here we briefly recall the method in Han et al. (2021). In that work, the NS EOS can be described by an FFNN model with a single hidden layer, where ϕ is an auxiliary variable defined as ϕ = log c 2 dε dp − 1 .
In the above equations, p is the pressure, ε is the energy density, w 1i /w 2i are the weights parameters, b i (B) are bias parameters (B can also be considered as the overall residual), N is the number of the neural nodes (or the width of the network), and σ(·) stands for a nonlinear function (the so-called activation function). The activation function we choose here is the sigmoid function, which can guarantee the requirement of sigmoidal functions (for more details, see Cybenko 1989 andHan et al. 2021), i.e., σ(x) → 0(1) when x → −∞(+∞). In this work, we use a slightly different version of that model, which reads We take the rest-mass density ρ as the input variable and the squared sound speed c 2 s /c 2 = dp/dε as the output variable, instead of ϕ and p. One can easily find that σ(−ϕ) = c 2 s /c 2 , so using the squared sound speed with a sigmoid activation function as the output is almost the same as using the auxiliary variable ϕ as the output.
As for the input, the rest-mass density is more straightforward for applying the multiple constraints, e.g., the nuclear constraints that directly constrain the pressures at specific rest-mass densities. These variations in the input/output variables only have a little effect on the results, which could be negligible. Besides, there is another choice of the activation function in Eq. (4), the hyperbolic tangent function. The differences between these two activation functions have been discussed in the Appendix A of Han et al. (2023), and we do not discuss the influence of activation functions in this work. With the FFNN in hand, we can now make a representation of the NS EOS.

Variational AutoEncoder
The VAE is a deep generative model, and it is very similar to the autoencoder (AE), so we briefly introduce the AE first. A typical AE structure is shown in the left panel of Fig. 1, which consists of two parts: the encoder and the decoder. The goal of an encoder is to learn a mapping from the data x to a low-dimensional latent space Z, where the reduction of dimensionality can help us to compress the data and get a compact feature representation of them. And note that this is an unsupervised learning problem since we do not have labels in the training set. Therefore, by minimizing the reconstruction error between x andx, e.g., the mean square error L(x,x) = ∥x −x∥ 2 , we can learn the latent representation of the data by itself without any labels (that is why we call it AE, i.e., automatically encoding data). Nevertheless, there is a shortcoming in the AE. The encoder of AE is deterministic, which means that if we draw a random sample of the latent vector and put it into the decoder, we may not be able to get the desired result (i.e., a new sample that is not in the training set but similar to those in the training set).
The fundamental distinction between the VAE and AE models is that the VAE is now a probabilistic model rather than a deterministic one. The structure of VAE is shown in the right panel of Fig. 1, where we can see that the deterministic layer of AE's encoder is replaced by the so-called sampling layer, i.e., we compute the mean µ and the standard deviation σ from the encoder and draw a random sample ϵ from the standard normal distribution, then the latent variable z is computed by: ⃗ µ + ⃗ σ ⊙ ϵ. Therefore, the goal of the VAE can be described in a probabilistic manner as follows: the encoder is going to be trained to compute the probability distribution of the latent variable z given the input data x, i.e., q ϕ (z|x), while the decoder is going to take that learned latent representation and compute a new probability distribution of the input data x given the latent distribution of z, i.e., p θ (x|z). However, computing the q ϕ (z|x) analytically is impossible due to its high dimensionality, and using a numerical method like MCMC is too expensive in computation, so usually, we use a prior distribution to approximate the target distribution, i.e., the variational inference (VI). Thus, for approximating the target distribution, we need to reduce the difference between q ϕ (z|x) and the prior p(z). The difference between two distributions (Q(x) and P (x)) is usually measured by the KL divergence D KL , which is defined by Now we take a look at the loss function of the VAE, (6) The total loss function has two terms, the reconstruction error L rec and the regularization term L KL . The first term is just like the reconstruction error in AE, and the second term is the KL-divergence between the prior distribution p(z) (here the prior distribution is the standard normal distribution) and the target distribution q ϕ (z|x). The reason why we call L KL the regularization term is that it can encourage the encodings to distribute evenly in the center region of the latent space, and punish the network when it tries to "cheat" by clustering the points in specific regions (i.e., without the regularization term, the output deviation of the encoder σ is almost zero and the VAE degenerates to AE).

Training process
This section aims to train a VAE decoder, i.e., the EOS generator, which can generate EOSs from a lowdimensional latent space. Before the training, we need to generate the training set of the NS EOS. Note that the training set is just the set of the NS EOSs, which can be any physically realistic NS EOSs. In this work, we use the FFNN model based on Han et al. (2021) to generate the training set. We randomly draw the samples of the FFNN parameters, i.e., w 1i , w 2i , b i , and B (31 in total, and each parameter is uniformly sampled in (-5, 5)), and use the FFNN model to calculate the corresponding EOS. Once we get the EOS, we need to solve the TOV equations with the EOS, and this process is too complicated to be done analytically. Therefore, we usually solve it numerically with a tabulated EOS. The resolution of the EoS table is 128D, namely an NS EOS is represented by a 128D c 2 s (ρ) array, which is controlled by the 31 FFNN parameters. The rest-mass densities of tabulated points are logarithmically uniform in [∼ x denotes the data,x is the reconstruction of the data x, and z is the latent variable in latent space Z. In the right panel, µ and σ are the mean and standard deviation of the latent variable z, and ϵ is a random variable that follows the standard normal distribution.  Table 1. Hyper parameters of the VAE we used in this work. The latent layer consists of two parts: the first part is made of two dense layers that take the output of layer 4 as input, and output the mean and standard deviation of a multivariate normal distribution, and the second part is the so-called sampling layer that draws samples from a multivariate normal distribution whose mean and standard deviation are computed by the first part.
0.3ρ sat , 10ρ sat ]. We sample 100,000 EoSs from the priors, whose maximum masses satisfy the condition of M max ∈ (1.4, 3) M ⊙ . After the training set has been generated, we can then use it to train the neural network. We build a VAE neural network, whose architecture is shown in Tab. 2.3. We use Python package Keras (Chollet & others 2018) in TensorFlow (Abadi et al. 2016) and the Adam (Kingma & Ba 2014) optimizer, with a learning rate of 0.0001 and a batch size of 32. By minimizing the loss function defined in Eq. (6), we can get the trained VAE model. As for the choice of dimensionality of the latent space, we test several settings (1-32), and for each setting, we train the model for 200 epochs. From Fig. 2, we can see that at low dimensions (1-4), the loss of the model decreases rapidly as the dimension of the latent space increases; when the dimension of the latent space is larger than 4, the loss of model converges. Therefore, it is reasonable to use a 4D latent space in this work. Finally, once the VAE neural network has been trained, we can draw a random vector from the 4D standard normal distribution and then use the trained decoder to reconstruct the 128D EOS table. To summarize, we first use the FFNN model to generate the training set, which is to convert the 31D joint uniform distribution of the FFNN parameters, i.e., w 1i , w 2i , b i , and B, to the 128D joint distribution of the EOS parameters, i.e., the squared sound speed at corresponding rest-mass densities (128D) mentioned in the previous paragraph. We then train a VAE model with the training set, and take out the trained decoder part. This process is to convert the above 128D joint distribution of the EOS parameters (c 2 s (ρ)) to the 4D joint standard normal distribution of the latent variables, i.e., z 1 , z 2 , z 3 , and z 4 . As a result, we transform the nonparametric representation's prior of the NS EOS from a 31D uniform joint distribution to a 4D joint standard normal distribution. At the same time, it still contains the degrees of freedom of the model (compared to the parametric models). Now we can use just 4 parameters to control an NS EOS.

OBSERVATIONS
Complementary to terrestrial nuclear experiments, NS observations can be used to constrain the EOS of matter at supranuclear density. Recently, the radius measurements of PSR J0740+6620, the heaviest pulsar known, have been obtained by the scientific team of NICER. With the extra information from radio timing (Fonseca et al. 2021) and XMM-Newton spectroscopy, the radius of this massive NS was inferred by the pulse profile modeling of the hotspot's light-curve, which is 12.39 +1.30 −0.98 km (by Riley et al. 2021) or 13.7 +2.6 −1.5 km (by Miller et al. 2021) at 68% credible level. In comparison to the first results obtained by NICER in 2019, i.e., the simultaneous mass-radius measurement of the isolated NS PSR J0030+0451 Miller et al. 2019), the massive pulsar PSR J0740+6620 shares almost the same radius with PSR J0030+0451, though their masses differ > 50% from each other. Since more massive NS generally has a larger central density, such measurements allow us to probe the EOS at densities much higher than those based on previous NS observations. Meanwhile, the very nearby pulsar PSR J0437-4715, whose mass (∼ 1.44 M ⊙ ) is determined by the reliable timing analyses (Reardon et al. 2016), is one of the prime targets for NICER . The radius of this object has been updated in González-Caniulef et al. (2019) and will be directly tested by the dedicated NICER observations in the near future. Besides, via the so-called cooling-tail method, the mass−radius measurement of 4U 1702-429 was obtained with small uncertainty by Nättilä et al. (2017). The tidal deformability measurement of the landmark event GW170817 (Abbott et al. 2017(Abbott et al. , 2019, originating from the merger of two NSs, has also given us a large opportunity to study the EOS (Abbott et al. 2018). Therefore, we use all of the observation data discussed above to perform joint analysis, which includes the tidal-deformability measurements from GW170817, and the mass−radius measurements of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429. To simplify, we use D 1 to stand for the data set containing GW170817, PSR J0030+0451, and PSR J0740+6620 data, and D 2 to denote the data set that contains all of the observation data discussed above.
Supposing that all NSs share the same EOS, we can take the following likelihood to constrain the EOS parameters ⃗ θ EOS by performing Bayesian inference with Bilby (Ashton et al. 2019) and PyMultiNest (Buchner et al. 2014) packages. The L Nuc ( ⃗ θ EOS ) is the likelihood of the nuclear constraints, which have also been used in Han et al. (2021). The likelihood is 1 when all the nuclear constraints are satisfied, else 0, where the nuclear constraints are 3.12 × 10 33 dyn cm −2 ≤ p(ρ sat ) ≤ 4.70 × 10 33 dyn cm

RESULTS
After using the VAE to represent the NS EOS, the EOS can be described by only 4 parameters. The direct results of the Bayesian inference are the posteriors of the latent variables, i.e., z 1 , z 2 , z 3 , and z 4 . These latent variables are also called hidden variables because they do not have any direct relations to the reconstructed data. Thus, we need to convert the above results (latent variables) to data (the EOS tables) that we can understand using the trained VAE decoder, i.e., decoding. The posterior distributions of the latent variables and the bulk properties of NS are shown in Fig. 3. We find that the radius and tidal deformability of a canonical 1.4M ⊙ NS have a strong correlation, and the data set D 1 (D 2 ) gives R 1.4 = 12.14 +0.94 −0.93 km (12.59 +0.36 −0.42 km) and Λ 1.4 = 374 +283 −160 (489 +114 −110 ) at 90% credible level. Except for the parameter z 2 , the other three latent vari-   ables' posteriors are obviously different from their priors, which means that the observation data are informative. And for the variable z 4 , the results of data sets D 1 and D 2 show a little difference, while for the other three latent variables, the results have no apparent discrepancies between the two data sets. Interestingly, the variables z 1 − z 2 show a correlation in the joint distribution. And the variables z 3 and z 4 are correlated with the maximum mass of nonrotating NS. The maximum mass M max is constrained to be M max = 2.26 +0.39 −0.23 M ⊙ (2.20 +0.37 −0.19 M ⊙ ) for data set D 1 (D 2 ), which is consistent with some previous results (Shao et al. 2020;Tang et al. 2021;Nathanail et al. 2021).
After the decoding, we can now discuss the results of the EOS directly. To illustrate the efficacy of the VAE method, we also perform the Bayesian inference directly to the FFNN model, which is controlled by 31 parameters. The likelihood function is the same for these two methods; thus we can directly compare these results. In the upper panels of Fig. 4, we can see that at most densities regimes the results of the FFNN model (solid lines) are almost the same as that of the VAE model (dashed lines). The consistencies of the reconstructed Λ−M and M − R (see the lower panels) are even more remarkable. This indicates that the VAE approaches have been implemented very successfully because we can utilize the  VAE model, which only has four parameters, to produce the same outcomes as the FFNN model, which has 31 parameters, while spending less time. The use of VAE techniques in this work has the potential to accelerate calculation times by a factor of ∼ 3 or more. With the data set D 1 (D 2 ), the FFNN model requires 3.87 (45.28) hr, whereas the VAE model requires only 1.27 (18.22) hr. All these calculations are performed in one compute node with 128 cores. In some cases, the enhancement is even more efficient. Without incorporating the nuclear constraints, for the data set D 1 , the calculation of the VAE model can be more than 10 times faster than the FFNN model. Nevertheless, we find that in a high-density region (i.e., ≳ 4ρ sat ), the constraint on the EOS with the VAE model is less "stringent" than that of the FFNN model (see the upper left panel of Fig. 4). The same happens in the upper right panel of Fig. 4. This difference is likely caused by the lack of effective probe at such high densities. The magenta regions in the upper panels of Fig. 4 represent the central density of PSR J0740+6620, the most massive neutron star that has been accurately measured so far. Clearly, even for such a massive compact object, the central density can only reach ∼ 4ρ sat , above which the EOS cannot be effectively constrained. In view of the above facts, we conclude that the VAE model can yield reasonable results efficiently.

SUMMARY
In this work, based on Han et al. (2021) we develop a new Bayesian nonparametric method for studying the NS EOS. We use the deep neural network VAE to reduce the number of parameters that represent the EoS. By comparing different settings of the network, we find that a VAE with 4D latent space is a proper choice for the representation of EOS. After the training process, we get a trained decoder network. Then we draw a random vector from a 4D standard normal distribution and use the decoder to convert it to the reconstructed 128D EOS table, i.e., we can represent the NS EOS using only four parameters. We perform Bayesian inference to infer the EOS posteriors using the NS observations, i.e., the M −R measurements of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429, as well as the M − Λ measurements of GW170817. Sampling from the posteriors of the latent variables with numerical sampling algorithm PyMultiNest, we compute the EOS tables using the trained decoder, and by numerically integrating the TOV equations we finally get the macroscopic properties of NS we are interested in. The radius and tidal deformability of a canonical 1.4 M ⊙ NS are constrained to be R 1.4 = 12.14 +0.94 −0.93 km (12.59 +0.36 −0.42 km) and Λ 1.4 = 374 +283 −160 (489 +114 −110 ) at 90% credible level for data set D 1 (D 2 ), respectively. Besides, the maximum mass of a nonrotating NS M max is constrained to be M max = 2.26 +0.39 −0.23 (2.20 +0.37 −0.19 M ⊙ ) for data set D 1 (D 2 ). As for the latent variables, we find that except for z 2 , all the latent variables are well constrained, and some show correlations with each other or the macroscopic properties.
Though we use only four parameters to represent the NS EOS with the VAE neural network, it still maintains the nonparametric feature. This dimensionality reduction process makes a significant development in the Bayesian nonparametric inference of NS EoS because it can dramatically reduce the dimension of the parame-ter space and effectively reduce the difficulty and time during the sampling. Quantitively, the VAE techniques can accelerate calculation by a factor of ∼ 3-10 or more. Nevertheless, there are still some aspects to be improved in future work. As mentioned in Sec. 4, the latent variables are hidden variables that do not have any direct relations to the EOS parameters or the macroscopic properties. However, the posteriors obtained in this work do show a few correlations. Thus, we can further investigate the relationship between the latent variables and the parameters we are interested in, i.e., disentangle the latent variables. Though we have tried different hyperparameters of the network to find the proper setting of the VAE, the compress process may still lead to the loss of information. Therefore, in future works, one can still enhance the efficiency of the representation while maintaining accuracy. Besides, in the lowand very high-density regions, we can also incorporate the constraints from chiral effective field theory (Essick et al. 2021) and perturbative quantum chromodynamics (Gorda et al. 2022).