Nonparametric Representation of Neutron Star Equation of State Using Variational Autoencoder


Published 2023 June 13. © 2023 The Author(s). Published by the American Astronomical Society.
Citation: Ming-Zhe Han et al. 2023 ApJ 950 77. DOI: 10.3847/1538-4357/acd050


Abstract

We introduce a new nonparametric representation of the neutron star (NS) equation of state (EOS) using the variational autoencoder (VAE). As a deep neural network, the VAE is frequently used for dimensionality reduction, since its encoder compresses the input data to a low-dimensional latent space and its decoder then reconstructs the data. Once a VAE is trained, its decoder can be used as a generator. We employ 100,000 EOSs generated with the nonparametric representation method of Han et al. as the training set, test different settings of the neural network, and obtain an EOS generator (the trained VAE's decoder) with four parameters. We then use the mass–tidal-deformability data of the binary NS merger event GW170817, the mass–radius data of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429, and the nuclear constraints to perform Bayesian inference. The overall results of the analysis that includes all the observations are ${R}_{1.4}={12.59}_{-0.42}^{+0.36}\,\mathrm{km}$, ${{\rm{\Lambda }}}_{1.4}={489}_{-110}^{+114}$, and ${M}_{\max }={2.20}_{-0.19}^{+0.37}\,{M}_{\odot }$ (90% credible levels), where ${R}_{1.4}$ and ${{\rm{\Lambda }}}_{1.4}$ are the radius and tidal deformability of a canonical 1.4 ${M}_{\odot }$ NS, and ${M}_{\max }$ is the maximum mass of a nonrotating NS. These results indicate that the VAE technique yields reasonable constraints while accelerating the calculation by a factor of ∼3–10 or more compared with the original method.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

A neutron star (NS) hosts physical conditions that can hardly be achieved in terrestrial experiments, allowing us to study the behavior of dense matter under extreme conditions (see Lattimer 2012; Özel & Freire 2016; Lattimer & Prakash 2016; Oertel et al. 2017; Lattimer 2021 for reviews). The state of matter in a stable NS can be described by the so-called equation of state (EOS), i.e., the relationship between pressure and energy density at zero temperature. In the low- and very high-density regions the EOS is well understood (Baym et al. 2018), while between these two regimes considerable uncertainties remain.

Up to now, the NICER collaboration has reported two mass–radius (MR) measurements, of the isolated NS PSR J0030+0451 (Miller et al. 2019; Riley et al. 2019) and of the massive NS PSR J0740+6620 (Miller et al. 2021; Riley et al. 2021). These two measurements have been used to constrain the NS EOS in many works (Bogdanov et al. 2019a, 2019b; Raaijmakers et al. 2019; Jiang et al. 2020; Bogdanov et al. 2021; Raaijmakers et al. 2021; Tang et al. 2021). In addition to the MR measurements, the well-known gravitational wave (GW) event from the binary neutron star (BNS) merger GW170817 (Abbott et al. 2017, 2019), which can be used to measure the tidal deformability (Λ) of NSs, has also inspired many studies of the NS EOS (Abbott et al. 2018; Annala et al. 2018; Fattoyev et al. 2018; Landry & Kumar 2018; Lim & Holt 2018; Most et al. 2018; Jiang et al. 2019; Kumar & Landry 2019). Phenomenological methods are commonly used to extract EOS information from such observations; they can be divided into two categories, namely parametric and nonparametric methods. Parametric methods, for instance the spectral expansion (Lindblom 2010) and the piecewise polytropes (Read et al. 2009; Özel et al. 2016; Raithel et al. 2017), have proved useful in constraining the NS EOS. However, a parametric method may rely strongly on the chosen functional form, so a misspecified form can bias the outcome. We therefore need a method that does not depend on a specific parametric form, i.e., a nonparametric method. The Gaussian process (GP) has been used as such a nonparametric method (Landry & Essick 2019; Essick et al. 2020; Landry et al. 2020), but it is not easy to incorporate into Bayesian inference with the Markov Chain Monte Carlo (MCMC) algorithm because of the nontrivial jump proposals (Titsias et al. 2011). In Han et al. (2021), we developed a nonparametric method based on a feed-forward neural network (FFNN) and, using the sampling algorithm MultiNest, obtained posterior distributions of the EOS from NS observations. To make the model nonparametric, we had to use 31 parameters in the FFNN; thus the nonparametric method has far more parameters than the parametric ones, which increases the computational cost and makes it harder for the sampling algorithm to converge.

Deep learning has recently become a powerful tool in astrophysical data analysis. Fujimoto et al. (2018, 2020, 2021) developed a supervised learning method to constrain the NS EOS, using piecewise polytropes to represent the EOS; their network takes the masses, radii, and their variances as input and outputs the squared sound speed ${c}_{{\rm{s}}}^{2}$ at the corresponding pressures. One can therefore use NS observations to obtain the parameters of the NS EOS via the trained network. In addition, Soma et al. (2022) trained two networks: one generates the EOS (EOS Network), and the other solves the Tolman–Oppenheimer–Volkoff (TOV) equations (TOV-Solver Network), i.e., it translates the EOS p(ρ) into the NS observables (the M(R) and M(Λ) curves). The authors then take the difference between the predicted quantities and the real observations as the loss function to train the networks. Once the loss function converges, the EOS Network can be used to generate the desired EOS. However, both of these methods are deterministic, and they estimate uncertainties only by repeating the optimization procedure many times. As mentioned above, the nonparametric method introduced in Han et al. (2021) combines a nonparametric representation of the EOS with Bayesian inference, which handles the uncertainties naturally; however, the high dimensionality of its parameter space makes the sampling difficult.

In deep learning, the variational autoencoder (VAE) is a generative neural network that is commonly used for dimensionality reduction. The VAE and its variants have been widely used in astronomy (Green et al. 2020; Bayley et al. 2022; Gabbard et al. 2022; Martínez-Palomera et al. 2022; Whittaker et al. 2022). In this work, we use a VAE to reduce the dimension of the parameter space of the nonparametric representation and use the Bayesian method to obtain the posterior distributions of the NS EOS parameters given the NS observations. In Section 2 we first review the nonparametric representation of the NS EOS in Han et al. (2021) and then introduce the architecture of the VAE and the training process. We summarize the observational data used in this work in Section 3 and present the results in Section 4. Finally, we give the summary and discussion in Section 5.

2. Method

2.1. Feed-forward Neural Network

In our previous work (Han et al. 2021), we introduced a nonparametric representation of the NS EOS; here we briefly recall that method. The NS EOS is described by an FFNN model with a single hidden layer,

$\phi (\log p)={\sum }_{i=1}^{N}{w}_{2i}\,\sigma ({w}_{1i}\,\log p+{b}_{i})+B,$ (1)

where ϕ is an auxiliary variable defined as

$\phi \equiv \log \left({c}^{2}\,\frac{d\varepsilon }{{dp}}-1\right).$ (2)

In the above equations, p is the pressure, ε is the energy density, w1i and w2i are the weight parameters, bi and B are the bias parameters (B can also be regarded as an overall offset), N is the number of neural nodes (i.e., the width of the network), and σ(·) is a nonlinear function (the so-called activation function). The activation function chosen here is the sigmoid function,

$\sigma (x)=\frac{1}{1+{e}^{-x}},$ (3)

which satisfies the requirement for sigmoidal functions (for more details, see Cybenko 1989 and Han et al. 2021), i.e., σ(x) → 0 (1) when x → −∞ (+∞). In this work, we use a slightly different version of that model, which reads

$\frac{{c}_{{\rm{s}}}^{2}}{{c}^{2}}(\rho )=\sigma \left({\sum }_{i=1}^{N}{w}_{2i}\,\sigma ({w}_{1i}\,\log \rho +{b}_{i})+B\right).$ (4)

We take the rest-mass density ρ as the input variable and the squared sound speed ${c}_{{\rm{s}}}^{2}/{c}^{2}={dp}/d\varepsilon $ as the output variable, instead of ϕ and p. One can easily verify that $\sigma (-\phi )={c}_{s}^{2}/{c}^{2}$, so taking the squared sound speed with a sigmoid output activation is essentially equivalent to taking the auxiliary variable ϕ as the output. As for the input, the rest-mass density is more convenient for applying multiple constraints, e.g., the nuclear constraints that directly bound the pressures at specific rest-mass densities. These variations in the input/output variables have only a negligible effect on the results. Another possible choice for the activation function in Equation (4) is the hyperbolic tangent function; the differences between these two activation functions are discussed in Appendix A of Han et al. (2022), and we do not consider the influence of the activation function in this work. With the FFNN in hand, we can now represent the NS EOS.
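As an illustration, the sketch below (in Python/NumPy, with hypothetical function names) evaluates the mapping of Equation (4) for a single density: a single hidden layer of N sigmoid nodes followed by a sigmoid output, so that the predicted ${c}_{s}^{2}/{c}^{2}$ automatically lies in (0, 1). With N = 10 hidden nodes this gives the 31 parameters (w1i, w2i, bi, and B) quoted above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cs2_over_c2(log_rho, w1, w2, b, B):
    """Single-hidden-layer FFNN of Equation (4): log(rest-mass density) -> c_s^2/c^2.

    w1, w2, b are length-N arrays and B is a scalar offset; the outer sigmoid
    keeps the output in (0, 1), i.e., the sound speed remains subluminal.
    """
    hidden = sigmoid(w1 * log_rho + b)      # activations of the N hidden nodes
    return sigmoid(np.dot(w2, hidden) + B)  # squared sound speed in units of c^2
```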

2.2. Variational Autoencoder

The VAE is a deep generative model and is very similar to the autoencoder (AE), so we briefly introduce the AE first. A typical AE structure is shown in the left panel of Figure 1; it consists of two parts, the encoder and the decoder. The goal of the encoder is to learn a mapping from the data x to a low-dimensional latent space Z, where the reduction of dimensionality compresses the data and yields a compact feature representation. Note that this is an unsupervised learning problem, since the training set has no labels. By minimizing the reconstruction error between x and $\hat{x}$, e.g., the mean square error $L(x,\hat{x})=\parallel x-\hat{x}{\parallel }^{2}$, the network learns a latent representation of the data by itself without any labels (which is why it is called an autoencoder, i.e., it encodes the data automatically). Nevertheless, the AE has a shortcoming: it is deterministic, so if we draw a random latent vector and feed it to the decoder, we may not obtain a desired result (i.e., a new sample that is not in the training set but resembles those in it).


Figure 1. Structure of the two neural networks. The left panel is the autoencoder, while the right is the variational autoencoder. x denotes the data, $\hat{x}$ is the reconstruction of the data x, and z is the latent variable in the latent space Z. In the right panel, μ and σ are the mean and standard deviation of the latent variable z, and ε is a random variable that follows the standard normal distribution.


The fundamental distinction between the VAE and the AE is that the VAE is a probabilistic model rather than a deterministic one. The structure of the VAE is shown in the right panel of Figure 1: the deterministic layer of the AE's encoder is replaced by a so-called sampling layer, i.e., we compute the mean μ and the standard deviation σ with the encoder, draw a random sample ε from the standard normal distribution, and compute the latent variable as z = μ + σε (a short code sketch of this sampling step follows Equation (5)). The goal of the VAE can therefore be described probabilistically as follows: the encoder is trained to compute the probability distribution of the latent variable z given the input data x, i.e., qϕ(z∣x), while the decoder takes that learned latent representation and computes a new probability distribution of the input data x given the latent variable z, i.e., pθ(x∣z). However, computing qϕ(z∣x) analytically is impossible because of its high dimensionality, and a numerical method like MCMC is computationally too expensive, so the target distribution is instead approximated with a tractable distribution, i.e., variational inference. Thus, to approximate the target distribution, we need to reduce the difference between qϕ(z∣x) and the prior p(z). The difference between two distributions Q(x) and P(x) is usually measured by the Kullback–Leibler (KL) divergence DKL, which is defined by

${D}_{\mathrm{KL}}(Q\parallel P)=\int Q(x)\,\log \frac{Q(x)}{P(x)}\,{dx}.$ (5)
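As referenced above, the sampling layer is usually implemented with the reparameterization trick. The following minimal TensorFlow sketch (the function name is ours, and we assume the common convention that the encoder outputs the log-variance rather than σ itself) shows how z = μ + σε is computed so that gradients can flow through μ and σ:

```python
import tensorflow as tf

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * epsilon with epsilon ~ N(0, I).

    Drawing z directly from N(mu, sigma^2) is not differentiable; moving the
    randomness into epsilon keeps the gradient path through mu and log_var.
    """
    epsilon = tf.random.normal(shape=tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * epsilon
```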

Now we take a look at the loss function of the VAE,

$L={L}_{\mathrm{rec}}+{L}_{\mathrm{KL}}=\parallel x-\hat{x}{\parallel }^{2}+{D}_{\mathrm{KL}}\left({q}_{\phi }(z| x)\parallel p(z)\right).$ (6)

The total loss function has two terms: the reconstruction error Lrec and the regularization term LKL. The first term is the same reconstruction error as in the AE, and the second term is the KL divergence between the target distribution qϕ(z∣x) and the prior distribution p(z) (here the standard normal distribution). We call LKL the regularization term because it encourages the encodings to be distributed evenly around the center of the latent space and penalizes the network when it tries to "cheat" by clustering the points in specific regions (i.e., without the regularization term, the output standard deviation σ of the encoder would be almost zero and the VAE would degenerate to an AE).
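For a Gaussian encoder and a standard normal prior, the KL term in Equation (6) has a closed form. The sketch below (our own helper function, again assuming the encoder outputs μ and the log-variance) combines the two terms:

```python
import tensorflow as tf

def vae_loss(x, x_hat, mu, log_var):
    """VAE loss of Equation (6): reconstruction error plus KL regularizer."""
    l_rec = tf.reduce_sum(tf.square(x - x_hat), axis=-1)  # ||x - x_hat||^2
    # Closed-form D_KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    l_kl = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
    return tf.reduce_mean(l_rec + l_kl)
```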

2.3. Training Process

This section aims at training a VAE decoder, i.e., an EOS generator that can generate EOSs from a low-dimensional latent space. Before the training, we need to generate the training set of NS EOSs, which can be any physically realistic EOSs. In this work, we use the FFNN model based on Han et al. (2021) to generate the training set. We randomly draw samples of the FFNN parameters, i.e., w1i, w2i, bi, and B (31 in total, each uniformly sampled in (−5, 5)), and use the FFNN model to calculate the corresponding EOS. Once we have the EOS, we need to solve the TOV equations, which cannot be done analytically, so we solve them numerically with a tabulated EOS. The EOS table has 128 points, i.e., an NS EOS is represented by a 128D ${c}_{s}^{2}(\rho )$ array controlled by the 31 FFNN parameters, with the rest-mass densities of the tabulated points logarithmically uniform in [∼0.3ρsat, 10ρsat]. We sample 100,000 EOSs from the priors whose maximum masses satisfy ${M}_{\max }\in (1.4,3)\,{M}_{\odot }$.
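A minimal sketch of this training-set generation is given below (Python/NumPy; the saturation-density value, the random seed, and the function names are illustrative assumptions, and the TOV-based cut on ${M}_{\max }$ is only indicated in a comment):

```python
import numpy as np

RHO_SAT = 2.7e14   # nuclear saturation density in g cm^-3 (illustrative value)
N_NODES = 10       # 10 hidden nodes -> 3 * 10 + 1 = 31 FFNN parameters

# 128 tabulation points, logarithmically uniform in [~0.3, 10] * rho_sat
log_rho_grid = np.log(np.geomspace(0.3 * RHO_SAT, 10.0 * RHO_SAT, 128))

def sample_eos_table(rng):
    """Draw the 31 FFNN parameters uniformly in (-5, 5) and evaluate
    c_s^2/c^2 on the 128-point density grid (cf. Equation (4))."""
    w1, w2, b = rng.uniform(-5.0, 5.0, size=(3, N_NODES))
    B = rng.uniform(-5.0, 5.0)
    hidden = 1.0 / (1.0 + np.exp(-(np.outer(log_rho_grid, w1) + b)))  # shape (128, N)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + B)))                   # shape (128,)

rng = np.random.default_rng(42)
training_set = np.array([sample_eos_table(rng) for _ in range(100_000)])
# In the actual pipeline, each table is also integrated through the TOV equations
# and kept only if its maximum mass lies in (1.4, 3) M_sun (not shown here).
```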

After the training set has been generated, we use it to train the neural network. We build a VAE whose architecture is shown in Table 1 (a Keras sketch of this architecture follows the table), using the Python package Keras (Chollet et al. 2018) in TensorFlow (Abadi et al. 2016) and the Adam optimizer (Kingma & Ba 2014), with a learning rate of 0.0001 and a batch size of 32. By minimizing the loss function defined in Equation (6), we obtain the trained VAE model. As for the dimensionality of the latent space, we test several settings (1–32) and train the model for 200 epochs for each setting. From Figure 2, we can see that at low dimensions (1–4) the loss decreases rapidly as the dimension of the latent space increases, while for latent dimensions larger than 4 the loss converges. It is therefore reasonable to use a 4D latent space in this work. Finally, once the VAE has been trained, we can draw a random vector from the 4D standard normal distribution and use the trained decoder to reconstruct the 128D EOS table.


Figure 2. Losses after training for 200 epochs with different settings for the dimensionality of latent space.


Table 1. Hyperparameters of the VAE Used in This Work

Layer          Type      Number of Neurons   Activation Function
Input          ...       128                 ...
Layer 1        Dense     64                  ReLU
Layer 2        Dense     64                  ReLU
Layer 3        Dense     32                  ReLU
Layer 4        Dense     32                  ReLU
Latent Layer   Lambda    4                   ...
Layer 5        Dense     32                  ReLU
Layer 6        Dense     32                  ReLU
Layer 7        Dense     64                  ReLU
Layer 8        Dense     64                  ReLU
Output         Dense     128                 Sigmoid

Note. The latent layer consists of two parts: the first part is made of two dense layers that take the output of layer 4 as input, and output the mean and standard deviation of a multivariate normal distribution, and the second part is the so-called sampling layer that draws samples from a multivariate normal distribution whose mean and standard deviation are computed by the first part.
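The architecture of Table 1 can be written down in Keras roughly as follows (a sketch under our own naming conventions; the exact loss wiring and training loop of the actual pipeline are not shown, and we assume the common log-variance parameterization of the latent Gaussian):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

LATENT_DIM = 4  # dimensionality of the latent space adopted in this work

# Encoder: 128 -> 64 -> 64 -> 32 -> 32 -> (mu, log_var) of the 4D latent Gaussian
encoder_in = keras.Input(shape=(128,))
h = layers.Dense(64, activation="relu")(encoder_in)
h = layers.Dense(64, activation="relu")(h)
h = layers.Dense(32, activation="relu")(h)
h = layers.Dense(32, activation="relu")(h)
mu = layers.Dense(LATENT_DIM)(h)
log_var = layers.Dense(LATENT_DIM)(h)

# "Lambda" sampling layer: z = mu + sigma * epsilon (reparameterization trick)
def sampling(args):
    mu, log_var = args
    eps = tf.random.normal(shape=tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sampling)([mu, log_var])
encoder = keras.Model(encoder_in, [mu, log_var, z], name="encoder")

# Decoder: 4 -> 32 -> 32 -> 64 -> 64 -> 128, sigmoid output since c_s^2/c^2 is in (0, 1)
decoder_in = keras.Input(shape=(LATENT_DIM,))
h = layers.Dense(32, activation="relu")(decoder_in)
h = layers.Dense(32, activation="relu")(h)
h = layers.Dense(64, activation="relu")(h)
h = layers.Dense(64, activation="relu")(h)
decoder_out = layers.Dense(128, activation="sigmoid")(h)
decoder = keras.Model(decoder_in, decoder_out, name="decoder")
```

The two halves would then be combined into a full keras.Model and trained with Adam (learning rate 0.0001, batch size 32) against the loss of Equation (6).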


To summarize, we first use the FFNN model to generate the training set, which amounts to converting the 31D joint uniform distribution of the FFNN parameters, i.e., w1i, w2i, bi, and B, into the 128D joint distribution of the EOS parameters, i.e., the squared sound speeds at the corresponding rest-mass densities mentioned above. We then train a VAE model on the training set and take out the trained decoder. This step converts the 128D joint distribution of the EOS parameters (${c}_{s}^{2}(\rho )$) into the 4D joint standard normal distribution of the latent variables z1, z2, z3, and z4. As a result, we transform the prior of the nonparametric representation of the NS EOS from a 31D joint uniform distribution into a 4D joint standard normal distribution, while retaining the flexibility of the nonparametric model (in contrast to parametric models). We can now control an NS EOS with just four parameters.
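In practice, generating an EOS then reduces to decoding a 4D standard normal draw, as in the sketch below (the file name of the saved decoder is hypothetical):

```python
import numpy as np
from tensorflow import keras

decoder = keras.models.load_model("vae_decoder.h5")  # trained decoder saved after training

z = np.random.standard_normal(size=(1, 4))   # one 4D latent vector drawn from N(0, I)
cs2_table = decoder.predict(z)[0]            # reconstructed 128-point c_s^2/c^2 table
# cs2_table[i] is the squared sound speed at the i-th point of the fixed logarithmic
# density grid; integrating dp = c_s^2 d(epsilon) and the TOV equations then yields
# the M-R and M-Lambda curves used in the inference.
```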

3. Observations

Complementary to terrestrial nuclear experiments, NS observations can be used to constrain the EOS of matter at supranuclear densities. Recently, the radius measurement of PSR J0740+6620, the heaviest known pulsar, was reported by the NICER team. With additional information from radio timing (Fonseca et al. 2021) and XMM-Newton spectroscopy, the radius of this massive NS was inferred from pulse-profile modeling of the hotspot's light curve to be ${12.39}_{-0.98}^{+1.30}\,\mathrm{km}$ (Riley et al. 2021) or ${13.7}_{-1.5}^{+2.6}\,\mathrm{km}$ (Miller et al. 2021) at the 68% credible level. Compared with the first NICER result in 2019, i.e., the simultaneous mass–radius measurement of the isolated NS PSR J0030+0451 (Miller et al. 2019; Riley et al. 2019), the massive pulsar PSR J0740+6620 has almost the same radius as PSR J0030+0451, although their masses differ by >50%. Since a more massive NS generally has a larger central density, such measurements allow us to probe the EOS at densities much higher than those probed by previous NS observations. Meanwhile, the very nearby pulsar PSR J0437-4715, whose mass (∼1.44 ${M}_{\odot }$) is determined by reliable timing analyses (Reardon et al. 2016), is one of the prime targets of NICER (Guillot et al. 2019). The radius of this object has been updated in González-Caniulef et al. (2019) and will be directly measured by dedicated NICER observations in the near future. In addition, via the so-called cooling-tail method, a mass–radius measurement of 4U 1702-429 with small uncertainty was obtained by Nättilä et al. (2017). The tidal-deformability measurement of the landmark event GW170817 (Abbott et al. 2017, 2019), originating from the merger of two NSs, has also offered a great opportunity to study the EOS (Abbott et al. 2018). Therefore, we use all of the observational data discussed above to perform a joint analysis, which includes the tidal-deformability measurement from GW170817 and the mass–radius measurements of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429. For brevity, we use ${{ \mathcal D }}_{1}$ to denote the data set containing the GW170817, PSR J0030+0451, and PSR J0740+6620 data, and ${{ \mathcal D }}_{2}$ to denote the data set that contains all of the observational data discussed above.

Supposing that all NSs share the same EOS, we can take the following likelihood

${ \mathcal L }({{\boldsymbol{\theta }}}_{\mathrm{EOS}})={{ \mathcal L }}_{\mathrm{Nuc}}({{\boldsymbol{\theta }}}_{\mathrm{EOS}})\times {{ \mathcal L }}_{\mathrm{GW}}({{\boldsymbol{\theta }}}_{\mathrm{EOS}})\times \prod _{i}{{ \mathcal L }}_{\mathrm{MR},i}({{\boldsymbol{\theta }}}_{\mathrm{EOS}})$ (7)

to constrain the EOS parameters ${{\boldsymbol{\theta }}}_{\mathrm{EOS}}$ by performing Bayesian inference with the Bilby (Ashton et al. 2019) and PyMultiNest (Buchner et al. 2014) packages. Here ${{ \mathcal L }}_{\mathrm{Nuc}}({{\boldsymbol{\theta }}}_{\mathrm{EOS}})$ is the likelihood of the nuclear constraints, which was also used in Han et al. (2021): the likelihood is 1 when all the nuclear constraints are satisfied and 0 otherwise, where the nuclear constraints are $3.12\times {10}^{33}\,\mathrm{dyn}\,{\mathrm{cm}}^{-2}\leqslant p({\rho }_{\mathrm{sat}})\leqslant 4.70\times {10}^{33}\,\mathrm{dyn}\,{\mathrm{cm}}^{-2}$ (Lattimer & Steiner 2014; Tews et al. 2017; Jiang et al. 2019) and $p(1.85{\rho }_{\mathrm{sat}})\geqslant 1.21\times {10}^{34}\,\mathrm{dyn}\,{\mathrm{cm}}^{-2}$ (Özel et al. 2016). Since the results obtained with the data of Riley et al. (2019)/Riley et al. (2021) and Miller et al. (2019)/Miller et al. (2021) are nearly consistent with each other (Tang et al. 2021), we only use the data of Riley et al. (2019) for PSR J0030+0451 and Riley et al. (2021) for PSR J0740+6620. For GW170817, we take the interpolated marginalized likelihood from Hernandez Vivanco et al. (2020), which shows good consistency with the original GW data. For the mass–radius measurements, we build the likelihood from a Gaussian kernel density estimation of the publicly distributed posterior samples of mass and radius (see Tang et al. 2021 for more details).
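As a rough illustration of how this likelihood can be wired into the samplers named above, the sketch below defines a Bilby likelihood over the four latent variables with standard normal priors; the helpers decode_eos, nuclear_constraints_ok, log_like_gw170817, and log_like_mass_radius are hypothetical placeholders for the trained decoder, the hard nuclear cut, and the interpolated/KDE observation likelihoods:

```python
import numpy as np
import bilby

class EOSLikelihood(bilby.Likelihood):
    """Sketch of the joint likelihood of Equation (7) over the 4D latent space."""

    def __init__(self):
        super().__init__(parameters={"z1": None, "z2": None, "z3": None, "z4": None})

    def log_likelihood(self):
        z = np.array([self.parameters[k] for k in ("z1", "z2", "z3", "z4")])
        eos = decode_eos(z)                  # trained decoder: 4D latent -> 128D EOS table
        if not nuclear_constraints_ok(eos):  # hard cut corresponding to L_Nuc in {0, 1}
            return -np.inf
        return log_like_gw170817(eos) + log_like_mass_radius(eos)

# Standard normal priors on the latent variables, matching the VAE training prior
priors = {f"z{i}": bilby.core.prior.Normal(mu=0, sigma=1, name=f"z{i}") for i in (1, 2, 3, 4)}

result = bilby.run_sampler(likelihood=EOSLikelihood(), priors=priors,
                           sampler="pymultinest", nlive=1000)
```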

4. Results

After using the VAE to represent the NS EOS, the EOS is described by only four parameters. The direct results of the Bayesian inference are the posteriors of the latent variables z1, z2, z3, and z4. These latent variables are also called hidden variables because they have no direct physical relation to the reconstructed data; we therefore need to convert them into EOS tables that we can interpret, using the trained VAE decoder, i.e., by decoding. The posterior distributions of the latent variables and the bulk properties of the NS are shown in Figure 3. We find that the radius and tidal deformability of a canonical 1.4 ${M}_{\odot }$ NS are strongly correlated, and the data set ${{ \mathcal D }}_{1}$ (${{ \mathcal D }}_{2}$) gives ${R}_{1.4}={12.14}_{-0.93}^{+0.94}\,\mathrm{km}$ (${12.59}_{-0.42}^{+0.36}\,\mathrm{km}$) and ${{\rm{\Lambda }}}_{1.4}={374}_{-160}^{+283}$ (${489}_{-110}^{+114}$) at the 90% credible level. Except for z2, the posteriors of the latent variables differ clearly from their priors, which means that the observational data are informative. For the variable z4, the results of data sets ${{ \mathcal D }}_{1}$ and ${{ \mathcal D }}_{2}$ show a slight difference, while for the other three latent variables the two data sets give no apparent discrepancies. Interestingly, the variables z1 and z2 show a correlation in the joint distribution, and the variables z3 and z4 are correlated with the maximum mass of a nonrotating NS. The maximum mass is constrained to be ${M}_{\max }={2.26}_{-0.23}^{+0.39}{M}_{\odot }$ (${2.20}_{-0.19}^{+0.37}{M}_{\odot }$) for data set ${{ \mathcal D }}_{1}$ (${{ \mathcal D }}_{2}$), consistent with some previous results (Shao et al. 2020; Nathanail et al. 2021; Tang et al. 2021).


Figure 3. Posteriors of the latent variables and bulk properties of the NS. The red and blue curves are the results of the data sets ${{ \mathcal D }}_{1}$ and ${{ \mathcal D }}_{2}$, respectively. The gray curves are the priors of the latent variables, which are all standard normal distributions; the priors of the bulk properties of the NS are almost uniform. All the results are at the 90% credible level.


After the decoding, we can discuss the results for the EOS directly. To illustrate the efficacy of the VAE method, we also perform the Bayesian inference directly on the FFNN model, which is controlled by 31 parameters. The likelihood function is the same for the two methods, so we can compare the results directly. In the upper panels of Figure 4, we can see that over most of the density range the results of the FFNN model (solid lines) are almost the same as those of the VAE model (dashed lines). The consistency of the reconstructed Λ−M and M−R relations (lower panels) is even more remarkable. This indicates that the VAE approach has been implemented very successfully: the VAE model, which has only four parameters, produces the same outcomes as the FFNN model with its 31 parameters, while spending less time. The use of VAE techniques in this work can accelerate the calculation by a factor of ∼3 or more; with the data set ${{ \mathcal D }}_{1}$ (${{ \mathcal D }}_{2}$), the FFNN model requires 3.87 (45.28) hr, whereas the VAE model requires only 1.27 (18.22) hr. All these calculations are performed on one compute node with 128 cores. In some cases, the speedup is even larger: without the nuclear constraints, for the data set ${{ \mathcal D }}_{1}$, the calculation with the VAE model can be more than 10 times faster than with the FFNN model. Nevertheless, we find that in the high-density region (i.e., ≳4ρsat) the constraint on the EOS from the VAE model is less "stringent" than that from the FFNN model (see the upper left panel of Figure 4); the same happens in the upper right panel of Figure 4. This difference is likely caused by the lack of an effective probe at such high densities. The magenta regions in the upper panels of Figure 4 mark the central density of PSR J0740+6620, the most massive NS that has been accurately measured so far. Clearly, even for such a massive compact object the central density only reaches ∼4ρsat, above which the EOS cannot be effectively constrained. In view of these facts, we conclude that the VAE model yields reasonable results efficiently.


Figure 4. Posterior distributions of the pressure (p) vs. the rest-mass density (ρ) (upper left), the squared sound speed divided by the squared speed of light in vacuum ${c}_{{\rm{s}}}^{2}/{c}^{2}$ vs. ρ (upper right), the dimensionless tidal deformability (Λ) vs. mass (M) (lower left), and M vs. the radius (R) (lower right). All of the uncertainty regions are at the 90% credible level. The prior is shown by black dashed–dotted lines. The results of using data sets ${{ \mathcal D }}_{1}$ and ${{ \mathcal D }}_{2}$ are shown with the blue and orange lines, respectively. For comparison, we also draw the results obtained with a method similar to that of Han et al. (2021; dashed lines), i.e., the FFNN model. The black vertical lines in the upper panels denote several typical densities (1ρsat, 2ρsat, 4ρsat, and 8ρsat, where ρsat is the nuclear saturation density), and the vertical magenta regions mark the central density of PSR J0740+6620. The horizontal dashed black line in the ${c}_{{\rm{s}}}^{2}/{c}^{2}(\rho )$ panel (upper right) is the conformal limit, i.e., ${c}_{{\rm{s}}}^{2}/{c}^{2}=1/3$, while the black straight lines in the lower panels mark 1.4 ${M}_{\odot }$. The MR measurements (at the 68.3% credible level) of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429 are represented by the blue dotted–dashed contour, magenta dotted–dashed contour, purple error bar, and gray area, respectively. The red and green dotted–dashed contours represent the MR posteriors of the NSs associated with GW170817 (adopted from the right panel of Figure 3 of Abbott et al. 2018).


5. Summary

In this work, building on Han et al. (2021), we develop a new Bayesian nonparametric method for studying the NS EOS. We use the deep neural network VAE to reduce the number of parameters that represent the EOS. By comparing different settings of the network, we find that a VAE with a 4D latent space is a suitable choice for the representation of the EOS. After the training, we obtain a trained decoder network: we can draw a random vector from a 4D standard normal distribution and use the decoder to convert it into the reconstructed 128D EOS table, i.e., we can represent the NS EOS using only four parameters. We perform Bayesian inference to infer the EOS posteriors using NS observations, i.e., the MR measurements of PSR J0030+0451, PSR J0740+6620, PSR J0437-4715, and 4U 1702-429, as well as the M−Λ measurement of GW170817. Sampling from the posteriors of the latent variables with the numerical sampling package PyMultiNest, we compute the EOS tables with the trained decoder and, by numerically integrating the TOV equations, finally obtain the macroscopic NS properties we are interested in. The radius and tidal deformability of a canonical 1.4 ${M}_{\odot }$ NS are constrained to be ${R}_{1.4}={12.14}_{-0.93}^{+0.94}\,\mathrm{km}$ (${12.59}_{-0.42}^{+0.36}\,\mathrm{km}$) and ${{\rm{\Lambda }}}_{1.4}={374}_{-160}^{+283}$ (${489}_{-110}^{+114}$) at the 90% credible level for data set ${{ \mathcal D }}_{1}$ (${{ \mathcal D }}_{2}$), respectively. The maximum mass of a nonrotating NS is constrained to be ${M}_{\max }={2.26}_{-0.23}^{+0.39}\,{M}_{\odot }$ (${2.20}_{-0.19}^{+0.37}\,{M}_{\odot }$) for data set ${{ \mathcal D }}_{1}$ (${{ \mathcal D }}_{2}$). As for the latent variables, we find that except for z2 all of them are well constrained, and some show correlations with each other or with the macroscopic properties.

Though we use only four parameters to represent the NS EOS with the VAE, the representation still retains its nonparametric character. This dimensionality reduction is a significant development for Bayesian nonparametric inference of the NS EOS, because it dramatically reduces the dimension of the parameter space and hence the difficulty and cost of sampling; quantitatively, the VAE technique accelerates the calculation by a factor of ∼3–10 or more. Nevertheless, there are still aspects to be improved in future work. As mentioned in Section 4, the latent variables are hidden variables that have no direct relation to the EOS parameters or to the macroscopic properties; however, the posteriors obtained in this work do show a few correlations, so one can further investigate the relationship between the latent variables and the parameters of interest, i.e., disentangle the latent variables. Moreover, although we have tried different hyperparameters of the network to find a proper setting of the VAE, the compression may still lose some information, so future work can further enhance the efficiency of the representation while maintaining accuracy. Finally, in the low- and very high-density regions one can also incorporate the constraints from chiral effective field theory (Essick et al. 2021) and perturbative quantum chromodynamics (Gorda et al. 2022).

We appreciate the referees' helpful suggestions and thank Dr. J. L. Jiang for the useful discussion and input. This work was supported in part by NSFC under grants Nos. 12233011, 11921003 and 11525313.
