The reconstruction of dark energy with Ridge Regression Approach

It may be determined by non-parametric method if the dark energy evolves with time. We propose a method of combining PCA and biased estimation on the basis of ridge regression analysis to reconstruct parameters, meanwhile we present an interesting principal component selection criterion to avoid the arbitrariness of principal component selections, and use numerical integral by Lagrange interpolation to linearize the luminosity distance integral formula in nearly flat space to avoid instability of derivative for functional data. We get the preliminary test results that shows if $\Delta \overline w (z) = \overline {\left| {1 + w(z)} \right|}<=0.05$ included $w(z) = -1$, the probability of making a type I error for ${w_{recon}} \ne -1$ is almost zero ($1\% $) in the test; otherwise, if $\Delta \overline w (z) = \overline {\left| {1 + w(z)} \right|}>0.05$, the probability of making a type I error for ${w_{recon}} = -1$ is not more than $10\% $. Finally, we use JLA sample to reconstruct $w(z)$, and the results reject ${\rm{w(z) }} \ne {\rm{ - 1}}$, which is agreement with $\Lambda CDM$ model.


INTRODUCTION
Since the cosmic acceleration was firstly discovered in 1998 (Riess et al. 1998), physicists predict the existence of dark energy for explaining the accelerating expansion of the universe, on the other hand, the nature of dark energy whether or not evolves with time has also become a significant issue (Linder et al. 2003). The various observational data can be used to test the dark energy equation of state.
On the data aspects, astronomers are getting more and higher precision original observational data including Type Ia SNe (Amanullah et al. 2010;Betoule et al. 2014;Scolnic et al. 2018), Hubble parameter H(z) (Lewis et al. 2002), cosmic microwave background radiation(CMB) (Hu et al. 2002;Komatsu et al. 2011;Ade et al. 2016;Aghanim et al. 2016), and large scale structure (LSS) (Eisenstein et al. 2005). Although the correction methods for apparent magnitude of SNe Ia including SALT2 (Guy et al. 2005(Guy et al. , 2007, SALT2 with the improved absolute magnitude depend on the prior cosmological models or the dark energy equation of state, the calibrated data can still be applied to partially recover the state parameter, w(z). Meanwhile Hubble parameters including some data obtained from the age-redshift relationship of galaxies can be directly used to recon-struct the dark energy w(z) models because its correction does not need to depend on a prior cosmological model (Riess et al. 2002). Therefore, original observation data including Type Ia SNe, Hubble parameter H(z), cosmic microwave background radiation(CMB), and large scale structure (LSS) can be employed to obtain statistical results to measure the dynamic property of dark energy. Betoule1 et al. in 2014(Betoule et al. 2014) used a Joint Light-Curve Analysis of the SDSS-II and SNLS3 Supernova Survey and obtained the statistical result which indicates w(z) model including linear and constant w(z) has not evident superiority, comparing to ΛCDM model by minimum chi-square. Even so, we still consider using non-parametric methods to examine whether or not w(z) model has obvious advantage compared to ΛCDM .
The non-parametric methods have several types: i) Principal Component Analysis (PCA), includes nonlinear PCA (Jimenez et al. 2003) and linear PCA (Clarkson et al. 2010), this method provide a good idea to test w(z) model, the testing yields good result, it can recover ΛCDM model, but it may be difficult to recover w(z) model because of the instability of derivative. On the other hand, the selection of principal component has some artificial effects; ii) Gaussian Processes (GP) (Holsclaw et al. 2010;Shafieloo et al. 2012;Seikel et al. 2012), the advantage of GP method is that the calculation is simple, but the covariance function parameters have a significant impact on the calculation of results, and its selection has a certain arbitrariness. iii) PCA with the smoothness prior (Riess et al. 2009;Crittenden et al. 2012;Zhao et al. 2012), although the results obtained by this method are good, considering that it is a non-linear regression, the calculation is more complicated, and the choice of principal components and the covariance function parameters also have the above problems.
We consider a method of combining numerical integral by Lagrangian interpolation, PCA analysis and biased estimation on the basis of ridge regression analysis, we apply this method to test the dark energy equation of state. At the same time, we propose a new principal component selection criterion to better distinguish between ΛCDM and w(z) ̸ = −1 models by reconstruction.

THE RECONSTRUCTION OF PARAMETER
USING PCA WITH BIASED ESTIMATION

Linearization of the relation between of luminosity distance and redshift
We concern that the large positive and negative space curvature has not been obviously observed by observational data, and approximately adopt curvature Ω k = 0. Then to avoid the unstable problems of first and second derivation of luminosity distance, which are used to reconstruct w(z) by second derivative formula of w(z) (Clarkson et al. 2010). We use Gauss integration to linearize the luminosity distance formula, and adopt the simplest Lagrange's interpolation method to get a linear regression, which can be used to directly reconstruct w(z).
The luminosity distance formula d l is where H 0 is the Hubble constant at redshift z = 0, h(z) = H(z)/H 0 , H(z) is the Hubble parameter at redshift z. We use Gauss integration to linearize the luminosity distance formula, the linear regression can be written as Hprior )/dr, r = 1 / 1 + z , and X = X ′ T , T is Trapezoid numerical integration matrix, X ′ is the coefficient matrix of Gauss integration, which is given by = 0, which K is covariance matrix, the boundary condition can be satisfied. We select the number of interpolation points N ≥ 20, and use PCA with biased estimation to reconstruct regression parameter β.

Biased estimation and PCA
Considering a linear regression assume both of y and β are function data, we put a zero mean Gaussian prior with covariance matrix K on the regression coefficient β, the biased estimate of β is (Rasmussen et al. 2006) whichβ is the unbiased estimate of β, Cβ is the covariance ofβ, and Cβ = (X T Σ y −1 X) −1 , Σ y is the covariance of data y, covariance function at r and r ′ written as where σ 2 is the variance, l is the correlation lengths, k(r, r ′ ) is the element of matrix K. When s = 2, covariance matrix condition number will be too large to reduce the impact of input errors on estimated parameters. It is same for other forms of the k(r, r ′ ) function, so we set the value of s ∈ (1.5, 1.9). It can avoids some artificial effects for the choice of the parameters s, and also decrease matrix condition number, which can reduce the impact of input errors on estimated parameters, and avoid unstable problem in parameter values. The biased covariance of biased estimate ofβ * is (see Appendix A5) Here we use the linear regression method that combined PCA with biased estimate based on ridge regression analysis (Frank et al. 1993;Wold et al. 2001), to decrease variance and reconstruct the biased estimatê β, the linear regression can be expressed aŝ and where α is the regression coefficient vector, P T is the eigenmatrix, Λ is the diagonalizable matrix of eigenvalue. Because P T α is smooth estimate values, the right form of Eq. (8) can satisfy that new continuous estimates are available, but does not change estimates. Using the orthonormality condition, the estimatesα can be computed asα = Pβ * , its covariance matrix is diagonal, thus we can choose appropriate principal component to reconstruct the estimateβ.

The choice of principal component coefficientα i and covariance function parameters
We assume a linear transformations c = qβ about linear regression y = Xβ, making sure the covariance matrix Cĉ of unbiased estimate ofĉ is diagonal, then we put a zero mean Gaussian prior with covariance matrix K c on the parameter c, which K c is diagonal, and its element is k ci 2 , minimizing the trace of Eq. (7) T r(Vĉ * ) is equal to k ci = |c i |, after that we combine Eq. (5) and use biased estimateĉ * to substitute c, and find if k ci has a real solution,ĉ should satisfy conditionĉ 2 i ≥ 4σ 2 (ĉ i ) (see Appendix B). Thus, we give an interesting criteria which expressed aŝ We define type I error is the situation that β true = 0 but β recon ̸ = 0, and type II error is the situation that β true ̸ = 0 but β recon = 0. If we choose σ(α i ) as criteria, which will increase the probability of making a type I error and decrease the probability of making a type II error, whereas if we choose 3σ(α i ) as criteria, which will increase the probability of making a type II error and decrease the probability of making a type I error.
If we assume k(r, r ′ ) = σ 2 , minimizing T r(Vβ * ) to be equal to σ 2 ≥ β 2 max , so we give σ = β * max , and the value of β * max trend to be stable with increasing of σ (see Appendix B). We select the value of s in the range of 1.5 ≤ s ≤ 1.9, which has little or not impact on the variety of β recon , and the value of correlation lengths l also has much less effect on β recon , we will discuss this issue later on.

TESTING THE METHOD BY THE VARIOUS W (Z) FUNCTIONS
We construct hypothetical d H data sampled from various w(z) models, with the redshift in range of 0 < z ≤ 1.3, and assume statistical uncertainty is σ m = 0.15mag. The test w(z) functions include the following forms: i) constant w model; ii) the linear model which is often used in parametric method (Linder et al. 2003;Maor et al. 2002;Riess et al. 2004Riess et al. , 2007Sahni et al. 2006); iii) the feature model (Xia et al. 1998),where ln is natural logarithm function; vi) the transition model which is reconstructed as fiducial model in nonparametric method (Jimenez et al. 2003;Corasaniti et al. 1998). We use Eq.
(2) , (8) and (11) , and choose a flat ΛCDM model as a prior model to obtain w recon , meanwhile we give the ∆( 1 h ) recon by the different values of correlation lengths l, their reconstructions are shown in Fig. 1, which show l has much less effect on reconstruction. The tests demonstrate that the selection of covariance function parameters l and s have not obvious influence on the calculations, and the value of σ is given by explicit form from the analysis of the above section.
The reconstructions of w(z) and the functions are shown in Fig. 2, the test results indicate if ∆w(z) = |1 + w(z)| <= 0.05 for various w(z) models including w(z) = −1, the probability of making a type I error for w recon ̸ = 1 as shown in the upper left figure is almost zero (1%) in the test which are not less 100 for the times of sampling; otherwise, if ∆w(z) = |1 + w(z)| > 0.05, the probability of making a type I error for w recon = 1 is not more than 10% as shown in other three graphs. We choose principal components by a new criteria, the goal is to better identify the highest priority for the ΛCDM or w(z) ̸ = −1 models. When ∆w(z) = |1 + w(z)| <= 0.05 or w = −1, the reconstruction results are w = −1 by our criteria (Eq. (10) ), When ∆w(z) = |1 + w(z)| >= 0.05, the reconstruction results are w ̸ = −1 by same criteria.

USING THE JLA SAMPLE TO RECONSTRUCT
W (Z)

Used data
The SDSS-II and SNLS3 Supernova Survey Joint Light-Curve Analysis is called JLA for short. The Supernova Legacy Survey Program used a large CCD mosaic MegaPrime at the Canada-France-Hawaii Telescope to detect and monitor approximately 2000 high-redshift Supernovae between 2003 and 2008. 239 SNe Ia based on the first three years of data is contained in JLA sample, the goal of the Survey is to investigate the expansion history of the universe, improve the constraint of cosmological parameters, as well as dark energy study, including the measure of the time-averaged equation of state of dark energy w to 0.05 (statistical uncertainties only) in combination with other measurements and to 0.10 considering systematic effects (Conley et al. 1998).
The SDSS-II Supernova Survey used the SDSS camera (Gunn et al. 1998) on the SDSS 2.5 m telescope (York et al. 1998;Gunn et al. 1998) at the Apache Point Observatory (APO) to search for SNe in the northern fall seasons (September 1 through November 30) of 2005 to 2007. Until running on the end of the year 2007, a wide variety of sources including solar system objects, galactic variable stars, active galactic nuclei, supernovae (SNe), and other astronomical transients were observed (Sako et al. 1998), 403 sources were identified as SNe (Betoule et al. 2014).
In 2014 an large catalogue was released containing light curves, spectra, classifications, and ancillary data of 10,258 variable and transient sources by this Survey (Sako et al. 2014), The release resulted in the largest sample of supernova candidates ever compiled with 4607 likely supernovae, 500 of which have been confirmed as SNe Ia by the spectroscopic follow-up. JLA sample consists of a selection of 374 SNe Ia from this spectroscopic sample. The rest of JLA sample are taken from the C11 compilation, comprising SNe from SDSS, SNLS, HST and several nearby experiments (Conley et al. 1998). This extended sample of 740 SNe Ia is called the JLA sample.

SALT2 calibration for JLA sample
We consider the prior dark energy equation of state is unknown, so we use SALT2 and Taylor expansion of d H − z relation to directly calibrate JLA sample, which can simplify the problem.
The distance modulus µ ob correction formula is given by SALT2 model (Guy et al. 2005(Guy et al. , 2007 where m B corresponds to the observed peak magnitude in rest frame B band, x 1 describes the time stretching of the light curve, c describes the SN colour at maximum brightness, and α, β are nuisance parameters in the distance estimate. M B is the absolute B-band magnitude, which depends on the host galaxy properties (Betoule et al. 2014). Notice that M B is related to the host stellar mass (M stellar ) by a simple step function Here M ⊙ is the mass of the Sun. Meanwhile, the relation of distance modulus µ and luminosity distance d H can be written as We use SALT2 and Taylor expansion of d H −z relation to directly calibrate JLA sample, the Taylor expansion of d H − z relation can be given by where y = z/(1 + z). In order to reduce calculation error for high redshift data, we take this variable substitution. q 0 is the deceleration parameter, j 0 is the jerk parameters, and Ω k0 is the curvature term.  Figure 2. The reconstruction results of w(z) from dH data which are sampled from various w(z) models together with the luminosity distance formula. The upper left figure is the reconstruction results for test models w(z) = −1 or ∆w(z) = |1 + w(z)| < 0.05, the results show that when ∆w(z) = |1 + w(z)| <= 0.05 or w = −1, the reconstruction result is w = −1 if we choose 2σ(αi) as criteria (Eq. (10) ), but the reconstruction result will have slightly deviated from w = −1 if we choose σ(αi) as criteria. It shows that the method can be used to test models with slight deviations for w(z) = −1. The top right and bottom graphs are the reconstruction results for linear model, feature model, and transition model, it shows that our method has certain applicability to the recovery of w(z) ̸ = −1 models by same criteria .
The χ 2 of JLA data can be calculated as where ∆d H = d H,ob − d H,th . C d H is the covariance matrix of d H , which can be given by the propagation error formula together with C µ . C µ is the covariance matrix of the distance modulus µ, we only consider statistical error, and We use JLA sample to reconstruct w(z). Firstly, we combine the luminosity distance data and Eq. (2) to derive the estimation of ∆( 1 h ) ′ , and then adopt Eq. (8) to reconstruct ∆( 1 h ) ′ , meanwhile employ integration to obtain ∆( 1 h ), in order to reduce the probability of making a type II error, we choose criteria σ(α i ) and 2σ(α i ) for Eq. (10) . Finally, we adopt second derivative formula Eq. (11) to obtain w recon . The reconstruction is shown in Fig. 3, the results indicates the equation of state has not obviously deviated from w = −1, or it may be consistent with ΛCDM .

CONCLUSIONS
We have presented a new, nonparametric reconstruction technique, which combines PCA, biased estimation, and numerical integral by Lagrangian interpolation techniques, this new method provide a good idea for solving the derivative problems of large variances functional data. The advantages of this method are as follows: i) adopting Lagrangian interpolation to avoid instability of derivative for functional data, ii) presenting an interesting principal component selection criterion to better identify the highest priority for the ΛCDM or w(z) ̸ = −1 models, meanwhile the choice of prior does not have much impact on reconstruction results, iii) using linearization formula to make the calculation easier. The test results demonstrate the PCA-biased method can be used to determine the most probable behavior of w(z) and to infer how likely a target trajectory is given the current data. Thus it can be used to accept or reject classes of ΛCDM model.
We employ this method for the dark energy equation of state and applied it to JLA supernova sample, the results are consistent with ΛCDM . In the future, we hope to observe more high redshift Type Ia SNe data z ≥ 1, it will be more convenient to reconstruct the dark energy equation of state.
On the other hand, although the origin and composition properties of dark energy remain unknown, we still can use the observational data to test the dynamic property of dark energy. A solid measurement that w = −1 or ̸ = −1 (which would rule out the cosmological constant) would have profound implications for cosmology and particle physics, it may bring us more new physical ideas.

A. THE BIASED ESTIMATE AND BIASED COVARIANCE
Considering a linear regression Y = Xβ, the covariance of data Y is Σ y , which is diagonal matrix, then adopt least square method to derive the biased estimate of parameter β β = (X T Σ y −1 X) −1 X T Σ y −1 y, and its covariance Cβ = (X T Σ y −1 X) −1 .

B. THE RIDGE REGRESSION ANALYSIS FOR BIASED COVARIANCE
Making a linear transformations for the regression coefficient which Cβ = q T Λq, q T is the eigenmatrix, Λ is the diagonalizable matrix of eigenvalue, then the covariance of unbiased estimateĉ is Cĉ = Λ.
We put a zero mean Gaussian prior with covariance matrix K c on the parameter c, which K c is diagonal, and its element is k 2 ci , then get the trace of biased covariance where Vĉ * is the biased covariance of biased estimateĉ * , λ i (K c ) and λ i (Λ) are the eigenvalue of matrix K c and Λ. Minimize T r[Vĉ * ], and obtain k ci = |c i | .
which c i is the element of regression parameter vector c. If we use biased estimateĉ * to substitute c, and k ci has a real solution,ĉ should satisfy conditionĉ