Probing dark energy with cluster counts and cosmic shear power spectra: including the full covariance

(Abridged) Combining cosmic shear power spectra and cluster counts is powerful to improve cosmological parameter constraints and/or test inherent systematics. However they probe the same cosmic mass density field, if the two are drawn from the same survey region, and therefore the combination may be less powerful than first thought. We investigate the cross-covariance between the cosmic shear power spectra and the cluster counts based on the halo model approach, where the cross-covariance arises from the three-point correlations of the underlying mass density field. Fully taking into account the cross-covariance as well as non-Gaussian errors on the lensing power spectrum covariance, we find a significant cross-correlation between the lensing power spectrum signals at multipoles l~10^3 and the cluster counts containing halos with masses M>10^{14}Msun. Including the cross-covariance for the combined measurement degrades and in some cases improves the total signal-to-noise ratios up to plus or minus 20% relative to when the two are independent. For cosmological parameter determination, the cross-covariance has a smaller effect as a result of working in a multi-dimensional parameter space, implying that the two observables can be considered independent to a good approximation. We also discuss that cluster count experiments using lensing-selected mass peaks could be more complementary to cosmic shear tomography than mass-selected cluster counts of the corresponding mass threshold. Using lensing selected clusters with a realistic usable detection threshold (S/N~6 for a ground-based survey), the uncertainty on each dark energy parameter may be roughly halved by the combined experiments, relative to using the power spectra alone.


I. INTRODUCTION
In recent years great observational progress has been made in measuring the constituents of the universe (e.g. [1, 2, 3]). It appears that the universe is currently dominated by an unexpected component that is causing the universe to accelerate in its expansion. This component is dubbed "dark energy". Understanding the nature of dark energy is one of most fundamental questions that remain unresolved with the current cosmological data sets (e.g. [4,5]). This is now the focus of several planned future surveys [6,7,8,9,10,11,12].
Whether the accelerating expansion is as a consequence of the cosmological constant, a new fluid or a modification to Einstein's gravity, these future surveys will provide key information. In addition they will provide a wealth of further cosmological information, such as constraints on the neutrino mass and the spectrum of primordial perturbations generated in the early universe (e.g. [13]).
Combining several techniques accessible from different cosmological observables is often a powerful way to improve constraints on cosmology. However, care must be taken if the observables are not completely independent. Two of the most promising methods for constraining the * Electronic address: takada@astr.tohoku.ac.jp, sarah.bridle@ucl.ac.uk dark energy are galaxy cluster counts and cosmic shear (e.g. [14]). Clusters of galaxies contain galaxies, hot gas and dark matter in ratio approximately 1:10:100 [15]. They are the largest gravitationally bound objects in the universe and the number of clusters of galaxies has long been recognized as a powerful probe of cosmology [16,17,18,19,20]. Counting clusters of galaxies as a function of redshift allows a combination of structure growth and geometrical information to be extracted, thus potentially allowing constraints on the nature of dark energy [21,22,23,24,25]. If cluster masses can be measured accurately then the shape of the mass function also helps to break degeneracies [26]. The distribution of clusters on the sky (e.g. two-point correlation function) carries additional information on dark energy [27,28].
The bending of light by mass, gravitational lensing, causes images of distant galaxies to be distorted. These sheared source galaxies are mostly too weakly distorted for us to measure the effect in single galaxies, but require surveys containing a few million galaxies to detect the signal in a statistical way. This cosmic shear signal has been observed [29,30,31,32] and used to constrain cosmology (most recently [33,34,35,36]). By using redshift information of source galaxies the evolution of the dark matter distribution with redshift can be inferred. Hence, measuring the cosmic shear two-point function as a function of redshift and separation between pairs of galaxies can be used to constrain the geometry of the universe as well as the growth of mass clustering. This method has emerged as one of the most promising to obtain precise constraints on the nature of dark energy if systematics are well under control [37,38,39].
Future optical imaging surveys suitable for cosmic shear analysis will also allow the identification of clusters of galaxies. This could be done either using the colors of the cluster members (e.g. [40,41]) or using peaks in the gravitational lensing shear field (e.g. [42,43,44]). In addition cluster surveys in other wavebands will overlap with the cosmic shear surveys allowing detection using X-rays and the thermal Sunyaev-Zel'dovich (SZ) effect.
Clusters of galaxies produce a large gravitational lensing effect on distant galaxies, therefore cluster counts and cosmic shear will not be strictly statistically independent. The volume surveyed is finite and therefore the number of clusters observed will not be exactly equal to the average over all universe realizations. If the number of clusters happens to be higher for a given survey region, then the cosmic shear signal is also likely to be higher. Although the volumes will be large, and thus the deviation is small, this may amount to a significant uncertainty in the dark energy parameters as obtained by cluster counts, and dominates the non-Gaussian errors on the cosmic shear [45,46,47,48].
One aspect of this cross-correlation was discussed in [49] and found to be negligible. However, here we make a full treatment of this effect using the halo model for non-linear structure formation, and quantify the resulting change in joint constraints on the dark energy parameters.
The structure of our paper is as follows. In § II we describe how our observables, cluster number counts and lensing power spectra, can be expressed in terms of the background cosmological model and the density perturbations. In § III, we describe a methodology to compute covariances of the cluster counts and the lensing power spectra, and the cross-covariance between the two observables. The detailed derivations of the covariances are presented in Appendix. In § IV we first study the total signal-to-noise ratios expected for a joint experiment of the cluster counts and the lensing power spectrum fully including the cross-covariance predicted from the ΛCDM cosmologies. We then present forecasts for cosmological parameter determination for the joint experiment, with particular focus on forecasts for the dark energy parameter constraints. Finally, we present conclusions and discussion in § V.

A. A CDM model
We work in the context of spatially flat cold dark matter models for structure formation. The expansion history of the universe is given by the scale factor a(t) in a homogeneous and isotropic universe (e.g., see [50]). We describe the Universe in terms of the matter density Ω m (the cold dark matter plus the baryons) and dark energy density Ω de at present (in units of the critical density 3H 2 0 /(8πG), where H 0 = 100 h km s −1 Mpc −1 is the Hubble parameter at present). In general the expansion rate, the Hubble parameter, is given by where we have employed the normalization a(t 0 ) = 1 today and w(a) specifies the equation of state for dark energy as w(a) ≡ p de (a)/ρ de (a). Note that Ω m + Ω de = 1 and w = −1 corresponds to a cosmological constant. The comoving distance χ(a) from an observer at a = 1 to a source at a is expressed in terms of the Hubble parameter as (2) This gives the distance-redshift relation χ(z) via the relation 1 + z = 1/a. Next we need the redshift growth of density perturbations. In linear theory after matter-radiation equality, all Fourier modes of the mass density perturbation, δ(x)(≡ δρ m (x)/ρ m ), grow at the same rate, the growth rate (e.g. see Eq. 10 in [51] for details).

B. Number counts of galaxy clusters
The galaxy cluster observables we will consider in this paper are the number counts drawn from a given survey region. Clusters can be found via their notable observational properties such as gravitational lensing, member galaxies, X-ray emission and the SZ effect. For number counts we simply treat clusters as points; in other words, we do not care about the distribution of mass within a cluster. Hence, the number density field of clusters at redshift z can be expressed as where δ 3 D (x) is the three-dimensional Dirac delta function. The summation runs over halos (the subscript i stands for the i-th halo), and S(m i ; z) denotes the selection function that discriminates the halos used for the cluster number counts statistic from other halos.
In this paper, we will consider the following two toy models for the selection function, to develop intuition for the importance of cross-correlation between cluster counts and the lensing power spectrum and to make a comparison between cosmological parameter estimations derived from different cluster samples. Note that throughout this paper we will ignore uncertainties associated with cluster mass-observable relation, which could significantly degrade the ability of cluster counts for constraining cosmological parameters (e.g. [23]). We shall discuss this issue in § IV E.
A mass-limited cluster sample -The first toy model we will consider is a mass-limited cluster sample. For this model, we include all halos with masses above a given mass threshold: To a zero-th order approximation, the mass-limited selection may mimic a cluster sample derived from a fluxlimited survey of clusters via the SZ effect, as this effect is free of the surface brightness dimming effect (e.g. see [52]). A lensing-based cluster sample -A lensing measurement allows one to make a reconstruction of the twodimensional mass distribution projected along the line of sight [53]. A high peak in the mass map provides a strong candidate for a massive cluster (see [42,43,44] for an implementation of this method to actual data). To be more explicit, one can define height or significance for each peak in the reconstructed mass map using the effective signal-to-noise ratio (see [54] for details): Here κ cluster is the convergence amplitude due to a given cluster at redshift z and with mass m, and σ N is the rms fluctuations in κ due to the intrinsic ellipticity noise arising from a finite number of the background galaxies. Note that we assume an NFW profile [55] with profile parameters modeled in [56], and consider the convergence field smoothed with a Gaussian filter of angular scale θ s = 1 ′ . To compute the (S/N ) cluster for a cluster at redshift z, we take into account the remaining fraction of background galaxies behind the cluster for a given redshift distribution of whole galaxy population (see § IV A). This accounts for the variation of mean redshift and number density of the background galaxies with cluster redshift, which changes both the signal and the intrinsic noise in Eq. (4). From the reconstructed mass map, a cluster sample may be constructed by counting mass peaks with heights above a given threshold, ν min : the selection function is given by As carefully investigated in [54], the minimum mass of clusters detectable with a given threshold varies with cluster redshift; clusters at medium redshift between observer and a typical source redshift are most easily detectable, while only more massive clusters can be detected at redshifts smaller and greater than the medium redshift, as discussed below. We will employ the halo model to quantify the statistical properties of cluster observables. In the halo model approach, we assume that all the matter is in halos. Following the formulation developed in [56,57,58,59] (also see Appendix A 1 and [60] for a thorough review), the ensemble average of Eq. (3) can be computed as where n(m) is the halo mass function corresponding to the redshift considered and we have used the ensemble . Thus, as expected, the ensemble average of the cluster number density field is given by the integral of the halo mass function, which does not depend on the cluster distribution and spatial position. For the halo mass function, we employ the Sheth-Tormen fitting formula [61], modified from the original Press-Schechter function [62]. Note that we use parameter values a = 0.75 and p = 0.3 in the formula following the discussion in [63]. We assume that the mass function can be applied to dark energy cosmologies by replacing the growth rate appearing in the formula with that for a dark energy model [64].
A more useful quantity often considered in the literature is the total number counts of clusters available from a given survey, which is obtained by integrating the three-dimensional number density field over a range of redshifts surveyed. Cluster redshifts are rather easily available even from a multicolor imaging survey alone because their central bright galaxies, or red sequence galaxies, have secure photometric redshift estimates. Having these facts in mind we will use as our observable the angular number density averaged over a survey area and divided into redshift bins: where W (θ) is the window function of the survey defined so that it is normalized as d 2 θW (θ) = 1, χ H is the distance to the Hubble horizon, and the comoving volume per unit comoving distance and unit steradian is given by d 2 V /dχdΩ = χ 2 for a flat universe. The subscript in the round bracket, (b), stands for the b-th redshift bin for the cluster number counts. In the following, we will simply consider the sharp redshift selection function Note that the redshift z appearing in the argument of S (b) (m i ; z) is related to the comoving distance χ via the relation dχ = dz/H(z).
Using the halo model, the expectation value of the angular number density can be computed from the ensemble average of Eq. (6) as Thus, the expectation value again does not depend on the cluster distribution. The sensitivity of the number density to dark energy arises from the comoving volume and the mass function n(m) [21].
FIG. 1: The average angular number density of halos with masses above a given threshold, per unit square arcminute and per unit redshift interval. The upper pair and lower pair of curves are for halos with M/M⊙ ≥ 10 14 and 5 × 10 14 , respectively. Increasing the dark energy equation state from w = −1 to w = −0.9 decreases the number density, as shown by the dashed curves. Fig. 1 shows the average angular number density of halos with masses greater than a given threshold, per unit square arcminute and per unit redshift interval assuming the fiducial model defined in IV A. Increasing the dark energy equation of state from our fiducial model w = −1.0 to w = −0.9 decreases the number density, because the change decreases both the comoving volume d 2 V /dχdΩ and the number density of cluster-scale halos, for a given CMB normalization of density perturbations. Comparing the results for mass thresholds M min /M ⊙ = 10 14 and 5 × 10 14 clarifies that a factor 5 increase in the mass threshold leads to a significant decrease in the number density, reflecting the mass sensitivity of the halo mass function in its exponential tail.
In Fig. 2 we present the number density for the lensingbased cluster sample in which clusters having a lensing FIG. 2: As in the previous plot, but shown here is the number density for the lensing-based cluster sample, where clusters having a lensing signal greater than a given detection threshold are selected in the sample as described around Eq. (5). The dashed, solid and dotted curves show the results for the detection thresholds (S/N ) cluster ≥ 6, 8 and 10, respectively. For comparison, the two dot-dashed curves show the number density for the mass-selected cluster sample with masses M/M⊙ ≥ 5, 10 × 10 14 . Increasing w0 from w0 = −1.0 to w0 = −0.9 leads to a decrease in the number density as shown by the thin-solid curve, compared to the bold-solid curve. The lensing selected number densities peak at a redshift z ∼ 0.25, reflecting redshift dependence of the lensing efficiency function for source galaxies at zs ∼ 1. signal greater than a given detection threshold are included in the sample as discussed around Eq. (5). Note that to compute the results shown in this plot we assumed the redshift distribution of galaxies described in § IV A and the NFW mass profile to model the cluster lensing. In practice high detection thresholds such as (S/N ) cluster > ∼ 6 are necessary in order to make robust estimates for cluster counts, because contamination of false peaks due to intrinsic ellipticities or the projection effect are expected to be low for such high thresholds (see [54,65] for the details). Comparing with the number density for a mass-selected sample shown by the dot-dashed curves, one can roughly find which mass and redshift ranges of clusters are probed by the lensing-based cluster sample. For example, the cluster sample with lensing signal (S/N ) cluster ≥ 10 contains massive clusters with masses M > ∼ 10 15 M ⊙ over redshift ranges z < ∼ 0.4, while only even more massive clusters are included in the sample at the higher redshifts. This cluster sample has a narrower redshift coverage than the simple mass threshold; all the curves peak at a redshift z ∼ 0.25. The peak redshift is mainly attributed to redshift dependence of the lensing efficiency for source galaxies of z s ∼ 1 in our redshift distribution. A change of w 0 from w 0 = −1.0 to w 0 = −0.9 leads to a decrease in the number density, as seen in Fig. 1. As before the effect comes partially arises from the decrease in comoving volume and the change in the halo mass function. Unlike the simple mass threshold case, there is now an additional contribution to the decrease in number density caused by the lower lensing efficiency and thus lower S/N for a cluster of a given mass and redshift.
C. Lensing power spectrum with tomography Gravitational shear can be simply related to the lensing convergence: the weighted mass distribution integrated along the line of sight. Photometric redshift information on source galaxies allows us to subdivide galaxies into redshift bins (we will discuss possible effects of photometric redshift errors on our results in § IV E). This allows more cosmological information to be extracted, which is referred to as lensing tomography (e.g., see [66,67,68] for a thorough review, and see [37,38,39] for the details of lensing tomography).
In the context of cosmological gravitational lensing the convergence field with tomographic information is expressed as a weighted projection of the three-dimensional mass density fluctuation field: where θ is the angular position on the sky, and W (i)g is the gravitational lensing weight function for source galaxies sitting in the i-th redshift bin (see Eq. (10) in [39] for the definition). Note that, hereafter, quantities with subscripts in the round bracket such as (i) stands for those for the i-th redshift bin. To avoid confusion, throughout this paper we use i, j or i ′ , j ′ for the lensing power spectrum redshift bins, and b, b ′ for the cluster count redshift bins. The lensing tomographic information allows us to extract redshift evolution of the lensing weight function as well as the growth rate of mass clustering. These are both sensitive to dark energy. For example, increasing the equation of state parameter w from w = −1 lowers W (i)g as well as suppressing the growth rate at lower redshifts. Therefore when the CMB normalization of density perturbations is employed, an increase in w decreases the lensing power spectrum due to both the lower W (i)g and the lower matter power spectrum amplitude. The sensitivity of lensing observables to the dark energy equation of state roughly arises equally from the two effects (e.g., see [69]).
The cosmic shear fields are measurable only in a statistical way. The most conventional methods used in the literature are the shear two-point correlation function. The Fourier transformed counterpart is the shear power spectrum. The convergence power spectrum is identical to the shear power spectrum but is easier to work with as it is a scalar. Using the flat-sky approximation [70], the angular power spectrum between the convergence fields of redshift bins i and j is found to be (10) where P δ (k) is the three-dimensional mass power spectrum. We can safely employ the flat-sky approximation for our purpose, because a most accurate measurement for the lensing power spectrum is available around multipoles l ∼ 1000 for a ground-based survey of our interest (e.g. see Fig. 1 in [71]), and the flat-sky approximation serves as a very good approximation on these small scales [72].
For l > ∼ 100 the major contribution to P (ij)κ (l) comes from non-linear clustering (e.g., see Fig. 2 in [39]). We employ the fitting formula for the non-linear P δ (k) proposed in Smith et al. [73], assuming that it can be applied to dark energy cosmologies by replacing the growth rate used in the formula with that for a given dark energy model. We note in passing that the issue of accurate power spectra for general dark energy cosmologies still needs to be addressed carefully (see [74,75] for related discussion). Fig. 3 demonstrates how lensing of back-FIG. 3: A lensing power spectrum for the non-tomographic case (i.e. one redshift bin) for a ΛCDM model, expected for a ground-based survey that probes galaxies with mean redshift zs = 0.9. The two thin dashed curves show the 1-and 2-halo term contributions to the power spectrum, while the bold curve shows the total power. The three thin solid curves show the 1-halo term contributions obtained when the lensing effects on background galaxies due to halos with masses M/M⊙ ≥ 10 15 , 10 14 , 10 13 are included, respectively.
ground galaxies by clusters contributes to the lensing power spectrum. Note here that we have employed the halo model developed in Takada & Jain [56,59] to compute the mass power spectrum, although we will use the Smith et al. fitting formula to compute the lensing power spectrum in most parts of this paper instead. Briefly, to compute the spectra based on the halo model approach, we need to model three ingredients: (i) the halo mass function (see also the description below Eq. [5]); (ii) the profile for the mass distribution around a halo; and (iii) the halo bias parameter.
It is clear that the convergence on scales l > ∼ 100 is significantly boosted by the existence of non-linear structures, halos. In this paper we are especially interested in using the lensing information inherent in angular scales l < ∼ 3000 1 to constrain dark energy, and a fair fraction of the power at scales l ∼ 10 3 , up to ∼ 60% of the total power, arises from massive halos with M > ∼ 10 14 M ⊙ . The 1-halo term contribution is given by redshift-space integral of the halo mass function and halo profiles weighted with the lensing efficiency. The results imply that, if massive clusters with M > ∼ 10 14 M ⊙ happen to be less or more populated in a survey region, amplitudes of the observed lensing power spectrum from the survey are very likely to be smaller or greater than expected, respectively. Therefore, a cross-correlation between the lensing power spectrum and the cluster counts are intuitively expected, if both of the observables are measured from the same survey region.
In reality, the observed power spectrum is contaminated by the intrinsic ellipticity noise. Assuming that the intrinsic ellipticity distribution is uncorrelated between different galaxies, the observed power spectrum between redshift bins i and j can be expressed as where σ ǫ is the rms of intrinsic ellipticities per component, andn (i) denotes the average number density of galaxies in the i-th redshift bin. The Kronecker delta function, δ K ij , accounts for the fact that the cross-spectra of different redshift bins (i = j) are not affected by the shot noise contamination. We will omit the superscript 'obs' when referring to P obs κ (l) in the following for notational simplicity.

III. COVARIANCES OF LENSING POWER SPECTRUM AND CLUSTER OBSERVABLES
To estimate a realistic forecast for cosmological parameter constraints for a given survey we have to quantify sources of statistical error on observables of interest, the cluster number counts and the lensing power spectrum, and then propagate the errors into the parameter 1 At the smaller angular scales l > ∼ 3000, more complex uncertainties in non-linear clustering such as the baryonic effects arise, which need to be addressed more carefully.
forecasts. In this section, we will present the covariance matrices of the observables.

A. Covariances of the cluster number counts
The cluster observables can be naturally incorporated in the halo model approach, allowing us to compute the statistical properties in a straightforward way. In this paper we focus on the average angular number density of clusters drawn from a survey, also subdivided into redshift bins as described in § II B. The covariance between the average number densities in redshift bins b and b ′ , given by Eq. (6), is defined as Based on the halo model the covariances of the angular number density can be derived in Appendix B 1 (also see [63] for the original derivation) as where b(m) is the halo bias parameter ( [76]; we use the model derived in [61]), P L δ (k) is the linear mass power spectrum, andW (x) is the Fourier transform of the survey window function; for this we simply emploỹ W (lΘ s ) = 2J 1 (lΘ s )/(lΘ s ) (J 1 (x) is the 1-st order Bessel function) assuming a circular geometry of the survey region, Ω s = πΘ 2 s . In the following, the tilde symbol is used to denote the Fourier components of quantities. To derive the covariance (13), we have ignored correlations between the number densities between different redshift bins, which would be a good approximation for a redshift bin thicker than the correlation length of the cluster distribution.
The first and second terms in Eq. (13) arise from the 1and 2-halo terms in the halo model calculation; the former gives the shot noise due to the imperfect sampling of fluctuations by a finite number of clusters, while the latter represents the sampling variance arising from fluctuations of the cluster distribution due to a finite survey volume. It should be noted that our formulation allows us to derive the shot noise term without ad hoc introducing as often done in the literature (e.g., see [21]). The two terms in Eq. (13) depend on sky coverage in slightly different ways 2 , and the relative importance depends on the survey area; for a larger survey, the sampling variance could be more important than the shot noise [63].

B. Covariances of lensing power spectra
In reality the lensing power spectrum has to be estimated from the Fourier or spherical harmonic coefficients of the observed lensing fields constructed for a finite survey. In this paper we assume the flat-sky approximation and thus use Fourier wavenumbers ℓ, which are equivalent to spherical harmonic multipoles ℓ in the limit ℓ ≫ 1 [72]. Because the survey is finite, an infinite number of Fourier modes are not available, and rather the discrete Fourier decomposition has to be constructed in terms of the fundamental mode that is limited by the survey size; l f = 2π/ √ Ω s , where Ω s is the survey area. We assume a homogeneous survey geometry for simplicity and do not consider any complex boundary and/or masking effects. The lensing power spectrum of a multipole l is observationally estimated by averaging over wavenumber direction in an annulus of width ∆l where the integration range is confined to the Fourier modes of l ′ satisfying the bin condition l−∆l/2 ≤ l ′ ≤ l+ ∆l/2 and A(l) denotes the integration area in the Fourier space approximately given by A(l) ≡ |l ′ |∈l d 2 l ′ ≈ 2πl∆l. This is discussed in more detail in Appendix B 2.
Once an estimator of the lensing power spectrum is defined, it is straightforward to compute the covariance [46,77] (also see [48] for the detailed derivation). From Eq. (B12), the covariance to describe the correlation between the lensing power spectra of different multipoles and redshift bins is given by where f sky is the sky coverage (f sky = Ω s /4π) and the lensing trispectrum T κ is defined in terms of the 3D mass trispectrum T δ as with k i = l i /χ. Note that the power spectra P (ij)κ appearing on the r.h.s. of Eq. (15) are the observed spectra given in Eq. (11), and therefore include the intrinsic ellipticity noise. The indices m, n denote elements expressed as ∝ (1/f sky ) R ∞ 0 dx xP (k = x/Θsχ)|W (x)| 2 , which looks similar to the f sky dependence of the first term given as ∝ 1/f sky . However, the f sky dependence could be different via the dependence in P (k = x/Θsχ) for the 2nd term.
in the lensing power spectrum covariance and run over the multipole bins and redshift bins. For tomography with n z redshift bins, there are n z (n z + 1)/2 different spectra available at each multipole. Hence, if assuming n l multipole bins, the indices m, n run as m, n = 1, 2, . . . , n l n z (n z + 1)/2. In most parts of this paper we adopt 100 multipole bins logarithmically spaced, which are sufficient to capture all the relevant features in the lensing power spectrum. For example, for tomography with 3 redshift bins, the covariance matrix C g has dimension of 600 × 600 for n l = 100.
The first term of the covariance matrix (second line of Eq. [15]) represents the Gaussian error contribution ensuring that the two power spectra of different multipoles are uncorrelated via δ K ll ′ , while the second term gives the non-Gaussian errors to describe correlation between the different power spectra. The two terms both scale with sky coverage as ∝ 1/f sky . Note that the non-Gaussian term does not depend on the multipole bin width ∆l because of d 2 q/A(l) ≈ 1, and taking a wider bin only reduces the Gaussian contribution or equivalently enhances the relative importance of the non-Gaussian contribution. Naturally, however, the signal-to-noise ratio and parameter forecasts we will show below do not depend on the multipole bin width if the bin width is not very coarse (see [48] for the details).
We employ a further simplification to make quick computations of the lensing covariance matrices. We use the halo model approach to compute the lensing covariance matrices. We know that most of the signal in the power spectrum comes from small angular scales at l ∼ 10 3 to which the 1-halo term provides dominant contribution as shown in Fig. 3. In addition, the non-Gaussian errors are important only at small angular scales. For these reasons, we only include the 1-halo term contribution to the lensing trispectrum to compute the non-Gaussian errors. Although the trispectrum generally depends on four vectors in the Fourier space such as l 1 , l 2 , l 3 and l 4 , the 1-halo term does not depend on any angle between the vectors, but rather depends only on the length of each vector; T 1h (l 1 , l 2 , l 3 , l 4 ) = T 1h (l 1 , l 2 , l 3 , l 4 ) (see [48,56,59]), reflecting spherical mass distribution around a halo in a statistical average sense, which does not have any preferred direction in the Fourier space. Therefore, the non-Gaussian term in Eq. (15) can be further simplified as where we have assumed that the lensing trispectrum does not change significantly within the multipole bin, which is a good approximation for the lensing fields.
C. Cross-covariances of the cluster number counts and lensing power spectra The cluster observables and the weak lensing power spectra probe the same density fluctuation fields in largescale structure, if the two observables are drawn from the same survey region. As implied in Fig. 3, a somewhat significant correlation between the two observables is expected if the small-scale lensing power spectrum is considered. We again use the halo model to compute the cross-covariance. The detailed derivation is described in Appendix B 3, and the cross-covariances can be expressed as Here B (b)cδδ is the 3D bispectrum corresponding to the three-point function of the cluster distribution and the two mass density fluctuation fields. The cross-covariance arises from two contributions of the 3D bispectrum, the 1-and 2-halo terms: whereũ m is the Fourier transform of a halo profile for which we assume an NFW profile [55] as explicitly defined in Eq. (A14). The cross-covariance arising from the 1-halo term represents correlation between one cluster, treated as a point, and the lensing effects on two different background galaxies due to the same cluster. The 2-halo term contribution shows the correlation between one cluster, the lensing field on a background galaxies around the cluster, and the lensing field due to another cluster. Note that the cross-covariance (18) is derived assuming the flat-sky approximation as we focus mainly on small angular scale information, but the full-sky expression can be derived combining the methods developed in this paper and in [72]. Fig. 4 shows the cross-covariance between the massselected cluster counts and the weak lensing power spectrum as a function of angular multipole l, for a concordance ΛCDM model. For illustrative clarity we use a single redshift bin for both of the cluster counts and the lensing power spectrum. The dashed, solid and dotted curves are the results obtained when minimum halo masses of M min /10 14 M ⊙ = 1, 5 and 10 are assumed for the cluster counts, respectively. The two thin, solid curves show the 1-and 2-halo term contribution to the total power FIG. 4: The cross-covariance between the cluster counts N and the lensing power spectrum P est κ (l) as a function of the angular multipole l for a ΛCDM model. Note that for the purpose of this Figure we assume a single redshift bin for the lensing power spectrum for a typical ground-based survey (see § IV A) and, for the cluster counts, we include all the clusters with masses above a given minimum halo mass over a range of redshifts 0 ≤ z ≤ 1. The dashed, solid and dotted curves demonstrate the results when minimum halo masses of Mmin = 1, 5 and 10 × 10 14 M⊙ are employed, respectively. The two thin, solid curves show the 1-and 2-halo term contributions to the cross-covariance of Mmin = 5 × 10 14 M⊙ (see Eqs. [18] and [19]), while the thin dashed and dotted curves show the 1-halo term contribution.
(bold solid curve) for the M min /10 14 M ⊙ = 5 mass cut. It is apparent that the cross-covariance at small angular scales l > ∼ 500 arises mainly from the 1-halo term contribution. Comparing the dashed, solid and dotted curves clarifies that the covariance amplitude gets greater with decreasing minimum halo mass, as the weak lensing and the cluster counts probe more similar density fields in the large-scale structure as implied in Fig. 3.
A more useful quantity is the cross-correlation coefficients defined as where the subscript '1' denotes the first redshift bin because for this calculation we are putting all the clusters into a single redshift bin, for illustration (the cluster redshift bin index b = 1 for this case). The coefficients quantify the relative importance of the cross-covariance to the auto-covariances at a given l. The upper panel of Fig. 5 shows the correlation coefficients for model parameters assumed in Fig. 4. The coefficients depend on the multipole bin width taken in the lensing power spectrum covariance calculation as well as on a survey sky coverage; we here assumed ∆l/l ≃ 0.04 and Ω s = 5000 deg 2 , except for the thin solid curve where a full-sky survey f sky = 1 is considered. The upper panel of Fig. 5 shows that the coefficients peak around l ∼ 1000, and decrease at smaller scales. On the intermediate scales there is a significant crosscorrelation since the 1-halo term in the lensing power spectrum depends so strongly on the number of clusters (Fig. 3). However on smaller scales the lensing covariance is dominated by shot noise in the intrinsic galaxy shapes (e.g., see Fig. 1 in [71]), which do not correlate with the cluster counts. Comparing the thin and bold solid curves shows that the coefficients have only weak dependence on the sky coverage, reflecting that the sampling variance of the cluster count covariance (13) roughly scales as f −1 sky for a large area survey of our interest, which is same dependence as the other elements in the covariances.
The lower panel of Fig. 5 shows the correlation coefficients with varying mass thresholds in the cluster counts, for a fixed multipole l of the lensing power spectrum. The lensing power spectrum at l ∼ 1000 is found to be most correlated with the cluster counts for the M min ∼ 10 15 M ⊙ mass cut. The correlation decreases at high mass thresholds when the number of clusters is very small and therefore not representative of the lensing field. The correlation also decreases at smaller masses since the contribution to the lensing power spectrum is small for light halos (Fig 3). As can be seen from comparison of the bold and thin solid curves, an inclusion of the intrinsic ellipticity noise suppresses the correlation coefficients.
FIG. 6: The relative contribution of each redshift clusters to the cross-covariance at a given multipole. We assumed the same model parameters as in Fig. 4. For an angular scale of l = 1000, clusters at z ∼ 0.2 most contribute to the crosscovariance, reflecting the peak of the lensing efficiency function. Note that we are here using the simple mass threshold for the cluster selection, not the lensing-based selection. On the other hand, for a larger angular scale of l = 100, most of contribution comes from clusters at lower redshifts.
We now consider multiple redshift bins in the cluster catalog. Fig. 6 shows the relative contribution of each cluster count redshift bin to the cross-covariance at a given multipole. It is clear that clusters at z ∼ 0.2 − 0.3 contribute most to the cross-covariance for an angular scale of l ∼ 1000. Since the number density of massselected cluster counts has a weak redshift dependence as shown in Fig. 1, one can notice that the redshift peak reflects redshift dependence of the lensing efficiency func-tion for a source redshift z s ∼ 1 that corresponds to the survey depth we assumed; structures at z ∼ 0.2−0.3 most efficiently cause the lensing effect on source galaxies at z s ∼ 1. It is also worth noting that clusters at higher redshifts have a smaller angular size (smaller virial radii) than θ ∼ 1/l (e.g. see the right panel of Fig. 2 in [59]). In other words, clusters at z > ∼ 0.5 may carry complementary information to the lensing power spectrum. On the other hand, for an angular scale of l ∼ 100, clusters at lower redshifts z ∼ 0.1 contribute most to the covariance, because the cluster virial radius matches such a large angular scale only if the cluster is located at lower redshift.

A. A CDM model and survey parameters
To compute the observables of interest we need to specify cosmological model and we assume survey parameters similar to those of future surveys in order to estimate realistic measurement errors.
We include the key parameters that may affect the observables within an adiabatic CDM dominated model with dark energy component: the density parameters are Ω de (= 0.73), Ω m h 2 (= 0.14), and Ω b h 2 (= 0.024) (note that we assume a flat universe); the primordial power spectrum parameters are the spectral tilt, n s (= 1), the running index, α s (= 0), and the normalization parameter of primordial curvature perturbation, δ ζ (= 5.07 × 10 −5 ) (the values in the parentheses denote the fiducial model). We employ the transfer function of matter perturbations, T (k), with baryon oscillations smoothed out [78]. We employ the dark energy model [79,80] parametrized as w(a) = w 0 + w a (1 − a), with fiducial values w 0 = −1 and w a = 0.
We specify survey parameters that well resemble a future ground-based survey (e.g., see [8]). We model the redshift distribution of galaxies by using a toy model given by Eq. (4) in [81]; we employ the parameter value z 0 = 0.3 leading the redshift distribution to peak at z peak = 2z 0 = 0.6 and have a mean redshift of z m = 3z 0 = 0.9. The intrinsic ellipticities dilute the lensing shear measurements according to Eq. (11); we simply assume that the shot noise contamination is modeled by the rms ellipticity per component, σ ǫ = 0.22, and the total number density of galaxies,n g = 30 arcmin −2 . The survey area is taken to be Ω s = 5000 degree 2 for our fiducial choice. Note that throughout this paper we will assume that the two observables we are interested in, the cluster number counts and the lensing power spectrum, are taken from the same survey region, to study how the cross-correlation affects the parameter constraints. as the two methods probe the same cosmic structure. Signal-to-noise (S/N ) ratio as a function of the mass threshold of the cluster count measurement. The solid curve shows the total S/N for a combined measurement of the cluster counts and lensing power spectrum where the cross-correlation between the two observables, Cov[N cl , Pκ(l)], is included. For comparison, the dashed curve shows the S/N assuming that the two observables are independent (i.e. the cross-correlation is ignored). The dotted and dot-dashed curves show the S/N when either of the cluster counts or the lensing power spectrum alone is considered. Lower panel: The percentage difference in S/N with and without the cross-covariance Cov[N cl , Pκ(l)] (i.e., the difference between the solid and dashed curves divided by the dashed curve in the upper panel ×100). Interestingly, the cross-correlation improves the S/N by up to 10% when only massive clusters with M > ∼ 10 15 M⊙ are included in the counts (see text for a more detailed discussion). All curves assume a survey with an area of Ωs = 5000 deg 2 . Note that we considered a single redshift bin for both of the two observables, and included the number counts of clusters with masses greater than the mass threshold over redshifts 0.05 ≤ z ≤ 1.0 and the lensing power spectrum at multipoles over 50 ≤ l ≤ 3000, respectively (see text for the details).
It is instructive to investigate the expected signal-tonoise (S/N ) ratio for a combined measurement of the cluster counts and the lensing power spectrum, in order to highlight how the cross-correlation between the two observables affects the measurement accuracies. The S/N can be estimated using the covariance matrix derived in § III as Here the data vector of our observables, D, constructed from the lensing power spectrum tomography with n s redshift bins and the cluster number counts with bredshift bins is defined as Note that the dimension of D is b+n l ×n s (n s +1)/2 when the lensing tomography with n l multipole bins and n s redshift bins and the cluster counts with b redshift bins are considered (also see below Eq. [16]). For a case of b = 10, n s = 3 and n l = 100, the dimension of D is 610. The full covariance matrix for the joint measurement, C c+g , can be constructed from Eqs. (13), (15), and (18) as Note that (C g+c ) −1 appearing in Eq. (21) is the inverse matrix of C g+c . For comparison we consider the S/N from each of cluster counts and weak lensing alone by using the relevant part of the data vector in Eq. (22) and the covariance matrices. We also compare with the S/N if the crosscorrelation is not taken into account, i.e. a matrix of zeros is used instead of the matrix C gc in Eq. (23). In this case that the two are independent, e.g. measured from non-overlapping two survey regions, the S/N values from each of the cluster counts and the lensing power spectrum alone therefore add in quadrature to form the joint S/N . When computing S/N in Eq. (21) care must be taken with numerical accuracy of the matrix inversion. The observables of interest, the angular number density of clusters and the lensing power spectrum, have different units and their amplitudes could therefore differ from each other by many orders of magnitudes. To avoid numerical inaccuracies caused by this fact, we have used the dimension-less covariance matrixC g+c normalized by the data vector as In terms of the re-defined covariance matrix, the total S/N can be computed simply as (S/N ) 2 ij . Fig. 7 shows the S/N ratios expected for a groundbased survey with area Ω s = 5000 deg 2 and our fiducial ΛCDM model, as a function of minimum halo mass used in the mass-selected cluster counts. Here we include all the clusters with masses greater than a given mass threshold over a range of 0.05 ≤ z ≤ 1, and include the lensing power spectrum at multipoles 50 ≤ l ≤ 3000 assuming the redshift distribution of galaxies described in § IV A. Note that we here consider a single redshift bin for both the cluster counts and cosmic shear power spectrum for simplicity, and the signal-to-noise ratios only slightly increase by adding redshift binned information (e.g., see Fig. 5 in [39]). First of all, we should notice that the lensing power spectrum and the cluster number counts have similar S/N ratios, when the mass threshold M min ∼ 10 14 M ⊙ is used. At mass thresholds smaller than 10 14 M ⊙ , the cluster counts (dotted curve) have a greater S/N than the lensing power spectrum (dotdashed curve) due to an increase in the number of sampled clusters, while the lensing power spectrum has a greater S/N at the greater mass threshold.
The solid curve shows the total S/N for a combined measurement of the cluster counts and the lensing power spectrum, when the cross-correlation between the two observables is correctly taken into account for the full covariance matrix (see Eq. [23]). We compare this to the standard approach in which the two probes are considered to be independent (dashed curve). The lower panel explicitly shows the percentage difference in S/N with and without the cross-covariance.
At small mass thresholds < ∼ 10 14 M ⊙ , the total S/N taking into account the cross-covariance is degraded compared to when the probes are considered independent. This is because the cosmic density field probed is shared by the two observables and therefore an inclusion of the cross-covariance reduces independence of the two observables. However, the degradation ceases at a critical mass scale where the total S/N (including the covariance) is equal to the S/N for the lensing power spectrum alone.
In other words, the total S/N is never smaller than the S/N obtained from either alone of the lensing power spectrum or the cluster counts.
Then, an intriguing result is found: the total S/N is slightly improved by including the cross-covariance as the mass threshold is increased up to M ∼ 10 15 M ⊙ , where the improvement is up to ∼ 10% as shown in the lower panel. This occurs even though the S/N ratio from the cluster counts alone is much less than that for the lensing power spectrum alone. The peak mass scale of the total (S/N ) corresponds to the mass scale at which the correlation coefficient of the covariance peaks as can be found in the lower panel of Fig. 5. That is, the improvement in S/N could happen when the two observables are highly correlated. Since the cross-covariance describes how the two observables are correlated with each other, it appears that a knowledge of the number of such massive clusters with M > ∼ 10 15 M ⊙ for a given survey region helps to improve the amount of information that can be extracted from the weak lensing measurement (also see [82,83] for the related discussion). In simpler words, if a smaller or greater number of massive clusters than the ensemble average value was observed from a given survey region, the observed lensing power spectrum will most likely have smaller or greater amplitudes at l ∼ 10 3 , respectively.
We reproduce this qualitative behavior using a simple toy model described in the Appendix C, where the lensing power spectrum is modeled to be given solely by the number of halos, ignoring the halo mass profile and the clustering of different halos. Based on this toy model we attribute the increase in the total S/N for high cluster mass thresholds to the fact that the lensing power spectrum amplitude is sensitive to the number of such massive clusters as demonstrated in Fig. 3. The fact that the lensing power spectrum is sensitive to the number counts weighted by the mass squared, means that it adds complementary information to the unweighted sum from the cluster counts.
It should, however, be noted that the improvement in S/N is achievable only if the cross-covariance is a priori known from the theoretical prediction e.g. based on CDM structure formation scenarios. Alternatively it could be obtained from a measurement of the crosscorrelation from the survey region.
In Fig. 8 we study which model ingredient in the full-covariance calculation mainly leads to the results in Fig. 7. The dashed curve shows the percentage difference in S/N when only the 1-halo terms are included in each element of the full covariance matrix (23), which corresponds to a simplified case that there is no clustering between halos. Compared to the solid line, or Fig. 7, the results are little different. The dot-dashed curve shows the result obtained when we ignore the intrinsic ellipticity noise that contributes only to the diagonal elements of the weak lensing power spectrum covariance. Again only a small difference is found. On the other hand, the thin dashed curve shows the results when only the 2-halo term contribution to the cross-covariance is included, which attempts to reproduce the results in the previous work [49]. For this case, the impact of the cross-covariance on the S/N is negligible as concluded in [49]. Rather, it turns out that the most important effect comes from the lensing trispectrum contribution to the lensing power spectrum covariance. If we switch off the non-Gaussian contribution, the percentage difference in S/N is significantly changed. In particular, there is a significant improvement in S/N by adding the cluster counts with M > ∼ 10 15 M ⊙ , because ignoring the lensing trispectrum decreases the diagonal elements in the full covariance matrix and thus enhances the relative importance of the cross-covariance. This also implies that the cosmic shear fields are highly non-Gaussian as carefully investigated in [48]. Fig. 9 demonstrates how this percentage difference in S/N depends on the sky coverage (f sky ) and the maximum multipole (l max ) of the lensing power spectrum. All the curves in Fig. 9 are very similar, showing a weak dependence on f sky and l max . (Note however that the absolute S/N itself has a strong dependence.) Nevertheless, there are several points to note when interpreting the results. Comparing the dotted, solid and dot-dashed curves clarify that, with increasing l max , the mass threshold corresponding to the dip in the percentage difference in S/N increases. This is because the lensing power spectrum at higher multipoles is more sensitive to the cosmic density fields down to smaller length scales, and therefore the cluster counts including less massive halos are more correlated with the lensing power spectra. Also the impact of the cross-covariance is reduced when assuming l max = 10 4 , compared to the fiducial case of l max = 3000, because the intrinsic ellipticity noise (shot noise) is dom-  Fig. 7, but the total signal-to-noise for the lensing-based cluster counts (see Fig. 2) combined with the lensing power spectrum measurement is shown against detection thresholds of the cluster lensing signal (see Eq. [4]). Note that the plotting range of y-axis in the upper panel is smaller than that of Fig. 7. inant in the lensing power spectrum covariance at such small scales.
Similar to Fig. 7, Fig. 10 shows the total S/N ratios for a combined measurement of the lensing-based cluster counts and the lensing power spectrum, as a function of the cluster lensing-signal thresholds (see Eq. [4] and Fig. 2). Notice that the plotting range of y-axis is roughly half of that in Fig. 7. Because the number densities for the lensing signal thresholds of interest are less than that for the mass-selected cluster sample (as shown in Fig. 2) the lensing-based cluster counts do not contribute much to the total S/N ratios, compared to Fig. 7. Other than this difference, the behavior for the S/N curves found in Fig. 7 are similar to those in Fig. 10.

C. Fisher analysis for cosmological parameter constraints
We now estimate accuracies of cosmological parameter determination, given the measurement accuracies of the observables, using the Fisher matrix formalism [84,85]. This formalism assesses how well given observables can constrain cosmological parameters around a fiducial cosmological model. The parameter forecasts we obtain depend on the fiducial model and are also sensitive to the choice of free parameters. Furthermore, the Fisher matrix gives only a lower limit to the parameter uncertainties, being exact if the likelihood surface around the lo-cal minimum is Gaussian in multi-dimensional parameter space. Ideally a more quantitative method would be used to explore the global structure to realize more accurate parameter forecasts. As described in § IV A, we include all the key parameters that can describe varieties in the observables within ΛCDM model cosmologies.
The Fisher information matrix available from the lensing power spectrum tomography is given by where the partial derivative with respect to the α-th cosmological parameter p α is evaluated around the fiducial model, with other parameters p β (α = β) being fixed to their fiducial values. The error on a parameter p α , marginalized over other parameter uncertainties, is given by σ 2 (p α ) = (F −1 ) αα , where F −1 is the inverse of the Fisher matrix. Similarly, the Fisher matrix for the cluster counts is given by For a combined measurement of the lensing power spectrum and the cluster counts, the Fisher matrix is calculated using the full covariance matrix defined by Eq. (23) (also see Eq. [24]) as where the summation indices i, j run over the redshift and multipole bins of the tomographic lensing power spectra as well as the redshift bins of the cluster counts. Using probes of structure formation alone is not powerful enough to constrain all the cosmological parameters simultaneously and well. Rather, combining the largescale structure probes with constraints from CMB temperature and polarization anisotropies significantly helps to lift parameter degeneracies and, in particular, the dark energy parameters (e.g. [39,86]). When computing the Fisher matrix for a given CMB data set, we employ 9 parameters: the 8 parameters described in § IV A plus the Thomson scattering optical depth to the last scattering surface, τ (= 0.10). Note that we ignore the contribution to the CMB spectra from the primordial gravitational waves for simplicity. We use the publicly-available CMB-FAST code [87] to compute the angular power spectra of temperature anisotropy, C TT l , E-mode polarization, C EE l , and their cross-correlation, C TE l . To model the measurement accuracies we assume the noise per pixel and the angular resolution of the Planck experiment that were assumed in [86].
To be conservative, however, we do not include the CMB information on the dark energy equation of state parameters, w 0 and w a . We do this because essentially angular positions of the CMB acoustic peaks constrain a degenerate combination of the curvature of the universe and the dark energy parameters, through their dependences on the angular diameter distance to the last scattering surface. We are assuming a flat universe and therefore wish to remove the artificially good constraint on dark energy that we would get from the CMB. Note, however, that our parameter forecasts shown below would not change significantly even for a non-flat universe, because we focus on large-scale structure probes in combination with the CMB constraints, as carefully shown in [88].
We remove the CMB information on the dark energy parameters using the following steps. We first compute the inverse of the CMB Fisher matrix, F −1 CMB , for the 9 parameters in order to obtain marginalized errors on the parameters, and then re-invert a sub-matrix of the inverse Fisher matrix that includes only the rows and columns for the parameters besides w 0 and w a . The submatrix of the CMB Fisher matrix derived in this way describes accuracies of the 7 parameter determination, having marginalized over degeneracies of the dark energy parameters w 0 and w 0 with other parameters for the hypothetical Planck data sets. In addition, we use only the CMB information in the range of multipoles 10 ≤ l ≤ 2000, and therefore we do not include the integrated Sachs-Wolfe (ISW) effect that contribute to the CMB spectra mainly at low multipoles l < ∼ 10, because the ISW effect is very likely correlated with the cosmic shear power spectrum and cluster counts, and we will ignore the correlations in this paper.
To obtain the Fisher matrix for a joint experiment combining the lensing and/or cluster experiments with the CMB information, we simply sum the two Fisher matrices as, e.g. F g+c+CMB = F g+c + F CMB because the CMB information can be safely considered as an independent probe to the low-z universe probes in our setting. Note that the final Fisher matrix such as F g+c+CMB has 9 × 9 dimensions.

D. Forecasts for parameter constraints
When forecasting cosmological parameter determination, we should notice that the redshift information inherent in the cluster number counts and the lensing power spectrum can be very powerful to significantly tighten cosmological parameter errors, especially dark energy parameters (e.g., see [39]). In the following, we will assume 3 redshift bins for lensing tomography and 10 redshift bins for the cluster number counts over 0.05 ≤ z ≤ 1.0, for a survey with 5000 deg 2 area. The 'three' redshift bins for lensing tomography is a minimal choice to obtain non-degenerate constraints on the 'three' dark energy parameters, Ω de , w 0 and w a as implied from Fig. 3 in [89], although four or more redshift bins lead to further improvement, albeit not so much, in the parameter errors. Since we ignore various systematic errors for both the cluster counts and the lensing tomography, we adopt a rather conservative setting for the lensing tomographic binning. However note that we have checked a different redshift binning for cluster counts in combination with the lensing tomography does not change the main results below significantly. An investigation into survey optimization for survey parameters (area and depth) and redshift binning will be presented elsewhere in a more practical manner taking into account possible effects of the systematic errors.
We first consider mass-selected cluster counts, and Fig. 11 shows the expected 68% limits on each of the parameters Ω de , δ ζ , w 0 and w a , as a function of mass thresholds in the cluster counts. In each case we have marginalized over the remaining 8 cosmological parameters (see § IV A and IV C for the cosmological parameters used). It should be also noted that the errors on these 4 parameters are enlarged only by ∼ 10% without the CMB priors. In the upper panel of each plot, the solid curve shows the error on a given parameter when the cross-covariance between the two observables is correctly taken into account, while the dot-dashed curve shows the error obtained from the lensing tomography alone. Comparing the solid and dot-dashed curves demonstrates that adding the cluster counts for the smaller mass cuts into the lensing tomography can more improve the errors on dark energy parameters, Ω de , w 0 and w a , because the two observables depend on cosmological parameters in different ways and combining the two can lift the parameter degeneracies (also see [49]). To be more explicit, the errors are improved by ∼ 40% for mass threshold M min ∼ 10 14 M ⊙ , while the errors are improved only slightly by ∼ 5% for M min ∼ 10 15 M ⊙ .
On the other hand, there is a complex behavior in the error on the primordial curvature perturbation, δ ζ . This is explained as follows. The cluster counts are very sensitive to the normalization parameter of the linear mass power spectrum (δ ζ for our case or σ 8 often used in the literature) through the sharp exponential cut-off of the halo mass function at high mass end. As we reduce the minimum mass threshold down to the range 3 × 10 13 < ∼ M min /M ⊙ < ∼ 10 14 we are beginning to lose information about the number of very high mass clusters, which are diluted in the total count by the large number of low mass clusters. At much lower mass cuts M min < ∼ 3 × 10 13 M ⊙ , the cluster counts come back to yield a tighter constraint on δ ζ than the lensing tomography through the better measurement accuracy due to the larger number of very low mass halos. It would be also worth pointing out that knowing the number of clusters can effectively allow the lensing power spectrum to yield more information on the linear theory part of the power spectrum (e.g. one could subtract off the contribution from the clusters to get at the two halo term). Therefore the constraint on the power spectrum amplitude can be partly improved from the joint constraint from the mass function and the linear power spectrum. Note that in this paper we consider a simple mass threshold for the cluster counts, however if clusters can be binned by mass then and wa (lower-right), marginalized over other parameters (9 parameters in total), as a function of mass thresholds used in the cluster counts. We assume 10 redshift bins for the cluster counts over redshifts 0.05 ≤ z ≤ 1.0, and 3 redshift bins for the lensing power spectrum tomography assuming the redshift distribution of galaxies described in § IV A, for a survey of 5000 deg 2 area. In the upper panel of each plot, the solid curve shows the errors expected from a combined measurement of the cluster counts and the lensing tomography when the cross-covariance between the two observables are correctly taken into account, while the dot-dashed curves shows the errors for the lensing tomography alone. The dashed curve shows the error from combining cluster counts and lensing tomography when the cross-covariance is ignored. Note that all the results shown here assume the Planck priors discussed in § IV C. The lower panel of each plot shows the percentage difference in the parameter errors with and without the cross-covariance, highlighting the impact of the cross-covariance on the parameter forecasts. The errors on the dark energy equation of state parameters, w0 and wa, are improved by an inclusion of the cross-covariance over mass thresholds we have considered, but the effect is small (only a few %). the information would be restored and we may also use the shape of the mass function to constrain cosmological parameters (e.g. [90]).
The impact of the cross-covariance between the two observables on the parameter forecasts can be found from comparison of the solid and dashed curves: the dashed curve shows the error obtained when the two observables are considered to be independent. Further, the lower panel of each plot explicitly presents the percentage difference in the errors. The impact of the cross-covariance on the parameter errors is generally very small, at only a few per cent. Fig. 11, but this plot explicitly shows projected 68% error ellipses in two-parameter subspace the dark energy parameters (Ω de , w0, wa), for cluster counts of mass cut Mmin = 5 × 10 14 M⊙. The outermost and middle bold-solid curves in each panel are the error ellipses expected when using either alone of the cluster counts or the lensing power spectrum tomography in combination with Planck, while the innermost shaded ellipses show the errors for the two observables combined. For comparison, the thin-solid curves show the error ellipses obtained when ignoring the cross-covariance; the effect is very small (the area is enlarged only by a few %). The cross symbol in each plot shows the fiducial model for the Fisher matrix analysis.

FIG. 12: As in
Nevertheless, interestingly, in some cases an inclusion of the cross-covariance leads to an improvement in the parameter errors; for example, the errors on w 0 or w a are improved over a range of the mass thresholds we have considered. This perhaps counter-intuitive result is in part due to working in 9 dimensional parameter space, and could also be explained as follows (also see [48] for related discussion). As we have carefully investigated, the cross-covariance predicted from a CDM model quantifies how the cluster counts and the lensing power spectrum amplitude are correlated with each other in redshift and multipole space. The positive cross-correlation shown in Fig. 4 implies that for a given survey region, if the number of clusters probed happens to be higher or lower than the ensemble average, the lensing power spectrum will be expected to have larger or smaller amplitudes, respectively. Therefore, such a correlated offset in the two observables makes it difficult to determine their true amplitudes compared to the case in which the two observables are independent, thereby degrading the errors of parameters that are primarily sensitive to the amplitudes of the two observables. This explains the degradation in the errors on Ω de and δ ζ for some range of mass thresholds. On the other hand, the correlated offset rather preserves relative values between the cluster counts and the lensing spectrum amplitudes in redshift and multipole space. That is, a priori knowledge on the cross-covariance leads to an improvement in the errors on parameters that imprint characteristic redshift and multipole dependences onto the cluster counts and the lensing power spectrum. This is the case for the parameters w 0 and w a .
In Fig. 12 we show the projected 68% C.L. error ellipses in the dark energy parameter space, for one particular example mass threshold, M min = 5 × 10 14 M ⊙ . The error ellipses in a two-parameter subspace highlight how the two parameters considered are degenerate for a given observable and the degeneracies are broken by combining different observables. It can be seen that the lensing power spectrum tomography and the cluster counts have similar degeneracy directions in constraining the dark energy parameters. Adding the cluster counts only slightly improves the parameter errors compared to the errors from the lensing tomography alone. The plot also shows that the cross-covariance has a negligible effect on the error ellipses. Fig. 13 shows the results for the lensing-based cluster counts, as a function of the cluster lensing signal, where clusters having a lensing signal greater than a given threshold, (S/N ) cluster , are included in the counts. As in Fig. 11, we consider 10 redshift bins for the cluster counts over redshifts 0.05 ≤ z ≤ 1 and 3 redshift bins for the lensing tomography. Note that the plotting range of yaxis in the upper panel of each plot is same as that in Fig. 11, while the plotting range in the lower panel is different.
First of all, it should be noted that adding the lensingbased cluster counts into the lensing tomography does tighten the errors on Ω de , w 0 and w a significantly even though the cluster counts include fewer clusters than the mass-selected counts (see Fig. 2). For the threshold FIG. 13: Similar to the previous plot, but for the lensing-based cluster counts as a function of the detection thresholds, where clusters having the lensing signal greater than the threshold are included in the counts. Note that the plotting range of y-axis in the upper panel of each plot is same as that in Fig. 11. A similar improvement in the parameter error is obtained by adding the cluster counts, even though the lensing based cluster counts generally contain fewer clusters than the mass-selected cluster counts as shown in Fig. 2. The inclusion of the covariance between cluster counts and lensing tomography is slightly larger here, compared to when a mass threshold is used for the cluster counts.
(S/N ) cluster ∼ 10, which includes clusters with masses M > ∼ 10 15 M ⊙ and mainly covers a narrow redshift range of 0.05 < ∼ z < ∼ 0.6, the cluster counts still improve the dark energy parameters by ∼ 25%, in contrast with only ∼ 4% improvement for the mass-selected cluster counts with M > ∼ 10 15 M ⊙ in Fig. 11. We find the same percentage improvement when the CMB priors are not included. With the reasonable value of (S/N ) cluster ∼ 6 the uncertainties are halved by adding cluster counts to lensing power spectra. The relatively amplified sensitivity to dark energy is attributed to the fact that the cluster lensing signal itself depends on the dark energy parameters via the lensing efficiency, even for a fixed halo mass (see Eq. [4]).
For the primordial curvature perturbation δ ζ , adding the cluster counts does not improve the error much, compared to the result in Fig. 11. The parameter δ ζ does not affect the amount of lensing for a given cluster so the poorer accuracy just arises from larger statistical errors due to the smaller number of clusters, compared to the mass-selected counts.
As shown in the lower panel of each plot, the crosscovariance has more influence on the parameter forecasts, compared to Fig. 11. This is because lensing-based clus-FIG. 14: As in Fig. 12, but for the lensing-based cluster counts of detection threshold (S/N ) cluster = 6.
ter counts and lensing tomography both pick up halos over a very similar range in redshifts. Therefore there are more significant cross-correlations between the two observables. However, the effect of the cross-covariances on the parameter errors is small, by less than ∼ 10%, for (S/N ) cluster > ∼ 6. Fig. 14 shows the marginalized error ellipses, for the lensing-signal detection threshold (S/N ) cluster = 6. The degeneracy directions in dark energy parameter constraints are very similar between the cluster counts and the lensing tomography. Compared to Fig. 12, the lensing-based cluster counts have a better accuracy of constraining the dark energy parameters, thereby leading the error ellipses to more shrink when the cluster counts and the lensing tomography combined. This can be explained because the contours in the dark energy equation of state parameter space is just a projection of the full 9d space. We have investigated eigendirections in cosmological parameter space and identified differences between cluster counts and lensing for δ ζ and Ω m h 2 (effectively h given that Ω m is a parameter in the Fisher matrix). For example, we find that if we plot w 0 against δ ζ then we see that the cluster counts plus CMB contours are more aligned with the w 0 axis whereas the lensing plus CMB contours are more aligned with the δ ζ axis. When combined together, both degeneracies are broken and the error bar on w 0 is reduced. Similarly for δ ζ and w a and Ω m h 2 with each dark energy parameter. This explains why the combined contours (cluster counts plus lensing plus CMB) are smaller than either separate contour (cluster counts plus CMB or lensing plus CMB), even though the separate constraints have the same degeneracy directions when projected down onto w 0 versus w a space. Table I summarizes the results shown in Figs. 12 and 14 showing the marginalized uncertainties for determination of the 4 parameters Ω de , δ ζ , w piv and w a , where M min = 5 × 10 14 M ⊙ and (S/N ) cluster = 6 are employed for the mass-selected cluster counts and the lensing-based cluster counts, respectively. The error on w piv , σ(w piv ), shows the error in the dark energy equation of state at the best constrained redshift for given observables, or equivalently the error on the constant equation of state parameter w 0 obtained by fixing w a . Note that the pivot redshift z pivot is similar for all the cases: z pivot ∼ 0.5. The numerical value σ(w piv ) × σ(w a ) is proportional to the area of error ellipses in the right panels of Figs. 12 and 14. For the mass selected clusters there is a mild improvement in both the error on w piv and w a when adding cluster counts to lensing tomography. For the lensing selected clusters case, the improvement is mostly in w a .

E. Discussion of systematic errors
We have considered idealized cases: we have ignored possible systematic errors involved in both the cluster counts and the lensing power spectrum measurement for simplicity. In this subsection, we present some discussion of possible effects of the ignored systematic errors on our results.
An imperfect knowledge of galaxy redshifts inferred from multi-band imaging data (photometric redshifts hereafter simply photo-z) could affect both measurements of cosmic shear and cluster counts. For cosmic shear, statistical errors in photo-zs are unlikely to be a dominant source of the error budget of cosmic shear measurement, if they are well characterized [81,89]. This is because gravitational lensing has a broad redshift sensitivity function. As carefully investigated in [81,89], the dominant source of the systematic error is rather caused   by mean bias in photo-zs, causing mean redshifts of the tomographic bins to be shifted relative to the true mean redshifts. For planned future surveys, the mean redshifts need to be known to better than a few tenths of a percent accuracy in redshift in order to avoid much degradation in cosmological parameter errors. To achieve this requirement, a large representative spectroscopic redshift sub-sample of the full set of galaxies used for lensing may be needed to calibrate/correct photo-z errors. For cluster counts, photo-z errors cause uncertainties in redshift estimates of individual clusters and in addition cause uncertainties in the lensing signal of individual clusters, if a lensing-based cluster catalog is used. For the lensing signal, the requirements on photo-z accuracy would be similar to that for cosmic shear, as discussed in the previous paragraph. To estimate the redshift of a cluster using photo-zs, we often have old red-sequence galaxies for which good photo-zs are easier to obtain due to a strong 4,000Å break (e.g., [41]). Further, the redshift of the cluster is an average over the redshifts of the cluster members, thus reducing the uncertainty yet further. In addition, since clusters are relatively rare objects it would not be very expensive to perform follow-up spectroscopy on a central bright galaxy or some member galaxies. These high-quality redshifts would allow much finer redshift binning of the cluster distribution than redshift bins of the cosmic shear tomography. Then, taking the cross-correlation between the clusters with known redshifts and a fair sub-sample of the galaxies used for the cosmic shear tomography may be used to calibrate the photo-z errors, because the cross-correlation is nonvanishing only if the source galaxies are physically associated with the clusters (see [91] for the related discussion). This issue would be worth exploring further, and will be presented elsewhere.
Intrinsic alignments of galaxy ellipticities are another potential source of systematic errors for cosmic shear measurement (see [92] for the detail and references therein). There are two kinds of the contamination.
The first is intrinsic-intrinsic galaxy alignment (II) that may arise from neighboring galaxies residing in a similar tidal field of large-scale structure [93]. The second effect is a cross-correlation between intrinsic ellipticity of a foreground galaxy and lensing distortion of background galaxy shape (GI) because the foreground tidal field affecting the intrinsic ellipticity of a foreground galaxy may also cause lensing shear of a distant, background galaxy ( [94]). In general member galaxies of a cluster tend to be more elliptical and therefore the width of the ellipticity histogram (intrinsic ellipticity dispersion) will be smaller for cluster members due to the absence of e.g. edge-on spirals. Therefore if there are by chance more clusters in a surveyed area then the noise on the shear power spectrum will be slightly smaller. This is likely a tiny effect and can be safely neglected. Another interesting possibility is that if there are many clusters in a surveyed area then both the II and GI contamination to the cosmic shear power spectra would be larger, since member galaxies of a cluster are more aligned with each other. Also the stronger tidal field due to the cluster may also cause the cluster members to be more anti-aligned with background galaxy shapes due to lensing distortion by the cluster. However, identifying clusters within a given survey region could be a useful way to remove/correct this II and GI contamination, which is another interesting possibility of the combined cluster counts and cosmic shear to use for future surveys.
For cluster counts, a most problematic source of the systematic errors is the uncertainty in relating cluster observables to halo mass. One traditional way to tackle this obstacle is to investigate properties of known massive clusters in great detail combing various techniques (radio, optical, cluster lensing, X-ray and the SZ effect). Or one might develop a reliable model for the mass-observable relation using hydrodynamical simulations of cluster formation fully taking into account the associated physical processes in the intracluster medium. Then the massobservable relation obtained in these ways could be used for cluster counting statistics if the derived relation is a fair representation of the mass-observable distribution of clusters in the sample.
For lensing-based cluster counts, projection effects on a cluster lensing signal due to mass along line-of-sight that is not associated with the cluster introduce additional statistical errors in the mass estimates of individual clusters [54,65]. In addition, the scatter will be correlated with the cosmic shear power spectrum, which we have also not taken into account. To estimate the mass estimate uncertainty and the effect of the ignored correlation in a quantitative way, ray-tracing simulations of cosmic shear including cluster lensing contributions will be needed. Also in practice traditional methods (optical, X-ray, the SZ effect) will need to be combined to exclude false clusters from the sample. These issues are beyond the scope of this paper.
One may develop a model to describe the massobservable relation in terms of nuisance parameters. Then we could use cluster observables available from a given survey to 'self-calibrate', i.e. determine both the cosmological parameters and the nuisance parameters concurrently. In particular, it was shown in [23,27] that adding the two-point correlation function of clusters to cluster counts, both of which are drawn from the same survey region, can be a useful way to self-calibrate the model systematic errors in the mass-observable relation because the amplitude of the cluster two-point function is very sensitive to halo bias that is fairly well specified by halo masses.
Having the discussion above in mind, it would be interesting to address whether the self-calibration regime could be attained for the combined measurements of cluster counts and cosmic shear tomography, taking into account the effects of systematic errors involved in each observable. The cluster counts and cosmic shear depend on the cosmological parameters in different ways and are sensitive to different systematic errors. Hence one can use the combined measurement to constrain simultaneously the cosmological parameters as well as the nuisance parameters of systematic errors, mitigating degradation in the cosmological parameter determination due to the systematic errors. Also importantly one could realize, for a given survey, the requirements on the control of the systematic errors (photo-z, mass-observable relation etc) to attain the desired accuracy of constraining dark energy parameters. In this direction, the cross-covariances between the cluster counts and cosmic shear tomography may play an intriguing role, because (1) the crosscorrelations are cosmological signals arising from the cosmic mass density field in large-scale structures or, in other words, there is in general little cross-correlation between the systematic errors in the two observables, and (2) a CDM structure formation model provides accurate predictions for the cosmological cross-covariances. Hence including the cross-covariances in the parameter estimations may be used as another viable monitor of the systematic errors. This interesting issue is beyond the scope of this paper and will be presented elsewhere.

V. CONCLUSION AND DISCUSSION
In this paper we have estimated accuracies on cosmological parameters derivable from a joint experiment of cluster counts and cosmic shear power spectrum tomography when the two are drawn from the same survey region. In doing this we have properly taken into account the cross-covariance between the two observables, which describes how the two observables are correlated in redshift and multipole space. This is necessary because the two experiments probe the same cosmic density fields. However note that, since we have ignored possible systematic errors, all the results shown in this paper demonstrate pure cosmological powers for the combined method. We will below summarize our findings, and then will discuss the remaining issues.
We have developed a formulation to compute the crosscovariance between the cluster counts and the cosmic shear power spectra based on the dark matter halo approach within the framework of a CDM structure formation model (see Appendix). The cross-covariance arises from the three-point correlation function between the cluster distribution and two points of the mass density fields. It is found that there is a significant positive crosscorrelation between the cluster counts probing clusters with masses M > ∼ 10 14 M ⊙ and the lensing power spectrum amplitudes at multipoles l > ∼ 10 3 . Here the term 'positive' is used to mean that if fewer or more massive clusters are found from a given survey region than the ensemble average, the lensing power spectra will most likely have smaller or larger amplitudes, respectively. The cross-correlation on angular and mass scales of interest arises mainly from the 1-halo term contribution of the three-point correlations: the correlation between one point within a given cluster and the shearing effects on two different background galaxies due to the same cluster. Our results are more accurate than the earlier work presented in [49], because their work ignored the 1-halo term contribution to the cross-covariance and only included the 2-halo term contribution, which is dominant only on large angular scales where the useful cosmological information can not be extracted.
To quantify the impact of the cross-covariance, we first investigated the total signal-to-noise (S/N ) ratios for a joint measurement of the cluster counts and the lensing power spectrum. It was shown that an inclusion of the cross-covariance leads to degradation and, depending on the mass thresholds or the lensing detection thresholds, improvement in the S/N ratios up to ∼ ±20% compared to the case that the two observable are considered to be independent (see Figs. 7 and 10). The improvement occurs when the cluster counts including massive halos M > ∼ 10 15 M ⊙ are combined with the lensing power spectrum measurement (also see [82,83] for the related discussion). This occurs even though the S/N ratio for the cluster counts alone is much less than that for the lensing power spectrum alone. That is, a knowledge of the number of such massive clusters for a given survey region helps improve accuracies of the joint measurement. This improvement is achievable only if the cross-covariance is a priori known by using the theoretical predictions or by directly estimating the cross-correlation from the survey region. We also note that the results change greatly if we ignore the non-Gaussian error contribution to the lensing power spectrum covariance, which arises from the lensing trispectrum (see Fig. 8). This implies that the lensing fields are highly non-Gaussian (see [48] for an extensive discussion).
We then presented forecasts for accuracies of the cosmological parameter determination for the joint experiment. To do this we included redshift binning for both the cluster counts and the lensing power spectrum, motivated by the fact that the additional redshift information is very useful to tighten the cosmological parameter constraints, especially the dark energy parameters. In this paper we considered two simplified cluster selection criteria: one is a mass-selected cluster sample, and the other is the lensing-based cluster sample, where the latter contains clusters having the lensing signal greater than a given threshold in the sample. For the mass-selected cluster counts, it was found that combining the cluster counts and the lensing tomography leads to significant improvement in the errors on the dark energy parameters by ∼ 40% only if the cluster counts including down to less massive halos such as M > ∼ 10 14 M ⊙ are considered (see Fig. 11). The improvement is due to different dependence of the two observables on the cosmological parameters.
On the other hand, for the lensing-based cluster counts, adding the cluster counts to the lensing power spectrum tomography is more complementary to tighten the errors on the dark energy parameters than the massselected cluster counts (see Fig. 13). For example, adding the counts of clusters with the high lensing signals (S/N ) cluster > ∼ 6 improves the dark energy errors by a factor of 2, even though the counts contain many fewer clusters and probe a narrower redshift range than the mass-selected clusters of M > ∼ 5 × 10 14 M ⊙ (see Fig. 2). This result is encouraging because such massive halos are rare and therefore it seems relatively easy to make followup observations, e.g., in order to obtain well-calibrated relations between cluster mass and observables (also see § IV E for the discussion). The reason lensing-based cluster counts are more powerful is ascribed to the fact that the cluster lensing signal itself depends on the cosmological parameters via the lensing efficiency and the dependence amplifies the sensitivity of the cluster counts to the dark energy parameters. However, with low detection thresholds such as (S/N ) cluster < ∼ 3 the lensingbased counts begin to suffer too much from projection effects due to large-scale structures that are not associated with the cluster. Hence, if traditional mass-selected cluster counts can go to lower masses then they might catch up with, or overtake, the lensing-based counts in their constraining power.
For the impact of the cross-covariance on the parameter determination, the effect is generally small for both the mass-selected and lensing-based cluster counts. This is partly because the lensing power spectra are sensitive to the total number of clusters roughly weighted by the cluster mass squared whereas for the cluster counts we simply added up the number of clusters (see Appendix C). This means that the two probes are not measuring such a similar quantity and the cross-covariance is smaller than if they both measured the unweighted total number of clusters. Further, the redshift weighting is different for the lensing power spectra and the cluster counts, so not all the halos are in common. It is also partly a result of working in multi-dimensional parameter space (9 parameters for our case). Yet, it is intriguing to note that the dark energy parameters are in most cases improved by including the cross-covariance (see the lower panels of each plot in Figs. 11 and 13; also see [48] for the related discussion). In summary, a joint experiment of cluster counts and lensing power spectrum tomography will be worth exploring in order to exploit full information on the cosmological parameters from future massive surveys, and including the cross-covariance will be needed in order to correctly estimate the error bars.
In this work we have assumed that cluster counts measure the total number of clusters above some threshold, in a number of redshift bins. In principle subdividing cluster counts in mass or lensing signal bins could also improve cosmological parameter constraints [25,26,90]. This would make the improvement on including cluster counts to lensing power spectra even more impressive, however the covariance may be more important than we find in this paper.
Finally, we comment on a possibility for ultimate experiments combining all observables available from one survey region. As we have shown, one can combine different observables to improve accuracies of the cosmological parameter determination, even though the observables probe the same cosmic density fields. Besides the cluster counts and the lensing power spectrum considered in this paper, there will be other various observables available: cosmic shear bispectra or more generally n-point correlation functions of the cosmic shear fields [39], n-point correlation functions of cluster and galaxy distributions [95], small-scale cluster lensing signals [96], cosmic flexion correlation functions [97,98] and so on. Then, one natural question raises: Can we combine all the observables in order to improve the parameter constraints as much as possible? Or, in the presence of the systematic errors, is there an optimal combination of the observables to maximize the parameter constraints as well as most mitigate degradation in the parameter constraints due the systematic errors. However, to address this interesting issue quantitatively, all the covariances between the observables used have to be correctly taken into ac-count. We believe that the formulation developed in this paper would be useful to compute the covariances for any observables and the combinations. This kinds of study will be worthwhile exploring in order to exploit the full potential of future expensive surveys for constraining the nature of mysterious dark energy components and possible modifications of gravity.
[ where we have introduced the normalized halo density profile u m (x) through ρ h (x; m) = mu m (x), the summation i runs over halos (the index i denotes the i-th halo) and δ D (x) is the Dirac delta function. The first equality just says that the mass density field at the position x is expressed as a superposition of density profiles of all halos existing in the universe, where the vector between the position x and the halo position is given by x − x i with x i being the i-th halo's center.
The ensemble average over realisations of the universe of the mass density field is shown to be where we have assumed the ensemble average so that the ensemble average does not depend on any specific spatial position. Eq. (A2) thus demonstrates that the ensemble average of the mass density field is equal to the cosmic average mass densityρ m as expected. Note that the mass function n(m) given in Press & Schechter [62] or the improved one in [61] by definition satisfy the normalization condition dm n(m)(m/ρ m ) = 1. To properly define the halo mass m for an extended halo profile the the normalization condition d 3 x u m (x) = 1 must be satisfied. In this paper we assume the density is zero outside the virial radius (see [56] for a detailed discussion about which mass definitions are self-consistent in the halo model approach).
In this paper we consider constraints on cosmology calculated from number counts of clusters in a hypothetical cluster experiment (see § II B). In number counts the cluster distribution is treated as points, and in other words one does not care about the shape of the mass distribution within a cluster. The relevant quantity is the number density field of clusters, which can be straightforwardly modeled based on the halo model formulation, from a slight modification of Eq. (A1): where the subscript 'cl' in n cl stands for cluster and S(m) denotes the selection function that discriminates clusters used in the number counts from other halos. The ensemble average of the cluster number density field (A3) is found, similarly to in Eq. (A2), to be given bȳ Thus, as expected, the ensemble average of the number density field is indeed given by mass-integral of the halo mass function with a given selection criterion.

Correlation functions of mass and cluster distributions
In this subsection we use the halo model approach to derive the correlation functions of the cluster distribution and the cross-correlation between the cluster distribution and the mass distribution. These are needed to quantify covariances of the cluster counts and the cross-covariance between the cluster counts and the lensing power spectrum, respectively.
From Eq. (A3), the 2-point correlation function of the cluster number density field can be computed as where we have used S 2 (m i ) = S(m i ) since S(m i ) = 1 or 0 (see text below Eq. [3]), and the 2-point correlation function ξ cc is dimension-less as usual. The first term on the r.h.s. represents the 1-halo term contribution that arises due to discrete nature of clusters probed (see § 31 in [101]), while the second term gives the 2-halo term contribution that arises from clustering of clusters. Note that in the last line on the r.h.s. we have assumed the 2-point correlation between different two halos with masses m and m ′ is given by the linear theory mass correlation function, ξ L δ (r), multiplied by the halo bias parameters b(m) and b(m ′ ): ξ h (r; m, m ′ ) = b(m)b(m ′ )ξ L δ (r). Next we consider the 3-point correlation functions between the cluster number density field and two points on the mass density fluctuation field. This 3-point function is needed to quantify the cross-covariance between the cluster counts and the lensing power spectrum, where the lensing power spectrum arises from the 2-point correlation of the mass fluctuation field. The 3-point correlation function we are interested in is defined as where ζ cδδ is dimension-less, δ m denotes the mass density fluctuation field, and δn cl (x 1 ) is the fluctuation part of the cluster number density field n cl (x) in Eq. (A3) (the homogeneous part does not contribute to the 3-point correlation).
In the second equality on the r.h.s. we divided the 3-point function into three distinct contributions: the 1-, 2-and 3halo terms from the picture of the halo model approach. Note that the 3-point correlation function is given as a function of triangle configuration, and the amplitude is invariant under parallel translation and rotational transformation of triangle configuration assuming the statistical symmetry. Therefore, the 3-point correlation function is specified by three parameters that describe a triangle configuration, e.g. three side lengths.
Using a similar calculation procedure as used in Eq. (A5), the 1-halo term contribution to ζ cδδ can be computed as The 2-halo term contribution ζ 2h cδδ is found to bē The 2-halo term arises from the correlation between two different halos, where the clustering strength between the two halos is given by b(m 1 )b(m 2 )ξ L δ the same as we did in Eq. (A5). The 3-halo term ζ 3h cδδ in Eq. (A6), which arises from the correlation between three different halos, is given bȳ where ζ PT δ is the perturbation theory prediction for the 3-point correlation function of the mass density field, and we have assumed that the 3-point correlation of halo distribution can be expressed by ζ PT δ multiplied by halo bias parameters: For convenience for the following discussion, we derive the Fourier-transformed counterpart of the 3-point correlation function ζ cδδ , the bispectrum B cδδ . The bispectrum is related to the 3-point correlation function via the Fourier transform given by Note that the bispectrum is also defined by the ensemble average of the three Fourier-transformed coefficients of the cluster number density field and the mass density fields as Similarly to in Eq. (A6), the bispectrum can be divided into the 1-, 2-and 3-halo term contributions as Combining Eq. (A7) and Eq. (A10), the 1-halo term of the bispectrum, B 1h cδδ , is found to bē Hereũ m (k) is the Fourier transform of the halo density profile defined as where we have assumed a spherically symmetric density profile for simplicity, r vir is the virial radius (more generally, the boundary radius of a halo used for the mass definition), and j 0 (x) is the zero-th order spherical Bessel function, j 0 (x) = sin x/x. Note thatũ m has the property thatũ m (k) = 1 for k → 0. From Eq. (A13), one finds that the bispectrum does not depend on wavenumber k 1 that comes from the Fourier transform of the cluster number density contribution; since the cluster distribution is modeled as discrete points, the contribution to the 1-halo term arises from representative one point within a given halo (e.g., the halo center), which corresponds to white noise (therefore no k-dependence) in the Fourier transform. Similarly, from Eq. (A8), the 2-halo term contribution to the bispectrum, B 2h cδδ , is given bȳ Here, square brackets are used in order to emphasize that the halo mass integral in the terms enclosed by square brackets can be calculated separately from other terms. From Eq. (A9), the 3-halo term of the bispectrum, B 3h cδδ , is found to bē where B PT δ is the perturbation theory prediction for the mass bispectrum given by B PT δ (k 1 , k 2 , k 3 ) = 10 7 + k 1 k 2 + k 2 k 1 k 1 · k 2 k 1 k 2 + 4 7 (k 1 · k 2 ) 2 k 2 1 k 2 2 P L δ (k 1 )P L δ (k 2 ) + 2 perm., where the terms denoted by '2 perm.' are obtained from two permutations of k 1 ↔ k 3 and k 2 ↔ k 3 (e.g., see by [102] for the derivation of B PT δ ).
convergence field available over a given survey region. The Fourier decomposition has to be done for modes taken from a finite survey region. For such a finite sky measurement, infinite number of the Fourier modes are not available. Therefore, the Fourier decomposition is by nature discrete, and the fundamental mode is limited by the size of surveyed area, l min = 2π/Θ s , where the survey area is given by Ω s = Θ 2 s (we assume a square survey geometry for simplicity) 3 . For this case, the convergence field can be expanded using the discrete Fourier decomposition as where the summation runs over the combination of integers (n x , n y ) for l = (2π/Θ s )(n x , n y ). Here we consider the convergence field for a single source redshift bin for simplicity, and it is very straightforward to extend the following discussion to a tomographic case. For an infinite survey limit (Θ s → ∞), the Fourier transform above becomes In the discrete Fourier expansion, the orthogonal relation for eigenmode function e il·θ is given by where the integration range for d 2 θ is confined to the survey area and δ K l−l ′ is the Kronecker type delta function for vectors defined as The orthogonal relation (B4) implies that the Kronecker delta function δ K l−l ′ should be replaced with the Dirac delta function for an infinite survey (Θ s → ∞) (also see [72]) as From this relation, the definition for the convergence power spectrum is also modified for the finite-sky Fourier decomposition to κ l1κl2 = Ω s δ K l1+l2 P κ (l 1 ) (B7) in that the power spectrum definition matches the conventional one κ l 1κ l 2 = (2π) 2 δ 2 D (l 1 + l 2 )P κ (l 1 ) for an infinite survey limit. Note that, from Eqs. (B2) and (B4), the inverse Fourier transform is given byκ l = Ωs d 2 θ κ(θ)e il·θ .
Once the discrete Fourier modes of the convergence field for a finite survey are defined, an estimator for the convergence power spectrum measurement may be defined as where the summation runs over all the Fourier modes whose length is in the range of l − δl/2 ≤ |l| ≤ l + δl/2 for a given bin width δl. Here N p (l) is the number of modes taken for the summation, and is given by N p (l) = li;l∈l b ≈ 2πlδl/(2π/Θ s ) 2 = 2lδlf sky , where f sky is the sky coverage as Ω s = Θ 2 s = 4πf sky . Note that Eq. (14) corresponds to an integral form of Eq. (B8), where the two approximately match each other for large l ≫ 1/Θ s . Using Eq. (B7), the ensemble average of the power spectrum estimator (B8) is found to indeed give the underlying true power spectrum P κ (l): where we have assumed that the power spectrum changes little within the bin width in the third equality on the r.h.s. The covariance between the convergence power spectra in multipole bins l and l ′ is defined as Thus an estimation of the power spectrum covariance requires a knowledge on the 4-point correlation functions of the convergence field. The 4-point correlation function generally has two contributions; one is the Gaussian contribution given by the power spectrum, and the other is the non-Gaussian contribution that is the connected part of the 4point function, the so-called trispectrum. The trispectrum of the convergence field is naturally induced by non-linear evolution of gravitational clustering in structure formation, which carries additional information beyond that of the 2-point functions. For a finite-sky Fourier decomposition, assuming the statistically isotropic, random field forκ l , the 4-point function in Eq. (B10) can be expressed in terms of the power spectrum and the trispectrum as κ lκ−lκl ′κ −l ′ = κ lκ−l κ l ′κ −l ′ + κ lκl ′ κ −lκ−l ′ + κ lκ−l ′ κ −lκl ′ + κ lκ−l ′κ −lκl ′ c = Ω 2 s P κ (l)P κ (l ′ ) + Ω 2 s P κ (l)δ K l+l ′ P κ (l ′ )δ K l+l ′ + Ω 2 s P κ (l)δ K l−l ′ P κ (l ′ )δ K l−l ′ + Ω s T κ (l, −l, l ′ , −l ′ ). (B11) Inserting this equation into Eq. (B10) gives l;l∈l b l ′ ;l ′ ∈l b 2P 2 κ (l)δ K l+l ′ + 1 N p (l)N p (l ′ )Ω s l;l∈l b l ′ ;l ′ ∈l ′ b T κ (l, −l, l ′ , −l ′ ) = 1 N p (l) 1 N p (l ′ ) l;l∈l b 2P 2 κ (l)δ K ll ′ + 1 N p (l)N p (l ′ )Ω s l;l∈l b l ′ ;l ′ ∈l ′ b T κ (l, −l, l ′ , −l ′ ) where A(l) ≡ |l|∈l d 2 l ≈ 2πlδl. In the third equality for the first term calculation, we have replaced the Kroneckertype delta function for vectors, δ K l+l ′ , with the delta function for scalar, δ K ll ′ , because the first term is non-vanishing only if two multipoles l and l ′ are same to within the bin widths. In the fifth line, we have used the integral form for the second term rather than the summation form for notational simplicity. The first term of the covariance represents the Gaussian errors where the power spectrum of different multipoles are independent, while the second term represents the non-Gaussian errors to describe correlations between the power spectra in different multipole bins. Extending Eq. (B12) to the tomographic case for source redshift distribution gives Eq. (15) in main text. The equation (B12) is equivalent to the expression used in [46] when l, l ′ ≫ 1. cluster counts. From Eqs. (6) and (B8), the cross-covariance is defined as Cov[N cl , P est κ (l)] ≡ N cl P est κ (l) − N P κ (l) = δN cl (θ)P est κ (l) = 1 Ω 2 s N p (l) l;l∈l b d 2 θW (θ) κ lκ−l δN cl (θ) = 1 Ω 2 s N p (l) In the second line on the r.h.s., we introduced the angular number density fluctuation field of the cluster counts, δN cl (θ), for convenience for the following discussion. Please do not confuse δN cl (θ) with the ensemble average of the average angular number density, N . The angular number density fluctuation field can be expressed in terms of the three-dimensional number density fluctuation field of clusters defined in Eq. (A3): δN cl (θ) = dχ d 2 V dχdΩ δn cl (χ, χθ).
In the fourth line on the r.h.s. of Eq. (B13), we used the Fourier transform of δN cl using the discrete Fourier decomposition for a finite sky survey as discussed around Eq. (B2): Similarly, from Eq. (A15), the 2-halo term contribution is found to be Cov[N , P est κ (l)] 2h = 2 Ω s dχ d 2 V dχdΩ W 2 g (χ) 1 χ 4 dm n(m)S(m)b(m) m ρ mũ m (k = l/χ) 2 P L δ (k = l/χ; χ), where we have used P L (k) → 0 for k → 0 for the linear power spectrum as predicted from the inflation motivated primordial power spectrum. It is also interesting to find the 3-halo term contribution to the covariance is vanishing as because the perturbation theory bispectrum B PT (k 1 , k 2 , k 3 ) → 0 for k 1 → 0.

APPENDIX C: A TOY MODEL
In this section we show that the qualitative behavior of the percentage difference in S/N (plotted in the lower panel of Figs. 7 and 10) can be recovered using a surprisingly simple toy model. Although extremely simple it will help us gain intuition for the cause of the increase and decrease in the percentage difference in S/N . Imagine a simple universe in which all halos exist only at a small range of redshifts around redshift z, are unclustered (P L (k) = 0), and have a profile which is independent of mass (ũ 2 m (k = l/χ) ∼ũ 2 (k = l/χ)). Eq. (8) for cluster counts may be rewritten as D c ≡ N = χ 2 δχ dm S(m)n(m), where we assume that the volume element over a small redshift interval is given by dχ(d 2 V /dχdΩ) = χ 2 δχ, and consider a single redshift bin. Further, from Eq. (10) and using the halo model expression for the 1-halo term of the 3D mass power spectrum (e.g. see Eq. [9] in [56]), the lensing power spectrum for our simple universe can be shown to be given by D g ≡ P κ = W 2 g (χ)χ −2 δχũ 2 dm n(m) m ρ m 2 . (C2) For simplicity we consider observations at a single ℓ using one redshift bin, therefore the lensing power spectrum measurement is a single number.
To reproduce the results in Fig. 7 we also need to calculate the covariances in this simple universe. We will assume that there is no galaxy intrinsic ellipticity σ ǫ = 0 so that the shot noise term due to intrinsic galaxy shapes is negligible. Further we assume that the lensing power spectrum covariance arises only from the 1-halo term of the lensing trispectrum in Eq. (15), or in other words ignore the Gaussian error contribution (the first term in Eq. [15]). Substituting the 1-halo term of the lensing trispectrum into Eq. (15) for this simple universe gives The cluster count variance is given by the shot noise (see Eq. [13]): C c = 1 Ω s χ 2 δχ dm S(m)n(m).
The cross-covariance between cluster counts and the lensing power spectrum is expected from Eq. (B19) to be C gc = 1 Ω s W 2 g (χ)χ −2 δχũ 2 dm S(m)n(m) m ρ m 2 .

(C5)
Thus the only ingredient is the mass function weighted by various powers of the mass, and by the cluster selection function, the lensing efficiency and the volume element. The correlation coefficient between the cluster counts and lensing power spectrum is given as before as Note that all the prefactors in Eqs. (C3), (C4) and (C5) appearing in front of the halo mass integral drop out and therefore the results shown in this Appendix are independent of redshift and multipole. This is plotted in the left panel of Fig. 15 and may be compared to the lower panel of Fig. 5 for the full treatment. The shape is remarkably similar to that for the full treatment, implying that this simple model containing the different weightings of the mass function captures the essence of the complementarity. The correlation peaks at around 10 15 M ⊙ and decreases at low minimum cluster masses. It makes sense that the cluster counts and lensing are correlated at high minimum cluster masses, because the mass weighting in the toy lensing power spectrum is similar to the mass cut in the cluster counts: they are both dominated by high mass clusters. They become less correlated at lower minimum masses because the cluster counts are dominated by low mass clusters that contribute less to the lensing power spectrum.
The size of the correlation is larger for this simple model than for the full treatment. This makes sense because the full model contains several ingredients that will reduce the correlation including: the contributions of the Gaussian errors and shot noise to the lensing power spectrum covariance (e.g., compare thick and thin lines in the lower panel of Fig. 5); halos at a range of redshifts which will be weighted differently by the lensing power spectra and cluster counts; and terms involving more than one halo at a time. However an even simpler toy model, in which all halos in the universe have the same mass, would make lensing power spectra and cluster counts 100 per cent correlated. We see that simply including the mass weighting ∝ m 2 for the lensing power spectra stops a complete redundancy of information and starts to explain how these seemingly similar probes can be so complementary.
In this simple model we have only one data point from cluster counts D c and one data point from lensing D g (since we have only one redshift bin for each, and we are considering a single wavenumber l). Therefore we can write the signal-to-noise in terms of the correlation coefficient All the terms scale with (Ω s χ 2 δχ), the comoving volume of the redshift interval considered. We use a redshift slice at z = 0.3 of thickness 0.1 for illustration.
To reproduce the plot of percentage difference in (S/N ) we need to compare this to the (S/N ) when the covariance is not taken into account, found by setting r = 0 in the above. The factor 1/(1 − r 2 ) is close to unity when the correlation coefficient r is small, but when the correlation is strong (r ∼ 1) it gets much larger. This causes the (S/N ) to be larger when the covariance is included than when it is not included and gives rise to the peak in the right hand panel of Fig. 15. The final term in the round brackets causes a decrease in S/N , relative to the case where no covariance is included. This is important especially when the correlation is small and the factor 1/(1 − r 2 ) is unimportant. This explains the dip in right hand panel of Fig. 15. In summary: the fact that Fig. 15 is qualitatively similar to the lower panel of Fig. 7 suggests that the peak and dip can be explained just in terms of the different mass weighting of cluster counts and the lensing power spectrum.