CLUMP-3D: Three-dimensional Shape and Structure of 20 CLASH Galaxy Clusters from Combined Weak and Strong Lensing

We perform a three-dimensional triaxial analysis of 16 X-ray regular and 4 high-magnification galaxy clusters selected from the CLASH survey by combining two-dimensional weak-lensing and central strong-lensing constraints. In a Bayesian framework, we constrain the intrinsic structure and geometry of each individual cluster assuming a triaxial Navarro-Frenk-White halo with arbitrary orientations, characterized by the mass $M_{200\mathrm{c}}$, halo concentration $C_{200\mathrm{c}}$, and triaxial axis ratios ($q_{\mathrm{a}} \le q_{\mathrm{b}}$), and investigate scaling relations between these halo structural parameters. From triaxial modeling of the X-ray-selected subsample, we find that the halo concentration decreases with increasing cluster mass, with a mean concentration of $C_{200\mathrm{c}} = 4.82\pm0.30$ at the pivot mass $M_{200\mathrm{c}}=10^{15}M_{\odot}h^{-1}$. This is consistent with the result from spherical modeling, $C_{200\mathrm{c}}=4.51\pm 0.14$. Independently of the priors, the minor-to-major axis ratio $q_{\mathrm{a}}$ of our full sample exhibits a clear deviation from the spherical configuration ($q_{\mathrm{a}}=0.52 \pm 0.04$ at $10^{15}M_{\odot}h^{-1}$ with uniform priors), with a weak dependence on the cluster mass. Combining all 20 clusters, we obtain a joint ensemble constraint on the minor-to-major axis ratio of $q_{\mathrm{a}}=0.652^{+0.162}_{-0.078}$ and a lower bound on the intermediate-to-major axis ratio of $q_{\mathrm{b}}>0.63$ at the $2\sigma$ level from an analysis with uniform priors. Assuming priors on the axis ratios derived from numerical simulations, we constrain the degree of triaxiality for the full sample to be $\mathcal{T}=0.79 \pm 0.03$ at $10^{15}M_{\odot}h^{-1}$, indicating a preference for a prolate geometry of cluster halos. We find no statistical evidence for an orientation bias ($f_{\mathrm{geo}}=0.93 \pm 0.07$) (abridged)


INTRODUCTION
Galaxy clusters are gravitationally dominated by dark matter and serve as a wealth of ideal laboratories to study structure formation in the universe. In particular, an accurate mass estimation of galaxy clusters is crucial not only for utilizing them as cosmological probes (Planck Collaboration et al. 2015;Mantz et al. 2015;Bocquet et al. 2015;de Haan et al. 2016) but also for understanding the root cause of various astrophysical processes in massive halos, such as environmental quenching of galaxies (Dressler 1980). Conventionally, the total cluster mass is determined from projected measurements assuming spherical symmetry. In this context, N -body simulations in the standard Λ cold dark matter (ΛCDM) model established a nearly self-similar form for the spherically averaged density profile ρ(r) of dark-matter halos (Navarro et al. 1996), which can be characterized by two parameters, namely the characteristic density and radius of halos. This two-parameter model gives a satisfactory success in terms of statistically quantifying the ensemble mass of observed clusters over a sizable sample (e.g., Umetsu et al. 2011a;Newman et al. 2013;Umetsu et al. 2014;von der Linden et al. 2014;Hoekstra et al. 2015;Umetsu et al. 2016;Okabe & Smith 2016).
However, cluster halos are predicted to be non-spherical, with a preference for prolate shapes according to N -body simulations in the ΛCDM model (Frenk et al. 1988;Dubinski & Carlberg 1991;Warren et al. 1992;Jing & Suto 2002). Besides, the shape of halos is predicted to depend on the redshift, halo mass, and cluster-centric radius (Bailin & Steinmetz 2005;Hopkins et al. 2005;Allgood et al. 2006;Bett et al. 2007;Bonamigo et al. 2015), as well as on baryonic effects (Flores et al. 2007), large-scale environments (Kasun & Evrard 2005), and the background cosmology (Allgood et al. 2006;Despali et al. 2014). Therefore, cluster mass estimates assuming spherical symmetry cause a substantial scatter around their true mass (e.g., Battaglia et al. 2011). Importantly, an inappropriate assumption about the cluster shape and orientation could significantly bias individual mass measurements (e.g., Oguri et al. 2005). There have been initial attempts to compare the observed shapes of galaxy clusters with those predicted by cosmological numerical simulations (e.g., Oguri et al. 2010), opening up a new avenue of testing models of structure formation. It is thus important to perform a statistical analysis of the shape of clusters by extending cluster mass determinations beyond spherical modeling.
Compared to the theoretical efforts to characterize the shape of galaxy clusters in numerical simulations, significantly less progress has been made on the observational side (De Filippis et al. 2005;Sereno et al. 2006;Corless et al. 2008;Morandi & Limousin 2012;Limousin et al. 2013;Umetsu et al. 2015;Sereno et al. 2017b). Detailed observational work thus far was subject to case studies instead of a statistical interpretation from a large cluster sample, because of difficulty in acquiring data sets that are needed to achieve the required precision. Moreover, the shape of the total mass distribution in clusters was often inferred indirectly from observations of the intracluster medium (ICM) assuming hydrostatic equilibrium, which however would be violated in the presence of, for example, turbulent and bulk motions of hot gas (Lau et al. 2009;Molnar et al. 2010;Chiu & Molnar 2012). Astrophysical processes, such as radiative cooling of ICM and entropy injection from active galactic nuclei, as well as the cluster dynamical state (Cialone et al. 2017), further complicate the interpretation of cluster shapes inferred from X-ray or the Sunyaev-Zel'dovitch Effect (SZE hereafter;Sunyaev & Zel'dovich 1970, 1972 observations. Thus, investigating the shape of galaxy clusters is observationally challenging. Gravitational lensing provides a direct access to the underlying mass distribution of galaxy clusters without requiring any assumptions about their dynamical or physical state. There have been many successful attempts to determine the cluster mass by weak lensing von der Linden et al. 2014;Hoekstra et al. 2015;Melchior et al. 2017;Medezinski et al. 2017;Schrabback et al. 2018), strong lensing (Broadhurst et al. 2005b;Richard et al. 2010;Zitrin et al. 2015;Grillo et al. 2015) and the combination of both (Bradač et al. 2006;Oguri et al. 2012;Umetsu 2013;Umetsu et al. 2016). With recent progress in controlling systematics in weak lensing (e.g., intensive calibration against simulations; Bridle et al. 2010;Kitching et al. 2012;Mandelbaum et al. 2015), together with advances in instrumentation and observing techniques, we are in a great position to utilize gravitational lensing with high-quality lensing data.
In this work, we aim to use both weak and strong lensing to constrain the three-dimensional (3D) structure and shape of clusters targeted by the CLUster Multi-Probes in Three Dimensions (CLUMP-3D) program (Sereno et al. 2017b). Our sample consists of 20 high-mass clusters that were selected by the Cluster Lensing And Supernova survey with Hubble (CLASH, hereafter; Postman et al. 2012). Importantly, these 20 clusters were all deeply followed up from the ground and from space in different wavelengths (Donahue et al. 2014;Umetsu et al. 2014;Rosati et al. 2014;Czakon et al. 2015), with the goal of precisely characterizing the cluster mass distribution. Besides, these clusters have been intensively studied in previous work in the context of galaxy evolution (Annunziatella et al. 2014;DeMaio et al. 2015;Gupta et al. 2016), characterization of strongly lensed arcs (Zitrin et al. 2015), wide-field weak-lensing analysis (Umetsu et al. , 2014, and exploration of the high-redshift universe (Zheng et al. 2012;Coe et al. 2013;Balestra et al. 2013;Monna et al. 2014;McLeod et al. 2016).
In the first paper of the CLUMP-3D program, Sereno et al. (2017b) carried out a full triaxial analysis of MACS J1206.2−0847 using multi-probe data sets from weaklensing, strong-lensing, X-ray, and SZE observations, demonstrating the power of multi-probe cluster analysis. In our companion paper, Umetsu et al. (2018) present direct reconstructions of the two-dimensional (2D) matter distribution in the 20 CLASH clusters from a joint analysis of 2D shear and azimuthally averaged magnification measurements. This work is the third paper of the series, where we focus on character-izing the 3D mass distribution of the 20 CLASH clusters by combining weak and strong lensing. A multi-probe triaxial analysis of 16 X-ray-selected CLASH clusters using weaklensing, strong-lensing, X-ray, and SZE data sets is presented in another companion paper . We note that even though a joint analysis of multi-probe data sets can formally achieve a better precision, studies using gravitational lensing alone have the advantage of being free from assumptions about baryonic components in clusters. Therefore, both approaches are required and complementary to each other. This paper is organized as follows. We will briefly introduce the basics of gravitational lensing in Section 2. We then describe the cluster sample and the lensing data products in Section 3. In Section 4 we outline our methodology for triaxial modeling. We discuss our results in Section 5, followed by the conclusions made in Section 6. Throughout this work, we assume a flat ΛCDM cosmological model with Ω m = 0.27, H 0 = h × 100 km s −1Ṁ pc −1 with h = 0.7, and σ 8 = 0.8. We define an ellipsoidal overdensity radius R ∆ (e.g., Corless et al. 2009;Umetsu et al. 2015) such that the mean interior density contained within an ellipsoidal volume of semimajor axis R ∆ is ∆ times the critical density of the universe ρ c (z) at the cluster redshift z. We use ∆ = 200 to define the halo mass, M 200c 9 . All quoted errors are 68% confidence limits (i.e., 1σ) unless otherwise stated. We use the AB magnitude system. The notation U(x, y) stands for a uniform distribution between x and y.

THEORY OF GRAVITATIONAL LENSING
In this section, we briefly review the basics of gravitational lensing with emphasis on cluster lensing. In this case, we can approximate the lensing cluster of interest at redshift z d as a single thin lens embedded in a homogeneous universe where background sources at redshift z > z d are all lensed. We refer the readers to Bartelmann & Schneider (2001), Umetsu (2010) and Hoekstra et al. (2013) for a more complete overview of gravitational lensing.
To the first order, the deformation of observed background images due to gravitational lensing can be described by the lensing Jacobian matrix (see Equation (1)), which is characterized by the convergence κ and the shear γ ≡ γ 1 + iγ 2 at the position ϑ on the lens plane, where κ, γ 1 , and γ 2 are written as linear combinations of second derivatives of the lensing potential. The convergence κ(ϑ) is the surface mass density normalized by the critical surface mass density for lensing Σ c , where Σ(ϑ) is the surface mass density of the cluster projected along the line of sight, and with G the Newton's constant, and D l , D s and D ls the angular diameter distances between the observer-to-cluster, observerto-source, and the cluster-to-source pairs, respectively. The complex shear γ is related to the convergence κ by where the convolution kernel is defined as D(y) ≡ y 2 2 − y 2 1 − 2iy 1 y 2 /(π|y| 4 ). In general, the observable quantity for weak lensing is not the gravitational shear γ but the reduced shear g in the subcritical regime, The reduced shear g remains invariant under the global transformation, κ → λκ + 1 − λ and γ → λγ, for any λ = 0. This is referred to as the mass-sheet degeneracy (Bartelmann & Schneider 2001), which can be broken, for example, by including the lensing magnification effect. The lensing magnification is characterized by the inverse determinant of the Jacobian matrix, The magnification factor transforms differently as µ → λ −2 µ, which can be used to break the mass-sheet degeneracy. In the subcritical regime where µ > 0 and |g| < 1, the magnification introduces two competing effects: the reduction (increase) of observed area on the source plane given a solid angle, and the amplificatin (deamplification) of flux of background sources. As a net result, the surface number density of a "flux-limited" background sample is altered due to the presence of lensing magnification depending on the intrinsic slope of the background luminosity function. This effect is known as magnification bias (Broadhurst et al. 1995;Taylor et al. 1998). The effect of magnification bias can be measured by comparing cumulative number counts of flux-limited background galaxies with and without gravitational lensing as where n(< m) and n 0 (< m) represent the lensed and unlensed surface number densities of background galaxies brighter than the apparent magnitude m, respectively, and s ≡ d log n 0 (< m)/dm is the logarithmic slope of the cumulative magnitude distribution. It has been shown that, with a sizable sample of galaxy clusters, this effect can be solely used to calibrate the cluster mass proxies (e.g., Hildebrandt et al. 2009;Ford et al. 2012;Chiu et al. 2016a;Tudorica et al. 2017). By combining complementary observables of shear and magnification, one can break the mass-sheet degeneracy (Broadhurst et al. 2005a;Umetsu & Broadhurst 2008). In this work, we combine both observables to derive an unbiased convergence map for each individual cluster (see Section 3.2).
In the regime of strong lensing, detailed modeling with many sets of multiple images with known redshifts allows to determine the location of critical curves, which then returns robust estimates of the Einstein mass, that is, the projected mass enclosed by the critical area A c of an effective Einstein radius θ Ein = A c /π (Zitrin et al. 2015), In this work, we use strong-lensing constraints in the form of the enclosed projected mass profile around the effective Einstein radius. These constraints were obtained by Umetsu et al. (2016) using detailed lens models constructed by Zitrin et al. (2015) from a combined strong and weak lensing analysis of Hubble Space Telescope (HST) observations. We give further details in Section 3.3.

CLUSTER SAMPLE AND DATA
We first describe the cluster sample in Section 3.1. The data products of weak and strong lensing are presented in Section 3.2 and Section 3.3, respectively.

Cluster Sample
In this work, we study a sample of 16 X-ray regular and 4 high-magnification galaxy clusters targeted by the CLUMP-3D program (Sereno et al. 2017bUmetsu et al. 2018). Our sample stems from the CLASH wide-field weak-lensing analysis of Umetsu et al. (2014), and comprises two subsmples, both taken from the CLASH survey  targeting 25 high-mass clusters. Here, 20 clusters in the first CLASH subsample were selected to have X-ray temperatures greater than 5 keV and to have regular X-ray morphology. Numerical simulations suggest that this subsample is largely composed of relaxed clusters and free of orientation bias (Meneghetti et al. 2014). The second subset of 5 clusters were selected for their high magnification properties. These clusters turn out to be dynamically disturbed, merging systems Medezinski et al. 2013;Balestra et al. 2016;Jauzac et al. 2017). Accordingly, modeling with a single-halo component may not be adequate to describe the high-magnification subsample (Medezinski et al. 2013), in contrast to the X-ray-selected subsample that can be well described by a single Navarro-Frenk-White (Navarro et al. 1997, hereafter NFW) profile out to large cluster radii (Umetsu et al. 2016;Umetsu & Diemer 2017). For the sake of homogeneity, however, we analyze all clusters in the full sample in a consistent manner. We will also split the sample into several subsamples and statistically characterize each of them (see Section 4.3).
This sample spans a factor of ≈ 5 in mass (4 × 10 14 M h −1 < M 200c < 20 × 10 14 M h −1 ; Umetsu et al. 2016) and a redshift range of 0.18 < z < 0.69. Following Umetsu et al. (2014), we adopt the location of the brightest cluster galaxy (BCG) as the center for each cluster. As discussed in Umetsu et al. (2014), the rms of positional offsets between the BCGs and X-ray peaks for the full sample is ≈ 30 kpc h −1 , and it reduces to 10 kp h −1 for the Xray-selected subsample. Therefore, the effect of miscentering is not expected to be significant in this work (Johnston et al. 2007;Umetsu et al. 2011b;Umetsu et al. 2016). We tabulate the basic information of our 20 clusters in Table 1.
We note that this sample has been intensively studied in previous CLASH work, especially by Umetsu et al. (2014, hereafter U14) and Umetsu et al. (2016, hereafter U16), who performed recnstructions of the azimuthally averaged surface mass density profile from weak and weak+strong lensing data, respectively. Accordingly, both U14 and U16 focused on spherical mass estimates of these clusters. In this work, we analyze HST Einstein-mass constraints in combination with 2D weak-lensing mass maps of Umetsu et al. (2018) reconstructed from a joint analysis of 2D shear and azimuthally averaged magnification constraints. We extend the analyses of U14 and U16 to investigate the 3D structure and shape of Note. -The right ascension α BCG and declination δ BCG of the BCG position are adopted as the cluster center. The first 16 clusters are taken from the CLASH X-ray-selected subsample, while the other 4 clusters are from the CLASH high-magnification subsample. the 20 clusters using combined strong and weak lensing data sets.

Weak-lensing Data
In this section, we briefly summarize the weak-lensing data products used in this study, and refer the reader to our companion paper ) for full details. Our weaklensing analysis is based on deep multi-band imaging taken primarily with Suprime-Cam (Miyazaki et al. 2012) on the Subaru Telescope (typically, 5 Suprime-Cam bands; Table 1 of Umetsu et al. 2014), as obtained by the CLASH collaboration (Umetsu et al. 2014). For our southernmost cluster (RX J2248−4431), we used data taken with the Wide-Field Imager at the ESO 2.2 m MPG/ESO telescope at La Silla (Gruen et al. 2013). General data products from the CLASH survey, including the reduced Subaru/Suprime-Cam data, weight maps, and photometric catalogs, are available at the Mikulski Archive for Space Telescopes (MAST) 10 . Details of the image reduction, photometry, background galaxy selection, and the creation of weak-lensing shear catalogs are presented in Umetsu et al. (2014).
In our companion paper, Umetsu et al. (2018) have presented a 2D weak-lensing analysis for the 20 CLASH clusters using the background-selected shear catalogs and azimuthaly averaged magnification profiles, both published in Umetsu et al. (2014). In this study, we use pixelized 2D surface mass density maps obtained in Umetsu et al. (2018) as our weaklensing constraints. For each cluster, the mass map is pixelized on a regular grid of 48 × 48 pixels covering the central 10 https://archive.stsci.edu/prepds/clash/ 24 × 24 region. Umetsu et al. (2018) accounted for various sources of errors associated with their weak-lensing shear and magnification measurements (see their Section 3), including the covariance due to uncorrelated large-scale structures projected along the line of sight. All these errors are encoded in the covariance matrix used in our analysis.
As summarized in Section 5.1 of Umetsu et al. (2018), we quantified major sources of systematic errors in the CLASH weak-lensing analysis. In particular, we consider the following systematic effects: (1) dilution of the lensing signal caused by residual contamination from cluster members (2.4% ± 0.7%), (2) photometric-redshift bias in estimates of the mean lensing depth (0.27%), and (3) uncertainty in the shear calibration factor (5%). These errors add to 5.6% in quadrature. This corresponds to the mass calibration uncertainty of 5.6%/Γ 7% with Γ 0.75 being the typical value of the logarithmic derivative of the lensing signal with respect to cluster mass (Melchior et al. 2017).
On the other hand, by performing a shear-magnification consistency test, Umetsu et al. (2014) estimated a systematic uncertainty in the CLASH mass calibration to be 8%. In the CLUMP-3D program, we conservatively use this value as the systematic uncertainty in the ensemble mass calibration.
3.3. Strong-lensing Data Zitrin et al. (2015) obtained detailed lens models for the CLASH sample using two different parameterizations, one assuming that light traces mass for both DM and galaxy components, and the other using an analytical elliptical NFW form for the DM-halo components. Here we include HST lensing constrints of Zitrin et al. (2015) to improve modeling of cluster cores, which are unresolved by the wide-field weaklensing observations. Full details of data acquisition, reduction, and analysis of HST lensing data are fully given in Zitrin et al. (2015) and U16, to which we refer the reader for more details.
Here we give a brief summary of our HST lensing data. Specifically, for each cluster except RX J1532.9+3021 for which no secure identification of multiple images has been made (Zitrin et al. 2015), we use enclosed projected mass constraints M SL (< r) for a set of four fixed integration radii, r = 10 , 20 , 30 , and 40 . These constraints are presented in Table 1 of Umetsu et al. (2016). The measurement errors σ MSL(<r) include systematic as well as statistical uncertainties, by accounting for modeling discrepancies between the two modeling methods of Zitrin et al. (2015). The integrated signal-to-noise ratio of the enclosed mass constraints is on average ≈ 12, comparable to that of the weak lensing constraints Umetsu et al. (2014).

METHODOLOGY
In this section, we describe our methodology for triaxial modeling of galaxy clusters. We first describe the formalism for halo modeling in Section 4.1, and outline Bayesian methods in Section 4.2. In Section 4.3, we perform Bayesian inference for individual and ensemble clusters using the combined weak and strong lensing data sets. In Section 4.4, we examine scaling relations with halo mass for our clusters.

Halo Modeling
In this section, we describe triaxial halo modeling based on the 2D weak-lensing and central HST lensing data sets. To this end, we closely follow the forward-modeling approach of  Bonamigo et al. (2015) U (0, 1) U (−π/2, π/2) U (−π/2, π/2) Note. -Uniform priors of U (14, 16) and U (−1, 1) are used for log (M 200c ) and log (c 200c ), respectively. Masses are expressed in the units of M h −1 . All parameters other than M 200c and c 200c are fixed in Spherical modeling. In Triaxial modeling, we use uniform priors on the shape (qa and q b ) and orientation (cos θ, φ, and ψ) parameters. In Triaxial+B15 modeling we assume informative shape priors taken from cosmological numerical simulations of Bonamigo et al. (2015), while keeping the uniform priors for the orientation parameters. Umetsu et al. (2015) and Sereno et al. (2017b). Specifically, we forward-model the projected cluster lensing observations by projecting a triaxial NFW halo (Corless et al. 2009) along the line of sight.
The density profile of the triaxial NFW model is written as a function of the ellipsoidal radius R as where ρ s is the characteristic densty, and R s is the ellipsoidal scale radius measured along the major axis of the halo ellipsoid. The ellipsoidal radius R is related to the principal coordinates (X, Y, Z) centered on the cluster as with q a the minor-to-major axis ratio and q b the intermediateto-major axis ratio. By definition, we have 0 < q a ≤ q b ≤ 1. Equation (9) reduces to the spherical NFW model if q a = q b = 1. The M 200c mass and the R 200c radius for a cluster at redshift z d are related to each other by On the other hand, M 200c can be expressed as We define the concentration parameter c 200c as the ratio of the cluster radius to the scale radius along the major axis, Combining Equations (11), (12), and (13), one can express ρ s as We specify the radial density profile of the triaxial NFW model (see Equation (9)) with (M 200c , c 200c ), instead of (ρ s , r s ).
A triaxial halo is projected onto the lens plane as elliptical isodensity contours (Stark 1977), which can be specified by the intrinsic axis ratios (q a , q b ) and orientation angles (θ, φ, ψ) defined with respect to the line of sight of the observer. Following Umetsu et al. (2015) and Sereno et al. (2017a), we adopt the z-x-z convention of Euler angles (Stark 1977). The angle θ describes the inclination of the major (Z) axis with respect to the line of sight.
After a coordinate transformation of the first two Euler angles, elliptical isodensity contours of the projected ellipsoid can be described as a function of the elliptical radius ζ defined in terms of the observer's sky coordinates (X , Y) as The third Euler angle ψ represents the rotational degree of freedom in the sky plane to specify the observer's coordinate system. To sum up, our triaxial NFW model is specified by seven parameters, namely, halo mass and concentration (M 200c , c 200c ), intrinsic axis ratios (q a , q b ) characterizing the intrinsic halo shape, and three Euler angles (θ, φ, ψ) describing the halo orientation with respect to the line of sight. In this way, for a given set of the parameters, we can project a triaxial NFW halo onto the lens plane and compute the surface mass density at each position.
In this work, we pay a special attention to two geometric quantities that characterize the intrinsic shape and orientation of clusters. The first quantity is a geometrical factor f geo that describes the degree of elongation of the cluster mass distribution along the line of sight. Specifically, a cluster halo is elongated along the line of sight if f geo > 1, while it is elongated in the plane of the sky if f geo < 1. We stress that the case of f geo = 1 does not necessarily correspond to a spherical halo configuration, but indicates that the halo sizescale along the line of sight is eqaul to that in the plane of the sky. Following Sereno et al. (2010) and Umetsu et al. (2015), we define f geo by where q ⊥ is the minor-to-major axis ratio of the projected el-lipsoid, L represents the line of sight half length of the ellipsoid of ellipsoidal radius R = R s , ξ s is the projected scale radius (semi-major axis) in the sky plane. That is, for a given ellipsoid, f geo is the ratio of the line-of-sight half length L to the geometric mean of the semi-major and semi-minor axes of the projected isodensity contour. The second quantity of interest is the degree of triaxiality, T . Following the definition in Umetsu et al. (2015), the triaxiality is defined by By construction, 0 ≤ T ≤ 1. The degree of triaxiality T approaches unity, or q a = q b (zero, or q b = 1) if the halo shape is maximally prolate (oblate).

Bayesian Inference
In what follows, we describe the Bayesian inference formalism that we used to explore parameter space given the combined weak and strong lensing data sets. The joint posterior probability distribution of model parameters p given data D is written as where P(p) denotes the prior distribution of p. In this work, our model includes up to seven parameters, namely, , depending on modeling approaches (see below). The likelihood L(D|p) describes the probability of observing data D given the model p. Here we explicitly express the data as with D SL = {M SL (< r) |r = 10 , · · · , 40 } the data vector containing a set of encloesd projected mass constraints, and D WL the concatenated data vector containing pixelized values of the weak-lensing mass map. For each cluster, we evaluate the log-likelihood where the index i runs over the four strong-lensing constraints, σ MSL(<ri) is the uncertainty in the enclosed projected mass estimate M SL (< r i ), M 2D (< r i ) is the model prediction, and C represents the error covariance matrix for the weak-lensing data. In this work, we fit weak-lensing data across the entire 24 × 24 region centered on the cluster. We checked that restricting the fitting range to the central 4Mpc/h×4Mpc/h region (side length corrsponding approximately to twice the virial radius) does not significantly change the results, indicating that our analysis is not sensitive to the 2-halo term. The enclosed projected mass measurements from the HST lensing analysis impose a set of integrated constraints on the inner density profile. We note that, by doing this, no assumption is made of azimuthal symmetry or isotropy of the underlying mass distribution.
We use the python implementation of the affine-invariant ensemble Markov Chain Monte Carlo (MCMC) sampler, emcee (Foreman-Mackey et al. 2013), to explore parameter space. We consider the following three different modeling approaches: (1) spherical modeling with uniform priors on log M 200c and log c 200c , (2) triaxial modeling with uniform priors on all parameters, and (3) triaxial modeling incorporating informative shape priors from cosmological N -body simulations (Bonamigo et al. 2015, hereafter B15). For simplicity, we refer to these three approaches as Spherical, Triaxial, and Triaxial+B15 modeling, respectively. Here we briefly describe each case.
• Triaxial modeling: We use uniform priors on log M 200c , log c 200c , intrinsic shapes (q a , q b ), and orientation angles (cos θ, φ, ψ). We assume the following form of the prior probability distribution for the intrinsic axis ratios, where and and q min = 0.1 is the lower bound of the minor-tomajor axis ratio (Sereno & Umetsu 2011).
• Triaxial+B15 modeling: We adopt informative shape priors from N -body simulations of B15, who characterized the distribution of intrinsic axis ratios of N -body CDM halos as function of the halo peak height and redshift. We self-consistently update the shape prior for a given set of M 200c , c 200c , and redshift. Table 2 summarizes the prior distributions assumed in this study.

Modeling Strategy
In this study, we perform both individual and joint ensemble modeling of clusters. For the latter case, we simultaneously fit a single density profile to a (sub)sample of clusters. Specifically, we consider the following five (sub)samples of clusters: • full sample of 20 clusters, • low-mass subsample containing the ten lowest M 200c mass clusters from U14, • high-mass subsample containing the ten highest M 200c mass clusters from U14, • CLASH X-ray-selected subsample of 16 clusters, • CLASH high-magnification-selected subsample of 4 clusters. Note. -The first column lists the cluster name, followed by marginalized posterior constraints on respective parameters from Spherical, Triaxial and Triaxial+B15 modeling. The cluster masses are expressed in the unit of 10 15 M h −1 . The first twenty rows show the results of individual modeling, and the last five rows show the results from joint ensemble modeling. We provide 2σ lower limits on the axis-ratio parameters when they are ill-constrained.
For joint ensemble modeling of clusters, we assume that all clusters have the same mass, concentration, and intrinsic axis ratios, and fit the orientation angles for each individual cluster. Specifically, the joint posterior probability distribution for ensemble modeling is written as where i runs over all clusters in the (sub)sample, q denotes the vector containing model parameters is the model of the ith cluster. The individual and ensemble modeling approaches are complementary to each other. We will present both results in Section 5.

Scaling Relation Fitting
Here we describe our Bayesian regression approach to examining mass scaling relations of various cluster observables using the results from individual cluster modeling. Specifically, we investigate the following four scaling relations: (1) concentration to mass (c 200c -M 200c ) scaling relations, (2) minor axis ratio to mass (q a -M 200c ) scaling relations, (3) geometrical factor to mass (f geo -M 200c ) scaling relations, (4) triaxiality to mass (T -M 200c ) scaling relations. To this end, we use the following equation: together with the intrinsic scatter D X ≡ σ X |M200c , where the observable X runs over q a , f geo and T , respectively. For the concentration to mass relations, we fit Equation (26) using logarithmic observables (i.e., log c 200c and log M 200c ) with log-normal intrinsic scatter D c200c ≡ σ log c200c|M200c . Note that we examine these scaling relations with a pivot mass of M 200c = 10 15 M h −1 , which is close to the median mass of our sample, to reduce degeneracies between A X and B X .
To derive an observable (X ) to mass relation for a given sample of clusters, we draw 5000 random samples from MCMC posterior distributions of the clusters. For each random realization, we fit Equation (26) to data drawn from the posteriors and obtain a set of the regression parameters (A X , B X , D X ). Finally, we derive the median values and confidence intervals of the parameters. In this way, we directly account for the covariance between the observable X and mass M 200c in our Bayesian regression analysis (Chiu et al. 2016b;Gupta et al. 2017;Chiu et al. 2017). We have checked that the regression results are not sensitive to the number of random realizations used. In our analysis, we have ignored the intrinsic scatter between weak-lensing and true cluster mass (Sereno & Ettori 2015).  Note. -The best-fit parameters for the concentration to mass, minor-to-major axis ratio to mass, geometrical factor to mass, and triaxiality to mass scaling relations are listed. Each mass scaling relation is characterized by the normalization A X , mass slope B X , and intrinsic scatter D X , where X runs over c 200c , qa, fgeo and T . For the concentration to mass relation, we use logarithmic observables (i.e., log M 200c , log c 200c ) for the regression analysis. For the other scaling relations, we use linear observables without logarithmic transformation. The results of Triaxial and Triaxial+B15 modeling are shown for each scaling relation. Additionally, the results of Spherical modeling are presented for the concentration to mass relation. For ill-constrained parameters, we give 2σ upper limits. We perform a regression analysis of observable-mass scaling relations following this procedure for each of the Spherical, Triaxial and Triaxial+B15 modeling approaches.

RESULTS AND DISCUSSION
In Section 5.1, we first compare our results of Spherical modeling to those obtained by previous CLASH work of U14 and U16. In Sections 5.2 to 5.5, we present the resulting observable-mass scaling relations based on individual cluster modeling. We will also discuss the results of joint ensemble modeling in these subsections.
In Table 3 we summarize marginalized constraints on the spherical and triaxial NFW model parameters derived from the individual and ensemble modeling approaches. Here we have employed the biweight estimators of Beers et al. (1990) for the central location and scale of the marginalized posterior distributions (e.g., Sereno & Umetsu 2011;Umetsu et al. 2014;Umetsu et al. 2015) 11 . The regression parameters of various mass scaling relations are summarized in Table 4. In Figure 1, we show the joint ensemble constraints on cluster parameters from different modeling approaches. We also show the results obatained with and without the HST lensing constraints M SL (< r) to demonstrate the consistency between the weak and strong lensing data sets. For the results from individual cluster modeling, we show the resulting marginalized posterior distributions in Appendix A.

Consistency of Spherical Modeling
We compare our results from Spherical modeling of the 20 clusters to those of U14 and U16 for a consistency check. Both U16 and this work are based on the weak-lensing shear and magnification data obtained by U14. U16 reconstructed azimuthally averaged surface mass density profiles of these individual clusters by combining the weak-lensing data of U14 with the central HST lensing constraints M SL (< r) from Zitrin et al. (2015).
In both U14 and U16, the NFW mass and concentration parameters were derived assuming spherical symmetry, corresponding to the case of Spherical modeling in this work. Although these three studies share the same data as input to modeling, the crucial difference of this study is that we directly fit a model profile to the 2D surface mass density maps of Umetsu et al. (2018) without azimuthal averaging. This comparison is thus useful for validating the robustness of our reconstruction and modeling procedures, for a given data set.
We begin with the results of individual cluster modeling. Comparing our M 200c mass estimates to those M 200c,U16 from U16, we find the mean diffrence 12 in logarithmic mass of ∆ log M 200c = log M 200c − log M 200c,U16 = −0.01 ± 0.04, which meets the criterion of < 8% (or 0.035 dex) for consistency (see Section 3.2). Similarly, the mean difference in logarithmic concentration is ∆ log c 200c = log c 200c − log c 200c,U16 = 0.04 ± 0.04. Except that we observe a mild increase (0.04 dex or ≈ 10%) in concentration with respect to U16, our results are in satisfactory agreement with U16. This comparison with U16 is shown in Figure 2.
Excluding the HST lensing constraints M SL (< r) from our Spherical modeling results in mass estimates that 11 The biweight estimator is robust against skewed distributions, because it gives a higher weight to points that are close to the central location of a distribution. 12 We use an unweighted mean here because the uncertainties of this work and U14/U16 are highly correlated with each other. are consistent with those from U14 based on the onedimensional (1D) weak-lensing analysis, with a mean difference in logarithmic mass of ∆ log M 200c,WL = log M 200c,WL − log M 200c,U14 = 0.01±0.04. This is much smaller than the systematic uncertainty in the overall mass calibration of 8% (or 0.035 dex). Similarly, the mean difference in logarithmic concentration with respect to U14 is ∆c 200c,WL = log c 200c,WL − log c 200c,U14 = −0.03 ± 0.05. Again, no significant tension with U14 is found, as also shown in Figure 3.
In what follows, we compare our results from joint ensemble modeling to those from U14 and U16. Since U14 and U16 constrained the c-M relation only for the X-ray-selected subsample, we restrict our ensemble Spherical modeling to the same 16 X-ray-selected clusters for a fair comparison.
U16 constrained the NFW parameters from the stacked weak and strong lensing profile as c 200c = 3.76 +0.29 −0.27 and M 200c = (10.08 ± 0.7) × 10 14 M h −1 , respectively. In this work, joint ensemble Spherical modeling with combined weak and strong lensing yields c 200c = 4.18 +0.20 −0.19 and M 200c = 9.62 +0.49 −0.52 × 10 14 M h −1 , consistent with the 1D analysis of U16 within the quoted uncertainties. Note that this joint ensemble constraint on c 200c from our Spherical modeling is ≈ 10% higher than that from U16 at the 1σ level. This tendency is consistent with the case of individual cluster modeling (see Figure 2). This ensemble constraint is shown in red contours in the left panel of Figure 1. We further compare our spherical mass estimates to those fom Sereno et al. (2018). They obtained cluster masses M 200c, S18 from a joint analysis of weak and strong lensing, X-ray, and the SZE data sets, in both triaxial and spherical approaches. For the spherical mass comparison, we find a mean difference in logarithmic mass of M 200c, S18 (< R) − M 200c (< R) = (2 ± 3) % and (0 ± 1) % at R = 1Mpc h −1 and 1.5Mpc h −1 , respectively. This again demonstrates excellent agreement.
On the basis of these consistency tests, we find no significant tension between the results using different combinations of data sets (U14, U16, Sereno et al. (2018)), ensuring the robustness of our modeling procedures. We will discuss more results of Spherical modeling in Section 5.2. with a log-normal intrinsic scatter D c200c ≡ σ log c200c|M200c of 0.05, < 0.12 (2σ upper bound), and 0.04 using the Spherical, Triaxial and Triaxial+B15 modeling approaches, respectively. A redshift evolution of the NFW c-M relation, c ∝ (1 + z) −0.668±0.341 , was suggested for X-ray-selected CLASH-like halos in N -body hydrodynamical simulations (Meneghetti et al. 2014). We find that including the redshift scaling in regression analysis results in a negligible change in the inferred regression parameters within the errors. We thus ignore the redshift dependence of the c-M relation in this study. In Figure 4, we plot the resulting scaling relations along with the individual cluster constraints for the X-ray-selected subsample. The scaling relations obtained for the full sample and the high-magnification subsample are given in Table 4. In Figure 4, we see a steep mass dependence of the c-M relation. Assuming spherical symmetry, we find a mass slope of B c200c = −0.47 ± 0.07 for the X-ray-selected subsample, and an even steeper slope of B c200c = −0.65 ± 0.06 for the full sample. Here we note that this is due in part to our fitting procedure, in which we do not account for the underlying distribution of true cluster masses. That is, the steepening of the intrinsic mass function combined with the selection function could alter the resulting distribution of true cluster masses (Sereno & Ettori 2015). Accounting for this effect, U16 found B c200c = −0.44 ± 0.19 for the same subsample, which is consistent with our results, but with a much larger uncertainty. From Triaxial modeling including additional shape and orientation parameters (with uniform priors), we find a shallower mass slope of −0.36±0.11, which is consistent with the Spherical modeling results within the errors.

Concentration to Mass Relations
The normalization of the c-M relation is constrained as c 200c = 4.51 ± 0.14 and 4.82 ± 0.30 at the pivot mass of M 200c = 10 15 M h −1 for Spherical and Triaxial modeling, respectively. We note that, by construction, c 200c (Spherical) c 200c (Triaxial) (see Sereno et al. 2018). On the other hand, employing the informative shape priors from B15 in Triaxial+B15 modeling does not change the results in a significant manner. Regardless of the priors chosen, the effect of triaxiality has no significant impact on the resulting c-M relation, so that the assumption of spherical symmetry is well validated in determining the overall density structure of the CLASH clusters. Now we discuss the results of joint ensemble modeling of the cluster mass and concentration. In the left panel of Figure 1, we show the weak-lensing-only results in gray and the weak and strong lensing results in black, both derived for the full sample of 20 clusters. Joint ensemble constraints for the X-ray-selected, high-magnification, low-mass, and highmass subsamples are also shown in the same panel. We see a clear trend of decreasing concentration with increasing mass. In particular, the high-magnification subsample consisting of four very massive disturbed clusters (M 200c ≈ 1.5 × 10 15 M h −1 ) has c 200c ≈ 2, much lower than other similar-mass clusters, indicating a selection effect. We will further discuss this in Section 5.2.1.
In Figure 5, we show our joint ensemble constraints on the mass and concentration parameters for the X-rayselected subsample, obtained with three different modeling approaches (Spherical black; Triaxial red; Triaxial+B15 blue). In all cases, we use the weak and strong lensing constraints. From Triaxial (Triaxial+B15) modeling, we find c 200c = 3.87 +0.76 . Overall, triaxial modeling results in a concentration that is slightly higher than spherical modeling at the ≈ 7% level, regardless of the chosen priors. As noted above, it is expected that c 200c (Spherical) c 200c (Triaxial) . However, this level of difference is consistent with zero within the errors. Therefore, we conclude that the spherical symmetry is a well valid assumption in estimating the concentration of the CLASH clusters.

Comparisons with Previous Work
We first compare our concentration to mass scaling relations to that obtained by U16. In U16, the c-M relation for the X-ray-selected subsample is constrained as c 200c with a log-normal intrinsic scatter of σ log c200c|M200c ≈ 0.056 ± 0.026, assuming spherical symmetry. This is in good agreement with our Spherical modeling (see Equation (27)) at the 1σ level, in terms of the mass slope and intrinsic scatter. On the other hand, we find a normalization that is ≈ 13% higher than U16. This trend is consistent with what we found in Section 5.1.
According to Meneghetti et al. (2014), it is expected that the mean concentration of the X-ray-selected CLASH subsample recovered from projected lensing measurements is ≈ 11% higher than that for the full population of clusters. Specif- at their median redshift of z ≈ 0.35, with an intrinsic scatter of σ log c200c|M200c ≈ 0.07. The observed normalization (A c200c = 4.51 ± 0.14) is thus (10 ± 4) % higher than this CLASH prediction. The derived mass slope (B c200c = −0.47 ± 0.07) is steeper than the CLASH prediction (Meneghetti et al. 2014) at the 2.4σ level. The predicted scatter is consistent with our measurements, but considerably smaller than the typical intrinsic scatter, σ log c200c|M200c ≈ 0.15 (≈ 0.11), predicted for the full (relaxed) population of halos in cosmological N -body simulations (e.g., Duffy et al. 2008;Bhattacharya et al. 2013). This is consistent with the expectation that the X-ray-selected CLASH sample is largely (≈ 70%) composed of regular and highly relaxed clusters (Meneghetti et al. 2014). The intrinsic scatter is increased by a factor of ≈ 2 if we include the four high-magnification CLASH clusters (see Table 4).
Next, we compare our results to previous CLASH work of Merten et al. (2015), who studied 19 X-ray-selected CLASH clusters. They simultaneously combined HST strong+weak lensing constraints (specifically, HST shear catalogs plus lo- cations and redshifts of multiple images) with wide-field shear catalogs of Umetsu et al. (2014) to reconstruct 2D mass maps of individual clusters. Weak magnification lensing was not used in their analysis. Cluster mass estimates were obtained from spherical NFW fits to azimuthally averaged surface mass density profiles 14 . Assuming spherical symmetry, Merten et al. (2015) found c 200c ∝ (3.66 ± 0.16) × M −0.32±0.18

200c
, with a log-normal intrinsic scatter of 0.07, in good agreement with our results in terms of the mass slope and intrinsic scatter. However, the normalization obtained by Merten et al. (2015) is significantly lower than our results, likely arising from the different reconstruction methods.
We then compare our results to those of Sereno et al. (2017a), who carried out a 1D weak-lensing analysis to derive the c-M relation for SZE-selected clusters from the Planck survey. Examining their c-M relation with M 200c = 10 15 M h −1 and the median redshift of our full sample, z d = 0.377, we find c 200c = 4.04 +6.59 −2.50 , which is consistent with our Spherical modeling results for our full sample within large errors.
In addition, we compare our results to Oguri et al. (2012), who combined strong and weak lensing constraints in a 1D analysis to derive the c-M relation for a sample of 28 stronglensing-selected clusters from the Sloan Digital Sky Survey. The best-fit c-M relation of Oguri et al. (2012) is c vir = (7.7 ± 0.6) × M vir /5 × 10 14 M h −1 −0.59±0.12 , with a log-normal intrinsic scatter of σ log c200c|M200c = 0.12 defined with the virial overdensity. We convert this relation to that with ∆ = 200 by substituting M 200c = 0.88M vir and c 200c = 0.83C vir at M vir = 5 × 10 14 M h −1 and 14 In this work, we combine the enclosed projected mass constraints from the HST lensing analysis of Zitrin et al. (2015) with the wide-field weaklensing mass maps from Umetsu et al. (2018), followed by direct model fitting without azimuthal averaging. the median redshift of our full sample, z d = 0.377. The resulting relation using a pivot mass of 10 15 M h −1 is c 200c = (4.0 ± 0.3) × M 200c /10 15 M h −1 −0.59±0.12 , in good agreement with our full-sample results (Table 4).
This comparison is particularly interesting because the Oguri et al. (2012) sample was selected by the presence of strong-lensing features, specifically giant arcs. In contrast, the high-magnification CLASH clusters were selected by their high lensing magnification properties, with the goal of searching for strongly lensed galaxies at high redshifts. The giantarc selection of Oguri et al. (2012) preferentially selects clusters that are more centrally concentrated and/or elongated along the line of sight, resulting in a positive bias in the apparent degree of concentration relative to the full population of clusters. Importantly, this bias is predicted to be mass dependent and more prominent for low mass clusters (Oguri et al. 2012, e.g., estimated concentration being biased high by ≈ 80% for M vir ≈ 8 × 10 13 M h −1 and 20% for M vir 10 15 M h −1 ). In this work, we find an opposite trend of significantly lower concentration for high-magnificationselected clusters (Tables 3 and 4). This is expected for typical merging, high-mass clusters, where the mass distribution is not as concentrated as relaxed systems. In fact, our high-magnification clusters are found to be dynamically disturbed systems Medezinski et al. 2013;Balestra et al. 2016;Jauzac et al. 2017), where complex merging events are taking place. Nevertheless, this comparison suggests that clusters selected by their strong-lensing features tend to be a highly biased population in terms of their morphology and dynamical state.
Finally, we compare our joint ensemble constraints (Table 3) to recent simulation work of Duffy et al. (2008), Bhattacharya et al. (2013), and Diemer & Kravtsov (2015). Here we compare their predictions to our results from Spherical modeling because they measured halo mass and concentration from spherically averaged density profiles of simulated halos. Overall, our conclusions are not altered significantly if comparing to our triaxial results, given the good agreement between the spherical and triaxial results, regardless of the priors chosen. Duffy et al. (2008)  We then compare our results to the model of Diemer & Kravtsov (2015), who also characterized the halo concen- 15 The average peak height of our sample is ≈ 3.8 tration as a function of the halo peak height. Their model yields c 200c = 3.73 for the typical mass of our sample, M 200c ≈ 10 15 M h −1 , which shows no tension with our measurement (black contours in the left panel of Figure 1).
The comparisons we discussed above can be visualized in Figure 4. To sum up, our results on the c-M relation are in satisfactory agreement with previous lensing studies. A better agreement can be achieved once the selection function, the cosmology adopted, and/or the modeling systematics are taken into account. We find that the typical values of halo concentration (A c200c ) range from c 200c ≈ 3 to ≈ 4.5 at M 200c = 10 15 M h −1 , largely depending on the sample selection rather than the modeling assumption on the shape of clusters.

Axis Ratio to Mass Relations
Here we present the minor-to-major axis ratio to mass scaling relation for our full sample of 20 clusters derived using the Triaxial and Triaxial+B15 modeling approaches (Equations (30) and (31) with an intrinsic scatter D qa|M200c of 0.05 and < 0.06 (2σ upper bound), respectively.
We plot these results in Figure 6 in a similar manner as in Figure 4. There is no clear difference in the resulting q a -M relations between the X-ray-selected and high-magnification subsamples. We thus focus on the results based on the full 200c at the 2.7σ level, indicating that more massive clusters tend to be less spherical. On the other hand, introducing informative shape priors in Triaxial+B15 modeling yields an ensemble average of q a ≈ (0.44 ± 0.02) at the pivot mass M 200c = 10 15 M h −1 and a shallower slope of B qa ≈ (−0.14 ± 0.07). This corresponds to a marginal shfit in A qa and B qa at the 2σ and 1.3σ levels, respectively, with respect to the case using uniform priors. On the basis of the results above, we have detected a non-spherical shape of the clusters. The average minor-to-major axis ratio q a is ≈ 0.5, depending on the priors used, and it monotonically decreases with increasing cluster mass at the 2.7σ level.
In Figure 1, we show joint ensemble constraints on the concentration, mass, and axis ratios for the full sample obtained with different modeling approaches and data sets. It is seen in the right panel of Figure 1 that there is no clear correlation between the shape parameters (i.e., axis ratios) and the overall structural parameters (i.e., concentration and mass) regardless of the priors. This is consistent with the implication in Section 5.2 that the assumption of cluster shapes does not statistically affect the c-M relation of the CLASH clusters. We see from the right panel of Figure 1 that including the HST lensing data M SL (< r) results in a lower concentration, but it does not alter the axis ratios. Conversely, introducing the informative shape priors from B15 has an impact on the axis ratios, but not on the mass and concentration parameters.
We show in Figure 7 our joint ensemble constraints on the intrinsic shape parameters for our full sample of 20 clusters, along with the B15 prior distribution. We constrain the minor-to-major axis ratio as q a = 0.652 +0.162 −0.078 using uniform priors, which is higher than the expectation from the B15 prior distribution, q a = 0.412 +0.095 −0.083 , at the 1.5σ level. Employing the B15 priors in Triaxial+B15 modeling yields q a = 0.499 +0.018 −0.056 , which is consistent with the Triaxial constraint at the 1σ level, given the long tail of the posterior distribution.
Furthermore, we compare in Figure 8 our joint ensemble constraints on q a for the full sample with theoretical predictions from N -body numerical simulations of Despali et al. (2014), B15, Suto et al. (2016), andVega-Ferrero et al. (2017). Note that we evaluate the theoretical predictions in Figure 8 at the median redshift of our cluster sample, z d = 0.377. We see that our constraints on q a obtained using uniform priors are in favor of the axis ratio that is higher than the theoretical predictions. However, this trend is only at the 1.5σ level and not statistically significant. It is worth mentioning that including baryonic physics in numerical simulations results in a rounder shape of galaxy clusters (Kazantzidis et al. 2004;Bryan et al. 2013;Suto et al. 2017), which better agrees with our results based on the uniform priors than the purely N -body simulations. With upcoming large cluster surveys to dramatically improve statistics, this work demonstrates an opportunity to constrain the effects of baryonic feedback on the halo shape by using gravitational lensing.
Conversely, we do not have informative constraints on the second axis ratio q b in Triaxial modeling with uniform priors (see Table 3 and the right panel of Figure 1): We can only constrain the lower bound of q b for the full sample as 0.73, 0.63 and 0.50 at the 1σ, 2σ and 3σ levels, respectively. Introducing the B15 priors in Triaxial+B15 modeling gives q b = 0.636 +0.078 −0.045 , compared to the expectation from the B15 prior distribution, q b = 0.55 +0.14 −0.11 . Taking the covariance between q a and q b into account, the overall discrepancy between our lensing data and the B15 priors is at the 2.5σ level. Therefore, we do not have statistically significant evidence for a strong tension between the lensing data and the B15 simulations.
Additionally, we show in Figure 9 joint ensemble constraints on the axis ratios for the low-mass and high-mass subsamples. We observe that (1) the discrepancy between the Triaxial modeling and Triaxial+B15 modeling is smaller for the high mass samples, and (2) the constraints are significantly stronger for the high mass sample, suggesting that the weak constraints on the shape parameters for the full sample are likely due to the inclusion of the low mass clusters.
We note that we currently do not have compelling constraints on the intrinsic shape (especially q b ) of clusters based on the lensing data alone (using uninformative uniform priors). Nevertheless, we observe a marginal discrepancy between the lensing data and simulations, which can be better examined with a large statistical sample of clusters.

Geometrical Factor to Mass Relations
We constrain the geometrical factor to mass scaling relation for our full sample of 20 clusters from Triaxial and Tri-axial+B15 modeling as with an intrinsic scatter D fgeo|M200c of < 0.34 (2σ upper bound) and < 0.26 (2σ upper bound), respectively. The geometrical factor f geo is a derived quantity from the posterior distributions of the triaxial NFW parameters according to Equation (16). Specifically, f geo is defined as the ratio of the line-of-sight half length of a triaxial ellipsoid to the geometric-mean scale-radius of its isodensity contour projected on the sky. Therefore, it represents the degree of line-of-sight elongation of the mass distribution. A geometrical factor greater (smaller) than unity indicates a line-of-sight excess (deficit) of mass structure.
We show the results of the geometrical factor to mass scaling relations in Figure 10. Although the geometrical factor is a very noisy quantity for individual clusters, the ensemble behavior from the best-fit scaling relations suggests no significant deviation from random orientations (i.e., f geo is consistent with unity within the quoted 1σ uncertainties) regardless of the shape priors (uniform or B15). We find no significant evidence of a dependence of f geo on cluster mass. A mild trend at the 2σ level is found when the B15 priors are employed.
We stress that, given the fact that lensing can only probe the integrated mass along the line of sight, we do not have a sensitivity to the line-of-sight elongation of clusters using lensing data alone (Dietrich et al. 2014). Hence, we must rely on external information (e.g., X-ray/SZE data or simulation priors) to better constrain the orientation of clusters. With the B15 priors applied on the axis ratios, we extract the biweight center location of the marginalized posterior distribution of cos(θ) for each cluster (Section 5). The distribution of cos(θ) spans a wide range between 0.37 and 0.70, with a median value of cos(θ) = 0.55 and a typical biweight scale of ≈ 0.30. This corresponds to a mean inclining angle of cos −1 ( cos(θ) ) ≈ 57 deg, suggesting that the orientations of our sample are nearly random (i.e., no orientation bias). This is consistent with our results on the f geo -M relation (Table 4) regardless of the chosen sample, and in line with the theoretical expectation for the X-ray-selected CLASH clusters (Meneghetti et al. 2014). It is worth mentioning that a positive bias at a level of 3%-6% was suggested in the mass estimates of stacked weak-lensing measurements for optically selected clusters (Dietrich et al. 2014), while there is no clear indication of orientation bias for our clusters that are largely selected by their X-ray properties.

Triaxiality to Mass Relations
The degree of triaxiality T is a quantity derived from the posterior distributions of the intrinsic axis ratios (Equation (17)). A prolate mass distribution (i.e., q a = q b ) has T = 1, while a oblate shape (i.e., q a < q b = 1) has T = 0.
We stress again that we can only constrain the lower bound of the second axis ratio q b and thus the upper bound on the degree of triaxiality T from Triaxial modeling when using uniform priors. Accordingly, we only present the 2σ upper bound for the results from Triaxial modeling. The 2σ upper bound on the T -M relation from Triaxial modeling is with an intrinsic scatter of < 0.13 (2σ upper bound). We show the results of the T -M relation as well as the individual cluster constraints in Figure 11.
In Figure 11, a clear offset in the normalization A T is seen between the Triaxial and Triaxial+B15 modeling results: The degree of triaxiality is constrained as T ≈ 0.79 ± 0.03 at the pivot mass M 200c = 10 15 M h −1 using the B15 priors, and the offset in the normalization relative to the Triaxial results is at the ≈ 3σ level. This discrepancy is strongly driven by the fact that we can only constrain the lower bound of the intermediate-to-major axis ratio q b from Triaxial modeling, resulting in a nearly flat distribution of T . This can be further seen in Figure 12, where we plot T against cluster mass using the posteriors joint ensemble modeling (full, Xray-selected, and high-magnification samples). Without the B15 priors, T is essentially unconstrained by the lensing data alone. On the other hand, employing the B15 priors gives T ≈ 0.8, implying a prolate configuration of the CLASH clusters. We find that the posterior constraints on the mass slope are not sensitive to the chosen prior (B T ≈ 0.07), although the uncertainties are too large to claim a significant mass dependence. Even though the CLASH clusters exhibit non-spherical shapes, we echo that spherical symmetry is a well-validated assumption in estimating the cluster mass and concentration (see Section 5.2).

CONCLUSIONS
In this paper, we have combined the wide-field weaklensing mass maps obtained by Umetsu et al. (2018) Figure 12. Constraints on the degree of triaxiality T and cluster mass M 200c from joint ensemble modeling of different cluster subsamples. Using uninformative uniform priors, the constraints on T from Triaxial modeling are shown in gray (yellow, light blue) for the full (X-ray selected, highmagnification) sample. Using the B15 priors, the constraints on T are shown in black (red, dark blue) for the full (X-ray selected, high-magnification) sample.
central CLASH-HST lensing constraints (Zitrin et al. 2015;Umetsu et al. 2016) to perform three-dimensional modeling of the intrinsic mass distribution for a sample of 16 Xray-selected and 4 high-magnification clusters targeted by the CLASH survey. These clusters span a mass range of 4 × 10 14 M h −1 M 200c < 20 × 10 14 M h −1 and a redshift range of 0.18 < z < 0.7, with a median redshift of ≈ 0.377. Specifically, we have forward-modeled these lensing data sets assuming a triaxial NFW halo in a Bayesian framework, and constrained the mass M 200c , concentration c 200c , intrinsic axis ratios (minor-to-major ratio q a and intermediate-to-major ratio q b ), and orientation angles. For the case of triaxial modeling, we considered either uniform priors on all parameters or, alternatively, informative shape priors on q a and q b taken from cosmological N -bod simulations (Bonamigo et al. 2015). We have also performed spherical NFW modeling with the M 200c and c 200c parameters, while fixing the other parameters to the spherical configuration. We performed Bayesian modeling of both individual and ensemble clusters using the combined weak and strong lensing data sets. With the observed constraints on each individual cluster, we have investigated mass-scaling relations of the halo concentration c 200c , the minor-to-major axis ratio q a , the geometrical factor f geo , and the degree of triaxiality T .
Our results show that the halo concentration decreases with increasing mass, as found by previous work assuming spherical symmetry. The results are insensitive to both the assumed cluster geometry (spherical or triaxial) and the chosen shape prior. However, we find that the selection of clusters plays an important role. The four high-magnification CLASH clusters (M 200c ≈ 1.5 × 10 15 M h −1 ) have a significantly low concentration, compared to the X-ray-selected CLASH subsample. For the 16 X-ray-selected CLASH clusters, we find a mean concentration of c 200c = 4.82 ± 0.30 at the pivot mass M 200c = 10 15 M h −1 , and it scales as M −0.36±0.11 200c according to triaxial modeling with uniform priors. On the other hand, jointly modeling this subsample assuming a triaxial NFW halo, we obtain joint ensemble constraints of c 200c = 3.87 +0.76 −0.11 and M 200c = (0.99 ± 0.11) × 10 15 M h −1 . Our results are consistent with previous work from observations and simulations. A better agreement can be achieved if accounting for the sample selection, geometry of clusters, the background cosmology adopted, and the choice of the priors. The results from triaxial modeling are in good agreement with those from spherical modeling within the errors, suggesting that the assumption of spherical symmetry is well validated in estimating the overall mass profile of the CLASH clusters, even though we do observe evidence of aspherical shapes of clusters.
When using uninformative uniform priors, we obtain joint ensemble constraints on the minor-to-major axis ratio of q a = 0.652 +0.162 −0.078 at the typical mass M 200c = 10 15 M h −1 for our full sample of 20 CLASH clusters. Conversely, only a lower bound on the intermediate-to-major axis ratio q b is obtained as q b > 0.632 at the 2σ level. Using the B15 priors gives improved joint ensemble constraints of q a = 0.499 +0.018 based on the results with the uniform and B15 priors, respectively. Overall, no significant tension is seen between the lensing data and the numerical predictions from B15 in terms of the intrinsic cluster axis ratios. Our results suggest that we currently do not have strong constraints ( 3σ) on the intrinsic shape of clusters based on gravitational lensing alone, unless informative shape priors are employed.
We have also studied the geometrical factor f geo , an indicator of the line-of-sight elongation of cluster mass distributions. We find that our sample shows no significant deviation from isotropic, random orientations: f geo = 0.93 ± 0.07 and f geo = 0.96 ± 0.07 based on the uniform and B15 priors, respectively. The results are in agreement with the theoretical expectation for the CLASH clusters dominated by relaxed systems (Meneghetti et al. 2014). No significant mass dependence of f geo is seen regardless of the chosen prior. The average inclination angle θ between the cluster major axis and the line of sight is θ ≈ 57 deg, suggesting again that there is no evidence of orientation bias for the CLASH clusters.
Finally, the degree of triaxiality for our sample is constrained as T = 0.79 ± 0.03 at the pivot mass M 200c = 10 15 M h −1 using the B15 priors, suggesting that the geometry of our sample is close to the prolate configuration (T = 1) rather than the oblate one (T = 0). However, we stress that this result strongly depends on the choice of the shape priors. With the uniform priors, we can only constrain the upper bound of T as T < 0.69 at the 2σ level. No significant mass trend of triaxiality is observed in our sample regardless of the priors.
We have presented a statistical three-dimensional analysis of a sizable sample of high-mass galaxy clusters using highquality weak and strong lensing data sets. We observed clear evidence of a departure from spherical symmetry in our sample of 20 clusters. On the other hand, we find that the assumption of spherical symmetry is still well validated in terms of determining the overall mass profile (such as concentration and mass) if the sample is free from orientation bias. We find increasingly promising constraints on the intrinsic shape parameters with increasing halo mass or with increasing size of cluster sample. Therefore, it will be very desirable to extend this type of analysis to large, well-controlled samples of clusters defined from ongoing large-sky surveys, such as the Subaru Hyper Suprime-Cam survey (Miyazaki 2015) and the Dark Energy Survey (Flaugher 2005). ACKNOWLEDGMENTS We thank Tetsu Kitayama and Daichi Suto for providing us with simulated data points that are presented in Figure 8. We thank the anonymous referee for providing constructive suggestions that lead to the improvement of this paper. KU acknowledges support from the Ministry of Science and Technology of Taiwan  This paper made use of the code colossus (Diemer 2017) and the packages from Bocquet & Carter (2016) and Hinton (2016) for plotting. This work made use of the IPython package (Pérez & Granger 2007), SciPy (Jones et al. 2001), TOPCAT, an interactive graphical viewer and editor for tabu-lar data (Taylor 2005), matplotlib, a Python library for publication quality graphics (Hunter 2007