Line-of-sight Elongation and Hydrostatic Mass Bias of the Frontier Fields Galaxy Cluster Abell 370

We present a detailed weak-lensing and X-ray study of the Frontier Fields galaxy cluster Abell 370, one of the most massive known lenses on the sky, using wide-field BRz Subaru/Sprime-Cam and Chandra X-ray observations. By combining two-dimensional (2D) shear and azimuthally averaged magnification constraints derived from Subaru data, we perform a lensing mass reconstruction in a free-form manner, which allows us to determine both radial structure and 2D morphology of the cluster mass distribution. In a triaxial framework assuming a Navarro-Frenk-White density profile, we constrain the intrinsic structure and geometry of the cluster halo by forward modeling the reconstructed mass map. We obtain a halo mass $M_{200}=(1.54 \pm 0.29)\times 10^{15}h^{-1}M_\odot$, a halo concentration $c_{200}=5.27 \pm 1.28$, and a minor-major axis ratio $q_a=0.62 \pm 0.23$ with uninformative priors. Using a prior on the line-of-sight alignment of the halo major axis derived from binary merger simulations constrained by multi-probe observations, we find that the data favor a more prolate geometry with lower mass and lower concentration. From triaxial lens modeling with the line-of-sight prior, we find a spherically enclosed gas mass fraction of $f_\mathrm{gas}=(8.4 \pm 1.0)\%$ at $0.7h^{-1}$ Mpc $\sim 0.7r_{500}$. When compared to the hydrostatic mass estimate from Chandra observations, our triaxial weak-lensing analysis yields spherically enclosed mass ratios of $1-b \equiv M_\mathrm{HE}/M_\mathrm{WL} = 0.56 \pm 0.09$ and $0.51 \pm 0.09$ at $0.7h^{-1}$ Mpc with and without using the line-of-sight prior, respectively. Since the cluster is in a highly disturbed dynamical state, this represents the likely maximum level of hydrostatic bias in galaxy clusters.


INTRODUCTION
Galaxy clusters can provide a range of valuable information from the physics driving structure formation to the nature of dark matter and dark energy. Their matter content reflects that of the universe: ∼ 85% dark matter and ∼ 15% baryons, with ∼ 90% of the baryons residing in the hot intracluster medium (ICM). Determining the evolution of the abundance of rare massive clusters provides powerful cosmo-logical constraints, especially on the matter density parameter, Ω m , and the amplitude of linear density fluctuations, σ 8 (e.g., see Mantz et al. 2015, and references therein). Conversely, an accurate determination of the total mass of galaxy clusters using direct mass probes, such as weak gravitational lensing, is essential to harness the full potential of cluster cosmology (e.g., Pratt et al. 2019;Chiu et al. 2021;Tam et al. 2022).
In the context of the standard Λ cold dark matter (ΛCDM) model, galaxy clusters are non-spherical in shape and better approximated as triaxial halos (Jing & Suto 2002), with a preference for prolateness over oblateness and preferentially aligned with surrounding filaments (Bett et al. 2007).
Cluster-scale halos can be characterized as triaxial ellipsoids with a typical minor-to-major axis ratio of 0.4-0.5 (Bonamigo et al. 2015), where more massive objects tend to be more prolate. Thus, while the intrinsic shape and orientation of galaxy clusters contain unique cosmological information (Okumura & Taruya 2020), they can also introduce significant scatter and bias in cluster mass estimates due to the unknown orientation of cluster halos. In particular, gravitational lensing is sensitive to such projection effects (e.g., Becker & Kravtsov 2011).
According to cosmological N -body simulations, "superlens" clusters characterized by large Einstein radii (θ Ein 30 for a source redshift of z s = 2) represent the most lensing-biased population of clusters, with their major axis preferentially aligned with the observer's line of sight (Hennawi et al. 2007;Oguri & Blandford 2009;Meneghetti et al. 2010aMeneghetti et al. , 2011. A statistical bias in favor of prolate structure pointed close to the observer arises, because such a halo geometry can boost the projected mass density and hence the lensing signal. In particular, major mergers of two clusters colliding nearly along the line of sight provide a possible mechanism for producing a powerful superlens (e.g., see the case of Cl0024+1654; Umetsu et al. 2010).
Abell 370 (hereafter A370; a.k.a. PSZ2 G172.98−53.55) at z = 0.375 is known as a prominent strong lens with an Einstein radius of θ Ein ≈ 34 (for z s = 2; see Table 1) and is the first galaxy cluster where gravitational lensing has been observed in the form of a giant luminous arc (Soucail et al. 1987(Soucail et al. , 1988. A370 is also among the most massive clusters based on weak gravitational lensing, with an estimated virial mass of M vir ∼ 2×10 15 h −1 M Hoekstra et al. 2015, all relevant symbols are defined at the end of this section). Because of its large projected mass and exceptional lensing strength, A370 was selected as one of the six Hubble Frontier Fields (Lotz et al. 2017) and has recently been targeted by the Beyond Ultra-deep Frontier Fields and Legacy Observations (BUFFALO; Steinhardt et al. 2020) with the Hubble Space Telescope (HST), which expands the area coverage of the Frontier Fields in optical and near-infrared pass bands.
Lensing studies of A370 reveal a bimodal mass distribution in the core elongated in the north-south direction (e.g., Kneib et al. 1993;Umetsu et al. 1999;Richard et al. 2010;Diego et al. 2018;Lagattuta et al. 2017Lagattuta et al. , 2019Ghosh et al. 2021). Strait et al. (2018) combined strong and weak lensing constraints from Frontier Fields imaging and spectroscopic observations to reconstruct the central mass distribution of A370. Their mass map shows two dominant peaks associated with the two brightest cluster galaxies (BCGs), with the northern peak much less concentrated than the southern one and slightly offset from the stellar mass distribution. These bimodal and offset features are often an indication of recent major merger activity (e.g., Bradač et al. 2008).
In contrast to its extreme mass and exceptional lensing properties, A370 is intriguingly faint in both X-ray and Sunyaev-Zel'dovich effect (SZE) signals and does not follow the X-ray/SZE observable-mass scaling relations (see Czakon et al. 2015). The X-ray brightness distribution revealed from Chandra observations is highly elongated in the north-south direction, showing a disturbed morphology with the brightest X-ray peak located about halfway between the two BCGs (Molnar et al. 2020). The irregular morphology in X-ray emission with a large elongation similar to that of the mass distribution is a strong indication that the cluster is far from hydrostatic equilibrium (Lee & Suto 2003).
Recently, N -body hydrodynamical simulations of binary cluster mergers constrained by lensing, X-ray, SZE, and optical spectroscopic observations suggest that A370 is a massive post-major merger viewed after the second core passage in the infalling phase, just before the third core passage (Molnar et al. 2020). In this post-collision phase, the gas has not settled into the gravitational potential of the cluster, which explains the faintness of the X-ray and SZE signals. These results also suggest that the mass distribution of A370 is highly elongated along the current direction of the collision axis, which is oriented close to the line of sight in their best-matching simulation.
In this paper, we present a detailed weak-lensing and X-ray study of A370 using wide-field BR C z imaging taken with Suprime-Cam on the Subaru telescope and high-quality data from the Chandra X-ray Observatory. The primary aims of this paper are to obtain an accurate inference of the threedimensional (3D) mass model of A370 from a full triaxial analysis of two-dimensional (2D) weak-lensing data and to determine the level of hydrostatic mass bias and the gas mass fraction as a function of cluster radius. The key for this study is to perform a lensing mass reconstruction in an unbiased manner, from which to constrain both radial structure and 2D morphology of the cluster mass distribution. To this end, we perform an improved joint shear and magnification analysis of 2D Subaru weak-lensing data, revisiting our earlier onedimensional (1D) work presented in Umetsu et al. (2011). Since A370 is extremely massive and in a highly disturbed dynamical state, this analysis will provide a constraint on the likely maximum level of the hydrostatic bias expected in galaxy clusters. This paper is organized as follows. Section 2 describes the basic theory of cluster weak lensing and outlines the methodology used to reconstruct the cluster mass distribution. Section 3 describes details of the Subaru observations, reduction procedures, and weak-lensing analysis. Section 4 presents the results of our mass reconstruction, followed by our triaxial modeling in Section 5. Section 6 describes the X-ray data analysis. Section 7 compares the weak-lensing and Chandra mass profiles. Finally, a summary is given in Section 8.
We denote the critical density of the universe at a particular redshift z as ρ c (z) = 3H 2 (z)/(8πG), with H(z) the Hubble function. We generally denote spherical and projected radii from the cluster center as r and r ⊥ , respectively, and reserve the symbol R for ellipsoidal cluster radii. We adopt the standard notation M ∆ (or M ∆m ) to denote the mass enclosed within a sphere of radius r ∆ (or r ∆m ) within which the mean overdensity equals ∆ (or ∆ m ) times ρ c (z) (or the mean background density ρ m (z)). We compute the virial mass and radius, M vir and r vir , using an expression for ∆ vir based on the spherical collapse model (Bryan & Norman 1998). For its ellipsoidal counterpart R ∆ , see Section 5.2. We use "log" to denote the base-10 logarithm and "ln" to denote the natural logarithm. All quoted errors are at the 1σ confidence level unless otherwise stated. The AB magnitude system is used throughout.

Basics of Galaxy-Cluster Weak Lensing
The effects of weak gravitational lensing on background galaxies are characterized by the convergence, κ, and the shear with spin 2 rotational symmetry, γ = |γ|e 2iφγ (for reviews, see Bartelmann & Schneider 2001;. In this work, we closely follow the notation of .
The lensing convergence κ alone causes an isotropic magnification of galaxy images and it is defined as the surface mass density Σ of a lens in units of the critical surface density for gravitational lensing, κ = Σ/Σ cr , where with c the speed of light, G the gravitational constant, and D l (z l ), D s (z s ), and D ls (z l , z s ) the observer-lens, observersource, and lens-source angular diameter distances, respectively. The dimensionless factor β(z l , z s ) describes the geometric lensing efficiency as a function of lens redshift z l and source redshift z s . The shear and convergence thus depend on (z l , z s ) as well as on the image position θ. Einstein radius ( ) 33.9 ± 1.1 for zs = 2 NOTE-The optical cluster center is at the midpoint of the two BCGs (Lotz et al. 2017;Steinhardt et al. 2020). Units of right ascension are hours, minutes, and seconds, and units of declination are degrees, arcminutes, and arcseconds. The cluster velocity dispersion is derived from spectroscopic observations of Lagattuta et al. (2022) in the core region of the cluster (see also Lagattuta et al. 2019). The X-ray emission centroid is determined from a 2D β-model fit to Chandra X-ray observations (see Section 6). The average temperature of the cluster is measured from the Chandra X-ray spectrum in the radial range ∈ [50, 500] h −1 kpc centered on the X-ray centroid. The Einstein radius is constrained by detailed strong lens modeling by Kawamata et al. (2018).
The gravitational shear field γ(θ) is directly observable from image ellipticities of background galaxies in the weaklensing regime, |κ| 1, |γ| 1. The shear and convergence fields are related by with D(θ) the complex kernel D(θ) = (θ 2 2 − θ 2 1 − 2iθ 1 θ 2 )/(π|θ| 4 ). The key observable for weak shear lensing in the subcritical regime is the complex reduced shear, which remains invariant under the global transformation κ(θ) → λκ(θ) + 1 − λ and γ(θ) → λγ(θ) with an arbitrary constant λ = 0 (for a fixed source redshift z s ). This is referred to as the mass-sheet degeneracy (Schneider & Seitz 1995). This degeneracy can be broken or alleviated, for example, by measuring the magnification factor µ in the subcritical regime, We note that in practical applications to magnification bias measurements, this degeneracy can be lifted only if the unlensed mean source background density is known or can be estimated from the data (see Section 2.5). The magnification factor µ transforms as µ(θ) → λ −2 µ(θ). For simplicity of notation, we often use the inverse magnification ∆ µ = µ −1 . The reduced shear g 1,2 can be decomposed into the tangential component g + = γ + /(1 − κ) and the 45 • -rotated cross-shear component g × = γ × /(1 − κ) with respect to a given reference point. The tangential shear γ + (θ) averaged around a circle of projected radius θ is related to the excess surface mass density ∆Σ(θ) through the following identity: where Σ(θ) is the azimuthally averaged surface mass density at radius θ and Σ(< θ) is the average surface mass density interior to θ. The azimuthally averaged cross-shear γ × (θ) is expected to vanish if the signal is due to weak lensing.

Source Redshift Distribution
We consider a population of source galaxies characterized by their mean (unlensed) redshift distribution, N (z). In general, we use different magnitude, color, size, and quality cuts in background selection for measuring the shear and magnification effects. This results in different N (z) for shear and magnification. The source-averaged mean lensing depth β n X (n = 1, 2, . . . ) for a given population (X = g, µ) is (6) In general, N (z) for a given lens can include foreground galaxies. The contribution from unlensed objects with β = 0 is thus taken into account in the calculation of β n X . We introduce the relative lensing strength of a given source population W X = β X /β ∞ with β ∞ ≡ β(z l , z s,∞ ) defined relative to a reference source in the far background at redshift z s,∞ (Bartelmann & Schneider 2001). We use a reference redshift of z s,∞ = 20000, which was adopted in the CLASH program (Umetsu et al. 2014;Merten et al. 2015). The associated critical surface density is Σ cr,∞ (z l ) = c 2 /(4πGD l )β −1 ∞ . Hereafter, we use the far-background fields κ ∞ (θ) and γ ∞ (θ) to describe the projected mass distribution of the cluster.

Reduced Shear Field
We use the reduced shear field as the primary constraint from our weak-lensing observations. The source-averaged reduced shear g n = g(θ n ) is measured from shape measurements of background galaxies onto a regular grid of N pix pixels (n = 1, 2, . . . , N pix ) as −1 (10) where S(θ, θ ) is a spatial window function, g (k) is an estimate of g(θ) for the kth galaxy at θ (k) , and w g(k) is its statistical weight, w g(k) = 1/(σ 2 g(k) + α 2 g ), with σ 2 g(k) the error variance of g (k) . The α g parameter is set to a typical value of the shear dispersion σ g = 0.4 found in Subaru weak-lensing observations (e.g., Umetsu et al. 2009Umetsu et al. , 2014. The source-averaged expectation (denoted by a hat symbol) for the observable g n (Equation (10)) is given by (Seitz & Schneider 1997;Umetsu et al. 2015) where W g is the source-averaged relative lensing strength (see Section 2.2) and f W,g ≡ W 2 g / W 2 g = β 2 g / β 2 g is a dimensionless correction factor of the order unity. The error variance σ 2 g,n for g n is expressed as 1 Because of the choice of the basis function, an unbiased extraction of the mass coefficients {Σn} N pix n=1 is possible by performing a spatial integral of κ∞(θ) over a certain area. Such operations include spatial smoothing, azimuthal averaging for the radial profile extraction, and fitting with smooth parametric functions.

Flux Magnification Bias
Lensing magnification influences the observed surface number density of background sources behind lenses, enhancing the apparent source fluxes and expanding the area of sky. The former effect increases the source counts above the limiting flux, whereas the latter reduces the effective observing area in the source plane, thus decreasing the observed source counts per unit solid angle. The net effect, known as magnification bias , depends on the intrinsic slope of the source luminosity function.
Deep multi-band photometry can be used to sample the faint end of the luminosity function for quiescent galaxies at z ∼ 1 (Ilbert et al. 2010). The effect of magnification bias for such a population is dominated by the geometric area distortion, because there are relatively few fainter objects that can be magnified into the flux-limited sample. This effect results in a net depletion of source counts (e.g., Broadhurst et al. 2005;Ford et al. 2012;Coe et al. 2012;Radovich et al. 2015;Ziparo et al. 2016). The key advantage in the regime of density depletion, at the expense of deep multi-band imaging, is that the effect is not sensitive to the exact form of the source luminosity function (Umetsu et al. 2014).
In cluster-galaxy weak lensing, the change in magnitude δm = 2.5 log 10 µ due to magnification is small compared to the range over which the slope of the luminosity function varies. The source counts can thus be locally approximated by a power law at a given cutoff magnitude m cut . Following Umetsu et al. (2014Umetsu et al. ( , 2016, we interpret the source-averaged magnification bias as (see Appendix A) where the expected value of a weak-lensing observable is denoted by a hat symbol, N µ (< m cut ) = ∞ 0 dz N µ (z| < m cut ) is the unlensed mean counts per cell, W µ is the source-averaged relative lensing strength (Section 2.2), and s is the logarithmic count slope evaluated at the cutoff magnitude m cut , Since a given magnitude cut corresponds to different luminosities at different source redshifts, different source populations probe different regimes of magnification bias . A net depletion (or enhancement) of source counts results when s < 0.4 (or > 0.4). In this study, we measure the density depletion signal using a source population with s < 0.4. For simplicity, we write N µ (θ) = N µ (θ| < m cut ) and N µ = N µ (< m cut ). In the weak-lensing limit The covariance matrix Cov[N (θ m ), N (θ n )] ≡ (C N ) mn of the counts in cell includes the clustering and Poisson contributions, (C N ) mn = (N µ ) 2 ω mn + δ mn N µ (θ m ) (Hu & Kravtsov 2003) with ω mn the cell-averaged angular correlation function of source galaxies. As discussed in detail by Umetsu et al. (2015), C N can be approximated as with δN 2 µ (θ m ) the variance of the mth counts. To overcome this noise, we azimuthally average the observed counts N µ (θ) in a set of clustercentric annuli and calculate the surface number density profile {n µ,i } N bin i=1 of background galaxies as (Umetsu et al. 2015 where Ω cell is the solid angle of each cell and P im = ( m A mi ) −1 A mi is the projection matrix normalized by m P im = 1; A mi denotes the area fraction of the mth cell lying within the ith radial bin and f mask,i is the mask correction factor for the ith bin, with f m the masked area fraction in the mth cell due to saturated objects, foreground galaxies, and cluster members. We use Monte Carlo integration to calculate the area fractions A mi for individual cells (Umetsu & Broadhurst 2008). The Poisson and clustering contributions to the uncertainty in n µ,i are Additionally, we account for systematic uncertainties in the magnification analysis. In Appendix C, we describe the procedure used to estimate the uncertainties σ µ,i in n µ,i .

Mass Reconstruction Algorithm
A practical limitation of the shear-only lensing analysis is the inherent mass-sheet degeneracy, which can be alleviated by using the complementary combination of shear and magnification (Schneider et al. 2000;Umetsu & Broadhurst 2008;Rozo & Schmidt 2010). Measuring the two complementary effects also enables us to check the internal consistency of weak-lensing measurements (Umetsu et al. 2014). Moreover, obtaining accurate mass maps has the important advantage of being able to identify local mass structures and to directly compare them with multiwavelength observations.
In this work, we use the mass inversion algorithm developed by Umetsu et al. (2015), who generalized the cluster lensing mass inversion (CLUMI) code of Umetsu (2013) into a 2D description of the pixelized mass distribution. This freeform algorithm combines a 2D shear pattern (g 1 (θ), g 2 (θ)) with azimuthally averaged measurements of magnification bias {n µ,i } N bin i=1 . The latter imposes a set of azimuthally averaged constraints on Σ(θ) to effectively break the mass-sheet degeneracy. The CLUMI-2D algorithm takes full account of the nonlinear subcritical regime of lensing.
Given a model λ and observed data d, the Bayes' theorem states that the joint posterior probability P (λ|d) is proportional to the product of the likelihood L(λ) ≡ P (d|λ) and the prior probability P (λ). In our inversion problem, λ is a signal vector containing the pixelized mass coefficients m = {Σ n } Npix n=1 (Section 2.3) and calibration nuisance parameters c (see Section 2.6.3), so that λ ≡ (m, c).
We express the joint likelihood function L for combined weak-lensing data d as a product of the two separate likelihood functions, L = L g × L µ with L g and L µ the likelihood functions for shear and magnification, respectively. We assume that the observational errors follow a Gaussian distribution, so that L ∝ exp(−χ 2 /2), with χ 2 the standard misfit statistic.

Shear Log-likelihood Function
The log-likelihood function l g ≡ − ln L g for 2D shear data is written as Umetsu et al. 2015Umetsu et al. , 2018 where g α,m (λ) is the theoretical expectation for g α,m = g α (θ m ) and (W g ) αβ,mn is the shear weight matrix, Here, M m is a mask weight, defined such that M m = 0 if the mth cell is masked out and M m = 1 otherwise, and C g is the shear covariance matrix given by Equation (13).

Magnification Log-likelihood Function
The log-likelihood function for magnification bias data l µ ≡ − ln L µ is written as (Umetsu et al. 2015(Umetsu et al. , 2018 where n µ,i (λ) is the theoretical expectation for n µ,i and (W µ ) ij is the magnification weight matrix, W µ = C −1 µ , with C µ the corresponding covariance matrix, where the diagonal errors σ µ,i (i = 1, 2, . . . , N bin ) are given by Equation (C4). The l µ function sets azimuthally integrated constraints on Σ(θ), providing the otherwise unconstrained normalization of Σ(θ) over a set of concentric annuli where magnification measurements are obtained. No assumption is made about the azimuthal symmetry of Σ(θ) in our analysis. We use Monte Carlo integration to compute the projection matrix P im (Equation (18)) of size N bin × N pix , which is necessary to predict { n µ,i (λ)} N bin i=1 for a given model λ = (m, c).

Calibration Parameters
In our joint likelihood analysis, we account for the uncertainty in the observational calibration parameters, with W g = β g /β ∞ , W µ = β µ /β ∞ , and f W,g = β 2 g / β 2 g (Section 2.2). To this end, we include Gaussian priors on c defined with mean values and uncertainties directly estimated from data. Specifically, we use for each parameter the mean and uncertainty estimated from the Suprime-Cam data (Tables 3 and 4) as the center and dispersion of the prior distribution, respectively.

Best-fit Solution and Covariance Matrix
The log-posterior function F (λ) = − ln P (λ|d) is written as a linear sum of the log-likelihood and log-prior (or quadratic penalty) terms. The global maximum of the joint posterior probability distribution function (PDF) over λ is found by minimizing F (λ) with respect to λ. We use the conjugate-gradient algorithm (see Press et al. 1992) to find the global solution λ. We employ an analytic expression for the gradient function ∇F obtained in the nonlinear, subcritical regime (see Appendix B of Umetsu et al. 2018).
The reconstructed mass pixels are correlated primarily because the relation between the shear and convergence is nonlocal (Equation (2)). Additionally, the effects of spatial averaging (Equation (14)) and cosmic noise due to projected uncorrelated large scale structure can produce a covariance between different pixels. In our analysis, the effects of correlated errors are modeled analytically. Specifically, we take into account the statistical and cosmic-noise contributions to the total covariance matrix C mn = Cov(Σ m , Σ n ) (m, n = 1, 2, . . . , N pix ) as where C stat is given by (C stat ) mn = F −1 mn with F the Fisher matrix evaluated at the best-fit solution λ = λ (see Appendix B of Umetsu et al. 2018), and (C lss ) mn = Σ 2 cr ξ lss (|θ m − θ n |) with ξ lss (|θ|) the cellaveraged two-point angular correlation function for the cosmic convergence field κ lss (θ) (Kaiser 1992). In this work, we approximate the pixel window function (e.g., Hu & White 2001) by a Dirac delta function centered at each pixel and compute the elements of the C lss matrix for a given source population (see Section 3.3), using the nonlinear matter power spectrum of Smith et al. (2003) for the base-ΛCDM model from Planck 2018 cosmic microwave background (CMB) anisotropy data in combination with CMB lensing (Planck Collaboration et al. 2020, see their Table 2).

SUBARU DATA AND WEAK-LENSING ANALYSIS
In this section, we describe our new weak-lensing analysis of A370 based on deep Suprime-Cam BR C z imaging.
In this study, we analyze the Suprime-Cam data using our reduction and analysis pipelines presented in Umetsu et al. (2014, see also Umetsu et al. 2015, who performed a homogeneous weak-lensing analysis of 20 high-mass clusters targeted by the CLASH program. As detailed in Section 3.1, the present analysis further implements an improved astrometry based on the Gaia mission (Gaia Collaboration et al. 2021).

Data and Photometry
We analyze deep BR C z images centered on A370 observed with the wide-field camera Suprime-Cam (34 × 27 ; Miyazaki et al. 2002) mounted at the prime focus of the 8.2 m Subaru Telescope. Details of the Subaru/Suprime-Cam observations are summarized in Table 2. We use existing archival data taken from SMOKA. 2 The R C -band images 2 http://smoka.nao.ac.jp For the weak-lensing shape measurements (Section 3.2), we use the R C -band data, which have the best image quality in our data sets. Figure 1 shows a Suprime-Cam BR C z composite color image of the cluster field, produced using the TRILOGY software (Coe et al. 2012). The image is overlaid by mass contours from our weak-lensing analysis (Section 4) and X-ray brightness contours from our Chandra analysis (Section 6).
The image reduction pipeline used in this study derives from Nonino et al. (2009). Several modifications and improvements have been applied to the original pipeline (e.g., Umetsu et al. 2012Umetsu et al. , 2014Umetsu et al. , 2015Medezinski et al. 2013Medezinski et al. , 2016. In particular, it has been optimized separately for accurate photometry and shape measurements. For multi-band photometry, standard reduction steps include bias subtraction, super-flat-field correction, and masking of saturated star trails and other artifacts. Photometric catalogs are created using SEXTRACTOR (Bertin & Arnouts 1996) in the dualimage mode on PSF-matched images, with the Suprime-Cam z band image as the detection image.
An accurate astrometric solution was derived with the SCAMP software (Bertin 2006) using Gaia Data Release 2 (DR2; Gaia Collaboration et al. 2016Collaboration et al. , 2018 as an external reference catalog. An astrometric solution has been obtained at the camera level using Gaia DR2 sources extracted from individual exposures for each CCD chip. This astrometric solution does not account for the proper motions of Gaia DR2 sources since the epoch of the Suprime-Cam observations. Comparing with the astrometric solution obtained from proper-motion-corrected source positions based on Gaia Early Data Release 3 (EDR3; Gaia Collaboration Figure 1. Subaru/Suprime-Cam BRCz composite color image centered on A370, overlaid with mass contours from our weak-lensing analysis (Section 4). The image is 15 × 15 in size. The mass map is smoothed with a Gaussian of 1.2 FWHM. The lowest contour level and the contour interval are ∆κ = 0.08. Also overlaid are logarithmically spaced X-ray brightness contours (cyan dashed) from Chandra observations in the 0.5-7 keV energy band (Section 6). The horizontal bar represents 1 h −1 Mpc at the cluster redshift. North is up and east is to the left. et al. 2021), we find a mean positional offset of ≈ 2-8 mas and an rms of ≈ 20-30 mas for the R C -band astrometry.
The SWARP software (Bertin et al. 2002) is used to stack individual exposures on a common World Coordinate System (WCS) grid with pixel scale of 0.2 . No point spread function (PSF) matching is applied. For each passband, we create a full stack of co-added images from which to measure source photometry. For the weak-lensing band (R C ), we additionally create two separate co-added images, each from different camera rotation angles (see Section 3.2).
Once the Subaru images had been combined into a full stack of co-added images, a catalog was then produced and matched directly to the corresponding Gaia DR2 sources to validate the astrometric properties of the full stack. A total of 428 sources from the full stack catalog were matched directly to Gaia DR2 sources, and the positional offsets were measured between each of these catalog sources and their matched Gaia DR2 counterparts. The resulting distribution of positional offsets displays well-behaved symmetry, with an rms uncertainty of 35 mas in R.A. and 33 mas in decl., demonstrating that the positional accuracy of the full stack is in good agreement with the accuracy of the Gaia DR2 alignment carried out on each of the individual single-exposure frames that were used to construct the full stack, as previously described.
Finally, after having verified that the positional uncertainties of the full stack catalog sources were comparable to those of all the individual single-exposure frames, this full stack catalog was then aligned to Gaia EDR3, to ensure that the absolute astrometry could be as up-to-date as possible. The astrometric difference between sources from the full-stack catalog (which was still on Gaia DR2) and the matched sources from Gaia EDR3 show excellent agreement, with only a small difference needing to be applied to place these sources onto Gaia EDR3, namely 2.1 mas in R.A. and 1.4 mas in decl., perhaps due to slight residual differences in proper motion corrections, and not significant compared to the rms uncertainties of 33-35 mas in the catalog source positions.
The photometric zero point for the Suprime-Cam z filter was calibrated against stars from the Pan-STARRS data release 1 (DR1) catalog (Flewelling et al. 2020). The zero points for the Suprime-Cam B and R C filters were derived by matching the stellar locus in the B − R C vs. R C − z diagram to the COSMOS2020 photometry (Weaver et al. 2022). These zero points were further refined by matching the color distributions in the B − R C vs. R C − z diagram between our Suprime-Cam data and the COSMOS2020 data. The magnitudes for galaxies were corrected for foreground Galactic extinction according to Schlegel et al. (1998). Full details of our photometric calibration are described in Appendix B.

Shape Measurement
We use our shape measurement pipeline based in part on the IMCAT package (Kaiser et al. 1995, KSB), with modifications incorporating several key improvements developed by Umetsu et al. (2010Umetsu et al. ( , 2014. In this work, we perform a weak shear analysis of A370 following the procedure of Umetsu et al. (2014).
Here we briefly summarize some of the main features and refer to Umetsu et al. (2014) for details of the analysis pipeline. We select isolated galaxy images for the shape measurement, reducing the impact of crowding and blending. After the rejection of close pairs, objects detected with low significance ν g < 10 are excluded from our analysis. Here ν g is the peak detection significance given by IMCAT's peakfinding algorithm. We select galaxies detected with high significance ν g 30 as a sample of shape calibrators, which is a subset of the target galaxy sample with ν g 10. The key feature of our analysis pipeline is that only those galaxies detected with sufficiently high significance, ν g 30, are used to model the isotropic PSF correction as a function of object size and magnitude . This calibration method is designed to minimize the inherent noise bias and was employed by the CLASH and LoCuSS collaborations in their cluster weak-lensing studies based on Subaru/Suprime-Cam data (Umetsu et al. 2014;. For the shape measurement, we separately stack R C -band images collected at two different camera rotation angles (Section 3.1). In this way, we do not smear individual exposures before stacking, so as not to degrade the weak-lensing signal derived from the shapes of galaxies (Umetsu et al. 2014(Umetsu et al. , 2015. A shape catalog is created for each camera rotation separately. The two subcatalogs are combined by properly weighting and stacking the calibrated distortion measurements for galaxies in the overlapping region (Umetsu et al. 2014, see their Section 4.3). All galaxies with usable shape measurements are matched to those in our BR C zselected background samples (see Section 3.3).
Our KSB+ implementation has been extensively tested and applied to ground-based observations of a large number of massive clusters including 20 CLASH clusters (Umetsu et al. 2014;Merten et al. 2015). Full details of our shear recovery test based on simulated Subaru/Suprime-Cam images are found in Umetsu et al. (2018, see their Appendix A). They found that the reduced shear signal g α (α = 1, 2) can be recovered with m α ≈ −0.05 of the multiplicative calibration bias and |c α | ∼ 10 −4 of the additive shear bias. Here the observed and true values of the reduced shear (g obs , g true ) are related by (Heymans et al. 2006;Massey et al. 2007), Accordingly, we include for each galaxy a shear calibration factor of 1/0.95 (g → g/0.95) to account for the residual multiplicative bias.

Background Galaxy Selection
Contamination of background galaxy samples by unlensed objects, when not accounted for, leads to a systematic underestimation of the true lensing signal. Inclusion of foreground galaxies produces a dilution of the lensing signal that is independent of the cluster radius. In contrast, the inclusion of cluster members dilutes the lensing signal more strongly at smaller cluster radii . A secure selection of background galaxies is thus essential for obtaining accurate cluster mass estimates from weak lensing (e.g., Medezinski et al. 2010;Okabe et al. 2013;Gruen et al. 2014).
In this study, we employ the color-color (CC) selection method of Medezinski et al. (2010) (see also Medezinski et al. 2018) to define background galaxy samples for measuring both shear and magnification effects. We use BR C z photometry from Subaru/Suprime-Cam, which spans the full optical wavelength range. The CC-cut selection method has been calibrated with evolutionary color tracks of galaxies (Kotulla et al. 2009;Medezinski et al. 2010Medezinski et al. , 2011 as well as with photometric-redshift (photo-z) catalogs from deep multiwavelength surveys such as COSMOS (Ilbert et al. 2009;Laigle et al. 2016;Weaver et al. 2022). For this purpose, we use the photometric properties and redshifts derived from the COSMOS2020 catalog (Weaver et al. 2022) based on the FARMER photometry using the LEPHARE code (Ilbert et al. 2006).
In Figure 2, we show the distribution of galaxies in the B − R C vs. R C − z plane obtained for the COSMOS field . Binned distribution of galaxies in color-color space for the COSMOS field (left) and A370 (right). Color boundaries of the blue and red background samples (left blue and lower-right red regions, respectively) selected on the basis of Subaru BRCz photometry are indicated in each panel. In the right panel, the green polygon marks the boundaries of our green sample dominated by red-sequence galaxies of A370 at z = 0.375. The middle peak with colors bluer than the cluster sequence shows the overdensity of foreground galaxies (see also Figure 3). The plots in both panels are limited to z < 26 mag, which is close to our detection limit (Table 2)  (left panel) and A370 (right panel). Similarly, Figure 3 shows the binned average photo-z distribution of COSMOS field galaxies in CC space. As demonstrated by Medezinski et al. (2010Medezinski et al. ( , 2011, the color region dominated by the foreground population is well defined in CC space as a clear overdensity Medezinski et al. (2010Medezinski et al. ( , 2011, we select two distinct populations that encompass the "red" and "blue" branches of background galaxies in CC space, each with typical redshift distributions N (z) peaked around z ∼ 1 and ∼ 2, respectively (see Medezinski et al. 2011;Lilly et al. 2007).

NOTE-Subaru
BRCz selected samples of background galaxies. We use the composite blue+red background sample for our weak-lensing shear analysis. The mean lensing depth β and the spread parameter f W = β 2 / β 2 for each source population are estimated using photometric redshifts from the COSMOS2020 FARMER catalog. The quantity z eff represents the effective source redshift of each sample, defined as β(z eff ) = β . The S/N is the detection significance for the tangential distortion profile g+(θ).
The color boundaries of our CC-cut samples are shown in Figure 2. The green polygon shown in the right panel marks the boundaries of the "green" sample comprising mostly the red-sequence galaxies of A370. We see in Figure 2 that the foreground peak for A370 is more pronounced compared to the COSMOS field. This enhancement is likely due to the contribution from bluer cluster members and galaxies in the surrounding regions (see Umetsu et al. 2012Umetsu et al. , 2015. To further reduce residual contamination by bright foreground objects, we apply bright magnitude cuts of z > 21 and 22 mag for the red and blue photometry samples, respectively (Medezinski et al. , 2018. These selection criteria yield a total of 31306 and 14954 galaxies for the red and blue NOTE-Lensing-cut and null-test samples of CC-red background galaxies selected for our weak-lensing magnification analysis. Apparent magnitude cuts are applied in the reddest CC-selection band available (z ) to avoid incompleteness near the detection limit ( Table 2). The mean lensing depth β for each source population is estimated using photometric redshifts from the COSMOS2020 FARMER catalog. The quantity z eff represents the effective source redshift corresponding to the mean lensing depth β of each sample, defined as β(z eff ) = β . The S/N is the detection significance for the magnification bias profile bµ(θ) = nµ(θ ) /nµ. photometry samples, respectively. For our shear analysis, we use the weak-lensing-matched, blue and red composite sample containing 16667 galaxies with usable R C shape measurements, corresponding to a mean surface number density of n g ≈ 21 galaxies arcmin −2 (Table 3).
To measure the magnification bias, we use magnitudelimited samples of CC-red galaxies. For the measurement of density depletion (Section 2.5), we define a "lensing-cut" sample by applying a faint magnitude cut of z = 25.6 mag to the red photometry sample (Table 4). 3 On the other hand, since the net effect of magnification bias is expected to vanish at s = 0.4 (Section 2.5), lensing magnification also provides a null test, which allows us to assess the level of residual bias that could be present in the measurement for the lensing-cut sample (see Chiu et al. 2020;. To this end, we define a "null-test" sample with a faint magnitude cut of z = 23.6 mag, at which the count slope is found to be s = 0.404 ± 0.069 (Table 4).

Lensing Depth Estimation
To assess the mean lensing depth ( β , β 2 ; see Equation (6)) for our CC-cut samples, we use the COSMOS2020 FARMER catalog with robust photometry and photo-z measurements. For each background sample, we apply the same cuts to the COSMOS multi-band photometry and obtain the redshift distribution N (z) of the selected galaxies. The lensing weight w g (see Section 2.4) is not taken into account in the depth estimation, because there are no photo-z estimates available for our background sample in the A370 field. 4 The 3 Our CC-cut selection is not expected to cause incompleteness at the faint end in the bluer filters (see Hildebrandt et al. 2012 for a general discussion) because we have deeper photometry in the bluer bands  and our CC-red galaxies are relatively blue in B − R C (Figure 2). 4 The effect of neglecting the lensing weight wg was checked using photo-z and shape catalogs based on Suprime-Cam 5-band imaging available for CLASH clusters at similar redshifts, z l ∈ [0.35, 0.40] (Umetsu et al. 2014). The fractional differences in the estimated β values are found to be < 1%, which is not significant compared to the total fractional uncertainty of 5% adopted in this study. resulting depth estimates for our shear and magnification analyses are summarized in Tables 3 and 4, respectively. 5 For a consistency check, we also make use of photo-z estimates from alternative aperture-based COSMOS2020 photometry, CLASSIC (Weaver et al. 2022). For each sample, we obtain consistent depth estimates (to within 1%) from the FARMER and CLASSIC catalogs. Taking into account the field-to-field variance in N (z) (Umetsu et al. 2014, see their Section 4.4), we assume a fractional uncertainty of 5% in the COSMOS-based estimates of β . We marginalize over this uncertainty in our mass reconstruction (Section 4).
The level of residual cluster contamination for the CCcut method has been assessed by Umetsu et al. (2016) using large spectroscopic samples from the CLASH-VLT program (Rosati et al. 2014). Combining VLT spectroscopic redshifts and Subaru multi-band photometry available for 10 southern CLASH clusters with a mean redshift of z ≈ 0.37, Umetsu et al. (2016) found a mean contamination fraction of (2.4 ± 0.7)% in the blue+red CC-cut sample. This level of residual contamination is subdominant compared to other uncertainties in our lensing analysis. Figure 4 shows the azimuthally averaged tangential (g + ) and cross (g × ) components of the reduced shear as a function of projected cluster radius. We find a rising g + (θ) profile toward the cluster center from both blue and red background samples. In contrast, the g + (θ) signal for the green sample is suppressed by the inclusion of cluster members and consistent with zero at R < ∼ 2 h −1 Mpc, while it becomes comparable to the pure background signal outside the cluster region.

Null Tests
In the absence of higher-order effects, weak lensing only produces tangential shape distortions (Section 2.1). The presence of × distortions can thus be used to check for systematic errors. Here we use a χ 2 test to assess the statistical significance of the measured ×-mode signal against the null  hypothesis. We find χ 2 values of the null hypothesis to be 10. 6, 16.4, 9.8, and 14.9 for N bin = 12 degrees of freedom, for the red, blue, green, and blue+red samples, respectively. For all the cases tested, the ×-component signal is statistically consistent with a null detection. Figure 5 shows the coverage-and mask-corrected surface number density of background galaxies as a function of projected cluster radius, for the lensing-cut and null-test samples. In both cases, no clustering is observed toward the center, demonstrating that there is no detectable contamination by cluster members. The lensing-cut sample reveals a systematic decrease in their counts toward the cluster center, caused by magnification of the sky area. In contrast, the nulltest sample shows no significant evidence for radial count variations with χ 2 = 10.0 for 12 degrees of freedom, as expected by their count slope. A more quantitative magnification analysis will be discussed in Section 4. Before carrying out a 2D mass reconstruction, we first perform a weak-lensing 1D radial profile analysis (WL-1D) of our Subaru observations (Section 3). A370 has two central BCGs separated by ≈ 37 (about 140 h −1 kpc at z = 0.375) along the north-south direction (Figure 1). In this work, we adopt the optical center, or the midpoint of the two BCGs (see Table 1), as the cluster center for our radial profile analysis.
We derive azimuthally averaged radial profiles of tangential reduced shear (g + ) and magnification bias (n µ ) from our [ kpc] Figure 5. Coverage-and mask-corrected surface number density profiles nµ of BRCz -selected red background samples. The results are shown for our lensing-cut (circles) and null-test (crosses) samples. The error bars include both Poisson and clustering contributions estimated from the data. For the lensing-cut sample, a radial count depletion due to magnification of the sky area is seen toward the cluster center. For the null-test sample with s ≈ 0.4, the net effect of magnification bias is expected to vanish. The mean background levels estimated for the lensing-cut and null-test samples are marked with solid and dashed horizontal lines, respectively.
Subaru/Suprime-Cam data. We calculate the binned lensing profiles, {g +,i } N bin i=1 and {n µ,i } N bin i=1 , in N bin = 12 logarithmically spaced radial bins centered on the cluster, spanning the range from θ min = 1.3 to θ max = 16 , with a logarithmic spacing of ∆ ln θ ≈ 0.21. Our radial profile analysis begins at θ min = 1.3 , which is sufficiently large compared to twice the effective Einstein radius, 2θ Ein ∼ 1.1 (for z s = 2; Table 1), determined from strong-lens modeling by Kawamata et al. (2018). Hence, our analysis does not include outer multiple images of strongly lensed galaxies lying at θ ∼ [θ Ein , 2θ Ein ], and our data do not resolve the central substructures. The outer boundary θ max = 16 (≈ 3.5 h −1 Mpc) is large enough to encompass the entire cluster region with r vir ∼ 2 h −1 Mpc , but sufficiently small compared to the size of the Suprime-Cam field of view so as to ensure accurate PSF correction.
For the magnification analysis, the count normalization parameter n µ is estimated in the reference background region at θ ∈ [12 , 16 ]. 6 The estimated values and errors for n µ and 6 The 2-halo term (κ < ∼ 10 −2 ) does not cause bias in the mass reconstruction, because the range of the prior on nµ is sufficiently wide. See Umetsu et al. (2014, their Section 7.4.2) for detailed discussion. s are summarized in Table 4. Details of the error analysis and the mask correction procedure are described in Appendix C. We reconstruct the radial mass profile of A370 from a joint likelihood analysis of azimuthally averaged shear and magnification constraints, using the CLUMI code of . We have a total of 24 constraints {g +,i , n µ,i } N bin i=1 in 12 radial bins. The model is described by N bin + 1 = 13 parameters, is the average surface mass density interior to θ min 7 and Σ i is the surface mass density averaged in the ith bin. In addition, we account for the calibration uncertainty in the observational parameters c (Equation (25); see Tables 3 and 4). 8 Following Umetsu et al. (2014), we fix f W,g to the estimated value. Figure 6 compares the observed lensing profiles {g +,i , n µ,i } N bin i=1 with the respective joint reconstructions. The joint solution has a χ 2 value of 18.7 for 11 degrees of freedom, indicating a slight (but statistically not significant) discrepancy between the two data sets. We see from the lower panel of Figure 6 that the measured n µ value at θ ∼ 5 is ∼ 2σ lower than expected from the joint reconstruction. This is consistent with the result for the null-test sample, which exhibits a similar local deficit of the galaxy counts in the same radial bin (see Figure 5). For the other bins, we find a good agreement between the shear and magnification data. The reconstructed Σ(θ) profile is shown in the upper panel of Figure 7, along with the 1σ confidence interval of the spherical Navarro-Frenk-White (Navarro et al. 1996(Navarro et al. , 1997, hereafter NFW) model (see Section 5 for details of the modeling). The corresponding cumulative mass profile M 2D (< θ) = π(D l θ) 2 Σ(< θ) is shown in the lower panel of Figure 7.

Two-dimensional Map Making (WL-2D)
We apply our CLUMI-2D method (Section 2.6) to our Subaru/Suprime-Cam data (Section 3) for obtaining an unbiased recovery of the projected mass distribution Σ(θ) in A370. In this analysis (WL-2D), we combine the observed shear field (g 1 (θ), g 2 (θ)) with the azimuthally averaged magnification data {n µ,i } N bin i=1 (Section 4.1), which impose a set of azimuthally integrated constraints on the underlying Σ(θ) field. CLUMI-2D takes into account the nonlinear subcritical regime of the lensing properties. Figure 8. Spatial distribution of reduced shear constraints (g1, g2) averaged onto a mass grid of 48 × 48 pixels, covering a field of 24 × 24 centered on A370. Each point represents a single pixel with measured (g1, g2) averaged within a top-hat region with radius θ f = 0.4 . We exclude from our analysis those pixels lying within the central 1 region (red circle) and those having no background galaxies with usable shape measurements. Azimuthally averaged magnification constraints are obtained in 12 logarithmically spaced annuli centered on the cluster spanning the range θ ∈ [1.3 , 16 ].
We use a top-hat window of θ f = 0.4 (Section 2.4) to average over a local ensemble of galaxy image ellipticities at each grid point, accounting for the intrinsic ellipticity distribution of background galaxies. To avoid potential systematic errors (see Section 2.6.1), we exclude from our analysis 12 pixels lying within the central θ cut = 1 and one pixel containing no background galaxies. For distortion measurements (g 1 (θ), g 2 (θ)), this leaves us with a total of 2291 usable measurement pixels (blue points in Figure 8), corresponding to 4582 constraints. For magnification measurements, we have 12 azimuthally averaged constraints {n µ,i } N bin i=1 (Figure 8). The total number of constraints is thus N data = 4594, yielding N data − N pix = 2290 degrees of freedom. For visualization purposes, the mass map is smoothed with a 3 × 3 pixel boxcar kernel. The dashed contours show the surface density distribution of red-sequence cluster galaxies, smoothed with a Gaussian of 1.2 FWHM. The lowest contour level and the contour interval are both 10% of the peak density n ≈ 47 galaxies arcmin −2 . The red circle indicates the cluster radius of r200 ≈ 1.7 h −1 Mpc. North is to the top, east to the left.
In Figure 9, we show the Σ(θ) field reconstructed from the joint analysis of the 2D shear and azimuthally averaged magnification data. The χ 2 value for the global maximum posterior solution is χ 2 ( λ) = 2871 for 2290 degrees of freedom. For comparison, we plot in Figure 9 the surface density distribution of the green sample (dashed contours) composed mostly of cluster members. The projected mass distribution is elongated in the north-south direction and similar to that of cluster member galaxies (Figure 9). Our mass reconstruction barely resolves substructure features (e.g., a north-south mass extension located about 1 north and south of the cluster center) revealed by the free-form mass inversion of Ghosh et al. (2021) based on BUFFALO strong-lensing data. We defer a more detailed investigation of weak-lensing substructures in the A370 field to a forthcoming paper (S.-I. Tam et al. 2022, in preparation). We construct the binned radial profiles Σ(θ) and Σ(< θ) and their associated covariance matrices from an optimally weighted projection of the Σ(θ) map using the method described in Appendix D. We thus obtain model-independent constraints on the projected total mass M 2D (< θ) = π(D l θ) 2 Σ(< θ) from our WL-2D analysis. The resulting projected mass estimates are listed in Table 5.

Radial Mass Profiles
In Figure 10, we compare the surface mass density profiles Σ(θ) of A370 obtained from our WL-1D (Section 4.1) and WL-2D (Section 4.2) analyses. Our 1D-and 2D-based Σ profiles are consistent within the errors in each radial bin. The gray shaded area in the figure represents the 1σ confidence region of the spherical NFW fit to the 1D-based Σ profile (see Section 5 for details of the modeling).
For comparison, we overplot in Figure 10 the azimuthally averaged Σ profile (shown out to 2θ Ein ≈ 1.1 for z s = 2) based on strong lens modeling of Hubble Frontier Fields data performed by Kawamata et al. (2018, see also Oguri 2010Kawamata et al. 2016), obtained using the technique detailed in Oguri (2021) to speed up lensing calculations. The inner Σ profile derived from HST strong lensing is in excellent agreement with our WL-1D constraints on the NFW profile.
In Figure 10, our Σ profiles are also compared with the 1D results of Umetsu et al. (2011) based on their joint shear and magnification analysis of Suprime-Cam data. In the Umetsu et al. (2011) analysis, the innermost measurement radius was taken to be θ min = 0.7 (≈ 1.2θ Ein for z s = 2), in contrast to the conservative choice adopted in this work (θ min = 1.3

MASS MODELING OF A370
In this section, we present mass modeling of A370. With ground-based Subaru weak-lensing observations alone, we cannot spatially resolve the bimodal structure of the cluster in the supercritical region (Sections 4.1 and 4.2). In this study, we thus restrict ourselves to single-component mass models of a spherical or ellipsoidal halo. We forward model pro-  Molnar et al. (2020) found that initial conditions of the two progenitors with an infall velocity of 3500 km s −1 and an impact parameter of 70 h −1 kpc can reproduce the positions and the offsets between the peaks of the X-ray emission and the total mass surface density, the amplitude of the integrated SZE signal (Czakon et al. 2015), and the relative line-of-sight velocity between the two BCGs (V ≈ 1024 km s −1 ). Moreover, the best-matching simulation reproduces well the velocity dispersion and the line-of-sight velocity distribution of cluster member galaxies (Lagattuta et al. 2019;Molnar et al. 2020). These simulation results support the large total mass of A370 derived from lensing .
The binary merger simulations of Molnar et al. (2020) suggest that A370 is a post-major merger of two similar-mass clusters, viewed after the second core passage in the infalling phase, just before the third core passage. These results also suggest that the mass distribution of A370 is highly elongated along the current direction of the collision axis, which is oriented close to the line of sight in their best simulation, with a viewing angle of ϑ = 17.6 • ± 3.5 • , or cos ϑ = 0.95 ± 0.02.

Triaxial NFW Model
Triaxial modeling of density profiles gives an improved description of simulated ΛCDM halos over the conventional spherical model (Jing & Suto 2002;Kasun & Evrard 2005). In this work, we model the cluster mass distribution with a triaxial NFW density profile. The radial dependence of the spherical NFW profile is given by (Navarro et al. 1996(Navarro et al. , 1997 where ρ s is the scale density and r s is the characteristic scale radius at which the logarithmic slope of the density profile equals −2. We generalize the NFW density profile ρ(r) to obtain its triaxial expression by replacing r and r s with their respective ellipsoidal radii R and R s as with q a and q b the minor-major and intermediate-major axis ratios, respectively. By definition, we have 0 < q a q b 1. The degree of triaxiality is defined as ) where 0 T 1 by construction. The value of T approaches unity at q a = q b (or zero at q b = 1), if the halo shape is maximally prolate (or oblate). For q a = q b = 1, Equation (29) reduces to the spherical NFW profile ρ(r) with r = √ X 2 + Y 2 + Z 2 . We define an ellipsoidal overdensity radius R ∆ (Corless et al. 2009;Sereno & Umetsu 2011;Buote & Humphrey 2012) such that the mean interior density contained within an ellipsoidal volume of semimajor axis R ∆ is ∆ × ρ c (z l ). The total mass enclosed within R ∆ is expressed as Spherical-equivalent overdensity radii r ∆ are defined by Similarly, we define r s = (q a q b ) 1/3 R s . The triaxial concentration parameter is defined as the ratio of the ellipsoidal overdensity radius R ∆ to the scale radius R s along the major axis, The characteristic density is then expressed as In this study, we use ∆ = 200 to define the halo mass, M 200 , and the concentration parameter, c 200 .
A triaxial halo is projected onto the lens plane as elliptical isodensity contours, which can be expressed as a function of the intrinsic halo axis ratios (q a , q b ) and orientation angles (ϑ, φ, ψ) with respect to the observer's line of sight. Following Umetsu et al. (2015), we adopt the z-x-z convention of Euler angles (ϑ, φ, ψ) to be consistent with Stark (1977, see also Sereno et al. 2012). The angle ϑ represents the inclination of the major axis (Z) with respect to the line of sight.
After a rotation by the first two Euler angles (ϑ, φ), elliptical isodensity contours of the projected ellipsoid can be described as a function of the elliptical radius ζ, expressed in terms of projected Cartesian coordinates (x , y ) as The minor-major axis ratio ( 1) of the elliptical isodensi- Finally, the third Euler angle ψ describes the additional rotational degree of freedom in the sky plane to specify the observer's coordinate system (x, y), defined such that x = x cos ψ − y sin ψ and y = x sin ψ + y cos ψ.
For a self-similar mass model expressed as ρ(R) = ρ s f 3D (R/R s ), the projected mass density Σ(ζ) is related to ρ(R) (see Equation (29)) as (Umetsu et al. 2015) where Σ s is the scale surface mass density defined by ξ = x 2 + y 2 /q 2 ⊥ , and ξ s is the semi-major scale length of the projected halo. Here we have chosen the new coordinate system (x , y ) such that the x axis is aligned with the major axis of the projected ellipse. In this study, we employ the radial dependence of the projected NFW profile f 2D (u) as given by Wright & Brainerd (2000).
To summarize, our mass model is specified by a total of seven parameters describing the total matter ellipsoid, namely, halo mass and concentration (M 200 , c 200 ), intrinsic axis ratios (q a , q b ), and three Euler angles (ϑ, φ, ψ): In this way, for a given set of the model parameters, we can project a triaxial (or spherical) NFW halo onto the lens plane and compute the surface mass density Σ(x, y) at each angular position. As discussed in Umetsu et al. (2015) (see also Sereno & Umetsu 2011), however, it should be noted that 2D lensing observations can effectively constrain only four observationally accessible parameters, namely, Σ s , ξ s , q ⊥ , and the position angle of the projected major axis (Gavazzi 2005). That is, the deprojection of triaxial systems is intrinsically underconstrained (Limousin et al. 2013). On the other hand, the spherical NFW model (q a = q b = 1) is specified by two parameters, (M 200 , c 200 ), which can be constrained by data in principle.

Bayesian Inference Procedure
The likelihood function L of the 2D mass distribution data m = {Σ(θ n )} Npix n=1 given a set of model parameters p is expressed as (Oguri et al. 2005;Umetsu et al. 2015) where Σ n (p) = Σ(θ n |p) is the surface mass density at the grid position θ n predicted by the model p and C = C stat + C lss is the total covariance matrix (Equation (26)).
We use a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to obtain a well-characterized inference of the model p. We consider the following three different modeling approaches: (1) spherical modeling with uninformative uniform priors on log M 200 and log c 200 , (2) fiducial triaxial modeling with uninformative uniform priors on all parameters, and (3) triaxial modeling incorporating an informative line-of-sight (LOS) prior from Molnar et al. (2020). For simplicity, we refer to these three modeling approaches as Spherical, Triaxial, and Triaxial+LOS modeling, respectively.
Here we briefly summarize the assumed priors for each case. 2. Fiducial triaxial modeling: We use uniform priors on the intrinsic shapes (q a , q b ) and orientation angles (cos ϑ, φ, ψ), while keeping the same log-uniform priors on M 200 and c 200 as in the spherical case. We assume the following form of the prior PDF for the intrinsic axis ratios: where and q min = 0.1 is the lower bound of the minorto-major axis ratio q a (e.g., Oguri et al. 2005;Chiu et al. 2018b), which is introduced to exclude unstable configurations that are not expected for cluster halos. For the orientation angles, we consider a population of randomly oriented halos with P (cos ϑ) = 1 for 0 cos ϑ 1, P (φ) = 1/π for −π/2 φ π/2, and P (ψ) = 1/π for −π/2 ψ π/2.

Triaxial+LOS modeling: We adopt an informative
prior on cos ϑ based on the binary merger simulations of Molnar et al. (2020). For the other parameters, we use the same priors as for the fiducial triaxial modeling. Specifically, we employ a Gaussian prior on cos ϑ of 0.95 ± 0.02 (Section 5.1) truncated in the range 0 cos ϑ 1.
For comparison purposes, we perform spherical NFW modeling with the surface mass density profile m 1D = {Σ min , Σ i } N bin i=1 derived from the WL-1D analysis (Section 4.1). The likelihood function L(p) for the WL-1D analysis is defined as in Equation (26) of Umetsu et al. (2014). In the covariance matrix of WL-1D, we account for systematic effects due to the residual mass-sheet degeneracy, in addition to the measurement error and cosmic noise contributions (see Section 2.7). This residual uncertainty is estimated in each Σ bin as a difference between the joint and marginal posterior solutions (see Umetsu et al. 2014). 10 Similarly, we also perform spherical NFW modeling with the reduced tangential shear profile {g +,i } N bin i=1 obtained in our WL-1D analysis, because this tangential shear fitting is the standard approach to infer cluster masses from weaklensing data (e.g., Okabe et al. 2013;Applegate et al. 2014;Hoekstra et al. 2015;Schrabback et al. 2018). Here we account for the measurement error and cosmic noise contributions in the covariance matrix (see Section 4.4 of Umetsu 2020).

Posterior Parameter Constraints
The main results from our Bayesian inference of the spherical and triaxial NFW models are summarized in Table 6. As summary statistics, we employ the biweight estimator of Beers et al. (1990) to represent the center location (C BI ) and 10 Because of the large number of parameters involved, we do not explore the whole likelihood surface in the CLUMI-2D code, and thus we are not able to include the systematic term in the WL-2D analysis. As we have seen in Figure 10, our WL-1D and WL-2D results are consistent with each other, with no significant evidence for a systematic offset. . Marginalized 1D and 2D (68%, 95%, and 99.7% confidence level contours) posterior distributions for the NFW model parameters (log 10 M200, log 10 c200) obtained using uniform priors assuming spherical symmetry. Blue shaded contours show the constraints obtained from the 1D mass reconstruction (WL-1D; see Figure 7). Orange contours show the constraints from the 2D reconstruction (WL-2D; see Figure 9). the scale or spread (S BI ) of marginalized 1D posterior PDFs (e.g., Umetsu et al. 2020). For a lognormally distributed quantity, C BI approximates the median of the distribution. Triaxial modeling allows for a more general description of the intrinsic shape of cluster halos, leading to broader posterior distributions than the spherical case (Oguri et al. 2005;Sereno & Umetsu 2011). The parameter constraints become more degenerate because of the lack of information of the halo elongation along the line of sight. These trends are found in the posterior distributions from our data.
Our  NOTE-Cluster halo parameters derived from a spherical or triaxial NFW fit to Subaru weak-lensing data. We adopt a concordance cosmology of h = 0.7, Ωm = 0.3, and Ω Λ = 0.7. We note that the degree of triaxiality T is a derived parameter that depends on qa and q b (see Equation (31)). As posterior summary statistics, we use the biweight estimator of Beers et al. (1990) to represent the center location (C BI ) and the spread (S BI ) of marginalized 1D posterior distributions. For each parameter, symmetrized biweight statistics C BI ± S BI are shown. The κ(θ) profile is reconstructed from the WL-1D analysis of the {g + (θ), nµ(θ)} data set (Section 4.1), while the κ(θ) map from the WL-2D analysis of the {g 1 (θ), g 2 (θ), nµ(θ)} data set (Section 4.2). a factor of ∼ 1.3 larger than that of Spherical modeling. It is insightful to compare our results with those of Umetsu et al. (2015), who performed a WL-2D analysis of the superlens cluster Abell 1689 based on deeper Suprime-Cam observations. Analyzing their WL-2D data, Umetsu et al. (2015) obtained fractional uncertainties in M 200 of ≈ 8% and 20% for their spherical and full-triaxial NFW models, respectively (see the first and second rows of Table 7 in Umetsu et al. 2015). For both clusters, the fractional mass uncertainty in full triaxial modeling is ∼ 20%, suggesting that the mass accuracy in deep weak-lensing observations is essentially limited by the uncertainty in the intrinsic shape and orientation of the cluster. Similar trends are also found for the concentration parameter. To accurately infer the cluster mass and concentration from lensing, it is thus necessary to directly model or marginalize over the 3D shape of clusters; when spherical symmetry is assumed, the effect of the intrinsic shape of the cluster should be accounted for in the error analysis (e.g., Gruen et al. 2015;Umetsu et al. 2016).
We also derive summary statistics on the total mass M ∆ evaluated at several characteristic interior overdensities ∆. Table 7 lists the results of our cluster mass estimates. Our estimates of M vir obtained without the LOS information are consistent within the errors with M vir = 2.28 +0.26 −0.22 × 10 15 h −1 M from the combined weak-and strong-lensing analysis of Umetsu et al. (2011, see Section 5.1). In particular, our WL-1D analysis yields M vir = (1.97 ± 0.40) × 10 15 h −1 M and c vir = 5.91 ± 1.87, in agreement with the results of Umetsu et al. (2011). We find that spherical mass estimates from the WL-2D analysis are slightly lower than but consistent within the errors with the WL-1D results.
It should be noted that the halo mass M ∆ constrained using the LOS prior is likely to be considerably lower than the sum of the initial bound masses of the two progenitors, because A370 is expected to be in a highly disturbed dynamical state (see Section 5.1). Our estimates of M vir obtained without the LOS prior are consistent to better than 2σ with the total mass of the system M vir = 2.3 × 10 15 h −1 M adopted in the binary merger simulations of Molnar et al. (2020).
In Figure 11, we show posterior constraints on the NFW parameters (log M 200 , log c 200 ) inferred from spherical modeling of both WL-1D and WL-2D data. The blue and orange contours in the lower-left panel represent the joint posterior PDFs for WL-1D and WL-2D, respectively, showing good agreement between the two methods. In both cases, the marginalized posterior PDFs for (log M 200 , log c 200 ) are unimodal and symmetric.  Figure 11). The posterior PDFs for the shape and orientation parameters (q a , q b , cos ϑ) from our fiducial modeling are very broad, reflecting the fact that the deprojection of triaxial halos is intrinsically underconstrained. In contrast, the axis ratio of the projected mass distribution, q ⊥ (q a , q b , ϑ, φ) (Section 5.2), can be directly constrained by the WL-2D data. Our posterior inference of the projected axis ratio is q ⊥ = 0.78 ± 0.13 and 0.79 ± 0.13 with and without using the LOS prior, respectively. This is slightly larger than, but consistent with, the median axis ratio q ⊥ ∼ 0.6 expected for randomly oriented cluster-scale CDM halos (Umetsu et al. 2018, see also Bonamigo et al. 2015;Suto et al. 2016).
Compared to the fiducial results obtained with uniform priors, our inference with the informative Gaussian prior on cos ϑ (Triaxial+LOS) prefers a more prolate geometry with lower mass and lower concentration. In fact, there is a slight increase in the posterior probability for a prolate configuration (q a ∼ q b < ∼ 0.6) with lower mass and lower concentration. This can be understood as a consequence of the boosted surface mass density of the cluster lens due to the strongly aligned configuration.

X-RAY DATA AND ANALYSIS
Here we describe our analysis of archival Chandra X-ray data (Section 6.1). We use two complementary approaches to determining the 3D gas density and temperature profiles of A370 under the assumption of spherical symmetry. First, we derive gas densities and temperatures of the cluster in concentric spherical shells from a spectral deprojection analysis (Section 6.2). Second, we perform forward modeling to simultaneously fit X-ray surface brightness profiles binned in multiple energy bands to infer the 3D gas density and tem- Figure 13.
Logarithmically scaled, exposure-corrected, and background-subtracted Chandra ACIS image of A370 in the 0.5-7 keV band, smoothed with a Gaussian of FWHM = 4.6 . The image is 5.2 × 5.2 (1.1 h −1 Mpc on a side at z = 0.375) in size and centered on the optical cluster center (open diamond symbol). The positions of the two BCGs are marked with + symbols. The X-ray centroid position is marked with a × symbol. A bright X-ray source in the north corresponding to a foreground elliptical galaxy is masked by a black solid circle. North is up and east is to the left. perature profiles in a parametric form (Section 6.3). With the forward-fitting method, we will also derive the total mass profile assuming hydrostatic equilibrium.

Chandra Data Reduction
We analyze archival X-ray data of A370 taken with the Advanced CCD Imaging Spectrometer (ACIS; Garmire et al. 2003) on board the Chandra X-ray Observatory. The observation identification (ObsID) numbers of Chandra observations analyzed in this study are 515 and 7715. Our analysis uses the Chandra Interactive Analysis of Observations software (CIAO, version 4.13;Fruscione et al. 2006) and the Chandra Calibration Database (CALDB, version 4.9.5). We checked the light curve of each data set using the lc clean task in CIAO, filtering flare data. The net exposure time of each data set is 62.9 ks and 7.1 ks. Point sources were identified using the wavdetect task in CIAO and excluded from the analysis.
In our spectral analysis, we use the X-ray Spectral Fitting Package (Arnaud 1996, XSPEC version 12.11.1) and the ATOMDB code (Heuer et al. 2021, version 3.0.9) for plasma emission modeling, assuming that the ICM is in collisional ionization equilibrium (Smith et al. 2001). The abundance table of Anders & Grevesse (1989) is used in Here, the abundance of a given element is defined as Z i = (n i,obs /n H,obs )/(n i, /n H, ), where n i and n H are the number densities of the ith element and hydrogen, respectively. We use the iron abundance to represent the ICM metal abundance, such that the abundance of other elements is tied to the iron abundance as Z i = Z Fe (Ueda et al. 2021). The Galactic absorption column density is estimated at N H = 2.89 × 10 20 cm −2 according to HI4PI Collaboration et al. (2016) and fixed in our X-ray spectral analysis. The blank-sky data included in CALDB are used to determine the background contribution.
To determine the centroid of X-ray emission in A370, we fit the surface brightness distribution with a 2D β-model using the SHERPA fitting package in CIAO (Freeman et al. 2001;Doe et al. 2007;Burke et al. 2021). The surface brightness map was extracted from the ACIS S3 chip in the data set of ObsID 515 to reduce the uncertainty in the background determination. A bright foreground galaxy lying about 2 north of the cluster center was masked with a circle of radius 30 from its X-ray peak. From the best-fit model, we find the X-ray centroid of R.A. = 2 : 39 : 53.2 and decl. = −1 : 34 : 35.1 (Table 1), with positional uncertainties of (∆R.A., ∆decl.) = (0.33 , 0.39 ). Figure 13 shows the exposure-corrected and backgroundsubtracted Chandra ACIS image of A370 in the 0.5-7 keV band, smoothed with 4.6 FWHM Gaussian. The X-ray emission centroid determined from Chandra observations is 4.9 (≈ 18 h −1 kpc) away from the optical center defined as the midpoint of the two BCGs.
Here projct allows us to fit spectra extracted from a series of concentric annuli simultaneously, assuming spherical symmetry to calculate suitable geometric factors (e.g., Fabian Three-dimensional gas temperature Tgas(r) (upper panel) and enclosed gas mass Mgas(< r) (lower panel) profiles of A370 derived from Chandra X-ray observations. The red shaded region in each panel shows the marginalized 1σ confidence region of the respective profile obtained from forward modeling of Chandra X-ray data. Blue open boxes show the gas temperatures with 1σ uncertainties from spectral deprojection in concentric spherical shells. Blue filled circles with error bars in the lower panel show the results obtained from spectral deprojection. et al. 1980, 1981Kriss et al. 1983;Arabadjis et al. 2002). In this analysis, the cluster redshift and N H are fixed at 0.375 and 2.89 × 10 20 cm −2 , respectively. The metal abundance of the ICM in the 100 -200 region is assumed to be 0.3Z (Fujita et al. 2008;Werner et al. 2013;Urban et al. 2017;Ghizzardi et al. 2021).
To set the outer boundary conditions, we fit the background-subtracted X-ray spectrum in the 0.4-2.0 keV band extracted from the outermost annular region of 200 -400 , ignoring the emission from gas outside the outermost shell (see Humphrey et al. 2006) and fixing the metal abundance to 0.3Z . The best-fit parameters for the outermost shell, T 3D = 9.7 +10.2 −3.3 keV and n e = (1.76 ± 0.08) × 10 −4 cm −3 , are included and fixed in our deprojection analysis of the inner concentric regions.
The resulting best-fit parameters for each spherical shell are summarized in Table 8 (see also Figure 14). The gas density ρ gas is related to the electron number density n e as ρ gas = µ e m p n e , with µ e ≈ 1.11 the mean mass per electron in units of proton mass m p .

Parametric Forward Fitting
We perform a forward model fitting of the Chandra observations for A370. The MBPROJ2 algorithm developed by Sanders (2017, see also Sanders et al. 2018) is capable of modeling radial X-ray surface brightness profiles in multiple energy bins, with or without assuming hydrostatic equilibrium. Motivated by their work, we have implemented a forward-modeling algorithm to simultaneously fit the X-ray brightness profiles binned in multiple energy bands to infer the 3D gas density and temperature profiles in a parametric form, without assuming hydrostatic equilibrium. Both algorithms assume spherical symmetry.
We model the 3D gas density profile n e (r) as a β-profile and the 3D temperature profile T 3D (r) as a universal temperature profile of Vikhlinin et al. (2006): where n e0 is the central electron number density, β is the slope parameter, and r c is the core radius of the β profile; T 0 is the central gas temperature, a and c are the temperature slope parameters, and r t is the temperature scale radius.
In this analysis, we fix the cluster redshift to z = 0.375, the Galactic absorption column density to N H = 2.89 × 10 20 cm −2 , and the metal abundance of the ICM to Z = 0.3Z . We use the spectroscopic-like temperature T 2D (r) of Mazzotta et al. (2004) to approximate spectroscopic temperatures extracted from Chandra X-ray observations: with w = n 2 e (r)T −3/4 3D (r). The X-ray surface brightness S X (r ⊥ ) as a function of projected cluster radius r ⊥ = D l θ is modeled by the following equation (Ettori 2000): where Λ X (T 2D , Z) is the cooling function, n p0 ≈ n e0 /1.17 is the central proton number density, and B(x, y) is the beta function. We use the PYATOMDB python package (Foster & Heuer 2020) to evaluate the cooling function Λ X (T 2D , Z) in each energy band for a given value of the spectroscopic-like temperature T 2D (r ⊥ ).
We have extracted the radial profiles of X-ray surface brightness in N spec = 10 energy bands between neighbouring energies of 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4, 5, 6, and 7 keV. In each energy band, the X-ray surface brightness is sampled in 49 linearly spaced radial bins in the range θ ∈ [10 , 200 ] centered on the X-ray centroid. Following Sanders et al. (2018), we have chosen these bands so as to capture most of the spectral information without overly increasing the computational time. We estimate in each radial bin the pixel-topixel variance of X-ray brightness over the 0.4-7 keV energy band, finding that the standard errors of the mean based on the estimated variance are highly consistent with the errors determined based on the photon counts. In this work, we use the standard error based on the estimated variance to characterize the uncertainty in the mean X-ray surface brightness in each bin.
The background contribution in each energy band is determined from the blank-sky data included in CALDB (Section 6.1). We estimate the count rate of the blank-sky data in the spectral range of 9-12 keV dominated by the particle background (Hickox & Markevitch 2006). Using the ratio between the count rate observed in A370 and the background one in the 9-12 keV band, we rescale the background contribution in each energy band to match the observations of A370, accounting for the difference in exposure times. We then construct the azimuthally averaged radial profile of the background map in each energy band. Similarly, we create azimuthally averaged radial profiles of exposure maps in the 10 energy bands.
We simultaneously fit the observed X-ray surface brightness profiles in the 10 energy bands with our model using affine-invariant MCMC sampling (Goodman & Weare 2010) implemented by the EMCEE python package (Foreman-Mackey et al. 2013). The log-likelihood function for the data is defined by (up to a normalization constant) where i and j run over all energy bands and all radial bins, respectively, d ij is the binned X-ray brightness measured in units of counts per pixel, σ ij is the statistical uncertainty of the measurement in each bin, T i represents the Galactic transmission in the ith energy band calculated by XSPEC using the photoionization cross sections of Verner et al. (1996), S X,ij is the model prediction in each bin for the X-ray surface brightness given by Equation (45), w ij is the conversion factor proportional to the product of the effective area and the net exposure time in each bin, BGD ij denotes the background contribution in each bin given in units of counts per pixel, and N i is a dimensionless calibration factor of the background in the ith energy band.
Our model has a total of 17 parameters, of which seven parameters describe the cluster X-ray emission (see Equation (43)), namely (n e0 , r c , β, T 0 , r t , a, c), and the rest are calibration nuisance parameters, {N i } Nspec i=1 . For the parameters describing the cluster X-ray emission, we use uninformative uniform priors of n e0 ∈ [0, 1] cm −3 , r c ∈ 1.01 ± 0.04 N (3.00−4.00 keV) 0.98 ± 0.05 N (4.00−5.00 keV) 1.02 ± 0.04 N (5.00−6.00 keV) 1.01 ± 0.04 N (6.00−7.00 keV) 1.00 ± 0.03 , we adopt a Gaussian prior of 1.0 ± 0.2. We sample the posterior PDFs of all model parameters over the full parameter space allowed by the priors. Posterior summaries of the model parameters are listed in Table 9. In Appendix E, we show the Chandra X-ray brightness profiles along with the best-fit model.
Finally, we use the posterior samples obtained with the MCMC algorithm to derive constraints on the gas mass M gas (< r) enclosed within the spherical radius r and the hydrostatic equilibrium mass M HE (< r) of A370. The hydrostatic mass M HE (< r) is given by where k B is the Boltzmann constant and µ g ≈ 0.60 is the mean molecular weight. We will compare the resulting hydrostatic mass profile, M HE (< r), with our weak-lensing results in Section 7.1.
In the top panel of Figure 14, we show the marginalized 1σ confidence region of T 3D (r) obtained from our forward modeling, along with the deprojected temperatures inferred from our spectral deprojection analysis (Section 6.2). Similarly, we compare in the bottom panel of Figure 14 our determinations of M gas (< r) from both methods. For both comparisons, we find that the two complementary approaches yield highly consistent results. Chandra X-ray Figure 15. Comparison of spherically enclosed total mass profiles Mtot(< r) of A370 as a function of spherical radius r. The orange and blue shaded areas represent the marginalized 1σ confidence regions from triaxial NFW modeling of the Σ(θ) map (Figure 9) with and without using the LOS prior, respectively. The dashed line shows the posterior mean from spherical NFW modeling of the Σ(θ) map. The red hatched area represents the marginalized 1σ confidence region of the total hydrostatic mass MHE(< r) obtained from forward modeling of Chandra X-ray data that cover the radial range θ ∈ [10 , 200 ].

Hydrostatic Mass Bias
Hydrostatic mass estimates M HE are expected to be biased low, depending on details of nongravitational processes and the level of residual gas motions in the ICM. Determining the level of hydrostatic mass bias for a representative sample of galaxy clusters has important implications for both cluster cosmology and astrophysics (Planck Collaboration et al. 2014Pratt et al. 2019). Cosmological hydrodynamical simulations suggest a modest level of hydrostatic mass bias for an ensemble of galaxy clusters, b HE ≡ 1 − M HE /M true ∼ 5%-20% at r r 500 (Nagai et al. 2007;Lau et al. 2009;Meneghetti et al. 2010b;Nelson et al. 2012;Angelinelli et al. 2020;Ansarifard et al. 2020), defined with respect to the true enclosed mass M true . Since A370 is a highly disturbed system, the cluster is likely to exhibit a higher than typical value of mass bias, which could serve as an extreme limit expected for galaxy clusters.
With the aim of characterizing the level of hydrostatic mass bias in A370, we compare our lensing-based determinations of the cluster mass profile (Section 5) to the hydrostatic mass profile M HE (< r) derived from Chandra X-ray data (Section 6.3). For this purpose, we compute the total mass M tot (< r) of a triaxial halo enclosed within a sphere of radius r: where ρ(X, Y, Z) is the density function (Equations (29) and (30)), the region of integration V is √ X 2 + Y 2 + Y 2 r, and dΩ = sin ϑdϑdφ is the solid angle element in spherical coordinates.
In Figure 15, we compare the spherically enclosed total mass profiles M tot (< r) obtained from our WL-2D and Xray analyses. Here we have extrapolated the X-ray forward model beyond the range of the fitted data ( < ∼ 720 h −1 kpc) to compute M HE (< r) out to larger cluster radii. We note that in contrast to the triaxial lensing constraints on M tot (< r), the hydrostatic mass M HE (< r) obtained assuming spherical symmetry is not corrected for the projection effect due to the LOS elongation of the gas distribution. Since the shape of the collisional gas is rounder than the underlying matter (e.g., Suto et al. 2017), the level of projection bias in the gas distribution is expected to be less than ∼ 10% found in the total mass distribution (see Section 5.4).
In Figure 16, we show the hydrostatic mass bias as a function of spherical radius r, defined with respect to the total mass M WL (< r) determined from weak lensing: The results are shown for the three different priors on the halo shape employed in our mass modeling of the WL-2D data (see Table 6). We find no significant evidence for a strong variation of b(r) both within and beyond the radial range probed by the Chandra data ( < ∼ 720 h −1 kpc). At each radius r, we find similar central values of the distributions from the spherical and the fiducial triaxial cases (see Table 6).
From the triaxial lens modeling, we obtain mass ratios of 1−b(r) = 0.56±0.09 and 0.51±0.09 at r = 0.7 h −1 Mpc ∼ 0.7r 500 , with and without using the LOS prior, respectively. When the X-ray forward model is extrapolated out to r 500 ∼ 1 h −1 Mpc, we find 1−b(r 500 ) = 0.54±0.12 and 0.50±0.11 with and without the LOS prior, respectively. The range of mass bias inferred for A370, b ∈ [0.34, 0.61] at the 1σ level, is on the high side of the distribution expected from cosmological cluster simulations (see Nagai et al. 2007;Lau et al. 2009;Ansarifard et al. 2020) and is in better agreement with the value of 1 − b HE = M HE /M true = 0.58 ± 0.04 required to bring the Planck CMB and cluster constraints into full agreement in the base ΛCDM cosmology of Planck Collaboration et al. (2016). However, it should be noted again that the mass bias found for this cluster should be considered as an extreme value expected for galaxy clusters.

Gas Mass Fraction
In Figure 17, we show the ratio of spherically enclosed gas mass M gas (< r) to total mass M tot (< r) as a function of spherical radius r: Here the total mass M tot is taken to be either the weaklensing mass M WL or the hydrostatic mass M HE (see Section 7.1) and the gas mass M gas is derived from the X-ray forward model (Section 6.3). We find that the gas mass fraction f gas (< r) increases progressively outward, indicating that the hot gas is more extended than the underlying matter distribution.
Our lensing results, when combined with the X-ray gas mass measurements, yield a direct estimate for f gas (< r), free from the assumption of hydrostatic equilibrium. Using the total mass derived from triaxial lens modeling, the gas mass fraction enclosed within a sphere of radius r = 0.7 h −1 Mpc ∼ 0.7r 500 is found to be f gas (< r) = (8.4 ± 1.0)% and (7.6 ± 1.0)%, with and without using the LOS prior, respectively. Extrapolating the gas mass measurements out to r 500 , we find f gas (< r 500 ) = (9.6 ± 1.2)% and (9.0 ± 1.2)% with and without the LOS prior, respectively. When compared to the cosmic baryon fraction f b ≡ Ω b /Ω m = 0.156 ± 0.002 determined by the Planck mission (Planck Collaboration et al. 2020), our constraint on the gas mass fraction indicates f gas (< r 500 )/f b = 0.62 ± 0.08 and 0.58 ± 0.08, with and without using the LOS prior, respectively. These are significantly lower than the typical values of f gas /f b ∼ 0.8-0.9 observed for high-mass galaxy clusters (Chiu et al. 2018a;Tian et al. 2020;Akino et al. 2022). Such a high degree of gas depletion can be caused by the adiabatic expansion of the post-shock gas (Ricker & Sarazin 2001;Umetsu et al. 2010). It would take of the order of Gyrs for the gas to fall back into the gravitational potential well of the cluster.
By contrast, the gas mass fraction based on the X-ray hydrostatic mass, f gas (< r) = M gas (< r)/M HE (< r), reaches the cosmic baryon fraction f b at r ≈ 0.8 h −1 Mpc and increasingly exceeds it at larger cluster radii.

SUMMARY AND CONCLUSIONS
The Frontier Fields cluster A370 is a superlens characterized by a large Einstein radius (θ Ein = 33.9 ± 1.1 for z s = 2; Table 1) and is one of the most massive known lenses on the sky. Recent dedicated numerical simulations of binary cluster mergers constrained by multi-probe observations suggest that the cluster is a post-major merger of two similar-mass clusters (Molnar et al. 2020). These results also suggest that A370 is in a highly disturbed dynamical state and is elongated along the current direction of the collision axis, which is closely aligned with the line of sight in their best-matching simulation.
In this paper, we have carried out a detailed weaklensing and X-ray study of A370 using wide-field BR C z Subaru/Sprime-Cam (Section 3) and Chandra X-ray (Section 6) observations. By combining 2D shear and azimuthally averaged magnification constraints derived from the Subaru data, we have performed a lensing mass reconstruction in a free-form manner (Section 4; Figures 1 and 9), which allows us to determine both radial structure and 2D morphology of the cluster mass distribution.
In a parametric triaxial framework assuming an NFW density profile, we have constrained the intrinsic structure, shape, and orientation of the cluster halo by forward modeling the reconstructed Σ(θ) map (Section 5; Tables 6 and 7). We obtain a halo mass M 200 = (1.54±0.29)×10 15 h −1 M and a halo concentration c 200 = 5.27 ± 1.28 with uninformative uniform priors. Using a prior on the LOS alignment of the halo major axis derived from the binary merger simulations of Molnar et al. (2020), we find that the data favor a more prolate geometry with lower mass and lower concentration, M 200 = (1.38 ± 0.20) × 10 15 h −1 M and c 200 = 4.45 ± 0.93.
When compared to the hydrostatic mass estimate M HE from Chandra observations (Section 7.1), our triaxial weaklensing analysis yields spherically enclosed mass ratios M HE /M WL of 1 − b(r) = 0.56 ± 0.09 and 0.51 ± 0.09 at r = 0.7 h −1 Mpc ∼ 0.7r 500 , with and without using the LOS prior, respectively ( Figure 16). Extrapolating our X-ray forward model to r 500 , we find 1 − b(r 500 ) = 0.54 ± 0.12 and 0.50 ± 0.11 with and without the LOS prior, respectively. Since the cluster is in a highly disturbed dynamical state (Section 5.1), this represents the likely maximum level of hydrostatic bias expected in galaxy clusters.
Our lensing results, when combined with the X-ray gas mass measurements, yield a direct estimate for the gas mass fraction, free from the assumption of hydrostatic equilibrium. From triaxial lens modeling with the LOS prior, the gas mass fraction enclosed within a sphere of radius r = 0.7 h −1 Mpc ∼ 0.7r 500 is found to be f gas (< r) = (8.4 ± 1.0)% (Section 7.2). When the gas mass measurements are extrapolated to r 500 , f gas (< r 500 ) = (9.6±1.2)%, or f gas (< r 500 )/f b = 0.62 ± 0.08 relative to the cosmic baryon fraction, f b = Ω b /Ω m (Figure 17). These are significantly lower than the typical values of f gas /f b ∼ 0.8-0.9 found in high-mass galaxy clusters (Chiu et al. 2018a;Tian et al. 2020;Akino et al. 2022). The high degree of gas depletion observed for A370 is in line with the post-major merger scenario of Molnar et al. (2020).
We have also constructed the projected radial mass profile from an optimally weighted projection of the Σ(θ) map (Table 5), obtaining a model-independent constraint on the projected total mass of M 2D (< r ⊥ ) = (3.11 ± 0.47) × 10 15 h −1 M at r ⊥ ≈ 2.3 h −1 Mpc ∼ 1.2r vir for the projected mass of the whole system, including any currently unbound material around the cluster.
Combining the data products presented in this work with HST strong-and weak-lensing data sets available from the Frontier Fields and BUFFALO programs will allow us to conduct a multi-scale lensing reconstruction in the cluster of exceptional projected mass. Such a full-lensing analysis can then be used to detect and study mass substructures in the unique merging environment (e.g., Jauzac et al. 2016Jauzac et al. , 2018Tam et al. 2020), for a detailed comparison with the distribution of intracluster baryons. It will also allow us to perform a detailed characterization of the mass profile shape and its deviation from the equilibrium form over a wide radial range, for an improved determination of the total mass bound to the cluster. This paper is based on data collected at Subaru Telescope, which is operated by the National Astronomical Observatory of Japan. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www. cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa. int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. Software: IMCAT package (Kaiser et al. 1995), xspec (v12.11.1;Arnaud 1996), Sherpa (Freeman et al. 2001;Doe et al. 2007;Burke et al. 2021), CIAO (v4.13;Fruscione et al. 2006), PYATOMDB (Foster & Heuer 2020), SCAMP software (Bertin 2006), SWARP (Bertin et al. 2002) In this study, we applied both bright and faint magnitude cuts to define a magnitude-limited sample of background galaxies (Table 4). Applying an additional bright magnitude cut is to reduce the contamination from unlensed foreground galaxies (Medezinski et al. , 2018. However, it will also modify the signal of magnification bias because magnified source galaxies near the bright cut will be removed from the observed sample. As a result, the net effect of magnification bias includes the contribution from the bright cut as well as from the faint cut . Following Chiu et al. (2020), we obtain the expression for the magnification bias signal expected for a background sample defined in the magnitude range m bright m < m faint as where we have used the weak-lensing limit (µ 1 + 2κ); the quantity s eff denotes the effective count slope for a background sample defined in the magnitude range m bright m < m faint : with f bright = N µ (< m bright )/N µ (< m faint ).
In typical observations of magnification bias based on deep multi-band imaging (Umetsu et al. 2014;Chiu et al. 2016), we require m bright to be 2-3 magnitudes brighter than m faint (Table 4). In this work, we have f bright ≈ 1.0% for the lensing-cut sample and f bright ≈ 3.3% for the null-test sample, so that we can safely ignore the correction terms proportional to f mask . In the limit f bright → 0, we have In this study, we interpret the magnification bias signal using the approximation s eff s(m faint ).

B. PHOTOMETRIC ZERO POINT CALIBRATION
The zero point for the Suprime-Cam z filter was calibrated by matching SEXTRACTOR's AUTO fluxes for point sources to their PSF fluxes from the Pan-STARRS DR1 catalog (Flewelling et al. 2020). Since the transmission curves of the Pan-STARRS z filter and the Suprime-Cam z filter are different, we followed the procedure of Umetsu et al. (2010) to infer z -band fluxes from the Pan-STARRS photometry. We use the HYPERZ code (New-Hyperz ver. 11; Bolzonella et al. 2000) to perform a spectral energy distribution (SED) fitting to Pan-STARRS grizy photometry, with stellar templates from the Pickles library (Pickles 1998). Pan-STARRS's z fluxes were obtained using the transmission curve of the Suprime-Cam z filter. Point sources with Pan-STARRS's z magnitudes in the range [18.5, 20.0] are used for calibration, because stars with z brighter than 18.5 mag are saturated. Figure 18 compares the calibrated Suprime-Cam z magnitudes and the reference z magnitudes derived from Pan-STARRS grizy photometry. The residual rms scatter is 0.034 mag. The zero points for the Suprime-Cam B and R C filters were first derived by matching the stellar locus in the B −R C vs. R C − z diagram to the COSMOS2020 photometry (Weaver et al. 2022). The uniformity of the colors of Galactic stars permits a reliable color calibration between fields with |b| > 30 • (for detail, see Gilbank et al. 2011). To this end, we use isophotal fluxes for better color measurements. Since the COSMOS photometry does not cover the Figure 19. Binned distribution of stars in color-color space for the COSMOS field (left) and A370 (right), after the color matching. The locations of the peak density (redder colors) in the two fields are different because of the field-to-field variations of stellar populations as well as of different magnitude cuts applied to both samples. By contrast, the shapes of the stellar locus in the two fields are similar. Therefore, the cross-correlation technique can be used to match the color distribution of stars in the cluster field to the reference COSMOS photometry.
Suprime-Cam R C band, we needed to estimate R C magnitudes for COSMOS field objects. We use again the HYPERZ code with SEDs from the Pickles library (Pickles 1998) to obtain the best-fit model for each star using COSMOS2020 isophotal photometry in 4 Suprime-Cam intermediate bands (IB574, IA624, IA679, IB709). The R C photometry was derived using the transmission curve of the Suprime-Cam R C filter. Since the wavelength coverage of the Suprime-Cam R C band is well sampled by the 4 Suprime-Cam intermediate filters, this R C estimation is regarded as an interpolation of data. The R C magnitudes obtained with this method are thus model independent and sufficiently accurate for our purpose (see Umetsu et al. 2010). The matching was performed using the cross correlation in CC space between our Suprime-Cam data in A370 and the COSMOS2020 data. Figure 19 shows the result of color matching. Once the color offsets are determined, the zero points for the Suprime-Cam B and R C filters were derived from the color offsets.
We repeat the same procedure to obtain Suprime-Cam R C magnitudes for galaxies in the COSMOS field. The SED fitting for each COSMOS galaxy was performed with the HY-PERZ code by using spectral templates from the GALAXEV library (Bruzual & Charlot 2003) and by fixing the redshift to each photometric redshift (computed with the LEPHARE code; Ilbert et al. 2006) from the COSMOS2020 FARMER catalog. A Galactic extinction correction was applied to galaxies in both data sets according to Schlegel et al. (1998). By matching the distributions of galaxies in the B − R C vs. R C − z diagram for both data sets, we find that additional offsets of −0.02 mag and +0.10 mag need to be added to the Suprime-Cam B and R C magnitudes, respectively. We have corrected for the residual offsets by adding −0.02 mag and +0.10 mag to the respective magnitudes in our data set. These additional offsets may be due to the bias in our Suprime-Cam R C estimation based on the SED fitting and the uncertainty of the cross correlation matching.

C. WEAK-LENSING MAGNIFICATION ANALYSIS
Here we detail our magnification analysis. Following the procedure outlined in Umetsu et al. (2014Umetsu et al. ( , 2016, we account for the Poisson, intrinsic clustering, and additional systematic contributions to the total uncertainty σ µ (Section 2.5). First, we estimate σ int µ,i dominated by intrinsic clustering from the azimuthal variation of the counts in cell. A positive tail of > νσ cells is then removed in each bin using iterative σ clipping with ν = 2. This is to alleviate the bias due to angular clustering of red galaxies. The Poisson noise term σ stat µ,i is estimated from the clipped mean counts in each annular bin. The difference between the mean counts estimated with and without σ clipping is taken as a systematic error, σ sys µ,i = |n µ,i the clipped and unclipped mean counts in the ith annulus, respectively. Finally, these errors are combined in quadrature as σ 2 µ,i = (σ int µ,i ) 2 + (σ stat µ,i ) 2 + (σ sys µ,i ) 2 .
Our magnification analysis is insensitive to the particular choice of ν because of the inclusion of the σ sys µ term (Umetsu et al. 2014. Note that by including the σ stat µ term in Equation (C4), we are in effect double counting the contribution of Poisson fluctuations in estimating the errors. We find that including the σ stat µ term increases the estimated total uncertainty by 10%-20%. This slight overestimate of the uncertainty is not expected to significantly affect our joint mass reconstruction, because our lensing constraints are dominated by the shear measurements.
Masking of observed sky is corrected for using the method of Umetsu et al. (2011, Method B of Appendix A), which is fully automated once the configuration parameters of SEX-TRACTOR (Bertin & Arnouts 1996) are optimally tuned (Umetsu et al. 2014Chiu et al. 2016). We find that the masked area fraction f mask is ∼ 4% of the sky at θ > 10 , increasing toward the cluster center up to ∼ 10% at θ < ∼ 2 .
The masked area fraction averaged over the radial range θ ∈ [θ min , θ max ] is ≈ 4.6%. This is similar to the results of Umetsu et al. (2016) for the CLASH sample at the median redshift z l ≈ 0.35. The mask-corrected magnification bias profile b µ,i = n µ,i /n µ is proportional to (1 − f mask,back )/(1 − f mask,i ) ≡ 1+∆f mask,i with f mask,back estimated in the reference background region at θ ∈ [12 , 16 ]. Thus, the mask correction essentially depends on the difference of the f mask values, ∆f mask,i f mask,i − f mask,back , which is insensitive to the particular choice of the configuration parameters for source extraction. Accordingly, the systematic uncertainty on the mask correction is not expected to significantly bias our magnification measurements.

D. TWO-DIMENSIONAL TO ONE-DIMENSIONAL PROJECTION
To enable a direct comparison between the results from 1D and 2D mass reconstructions, we construct a surface mass density profile Σ(θ) from an optimally weighted projection of the Σ(θ) field as (Umetsu et al. 2015) where Σ (2) = {Σ(θ m )} Npix m=1 is a pixelized mass map, C (2) is the pixel-to-pixel covariance matrix of Σ (2) , Σ (1) is a data vector containing radially binned Σ values, and A is a mapping matrix whose elements A mi represent the area fraction of the mth pixel lying within the ith clustercentric radial bin (Section 2.5). The bin-to-bin covariance matrix for Σ (1) is given by E. CHANDRA X-RAY BRIGHTNESS PROFILES Figure 20 shows the radial X-ray surface brightness profiles of A370 measured in 10 energy bands (0.5 to 7 keV) from Chandra observations. The binned total and background X-ray brightness profiles in each energy band are plotted in each panel, along with the best-fit model derived from simultaneous forward modeling of the 10 energy bands. . Radial X-ray surface brightness profiles of A370 measured in 10 energy bands from Chandra observations. Red (gray) vertical bars in each panel show the 1σ confidence range of the total (background) counts in radial bins. The red (gray) shaded area in each panel shows the marginalized 2σ confidence region of the model for the total (background) counts obtained from simultaneous forward fitting of the 10-band brightness profiles. The red dashed line in each panel shows the posterior mean profile of the cluster contribution.