Massive molecular gas reservoir in a luminous sub-millimeter galaxy during cosmic noon

We present multi-band observations of an extremely dusty star-forming lensed galaxy (HERS1) at $z=2.553$. High-resolution maps of \textit{HST}/WFC3, SMA, and ALMA show a partial Einstein-ring with a radius of $\sim$3$^{\prime\prime}$. The deeper HST observations also show the presence of a lensing arc feature associated with a second lens source, identified to be at the same redshift as the bright arc based on a detection of the [NII] 205$\mu$m emission line with ALMA. A detailed model of the lensing system is constructed using the high-resolution HST/WFC3 image, which allows us to study the source plane properties and connect rest-frame optical emission with properties of the galaxy as seen in sub-millimeter and millimeter wavelengths. Corrected for lensing magnification, the spectral energy distribution fitting results yield an intrinsic star formation rate of about $1000\pm260$ ${\rm M_{\odot}}$yr$^{-1}$, a stellar mass ${\rm M_*}=4.3^{+2.2}_{-1.0}\times10^{11} {\rm M_{\odot}}$, and a dust temperature ${\rm T}_{\rm d}=35^{+2}_{-1}$ K. The intrinsic CO emission line ($J_{\rm up}=3,4,5,6,7,9$) flux densities and CO spectral line energy distribution are derived based on the velocity-dependent magnification factors. We apply a radiative transfer model using the large velocity gradient method with two excitation components to study the gas properties. The low-excitation component has a gas density $n_{\rm H_2}=10^{3.1\pm0.6}$ cm$^{-3}$ and kinetic temperature ${\rm T}_{\rm k}=19^{+7}_{-5}$ K and a high-excitation component has $n_{\rm H_2}=10^{2.8\pm0.3}$ cm$^{-3}$ and ${\rm T}_{\rm k}=550^{+260}_{-220}$ K. Additionally, HERS1 has a gas fraction of about $0.4\pm0.2$ and is expected to last 250 Myr. These properties offer a detailed view of a typical sub-millimeter galaxy during the peak epoch of star-formation activity.


INTRODUCTION
The cold molecular gas (traced by millimeter (mm) observations of the molecular CO) is the key fuel for active star formation in galaxies (Carilli & Walter 2013). The fraction of the molecular gas reservoir that ends up in new stars (what is usually referred to as the star formation efficiency) depends on parameters such as the fragmentation and chemical compositions of the gas and is diminished by phenomena that disperse the gas and prevent the collapse, such as feedback from an active nucleus (AGN) or star-formation-driven winds (Bigiel et al. 2008;Sturm et al. 2011;Swinbank et al. 2011;Fu et al. 2012). Existing evidence suggests AGN activity is the dominant mechanism in quenching star formation at high redshifts, specifically in the most extreme environments (Cicone et al. 2014). On the other hand, the UV emission from newly born hot stars in star-forming regions ionizes the surrounding gas, generating a wealth of recombination nebular emission lines. The presence and intensity of these lines reveal valuable information on the physics of ionized gas surrounding these regions (Coil et al. 2015;Kriek et al. 2015;Shapley et al. 2015).
The most intense sites of star formation activities in the universe at high redshift happen in the gas-rich dusty star-forming galaxies (Casey et al. 2014). These heavily dust-obscured systems are often discovered in the millimeter wavelengths and are believed to be the progenitors of the most massive red galaxies found at lower redshifts (Toft et al. 2014). Despite many efforts, the physics of the ionized gas (chemical composition, spatial extent, and relative line abundance) is still poorly understood for these star-forming factories of the universe. Recent resolved studies of ionized gas at high redshift mostly focus on normal star-forming galaxies and miss this hidden population of starbursting systems (Genzel et al. 2011;Förster Schreiber et al. 2014).
Submillimeter galaxies (SMGs) are more efficient in turning gas into stars than normal star-forming galaxies at the same epoch (i.e., Lyman-break Galaxies (LBGs) and BzK-selected galaxies), similar to local ultraluminous infrared galaxies (ULIRGs) (Papadopoulos et al. 2012). Recent studies suggest that the LBG selected star forming galaxies might be fundamentally different from that of SMGs whereas the former is usually characterized by SFR < 100 M yr −1 the latter dominates the high-SFR end, and these are believed to be driven by different star formation mechanisms (Casey 2016). Using unlensed SMGs, Menéndez-Delmestre et al. (2013) showed that the star formation scale in these highredshift systems seems to be very different from that of local starburst and are more extended over 2-3 kpc scales.
Through Herschel wide-area surveys, we have now identified hundreds of extremely bright submillimeter sources (S 500µm ≥ 100 mJy) at high redshifts. After removing nearby contaminants, such bright 500 µm sources are either gravitationally lensed SMGs or multiple SMGs blended within the 18 Herschel PSF (Negrello et al. 2007(Negrello et al. , 2010(Negrello et al. , 2017 with most turning out to be strongly lensed SMGs in our high-resolution followup observations (Fu et al. 2012;Bussmann et al. 2013;Wardlow et al. 2013;Timmons et al. 2015). For this study we have selected a very bright Keck/NIRC2observed Einstein-ring-lensed SMG at z = 2.553 (HERS J020941.1+001557 designated as HERS1 hereafter; Figure 1). This target is identified from the Herschel Stripe 82 survey (Viero et al. 2014) covering 81 deg 2 with Herschel/SPIRE instrument at 250, 350, and 500 µm. HERS1 at z = 2.553 is the brightest galaxy in Herschel extra-galactic maps with S 500 µm = 717 ± 8 mJy. The Three-color image of HERS1 adopting HST/WFC3 F110W (blue), F125W (green), and F160W (red) with submillimeter array (SMA) 870 µm contours overlaid. The SMA contours start from 9σ and increase in steps of 9σ with σ = 305 µJy beam −1 . The beam size is shown in the bottom right. Two deflecting galaxies at z = 0.202 (Geach et al. 2015) are marked as G1 and G2. galaxy was first identified as a gravitationally lensed radio source by two foreground galaxies at z = 0.202 in a citizen science project (Geach et al. 2015) to an Einsteinring with a radius ∼ 3 . HERS1 has also been selected by other wide-area surveys such as Planck and ACT and has extensive follow-up observations from CFHT and HST in the near-infrared along with ancillary observations by JVLA, SCUBA-2, and ALMA with a CO redshift from the Redshift Search Receiver of Large millimeter telescope (LMT) and independently from Hα using the IRCS on Subaru (Geach et al. 2015;Harrington et al. 2016;Su et al. 2017). Here we report new data from Keck/NIRC2 Laser Guide Star Adaptive Optics (LGS-AO) imaging in H and K s bands, HST/WFC3 F125W, Submillimeter Array (SMA), and ALMA. The wealth of multiband data combined with high-resolution deep imaging provides a unique opportunity to study the physical properties of HERS1 as an extremely bright SMG during the peak epoch of star formation activity. This paper is organized as follows: In Section 2 we present observations and data reduction as well as previous archival data. In Section 3, we describe the lensmodeling procedures and reconstructed source-plane images of the high-resolution observations. We then show the source properties including the CO spectral line en-ergy distribution (SLED), delensed CO spectral lines, large velocity gradient (LVG) modeling as well as infrared spectral energy distribution (SED) fitting in Section 4. Finally, we summarize our results in Section 5.
Throughout this paper, we assume a standard flat-ΛCDM cosmological model with H 0 = 70 km s −1 Mpc −1 and Ω Λ = 0.7. All magnitudes are in the AB system.
2. DATA 2.1. Hubble Space Telescope WFC3 imaging HERS1 was observed on 2018 September 02 under GO program 15475 in Cycle 25 with two orbits (PI: Nayyeri). We used the WFC3 F125W filter with a total exposure time of 5524 s. The data were reduced by the HST pipeline, resulting in a scale of 0 .128 pixel −1 . The photometry was performed following the WFC3 handbook (Rajan 2011).

Keck Near-IR imaging
The near-IR data of HERS1 was observed with the KECK/NIRC2 Adaptive Optics system on 2017 August 27 (PID: U146; PI: Cooray). The H and K s -band filters were used at 1.60 µm and 2.15 µm respectively. The observations were done with a custom nine-point dithering pattern for sky subtraction with 120 s (H band) and 80 s (K s band) exposures per frame. Each frame has a scale of 0 .04 pixel −1 adopting the wide camera in imaging mode. The data were reduced by a custom IDL routine (Fu et al. 2012).

Submillimeter array
Observations of HERS1 were obtained in three separate configurations of the SMA as described below. In each configuration, observations of HERS1 were obtained in tracks shared with a second target. Generally, seven of the eight SMA antennas participated in the observations, except in one case (see below). The SMA operates in double-sideband mode, with sideband separation handled through a standard phase-switching procedure within the correlator (Ho et al. 2004). During the period, the new SMA SWARM correlator was expanding, resulting in increasing continuum bandwidth with each observation. All observations were obtained with a mean frequency between 341 and 343 GHz (870 µm).
HERS1 was first observed in the SMA subcompact configuration (maximum baselines ∼ 45 m) on 2016 September 20 (PID: 2016A-S007; PI: Cooray). The weather was good and stable, with a mean τ 225 GHz ranging from 0.07 to 0.09 (translating to 1.2 mm to 1.6 mm precipitable water vapor(pwv)). The target observations were interleaved over a roughly 7.2 hr transit period, resulting in 190.5 minutes of on-source integration time for HERS1. Observations of HERS1 were next obtained in the SMA extended configuration (maximum baselines ∼ 220 m) on 2016 October 16 . The weather was good, with a mean τ 225 GHz of 0.04 to 0.06 (0.6 mm to 1.0 mm pwv). The target observations were interleaved over a roughly 9.5 hr transit period, resulting in 225 minutes of on-source integration time for HERS1. The last observations of HERS1 were conducted in the SMA very extended configuration (maximum baselines ∼509 m) on 2017 September 30 and again on October 6 (PID: 2017A-S036; PI: Cooray). For the September 30 observation, only six antennas were available due to a cryogenics issue, which recovered in time for the October 6 observations. For the first observation, the weather was very good, with a mean τ 225 GHz of 0.06 (1.0 mm pwv), and phase stability was generally very good. In the second observation, the weather was somewhat worse, with τ 225 GHz rising from 0.06 to 0.085 (1 to 1.5 mm pwv), with somewhat marginal phase stability. For both tracks, the target scans were interleaved over a roughly 9 hr transit period, resulting in 160 minutes (30 September) and 206 minutes (6 October) of on-source integration time for HERS1. For all the observations, passband calibration was obtained using a observations of 3C454.3, and gain calibration relied on periodic observations of J0224+069. The absolute flux scale was determined from observations of Uranus.
The integrated continuum visibility data for all four tracks were jointly imaged and deconvolved using the Astronomical Image Processing System (AIPS). Using natural weighting of the visibilities, the synthesized resolution is 580 mas × 325 mas (PA 27 • .2 ), and the achieved rms in the combined data map is 305 mJy beam −1 .

ALMA observation
We obtained four CO emission lines with ALMA from two programs. The CO(6-5) and CO(9-8) were obtained in project 2018.1.00922.S (PI: Riechers) on 2018 October 21 and November 30. Each execution used four spectral windows (SPW) covering the target lines. Each SPW was 2.000 GHz wide with 128 channels. Both observations were performed with ALMA 7 m antennas with a maximum baseline of 48.9 m, yielding the synthesis beam size 8 .14 × 5 .12 (PA= 76 • ) for CO(6-5) and 5 .28 × 3 .32 for CO(9-8) (PA= 88 • ). The bandpass and flux calibrators were J0237+2848 (CO(6-5)) and J0238+1636 (CO(9-8)). J0217+0014 was used as phase calibrator for both executions. The integration time was 18 minutes and 15 minutes reaching the rms of 1.4 mJy beam −1 over the 266 MHz bandwidth (CO(6- References- (1) 5)) and 1.2 mJy beam −1 over the 388 MHz bandwidth (CO(9-8)) respectively. CO(7-6) and CO(5-4) were observed in another project, 2016.2.00105S (PI: Riechers). The observation covered the four frequency ranges of 211.  GHz. Data were acquired on 2017 September 09 and 21 using ALMA 7 m antennas with a total on-source integration time of 11.8 and 10.3 minutes, respectively. Calibrators used for bandpass, flux, and phase calibrations were J0006-0623, Uranus and J0217+0144. The rms of the data reach 1.040 mJy beam −1 over bandwidth 310.6 MHz for CO(7-6) and 1.800 mJy beam −1 over the 221.6 MHz bandwidth for CO(5-4). The synthesis beam were 6 .88 × 4 .64 at a PA of 87 • (CO(7-6)) and 10 .50 × 5 .90 at a PA of 80 • (CO(5-4)). The CI fine-structure line (refer as CI(2-1) hereafter) was also detected in the same detection window of CO(7-6) at the frequency ν obs = 227.8 GHz. All data were mapped using tclean task in the casa package (v.5.1.1 or 5.4.0) with a natural weighting.  Ma et al. (2018). HAWC+ is capable of carrying out far-infrared imaging in five bands from 40 to 300 µm. We obtained the observation in Band C at 89 µm with a bandwidth of 17 µm in the Total-Intensity OTFMAP configuration. The HAWC+ Band C image has a field of view of 4.2 ×2.7 with a PSF (FWHM) of 7 .8. The total effective on-source time is about 5250 s. The raw data were processed through the CRUSH pipeline v2.34-4 (Kovács 2008), and the final Level 3 data product was flux calibrated at the SOFIA Science Center. The flux calibration error is about 10%. The resulting map has an rms noise level of ∼ 30 mJy beam −1 . The imaging data do not show a clear detection of HERS1 at its location. The 3σ upper limit, extracted for a point source with the HAWC+ Band C PSF at the peak location of HERS1, is 168 mJy and is shown in Figure 13.
2.6. Archival observations HERS1 was observed by extensive programs covering different bands of emission lines and continua as mentioned above. Here we present a brief summary of the archival data of previous observations that were used in this paper. Geach et al. (2018) performed an observation on 2017 December 11 and 14 (PID: 2017.1.00814.S; PI: Ivison) using the ALMA 12m array. The frequency ranges were 126.26-130.01 GHz and 138.26-142.01 GHz. A 1σ rms sensitivity of 150 µJy per 23 MHz channel was acquired with a total of 1.8 hr of on-source integration. The resulting beam size is 0 .27 × 0 .21. The data resulted in the discovery of three emission lines, CN(4-3) at 127.6 GHz, CO(4-3) at 129.8 GHz, and CI(1-0) at 138.5 GHz (for details see their paper). The NII(1-0) fine-structure line was also observed by this project at 441.2 GHz on 2018 August 26. A rms noise of 3 mJy beam −1 in a 15.6 MHz channel was reached after a 42 minutes of total on-source integration. The synthesis beam is 0 .32 × 0 .25 (Table 1).
The CO(3-2) line (ν obs = 97.3 GHz) was detected on 2017 December 26 (PID:2017.1.01214; PI: Yun) with the 12 m ALMA array. The line was covered in the spectral window centered on the frequency of 97.308 GHz with 480 channels. The resulting synthesized beam size was 0 .45 × 0 .32 and the rms was 420 µJy beam −1 with a bandwidth 3.9 MHz (Table 1).

HERS1
HERS1 is a gravitationally lensed galaxy magnified by two foreground galaxies (a primary galaxy G1 and a satellite G2 located at the northwest of G1 shown in Figure 1) at the same redshift z = 0.202. In order to derive its intrinsic properties, we first built a lens model of this system.
We used the publicly available code lenstool 1 to construct the best-fit model. The best-fit parameters were found by performing a Bayesian Markov Chain Monte Carlo (MCMC) sampling. The lensing system is mainly made up of two parts as shown in Figure 1. We used the HST/WFC3 F125W high-resolution image to constrain the lens model parameters. We first fitted the light profile of the lensing galaxies (G1 and G2) by a Sérsic function using galfit (Peng et al. 2010). We then subtracted the modeled foreground galaxies to obtain the lensed image. We checked the robustness of the galfit results by performing a photometric decomposition with the gasp2d code (Méndez-Abreu et al. 2008. Therefore, we kept the simpler model defined by a single Sérsic function. The lensed components were then identified using the photutils 2 package (Bradley et al. 2020) and broken into four ellipses. The elliptical size and fluxes of these ellipses were then measured as input information for lenstool. We chose a singular isothermal ellipsoid profile for the primary galaxy G1 and a singular isothermal sphere profile for G2. We fixed the positions of the galaxies, so the model was parameterized by the ellipticity e, the position angle θ and the ve-locity dispersion σ 1 for G1 and velocity dispersion σ 2 for G2. The redshift of foreground galaxies and background galaxies was fixed at z = 0.202 and z = 2.553, respectively. The optimization output provided the best-fit results. Table 2 gives the best-fit parameters of the lensing galaxies. The reconstructed images obtained from the best-fit model are shown in Figure 2. From this model, we obtain the luminosity-weighted magnification factor to be µ star = 13.6 ± 0.4. The best-fit model established above was also used to reconstruct images of the aforementioned high-resolution emission lines and SMA dust continuum as shown in Figure 3. The dust map provides a magnification factor µ dust = 12.8 ± 0.3. The source-plane images are reconstructed using the cleanlens algorithm within lenstool, the results are also illustrated in Figure 2 and Figure 3. Our model is generally consistent with the model of Geach et al. (2015) while our ellipticity for G1 (0.34) is larger than their results (0.12). The magnification of the stellar and dust components is also similar to that of other bands (Geach et al. 2015;Geach et al. 2018;Rivera et al. 2019) that µ =11 -15.

Second lensed source
In the high-resolution HST data, an additional clump is detected, as shown in Figure 4. Our model does not have a predicted counterimage in that position. We traced this part to the source plane using the best-fit model above; the corresponding source located in the northeast of the main source as shown in Figure 4 which suggests it is an extra individual component. S2 is also detected in ALMA [NII] 205 µm (data from Geach et al. 2018 anddescribed in Doherty et al. 2020). While this study failed to identify the detection with a second source, we confirm the presence of S2 at the same redshift as S1 through a combination of HST/WFC3 and [NII] observations. The line profiles of S1 and S2 are compared in Figure 5. The extra component has a narrower line width, which also implies that it comes from a different region. The integrated [NII] 205 µm spectral line flux density of S1 and S2 are 35.72 ± 1.35 Jy km s −1 and 1.05 ± 0.62 Jy km s −1 , respectively.
We also made a comparison of the ratio of the detection results. These are listed in Table 3. For the nondetection of the SMA dust continuum of S2, we adopted a 3σ upper limit when calculating the expected ratios. As shown in the table, the new component contains much less dust compared to S1, but the [NII] detection shows a higher ionization fraction. Assuming a low electron density (below 44 cm −3 ), the [NII] 205 µm emission yields a lower limit on the star formation rate of S2 of ∼ 3 2019). These results are more consistent with a normal star-forming galaxy that is likely merging with the central source (S1) of HERS1. As the detections of S2 are limited to rest-frame optical emission in HST/WFC3 and [NII] 205 µm with ALMA, it is difficult to determine its exact nature. Further deeper observations such as the CO emission and continuum flux densities in the optical to near-IR rest-frame wavelengths are needed to extract gas and stellar properties and to establish its physical properties.

CO line properties
CO SLED is an effective tool that can be used to reveal the bulk physical properties of a galaxy. HERS1 has multiple CO line detections up to J up = 11. We first calculated the observed CO SLED combining the data in this paper and the results from the literature (Geach et al. 2018;Harrington et al. 2021). Figure 6 shows the line variations normalized by CO(1-0). Other wellstudied systems are also shown for comparison (Fixsen et al. 1999;Papadopoulos et al. 2012;Bothwell et al. 2013;Carilli & Walter 2013;Riechers et al. 2013). Overall, the CO flux increases with J up and peaks at J up = 6, then declines toward higher rotation numbers. The excitation is higher than the average of SMGs but not as high as the extremely excited galaxies such as APM 08279 and HFLS-3. The ladder shape is also similar to some local ULIRGs and starburst galaxies (Mashian et al. 2015;Rosenberg et al. 2015).
In order to further study the intrinsic properties of the source, we applied the magnification correction to these line intensities. Due to the lack of data of some lines, we only calculated the delensed fluxes of lines J up = 3, 4, 5, 6, 7, 9. For lines CO(3-2) and CO(4-3), we first rebinned the data to a width of 36 km s −1 and calculated the magnification factor on each channel map based on the best-fit lens model and high-resolution lensed image. The results are shown in Figure 7 (the CI and [NII] lines are presented as well using the same method). For lines with J up ≥ 5, we adopted the magnification factor of CO(4-3) velocity channels due to the lack of high-resolution images. It is suggested that differential lensing may cause a bias on the CO SLED cal-  culation. Serjeant (2012) shows the 1σ dispersion of CO(6-5) can reach 20% of the mean valve for 500 µmselected sources with µ > 10. Frequency-dependent lens models are required to verify our assumption, but this is beyond the scope of this paper. Though differential lensing distorts the CO SLED, the mid-J lines may have similar distortion. The line widths of our CO lines are close to each other, which implies they arise from similar emission regions and this is consistent with the assumption. So we consider using the magnification factor of  a The total integrated [NII] line flux densities from the ALMA spectral line cubes at rest-frame 205 µm were extracted for S1 and S2 (Section 3.2) in units of millijansky.
The HST and SMA flux densities are also converted to units of millijansky so the ratio is unitless. The HST value is for the observed F125W filter while the SMA flux density is for the 870 µm band. b The undetected SMA flux density for S2 (in millijansky) was calculated by adopting a 3σ upper limit.
CO(4-3) as the factor of higher-J up lines is a moderate assumption. Figure 8 shows the observed and delensed line profiles.
The observed line fluxes present a double-horned profile; this feature becomes more obvious after correcting the magnification in each velocity channel. We also separated the line profile into two individual components, a 'blue' part and a 'red' part based on their velocity. A double-Gaussian model was used to fit each delensed line by fixing the peak separation (∆v = 419 km s −1 ) and the two FWHMs (FWHM b = 221 km s −1 , FWHM r = 209 km s −1 ). The model lines were also illustrated in Figure 7 and Figure 8. Velocity-integrated flux densities were then measured from the Gaussian functions. Results are reported in Table 4. The intrinsic SLED of the total source as well as the individual components was also shown in Figure 9. As we can see, the magnification factors vary with the velocity. The blue components have higher magnification than the red components, so the delensed flux profiles show more symmetrical forms compared to the observed spectra.
To further investigate the molecular gas, we apply the lvg code to the CO SLED. We adopt the code radex (van der Tak et al. 2007) to fit our CO fluxes. radex is a non-LTE analysis code to compute the atomic and molecular line intensities with an escape probability of β = (1−e τ )/τ . CO collision files are taken from the Leiden Atomic and Molecular Database (LAMDA) (Schöier et al. 2005).
The input parameters of radex include the molecular gas kinetic temperature T k , the volume density of molecular hydrogen n H2 , the column density of CO, N CO , and the solid angle of the source. The velocity gradient is fixed to 1 km s −1 , so the CO column density N CO also equals to the column density per unit velocity gradient N CO /dv. We are only concerned with the first three parameters, T k , n H2 , and N CO /dv, because the resulting CO SLED shape is not dependent on the solid angle. Instead of using radex grids, which produce a grid of CO emission fluxes given a range of parameters, we adopted a Bayesian method and performed an MCMC calculation to fit radex results with the observed fluxes. This allows a faster convergence and a better sampling in the parameter space (Yang et al. 2017). The code radex emcee was used for the calculation 3 . This package combines pyradex 4 , a python version of Radex converted by Ginsburg, and emcee 5 to achieve the fitting.
Previous studies show that the CO emissions are likely dominated by two components (e.g., Daddi et al. 2015;Yang et al. 2017;Cañameras et al. 2018) and a singlecomponent model is inadequate for fitting our SLED. So we adopted a two-component model including a warmer high-excitation component and a cooler low-excitation component in our fitting as the one-component model poorly fitted the CO SLED. That model allows two sets of parameters for different physical conditions of the two components. We applied the flat log-prior for n H2 , N CO /dv (e.g., Spilker et al. 2014). The ranges of the parameters are taken as n H2 = 10 2 − 10 7 and N CO = 10 15.5 − 10 19.5 for both two components. An extra limit of dv/dr with a range of 0.1 − 1000 km s −1 is adopted (e.g., Tunnard & Greve 2016), this provided a limit of the ratio between N CO /dv and n H2 . For the kinetic temperature, two different prior ranges were chosen. The warmer component also has a flat log-prior with the range T k = T CMB − 10 3 K , where T CMB is the CMB temperature at the redshift of the source. For the cooler part, we set an additional limit that the temperature was close to the temperature of cold dust. As discussed by Goldsmith (2001), the temperature of gas and dust couple well at high density, N H2 ≥ 10 4.5 cm −3 , but this relation is not satisfied when n H2 ≤ 10 3.5 cm −3 . So we took a normal distribution of the cooler component, which gave a reasonable guess in the range T CMB −90 K. Further, the size of the cooler component was set to be larger than the warmer component, inspired by observations of the sizes of different CO line emission regions (e.g., Ivison et al. 2011b). Figure 10, Figure 11 and Table 5 show the best-fitting two-component results. The results indicate that two components to the CO SLED can give a better description of the CO excitation, a low-excitation component with a cooler temperature and a high-excitation component with a warmer temperature. The low-excitation component peaks at J up = 3 and only contributes to J up ≤ 4 excitation. The high-excitation component dominates the CO emission and peaks at J up = 6. The gas density and temperature are consistent with the results of other studies of high-redshift SMGs (e.g., Spilker et al. 2014;Yang et al. 2017). The mid-J lines such as CO(6-5) and CO(7-6) trace the molecular gas, which is related to the star formation. Several works have shown that mid-J CO luminosity has a linear correlation with the infrared luminosity L IR (e.g., Greve et al. 2014;Rosenberg et al. 2015;Lu et al. 2017). The warm component, which contributes nearly all CO flux at J up > 4, is thought to be more closely related to the star formation activity. Yang et al. (2017) also found a correlation between the thermal pressure P th (defined as P th ≡ n H2 × T k ) and the star formation efficiency (defined as SFE ≡ SFR/M gas ), which suggested a tight relation of star formation and the gas in the warmer component in high-redshift SMGs. The warmer component has a high kinetic temperature, T k = 479 K, which is much higher than the dust temperature, T dust = 35 K.
Beacuse the excitation is dominated by the warmer component, the high-T k /T dust ratio may imply extra heating mechanisms. Separately, it has also been suggested that HERS1 might contain an active galactic nucleus (AGN) (Geach et al. 2015). The AGN can be a strong heating source in galaxies as studied by a number of works (e.g., Weiß et al. 2007;Gallerani et al. 2014). The CO SLED shape provides simple diagnosis. Lu et al. (2014) has shown that the ratio of the combined luminosity of mid-J lines to the infrared luminosity L IR would be different in star formation (SF)-dominated and AGN-dominated galaxies. The SF-dominated galaxies have an average logarithmic ratio of -4.13, and the galaxies with a significant AGN contribution show a lower value. HERS1 has a ratio of −3.92 ± 0.11, suggesting the lack of a significant AGN impact on the CO SLED as shown in Figure 12. Two representative AGNdominated galaxies, NGC 1068 and Mrk 231, suggest a much lower mid-J to L IR ratio. These two galaxies are also found to have higher CO emissions at J up > 10. The high-J CO lines can be employed to characterize the AGN heating as well. The ratio of high-J to mid-J CO line has been used to discriminate the starburst and AGN activity. We use the ratio of CO(10-9) and CO(6-5) to characterize the AGN heating, which has been used in the literature (e.g., Wang et al. 2019;Jarugula et al. 2021). An AGN can increase the high-J excitation and will provide a flat SLED shape as well as a higher line ratio in high-J and mid-J lines. From the model results, we can get L CO(10−9) /L CO(6−5) ∼ 0.6−0.7, which is similar to the Class II samples of Rosenberg et al. (2015). It is suggested that most of the Class II galaxies have a low AGN contribution. The line ratio is also close to the average value of the local starburst galaxies as shown in Carniani et al. (2019). Though we cannot completely rule out an AGN contribution, the line ratio result does not conclusively suggest AGN heating as the dominant mechanism. Further observations such as X-ray studies are needed to study the AGN in HERS1 in detail. Other heating sources such as mechanical processes invoked in some local starburst galaxies (e.g., Hailey-Dunsheath et al. 2008;Nikola et al. 2011) are also possible. The two individual kinematic components were then fitted in a similar manner. The two spectral components show similar excitation compositions and gas properties to the global SLED.

References
The SED-fitting procedure used here did not include a contribution from an AGN. As mentioned above, Geach et al. (2015) has shown that HERS1 contains a radioloud AGN with eMERLIN observations, showing a radio  Note-Values for each parameter are the median value and 1σ range from the marginal probability distribution. The 'L' and 'H' indicate the low and high excitation. The 'HERS1', 'Blue' and 'Red' denote the total and the two kinematic components, respectively.
flux density excess at 1.4GHz above that expected from the far-IR-radio relation for star-forming galaxies. The presence of an AGN through CO SLED, especially the high-J component, as discussed above, is inconclusive. Therefore, we expect that ignoring this AGN component does not strongly affect the final results presented here. One reason for this argument is that the 22 µm WISE/W4 band does not show a prominent excess in the mid-IR band. Employing an AGN diagnostic suggested in Stanley et al. (2018), the flux density ratio from the data is log 10(F 870µm /F 24µm ) ∼ 1.5 inferred from the SMA and WISE results. Such a value indicates HERS1 resides in the pure star-forming galaxies area and any AGN contribution to the IR luminosity is likely less than 20%. For such an AGN contribution Hayward & Smith (2015) has shown that magphys can give a robust inference of galaxy properties, even if the AGN contribution to the total IR luminosity reaches up to 25%. These results suggest that the lack of an AGN contribution in the SED fitting is reliable. The differential lensing effect was also considered to be negligible because all of the current continuum models have similar magnification factors. From the best-fit SED model, we can derive the total intrinsic infrared luminosity of HERS1, L IR = (1.0 ± 0.3) × 10 13 L , which makes it one of the hyperluminous infrared galaxies at high-redshift. The corresponding star formation rate is 1023 ± 264 M yr −1 assuming a Chabrier initial mass function (Chabrier 2003) and a convertion formula SFR = 1 × 10 −10 L IR . HERS1 also possesses a large value of stellar mass, which is in agreement with simulations (Davé et al. 2010) and model requirements of submillimeter-bright galaxies (Hayward et al. 2011). The relation between stellar mass and SFR relation is shown in the top panel of Figure 14. It is suggested that there is a tight correlation (called 'main sequence') between the SFR and stellar mass for the majority of star-forming galaxies both in the local and high redshifts. Figure 14 presents the z = 2.6 'main sequence' of Speagle et al. (2014) with a scatter of 0.2 dex. As we can see, HERS1 has a higher SFR than the 'main sequence' value at the corresponding stellar mass. The huge molecular gas reservoirs are available to meet the intense star formation activities. For the dust temperature, the best-fit result is T d = 35.1 +1.9 −1.4 K. The dust temperature is correlated with the infrared-luminosity for both local infrared luminous SMGs and high-redshift far-infrared or submillimeter-selected galaxies as shown by many works (Chapman et al. 2005;Hwang et al. 2010;Magdis et al. 2010;Elbaz et al. 2011;Casey et al. 2012;Magnelli et al. 2012;Symeonidis et al. 2013;Magnelli et al. 2014;Béthermin et al. 2015). It is a useful way to study different galaxy populations using the L IR − T d relation. The dust temperature of galaxies at high redshift is likely biased to a cooler value compared with local galaxies with the same luminosities. Hwang et al. (2010) found a modest evolution of L IR − T d relation as a function of redshift using Herschel-selected samples out to z ∼ 2 − 3. They concluded that SMGs are on average 2 − 5 K cooler than the local counterpart from their observation. A larger scatter of the relation at high redshift is also possible, as found by Magnelli et al. (2012), although this can be reconciled with the results of Hwang et al. (2010) by considering selection effects. Figure 15 shows the dust temperature and infrared luminosities of HERS1 with submilimeter-and Herschel-selected SMGs at z ∼ 2 − 3. Compared with   these galaxies, HERS1 has a colder temperature similar to other lensed candidate samples of Nayyeri et al. (2016).   Flux density (mJy)

Molecular gas and CI
Attenuated SED Unattenuated SED Figure 13. Best-fit SED of HERS1 using the demagnified flux densities in magphys. The blue line shows the attenuated SED while the orange line shows the unattenuated SED. The red points are the photometric results, as tabulated in Table 6 after demagnification.
The intrinsic luminosity L CO(1−0) of CO(1-0) is (5.12±1.81)×10 10 K km s −1 pc 2 using the GBT data observed by Harrington et al. (2021) as mentioned above. The molecular gas mass is (5.57 ± 1.97) × 10 10 M , adopting a conversion factor α CO = 0.8 M (K km s −1 pc 2 ) −1 , which is usually used in the high-redshift starburst galaxies and taking into the a factor of 1.36 for helium. We also calculated the gas mass from SMA data using the empirical calibration of Scoville et al. (2014). The result is (8.92 ± 2.23) × 10 10 M . Note that the conversion factor α CO is one of the main uncertainty sources in the molecular gas mass determination. In the CO-H 2 conversion, different values are taken for various types of galaxies. The factor α CO =0.8 is commonly used for starburst galaxies while a high value α CO = 4.0 is often used in Milky Way and other local star-forming galaxies. Scoville et al. (2014) also used a high conversion factor during the calibration of α 850 in their empirical relation of gas mass and dust emission. The lower α CO value adopted in this paper results a very low gas/dust ratio (∼ 20), which makes it become an extreme case. More research is required to determine the α CO in HERS1. For example, Genzel et al. (2015)  has shown that α CO could be measured given the metallicity of the galaxy. This could help to further constrain the α CO value. The observation of CO(7-6) also had a detection of C I(2-1) line at 227.79 GHz 'for free. ' We traced the CI line into the source plane using the same method for high-J CO lines. The result is shown in Figure 8 as well. It is suggested that C I lines arise from the same region as low-J CO transitions and can be used to derive the gas properties without other information. The intrinsic luminosity of C I(2-1) from our observation is L CI(2−1) = (1.60 ± 0.22) × 10 10 K km s −1 pc 2 . Combing the C I(1-0) observation giving the intrinsic luminosity L CI(1−0) = (1.65 ± 0.20) × 10 10 K km s −1 pc 2 , we can derive the carbon excitation temperature T ex = 50 K from the formula T ex = 38.8 K/ln[2.11/R CI ] where R CI is the ratio between C I(2-1) and C I(1-0) luminosity. The carbon mass was estimated following Weiß et al. (2003), M CI = (2.16 ± 0.26) × 10 7 M . The atomic carbon can also be an effective tracer to measure the gas mass. Assuming the atom carbon abundance X[CI]/X[H 2 ] = M (CI)/(6M (H 2 )) = 3 × 10 −5 , the gas mass is (1.63±0.63)×10 11 M including helium. Figure  14 shows the gas mass results of the above three methods along with other galaxies at z ∼ 2. The carbon abundance also shows variation in different galaxies. Though there are no significant changes, a higher (e.g., Weiß et al. 2005;Walter et al. 2011) or lower (e.g. Jiao et al. 2021) abundance will slightly alter the consistency of the results derived from CI and other components. Having the gas mass and star formation rate, we can derive the gas depletion timescale t dep ≡ M gas /SFR. HERS1 has a gas depletion of ∼100 Myr; this is much smaller than ∼1 Gyr for star-forming galaxies (e.g., Kennicutt 1998;Genzel et al. 2010;Saintonge et al. 2011;Decarli et al. 2016a,b) and a 'main sequence' ∼0.7 Gyr or even shorter (Saintonge et al. 2013;Tacconi et al. 2013;Sargent et al. 2014). The gas fraction f gas can be calculated as M gas /(M * + M gas ). HERS1 has a low gas fraction with f gas = 0.19 ± 0.14 as shown in the bottom panel of Figure 14. The low gas fraction and high stellar mass indicate HERS1 has formed most of its stars.

SUMMARY
We present a detailed study of an extremely luminous SMG at z = 2.553, gravitationally lensed by two foreground galaxies at z = 0.202. The lensed galaxy, dubbed HERS1, features a partial Einstein ring with a radius of ∼ 3 observed in high-resolution maps of HST/WFC3, SMA, and ALMA. Based on the reconstructed lens model, we find magnification factors µ star = 13.6 ± 0.4 and µ dust = 12.8 ± 0.3 for stellar and dust emissions of HERS1, respectively. We perform SED fitting on multiband photometry of HERS1, corrected for magnification, to measure its physical properties including stellar mass M * = 4.3 × 10 11 M , star formation rate SFR=1023 M yr −1 , and dust temperature T d = 35 K.
We analyze the physical conditions of the molecular gas through CO SLED modeling and find that its lowexcitation component has a gas density n H2 = 10 3.8 cm −3 , and kinetic temperature T k = 18 K, while the high-excitation component has n H2 = 10 3.1 cm −3 , and T k = 479 K. We also find that HERS1 shows higher excitation compared to an average SMG. We further derive the total molecular gas of HERS1 using three distinct tracers, including CO(1-0), CI lines, and SMA 870 µm. We measure a gas fraction f gas = 0.19 with a depletion time t dep ∼ 100 Myr. The short gas depletion time, compared to 1 Gyr for typical SFGs at z ∼ 2, suggests that HERS1 will become quiescent shortly owing to the lack of cool gas replenishment. The location of HERS1 on the SFR-M * relation shows that it is located on the massive end of the main sequence of star formation. It reveals that HERS1 has formed the bulk of its stellar mass by z ∼ 2.5, and is about to enter a quiescent phase through the halting of cool gas replenishment, possibly caused by feedback or environmental processes.
Moreover, we report the detection of another lensing arc feature in deep HST/WFC3 images. The feature is also detected in [N II] 205 µm, implying that it is at the same redshift as HERS1. We compare [N II] 205 µm line profile of this feature with that of HERS1 and find that the extra lensing feature has a narrower line width. We thus conclude that this extra feature is originated from a different region. Due to the lack of high S/N multiband detections, further deep observations are needed to fully understand the physical properties of this extra lensing component.
ACKNOWLEDGEMENT Financial support for this work was provided by NSF through AST-1313319, PID 05-0087 from SOFIA for H.N. and A.C.. A.C., H.N. and N.C. acknowledge support from NASA ADAP 80NSSC20K04337 for archival data analysis of bright Herschel sources. U.C.I. group also acknowledges NASA support for Herschel/SPIRE GTO and Open-Time Programs. This work is also based on observations made with the NASA/DLR Stratospheric Observatory for Infrared Astronomy (SOFIA). Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. The authors wish to extend special thanks to those of Hawaiian ancestry on whose sacred mountain we are privileged to be guests. Without their generous hospitality, most of the observations presented herein would not have been possible. The Submillimeter Array is a joint project between the Smithsonian Astrophysical Observatory and the Academia Sinica Institute of Astronomy and Astrophysics and is funded by the Smithsonian Institution and the Academia Sinica.